View Issue Details

ID: 0004534
Project: The Dark Mod
Category: Design/Coding
View Status: public
Last Update: 12.06.2017 08:04
Reporter: stgatilov
Assigned To: stgatilov
Priority: normal
Severity: normal
Reproducibility: N/A
Status: resolved
Resolution: fixed
Product Version: SVN
Target Version: TDM 2.06
Fixed in Version: TDM 2.06
Summary: 0004534: Playing sound stream from video file
Description: Right now the interface of idCinematic is not ready for playing audio from a video file. It would be great to see how it can be extended: implement audio decoding/buffering in idCinematicFFmpeg and feed the audio into the sound system properly.
Tags: No tags attached.

Relationships

related to 0004519 (resolved, stgatilov): Investigate using videos of common formats (avi, mp4)
related to 0004540 (resolved, stgatilov): Sound stops playing in cinematics because of breakpoints
related to 0004542 (resolved, stgatilov): Support playing stereo sound in cinematics

Activities

stgatilov (administrator) ~0008893
12.06.2017 05:49 (last edited: 12.06.2017 05:59)

Playing sound was added in SVN rev 6931.

idSoundSample now has the type WAVE_FORMAT_TAG_STREAM_CINEMATICS in addition to the previously available WAVE_FORMAT_TAG_PCM and WAVE_FORMAT_TAG_OGG. This type is somewhat special because it is the only fully streaming type of sound sample: such a sample stores a pointer to an idCinematic and takes its sound data from it.

In order to create a cinematic-based sound sample, one should use the command "fromVideo [material_name]" in the sound shader instead of specifying an explicit sound file. When the sound sample is created in idSoundSample::Load, the code checks whether the filename (i.e. material_name) equals the name of an existing material. If it does, a cinematic-based sound sample is created.
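For example, a minimal sound shader could look like this (a sketch: the shader and material names are made up, only the "fromVideo" command itself comes from this change):

  // hypothetical sound shader: the argument of "fromVideo" must be the name
  // of an existing material with a cinematic (see the material example below)
  briefing_video_sound
  {
      fromVideo video/briefing
  }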

When the sound is played, sound samples are fetched via the idSampleDecoderLocal::Decode method, which chooses the appropriate code path depending on the sound sample type. For a cinematic-based sound sample, audio samples are fetched directly from the linked cinematic via the new virtual method idCinematic::SoundForTimeInterval.
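The extended interface looks roughly like this (a sketch: the exact signature of SoundForTimeInterval shown here is illustrative, inferred from the parameters described further below, i.e. an offset from the beginning of the sound plus an interval length in samples):

  // sketch of the idCinematic interface extension
  struct cinData_t;

  class idCinematic {
  public:
      virtual ~idCinematic() {}
      // pre-existing: decode the video frame to show at the given time
      virtual cinData_t ImageForTime( int milliseconds ) = 0;
      // new: fill `output` with `sampleCount` samples of 44 kHz audio,
      // starting `sampleOffset` samples from the beginning of the sound
      virtual bool SoundForTimeInterval( int sampleOffset, int sampleCount, float *output ) = 0;
  };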

Aside from the sound system support described above, support for decoding audio is added to idCinematicFFmpeg. To enable it, one must set the "withAudio" parameter on the "videoMap" command in the material declaration. Without this key, sound packets are discarded regardless of other settings. When the key is set, it is passed to idCinematic::InitFromFile as an additional parameter, and the sound samples can then be fetched via the new idCinematic::SoundForTimeInterval method.
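A material declaration with audio enabled might look like this (a sketch: the material name and video path are made up, and the exact placement of the "withAudio" keyword relative to the file name is illustrative):

  // hypothetical material with an FFmpeg cinematic and audio decoding enabled
  video/briefing
  {
      {
          // without "withAudio", audio packets from the file are discarded
          videoMap withAudio video/briefing.mp4
      }
  }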

Internally, the implementation works like this.
The AVFormat library parses the file sequentially and splits it into packets, which are classified into interleaved sequences known as streams. When withAudio = true, both the video and the audio stream are parsed. Extracted packets are put into packet queues; there are two separate queues, one for audio and one for video packets. Note that queuing packets is absolutely necessary, because requests for video frames and sound samples do not arrive in the order in which the streams are interleaved in the file. Moreover, the packet decoding function is guarded with a lock (critical section), because sound is usually fetched from the dedicated sound thread.
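The demuxing step could be sketched like this (the FFmpeg calls are real, but the queue types and function names are illustrative):

  // sketch of packet demuxing into two queues (caller holds the decoder lock)
  #include <deque>
  extern "C" {
  #include <libavformat/avformat.h>
  }

  struct PacketQueues {
      std::deque<AVPacket*> video;
      std::deque<AVPacket*> audio;
  };

  // Read packets until one arrives for the stream we actually need right now;
  // everything read along the way is queued for the other consumer.
  static bool DemuxUntil( AVFormatContext *fmt, int videoIdx, int audioIdx,
                          bool withAudio, bool wantVideo, PacketQueues &q ) {
      for ( ;; ) {
          AVPacket *pkt = av_packet_alloc();
          if ( av_read_frame( fmt, pkt ) < 0 ) {
              av_packet_free( &pkt );
              return false;               // end of file or error
          }
          if ( pkt->stream_index == videoIdx ) {
              q.video.push_back( pkt );
              if ( wantVideo ) return true;
          } else if ( withAudio && pkt->stream_index == audioIdx ) {
              q.audio.push_back( pkt );
              if ( !wantVideo ) return true;
          } else {
              av_packet_free( &pkt );     // packet of an unrelated stream
          }
      }
  }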

The ImageForTime method fetches video packets from the corresponding queue and decodes them into ready-to-load RGBA frames, which are stored as DecodedFrame structs. These frames are kept in a queue (the memory of old frames is reused when possible). Each frame carries a timestamp plus a duration. A frame is discarded from the queue when ImageForTime is called for a time moment greater than (timestamp + duration), which is very important to avoid huge memory consumption.
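The frame bookkeeping boils down to something like this (a sketch: DecodedFrame is the real struct name, its members here are illustrative):

  // sketch of the decoded frame queue and the discard rule
  #include <cstdint>
  #include <deque>
  #include <vector>

  struct DecodedFrame {
      int64_t timestamp;           // presentation time of the frame
      int64_t duration;            // how long this frame stays on screen
      std::vector<uint8_t> rgba;   // ready-to-load pixels (memory gets reused)
  };

  // Frames that ended before the requested time will never be shown again,
  // so dropping them keeps memory consumption bounded.
  static void DiscardOldFrames( std::deque<DecodedFrame> &frames, int64_t time ) {
      while ( !frames.empty() && frames.front().timestamp + frames.front().duration < time )
          frames.pop_front();
  }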

The SoundForTimeInterval method fetches audio packets from the corresponding queue, decodes them into 44 kHz sound samples, and puts them into a FIFO queue (AVAudioFifo). This queue is timestamped as a whole: when a new audio packet is decoded, the timestamp is updated so that the newly decoded samples start at the desired moment of time (possibly moving the previously decoded samples as a result). This should work perfectly for well-formed sound streams, but if the sound stream has gaps it may cause weird behavior. The output sound samples are taken from the FIFO, and old samples are discarded.
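With the real AVAudioFifo API, the reading side looks roughly like this (a sketch: the timestamp variable and the function are illustrative, the channel count is assumed to be stereo, and the writing side that updates the FIFO timestamp is omitted):

  // sketch of serving a sample interval from the timestamped FIFO
  extern "C" {
  #include <libavutil/audio_fifo.h>
  }

  static AVAudioFifo *fifo = av_audio_fifo_alloc( AV_SAMPLE_FMT_FLT, 2, 4096 );
  static int64_t fifoStartSample = 0;   // offset of the first sample in the FIFO

  static int ReadSamples( int64_t offset, int count, float *out ) {
      // samples before the requested interval are never needed again
      int toDrop = (int)( offset - fifoStartSample );
      if ( toDrop > 0 ) {
          av_audio_fifo_drain( fifo, toDrop );
          fifoStartSample += toDrop;
      }
      void *planes[1] = { out };
      int got = av_audio_fifo_read( fifo, planes, count );   // read also removes
      if ( got > 0 )
          fifoStartSample += got;
      return got;
  }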

An important question is how video and audio are synced. Currently the video clock is tied to the renderer backend time. It is reset to zero when ResetTime is called, and also when ImageForTime is called before any reset. If the user pauses the game (e.g. on a breakpoint), the backend time jumps forward when rendering is resumed, which forces the cinematic to fast-forward the video.
SoundForTimeInterval, however, receives an offset from the beginning of the sound plus the interval length (both in sound samples). This offset does not jump if the game is paused. Moreover, sound cannot be fetched using only the renderer's backend time, because the sound system may request future sound data at any moment; in particular, it usually schedules 3 blocks of sound samples ahead.
As a result, I implemented a hybrid system: there is a sound clock tied to the sample offset, but this clock can occasionally be shifted. Sound clock time is converted to video time by adding _soundTimeOffset. However, if the resulting video time differs from the last video time passed into ImageForTime by too much (i.e. more than 0.3 seconds), the sound clock is resynced: _soundTimeOffset is changed to nullify the current difference between the clocks.
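The resync rule itself fits in a few lines (a sketch: only _soundTimeOffset and the 0.3 second threshold come from the description above, everything else is illustrative):

  // sketch of converting sound clock time to video time with resync
  #include <cmath>

  static float _soundTimeOffset = 0.0f;
  static float _lastVideoTime   = 0.0f;   // last time passed into ImageForTime

  static float SoundClockToVideoTime( float soundTime ) {
      float videoTime = soundTime + _soundTimeOffset;
      if ( fabsf( videoTime - _lastVideoTime ) > 0.3f ) {
          // clocks drifted apart (e.g. after a pause): shift the sound clock
          _soundTimeOffset += _lastVideoTime - videoTime;
          videoTime = _lastVideoTime;
      }
      return videoTime;
  }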

I think I should also mention various limitations of the current system:
1. None of the sound-related features work with the legacy ROQ cinematics implementation. I think if you try to attach sound to a material with legacy cinematics, everything will still work, but you won't hear any sound.
2. We cannot play sound from a looped video. This is checked, so even if you set both flags, one of them will be discarded.
3. If you forget the "withAudio" flag but attach a sound shader, you simply won't hear anything, just as in point 1.
4. If you set the "withAudio" flag, make sure a sound shader is linked to it; it should start playing at approximately the same time as the video. If you forget to link a sound shader, the (compressed) sound packets will pile up and consume memory; the total memory consumption can reach the total size of the compressed sound stream in the video file.
5. Never link two sound shaders to the same material with cinematics (this is not checked)! I'm pretty sure this won't work, because the SoundForTimeInterval method mutates the state of the sound FIFO; if two sound samples try to fetch audio data from a single cinematic, something bad will happen.

stgatilov (administrator) ~0008894
12.06.2017 06:00

This post should be very helpful for setting up video + sound playback:
  http://forums.thedarkmod.com/topic/18820-why-roq/page-2#entry407624

Issue History

Date Modified Username Field Change
17.05.2017 04:16 stgatilov New Issue
17.05.2017 04:16 stgatilov Status new => assigned
17.05.2017 04:16 stgatilov Assigned To => stgatilov
17.05.2017 04:23 stgatilov Relationship added related to 0004519
12.06.2017 05:49 stgatilov Note Added: 0008893
12.06.2017 05:50 stgatilov Note Edited: 0008893
12.06.2017 05:55 stgatilov Note Edited: 0008893
12.06.2017 05:56 stgatilov Note Edited: 0008893
12.06.2017 05:58 stgatilov Note Edited: 0008893
12.06.2017 05:59 stgatilov Note Edited: 0008893
12.06.2017 06:00 stgatilov Note Added: 0008894
12.06.2017 06:01 stgatilov Status assigned => resolved
12.06.2017 06:01 stgatilov Fixed in Version => TDM 2.06
12.06.2017 06:01 stgatilov Resolution open => fixed
12.06.2017 06:01 stgatilov Target Version => TDM 2.06
12.06.2017 07:25 stgatilov Relationship added related to 0004540
12.06.2017 08:04 stgatilov Relationship added related to 0004542