View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0004534||The Dark Mod||Design/Coding||public||17.05.2017 04:16||12.06.2017 08:04|
|Target Version||TDM 2.06||Fixed in Version||TDM 2.06|
|Summary||0004534: Playing sound stream from video file|
|Description||Right now the idCinematic interface is not ready for playing audio from a video file.|
It would be great to see how it can be extended: implement audio decoding/buffering in idCinematicFFmpeg and feed the audio into the sound system properly.
|Tags||No tags attached.|
Playing sound was added in SVN rev 6931.
idSoundSample now has the type WAVE_FORMAT_TAG_STREAM_CINEMATICS in addition to the previously available WAVE_FORMAT_TAG_PCM and WAVE_FORMAT_TAG_OGG. This type is somewhat special because it is the only fully streaming type of sound sample. Such a sound sample stores a pointer to an idCinematic and takes its sound data from it.
To create a cinematic-based sound sample, use the command "fromVideo [material_name]" in the sound shader instead of specifying an explicit sound file. When the sound sample is created in idSoundSample::Load, the filename (i.e. material_name) is checked against the names of existing materials. If a material with that name exists, a cinematic-based sound is created.
When the sound is played, sound samples are fetched via the idSampleDecoderLocal::Decode method, which chooses the appropriate code path depending on the sound sample type. For a cinematic-based sound sample, audio samples are fetched directly from the linked cinematic using the new idCinematic::SoundForTimeInterval virtual method.
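The dispatch described above can be sketched roughly as follows. This is only an illustration, not the actual TDM code: CinematicSketch, SoundSampleSketch, DecodeSketch and ConstantCinematic are hypothetical names, the signature given to SoundForTimeInterval is an assumption, and the OGG path is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Format tags named in the note; the numeric values here are placeholders.
enum WaveFormatTag {
    WAVE_FORMAT_TAG_PCM,
    WAVE_FORMAT_TAG_OGG,
    WAVE_FORMAT_TAG_STREAM_CINEMATICS
};

// Hypothetical stand-in for idCinematic. The real signature may differ.
struct CinematicSketch {
    virtual ~CinematicSketch() = default;
    // Writes `count` samples starting at sample `offset` into `out`.
    virtual bool SoundForTimeInterval(int offset, int count, float *out) = 0;
};

// Hypothetical stand-in for idSoundSample.
struct SoundSampleSketch {
    WaveFormatTag tag = WAVE_FORMAT_TAG_PCM;
    std::vector<float> pcm;                // data for the in-memory PCM type
    CinematicSketch *cinematic = nullptr;  // set only for the cinematic type
};

// Returns the number of samples written to `out`.
int DecodeSketch(SoundSampleSketch &s, int offset, int count, float *out) {
    if (offset < 0 || count <= 0) return 0;
    switch (s.tag) {
    case WAVE_FORMAT_TAG_PCM: {
        int n = 0;
        while (n < count && static_cast<std::size_t>(offset + n) < s.pcm.size()) {
            out[n] = s.pcm[offset + n];
            ++n;
        }
        return n;
    }
    case WAVE_FORMAT_TAG_STREAM_CINEMATICS:
        // Fully streaming: nothing is stored in the sample itself.
        if (s.cinematic && s.cinematic->SoundForTimeInterval(offset, count, out))
            return count;
        return 0;
    default:
        return 0;  // OGG decoding is omitted from this sketch
    }
}

// Toy cinematic that produces a constant signal, used only for demonstration.
struct ConstantCinematic : CinematicSketch {
    bool SoundForTimeInterval(int, int count, float *out) override {
        for (int i = 0; i < count; ++i) out[i] = 0.5f;
        return true;
    }
};
```

The point of the sketch is that the cinematic-based branch holds no sample data of its own and forwards every request to the linked cinematic.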
Aside from the sound system support described above, support for decoding audio was added to idCinematicFFmpeg. To enable it, add the "withAudio" parameter to the "videoMap" command in the material declaration. Without this keyword, sound packets are discarded regardless of other settings. When it is set, it is passed to idCinematic::InitFromFile as an additional parameter. The sound samples are then fetched using the new method idCinematic::SoundForTimeInterval.
Internally, the implementation works like this.
The AVFormat library parses the file sequentially and splits it into packets, which belong to interleaved sequences known as streams. When withAudio = true, both the video and audio streams are parsed. Extracted packets are put into packet queues; there are two separate queues, one for audio packets and one for video packets. Note that queuing packets is absolutely necessary, because the requests for video frames and sound samples do not arrive in the same order in which the streams are interleaved in the file. Moreover, the packet decoding function is guarded with a lock (critical section), because sound is usually fetched from the dedicated sound thread.
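A minimal sketch of the two-queue idea, assuming a toy in-memory "file" in place of AVFormat. All names here (PacketSketch, DemuxerSketch, NextPacket, QueuedAudio) are hypothetical, and std::mutex stands in for the engine's critical section.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

enum class StreamKind { Video, Audio };

// Hypothetical packet: only the stream it belongs to and an id.
struct PacketSketch {
    StreamKind kind;
    int id;
};

class DemuxerSketch {
public:
    DemuxerSketch(std::vector<PacketSketch> file, bool withAudio)
        : file_(std::move(file)), withAudio_(withAudio) {}

    // Returns the next packet of the wanted stream. Packets of the other
    // stream met on the way are queued rather than lost.
    bool NextPacket(StreamKind wanted, PacketSketch &out) {
        std::lock_guard<std::mutex> lock(mutex_);  // sound thread vs. renderer
        std::deque<PacketSketch> &q =
            (wanted == StreamKind::Video) ? video_ : audio_;
        while (q.empty()) {
            if (pos_ >= file_.size()) return false;  // end of file
            const PacketSketch &p = file_[pos_++];
            if (p.kind == StreamKind::Video) video_.push_back(p);
            else if (withAudio_) audio_.push_back(p);
            // without withAudio, audio packets are simply discarded
        }
        out = q.front();
        q.pop_front();
        return true;
    }

    std::size_t QueuedAudio() {
        std::lock_guard<std::mutex> lock(mutex_);
        return audio_.size();
    }

private:
    std::vector<PacketSketch> file_;
    std::size_t pos_ = 0;
    bool withAudio_;
    std::deque<PacketSketch> video_, audio_;
    std::mutex mutex_;
};
```

In this sketch, requesting only video packets makes the audio queue grow, since nothing ever drains it. The same happens in the real system when "withAudio" is set but no sound shader consumes the audio.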
The ImageForTime method fetches video packets from the corresponding queue and decodes them into ready-to-load RGBA frames, which are stored as DecodedFrame structs. These frames are kept in a queue (the memory of old frames is reused if possible). Each frame carries a timestamp and a duration. A frame is discarded from the queue when ImageForTime is called for a moment of time greater than (timestamp + duration), which is very important for avoiding huge memory consumption.
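The discard rule can be sketched as below. This is a simplified model, not the real implementation: DecodedFrameSketch and FrameQueueSketch are hypothetical names, pixel data is replaced by an id, and the reuse of old frame memory is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical reduced version of DecodedFrame: no pixel data, just timing.
struct DecodedFrameSketch {
    double timestamp;  // seconds from the start of the video
    double duration;   // seconds
    int id;
};

class FrameQueueSketch {
public:
    void Push(const DecodedFrameSketch &f) { frames_.push_back(f); }

    // Drops every frame that ended before `time`, then returns the frame
    // covering `time`, or nullptr if no such frame has been decoded yet.
    const DecodedFrameSketch *FrameForTime(double time) {
        while (!frames_.empty() &&
               time > frames_.front().timestamp + frames_.front().duration) {
            frames_.pop_front();
        }
        if (frames_.empty() || frames_.front().timestamp > time) return nullptr;
        return &frames_.front();
    }

    std::size_t Size() const { return frames_.size(); }

private:
    std::deque<DecodedFrameSketch> frames_;
};
```

Because frames are only dropped from the front, the queue never holds more than the frames between the last requested time and the newest decoded one.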
The SoundForTimeInterval method fetches audio packets from the corresponding queue, decodes them into 44.1 kHz sound samples, and puts them into a FIFO queue (AVAudioFifo). This queue is timestamped as a whole: when a new audio packet is decoded, the timestamp is updated so that the newly decoded samples start at the desired moment of time (possibly shifting the previously decoded samples as a result). This should work perfectly for well-formed sound streams, but if the sound stream has gaps, it may cause weird behavior. The output sound samples are taken from the FIFO, and the old samples are discarded.
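To make the "timestamped as a whole" idea concrete, here is a small sketch that uses std::deque in place of AVAudioFifo. AudioFifoSketch and its methods are hypothetical, and the real code may compute the timestamp differently. The assumption is that the FIFO stores a single start time, and that appending a packet re-anchors that start time so the new samples begin at the packet's time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

class AudioFifoSketch {
public:
    explicit AudioFifoSketch(double sampleRate) : rate_(sampleRate) {}

    // Appends decoded samples whose first sample belongs at `packetTime`.
    // The whole FIFO is re-anchored, so earlier samples may shift in time.
    void Append(double packetTime, const std::vector<float> &decoded) {
        startTime_ = packetTime - static_cast<double>(samples_.size()) / rate_;
        samples_.insert(samples_.end(), decoded.begin(), decoded.end());
    }

    // Discards samples older than `time`, then pops up to `count` samples.
    std::vector<float> Read(double time, std::size_t count) {
        double skipExact = (time - startTime_) * rate_;
        if (skipExact > 0.0) {
            std::size_t skip =
                std::min(static_cast<std::size_t>(skipExact), samples_.size());
            Drop(skip);
        }
        std::size_t n = std::min(count, samples_.size());
        std::vector<float> out(samples_.begin(),
                               samples_.begin() + static_cast<std::ptrdiff_t>(n));
        Drop(n);
        return out;
    }

    double StartTime() const { return startTime_; }

private:
    void Drop(std::size_t n) {
        samples_.erase(samples_.begin(),
                       samples_.begin() + static_cast<std::ptrdiff_t>(n));
        startTime_ += static_cast<double>(n) / rate_;
    }

    std::deque<float> samples_;
    double startTime_ = 0.0;  // time of samples_.front()
    double rate_;
};
```

With a gap in the stream (a packet that starts later than the end of the buffered samples), Append moves the older samples forward in time, which is the source of the possible weird behavior mentioned above.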
An important question is how video and audio are synced. Currently the video clock is tied to the renderer backend time. It is reset to zero when ResetTime is called, and also when ImageForTime is called before a reset. If the user pauses the game (e.g. on a breakpoint), the backend time jumps forward when rendering is resumed, which forces the cinematic to fast-forward the video.
However, SoundForTimeInterval receives an offset from the beginning of the sound plus an interval length (both in sound samples). This offset does not jump if the game is paused. Moreover, sound cannot be fetched using only the renderer's backend time, because the sound system may request future sound data at any moment; in particular, it usually schedules 3 blocks of sound samples ahead.
As a result, I implemented a hybrid system: there is a sound clock, which is tied to the sample offset, but this clock can occasionally be shifted. Sound clock time is converted to video time by adding _soundTimeOffset. However, if the resulting video time differs too much from the last video time passed into ImageForTime (i.e. by more than 0.3 seconds), the sound clock is resynced: _soundTimeOffset is changed to nullify the current difference between the clocks.
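The resync rule can be written down in a few lines. This is a sketch under assumptions: SoundClockSketch and its method names are hypothetical, times are in seconds, and only _soundTimeOffset and the 0.3 second threshold come from the description above.

```cpp
#include <cassert>
#include <cmath>

// Maximum tolerated difference between the clocks, in seconds.
constexpr double kMaxClockDifference = 0.3;

class SoundClockSketch {
public:
    // Called from the video side: remembers the last time given to ImageForTime.
    void OnImageForTime(double videoTime) { lastVideoTime_ = videoTime; }

    // Called from the sound side: converts sound clock time to video time,
    // resyncing the offset if the two clocks have drifted too far apart.
    double ToVideoTime(double soundTime) {
        double videoTime = soundTime + soundTimeOffset_;
        if (std::fabs(videoTime - lastVideoTime_) > kMaxClockDifference) {
            soundTimeOffset_ = lastVideoTime_ - soundTime;  // nullify difference
            videoTime = lastVideoTime_;
        }
        return videoTime;
    }

    double Offset() const { return soundTimeOffset_; }

private:
    double soundTimeOffset_ = 0.0;  // plays the role of _soundTimeOffset
    double lastVideoTime_ = 0.0;
};
```

The tolerance matters because the sound system requests samples ahead of the current video time, so a small difference between the clocks is normal and should not trigger a resync.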
I think I should also mention various limitations of the current system:
1. None of the sound-related features work with the legacy ROQ cinematics implementation. I think if you try to attach sound to a material with legacy cinematics, everything would work, but you won't hear any sound.
2. We cannot play sound from a looped video. This is checked, so even if you set both flags, one of them will be discarded.
3. If you forget the "withAudio" flag and attach a sound shader, you simply won't hear anything, just as in item 1.
4. If you set the "withAudio" flag, make sure a sound shader is linked to the material. It should start playing at approximately the same time as the video starts. If you forget to link a sound shader, the (compressed) sound packets will pile up and consume memory. The total memory consumption can reach the total size of the compressed sound stream in the video file.
5. Never link two sound shaders to the same material with cinematics (this is not checked)! I'm pretty sure this won't work, because the SoundForTimeInterval method mutates the state of the sound FIFO. If two sound samples try to fetch audio data from a single cinematic, something bad will happen.
This post should be very helpful for setting up video + sound playback:
|17.05.2017 04:16||stgatilov||New Issue|
|17.05.2017 04:16||stgatilov||Status||new => assigned|
|17.05.2017 04:16||stgatilov||Assigned To||=> stgatilov|
|17.05.2017 04:23||stgatilov||Relationship added||related to 0004519|
|12.06.2017 05:49||stgatilov||Note Added: 0008893|
|12.06.2017 05:50||stgatilov||Note Edited: 0008893|
|12.06.2017 05:55||stgatilov||Note Edited: 0008893|
|12.06.2017 05:56||stgatilov||Note Edited: 0008893|
|12.06.2017 05:58||stgatilov||Note Edited: 0008893|
|12.06.2017 05:59||stgatilov||Note Edited: 0008893|
|12.06.2017 06:00||stgatilov||Note Added: 0008894|
|12.06.2017 06:01||stgatilov||Status||assigned => resolved|
|12.06.2017 06:01||stgatilov||Fixed in Version||=> TDM 2.06|
|12.06.2017 06:01||stgatilov||Resolution||open => fixed|
|12.06.2017 06:01||stgatilov||Target Version||=> TDM 2.06|
|12.06.2017 07:25||stgatilov||Relationship added||related to 0004540|
|12.06.2017 08:04||stgatilov||Relationship added||related to 0004542|