View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0006300||The Dark Mod||Coding||public||16.06.2023 19:05||14.11.2023 15:04|
|Product Version||TDM 2.11|
|Target Version||TDM 2.12|
|Summary||0006300: Refactor idImage code for CPU-resident textures|
|Description||Several major things happened in idImage since Doom 3:|
* framebuffers have their attachments as idImage-s
* some parts of engine use CPU-resident images
* image programs can use compressed images
* multithreading is real now
There is a particular concern with CPU-resident images, since the old code does not support flexible usage.
The loading part simply runs from start to end, there is no way to change anything after loading is over, especially if multithreading is involved.
idImage should be refactored with two goals:
1) The images which are readonly and loaded/reloaded from some source (e.g. file) should be separated from scratch images (e.g. FBO attachments).
2) Reloadable images should store data in several ways: GPU texture, CPU uncompressed data, CPU compressed data. Depending on residency changes, it should be easy to get CPU representations from each other and upload to GPU. Note that image can be compressed/uncompressed on GPU, and it can be compressed/uncompressed in source file, but the system should support all the cases.
|Additional Information||Original thread:|
|Tags||No tags attached.|
First I decided to fill the holes in texture compression, i.e. implement software code for compression into DXT1/3/5 formats.
After that we'll have C++ code for converting images between compressed and uncompressed representations in all cases.
As usual, I started with some cleanup:
r10402 Deleted unused image formats from functions used in image reporting.
r10403 Deleted support for invalid/generic internal image formats.
r10404 Supported GL_RGBA16F and two depth/stencil internal formats in idImage::Print.
r10405 Shuffled internal formats around, moved BitsForInternalFormat code closer to idImage::Print.
r10406 FormatIsDXT: better implementation + renamed to IsImageFormatCompressed to avoid confusion with RGTC.
r10407 BitsForInternalFormat now returns CPU-side size by default.
r10408 Moved DXT decompression tests and IsImageFormatCompressed function to new file Image_compress.cpp.
Then I cleaned up idMath rounding functions (new DXT compression code requires function like std::nearbyint):
r10409 Renamed idMath::Ftol to idMath::Ftou, deleted FtolFast, added rounding to comments.
r10410 idMath::FtoiFast now uses round-to-nearest as originally indended.
r10411 Added unit tests to verify that idMath rounding functions work as intended.
r10412 Deleted Ftou function completely.
r10413 Massive rename of rounding functions.
Finally, added compression code and integrated it:
r10416 Preparation for software DXT compression.
r10417 Implemented DXT1/3/5 compression with generic software code.
r16815 Added a test image from Kodak image benchmark for CompressDxt:Quality unit test.
r10418 Added debug cvar image_forceRecompress.
The only case which is not implemented is DXT1 with 1-bit transparency.
It is never used for automatic compression now (transparent diffuse/specular textures become DXT3), and this is probably for good.
Anyway, there is no need for asking driver to compress anything now.
The cvar image_forceRecompress can be test software compression.
Without it only a very few images are compressed in software, since all textures are precompressed DDS (except for normal maps).
A new series of commits adds SSE2 fast implementation for all the compression functions:
r10419 Refactored SIMD initialization infrastructure.
r10420 Compare outputs of "generic" and optimized SIMD implementations in image compression unit tests.
r10421 Refactoring RGTC compression SSE2 code in preparation for supporting other formats.
r10422 Added SSE2 implementation of DXT1 compression.
r10423 Added implementation of DXT5 compression.
r10424 Added fast implementation for DXT3 compression.
Correctness is briefly tested by unit test:
Performance can be measured by disabled unit test:
runTests -tc="CompressDxt:Performance" -ns
Right now for me the numbers are:
G:\TheDarkMod\darkmod_src\renderer\resources\Image_compress.cpp(164): MESSAGE: Compressed (1024 x 1024) image to format 83f0 in 4.396 ms
G:\TheDarkMod\darkmod_src\renderer\resources\Image_compress.cpp(164): MESSAGE: Compressed (1024 x 1024) image to format 83f2 in 4.853 ms
G:\TheDarkMod\darkmod_src\renderer\resources\Image_compress.cpp(164): MESSAGE: Compressed (1024 x 1024) image to format 83f3 in 5.104 ms
G:\TheDarkMod\darkmod_src\renderer\resources\Image_compress.cpp(164): MESSAGE: Compressed (1024 x 1024) image to format 8dbd in 1.066 ms
Obviously, RGTC is the cheapest to compress, and DXT1 color compression dominates all the other formats.
The actual refactoring lands into trunk:
r10468 Now idImage is abstract class with idImageAsset and idImageScratch derived classes.
r10469 GenerateAttachment method moved to idImageScratch class.
r10470 A bunch of methods moved from idImage to idImageAsset.
r10471 UploadScratch moved to idImageScratch.
r10472 A lot more methods moved to idImageAsset.
r10473 Ambient occlusion code now uses GenerateAttachment as all the other methods.
r10474 Fixed some listImages warnings.
r10475 Moved more members to idImageAsset.
r10476 ImageFromFunction generators now return imageBlock_t, i.e. CPU-side image data.
r10477 Loading file-defined and function-defined asset images is merged into single code path.
r10478 Textures used in idMaterial are now idImageAsset.
Now there are idImageAsset and idImageScratch, with idImage being their common base class.
Scratch images are the textures with temporary contents, which you can modify at any moment.
Usually they are generated as attachments to framebuffers, although there is a single "_scratchImage" texture which is never attached to FBO but filled by copying contents from other texture.
These images are very simple, they have no reloading, compression, etc.
Asset images are readonly immutable textures, with contents initialized from some source.
Source can be: file, image program, or generator function.
Now generator function merely provides CPU-side contents of the image (W, H, pixels), so there is much less specific behavior in this case.
Files and image programs use the same code path just as before.
Major bugfixes (thankfully caught by Daft Mugi):
r10485 Fixed assert for same source file (comparison should be case-insensitive).
r10486 If someone requests image with empty name, print warning with load stack and return default image.
r10487 FrameBufferManager::CopyRender now generates any destination image, not only _scratch and _scratch2.
One more bugfix:
r10489 Reimplemented double vision effect with custom shader, thus fixing its dependence on alpha of the rendered frame. Note: requires proper revision with material in assets repo.
r16843 Added textures/postprocess/doublevision material for new doublevision implementation.
It turns out that original doublevision implementation depended on alpha contents of the rendered frame, which are rather undefined.
On one scene, alpha is often 0.8, and near lights it jumps up to 8 --- the dependency was not good =)
The dependency is now removed: custom shader blends texture with multiple shifts, and blending coefficients no longer depend on alpha contents of frame.
Previously all scratch images had RGBA8 format.
Now they have the same format as the main renderbuffer (i.e. RGBA16F usually).
That is why previously dependency was not a big issue (high values were clamped down to 1).
Obviously, RGBA16F costs more memory bandwidth.
But I think it is still better way, so let's retain it like that.
One more bugfix:
r10505 Hardcode alpha = 1 to all scratch images created via renderbuffer copy.
This one applies to materials like this:
description "use on a patch below water surface to create strong reflections"
blend blend // standard alpha-blending
mirrorRenderMap 128 128 // texture is generated by rendering mirror subview
translate 0.5, 0.5
scale 0.5, 0.5
red .5 * Parm0
green .5 * Parm1
blue .5 * Parm2
alpha 0.3 // source alpha = 0.3 * texture alpha
|16.06.2023 19:05||stgatilov||New Issue|
|17.06.2023 15:54||stgatilov||Assigned To||=> stgatilov|
|17.06.2023 15:54||stgatilov||Status||new => assigned|
|19.06.2023 19:41||stgatilov||Note Added: 0016027|
|19.06.2023 20:02||stgatilov||Note Edited: 0016027|
|21.06.2023 14:29||stgatilov||Note Added: 0016028|
|28.10.2023 10:01||stgatilov||Note Added: 0016151|
|04.11.2023 20:38||stgatilov||Note Added: 0016164|
|05.11.2023 15:10||stgatilov||Note Added: 0016166|
|14.11.2023 15:04||stgatilov||Note Added: 0016176|