View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0003234||The Dark Mod||Coding||public||19.09.2012 00:47||25.06.2018 05:52|
|Priority||normal||Severity||feature||Reproducibility||have not tried|
|Target Version||TDM 2.06||Fixed in Version||TDM 2.06|
|Summary||0003234: Merge Mh's Optimized VBO Code|
|Description||Quake coder Mh has taken an initial look at modernizing Doom 3's Vertex Cache and VBO Code:|
"What the glMapBufferRange stuff does is allow you to take advantage of a VBO streaming pattern that D3D has enjoyed since at least version 7 - in D3D terms it's known as the discard/no-overwrite pattern.
A VBO is a GPU resource, and normally, if you try to update a GPU resource that is currently in use for drawing with (entirely possible because of the asynchronous nature of CPU/GPU operation), everything must stall and wait for drawing to complete before the update can happen. The stock Doom 3 code actually double-buffers its streaming VBOs to try to avoid this (in a slightly obfuscated way) but glMapBufferRange is a more robust way.
So, I mentioned discard/no-overwrite above. Here's what they do.
The buffer is filled in a linear manner. You've got 2 MB (or whatever) of space; vertexes are added beginning at position 0, and as new vertexes are added they get appended until the buffer fills. Then magic happens.
This standard update is no-overwrite; your code makes a promise to GL that it's not going to overwrite any region of the buffer that may be currently in use for drawing, and in return GL will let you update the buffer without blocking. In order to be able to keep this promise your code must maintain a counter indicating how much space in the buffer it has previously used, and add new verts to the buffer at this counter position.
When the buffer becomes full you "discard". This doesn't throw away anything previously added; instead, GL will keep the previous block of buffer memory around for as long as is needed to satisfy any pending draw calls, but will give you a new, fresh block for any further updates. That's the "magic" I mentioned above, and it's what lets you use a streaming VBO without any blocking.
This pattern will also let you get rid of Doom 3's double buffering, thus saving you some GPU memory (I haven't yet done this in my code). Because there's no more blocking it will run faster in cases where there is a lot of dynamic buffer usage, but because Doom 3 locks at 60fps it may not be as directly measurable as if the engine was unlocked. Hence the "it feels more responsive but I can't quite put my finger on it" result.
There's another chunk of code in the standard Alloc call which deals with updates of non-streaming VBOs and which is implemented in quite an evil manner by the stock Doom 3 code. When updating such a VBO you can get a faster update if the glBufferData params are the same as were previously used for that VBO (the driver can just reuse the previous block of buffer memory instead of needing to fully reallocate). Doom 3 doesn't do that, so it doesn't get these faster updates, but by searching the free static headers list for a VBO that matches and using that instead of just taking the first one from it, it can. Obviously it sucks that you need to search the list in this way, and a better implementation would just store the VBO with the object that uses it, and reuse the same VBO each time. Since this mainly happens with model animations an even better implementation would use transform feedback to animate the model instead of animating it on the CPU and needing to re-upload verts each frame, but I haven't even looked at that yet.
So all in all the stock VBO implementation is an unholy mess that needs serious work to get it functioning right, much the same way as Quake 1 lightmap updates were a mess. That code just represents the start of a process.."
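For anyone following the discard/no-overwrite description above, here is a rough sketch of just the bookkeeping side, with no GL context required. It is not Mh's code and not what ships in VertexCache.cpp; the names StreamCursor, StreamAlloc and MapMode are made up for illustration. The comments note which glMapBufferRange access flags each mode would plausibly correspond to, on the assumption that the buffer is mapped for writing only.

```cpp
#include <cassert>
#include <cstddef>

// Which kind of mapping the caller would request from GL for an allocation.
// (Hypothetical enum; the flag pairings in the comments are an assumption
// about how a renderer would use glMapBufferRange here.)
enum class MapMode {
    NoOverwrite, // e.g. GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT
    Discard      // e.g. GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT
};

// Result of one allocation: where to map and how.
struct StreamAlloc {
    std::size_t offset; // byte offset into the streaming VBO
    std::size_t size;   // number of bytes to map
    MapMode mode;       // no-overwrite append, or discard and restart
};

// Hypothetical helper that tracks how much of a fixed-size streaming
// buffer has been handed out, so new data never overlaps a region that
// earlier draw calls may still be reading.
class StreamCursor {
public:
    explicit StreamCursor(std::size_t capacity)
        : capacity_(capacity), used_(0) {}

    StreamAlloc alloc(std::size_t bytes) {
        assert(bytes <= capacity_ && "allocation larger than the whole buffer");
        if (used_ + bytes > capacity_) {
            // Buffer is full: ask for a fresh block and restart at offset 0.
            // The driver keeps the old block alive for pending draws.
            used_ = bytes;
            return StreamAlloc{0, bytes, MapMode::Discard};
        }
        // Normal case: append at the counter position without blocking.
        StreamAlloc a{used_, bytes, MapMode::NoOverwrite};
        used_ += bytes;
        return a;
    }

    std::size_t used() const { return used_; }

private:
    std::size_t capacity_; // total size of the streaming buffer in bytes
    std::size_t used_;     // bytes handed out since the last discard
};
```

With a 100-byte cursor, three 40-byte allocations would land at offsets 0, 40 and then 0 again, the third one in Discard mode. A real implementation would pass the returned offset and size to glMapBufferRange, copy the vertex data in, and unmap before drawing.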
|Steps To Reproduce||1) Replace VertexCache.cpp and VertexCache.h with the attached|
3) Test the results
|Tags||No tags attached.|
May require an updated glext.h
Here is a newer version that was compiled against Doom 3 and includes the GL_ARB_map_buffer_range extension.
Or just grab from OpenGL.org
I saw no FPS improvement from doing this; in fact, I lost about 4-5 FPS on average. (YMMV, different hardware can influence the results.)
I compared against:
[Original.exe + Patch] vs. [Self Compiled] vs. [Self Compiled + MH VBO]
The MH VBO code was the slowest of all of them.
You will need updated versions of wglext.h and glext.h, and you have to manually define several GL functions that are used by VertexCache.h/.cpp. Basically, just look at how the game defines other GL functions for use, and do the same for the new functions.
I also had to replace a conditional:
glConfig.ARBMapBufferRangeAvailable // MH custom code? Not provided..
I think it's the same thing, but I'm guessing, since I don't have MH's full source.
I would stop by:
and discuss your changes with Mh.
I am sure he would be quite happy to hear that you are testing
Did you test the changes with a Time Demo like the one that is shipped with TDM?
Reckless's general download page:
compare with Raynorpat's code below...
(Raynorpat) Even further improvements (BFG's GLSL backend ported):
Related OpenGL optimizations:
Original work for comparison:
Revelator's recent post:
|Has anyone actually managed to get rid of the VBO double buffer? I can see they have been trying glMapBufferRange, but with double-buffered VBOs it's the same speed or worse.|
Added in rev 7116
Due to be replaced with BFG's implementation:
|r_useMapBufferRange produces artifacts with com_smp and tdm_lg_interleave > 1.|
|19.09.2012 00:47||nbohr1more||New Issue|
|24.09.2012 22:00||nbohr1more||Note Added: 0004850|
|24.09.2012 22:00||nbohr1more||Note Edited: 0004850|
|24.09.2012 22:02||nbohr1more||Note Edited: 0004850|
|08.10.2012 20:02||CodeMonkey||Note Added: 0004897|
|08.10.2012 22:00||nbohr1more||Note Added: 0004899|
|06.03.2014 22:27||nbohr1more||Note Added: 0006416|
|07.03.2014 00:18||STiFU||Relationship added||child of 0003684|
|11.03.2014 01:55||nbohr1more||Note Edited: 0004899|
|11.03.2014 01:55||nbohr1more||Note Edited: 0006416|
|21.03.2014 02:25||nbohr1more||Note Added: 0006450|
|12.11.2014 18:01||nbohr1more||Note Added: 0007120|
|12.11.2014 19:22||nbohr1more||Note Added: 0007121|
|26.11.2016 16:45||duzenko||Note Added: 0008559|
|08.09.2017 15:19||nbohr1more||Assigned To||=> duzenko|
|08.09.2017 15:19||nbohr1more||Severity||normal => feature|
|08.09.2017 15:19||nbohr1more||Status||new => resolved|
|08.09.2017 15:19||nbohr1more||Resolution||open => fixed|
|08.09.2017 15:19||nbohr1more||Fixed in Version||=> TDM 2.06|
|08.09.2017 15:19||nbohr1more||Target Version||=> TDM 2.06|
|08.09.2017 15:20||nbohr1more||Note Added: 0009181|
|04.10.2017 15:22||nbohr1more||Note Added: 0009387|
|25.06.2018 05:52||stgatilov||Relationship added||related to 0004849|