View Issue Details

ID: 0003984
Project: The Dark Mod
Category: Graphics
View Status: public
Last Update: 05.03.2024 06:33
Reporter: SteveL
Assigned To: cabalistic
Priority: normal
Severity: normal
Reproducibility: N/A
Status: assigned
Resolution: open
Product Version: SVN
Summary: 0003984: GPU vertex skinning
Description: Animated (MD5) meshes and particle effects currently have the exact position of every vertex of every mesh worked out on the CPU, and the results are then uploaded to the GPU every frame. Both steps take time, and the result is that we can typically have only 3 or 4 AI on screen before losing FPS.

The animation system is responsible for deciding where an AI's (or any md5 mesh's) skeleton is. Skinning is the process of fitting its visible surfaces over the current skeleton configuration. That can be done on either the CPU or the GPU. We currently use only the CPU. The best solution is to do it on *both* CPU and GPU, because doing the sums twice is less costly than passing the results between processors.

Ideally, the CPU will skin only the collision model (where needed, approximations are used most of the time) and any shadowcasting meshes, because it needs to know those results for other purposes. The GPU will skin the shadow mesh and the remaining visible meshes, which the CPU doesn't care about.
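
As a rough illustration of what "skinning" means here, the sketch below applies linear blend skinning to a single vertex: the vertex is transformed by the matrix of each joint that influences it, and the results are blended by weight. The types (`Vec3`, `JointMat`, `SkinnedVert`), the function name `skinVertex`, and the limit of four influences per vertex are assumptions made for this example, not TDM's actual structures. The same loop can run on either the CPU or in a vertex shader.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal types; the engine's own vector and joint types differ.
struct Vec3 { float x, y, z; };

// 3x4 affine matrix: a 3x3 rotation part plus a translation column.
struct JointMat {
    float m[3][4];
    Vec3 transform(const Vec3& p) const {
        return {
            m[0][0] * p.x + m[0][1] * p.y + m[0][2] * p.z + m[0][3],
            m[1][0] * p.x + m[1][1] * p.y + m[1][2] * p.z + m[1][3],
            m[2][0] * p.x + m[2][1] * p.y + m[2][2] * p.z + m[2][3],
        };
    }
};

struct SkinnedVert {
    Vec3 basePos;                 // position in the base (bind) pose
    std::array<int, 4> joint;     // indices into the joint list
    std::array<float, 4> weight;  // expected to sum to 1
};

// Blend the vertex position across its influencing joints.
// skinMats[i] is assumed to map base-pose model space to the current pose.
inline Vec3 skinVertex(const SkinnedVert& v, const std::vector<JointMat>& skinMats) {
    Vec3 out{0.0f, 0.0f, 0.0f};
    for (std::size_t i = 0; i < v.joint.size(); ++i) {
        if (v.weight[i] == 0.0f) {
            continue;  // unused influence slot
        }
        assert(v.joint[i] >= 0 && static_cast<std::size_t>(v.joint[i]) < skinMats.size());
        const Vec3 p = skinMats[static_cast<std::size_t>(v.joint[i])].transform(v.basePos);
        out.x += v.weight[i] * p.x;
        out.y += v.weight[i] * p.y;
        out.z += v.weight[i] * p.z;
    }
    return out;
}
```

The point of the proposal is that the per-frame input to this loop (the joint matrices) is far smaller than its output (every skinned vertex).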
Additional Information: The GPU has hundreds or thousands of processors that can perform this task in parallel, unlike the single CPU core that does the task now, so letting the GPU do it will be faster. In addition, uploading the results to the GPU takes even longer than calculating them.

Example: A single tdm_ai_citywatch seen close up with one shadowcasting light and one ambient_world light transfers 825kb of data to the GPU each frame for its visible meshes and shadow volumes.

The fast memory cache reserved on the gpu for current-frame-only meshes in TDM is only 2MB, so 3 AI or a moderate rainfall are enough to exceed it and make slower memory kick in. We could try enlarging that, but we won't need to if we get GPU skinning working.

Another problem I've spotted in researching this: 212kb of the 825kb for a city watch guard is duplicated data. Again no need to fix that if we have gpu skinning.

There is only 1 copy of the base MD5 mesh held in CPU memory for all AI using a given modelDef in a map. That base copy is used together with each AI's unique skeleton configuration to generate the individual drawn surfaces for those AI each frame. But those draw surfs are all skinned and uploaded individually to the GPU every frame.

With GPU skinning, a single copy of the modelDef will be held on each of the gpu AND the cpu, and what gets passed from the cpu to the gpu each frame for each AI is just the skeleton configuration: a matrix for each joint which totals 6kb per AI instead of 825kb.
Tags: No tags attached.

Relationships

child of 0003684 new Investigate GPL Renderer Improvements 

Activities

SteveL

01.01.2015 20:51

reporter   ~0007291

Last edited: 01.01.2015 20:54

The changes made in Doom3 BFG for GPU Skinning:

- In Model.h: The basic surface type (triangle mesh) has had an extra pointer added: idRenderModelStatic* staticModelWithJoints. That's used for meshes whose vertex positions are determined by a set of joints.

- idMD5Mesh, which represents a modelDef (the data shared by all AI of a given type), has had several changes:
- - idMD5Mesh::ParseMesh builds an AI mesh in base pose (T-pose) which it loads to the GPU as a static vbo. That's the base mesh used to draw all AI of that type later. It also generates a list of matrices, one for each joint, that translate from model space into joint space.
- - idMD5Mesh::UpdateSurface chooses whether to skin a given mesh for a given frame on the CPU or GPU. It used to do CPU skinning in all cases. In BFG it can choose to produce a set of updated joint matrices instead, which will be used on the GPU to skin the mesh.

- idRenderModelMD5::InstantiateDynamicModel used to call idMD5Mesh::UpdateSurface for each mesh. In BFG it builds the skinning matrix, a matrix for each joint in its current position which will be used to move all attached vertices into position.

- tr_frontend_addmodels.cpp
This is a new file in BFG, which inherits some of the functions from (missing) tr_light.cpp, including R_AddModels, now with more variants. It does the VBO allocation for loading an AI's current joint configuration to the GPU. It uses a single-frame vbo, so it gets replaced once per frame, and it tests whether the vbo is current for the frame.

- tr_backend_draw.cpp
- - RB_RenderInteractions chooses between versions of the usual shader programs for light-interacting surfaces or shadow volumes -- each program now comes in 2 flavours, for skinned or unskinned meshes.
- - RB_DrawElementsWithCounters binds the uniform buffer (current joint config) and does the rest.

- VertexCache
- - has been upgraded to allow a third type of vbo -- the list of current joint matrices for a skinned mesh. This is a uniform buffer object, not an array of vertex data.

- The shader system is now pure GLSL (no ARB assembly shaders). That enables some of the above techniques.
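
To make the relationship between the two sets of matrices mentioned above more concrete: one common way to form a per-joint skinning matrix is to multiply the joint's current-pose matrix by the model-to-joint-space matrix built once from the base pose. The sketch below assumes 3x4 affine matrices and uses hypothetical names (`Mat34`, `multiply`, `buildSkinningMatrices`); it is not copied from the BFG source and may differ from it in layout and conventions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 3x4 affine matrix with an implicit last row of (0 0 0 1).
struct Mat34 {
    float m[3][4];
};

// Compose two affine transforms: the result applies b first, then a.
inline Mat34 multiply(const Mat34& a, const Mat34& b) {
    Mat34 r{};
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 4; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < 3; ++k) {
                sum += a.m[i][k] * b.m[k][j];
            }
            if (j == 3) {
                sum += a.m[i][3];  // contribution of b's implicit (0 0 0 1) row
            }
            r.m[i][j] = sum;
        }
    }
    return r;
}

// currentPose[i]: joint space -> model space for this frame.
// inverseBind[i]: model space -> joint space in the base pose (built at load).
inline std::vector<Mat34> buildSkinningMatrices(const std::vector<Mat34>& currentPose,
                                                const std::vector<Mat34>& inverseBind) {
    assert(currentPose.size() == inverseBind.size());
    std::vector<Mat34> skin(currentPose.size());
    for (std::size_t i = 0; i < skin.size(); ++i) {
        skin[i] = multiply(currentPose[i], inverseBind[i]);
    }
    return skin;
}
```

Only the output of `buildSkinningMatrices` would need to reach the GPU each frame; the base mesh itself stays in a static vbo.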

SteveL

01.01.2015 21:11

reporter   ~0007292

Last edited: 01.01.2015 21:19

Some parts of the above implementation are supported only by GLSL shaders, and we won't be able to use them unamended with our ARB shader programs.

Upgrading to allow GLSL shaders is an obvious later step for the TDM engine, but there is a reason to push ahead with GPU Skinning and soft shadows first: upgrading to GLSL might take a very long time or not get finished.

The BFG implementation is still a close fit for our engine, so we'd use much of it, but we'd need some changes:

- VertexCache -- we can leave this unamended. We can't use the new uniform buffer objects for storing joint matrices until we have GLSL.

- Model_MD5 changes: we'd want these. That means the routine to build the initial bind pose, and the changes to InstantiateDynamicModel which make a skinning matrix for each frame. But we'll need to break the skinning matrices into a more compact representation -- a quaternion plus a vector offset (7 floats) instead of a matrix (minimum 12 floats).

- tr_frontend_addmodels: we won't use this change. We can't (yet) use uniform buffer objects. We will have to load the skinning matrices during the backend draw, using the environmental and local shader program variables.
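
For reference, here is a minimal sketch of the compact joint representation suggested above: a unit quaternion plus a translation, 7 floats per joint. The names (`JointQuat`, `transformPoint`) and the component order are assumptions for illustration. Note that a unit quaternion plus an offset can only express a rotation and a translation, so this assumes the skinning transforms carry no scale or shear.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Hypothetical compact joint transform: rotation as a unit quaternion
// (x, y, z, w) plus a translation.
struct JointQuat {
    float qx, qy, qz, qw;
    Vec3 t;
};
static_assert(sizeof(JointQuat) == 7 * sizeof(float), "expected 7 floats per joint");

// Rotate p by the quaternion, then add the translation.
// Uses p' = p + 2w(u x p) + 2(u x (u x p)) with u = (qx, qy, qz).
inline Vec3 transformPoint(const JointQuat& j, const Vec3& p) {
    const float lenSq = j.qx * j.qx + j.qy * j.qy + j.qz * j.qz + j.qw * j.qw;
    assert(std::fabs(lenSq - 1.0f) < 1e-3f);  // the formula needs a unit quaternion
    const Vec3 u{j.qx, j.qy, j.qz};
    const Vec3 c1 = cross(u, p);
    const Vec3 c2 = cross(u, c1);
    return {
        p.x + 2.0f * (j.qw * c1.x + c2.x) + j.t.x,
        p.y + 2.0f * (j.qw * c1.y + c2.y) + j.t.y,
        p.z + 2.0f * (j.qw * c1.z + c2.z) + j.t.z,
    };
}
```

In an ARB vertex program the same arithmetic would be written with cross-product and multiply-add instructions, with the quaternion read from one parameter register and the translation from another.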

Additional changes
==================
The big thing is we can't pass as much data to the gpu per AI as BFG's GLSL implementation does. It uses a set of 102 4x4 matrices (=408 "registers") to hold the current position and orientation for each joint -- so the joint limit for any AI is 102 (not counting joints that are not used for skinning).

We will have to deal with tighter limits. My GPU allows 4096 registers to be passed as uniforms to GLSL programs in a single draw call, but only 256 registers for ARB programs. And the spec for ARB says that cards have to support only 96 registers, so we should try to stay within that limit.

We can get a second set of 96 registers by using both environmental AND local parameters, giving 192 to play with.
The engine already reserves 21 environmental registers (numbers 0-20) for drawing light interactions, and 4 local registers for the "VertexParm" parameters that people can specify in shader programs.

So we have range 21-95 of each set clear to play with. That's a maximum of 75 joints (registers 21 to 95 inclusive), if we put the joint's orientation as a quaternion in one list, and the joint's position as a vector in the second list. Each of those can be packed into a single register.

TDM's most complex AI models use 68 joints for skinning, so it's enough. Any meshes that wanted to use more joints in future would have to continue to be skinned on the CPU, at least till we implement GLSL.
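
The register budget described in this note can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above (96 guaranteed registers per parameter set, environment registers 0-20 and 4 local registers already reserved, 102 matrices of 4 registers each in BFG, 68 joints in TDM's most complex AI); the constant and function names are made up for the example. Counting registers 21 to 95 inclusive gives 75 usable slots per set.

```cpp
#include <algorithm>

// Figures quoted in the note; names are illustrative only.
constexpr int kGuaranteedRegsPerSet = 96;  // minimum an ARB implementation must support
constexpr int kReservedEnvRegs = 21;       // environment registers 0-20 (light interactions)
constexpr int kReservedLocalRegs = 4;      // local registers used by "VertexParm"
constexpr int kLargestTdmSkeleton = 68;    // joints used for skinning by the most complex AI

// One register per joint in each set (quaternion in one set, position in the
// other), so the set with less free space decides the joint limit.
constexpr int maxJoints(int regsPerSet, int reservedEnv, int reservedLocal) {
    return std::min(regsPerSet - reservedEnv, regsPerSet - reservedLocal);
}

// Registers needed by a BFG-style palette of 4-register matrices.
constexpr int bfgRegisters(int joints) {
    return joints * 4;
}

static_assert(maxJoints(kGuaranteedRegsPerSet, kReservedEnvRegs, kReservedLocalRegs) == 75,
              "registers 21..95 of each set are free");
static_assert(kLargestTdmSkeleton <= 75, "current AI fit within the ARB budget");
static_assert(bfgRegisters(102) == 408, "BFG's joint palette size in registers");
```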

BFG packs the vertex->joint mappings and weights into the original mesh's "color" settings. I'll have to check whether we use vertex coloring for any light interactions. If we do, then we can't do that. Ideally we'd find somewhere else to put them so we maintain maximum flexibility.

SteveL

03.01.2015 12:38

reporter   ~0007296

Last edited: 03.01.2015 16:05

Fortunately, there's not much code that creates VBOs, and even less that's used by MD5 meshes. It all sits in idVertexCache, ::Alloc() and ::AllocFrameTemp(). This is a clean design and it shouldn't be too hard to work with.

::AllocFrameTemp() deals with the memory meant to be used for single-frame meshes. Two buffers of 2MB each are maintained, so that one can be written to while the other is being read from ("double-buffered"). In theory, anyway. We don't need it while we are using a single-threaded architecture, and it isn't used much: only by gui drawing, weather patches, skybox texgens, and some decals.
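
The double-buffering idea can be sketched as a pair of bump allocators that swap at the end of each frame. This is a bookkeeping-only model under assumed names (`FrameTempAllocator`, `alloc`, `endFrame`); it tracks offsets rather than owning GPU memory and is not how idVertexCache is actually written.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical model of a double-buffered per-frame cache.
class FrameTempAllocator {
public:
    struct Allocation {
        int buffer;          // which of the two buffers (0 or 1)
        std::size_t offset;  // byte offset inside that buffer
    };

    explicit FrameTempAllocator(std::size_t bytesPerBuffer)
        : capacity_(bytesPerBuffer) {
        assert(bytesPerBuffer > 0);
    }

    // Bump-allocate from the current frame's buffer. An empty result means
    // the buffer is full and the caller has to fall back to another path.
    std::optional<Allocation> alloc(std::size_t bytes) {
        if (bytes > capacity_ - used_) {
            return std::nullopt;
        }
        const Allocation a{current_, used_};
        used_ += bytes;
        return a;
    }

    // Swap buffers at the end of a frame, so the buffer just written can
    // still be read while the other one is being filled.
    void endFrame() {
        current_ ^= 1;
        used_ = 0;
    }

private:
    std::size_t capacity_;
    std::size_t used_ = 0;
    int current_ = 0;
};
```

A caller would construct it with the per-buffer size (2MB in the description above), call `alloc` for each single-frame mesh, and call `endFrame` once per frame.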

::Alloc() is the one used by md5 meshes (as well as the rest of the world's solid objects).

Only 2 functions pass lists of mesh vertices to the GPU (they use "Alloc"):
- R_CreateAmbientCache -- which does visible surfaces
- R_CreateVertexProgramShadowCache -- which does shadow volumes

For a static model, this only needs doing once at map start. For md5 meshes, it's done every frame while they are in view.
Those functions pass the entire mesh to the GPU. Individual lights will hit subsections of those meshes.

Only 2 functions pass indexes to the GPU. The indexes are what specifies subsections of the meshes loaded above, for individual light interactions:
- idInteraction::AddActiveInteraction -- which deals with lit materials and shadow volumes
- R_AddAmbientDrawsurfs -- which deals with unlit materials (or unlit stages of a material decl)

A given light will probably hit less than half the tris belonging to an AI (because the other half will be facing away). So the interaction between that light and the AI mesh will need a subset of the vertices that got uploaded by R_CreateAmbientCache above. That subset is identified by an index, and the index needs calculating and uploading every frame, as the subset of tris changes frame by frame.

Likewise the shadowcasting silhouette changes frame by frame. That gets uploaded as an index to the shadow volume verts too.
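
The per-light index list described above amounts to a facing test per triangle. The sketch below is a simplified stand-in, not the engine's code: the names (`Vec3`, `lightFacingIndexes`) are hypothetical, a point light is assumed, and counter-clockwise winding is assumed to mark the front face, which may not match the engine's convention. It also shows why the skinned vertex positions are needed on the CPU to build the list.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

inline Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Returns the index triples of the triangles whose front face points toward
// the light. verts must hold the already-skinned positions for this frame.
inline std::vector<int> lightFacingIndexes(const std::vector<Vec3>& verts,
                                           const std::vector<int>& indexes,
                                           const Vec3& lightOrigin) {
    assert(indexes.size() % 3 == 0);
    std::vector<int> out;
    for (std::size_t i = 0; i + 2 < indexes.size(); i += 3) {
        const Vec3& a = verts[static_cast<std::size_t>(indexes[i])];
        const Vec3& b = verts[static_cast<std::size_t>(indexes[i + 1])];
        const Vec3& c = verts[static_cast<std::size_t>(indexes[i + 2])];
        const Vec3 normal = cross(sub(b, a), sub(c, a));  // unnormalized face normal
        if (dot(normal, sub(lightOrigin, a)) > 0.0f) {
            out.push_back(indexes[i]);
            out.push_back(indexes[i + 1]);
            out.push_back(indexes[i + 2]);
        }
    }
    return out;
}
```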

Typical figures in bytes for the allocations in 1 frame, for one AI hit by 1 shadowcasting light and one ambient:
R_CreateAmbientCache 273120 (Verts)
R_CreateVertexProgramShadowCache 74176 (Verts)
idInteraction::AddActiveInteraction 190560 (Index)
R_AddAmbientDrawsurfs 66312 (Index)

The aim of GPU skinning is to save the Verts allocations, or just over 57% of the total.
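
The 57% figure follows directly from the four allocation sizes listed above. A small compile-time check, with constant names invented for the example:

```cpp
// Per-frame allocation sizes in bytes, copied from the list above.
constexpr long kAmbientCacheVerts = 273120;   // R_CreateAmbientCache
constexpr long kShadowCacheVerts = 74176;     // R_CreateVertexProgramShadowCache
constexpr long kInteractionIndexes = 190560;  // idInteraction::AddActiveInteraction
constexpr long kAmbientIndexes = 66312;       // R_AddAmbientDrawsurfs

constexpr long kVertBytes = kAmbientCacheVerts + kShadowCacheVerts;
constexpr long kIndexBytes = kInteractionIndexes + kAmbientIndexes;
constexpr long kTotalBytes = kVertBytes + kIndexBytes;

// Fraction of the per-frame upload that consists of vertex data.
constexpr double vertShare() {
    return static_cast<double>(kVertBytes) / static_cast<double>(kTotalBytes);
}

static_assert(kVertBytes == 347296, "vertex bytes per frame");
static_assert(kTotalBytes == 604168, "total bytes per frame");
static_assert(vertShare() > 0.57 && vertShare() < 0.58, "just over 57% is vertex data");
```

The remaining share, a little under 43%, is the index data that GPU skinning alone would not remove.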

We might be able to cut down on indexes too. The duplicate data mentioned in the original report is duplicate index data coming from idInteraction::AddActiveInteraction, but that's probably a separate task.

Also, I need to check the BFG code again to see how it handles the indexes for interactions. Maybe (hopefully) it's doing something different. To be able to calculate the indexes on the CPU as is done now, it would need to skin the visible meshes on the CPU too, else it wouldn't know which tris face the light and which don't.

SteveL

03.01.2015 12:41

reporter   ~0007297

Last edited: 03.01.2015 12:50

Moving objects like func_rotators and func_movers don't upload new vertex data every frame. The verts are static because they are measured in relation to the model's origin and axis, and those measurements don't change when the model moves. It's just the model origin and axis that change. Instead of uploading new verts, they just create new indexes for light interactions, to say which bits of the model are hit by lights.

Likewise a moving light doesn't cause static models to re-upload their vertex data, but again the indexes are invalidated and get recreated and re-uploaded.

For static models where the lights are not moving, none of the above gets recreated or re-uploaded. Both vertex and index vbos are re-used frame to frame.

SteveL

14.02.2015 16:04

reporter   ~0007420

Last edited: 14.02.2015 16:05

BFG only did half the job with vertex skinning, and we shouldn't stop where it left off. It saves about half the traffic between the CPU and GPU (the vert locations for animated meshes), but it still relies on the CPU calculating and uploading the other half each frame: the indexes (lists) of triangles that are hit by a light each frame, and the shadowcasting silhouette of each model.

BFG is still an OpenGL2 renderer. It uses GLSL but it doesn't take advantage of any of the OpenGL3 facilities that could cut out the rest of the traffic. We could try using a geometry shader to discard unlit triangles, for example, and to work out the shadow silhouette.

Doing this work on the GPU instead of the CPU is essential because we want to use Instancing for drawing multiple copies of a model in one draw call, which means we'll want to use the same lists of tris for every copy of a model. No more having the CPU work out what tris get hit by a light, which makes it maintain and upload a different Index for each copy of a model.

One example that we do want to follow from BFG: it adopts a "stateless" system where none of this info is cached between frames, or even between draw passes in a single frame. It's all re-calculated every time it needs to be used. That's a good thing both for code simplicity and for performance. Since the engine was designed, number-crunching has become relatively much cheaper than storing and retrieving blocks of data.

nbohr1more

17.09.2017 18:57

developer   ~0009274

Today: 9/17/2017
Code freeze: 10/4/2017

Possibility of completion in 2.06? 0%

Moving to 2.07

Bikerdude

26.12.2020 17:21

reporter   ~0013273

Is this track still relevant given all the new changes since 2.08?

Bikerdude

16.04.2021 17:31

reporter   ~0013865

@cabalistic, thanks for picking this up fella.

nbohr1more

25.04.2021 16:28

developer   ~0013911

Last edited: 05.03.2024 06:33

Id Tech 4 "Iced Tech Engine" has implemented this:

https://github.com/jmarshall23/IcedTech

Though the divergences in this engine may be too extreme to be of use compared to porting BFG changes (etc).

Edit 2024-03:

icecoldduke (the developer) has offered assistance with porting the GPU skinning portion.

nbohr1more

29.06.2021 23:30

developer   ~0014137

Hah, Robert had a fork of vanilla Doom 3 with GPU skinning hidden in his repo for a while:

https://github.com/RobertBeckebans/TEKUUM-D3

Issue History

Date Modified Username Field Change
22.12.2014 02:33 SteveL New Issue
22.12.2014 02:38 SteveL Assigned To => SteveL
22.12.2014 02:38 SteveL Status new => assigned
22.12.2014 02:38 SteveL Relationship added child of 0003684
22.12.2014 03:06 SteveL Description Updated
01.01.2015 20:51 SteveL Note Added: 0007291
01.01.2015 20:54 SteveL Note Edited: 0007291
01.01.2015 21:11 SteveL Note Added: 0007292
01.01.2015 21:14 SteveL Note Edited: 0007292
01.01.2015 21:19 SteveL Note Edited: 0007292
03.01.2015 12:38 SteveL Note Added: 0007296
03.01.2015 12:41 SteveL Note Added: 0007297
03.01.2015 12:41 SteveL Note Edited: 0007297
03.01.2015 12:46 SteveL Note Edited: 0007296
03.01.2015 12:50 SteveL Note Edited: 0007297
03.01.2015 16:05 SteveL Note Edited: 0007296
14.02.2015 16:04 SteveL Note Added: 0007420
14.02.2015 16:05 SteveL Note Edited: 0007420
30.12.2015 15:28 SteveL Target Version TDM 2.04 => TDM 2.05
22.11.2016 20:24 nbohr1more Product Version => SVN
22.11.2016 20:24 nbohr1more Target Version TDM 2.05 => TDM 2.06
15.02.2017 04:36 grayman Assigned To SteveL =>
15.02.2017 04:36 grayman Status assigned => new
17.09.2017 18:57 nbohr1more Note Added: 0009274
17.09.2017 18:58 nbohr1more Target Version TDM 2.06 => TDM 2.07
09.06.2018 18:38 nbohr1more Target Version TDM 2.07 =>
26.12.2020 17:21 Bikerdude Note Added: 0013273
16.04.2021 15:22 nbohr1more Assigned To => cabalistic
16.04.2021 15:22 nbohr1more Status new => assigned
16.04.2021 17:31 Bikerdude Note Added: 0013865
25.04.2021 16:28 nbohr1more Note Added: 0013911
29.06.2021 23:30 nbohr1more Note Added: 0014137
05.03.2024 06:33 nbohr1more Note Edited: 0013911