View Issue Details

ID: 0006650
Project: The Dark Mod
Category: Coding
View Status: public
Last Update: 01.01.2026 12:43
Reporter: stgatilov
Assigned To: stgatilov
Priority: normal
Severity: feature
Reproducibility: have not tried
Status: feedback
Resolution: open
Product Version: TDM 2.13
Target Version: TDM 2.14
Summary: 0006650: Improvements of r_useParallelAddModels
Description: It seems that the parallelization of R_AddModelSurfaces has some problems with:

1) Overhead.
  Noticed in Tracy for the lightgem subview when there are MANY entities.
  For instance, this happens in the starting area of Displacement, where the parallelSky light covers a lot of entities.
  There are a lot of renderEntities processed, but each of them takes little time.
  
2) Load balancing.
  This happens sometimes when some long job accidentally runs last, so all the other workers are idle and wait for it.
  For instance, it happens in the starting area of the mission Scroll of Remembrance.
Tags: No tags attached.

Relationships

related to 0006503 (resolved, stgatilov): Jobs system improvements

Activities

stgatilov

01.01.2026 12:24

administrator   ~0017103

I somewhat improved the system in:
  r11078 idParallelJobManager::GetNumProcessingUnits now returns the actual number of worker threads to run a joblist on.
  r11079 Run R_AddSingleModel with adaptive chunking based on last-frame timing history.

To improve scheduling of R_AddSingleModel, some timing information is needed.
Now the execution time of each R_AddSingleModel call is measured per view and stored in idRenderEntityLocal.
Here are some changes that reduce the overhead of various timing routines:
  r11075 GetClockTicks no longer runs the CPUID instruction to serialize the instruction stream.
  r11076 Replaced the return type of GetClockTicks & ClockTicksPerSecond from double to uint64.
  r11077 Changed the timer in the job system: Sys_GetClockTicks instead of Sys_GetTimeMicroseconds.
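To illustrate the idea behind these changes, here is a minimal sketch of an integer-ticks timer and a per-job measurement helper. The names (Sys_GetClockTicks_Sketch, MeasureJob, TimedRun) are hypothetical; the engine reads the raw x86 TSC without a serializing CPUID, but std::chrono::steady_clock is used below so the sketch compiles everywhere.

```cpp
#include <cstdint>
#include <chrono>

// Portable stand-in for the engine's Sys_GetClockTicks: per r11076,
// ticks are returned as uint64 (not double), so callers do integer math.
static inline uint64_t Sys_GetClockTicks_Sketch() {
    using namespace std::chrono;
    return (uint64_t)duration_cast<nanoseconds>(
        steady_clock::now().time_since_epoch()).count();
}

// Cost of one job, stored as integer ticks (suitable for storing
// per-entity, as the real code does in idRenderEntityLocal).
struct TimedRun {
    uint64_t ticks;
};

// Run a job and record how long it took in ticks.
template <class F>
TimedRun MeasureJob(F &&job) {
    uint64_t t0 = Sys_GetClockTicks_Sketch();
    job();
    uint64_t t1 = Sys_GetClockTicks_Sketch();
    return { t1 - t0 };
}
```

The point of dropping the CPUID serialization is that a slightly out-of-order-sensitive timestamp is perfectly fine for coarse per-job profiling, while a serializing read is far more expensive when called thousands of times per frame.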

The R_AddSingleModel jobs are first sorted by their execution time: slower jobs should run first, faster jobs later.
This reordering alone improves load balancing, including the Scroll of Remembrance case, where one extra-tough entity was normally processed last.
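This is essentially the longest-processing-time-first heuristic. A minimal sketch, with hypothetical names (ModelJob, SortJobsSlowestFirst) standing in for the real structures:

```cpp
#include <algorithm>
#include <vector>
#include <cstdint>

// Hypothetical job record; the real code keeps per-view timing in
// idRenderEntityLocal, but the ordering idea is the same.
struct ModelJob {
    int entityIndex;
    uint64_t lastFrameTicks;   // measured cost of R_AddSingleModel last frame
};

// Issue the slowest jobs first, so a single heavy entity cannot end up
// running alone at the tail while every other worker thread sleeps.
void SortJobsSlowestFirst(std::vector<ModelJob> &jobs) {
    std::sort(jobs.begin(), jobs.end(),
              [](const ModelJob &a, const ModelJob &b) {
                  return a.lastFrameTicks > b.lastFrameTicks;
              });
}
```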

To reduce overhead, we then group the jobs into intervals/chunks; each chunk is executed as one job in the job system.
We normally aim for chunks of size r_parallelAddModelsChunk = 0.1 ms, but the chunks near the end are subdivided more, so that workers are more likely to finish simultaneously.
This is a logically rather complicated algorithm; you can see it in JobChunks.cpp.
And of course it uses the timing information as well.
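A simplified sketch of the chunking idea (the real algorithm lives in JobChunks.cpp and is more elaborate; ChunkJobs and its budget formula below are illustrative assumptions): pack jobs greedily into chunks of roughly the target cost, but cap the budget by a fraction of the remaining work, so tail chunks get progressively finer.

```cpp
#include <algorithm>
#include <vector>
#include <cstdint>

// Greedily pack jobs (given their predicted costs) into chunks.
// Early chunks fill up to targetCost; as the remaining work shrinks,
// the budget drops to roughly remaining/numWorkers, producing small
// tail chunks so workers finish almost simultaneously.
std::vector<std::vector<int>> ChunkJobs(const std::vector<uint64_t> &costs,
                                        uint64_t targetCost, int numWorkers) {
    uint64_t remaining = 0;
    for (uint64_t c : costs)
        remaining += c;

    std::vector<std::vector<int>> chunks;
    std::vector<int> current;
    uint64_t currentCost = 0;
    for (size_t i = 0; i < costs.size(); i++) {
        // near the end, the budget is limited by the work left per worker
        uint64_t budget = std::min(targetCost, remaining / numWorkers + 1);
        current.push_back((int)i);
        currentCost += costs[i];
        remaining -= costs[i];
        if (currentCost >= budget) {
            chunks.push_back(current);
            current.clear();
            currentCost = 0;
        }
    }
    if (!current.empty())
        chunks.push_back(current);
    return chunks;
}
```

For example, with 20 equal-cost jobs, a target of 5 and 4 workers, this sketch emits chunk sizes 5, 4, 3, 2, 2, 2, 1, 1: large chunks first for low overhead, small chunks last for balance.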
stgatilov

01.01.2026 12:25

administrator   ~0017104

The new mode is active under r_useParallelAddModels = 2, which is now the default:
  r17390 Frontend Acceleration ON now corresponds to r_useParallelAddModels = 2.

I expect the old mode to be useful only for debugging and comparison purposes.
It can still be enabled by setting r_useParallelAddModels = 1.
stgatilov

01.01.2026 12:35

administrator   ~0017105

Last edited: 01.01.2026 12:36

The new mode improves FPS at the start of Scroll of Remembrance from 160 to 190.

I attached Tracy screenshots of a frame with old mode and the new mode.
As you can see, in the old mode there was a huge "_area0" model which took 2 ms to process, and it also happened to run last for some reason.
That's why the whole parallel-for took:
  2 ms to process all the other jobs on 5 threads
  1.3 ms to process _area0 on 1 thread, while the other 4 threads were sleeping
Now this model starts first, as the toughest one, and runs in parallel with the other jobs.
Altogether it takes about 2.5 ms now (instead of 3.3 ms).
6650_ScrollOfRemembrance_old.png (239,600 bytes)
6650_ScrollOfRemembrance_new.png (225,795 bytes)
stgatilov

01.01.2026 12:42

administrator   ~0017106

Here is how chunking works on lightgem subview in Displacement.

The whole set of jobs takes 0.25 ms.
You can see pretty large chunks of jobs at the very beginning (recall that 0.1 ms per chunk is the target).
But near the end of execution we see progressively smaller chunks, which helps the worker threads finish almost at the same time.

Issue History

Date Modified Username Field Change
01.01.2026 11:30 stgatilov New Issue
01.01.2026 11:30 stgatilov Status new => assigned
01.01.2026 11:30 stgatilov Assigned To => stgatilov
01.01.2026 12:16 stgatilov Relationship added related to 0006503
01.01.2026 12:24 stgatilov Note Added: 0017103
01.01.2026 12:25 stgatilov Note Added: 0017104
01.01.2026 12:35 stgatilov Note Added: 0017105
01.01.2026 12:35 stgatilov File Added: 6650_ScrollOfRemembrance_old.png
01.01.2026 12:35 stgatilov File Added: 6650_ScrollOfRemembrance_new.png
01.01.2026 12:36 stgatilov Note Edited: 0017105
01.01.2026 12:42 stgatilov Note Added: 0017106
01.01.2026 12:42 stgatilov File Added: 6650_Displacement_chunking.png
01.01.2026 12:43 stgatilov Status assigned => feedback