View Issue Details

ID: 0006650
Project: The Dark Mod
Category: Coding
View Status: public
Last Update: 01.01.2026 12:43
Reporter: stgatilov
Assigned To: stgatilov
Priority: normal
Severity: feature
Reproducibility: have not tried
Status: feedback
Resolution: open
Product Version: TDM 2.13
Target Version: TDM 2.14
Summary: 0006650: Improvements of r_useParallelAddModels
Description: It seems that the parallelization of R_AddModelSurfaces has some problems with:

1) Overhead.
  Noticed in Tracy for the lightgem subview when there are MANY entities.
  For instance, this happens in the starting area of Displacement, where the parallelSky light covers a lot of entities.
  There are a lot of renderEntities processed, but each of them takes little time.
  
2) Load balancing.
  This happens sometimes when some long job accidentally runs last, so all the other workers are idle and wait for it.
  For instance, it happens in the starting area of the mission Scroll of Remembrance.
Tags: No tags attached.

Relationships

related to 0006503 (resolved, stgatilov): Jobs system improvements

Activities

stgatilov

01.01.2026 12:24

administrator   ~0017103

I somewhat improved the system in:
  r11078 idParallelJobManager::GetNumProcessingUnits now returns the actual number of worker threads to run a joblist on.
  r11079 Run R_AddSingleModel with adaptive chunking based on last-frame timing history.

To improve scheduling of R_AddSingleModel, some timing information is needed.
Now the execution time of each R_AddSingleModel call is measured per view and stored in idRenderEntityLocal.
Here are some changes that reduce the overhead of various timing routines:
  r11075 GetClockTicks no longer runs the CPUID instruction to serialize the instruction stream.
  r11076 Replaced the return type of GetClockTicks & ClockTicksPerSecond from double to uint64.
  r11077 Changed the timer in the job system: Sys_GetClockTicks instead of Sys_GetTimeMicroseconds.
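To illustrate the idea behind these changes, here is a minimal sketch of an integer-ticks timer and a per-job measurement helper. The names (Sys_GetClockTicks_Sketch, MeasureJob, TimedRun) are hypothetical; the engine reads the raw x86 TSC without a serializing CPUID, but std::chrono::steady_clock is used below so the sketch compiles everywhere.

```cpp
#include <cstdint>
#include <chrono>

// Portable stand-in for the engine's Sys_GetClockTicks: per r11076,
// ticks are returned as uint64 (not double), so callers do integer math.
static inline uint64_t Sys_GetClockTicks_Sketch() {
    using namespace std::chrono;
    return (uint64_t)duration_cast<nanoseconds>(
        steady_clock::now().time_since_epoch()).count();
}

// Cost of one job, stored as integer ticks (suitable for storing
// per-entity, as the real code does in idRenderEntityLocal).
struct TimedRun {
    uint64_t ticks;
};

// Run a job and record how long it took in ticks.
template <class F>
TimedRun MeasureJob(F &&job) {
    uint64_t t0 = Sys_GetClockTicks_Sketch();
    job();
    uint64_t t1 = Sys_GetClockTicks_Sketch();
    return { t1 - t0 };
}
```

The point of dropping the CPUID serialization is that a slightly out-of-order-sensitive timestamp is perfectly fine for coarse per-job profiling, while a serializing read is far more expensive when called thousands of times per frame.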

The R_AddSingleModel jobs are first sorted by their execution time: slower jobs should run first, faster jobs later.
This reordering alone improves load balancing, including the Scroll of Remembrance case, where one extra-tough entity was normally processed last.
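This is essentially the longest-processing-time-first heuristic. A minimal sketch, with hypothetical names (ModelJob, SortJobsSlowestFirst) standing in for the real structures:

```cpp
#include <algorithm>
#include <vector>
#include <cstdint>

// Hypothetical job record; the real code keeps per-view timing in
// idRenderEntityLocal, but the ordering idea is the same.
struct ModelJob {
    int entityIndex;
    uint64_t lastFrameTicks;   // measured cost of R_AddSingleModel last frame
};

// Issue the slowest jobs first, so a single heavy entity cannot end up
// running alone at the tail while every other worker thread sleeps.
void SortJobsSlowestFirst(std::vector<ModelJob> &jobs) {
    std::sort(jobs.begin(), jobs.end(),
              [](const ModelJob &a, const ModelJob &b) {
                  return a.lastFrameTicks > b.lastFrameTicks;
              });
}
```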

To reduce overhead, we then group the jobs into intervals/chunks; each chunk is executed as one job in the job system.
We normally aim for chunks of size r_parallelAddModelsChunk = 0.1 ms, but the chunks near the end are subdivided more, so that workers are more likely to finish simultaneously.
This is a logically rather complicated algorithm; you can see it in JobChunks.cpp.
And of course it uses the timing information as well.
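A simplified sketch of the chunking idea (the real algorithm lives in JobChunks.cpp and is more elaborate; ChunkJobs and its budget formula below are illustrative assumptions): pack jobs greedily into chunks of roughly the target cost, but cap the budget by a fraction of the remaining work, so tail chunks get progressively finer.

```cpp
#include <algorithm>
#include <vector>
#include <cstdint>

// Greedily pack jobs (given their predicted costs) into chunks.
// Early chunks fill up to targetCost; as the remaining work shrinks,
// the budget drops to roughly remaining/numWorkers, producing small
// tail chunks so workers finish almost simultaneously.
std::vector<std::vector<int>> ChunkJobs(const std::vector<uint64_t> &costs,
                                        uint64_t targetCost, int numWorkers) {
    uint64_t remaining = 0;
    for (uint64_t c : costs)
        remaining += c;

    std::vector<std::vector<int>> chunks;
    std::vector<int> current;
    uint64_t currentCost = 0;
    for (size_t i = 0; i < costs.size(); i++) {
        // near the end, the budget is limited by the work left per worker
        uint64_t budget = std::min(targetCost, remaining / numWorkers + 1);
        current.push_back((int)i);
        currentCost += costs[i];
        remaining -= costs[i];
        if (currentCost >= budget) {
            chunks.push_back(current);
            current.clear();
            currentCost = 0;
        }
    }
    if (!current.empty())
        chunks.push_back(current);
    return chunks;
}
```

For example, with 20 equal-cost jobs, a target of 5 and 4 workers, this sketch emits chunk sizes 5, 4, 3, 2, 2, 2, 1, 1: large chunks first for low overhead, small chunks last for balance.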
stgatilov

01.01.2026 12:25

administrator   ~0017104

The new mode is active under r_useParallelAddModels = 2, which is now the default:
  r17390 Frontend Acceleration ON now corresponds to r_useParallelAddModels = 2.

I expect the old mode to be useful only for debugging and comparison purposes.
It can still be enabled by setting r_useParallelAddModels = 1.
stgatilov

01.01.2026 12:35

administrator   ~0017105

Last edited: 01.01.2026 12:36

The new mode improves FPS at the start of Scroll of Remembrance from 160 to 190.

I attached Tracy screenshots of a frame with old mode and the new mode.
As you can see, in the old mode there was a huge "_area0" model which took 2 ms to process, and it also happened to run last for some reason.
That's why the whole parallel-for took:
  2 ms to process all the other jobs on 5 threads
  1.3 ms to process _area0 on 1 thread, while the other 4 threads were sleeping
Now this model starts first, as the toughest one, and runs in parallel with the other jobs.
Altogether it takes about 2.5 ms now (instead of 3.3 ms).
6650_ScrollOfRemembrance_old.png (239,600 bytes)
6650_ScrollOfRemembrance_new.png (225,795 bytes)
stgatilov

01.01.2026 12:42

administrator   ~0017106

Here is how chunking works on lightgem subview in Displacement.

The whole set of jobs takes 0.25 ms.
You can see pretty large chunks of jobs at the very beginning (recall that 0.1 ms per chunk is the target).
But near the end of execution we see progressively smaller chunks, which helps the worker threads finish almost at the same time.

Issue History

Date Modified Username Field Change
01.01.2026 11:30 stgatilov New Issue
01.01.2026 11:30 stgatilov Status new => assigned
01.01.2026 11:30 stgatilov Assigned To => stgatilov
01.01.2026 12:16 stgatilov Relationship added related to 0006503
01.01.2026 12:24 stgatilov Note Added: 0017103
01.01.2026 12:25 stgatilov Note Added: 0017104
01.01.2026 12:35 stgatilov Note Added: 0017105
01.01.2026 12:35 stgatilov File Added: 6650_ScrollOfRemembrance_old.png
01.01.2026 12:35 stgatilov File Added: 6650_ScrollOfRemembrance_new.png
01.01.2026 12:36 stgatilov Note Edited: 0017105
01.01.2026 12:42 stgatilov Note Added: 0017106
01.01.2026 12:42 stgatilov File Added: 6650_Displacement_chunking.png
01.01.2026 12:43 stgatilov Status assigned => feedback