View Issue Details

IDProjectCategoryView StatusLast Update
0005562The Dark ModCodingpublic24.07.2021 10:37
Reporterstgatilov Assigned Tostgatilov  
PrioritynormalSeveritynormalReproducibilityN/A
Status resolvedResolutionfixed 
Product VersionTDM 2.09 
Target VersionTDM 2.10Fixed in VersionTDM 2.10 
Summary0005562: dmap: various optimizations
DescriptionCreated this issue to track numerous generic optimizations in dmap.
TagsNo tags attached.

Activities

stgatilov

stgatilov

11.03.2021 03:55

administrator   ~0013776

Last edited: 11.03.2021 03:56

This is perhaps the weirdest thing I found:
  r9190 Ignore func_statics' vertices for global T-junctions removal.
Basically, all vertices of all func_static-s were inserted into vertex hashfor global T-junction removal.
Which means that if and edge of a brush(patch) passes through vertex of func_static, then it was split there.
In my opinion this breaks the whole division between entities/models and brushes: models should not participate in compilation and should not slow it down!

This "fix" can be disabled by cvar dmap_dontSplitWithFuncStaticVertices.

For example: on Painter's Wife this weird vertex insertion took 35% of the whole dmap time!
stgatilov

stgatilov

11.03.2021 04:42

administrator   ~0013777

Last edited: 11.03.2021 05:53

The next pack of commits:
  r9201 Preliminary change: added PlaneSet.cpp and moved implementation code there.
  r9202 Fixed performance issues in planes hashing in dmap.

While Fabien Sanglard called idPlaneSet "a really neat and fast Plane Hashing System", it had terrible drawbacks.
The core problem is that hash cell is computed from D(distance) component only (as trunc(D/8)), normal vector is ignored.
This does not sound like a terrible problem when compiling world geometry only, but unfortunately the planes set is global in dmap and contains all planes from ALL entities.
Which means that brush-based func_static-s are all added into the world at the origin, producing tremendous number of planes with near-zero D.
The paramount example is NHAT politics map with its famous stalagmyte-filled cave (all brush-based func_static-s), which results in 65% of full dmap time wasted on planes hashing.

I changed it to typical hash cells in 4D space, so that normal vector is taken into account too.
The size of cell is 1/32 by normal component and 1/4 by distance component.
The cells are stored in hash table with some typical linear hash function modulo prime.
Of course, extending cells to 4D means having to look into 16 cells in worst case, instead of only 2 cells in the original code.
But it is much faster overall.

The second issue was the size of hash table.
idHashIndex has fixed number of cells, it cannot grow up due to design limitations.
For planes set, it was using default number of cells = 1024.
Obviously, large maps have much more planes.
I have added some rough estimate of map size and made hash table size proportional to it.

New code is under cvar dmap_planeHashing.
stgatilov

stgatilov

11.03.2021 05:52

administrator   ~0013778

The next commits are about quadratic handling of lists:
  r9203 Optimized AddTriListToArea in dmap: spreading triangles over planar groups (cvar dmap_fasterPutPrimitives)
  r9204 Optimize concatenation of ambient triangle groups in WriteOutputSurfaces in dmap
Nothing interesting here, just better asymptotic time.
The main beneficiary here is again NHAT politics map, which has insane number of planar groups / triangles due to those awful stalagmites.

The next pack is about idVertexSet --- structure for hashing 3D points:
  r9205 Preliminary changes in idVectorSet.
  r9206 Faster search for equal vertices in output triangulation in dmap (cvar dmap_fasterShareMapTriVerts)
  r9207 Optimized portals melting in dmap-aas (cvar dmap_fasterAasMeltPortals)

Fixed rather stupid bug which allowed to use idHashIndex with non-power-of-two size.
idHashIndex does not complain, but makes most of hash table cells inactive, which may be very slow of course.

The AAS portal melting uses idVertexSet for hashing portal vertices.
While there are typically very few such vertices per portal, a brand new idVertexSet with 65536 cells was allocated for every portal.
Now this set only has 64 cells, and its memory is reused between portals.

Another fix here is faster merging of equal vertices when outputting mesh from dmap.
That's rather creative usage of idVertexSet, I must admit...
stgatilov

stgatilov

11.03.2021 05:56

administrator   ~0013779

Here is some memory leak in AAS code, which has been there all the time:
  r9208 Fixed ancient memory leak in idBrushList::Chop.
No idea how much memory it could leak, but potential is pretty high =)

The next commits accelerate brushes merging in AAS:
  r9209 Preliminary changes in dmap AAS code.
  r9210 Faster brushes merging in dmap-aas.
Basically, when AAS starts, it tries to merge brushes with each other if they 1) don't overlap 2) are easily mergeable.
The search for such pairs was quadratic, I simply optimized it with hash table --- under cvar dmap_fasterAasBrushListMerge.
stgatilov

stgatilov

11.03.2021 05:59

administrator   ~0013780

The next commit is about memory management:
  r9211 Reallocate less memory in dmap-aas when splitting/subtracting brushes.
This is basically about move semantics.

When BSP tree is built, you have a list of brushes, which you recursively subdivide by splitting planes into subtree.
The typical operation here is "split list of brushes by this plane", in which most of the brushes are not split by plane, so they either go to the left of to the right.
The problem here is that the original code was non-destructive: the input list of brushes was not destroyed/modified, so all such non-split brushes were fully copied.
This caused tons of unnecessary motion in memory heap.
I have added destructive versions for splitting a brush and for splitting a list of brushes: they simple "move" existing memory buffers from input to output as much as possible.
stgatilov

stgatilov

11.03.2021 06:03

administrator   ~0013781

Yet another optimization of bad asymptotics:
  r9212 Optimized saving of AAS file.
But this one is due to idList weirdness: author called setNum(N-1) instead of SetNum(N-1, false) =)

Disabled spammy console output:
  r9213 Disabled spam about "reachable areas in cluster" in dmap-aas.
Interestingly, it shaves a 10 seconds off dmap time for Painter's Wife!
I wonder how much faster dmap would be if it did not render anything at all =)

Quadratic search over area-area pairs:
  r9214 Optimized computation of "water-jump" reachabilities in dmap-aas.
It is funny that all typical reachabilities are computed very fast,
but this "jump from ledge into water" exotic reachabilities take a lot of time to compute, while TDM does not even use them.
Fixed by using BSP tree for areas pruning, under cvar dmap_fasterAasWaterJumpReachability
stgatilov

stgatilov

11.03.2021 06:13

administrator   ~0013782

This one is very specific and complicated optimization:
  r9215 Optimized brushes subtraction/chopping in dmap-aas.

Both dmap and AAS build BSP tree from all brushes.
But BSP trees in AAS are much worse in this regard, because:
  1) dmap builds one BSP tree, while AAS builds one tree per every AAS size (typically 2-4).
  2) dmap uses brushes as provided, but AAS inflates brushes by AAS size.
The point 2 is quite severe.

Most of the brushes on any map don't overlap, although they often touch each other exaclty.
This situation is perfect for BSP construction: the plane selection heuristic works very well, and brush splits are very rare.
When AAS offsets all brushes, they start overlapping each other: every pair which touched now overlap.
As the result, BSP construction becomes damn slower!

The AAS tool improves this situation by preprocessing brushes in special way.
For each pair of compatible brushes A, B, it tries to replace them with brushes A\B, B (where A\B is CSG subtraction).
To avoid creating too much fragments due to subtraction, such replacement is allowed only if A\B consists of at most two brushes.
Such preprocessing removes most of overlaps, eradicating some brushes completely --- as the result, the subsequent BSP construction becomes several times faster.

The problem is that the preprocessing itself is quite slow =)
Simply disabling it does not help: in fact, it makes BSP construction so slow, that the overall performance becomes much worse.

The slowness partially stems from the fact that chopping code computes A\B and B\A for every two brushes, and then looks whether they can be used for replacement.
So if e.g. A\B contains six brushes, all of them would be allocated and computed, only to be discarded afterwards.
And that's the place where I put a bit of optimization.
In some cases it is easy to deduce that A\B would surely have more than two brushes, so I do this analysis and omit doing subtraction if analysis allows that.

This optimization can be disabled by cvar pruneAasBrushesChopping.
stgatilov

stgatilov

11.03.2021 06:18

administrator   ~0013783

This commit is not about this issue exactly:
  r9216 Added meta-cvar dmap_compatibility for compiling old maps.
It is about providing users some simple way to run dmap in the mode compatible with previous versions.
There are individual cvars, which can be toggled off, but now there are too many of them already.
So I added one more cvar =)

If you set dmap_compatibility to "208" or "209", then most of the optimizations done for 2.10 will be disabled.
The only reason to do so is troubleshooting: if dmap works properly under "dmap_compatibility 208" but does not work without it, then one of the optimizations for 2.10 can be blamed.

A much more useful value is "207", because it disables visportals/opacity changes/fixes of TDM 2.08.
These changes are pretty major, e.g. Behind Closed Doors has become leaky in 2.08 because of them!
If you want to dmap old FM without fixing visportals and accidental leaks, use this compatibility mode --- hopefully, you won't need to fix too much then.
Dragofer

Dragofer

24.07.2021 10:37

developer   ~0014180

@stgatilov
I've started doing some improvements to my first FM One Step Too Far and some change I made now means that DMAP is very, very slow compared to the version on the TDM mirrors. It's very slow both in 2.09 and 2.10dev for me, so maybe there's still potential to do something about this?

I think it might have been caused by some changes I made to the very large portal_sky/caulk brushes sealing the beach area of the mission against the void. I basically recreated a few of them to get rid of "brush x is unbounded" or "brush x has over 128 windings" type console errors.

It'd be really appreciated if you could look at this because it's wasting a lot of time to have every dmap hang for so long:
https://drive.google.com/file/d/1IusHuWBOdPxWHPedUDbW9FwwebuAMNCf/view?usp=sharing

Issue History

Date Modified Username Field Change
08.03.2021 03:47 stgatilov New Issue
08.03.2021 03:47 stgatilov Status new => assigned
08.03.2021 03:47 stgatilov Assigned To => stgatilov
11.03.2021 03:55 stgatilov Note Added: 0013776
11.03.2021 03:56 stgatilov Note Edited: 0013776
11.03.2021 04:42 stgatilov Note Added: 0013777
11.03.2021 05:52 stgatilov Note Added: 0013778
11.03.2021 05:53 stgatilov Note Edited: 0013777
11.03.2021 05:56 stgatilov Note Added: 0013779
11.03.2021 05:59 stgatilov Note Added: 0013780
11.03.2021 06:03 stgatilov Note Added: 0013781
11.03.2021 06:13 stgatilov Note Added: 0013782
11.03.2021 06:18 stgatilov Note Added: 0013783
11.03.2021 06:19 stgatilov Status assigned => resolved
11.03.2021 06:19 stgatilov Resolution open => fixed
11.03.2021 06:19 stgatilov Fixed in Version => TDM 2.10
24.07.2021 10:37 Dragofer Note Added: 0014180