View Issue Details

IDProjectCategoryView StatusLast Update
0004521The Dark ModCodingpublic30.08.2017 13:02
Reporterstgatilov Assigned Tostgatilov  
PrioritynormalSeveritytweakReproducibilityN/A
Status resolvedResolutionfixed 
Target VersionTDM 2.06Fixed in VersionTDM 2.06 
Summary0004521: Make idClip::Translation faster
DescriptionCurrently clip checks take considerable time, especially on open-scape maps with many guards (due to frequent visibility checks).
The main reason for this is usage of inefficient algorithm in case when translation is lengthy.
Steps To ReproduceLaunch "Inn business" map, and see in your profiler.
You might want to alert guards all over the map to be sure =)
TagsNo tags attached.

Activities

stgatilov

stgatilov

05.05.2017 16:37

administrator   ~0008831

Last edited: 05.05.2017 16:49

A bit of information about how clipping currently works.

The whole collision/clipping thing is done on two levels:
1. idClip: manages clipmodels currently in world with their transformations.
2. idCollisionModelManager: manages a single collision model and traces against it.
The first thing is located in game, so it was always opened. It only performs upper-level pruning, i.e. it determines for a trace request which clipmodels it may hit.
The second thing is located in engine, so it was closed source initially. It does all the most important job of detecting collisions precisely, along with contact features. It can only check a trace model against a single collision model (e.g. a sword, a well, a door, etc).
Also, there is one special clipmodel, which is the "world".

P.S. Some discussion about what is "clipmodel" can be found here:
  http://forums.thedarkmod.com/topic/2986-dynamically-resizing-models-wiki/?p=246977

stgatilov

stgatilov

05.05.2017 16:47

administrator   ~0008832

The data structure of idClip is the following.

At the moment of creation, the bounds of the whole world are computed. Then a full binary tree of depth 12 is created: the root node corresponds to the whole world bounds, and each node is split into two halves at the middle along it greatest extent to obtain its two sons. Nodes are called "clipSector"-s.
Despite the fact that clip sectors structure is a BSP tree, it is easy to see that it actually works as a uniform grid. The leaves of the BSP tree correspond to cells of the grid. The world bounds are split into 2^K equal parts alone each coordinate, with sum Kx + Ky + Kz = 12, so that the final cells are almost cubic in their shape. For instance, if the world has cubic shape, it would be divided into 16 x 16 x 16 pieces; if it is not tall, then division 32 x 32 x 4 is quite likely.

During the whole game every object (entity) has its clipmodel linked into the idClip data structure. Every leaf node in clip sectors tree contains a linked list of clipmodels which are fully or partially in it. The clipmodel is linked into every cell which intersects its current AABB (absBounds).

When a trace request comes to idClip, it does the following:
1. Asks CollisionModelManager to collide trace model against the World.
2. Detects all clipmodels which may potentially intersect with the trace request (by traversing clip sectors tree).
3. Iterates over all the candidate clipmodels found on step 2, and asks CollisionModelManager to collide trace model against each of them.
4. Among all collision found on step 3, choose the earliest one (minimal "fraction" = time).

Note that on step 2 shortened trace request is used, if collision against world was found: trace request ends at the point of the detected collision.
stgatilov

stgatilov

05.05.2017 17:17

administrator  

Clip_old.pdf (82,207 bytes)
stgatilov

stgatilov

05.05.2017 17:17

administrator  

Clip_new.pdf (81,114 bytes)
stgatilov

stgatilov

05.05.2017 17:30

administrator   ~0008833

The current way of doing step 2 of the algorithm (and partly the next steps too) is very inefficient, if the movement of the trace request is large.
One particular class of such trace requests is generated by visibility checks of AI ("CanSee" method). When there are several guards in an open space, they generate several such request per frame.

Currently the AABB of the whole movement (from start to end) is taken, and then every thing whose AABB intersects it is traversed: each such clipmodel is recorded as a candidate. Obviously, when we take a small tracemodel and trace its movement through the whole map, the AABB of movement would be huge along X and Y axes (most likely), so a lot of clipmodels would be treated as candidates without need.
Even if a candidate clipmodel is far from the line segment of movement, the CollisionModelManager would at least traverse its BSP. This is all a clear performance loss.

An example is shown in the attached image Clip_old. Eight clipmodels are within AABB of the movement, although clearly only four of them have a chance to intersect the movement line. However, all eight clipmodels would be candidates, and collision against each of them would be checked (in random order).
stgatilov

stgatilov

05.05.2017 17:40

administrator   ~0008834

The new way significantly reduces the volume which is checked for intersection against AABB of clipmodels. Instead of taking AABB of the whole movement, we take the volume obtained by sweeping the AABB of the clipmodel along the trace line. This volume is called "moving bounds", since it is AABB (idBounds) moved along a line segment. In case of point trace, it is a just line segment.

In order to check whether "moving bounds" intersects a given AABB, we deflate the AABB being moved into a point, while inflating the AABB to be intersected by the same extent. As a result, we will need to determine whether a given line segment intersects a given (inflated) AABB. This is most easily solved by "slab method", see e.g. here:
  https://tavianator.com/fast-branchless-raybounding-box-intersections/

The new code is called from idClip::Translation only when the movement of the trace request is clearly rather large. If trace request has no movement, then the old code is used.
In the new code, a clip sector is traversed only if moving bounds intersect its volume. Also, a clipmodel is put into candidates list only if moving bounds intersect its absBounds.

As a result, the number of candidate clipmodels for collision checking is reduced significantly (from 8 to 4 on the example picture).
stgatilov

stgatilov

06.05.2017 03:46

administrator   ~0008835

But this is not done yet.
Out of four items on the example, only two seem to be important: the rest two are clearly farther along the trace line, so there is no point in checking trace against them.

A very nice property of slab method for ray-box intersection is: it computes the exact subsegment of the ray lying within the box. In terms of movement, it computes the range [L; R], so that moving point is inside box at time L <= t <= R. The same applies to moving bounds: we know the range [L;R], such that moving bounds surely do not intersect entity bounds outside of this range, hence the tracemodel cannot have common points with the clipmodel outside of this range too.
This can be used to prune any clipmodel, if its time range [L;R] is greater than the least currently known collision time. In the new code, lower bounds L on possible intersection time is reported for each candidate clipmodel. Moreover, candidate clipmodels are sorted by this lower bound. On step 3, whenever we see a candidate with L > t (here t - current first time of collision), we can break from the loop and avoid checking all the rest of candidates.

The new algorithm is shown on picture Clip_new. The four candidate items (which intersect the moving bounds) are numbered from 1 to 4, sorted by their intersection time lower bound. Note that the horseshoe is checked first, and hammer is checked afterwards. Nevertheless, only these two clipmodels are checked.
stgatilov

stgatilov

06.05.2017 04:23

administrator   ~0008836

I used MSVC2013 profiler to record first minute of "Inn Business" mission (without doing anything by the player).

idClip takes nominal portion of time:
  in the old code: 21.3 %
  with moving bounds check: 13.8% (excludes pruning by t > L)
  with the new code: 5.3%

Moreover, the new code spends most of the time (precisely: 4.4 / 5.3) in idPhysics_Monster::CheckGround and idPhysics_Actor::EvaluateContacts methods. Both of them have almost nonexistent movement (by 0.25f). It seems that they spend time only on really colliding polygons, so it is sort of nonreducible time...

For debugging purposes, I added cvar g_showCollisionAlongView. When enabled, it repeatedly shoots a ray from the eye along view direction, and highlights the first clipmodels intersected (also, the name of entity is printed to console).

The changes are not yet in SVN, but I'll commit they in the coming days.
stgatilov

stgatilov

07.05.2017 03:31

administrator   ~0008843

One remaining question is: why is CollisionModelManager called with full movement (start -> end), given that candidate clipmodels are determined by shortened movenent (start -> results.endpos) ?
Using shortened movement may further improve performance. Perhaps it is somehow important for CollisionModelManager, so I leave it as it is now.

Committed the changes in revisions 6867, 6868, 6869.
Also, committed a useful debugging cvar g_showCollisionAlongView in revision 6870.

Issue History

Date Modified Username Field Change
05.05.2017 02:40 stgatilov New Issue
05.05.2017 02:40 stgatilov Status new => assigned
05.05.2017 02:40 stgatilov Assigned To => stgatilov
05.05.2017 16:37 stgatilov Note Added: 0008831
05.05.2017 16:47 stgatilov Note Added: 0008832
05.05.2017 16:49 stgatilov Note Edited: 0008831
05.05.2017 17:17 stgatilov File Added: Clip_old.pdf
05.05.2017 17:17 stgatilov File Added: Clip_new.pdf
05.05.2017 17:30 stgatilov Note Added: 0008833
05.05.2017 17:40 stgatilov Note Added: 0008834
06.05.2017 03:46 stgatilov Note Added: 0008835
06.05.2017 04:23 stgatilov Note Added: 0008836
07.05.2017 03:31 stgatilov Note Added: 0008843
07.05.2017 03:32 stgatilov Status assigned => resolved
07.05.2017 03:32 stgatilov Fixed in Version => TDM 2.06
07.05.2017 03:32 stgatilov Resolution open => fixed
07.05.2017 04:40 nbohr1more Target Version => TDM 2.06
19.07.2017 16:41 stgatilov Relationship added related to 0004573
21.07.2017 02:09 stgatilov Relationship deleted related to 0004573
30.08.2017 13:01 nbohr1more Relationship added related to 0004613
30.08.2017 13:02 nbohr1more Relationship added related to 0004610
30.08.2017 16:30 nbohr1more Relationship deleted related to 0004613
30.08.2017 16:30 nbohr1more Relationship deleted related to 0004610