View Issue Details

IDProjectCategoryView StatusLast Update
0004832The Dark ModCodingpublic07.07.2018 07:10
Reporterstgatilov Assigned Toduzenko  
PrioritylowSeveritytweakReproducibilityN/A
Status closedResolutionfixed 
Product VersionTDM 2.06 
Target VersionTDM 2.07Fixed in VersionTDM 2.07 
Summary0004832: Optimize R_CalcPointCull with SSE
DescriptionIn some cases this function takes 11.6% of CPU time, so it would benefit from optimizing.
Probably also look at other frontend functions nearby.
Steps To Reproduce1. Start mission "Rightful Property"
2. Find the location described in forum post (linked below).
3. FPS is low =) Use profiler to see where CPU time goes.
Additional InformationThe location originally described by nbohr1more:
  http://forums.thedarkmod.com/topic/19484-32-bit-or-64-bit/#entry422995
In the context that 64-bit version is slower here.
TagsNo tags attached.

Relationships

related to 0004681 resolvedduzenko Front renderer optimization 
related to 0004550 resolvedstgatilov Cleanup of SIMD code 

Activities

duzenko

duzenko

10.06.2018 13:27

developer   ~0010514

Last edited: 10.06.2018 13:37

View 2 revisions

Can you AVX this loop?

        for ( i = 0; i < 6; i++ ) {
            float d = frustum[i].Distance( vec );
            bits |= (d < LIGHT_CLIP_EPSILON) << i;
            bits |= (d > -LIGHT_CLIP_EPSILON) << (i + 6);
        }

Do the distance computation on all 6 planes at the same time, bit operations too if possible.

stgatilov

stgatilov

23.06.2018 04:23

administrator   ~0010559

Last edited: 23.06.2018 04:26

View 3 revisions

Do not use AVX. It is unsupported on most current hardware.
Moreover, we don't even have AVX version of idSIMD. Adding it is possible but not easy, because in addition to CPUID bit you have to query support from OS.
Also keep in mind that AVX gives +70% boost in the best case, and given that we have 3-float vectors everywhere, most likely you won't get any benefit at all.

Currently there are two options for using SIMD:
1. Use SSE or SSE2 (nothing more) --- that's already used by compiler itself.
2. Use dynamically dispatched function for anything newer (aka idSIMD).

The second option makes sense only for functions which do sufficient amount of work. Probably OK if you move loop by vertices into idSIMD completely.

Option 1 can be seen in idBounds::IntersectsBounds.
Code for option 2 is usually in "#else /* SIMD_USE_ASM */" section in Simd_SSE2.cpp

duzenko

duzenko

23.06.2018 05:56

developer   ~0010560

I am talking about the planes loop, which is 6 elements/iterations.
It's too big for 128 bit SSE, and avx would be better here.
AVX supported by 86% of steam users and would be especially useful on chips like Core M5, which throttle to 1 GHz under load.
stgatilov

stgatilov

23.06.2018 09:36

administrator   ~0010561

I suggest starting with SSE2 version.

After that we can decide what to do with AVX.
duzenko

duzenko

23.06.2018 09:42

developer   ~0010562

The thing is, the SSE2 version requires more setup and operation code, because it has to separately process the first 4 planes and the other 2.
I already did the AVX version locally and want to create a SimdAVX class for it.
duzenko

duzenko

23.06.2018 09:44

developer   ~0010563

Last edited: 23.06.2018 09:45

View 2 revisions

Btw speed gain is good enough so that I want to do the same with R_CalcInteractionCullBits.

duzenko

duzenko

23.06.2018 11:08

developer   ~0010564

At revision: 7505
duzenko

duzenko

23.06.2018 11:31

developer   ~0010565

Oops, bug fixed in 7506
duzenko

duzenko

23.06.2018 16:11

developer   ~0010566

Added a temp cvar: com_tempAllowAVX for debugging/testing.
stgatilov

stgatilov

23.06.2018 17:21

administrator   ~0010569

Last edited: 23.06.2018 17:25

View 2 revisions

What about SSE/AVX switch penalty?
Do you perform vzeroupper at the end of routine?

See here:
  https://stackoverflow.com/a/7841251/556899
  https://stackoverflow.com/a/43881620/556899

duzenko

duzenko

23.06.2018 17:47

developer   ~0010570

Not yet, but we should.

Issue History

Date Modified Username Field Change
10.06.2018 13:17 stgatilov New Issue
10.06.2018 13:17 stgatilov Status new => assigned
10.06.2018 13:17 stgatilov Assigned To => stgatilov
10.06.2018 13:27 duzenko Note Added: 0010514
10.06.2018 13:37 duzenko Note Edited: 0010514 View Revisions
21.06.2018 19:05 nbohr1more Relationship added related to 0004681
23.06.2018 04:23 stgatilov Note Added: 0010559
23.06.2018 04:24 stgatilov Note Edited: 0010559 View Revisions
23.06.2018 04:26 stgatilov Note Edited: 0010559 View Revisions
23.06.2018 05:56 duzenko Note Added: 0010560
23.06.2018 09:36 stgatilov Note Added: 0010561
23.06.2018 09:42 duzenko Note Added: 0010562
23.06.2018 09:44 duzenko Note Added: 0010563
23.06.2018 09:45 duzenko Note Edited: 0010563 View Revisions
23.06.2018 11:08 duzenko Note Added: 0010564
23.06.2018 11:31 duzenko Note Added: 0010565
23.06.2018 16:11 duzenko Note Added: 0010566
23.06.2018 17:21 stgatilov Note Added: 0010569
23.06.2018 17:25 stgatilov Note Edited: 0010569 View Revisions
23.06.2018 17:47 duzenko Note Added: 0010570
25.06.2018 06:15 stgatilov Assigned To stgatilov => duzenko
27.06.2018 17:22 stgatilov Relationship added related to 0004550
07.07.2018 07:10 duzenko Status assigned => closed
07.07.2018 07:10 duzenko Resolution open => fixed
07.07.2018 07:10 duzenko Fixed in Version => TDM 2.07