View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0004832 | The Dark Mod | Coding | public | 10.06.2018 13:17 | 07.07.2018 07:10 |
Reporter | stgatilov | Assigned To | duzenko | ||
Priority | low | Severity | tweak | Reproducibility | N/A |
Status | closed | Resolution | fixed | ||
Product Version | TDM 2.06 | ||||
Target Version | TDM 2.07 | Fixed in Version | TDM 2.07 | ||
Summary | 0004832: Optimize R_CalcPointCull with SSE | ||||
Description | In some cases this function takes 11.6% of CPU time, so it would benefit from optimizing. Probably also look at other frontend functions nearby. | ||||
Steps To Reproduce | 1. Start mission "Rightful Property" 2. Find the location described in forum post (linked below). 3. FPS is low =) Use profiler to see where CPU time goes. | ||||
Additional Information | The location originally described by nbohr1more: http://forums.thedarkmod.com/topic/19484-32-bit-or-64-bit/#entry422995 In the context that 64-bit version is slower here. | ||||
Tags | No tags attached. | ||||
Can you AVX this loop? for ( i = 0; i < 6; i++ ) { float d = frustum[i].Distance( vec ); bits |= (d < LIGHT_CLIP_EPSILON) << i; bits |= (d > -LIGHT_CLIP_EPSILON) << (i + 6); } Do the distance computation on all 6 planes at the same time, bit operations too if possible. |
|
Do not use AVX. It is unsupported on most current hardware. Moreover, we don't even have AVX version of idSIMD. Adding it is possible but not easy, because in addition to CPUID bit you have to query support from OS. Also keep in mind that AVX gives +70% boost in the best case, and given that we have 3-float vectors everywhere, most likely you won't get any benefit at all. Currently there are two options for using SIMD: 1. Use SSE or SSE2 (nothing more) --- that's already used by compiler itself. 2. Use dynamically dispatched function for anything newer (aka idSIMD). The second option makes sense only for functions which do sufficient amount of work. Probably OK if you move loop by vertices into idSIMD completely. Option 1 can be seen in idBounds::IntersectsBounds. Code for option 2 is usually in "#else /* SIMD_USE_ASM */" section in Simd_SSE2.cpp |
|
I am talking about the planes loop, which is 6 elements/iterations. It's too big for 128 bit SSE, and avx would be better here. AVX supported by 86% of steam users and would be especially useful on chips like Core M5, which throttle to 1 GHz under load. |
|
I suggest starting with SSE2 version. After that we can decide what to do with AVX. |
|
The thing is, the SSE2 version requires more setup and operation code, because it has to separately process the first 4 planes and the other 2. I already did the AVX version locally and want to create a SimdAVX class for it. |
|
Btw speed gain is good enough so that I want to do the same with R_CalcInteractionCullBits. |
|
At revision: 7505 | |
Oops, bug fixed in 7506 | |
Added a temp cvar: com_tempAllowAVX for debugging/testing. | |
What about SSE/AVX switch penalty? Do you perform vzeroupper at the end of routine? See here: https://stackoverflow.com/a/7841251/556899 https://stackoverflow.com/a/43881620/556899 |
|
Not yet, but we should. | |
Date Modified | Username | Field | Change |
---|---|---|---|
10.06.2018 13:17 | stgatilov | New Issue | |
10.06.2018 13:17 | stgatilov | Status | new => assigned |
10.06.2018 13:17 | stgatilov | Assigned To | => stgatilov |
10.06.2018 13:27 | duzenko | Note Added: 0010514 | |
10.06.2018 13:37 | duzenko | Note Edited: 0010514 | |
21.06.2018 19:05 | nbohr1more | Relationship added | related to 0004681 |
23.06.2018 04:23 | stgatilov | Note Added: 0010559 | |
23.06.2018 04:24 | stgatilov | Note Edited: 0010559 | |
23.06.2018 04:26 | stgatilov | Note Edited: 0010559 | |
23.06.2018 05:56 | duzenko | Note Added: 0010560 | |
23.06.2018 09:36 | stgatilov | Note Added: 0010561 | |
23.06.2018 09:42 | duzenko | Note Added: 0010562 | |
23.06.2018 09:44 | duzenko | Note Added: 0010563 | |
23.06.2018 09:45 | duzenko | Note Edited: 0010563 | |
23.06.2018 11:08 | duzenko | Note Added: 0010564 | |
23.06.2018 11:31 | duzenko | Note Added: 0010565 | |
23.06.2018 16:11 | duzenko | Note Added: 0010566 | |
23.06.2018 17:21 | stgatilov | Note Added: 0010569 | |
23.06.2018 17:25 | stgatilov | Note Edited: 0010569 | |
23.06.2018 17:47 | duzenko | Note Added: 0010570 | |
25.06.2018 06:15 | stgatilov | Assigned To | stgatilov => duzenko |
27.06.2018 17:22 | stgatilov | Relationship added | related to 0004550 |
07.07.2018 07:10 | duzenko | Status | assigned => closed |
07.07.2018 07:10 | duzenko | Resolution | open => fixed |
07.07.2018 07:10 | duzenko | Fixed in Version | => TDM 2.07 |