View Issue Details

ID: 0005027
Project: The Dark Mod
Category: Coding
View Status: public
Last Update: 15.04.2019 16:48
Reporter: stgatilov
Assigned To: stgatilov
Priority: normal
Severity: tweak
Reproducibility: N/A
Status: resolved
Resolution: fixed
Platform: Visual C++
OS: Windows
Product Version: SVN
Target Version: TDM 2.08
Fixed in Version: TDM 2.08
Summary: 0005027: Fine-tune "Debug with Inlines" configuration for development
Description: We have four configurations now: Full debug, full release, "Debug with Inlines", and one more (which I'll remove soon).

Full debug is very slow (several times slower than Full release). As a result, some developers debug the Release build instead. But Full release is very hard to debug, and I even see "#pragma optimize(off)" being inserted to deoptimize the chunks of code being debugged.

The idea is to dust off the "Debug with Inlines" configuration and make it something in the middle: convenient for debugging in 99% of cases, yet fast enough to be preferred over Release.
Tags: No tags attached.

Activities

stgatilov

15.04.2019 15:12

administrator   ~0011741

In svn rev 8176:

1) Disabled runtime checks (/RTC).
They noticeably slow down execution.
The downside is: stack corruption won't be detected.
If you need to check for stack corruption, you have to run Debug build.

2) Enabled the /DEBUG:FASTLINK linker option.
This should speed up the linker's handling of debug symbols.
I see a 10-20% link speed improvement, much less than I expected.
Maybe that's because some third-party libs are compiled with the ordinary /Zi flag.

3) Enabled "Edit and Continue" feature.
I'm not sure how useful it is; we will probably disable it again in the future.
Note that we cannot enable it in Debug (combined with /RTC, it slows down execution many times over) and cannot enable it in Release (optimized builds are not supported).
I see no downside to having it, except that the feature itself usually does not work even in trivial cases.

"Debug with Inlines" improved from 42 FPS to 54 FPS in StLucia hall.
stgatilov

15.04.2019 15:43

administrator   ~0011742

One interesting thing is that SIMD intrinsics are particularly slow in the Debug build.
There are two reasons for it:

1) Without optimizations, code using intrinsics performs a lot of excess loads and stores. Unfortunately, code with intrinsics is only fast when it is optimized.

2) Code using intrinsics often calls getters of idVec, idMat, idDrawVert and other trivial idlib structures. Even though these getters are very small, the compiler does not inline them without optimizations (even though /Ob1 means that they CAN be inlined).

Point 2 was also affected around 2.06, when ID_INLINE was changed from __forceinline to plain inline.
Note that while __forceinline rarely does harm in an optimized build (and rarely helps either), it can be a real disaster for the "Debug with Inlines" build! Even a simple operator[] with an assert generates a page of assembly code, and a more complicated method is going to generate lots of trash.

The plan is to combat both problems at once:
1) Manually enable optimization for a few compute-intensive functions.
2) Mark trivial getters of idlib types as "forceinline".
stgatilov

15.04.2019 15:44

administrator   ~0011743

In svn rev 8177:

Added macros DEBUG_OPTIMIZE_ON/DEBUG_OPTIMIZE_OFF, intended to surround a function/method in a CPP file.
These macros are empty everywhere except in the MSVC "Debug with Inlines" build.
In this build, they turn optimization on/off via "#pragma optimize".

Note that:
* It works only for functions in a CPP file: there is no way to force-optimize a function in a header.
* Be extremely careful to turn optimization back off after the function, and don't overuse it!
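
A minimal sketch of how such a macro pair might look. This is not the code from rev 8177: the guard name TDM_DEBUG_WITH_INLINES, the option string passed to the pragma, and the function SumOfSquares are all assumptions made for illustration. Since a macro cannot expand to a "#pragma" directive, the sketch relies on MSVC's __pragma keyword.

```cpp
#include <cassert>

// Sketch only: TDM_DEBUG_WITH_INLINES is a hypothetical define that the
// "Debug with Inlines" configuration would have to set.
#if defined(_MSC_VER) && defined(TDM_DEBUG_WITH_INLINES)
    // Assumed option list: "g" = global optimizations, "t" = favor speed.
    #define DEBUG_OPTIMIZE_ON  __pragma(optimize("gt", on))
    // An empty list with "on" restores the command-line optimization settings.
    #define DEBUG_OPTIMIZE_OFF __pragma(optimize("", on))
#else
    // Every other compiler and configuration: the macros expand to nothing.
    #define DEBUG_OPTIMIZE_ON
    #define DEBUG_OPTIMIZE_OFF
#endif

DEBUG_OPTIMIZE_ON
// Hypothetical compute-heavy function defined in a CPP file.
float SumOfSquares(const float *values, int count) {
    assert(count >= 0);
    float sum = 0.0f;
    for (int i = 0; i < count; i++) {
        sum += values[i] * values[i];
    }
    return sum;
}
DEBUG_OPTIMIZE_OFF
```

The pragma has to sit outside the function body and affects the functions defined after it, which is why a forgotten DEBUG_OPTIMIZE_OFF would silently optimize the rest of the file.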

These macros are used to force optimization on several heavy functions:
1) All idSIMD_SSE2 methods that use intrinsics (those written in assembly are unaffected by optimization settings).
2) All Simd_AVX and Simd_AVX2 methods: they use intrinsics too.
3) CopyBuffer in BufferObject.cpp: intrinsics-based, used quite heavily.
4) idRenderMatrix::GetFrustumCorners and idRenderMatrix::CullFrustumCornersToPlane: they show up noticeably in the profiler.
5) Various R_LocalPointToGlobal/R_GlobalPointToLocal and similar methods in tr_main.cpp: they show up in the profiler.

Speed improved from 54 FPS to 58 FPS (StLucia hall).
To be improved further with the inlining fix.
stgatilov

15.04.2019 15:54

administrator   ~0011744

In svn rev 8178:

Fixed global macro ID_FORCE_INLINE on MSVC: it was erroneously defined as __inline.
Defined ID_FORCE_INLINE on GCC.

Marked a bunch of methods in idlib as ID_FORCE_INLINE:
1) All member getters returning a reference or a primitive value.
2) Most operator[] methods, even those with an index-checking assert (though this assert generates lots of trash).
3) Most of the methods of idVec* (except for clearly slow/heavy ones).
4) Methods of __m128c in sys-intrinsics.h.
5) A few simple setters, trivial equality checks.
6) A few methods of idMath.
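
As a rough illustration of the kind of change described above, here is a sketch of a force-inline macro applied to trivial accessors. The macro bodies are assumptions (the note only says the MSVC definition used to be __inline), and SketchVec3 is a hypothetical stand-in, not the real idVec3.

```cpp
#include <cassert>

// Assumed definitions; the actual ones in idlib may differ.
#if defined(_MSC_VER)
    #define ID_FORCE_INLINE __forceinline
#elif defined(__GNUC__)
    #define ID_FORCE_INLINE inline __attribute__((always_inline))
#else
    #define ID_FORCE_INLINE inline
#endif

// Hypothetical stand-in for a trivial idlib structure.
struct SketchVec3 {
    float v[3];

    // Getter returning a primitive value.
    ID_FORCE_INLINE float GetX() const { return v[0]; }

    // operator[] with an index-checking assert.
    ID_FORCE_INLINE float &operator[](int index) {
        assert(index >= 0 && index < 3);
        return v[index];
    }

    // Small arithmetic method, cheap enough to inline everywhere.
    ID_FORCE_INLINE float LengthSqr() const {
        return v[0] * v[0] + v[1] * v[1] + v[2] * v[2];
    }
};
```

With plain inline, a non-optimizing build is free to emit a real call for each of these accessors; the force-inline form asks the compiler to expand them at the call site even then.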

This improves performance from 58 FPS to 63 FPS (in StLucia hall).

Issue History

Date Modified Username Field Change
15.04.2019 14:55 stgatilov New Issue
15.04.2019 14:55 stgatilov Status new => assigned
15.04.2019 14:55 stgatilov Assigned To => stgatilov
15.04.2019 15:12 stgatilov Note Added: 0011741
15.04.2019 15:43 stgatilov Note Added: 0011742
15.04.2019 15:44 stgatilov Note Added: 0011743
15.04.2019 15:54 stgatilov Note Added: 0011744
15.04.2019 16:48 stgatilov Status assigned => resolved
15.04.2019 16:48 stgatilov Fixed in Version => TDM 2.08
15.04.2019 16:48 stgatilov Resolution open => fixed