View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0005027 | The Dark Mod | Coding | public | 15.04.2019 14:55 | 15.04.2019 16:48 |
| Platform | OS | OS Version | Target Version | Fixed in Version |
|---|---|---|---|---|
| Visual C++ | Windows | | TDM 2.08 | TDM 2.08 |
Summary: 0005027: Fine-tune "Debug with Inlines" configuration for development
Description: We have four configurations now: Full Debug, Full Release, "Debug with Inlines", and one more (which I'll remove soon).
Full Debug is very slow (several times slower than Full Release). This has led some developers to debug the Release build instead. But Full Release is very hard to debug, and I have even seen #pragma optimize("", off) being inserted to deoptimize the chunks of code being debugged.
The idea is to clean up the "Debug with Inlines" configuration and turn it into a middle ground: code that is convenient to debug in 99% of cases, yet fast enough that developers prefer it over Release.
Tags: No tags attached.
In svn rev 8176:
1) Disabled runtime checks (/RTC).
They noticeably slow down execution.
The downside is that stack corruption won't be detected.
If you need to check for stack corruption, you have to run the Debug build.
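Since catching stack corruption now requires the full Debug build, it can help to make the active configuration visible at run time. A minimal sketch, assuming MSVC's documented predefined macro `__MSVC_RUNTIME_CHECKS` (defined as 1 when any /RTC option is set, undefined otherwise); the names `kRuntimeChecksEnabled` and `LogRuntimeCheckStatus` are hypothetical, not part of TDM:

```cpp
#include <cstdio>

// MSVC defines __MSVC_RUNTIME_CHECKS when any /RTC option is active;
// other compilers never define it, so they report "disabled".
#if defined(__MSVC_RUNTIME_CHECKS)
constexpr bool kRuntimeChecksEnabled = true;
#else
constexpr bool kRuntimeChecksEnabled = false;
#endif

// Hypothetical startup hook: reminds developers that a build without /RTC
// does not detect stack corruption.
void LogRuntimeCheckStatus() {
    if (kRuntimeChecksEnabled) {
        std::printf("Runtime checks (/RTC) enabled: stack corruption will be detected.\n");
    } else {
        std::printf("Runtime checks (/RTC) disabled: run the Debug build to catch stack corruption.\n");
    }
}
```

Because the flag is `constexpr`, the unused branch costs nothing at run time.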
2) Enabled the /DEBUG:FASTLINK linker option.
This should speed up the linker's handling of debug symbols.
I see a 10-20% improvement in link speed, much less than I expected.
Perhaps that's because some third-party libraries are compiled with the ordinary /Zi flag.
3) Enabled the "Edit and Continue" feature.
I'm not sure how useful it is; we will probably disable it again in the future.
Note that we cannot enable it in Debug (combined with /RTC, it slows execution down many times over), nor in Release (optimized builds are not supported).
I see no downside to having it, except that the feature itself usually does not work, even in trivial cases.
"Debug with Inlines" improved from 42 FPS to 54 FPS (StLucia hall).
One interesting thing is that SIMD intrinsics are particularly slow in the Debug build.
There are two reasons for this:
1) Without optimization, intrinsics-based code performs many redundant loads and stores. Unfortunately, code with intrinsics is only fast when it is optimized.
2) Code using intrinsics often calls getters of idVec*, idMat*, idDrawVert and other trivial idlib structures. Even though these getters are very small, the compiler does not inline them without optimization (despite /Ob1 meaning that they CAN be inlined).
Point 2 is also affected by a change made around 2.06, when ID_INLINE was changed from __forceinline to plain inline.
Note that while __forceinline rarely does harm in an optimized build (though it also rarely makes sense there), it can be a real disaster for the "Debug with Inlines" build! Even a simple operator with an assert generates a page of assembly code, and a more complicated method generates lots of junk.
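Both problems can be seen in a small sketch. The `Vec4` type and `Dot4` function below are hypothetical (not idlib code) and assume an x86/x64 target with SSE: the dot product calls a trivial getter and keeps several `__m128` temporaries. Built without optimization, both `ToFloatPtr()` calls typically remain real calls and each temporary is stored to the stack and reloaded; with optimization, the function compiles down to little more than the SSE instructions themselves.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics (x86/x64 only)

// Hypothetical stand-in for a trivial idlib vector type.
struct Vec4 {
    float v[4];
    // Trivial getter: free when inlined, a real call when it is not.
    const float* ToFloatPtr() const { return v; }
};

// Dot product of two 4-component vectors using SSE intrinsics.
float Dot4(const Vec4& a, const Vec4& b) {
    __m128 va   = _mm_loadu_ps(a.ToFloatPtr());
    __m128 vb   = _mm_loadu_ps(b.ToFloatPtr());
    __m128 prod = _mm_mul_ps(va, vb);          // (p0, p1, p2, p3)
    __m128 high = _mm_movehl_ps(prod, prod);   // (p2, p3, p2, p3)
    __m128 pair = _mm_add_ps(prod, high);      // (p0+p2, p1+p3, ...)
    // Broadcast lane 1 (p1+p3) into every lane.
    __m128 odd  = _mm_shuffle_ps(pair, pair, _MM_SHUFFLE(1, 1, 1, 1));
    __m128 sum  = _mm_add_ss(pair, odd);       // lane 0 = p0+p1+p2+p3
    return _mm_cvtss_f32(sum);
}
```

For example, `Dot4(Vec4{{1, 2, 3, 4}}, Vec4{{5, 6, 7, 8}})` returns 70.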
The plan is to combat both problems at once:
1) Manually enable optimization for a few compute-intensive functions.
2) Mark trivial getters of idlib types as "forceinline".
In svn rev 8177:
Added the macros DEBUG_OPTIMIZE_ON/DEBUG_OPTIMIZE_OFF, intended to surround a function or method in a .cpp file.
These macros are empty everywhere except in the MSVC "Debug with Inlines" build.
In that build, they turn optimization on and off via "#pragma optimize".
* This works only for functions in .cpp files: there is no way to force-optimize a function in a header.
* Be extremely careful to turn optimization back off, and don't overuse it!
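A minimal sketch of how such macros could be defined. The configuration flag `DEBUG_WITH_INLINES_BUILD` and the function `SumOfSquares` are hypothetical, and the `"gt"` argument (global optimizations, favor fast code) is an assumption rather than the exact TDM definition. According to the MSVC documentation, `#pragma optimize` must appear outside a function and takes effect at the first function defined after it; `optimize("", on)` resets optimization to the command-line settings, which in an /Od build means turning it off again.

```cpp
#include <cassert>

// Assumption: the "Debug with Inlines" configuration defines this
// (hypothetical) flag. Everywhere else the macros expand to nothing.
#if defined(_MSC_VER) && defined(DEBUG_WITH_INLINES_BUILD)
    // Enable global optimizations and favor fast code for the functions below.
    #define DEBUG_OPTIMIZE_ON  __pragma(optimize("gt", on))
    // Restore the /O settings from the command line (here: /Od, i.e. off).
    #define DEBUG_OPTIMIZE_OFF __pragma(optimize("", on))
#else
    #define DEBUG_OPTIMIZE_ON
    #define DEBUG_OPTIMIZE_OFF
#endif

// Hypothetical compute-heavy function living in a .cpp file.
DEBUG_OPTIMIZE_ON
float SumOfSquares(const float* values, int count) {
    assert(count >= 0);
    float sum = 0.0f;
    for (int i = 0; i < count; ++i) {
        sum += values[i] * values[i];
    }
    return sum;
}
DEBUG_OPTIMIZE_OFF
```

Forgetting the closing `DEBUG_OPTIMIZE_OFF` would silently optimize every function that follows in the same file, which is why the pairing matters.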
These macros are used to force optimization of several heavy functions:
1) All idSIMD_SSE2 methods that use intrinsics (code written in assembly is not affected by optimization).
2) All Simd_AVX and Simd_AVX2 methods, since they use intrinsics too.
3) CopyBuffer in BufferObject.cpp: intrinsics-based and used quite heavily.
4) idRenderMatrix::GetFrustumCorners and idRenderMatrix::CullFrustumCornersToPlane, which show up noticeably in the profiler.
5) R_LocalPointToGlobal, R_GlobalPointToLocal and similar functions in tr_main.cpp, which also show up in the profiler.
Speed improved from 54 FPS to 58 FPS (StLucia hall).
This should improve further with the inlining fix.
In svn rev 8178:
Fixed the global macro ID_FORCE_INLINE on MSVC: it was erroneously defined as __inline.
Defined ID_FORCE_INLINE on GCC.
Marked a bunch of methods in idlib as ID_FORCE_INLINE:
1) All member getters returning a reference or a primitive value.
2) Most operator methods, even those with an index-checking assert (though this assert generates lots of junk code).
3) Most methods of idVec* (except for clearly slow or heavy ones).
4) Methods of __m128c in sys-intrinsics.h.
5) A few simple setters and trivial equality checks.
6) A few methods of idMath.
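A sketch of a portable force-inline macro and of the kinds of members listed above. On MSVC, `__inline` is equivalent to `inline` (only a hint), while `__forceinline` overrides the compiler's cost-benefit analysis. The GCC/Clang branch (`inline __attribute__((always_inline))`) is an assumption about how such a macro is commonly written, not necessarily the exact TDM definition, and `Vec3` is a hypothetical, reduced stand-in for `idVec3`.

```cpp
#include <cassert>

#if defined(_MSC_VER)
    #define ID_FORCE_INLINE __forceinline
#elif defined(__GNUC__)
    // Assumption: GCC/Clang equivalent via the always_inline attribute.
    #define ID_FORCE_INLINE inline __attribute__((always_inline))
#else
    #define ID_FORCE_INLINE inline
#endif

// Hypothetical, reduced idVec3-like type (an aggregate, no constructors).
struct Vec3 {
    float v[3];

    // Operators with an index-checking assert.
    ID_FORCE_INLINE float operator[](int index) const {
        assert(index >= 0 && index < 3);
        return v[index];
    }
    ID_FORCE_INLINE float& operator[](int index) {
        assert(index >= 0 && index < 3);
        return v[index];
    }

    // Trivial getter.
    ID_FORCE_INLINE const float* ToFloatPtr() const { return v; }

    // Simple setter.
    ID_FORCE_INLINE void Set(float x, float y, float z) {
        v[0] = x; v[1] = y; v[2] = z;
    }

    // Component-wise sum and dot product.
    ID_FORCE_INLINE Vec3 operator+(const Vec3& o) const {
        return Vec3{{v[0] + o.v[0], v[1] + o.v[1], v[2] + o.v[2]}};
    }
    ID_FORCE_INLINE float operator*(const Vec3& o) const {
        return v[0] * o.v[0] + v[1] * o.v[1] + v[2] * o.v[2];
    }

    // Trivial equality check.
    ID_FORCE_INLINE bool operator==(const Vec3& o) const {
        return v[0] == o.v[0] && v[1] == o.v[1] && v[2] == o.v[2];
    }
};
```

The methods are kept to one or two statements each, in line with the warning above that forcing larger bodies inline bloats an unoptimized build.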
This improves performance from 58 FPS to 63 FPS (in StLucia hall).
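FPS gains are easier to compare as frame times. The hypothetical helper below converts the StLucia hall measurements reported in these notes (42 FPS before rev 8176, then 54, 58 and 63 FPS after revs 8176, 8177 and 8178). Overall, frame time drops from about 23.8 ms to about 15.9 ms, a 1.5x speedup.

```cpp
#include <cassert>
#include <cstdio>

// Frame time in milliseconds for a given frame rate.
double FrameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Prints the frame time for each measurement and the overall speedup.
void PrintFrameTimes() {
    const double fps[] = {42.0, 54.0, 58.0, 63.0};  // before 8176, after 8176, 8177, 8178
    for (double f : fps) {
        std::printf("%.0f FPS = %.2f ms/frame\n", f, FrameTimeMs(f));
    }
    std::printf("overall speedup: %.2fx\n", fps[3] / fps[0]);
}
```

This prints 23.81, 18.52, 17.24 and 15.87 ms per frame, followed by an overall speedup of 1.50x.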
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 15.04.2019 14:55 | stgatilov | New Issue | |
| 15.04.2019 14:55 | stgatilov | Status | new => assigned |
| 15.04.2019 14:55 | stgatilov | Assigned To | => stgatilov |
| 15.04.2019 15:12 | stgatilov | Note Added: 0011741 | |
| 15.04.2019 15:43 | stgatilov | Note Added: 0011742 | |
| 15.04.2019 15:44 | stgatilov | Note Added: 0011743 | |
| 15.04.2019 15:54 | stgatilov | Note Added: 0011744 | |
| 15.04.2019 16:48 | stgatilov | Status | assigned => resolved |
| 15.04.2019 16:48 | stgatilov | Fixed in Version | => TDM 2.08 |
| 15.04.2019 16:48 | stgatilov | Resolution | open => fixed |