View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0004550 | The Dark Mod | Coding | public | 02.07.2017 06:14 | 14.04.2021 12:45 |
Reporter | stgatilov | Assigned To | stgatilov | ||
Priority | low | Severity | minor | Reproducibility | N/A |
Status | resolved | Resolution | fixed | ||
Target Version | TDM 2.10 | Fixed in Version | TDM 2.10 | ||
Summary | 0004550: Cleanup of SIMD code | ||||
Description | Current SIMD processors are a mess: there are some weird ad hoc ifdefs all over the place... They need some cleanup, at least: 1. Remove MMX and Altivec processors 2. Fix ifdefs weirdness. 3. Fix cpuid detection on Linux (move code to sys/linux?) 4. Add SSSE3/SSE4.1 versions --- UPDATE: superseded by AVX processors. | ||||
Tags | No tags attached. | ||||
related to | 0002427 | resolved | stgatilov | Broken SIMD Support on Linux |
related to | 0004435 | closed | duzenko | Investigate fp-precision related issues |
related to | 0003177 | resolved | stgatilov | Illegal instruction caused by SSE2 used on non-SSE CPU |
related to | 0004832 | closed | duzenko | Optimize R_CalcPointCull with SSE |
related to | 0004857 | resolved | stgatilov | Build Error due to reference to cpuid.h on PPC64 |
http://forums.thedarkmod.com/topic/19081-remove-simd-3dnow-simd-altivec/ | |
One more bit for future: add AVX detection for Linux. Related: 0004832 |
|
We seem have a user with a PPC, albeit his system is altivec-less. | |
Also: 1) Probably remove assembly? 2) Integrate some Mac-specific intrinsics code (how the hell added it?) 3) Fix some weird MacOS ifdefs around the code 4) Review Elbrus fixes 5) Try to globally undef __SSE__ and other macros and compile project (on Linux?) |
|
Some work on the topic: r9250. Removed MMX and 3DNow SIMD processors. r9251. Minor cleanup in sys/simd code. r9252. Moved the code for detecting SSE/AVX into new common source sys_cpu.cpp. Also fixed it to compile and work on Linux (hopefully, even on non-x86 CPUs). r9253. Removed hacky CPUID code from Simd.cpp. So now we only have Generic, SSE, SSE2, SSE3, AVX, AVX2 + unused Altivec. Thanks to the last two commits, CPUID detection now works the same way on Windows and Linux. So AVX and AVX2 code should work automatically on Linux now! |
|
More work related to FPU flags: r9254. Moved FPU-related functions to sys_cpu.cpp. r9255. Fixed FPU control functions for VC 64-bit and GCC. Previously Flush-To-Zero and Denormals-Are-Zero flags were only enabled on MSVC 32-bit. On 64-bit VC and perhaps some Linuxes, denormals were generated, slowing game down. Now FTZ and DAZ are enabled on all supported platforms (i.e. x86/x64 + VC/GCC). Note that Ubuntu seems to enable these flags by default, so the behavior change is mainly on Windows 64-bit. Also com_fpexceptions should now work on Linux. The function Sys_FPU_SetPrecision is most likely dead, since it is x87-only. It sets FPU internal precision to single-precision, double-precision, or extended-prevision. Note that I only made sure SSE control word is set properly. Old x87 has its own control word, and it is most likely not affected by anything, especially on Linux. TDM has moved off this goddamn x87 monster long time ago, and there is no reason to look back =) Now the plan is to see when all these FPU settings (FTZ+DAZ, exceptions, FPU precision) are set, and ensure they are set on all threads on all platforms. |
|
More changes: r9256. Added ThreadStartup and ThreadHeartbeat functions to idSys. r9257. Reworked how FPU properties are set on main thread. r9258. Set FPU props on frontend threads. r9259. Set FPU properties on Async thread. (+ r9265. Fixed shutting down async thread) r9260. Set FPU properties on job system worker threads. Note that FPU settings are thread-private. When new thread is started, it usually gets system-default values. It means that if we want to e.g. disable denormals globally, we have to do it in every thread. I have added two functions for setting and updating FPU properties: sys->ThreadStartup(); sys->ThreadHeartbeat(); Now their calls are added to the following threads: main thread frontend thread async thread job system worker threads All the other threads still have denormals, and will not crash on infinity/NaN if com_fpexceptions is set. |
|
More changes follow: r9261. Removed VPCALL weirdness. r9262. Removed SIMD implementations under MACOS_X and __i386__ only. r9263. Removed some weird MACOS_X ifdefs. r9267. Removed hacks to disable optimization on GCC on some functions/files. r9268. Removed already commented SIMD code for sound resampling. r9269. testSIMD now uses sys->GetClockTicks() to measure time, instead of platform-specific hacks. r9270. Fixed warning about "no symbols in obj file". r9271. Moved original SIMD assembly by ID into new idSIMD_IdAsm processor and disabled it by default. This removes most of the platform-specific garbage and unnecessary stuff from SIMD code. The titanic amount of inline assembly originally written by ID software is moved to special "IdAsm" processor, so it does not bother us anymore. It is only compiled on MSVC 32-bit (which is deprecated BTW), and can only be enabled explicitly via com_forceGenericSIMD cvar. This is in fact behavior change: 32-bit Windows game has switched from ID's code to our own (which is used on all other platforms). Brief testing showed that our SSE3 code is in general faster than ID's assembly, mainly due to much faster DeriveTangents. |
|
And the final pack of commits: r9272. Fixed GCC build: declared virtual functions must be defined. r9273. Now idSIMDProcessor has virtual destructor (fixing warning). r9274. Set -msse2 in CMake only if compiler supports it (check_cxx_compiler_flag). r9275 + r9276. Do not use/include sys_intrinsics.h. r9277. testSIMD command no longer accepts processor name as argument. The currently enabled processor is always tested against generic one. r9278. Do not compile SSE/AVX SIMD processors on non-x86 architectures. r9279. Protect CPUID instruction and xmmintrin.h include from non-x86 architectures. The main goal of these changes is to allow compiling TDM on non-x86 architectures. It was rather hard to do with TDM 2.09 source code, because there were tons of CPUID/SSE intrinsics and headers, used without any checks. Now it should be possible. In fact, I managed to cross-compile all TDM code for 64-bit ARM architecture at rev 9279. I only had to provide CPUSTRING define for ARM, in order to avoid "#error unknown CPU": +#elif defined(__aarch64__) +#define CPUSTRING "arm_64" +#elif defined(__arm__) +#define CPUSTRING "arm" Note that I only COMPILED all our code, I did not try to link it. That would obviously require building all third-party libs for ARM, which is not fun. So there might be linking errors. Of course, I did not test it too =) One interesting point in the changes is removal of sys_intrinsics.h usages. We have two ways to use SSE/AVX in the code now: 1) void myfunction() { #ifdef __SSE2__ {fast code} #else {generic code} #endif 2) idSIMDProcessor Both of these ways always provide generic code to be used if SSE is not available. The functions from sys_intrinsics.h in its current form do not provide generic implementation, and thus tie TDM to x86 CPU. If we ever decide to copy more code from D3BFG relying on them, then we'll have to provide generic implementations. |
|
Date Modified | Username | Field | Change |
---|---|---|---|
02.07.2017 06:14 | stgatilov | New Issue | |
02.07.2017 06:14 | stgatilov | Assigned To | => stgatilov |
02.07.2017 06:14 | stgatilov | Status | new => assigned |
30.08.2017 16:59 | nbohr1more | Relationship added | related to 0002427 |
30.08.2017 17:02 | nbohr1more | Relationship added | related to 0004435 |
30.08.2017 17:37 | nbohr1more | Note Added: 0009131 | |
13.09.2017 21:14 | nbohr1more | Relationship added | related to 0003177 |
27.06.2018 17:22 | stgatilov | Note Added: 0010590 | |
27.06.2018 17:22 | stgatilov | Relationship added | related to 0004832 |
05.07.2018 02:50 | stgatilov | Relationship added | related to 0004857 |
05.07.2018 07:39 | duzenko | Note Added: 0010644 | |
20.01.2019 07:18 | stgatilov | Target Version | => TDM 2.08 |
22.02.2020 13:47 | stgatilov | Target Version | TDM 2.08 => |
21.03.2020 17:40 | stgatilov | Target Version | => TDM 2.09 |
19.05.2020 16:24 | stgatilov | Note Added: 0012524 | |
05.12.2020 12:29 | stgatilov | Target Version | TDM 2.09 => TDM 2.10 |
11.04.2021 15:01 | stgatilov | Note Added: 0013836 | |
11.04.2021 16:35 | stgatilov | Note Added: 0013837 | |
12.04.2021 15:18 | stgatilov | Note Added: 0013840 | |
14.04.2021 06:01 | stgatilov | Note Edited: 0013840 | |
14.04.2021 06:06 | stgatilov | Note Added: 0013846 | |
14.04.2021 12:44 | stgatilov | Note Added: 0013847 | |
14.04.2021 12:45 | stgatilov | Status | assigned => resolved |
14.04.2021 12:45 | stgatilov | Resolution | open => fixed |
14.04.2021 12:45 | stgatilov | Fixed in Version | => TDM 2.10 |
14.04.2021 12:45 | stgatilov | Description Updated |