View Issue Details

IDProjectCategoryView StatusLast Update
0004550The Dark ModCodingpublic14.04.2021 12:45
Reporterstgatilov Assigned Tostgatilov  
PrioritylowSeverityminorReproducibilityN/A
Status resolvedResolutionfixed 
Target VersionTDM 2.10Fixed in VersionTDM 2.10 
Summary0004550: Cleanup of SIMD code
DescriptionCurrent SIMD processors are a mess: there are some weird ad hoc ifdefs all over the place...
They need some cleanup, at least:
1. Remove MMX and Altivec processors
2. Fix ifdefs weirdness.
3. Fix cpuid detection on Linux (move code to sys/linux?)
4. Add SSSE3/SSE4.1 versions --- UPDATE: superseded by AVX processors.
TagsNo tags attached.

Relationships

related to 0002427 resolvedstgatilov Broken SIMD Support on Linux 
related to 0004435 closedduzenko Investigate fp-precision related issues 
related to 0003177 resolvedstgatilov Illegal instruction caused by SSE2 used on non-SSE CPU 
related to 0004832 closedduzenko Optimize R_CalcPointCull with SSE 
related to 0004857 resolvedstgatilov Build Error due to reference to cpuid.h on PPC64 

Activities

nbohr1more

nbohr1more

30.08.2017 17:37

developer   ~0009131

http://forums.thedarkmod.com/topic/19081-remove-simd-3dnow-simd-altivec/
stgatilov

stgatilov

27.06.2018 17:22

administrator   ~0010590

One more bit for future: add AVX detection for Linux.
Related: 0004832
duzenko

duzenko

05.07.2018 07:39

developer   ~0010644

We seem have a user with a PPC, albeit his system is altivec-less.
stgatilov

stgatilov

19.05.2020 16:24

administrator   ~0012524

Also:
1) Probably remove assembly?
2) Integrate some Mac-specific intrinsics code (how the hell added it?)
3) Fix some weird MacOS ifdefs around the code
4) Review Elbrus fixes
5) Try to globally undef __SSE__ and other macros and compile project (on Linux?)
stgatilov

stgatilov

11.04.2021 15:01

administrator   ~0013836

Some work on the topic:
  r9250. Removed MMX and 3DNow SIMD processors.
  r9251. Minor cleanup in sys/simd code.
  r9252. Moved the code for detecting SSE/AVX into new common source sys_cpu.cpp. Also fixed it to compile and work on Linux (hopefully, even on non-x86 CPUs).
  r9253. Removed hacky CPUID code from Simd.cpp.

So now we only have Generic, SSE, SSE2, SSE3, AVX, AVX2 + unused Altivec.
Thanks to the last two commits, CPUID detection now works the same way on Windows and Linux.
So AVX and AVX2 code should work automatically on Linux now!
stgatilov

stgatilov

11.04.2021 16:35

administrator   ~0013837

More work related to FPU flags:
  r9254. Moved FPU-related functions to sys_cpu.cpp.
  r9255. Fixed FPU control functions for VC 64-bit and GCC.

Previously Flush-To-Zero and Denormals-Are-Zero flags were only enabled on MSVC 32-bit.
On 64-bit VC and perhaps some Linuxes, denormals were generated, slowing game down.
Now FTZ and DAZ are enabled on all supported platforms (i.e. x86/x64 + VC/GCC).
Note that Ubuntu seems to enable these flags by default, so the behavior change is mainly on Windows 64-bit.

Also com_fpexceptions should now work on Linux.

The function Sys_FPU_SetPrecision is most likely dead, since it is x87-only.
It sets FPU internal precision to single-precision, double-precision, or extended-prevision.

Note that I only made sure SSE control word is set properly.
Old x87 has its own control word, and it is most likely not affected by anything, especially on Linux.
TDM has moved off this goddamn x87 monster long time ago, and there is no reason to look back =)

Now the plan is to see when all these FPU settings (FTZ+DAZ, exceptions, FPU precision) are set,
and ensure they are set on all threads on all platforms.
stgatilov

stgatilov

12.04.2021 15:18

administrator   ~0013840

Last edited: 14.04.2021 06:01

More changes:
  r9256. Added ThreadStartup and ThreadHeartbeat functions to idSys.
  r9257. Reworked how FPU properties are set on main thread.
  r9258. Set FPU props on frontend threads.
  r9259. Set FPU properties on Async thread. (+ r9265. Fixed shutting down async thread)
  r9260. Set FPU properties on job system worker threads.

Note that FPU settings are thread-private.
When new thread is started, it usually gets system-default values.
It means that if we want to e.g. disable denormals globally, we have to do it in every thread.

I have added two functions for setting and updating FPU properties:
  sys->ThreadStartup();
  sys->ThreadHeartbeat();
Now their calls are added to the following threads:
  main thread
  frontend thread
  async thread
  job system worker threads
All the other threads still have denormals, and will not crash on infinity/NaN if com_fpexceptions is set.
stgatilov

stgatilov

14.04.2021 06:06

administrator   ~0013846

More changes follow:
  r9261. Removed VPCALL weirdness.
  r9262. Removed SIMD implementations under MACOS_X and __i386__ only.
  r9263. Removed some weird MACOS_X ifdefs.
  r9267. Removed hacks to disable optimization on GCC on some functions/files.
  r9268. Removed already commented SIMD code for sound resampling.
  r9269. testSIMD now uses sys->GetClockTicks() to measure time, instead of platform-specific hacks.
  r9270. Fixed warning about "no symbols in obj file".
  r9271. Moved original SIMD assembly by ID into new idSIMD_IdAsm processor and disabled it by default.

This removes most of the platform-specific garbage and unnecessary stuff from SIMD code.

The titanic amount of inline assembly originally written by ID software is moved to special "IdAsm" processor, so it does not bother us anymore.
It is only compiled on MSVC 32-bit (which is deprecated BTW), and can only be enabled explicitly via com_forceGenericSIMD cvar.
This is in fact behavior change: 32-bit Windows game has switched from ID's code to our own (which is used on all other platforms).
Brief testing showed that our SSE3 code is in general faster than ID's assembly, mainly due to much faster DeriveTangents.
stgatilov

stgatilov

14.04.2021 12:44

administrator   ~0013847

And the final pack of commits:
  r9272. Fixed GCC build: declared virtual functions must be defined.
  r9273. Now idSIMDProcessor has virtual destructor (fixing warning).
  r9274. Set -msse2 in CMake only if compiler supports it (check_cxx_compiler_flag).
  r9275 + r9276. Do not use/include sys_intrinsics.h.
  r9277. testSIMD command no longer accepts processor name as argument. The currently enabled processor is always tested against generic one.
  r9278. Do not compile SSE/AVX SIMD processors on non-x86 architectures.
  r9279. Protect CPUID instruction and xmmintrin.h include from non-x86 architectures.

The main goal of these changes is to allow compiling TDM on non-x86 architectures.
It was rather hard to do with TDM 2.09 source code, because there were tons of CPUID/SSE intrinsics and headers, used without any checks.
Now it should be possible.

In fact, I managed to cross-compile all TDM code for 64-bit ARM architecture at rev 9279.
I only had to provide CPUSTRING define for ARM, in order to avoid "#error unknown CPU":
  +#elif defined(__aarch64__)
  +#define CPUSTRING "arm_64"
  +#elif defined(__arm__)
  +#define CPUSTRING "arm"
Note that I only COMPILED all our code, I did not try to link it.
That would obviously require building all third-party libs for ARM, which is not fun.
So there might be linking errors. Of course, I did not test it too =)

One interesting point in the changes is removal of sys_intrinsics.h usages.
We have two ways to use SSE/AVX in the code now:
 1) void myfunction() {
     #ifdef __SSE2__
       {fast code}
     #else
       {generic code}
     #endif
 2) idSIMDProcessor
Both of these ways always provide generic code to be used if SSE is not available.
The functions from sys_intrinsics.h in its current form do not provide generic implementation, and thus tie TDM to x86 CPU.
If we ever decide to copy more code from D3BFG relying on them, then we'll have to provide generic implementations.

Issue History

Date Modified Username Field Change
02.07.2017 06:14 stgatilov New Issue
02.07.2017 06:14 stgatilov Assigned To => stgatilov
02.07.2017 06:14 stgatilov Status new => assigned
30.08.2017 16:59 nbohr1more Relationship added related to 0002427
30.08.2017 17:02 nbohr1more Relationship added related to 0004435
30.08.2017 17:37 nbohr1more Note Added: 0009131
13.09.2017 21:14 nbohr1more Relationship added related to 0003177
27.06.2018 17:22 stgatilov Note Added: 0010590
27.06.2018 17:22 stgatilov Relationship added related to 0004832
05.07.2018 02:50 stgatilov Relationship added related to 0004857
05.07.2018 07:39 duzenko Note Added: 0010644
20.01.2019 07:18 stgatilov Target Version => TDM 2.08
22.02.2020 13:47 stgatilov Target Version TDM 2.08 =>
21.03.2020 17:40 stgatilov Target Version => TDM 2.09
19.05.2020 16:24 stgatilov Note Added: 0012524
05.12.2020 12:29 stgatilov Target Version TDM 2.09 => TDM 2.10
11.04.2021 15:01 stgatilov Note Added: 0013836
11.04.2021 16:35 stgatilov Note Added: 0013837
12.04.2021 15:18 stgatilov Note Added: 0013840
14.04.2021 06:01 stgatilov Note Edited: 0013840
14.04.2021 06:06 stgatilov Note Added: 0013846
14.04.2021 12:44 stgatilov Note Added: 0013847
14.04.2021 12:45 stgatilov Status assigned => resolved
14.04.2021 12:45 stgatilov Resolution open => fixed
14.04.2021 12:45 stgatilov Fixed in Version => TDM 2.10
14.04.2021 12:45 stgatilov Description Updated