View Issue Details

ID: 0002427
Project: The Dark Mod
Category: Coding
View Status: public
Last Update: 06.10.2017 03:18
Reporter: tels
Assigned To: stgatilov
Status: resolved
Resolution: fixed
Product Version: TDM 1.02
Target Version: TDM 2.06
Fixed in Version: TDM 2.06
Summary: 0002427: Broken SIMD Support on Linux
Description: Even with 0002413 fixed, no SIMD provider is chosen. It seems most SIMD functions are simply absent on Linux and would need to be reimplemented. Doing so could give a nice speed boost for certain functions.
Additional Information: A few tutorials for reference:

Tags: No tags attached.


related to 0004613 (resolved, grayman): Unexpected supernatural event in "The Golden Skull"
related to 0004550 (assigned, stgatilov): Cleanup of SIMD code
related to 0003594 (closed, stgatilov): amd64 support
child of 0002413 (closed, tels): Broken CPUID on Linux




14.01.2011 14:52

administrator   ~0003454

Last edited: 14.01.2011 16:20


I looked through "Simd_SSE.cpp". It contains only an MSVC implementation, written in inline assembly. Damn assembler addicts =) There is also a tiny section under "#if defined(MACOS_X) && defined(__i386__)" which uses Intel intrinsics, but it covers only about three functions. I also downloaded quake4sdk. It has GCC assembly for Simd_MMX with cool comments, but still has no GCC support in Simd_SSE.
There are three possible solutions:
1. Try converting the assembly. I think it is almost impossible, though.
2. Write Simd_SSE with Intel SSE intrinsics. This way it will compile on MSVC, GCC, ICC and probably some other compilers. A lot of work, as I see it...
3. Kick GCC into auto-vectorization. Maybe it can vectorize the SIMD_Generic code?
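
As a rough illustration of option 2, here is a minimal sketch of one routine written with SSE intrinsics instead of inline assembly. The names AddGeneric, AddSSE and SKETCH_HAS_SSE are made up for this example and are not taken from Simd_SSE.cpp. The same source is meant to build with MSVC, GCC and ICC, and it falls back to the plain loop where SSE is not available.

```cpp
#include <cassert>

// Detect SSE at compile time (GCC/Clang/ICC define __SSE__, MSVC uses _M_X64 / _M_IX86_FP).
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // SSE intrinsics header, available in all of these compilers
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Hypothetical generic routine: dst[i] = a[i] + b[i].
inline void AddGeneric(float *dst, const float *a, const float *b, int count) {
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        dst[i] = a[i] + b[i];
    }
}

// Same routine with intrinsics: four floats per iteration, scalar loop for the tail.
inline void AddSSE(float *dst, const float *a, const float *b, int count) {
    assert(count >= 0);
    int i = 0;
#if SKETCH_HAS_SSE
    for (; i + 4 <= count; i += 4) {
        const __m128 va = _mm_loadu_ps(a + i);  // unaligned loads, no alignment assumed
        const __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif
    for (; i < count; i++) {  // remaining 0..3 elements (or everything without SSE)
        dst[i] = a[i] + b[i];
    }
}
```

Unaligned loads are used here only to keep the sketch safe for arbitrary pointers; a real implementation would probably want aligned paths as well.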

Here is the comment from q4sdk "Simd_MMX.cpp":
gcc inline assembly:
inline assembly for the MMX SIMD processor written there mostly as an experiment
does not increase performance on timedemos ( nor did I expect it to, libc-i686 does the job very well already )

although the newer gcc can read inline asm using the intel syntax ( with minor reformatting and escaping of register names ),
it's still a long way from providing an easy compatibility with MSVC inline assembly
mostly because of the input/output registers, the clobber lists
and generally all the things gcc tries to be clever about when you give it a piece of inline assembly
( typically, compiling this at -O1 or better will produce bad code, and some of it won't compile with -fPIC either )

at this point, writing everything in nasm from the ground up, or using intel's compiler to produce the Simd_*.o objects is
still the best alternative

Update: I ran the TDM release build under AMD CodeAnalyst with com_forceGenericSIMD 1 and compared the CPU time spent in SIMD routines. It is about 5% with both the SSE and generic versions, so I can't see any SSE speedup in these stats. Can anyone show that the SSE version of SIMD gives any speedup? I thought about writing the most time-consuming SIMD routines with SSE intrinsics, but now I think it is useless anyway.



16.01.2011 10:26

developer   ~0003458

It probably depends on what exactly you do. One thing I thought it might be good for is using a LODE to generate a HUGE rendermodel; at the end it calls

        SIMDProcessor->MinMax( newSurf->geometry->bounds[0], newSurf->geometry->bounds[1], newSurf->geometry->verts, newSurf->geometry->numVerts );

and this might get sped up a lot. (But then, maybe the time it spends there is only a fraction?)
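
For illustration, a bounds computation of this kind could look roughly like the sketch below when written with SSE intrinsics. SketchVert is a hypothetical stand-in for the real vertex type (its layout is an assumption, not idDrawVert's), and MinMaxGeneric / MinMaxSSE are made-up names, not the engine's functions.

```cpp
#include <cassert>
#include <cfloat>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Hypothetical vertex: position first, then two more floats, so that a
// 16-byte load starting at the position stays inside the struct.
struct SketchVert {
    float xyz[3];
    float st[2];
};

// Plain reference version: per-component min and max over all positions.
inline void MinMaxGeneric(float mins[3], float maxs[3], const SketchVert *verts, int count) {
    assert(count >= 0);
    for (int k = 0; k < 3; k++) {
        mins[k] = FLT_MAX;
        maxs[k] = -FLT_MAX;
    }
    for (int i = 0; i < count; i++) {
        for (int k = 0; k < 3; k++) {
            if (verts[i].xyz[k] < mins[k]) mins[k] = verts[i].xyz[k];
            if (verts[i].xyz[k] > maxs[k]) maxs[k] = verts[i].xyz[k];
        }
    }
}

// SSE version: x, y, z are handled in one register; the fourth lane is ignored.
inline void MinMaxSSE(float mins[3], float maxs[3], const SketchVert *verts, int count) {
#if SKETCH_HAS_SSE
    assert(count >= 0);
    __m128 vmin = _mm_set1_ps(FLT_MAX);
    __m128 vmax = _mm_set1_ps(-FLT_MAX);
    for (int i = 0; i < count; i++) {
        // Loads xyz plus st[0]; the extra lane is dropped when results are stored.
        const __m128 p = _mm_loadu_ps(reinterpret_cast<const float *>(&verts[i]));
        vmin = _mm_min_ps(vmin, p);
        vmax = _mm_max_ps(vmax, p);
    }
    float lo[4];
    float hi[4];
    _mm_storeu_ps(lo, vmin);
    _mm_storeu_ps(hi, vmax);
    for (int k = 0; k < 3; k++) {
        mins[k] = lo[k];
        maxs[k] = hi[k];
    }
#else
    MinMaxGeneric(mins, maxs, verts, count);  // no SSE on this target
#endif
}
```

Whether this beats the generic loop for a huge rendermodel would still have to be measured; the loop is likely to be limited by memory bandwidth once the vertex data no longer fits in cache.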


29.01.2011 07:17

administrator   ~0003503

Won't fix it unless someone finds a good reason for doing it.


29.01.2011 09:43

developer   ~0003505

I am reopening this issue to remind me to look into it on the Linux side, esp. for rendermodel generation (where copying 1 million tris takes a very long time).


24.06.2017 10:23

administrator   ~0008925

The situation has changed since last time.
First: TDM was optimized =)
Second: the MSVC2013 profiler is much better than CodeAnalyst.

Now several SIMD functions take considerable time, because they are used to process dynamic meshes each frame. Also, we have a working 64-bit build now, which suffers from the lack of SIMD acceleration just as Linux does.

I have just implemented SIMD versions of (whatever I noticed in profiler):
  MinMax(const idDrawVert *, int)
I did it from scratch, so the approach is often different. According to benchmark results, some of my functions are faster than ID's, some are slower (although I do not trust the benchmark for TransformVerts). Anyway, my version is a good step up from the generic implementation =)
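
For readers who want to repeat such a comparison, a minimal timing harness could look like the sketch below. It is not the benchmark used here; AverageMicroseconds and SumGeneric are hypothetical names, and a real test would call the generic and SIMD versions of the same routine on identical input.

```cpp
#include <cassert>
#include <chrono>

// Runs fn once as a warm-up, then `runs` more times, and returns the
// average wall-clock time per call in microseconds.
template <typename Fn>
double AverageMicroseconds(Fn &&fn, int runs) {
    assert(runs > 0);
    using Clock = std::chrono::steady_clock;
    fn();  // warm-up call, so caches and lazy initialization do not skew the result
    const Clock::time_point start = Clock::now();
    for (int r = 0; r < runs; r++) {
        fn();
    }
    const std::chrono::duration<double, std::micro> elapsed = Clock::now() - start;
    return elapsed.count() / runs;
}

// Hypothetical routine under test: sum of an array of floats.
inline float SumGeneric(const float *src, int count) {
    float sum = 0.0f;
    for (int i = 0; i < count; i++) {
        sum += src[i];
    }
    return sum;
}
```

Writing the result of each call to a volatile variable keeps the compiler from optimizing the measured work away. Micro-benchmarks like this can still mislead, for example when the working set fits in cache in the benchmark but not in the game.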

I think I won't override ID's original functions in the Win32 case. Also, I'll probably commit these improvements only when the Linux build is fixed:


02.07.2017 06:10

administrator   ~0008942

I have committed the changes in revision 6991.
Despite the fact that these functions take a considerable share of TDM's CPU time, and they are faster now, overall FPS does not change much. I guess we are limited by CPU time wasted in the OpenGL driver =(

Issue History

Date Modified Username Field Change
19.11.2010 07:43 tels New Issue
19.11.2010 07:44 tels Relationship added child of 0002413
19.11.2010 07:45 tels Additional Information Updated
14.01.2011 14:52 stgatilov Note Added: 0003454
14.01.2011 16:20 stgatilov Note Edited: 0003454
16.01.2011 10:26 tels Note Added: 0003458
29.01.2011 07:14 stgatilov Assigned To => stgatilov
29.01.2011 07:14 stgatilov Status new => assigned
29.01.2011 07:17 stgatilov Note Added: 0003503
29.01.2011 07:17 stgatilov Status assigned => resolved
29.01.2011 07:17 stgatilov Resolution open => won't fix
29.01.2011 07:18 stgatilov Assigned To stgatilov =>
29.01.2011 09:43 tels Note Added: 0003505
29.01.2011 09:43 tels Assigned To => tels
29.01.2011 09:43 tels Status resolved => assigned
22.11.2011 19:21 tels Assigned To tels =>
03.01.2015 16:15 grayman Status assigned => new
24.06.2017 10:23 stgatilov Note Added: 0008925
24.06.2017 10:23 stgatilov Assigned To => stgatilov
24.06.2017 10:23 stgatilov Status new => assigned
25.06.2017 22:50 nbohr1more Additional Information Updated
02.07.2017 06:10 stgatilov Note Added: 0008942
02.07.2017 06:11 stgatilov Status assigned => resolved
02.07.2017 06:11 stgatilov Fixed in Version => TDM 2.06
02.07.2017 06:11 stgatilov Resolution won't fix => fixed
02.07.2017 06:11 stgatilov Target Version => TDM 2.06
30.08.2017 16:58 nbohr1more Relationship added related to 0004613
30.08.2017 16:59 nbohr1more Relationship added related to 0004550
06.10.2017 03:18 nbohr1more Relationship added related to 0003594