View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0002427 | The Dark Mod | Coding | public | 19.11.2010 07:43 | 06.10.2017 03:18 |
Reporter | tels | Assigned To | stgatilov | ||
Priority | normal | Severity | normal | Reproducibility | always |
Status | resolved | Resolution | fixed | ||
Product Version | TDM 1.02 | ||||
Target Version | TDM 2.06 | Fixed in Version | TDM 2.06 | ||
Summary | 0002427: Broken SIMD Support on Linux | ||||
Description | Even with 0002413 fixed, no SIMD provider is chosen. It seems most SIMD functions are simply absent on Linux and would need to be reimplemented. Doing so could give a nice speed boost for certain functions. | ||||
Additional Information | A few tutorials for reference: * http://www.linux-tutorial.info/modules.php?name=Howto&pagename=Parallel-Processing-HOWTO/Parallel-Processing-HOWTO-4.html * http://www.kernel.org/pub/linux/kernel/people/geoff/cell/ps3-linux-docs/ps3-linux-docs-latest/CellProgrammingTutorial/BasicsOfSIMDProgramming.html * http://www.codeproject.com/KB/recipes/sseintro.aspx http://forums.thedarkmod.com/topic/11828-simd-module-and-cpuid-on-linux/#entry234725 | ||||
Tags | No tags attached. | ||||
I looked through "Simd_SSE.cpp". It contains only an MSVC implementation, which is written in inline assembly. Damn assembler addicts =) There is also a tiny section under "#if defined(MACOS_X) && defined(__i386__)" which uses Intel intrinsics, but it has only about three functions. I also downloaded quake4sdk. It has GCC assembly for Simd_MMX with cool comments, but still has no GCC support in Simd_SSE.

There are three possible solutions:
1. Try converting the assembly. I think it is almost impossible though.
2. Write Simd_SSE with Intel SSE intrinsics. This way it will compile on MSVC, GCC, ICC and probably some other compilers. A lot of work as I see it...
3. Kick GCC into auto-vectorization. Maybe it can vectorize the SIMD_Generic code?

Here is the comment from q4sdk "Simd_MMX.cpp": /* gcc inline assembly: inline assembly for the MMX SIMD processor written there mostly as an experiment does not increase performance on timedemos ( nor did I expect it to, libc-i686 does the job very well already ) although the newer gcc can read inline asm using the intel syntax ( with minor reformatting and escaping of register names ), it's still a long way from providing an easy compatibility with MSVC inline assembly mostly because of the input/output registers, the clobber lists and generally all the things gcc tries to be clever about when you give it a piece of inline assembly ( typically, compiling this at -O1 or better will produce bad code, and some of it won't compile with -fPIC either ) at this point, writing everything in nasm from the ground up, or using intel's compiler to produce the Simd_*.o objects is still the best alternative */

Update: I ran a TDM release build under AMD CodeAnalyst, with and without com_forceGenericSIMD 1, and compared the CPU time spent in SIMD routines. It is about 5% with both the SSE and the generic versions. I can't see any SSE speedup in these stats. Can anyone prove that the SSE version of SIMD gives any speedup?
I thought about writing the most time-consuming SIMD routines in SSE intrinsics but now I think it is useless anyway. |
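For reference, option 2 (intrinsics instead of inline assembly) could look roughly like the sketch below. It is not taken from Simd_SSE.cpp: `DotSSE` is a hypothetical helper, and the only assumption is an x86 target with SSE, where the same source builds with MSVC, GCC, Clang and ICC because it uses nothing but `<xmmintrin.h>`.

```cpp
#include <xmmintrin.h>  // SSE intrinsics, shipped with MSVC, GCC, Clang and ICC on x86
#include <cstddef>

// Hypothetical helper (not part of the TDM or Doom 3 sources):
// dot product of two float arrays, four lanes at a time.
float DotSSE(const float *a, const float *b, std::size_t count) {
    __m128 acc = _mm_setzero_ps();
    std::size_t i = 0;

    // Main loop: unaligned loads, so the caller needs no special alignment.
    for (; i + 4 <= count; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }

    // Horizontal sum of the four partial sums.
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    // Scalar tail for the remaining 0..3 elements.
    for (; i < count; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

There are no clobber lists or register constraints to maintain here, which is the main attraction compared to porting the MSVC inline assembly to GCC syntax. Note that the summation order differs from a plain scalar loop, so results can differ in the last bits for general floating-point input.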
|
It probably depends on what exactly you do. One thing I thought it might be good for is using a LODE to generate a HUGE rendermodel: at the end it calls SIMDProcessor->MinMax( newSurf->geometry->bounds[0], newSurf->geometry->bounds[1], newSurf->geometry->verts, newSurf->geometry->numVerts ); and this might get sped up a lot. (But then, maybe the time it spends there is only a fraction?) |
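To make the MinMax case concrete, here is a minimal sketch of a bounds computation over a vertex array using SSE intrinsics. `Vec3`, `Vert` and `MinMaxSSE` are hypothetical stand-ins for illustration only; the real idDrawVert layout and the MinMax implementation in TDM may differ.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cfloat>       // FLT_MAX

// Hypothetical stand-ins, NOT the engine's idVec3 / idDrawVert.
struct Vec3 { float x, y, z; };
struct Vert { Vec3 xyz; float other[5]; };  // position plus unrelated attributes

// Computes the axis-aligned bounds of all vertex positions.
// With numVerts == 0 the result is an "inverted" box (min = FLT_MAX, max = -FLT_MAX).
void MinMaxSSE(Vec3 &mn, Vec3 &mx, const Vert *verts, int numVerts) {
    __m128 vmin = _mm_set1_ps(FLT_MAX);
    __m128 vmax = _mm_set1_ps(-FLT_MAX);

    for (int i = 0; i < numVerts; ++i) {
        const Vec3 &p = verts[i].xyz;
        // _mm_set_ps takes lanes from highest to lowest, so lane 0 = x.
        // The fourth lane is padding and is ignored at the end.
        __m128 v = _mm_set_ps(0.0f, p.z, p.y, p.x);
        vmin = _mm_min_ps(vmin, v);
        vmax = _mm_max_ps(vmax, v);
    }

    float lo[4], hi[4];
    _mm_storeu_ps(lo, vmin);
    _mm_storeu_ps(hi, vmax);
    mn = Vec3{lo[0], lo[1], lo[2]};
    mx = Vec3{hi[0], hi[1], hi[2]};
}
```

All three components are compared in one min and one max instruction per vertex. Building the register with `_mm_set_ps` keeps the sketch safe for any struct layout; a tuned version would more likely load the position directly from memory, which depends on the actual vertex format.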
|
Won't fix it unless someone finds a good reason for doing it. | |
I am reopening this issue to remind me to look into it on the Linux side, esp. for rendermodel generation (where copying 1 million tris takes a very long time). | |
The situation has changed since last time.
First: TDM was optimized =)
Second: the MSVC2013 profiler is much better than CodeAnalyst.

Now several SIMD functions take considerable time, because they are used to process dynamic meshes each frame. Also, we have a working 64-bit build now, which suffers from the lack of SIMD acceleration just as Linux does.

I have just implemented SIMD versions of whatever I noticed in the profiler:
DeriveTangents
NormalizeTangents
TransformVerts
MinMax(const idDrawVert *, int)

I did it from scratch, so the approach is often different. According to benchmark results, some of my functions are faster than ID's, some are slower (although I do not trust the benchmark for TransformVerts). Anyway, my version is a good step up from the generic implementation =) I think I won't override ID's original functions in the Win32 case. Also, I'll probably commit these improvements only when the Linux build is fixed: http://forums.thedarkmod.com/topic/18940-reviving-linux-build/ |
|
I have committed the changes in revision 6991. Although these functions take a considerable share of TDM's CPU time, and they are faster now, overall FPS does not change much. I guess we are limited by CPU time wasted in the OpenGL driver =( |
|
Date Modified | Username | Field | Change |
---|---|---|---|
19.11.2010 07:43 | tels | New Issue | |
19.11.2010 07:44 | tels | Relationship added | child of 0002413 |
19.11.2010 07:45 | tels | Additional Information Updated | |
14.01.2011 14:52 | stgatilov | Note Added: 0003454 | |
14.01.2011 16:20 | stgatilov | Note Edited: 0003454 | |
16.01.2011 10:26 | tels | Note Added: 0003458 | |
29.01.2011 07:14 | stgatilov | Assigned To | => stgatilov |
29.01.2011 07:14 | stgatilov | Status | new => assigned |
29.01.2011 07:17 | stgatilov | Note Added: 0003503 | |
29.01.2011 07:17 | stgatilov | Status | assigned => resolved |
29.01.2011 07:17 | stgatilov | Resolution | open => won't fix |
29.01.2011 07:18 | stgatilov | Assigned To | stgatilov => |
29.01.2011 09:43 | tels | Note Added: 0003505 | |
29.01.2011 09:43 | tels | Assigned To | => tels |
29.01.2011 09:43 | tels | Status | resolved => assigned |
22.11.2011 19:21 | tels | Assigned To | tels => |
03.01.2015 16:15 | grayman | Status | assigned => new |
24.06.2017 10:23 | stgatilov | Note Added: 0008925 | |
24.06.2017 10:23 | stgatilov | Assigned To | => stgatilov |
24.06.2017 10:23 | stgatilov | Status | new => assigned |
25.06.2017 22:50 | nbohr1more | Additional Information Updated | |
02.07.2017 06:10 | stgatilov | Note Added: 0008942 | |
02.07.2017 06:11 | stgatilov | Status | assigned => resolved |
02.07.2017 06:11 | stgatilov | Fixed in Version | => TDM 2.06 |
02.07.2017 06:11 | stgatilov | Resolution | won't fix => fixed |
02.07.2017 06:11 | stgatilov | Target Version | => TDM 2.06 |
30.08.2017 16:58 | nbohr1more | Relationship added | related to 0004613 |
30.08.2017 16:59 | nbohr1more | Relationship added | related to 0004550 |
06.10.2017 03:18 | nbohr1more | Relationship added | related to 0003594 |