View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0004919 | The Dark Mod | Coding | public | 19.12.2018 04:52 | 26.12.2018 11:45 |
Reporter | stgatilov | Assigned To | duzenko | ||
Priority | normal | Severity | crash | Reproducibility | random |
Status | resolved | Resolution | fixed | ||
Product Version | TDM 2.07 | ||||
Target Version | TDM 2.07 | Fixed in Version | TDM 2.07 | ||
Summary | 0004919: TDM sometimes crashes after finishing mission | ||||
Description | There are reports from different people about TDM crashing when you finish mission. Unfortunately, the problem seems randomly reproducible. | ||||
Steps To Reproduce | Finish missions =) | ||||
Additional Information | Reports: StiFU: http://forums.thedarkmod.com/topic/19756-207-beta-will-start-soon/page-2#entry430992 http://forums.thedarkmod.com/topic/19756-207-beta-will-start-soon/page-3#entry431131 Nbohr1more: http://forums.thedarkmod.com/topic/19756-207-beta-will-start-soon/page-5#entry431288 http://forums.thedarkmod.com/topic/19756-207-beta-will-start-soon/page-5#entry431344 | ||||
Tags | No tags attached. | ||||
Attached Files | |||||
Managed to reproduce quite reliably on Rat's Triumphant. I have attached my config file and savegame (for beta207-01 build). Debug symbols for the build can be found on developer forums. Steps to reproduce are: 1) Load game from save. 2) Open the door in front of you and go out. 3) Mission ends: if you see the success screen, repeat from step 1. Sometimes one repeat is enough, sometimes I need three. I have even put steps 1-3 into automation macro (attached as 4919.seq). You can replay it by putting the file near console.py, then from within console execute "@file 4919.seq". |
|
Stack trace on crash (Main thread): > [Inline Frame] TheDarkModx64.exe!GetEffectiveLightRadius() Line 322 C++ TheDarkModx64.exe!RB_GLSL_DrawInteractions_ShadowMap(const drawSurf_s * surf, bool clear) Line 346 C++ [Inline Frame] TheDarkModx64.exe!RB_GLSL_DrawLight_ShadowMap() Line 419 C++ TheDarkModx64.exe!RB_GLSL_DrawInteractions_SingleLight() Line 476 C++ TheDarkModx64.exe!RB_GLSL_DrawInteractions() Line 595 C++ TheDarkModx64.exe!RB_STD_DrawView() Line 1210 C++ TheDarkModx64.exe!RB_DrawView() Line 955 C++ TheDarkModx64.exe!RB_ExecuteBackEndCommands(const emptyCommand_t * cmds) Line 904 C++ [Inline Frame] TheDarkModx64.exe!R_IssueRenderCommands(frameData_t * frameData) Line 135 C++ TheDarkModx64.exe!idRenderSystemLocal::EndFrame(int * frontEndMsec, int * backEndMsec) Line 615 C++ TheDarkModx64.exe!idSessionLocal::UpdateScreen(bool outOfSequence) Line 2651 C++ It crashes while trying to evaluate: float lightRadius = backEnd.vLight->lightDef->parms.radius; It seems that backEnd.vLight is still alive, but backEnd.vLight->lightDef is already dead (unable to read memory). I am on shadow maps and have both uncapped FPS and multithreading enabled. |
|
Here is another stack trace: > TheDarkModx64.exe!RB_CreateSingleDrawInteractions(const drawSurf_s * surf) Line 764 C++ TheDarkModx64.exe!RB_GLSL_CreateDrawInteractions(const drawSurf_s * surf) Line 192 C++ TheDarkModx64.exe!RB_GLSL_DrawLight_Stencil() Line 314 C++ TheDarkModx64.exe!RB_GLSL_DrawInteractions_SingleLight() Line 482 C++ TheDarkModx64.exe!RB_GLSL_DrawInteractions() Line 595 C++ TheDarkModx64.exe!RB_STD_DrawView() Line 1210 C++ TheDarkModx64.exe!RB_DrawView() Line 955 C++ TheDarkModx64.exe!RB_ExecuteBackEndCommands(const emptyCommand_t * cmds) Line 904 C++ [Inline Frame] TheDarkModx64.exe!R_IssueRenderCommands(frameData_t * frameData) Line 135 C++ The crash is here: // nbohr1more: 0004292 nospecular and nodiffuse fix else if ( backEnd.vLight->lightDef->parms.noSpecular ) { break; } else if ( inter.specularImage ) { RB_SubmittInteraction( &inter ); } Once again: backEnd.vLight->lightDef points to memory which cannot be read. |
|
BTW, I have found the following comment: typedef struct viewLight_s { struct viewLight_s *next; // back end should NOT reference the lightDef, because it can change when running SMP idRenderLightLocal *lightDef; ... }; It seems that the code snippets above disobey this rule. |
|
The crash happens even with SMP off. | |
Other than untangling why the renderer is shutting down improperly at the end of missions, perhaps we can force caching similar to this stanza in Light.cpp: // make sure the demonic shader is cached if ( spawnArgs.GetString( "mat_demonic", NULL, &demonic_shader ) ) { declManager->FindType( DECL_MATERIAL, demonic_shader ); } Not sure the memory leak implications though. |
|
I just cured the crash in my local SVN with these caches of your above failure points.: if ( spawnArgs.GetString( "noSpecular", NULL, &nospec_set ) ) { declManager->FindType( DECL_ENTITYDEF, nospec_set ); } if ( spawnArgs.GetString( "radius", NULL, &radius_set ) ) { declManager->FindType( DECL_ENTITYDEF, radius_set ); } Not sure if I should just keep going and add cache for nearly every light entity arg? |
|
Added all these common light entity attributes: const char *nospec_set; const char *radius_set; const char *emitter_size_set; const char *noShadows_set; const char *pointLight_set; const char *parallel_set; Rev 7851 trunk |
|
The problem still happens for me on fresh rev 7851, with the same repro steps and on the same line of code. Since the changes do not help, I suggest rolling them back. By the way, I seriously suggest using automation to reproduce the problem. You need to install newest Python 3, make sure it is added to path, then run: python console.py 4919x10.seq From the /devel/auto/scripts/. Just make sure that you have latest automation, both .seq files near console.py, and savegame in the proper place. |
|
Can I suggest to copy those lightDef fields to viewLight_s on the frontend? | |
I guess the solution should be something like that. To be honest, I hoped there were some generic policy about it. I recall some "frontendGeo"/"backendGeo" stuff where data was copied for backend to read later. Most likely the same approach should be used here. And it is quite surprising that it is not yet established. It would be also great to hear cabalistic's opinion on this. |
|
@nbohr1more, if revision 7852 works for you then you might want to roll back your caching code | |
Hm, I'm a bit confused. If this bug were down to the backend touching stuff that it shouldn't, shouldn't it have affected 2.06, as well? Or are the parts that access the lightDef new code? Although we probably do want to stop the backend from accessign the lightDef, if the crash still happens with SMP off, then the root cause in this instance might be a little more complicated. The race condition that the comment warns about would not happen with SMP off, so the vLight might have been corrupted by other means... |
|
The first code snippet (I saw it more frequently) was added after 2.06. The second one was existing before, although I caught crash there with SMP on, so who knows... |
|
@duzenko: You probably won't like this feedback. I tried rev 7852 with 7851 reverted and instantly got the crash. I put my changes from 7851 back and have not been able to reproduce the crash. I don't think either change hurts anything and 7852 seems to move more of this to the render side so I guess we could leave things as is and merge to 2.07 if nobody comes up with a better solution. We'll at least reduce the likeliness of this issue. |
|
I tried to reproduce the issue on rev 7852: about 30 launches with finishing mission 5 times in a row on each launch. The problem does not happen. | |
To be clear, I had to revert 7851 in my local copy to get the crash while on 7852. |
|
I failed to confirm that. I took 7852 and rolled back 7851 locally. The issue does not happen for me on the attached config file on Release x64. Nbohr1more, could you please re-check that? I suspect that "it must have been rats"... Also, could you please show what exactly you do to reproduce it? Overally, given the random nature of this issue, the more confirmations of anything we have, the better. As for the changes in 7851, I don't understand how could they help. As far as I see it, the changes only make sure that some parts of decl files are cached into memory from the very beginning. But: 1) The soft radius is checked every frame, so if it was taken from that memory location, it would surely be cached after a frame or too. 2) It seems to me that the memory which is read in the crash sites is not from decl files, and loading something into decl files should not affect it in any way. |
|
@nbohr1more, a stack trace would be great Do you have any of the two multi-lights shaders on? They're still accessing the lightDef directly but since they're off by default I did not fix them yet. |
|
I'll try to get a stack from a debug run after Christmas. I'm gonna be tied up with family activities 'till Wednesday. If I'm lucky, maybe Tuesday night. The crashes, thus far, have bypassed the auto-debugger prompt in MSVC. |
|
Have you enabled break on all exceptions in the debugger? Might help | |
STiFU, instead of giving advice you'd better try to reproduce the problem yourself on rev 7851, on rev 7852, and on rev 7852 with rev 7851 rolled back. It would be more helpful =))) |
|
Will do as soon as I am back from Christmas holidays. | |
Thus far I am unable to reproduce with a debug build. | |
I wasn't able to reproduce on the initial load of my debug save but then I did an iron man (no save) run of the mission and got a freeze then crash to desktop. A subsequent load of the save before the exit door of that last run failed to produce the crash. |
|
Ok, I'll create a script to finish the mission from start to end and check it. Anyway, I have already reverted 7851 in trunk and in process of building next beta package. |
|
Thank you. I just did another iron man run without crashing so it seems (at least) the rarity of this failure has increased. |
|
I have added a sequence for fast playthrough. It starts new game, teleports to main square, walks a bit there, rotates camera full around, teleports to testament, takes it with some loot nearby, teleports to mission finish area. I replayed it dozens of times on x64, both Release and Debug, and no crashes happened. I played with uncapped FPS and SMP on. Most of the runs had no soft shadows, but I also did some with soft shadows, both Maps and Stencil. It looks like the crash does not happen on my simple sequence. Of course, the real playthrough is much more representative, and it might turn out that crash happens depending on some conditions which I do not meet. I think as soon as we find a reliable way reproduce it, just create another bug about it =) |
|
Good enough for me. :) | |
Date Modified | Username | Field | Change |
---|---|---|---|
19.12.2018 04:52 | stgatilov | New Issue | |
19.12.2018 04:53 | stgatilov | Relationship added | related to 0004835 |
22.12.2018 13:06 | stgatilov | File Added: repro4919.save | |
22.12.2018 13:06 | stgatilov | File Added: 4919.seq | |
22.12.2018 13:07 | stgatilov | File Added: Darkmod.cfg | |
22.12.2018 13:10 | stgatilov | Note Added: 0011049 | |
22.12.2018 13:14 | stgatilov | Note Added: 0011050 | |
22.12.2018 13:27 | stgatilov | Note Added: 0011053 | |
22.12.2018 13:36 | stgatilov | Note Added: 0011054 | |
22.12.2018 14:03 | stgatilov | Note Added: 0011055 | |
23.12.2018 03:37 | nbohr1more | Note Added: 0011061 | |
23.12.2018 03:38 | nbohr1more | Note Edited: 0011061 | |
23.12.2018 03:38 | nbohr1more | Note Edited: 0011061 | |
23.12.2018 04:41 | nbohr1more | Note Added: 0011062 | |
23.12.2018 05:46 | nbohr1more | Note Added: 0011064 | |
23.12.2018 05:47 | nbohr1more | Status | new => feedback |
23.12.2018 07:19 | stgatilov | Note Added: 0011065 | |
23.12.2018 07:19 | stgatilov | Status | feedback => new |
23.12.2018 07:19 | stgatilov | File Deleted: 4919.seq | |
23.12.2018 07:20 | stgatilov | File Added: 4919.seq | |
23.12.2018 07:20 | stgatilov | File Added: 4919x10.seq | |
23.12.2018 13:01 | stgatilov | Assigned To | => cabalistic |
23.12.2018 13:01 | stgatilov | Status | new => assigned |
23.12.2018 14:48 | duzenko | Note Added: 0011073 | |
23.12.2018 15:02 | stgatilov | Note Added: 0011076 | |
23.12.2018 15:02 | stgatilov | Target Version | => TDM 2.07 |
23.12.2018 15:13 | duzenko | Note Added: 0011079 | |
23.12.2018 15:17 | cabalistic | Note Added: 0011080 | |
23.12.2018 17:15 | stgatilov | Note Added: 0011090 | |
24.12.2018 03:01 | nbohr1more | Note Added: 0011093 | |
24.12.2018 04:24 | stgatilov | Note Added: 0011094 | |
24.12.2018 04:30 | nbohr1more | Note Added: 0011095 | |
24.12.2018 05:08 | stgatilov | Note Added: 0011096 | |
24.12.2018 06:32 | duzenko | Note Added: 0011097 | |
24.12.2018 14:12 | nbohr1more | Note Added: 0011098 | |
24.12.2018 14:16 | STiFU | Note Added: 0011099 | |
24.12.2018 17:35 | stgatilov | Note Added: 0011100 | |
24.12.2018 22:17 | STiFU | Note Added: 0011104 | |
25.12.2018 05:20 | nbohr1more | Note Added: 0011106 | |
25.12.2018 05:43 | nbohr1more | Note Added: 0011107 | |
25.12.2018 05:53 | stgatilov | Note Added: 0011108 | |
25.12.2018 06:02 | nbohr1more | Note Added: 0011109 | |
26.12.2018 03:52 | stgatilov | File Added: rats_complete.seq | |
26.12.2018 03:56 | stgatilov | Note Added: 0011120 | |
26.12.2018 03:57 | stgatilov | Assigned To | cabalistic => duzenko |
26.12.2018 03:57 | stgatilov | Status | assigned => resolved |
26.12.2018 03:57 | stgatilov | Fixed in Version | => TDM 2.07 |
26.12.2018 03:57 | stgatilov | Resolution | open => fixed |
26.12.2018 03:59 | nbohr1more | Note Added: 0011121 | |
26.12.2018 04:00 | nbohr1more | Relationship deleted | related to 0004835 |
26.12.2018 04:01 | nbohr1more | Relationship added | has duplicate 0004835 |