View Issue Details

IDProjectCategoryView StatusLast Update
0004919The Dark ModCodingpublic26.12.2018 11:45
Reporterstgatilov Assigned Toduzenko  
PrioritynormalSeveritycrashReproducibilityrandom
Status resolvedResolutionfixed 
Product VersionTDM 2.07 
Target VersionTDM 2.07Fixed in VersionTDM 2.07 
Summary0004919: TDM sometimes crashes after finishing mission
DescriptionThere are reports from different people about TDM crashing when you finish mission. Unfortunately, the problem seems randomly reproducible.
Steps To ReproduceFinish missions =)
Additional InformationReports:
StiFU:
  http://forums.thedarkmod.com/topic/19756-207-beta-will-start-soon/page-2#entry430992
  http://forums.thedarkmod.com/topic/19756-207-beta-will-start-soon/page-3#entry431131

Nbohr1more:
  http://forums.thedarkmod.com/topic/19756-207-beta-will-start-soon/page-5#entry431288
  http://forums.thedarkmod.com/topic/19756-207-beta-will-start-soon/page-5#entry431344
TagsNo tags attached.

Relationships

has duplicate 0004835 closedstgatilov Somewhere Above the City: debriefing hangs 

Activities

stgatilov

stgatilov

22.12.2018 13:06

administrator  

repro4919.save (1,545,084 bytes)
stgatilov

stgatilov

22.12.2018 13:07

administrator  

Darkmod.cfg (12,460 bytes)
stgatilov

stgatilov

22.12.2018 13:10

administrator   ~0011049

Managed to reproduce quite reliably on Rat's Triumphant.

I have attached my config file and savegame (for beta207-01 build). Debug symbols for the build can be found on developer forums.

Steps to reproduce are:
1) Load game from save.
2) Open the door in front of you and go out.
3) Mission ends: if you see the success screen, repeat from step 1.
Sometimes one repeat is enough, sometimes I need three.

I have even put steps 1-3 into automation macro (attached as 4919.seq). You can replay it by putting the file near console.py, then from within console execute "@file 4919.seq".
stgatilov

stgatilov

22.12.2018 13:14

administrator   ~0011050

Stack trace on crash (Main thread):
> [Inline Frame] TheDarkModx64.exe!GetEffectiveLightRadius() Line 322 C++
     TheDarkModx64.exe!RB_GLSL_DrawInteractions_ShadowMap(const drawSurf_s * surf, bool clear) Line 346 C++
     [Inline Frame] TheDarkModx64.exe!RB_GLSL_DrawLight_ShadowMap() Line 419 C++
     TheDarkModx64.exe!RB_GLSL_DrawInteractions_SingleLight() Line 476 C++
     TheDarkModx64.exe!RB_GLSL_DrawInteractions() Line 595 C++
     TheDarkModx64.exe!RB_STD_DrawView() Line 1210 C++
     TheDarkModx64.exe!RB_DrawView() Line 955 C++
     TheDarkModx64.exe!RB_ExecuteBackEndCommands(const emptyCommand_t * cmds) Line 904 C++
     [Inline Frame] TheDarkModx64.exe!R_IssueRenderCommands(frameData_t * frameData) Line 135 C++
     TheDarkModx64.exe!idRenderSystemLocal::EndFrame(int * frontEndMsec, int * backEndMsec) Line 615 C++
     TheDarkModx64.exe!idSessionLocal::UpdateScreen(bool outOfSequence) Line 2651 C++

It crashes while trying to evaluate:
  float lightRadius = backEnd.vLight->lightDef->parms.radius;
It seems that backEnd.vLight is still alive, but backEnd.vLight->lightDef is already dead (unable to read memory).

I am on shadow maps and have both uncapped FPS and multithreading enabled.
stgatilov

stgatilov

22.12.2018 13:27

administrator   ~0011053

Here is another stack trace:
> TheDarkModx64.exe!RB_CreateSingleDrawInteractions(const drawSurf_s * surf) Line 764 C++
     TheDarkModx64.exe!RB_GLSL_CreateDrawInteractions(const drawSurf_s * surf) Line 192 C++
     TheDarkModx64.exe!RB_GLSL_DrawLight_Stencil() Line 314 C++
     TheDarkModx64.exe!RB_GLSL_DrawInteractions_SingleLight() Line 482 C++
     TheDarkModx64.exe!RB_GLSL_DrawInteractions() Line 595 C++
     TheDarkModx64.exe!RB_STD_DrawView() Line 1210 C++
     TheDarkModx64.exe!RB_DrawView() Line 955 C++
     TheDarkModx64.exe!RB_ExecuteBackEndCommands(const emptyCommand_t * cmds) Line 904 C++
     [Inline Frame] TheDarkModx64.exe!R_IssueRenderCommands(frameData_t * frameData) Line 135 C++

The crash is here:
  // nbohr1more: 0004292 nospecular and nodiffuse fix
  else if ( backEnd.vLight->lightDef->parms.noSpecular ) {
    break;
  } else if ( inter.specularImage ) {
    RB_SubmittInteraction( &inter );
  }

Once again: backEnd.vLight->lightDef points to memory which cannot be read.
stgatilov

stgatilov

22.12.2018 13:36

administrator   ~0011054

BTW, I have found the following comment:

typedef struct viewLight_s {
  struct viewLight_s *next;
  // back end should NOT reference the lightDef, because it can change when running SMP
  idRenderLightLocal *lightDef;
  ...
};

It seems that the code snippets above disobey this rule.
stgatilov

stgatilov

22.12.2018 14:03

administrator   ~0011055

The crash happens even with SMP off.
nbohr1more

nbohr1more

23.12.2018 03:37

developer   ~0011061

Last edited: 23.12.2018 03:38

Other than untangling why the renderer is shutting down improperly at the end of
missions, perhaps we can force caching similar to this stanza in Light.cpp:

// make sure the demonic shader is cached
    if ( spawnArgs.GetString( "mat_demonic", NULL, &demonic_shader ) )
    {
        declManager->FindType( DECL_MATERIAL, demonic_shader );
    }

Not sure the memory leak implications though.

nbohr1more

nbohr1more

23.12.2018 04:41

developer   ~0011062

I just cured the crash in my local SVN with these caches of your above failure points.:

if ( spawnArgs.GetString( "noSpecular", NULL, &nospec_set ) )
    {
        declManager->FindType( DECL_ENTITYDEF, nospec_set );
    }
    
    if ( spawnArgs.GetString( "radius", NULL, &radius_set ) )
    {
        declManager->FindType( DECL_ENTITYDEF, radius_set );
    }

Not sure if I should just keep going and add cache for nearly every light entity arg?
nbohr1more

nbohr1more

23.12.2018 05:46

developer   ~0011064

Added all these common light entity attributes:

    const char *nospec_set;
    const char *radius_set;
    const char *emitter_size_set;
    const char *noShadows_set;
    const char *pointLight_set;
    const char *parallel_set;

Rev 7851 trunk
stgatilov

stgatilov

23.12.2018 07:19

administrator   ~0011065

The problem still happens for me on fresh rev 7851, with the same repro steps and on the same line of code.
Since the changes do not help, I suggest rolling them back.

By the way, I seriously suggest using automation to reproduce the problem.
You need to install newest Python 3, make sure it is added to path, then run:
  python console.py 4919x10.seq
From the /devel/auto/scripts/.
Just make sure that you have latest automation, both .seq files near console.py, and savegame in the proper place.
stgatilov

stgatilov

23.12.2018 07:20

administrator  

4919.seq (159 bytes)
stgatilov

stgatilov

23.12.2018 07:20

administrator  

4919x10.seq (160 bytes)
duzenko

duzenko

23.12.2018 14:48

developer   ~0011073

Can I suggest to copy those lightDef fields to viewLight_s on the frontend?
stgatilov

stgatilov

23.12.2018 15:02

administrator   ~0011076

I guess the solution should be something like that.

To be honest, I hoped there were some generic policy about it. I recall some "frontendGeo"/"backendGeo" stuff where data was copied for backend to read later. Most likely the same approach should be used here. And it is quite surprising that it is not yet established.

It would be also great to hear cabalistic's opinion on this.
duzenko

duzenko

23.12.2018 15:13

developer   ~0011079

@nbohr1more, if revision 7852 works for you then you might want to roll back your caching code
cabalistic

cabalistic

23.12.2018 15:17

developer   ~0011080

Hm, I'm a bit confused. If this bug were down to the backend touching stuff that it shouldn't, shouldn't it have affected 2.06, as well? Or are the parts that access the lightDef new code?

Although we probably do want to stop the backend from accessign the lightDef, if the crash still happens with SMP off, then the root cause in this instance might be a little more complicated. The race condition that the comment warns about would not happen with SMP off, so the vLight might have been corrupted by other means...
stgatilov

stgatilov

23.12.2018 17:15

administrator   ~0011090

The first code snippet (I saw it more frequently) was added after 2.06.
The second one was existing before, although I caught crash there with SMP on, so who knows...
nbohr1more

nbohr1more

24.12.2018 03:01

developer   ~0011093

@duzenko:

You probably won't like this feedback.

I tried rev 7852 with 7851 reverted and instantly got the crash.

I put my changes from 7851 back and have not been able to reproduce the crash.

I don't think either change hurts anything and 7852 seems to move more of this
to the render side so I guess we could leave things as is and merge to 2.07 if
nobody comes up with a better solution. We'll at least reduce the likeliness of
this issue.
stgatilov

stgatilov

24.12.2018 04:24

administrator   ~0011094

I tried to reproduce the issue on rev 7852: about 30 launches with finishing mission 5 times in a row on each launch. The problem does not happen.
nbohr1more

nbohr1more

24.12.2018 04:30

developer   ~0011095

To be clear, I had to revert 7851 in my local copy to get the crash
while on 7852.
stgatilov

stgatilov

24.12.2018 05:08

administrator   ~0011096

I failed to confirm that.

I took 7852 and rolled back 7851 locally.
The issue does not happen for me on the attached config file on Release x64.
Nbohr1more, could you please re-check that?
I suspect that "it must have been rats"...
Also, could you please show what exactly you do to reproduce it?

Overally, given the random nature of this issue, the more confirmations of anything we have, the better.

As for the changes in 7851, I don't understand how could they help.
As far as I see it, the changes only make sure that some parts of decl files are cached into memory from the very beginning. But:
1) The soft radius is checked every frame, so if it was taken from that memory location, it would surely be cached after a frame or too.
2) It seems to me that the memory which is read in the crash sites is not from decl files, and loading something into decl files should not affect it in any way.
duzenko

duzenko

24.12.2018 06:32

developer   ~0011097

@nbohr1more, a stack trace would be great

Do you have any of the two multi-lights shaders on? They're still accessing the lightDef directly but since they're off by default I did not fix them yet.
nbohr1more

nbohr1more

24.12.2018 14:12

developer   ~0011098

I'll try to get a stack from a debug run after Christmas. I'm gonna be tied up with family activities 'till Wednesday.

If I'm lucky, maybe Tuesday night.

The crashes, thus far, have bypassed the auto-debugger prompt in MSVC.
STiFU

STiFU

24.12.2018 14:16

developer   ~0011099

Have you enabled break on all exceptions in the debugger? Might help
stgatilov

stgatilov

24.12.2018 17:35

administrator   ~0011100

STiFU, instead of giving advice you'd better try to reproduce the problem yourself on rev 7851, on rev 7852, and on rev 7852 with rev 7851 rolled back.
It would be more helpful =)))
STiFU

STiFU

24.12.2018 22:17

developer   ~0011104

Will do as soon as I am back from Christmas holidays.
nbohr1more

nbohr1more

25.12.2018 05:20

developer   ~0011106

Thus far I am unable to reproduce with a debug build.
nbohr1more

nbohr1more

25.12.2018 05:43

developer   ~0011107

I wasn't able to reproduce on the initial load of my debug save but then I did an iron man (no save) run of the mission and got a freeze then crash to desktop.

A subsequent load of the save before the exit door of that last run failed to produce the crash.
stgatilov

stgatilov

25.12.2018 05:53

administrator   ~0011108

Ok, I'll create a script to finish the mission from start to end and check it.
Anyway, I have already reverted 7851 in trunk and in process of building next beta package.
nbohr1more

nbohr1more

25.12.2018 06:02

developer   ~0011109

Thank you.

I just did another iron man run without crashing so it seems (at least) the rarity of this failure has increased.
stgatilov

stgatilov

26.12.2018 03:52

administrator  

rats_complete.seq (1,053 bytes)
stgatilov

stgatilov

26.12.2018 03:56

administrator   ~0011120

I have added a sequence for fast playthrough.
It starts new game, teleports to main square, walks a bit there, rotates camera full around, teleports to testament, takes it with some loot nearby, teleports to mission finish area.

I replayed it dozens of times on x64, both Release and Debug, and no crashes happened. I played with uncapped FPS and SMP on. Most of the runs had no soft shadows, but I also did some with soft shadows, both Maps and Stencil.

It looks like the crash does not happen on my simple sequence. Of course, the real playthrough is much more representative, and it might turn out that crash happens depending on some conditions which I do not meet.
I think as soon as we find a reliable way reproduce it, just create another bug about it =)
nbohr1more

nbohr1more

26.12.2018 03:59

developer   ~0011121

Good enough for me. :)

Issue History

Date Modified Username Field Change
19.12.2018 04:52 stgatilov New Issue
19.12.2018 04:53 stgatilov Relationship added related to 0004835
22.12.2018 13:06 stgatilov File Added: repro4919.save
22.12.2018 13:06 stgatilov File Added: 4919.seq
22.12.2018 13:07 stgatilov File Added: Darkmod.cfg
22.12.2018 13:10 stgatilov Note Added: 0011049
22.12.2018 13:14 stgatilov Note Added: 0011050
22.12.2018 13:27 stgatilov Note Added: 0011053
22.12.2018 13:36 stgatilov Note Added: 0011054
22.12.2018 14:03 stgatilov Note Added: 0011055
23.12.2018 03:37 nbohr1more Note Added: 0011061
23.12.2018 03:38 nbohr1more Note Edited: 0011061
23.12.2018 03:38 nbohr1more Note Edited: 0011061
23.12.2018 04:41 nbohr1more Note Added: 0011062
23.12.2018 05:46 nbohr1more Note Added: 0011064
23.12.2018 05:47 nbohr1more Status new => feedback
23.12.2018 07:19 stgatilov Note Added: 0011065
23.12.2018 07:19 stgatilov Status feedback => new
23.12.2018 07:19 stgatilov File Deleted: 4919.seq
23.12.2018 07:20 stgatilov File Added: 4919.seq
23.12.2018 07:20 stgatilov File Added: 4919x10.seq
23.12.2018 13:01 stgatilov Assigned To => cabalistic
23.12.2018 13:01 stgatilov Status new => assigned
23.12.2018 14:48 duzenko Note Added: 0011073
23.12.2018 15:02 stgatilov Note Added: 0011076
23.12.2018 15:02 stgatilov Target Version => TDM 2.07
23.12.2018 15:13 duzenko Note Added: 0011079
23.12.2018 15:17 cabalistic Note Added: 0011080
23.12.2018 17:15 stgatilov Note Added: 0011090
24.12.2018 03:01 nbohr1more Note Added: 0011093
24.12.2018 04:24 stgatilov Note Added: 0011094
24.12.2018 04:30 nbohr1more Note Added: 0011095
24.12.2018 05:08 stgatilov Note Added: 0011096
24.12.2018 06:32 duzenko Note Added: 0011097
24.12.2018 14:12 nbohr1more Note Added: 0011098
24.12.2018 14:16 STiFU Note Added: 0011099
24.12.2018 17:35 stgatilov Note Added: 0011100
24.12.2018 22:17 STiFU Note Added: 0011104
25.12.2018 05:20 nbohr1more Note Added: 0011106
25.12.2018 05:43 nbohr1more Note Added: 0011107
25.12.2018 05:53 stgatilov Note Added: 0011108
25.12.2018 06:02 nbohr1more Note Added: 0011109
26.12.2018 03:52 stgatilov File Added: rats_complete.seq
26.12.2018 03:56 stgatilov Note Added: 0011120
26.12.2018 03:57 stgatilov Assigned To cabalistic => duzenko
26.12.2018 03:57 stgatilov Status assigned => resolved
26.12.2018 03:57 stgatilov Fixed in Version => TDM 2.07
26.12.2018 03:57 stgatilov Resolution open => fixed
26.12.2018 03:59 nbohr1more Note Added: 0011121
26.12.2018 04:00 nbohr1more Relationship deleted related to 0004835
26.12.2018 04:01 nbohr1more Relationship added has duplicate 0004835