View Issue Details

IDProjectCategoryView StatusLast Update
0004848The Dark ModCodingpublic30.07.2018 13:55
Reporterduzenko Assigned Toduzenko  
PriorityhighSeveritycrashReproducibilityalways
Status resolvedResolutionfixed 
Product VersionSVN 
Target VersionTDM 2.07Fixed in VersionTDM 2.07 
Summary0004848: Crashes when quick loading | Lightgem as Subview
DescriptionNot sure if it's related to the VBO changes, but still assigning to @cabalistic just in case.
Steps To ReproduceI usually crash the second or third time I quick load on maps like Inn Business or Rightful Property.
Additional Informationread access violation.
backEnd.vLight->lightDef was 0x49BC71E8.

> TheDarkModNoTools.exe!RB_CreateSingleDrawInteractions(const drawSurf_s * surf) Line 801 C++
     TheDarkModNoTools.exe!RB_GLSL_CreateDrawInteractions(const drawSurf_s * surf) Line 179 C++
     [Inline Frame] TheDarkModNoTools.exe!RB_GLSL_DrawLight_Stencil() Line 261 C++
     TheDarkModNoTools.exe!RB_GLSL_DrawInteractions() Line 346 C++
     TheDarkModNoTools.exe!RB_STD_DrawView() Line 1284 C++
     TheDarkModNoTools.exe!RB_DrawView() Line 872 C++
     TheDarkModNoTools.exe!RB_ExecuteBackEndCommands(const emptyCommand_t * cmds) Line 593 C++
     [Inline Frame] TheDarkModNoTools.exe!R_IssueRenderCommands(frameData_t * frameData) Line 135 C++
     TheDarkModNoTools.exe!idRenderSystemLocal::EndFrame(int * frontEndMsec, int * backEndMsec) Line 645 C++
     TheDarkModNoTools.exe!idSessionLocal::UpdateScreen(bool outOfSequence) Line 2648 C++
     [Inline Frame] TheDarkModNoTools.exe!idSessionLocal::ShowLoadingGui() Line 480 C++
     TheDarkModNoTools.exe!idSessionLocal::ExecuteMapChange(bool noFadeWipe) Line 1520 C++
     TheDarkModNoTools.exe!idSessionLocal::LoadGame(const char * saveName) Line 2173 C++
     TheDarkModNoTools.exe!LoadGame_f(const idCmdArgs & args) Line 1780 C++
     TheDarkModNoTools.exe!idCmdSystemLocal::ExecuteTokenizedString(const idCmdArgs & args) Line 482 C++
     TheDarkModNoTools.exe!idCmdSystemLocal::ExecuteCommandBuffer() Line 662 C++
     TheDarkModNoTools.exe!idEventLoop::RunEventLoop(bool commandExecution) Line 173 C++
     TheDarkModNoTools.exe!idCommonLocal::Frame() Line 2426 C++
     TheDarkModNoTools.exe!WinMain(HINSTANCE__ * hInstance, HINSTANCE__ * hPrevInstance, char * lpCmdLine, int nCmdShow) Line 1371 C++
TagsNo tags attached.

Activities

cabalistic

cabalistic

23.06.2018 16:33

developer   ~0010567

Afaik it was reported by a user for 2.06 already, wasn't it? If so, it's not related to VBO.

Unfortunately, I cannot reliably reproduce this. I've seen it happen once or twice, but whenever I try to capture it in the debugger, I can reload again and again and nothing's happening :(
cabalistic

cabalistic

23.06.2018 16:35

developer   ~0010568

What's puzzling for me, though, is why would you have any active lights if showing the map loading screen? Sounds to me like it's stuck somewhere in a transition?
duzenko

duzenko

24.06.2018 12:12

developer  

savegames.zip (2,379,180 bytes)
duzenko

duzenko

24.06.2018 12:15

developer   ~0010577

Save game attached.
It has active lights when map is about to show the loading GUI.
But at that point all light defs have already been destroyed.
duzenko

duzenko

24.06.2018 16:11

developer   ~0010579

#define USE_LIBC_MALLOC 0
appears to work around it.
Looks like it is an old bug, but it did not manifest until recently because idHeap masked it.
duzenko

duzenko

24.06.2018 17:48

developer   ~0010580

At revision: 7514
stgatilov

stgatilov

28.06.2018 17:35

administrator   ~0010609

Last edited: 28.06.2018 17:44

View 2 revisions

Now the game crashes during quickload again (Inn Business, Release x64)

The call stack is:
> TheDarkModx64.exe!RB_ExecuteBackEndCommands(const emptyCommand_t * cmds) Line 573 C++
     TheDarkModx64.exe!R_IssueRenderCommands(frameData_t * frameData) Line 138 C++
     TheDarkModx64.exe!LightGem::Render() Line 348 C++
     TheDarkModx64.exe!idRenderSystemLocal::EndFrame(int * frontEndMsec, int * backEndMsec) Line 643 C++
     TheDarkModx64.exe!idSessionLocal::UpdateScreen(bool outOfSequence) Line 2650 C++
     [Inline Frame] TheDarkModx64.exe!idSessionLocal::ShowLoadingGui() Line 480 C++
     TheDarkModx64.exe!idSessionLocal::ExecuteMapChange(bool noFadeWipe) Line 1522 C++
     TheDarkModx64.exe!idSessionLocal::LoadGame(const char * saveName) Line 2175 C++
     TheDarkModx64.exe!LoadGame_f(const idCmdArgs & args) Line 1781 C++

In the switch line:
    while (cmds) {
        switch ( cmds->commandId ) {
        case RC_NOP:
It says cmds is 0x28000000000, which is access violation.

stgatilov

stgatilov

29.06.2018 16:57

administrator   ~0010610

Last edited: 29.06.2018 16:58

View 4 revisions

I have tried to understand what is going on here.

So there are frameData_t of various kind. Two of such frameData-s are located in backend (tr_main.cpp):
  frameData_t smpFrameData[2];
With some pointers referencing these two frames.

Each of these two frameData-s has a linked list of commands:
  emptyCommand_t *cmdHead, *cmdTail;
They allocated from the corresponding frame memory.

Also, there are two lightgem datas in LightGem:
  lightGemDrawCmds_t m_LightgemDrawCmds[2];
And each of them has its own commands linked list.
When time comes, commands list of lightgem is temporarily patched into the backend's command list, only to add a few commands to the list (I guess to reuse global backend functions).

However, lightgem commands are still allocated from the same two chunks of frame memory as ordinary backend commands are. This creates a dangerous situation when someone can add one more frame switch in one place without clearing the other commands list, and new commands will start randomly overwriting the commands still referenced in the other list.

I don't know how tied together these things are, but I feel this is not very good. Ideally, lightgem commands should be allocated from their own buffers.

cabalistic

cabalistic

29.06.2018 23:38

developer   ~0010611

Ah, hm. That's probably something I messed up, I'll look into it.

However, smpFrameData is the general-purpose allocation buffer for a single frame. It's not just for commands, but it's basically to allocate anything that lives and is valid for a single frame. The separate lightgem commands are a bit of a hack, yes. That's mostly because you can't directly add them to the standard command queue due to subtle assumptions that would be violated. And unfortunately, due to the stateful backend, you have to replace the standard queue with the lightgem one to render it, or refactor a good part of the backend to not be so stateful.

But the lightgem commands are still tied to the same frame as the standard command queue. So it is correct that they draw from the same frameData allocation pool. If that leads to crashes due to incorrect cleanup of the lightgem, then that needs to be fixed instead. Introducing a separate frameData buffer might fix the crash, but it would still mean that data meant for a different frame is used in the wrong place, so it's not the right fix, imho.
stgatilov

stgatilov

30.06.2018 03:30

administrator   ~0010612

Last edited: 30.06.2018 03:30

View 2 revisions

The problem is caused by svn rev 7514, which was supposed to fixquickload crashes.
Before that quickloading crashed for a different reason, I guess.

duzenko

duzenko

30.06.2018 05:23

developer   ~0010616

Last edited: 30.06.2018 07:27

View 3 revisions

Do we want to convert lightgem to a regular subview so that it's drawn just like everything else in TDM?
It would remove all the hacks accumulated over the years.
...
Let me try that locally and I will post my version to github for discussion.

duzenko

duzenko

30.06.2018 09:38

developer   ~0010621

Proposed changes (this fixes the quick load crash to me)
https://github.com/duzenko/darkmod-experiments/commit/4726d715b5fece44f5467001625e0f42f9da3bb4
nbohr1more

nbohr1more

30.06.2018 15:19

developer   ~0010622

I thought it always was a subview?
Or did the IsLightGem() conversion change that?

Code looks good to me but do optimizations like this still work?

renderworld_portals.cpp


if ( !entity->parms.isLightgem && entity->parms.noShadow && tr.viewDef->IsLightGem() )
                continue;
stgatilov

stgatilov

30.06.2018 16:06

administrator   ~0010623

What is subview exactly?
What is already working via subviews in TDM?
duzenko

duzenko

30.06.2018 16:19

developer   ~0010624

A subview is a render like mirror, portal sky, or remote camera. I.e. a render separate to "main" player's eyes view. Subviews are rendered inside the same backend run as the main view. It's sometimes copied to texture for use in the main view.

Before 2.06 we only had xray, remote camera, and mirror as subviews. Portal sky and lightgem were done as separate backend calls due to doom3 source code closed at the time of their implementation.

In 2.06 I moved portal sky to the subview system. Now it's lightgem turn. By moving to subviews I mean rendering it in the same backend run so that we don't need to maintain the separate backend data for lightgem.

I think the lightgem optimization should still work because I implicitly mark the subview's viewId as lightgem. It looks good when I view it with r_subviewsONly 1 and the lightgem HUD seems to change adequately.

Things TODO is revisiting/reimplementing the "hand held" objects influence, split/interleave, and source code level object links (I moved some things from private to public and wonder if it could be done better)
stgatilov

stgatilov

30.06.2018 17:47

administrator   ~0010625

Last edited: 30.06.2018 17:49

View 3 revisions

I have no idea how far this change goes.
I tested two maps with this patch, doing many quicksaves/quickloads. No crashes at all. Also, did not notice any problems with lightgem.

Probably wait for Cabalistic's opinion?

P.S. When I first came to TDM, lightgem was working via taking screenshot to named pipe: 0002768 =)

cabalistic

cabalistic

30.06.2018 17:51

developer   ~0010626

In principle, it sounds like a way better idea than what I hacked together to get the lightgem onto the frontend/backend split. I wasn't knowledgable enough about the subviews system to attempt it myself.

I'll have a look at the code later :)
duzenko

duzenko

01.07.2018 07:57

developer   ~0010630

At revision: 7532
cabalistic

cabalistic

01.07.2018 08:26

developer   ~0010631

I left you a few comments on the Github commit. Good work, overall. I only have one concern about certain functionality you commented out. And also, please let's remove the old implementation instead of commenting it out, especially those parts you moved into a new function. That's what source control is for :)
duzenko

duzenko

01.07.2018 08:31

developer   ~0010632

@grayman asked me to be careful with deletions. I should have left a note in the comment saying to delete after 2.07 release.
cabalistic

cabalistic

01.07.2018 09:03

developer   ~0010633

But in this case you basically just moved functionality and removed a hack I only introduced with the BFG changes. So those changes were never even "productive". No point in letting them in the code for 2.07.

I think the issue with quickloads related to the lightgem rendering is fixed now. But this is a change that was introduced after 2.06. Are we confident the original problem is understood and fixed, as well?
stgatilov

stgatilov

07.07.2018 07:26

administrator   ~0010649

Last edited: 07.07.2018 07:28

View 2 revisions

I suppose Grayman was mainly talking about deleting the code which you did NOT write, especially some code which is present in TDM for a long time and does not make any harm to anyone.
If you MOVE code, then you should remove it at the old location, otherwise you make things more confusing =)

Anyway, I have a question: why exactly do we need two RAM buffers for lightgem contents? And why just creating two buffers is enough for fixing it?

I think this issue should be marked as resolved now. If someone else has quickloading crashes in the nearest future, he can reopen it.
As for why 2.06 was sometimes crashing, I'm afraid it is useless to seek the answer on the current SVN version. If you really like to know the historic answer, you can checkout 2.06 branch and investigate.

cabalistic

cabalistic

07.07.2018 07:31

developer   ~0010650

Rendering the lightgem now happens on the backend, while analysis of the result happens in the frontend. Since both can run in parallel, they should not use the same buffer. So one buffer belongs to the frontend to analyse while backend copies results of the current frame lightgem render to the other buffer.
stgatilov

stgatilov

07.07.2018 07:43

administrator   ~0010651

And why this double buffering is not necessary for other subviews then?
cabalistic

cabalistic

07.07.2018 07:46

developer   ~0010652

Because they don't produce feedback to the frontend. Lightgem is unique in that we read back the results of the render to the system RAM and decide based on the contents how dark or light the lightgem should be. The other subviews just produce partial results that are then used in the remaining rendering pipeline on the backend.
nbohr1more

nbohr1more

07.07.2018 12:21

developer   ~0010654

The double-buffer change seems to have cured the flickering AI issue...

However, now No Honor Among Thieve's crashes to the console after clicking the mission start prompt.

VertexCache static vertex buffer uploaded, memory used: 40563 kb
VertexCache static index buffer uploaded, memory used: 9256 kb
Generated framebuffer DEPTH_STENCIL attachment: 1366x768
Generated framebuffer DEPTH_STENCIL attachment: 1366x768
speaker_zone_ambient::init() called
Color read from main ambient light 'ambient_world': 0.05 0.05 0.08

Changed location from '' to 'town'.
--------- Game Map Shutdown ----------
WARNING:Door VaultDoor2 is not within a valid AAS area
WARNING:Door treasurelid2 is not within a valid AAS area
WARNING:Door safe_door_1 is not within a valid AAS area
WARNING:Door atdm_mover_door_3 is not within a valid AAS area
ModelGenerator memory: 21 LOD entries with 31 users using 10206 bytes, memory s
aved: 13341 bytes.
--------- Game Map Shutdown done -----
ERROR:idRenderWorld::UpdateEntityDef: index = -1
--------------------------------------
]condump nhat_crash.txt
Dumped console text to nhat_crash.txt.
nbohr1more

nbohr1more

07.07.2018 12:24

developer   ~0010655

Same issue in "The Transaction" which also starts with a scripted cut-scene.
cabalistic

cabalistic

07.07.2018 12:32

developer   ~0010656

The double buffers we were talking about are not related to anything visual (except for the lightgem value), so I don't see how they would have anything to do with AI flickering. I did increase the initial amount of vertex cache, though, so if the flickering was due to having to resize the cache, that might have been it.

You could experiment with the relevant CVars, r_frameVertexMemory and r_frameIndexMemory. For me, the resize is effectively flicker-free, but I'm running fairly high frame rates so that might be hiding the effect.
cabalistic

cabalistic

07.07.2018 12:58

developer   ~0010657

As for the cutscene errors: I found the cause. The lightgem render is trying to hide the player from view for the lightgem render, but the player apparently does not exist during those cutscenes. We probably missed a conditional when moving the lightgem to a subview to skip the lightgem in these circumstances. I will double-check the changes.
cabalistic

cabalistic

07.07.2018 13:14

developer   ~0010658

Should be fixed with r7553.
nbohr1more

nbohr1more

07.07.2018 16:51

developer   ~0010659

Thank you.

No crash in either NHAT or Transaction, including the 2nd cut-scene in Transaction.

With a good 20 minutes of testing, only one instance of a flickering AI happened and he flickered once and nearly imperceptibly.

This is golden.
duzenko

duzenko

08.07.2018 09:30

developer   ~0010660

Do we want to interleave lightgem rendering?
I added tdm_lg_interleave as check for LG toggle for now.
cabalistic

cabalistic

08.07.2018 09:46

developer   ~0010661

Personally, I don't think it's worth it anymore. From my nSight profilings, the lightgem rendering inside the normal backend operations is a very minor part of the frame render, and doing it only every n-th frame isn't going to save any significant processing or rendering time. As such I think it only makes the code more complicated and adds another potential for subtle lightgem bugs?
nbohr1more

nbohr1more

08.07.2018 13:37

developer   ~0010662

Hmm.

The users who benefit from lg_interleave are single core users.
They only benefit if lg_split is active as well.

So we have these questions:

1) How many single core users do we have?
2) Of these users, how many have very poor performance in 2.06?
3) Of these users, how many have tried LG interleave options?

That said, I am guessing it wouldn't be too hard to add lg_split back and enable it only when com_smp = 0?
cabalistic

cabalistic

08.07.2018 13:40

developer   ~0010663

Ever since I divided the lightgem rendering into frontend/backend part (and also now with the subview), it only renders one lightgem image per frame, and it alternates up or down between frames. So the old lg_split option is gone, but technically always active.
nbohr1more

nbohr1more

08.07.2018 13:54

developer   ~0010664

Thanks, that narrows down things a bit.

If we keep lg_interleave, we should still constrain it to com_smp 0.
(If we are going to keep offering a single core mode...)
stgatilov

stgatilov

29.07.2018 15:10

administrator   ~0010732

I have one question: is this thing done?
Did anyone experience savegame crashes lately?
The work about moving lightgem into subview is also long done, I guess.

If the core things are done, it's better to resolve the issue (you can move any follow-up tasks into separate issues).

Issue History

Date Modified Username Field Change
23.06.2018 16:19 duzenko New Issue
23.06.2018 16:19 duzenko Status new => assigned
23.06.2018 16:19 duzenko Assigned To => cabalistic
23.06.2018 16:33 cabalistic Note Added: 0010567
23.06.2018 16:35 cabalistic Note Added: 0010568
24.06.2018 11:34 duzenko Assigned To cabalistic => duzenko
24.06.2018 12:12 duzenko File Added: savegames.zip
24.06.2018 12:15 duzenko Note Added: 0010577
24.06.2018 16:11 duzenko Note Added: 0010579
24.06.2018 17:48 duzenko Note Added: 0010580
28.06.2018 17:35 stgatilov Note Added: 0010609
28.06.2018 17:44 stgatilov Note Edited: 0010609 View Revisions
29.06.2018 16:57 stgatilov Note Added: 0010610
29.06.2018 16:58 stgatilov Note Edited: 0010610 View Revisions
29.06.2018 16:58 stgatilov Note Edited: 0010610 View Revisions
29.06.2018 16:58 stgatilov Note Edited: 0010610 View Revisions
29.06.2018 23:38 cabalistic Note Added: 0010611
30.06.2018 03:30 stgatilov Note Added: 0010612
30.06.2018 03:30 stgatilov Note Edited: 0010612 View Revisions
30.06.2018 05:23 duzenko Note Added: 0010616
30.06.2018 05:25 duzenko Note Edited: 0010616 View Revisions
30.06.2018 07:27 duzenko Note Edited: 0010616 View Revisions
30.06.2018 09:38 duzenko Note Added: 0010621
30.06.2018 15:19 nbohr1more Note Added: 0010622
30.06.2018 16:06 stgatilov Note Added: 0010623
30.06.2018 16:19 duzenko Note Added: 0010624
30.06.2018 17:47 stgatilov Note Added: 0010625
30.06.2018 17:48 stgatilov Note Edited: 0010625 View Revisions
30.06.2018 17:49 stgatilov Note Edited: 0010625 View Revisions
30.06.2018 17:51 cabalistic Note Added: 0010626
01.07.2018 07:57 duzenko Note Added: 0010630
01.07.2018 08:26 cabalistic Note Added: 0010631
01.07.2018 08:31 duzenko Note Added: 0010632
01.07.2018 09:03 cabalistic Note Added: 0010633
07.07.2018 07:26 stgatilov Note Added: 0010649
07.07.2018 07:28 stgatilov Note Edited: 0010649 View Revisions
07.07.2018 07:31 cabalistic Note Added: 0010650
07.07.2018 07:43 stgatilov Note Added: 0010651
07.07.2018 07:46 cabalistic Note Added: 0010652
07.07.2018 12:21 nbohr1more Note Added: 0010654
07.07.2018 12:24 nbohr1more Note Added: 0010655
07.07.2018 12:32 cabalistic Note Added: 0010656
07.07.2018 12:58 cabalistic Note Added: 0010657
07.07.2018 13:14 cabalistic Note Added: 0010658
07.07.2018 16:51 nbohr1more Note Added: 0010659
08.07.2018 09:29 duzenko Summary Crashes when quick loading => Crashes when quick loading | Lightgem as Subview
08.07.2018 09:30 duzenko Note Added: 0010660
08.07.2018 09:46 cabalistic Note Added: 0010661
08.07.2018 13:37 nbohr1more Note Added: 0010662
08.07.2018 13:40 cabalistic Note Added: 0010663
08.07.2018 13:54 nbohr1more Note Added: 0010664
19.07.2018 08:55 duzenko Severity crash => tweak
29.07.2018 15:10 stgatilov Note Added: 0010732
30.07.2018 13:55 nbohr1more Priority normal => high
30.07.2018 13:55 nbohr1more Severity tweak => crash
30.07.2018 13:55 nbohr1more Status assigned => resolved
30.07.2018 13:55 nbohr1more Resolution open => fixed
30.07.2018 13:55 nbohr1more Fixed in Version => TDM 2.07