View Issue Details

ID: 0005538
Project: The Dark Mod
Category: Objectives
View Status: public
Last Update: 11.05.2021 00:34
Reporter: bwyan
Assigned To: stgatilov
Priority: normal
Severity: crash
Reproducibility: always
Status: resolved
Resolution: fixed
Platform: Linux
OS: Ubuntu
OS Version: 20.04.2 LTS
Product Version: TDM 2.08
Target Version: TDM 2.10
Fixed in Version: TDM 2.10
Summary: 0005538: Segmentation fault when receiving a new objective in mission "WS3: Cleighmoor"
Description:
Hello,

This is my first bug report, so any comments on how I could improve this post will be appreciated.

Mission: William Steele 3: Cleighmoor (https://www.thedarkmod.com/missiondetails/?id=82)
Save game file: https://nextcloud.bwyan.dk/index.php/s/A32yPXZtC2W4gmB (compressed .tar.gz file)

When loading "Quicksave_0" and waiting a few seconds for the civilian to finish his dialogue with the guard, a new mission objective is supposed to be received, but instead the game exits to desktop with the following output (cropped to immediately after the save game was created):

NEW OBJECTIVE
signal caught: Segmentation fault
si_code 1
Trying to exit gracefully..
--------- Game Map Shutdown ----------
ModelGenerator memory: 23 LOD entries with 3 users using 2713 bytes.
WARNING:idClipModel::FreeTraceModel: tried to free uncached trace model (index=0)
--------- Game Map Shutdown done -----
Shutting down sound hardware
idRenderSystem::Shutdown()
double free or corruption (out)
double fault Aborted, bailing out
shutdown terminal support
About to exit with code 6
Steps To Reproduce: Load the linked save game for the mission "William Steele 3: Cleighmoor" and wait a few seconds; the game should crash.
Additional Information: This has only been tested on the recently released version 2.09, but I can't select that as an option in the "Product Version" drop-down.
Tags: Crash

Relationships

related to 0004835 (closed, stgatilov): Somewhere Above the City: debriefing hangs

Activities

nbohr1more

16.02.2021 16:51

developer   ~0013691

Confirmed the crash...
nbohr1more

16.02.2021 17:05

developer   ~0013692

Cannot reproduce in gdb ...
nbohr1more

16.02.2021 17:32

developer   ~0013693

Reproducible with com_smp 0 and com_fixedTic 0 (not affected by multi-core or uncapped FPS)
bwyan

16.02.2021 20:44

reporter   ~0013694

@nbohr1more: Do I understand you correctly that the crash may be due to my chosen in-game settings (chosen in the main menu), or are these settings that you mention exclusive to the dev console?
nbohr1more

17.02.2021 01:10

developer   ~0013695

@bwyan : Multi-Core and Uncapped FPS are known to be sources of stability issues so I ruled them out.

It is possible that some setting is responsible for this but it seems unlikely now.

As far as I can tell, the conversation is causing an unhandled clipmodel to be freed.

Probably something that needs to be covered in the Entity destructor...

I will continue to investigate.
nbohr1more

17.02.2021 05:20

developer   ~0013696

Hmm...

[/game/ai/Conversation/ConversationSystem.cpp ( 160):DEB (CONVERSATION) FR: 79491] Terminating conversation SewellTalksToSmithson due to error.

[/game/ai/Mind.cpp ( 156):INF (AI) FR: 79491] Ending State Conversation (Sewell)
[/game/ai/Mind.cpp ( 156):INF (AI) FR: 79491] Ending State Conversation (Smithson)

End of log
[/game/StimResponse/Response.cpp ( 97):DEB (STIMRESP) FR: 79494] Running ResponseScript
nbohr1more

17.02.2021 15:20

developer   ~0013700

@grayman : I don't see a "SewellTalksToSmithson" variable in the script for this mission,
do you know how this conversation is initiated?
stgatilov

17.02.2021 15:35

administrator   ~0013701

Can't reproduce on Windows, but can reproduce on Linux.

Here are stack traces:
[Frontend]
#0 0x00007ffff6e2556f in _int_malloc (av=av@entry=0x7fffb0000020, bytes=bytes@entry=7916) at malloc.c:3734
#1 0x00007ffff6e271d4 in __GI___libc_malloc (bytes=7916) at malloc.c:2920
#2 0x000000000070bf72 in idHeap::Allocate (bytes=7916, this=<optimized out>) at /mnt/hgfs/thedarkmod/darkmod_src/idlib/Heap.cpp:261
#3 Mem_Alloc (size=size@entry=7916) at /mnt/hgfs/thedarkmod/darkmod_src/idlib/Heap.cpp:1070
#4 0x0000000000a22f24 in idClass::operator new (s=7916, s@entry=7912) at /mnt/hgfs/thedarkmod/darkmod_src/game/gamesys/Class.cpp:456
#5 0x00000000009201b4 in CResponse::TriggerResponse (this=0x2005d660, sourceEntity=0x83880e4, stim=std::shared_ptr (count 1, weak 0) 0x20cfabc0)
    at /mnt/hgfs/thedarkmod/darkmod_src/game/StimResponse/Response.cpp:98
#6 0x00000000005de515 in idGameLocal::DoResponseAction (this=this@entry=0x20b7c00 <gameLocal>, stim=std::shared_ptr (count 1, weak 0) 0x20cfabc0, numEntities=numEntities@entry=34,
    originator=originator@entry=0x83880e4, stimOrigin=...) at /mnt/hgfs/thedarkmod/darkmod_src/game/Game_local.cpp:7377
#7 0x00000000005def2f in idGameLocal::ProcessStimResponse (this=this@entry=0x20b7c00 <gameLocal>, ticks=ticks@entry=416610695) at /mnt/hgfs/thedarkmod/darkmod_src/game/Game_local.cpp:7611
#8 0x00000000005df50e in idGameLocal::RunFrame (this=0x20b7c00 <gameLocal>, clientCmds=<optimized out>, timestepMs=<optimized out>) at /mnt/hgfs/thedarkmod/darkmod_src/game/Game_local.cpp:3310
#9 0x0000000000505408 in idSessionLocal::RunGameTic (this=0x1850100 <sessLocal>, timestepMs=16) at /mnt/hgfs/thedarkmod/darkmod_src/framework/Session.cpp:3071
#10 0x0000000000508ce7 in idSessionLocal::RunGameTics (this=0x1850100 <sessLocal>) at /mnt/hgfs/thedarkmod/darkmod_src/framework/Session.cpp:3114
#11 idSessionLocal::FrontendThreadFunction (this=0x1850100 <sessLocal>) at /mnt/hgfs/thedarkmod/darkmod_src/framework/Session.cpp:3160
#12 0x0000000000508f89 in idSessionLocal::<lambda(void*)>::operator() (__closure=0x0, x=<optimized out>) at /mnt/hgfs/thedarkmod/darkmod_src/framework/Session.cpp:3254
#13 idSessionLocal::<lambda(void*)>::_FUN(void *) () at /mnt/hgfs/thedarkmod/darkmod_src/framework/Session.cpp:3256
#14 0x00007ffff7bc16ba in start_thread (arg=0x7fffc9d76700) at pthread_create.c:333
#15 0x00007ffff6eaa4dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

[Backend]
#0 0x00007ffff6ea0007 in ioctl () at ../sysdeps/unix/syscall-template.S:84
#1 0x00007fffe18c7478 in drmIoctl () from /usr/lib/x86_64-linux-gnu/libdrm.so.2
#2 0x00007fffe18ca24f in drmCommandWriteRead () from /usr/lib/x86_64-linux-gnu/libdrm.so.2
#3 0x00007fffe111b584 in ?? () from /usr/lib/x86_64-linux-gnu/dri/vmwgfx_dri.so
#4 0x00007fffe111a092 in ?? () from /usr/lib/x86_64-linux-gnu/dri/vmwgfx_dri.so
#5 0x00007fffe112f0d6 in ?? () from /usr/lib/x86_64-linux-gnu/dri/vmwgfx_dri.so
#6 0x00007fffe1130a7c in ?? () from /usr/lib/x86_64-linux-gnu/dri/vmwgfx_dri.so
#7 0x00007fffe0bff9a5 in ?? () from /usr/lib/x86_64-linux-gnu/dri/vmwgfx_dri.so
#8 0x00007fffe0b60ac6 in ?? () from /usr/lib/x86_64-linux-gnu/dri/vmwgfx_dri.so
#9 0x00007fffe0c038b3 in ?? () from /usr/lib/x86_64-linux-gnu/dri/vmwgfx_dri.so
#10 0x00007fffe0b62133 in ?? () from /usr/lib/x86_64-linux-gnu/dri/vmwgfx_dri.so
#11 0x00007fffe0b622d2 in ?? () from /usr/lib/x86_64-linux-gnu/dri/vmwgfx_dri.so
#12 0x0000000000bcf62b in RenderBackend::DrawLightgem (this=0x2b99920 <renderBackendImpl>, viewDef=0x7fffd03aa900,
    lightgemData=0x4b4a250 "\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022\031\022\022"...) at /mnt/hgfs/thedarkmod/darkmod_src/renderer/backend/RenderBackend.cpp:160
#13 0x000000000081fc6b in RB_ExecuteBackEndCommands (cmds=0x7fffd03d5b00) at /mnt/hgfs/thedarkmod/darkmod_src/renderer/tr_backend.cpp:846
#14 0x00000000007d61c5 in R_IssueRenderCommands (frameData=0x27adb40 <smpFrameData>) at /mnt/hgfs/thedarkmod/darkmod_src/renderer/RenderSystem.cpp:140
#15 idRenderSystemLocal::EndFrame (this=0x27a1580 <tr>, frontEndMsec=0x0, backEndMsec=0x0) at /mnt/hgfs/thedarkmod/darkmod_src/renderer/RenderSystem.cpp:635
#16 0x00000000005008ac in idSessionLocal::UpdateScreen (this=0x1850100 <sessLocal>, outOfSequence=<optimized out>) at /mnt/hgfs/thedarkmod/darkmod_src/framework/Session.cpp:2757
#17 0x00000000004b0a8d in idCommonLocal::Frame (this=0x17c5200 <commonLocal>) at /mnt/hgfs/thedarkmod/darkmod_src/framework/Common.cpp:2546
#18 0x00000000004748ed in main (argc=1, argv=0x7fffffffdeb8) at /mnt/hgfs/thedarkmod/darkmod_src/sys/posix/platform_linux.cpp:580

[Sound thread]
#0 0x0000000000d4512f in res2_inverse ()
#1 0x0000000000d45a67 in mapping0_inverse ()
#2 0x0000000000d211e8 in _fetch_and_process_packet.constprop.10 ()
#3 0x0000000000d25278 in ov_read_float ()
#4 0x000000000085f348 in idSampleDecoderLocal::DecodeOGG (this=this@entry=0x1eb219e0, sample=sample@entry=0xc79f8a0, sampleOffset44k=sampleOffset44k@entry=0, sampleCount44k=sampleCount44k@entry=8192,
    dest=dest@entry=0x7fffc956c0b0) at /mnt/hgfs/thedarkmod/darkmod_src/sound/snd_decoder.cpp:561
#5 0x000000000085f683 in idSampleDecoderLocal::Decode (this=0x1eb219e0, sample=0xc79f8a0, sampleOffset44k=<optimized out>, sampleCount44k=<optimized out>, dest=0x7fffc956c0b0)
    at /mnt/hgfs/thedarkmod/darkmod_src/sound/snd_decoder.cpp:440
#6 0x00000000008692a3 in idSoundChannel::GatherChannelSamples (this=0x1ead1bd8, sampleOffset44k=<optimized out>, sampleCount44k=<optimized out>, dest=<optimized out>)
    at /mnt/hgfs/thedarkmod/darkmod_src/sound/snd_emitter.cpp:278
#7 0x0000000000877922 in idSoundWorldLocal::AddChannelContribution (this=this@entry=0x6dd33b0, sound=sound@entry=0x1ead1b70, chan=chan@entry=0x1ead1bd8, current44kHz=current44kHz@entry=4878336,
    numSpeakers=numSpeakers@entry=2, finalMixBuffer=finalMixBuffer@entry=0x2a3c450 <soundSystemLocal+48>) at /mnt/hgfs/thedarkmod/darkmod_src/sound/snd_world.cpp:2155
#8 0x0000000000877e13 in idSoundWorldLocal::MixLoop (this=0x6dd33b0, current44kHz=current44kHz@entry=4878336, numSpeakers=numSpeakers@entry=2, finalMixBuffer=0x2a3c450 <soundSystemLocal+48>)
    at /mnt/hgfs/thedarkmod/darkmod_src/sound/snd_world.cpp:559
#9 0x000000000086e6b8 in idSoundSystemLocal::AsyncUpdateWrite (this=0x2a3c420 <soundSystemLocal>, inTime=110623) at /mnt/hgfs/thedarkmod/darkmod_src/sound/snd_system.cpp:763
#10 0x00000000004b3ca1 in idCommonLocal::SingleAsyncTic (this=this@entry=0x17c5200 <commonLocal>) at /mnt/hgfs/thedarkmod/darkmod_src/framework/Common.cpp:2632
#11 0x00000000004b3da8 in idCommonLocal::Async (this=0x17c5200 <commonLocal>) at /mnt/hgfs/thedarkmod/darkmod_src/framework/Common.cpp:2689
#12 0x0000000000c537f8 in Sys_AsyncThread () at /mnt/hgfs/thedarkmod/darkmod_src/sys/linux/main.cpp:96
#13 0x00007ffff7bc16ba in start_thread (arg=0x7fffc9575700) at pthread_create.c:333
#14 0x00007ffff6eaa4dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

When crash happened, gdb showed me frontend thread.
I guess the crash happened inside malloc.
Since size looks OK, this crash most likely means heap corruption somewhere.
It can also explain why it does not lead to crash on Windows.

No idea what to do next. Is it possible to run TDM under valgrind?
grayman

18.02.2021 16:05

viewer   ~0013704

When Smithson starts his patrol and walks into "trigger_once_entityname_11", a conversation is started ("SewellTalksToSmithson") between him and Sewell.

I have no clue what's causing the issue. The only known crash in Cleighmoor was fixed years ago by this update:

https://www.dropbox.com/s/ph8pa8eyobuh36q/ws3_cleighmoor_map_patch.zip?dl=0
cabalistic

18.02.2021 16:29

developer   ~0013705

How easy is it to reproduce this issue without the savegame? It would be very helpful, I think, to know if this already existed in 2.08 or not, so that we could potentially bisect to the problematic change.
stgatilov

22.02.2021 15:48

administrator   ~0013716

Last edited: 22.02.2021 15:49

I tried to reproduce it on Windows with a (full) Debug build, but no issues like heap corruption were detected.
Also tried to run TDM under valgrind, but it produces tons of redirection warnings, after which it crashes deep inside the VMWare OpenGL driver during glX initialization.

And yes, I don't see instructions on how to reproduce it from a fresh start. I wonder if it is possible.
nbohr1more

20.04.2021 03:10

developer   ~0013880

Latest 2.10 rev 9293: I tried to reproduce from a fresh start and when I got to the cut-scene no crash appeared.
stgatilov

20.04.2021 03:30

administrator   ~0013881

That's a very expected result.

I guess we will have to leave this issue be, letting this "ghost of undefined behavior" live in the game until it gets caught in a more apparent way.
stgatilov

26.04.2021 15:10

administrator   ~0013915

Last edited: 26.04.2021 15:17

Another independent report:
  https://forums.thedarkmod.com/index.php?/topic/20900-ws3-claymore-crash-segfault/&tab=comments#comment-459882
Attaching savegame on TDM 2.09.

Unfortunately, I cannot reproduce crash with this save on Linux VM.
Quicksave_0.save (2,923,977 bytes)
stgatilov

26.04.2021 16:10

administrator   ~0013916

I have added automation sequence for reproducing the issue:

@installfm ws3_cleighmoor
@file enter_game.seq

setviewpos 4300 1300 720 0 90 0
@gamectrl frob
@sleep 140.0
exit

Basically, it teleports straight to the door, opens it and waits for 140 seconds.
On Linux, I get crash exactly at the end of second conversation sometimes (not always).

On assets SVN, it can be run from /devel/auto/scripts as:
  python run.py 5538_ws3_crash.seq 100
Make sure to update from SVN first.

To be honest, it is a big nuisance for automation stuff that Linux build of TDM does not exit cleanly (it just hangs on exit).
stgatilov

26.04.2021 17:07

administrator   ~0013917

Last edited: 26.04.2021 17:13

I managed to reproduce the problem on an MSVC build of fresh trunk, in the Release x64 configuration.
I used the automation sequence, and got a crash on the 7th restart.

Here is stack trace of main thread:
     ntdll.dll!00007ffbf4fe1323() Unknown
     ntdll.dll!00007ffbf4fe06e1() Unknown
     TheDarkModx64.exe!_free_base(void * block) Line 105 C++
     [Inline Frame] TheDarkModx64.exe!idDynamicAlloc<int,262144,1024>::Free(int * ptr) Line 226 C++
     [Inline Frame] TheDarkModx64.exe!R_ReallyFreeStaticTriSurf(srfTriangles_s *) Line 381 C++
> TheDarkModx64.exe!R_FreeDeferredTriSurfs(frameData_t * frame) Line 471 C++
     TheDarkModx64.exe!R_ToggleSmpFrame() Line 185 C++
     TheDarkModx64.exe!idRenderSystemLocal::EndFrame(int * frontEndMsec, int * backEndMsec) Line 693 C++
     TheDarkModx64.exe!idSessionLocal::UpdateScreen(bool outOfSequence) Line 2750 C++
     TheDarkModx64.exe!idCommonLocal::Frame() Line 2547 C++
     TheDarkModx64.exe!WinMain(HINSTANCE__ * hInstance, HINSTANCE__ * hPrevInstance, char * lpCmdLine, int nCmdShow) Line 1221 C++
     [Inline Frame] TheDarkModx64.exe!invoke_main() Line 102 C++
     TheDarkModx64.exe!__scrt_common_main_seh() Line 288 C++

All the other threads seem to be sleep-waiting...
stgatilov

27.04.2021 04:09

administrator   ~0013918

Last edited: 27.04.2021 04:10

Reproduced on Debug with Inlines using the same automation script.
The crash happens in same location.

Looking closer, the last function in TDM code is idHeap::Free16.
It deallocates 16-byte aligned memory of tri->indexes, previously allocated using idHeap::Allocate16.

This function allocates more memory and aligns it manually, and saves original pointer just before the returned aligned one:
  ptr = (byte *) malloc( bytes + 16 + sizeof(intptr_t) );
  alignedPtr = (byte *) ( ( ( (intptr_t) ptr ) + 15) & ~15 );
  if ( alignedPtr - ptr < sizeof(intptr_t) )
    alignedPtr += 16;
  *((intptr_t *)(alignedPtr - sizeof(intptr_t))) = (intptr_t) ptr; //!!! save original pointer
  return (void *) alignedPtr;
The Free16 function fetches original pointer and calls free on it:
  free( (void *) *((intptr_t *) (( (byte *) p ) - sizeof(intptr_t))) );

In this case, pointer [p] passed to Free16 is 0x0000025548fa8110.
Here is what I see in memory dump:
0x0000025548FA8100 cd cd cd cd cd cd cd cd 00 81 fa 48 00 00 00 00 НННННННН.ЃъH....
0x0000025548FA8110 00 00 00 00 ff ff 00 00 00 00 00 00 04 00 00 00 ....яя..........
The 8 bytes immediately preceding address p must contain address of the originally allocated pointer.
Now it contains 0x0000000048fa8100 instead of 0x0000025548fa8100.
When free tries to deallocate this address, it gets access violation.
Obviously, the higher 32 bits of the address got zeroed somehow.
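
The header scheme described above can be tried in isolation. This is only a minimal sketch, not the TDM source: the names `Allocate16Sketch`, `Free16Sketch` and `OriginalPointerOf` are made up for illustration, and plain `malloc`/`free` stand in for whatever idHeap does around them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the Allocate16 idea: over-allocate, align by hand,
// and stash the pointer returned by malloc in the bytes just before the
// aligned block that is handed to the caller.
void *Allocate16Sketch(std::size_t bytes) {
    unsigned char *raw = static_cast<unsigned char *>(
        std::malloc(bytes + 16 + sizeof(std::intptr_t)));
    if (raw == nullptr) {
        return nullptr;
    }
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(raw);
    const std::uintptr_t alignedAddr = (addr + 15) & ~static_cast<std::uintptr_t>(15);
    unsigned char *aligned = raw + (alignedAddr - addr);
    if (static_cast<std::size_t>(aligned - raw) < sizeof(std::intptr_t)) {
        aligned += 16;  // not enough room for the header yet, skip one slot
    }
    const std::intptr_t saved = reinterpret_cast<std::intptr_t>(raw);
    std::memcpy(aligned - sizeof(saved), &saved, sizeof(saved));  // save original pointer
    return aligned;
}

// Reads back the header that sits immediately before the aligned pointer.
void *OriginalPointerOf(const void *p) {
    std::intptr_t saved;
    std::memcpy(&saved, static_cast<const unsigned char *>(p) - sizeof(saved), sizeof(saved));
    return reinterpret_cast<void *>(saved);
}

// Hypothetical stand-in for the Free16 idea: free whatever the header says.
void Free16Sketch(void *p) {
    if (p != nullptr) {
        std::free(OriginalPointerOf(p));
    }
}
```

The point of the sketch is that `Free16Sketch` blindly trusts the header: if anything overwrites those bytes, `free` receives a pointer that was never returned by `malloc`.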

Here are possible causes:
1) Out-of-bounds array write could easily lead to such situation:
       tri->indexes[-1] = 0;
   This is the most plausible hypothesis.
2) Mismatching alloc/free functions were used.
   It seems that memory for tri->indexes is allocated using several different means.
   Deallocating with different allocator could potentially cause any issues.
   The current deallocation happens in R_FreeDeferredTriSurfs, and who knows where it came from...

Also note that the contents of freed tri->indexes array are:
  0, 65535, 0, 4, 5, 3, 24, 15, 14, ...
Given that numVerts = 504, the value 65535 is obviously invalid too.
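
Both observations can be replayed on paper with a small sketch. It assumes a little-endian machine and 4-byte `int` indexes (the `idDynamicAlloc<int,...>` frame in the earlier stack trace suggests that, but it is an assumption here); the function names are hypothetical. The write is modelled on a byte buffer, so nothing out of bounds is actually touched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Models the 8 header bytes that precede p, then performs the 4-byte store
// that `tri->indexes[-1] = 0` would do. indexes[-1] covers the 4 bytes
// immediately before p, i.e. header[4..7], which on a little-endian machine
// are the upper 32 bits of the saved pointer.
std::uint64_t HeaderAfterIndexMinusOneWrite(std::uint64_t originalPtr) {
    unsigned char header[8];
    std::memcpy(header, &originalPtr, sizeof(originalPtr));
    const std::int32_t zero = 0;
    std::memcpy(header + 4, &zero, sizeof(zero));
    std::uint64_t result;
    std::memcpy(&result, header, sizeof(result));
    return result;
}

// Returns the position of the first index that cannot refer to a vertex
// (negative or >= numVerts), or -1 if every index is in range.
std::ptrdiff_t FirstInvalidIndex(const std::vector<int> &indexes, int numVerts) {
    for (std::size_t i = 0; i < indexes.size(); ++i) {
        if (indexes[i] < 0 || indexes[i] >= numVerts) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```

Feeding 0x0000025548fa8100 into `HeaderAfterIndexMinusOneWrite` gives 0x0000000048fa8100 on a little-endian machine, which matches the dump, and `FirstInvalidIndex` on the listed values with numVerts = 504 flags position 1 (the 65535).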
stgatilov

27.04.2021 07:29

administrator   ~0013919

I can confirm that the problem happens on 2.08.
It also happens on SVN with "com_forceGenericSIMD 1" (I suspected SIMD code might be culprit).

Normally, a susceptible version crashes within 10 attempts, but 2.07 survived 36 runs.
So it is certain that the problem appeared between 2.07 and 2.08.
Which is 1.5 years of development =(
cabalistic

27.04.2021 15:27

developer   ~0013921

I'm not 100% sure it's related, but I've seen a crash in the same place on a test map provided to me by kingsal. The crash itself is also reproducible, however it occasionally occurs in a different place than the free - which is why I also suspected a memory corruption as the root cause, but didn't manage to track it down, unfortunately. The crash on that test map appears to coincide with a lantern held by a guard being dropped when that guard dies. Potentially the crash might be related to interactions with that lantern?

Unfortunately, the test map was embedded in a wip map, so is a little too large to share here. Perhaps kingsal could extract it. If the issues are related, perhaps finding the similarities could help tracking down the root cause.
stgatilov

27.04.2021 16:36

administrator   ~0013922

Last edited: 27.04.2021 17:23

I decided to use more sophisticated tool from Microsoft, namely "Enable page heap" from GFlags:
  https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/gflags-and-pageheap
It adds non-accessible page after each allocation, causing page fault on overrun.
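
For readers who have not used such a mode, the guard-page idea can be sketched roughly as below. This is a POSIX illustration of the general technique only (`mmap` plus `mprotect`), not how GFlags itself is implemented; `GuardedAlloc`, `GuardedFree` and `GuardedBlock` are hypothetical names, and `bytes` is assumed to be greater than zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

struct GuardedBlock {
    void *mapping;         // start of the whole mapping
    std::size_t mapBytes;  // mapped size, including the guard page
    unsigned char *user;   // pointer handed to the caller
};

// Maps enough pages for the request plus one extra page, makes the extra page
// inaccessible, and places the user block so that it ends exactly where the
// guard page begins. Any write past the end then faults immediately.
GuardedBlock GuardedAlloc(std::size_t bytes) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t dataPages = (bytes + page - 1) / page;
    const std::size_t mapBytes = (dataPages + 1) * page;
    void *mapping = mmap(nullptr, mapBytes, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mapping == MAP_FAILED) {
        return {nullptr, 0, nullptr};
    }
    unsigned char *guard = static_cast<unsigned char *>(mapping) + dataPages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(mapping, mapBytes);
        return {nullptr, 0, nullptr};
    }
    return {mapping, mapBytes, guard - bytes};
}

void GuardedFree(const GuardedBlock &block) {
    if (block.mapping != nullptr) {
        munmap(block.mapping, block.mapBytes);
    }
}
```

Note that this layout only catches overruns past the end of a block. A write just before the block lands in ordinary mapped memory, and use of an already freed object is a different kind of bug that needs the freed pages to become inaccessible as well.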

Contrary to my fears, TDM starts and plays fine in this mode =)
6 GB RAM taken, and works pretty fast (80 FPS is perfectly fine).

TDM crashed at the same moment, with stack trace ("this" pointer is wrong):
     [Inline Frame] TheDarkModx64.exe!idThread::ClearWaitFor() Line 951 C++
     [Inline Frame] TheDarkModx64.exe!idThread::Pause() Line 1098 C++
     TheDarkModx64.exe!idThread::End() Line 814 C++
> TheDarkModx64.exe!ai::Subsystem::ClearTasks() Line 238 C++
     TheDarkModx64.exe!ai::IdleState::Init(idAI * owner) Line 122 C++
     TheDarkModx64.exe!ai::Mind::Think() Line 94 C++
     TheDarkModx64.exe!idAI::Think() Line 2667 C++
     TheDarkModx64.exe!idGameLocal::RunFrame(const usercmd_t * clientCmds, int timestepMs) Line 3337 C++
     TheDarkModx64.exe!idSessionLocal::RunGameTic(int timestepMs) Line 3039 C++
     TheDarkModx64.exe!idSessionLocal::RunGameTics() Line 3082 C++
     TheDarkModx64.exe!idSessionLocal::ActivateFrontend() Line 3192 C++
     TheDarkModx64.exe!idRenderSystemLocal::EndFrame(int * frontEndMsec, int * backEndMsec) Line 638 C++
     TheDarkModx64.exe!idSessionLocal::UpdateScreen(bool outOfSequence) Line 2750 C++
     TheDarkModx64.exe!idCommonLocal::Frame() Line 2547 C++
     TheDarkModx64.exe!WinMain(HINSTANCE__ * hInstance, HINSTANCE__ * hPrevInstance, char * lpCmdLine, int nCmdShow) Line 1221 C++

Needless to say, this is very far from things I suspected =)

UPDATE: The guy who can't get his tasks cleared is "Sewell".
I guess that's one of the guys who were talking.
The thread is "StartMonitoringSewell" (go around and ask guards).

The whole code here is crawling with STL and shared pointers, plus "recycle bins" that make lifetimes even more complicated =(
stgatilov

27.04.2021 17:28

administrator   ~0013923

The thread "StartMonitoringSewell" ends in idEvent::ServiceEvents (I guess it means natural ending) at game time 138974.
When it happens, thread is deleted, as seen in idThread::Execute:
    done = interpreter.Execute();
    if ( done ) {
        End();
        if ( interpreter.terminateOnExit ) {
            PostEventMS( &EV_Remove, 0 );
        }

The ai task remains with dangling pointer to deleted thread.
At game time 138990, the AI decides to "think", and tries to clear its tasks.
ScriptTask::OnFinish simply tries to end thread, without knowing it is already dead:
void ScriptTask::OnFinish(idAI* owner)
{
    if (_thread != NULL)
    {
        // We've got a non-NULL thread, this means it's still alive, end it now
        _thread->End();
    }
}
stgatilov

28.04.2021 04:25

administrator   ~0013925

Last edited: 28.04.2021 04:26

I know two ways to fix the problem.

1) Store some "thread handle" instead of direct pointer, just like in idEntityPtr.
Then we can check before every access whether thread is dead or not.
Thread number can serve as such handle: each newly created thread gets new number.
Function idThread::GetThread(num) can be used to find/check for thread.

2) Make ScriptTask own its thread, so that nobody else deletes it.
I looked at different cases where idThread* is stored as member, and all of them use this approach.
Basically, if you call ManualDelete() method after creating thread, then it won't ever get deleted automatically.
You should delete such thread when your object holding thread pointer is destroyed.
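
To make approach 1 a bit more concrete, here is a minimal sketch of the handle idea. All classes below are hypothetical stand-ins (`ThreadSketch`, `ThreadRegistrySketch`, `ScriptTaskSketch`), not the real idThread or ScriptTask; the only thing borrowed from the game code is the idea of looking a thread up by number, as idThread::GetThread(num) does.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>

// Hypothetical miniature thread: just a number and an "ended" flag.
class ThreadSketch {
public:
    explicit ThreadSketch(int num) : num_(num) {}
    int GetNum() const { return num_; }
    void End() { ended_ = true; }
    bool Ended() const { return ended_; }
private:
    int num_;
    bool ended_ = false;
};

// Hypothetical registry: every new thread gets a fresh number, and a lookup
// for a number whose thread was removed returns nullptr.
class ThreadRegistrySketch {
public:
    int Create() {
        const int num = nextNum_++;
        threads_[num] = std::make_unique<ThreadSketch>(num);
        return num;
    }
    void Remove(int num) { threads_.erase(num); }
    ThreadSketch *GetThread(int num) const {
        auto it = threads_.find(num);
        return it == threads_.end() ? nullptr : it->second.get();
    }
private:
    int nextNum_ = 1;
    std::unordered_map<int, std::unique_ptr<ThreadSketch>> threads_;
};

// The task stores the thread number instead of a raw pointer and re-resolves
// it on every access, so a thread that already died is simply not found.
class ScriptTaskSketch {
public:
    ScriptTaskSketch(const ThreadRegistrySketch &registry, int threadNum)
        : registry_(registry), threadNum_(threadNum) {}
    // Returns true if a live thread was found and ended.
    bool OnFinish() {
        ThreadSketch *thread = registry_.GetThread(threadNum_);
        if (thread == nullptr) {
            return false;  // thread is already gone, nothing to end
        }
        thread->End();
        return true;
    }
private:
    const ThreadRegistrySketch &registry_;
    int threadNum_;
};
```

The cost of this variant is one lookup per access; the benefit is that the task never has to know who deleted the thread or when.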

In order to test all of this better, I need some more usage cases of ScriptTask.
Here are usages I see:
  startAlarmWhistle of lantern bot
  RunScript command in conversation
Now I wonder which FMs have a lot of such things...
stgatilov

28.04.2021 12:47

administrator   ~0013927

Fixed in svn rev 9321 with approach 2.

Checked with 20 runs on Debug With Inlines, and with one additional run with "page heap".
No problems now.
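
For reference, the ownership idea behind approach 2 looks roughly like the sketch below. It is not the code from the actual commit: `OwnedThreadSketch` and `OwningScriptTaskSketch` are hypothetical, and `FinishScript` only models the rule that a thread marked with ManualDelete() never removes itself.

```cpp
#include <cassert>
#include <memory>

// Hypothetical miniature thread with a manual-delete flag.
class OwnedThreadSketch {
public:
    void ManualDelete() { manualDelete_ = true; }
    // Models the script running to completion. Returns true if the thread
    // would schedule its own removal, false if the owner must delete it.
    bool FinishScript() {
        ended_ = true;
        return !manualDelete_;
    }
    void End() { ended_ = true; }
    bool Ended() const { return ended_; }
private:
    bool manualDelete_ = false;
    bool ended_ = false;
};

// The task creates the thread, marks it manual-delete right away, and is the
// only place that destroys it (here via unique_ptr in the task's destructor).
class OwningScriptTaskSketch {
public:
    OwningScriptTaskSketch() : thread_(std::make_unique<OwnedThreadSketch>()) {
        thread_->ManualDelete();
    }
    OwnedThreadSketch &Thread() { return *thread_; }
    // The pointer stays valid for the whole life of the task, so ending the
    // thread here is safe even if the script already finished on its own.
    void OnFinish() { thread_->End(); }
private:
    std::unique_ptr<OwnedThreadSketch> thread_;
};
```

With this layout the thread object outlives its script, so a late OnFinish just ends a thread that has already finished.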
bwyan

11.05.2021 00:34

reporter   ~0013977

@stgatilov I see you have been busy solving this issue. In the mean time I've been otherwise occupied and I just wanted to thank you for your hard work on this.

Issue History

Date Modified Username Field Change
16.02.2021 13:33 bwyan New Issue
16.02.2021 13:35 bwyan OS Version => 20.04.2 LTS
16.02.2021 13:35 bwyan Additional Information Updated
16.02.2021 16:30 bwyan Tag Attached: Crash
16.02.2021 16:50 nbohr1more Target Version => TDM 2.10
16.02.2021 16:51 nbohr1more Note Added: 0013691
16.02.2021 16:51 nbohr1more Severity normal => crash
16.02.2021 16:51 nbohr1more Status new => confirmed
16.02.2021 17:05 nbohr1more Note Added: 0013692
16.02.2021 17:32 nbohr1more Note Added: 0013693
16.02.2021 20:44 bwyan Note Added: 0013694
17.02.2021 01:10 nbohr1more Note Added: 0013695
17.02.2021 05:20 nbohr1more Note Added: 0013696
17.02.2021 15:20 nbohr1more Note Added: 0013700
17.02.2021 15:35 stgatilov Note Added: 0013701
18.02.2021 16:05 grayman Note Added: 0013704
18.02.2021 16:29 cabalistic Note Added: 0013705
22.02.2021 15:48 stgatilov Note Added: 0013716
22.02.2021 15:49 stgatilov Note Edited: 0013716
20.04.2021 03:10 nbohr1more Note Added: 0013880
20.04.2021 03:30 stgatilov Note Added: 0013881
26.04.2021 15:10 stgatilov Note Added: 0013915
26.04.2021 15:10 stgatilov File Added: Quicksave_0.save
26.04.2021 15:17 stgatilov Note Edited: 0013915
26.04.2021 16:10 stgatilov Note Added: 0013916
26.04.2021 17:07 stgatilov Note Added: 0013917
26.04.2021 17:07 stgatilov Note Edited: 0013917
26.04.2021 17:13 stgatilov Note Edited: 0013917
27.04.2021 04:09 stgatilov Note Added: 0013918
27.04.2021 04:09 stgatilov Note Edited: 0013918
27.04.2021 04:10 stgatilov Note Edited: 0013918
27.04.2021 04:10 stgatilov Note Edited: 0013918
27.04.2021 04:10 stgatilov Note Edited: 0013918
27.04.2021 07:29 stgatilov Note Added: 0013919
27.04.2021 15:27 cabalistic Note Added: 0013921
27.04.2021 16:36 stgatilov Note Added: 0013922
27.04.2021 16:42 stgatilov Note Edited: 0013922
27.04.2021 17:11 stgatilov Note Edited: 0013922
27.04.2021 17:23 stgatilov Note Edited: 0013922
27.04.2021 17:23 stgatilov Note Edited: 0013922
27.04.2021 17:28 stgatilov Note Added: 0013923
28.04.2021 04:25 stgatilov Note Added: 0013925
28.04.2021 04:25 stgatilov Assigned To => stgatilov
28.04.2021 04:25 stgatilov Status confirmed => assigned
28.04.2021 04:26 stgatilov Note Edited: 0013925
28.04.2021 09:47 stgatilov Relationship added related to 0004835
28.04.2021 12:47 stgatilov Note Added: 0013927
28.04.2021 12:47 stgatilov Product Version => TDM 2.08
28.04.2021 12:47 stgatilov Status assigned => resolved
28.04.2021 12:47 stgatilov Resolution open => fixed
28.04.2021 12:47 stgatilov Fixed in Version => TDM 2.10
11.05.2021 00:34 bwyan Note Added: 0013977