View Issue Details

IDProjectCategoryView StatusLast Update
0005600The Dark ModCodingpublic20.04.2024 15:44
Reporterstgatilov Assigned Tostgatilov  
PrioritynormalSeveritycrashReproducibilitysometimes
Status resolvedResolutionfixed 
Product VersionTDM 2.09 
Target VersionTDM 2.13Fixed in VersionTDM 2.13 
Summary0005600: Crashes due to cvarSystem not being threadsafe
DescriptionI was doing a regular automatic start of every released FM, when I got a crash on northdale2 FM.
The crash happened immediately after automation "clicked attack to start game" in idCVarSystemLocal::FindInternal function.
Looking at registers and error message, it was clear that cvars[i] was NULL when memory was read, while at the moment of crash it already pointed to correct idCVar object.
Apparently, it was a race condition: someone had already put cvar to hash table, but had not updated array element yet.

It turns out that quite a lot of code start registering or setting cvars when game starts, so this is a serious problem.
I think we should add mutex to idCVarSystemLocal, and protect all methods of idCVarSystemLocal with it.
TagsNo tags attached.

Relationships

related to 0005928 resolvedstgatilov Save game while bow is strung yields broken bow on fresh load 

Activities

nbohr1more

nbohr1more

20.12.2021 17:10

developer   ~0014602

Move to 2.11 ?
stgatilov

stgatilov

12.04.2024 08:03

administrator   ~0016638

Looking at the description, the race condition happened while registering cvar.
But in reality all cvars are global variables, and they are registered during global CRT initialization in single thread: no race condition is possible in this case!
And after CRT initialization, the set of existing cvars stays unchanged, so there is no problem searching for them without lock.

Perhaps this crash happened because sometimes we used nasty approach of making a cvar static variable inside a function.
Such a cvar is initialized only when the function is first executed, thus:
  1) User cannot see/find it when game starts: it does not exist yet.
  2) If function is called during gameplay in parallel section, then race condition happens.
I stopped doing this because of p.1, but I guess now we have a more important reason to avoid such practice.

I only found one such cvar right now, so I fixed it:
  r10707 Cvar r_maxTri renamed to r_maxTrisPerDrawCall and made global variable.
I guess it also explains why I have not seen any similar crashes in a while.
stgatilov

stgatilov

12.04.2024 08:06

administrator   ~0016639

So let's repeat:
1) It is ok to define idCVar as global variable in cpp file. It does not matter if it has static keyword or not.
2) It is ok to define idCVar as static class member.
3) It is NOT ok to define idCVar as non-static class member (it would be initialized only when object is created).
4) It is NOT ok to define idCVar as static variable inside function (it would be initialized only when function is called).
nbohr1more

nbohr1more

15.04.2024 05:17

developer   ~0016653

I vaguely recall that this issue was at one point considered to be the culprit behind some arrow crashes.

The only one I could reliably repeat was in "The Transaction"

Since this change, I have yet to repeat the crash again. I hope this is the end of that saga.
stgatilov

stgatilov

15.04.2024 16:33

administrator   ~0016654

I heavily doubt that the arrow crash is caused by threadsafety of cvars =)
nbohr1more

nbohr1more

16.04.2024 19:12

developer   ~0016655

I'm not gonna argue on that since my whole statement was a vague recollection.

I think the premise may have revolved around parsing tdm_weapon_next_on_empty during the bow loop but I'll table
anything more on this until I find the old convo.
stgatilov

stgatilov

20.04.2024 12:54

administrator   ~0016656

I cleaned cvars system before trying to implement cvar mission overrides:
  https://forums.thedarkmod.com/index.php?/topic/22428-internal-cvar/

These commits are for merging, i.e. remove internal cvar and cvars sharing:
  r10713 Merged idInternalCVar into idCVar.
  r10714 Cleanup after recent removal of idInternalCVar.
And these later cvars are about threadsafety:
  r10715 Removed "modifiedFlags" state from idCVarSystem.
  r10716 Disabled dynamic creation of unknown/unregistered cvars.
  r10717 Described thread-safety conditions in description header.

I think I have reviewed this code rather thoroughly now and am quite confident in what is OK and what is not.
Here is the citation from CVarSystem.h:

    stgatilov 0005600: CVars must be declared as global variable or as static class member.
    Unlike original Doom 3, a cvar can only have one C++ variable connected to it.

    stgatilov 0005600: Standard thread-safety principles apply:
     * It is OK to read/write different cvars in parallel.
     * It is OK to read the same cvar in parallel.
     * It is WRONG to write cvar in parallel with reading/writing the same cvar.
     * Some methods are only called during initialization, they are never in parallel to anything.

Issue History

Date Modified Username Field Change
02.05.2021 11:59 stgatilov New Issue
02.05.2021 11:59 stgatilov Status new => assigned
02.05.2021 11:59 stgatilov Assigned To => stgatilov
20.12.2021 17:10 nbohr1more Note Added: 0014602
21.12.2021 14:11 stgatilov Target Version TDM 2.10 => TDM 2.11
15.11.2022 06:05 nbohr1more Severity normal => crash
15.11.2022 17:07 nbohr1more Target Version TDM 2.11 => TDM 2.12
17.12.2023 03:16 nbohr1more Target Version TDM 2.12 => TDM 2.13
12.04.2024 08:03 stgatilov Note Added: 0016638
12.04.2024 08:06 stgatilov Note Added: 0016639
12.04.2024 08:06 stgatilov Status assigned => feedback
15.04.2024 05:17 nbohr1more Note Added: 0016653
15.04.2024 16:33 stgatilov Note Added: 0016654
16.04.2024 19:12 nbohr1more Note Added: 0016655
20.04.2024 12:54 stgatilov Note Added: 0016656
20.04.2024 12:54 stgatilov Status feedback => resolved
20.04.2024 12:54 stgatilov Resolution open => fixed
20.04.2024 12:54 stgatilov Fixed in Version => TDM 2.13
20.04.2024 15:44 stgatilov Relationship added related to 0005928