View Issue Details

IDProjectCategoryView StatusLast Update
0005600The Dark ModCodingpublic04.05.2024 14:02
Reporterstgatilov Assigned Tostgatilov  
PrioritynormalSeveritycrashReproducibilitysometimes
Status resolvedResolutionfixed 
Product VersionTDM 2.09 
Target VersionTDM 2.13Fixed in VersionTDM 2.13 
Summary0005600: Crashes due to cvarSystem not being threadsafe
DescriptionI was doing a regular automatic start of every released FM, when I got a crash on northdale2 FM.
The crash happened immediately after automation "clicked attack to start game" in idCVarSystemLocal::FindInternal function.
Looking at registers and error message, it was clear that cvars[i] was NULL when memory was read, while at the moment of crash it already pointed to correct idCVar object.
Apparently, it was a race condition: someone had already put cvar to hash table, but had not updated array element yet.

It turns out that quite a lot of code start registering or setting cvars when game starts, so this is a serious problem.
I think we should add mutex to idCVarSystemLocal, and protect all methods of idCVarSystemLocal with it.
TagsNo tags attached.

Relationships

related to 0005928 resolvedstgatilov Save game while bow is strung yields broken bow on fresh load 
related to 0006530 assignedstgatilov Make script constants customizable by mission 

Activities

nbohr1more

nbohr1more

20.12.2021 17:10

developer   ~0014602

Move to 2.11 ?
stgatilov

stgatilov

12.04.2024 08:03

administrator   ~0016638

Looking at the description, the race condition happened while registering cvar.
But in reality all cvars are global variables, and they are registered during global CRT initialization in single thread: no race condition is possible in this case!
And after CRT initialization, the set of existing cvars stays unchanged, so there is no problem searching for them without lock.

Perhaps this crash happened because sometimes we used nasty approach of making a cvar static variable inside a function.
Such a cvar is initialized only when the function is first executed, thus:
  1) User cannot see/find it when game starts: it does not exist yet.
  2) If function is called during gameplay in parallel section, then race condition happens.
I stopped doing this because of p.1, but I guess now we have a more important reason to avoid such practice.

I only found one such cvar right now, so I fixed it:
  r10707 Cvar r_maxTri renamed to r_maxTrisPerDrawCall and made global variable.
I guess it also explains why I have not seen any similar crashes in a while.
stgatilov

stgatilov

12.04.2024 08:06

administrator   ~0016639

So let's repeat:
1) It is ok to define idCVar as global variable in cpp file. It does not matter if it has static keyword or not.
2) It is ok to define idCVar as static class member.
3) It is NOT ok to define idCVar as non-static class member (it would be initialized only when object is created).
4) It is NOT ok to define idCVar as static variable inside function (it would be initialized only when function is called).
nbohr1more

nbohr1more

15.04.2024 05:17

developer   ~0016653

I vaguely recall that this issue was at one point considered to be the culprit behind some arrow crashes.

The only one I could reliably repeat was in "The Transaction"

Since this change, I have yet to repeat the crash again. I hope this is the end of that saga.
stgatilov

stgatilov

15.04.2024 16:33

administrator   ~0016654

I heavily doubt that the arrow crash is caused by threadsafety of cvars =)
nbohr1more

nbohr1more

16.04.2024 19:12

developer   ~0016655

I'm not gonna argue on that since my whole statement was a vague recollection.

I think the premise may have revolved around parsing tdm_weapon_next_on_empty during the bow loop but I'll table
anything more on this until I find the old convo.
stgatilov

stgatilov

20.04.2024 12:54

administrator   ~0016656

I cleaned cvars system before trying to implement cvar mission overrides:
  https://forums.thedarkmod.com/index.php?/topic/22428-internal-cvar/

These commits are for merging, i.e. remove internal cvar and cvars sharing:
  r10713 Merged idInternalCVar into idCVar.
  r10714 Cleanup after recent removal of idInternalCVar.
And these later cvars are about threadsafety:
  r10715 Removed "modifiedFlags" state from idCVarSystem.
  r10716 Disabled dynamic creation of unknown/unregistered cvars.
  r10717 Described thread-safety conditions in description header.

I think I have reviewed this code rather thoroughly now and am quite confident in what is OK and what is not.
Here is the citation from CVarSystem.h:

    stgatilov 0005600: CVars must be declared as global variable or as static class member.
    Unlike original Doom 3, a cvar can only have one C++ variable connected to it.

    stgatilov 0005600: Standard thread-safety principles apply:
     * It is OK to read/write different cvars in parallel.
     * It is OK to read the same cvar in parallel.
     * It is WRONG to write cvar in parallel with reading/writing the same cvar.
     * Some methods are only called during initialization, they are never in parallel to anything.
stgatilov

stgatilov

28.04.2024 12:31

administrator   ~0016673

It turned out that some missions use "setcvar" to create cvars dynamically.
Searching over whole missions archive, I found only two cases for that.

The first one is street clock, which passes number of hours between functions in "game_clock_hours" cvar.
I have fixed this:
  r293 [heartv2]. Fixed clock script to NOT use dynamically created cvar.
  r294 [iris]. Fixed clock script to NOT use dynamically created cvar.

The second one is override of "tdm_weapon_arrow.script" by mission.
This scripts used to have "debug mode" in which it wrote and read cvars --- this mode is already deleted from core assets.
I think we should ping mappers to understand why they decided to do this.

Here is the list of missions:
  ahouseoflockedsecrets
  byanyothername
  cauldron_v2_2
  good
  hazard
  northdale1
  northdale2
  seeking
  snowed_inn
  written
Player can't shoot arrows anymore in these missions.

https://forums.thedarkmod.com/index.php?/topic/22428-internal-cvar/&do=findComment&comment=494049
Bikerdude

Bikerdude

29.04.2024 14:33

reporter   ~0016677

So Ive just learned this commit has broken a number of FM's - Northdale series, Written in Stone, AGN, Volta 2, Hazard, Seeking Lady Lecister, and about five others.
stgatilov

stgatilov

04.05.2024 11:31

administrator   ~0016679

Fixed the broken missions:
  r10725 Restored the ability to create cvars dynamically.

Now almost the whole cvarSystem is under mutex (everything that iterates over cvars or searches cvar by name).
Fast access via C++ idCVar variable is lock-free, and still very fast.

Fixed the issue with saving darkmod.cfg:
  r10727 Save config files only after modifications, don't do it every frame.

I somewhat revised the system.
Previously there was only one flag, and idKeyInput + idGamepadInput was modifying it directly.
Now each of the three systems has its own "modified" flag.
The flag in cvarSystem is now atomic bool, and only considers archived cvars.
Bikerdude

Bikerdude

04.05.2024 12:16

reporter   ~0016680

Thanks fella :-)

Issue History

Date Modified Username Field Change
02.05.2021 11:59 stgatilov New Issue
02.05.2021 11:59 stgatilov Status new => assigned
02.05.2021 11:59 stgatilov Assigned To => stgatilov
20.12.2021 17:10 nbohr1more Note Added: 0014602
21.12.2021 14:11 stgatilov Target Version TDM 2.10 => TDM 2.11
15.11.2022 06:05 nbohr1more Severity normal => crash
15.11.2022 17:07 nbohr1more Target Version TDM 2.11 => TDM 2.12
17.12.2023 03:16 nbohr1more Target Version TDM 2.12 => TDM 2.13
12.04.2024 08:03 stgatilov Note Added: 0016638
12.04.2024 08:06 stgatilov Note Added: 0016639
12.04.2024 08:06 stgatilov Status assigned => feedback
15.04.2024 05:17 nbohr1more Note Added: 0016653
15.04.2024 16:33 stgatilov Note Added: 0016654
16.04.2024 19:12 nbohr1more Note Added: 0016655
20.04.2024 12:54 stgatilov Note Added: 0016656
20.04.2024 12:54 stgatilov Status feedback => resolved
20.04.2024 12:54 stgatilov Resolution open => fixed
20.04.2024 12:54 stgatilov Fixed in Version => TDM 2.13
20.04.2024 15:44 stgatilov Relationship added related to 0005928
28.04.2024 12:31 stgatilov Note Added: 0016673
29.04.2024 14:33 Bikerdude Note Added: 0016677
04.05.2024 11:32 stgatilov Note Added: 0016679
04.05.2024 12:16 Bikerdude Note Added: 0016680
04.05.2024 14:02 stgatilov Relationship added related to 0006530