View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0005600 | The Dark Mod | Coding | public | 02.05.2021 11:59 | 04.05.2024 14:02 |
Reporter | stgatilov | Assigned To | stgatilov | ||
Priority | normal | Severity | crash | Reproducibility | sometimes |
Status | resolved | Resolution | fixed | ||
Product Version | TDM 2.09 | ||||
Target Version | TDM 2.13 | Fixed in Version | TDM 2.13 | ||
Summary | 0005600: Crashes due to cvarSystem not being threadsafe | ||||
Description | I was doing a regular automatic start of every released FM, when I got a crash on northdale2 FM. The crash happened immediately after automation "clicked attack to start game" in idCVarSystemLocal::FindInternal function. Looking at registers and error message, it was clear that cvars[i] was NULL when memory was read, while at the moment of crash it already pointed to correct idCVar object. Apparently, it was a race condition: someone had already put cvar to hash table, but had not updated array element yet. It turns out that quite a lot of code start registering or setting cvars when game starts, so this is a serious problem. I think we should add mutex to idCVarSystemLocal, and protect all methods of idCVarSystemLocal with it. | ||||
Tags | No tags attached. | ||||
Move to 2.11 ? | |
Looking at the description, the race condition happened while registering cvar. But in reality all cvars are global variables, and they are registered during global CRT initialization in single thread: no race condition is possible in this case! And after CRT initialization, the set of existing cvars stays unchanged, so there is no problem searching for them without lock. Perhaps this crash happened because sometimes we used nasty approach of making a cvar static variable inside a function. Such a cvar is initialized only when the function is first executed, thus: 1) User cannot see/find it when game starts: it does not exist yet. 2) If function is called during gameplay in parallel section, then race condition happens. I stopped doing this because of p.1, but I guess now we have a more important reason to avoid such practice. I only found one such cvar right now, so I fixed it: r10707 Cvar r_maxTri renamed to r_maxTrisPerDrawCall and made global variable. I guess it also explains why I have not seen any similar crashes in a while. |
|
So let's repeat: 1) It is ok to define idCVar as global variable in cpp file. It does not matter if it has static keyword or not. 2) It is ok to define idCVar as static class member. 3) It is NOT ok to define idCVar as non-static class member (it would be initialized only when object is created). 4) It is NOT ok to define idCVar as static variable inside function (it would be initialized only when function is called). |
|
I vaguely recall that this issue was at one point considered to be the culprit behind some arrow crashes. The only one I could reliably repeat was in "The Transaction" Since this change, I have yet to repeat the crash again. I hope this is the end of that saga. |
|
I heavily doubt that the arrow crash is caused by threadsafety of cvars =) | |
I'm not gonna argue on that since my whole statement was a vague recollection. I think the premise may have revolved around parsing tdm_weapon_next_on_empty during the bow loop but I'll table anything more on this until I find the old convo. |
|
I cleaned cvars system before trying to implement cvar mission overrides: https://forums.thedarkmod.com/index.php?/topic/22428-internal-cvar/ These commits are for merging, i.e. remove internal cvar and cvars sharing: r10713 Merged idInternalCVar into idCVar. r10714 Cleanup after recent removal of idInternalCVar. And these later cvars are about threadsafety: r10715 Removed "modifiedFlags" state from idCVarSystem. r10716 Disabled dynamic creation of unknown/unregistered cvars. r10717 Described thread-safety conditions in description header. I think I have reviewed this code rather thoroughly now and am quite confident in what is OK and what is not. Here is the citation from CVarSystem.h: stgatilov 0005600: CVars must be declared as global variable or as static class member. Unlike original Doom 3, a cvar can only have one C++ variable connected to it. stgatilov 0005600: Standard thread-safety principles apply: * It is OK to read/write different cvars in parallel. * It is OK to read the same cvar in parallel. * It is WRONG to write cvar in parallel with reading/writing the same cvar. * Some methods are only called during initialization, they are never in parallel to anything. |
|
It turned out that some missions use "setcvar" to create cvars dynamically. Searching over whole missions archive, I found only two cases for that. The first one is street clock, which passes number of hours between functions in "game_clock_hours" cvar. I have fixed this: r293 [heartv2]. Fixed clock script to NOT use dynamically created cvar. r294 [iris]. Fixed clock script to NOT use dynamically created cvar. The second one is override of "tdm_weapon_arrow.script" by mission. This scripts used to have "debug mode" in which it wrote and read cvars --- this mode is already deleted from core assets. I think we should ping mappers to understand why they decided to do this. Here is the list of missions: ahouseoflockedsecrets byanyothername cauldron_v2_2 good hazard northdale1 northdale2 seeking snowed_inn written Player can't shoot arrows anymore in these missions. https://forums.thedarkmod.com/index.php?/topic/22428-internal-cvar/&do=findComment&comment=494049 |
|
So Ive just learned this commit has broken a number of FM's - Northdale series, Written in Stone, AGN, Volta 2, Hazard, Seeking Lady Lecister, and about five others. | |
Fixed the broken missions: r10725 Restored the ability to create cvars dynamically. Now almost the whole cvarSystem is under mutex (everything that iterates over cvars or searches cvar by name). Fast access via C++ idCVar variable is lock-free, and still very fast. Fixed the issue with saving darkmod.cfg: r10727 Save config files only after modifications, don't do it every frame. I somewhat revised the system. Previously there was only one flag, and idKeyInput + idGamepadInput was modifying it directly. Now each of the three systems has its own "modified" flag. The flag in cvarSystem is now atomic bool, and only considers archived cvars. |
|
Thanks fella :-) | |
Date Modified | Username | Field | Change |
---|---|---|---|
02.05.2021 11:59 | stgatilov | New Issue | |
02.05.2021 11:59 | stgatilov | Status | new => assigned |
02.05.2021 11:59 | stgatilov | Assigned To | => stgatilov |
20.12.2021 17:10 | nbohr1more | Note Added: 0014602 | |
21.12.2021 14:11 | stgatilov | Target Version | TDM 2.10 => TDM 2.11 |
15.11.2022 06:05 | nbohr1more | Severity | normal => crash |
15.11.2022 17:07 | nbohr1more | Target Version | TDM 2.11 => TDM 2.12 |
17.12.2023 03:16 | nbohr1more | Target Version | TDM 2.12 => TDM 2.13 |
12.04.2024 08:03 | stgatilov | Note Added: 0016638 | |
12.04.2024 08:06 | stgatilov | Note Added: 0016639 | |
12.04.2024 08:06 | stgatilov | Status | assigned => feedback |
15.04.2024 05:17 | nbohr1more | Note Added: 0016653 | |
15.04.2024 16:33 | stgatilov | Note Added: 0016654 | |
16.04.2024 19:12 | nbohr1more | Note Added: 0016655 | |
20.04.2024 12:54 | stgatilov | Note Added: 0016656 | |
20.04.2024 12:54 | stgatilov | Status | feedback => resolved |
20.04.2024 12:54 | stgatilov | Resolution | open => fixed |
20.04.2024 12:54 | stgatilov | Fixed in Version | => TDM 2.13 |
20.04.2024 15:44 | stgatilov | Relationship added | related to 0005928 |
28.04.2024 12:31 | stgatilov | Note Added: 0016673 | |
29.04.2024 14:33 | Bikerdude | Note Added: 0016677 | |
04.05.2024 11:32 | stgatilov | Note Added: 0016679 | |
04.05.2024 12:16 | Bikerdude | Note Added: 0016680 | |
04.05.2024 14:02 | stgatilov | Relationship added | related to 0006530 |