View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0004661||The Dark Mod||Mapping||public||12.11.2017 12:23||19.03.2018 16:15|
|Status||closed||Resolution||no change required|
|Platform||PC, Windows, x64||OS||Win 7/8||OS Version||Sp2/8.1|
|Product Version||TDM 2.06|
|Target Version||TDM 2.06||Fixed in Version||TDM 2.06|
|Summary||0004661: TDM x64 crashing to desktop during DMAP|
|Description||The x64 version of TDM crashes to desktop during DMAP.|
|Steps To Reproduce||1. load tdm.|
2. run dmap command "dmap <mapname>" and wait for map compile to complete
3. TDM crashes to desktop; I then take a condump.
|Additional Information||# copy of condump here - https://mega.nz/#!2Et0GLDI!kL3jUJrsHT8RiECq-0Kz8_5IcfboxEMYGcfGlgMMvMY|
# copy of console print, see attached.
|Tags||No tags attached.|
shadowhide_loadtdm_206_86-64.zip (10,265 bytes)
The problem happens during MergeLeafNodes of the AAS build.
One of the nodes gets freed while a pointer still refers to it. As a result, the recursive search follows a 0xfeeefeeefeeefeee pointer (the pattern the Windows debug heap writes over freed memory) and crashes. The symptom is most likely a bit different when TDM is not started under a debugger, but either way the logical error of leaving a pointer to a freed node would result in a crash.
Most likely the problem is caused by bad brushes.
There are some brushes which seem to be unbounded and the following warning is generated for them:
brush %d on entity %d is unbounded (+ some coordinates)
As a result, their faces have some points with y = -262144 (this is the constant MAX_WORLD_SIZE). For such huge coordinates, 32-bit floats become rather imprecise.
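To put a number on that imprecision, here is a minimal sketch (Python, standard library only; the helper names `to_float32` and `float32_spacing` are made up for this illustration) that measures the gap between adjacent 32-bit floats at a few magnitudes. It only illustrates float spacing; it says nothing about which epsilons dmap actually uses.

```python
import struct

def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def float32_spacing(x: float) -> float:
    """Gap between |x| (as float32) and the next float32 of larger magnitude."""
    bits = struct.unpack("<I", struct.pack("<f", abs(x)))[0]
    next_up = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_up - to_float32(abs(x))

for coord in (1.0, 512.0, 262144.0):
    print(coord, float32_spacing(coord))
# 262144.0 -> 0.03125, i.e. neighbouring floats are 1/32 of a unit apart

# An offset smaller than half that gap is lost entirely:
print(to_float32(262144.01) == 262144.0)  # True
```

At ordinary map coordinates of a few hundred units the spacing is several hundred times finer, which is consistent with tolerances being tuned for normal values.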
Inside the brush BSP, some node ends up on the wrong side of a splitting plane. This node contains the portals of the bad brush, so its bounds have minY = -262144. When BSP leaves A and B are merged, all references to A are replaced with references to B, after which A is deleted. The references are found via a BSP search guided by bounds (see UpdateTreeAfterMerge_r), and because of the wrong structure of the BSP, some reference is not found and not replaced.
As a result, there is a reference to freed node, which causes the crash.
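The failure mode can be reproduced on a toy tree. The sketch below is not the TDM code: it is a one-dimensional stand-in with hypothetical names (`Node`, `replace_refs`, `dangling_parents`), where "bounds" are an interval on x and a `freed` flag stands in for released memory. One reference to the leaf being merged away sits in the wrong subtree, so the bounds-guided replacement never visits it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)
class Node:
    name: str
    split: Optional[float] = None   # internal node: splitting plane x = split
    front: Optional["Node"] = None  # subtree expected to hold x >= split
    back: Optional["Node"] = None   # subtree expected to hold x <  split
    freed: bool = False             # stand-in for "memory was released"

def replace_refs(node, bounds, old, new):
    """Swap child references old -> new, descending only into subtrees
    that the interval bounds = (lo, hi) overlaps."""
    if node is None or node.split is None:
        return
    if node.front is old:
        node.front = new
    if node.back is old:
        node.back = new
    lo, hi = bounds
    if hi >= node.split:
        replace_refs(node.front, bounds, old, new)
    if lo < node.split:
        replace_refs(node.back, bounds, old, new)

def dangling_parents(node, seen=None):
    """Names of internal nodes that still point at a freed node."""
    seen = set() if seen is None else seen
    if node is None or node.split is None or id(node) in seen:
        return []
    seen.add(id(node))
    found = [node.name for child in (node.front, node.back)
             if child is not None and child.freed]
    return (found + dangling_parents(node.front, seen)
            + dangling_parents(node.back, seen))

old_leaf, kept_leaf, other = Node("old"), Node("kept"), Node("other")
p1 = Node("P1", split=10.0, front=old_leaf, back=kept_leaf)
# Misplaced reference: old_leaf spans x in [10, 20], yet it is also
# referenced from the subtree that should only hold x < 0.
p2 = Node("P2", split=-10.0, front=old_leaf, back=other)
root = Node("root", split=0.0, front=p1, back=p2)

replace_refs(root, (10.0, 20.0), old_leaf, kept_leaf)  # merge old into kept
old_leaf.freed = True                                  # then release old

print(dangling_parents(root))  # ['P2']
```

The search is correct as long as the tree is consistent with the bounds; the stale reference appears only because the tree itself is wrong, which matches the diagnosis above.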
I think what I should say here is: heed the warnings that DMap prints.
I can of course try to change tolerances (i.e. epsilons) in the algorithm, but this is a very dangerous path. Moreover, the tolerances are adjusted for normal values in normal cases, and it makes no sense to adjust them for huge values like 262144 which must not normally happen.
Also, the reason why this does not happen in 2.05 is most likely the switch from x87 arithmetic to SSE arithmetic. This does not immediately mean that the new build performs worse; it simply behaves differently.
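As a rough illustration of how intermediate precision alone can change a result, the sketch below evaluates the same point-to-plane distance twice: once rounding to 32-bit float after every operation, and once keeping intermediates in 64-bit and rounding only at the end. Treating the first as "SSE-like" and the second as "x87-like" is an assumption made for illustration; what real x87 code does depends on compiler settings and the FPU control word. The function names and the toy plane (with an unnormalized normal, to keep the arithmetic easy to follow) are hypothetical.

```python
import struct

def f32(x: float) -> float:
    """Round to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def plane_distance_stepwise(normal, point, dist):
    """normal . point - dist, rounded to float32 after every operation."""
    total = f32(0.0)
    for n, p in zip(normal, point):
        total = f32(total + f32(f32(n) * f32(p)))
    return f32(total - f32(dist))

def plane_distance_wide(normal, point, dist):
    """Same expression with 64-bit intermediates and one final rounding."""
    total = sum(f32(n) * f32(p) for n, p in zip(normal, point))
    return f32(total - f32(dist))

normal = (1.0, 1.0, 0.0)
point = (262144.0, 0.01, 0.0)
dist = 262144.0

print(plane_distance_stepwise(normal, point, dist))  # 0.0: point is "on" the plane
print(plane_distance_wide(normal, point, dist) > 0)  # True: point is in front
```

Neither answer is a bug in isolation, but an algorithm that classifies geometry by the sign of such distances can take a different branch in each build.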
|Well, I looked at the reported unbounded brushes (10354, 10471, 10487, 11034) in the map file, and I do not see anything suspicious. They are typical boxes, with all normals orthogonal or opposite to each other.|
|Perhaps it would be prudent to test this on a simple map to isolate the problem. As far as I'm aware this bug happens every single time, no matter the map complexity.|
Ok, I removed the problematic brushes by using DarkRadiant's Map->Find Brush functionality. After that the unbounded brush messages disappeared.
But the issue remained: judging from its bounds, the node is still not in the correct place in the BSP tree. And now the bounds are normal, with no huge numbers like -2^18.
So the issue is worse because there is no diagnostic about it.
Because this map is so large, it is very hard to debug or analyze anything. Even getting to the point of the crash takes 10 minutes in a release build (on a Ryzen CPU).
If anyone else has crashes or serious issues with dmapping (on other maps), I'll definitely investigate them. But I can hardly see how I should proceed with this issue.
I have some hope for the following idea: find some stuff which can be deleted so that the error goes away. My last message (PM) about it was:
"How can I see Z coordinate in DarkRadiant?
X and Y coordinates are shown when moving mouse on the plan, but the map has too many levels by Z.
I need to find what can generate bounds [304, 114, 656.70] --- [395, 130, 678.32]."
The question is still important: if it is possible to understand where it is, we should try to delete something around it, and perhaps the error goes away.
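If the bounds of brushes or entities can be exported in some form, narrowing down candidates is a plain box-overlap test. The sketch below assumes a hypothetical list of `(label, mins, maxs)` tuples; the sample entries are invented, and only the target box comes from the message above. How to obtain real bounds from the map file or from DarkRadiant is not covered here.

```python
def boxes_overlap(mins_a, maxs_a, mins_b, maxs_b, margin=0.0):
    """True if two axis-aligned boxes overlap on every axis,
    optionally growing the test by `margin` units."""
    return all(a_lo <= b_hi + margin and b_lo <= a_hi + margin
               for a_lo, a_hi, b_lo, b_hi
               in zip(mins_a, maxs_a, mins_b, maxs_b))

def candidates_near(items, target_mins, target_maxs, margin=0.0):
    """Labels of items whose bounds overlap the target box."""
    return [label for label, mins, maxs in items
            if boxes_overlap(mins, maxs, target_mins, target_maxs, margin)]

# Target box taken from the node bounds quoted in the message.
TARGET_MINS = (304.0, 114.0, 656.70)
TARGET_MAXS = (395.0, 130.0, 678.32)

# Invented sample data, for illustration only.
sample = [
    ("brush_near",  (300.0, 110.0, 650.0), (310.0, 120.0, 660.0)),
    ("brush_far",   (1000.0, 0.0, 0.0),    (1100.0, 50.0, 50.0)),
    ("brush_below", (304.0, 114.0, 0.0),   (395.0, 130.0, 100.0)),
]

print(candidates_near(sample, TARGET_MINS, TARGET_MAXS))  # ['brush_near']
```

Note that `brush_below` matches in X and Y but not in Z, which is exactly the ambiguity the 2D plan view cannot resolve.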
Alternatively, we can try to create a MWE out of the map. We can iteratively bisect the map and check in which half the error happens. I tried to remove half of the map, but then dmapping stopped earlier due to "leaking". It seems that bisecting maps is not very easy and needs good mapping skills, which I don't have. Needless to say, it is very time-consuming too.
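The bisection idea can be sketched as a greedy reduction loop. This is a simplified variant of delta debugging, not an existing TDM or DarkRadiant tool: `shrink` and `still_crashes` are hypothetical names, and in practice `still_crashes` would have to save the reduced map, run dmap, and return True only for the original crash. A run that stops early because of a leak must count as False, otherwise the reduction wanders off to a different failure.

```python
def shrink(items, still_crashes):
    """Remove chunks of items while the crash persists.

    Starts with half-sized chunks and halves the chunk size each pass.
    Returns a smaller list that still crashes (not necessarily minimal).
    """
    items = list(items)
    chunk = len(items) // 2
    while chunk >= 1:
        i = 0
        while i < len(items):
            candidate = items[:i] + items[i + chunk:]
            if candidate and still_crashes(candidate):
                items = candidate   # chunk was not needed; keep it removed
            else:
                i += chunk          # chunk is needed; move past it
        chunk //= 2
    return items

# Toy stand-in for "run dmap": the crash needs items 3 and 11 together,
# and item 0 plays the role of geometry that seals the map (removing it
# would produce a leak instead of the crash).
def toy_still_crashes(subset):
    return 0 in subset and 3 in subset and 11 in subset

print(shrink(range(16), toy_still_crashes))  # [0, 3, 11]
```

With a dmap run of several minutes per attempt, the number of predicate calls is the real cost, so starting with large chunks matters.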
Currently it seems that there is some numeric instability during dmapping, triggered by something on the map. That's not a defect of the map, just some bad luck. If this is true, people won't meet this issue frequently.
|Is this the only map that suffers this problem?|
No one else has complained so far.
It could mean that no one else has tested dmapping...
Given the "10 minute dmap", I assume that the sample map provided includes both the 'main' map and the 'sewer' map.
Does gameplay require that the player return to the sewers once he's left them?
If not, what is the challenge of splitting this into two maps?
Or, as a compromise, use the target_endlevel entity to leave the sewers and enter the streets? But there would be no going back to the sewers using this method.
What's the downside of dmapping using 32-bit dmap and playing the mission with 64-bit TDM?
There should be no crashes (otherwise all existing missions are at risk), but is there a question of precision loss that a player would notice?
After 64-bit dmapping 3 large TDM maps (BCD, Politics, and L&L) with no crashes, I'm leaning toward not fixing dmap for Shadowhide's map as long as:
1 - it dmaps okay using 32-bit dmap
2 - the result is playable on 64-bit TDM
I'm reluctant to tinker with dmap for 2.06 this late in the game. I suspect this is a problem with the architecture of Shadowhide's city map.
If it becomes commonplace for mappers to make huge maps like Shadowhide's in the future, and the problem arises again, then we can revisit it in a later version of TDM, but NOT during beta periods.
I have no idea, but most likely the version dmapped with 32-bit build or even with 2.05 build is perfectly playable on any build of 2.06.
There is a worse side. In my opinion, this issue is caused by some floating point instability. Given the complexity of dmap algorithm (and the size of the map), it is not surprising.
The worst part is that floating point instabilities can produce bad maps: dmapping does not crash, but you will have issues at some point of the map as you play. I hope that nothing like this happens now or in the future, but the possibility should not be forgotten.
As for Bikerdude's map, I can probably provide some assistance in fixing it (well, my last idea was to try deleting things near some coordinates and see if it helps).
Should I close the issue as "won't fix"?
|Let’s hear what Biker says first|
Last edited: 05.01.2018 20:51
1. It doesn't dmap using 32-bit now, and it has never dmapped with the x64 version.
2. If I dmap it with 2.05, it will play in any version of TDM.
On the subject of the map, it's not massively larger than BCD in size or complexity:
Brushes - 33061
Patches - 14432
Entities - 6108
File Size - 54MB
Brushes - 26273
Patches - 15950
Entities - 6137
File Size - 45MB
The dmap time is down to 3-5mins now depending on CPU used.
What I recommend at this point is to dmap your map in 2.05. It appears that any fix for this will require some changes to dmap that we should not be willing to undertake at this stage of getting 2.06 out the door.
If this were a rampant problem across many new maps, I'd say we fix it now, but after following all the dmapping issues you've been having with this map, it strikes me that the map is rather unique.
Hopefully we can get it sorted in 2.07, should you ever need to re-release the map in the future.
Does this map depend on any assets that appear for the first time in 2.06?
If so, can you copy them to your mission folders and build in 2.05?
I'm thinking we push the problem to 2.07, especially since stgatilov suggested it might be a floating point problem tickled by huge maps.
@Grayman, nope, it's all 2.05.
Just tried to dmap under 2.06 to see if it would crash, and it did, with the usual malloc error.
Per this thread:
I'm closing this issue.
|Was marked as closed by Grayman, but still appears to be open.|
||Status||new => assigned|
||Assigned To||=> stgatilov|
||File Added: shadowhide_loadtdm_206_86-64.zip|
|14.11.2017 16:58||stgatilov||Note Added: 0009613|
|20.11.2017 17:36||stgatilov||Note Added: 0009646|
|21.11.2017 03:26||stgatilov||Note Added: 0009651|
|21.11.2017 08:00||Spooks||Note Added: 0009652|
|21.11.2017 16:43||stgatilov||Note Added: 0009661|
|06.12.2017 15:56||stgatilov||Note Added: 0009737|
|13.12.2017 13:39||grayman||Note Added: 0009768|
|13.12.2017 13:45||stgatilov||Note Added: 0009769|
|13.12.2017 13:47||grayman||Note Added: 0009770|
|13.12.2017 13:48||grayman||Note Edited: 0009770||View Revisions|
|13.12.2017 14:00||grayman||Note Added: 0009771|
|13.12.2017 14:54||grayman||Note Added: 0009772|
|13.12.2017 14:55||grayman||Note Edited: 0009772||View Revisions|
|13.12.2017 15:52||stgatilov||Note Added: 0009778|
|13.12.2017 16:02||grayman||Note Added: 0009779|
|02.01.2018 17:13||stgatilov||Assigned To||stgatilov => user81|
|02.01.2018 17:13||stgatilov||Status||assigned => feedback|
||Note Added: 0009983|
||Status||feedback => assigned|
||Note Edited: 0009983||View Revisions|
||Note Edited: 0009983||View Revisions|
|07.01.2018 17:15||grayman||Note Added: 0009988|
|23.01.2018 15:13||grayman||Note Added: 0010029|
||Note Added: 0010048|
|15.03.2018 13:16||grayman||Note Added: 0010063|
|15.03.2018 13:16||grayman||Status||assigned => resolved|
|15.03.2018 13:16||grayman||Resolution||open => no change required|
|15.03.2018 13:16||grayman||Fixed in Version||=> TDM 2.06|
||Note Added: 0010094|
||Status||resolved => closed|