0006725: Read translation strings directly as UTF8

ID	Project	Category	View Status	Date Submitted	Last Update

0006725	The Dark Mod	Feature proposal	public	23.06.2026 20:18	28.06.2026 20:23

Reporter	Geep	Assigned To
Priority	normal	Severity	normal	Reproducibility	have not tried
Status	new	Resolution	open
Product Version	TDM 2.14

Summary	0006725: Read translation strings directly as UTF8
Description	It would simplify the translation process considerably if the engine could read in the UTF8 all.lang file directly, and the other .lang files could be dispensed with. While a big ask code-wise, this could also provide a modern way forward. Maybe a 2.15 or 2.16 roadmap entry?
Additional Information	Possible implementations: 1) Add C++ code to translate from utf8 to iso-8859-x and other encodings. My gen_lang_plus utility has that for the iso encodings. Related bugtracker 3012 also gives a possible code source. 2) More ambitiously, do away with 8-bit encoding entirely (except maybe keyboard entry) and use UTF8 internally for strings. Instead of using DAT files for fonts, use a different format (let's call it UDAT here) that uses 16-bit Unicode values as the key instead of an implied 8-bit index. Latin and Cyrillic font bitmaps can be merged, since there is no longer a 256-character limit. For FMs, if a given all.lang has been lost, it can easily be reassembled from the constituent .lang files with a utility.
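A minimal sketch of option 1 for the Latin-1 case only, to show the scale of the conversion code. The function names DecodeUtf8 and Utf8ToLatin1 are hypothetical and are not taken from the TDM source, gen_lang_plus, or issue 3012. ISO-8859-1 maps one-to-one onto code points 0 to 255; the other iso-8859-x encodings would each need a lookup table in place of the range check.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decode a UTF-8 byte string into Unicode code points.
// Malformed sequences yield U+FFFD and decoding resumes at the offending byte.
// This sketch does not reject overlong encodings or surrogate code points.
std::vector<char32_t> DecodeUtf8(const std::string& in) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        int extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }  // stray continuation or invalid lead byte
        ++i;
        bool ok = true;
        for (int k = 0; k < extra; ++k) {
            if (i >= in.size() ||
                (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80) {
                ok = false;  // truncated or broken sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : static_cast<char32_t>(0xFFFD));
    }
    return out;
}

// Convert UTF-8 to ISO-8859-1. Code points above 0xFF have no Latin-1
// equivalent and are replaced by the fallback character.
std::string Utf8ToLatin1(const std::string& in, char fallback = '?') {
    std::string out;
    for (char32_t cp : DecodeUtf8(in)) {
        out.push_back(cp <= 0xFF
                          ? static_cast<char>(static_cast<unsigned char>(cp))
                          : fallback);
    }
    return out;
}
```

The same DecodeUtf8 step would also serve option 2, where each code point would be used as the lookup key into a UDAT-style glyph table instead of being narrowed to 8 bits.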
Tags	No tags attached.

Geep 24.06.2026 16:12 reporter ~0017334	Another possible implementation: 3) Like 2, but break up all.lang (and do away with it) into individual <language>.ulang files, encoded in utf8.

Geep 28.06.2026 20:23 reporter ~0017338	Hmmm, the question of breaking up all.lang is a bit orthogonal to other issues, so let me add another possible implementation: 4) Like 1, but break up all.lang (and do away with it) into individual <language>.ulang files, encoded in utf8. all.lang is an enormous text file, which makes it unwieldy to work with. The reason to keep it whole was that all translations would refer to a common [English] source. If we do break it up, perhaps the convention would be to keep the source English as a big comment within each non-English .ulang file? Or maybe some versioning enumeration within english.ulang that the other .ulang files could reference?
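One way the versioning idea could work, as a rough sketch only. It assumes each English entry carries a revision number and each translated entry records the revision it was translated against. The .ulang format does not exist yet, so the structs, the FindStale function, and the string IDs used here are all hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory form of an english.ulang entry.
struct SourceEntry {
    std::string text;
    int revision;  // bumped whenever the English text changes
};

// Hypothetical in-memory form of an entry in another <language>.ulang.
struct TranslatedEntry {
    std::string text;
    int sourceRevision;  // English revision this translation was made from
};

// Return the IDs of strings that are missing from the translation or that
// were translated from an older English revision. The result is in ID order
// because std::map iterates in key order.
std::vector<std::string> FindStale(
    const std::map<std::string, SourceEntry>& english,
    const std::map<std::string, TranslatedEntry>& translated) {
    std::vector<std::string> stale;
    for (const auto& [id, src] : english) {
        const auto it = translated.find(id);
        if (it == translated.end() || it->second.sourceRevision < src.revision) {
            stale.push_back(id);
        }
    }
    return stale;
}
```

With something like this, a translator would only need the list of stale IDs plus english.ulang, rather than a copy of the English source embedded in every file.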

Date Modified	Username	Field	Change
23.06.2026 20:18	Geep	New Issue
23.06.2026 20:18	Geep	Relationship added	related to 0003012
24.06.2026 16:12	Geep	Note Added: 0017334
28.06.2026 20:23	Geep	Note Added: 0017338