Patrik Nusszer's programming blog

Transformers games configuration - Low level


The key - Thank You, Bacter

In this very first paragraph I'd like to say thanks to Bacter, who took the time to disassemble the game and get the encryption key. It is:

as;dwepo2345098]qw]{}p2039458pseasdfzcvvp;aseiurwefsdcfszdcvn

Introduction

This is the first article on this blog, and it also tests how an article looks with this design. I'm not really used to blogging, so please tolerate any mistakes in expression or grammar (I'm Hungarian). In this article I'd like to show you, programmer, advanced or beginner, how the configuration files with the extensions .ini and .int are encrypted. There is an educational utility called TF Game Util that can do this job (encryption and decryption); you can download it from the softwares page. Before you read on and download it, please note: the purpose of this article and of the utility is purely educational, to improve your skills. If you are a fresh developer not committed to anything and willing to discover the world of programming, you'll find interesting articles here. I've gathered quite a lot of experience even though I've only been programming for 2 years. What you discover here is something you will not learn at school, even at one that specializes in teaching programming. So let's get to the Transformers configurations...

Configuration types

All Cybertronian Transformers games use Unreal Engine. This engine uses 2 types of configuration files. One has the .ini extension and stores settings for the logical functioning of the game as variable-value assignments. The other comes with varying extensions: .int, .DEU, .ITA, etc. These are the so-called 'localizations'. Sounds familiar? 'locale'? You know, that 'en_US', 'en_GB', 'hu_HU' stuff telling the software which language is requested. In short, these 'localization' files contain the texts appearing in the game in different languages. The .int one contains the English texts; the others are translations, and their extensions reveal the language they stand for (e.g. .DEU means 'Deutsch', so it is the German translation).

IO Conclusion

So if they are configurations, can I read them like plain text? Of course not. They are encrypted. If I decrypted one, could I read it? Still no, because they are not just plain text that has been encrypted. They have a structure. So there are two factors to consider: cryptography and structure.

The cryptography

The cryptography of the configurations is quite simple.

Simple XOR

No, this is not a real cryptographic algorithm. XOR is a logical, bitwise operation defined by a truth table, one of those your computer's electronics depend on. Such bitwise operations are AND, NOT, OR, and XOR, meaning Exclusive OR. A truth table defines the result for every combination of one or two input bits. Only NOT takes a single operand, and it gives its opposite: if the operand is 1, the result is 0; if the operand is 0, the result is 1. In case of AND, 0 AND 0 gives 0, 0 AND 1 gives 0, 1 AND 0 gives 0, 1 AND 1 gives 1. You can remember this truth table like this: the result is 1 if and only if both the first and the second operands are 1; all other cases give 0.
OR: 0 OR 0 = 0, 0 OR 1 = 1, 1 OR 0 = 1, 1 OR 1 = 1; if at least one of the operands is 1, the result is 1.
And finally XOR: 0 XOR 0 = 0, 0 XOR 1 = 1, 1 XOR 0 = 1, but 1 XOR 1 = 0; if exactly one of the operands is 1, the result is 1.
In cryptography you have an input you would like to encrypt, and a key. Since they are stored on computers, both are sequences of bits. And it's as simple as it sounds: you treat these two sequences as two sequences of operands. You take one bit from each, in order, perform the XOR operation on them, and write the result bits to a file.
If the key is shorter than the content you are encrypting, the key is repeated (which is bad practice anyway). You can then decrypt the content by performing XOR on the encrypted data and the key. You may ask, how is that possible? Why does it necessarily give back the exact original data? XOR is an exceptional one: it is its own inverse, so (data XOR key) XOR key = data. You can prove it by trying out each of the four cases. You'll always get back the input.
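To make the round trip concrete, here is a minimal Python sketch of XOR with a repeating key. The function name xor_with_key and the short key b"secret" are made up for illustration; this is not the game's code, only the general technique described above.

```python
from itertools import cycle


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


key = b"secret"                        # hypothetical key, for illustration only
plain = b"Hello, Cybertron!"
cipher = xor_with_key(plain, key)      # encrypt
restored = xor_with_key(cipher, key)   # the very same call decrypts
```

Since the same function encrypts and decrypts, a tool only needs one routine for both directions.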

Now I'll return to something already mentioned but important: repeating the key is bad practice, and it brings up some security considerations. The key for the configurations is 61 characters long, so it's obvious that it is repeated. There's something else the game developers did take care of, though: the encryption of zeros, which is even more dreadful than repeating the key. Take a look:
0 XOR 1 = 1;
0 XOR 0 = 0;
1 XOR 0 = 1;
These three have one thing in common: whenever one operand is 0, the result is simply the other operand. If the two kinds of result are well mixed, along with the fourth, non-reflecting operation (1 XOR 1 = 0) of course, there should be no problem. But otherwise it's like copying either the key or the content into the encrypted data.
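A small Python sketch shows why zeros are dangerous. The key b"secret" is a made-up stand-in, not the real one; the point is only that a run of zero bytes XORed with a key comes out as the key itself.

```python
from itertools import cycle

key = b"secret"        # hypothetical key, for illustration only
zeros = bytes(10)      # ten zero bytes as "content"

# 0 XOR k == k for every key byte k, so the key shows up in plain sight
leaked = bytes(z ^ k for z, k in zip(zeros, cycle(key)))
```

Here leaked contains the key repeated over ten bytes, which is exactly the kind of copying into the encrypted data described above.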

Key and content in danger

I know it is not yet time to talk about structure, but it's important to mention that one reason why a configuration can't be plain text is that a single configuration can contain different encodings. Why? Because the default encoding is Windows-1252, but its character set is not large enough for translations. Mainly the localizations, but also the main configurations (.ini), contain 16 bit Little Endian Unicode blocks (sections). Let's forget about Windows-1252. There's something about Unicode... Each character is stored in 2 bytes. The Latin characters are found at the very beginning of the code range. Imagine that the German localization file contains mainly Latin letters with some special characters requiring the use of Unicode. It means that, most of the time, the second byte remains blank...

...filled with zeros

If those bytes were encrypted too, almost every second byte of an encrypted Unicode block would simply reflect a character of the key. The solution is simple: only the first byte of each pair is encrypted, and the second is left unencrypted.
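The difference can be sketched in Python. Both function names and the key b"secret" are hypothetical, and the sketch assumes the key advances by one position per processed byte, which the article does not state explicitly.

```python
from itertools import cycle


def xor_every_byte(block: bytes, key: bytes) -> bytes:
    """Naive variant: XOR all bytes, including the mostly-zero second bytes."""
    return bytes(b ^ k for b, k in zip(block, cycle(key)))


def xor_first_of_each_pair(block: bytes, key: bytes) -> bytes:
    """XOR only the first byte of every 2-byte pair; leave the second as-is.

    Assumption: the key moves on by one per processed byte.
    """
    out = bytearray(block)
    for n, i in enumerate(range(0, len(out), 2)):
        out[i] ^= key[n % len(key)]
    return bytes(out)


key = b"secret"                        # hypothetical key, for illustration only
block = "Hallo".encode("utf-16-le")    # every second byte is zero
naive = xor_every_byte(block, key)
careful = xor_first_of_each_pair(block, key)
```

In naive, the odd-indexed bytes are characters of the key; in careful they stay zero, so nothing of the key is reflected there.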

Cryptography conclusion

There are 2 (in fact 3, but we'll get to that later) types of blocks in a configuration: Windows-1252 encoded ones and 16 bit Little Endian Unicode encoded ones. All bytes of Windows-1252 encoded blocks are processed, while in Unicode blocks only the first byte of each byte pair is processed.

The Structure

As I mentioned, there's a kind of structure; the configurations are not just plain text. There are blocks, and integers stored in two's complement. In fact these integers might also be called blocks, if by block we mean a block of bytes. But to keep it simple, let's only call blocks those bytes that take part in decryption. This is the structure:
- The first 4 bytes of the file are converted to a 32 bit integer. It can't be negative, because this number tells us how many blocks there are in the configuration.
The following pattern is repeated as many times as the number of blocks:
- 4 bytes are converted to a signed 32 bit integer. This number defines the size of the next block in bytes. It can be positive, negative, or 0. It's essential to distinguish between these, as they stand for 3 different block types. The absolute value should always be taken as the size.
- Next come the bytes of the block. You need to read as many bytes from the encrypted data as the size of the block. Every block's last byte is a so-called null terminator. This is a zero byte telling you it is the end of the block, and it's not processed.
- And that's all... repeat these two steps until you reach the number of blocks.

The Structure > Integer bytes

So the 4 bytes are converted to 32 bit signed integers. I'm not going to take the time now to explain how you'd perform the conversion. There is a function for it in the standard library of the language you are using. You can also write your own function; it can be solved mathematically. If interested, take a look at the source of my project NussConverterX. It is a converter with a byte and base converter class and an additional string math class. So there are these 4 bytes. But converting them is not that easy if your language's standard library function has no endianness parameter. Why does that matter? The best thing about game engines is that you can develop cross-platform games in them, and Unreal Engine is no exception. The console (PS and XBOX) versions of the Cybertronian games use Unreal Engine too, and there are configurations for them as well. There's a little difference, though.

Endianness

Endianness is the byte order in which multi-byte data types are stored on different platforms. It can be little or big. One is the reverse of the other. And that's enough to know. If a converter function only supports one of them, pass it the 4 bytes in reverse order to get the value for the other. So you should choose the endianness according to the platform.
- Windows uses little endian byte order.
- XBOX and PS use big endian byte order. For these platforms, the 4 bytes should be reversed before passing them to a function that processes bytes in little endian order.

NOTE: This information tells you which endianness is the mainstream, the default, the one suggested for each platform. It's up to you in what order you pack your bytes.
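In Python the conversion with an endianness parameter could look like the sketch below. int.from_bytes is part of the standard library; the helper name read_int32 and its big_endian flag are made up for this example.

```python
def read_int32(raw: bytes, big_endian: bool) -> int:
    """Convert 4 bytes to a signed 32 bit integer (two's complement)."""
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    order = "big" if big_endian else "little"
    return int.from_bytes(raw, byteorder=order, signed=True)


raw = bytes([0xFE, 0xFF, 0xFF, 0xFF])              # as stored little endian
as_little = read_int32(raw, big_endian=False)      # Windows byte order
as_big = read_int32(raw[::-1], big_endian=True)    # reversed bytes, console order
```

Both calls give the same number, which shows that one order really is just the reverse of the other.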

The Structure > Integers & blocks

So there are 3 possible outcomes when you convert the 4 bytes to a 32 bit signed integer:
- 0: It means an empty block with no bytes. The reason why it exists at all is unknown.
- Positive integer: It means a Windows-1252 encoded block where all bytes are processed except the null terminator, which is included in the block size.
- Negative integer: It means a 16 bit Unicode block with Little Endian byte order, on all platforms. Only the first byte of each pair is processed. You need to take the absolute value of the integer as the size, and always skip the second byte.
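The three cases above can be written down as a small Python helper. The name classify_block and the shape of its return value are hypothetical; the encoding names are the ones Python uses for Windows-1252 and 16 bit Little Endian Unicode.

```python
def classify_block(size_field: int) -> tuple:
    """Map the signed size field to (encoding, size in bytes, XOR step).

    The step is 1 when every byte is processed and 2 when only the
    first byte of each pair is processed.
    """
    if size_field == 0:
        return (None, 0, 0)                  # empty block, nothing to process
    if size_field > 0:
        return ("cp1252", size_field, 1)     # Windows-1252 block
    return ("utf-16-le", -size_field, 2)     # Unicode block, absolute value
```

For example, classify_block(-12) describes a 12 byte Unicode block in which every second byte is skipped.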

The Structure > Cryptography in association with blocks

It's easy to concentrate so hard on the main points that you forget about details. So remember, because it's important: when decrypting a new block, you should not start from the first character of the key, but from where you stopped in the previous block. Imagine it as if you were processing one whole document. The only difference is that it comes in pieces.
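The "whole document in pieces" idea can be checked with a few lines of Python. The helper xor_chunk, the key b"secret" and the sample pieces are all made up; the sketch only covers blocks where every byte is processed.

```python
def xor_chunk(chunk: bytes, key: bytes, key_pos: int):
    """XOR one piece, starting at key_pos; return the result and the new position."""
    out = bytes(b ^ key[(key_pos + i) % len(key)] for i, b in enumerate(chunk))
    return out, key_pos + len(chunk)


key = b"secret"                               # hypothetical key
pieces = [b"first ", b"second ", b"third"]

pos = 0
encrypted = []
for piece in pieces:
    enc, pos = xor_chunk(piece, key, pos)     # carry the key position forward
    encrypted.append(enc)

# Processing the joined document in one go gives the same bytes
whole, _ = xor_chunk(b"".join(pieces), key, 0)
```

If pos were reset to 0 for every piece, the second and third pieces would no longer match the whole-document result.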

Structure conclusion

Structure overview:
- The first 4 bytes, converted to a 32 bit integer, give the number of blocks. These bytes are converted according to the endianness of the platform.
Repeated as many times as the number of blocks:
- 4 bytes, converted to a 32 bit signed integer, give the size of the block in bytes. These bytes are converted according to the endianness of the platform. The absolute value of this integer is taken.
- Reading as many bytes from the stream as the size of the block gives a block that can be processed. The last byte, the null terminator, is skipped.

Overall low-level conclusion

- The first 4 bytes, converted to a 32 bit integer, give the number of blocks. These bytes are converted according to the endianness of the platform.
Repeated as many times as the number of blocks:
- 4 bytes, converted to a 32 bit signed integer, give the size of the block in bytes. These bytes are converted according to the endianness of the platform. The absolute value of this integer is taken.
- Reading as many bytes from the stream as the size of the block gives a block that can be processed. The last byte, the null terminator, is skipped.
- Processing: if the block size was negative, only the first byte of each byte pair is processed. If the block size was zero, there is nothing to process. Processing means XORing the block bytes with the key. The result is a decrypted byte. The key is not restarted for every block; it continues as if one whole document were processed.
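Putting the steps above together, a decryptor could be sketched in Python roughly like this. It is not the source of TF Game Util; the function name decrypt_config and its parameters are made up. Two details are assumptions the article does not spell out: the key position advances only for bytes that are actually XORed, and the terminator of a Unicode block is treated as a full 2-byte unit. If real files disagree, those two lines are the ones to adjust.

```python
import struct

KEY = b"as;dwepo2345098]qw]{}p2039458pseasdfzcvvp;aseiurwefsdcfszdcvn"


def decrypt_config(data: bytes, big_endian: bool = False, key: bytes = KEY):
    """Return a list of (is_unicode, block_bytes) with the XOR undone."""
    fmt = ">i" if big_endian else "<i"   # signed 32 bit, platform byte order
    pos = 0
    key_pos = 0                          # never reset between blocks
    (count,) = struct.unpack_from(fmt, data, pos)
    pos += 4
    blocks = []
    for _ in range(count):
        (size,) = struct.unpack_from(fmt, data, pos)
        pos += 4
        length = abs(size)               # size in bytes, as the article states
        raw = bytearray(data[pos:pos + length])
        if len(raw) != length:
            raise ValueError("file ends in the middle of a block")
        pos += length
        if size != 0:
            unit = 2 if size < 0 else 1  # Unicode: first byte of each pair only
            body = length - unit         # ASSUMPTION: terminator is one unit wide
            for i in range(0, body, unit):
                raw[i] ^= key[key_pos % len(key)]
                key_pos += 1             # ASSUMPTION: advance per XORed byte
        blocks.append((size < 0, bytes(raw)))
    return blocks
```

Because XOR is its own inverse, running the same function over a plain container produces the encrypted blocks, so the sketch can be tested as a round trip without any game files.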

Note: Since there can be multiple encodings in one configuration, you can't just write the decrypted bytes to a file and open it in a text editor. The solution is up to You, Developer. If you ask me, I'd convert the non-Unicode characters to Unicode so that nothing is lost.
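One possible way to do that conversion in Python is to decode each decrypted block with its own encoding and then work with ordinary strings. The helper name block_to_text is hypothetical; "cp1252" and "utf-16-le" are Python's names for the two encodings.

```python
def block_to_text(is_unicode: bool, raw: bytes) -> str:
    """Decode one decrypted block to a string and drop the null terminator."""
    encoding = "utf-16-le" if is_unicode else "cp1252"
    # errors="replace" because a few byte values are undefined in Windows-1252
    return raw.decode(encoding, errors="replace").rstrip("\x00")


ansi_text = block_to_text(False, b"caf\xe9\x00")
wide_text = block_to_text(True, "Größe".encode("utf-16-le") + b"\x00\x00")
```

Once every block is a string, the whole configuration can be written out in a single Unicode encoding such as UTF-8.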