Alhexx' F.E.A.R. Arch00 File Format Analysis V1.00
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

So, let's take a look at FEAR's (quite strange) archive format this time.
First off, I want to tell you something about the file layout before we
start with the individual data types.


I. File Layout
-=-=-=-=-=-=-=

1. File Header
2. Name Table
3. File List
4. Directory List
5. File Data

Okay, now we're ready to analyze the individual parts of the file.


II. File Header
-=-=-=-=-=-=-=-

Here's the structure for the header:

struct FEAR_HEADER
{
    unsigned long ulMagic;              // "LTAR"
    unsigned long ulVersion;            // 3
    unsigned long ulNameTableLength;
    unsigned long ulNumDirs;
    unsigned long ulNumFiles;
    unsigned long ulUnknown1;           // usually 1
    unsigned long ulUnknown2;           // usually 0
    unsigned long ulUnknown3;           // 0, 1
    unsigned char ucCRC[16];
};  // 48 Bytes

ulNameTableLength tells you the length of the name table in bytes.
ulNumDirs tells you the number of directories in the directory list.
ulNumFiles tells you the number of files in the file list.
I really have no idea about the unknown parts...


II.i Header CRC
-=-=-=-=-=-=-=-

As you can see, there's a CRC saved in the header of the file. The length of
the CRC is 128 bits (= 16 bytes).
Unfortunately, I have no idea how the CRC is calculated. It looks like an
MD3/MD4/MD5 checksum, but I have not been able to recalculate it :(

But: I've done a few tests using my hex editor, and it seems that F.E.A.R.
does not check the CRC when loading the files. I set the CRC in the files
"FEARA.Arch00", "FEARE.Arch00" and "FEARL.Arch00" to 0 and the game launched
without any errors, so there's a good chance that you can simply leave the
CRC at 0 when creating a new Arch00 file. ^_^


III. Name Table
-=-=-=-=-=-=-=-

The next part, which follows the header, is the name table.
The names of all files and directories are stored here, each one terminated
by a NULL byte. However, the NULL terminator is "stretched": extra NULL bytes
are added so that the next name starts at an offset that is a multiple of 4.
So a name terminator has at least 1 NULL byte and at most 4 NULL bytes.

Here's a "screenshot" of the first 64 bytes of the name table from
"FEAR.Arch00":

     0 1  2 3  4 5  6 7  8 9  A B  C D  E F | ASCII
----+---------------------------------------+----------------
0x30|0000 0000 416E 696D 6174 696F 6E44 6174|....AnimationDat
0x40|6162 6173 6500 0000 4645 4152 5F61 6E69|abase...FEAR_ani
0x50|6D2E 4761 6D64 6230 3064 6300 4645 4152|m.Gamdb00dc.FEAR
0x60|5F61 6E69 6D2E 4761 6D64 6230 3070 0000|_anim.Gamdb00p..

The dump shows absolute file offsets; the name table starts at 0x30, right
after the 48-byte header. The offsets below are relative to the start of the
name table.

The first name is "AnimationDatabase". Its offset within the name table is
0x4. After the name there's a terminating NULL char. The position AFTER the
terminating NULL char is 0x16, which is NOT divisible by 4, so there are 2
more NULL chars following.
The next name is "FEAR_anim.Gamdb00dc". Its offset is 0x18. The position
after the terminating NULL char is 0x2C, which IS divisible by 4, so there
are no more NULL chars.
I think you should understand how the method works now.

Now let's take a closer look at the first 4 bytes:
The first 4 bytes of the name table are always 0. The reason is that the name
at offset 0 is reserved for the ROOT directory, which has no name.
(You can see it that way: the name of the root dir is "", which explains the
4 terminating NULL bytes.)

Note: You do not need to create an array of strings here. The names are not
accessed by index, but by their offset in bytes. This means you can read the
whole name table as a single block and then simply jump to the offset you
need.


III.i Directory Names
-=-=-=-=-=-=-=-=-=-=-

There is something weird about directory names, or to be more exact,
sub-directory names:
Let's say you have a directory named "Dir1", and it has a sub-directory
called "SubDir1". Then the name of the sub-directory is stored as
"Dir1/SubDir1". That means that the name of a directory contains the full
path to it.


IV. File List
-=-=-=-=-=-=-

Let's take a look at the File List Entry data type:

struct FEAR_FILE
{
    unsigned long    ulNameOffset;
    unsigned __int64 uqFileOffset;
    unsigned __int64 uqFileSize;
    unsigned __int64 uqFileSizeCompressed;
    unsigned long    ulDummy;           // 0
};  // 32 Bytes

ulNameOffset is the offset of the file's name within the name table.
uqFileOffset is the absolute offset of the file data within the archive.

Note: uqFileSize and uqFileSizeCompressed are usually (or always?) the same,
i.e. the files are not compressed.
I have to admit that I do not even know which of the two entries holds the
real and which one the compressed file size, since I haven't found a
compressed file yet. And I do not know what compression algorithm would be
used if the data were compressed...

Note: When implementing this data type, remember to place it in a
"pragma pack" block. Otherwise the compiler will pad the structure and report
a size larger than 32 bytes (40 bytes with the usual 8-byte alignment of the
64-bit members) *_*


V. Directory List
-=-=-=-=-=-=-=-=-

Now it's getting more interesting: the Directory List...

struct FEAR_DIR
{
    unsigned long ulNameOffset;
    unsigned long ulFirstSubFolder;
    unsigned long ulNextFolder;
    unsigned long ulNumFiles;
};  // 16 Bytes

ulNameOffset is the offset of the directory's name within the name table.
It is 0 for the ROOT directory.
ulFirstSubFolder is the index of the first sub-folder within the directory
list. It is 0xFFFFFFFF if the directory has no sub-dirs.
ulNextFolder is the index of the next folder on the same level.
It is 0xFFFFFFFF if there are no more directories on the same level.
ulNumFiles is the number of files within this directory.

To get a better idea of ulFirstSubFolder and ulNextFolder, see the following
section:


V.i Directory Tree Structure
-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Usually, when implementing a tree-like structure, you would implement it
using recursive functions, and you would also store the tree recursively in
a file (as most formats do). However, the Arch00 format is different.
It stores the directory tree using lists.
Let's take a look at a tree here (make sure you use a fixed-width font):

        ROOT                    ROOT
       / | \                     |
      /  |  \                    |
     /   |   \                   |
  SD1   SD2   SD3               SD1->SD2->SD3

What you see on the left side is a tree as we (should) know it.
What you see on the right side is the tree as it is realized in the Arch00
format. So if you want to access the third sub-directory of a dir, you have
to go through the first two sub-dirs to reach it.
In the right diagram above, "|" stands for Sub-Folder and "->" stands for
Next-Folder.


V.ii Files within a Directory
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

As you might have noticed, there is no "FirstFileIndex" or anything like it
in here. And this is quite inconvenient.
So, let's say you are at the directory with index 'i' in the dir list. How do
you get the index of its first file in the file list?
Quite simple, but disappointing: you have to sum up the "ulNumFiles" of the
directories 0..(i-1) to get the index of the first file of your directory.


VI. Final Words
-=-=-=-=-=-=-=-

Well, I don't think that this is one of my best documents. However, I hope it
will help you with your work on the file format.


~~ Greetings Fly Out To: ~~

Cici
ficedula
Jackrabbitz
Kaddy #17
Levitikus
MatsuJin
mirex
Pitty
Qhimm
s_kaspar
The SaiNt

... and everyone who misses his/her name here!


Visit or contact me:
--------------------
Home  : http://www.alhexx.com
Forum : http://forums.alhexx.com
   or : http://ffab.mypage.sk
Mail  : alhexx@alhexx.com

 - Alhexx
00:38 2006-04-24
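

Example sketch (C)
------------------
The lookups described in sections III, V.i and V.ii can be sketched in C as
follows. This is only a minimal sketch under a few assumptions, not a
reference implementation: the function names and the FEAR_NO_DIR constant are
made up for illustration, fixed-width types replace "unsigned long" (which is
not 4 bytes on every platform), and the compiler is assumed to support
"#pragma pack" and C11's _Static_assert.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-width equivalents of FEAR_FILE and FEAR_DIR (assumed naming). */
#pragma pack(push, 1)
typedef struct {
    uint32_t name_offset;
    uint64_t file_offset;
    uint64_t file_size;
    uint64_t file_size_compressed;
    uint32_t dummy;
} FearFile;

typedef struct {
    uint32_t name_offset;
    uint32_t first_sub_folder;
    uint32_t next_folder;
    uint32_t num_files;
} FearDir;
#pragma pack(pop)

/* Catch missing packing at compile time. */
_Static_assert(sizeof(FearFile) == 32, "FEAR_FILE must be 32 bytes");
_Static_assert(sizeof(FearDir) == 16, "FEAR_DIR must be 16 bytes");

/* Hypothetical name for the "no such directory" marker. */
#define FEAR_NO_DIR 0xFFFFFFFFu

/* Bytes one name occupies in the name table: the name itself, one NULL
   terminator, then padding up to the next multiple of 4. */
size_t fear_padded_name_size(size_t name_length)
{
    return (name_length + 1 + 3) & ~(size_t)3;
}

/* Index of the first file of directory dir_index: the sum of num_files
   over the directories 0..(dir_index - 1). */
uint32_t fear_first_file_index(const FearDir *dirs, uint32_t num_dirs,
                               uint32_t dir_index)
{
    uint32_t first = 0;
    assert(dir_index < num_dirs);
    for (uint32_t i = 0; i < dir_index; ++i)
        first += dirs[i].num_files;
    return first;
}

/* Count the direct sub-directories of dir_index by following
   first_sub_folder once and then the next_folder chain. The extra bounds
   checks stop the loop on a corrupt or cyclic list. */
uint32_t fear_count_subdirs(const FearDir *dirs, uint32_t num_dirs,
                            uint32_t dir_index)
{
    uint32_t count = 0;
    uint32_t child;
    assert(dir_index < num_dirs);
    child = dirs[dir_index].first_sub_folder;
    while (child != FEAR_NO_DIR && child < num_dirs && count < num_dirs) {
        ++count;
        child = dirs[child].next_folder;
    }
    return count;
}
```

With the name table example from section III, fear_padded_name_size(17)
gives 20 for "AnimationDatabase" at offset 0x4, which puts the next name at
0x18. The same loop as in fear_count_subdirs can be used to visit each
sub-directory instead of just counting them.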