There’s no complete font available online with all the different font sizes and characters from Nokia cellphones from the late 90s and early 2000s, so I set out to find a way to make my own. There’s a handful of programs out there that can open and edit different parts of early Nokia firmwares, but the vast majority depend on Windows, and have become increasingly hard to find over the past 20 years. I wanted a method of extracting the bitmap for each character from a DCT3 Nokia phone without the help of any external program, under Linux.
Obtaining a firmware flash file
Two versions of flash files are typically used when extracting or flashing DCT3 phones: the .fls format, commonly referred to as Dejan flash files, and the Wintesla format. Both formats serve the same function, but the Dejan format stores the entire flash in a single .fls binary file, while the Wintesla format has the PPM as a separate file. The font chunk is saved in the same way in either format, so the type used is of little importance.
A flash cable, such as the one described on NuukiaWorld, can be used to extract the flash file of a DCT3 phone. However, this firmware will only have strings and fonts from the locale it’s programmed with, meaning some characters won’t be available in the final exported font. The ideal option is to find Nokia’s flash files used for their service software Wintesla, as it comes with all the different locales available for a particular software version for the chosen phone model. In my reverse engineering of the font storage format, I’ll be using firmware version 4.18 extracted from a 3310, with the B langpack, which includes British English, German, French, Italian, Dutch, Turkish, Arabic, and Hebrew.
Finding the font chunk
The firmware consists of several different blocks: MCU, PMM, and PPM. Each block is then split up into further chunks and subchunks, which can be seen when loading up the firmware in PPModd or NokiX. The PPM is the block that’s of interest here, since it contains all the fonts for a particular locale, as well as all the menu strings and bitmaps. Exporting the FONT chunk in either of these programs results in a binary file that is a subset of the larger PPM block. A quick comparison between the FONT chunk file and the flash file it originated from shows that the chunk starts 28 bytes after the start of the string 46 4F 4E 54 66 63 6F 6E 76 (seen as FONTfconv in the text view). This holds true for all the firmwares of the different phones I’ve tested, meaning this string can be used to programmatically find the FONT subchunk.
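The search described above can be sketched in a few lines of Python. This is only an illustration: the function name `find_font_chunk` and the constant names are made up here, and the sketch assumes the first occurrence of the signature is the one that matters.

```python
from typing import Optional

# ASCII "FONTfconv", the marker string described in the text.
SIGNATURE = bytes.fromhex("46 4F 4E 54 66 63 6F 6E 76")
# The FONT chunk starts this many bytes after the first byte of the marker.
CHUNK_DELTA = 28


def find_font_chunk(flash: bytes) -> Optional[int]:
    """Return the offset of the FONT chunk in a flash image, or None."""
    pos = flash.find(SIGNATURE)
    if pos < 0:
        return None
    return pos + CHUNK_DELTA


if __name__ == "__main__":
    # Synthetic stand-in for a flash file; a real one would be read from disk.
    fake = b"\xff" * 100 + SIGNATURE + bytes(19) + b"\x00\x00\x00\x04"
    print(find_font_chunk(fake))
```

Returning `None` instead of raising keeps the caller in charge of what to do with a flash file that has no FONT chunk, such as an MCU-only Wintesla file.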
Data structure
Both the PCD8544 and the OM6206 were used to drive the LCDs on phones in the DCT3 generation, and both use the same method of memory addressing. I found conflicting reports as to whether the PCF8812 or the OM6206 was used to drive the larger 96×65 displays for DCT3 phones. On the surface, they seem to be identical, but their bit order is swapped, with the positions of the MSB and LSB flipped. Given that I can take the display from a Nokia 3410 and place it in a 3310 and still see the contents of the screen in the top left corner without anything being flipped around, I find it more likely that the OM6206 was used. The datasheet details that each byte represents 8 vertical pixels, with the least significant bit as the topmost pixel and the most significant bit as the lowermost pixel.
Image data in Nokia’s firmware is stored in the same format. For an example character, each column can be interpreted as a byte, translating to 7E FE 88 FE 7E 00 in hexadecimal. When viewing the font in a hex editor, this is how it would be seen. In the firmware, all characters of equal width are stored as a vertical list in a matrix. My firmware has characters ranging from 2 pixels wide all the way to 13 pixels wide, so there are 12 separate matrices containing font pixel data.
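To make the column-per-byte layout concrete, here is a small Python sketch that turns a run of column bytes into text rows. The function name `columns_to_rows` is invented for this example, and the bit order is left as a parameter because it is exactly the detail that differs between the controllers mentioned above.

```python
from typing import List


def columns_to_rows(columns: bytes, height: int = 8, lsb_top: bool = True) -> List[str]:
    """Render column bytes (8 vertical pixels each) as rows of '#' and '.'.

    lsb_top=True treats bit 0 as the topmost pixel; False treats bit 7 as the top.
    Only the first `height` rows (at most 8) are produced.
    """
    rows = []
    for y in range(min(height, 8)):
        bit = y if lsb_top else 7 - y
        rows.append("".join("#" if (col >> bit) & 1 else "." for col in columns))
    return rows


if __name__ == "__main__":
    sample = bytes.fromhex("7E FE 88 FE 7E 00")  # the six column bytes from the text
    for order in (True, False):
        print("lsb_top =", order)
        print("\n".join(columns_to_rows(sample, lsb_top=order)))
```

Switching `lsb_top` simply mirrors the output vertically, so rendering a known glyph both ways is a quick check of which order a given dump uses.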
Chunk layout
The FONT chunk can be divided into 6 areas.
#1
The first area consists of the first 4 bytes, with the fourth byte being the number of font styles and weights present in the PPM. In this particular firmware, the fonts large/bold, small/plain, small/bold, and tiny/plain are present, so the fourth byte is set to 04.
#2
The second area follows these 4 bytes. The length of this area is a multiple of 44 bytes, with the multiple being the fourth byte of the first area. This firmware has 4 fonts, so this area is 176 bytes long. It contains some general information about the fonts, such as the font names and some memory locations, and can be split up in the following format for better visibility:
00 00 15 58 00 00 15 38 00 00 00 B0 00 00 00 DD 00 6E E0 00 07 80 00 00 6C 61 72 67 65 00 00 00 00 00 00 62 6F 6C 64 00 00 00 00 00
00 00 15 2C 00 00 15 14 00 00 07 74 00 00 00 CB 00 65 E0 00 07 80 00 00 73 6D 61 6C 6C 00 00 00 00 00 00 70 6C 61 69 6E 00 FF 00 00
00 00 15 00 00 00 14 F0 00 00 0D A8 00 00 00 D1 00 67 E0 00 07 80 00 00 73 6D 61 6C 6C 00 00 00 00 00 00 62 6F 6C 64 00 00 FE 00 00
00 00 14 D4 00 00 14 CC 00 00 14 0C 00 00 00 14 00 14 E0 00 07 80 00 00 74 69 6E 79 00 00 00 00 00 00 00 70 6C 61 69 6E 00 FD 00 00
Each 44-byte row above describes one font, and its fields are labelled a through k from left to right.
The splits and what they are meant to represent are mostly educated guesses based on what I’ve found in different firmwares.
a: Subtracting b from a results in a multiple of 8, and the difference for each subsequent font is 8 less than for the one before it (32, 24, 16, and 8 in this firmware). The difference between a and b will always be 8 for the last font. The significance of a, b, or the difference between them is not known to me yet.
b: Same as a.
c: Represents the byte offset within the font chunk of where information for each character is stored, but with a 44 byte multiple offset applied. The multiple is determined by the font’s position in the list. The first font has an offset of 0 bytes added to the value at c, the second has 44 bytes added, the third 88, the fourth 132, and so forth.
d: This seems to be the number of extra “spacers”, as I refer to them, present after the character information.
e: Represents the number of “character groups”, minus 1, explained further down.
f: Seems like it might be related to e, but I currently don’t know its use.
g: Default character. Seems to be E000 for alphanumerical fonts and 0020 for numerical fonts. Special fonts such as the calculator or phone book font use default characters such as 002A or 0039.
h: Flags of some sort for the font. I’ve only ever seen 07 80 00 00, but their function isn’t immediately obvious to me.
i: Font style, e.g. large, small, chin.
j: Font weight, e.g. bold, plain, 24.
k: Probably some sort of style/weight ID. Typically starts at 00 and decrements by 1 for each subsequent style (FF, FE, FD, FC…); however, this does not hold true for all firmwares.
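As a rough illustration, the 44-byte records can be unpacked as follows. The byte boundaries used here are my reading of the sample rows and are assumptions: a, b, and c as 4 bytes each, d through g as 2 bytes each, h as 4 bytes, an 11-byte style name, a 6-byte weight name, and the ID at byte 41. The text also does not say which byte the c offset counts from; counting from the start of the second area is an assumption, chosen because 4 + 176 = 180 is where the third area would begin for the first font. All names are hypothetical.

```python
import struct
from typing import Dict, List

RECORD_SIZE = 44  # bytes per font in the second area

# The four sample records from the text, preceded by a 4-byte first area.
# Only its fourth byte (the font count) is described; the rest is filler.
SAMPLE = bytes.fromhex(
    "00 00 00 04"
    " 00 00 15 58 00 00 15 38 00 00 00 B0 00 00 00 DD 00 6E E0 00 07 80 00 00"
    " 6C 61 72 67 65 00 00 00 00 00 00 62 6F 6C 64 00 00 00 00 00"
    " 00 00 15 2C 00 00 15 14 00 00 07 74 00 00 00 CB 00 65 E0 00 07 80 00 00"
    " 73 6D 61 6C 6C 00 00 00 00 00 00 70 6C 61 69 6E 00 FF 00 00"
    " 00 00 15 00 00 00 14 F0 00 00 0D A8 00 00 00 D1 00 67 E0 00 07 80 00 00"
    " 73 6D 61 6C 6C 00 00 00 00 00 00 62 6F 6C 64 00 00 FE 00 00"
    " 00 00 14 D4 00 00 14 CC 00 00 14 0C 00 00 00 14 00 14 E0 00 07 80 00 00"
    " 74 69 6E 79 00 00 00 00 00 00 00 70 6C 61 69 6E 00 FD 00 00"
)


def parse_font_records(chunk: bytes) -> List[Dict]:
    """Split the second area into one dict per font (field widths are guesses)."""
    fonts = []
    for n in range(chunk[3]):  # fourth byte of the first area = font count
        start = 4 + n * RECORD_SIZE
        rec = chunk[start:start + RECORD_SIZE]
        a, b, c, d, e, f, g = struct.unpack(">IIIHHHH", rec[:20])
        fonts.append({
            "index": n, "a": a, "b": b, "c": c,
            "extra_spacers": d,       # field d
            "groups": e + 1,          # field e stores the count minus 1
            "f": f,
            "default_char": g,
            "flags": rec[20:24],
            "style": rec[24:35].split(b"\x00")[0].decode("ascii"),
            "weight": rec[35:41].split(b"\x00")[0].decode("ascii"),
            "id": rec[41],
        })
    return fonts


def group_table_offset(font: Dict, area2_start: int = 4) -> int:
    """Chunk offset of a font's character groups: c plus 44 bytes per earlier font.

    area2_start is an assumption about what the offset is measured from.
    """
    return area2_start + font["index"] * RECORD_SIZE + font["c"]


if __name__ == "__main__":
    for font in parse_font_records(SAMPLE):
        print(font["style"], font["weight"], font["groups"], group_table_offset(font))
```

The names are split at the first NUL byte rather than at a fixed length, so a wrong guess about where the style field ends is less likely to corrupt the decoded strings.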
#3
The third area comprises information and the location of the bitmap data for each character. Each char group is 8 bytes long, so the length in bytes of this area is 8 times the total number of char groups across all fonts, where each font’s count is its e value from the previous area plus 1. The char groups have the following format:
...
00 20 00 21 00 00 05 AA
00 22 00 22 00 00 15 AA
00 23 00 25 00 00 19 AA
00 26 00 26 00 00 1D AA
00 27 00 27 00 00 01 AA
00 28 00 29 00 00 09 AA
00 2A 00 2B 00 09 D9 AA
00 2C 00 2C 00 06 89 AA
00 2D 00 2D 00 00 11 AA
00 2E 00 2E 00 09 C9 AA
00 2F 00 2F 00 03 51 AA
...
The first two bytes are the Unicode value for a character, specifying the first character of this character group. The following two bytes are also a Unicode character, specifying the last character in the group.
A group consists of characters with consecutive Unicode values that have both the same height and width. In the above snippet, a few character groups from the large/bold font from my firmware are shown. Characters 0020 and 0021 are both 3 pixels wide and 13 pixels tall, so they can be lumped into a group together. Character 0022 is 7 pixels wide, and neither 0021 nor 0023 is of the same width, so it’s in a group on its own. Characters 0023, 0024, and 0025 all have the dimension of 8×13 pixels, so they share a group.
The last four bytes can be further split into 3 overlapping pieces of information. The first three of those 4 represent the vertical pixel location for the start of the char group within the larger font pixel matrix, multiplied by 64, and added to the value from the first char group of the same width. The last of those 4 represents the baseline position for that font. However, in the above snippet, the number in the first 3 bytes from the last 4 is odd, which can’t be divided by 4, and the last byte seems to indicate that the baseline is at 170 pixels, yet the font is only 13 pixels tall. It turns out that a third piece of information shares space with these 2 pieces of data. The last 4 bits of those first 3 bytes and the first 4 bits of the last byte also contain information about the height of the font, ranging between 0 and 31, multiplied by 2.
The vertical offset is multiplied by 4, the height by 2, and the baseline by 1, allowing for the data to be differentiated with the following logic:
If the first three bytes are odd, subtract 1, and divide by 4. If the result is an integer, the first hexadecimal digit of the height is 1, and if it isn’t, the first hexadecimal digit of the height is 3. If the first three bytes are even, divide by 4. If the result is an integer, the first hexadecimal digit is 0, otherwise, it is 2. The baseline for the font can range between 0 and 30, 0 and 1E in hexadecimal. If the first hexadecimal digit of the last byte is odd, subtract 1, and if it is even, do nothing. The result is the second hexadecimal digit of the height. Subtracting these resulting digits from their originating bytes, as well as concatenating them, yields the 3 values separated from each other.
For example, the char group 00 78 00 7A 00 3D D5 AA contains the three characters 0078, 0079, and 007A. 003DD5, 15829 in decimal, is odd. Subtracting 1 and dividing by 4 results in 3957, which is an integer, so the first digit for the height byte is 1. The first digit of AA, 10 in decimal, is even, so no modification is needed. For this character group, the vertical offset bytes for the first character in the list are 000014 (20). Subtracting 1 and 20 from 15829 and dividing by 64 gives us 247, which is the vertical index in the matrix for this character width where these 3 characters are stored. Removing the first digit from AA yields 0A, which means the baseline is 10 pixels from the top. Concatenating the two height digits results in 1A, which when divided by 2 is 13 in decimal, the height of this character.
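The digit juggling above maps onto a handful of bit operations. The Python sketch below follows the described steps rather than any official layout; `CharGroup`, `decode_group`, and `matrix_row` are names invented for this example, and the base value passed to `matrix_row` is the three-byte value of the first group with the same width, as in the worked example.

```python
import struct
from typing import NamedTuple


class CharGroup(NamedTuple):
    first: int          # Unicode value of the first character in the group
    last: int           # Unicode value of the last character in the group
    height: int         # character height in pixels
    baseline: int       # baseline position in pixels from the top
    packed_offset: int  # three-byte value with the height digit subtracted


def decode_group(entry: bytes) -> CharGroup:
    """Decode one 8-byte char group following the steps in the text."""
    first, last, word = struct.unpack(">HHI", entry)
    three = word >> 8        # first three of the last four bytes
    last_byte = word & 0xFF
    # First height digit: 0, 1, 2, or 3, taken from the low two bits.
    h_hi = three & 0x3
    # Second height digit: first hex digit of the last byte, made even.
    h_lo = (last_byte >> 4) & 0xE
    height = ((h_hi << 4) | h_lo) // 2
    baseline = last_byte - (h_lo << 4)
    return CharGroup(first, last, height, baseline, three - h_hi)


def matrix_row(group: CharGroup, base_three_bytes: int) -> int:
    """Vertical index of the group within its width matrix.

    base_three_bytes is the three-byte value of the first group of the same
    width; its low two bits are cleared before subtracting.
    """
    return (group.packed_offset - (base_three_bytes & ~0x3)) // 64


if __name__ == "__main__":
    g = decode_group(bytes.fromhex("00 78 00 7A 00 3D D5 AA"))
    print(g, matrix_row(g, 0x000014))
```

Masking the base value means it does not matter whether the caller passes it with or without the height digit still attached.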
#4
The fourth area serves a purpose I have yet to discover. It consists of what I refer to as “spacers”, typically only 00 00 FF FF 00 00 00 00, or something similar. Some firmwares include extra spacers with seemingly random values. The number of 00 00 FF FF 00 00 00 00 spacers corresponds to the number of font style/weights present in the firmware, and the number of extra random spacers corresponds to the number at d in the second area. In this firmware, there are 4 fonts and d is 0 for all 4 of them, so the total length of this area is 8×4 = 32 bytes.
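The lengths of the second, third, and fourth areas follow from the per-font e and d values, and can be tallied as in this small Python sketch. The function name is made up, and summing d across all fonts to get the number of extra spacers is my assumption about how several non-zero d values would combine.

```python
from typing import Dict, List


def area_lengths(e_values: List[int], d_values: List[int]) -> Dict[str, int]:
    """Byte lengths of areas 2 to 4, given each font's e and d fields."""
    fonts = len(e_values)
    return {
        "area2": 44 * fonts,                          # one 44-byte record per font
        "area3": 8 * sum(e + 1 for e in e_values),    # e is the group count minus 1
        "area4": 8 * (fonts + sum(d_values)),         # one spacer per font plus extras
    }


if __name__ == "__main__":
    # e and d values of the four fonts in the firmware discussed in the text.
    print(area_lengths([0xDD, 0xCB, 0xD1, 0x14], [0, 0, 0, 0]))
```

Adding these to the 4 bytes of the first area gives the chunk offset at which the fifth area should begin, which is a handy sanity check when trying the parser on another firmware.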
#5
The fifth area contains information about the matrices for each of the font widths. Each piece of information is 12 bytes long, and the number of these pieces corresponds to the number of font widths in the font. As mentioned earlier when explaining the data structure, my firmware has 12 font widths, so the total length of this area is 12×12 = 144 bytes. These pieces of information can be split in the following format:
00 00 00 90 00 01 00 02 00 00 00 B8
00 00 00 B4 00 01 00 03 00 00 02 C8
00 00 01 B4 00 01 00 04 00 00 03 C0
00 00 03 88 00 01 00 05 00 00 06 30
00 00 07 5C 00 01 00 06 00 00 0A 90
00 00 0F 3C 00 01 00 07 00 00 05 70
00 00 13 F4 00 01 00 08 00 00 05 30
00 00 19 18 00 01 00 09 00 00 03 A0
00 00 1D 20 00 01 00 0A 00 00 01 70
00 00 1E E0 00 01 00 0B 00 00 00 B0
00 00 1F C8 00 01 00 0C 00 00 00 60
00 00 20 4C 00 01 00 0D 00 00 00 20
Each 12-byte row is split into the fields a (4 bytes), b (2 bytes), c (2 bytes), and d (4 bytes), from left to right.
a: The offset in bytes, counted from the first byte of this piece of information, at which the pixel data for that matrix is stored.
b: Seems to always be 00 01, throughout every firmware I’ve looked at.
c: The character width for that matrix.
d: The height of that matrix.
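A minimal Python sketch of reading this table is shown below, with the sample values from this firmware included so it runs on its own. `Matrix` and `parse_matrix_table` are hypothetical names, and the number of entries is passed in by the caller since the text does not say where it is stored. In the sample data the first two pixel blocks resolve to positions 48 bytes apart, while 2×184/8 is 46, which may indicate padding; for that reason the sketch trusts the stored offsets instead of accumulating sizes.

```python
import struct
from typing import List, NamedTuple

ENTRY_SIZE = 12  # bytes per matrix description

# The 12 sample entries from the text (field a, b, c, d per row).
SAMPLE_TABLE = bytes.fromhex(
    "00000090 0001 0002 000000B8" "000000B4 0001 0003 000002C8"
    "000001B4 0001 0004 000003C0" "00000388 0001 0005 00000630"
    "0000075C 0001 0006 00000A90" "00000F3C 0001 0007 00000570"
    "000013F4 0001 0008 00000530" "00001918 0001 0009 000003A0"
    "00001D20 0001 000A 00000170" "00001EE0 0001 000B 000000B0"
    "00001FC8 0001 000C 00000060" "0000204C 0001 000D 00000020"
)


class Matrix(NamedTuple):
    width: int        # field c: character width in pixels
    height: int       # field d: total height of the matrix in pixels
    data_offset: int  # pixel data position, relative to the start of this area


def parse_matrix_table(area5: bytes, count: int) -> List[Matrix]:
    """Parse `count` 12-byte entries; field a is relative to each entry's start."""
    matrices = []
    for n in range(count):
        start = n * ENTRY_SIZE
        a, _b, c, d = struct.unpack(">IHHI", area5[start:start + ENTRY_SIZE])
        matrices.append(Matrix(width=c, height=d, data_offset=start + a))
    return matrices


if __name__ == "__main__":
    for m in parse_matrix_table(SAMPLE_TABLE, 12):
        print(m)
```

Storing `data_offset` relative to the start of the fifth area means the caller only needs to add that area's position in the chunk to reach the pixel data.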
#6
This sixth area contains the pixel data for the character matrices described in the previous area. Its length can be obtained by multiplying the width and height of each matrix, dividing each product by 8, and summing the results. The data in this area is the raw pixel data in the format outlined earlier when explaining the data structure. It is stored in vertical columns of characters for each character width in the font.
Producing bitmaps
With all this information, the process of extracting a separate bitmap for each character can be done automatically with the help of some code. A copy of a quick C program I made for this can be found on my GitHub. I’ve tested this code on the firmwares of DCT3 phones with both 84×48 and 96×65 displays with no apparent issues. It also works on firmwares which include a Chinese font, which seems to crash many of the older available programs that processed fonts from these phones, probably due to the fact that it has a variable height throughout the same font. This should also work with a decrypted PPM of some DCT4 phones, but this is untested at the time of writing.
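As one possible output step, a decoded character can be written as a plain-text PBM image, which most image tools can open. This is not necessarily what the C program does; it is a sketch that assumes each character has already been decoded into a list of equal-length strings, with '#' for a set pixel and '.' for a clear one. The name `rows_to_pbm` is made up for this example.

```python
from typing import List


def rows_to_pbm(rows: List[str]) -> str:
    """Return a plain (P1) PBM document for rows made of '#' and '.'."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    lines = ["P1", f"{width} {height}"]
    for row in rows:
        # In PBM, 1 is a black pixel and 0 is a white pixel.
        lines.append(" ".join("1" if ch == "#" else "0" for ch in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    glyph = [".#.", "#.#", "###", "#.#"]
    print(rows_to_pbm(glyph), end="")
```

The plain variant of the format is larger than the binary one, but it can be inspected in a text editor, which is useful while the decoding is still being verified.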
These bitmaps can then be further processed into another usable format for a different platform, such as a FontForge script to convert it to a bitmap font.