2007-03-01

Quark file format

This is the file format used by Quark Copydesk.

The file is broken into chunks. Each chunk is exactly 256 bytes long. Chunks are identified by their ID number. Chunk 1 starts at position 0, Chunk 2 starts at position 256, and so on, so Chunk N starts at position (N-1)*256. The first chunk in the file contains the header followed by the TOC.

Quark Header:
The first thing in this file is a 26 byte Quark Header.

0000 0004 File Format Version (0x001b001b)
0004 0008 Identifier ("SPIFSPOC")
000c 0004 QPS Header Offset
0010 000a Reserved
The QPS Header Offset points to the QPS Header attached to the end of the file.
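
As a rough sketch, the header table above can be read with Python's struct module. The byte order is not stated anywhere in this description, so the big-endian default below is an assumption, and parse_quark_header is just an illustrative name.

```python
import struct

HEADER_SIZE = 26  # 4 + 8 + 4 + 10 bytes, per the table above


def parse_quark_header(data, byte_order=">"):
    """Unpack the 26-byte Quark Header from the start of the file.

    byte_order is an ASSUMPTION (">" means big-endian); the format
    description does not say which byte order the file uses.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("file too short for a Quark Header")
    version, identifier, qps_offset, reserved = struct.unpack_from(
        byte_order + "I8sI10s", data, 0
    )
    return {
        "version": version,                # expected 0x001b001b
        "identifier": identifier,          # expected b"SPIFSPOC"
        "qps_header_offset": qps_offset,   # position of the QPS Header
        "reserved": reserved,
    }
```

A caller would normally check that identifier equals b"SPIFSPOC" before trusting the rest of the file.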

Table of Contents:
Immediately after the header is a table of contents. This table lists the positions in the file for all the interesting sections of the file. Each entry is the Chunk ID of the start of the section.

0000 0004 QPS Column
0004 0004 QPS History
0008 0004 Unknown
000c 0004 Unknown
0010 0004 Backup Color
0014 0004 Backup Font
0018 0004 Color
001c 0004 Font
0020 0004 Component
0024 0004 Style
0028 0004 Unknown
002c 0004 Unknown
0030 0004 Component Data


Some of the sections are unknown. The main sections we are interested in are Font, Component, Style, and Component Data.
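
A small sketch of reading the table of contents into a dictionary follows. It assumes the TOC starts right at byte 26 (immediately after the header), that the byte order is big-endian, and that the Chunk IDs are signed 32-bit values; none of those three points is confirmed by the description, and the key names are made up for illustration.

```python
import struct

HEADER_SIZE = 26  # the TOC follows the 26-byte Quark Header

# One name per 4-byte entry, in the order of the table above.
TOC_FIELDS = [
    "qps_column", "qps_history", "unknown_08", "unknown_0c",
    "backup_color", "backup_font", "color", "font",
    "component", "style", "unknown_28", "unknown_2c",
    "component_data",
]


def parse_toc(data, byte_order=">"):
    """Return {section name: starting Chunk ID} for the 13 TOC entries.

    byte_order and the signed "i" format are ASSUMPTIONS.
    """
    values = struct.unpack_from(
        byte_order + "%di" % len(TOC_FIELDS), data, HEADER_SIZE
    )
    return dict(zip(TOC_FIELDS, values))
```

With this, toc["font"], toc["component"], toc["style"] and toc["component_data"] give the starting chunks of the four sections we care about.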

General Section Format:
Sections are made up of one or more chunks. Each Chunk is formatted as follows.

0000 00fc Chunk Data
00fc 0004 Next Chunk ID


Next Chunk ID will be 0 if this is the last chunk in the section. Otherwise it points to the next chunk that makes up the current section's data.

If the Next Chunk ID is negative, then the chunk it references is a special chunk called a Contiguous Chunk. Negate the Next Chunk ID to obtain the actual Chunk ID. A Contiguous Chunk is laid out differently from a regular chunk.

0000 0002 Number of Contiguous Chunks
0002 **** Chunk Data
**** 0004 Next Chunk ID


Chunk Data is (Number of Contiguous Chunks)*256-6 bytes long. That means that if there is only 1 Contiguous Chunk, then Chunk Data is 250 bytes long, and Next Chunk ID is located at the end of the chunk.

This system means that all sections will be padded to fit into their respective chunks.
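
The chunk-following logic described above might be sketched as below. It assumes Chunk N begins at byte (N-1)*256 (which follows from the 256-byte chunk size and Chunk 1 starting at position 0), that values are big-endian, and that a section's first Chunk ID may itself be negative. All three are assumptions, and read_section is a hypothetical helper name.

```python
import struct

CHUNK_SIZE = 256


def chunk_offset(chunk_id):
    """Chunk 1 starts at byte 0, so Chunk N starts at (N - 1) * 256."""
    return (chunk_id - 1) * CHUNK_SIZE


def read_section(data, first_chunk_id, byte_order=">"):
    """Concatenate the Chunk Data of every chunk in one section."""
    parts = []
    seen = set()
    next_id = first_chunk_id
    while next_id != 0:
        if next_id in seen:
            raise ValueError("chunk chain loops back on itself")
        seen.add(next_id)
        if next_id > 0:
            # Regular chunk: 252 data bytes, then the Next Chunk ID.
            start = chunk_offset(next_id)
            payload_len = CHUNK_SIZE - 4
        else:
            # Contiguous Chunk: 2-byte count, then count*256-6 data bytes.
            start = chunk_offset(-next_id)
            (count,) = struct.unpack_from(byte_order + "H", data, start)
            if count < 1:
                raise ValueError("contiguous chunk with zero length")
            start += 2
            payload_len = count * CHUNK_SIZE - 6
        end = start + payload_len
        if end + 4 > len(data):
            raise ValueError("chunk runs past the end of the file")
        parts.append(data[start:end])
        # Signed, because a negative value marks a Contiguous Chunk.
        (next_id,) = struct.unpack_from(byte_order + "i", data, end)
    return b"".join(parts)
```

The result still includes whatever padding the last chunk carries, so the section's own length fields decide how much of it is meaningful.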

When describing section formats below, we will ignore the chunk system, and simply focus on the format of the data that is stuffed into those chunks.

Font Section:
The Font section contains a list of all the fonts used in this document. The fonts listed in this section are referenced in the rest of the file by their ID number.

0000 0004 Length
0004 0002 Number of Fonts
0006 **** Font Record 0
**** **** Font Record 1
**** **** Font Record 2 ...


Each Font Record looks like so.
0000 0002 Font ID
0002 0002 Padding
0004 0001 Font Name Length
0005 **** Font name
**** 0001 Short Name Length
**** **** Short Name
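
Because the two names are length-prefixed, the Font Records have to be walked one after another. A minimal sketch, assuming the section has already been pulled out of its chunks, that values are big-endian, that names are MacRoman text, and that records are packed with no alignment padding between them (all assumptions on our part):

```python
import struct


def read_pstring(buf, pos, encoding="mac_roman"):
    """Read a 1-byte length followed by that many bytes of text.

    Returns (text, position just past the string). The MacRoman
    encoding is an ASSUMPTION.
    """
    length = buf[pos]
    raw = buf[pos + 1:pos + 1 + length]
    return raw.decode(encoding), pos + 1 + length


def parse_font_section(section, byte_order=">"):
    """Return a list of {"id", "name", "short_name"} dicts."""
    _length, count = struct.unpack_from(byte_order + "IH", section, 0)
    pos = 6  # first Font Record follows Length and Number of Fonts
    fonts = []
    for _ in range(count):
        font_id, _padding = struct.unpack_from(byte_order + "HH", section, pos)
        pos += 4
        name, pos = read_pstring(section, pos)
        short_name, pos = read_pstring(section, pos)
        fonts.append({"id": font_id, "name": name, "short_name": short_name})
    return fonts
```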


Style Section:
The Style section contains a list of all the Paragraph and Character styles used in the document. The overall section is structured as follows.

0000 0004 Length of Character Styles
0004 009a Character Style 0
009e 009a Character Style 1
0138 009a Character Style 2...
**** 0004 Length of Paragraph Styles
**** 00c6 Paragraph Style 0
**** 00c6 Paragraph Style 1
**** 00c6 Paragraph Style 2...


Each Character Style is 154 bytes long and is structured like so.

0000 0001 Length of Style Name
0001 0055 Style Name
0056 0002 Font ID
0058 0002 Font Flags
005a 0004 Font Size (16.16 fixed text size)
005e 001b Unknown
0079 0001 Hide Flag (notes have this flag set)
007a 0020 Unknown
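
To make the 16.16 fixed-point size concrete: the upper 16 bits are the whole points and the lower 16 bits are the fraction, so dividing the raw value by 65536 gives the size in points. A sketch of decoding one record, assuming big-endian values, MacRoman names, and that any nonzero Hide Flag byte means "hidden" (all three are assumptions):

```python
import struct

CHAR_STYLE_SIZE = 0x9A  # 154 bytes


def parse_character_style(record, byte_order=">", encoding="mac_roman"):
    """Decode the known fields of one 154-byte Character Style."""
    if len(record) != CHAR_STYLE_SIZE:
        raise ValueError("a Character Style must be 154 bytes")
    name_len = min(record[0], 0x55)  # the name field holds at most 0x55 bytes
    name = bytes(record[1:1 + name_len]).decode(encoding)
    font_id, font_flags, raw_size = struct.unpack_from(
        byte_order + "HHi", record, 0x56
    )
    return {
        "name": name,
        "font_id": font_id,
        "font_flags": font_flags,
        "font_size": raw_size / 65536.0,  # 16.16 fixed point to points
        "hidden": record[0x79] != 0,      # ASSUMPTION: nonzero means set
    }
```

For example, a raw size of 0x000C8000 decodes to 12.5 points.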


Character Style Flags (values for the Font Flags field):
0001 Bold
0002 Italic
0004 Underline
0008 Outline
0010 Dropshadow
0020 Superscript
0040 Subscript
0100 Superior
0200 Strikeout
0400 Allcaps
0800 Smallcaps


Each Paragraph Style is 198 bytes long and looks like so.

0000 0001 Style Name Length
0001 003f Style Name
0040 0002 Style ID
0042 0016 Reserved
0058 0002 Justification
005a 006c Reserved
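
A matching sketch for one Paragraph Style record. As before, the big-endian byte order and MacRoman name encoding are assumptions. The meaning of the Justification values is not given here, so the raw number is returned untouched.

```python
import struct

PARA_STYLE_SIZE = 0xC6  # 198 bytes


def parse_paragraph_style(record, byte_order=">", encoding="mac_roman"):
    """Decode the known fields of one 198-byte Paragraph Style."""
    if len(record) != PARA_STYLE_SIZE:
        raise ValueError("a Paragraph Style must be 198 bytes")
    name_len = min(record[0], 0x3F)  # the name field holds at most 0x3f bytes
    name = bytes(record[1:1 + name_len]).decode(encoding)
    (style_id,) = struct.unpack_from(byte_order + "H", record, 0x40)
    (justification,) = struct.unpack_from(byte_order + "H", record, 0x58)
    return {
        "name": name,
        "style_id": style_id,
        "justification": justification,  # raw value, meaning not documented
    }
```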



Component Section:
The Component section lists the name and type of all the components in the file.
0000 0004 Unknown
0004 0004 Component Section Length
0008 0008 Reserved
0010 0002 Number of Components
0012 001e Reserved
0030 008e Component 0
00be 008e Component 1
014c 008e Component 2...


Each Component is 142 bytes long and is structured like so.

0000 0018 Unknown
0018 0004 Component ID
001c 0004 Unknown
0020 0001 Type Length
0021 001f Component Type
0040 0001 Name Length
0041 001f Component Name
0060 002e Reserved
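
Since every Component is a fixed 142 bytes, the list can be read by simple indexing from offset 0x30. A sketch under the usual assumptions (big-endian values, MacRoman strings, section already extracted from its chunks); the function names are hypothetical:

```python
import struct

COMPONENT_SIZE = 0x8E   # 142 bytes per Component
FIRST_COMPONENT = 0x30  # offset of Component 0 in the section


def parse_component(record, byte_order=">", encoding="mac_roman"):
    """Decode the ID, type and name of one 142-byte Component."""
    if len(record) != COMPONENT_SIZE:
        raise ValueError("a Component must be 142 bytes")
    (component_id,) = struct.unpack_from(byte_order + "I", record, 0x18)
    type_len = min(record[0x20], 0x1F)
    name_len = min(record[0x40], 0x1F)
    return {
        "id": component_id,
        "type": bytes(record[0x21:0x21 + type_len]).decode(encoding),
        "name": bytes(record[0x41:0x41 + name_len]).decode(encoding),
    }


def parse_component_section(section, byte_order=">"):
    """Return the list of Components in the Component section."""
    (count,) = struct.unpack_from(byte_order + "H", section, 0x10)
    components = []
    for i in range(count):
        start = FIRST_COMPONENT + i * COMPONENT_SIZE
        record = section[start:start + COMPONENT_SIZE]
        components.append(parse_component(record, byte_order))
    return components
```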


Component Data Section:
The Component Data section contains the raw data for each component.

0000 0002 Number of Components
0002 001c Component Index 1
001e 001c Component Index 2
003a 001c Component Index 3...


Some Component Indexes carry extra information at the end, so the offsets listed above hold only as long as no earlier index has extra data. You have to parse each Component Index in order to find out where the next one starts. Each Component Index is structured as follows.

0000 0004 Unknown
0004 0004 Flags
0008 0008 Reserved
0010 0004 Chunk ID
0014 0004 Component ID
0018 0004 Reserved
001c 0004 Extra Length
0020 **** Extra Data


If Flags is nonzero, then Extra Length and Extra Data are present. Extra Data contains information such as the filename that the component was imported from.
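
Because of the optional Extra Data, the Component Indexes have to be walked sequentially. A sketch, assuming big-endian values and (this is a guess, not something confirmed above) that Extra Length counts only the Extra Data bytes and not its own 4 bytes:

```python
import struct

INDEX_SIZE = 0x1C  # fixed part of a Component Index, 28 bytes


def parse_component_indexes(section, byte_order=">"):
    """Walk the Component Data section and return its index entries."""
    (count,) = struct.unpack_from(byte_order + "H", section, 0)
    pos = 2
    entries = []
    for _ in range(count):
        (_unknown, flags, _reserved, chunk_id,
         component_id, _reserved2) = struct.unpack_from(
            byte_order + "II8siII", section, pos
        )
        pos += INDEX_SIZE
        extra = b""
        if flags != 0:
            # ASSUMPTION: Extra Length excludes its own 4 bytes.
            (extra_len,) = struct.unpack_from(byte_order + "I", section, pos)
            pos += 4
            extra = section[pos:pos + extra_len]
            pos += extra_len
        entries.append({
            "flags": flags,
            "chunk_id": chunk_id,
            "component_id": component_id,
            "extra": extra,
        })
    return entries
```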

Each Component Index points to a Chunk that contains pointers to the raw text and styles of the component.

0000 0004 Length of text in this component
0004 0004 Number of Paragraphs * 8
0008 0004 Paragraph 0 Chunk ID
000c 0004 Paragraph 0 Length
0010 0004 Paragraph 1 Chunk ID
0014 0004 Paragraph 1 Length...
**** **** Character Styles
**** **** Paragraph Styles


The Paragraph Chunks are special Chunks. They don't follow the standard Chunk format at all. Paragraph 0 starts at the beginning of the referenced Chunk, and its data simply continues from there for Length bytes. There is no chunk metadata at all, so no Next Chunk ID has to be skipped while reading.
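
Putting the paragraph table and the raw Paragraph Chunks together might look like the sketch below. It assumes big-endian values, that Chunk N begins at byte (N-1)*256, and that block is the pointer data for one component with the normal chunk framing already removed. The names are illustrative only.

```python
import struct

CHUNK_SIZE = 256


def parse_paragraph_table(block, byte_order=">"):
    """Return (text length, [(chunk_id, length), ...]) from pointer data."""
    text_len, table_bytes = struct.unpack_from(byte_order + "II", block, 0)
    entries = []
    # The second field is Number of Paragraphs * 8, i.e. the table size.
    for pos in range(8, 8 + table_bytes, 8):
        chunk_id, length = struct.unpack_from(byte_order + "II", block, pos)
        entries.append((chunk_id, length))
    return text_len, entries


def read_component_text(data, block, byte_order=">"):
    """Read every paragraph straight from the file and join the bytes."""
    _text_len, entries = parse_paragraph_table(block, byte_order)
    parts = []
    for chunk_id, length in entries:
        start = (chunk_id - 1) * CHUNK_SIZE
        if chunk_id < 1 or start + length > len(data):
            raise ValueError("paragraph lies outside the file")
        # No chunk metadata here: just Length raw bytes from the chunk start.
        parts.append(data[start:start + length])
    return b"".join(parts)
```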

The Character Styles are shown below.

0000 0004 Number of Character Styles * 8
0004 0004 Style 0 ID
0008 0004 Style 0 Length
000c 0004 Style 1 ID
0010 0004 Style 1 Length...


The way the styles work is pretty simple. Style ID references a specific Style in the Style Section. This style applies for Length bytes of text. Then the next style referenced takes over and applies to the next Length bytes of text, and so on.

Paragraph Styles work in the exact same way.
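
The run-length scheme can be sketched like this: read the (Style ID, Length) pairs, then accumulate the lengths to get the byte range each style covers. Big-endian byte order is an assumption, and both function names are hypothetical. The parser returns the position after the table so the same call can then be used on the Paragraph Styles that follow.

```python
import struct


def parse_style_runs(buf, pos=0, byte_order=">"):
    """Read one style table starting at pos.

    Returns ([(style_id, length), ...], position just past the table).
    """
    (table_bytes,) = struct.unpack_from(byte_order + "I", buf, pos)
    pos += 4
    runs = []
    for _ in range(table_bytes // 8):  # the first field is count * 8
        style_id, length = struct.unpack_from(byte_order + "II", buf, pos)
        runs.append((style_id, length))
        pos += 8
    return runs, pos


def runs_to_spans(runs):
    """Turn (style_id, length) runs into (start, end, style_id) spans."""
    spans = []
    start = 0
    for style_id, length in runs:
        spans.append((start, start + length, style_id))
        start += length
    return spans
```

For example, the runs [(2, 5), (0, 7)] mean style 2 covers bytes 0 to 5 of the text and style 0 covers bytes 5 to 12.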

QPS Header:
This header is identical to the format used in the QPS Protocol (more on that later).
