2019-09-24

A Word Search Generator

A while ago I needed to generate a daily word search using a word list that was unique for each day.  I didn't find anything online so I invented my own algorithm.  It is designed to encourage words to overlap as much as possible.

The setup for the algorithm is very simple and a bit boring, so I'll just cover the basics here.  The two things we need before we begin is a list of words to put in the word search, as well as the dimensions of the letter grid.  For my implementation, the dimensions and a file containing a whitespace-separated list of words were passed in as arguments.

The word list also needs one other, non-obvious thing.  You need to make sure that none of the words in the word list are substrings of any other words in the word list.  Be sure to check for reverse matches as well.  For example, "times" and "emit" would be very confusing to search for.

Then before we hit the main algorithm, we sort the word list by length, with the longest words first.  We then loop through all the words in the word list and attempt to insert them into the grid.  If we succeed, we mark the word as used so we can later print it out as part of the clue list.

for (word : words) {
  if (addWord(grid, word)) {
    word.wasUsed = true;
  }
}

To add a word to the grid, we first loop through every cell in the grid.  We check to see if the cell is blank or if the cell matches the first letter of our given word.  If either of these are true, we measure the score for that position.  Then we move onto the next cell.

addWord(grid: Grid, word: Word): bool {
  bestScore = 0;
  for (y := 0..grid.height) {
    for (x := 0..grid.width) {
      if (grid.cell(x, y) == Blank || grid.cell(x, y) == word[0]) {
        measure(grid, word, x, y);
      }
    }
  }
  // random best fit goes here
  ...
}

To measure the score, we try to place the word in the grid at that location in all directions.  We then calculate a score based on the number of intersections for a given direction.

measure(grid: Grid, word: Word, x: int, y: int) {
  for (dy : [-1, 0, 1]) {
    for (dx : [-1, 0, 1]) {
      if (dx == 0 && dy == 0) {  // no direction
        continue;
      }
      ex := x + word.length * dx;
      ey := y + word.length * dy;
      if (ex < 0 || ex >= grid.width ||
          ey < 0 || ey >= grid.height) {  // word doesn't fit
        continue;
      }
      int score = 1;
      for (i := 0..word.length) {
        cell := grid.cell(x + i * dx, y + i * dy);  // next cell
        if (cell == word[i]) {  // matches word
          score++;
        } else if (cell != Blank) {  // word doesn't fit here at all
          score = 0;
          break;
        }
      }
      if (score >= bestScore) {  // are we among the best matches?
        if (score > bestScore) {
          bestScore = score;
          matches.clear();  // new tier of best scoring matches
        }
        matches.add(Match(x, y, dx, dy));
        if (dx && dy) {  // we want diagonals, so add twice
          matches.add(new Match(x, y, dx, dy));
        }
      }
    }
  }
}

This is the core of the algorithm, so we should look at it at little closer.  We loop through all the directions, which is where dx and dy are one of -1, 0, 1.  If both dx and dy are 0, then that doesn't point in any direction, so we skip that case.  We calculate the end position of the word for the current direction and check to see if it fits on the grid.  If not, we skip that direction.

Next we calculate the score for that direction.  Encountering a cell already containing a matching letter is worth 1 point.  This encourages intersecting words.  If we encounter a cell with a different letter, we reset the score to zero and move on since the word doesn't actually fit there.

If the score is among the best scores, we add our current orientation to the list of matches.

What we end up with is a list of all the highest scoring orientations.  Given the choice between a horizontal or vertical match and a diagonal match, we would prefer the diagonal match.  Thus, we've added any diagonal matches to the high-score list twice.

Note that we haven't actually placed the word on the grid yet.. we've just calculated a list of best places to put it.

After we have tested every cell in the grid, we now have a list of possible places to put that word.  If we don't have any, that means the word doesn't fit anywhere in the grid and we should abandon that word and move onto the next.

If we do have a list of possibilities, we should pick a random one and then place the word into the grid.  Then move onto the next word.

Finally, after we have added all possible words to the grid, we loop through the grid and fill out any blanks with random letters.

Then you can output the grid and marked words.

Possible improvements could be the inclusion of red herrings.  When filling in the blanks with random letters, you could use letters from the word list to make "almost" matches.  It's also trivial to make the grid a random shape.  Just take the grid and before you do anything, fill in any blank spaces with a special character.  Then replace the special character with a blank on the final grid.