Sudoku and Graph Theory

When you get stuck on a fiendishly difficult sudoku, it’s hard not to wonder if the puzzle really has a solution. At another moment, aglow in the triumph of a clever deduction, you might have a sneaking suspicion that there may be a simpler, more systematic way of finding the answer. Further questions may come to mind: How many different sudoku puzzles are possible in the standard 9-by-9 format? Can a puzzle with few initial entries be easier to solve than one with more entries? What’s the smallest number of initial entries necessary to guarantee that there’s one, and only one, solution?

This sudoku puzzle has 17 initial entries, the smallest number known to be possible on a 9-by-9 grid. So far, no one has figured out whether or not a solvable puzzle exists with only 16 initial entries. Herzberg and Murty, Notices of the American Mathematical Society

Murty found this poorly-designed sudoku puzzle in Air Canada’s in-flight magazine. Despite having 29 initial entries, it has two possible solutions. Herzberg and Murty, Notices of the American Mathematical Society

Although the puzzles are often billed as requiring no math to solve, some of the questions they raise call for mathematical analysis. Researchers are now taking up the task. An article in the June Notices of the American Mathematical Society lays a mathematical framework for addressing some basic questions about sudokus.

Each 9-by-9 sudoku grid is composed of nine 3-by-3 subgrids. Initially, some of these 81 squares contain a number, from 1 through 9, but others do not. The object is to fill in the empty squares so that each row, column, and subgrid contains all of the numbers 1 through 9, in any order. Each puzzle has only one solution.

Agnes M. Herzberg and M. Ram Murty of Queen’s University in Kingston, Ontario have translated the problem of solving a sudoku puzzle into the language of graph theory. The 81 squares in the grid correspond to vertices in a mathematical graph. A line connects vertices that appear in the same row, column, or subgrid. This translation allowed the mathematicians to use mathematical tools developed in graph theory to understand sudoku.

Although sudoku puzzles almost always use the digits 1 through 9, any nine symbols will suffice. For example, a puzzle could have nine letters, shapes, or colors instead of numbers. When graph theorists label the vertices, they call it a “coloring.” A sudoku puzzle begins with a partial coloring, since only a few spots have numbers. Once each vertex is colored and no two connected vertices have the same color, the coloring is called “proper.”

Thus, in the language of graph theory, solving a sudoku means extending a partial coloring of the graph into a proper coloring.

Herzberg and Murty used techniques from graph theory to show that a mathematically simple formula exists for the number of possible solutions to a given sudoku puzzle. If the puzzle is designed correctly, it has only one possible solution. Such a formula might help a stumped puzzle-solver make sure that a certain sudoku really does have a solution, and that it has only one solution.

Unfortunately, there’s a glitch: although the mathematicians proved that the formula exists, they weren’t able to figure out what it is.

Herzberg and Murty have established, however, that for a puzzle to have precisely one solution, the initial entries need to include at least eight of the nine digits. Their reasoning is simple. Suppose that neither 1 nor 2 appears in the initial entries. Then in any solution, all the 1’s could be switched with all the 2’s, so there would be at least two valid solutions.

Sudoku puzzles are an example of a type of graph known as a Latin square, which mathematicians have studied for centuries. A Latin square is simply a grid of numbers from 1 to n arranged so that each row and each column contain precisely one instance of each number. If a Latin square contains subgrids that also contain the n numbers once each, then it’s a sudoku.

How many Latin squares also happen to be sudoku puzzles? Counting up the total possible number of 9-by-9 Latin squares turns out to be quite difficult, and determining the total possible number of sudokus is even harder. In 1975, researchers determined that there are 5,524,751,496,156,892,842,531,225,600 (about 5.5 x 10²⁷) Latin squares with a 9-by-9 configuration. And two years ago, Bertram Felgenhauer of Dresden Technical University in Germany and Frazer Jarvis of the University of Sheffield in England figured out that there are 3,546,146,300,288 (about 3.5 x 10¹²) meaningfully different 9-by-9 sudoku puzzles. That’s a huge number, but not nearly as huge as the number of Latin squares.

But Herzberg and Murty posed a broader question. The standard 9-by-9 sudoku has nine 3-by-3 subgrids, but a sudoku can be smaller or larger. For example, a 4-by-4 sudoku has four 2-by-2 subgrids. A 16-by-16 sudoku has sixteen 4-by-4 subgrids, and generally, an n²-by-n² sudoku would have n-squared n-by-n subgrids. How many Latin squares of these other possible sizes are also sudokus?

That question seems impossible to answer, given that no one knows how many Latin squares exist for any size larger than 11-by-11. Furthermore, no one knows how many sudoku puzzles exist for any grid size exceeding 9-by-9. Nevertheless, Herzberg and Murty managed to compute that for a randomly chosen Latin square with dimension n²-by-n², the bigger n is, the smaller the probability that it is also a sudoku. In fact, the probability approaches zero as n gets larger.

Why expend all this mathematical energy on a little puzzle? “It’s fun,” Murty says.

But he also points to a couple of serious reasons. Remarkably enough, sudoku could have practical applications when viewed as a graph theory problem. For example, scheduling committee meetings for various groups in different time slots can pose a similar mathematical challenge. Each vertex represents a different committee, and two committees are joined by a line if they have a member in common. If some of the committees have already been assigned time slots, scheduling the remaining committee meetings involves extending a partial coloring to a proper coloring, so that all meetings are allotted times that don’t conflict with any other meetings. Murty also points to applications in designing test fields for agricultural studies and in avoiding interference when assigning frequency channels to television stations.

Murty says that he is fascinated by sudoku puzzles because they pose a simple problem that connects with sophisticated mathematics. For example, the same tools he is using to understand sudoku are also applicable to the “four color problem.” This is the classical conjecture that for any given map, only four different colors are needed to color each country (or state) so that any two countries that share a border are not the same color. It took more than a century to solve that problem.

“These questions that look innocent can have deep mathematics in them,” he says. “Sudoku is one of those.”

If you would like to comment on this article, please see the blog version.