## Word chains in Haskell

An example of a word chain puzzle (generally described at codekata.com) is to find words that links “cat” to the word “dog” using words from the dictionary only replacing one character at the time. The shortest chain that solves the example is:

cat –> cot –> cog –> dog

The problem can be seen as a graph problem where each word is a node and all nodes that only differs by one character are joined with an edge. Using such a representation, the solution to the problem can be calculated using a standard shortest path algorithm.

Lets look at how this can be implemented in Haskell. Representing graphs in (pure) functional languages are are tricky but Martin Erwig’s Functional Graph Library comes to rescue. Use cabal to download and install it from hackage.

> cabal install fgl

Once installed we can import it.

import Data.Graph.Inductive.Graph import Data.Graph.Inductive.Tree import Data.Graph.Inductive.Query.SP

Building the graph turns out to be straight forward.

graph :: Dictionary -> Gr Word Int graph dictionary = let nodes :: [LNode Word] nodes = zip [0..] dictionary edges :: [LEdge Int] edges = [(i,i',1) | (i,w) <- nodes, (i',w') <- nodes, i /= i', neighbours w w'] in mkGraph nodes edges

Nodes are pairs of an integer and a generic data type which in this case are strings. One line 5 all nodes are created by zipping an infinite list of integer indexes (who does not love that Haskell actually allows this abstraction) with our dictionary which is simple a list of strings.

Edges are three tuples; index of the first node, index of the second node and the edge weight which is always one in the case of word chains. At line 7 we leverage list comprehensions to construct all the edges. All that is needed above that is a function that checks if two words are “neighbors” meaning that they only differ one character.

neighbours :: Word -> Word -> Bool neighbours xs ys = length xs == length ys && distance 0 xs ys == 1 where distance d [] [] = d distance d (x:xs) (y:ys) = case (x==y, d) of (False, 0) -> distance 1 xs ys (False, _) -> -1 -- no use to continue... (True, d) -> distance d xs ys

The code above tries to be a bit efficient but I am sure it can be expressed more concise. So, now we can build a graph and luckily the graph library also provides a shortest path implementation:

sp :: (Graph gr, Real b) => Node -> Node -> gr a b -> Path

That is all we need to put our solution together into a working program.

main = do contents <- readFile "dictionary.txt" let dictionary = lines contents putStrLn "Initializing..." t1 <- getCurrentTime let !g = graph dictionary t2 <- getCurrentTime putStrLn $ "Done. Took " ++ (show (diffUTCTime t2 t1)) let toIndex w = fromJust $ elemIndex w dictionary let fromIndex i = dictionary!!i let loop = do putStrLn "source:" sourceString <- getLine putStrLn "target:" destString <- getLine let source = toIndex sourceString let target = toIndex destString let path = sp source target g putStrLn $ show $ map fromIndex path loop loop

Line 2 reads the dictionary from a text file and line 3 splits each line into a separate string which produces a list of words.

Line 6 creates the graph. It using the BangPatterns extension (!) to avoid lazy initialization. This is not exactly needed but I find it nice to be able to initialize the graph and then startup.

Line 8 and 9 are converters between word index and actual word.

Line 13 and 15 retrieves source and target words from the user (such as “cat” and “dog”).

Line 18 executes the shortest path search for the words given by the user.

That’s it. Lets compile an optimized executable of this program using ghc.

> ghc -O main.hs

When executing the program it spends a lot of time on the initialization. The graph takes almost four minutes to construct on my laptop for a dictionary of approximately 58000 words. A bit annoying for the user. Having constructing the graph the results are calculated instantly which is nice. In the next blog post, we will implement the same solution using c++ and boost graph library.

(The entire source code is published on github)