Home > Uncategorized > Word puzzle in C++

Word puzzle in C++

Last post presented a solution to the word chain problem based on basic graph theory and implemented in Haskell. It was concluded that the initialization of the graph from the dictionary was time consuming (about four minutes on my laptop). C++ is known for industrial strength library implementations and strong compilers. Can we improve the performance using C++ then? I decided to try out, equipped with Microsoft Visual Studio 2013 and the boost graph library.

Type definitions

Before we begin coding we need a few type definitions.

typedef boost::adjacency_list<boost::listS, boost::vecS, boost::undirectedS> graph_t;
typedef std::vector<std::string> dictionary_t;
typedef dictionary_t::size_type index_t;
typedef std::map<std::string, index_t> index_lookup_t;

Line 1: The graph type. The boost graph library provides (at least) two graph implementation; the matrix graph which is for dense graphs and the adjacency list which is for sparse graphs. For this problem the graph is definitely a sparse one we go for the latter option. The template parameters specifies how nodes are their neighbors are stored. I will not go into detail about this but it is reasonably well documented  in the library manual.  Next template argument specifies that we want an undirected graph since the neighborhood concept for word chains is symmetric.

Line 2: The dictionary type which is simply a vector of strings.

Line 3: The index type is the index used for the vector.

Line 4: We need to be able to retrieve an index given a word. The data structure used for this purpose is a map defined as below.
Constructing the graph

Now, we can write some code. First thing is to load the dictionary and construct the graph.

dictionary_t dictionary(
 istream_iterator<string>(ifstream("dictionary.txt")),
 istream_iterator<string>());

index_lookup_t word2index;
transform(
 dictionary.begin(), dictionary.end(),
 counting_range<index_t>(0, dictionary.size()).begin(),
 inserter(word2index, word2index.begin()),
 make_pair<const string&, const index_t&>);

graph_t g(dictionary.size());
for (auto i = 0U; i < dictionary.size(); ++i)
{
 for (auto j = i + 1; j < dictionary.size(); ++j)
 {
   if (neighbours(dictionary[i], dictionary[j]))
     add_edge(i, j, g);
 }
}

Line 1-3 leverages stream iterators to read the file data and construct the vector.

Line 6-10 uses the the std algorithm transform to construct the mapping from words to indices. The idea is to create pairs of indices and words and the insert the result into the map using the inserter class. boost counting_range comes handy to produce an enumeration from zero to the size of the dictionary. In Haskell we used an infinite list [0..] in this case.

Line 12-19 loops over the words and insert edges for every words that are neighbors.

Implementing the neighbor test is simple.

bool neighbours(const std::string& s1, const std::string& s2)
{
 const size_t length = s1.length();
 if (length != s2.length())
 return false;

 int diff = 0;
 for (auto i = 0U; i < length && diff <= 1; ++i)
 {
 if (s1[i] != s2[i])
 ++diff;
 }

 return diff == 1;
}
Finding the shortest path

Once we got the graph we want to find the shortest path between to words.

vector<vertex_descriptor_t> predecessors(num_vertices(ctx.g));

dijkstra_shortest_paths(
 ctx.g,
 sourceIndex,
 predecessor_map(&predecessors[0]).
 weight_map(fixed_weight_pmap())
 );

list<string> path;
auto index = targetIndex;
while (index != sourceIndex)
{
 path.push_back(ctx.dictionary[index]);
 index = predecessors[index];
}

path.push_back(ctx.dictionary[index]);
path.reverse();
return path;

The boost graph library comes with the dijkstra_shortest_paths function for this purpose. We pass in:

  • the graph
  • the source node

and (using the slightly confusing solution for passing named parameters):

  • a predecessor map where to store the result
  • a weight map that defines the weights for the edges
    The weight map was a bit of a struggle. The standard way is to provide a weight for each edge but that seemed like a  waste since every edge had the weight one. I ended up writing a very simple class implementing the read only property map concept.
struct fixed_weight_pmap { };

template <class K> 
inline int get(const fixed_weight_pmap& pa, const K& k) { return 1; }

namespace boost 
{ 
 template<> 
 struct property_traits<fixed_weight_pmap> 
 { 
   typedef int value_type; 
 }; 
}
Conclusions

So… how about performance? Does not still take four minutes to initialize? No. Nope. Not at all. Not even close. It takes less than ten seconds. Pretty impressive. The boost graph library is fast. It is generic. Unfortunately, it is not always easy to use. Partly due to the generality and the amount of concepts/sub-libraries/styles used (one example is the named parameters). Partly due to (imo) cumbersome documentation which too often is missing a good example. I hope this article can (very) slightly improve on the last deficiency.

References

The entire source code is stored on github.

The red book. Boost graph library user guide and manual.

Advertisements
Categories: Uncategorized
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: