More on CSV parsers in C++

I was watching Herb Sutter talking about the new C++ the other day. The video was from the Windows Build conference, and the talk was really good, by the way. Anyway, Herb claims that modern C++ is clean, type safe and efficient. This got me thinking about my CSV parsers, which I was not really happy with. My main concern is clarity, but without sacrificing too much efficiency.

I started playing around with the Boost tokenizer, trying to get more out of it. After a while I also rediscovered lexical_cast from the Boost library, which provides a short and clean way to convert strings to integers or doubles. Combining this with stream iterators, this was my result:

#include <iostream>
#include <iterator>
#include <string>
#include <vector>
#include <fstream>
#include <boost/tokenizer.hpp>
#include <boost/lexical_cast.hpp>
#include <boost/algorithm/string.hpp>

template<class T>
std::vector<std::vector<T>> ParseCsv(std::istream& is)
{
    using namespace std;
    using namespace boost;
    typedef istream_iterator<char> iterator;
    typedef char_separator<char> separator;
    typedef tokenizer<separator, iterator, string> Tokenizer;

    // Keep whitespace and newlines in the character stream;
    // otherwise the line breaks would never reach the tokenizer.
    is.unsetf(ios_base::skipws);

    vector<vector<T>> result;
    // "," is dropped as a delimiter; "\n" is kept and returned as its own token.
    Tokenizer tokens(iterator(is), iterator(), separator(",", "\n"));
    bool newLine = true;
    for(auto token = tokens.begin(); token != tokens.end(); ++token)
    {
        // Start a new row on the first token after a line break.
        if (newLine)
        {
            result.push_back(vector<T>());
            newLine = false;
        }

        if (*token == "\n")
            newLine = true;
        else
            result.back().push_back(lexical_cast<T>(trim_copy(*token)));
    }

    return result;
}

 

Usage is simple:

 

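// ParseCsv takes a non-const reference, so pass a named stream rather than a temporary.
std::ifstream file("test.csv");
std::vector<std::vector<int>> result = ParseCsv<int>(file);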

 

I think I like this one better than my previous attempts, assuming, though, that Boost is available.
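
For the case where Boost is not available, here is a rough sketch of the same idea using only the standard library. It is not a drop-in replacement for the version above: the names ParseCsvStd and ConvertField are made up for this illustration, std::getline stands in for the tokenizer, and stream extraction stands in for lexical_cast. As choices made just for this sketch, blank lines are skipped and an empty field (as in "1,,2") throws. It is meant for numeric types; like the Boost version, it does not handle quoted fields.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost or the standard library):
// converts one CSV field to T via stream extraction.
// Surrounding whitespace is tolerated, but "42abc" is rejected.
template <class T>
T ConvertField(const std::string& field)
{
    std::istringstream in(field);
    T value;
    if (!(in >> value))
        throw std::invalid_argument("cannot convert field: " + field);
    in >> std::ws; // consume trailing whitespace, including a stray '\r'
    if (!in.eof())
        throw std::invalid_argument("trailing characters in field: " + field);
    return value;
}

// Hypothetical Boost-free counterpart of ParseCsv: one row per line,
// fields separated by commas.
template <class T>
std::vector<std::vector<T>> ParseCsvStd(std::istream& is)
{
    std::vector<std::vector<T>> result;
    std::string line;
    while (std::getline(is, line))
    {
        // Skip lines that contain only whitespace (a choice made for this sketch).
        if (line.find_first_not_of(" \t\r") == std::string::npos)
            continue;

        std::vector<T> row;
        std::istringstream fields(line);
        std::string field;
        while (std::getline(fields, field, ','))
            row.push_back(ConvertField<T>(field));
        result.push_back(row);
    }
    return result;
}
```

Usage would look the same as before: open a std::ifstream as a named variable and pass it to ParseCsvStd<int>. It is a bit longer than the Boost version, which is part of why I still prefer the tokenizer and lexical_cast when they are at hand.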

  1. H.
    July 23, 2012 at 4:30 pm

    Very nice and useful code, thank you.

    However, g++ 4.6.3 on Ubuntu 12.04 with -Wall will come up with the following:

    warning: ‘auto’ will change meaning in C++0x; please remove it

    I changed the line

    for(auto token = tokens.begin(); token != tokens.end(); ++token)

    to

    for(Tokenizer::iterator token = tokens.begin(); token != tokens.end(); ++token)

    to remove it.
