Tokenize a string using PowerShell and regular expressions

Just a snippet that splits a given string into tokens on a set of delimiters, keeping each delimiter as a token of its own.

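function tokenize($text, $delims = ".,;\s")
{
    # ref: http://stackoverflow.com/questions/521146/c-sharp-split-string-but-keep-split-chars-separators
    # Split after each delimiter (zero-width lookbehind), so every piece ends with at most one delimiter.
    [regex]::Split($text, "(?<=[$delims])") | ForEach-Object {
        if ($_[-1] -match "[$delims]")
        {
            # Emit the text before the trailing delimiter, if there is any...
            if ($_.Length -gt 1)
            {
                $_.Substring(0, $_.Length - 1)
            }
            # ...then the delimiter itself as a separate token.
            $matches[0]
        }
        else
        {
            # No trailing delimiter: this is the last piece of the input.
            $_
        }
    }
}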
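
To see why the function has to post-process each piece, it may help to look at what the lookbehind split returns on its own. A minimal sketch, using a made-up input string and a hard-coded delimiter set (both are assumptions for illustration only):

```powershell
# Split after every ';' or ','. The lookbehind is zero-width, so the delimiter
# is not consumed and stays attached to the end of the preceding piece.
$pieces = [regex]::Split("a;b,c", "(?<=[;,])")
$pieces   # "a;", "b,", "c"
```

Each piece still carries its delimiter, which is why tokenize strips the last character and emits it separately.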

Sample usage:

tokenize "daniel;;;robert,johanna.torkil anna`tigor   adam"

This produces the following tokens (the tab written as `t in the input comes back as a token of its own, shown here as "`t"):

"daniel"; ";"; ";"; ";"; "robert"; ","; "johanna"; "."; "torkil"; " "; "anna"; "`t"; "igor"; " "; " "; " "; "adam"

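As a possible alternative, .NET's Regex.Split includes the text of capturing groups in its result, so wrapping the delimiter class in parentheses keeps the delimiters without a lookbehind. The sketch below is not the snippet above: the function name Split-KeepDelimiter is made up for illustration, and it assumes that the empty strings produced between adjacent delimiters should simply be dropped.

```powershell
# Hypothetical alternative to tokenize; the name is an assumption for illustration.
function Split-KeepDelimiter($text, $delims = ".,;\s")
{
    # A capturing group makes Regex.Split return the delimiters as well.
    # Adjacent delimiters yield empty strings in between, so filter those out.
    [regex]::Split($text, "([$delims])") | Where-Object { $_ -ne '' }
}

Split-KeepDelimiter "daniel;;;robert,johanna"
```

For the sample input this should give the same kind of token list as tokenize, with each delimiter as its own element.
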