
A first look at elasticsearch graph

September 24, 2016

Today I sat down to have a look at elasticsearch graph. I needed a dataset to analyze, and I picked the Enron mail dataset, which was released after the Enron scandal.

The dataset is provided as a SQL script for MySQL. I was too lazy to install MySQL, so I wrote a quick-and-dirty PowerShell hack to convert some of the data into a CSV file.

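# Read the raw dump; note that .\dataset.csv is overwritten in place below
$lines = Get-Content .\dataset.csv
# Keep only the value rows, which start with "(" followed by a numeric id
$lines = $lines | ? { $_ -match "^\s*\(\d+" }
# Strip the surrounding "(" and ")," and turn SQL single quotes into CSV double quotes
$lines = $lines | % { $_.Trim().TrimStart('(').TrimEnd('),').Replace("`"","").Replace("'","`"") }
Set-Content .\dataset.csv -Value $lines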
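
To make the effect of the clean-up concrete, here is a minimal sketch that applies the same filter and string operations to two in-memory lines instead of the file. The sample lines, the table name in them, and the `Convert-SqlRow` helper name are made up for illustration; the real dump may format its rows differently.

```powershell
# Hypothetical helper: applies the same clean-up as the hack above to one line.
function Convert-SqlRow {
    param([string]$Line)
    $Line.Trim().TrimStart('(').TrimEnd('),').Replace("`"", "").Replace("'", "`"")
}

# Made-up sample lines, only shaped like the rows of a SQL INSERT statement.
$sample = @(
    "INSERT INTO message VALUES",
    "(1,'alice@example.com','2000-01-21 04:51:00','<id-1>','Hello','Some text','inbox'),"
)

# Same filter as above: keep only lines that start with "(" and a number.
$rows = $sample |
    Where-Object { $_ -match "^\s*\(\d+" } |
    ForEach-Object { Convert-SqlRow $_ }

$rows
```

Because every apostrophe is turned into a double quote, rows whose text contains apostrophes may not parse cleanly afterwards, which fits the "some of the data" caveat.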

After this little hack, it was simple enough to parse the data using the built-in Import-Csv cmdlet.

$header = "mid", "sender", "date", "message_id", "subject", "body", "folder"
$dataset = Import-CSV -Delimiter ',' -Path .\dataset.csv -Header $header
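
As a quick check of what the import hands back, the sketch below parses a single cleaned line with `ConvertFrom-Csv`, which takes the same `-Header` and `-Delimiter` parameters but reads from a string instead of a file. The sample line is invented.

```powershell
$header = "mid", "sender", "date", "message_id", "subject", "body", "folder"

# Invented sample line in the cleaned-up format.
$line = '1,"alice@example.com","2000-01-21 04:51:00","<id-1>","Hello","Some text","inbox"'

$mail = $line | ConvertFrom-Csv -Delimiter ',' -Header $header

$mail.sender               # value of the second column
$mail.mid.GetType().Name   # every column is read as a string, including mid and date
```

Since all properties are strings, the JSON sent to elasticsearch later carries `mid` and `date` as strings too, and it is the index mapping that decides how they are interpreted.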

The next thing was to create an elasticsearch index. After some experimentation I came up with this definition.

$mappingDef = @{
  mappings = @{
    # Mapping for the "email" document type used when indexing below
    email = @{
      properties = @{
        mid = @{ type="string" }
        sender = @{ type="string"; index="not_analyzed" }
        date = @{ type="date"; format="yyyy-MM-dd HH:mm:ss" }
        message_id = @{ type="string"; index="not_analyzed" }
        subject = @{ type="string" }
        body = @{ type="string" }
        folder = @{ type="string"; index="not_analyzed" }
      }
    }
  }
} | ConvertTo-Json -Depth 10

Invoke-RestMethod -Method PUT "http://localhost:9200/enron" -ContentType "application/json" -Body $mappingDef
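
`ConvertTo-Json` only serializes two levels of nesting unless told otherwise, so the `-Depth` argument matters as soon as a mapping is nested more deeply. The following sketch uses a small hypothetical mapping (a type named `email` with a `properties` level, which is how elasticsearch 2.x expects field mappings to be nested) to show the difference.

```powershell
# Hypothetical, cut-down mapping: mappings -> type -> properties -> field -> settings.
$def = @{
    mappings = @{
        email = @{
            properties = @{
                sender = @{ type = "string"; index = "not_analyzed" }
            }
        }
    }
}

# The default depth is 2; deeper levels are not serialized as JSON objects
# (newer PowerShell versions also print a truncation warning).
$shallow = $def | ConvertTo-Json
$deep    = $def | ConvertTo-Json -Depth 10

$shallow -match "not_analyzed"   # False: the field settings were lost
$deep    -match "not_analyzed"   # True
```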


The next step was to send all the data to elasticsearch:

foreach ($mail in $dataset) {
  # Index each mail as a document of type "email", using mid as the document id
  $body = $mail | ConvertTo-Json
  Invoke-RestMethod -Method Put "http://localhost:9200/enron/email/$($mail.mid)" -ContentType 'application/json' -Body $body
}
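
Sending one request per mail works, but it can be slow for a large dataset. As an alternative that this post does not use, elasticsearch also offers a `_bulk` endpoint that accepts many documents in one newline-delimited request. The sketch below only builds such a request body; `New-BulkBody` is a hypothetical helper, the sample mails are invented, and batching and error handling are left out.

```powershell
# Hypothetical helper: builds a newline-delimited _bulk body, one action line
# followed by one document line per mail.
function New-BulkBody {
    param(
        [object[]]$Mails,
        [string]$Index = 'enron',
        [string]$Type = 'email'
    )
    $sb = New-Object System.Text.StringBuilder
    foreach ($mail in $Mails) {
        $action = @{ index = @{ _index = $Index; _type = $Type; _id = $mail.mid } } |
            ConvertTo-Json -Compress
        $doc = $mail | ConvertTo-Json -Compress
        [void]$sb.Append($action).Append("`n").Append($doc).Append("`n")
    }
    $sb.ToString()   # the body has to end with a newline
}

# Invented sample mails with only two of the columns.
$mails = @(
    [pscustomobject]@{ mid = '1'; sender = 'alice@example.com' },
    [pscustomobject]@{ mid = '2'; sender = 'bob@example.com' }
)
$bulk = New-BulkBody -Mails $mails

# Requires a running cluster, so left commented out:
# Invoke-RestMethod -Method Post "http://localhost:9200/_bulk" -Body $bulk
```

In practice the dataset would be split into batches of a few thousand mails each rather than sent as one huge body.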

Assuming elasticsearch and kibana are available, graph is easy to install by following the instructions on the elasticsearch page. After firing up the graph plugin page, I was able to create my first graph.
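
The same kind of exploration can also be requested over REST instead of through the kibana page. The sketch below builds a request body that starts from mails matching a search term, takes their senders as the first set of vertices, and connects them to folders. The helper name and the search term are placeholders, the choice of fields is only one possibility, and the endpoint in the comment is assumed from the Graph 2.x documentation, so it should be checked against the installed version.

```powershell
# Hypothetical helper that builds a graph explore request body.
function New-GraphExploreBody {
    param([string]$SearchTerm)
    @{
        # Seed query: which mails to start the exploration from.
        query       = @{ match = @{ subject = $SearchTerm } }
        # First hop: senders of the matching mails.
        vertices    = @( @{ field = "sender" } )
        # Second hop: folders connected to those senders.
        connections = @{ vertices = @( @{ field = "folder" } ) }
    } | ConvertTo-Json -Depth 10
}

$exploreBody = New-GraphExploreBody -SearchTerm "meeting"   # placeholder term

# Assumed endpoint for Graph 2.x; requires a running cluster, so left commented out:
# Invoke-RestMethod -Method Post "http://localhost:9200/enron/_graph/explore" -ContentType "application/json" -Body $exploreBody
```

The `sender` and `folder` fields are natural vertex candidates here because they are mapped as `not_analyzed`, so each address or folder name stays a single term.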

