Archive

Archive for August, 2013

Hughes lists

Lists in functional languages, such as the build in list in F#, has good performance, O(1), for adding an element to the head of the list (an operation called cons).

1 :: [2;3;4]   // This is fast

However appending an element at the end of the list (an operation called snoc) is expensive, O(n).

[1;2;3] @ [4] // This is slow

Hughes lists (or DLists) are a clever data type that offers good performance for snoc. The implementation is simple (copied directly from http://stackoverflow.com/questions/2483131/cons-operator-in-f) :

module HList =
    type ‘a hlist = ‘a list -> ‘a list   (* a John Hughes list *)
    let empty : ‘a hlist = let id xs = xs in id
    let append xs ys = fun tail -> xs (ys tail)
    let singleton x = fun tail -> x :: tail
    let cons x xs = append (singleton x) xs
    let snoc xs x = append xs (singleton x)
    let to_list : ‘a hlist -> ‘a list = fun xs -> xs []

How does this compare to the .NET standard list implementation System.Collections.Generic.List?

Consider the following test code which executes ‘max’ number of snocs using the above hughes list implementation:

measureTime <|
    fun () ->
        Seq.fold (fun l e -> HList.snoc l e) HList.empty [1..max] |> HList.to_list

compared to this code that uses the standard list

measureTime <|
    fun () ->
        Seq.fold (fun (l : List<int>) e -> l.Add(e); l;) (new List<int>()) [1..max]

It turns out that the hughes list executes in about 7.5 seconds when max is set to 10000000 on my machine. The standard list executes in about 3.3 seconds.

How about the memory usage. This is the profile when running the hughes list (area of interest is marked with green):

memory_usage_hlist

This is the profile when running the .NET generic list:

memory_usage_net_list

We can conclude from this that the hughes lists are less performant (by a factor two or more in F#) and also used more memory (by a factor two). I do not think that those numbers should prevent us from using the hughes lists in many situations where we need a fast snoc.

Implementation note, the code for measureTime

let measureTime x =
    let stopWatch = System.Diagnostics.Stopwatch.StartNew()
    x() |> ignore
    stopWatch.Stop()
    stopWatch.Elapsed.TotalMilliseconds

Categories: Uncategorized

Seq.split in F#

I wanted to split a list into sub-lists given some kind of separator. For example, I wanted:

[1;4;5;0;0;7;9;1;0;3;0;0;0;9] to be split by the zero markers into [[1;4;5];[7;9;1];[3];[9]]

I could not find anything in the standard F# libraries. I found this snippet:

let split (selector:’a->bool) (source:seq<‘a>) :seq<seq<‘a>>=
  let i = ref 0
  source
  |> Seq.groupBy (fun elem -> if selector elem then incr i
                              !i)
  |> Seq.map snd

which is interesting but it produces (ignoring the difference between seq and list):

[[1;4;5];[0];[0;7;9;1];[0;3];[0];[0];[0;9]]

which was not exactly what I wanted. Like regex split i did not want to keep the split symbols. Also I think groupBy adds unnecessary performance overhead.

After a bit of fiddling I came up with this:

let split (pred : ‘a -> bool) (source : seq<‘a>) : seq<list<‘a>> =
    seq {
        let cur = ref (new List<_>())
        for elem in source do
            if pred elem then
                if cur.Value.Count > 0 then
                    yield Seq.toList(cur.Value)
                    cur.Value.Clear()
            else
                cur.Value.Add(elem)
    }

Are there better ways? Clearer? With better performance? More functional in style?

Categories: Uncategorized