## Haskell code contemplation

I picked up a programming excercise at some random site. The problem involved finding the m smallest elements in a list. After a bit of fiddling I came up with this one:

import Data.List.Ordered fsm :: Ord a => Int -> [a] -> [a] fsm m = foldr update [] where insert = insertBagBy (flip compare) update x minSet@(~(maxElem:minSet')) | length minSet < m = insert x minSet | x < maxElem = insert x minSet' | otherwise = minSet

The idea is to keep track of the smallest numbers (up till count m) while folding the list. The data structure used is a simple (but ordered) list. The following operations are used:

- insert
- length
- findMax
- deleteMax

Considering the efficiency both findMax and deleteMax are O(1). Insert is O(n). That is a bit unfortunate. I was hoping for O(log(n)) but when I look at the implementation of insertBagBy this is clearly not that efficient. However, insert is not a frequent operation for many input cases and if m is small it could still be something we could live with. The length operation is O(n) which is worse since this operation will be triggered for every element in the list (for any input).

You might wonder why I did not simply use the Data.Set? I started out with that one until I realized it cannot handle duplicate elements. For example finding the two minimum elements in [1,1,2,3]. It should be [1,1] but the set will not handle that correctly so I turned to ordered lists instead.

I though about the costly length operation for a while and came up with another version.

import Data.List.Ordered fsm :: Ord a => Int -> [a] -> [a] fsm m xs = foldr update (sort xs_) _xs where (xs_,_xs) = splitAt m xs rcomp = flip compare sort = sortBy rcomp insert = insertBagBy rcomp update x minSet@(maxElem:minSet') | x < maxElem = insert x minSet' | otherwise = minSet

Instead of starting with an empty set this version takes the m first elements and uses them as an initial set (of minimum numbers). Thus the length operation can be avoided entirely.

That was cool but we can do even better using the Data.Heap module. It provides an efficient implementation of a heap with all the operations we need. The updated code looks like below.

import Prelude hiding (drop) import Data.Maybe (fromJust) import Data.Heap fsm :: Ord a => Int -> [a] -> [a] fsm m = toList . foldr update (empty :: MaxHeap a) where findMax = fromJust . viewHead deleteMax = drop 1 update x minSet | size minSet < m = insert x minSet | x < findMax minSet = insert x (deleteMax minSet) | otherwise = minSet

In summary we got:

- insert, O(log n)
- length/size, O(1)
- findMax, O(1)
- deleteMax, O(log n).
*This was deduced by looking at the source code. The documentation does not say.*

So… indeed the operations are efficient and also the code is easy to read. This implementation has O(n log(m)) worst case complexity which can be compared to the previous ones having O(m ⋅ n).

However, once we have an efficient heap implementation we might consider taking full advantage of it.

import Prelude hiding (take) import Data.Heap fsm :: Ord a => Int -> [a] -> [a] fsm m = take m . foldr insert (empty :: MinHeap a)

In this way we turned it into a one liner. We are still relying only on efficient operations and ends up with a similiar (but slightly worse) time complexity O(n log(n)). A downside is that we are consuming more memory, O(n) instead of O(m) in the previous implementation.

It is a bit of a surprise for me that such a simple programming exercise gives so many options and raises so many questions. Also if compared to the problem of finding the smallest element in a list which does not give that many options at all. A simple implementation is the optimal.

fs :: Ord a => [a] -> a fs = foldr1 (\x min -> if x < min then x else min)

I guess it is not that uncommon, though, that a small rephrasing of the problem changes the entire game. One example that comes to my mind is that it is efficient to find the shortest path in a graph but the longest path is expensive to calculate (with the known algorithms).