Sunday, February 17, 2013

Haskell Profiling Redux. Also, briefly Ponies.

Before we get into the Criterion benchmarking library, I guess I should fulfill the function of this journal of mine every so often. This is entirely unrelated to programming, so skip to the next heading if you want to get directly to benching examples.

Ponies

Before the last little while, I could have recommended it to you without reservations.

However, apparently some Hasbro exec wasn't happy to leave well enough alone? So Season 3, which concluded this past Saturday was both a) half as long as usual and b) very hit-or-miss.

The finale in particular was excruciating. Not that the animators and writers didn't do their best, I guess, but it's pretty obvious that this episode was rushed as fuck and not at all what they were planning for the ending of the season[1]. I had to stop watching because my eye-rolling intensity was ramping up at each scene. You really have to work to kill my willing suspension of disbelief in any cartoon, let alone this one. That should tell you something. My wife is a more hardcore fan, having started in on ponies at G1, back when the fandom was almost exclusively female. She managed to finish out the episode, then she cried for a while. If she wasn't still nursing, I have no doubt that she'd have pulled out the scotch. That should tell you the same thing, but in 76-point, bold, condensed type.

So anyway, instead of

Go watch ponies, they're quite good. Inaimathi

my recommendation now has to look something like

Go watch ponies, they're quite good. But avoid episodes 10, 11, 12 and 13 of season 3. And actually, S3E3 and S3E4 weren't up to par either. And S3E9 has some annoyingly out of character behavior. And while otherwise excellent, the season 3 premiere foreshadows some interesting character development and story hooks that never got followed up on, so don't watch that unless you're ok with getting no closure at all. Inaimathi

That's ... less than a ringing endorsement, but I guess I'll stick around to see what they can pull off in season 4[2]. In the meantime, there are some fan projects that look really good[4]. Slice of Life is a tumblr based webcomic that honestly looks like what the next step for the official media should be; a simple, character-driven story of some minor characters introduced over the course of the first two seasons[5]. Also, there's Turnabout Storm, a fan-made FiM/Phoenix Wright crossover which treats both source series fairly respectfully. That might just be my wife's and my weird tastes showing, but we're both heavily invested.

Haskell

Right, back to the subject at hand. Last week, I finally got out to the Haskell group. They don't always meet up, and when they do it's usually in the second Wednesday of the month[6], but this month, they met on a Thursday that wasn't otherwise occupied for me.

First impressions are that I have no idea what the fuck I'm doing.

These guys are far enough beyond me in terms of math chops that I couldn't follow all of the conversation happening. I know it's a good thing to periodically be the dumbest guy in the room, but it doesn't feel good while it's happening. Anyhow, we had a long-ish presentation on Arrows and their implications followed by some free-form discussion. One of the things I picked up was the question of how Acid-State compares performance-wise to other data back-ends available, and another was mention of Criterion. The first is something I aim to get to next time, the second is a profiling library for Haskell that doesn't require you to go through any of that GHC compilation flag bullshit I took you through last time. So I figured I'd crack it open and see if it can provide decent output for me.

Profiling Haskell with Criterion

Criterion doesn't work on the same level as the GHC profiler. Specifically, it works on individual functions rather than complete programs. It lets you specify labels and benchmark groups, and it takes your hardware into consideration. In concrete terms, lets take that Life module from a while ago for a spin.

module Life where
import Data.List (group, sort, concatMap)
import Data.Set

inRange :: Ord a => a -> a -> a -> Bool
inRange low n high = low < n && n < high

lifeStep :: Integer -> Set (Integer, Integer) -> Set (Integer, Integer)
lifeStep worldSize cells = fromList [head g | g <- grouped cells, viable g]
  where grouped = group . sort . concatMap neighbors . toList
        neighbors (x, y) = [(x+dx, y+dy) | dx <- [-1..1], dy <- [-1..1], 
                            (dx,dy) /= (0,0), inSize (dx+x) (dy+y)]
        inSize x y = inR x worldSize && inR y worldSize
        inR = inRange 0
        viable [_,_,_] = True
        viable [c,_] = c `member` cells
        viable _ = False

runLife :: Integer -> Integer -> Set (Integer, Integer) -> Set (Integer, Integer)
runLife worldSize steps cells = rec (steps - 1) cells
  where rec 0 cells = cells
        rec s cells = rec (s - 1) $! lifeStep worldSize cells

glider = fromList [(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)]
blinker = fromList [(1, 0), (1, 1), (1, 2)]
gosperGliderGun = fromList [(24, 0), (22, 1), (24, 1), (12, 2), (13, 2), (20, 2), (21, 2), (34, 2), (35, 2), (11, 3), (15, 3), (20, 3), (21, 3), (34, 3), (35, 3), (0, 4), (1, 4), (10, 4), (16, 4), (20, 4), (21, 4), (0, 5), (1, 5), (10, 5), (14, 5), (16, 5), (17, 5), (22, 5), (24, 5), (10, 6), (16, 6), (24, 6), (11, 7), (15, 7), (12, 8), (13, 8)]

main :: IO ()
main = putStrLn . show $ runLife 50 5000 gosperGliderGun

The Critreion benching specification for that would look something like

{-# LANGUAGE ScopedTypeVariables #-}

import Criterion.Main
import qualified Life

main = defaultMain [
  bgroup "lifeStep" [
     bench "Gun" $ whnf step Life.gosperGliderGun,
     bench "Glider" $ whnf step Life.glider,
     bench "Blinker" $ whnf step Life.blinker
     ],
  bgroup "runLife" [ 
     bench "Gun" $ whnf run Life.gosperGliderGun,
     bench "Glider" $ whnf run Life.glider,
     bench "Blinker" $ whnf run Life.blinker
     ],
  bgroup "main" [
    bench "with IO" $ whnfIO Life.main
    ]
  ]
  where step = Life.lifeStep 50
        run = Life.runLife 50 5000

Take note of a few things in here. First, bgroup is of type String -> [Benchmark] -> Benchmark, which means that you can nest them; I just don't. Second, because you have to pass the functions you bench this way, you obviously can't measure internal definitions; you'll need to pull those pieces out to the top-level in order to figure out how much time they're taking. Third, whnf takes the target function and its last argument as separate arguments. The documentation says this is to prevent bench calls themselves from being optimized away by ghc -O, which would be less than ideal for obvious reasons. The output of the above will be something like

warming up
estimating clock resolution...
mean is 2.233879 us (320001 iterations)
found 52279 outliers among 319999 samples (16.3%)
  158 (4.9e-2%) low severe
  52121 (16.3%) high severe
estimating cost of a clock call...
mean is 53.36260 ns (14 iterations)

benchmarking lifeStep/Gun
mean: 1.741280 ms, lb 1.737541 ms, ub 1.745301 ms, ci 0.950
std dev: 19.92180 us, lb 17.54283 us, ub 24.73115 us, ci 0.950

benchmarking lifeStep/Glider
mean: 202.0392 us, lb 201.8147 us, ub 202.3017 us, ci 0.950
std dev: 1.240524 us, lb 1.058166 us, ub 1.716059 us, ci 0.950

benchmarking lifeStep/Blinker
mean: 113.2888 us, lb 113.1514 us, ub 113.4549 us, ci 0.950
std dev: 775.2598 ns, lb 649.4014 ns, ub 1.132731 us, ci 0.950

benchmarking runLife/Gun
collecting 100 samples, 1 iterations each, in estimated 539.4134 s
mean: 5.436085 s, lb 5.427699 s, ub 5.447670 s, ci 0.950
std dev: 50.19115 ms, lb 38.42313 ms, ub 67.52400 ms, ci 0.950

benchmarking runLife/Glider
mean: 21.11409 ms, lb 20.93418 ms, ub 21.53549 ms, ci 0.950
std dev: 1.325480 ms, lb 478.7184 us, ub 2.322194 ms, ci 0.950
found 5 outliers among 100 samples (5.0%)
  3 (3.0%) high mild
  2 (2.0%) high severe
variance introduced by outliers: 59.525%
variance is severely inflated by outliers

benchmarking runLife/Blinker
mean: 19.77626 ms, lb 19.74970 ms, ub 19.81207 ms, ci 0.950
std dev: 157.6956 us, lb 125.2292 us, ub 204.1448 us, ci 0.950

benchmarking main/with IO
fromList [(3,5),(4,4),(4,6),(5,4),(5,6),(6,5),(17,3),(17,4),(18,3),(18,4),(22,4),(22,5),(23,3),(23,6),(24,4),(24,6),(25,5),(34,2),(34,3),(35,2),(35,3)]
collecting 100 samples, 1 iterations each, in estimated 547.4592 s
fromList [(3,5),(4,4),(4,6),(5,4),(5,6),(6,5),(17,3),(17,4),(18,3),(18,4),(22,4),(22,5),(23,3),(23,6),(24,4),(24,6),(25,5),(34,2),(34,3),(35,2),(35,3)]

<snip a fuckton of duplicates />

fromList [(3,5),(4,4),(4,6),(5,4),(5,6),(6,5),(17,3),(17,4),(18,3),(18,4),(22,4),(22,5),(23,3),(23,6),(24,4),(24,6),(25,5),(34,2),(34,3),(35,2),(35,3)]
mean: 772.6040 us, lb 753.6831 us, ub 785.0876 us, ci 0.950
std dev: 77.70160 us, lb 51.99922 us, ub 110.7696 us, ci 0.950
found 8 outliers among 100 samples (8.0%)
  5 (5.0%) low severe
  2 (2.0%) high severe
variance introduced by outliers: 79.011%
variance is severely inflated by outliers

You can apparently generate shiny HTML/JS-based performance reports, judging by what I'm reading here, but I haven't bothered to. Basically what this library does, which you can see by that output, is take each function you pass it and try it a bunch of times, then hand you stats on how long it took. It doesn't give you any kind of memory usage information, and won't give you information on sub-expressions. On the other hand, you don't need to dick around with installing the profiling versions of any library, or any of GHC's profiling flags, and you get to be more surgical about it and this compiles results from a bunch of trials[7] which gives me slightly more confidence in the results.

So, I mean, pros and cons.

Not having to grub around my .cabal config[8] sounds like a good thing, and I typically care a lot more about execution time than memory usage in the applications I write. So this looks like an all-round better way to benchmark than the default.

Tune in next time, when I'll probably put this tool to use in comparing some of the database options I have in Haskell.


Footnotes

1 - [back] - Since there was no mention of this "Equestria Games" thing they were building up for three episodes.

2 - [back] - My wife's theory is that I shouldn't bother. She thinks the studio has been giving FiM less attention because they're preparing to ramp up on Equestria Girls; another spin-off show for kids about chicks in high-school[3]. It's plausible, but I'll still reserve judgement.

3 - [back] - Incidentally, my wife adds "Booooo!".

4 - [back] - Except that they're happening in a parallel universe now[9] or whatever, because Twilight Sparkle is not a unicorn anymore.

5 - [back] - Incidentally, if anyone from Hasbro is reading; one way you could correct course at this point is by walking over to egophiliac's house with a giant cheque, asking her what number she'd like on it, then putting her in charge of the writing and/or editing staff.

6 - [back] - Which conflicts with my workout schedule.

7 - [back] - The number of which you can specify by using defaultMainWith instead of defaultMain and passing it a potentially modified defaultConfig.

8 - [back] - Or re-install half my libraries later.

9 - [back] - Actually, after I wrote this, I went over to check out the Slice of Life archives and it turns out that egophiliac has been going back and revising history as new stuff gets added to the show. For instance, Twilight now has added wings and there was a slight conversational change between Pound Cake and Scootaloo. My theory is that she hasn't updated more frequently lately because she's been too busy changing her archives in response to Hasbro's various wankery.

Saturday, February 16, 2013

Wai Without Yesod - The Front End

Firstly, those notes to self I mentioned last time have been applied. You can see the results here, or more specifically, here among other places. Those used to be separate functions that each edited specific things, rather than each calling out to one actual editor. Also, the countItem and commentItem pieces were one function that accepted multiple Maybes and only applied the relevant ones. That was more complicated than it needed to be, so it has been separated into what you see there. Thinking about that function more thoroughly also pointed me to a bug I had previously missed[1].

Secondly, I ended up having to put together two front-ends; an Angular-based page and something a bit more traditional with jQuery. The reason was that the Angular.js version refused to work on my phone for some bizarre reason. It showed the intro screen fine, and displayed auth errors like it was supposed to, but refused to show the main screen on a successful authentication. I have no idea why that was, but since half the point of this app was that I could check it from my phone on the way from work, it wasn't going to fly. Luckily, the API-friendly back-end technique I'm trying out here made it a breeze to create a new front-end without touching the rest of the application. The changes involved a couple of CSS tweaks and re-writes of goget.js and index.html. Also, I had to throw handlebars back in there.

Thirdly, I deployed it. It doesn't run under HTTPS yet, so don't put in anything illegal or embarrassing, but that's a usable shopping list synchronizer which I intend to use. Let me know if you try it and anything explodes.

On to the code!

At the moment, I've got the jQuery and Angular versions separated into different branches, but I'll merge them shortly and just provide each as a separate front-end[2]. On a scale this small, it turns out not to matter much how you write the interface. If you check out the line-count on both those front-ends, the reactive version saves about 10 lines of HTML and 15 of JavaScript. It stacks up in larger applications, and if there's an option to use less JS, I'll take it, but in this case, the elegant solution doesn't work, so whatever. Lets start with the HTML markup first. Here's the Angular

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>GoGet - Because I Can't Be Expected to Remember This Shit</title>
  </head>
  <body ng-app="goget">
    
    <div ng-controller="GoGetCtrl">
      <div ng-show="!user.loggedIn" class="user-form">
        <div ng-show="authError" class="error">{{authError}}</div>
        <input type="text" placeholder="User Name" ng-model="user.name" />
        <input type="password" placeholder="Passphrase" ng-model="user.passphrase" />
        <a class="register" ng-click="register(user.name, user.passphrase)">Register</a>
        <button class="btn login" ng-click="login(user.name, user.passphrase)"><i class="icon-check"></i> Login</button>
      </div>
        
      <ul ng-show="user.loggedIn" class="shopping-list">
        <li class="{{itm.status}}" ng-repeat="itm in itemList"
            ng-mouseover="itm.hovered = true" ng-mouseout="itm.hovered = false">
          <span class="count">{{itm.count}}x</span> 
          <span class="name">{{itm.name}}</span> 
          <button class="btn" ng-click="got(itm.name)" ng-show="itm.status=='Need'"><i class="icon-check"></i></button>
          <button class="btn" ng-click="need(itm.name)" ng-show="itm.status=='Got'"><i class="icon-exclamation-sign"></i></button>
          <p class="comment" ng-show="itm.hovered">{{itm.comment}}</p>
        </li>
        <li class="controls">
          <input type="text" placeholder="Item Name" ng-model="newItem.name" /> 
          <input type="text" placeholder="Comment" ng-model="newItem.comment" />
          <input type="text" placeholder="Count" ng-model="newItem.count">
          <button class="btn" ng-click="add(newItem.name, newItem.comment, newItem.count)"><i class="icon-plus"></i></button>
        </li>
      </ul>
    </div>

    <!-- ------ -->
    <!-- Styles -->
    <!-- ------ -->
    <link rel="stylesheet" href="/static/css/bootstrap.min.css" type="text/css" media="screen" />
    <link rel="stylesheet" href="/static/css/bootstrap-responsive.min.css" type="text/css" media="screen" />

    <link rel="stylesheet" href="/static/css/style.css" type="text/css" media="screen" />    

    <!-- ------- -->
    <!-- Scripts -->
    <!-- ------- -->
    <script src="/static/js/underscore-min.js" type="text/javascript"></script>
    <script src="/static/js/angular.min.js" type="text/javascript"></script>    
    
    <script src="/static/js/goget.js" type="text/javascript"></script>
  </body>
</html>

and here's the jQuery

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>GoGet - Because I Can't Be Expected to Remember This Shit</title>
  </head>
  <body> 

    <!-- templates -->
    <script id="tmp-item" type="text/x-handlebars-template">
      <li class="{{status}}">
        <span class="count">{{count}}x</span>
        <span class="name">{{name}}</span>
        {{#controls this}}{{/controls}}
        <!-- <p class="comment">{{comment}}</p> -->
      </li>
    </script>

    <script id="tmp-item-controls" type="text/x-handlebars-template">
      <button class="btn" onclick="goget.{{fn}}(jQuery(this).siblings('.name').text())"><i class="{{iconClass}}"></i></button>
    </script>

    <!-- body -->
    <div>
      <div class="user-form">
        <div class="error"></div>
        <input type="text" class="user-name" placeholder="User Name" />
        <input type="password" class="passphrase" placeholder="Passphrase" />
        <a class="register" onclick="util.applyToUser(goget.register)">Register</a>
        <button class="btn login" onclick="util.applyToUser(goget.login)">
          <i class="icon-check"></i> Login
        </button>
      </div>

      <ul class="shopping-list">
      </ul>
      <ul class="shopping-list-controls">
        <li class="controls">
          <input type="text" class="name" placeholder="Item Name" /> 
          <input type="text" class="comment" placeholder="Comment" />
          <input type="text" class="count" placeholder="Count" value="1" />
          <button class="btn" onclick="util.applyToVals(goget.add, '.controls ', ['.name', '.comment', '.count'])"><i class="icon-plus"></i></button>
        </li>
      </ul>
    </div>

    <!-- ------ -->
    <!-- Styles -->
    <!-- ------ -->
    <link rel="stylesheet" href="/static/css/bootstrap.min.css" type="text/css" media="screen" />
    <link rel="stylesheet" href="/static/css/bootstrap-responsive.min.css" type="text/css" media="screen" />

    <link rel="stylesheet" href="/static/css/style.css" type="text/css" media="screen" />    

    <!-- ------- -->
    <!-- Scripts -->
    <!-- ------- -->
    <script src="/static/js/underscore-min.js" type="text/javascript"></script>
    <script src="/static/js/handlebars.js" type="text/javascript"></script>    
    <script src="/static/js/jquery.min.js" type="text/javascript"></script>    
    
    <script src="/static/js/goget.js" type="text/javascript"></script>
  </body>
</html>

There's a bunch of common boilerplate at the bottom and top that you can safely ignore. The meat begins at the body tag and stops at the block comment denoting the Styles section. The only real difference is that you can see some unfamiliar util calls and explicit templates in the jQuery version. Oh, the Angular version also controls its visibility explicitly through those ng-show attributes; the jQuery version relies on CSS to do the same. The actual differences are readily on display in the JS code though. First, Angular

var App = angular.module("goget", [])
    .config(function ($httpProvider) {
        /// Angular's post doesn't do the correct default thing with POST parameters
        $httpProvider.defaults.headers.post['Content-Type'] = 'application/x-www-form-urlencoded; charset=UTF-8';
        $httpProvider.defaults.transformRequest = function(data){
            return _.map(data, function (val, k) { return encodeURIComponent(k) + "=" + encodeURIComponent(val); }).join("&");
        }
    });

App.controller('GoGetCtrl', function ($scope, $http) {
    $scope.itemList = [];
    $scope.newItem = { count: 1 };
    $scope.user = { id: false, loggedIn: false, passphrase: "" };

    function itemPost (uri, params) {
        $http.post(uri, params)
            .success(function (data) {
                $scope.itemList = data;
            })
            .error(function (data) {
                console.log(data);
            })
    }

    function userPost (uri, params) {
        console.log("Sending " + uri + " request...")
        $http.post(uri, params)
            .success(function (data) {
                $scope.user.id = data.id;
                $scope.user.loggedIn = true;
                $scope.itemList = data.items;
            })
            .error(function (data) {
                $scope.authError = data;
                console.log(data)
            })
    }

    $scope.login = function (name, pass) {
        userPost("/auth/login", {name : name, passphrase: pass});
    }

    $scope.register = function (name, pass) {
        userPost("/auth/register", {name : name, passphrase: pass});
    }

    $scope.add = function (itemName, comment, count) {
        $http.post("/app/new", {itemName: itemName, comment: comment, count: count})
            .success(function (data) {
                $scope.itemList = data;
                $scope.newItem = { count: 1 }
            })
    }
    
    $scope.need = function (itemName) {
        itemPost("/app/item/need", {itemName: itemName});
    }
    
    $scope.got = function (itemName) {
        itemPost("/app/item/got", {itemName: itemName});
    }

});

and then jQuery

var util = {
    hcompile: function (template) {
        return Handlebars.compile($("#tmp-" + template).html())
    },
    vals: function (listOfDOMSelectors) {
        return _.map(listOfDOMSelectors, function (s) { return $(s).val() })
    },
    under: function (DOMContext, listOfDOMSelectors) {
        return _.map(listOfDOMSelectors, function (s) { return DOMContext + s })
    },
    applyToVals: function (fn, DOMContext, listOfDOMSelectors) {
        return fn.apply({}, util.vals(util.under(DOMContext, listOfDOMSelectors)));
    },
    applyToUser: function (fn) {
        return util.applyToVals(fn, '.user-form ', ['.user-name', '.passphrase']);
    }
}

Handlebars.registerHelper("controls", function (anItem) {
    if (anItem.status == 'Got') {
        var ctrl = {fn: 'need', iconClass: "icon-exclamation-sign"}
    } else {
        var ctrl = {fn: 'got', iconClass: "icon-check"}
    }
    return new Handlebars.SafeString(templates.itemButtons(ctrl));
})

var templates = {
    item: util.hcompile("item"),
    itemButtons: util.hcompile("item-controls")
}

var goget = {
    render: function (itemList) {
        $(".shopping-list-controls").show()
        $(".shopping-list").empty();
        $.each(itemList, function (ix, anItem) {
            $(".shopping-list").append(templates.item(anItem));
        })
    },
    itemPost: function (uri, params) {
        $.post(uri, params)
            .done(function (data, textStatus, jqXHR) {
                goget.render($.parseJSON(jqXHR.responseText))
            })
            .fail(function (data, textStatus, jqXHR) {
                console.log(["Failed!", data, textStatus, jqXHR])
                // something odd happened; either invalid item, or failed connection
            })
    },
    userPost: function (uri, params) {
        $.post(uri, params)
            .done(function (data, textStatus, jqXHR) {
                $(".user-form").hide();
                goget.render($.parseJSON(jqXHR.responseText).items);
            })
            .fail(function (data) {
                console.log(["Failed!", data.responseText])
                $(".user-form .error").text(data.responseText).show()
            })
    },
    login: function (name, pass) {
        goget.userPost("/auth/login", { name: name, passphrase: pass });
    },
    register: function (name, pass) {
        goget.userPost("/auth/register", { name: name, passphrase: pass });
    },
    add: function (itemName, comment, count) {
        goget.itemPost("/app/new", {itemName: itemName, comment: comment, count: count})
    },
    need: function (itemName) {
        goget.itemPost("/app/item/need", {itemName: itemName});
    },
    got: function (itemName) {
        goget.itemPost("/app/item/got", {itemName: itemName});
    }
    
}

util is a bunch of shortcut functions that make it relatively simple to do things which are trivial in the reactive version. Take applyToVals, for instance.

    applyToVals: function (fn, DOMContext, listOfDOMSelectors) {
        return fn.apply({}, util.vals(util.under(DOMContext, listOfDOMSelectors)));
    },

This is only necessary because in order to get values out of inputs, I have to do DOM traversals. Take a look at the sample invocation back in the jQuery-style HTML file

onclick="util.applyToVals(goget.add, '.controls ', ['.name', '.comment', '.count'])"

So, in other words, there are three controls in the DOM somewhere, and I'd like to grab their values and pass them to the function goget.add. I could call goget.add($(".controls .name").val(), $(".controls .comment").val(), $(".controls .count").val(),), but that seems more than mildly annoying if I have to do it multiple times. So I pulled out the pattern; applyToVals takes a function, a DOM context[3] and a list of element selectors. It then concatenates the DOM context onto each of the selectors, and returns a list of the values of the elements specified by those selectors.

How do we do that in Angular?

        <li class="controls">
          <input type="text" placeholder="Item Name" ng-model="newItem.name" /> 
          <input type="text" placeholder="Comment" ng-model="newItem.comment" />
          <input type="text" placeholder="Count" ng-model="newItem.count">
          <button class="btn" ng-click="add(newItem.name, newItem.comment, newItem.count)"><i class="icon-plus"></i></button>
        </li>

That's actually a snippet from the Angular-style HTML file, and only about 1/5th of it is responsible for the equivalent. Each of the inputs we care about has an ng-model property, and that we then just pass those models into add. If it worked where I needed it to, I wouldn't have bothered finding a better solution than this.

Most of the rest of the util namespace is actually just intermediate definitions for util.applyToVals, and there's one definition that uses it specifically to pull out data from the user form. Oh, and a shorthand for compiling a particular Handlebars template. There's a snippet where we define a helper function for the main template, and a place in the goget namespace wherein we call render, which is famously missing from Angular, and that's really it. The rest of it is transliterated pretty clearly.

The only other thing I'll highlight is that the Angular version contains this:

    .config(function ($httpProvider) {
        /// Angular's post doesn't do the correct default thing with POST parameters
        $httpProvider.defaults.headers.post['Content-Type'] = 'application/x-www-form-urlencoded; charset=UTF-8';
        $httpProvider.defaults.transformRequest = function(data){
            return _.map(data, function (val, k) { return encodeURIComponent(k) + "=" + encodeURIComponent(val); }).join("&");
        }
    });

which the jQuery version doesn't. As far as I'm concerned, this is the one place where the Angular devs are just plain wrong. I know other approaches are possible here, so I guess it's a good thing that there's an option. But as far as I'm aware, all the widely used HTTP servers out there right now expect POST parameters to be encoded in the www-form format by default. And that's not what Angular does with JSON objects by default.

$http.post("/foo", { bar: 1, baz: 2 });

will actually send the server a POST body that looks like {"bar":1,"baz":2}. At that point it's up to you to grab the raw request and parse that body with a JSON interpreter. What you likely want, because most HTTP servers will parse it appropriately by default, is bar=1&baz=2, with both the keys and values getting URI-encoded just in case. The way you do that is by using this config option I've got above. The jQuery equivalent doesn't need this, because $.post does the right thing with no additional prodding[4].


Footnotes

1 - [back] - Specifically, it had to do with reading the count parameter. You can see the fix here and here. Short version: count needs to be readable as an Integer for the back-end to proceed, but it's coming from the outside, which means I can't guarantee that. The initial version of the code was optimistic, simply using read :: Integer assuming it could work. If a malicious front-end sent back something that couldn't be read as a number, that would have given me a run-time error. I'm under the impression that these are to be avoided in Haskell..

2 - [back] - Defaulting to jQuery because I want to use it from my phone, and putting the angular version at /angular/* rather than at root.

3 - [back] - The common selector prefix of any elements I'll need to grab from.

4 - [back] - I have no idea whether there are options to do it another way in jQuery.

Sunday, February 10, 2013

Wai Without Yesod - Simple Example of Haskell Web Development

It's been a slow writing month, as you can tell by the number of recent submissions. Hopefully, I remember how to do this.

Other than raising a child[1], taking a long-overdue break from programming in my down time and dealing with various other mundane bullshit, I've been working on a small project in Haskell that'll make coordination for my wife and I marginally simpler. It's a minimal, web-based running shopping list, and the idea is to use it to forget fewer things during the weekly trip as well as make occasional after-work quick grocery stops possible. I could easily have knocked this out in a day or three using Python, but I figured I'd take some time to put my Haskell skills through more rigorous testing.

Last time, I worked with Happstack, which is the best looking web framework available in the language, as far as I'm concerned. This time, I just pulled out warp, the wai-based server, and went frameworkless.

Why wai without Y?

Using a framework binds you to it. Yesod especially seems to have a stick up its ass about using something other than Haskell to build pieces of a project. You may recall that I theorized about keeping front and back-ends entirely separate a little while ago. Well, I'm still on that. So as much as framework maintainers want me to use or Hamlet or whatever, doing so would be counter-productive for me. Yesod actually goes one further and has a JS-generation language too. The idea is supposed to be that all of your code then goes through the same rigorous type checks and optimizations that the ML family is famous for. In practice, what it means is that you're more-or-less forced to use jQuery[2], and it means that all of your project is in Haskell[3] and it means that your server-side code is doing everything[4]. I think I'll stick to the manual trio for the front-end and just let Haskell do the heavy database lifting.

The easiest way to do that seems to be to basically keep the model and a controller on server-side, and the view and a controller on the client side. Happstack is the only one of the main three Haskell web frameworks to make something like that reasonably simple, but I already got some time in with it, and the Warp server claims to bench much better.

So, let's give this a whirl.

Haskell Dev General Thoughts

Before I get to the code in my usual self-review fashion, let me let you in on some lessons I had to learn the hard way by hitting my head up against the language.

Firstly, don't try to do bottom-up design here. Or, at least, slow down with it until you get fairly good with the language, fairly familiar with the documentation/conventions, and fairly good at understanding how GHCi works. The techniques of wishful thinking and building the language up towards your problem are still applicable, but Haskell has a way of focusing pretty relentlessly on types. Even though it infers a lot of type information without your help, the most common pieces of advice I get from other Haskellers is to

  1. work out what the type of my function is going to be before writing the function itself and
  2. explicitly write it down above the function

I've found that this does help, if for no other reason than thinking through your operations' types will highlight the pieces that you didn't think through well enough, and it'll remind you what the next piece needs to interface with. It's just not what I'm used to doing[5].

Secondly, don't trust GHCi completely. As a Lisper, this sort of blew my mind because the Lisp REPL is a running Lisp image. If something works in the SLIME prompt, it's a fairly good bet that you can just copy it out into your .lisp file and have it work the same way[6]. GHCi isn't exactly that. Its type reflection directive does some odd things when inferring types[7], and it fudges IO-related operations in general[8]. For simple stuff that doesn't do IO, you can probably still get away with Lisp-style exploratory REPLing, but it doesn't seem to be a good general strategy. For GHCi, at least. For all I know, HUGS is better at this sort of thing, but I haven't used it extensively yet, despite it being the only Haskell REPL on offer for ARM at the moment.

Thirdly, it's possible[9] to apply the venerable technique of debugging by printf. At first glance, it seems like it wouldn't be, since doing any output from a function pollutes its type with IO, which then cascades to all of the callers of that function and causes you to rewrite half the project if you want to add some output in one place. Oh, and then rewrite it back once you're done looking at debugging output. There's a library called Debug.Trace that lets you pull off something similar enough. It highlights very clearly that this isn't meant for production use though; what you're supposed to do, near as I can tell, is import qualified Debug.Trace as Debug, then sprinkle Debug.trace "A trace message goes here..." $ {{the thing you want trace output for} throughout your code, and run M-x query-replace-regexp Debug.trace ".*?" later to replace these calls with nothing. It's possible that there's an automatic way of removing them, but I didn't bother finding it for a project this size.

Routing

{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Default (def)
import Data.String (fromString)
import Data.Aeson
import qualified Data.Text as Text
import qualified Data.Vault as Vault

import Data.Acid (AcidState, Update, Query, makeAcidic, openLocalState)
import Data.Acid.Advanced (update', query')
import Data.Acid.Local (createCheckpointAndClose)
import Data.IxSet ((@=), Proxy(..), getOne)

import Network.Wai
import Network.Wai.Parse (parseRequestBody, lbsBackEnd)
import Network.Wai.Session (withSession)
import Network.Wai.Session.Map (mapStore_)
import Network.Wai.Handler.Warp (run)
import Network.Wai.EventSource (ServerEvent (..), eventSourceAppChan)
import Network.HTTP.Types (ok200, unauthorized401, status404)

import Control.Exception (bracket)
import Control.Concurrent.Chan (Chan, newChan, dupChan, writeChan)
import Control.Monad.Trans.Resource (ResourceT)
    
import TypeSynonyms
import Util
import Model
import Handlers

routes :: DB -> SessionStore -> Request -> RES
routes db session req = do
  let Just (sessionLookup, sessionInsert) = Vault.lookup session (vault req)
  user <- sessionLookup "user"
  case pathInfo req of
    ("app":rest) -> 
      loggedInRoutes db user rest req
    ("auth":rest) ->
      authRoutes db sessionLookup sessionInsert rest req
    ["static", subDir, fileName] -> 
      serveStatic subDir fileName
    [] -> 
      resFile "text/html" "static/index.html"
    ["favicon.ico"] -> 
      resPlaceholder
    _ -> res404

authRoutes :: DB ->  LookupFN -> InsertFN -> [Text.Text] -> Request -> RES
authRoutes db sLookup sInsert path req = do
  withPostParams req ["name", "passphrase"] route
  where route [name, pass] = 
          case path of
            ["login"] -> 
              login db sInsert name pass
            ["register"] -> 
              case pass of
                "" -> resError "At least pick a non-empty passphrase"
                _  -> register db sInsert name pass
            _ -> res404

loggedInRoutes :: DB -> Maybe String -> [Text.Text] -> Request -> RES
loggedInRoutes db maybeUserName path req = do
  (params, _) <- parseRequestBody lbsBackEnd req
  case maybeUserName of
    Just name -> do
      maybeAccount <- query' db $ AccountByName name
      case maybeAccount of
        Just user -> case path of
          ("item":rest) -> 
            withParams params ["itemName"] route
            where route [itemName] = itemRoutes db user itemName rest params
          ["list"] -> 
            listItems db user
          ["new"] -> 
            withParams params ["itemName", "comment", "count"] new
            where new [name, comment, count] = newItem db user name comment (read count :: Integer)
          ["change-passphrase"] -> 
            withParams params ["newPassphrase"] change
            where change [newPass] = changePassphrase db user newPass
          _ -> res404
        Nothing -> resError "Invalid user"
    Nothing -> resError "Not Logged In"

itemRoutes :: DB -> Account -> String -> [Text.Text] -> BSAssoc -> RES
itemRoutes db user itemName path params = do
  case getOne $ (accountItems user) @= itemName of
    Just item -> case path of
      ["need"] -> 
        needItem db user item
      ["got"] -> 
        gotItem db user item
      ["delete"] -> 
        deleteItem db user item
      ["edit"] ->
        edit $ extractOptional params ["comment", "count"]
        where edit [comment, count] = editItem db user item comment count
      _ -> res404
    Nothing -> resError "Invalid item"

----- Server start
main = do
  session <- Vault.newKey
  store <- mapStore_
  bracket (openLocalState initialDB) (createCheckpointAndClose) 
    (\db -> run 3000 . withSession store (fromString "SESSION") def session $ routes db session)

Basically, case statements. routes at the top there dispatches on pathInfo req, which returns the URI minus GET/# parameters and split on /. You then use the standard Haskell pattern matching facilities to figure out what the user has requested and what to do about it.

Lets take a close-up look at the type signature of routes before moving on.

routes :: DB -> SessionStore -> Request -> RES

That should look suspiciously minimal to anyone who's actually done web development in Haskell before, and it is. The why has to do with this import

import TypeSynonyms

I have no idea whether this is good Haskelling practice or not, but I ended up defining descriptive synonyms for a bunch of the complex types I needed to work with. Then I realized that I need to refer to them in more than one module and it would be better to centralize them rather than having copies of the definitions in each relevant file.

module TypeSynonyms where

import Data.Acid (AcidState)
import qualified Data.ByteString.Char8 as BS
import qualified Data.Vault as Vault

import Network.Wai
import Network.Wai.Session (Session)

import Control.Monad.Trans.Resource (ResourceT)

import Model

type DB = AcidState GoGetDB
type RES = ResourceT IO Response
type SessionStore = Vault.Key (Session (ResourceT IO) String String)
type LookupFN = (String -> ResourceT IO (Maybe String))
type InsertFN = (String -> String -> ResourceT IO ())
type BSAssoc = [(BS.ByteString, BS.ByteString)]

So DB is shorthand for the database class we're using[10], RES is shorthand for the HTTP response IO type, and SessionStore/LookupFN/InsertFN are session vault and related lookup/insertion functions respectively. I also defined shorthand for a ByteString association table, since that's how parsed request parameters are stored and they get passed around more than once.

Ok, back to routes.

If you read through that file, you'll notice that a lot of validation and lookup logic in with the routing rather than in the specific handlers that might need them. That sort of happened accidentally, and again, I'm not sure it's the best way to organize these files, but it does have two big advantages. First, because validation and error routing happens beforehand, the handler functions themselves can be exclusively concerned with the successful case. By the time an item-related function is called for example, it's guaranteed to have a request from an existing, logged-in user relating to an existing item. So the handler doesn't need to check for any of those internally. Second, we centralize the validation and lookups. If we expected the handlers themselves to deal with it, then each of the item-related handlers, for example would need to check for an authenticated user, and they'd each have to check that the item they're asked to operate on actually exists. By doing it beforehand, we only do that check once.

I mentioned that this is new to me. That's because the various Python/Ruby frameworks I'm familiar with all represent a routing table as some sort of ordered associative list of regexes and handlers, while all of the Common Lisp/Clojure servers I'm familiar with give you something along the lines of define-handler, which takes a name, handler body and routing URI, removing the need for an explicit central routing structure at all. As I recall, Smalltalk works something like the Lisps and Erlang does something similar to Python/Ruby. So this is the first time I did any real work as part of handler routing, and it seems like it might be a good approach. In a dynamically typed language, I'd be really worried about not making it obvious enough that a handler function is expecting thoroughly validated input rather than doing that part itself, which would increase the chances of a dev passing in unvalidated input and causing an explosion somewhere. But the type annotations and rigorous static checking take care of that for me here in Haskell-land.

Lets take a look at these simplified handlers we're passing stuff on to.

The Handlers

module Handlers ( listItems, needItem, gotItem, editItem, deleteItem, newItem
                , changePassphrase, register, login ) where

import Control.Monad.Trans  (liftIO)

import Data.Maybe (fromMaybe)
import qualified Data.ByteString.Char8 as BS

import Data.Acid (AcidState, Update, Query, makeAcidic, openLocalState)
import Data.Acid.Advanced (update', query')
import Data.Acid.Local (createCheckpointAndClose)
import Data.IxSet (Indexable(..), IxSet(..), (@=), Proxy(..), getOne, ixFun, ixSet, insert, delete, toAscList, updateIx )

import Crypto.Scrypt (EncryptedPass(..), Pass(..), defaultParams, encryptPass, verifyPass)

import TypeSynonyms
import Util
import Model

---------- HTTP Handlers
----- Item Related
listItems :: DB -> Account -> RES
listItems db user = do
  resIxItems  $ accountItems user

needItem :: DB -> Account -> Item -> RES
needItem db user item = do
  update' db $ ChangeItem user new
  resIxItems $ updateIx (itemName item) new (accountItems user) 
    where new = item { itemStatus = Need }

gotItem :: DB -> Account -> Item -> RES
gotItem db user item = do
  update' db $ ChangeItem user new
  resIxItems $ updateIx (itemName item) new (accountItems user)
    where new = item { itemStatus = Got }

editItem :: DB -> Account -> Item -> Maybe String -> Maybe String -> RES
editItem db user item newComment newCount = do
  update' db $ ChangeItem user new
  resIxItems $ updateIx (itemName item) new (accountItems user)
    where new = item { itemComment = comment, itemCount = count }
          comment = fromMaybe (itemComment item) newComment
          count = fromMaybe (itemCount item) (maybeRead newCount :: Maybe Integer)

deleteItem :: DB -> Account -> Item -> RES
deleteItem db user item = do
  update' db $ DeleteItem user item
  resIxItems $ delete item (accountItems user)

newItem :: DB -> Account -> String -> String -> Integer -> RES
newItem db user name comment count =
  case getOne $ (accountItems user) @= name of
    Just item -> needItem db user item
    Nothing -> do
      update' db $ NewItem user item
      resIxItems $ insert item (accountItems user)
      where item = Item { itemName=name, itemComment=comment, itemCount=count, itemStatus=Need }

----- Account Related
changePassphrase :: DB -> Account -> String -> RES
changePassphrase db user newPassphrase = do
  new <- liftIO . encryptPass defaultParams . Pass $ BS.pack newPassphrase
  update' db . UpdateAccount $ user { accountPassphrase = unEncryptedPass new }
  resOk user

register :: DB -> InsertFN -> String -> String -> RES
register db sessionInsert name passphrase = do
  pass <- liftIO . encryptPass defaultParams . Pass $ BS.pack passphrase
  existing <- query' db $ AccountByName name
  case existing of
    Nothing -> do
      acct <- update' db . NewAccount name $ unEncryptedPass pass
      sessionInsert "user" name
      resOk acct
    _ -> resError "User already exists"

login :: DB -> InsertFN -> String -> String -> RES
login db sessionInsert name passphrase = do 
  res <- query' db $ AccountByName name
  case res of
    Just user -> case verifyPass defaultParams (Pass $ BS.pack passphrase) pass of
      (True, _) -> do
        sessionInsert "user" $ accountName user
        resOk user
      _ -> resNO
      where pass = EncryptedPass $ accountPassphrase user
    _ -> resNO

<p>The authentication functions are predictably complicated, but I'll get to them later. Take a look at the <code>needItem</code> function.</p>

needItem :: DB -> Account -> Item -> RES
needItem db user item = do
  update' db $ ChangeItem user new
  resIxItems $ updateIx (itemName item) new (accountItems user) 
    where new = item { itemStatus = Need }

It's not expecting an account name and item ID to reference by. It's expecting an Account[11] and it's expecting an Item[12]. It does the work of updating the DB, and then sends back an appropriate response.

Really, I could have made one more general function along the lines of editItem, then called it for need, got, and separate handlers for changeComment and changeCount. In fact, that was officially a note to self.

EDIT: The item-related section now reads
needItem :: DB -> Account -> Item -> RES
needItem db user item = updateItem db user new
  where new = item { itemStatus = Need }

gotItem :: DB -> Account -> Item -> RES
gotItem db user item = updateItem db user new
  where new = item { itemStatus = Got }

editItem :: DB -> Account -> Item -> Maybe String -> Maybe String -> RES
editItem db user item newComment newCount = updateItem db user new
  where new = item { itemComment = comment, itemCount = count }
        comment = fromMaybe (itemComment item) newComment
        count = fromMaybe (itemCount item) (maybeRead newCount :: Maybe Integer)

updateItem :: DB -> Account -> Item -> RES
updateItem db user newItem = do
  update' db $ ChangeItem user newItem
  resIxItems $ updateIx (itemName newItem) newItem (accountItems user)
Sat, 09 Feb, 2013

The way it's currently written, the most complex of the item-related handlers is editItem, and that's because it needs to optionally change the comment, count or both depending on what's passed in. This is the price you pay for automatic currying and maximally terse partials; those features don't share space well with optional/keyword/rest arguments. The result is that when you need the latter, you need to represent them as mandatory Maybe args, or as a custom type argument. We've already gone through an example of the first approach. You can see the second if you squint at verifyPass and encryptPass. Specifically, the second argument, defaultParams is of type ScryptParams, which is defined as

data ScryptParams = Params { logN, r, p, bufLen :: Integer} deriving (Eq)

which is really a way of representing keyword args in a language without any. defaultParams itself is defined as

defaultParams :: ScryptParams
defaultParams = fromJust (scryptParams 14 8 1)

and scryptParams is a surprisingly complicated function that validates input and returns Params { logN, r, p, bufLen = 64 }. In Lisp, verifyPass would have an arg line like

(verify-pass incoming stored &key (logN 14) (r 8) (p 1) (bufLen 64))

and start off with some assertions to mimic the validation done in scryptParams. Of course, that's not to say that the Haskell approach is a hack; both approaches have their advantages and disadvantages in practice[13]. In the specific situation I'm dealing with above, since we don't have optionals, it would probably have been better to separate the count and comment changing handlers and let the front-end call the specific one it wants. That was another note to self.

Since we're here, and since I'm the guy who's been going on and on about this, something would be slightly amiss if I failed to note the authentication system, at least in passing. We're using the scrypt algorithm to store and verify passwords[14]. If a password is verified we store the users' name in their session cookie. wai-session encrypts its cookies for security, so this system would actually be a simplistic[15] but secure way of maintaining user account as long as we used it over HTTPS.

Lets see, where were we. Oh, right, all those functions beginning with res in the Handlers and Main modules aren't built-ins. They're defined in a generically named Util module.

Util

{-# LANGUAGE OverloadedStrings #-}
module Util ( resOk, res404, resError, resNO, resFile, resIxItems, serveStatic
            , resPlaceholder
            , extractOptional, withParams, withPostParams 
            , maybeRead) where

import Data.String (fromString)
import Data.Aeson
import qualified Data.Text as Text
import qualified Data.ByteString.Char8 as BS

import Data.IxSet (IxSet(..), Proxy(..), toAscList )

import Network.Wai
import Network.Wai.Parse (parseRequestBody, lbsBackEnd)
import Network.HTTP.Types (ok200, unauthorized401, status404)

import Control.Monad (sequence, liftM)
import Control.Monad.Trans.Resource (ResourceT)

import TypeSynonyms
import Model

resIxItems :: IxSet Item -> RES
resIxItems body = resOk $ toAscList (Proxy :: Proxy ItemStatus) $ body

resOk :: (ToJSON a) => a -> RES
resOk body = return $ responseLBS ok200 [] $ encode body

resNO :: RES
resNO = resError "NO -_-"

res404 :: RES
res404 = return $ responseLBS status404 [] $ fromString "Not Found"

resPlaceholder :: RES
resPlaceholder = return $ responseLBS status404 [] $ fromString "Not implemented yet"

resError :: String -> RES
resError message = return $ responseLBS unauthorized401 [] $ fromString message

resFile :: BS.ByteString -> FilePath -> RES
resFile contentType filename = return $ ResponseFile ok200 [("Content-Type", contentType)] filename Nothing

serveStatic :: Text.Text -> Text.Text -> RES
serveStatic subDir fName = 
  case sub of
    "js" -> serve "text/javascript"
    "css" -> serve "text/css"
    "img" -> serve "image/png"
    _ -> res404
  where serve mimeType = resFile mimeType $ concat ["static/", sub, "/", Text.unpack fName]
        sub = Text.unpack subDir

withPostParams :: Request -> [BS.ByteString] -> ([String] -> RES) -> RES
withPostParams req paramNames fn = do
  (params, _) <- parseRequestBody lbsBackEnd req
  withParams params paramNames fn

withParams :: BSAssoc -> [BS.ByteString] -> ([String] -> RES) -> RES
withParams params paramNames fn = 
  case extractParams params paramNames of
    Just paramVals -> 
      fn paramVals
    Nothing ->
      resError $ concat ["Need '", paramsList, "' parameters"]
      where paramsList = BS.unpack $ BS.intercalate "', '" paramNames

extractOptional :: BSAssoc -> [BS.ByteString] -> [Maybe String]
extractOptional  params paramNames = map lookunpack paramNames
  where lookunpack k = do
          res <- lookup k params
          return $ BS.unpack res

extractParams :: BSAssoc -> [BS.ByteString] -> Maybe [String]
extractParams params paramNames = do
  res <- allLookups params paramNames
  return $ map BS.unpack res

maybeRead :: Read a => Maybe String -> Maybe a
maybeRead str = do
  res <- str
  return $ read res

allLookups :: Eq a => [(a, a)] -> [a] -> Maybe [a]
allLookups assoc keys = sequence $ map (\k -> lookup k assoc) keys

The response functions seem fairly self-explanatory. resOk constructs a standard HTTP 200 response, resIxItems takes an IxSet Items and constructs a response by JSON-encoding it, and the error handlers each return a 404 or perhaps 401 with some error message. The resPlaceholder was something I used as a[16] placeholder in the various routes functions while writing the final handlers. I think the only place a call to it still exists is in the favicon handler.

withPostParams and more generally withParams are functions that call functions using the output of serial lookup calls, which would be a pain in the ass to do manually. maybeRead is exactly what it says on the tin; it's a wrapper around Haskell's polymorphic read function wrapped so that it can deal with Maybe String rather than String input.

That's ... really it. I'm struggling to describe these a bit more than I usually do because the type signatures tell you a pretty significant amount of what you need to know about a given function. Not everything, obviously, but you'd be surprised how many times I've cut "foo is self-explanatory" from this write-up.

The last module left is the model; the one that actually takes all this information and stores it in some way.

The Model

Before we dive into the code on this one, I want to highlight two things.

First, I initially did my best to separate the model entirely from the rest of the logic. Going so far as to define "external" API functions to call from other modules. That didn't last. Using naked query' and update' caused some type issues that I'm still not entirely clear about, and the code separating the database primitives from other functions was almost tripling the total size of the model. When modularity costs that much, I prefer to chuck it in a bin and resign myself to re-writing most of the application if I need to change out my database engine.

Second, this isn't the first back-end I tried using[17]. Before settling on AcidState, I tried out the haskell-MongoDB interface and hdbc-sqlite3, neither of which impressed me much[18]. The Mongo interface just plain does not fit with the Haskll way of doing things. It's a massive, loosely structured, JSON-based key/value store, and since my application mainly responds to clients using JSON feeds, I figured that would be a good fit. One problem is that it turns out that aeson[19] has a fundamentally different type architecture than mongodbs BSON, which means that converting between them is ... lets go with "non-trivial". The other big problem is that doing k/v lookups in Haskell is harder than dealing with native Haskell types, which means that the easiest way of using Mongo here would have been to define a type, and then specify how it gets encoded/decoded to both JSON and BSON. Given that I'm pretty used to pymongo and Monger, that's a lot more work than I was expecting. The hdbc interface was slightly better, since relational databases assume some up-front modeling, and slightly worse, since it expected me to write string-template based queries. Both external options required dealing with the actual DB through an external connection, both required a conversion step from associative lists before being converted to JSON, and both seemed to expect me to perform that conversion step.

AcidState didn't. It serializes most native Haskell types[20].

Without further ado

{-# LANGUAGE DeriveDataTypeable, GeneralizedNewtypeDeriving, RecordWildCards, TemplateHaskell, TypeFamilies, OverloadedStrings #-}
 
module Model ( initialDB
             , GoGetDB(..), Account(..), Item(..), ItemStatus(..)
             , NewAccount(..), UpdateAccount(..), AccountByName(..), GetAccounts(..)
             , NewItem(..), DeleteItem(..), ChangeItem(..) ) where

import Control.Monad.Reader (ask)
import Control.Monad.State  (get, put)
import Control.Monad.Trans  (liftIO)
import Control.Monad.IO.Class (MonadIO)

import Data.Acid (AcidState, Update, Query, makeAcidic, openLocalState)
import Data.Acid.Local (createCheckpointAndClose)
import Data.Acid.Advanced (update', query')
import Data.Data (Data, Typeable)
import Data.IxSet (Indexable(..), IxSet(..), (@=), Proxy(..), getOne, ixFun, ixSet, insert, delete, toAscList, updateIx )
import Data.SafeCopy (SafeCopy, base, deriveSafeCopy)
import Data.Text.Lazy (toStrict)
import qualified Data.Text as Text
import Data.ByteString.Char8 (ByteString, pack)
import Data.Maybe
import Data.Aeson

import Crypto.Scrypt (EncryptedPass(..), Pass(..), defaultParams, encryptPass, verifyPass)

---------- Base types (for IxSet and Account components)
newtype AccountId = AccountId { unAccountId :: Integer } deriving (Eq, Ord, Data, Show, Enum, Typeable, SafeCopy)

---------- Item-Related types
data ItemStatus = Need | Got deriving (Eq, Ord, Data, Enum, Read, Show, Typeable)
deriveSafeCopy 0 'base ''ItemStatus

data Item = Item { itemName :: String, itemComment :: String, itemStatus :: ItemStatus, itemCount :: Integer } deriving (Eq, Ord, Show, Data, Typeable)
deriveSafeCopy 0 'base ''Item
instance Indexable Item where
  empty = ixSet [ ixFun $ (:[]) . itemName
                , ixFun $ (:[]) . itemStatus
                , ixFun $ (:[]) . itemCount
                ]
instance ToJSON Item where
  toJSON (Item name comment status count) = object [ "name" .= name
                                                   , "comment" .= comment
                                                   , "status" .= show status
                                                   , "count" .= count 
                                                   ]

---------- Account
data Account = Account { accountId :: AccountId
                       , accountName :: String 
                       , accountPassphrase :: ByteString
                       , accountItems :: IxSet Item
                       } deriving (Eq, Show, Data, Typeable) 

instance Ord Account where
  a `compare` b = (accountId a) `compare` (accountId b)

deriveSafeCopy 0 'base ''Account
instance Indexable Account where
  empty = ixSet [ ixFun $ (:[]) . accountId
                , ixFun $ (:[]) . accountName 
                ]

instance ToJSON Account where
  toJSON (Account id name _ items) = object [ "id" .= unAccountId id
                                            , "name" .= name
                                            , "items" .= toAscList (Proxy :: Proxy ItemStatus) items]

---------- DB root type
  --- This is declared so that acid-state has a top level element to store
data GoGetDB = GoGetDB { nextAccountId :: AccountId, accounts :: IxSet Account
                       } deriving (Show, Data, Typeable)
deriveSafeCopy 0 'base ''GoGetDB

initialDB :: GoGetDB
initialDB = GoGetDB { nextAccountId = AccountId 0, accounts = empty }

---------- Insertion Functions
newAccount :: String -> ByteString -> Update GoGetDB Account
newAccount name passphrase = do
  db@GoGetDB{..} <- get
  let account = Account { accountId = nextAccountId
                        , accountName = name
                        , accountPassphrase = passphrase
                        , accountItems = empty
                        }
  put $ db { nextAccountId = succ nextAccountId
           , accounts = insert account accounts 
           }
  return account

deleteItem :: Account -> Item -> Update GoGetDB ()
deleteItem acct item = do
  db@GoGetDB{..} <- get
  put $ db { accounts = updateIx (accountId acct) removed accounts }
    where removed = acct { accountItems = delete item (accountItems acct)}

newItem :: Account -> Item -> Update GoGetDB ()
newItem acct item = do
  db@GoGetDB{..} <- get
  put $ db { accounts = updateIx (accountId acct) added accounts }
    where added = acct { accountItems = insert item (accountItems acct) }

changeItem :: Account -> Item -> Update GoGetDB ()
changeItem acct item = do
  db@GoGetDB{..} <- get
  put $ db { accounts = updateIx (accountId acct) changed accounts }
    where changed = acct { accountItems = updateIx (itemName item) item (accountItems acct)}

updateAccount :: Account -> Update GoGetDB ()
updateAccount u = do
  db@GoGetDB{..} <- get
  put $ db { accounts = updateIx (accountId u) u accounts }

---------- Query Functions
getAccounts :: Query GoGetDB [Account]
getAccounts = do
  GoGetDB{..} <- ask
  return $ toAscList (Proxy :: Proxy AccountId) accounts

getAccount :: (Typeable a) => a -> Query GoGetDB (Maybe Account)
-- separate so we can get accounts by something else at some point in the future
getAccount ix = do
  GoGetDB{..} <- ask
  return $ getOne $ accounts @= ix

accountByName :: String -> Query GoGetDB (Maybe Account)
accountByName name = getAccount name

makeAcidic ''GoGetDB [ 'newAccount, 'newItem, 'deleteItem, 'changeItem, 'updateAccount, 'accountByName, 'getAccounts ]

The temptation is really strong to say "this is self-explanatory", but that's only true if you've also gone through the relevant Happstack Crash Course section. Basically, AcidState is a Haskell-native noSQL data store. You define Haskell types that represent stuff you want to store, and it serializes them to disk through an interface that looks pretty close to the state monad if you squint hard enough.

I want to draw your attention to a few things.

---------- Account
data Account = Account { accountId :: AccountId
                       , accountName :: String 
                       , accountPassphrase :: ByteString
                       , accountItems :: IxSet Item
                       } deriving (Eq, Show, Data, Typeable) 

instance Ord Account where
  a `compare` b = (accountId a) `compare` (accountId b)

deriveSafeCopy 0 'base ''Account
instance Indexable Account where
  empty = ixSet [ ixFun $ (:[]) . accountId
                , ixFun $ (:[]) . accountName 
                ]

instance ToJSON Account where
  toJSON (Account id name _ items) = object [ "id" .= unAccountId id
                                            , "name" .= name
                                            , "items" .= toAscList (Proxy :: Proxy ItemStatus) items]

This is the full definition of the Account type. The ToJSON declaration allows aeson to serialize this type for the front-end, and the Indexable class lets IxSet construct collections of Accounts. IxSet itself is a multi-indexing set implementation that I'm using to store both the accounts and shopping list items for this project.

The deriveSafeCopy call there is actually a TemplateHaskell invocation, and not a regular function[21]. Basically what that means is that this call will be resolved at compile time; in that sense, TemplateHaskell is a very restricted answer to Lisp macros.

Next, note the type of accountPassphrase. If you take a look at the login handler from earlier, you'll notice something a bit odd.

login :: DB -> InsertFN -> String -> String -> RES
login db sessionInsert name passphrase = do 
  res <- query' db $ AccountByName name
  case res of
    Just user -> case verifyPass defaultParams (Pass $ BS.pack passphrase) pass of
      (True, _) -> do
        sessionInsert "user" $ accountName user
        resOk user
      _ -> resNO
      where pass = EncryptedPass $ accountPassphrase user
    _ -> resNO

We're storing a ByteString in the database, but converting it to/from the type EncryptedPass. We don't have to do that with other types, but the developers who wrote Haskell's scrypt library didn't bother deriving Typeable for their types. There's quite likely a way for me to derive it manually, but that seems like more trouble in this particular instance. I just wanted to point out that even though we're doing type conversion to fit something into an AcidState DB here, that's not the ideal case, and you typically don't have to.

Finally, note that except for initialDB, all the exports from the Model module are types rather than functions

module Model ( initialDB
             , GoGetDB(..), Account(..), Item(..), ItemStatus(..)
             , NewAccount(..), UpdateAccount(..), AccountByName(..), GetAccounts(..)
             , NewItem(..), DeleteItem(..), ChangeItem(..) ) where

If you take a look at any query' or update' call in the handlers, you'll note that they work by passing arguments to one of these types. What's actually happening is that AcidState is forward-journaling your requests so that it can fulfill the ACID guarantees. The trouble is that functions aren't inherently serializable in Haskell. So what it expects you to do is define the appropriate database functions, then use makeAcidic to derive the relevant, serializable types.

That about does it for the back end. I was going to go over the client-side code too, but this piece is getting quite long already. I'll likely write a follow-up to show you how I ended up actually calling the JSON-based handlers I describe above, but in the meantime, you'll have to check out goget.js and index.html in the static folder of the github project.


Footnotes

1 - [back] - Which has progressed to three-month-old status, in case you care.

2 - [back] - Or whatever JS framework the server-side framework team picked out.

3 - [back] - So good luck getting a front-end specialist in later.

4 - [back] - So any changes, regardless how trivial, actually need a re-compile and re-run on the final server.

5 - [back] - Which shouldn't surprise you in the least: this just in languages that affect the way you think about programming expect you to think differently about programming.

6 - [back] - Modulo the obvious state problems you have from potentially having some intermediary values defined in the image.

7 - [back] - That SO question has an example in the answers' comments; the type EventResult is reported as belonging to the module Data.Acid.Common, but that file actually doesn't exist. What's actually happening is that Common is a hidden module in the AcidState project, and another module is responsible for exporting its symbols. I didn't know this just from looking. The reason it matters is that when you want to make a type signature explicit by importing the relevant module, GHCi will tell you where a given type is defined and not where it's exported. Fun times.

8 - [back] - Which kind of makes sense, because conceptually speaking, a purely functional REPL for a lazy language would more or less have to be implemented in the IO monad.

9 - [back] - If slightly clunkier than in CL or Python.

10 - [back] - There'll be more on that later, obviously. Do note that this is only a reasonable thing to do because we only use one database class for our model; if we used several, we'd need to figure something else out. To be fair though, I'm having a hard time imagining a situation that would call for using several DB classes in a single project.

11 - [back] - Meaning lookup has been done and validated for it.

12 - [back] - Meaning we've already collapsed the waveform and made sure that the user wants to need an existing item, otherwise we'd be expecting a Maybe Item here instead.

13 - [back] - In fact, I've been meaning to write a piece comparing the two, I just haven't gotten around to it.

14 - [back] - Which you still shouldn't count as a flat out recommendation, but I am using it, and I do intend to deploy this, so draw what conclusions you like.

15 - [back] - We don't do any kind of throttling on login, aside from the complexity of the scrypt algorithm itself, and we don't check registration requests for automation with recaptcha or similar. I'm not sure how I feel about the first, while the second seems entirely unnecessary for an acount that doesn't allow sending any kind of email, or doing anything other than managing associated data.

16 - [back] - Surprise.

17 - [back] - Which, on reflection, is probably why I over-estimated the need to switch out databases at first.

18 - [back] - Although I really could have put a project this small together with either if I felt like it.

19 - [back] - Haskell's main JSON encoding library.

20 - [back] - In order to serialize a type, you need to derive Typeable and SafeCopy. They're both trivial tasks for your own types, assuming you've got TemplateHaskell on, but are non-trivial for types you're including from modules you didn't write. You'll see an example of this when you see how I store encrypted passwords later on. That's the only type I needed to massage myself though; had I used an option other than AcidState, I'd have had to do the same for all of them.

21 - [back] - You can tell because of the ' and '' argument prefixes.