Friday, May 24, 2013

Truthy and Falsy vs Explicit

I've been thinking about this for a bit, and I'm honestly not sure I can understand it while it's in my head. You know what time it is.

In general, having Falsy things makes it easier to use the if statement, and not having them means getting more explicit code. With that in mind, here's the state of play across the spectrum.

Haskell

doesn't take your shit. If you want to use if instead of pattern matching or guards, you're damn well going to pass it a Bool.

GHCi, version 7.4.1: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> let yayNay val = if val then "Yay!" else "Nay!"
Prelude> yayNay False
"Nay!"
Prelude> yayNay []

<interactive>:4:8:
    Couldn't match expected type `Bool' with actual type `[a0]'
    In the first argument of `yayNay', namely `[]'
    In the expression: yayNay []
    In an equation for `it': it = yayNay []
Prelude> yayNay ""

<interactive>:5:8:
    Couldn't match expected type `Bool' with actual type `[Char]'
    In the first argument of `yayNay', namely `""'
    In the expression: yayNay ""
    In an equation for `it': it = yayNay ""
Prelude> yayNay 0

<interactive>:6:8:
    No instance for (Num Bool)
      arising from the literal `0'
    Possible fix: add an instance declaration for (Num Bool)
    In the first argument of `yayNay', namely `0'
    In the expression: yayNay 0
    In an equation for `it': it = yayNay 0
Prelude> 

If you want to check whether something is not empty rather than merely False, check for that.

empty [] = True
empty _ = False

or, better yet, just use pattern matching in the function you're writing instead of explicit if

foo [] = handleEmpty
foo (a:b:rest) = handleTwo
foo list = handleN

Python

At the entirely other end of the spectrum is Python. It'll take your shit, interpreting if in a Pythonic™© way to mean the appropriate empty check in some contexts.

>>> def yayNay(val):
...     if val:
...             return "Yay!"
...     else:
...             return "Nay!"
... 
>>> yayNay(False)
'Nay!'
>>> yayNay([])
'Nay!'
>>> yayNay({})
'Nay!'
>>> yayNay("")
'Nay!'
>>> yayNay(()) ## the empty tuple
'Nay!'
>>> yayNay(None)
'Nay!'
>>> yayNay(0)
'Nay!'
>>> 

I'm not sure how to feel about that. On the one hand, yes, empty checks are easier. On the other, empty checks usually aren't what you actually want[1], and I can't see another benefit of doing things this way. I also don't really understand why [] and 0 should be Falsy, but [0] should be Truthy. Also, it's not quite as simple as "Empty values should be Falsy". Because there are classes that are in the standard library for which this behavior would also make sense[2], but that lack it.

>>> import Queue
>>> Queue.Queue()
<Queue.Queue instance at 0x2837518>
>>> yayNay(Queue.Queue())
'Yay!'
>>> 

If that was actually the rationale, you'd also kind of expect to be able to define your own Falsy values, but you can't seem to.

JavaScript

does almost the same thing as Python, but adds null, undefined and NaN to the list of Falsy values, and considers empty sequences other than the empty string Truthy. That is,

>// from the Chromium prompt
undefined
> function yayNay(val) { if (val) return "Yay!"; else return "Nay!"; }
undefined
> yayNay(0)
"Nay!"
> yayNay([])
"Yay!"
> yayNay({})
"Yay!"
> yayNay(undefined)
"Nay!"
> yayNay(null)
"Nay!"
> yayNay("")
"Nay!"
> yayNay(NaN)
"Nay!"

There are no tuples in JS, so it can't do anything there. I have no idea what the reasoning is otherwise though. I especially have no idea what would possess someone to think that making the empty array Truthy and the empty string Falsy, unless strings are actually implemented as linked lists underneath.

Common Lisp

is middle-of-the road in this respect. It has a canonical t and nil as True/False, but nil also pulls double-duty as the empty list.

; SLIME 2012-09-04
CL-USER> (defun yay-nay (val) (if val :yay! :nay!))
YAY-NAY
CL-USER> (yay-nay 0)
:YAY!
CL-USER> (yay-nay (vector))
:YAY!
CL-USER> (yay-nay (make-hash-table))
:YAY!
CL-USER> (yay-nay (list))
:NAY!
CL-USER> (yay-nay '())
:NAY!
CL-USER> (yay-nay nil)
:NAY!
CL-USER> 

To a first approximation, it seems that the rationale here is "Sequences that you're likely to interact with through recursion should be Falsy when empty". Except that breaks down when you think about it a bit, because Common Lisp doesn't support tail call optimization either, the way that say, Scheme does.

Speaking of which...

Scheme

CHICKEN
(c)2008-2011 The Chicken Team
(c)2000-2007 Felix L. Winkelmann
Version 4.7.0 
linux-unix-gnu-x86-64 [ 64bit manyargs dload ptables ]
compiled 2011-09-05 on gladstone.duckburg.org (Linux)

#;1> (define (yay-nay val) (if val 'yay! 'nay!))
#;2> (yay-nay 0)
yay!
#;3> (yay-nay nil) ;; wait, fuck, that doesn't exist here

Error: unbound variable: nil

        Call history:

        <syntax>          (yay-nay nil)
        <eval>    (yay-nay nil) <--
#;3> (yay-nay '())
yay!
#;4> (yay-nay (list))
yay!
#;5> (yay-nay #f)
nay!
#;6> (yay-nay (null? (list)))
yay!
#;7> (yay-nay (not (list)))
nay!
#;8>

Scheme, or at least Chicken Scheme, doesn't seem to treat anything but an actual #f as false. Which is mildly bizarre, because the recursion logic would actually make sense here.

Clojure

seems to do its usual and split the difference between Common Lisp and Scheme.

REPL started; server listening on localhost port 40977
user=> (defn yay-nay [val] (if val :yay! :nay!))
#'user/yay-nay
user=> (yay-nay 0)
:yay!
user=> (yay-nay [])
:yay!
user=> (yay-nay "")
:yay!
user=> (yay-nay (list))
:yay!
user=> (yay-nay '())
:yay!
user=> (yay-nay nil)
:nay!
user=> (yay-nay false)
:nay!
user=> 

That is, empty sequences are all Truthy, false is obviously false, and nil is a Falsy value that doesn't double as the empty list.

Conclusion

Hah! You thought I was going to conclude something?

Non-Conclusion

Why is nil Falsy in Clojure? Why is [] Truthy but "" Falsy in JavaScript? Why, if a language has already decided to support empty sequences and values as Falsy, should a sequence composed of nothing but Falsy values be Truthy? Why give this treatment only to some sequences and container values? Why, even if that made sense, can you not define your own classes whose empty states are Falsy?

I haven't a fucking clue. And I guess I'll be asking pointed questions at the various language groups I attend periodically. Each of these languages have gotten me mileage, and I'm used to doing explicit empty checks, so Falsy empty values don't give me any sort of concrete advantage. Anything other than the Haskell approach seems arbitrary, though to be fair, purely functional languages don't have to deal with the idea of pointers[3] which makes things easier.

There definitely isn't any kind of consensus on the issue, and I really wasn't expecting any, but it bothers me mightily that I can't even get my head around what the parameters of the problem are. It's not that we haven't decided whether Falsy or Explicit is better, it's that I'm not sure what factors would play into picking one over the other.

So much for peeling that one, I guess. I'll keep you posted if I have any more thoughts.


Footnotes

1 - [back] - And don't make much sense anyway since Tail Recursion is unpythonic for some odd-sounding non- or semi-reasons.

2 - [back] - For some values of make and sense.

3 - [back] - Or "names", since Python insists on renaming things.

Wednesday, May 22, 2013

Code Retreat - Sudoku

Dammit, Dann, this is getting to be a habit.

I haven't gotten to the bottom of the squares problem yet, and yesterday was the monthly Toronto Code Retreat. Which means my time is effectively up thanks to this new problem we're being handed. I'm sure there's a canonical, three-line solution out there somewhere, but I didn't want to read up on it until after the event, so I'm thinking at this from first principles. As a result, it won't be pretty or optimized.

The Problem

Near as I can tell, Sudoku is a Set problem. This isn't how I first heard the rules described, but it fits. A solved Sudoku board is one where

  • each number is filled in
  • each row is a set
  • each column is a set
  • each disjoint nxn square is a set (where n is the square root of the board size)

we'll call this "The Sudoku Property"

Now there's actually two problems; one is generating boards and one is solving boards. Solving seems to be pretty straightforward:

  1. take a board
  2. optionally, fill in all obvious squares until there are no more (an obvious square is one where only one legal move exists)
  3. for each blank square, try each possibility

If it can't be done, you've got an unsolvable board. If it can, return the first solution you find[1]. That seems to be that.

Generating is a bit of a different story. The naive solution is to generate a bunch of numbers and place them onto a board such that they respect TSP. Essentially, we're solving an empty board by starting with some random seeds. It really seems like this should work, but I can't convince myself that placing a number in an empty square while respecting the TSP will certainly avoid generating impossible boards. That is, an otherwise valid board which contains one or more empty squares whose set of possibilities is the empty set. It might be a better idea to start with nine sets and perform some sort transformation on it until we get something that respects TSP, rather than building it up piecemeal.

The Event

We did this pretty casually. The problem as presented was to solve boards generated by this site, and not bother with the generation side of things. In practice, no group ever got to the point of a general brute-force Sudoku solver. Three groups managed to solve obvious boards (ones where it's possible to find a complete solution without guessing), and one managed to hook together a bogosort-style solution that could, in principle, if we left it running for a few years, solve an arbitrary 9x9 Sudoku board.

We went the usual three sessions, and I did my usual and used a different language each time. First up, we had a mostly-ideation session in which we ostensibly worked in Python. The only snippet I've got left over from that is

def makeBoard(regionWidth):
    side = regionWidth * regionWidth
    return [[0 for x in xrange(side)] for y in xrange(side)]

sample = [[0, 7, 1, 4, 0, 0, 0, 0, 5],
          [0, 0, 0, 0, 5, 0, 0, 8, 0],
          [0, 0, 3, 9, 0, 7, 6, 0, 0],
          [0, 0, 0, 0, 0, 1, 0, 0, 0],
          [0, 9, 0, 8, 0, 6, 0, 0, 3],
          [0, 0, 0, 0, 0, 0, 8, 2, 0],
          [0, 6, 0, 0, 4, 0, 7, 0, 8],
          [3, 0, 0, 0, 0, 0, 0, 9, 0],
          [0, 0, 0, 0, 8, 5, 0, 0, 0]]

def placements(board, n):
    """Return list of coords where `n` is a legal placement."""
    emptyRows = [i for i, row in enumerate(board) if not n in row]
    res = []
    for i in emptyRows:
        for j, col in enumerate(board[i]):
            if col == 0: res.append((i, j))
    return res

The idea would be to get a function which takes a board and an n, and finds out what the possible placements of n are. I'm not entirely sure how we would have proceeded if this ended up working; I think we'd have to hope really hard that at least one of the digits has only one possible position, which doesn't sound terribly likely in general. The fully implemented version of that function would also check for column and block presences, rather than just filtering rows poorly.

For round two, I got into a group of three with two people who wanted to see Haskell in action. So, after some type-system snafus, we did this:

module Sudoku where

import Data.Set (Set(..), fromList, union, difference)
import Data.List (genericIndex, genericTake)

type Board = [[Int]]

sample :: Board
sample = [[0, 7, 1, 4, 0, 0, 0, 0, 5],
          [0, 0, 0, 0, 5, 0, 0, 8, 0],
          [0, 0, 3, 9, 0, 7, 6, 0, 0],
          [0, 0, 0, 0, 0, 1, 0, 0, 0],
          [0, 9, 0, 8, 0, 6, 0, 0, 3],
          [0, 0, 0, 0, 0, 0, 8, 2, 0],
          [0, 6, 0, 0, 4, 0, 7, 0, 8],
          [3, 0, 0, 0, 0, 0, 0, 9, 0],
          [0, 0, 0, 0, 8, 5, 0, 0, 0]]

row :: Board -> (Int, Int) -> Set Int
row board (x, y) = fromList $ genericIndex board y

col :: Board -> (Int, Int) -> Set Int
col board (x, y) = fromList $ map (flip genericIndex x) board

origin :: Int -> Int
-- origin n = 3 * fromIntegral $ floor $ n / 3
origin n
  | 3 > n = 0
  | 6 > n = 3
  | otherwise = 6

block :: Board -> (Int, Int) -> Set Int
block board (x, y) = fromList . concat . map (genericTake blockSize . drop ox) $ genericTake blockSize $ drop oy board
  where blockSize = 3
        ox = origin x
        oy = origin y
        
possibilities :: Board -> (Int, Int) -> Set Int
possibilities board (x, y) = difference (fromList [1..9]) b
  where a = union (row board (x, y)) $ col board (x, y)
        b = union a $ block board (x, y)

...which is also ugly as fuck. Let me zoom into two particular pieces for you

origin :: Int -> Int
-- origin n = 3 * fromIntegral $ floor $ n / 3
origin n
  | 3 > n = 0
  | 6 > n = 3
  | otherwise = 6

That commented line is what we wanted to do initially. Actually, what we wanted was just blockSize * (floor $ n/blockSize), except that Ints apparently don't derive RealFrac? I don't know what the actual problem there is, and I'll likely toss the question up to SO later, but at the time, we went with the ugly, 9x9-specific, guard-based approach.

block :: Board -> (Int, Int) -> Set Int
block board (x, y) = fromList . concat . map (genericTake blockSize . drop ox) $ genericTake blockSize $ drop oy board
  where blockSize = 3
        ox = origin x
        oy = origin y

Note that blockSize is specified as 3. Which means that this function is also specific to 9x9 Sudoku boards. We went this route for the same reasons as before; the type system doesn't like it when you try to use fromIntegral . sqrt $ length board for some reason, and we didn't have enough time left to figure out what type annotations we'd need to convince GHCi that yes, this is really what we mean.

We also didn't really have enough time in one 40-minute session to write solve, and chat about Haskell and try to form theories about the errors we were seeing, but it's a reasonable-if-rushed implementation of the approach I was assuming we'd use.

The last session, my partner and I decided to have some fun with JavaScript. Except neither of us had node installed standalone, so we ended up using the Chromium JS console. Once we loaded up underscore, things went much faster than expected. We managed to put together a solution that works on any valid board size, that could solve boards where it doesn't need to guess.

var sample4x4 = [[1, 0, 3, 0],
                 [0, 4, 0, 2],
                 [0, 3, 4 ,0],
                 [4, 0, 2, 3]];

var sample9x9 = [[0, 7, 1, 4, 0, 0, 0, 0, 5],
                 [0, 0, 0, 0, 5, 0, 0, 8, 0],
                 [0, 0, 3, 9, 0, 7, 6, 0, 0],
                 [0, 0, 0, 0, 0, 1, 0, 0, 0],
                 [0, 9, 0, 8, 0, 6, 0, 0, 3],
                 [0, 0, 0, 0, 0, 0, 8, 2, 0],
                 [0, 6, 0, 0, 4, 0, 7, 0, 8],
                 [3, 0, 0, 0, 0, 0, 0, 9, 0],
                 [0, 0, 0, 0, 8, 5, 0, 0, 0]];

function possibilities (board, x, y) { 
    return _.difference(_.range(1, _.size(board[0]) + 1), 
                        row(board, x, y),
                        col(board, x, y),
                        block(board, x, y))
}

function row(board, x, y) {
    return board[y];
}

function col(board, x, y) {
    return _.map(board, function (row) { row[x] })
}

function block(board, x, y) {
    var blockSize = Math.sqrt(_.size(board[0]))
    var origin = function (n) { 
        return blockSize * Math.floor(n / blockSize)
    }
    var relevantRows = _.take(_.drop(board, origin(y)), blockSize);
    return _.reduce(relevantRows, function (memo, row) { 
        return _.union(memo, _.take(_.drop(row, origin(y)), blockSize)) 
    }, []);
}

function isFilled (board) { 
    return _.every(board, function (row) { _.every(row, function (val) { val != 0; })})
}

function solve(board) {
    if (isFilled(board)) {
        return board;
    } else {
        return solve(_.map(board, function (row, y) { 
            return _.map(row, function (val, x) { 
                var poss = possibilities(board, x, y);
                if (val != 0) {
                    return val;
                } else if ((val == 0) && (_.size(poss) == 1)) {
                    return poss[0];
                } else {
                    console.log("UNSOLVED");
                    return 0;
                }
            })
        }));
    }
}

Easily the most annoying thing about writing quickly in JS after having gotten so used to the functional paradigm is having to return things explicitly. That bugs the crap out of me over in Python-land too, but at least Python makes up for it by saving you from lines like this

                }
            })
        }));
    }
}

I could do the Lisp-esque thing and start writing

                }})}));}}

but doubt anyone would appreciate that.

Further Reading

The Sudoku situation isn't quite as solved as I assumed before reading up on it. The appropriate Rosetta Code page doesn't have any solutions that leave me gobsmacked by elegance the way that the Clojure Game of Life did. Though the Prolog approach looks quite interesting. Also, the Haskellers apparently look at Sudoku-solver-writing as a hobby, so it has its own page over at the Haskell wiki. "Elegant" isn't something I'd say about any of these, but that's probably because my Math level isn't high enough.

The last note I'll leave you on is that brute-forcing apparently isn't the best solution here. Sudoku is an instance of something called the Exact cover problem, and a Marlboro College student named Sam Auciello wrote an implementation of that solution in Python (links to code files at the bottom of that page page). I haven't quite gotten my head around this one yet, and that's probably what I'll be doing with my next few chunklets of spare time.


Footnotes

1 - [back] - (If you're doing this functionally and recursively, it seems like it would be fairly straight-forward. Side-effects complicate things slightly since you have to copy your board out at each decision step, or devise some other method of backtracking)

Thursday, May 2, 2013

How to implement a lazy sort

You can't implement a lazy sort.

Not really.

I mean, ok, yes, you can implement a sort that defers as much work as possible via that destructured approach I touched on last time and which I will expand on in a minute, but that's not a lazy sort.

It has certain characteristics which seem intrinsic to sorts that prevent it from behaving the way you'd expect an actual lazy function to behave. For starters, it doesn't save you any memory. In fact, you need to pull some fairly fancy footwork to write a functional, deferring sort that doesn't waste much more memory than the in-place version[1]. Next, it can't consume infinite sequences. That's to do with the definition of a sort; I can't see a good way of sorting an infinite sequence without backtracking, which isn't practical in the wild. Finally, it doesn't seem to save you as much time in composition as a regular lazy function would. It has to consume the entire relevant input stream, do some preliminary processing on it[2], then hand you a stream of outputs.

The end result is a sort function that defers a lot of its work, and does save you a lot of time if you only happen to want the first bit of a list sorted, and it even saves you a little bit of time if you want to run several lazy functions in succession on its output, but it doesn't quite do enough to get the label "lazy" without a massive asterisk next to it.

How to implement a lazy* sort

First up, here's the Python code.

from copy import copy

def eager(aList, pred=lambda a, b: a > b):
    arr = copy(aList)
    count = len(aList)
    heapify(arr, count, pred)
    end = count - 1
    while end > 0:
        __swap(arr, end, 0)
        end -= 1
        sift(arr, 0, end, pred)
    arr.reverse()
    return arr

def lazy(aList, pred=lambda a, b: a > b):
    arr = copy(aList)
    count = len(arr)
    heapify(arr, count, pred)
    end = count - 1
    while end > 0:
        yield arr[0]
        __swap(arr, end, 0)
        end -= 1
        sift(arr, 0, end, pred)
    yield arr[0]

def heapify(arr, count, pred=lambda a, b: a > b):
    start = (count - 1)/2
    while start >= 0:
        sift(arr, start, count-1, pred)
        start -= 1
    return arr

def sift(arr, start, end, pred=lambda a, b: a > b):
    root = start
    while root * 2 + 1 <= end:
        child = root * 2 + 1
        target = root
        if pred(arr[target], arr[child]):
            target = child
        if child+1 <= end and pred(arr[target], arr[child+1]):
            target = child + 1
        if not target == root:
            __swap(arr, root, target)
            root = target
        else:
            return arr

def __swap(arr, a, b):
    arr[a], arr[b] = arr[b], arr[a]

This is a direct translation of the Wikipedia Heapsort pseudo-code. And yes, I used Python because it's close enough to pseudo-code that I could do a working, line-by-line translation. It was really, really tempting to just implement eager as list(lazy(aList, pred)), but that wouldn't have told me what I wanted to know for the next bit.

>>> import heapsort, cProfile
>>> from random import Random
>>> sample = [Random().randint(0, 1000) for i in xrange(0, 5000)]
>>> cProfile.run("heapsort.eager(sample)")
         172098 function calls in 0.103 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.103    0.103 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 copy.py:113(_copy_with_constructor)
        1    0.000    0.000    0.000    0.000 copy.py:66(copy)
        1    0.002    0.002    0.013    0.013 heapsort.py:27(heapify)
   107567    0.014    0.000    0.014    0.000 heapsort.py:3(<lambda>)
        1    0.004    0.004    0.103    0.103 heapsort.py:3(eager)
     7499    0.069    0.000    0.097    0.000 heapsort.py:34(sift)
    57023    0.015    0.000    0.015    0.000 heapsort.py:49(__swap)
        1    0.000    0.000    0.000    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'reverse' of 'list' objects}


>>> cProfile.run("heapsort.lazy(sample)")
         3 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 heapsort.py:15(lazy)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


>>> cProfile.run("list(heapsort.lazy(sample))")
         177097 function calls in 0.107 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    0.107    0.107 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 copy.py:113(_copy_with_constructor)
        1    0.000    0.000    0.000    0.000 copy.py:66(copy)
   107567    0.015    0.000    0.015    0.000 heapsort.py:15(<lambda>)
     5001    0.004    0.000    0.106    0.000 heapsort.py:15(lazy)
        1    0.002    0.002    0.014    0.014 heapsort.py:27(heapify)
     7499    0.070    0.000    0.098    0.000 heapsort.py:34(sift)
    57023    0.015    0.000    0.015    0.000 heapsort.py:49(__swap)
        1    0.000    0.000    0.000    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}


>>> 

This is that work deferring thing happening. The call to heapsort.lazy doesn't give you a sorted list; it gives you a generator you can traverse to get that list. The call to heapsort.eager does give you the whole sorted list, and takes very slightly less time than the lazy version if you happen to need the whole list. As I said before though; if you're only after the first 10% or so elements, there's no contest in terms of execution time, even if you're trying to be semi-functional by copying out the input instead of destructively modifying it.

Oh, before anyone gets the wrong idea

>>> cProfile.run("sorted(sample)")
         3 function calls in 0.002 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.002    0.002 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.002    0.002    0.002    0.002 {sorted}


>>> 

My stupid little heapsort implementation isn't meant to showcase how slow Python is or anything like that. It's a learning exercise to show how you'd go about implementing a deferred sort in principle, not go into the nuts-and-bolts tuning process that comes once you've got your algorithm and data structures down. In other words, see it as a direct comparison of a shitty sort to the equivalent-except-lazy shitty sort.

Second verse, same as the first

Except with more parentheses. And I actually try to think through the problem rather than mindlessly parroting back an algorithm outline pilfered from Wikipedia.

(defpackage :heapsort (:use :cl))
(in-package :heapsort)

(defun heapsort-lazy (seq &optional (predicate #'>))
  (let* ((len (length seq))
         (heap-vector (heapify seq len predicate)))
    (lambda ()
      (decf len)
      (heappop! heap-vector len predicate))))

(defun heapsort-eager (seq &optional (predicate #'>))
  (let* ((len (length seq))
         (heap-vector (heapify seq len predicate)))
    (loop for i from (- len 1) downto 0
       unless (= 0 i) 
       collect (heappop! heap-vector i predicate))))

(defun heapify (seq len predicate)
  (loop with buf = (make-array len :adjustable t :fill-pointer len)
     for ix from 0
     for elem in seq do (setf (aref buf ix) elem)
     do (loop with i = ix
           while (> i 0)
           for parent = (heap-parent i)
           until (compare buf parent i predicate)
           do (swap! buf parent i)
           do (setf i parent))
     finally (return buf)))

(defun sift! (heap-vector start end predicate)
  (loop with ix = start
     until (> (+ 1 (* 2 ix)) end)
     while (loop for child-ix in (heap-children-descending heap-vector ix end predicate)
              when child-ix
              do (unless (or (compare heap-vector ix child-ix predicate))
                   (swap! heap-vector ix child-ix)
                   (setf ix child-ix)
                   (return t))
              finally (return nil))))

(defun heappop! (heap-vector last predicate)
  (swap! heap-vector 0 last)
  (sift! heap-vector 0 (- last 1) predicate)
  (vector-pop heap-vector))

(defun heap-children-descending (heap-vector ix bounds predicate)
  (let ((child-l (+ 1 (* 2 ix)))
        (child-r (+ 2 (* 2 ix))))
    (cond ((> child-l bounds) nil)
          ((> child-r bounds) (list child-l nil))
          (t (if (compare heap-vector child-l child-r predicate) 
                 (list child-l child-r)
                 (list child-r child-l))))))

(defun heap-parent (n)
  (- (ceiling (/ n 2)) 1))

(defun compare (arr ix-a ix-b &optional (predicate #'>))
  (funcall predicate (aref arr ix-a) (aref arr ix-b)))

(defun swap! (arr ix-a ix-b)
  (rotatef (aref arr ix-a) (aref arr ix-b)))

I'm not going to bother showing you the profiling on this one. Rest assured that the results on this and the Python version were very similar; the eager version is marginally faster than the lazy version at sorting the entire list and handing it to you, but has a massive disadvantage if you only want some small chunklet of the complete list. Also, the built-in sort beats both by several orders of magnitude.

For those of you who, like me, have never worked with heaps before[3], here's some basic theory. A heap is actually two things:

  • A tree-based data structure in which each parent node is ordered with respect to its children. This is the easier-than-sorted-but-still-useful property mentioned earlier; children aren't ordered with respect to each other, and if you're watching the wiki illustration for the first time ever, you might be forgiven for thinking that step 1 involves randomly re-arranging your input. It's very easy to pull out the next element; it's the root. However, every time you pop the root, you need to do some re-juggling to maintain the heap property.
  • A way of packing said tree-based data-structure into a 1-d array. It's not painfully obvious, so I figured I'd make this part explicit: you pack a heap into a vector by designating (aref vector (+ 1 (* i 2))) and (aref vector (+ 2 (* i 2))) to be the children of (aref vector i). This is faster than navigating an actual pointer tree, but it makes the structure of the code a bit counter-intuitive to the uninitiated, since it's talking about indices in non-obvious ways rather than talking about parents and children.

Now then, most of the actual heapsort.lisp code is implementing a heap. Again, just for educational purposes, I'm sure there's a variable-predicate heap implementation floating around somewhere even though I haven't looked for it[4]. In fact, lets take a look at the top-level functions before diving into that code, just to get it out of the way.

(defun heapsort-eager (seq &optional (predicate #'>))
  (let* ((len (length seq))
         (heap-vector (heapify seq len predicate)))
    (loop for i from (- len 1) downto 0
       unless (= 0 i) 
       collect (heappop! heap-vector i predicate))))

We take a list, heapify it, then collect heappop!ed elements and return the result. Nothing to see here, it's exactly what you'd expect from a sort.

The lazy version is mildly more interesting

(defun heapsort-lazy (seq &optional (predicate #'>))
  (let* ((len (length seq))
         (heap-vector (heapify seq len predicate)))
    (lambda ()
      (decf len)
      (heappop! heap-vector len predicate))))

Common Lisp doesn't have the notion of a generator in the same sense as Python, but a lambda with a closure around it does just as well for our purposes. You keep calling it to get at the next element, and it eventually throws an invalid-array-index-error that you need to deal with in some way. This actually seems like the most reasonable solution here; the alternative is something like

(defun heapsort-lazy (seq &optional (predicate #'>))
  (let* ((len (length seq))
         (heap-vector (heapify seq len predicate)))
    (lambda ()
      (if (>= 0 len)
          (values nil nil)
          (progn (decf len)
                 (values (heappop! heap-vector len predicate) t))))))

It's tempting to just return NIL, but then there's no way for a caller to disambiguate between "The next element in your sequence is NIL" and "There are no more elements in the sequence. So NIL". My kingdom for a Maybe monad, as annoying as most people seem to consider them.

Anyhow, onward.

(defun heapify (seq len predicate)
  (loop with buf = (make-array len :adjustable t :fill-pointer len)
     for ix from 0
     for elem in seq do (setf (aref buf ix) elem)
     do (loop with i = ix
           while (> i 0)
           for parent = (heap-parent i)
           until (compare buf parent i predicate)
           do (swap! buf parent i)
           do (setf i parent))
     finally (return buf)))

My definition of heapify doesn't use a call to sift! anywhere, in blatant defiance of the standard implementation. Really, I should have factored that middle bit out into heappush!, because that's what it amounts to. You start with an empty heap, insert new elements, and compare each new element to its parent, calling swap! until you have something that respects the Heap Property.

(defun swap! (arr ix-a ix-b)
  (rotatef (aref arr ix-a) (aref arr ix-b)))

swap! is implemented in terms of rotatef; it takes an array and two indices, and swaps the appropriate array cells. heap-parent shouldn't surprise you at all if you were paying attention when I explained what a heap actually is

(defun heap-parent (n)
  (- (ceiling (/ n 2)) 1))

And heappop! swaps the first element with the last, calls sift! on everything but the last element, then runs vector-pop to return the last element and shorten the vector.

(defun heappop! (heap-vector last predicate)
  (swap! heap-vector 0 last)
  (sift! heap-vector 0 (- last 1) predicate)
  (vector-pop heap-vector))

Which just leaves the sift! procedure, and its utility functions.

(defun sift! (heap-vector start end predicate)
  (loop with ix = start
     until (> (+ 1 (* 2 ix)) end)
     while (loop for child-ix in (heap-children-descending heap-vector ix end predicate)
              when child-ix
              do (unless (or (compare heap-vector ix child-ix predicate))
                   (swap! heap-vector ix child-ix)
                   (setf ix child-ix)
                   (return t))
              finally (return nil))))

It takes a start parameter, since the pseudo-code did the same, but I didn't find myself calling it with anything other than 0, so maybe that was a bit of a waste. To be fair, that pseudo also uses sift! as part of insertion, rather than doing the more straight-forward parent comparison, which might explain the difference. This is literally the most impenetrable part of this program, and it's crucial, because unless you understand this you won't get how the entire system produces sorted output. I'll take it slow, just in case. Feel free to stop reading here if you know this already.

  ...
  (loop with ix = start
  ...

We're starting at the beginning

     ...
     until (> (+ 1 (* 2 ix)) end)
     while [something-huge]
     ...

and going either until we get to the end of the heap, or until we get to the point where no more calls to swap! are needed. That makes sense because if we don't need to swap! further, and we started with a heap, we know that the rest of it already satisfies the Heap Property and therefore doesn't need to be heaped again. If we've gotten to the end, then we know that we just tried to sift! the smallest element in the heap, which is why it's at the bottom.

     while (loop for child-ix in (heap-children-descending heap-vector ix end predicate)
     ...

That's going to do something to each child-ix in the result of heap-children-descending function.

(defun heap-children-descending (heap-vector ix bounds predicate)
  (let ((child-l (+ 1 (* 2 ix)))
        (child-r (+ 2 (* 2 ix))))
    (cond ((> child-l bounds) nil)
          ((> child-r bounds) (list child-l nil))
          (t (if (compare heap-vector child-l child-r predicate) 
                 (list child-l child-r)
                 (list child-r child-l))))))

This isn't in the pseudocode either; I ended up deciding to compare the children for size so that the parent compares against the greatest first so that the Heap Property is more easily preserved. The return value is a list of children in descending order, and we also handle the case where this particular parent only has one child. Oh, also, compare is just a utility function that helps me compare two array elements by index

(defun compare (arr ix-a ix-b &optional (predicate #'>))
  (funcall predicate (aref arr ix-a) (aref arr ix-b)))

Now then.

              ...
              when child-ix
              do (unless (or (compare heap-vector ix child-ix predicate))
                   (swap! heap-vector ix child-ix)
                   (setf ix child-ix)
                   (return t))
              finally (return nil))))
              ...

If there are any children, compare the parent to them in descending order. If one of the children is bigger than the parent, swap! them, set ix to the index of that child (which now contains our new parent), and return t (this tells the upper loop to keep comparing, since we now need to compare that new ix to its children). Otherwise, return nil to signal that we've achieved a new heap.

And that's that. As long as the Heap Property is respected, the next item by whatever predicate was passed is always one quasi-efficient heappop! away, and as a result, a call to heapsort-lazy never does any more work than it absolutely needs to while still providing flexible sorted output.

I learned a lot implementing this. Hopefully the write-up does something for someone else out there too.


Footnotes

1 - [back] - I don't incidentally, both the Python and Common Lisp code I'm going to show you use generators. That is, side-effect-dependent lazy sequences with the restriction that you can only traverse them once, mostly as a result of those side-effects.

2 - [back] - I've been saying "heapsort", though that's strictly speaking not the case. What you need for a lazy* sort is an intermediate structure that

  • is easier create than a full sort in the general case
  • has some property which makes it more useful than an arbitrary collection when it comes to finding next
A heap does both of those, and it's fairly easy to understand, so I picked it, but it's not necessarily the only or best approach. I just don't know any other ones.

3 - [back] - Graphic Design degree, remember? They didn't offer Data Structures 101, as much as I would have enjoyed that much more than Art History for Consumers.

4 - [back] - As a note, Python doesn't seem to. The standard heapq doesn't provide a way of pulling out a key from the sorted structures, or a way change out predicates. The standard Python wisdom seems to be pulling keys out yourself, storing your actual values in a way indexable by said keys, then sorting the keys instead.

Monday, April 29, 2013

Dissecting Squares

I know I posted earlier, but this has to come out of my head. And I don't think I'll have time to do much more writing this week, despite the fact that there are at least three other things on my mind that I need to idea peel.

So...

That's what I'm thinking.

And here's an attempt at forcing this primitive tool called "English" to explain what it means in my head.

First off, this is what I was thinking previously. Which is to say, not much in particular; just working my way up to meaningfully tearing into the problem. The basic problem is that there are lots of ways to dissect a square into smaller, integer-sized squares, and brute-forcing it isn't going to be viable for very high ns. In fact, from what I've observed, it took multiple days of compute time to solve for 16, and that's not encouraging.

Thing is, there are lots of little ways to cut out a lot of the brute-force work required for the calculation. An approach that starts from the universe of possibilities and filters isn't going to get very far, but we can constrain that universe pretty significantly if we pick our model carefully, and I think I have.

Characteristics of the Optimal Dissection Model

Cut at the base

A huge contribution to the final tally of work is going to be figuring out where 1x1 squares can fit. If you look at each of them as a Placement, you'll be overwhelmed pretty quickly. An easy way out of that is picking a model that lets you ignore 1x1 squares, and the easiest obvious way to do that is using a sparse array with the understanding that any unrecorded placements actually represent 1x1 squares. This lets you ignore a bunch of StartingPoints too (more on that in a moment).

Take symmetry into account

A second huge contribution to the bottom line of work for this problem is that you need to recompute a lot of dissections which you must then de-duplicate later. I don't think we can lick this one entirely; we'll still need to do Set insertions at some point in the process just to make sure we're not counting anything twice. But. We can cut out a lot of them. Specifically, since the problem already mentions rotations and reflections, we don't need to Place any squares that we know will be picked up by some other permutations' reflections or rotations. That can greatly reduce the number potential starting nodes these placement trees. It also means we'll want a representation where rotate and reflect are really, really fast operations. The sparse array approach seems to be better than using an explicit grid, but I'm not quite convinced it's optimal.

Bring it all together

So what we've got as a 10000-foot-view for the process is this:

  1. find all StartingPoints
  2. unfold each starting point, starting with squares of size nxn and working your way down to 2x2
  3. record the null dissection (a grid full of 1x1 squares)
  4. return the size of the set of recorded dissections
  • A StartingPoint is any point on our grid from '(0 0) to `(,(limit n) ,(limit n)).
  • The limit of n is (- (ceiling (/ n 2)) 1)
  • To unfold a StartingPoint, insert the solitary placement as a dissection (remember, any empty spaces are treated as "placed 1x1 squares"), then find all free SecondPoints and unfold them.
  • A SecondPoint is any free StartingPoint, or any point between `(,(limit n) ,y) and `(,(- n 2) ,y). Again, any others should be represented among reflections/rotateions of other dissections.
  • To unfold a SecondPoint, insert it as a dissection, then unfold all remaining free points on the grid (there's probably a way to cut this step down, but I can't see it yet).
  • To unfold any other point, insert it as a dissection, and unfold all remaining free points until there's no more room.
  • To insert a dissection, perform Set insertion on each of its rotateions and reflections into the set of all insertions for this particular grid.
  • A free point is one where there's enough room to put a square larger than 1x1 (this implies that place could probably remove some additional squares from the potential starting pool to make calculating this easier).

Granted, that's easier said than done. I get the feeling that when I actually go to implement this fully, a whole bunch of problems are going to crop up, but at least I have a half-way decent starting point.

Briefly, Laziness

I've gotten some questions about what, exactly, laziness is good for, so I'll want to touch on it. Briefly.

Short answer: it saves you space and time.

Space

Really, this should be obvious, but if we're going through this, lets do it properly. Here are some lists

(defparameter foo (list 1 2 3 4 5 6 7 8 9))
foo = [1, 2, 3, 4, 5, 6, 7, 8, 9] ## This one's from Python
foo = [1, 2, 3, 4, 5, 6, 7, 8, 9] -- This one's from Haskell
foo = [1, 2, 3, 4, 5, 6, 7, 8, 9] // This one's from Javascript

and here's some lazy lists

(defun lazy-numbers (&optional (starting-with 1))
  (cons starting-with (lambda () (lazy-numbers (+ starting-with 1)))))

(defparameter foo (lazy-numbers))
def lazyNumbers(startingWith=1):
    i = startingWith
    while True:
        yield i
        i += 1

foo = lazyNumbers()
foo = [1..]
function lazyNumbers (startingWith) {
    if (startingWith === undefined) startingWith = 1;
    return { num: startingWith,
             next: function () {
                 this.num += 1;
                 return this.num
             }
           }
}

foo = lazyNumbers();

Granted the second set looks more complicated, except for the Haskell line, but it has some advantages. Firstly, while the first bunch of lists is bounded, this second bunch is infinite, and you do sometimes want to express that. More to the point though, the reason these can be infinite is that they're lazy. They only compute as much of the remainder of the sequence as you actually ask for, which means they save you space in two ways

  • they don't keep the entire list in memory by default; they deal with only one element at a time and any used ones are garbage collected unless you decide to keep a pointer to them yourself
  • they never bother computing parts of the list you don't call for, so they don't waste space storing values that'll never actually get used somewhere

That's basic stuff, it should be pretty obvious. Less obvious, but more significant, is how this saves you time.

Time

By its lonesome, it actually doesn't.

(loop with lst = (list 1 2 3 4 5 6 7 8 9)
   repeat 5 for n in lst
   do (format t "~a~%" n))

has the same performance characteristics as

(loop with lst = (lazy-numbers 1) repeat 5
     repeat 5 for n = (car lst) for lst = (funcall (cdr lst))
     do (format t "~a~%" n))

But what about when we compose multiple operations on one sequence? You'll recall that back when I sat down to optimize Life for a couple of days, I made a big deal of reducing the number of iterations of our corpus. Specifically, recall that the Haskell version of Gridless Life started out like this:

import Data.List

neighbors :: (Int, Int) -> [(Int, Int)]
neighbors (x, y) = [(x+dx, y+dy) | dx <- [-1..1], dy <- [-1..1], (dx,dy) /= (0,0)]

lifeStep :: [(Int, Int)] -> [(Int, Int)]
lifeStep cells = [head g | g <- grouped cells, viable g]
  where grouped = group . sort . concat . map neighbors
        viable [_,_,_] = True
        viable [c,_] = c `elem` cells
        viable _ = False

and zoom in on one line in particular.

...
  where grouped = group . sort . concat . map neighbors
        ...

That right there is what laziness trivially enables. If you think about what's being done here, you'll realize that you'd never do this in an eager context. Because if you did, what you'd get is

  1. the argument would get evaluated
  2. it would get traversed once by map neighbors
  3. and then it would get traversed again by concat
  4. and then it would get traversed again by sort[1]
  5. and then it would get traversed again by group

which means six total trips over the entire list. That's bad because every element you touch adds to your run time, and each element you have to touch again as part of the computation is one that you can't throw out of memory. On the other hand, expressing a computation by composing smaller, easy computations is very straightforward[2]. This is a place where the code you want to run, and the code you want to write are completely different.

What you want to run is a tight loop that walks over the entire corpus once, and applies all of the chained functions at once per element. What you want to write is the naive composition of those chained functions, because you've otherwise created an algorithm that will only ever be useful for your particular computation, and that computation will be burdened with the specifics of iteration which will make it non-trivial at best and impenetrable at worst.

Now, granted, Common Lisp has other ways of dealing of dealing with this[3], but laziness is one general solution. If each of those functions were lazy (as, in fact, they are in Haskell), what you'd get instead is exactly what you want. One, tight loop running over the entire corpus, applying only as much grouping, sorting, concating and neighborsing as it needed to for the next part of the computation. That saves you a few trips over the input with no additional effort on your part. Talk about low-hanging fruit.

I'll be honest, I was also going to talk about my latest thoughts on the square dissection problem, but this ended up being longer than I'd like as it is. It'll probably happen next time, I guess. Probably. Stay tuned if you're into that sort of thing.


Footnotes

1 - [back] - A note on functional, lazy sorts, because I was wondering about this both back when tuning the Haskell version of Life and as I was re-visiting it for today's feature. The way that lazy sorts seem to work is basically by using a destructured heapsort. Specifically, if you take a look at this pseudocode, what's happening is that a lazy sort runs heapify right away and passes up the first element, then pulls out the next element each time it's asked for one. That results in On performance for finding the first element (which as far as I'm aware is what you'd have to do in order to get the "nextest" element in the general case anyway), followed by O(log n) performance on looking up each next element. That's good because it means you don't have to do it all at once, and it means that if you only want the first 5 sorted elements of a list of 10000, you get to avoid most of the work. On the other hand, note that this doesn't save you much memory, if any; you still need to store that heap for the whole list, even if you only want the first few chunklets.

2 - [back] - Which is why I did this for the first pass of that program, not realizing that Haskell's lazy-by-default outlook would also make it about as efficient as it could be.

3 - [back] - In fact, the situation of "I want to write the pretty code, but run the ugly code" should send any Lisp veterans off thinking about how you'd defmacro your way out of this particular annoyance.

Tuesday, April 23, 2013

Conduits and More Squares

Ok, so I've been coding something for the past little while in between thinking about squares and reading up on concurrency in haskell. I'm not going to tell you what it is, because we've established my track record, but I did want to note that it looks like it's going to rely on Conduits. A minimal module that provides some interesting additional options for defpackage.

It doesn't have a quicklisp installation package yet, so I've written a basic .asd and chucked it up onto my github in the meantime.

More Square Thoughts

I haven't been actively working on this, just sort of letting it percolate at the back of my mind while other stuff is going on. Having taken a closer look at the original solution for 1..4, and imagining how its author went about getting it, it looks like we might be able to cut a lot of placements out of the process if we picked a representation that was easy to reflect and rotate. Then the actual solution would look something like

(defclass grid ...)

(defmethod rotate ((grid grid) squares direction) ...)
(defmethod reflect ((grid grid) squares axis) ...)

(defmethod collect-rotations ((grid grid) squares)
  (insert-dissection grid squares)
  (loop repeat 3 (insert-dissection grid (rotate grid squares :cw)))
  (rotate grid squares :cw))

(defmethod collect-reflection ((grid grid) squares)
  (loop for axis in '(:x :y :xy :yx)
     do (collect-rotations (reflect grid squares axis))))

(defmethod collect-dissections ((grid grid) squares)
  (collect-rotations grid squares)
  (collect-reflections grid squares))

(loop for sq in (starting-positions a-grid)
   do (loop until (full? g) collect (place-next g) into g
         do (collect-dissections a-grid g))
   finally (return (dissection-count grid)))

Assuming this works, we could easily write a solution to the "hard" version too; just call insert-dissection directly rather than going through collect-dissection and friends.

I'm carefully refraining from defining dissection-count, insert-dissection, grid, and indeed rotate and reflect. What I know is

  • insert-dissection is going to have to do the work of ensuring that duplicate dissections are discarded, which implies storing them as a set or hash
  • the definitions of rotate and reflect will depend heavily on the definition of grid, and square storage needs to be thought out to make sure they're all fast
  • it seems like there should be a better way to solve this problem than "brute force", but I'm not seeing it yet. I've joked about it before, but this may actually be the problem that gets me off my ass and into seriously learning about genetic algorithms

That's that for now. I was going to talk a bit about how work has been going, and the shape of small-scale development in Toronto's medical industry, but it looks like a client has finally decided to respond to me, so back to the grindstone I go.

Friday, April 19, 2013

Hardware and Squares

First things first, there was finally a sale on these things, and I was getting sick of the jury-rigged Vertex in my current laptop, so I picked one up. They're back up at ~$170, but mine cost me a much more attractive $90 last week.

First impressions are ok.

I'm not going to endorse SATA3 for anyone, because it's honestly not a mind-blowing change. I mean load times are noticeably faster, but the difference isn't anything like what you get from switching to a SATA2 SSD from a traditional platter drive. If you end up trading up to this from one of the slower SSDs, be prepared to hear yourself say "Oh, that's nice" rather than "This is fucking glorious". Honestly, my purchase was a result of the fact that this sale price came under my may-as-well threshold for hardware and that my previous HD was literally held in by rubber stoppers and electrical tape.

Dissecting Squares

I'm blaming Dann for this one again. Here's the problem we decided to go after for this months' Toronto Code Retreat:

Suppose you have a grid of squares nxn large. There is a finite number of ways you can split that square grid up into squares of at least 1x1 size. For example, for a 1x1 grid, there is only 1 possible dissection. For a 2x2 grid, there are 2 (either 4 1x1 squares or 1 2x2 square). For a 3x3 grid there are 6; (either 9 1x1 squares, 1 3x3 square, or 4 separate combinations of 1 2x2 square and 5 1x1 squares). This is the Number of Square Dissections.

(A complete illustration of the first four terms is over in the OEIS page. There's also a further three pages illustrating the fifth term. This gets big in a hurry.)

The challenge:

  • "Easy" Mode: Write a function that takes a number N and returns the Number of Square Dissections of an N by N grid.
  • "Hard" Mode: Write a second function with the same input as above that returns the number of unique, under rotation and reflection, dissections.

Before you chuckle at this, bear in mind that a general solution is publishable[1]. According to Dann, the sequence is solved up to an N of 15 or so, and that's already some ungodly number best expressed by exponents.

A couple of people took a stab at working "Hard" mode solutions[2], but as far as I know, no one managed to get to a working solution for "Easy" mode. This is the first Code Retreat I've gone to where I've felt deeply inadequate as a programmer, and that the most popular tool "a notebook" rather than a particular programming language.

As to the problem, I tried a bunch of approaches, none of which seemed to do anything other than annoy me. In terms of problem-solving, I'm usually a data-representation guy. Maybe it's the design degree making me focus on that, I don't fucking know, but it works fairly well. Given a thorny problem, my first reflex is to figure out a way of storing the data such that the mechanical components of the solution are going to be as simple as possible with no more moving parts than necessary. Which is why my train of thought as the problem was coming in was something like...

Can we do this as a naive area calculation? No, that actually fails on the 3x3 grid. We need to represent area and position, or we might represent the remaining area of the board in a more explicit way. Too complicated. Ok, we need to fit things into other things; is this actually a factoring problem? No, we're looking for the count of unique ways of factoring a thing rather than actual factorization. Doing it that way sounds like it would be computationally expensive. Can we model it as a tree of possible square placements? We'd need to represent each square as a width x y triple, at minimum, starting with (list (- n 2) 0 0) and working from there. But there are multiple interconnected combinations that can start with each 2x2 square on a 4x4 grid, and we can't actually prune a lot of sub-branches. Can we do this as a graph traversal? Sounds pretty go... wait, no. No, we wouldn't be visiting each node once. In fact, we can't know a-priori how many times a given node would have to be visited, or what the endpoints of any valid traversal are. Can we model this as an inventory system? Each grid has yae space and adding a square depletes it; when you get stuck, fill the rest with ones, each fill is a different dissection, count the results. Wait, isn't that going to be O^scary^fucking-scary? You need to generate all possible inventory combinations and de-duplicate. I guess you could keep a set of dissections somewhere and keep a canonical sorting order for placed squares. That still implies a representation that tracks position and size rather than just a size and a count. Is it possible that a 2D grid is actually a better representation for this one?-Inaimathi's brain

As it happens, sitting down and throwing together a simple program to test some of these didn't do anything to help. I'd start, get about a third of the way through, then realize that I wasn't accounting for this or that edge case. Or I'd figure out that edge-cases were covered, but the approach doesn't scale up to a general solution. The most promising I've got so far, and this is nowhere near a working solution yet, looks like

(defun positions (board-size square-size)
  (loop for x from 0 to (- board-size square-size)
     append (loop for y from 0 to (- board-size square-size)
               collect (list square-size x y))))

(defun next-squares (board-size starting-square disregard)
  (destructuring-bind (sq-size sq-x sq-y) starting-square
    (loop for current-sq-size from sq-size downto 2
         append (loop for x from sq-x 
                   append (loop for y from sq-y
                             when (and (or (= x sq-size) (= y sq-size)) 
                                       (not (member (list current-sq-size x y) disregard :test #'equal))) 
                             collect (list current-sq-size x y)
                             until (>= (+ sq-size y) board-size))
                   until (>= (+ sq-size x) board-size)))))

(defun tree-out (board-size list-of-squares)
  (loop for sq in list-of-squares
     collect sq into dis
     collect (next-squares board-size sq dis)))

which is godawful, and doesn't work yet, but it isn't obvious that it can't work so I'm rolling with it. The (very) general high-level approach is starting from the largest possible sub-squares of a given grid and doing a depth-first placement traversal. Each successful placement will then be sorted and stored in a set, which will ensure uniqueness[3]. There's a few places where we can definitely prune possibilities, but not as many as I thought there would be, which means that this will ultimately be a fairly expensive operation in any case.

Ugh. This isn't where I was planning on sinking the next week or so of my free time. I wanted to say a few words about Raspberry Pi hacking, and about its GPIO facilities specifically, or to talk a bit more about Wai in the context of large web applications, or maybe finally unveil my Plan For World Domination™© centered around building a general HTML-based MMO, but no.

I got a meme on me. So instead, I'll be thinking about squares.

And how they fit into each other.


Footnotes

1 - [back] - A proof that it's an NP-complete problem in the general case would also be noteworthy apparently.

2 - [back] - That is, taking the list of all dissections of a given grid size and filtering out the rotationally/reflectively unique ones.

3 - [back] - This can all happen in a single traversal, though there might be some opportunity for easy parallelism depending on how the implementation shakes out.

Tuesday, April 16, 2013

Simple web chat using Haskell's Wai/Warp

Here's a quick and dirty chat application written in Wai[1].

{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Concurrent.Chan

import Control.Monad.Trans (liftIO)

import Network.Wai
import Network.Wai.EventSource
import Network.Wai.Handler.Warp (run)
import Network.Wai.Middleware.Gzip (gzip, def)
import Network.Wai.Parse (parseRequestBody, lbsBackEnd)
import Network.HTTP.Types (status200, ok200)

import Blaze.ByteString.Builder.Char.Utf8 (fromString)
import Data.ByteString.Char8 (ByteString, unpack)

app :: Chan ServerEvent -> Application
app chan req = do
  (params, _) <- parseRequestBody lbsBackEnd req
  case pathInfo req of
    [] -> return $ ResponseFile status200 [("Content-Type", "text/html")] "static/index.html" Nothing
    ["post"] -> liftIO $ postMessage chan $ lookPost "message" params
    ["source"] -> eventSourceAppChan chan req
    path -> error $ "unexpected pathInfo " ++ show (queryString req)

lookPost :: ByteString -> [(ByteString, ByteString)] -> String
lookPost paramName params = case lookup paramName params of
  Just val -> unpack val
  _ ->  ""

postMessage :: Chan ServerEvent -> String -> IO Response
postMessage chan msg = do
  writeChan chan $ ServerEvent (Just $ fromString "message") Nothing $ [fromString msg]
  return $ responseLBS ok200 [] "Posted"

main :: IO ()
main = do  
  chan <- newChan
  run 8000 $ gzip def $ app chan

That's the most basic example I could find/cobble together of using SSEs in Wai. That's the library called Network.Wai.EventSource up there, and you can see the channel represented in the expressions involving newChan, eventSourceAppChan and writeChan. Basically, we set up a Chan[2] at server startup, we hand out an endpoint whenever someone requests /source, and we write to all endpoints whenever someone requests /post.

The file index.html is exactly what you think it is; about 10 lines each of HTML and JavaScript that set up the front-end EventSource hooks and make sure the chat list gets updated with each new message. You could write it yourself without very much trouble.

This isn't particularly interesting. Firstly because, as you can see, it's ridiculously simple, and secondly because it doesn't scale. I mean it scales with users, sure. According to the Warp benchmarks, we can expect this to support somewhere between 20k and 50k people chatting depending on their loquaciousness, but since they'll all be chatting anonymously in the same room, the experience will stop being useful well before that. The next step confounded me for a little while because I had the assumption that using state in Haskell meant using the State monad[3]. It turns out that's probably not what you'd want here.

What we're after is a system where you can start up arbitrary new rooms, and post to a specific one. In other words, something like

{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Concurrent.Chan
import Control.Concurrent (forkIO, threadDelay)

import Control.Monad.Trans (liftIO)
import Control.Monad.Trans.Resource (ResourceT)

import Network.Wai
import Network.Wai.EventSource
import Network.Wai.Handler.Warp (run)
import Network.Wai.Middleware.Gzip (gzip, def)
import Network.Wai.Parse (parseRequestBody, lbsBackEnd)

import Network.HTTP.Types (status200, ok200)
import Blaze.ByteString.Builder.Char.Utf8 (fromString)
import qualified Data.ByteString.Char8 as C

import Data.IORef
import Data.Text (unpack, pack)

app :: IORef [(String, Chan ServerEvent)] -> Application
app channels req = do
  (params, _) <- parseRequestBody lbsBackEnd req
  case pathInfo req of
    [] -> serveFile "text/html" "static/index.html"
    ["jquery.js"] -> serveFile "text/javascript" "static/jquery.min.js"
    ["chat.js"] -> serveFile "text/javascript" "static/chat.js"
    [channelName, action] -> do
      chan <- liftIO $ getOrCreateChannel channels $ unpack channelName
      case action of
        "post" -> 
          liftIO $ postMessage chan $ lookPost "message" params
        "source" -> 
          eventSourceAppChan chan req
        _ -> serveFile "text/html" "static/index.html"
    _ -> serveFile "text/html" "static/index.html"

serveFile :: C.ByteString -> FilePath -> ResourceT IO Response
serveFile mime filePath = return $ ResponseFile status200 [("Content-Type", mime)] filePath Nothing

lookPost :: C.ByteString -> [(C.ByteString, C.ByteString)] -> String
lookPost paramName params = case lookup paramName params of
  Just val -> C.unpack val
  _ ->  ""

getOrCreateChannel :: IORef [(String, Chan ServerEvent)] -> String -> IO (Chan ServerEvent)
getOrCreateChannel channels name = do
  res <- readIORef channels
  case lookup name res of
    Just chan -> 
      return chan
    _ -> do
      new <- newChan
      atomicModifyIORef channels (\cs -> ((name, new):cs, new))
      return new

postMessage :: Chan ServerEvent -> String -> IO Response
postMessage chan msg = do
  writeChan chan $ ServerEvent (Just $ fromString "message") Nothing $ [fromString msg]
  return $ responseLBS ok200 [] "Posted"

main :: IO ()
main = do
  channels <- newIORef []
  run 8000 $ gzip def $ app channels

That's a bit chunkier, but not by very much.

The significant operations there all involve something called an IORef, which is Haskell-talk for "a pointer". You can think of it an IO-based global variable that you can store stuff in[4], in this case, a map of channel names to channel streams.

That index.html file has a bunch of front-end changes too, mostly to do with acquiring and displaying multiple SSE sources, but we're not interested in that today. In the back-end, you'll notice that we've got a new function, getOrCreateChannel, which takes a "pointer" to our channel map and a name, and either returns the result of looking up that name, or inserts and returns a corresponding entry. readIORef "dereferences" that "pointer" to our map, and atomicModifyIORef mutates it. The rest of it should be self-explanatory.

Because we need to do a channel lookup before calling postMessage or eventSourceAppChan, our routes get a bit more complicated. We need to call getOrCreateChannel on the passed in channelName, then pass that to the appropriate function and return the response[5].

Finally, instead of passing a single channel to our app, we need to pass it a "pointer" to our lookup table. That happens in main at the bottom there.

The result of this exercise, as long as we put the front-end together appropriately, is a multi-room, anonymous, HTML chat system. More importantly though, this is a demonstration of how to handle simple global states in Haskell without tearing all your hair out.

I really wish someone else had written this before I started thinking about it...


Footnotes

1 - [back] - No, still not Yesod. Feel perfectly free to use it if that's your thing, but I'd still recommend Happstack if you absolutely, positively need a framework..

2 - [back] - Which I assume is reasonably efficient, since it's one of Haskell's basic concurrency constructs.

3 - [back] - Also, because I'm still not quite awesome enough at this that I can manipulate type expressions in my head. As a result, successful signature changes rarely happen first try, and I often find myself commenting them out then resorting to :t in GHCi and following the compilers' lead. I assume that's mechanical rather than a conceptual problem though, and talking about how I need more practice won't really help you out in any way.

4 - [back] - The IORef docs warn that using more than one of these in a program makes them unreliable in a multi-threaded setting. The thing is:

  • This chat program is extremely simple, needing only one global map to store open channels
  • If it ever got to the point of needing a more complex model, I'd hook it up to AcidState rather than trying to fiddle with MVars myself.

5 - [back] - You can see that happening in the branch labeled [channelName, action] ->, though we easily could have separated it into an external function rather than nesting cases.