Monday, April 29, 2013

Dissecting Squares

I know I posted earlier, but this has to come out of my head. And I don't think I'll have time to do much more writing this week, despite the fact that there are at least three other things on my mind that I need to idea peel.


That's what I'm thinking.

And here's an attempt at forcing this primitive tool called "English" to explain what it means in my head.

First off, this is what I was thinking previously. Which is to say, not much in particular; just working my way up to meaningfully tearing into the problem. The basic problem is that there are lots of ways to dissect a square into smaller, integer-sized squares, and brute-forcing it isn't going to be viable for very high ns. In fact, from what I've observed, it took multiple days of compute time to solve for 16, and that's not encouraging.

Thing is, there are lots of little ways to cut out a lot of the brute-force work required for the calculation. An approach that starts from the universe of possibilities and filters isn't going to get very far, but we can constrain that universe pretty significantly if we pick our model carefully, and I think I have.

Characteristics of the Optimal Dissection Model

Cut at the base

A huge contribution to the final tally of work is going to be figuring out where 1x1 squares can fit. If you look at each of them as a Placement, you'll be overwhelmed pretty quickly. An easy way out of that is picking a model that lets you ignore 1x1 squares, and the easiest obvious way to do that is using a sparse array with the understanding that any unrecorded placements actually represent 1x1 squares. This lets you ignore a bunch of StartingPoints too (more on that in a moment).

Take symmetry into account

A second huge contribution to the bottom line of work for this problem is that you need to recompute a lot of dissections which you must then de-duplicate later. I don't think we can lick this one entirely; we'll still need to do Set insertions at some point in the process just to make sure we're not counting anything twice. But. We can cut out a lot of them. Specifically, since the problem already mentions rotations and reflections, we don't need to Place any squares that we know will be picked up by some other permutations' reflections or rotations. That can greatly reduce the number potential starting nodes these placement trees. It also means we'll want a representation where rotate and reflect are really, really fast operations. The sparse array approach seems to be better than using an explicit grid, but I'm not quite convinced it's optimal.

Bring it all together

So what we've got as a 10000-foot-view for the process is this:

  1. find all StartingPoints
  2. unfold each starting point, starting with squares of size nxn and working your way down to 2x2
  3. record the null dissection (a grid full of 1x1 squares)
  4. return the size of the set of recorded dissections
  • A StartingPoint is any point on our grid from '(0 0) to `(,(limit n) ,(limit n)).
  • The limit of n is (- (ceiling (/ n 2)) 1)
  • To unfold a StartingPoint, insert the solitary placement as a dissection (remember, any empty spaces are treated as "placed 1x1 squares"), then find all free SecondPoints and unfold them.
  • A SecondPoint is any free StartingPoint, or any point between `(,(limit n) ,y) and `(,(- n 2) ,y). Again, any others should be represented among reflections/rotateions of other dissections.
  • To unfold a SecondPoint, insert it as a dissection, then unfold all remaining free points on the grid (there's probably a way to cut this step down, but I can't see it yet).
  • To unfold any other point, insert it as a dissection, and unfold all remaining free points until there's no more room.
  • To insert a dissection, perform Set insertion on each of its rotateions and reflections into the set of all insertions for this particular grid.
  • A free point is one where there's enough room to put a square larger than 1x1 (this implies that place could probably remove some additional squares from the potential starting pool to make calculating this easier).

Granted, that's easier said than done. I get the feeling that when I actually go to implement this fully, a whole bunch of problems are going to crop up, but at least I have a half-way decent starting point.

Briefly, Laziness

I've gotten some questions about what, exactly, laziness is good for, so I'll want to touch on it. Briefly.

Short answer: it saves you space and time.


Really, this should be obvious, but if we're going through this, lets do it properly. Here are some lists

(defparameter foo (list 1 2 3 4 5 6 7 8 9))
foo = [1, 2, 3, 4, 5, 6, 7, 8, 9] ## This one's from Python
foo = [1, 2, 3, 4, 5, 6, 7, 8, 9] -- This one's from Haskell
foo = [1, 2, 3, 4, 5, 6, 7, 8, 9] // This one's from Javascript

and here's some lazy lists

(defun lazy-numbers (&optional (starting-with 1))
  (cons starting-with (lambda () (lazy-numbers (+ starting-with 1)))))

(defparameter foo (lazy-numbers))
def lazyNumbers(startingWith=1):
    i = startingWith
    while True:
        yield i
        i += 1

foo = lazyNumbers()
foo = [1..]
function lazyNumbers (startingWith) {
    if (startingWith === undefined) startingWith = 1;
    return { num: startingWith,
             next: function () {
                 this.num += 1;
                 return this.num

foo = lazyNumbers();

Granted the second set looks more complicated, except for the Haskell line, but it has some advantages. Firstly, while the first bunch of lists is bounded, this second bunch is infinite, and you do sometimes want to express that. More to the point though, the reason these can be infinite is that they're lazy. They only compute as much of the remainder of the sequence as you actually ask for, which means they save you space in two ways

  • they don't keep the entire list in memory by default; they deal with only one element at a time and any used ones are garbage collected unless you decide to keep a pointer to them yourself
  • they never bother computing parts of the list you don't call for, so they don't waste space storing values that'll never actually get used somewhere

That's basic stuff, it should be pretty obvious. Less obvious, but more significant, is how this saves you time.


By its lonesome, it actually doesn't.

(loop with lst = (list 1 2 3 4 5 6 7 8 9)
   repeat 5 for n in lst
   do (format t "~a~%" n))

has the same performance characteristics as

(loop with lst = (lazy-numbers 1) repeat 5
     repeat 5 for n = (car lst) for lst = (funcall (cdr lst))
     do (format t "~a~%" n))

But what about when we compose multiple operations on one sequence? You'll recall that back when I sat down to optimize Life for a couple of days, I made a big deal of reducing the number of iterations of our corpus. Specifically, recall that the Haskell version of Gridless Life started out like this:

import Data.List

neighbors :: (Int, Int) -> [(Int, Int)]
neighbors (x, y) = [(x+dx, y+dy) | dx <- [-1..1], dy <- [-1..1], (dx,dy) /= (0,0)]

lifeStep :: [(Int, Int)] -> [(Int, Int)]
lifeStep cells = [head g | g <- grouped cells, viable g]
  where grouped = group . sort . concat . map neighbors
        viable [_,_,_] = True
        viable [c,_] = c `elem` cells
        viable _ = False

and zoom in on one line in particular.

  where grouped = group . sort . concat . map neighbors

That right there is what laziness trivially enables. If you think about what's being done here, you'll realize that you'd never do this in an eager context. Because if you did, what you'd get is

  1. the argument would get evaluated
  2. it would get traversed once by map neighbors
  3. and then it would get traversed again by concat
  4. and then it would get traversed again by sort[1]
  5. and then it would get traversed again by group

which means six total trips over the entire list. That's bad because every element you touch adds to your run time, and each element you have to touch again as part of the computation is one that you can't throw out of memory. On the other hand, expressing a computation by composing smaller, easy computations is very straightforward[2]. This is a place where the code you want to run, and the code you want to write are completely different.

What you want to run is a tight loop that walks over the entire corpus once, and applies all of the chained functions at once per element. What you want to write is the naive composition of those chained functions, because you've otherwise created an algorithm that will only ever be useful for your particular computation, and that computation will be burdened with the specifics of iteration which will make it non-trivial at best and impenetrable at worst.

Now, granted, Common Lisp has other ways of dealing of dealing with this[3], but laziness is one general solution. If each of those functions were lazy (as, in fact, they are in Haskell), what you'd get instead is exactly what you want. One, tight loop running over the entire corpus, applying only as much grouping, sorting, concating and neighborsing as it needed to for the next part of the computation. That saves you a few trips over the input with no additional effort on your part. Talk about low-hanging fruit.

I'll be honest, I was also going to talk about my latest thoughts on the square dissection problem, but this ended up being longer than I'd like as it is. It'll probably happen next time, I guess. Probably. Stay tuned if you're into that sort of thing.


1 - [back] - A note on functional, lazy sorts, because I was wondering about this both back when tuning the Haskell version of Life and as I was re-visiting it for today's feature. The way that lazy sorts seem to work is basically by using a destructured heapsort. Specifically, if you take a look at this pseudocode, what's happening is that a lazy sort runs heapify right away and passes up the first element, then pulls out the next element each time it's asked for one. That results in On performance for finding the first element (which as far as I'm aware is what you'd have to do in order to get the "nextest" element in the general case anyway), followed by O(log n) performance on looking up each next element. That's good because it means you don't have to do it all at once, and it means that if you only want the first 5 sorted elements of a list of 10000, you get to avoid most of the work. On the other hand, note that this doesn't save you much memory, if any; you still need to store that heap for the whole list, even if you only want the first few chunklets.

2 - [back] - Which is why I did this for the first pass of that program, not realizing that Haskell's lazy-by-default outlook would also make it about as efficient as it could be.

3 - [back] - In fact, the situation of "I want to write the pretty code, but run the ugly code" should send any Lisp veterans off thinking about how you'd defmacro your way out of this particular annoyance.

Tuesday, April 23, 2013

Conduits and More Squares

Ok, so I've been coding something for the past little while in between thinking about squares and reading up on concurrency in haskell. I'm not going to tell you what it is, because we've established my track record, but I did want to note that it looks like it's going to rely on Conduits. A minimal module that provides some interesting additional options for defpackage.

It doesn't have a quicklisp installation package yet, so I've written a basic .asd and chucked it up onto my github in the meantime.

More Square Thoughts

I haven't been actively working on this, just sort of letting it percolate at the back of my mind while other stuff is going on. Having taken a closer look at the original solution for 1..4, and imagining how its author went about getting it, it looks like we might be able to cut a lot of placements out of the process if we picked a representation that was easy to reflect and rotate. Then the actual solution would look something like

(defclass grid ...)

(defmethod rotate ((grid grid) squares direction) ...)
(defmethod reflect ((grid grid) squares axis) ...)

(defmethod collect-rotations ((grid grid) squares)
  (insert-dissection grid squares)
  (loop repeat 3 (insert-dissection grid (rotate grid squares :cw)))
  (rotate grid squares :cw))

(defmethod collect-reflection ((grid grid) squares)
  (loop for axis in '(:x :y :xy :yx)
     do (collect-rotations (reflect grid squares axis))))

(defmethod collect-dissections ((grid grid) squares)
  (collect-rotations grid squares)
  (collect-reflections grid squares))

(loop for sq in (starting-positions a-grid)
   do (loop until (full? g) collect (place-next g) into g
         do (collect-dissections a-grid g))
   finally (return (dissection-count grid)))

Assuming this works, we could easily write a solution to the "hard" version too; just call insert-dissection directly rather than going through collect-dissection and friends.

I'm carefully refraining from defining dissection-count, insert-dissection, grid, and indeed rotate and reflect. What I know is

  • insert-dissection is going to have to do the work of ensuring that duplicate dissections are discarded, which implies storing them as a set or hash
  • the definitions of rotate and reflect will depend heavily on the definition of grid, and square storage needs to be thought out to make sure they're all fast
  • it seems like there should be a better way to solve this problem than "brute force", but I'm not seeing it yet. I've joked about it before, but this may actually be the problem that gets me off my ass and into seriously learning about genetic algorithms

That's that for now. I was going to talk a bit about how work has been going, and the shape of small-scale development in Toronto's medical industry, but it looks like a client has finally decided to respond to me, so back to the grindstone I go.

Friday, April 19, 2013

Hardware and Squares

First things first, there was finally a sale on these things, and I was getting sick of the jury-rigged Vertex in my current laptop, so I picked one up. They're back up at ~$170, but mine cost me a much more attractive $90 last week.

First impressions are ok.

I'm not going to endorse SATA3 for anyone, because it's honestly not a mind-blowing change. I mean load times are noticeably faster, but the difference isn't anything like what you get from switching to a SATA2 SSD from a traditional platter drive. If you end up trading up to this from one of the slower SSDs, be prepared to hear yourself say "Oh, that's nice" rather than "This is fucking glorious". Honestly, my purchase was a result of the fact that this sale price came under my may-as-well threshold for hardware and that my previous HD was literally held in by rubber stoppers and electrical tape.

Dissecting Squares

I'm blaming Dann for this one again. Here's the problem we decided to go after for this months' Toronto Code Retreat:

Suppose you have a grid of squares nxn large. There is a finite number of ways you can split that square grid up into squares of at least 1x1 size. For example, for a 1x1 grid, there is only 1 possible dissection. For a 2x2 grid, there are 2 (either 4 1x1 squares or 1 2x2 square). For a 3x3 grid there are 6; (either 9 1x1 squares, 1 3x3 square, or 4 separate combinations of 1 2x2 square and 5 1x1 squares). This is the Number of Square Dissections.

(A complete illustration of the first four terms is over in the OEIS page. There's also a further three pages illustrating the fifth term. This gets big in a hurry.)

The challenge:

  • "Easy" Mode: Write a function that takes a number N and returns the Number of Square Dissections of an N by N grid.
  • "Hard" Mode: Write a second function with the same input as above that returns the number of unique, under rotation and reflection, dissections.

Before you chuckle at this, bear in mind that a general solution is publishable[1]. According to Dann, the sequence is solved up to an N of 15 or so, and that's already some ungodly number best expressed by exponents.

A couple of people took a stab at working "Hard" mode solutions[2], but as far as I know, no one managed to get to a working solution for "Easy" mode. This is the first Code Retreat I've gone to where I've felt deeply inadequate as a programmer, and that the most popular tool "a notebook" rather than a particular programming language.

As to the problem, I tried a bunch of approaches, none of which seemed to do anything other than annoy me. In terms of problem-solving, I'm usually a data-representation guy. Maybe it's the design degree making me focus on that, I don't fucking know, but it works fairly well. Given a thorny problem, my first reflex is to figure out a way of storing the data such that the mechanical components of the solution are going to be as simple as possible with no more moving parts than necessary. Which is why my train of thought as the problem was coming in was something like...

Can we do this as a naive area calculation? No, that actually fails on the 3x3 grid. We need to represent area and position, or we might represent the remaining area of the board in a more explicit way. Too complicated. Ok, we need to fit things into other things; is this actually a factoring problem? No, we're looking for the count of unique ways of factoring a thing rather than actual factorization. Doing it that way sounds like it would be computationally expensive. Can we model it as a tree of possible square placements? We'd need to represent each square as a width x y triple, at minimum, starting with (list (- n 2) 0 0) and working from there. But there are multiple interconnected combinations that can start with each 2x2 square on a 4x4 grid, and we can't actually prune a lot of sub-branches. Can we do this as a graph traversal? Sounds pretty go... wait, no. No, we wouldn't be visiting each node once. In fact, we can't know a-priori how many times a given node would have to be visited, or what the endpoints of any valid traversal are. Can we model this as an inventory system? Each grid has yae space and adding a square depletes it; when you get stuck, fill the rest with ones, each fill is a different dissection, count the results. Wait, isn't that going to be O^scary^fucking-scary? You need to generate all possible inventory combinations and de-duplicate. I guess you could keep a set of dissections somewhere and keep a canonical sorting order for placed squares. That still implies a representation that tracks position and size rather than just a size and a count. Is it possible that a 2D grid is actually a better representation for this one?-Inaimathi's brain

As it happens, sitting down and throwing together a simple program to test some of these didn't do anything to help. I'd start, get about a third of the way through, then realize that I wasn't accounting for this or that edge case. Or I'd figure out that edge-cases were covered, but the approach doesn't scale up to a general solution. The most promising I've got so far, and this is nowhere near a working solution yet, looks like

(defun positions (board-size square-size)
  (loop for x from 0 to (- board-size square-size)
     append (loop for y from 0 to (- board-size square-size)
               collect (list square-size x y))))

(defun next-squares (board-size starting-square disregard)
  (destructuring-bind (sq-size sq-x sq-y) starting-square
    (loop for current-sq-size from sq-size downto 2
         append (loop for x from sq-x 
                   append (loop for y from sq-y
                             when (and (or (= x sq-size) (= y sq-size)) 
                                       (not (member (list current-sq-size x y) disregard :test #'equal))) 
                             collect (list current-sq-size x y)
                             until (>= (+ sq-size y) board-size))
                   until (>= (+ sq-size x) board-size)))))

(defun tree-out (board-size list-of-squares)
  (loop for sq in list-of-squares
     collect sq into dis
     collect (next-squares board-size sq dis)))

which is godawful, and doesn't work yet, but it isn't obvious that it can't work so I'm rolling with it. The (very) general high-level approach is starting from the largest possible sub-squares of a given grid and doing a depth-first placement traversal. Each successful placement will then be sorted and stored in a set, which will ensure uniqueness[3]. There's a few places where we can definitely prune possibilities, but not as many as I thought there would be, which means that this will ultimately be a fairly expensive operation in any case.

Ugh. This isn't where I was planning on sinking the next week or so of my free time. I wanted to say a few words about Raspberry Pi hacking, and about its GPIO facilities specifically, or to talk a bit more about Wai in the context of large web applications, or maybe finally unveil my Plan For World Domination™© centered around building a general HTML-based MMO, but no.

I got a meme on me. So instead, I'll be thinking about squares.

And how they fit into each other.


1 - [back] - A proof that it's an NP-complete problem in the general case would also be noteworthy apparently.

2 - [back] - That is, taking the list of all dissections of a given grid size and filtering out the rotationally/reflectively unique ones.

3 - [back] - This can all happen in a single traversal, though there might be some opportunity for easy parallelism depending on how the implementation shakes out.

Tuesday, April 16, 2013

Simple web chat using Haskell's Wai/Warp

Here's a quick and dirty chat application written in Wai[1].

{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Concurrent.Chan

import Control.Monad.Trans (liftIO)

import Network.Wai
import Network.Wai.EventSource
import Network.Wai.Handler.Warp (run)
import Network.Wai.Middleware.Gzip (gzip, def)
import Network.Wai.Parse (parseRequestBody, lbsBackEnd)
import Network.HTTP.Types (status200, ok200)

import Blaze.ByteString.Builder.Char.Utf8 (fromString)
import Data.ByteString.Char8 (ByteString, unpack)

app :: Chan ServerEvent -> Application
app chan req = do
  (params, _) <- parseRequestBody lbsBackEnd req
  case pathInfo req of
    [] -> return $ ResponseFile status200 [("Content-Type", "text/html")] "static/index.html" Nothing
    ["post"] -> liftIO $ postMessage chan $ lookPost "message" params
    ["source"] -> eventSourceAppChan chan req
    path -> error $ "unexpected pathInfo " ++ show (queryString req)

lookPost :: ByteString -> [(ByteString, ByteString)] -> String
lookPost paramName params = case lookup paramName params of
  Just val -> unpack val
  _ ->  ""

postMessage :: Chan ServerEvent -> String -> IO Response
postMessage chan msg = do
  writeChan chan $ ServerEvent (Just $ fromString "message") Nothing $ [fromString msg]
  return $ responseLBS ok200 [] "Posted"

main :: IO ()
main = do  
  chan <- newChan
  run 8000 $ gzip def $ app chan

That's the most basic example I could find/cobble together of using SSEs in Wai. That's the library called Network.Wai.EventSource up there, and you can see the channel represented in the expressions involving newChan, eventSourceAppChan and writeChan. Basically, we set up a Chan[2] at server startup, we hand out an endpoint whenever someone requests /source, and we write to all endpoints whenever someone requests /post.

The file index.html is exactly what you think it is; about 10 lines each of HTML and JavaScript that set up the front-end EventSource hooks and make sure the chat list gets updated with each new message. You could write it yourself without very much trouble.

This isn't particularly interesting. Firstly because, as you can see, it's ridiculously simple, and secondly because it doesn't scale. I mean it scales with users, sure. According to the Warp benchmarks, we can expect this to support somewhere between 20k and 50k people chatting depending on their loquaciousness, but since they'll all be chatting anonymously in the same room, the experience will stop being useful well before that. The next step confounded me for a little while because I had the assumption that using state in Haskell meant using the State monad[3]. It turns out that's probably not what you'd want here.

What we're after is a system where you can start up arbitrary new rooms, and post to a specific one. In other words, something like

{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Concurrent.Chan
import Control.Concurrent (forkIO, threadDelay)

import Control.Monad.Trans (liftIO)
import Control.Monad.Trans.Resource (ResourceT)

import Network.Wai
import Network.Wai.EventSource
import Network.Wai.Handler.Warp (run)
import Network.Wai.Middleware.Gzip (gzip, def)
import Network.Wai.Parse (parseRequestBody, lbsBackEnd)

import Network.HTTP.Types (status200, ok200)
import Blaze.ByteString.Builder.Char.Utf8 (fromString)
import qualified Data.ByteString.Char8 as C

import Data.IORef
import Data.Text (unpack, pack)

app :: IORef [(String, Chan ServerEvent)] -> Application
app channels req = do
  (params, _) <- parseRequestBody lbsBackEnd req
  case pathInfo req of
    [] -> serveFile "text/html" "static/index.html"
    ["jquery.js"] -> serveFile "text/javascript" "static/jquery.min.js"
    ["chat.js"] -> serveFile "text/javascript" "static/chat.js"
    [channelName, action] -> do
      chan <- liftIO $ getOrCreateChannel channels $ unpack channelName
      case action of
        "post" -> 
          liftIO $ postMessage chan $ lookPost "message" params
        "source" -> 
          eventSourceAppChan chan req
        _ -> serveFile "text/html" "static/index.html"
    _ -> serveFile "text/html" "static/index.html"

serveFile :: C.ByteString -> FilePath -> ResourceT IO Response
serveFile mime filePath = return $ ResponseFile status200 [("Content-Type", mime)] filePath Nothing

lookPost :: C.ByteString -> [(C.ByteString, C.ByteString)] -> String
lookPost paramName params = case lookup paramName params of
  Just val -> C.unpack val
  _ ->  ""

getOrCreateChannel :: IORef [(String, Chan ServerEvent)] -> String -> IO (Chan ServerEvent)
getOrCreateChannel channels name = do
  res <- readIORef channels
  case lookup name res of
    Just chan -> 
      return chan
    _ -> do
      new <- newChan
      atomicModifyIORef channels (\cs -> ((name, new):cs, new))
      return new

postMessage :: Chan ServerEvent -> String -> IO Response
postMessage chan msg = do
  writeChan chan $ ServerEvent (Just $ fromString "message") Nothing $ [fromString msg]
  return $ responseLBS ok200 [] "Posted"

main :: IO ()
main = do
  channels <- newIORef []
  run 8000 $ gzip def $ app channels

That's a bit chunkier, but not by very much.

The significant operations there all involve something called an IORef, which is Haskell-talk for "a pointer". You can think of it an IO-based global variable that you can store stuff in[4], in this case, a map of channel names to channel streams.

That index.html file has a bunch of front-end changes too, mostly to do with acquiring and displaying multiple SSE sources, but we're not interested in that today. In the back-end, you'll notice that we've got a new function, getOrCreateChannel, which takes a "pointer" to our channel map and a name, and either returns the result of looking up that name, or inserts and returns a corresponding entry. readIORef "dereferences" that "pointer" to our map, and atomicModifyIORef mutates it. The rest of it should be self-explanatory.

Because we need to do a channel lookup before calling postMessage or eventSourceAppChan, our routes get a bit more complicated. We need to call getOrCreateChannel on the passed in channelName, then pass that to the appropriate function and return the response[5].

Finally, instead of passing a single channel to our app, we need to pass it a "pointer" to our lookup table. That happens in main at the bottom there.

The result of this exercise, as long as we put the front-end together appropriately, is a multi-room, anonymous, HTML chat system. More importantly though, this is a demonstration of how to handle simple global states in Haskell without tearing all your hair out.

I really wish someone else had written this before I started thinking about it...


1 - [back] - No, still not Yesod. Feel perfectly free to use it if that's your thing, but I'd still recommend Happstack if you absolutely, positively need a framework..

2 - [back] - Which I assume is reasonably efficient, since it's one of Haskell's basic concurrency constructs.

3 - [back] - Also, because I'm still not quite awesome enough at this that I can manipulate type expressions in my head. As a result, successful signature changes rarely happen first try, and I often find myself commenting them out then resorting to :t in GHCi and following the compilers' lead. I assume that's mechanical rather than a conceptual problem though, and talking about how I need more practice won't really help you out in any way.

4 - [back] - The IORef docs warn that using more than one of these in a program makes them unreliable in a multi-threaded setting. The thing is:

  • This chat program is extremely simple, needing only one global map to store open channels
  • If it ever got to the point of needing a more complex model, I'd hook it up to AcidState rather than trying to fiddle with MVars myself.

5 - [back] - You can see that happening in the branch labeled [channelName, action] ->, though we easily could have separated it into an external function rather than nesting cases.