Tuesday, March 25, 2014

Fact Base Indices

Got through a late-night programming binge, followed by more of the same on my lunch break, followed by a couple more days of thought and work during downtime. It's up at my fact-base implementation. Basically, we have indices now. Fairly naive, simple indices, but they should suffice for what I'm up to[1].

Before we get to discussing any code, lets back up and discuss the idea of an index for a moment.

The Idea of an index

An index in this context is an extra layer on top of our fact storage engine that keeps track of what we've put in/taken out in a way that makes certain things easier to look up. It's easier for fact-bases than it is for relational databases. Since every fact is made up of three components[3], all we have to do is keep an index by one or two slots. What we're basically looking to do is a really fast lookup of some subset of all facts in a base based on one or two of those keys[4]. The way I've chosen to do it, after some advice from friends who've used systems something like this, is by maintaining hash tables[5] in memory that give you shortcuts to some specified indices. We're trading space[6] for time[7].

The Post-Explanation Version

This was a much different article initially; I was going to discuss some bone-headed intermediate states for this code before getting to the "final"[8]. It ended up looking like a stupid idea in this particular instance because the previous versions weren't things I'd consider running after having thought through it a bit more.

I try to do that a fair amount these days.

Not the stupid implementations thing, though that's also true. I mean sit down with another actual human being and talk them through the code I just wrote. It's not quite as good as blogging about it, but it's faster and turns up almost as many issues. It also has an added benefit that blogging doesn't seem to give me, which is showing me when I'm completely off my nut. Anyhow, here's what I had after I explained the thing to someone, poured some shower time into thinking about it, and drafting the first version of this post:

(in-package #:fact-base)

(defclass index ()
  ((table :reader table :initform (make-hash-table :test 'equal))))

(defun make-index (indices)
  (let ((index (make-instance 'index)))
    (loop for ix in indices
       do (setf (gethash ix (table index))
                (make-hash-table :test 'equal)))
    index))

(defmethod indexed? ((state index) (ix-type symbol))
  (gethash ix-type (table state)))

(defun decide-index (&optional a b c)
  (cond ((and a b c) (list :abc a b c))
        ((and a b) (list :ab a b))
        ((and a c) (list :ac a c))
        ((and b c) (list :bc b c))
        ((and a) (list :a a))
        ((and b) (list :b b))
        ((and c) (list :c c))))

(defmethod format-index ((ix-type symbol) (fact list))
  (destructuring-bind (a b c) fact
    (case ix-type
      (:abc (list a b c))
      (:ab (list a b))
      (:ac (list a c))
      (:bc (list b c))
      (:a (list a))
      (:b (list b))
      (:c (list c)))))

(defmethod map-insert! ((facts list) (state index))
  (dolist (f facts) (insert! f state)))

(defmethod insert! ((fact list) (state index))
  (loop for ix being the hash-keys of (table state)
     for ix-table being the hash-values of (table state)
     do (push fact (gethash (format-index ix fact) ix-table))))

(defmethod delete! ((fact list) (state index))
  (loop for ix being the hash-keys of (table state)
     for ix-table being the hash-values of (table state)
     for formatted = (format-index ix fact)
     do (setf (gethash formatted ix-table) 
              (remove fact (gethash formatted ix-table) :test #'equal :count 1))
     unless (gethash formatted ix-table) do (remhash formatted ix-table)))

;;;;; Show methods
;; Entirely for debugging purposes. 
;; Do not use in production. 
;; Seriously.
(defmethod show (thing &optional (depth 0))
  (format t "~a~a" (make-string depth :initial-element #\space) thing))

(defmethod show ((tbl hash-table) &optional (depth 0))
  (loop for k being the hash-keys of tbl
     for v being the hash-values of tbl
     do (format t "~a~5@a ->~%" 
                (make-string depth :initial-element #\space) k)
     do (show v (+ depth 8))
     do (format t "~%")))

(defmethod show ((ix index) &optional (depth 0))
  (show (table ix) depth))

There, that's not so intimidating, is it? The actual interface to these indices is in fact-base.lisp, but the above contains most of the functionality. First, ignore the show methods at the bottom there. That was just a piece of hackery to give me a usable visual representation of an index while I was debugging this beast. Lets start at the top.

(defclass index ()
  ((table :reader table :initform (make-hash-table :test 'equal))))

(defun make-index (indices)
  (let ((index (make-instance 'index)))
    (loop for ix in indices
       do (setf (gethash ix (table index))
                (make-hash-table :test 'equal)))
    index))

(defmethod indexed? ((state index) (ix-type symbol))
  (gethash ix-type (table state)))

An index has a table of bindings. You'd call make-index in a way resembling (make-index '(:a :b :bc :ac)), which would give you back an index with room to dissect a fact base into chunklets keyed off of

  1. the first element
  2. the second element
  3. the second then the third element
  4. the first then the third element

No, I have no idea if I'll ever actually need a setup like this. The indexed? method takes an index and an ix-type symbol and tells you whether the given index is tracking that particular type of lookup.

(defun decide-index (&optional a b c)
  (cond ((and a b c) (list :abc a b c))
        ((and a b) (list :ab a b))
        ((and a c) (list :ac a c))
        ((and b c) (list :bc b c))
        ((and a) (list :a a))
        ((and b) (list :b b))
        ((and c) (list :c c))))

(defmethod format-index ((ix-type symbol) (fact list))
  (destructuring-bind (a b c) fact
    `(,ix-type
      ,@ (case ix-type
           (:abc (list a b c))
           (:ab (list a b))
           (:ac (list a c))
           (:bc (list b c))
           (:a (list a))
           (:b (list b))
           (:c (list c))))))

Those both map an index type to the components they'll need for insertion/lookup. I've thought about factoring out the obvious pattern, and even wrote some prototype code to do it, but it turns out that for 6 indices, the macrology involved is more complicated than the trivial lookup-table thing. If fact-base were an arbitrary storage system, this would probably be complicated enough to macro away, but the whole point is that I'm only ever storing triples. Which means I'm never going to need more index types than this, and usually much less.

(defmethod map-insert! ((facts list) (state index))
  (dolist (f facts) (insert! f state)))

(defmethod insert! ((fact list) (state index))
  (loop for ix being the hash-keys of (table state)
     for ix-table being the hash-values of (table state)
     do (push fact (gethash (format-index ix fact) ix-table))))

(defmethod delete! ((fact list) (state index))
  (loop for ix being the hash-keys of (table state)
     for ix-table being the hash-values of (table state)
     for formatted = (format-index ix fact)
     do (setf (gethash formatted ix-table) 
              (remove fact (gethash formatted ix-table) :test #'equal :count 1))
     unless (gethash formatted ix-table) do (remhash formatted ix-table)))

These three utility methods at the end are exactly what you'd expect. insert! takes a fact and an index and inserts one into the other, into how-many-ever particular lookups that index is tracking. map-insert! is a shorthand for inserting a bunch of facts at once into the same index. And finally, delete! takes a fact and removes it from all index lookups, then cleans up empty lookup lists.

And that's that. You've got a quick run-through of how this works out in practice here. I get the feeling I'll be talking about deltas, forking and the applications of fact bases before long, but we shall see.


Footnotes

1 - [back] - Granted, because "what I'm up to" at this point "an almost trivial semi-anonymous forum for a local meetup group", and "an almost trivial notebook-style REPL for common lisp", that's true of almost any data storage technique ever, but still. The naive, index-less storage was giving cl-kanren[2] a bit more trouble than I wanted it to when I pushed the stored entry count past 10000 or so. Which is not satisfactory. So yeah, this index is basically a search-space optimization for my database traversals. I'll let you know how it goes.

2 - [back] - Which I'll also have to talk about before long, if for no reason other than to get some ideas out of my head temporarily.

3 - [back] - The second two might be compound structures, but it's still only three top-level elements.

4 - [back] - If you have none of the three components of the fact you're looking for, you can't do better than "all facts"; if you have all of them, you already have the fact you're looking for. So we're only interested in the other two cases.

5 - [back] - There's no particular reason you couldn't use some tree structure if you like, but hashes are easy, and they come with Common lisp, so...

6 - [back] - Which is in ample supply these days.

7 - [back] - Which is always a vanishingly scarce resource.

8 - [back] - Which isn't really final any more; I've updated it several times since the stuff here.

Thursday, March 20, 2014

Housekeeping

Just some quick housekeeping while I draft up my next proper piece.

cl-cwd

A friend of mine mentioned that he had to hack up cl-git-fs to make it export cwd and with-cwd. Which was surprising when I heard it, but really shouldn't have been. This is the sort of problem Common Lispers seem to solve by copy/pasting a few implementation-specific snippets into projects that need the functionality. That's not good enough for me. So, here's cl-cwd; a library to get/manipulate/temporarily manipulate your current working directory in a cross-platform and cross-implementation way. Patches and bug reports welcome as always, and hopefully it's useful for you. I haven't gotten around to hacking the same components out of cl-git-fs, but I might eventually. Or, I might just use the os section of UIOP, which was only pointed out to me after I posted that cl-cwd repo. I'm not entirely sure what the right approach here is; are monolithic utility libraries preferable to very small, single-purpose packages? Not sure, but I kind of prefer the smaller, specific ones for my own purposes. Even though it never wastes enough resources to matter to me, it's deeply irritating on some level to include all of :alexandria and :anaphora for, essentially with-gensyms, aif, awhen and alambda.

fact-base updates and relevant minutia

I've gotten some work done on fact-base, which on reflection really should have been called cl-triple-store, with an eye on using it in production. We'll see if it actually happens, or if it just remains an idle fancy, but

  1. you can now add indexes to your fact bases, which should make data retrieval faster[1]
  2. writing deltas is now quite efficient[2]

Again, let me know if it's useful, or if it's sooooo-close-to-being-useful-if-it-only-had-this-one-feature. I might do something about it at this stage.

The only related piece of minutia is that I've found myself reading a lot about miniKanren, core.logic and logic programming in general. I think it might be a really good way to query structures like these triple-stores I've been building lately. Also, it turns out someone's already done most of the work of implementing that in CL for me, so I forked it and added/changed the few chunklets I needed to. Matthew Swank, if you're reading this and care, let me know and I'll send you patches. I assumed you wouldn't care, since the copyright line said 2008, but I might have been wrong.

Editors

Someone finally sat down and walked me through the installation for Light Table[3]. I haven't been paying attention to the propaganda, so I'm not entirely sure exactly how this is going to revolutionize editing[4], but it looks promising for a prototype. I was able to get to a productive-ish point with it within about an hour, and that's a high bar. I remember learning Emacs[5] over the course of weeks.

Another one I tried enough to appreciate is IPython Notebook. The idea of a browser-based editor has, shall we say, crossed my mind, but the idea of applying it to 1D coding hadn't occurred to me. I gotta say, I like the idea, and I'll be trying to do something about that right here. I've only got the easy part done so far; a front-end of code-mirror rigged up to a Lisp process that evaluates things and sends the result values back[6]. The hard part is going to involve a persistence layer, multiple cells, multiple notebooks and probably some smaller browsing/presentation/generation features to let me do whatever[7]. Spoiler warning: fact-base and its history/indexing operations will feature prominently. The ultimate goal is no less than replacing Emacs as my Common Lisp IDE of choice.

And that's sort of why I've been on another editor/tools kick lately.

I've been talking to some friends in the local programming community, and we seem to have collectively decided that we really, really want Light Table to succeed because it sucks not being able to recommend anything as a starting tool for Common Lisp development to programming newbies. Our options right now are what can charitably be referred to as a very intimidating operating system/rss aggregator/IRC client/moustache trimmer[8], a wannabe very intimidating blah blah blah, a very pretty, passably comfortable pair of handcuffs, a very performant, robust, ridiculously expensive pair of handcuffs, and vim. If you're recommending Scheme, there's also Racket[9], but that still doesn't count as a Common Lisp environment.

Some of those are things I'd recommend to people who are already programmers. One of them is something that I'd recommend to people looking to learn languages for the sake of learning them. But there isn't a tool anywhere on that list that compares to something like Idle or, more to the point, Light Table. I don't think it's quite at the point where I'd switch over to it myself, but I sincerely hope it gets there. And I'll be devoting some effort to pushing that along. You can too, since it's up and properly licensed and all...


Footnotes

1 - [back] - There's an article either in the works or already posted, depending on what order I decide to put these up.

2 - [back] - From a macro-optimization perspective; I'm sure I could pull some micro-optimizations on it too, but I'm not into that.

3 - [back] - Turns out that even the Debian Unstable repo doesn't have a recent enough version of Leiningen. I needed to latest using this.

4 - [back] - In fact, the only visible revolutionization of "randomly putting shit in your buffers that you can't edit instead of giving you a goddamn REPL", is profoundly annoying, though it's possible to disagree about these things. I guess I can imagine it looking really snazzy in presentations and demo videos.

5 - [back] - ...and before that jEdit, and before that Eclipse.

6 - [back] - And yes, it handles multiple values as well as *standard-output* emissions, in case you were wondering.

7 - [back] - At minimum the possibility to write :cl-who/:cl-css instead of markdown, a way of installing libraries that doesn't go through quicklisp, and maybe some git/drakma/shell support. I may end up having to add non-shitty static file serving to house for this...

8 - [back] - Pre-built Lisp IDE version optional.

9 - [back] - Which I do wholeheartedly recommend; it's very cool.

Thursday, March 6, 2014

Flow-Based Games

So I just tried to write what I thought was a fairly simple game prototype tentatively titled "Gratuitous Resource Gathering" using Elm. Something in the style of Cookie Clicker(DO NOT click that link if you were planning on doing anything today). Here's how far I got:

import Mouse
import Time

port resourceA : Signal Bool
port building : Signal String

buildingCost = foldp (+) 0 <| keepWhen (lift2 (>) (constant 50) balance) 0 <| sampleOn building <| constant 50

tickIncrement = foldp (+) 0 <| sampleOn buildingCost <| constant 0.01
tick = sampleOn (every Time.millisecond) <| constant True

spent = foldp (+) 0 <| merges [ sampleOn building buildingCost ]

gathered = foldp (+) 0 <| merges [ sampleOn resourceA <| constant 1
                                 , sampleOn tick tickIncrement ]

balance = lift round <| lift2 (-) gathered spent

main = lift (flow down) <| combine [ lift asText balance ]

That's almost the complete game, except for one very annoying detail: it doesn't work. When I try to run it in the appropriate HTML harness, I get

s2 is undefined
    Open the developer console for more details.

The reason is that buildingCost signal. When the user purchases a building, I need to check whether they have enough resource balance to buy it. However, the balance is the sum of two other signals, gathered and spent, the second of which is affected by building purchases. Looks like circular signals aren't a thing in Elm right now. I'm not sure how to resolve this inside the language, and I already told you all about ports last time, so my natural first reaction was to think about how I'd go about computing that price check outside the Elm module. Unfortunately, once I started mentally pulling things out of Elm, I quickly arrived at the conclusion that the whole thing would probably need to be turned inside-out. In other words, I'd be using Elm purely as a way of avoiding manual DOM manipulation in one or two components of a mostly Javascript project.

Maybe that'd still be worth it, but it feels quite unsatisfying.

No real idea what to do about it though. I'll talk to some people I consider smarter than me and see what they think. Hopefully there's a reasonable way around the problem that doesn't include doing most of it in manual JS.

In the meantime, I hacked together something in Daimio.

outer
        @resource-click dom-on-click resource
        @building-click dom-on-click building
        @show dom-set-html display

        @timer every-half-second

        $building-cost 5
        $click-increment 1
        $tick-increment 0
        $balance 0

        inc-balance 
                { __ | add $balance | >$balance }
        dec-balance 
                { __ | subtract value __ from $balance | >$balance }

        can-afford
                { __ | ($building-cost $balance) | max | eq $balance }
        
        @resource-click -> {__ | $click-increment } -> inc-balance -> @show
        @building-click -> can-afford -> { __ | then $building-cost else 0} -> dec-balance -> @show
                           can-afford -> { __ | then 10 else 0 | add $tick-increment | >$tick-increment }
                           can-afford -> { __ | then "Buying..." | tap }

        @timer -> {__ | $tick-increment } -> inc-balance -> { __ | $balance } ->  @show

No highlighting mode for that one yet; I'm workin' on it. Also, the above was kind of non-trivial because I had to add my own timer event, and I still want to figure out how to factor out the process of buying a building. But it works well enough on my machine. I'll post the full project up to my github once I do a bit more thinking about it.