Saturday, April 12, 2014

Querying Fact Bases Redux

So I put together that thing I talked about last time, only to discover three things.

Firstly, the only places I actually needed lazy operation could be handled by passing a body directly to the query macro.

Secondly, when I loaded the thing up, worked up a 40k fact corpus[1] and ran

(for-all (and (?id :user ?name) (?id :time ?time) (?id :number 62))
         :in *base* :get (list ?id ?time ?name))

I'd get this:

You may have noticed that this isn't an animated gif. It hangs there for something on the order of thirty seconds, more if profiling is on, and then returns the expected result. So that won't really do. There are some interesting points I'll talk about later that have to do with clause order and the underlying operations. But, even though this is probably the worst way to write this particular query, it should return in under a second.

Thirdly, I had exactly zero use cases for or goals. This might change, but until then, it looks like I don't even need unification[2].

So, as a result, I sat down and took the precise opposite of the approach to traversal that I tried last time. Instead of trying to keep it elegant and lazy, let's make it hacky and eager. Here's our problem, once again:

(for-all (?id :user ?name) 
         :in *base* :get (list ?id ?name))

should basically be the same as

(loop for (a b c) in (current *base*) when (eq b :user) collect (list a c))

and

(for-all (and (?id :user ?name) (?id :time ?time) (?id :number 62))
         :in *base* :get ?time)

should more or less be equivalent to

(loop for (a b c) in (current *base*)
   when (eq b :user)
   append (loop for (d e f) in (current *base*)
             when (and (eq d a) (eq e :time))
             append (loop for (g h i) in (current *base*)
                       when (and (eq g d) (eq h :number) (= i 62))
                       collect f)))

Except, you know, it should be smarter about using indices where it can. But that's a pretty straight-forward specification.

lookup and decide-index changes - take 1

The first thing I had to do was change lookup and decide-index a bit, because I wanted them to be mildly less naive. And yeah, I broke down and added some macrology to pull out all the repetition in the index-related functions. Turns out that was a good thing.

(defmacro lookup-index (state &rest indices)
  (with-gensyms (st)
    `(let ((,st ,state))
       (cond ,@(loop for i in indices 
                  for syms = (key->symbols i)
                  collect `((and (indexed? (index ,st) ,i)
                                 ,@syms)
                            (list ,i ,@syms)))))))

(defmethod decide-index ((state fact-base) &optional a b c)
  (lookup-index state :abc :ab :ac :bc :a :b :c))

Short version: the function now takes a fact base in addition to a, b and c, and checks whether a particular type of index is actually kept by that fact base before checking whether it would be appropriate for the current query.
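
To make the macrology concrete: assuming key->symbols maps an index keyword like :ab to the corresponding argument symbols (a b), a call like (lookup-index state :ab :a) inside decide-index would expand into roughly this (modulo gensym names):

(let ((#:st state))
  (cond ((and (indexed? (index #:st) :ab) a b)
         (list :ab a b))
        ((and (indexed? (index #:st) :a) a)
         (list :a a))))

That is, each clause only fires when the fact base actually keeps that index and the relevant arguments are non-nil.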

(defmethod lookup ((state fact-base) &key a b c)
  (if (every #'not (list a b c))
      (current state)
      (let ((ix (aif (decide-index state a b c)
                     (gethash (rest it) (gethash (first it) (table (index state))))
                     (current state))))
        (loop for f in ix
           when (and (or (not a) (equal a (first f)))
                     (or (not b) (equal b (second f)))
                     (or (not c) (equal c (third f))))
           collect f))))

lookup now has to be mindful of this, and has to check that the indexed facts match the incoming query, because we're now potentially using a more general index than the query calls for. My gut tells me this is still a net increase in performance since last time, even though our best case is now O(n) in the size of the result rather than O(1). If it comes to it, I'll go back and make that more efficient.

Actually, let's fix it right now.

lookup and decide-index changes - take 2

(defmethod lookup ((state fact-base) &key a b c)
  (if (every #'not (list a b c))
      (current state)
      (multiple-value-bind (index ideal-index) (decide-index state a b c)
        (let ((ix (if index
                      (gethash (rest index) (gethash (first index) (table (index state))))
                      (current state))))
          (if (and index (eq (first index) ideal-index))
              ix
              (loop for f in ix
                 when (and (or (not a) (equal a (first f)))
                           (or (not b) (equal b (second f)))
                           (or (not c) (equal c (third f))))
                 collect f))))))

That more complicated version of lookup expects two values instead of one: which index we're actually using, and which index we'd ideally use. If the two are the same, we just return the results of our lookup; otherwise we have to do the narrowing traversal. That's about as efficient as it's going to get without making it lazy. Which I guess I could, but not right now. However, we also need a modified decide-index to pull this little trick off. And that's going to be fugly.

(defmacro lookup-index (state &rest indices)
  (with-gensyms (ix ideal applicable?)
    `(let ((,ix (index ,state))
           (,ideal))
       ,@(loop for i in indices 
            for syms = (key->symbols i)
            collect `(let ((,applicable? (and ,@syms)))
                       (when (and (null ,ideal) ,applicable?) (setf ,ideal ,i))
                       (when (and (indexed? ,ix ,i) ,applicable?)
                         (return-from decide-index 
                           (values (list ,i ,@syms) ,ideal)))))
       (values nil ,ideal))))

(defmethod decide-index ((state fact-base) &optional a b c)
  (lookup-index state :abc :ab :ac :bc :a :b :c))

Say what you will about imperative programming; it's efficient. That's a single pass over the relevant indices that returns both the least general applicable index we actually keep, and the ideal index for a given query. Which means we can now profitably compare the two in lookup, which means that our best case is back up to O(1), since we don't need a narrowing traversal for queries we've fully indexed.
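
To see both return values in action, here's a hypothetical REPL sketch against a base that only keeps an :a index (built with something like (make-fact-base :indices '(:a))):

(decide-index *base* 0 :message nil)
;; => (:A 0), :AB
;; first value:  the most specific index we actually keep for this query
;; second value: the index we'd ideally want, since a and b are both known
;; The two differ, so lookup narrows the :a bucket by hand.
;; If *base* also kept an :ab index, we'd get (:AB 0 :MESSAGE), :AB
;; and lookup could hand back that bucket directly.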

With those modifications, I can pull some fancier crap in translating for-all calls into loops. Specifically, I can do this:

(defun goal->destructuring-form (goal &key (bindings (make-hash-table)))
  (labels ((rec (elem)
             (cond ((listp elem)
                    (mapcar #'rec elem))
                   ((or (eq '? elem) (not (variable? elem)))
                    (gensym))
                   ((and (variable? elem) (gethash elem bindings))
                    (gensym))
                   ((variable? elem)
                    (setf (gethash elem bindings) t)
                    elem)
                   (t (error "Somethings' up. goal->destructuring-form~%     ~s~%     ~s~%     ~s"
                             bindings goal elem)))))
    (mapcar #'rec goal)))

(defun goal->lookup (base goal &key (bindings (make-hash-table)))
  (flet ((->ix (elem)
           (cond ((and (variable? elem) (gethash elem bindings))
                  elem)
                 ((any-variables? elem)
                  nil)
                 (t elem))))
    (destructuring-bind (a b c) goal
      `(lookup ,base 
               :a ,(->ix a) 
               :b ,(->ix b)
               :c ,(->ix c)))))

(defun goal->or-expression (a b c goal)
  (flet ((test (term elem) `(equal ,term ,elem)))
    `(and ,(test a (first goal))
          ,(test b (second goal))
          ,(test c (third goal)))))

(defmethod handle-goals ((goal-type (eql 'and)) base goals collecting)
  (let ((bindings (make-hash-table)))
    (labels ((single-goal (destruct lookup tail)
               `(loop for ,destruct in ,lookup ,@tail))
             (rec (goals)
               ;; We want to generate the lookups first, 
               ;; because the bindings are going to be generated
               ;; from the result of the lookup. Meaning, if the bindings 
               ;; are established in a given destruct clause,
               ;; they won't be usable until the NEXT lookup.
               ;; Therefore, even though it isn't immediately obvious, 
               ;; order matters in this let* form

               (let* ((lookup (goal->lookup base (first goals) :bindings bindings))
                      (destruct (goal->destructuring-form (first goals) :bindings bindings)))
                 (if (null (cdr goals))
                     (single-goal destruct lookup `(collect ,collecting))
                     (single-goal destruct lookup `(append ,(rec (rest goals))))))))
      (rec (rest goals)))))

(defmethod handle-goals (goal-type base goals collecting)
  ;; Same story here as in handle-goals (eql 'and) method
  (let* ((bindings (make-hash-table))
         (lookup (goal->lookup base goals :bindings bindings))
         (destruct (goal->destructuring-form goals :bindings bindings)))
    `(loop for ,destruct in ,lookup collect ,collecting)))

(defmacro for-all (goal-term &key in get)
  (with-gensyms (base)
    (let ((template (replace-anonymous (or get `(list ,@(variables-in goal-term))))))
      `(let ((,base ,in))
         ,(handle-goals (first goal-term) base goal-term template)))))

We'll go through it in a minute, but the point of these changes is that writing

(for-all (and (?id :user ?name) (?id :time ?time) (?id :number 62))
         :in *base* :get (list ?id ?time ?name))

should expand directly into something like

(LET ((#:BASE1122 *BASE*))
  (LOOP FOR (?ID #:G1123 ?NAME) 
     IN (LOOKUP #:BASE1122 :A NIL :B :USER :C NIL)
     APPEND (LOOP FOR (#:G1124 #:G1125 ?TIME) 
               IN (LOOKUP #:BASE1122 :A ?ID :B :TIME :C NIL)
               APPEND (LOOP FOR (#:G1126 #:G1127 #:G1128) 
                         IN (LOOKUP #:BASE1122 :A ?ID :B :NUMBER :C 62)
                         COLLECT (LIST ?ID ?TIME ?NAME)))))

rather than the lazy-ish generator tree from last time. Thanks to our re-structuring of lookup, this is about as efficient as it's going to get without re-jigging goal order. The only edge case left is the one where the entire goal is perfectly indexable, but it seems the programmer would just use lookup directly in those situations[3].

On to the code review. Reading. Whatever.

(defun goal->destructuring-form (goal &key (bindings (make-hash-table)))
  (labels ((rec (elem)
             (cond ((listp elem)
                    (mapcar #'rec elem))
                   ((or (eq '? elem) (not (variable? elem)))
                    (gensym))
                   ((and (variable? elem) (gethash elem bindings))
                    (gensym))
                   ((variable? elem)
                    (setf (gethash elem bindings) t)
                    elem)
                   (t (error "Somethings' up. goal->destructuring-form~%     ~s~%     ~s~%     ~s"
                             bindings goal elem)))))
    (mapcar #'rec goal)))

Step one of the transformation is to put together the destructuring-form for a particular goal:

;;             this thing
;;          vvvvvvvvvvvvvvvvvvv
  (LOOP FOR (?ID #:G1123 ?NAME) IN (LOOKUP #:BASE1122 :A NIL :B :USER :C NIL)
...

In order to do that, we have to replace everything other than variables with gensym calls, but keep the same tree structure. loop does deep destructuring, so we can get away with using this as a pattern-matching strategy. We also need to replace variables already bound by previous destructuring-forms with gensyms, so they don't get re-assigned unnecessarily.
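
For example, here's a hypothetical sketch of how the bindings table evolves across two goals (the gensym names will differ):

(defparameter *bindings* (make-hash-table))

(goal->destructuring-form '(?id :user ?name) :bindings *bindings*)
;; => (?ID #:G1001 ?NAME)      ; ?id and ?name are now recorded as bound

(goal->destructuring-form '(?id :time ?time) :bindings *bindings*)
;; => (#:G1002 #:G1003 ?TIME)  ; ?id is already bound, so it becomes a gensym here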

(defun goal->lookup (base goal &key (bindings (make-hash-table)))
  (flet ((->ix (elem)
           (cond ((and (variable? elem) (gethash elem bindings))
                  elem)
                 ((any-variables? elem)
                  nil)
                 (t elem))))
    (destructuring-bind (a b c) goal
      `(lookup ,base 
               :a ,(->ix a) 
               :b ,(->ix b)
               :c ,(->ix c)))))

The next thing we need to put together is a given goal's lookup clause:

;;                                       this thing
;;                                 vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
  (LOOP FOR (?ID #:G1123 ?NAME) IN (LOOKUP #:BASE1122 :A NIL :B :USER :C NIL)
...

We're being conservative at the moment, but there's an optimization or two I could still make here. The way we're dealing with these (a worked example follows the list) is:

  1. if a given goal-component is a variable, then look it up by its value in the current bindings[4]
  2. if a given goal-component is a compound form which contains any variables, don't index by it[5]
  3. otherwise, use it as an index
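
Continuing the hypothetical sketch from goal->destructuring-form above, with ?id and ?name already bound and ?time not yet bound (and passing the literal symbol *base* where the macro would pass its own gensym):

(goal->lookup '*base* '(?id :time ?time) :bindings *bindings*)
;; => (LOOKUP *BASE* :A ?ID :B :TIME :C NIL)

(goal->lookup '*base* '(?id :number 62) :bindings *bindings*)
;; => (LOOKUP *BASE* :A ?ID :B :NUMBER :C 62)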

Onwards to handle-goals, the real meat of this approach. Let's take a look at how we deal with singleton goals first.

(defmethod handle-goals (goal-type base goals collecting)
  ;; Same story here as in handle-goals (eql 'and) method
  (let* ((bindings (make-hash-table))
         (lookup (goal->lookup base goals :bindings bindings))
         (destruct (goal->destructuring-form goals :bindings bindings)))
    `(loop for ,destruct in ,lookup collect ,collecting)))

Easy, right? Grab the results of goal->lookup and goal->destructuring-form and stitch them into a loop along with the collecting clause. Nothing fancy here, except for that cryptic note about a different method definition.
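
Before we get to that other method, here's roughly what the singleton case buys us. The single-goal query (for-all (?id :user ?name) :in *base* :get (list ?id ?name)) from the top of the post should expand into something like this (gensym names will differ):

(LET ((#:BASE1130 *BASE*))
  (LOOP FOR (?ID #:G1131 ?NAME)
     IN (LOOKUP #:BASE1130 :A NIL :B :USER :C NIL)
     COLLECT (LIST ?ID ?NAME)))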

(defmethod handle-goals ((goal-type (eql 'and)) base goals collecting)
  (let ((bindings (make-hash-table)))
    (labels ((single-goal (destruct lookup tail)
               `(loop for ,destruct in ,lookup ,@tail))
             (rec (goals)
               ;; We want to generate the lookups first, 
               ;; because the bindings are going to be generated
               ;; from the result of the lookup. Meaning, if the bindings 
               ;; are established in a given destruct clause,
               ;; they won't be usable until the NEXT lookup.
               ;; Therefore, even though it isn't immediately obvious, 
               ;; order matters in this let* form
               (let* ((lookup (goal->lookup base (first goals) :bindings bindings))
                      (destruct (goal->destructuring-form (first goals) :bindings bindings)))
                 (if (null (cdr goals))
                     (single-goal destruct lookup `(collect ,collecting))
                     (single-goal destruct lookup `(append ,(rec (rest goals))))))))
      (rec (rest goals)))))

And this is the full story[6]. Because of the specific way we want lookup and destruct to interact with their containing bindings, their order matters quite a bit. Play around with the macroexpander if you don't quite see it from just the definition.

Anyhow, the way we deal with and goals is by building up a chain of loop forms, each one dealing with a single goal while taking the previous goals into account. All but the last one need to append their results, while the last needs to collect them. The only part we've got left is the now trivial step of putting together the for-all macro interface to the rest of this compilation pipeline[7].

(defmacro for-all (goal-term &key in collecting)
  (with-gensyms (base)
    (let ((template (replace-anonymous (or collecting `(list ,@(variables-in goal-term))))))
      `(let ((,base ,in))
         ,(handle-goals (first goal-term) base goal-term template)))))

Tadaah!

I haven't yet put together an equivalent facility for the old apply keyword arg, but because of how we've changed up the underlying code processors, collecting can now trivially handle things like

(for-all (and (?id :user ?name) (?id :time ?time) (?id :number 62))
         :in *base* :collecting (list ?name (+ ?id ?time 62)))

This concludes the part of this post wherein I talk about implementation details. The rest is just one or two interesting notes about traversals. If you're getting bored, or tired, this is a pretty good break-point for you.

Traversal Notes

Near the beginning of this piece, I said

...this is probably the worst way to write this particular query...-Inaimathi

referring to

(for-all (and (?id :user ?name) (?id :time ?time) (?id :number 62))
         :in *base* :get (list ?id ?time ?name))

and the reason should be fairly obvious now that we know exactly how we go about finding these answers. Remember, the expansion for this form, after compensating for the different keyword argument in our new for-all, is

(LET ((#:BASE1262 *BASE*))
  (LOOP FOR (?ID #:G1263 ?NAME) 
    IN (LOOKUP #:BASE1262 :A NIL :B :USER :C NIL)
    APPEND (LOOP FOR (#:G1264 #:G1265 ?TIME) 
              IN (LOOKUP #:BASE1262 :A ?ID :B :TIME :C NIL)
              APPEND (LOOP FOR (#:G1266 #:G1267 #:G1268) 
                        IN (LOOKUP #:BASE1262 :A ?ID :B :NUMBER :C 62)
                        COLLECT (LIST ?ID ?TIME ?NAME)))))

and just so that we're perfectly clear on what that means, here's the Lisp-esque pseudo-code

(for-each goal-1
    append (for-each goal-2 
               append (for-each goal-3
                          collect [some list of components])))

Now granted, we're aggressively using indices where we can, so we can slice a lot of the constant time out of this equation depending on how often such an operation happens, but no matter how efficiently we slice it, we're going to take a number of steps proportional to the product of the candidate counts for goal-1, goal-2 and goal-3. That is, we're going O(n) over the candidates for the last goal, for each candidate of the previous goal, for each candidate of the goal before that, and so on.

This is why the indices help us a lot. If we couldn't effectively discount swathes of our initial corpus, the performance characteristic would be O(n^m), where n is the size of our fact base and m is the number of goals. Meaning that it behooves us to cut as many candidates as early as possible, since early reductions in our problem space will give us much better returns.

In other words, to paraphrasingly re-iterate Norvig, even though

(for-all (and (?id :user ?name) (?id :time ?time) (?id :number 62))
         :in *base* :get (list ?id ?time ?name))

and

(for-all (and (?id :number 62) (?id :time ?time) (?id :user ?name))
         :in *base* :get (list ?id ?time ?name))

are logically equivalent, the latter is going to perform noticeably better, because (?id :number 62) has a much smaller set of candidate facts than (?id :user ?name) in our particular corpus. One interesting exercise, which I'll leave for next time, would be to have for-all try to optimally sort its and goals by putting the smallest candidate lists at the beginning, so as to reduce the search space with no thought required from the user. The above is a trivial example; there's one goal that has more indexable terms in it than the others, so in general[8] it will probably yield a smaller candidate list. The real way to go about this feels like it would be to aggressively index goals at the start of a query, sample their corpus sizes, then sort on that. Not sure if that would cost more than it buys me though, since it feels like it would get complex fast.

Anyway, like I said, I'll leave it for next time.

If I end up seeing performance issues in the things I'm building out of fact-base.

And I get bored.


Footnotes

1 - [back] - Like this, if you're interested:

(defparameter *base* (make-fact-base :indices '(:a :ab :abc)))

(defmethod test-generate! (n)
  (loop repeat n
     do (multi-insert! 
         *base* `((:number ,(random 100)) (:type :digit) 
                  (:time ,(get-universal-time)) 
                  (:user ,(nth (random 7) '("Inaimathi" "Anon" "Someone Else" "Albert" "Beatrice" "Charles" "Daria")))))))

(test-generate! 10000)

2 - [back] - Which makes things much simpler for this approach. Hopefully, you'll see why as we go.

3 - [back] - and they can, since it's still an :exported symbol itself.

4 - [back] - if it has been bound by a previous destructuring-form, it'll be assigned by this point, which means we'll be able to index by it. Otherwise, gethash will return nil, which is exactly what we want.

5 - [back] - This is where we could be a bit more efficient, in case you're interested. If we wanted to be very precise about it, we'd say that we could use a compound form with variables as an index, provided that all of its variables have been bound prior to this point in the traversal. I'm leaving it out for now because

  • it would further complicate an already tricky chunk of code
  • I'm not sure how often this edge case would happen in practice and
  • if it does happen, the current result will be a slightly less efficient traversal, which doesn't sound too bad. If the consequence were incorrect results instead, I'd have reconsidered

6 - [back] - As an aside, this is the first place I've seen in something like 8 years where a comment is appropriate. It doesn't mirror the code to which it pertains and it explains a non-obvious but necessary facet of the implementation. Usually, I'd either work out some naming scheme that would make the point obvious, or just factor out the chunk of code that needs explanation. There doesn't seem to be a simple way of doing either here[9].

7 - [back] - And just to highlight this, it is a compilation pipeline. I mentioned this at a semi-Lisp-related meet-up lately, and it's true enough to repeat to the internets: a good way of conceptualizing a Common Lisp macro is as a compiler that takes some Lisp code and emits different Lisp code. Because of the way Lisp is structured, we get the first chunk of an actual compilation pipeline for free, and essentially start with a tokenized input. It's a pretty powerful technique once you get your head around it.

8 - [back] - Though not necessarily in plenty of specific cases.

9 - [back] - Though I guess I could factor that let out into a with-for-all-forms if it turned out I had to repeat it in many places.

Wednesday, April 9, 2014

Querying Fact Bases

So it seems that a lot of people are into this logic programming thing I've been reading about lately. There's the already mentioned Reasoned Schemer trio of Friedman/Byrd/Kiselyov behind the beautiful but arcane miniKanren language, a Prolog-like language in Peter Norvig's Paradigms of Artificial Intelligence Programming chapters 11, 12 and 14, another one in Graham's On Lisp chapters 19, 22, 23 and 24, and yet another in chapter 4.4 of Abelson and Sussman's SICP. So there's a lot of literature around dealing with how you go about building a unifier or pattern-matcher[1].

Anyway, I've been consuming this literature for a while, and the part I want to zoom in on is searching the database. The other stuff is easy; a unifier can be straight-forwardly built in about ten lines[2] and handling variables is the same tree-traversal stuff you've seen a hundred times before, but the actual search never seems to be the focus of these things. And I'm looking for a particular type of search for fact-base. I showed it off recently at a Toronto Lisp Group meeting along with the app it's supposed to enable and mentioned that querying is mildly annoying when you get to compound queries. Specifically, I took as an example the very simple database

(defparameter base (make-fact-base))
(insert! base (list 0 :message "This is a sample message"))
(insert! base (list 1 :message "This is another one"))
(insert! base (list 1 :author "Inaimathi"))
(insert! base (list 2 :message "That second one was written by me. This one is a meta-message (also by me)."))
(insert! base (list 2 :author "Inaimathi"))
(insert! base (list 2 :type :meta))

and pointed out that in order to find All the message bodies authored by Inaimathi, you'd have to take two steps

  1. traverse the database looking for facts that have '(:author "Inaimathi") for a rest
  2. get all the facts that have the same first as one of the above and the second :message, and collect their thirds

In actual lisp, that looks like

(loop for (a b c) in (lookup base :b :author :c "Inaimathi") 
      collect (caddar (lookup base :a a :b :message)))

That's even passably fast, thanks to our index system, but it's annoying to write, and it forces me to do a fact-base->objects conversion in some places rather than write out these multi-stage iterations myself. What I'd like to be able to do in the above is something like

(for-all (and (?id :author "Inaimathi") (?id :message ?message)) :in my-fact-base :get ?message)

and have the system figure it out for me. Granted in this situation, you don't gain very much, but it would be a compounding gain for more complex queries. For instance, if I suddenly decided I want to select All the message bodies authored by Inaimathi pertaining to other messages, the query language version handles it very simply:

(for-all (and (?id :author "Inaimathi") (?id :message ?message) (?id :type :meta)) :in my-fact-base :get ?message)

whereas the manual version would add another level of iteration I'd need to work through. Oh, and have fun with the situation where you only want the first 5 or so hits. The easiest solution with the manual approach is searching the entire space and throwing away all but the first n results. You could do better, but you're suddenly in the supremely annoying situation where your queries all look mildly different, but perform the same basic task.

What I figure I'd want is a lazy or lazy-ish way of getting the results. The lazy solution can easily be converted to the eager solution later, but it's really painful to take the eager approach and then find out that you only needed to do about 4% of the work done. I'll be using generators, rather than outright lazy sequences just because they're mildly easier to put together. For a single goal, that's trivial.

(for-all (?id :author "Inaimathi") :in my-fact-base)

All you have to do here is have a generator that runs over the facts in my-fact-base and returns the next matching one it finds. Something like

(defun match-single (goal bindings facts)
  (let ((fs facts))
    (lambda ()
      (loop for res = (unify goal (pop fs) bindings)
         unless (fail? res) do (return res)
         while fs
         finally (return (fail))))))

would do fine. I'm pointedly refusing to commit to an implementation of (fail), unify and bindings at each point for the purposes of this post, but am using the stuff out of Norvig's PAIP source code. For the uninitiated: A goal is the thing you're trying to match; it's an expression that may contain some variables. A variable is a thing that you can substitute; it can either be unbound or assigned a value in a particular set of bindings. If a unification fails, it returns (fail), and if it's successful it returns the set of bindings that would make that unification expression true. For instance, if you unified ?a with 5, starting with empty bindings, unify would return the set of bindings in which ?a is bound to 5.

So the above match-single definition would return a generator which, when called, would either (fail), or return the environment resulting from unifying the next element of facts with goal. Hopefully, straight-forward, though you may need to do a bit of reading up on it if you've never seen the terms before.
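
As a hypothetical usage sketch, assuming *facts* holds the list of facts inserted into base above and +succeed+ is PAIP's empty binding set:

(defparameter *gen*
  (match-single '(?id :author "Inaimathi") +succeed+ *facts*))

(funcall *gen*) ;; => bindings in which ?ID is bound to 1
(funcall *gen*) ;; => bindings in which ?ID is bound to 2 (order depends on storage)
(funcall *gen*) ;; => (fail); the matching facts are exhausted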

The next easiest thing to do would be handling a set of ored goals. That is

(for-all (or (?id :author "Aaron") (?id :author "Bradley")) :in my-fact-base)

It's basically the same thing, except that instead of applying a single goal and checking if it sticks, we're applying several goals in sequence and seeing if any of them stick. Something like

(defun match-ors (goals bindings facts)
  (let ((fs facts))
    (flet ((try-goals (f)
             (loop for g in goals
                for res = (unify g f bindings)
                when res do (return res)
                finally (return (fail)))))
      (lambda ()
        (loop for res = (try-goals (pop fs))
           unless (fail? res) do (return res)
           while fs
           finally (return (fail)))))))

which is by my estimation only marginally more complicated. The tricky part is traversing a fact-base in order to satisfy anded goals. Like in that example I mentioned near the beginning:

(for-all (and (?id :author "Inaimathi") (?id :message ?message) (?id :type :meta)) :in my-fact-base :get ?message)

Think about it.

What you want here is fairly complicated to express in English. I'm still trying to return a generator from the whole thing, but expressing its behavior is complex.

If you only get one goal, you want to fall through to a call to match-single; that's still fairly straight-forward. The magic happens at more than one goal. And I just deleted about four paragraphs of prose that would have thoroughly confused you. It's not a very easy set of concepts to express in English because it refers to pieces of itself fairly often.

Let's try it this way:

(defun match-ands (goals bindings facts)
  (let ((generator (match-single (first goals) bindings facts))
        (rest-generator))
    (if (null (cdr goals))
        generator
        (labels ((next-gen ()
                   (let ((res (funcall generator)))
                     (if (fail? res)
                         (fail)
                         (setf rest-generator (match-ands (rest goals) res facts)))))
                 (backtrack! ()
                   (if (fail? (next-gen))
                       (fail)
                       (next)))
                 (next ()
                   (if (null rest-generator)
                       (backtrack!)
                       (let ((res (funcall rest-generator)))
                         (if (fail? res)
                             (backtrack!)
                             res)))))
          #'next))))

The image you want, once you've put the initial generator tower together, is one of those combo bike-locks.

If you want to search its entire possibility space, you spin the last ring until it runs out of values. Then you spin the second-to-last ring once, and retry the last ring. When you run out of values on the second-to-last ring, you spin the third-to-last ring once and so on. It's an incredibly tedious exercise, which is why I'd prefer a machine to handle it.

Now, chunk by chunk

(defun match-ands (goals bindings facts)
  (let ((generator (match-single (first goals) bindings facts))
        (rest-generator))
    (if (null (cdr goals))
        generator
        ...

By the time we're calling this function, I assume that it'll be handed at least one goal. You always want the generator of your first goal, and if you only get the one goal, you just return said generator and you're done. Multiple goals are where you need to pull some fancy footwork. Again, one chunk at a time:

        ...
        (labels ((next-gen ()
                   (let ((res (funcall generator)))
                     (if (fail? res)
                         (fail)
                         (setf rest-generator (match-ands (rest goals) res facts)))))
                 ...

This is where we set the rest-generator from earlier. It's just the procedure that will return the next result from proving the rest of the goals, given the bindings we got by proving the first goal against the starting bindings handed to match-ands. If calling the first goal's generator fails, we likewise fail; otherwise we set rest-generator to the generator we create by passing the result back up to match-ands.

                 ...
                 (backtrack! ()
                   (if (fail? (next-gen))
                       (fail)
                       (next)))
                 ...

Occasionally, we have to backtrack. Which in this context means we try to call next-gen. If that fails, we likewise fail, otherwise we invoke next. Which...

                 ...
                 (next ()
                   (if (null rest-generator)
                       (backtrack!)
                       (let ((res (funcall rest-generator)))
                         (if (fail? res)
                             (backtrack!)
                             res)))))
                       ...

...sets up an initial rest-generator if there isn't one, then tries to call it. If that fails, we backtrack![3], otherwise we return the result.

          ...
          #'next))))

That next function I just described is the generator we want for a multi-goal proof, which means that it's the final return value from match-ands.

The only required piece of infrastructure left is for-all itself. We want it to be able to provide results, or do something with them lazily. Which means it'll look something like

(defmacro for-all (goal-term &key in get apply)
  (assert in nil "Need a database to query...")
  (when (and get apply)
    (format t ":apply and :get arguments passed in; ignoring :get"))
  (with-gensyms (template gen res facts)
    `(let* ((,facts ,in)
            (,gen ,(cond ((eq 'and (car goal-term))
                          `(match-ands ',(replace-anonymous (rest goal-term)) +succeed+ ,facts))
                         ((eq 'or (car goal-term))
                          `(match-ors ',(replace-anonymous (rest goal-term)) +succeed+ ,facts))
                         (t
                          `(match-single ',goal-term +succeed+ ,facts))))
            ,@(unless apply
              `((,template ',(replace-anonymous (or get goal-term))))))
       (loop for ,res = (funcall ,gen)
          while ,res collect ,(if apply
                                  `(apply (lambda ,(variables-in apply) ,apply)
                                          (subst-bindings ,res ',(variables-in apply)))
                                  `(subst-bindings ,res ,template))))))

Which isn't nearly as complicated as it seems at first glance. Let's go through that too.

(defmacro for-all (goal-term &key in get apply)
  (assert in nil "Need a database to query...")
  (when (and get apply)
    (format t ":apply and :get arguments passed in; ignoring :get"))
    ...

First, the arguments; we're trying to be helpful with invocation errors here. We want the thing to be readable too, which is why I use "mandatory" keyword arguments in this one.

  ...
  (with-gensyms (template gen res facts)
    `(let* ((,facts ,in)
            (,gen ,(cond ((eq 'and (car goal-term))
                          `(match-ands ',(replace-anonymous (rest goal-term)) +succeed+ ,facts))
                         ((eq 'or (car goal-term))
                          `(match-ors ',(replace-anonymous (rest goal-term)) +succeed+ ,facts))
                         (t
                          `(match-single ',goal-term +succeed+ ,facts))))
   ...

Here we're setting up some name sanitation for certain words we'd like to use in the definition that should still be usable by callers of for-all. Note the use of replace-anonymous; its definition can be found in Norvig's Prolog implementation. The entirety of that cond decides which of our matchers we're going to use to traverse our corpus.

            ...
            ,@(unless apply
              `((,template ',(replace-anonymous (or get goal-term))))))
            ...

If we get passed the apply argument, we'll be doing something special later. Otherwise, we'll want to slot our results into the template given by get and, failing that, just slot them back into the querying goal form.

       ...
       (loop for ,res = (funcall ,gen)
          while ,res collect ,(if apply
                                  `(apply (lambda ,(variables-in apply) ,apply)
                                          (subst-bindings ,res ',(variables-in apply)))
                                  `(subst-bindings ,res ,template))))))

And that's the meat of it. We're going to be grabbing results out of our generator. As you can see, the special thing we're doing with the apply argument is stitching up a function to apply to a substituted list of our results. If we didn't get an apply, we're just slotting said result back into the template we defined earlier. I find that seeing some macroexpansions really helps understanding at this stage. So, here are the basics:

Plain single-goal:

CL-USER> (macroexpand '(for-all (?id :author "Inaimathi") :in my-fact-base))
(LET* ((#:FACTS1073 MY-FACT-BASE)
       (#:GEN1071
        (MATCH-SINGLE '(?ID :AUTHOR "Inaimathi") +SUCCEED+ #:FACTS1073))
       (#:TEMPLATE1070 '(?ID :AUTHOR "Inaimathi")))
  (LOOP FOR #:RES1072 = (FUNCALL #:GEN1071)
        WHILE #:RES1072
        COLLECT (SUBST-BINDINGS #:RES1072 #:TEMPLATE1070)))
T

or-goals:

CL-USER> (macroexpand '(for-all (or (?id :author "Inaimathi") (?id :message ?message)) :in my-fact-base))
(LET* ((#:FACTS1077 MY-FACT-BASE)
       (#:GEN1075
        (MATCH-ORS '((?ID :AUTHOR "Inaimathi") (?ID :MESSAGE ?MESSAGE))
                   +SUCCEED+ #:FACTS1077))
       (#:TEMPLATE1074 '(OR (?ID :AUTHOR "Inaimathi") (?ID :MESSAGE ?MESSAGE))))
  (LOOP FOR #:RES1076 = (FUNCALL #:GEN1075)
        WHILE #:RES1076
        COLLECT (SUBST-BINDINGS #:RES1076 #:TEMPLATE1074)))
T

and-goals:

CL-USER> (macroexpand '(for-all (and (?id :author "Inaimathi") (?id :message ?message)) :in my-fact-base))
(LET* ((#:FACTS1081 MY-FACT-BASE)
       (#:GEN1079
        (MATCH-ANDS '((?ID :AUTHOR "Inaimathi") (?ID :MESSAGE ?MESSAGE))
                    +SUCCEED+ #:FACTS1081))
       (#:TEMPLATE1078
        '(AND (?ID :AUTHOR "Inaimathi") (?ID :MESSAGE ?MESSAGE))))
  (LOOP FOR #:RES1080 = (FUNCALL #:GEN1079)
        WHILE #:RES1080
        COLLECT (SUBST-BINDINGS #:RES1080 #:TEMPLATE1078)))
T

Using the :get option

CL-USER> (macroexpand '(for-all (and (?id :author "Inaimathi") (?id :message ?message)) :in my-fact-base :get ?message))
(LET* ((#:FACTS1085 MY-FACT-BASE)
       (#:GEN1083
        (MATCH-ANDS '((?ID :AUTHOR "Inaimathi") (?ID :MESSAGE ?MESSAGE))
                    +SUCCEED+ #:FACTS1085))
       (#:TEMPLATE1082 '?MESSAGE))
  (LOOP FOR #:RES1084 = (FUNCALL #:GEN1083)
        WHILE #:RES1084
        COLLECT (SUBST-BINDINGS #:RES1084 #:TEMPLATE1082)))
T

Using the :apply option

CL-USER> (macroexpand '(for-all (and (?id :author "Inaimathi") (?id :message ?message)) :in my-fact-base :apply (format t "~s~%   -Inaimathi" ?message)))
(LET* ((#:FACTS1089 MY-FACT-BASE)
       (#:GEN1087
        (MATCH-ANDS '((?ID :AUTHOR "Inaimathi") (?ID :MESSAGE ?MESSAGE))
                    +SUCCEED+ #:FACTS1089)))
  (LOOP FOR #:RES1088 = (FUNCALL #:GEN1087)
        WHILE #:RES1088
        COLLECT (APPLY
                 (LAMBDA (?MESSAGE) (FORMAT T "~s~%   -Inaimathi" ?MESSAGE))
                 (SUBST-BINDINGS #:RES1088 '(?MESSAGE)))))
T
CL-USER> 

And that's that. Granted, the implementation is a bit more complicated than just writing manual loops, but I'm convinced there are a couple wins here. Firstly, the invocation is simpler, which means that the above definitions will eventually "pay for themselves" in terms of complexity. Secondly, it seems like I could fairly easily mod this into parenscript-friendly forms, which means this'll save me from having to convert fact-bases to object lists on the client side. But that's something I'll tell you about next time.


Footnotes

1 - [back] - Almost always using the question-mark-prefix notation for logic variables for some reason. I'm not sure what the approach gains or loses you yet. I guess in the case of miniKanren, it gains you the ability to unify on vectors since there's no ambiguity, and it might make it easier to read the resulting programs, but I'm not banking on that.

2 - [back] - Though do go over Norvig's version to see a dissection of the common bugs.

3 - [back] - And remember, backtrack! itself fails if it runs out of search space.

Saturday, April 5, 2014

Buildapp Minutia

Before I get to telling you what I've been up to lately, let me share some insight into what it takes to build Lisp executables on Windows and Linux. This is something I had to do recently at work. I wouldn't normally bother, but we still have Windows dev boxes, and it seemed like an interesting enough exercise. In no particular order, here are things you need to think about when you're putting things together with buildapp.

Be careful with your storage folders

My usual tactic for this sort of thing is to store static files and stored files in the project directory. It's easy to do because I just get to use relative paths everywhere, and it works well enough if your distribution plan is basically git clone https://github.com/you/whatever.git. The only downside is that you have to run the program in the context of its project directory, or have random static folders pop up in random places.

For people who don't just randomly write Lisp code for fun, this isn't exactly acceptable. To do proper binary deployment for a more general audience, you need to make sure that there's a single consistent place for your program to store stuff. And if you want to be cross-platform about it, you can't just do (defvar *storage* "~/project-name/") and call it a day. Fortunately, user-homedir-pathname is a thing, so what I ended up doing is setting my storage directory to (merge-pathnames ".project-name/" (user-homedir-pathname)). However, it wasn't quite that simple.

My other usual tactic is to just put such config variable declarations in the appropriate package.lisp file, and leave it at that. Unfortunately, when I tried that with buildapp, the first teammate who tried it responded with

Why is this complaining that "C:\Users\Inaimathi\2dmacs\" doesn't exist?

The problem is that buildapp bakes your image into the binary it ships, including anything you did at compile or config time. What I really wanted was for my binary to check at runtime what the user's home directory is and use that as the starting point for its own storage directory. The way I ended up doing that, though I'm sure others are possible, was to define the appropriate variables where I'd usually put them, then set them again in the entry-point function. Mine happened to be called main, because why not, so what you'd see if you looked into the appropriate files is[1]

;;; package.lisp

...

(defparameter *storage* nil)

...
;;; start.lisp

...

(defun main (&optional argv)
  ...
  (setf *storage* (merge-pathnames ".2dmacs/" (user-homedir-pathname)))
  ...
  )

That lets the appropriate functions access storage files, but makes sure that *storage* will ultimately be set to the appropriate directory on the user's machine rather than yours.

Native SBCL shell sucks balls under Windows

We had a couple problems deploying a specific part of the system that dealt with running external programs. And like the title says, that was balls on Windows. Not that I expected it to run perfectly across operating systems, but it turned out to be particularly thorny. First off, external-program doesn't do what we wanted. It kinda does, but doesn't seem to let you have a hook in to search the users' environment variables for program locations. Luckily, we're using SBCL, and its native sb-ext:run-program does give you this facility in the form of the search keyword argument. Also, for some reason SBCL handles streams really oddly in Windows? Not sure why, but the end result was that I couldn't capture *standard-output* in the way I wanted to in one part of the code. We ended up dealing with that by calling sb-ext:run-program with :output :stream, then collecting the result through sb-ext:process-output instead of direct stream capture. Cool, I guess. It seemed to work in testing. But the particular shell command we were using started seizing up once we hit a particular corpus size[2]. The solution we ended up taking was using uiop:run-program instead. It deals with streams the way I was expecting and handles all shell commands we've thrown at it so far very snappily.
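
For reference, here's a rough sketch of the two approaches; git ls-tree stands in for whatever external command you're running, and the surrounding glue is mine rather than the project's actual code:

;; SBCL-native: :search t looks the program up on PATH,
;; and the output comes back as a stream we have to read out manually
(let ((proc (sb-ext:run-program "git" '("ls-tree" "HEAD")
                                :search t :output :stream :wait nil)))
  (with-open-stream (out (sb-ext:process-output proc))
    (loop for line = (read-line out nil) while line collect line)))

;; What we settled on: uiop:run-program hands the output back directly
(uiop:run-program '("git" "ls-tree" "HEAD") :output :string)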

Finding dependencies is ... non-trivial

buildapp doesn't load dependencies automatically. It expects you to pass it a bunch of --load-system command line arguments for things you want included in the final binary. The first thing I did was check the docs. There didn't seem to be a dependency-tree call anywhere, so I wrote this one. On a whim though, I dropped by #lisp, where I got the opportunity to ask Zach about it[3]. It turns out that what I wrote is really only going to get you the dependencies declared in asdf files, and there are apparently enough people out there who hook manual load or ql:quickload statements in through odd, ad-hoc ways that he doesn't go that route. Turns out that what he does is asdf:load-system/ql:quickload the thing he wants to build, then runs ql:write-asdf-manifest-file to find the list of all systems included in the image after everything relevant has been loaded. The buildapp call you make after that needs to look something like buildapp --manifest-file wherever/you/had/ql:write-manifest-file/output.txt --load-system foo --load-system bar ... --entry your-entry-fn --output your-program-name. Mildly annoying, but at least it'll get you what you need.
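
In other words, it ends up being a two-step dance. Here's a minimal sketch of the manifest step, run in a fresh image (the system and file names are placeholders):

(ql:quickload :your-project)
(ql:write-asdf-manifest-file "build/system-index.txt")

Step two is the buildapp invocation above, pointed at that system-index.txt.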


Footnotes

1 - [back] - Ok, it's mildly more complicated than this too, because I wanted to deal with some command line arguments, and I wanted some static files to be optionally re-generated at each program load. You can still see the basic principle.

2 - [back] - If you must know, it was git ls-tree, and the call started hanging around the time that we committed the ~third file into the target repository.

3 - [back] - Here's the transcript, for those of you who want the details:

13:58 <inaimathi> Anyone here who knows things about asdf
                  dependencies, and is willing to answer
                  questions/lend an eye?
13:58 <Xach> inaimathi: I know a bit. What's up?
13:59 <inaimathi> I'm trying to find the dependency tree
                  (preferably ordered) of a given asdf system.
13:59 <inaimathi> Is there a build-in way of doing that?
13:59 <inaimathi> *built
13:59 <inaimathi> Hm
14:00 <inaimathi> Actually, you might be able to help with the
                  larger problem too. The real problem is that I'm
                  trying to run buildapp for a project, and I want
                  to know what systems I need to load as part of
                  the shell command.
14:00 <Xach> oh. the way i do that is to load the project once with
             dependencies downloaded automatically and then note
             what was loaded.
14:00 <Xach> then i load it again with just those things.
14:01 <Xach> it is not great but arbitrary things load during
             find-system time so i'm not sure if there's a nice way
             around it.
14:01 <inaimathi> How do I go about doing that?
14:02 <inaimathi> That is, loading a project while getting output
                  of what's being loaded. Is there a ql flag or
                  something?
14:03 <Xach> inaimathi: i don't actually note the specifics. i just
             load it, and then have quicklisp dump out an index to
             the currently installed libraries via
             (ql:write-asdf-manifest-file
             "/path/to/my/project/system-index.txt")
14:03 <inaimathi> Ah
14:03 <Xach> then i use buildapp --asdf-manifest system-index.txt
             --<the rest of the stuff>
14:04 <Xach> Each time I make a new project I refine the makefile
             technique a little more
14:05 <inaimathi> Ok then. Do you think something like
                  http://stackoverflow.com/a/22732580/190887 could
                  be a valid approach, or is that going to miss
                  things?
14:07 <Xach> inaimathi: One difficulty arises from .asd files with
             things like (eval-when ... (asdf:load-system
             "some-prerequisite"))
14:07 <inaimathi> Right; those wouldn't be noted by the asdf system
                  itself.
14:07 <inaimathi> Dammit.
14:08 <Xach> I also don't know if the slot you're looking at
             includes :defsystem-depends-on dependencies.
14:08 <Xach> inaimathi: I once asked about how to do this on
             asdf-devel and I got an answer I didn't really
             understand (it was complicated) and I haven't
             revisited it. And I'm not sure there's an archive you
             can search for it.
14:09 <rtoym> asdf-devel is on gmane.org
14:10 <inaimathi> Hm. I'll take the asdf-manifest-file approach for
                  the moment, and probably ask around on asdf-devel
                  later.
14:10 <inaimathi> Thanks!

Tuesday, March 25, 2014

Fact Base Indices

Got through a late-night programming binge, followed by more of the same on my lunch break, followed by a couple more days of thought and work during downtime. It's up at my fact-base implementation. Basically, we have indices now. Fairly naive, simple indices, but they should suffice for what I'm up to[1].

Before we get to discussing any code, lets back up and discuss the idea of an index for a moment.

The Idea of an index

An index in this context is an extra layer on top of our fact storage engine that keeps track of what we've put in/taken out in a way that makes certain things easier to look up. It's easier for fact-bases than it is for relational databases. Since every fact is made up of three components[3], all we have to do is keep an index by one or two slots. What we're basically looking to do is a really fast lookup of some subset of all facts in a base based on one or two of those keys[4]. The way I've chosen to do it, after some advice from friends who've used systems something like this, is by maintaining hash tables[5] in memory that give you shortcuts to some specified indices. We're trading space[6] for time[7].
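
To make that concrete, here's a minimal hand-rolled picture of a :b index; this is just illustrative plumbing, not the library code you'll see below:

(defparameter *by-b* (make-hash-table :test 'equal))

(dolist (fact '((0 :message "First") (1 :message "Second") (1 :author "Inaimathi")))
  ;; key each fact on its second slot, which is exactly the shape a :b index keeps
  (push fact (gethash (list (second fact)) *by-b*)))

(gethash '(:message) *by-b*)
;; => ((1 :message "Second") (0 :message "First"))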

The Post-Explanation Version

This was a much different article initially; I was going to discuss some bone-headed intermediate states for this code before getting to the "final"[8]. It ended up looking like a stupid idea in this particular instance because the previous versions weren't things I'd consider running after having thought through it a bit more.

I try to do that a fair amount these days.

Not the stupid implementations thing, though that's also true. I mean sit down with another actual human being and talk them through the code I just wrote. It's not quite as good as blogging about it, but it's faster and turns up almost as many issues. It also has an added benefit that blogging doesn't seem to give me, which is showing me when I'm completely off my nut. Anyhow, here's what I had after I explained the thing to someone, poured some shower time into thinking about it, and drafting the first version of this post:

(in-package #:fact-base)

(defclass index ()
  ((table :reader table :initform (make-hash-table :test 'equal))))

(defun make-index (indices)
  (let ((index (make-instance 'index)))
    (loop for ix in indices
       do (setf (gethash ix (table index))
                (make-hash-table :test 'equal)))
    index))

(defmethod indexed? ((state index) (ix-type symbol))
  (gethash ix-type (table state)))

(defun decide-index (&optional a b c)
  (cond ((and a b c) (list :abc a b c))
        ((and a b) (list :ab a b))
        ((and a c) (list :ac a c))
        ((and b c) (list :bc b c))
        ((and a) (list :a a))
        ((and b) (list :b b))
        ((and c) (list :c c))))

(defmethod format-index ((ix-type symbol) (fact list))
  (destructuring-bind (a b c) fact
    (case ix-type
      (:abc (list a b c))
      (:ab (list a b))
      (:ac (list a c))
      (:bc (list b c))
      (:a (list a))
      (:b (list b))
      (:c (list c)))))

(defmethod map-insert! ((facts list) (state index))
  (dolist (f facts) (insert! f state)))

(defmethod insert! ((fact list) (state index))
  (loop for ix being the hash-keys of (table state)
     for ix-table being the hash-values of (table state)
     do (push fact (gethash (format-index ix fact) ix-table))))

(defmethod delete! ((fact list) (state index))
  (loop for ix being the hash-keys of (table state)
     for ix-table being the hash-values of (table state)
     for formatted = (format-index ix fact)
     do (setf (gethash formatted ix-table) 
              (remove fact (gethash formatted ix-table) :test #'equal :count 1))
     unless (gethash formatted ix-table) do (remhash formatted ix-table)))

;;;;; Show methods
;; Entirely for debugging purposes. 
;; Do not use in production. 
;; Seriously.
(defmethod show (thing &optional (depth 0))
  (format t "~a~a" (make-string depth :initial-element #\space) thing))

(defmethod show ((tbl hash-table) &optional (depth 0))
  (loop for k being the hash-keys of tbl
     for v being the hash-values of tbl
     do (format t "~a~5@a ->~%" 
                (make-string depth :initial-element #\space) k)
     do (show v (+ depth 8))
     do (format t "~%")))

(defmethod show ((ix index) &optional (depth 0))
  (show (table ix) depth))

There, that's not so intimidating, is it? The actual interface to these indices is in fact-base.lisp, but the above contains most of the functionality. First, ignore the show methods at the bottom there. That was just a piece of hackery to give me a usable visual representation of an index while I was debugging this beast. Let's start at the top.

(defclass index ()
  ((table :reader table :initform (make-hash-table :test 'equal))))

(defun make-index (indices)
  (let ((index (make-instance 'index)))
    (loop for ix in indices
       do (setf (gethash ix (table index))
                (make-hash-table :test 'equal)))
    index))

(defmethod indexed? ((state index) (ix-type symbol))
  (gethash ix-type (table state)))

An index has a table of lookup tables. You'd call make-index in a way resembling (make-index '(:a :b :bc :ac)), which would give you back an index with room to dissect a fact base into chunklets keyed off of

  1. the first element
  2. the second element
  3. the second then the third element
  4. the first then the third element

No, I have no idea if I'll ever actually need a setup like this. The indexed? method takes an index and an ix-type symbol and tells you whether the given index is tracking that particular type of lookup.

(defun decide-index (&optional a b c)
  (cond ((and a b c) (list :abc a b c))
        ((and a b) (list :ab a b))
        ((and a c) (list :ac a c))
        ((and b c) (list :bc b c))
        ((and a) (list :a a))
        ((and b) (list :b b))
        ((and c) (list :c c))))

(defmethod format-index ((ix-type symbol) (fact list))
  (destructuring-bind (a b c) fact
    (case ix-type
      (:abc (list a b c))
      (:ab (list a b))
      (:ac (list a c))
      (:bc (list b c))
      (:a (list a))
      (:b (list b))
      (:c (list c)))))

Those both map an index type to the components they'll need for insertion/lookup. I've thought about factoring out the obvious pattern, and even wrote some prototype code to do it, but it turns out that for 6 indices, the macrology involved is more complicated than the trivial lookup-table thing. If fact-base were an arbitrary storage system, this would probably be complicated enough to macro away, but the whole point is that I'm only ever storing triples. Which means I'm never going to need more index types than this, and usually much less.
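
A quick hypothetical REPL sketch of the pair:

(decide-index 1 :author nil)
;; => (:AB 1 :AUTHOR)   ; the most specific index a partial fact like this can use

(format-index :ab '(1 :author "Inaimathi"))
;; => (1 :AUTHOR)       ; the key this fact gets filed under in the :ab table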

(defmethod map-insert! ((facts list) (state index))
  (dolist (f facts) (insert! f state)))

(defmethod insert! ((fact list) (state index))
  (loop for ix being the hash-keys of (table state)
     for ix-table being the hash-values of (table state)
     do (push fact (gethash (format-index ix fact) ix-table))))

(defmethod delete! ((fact list) (state index))
  (loop for ix being the hash-keys of (table state)
     for ix-table being the hash-values of (table state)
     for formatted = (format-index ix fact)
     do (setf (gethash formatted ix-table) 
              (remove fact (gethash formatted ix-table) :test #'equal :count 1))
     unless (gethash formatted ix-table) do (remhash formatted ix-table)))

These three utility methods at the end are exactly what you'd expect. insert! takes a fact and an index and inserts one into the other, into however many particular lookups that index is tracking. map-insert! is a shorthand for inserting a bunch of facts at once into the same index. And finally, delete! takes a fact and removes it from all index lookups, then cleans up empty lookup lists.
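
Here's a small hypothetical usage sketch tying the three together:

(defparameter *ix* (make-index '(:a :ab)))

(map-insert! '((0 :message "Hello") (0 :author "Inaimathi")) *ix*)

(gethash '(0) (gethash :a (table *ix*)))
;; => ((0 :AUTHOR "Inaimathi") (0 :MESSAGE "Hello"))

(delete! '(0 :message "Hello") *ix*)
(gethash '(0) (gethash :a (table *ix*)))
;; => ((0 :AUTHOR "Inaimathi"))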

And that's that. You've got a quick run-through of how this works out in practice here. I get the feeling I'll be talking about deltas, forking and the applications of fact bases before long, but we shall see.


Footnotes

1 - [back] - Granted, because "what I'm up to" at this point is "an almost trivial semi-anonymous forum for a local meetup group" and "an almost trivial notebook-style REPL for Common Lisp", that's true of almost any data storage technique ever, but still. The naive, index-less storage was giving cl-kanren[2] a bit more trouble than I wanted it to when I pushed the stored entry count past 10000 or so. Which is not satisfactory. So yeah, this index is basically a search-space optimization for my database traversals. I'll let you know how it goes.

2 - [back] - Which I'll also have to talk about before long, if for no reason other than to get some ideas out of my head temporarily.

3 - [back] - The second two might be compound structures, but it's still only three top-level elements.

4 - [back] - If you have none of the three components of the fact you're looking for, you can't do better than "all facts"; if you have all of them, you already have the fact you're looking for. So we're only interested in the other two cases.

5 - [back] - There's no particular reason you couldn't use some tree structure if you like, but hashes are easy, and they come with Common Lisp, so...

6 - [back] - Which is in ample supply these days.

7 - [back] - Which is always a vanishingly scarce resource.

8 - [back] - Which isn't really final any more; I've updated it several times since the stuff here.

Thursday, March 20, 2014

Housekeeping

Just some quick housekeeping while I draft up my next proper piece.

cl-cwd

A friend of mine mentioned that he had to hack up cl-git-fs to make it export cwd and with-cwd. Which was surprising when I heard it, but really shouldn't have been. This is the sort of problem Common Lispers seem to solve by copy/pasting a few implementation-specific snippets into projects that need the functionality. That's not good enough for me. So, here's cl-cwd; a library to get/manipulate/temporarily manipulate your current working directory in a cross-platform and cross-implementation way. Patches and bug reports welcome as always, and hopefully it's useful for you. I haven't gotten around to hacking the same components out of cl-git-fs, but I might eventually. Or, I might just use the os section of UIOP, which was only pointed out to me after I posted that cl-cwd repo. I'm not entirely sure what the right approach here is; are monolithic utility libraries preferable to very small, single-purpose packages? Not sure, but I kind of prefer the smaller, specific ones for my own purposes. Even though it never wastes enough resources to matter to me, it's deeply irritating on some level to include all of :alexandria and :anaphora for, essentially, with-gensyms, aif, awhen and alambda.
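
In case it helps, the intended usage is roughly the following; the package prefix and exact argument conventions here are my assumptions rather than a transcription of the actual API, so check the README:

;; hypothetical usage sketch
(ql:quickload :cl-cwd)

(cl-cwd:cwd)  ;; returns the current working directory

(cl-cwd:with-cwd "/tmp/"
  ;; anything in the body runs with /tmp/ as the working directory,
  ;; which gets restored afterwards
  (cl-cwd:cwd))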

fact-base updates and relevant minutia

I've gotten some work done on fact-base, which on reflection really should have been called cl-triple-store, with an eye on using it in production. We'll see if it actually happens, or if it just remains an idle fancy, but

  1. you can now add indexes to your fact bases, which should make data retrieval faster[1]
  2. writing deltas is now quite efficient[2]

Again, let me know if it's useful, or if it's sooooo-close-to-being-useful-if-it-only-had-this-one-feature. I might do something about it at this stage.

The only related piece of minutia is that I've found myself reading a lot about miniKanren, core.logic and logic programming in general. I think it might be a really good way to query structures like these triple-stores I've been building lately. Also, it turns out someone's already done most of the work of implementing that in CL for me, so I forked it and added/changed the few chunklets I needed to. Matthew Swank, if you're reading this and care, let me know and I'll send you patches. I assumed you wouldn't care, since the copyright line said 2008, but I might have been wrong.

Editors

Someone finally sat down and walked me through the installation for Light Table[3]. I haven't been paying attention to the propaganda, so I'm not entirely sure exactly how this is going to revolutionize editing[4], but it looks promising for a prototype. I was able to get to a productive-ish point with it within about an hour, and that's a high bar. I remember learning Emacs[5] over the course of weeks.

Another one I tried enough to appreciate is IPython Notebook. The idea of a browser-based editor has, shall we say, crossed my mind, but the idea of applying it to 1D coding hadn't occurred to me. I gotta say, I like the idea, and I'll be trying to do something about that right here. I've only got the easy part done so far; a front-end of code-mirror rigged up to a Lisp process that evaluates things and sends the result values back[6]. The hard part is going to involve a persistence layer, multiple cells, multiple notebooks and probably some smaller browsing/presentation/generation features to let me do whatever[7]. Spoiler warning: fact-base and its history/indexing operations will feature prominently. The ultimate goal is no less than replacing Emacs as my Common Lisp IDE of choice.

And that's sort of why I've been on another editor/tools kick lately.

I've been talking to some friends in the local programming community, and we seem to have collectively decided that we really, really want Light Table to succeed because it sucks not being able to recommend anything as a starting tool for Common Lisp development to programming newbies. Our options right now are what can charitably be referred to as a very intimidating operating system/RSS aggregator/IRC client/moustache trimmer[8], a wannabe very intimidating blah blah blah, a very pretty, passably comfortable pair of handcuffs, a very performant, robust, ridiculously expensive pair of handcuffs, and vim. If you're recommending Scheme, there's also Racket[9], but that still doesn't count as a Common Lisp environment.

Some of those are things I'd recommend to people who are already programmers. One of them is something that I'd recommend to people looking to learn languages for the sake of learning them. But there isn't a tool anywhere on that list that compares to something like IDLE or, more to the point, Light Table. I don't think it's quite at the point where I'd switch over to it myself, but I sincerely hope it gets there. And I'll be devoting some effort to pushing that along. You can too, since it's up and properly licensed and all...


Footnotes

1 - [back] - There's an article either in the works or already posted, depending on what order I decide to put these up.

2 - [back] - From a macro-optimization perspective; I'm sure I could pull some micro-optimizations on it too, but I'm not into that.

3 - [back] - Turns out that even the Debian Unstable repo doesn't have a recent enough version of Leiningen. I needed to get the latest using this.

4 - [back] - In fact, the only visible revolutionization, "randomly putting shit in your buffers that you can't edit instead of giving you a goddamn REPL", is profoundly annoying, though it's possible to disagree about these things. I guess I can imagine it looking really snazzy in presentations and demo videos.

5 - [back] - ...and before that jEdit, and before that Eclipse.

6 - [back] - And yes, it handles multiple values as well as *standard-output* emissions, in case you were wondering.

7 - [back] - At minimum the possibility to write :cl-who/:cl-css instead of markdown, a way of installing libraries that doesn't go through quicklisp, and maybe some git/drakma/shell support. I may end up having to add non-shitty static file serving to house for this...

8 - [back] - Pre-built Lisp IDE version optional.

9 - [back] - Which I do wholeheartedly recommend; it's very cool.