Saturday, March 26, 2011

A Little Bit of Elisp

I've had too much Common Lisp coding at work this week. I basically did two 12 hour sessions across Wednesday and Thursday, then a 4 hour on Friday with a little time off for fighting PHP-related fires and a bit of sleep. So today, I took a break. And what did I do on my break, you might ask?

I hacked Emacs Lisp.

I tell ya, my fiancee loves me. That's right. I took a break from my Common Lisp day-job by getting up on Saturday and dusting off some old Elisp code I had lying around. I touched up some git-mode customizations, an etags library I sometimes use, and my .emacs itself, but my main target was the blog-mode module (which I've actually been using to write these articles, except for one awkward brush with a markdown converter). It has served, but the code was far from elegant, and there were a couple of features I've been meaning to add, but never quite got around to, always telling myself to just get through the blog post instead. The code is still far from elegant, so I won't talk about that, but the features are there.

First thing, and probably the most pressing, is that those nice highlighted code-blocks were getting annoying. It would work fine for plain gray text (which I use sometimes, in small inline snippets), but to do it properly, I had to paste code into a separate buffer, turn on the correct highighting mode, run htmlize-buffer on it, then paste it back into the blog post and maybe tweak it for good measure. I figured that my ideal interaction would be the code auto-detecting what language I'm using and highighting correctly, but one step back would be asking for a highlighting mode and applying it to the code I wanted to htmlize. So here's how that looks

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; <pre> and <code> definitions
(definsert code-block "<pre>" "</pre>")
(definsert inline-code "<code>" "</code>")

;; region versions are more complicated to accomodate htmlize
(defun region-to-inline-code (code-mode)
  "HTMLize just the current region and wrap it in a <code> block"
  (interactive "CMode name: ")
  (let* ((start (region-beginning))
         (end (region-end))
         (htmlified (get-htmlified-region start end code-mode)))
    (delete-region start end)
    (insert-inline-code)
    (insert htmlified)))

(defun region-to-code-block (code-mode)
  "HTMLize the current region and wrap it in a <pre> block"
  (interactive "CMode name: ")
  (let* ((start (region-beginning))
         (end (region-end))
         (result (get-htmlified-region start end code-mode)))
    (delete-region start end)
    (insert-code-block)
    (insert result)))

(defun get-htmlified-region (start end code-mode)
  "Returns a string of the current region HTMLized with highlighting according to code-mode"
  (let ((htmlified nil))
    (clipboard-kill-ring-save start end)
    (get-buffer-create "*blog-mode-temp*") ;;using 'with-temp-buffer here doesn't apply correct higlighting
    (with-current-buffer "*blog-mode-temp*"
      (funcall code-mode)
      (clipboard-yank)
      (setq htmlified (substring (htmlize-region-for-paste (point-min) (point-max)) 6 -6)))
    (kill-buffer "*blog-mode-temp*")
    htmlified))

I pasted that block in from my code file, highlighted it, then typed C-c C-p emacs-lisp-mode [ret], in case you were wondering. The result was that pretty block above. region-to-code-block and region-to-inline-code are actually the same function except for which insert they use, and I would factor that out if it ever got to the point that there needed to be a third function doing the same, but it doesn't seem worth it for just two functions.

EDIT:

Ok, ok goddammit. Here. They're simplified now.

(defun region-to-inline-code (code-mode)
  "HTMLize just the current region and wrap it in a <code> block"
  (interactive "CMode name: ")
  (htmlized-region code-mode #'insert-inline-code))

(defun region-to-code-block (code-mode)
  "HTMLize the current region and wrap it in a <pre> block"
  (interactive "CMode name: ")
  (htmlized-region code-mode #'insert-code-block))

(defun htmlized-region (code-mode insert-fn)
  (let* ((start (region-beginning))
         (end (region-end))
         (result (get-htmlified-region start end code-mode)))
    (delete-region start end)
    (funcall insert-fn)
    (insert result)))
Sun, 27 Mar, 2011

I uh, also put in an edit block function and a footnote manager[1]. The edit blocks are pretty self-explanatory; just a block with a date at the bottom to indicate when I did the thing. After a couple of definition macros[2], it's actually a one-liner.

(deftag edit "<span class=\"edit\">EDIT:\n\n" (concat "\n" (format-time-string "%a, %d %b, %Y" (current-time)) "</span>"))

The footnote manager is a bit more complex. I've actually been doing them manually for the last little while, which started to get frustrating[3]. The process was to put a numbered tag down with <a name="somethingHopefullyUnique">, and hook it up to a correspondingly numbered [back] link at the bottom of the page, then write the footnote, then find my way back. The linking turns out to be the hardest part there, because these posts potentially get displayed together on my blog, so I had to be very careful to make the name unique across the entire blogs' history, not just within that article[4]. With this new function, instead it's C-c f to insert a fresh footnote, or C-c C-f to convert the selected region to a footnote. The links are generated and numbered automatically, so all I have to do is actually write the footnote[5].

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; footnote definitions
(defun insert-footnote ()
  "Inserts footnote, and a return link at the bottom of the file. 
   Moves point to footnote location."
  (interactive)
  (progn (footnotes-header)
         (let ((footnote-name (format-time-string "%a-%b-%d-%H%M%S%Z-%Y" (current-time)))
               (num (number-to-string (+ 1 (count-footnotes)))))
           (insert "<a href=\"#foot-" footnote-name "\" name=\"note-" footnote-name "\">[" num "]</a>")
           (goto-char (point-max))
           (insert "\n\n" num " - <a href=\"#note-" footnote-name "\" name=\"foot-" footnote-name "\">[back]</a> - "))))

(defun region-to-footnote ()
  "Inserts a footnote at point and return link at the bottom. Moves the current region to the end of the file. 
   Leaves point where it is."
  (interactive)
  (save-excursion (kill-region (region-beginning) (region-end))
         (insert-footnote)
         (yank)))

(defun footnotes-header ()
  "Inserts footnote header if not already present"
  (unless (save-excursion (search-forward blog-footnote-header nil t))
    (save-excursion 
      (goto-char (point-max))
      (insert "\n\n" blog-footnote-header))))

(defun count-footnotes ()
  "Returns the number of footnotes in the current file. Used for human-readable note labels"
  (interactive)
  (save-excursion
    (if (not (search-forward blog-footnote-header nil t))
        0
      (let ((count -1))
        (while (progn (setq count (1+ count))
                      (search-forward "<a href=\"#note-" nil t)))
        count))))

Boy, that's playing hell with the highlighting right now. It's fairly self-explanatory; count-footnotes counts up how many footnotes I have left, footnotes-header checks if there's a footnote header in the post already[6], insert-footnote just creates a new footnote/backlink and takes me to the bottom of the page to write it, and finally, region-to-footnote takes the current region and converts it to a new footnote (leaving the point where it is).

Even though it's a simple, and specific[7] piece of code, I still learned a lot by testing it out like this. Specifically, the code formatting functions need to accept nil as an argument[8] (which should take 5 minutes), and the footnote section needs a way to re-number footnotes and jump between corresponding note/back links (which seems like it could take a while).

I'm going to sleep now though; I'll leave those features for the next time I need a break from Common Lisp.

EDIT:

Ok, so it was actually slightly less than 5 minutes to get the code argument done; one line change did it (see if you can guess which one)

(when (fboundp code-mode) (funcall code-mode))

The latest is now up at github.

Sun, 27 Mar, 2011

Footnotes

1 - [back] - And yes, since you ask, I am basically using this post as a way to test the editing mode I'm talking about.

2 - [back] - If you want to see the definition macros, check out the github page I started for my little utility files. The documentation is extremely light, but it's only because I fully expect to be the only one using these.

3 - [back] - To the point that I would frequently include a long, rambling paranthetical comment instead of putting the damned thought in a footnote, where it belongs. Interface difficulties really do lead to a lot of shoddy work, it seems.

4 - [back] - The way I'd been doing that was by using the article name and a number in the href and name parameters. The mode is actually better, using a date and timestamp.

5 - [back] - I still haven't found a way to automate writing these columns, but that's not the same as saying it can't be done.

6 - [back] - And adds one if it doesn't exist yet.

7 - [back] - Which is to say, it had a very specific goal in mind.

8 - [back] - (and default to fundamental-mode in that case)

Wednesday, March 23, 2011

Javascript with a Lisp

Obviously, I'm not a vet yet, so take these musings on Parenscript with a grain of salt. Also, feel free to look up the tutorial they provide for a more hands-on approach; I'm just talking about my experience with it, not attempting to teach it.

There are some ugly, un-abstractable patterns in JavaScript code (which you'll be familiar with if you've ever done more than a tiny bit of jQuery development). They show up often, and you can't really do much about them in JS without resorting to eval. Which you shouldn't do. Parenscript knocks most of them out cold. The argument about jQuery being Good Enough™ also turns out to be moot, since you can easily compose work in both (that is, include jQuery and use Parenscript to generate framework code rather than plain JavaScript). I've created exactly three JS files with this so far, and here are some macros that I'm not sure I'd be willing to do without (I'll start small).

(defpsmacro $ (selector &body chains)
  `(chain (j-query ,selector)
          ,@chains))

That's the pretty essential one I mentioned last time; it just lets you do things like

($ ".target-div" 
     (css (create :height 30 :background-color "#f00")) 
     (effect "explode" 3000))

it's just keeping pace with jQuery. Next up

(defpsmacro \ (&body body) `(lambda () ,@body))

I... honestly wasn't expecting to use this. I'm borrowing Haskell's anonymous function notation for brevity, but only because there's no actual λ key on my keyboard. This is something you don't even notice while coding in JavaScript. You just get used to having to wrap various random things in function () { ... }. It never occurs how annoying this is until you get the chance to do away with it.

(defpsmacro doc-ready (&body body)
  `($ document
      (ready (\ ,@body))))

Told you \ would come in handy (and this is one of the about twenty places it shows up in a 70-line parenscript file). This isn't particularly interesting; just shortcut notation for $(document).ready(function () { ... });.

(defpsmacro defpsajax (name (&rest args) url &optional (success '(lambda (data) ($d data))))
  `(defun ,name ,args
     (chain $ (ajax (create :url ,url
       :data (create ,@(loop for a in args collect (intern (to-str a) :keyword) collect a))
       :context (@ document body)
       :type "POST"
       :success ,success
       :error (lambda (a b error) ($d a b error)))))))

An odd note; I have to quote the default optional function (as above), but I must pass unquoted lambdas in, otherwise it barfs. This one's a bit heavier. It's a shortcut for defining ajax functions. This is the sort of thing you just plain can't do in vanilla javascript. You'd have to define it as

function defPsAjax(address, dataSet, fn) {
    if(!fn) fn = function (data) {$d(data);};
    $.ajax({ url: address,
             type: 'post',
             data: dataSet,
             success: fn,
             error: function (a, b, error) {$d(a, b, error);}
           });
}

and then use it by doing something like

function foo(bar) {
    defPsAjax("/url", { "bar": bar }, function (data) { baz; }); 
}

instead of being able to

(defpsajax foo (bar) "/url" (lambda (data) baz))

I have two problems with that. First, it doesn't kill the boilerplate around defining foo (which you don't have to deal with if you use the macro). Second, that shorter macro definition expands into a full $.ajax call, which means there's no additional overhead from foo calling defPsAjax at runtime. Together, those problems prevent you from properly expressing things in vanilla jQuery; you'll incur (significant) readability and (probably trivial) performance penalties by creating enough intermediate functions. Neither penalty piles up if you use defpsmacro.

There are also a few nice things I get for free (rather than having to define them). As I mentioned last time, having who-ps-html (for easy HTML generation with javascript) and format (for string templating) was already enough to tempt me into using parenscript. Putting strings together in js is fugly. I'm aware of the hacks, and they're not nearly as satisfying as just having a proper string-formatting primitive available in the language. Trying the same tactic with strings which contain HTML tags crosses over into pug fugly territory without so much as a warning. Even if you absolutely must concatenate strings at runtime, (+ foo ", " bar " || " baz) is still easier than foo + ", " + bar + " || " + baz. There's a couple of other similarly useful things that you don't see until you work with them. let and let* are both supported, for starters. let* is actually pretty straightforward

(let* ((a 2)
       (b (+ 2 a)))
    (foo a b))

expands into

var a = 2;
var b = 2 + a;
foo(a, b);

but the equivalent let maintains the limitation that declarations don't refer to each other.

var a1 = 2;
var b = 2 + a;
foo(a1, b);

That might be problematic if you're the sort of person who names variables with numbers at the end. I'm not, so I'll take it.

Another free advantage is optional arguments and implicit returns.

(defun foo (bar &optional (baz "mumble")) baz)

expands into the javascript

function foo (bar, baz){
    if(baz === undefined) {
       baz = "mumble";
    }
    return baz;
}

That's it for the good stuff I've discovered so far (although if you don't agree that macros, easy html formatting, real optional arguments and implicit return are a pretty big pile of win, you might be a JavaScript programmer[1]).

Lets talk about where Parenscript can bite you in the ass.

First, avoid it if you're a lisp newb. There are a lot of parentheses running around when you write your javascript code this way, and just one can make the difference between $(foo).bar({'a': b}); and $(foo).bar.create('a', b);. The real downfall here is that, unlike in plain Common Lisp, it won't throw an error about unbalanced parentheses (if you don't have enough parentheses, it'll still tell you, but it won't give you the typical "expecting [n] args" error if you transpose one). Instead of erroring, it will generate incorrect JS code. This is actually a good argument for using macro-heavy parenscript code because the fewer actual expressions you have to type, the less chance there is that you mistype one. Use your macroexpander and show-paren-mode aggressively.

Second, the chain macro has some fairly odd behaviour with other macros, and it keeps you from abstracting certain patterns without resorting to ps* instead of ps. For instance

(defpsmacro highlight (&optional (color "\#0f0"))
  `(effect "highlight" (create :color ,color) 500))

Having defined that, I would expect (ps ($ "foo" (highlight))) to expand into $("foo").effect('highlight', { 'color': '#0f0' }, 500);, but it actually does $("foo").highlight();. If I wanted to get that first expansion, I'd have to define highlight as

(defun highlight (&optional (color "\#0f0"))
  `(effect "highlight" (create :color ,color) 500))

and call it by doing (ps* `($ "foo" ,(highlight))). That's not actually horrible (we're only into regular fugly here) but it prevents you from fully using your macroexpander, does no work at macroexpansion time and requires you to quote your input. Manageable, but still a net loss.

The last part is that your javascript definitions share the Lisp namespace. Which makes sense, since one of the goals of Parenscript is to have js and CL interoprerate at some level, but it still caught me slightly by surprise. What I mean specifically is

(ps (defun foo () bar))

In addition to expanding out to function foo () { return bar; }, it also defines a Lisp function in the current package called foo. The reason I found this out is that I have a habit of giving JS ajax functions the same name as the functions they'll be interacting with on the server side. Don't do that. I spent a good 15 minutes trying to debug a very odd wrong number of arguments error before realizing that I was accidentally shadowing the function I needed to call.

As a final note, and this should really go without saying, parenscript is not a way to avoid learning JavaScript or jQuery (or your framework of choice). It's a way to simplify development work with them after you know them cold and have more than a few hours logged with Common Lisp. Use it properly and it'll serve you well, go in with a broken/incomplete understanding of JavaScript at your own peril.


1 - [back] I'm putting this footnote here because I don't want that comment to sound bigoted. I make a distinction between "someone who knows JavaScript" (a programmer who, among other languages, also uses JavaScript) and "JavaScript programmer" (someone who knows only JavaScript and is zealously convinced it's Good Enough™). I have nothing against the first group. I have the same contempt for the second group that I reserve for all [x] programmers, whether [x] is JavaScript, C, Java, Basic, C#, Haskell or Lisp.

Friday, March 11, 2011

Puzzling with Lisp

This post started out as a reply to a lisper on codereview.stackexchange.com. It got pretty long, so I ended up posting the final solution there, and the story here. For those too lazy to click, the problem was given as

;; Cryptoarithmetic. In cryptoarithmetic problems, we are given a problem wherein the digits are replaced
;; with characters representing digits. A solution to such a problem is a set of digits that, when substituted
;; in the problem, gives a true numerical interpretation. Example:
;;   IS
;;   IT
;;   ___
;;   OK
;;
;;   Has a solution { I = 1; K = 1; O = 3; S = 5; T = 6}.  For each of the below cryptoarithmetic problems,
;;   write a program that finds all the solutions in the shortest possible time.
;;
;;   IS     I
;;   IT    AM
;;   __    __
;;   OK    OK

The original poster put up a brute force solution, so I figured I'd follow suit (I'm sure there's a way to reduce the problem space significantly, but I've spent quite enough time on this puzzle. If you'd like to upstage me, feel more than free).

Ok, so I sat down for a while (probably longer than I should have) and thought how I would actually put together a brute-force solution to this. Basically, the ideal scenario is: for the input IS IT OK, I would have a function to compute (= (+ IS IT) OK).

    (lambda (s i |t| o k) ;; T is a reserved symbol, so I must escape the t
      (= (+ (digits->number i s) (digits->number i |t|)) (digits->number o k)))
If I had this function, I could simply
    (loop for i from 0 to 99999
          when (apply [that function] (number->digits i)) 
          collect (number->bindings i '(s i |t| o k)))
EDIT:

I was severely under-optimistic here; the ideal situation would be to have a function that does

(lambda (s i |t| o k)
  (when (= (+ (digits->number i s) (digits->number i |t|)) (digits->number o k))
    (list "s" s "i" i "t" |t| "o" o "k" k)))
which I could then use by doing (loop for i from 0 to 99999 when (apply [that function] (number->digits i)) collect it) Let this be a lesson to you; always apply the maximum amount of wishful thinking (the response over at CR has solution-fn generate this function rather than the original) Fri, 08 Apr, 2011

That would give me the list of all single-digit bindings for the letters "S I T O K" that satisfy the problem. Ok, so what do I need for that? First off,

    (defun digits->number (&rest digits) 
       (parse-integer (coerce digits 'string)))

    (defun number->digits (num &optional (pad-to 5)) 
       (coerce (format nil (concatenate 'string"~" (write-to-string pad-to) ",'0d") num) 'list))

    (defun numbers->bindings (num bindings)
       (let ((digits (number->digits num)))
          (mapcar (lambda (b d) (,b . ,(parse-integer (format nil "~a" d)))) bindings digits)))

those helpers were mentioned, and seem simple enough (I'm defining them as naively as possible, with no eye to performance right now, so don't bother pointing out that strings are slow). The last thing I need is something that takes a problem string (like "it is ok") and returns that lambda above. So, here's the magic trick.

    (defmacro solution-fn (problem-string)
      (let* ((terms (string->terms problem-string))
             (args (remove-duplicates (apply #'append terms))))
        `(lambda ,args
           (= (+ ,@(loop for term in (cdr terms) collect `(digits->number ,@term)))
              (digits->number ,@(car terms))))))

That introduces one more helper function, string->terms, which should return '((|o| |k|) (|i| |t|) (|i| |s|)) given "is it ok". It actually doesn't matter what order the rest are in, as long as the "answer" ("ok" in this case) is the first element. Here's what the function looks like:

    (defun string->terms (problem-string)
      (reverse
       (mapcar (lambda (s) (mapcar (lambda (i) (intern (format nil "~a" i))) 
                                   (coerce s 'list)))
               (cl-ppcre:split " " (string-downcase problem-string)))))

you'll have to install and include cl-ppcre to use cl-ppcre:split. I recommend [quicklisp](http://www.quicklisp.org/beta/) for your installation needs. I'm downcasing to avoid that problem with T being a reserved symbol.

Tadaaah!

    (loop for i from 0 to 99999 ;;careful to enter these correctly
          when (apply (solution-fn "is it ok") (number->digits i)) 
          collect (numbers->bindings i '(|o| |k| |t| |i| |s|))) ;;note the re-ordered args to match output from `solution-fn`

Gives you the solution in reasonable time.

...

That's not very satisfying though, is it. With this solution, we'd need to manually figure out the order of arguments in the final function, and we'd have to enter the correct number of 9s for those arguments. Whenever there's an easy-to-miss detail in the code somewhere, I like to make sure it's as hard to miss as possible. The best way to make it hard to miss is to tell the program to take care of these details itself. To quote Olin Shivers

I refuse to do things computers can do. -Olin Shivers
    (defmacro solve-for (problem-string)
      (let* ((terms (string->terms problem-string))
             (args (remove-duplicates (apply #'append terms)))
             (nines (parse-integer (coerce (make-list (length args) :initial-element #\9) 'string))))
        `(loop for i from 0 to ,nines
               when (apply (solution-fn ,problem-string) (number->digits i))
               collect (numbers->bindings i ',args))))

Now, (solve-for "it is ok") will give you the answer you need. And it'll work for similar problems. That's not really the solution yet though; the problem asks for "shortest possible time", and so far I've been paying for simplicity with increased runtime. I don't trust myself here though; a priori reasoning about efficiency is fine for a first stab at the problem, but when I want to optimize, the profiler is my friend. After turning profiling on for my helper functions in SLIME and running solve-for for "it is ok", "its not ok", "i am ok", "i am not ok" and (accidentally) "it no ok", M-x slime-profile-report spits out...

  seconds  |     gc     |     consed     |    calls   |  sec/call  |  name  
-----------------------------------------------------------------
    62.983 |      1.732 | 23,736,407,072 | 11,203,305 |   0.000006 | NUMBER->DIGITS
    14.780 |      0.092 |  3,777,455,536 | 43,600,000 |  0.0000003 | DIGITS->NUMBER
     0.037 |      0.004 |     11,936,688 |      3,305 |   0.000011 | NUMBERS->BINDINGS
     0.000 |      0.000 |         37,008 |          8 |   0.000000 | STRING->TERMS
-----------------------------------------------------------------
    77.800 |      1.828 | 27,525,836,304 | 54,806,618 |            | Total

So the bottleneck is clearly number->digits by a pretty wide margin and this really shouldn't come as a shock given how I've been representing digits. Time to change that.This is where I'll resort to some light side-effect. When I get a big, necessary performance boost in return.

    (defun number->digits (num &optional (pad-to 5))
      (let ((temp num)
            (digits nil))
        (loop do (multiple-value-call 
                     (lambda (rest d) (setf temp rest digits (cons d digits)))
                   (floor temp 10))
              until (= pad-to (length digits)))
        digits))

And, of course, I need to go back and make sure that the appropriate functions are now expecting a list of integers rather than a list of chars. That ends up very slightly simplifying numbers->bindings

    (defun numbers->bindings (num bindings) 
      (let ((digits (number->digits num))) (mapcar (lambda (b d) `(,b . , d)) bindings digits)))

and complicating digits->number

    (defun digits->number (&rest digits)
      (apply #'+ (loop for d in (reverse digits) for i from 0
                       collect (* d (expt 10 i)))))

Lets see. After solve ing for "it is ok", "its not ok", "i am ok", "i am not ok" and (not accidentally this time) "it no ok" again:

  seconds  |     gc     |     consed    |    calls   |  sec/call  |  name  
----------------------------------------------------------------
    10.993 |      0.752 | 7,199,265,488 | 43,900,000 |  0.0000003 | DIGITS->NUMBER
     5.009 |      0.224 | 1,069,467,616 | 11,304,260 |  0.0000004 | NUMBER->DIGITS
    0.0003 |      0.000 |       634,880 |      4,260 |  0.0000001 | NUMBERS->BINDINGS
     0.000 |      0.000 |        48,224 |         10 |   0.000000 | STRING->TERMS
----------------------------------------------------------------
    16.002 |      0.976 | 8,269,416,208 | 55,208,530 |            | Total

Which is a pretty drastic improvement to number->digits. digits->number is doing worse now though. I'll try out something else. Intuitively, this should be worse, but you never know 'till you try a few million times.

    (defun digits->number (&rest digits) (parse-integer (format nil "~{~a~}" digits)))
  seconds  |     gc     |     consed     |    calls   |  sec/call  |  name  
-----------------------------------------------------------------
    61.560 |      2.489 | 27,440,408,672 | 43,900,000 |   0.000001 | DIGITS->NUMBER
     3.973 |      0.004 |         96,832 | 11,304,260 |  0.0000004 | NUMBER->DIGITS
     0.004 |      0.000 |        593,920 |      4,260 |   0.000001 | NUMBERS->BINDINGS
     0.000 |      0.000 |         47,232 |         10 |   0.000000 | STRING->TERMS
-----------------------------------------------------------------
    65.537 |      2.493 | 27,441,146,656 | 55,208,530 |            | Total

Ok, looks like I didn't need the profiler to figure that one out; intuition is sometimes right. I'm reverting to the numeric version, and the only other things I'll test is inlining solution-fn and changing a reverse out for nreverse in digits->number. It doesn't seem like these will make a huge difference, but it should improve the memory situation by a bit (and digits->number is hurting on conses).

  seconds  |     gc     |     consed    |    calls   |  sec/call  |  name  
----------------------------------------------------------------
     7.627 |      0.400 | 4,363,098,528 | 43,900,000 |  0.0000002 | DIGITS->NUMBER
     4.388 |      0.160 |   776,123,744 | 11,304,260 |  0.0000004 | NUMBER->DIGITS
     0.004 |      0.000 |       876,544 |      4,260 |   0.000001 | NUMBERS->BINDINGS
     0.000 |      0.000 |        21,600 |          5 |   0.000000 | STRING->TERMS
----------------------------------------------------------------
    12.019 |      0.560 | 5,140,120,416 | 55,208,525 |            | Total

Not bad. 3 fewer seconds of runtime in exchange for a single lambda inlining and one additional character in a function name. At this point,

    (progn (solve-for "it is ok") 
           (solve-for "its not ok") 
           (solve-for "i am ok") 
           (solve-for "i am not ok") 
           (solve-for "it no ok"))

runs in just under 15 seconds with profiling off (and about a minute with the whole package profiled), which is pretty good by my estimation. I think I'll cut it here. I've already spent about two hours on this article, and that feels like it's more than enough. If I had a mind to improve it, I'd focus on a few specifics.

  • Firstly, digits->number can do some of its calculations at macroexpansion time (since by that point, we know what number of digits we're expecting in each term, we could create a custom function with the specific arity we need instead of using a &rest argument).
  • Secondly, numbers->bindings can be folded into solution-fn (so that it generates a function that returns either a set of bindings, or nil, instead of t or nil). That should save another couple of seconds.
  • Thirdly you could parallelize this operation. Remember, we're just applying a function to each element of the list of integers between 0 and 99999... Since there are no sequential data dependencies, there's no reason it couldn't be broken up into equal pieces across a server cluster if you wanted to solve "the quick brown fox jumps over the lazy dog".
  • Finally you could spend some time figuring out an algorithmic solution

By the by, I'm experimenting with a markdown converter this time (instead of using my blog-mode. Some assembly has been required in a couple of places (mainly that paste of the problem, and the profiler reports), but I'll try again next time, with this in mind. I'll also see what it takes to get some code highlighting up ins using just markdown.

EDIT: I've gone back over this post with blog-mode. The markdown generator was messing up entirely too many things for my liking (and didn't give me code highlighting to boot). Well, I tried it at least... Fri, 08 Apr, 2011

Tuesday, March 8, 2011

Parenscript

Recently, I picked up parenscript. I've been meaning to, it's available through quicklisp, and there was an increasing amount of repetition in my js code, so I gave it a shot. I'm about half way through putting together my first js file with it, and goddamn is this sweet.

I've read arguments that it creates a needless abstraction over JS that's better handled through jQuery (which I have some experience with). That's actually one of the reasons I've been a bit slow on the draw to this one. A combination of assuming that additional abstraction would cost more than it would benefit, and having experience with a technology that's Good Enough™. After having put together cl-css, though, I've noticed that even a minimal Lisp-based abstraction layer over a target language gives you a lot. If nothing else, you get macros and backquoting for free.

In the case of parenscript, there's actually a bigger win (which has variously been "solved" by regex templating, hidden concatenation and frameworks of varying success). Well, there's no reason to deal with JavaScripts' downright criminal lack of string templating with the grin-and-bear method. Even beyond that, there are quite a few places where the right thing for your JS code to do is generate further HTML. You sometimes want these components generated on the client side, because they're useless clutter if the client has javascript disabled.

Out of curiosity, have you ever tried doing that by hand? I'll wait, go ahead, give it a shot. Load up a blank page, include jQuery, then let me know what the best way is to generate the HTML for a google-maps-esque UI widget with javascript.

Well, here's how html-generation looks in parenscript

(chain ($ ".ui-container")
       (prepend 
          (who-ps-html (:div :class "ui-widgets"
                           (:img :id "directions" :src "directions.jpg")
                           (:img :id "street-view" :src "street-view"))))
       (css (create :cursor "pointer")))

In other words, exacly like html-generation in cl-who (which is to say, beautiful). The chain calls are ugly[1] in comparison to the standard jQuery $(".foo").bar(), and create isn't objectively brilliant[2], but the ability to do string templating and HTML generation in Lisp is such a load off my mind that I don't care. It adds a minor inconvenience in places I couldn't bring myself to care about, but provides relief specifically in the most tedious and error-prone parts of javascript development.

There's very little that wouldn't strike me as an improvement over

$(".ui-container")
    .prepend("<div class='ui-widgets'>"
             + "<img id='directions' src='directions-icon.jpg' />"
             + "<img id='street-view' src='street-view-icon.jpg' />"
             + "</div>")
    .css({"cursor": "pointer"});

And god help you if you need to sneak a variable in as the content/class-name of one of those tags.

Keep in mind that this is for a simple, throwaway example. If I wanted to get fancy, I'd throw in some stitched functions and macros (in fact, I'll post the code of that module I'm working on once I finish the thing, and I'll try to work out how I would have done it by hand).

So yeah. My first reaction was a resounding "Meh. I don't need this level of abstraction. I already know jQuery". The ability to have defmacro and format available to me while writing Javascript piqued my interest, and who-ps-html just sort of sealed the deal.

It's a little embarrassing that the title of this blog is becoming less and less accurate the more time I spend with Lisp. I've been using an increasing number of s-exp generators (cl-who, cl-css, clsql and parenscript, in case you were keeping score). Actually, embarrassing isn't the right word for it.

Worrying.

Tools, of course, can be the subtlest of traps. One day, I know, I must smash the emerald.


1 [back] For single method calls, anyway. In any case, I get the feeling I could macro my way out of this pretty easily. Something like

(defpsmacro $ (selector &body chains)
  `(chain (j-query ,selector)
      ,@chains))

seems to more or less solve my pain points.

(ps ($ ".ui-container"
       (prepend (who-ps-html (:div :class "ui-widgets"
                                   (:img :id "directions" :src "directions.jpg")
                                   (:img :id "street-view" :src "street-view"))))
       (css (create :cursor "pointer"))))

> "jQuery('.ui-container').prepend('<DIV CLASS=\"ui-widgets\"><IMG ID=\"directions\" SRC=\"directions.jpg\"><IMG ID=\"street-view\" SRC=\"street-view\"></DIV>').css({ 'cursor' : 'pointer' });"

j-query seems really weird, but it expands properly, and it'll only ever show up in the macro definition anyway.

2 [back] I personally prefer (create :foo "bar" :baz "mumble") to { 'foo' : 'bar', 'baz' : 'mumble' }, but I'm sure many would disagree.

Sunday, March 6, 2011

Formlets and Loop

Just a quick update today.

First, I've pushed an update to the formlets project on github. It now supports input type file (which includes managing the enctype properly, displaying file inputs and providing a couple of basic predicate generators for validation). Check the project page, or the new wiki, both of which have slightly more robust documentation than you'll find here.

There's really no occasion to this, by the way. I try to be a self-centered realist in terms of design philosophy, so the only reason I added file fields here was that I finally found I needed them. It actually surprises me quite a bit that I got by for so long with only inputs, passwords, textareas and recaptchas, but there you have it. I'm in the middle of another project at work now though, so I may soon add options and dates. Don't hold your breath though.

Second, I've been figuring out loop for the past little while (and re-wrote a lot of the formlet internals with it, now that I've realized that it can basically do everything). The pieces that loop helped in are ones that would otherwise have to be expressed in terms of recursion and some intermediary variables. If you want to take a look at it in action, check out the "Validation related functions" section in this diff. Seven lines of loop saved me something like 12 lines of recursion and six lines of helper function. And not only that, but it's (in my opinion, obviously) much easier for a human reader to parse this way. I haven't learned the full loop syntax yet, and doubt I ever will, given its specification. It looks like loop itself is, without exaggeration, several times more complicated than the rest of Common Lisp combined. iterate doesn't seem to be much better in this regard, by the way. It seems to be loop with a few extra parens thrown in. But loop isn't hard to learn because it has too few parentheses, rather becuase it's mind-bendingly complicated.

In any case, I've found tutorials on both iterate and loop (as well as the CL cookbook loop entry and Seibels' treatment in PCL). The two things that I needed to know in order to make that formlets code work were either omitted, buried, or merely implied in each source. Specifically, I needed to iterate by two elements of a list, and I needed to be able to return a flat list that had double the elements of the input (collecting two elements per input element). Basically

'(:a 1 :b 2 :c 3 :d 4) => '(:a :b :c :d)
'(:a :b :c :d) => '(:a "A" :b "B" :c "C" :d "D")

That's very slightly beyond mapcar as far as I know, so the way it ended up getting written was a recursion. Which came out very complicated (not that the situation helped; this would have been relatively straightforward in regular code, but throw in two or three steps of macroexpansion, and it gets ugly fast). So, for my own future reference (and hopefully, for the benefit of anyone else that does a search on this), here's how you do it with loop.

;; Iterating by multiple elements
(defvar test-list '(:a 1 :b 2 :c 3 :d 4))
> TEST-LIST

(loop 
   for (key value) on test-list by #'cddr
   collecting key)
> (:A :B :C :D)

;; Collecting multiple elements
(setf test-list '(:a :b :c :d))
> (:A :B :C :D)

(loop 
   for key in test-list
   collecting key 
   collecting (string key))
> (:A "A" :B "B" :C "C" :D "D")

You can actually skip the "ing" in this case, writing the last two clauses as collect key collect (string key). There's also no requirement for making this a multi-line statement, I just feel that it's easier to read here.

EDIT: It's also been pointed out to me that the second one there can be written without loop as (mapcan (lambda (n) (list n (string n)) test-list).

Third, and no one other than me cares, so you may want to go read something interesting instead.

I know it's not very impressive yet, but keep in mind that I started off in the 35-45 range. Hopefully, I can crack 100 this year.

Thursday, March 3, 2011

CLSQL. And NOTHING ELSE.

I am discussing CLSQL this week. If you don't want to hear about it, here's a picture of two llamas instead.

So I've decided to switch over to CLSQL from cl-mysql for my Common Lisp databasing needs. Partly because CLSQL provides a database-agnostic, s-expression based syntax for SQL queries (as opposed to MySQL-specific string-representations) , partly because it seems to be closer to a "standard" CL database library, but mainly because it's installable thorough quicklisp, whereas cl-mysql is only installable by downloading the tarball from its github and asdf-install:installing that. Then crossing your fingers that you only get 37 compilation errors.

As usual, here's the experience from the perspective of a not-particularly-bright, young lisper.

Before I even get into using it, though, I have to admit that installation wasn't free of speed bumps. Using (ql:quickload "clsql") seemed to install and include the thing correctly, but as soon as I tried to use connect, it barfed at me, saying that it couldn't compile the C ffi libraries clsql was expecting. This particular machine is running on Debian 6 (the Intel 32 version) and SBCL 1.0.40.0 (this was also before the recent Quicklisp beta update, so it may not even be an issue anymore). Anyway, it turns out that I had to apt-get install cl-sql. libmysqlclient-dev was installed already (zach told me to check in our brief SO correspondence), but that didn't seem to make a difference. On my desktop at home, I've got pretty much the same setup, except it's an AMD 64 machine instead of an Intel, and that seemed to trigger a couple of warnings (though following the [Accept] restarts got it into a workable state). Finally, I had one last problem on my Linode (where I hadn't thought to install gcc for some odd reason, so the C ffi libraries had no hope of compiling). That's the installation headaches over with.

TL;DR;so far: if you have any problems, make sure to install libmysqlclient-dev, gcc and cl-sql from the Debian repos. Expect two warnings on AMD machines (which you can [Accept] through).

The actual usage is fairly simple, assuming you're already familiar with SQL. There are two interfaces; a functional one and an OO one that binds tables to CLOS objects. I don't know much about that second one, so this is going to deal with my use of the functional interface.

If you're going to be playing along through the repl, you'll need to evaluate

(connect '("localhost" "database-name" "database-user-name" "password") 
         :database-type :mysql) 
(start-sql-recording)
(enable-sql-reader-syntax)
;; I'm using :mysql. You could use something else, it shouldn't matter for the purposes of this article

The use of connect is fairly self-explanatory. start-sql-recording returns the SQL equivalent of any cl-sql query you evaluate (so don't use it in files, it's just for repl purposes). Finally, the call to enable-sql-reader-syntax lets you use CLSQLs bracket-delimited SQL macros in the REPL. If you've got a file you want to use CLSQL in (as opposed to at the repl), put (file-enable-sql-reader-syntax) at the top (right after the in-package statement if you have one). The syntax works in two relevant ways.

First, it converts lisp-case expressions to SQL_CASE expressions. For example

(create-table [user]
              '(([user-id] integer :not-null :unique 
                           :primary-key :auto-increment)
                ([first-name] (string 50)) ([last-name] (string 50)) 
                ([num-logins] integer) ([password] string) ([salt] string)))

>;; 2011-03-03T09:41:44 localhost/database/user => CREATE TABLE USERS (USER_ID INT NOT NULL UNIQUE PRIMARY KEY AUTO_INCREMENT, FIRST_NAME CHAR(50), LAST_NAME CHAR(50), NUM_LOGINS INT(11), PASSWORD VARCHAR(255), SALT VARCHAR(255)) Type=InnoDB

If you're using a database other than MySQL, the CREATE TABLE statement will look different (if you want to play around creating stuff, you can find the CLSQL column-type reference about half-way down this page).

Second, it'll give you access to a subset of lisp for the purposes of creating SQL expressions, as in the :where clause here

(select [*] 
        :from [user] 
        :where [and [= [first-name] "Inai"] 
                    [= [last-name] "mathi"]])

>;; 2011-03-03T12:41:40 localhost/database/user => SELECT * FROM USER WHERE ((FIRST_NAME = 'Inai') AND (LAST_NAME = 'mathi'))

Like I said, it takes a subset of lisp, not the whole thing, so while you can construct pretty elaborate where clauses using (null|not|and|or|[=><]), you can't do something like

(update-records [user] 
                :attributes '([num-logins]) 
                :values '([+ 1 [num-logins]]))

Incidentally, that's one of the two ways you can organize column name and values in a query. The other (which I prefer whenever I'm dealing with more than one attribute at a time) is to pass up attribute-value pairs like so

(insert-records :into [user] 
                :av-pairs `(([first-name] "Inai") ([last-name] "mathi") 
                            ([password] ,(salt-password pw salt)) 
                            ([salt] ,salt)))

As in regular SQL (or, at least, MySQL sql), if you're inserting a value for each column, in order, you can leave out the :attributes specification altogether. Simple, right? As long as you know SQL and Lisp, I mean.

The pitfalls i've hit in the coding bit really have more to do with some MySQL-specific (I guess?) things that I still didn't expect to be handling myself. For example, the first time I saw their [ ] notation, I thought "Oh, this is a way to translate some stuff to SQL notation". It seemed like a safe assumption that this would include things like [now] or [(now)], one of which I thought would call the sql NOW(); function to get the current datetime in the appropriate format. And that's a no. I honestly didn't think I'd have to write

(defun mysql-now ()
  (multiple-value-bind 
        (second minute hour date month year day-of-week dst-p tz) 
      (get-decoded-time)
    (declare (ignore day-of-week dst-p tz))
    ;; ~2,'0d is the designator for a two-digit, zero-padded number
    (format nil "~a-~2,'0d-~2,'0d ~2,'0d:~2,'0d:~2,'0d" 
                 year month date hour minute second)))

myself, but there you have it. Also, TIL that multiple-value-bind has an odd indeting pattern.

The next thing is that the timestamp column type doesn't seem to be supported by CLSQLs functional interface. The reference page I linked earlier states that you should be able to specify a timestamp column using wall-time, but that creates a vanilla datetime in MySQL (timestamps are separate, and will store the current time whenever that row is INSERTed or UPDATEd). The solution seems to be

  1. Don't use timestamp columns, and instead manually call mysql-now when I need to update timestamps
  2. Sidestep the CLSQL reader syntax by using the query function anywhere you want a timestamp.

The second is unacceptable because it's basically what cl-mysql does by default, except without the automatic sql escaping. The default CLSQL syntax handles this, by the way, so feel perfectly free to call your admin account "'); DROP TABLE USERS; SELECT '", it shouldn't cause any trouble other than being annoying to type each time. To be fair, I'd probably only have to use SQL literals at table creation, so it wouldn't be the end of the world, but it's also not ideal. Not using timestamps where they're appropriate just because my tools don't like it is even worse. Hopefully, a solution presents itself, but judging from the response over at SO, I'm not holding my breath. Maybe I'm using wall-time incorrectly, or there's another column specifier that gets the correct behavior in MySQL, I dunno.

The only other problem I'm having is understanding how exactly you're supposed to use the with-connection and with-default-connection functions. Using with-default-connection doesn't seem to close the connection or (return it to the pool if you're using one). with-connection does, but it gives you style-warnings if you don't explicitly pass that connection to any queries you wrap in it.

This last one is probably a broken understanding on my part though. Intuitively, I'd expect to either

  • Wrap each handler function in a with-connection (so that any database hits happening as a result of that handler share a connection)
  • Wrap each database-manipulating function in a with-connection (so that each database hit has its own connection. Sounds bad, but it's actually manageable on my current project)
  • Start a connection with the server, and use that one to handle all traffic (which sounds scary in many ways, so I'm not seriously considering)

The second honestly sounds like the right choice (though I could be wrong depending on how much overhead is associated with starting a connection to the database server; I should run that through the profiler this weekend), but the first one is also acceptable. The trouble is that I can't reconcile either with the fact that the with-connection twins really seem to want me passing explicit database references around. Like I said, more research is necessary.

Sorry for starting this month out on the boring side, but I've been poking at this for a week or so, and I needed to clear my head of it.