Friday, August 31, 2012

Environments and Tradeoffs

I hit up the Coding Dojo again this week, and we switched it up a little bit.

This time, we decided to split into two groups; one using Emacs[1] and one using a simple, OS X editor.

Yes, I already pointed out what a bad idea this was, but to no avail. We have at least one person there who really really wants to use Emacs, so that's that I guess. Anyway, given that the group has its share of vim users, and its share of various IDE users, we were bound to get into a ribbing match by the end.

I may or may not have referred to the OS X setup as "Lowest Common Denominator", and then had to explain that I didn't mean it in a bad way. That told me that there's another part of my internal state that I assume is common knowledge, but may not be. If this is already part of your experience, you'll know within the next two sentences or so, and at that point you can skip the rest of the article knowing you're not missing much.

Here's the trade-off you make when you choose your environment, or customize it, or (if you're really hardcore) build your own: ease-of-use vs. worth-learning. That's NOT a fancy way of saying "you fuckers are lazy for not learning #{my_editor}". There's an actual trade there.

Ease of Use

When you sit down at this environment, it will be easy to pick up. You may need to learn one or two new keystrokes, and you may need to toggle one or two options to make it a bit comfortable. You will never have it explode on you. You'll never have to make use of its built-in auto-debugger, which it probably doesn't have, because it doesn't ship with the source code.

You'll also never really bend it to your will, which means that you won't be coding as fast as you can possibly be coding. You'll need to make peace with the fact that it just plain won't let you do certain things, or force you to do certain repetitive things manually, and that you'll need to use external tools for certain pieces of your workflow. Assuming you choose to live with it.

Worth Learning

You can not pick up this environment in a day or two. It will take you weeks or months. It has substantially different keybindings than general-purpose editors because it is or includes a special-purpose editor. You need to go through a lot of configuration before you get it feeling just right. You may need to change your keyboard layout slightly, and/or write up a few custom modules. You will see the debugger, and you will say "Thank fucking god that I have access to this", because you will need it.

You will likely be able to pull source code for it, and it will likely have its own modification language/framework[2]

In the long run, it will make you much more productive. Noticeably more productive. You will show someone how you work, and their reaction will be "How the hell did you do that?".

Where It Matters

If you're a professional programmer, and actually want to be effective at it, I'd argue that it's a mistake not to pick the second option for your solo programming time. Note that this doesn't mean "pick Emacs". I did, but that's mainly because of the languages I use. vi/vim, Eclipse, leksah, jEdit, or whatever might make as much sense for you. Gedit, or Notepad++ or similar doesn't cut it here. And if there are any mutants out there coding in Word or something, just stay away from me, because I will not be able to veil my contempt.

The reason it makes sense there is that you're the only one whose effectiveness you need to worry about maximizing. That means that you can optimize the hell out of it without regard for the learning curve, or the portability[3].

Now, on the flipside to the standard solo coding activity where your Worth Learning©™ environment should dominate, are use cases like pair programming. Or, oh, I dunno, Coding Dojos.

You do not want the same things in that situation.

First, you're not trying to maximize your own throughput, but the throughput of the group. It is a sub-optimal outcome if one of you can hit 10 lines per minute, and the rest can't even get to one. That means you have two options

  • Set up a standardized environment that everyone in the group agrees to, send out a setup script for whatever platform, and have everyone practice outside of the event
  • Set up a minimal environment with a very short learning curve so that everyone can pick it up without practice, and go back to their own customized environments otherwise

This is why I'm against using Emacs for physical, social coding unless it's with a complete group of Emacs users. You'll be handicapping some people pretty severely for no relevant benefit. In fact, unless you set up the vanilla Emacs distro, you'll be handicapping everyone for no relevant benefit, because every Emacs setup tends to be set up in its own way.

So yeah. "Lowest Common Denominator" is what you want here.


Footnotes

1 - [back] - On GNU/Linux, but this is incidental; I put together the Emacs environment on my machine, and I happen to be a Debian user. We never used anything other than Emacs, so the fact that I use a Tiling WM never really came up.

2 - [back] - Good language and framework optional; Elisp seems to be at the upper end of the curve these days, and it's not a particularly stellar language. Lack of namespace management gets pretty annoying after a while.

3 - [back] - Though you probably should keep a setup script somewhere to make it easier for yourself to re-install if necessary.

Tuesday, August 28, 2012

Partial Poker Hand Kata in Common Lisp, Clojure and Haskell

I mentioned the Toronto Coding Dojo last week[1]. Specifically, I mentioned trying to go over the Poker Hand Kata from scratch each week in Clojure.

We haven't solved it yet, but we're getting there. Half the point is getting to know the language, and the TDD technique, so it's not as though failing to get to the end is the worst possible thing, really. I'm warming to the language, but not the technique (more on that next time).

We were supposed to have a dojo github page, but there doesn't seem to be a link going out from the meetup, and I can't find it after ten minutes of determined googling, so I can't point you to it. I have, however taken first stabs at the problem in three languages and want to go over the problem a bit.

EDIT:

Turns out the organizer is keeping the progress repos in his GitHub profile.

Fri, 31 Aug, 2012

Fundamentally, it's a sorting problem. We have cards, whose relevant properties are a rank and a suit. We have an ordered set of hand types, each of which have their own tie-braking method with other hands of the same type. The task, near as I can tell, is taking a pair of hands, figuring out their types, then sorting them to find out the winner[2].

The constructs we need to represent here are ranks, suits, cards (which is just a (rank suit) combo) and hands (which are just lists of cards). Here's my first first stab in Common Lisp[3].

;; poker-hands.lisp

(defpackage :poker (:use :cl :split-sequence))
(in-package :poker)

(defparameter *letter->val* '(#\T 10 #\J 11 #\Q 12 #\K 13 #\A 14))
(defparameter *hand-type->val* '(:high-card 1 :pair 2 :two-pairs 3 :three-of-a-kind 4 
                                 :straight 5 :flush 6 :full-house 7 
                                 :four-of-a-kind 8 :straight-flush 9))

(defclass card ()
  ((rank :reader rank :initarg :rank)
   (suit :reader suit :initarg :suit)))

(defun read-card (card-string)
  (make-instance 'card
                 :rank (or (getf *letter->val* (aref card-string 0))
                           (parse-integer card-string :junk-allowed t))
                 :suit (aref card-string 1)))

(defun read-hand (hand-string)
  (sort (mapcar #'read-card (split-sequence #\space hand-string))
        #'> :key #'rank))

(defun flush-p (cards) 
  (let ((suits (mapcar #'suit cards)))
    (every (lambda (s) (eq s (car suits))) (cdr suits))))

(defun range (start end)
  (loop for i from start to end collect i))

(defun straight-p (cards)
  (equal (mapcar #'rank cards)
         (loop repeat (length cards) 
            for i from (rank (car cards)) downto 0
            collect i)))

(defun find-sets (cards)
  (let ((copy (copy-list cards)))
    (loop for c in copy
       when (remove (rank c) cards :key #'rank :test-not #'=) collect it
       do (setf cards (delete (rank c) cards :key #'rank)))))

(defun set-of-p (n sets)
  (some (lambda (s) (= (length s) n)) sets))

(defun count-sets-of (n sets)
  (count-if (lambda (s) (= (length s) n)) sets))

(defun hand-type (hand)
  (let ((sets (find-sets hand)))
    (cond ((and (flush-p hand) (straight-p hand)) :straight-flush)
          ((set-of-p 4 sets) :four-of-a-kind)
          ((and (set-of-p 3 sets) (set-of-p 2 sets)) :full-house)
          ((flush-p hand) :flush)
          ((straight-p hand) :straight)
          ((set-of-p 3 sets) :three-of-a-kind)
          ((= 2 (count-sets-of 2 sets)) :two-pairs)
          ((set-of-p 2 sets) :pair)
          (t :high-card)))) 

(defmethod break-tie (hand-type (hand-a list) (hand-b list))
  (loop for a in hand-a
        for b in hand-b
        unless (= (rank a) (rank b))
          do (return (> a b))))

(defun hand-type-> (hand-type-a hand-type-b)
  (> (getf *hand-type->val* hand-type-a)
     (getf *hand-type->val* hand-type-b)))

(defun hand-> (hand-a hand-b)
  (let ((type-a (hand-type hand-a))
        (type-b (hand-type hand-b)))
    (or (hand-type-> type-a type-b)
        (when (eq type-a type-b)
          (break-tie type-a hand-a hand-b)))))

Not bad for about 20 minutes of work. I punt on the break-tie method at the bottom there, opting to just compare high cards until someone wins. Like I said, that really should be doing something else; for instance, if we have two three-of-a-kind hands, we'd want to compare the set of three as opposed to the high cards. Once we've got the hands read into an easier format, we can test flush-p, which takes a list of cards and checks if they've all got the same suit, and straight-p, which takes a list of cards and checks if they constitute a run.

read-card takes a two-character string and returns a new card based on it. A card is just a rank attached to a suit. read-hand takes the specified hand string format, and returns a list of cards from it. Finally, we've got hand-type-> and hand->, which compare hand types and hands respectively[4].

It's minimal, and it doesn't really solve the problem, but I'm already familiar with the CL way of doing things, so I didn't want to spend any more time on this one than I really had to.

On we go to

;; poker-hands.clj

(ns poker-hands.core
  (:use [clojure.string :only [split]]))

(def rank-map {\T 10 \J 11 \Q 12 \K 13 \A 14})
(def name-map ["Rules for Draw and Stud Poker" "Ace" "Two" "Three" "Four" "Five" "Six" "Seven" "Eight" "Nine" "Ten" "Jack" "Queen" "King" "Ace"])
(def suit-map {\H :hearts \C :clubs \S :spades \D :diamonds})
(def hand-map {:straight-flush 8 :four-of-a-kind 7 :full-house 6 :flush 5 :straight 4 :three-of-a-kind 3 :two-pairs 2 :pair 1 :high-card 0})

(defn read-card [card-string]
  (let [rank (or (get rank-map (first card-string)) (read-string (subs card-string 0 1)))
        suit (get suit-map (second card-string))
        name (get name-map rank)]
    {:rank rank :suit suit :name name}))

(defn read-hand [hand-string]
  (sort-by :rank (map read-card (split hand-string #" "))))

(defn flush? [cards]
  (= 1 (count (group-by :suit cards))))

(defn straight? [cards]
  (let [ranks (map :rank cards)]
    (= ranks (range (first ranks) (+ 1 (last ranks))))))

(defn group-of? [n sets]
  (some #(= (count (second %1)) n) sets))

(def four-of-a-kind? (partial group-of? 4))
(def three-of-a-kind? (partial group-of? 3))
(def pair? (partial group-of? 2))

(defn count-sets-of [n sets]
  (count (filter #(= (count (second %1)) n) sets)))

(defn hand-type [hand]
  (let [sets (group-by :rank hand)]
    (cond (and (straight? hand) (flush? hand)) :straight-flush
          (four-of-a-kind? sets) :four-of-a-kind
          (and (three-of-a-kind? sets) (pair? 2 sets)) :full-house
          (flush? hand) :flush
          (straight? hand) :straight
          (three-of-a-kind? sets) :three-of-a-kind
          (= 2 (count-sets-of 2 sets)) :two-pairs
          (pair? sets) :pair
          :else :high-card)))

(defn break-tie [hand-a hand-b] true)

(defn hand-> [hand-a hand-b]
  (let [type-a (hand-type hand-a)
        type-b (hand-type hand-b)]
    (or (apply > (map #(get hand-map %) [type-a type-b]))
        (when (= type-a type-b)
          (break-tie hand-a hand-b)))))

The Clojure version took me a bit longer since I'm still at the stage of having to code with a reference open, and I don't even have clojure-slime set up to give me argument hints. As I assumed though; there aren't really big conceptual differences between this one and the CL version. It's more compact by about 20 lines, but that's almost entirely due to the fact that Clojure has built-in range and group-by functions, which I had to define myself in the previous take.

The only other real difference is that there aren't any classes here, since Clojure encourages map and vector use instead. That's helped a bit by implicit indexing[5] and lambda shorthand[6]. Note that this already handles card names, rather than just ranks.

partial is what Clojure calls currying, and those three functions are there for readability in the hand-type body.

The part that I'm pointedly not showing here because it would be really boring, is the ~60 line set of test cases the group wrote up for this little program, as part of the construction process. Mostly, they were things like making sure that the read functions returned appropriate values from appropriate-looking strings, and specifying the basic functionality of how different hand types are coordinated and ranked.

On that note, here's the third (and final) stab I'm posting today

-- poker-hands.hs

import Data.String
import Data.List
import Data.Ord

data Rank = Two | Three | Four | Five | Six | Seven | Eight | Nine 
          | Ten | Jack | Queen | King | Ace 
          deriving (Eq, Ord, Show, Bounded, Enum)
                   
instance Read Rank where
  readsPrec _ value =
    let tbl = zip "23456789TJQKA" [Two .. Ace]
    in case lookup (head value) tbl of
      Just val -> [(val, tail value)]
      Nothing -> error $ "\nInvalid rank: " ++ value

data Suit = H | C | D | S deriving (Eq, Ord, Show, Read)

data Card = Card { rank :: Rank, suit :: Suit } deriving (Eq, Ord, Show)

instance Read Card where
  readsPrec _ value =    
    [(Card (read r :: Rank) (read s :: Suit), drop 2 value)]
    where r = init value
          s = snd $ splitAt (length r) value
          
data Hand = Hand { handRank :: HandRank, cards :: [Card] } 
          deriving (Eq, Show, Ord)

instance Read Hand where
  readsPrec _ value =
    [(Hand (getHandRank res) res, "")]
    where res = reverse . sort . map read $ words value :: [Card]

data HandRank = HighCard [Rank] 
              | Pair [Rank]
              | TwoPair [Rank]
              | ThreeOfAKind [Rank]
              | Straight [Rank]
              | Flush [Rank]
              | FullHouse [Rank] 
              | FourOfAKind [Rank] 
              | StraightFlush [Rank]
              deriving (Eq, Ord, Show)

isFlush :: [Card] -> Bool
isFlush cards = (1==) . length . group $ map suit cards

isStraight :: [Card] -> Bool
isStraight cards = 
  let rs = sort $ map rank cards
      run = [(head rs) .. (last rs)]
  in rs == run

getHandRank cards =
  let ranks = reverse . sort $ map rank cards
      uniqueRanks = nub ranks
      rankGroups = sortByLen $ group ranks
      handRank = case cards of
        _ | (isFlush cards) && (isStraight cards)  -> StraightFlush
          | has4 rankGroups                        -> FourOfAKind
          | (has3 rankGroups) && (has2 rankGroups) -> FullHouse
          | isFlush cards                          -> Flush
          | isStraight cards                       -> Straight
          | has3 rankGroups                        -> ThreeOfAKind 
          | (countGroupsOf 2 rankGroups) == 2      -> TwoPair
          | has2 rankGroups                        -> Pair
          | otherwise                              -> HighCard
  in handRank uniqueRanks

-------------------------------
-- General Utility Functions --
-------------------------------
hasGroupOf :: Int -> [[a]] -> Bool
hasGroupOf n groups = n `elem` (map length groups)
has4 = hasGroupOf 4
has3 = hasGroupOf 3
has2 = hasGroupOf 2

countGroupsOf :: Int -> [[a]] -> Int
countGroupsOf n groups = length $ filter (\g -> length g == n) groups

sortByLen :: [[a]] -> [[a]]
sortByLen = sortBy (flip $ comparing length)

Haskell is... odd. It's up there in the language bar because I poke at it rather vigorously with some frequency, but I've yet to do anything serious with it. I like it, but I always get the feeling that it doesn't like me very much.

This one took me a while. I'd bet it was between three and four hours. First, re-reading some of the documentation I'd already gone through as a refresher, then going through a bunch of reference docs to find particular function names[7], and finally writing the actual program.

It contains a few lines more than the Common Lisp solution, and about 20 more than the Clojure piece, but I'll cut it some slack for two reasons in this case. First, because those type signatures and declarations effectively replace between 90% and 95% of those boring test cases I mentioned. And second, because unlike the Lisp approaches, this one is complete apart from printing the output and one piece of input procedure.

That is, if you hand it a pair of hand strings and run compare, you'll get back the correct answer, down to the last tie breaker[8].

I use instance Read to declare readers for Rank, just but derive Read on Suit outright. Those two compose to let us read Cards and Hands as well. All of these types derive Ord, because the whole point is sorting them, and rank also derives Bounded and Enum so that I have an easier time of expressing a range of cards.

Once all the types are declared, the rest of the program just kind of falls out. You can see more or less the same flush and straight detectors, and even the same structure in getHandRank (except that it's named differently).

What you don't see is any boilerplate surrounding hand comparisons. Or, in fact, any comparison functions at all. We sort cards twice[9], but that's it. Because those types are defined deriving among other things Ord, you can use all the standard comparison operators to do the rest.

I was going to say a few proper words comparing the approaches and languages here, but this piece is already quite a bit longer than I'd like it to be. It'll have to wait for next time[10].


Footnotes

1 - [back] - Heads up if you were planning on joining us, by the way, they're holding a poll on what day next weeks' meeting should be held. If you weren't there yet, and your reason was "I'm not free that day", you may want to give your opinion a voice.

2 - [back] - There's also a bit of incidental complexity around displaying the winners after that, that I'll ignore for now.

3 - [back] - It's what I'm comfortable with. Also, note that all these tries were written before I started writing this post, so they have less thought in them than they otherwise might.

4 - [back] - I only implemented one direction, since the problem at hand doesn't call for more.

5 - [back] - As seen in that group-by call in hand-type.

6 - [back] - As seen in group-of?, count-sets-of and probably a couple of other places.

7 - [back] - Hoogle helps immensely once you get your head around the type system, but I'd really like to have access to it on my local machine, along with proper auto-completion and type signature hinting.

8 - [back] - Just as an aside though, I have no idea what order suits are actually supposed to go in, so I arbitrarily picked H | C | D | S, even though that's almost certainly wrong. Don't hold that against the program, or the tools, that's just me being a not poker player.

9 - [back] - And sort groups of cards once.

10 - [back] - Probably after I finish up my thoughts about authentication.

Thursday, August 23, 2012

Starting Clojure

So I've been going to this Coding Dojo thing, I guess. In an attempt to finally get off my ass and into Clojure, but also into

  • socializing with functional programmers outside of the Lisp group[1]
  • getting a bit more of a handle on test driven development[2].

For the past two weeks, we've been (unsuccessfully so far, but no one is about to give up yet) trying to run through the poker hand kata in Clojure. Half the point here is trying out the language, and I've successfully procrastinated until they got a fantastic, standardized build system going so that I don't have to fuck around installing libraries by hand, which seems like it'll be very gratifying after the bunch of time spent in the Erlang world lately.

Installing Clojure

Clojure the debian package is actually not in the free repos. You can apt-get install clojure, but only after adding contrib and non-free to your sources.list, which I don't particularly want to do. In case you haven't noticed yet, I'm the sort of person who occasionally runs vrms, just to make sure. It turns out though, that the Clojure build tool can handle the task of installing the language for you, and provide faux-quicklisp/quickproject functionality and is in the free repos as of wheezy. So, one

apt-get install leiningen

later had me on my feet. Or part of the way, at least. That install gives you lein new and lein repl, but doesn't by itself set up a development environment. In order to do that, I also had to lein plugin install swank-clojure, and shove clojure-mode into my .emacs. At that point, I was ostensibly ready to start on a project, but SLIME and swank-clojure weren't playing nice for whatever reason. I still haven't puzzled it out, but the best idea any docs gave me was that Clojure really doesn't want you to have your own swank installed, thank you very much.

Given that I'm a professional Common Lisper these days, I had exactly zero chance of following that instruction. Instead, I wired up clojure-mode to use the inferior-lisp option by adding the following additional code to my .emacs

(defun na-load-buffer ()
  (interactive)
  (point-to-register 5)
  (mark-whole-buffer)
  (lisp-eval-region (point) (mark) nil)
  (jump-to-register 5))


(defun clojure-run-test ()
  (interactive)
  (let ((b (get-buffer-create "*clojure-test*")))
    (with-current-buffer b
      (erase-buffer)
      (insert (shell-command-to-string "lein test")))
    (display-buffer b)))

;; inferior-lisp support.
;; Because fuck you, that's why.
(add-hook 'clojure-mode-hook
          '(lambda () 
             (define-key clojure-mode-map (kbd "C-c C-c") 'lisp-eval-defun)
             (define-key clojure-mode-map (kbd "C-x C-e") 'lisp-eval-last-sexp)
             (define-key clojure-mode-map (kbd "C-c C-e") 'lisp-eval-last-sexp)
             (define-key clojure-mode-map (kbd "C-c C-l") 'na-load-buffer)
             (define-key clojure-mode-map (kbd "C-c C-z") 'run-lisp)
             (define-key clojure-mode-map (kbd "C-c C-b") 'clojure-run-test)))

After all that, run-lisp in a Clojure buffer will start up a Clojure REPL, and the keyboard shortcuts I'm used to from common-lisp-mode will more or less work as before. clojure-run-test is mind-numbingly slow, and I don't get completions or arglist hints, but it's good enough for a start.

Trying Clojure

The first thought that struck me was "Wait a minute, this looks a hell of a lot like Scheme". And really, that turns out to be pretty on the money, from what I can see so far at least. Clojure is a JVM Scheme with curlies, brackets, an Arc-esque obsession with counting characters needed in the source code, and heavy emphasis on immutability. That was bolded because, if you're in a hurry, you can basically stop reading now. If I were to offer advice about whether to learn it or not, I'd say

  1. if you need to do any extensive work on the JVM, use Clojure, it beats the alternatives
  2. if you don't know a Lisp yet, Clojure is a reasonable choice for your first[3]
  3. if you already know Scheme or Common Lisp, and are comfortable with it, and don't go in for this JVM nonsense, don't bother learning Clojure because it'll teach you nothing new in the Perlis sense

The differences are mostly in minutia, rather than the general principles of the language. I'll go through the few that are obvious from cursory poking, but if you're interested at all, you should take in Clojure for Lisp Programmers Part 1 and Part 2, in which Rich Hickey tells you basically everything I'm about to and a few more things besides.

There are probably bigger differences than the ones I'll point out, consider this a "preliminary impressions" note, because I've yet to do anything more serious than an attempt at that poker hand kata.

  • Different Truth/Falsity Values Clojure has an explicit true and false. nil and the empty list are not equivalent[4], and you're free to define one-letter local variables that designate time, traffic or totals. That's different from both CL and Scheme, and I'm sort of leaning towards calling it frivolous, but I'll see how it works out in practice[5].
  • No Separate Function Namespace Clojure cribs from Scheme here. A single function/variable namespace means you don't need to use #', and it means you don't need separate let/flet. Oddly, there are two define forms[6], but it's otherwise closer to the Scheme way of doing things.
  • Fewer Parentheses I'm talking about let and cond bodies here. CL and Scheme both have you delimit each pair in an additional set of parens, while Clojure doesn't. This might make transpose-sexps a bit weirder on their clauses, but reduces the amount of typing you need to do by a tiny amount in the general case.
  • Polymorphic Built-Ins The general equality test in Clojure is =, unlike CL or Scheme where you need to pick between =, eq, eql, etc. first, last, map and many others also work generically on sequences rather than just on lists.
  • Vectors Everywhere [1 2 3] is "the vector of 1, 2, 3" rather than a list. Because of the polymorphic thing above, this doesn't introduce as much syntactic complexity as you'd think, and it means you don't need to worry about which end of a list you're taking from. Argument lists are all vectors rather than lists.
  • Destructuring By Default I'm pretty used to whipping out destructuring-bind in Common Lisp because it's sometimes the most straightforward way of expressing something. I don't use it nearly as often as often in CL as I do in Python or Erlang just because it doesn't save typing in nearly as many situations given what the construct looks like[7]. In Clojure, you can do something like
    user=>(def foo [1 2 3 4])
          #'user/foo
          user=> (let [[a b c d] foo] (list a b c d))
          (1 2 3 4)
    which means that I could start doing this much more frivolously.
  • Curlies and Brackets Obviously. It's not as though CL doesn't have them, but they tend to get used very sparingly as part of reader macros. Clojure uses curlies to designate hash-maps/sets and [] to designate (among other things) vectors. Personally, I don't miss the JavaScript/jQuery matching hell that comes with nesting all three of them, but they don't seem to be mutually nesting in a lot of places, and paredit helps a lot anyway.
  • Whitespace Commas The quote and backquote still work as expected, but the "unquote" modifier is ~ rather than ,. This is another one that I see as frivolous, though I guess it could reduce cognitive friction for people who are used to delimiting lists with things other than spaces.

Two bigger ones that I feel the need to call out more prominently because I like them are multimethods and doc hashes.

If you're a Common Lisper, you're already used to multimethods. What's different about them in Clojure is that the generic function declaration takes a dispatch function. Which means that you can specialize methods on arbitrary properties, rather than just types. In Common Lisp, I occasionally have to declare a class for something just so that I can define methods for it, even if the thing I'm dispatching on really makes more sense as a slot than a class. The Clojure approach would save me code in these places.

Doc hashes are severely beefed up docstrings. Or, you could think of them as programming-by-contract-lite, I guess. You still have the option of doing the usual docstring thing

(defn read-card [card-string]
  "Takes a card string and returns a card hash with a :rank, :suit and :name"
  (let [rank (or (get rank-map (first card-string)) (read-string (str (first card-string))))
        suit (get suit-map (second card-string))
        name (get name-map rank)]
    {:rank rank :suit suit :name name}))

but if you want to get detailed, explicit, and compiler-checked, you have the option of doing something like

(defn read-card [card-string]
  {:doc "Takes a card string and returns a card hash with a :rank, :suit and :name"
   :pre [(string? card-string) (= 2 (count card-string))]
   :post [(= clojure.lang.PersistentArrayMap (class %))]}  
  (let [rank (or (get rank-map (first card-string)) (read-string (str (first card-string))))
        suit (get suit-map (second card-string))
        name (get name-map rank)]
    {:rank rank :suit suit :name name}))

You can define inline tests too, if you want, but it's probably better to keep those in a separate test file. The static typists among you are probably snickering at this, but I like it better because these are optional. You don't want them on every function ever, you just want them on the potentially confusing functions, whose existence you should be trying to minimize. This is one step closer to getting code and documentation to coexist peacefully.


Footnotes

1 - [back] - Though there is some overlap.

2 - [back] - Which is actually a lot less painful with functional programming in general than it seemed to be for the various Java/PHP teams I've had the pleasure of UI-ing for.

3 - [back] - Because it has the elegance of Scheme, combined with the production presence of Java meaning it'll be easier to convince your boss to let you use this than it will to let you use an actual Scheme, not that there's a lack of JVM options there.

4 - [back] - Though nil does equate to false for boolean purposes.

5 - [back] - As a note, having thought about it a little more, there are a couple of places where this is the unambiguously right thing to do, and I've yet to think up a situation where it'll trip me up.

6 - [back] - def for variables and defn for functions.

7 - [back] - (destructuring-bind (a b c) some-form-here &body)

Monday, August 20, 2012

Not Having Any Fun

Ok, so I mentioned I was working on a new thing that involved moderation, administration and the auth system I put together as part of the Four-and-a-half-and-counting part series on Authentication. I've still got one or two left to write there, but since this "don't talk about it 'till it's done" thing worked out so well, I'm going to keep you in suspense.

The result of my toil is Nitrochan a massively-ish scalable, real-time message board system inspired by the *abas that the internet is so full of. My problem with 4chan and similar boards is that they are sort of like going to a restaurant and having a guy come by to shit on your plate every few minutes. It seems that what you'd really want[1] is a constant, flowing stream of shit that you can pan for nuggets at your leisure. And this is an attempt at that. When a new thread is started, the boards are all updated with new data. When a new message is posted, the appropriate threads move up the sort order, and people already on the thread get the new message via Comet rather than having to F5. Threads can be moderated and moved between boards through similarly soft-real-time mechanisms.

The github is there, released under the terms of the AGPL[2]. I'll have another go at setting up an instance here for my own nefarious purposes[3] later this week.

The UI layer is still somewhat incomplete for a message board; we can't designate images as spoilers/nsfw, there aren't any comment markup options yet, there's no way to proactively protect a board or thread from spam, and the RSA login process is just as manual and painful as it was the last time I discussed it.

Still, we've got a good starting point to look at in terms of putting a running system together[4].

Now then, the bad stuff.

Bad Stuff

The Erlang deployment process is really beginning to annoy the fuck out of me.

I mean, it kind of did last time too, but I figured that it would get simpler as I went on and automated pieces. That... didn't really happen. You'll note that I mentioned I'll be trying again to set up an instance of Nitrochan.

The attempt proved to be futile, even without having to wrestle with rebar again. I'm really beginning to grudge that the language designers seemed to have considered actual deployment of an app to be outside of their scope. That's a shame, because every useful application is going to need to be deployed somewhere, and doing this stuff manually gets really tedious if you rely on even two or three libraries not found in the core Erlang image. rebar would be a good solution, from what I understand about it, assuming it did what it says on the tin. It has yet to for me.

That's saying nothing of the massive headaches I've gone through as a result of platform incompatibilities. Basically, I spent about an hour trying to figure out why it only works on my machine, only to remember that my deployment environment is a 32-bit Debian machine while my development environment is a 64-bit build of the same. That's really the only thing that could possibly make a difference, because I've painstakingly reproduced my local directory structure, installed programs and downloaded libraries, but running the same make tasks on the same code seems to crash on the server even while working perfectly well on my local. This is Not Inspiring Confidence©™ in Erlangs' touted cross-platform abilities. Because if I can't depend on my program running the same way on two builds of the same OS on different architectures, how am I supposed to believe that it'll do anything but explode when I try to deploy on another OS?

Bottom line, I'm not having any fun.

I did learn a lot about concurrency outside of the lock/mutex world, and I appreciated the opportunity to mess around with actors on a grander scale than I would usually be permitted, but the continuing headaches aren't worth it for me so far. I may come back to it once I've recharged my mental batteries. For the next week or so, I'll be playing around with Clojure[5].


Footnotes

1 - [back] - If that was your thing.

2 - [back] - Read it before hacking on it, and get my permission if you want to use the system under another license.

3 - [back] - Really, it's just so that the Toronto Lisp User Group can have something better than mailing list with which to communicate between meetings.

4 - [back] - And one of the items on that roadmap is automating the RSA login process.

5 - [back] - Links added to the sidebar, and shiny new logo in the bar up top.

Sunday, August 12, 2012

cl-mop, or "Yak Shaving for Fun and Marginal Profit"

"mop" stands for "Meta-Object Protocol", and it's a term closely related to CLOS. I've mentioned getting annoyed at a certain piece of it last time, when I needed to iterate over CLOS instance slots for some weird reason. It turns out that due to the way MOP support is implemented, this is a non-trivial thing to do portably.

Last week, I got into a situation where I needed a temporary copy of an object. What I really wanted was an object with most slots mirroring an existing instance, but with changed values in two slots. For reasons related to the layout of the surrounding code, I did not want to destructively modify the object itself because it was unclear whether the old values would be expected on a subsequent call. So I googled around a bit, and found that the situation for copying is pretty much the same as it is for iterating. There isn't a built-in, general way of making a copy of a CLOS instance, shallow or otherwise, and implementing it myself in a semi-portable way would require doing all the annoying things that I had to pull with slot iteration earlier.

So, being that I occasionally profess to be a non-idiot programmer, I figured I'd take a stab at solving the problem in a semi-satisfactory way.

And here we are.

That implements slot-names (which takes a CLOS instance or class and returns a list of its slot names), map-slots (which takes a (lambda (slot-name slot-value) ...) and an instance, and maps over the bound slots of that instance), shallow-copy (which does exactly what it sounds like it would do) and deep-copy (which is tricky enough that I hereby direct you to the documentation and/or code if you're sufficiently curious about it).

I did cursory testing in GNU Clisp, and fairly extensive testing (followed by some production use) in SBCL, though the :shadowing-import directive should work properly in a number of others as well.

Now, I realize that due to the kind of crap you can pull using CLOS by design, this isn't a complete solution. That said, it did solve the problems I was staring down, and I think I've made it portable/extensible enough that you'll be able to do more or less what you want in a straight-forward way. For basic use cases, it solves the problem outright, which should save me a bit of time in the coming weeks. For more complex cases, each of the exported symbols is a method, which means you can easily def your own if you need to treat a certain class differently from others.

Saturday, August 11, 2012

Steven? I Disagree.

Ok, yes, I know the last time I disagreed with Yegge, I wound up eating my own hat, but I hereby suggest that partitioning the software industry into Libs vs. Cons is a stupid idea, and gains us nothing. In any sense.

Firstly, because those terms are already loaded with enough political and emotional baggage that people are going to have a hard time letting go[1], and that's going to lead to[2] the same kind of partisan garbage that US politics is well known for.

Secondly, because partitioning any group of people into two explicit, conflicting sides is hands down the worst way of easing/preventing/reducing conflict within that group. Ostensibly, that's what he's trying to do with the thought framework; point out that certain things are a matter of preference rather than points of debate, and that we should therefore stop arguing about them. Something tells me the actual effect of this conceptual framework will lead to a different outcome[3]. I've read comments calling the opposition to this classification scheme "weird", and I have to wonder why. It's divisive, pretty much by definition. The fact that certain pieces of it are correct doesn't make it worth keeping in its entirety, and in any case...

Thirdly, the underlying properties he presents are, for the most part, not a matter of preference. He sort of presents them that way, but I disagree at that level. Hell, lets do a blow by blow. here are the points he defines as principles of software conservatives.

  1. Software should aim to be bug free before it launches...
  2. Programmers should be protected from errors...
  3. Programmers have difficulty learning new syntax...
  4. Production code must be safety-checked by a compiler...
  5. Data stores must adhere to a well-defined, published schema...
  6. Public interfaces should be rigorously modeled...
  7. Production systems should never have dangerous or risky back-doors...
  8. If there is ANY doubt as to the safety of a component, it cannot be allowed in production ...
  9. Fast is better than slow. Everyone hates slow code. Code should perform well. You should engineer all your code for optimum speed up front, right out of the box...

The software liberals supposedly have the inverse principles. He makes them explicit in his entry, but I won't bother to quote them here. Note that points 1, 4, 5, 6, 7, 8 and 9 have not a fucking thing to do with personal preference. They're things that make sense in some contexts, and not in others. Some programmers really, really like having error prevention in the form of a restricted language (#2), and some really hate learning new syntax (#3), but the rest of these "principles" involve trade-offs that sometimes make sense and are sometimes retarded. Should All software aim to be bug free? Should production code All be checked by a compiler? Should production systems Never have back-doors? We actually can't know the right answer in general, from a static analysis at least. At the risk of being painted as a godless, sissy liberal in the wake of Yegge's proposal, we need to take a look at the run-time environment.

Your high-frequency trading software or your Air-Data/Inertial Reference Unit, or your cardiac implant firmware had damn well better be bug free, and rigorously modeled AND compiler checked AND free of back-doors AND not allowed anywhere near production if they're even suspected of incorrectness. When the stakes are billions, or lives, eating the cost of a more extensive and rigorous development process makes sense[4]. On the flip-side, when we're dealing with a situation where the software is replacing an already buggy manual process that no lives or life savings depend on, no one is going to care about a complication. Likewise, there isn't a benefit to taking weeks to prevent a bug that you can hotfix in days or hours. Finally, if the cost of a rollback or upgrade is close enough to trivial, you can be forgiven for taking more risks than you otherwise would.

This is not what a preference looks like; it makes sense sometimes and not others, and a correct one can be chosen based on context. A preference is something that there really isn't a "correct" way of thinking about. Something that we have to accept because it's atomic. So even if globally bifurcating the industry would lead to some new insight[5], and even if that insight would improve inter-programmer relations[6], these aren't the axes to do it on.

So there.

Steven... I disagree. And I won't be adopting your thought framework until you consider filtering out your projections.


Footnotes

1 - [back] - If you take a look at the HN, /. and G+ discussions, you'll already see people conflating the political meanings with the proposed software-oriented labels. Less so on slashdot, where most seem to simply dismiss the point of view, but there's a comment on the Google Plus page that reads

Dynamic typing has been shown through research to reduce maintainability compared to static typing. Lars Ivar Igesund

Which is, near as I can tell, Utter Horseshit™©. If you bother reading on, when someone asks for a citation, the response is

The research was done by a friend of mine while working at one of those famous, private research centers (yes, one you've heard of), but to my knowledge it has not been released. I don't remember the statically typed language used in the study, but I Imagine it was Java. The dynamically typed language was Ruby. This I can't point you to it, I just hope that you believe me when I tell you the conclusion of it. It certainly jives with mine experiences.Lars Ivar Igesund

That's about what I was expecting; "This guy I hang out with told me my opinion was totally right". Oh, by the way, 16 upvotes, or plusses, or whatever the fuck. Never-mind the fact that a methodology isn't outlined, or that the definition of "maintainability" isn't mentioned, or that the languages involved are "I Imagine ... Java" and Ruby, or that we don't know if/how the researcher controlled for differences among teams/programmers/projects or (in case this was a single team doing to separate projects) the teams' innate preferences/learning over the course of the experiment.

2 - [back] - Actually, as you can see by the previous note, "is already leading to" would be more accurate. Hell, there's already a guy out there calling himself a "Software Libertarian", and we haven't even gotten through Software Ayn Rand yet. That's some leapfrogging right there.

3 - [back] - I believe that may be the second time I've linked that comic this month.

4 - [back] - In a similar vein, it's interesting to note that NASA's Mars rovers have "dangerous or risky back-doors" capable of modifying the systems' programming and data. Presumably it was too risky to send them out without the possibility of an in-flight bugfix?

5 - [back] - I doubt it will.

6 - [back] - Again, severely doubt it.

Sunday, August 5, 2012

Irritation

Just a short update this time, involving things I keep stubbing my toe on in Lisp and Erlang.

Common Lisp is not Object Oriented

The object orientation support is bugging me again. Not just me, either[1], because a bunch of modules I've been making use of lately have functions with names like time-difference or queue-push, which is precisely what the generic functions are supposed to save you from doing. It recently annoyed the fuck out of me while putting together a simple, caching implementation of a thread-safe queue. I wanted that construct to have push, pop and length, but because those names already designate top-level functions, it's not quite as simple as declaring them.

I'm not about to be dumb enough to propose that this makes Common Lisp an unacceptable language, especially since it looks like this could easily be fixed within the spec as it exists today, and I already quasi-proposed a semi-solution. I just have to give voice to that minor frustration, and point out that what you'd really want in this situation is access to a lot of the basic CLHS symbols as methods rather than functions. Not having this has now bitten me directly in the ass no less than twice[2], and signs that it might be worth fixing are showing in various CL libraries.

Erlang Should Be More Like JavaScript

Wow, do records suck donkey dong!

Ok, to be fair, they're better than having to deal with plain tuples when you're working with large constructs, and they're arguably The Right Way to deal with database storage, but they're a fundamentally annoying and hacky way of implementing key/value pairs.

The problem is record sharing. Here's a thought exercise: what happens when you have a system that deals with the storage and manipulation of sets of comments[3], and a second, completely separate system which would like to consume the output of that first one in order to display these sets in interesting ways for human consumption?

If you had a real k/v construct built in, like what every other goddamn language on this earth seems to have, what you would do is pass an instance of that construct across.

If the hash map was a fundamental data type in Erlang, you would have no problem in this situation.

But.

Records are basically tuples, wearing a bunch of reader macros and syntactic sugar. That means they're potentially faster than using a dynamic data structure for the same purpose, but it means that you can't just pass a record between two otherwise decoupled systems. If you want the same sort of behavior that you'd get out of native k/v support, you have three options I can see, and they all make me want to glare menacingly at Joe Armstrong, or at least whoever decided that records were a satisfactory solution.

Option 1: Duplicate Records

You declare the same record in both systems, then send records across.

This sucks balls because changing the record suddenly requires you to change and recompile both projects. They are not really decoupled anymore. In our theoretical example above, say we've decided that we'd really like to start tracking comments hierarchically. We need to add a pair of new fields, root and parent so that each comment can tell you which tree its part of and where in that tree it is.

-record(comment, {id, user, thread, root, parent, timestamp, title, body, status}).

Now, we can't just make this change in the model component, because if you had different record declarations in the model than the view, you'd get compiler errors. If you have multiple views trying to make use of the same model, and not all of them need the new data[4], too fucking bad, you're changing them all over anyway. This isn't even the worst case scenario, by the way. If you decide that the record shouldn't change fundamentally, but that you merely need to reorder fields, you won't even get a compiler error if you forget to change records in both places.

This is not the sort of brittleness that I expect from a key/value construct.

Option 2: Shared Records

You can write one file, lets say records.hrl, put all your record declarations in there and then include that file in both projects.

This sucks balls because now you don't actually have two decoupled projects at all. You've got one giant, mostly disjoint project with shared data declarations. It's not horrible, to be fair, but remember that having a run-time construct rather than a compile-time record system wouldn't even require this much additional planning.

Option 3: Sending Tuples or Proplists

This is the option I went with for a recent project, and I'm honestly not sure it was the right approach, but there would have been record name collisions otherwise, so whatever, I guess.

Instead of sending records between components directly, you emit a tuple from the model and consume it in the view, potentially creating an intermediate record if you need to. This has pretty much all the downsides of Option 1, except that you don't have a single record name-space to deal with. If you take the Proplist approach, it gets very slightly better because you only need to put together the one abstraction layer to do look-ups, and if you make it complete enough, you don't need to change it whenever you change the record definitions. That's still a lot more annoying than just having this force pre-resolved.

I remember writing up notes from a talk Joe gave about Erlang. One of the points he covers under the "Missing Things" heading was Hash Maps, wherein he pointed out this specific issue with the fundamental architecture of the language. In the notes, I sort of acknowledge that he has a point, but don't linger on it too long. Honestly, I was thinking that it wouldn't bite at all, let alone as hard as it actually has. Joe, if you're reading this, you were right. And for the love of god, if you've got a solution in mind, DO IT.

lists:keyfind/3 and workarounds like this aren't nearly as satisfying as just having an actual, dynamic key/value construct built into the language from the ground up.


Footnotes

1 - [back] - Though I may be the only one who's noticing enough to bitch about it.

2 - [back] - That I've noticed.

3 - [back] - -record(comment, {id, user, thread, timestamp, title, body, status}).

4 - [back] - For instance, if there are places that you're displaying the same set of comments, but don't really care about their hierarchy.

Friday, August 3, 2012

Indirect Reflections

I've got some thinking to do, and given how long the alternative was taking, it's obvious that it can't happen effectively in my head. I don't want to tell you exactly what I'm working on yet, because revealing my projects before I'm done with them results in them never getting done. Witness the detritus that already litters this blog

  • cl-chan took about a year and a half to get a quarter of the way I was going, whereupon I was distracted by shinies
  • Strifebarge was supposed to be a quick weekend project to get me back into the groove of programming after a bit of a vacation, but it's taking months and counting
  • auth was meant to have a working external API layer by now, as well as two-factor-authentication capability[1]
  • clomments was a piece that I literally planned out in its entirety and proceeded to 0.1 in about four hours, then got bored and started poking at Arduinos
  • cl-leet took months of planning and a week of the CL Games Competition to get to a hemi-semi-playable state[3]

Hell, the only projects I've gotten to done, for some value of "done", are the ones I never really think about as projects.

  • emacs-utils is sitting quietly up on github, saving me a few hours per day on various tasks.
  • formlets is doing likewise, though not on a daily basis.
  • cl-css should probably be replaced by something closer to cl-who, and stop using so many `',@s, but it gets the job done in the meanwhile.
  • finally, my mplayer web-frontend is still as awful as the day I threw it together, but it actually functions and lets me "control"[4] my media center from any wifi-capable device in the house.

So, given the track record of "things I talk about first" vs "things I put together first", you'll pardon me for keeping my latest exploits under my hat until I'm ready to pull the big red lever. Thing is, there's a component that I'm trying to assemble that has me unsure about direction, so sitting down and throwing those thoughts through the loopback interface seems like a good idea.

Moderation

I've talked about this before, but not exactly in the same context. How do you moderate a system? Scratch that, how do you moderate a decentralized, public system with an eye for data transmission and potentially divisive discussions, in the light of recent-era copyright rules.

In totality, that offers some interesting challenges, even if no individual component is an unsolved problem. DMCA et al basically necessitate that there be a way to permanently and completely remove a piece of information from a given server, because legal battles may result otherwise. Maybe they don't happen often in practice, but that's still not the sort of risk I'd be willing to take. Trouble is, permanent and complete deletion of information gives some odd incentives to the moderators.

Ok, actually, lets step back a bit further, I've noticed another assumption that should probably be explained.

Moderators

And that's probably not far enough.

Authority Figures

Hmm. No, it's bigger than that too.

Market-Capable Primates

Right, that's far enough back. I'll try to zoom back in as quickly as I can while at least giving some clues as to my thought process.

The interactions of MCPs is predictable in a couple of ways[5]. When you get a bunch of them talking to each other, over whatever hardware and protocol they actually decide to use, you're going to get three basic types of messages going back and forth.

  1. messages genuinely generated by some internal state (regular discussion, *signal*)
  2. messages generated by external forces rather than intrinsic interest ("buy these dick pills!" or "one weird old tip to whiten your small intestine!", *noise*)
  3. messages generated by environmental factors, causing intrinsic interest (religious discussions, *echo*)

You can subtype each of those almost arbitrarily, but it's possible to classify any message you come across into one of the three.

The perfect communication forum[6] would consist of a single instance of each *echo* conversation[7], and be completely free of *noise*.

The ideal message profile is (setf *noise* 0 *echo* #'unique-p *signal* #'not-echo-p)

Authority Figures

In order to ensure that a given forum approaches the ideal message profile, most of them vest power in authority figures. These figures tend to be present whether the forum has other ways of telling *signal* from *noise*, and I'll argue that the reason is largely because of that third category of message we've identified as being somewhat useful to MCPs. The power vested in these authority figures is largely censorship; they kill the *noise* that slips through whatever automated/cloud-based/crowd-sourced/buzzword-compliant system is in place to catch the bulk of it, and are expected to make judgment calls bout *echo*s. If a given topic is judged as being *noise*y, it's deleted, or its visibility is artificially reduced in some way.

Authority Figures in this context do a lot of their work behind closed doors, and each of them is only human. The vague hope is that either they'll be kept in check by the community that develops around them, or by other Authority Figures. In meatspace, that's not a less-than-catastrophic assumption to make, but web forums tend to be viewed as less important (or perhaps better monitored), so something different seems to be happening.

Hmm. We'll actually need a detour on our way back.

Echoes and Shadows

The problem with *echo*s is precisely that they demand a judgment call. One human will take a look at the weekly /r/lisp argument about newLisp/Clojure/whatever-the-new-lisp-dialect-is and hit the spam button before she gets past the first sentence. Another will take a look at the exact same conversation, wonder why they've never heard about it before, and grumble quite loudly when someone closes it. That grumble incurs a cost on the system, measured in citizen good-will; someone who had no idea about a particular discussion is effectively prevented from having it, or forced to have it somewhere else.

This is the best-case situation, mind you; Authority Figures that are doing their very best to provide a balanced community free from inbuilt bias will still occasionally trip over an *echo* and shitcan it, or accidentally mistake a *signal* for *noise*. The typical case is probably going to be worse; AFs deleting *signal* they don't agree with, or aggressively permitting *noise* they enjoy on some level.

That's the trouble with Authority Figure-based *noise* reducing systems; false positives and negatives in situations where you'd rather not have them if you can avoid it. The naive response is fine-graining that Authority.

Moderators

Instead of having a set of Authority Figures for the whole community, shard the community and set up Moderators for each shard. That should reduce pressure on each Moderator, as well as allow them to work to their strengths by moderating communities centered around things they're more than baseline passionate about. The thing is, the output of this process is still not accurate sorting of message types. Moderators still commonly delete things for reasons other than objective merit. If you disagree, spend a few hours here[8].

Moderation

And we're back. Based on the principles outlined above, it seems like the best way to avoid over-moderation-related costs on a community is to make sure that any actions moderators take are

  • publicly viewable in context
  • fully reversible

As mentioned earlier though, there needs to be a way to actually, factually delete threads posts and images for legal purposes. If you get hit upside the head with a DMCA or similar, you can't really say "Yup, we've deleted it, only our moderator community can see it now", that shit needs to be gone. Which means I'm stuck implementing both, and worrying (perhaps excessively) about the effects of the nuclear option. In other words, I want

  1. delete; meaning, "make sure no one but mods can see this, show everyone else a deleted tag"
  2. undelete; meaning, "oops, that wasn't supposed to happen, release that one back to the public"
  3. purge; meaning, "nuclear option, this is either unauthorized or illegal media and it needs to go. Log who hit the kill switch, and ask them for a reason (which should ideally be a copy of the C&D that came in requesting the deletion)"

That third one implies the presence of an outside deletions system that keeps track of information about vaporized stuff without keeping it for archival and post-mortem purposes later.

Perfect! That cleared my mind a bit. I think I can see the way through now. Hopefully, this doesn't prevent me from reaching it.


Footnotes

1 - [back] - That's still coming[2], the project I'm pointedly not mentioning uses auth for the user system, and actually started as a demo project for how you'd go about hooking that up to a larger system.

2 - [back] - Though, to be perfectly fair, I've been saying that about a lot of things.

3 - [back] - Though it did result in two articles that were reasonably interesting to write.

4 - [back] - It can browse one specified directory and play one video at a time. I don't a random wifi user to be able to do anything more than that.

5 - [back] - And a few of them might extend past MC, right into Social Primates in general, but I'm thinking of a particular primate species which disproportionately tends to internet use so we don't have to cast a net quite that wide.

6 - [back] - "Forum" in the general sense, not just the kind you find on the internet.

7 - [back] - You need to talk about these things, but given how often MCPs circle back to them, it's very unlikely you have a new idea, and we definitely don't need to keep hearing about it every week.

8 - [back] - If you were there at the early days of programmers.SE, you don't disagree.

Monday, July 16, 2012

CL-Chan (a CLSQL and Hunchentoot crash course) - Part 2

I've fallen to a fit of OCD in the past few days, which has compelled me to clean out my langnostic drafts folder. Yes, I have a drafts folder. It turns out that it contained something on the order of 12 almost finished articles that I just never got around to polishing. I'm still working on the authentication system, and poking around at the prospect of a real-time message board, but I'll also be either scrapping or refining+posting those forgotten drafts for the next few weeks. Starting with one about a year and a half in the making.

Start with Part 1 if you like.


Aaaand we're finally back.

This time we'll be looking at extending the board features and dealing with images, but first, I'm sure I'm not the only one tired of having to type out

(ql:quickload (list :hunchentoot :cl-who :clsql :formlets))
(load "cl-chan.lisp")

every time I want to load the package. Ideally, I'd like that to be a single statement ("Load all cl-chan files in the correct order, and do the same for each dependency"). That's one of the things an asdf-system definition lets you do[1]. First up, we've been keeping everything in one file, and we really shouldn't. At the very least, the model should be isolated since it's going to grow shortly. It's also typical for CL projects to have a separate package.lisp file[2]. Our package file is going to be very simple, since we don't export anything yet.

(defpackage :cl-chan (:use :cl :cl-who :hunchentoot :formlets :clsql)
  (:shadowing-import-from :clsql :select))
(in-package :cl-chan)

(connect '("localhost" "cl_chan" "me" "my password") :database-type :mysql)

(setf *default-caching* nil)

(setf formlets:*public-key* "my-public-key" 
      formlets:*private-key* "my-private-key")

(defvar *web-server* (start (make-instance 'hunchentoot:easy-acceptor :port 4242)))
(push (create-static-file-dispatcher-and-handler "/cl-chan.css" (merge-pathnames "cl-chan.css")) *dispatch-table*)

That should do it.

Next, lets pull out our testing data into a file named "testing-data.lisp" (we'll remove this later, but it will let you re-create your database fairly easily if you need to while we're still playing around)

(in-package :cl-chan)

(defun create-test-database ()
  (create-tables)
  (insert-test-data))

(defun drop-tables ()
  (dolist (i '(board thread comment))
    (drop-view-from-class i)))

(defun create-tables ()
  (dolist (i '(board thread comment))
    (create-view-from-class i)))

(defun insert-test-data ()
  (loop for i in (list (make-instance 'board :name "a")
                       (make-instance 'thread :board-id 1)
                       (make-instance 'thread :board-id 1)

                       (make-instance 'comment 
                            :thread-id 1
                            :author "me" :email "my@email.com" :subject "FRIST"
                            :body "I am most certainly the first poster in this fine establishment"
                            :date-time (now))
                       (make-instance 'comment 
                            :thread-id 1
                            :author "someone else" :email "you@fmail.com" :subject "Stop being a douchebag"
                            :date-time (now))
                       (make-instance 'comment 
                            :thread-id 1
                            :subject "You must be new here"
                            :body "trolled-softly.jpg"
                            :date-time (now))
                       (make-instance 'comment 
                            :thread-id 2
                            :body "[Something vaguely anti-semetic.]"
                            :date-time (now)))
        do (update-records-from-instance i)))

Next, pull out the model (the classes and related methods) into model.lisp. This'll actually be the most complicated individual file in the project.

(in-package :cl-chan)
(file-enable-sql-reader-syntax)

(defun now () (clsql-sys:utime->time (get-universal-time)))

;;;;;;;;;; board
(def-view-class board ()
  ((id :accessor id :initarg :id :type integer :db-constraints (:not-null :auto-increment) :db-kind :key)
   (name :reader name :initarg :name :type (string 5))
   (threads :accessor threads :db-kind :join
            :db-info (:join-class thread :home-key id :foreign-key board-id :set t))))

(defmethod echo ((board board))
  (with-html-output (*standard-output* nil :indent t)
    (:h1 (str (name board))) (:hr)
    (show-formlet post-thread-form) (:hr)
    (dolist (thread (threads board))
      (summarize thread))))

;;;;;;;;;; thread
(def-view-class thread ()
  ((id :accessor id :initarg :id :type integer :db-constraints (:not-null :auto-increment) :db-kind :key)
   (board-id :reader board-id :initarg :board-id :type integer)
   (comments :accessor comments :db-kind :join
             :db-info (:join-class comment :home-key id :foreign-key thread-id :set t))))

(defmethod summarize ((thread thread) &optional (preview-comment-count 5))
  (let* ((preview-comments (last (cdr (comments thread)) preview-comment-count))
         (omitted-count (- (length (cdr (comments thread))) (length preview-comments)))
         (first-comment (car (comments thread))))
    (with-html-output (*standard-output* nil :indent t)
      (:div :class "thread"
            (echo-header first-comment)
            (:a :href (format nil "/thread?thread-id=~a" (id thread)) "Reply")
            (:span :class "body" (:p (str (body first-comment))))
            (when (> omitted-count 0)
              (htm (:p :class "omitted" 
                       (str (format nil "~a comments omitted (and we don't do pictures yet)" 
                                    omitted-count)))))
            (dolist (r preview-comments) (str (echo r)))))))

(defmethod echo ((thread thread))
  (let ((first-comment (car (comments thread))))
    (with-html-output (*standard-output* nil :indent t)
      (:a :href "/board" "[Back]") (:hr)
      (show-formlet post-comment-form :default-values (list (id thread))) (:hr)
      (:div :class "thread"
            (echo-header first-comment)
            (:span :class "body" 
                   (:p (str (body first-comment))))
            (dolist (r (cdr (comments thread))) (str (echo r)))))))

;;;;;;;;;; comment
(def-view-class comment ()
  ((id :accessor id :initarg :id :type integer :db-constraints (:not-null :auto-increment) :db-kind :key)
   (thread-id :reader thread-id :initarg :thread-id :type integer)
   (author :reader author :initarg :author :initform nil :type string)
   (email :reader email :initarg :email :initform nil :type string)
   (subject :reader subject :initarg :subject :initform nil :type string)
   (body :reader body :initarg :body :initform nil :type string)
   (date-time :reader date-time :initarg :date-time :type wall-time)))

(defmethod echo ((comment comment))
  (with-html-output-to-string (*standard-output* nil :indent t)
    (:div :class "comment"
          (echo-header comment)
          (:span :class "body" 
                 (:p (str (body comment)))))))

(defmethod echo-header ((comment comment))
  (with-html-output (*standard-output*)
    (:span :class "header" 
           (dolist (elem '(author email date-time subject))
             (htm (:span :class (format nil "~(~a~)" elem) (str (slot-value comment elem))))))))

Having chopped all that off, your cl-chan.lisp file should be left at

(in-package :cl-chan)
(file-enable-sql-reader-syntax)

(defmacro page-template ((&key title) &body body)
  `(with-html-output-to-string (*standard-output* nil :prologue t :indent t)
     (:html :xmlns "http://www.w3.org/1999/xhtml" :xml\:lang "en" :lang "en"
            (:head (:meta :http-equiv "Content-Type" :content "text/html;charset=utf-8")
                   (:title (str ,title))
                   (:link :rel "stylesheet" :type "text/css" :href "/cl-chan.css"))
            (:body ,@body))))

(define-formlet (post-comment-form)
    ((thread-id hidden) 
     (author text) (email text) (subject text) (body textarea)
     (captcha recaptcha))
  (let ((new-comment (make-instance 'comment
                                    :thread-id (parse-integer thread-id)
                                    :author author :email email 
                                    :subject subject :body body
                                    :date-time (now))))
    (update-records-from-instance new-comment)
    (redirect (format nil "/thread?thread-id=~a" thread-id))))

(define-formlet (post-thread-form)
    ((author text) (email text) (subject text)
     (body textarea :validation ((longer-than? 5) "You need to type at least six characters here."))
     (captcha recaptcha))
  (let* ((thread-id (update-records-from-instance
                     (make-instance 'thread :board-id 1)))
         (new-comment (make-instance 'comment 
                                     :thread-id thread-id
                                     :author author :email email 
                                     :subject subject :body body
                                     :date-time (now))))
    (update-records-from-instance new-comment)
    (redirect"/board")))

(define-easy-handler (board-page :uri "/board") ()
  (page-template (:title "cl-chan")
    (let ((board (caar (select 'board :where [= [slot-value 'board 'id] 1]))))
      (echo board))))

(define-easy-handler (thread-page :uri "/thread") (thread-id)
  (let ((thread (caar (select 'thread :where [= [slot-value 'thread 'id] thread-id]))))
    (page-template (:title (or (subject (car (comments thread))) (id thread)))
      (echo thread))))

Now that we've broken everything up, start up a new file called cl-chan.asd and put this in it

;;; -*- Mode: Lisp -*-
(defpackage :cl-chan-system (:use :cl :asdf))
(in-package :cl-chan-system)

(asdf:defsystem cl-chan
  :version "0.001"
  :author "Inaimathi"
  :maintainer "you"
  :licence "AGPL"
  :description "A simple message board server"
  :depends-on (:hunchentoot :cl-who :clsql :formlets)
  :serial t  
  :components ((:file "package") (:file "model") (:file "cl-chan") (:file "testing-data")))

You should now be able to load up your lisp (with cl-chan/ as your working directory) and

> (ql:quickload :cl-chan)
To load "cl-chan":
  Load 1 ASDF system:
    cl-chan
; Loading "cl-chan"
..................................................
[package cl-chan]...
(:CL-CHAN)

Much better than doing it manually, and if you're showing off your app, you get to pretend you know Zach Beane for about two seconds until people realize what's going on :p. Now keep in mind that in this particular project, it hasn't saved us all that much. Even with our better organized code, we'd only really need to evaluate

(ql:quickload (list :hunchentoot :cl-who :clsql :formlets))
(load "package.lisp")
(load "model.lisp")
(load "cl-chan.lisp")
(load "testing-data.lisp")

in order to launch our little message board. Once you start adding files, you rapidly see the value of being able to use a single load statement instead, since a properly written .asd automatically loads them all in the correct order, as well as loading any noted dependencies. Note that once you understand what asd/package.lisp files are supposed to look like, you can auto-generate them with quickproject (thanks again, Zach).

Ok, it's about damn time we figured out what to do about these images we want, otherwise it's not much of an imageboard.

Alternatives Note

I'm going to continue the rest of the tutorial assuming we're taking a slightly reduced functionality, Lisp-only approach. It'll make the system really easy to set up, and increase its portability across various platforms (since everything we'll be using is a lisp library, it'll run anywhere you can load a thread-capable Common Lisp implementation). However, that's far from the only option.

The FFI Route

First, if you don't mind some installation headaches, you can use the faster-at-run-time :cl-gd (which is a set of UFFI bindings to the C-based GD graphics library).

Ostensibly, you can install libgd2-xpm or libgd2-noxpm (if you don't know the difference, just use the first one) and then (ql:quickload :cl-gd). And if that works for you, fantastic, you lucky fucker. Before you celebrate though, make sure to try out an example or two, because I thought it really was this easy to begin with.

If you're getting undefined alien function errors, as I did, you'll actually need to install the debian package from the lenny repos. If you're already running lenny, just do apt-get install cl-gd, otherwise you'll need to add

deb http://ftp.us.debian.org/debian/ lenny main contrib

to your /etc/apt/sources.list file, then run apt-get install cl-gd. I have no idea what issues you'll run into with lisps other than SBCL on systems other than Debian linux. Which is why I'm doing the portable thing in the actual tutorial.

The Lazy Bastard Route

Don't let the name throw you, it may be a legitimate option, depending on the circumstances. Basically, instead of dicking around with native libraries, or FFI calls, you just load up :trivial-shell and do something like

(shell-command (format nil "convert ~a -resize 300\\> ~a-preview.jpg" image-file image-name))

The performance on it sucks donkey dong, and it's not portable to non-posix-compliant platforms, and it requires you to have ImageMagick installed on the deployment environment, and it means you need to handle server-side image naming yourself (which we were going to do anyway) to avoid shell injection attacks. However, it's one line of code and it covers conversion for pretty much every graphical format under the sun (the main tutorial will be doing things the hard way, so you'll fully appreciate the simplicity of shelling out). If you can afford those hits, it's not a bad option.

I'll be continuing with the pure-Lisp version for portability purposes, and that gives us a bit of a problem. Unlike the "FFI" and "Lazy Bastard" options outlined above, Common Lisp doesn't have a general image-formatting library. We'll need to use separate libraries (and slightly different processes) for different image formats. Three in all, since we want to fully implement the 4chan formats; ch-image for JPEGs, imago for PNGs and skippy for GIFs. That's a pretty obvious place to apply method calls, actually, so lets start by formalizing the process for a single image type, then extending it to the others.

There are exactly two things we'll want to do to an incoming image: store the original, and store a 250x250 pixel proportional preview image. Before we get to that, we'll need to include a new library. :cl-fad will give us some easy ways of dealing with files on disk, so add it to your asd file package.lisp and load it into your REPL too. While we're at it, lets add the image manipulating libraries and the new file "images.lisp" into the mix.

;;; cl-chan.asd

...
:depends-on (:hunchentoot :cl-who :clsql :formlets :cl-fad 
                            ;; image related
                            :imago :skippy :ch-image)
...
:components ((:file "package") (:file "model") (:file "cl-chan") (:file "testing-data") (:file "images")))

;;; package.lisp

(defpackage 
    :cl-chan (:use :cl :cl-who :hunchentoot :formlets :clsql :cl-fad)
    (:import-from :imago :read-png :write-png) ;; resize
    (:import-from :ch-image :read-image-file :write-image-file) ;; resize-image
    (:import-from :skippy :load-data-stream :output-data-stream) ;; scale
    (:shadowing-import-from :clsql :select))
...

Note that we're importing the entirety of :cl-fad, but just select symbols from the image libraries. This isn't strictly necessary, but since we're going to be including three different utilities that do similar things, I get the sneaking suspicion that we'd get symbol collisions otherwise. I've imported relevant operations from all three libraries, even though we're starting out with JPGs only. Now then, start a new file called images.lisp, and add the following to it

(in-package :cl-chan)

(defclass image-upload ()
  ((name :reader name :initarg :name)
   (file-path :reader file-path :initarg :file-path)))

;;;;;;;;;; utility
(defun file-tuple->image-upload (hunchentoot-file-tuple)
  (destructuring-bind (file-path original-file-name mimetype) hunchentoot-file-tuple
    (make-instance (intern (string-upcase mimetype) :cl-chan)
                   :name (file-namestring file-path)
                   :file-path file-path)))

(defun store! (hunchentoot-file-tuple)
  (when hunchentoot-file-tuple
    (let ((img (file-tuple->image-upload hunchentoot-file-tuple)))
      (store-images! img))))

(defun new-dimensions (size width height)
  "Given a target size and width/height, returns a new width/height preserving aspect ratio. 
Does not scale images smaller than 250x250."
  (let ((ratio (max 1 (float (/ (max width height) size)))))
    (values (round (/ width ratio)) (round (/ height ratio)) ratio)))

;;;;;;;;;; PNGs
(defclass png (image-upload) ())
(defclass image/x-png (png) ())
(defclass image/png (png) ())

(defmethod store-images! ((img png))
  "Saves a preview and a big version of the given image in directories specified by the *big-dir* and *preview-dir* conf variables."
  (let* ((pic (read-png (file-path img)))
         (w (imago:image-width pic))
         (h (imago:image-height pic))
         (pic-name (make-pathname :name (name img) :type "png")))
    (copy-file (file-path img) (merge-pathnames pic-name *big-dir*))
    (multiple-value-bind (new-width new-height) (new-dimensions 250 w h)
      (write-png (imago:resize pic new-width new-height) 
                 (merge-pathnames pic-name *preview-dir*)))
    (namestring pic-name)))

So. What we just did was create a new class called image-upload, subclass it specifically for png, and write the store-images! method. :imago doesn't seem to provide a way to preserve aspect ratio for an image as you resize it, so we have to do that manually. I resisted the temptation to make it png-specific, because it's entirely possible that we'll need to call the same code as part of generating previews for the other formats.

Take a closer look at the file-tuple->image-upload. We're using the incoming file mimetype as a class name. That may sound like a bad idea, but as you'll see in a few minutes, we're going to be restrictive about what input we accept. It's just that in order to build a system we can extend later, we can't really be restrictive here.

Extensibility Note

The way we're going to restrict input is by doing server-side validation on the files our users will upload. That's a good idea, but doing just that will leave validation for this function elsewhere in the codebase. Without seeing that validation, the definition for file-tuple->image-upload can easily be mistaken for an injection attack vector (if we didn't validate, a user could send a bogus mimetype and cause us to spawn, for example, a pathname instead of an image. I can't think of an obvious attack that would be enabled by that, but it's still best to minimize vectors). Lets think through the alternatives here

Validate in file-tuple->image-upload

Whether we do it by writing an assertion to make sure that the incoming mimetype meets some criteria, or by creating a specific class using a cond statement, we run into the same problem: in order to add a new supported image type, it won't be enough to just evaluate a new store-images! method and new classes. We'll actually have to slightly re-write file-tuple->image-upload. That's bad; we'd like extensibility to be possible without involving edits to an existing cond in our package.

If you really, really feel nervous about leaving a naked class declaration like we did, you can add something along the lines of (assert (string= "image/" (subseq mimetype 0 6))) to the function, just to ensure the incoming is an image. Even doing that is going to come back for a bite of your ass if you decide to allow PostScript/PDF uploads in your forum (since both of those have the "application/something" mimetype).

Break file-tuple->image-upload up into methods

Instead of doing validation in-function, it's also possible to break the image-upload-creating function up into different methods. This is a viable, and technically more object-oriented, approach to the problem. I'm choosing not to go that way because it would mean defining something like

(defmethod make-image-upload ((mime (eql 'image/png)) hunchentoot-file-tuple)  
  "Handles PNG image-upload creation with the image/png mimetype"
  (destructuring-bind (file-path original-filename mime-string) hunchentoot-file-tuple
    (declare (ignore original-filename mime-string))
    (make-instance 'png :name (file-namestring file-path) :file-path file-path)))

(defmethod make-image-upload ((mime (eql 'image/x-png)) hunchentoot-file-tuple)
  "Handles PNG image-upload creation with the image/x-png mimetype"
  (make-image-upload 'image/png hunchentoot-file-tuple))

[repeat for every image type]

rather than merely something like

(defclass image/x-png (png) ())
(defclass image/png (png) ())

[repeat for every image type]

Yes, it's more object-oriented, but it's a lot more verbose, and it'll get even worse if you want to support an image type that has more than two common mimetypes. Keeping the amount of code you type to a minimum is very good practice for all the reasons you've probably already heard.

Note that we've actually got three subclasses for png. There's two reasons for that. I go over one in the Extensibility Note above. The other is that, while PNGs are technically supposed to be of mimetype image/png, I've seen several in the wild with image/x-png instead. Now, even though there are multiple mimetypes a PNG could have, we won't be dealing differently with each of them, so it's enough to create a png class with the appropriate methods, and then subclass that for individual mimetypes we plan to encounter. If you've seen others, feel free to add them.

Now that we have a way of dealing with images, lets set up the rest of our system to deal with them. First off, add the following lines to your package.lisp

(defparameter *image-storage-directory* "img")
(defparameter *big-dir* (merge-pathnames (make-pathname :directory `(:relative ,*image-storage-directory* "big"))))
(defparameter *preview-dir* (merge-pathnames (make-pathname :directory `(:relative ,*image-storage-directory* "preview"))))
(ensure-directories-exist *big-dir*)
(ensure-directories-exist *preview-dir*)
(push (create-folder-dispatcher-and-handler 
       "/img/" 
       (merge-pathnames (make-pathname :directory `(:relative ,*image-storage-directory*)))) 
      *dispatch-table*)

I'll format it a bit better in the code I check in, but it'll basically do the same thing. That's a specifier for a local image storage directory, and two specific subdirectories (which we ensure-exist just in case) where we'll be keeping the images that get uploaded to our board. The last few lines push our new storage folder onto the dispatch table so that Hunchentoot can serve the contained files. Now that we've got that, we'll need to change our formlets to accept an image file, and tell them what to do with it. We'll also need to add an image field to our comment class,

;;; model.lisp
...
(def-view-class comment ()
  ((id :accessor id :initarg :id :type integer :db-constraints (:not-null :auto-increment) :db-kind :key)
   (thread-id :reader thread-id :initarg :thread-id :type integer)
   (author :reader author :initarg :author :initform nil :type string)
   (email :reader email :initarg :email :initform nil :type string)
   (subject :reader subject :initarg :subject :initform nil :type string)
   (body :reader body :initarg :body :initform nil :type string)
   (date-time :reader date-time :initarg :date-time :type wall-time)
   (image :reader image :initarg :image :type string)))

...

State Note

You'll also need to re-create your database tables (or evaluate alter table COMMENT add column IMAGE varchar(255); against your database.

The actual SQL you need to run will vary slightly based on what database you're using. The above works with MySQL.

and change its echo method to output the image preview along with the comment. You'll also want to add the image echoing code to the thread echo method, since that does its own thing. In fact, it'd probably be a better idea to define a new echo-image method and call it in those three places.

;;; model.lisp

(defmethod summarize ((thread thread) &optional (preview-comment-count 5))
...
            (:a :href (format nil "/thread?thread-id=~a" (id thread)) "Reply")
            (echo-image first-comment)
            (:span :class "body" (:p (str (body first-comment))))
            (:br :class "clear")
...

(defmethod echo ((thread thread))
...
            (echo-header first-comment)
            (:span :class "body" 
                   (echo-image first-comment)
                   (:p (str (body first-comment))))
            (dolist (r (cdr (comments thread))) (str (echo r)))))))
...

(defmethod echo ((comment comment))
  (with-html-output-to-string (*standard-output* nil :indent t)
    (:div :class "comment"
          (echo-header comment)
          (:span :class "body" 
                 (echo-image comment)
                 (:p (str (body comment)))
                 (:br :class "clear")))))

...

(defmethod echo-image ((comment comment))
  (when (image comment) 
    (with-html-output (*standard-output* nil :indent t)
      (:a :href (merge-pathnames (image comment) "/img/big/") 
          (:img :class "pic" :src (merge-pathnames (image comment) "/img/preview/"))))))

The last thing we need to do is change up the comment formlets so that they accept images (with validation) and add the appropriate image URI to the comments they produce.

;;; cl-chan.lisp

...
(defun validate-image (hunchentoot-file-tuple)
  (or (null hunchentoot-file-tuple)
      (and (funcall (file-type? "image/x-png" "image/png") hunchentoot-file-tuple)
           (funcall (file-smaller-than? 3000000) hunchentoot-file-tuple))))

...
     (author text) (email text) (subject text) (body textarea) 
     (image file :validation (#'validate-image "We accept PNGs smaller than 3MB"))
     (captcha recaptcha))
    (let* ((pic (store! image))
           (new-comment (make-instance 'comment
                                       :thread-id (parse-integer thread-id)
                                       :author author :email email 
                                       :subject subject :body body
                                       :date-time (now)
                                       :image pic)))
      (update-records-from-instance new-comment)
...

     (image file :validation (#'validate-image "We accept PNGs smaller than 3MB"))
     (captcha recaptcha))
  (let* ((thread-id (update-records-from-instance
                     (make-instance 'thread :board-id 1)))
         (pic (store! image))
         (new-comment (make-instance 'comment 
                                     :thread-id thread-id
                                     :author author :email email 
                                     :subject subject :body body
                                     :date-time (now)
                                     :image pic)))
...

Note the image validation function I was talking about. That makes sure that the file coming at file-tuple->image-upload is only going to be one of the options it can handle.

That should be that. If you head over to your browser now, you should be able to upload a PNG and see it kind of shittily rendered in the middle of your comment. If you'd rather do without the "shittily", add these two lines to your cl-chan.css

.pic { float: left; }
.clear { clear: both; }

Whew!

Ok, before you relax, remember, we've still got two more image types to handle, and one subtle bug to fix. Lets add those image types first.

;;; image.lisp

...

;;;;;;;;;; JPGs
(defclass jpg (image-upload) ())
(defclass image/jpeg (jpg) ())
(defclass image/pjpeg (jpg) ())

(defmethod store-images! ((img jpg))
  "Saves a preview and a big version of the given image in directories specified by the *big-dir* and *preview-dir* conf variables."
  (let ((pic-name (make-pathname :name (name img) :type "jpg")))
    (copy-file (file-path img) (merge-pathnames pic-name *big-dir*))
    (let* ((pic (read-image-file (merge-pathnames pic-name *big-dir*)))
           (w (ch-image:image-width pic))
           (h (ch-image:image-height pic)))
      (multiple-value-bind (new-width new-height) (new-dimensions 250 w h)
        (write-image-file (merge-pathnames pic-name *preview-dir*)
                          (ch-image:resize-image pic new-height new-width))))
    (namestring pic-name)))

also, we need to modify the validation functions on our formlets.

;;; cl-chan.lisp

(defun validate-image (hunchentoot-file-tuple)
  (or (null hunchentoot-file-tuple)
      (and (funcall (file-type? "image/x-png" "image/png" "image/jpeg" "image/pjpeg") hunchentoot-file-tuple)
           (funcall (file-smaller-than? 3000000) hunchentoot-file-tuple))))

...
     (image file :validation (#'validate-image "We accept PNGs or JPGs smaller than 3MB"))
...
     (image file :validation ((file-type? "image/x-png" "image/png" "image/jpeg" "image/pjpeg") 
                              "You need to upload an image of type PNG or JPG"
                              (file-smaller-than? 3000000) "Your file needs to be smaller than 3MB"))
...

That should do it.

We implemented JPGs ahead of GIFs because it's much closer to the PNG scaling we already did, but note the number of differences there.

  1. the width and height arguments are ordered y x rather than x y
  2. the pathname and image parameters to the write function are in a different order
  3. the functions involved in reading/writing/resizing are named differently
  4. the steps we need to take are in a different order because ch-image can't seem to read image files with no file extension. That means we copy the big one first, rename it, then read that instead of the original temp file

Those are all small differences that you nevertheless need to get right if you don't want a face full of errors or odd results (like that screenshot above) when you start your board up. Before we tackle GIFs and that subtle bug, let me just point out that someone who picked our "Lazy Bastard" route earlier has saved themselves all of this trouble, and probably has higher quality previews to boot. But we're here to learn things, so we're going the hard way. Now then.

;;; images.lisp
...
;;;;;;;;;; GIFs
(defclass image/gif (image-upload) ())

(defmethod store-images! ((img image/gif))
  "Saves a preview and a big version of the given image in directories specified by the *big-dir* and *preview-dir* conf variables."
  (let* ((pic (load-data-stream (file-path img)))
         (first-frame (aref (skippy:images pic) 0))
         (width (skippy:width pic))
         (height (skippy:height pic))
         (pic-name (make-pathname :name (name img) :type "gif")))
    (copy-file (file-path img) (merge-pathnames pic-name *big-dir*))
    (multiple-value-bind (new-w new-h) (new-dimensions 250 width height)
      (let ((new-pic (skippy:make-data-stream 
                      :width new-w :height new-h
                      :color-table (skippy:color-table pic))))
        (skippy:add-image 
         (skippy:composite first-frame
                           (skippy:make-image :width new-w :height new-h) 
                           :width new-w :height new-h)
         new-pic)
        (output-data-stream new-pic (merge-pathnames pic-name *preview-dir*))))
    (namestring pic-name)))

and the appropriate formlet changes. In fact, we'd really better pull out the image types into a separate variable so that we only need to change them in one place.

;;; package.lisp

...
(defparameter *allowed-image-fn*
  (file-type? "image/x-png" "image/png" "image/jpeg" "image/pjpeg" "image/gif"))

(defparameter *image-message*
  "You need a PNG, JPG or GIF smaller than 3MB")
...

;;; cl-chan.lisp

(defun validate-image (hunchentoot-file-tuple)
  (or (null hunchentoot-file-tuple)
      (and (funcall *allowed-image-fn* hunchentoot-file-tuple)
           (funcall (file-smaller-than? 3000000) hunchentoot-file-tuple))))

(define-formlet (post-comment-form)
...
     (image file :validation (#'validate-image *image-message*))
...

(define-formlet (post-thread-form)
      (image file :validation (*allowed-image-fn* *image-message*
                              (file-smaller-than? 3000000) 
                              "Your file needs to be smaller than 3MB"))
...

And that's that.

You'll notice that the GIF resizing process is a lot more complicated than what we had to do for PNGs or JPGs. That's because GIFs are potentially animated, so the Lisp library that handles them treats them as streams of images. That allows for better frame control, but it does mean that we need to

  1. load the stream
  2. pull out the first frame
  3. resize that frame[3]
  4. shove it into a fresh stream
  5. write that new stream of one image

Once again, notice how much work the Lazy Bastards have saved themselves with that one line of code. Incidentally, had I known about Skippy's lack of ability to scale an image down, I probably would have gone with the lazy option myself and chucked portability in a fucking bin.

It's already done, so no sense in tearing out half of this column now. Especially since its been a good what... year and a half since I started it? Yeah, sounds like it's about time to get the FILDI out.

Really quickly before we go, I mentioned a subtle bug. We're letting Hunchentoot generate tempnames for our files. That's very simple, since we do nothing, but its internal name counter gets reset every time it shuts down. Which means that if you shut it down in production, new images are going to start clobbering your old ones. The easiest way to solve this is appending a timestamp to them. Between that and Hunchentoots' internal temporary file counter, we should be set in terms of unique names. Given how we use the name field of the image-upload class, the simplest way to do this is actually in file-tuple->image-upload.

(defun file-tuple->image-upload (hunchentoot-file-tuple)
  (destructuring-bind (file-path original-file-name mimetype) hunchentoot-file-tuple
    (make-instance (intern (string-upcase mimetype) :cl-chan)
                   :name (format nil "~a-~a" (file-namestring file-path) (get-universal-time)) 
                   :file-path file-path)))

Note the change in the line starting with :name. Ok, I've had enough of this bullshit. New codebase up at github, now get out.

Tune In Next Time (heh heh) For:

  • finally getting to multiple boards!
  • walking through the lazy option in its entirety!
  • some better usability and UI, maybe!

Footnotes

1 - [back] - The other is put together a downloadable archive so that other Lispers can install your package through asdf-install, but we'll discuss that later.

2 - [back] - So that when someone else is using your package, they can go to one consistent place to see all the symbols you're exporting and including. I tend to put conf variables there too if there aren't very many of them, but they should probably be in their own conf.lisp file.

3 - [back] - Actually, there's no usable resize or scale option, so we just crop it to the top left.