Monday, May 28, 2012

jQuery Accordion Widget in Blogger

I was adding some Erlang and Emacs-related items to the Resources section of the sidebar, and the height finally got annoying enough to do something about.

You'll notice that it now sits neatly inside a jQueryUI accordion widget. You could easily go inspect that element, but let me save you a right-click and some DOM navigation.

<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js" type="text/javascript"></script>
<script src="https://ajax.googleapis.com/ajax/libs/jqueryui/1.8.18/jquery-ui.min.js" type="text/javascript"></script>
<script type="text/javascript"> $(document).ready(function () { $("#accordion").accordion({autoHeight: false, collapsible : true, active: false}); });</script>

<link rel="stylesheet" href="http://ajax.googleapis.com/ajax/libs/jqueryui/1.8.17/themes/base/jquery-ui.css" type="text/css" media="all" />

<style type="text/css">
  #sidebar #accordion h3 { padding-left: 25px; font-weight: bold; }
  #sidebar #accordion div ul { list-style: circle; }
  #sidebar #accordion div ul a { text-decoration: underline; color: blue; }
</style>

<span style="color: #d7d7d7; padding-bottom: 15px; font-size: x-small;">(in no particular order)</span>
<div id="accordion">
  <h3>Section Title</h3>
  <div>
    Section Text
    <ul>
      <li><a href="foo">bar</a></li>
    </ul>
  </div>
</div>

As you know unless you've been living under a rock for about five years, Google hosts copies of the jQuery and jQueryUI libraries. As you can see from the style link, they also host the relevant CSS in many themes. I was afraid I'd have to do some JS hacking to get all this working with Blogger, but it's fairly straightforward. Just add an HTML/JavaScript widget to your blog, paste in the above code, and play with the styling a bit if you like.

Thanks to jQuery, the sidebar is now markedly less cluttered than it was twenty minutes ago.

Saturday, May 26, 2012

Boring Update

This has been one hell of a month, mostly for non-technical reasons, but I think I need to discuss some of them regardless. The following is a journal-style entry, so skip it if you're here for any kind of language discussion.

Specialization

Firstly, you may have noticed that I've been hacking Erlang lately. It's verbose, it's obtuse, it works at bizarre cross-purposes with itself, but it has endeared itself to me for reasons I've already discussed. It's not too clear to me why I have this drive to try new languages, and it's not entirely clear whether it gives me an edge or dulls it in the end. It feels like I'm making reasonable progress and gaining perspective on the process of expressing processes precisely, and maybe that's enough. The root of the chain is this bias I have against overspecialization, which may or may not be an evolutionary vestige, but it doesn't seem to have hurt me yet. It seems intuitively obvious that I'd want to avoid the situation where I don't have the right tools for a job, and that means keeping a lot of them around. Admittedly, I haven't practised this in real life, but cognitive tools don't take up space, and are always at my call, so it's much easier to justify.

I've had conversations with quite a few people I respect that go the other way. That is, they seem to think that going deep is much better than going broad, but that honestly only seems to be true if your goal is to end up as a corporate developer or team lead somewhere. I've also had encounters with people almost hard-wired to a particular language. One or two Lispers I keep in touch with seem genuinely concerned that I've been off doing Erlang or Smalltalk work. Pretty much every C++/C#/Java programmer I've met so far in real life has condescendingly stated that [their language] is the only one you should ever consider for production work. To top it off, I've interacted with a worrying number of Haskell douches who aggressively push their preference on other functional programmers.

That can't be the correct approach, regardless of how powerful an individual language is.

Make

The Erlang play I've engaged in has forced me to take a serious look at make. I mentioned a while ago that I reach for Ruby whenever I need to do almost any small bit of scripting. Until about a week ago, this included deployment scripts. It never really occurred to me that make was good for something other than compiling C projects, but taking a closer look, it seems like it can do quite a bit. It has conditionals, loops and functions, and it deals with command line arguments a lot more gracefully than scripts in typical general-purpose languages.

exclude = .git .gitignore *~ docs/* *org config.lisp log

define deploy
        git checkout $(1);
        rsync -rv $(foreach var, $(exclude), --exclude $(var)) ./ $(2);
        ssh $(3);
endef

deploy-public:
        $(call deploy, master, [user]@[server]:[project-root], [user]@[server])

deploy-client-a:
        $(call deploy, [client-branch], [user]@[server]:hhsc-[project-root], [user]@[server])

deploy-client-b:
        $(call deploy, [client-branch], [user]@[server]:hhsc-[project-root], [user]@[server])

ssh:
        ssh [user]@[server]

That saved me about 40 lines when compared to the Ruby script that used to do the same job[1]. Granted, the Makefile makes me type out the [user]@[server] string twice, because : is otherwise interpreted as a control character and there's oddly no way to escape it, but that's an acceptable blemish given the overall line savings. Now that's not to say that make is more elegant than Ruby, just that it's a lot more specialized for the task. Most of the chaff from those 56 lines was doing command-line parsing and some declarations, which again hints that command line argument parsing is a hack.

The other advantage of the Makefile is that using it gives me meaningful completions at the command line. In the above, if I tabbed after make, it would give me the different tasks as potential entries:

inaimathi@hermaeus:~/project$ make 
deploy-client-a  deploy-client-b  deploy-public  Makefile       ssh
inaimathi@hermaeus:~/project$ make |

That's going to get more convenient the more clients we start supporting. I'm not going to go through the full make syntax; it's fairly self-explanatory, and docs exist in any case. A definition looks like that define..endef block, calling a function looks like $(call fn, arg1, arg2, ...), the exclude line shows you what a variable looks like, and the bit that looks like $(foreach ...) is a loop. That should be enough for pretty much anything you need to do with the tool.

Music

I had a fit of OCD the other day, and decided to finally organize my music library to prevent my phone from reporting

Unknown Artist -- 178 songs

instead of correctly sorted collections. I did reach for Ruby here, and two scripts turned out to be particularly useful:

#!/usr/bin/ruby

require 'fileutils'

class String
  def naive_title_case(split_by = "-")
    split(split_by).map(&:capitalize).join " "
  end
  def strip_song
    s = self.split("--")
    (s[1] ? s[1] : self).gsub(".ogg", "")
  end
end

ARGV.each do |target|
  artist = target.gsub("/", "").naive_title_case
  FileUtils.cd(target) do
    Dir.entries(".").find_all{|e| e.end_with? ".ogg"}.each do |file|
      `vorbiscomment -t 'ARTIST=#{artist}' -t 'TITLE=#{file.strip_song.naive_title_case}' -w '#{file}'`
    end
  end
end

#!/usr/bin/ruby

require 'optparse'

$options = {:sub => "", :downcase => nil}
OptionParser.new do |opts|
  opts.on('-r', '--regex REGEX', String,
          'Specify the regular expression to replace') {|reg| $options[:regex] = Regexp.new(reg)}
  opts.on('-s', '--sub SUBSTITUTE', String,
          'Specify what to replace the match with. By default, the empty string (so matches are stripped).') {|sub| $options[:sub] = sub}
  opts.on('-d', '--downcase', 'If passed, all filenames will be downcased.') { $options[:downcase] = true }
end.parse!

abort "Usage: #{$0} -r REGEX [-s SUBSTITUTE] [-d] FILES" unless $options[:regex] && ARGV.length > 0

def rename(str)
  ($options[:downcase] ?
   str.downcase : str).gsub($options[:regex], $options[:sub])
end

ARGV.each do |target|
  File.rename(target, rename(target))
end

The first one is a very thin wrapper around vorbiscomment that lets me pass it more than one file at a time, and uses my idiosyncratic file storage/naming conventions to infer the title and "artist"[2] of the piece. The second one is just a simple regex application script which lets me format many files at once without going through the mind-numbing tedium of one mv call per file[3].

What I listen to these days is actually slightly embarrassing. A little while ago, I was working with some friends, obviously enjoying some tunes on my headphones, and pretty much froze when one of them passed me a speaker wire. I'm not even sure why; we've been friends for a pretty fucking long time at this point, and I knew that musical preferences would not be the thing to finally drive us apart, but I still hesitated at listening to some of this shit with another human being.

Not at all sure where that comes from. I guess it's that I used to be a rocker back in the day. The last time I actually bought a related album was back in 2007. Looking at my current, newly-organized library, it's split about half and half between pony/videogame related electronica and classical of some sort, but I honestly didn't notice the change taking place. I'm not even sure if rock is a thing in general anymore, but it's definitely not a thing I listen to. And I guess I wasn't sure whether my friends knew that yet, since we don't tend to talk about it.

It's really odd how the peripheral pieces of my identity are the ones that cause me the most concern. I remember admitting to myself that I was really a programmer/illustrator and not a Graphic Designer, and that didn't have much of an impact on how I behaved. The little things seem to perturb me a lot more when I notice them. Maybe it has to do with the fact that they tend to change while I'm not paying attention, rather than being an effort of conscious will...


Footnotes

1 - [back] - wc -l says deploy.rb was 56, while the actual Makefile clocks in at 20.

2 - [back] - "Artist" is in quotes because I actually use it to group playlists, rather than Artists in the usual sense.

3 - [back] - Incidentally, you can see what I mean when I call script arguments a hack, right? More than half of each of those scripts is taken up by a huge, verbose, un-abstractable block whose entire reason for existence is making up for the fact that I'm writing a function that I want to be command-line accessible.

Thursday, May 24, 2012

Assumptions

Your language is making an assumption about your work[1]. See if you've noticed it:

Whatever your program does, it will run in one process, on one machine, inside one network.

That's not to say that your language prevents you from violating that assumption, but if you do, you'll need to do something odd, or something fraught, or something arcane. Some languages also make a further assumption that your program will run on one core, which is becoming more and more ridiculous, and the result is a broken or missing threading model[2].

The hands-down most interesting thing about Erlang is that it does not make this assumption. Each of your component pieces is meant to be created either as a set of functional definitions, or as interlocking, message-passing processes running almost completely independently of one another. If your program needs to run as multiple nodes on the same machine for whatever reason, it uses the rpc:call/4 I quickly demonstrated last time. Calls across machines and networks look the same, except that the target node is going to be at a foreign IP rather than 127.0.1.1.
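
To make those two mechanisms concrete, here's a minimal sketch; the module name is invented, and the remote node name mentioned in the comment is purely hypothetical. It shows a process spawned just to receive and answer a message, and an rpc:call/4 that happens to target the local node but would look identical pointed at a foreign one.

```erlang
%% node_demo.erl -- a hypothetical module sketching message passing and
%% rpc-style calls. Not taken from the ps_barcode project.
-module(node_demo).
-export([ping/0, remote/0]).

%% Spawn a process, send it a message, and wait for the reply.
ping() ->
    Pid = spawn(fun() ->
                        receive {From, ping} -> From ! pong end
                end),
    Pid ! {self(), ping},
    receive pong -> pong after 1000 -> timeout end.

%% rpc:call(Node, Module, Function, Args) evaluates the call on Node.
%% node() here is the local node; substituting a name like
%% 'ps_barcode@192.168.1.12' (made up) would run the same call over the
%% network without changing its shape.
remote() ->
    rpc:call(node(), lists, reverse, [[1, 2, 3]]).
```

Run node_demo:ping() from a shell and pong comes back; swap node() for a real remote node name and remote/0 becomes a network call.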

Calling out is just one piece, of course, and it wouldn't work particularly well without a standard communication protocol for Erlang processes to use. XML-RPC or similar could work in other languages, I suppose, but I'd always have this sneaking suspicion that I'm paying more for the channel itself than for the messages being sent through it. Erlang also provides facilities for process management, the most obvious and useful being supervision trees. Those let you specify monitoring and restarting strategies for the components of your program.

Being a web guy, it's sort of obvious to me that this is a good system structuring strategy.

Normally, most of these things would be handled outside the program; in fact, for the most part, I'm used to having to handle them outside of the program. I set up OS-level logging and restarting mechanisms, along with some scripted instructions about what to do in the event of a node failure. You'd need to explicitly define an inter-process communication protocol, or grab one of the existing ones, and use it to make sure your system had a measure of node awareness[3].

For what it's worth, that works. The difference is that it takes more work than specifying it in source, it looks different from the program code and increases external dependencies, and (probably the most egregious from my perspective) it's not in source control by default, and is therefore probably being treated as part of the deployment steps rather than as a first-class citizen of the program proper. I'm going to mangle Greenspun's 10th here for illustrative purposes:

Any sufficiently complicated, distributed system contains an ad-hoc, informally specified, bug-ridden, slow implementation of half of Erlang/OTP

It's really all stuff you already know cold.

  • inter-node communication, complete with heartbeats and the appropriate byte-protocol
  • a specified, standard remote procedure call mechanism
  • formalized and explicit policy settings about failures
  • graceful distribution, logging, deployment and re-deployment (as per the "hot code swapping")

The difference is that you're probably used to it being specified in the surrounding infrastructure and systems rather than in the program itself.


Footnotes

1 - [back] - It actually doesn't matter what your language is; everything short of Erlang, and some experimental/domain-specific languages built to break this particular assumption, makes it.

2 - [back] - Which is not in and of itself bad; the bad part is that the sysadmin is meant to pick up the slack manually.

3 - [back] - I've actually only done this once, and it ended up using HTTP for inter-process communication. It solved the specific problem, but something tells me that wouldn't be the easiest thing in the world to scale up.

Thursday, May 17, 2012

Please Don't Listen to Jeff Atwood

On my bus ride back from work[1], I've been thinking about how to make my response to this precise and thorough.

The stuff I was going to come out with included a reference to this Sussman talk from Danfest, which concludes by highlighting the title of a Minsky paper: "Why Programming is a Good Medium for Expressing Poorly-Understood and Sloppily-Formulated Ideas". I won't link you to the actual point in the talk wherein this happens, because it's only about thirty minutes long and well worth your time in its entirety. The gist is that by forcing yourself to describe a process or concept well enough that a very stupid machine (a computer) can understand it, you can iron out the unnoticed gaps and assumptions in your own knowledge. This is particularly relevant when dealing with other humans, who by and large aren't stupid, but merely missing some piece of information that you've begun to take for granted, or perhaps only ever learned by rote.

That would have segued naturally into the point that learning to think precisely can help humans communicate more effectively with each other, and not just with machines. The rebuttal would have continued with a short, faux op-ed from an early scribe claiming that literacy is completely overrated and unnecessary in most people's every-day lives (claiming in all seriousness that all hunters really need to worry about is not breaking their spear arms, and that the farmers should focus exclusively on their plowshares). He'd conclude by asking you to refuse to learn how to read and write, because frankly, he's sick enough of his current colleagues' grammatical errors without you adding your own cock-ups to the mix.

Then I was going to point out this video from the MIT 6.001 Computer Science course[2], wherein Harold Abelson explains to the fresh class that "Computer Science" is not about computers in the same sense that Physics is not about particle accelerators, or that Biology is not about microscopes and petri dishes. What Computer Science is about, he claims, is formalizing certain types of formerly intuitive knowledge. In this case, imperative knowledge. How to do things. For a finale, I'd point out that, while Jeff was talking about coding where I'm making an argument for something more generally useful, humans might find it easier to get to the latter after going through the former. Seibel's Coders at Work[3] shows that one of the two[4] peculiar things about people who become good programmers is that they had early exposure to computers and coding, at a time when that wasn't really the typical experience.

I was going to write that, but on second reading, his latest piece seems to have the paradoxical message of

  1. You shouldn't bother learning things that won't directly and obviously make you better at the tasks in your job description[5]
  2. You shouldn't learn to program just for the money[6]

I'm not too familiar with the "everyone should learn to code" movement, but I doubt its core message is that everyone should become a professional programmer. Hell, I know how impossible that proposition is, and I've talked about it before. The thing is, unlike plumbing[7], programming[8] does teach those who study it a lot about communicating precisely, thinking clearly, and solving problems in general. So it at least seems like a believable candidate for "the next literacy".

So, yes, please, do learn to program. Don't avoid it just because you can grow turnips, or answer phones, or sit in meetings fine without it.

Go beyond

10 PRINT "HELLO"
20 GOTO 10

Begin to understand how to think precisely, and communicate clearly with entities who don't have a lot of knowledge in common with you. Don't worry that you'll never actually use this at your day job, and certainly don't expect to be a highly paid programmer in just 7 days. But do learn, because it will be interesting, and fun, and useful in places you might not expect.


Footnotes

1 - [back] - As a Lisp programmer at a small Toronto company, just in case my bias wasn't obvious enough already.

2 - [back] - (better known as SICP)

3 - [back] - (author's talk here)

4 - [back] - (the other is that most of them use Emacs)

5 - [back] -

To those who argue programming is an essential skill we should be teaching our children, right up there with reading, writing, and arithmetic: can you explain to me how Michael Bloomberg would be better at his day to day job of leading the largest city in the USA if he woke up one morning as a crack Java coder? It is obvious to me how being a skilled reader, a skilled writer, and at least high school level math are fundamental to performing the job of a politician. Or at any job, for that matter. But understanding variables and functions, pointers and recursion? I can't see it. -Jeff Atwood
(emphasis his).

I think my response is obvious from what I've said already, but just in case. "Programming" is not "variables and functions, pointers and recursion". It is a way to describe a process or concept so well that things which don't even share your biology can understand it. This is useful when dealing with things that do share your biology, but not quite all of your knowledge, and it is useful when explaining fundamental concepts to the uninitiated.

6 - [back] -

Please don't advocate learning to code just for the sake of learning how to code. Or worse, because of the fat paychecks. -Jeff Atwood

7 - [back] - Which deals with a very specific, physical system, isn't particularly fun, isn't particularly social, and only ever needs to be practised when something goes wrong.

8 - [back] - Which deals with a wide variety of at least partially imaginary systems, is fun, is mostly social, and can be applied in situations that don't involve water spraying out from under your sink.

Tuesday, May 15, 2012

Erlang and Barcodes

Ok, this isn't actually all Erlang. In fact, by line-count, it's a Postscript project, but all of those lines were already written by someone else. Also, I'm not sure whether I'll get the same benefit here that I got out of my Strifebarge write-up, but it's the third such piece, so I've gone back and added labels to group them.

"Almost Literate Programming".

What I'm doing isn't quite the LP that Knuth advocates because it doesn't self-extract, share space with the executable source, or make use of variable labels to automatically update certain portions. However, it still gains me considerable reflective clarity about what the goal of the program is, and it hopefully conveys the essence to whoever happens to be reading. With that out of the way...

Generating Barcodes

As you may have noticed from the above links, there already exists a Postscript-based barcode generator which I'm going to use pretty shamelessly in order to generate bitmap barcodes of various descriptions. Taking a look at the actual code for that generator should make it obvious that you probably don't want to just echo the entire system every time you need to generate something[1]. We'll get to that though; let's start from the system side first. Below are the Makefile, the .app declaration, and the application and supervisor modules.

# Makefile
all: *.erl *.c
        make wand
        erlc -W *.erl

run: 
        erl -name ps_barcode@127.0.1.1 -eval 'application:load(ps_barcode).' -eval 'application:start(ps_barcode).'

wand: wand.c erl_comm.c driver.c
        gcc -o wand `pkg-config --cflags --libs MagickWand` wand.c erl_comm.c driver.c

clean:
        rm wand 
        rm *.beam

%% ps_barcode.app
{application, ps_barcode,
 [{description, "barcode image generator based on ps-barcode"},
  {vsn, "1.0"},
  {modules, [ps_barcode_app, ps_barcode_supervisor, barcode_data, wand, ps_bc]},
  {registered, [ps_bc, wand, ps_barcode_supervisor]},
  {applications, [kernel, stdlib]},
  {mod, {ps_barcode_app, []}},
  {start_phases, []}]}.

%% ps_barcode_app.erl
-module(ps_barcode_app).
-behaviour(application).
-export([start/2, stop/1]).

start(_Type, StartArgs) -> ps_barcode_supervisor:start_link(StartArgs).
stop(_State) -> ok.

%% ps_barcode_supervisor.erl
-module(ps_barcode_supervisor).
-behavior(supervisor).

-export([start/0, start_for_testing/0, start_link/1, init/1]).

start() ->
    spawn(fun() -> supervisor:start_link({local, ?MODULE}, ?MODULE, _Arg = []) end).

start_for_testing() ->
    {ok, Pid} = supervisor:start_link({local, ?MODULE}, ?MODULE, _Arg = []),
    unlink(Pid).

start_link(Args) ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, Args).

init([]) ->
    {ok, {{one_for_one, 3, 10},
          [{tag1, 
            {wand, start, []},
            permanent,
            brutal_kill,
            worker,
            [wand]},
           {tag2,
            {ps_bc, start, []},
            permanent,
            10000,
            worker,
            [ps_bc]}]}}.

The Makefile is not, strictly speaking, necessary, but a bunch of stuff needs to be done manually in its absence. The above code is approximately equivalent to a Lisp .asd file, in that it tells Erlang what needs to be compiled/called in order to run the system I'm about to define[2].

  {modules, [ps_barcode_app, ps_barcode_supervisor, barcode_data, wand, ps_bc]},

That line specifies which other modules we'll be loading as part of the application, as well as their start order (which is relevant for a certain supervision strategy).

  {registered, [ps_bc, wand, ps_barcode_supervisor]},

That one specifies registered processes we expect.

  {mod, {ps_barcode_app, []}},

That one tells Erlang which module's start function to call in order to start the application, and what arguments to pass it as StartArgs.

init([]) ->
    {ok, {{one_for_one, 3, 10},
          [{tag1, 
            {wand, start, []},
            permanent,
            brutal_kill,
            worker,
            [wand]},
           {tag2,
            {ps_bc, start, []},
            permanent,
            10000,
            worker,
            [ps_bc]}]}}.

That does something interesting; it defines how the supervisor should act, and how it should treat its child processes. {one_for_one, 3, 10} means that if a supervised process errors, it should be restarted on its own up to 3 times in 10 seconds[3]. Both sub-processes are permanent[4] workers[5]. The last interesting bit is the brutal_kill/10000 part; that's the Shutdown variable. It determines how the process should be terminated; brutal_kill means "kill the process right away", an integer means "send the process a stop command and wait up to this many milliseconds, then kill it".

Let's follow the application's start order and move on to

-module(barcode_data).
-export([read_default_file/0, read_file/1]).
-export([export_ets_file/1, import_ets_file/0]).

export_ets_file(Table) -> ets:tab2file(Table, "ps-barcode-blocks").
import_ets_file() -> {ok, Tab} = ets:file2tab(filename:absname("ps-barcode-blocks")), Tab.

read_default_file() -> read_file("barcode.ps").
read_file(Filename) ->
    {ok, File} = file:open(Filename, [read]),
    TableId = ets:new(ps_barcode_blocks, [ordered_set]),
    trim_flash(File),
    {ok, Tab} = read_all_blocks(File, TableId),
    file:close(File),
    Tab.

trim_flash(IoDevice) -> read_until(IoDevice, "% --BEGIN TEMPLATE").

read_all_blocks(IoDevice, TableId) ->
    case Res = read_block(IoDevice) of
        [] -> {ok, TableId};
        _ -> ets:insert(TableId, parse_block(Res)),
             read_all_blocks(IoDevice, TableId)
    end.

read_block(IoDevice) -> read_until(IoDevice, "% --END ").

parse_block([["%", "BEGIN", "PREAMBLE"] | Body]) ->
    {preamble, lists:append(Body)};
parse_block([["%", "BEGIN", "RENDERER", Name] | Body]) ->
    {list_to_atom(Name), renderer, lists:append(Body)};
parse_block([["%", "BEGIN", "ENCODER", Name] | Body]) ->
    parse_encoder_meta(Name, Body);
parse_block(_) -> {none}.

parse_encoder_meta (Name, Encoder) -> parse_encoder_meta(Name, Encoder, [], {[], [], []}).
parse_encoder_meta (Name, [["%", "RNDR:" | Renderers] | Rest], Acc, {_, R, S}) ->
    parse_encoder_meta(Name, Rest, Acc, {Renderers, R, S});
parse_encoder_meta (Name, [["%", "REQUIRES" | Reqs] | Rest], Acc, {A, _, S}) ->
    parse_encoder_meta(Name, Rest, Acc, {A, Reqs, S});
parse_encoder_meta (Name, [["%", "SUGGESTS" | Suggs] | Rest], Acc, {A, R, _}) ->
    parse_encoder_meta(Name, Rest, Acc, {A, R, Suggs});
parse_encoder_meta (Name, [["%", "EXOP:" | Exop] | Rest], Acc, Cmp) ->
    parse_encoder_meta(Name, Rest, [{def_arg, Exop} | Acc], Cmp);
parse_encoder_meta (Name, [["%", "EXAM:" | Exam] | Rest], Acc, Cmp) ->
    parse_encoder_meta(Name, Rest, [{example, string:join(Exam, " ")} | Acc], Cmp);
parse_encoder_meta (Name, [["%" | _] | Rest], Acc, Cmp) ->
    parse_encoder_meta(Name, Rest, Acc, Cmp);
parse_encoder_meta (Name, Body, [DefArgs, Example], {A, R, S}) ->
    Reqs = [list_to_atom(strip_nl(X)) || X <- lists:append([A, R, S])],
    {list_to_atom(Name), encoder, {requires, Reqs}, Example, DefArgs, lists:append(Body)}.

strip_nl(String) -> string:strip(String, right, $\n).

read_until(IoDevice, StartsWith) -> read_until(IoDevice, StartsWith, []).
read_until(IoDevice, StartsWith, Acc) ->
    case file:read_line(IoDevice) of
        {ok, "\n"} ->
            read_until(IoDevice, StartsWith, Acc);
        {ok, Line} -> 
            case lists:prefix(StartsWith, Line) of
                true -> 
                    lists:reverse(Acc);
                false -> 
                    read_until(IoDevice, StartsWith, 
                               [process_line(Line) | Acc])
            end;
        {error, _} -> error;
        eof -> lists:reverse(Acc)
    end.

process_line(Line) ->
    case lists:prefix("% --", Line) of
        true -> split_directive_line(Line);
        false -> Line
    end.

split_directive_line(Line) ->
    [X || X <- re:split(strip_nl(Line), "( |--)", [{return, list}]),
          X /= " ", X /= [], X /= "--", X /="\n"].

This is a reasonably simple reader program. The goal of it is to break that 17111 line .ps file into individual components. First, a preamble (basic definitions that need to go into each file), then a set of renderers[6], and a rather large number of encoders[7]. These components are stored in an ETS table held in memory. The initial Postscript file only needs to be parsed once; the resulting ETS table is then exported to a file on disk so that it can just be loaded in the future.
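
That parse-once, cache-the-table trick is easy to try in isolation. Here's a minimal, self-contained sketch of the ets round-trip it relies on; the module, table, key and file names are all invented for the example.

```erlang
%% ets_roundtrip.erl -- a tiny demonstration of the tab2file/file2tab
%% round-trip described above. All names here are made up.
-module(ets_roundtrip).
-export([demo/0]).

demo() ->
    Tab = ets:new(demo_blocks, [ordered_set]),
    true = ets:insert(Tab, {preamble, "% shared definitions"}),
    %% Dump the in-memory table to disk...
    ok = ets:tab2file(Tab, "demo-blocks"),
    ets:delete(Tab),
    %% ...and, in some later run, load it back instead of re-parsing.
    {ok, Tab2} = ets:file2tab("demo-blocks"),
    [{preamble, Body}] = ets:lookup(Tab2, preamble),
    ok = file:delete("demo-blocks"),
    Body.
```

The expensive parse happens once; every subsequent startup is just a file2tab call.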

Do note the nested case statements in read_until. Last time, I complained about the guards, and this is why. Really, I should have been able to write that as

        ...
        {ok, Line} where lists:prefix(StartsWith, Line) -> lists:reverse(Acc);
        {ok, Line} -> read_until(IoDevice, StartsWith, [process_line(Line) | Acc]);
        ...

but even though lists:prefix is a perfectly functional predicate, it's not in that blessed subset of Erlang that can be called from within a guard sequence. The consequence, in this case, is that I have to bust out a second case block, and waste six lines doing it. Moving on to sorting PS blocks...

parse_block([["%", "BEGIN", "PREAMBLE"] | Body]) ->
    {preamble, lists:append(Body)};
parse_block([["%", "BEGIN", "RENDERER", Name] | Body]) ->
    {list_to_atom(Name), renderer, lists:append(Body)};
parse_block([["%", "BEGIN", "ENCODER", Name] | Body]) ->
    parse_encoder_meta(Name, Body);
parse_block(_) -> {none}.

The preamble and renderers are really just named strings, but the encoders have more metadata about them.

parse_encoder_meta (Name, Encoder) -> parse_encoder_meta(Name, Encoder, [], {[], [], []}).
parse_encoder_meta (Name, [["%", "RNDR:" | Renderers] | Rest], Acc, {_, R, S}) ->
    parse_encoder_meta(Name, Rest, Acc, {Renderers, R, S});
parse_encoder_meta (Name, [["%", "REQUIRES" | Reqs] | Rest], Acc, {A, _, S}) ->
    parse_encoder_meta(Name, Rest, Acc, {A, Reqs, S});
parse_encoder_meta (Name, [["%", "SUGGESTS" | Suggs] | Rest], Acc, {A, R, _}) ->
    parse_encoder_meta(Name, Rest, Acc, {A, R, Suggs});
parse_encoder_meta (Name, [["%", "EXOP:" | Exop] | Rest], Acc, Cmp) ->
    parse_encoder_meta(Name, Rest, [{def_arg, Exop} | Acc], Cmp);
parse_encoder_meta (Name, [["%", "EXAM:" | Exam] | Rest], Acc, Cmp) ->
    parse_encoder_meta(Name, Rest, [{example, string:join(Exam, " ")} | Acc], Cmp);
parse_encoder_meta (Name, [["%" | _] | Rest], Acc, Cmp) ->
    parse_encoder_meta(Name, Rest, Acc, Cmp);
parse_encoder_meta (Name, Body, [DefArgs, Example], {A, R, S}) ->
    Reqs = [list_to_atom(strip_nl(X)) || X <- lists:append([A, R, S])],
    {list_to_atom(Name), encoder, {requires, Reqs}, Example, DefArgs, lists:append(Body)}.

This is not the most elegant function. In fact, now that I look at it, it seems like I could fairly easily replace that {A, R, S} tuple with a list accumulator.

EDIT:

Turns out there was a reason I did it this way; we need this data to be in the order of Renderers, Required, Suggested, but the order they're parsed in is actually Required, Suggested, Renderers (also, some components have no requirements, and some have no suggestions). The ordering is confusing enough that I just decided to keep it explicit.

Wed, 16 May, 2012

What we're doing here is breaking apart an encoder block, and pulling out

  • the list of other blocks we need to output before this one[8]
  • a piece of example data that this particular encoder can handle[9]
  • the default arguments to be passed to this encoder
  • the body code of this encoder

The list of required blocks is exhaustive for each encoder, so we don't need to recursively check requirements later; it's enough to store and act on all requirements of a given barcode.
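To make that concrete, here's what a parsed encoder entry might look like. The field values are made up for illustration (the real ones come from the comments in barcode.ps), but the shape matches the tuple built in the final clause above:

```erlang
%% hypothetical parsed entry; the values are illustrative, the shape is real
{qrcode, encoder,
 {requires, [renmatrix]},          %% renderers ++ required ++ suggested
 {example, "http://example.com"},  %% data this encoder is known to handle
 {def_arg, ["eclevel=M"]},         %% default arguments for the encoder
 "...the encoder's Postscript source..."}
```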

-export([read_default_file/0, read_file/1]).
-export([export_ets_file/1, import_ets_file/0]).

Those exported functions are really all that a user of this module should ever need to call; you're either processing a new revision of the ps file, importing the already-exported ETS table derived from it, or exporting a new ETS table for later loading. Now that we've seen how we store the relevant data, let's take a look at

-module(wand).

-behaviour(gen_server).

-export([start/0, stop/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-export([process/1]).

process(Filename) -> gen_server:call(?MODULE, {process_barcode, Filename}).

handle_call({process_barcode, Filename}, _From, State) ->
    State ! {self(), {command, Filename}},
    receive
        {State, {data, Data}} ->
            {reply, decode(Data), State}
    end;
handle_call({'EXIT', _Port, Reason}, _From, _State) ->
    exit({port_terminated, Reason}).

decode([0]) -> {ok, 0};
decode([1]) -> {error, could_not_read};
decode([2]) -> {error, could_not_write}.

%%%%%%%%%%%%%%%%%%%% generic actions
start() -> gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).
stop() -> gen_server:call(?MODULE, stop).

%%%%%%%%%%%%%%%%%%%% gen_server handlers
init([]) -> {ok, open_port({spawn, filename:absname("wand")}, [{packet, 2}])}.
handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Info, State) -> {noreply, State}.
terminate(_Reason, State) -> State ! {self(), close}, ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.

This is actually pretty much the same C port file I used last time, except that this one has been rewritten to use gen_server rather than being plain Erlang code. I still refuse to use that godawful file template they ship with their Emacs mode though[10]. All it does is call out to a C program named wand to do the actual image processing involved in generating these barcodes. All you need to know is that we send it a barcode's file name, and it quickly generates a high-res PNG version in the same folder.
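Assuming the server and port are up, a call from the shell looks something like this (a sketch only; the file name is made up, and the result atoms come from decode above):

```erlang
%% hypothetical session; the wand binary must be compiled and findable
wand:start(),
Result = wand:process("/tmp/tmp.1338.123456.789012"),
case Result of
    {ok, 0}                  -> png_written;      %% wand produced the PNG
    {error, could_not_read}  -> check_input_file;
    {error, could_not_write} -> check_permissions
end.
```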

Right, that's it for the periphery; let's finally dive into

-module(ps_bc).

-behaviour(gen_server).

-export([start/0, stop/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2, code_change/3]).

-export([help/0, help/1, write/3, write/5, generate/2, generate/3, change/1, make_tempname/0]).

help() -> gen_server:call(?MODULE, help).
help(BarcodeType) -> gen_server:call(?MODULE, {help, BarcodeType}).
write(DestFolder, BarcodeType, Data) -> 
    write(DestFolder, BarcodeType, Data, 200, 200).
write(DestFolder, BarcodeType, Data, Width, Height) ->
    gen_server:call(?MODULE, {write, DestFolder, BarcodeType, Data, Width, Height}).
generate(BarcodeType, Data) -> generate("/tmp/", BarcodeType, Data).
generate(DestFolder, BarcodeType, Data) -> 
    NameOfTempFile = write(DestFolder, BarcodeType, Data),
    wand:process(NameOfTempFile),
    NameOfTempFile.
change(TableId) -> gen_server:call(?MODULE, {change, TableId}).

handle_call(help, _From, State) ->
    {reply, ets:match(State, {'$1', encoder, '_', '_', '_', '_'}), State};
handle_call({help, BarcodeType}, _From, State) ->
    {reply, ets:match(State, {BarcodeType, encoder, '_', '$1', '_', '_'}), State};
handle_call({write, DestFolder, BarcodeType, Data, Width, Height}, _From, State) ->
    Fname = make_tempname(DestFolder),
    {ok, File} = file:open(Fname, [write, exclusive]),
    [[{requires, CompList}, {def_arg, ExArgs}]] = ets:match(State, {BarcodeType, encoder, '$1', '_', '$2', '_'}),
    file:write(File, io_lib:format("%!PS-Adobe-2.0\n%%BoundingBox: 0 0 ~w ~w\n%%LanguageLevel: 2\n", [Width, Height])),
    write_component(preamble, State, File),
    file:write(File, "\n/Helvetica findfont 10 scalefont setfont\n"),
    lists:map(fun (C) -> write_component(C, State, File) end, CompList),
    write_component(BarcodeType, State, File),
    write_barcode(File, BarcodeType, ExArgs, Data),
    file:close(File),
    {reply, Fname, State};
handle_call({change_table, Tab}, _From, _State) ->
    {reply, {watching_table, Tab}, Tab}.

make_tempname() ->
    {A, B, C} = now(),
    [D, E, F] = lists:map(fun integer_to_list/1, [A, B, C]),
    lists:append(["tmp.", D, ".", E, ".", F]).
make_tempname(TargetDir) ->
    filename:absname_join(TargetDir, make_tempname()).

write_component(preamble, Table, File) ->
    [[Pre]] = ets:match(Table, {preamble, '$1'}),
    file:write(File, Pre);
write_component(Name, Table, File) -> 
    file:write(File, lookup_component(Name, Table)).

write_barcode(File, datamatrix, _, Data)        -> format_barcode_string(File, datamatrix, "", Data);
write_barcode(File, BarcodeType, ExArgs, Data)  -> format_barcode_string(File, BarcodeType, string:join(ExArgs, " "), Data).

format_barcode_string(File, BarcodeType, ExArgString, DataString) ->
    io:format(File, "10 10 moveto (~s) (~s) /~s /uk.co.terryburton.bwipp findresource exec showpage",
              [DataString, ExArgString, BarcodeType]).
                     
lookup_component(Name, Table) -> 
    Ren = ets:match(Table, {Name, renderer, '$1'}),
    Enc = ets:match(Table, {Name, encoder, '_', '_', '_', '$1'}),
    case {Ren, Enc} of
        {[], [[Res]]} -> Res;
        {[[Res]], []} -> Res
    end.

%%%%%%%%%%%%%%%%%%%% generic actions
start() -> gen_server:start_link({local, ?MODULE}, ?MODULE, [], []). %% {local/global, Name}, Mod, InitArgs, Opts
stop() -> gen_server:call(?MODULE, stop).

%%%%%%%%%%%%%%%%%%%% gen_server handlers
init([]) -> {ok, barcode_data:import_ets_file()}.
handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Info, State) -> {noreply, State}.
terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.

This is where the meat of the application resides, so I'll take my time with it.

First off, note that the init function loads that ETS file we generated in barcode_data.

init([]) -> {ok, barcode_data:import_ets_file()}.

That's where our data is stored, and we'll be looking up components by referring to it.

lookup_component(Name, Table) -> 
    Ren = ets:match(Table, {Name, renderer, '$1'}),
    Enc = ets:match(Table, {Name, encoder, '_', '_', '_', '$1'}),
    case {Ren, Enc} of
        {[], [[Res]]} -> Res;
        {[[Res]], []} -> Res
    end.

This is (again) not the most elegant code. Really, what I'd want to do is look up Ren first, check if it returned something, then check whether an Enc exists. That would save me a look-up every once in a while[11]. Do take note that there's no clause to handle the event that a faulty index was passed; in that case, the process will fail with an unmatched pattern and promptly be restarted. This function is in turn used by write_component to actually output the given block to a file

write_component(preamble, Table, File) ->
    [[Pre]] = ets:match(Table, {preamble, '$1'}),
    file:write(File, Pre);
write_component(Name, Table, File) -> 
    file:write(File, lookup_component(Name, Table)).
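As an aside, the look-up-saving version of lookup_component described above might be sketched like this. It preserves the crash-on-bad-index behavior, since a name that matches neither row still dies with a badmatch:

```erlang
%% check the renderer row first; only hit the encoder row on a miss
lookup_component(Name, Table) ->
    case ets:match(Table, {Name, renderer, '$1'}) of
        [[Res]] -> Res;
        []      -> [[Enc]] = ets:match(Table, {Name, encoder, '_', '_', '_', '$1'}),
                   Enc
    end.
```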

Before we tear into handle, just one more note about the remaining interesting helper function

make_tempname() ->
    {A, B, C} = now(),
    [D, E, F] = lists:map(fun integer_to_list/1, [A, B, C]),
    lists:append(["tmp.", D, ".", E, ".", F]).
make_tempname(TargetDir) ->
    filename:absname_join(TargetDir, make_tempname()).

Actually, two functions (make_tempname/0 and make_tempname/1). I thought about just using os:cmd("mktemp") instead, but decided against it. make_tempname uses now() to generate a unique temporary filename. It optionally takes a directory specification, in which case it creates an absolute filename in that directory.

By the way, that's how you handle optional arguments in Erlang. You create multiple functions with the same name but different arity, and just write the matching expression for each. It's surprisingly elegant, and the only thing differentiating these from a single function declaration is that the clauses are separated by . rather than by ;. Obviously, if you plan on exposing such a function to the users of your module, you need to export all the arities you've defined. I wasn't being needlessly pedantic earlier; Erlang treats make_tempname/0 and make_tempname/1 as completely separate functions.
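As a stand-alone illustration of the pattern (a toy module, not part of the project):

```erlang
-module(greeter).
-export([greet/0, greet/1]).  %% each arity must be exported separately

%% greet/0 fills in a default argument by delegating to greet/1
greet() -> greet("world").
greet(Name) -> "Hello, " ++ Name ++ "!".
```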

Right, now then.

handle_call(help, _From, State) ->
    {reply, ets:match(State, {'$1', encoder, '_', '_', '_', '_'}), State};
handle_call({help, BarcodeType}, _From, State) ->
    {reply, ets:match(State, {BarcodeType, encoder, '_', '$1', '_', '_'}), State};
handle_call({write, DestFolder, BarcodeType, Data, Width, Height}, _From, State) ->
    Fname = make_tempname(DestFolder),
    {ok, File} = file:open(Fname, [write, exclusive]),
    [[{requires, CompList}, {def_arg, ExArgs}]] = ets:match(State, {BarcodeType, encoder, '$1', '_', '$2', '_'}),
    file:write(File, io_lib:format("%!PS-Adobe-2.0\n%%BoundingBox: 0 0 ~w ~w\n%%LanguageLevel: 2\n", [Width, Height])),
    write_component(preamble, State, File),
    file:write(File, "\n/Helvetica findfont 10 scalefont setfont\n"),
    lists:map(fun (C) -> write_component(C, State, File) end, CompList),
    write_component(BarcodeType, State, File),
    write_barcode(File, BarcodeType, ExArgs, Data),
    file:close(File),
    {reply, Fname, State};
handle_call({change_table, Tab}, _From, _State) ->
    {reply, {watching_table, Tab}, Tab}.

The last directive there should probably be implemented as a handle_cast rather than handle_call[12]. The first two should probably return processed data rather than raw ETS results. Rest assured that mental notes have been made. The message help returns a list of available encoders[13], while asking for help with a specific encoder will return its example data. All the meat is in that extra large message handler in the middle.
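For the record, the cast version of that last directive would look something like this (a sketch; it also means changing the corresponding interface function, since there's no reply to wait for):

```erlang
%% interface function: fire-and-forget instead of call-and-wait
change(TableId) -> gen_server:cast(?MODULE, {change_table, TableId}).

%% no reply is sent; the new table simply becomes the server state
handle_cast({change_table, Tab}, _State) -> {noreply, Tab}.
```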

Deep breath.

A message of {write, DestFolder, BarcodeType, Data, Width, Height} will output Data as a BarcodeType barcode in the DestFolder folder, formatted to Width x Height dimensions. That last part is trickier than it sounds. Right now, the dimensions are just assumed to be 200x200 in the initial PS, and the C module is expected to output a properly formatted PS file. There are a few problems with that though[14], so what I'll ultimately want to do is have the C module return the appropriate dimensions and have ps_bc change this initial file afterward. That's another TODO.

What the write message actually does, in order, is

  • generates a tempfile name for the directory it was passed
  • opens that File for output
  • looks up the required blocks in our ETS table
  • writes the preamble to File
  • writes the required blocks to File
  • writes the barcode component to File
  • writes a Postscript directive invoking that component with Data to File
  • closes File
  • replies with the absolute tempfile name that it generated

And there you have it, we now have a barcode PS file in the specified location.

The rest of the functions here are either gen_server pieces (which I won't go into), or interface functions (which I will)

help() -> gen_server:call(?MODULE, help).
help(BarcodeType) -> gen_server:call(?MODULE, {help, BarcodeType}).
write(DestFolder, BarcodeType, Data) -> 
    write(DestFolder, BarcodeType, Data, 200, 200).
write(DestFolder, BarcodeType, Data, Width, Height) ->
    gen_server:call(?MODULE, {write, DestFolder, BarcodeType, Data, Width, Height}).
generate(BarcodeType, Data) -> generate("/tmp/", BarcodeType, Data).
generate(DestFolder, BarcodeType, Data) -> 
    NameOfTempFile = write(DestFolder, BarcodeType, Data),
    wand:process(NameOfTempFile),
    NameOfTempFile.
change(TableId) -> gen_server:call(?MODULE, {change, TableId}).

This is a set of exported functions that let outside modules easily interact with the internal ps_bc process. change, help and write map to the corresponding handle_call messages we looked at earlier[15]. generate is something else: it's the principal function I expect to be called from outside the module, though AFAIK there's no way to highlight that from within the code. It collects everything you need to create a barcode from start to finish; it accepts a BarcodeType and Data (and optionally a DestFolder), calls write/3 to create the Postscript file, then wand:process to create the corresponding PNG and rasterized PS file, and finally returns the tempfile name it generated. It should probably return a list of the absolute file names it created rather than just the base name. Mental note number 6.
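Put together, a typical session from the local node might look like this (a sketch; the data string is made up, and the actual file name depends on the now() call in make_tempname):

```erlang
%% hypothetical end-to-end use from the Erlang shell
ps_bc:start(),
wand:start(),
Fname = ps_bc:generate(qrcode, "http://example.com"),
%% Fname is something like "/tmp/tmp.1338.123456.789012";
%% Fname ++ ".png" is the rasterized output wand produced next to it
```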

Whew! At the risk of pulling a Yegge, this piece is turning out a lot longer than I thought it was going to be. Let's get it wrapped up quickly.

Nitrogen

Nitrogen is an Erlang web framework I've been playing with. I won't explain it in depth; I'll just use it to show you how you'd go about invoking the above program for realsies. In fact, here's a nitrogen/rel/nitrogen/site/src/index.erl that will call out to ps_barcode to generate a barcode based on user input and let them download the bitmap and Postscript file:

%% -*- mode: nitrogen -*-
-module (index).
-compile(export_all).
-include_lib("nitrogen_core/include/wf.hrl").

main() -> #template { file="./site/templates/bare.html" }.

title() -> "Welcome to Nitrogen".

body() ->
    #container_12 { body=[
        #grid_8 { alpha=true, prefix=2, suffix=2, omega=true, body=inner_body() }
    ]}.

inner_body() -> 
    [
        #h3 { text="PS Barcode Generator" },
        #h1 { text="In MOTHERFUCKING ERLANG"},
        #p{},
        #textbox { id=barcode_data, text=get_example(qrcode)},
        barcode_type_dropdown(qrcode),
        #button { id=button, text="Generate", postback=click },
        #p{ id=result, body=[
            #image { id=barcode_img },
            #p { id=barcode_link }
        ]}
    ].

barcode_type_dropdown(DefaultType) ->
    Types = rpc:call('ps_barcode@127.0.1.1', ps_bc, help, []),
    #dropdown { id=barcode_type, value=DefaultType, postback=select_type,
        options=lists:map(fun ([T]) -> #option {text=T, value=T} end, Types)
    }.

get_example(BarcodeType) ->
    [[{example,Example}]] = rpc:call('ps_barcode@127.0.1.1', ps_bc, help, [BarcodeType]),
    Example.

event(click) ->
    [_, Fname] = re:split(
        rpc:call('ps_barcode@127.0.1.1', ps_bc, generate, 
            [filename:absname("site/static/images"), list_to_atom(wf:q(barcode_type)), wf:q(barcode_data)]),
        "site/static", [{return, list}]),
    wf:replace(barcode_img, 
        #image { 
            id=barcode_img,
            image=string:concat(Fname, ".png"),
            actions=#effect { effect=highlight }
    }),
    wf:replace(barcode_link,
        #link {
            id=barcode_link,
            text="Download PS file",
            url=string:concat(Fname, ".ps")
    });
event(select_type) ->
    wf:set(barcode_data, get_example(list_to_atom(wf:q(barcode_type)))).

The actual calls to our application happen

get_example(BarcodeType) ->
    [[{example,Example}]] = rpc:call('ps_barcode@127.0.1.1', ps_bc, help, [BarcodeType]),
    Example.

here[16] and

        ...
        rpc:call('ps_barcode@127.0.1.1', ps_bc, generate, 
            [filename:absname("site/static/images"), list_to_atom(wf:q(barcode_type)), wf:q(barcode_data)]),
        "site/static", [{return, list}]),
        ...

here. Recall that make run on the Makefile I defined earlier started a node named 'ps_barcode@127.0.1.1' with our application running in it. So, if we want to use it from another Erlang node, all we have to do is start both nodes with the same cookie, then use the built-in rpc:call function, specifying the appropriate node, module, function and arguments. The return value is the response from our application.
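The general shape of such a call, assuming both nodes were started with the same cookie (the node name is the one from the earlier Makefile; rpc:call/4 itself is standard OTP):

```erlang
%% e.g. from a node started as: erl -name client@127.0.1.1 -setcookie <cookie>
Encoders = rpc:call('ps_barcode@127.0.1.1', ps_bc, help, []),
%% rpc:call(Node, Module, Function, Args) evaluates the call on Node and
%% returns the result, or {badrpc, Reason} if the node can't be reached
```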

The code shown here won't actually run on its own[17]; I left out the C file[18], as well as the actual barcode.ps that the whole thing is based on. I'll act on the mental notes I've collected first, and then toss the whole thing up on my github for you to play with. The nitrogen piece is minimal enough that I won't feel bad for leaving it out of that repo, but the module above should work with your copy of nitrogen.

It's actually just a minimally modified version of the default index.erl file that comes with the framework; the only interesting pieces in it are the rpc:call lines, which demonstrate the hands-down most interesting thing about Erlang. The thing that justifies putting up with all the warts and annoyances[19]. I'll expand on that next time though; this was already more than enough stuff coming out of my mind.


Footnotes

1 - [back] - The complete file is 17111 lines, and we really only need about 800-1200 at the outside to generate a single specific barcode.

2 - [back] - Incidentally, I didn't do this first. I sort of wish I had in retrospect, because it would have saved me some dicking around with erl, but I actually wrote the code first, then wrote the above based on it. Also incidentally, it doesn't seem like much of it will change on a project-by-project basis. That tells me that we're either working with the wrong abstractions, or there are tricky things you can do at this stage that I haven't yet grasped. It also tells me that I should probably write some generation scripts for it.

3 - [back] - one_for_all and rest_for_one are other possible strategies, _all restarts all child processes rather than just the one that errored, and rest_ just restarts processes later in the start order.

4 - [back] - Which means they get restarted when they error, and hang around after they've finished their work.

5 - [back] - Which means that we have a pretty shallow supervision tree in this case, but we really don't need more.

6 - [back] - Routines that do general operations for a particular class of barcode, such as linear or matrix.

7 - [back] - Routines that do the job of converting a specific piece of data into a particular type of barcode, such as qrcode, code93 or datamatrix.

8 - [back] - renderers, required encoders and suggested encoders.

9 - [back] - Some, like datamatrix and qrcode, can handle almost arbitrary string information, while others are restricted to a subset of ascii, and others require a specific number of numeric characters.

10 - [back] - As an aside here, that's one of the things that really rustles my jimmies about Erlang. I've gotten extremely used to including a pretty extensive documentation string with each Common Lisp function and method, knowing that a potential user will be able to make full use of any describe calls they make. It's actually even better for methods, since you get the documentation for the generic you define, as well as a compilation of all doc-strings for the related defmethod calls. Erlang isn't having any of this shit. If you want to include doc-strings, you can damn well write Java-style precisely formatted comments and use a separate doc extractor to read them. I guess this is how most languages do it? It still seems stupid to have a system this dynamic that doesn't allow runtime documentation pokes. Sigh. Ok, let's get back to it.

11 - [back] - Which, granted, isn't really worth saving given how blazingly fast ETS is, but still.

12 - [back] - The only difference being that handle_cast doesn't send a response message to its caller.

13 - [back] - Which we'll use later to give the user something to do about them.

14 - [back] - Specifically, since I'm using the Imagemagick API, the PS that it outputs is actually rasterised. That means it'll be much larger than the initial file and take that much longer to output. Literally the only advantage to it is that it properly sets the width and height of the document.

15 - [back] - Note that I do export help/0, help/1, write/3 and write/5 separately.

16 - [back] - Which, again, really should be expecting a naked string response rather than a raw ETS lookup record.

17 - [back] - The complete code doesn't quite work yet either. Most of it does what it's supposed to, but I've already found one odd case where things don't quite work the way they're supposed to. Tips and patches welcome.

18 - [back] - Which I was actually going to discuss, but this has gone on quite long enough already.

19 - [back] - At least, until I learn enough about it to put together an analogous system in Common Lisp :P