Monday, February 17, 2014

Autocompletion Example with Ports in Elm

Two things on the agenda today. First, Elm has gotten some improvements that might mean I end up using it in production at some point. Second, I tried a new language called Pure, which I found by searching for "dynamically typed haskell". Stick around if that sounds interesting.

EDIT:

Do not bother sticking around if that sounds interesting. I ended up talking about Elm so much that I never got into Pure.

Tue, 18 Feb, 2014

Elm Lang

For those of you just joining us, Elm is a pure-functional, statically typed, optionally-type-inferring language closely based on Haskell, which targets a JavaScript-hosted VM for deployment. That is, there's an elm-runtime.js which Elm code compiles to target, and the result is highly reactive web front-ends that don't require any mucking around with the DOM. Now that we're all on the same page...

Elm. Again.

This came up at a recent Code Retreat, and it looks interesting as fuck in the context of the FBP stuff I've been doing at work recently. The problem we were solving at the event was autocompletion. That is, given a partial input, return possible completions from some dictionary. Here's a short[1] implementation in Elm.

module Autocomplete where

import String
import Keyboard
import Graphics.Input as Input

(field, content) = Input.field "Enter text"

fState : [String] -> Input.FieldState
fState comps = case comps of
                 [] -> {string = "", selectionStart=0, selectionEnd=0}
                 _  -> {string = (head comps), selectionStart=0, selectionEnd=(String.length <| head comps)}

esc = Keyboard.isDown 27
ctrlSpace = dropRepeats . lift and <| combine [Keyboard.ctrl, Keyboard.space]

empty : Signal Element
empty = sampleOn (merge Keyboard.enter esc) . fst <| Input.field "Enter text"

completeElem : Signal Element
completeElem = lift ((Input.fields Input.emptyFieldState).field id "Enter text") . sampleOn ctrlSpace . lift fState <| lift completions content

completions : String -> [String]
completions partial = if | 0 < String.length partial -> filter (String.startsWith partial) wordList
                         | otherwise          -> []

main = lift2 above (merges [field, completeElem, empty]) . lift asText <| lift completions content

--- Dummy data
wordList : [String]
wordList = String.words "one two three four five six seven eight nine ten"

The point here is: there's an input that displays completions as you type. If you hit the enter or esc keys, that input is cleared, and if you hit Ctrl+Space, it's filled with the top completion. The above doesn't let you select a different completion, which it should, but it's a pretty instructive example.

Let's go through it.

module Autocomplete where

import String
import Keyboard
import Graphics.Input as Input

Module and import declarations. Nothing to see here, move along.

(field, content) = Input.field "Enter text"

fState : [String] -> Input.FieldState
fState comps = case comps of
                 [] -> {string = "", selectionStart=0, selectionEnd=0}
                 _  -> {string = (head comps), selectionStart=0, selectionEnd=(String.length <| head comps)}

The first line in this bit sets up a field, which is represented as a pair of Signals; one for the element and one for the content. Signals are a pretty good way of modeling state changes over time in a purely-functional context. You can think of one as the infinite stream of values it'll contain, the current one of which your program is continuously operating on. An input field is a pair of signals because you'd like to be able to change it[2], as well as receive updates about its state changes. We'll do that by combining several signals at particular sample points, and FieldState is the type we can eventually funnel into a field.

esc = Keyboard.isDown 27
ctrlSpace = dropRepeats . lift and <| combine [Keyboard.ctrl, Keyboard.space]

These represent two different signals we'd like from the Keyboard module. The first will be True whenever the Escape key is down[3], and the second will be True when both the Ctrl and Space keys are pressed[4]. The types at each step might be useful: in particular, ctrlSpace : Signal Bool, combine : [Signal a] -> Signal [a], and lift and : Signal [Bool] -> Signal Bool. The dropRepeats is the only chunklet whose type signature will give you no further understanding[5]; it's there to prevent partial signal changes from triggering a "change" in the ctrlSpace signal itself. Also, on a syntax note, the <| is identical to Haskell's $.

Onward.

empty : Signal Element
empty = sampleOn (merge Keyboard.enter esc) . fst <| Input.field "Enter text"

completeElem : Signal Element
completeElem = lift ((Input.fields Input.emptyFieldState).field id "Enter text") . sampleOn ctrlSpace . lift fState <| lift completions content

completions : String -> [String]
completions partial = if | 0 < String.length partial -> filter (String.startsWith partial) wordList
                         | otherwise          -> []

This is the real meat right here. empty is the signal of empty elements which "changes" whenever the Enter or Esc keys are pressed[6]. completeElem is the signal of filled elements that "changes" whenever the user hits Ctrl+Space. Finally, completions is the function that maps the current text in the main input to its possible completions; lifting it over content gives us a signal of completion lists.

main = lift2 above (merges [field, completeElem, empty]) . lift asText <| lift completions content

--- Dummy data
wordList : [String]
wordList = String.words "one two three four five six seven eight nine ten"

These remaining lines render the input and completions to screen, and set up the extremely minimal test dictionary. That's it. What we have here is exactly what was described. An input, backed by a word list, which is cleared on either Enter or Esc, and completed on Ctrl+Space.

The New Part

None of that was new.

If you've read any of the articles on this blog tagged Elm, you already know all of it. The new part is that you can now have your Elm programs communicate with the outside world. In the case we're considering above, a solitary auto-completing input is pretty useless. But imagine if you could use it essentially as a minibuffer in a larger project. You'd want to be able to pass it new completion lists, and you'd want it to notify you when the user entered some input. It goes without saying that you'd like all this to be encapsulated within a known, non-global namespace, so that you could combine your Elm minibuffer with arbitrary JS projects.

So, let's do it.

module Autocomplete where

import String
import Keyboard
import Graphics.Input as Input

(field, content) = Input.field "Enter text"

fState : [String] -> Input.FieldState
fState comps = case comps of
                 [] -> {string = "", selectionStart=0, selectionEnd=0}
                 _  -> {string = (head comps), selectionStart=0, selectionEnd= (String.length <| head comps)}

esc = Keyboard.isDown 27
ctrlSpace = dropRepeats . lift and <| combine [Keyboard.ctrl, Keyboard.space]

empty : Signal Element
empty = sampleOn (merge Keyboard.enter esc) . fst <| Input.field "Enter text"

completeElem : Signal Element
completeElem = lift ((Input.fields Input.emptyFieldState).field id "Enter text") . sampleOn ctrlSpace . lift fState <| lift2 completions content wordList

completions : String -> [String] -> [String]
completions partial wordList = if | 0 < String.length partial -> filter (String.startsWith partial) wordList
                                  | otherwise          -> []

main = lift2 above (merges [field, completeElem, empty]) . lift asText <| lift2 completions content wordList

port wordList : Signal [String]

port output : Signal String
port output = keepIf (\s -> s/="") "" (sampleOn Keyboard.enter content)

That's a minimally changed .elm file. The differences are

  • We've added port declarations at the bottom there. One incoming, which is just a type declaration, and one outgoing, which has a type declaration and a transmitter function.
  • We've changed completions so that it takes its wordList as input.
  • Anywhere we used to call completions with lift completions content, we now have to call it with lift2 completions content wordList.

The file you'd embed that module into would look something like this[7]

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Embedding Autocomplete - Elm</title>
    <script type="text/javascript" src="/elm-runtime.js"></script>
    <script type="text/javascript" src="/build/Autocomplete.js"></script>
  </head>
  <body>
    <div id="auto" style="position: absolute; left: 50px; top: 50px; width: 600px; height: 100px; border: 2px dashed #000;"></div>
    <script type="text/javascript">
      var can = Elm.embed(Elm.Autocomplete, 
                          document.getElementById("auto"), 
                          {wordList: ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"]});
      can.ports.output.subscribe(function (msg) { console.log("FROM MINIBUFFER :: ", msg) })
    </script>
  </body>
</html>
EDIT:

You can find a running demo of the above here.

Sat, 22 Feb, 2014

The relevant bits are the positioned div, which will contain our program, and the Elm.embed call, which sets it up. Note especially the third argument; you have to provide an initial value like that for any input ports in the component you're embedding. Finally, note the subscribe call, which attaches a listener to that output port we defined, in this case a naive one that just prints everything it gets to the console.

This is awesome.

It's awesome enough that I'm seriously considering Elm for some production work at work, because I want to apply Elm in the places where it'll do massive amounts of good and leave the other stuff to stateful JavaScript programs. Using the ports approach above, I can get exactly that. If there were something similar for Haskell, I'd probably have taken the plunge and built something with it by now[8].

In Case You're Reading, evancz

There are still a few minor headaches with the language, though thankfully I didn't have to stub my toe on most of them this time around. The only ones that ended up being annoying, or will very shortly become so, are

  • No signal defaults from within .elm files. This bites during development. When you have an Elm module that will depend on an outside signal for its operation, you have to set a default value for that signal outside. This is ok once you've got the embedding file together, but it does mean that that second Autocomplete.elm file above will give you the error
    Initialization Error: port 'wordList' was not given an input!
        Open the developer console for more details.
    if you try to run it standalone without modifications. The workaround I've been using is to comment out the port declaration line, and add one that reads wordList = constant ["one", "two", "three", "four", "five", "six"]. It works, but I'd rather not have to do it.
  • No Haskell-style sections. It only bit once in this program, and it's tolerable, but I'd much rather write (/="") than the equivalent, but syntactically noisier (\s -> s /= "").
  • No Indexing. I'm almost convinced this has to be an omission on my part, and there's actually a way to do it out of the box, because it seems mildly bizarre to have List.head and String.sub in a language, but no list indexing operator or function. If there is one, just point me to it. In the meantime, you can define your own minimal version as (!!) lst ix = lst |> drop ix |> head, or maybe
    (!!) lst ix = case drop ix lst of
                    [] -> Nothing
                    sub -> Just <| head sub
    if running off the end of the list gives you pause. Neither of these deals with negative indices, but they'll give you trivial indexing capabilities.

That's that, I guess. I was going to talk a bit about Pure. And Forth. And maybe incidentally a bit about C, but this is way longer than I was expecting already, so I think I'll call it for today.


Footnotes

1 - [back] - Admittedly, this took hours in total. Writing it involved a lot of experimentation and documentation browsing, and I did a quick clean-up pass afterwards. I'm really hoping the development time goes down as I get more practice with the language and type system.

2 - [back] - Actually, hang on. If you're new to the FRP thing, I should clarify that you never really change things. Remember: any stateful component is represented as the lazy, infinite list of its complete history. What I really meant by the shorthand "change it" is "merge multiple signals of the same type in a way that gives one of them primacy in certain situations". It's counter-intuitive the first few times, but it's helpful to keep that perspective in mind when you're dealing with Elm.

3 - [back] - And hence will change on a keyDown or keyUp event for that key.

4 - [back] - And will therefore change on either ctrl/space keyDown, whichever is second, and on either ctrl/space keyUp, whichever is first.

5 - [back] - It's dropRepeats : Signal a -> Signal a, in case you really, really care.

6 - [back] - That's the (merge Keyboard.enter esc).

7 - [back] - Assuming those were accurate URLs for elm-runtime.js and Autocomplete.js on your system.

8 - [back] - For the record, there might be something like it for Haskell; you'll hear excited giggling from me if I happen to find it. For the moment, I'm still evaluating the FRP section of the HaskellWiki.

Sunday, February 9, 2014

Fact Bases and Total History

No, since you ask, I haven't read anything related to Datomic, though friends keep telling me I should. This is just stuff we've been talking about at the Toronto SICP reading group, and a couple of other places.

The context was mildly different; we had this conversation in relation to the notional bank accounts from chapter 3 of SICP, where they introduce mutable state. The discussed version would have been simpler to optimize and manipulate, since it was entirely numeric, but I've also got fact-bases on my mind thanks to a now-co-worker who's thought pretty deeply about them.

The conversation touched on total-history data-structures, and their effects on performance and convenience. The end result is this little project I just put together over the course of half an afternoon.

Let me back up...

Imagine a toy bank account.

The basic one proposed in the book is as simple as a balance, a getter, and a pair of setters named deposit and withdraw. You could add more detail, like interest rates and appropriate calculation functions, authentication mechanisms, and a designated owners list, but that's all beside the point.

Regardless of how much detail you imagined, you probably think of the principal structure as the current balance; that is, a simple numeric value of some precision. A total-history bank account is not that; it's a starting state (let's say 0), as well as The Total History (hence the name) of all transactions or modifications affecting it. So, instead of something like

$50.34

you're looking at a thing more like

'(...
  (3600901270 :deposit +100)
  (3600913394 :withdraw -20)
  (3600913519 :purchase -29.66))

stretching back from the beginning of the account's existence to now. This means that you have access to the "current value" of the account at any given time in its history. You can go back and check what happened and when, and if you like, you can ask questions like "What would it look like today if I had made an extra deposit here, and an extra withdrawal here?" In order to get the current value, you have to project it. That is, you need to go back through history and apply all the recorded events in order to see what falls out the other end. If you really are doing basic numeric modifications, it's pretty easy to parallelize some parts of that projection process, but I'll leave that as a thought experiment for the reader.
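
To make the projection idea concrete, here's a minimal sketch in Common Lisp. The history format mirrors the one above, but the variable and function names are mine for illustration; none of this is code from the linked project.

;; A minimal sketch of projecting a current value out of a total history.
;; Illustrative only; the names and the exact history format are hypothetical.
(defparameter *history*
  '((3600901270 :deposit 100)
    (3600913394 :withdraw -20)
    (3600913519 :purchase -29.66)))

(defun project-balance (history &key (starting-state 0) (until most-positive-fixnum))
  "Fold every event with a timestamp <= UNTIL over STARTING-STATE."
  (reduce (lambda (balance event)
            (destructuring-bind (timestamp event-type amount) event
              (declare (ignore event-type))
              (if (<= timestamp until) (+ balance amount) balance)))
          history
          :initial-value starting-state))

;; (project-balance *history*)                   => 50.34 (give or take float noise)
;; (project-balance *history* :until 3600913394) => 80

Passing :until is what gives you those "what was the value at time X" queries; prospective evaluation is just projecting over a history with some extra events spliced in.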

Bank accounts aren't the only things you can model this way; specifically, fact-bases can be usefully thought of in this manner.

Now then...

It turns out that total-history structures give you some interesting properties and challenges.

First, if you want all of history, you can never delete anything. You have to put in deletion tokens, which mark some element or class of elements as removed from your corpus. Because you can never delete anything, a total-history data-structure has the nice property of being append-only. Which means that you can play some neat optimization tricks in serializing it to disk, like, say, clustering deltas. This comes in really handy for very large data sets, or ones which are updated very frequently. Since you're only dealing with shipping diffs around, you can easily save yourself bandwidth on keeping copies in sync, or you could easily break your corpus up across different physical drives. Undo/redo also becomes fairly simple to layer on top of a corpus that already uses this approach.
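
As a quick illustration of how cheaply undo/redo layers on top of an append-only corpus, here's a sketch with hypothetical structure and names (not the linked project's API). Undo just moves the newest entry onto a redo stack; the current state gets re-derived by projecting whatever is left, so no entry is ever touched.

;; Sketch: undo/redo over an append-only history. Hypothetical names.
(defstruct corpus
  (history '())     ; newest entry first
  (redo-stack '()))

(defun record! (corpus entry)
  (push entry (corpus-history corpus))
  (setf (corpus-redo-stack corpus) '()) ; a fresh entry invalidates any redos
  corpus)

(defun undo! (corpus)
  (when (corpus-history corpus)
    (push (pop (corpus-history corpus)) (corpus-redo-stack corpus)))
  corpus)

(defun redo! (corpus)
  (when (corpus-redo-stack corpus)
    (push (pop (corpus-redo-stack corpus)) (corpus-history corpus)))
  corpus)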

Second, time-stamping becomes pretty critical. If you took a look at that github link above, and poked around in here, you'll have noticed that I'm using local-time rather than get-universal-time because I need much finer granularity than seconds. Going to microseconds doesn't fully solve the problem, of course; if your throughput is high enough, there might still be collisions, and therefore potential data loss/duplication on updates. Putting something scalable together in Erlang would be easier, because now/0 guarantees unique values on subsequent invocations.
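
A generic way to dodge same-tick collisions, whatever the clock granularity, is to pair the timestamp with a monotonically increasing per-process counter and sort on the pair. To be clear, that's a standard trick rather than anything the linked project necessarily does, and the sketch below uses get-universal-time only to stay dependency-free:

;; Sketch: disambiguating entries recorded in the same clock tick by pairing
;; the timestamp with a counter. Hypothetical names; generic technique.
(defvar *entry-counter* 0)

(defun next-entry-key ()
  "Return a (timestamp counter) pair unique within this process."
  (list (get-universal-time) (incf *entry-counter*)))

(defun entry-key< (a b)
  (or (< (first a) (first b))
      (and (= (first a) (first b))
           (< (second a) (second b)))))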

Still open decisions are how to go about storing deletion tokens, and when/how aggressively to prune history. A passable answer for the second is "never", so that's what I'm going with for the moment. The first one doesn't seem to have a right answer.

Storing deletion tokens...

The three approaches I can see off the top of my head are storing a deletion index, storing a deletion value, and storing a deletion template. I'm doing the third at the moment, though I'm not convinced it's the right approach. So let's start with that.

Deletion Template

Basically, the deletion primitive looks like (delete! (list _ :subject _) fact-base), which goes through and deletes any fact whose second element is :subject, and also records that matching template in the history as an indicator. A deletion entry then looks like

(<timestamp> :delete (list _ :subject _))

When you go to apply this particular deletion token, you'll need to create the function that checks for a match against it, then run that function across your built-up state, removing anything it marks. There are three downsides here. First, this deletion token might pick up things other than the specific facts that it actually deletes in any particular traversal, which means that prospective evaluation gets more complicated. Second, because it's ambiguous, you can't easily reverse it; if you're building up state, you can't just back up over a deletion token. You need to throw away your accumulated state and start from the beginning again to get to the point you were at before applying it. Third, because it involves keeping a piece of match logic in the record, this approach means pulling out eval during de-serialization.

On the flip-side, it's easily parallelizable, and it doesn't care one whit about the order of the facts it's traversing or the direction of traversal.
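
To make that concrete, here's roughly what applying a template token could look like if you treat the template as plain data and interpret the _ wildcard at match time. That's a simplification of mine that sidesteps eval; the names are hypothetical and this isn't the project's actual code.

;; Sketch: applying a deletion-template token to built-up state.
;; A template is plain data; the symbol _ matches anything, everything else
;; must be EQUAL.
(defun template-matches-p (template fact)
  (and (= (length template) (length fact))
       (every (lambda (pat elem) (or (eq pat '_) (equal pat elem)))
              template fact)))

(defun apply-deletion-template (template state)
  "Return STATE with every fact matching TEMPLATE removed."
  (remove-if (lambda (fact) (template-matches-p template fact)) state))

;; (apply-deletion-template
;;  '(_ :subject _)
;;  '((1 :subject "a") (2 :object "b") (3 :subject "c")))
;; => ((2 :OBJECT "b"))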

Deletion Index

This approach just has you keep the offset of the removed fact. A deletion token looks like

(<timestamp> :delete 37)

or maybe

(<timestamp> :delete (list 13 572 1335))

To apply this one, you go back through your built-up state and drop the nth elements. Mostly the same downsides as the Deletion Template: it gets a bit tricky when you want prospective change projection, and it isn't cleanly reversible. It's mildly easier on memory, since we're just slinging integers around, and it doesn't need eval, but it suddenly matters which direction you're traversing your corpus in and how your corpus is ordered, and I could see it getting in the way of parallelization later.
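
Here's a sketch of applying an index token, assuming the built-up state is an ordered list and the stored offsets are zero-based positions into it (hypothetical names again):

;; Sketch: applying a deletion-index token. Correctness depends entirely on
;; corpus order and traversal direction staying fixed.
(defun apply-deletion-indices (indices state)
  "Return STATE with the facts at the zero-based INDICES dropped."
  (loop for fact in state
        for ix from 0
        unless (member ix indices) collect fact))

;; (apply-deletion-indices '(1) '((0 :a "x") (1 :b "y") (2 :c "z")))
;; => ((0 :A "x") (2 :C "z"))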

Deletion Value

A deletion value token looks something like

(<timestamp> :delete (<id> :subject "whatever the third value is"))

You apply this by going through the accumulated corpus and removing the first fact that matches it. Granted, it's slower in general (because applying it involves an arbitrary tree-compare), and it's more wasteful of disk space (because we have to store those arbitrary trees in addition to comparing them against the facts to remove). But. Depending on how strictly you enforce it, this one is more easily reversible, and it doesn't need eval either, since it's just storing a value.
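
A sketch of applying a value token, which just walks the corpus and drops the first structurally EQUAL fact (my names, not the project's):

;; Sketch: applying a deletion-value token. The arbitrary tree comparison
;; (EQUAL) is exactly the slow, disk-hungry part mentioned above.
(defun apply-deletion-value (value state)
  "Return STATE with the first fact EQUAL to VALUE removed."
  (cond ((null state) '())
        ((equal (first state) value) (rest state))
        (t (cons (first state)
                 (apply-deletion-value value (rest state))))))

;; (apply-deletion-value
;;  '(2 :subject "b")
;;  '((1 :subject "a") (2 :subject "b") (2 :subject "b")))
;; => ((1 :SUBJECT "a") (2 :SUBJECT "b"))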

Wrapping up...

A couple of other thoughts I'd like to leave percolating:

  • It might be possible to get the best of both worlds by making sure that a deletion token remembers the particular members it removed at application time. That would let both the Template and Index approaches reverse easily, though it would complicate the projection process somewhat.
  • It might be useful to have a layer of meta-tokens. That is, insertions in history that modify other history entries. Off the top of my head, only :ignore or :duplicate seem like they'd be significantly useful. On the plus side, you now have a meta layer to these histories which gives you more flexibility in using them. On the other hand, you now need one of: efficient reversals, a two-pass loading procedure, or really, really shitty performance for heavily meta-tagged histories.

That's that. Not the most flowing narrative I've ever produced, but I think it touched on one or two interesting ideas. And anyhow, I'm still chewing over most of this. I'll let you know how it goes, and if I end up finding a real use for it.