Saturday, December 14, 2013

Implementing Humane Authentication

Last time, I mentioned the auth system proposed by Raskin in his book The Humane Interface. This time, lets dissect something concrete.

Actually, before we get to dissecting anything, let me emphasize again that the most humane approach to authentication is not requiring it. If you have a system that could be exposed to anonymous users, that's what you should do. If you've decided that you absolutely must have some sort of authentication step, then this doesn't seem to be a bad way to go.

Now then...

;;;; raskin-auth.asd
(asdf:defsystem #:raskin-auth
  :serial t
  :description "Implementation of the authentication system proposed in The Humane Interface"
  :author "Inaimathi <leo.zovic@gmail.com>"
  :license "AGPL, as usual"
  :depends-on (#:cl-ppcre #:ironclad)
  :components ((:file "package")
               (:file "util")
               (:file "raskin-auth")))
;;;; package.lisp
(defpackage #:raskin-auth
  (:use #:cl)
  (:export #:new-account! #:sign-in))
;;;; util.lisp
(in-package #:raskin-auth)

;;;; Dictionary-related
(defconstant +dict+ (coerce 
                     (with-open-file (s "/usr/share/dict/american-english")
                       (loop for line = (read-line s nil :eof) until (eql line :eof)
                          unless (cl-ppcre:scan "[^a-zA-Z]" line)
                          collect (string-downcase line)))
                     'vector))

(defun random-words (count &optional (dict +dict+))
  (loop repeat count
     collect (aref dict (random (length dict)))))

;;;; Hash-related
(defmethod iterated-digest ((count integer) (digest-spec symbol) (message string))
  (assert (> count 0))
  (loop with res = (ironclad:ascii-string-to-byte-array message)
     repeat count do (setf res (ironclad:digest-sequence digest-spec res))
     finally (return res)))
;;;; raskin-auth.lisp
(in-package #:raskin-auth)

(setf *random-state* (make-random-state t))

(defparameter *users* (make-hash-table :test 'equal))

(defun hash (passphrase)
  (ironclad:byte-array-to-hex-string (iterated-digest 10000 :sha256 passphrase)))

(defun fresh-passphrase ()
  (let ((u-mod (/ (hash-table-count *users*) 100000)))
    (loop for num-words = (+ 2 (floor u-mod) (random (+ 2 (ceiling u-mod))))
       for passphrase = (format nil "~{~(~a~)~^-~}" (random-words num-words))
       unless (gethash passphrase *users*) do (return passphrase))))

(defun new-account! (&optional (user-data t))
  (let ((passphrase (fresh-passphrase)))
    (setf (gethash (hash passphrase) *users*) user-data)
    passphrase))

(defun sign-in (passphrase)
  (gethash (hash passphrase) *users*))

And that's it. In a production system, you'd obviously want to wire everything up to some database system or other rather than using an in-memory hash-table, but this explains the concept well enough. You'd use this module by including it, then calling (new-account! [user account data goes here]) (which will return a newly generated passphrase) and (sign-in "a-passphrase-goes-here") (which will return either nil or the account data you associated with the given passphrase) as necessary.

Lets go through it.

;;;; raskin-auth.asd
(asdf:defsystem #:raskin-auth
  :serial t
  :description "Implementation of the authentication system proposed in The Humane Interface"
  :author "Inaimathi <leo.zovic@gmail.com>"
  :license "AGPL, as usual"
  :depends-on (#:cl-ppcre #:ironclad)
  :components ((:file "package")
               (:file "util")
               (:file "raskin-auth")))
;;;; package.lisp
(defpackage #:raskin-auth
  (:use #:cl)
  (:export #:new-account! #:sign-in))

That's the ASD file and package. The first makes sure you can load this system using asdf or quicklisp, and the second declares your imports and exports. I'm trying something new this time and refusing to use :use or :import-from and friends. I've gotten a couple comments to the effect that it gets a bit confusing if I import symbols directly rather than labeling them inline with the package they came from, so even though raskin-auth does use things from both ironclad and cl-ppcre, the package.lisp file is staying minimal.

;;;; util.lisp
(in-package #:raskin-auth)

;;;; Dictionary-related
(defconstant +dict+ (coerce 
                     (with-open-file (s "/usr/share/dict/american-english")
                       (loop for line = (read-line s nil :eof) until (eql line :eof)
                          unless (cl-ppcre:scan "[^a-zA-Z]" line)
                          collect (string-downcase line)))
                     'vector))

(defun random-words (count &optional (dict +dict+))
  (loop repeat count
     collect (aref dict (random (length dict)))))

;;;; Hash-related
(defmethod iterated-digest ((count integer) (digest-spec symbol) (message string))
  (assert (> count 0))
  (loop with res = (ironclad:ascii-string-to-byte-array message)
     repeat count do (setf res (ironclad:digest-sequence digest-spec res))
     finally (return res)))

random-words creates a list of count random words by picking them out of a dictionary, which is +dict+ by default. You don't necessarily want these words to be unique, so we don't check for that. +dict+ is just some slightly sanitized output from /usr/share/dict/american-english, which is where Debian keeps the default English language dictionary. The result of that read is a vector of all words in the dict file that are composed entirely of lowercase letters. What we're doing, essentially is shuf -n [count] /usr/share/dict/american-english. Except we're filtering for some stuff, so that should really get piped through a grep or two. Use whatever method you'd like; the end goal is to get a list of count random words, from a list of ~60000 different words, each with an equal probability.

iterated-digest takes a count, a digest-spec and a message, and applies the specified digest to the message count times sequentially. We'll take a look at how you call it in a second.

;;;; raskin-auth.lisp
(in-package #:raskin-auth)

(setf *random-state* (make-random-state t))

(defparameter *users* (make-hash-table :test 'equal))

(defun hash (passphrase)
  (ironclad:byte-array-to-hex-string (iterated-digest 10000 :sha256 passphrase)))

(defun fresh-passphrase ()
  (let ((u-mod (/ (hash-table-count *users*) 100000)))
    (loop for num-words = (+ 2 (floor u-mod) (random (+ 2 (ceiling u-mod))))
       for passphrase = (format nil "~{~(~a~)~^-~}" (random-words num-words))
       unless (gethash passphrase *users*) do (return passphrase))))

(defun new-account! (&optional (user-data t))
  (let ((passphrase (fresh-passphrase)))
    (setf (gethash (hash passphrase) *users*) user-data)
    passphrase))

(defun sign-in (passphrase)
  (gethash (hash passphrase) *users*))

*users* is a hash table that'll keep all of our user records[1], and both new-account! and sign-in are hopefully self explanatory. Let me linger on the rest of that though.

First, you absolutely positively need the *random-state* initialization. Without that line, your system will generate the same order of passphrases each time it starts up. Maybe that's not too big a deal in general, but I'm paranoid enough that I want proper, os-seeded randomness out when I'm generating authentication tokens.

Second, you can see the iterated-digest call here:

(defun hash (passphrase)
  (ironclad:byte-array-to-hex-string (iterated-digest 10000 :sha256 passphrase)))

That takes a particular passphrase string and returns the result of applying the :sha256 digest to it 10000 times. I guess you could make that :sha512 if you really wanted to.

Finally, fresh-passphrase does the job of calling random-words, concatenating the result, and checking whether the result of that is already on record. It keeps going until it generates a passphrase that no one else is using at the moment, and returns that. You can see that it scales somewhat with count of users registered, just to make sure we don't get into the situation where a particular passphrase length is particularly easy to guess.

That's it. Again, what I see here is reasonable security.

Thoughts

On the one hand, you don't get to salt passphrase hashes. Which means that if anyone manages to trick a user of this auth system into revealing their ciphertexts, they'll have a mildly easier time cracking the result. And, since every passphrase is unique, they can knock out some tiny number of possibilities as they go. You also can't easily change your hashing tactic in-flight. Hypothetically, if you chose the iterated :sha256 approach from above, and it then turned out that clever people found ways to compromise that hash, you wouldn't be able to switch your tactics on a live system easily, the way you could with a user-name-oriented system. You would be able to increase the number of hashings fairly easily; just modify your hash to do more iterations, and modify your registered users' passwords to make up the difference.

On the other hand, no one will ever have the passphrase 123 with this system. And, since they didn't pick it, they presumably won't have this same passphrase on any other service they frequent, which means a compromise here won't have to result in a mad dash to change their account passwords anywhere else fo fear of exploits.

The only other downsides seem to be that you can't choose a passphrase, and that if you forget your passphrase, you must create a new account.


Footnotes

1 - [back] - Because it's a hash table, and I don't bother doing any kind of locking, the system you see specified here very likely won't do for any multi-threaded use-cases. You can either add locks, or go the whole nine and replace that hash table with an external database, but I don't need either to see the basic properties of the system, so I didn't implement them.

Tuesday, December 10, 2013

Jef Raskin on Authentication

I've got an idea to peel, and for a change, it's not even mine. I'm in the middle of reading through a Raskin book entitled "The Humane Interface", in which he suggests a different take on user authentication. In section 6-4-3[1], Raskin suggests that signing on to a system can be accomplished without requiring a user name. That is, instead of ... you know what, here, this is easier:

Users are doing more work than necessary when signing on to most systems. You first state who you are -- your "handle", "online name" or "system name" -- and then you provide a password. The name presumably tells the system who you are, and the password prevents unauthorized persons from using your account.

In fact, you are telling the system who you are twice. All that is logically required is that you type a password. There is no loss of system security: The probability of guessing someone's name and password depends on how the password was chosen, its length and the like. Finding the user's online name is usually trivial; in fact, it is commonly made public so that she can be communicated with. A badly chosen password, such as your dog's name, is the most common reason for poor security.

The technical argument that typing two separate strings of characters gives more security is false. If the online name is j characters and the password is k characters, the user, to sign on, must type j+k characters, of which only k characters are unknown to a potential interloper. If the password was chosen randomly -- this is the best you can do -- from a character set with q characters, the probability of breaking into the account on a single guess is 1 / q^k.

Jef Raskin -- The Humane Interface, p183

If you've given authentication systems anywhere near as much thought as I have, the trouble you should immediately see is that in a system like the one proposed above, a password must be unique to a user. Luckily, Raskin sees that one coming.

The question arises: How can you ensure that everybody has a unique password in a password-only system? What if two or more users chose the same password? The best option is to have the system assign them. This method can result in very unmemorable passwords, such as 2534-788834-003PR7 or ty6*>fj`d%d[2]. Another technique is to use a random pair of dictionary words, such as demijohn-shoestring, confirmed-tweezers or sulphur-dive. If a dictionary of 60,000 words is used, the chance of guessing a password on the first try is one in three billion, six-hundred thousand. Using three words puts the difficulty of guessing them beyond hacking with current technology; there are 2.16 x 10^14

such combinations, and guessing and checking a billion of these a day, beyond what can be done at present, would still take about 10^5 days, or 275 years. That's reasonably secure. User-created passwords, at least those more readily memorized by the user, are inherently less secure.

When the idea of improving the interface to a web site or a computer system by simplifying the sign-on process to require only a password is suggested, it is usually rejected on one of two grounds. Either the programmers say that that's just not the way it's done, or they say that they have no control over the sign-on procedure. But someone, of course, does have that control.

Jef Raskin -- The Humane Interface, p183,184

Before I discuss this idea with my self, I have to disagree with two points. First, the odds of guessing a correct password on the first try is not 1 in 3 600 000 000, or 1 in (* 2.16 (expt 10 14))[3]. It's n in whichever-you-picked, where n is the number of users you have. With a password-only system, an attacker is no longer trying to guess a particular users' password, they are trying to guess any password already assigned by your system. Second, I'm not entirely sure that badly chosen passwords are any longer the most common reason for poor security[4], but rather utterly, mind-fuckingly stupid security design by password DB teams[5].

With that our of the way, lets all don our white hats[6], and imagine the proposed system in enough detail to implement it.

How does a user log in?

In the context of a web application, they've got one field to fill out, "passphrase", and one button to click, "Log In". The passphrase entered is then hashed and looked up in our user database; if it matches a passphrase hash we have on file, the user ID is retrieved and used to get the specified users' program state. We then continue along letting them do what they're actually here to do. In an ideal system, this authentication step would be entirely optional, allowing it to happen at the last possible moment, when a user needed to commit some piece of data to their server-side corpus.

This is easily the biggest introduced weakness I see in the proposed system. Because we only have a passphrase to work with, we can only use either an unsalted hash, or a per-server "salt" to keep our passphrases out of plaintext. If we didn't, that user lookup based on the password would take a long time. Scaling at On with number of users, with some fairly ridiculous constants tacked on. That's dangerous, because we're suddenly gambling that the rest of the application our auth system is embedded in won't allow any injection attacks, or leak database information any other way. Granted, because we're guaranteed to have unique passwords, such a disclosure isn't as easy to take advantage of as it might be, but it's still a concern.

What happens when the user enters a passphrase that isn't currently assigned?

There are really only two reasonable possibilities:

They get an artificial delay, followed by the above message. The standard log-in procedure also needs to have an equivalent delay, otherwise attackers might abort a guess before getting the response back, which would prevent them from actually being delayed in the practical sense. It doesn't have to be long; a second or two would be enough to prevent the kind of guess hammering I've got in mind, and it wouldn't be too annoying to users provided we put in a little spinning graphic in the meanwhile[7].

They get a "logged in" response with the default state in place, and no other warning. Effectively, an "incorrect" passphrase entry becomes a registration. Users might get annoyed at this one, since it would seem at first that their program state is gone.

Having thought about this for a bit, it becomes clear that there's only one reasonable possibility, and it's the first one.

How does the registration process work?

This might be context sensitive by application. For instance, Deal lets users play entirely anonymously. I can easily imagine a system wherein after 10 minutes of play time, a user just automatically got an in-game notice with a passphrase that would let them resume where they were. Because the server controls all the steps to a registration, it can happen behind the scenes with some game time effectively taking the place of a Captcha. This could be used with any system that lets you start off anonymously; wikis, bulletin boards, forums, etc.

That system, elegant as it might be from the implementation and usability side of things, wouldn't work for something like GoGet. Where the only possible reason to use the application is to go back later and check what you put in the first time. In that situation, you'd want the usual up-front "Register" button that would do the Captcha thing to make sure you're not a robot[8], and hand the user an account before they start doing stuff. Really, this might be re-designed too though; have the system start you off on a blank check-list, with an unobtrusive "Log In" form at the top of the page, with the added button "Save", which would register you and hand you a passphrase with which you could access the list you just made.

What do we do when passphrase exhaustion occurs

Granted, 216 000 000 000 000 is a large number, but it's not infinite, which means some clever bastard out there is going to find a way to cut it in half a few times for the purposes of guessing. And it doesn't take very many halvings to get that down to a tractable level. We have to deal with this problem a good deal sooner than "passphrase exhaustion"; if we get to the point where all passphrases are assigned, an attacker suddenly gets access to an account no matter which possibility they guess. But if we did something naive like hand out 2-word passphrases until they ran out, then an attacker who registers and receives a 3-word passphrase would know that any 2-word combination of our dictionary words will give them access to an existing account. We'd really want to generate new passphrases well before we ran out; at something like 10% exhaustion at a guess. Or better yet, don't limit passphrase length to two words, make it n random words, where n is something like

(let ((u-mod (/ user-count 1000000)))
  (random (+ 2 (floor u-mod)) (+ 4 (ceiling u-mod))))

That should give attackers less purchase, and scale naturally with additional users.

What Have We Got?

Switching briefly over to my black hat, I can't see an attack on this system that would get you any traction above and beyond traditional password-based implementations[9]. That doesn't mean there isn't a way, of course. I'll present the idea to some discerning and devious thinkers to see what they can come up with. Otherwise, we've got some interesting properties here, mainly because the server-side is the one putting everything together. We have a passphrase system that

  • only requires the barest interaction with the user
  • can be initiated automatically at some point[10]
  • will not suffer the failure mode that someone will use this same passphrase everywhere[11], meaning that even if the system is compromised, all an attacker has is access to an account for one particular service
  • will never have to worry about a passphrase as shitty as "password" or "12345"

On the flipside, though:

  • You can't change your passphrase to something you want
  • There isn't a way to recover or reset a forgotten passphrase[12]

All in all, barring someone pointing out some egregious security flaw in this approach, it seems to be worth implementing.

I'll see what I can do.


Footnotes

1 - [back] - Starting on pg 183 in the copy I'm holding.

2 - [back] - The actual printed second password contains some unicode characters which were not accurately reproduced here.

3 - [back] - Depending on how many words you decide to pack into each generated password.

4 - [back] - Though it may just be a media imparted bias on my part.

5 - [back] - Such as the refusal to use appropriate hashing algorithms, or inadvertent opening of various injection attacks.

6 - [back] - Mine's a tuque because it's cold out and I'm in Canada, but you should feel free to don your hacking fedora, trilby, stetson, what-have-you as regionally appropriate.

7 - [back] - Most authentication systems I interact with take longer anyhow.

8 - [back] - Or not, really. Depending on how much traffic your system can handle, how much you care about preserving disk spce, and whether you give your users the ability to use SMTP facilities, you might get away with putting in an artificial 3 or 4 second delay before registration completes rather than trying to prevent automatic sign-ups. That's what I plan to do, in any case.

9 - [back] - Apart from the situation where our ciphertext passwords have been leaked. Which, granted, isn't a high bar, but still.

10 - [back] - As in the situation in Deal that would automatically hand the user a passphrase ~10 into active use of an unregistered account.

11 - [back] - Hopefully, at least.

12 - [back] - Since the passphrase acts as both a name and password, if you forget it, you just have to start a new account. Allowing the user to save as much of their data as possible locally would work to alleviate some of the pain from this.

Sunday, December 1, 2013

Quick Update on Deal

Deal proceeds apace.

I've spent the past while putting together a minimal, single-threaded asynchronous server to simplify the deployment process. Almost done, and you can see the progress on this branch in this subdirectory. The remaining stuff left ToDo is:

Better Errors. I need to put together an appropriate assertion mechanism. Straight up assert works fine in a multi-threaded context, but does some mean things when you've only got the one thread. Normally, it wouldn't be that big a deal, but I have to special-case my handler-case statements for SIMPLE-ERROR in order to allow shell interruption. Unfortunately, they're both conditions of type simple-error, which means that if I do it naively, I either let both or neither through. What I'm planning to do is define a macro named something like http-assert, which will throw a type of error I can then safely convert into an HTTP-400 response.

Basic Static Files. Currently, I'm serving static files through nginx only. Which is the efficient way of doing it. However, one use-case I'm thinking of for Deal is that of a small, geographically disparate team setting up a private server for themselves. It's kind of a pain in the ass to have to set up a reverse proxy for that situation, so it would be nice if House provided a basic file handler for people to use.

That's ... going to get complicated though. A first crack at the implementation is here and here. That only works for text files so far[1], and it only works for a laughably small number of mimetypes. A more complete map can be found here or here, but I'm not going to be anywhere near as thorough; remember, this is an edge case. This lightweight server is not in the business of serving out static files in an efficient manner. That's what things like nginx are for, and I've got no doubt they're doing a better job than I could.

Touch Ups Sessions don't expire yet. And when they do, I'll want to give them the same kind of behavior hooks that I've got going for new-session. There's also the non-trivial matter of porting the rest of the Deal system to work better with the House server, but I get the feeling I'm most of the way there already.

Famous last words, right?


Footnotes

1 - [back] - I'm still trying to iron out kinks; in particular there seems to be some kind of character encoding issue left in the way that I just can't get my head around. I'll be asking on SO shortly.