Showing posts with label C. Show all posts
Showing posts with label C. Show all posts

Sunday, April 29, 2012

Notes From the Borders of Erlang

This is going to be a pretty disjointed, Erlang-heavy article, since that's basically been the main dominating piece of programming-related thought in my brain for the past week. It actually started a while back, when I got the unofficial heads-up that we'll soon be starting a new project at work which will call for super-massive transaction counts, require high reliability/uptime and be mostly server-based. That short-list tells me that the right tool for the job is probably a functional language that focuses on inter-process communication, and enforced isolation between components. Also, one of the big reasons I like my current company is that if I make a decision about what technology we're using, no one gets to tell me to fuck off.

Thus began the research...

There are the usual set of resources over there in the sidebar[1], but do also take the time to check out this vimeo piece featuring Joe Armstrong. It won't really give you much insight into how to use the language, but it will show you a bit of the history and intent. Like I said, entirely worth it to hear the man talk, but here are the big points, as extracted by yours truly; he highlighted three things that were missing from Erlang[2], one big mistake, two not-too-bad ideas and three fairly nice ideas that the team had when developing the language. He noted that these are controversial, but I tend to agree with a pretty large number of his assessments. Then again, I'm the crazy motherfucker who regularly blogs about his experiences with Lisp, Smalltalk, Erlang and Ruby, so maybe I'm not the best person to gauge what a mainstream opinion is supposed to look like.

Three Missing Things

Hash Maps - JSON-style key/value data structures. Not just adding them to the system, but making them the fundamental data-type rather than tuples or arrays. I can see why, too; if you look at any tutorial or piece of Erlang code, you'll see things that fake key/value pairs using tuples. Things like {shopping_list, [{oranges, 3}, {apples, 4}, {bread, 1}]}, which would be better expressed as a JSON structure[3].

Higher Order Modules - code in Erlang is organized into modules, which is par for the course these days, but you can't programmatically introspect on them at runtime. Joe mentioned the example of being able to send a particular standardized message and getting back a list of messages supported by the target. I guess this probably might get built into the existing language piecemeal by convention rather than specification. I'm imagining a situation where a given team agrees that they'll write all their modules to accept a help message which would return a list of the functions it provides and a specification of inputs they'd each accept. Thing is, 1. that wouldn't be a language-wide standard, and 2. it would take additional explicit work by the developers. If it was handled at the language level, everyone would have access to the same introspection facilities, and they'd be handled with no additional thought or deed on the developers' part.

The Ability to receive a fun - Erlang is a higher-order language, and you can send around function names whenever and wherever you damn well please, but apparently the built-in receive directive won't let you pass it an anonymous function. Ok, this isn't one you could solve with macros, but I'm not entirely sure it would be a good idea in the first place. The thing on the other end of the line isn't necessarily code you can trust, but it would certainly add more flexibility.

One Big Mistake

Lost Too Much Prolog - Joe's a big Prolog fan, which should come as no surprise to anyone who's read any Erlang tutorials, watched any Erlang talks, or indeed, written any Erlang code. I'm not qualified to comment, never having done anything approaching serious development in Prolog[4].

Two Not Too Bad Ideas

He gave this talk to an American audience, so he had to have a section with Good™ and Great™ ideas, though he would have preferred to be more modest about it. In deference to his preference, I'm keeping his intended titles.

Lightweight Processes Are Ok -

"... we've shown that you can do processes in the language, and we've shown there's no need for threads. Threads are intrinsically evil, and [shouldn't] be used. Threads were sort of this 'Oh my goodness, processes aren't efficient enough, so lets use this abomination to...' horrible things." - Joe Armstrong
For my part, I've got a half-written piece about cl-actors sitting in my drafts folder. It's a pretty good, lightweight implementation of the actor model built on top of bordeaux-threads. And if you like the Erlang-style message passing, do give it a shot, but it doesn't quite do the same thing as Erlang manages. The threading model means you can't expect to reliably spawn thousands of cl-actors on a typical machine. For comparison, the Pragmatic book has an example on pg 149/150 wherein Joe removes the built-in safety limit of 32 767 processes and has Erlang spawn 200 000 without breaking a sweat[5]. That seems like at least part of the story behind those mind-boggling benchmarks that you've all probably seen by now.

OTP Behaviours - The correct way to think of Behaviours, Joe says, is to consider them the process equivalent of higher-order functions. They formalize basic request patterns between processes letting individuals focus on the differences. I don't actually have enough experience with them yet, but if Joe's description is accurate, I can see them being very useful when constructing complex systems with a reliability requirement.

Three Fairly Nice Ideas

Bit Syntax - Is frequently useful when setting up low-level communications with non-Erlang processes, and reading files. Joe calls this out as the first of three very useful features, and it really is elegant. If you've never seen it, I encourage you to take a quick look. Short version: the notation they've set up gives you access to the same pattern matching facilities you can expect from the rest of the language, which in turn makes it very simple to decode and process binary data.

Formalized Inter-process Relationships - This is another feature that typical "Erlang-style" systems miss. They're useful as fuck when you're building multi-processing systems, but it seems like you could add them on later if you picked your primary primitives properly. The idea is that you can explicitly link various processes in certain ways. For instance, you can tell a group of processes to all fail if one of them fails, or you can tell a specific process to monitor another, restarting it in the event of an error.

Offensive Programming - He called it "non-defensive programming", but I like the negative name better. Offensive programming is the technique of programming only for the successful case, and letting any error take down the process involved (someone will be along to pick up the pieces and restart it shortly). That would sound crazy in your typical language, but starts looking like a good idea when your principal method of organization is a completely isolated process.

The FFI

Aside from historical notes and tutorials, I've been looking at how I'd go about interfacing Erlang to other languages. The standard seems to be doing it the same way you'd interface different Erlang processes. Except that where Erlang nodes already know how to talk to each other, the protocol needs to be implemented manually for other languages. It works consistently whether you're talking to Python, Ruby, Common Lisp, Java or C[6]. All the languages I've taken a look at so far come with an established protocol to talk to Erlang in some way.

Here's a practical example that I'll actually end up refining for deployment later; a C-based interface to some very specific ImageMagick routines.

First, the Erlang communication functions[7]

/* erl_comm.c */

#include <unistd.h>

typedef unsigned char byte;

int read_cmd(byte *buff);
int write_cmd(byte *buff, int len);
int read_exact(byte *buff, int len);
int write_exact(byte *buff, int len);

int read_cmd(byte *buff) {
  int len;
  if (read_exact(buff, 2) != 2) {
    return(-1);
  }
  len = (buff[0] << 8) | buff[1];
  return read_exact(buff, len);
}

int write_cmd(byte *buff, int len) {
  byte li;
  li = (len >> 8) & 0xff;
  write_exact(&li, 1);
  
  li = len & 0xff;
  write_exact(&li, 1);

  return write_exact(buff, len);
}

int read_exact(byte *buff, int len){
  int i, got=0;
  do {
    if ((i = read(0, buff+got, len-got)) <= 0) {
      return(i);
    }
    got +=i;
  } while (got<len);
  buff[len] = '\0';
  return(len);
}

int write_exact(byte *buff, int len) {
  int i, wrote = 0;
  do {
    if ((i = write(1, buff+wrote, len-wrote)) <= 0) {
      return(i);
    }
    wrote += i;
  } while (wrote<len);
  return(len);
}

Next, the "driver". This is the bit that will actually end up being spawned and fed input by the Erlang process

/* driver.c */
#include <limits.h>
#include <libgen.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>

typedef unsigned char byte;

int read_cmd(byte *buff);
int write_cmd(byte *buff, int len);

char *chop_path(char *orig) {
  char buf[PATH_MAX + 1];
  char *res, *dname, *thumb;

  res = realpath(orig, buf);
  if (res) {
    dname = dirname(res);
    thumb = strcat(dname, "/thumbnail.png");
    return thumb;
  }
  return 0;
}

int main(){
  int result, i, len;
  byte buff[255];
  char *thumb;

  while (read_cmd(buff) > 0) {
    thumb = chop_path(buff);
    result = thumbnail(buff, thumb);
    
    buff[0] = result;
    write_cmd(buff, 1);
  }
}

Then the actual function I'll be wanting to call[8]

/* wand.c */
#include <stdio.h>
#include <stdlib.h>
#include <wand/MagickWand.h>

#define ThrowWandException(wand, ret) \
{ \
  char \
    *description; \
 \
  ExceptionType \
    severity; \
 \
  description=MagickGetException(wand,&severity); \
  (void) fprintf(stderr,"%s %s %lu %s\n",GetMagickModule(),description); \
  description=(char *) MagickRelinquishMemory(description); \
  wand=DestroyMagickWand(wand); \
  MagickWandTerminus(); \
  return ret; \
}

int thumbnail (char *image_name, char *thumbnail_name){

  MagickWand *magick_wand;
  MagickBooleanType status;

  /* Read an image. */
  MagickWandGenesis();
  magick_wand=NewMagickWand();
  status=MagickReadImage(magick_wand, image_name);
  if (status == MagickFalse) ThrowWandException(magick_wand, 1);
  
  /* Turn the images into a thumbnail sequence. */
  MagickResetIterator(magick_wand);
  while (MagickNextImage(magick_wand) != MagickFalse)
    MagickResizeImage(magick_wand,106,80,LanczosFilter,1.0);

  /* Write the image then destroy it. */
  status=MagickWriteImages(magick_wand, thumbnail_name, MagickTrue);
  if (status == MagickFalse) ThrowWandException(magick_wand, 2);
  magick_wand=DestroyMagickWand(magick_wand);
  MagickWandTerminus();
  
  return 0;
}

and, finally, the actual calling Erlang module.

%% wand.erl
-module(wand).
-export([start/0, stop/0, restart/0]).
-export([thumbnail/1]).

start() ->
    spawn(fun() ->
                  register(wand, self()),
                  process_flag(trap_exit, true),
                  Port = open_port({spawn, "./wand"}, [{packet, 2}]),
                  loop(Port)
          end).

stop() -> wand ! stop.

restart() -> stop(), start().

thumbnail(Filename) ->
    call_port(Filename).

call_port(Msg) ->
    wand ! {call, self(), Msg},
    receive
        {wand, Result} ->
            Result
    end.

loop(Port) ->
    receive
        {call, Caller, Msg} ->
            Port ! {self(), {command, Msg}},
            receive
                {Port, {data, Data}} ->
                    Caller ! {wand, decode(Data)}    
            end,
            loop(Port);
        stop ->
            Port ! {self(), close},
            receive
                {Port, closed} ->
                    exit(normal)
            end;
        {'EXIT', Port, Reason} ->
            exit({port_terminated, Reason})
    end.

decode([0]) -> {ok, 0};
decode([1]) -> {error, could_not_read};
decode([2]) -> {error, could_not_write}.

Once all that is done, and compiled using gcc -o wand `pkg-config --cflags --libs MagickWand` wand.c erl_comm.c driver.c, I can call it from an Erlang process as if it were a native thumbnail generator.

Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:4:4] [async-threads:0] [kernel-poll:false]
  Eshell V5.9.1  (abort with ^G)
1> c(wand).
{ok,wand}
2> wand:start().
<0.39.0>
3> wand:thumbnail("original.png").
{ok, 0}
4> wand:thumbnail("/home/inaimathi/Pictures/and-another.png").
{ok, 0}

You'll have to take my word for it, but those both generate the appropriate "thumbnail.png" file in the same directory as the specified images.

All of that looks pretty complicated, but it really isn't when you sit down and read it. If I had to break it out by time involved, it would look something like

  • 10% reading up on ImageMagick C interface
  • 5% reading up on Erlang C FFI (emphasis on the ports)
  • 5% writing it
  • 80% trying to figure out why the C component was segfaulting when assembling a thumbnail path (short answer: I didn't include everything I needed to)

As a parting note, having gone through the rat's nest that is pathname manipulation in C, I hereby promise to never again bitch about Lisp's pathname handling. Nothing like wading waist-deep in horse shit to remind you how good you've got it merely living within earshot of the stables.


Footnotes

1 - [back] - Though I'll admit, the Erlang section is pretty sparse compared to the rest of them

2 - [back] - That he'd put in if he had to do it again

3 - [back] - For the record, I'm trying really hard not to put on my Lisp hat and say something like "Mmmm, mmmm, this syntactic abstraction is fucking delicious! How's it working for you guys? Oh, you haven't had any?! That's a shame..." in an obnoxiously smug voice. It's difficult, and this footnote may count as a failure. Sorry.

4 - [back] - In fact the entirety of my related experience is the appropriate chapter from 7 Languages..., flipping through the Reasoned Schemer and the SICP lectures wherein prolog is briefly implemented on top of Lisp. Thant link is to the playlist rather than the correct episode; it's been a while, and I no longer remember which it was specifically.

5 - [back] - This was reportedly on a 2.4gHz Celeron machine with a half-gig of ram, so that was not a consequence of awesome hardware

6 - [back] - Actually, that's half true. There are three different ways to interface with a C program; you can do port-based communication with a custom protocol, you can call C natively at the risk of system collapse with errors, or you can implement the Erlang protocol and pretend to be an Erlang process for the purposes of interoperability. The other languages I've taken a look at do that last one, but you've got options if you're rolling your own

7 - [back] - Ripped bleeding from Programming Erlang. It which won't change at all, regardless of what specific protocol I end up picking

8 - [back] - With thanks to the reference implementation from the ImageMagick team

Wednesday, December 14, 2011

Teensy Mode

Ok, this isn't one of those bullshit times where I just say "I'm going to bed" and then end up spending three hours explaining various shit at the expense of wakefulness the following day. Seriously, just one thing, I added a very quick, simple teensy-mode module to my emacs-utils project over at github. It turns out that I can't focus to save my life, but it has been damn enjoyable jumping into code so thoroughly after the set of weeks I've just been through.

The relevant bits are really just

(defun teensy-compile-current-file ()
  (interactive)
  (shell-command (format "%s %s" teensy-compile-file-command buffer-file-name)))

(defun teensy-compile-project ()
  (interactive)
  (shell-command teensy-compile-project-command))

(defun teensy-load-hex-file ()
  (interactive)
  (let ((hex-files (directory-files (file-name-directory buffer-file-name) nil "hex$"))
        (command (format "%s -mmcu=%s -wv " teensy-loader-program teensy-processor-type)))
    (cond ((not hex-files) (message "No hex file found"))
          ((= 1 (length hex-files)) (shell-command (concat command (car hex-files))))
          (t (shell-command (concat command (completing-read "Hex File: " hex-files)))))))

(defun teensy-compile-project-load-file ()
  (interactive)
  (progn (teensy-compile-project)
         (teensy-load-hex-file)))

the rest of it is various Emacs boilerplate like key and customization declarations. The one interesting thing I've learned from this experience is that the :options keyword does pretty much jack shit. In a hook customization, you can presumably at least see the list, but it still doesn't constrain your choices to what's there. It might be nice to have the option, since the customization I had in mind was the chip-type of your Teensy board (a required option that needs to be correct or it'll hang your chip, so it's sort of critical to get it correct).

I'm also flaking out on actual project compilation, opting to keep the make command involved, which has some seriaah fuck this, I'm going to sleep.

Tuesday, December 13, 2011

Teensy Fucking Passwords

I've been trying to get to as many Toronto Lisp Users' Group meetings as I can since missing one a couple months ago, though I'm not entirely sure it's good for me. The minutes of conversation/interesting thing ratio is approaching zero, to the point that it might be more accurate to start tracking interesting things/minute, which is excellent except that I have this day job thing which keeps me from spending every waking moment on interesting fiction, crazy languages(closed source warning) and crazier architectures. That's without even accounting for the fact that my wife and I have been beating down the Thalmor on a regular basis.

The last link is actually what got me off my ass and seriously interested in coding again in my spare time[1]. The thing is, as soon as I started looking at it, I had to admit that I have no fucking idea how to go about serious embedded programming, let alone how to do it with forth[2].

I also mentioned a little while ago that I tried poking around at the Teensy and found it to actually work on my machine[3]. The other thing I like is that Teensy lets me use Emacs for development, though there isn't a nicely integrated mode for compiling and loading projects/files onto the chip without jumping into an eshell buffer. I might want to sink some time into that...

No! Focus! No new shit right now!

Anyway, poking around is all very well, but I find that I really need a project of some sort to work on otherwise I don't internalize the concepts effectively. I do still eventually want to build myself a chordite (or something similar; relevant bit at the 2:10 mark), but that looks like it'll take a lot more industrial design and wiring than I'm in the mood for. At about the same time I was thinking about all of this, I stumbled upon a writeup about a hardware keylogger and an article entitled fuck passwords.

The second one is all the correct complaints about passwords and authentication that OpenID was supposed to almost solve, and that SSH keys do solve for those of us lucky enough to be using SSH. The problems remaining are

  • Not every system uses OpenID (and even if they did, you still need to authenticate to your OpenID provider, most of whom use passwords :p)
  • The ones that don't use OpenID are the ones you want to be most secure (banks and such), and their password constraints make them the hardest to remember
  • The most secure thing to do is to come up with a separate password for every service and rotate them regularly, but this is hard to memorize

On top of that, there are a few things[4] that can't be solved by OpenID or RSA keys. Just keep that problem in mind.

The other link is called "PHUKD". And a quick glance at that spec page will tell you exactly why. It's a pranking/surveillance tool whose accompanying paper is entitled Plug and Prey: Malicious USB Devices. The way that it works ostensibly this. First, you surreptitiously connect the device to your targets' computer. Then, depending on what you want, you either set the thing to record keyboard/mouse input for later collection, or have it emulate a mouse/keyboard to run random malicious programs[5]. Like I said, the phukd library makes it a fairly explicit goal to screw with people's computers in this way with a separate CommandAtRunBar for each of Windows, GNOME and OS X. That's just ... mean.

Putting it all together, and as usual, I'm almost 100% sure that I'm not even remotely the first person to think this up, but why not just put your passwords onto a hardware key?

It'll be something like the actual teensy keyboard implementation, but bind each button to a sequence of N characters that get typed out whenever it's pushed. Here's what I came up with after a bit of fiddling (pardon the giant code block, C isn't quite as easy for me to express ideas in).

#include <avr/io.h>
#include <avr/pgmspace.h>
#include <avr/interrupt.h>
#include <util/delay.h>
#include <string.h>
#include "usb_keyboard.h"

#define LED_CONFIG      (DDRD |= (1<<6))
#define LED_ON          (PORTD &= ~(1<<6))
#define LED_OFF         (PORTD |= (1<<6))
#define CPU_PRESCALE(n) (CLKPR = 0x80, CLKPR = (n))

uint16_t idle_count=0;

int key_from_char(char c){
  if (c >= 'a' && c <= 'z') return c - 93;
  if (c >= 'A' && c <= 'Z') return c - 61;
  if (c >= '1' && c <= '9') return c - 19;
  if (c == '0') return 39; // 0 is low in ASCII, high in the USB key definition
 
  // there's no easy mapping between the other ASCII characters and 
  // USB keyboard keys, so these are all 
  switch (c) {
  case 32: return KEY_SPACE; break;
  case 33: return KEY_1; break;
  case 34: return KEY_QUOTE; break;
  case 35: return KEY_NUMBER; break;
  case 36: return KEY_4; break;
  case 37: return KEY_5; break;
  case 38: return KEY_7; break;
  case 39: return KEY_QUOTE; break;
  case 40: return KEY_9; break;
  case 41: return KEY_0; break;
  case 42: return KEYPAD_ASTERIX; break;
  case 43: return KEYPAD_PLUS; break;
  case 44: return KEY_COMMA; break;
  case 45: return KEYPAD_MINUS; break;
  case 46: return KEY_PERIOD; break;
  case 47: return KEYPAD_SLASH; break;

  case 58: return KEY_SEMICOLON; break;
  case 59: return KEY_SEMICOLON; break;
  case 60: return KEY_COMMA; break;
  case 61: return KEY_EQUAL; break;
  case 62: return KEY_PERIOD; break;
  case 63: return KEY_SLASH; break;
  case 64: return KEY_2; break;

  case 91: return KEY_LEFT_BRACE; break;
  case 92: return KEY_BACKSLASH; break;
  case 93: return KEY_RIGHT_BRACE; break;
  case 94: return KEY_6; break;
  case 95: return KEY_MINUS; break;
  case 96: return KEY_TILDE; break;

  case 123: return KEY_LEFT_BRACE; break;
  case 124: return KEY_BACKSLASH; break;
  case 125: return KEY_RIGHT_BRACE; break;
  case 126: return KEY_TILDE; break;

  default: return 0;
  }
}

int modifier_from_char(char c){
  if ((c >= 'A' && c <= 'Z') ||
      c == 33 || c == 34 ||
      (c >= 36 && c <= 38) ||
      c == 40 || c == 41 ||
      c == 58 || c == 60 ||
      (c >= 62 && c <= 64) ||
      c == 94 || c == 95 ||
      (c >= 123 && c <= 126)) return KEY_SHIFT;
  
  return 0;
}

int8_t usb_keyboard_print(char *s){
  int s_len = strlen(s);
  int i;

  for(i = 0; i < s_len; i++){
    usb_keyboard_press(key_from_char(s[i]), modifier_from_char(s[i]));
  }
}

int main(void) {
  uint8_t b, mask, i;
  uint8_t b_previous=0xFF;
  CPU_PRESCALE(0); // set for 16 MHz clock

  DDRB = 0x00;
  PORTB = 0xFF;
 
  usb_init();
  while (!usb_configured()) /* wait */ ;
  _delay_ms(1000);

  while (1) {
    b = PINB;
    mask = 1;

    for (i=0; i<8; i++) {
      if (((b & mask) == 0) && (b_previous & mask) != 0) {
        usb_keyboard_print("abcdABCD1234!@#$%^&*()_+|~{}:\">?<-=\\`[];',./");
      }
      mask = mask << 1;
    }

    b_previous = b;
    _delay_ms(2);
  }
}

It's actually very close to the example code that runs the keyboard, except that I've modified it to take ASCII strings as input and pretend to type them out. I didn't complicate things (yet) by trying to figure out how to hook up an LED or micro SD card. It just keeps passwords in as part of the program source, and outputs one password per button.

I'm going to develop this over the next little while, at least while my interest is focused enough on it, into a unit that stores a bunch of passwords, lets me select which one I want (indexed by what the password belongs to), and output it. Ideally, I'd also get this down to a package small enough that I could just put on a physical keychain. That's a pretty good learning project for the short term.

Anyway, full circle with the fucking of passwords, when I'm done, this will be a little machine that I can plug into any USB-capable computer, push a button or two, and have it output (with much better recall and more accuracy than I could) a password that can be as massive and unique as I damn well please. It's not computer-specific (so I don't need to worry about syncing it, except for backup purposes), and it works anywhere I'd want to plug in an actual keyboard. That should kill the security problems dead. Of course, I'll need to keep some sort of safeguard in case it stops working or falls into the wrong hands, but at least I won't be forced to pick between convenience and security anymore.

It seems like a pretty obvious, elegant solution to the problem, conceptually. I can see why it hasn't caught on yet though; if you want to use something like this, you basically need to build it yourself. If I had to buy one of these dongles rather than build one, there's no chance in hell that I'd do it. I'd have to either really, truly, trust the provider enough to believe that they'd never snoop on my passwords or log unrelated information, OR I'd need to get a blank key and the source code and then put them together myself. There are just too many fragile points to do it any other way.

And that's about it. Oh, actually, I did toss this code up to codereview, in case someone out there is actually a C coder and wants to tear me a new one about something. I'm still chewing through clomments issues, and I've actually done some real work on cl-chan which I'll hopefully be kicking up soon; just as soon as I decide how to treat the image manipulation pieces.


Footnotes

1 - [back] - If you're also interested, but don't want to drop the $500 on a full, two-chip eval board, you can also get a breadboard kit for about $60 total, though you have to buy the individual chip, power supply and breakout board.

2 - [back] - My only experience with it was writing a few toy programs in gforth, which I get the feeling won't be very useful.

3 - [back] - In marked contrast to the more popular Arduino.

4 - [back] - Like the actual, physical computer I use everyday.

5 - [back] - Or just commandeer the browser to take your target to goatse