Monday, May 4, 2009

A Lambda Calculus Interpreter in Haskell

I will present a Lambda Calculus interpreter in this post. It is an (almost) exact translation of the ML code in chapter 9 of ML for the Working Programmer by Lawrence C. Paulson. The program incorporates a parser to convert a string into an internal representation of a lambda expression, a pretty printer to convert that internal representation back into a formatted, human-readable string, and two evaluation functions (lazy and strict, respectively) that reduce a given lambda expression to its normal form.

Enjoy.

Code below:
------------

module Lci (
            Lambda
           ,reader           -- String -> Lambda
           ,pr               -- Lambda -> String
           ,byValue          -- Lambda -> Lambda
           ,byName           -- Lambda -> Lambda
           ,evalV            -- String -> Lambda
           ,evalN            -- String -> Lambda
           ,tryV             -- String -> IO ()
           ,tryN             -- String -> IO ()
           ) where

-- Parser
import Text.ParserCombinators.Parsec
import qualified Text.ParserCombinators.Parsec.Token as T
import Text.ParserCombinators.Parsec.Language (emptyDef)

-- Pretty Printing
import qualified Text.PrettyPrint as PP
import Text.PrettyPrint (Doc, (<>), (<+>))

-- Environment
import qualified Data.Map as Map
import Data.Maybe


-------------------
-- Data
-------------------

data Lambda = Free String
            | Bound Int
            | Abs String Lambda
            | App Lambda Lambda
              deriving (Eq, Show, Read)

type Env = Map.Map String Lambda
 

-------------------
-- Parser
-------------------

abstract :: Int -> String -> Lambda -> Lambda
abstract i s (Free s') | s == s'   = Bound i
                       | otherwise = Free s'
abstract _ _ (Bound i) = Bound i
abstract i s (Abs s' t) = Abs s' (abstract (i+1) s t)
abstract i s (App t1 t2) = App (abstract i s t1) (abstract i s t2)

-- Abstraction over several free variables
absList :: [String] -> Lambda -> Lambda
absList xs t = foldr (\x acc -> Abs x (abstract 0 x acc)) t xs
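-- (illustration) absList ["x","y"] (Free "x") == Abs "x" (Abs "y" (Bound 1))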

-- Application of t to several terms               
appList :: Lambda -> [Lambda] -> Lambda
appList = foldl (\acc x -> App acc x)

-- Shift a term's non-local indices by i
shift :: Int -> Int -> Lambda -> Lambda
shift 0 _ u = u
shift i d u@(Free s) = u
shift i d u@(Bound j) = if j >= d then Bound (j+i) else Bound j
shift i d (Abs s t) = Abs s (shift i (d+1) t)
shift i d (App t1 t2) = App (shift i d t1) (shift i d t2)

-- Substitution for bound variable

subst :: Int -> Lambda -> Lambda -> Lambda
subst _ _ t@(Free _) = t
subst i u t@(Bound i') | i' < i    = Bound i'      -- locally bound
                       | i' == i   = shift i 0 u
                       | otherwise = Bound (i' - 1) -- non-local to t
subst i u (Abs s t') = Abs s (subst (i+1) u t')
subst i u (App t1 t2) = App (subst i u t1) (subst i u t2)
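-- (illustration) subst 0 (Free "y") (App (Bound 0) (Bound 1))
--                  == App (Free "y") (Bound 0)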

-- Lexer & Parser
lexer :: T.TokenParser ()
lexer = T.makeTokenParser emptyDef

whiteSpace= T.whiteSpace lexer
lexeme    = T.lexeme lexer
symbol    = T.symbol lexer
parens    = T.parens lexer
identifier= T.identifier lexer

p_term :: Parser Lambda
p_term = do{ symbol "^"
           ; ids <- many1 identifier
           ; symbol "."
           ; t <- p_term
           ; return $ absList ids t
           }
     <|> do{ t <- p_atom
           ; ts <- many p_atom
           ; return $ appList t ts
           }

p_atom :: Parser Lambda
p_atom = do{ s <- identifier
           ; return $ Free s
           }
     <|> parens p_term

p_top :: Parser Lambda
p_top = do{ whiteSpace
          ; t <- p_term
          ; eof
          ; return t
          }

reader :: String -> Lambda
reader input = case (parse p_top "" input) of
                  Left err -> error $ show err
                  Right x  -> x


-------------------
-- Pretty Printing
-------------------

-- Free variables in a term
vars :: Lambda -> [String]
vars (Free s) = [s]
vars (Bound _) = []
vars (Abs s t) = vars t
vars (App t u) = vars t ++ vars u

-- Rename variable "a" to avoid clashes
rename bs a = if a `elem` bs then rename bs (a ++ "'") else a

-- Remove leading lambdas, return bound variable names
stripAbs t = strip [] t
    where strip bs (Abs s t) =
              let b = rename (vars t) s
              in strip (b:bs) (subst 0 (Free b) t)
          strip bs u = (reverse bs, u)

pr_term :: Lambda -> Doc
pr_term (Free s) = PP.text s
pr_term (Bound i) = PP.text "??UNMATCHED INDEX??"
pr_term t@(Abs _ _) = 
    let (b:bs, u)     = stripAbs t
        spaceJoin b z = " " ++ b ++ z
        binder        = "^" ++ b ++ (foldr spaceJoin ". " bs)
    in PP.sep [PP.text binder, pr_term u]
pr_term t = PP.sep (pr_app t)
            
pr_app (App t u) = pr_app t ++ [PP.nest 1 (pr_atom u)]
pr_app t         = [pr_atom t]
pr_atom (Free s) = PP.text s
pr_atom t        = PP.nest 1 $ PP.parens $ pr_term t

pr :: Lambda -> String
pr = PP.render . pr_term


-------------------
-- Reduction
-------------------

-- Call-by-value: reduce to weak normal form (no reduction under lambdas)
eval :: Lambda -> Lambda
eval (App t1 t2) =
    case eval t1 of
      (Abs a u) -> eval (subst 0 (eval t2) u)
      t         -> App t (eval t2)
eval t = t

byValue :: Lambda -> Lambda
byValue t = bodies (eval t)
    where bodies (Abs a t) = Abs a (byValue t)
          bodies (App t u) = App (bodies t) (bodies u)
          bodies t         = t

-- Head normal form: keep reducing the head, leaving arguments untouched
headNF :: Lambda -> Lambda
headNF (Abs a t) = Abs a (headNF t)
headNF (App t u) = 
    case headNF t of
      (Abs a t') -> headNF (subst 0 u t')
      u'         -> App u' u
headNF t = t

byName :: Lambda -> Lambda
byName = args . headNF
    where args (Abs a t) = Abs a (args t)
          args (App t u) = App (args t) (byName u)
          args t         = t


-----------------------------------------
-- Standard Environment (Lambda Prelude)
-----------------------------------------

-- Substitution for free variables from environment
inst :: Env -> Lambda -> Lambda
inst env t@(Free s)  = fromMaybe t (Map.lookup s env)
inst env t@(Bound i) = t
inst env (Abs s t')  = Abs s (inst env t')
inst env (App t1 t2) = App (inst env t1) (inst env t2)

insertEnv :: Env -> (String, String) -> Env
insertEnv env (a, s) = Map.insert a (inst env (reader s)) env
-- Definitions are instantiated against the environment built so far, so
-- later entries (fst, snd) may refer to earlier ones (true, false).

stdEnv :: Env
stdEnv = foldl insertEnv Map.empty [
          -- booleans
          ("true",  "^x y. x"), ("false", "^x y. y")
         ,("if",    "^p x y. p x y")
          -- pairs
         ,("pair",  "^x y f. f x y")
         ,("fst",   "^p. p true"), ("snd", "^p. p false")
         ]


-------------------
-- Evaluators
-------------------

evalWith :: (Lambda -> Lambda) -> Env -> String -> Lambda
evalWith fn env = fn . inst env . reader
              
evalV = evalWith byValue stdEnv
evalN = evalWith byName  stdEnv

tryV = putStrLn . pr . evalV
tryN = putStrLn . pr . evalN
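
A quick sanity check, loading the module into GHCi or Hugs (the output shown is what the reductions should produce; the fst example works because the environment entries are instantiated in order):

Lci> tryN "if true x y"
x
Lci> tryN "fst (pair a b)"
a
Lci> tryN "(^f x. f (f x)) (^f x. f (f x)) g y"
g (g (g (g y)))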

Friday, February 6, 2009

Delayed-Branch-Slot Optimization and the Result

Background

An Instruction Set Simulator (ISS) is a program that takes a stream of target instructions and simulates the effects of executing that code stream on a host machine. Usually the target and the host are of different architectures, but nothing prevents you from running, say, an x86 simulator on a PC. By this definition, an ISS is nothing but an ordinary interpreter, and since it interprets machine code instead of a high-level language, it is free from lexical analysis, parsing, byte-code compilation, garbage collection, and all the other complications one normally finds in modern high-level language interpreters. It would seem, then, that an ISS is a trivial (if labor-intensive) thing to build.

Well, not quite so…

The tricky thing is that you want your simulator to run fast, really fast. A naive implementation with a central "read-evaluate loop" can hardly exceed a speed of 10 million target instructions per second (MIPS). While 10 MIPS may be acceptable for running standalone, user-mode applications, it is hardly enough for running an OS, which easily takes billions of instructions just to boot up.

Threaded code is an elegant control structure that strikes a sweet spot between implementation complexity and run-time efficiency. A threaded-code ISS is capable of running system-level code at about 100 MIPS with full accuracy at the instruction level.

This is the technique we adopted in developing an ISS for a RISC processor. While happy with the end performance, we wanted to see if we could squeeze more runtime efficiency out of it. One thing that left me less than satisfied with our implementation was the way we coped with the branch delay slot.

Conceptually, a processor executes instructions sequentially, one after another, until it reaches a branch or jump instruction, at which point it transfers control to an effective address other than the next sequential one. However, in some RISC machines, to make better use of the pipeline, the branch is delayed until after the execution of the instruction immediately following it. The position following the branch instruction is called the branch delay slot. This is one of the few occasions where the nature of the hardware becomes visible to programmers.

One way to simulate this behavior -- perhaps the most obvious one -- is to keep a Boolean flag, which is set by every branch/jump instruction (if the branch/jump is taken) and cleared otherwise. This flag is checked at the end of the service routine of every normal instruction: if it is set, we jump to a pre-calculated target address; if not, we simply increment the instruction pointer and move on to the next instruction.

A slightly better solution that does away with the conditional test on the Boolean flag is to maintain two instruction pointers, ip and next_ip, at the same time, so that we always jump to the address pointed to by next_ip after the execution of every instruction.
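
To make this concrete, here is a minimal sketch of the two-pointer scheme in Haskell (a toy model with hypothetical types, not our production code). The point is that the per-instruction cost is an unconditional pointer swap rather than a flag test:

type Addr = Int

data Instr = Nop            -- any non-control-flow instruction
           | Branch Addr    -- an unconditional branch with a delay slot

data CPU = CPU { ip :: Addr, nextIp :: Addr }

-- One simulation step: control always passes to nextIp, so the instruction
-- in the delay slot (at ip + 1) executes before a taken branch takes effect.
step :: (Addr -> Instr) -> CPU -> CPU
step fetch (CPU i n) = case fetch i of
                         Nop        -> CPU n (n + 1)
                         Branch tgt -> CPU n tgt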

Both solutions use an auxiliary variable to simulate the correct execution flow of the processor in the presence of the branch delay slot. In either case, a runtime overhead is added to every normal instruction, even though most instructions do not sit in a delay slot.

Can we do better?

Idea for Optimization

It did not take long before I realized that whether an instruction sits in a delay slot is a matter that can be resolved in the decoding phase; i.e., it is independent of the state of the processor. Hence, for each instruction that may possibly reside in a delay slot, we create two specialized intermediate-form instructions. The service routines of the two specialized forms share most of their interpretive code but have different dispatching semantics. An instruction that does not sit in a delay slot simply increments the instruction pointer and jumps to the service routine of the next instruction. An instruction that sits in a delay slot must in general check whether the branch was taken (for conditional branch/jump instructions) and, if so, set the instruction pointer to the target address calculated by the preceding branch/jump instruction and transfer control to the service routine of the instruction at that address. If the branch was not taken, the behavior is the same as for an instruction outside a delay slot.

In essence, this is another application of the instruction-specialization technique typically found in instruction set simulators, where an intermediate form of decoded instructions is used to avoid decoding the same stream of instructions over and over again. The difference is that in this case, instead of specializing on the bit pattern of an individual instruction group, we specialize on the spatial position of individual instructions relative to the special (branch/jump) instructions.

Implementation

Once the idea is clear, the implementation is quite straightforward.

We translate the raw instruction stream into the decoded form on a page-by-page basis. A page is the smallest unit (normally 4KB, though this may vary in modern architectures) into which the Memory Management Unit (MMU) of the processor divides the memory address space for critical OS-related operations such as address translation and protection. Doing so has the extra benefit that in-page branching, which is by far the more common case compared with off-page branching, does not need to go through the MMU -- an expensive operation that could degrade the performance of the simulation significantly.

While decoding the raw instructions of a given page, we maintain a local flag that is set when the instruction being decoded is a branch/jump instruction and cleared otherwise. This flag is checked for every instruction: if it is set, we choose the delay-slot version of the interpretive code for that instruction; if it is clear, we use the normal version. Note that this is very much the same as what we did in the pre-optimization implementation, only this time it is done in the decoding phase rather than at run time.

The only problem with this scheme is what to do with the first instruction on a page: where does the initial value of the flag come from? While a cross-page branch delay slot is uncommon, the architecture does not exclude it from happening, so we must handle it in our interpreter.

The answer is that we maintain a similar, but non-local, flag in each decoded page, whose value is determined by the last instruction on that page. The local flag then gets its initial value from the flag of the preceding page. If the preceding page has not been decoded when this page is decoded, the initial value defaults to false.
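
Reusing the toy Instr type from the sketch above, the decode-time algorithm looks roughly like this (again a sketch under hypothetical types, not the actual simulator):

data Decoded = Normal Instr   -- dispatches by incrementing the pointer
             | InSlot Instr   -- dispatches via the pending branch target

-- Decode one page: the incoming Bool is the in-slot flag inherited from the
-- preceding page (False if that page has not been decoded); the outgoing
-- Bool is stored with this page for its successor.
decodePage :: Bool -> [Instr] -> ([Decoded], Bool)
decodePage inSlot []     = ([], inSlot)
decodePage inSlot (i:is) = (d : ds, final)
  where d           = if inSlot then InSlot i else Normal i
        (ds, final) = decodePage (isBranch i) is

isBranch :: Instr -> Bool
isBranch (Branch _) = True
isBranch Nop        = False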

The correctness of this algorithm rests on the following reasoning: if the first instruction on a page is in a delay slot, then we must have gotten there by executing instructions on the immediately preceding page, which must therefore have its decoded page in memory, with the in-slot flag correctly set.

Result

Happy with this discovery (I am not aware of anyone coming up with a similar idea, though this is a very specific application domain anyway), I re-implemented the original threaded-code instruction set simulator with the new algorithm and expected something close to a 20% performance improvement.

How did I come up with this 20% figure? Simple. The intrinsic figure of merit for the performance of an instruction set simulator is the average number of host instructions executed to interpret one target instruction. This number is independent of the clock frequency of the host processor. While it certainly depends on the instruction set architecture of the host processor, we have reason to believe it does not vary significantly from architecture to architecture. I estimated that, on average, our original simulator executes 20 host instructions to interpret one target instruction, and that by specializing non-branching instructions with regard to their relative position in the code stream, I could shave off at least 3 of those host instructions on average. One does not need to be particularly good at math to compute the final result: going from 20 host instructions to 17 is roughly an 18% speedup, close to 20%. It is a very rough, but nevertheless valid, estimate.

To my surprise, the benchmark showed that the new simulator executes at 105 MIPS, against 115 MIPS for the original one. The delay-slot "optimization" I had labored over actually brought the performance down by almost ten percent!

How do I explain this seemingly contradictory result?

It could be that the time the simulator spends on decoding is not negligible compared with the time it spends on executing. In that case, the overhead the new simulator incurs in the decoding phase could offset what it gains in the execution phase. But the more likely reason, I suspect, is that the interpretive code was inadvertently restructured in the new simulator into a memory layout that makes worse use of the host machine's cache system. This is just a guess; I do not have a definite answer to this puzzle yet.

Conclusion

So what is the lesson of this story?

While not directly relevant, and while I by no means regret this small adventure in optimizing an already fast simulator, I could not help thinking of the famous remark D. Knuth made many years ago: "premature optimization is the root of all evil".

Optimization is hard, harder than most of us would expect. A simple solution is often the best solution. When optimization is necessary, we should try to optimize "in the large". For example, by replacing the simple, classic read-evaluate loop with direct-threaded code, we got an order-of-magnitude performance boost. Surely that is worth the effort for a performance-critical application. But don't waste your time on tiny piecemeal optimizations that you merely think will work. Even when such an optimization works, it may exact a penalty on something else, say, code clarity, which in an engineering sense is more important than, say, a 5% increase in runtime efficiency.

Let me end here by quoting another great mind: "keep things simple, but no simpler."


Sunday, January 4, 2009

Useful Shell Shortcuts

Here are some small, simple, but surprisingly useful shell commands that I use every day. Hopefully you may find them useful too.

Change Directory and List Content

alias cd..='cd ..'
alias cd...='cd ../..'

cl() {
    cd "$1" && ls
}

The functions of these commands are self-explanatory and may seem trivial at first sight. But since ls and cd are the most common shell commands we use every day, these shortcuts will save you a tremendous number of keystrokes in the long run.

List Sub-directories Only

alias lsd='ls -F | grep / | sed -e '\''s/\///g'\'' | column'

Say you are in a large code base (for instance, the Linux kernel) and you want to see whether a particular sub-directory exists, but you do not want to clutter the output of the ls command with all the ordinary files; then this shortcut comes in handy.

Trash instead of Delete

alias trash="mv -t ~/.Trash --backup=t" 

This command moves files into a specific location instead of deleting them permanently. It is the shell version of the trash-can utility usually found in a GUI environment. Note that on Ubuntu, you might want to replace ~/.Trash with ~/.local/share/Trash/files, the standard location for trashed objects.


Tuesday, December 16, 2008

A Simple Symbolic Differentiation Program in Haskell

This program is inspired by, again, SICP. Compared to the Scheme code (section 2.3.2 of SICP), my program has a front-end parser that converts the external representation (a String) of an expression into its internal representation (an AST). The purpose of writing this program is to get familiar with the Parsec library in particular and to gain a better understanding of monadic parsers in general.
Loading the program in Hugs, a simple session goes like this:
Main> deriv_x "x+3"
1
Main> deriv_x "x*y"
y
Main> deriv_x "(x+3)*x*y"
((x)+((x)+(3)))*(y)
Main> deriv_x "(x+3)^2"
(2)*((x)+(3))
The output is unfortunately cluttered with a lot of unnecessary parentheses. This could be solved by writing a function that normalizes expressions to a canonical form, or by a printer that tracks operator precedence (a sketch of the latter is given after the program). I am sure there is a well-defined algorithm for this, but as I said above, writing a well-polished differentiation program is not really the purpose here. The complete program is listed below:
-- Simple symbolic differentiation program
module Main where

import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
import qualified Text.ParserCombinators.Parsec.Token as T
import Text.ParserCombinators.Parsec.Language

-- Symbolic Expression

type Symbol = Char

data Expr = Add Expr Expr
          | Mul Expr Expr
          | Exp Expr Integer
          | Var Symbol
          | Num Integer
            deriving (Eq)

instance Show Expr where
    show e = case e' of
               Num x   -> show x
               Var x   -> [x]
               Add u v -> "(" ++ show u ++ ")" ++ "+" ++ "(" ++ show v ++ ")"
               Mul u v -> "(" ++ show u ++ ")" ++ "*" ++ "(" ++ show v ++ ")"
               Exp u n -> "(" ++ show u ++ ")" ++ "^" ++ show n
        where e' = simplify e

-- Deriving

deriv_ :: Symbol -> Expr -> Expr
deriv_ x e = simplify $ deriv__ x e
    where deriv__ _ (Num _) = Num 0
          deriv__ x (Var s) | s == x    = Num 1
                            | otherwise = Num 0
          deriv__ x (Add u v) = Add (deriv__ x u) (deriv__ x v)
          deriv__ x (Mul u v) = Add (Mul (deriv__ x u) v) (Mul u (deriv__ x v))
          deriv__ x (Exp u n) = Mul (Mul (Num n) (Exp u (n-1))) (deriv__ x u)

-- Expression Simplifier

simplify :: Expr -> Expr
simplify e = let e' = simplify' e
             in if e == e' then e else simplify e'
    where
      simplify' e@(Num n) = e
      simplify' e@(Var x) = e
      simplify' (Add (Num 0) u) = u
      simplify' (Add u (Num 0)) = u
      simplify' (Add (Num n) (Num m)) = Num (n + m)
      simplify' (Add u v) = Add (simplify' u) (simplify' v)
      simplify' (Mul (Num 0) v) = Num 0
      simplify' (Mul u (Num 0)) = Num 0
      simplify' (Mul (Num 1) v) = v
      simplify' (Mul u (Num 1)) = u
      simplify' (Mul (Num n) (Num m)) = Num (n * m)
      simplify' (Mul u v) = Mul (simplify' u) (simplify' v)
      simplify' (Exp u 0) = Num 1
      simplify' (Exp u 1) = simplify u
      simplify' (Exp (Num m) n) = Num (m ^ n)
      simplify' (Exp u n) = Exp (simplify u) n
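-- (illustration) simplify (Add (Num 0) (Mul (Num 1) (Var 'x'))) == Var 'x'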

-- Parser

lang = T.makeTokenParser emptyDef

natural = T.natural lang
operator c = T.lexeme lang (char c)
variable = T.lexeme lang lower


expr = buildExpressionParser table factor
   <?> "expression"

mkNode :: (Expr -> Expr -> Expr) -> Expr -> Expr -> Expr
mkNode op t1 t2 = op t1 t2

mkAdd = mkNode Add
mkMul = mkNode Mul

mkExp :: Expr -> Expr -> Expr
mkExp e (Num n) = Exp e n
mkExp e _ = error "exponent must be a number"

table = [[op '^' mkExp AssocRight]
        ,[op '*' mkMul AssocLeft]
        ,[op '+' mkAdd AssocLeft]
        ]
    where
      op c f assoc = Infix (do{ operator c; return f } <?> "operator") assoc

factor = T.parens lang expr
     <|> do{ v <- natural;  return $ Num v }
     <|> do{ v <- variable; return $ Var v }
     <?> "factor"

-- Driver

parseExpr :: String -> Expr
parseExpr input = case parse expr "" input of
                    Left err  -> error $ "parse error at " ++ show err
                    Right out -> out

deriv_x = deriv_ 'x' . parseExpr
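
As an aside, here is a minimal sketch of the precedence-aware printing mentioned above (my addition, not part of the original program; showPrec and paren are hypothetical helpers). A subexpression is parenthesized only when its operator binds more loosely than its context:

showPrec :: Int -> Expr -> String
showPrec _ (Num n)   = show n
showPrec _ (Var x)   = [x]
showPrec p (Add u v) = paren (p > 6) $ showPrec 6 u ++ "+" ++ showPrec 7 v
showPrec p (Mul u v) = paren (p > 7) $ showPrec 7 u ++ "*" ++ showPrec 8 v
showPrec p (Exp u n) = paren (p > 8) $ showPrec 9 u ++ "^" ++ show n

paren :: Bool -> String -> String
paren True  s = "(" ++ s ++ ")"
paren False s = s

With show e defined as showPrec 0 (simplify e), the third session example would print (x+(x+3))*y instead of ((x)+((x)+(3)))*(y).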

Thursday, December 11, 2008

"Game of Life" in Haskell

This post includes a Haskell implementation of Conway's Game of Life. There is nothing particularly fancy here, and the implementation is not the most efficient one either, but hey, this is my first interactive Haskell program (and it was not hard at all).


type Size = Int
type Cell = (Int, Int)

type Board = [Cell]
data Life = Life Size Board

instance Show Life where
    show (Life size board) =
        [if c == size then '\n'
         else if (r, c) `elem` board then '@'
         else '-'
            | r <- [0..size-1], c <- [0..size]]

next_life :: Life -> Life
next_life (Life size board) = Life size new_board
    where
      new_board = survivors ++ births
      survivors = [cell | cell <- board,
                          let n = living_neighbors cell,
                          n == 2 || n == 3]
      births    = [(r, c) | r <- [0..size-1], c <- [0..size-1],
                            let cell = (r, c),
                            not (cell `elem` board),
                            living_neighbors cell == 3]
      living_neighbors cell = length $ filter is_living $ neighbors cell
      is_living cell = cell `elem` board
      neighbors (r, c) = [((r - 1) `mod` size, (c - 1) `mod` size),
                          ((r - 1) `mod` size,  c),
                          ((r - 1) `mod` size, (c + 1) `mod` size),
                          ( r,                 (c - 1) `mod` size),
                          ( r,                 (c + 1) `mod` size),
                          ((r + 1) `mod` size, (c - 1) `mod` size),
                          ((r + 1) `mod` size,  c),
                          ((r + 1) `mod` size, (c + 1) `mod` size)]

interactive :: Life -> IO ()
interactive life =
    do print life
       c <- getChar
       if c == 'q'
          then return ()
          else interactive $ next_life life
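
If each generation seems to require an extra Enter, line buffering on stdin is the likely culprit. A possible wrapper (my addition; the import line goes at the top of the file):

import System.IO (BufferMode (NoBuffering), hSetBuffering, stdin)

main :: IO ()
main = do hSetBuffering stdin NoBuffering
          interactive (Life 5 [(1,3), (2,1), (2,3), (3,2), (3,3)])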




Loading the above program into Hugs, an interactive session goes like this:

Main> interactive $ Life 5 [(1,3), (2,1), (2,3), (3,2), (3,3)]
-----
---@-
-@-@-
--@@-
-----


-----
--@--
---@@
--@@-
-----


-----
---@-
----@
--@@@
-----


-----
-----
--@-@
---@@
---@-

q

Main>

Have fun!

Saturday, December 6, 2008

Huffman Tree Decoder

Section 2.3.3 of SICP contains a program that decodes messages from a pre-built Huffman tree.

The Scheme code:

(define (make-leaf symbol weight)
  (list 'leaf symbol weight))

(define (leaf? object)
  (eq? (car object) 'leaf))

(define (symbol-leaf x) (cadr x))
(define (weight-leaf x) (caddr x))

(define (make-code-tree left right)
  (list left
        right
        (append (symbols left) (symbols right))
        (+ (weight left) (weight right))))

(define (left-branch tree) (car tree))
(define (right-branch tree) (cadr tree))

(define (symbols tree)
  (if (leaf? tree)
      (list (symbol-leaf tree))
      (caddr tree)))

(define (weight tree)
  (if (leaf? tree)
      (weight-leaf tree)
      (cadddr tree)))

(define (decode bits tree)
  (define (decode-1 bits current-branch)
    (if (null? bits)
        '()
        (let ((next-branch
               (choose-branch (car bits) current-branch)))
          (if (leaf? next-branch)
              (cons (symbol-leaf next-branch)
                    (decode-1 (cdr bits) tree))
              (decode-1 (cdr bits) next-branch)))))
  (decode-1 bits tree))

(define (choose-branch bit branch)
  (cond ((= bit 0) (left-branch branch))
        ((= bit 1) (right-branch branch))
        (else (error "bad bit -- CHOOSE-BRANCH" bit))))

The Haskell code that does the same thing:

data Bit = B0 | B1
           deriving (Eq, Ord)

instance Show Bit where
    show B0 = "0"
    show B1 = "1"

type Symbol = Char
type Weight = Int

data Tree = Leaf Symbol Weight
          | Node [Symbol] Weight Tree Tree

decode :: Tree -> [Bit] -> [Symbol]
decode tree []   = []
decode tree bits = s : decode tree rest
    where (s, rest) = decode1 tree bits

decode1 :: Tree -> [Bit] -> (Symbol, [Bit])
decode1 (Node _ _ (Leaf sym _) _           ) (B0:rest) = (sym, rest)
decode1 (Node _ _ left         _           ) (B0:rest) = decode1 left rest
decode1 (Node _ _ _            (Leaf sym _)) (B1:rest) = (sym, rest)
decode1 (Node _ _ _            right       ) (B1:rest) = decode1 right rest

-- Test

sample_message :: [Bit]
sample_message = [B0, B1, B1, B0, B0, B1, B0, B1, B0, B1, B1, B1, B0]

sample_tree :: Tree
sample_tree = Node ['A', 'B', 'C', 'D'] 8
                   (Leaf 'A' 4)
                   (Node ['B', 'C', 'D'] 4
                         (Leaf 'B' 2)
                         (Node ['C', 'D'] 2
                               (Leaf 'C' 1)
                               (Leaf 'D' 1)))

-- decode sample_tree sample_message
-- "ACABBDA"
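
For completeness, a tiny driver (my addition, assuming the definitions above) that prints the decoded sample message:

main :: IO ()
main = putStrLn (decode sample_tree sample_message)   -- prints "ACABBDA"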

 

Comments:

I think everybody will agree that the Haskell code is much cleaner and more concise than the corresponding Scheme code. I attribute the cleanness and conciseness to Haskell's algebraic data types and pattern matching: where in Scheme one expresses every data structure as a list, in Haskell one defines data structures more naturally as data types; where in Scheme one fetches the fields of a data structure with ca(d*)r, in Haskell one binds the fields of a data structure to formal parameters via pattern matching.

During the course of learning Haskell, I have come to feel that Scheme is the assembly language of functional programming compared with Haskell (or ML and the like), in the sense that assembly languages are low-level compared to C or even C++ in the imperative-language world.

Saying Scheme is a low-level language by no means implies that Scheme is a bad functional language. No. I thought, and still believe, that Scheme is one of the most elegant programming languages in the world. Just as learning an assembly language helps make one a better C programmer, learning Scheme brings one closer to the underlying computation model (lambda calculus) that underpins all functional programming languages, so that one can distinguish the essentials from the non-essentials and not get lost in the complex constructs offered by a modern language like Haskell. (However, I do think typing is something essential that is totally missing in Scheme.)

All I am saying is that Scheme remains a good (perhaps the best) programming language for educational purposes, but for real-world product development, Haskell (or the ML family) is probably a better candidate.

P.S.:

Philip Wadler has a paper discussing the merits of statically typed, lazy functional programming languages (KRC/Miranda) over dynamically typed, eagerly evaluated languages (Scheme/Lisp). An enlightening read.