Sunday, May 30, 2010

The Zipper

In talking about destination passing style, I mentioned that a basic difficulty in purely functional programming is that data structures must be built bottom-up and traversed top-down. Huet's zipper is a functional data structure that allows more flexible navigation and functional alteration.

Zippers are best illustrated on a data structure more complicated than lists, so we'll look at binary trees with Nat-labeled leaves:

Tree ::= Leaf Nat | Branch Tree Tree

We want to be able to navigate up and down within a tree, while functionally updating it, and without starting from the top every time. First, imagine we have a cursor somewhere in the tree, from which we can navigate in any direction, or functionally update the tree under the cursor. We can represent this situation by separating the tree into two parts: a tree context (a tree with a hole), representing the tree above the cursor, and a tree, representing the tree under the cursor.

TreeCtx ::= Hole | Left TreeCtx Tree | Right Tree TreeCtx
Cursor = TreeCtx * Tree


It should be clear that given a cursor, we can "plug the hole" to get the full tree back, forgetting where we were focusing within it. A cursor gives us a way to track our location within a tree, but navigating upwards into the tree is still expensive, because we must navigate the context from top down.

The next trick will be familiar to anyone who has seen frame stacks and the CK machine. We can define

TreeFrame ::= LFrame Tree | RFrame Tree
FrameStack = List(TreeFrame)


and observe that the types TreeCtx and FrameStack are isomorphic. Now that we've recognized contexts as a list of frames, though, we can further recognize that we can keep this list in reverse order. That way, traversing the list top down amounts to traversing the context bottom up.

Zipper = FrameStack * Tree

goUp (Nil, _) = Error "Already at top"
goUp (Cons (LFrame t) fs, u) = (fs, Branch u t)
goUp (Cons (RFrame t) fs, u) = (fs, Branch t u)

goDownLeft (fs, Leaf _) = Error "At leaf"
goDownLeft (fs, Branch t u) = (Cons (LFrame u) fs, t)

goDownRight (fs, Leaf _) = Error "At leaf"
goDownRight (fs, Branch t u) = (Cons (RFrame t) fs, u)

update (fs, _) t = (fs, t)


All of these operations take constant time; they also all implicitly involve allocation. The data structure is called the "zipper" because moving up and down the tree is analagous to moving a zipper up and down its tracks.

There is a close relationship to Deutsch-Schorr-Waite (pointer-reversing) algorithms, which can be understood as the imperative counterpart to the zipper. Finally, one of the neat things about zippers is that they can be generated in a uniform way for most data structures we use in functional programming -- and this is done through an analog to derivatives, applied at the level of types! See Conor McBride's page for the details.

No comments:

Post a Comment