Tuesday, April 5, 2011
Tuesday, October 12, 2010
ICFP roundup, day #3
I missed a lot of day 3, but here were some standouts:
- Functional Parallel Algorithms Guy Blelloch gave a fantastic invited talk on the analysis of parallel algorithms. His work -- now 15 years old -- uses a parallelized version of the lambda calculus as a fundamental model of computation. The parallelism is exactly that expected from a pure functional language: in a function application, the operator and operand are evaluated in parallel. Using this simple model, it is possible to characterize both the "work" (roughly, reduction steps in a sequential semantics) and the "span" (roughly, longest sequence of data dependencies). The span can be understood as the minimum time needed to perform a computation given an unlimited supply of parallel processors. Guy gave a big-step cost semantics for his model, and then walked through several variations of sorting algorithms, analyzing them all. Elegant and clearly-presented.
- Rethinking Supercompilation Supercompilation is, roughly, partial evaluation combined with case splitting -- that is, you attempt to symbolically execute the program at compile-time, and insert case analysis when the value of an unknown is required. Neil Mitchell's paper gives a new architecture for supercompilation that is both simple and performant. The main insight seems to be the handling of let bindings: the intermediate language is a variant of ANF, which forces a lot of design choices in a good direction. This is one of the papers we read before heading to ICFP. The talk was quite good, and helped clarify many details from the paper.
- Parametricity and Dependent Types This talk addressed a question I've been curious about for a while: what does parametricity look like in a dependently-typed setting? The talk showed that a single logical relation can be given covering all pure type systems (PTS) at once. The trick is to use the expressive power of PTS to give the logical relation as a type-to-type transformation, where the types in the range may land in a different location in the lambda cube from those in the domain. If you specialize the relation to, say, System F, it's easy to see how the resulting types correspond precisely to the definition of the relation in second-order logic. I still haven't had time to sit down and work through the details, partly because I need to learn more about PTS first. Very curious to see what the impact of this paper turns out to be.
Sunday, October 10, 2010
ICFP roundup, day #2
Here are the talks from day #2 that really stood out for me:
- The impact of higher-order state and control effects on local relational reasoning This was probably my favorite result presented at the conference, though not my favorite talk. The work studies two orthogonal language features -- higher-order state and control -- and characterizes reasoning principles valid in the presence or absence of these features. The reasoning principles, like the features themselves, are orthogonal. I believe in each case the model is fully abstract, but this is due more to the use of biorthogonality than the particulars of each feature. Still, the model seems quite insightful and I'm looking forward to reading the paper.
- Distance makes the types grow stronger Jason Reed gave a fun talk on "differential privacy" and how it can be achieved via a type system. The work is in the area of anonymity of data: how can you release enough data to perform useful statistical work without also leaking private information? A timely and surprisingly difficult problem. Differential privacy, if I understood it correctly, is a kind of metric-continuity: if you change a data set by a small amount, say by deleting a record, the information you can glean from the data set should likewise change by a small amount. The type system is meant to enforce this property.
- Fortifying macros Ryan Culpepper gave an excellent talk on his macro system, known as syntax-parse. The talk told a convincing story about both the power of macros, and the difficulty of making them robust. As is often the case, error checking and reporting code easily dwarfs and obscured the intended functionality of macros. The syntax-parse system allows you to write macros in a very direct style, but still provide robust error handling, most of which you get for free from the system. To make this work, syntactic forms are given a more explicit treatment: you can define a form like binding-list-without-repetitions, give it a user-friendly name and appropriate error handling, and then use it freely in macros. Such extra structure helps to keep macros at a high level of abstraction, and to give the system the right hooks to perform error handling on its own.
Thursday, October 7, 2010
ICFP roundup, day #1
Update: my apologies if this post appears multiple times in RSS: I had a bit of a tussle with blogger.
Here are some of the talks from ICFP day 1 that I got something out of:
Here are some of the talks from ICFP day 1 that I got something out of:
- ML: Metalanguage or Object Language? Mike Gordon's invited talk was a biographical talk about Milner. He used flashy presentation software that would've been annoying for a technical talk, but worked well for a talk with a lot of photos and scanned letters. One of my favorite anecdotes was the following: Dana Scott, having at last produced a model of the untyped lambda calculus, had become embarrassed by his previous insistence that only the typed version was worthwhile. He wrote a letter to Milner, asking him to stop working with LCF (which was based on the typed version) and instead work with the untyped calculus. Milner stuck with LCF -- writing, basically, "well, I've already got all this code, you see..."
- Functional Pearl: Every Bit Counts This was a great talk (and an even greater paper). The idea is to build coding schemes based on question-answer games, where every question adds some information ("every bit counts"). The paper presents a beautiful collection of combinators for describing such games, which yield correct-by-construction encoders and decoders. Perhaps the neatest aspect of the approach is that codes can be built such that, for example, every possible bit sequence is (a prefix of) a code for a well-typed program. The authors plan to explore applications to typed bytecodes, as in the CLR or JVM.
- Lolliproc: to Concurrency from Classical Linear Logic via Curry-Howard and Control Mazurak takes a new stab at Curry-Howard for concurrency. From a high level, the system seems beautifully simple, and the laws you gain from classical linear logic are interpreted directly as concurrency primitives. From a type theoretic perspective, the work seemed insightful, but from a pure concurrency perspective it was exploring well-trodden ground. Definitely worth keeping an eye on.
- Abstracting Abstract Machines I've seen Van Horn's talk before, but it's nice even the second time around. His work shows how to construct static analyses in a turn-the-crank fashion, using reduction semantics. The key point is to factor every component of the program state that lives in an infinite domain through a heap: each time you want to yield an inhabitant of the domain, you must allocate a new address. You can then bound the size of the heap, and the various ways of doing this yield various well-known analyses. Very cool!
- Lazy Tree Splitting Mike Rainey's talk was about work on the Manticore project. They are dealing with nested data parallelism and have taken a work-stealing approach. Mike's talk focused on a technique for splitting up work as late as possible, when one processor detects that others are idle. Since work is tree-structured, you need a way to quickly split up the tree of work and communicate parts of it to other threads. The technique appears to give robust, scalable performance in practice, which is what they were after: more eager approaches only scale well with application-specific tuning.
- Logical Types for Untyped Languages I've also seen Tobin-Hochstadt's talk before, but the conference version was a worthwhile re-run! This work takes occurrence typing in a new, very elegant direction, by internalizing propositional logic directly into the type system. There's been a lot of work using theorem provers within type checkers (for, say, gradual typing), but this type system can do basic logical reasoning without a theorem prover, and in a robust way. As usual from the PLT group, the work was directly motivated by, and tested against, a large body of real code.
Monday, October 4, 2010
ICFP, POPL
Back from ICFP, and slowly getting resettled. I'll be writing about the conference soon!
Meanwhile, I'm very happy to report that my paper with Mitch Wand, A separation logic for refining concurrent objects, has been accepted for publication in POPL 2011. You can find a draft of the paper here.
Meanwhile, I'm very happy to report that my paper with Mitch Wand, A separation logic for refining concurrent objects, has been accepted for publication in POPL 2011. You can find a draft of the paper here.
Abstract. Fine-grained concurrent data structures are crucial for gaining performance from multiprocessing, but their design is a subtle art. Recent literature has made large strides in verifying these data structures, using either atomicity refinement or separation logic with rely-guarantee reasoning. In this paper we show how the ownership discipline of separation logic can be used to enable atomicity refinement, and we develop a new rely-guarantee method that is localized to the definition of a data structure. We present the first semantics of separation logic that is sensitive to atomicity, and show how to control this sensitivity through ownership. The result is a logic that enables compositional reasoning about atomicity and interference, even for programs that use fine-grained synchronization and dynamic memory allocation.
Wednesday, September 22, 2010
The join calculus
How do you solve the dining philosopher's problem in your favorite concurrency paradigm? Can you do it in a decentralized way, so that philosophers desiring unrelated chopsticks never get in each other's way?
The join calculus, originally developed for distributed programming, offers a truly elegant solution to this classic concurrency problem. In fact, the solution requires little more than stating the problem:
This bit of pseudocode treats each chopstick as a separate channel, and also uses channels to represent requests for chopsticks (hungry). There are a collection of joins, which provide an event handler that should fire when messages are available on a given set of channels. All the relevant messages are removed from their queues in a single atomic step.
Aside from a few important details (e.g. are channels synchronous? it depends), that example provides a pretty complete picture of what the join calculus offers. It's possible to code up solutions to a number of classic problems in a similarly succinct way: reader-writer locks, barriers, condition variables, semaphores, Santa Claus...
Of course, many synchronization constructs (even the lowly lock) are capable of encoding a large variety of other constructs, though perhaps not as elegantly. The real question is whether such encodings provide the same parallel scalability that a direct solution does.
To answer this question for the join-based encodings, we need to know how joins are implemented. An implementation must provide the atomicity guarantee described above, and every implementation I'm aware of does this by using a single lock per group of joined channels. This centralized lock destroys scalability because it sequentializes use of the channels and firing of events. In the case of the dining philosophers, it means that two philosophers sitting far from one another must still contend for the same lock before picking up their independent chopsticks.
I've been working on an algorithm and implementation that allows for a much finer grain of concurrency. Stay tuned...
The join calculus, originally developed for distributed programming, offers a truly elegant solution to this classic concurrency problem. In fact, the solution requires little more than stating the problem:
channel hungry[1..n]
channel chops[1..n]
when hungry[1], chops[1], chops[2] do
eat(1); chops[1]; chops[2]; think(1); hungry[1]
...
when hungry[n], chops[n], chops[1] do
eat(n); chops[n]; chops[1]; think(n); hungry[1]
initially
hungry[1] ; ... ; hungry[n]
chops[1] ; ... ; chops[n]
This bit of pseudocode treats each chopstick as a separate channel, and also uses channels to represent requests for chopsticks (hungry). There are a collection of joins, which provide an event handler that should fire when messages are available on a given set of channels. All the relevant messages are removed from their queues in a single atomic step.
Aside from a few important details (e.g. are channels synchronous? it depends), that example provides a pretty complete picture of what the join calculus offers. It's possible to code up solutions to a number of classic problems in a similarly succinct way: reader-writer locks, barriers, condition variables, semaphores, Santa Claus...
Of course, many synchronization constructs (even the lowly lock) are capable of encoding a large variety of other constructs, though perhaps not as elegantly. The real question is whether such encodings provide the same parallel scalability that a direct solution does.
To answer this question for the join-based encodings, we need to know how joins are implemented. An implementation must provide the atomicity guarantee described above, and every implementation I'm aware of does this by using a single lock per group of joined channels. This centralized lock destroys scalability because it sequentializes use of the channels and firing of events. In the case of the dining philosophers, it means that two philosophers sitting far from one another must still contend for the same lock before picking up their independent chopsticks.
I've been working on an algorithm and implementation that allows for a much finer grain of concurrency. Stay tuned...
Tuesday, September 21, 2010
ICFP pregame: Reduceron
Some PRL members have been meeting to discuss ICFP papers prior to the conference. Since we worked hard to grok the essence of each, it seems worth posting that here.
The Reduceron Reconfigured
Matthew Naylor and Colin Runciman
Long ago (the '80s) there was a great deal of interest in building architecture geared toward functional languages (taking Backus's Turing Lecture literally). These efforts burned out, though, because industry heavyweights were able to make unsurpassable progress with traditional von Neumann machines.
This paper revisits functional architecture in the world of FPGAs. I can't say how much their architecture differs from those proposed in the '80s, but the big emphasis in the paper is on a certain kind of parallelism. This parallelism stems from the purity of the language, but not in the way you might think: it is not at the level of program expressions but at the level of instructions.
How do you execute a beta-reduction with several arguments? On a von Neumann architecture, you do it in many sequential steps. The proposal here is to provide instructions that can, in a single cycle, execute such a beta-reduction. That single instruction interacts with memory in several ways: it loads the definition of the function, it pops the arguments off the stack, and it does appropriate pointer surgery to replace formals by actuals. In parallel, and in one step.
The paper proposes a compilation strategy and instruction set, and discusses an FPGA implementation. On the benchmarks they give, the FPGA comes within an order of magnitude of GHC running on a modern Intel processor -- and the FPGA clocks 30x slower than the Intel. Given how much engineering has gone into GHC, that's pretty impressive.
A complaint: the takeaway wasn't clear. Even with their technique, the FPGA is a lot slower (in absolute terms) than what we have with GHC. So having an FPGA coprocessor, as they suggest, seems fruitless. Should we design ASICs along the lines of this paper?
The Reduceron Reconfigured
Matthew Naylor and Colin Runciman
Long ago (the '80s) there was a great deal of interest in building architecture geared toward functional languages (taking Backus's Turing Lecture literally). These efforts burned out, though, because industry heavyweights were able to make unsurpassable progress with traditional von Neumann machines.
This paper revisits functional architecture in the world of FPGAs. I can't say how much their architecture differs from those proposed in the '80s, but the big emphasis in the paper is on a certain kind of parallelism. This parallelism stems from the purity of the language, but not in the way you might think: it is not at the level of program expressions but at the level of instructions.
How do you execute a beta-reduction with several arguments? On a von Neumann architecture, you do it in many sequential steps. The proposal here is to provide instructions that can, in a single cycle, execute such a beta-reduction. That single instruction interacts with memory in several ways: it loads the definition of the function, it pops the arguments off the stack, and it does appropriate pointer surgery to replace formals by actuals. In parallel, and in one step.
The paper proposes a compilation strategy and instruction set, and discusses an FPGA implementation. On the benchmarks they give, the FPGA comes within an order of magnitude of GHC running on a modern Intel processor -- and the FPGA clocks 30x slower than the Intel. Given how much engineering has gone into GHC, that's pretty impressive.
A complaint: the takeaway wasn't clear. Even with their technique, the FPGA is a lot slower (in absolute terms) than what we have with GHC. So having an FPGA coprocessor, as they suggest, seems fruitless. Should we design ASICs along the lines of this paper?
Thursday, September 16, 2010
Shared-state versus message-passing
One of the most basic distinctions made in concurrency theory is between shared-state concurrency and message-passing concurrency. I'm finally getting around to reading John Reppy's book on Concurrent ML, and he has a succinct characterization of the two:
Shared-memory languages rely on imperative features ... for interprocess communication, and provide separate synchronization primitives. Message-passing languages provide a single unified mechanism for both synchronization and communication.
Tuesday, September 14, 2010
Hacking on, and with, Racket
Things have been a bit quiet on the research front: I've been doing a a bit of hacking and a lot of yak shaving.
Today's accomplishment: I've added a new primitive to Racket for compare-and-set. In hindsight, this is a pretty simple patch, maybe a couple-dozen lines. But getting there required some quality time with the 14Kloc jit.c file.
The upshot is that, with this patch, Racket can be used for writing non-blocking algorithms, modulo ongoing work to increase concurrency in its runtime system. I'm planning to design and build a comprehensive library for parallelism, inspired by java.util.concurrent but with somewhat different goals. I hope to have more to say about this soon.
Today's accomplishment: I've added a new primitive to Racket for compare-and-set. In hindsight, this is a pretty simple patch, maybe a couple-dozen lines. But getting there required some quality time with the 14Kloc jit.c file.
The upshot is that, with this patch, Racket can be used for writing non-blocking algorithms, modulo ongoing work to increase concurrency in its runtime system. I'm planning to design and build a comprehensive library for parallelism, inspired by java.util.concurrent but with somewhat different goals. I hope to have more to say about this soon.
Tuesday, September 7, 2010
Code → paper
I'm working on a paper where the main contribution is an algorithm, and the interesting aspect of the algorithm is not a complexity bound, or even a liveness property. Its appeal is harder to formalize: it enables fine-grained parallelism.
The evidence here is empirical, not theoretical. Consequently, when I started simplifying from real- to pseudo-code, my coauthor balked. There are at least two reasons pseudocode is a bad idea. First, it's easy to introduce mistakes. Second, it's a bit dishonest: your numbers relate to the real code, not a platonic version of it.
But there are good arguments in favor of pseudocode, too. In our case, it's relatively painless to treat the simplification I wanted to do as a refactoring of the real code -- so our library improves along the way. That's partly due to writing the code in a relatively high-level language. But in more complex, low-level cases, making the code presentable while keeping it executable could be much trickier, or even impossible.
What's a reasonable guideline for using pseudocode?
The evidence here is empirical, not theoretical. Consequently, when I started simplifying from real- to pseudo-code, my coauthor balked. There are at least two reasons pseudocode is a bad idea. First, it's easy to introduce mistakes. Second, it's a bit dishonest: your numbers relate to the real code, not a platonic version of it.
But there are good arguments in favor of pseudocode, too. In our case, it's relatively painless to treat the simplification I wanted to do as a refactoring of the real code -- so our library improves along the way. That's partly due to writing the code in a relatively high-level language. But in more complex, low-level cases, making the code presentable while keeping it executable could be much trickier, or even impossible.
What's a reasonable guideline for using pseudocode?
Subscribe to:
Posts (Atom)