Pittsburgh (CMU) 2024

Sam Estep: Polymorphism in an Efficient Differentiable Programming Language

Reverse-mode automatic differentiation (autodiff) is quite useful for performing numerical optimization with complicated objectives and constraints. Autodiff has been somewhat well-studied in the context of many language features, but that is currently obscured by more basic forms of autodiff in popular machine learning frameworks like PyTorch. Achieving the correct asymptotic complexity for reverse-mode autodiff is subtle. Recent literature has explored this formally for functional languages, but this work often omits the consideration of functions with custom derivatives, such as iterative solvers. In this talk I will first illustrate a reverse-mode autodiff scheme that achieves the right asymptotics on an expressive source language, after which I will lay out some language design challenges and possible solutions for specifying custom derivatives.

Roberto Ierusalimschy: An Alternative Formalization for References

In their book Programming Language Foundations, chapter Typing Mutable References, Pierce at al. use “store typings” to type locations and then be able to prove a preservation theorem for a language with references and side effects. In this talk I present an alternative formalization for that chapter, adding types directly to the location terms. This alternative formalization has a complexity on par with the original, but it moves some complexity from the design to the proofs, which in my view makes for an easier understanding. (Open to debate!!) This alternative is also more flexible, as it allows different terms pointing to the same location to have different types.

Roberto Ierusalimschy: A Compact Implementation for Arrays in Lua

A problem in several languages with dynamic typing, and in Lua in particular, is the space wasted with alignment in tagged values. In this talk I review some ways to avoid this waste and present a novel(?) way to store tagged values that have good locality, portability (that is, can be expressed in ANSI C), easiness for resizes, and reasonable access costs. (Obs: I am not sure there are enough slots available for two talks. Obviously I can present only one talk, whichever the audience finds more interesting. Anyway, I guess this will be [or can be] a short talk.)

Tom Van Cutsem: A study of design patterns in Agoric’s Zoe smart contracting framework

Zoe is a framework for writing smart contracts (i.e. programs embedded in a blockchain) using a capability-secure subset of JavaScript (“Hardened JavaScript”). Smart contracts are primarily used to implement financial instruments such as token swaps, covered call options, etc. Starting from a small but representative corpus of Zoe contracts, we mine the code for recurring software patterns. We identify two types of patterns: “trade patterns” at the level of the smart contract logic, and “code patterns” at the level of object-capability programming. We will highlight a few of these patterns during the talk. Time-permitting, we will also discuss a few language design issues that recently came up in Hardened JavaScript - in particular with respect to “hardening” a graph of objects, and whether and how the “hardened” state of an object graph can be safely virtualized through reflective proxy objects.

Matthew Weidner: Positions: An Abstraction for Distributed Ordering

Many GUI applications involve lists in which insertions and deletions shift the indices of later elements - e.g., the list of characters in a text editor. The application then needs a way to refer to an element independently of its current index, for features like undo stacks, text annotations, and especially collaborative editing. In this talk, I describe “positions”: immutable, totally ordered identifiers that let you refer to list elements across time and across devices. Positions come from the literature on Conflict-free Replicated Data Types (CRDTs) but have more general uses. I describe two challenges of using positions - efficiency, and distributed metadata management - and possible solutions. While positions are traditionally implemented as a library feature, I invite discussion about what direct language support might look like, especially in distributed programming languages.

Klaus Ostermann: Deriving Dependently-Typed OOP from First Principles

The expression problem describes how most types can easily be extended with new ways to produce the type or new ways to consume the type, but not both. When abstract syntax trees are defined as an algebraic data type, for example, they can easily be extended with new consumers, such as print or eval, but adding a new constructor requires the modification of all existing pattern matches. The expression problem is one way to elucidate the difference between functional or data-oriented programs (easily extendable by new consumers) and object-oriented programs (easily extendable by new producers). This difference between programs which are extensible by new producers or new consumers also exists for dependently typed programming, but with one core difference: Dependently-typed programming almost exclusively follows the functional programming model and not the object-oriented model, which leaves an interesting space in the programming language landscape unexplored. We explore the field of dependently-typed object-oriented programming by deriving it from first principles using the principle of duality. That is, we do not extend an existing object-oriented formalism with dependent types in an ad-hoc fashion, but instead start from a familiar data-oriented language and derive its dual fragment by the systematic use of defunctionalization and refunctionalization. Our central contribution is a dependently typed calculus which contains two dual language fragments. We provide type- and semantics-preserving transformations between these two language fragments: defunctionalization and refunctionalization. We have implemented this language and these transformations and use this implementation to explain the various ways in which constructions in dependently typed programming can be explained as special instances of the general phenomenon of duality.

Joe Politz: An Incremental Approach to JIT Construction

Abdulaziz Ghuloum’s An Incremental Approach to Compiler Construction is a blueprint for a whole family of compilers courses that present students with a working end-to-end compiler from day one. There are two key design decisions that support this: first, carefully selecting the order of language features to support this incrementalism (numbers, then arithmetic, then binding, then datatypes, then control and functions, then heap-allocated data), and second, using an existing toolchain (gcc and nasm) to generate authentic binaries outside of any simulation or special environment.

Last time I taught compilers, I attempted an extension to Ghuloum’s approach that added just-in-time components to each of the steps. Early on, this is centered on a REPL experience for compiled code where each REPL entry is separately compiled (with shared global variables). Then, a runtime environment can omit certain dynamic checks when the values of variables are known. Later, functions are added and a simple form of specialization is presented. The compiler and runtime are both implemented in Rust, and make use of existing libraries for assembling and evaluating generated code at runtime.

In this first offering, these were offered as optional assignment extensions. Some students succeeded at the project. The next goal is to refine the material to make it more tractable to include as a required (and natural) part of the existing assignments. In particular (and unsurprisingly) dynamic generation of specialized code for functions requires particular attention and scaffolding.

Michael Greenberg: A unified shell AST

The POSIX shell is critical infrastructure, and it’s used in its minimal form in shells like dash, ash, and busybox’s sh. But other, more featureful shells—notably bash and zsh, among others—go well beyond POSIX, defining custom syntax and giving concrete meaning to things left undefined by POSIX (like local variables 😭). We lack good tools for the shell in part because parsing it is so hard. What would it look like to have a single, unified shell AST?

Matthew Flatt: Macro-Extensible Binding Positions in Rhombus

Rhombus is a new language that is built on Racket and that takes advantage of Racket’s macro facilities to implement its own macro extensibility. At the same time, Rhombus goes further than Racket by allowing macro extension of syntactic positions other than expression and definition positions—including binding positions. Extensible binding positions enable pervasive pattern matching, annotations that can implement contracts or type-like static information, and variables that stand for a sequence of values and can be used only in repetition positions. In the talk, I’ll describe the current protocol for binding expansion in Rhombus. Maybe it’s too tied to the specific features that we have in Rhombus. Maybe it’s interesting and worth pursuing in a more formal way. Maybe WG members will know existing work that the Rhombus team should be looking at for more inspiration.twit

Nada Amin: Staged Relational Programming

We present the design and implementation of staged-miniKanren, which adds high-level facilities for generating miniKanren code from miniKanren, a relational programming language. Writing a relational interpreter for a functional language enables the synthesis of functional programs. We build on this, and stage a relational interpreter, removing the interpretative overhead, and directly compiling functions to relations. We can synthesize program sketches, where the known parts are optimized. We rely on stage polymorphism to run known parts at full speed while falling back to interpretation for unknown parts. Beyond miniKanren, this work has implications for programmable program synthesis.

Lindsey Kuper: Library-Level Choreographic Programming

Choreographic programming (CP) is an emerging paradigm for programming distributed applications that run on multiple nodes. In CP, the programmer writes one program, called a choreography, that is then transformed to individual programs for each node via a compilation step called endpoint projection (EPP). While CP languages have existed for over a decade now, library-level CP – in which choreographies are expressed as programs in an existing host language, and choreographic language constructs and EPP are provided entirely by a host-language library – is still in its infancy.

In this talk, we’ll assess the current state of the art of library-level CP. We’ll discuss which elements of CP are especially tricky to pull off at the library level, mull over some possible implementation approaches, and consider what features a host language ought to have to be CP-library-friendly.

Jonathan Aldrich: Meerkat: Scalable Reactive Programming with Live Updates

Reactive programming is now the default paradigm for client-side web development. But today’s reactive frameworks only help on the client side; integration with server-side data is awkward and error prone. In 2015, Domingues and Costa Seco proposed a tierless approach to reactive programming that also provides safe, live updates with running code. We are developing Meerkat, a language and runtime extending this approach to make it scale up to replicated servers and data stores.

Luke Church: Candela & Tom: An experiment in language tooling for the arts, and how to pay for it

I would like to talk about my current project Candela & Tom, a live language that attempts to implement a live entelechy - push/pull execution, an IDE built using techniques from the games industry and tight (and hopefully not too annoying) integration of synthesis. As well as making some art together, I’d also like to talk about how logics of capital have influenced the language design.

Geoffrey Litt: End-User Programming + LLMs: opportunities for PL design

End-user programming (EUP) environments, e.g. spreadsheets, help people write programs outside the context of professional software engineering. We now have a new tool to support EUP: LLM-powered program synthesis. But LLMs are far from a silver bullet, and require thoughtful design from both the PL and HCI perspectives to integrate into EUP workflows. In this talk, I share some early work on environments and techniques for using LLMs to support EUP, and suggest some opportunities for the language design community in this area.

Cyrus Omar: SAG: Semantics and Strategy Augmented Generation from LLMs

LLM-powered program synthesis is based on an impoverished cursor-local token-based model of the synthesis task. We are hooking LLMs up to language servers and strategy languages to semantically and strategically contextualize and semantically constrain their predictions. I will present an overview of both some of our initial efforts (where initial results in Hazel are extremely promising!) and some future ideas for the community to think about as we (re)consider the important question of where natural language processing belongs in developer workflows. I will also talk about how AI safety connects to capability safety as a side discussion.

Jan-Willem Maessen: SQL: A Rant

There are many things to love about SQL; the relational model and notion of a non-Turing-complete DSL that does one thing very well have been incredibly effective. But from a language designer’s perspective it’s a hot mess that will cause you to repeatedly scream, What? and Why?! Here we’ll take a quirky biased tour focusing on naming and binding semantics. If we’re lucky maybe we’ll see dialects that address some of these problems, but I’ll focus mainly on a couple of dialects I’m familiar with and have accessible documentation.

Jonathan Edwards: Operational Version Control

It would be useful to have version control like git but for data structures, particularly the data structures we call spreadsheets, databases, and ASTs. Operational Version Control observes changes to typed data structures by recording high-level operations performed in a GUI or API. These operations can change both values and types, with type changes inducing corresponding value changes (so-called schema migration). Version control capabilities such as differencing, merging, and reverting are constructed out of transformations on operation histories. This theory requires as input a set of rules formalizing the intuitive sense that two operations “do the same thing” in different states. Our prototype implementation present the user a simplified conceptual model: branches with linear append-only histories, forking by copying branches, and cherry-picking as the fundamental merge operation.

Tomas Petricek: Programming Systems Deserve a Theory Too!

Most programming is done by modifying source code in a text editor or an IDE and most programming research is focused on designing better programming languages. But what if we shift our attention from “programming languages” to “programming systems” and think about programming as interaction with a rich stateful system? In this talk, I will explore the consequences of this shift in focus. I will discuss what kind of research methodologies we need for studying programming systems, interesting aspects of past programming systems that become invisible if we think just about the language and, finally, what kind of systems we may design if we ask how to make programming more open, transparent and accessible.

Christos Dimoulas: The Rational Programmer

I will describe a new method we have been developing with my students to investigate questions about the pragmatics of programming languages. The talk will revolve around an instance of the method that aims to explore how different semantics for gradual typing affect debugging.

Peter Zhong: LLM Programming Abstractions

LLMs facilitate a new way of programming in which “instructions” are no longer unambiguous APIs but are statements in a natural language like English. Experts in this space, a new area known as prompt engineering, program their LMs—or elicit specific behaviors from them—by combining particular keywords, prompt formats, and even cognitive models. My talk aims to provide a review of frameworks that enable developers to integrate LLMs and prompting into their programming workflow using a systematic taxonomy that we coined as the Language Model System Interface Model (LMSI), and provide a map of the landscape of frameworks for abstracting interactions with and between LLMs.

Tobias Wrigstad: Behaviour-Oriented Concurrency and Isolation in Verona and Beyond

The Verona programming language is a new systems-level programming language in development as a collaboration between Azure Research, Imperial College and Uppsala University. Two of its most important novel features are a new concurrency model and an ownership system for ensuring data-race freedom and memory management performance reasoning. Both these rely heavily on a capability-based static type system. In the first part of this talk, I’ll briefly overview Verona’s concurrency model and how its data-race freedom is guaranteed. In the second part of the talk, I will talk about the challenges of the on-going effort of porting these results to the untyped Python programming language to remove its global interpreter lock, enable fully concurrent cycle detection, while (hopefully) maintaining backwards compatibility.

Ben L. Titzer: Virgil’s New Directions for Systems Programming

Systems programming is a little like pornography: ill-defined obscenity that titillates some and disgusts others. When pressed for a definition, we might remark “I’ll know it when I see it.” But what is interesting and unique about systems programming, and what challenges does it present programming language designers? In this talk, I’ll cover some language design choices and features of Virgil, both new and old, explain the rationale for some design decisions, talk a little about implementation issues and techniques, and explore directions for both research contributions in the space and practical advancements of the state of the art.