[UPDATE, 2015-02-23: The post has been slightly updated based on comments by Edward Kmett.]
In trying to understand the Bound library, and in particular, its limitations (if any) in terms of expressiveness and efficiency. I will here try to implement a small subset of the Feldspar EDSL using Bound.
DISCLAIMER: Statements in this text may very well reflect my misunderstandings rather than actual truths about Bound.
Note that I’m assuming basic familiarity with Bound. If you’re not familiar with the library, see its documentation.
I would be happy to receive feedback and hints for how to improve the code.
The whole code can be found here: FeldsparBound.lhs
Implementing Feldspar using Bound
Here is the subset of Feldspar that we will work with:
data Data a = Var a | Int Int | Add (Data a) (Data a) | Ix (Data a) (Data a) | Par (Data a) (Scope () Data a) deriving (Eq,Show,Functor,Foldable,Traversable)
Ix represents array indexing, and
Par represents array construction, corresponding to the following function on lists:
parList :: Int -> (Int -> a) -> [a] parList l f = [f i | i <- [0 .. l-1]]
For simplicity, I chose to put the
Scope directly on
Par instead of going through a lambda as we do in Feldspar.
Monad instance defines substitution in a standard way:
instance Monad Data where return = Var Var a >>= k = k a Int a >>= k = Int a Add a b >>= k = Add (a >>= k) (b >>= k) Ix a i >>= k = Ix (a >>= k) (i >>= k) Par l f >>= k = Par (l >>= k) (f >>>= k) instance Applicative Data where pure = return; (<*>) = ap
Here I choose to represent free variables as strings:
type Name = String type D = Data Name
The user interface is then as follows:
v :: Name -> D v = Var instance Num D where fromInteger = Int . fromInteger (+) = Add (!) :: D -> D -> D (!) = Ix par :: Name -> D -> D -> D par i l f = Par l $ abstract1 i f
Here are two simple example expressions:
ex1 = par "i" 10 $ v "i" + 0 -- Corresponds to `parList 10 $ \i -> i + 0` ex2 = par "i" 10 $ v "arr" ! (v "i" + 0) -- Corresponds to `parList 10 $ \i -> arr !! (i + 0)`
In the definition of
par, we use
abstract1 is to bind the free variable (convert to deBruijn). It has to traverse the whole body, so the complexity is quadratic in the number of nested binders.
Now to the interesting question of how one goes about to transform terms in the above representation.
Ideally, we want to do this without looking at deBruijn indices at all. Two questions arise:
- How to apply a transformation under a
- How to pattern match on binders and variables and rewrite them?
For the first question, my initial naive attempt was to use a combination of
abstract1 to lift a transformation on
D to a transformation on
transScopeBAD :: (D -> D) -> Scope () Data Name -> Scope () Data Name transScopeBAD f = abstract1 "x" . f . instantiate1 (Var "x")
The use of
instantiate1 changes all occurrences of the bound variable to the free variable
Var "x" and
abstract1 re-binds that variable after
f has been applied to the body. Unfortunately, this doesn’t work. If there are nested scopes, their corresponding variables will all be renamed to
"x", so the innermost
abstract1 is going to capture all of them. Also, since
abstract1 has to traverse the whole body, nested uses of
transScope are going to be inefficient.
A better approach is to work more directly with the representation and use
toScope (here with the type specialized to
transScope :: (Data (Var b a) -> Data (Var b a)) -> Scope b Data a -> Scope b Data a transScope f = toScope . f . fromScope
(This version of
transScope was suggested by Edward. In my initial attempt, I used
Note how the type clearly expresses the fact that the term passed to the transformation function
f has one extra variable (of type
b) in addition to the free variables (of type
a). This information was missing in
transScope, we can now write the following simplifier:
simplify :: Data a -> Data a simplify e = simp $ case e of Add a b -> Add (simplify a) (simplify b) Ix a i -> Ix (simplify a) (simplify i) Par l f -> Par (simplify l) $ transScope simplify f _ -> e where simp (Add (Int 0) b) = b simp (Add a (Int 0)) = a -- The catch-all case `simp e = e` will come later
Note that it is important for
simplify to be polymorphic in the free variables since we’re using it at a different type when passing it to
transScope. This requirement was not obvious to me at first. (In retrospect, it is obvious that the polymorpic type is preferred, since I don’t want the simplifier to do anything with the free variables.)
To make the simplifier more interesting, we add a case corresponding to the following rule for
parList (defined above):
parList l f !! i ==> f i
This turns out to be easy:
simp (Ix (Par l f) i) = instantiate1 i f
Another similar rule that we may want to express is
parList l (\i -> arr !! i) ==> arr (where i `notElem` freeVars arr)
(In both of these rules, we ignore out-of-bounds indexing for simplicity.)
To express this rule, we need to pattern match on binders and variables. My initial attempt was the following:
-- simp (Par l f) -- | Ix arr i <- instantiate1 (v "i") f -- , i == v "i" -- , "i" `Foldable.notElem` arr -- = arr
This looks quite nice from a readability perspective, but it doesn’t work because
transScope does not permit free variables as strings. (And, in general, using
instantiate1 in this way has the same problem as
transScopeBAD above, that the variable
"i" might easily clash with other instantiated variables.)
Instead, it seems that one has to work more directly with the deBruijn representation:
simp (Par l f) | Ix arr i <- fromScope f , Var (B ()) <- i , Just arr' <- lower arr = arr' -- Catch-all case to complete the definition of `simp`: simp e = e
lower function tries to lower the scope of an expression, and succeeds if the expression does not refer to the closest bound variable:
lower :: Data (Var b a) -> Maybe (Data a) lower = traverse $ unvar (const Nothing) Just
Note the use of
fromScope rather than
unscope above. This is to change the generalized deBruijn representation into one the standard representation which
I would have hoped to avoid matching explicitly on the deBruijn representation of
Var (B ())). But I guess this is just unavoidable?
The check that
i does not occur in
arr is replaced by the pattern match on the result of
lower. One nice thing here is that the type system does not allow us to return
arr directly, so we cannot accidentally forget to check whether
i occurs in
arr (which I tend to forget everytime!).
*Main> ex1 Par (Int 10) (Scope (Add (Var (B ())) (Int 0))) *Main> simplify ex1 Par (Int 10) (Scope (Var (B ()))) *Main> ex2 Par (Int 10) (Scope (Ix (Var (F (Var "arr"))) (Add (Var (B ())) (Int 0)))) *Main> simplify ex2 Var "arr"
At first I was tempted to use
abstract1 everywhere and to represent free variables as strings. Although this leads to code that looks readable, it does not work in general:
- Representing free variables as strings can easily lead to capturing bugs
- It leads to inefficient code:
abstract1has to traverse the whole expression, so it should be used sparingly
instantiate1may not have to traverse the whole expression due to the generalized deBruijn representation, but I’m not sure what this means in practice
From a capturing point of view, it may be considered OK to use
abstract1 and variables as strings the way I did in the user interface above. The user will just have to make sure to use distinct variable names. However, due to the nested uses of
abstract1, it is still very inefficient to construct terms in this way. A better way is probably to construct terms in a different representation and then convert them all to deBruijn in a single pass.
Once I realized that I couldn’t use
abstract1 for transformations under a
Scope, I was forced to use
toScope which expose a bit the representation of variables. However, this turned out to be a very good thing that just made the code safer.
The resulting code does not use
abstract1, so it avoids that source of inefficiency. There is one use of
instantiate1, but that is a substitution coming from the rewrite rule – it has nothing to do with the Bound library (and it could be avoided by rewriting to a let binding instead of performing the substitution).
In the end, there appears to be only one problematic inefficiency left: the
lower function. In order to know whether an expression refers to the closest bound variable,
lower naturally has to traverse the whole expression. Of course, this is true regardless of how variables are represented. But in a standard named representation of variables, it is e.g. possible to return a set of free variables from
simplify, so that membership can be tested without traversing the expression. I don’t see how that could be done here.
In conclusion, I see many advantages of using Bound, especially when it comes to safety and not worrying about capturing. However, there are some things that make me hesitate on adopting it for something like Feldspar:
- I wasn’t able to completely hide the deBruijn indices in the simplifier. Can it be done? Is this going to be a big problem for people trying to express rewrite rules?
lowerfunction is a potential source of inefficiency in a simplifier. Can it be fixed? Will there be other inefficiencies as the rules get more complex?
If someone has the answers to those questions, I would be happy to know!