Compiler paths: policy DSL to Calyx eDSL #38

anshumanmohan · 2024-07-24T03:25:26Z


This is an attempt to nail down the various handshakes that need to happen as we try to compile programs written in our policy DSL to hardware designs in the Calyx eDSL. From the Calyx eDSL we will go to hardware, but for our purposes the Calyx eDSL is the target.

Overview, and Challenge

Let's work with work-conserving policies for now; I'll talk about PIFO trees and not PIEO trees for that reason. Let's ignore the peek operation. Also, pops do need to be supported but are not all that challenging, so I will ignore them for now.

The source of our compiler is very clear: it is a policy, as defined in our policy DSL. This is a tree-shaped OCaml object whose leaves are classes and whose internal nodes are policies. The policies, for the purposes of this discussion, are just easy "off the shelf" things: FIFO, Fair, and Strict. That is, we aren't thinking about handwritten policies in the style of this issue.

What is the target of our compiler? Well, given a user-written policy, we need to figure out two things that will together realize the policy in the Calyx eDSL:

1. A set of PIFOs, in particular as many PIFOs as there are nodes in the policy tree. For now let's say these PIFOs are our general-purpose binary heaps, which take a value and a rank and insert the value with that rank.
2. Some logical way to react to a push command. That is, some of our PIFOs act as the leaves of a tree and the others act as internal nodes; on a push we need to push a payload value into a leaf and push integer values into the leaf's ancestor nodes, as described in Formal Abstractions. In addition to figuring out what to put into which PIFO, we must figure out the ranks for each of these pushes.
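For concreteness, here is a minimal Python software model of the "naive PIFO" in step 1. It assumes that a lower rank means higher priority and that equal ranks leave in arrival order. `NaivePIFO` is a hypothetical name, and this is a behavioral sketch for testing, not the Calyx binary heap itself.

```python
import heapq
import itertools


class NaivePIFO:
    """A textbook PIFO: push(value, rank); pop() returns the lowest-rank value.

    Equal ranks are broken by arrival order, so they leave in FIFO order.
    """

    def __init__(self):
        self._heap = []
        self._arrivals = itertools.count()  # tie-breaker for equal ranks

    def push(self, value, rank):
        heapq.heappush(self._heap, (rank, next(self._arrivals), value))

    def pop(self):
        _rank, _arrival, value = heapq.heappop(self._heap)
        return value

    def __len__(self):
        return len(self._heap)


q = NaivePIFO()
q.push("x", 5)
q.push("y", 1)
q.push("z", 5)
print([q.pop() for _ in range(3)])  # ['y', 'x', 'z']
```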

It is easy to see that step 1 above is not all that hard. It is step 2 that is the challenge. How are we going to figure out what to push where, and with what rank? And that too in hardware?

In the figure below, the two cells in each row use the same representation: the left column represents the given tree, while the right column represents the equivalent tree after tree-to-tree compilation in the Formal Abstractions sense (hereafter, T2TC). Only one version of T2TC exists right now, so it is shown with a solid arrow. Our starting point is on the top left (policy DSL) and our end goal is the bottom right (Calyx eDSL, after T2TC). The rest of this post discusses the various ways I see of getting from our source to our target.

Approach 1: decentralized clever pieces of hardware (red way)

This approach handles the logic of rank-selection in a decentralized way. It goes directly from the policy to the Calyx eDSL.

Say the user wrote the following policy using the DSL: strict(fair(A, B, C), D). This policy has six nodes. We won't just spin up six naive PIFOs; rather, we'll spin up four PIFOs that have FIFO shims on them (one per leaf), a PIFO that has a 3-way fair shim on it, and a PIFO that has a 2-way strict shim on it.
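A small Python sketch of that policy tree, and of which gadget each of its six nodes would receive under this approach. `Leaf`, `Node`, and `gadgets` are hypothetical stand-ins for the OCaml policy object, not its actual types.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    name: str                  # a traffic class, e.g. "A"


@dataclass
class Node:
    kind: str                  # an off-the-shelf policy: "fifo", "fair", or "strict"
    children: List["Policy"]


Policy = Union[Leaf, Node]


def gadgets(policy: Policy) -> List[str]:
    """List, in preorder, the PIFO+shim gadget each node of the tree would get."""
    if isinstance(policy, Leaf):
        return [f"FIFO-shimmed PIFO for class {policy.name}"]
    own = f"{len(policy.children)}-way {policy.kind} shim PIFO"
    return [own] + [g for child in policy.children for g in gadgets(child)]


policy = Node("strict", [Node("fair", [Leaf("A"), Leaf("B"), Leaf("C")]), Leaf("D")])
for g in gadgets(policy):
    print(g)
```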

When I say "naive PIFO", I mean a normal/general/textbook PIFO, meaning it takes a value and a rank and inserts that value with that rank.

When I say "PIFO with a shim", I mean a wrapped-up gadget that no longer takes a value and a rank, but rather just takes a value. The shim is a little bit of machinery that maintains internal state and internal logic sufficient to determine a rank for that value, possibly updating its state in the process. The gadget has an underlying naive PIFO, into which this value is then enqueued with that rank.
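A minimal Python sketch of a shimmed PIFO and the three shims, assuming lower ranks are served first. The names `ShimmedPIFO`, `fifo_shim`, `strict_shim`, and `fair_shim` are hypothetical. The fair shim is deliberately simplified: it ranks each child by how many times that child has been enqueued so far, which yields round-robin order only while every child stays backlogged; a real fair shim would need something like virtual time to handle children that go idle.

```python
import heapq
import itertools


class ShimmedPIFO:
    """A naive PIFO wrapped in a shim: push(value) only; the shim picks the rank."""

    def __init__(self, shim):
        self.shim = shim
        self._heap = []
        self._arrivals = itertools.count()   # tie-breaker for equal ranks

    def push(self, value):
        rank = self.shim(value)              # the shim may update its own state here
        heapq.heappush(self._heap, (rank, next(self._arrivals), value))

    def pop(self):
        return heapq.heappop(self._heap)[2]


def fifo_shim():
    counter = itertools.count()
    return lambda _value: next(counter)      # earlier arrival, lower rank


def strict_shim():
    return lambda child: child               # child 0 always beats child 1, etc.


def fair_shim(num_children):
    sent = [0] * num_children                # enqueues seen so far, per child

    def rank(child):
        r = sent[child]
        sent[child] += 1
        return r

    return rank


fair = ShimmedPIFO(fair_shim(3))
for child in [0, 0, 0, 1, 1, 2]:
    fair.push(child)
print([fair.pop() for _ in range(6)])        # [0, 1, 2, 0, 1, 0]
```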

If we spin up clever little pieces of hardware like this, then the big challenge above (finding "some logical way to react to a push command") is no longer so hard. Our PIFO+shim gadgets have the tricky logic baked in; we just need to remember the parent-child relation between them. When push is called, we are given a payload and a target leaf (one of the FIFO-shimmed gadgets). We simply give the payload to the leaf, letting the leaf itself determine a rank. We then approach the leaf's parent and have it enqueue an index n saying which child the leaf is from the parent's point of view (e.g., if I am my parent's second child and I receive an input, I complete the enqueue into myself and then tell my parent to please enqueue a 2). Again we need only furnish n; the parent node itself figures out what rank n should get. This recurses upward until we hit the root.
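The decentralized push (and, for completeness, the matching pop) can be sketched in Python for strict(fair(A, B, C), D). This assumes simplified shims (FIFO: an arrival counter; strict: rank equals child index; fair: a per-child enqueue counter, fair only while all children are backlogged) and uses 0-based child indices, so the "second child" in the text enqueues a 1 here. `Gadget`, `attach`, `build`, `push`, and `pop` are hypothetical names.

```python
import heapq
import itertools


class Gadget:
    """A PIFO+shim gadget that also remembers its parent and its index there."""

    def __init__(self, shim, parent=None, index=None):
        self.shim, self.parent, self.index = shim, parent, index
        self.children = []
        self.heap = []
        self.arrivals = itertools.count()    # tie-breaker for equal ranks

    def enqueue(self, value):
        rank = self.shim(value)              # the shim alone decides the rank
        heapq.heappush(self.heap, (rank, next(self.arrivals), value))


def fifo():
    counter = itertools.count()
    return lambda _value: next(counter)


def strict():
    return lambda child: child               # lower child index wins


def fair(n):
    sent = [0] * n                           # simplified: per-child enqueue count

    def rank(child):
        r = sent[child]
        sent[child] += 1
        return r

    return rank


def attach(parent, shim):
    child = Gadget(shim, parent, len(parent.children))
    parent.children.append(child)
    return child


def build():
    """Gadgets for strict(fair(A, B, C), D)."""
    root = Gadget(strict())
    f = attach(root, fair(3))
    a, b, c = (attach(f, fifo()) for _ in range(3))
    d = attach(root, fifo())
    return root, (a, b, c, d)


def push(leaf, payload):
    """Enqueue at the leaf, then tell each ancestor which child we were."""
    leaf.enqueue(payload)
    node = leaf
    while node.parent is not None:
        node.parent.enqueue(node.index)
        node = node.parent


def pop(root):
    """Follow popped child indices down from the root, then pop the leaf."""
    node = root
    while node.children:
        node = node.children[heapq.heappop(node.heap)[2]]
    return heapq.heappop(node.heap)[2]


root, (a, b, c, d) = build()
for leaf, payload in [(a, "a1"), (a, "a2"), (b, "b1"), (d, "d1"), (c, "c1")]:
    push(leaf, payload)
print([pop(root) for _ in range(5)])  # ['a1', 'b1', 'c1', 'a2', 'd1']
```

Note that no gadget ever sees a full path: each one only learns a value and computes its own rank, which is exactly what makes this design hard to combine with T2TC.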

At first blush I actually think this could work great. It goes directly from the policy DSL to the Calyx eDSL, and it makes use of the many little gadgets that the undergrads have built over the summer. A drawback, though, is that the "decentralized but clever" design is not very amenable to T2TC. Obviously, we don't have this version of T2TC written out; further, the existing T2TC is designed to operate on centralized logic (where an insertion path is computed for each incoming packet), which is different from the decentralized system discussed above. Since each "shim+PIFO" gadget will basically be a black box, it may be hard to enact a hardware-level version of T2TC that introduces transient nodes.

I'm not saying that T2TC here is impossible, but I am saying that the trade-off is becoming clear: we get an easier solution to the problem of setting up a working hardware solution that can handle a push, but we need to do some interesting new work for T2TC. In the figure, this approach is shown in red.

Approach 2: exploit OCaml T2TC (blue way)

Burned by the above, this approach seeks to exploit the existing T2TC that we have! That T2TC is from a "source" control to a target control. So how do we exploit this?

1. We need to compile the user-given policy into c_src, which is a control that one might hand-write in the OCaml development. Examples of such handwritten policies can be found in alg.ml. I don't think this will be so bad. Aside: we can also run our existing visualizer on c_src to make sure that the user-written policies are behaving as expected.
2. We get to use the existing T2TC to get a new control c_tar that simulates c_src.
3. We need to compile c_tar into Calyx. This will be tricky. Let's talk about it after breaking out of this little list.

Recall that a control is a triple of a state, a PIFO tree, and a scheduling transaction. The scheduling transaction's whole job is to look at incoming packets and issue an insertion path, including all the appropriate ranks, for that packet to be successfully enqueued. Sounds great, right? We can just set up a number of naive PIFOs and then use the scheduling transaction to get paths? Well, the issue is that this path computation happens in software, not hardware. Moving this logic to hardware will be its own challenge, and I don't really have anything else to say about how we'll tackle that challenge.
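Under this centralized view, the PIFO tree itself needs no ranking logic: it only follows the path it is given. Below is a minimal Python sketch of such a tree of naive PIFOs, together with a toy scheduling transaction for strict(A, B) that threads its state explicitly. The path encoding (one (child index, rank) pair per internal node, root first, then the leaf rank) and all names are assumptions made for illustration, not the types used in the OCaml artifact.

```python
import heapq
import itertools

# Hypothetical path encoding: ([(child_index, rank), ...] root first, leaf_rank)


class PIFOTree:
    """A tree of naive PIFOs with no ranking logic of its own."""

    def __init__(self, children=()):
        self.children = list(children)
        self.heap = []
        self.arrivals = itertools.count()    # tie-breaker for equal ranks

    def push(self, path, payload):
        steps, leaf_rank = path
        node = self
        for child, rank in steps:            # one internal node per step
            heapq.heappush(node.heap, (rank, next(node.arrivals), child))
            node = node.children[child]
        heapq.heappush(node.heap, (leaf_rank, next(node.arrivals), payload))

    def pop(self):
        node = self
        while node.children:
            node = node.children[heapq.heappop(node.heap)[2]]
        return heapq.heappop(node.heap)[2]


def sched(state, packet):
    """Toy scheduling transaction for strict(A, B).

    The state is an arrival counter used as the FIFO rank inside each leaf;
    the function returns (path, new_state) rather than mutating anything.
    """
    flow, _payload = packet
    child = 0 if flow == "A" else 1           # strict: A always outranks B
    return ([(child, child)], state), state + 1


tree = PIFOTree([PIFOTree(), PIFOTree()])
state = 0
for packet in [("B", "b1"), ("A", "a1"), ("B", "b2")]:
    path, state = sched(state, packet)
    tree.push(path, packet[1])
print([tree.pop() for _ in range(3)])   # ['a1', 'b1', 'b2']
```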

Just to make some progress, we could lean into this path-computation-in-software thing. We could get the OCaml control c_tar to tell us, for a given batch of input packets, exactly what insertion paths it would have assigned to those packets. This is not so hard: just run the OCaml-level simulator and make it produce some kind of trace. This trace will need to be a little like our .data files that Calyx queues currently take, but obviously it will need to be beefed up beyond the shared interface's current capabilities because it will need to communicate entire paths to the underlying hardware. The .data input will also include calls to pop. The underlying hardware will be a number of naive PIFOs (as many PIFOs as there are nodes in c_tar's PIFO tree) and it will need to learn how to read these new beefed-up path inputs. The hardware will also need to produce a result, which will go into an .expect file. As an aside, we could tweak our Python visualizer to read in this .expect file and graph out the result of our Calyx-level hardware work.
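One possible shape for the beefed-up trace is sketched below in Python. Everything about the format is an assumption for illustration: the key names, the opcodes, and the choice to flatten each path into a fixed-width row (root-first (child, rank) pairs, zero-padded, with the leaf rank in the last slot) so it fits in a fixed-width memory. It is not the existing Calyx queue .data format. Because the hardware knows the tree shape, it can tell from the child indices how many pairs of a row to read.

```python
import json

POP, PUSH = 0, 1   # hypothetical opcodes


def encode_trace(commands, max_depth):
    """Serialize ("push", payload, path) and ("pop",) commands to JSON.

    A path is ([(child_index, rank), ...] root first, leaf_rank). Each path
    becomes a fixed-width row [child_0, rank_0, ..., leaf_rank]: the pairs are
    zero-padded up to max_depth and the leaf rank always sits in the last slot.
    Pops get an all-zero row so every list stays aligned with "commands".
    """
    width = 2 * max_depth + 1
    ops, values, paths = [], [], []
    for cmd in commands:
        if cmd[0] == "pop":
            ops.append(POP)
            values.append(0)
            paths.append([0] * width)
            continue
        _, payload, (steps, leaf_rank) = cmd
        row = [x for pair in steps for x in pair]
        row += [0] * (width - 1 - len(row))
        row.append(leaf_rank)
        ops.append(PUSH)
        values.append(payload)
        paths.append(row)
    return json.dumps({"commands": ops, "values": values, "paths": paths})


trace = [
    ("push", 10, ([(0, 0), (1, 0)], 0)),   # leaf B of strict(fair(A, B, C), D)
    ("push", 20, ([(1, 1)], 1)),           # leaf D
    ("pop",),
]
print(encode_trace(trace, max_depth=2))
```

A matching .expect file could then simply list the payloads that the software model pops, in order.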

In the figure, this approach is shown in blue. I have a little empty brace there to show that the final blue arrow doesn't quite get home: there remains a software-hardware gap.

Approach 3: T2TC early (green way)

This approach is to somehow find a way to do T2TC at the policy DSL level, and then go directly from the policy DSL to the Calyx eDSL.

As we saw in Approach 1 above, translation from the source policy written in the policy DSL to some runnable Calyx was really quite easy. But then we got stuck in the left column, and we needed to implement T2TC in the hardest possible setting: the bottom row, trying to go from left to right.

We have a longish discussion about finding a way to coherently represent transient nodes in the policy DSL. A few ideas have been aired, but the tension there is always to keep enough information around such that translation to Calyx remains easy. If we can figure that out, we'll be able to:

1. Write a new T2TC that goes from policy DSL to policy DSL, inserting transient nodes but keeping around a rich amount of information.
2. Directly compile this T2TC-compiled policy into Calyx eDSL. I'm super duper waving my hands, but the hope is that the policy will carry enough information that the translation will be straightforward.

In the figure, this approach is shown in green. This is the approach I like most in theory, but I understand the least in practice. Send help!

sampsyo · 2024-07-24T11:35:53Z


Thanks for writing this down! This is a fantastic and very clear overview. Here is a quick summary of what I think is the central tension:

Let "simple policies" be the kind that is currently representable in our policy DSL. Let "complex policies" be the kind that results from T2TC, i.e., a generalization of simple policies that adds transient nodes. Here are some facts (let me know if I am wrong):

- We of course know how to represent simple policies in a DSL. We cannot yet represent complex policies in our DSL, although there is some in-flight thinking about how to maybe go about this (DSL Sketch #5).
- Simple policies are amenable to distributed queueing logic, outlined in "Approach 1" above. Complex policies (as far as we know) seem to require centralized queueing logic, i.e., an omniscient controller that generates a path all at once. (And of course simple policies could also be implemented with a centralized controller; after all, every simple policy is a complex policy.)

I think the latter implies that we almost certainly want to build centralized-controller hardware, right? That is, we don't think there's a lot of hope that we might come up with a decentralized way to implement complex policies, so a decentralized hardware generator would end up limited to simple policies, which would be a shame.

Here is one way that we could proceed, which is a kind of blend of blue and green:

1. In an initial phase, we focus on simple policies and centralized hardware. This means that the hardware generator would start with the policy DSL to specify tree topology. We design the hardware to allow centralized control so that we can expand into complex policies later.
   - One chunk of work is just stamping out the PIFOs for the tree and exposing the necessary signals to the controller.
   - Another chunk of work is building the controller itself, i.e., centralized path-generation hardware. But maybe this is not too hard because of the restriction to simple policies.
   - During this phase, it will be helpful to have software models of each for testing. As in, we would want (1) a software model of a controller to test the actual PIFO tree, and (2) a software model of a PIFO tree to test the controller. The latter is pretty close to stuff we already have; we would just need to decide on the controller/tree interface. For the former, we could either design it from scratch or figure out how to get the Formal Abstractions artifact to behave the way we want.
2. We kick the complex-policy can down the road to a second phase.
   - We would need a way to represent complex policies in the DSL (because our existing hardware generator, built in phase 1, takes the DSL as input). I have faith that we can figure out something workable here.
   - The PIFO tree hardware generator probably does not need to change? As in, the actual tree of queues looks the same for simple vs. complex policies.
   - The real work in this phase would instead be on the controller.

The point of this ordering is that it lets us get started generating hardware immediately without figuring out how to do complex policies yet. In parallel to that, we can work on figuring out what we want to do in phase 2. It leaves open two options for that:

1. Reimplement T2TC (green path), if we want to go whole hog. We would obtain a new software model and hardware implementation for complex policies.
2. Reuse the Formal Abstractions artifact for T2TC (mostly blue path), but add a little translation to just emit the tree shape in the policy DSL (to drive hardware generation, adding an arc from "OCaml Artifact" to "Policy DSL" in the diagram). We would still rely on the existing artifact for our software model of the controller.

Neither option frees us from the challenge of implementing actual hardware for the complex-policy controller. That is unavoidable.

5 replies

anshumanmohan Jul 24, 2024
Maintainer Author

I agree 100% with your framing of the central tension. And if I may look to the future a little, "simple policies" could eventually include other things like arbitrary x:y splits (e.g., A:B :: 30:70), non-work-conserving policies (e.g., flow B only gets to release one packet every second), and so on. Basically anything that is representable by the following wrapped-up gadget:

- a "naive" PIFO/PIEO
- a potentially clever shim atop that

is a "simple policy", and a straightforward hierarchical combination of simple policies is still a simple policy.

When you add transient nodes into your hierarchy, that turns it into a complex policy.
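As an illustration that an x:y split still fits the "naive PIFO plus shim" mould, here is a minimal Python sketch of a stride-scheduling shim for A:B :: 30:70. Stride scheduling is a swapped-in technique chosen for the example, not something settled in this discussion, and like any such counter-based shim it is only proportional while both children stay backlogged. `WeightedShimPIFO` is a hypothetical name.

```python
import heapq
import itertools
from fractions import Fraction


class WeightedShimPIFO:
    """Naive PIFO plus a stride-scheduling shim for an x:y (or x:y:z...) split.

    Each child i advances its own virtual clock by 1/weight[i] per enqueue, so
    while every child stays backlogged, child i is served in proportion to
    weight[i]. Exact fractions avoid floating-point ties going the wrong way.
    """

    def __init__(self, weights):
        self.stride = [Fraction(1, w) for w in weights]
        self.clock = [Fraction(0)] * len(weights)
        self.heap = []
        self.arrivals = itertools.count()    # tie-breaker for equal ranks

    def push(self, child):
        rank = self.clock[child]
        self.clock[child] += self.stride[child]
        heapq.heappush(self.heap, (rank, next(self.arrivals), child))

    def pop(self):
        return heapq.heappop(self.heap)[2]


q = WeightedShimPIFO([30, 70])           # A:B :: 30:70
for _ in range(10):
    q.push(0)                             # A is child 0
    q.push(1)                             # B is child 1
first = [q.pop() for _ in range(10)]
print(first.count(0), first.count(1))     # 3 7
```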

anshumanmohan Jul 24, 2024
Maintainer Author

We design the hardware to allow centralized control so that we can expand into complex policies later.

Ah, I see: so instead of doing the absolute most obvious conversion from the policy DSL to the eDSL (my first red arrow), you are suggesting that we do a slightly harder thing that has better long-term prospects. I find this route easiest to understand in contrast to my first red arrow, so here's an explanation via contrast:

- Stamp out a tree of naive PIFOs, not clever PIFO+shim gadgets. Not hard.
- Slurp up the individual shim logic that you would have given to your PIFOs into a centralized path-grantor. Not too hard, since we're "just" stapling together a few different shim logics, not inserting any fancy transient nodes yet.
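A minimal Python sketch of such a centralized path-grantor, hand-written for strict(fair(A, B, C), D): it keeps the same state the individual shims would have kept, but returns the whole insertion path at once. `PathGrantor`, `LEAF_ROUTE`, and the path encoding (root-first (child index, rank) pairs, then the leaf rank) are hypothetical; a generator would presumably derive the routing table from the policy tree rather than hard-code it. The fair shim is simplified to a per-child counter, which is only fair while all children stay backlogged.

```python
import itertools


class PathGrantor:
    """Hypothetical centralized controller for strict(fair(A, B, C), D).

    grant(leaf) returns ([(child_index, rank), ...] from the root, leaf_rank).
    """

    # leaf -> (child index under the strict root, child index under fair or None)
    LEAF_ROUTE = {"A": (0, 0), "B": (0, 1), "C": (0, 2), "D": (1, None)}

    def __init__(self):
        self.fair_sent = [0, 0, 0]                               # 3-way fair shim state
        self.fifo = {leaf: itertools.count() for leaf in "ABCD"}  # leaf FIFO shims

    def grant(self, leaf):
        strict_child, fair_child = self.LEAF_ROUTE[leaf]
        steps = [(strict_child, strict_child)]   # strict shim: rank = child index
        if fair_child is not None:
            rank = self.fair_sent[fair_child]    # fair shim: per-child counter
            self.fair_sent[fair_child] += 1
            steps.append((fair_child, rank))
        return steps, next(self.fifo[leaf])      # FIFO shim at the leaf


g = PathGrantor()
print(g.grant("B"))   # ([(0, 0), (1, 0)], 0)
print(g.grant("B"))   # ([(0, 0), (1, 1)], 1)
print(g.grant("D"))   # ([(1, 1)], 0)
```

Because the grantor emits complete paths, the tree it drives can stay a plain tree of naive PIFOs, which is what leaves room to add transient nodes in the controller later.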

The software models you suggest sound great so that we aren't trying to make some amazing tower of infrastructure directly in hardware. I presume the enriched .data and .expect files that I propose above would be reasonable ways for the software and hardware to communicate?

anshumanmohan Jul 24, 2024
Maintainer Author

but add a little translation to just emit the tree shape in the policy DSL (to drive hardware generation, adding an arc from "OCaml Artifact" to "Policy DSL" in the diagram)

Just to make sure I'm getting this right, we're talking about this new yellow path, and the new arc you propose is the yellow arrow I have annotated with 3?

anshumanmohan Jul 24, 2024
Maintainer Author

Thank you so much for looking through this! I find this very exciting but I will admit it makes my head spin a little. I'll keep thinking on it!

sampsyo Jul 24, 2024
Maintainer

I presume the enriched .data and .expect files that I propose above would be reasonable ways for the software and hardware to communicate?

Yes. This is orthogonal, I think, to whether we want to invent a fancier testbench someday that does not require up-front traces.

Just to make sure I'm getting this right, we're talking about this new yellow path, and the new arc you propose is the yellow arrow I have annotated with 3?

Yes, exactly.
