-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
State synthesis using quantum state aggregator #2637
base: main
Are you sure you want to change the base?
Conversation
I, Ben Howe <[email protected]>, hereby add my Signed-off-by to this commit: 86681ef Signed-off-by: Ben Howe <[email protected]> Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
…antum-device-state
Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
…antum-device-state
…antum-device-state
Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
…antum-device-state
Signed-off-by: Anna Gringauze <[email protected]>
…antum-device-state
Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
…antum-device-state Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
…ate-ops Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
…ate-ops Signed-off-by: Anna Gringauze <[email protected]>
…ate-ops Signed-off-by: Anna Gringauze <[email protected]>
…antum-device-state Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
…antum-device-state Signed-off-by: Anna Gringauze <[email protected]>
…antum-device-state Signed-off-by: Anna Gringauze <[email protected]>
I, Anna Gringauze <[email protected]>, hereby add my Signed-off-by to this commit: 9563371 Signed-off-by: Anna Gringauze <[email protected]>
…antum-device-state Signed-off-by: Anna Gringauze <[email protected]>
…antum-device-state Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
…antum-device-state Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
…antum-device-state Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
…antum-device-state Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
…antum-device-state Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
…antum-state-aggregator Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been thinking about the Python issues with all of this and it's really not clear that this is the right approach in that context.
Specifically, in order to achieve reasonable performance in python a kernel is effectively pre-compiled to (call it) assembly code. Then when that kernel is launched (from a call-site), very little overhead is incurred. Arguments are marshalled and the pre-compiled assembly code is run.
For state handling then we have these cases:
- It's a simulation, so a raw pointer to the data (same address space) can be forwarded to the simulator layer to do what it needs to do. Marshalling is a simple copy of the pointer.
- The simulation is happening in another address space (server), so the marshalling involves transmitting the data with the kernel or independently from the client.
- The data isn't really on the client, which requires the server runtime to have the smarts of figuring out what state data to send to what kernel when they are launched. (I don't know if that was designed or vetted.)
- There is no data and the "state data" is really calls to (chains) of other kernels. In this case, it's too late for Python since the assembly code doesn't contain those calls and was built without them. One possible design is for the assembly code to call back to the runtime with a state object encoding (TBD) that allows the runtime to launch the kernels such that the qubit set is initialized into some state. But the point is we wouldn't be doing argument synthesis here as the assembly code is already compiled. Specifically, the call
Array* a = __quantum__rt__qubit_allocate_array_with_cudaq_state_ptr(state_ptr);
would itself understand how to unmarshall the data implied by the state_ptr
and return a set (array) of qubits from running the implied kernels itself. This pushes almost the entire problem to be as late as possible and not something the compiler (at least in Python) can reason about.
@@ -313,4 +313,10 @@ def AnyStateInitLike : TypeConstraint<cc_PointerType.predicate, | |||
"state initializer types">; | |||
def AnyStateInitType : Type<AnyStateInitLike.predicate, "initial state type">; | |||
|
|||
def AnyStatePointerType : Type< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll put up a PR to make pointer element type constraints easy to specify. (I thought I'd already done so, but I don't see it on main.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
@@ -1418,7 +1418,7 @@ def quake_CreateStateOp : QuakeOp<"create_state", [Pure]> { | |||
cc_PointerType:$data, | |||
AnySignlessInteger:$length | |||
); | |||
let results = (outs cc_PointerType:$result); | |||
let results = (outs AnyStatePointerType:$result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let results = (outs AnyStatePointerType:$result); | |
let results = (outs PointerOf<[cc_StateType]>:$result); |
with the above PR merged, this can be used instead.
"Replace `quake.init_state` instructions with call to the kernel generating the state"; | ||
let description = [{ | ||
This optimization replaces `quake.init_state`, `quake.get_number_of_qubits`, | ||
and `quake.get_state` operations invoked on state pointers during argument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't possible to call cudaq::get_state from within a kernel. But that isn't what this op means, so it is a confusing overloading of terms. Can we rename quake.get_state
to something like quake.materialize_state
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
#include "mlir/IR/PatternMatch.h" | ||
#include "mlir/Transforms/GreedyPatternRewriteDriver.h" | ||
#include "mlir/Transforms/Passes.h" | ||
#include <span> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::span not used; not needed?
/// %0 = quake.get_state @callee.num_qubits_0 @callee.init_0 : !cc.ptr<!cc.state> | ||
/// %1 = quake.get_number_of_qubits %0 : (!cc.ptr<!cc.state>) -> i64 | ||
/// ─────────────────────────────────────────── | ||
/// ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// ... |
PatternRewriter &rewriter) const override { | ||
|
||
auto stateOp = numQubits.getOperand(); | ||
if (auto getState = stateOp.getDefiningOp<quake::GetStateOp>()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: typically we want the then block to emit the error and the fall-through code to be the "winning" case that does the transformation.
/// Replace `quake.init_state` by a call to a (modified) kernel that produced | ||
/// the state. | ||
/// | ||
/// ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// ``` | |
/// ```mlir |
I believe that is legal in doxygen. It seems to be used in other files.
/// %0 = quake.get_state @callee.num_qubits_0 @callee.init_0 : !cc.ptr<!cc.state> | ||
/// %3 = quake.init_state %2, %0 : (!quake.veq<?>, !cc.ptr<!cc.state>) -> !quake.veq<?> | ||
/// ─────────────────────────────────────────── | ||
/// ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// ... |
We can elide these triple dots in these transformation comments.
auto *ctx = &getContext(); | ||
auto func = getOperation(); | ||
RewritePatternSet patterns(ctx); | ||
patterns.insert<ReplaceGetNumQubitsPattern, ReplaceInitStatePattern>(ctx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just the 2 patterns.
// TODO: add an option to use the kernel info if available, i.e. for | ||
// remote simulators |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a clearer definition of what we mean by "remote simulators". There are several possibilities and I'm not sure what the issues are with this.
- There are no local processes. CPU and QPU code run on server (there) together. In this case, the simulator likely shares the same address space as the CPU code, so it can dereference a pointer (like from a span) and read whatever state data there is. There is no need to clone anything, since the pointer at the time of the call must be valid or the program is on the verge of issuing a SIGSEGV.
- CPU code runs on client (here) and QPU code runs on the server (there). In this case, the QPU cannot be responsible for cloning an object that resides on the client. The state must be passed to the server as a copy.
- Hybrid model of CPU runs here and QPU runs there, but rather than bundle the kernel along with the state data, they are sent separately. Here again, the server must have the ability to receive and associate the state data to a specific kernel. Furthermore, it must be able to correctly construct the state data in the server's address space such that it will use the calling conventions of the kernel in that address space. This may or may not require the server runtime to make a copy of the data. But if it does make a copy, the kernel itself would not manage that data or make yet another copy. It merely has to pass the unowned data on to the simulator.
One of the performance sticking points with all of this is trying to avoid making copies of state data. Hence it seems like we want the kernel design to simply refuse to accept anything other than unowned references to state data. The runtime machinery and the simulator will be then be 100% responsible for making the copies (and the poor performance that entails).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remote simulators here mean host and kernel code executed in different processes, and the kernel is executed on a simulator. I'll update the comment. Synthesis for those scenarios currently inline the state data in the kernel, but we could use the alternative option from this PR as well, if the kernel data is available. I was thinking about trying it out via the option in question. This "option" addition is not part of the PR, just something to think about in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed the comment for now as the issue is unrelated to the PR
Description
State synthesis alternative implementation using state aggregator before argument conversion:
Add a stage to argument synthesis for quantum devices,
StateAggregator
, thatinit
andnum_qubit
functions in there for the kernels collected to the moduleBasically, the state synthesis would be split into 4 parts:
init
andnum_qubits
functions for all states inStateAggregator
quake.get_state @num_qubits @init
for thecudaq::state*
argument inArgumentConverter
run synthesis as usual
quake.get_state
instruction used inquake.inist_state
quake.get_number_of_qubits
inReplaceStateByKernel
pass by calls to theinit
andget_qubits
functionsThe user code (in RemoteBaseRESTQPU.h) then