State synthesis using quantum state aggregator #2637

Draft

annagrin wants to merge 61 commits into NVIDIA:main from annagrin:quantum-state-aggregator

Collaborator

annagrin commented Feb 18, 2025

Description

Alternative implementation of state synthesis that runs a state aggregator before argument conversion.

This adds a stage to argument synthesis for quantum devices, StateAggregator, that:

- processes the arguments and recursively collects all the states, kernel names, and kernel arguments found in them
- adds the new init and num_qubits functions for the collected kernels to the module

State synthesis is then split into four parts:

1. Create the init and num_qubits functions for all states in StateAggregator.
2. In ArgumentConverter, create the substitution quake.get_state @num_qubits @init for each cudaq::state* argument.
3. Run synthesis as usual.
4. In the ReplaceStateByKernel pass, replace each quake.get_state used by quake.init_state or quake.get_number_of_qubits with calls to the init and num_qubits functions.

The user code (in RemoteBaseRESTQPU.h) then:

1. Creates a state aggregator from the top-level kernel and its arguments.
2. Gets the list of kernels and their arguments from it.
3. Creates a list of argument converters, one for each kernel and its arguments.
4. Uses the converters to create all the substitutions.
5. Calls argument synthesis with references to the kernel names and substitutions.
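
The recursive collection step can be pictured with a small, self-contained C++ model. This is only a sketch under stated assumptions, not the code in this PR: `ToyStateAggregator`, `State`, `Arg`, and `CollectedKernel` are hypothetical names, and the `<kernel>.num_qubits_N` / `<kernel>.init_N` naming is assumed from the IR shown in the pass comments.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins; none of these are the PR's real classes.
struct State;
using Arg = std::variant<int, std::shared_ptr<State>>;

// A state that was produced by running `kernelName` with `args`.
struct State {
  std::string kernelName;
  std::vector<Arg> args;
};

// One kernel discovered during aggregation, plus the (assumed) names of the
// helper functions that would be generated for it.
struct CollectedKernel {
  std::string kernelName;
  std::string numQubitsFunc;
  std::string initFunc;
};

class ToyStateAggregator {
public:
  // Walk the arguments; for each state argument, record the kernel that
  // produced it, then recurse into that kernel's own arguments.
  void collect(const std::vector<Arg> &args) {
    for (const auto &arg : args) {
      if (auto *state = std::get_if<std::shared_ptr<State>>(&arg)) {
        const State &s = **state;
        const std::string id = std::to_string(counter++);
        kernels.push_back({s.kernelName,
                           s.kernelName + ".num_qubits_" + id,
                           s.kernelName + ".init_" + id});
        collect(s.args);
      }
    }
  }

  const std::vector<CollectedKernel> &collected() const { return kernels; }

private:
  std::size_t counter = 0;
  std::vector<CollectedKernel> kernels;
};
```

In this model, a caller would run `collect` on the top-level kernel's arguments and then build one converter per entry in `collected()`.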

bmhowe23 and others added 30 commits

October 17, 2024 14:33


          DCO Remediation Commit for Ben Howe <[email protected]>

ac01dd1

I, Ben Howe <[email protected]>, hereby add my Signed-off-by to this commit: 86681ef

Signed-off-by: Ben Howe <[email protected]>
Signed-off-by: Anna Gringauze <[email protected]>


          State pointer synthesis for quantum hardware

21a87c1

Signed-off-by: Anna Gringauze <[email protected]>


          Merge with main

3fc56de

Signed-off-by: Anna Gringauze <[email protected]>


          Merge with main

7969a75

Signed-off-by: Anna Gringauze <[email protected]>


          Fix test failure on anyon platform

755d0d1

Signed-off-by: Anna Gringauze <[email protected]>


          Merge branch 'main' of https://github.com/NVIDIA/cuda-quantum into qu…

dc5e77e

…antum-device-state


          Make StateInitialization a funcOp pass

382bc99

Signed-off-by: Anna Gringauze <[email protected]>


          Fix issues and tests for the rest of quantum architectures

d3a05d4

Signed-off-by: Anna Gringauze <[email protected]>


          Merge with main

ac151f2

Signed-off-by: Anna Gringauze <[email protected]>


          Fix failing quantinuum state prep tests

51ef054

Signed-off-by: Anna Gringauze <[email protected]>


          Merge branch 'main' of https://github.com/NVIDIA/cuda-quantum into qu…

0cdf3e9

…antum-device-state


          Merge branch 'main' of https://github.com/NVIDIA/cuda-quantum into qu…

5307aa4

…antum-device-state


          Address CR comments

a7f5387

Signed-off-by: Anna Gringauze <[email protected]>


          Merge with main

eb8db13

Signed-off-by: Anna Gringauze <[email protected]>


          Format

9f0937f

Signed-off-by: Anna Gringauze <[email protected]>


          Fix failing test

2f3a623

Signed-off-by: Anna Gringauze <[email protected]>


          Format

b381350

Signed-off-by: Anna Gringauze <[email protected]>


          Format

dc87ca4

Signed-off-by: Anna Gringauze <[email protected]>


          Merge branch 'main' of https://github.com/NVIDIA/cuda-quantum into qu…

e4c7735

…antum-device-state


          Replaced getState intrinsic by cc.get_state op

53a34c9

Signed-off-by: Anna Gringauze <[email protected]>


          Merge branch 'main' of https://github.com/NVIDIA/cuda-quantum into qu…

30777f3

…antum-device-state


          Remove print

fe6d409

Signed-off-by: Anna Gringauze <[email protected]>


          Remove getCudaqState references

48704e3

Signed-off-by: Anna Gringauze <[email protected]>


          Minor updates

137f621

Signed-off-by: Anna Gringauze <[email protected]>


          Fix failing quake test

ad7c6bc

Signed-off-by: Anna Gringauze <[email protected]>


          Merge branch 'main' of https://github.com/NVIDIA/cuda-quantum into qu…

83683f7

…antum-device-state

Signed-off-by: Anna Gringauze <[email protected]>


          Add a few state-related cc ops

78c0a44

Signed-off-by: Anna Gringauze <[email protected]>


          Merge branch 'main' of https://github.com/NVIDIA/cuda-quantum into st…

6682c39

…ate-ops

Signed-off-by: Anna Gringauze <[email protected]>


          Fix test_argument_conversion

102f819

Signed-off-by: Anna Gringauze <[email protected]>


          Merge branch 'main' of https://github.com/NVIDIA/cuda-quantum into st…

6b2c015

…ate-ops

Signed-off-by: Anna Gringauze <[email protected]>

annagrin added 24 commits

November 12, 2024 10:06


          Merge branch 'main' of https://github.com/NVIDIA/cuda-quantum into st…

f0176ae

…ate-ops

Signed-off-by: Anna Gringauze <[email protected]>


          Merge branch 'main' of https://github.com/NVIDIA/cuda-quantum into qu…

d17fa6d

…antum-device-state

Signed-off-by: Anna Gringauze <[email protected]>


          Merge with state-ops

Signed-off-by: Anna Gringauze <[email protected]>


          Add description for new algorithm for state syntesis

6fdccba

Signed-off-by: Anna Gringauze <[email protected]>


          Merge with main

fc5e154

Signed-off-by: Anna Gringauze <[email protected]>


          Fix tests

1dfa805

Signed-off-by: Anna Gringauze <[email protected]>


          Merge branch 'main' of https://github.com/NVIDIA/cuda-quantum into qu…

b67fc88

…antum-device-state

Signed-off-by: Anna Gringauze <[email protected]>


          Make intermediate IR legal by separating allocs


          Merge branch 'main' of https://github.com/NVIDIA/cuda-quantum into qu…

f32b066

…antum-device-state

Signed-off-by: Anna Gringauze <[email protected]>


          DCO Remediation Commit for Anna Gringauze <[email protected]>

008e8c1

I, Anna Gringauze <[email protected]>, hereby add my Signed-off-by to this commit: 9563371

Signed-off-by: Anna Gringauze <[email protected]>


          Merge branch 'main' of https://github.com/NVIDIA/cuda-quantum into qu…

84a4369

…antum-device-state

Signed-off-by: Anna Gringauze <[email protected]>


          Merge branch 'main' of https://github.com/NVIDIA/cuda-quantum into qu…

1c0a4b3

…antum-device-state

Signed-off-by: Anna Gringauze <[email protected]>


          Address some PR comments

f8e35eb

Signed-off-by: Anna Gringauze <[email protected]>


          Address more CR comments

e79ad6a

Signed-off-by: Anna Gringauze <[email protected]>


          Merge branch 'main' of https://github.com/NVIDIA/cuda-quantum into qu…

88cd5d5

…antum-device-state

Signed-off-by: Anna Gringauze <[email protected]>


          Cleanup

c0d9ae9

Signed-off-by: Anna Gringauze <[email protected]>


          Address CR comments

1ecd8cc

Signed-off-by: Anna Gringauze <[email protected]>


          Merge branch 'main' of https://github.com/NVIDIA/cuda-quantum into qu…

0238a66

…antum-device-state

Signed-off-by: Anna Gringauze <[email protected]>


          Address more CR comments

de387fc

Signed-off-by: Anna Gringauze <[email protected]>


          Merge branch 'main' of https://github.com/NVIDIA/cuda-quantum into qu…

a5150a5

…antum-device-state

Signed-off-by: Anna Gringauze <[email protected]>


          Address more CR comments

7cf306a

Signed-off-by: Anna Gringauze <[email protected]>


          Merge branch 'main' of https://github.com/NVIDIA/cuda-quantum into qu…

16de803

…antum-device-state

Signed-off-by: Anna Gringauze <[email protected]>


          Use StateAggregator before argument conversion

f3107bd

Signed-off-by: Anna Gringauze <[email protected]>


          Merge branch 'main' of https://github.com/NVIDIA/cuda-quantum into qu…

5cfd0e0

…antum-state-aggregator

Signed-off-by: Anna Gringauze <[email protected]>

annagrin marked this pull request as draft

February 18, 2025 22:56

annagrin added 4 commits

February 18, 2025 14:59


          Format

0071c21

Signed-off-by: Anna Gringauze <[email protected]>


          Cleanup and fix failing test

01f7d5b

Signed-off-by: Anna Gringauze <[email protected]>


          Cleanup state hash

b8fa541

Signed-off-by: Anna Gringauze <[email protected]>


          Add more tests for state argument conversion

6ef4b07

Signed-off-by: Anna Gringauze <[email protected]>

schweitzpgi reviewed

Collaborator

schweitzpgi left a comment

I've been thinking about the Python issues with all of this, and it's really not clear that this is the right approach in that context.

Specifically, in order to achieve reasonable performance in Python, a kernel is effectively pre-compiled to (call it) assembly code. Then when that kernel is launched (from a call site), very little overhead is incurred. Arguments are marshalled and the pre-compiled assembly code is run.

For state handling, we then have these cases:

1. It's a simulation, so a raw pointer to the data (same address space) can be forwarded to the simulator layer to do what it needs to do. Marshalling is a simple copy of the pointer.
2. The simulation is happening in another address space (server), so the marshalling involves transmitting the data with the kernel or independently from the client.
3. The data isn't really on the client, which requires the server runtime to have the smarts to figure out what state data to send to what kernel when they are launched. (I don't know if that was designed or vetted.)
4. There is no data and the "state data" is really calls to (chains of) other kernels. In this case, it's too late for Python, since the assembly code doesn't contain those calls and was built without them. One possible design is for the assembly code to call back to the runtime with a state object encoding (TBD) that allows the runtime to launch the kernels such that the qubit set is initialized into some state. But the point is that we wouldn't be doing argument synthesis here, as the assembly code is already compiled. Specifically, the call

  Array* a = __quantum__rt__qubit_allocate_array_with_cudaq_state_ptr(state_ptr);

would itself understand how to unmarshall the data implied by the state_ptr and return a set (array) of qubits from running the implied kernels itself. This pushes almost the entire problem to be as late as possible and not something the compiler (at least in Python) can reason about.
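
To make the late-binding idea concrete, here is a toy C++ sketch of a runtime entry point that inspects the state encoding itself. Everything in it is an assumption for illustration: `RawData`, `KernelRecipe`, `StateEncoding`, and `allocateWithState` are hypothetical, and the real encoding is still TBD as noted above.

```cpp
#include <complex>
#include <cstddef>
#include <functional>
#include <variant>

// Hypothetical encoding of a state object (the real encoding is TBD).
struct RawData {
  const std::complex<double> *amplitudes; // unowned pointer to a state vector
  std::size_t count;                      // number of amplitudes, 2^n
};
struct KernelRecipe {
  // Stand-in for "launch the chain of kernels"; returns the number of
  // qubits that the kernels prepared.
  std::function<std::size_t()> launch;
};
using StateEncoding = std::variant<RawData, KernelRecipe>;

// Toy analogue of an allocate-with-state runtime call. It returns the number
// of qubits allocated instead of a real qubit array.
std::size_t allocateWithState(const StateEncoding &state) {
  if (const auto *raw = std::get_if<RawData>(&state)) {
    // Raw data: the qubit count is log2 of the amplitude count.
    std::size_t n = 0;
    while ((std::size_t{1} << n) < raw->count)
      ++n;
    return n;
  }
  // No data: the runtime runs the implied kernels itself.
  return std::get<KernelRecipe>(state).launch();
}
```

The compiled kernel in this model never needs to know which case it was handed; the decision is made entirely at run time.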

include/cudaq/Optimizer/Dialect/CC/CCTypes.td

                                        "state initializer types">;
               def AnyStateInitType : Type<AnyStateInitLike.predicate, "initial state type">;
+              def AnyStatePointerType : Type<

Collaborator

schweitzpgi Feb 19, 2025

I'll put up a PR to make pointer element type constraints easy to specify. (I thought I'd already done so, but I don't see it on main.)

Collaborator Author

annagrin Feb 20, 2025

thanks!

include/cudaq/Optimizer/Dialect/Quake/QuakeOps.td

                   cc_PointerType:$data,
                   AnySignlessInteger:$length
                 );
-                let results = (outs cc_PointerType:$result);
+                let results = (outs AnyStatePointerType:$result);

Collaborator

schweitzpgi Feb 19, 2025

Suggested change

-              let results = (outs AnyStatePointerType:$result);
+              let results = (outs PointerOf<[cc_StateType]>:$result);

with the above PR merged, this can be used instead.

include/cudaq/Optimizer/Transforms/Passes.td

+                  "Replace `quake.init_state` instructions with call to the kernel generating the state";
+                let description = [{
+                  This optimization replaces `quake.init_state`, `quake.get_number_of_qubits`,
+                  and `quake.get_state` operations invoked on state pointers during argument

Collaborator

schweitzpgi Feb 19, 2025

It isn't possible to call cudaq::get_state from within a kernel. But that isn't what this op means, so it is a confusing overloading of terms. Can we rename quake.get_state to something like quake.materialize_state?

Collaborator Author

annagrin Feb 20, 2025

sure

lib/Optimizer/Transforms/ReplaceStateWithKernel.cpp

+              #include "mlir/IR/PatternMatch.h"
+              #include "mlir/Transforms/GreedyPatternRewriteDriver.h"
+              #include "mlir/Transforms/Passes.h"
+              #include <span>

Collaborator

schweitzpgi Feb 19, 2025

std::span not used; not needed?

lib/Optimizer/Transforms/ReplaceStateWithKernel.cpp

+              ///  %0 = quake.get_state @callee.num_qubits_0 @callee.init_0 : !cc.ptr<!cc.state>
+              ///  %1 = quake.get_number_of_qubits %0 : (!cc.ptr<!cc.state>) -> i64
+              /// ───────────────────────────────────────────
+              /// ...

Collaborator

schweitzpgi Feb 19, 2025

Suggested change

-              /// ...

lib/Optimizer/Transforms/ReplaceStateWithKernel.cpp

+                                              PatternRewriter &rewriter) const override {
+                  auto stateOp = numQubits.getOperand();
+                  if (auto getState = stateOp.getDefiningOp<quake::GetStateOp>()) {

Collaborator

schweitzpgi Feb 19, 2025

nit: typically we want the then block to emit the error and the fall-through code to be the "winning" case that does the transformation.
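
The control-flow shape suggested here can be shown with a small standalone C++ analogue. It is a sketch only: `ToyOp` and `rewriteNumQubits` are hypothetical and stand in for the MLIR pattern, and the `.num_qubits_0` suffix is assumed for illustration.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for an op whose operand may or may not be defined by
// the expected producer.
struct ToyOp {
  std::optional<std::string> producerSymbol; // set when the producer matches
};

// Returns the name of the function to call, or std::nullopt on a failed match.
std::optional<std::string> rewriteNumQubits(const ToyOp &op) {
  // The "then" block handles the failure case and exits early.
  if (!op.producerSymbol)
    return std::nullopt;

  // The fall-through code is the "winning" case that does the transformation.
  return *op.producerSymbol + ".num_qubits_0";
}
```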

lib/Optimizer/Transforms/ReplaceStateWithKernel.cpp

+              /// Replace `quake.init_state` by a call to a (modified) kernel that produced
+              /// the state.
+              ///
+              /// ```

Collaborator

schweitzpgi Feb 19, 2025

Suggested change

-            /// ```
+            /// ```mlir

I believe that is legal in doxygen. It seems to be used in other files.

lib/Optimizer/Transforms/ReplaceStateWithKernel.cpp

+              ///  %0 = quake.get_state @callee.num_qubits_0 @callee.init_0 : !cc.ptr<!cc.state>
+              ///  %3 = quake.init_state %2, %0 : (!quake.veq<?>, !cc.ptr<!cc.state>) -> !quake.veq<?>
+              /// ───────────────────────────────────────────
+              /// ...

Collaborator

schweitzpgi Feb 19, 2025

Suggested change

-              /// ...

We can elide these triple dots in these transformation comments.

lib/Optimizer/Transforms/ReplaceStateWithKernel.cpp

+                  auto *ctx = &getContext();
+                  auto func = getOperation();
+                  RewritePatternSet patterns(ctx);
+                  patterns.insert<ReplaceGetNumQubitsPattern, ReplaceInitStatePattern>(ctx);

Collaborator

schweitzpgi Feb 19, 2025

Looks good. Just the 2 patterns.

runtime/common/ArgumentConversion.cpp

Comment on lines +110 to +111

		// TODO: add an option to use the kernel info if available, i.e. for
		// remote simulators

Collaborator

schweitzpgi Feb 19, 2025

We need a clearer definition of what we mean by "remote simulators". There are several possibilities, and I'm not sure what the issues are with this.

1. There are no local processes. CPU and QPU code run on the server (there) together. In this case, the simulator likely shares the same address space as the CPU code, so it can dereference a pointer (like from a span) and read whatever state data there is. There is no need to clone anything, since the pointer at the time of the call must be valid or the program is on the verge of issuing a SIGSEGV.
2. CPU code runs on the client (here) and QPU code runs on the server (there). In this case, the QPU cannot be responsible for cloning an object that resides on the client. The state must be passed to the server as a copy.
3. Hybrid model: CPU runs here and QPU runs there, but rather than bundling the kernel along with the state data, they are sent separately. Here again, the server must have the ability to receive and associate the state data with a specific kernel. Furthermore, it must be able to correctly construct the state data in the server's address space such that it will use the calling conventions of the kernel in that address space. This may or may not require the server runtime to make a copy of the data. But if it does make a copy, the kernel itself would not manage that data or make yet another copy. It merely has to pass the unowned data on to the simulator.

One of the performance sticking points with all of this is trying to avoid making copies of state data. Hence it seems like we want the kernel design to simply refuse to accept anything other than unowned references to state data. The runtime machinery and the simulator will then be 100% responsible for making the copies (and the poor performance that entails).
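
A minimal C++ sketch of that ownership split, under stated assumptions: `StateView`, `kernelEntry`, and `simulatorIngest` are hypothetical names, and `StateView` is a hand-rolled stand-in for a non-owning span. The kernel only forwards the view; any copy is made by the simulator side.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using Amplitude = std::complex<double>;

// Non-owning view of state data: a pointer and a length, no ownership.
struct StateView {
  const Amplitude *data;
  std::size_t size;
};

// Toy "kernel": accepts only an unowned reference and forwards it unchanged.
StateView kernelEntry(StateView state) { return state; }

// Toy "simulator": the only place that decides whether to copy the data.
std::vector<Amplitude> simulatorIngest(StateView state) {
  return std::vector<Amplitude>(state.data, state.data + state.size);
}
```

With this shape, the cost of any copy is visible at exactly one call site rather than hidden inside the kernel.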

Collaborator Author

annagrin Feb 20, 2025

Remote simulators here mean that host and kernel code are executed in different processes, and the kernel is executed on a simulator. I'll update the comment. Synthesis for those scenarios currently inlines the state data in the kernel, but we could use the alternative option from this PR as well, if the kernel data is available. I was thinking about trying it out via the option in question. This "option" addition is not part of the PR, just something to think about in the future.

Collaborator Author

annagrin Feb 20, 2025

removed the comment for now as the issue is unrelated to the PR

Labels

None yet