How does warp.ScopedCapture work? #565
Hi everyone. I have been learning Warp for a month (with no experience in CUDA programming) and I am trying to understand how `warp.ScopedCapture` works. From my understanding, this wrapper wraps the entire code block inside it and executes it while allocating memory for that specific block, which in turn reduces the overall run time if the block is executed repeatedly. Below is the code I have been working on. I modified the example_quadruped.py file with some ideas from example_drone.py in order to get states with gradients. The quadruped env is defined by the class Example, which has an attribute self.sim_tick indicating the simulation step of the robot. Inside the simulate() method, I made it so that each time a simulation substep is executed, self.sim_tick increases by 1. However, when I check self.sim_tick after each forward pass (i.e. after executing self.sim_substeps substeps), it stays at the value of self.sim_substeps. It seems to me that the attribute self.sim_tick is no longer updated after it was pre-executed in the initialization. Can you help me with this? Thank you very much! My code is as follows
Replies: 1 comment
The TL;DR is that `ScopedCapture` records (and does not execute) the CUDA operations, such as kernel launches, memory allocations, copies, and event recording, including their input and output pointers. The line `self.sim_tick += 1` is executed at the Python scope while the graph capture is being made, and subsequent launches of this graph will not run that line, since the CUDA runtime never saw this operation. One workaround is to move these non-CUDA operations outside of the graph capture and manually increment them after the `wp.capture_launch(self.graph)`. Sometimes this isn't sufficient, as you might need access to the current value of `self.sim_tick` in one of your kernels. In thi…
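To make the record-vs-execute distinction concrete, here is a minimal pure-Python sketch of the behaviour described above. It does not use Warp at all: `FakeGraph`, `Sim`, and `substep` are invented names for illustration, where appending to `graph.ops` stands in for a CUDA kernel launch being recorded into a graph, and `graph.launch()` stands in for `wp.capture_launch`.

```python
class FakeGraph:
    """Stands in for a captured CUDA graph: a list of recorded ops."""
    def __init__(self):
        self.ops = []

    def launch(self):
        # Replays ONLY the recorded ops; Python-side statements that
        # ran during capture are not re-executed here.
        for op in self.ops:
            op()


class Sim:
    def __init__(self):
        self.sim_tick = 0        # plain Python (host-side) counter
        self.device_steps = [0]  # stands in for device-side state

    def substep(self, graph):
        # A "kernel launch": recorded into the graph, not run now.
        graph.ops.append(
            lambda: self.device_steps.__setitem__(0, self.device_steps[0] + 1)
        )
        # A plain Python statement: runs ONCE, at capture time only.
        self.sim_tick += 1


sim = Sim()
graph = FakeGraph()

# "Capture": build the graph with 4 substeps (like sim_substeps);
# sim_tick is incremented here, during capture.
for _ in range(4):
    sim.substep(graph)

# Replay the graph three times (like wp.capture_launch(self.graph)).
for _ in range(3):
    graph.launch()
    # Workaround from the reply: bump host-side counters manually
    # after each graph launch, e.g. sim.sim_tick += 4

print(sim.sim_tick)         # 4: stuck at sim_substeps, as in the question
print(sim.device_steps[0])  # 12: recorded ops DID replay (4 ops x 3 launches)
```

The printed values show exactly the symptom from the question: the recorded ("device") work replays on every launch, while the Python counter only ever reflects the single pass made during capture.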