How does warp.ScopedCapture work? #565
Hi everyone. I have been learning Warp for a month (with no experience in CUDA programming) and I am trying to understand how `warp.ScopedCapture` works. From my understanding, this wrapper wraps the entire code block inside it and executes it while allocating memory for that specific block, which in turn reduces the overall run time if the block is executed repeatedly. Below is the code I have been working on. I modified the example_quadruped.py file with some ideas from example_drone.py in order to get states with gradients. The quadruped env is defined by the class Example, which has an attribute self.sim_tick indicating the simulation step of the robot. Inside the simulate() method, I made it so that each time a simulation substep is executed, self.sim_tick increases by 1. However, when I check self.sim_tick after each forward pass (i.e. after executing self.sim_substeps substeps), it stays at the value of self.sim_substeps. It seems to me that the attribute self.sim_tick is no longer updated after it was pre-executed in the initialization. Can you help me with this? Thank you very much! My code is as follows
Replies: 1 comment
The TL;DR is that `ScopedCapture` records (and does not execute) the CUDA operations, such as kernel launches, memory allocations, copies, and event recording, including their input and output pointers. The line `self.sim_tick += 1` is executed at the Python scope while the graph capture is being made, and subsequent launches of this graph will not run that line, since the CUDA runtime never saw this operation. One workaround is to move these non-CUDA operations outside of the graph capture and manually increment them after the `wp.capture_launch(self.graph)`. Sometimes this isn't sufficient, as you might need access to the current value of `self.sim_tick` in one of your kernels. In thi…
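To make the record-vs-execute distinction concrete, here is a minimal pure-Python sketch of the behaviour described above. It does not use Warp at all: `FakeGraph`, `Sim`, and `substep` are invented names for illustration, where appending to `graph.ops` stands in for a CUDA kernel launch being recorded into a graph, and `graph.launch()` stands in for `wp.capture_launch`.

```python
class FakeGraph:
    """Stands in for a captured CUDA graph: a list of recorded ops."""
    def __init__(self):
        self.ops = []

    def launch(self):
        # Replays ONLY the recorded ops; Python-side statements that
        # ran during capture are not re-executed here.
        for op in self.ops:
            op()


class Sim:
    def __init__(self):
        self.sim_tick = 0        # plain Python (host-side) counter
        self.device_steps = [0]  # stands in for device-side state

    def substep(self, graph):
        # A "kernel launch": recorded into the graph, not run now.
        graph.ops.append(
            lambda: self.device_steps.__setitem__(0, self.device_steps[0] + 1)
        )
        # A plain Python statement: runs ONCE, at capture time only.
        self.sim_tick += 1


sim = Sim()
graph = FakeGraph()

# "Capture": build the graph with 4 substeps (like sim_substeps);
# sim_tick is incremented here, during capture.
for _ in range(4):
    sim.substep(graph)

# Replay the graph three times (like wp.capture_launch(self.graph)).
for _ in range(3):
    graph.launch()
    # Workaround from the reply: bump host-side counters manually
    # after each graph launch, e.g. sim.sim_tick += 4

print(sim.sim_tick)         # 4: stuck at sim_substeps, as in the question
print(sim.device_steps[0])  # 12: recorded ops DID replay (4 ops x 3 launches)
```

The printed values show exactly the symptom from the question: the recorded ("device") work replays on every launch, while the Python counter only ever reflects the single pass made during capture.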