Making PyTorch3D cameras more flexible #783
gkioxari started this conversation in Show and tell
tl;dr: I landed commit 0c32f09, which makes cameras in PyTorch3D more flexible to use. It is a breaking change, so I want to draw the attention of current PyTorch3D users to it.
The commit (0c32f09) refactors cameras as described below.
A. Defining cameras in any space
Cameras are currently defined by providing R, T and K (K is parametrized based on the camera type). The method cameras.transform_points(points) transforms input points defined in any coordinate system by applying K [R | T] to them.
B. Interfacing with the PyTorch3D renderer
Coordinate system conventions come into play when interfacing with the PyTorch3D renderers.
Here, we assume that points are provided in the PyTorch3D coordinate system (+X: left, +Y: up and +Z: from us to screen). By design, our PyTorch3D renderers assume that points are converted to NDC space by the cameras before being passed to the rasterizer. The PyTorch3D NDC space is defined as +X: left, +Y: up and +Z: from us to screen, and spans [-1, 1] x [-u, u] or [-u, u] x [-1, 1], where, for non-square images, u > 1 is the aspect ratio (longer side over shorter side). In other words, the shorter image side spans [-1, 1] and the longer side spans [-u, u].
B1. Defining cameras in NDC or screen space
To make camera definitions flexible, we allow cameras to be defined in either NDC or screen space. For example, for PerspectiveCameras users can define the focal length and the principal point either in NDC space (normalized space) or in screen space (image space). The latter is the more common convention for camera definitions.
To interface correctly with the PyTorch3D renderers, we provide helper functions that convert points projected by cameras defined in screen space to NDC. The PyTorch3D renderers call this screen-to-NDC conversion, which transforms the points to the space needed for correct rendering.
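To make the NDC box described above concrete, here is a minimal plain-Python sketch. The function names ndc_extents and pixel_to_ndc are hypothetical (not PyTorch3D API); the sketch assumes image_size is (height, width), that image pixels use +X right / +Y down with the origin at the top-left corner, and it ignores any half-pixel offset the library may apply.

```python
from typing import Tuple

ImageSize = Tuple[int, int]  # (height, width), the new ordering


def ndc_extents(image_size: ImageSize) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Return ((x_min, x_max), (y_min, y_max)) of the NDC box for an image.

    The shorter image side spans [-1, 1]; the longer side spans [-u, u]
    with u = longer side / shorter side.
    """
    h, w = image_size
    s = min(h, w)
    ux = w / s  # half-extent along X (image width)
    uy = h / s  # half-extent along Y (image height)
    return ((-ux, ux), (-uy, uy))


def pixel_to_ndc(x_pix: float, y_pix: float, image_size: ImageSize) -> Tuple[float, float]:
    """Map image coordinates (+X right, +Y down, origin at the top-left corner)
    to PyTorch3D-style NDC (+X left, +Y up); half-pixel offsets are ignored."""
    h, w = image_size
    half_s = min(h, w) / 2.0
    return (-(x_pix - w / 2.0) / half_s, -(y_pix - h / 2.0) / half_s)


print(ndc_extents((128, 256)))         # ((-2.0, 2.0), (-1.0, 1.0))
print(pixel_to_ndc(0, 0, (128, 256)))  # (2.0, 1.0): top-left corner
```

For a 128 x 256 image, the height (the shorter side) spans [-1, 1] and the width spans [-2, 2], so u = 2; the top-left pixel corner lands at positive X and positive Y because NDC has +X left and +Y up.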
Below is an example of two equivalent cameras, one defined in NDC space and one in screen space.
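A minimal plain-Python sketch of such an equivalence follows. It assumes the screen-to-NDC conversion for perspective cameras is s = min(H, W), fx_ndc = fx_screen * 2 / s and px_ndc = -(px_screen - W / 2) * 2 / s (likewise for y with H), and that R, T act on row vectors as X_view = X_world R + T. All names (Intrinsics, screen_to_ndc_intrinsics, project_ndc, ...) are hypothetical helpers, not PyTorch3D API. The check is that projecting through the NDC camera and mapping back to pixels gives the same pixel as a textbook pinhole model using the screen-space intrinsics.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
ImageSize = Tuple[int, int]  # (height, width)


@dataclass
class Intrinsics:
    fx: float
    fy: float
    px: float
    py: float


def world_to_view(p: Vec3, R: Sequence[Sequence[float]], T: Vec3) -> Vec3:
    """Row-vector convention (assumed): X_view = X_world @ R + T."""
    return tuple(sum(p[i] * R[i][j] for i in range(3)) + T[j] for j in range(3))


def screen_to_ndc_intrinsics(k: Intrinsics, image_size: ImageSize) -> Intrinsics:
    """Convert pixel-space focal length / principal point to NDC (assumed formulas)."""
    h, w = image_size
    s = min(h, w)
    return Intrinsics(
        fx=k.fx * 2.0 / s,
        fy=k.fy * 2.0 / s,
        px=-(k.px - w / 2.0) * 2.0 / s,
        py=-(k.py - h / 2.0) * 2.0 / s,
    )


def project_ndc(k: Intrinsics, p_view: Vec3) -> Tuple[float, float]:
    """Perspective projection of a view-space point (+X left, +Y up) into NDC."""
    x, y, z = p_view
    return (k.fx * x / z + k.px, k.fy * y / z + k.py)


def ndc_to_pixel(xy: Tuple[float, float], image_size: ImageSize) -> Tuple[float, float]:
    """Map NDC (+X left, +Y up) back to image pixels (+X right, +Y down)."""
    h, w = image_size
    half_s = min(h, w) / 2.0
    return (w / 2.0 - xy[0] * half_s, h / 2.0 - xy[1] * half_s)


def project_pixel_pinhole(k: Intrinsics, p_view: Vec3) -> Tuple[float, float]:
    """Textbook pinhole in image coordinates; X and Y are negated to go from the
    PyTorch3D view frame (+X left, +Y up) to an image frame (+X right, +Y down)."""
    x, y, z = p_view
    return (k.fx * -x / z + k.px, k.fy * -y / z + k.py)


image_size = (128, 256)  # height 128, width 256
k_screen = Intrinsics(fx=300.0, fy=300.0, px=100.0, py=70.0)  # in pixels
k_ndc = screen_to_ndc_intrinsics(k_screen, image_size)

R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # identity rotation
T = (0.0, 0.0, 2.0)  # place the point 2 units in front of the camera
p_view = world_to_view((0.2, -0.1, 0.0), R, T)

via_ndc = ndc_to_pixel(project_ndc(k_ndc, p_view), image_size)
direct = project_pixel_pinhole(k_screen, p_view)
assert all(math.isclose(a, b) for a, b in zip(via_ndc, direct))
print(via_ndc, direct)
```

The minus signs in the principal-point conversion absorb the axis flip between the PyTorch3D view frame (+X left, +Y up) and the usual image frame (+X right, +Y down).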
B2. Breaking changes
The camera commit (0c32f09) contains three breaking changes:
1. Cameras defined in screen space must now be created with in_ndc=False. In the past, screen cameras were implied if the user provided image_size in the constructor. That behavior is no longer supported, and users can now provide image_size for either screen or NDC cameras.
2. image_size is now (height, width), instead of (width, height), to match the assumptions of the rasterization settings.
3. cameras.transform_points_screen transforms points to NDC and then to screen space. For non-square images, it assumes the NDC conventions, that is, points are projected to [-1, 1] x [-u, u] or [-u, u] x [-1, 1], where u > 1 is the aspect ratio of the image size. Previously, points in NDC space were mapped to [-1, 1] x [-1, 1] regardless of image aspect ratio.
B3. How to upgrade code
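To see what breaking change 3 means numerically, here is a minimal plain-Python sketch comparing the two NDC-to-screen mappings. Both functions are hypothetical illustrations, not PyTorch3D code: the "old" one assumes [-1, 1] x [-1, 1] was stretched over the full image, the "new" one assumes both axes are scaled by half the shorter image side, and both use continuous coordinates, ignoring any pixel-center offset the library applies.

```python
from typing import Tuple

ImageSize = Tuple[int, int]  # (height, width)


def ndc_to_screen_old(x_ndc: float, y_ndc: float, image_size: ImageSize) -> Tuple[float, float]:
    """Assumed old behavior: [-1, 1] x [-1, 1] stretched over the whole image."""
    h, w = image_size
    return (w / 2.0 * (1.0 - x_ndc), h / 2.0 * (1.0 - y_ndc))


def ndc_to_screen_new(x_ndc: float, y_ndc: float, image_size: ImageSize) -> Tuple[float, float]:
    """Assumed new behavior: both axes scaled by half the shorter image side."""
    h, w = image_size
    half_s = min(h, w) / 2.0
    return (w / 2.0 - x_ndc * half_s, h / 2.0 - y_ndc * half_s)


for size in [(128, 128), (128, 256)]:  # (height, width)
    print(size, ndc_to_screen_old(0.5, -0.25, size), ndc_to_screen_new(0.5, -0.25, size))
# (128, 128) (32.0, 80.0) (32.0, 80.0)
# (128, 256) (64.0, 80.0) (96.0, 80.0)
```

Under these assumptions, square images give identical results, while for non-square images only the coordinate along the longer side changes, which is why call sites with unequal image sizes need the most care.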
Specifically, to upgrade code:
(1) Initializing Cameras
Look at each call to any of the camera init functions (PerspectiveCameras, OrthographicCameras, OpenGLPerspectiveCameras, OpenGLOrthographicCameras, SfMPerspectiveCameras, SfMOrthographicCameras, FoVPerspectiveCameras, FoVOrthographicCameras). If it sets image_size:
- Make sure image_size is given as (height, width) instead of (width, height) as in the previous releases, e.g. image_size = ((128, 256),) for an image of height 128 and width 256.
- Set in_ndc=False, since you were specifying a screen camera.
For example, (old code)
Becomes (new code)
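The two rules above can be sketched as a hypothetical helper that rewrites constructor keyword arguments (upgrade_camera_kwargs is not part of PyTorch3D). It assumes image_size is given as a tuple or list of (width, height) pairs, as in the old API; tensor-valued image_size is not handled here.

```python
from typing import Any, Dict


def upgrade_camera_kwargs(old_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: rewrite pre-0c32f09 camera constructor kwargs.

    Under the old API, passing image_size implied a screen-space camera and
    image_size held (width, height) pairs.
    """
    new_kwargs = dict(old_kwargs)  # leave the caller's dict untouched
    if "image_size" in new_kwargs:
        # Swap each (width, height) pair to (height, width).
        new_kwargs["image_size"] = tuple((h, w) for (w, h) in new_kwargs["image_size"])
        # Screen space must now be requested explicitly.
        new_kwargs["in_ndc"] = False
    return new_kwargs


# Old: PerspectiveCameras(focal_length=..., principal_point=..., image_size=((256, 128),))
old = {
    "focal_length": ((300.0, 300.0),),
    "principal_point": ((100.0, 70.0),),
    "image_size": ((256, 128),),  # (width, height)
}
new = upgrade_camera_kwargs(old)
print(new["image_size"], new["in_ndc"])  # ((128, 256),) False
# New: PerspectiveCameras(**new), i.e. ..., in_ndc=False, image_size=((128, 256),)
```

Calls that never set image_size are left unchanged by this sketch.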
(2) Projecting points to screen coordinates
Look at all the places where the method transform_points_screen is called on the object returned from one of those init functions.
- transform_points_screen previously had a compulsory image_size argument, which is now an optional keyword argument.
- If image_size is set here to a container of pairs or a tensor of shape [something, 2], swap the order of the inner dimension (it previously held width and height; it now holds height and width).
- If the image_size parameter was also passed in the constructor and it is the same, it no longer needs to be passed here.
- If image_size is still being passed, make sure it is passed as a keyword argument, e.g. image_size = ((128, 256),).
- If image_size (either set at this call or else known from the init function) is a tuple of unequal values, the output of the call is scaled differently now and updating your code requires care. Specifically, for each batch element, if the new image size is (H, W), …
I am using GH Discussions for the first time, but I thought it would be a good idea to engage in discussions with y'all and resolve any issues you might have.