I have added several features on top of the official repo; some of them are experimental only:
- Script to filter out very large models in objaverse-github (save disk space)
- API backend (queuing mechanism supported)
- Blender API plugin (also refer to trellis_blender)
- Image-conditioned detail variation now uses a simpler voxelization method, as in the official repo
- training a multiview image conditioner (WIP)
- image-conditioned detail variation algorithm
- support 3 different postprocessing methods: simplify / remesh / subdivision
- support baking sRGB texture
- support low-VRAM mode (8GB should be enough in this mode)
- expose all these params in the Gradio UI
- allow saving all extrinsics/intrinsics/GS-rendered images so they can be used elsewhere
- one-script dataset processing file (see dataset_toolkits)
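The "baking sRGB texture" option above refers to storing texture colors in sRGB encoding instead of linear values. As a reference point, here is a minimal sketch of the standard sRGB transfer function (IEC 61966-2-1) in plain Python. The function names are hypothetical and this is not the fork's actual baking code; it only shows the per-channel conversion such an option would apply.

```python
def linear_to_srgb(c: float) -> float:
    """Encode a linear-light channel value in [0, 1] with the sRGB transfer curve."""
    c = min(max(c, 0.0), 1.0)  # clamp out-of-range values first
    if c <= 0.0031308:
        return 12.92 * c  # linear toe near black
    return 1.055 * c ** (1.0 / 2.4) - 0.055  # power-law segment


def srgb_to_linear(s: float) -> float:
    """Inverse transform: decode an sRGB-encoded value back to linear light."""
    s = min(max(s, 0.0), 1.0)
    if s <= 0.04045:
        return s / 12.92
    return ((s + 0.055) / 1.055) ** 2.4


def encode_texel(rgb_linear):
    """Convert one linear RGB texel (floats in [0, 1]) to 8-bit sRGB integers."""
    return tuple(round(linear_to_srgb(c) * 255) for c in rgb_linear)
```

Mid-gray in linear light maps to a noticeably brighter sRGB value, which is why baking a linear texture without this step tends to look too dark in viewers that expect sRGB.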
TRELLIS is a large 3D asset generation model. It takes in text or image prompts and generates high-quality 3D assets in various formats, such as Radiance Fields, 3D Gaussians, and meshes. The cornerstone of TRELLIS is a unified Structured LATent (SLAT) representation that can be decoded into different output formats, with Rectified Flow Transformers tailored for SLAT serving as powerful backbones. We provide large-scale pre-trained models with up to 2 billion parameters, trained on a large 3D asset dataset of 500K diverse objects. TRELLIS significantly surpasses existing methods, including recent ones at similar scales, and showcases flexible output format selection and local 3D editing capabilities which were not offered by previous models.
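Rectified flow models learn a velocity field that carries noise to data along (ideally) straight paths, and sampling integrates that field with a small number of ODE steps; this is presumably what the `steps` entries in the sampler parameters of the minimal example control. The sketch below is a generic Euler sampler on a 1-D toy problem, assuming the interpolation x_t = (1 - t)·x0 + t·noise. The names are hypothetical and this is not TRELLIS's own sampler.

```python
from typing import Callable, List


def euler_rectified_flow_sample(velocity: Callable[[float, float], float],
                                x_noise: float, steps: int = 12) -> float:
    """Integrate dx/dt = velocity(x, t) from t=1 (pure noise) to t=0 (data) with Euler steps."""
    ts: List[float] = [1.0 - i / steps for i in range(steps + 1)]  # 1.0, ..., 0.0
    x = x_noise
    for t, t_next in zip(ts[:-1], ts[1:]):
        x = x - (t - t_next) * velocity(x, t)  # move backwards in time by (t - t_next)
    return x


# Toy "network": all data sits at X0, and x_t = (1 - t) * X0 + t * noise,
# so the true velocity d(x_t)/dt = noise - X0 can be rewritten as (x_t - X0) / t.
X0 = 3.0


def toy_velocity(x: float, t: float) -> float:
    return (x - X0) / t
```

Because the toy velocity is exact and its paths are straight, even `steps=1` lands on `X0`; a learned network is only approximately straight, which is why several steps are normally used.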
Check out our Project Page for more videos and interactive demos!
- High Quality: It produces diverse 3D assets at high quality with intricate shape and texture details.
- Versatility: It takes text or image prompts and can generate various final 3D representations including but not limited to Radiance Fields, 3D Gaussians, and meshes, accommodating diverse downstream requirements.
- Flexible Editing: It allows for easy editing of generated 3D assets, such as generating variants of the same object or local editing of the 3D asset.
12/26/2024
- Release TRELLIS-500K dataset and toolkits for data preparation.
12/18/2024
- Implementation of multi-image conditioning for TRELLIS-image model. (#7). This is based on a tuning-free algorithm that does not train a specialized model, so it may not give the best results for all input images.
- Add Gaussian export in `app.py` and `example.py`. (#40)
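The multi-image update above relies on a tuning-free approach. One generic way to combine several condition images without training is to query the same velocity model once per image and merge the predictions at each sampling step, either by averaging them or by cycling through the images step by step. The sketch below illustrates both options on a 1-D toy problem; the `mode` names and functions are hypothetical, and this is not necessarily how TRELLIS implements it.

```python
from typing import Callable

# velocity(x, t, c) -> predicted velocity at state x and time t given condition index c
Velocity = Callable[[float, float, int], float]


def sample_multi_condition(velocity: Velocity, n_conditions: int, x_noise: float,
                           steps: int = 12, mode: str = "average") -> float:
    """Euler-integrate from t=1 (noise) to t=0, combining several conditions per step."""
    ts = [1.0 - i / steps for i in range(steps + 1)]
    x = x_noise
    for k, (t, t_next) in enumerate(zip(ts[:-1], ts[1:])):
        if mode == "average":
            # Query every condition and average the velocity predictions.
            v = sum(velocity(x, t, c) for c in range(n_conditions)) / n_conditions
        elif mode == "cycle":
            # Use a single condition per step, rotating through them.
            v = velocity(x, t, k % n_conditions)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        x = x - (t - t_next) * v
    return x


# Toy model: condition c pulls samples toward TARGETS[c] along straight paths.
TARGETS = [2.0, 4.0]


def toy_velocity(x: float, t: float, c: int) -> float:
    return (x - TARGETS[c]) / t
```

In this toy, `"average"` converges to the mean of the targets, while `"cycle"` ends exactly on whichever target the final step uses, so the ordering of conditions matters for that variant.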
- Release inference code and TRELLIS-image-large model
- Release dataset and dataset toolkits
- Release TRELLIS-text model series
- Release training code
- System: The code is currently tested only on Linux. For Windows setup, you may refer to #3 (not fully tested).
- Hardware: An NVIDIA GPU with at least 16GB of memory is necessary. The code has been verified on NVIDIA A100 and A6000 GPUs.
- Software:
  - The CUDA Toolkit is needed to compile certain submodules. The code has been tested with CUDA versions 11.8 and 12.2.
  - Conda is recommended for managing dependencies.
  - Python version 3.8 or higher is required.
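Before installing, it can help to check these requirements programmatically. The sketch below is a hypothetical preflight helper (not part of the repo): it compares the Python version, platform, and GPU memory against the numbers listed above, reading GPU memory through PyTorch's `torch.cuda.get_device_properties` when PyTorch is already installed.

```python
import sys
from typing import List, Optional, Tuple


def detect_gpu_memory_gb() -> Optional[float]:
    """Total memory of CUDA device 0 in decimal GB, or None if PyTorch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # Decimal GB (1e9 bytes), so a nominal 16 GB card (16 GiB of VRAM) clears a 16.0 threshold.
    return torch.cuda.get_device_properties(0).total_memory / 1e9


def check_requirements(python_version: Tuple[int, int] = tuple(sys.version_info[:2]),
                       os_name: str = sys.platform,
                       gpu_mem_gb: Optional[float] = None,
                       min_gpu_mem_gb: float = 16.0) -> List[str]:
    """Return human-readable warnings; an empty list means every check passed."""
    warnings = []
    if tuple(python_version) < (3, 8):
        warnings.append(f"Python {python_version[0]}.{python_version[1]} found, 3.8+ required")
    if not os_name.startswith("linux"):
        warnings.append(f"platform {os_name!r} is not tested (Linux only)")
    if gpu_mem_gb is None:
        warnings.append("no CUDA GPU detected")
    elif gpu_mem_gb < min_gpu_mem_gb:
        warnings.append(f"GPU has {gpu_mem_gb:.1f} GB, at least {min_gpu_mem_gb:.0f} GB expected")
    return warnings


if __name__ == "__main__":
    for w in check_requirements(gpu_mem_gb=detect_gpu_memory_gb()):
        print("WARNING:", w)
```

For the fork's low-VRAM mode, which the notes at the top say should fit in 8GB, you could pass `min_gpu_mem_gb=8.0` instead.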
- Clone the repo:
  ```sh
  git clone --recurse-submodules https://github.com/microsoft/TRELLIS.git
  cd TRELLIS
  ```
- Install the dependencies:

  Before running the following command, there are some things to note:
  - By adding `--new-env`, a new conda environment named `trellis` will be created. If you want to use an existing conda environment, please remove this flag.
  - By default the `trellis` environment will use PyTorch 2.4.0 with CUDA 11.8. If you want to use a different version of CUDA (e.g., if you have CUDA Toolkit 12.2 installed and do not want to install another 11.8 version for submodule compilation), you can remove the `--new-env` flag and manually install the required dependencies. Refer to PyTorch for the installation command.
  - If you have multiple CUDA Toolkit versions installed, `PATH` should be set to the correct version before running the command. For example, if you have CUDA Toolkit 11.8 and 12.2 installed, you should run `export PATH=/usr/local/cuda-11.8/bin:$PATH` before running the command.
  - By default, the code uses the `flash-attn` backend for attention. For GPUs that do not support `flash-attn` (e.g., NVIDIA V100), you can remove the `--flash-attn` flag to install `xformers` only and set the `ATTN_BACKEND` environment variable to `xformers` before running the code. See the Minimal Example for more details.
  - The installation may take a while due to the large number of dependencies. Please be patient. If you encounter any issues, you can try to install the dependencies one by one, specifying one flag at a time.
  - If you encounter any issues during the installation, feel free to open an issue or contact us.
  Create a new conda environment named `trellis` and install the dependencies:
  ```sh
  . ./setup.sh --new-env --basic --xformers --flash-attn --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast
  ```
  The detailed usage of `setup.sh` can be found by running `. ./setup.sh --help`:
  ```text
  Usage: setup.sh [OPTIONS]
  Options:
      -h, --help              Display this help message
      --new-env               Create a new conda environment
      --basic                 Install basic dependencies
      --xformers              Install xformers
      --flash-attn            Install flash-attn
      --diffoctreerast        Install diffoctreerast
      --vox2seq               Install vox2seq
      --spconv                Install spconv
      --mipgaussian           Install mip-splatting
      --kaolin                Install kaolin
      --nvdiffrast            Install nvdiffrast
      --demo                  Install all dependencies for demo
  ```
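The installation notes say GPUs without `flash-attn` support should use `xformers` via the `ATTN_BACKEND` environment variable, which has to be set before `trellis` is imported. A minimal sketch of picking the backend automatically is shown below. It assumes `flash-attn` needs an Ampere-or-newer GPU (CUDA compute capability 8.0+), which is consistent with the V100 (7.0) example but is an assumption to verify against the `flash-attn` release you install; the helper names are hypothetical.

```python
import os
from typing import Optional, Tuple


def choose_attn_backend(capability: Optional[Tuple[int, int]]) -> str:
    """Return 'flash-attn' for compute capability >= 8.0, otherwise 'xformers'."""
    if capability is not None and tuple(capability) >= (8, 0):
        return "flash-attn"
    return "xformers"


def configure_attn_backend() -> str:
    """Set ATTN_BACKEND unless the user already set it; call before importing trellis."""
    capability = None
    try:
        import torch
        if torch.cuda.is_available():
            capability = torch.cuda.get_device_capability(0)
    except ImportError:
        pass  # no PyTorch available: fall back to the conservative choice
    os.environ.setdefault("ATTN_BACKEND", choose_attn_backend(capability))
    return os.environ["ATTN_BACKEND"]
```

Using `setdefault` means an explicit `ATTN_BACKEND` set by the user always wins over the automatic guess.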
We provide the following pretrained models:
Model | Description | #Params | Download |
---|---|---|---|
TRELLIS-image-large | Large image-to-3D model | 1.2B | Download |
TRELLIS-text-base | Base text-to-3D model | 342M | Coming Soon |
TRELLIS-text-large | Large text-to-3D model | 1.1B | Coming Soon |
TRELLIS-text-xlarge | Extra-large text-to-3D model | 2.0B | Coming Soon |
The models are hosted on Hugging Face. You can directly load the models with their repository names in the code:
```python
TrellisImageTo3DPipeline.from_pretrained("JeffreyXiang/TRELLIS-image-large")
```
If you prefer to load the model locally, you can download the model files from the links above and load the model with the folder path (the folder structure should be maintained):
```python
TrellisImageTo3DPipeline.from_pretrained("/path/to/TRELLIS-image-large")
```
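Both loading styles go through the same `from_pretrained` call, so a small helper can prefer a local copy when one exists and fall back to the Hub repository name otherwise. In this sketch the helper names and the default local folder are hypothetical; the download function uses `huggingface_hub.snapshot_download`, which fetches the whole repository and therefore keeps the folder structure intact.

```python
import os
from typing import Optional

DEFAULT_REPO_ID = "JeffreyXiang/TRELLIS-image-large"


def resolve_model_source(local_dir: Optional[str] = None, repo_id: str = DEFAULT_REPO_ID) -> str:
    """Return local_dir if it is an existing folder, otherwise the Hugging Face repo id."""
    if local_dir and os.path.isdir(local_dir):
        return local_dir
    return repo_id


def download_model(repo_id: str = DEFAULT_REPO_ID, local_dir: str = "TRELLIS-image-large") -> str:
    """Download the full model repo into local_dir (requires the huggingface_hub package)."""
    from huggingface_hub import snapshot_download  # imported lazily; optional dependency
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Usage would then look like `TrellisImageTo3DPipeline.from_pretrained(resolve_model_source("TRELLIS-image-large"))`, which loads from disk after a one-time `download_model()` and from the Hub otherwise.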
Here is an example of how to use the pretrained models for 3D asset generation.
```python
import os
# os.environ['ATTN_BACKEND'] = 'xformers'   # Can be 'flash-attn' or 'xformers', default is 'flash-attn'
os.environ['SPCONV_ALGO'] = 'native'        # Can be 'native' or 'auto', default is 'auto'.
                                            # 'auto' is faster but will do benchmarking at the beginning.
                                            # Recommended to set to 'native' if run only once.

import imageio
from PIL import Image
from trellis.pipelines import TrellisImageTo3DPipeline
from trellis.utils import render_utils, postprocessing_utils

# Load a pipeline from a model folder or a Hugging Face model hub.
pipeline = TrellisImageTo3DPipeline.from_pretrained("JeffreyXiang/TRELLIS-image-large")
pipeline.cuda()

# Load an image
image = Image.open("assets/example_image/T.png")

# Run the pipeline
outputs = pipeline.run(
    image,
    seed=1,
    # Optional parameters
    # sparse_structure_sampler_params={
    #     "steps": 12,
    #     "cfg_strength": 7.5,
    # },
    # slat_sampler_params={
    #     "steps": 12,
    #     "cfg_strength": 3,
    # },
)
# outputs is a dictionary containing generated 3D assets in different formats:
# - outputs['gaussian']: a list of 3D Gaussians
# - outputs['radiance_field']: a list of radiance fields
# - outputs['mesh']: a list of meshes

# Render the outputs
video = render_utils.render_video(outputs['gaussian'][0])['color']
imageio.mimsave("sample_gs.mp4", video, fps=30)
video = render_utils.render_video(outputs['radiance_field'][0])['color']
imageio.mimsave("sample_rf.mp4", video, fps=30)
video = render_utils.render_video(outputs['mesh'][0])['normal']
imageio.mimsave("sample_mesh.mp4", video, fps=30)

# GLB files can be extracted from the outputs
glb = postprocessing_utils.to_glb(
    outputs['gaussian'][0],
    outputs['mesh'][0],
    # Optional parameters
    simplify=0.95,          # Ratio of triangles to remove in the simplification process
    texture_size=1024,      # Size of the texture used for the GLB
)
glb.export("sample.glb")

# Save Gaussians as PLY files
outputs['gaussian'][0].save_ply("sample.ply")
```
After running the code, you will get the following files:
- `sample_gs.mp4`: a video showing the 3D Gaussian representation
- `sample_rf.mp4`: a video showing the Radiance Field representation
- `sample_mesh.mp4`: a video showing the mesh representation
- `sample.glb`: a GLB file containing the extracted textured mesh
- `sample.ply`: a PLY file containing the 3D Gaussian representation
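The `simplify` and `texture_size` arguments of `to_glb` in the example trade quality for size. Going by the inline comment (`simplify` is the ratio of triangles to remove), both can be estimated ahead of time. The sketch below is a back-of-the-envelope helper, not part of the repo: the real simplifier may not hit the exact triangle count, and the texture figure assumes an uncompressed 8-bit RGBA image in memory, not the size of the exported GLB file.

```python
def faces_after_simplify(num_faces: int, simplify: float) -> int:
    """Approximate triangle count left after removing a `simplify` fraction of faces."""
    if not 0.0 <= simplify < 1.0:
        raise ValueError("simplify must be in [0, 1)")
    return round(num_faces * (1.0 - simplify))


def texture_bytes(texture_size: int, channels: int = 4, bytes_per_channel: int = 1) -> int:
    """Uncompressed size of a square texture_size x texture_size texture."""
    return texture_size * texture_size * channels * bytes_per_channel
```

With the example's `simplify=0.95`, a hypothetical 200,000-triangle mesh would be reduced to roughly 10,000 triangles, and doubling `texture_size` from 1024 to 2048 quadruples the texture memory.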
`app.py` provides a simple web demo for 3D asset generation. Since this demo is based on Gradio, additional dependencies are required:
```sh
. ./setup.sh --demo
```
After installing the dependencies, you can run the demo with the following command:
```sh
python app.py
```
Then, you can access the demo at the address shown in the terminal.
The web demo is also available on Hugging Face Spaces!
We provide TRELLIS-500K, a large-scale dataset containing 500K 3D assets curated from Objaverse(XL), ABO, 3D-FUTURE, HSSD, and Toys4k, filtered based on aesthetic scores. Please refer to the dataset README for more details.
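Filtering a large asset list like this is typically a pass over per-asset metadata. The sketch below shows the idea with Python's `csv` module, combining an aesthetic-score threshold with a size cap similar in spirit to the fork's disk-space filter mentioned at the top. The column names (`sha256`, `aesthetic_score`, `size_mb`), thresholds, and sample rows are all hypothetical, so check the dataset README for the actual metadata schema and cut-offs.

```python
import csv
import io
from typing import Iterable, List, Mapping, Optional


def filter_assets(rows: Iterable[Mapping[str, str]],
                  min_aesthetic: float = 5.0,
                  max_size_mb: Optional[float] = None) -> List[str]:
    """Return ids of rows that meet the score threshold and (optionally) the size cap."""
    kept = []
    for row in rows:
        if float(row["aesthetic_score"]) < min_aesthetic:
            continue  # below the aesthetic threshold
        if max_size_mb is not None and float(row["size_mb"]) > max_size_mb:
            continue  # too large to be worth the disk space
        kept.append(row["sha256"])
    return kept


# Example metadata in CSV form (made-up values).
SAMPLE_CSV = """sha256,aesthetic_score,size_mb
aaa,6.1,12.0
bbb,4.2,3.5
ccc,5.5,850.0
"""

if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    print(filter_assets(rows, min_aesthetic=5.0, max_size_mb=500.0))
```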
TRELLIS models and the majority of the code are licensed under the MIT License. The following submodules may have different licenses:
- diffoctreerast: We developed a CUDA-based real-time differentiable octree renderer for rendering radiance fields as part of this project. This renderer is derived from the diff-gaussian-rasterization project and is available under the LICENSE.
- Modified Flexicubes: In this project, we used a modified version of Flexicubes to support vertex attributes. This modified version is licensed under the LICENSE.
If you find this work helpful, please consider citing our paper:
```bibtex
@article{xiang2024structured,
    title   = {Structured 3D Latents for Scalable and Versatile 3D Generation},
    author  = {Xiang, Jianfeng and Lv, Zelong and Xu, Sicheng and Deng, Yu and Wang, Ruicheng and Zhang, Bowen and Chen, Dong and Tong, Xin and Yang, Jiaolong},
    journal = {arXiv preprint arXiv:2412.01506},
    year    = {2024}
}
```