TRELLIS-500K is a dataset of 500K 3D assets curated from Objaverse(XL), ABO, 3D-FUTURE, HSSD, and Toys4k, filtered based on aesthetic scores. It is intended for 3D generation tasks.
The dataset is provided as CSV files containing the 3D assets' metadata.
The following table summarizes the dataset's filtering and composition:

| Source | Aesthetic Score Threshold | Filtered Size | With Captions |
|---|---|---|---|
| ObjaverseXL (sketchfab) | 5.5 | 168307 | 167638 |
| ObjaverseXL (github) | 5.5 | 311843 | 306790 |
| ABO | 4.5 | 4485 | 4390 |
| 3D-FUTURE | 4.5 | 9472 | 9291 |
| HSSD | 4.5 | 6670 | 6661 |
| All (training set) | - | 500777 | 494770 |
| Toys4k (evaluation set) | 4.5 | 3229 | 3180 |

NOTE: Some of the 3D assets lack text captions. Please filter out such assets if captions are required.
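To drop assets without captions, as the note recommends, a minimal Python sketch could filter the metadata CSV row by row. It assumes the captions live in a column named `captions` (a hypothetical name; check the real header) and that a missing caption shows up as an empty string or a null marker such as `nan`:

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO

# Strings treated as "no caption"; an assumption about how missing values are written.
NULL_MARKERS = {"", "nan", "none", "[]"}


def keep_captioned(rows: Iterable[Dict[str, str]], caption_column: str = "captions") -> List[Dict[str, str]]:
    """Keep rows whose caption field is present and not a null marker.

    `caption_column` is a hypothetical column name used for illustration.
    """
    return [
        row for row in rows
        if (row.get(caption_column) or "").strip().lower() not in NULL_MARKERS
    ]


def filter_captioned(src: TextIO, dst: TextIO, caption_column: str = "captions") -> int:
    """Copy only captioned rows from CSV `src` to CSV `dst`; return the kept count."""
    reader = csv.DictReader(src)
    rows = keep_captioned(reader, caption_column)  # consumes the reader, so fieldnames is set
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames or [])
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    # Tiny in-memory example; for real files pass open(path, newline="") handles.
    src = io.StringIO("sha256,captions\na,\"['a chair']\"\nb,\nc,nan\n")
    dst = io.StringIO()
    print(filter_captioned(src, dst))  # prints 1
```

Working on streams rather than paths keeps the filter easy to test and lets the same function handle files or in-memory data.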
The dataset is hosted on Hugging Face Datasets. You can preview it at
https://huggingface.co/datasets/JeffreyXiang/TRELLIS-500K
There is no need to download the CSV files manually: we provide toolkits to load and prepare the dataset.
To set up the environment for the toolkits, source the setup script:
. ./dataset_toolkits/setup.sh
First, we need to load the metadata of the dataset.
python dataset_toolkits/build_metadata.py <SUBSET> --output_dir <OUTPUT_DIR> [--source <SOURCE>]
- `SUBSET`: The subset of the dataset to load. Options are `ObjaverseXL`, `ABO`, `3D-FUTURE`, `HSSD`, and `Toys4k`.
- `OUTPUT_DIR`: The directory to save the data.
- `SOURCE`: Required if `SUBSET` is `ObjaverseXL`. Options are `sketchfab` and `github`.
For example, to load the metadata of the ObjaverseXL (sketchfab) subset and save it to `datasets/ObjaverseXL_sketchfab`, we can run:
python dataset_toolkits/build_metadata.py ObjaverseXL --source sketchfab --output_dir datasets/ObjaverseXL_sketchfab
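The argument rules above (five valid subsets, `SOURCE` required only for `ObjaverseXL`) can be checked before launching anything. A minimal sketch of a hypothetical helper that assembles the `build_metadata.py` command line and rejects invalid combinations:

```python
import shlex
from typing import List, Optional

SUBSETS = ("ObjaverseXL", "ABO", "3D-FUTURE", "HSSD", "Toys4k")
OBJAVERSE_SOURCES = ("sketchfab", "github")


def build_metadata_argv(subset: str, output_dir: str, source: Optional[str] = None) -> List[str]:
    """Assemble the build_metadata.py command line, enforcing the documented rules."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset {subset!r}; expected one of {SUBSETS}")
    argv = ["python", "dataset_toolkits/build_metadata.py", subset]
    if subset == "ObjaverseXL":
        # SOURCE is documented as required for ObjaverseXL.
        if source not in OBJAVERSE_SOURCES:
            raise ValueError("ObjaverseXL needs source='sketchfab' or source='github'")
        argv += ["--source", source]
    elif source is not None:
        # Sources are only listed for ObjaverseXL; rejecting them elsewhere is a choice of this sketch.
        raise ValueError(f"source is not used for subset {subset!r}")
    argv += ["--output_dir", output_dir]
    return argv


if __name__ == "__main__":
    argv = build_metadata_argv("ObjaverseXL", "datasets/ObjaverseXL_sketchfab", source="sketchfab")
    print(shlex.join(argv))  # same command as the example above
```

To actually run it, pass the list to `subprocess.run(argv, check=True)`. Building an argument list instead of a shell string avoids quoting problems with unusual output paths.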
Next, we need to download the 3D assets.
python dataset_toolkits/download.py <SUBSET> --output_dir <OUTPUT_DIR> [--rank <RANK> --world_size <WORLD_SIZE>]
- `SUBSET`: The subset of the dataset to download. Options are `ObjaverseXL`, `ABO`, `3D-FUTURE`, `HSSD`, and `Toys4k`.
- `OUTPUT_DIR`: The directory to save the data.
You can also specify the `RANK` and `WORLD_SIZE` of the current process if you are using multiple nodes for data preparation.
For example, to download the ObjaverseXL (sketchfab) subset and save it to `datasets/ObjaverseXL_sketchfab`, we can run:
NOTE: The example command below sets a large `WORLD_SIZE` for demonstration purposes. Because this process handles only one of 160000 shards, only a small portion of the dataset will be downloaded.
python dataset_toolkits/download.py ObjaverseXL --output_dir datasets/ObjaverseXL_sketchfab --world_size 160000
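`RANK` and `WORLD_SIZE` split the work across processes. The toolkit's exact partitioning is not stated here, so the sketch below assumes a common contiguous split of the metadata rows. It also illustrates the note above: if the metadata held the 168307 filtered sketchfab assets from the table, rank 0 of 160000 would receive a single asset.

```python
from typing import List, Tuple


def shard_bounds(num_items: int, rank: int, world_size: int) -> Tuple[int, int]:
    """Half-open range [start, end) of items handled by `rank` out of `world_size`.

    Assumes a contiguous, near-even split of the metadata rows; the toolkit
    may partition differently.
    """
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError("require world_size >= 1 and 0 <= rank < world_size")
    start = num_items * rank // world_size
    end = num_items * (rank + 1) // world_size
    return start, end


def shard_sizes(num_items: int, world_size: int) -> List[int]:
    """Number of items each rank receives; the sizes always sum to num_items."""
    bounds = (shard_bounds(num_items, r, world_size) for r in range(world_size))
    return [end - start for start, end in bounds]


if __name__ == "__main__":
    print(shard_bounds(168307, 0, 160000))  # (0, 1): rank 0 handles a single asset
    print(shard_bounds(10, 1, 3))           # (3, 6)
```

On a multi-node setup, every process would receive the same `--world_size` and its own `--rank` from 0 to `WORLD_SIZE - 1`, so the shards together cover the whole subset exactly once.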
Some datasets may require interactive login to Hugging Face or manual downloading. Please follow the instructions given by the toolkits.
Multiview images can be rendered with:
python dataset_toolkits/render.py <SUBSET> --output_dir <OUTPUT_DIR> [--num_views <NUM_VIEWS>] [--rank <RANK> --world_size <WORLD_SIZE>]
- `SUBSET`: The subset of the dataset to render. Options are `ObjaverseXL`, `ABO`, `3D-FUTURE`, `HSSD`, and `Toys4k`.
- `OUTPUT_DIR`: The directory to save the data.
- `NUM_VIEWS`: The number of views to render. Default is 150.
- `RANK` and `WORLD_SIZE`: Multi-node configuration.
For example, to render the ObjaverseXL (sketchfab) subset and save it to `datasets/ObjaverseXL_sketchfab`, we can run:
python dataset_toolkits/render.py ObjaverseXL --output_dir datasets/ObjaverseXL_sketchfab
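`render.py` places `NUM_VIEWS` cameras around each asset, but how it chooses them is not described here. As an illustration only, this sketch spreads cameras evenly over a sphere with a Fibonacci lattice; the radius of 2.0 is a hypothetical value, not the toolkit's setting.

```python
import math
from typing import List, Tuple


def fibonacci_sphere_views(num_views: int = 150, radius: float = 2.0) -> List[Tuple[float, float, float]]:
    """Return `num_views` camera positions spread nearly uniformly on a sphere.

    Each camera is assumed to look at the origin, where the asset is centered.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))  # about 2.39996 rad
    views = []
    for i in range(num_views):
        # Heights step evenly from near +1 to near -1, giving equal-area bands.
        z = 1.0 - 2.0 * (i + 0.5) / num_views
        ring = math.sqrt(max(0.0, 1.0 - z * z))  # radius of the horizontal ring at height z
        theta = golden_angle * i  # rotate by the golden angle so views do not line up
        views.append((radius * ring * math.cos(theta), radius * ring * math.sin(theta), radius * z))
    return views


if __name__ == "__main__":
    cams = fibonacci_sphere_views()
    print(len(cams))  # 150, matching the documented default NUM_VIEWS
```

Evenly spread views mean every part of the surface is seen from several directions, which matters for the feature aggregation step later.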
We can voxelize the 3D models with:
python dataset_toolkits/voxelize.py <SUBSET> --output_dir <OUTPUT_DIR> [--rank <RANK> --world_size <WORLD_SIZE>]
- `SUBSET`: The subset of the dataset to voxelize. Options are `ObjaverseXL`, `ABO`, `3D-FUTURE`, `HSSD`, and `Toys4k`.
- `OUTPUT_DIR`: The directory to save the data.
- `RANK` and `WORLD_SIZE`: Multi-node configuration.
For example, to voxelize the ObjaverseXL (sketchfab) subset and save it to `datasets/ObjaverseXL_sketchfab`, we can run:
python dataset_toolkits/voxelize.py ObjaverseXL --output_dir datasets/ObjaverseXL_sketchfab
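`voxelize.py` turns each 3D model into a set of occupied voxels. As a rough illustration, assuming the mesh is normalized into the cube [-0.5, 0.5]^3, a 64^3 grid is used, and points have already been sampled on the surface (all assumptions of this sketch, not documented behavior), occupied voxels can be found by quantizing the points:

```python
import numpy as np


def voxelize_points(points: np.ndarray, resolution: int = 64) -> np.ndarray:
    """Return the unique (i, j, k) indices of voxels containing at least one point.

    Assumes points were sampled from a mesh normalized into [-0.5, 0.5]^3;
    points outside that cube are dropped.
    """
    pts = np.asarray(points, dtype=np.float64).reshape(-1, 3)
    inside = np.all((pts >= -0.5) & (pts <= 0.5), axis=1)
    # Map [-0.5, 0.5] onto [0, resolution] and take the containing cell.
    idx = np.floor((pts[inside] + 0.5) * resolution).astype(np.int64)
    # A coordinate of exactly +0.5 maps to `resolution`; fold it into the last cell.
    idx = np.minimum(idx, resolution - 1)
    return np.unique(idx, axis=0)
```

Quantizing sampled points can miss thin faces that pass between samples; an exact mesh voxelizer tests triangle and voxel intersection instead.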
To prepare the training data for the SLat VAE, we need to extract DINO features from the multiview images and aggregate them into sparse voxel grids:
python dataset_toolkits/extract_feature.py --output_dir <OUTPUT_DIR> [--rank <RANK> --world_size <WORLD_SIZE>]
- `OUTPUT_DIR`: The directory to save the data.
- `RANK` and `WORLD_SIZE`: Multi-node configuration.
For example, to extract DINO features from the ObjaverseXL (sketchfab) subset and save them to `datasets/ObjaverseXL_sketchfab`, we can run:
python dataset_toolkits/extract_feature.py --output_dir datasets/ObjaverseXL_sketchfab
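Conceptually, each voxel collects the DINO features of the pixels it projects to in the multiview renders. Assuming the per-view features have already been sampled at every voxel's projection, and that aggregation is a plain average over the views in which the voxel is visible (this sketch's assumption, not necessarily what `extract_feature.py` does), the aggregation step looks like:

```python
import numpy as np


def aggregate_view_features(features: np.ndarray, visible: np.ndarray) -> np.ndarray:
    """Average per-view voxel features over the views in which each voxel is visible.

    features: (V, N, C) feature vectors sampled for N voxels from V views.
    visible:  (V, N) boolean mask, True if voxel n projects into view v.
    Returns an (N, C) array; voxels never visible get zeros.
    """
    feats = np.asarray(features, dtype=np.float32)
    weights = np.asarray(visible, dtype=np.float32)[..., None]  # (V, N, 1)
    if feats.shape[:2] != weights.shape[:2]:
        raise ValueError("features and visible must agree on (V, N)")
    summed = (feats * weights).sum(axis=0)  # (N, C)
    counts = weights.sum(axis=0)  # (N, 1)
    return summed / np.maximum(counts, 1.0)  # avoid dividing by zero for unseen voxels
```

Masking before averaging keeps occluded or off-screen views from diluting a voxel's feature.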
Then update the metadata file with:
python dataset_toolkits/build_metadata.py ObjaverseXL --source sketchfab --output_dir datasets/ObjaverseXL_sketchfab
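Rerunning `build_metadata.py` presumably records which assets have finished each step, so later steps know what to process. If the metadata CSV keeps one boolean column per step keyed by asset (for example a `voxelized` column and a `sha256` key, both hypothetical names for this sketch), the assets still missing a step could be listed like this:

```python
import csv
import io
from typing import List, TextIO

# Spellings accepted as "step completed"; e.g. pandas writes booleans as True/False.
DONE = {"true", "1"}


def pending_assets(csv_file: TextIO, step_column: str, key_column: str = "sha256") -> List[str]:
    """Return keys of rows whose `step_column` flag is missing or not set.

    `step_column` and `key_column` are hypothetical column names.
    """
    pending = []
    for row in csv.DictReader(csv_file):
        flag = (row.get(step_column) or "").strip().lower()
        if flag not in DONE:
            pending.append(row[key_column])
    return pending


if __name__ == "__main__":
    demo = io.StringIO("sha256,voxelized\na,True\nb,False\nc,\n")
    print(pending_assets(demo, "voxelized"))  # ['b', 'c']
```

A missing column counts every asset as pending, which is the safe default when a step has never been run.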
To train the first-stage generator, encode the sparse structures into latents:
python dataset_toolkits/encode_ss_latent.py --output_dir <OUTPUT_DIR> [--rank <RANK> --world_size <WORLD_SIZE>]
- `OUTPUT_DIR`: The directory to save the data.
- `RANK` and `WORLD_SIZE`: Multi-node configuration.
For example, to encode the sparse structures into latents for the ObjaverseXL (sketchfab) subset and save them to `datasets/ObjaverseXL_sketchfab`, we can run:
python dataset_toolkits/encode_ss_latent.py --output_dir datasets/ObjaverseXL_sketchfab
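The first-stage generator works on the sparse structure, that is, which voxels are occupied. Assuming the encoder behind `encode_ss_latent.py` consumes a dense binary occupancy grid at resolution 64 with leading batch and channel axes (an assumption of this sketch, not a documented interface), sparse voxel indices could be turned into that input like this:

```python
import numpy as np


def to_dense_occupancy(indices: np.ndarray, resolution: int = 64) -> np.ndarray:
    """Scatter sparse (N, 3) voxel indices into a dense boolean grid."""
    idx = np.asarray(indices, dtype=np.int64).reshape(-1, 3)
    if idx.size and (idx.min() < 0 or idx.max() >= resolution):
        raise ValueError("voxel index out of range for this resolution")
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True  # duplicate indices are harmless
    return grid


def to_encoder_input(indices: np.ndarray, resolution: int = 64) -> np.ndarray:
    """Add batch and channel axes: shape (1, 1, R, R, R), float32 (assumed layout)."""
    return to_dense_occupancy(indices, resolution)[None, None].astype(np.float32)
```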
Then update the metadata file with:
python dataset_toolkits/build_metadata.py ObjaverseXL --source sketchfab --output_dir datasets/ObjaverseXL_sketchfab
To train the second-stage generator, encode the SLat:
python dataset_toolkits/encode_latent.py --output_dir <OUTPUT_DIR> [--rank <RANK> --world_size <WORLD_SIZE>]
- `OUTPUT_DIR`: The directory to save the data.
- `RANK` and `WORLD_SIZE`: Multi-node configuration.
For example, to encode SLat for the ObjaverseXL (sketchfab) subset and save it to `datasets/ObjaverseXL_sketchfab`, we can run:
python dataset_toolkits/encode_latent.py --output_dir datasets/ObjaverseXL_sketchfab
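SLat, the structured latent, attaches a latent vector to each active voxel of the sparse structure. A minimal sketch of storing one asset's SLat as coordinates plus features in an `.npz` archive; the key names `coords` and `feats`, the int32/float32 dtypes, and the latent width of 8 used in the test are assumptions, not the format `encode_latent.py` is guaranteed to write:

```python
import io
from typing import Tuple

import numpy as np


def save_slat(coords: np.ndarray, feats: np.ndarray) -> bytes:
    """Serialize one asset's SLat (voxel coordinates + per-voxel latents) to .npz bytes."""
    coords = np.asarray(coords, dtype=np.int32)
    feats = np.asarray(feats, dtype=np.float32)
    if coords.ndim != 2 or coords.shape[1] != 3:
        raise ValueError("coords must have shape (N, 3)")
    if feats.ndim != 2 or feats.shape[0] != coords.shape[0]:
        raise ValueError("feats must have shape (N, C), one row per voxel")
    buf = io.BytesIO()
    # 'coords' and 'feats' are hypothetical key names for this sketch.
    np.savez_compressed(buf, coords=coords, feats=feats)
    return buf.getvalue()


def load_slat(data: bytes) -> Tuple[np.ndarray, np.ndarray]:
    """Inverse of save_slat; arrays are read into memory before the archive closes."""
    with np.load(io.BytesIO(data)) as npz:
        return npz["coords"], npz["feats"]
```

Keeping coordinates and features in one archive guarantees that row `i` of `feats` always belongs to voxel `coords[i]`.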
Then update the metadata file with:
python dataset_toolkits/build_metadata.py ObjaverseXL --source sketchfab --output_dir datasets/ObjaverseXL_sketchfab
To train the image-conditioned generator, we need to render image conditions with augmented views:
python dataset_toolkits/render_cond.py <SUBSET> --output_dir <OUTPUT_DIR> [--num_views <NUM_VIEWS>] [--rank <RANK> --world_size <WORLD_SIZE>]
- `SUBSET`: The subset of the dataset to render. Options are `ObjaverseXL`, `ABO`, `3D-FUTURE`, `HSSD`, and `Toys4k`.
- `OUTPUT_DIR`: The directory to save the data.
- `NUM_VIEWS`: The number of views to render. Default is 24.
- `RANK` and `WORLD_SIZE`: Multi-node configuration.
For example, to render image conditions for the ObjaverseXL (sketchfab) subset and save them to `datasets/ObjaverseXL_sketchfab`, we can run:
python dataset_toolkits/render_cond.py ObjaverseXL --output_dir datasets/ObjaverseXL_sketchfab
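`render_cond.py` renders condition images from augmented views, but the augmentation itself is not specified here. A minimal sketch of one plausible scheme, assuming random view directions, a random field of view in a hypothetical 10 to 70 degree range, and a camera distance chosen so the asset's bounding sphere (radius sqrt(3)/2 for a unit cube, also an assumption) just fits in frame:

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CondView:
    yaw: float     # azimuth in radians, in [0, 2*pi]
    pitch: float   # elevation in radians, in [-pi/2, pi/2]
    fov: float     # field of view in radians
    radius: float  # camera distance from the object center


def sample_cond_views(
    num_views: int = 24,
    fov_range_deg: Tuple[float, float] = (10.0, 70.0),
    object_radius: float = math.sqrt(3.0) / 2.0,
    seed: Optional[int] = None,
) -> List[CondView]:
    """Sample randomized camera views for condition images (hypothetical ranges)."""
    rng = random.Random(seed)
    views = []
    for _ in range(num_views):
        yaw = rng.uniform(0.0, 2.0 * math.pi)
        # sin(pitch) uniform in [-1, 1] makes view directions uniform on the sphere.
        pitch = math.asin(rng.uniform(-1.0, 1.0))
        fov = math.radians(rng.uniform(*fov_range_deg))
        # Distance at which the bounding sphere exactly fills the field of view.
        radius = object_radius / math.sin(fov / 2.0)
        views.append(CondView(yaw, pitch, fov, radius))
    return views
```

Tying the distance to the field of view keeps the object at a similar apparent size while perspective distortion varies, which is one common reason to randomize FoV for conditioning images. The default of 24 views matches the documented `NUM_VIEWS` default.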
All of the above procedures are merged into a single shell script, which can be run with:
bash dataset_toolkits/dataset_pipe.sh
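The contents of `dataset_pipe.sh` are not shown here. For readers who prefer Python, a hypothetical driver that runs the commands from this document in the same order might look like the following. It passes `--source` only to `build_metadata.py`, the one script documented to take it, and defaults to a dry run that just prints the commands:

```python
import shlex
import subprocess
from typing import List, Optional

TOOLKIT = "dataset_toolkits"


def pipeline_commands(subset: str, output_dir: str, source: Optional[str] = None) -> List[List[str]]:
    """Commands in the order this document runs them (not necessarily dataset_pipe.sh's)."""
    def tool(script: str, *args: str) -> List[str]:
        return ["python", f"{TOOLKIT}/{script}", *args, "--output_dir", output_dir]

    meta_args = [subset] + (["--source", source] if source else [])
    return [
        tool("build_metadata.py", *meta_args),
        tool("download.py", subset),
        tool("render.py", subset),
        tool("voxelize.py", subset),
        tool("extract_feature.py"),
        tool("build_metadata.py", *meta_args),
        tool("encode_ss_latent.py"),
        tool("build_metadata.py", *meta_args),
        tool("encode_latent.py"),
        tool("build_metadata.py", *meta_args),
        tool("render_cond.py", subset),
    ]


def run_pipeline(subset: str, output_dir: str, source: Optional[str] = None, dry_run: bool = True) -> None:
    """Print each step; execute it only when dry_run is False, stopping on the first failure."""
    for argv in pipeline_commands(subset, output_dir, source):
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

For example, `run_pipeline("ObjaverseXL", "datasets/ObjaverseXL_sketchfab", source="sketchfab")` prints the full sequence; pass `dry_run=False` to execute it. `check=True` stops the run at the first failing step, so later steps never see incomplete outputs.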