This repository contains code for LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes. LUDVIG uses a learning-free approach to uplift visual features from models such as DINOv2, SAM and CLIP into 3D Gaussian Splatting scenes. It refines 3D features, such as coarse segmentation masks, through a graph diffusion process that incorporates the 3D geometry of the scene and DINOv2 feature similarities. We evaluate on foreground/background and open-vocabulary object segmentation tasks.
Illustration of the inverse and forward rendering between 2D visual features (produced by DINOv2) and a 3D Gaussian Splatting scene. In the inverse rendering (or uplifting) phase, a feature is created for each 3D Gaussian by aggregating coarse 2D features over all viewing directions. In forward rendering, the 3D features are projected onto any given viewing direction as in regular Gaussian Splatting.
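Schematically (the notation below is ours and not taken verbatim from the paper), the uplifted feature of Gaussian $i$ is a weighted average of the 2D features it contributes to across views, using the same per-pixel rendering weights $w_i(p, v)$ that Gaussian Splatting uses for alpha-blending:

$$
\mathbf{f}_i \;=\; \frac{\sum_{v} \sum_{p} w_i(p, v)\, \mathbf{F}_v(p)}{\sum_{v} \sum_{p} w_i(p, v)},
$$

where $\mathbf{F}_v(p)$ is the 2D feature at pixel $p$ in view $v$. Forward rendering reuses the same weights to project the 3D features $\mathbf{f}_i$ back onto any viewpoint.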
Clone the repo and `cd` into it:
```bash
git clone git@github.com:naver/ludvig.git
cd ludvig
```
Run the following script to set the paths to your CUDA dependencies, e.g. `cuda_path=/usr/local/cuda-11.8`:
```bash
bash script/set_cuda.sh ${cuda_path}
```
Modify the `pytorch-cuda` version in `environment.yml` to match your CUDA version, and create the `ludvig` environment:
```bash
mamba env create -f environment.yml
```
Our code has been tested on Ubuntu 22 and CUDA 11.8, with an A6000 Ada GPU (48GB of memory).
The project structure in `ludvig/` is as follows:
- `ludvig_*.py`: Main scripts for uplifting, graph diffusion and evaluation.
- `scripts/`: Bash scripts calling `ludvig_*.py`.
- `configs/`: Configuration files for different models and evaluation tasks.
- `diffusion/`: Classes for graph diffusion.
- `evaluation/`: Classes for evaluation, including segmentation on NVOS and SPIn-NeRF with SAM and DINOv2.
Additionally, you should have the following folders in `ludvig/` (e.g. as symbolic links to storage locations):
- `checkpoints/`: Where model checkpoints should be stored. Our experiments use DINOv2 ViT-g with registers, SAM ViT-H and SAM2 Hiera Large.
- `dataset/`: Where datasets should be stored.
- `logs/`: Where logs will be stored.
For this demo, we use the `stump` scene from Mip-NeRF 360, with the pretrained Gaussian Splatting representation provided by the authors of Gaussian Splatting. First download the scene and weights:
```bash
bash script/demo_download.sh
```
This saves the data in `dataset/stump` and the model weights in `dataset/stump/gs`.
The following script will uplift DINOv2 features and save visualizations of the uplifted features:
```bash
python demo.py
```
The script creates an instance of `ludvig_uplift.LUDVIGUplift` based on paths to the data and on the configuration `configs/demo.yaml`. It then runs uplifting through `model.uplift()` and saves 3D features and visualizations through `model.save()`.
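For reference, a minimal sketch of this flow, assuming `LUDVIGUplift` can be built directly from the configuration path (the constructor arguments below are hypothetical; see `demo.py` for the actual interface):

```python
# Sketch of the demo flow; the constructor call is hypothetical,
# demo.py resolves the data paths and configuration itself.
from ludvig_uplift import LUDVIGUplift

model = LUDVIGUplift("configs/demo.yaml")  # hypothetical: actual arguments may differ
model.uplift()  # build the DINOv2 dataset and aggregate 2D features into 3D features
model.save()    # write 3D features and visualizations to logs/demo/
```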
The method `model.uplift()`:
- creates a dataset from a subclass of `predictors.base.BaseDataset` that generates the feature maps to be uplifted,
- calls `utils.solver.uplifting` to uplift the 2D feature maps generated by the dataset.
Details on the uplifting function: `utils.solver.uplifting` takes the following arguments:
- `loader`: an iterable (in our case, an instance of `DINOv2Dataset`) that should yield `(feature, camera)` pairs, where
  - `feature` is a tensor of shape $(C, H, W)$,
  - `camera` is an instance of `gaussiansplatting.scene.cameras.Simple_Camera`.
- `gaussian`: the Gaussian Splatting model, an instance of `gaussiansplatting.scene.gaussian_model.GaussianModel`.
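As an illustration only, a stand-in loader satisfying this interface could look as follows; the call shown in the final comments assumes `loader` and `gaussian` are the only required arguments, which may not be the case (check `utils/solver.py` for the full signature):

```python
# Illustrative stand-in for predictors.dino.DINOv2Dataset: any iterable
# yielding (feature, camera) pairs can play the role of the loader.
class FeatureLoader:
    def __init__(self, features, cameras):
        self.features = features  # list of (C, H, W) tensors, one per view
        self.cameras = cameras    # list of Simple_Camera instances, one per view

    def __iter__(self):
        return iter(zip(self.features, self.cameras))

# Given `features`, `cameras` (from the scene) and `gaussian` (a GaussianModel):
#     from utils.solver import uplifting
#     features_3d = uplifting(loader=FeatureLoader(features, cameras), gaussian=gaussian)
# Additional keyword arguments may be required by the actual function.
```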
For constructing 2D DINOv2 feature maps, we use the dataset `predictors.dino.DINOv2Dataset` (as indicated in `demo.yaml`). The dataset loads the scene images, predicts DINOv2 features and performs dimensionality reduction. Currently supported feature dimensions are {1, 2, 3, 10, 20, 30, 40, 50, 100, 200, 256, 512}. If you need to uplift features with another dimension, you can add the option at line 421 here and recompile.
Directly uplifting existing 2D feature maps. If you already have features or masks to uplift, you can use `predictors.base.BaseDataset` instead. The path to your features should be given as the `directory` argument to the dataset, as in `configs/demo_rgb.yaml`. As a mock example, running `python demo.py --rgb` will directly uplift and reproject RGB images. Note that the names of your feature files should match the camera names (i.e. the names of the RGB images used for training).
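For instance, a sketch of how one might export per-view masks with matching names before pointing `directory` at the output folder (the file format and the `your_2d_predictor` helper are placeholders, not part of LUDVIG; see `configs/demo_rgb.yaml` for the convention used by the demo):

```python
# Sketch: save one 2D mask per training image, named after the image,
# so BaseDataset can pair each mask with its camera.
import os
import numpy as np
from PIL import Image

image_dir = "dataset/stump/images"      # training images (defines the camera names)
feature_dir = "dataset/stump/my_masks"  # passed as `directory` in the config

os.makedirs(feature_dir, exist_ok=True)
for name in sorted(os.listdir(image_dir)):
    stem = os.path.splitext(name)[0]
    mask = your_2d_predictor(os.path.join(image_dir, name))  # placeholder, (H, W) array in [0, 1]
    Image.fromarray((mask * 255).astype(np.uint8)).save(os.path.join(feature_dir, f"{stem}.png"))
```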
The method `model.save()` saves uplifted features and visualizations in `logs/demo/`. You can also define your own postprocessing or evaluation procedure by subclassing `evaluation.base.EvaluationBase` and adding it to the configuration file under `evaluation`. You can take our experimental setup for SPIn-NeRF and NVOS as an example.
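The exact interface expected from an evaluation class should be taken from `evaluation/base.py` and the SPIn-NeRF/NVOS classes; as a small, self-contained building block, a mask-IoU helper like the one below could be reused inside such a subclass:

```python
# Minimal IoU metric between two binary masks, as one might use when writing
# a custom evaluation. The EvaluationBase interface itself is not shown here;
# refer to evaluation/base.py for the methods to override.
import torch

def mask_iou(pred: torch.Tensor, gt: torch.Tensor) -> float:
    """IoU between two binary (H, W) masks."""
    pred, gt = pred.bool(), gt.bool()
    union = (pred | gt).sum().item()
    return (pred & gt).sum().item() / union if union > 0 else 1.0
```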
The datasets should be stored in `ludvig/dataset`. All experiments require a trained Gaussian Splatting representation of the scene saved in a `gs/` folder, i.e. under `dataset/${scene_path}/gs`, as indicated in the structures below.
Download instructions for SPIn-NeRF data:
- lego_real_night_radial: Download from Google Drive.
- Tanks & Temples (truck): Download from the Tanks & Temples website.
- NeRF LLFF data (fern, fortress, horns, leaves, orchids, room): Download from Google Drive.
- nerf supervision (fork): Download from Google Drive.
The segmentation masks from the SPIn-NeRF data can be found here. After downloading, organize the data to match the following structure:
```
dataset/
├── SPIn-NeRF_data/
│   ├── fork/
│   │   ├── images/
│   │   └── gs/
│   ├── lego/
│   └── ...
└── SPIn-NeRF_masks/
    ├── fork/
    ├── lego/
    └── ...
```
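If useful, a quick way to sanity-check this layout before launching the experiments (paths taken from the tree above):

```python
# Verify that each SPIn-NeRF scene has images/, gs/ and a matching mask folder.
from pathlib import Path

data_root = Path("dataset/SPIn-NeRF_data")
mask_root = Path("dataset/SPIn-NeRF_masks")

for scene in sorted(p for p in data_root.iterdir() if p.is_dir()):
    missing = [d for d in ("images", "gs") if not (scene / d).is_dir()]
    if missing:
        print(f"{scene.name}: missing {', '.join(missing)}")
    if not (mask_root / scene.name).is_dir():
        print(f"{scene.name}: no matching folder in {mask_root}")
```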
The LLFF dataset is available here. The scribbles and test masks are provided here. The data should have the following structure, with `gs/` containing the Gaussian Splatting logs:
```
dataset/
├── llff_data/
│   ├── fern/
│   │   ├── images/
│   │   └── gs/
│   └── ...
└── llff_masks/
    ├── masks/
    ├── reference_image/
    └── scribbles/
```
Uplifting and segmentation on SPIn-NeRF and NVOS:
```bash
bash script/seg.sh $scene $cfg
```
with the following arguments:
- `scene`: the scene to evaluate, e.g., `trex`, `horns_center`, etc.
- `cfg`: the configuration file for evaluation, e.g., `dif_NVOS`, `sam_SPIn` (see below).
Configuration files for foreground/background segmentation on SPIn-NeRF and NVOS:
- `sam_[NVOS|SPIn]`: segmentation on NVOS/SPIn-NeRF with SAM.
- `dif_[NVOS|SPIn]`: segmentation on NVOS/SPIn-NeRF with DINOv2 and graph diffusion.
- `xdif_SPIn`: segmentation on SPIn-NeRF with DINOv2, without graph diffusion.
- `depth_SPIn`: segmentation on SPIn-NeRF with mask uplifting and reprojection.
- `singleview_[sam|dinov2]_SPIn`: single-view segmentation with DINOv2 or SAM.
We evaluate on the extended version of the LERF dataset introduced by LangSplat. Download their data and save the Gaussian Splatting logs in a `gs/` folder as indicated in the structure below.
```
dataset/lerf_ovs/
├── figurines/
│   ├── images/
│   └── gs/
├── ...
└── label/
    ├── figurines/
    └── ...
```
1. Uplift CLIP and DINOv2 features:
   ```bash
   bash script/lerf_uplift.sh $scene
   ```
   with `scene` one of `figurines`, `ramen`, etc. The uplifted features are saved in `logs/lerf/$scene/clip` and `logs/lerf/$scene/dinov2`.
2. Evaluate the uplifted features on localization and segmentation tasks:
   ```bash
   bash script/lerf_eval.sh $scene $cfg [--no_diffusion]
   ```
   with `cfg` either `lerf_eval_sam` for segmentation with SAM or `lerf_eval` otherwise (automatic thresholding). Pass `--no_diffusion` to disable graph diffusion based on DINOv2 features. To reproduce our results on object localization, you can run `bash script/lerf_eval.sh $scene lerf_eval --no_diffusion`. The evaluation results (IoU and localization accuracy) are saved in `logs/lerf/$scene/iou.txt`, the mask predictions in `logs/lerf/$scene/masks*`, and the localization heatmaps in `logs/lerf/$scene/localization` (a small sketch for collecting these files across scenes follows below).
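Optionally, a minimal sketch for collecting the per-scene result files; it simply prints the saved `iou.txt` files, whose exact format is not described here:

```python
# Print the saved LERF evaluation results, one scene at a time.
from pathlib import Path

scenes = ("figurines", "ramen")  # add the remaining lerf_ovs scenes as needed
for scene in scenes:
    path = Path("logs/lerf") / scene / "iou.txt"
    if path.exists():
        print(f"=== {scene} ===")
        print(path.read_text())
```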
If you find our work useful, please consider citing us:
```bibtex
@article{marrie2024ludvig,
  title={LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes},
  author={Marrie, Juliette and Menegaux, Romain and Arbel, Michael and Larlus, Diane and Mairal, Julien},
  journal={arXiv preprint arXiv:2410.14462},
  year={2024}
}
```
For any inquiries or contributions, please reach out to [email protected].
LUDVIG, Copyright (C) 2024, 2025 Inria and NAVER Corp., CC BY-NC-SA 4.0 License.