This repository contains code for LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes. LUDVIG uses a learning-free approach to uplift visual features from models such as DINOv2, SAM and CLIP into 3D Gaussian Splatting scenes. It refines 3D features, such as coarse segmentation masks, through a graph diffusion process that incorporates the 3D geometry of the scene and DINOv2 feature similarities. We evaluate on foreground/background and open-vocabulary object segmentation tasks.
Illustration of the inverse and forward rendering between 2D visual features (produced by DINOv2) and a 3D Gaussian Splatting scene. In the inverse rendering (or uplifting) phase, a feature is created for each 3D Gaussian by aggregating coarse 2D features over all viewing directions. In forward rendering, the 3D features are projected onto any given viewing direction as in regular Gaussian Splatting.
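Schematically (the notation below is ours and not taken verbatim from the paper), the uplifted feature of Gaussian $i$ is a weighted average of the 2D features it contributes to across views, using the same per-pixel rendering weights $w_i(p, v)$ that Gaussian Splatting uses for alpha-blending:

$$
\mathbf{f}_i \;=\; \frac{\sum_{v} \sum_{p} w_i(p, v)\, \mathbf{F}_v(p)}{\sum_{v} \sum_{p} w_i(p, v)},
$$

where $\mathbf{F}_v(p)$ is the 2D feature at pixel $p$ in view $v$. Forward rendering reuses the same weights to project the 3D features $\mathbf{f}_i$ back onto any viewpoint.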
Clone the repo and `cd` into it:
```bash
git clone git@github.com:naver/ludvig.git
cd ludvig
```
Run the following script to set the paths to your CUDA dependencies, e.g. `cuda_path=/usr/local/cuda-11.8`:
```bash
bash script/set_cuda.sh ${cuda_path}
```
Modify the `pytorch-cuda` version in `environment.yml` to match your CUDA version, and create the `ludvig` environment:
```bash
mamba env create -f environment.yml
```
Our code has been tested on Ubuntu 22 and CUDA 11.8, with an A6000 Ada GPU (48GB of memory).
The project structure in `ludvig/` is as follows:
- `ludvig_*.py`: Main scripts for uplifting, graph diffusion and evaluation.
- `scripts/`: Bash scripts calling `ludvig_*.py`.
- `configs/`: Configuration files for different models and evaluation tasks.
- `diffusion/`: Classes for graph diffusion.
- `evaluation/`: Classes for evaluation, including segmentation on NVOS and SPIn-NeRF with SAM and DINOv2.
Additionally, you should have the following folders in `ludvig/` (e.g. as symbolic links to storage locations):
- `checkpoints/`: Where model checkpoints should be stored. Our experiments use DINOv2 ViT-g with registers, SAM ViT-H and SAM2 Hiera Large.
- `dataset/`: Where datasets should be stored.
- `logs/`: Where logs will be stored.
For this demo, we use the `stump` scene from Mip-NeRF 360, with the pretrained Gaussian Splatting representation provided by the authors of Gaussian Splatting. First download the scene and weights:
```bash
bash script/demo_download.sh
```
This saves the data in `dataset/stump` and the model weights in `dataset/stump/gs`.
The following script will uplift DINOv2 features and save visualizations of the uplifted features:
```bash
python demo.py
```
The script creates an instance of `ludvig_uplift.LUDVIGUplift` based on paths to the data and on the configuration `configs/demo.yaml`. It then runs uplifting through `model.uplift()` and saves 3D features and visualizations through `model.save()`.
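For reference, a minimal sketch of this flow, assuming `LUDVIGUplift` can be built directly from the configuration path (the constructor arguments below are hypothetical; see `demo.py` for the actual interface):

```python
# Sketch of the demo flow; the constructor call is hypothetical,
# demo.py resolves the data paths and configuration itself.
from ludvig_uplift import LUDVIGUplift

model = LUDVIGUplift("configs/demo.yaml")  # hypothetical: actual arguments may differ
model.uplift()  # build the DINOv2 dataset and aggregate 2D features into 3D features
model.save()    # write 3D features and visualizations to logs/demo/
```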
The method `model.uplift()`:
- creates a dataset from a subclass of `predictors.base.BaseDataset` that generates the feature maps to be uplifted,
- calls `utils.solver.uplifting` to uplift the 2D feature maps generated by the dataset.
Details on the uplifting function: `utils.solver.uplifting` takes the following arguments:
- `loader`: an iterable (in our case, an instance of `DINOv2Dataset`) that should yield `(feature, camera)` pairs, where
  - `feature` is a tensor of shape $(C, H, W)$,
  - `camera` is an instance of `gaussiansplatting.scene.cameras.Simple_Camera`.
- `gaussian`: the Gaussian Splatting model, an instance of `gaussiansplatting.scene.gaussian_model.GaussianModel`.
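As an illustration only, a stand-in loader satisfying this interface could look as follows; the call shown in the final comments assumes `loader` and `gaussian` are the only required arguments, which may not be the case (check `utils/solver.py` for the full signature):

```python
# Illustrative stand-in for predictors.dino.DINOv2Dataset: any iterable
# yielding (feature, camera) pairs can play the role of the loader.
class FeatureLoader:
    def __init__(self, features, cameras):
        self.features = features  # list of (C, H, W) tensors, one per view
        self.cameras = cameras    # list of Simple_Camera instances, one per view

    def __iter__(self):
        return iter(zip(self.features, self.cameras))

# Given `features`, `cameras` (from the scene) and `gaussian` (a GaussianModel):
#     from utils.solver import uplifting
#     features_3d = uplifting(loader=FeatureLoader(features, cameras), gaussian=gaussian)
# Additional keyword arguments may be required by the actual function.
```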
For constructing 2D DINOv2 feature maps, we use the dataset `predictors.dino.DINOv2Dataset` (as indicated in `demo.yaml`). The dataset loads the scene images, predicts DINOv2 features and performs dimensionality reduction. Currently supported feature dimensions are {1, 2, 3, 10, 20, 30, 40, 50, 100, 200, 256, 512}. If you need to uplift features with another dimension, you can add the option at line 421 here and recompile.
Directly uplifting existing 2D feature maps. If you already have features or masks to uplift, you can use `predictors.base.BaseDataset` instead. The path to your features should be given as the `directory` argument to the dataset, as in `configs/demo_rgb.yaml`. As a mock example, running `python demo.py --rgb` will directly uplift and reproject RGB images. Note that the names of your feature files should match the camera names (i.e. the names of the RGB images used for training).
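For instance, a sketch of how one might export per-view masks with matching names before pointing `directory` at the output folder (the file format and the `your_2d_predictor` helper are placeholders, not part of LUDVIG; see `configs/demo_rgb.yaml` for the convention used by the demo):

```python
# Sketch: save one 2D mask per training image, named after the image,
# so BaseDataset can pair each mask with its camera.
import os
import numpy as np
from PIL import Image

image_dir = "dataset/stump/images"      # training images (defines the camera names)
feature_dir = "dataset/stump/my_masks"  # passed as `directory` in the config

os.makedirs(feature_dir, exist_ok=True)
for name in sorted(os.listdir(image_dir)):
    stem = os.path.splitext(name)[0]
    mask = your_2d_predictor(os.path.join(image_dir, name))  # placeholder, (H, W) array in [0, 1]
    Image.fromarray((mask * 255).astype(np.uint8)).save(os.path.join(feature_dir, f"{stem}.png"))
```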
The method `model.save()` saves uplifted features and visualizations in `logs/demo/`. You can also define your own postprocessing or evaluation procedure by subclassing `evaluation.base.EvaluationBase` and adding it to the configuration file under `evaluation`. You can take our experimental setup for SPIn-NeRF and NVOS as an example.
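The exact interface expected from an evaluation class should be taken from `evaluation/base.py` and the SPIn-NeRF/NVOS classes; as a small, self-contained building block, a mask-IoU helper like the one below could be reused inside such a subclass:

```python
# Minimal IoU metric between two binary masks, as one might use when writing
# a custom evaluation. The EvaluationBase interface itself is not shown here;
# refer to evaluation/base.py for the methods to override.
import torch

def mask_iou(pred: torch.Tensor, gt: torch.Tensor) -> float:
    """IoU between two binary (H, W) masks."""
    pred, gt = pred.bool(), gt.bool()
    union = (pred | gt).sum().item()
    return (pred & gt).sum().item() / union if union > 0 else 1.0
```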
The datasets should be stored in `ludvig/dataset`. All experiments require a trained Gaussian Splatting representation of the scene saved in a `gs/` folder, i.e. under `dataset/${scene_path}/gs`, as indicated in the structures below.
Download instructions for SPIn-NeRF data:
- lego_real_night_radial: Download from Google Drive.
- Tanks & Temples (truck): Download from the Tanks & Temples website.
- NeRF LLFF data (fern, fortress, horns, leaves, orchids, room): Download from Google Drive.
- nerf supervision (fork): Download from Google Drive.
The segmentation masks from the SPIn-NeRF data can be found here. After downloading, organize the data to match the following structure:
```
dataset/
├── SPIn-NeRF_data/
│   ├── fork/
│   │   ├── images/
│   │   └── gs/
│   ├── lego/
│   └── ...
└── SPIn-NeRF_masks/
    ├── fork/
    ├── lego/
    └── ...
```
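If useful, a quick way to sanity-check this layout before launching the experiments (paths taken from the tree above):

```python
# Verify that each SPIn-NeRF scene has images/, gs/ and a matching mask folder.
from pathlib import Path

data_root = Path("dataset/SPIn-NeRF_data")
mask_root = Path("dataset/SPIn-NeRF_masks")

for scene in sorted(p for p in data_root.iterdir() if p.is_dir()):
    missing = [d for d in ("images", "gs") if not (scene / d).is_dir()]
    if missing:
        print(f"{scene.name}: missing {', '.join(missing)}")
    if not (mask_root / scene.name).is_dir():
        print(f"{scene.name}: no matching folder in {mask_root}")
```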
The LLFF dataset is available here. The scribbles and test masks are provided here. The data should have the following structure, with `gs/` containing the Gaussian Splatting logs:
```
dataset/
├── llff_data/
│   ├── fern/
│   │   ├── images/
│   │   └── gs/
│   └── ...
└── llff_masks/
    ├── masks/
    ├── reference_image/
    └── scribbles/
```
Uplifting and segmentation on SPIn-NeRF and NVOS:
```bash
bash script/seg.sh $scene $cfg
```
with the following arguments:
- `scene`: the scene to evaluate, e.g., `trex`, `horns_center`, etc.
- `cfg`: the configuration file for evaluation, e.g., `dif_NVOS`, `sam_SPIn` (see below).
Configuration files for foreground/background segmentation on SPIn-NeRF and NVOS:
- `sam_[NVOS|SPIn]`: segmentation on NVOS/SPIn-NeRF with SAM.
- `dif_[NVOS|SPIn]`: segmentation on NVOS/SPIn-NeRF with DINOv2 and graph diffusion.
- `xdif_SPIn`: segmentation on SPIn-NeRF with DINOv2, without graph diffusion.
- `depth_SPIn`: segmentation on SPIn-NeRF with mask uplifting and reprojection.
- `singleview_[sam|dinov2]_SPIn`: single-view segmentation with DINOv2 or SAM.
We evaluate on the extended version of the LERF dataset introduced by LangSplat. Download their data and save the Gaussian Splatting logs in a `gs/` folder as indicated in the structure below.
```
dataset/lerf_ovs/
├── figurines/
│   ├── images/
│   └── gs/
├── ...
└── label/
    ├── figurines/
    └── ...
```
1. Uplift CLIP and DINOv2 features:
   ```bash
   bash script/lerf_uplift.sh $scene
   ```
   with `scene` one of `figurines`, `ramen`, etc. The uplifted features are saved in `logs/lerf/$scene/clip` and `logs/lerf/$scene/dinov2`.
2. Evaluate the uplifted features on localization and segmentation tasks:
   ```bash
   bash script/lerf_eval.sh $scene $cfg [--no_diffusion]
   ```
   with `cfg` either `lerf_eval_sam` for segmentation with SAM or `lerf_eval` otherwise (automatic thresholding). Pass `--no_diffusion` to disable graph diffusion based on DINOv2 features. To reproduce our results on object localization, you can run `bash script/lerf_eval.sh $scene lerf_eval --no_diffusion`. The evaluation results (IoU and localization accuracy) are saved in `logs/lerf/$scene/iou.txt`, the mask predictions in `logs/lerf/$scene/masks*`, and the localization heatmaps in `logs/lerf/$scene/localization` (a small sketch for collecting these files across scenes follows below).
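Optionally, a minimal sketch for collecting the per-scene result files; it simply prints the saved `iou.txt` files, whose exact format is not described here:

```python
# Print the saved LERF evaluation results, one scene at a time.
from pathlib import Path

scenes = ("figurines", "ramen")  # add the remaining lerf_ovs scenes as needed
for scene in scenes:
    path = Path("logs/lerf") / scene / "iou.txt"
    if path.exists():
        print(f"=== {scene} ===")
        print(path.read_text())
```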
If you find our work useful, please consider citing us:
```bibtex
@article{marrie2024ludvig,
  title={LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes},
  author={Marrie, Juliette and Menegaux, Romain and Arbel, Michael and Larlus, Diane and Mairal, Julien},
  journal={arXiv preprint arXiv:2410.14462},
  year={2024}
}
```
For any inquiries or contributions, please reach out to [email protected].
LUDVIG, Copyright (C) 2024, 2025 Inria and NAVER Corp., CC BY-NC-SA 4.0 License.