
LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes

This repository contains code for LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes. LUDVIG uses a learning-free approach to uplift visual features from models such as DINOv2, SAM and CLIP into 3D Gaussian Splatting scenes. It refines 3D features, such as coarse segmentation masks, through a graph diffusion process that incorporates the 3D geometry of the scene and DINOv2 feature similarities. We evaluate on foreground/background and open-vocabulary object segmentation tasks.

LUDVIG Main Figure

Illustration of the inverse and forward rendering between 2D visual features (produced by DINOv2) and a 3D Gaussian Splatting scene. In the inverse rendering (or uplifting) phase, features are created for each 3D Gaussian by aggregating coarse 2D features over all viewing directions. For forward rendering, the 3D features are projected onto any given viewing direction as in regular Gaussian Splatting.

Table of Contents

  1. Setup
  2. Project Structure
  3. Demo
  4. Reproducing results
  5. Citing LUDVIG
  6. License

Setup

Clone the repo and cd into it

git clone [email protected]:naver/ludvig.git
cd ludvig

Run the following script to set the paths to your CUDA dependencies, e.g. cuda_path=/usr/local/cuda-11.8:

bash script/set_cuda.sh ${cuda_path}

Modify the pytorch-cuda version in environment.yml to match your CUDA version, and create the ludvig environment:

mamba env create -f environment.yml

Our code has been tested on Ubuntu 22 with CUDA 11.8 and an A6000 Ada GPU (48GB of memory).
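
Once the environment is activated, you can verify that PyTorch was built for the expected CUDA version and sees your GPU:

python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"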

Project Structure

The project in ludvig/ is organized as follows:

  • ludvig_*.py: Main scripts for uplifting, graph diffusion and evaluation.
  • script/: Bash scripts calling ludvig_*.py.
  • configs/: Configuration files for different models and evaluation tasks.
  • diffusion/: Classes for graph diffusion.
  • evaluation/: Classes for evaluation, including segmentation on NVOS and SPIn-NeRF with SAM and DINOv2.

Additionally, you should have the following folders in ludvig/ (e.g. as symbolic links to storage locations):

  • dataset/: scene data and pretrained Gaussian Splatting representations (see Reproducing results).
  • logs/: outputs such as uplifted features, visualizations and evaluation results.

Demo

Data

For this demo, we use the stump scene from Mip-NeRF 360, with the pretrained Gaussian Splatting representation provided by the authors of Gaussian Splatting.
First download the scene and weights:

bash script/demo_download.sh

This saves the data in dataset/stump and model weights in dataset/stump/gs.

Running the demo

The following script will uplift DINOv2 features and save visualizations of the uplifted features:

python demo.py

The script creates an instance of ludvig_uplift.LUDVIGUplift based on paths to the data and on the configuration configs/demo.yaml.
It then runs uplifting through model.uplift() and saves 3D features and visualizations through model.save().
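
Schematically, demo.py does the following (the constructor arguments shown here are illustrative, not the exact signature; refer to demo.py for the actual call):

from ludvig_uplift import LUDVIGUplift

# Constructor arguments are illustrative; see demo.py for the exact call.
model = LUDVIGUplift(config='configs/demo.yaml', data_dir='dataset/stump')
model.uplift()  # inverse rendering: aggregate 2D features onto the Gaussians
model.save()    # write 3D features and visualizations to logs/demo/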

Feature map generation and uplifting

The method model.uplift() generates the 2D feature maps and uplifts them into 3D features attached to the Gaussians.
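
The uplifting itself amounts to a weighted average of 2D features, using each Gaussian's alpha-compositing contribution to a pixel as the aggregation weight. The sketch below is illustrative only: the tensor layout and the dense weight matrix are assumptions, and in practice the accumulation happens inside the CUDA rasterizer rather than on a materialized (V, P, G) tensor.

import torch

def uplift_features(features_2d, weights, eps=1e-8):
    # features_2d: (V, P, C) coarse 2D features over V views of P pixels each.
    # weights: (V, P, G) alpha-compositing contribution of each of the G
    #          Gaussians to each pixel (zero where a Gaussian is not visible).
    num = torch.einsum('vpg,vpc->gc', weights, features_2d)  # weighted feature sum
    den = weights.sum(dim=(0, 1)).unsqueeze(-1) + eps        # total weight per Gaussian
    return num / den                                         # (G, C) 3D features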

For constructing 2D DINOv2 feature maps, we use the dataset predictors.dino.DINOv2Dataset (as indicated in demo.yaml).
The dataset loads the scene images, predicts DINOv2 features and performs dimensionality reduction.
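
The sketch below shows the kind of computation this dataset performs, using the public DINOv2 torch.hub entry point and a PCA reduction; it is a simplified stand-in for predictors.dino.DINOv2Dataset, not its actual implementation:

import torch

# Public DINOv2 torch.hub entry point (ViT-B/14).
dino = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitb14').eval()

@torch.no_grad()
def dino_feature_map(image, n_components=3):
    # image: (1, 3, H, W) ImageNet-normalized tensor, H and W multiples of 14.
    h, w = image.shape[2] // 14, image.shape[3] // 14
    feats = dino.forward_features(image)['x_norm_patchtokens'][0]  # (h*w, 768)
    # Dimensionality reduction: project onto the top principal components.
    feats = feats - feats.mean(dim=0)
    _, _, V = torch.pca_lowrank(feats, q=n_components)
    return (feats @ V).reshape(h, w, n_components)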

Currently supported feature dimensions are {1, 2, 3, 10, 20, 30, 40, 50, 100, 200, 256, 512}. If you need to uplift features with another dimension, you can add the option at line 421 of the rasterizer source and recompile.

Directly uplifting existing 2D feature maps. If you already have 2D features or masks to uplift, you can use predictors.base.BaseDataset instead. The path to your features should be given as the directory argument to the dataset, as in configs/demo_rgb.yaml. As a mock example, running python demo.py --rgb will directly uplift and reproject RGB images.
Note that the names of your feature files should match the camera names (i.e. the names of the RGB images used for training), as in the sketch below.
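
For instance, a set of per-view masks could be prepared as follows (a hypothetical sketch: the .npy format and the my_features directory are assumptions; check configs/demo_rgb.yaml and predictors.base.BaseDataset for the expected inputs):

import os
import numpy as np
from PIL import Image

src = 'dataset/stump/images'       # training images; their stems are the camera names
dst = 'dataset/stump/my_features'  # hypothetical directory passed to BaseDataset
os.makedirs(dst, exist_ok=True)
for name in os.listdir(src):
    img = np.asarray(Image.open(os.path.join(src, name)).convert('RGB'))
    mask = (img.mean(axis=-1) > 127).astype(np.float32)  # dummy per-view "feature"
    # One file per view, named after the camera (the RGB image stem).
    np.save(os.path.join(dst, os.path.splitext(name)[0] + '.npy'), mask)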

Visualization and evaluation

The method model.save() saves the uplifted features and visualizations in logs/demo/.
You can also define your own postprocessing or evaluation procedure by subclassing evaluation.base.EvaluationBase and adding it to the configuration file under evaluation, as sketched below. You can take our experimental setups for SPIn-NeRF and NVOS as examples.
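
A custom evaluation could look roughly like the following sketch; the evaluate method name and signature are assumptions, so check evaluation/base.py for the actual abstract interface:

from evaluation.base import EvaluationBase

class BinaryIoUEvaluation(EvaluationBase):
    # Hypothetical example: IoU between a rendered 2D mask and ground truth.
    def evaluate(self, rendered, ground_truth):  # assumed method name and signature
        pred, gt = rendered > 0.5, ground_truth > 0.5
        inter = (pred & gt).sum()
        union = (pred | gt).sum()
        return {'iou': (inter / union).item()}

The subclass would then be referenced under the evaluation key of your configuration file.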

3D DINOv2 PCA features rendered on the first stump image from Mip-NeRF 360

Reproducing results

The datasets should be stored in ludvig/dataset.
All experiments require a trained Gaussian Splatting representation of the scene, saved under dataset/${scene_path}/gs as indicated in the structures below.

Foreground/background segmentation

Data

SPIn-NeRF

The segmentation masks for the SPIn-NeRF data can be found here. After downloading, organize the data to match the following structure:

dataset/
├── SPIn-NeRF_data/
│   ├── fork/
│   │   ├── images/
│   │   └── gs/  
│   ├── lego/
│   └── ...
└── SPIn-NeRF_masks/
    ├── fork/
    ├── lego/
    └── ...

NVOS

The LLFF dataset is available here. The scribbles and test masks are provided here.
The data should have the following structure, with gs/ containing the Gaussian Splatting logs:

dataset/
├── llff_data/
│   ├── fern/
│   │   ├── images/
│   │   └── gs/  
│   └── ...
└── llff_masks/
    ├── masks/
    ├── reference_image/
    └── scribbles/

Uplifting and evaluation

Uplifting and segmentation on SPIn-NeRF and NVOS:

    bash script/seg.sh $scene $cfg

with the following arguments:

  • scene: The scene to evaluate, e.g., trex, horns_center, etc.
  • cfg: Configuration file for evaluation, e.g., dif_NVOS, sam_SPIn (see below).

Configuration files for foreground/background segmentation on SPIn-NeRF and NVOS:
  • sam_[NVOS|SPIn]: segmentation on NVOS/SPIn-NeRF with SAM.
  • dif_[NVOS|SPIn]: segmentation on NVOS/SPIn-NeRF with DINOv2 and graph diffusion.
  • xdif_SPIn: segmentation on SPIn-NeRF with DINOv2, without graph diffusion.
  • depth_SPIn: segmentation on SPIn-NeRF with mask uplifting and reprojection.
  • singleview_[sam|dinov2]_SPIn: single-view segmentation with DINOv2 or SAM.
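
For example, to run segmentation with graph diffusion on the trex scene from NVOS:

    bash script/seg.sh trex dif_NVOS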

Open-vocabulary object detection

Data

We evaluate on the extended version of the LERF dataset introduced by LangSplat. Download their data and save Gaussian Splatting logs in a gs/ folder as indicated in the structure below.

dataset/lerf_ovs
├── figurines/
│   ├── images/
│   └── gs/
├── ...
└── label/
    ├── figurines/
    └── ...   

Uplifting and evaluation

  1. Uplift CLIP and DINOv2 features:

    bash script/lerf_uplift.sh $scene

    with scene one of figurines, ramen, etc.
    The uplifted features are saved in logs/lerf/$scene/clip and logs/lerf/$scene/dinov2.

  2. Evaluate the uplifted features on localization and segmentation tasks:

    bash script/lerf_eval.sh $scene $cfg [--no_diffusion]

    with cfg either lerf_eval_sam for segmentation with SAM or lerf_eval otherwise (automatic thresholding). Pass --no_diffusion to disable graph diffusion based on DINOv2 features.
    To reproduce our results on object localization, you can run bash script/lerf_eval.sh $scene lerf_eval --no_diffusion.
    The evaluation results (IoU and localization accuracy) are saved in logs/lerf/$scene/iou.txt, the mask predictions in logs/lerf/$scene/masks*, and the localization heatmaps in logs/lerf/$scene/localization.

Citing LUDVIG

If you find our work useful, please consider citing us:

@article{marrie2024ludvig,
    title={LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes},
    author={Marrie, Juliette and Menegaux, Romain and Arbel, Michael and Larlus, Diane and Mairal, Julien},
    journal={arXiv preprint arXiv:2410.14462},
    year={2024}
}

For any inquiries or contributions, please reach out to [email protected].

License

LUDVIG, Copyright (C) 2024, 2025 Inria and NAVER Corp., CC BY-NC-SA 4.0 License.
