Release v1.1.0

ZexinHe committed Mar 4, 2024
1 parent d4caebb commit 03465ae
Showing 72 changed files with 5,240 additions and 598 deletions.
47 changes: 27 additions & 20 deletions README.md
@@ -20,6 +20,7 @@

## News

+ - [2024.03.04] Version v1.1 update: released model weights trained on both Objaverse and MVImgNet, and majorly refactored the codebase for better usability and extensibility. Please refer to [v1.1.0](https://github.com/3DTopia/OpenLRM/releases/tag/v1.1.0) for details.
- [2024.01.09] Updated all v1.0 models trained on Objaverse. Please refer to [HF Models](https://huggingface.co/zxhezexin) and overwrite previous model weights.
- [2023.12.21] [Hugging Face Demo](https://huggingface.co/spaces/zxhezexin/OpenLRM) is online. Have a try!
- [2023.12.20] Release weights of the base and large models trained on Objaverse.
@@ -34,9 +35,11 @@ cd OpenLRM
```

### Environment
- ```
- pip install -r requirements.txt
- ```
+ - Install the requirements for OpenLRM first.
+   ```
+   pip install -r requirements.txt
+   ```
+ - Then follow the [xFormers installation guide](https://github.com/facebookresearch/xformers?tab=readme-ov-file#installing-xformers) to enable memory-efficient attention inside the [DINOv2 encoder](openlrm/models/encoders/dinov2/layers/attention.py).
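
After installing, a quick way to confirm the install took effect is to check that xFormers is importable. This is a minimal editor-added sketch, not part of the repo:

```
# Minimal check (illustrative, not part of OpenLRM): confirm xFormers is importable.
try:
    import xformers
    print(f"xFormers {xformers.__version__} is available.")
except ImportError:
    print("xFormers not found; follow the installation guide linked above.")
```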

## Quick Start

@@ -46,14 +49,14 @@ pip install -r requirements.txt
- Weights will be downloaded automatically when you run the inference script for the first time.
- Please be aware of the [license](LICENSE_WEIGHT) before using the weights.

- | Model | Training Data | Layers | Feat. Dim | Trip. Dim. | Render Res. | Link |
+ | Model | Training Data | Layers | Feat. Dim | Trip. Dim. | In. Res. | Link |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
- | openlrm-small-obj-1.0 | Objaverse | 12 | 768 | 32 | 192 | [HF](https://huggingface.co/zxhezexin/openlrm-small-obj-1.0) |
- | openlrm-base-obj-1.0 | Objaverse | 12 | 1024 | 40 | 192 | [HF](https://huggingface.co/zxhezexin/openlrm-base-obj-1.0) |
- | openlrm-large-obj-1.0 | Objaverse | 16 | 1024 | 80 | 384 | [HF](https://huggingface.co/zxhezexin/openlrm-large-obj-1.0) |
- | openlrm-small | Objaverse + MVImgNet | 12 | 768 | 32 | 192 | To be released |
- | openlrm-base | Objaverse + MVImgNet | 12 | 1024 | 40 | 192 | To be released |
- | openlrm-large | Objaverse + MVImgNet | 16 | 1024 | 80 | 384 | To be released |
+ | openlrm-obj-small-1.1 | Objaverse | 12 | 512 | 32 | 224 | [HF](https://huggingface.co/zxhezexin/openlrm-obj-small-1.1) |
+ | openlrm-obj-base-1.1 | Objaverse | 12 | 768 | 48 | 336 | [HF](https://huggingface.co/zxhezexin/openlrm-obj-base-1.1) |
+ | openlrm-obj-large-1.1 | Objaverse | 16 | 1024 | 80 | 448 | [HF](https://huggingface.co/zxhezexin/openlrm-obj-large-1.1) |
+ | openlrm-mix-small-1.1 | Objaverse + MVImgNet | 12 | 512 | 32 | 224 | [HF](https://huggingface.co/zxhezexin/openlrm-mix-small-1.1) |
+ | openlrm-mix-base-1.1 | Objaverse + MVImgNet | 12 | 768 | 48 | 336 | [HF](https://huggingface.co/zxhezexin/openlrm-mix-base-1.1) |
+ | openlrm-mix-large-1.1 | Objaverse + MVImgNet | 16 | 1024 | 80 | 448 | [HF](https://huggingface.co/zxhezexin/openlrm-mix-large-1.1) |

Model cards with additional details can be found in [model_card.md](model_card.md).
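
Weights download automatically on first inference, but they can also be pre-fetched. A hedged sketch using the standard `huggingface_hub` client (this is a general Hub API, not an OpenLRM utility, and assumes `huggingface_hub` is installed):

```
# Sketch: pre-fetch model weights from the Hugging Face Hub into the local cache.
from huggingface_hub import snapshot_download

# Any repo id from the table above works here.
local_dir = snapshot_download(repo_id="zxhezexin/openlrm-mix-base-1.1")
print(f"Weights cached at: {local_dir}")
```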

@@ -63,16 +66,20 @@ Model cards with additional details can be found in [model_card.md](model_card.md).

### Inference
- Run the inference script to get 3D assets.
- - You may specify which form of output to generate by setting the flags `--export_video` and `--export_mesh`.
- ```
- # Example usages
- # Render a video
- python -m lrm.inferrer --model_name openlrm-base-obj-1.0 --source_image ./assets/sample_input/owl.png --export_video
- # Export mesh
- python -m lrm.inferrer --model_name openlrm-base-obj-1.0 --source_image ./assets/sample_input/owl.png --export_mesh
- ```
+ - You may specify which form of output to generate by setting the flags `EXPORT_VIDEO=true` and `EXPORT_MESH=true`.
+ - Set `INFER_CONFIG` to match the model in use, e.g. `infer-b.yaml` for base models and `infer-s.yaml` for small models.
+ - An example usage is as follows:
+ ```
+ # Example usage
+ EXPORT_VIDEO=true
+ EXPORT_MESH=true
+ INFER_CONFIG="./configs/infer-b.yaml"
+ MODEL_NAME="zxhezexin/openlrm-mix-base-1.1"
+ IMAGE_INPUT="./assets/sample_input/owl.png"
+ python -m openlrm.launch infer.lrm --infer $INFER_CONFIG model_name=$MODEL_NAME image_input=$IMAGE_INPUT export_video=$EXPORT_VIDEO export_mesh=$EXPORT_MESH
+ ```
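
For scripted use, the same runner can also be driven from Python. A hedged sketch mirroring the wiring in `app.py` from this commit; the environment variables, runner registry, and `infer_single` signature are all taken from that file, while the input image, config choice, and output paths are illustrative:

```
# Sketch: programmatic inference, mirroring the wiring in app.py (editor-added).
import os

# app.py configures the runner through these environment variables.
os.environ.update({
    "APP_ENABLED": "1",
    "APP_MODEL_NAME": "zxhezexin/openlrm-mix-base-1.1",
    "APP_INFER": "./configs/infer-b.yaml",
    "APP_TYPE": "infer.lrm",
})

from openlrm.runners import REGISTRY_RUNNERS

InferrerClass = REGISTRY_RUNNERS[os.environ["APP_TYPE"]]
with InferrerClass() as inferrer:
    inferrer.infer_single(
        image_path="./assets/sample_input/owl.png",  # illustrative input
        source_cam_dist=2.0,
        export_video=True,
        export_mesh=True,
        dump_video_path="./owl.mp4",  # illustrative output paths
        dump_mesh_path="./owl.ply",
    )
```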

## Training
To be released soon.
210 changes: 210 additions & 0 deletions app.py
@@ -0,0 +1,210 @@
# Copyright (c) 2023-2024, Zexin He
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


import os
from PIL import Image
import numpy as np
import gradio as gr


def assert_input_image(input_image):
    if input_image is None:
        raise gr.Error("No image selected or uploaded!")

def prepare_working_dir():
    import tempfile
    working_dir = tempfile.TemporaryDirectory()
    return working_dir

def init_preprocessor():
    from openlrm.utils.preprocess import Preprocessor
    global preprocessor
    preprocessor = Preprocessor()

def preprocess_fn(image_in: np.ndarray, remove_bg: bool, recenter: bool, working_dir):
    image_raw = os.path.join(working_dir.name, "raw.png")
    with Image.fromarray(image_in) as img:
        img.save(image_raw)
    image_out = os.path.join(working_dir.name, "rembg.png")
    success = preprocessor.preprocess(image_path=image_raw, save_path=image_out, rmbg=remove_bg, recenter=recenter)
    assert success, "Failed under preprocess_fn!"
    return image_out


def demo_openlrm(infer_impl):

    def core_fn(image: str, source_cam_dist: float, working_dir):
        dump_video_path = os.path.join(working_dir.name, "output.mp4")
        dump_mesh_path = os.path.join(working_dir.name, "output.ply")
        infer_impl(
            image_path=image,
            source_cam_dist=source_cam_dist,
            export_video=True,
            export_mesh=False,
            dump_video_path=dump_video_path,
            dump_mesh_path=dump_mesh_path,
        )
        return dump_video_path

    def example_fn(image: np.ndarray):
        from gradio.utils import get_cache_folder
        working_dir = get_cache_folder()
        image = preprocess_fn(
            image_in=image,
            remove_bg=True,
            recenter=True,
            working_dir=working_dir,
        )
        video = core_fn(
            image=image,
            source_cam_dist=2.0,
            working_dir=working_dir,
        )
        return image, video


    _TITLE = '''OpenLRM: Open-Source Large Reconstruction Models'''

    _DESCRIPTION = '''
    <div>
        <a style="display:inline-block" href='https://github.com/3DTopia/OpenLRM'><img src='https://img.shields.io/github/stars/3DTopia/OpenLRM?style=social'/></a>
        <a style="display:inline-block; margin-left: .5em" href="https://huggingface.co/zxhezexin"><img src='https://img.shields.io/badge/Model-Weights-blue'/></a>
    </div>
    OpenLRM is an open-source implementation of Large Reconstruction Models.
    <strong>Image-to-3D in 10 seconds!</strong>
    <strong>Disclaimer:</strong> This demo uses the `openlrm-mix-base-1.1` model at 288x288 rendering resolution for a quick demonstration.
    '''

    with gr.Blocks(analytics_enabled=False) as demo:

        # HEADERS
        with gr.Row():
            with gr.Column(scale=1):
                gr.Markdown('# ' + _TITLE)
        with gr.Row():
            gr.Markdown(_DESCRIPTION)

        # DISPLAY
        with gr.Row():

            with gr.Column(variant='panel', scale=1):
                with gr.Tabs(elem_id="openlrm_input_image"):
                    with gr.TabItem('Input Image'):
                        with gr.Row():
                            input_image = gr.Image(label="Input Image", image_mode="RGBA", width="auto", sources="upload", type="numpy", elem_id="content_image")

            with gr.Column(variant='panel', scale=1):
                with gr.Tabs(elem_id="openlrm_processed_image"):
                    with gr.TabItem('Processed Image'):
                        with gr.Row():
                            processed_image = gr.Image(label="Processed Image", image_mode="RGBA", type="filepath", elem_id="processed_image", width="auto", interactive=False)

            with gr.Column(variant='panel', scale=1):
                with gr.Tabs(elem_id="openlrm_render_video"):
                    with gr.TabItem('Rendered Video'):
                        with gr.Row():
                            output_video = gr.Video(label="Rendered Video", format="mp4", width="auto", autoplay=True)

        # SETTING
        with gr.Row():
            with gr.Column(variant='panel', scale=1):
                with gr.Tabs(elem_id="openlrm_attrs"):
                    with gr.TabItem('Settings'):
                        with gr.Column(variant='panel'):
                            gr.Markdown(
                                """
                                <strong>Best Practice</strong>:
                                Use a centered object of reasonable size, and try adjusting the source camera distance.
                                """
                            )
                            checkbox_rembg = gr.Checkbox(True, label='Remove background')
                            checkbox_recenter = gr.Checkbox(True, label='Recenter the object')
                            slider_cam_dist = gr.Slider(1.0, 3.5, value=2.0, step=0.1, label="Source Camera Distance")
                            submit = gr.Button('Generate', elem_id="openlrm_generate", variant='primary')

        # EXAMPLES
        with gr.Row():
            examples = [
                ['assets/sample_input/owl.png'],
                ['assets/sample_input/building.png'],
                ['assets/sample_input/mailbox.png'],
                ['assets/sample_input/fire.png'],
                ['assets/sample_input/girl.png'],
                ['assets/sample_input/lamp.png'],
                ['assets/sample_input/hydrant.png'],
                ['assets/sample_input/hotdogs.png'],
                ['assets/sample_input/traffic.png'],
                ['assets/sample_input/ceramic.png'],
            ]
            gr.Examples(
                examples=examples,
                inputs=[input_image],
                outputs=[processed_image, output_video],
                fn=example_fn,
                cache_examples=os.getenv('SYSTEM') != 'spaces',
                examples_per_page=20,
            )

        working_dir = gr.State()
        submit.click(
            fn=assert_input_image,
            inputs=[input_image],
            queue=False,
        ).success(
            fn=prepare_working_dir,
            outputs=[working_dir],
            queue=False,
        ).success(
            fn=preprocess_fn,
            inputs=[input_image, checkbox_rembg, checkbox_recenter, working_dir],
            outputs=[processed_image],
        ).success(
            fn=core_fn,
            inputs=[processed_image, slider_cam_dist, working_dir],
            outputs=[output_video],
        )

    demo.queue()
    demo.launch()
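
A note on the wiring above: each `.success(...)` handler fires only if the preceding step finished without raising, so a failed input check in `assert_input_image` short-circuits the whole preprocess-then-infer chain.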


def launch_gradio_app():

    os.environ.update({
        "APP_ENABLED": "1",
        "APP_MODEL_NAME": "zxhezexin/openlrm-mix-base-1.1",
        "APP_INFER": "./configs/infer-gradio.yaml",
        "APP_TYPE": "infer.lrm",
        "NUMBA_THREADING_LAYER": 'omp',
    })

    from openlrm.runners import REGISTRY_RUNNERS
    from openlrm.runners.infer.base_inferrer import Inferrer
    InferrerClass: type[Inferrer] = REGISTRY_RUNNERS[os.getenv("APP_TYPE")]
    with InferrerClass() as inferrer:
        init_preprocessor()
        if os.getenv('SYSTEM') != 'spaces':
            from openlrm.utils.proxy import no_proxy
            demo = no_proxy(demo_openlrm)
        else:
            demo = demo_openlrm
        demo(infer_impl=inferrer.infer_single)


if __name__ == '__main__':
    launch_gradio_app()
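
With this commit checked out and the environment set up, the demo launches locally via `python app.py`; on Hugging Face Spaces (`SYSTEM == 'spaces'`), example caching and the `no_proxy` wrapper above are both skipped.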
8 changes: 8 additions & 0 deletions configs/infer-b.yaml
@@ -0,0 +1,8 @@
source_size: 336
source_cam_dist: 2.0
render_size: 288
render_views: 160
render_fps: 40
frame_size: 4
mesh_size: 384
mesh_thres: 3.0
7 changes: 7 additions & 0 deletions configs/infer-gradio.yaml
@@ -0,0 +1,7 @@
source_size: 336
render_size: 288
render_views: 100
render_fps: 25
frame_size: 2
mesh_size: 384
mesh_thres: 3.0
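
Worth noting: `infer-gradio.yaml` has no `source_cam_dist` entry, presumably because the demo's camera-distance slider supplies that value at call time, and it renders fewer frames (100 views at 25 fps, still a 4-second clip) for faster demo turnaround.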
8 changes: 8 additions & 0 deletions configs/infer-l.yaml
@@ -0,0 +1,8 @@
source_size: 448
source_cam_dist: 2.0
render_size: 384
render_views: 160
render_fps: 40
frame_size: 2
mesh_size: 384
mesh_thres: 3.0
8 changes: 8 additions & 0 deletions configs/infer-s.yaml
@@ -0,0 +1,8 @@
source_size: 224
source_cam_dist: 2.0
render_size: 192
render_views: 160
render_fps: 40
frame_size: 4
mesh_size: 384
mesh_thres: 3.0
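
Across the three offline configs, `source_size` (224 / 336 / 448) matches the "In. Res." column of the model table, and every variant renders a 4-second turntable (160 views at 40 fps). A small editor-added sketch that reads a config and reports these derived numbers (assumes PyYAML is available):

```
# Sketch: inspect an inference config and derive the turntable video length.
import yaml

with open("./configs/infer-b.yaml") as f:
    cfg = yaml.safe_load(f)

duration_s = cfg["render_views"] / cfg["render_fps"]  # 160 / 40 = 4.0 seconds
print(f"input resolution : {cfg['source_size']} px")
print(f"render resolution: {cfg['render_size']} px")
print(f"video length     : {duration_s:.1f} s at {cfg['render_fps']} fps")
```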
