Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis Implementation

This project is an attempt to implement the paper Putting NeRF on a Diet (DietNeRF) in JAX/Flax. DietNeRF is designed to render high-quality novel views in a few-shot setting, a task that vanilla NeRF (Neural Radiance Fields) struggles with. To achieve this, the authors introduce a semantic consistency loss that supervises DietNeRF with prior knowledge from the CLIP Vision Transformer. This supervision lets DietNeRF learn 3D scene reconstruction using CLIP's prior knowledge of 2D views.

Besides this repo, you can check our write-up and demo here:

🤩 Demo

  1. You can check out our demo on Hugging Face Spaces
  2. Or you can set up our Streamlit demo locally (model checkpoints will be fetched automatically upon startup)
pip install -r requirements_demo.txt
streamlit run app.py

[Figure: Streamlit demo]

✨ Implementation

Our code is written in JAX/Flax and is mainly based on jaxnerf from Google Research. The base code is highly optimized for GPU and TPU. For the semantic consistency loss, we use the pretrained CLIP Vision Transformer from the transformers library. To learn more about DietNeRF, our experiments, and our implementation, we highly recommend checking out our very detailed Notion write-up!

🤗 Hugging Face Model Hub Repo

You can also find our project on the Hugging Face Model Hub Repository.

Our JAX/Flax implementation currently supports:

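| Platform | Type | Training | Evaluation |
|---|---|---|---|
| Single-Host GPU | Single-Device | Supported | Supported |
| Single-Host GPU | Multi-Device | Supported | Supported |
| Multi-Device TPU | Single-Host | Supported | Supported |
| Multi-Device TPU | Multi-Host | Supported | Supported |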

💻 Installation

# Clone the repo and enter it
git clone https://github.com/codestella/putting-nerf-on-a-diet
cd putting-nerf-on-a-diet
# Create a conda environment. You can use Python 3.6-3.8, because
# one of the dependencies (TensorFlow) does not support Python 3.9 yet.
conda create --name jaxnerf python=3.6.12; conda activate jaxnerf
# Prepare pip
conda install pip; pip install --upgrade pip
# Install requirements
pip install -r requirements.txt
# [Optional] Install GPU and TPU support for JAX
# Remember to change cuda110 to match your CUDA version, e.g. cuda101 for CUDA 10.1.
pip install --upgrade jax "jax[cuda110]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
# Install Flax and Transformers with Flax support
pip install flax "transformers[flax]"

⚽ Dataset

Download the datasets from the official NeRF Google Drive. Please download nerf_synthetic.zip and unzip it wherever you like. We will assume the data is placed under /tmp/jaxnerf/data/.

💖 Methods

  • 👉👉 You can find a VEEEERY detailed explanation of our project in our Notion report

Based on the principle that “a bulldozer is a bulldozer from any perspective”, DietNeRF supervises the radiance field from arbitrary poses (DietNeRF cameras). This is possible because the semantic consistency loss is computed in a feature space that captures high-level scene attributes, not in pixel space. We extract semantic representations of renderings using the CLIP Vision Transformer, then maximize their similarity with representations of ground-truth views. In effect, we use prior knowledge about scene semantics learned by single-view 2D image encoders to constrain a 3D representation.
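
As a rough illustration of the idea above, the sketch below computes a cosine-similarity loss between two embedding vectors. It is not the code from this repo: the function names `cosine_similarity` and `semantic_consistency_loss` are made up for this example, the embeddings are plain Python lists standing in for CLIP image features, and `1 - cosine similarity` is only one common way to turn "maximize similarity" into a loss.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def semantic_consistency_loss(rendered_emb: Sequence[float],
                              target_emb: Sequence[float]) -> float:
    """Hypothetical loss: 0 when the embeddings align, up to 2 when opposite."""
    return 1.0 - cosine_similarity(rendered_emb, target_emb)


# Toy vectors standing in for CLIP embeddings of a rendering and a ground-truth view.
rendered = [0.2, 0.9, 0.1]
target = [0.1, 1.0, 0.0]
loss = semantic_consistency_loss(rendered, target)
```

Because the loss depends only on the direction of the embeddings, the rendered pose does not have to match the pose of the ground-truth view, which is what allows supervision from arbitrary cameras.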

For details, see the authors' paper. The figure below shows the structure of the CLIP-based semantic loss.

[Figure: structure of the CLIP-based semantic loss]

Our code is implemented in JAX/Flax, which lets it run much faster than other NeRF implementations. It also uses the CLIP model from the Hugging Face transformers library.

🤟 How to use

# Set --data_dir to your scene folder, e.g., nerf_synthetic/lego
python -m train \
  --data_dir=/PATH/TO/YOUR/SCENE/DATA \
  --train_dir=/PATH/TO/THE/PLACE/YOU/WANT/TO/SAVE/CHECKPOINTS \
  --config=configs/CONFIG_YOU_LIKE

You can toggle the semantic loss with the “use_semantic_loss” option in the configuration files.
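
To make the effect of this switch concrete, here is a minimal sketch of how such a flag could gate the extra loss term. Only the name `use_semantic_loss` comes from this README; the `LossConfig` class, `semantic_loss_weight`, its default value, and `total_loss` are hypothetical and are not taken from the repo's configuration files.

```python
from dataclasses import dataclass


@dataclass
class LossConfig:
    # Flag name from the README; everything else here is an illustrative assumption.
    use_semantic_loss: bool = True
    semantic_loss_weight: float = 0.1  # hypothetical weight, not a value from the repo


def total_loss(pixel_loss: float, semantic_loss: float, cfg: LossConfig) -> float:
    """Return the pixel loss, plus the weighted semantic term when the flag is on."""
    if cfg.use_semantic_loss:
        return pixel_loss + cfg.semantic_loss_weight * semantic_loss
    return pixel_loss


with_semantic = total_loss(0.5, 0.2, LossConfig(use_semantic_loss=True))
without_semantic = total_loss(0.5, 0.2, LossConfig(use_semantic_loss=False))
```

With the flag off, training in this sketch falls back to the plain pixel-space loss, which corresponds to the vanilla NeRF baseline used for comparison.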

💎 Experimental Results

❗ Images rendered by DietNeRF trained with 8 shots

DietNeRF has a strong capacity to generalise to novel and challenging views from EXTREMELY FEW TRAINING SAMPLES!

HOTDOG / DRUM / SHIP / CHAIR / LEGO / MIC

❗ GIFs rendered by NeRF and DietNeRF trained with 14 occluded shots

We artificially occluded the right side of the image by using only left-side training poses. This experiment lets us compare reconstruction quality under occlusion: DietNeRF shows better quality than the original NeRF when the scene is occluded.

Training poses

LEGO

[DietNeRF] [NeRF]

SHIP

[DietNeRF] [NeRF]

👨‍👧‍👦 Our Teams

| Teams | Members |
|---|---|
| Project Managing | Stella Yang (to follow our project's progress, please check our project Notion) |
| NeRF Team | Stella Yang, Alex Lau, Seunghyun Lee, Hyunkyu Kim, Haswanth Aekula, JaeYoung Chung |
| CLIP Team | Seunghyun Lee, Sasikanth Kotti, Khali Sifullah, Sunghyun Kim |
| Cloud TPU Team | Alex Lau, Aswin Pyakurel, JaeYoung Chung, Sunghyun Kim |

😎 What we improved over the original JAX-NeRF: Innovation

  • Neural rendering from few-shot images
  • Hugging Face CLIP-based semantic loss loop
  • You can choose coarse MLP or coarse + fine MLP training (coarse + fine is on the main branch; coarse is on the coarse_only branch)
    • coarse + fine: shows good geometric reconstruction
    • coarse: shows good PSNR/SSIM results
  • Video/GIF rendering of results; the --generate_gif_only argument runs fast GIF rendering
  • Cleaned up and refactored the code
  • Made multiple models, a Colab, and a Space for a nice demo
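
Since the list above compares the two training variants by PSNR, a short reminder of how that metric is computed may help. The sketch assumes pixel values scaled to [0, 1], in which case PSNR = -10 · log10(MSE). The helper names `mse` and `psnr` are chosen for this example and are not the repo's own functions.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length pixel sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def psnr(pred: Sequence[float], target: Sequence[float]) -> float:
    """PSNR in dB for pixel values in [0, 1]; infinite for identical inputs."""
    err = mse(pred, target)
    if err == 0.0:
        return math.inf
    return -10.0 * math.log10(err)


# Every pixel is off by about 0.1, so the MSE is about 0.01 (roughly 20 dB).
value = psnr([0.1, 0.6, 0.9], [0.0, 0.5, 1.0])
```

PSNR rewards per-pixel agreement, so a model can score well on it while still producing less convincing geometry, which is consistent with the trade-off noted between the coarse and coarse + fine variants.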

💞 Social Impact

  • Game Industry
  • Augmented Reality Industry
  • Virtual Reality Industry
  • Graphics Industry
  • Online shopping
  • Metaverse
  • Digital Twin
  • Mapping / SLAM

🌱 References

This project is based on “JAX-NeRF”.

@software{jaxnerf2020github,
  author = {Boyang Deng and Jonathan T. Barron and Pratul P. Srinivasan},
  title = {{JaxNeRF}: an efficient {JAX} implementation of {NeRF}},
  url = {https://github.com/google-research/google-research/tree/master/jaxnerf},
  version = {0.0},
  year = {2020},
}

This project is based on “Putting NeRF on a Diet”.

@misc{jain2021putting,
      title={Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis}, 
      author={Ajay Jain and Matthew Tancik and Pieter Abbeel},
      year={2021},
      eprint={2104.00677},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

🔑 License

Apache License 2.0

❤️ Special Thanks

Our project started during the HuggingFace X GoogleAI (JAX) Community Week event.

Thank you to our mentor Suraj and the organizers of the JAX/Flax Community Week! Our team grew through this community learning experience. It was a wonderful time!

Common Computer AI sponsored multiple V100 GPUs for our project! Thank you so much for your support!
