TriDF

Triplane-Accelerated Density Fields for Few-Shot Remote Sensing Novel View Synthesis

Jiaming Kang   Keyan Chen   Zhengxia Zou   Zhenwei Shi   Beihang University

Abstract

Remote sensing novel view synthesis (NVS) offers significant potential for the 3D interpretation of remote sensing scenes, with important applications in urban planning and environmental monitoring. However, remote sensing scenes frequently lack sufficient multi-view images due to acquisition constraints. While existing NVS methods tend to overfit when processing limited input views, advanced few-shot NVS methods are computationally intensive and perform sub-optimally on remote sensing scenes. This paper presents TriDF, an efficient hybrid 3D representation for fast remote sensing NVS from as few as 3 input views. Our approach decouples color and volume density information, modeling them independently to reduce the computational burden on implicit radiance fields and accelerate reconstruction. We explore the potential of the triplane representation for few-shot NVS by mapping high-frequency color information onto this compact structure, where direct optimization of the feature planes significantly speeds up convergence. Volume density is modeled as continuous density fields, incorporating reference features from neighboring views through image-based rendering to compensate for limited input data. Additionally, we introduce depth-guided optimization based on point clouds, which effectively mitigates overfitting in few-shot NVS. Comprehensive experiments across multiple remote sensing scenes demonstrate that our hybrid representation achieves a 30x speedup over NeRF-based methods while improving rendering quality over advanced few-shot methods (a 7.4% increase in PSNR, 12.2% in SSIM, and an 18.7% improvement in LPIPS).

Method

We introduce an efficient hybrid representation for few-shot novel view synthesis that takes sparse posed images as input and performs fast reconstruction. TriDF consists of a triplane branch and a density-fields branch that predict color and volume density separately. High-frequency color features are mapped onto the triplane for direct optimization, while volume density is modeled as continuous density fields through MLPs, significantly improving training efficiency. We also integrate an image-based rendering framework and depth-guided optimization based on 3D point clouds into this hybrid representation to stabilize training in the few-shot NVS setting; illustrative sketches of both components follow below.
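
To make the decoupled design concrete, here is a minimal PyTorch sketch of the two branches. All class names, layer sizes, and encoding choices are illustrative assumptions, not the paper's exact architecture, and the sketch omits the reference-feature aggregation from neighboring views that the density branch additionally uses.

# Illustrative sketch of TriDF's hybrid representation (hypothetical names,
# simplified from the paper's description): color features are read from
# three axis-aligned feature planes, while volume density comes from a
# small MLP over positionally encoded coordinates.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriplaneColorBranch(nn.Module):
    def __init__(self, resolution=128, feat_dim=16):
        super().__init__()
        # Three learnable feature planes (XY, XZ, YZ), optimized directly.
        self.planes = nn.Parameter(
            torch.randn(3, feat_dim, resolution, resolution) * 0.1)
        self.decoder = nn.Sequential(
            nn.Linear(3 * feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, xyz):
        # xyz: (N, 3) points normalized to [-1, 1].
        feats = []
        for i, axes in enumerate([[0, 1], [0, 2], [1, 2]]):
            uv = xyz[:, axes].view(1, -1, 1, 2)              # (1, N, 1, 2)
            plane = self.planes[i].unsqueeze(0)               # (1, C, R, R)
            f = F.grid_sample(plane, uv, align_corners=True)  # (1, C, N, 1)
            feats.append(f.squeeze(0).squeeze(-1).t())        # (N, C)
        return self.decoder(torch.cat(feats, dim=-1))         # (N, 3)

class DensityField(nn.Module):
    def __init__(self, n_freqs=8, hidden=64):
        super().__init__()
        self.n_freqs = n_freqs
        in_dim = 3 + 3 * 2 * n_freqs
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xyz):
        # Standard sinusoidal positional encoding of the 3D coordinates.
        enc = [xyz]
        for k in range(self.n_freqs):
            enc += [torch.sin((2 ** k) * xyz), torch.cos((2 ** k) * xyz)]
        # Softplus keeps the predicted density non-negative.
        return F.softplus(self.mlp(torch.cat(enc, dim=-1)))  # (N, 1)

At render time, both branches are queried at the same sampled points and composited with standard volume rendering; because the feature planes are optimized directly rather than through a deep network, the color branch converges quickly.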

Pipeline Image
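
The depth-guided optimization mentioned above can be sketched as a depth-supervision loss. The formulation below is an assumption for illustration (the paper derives depth guidance from 3D point clouds but does not prescribe this exact loss): the expected ray depth is computed from volume-rendering weights and pulled toward the depth of point-cloud points projected into the training views.

# Hedged sketch of a depth-guided loss (assumed form, hypothetical names).
import torch

def expected_ray_depth(weights, z_vals):
    # weights: (R, S) volume-rendering weights for S samples along R rays.
    # z_vals:  (R, S) sample depths along each ray.
    return (weights * z_vals).sum(dim=-1)  # (R,)

def depth_guided_loss(weights, z_vals, pc_depth, valid):
    # pc_depth: (R,) depth of point-cloud points projected to each ray's pixel.
    # valid:    (R,) float mask, 1 where a projected 3D point exists.
    d = expected_ray_depth(weights, z_vals)
    # L1 penalty applied only to rays with point-cloud supervision.
    return ((d - pc_depth).abs() * valid).sum() / valid.sum().clamp(min=1.0)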

Baseline Comparisons

Quantitative experiments on the LEVIR-NVS dataset show that our approach outperforms both alternative scene-representation forms and state-of-the-art few-shot NeRF methods across all three evaluation metrics.

Quantitative results

TriDF avoids the degradation typical of few-shot rendering and accurately recovers the delicate geometry of the scenes. It reconstructs fine details, such as the edges and corners of buildings, and synthesizes novel-view images with higher fidelity.

Qualitative results

Visualizations on the LEVIR-NVS dataset

On the LEVIR-NVS dataset, we visualize results from 16 different scenes, each trained with 3 views. TriDF produces pleasing appearances while recovering detailed thin structures.

Trained with 3 Views

Building#1
Church
College
Mountain#1
Mountain#2
Observation
Building#2
Town#1
Stadium
Town#2
Mountain#3
Town#3
Factory
Park
School
Downtown

Citation


@article{kang2025tridf,
  title={TriDF: Triplane-Accelerated Density Fields for Few-Shot Remote Sensing Novel View Synthesis},
  author={Kang, Jiaming and Chen, Keyan and Zou, Zhengxia and Shi, Zhenwei},
  journal={arXiv preprint arXiv:2503.13347},
  year={2025}
}