3D Urban Scene Reconstruction
Using Multi-View Aerial Imagery
Urban operations have become critically important in the post-9/11 world, as military operations in densely
populated areas have increased significantly. Tactical operations in urban areas require high-fidelity
geospatial information for force deployment, threat analysis (e.g., sniper line-of-sight), and similar tasks.
Realistic 3D urban scene models are needed for improved situational awareness. These models are also useful
for homeland security applications such as emergency response and event planning.
We are currently pursuing several research activities aimed at developing a suite of automated and
semi-automated tools for 3D urban scene modeling from multi-view aerial imagery. This research extends
our previous computer vision work on multi-view image-based modeling.
In previous research, we developed a novel PDE-based deformable surface, the Lagrangian Surface Flow, that
is capable of automatically evolving its shape to capture geometric boundaries while simultaneously discovering
their underlying topological structure. The deformation of the model is governed by partial differential
equations (PDEs) derived from variational principles. The model ensures regularity and stability, and it can
accurately represent very sharp features.
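The Lagrangian Surface Flow equations themselves are not reproduced in this section. As a hedged illustration of the general principle (an explicit mesh whose vertices move along the gradient descent of a variational energy), the sketch below evolves a closed 2D polygon instead of a 3D surface and minimizes the weighted length E(C) = ∫ Φ(C(s)) ds. This is the standard geodesic active contour energy, swapped in for illustration; its gradient flow is C_t = Φ C_ss - (∇Φ·N) N, where C_ss is the curvature vector and N the unit normal. The names `evolve_curve`, `phi`, and `grad_phi` are hypothetical, Φ is a synthetic stand-in for an image-derived data term (smallest on the unit circle), and the sketch does not handle topology changes.

```python
import numpy as np

def evolve_curve(points, phi, grad_phi, dt=0.005, steps=2000):
    """Explicit (Lagrangian) evolution of a closed polygon.

    Each vertex follows the discretized gradient flow of E(C) = integral of phi ds:
        dC/dt = phi * C_ss - (grad_phi . N) N
    phi: (V, 2) -> (V,) and grad_phi: (V, 2) -> (V, 2) are hypothetical data-term callbacks.
    """
    x = np.asarray(points, dtype=float).copy()
    for _ in range(steps):
        nxt = np.roll(x, -1, axis=0)  # x_{i+1}
        prv = np.roll(x, 1, axis=0)   # x_{i-1}
        e_next, e_prev = nxt - x, x - prv
        l_next = np.linalg.norm(e_next, axis=1, keepdims=True)
        l_prev = np.linalg.norm(e_prev, axis=1, keepdims=True)
        # Discrete curvature vector C_ss, valid for non-uniform edge lengths
        curv = 2.0 * (e_next / l_next - e_prev / l_prev) / (l_next + l_prev)
        # Unit normal: central-difference tangent rotated by 90 degrees.
        # Its orientation does not matter, because N appears twice in the data term.
        t = nxt - prv
        n = np.stack([-t[:, 1], t[:, 0]], axis=1)
        n /= np.linalg.norm(n, axis=1, keepdims=True)
        f = phi(x)[:, None]
        g_n = np.sum(grad_phi(x) * n, axis=1, keepdims=True)
        # Regularizing curvature term plus data-driven normal motion
        x += dt * (f * curv - g_n * n)
    return x

EPS = 0.01  # keeps phi > 0 so the curvature term never vanishes entirely

def phi(x):
    r = np.linalg.norm(x, axis=1)
    return (r - 1.0) ** 2 + EPS

def grad_phi(x):
    r = np.linalg.norm(x, axis=1, keepdims=True)
    return 2.0 * (r - 1.0) * x / r

theta = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
start = 2.0 * np.stack([np.cos(theta), np.sin(theta)], axis=1)  # circle of radius 2
final = evolve_curve(start, phi, grad_phi)
print(np.linalg.norm(final, axis=1).mean())  # about 0.995, just inside the minimum of phi
```

The curve settles slightly inside the unit circle because the curvature term, weighted by the small residual Φ, balances the data term there. An explicit scheme like this needs dt of at most roughly h²/(2Φ), where h is the edge length, which is why the step is kept small.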
Our approach is unique in combining the simultaneous use of a large number of arbitrary camera views
with an explicit mesh that is intuitive and easy to interact with. The model-based approach automatically
selects the best views for reconstruction, supports visibility checking, and allows progressive refinement of
the model as more images become available. Results from extensive experiments on synthetic and real data
demonstrate robustness, high reconstruction accuracy, and high visual quality. Because of its mathematical
formulation, the same model can be used with different types of data (e.g., LIDAR, SAR, IR) simply by
supplying the appropriate data interface function.
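The form of the data interface function is not specified here. One plausible reading, sketched below under that assumption, is that the surface solver only ever asks the data for a per-vertex cost and its gradient, so a new sensor type means a new implementation of those two calls. `DataTerm`, `LidarTerm`, and `data_step` are hypothetical names; the LIDAR cost (squared distance to the nearest return) is an illustrative choice, and an image-based photo-consistency, SAR, or IR term would implement the same two methods.

```python
from typing import Protocol

import numpy as np

class DataTerm(Protocol):
    """Hypothetical sensor interface: the solver only sees these two calls."""
    def cost(self, x: np.ndarray) -> np.ndarray: ...      # (V, 3) -> (V,)
    def gradient(self, x: np.ndarray) -> np.ndarray: ...  # (V, 3) -> (V, 3)

class LidarTerm:
    """Cost = squared distance from each vertex to its nearest LIDAR return."""
    def __init__(self, cloud: np.ndarray):
        self.cloud = np.asarray(cloud, dtype=float)

    def _nearest(self, x: np.ndarray):
        # Brute-force nearest neighbour; a k-d tree would suit real point clouds.
        d2 = ((x[:, None, :] - self.cloud[None, :, :]) ** 2).sum(axis=2)
        return self.cloud[d2.argmin(axis=1)], d2.min(axis=1)

    def cost(self, x: np.ndarray) -> np.ndarray:
        return self._nearest(x)[1]

    def gradient(self, x: np.ndarray) -> np.ndarray:
        nearest, _ = self._nearest(x)
        return 2.0 * (x - nearest)

def data_step(vertices: np.ndarray, term: DataTerm, dt: float = 0.1) -> np.ndarray:
    """One gradient-descent step on the data term alone (no regularization)."""
    return vertices - dt * term.gradient(vertices)

cloud = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 1.0]])
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 2.0]])
term = LidarTerm(cloud)
print(term.cost(verts))         # [1. 1.]
print(data_step(verts, term))   # each vertex moves 20% of the way to its nearest return
```

Keeping the solver typed against the protocol rather than a concrete class is what lets the deformation code stay unchanged when the sensor changes.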
As computing performance increases, we expect our reconstruction method to reach interactive run times.
One can then envision a user monitoring the quality of the reconstruction during image capture and
acquiring the views still needed to complete it.
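How the "most necessary remaining views" would be chosen is not described here. A common heuristic, shown below as a hedged sketch and not as this project's method, is greedy next-best-view selection: repeatedly pick the candidate view that sees the most vertices still observed by fewer than `min_views` images. The function `next_best_views`, the per-vertex observation counts, and the candidate visibility matrix are all hypothetical inputs; the two-view threshold assumes a vertex needs at least two images to be triangulated.

```python
import numpy as np

def next_best_views(seen_counts, candidate_visibility, min_views=2, budget=1):
    """Greedy next-best-view selection (hypothetical heuristic).

    seen_counts:          (V,) number of existing images that see each vertex
    candidate_visibility: (C, V) truthy if candidate view c would see vertex v
    Returns up to `budget` candidate indices, chosen one at a time.
    """
    counts = np.asarray(seen_counts, dtype=int).copy()
    vis = np.asarray(candidate_visibility, dtype=bool)
    chosen = []
    for _ in range(budget):
        needy = counts < min_views           # vertices still under-observed
        gain = (vis & needy).sum(axis=1)     # how many of them each candidate covers
        if chosen:
            gain[chosen] = -1                # never pick the same view twice
        best = int(gain.argmax())
        if gain[best] <= 0:
            break                            # no remaining candidate helps
        chosen.append(best)
        counts += vis[best]                  # pretend the chosen image was captured
    return chosen

seen = [6, 6, 6, 0, 0, 0]           # front vertices seen by 6 images, back vertices by none
candidates = [[1, 1, 1, 0, 0, 0],   # another frontal view
              [0, 0, 1, 1, 1, 0],
              [0, 0, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1]]   # a view of the whole back
print(next_best_views(seen, candidates, budget=3))  # [3, 1, 2]
```

The extra frontal view is never chosen because it adds nothing to vertices that are already well observed, which is the behavior a user guiding capture would want.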
Figure 1 shows an example of 3D reconstruction from multi-view images. Figure 1(a) shows the camera
positions of the 16 raster images of a Buddha statue, one of which is shown in Figure 1(b). Figure 1(c) is the
final texture-mapped mesh, rendered from a viewpoint similar to that of the image in Figure 1(b).
Figure 2 shows an example of incremental reconstruction. Figure 2(a) is the reconstruction result from
6 frontal images, and Figure 2(b) is its mesh representation. The back of the model has not yet deformed
because no image data covers it. Figure 2(c) is one of 5 images added later. Once these images are added,
the model deforms further (Figure 2(d)) and finally captures the complete shape shown in Figures 2(e) and 2(f).
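The behavior in Figure 2, where the back of the model stays put until images covering it arrive, is what one would expect if each vertex is driven only by views that can see it. The sketch below illustrates view selection with visibility checking under stated assumptions: `select_views` is a hypothetical name, the visibility test is only a back-face check (the cosine between the surface normal and the direction to the camera center must be positive, ignoring occlusion by other geometry), and ranking views by that cosine is an assumed criterion for "best views", not necessarily the one used by the model described here.

```python
import numpy as np

def select_views(point, normal, camera_centers, k=2, min_cos=0.0):
    """Rank cameras for one surface point by how head-on they see it.

    A camera is kept only if the point faces it (back-face test: cos > min_cos).
    Occlusion by other parts of the mesh is ignored; a full visibility check
    would add ray casting or a depth buffer against the current mesh.
    Returns indices of up to k cameras, best first.
    """
    n = np.asarray(normal, dtype=float)
    n /= np.linalg.norm(n)
    d = np.asarray(camera_centers, dtype=float) - point
    d /= np.linalg.norm(d, axis=1, keepdims=True)  # unit directions to cameras
    cos = d @ n
    visible = np.flatnonzero(cos > min_cos)
    order = visible[np.argsort(-cos[visible])]     # most head-on first
    return order[:k]

front_cams = np.array([[0.0, 0.0, 10.0], [5.0, 0.0, 5.0], [-5.0, 0.0, 5.0]])
back_point = np.array([0.0, 0.0, -1.0])
back_normal = np.array([0.0, 0.0, -1.0])
print(select_views(back_point, back_normal, front_cams))  # [] -> no data force, vertex stays put

all_cams = np.vstack([front_cams, [[0.0, 0.0, -10.0], [4.0, 0.0, -6.0]]])
print(select_views(back_point, back_normal, all_cams))    # [3 4] -> added views now drive it
```

With only frontal cameras, a vertex on the back receives no views and therefore no data-driven motion; adding cameras behind the object gives it views, mirroring the progression from Figure 2(b) to Figure 2(d).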
Figure 3 shows a preliminary result of applying this research to urban scene modeling. The input data are
five multi-view digital aerial photographs of Kansas City, MO. Figures 3(a) and 3(b) show texture-mapped
renderings, from two different viewpoints, of a building that was reconstructed automatically.