Research Articles

Building Extraction Model and Application from Multi-Scene Remote Sensing Data

Available online: May 28, 2026
Authors: Yang, Gangyu; Zhai, Wei; Li, Jincheng; Cao, Yingying
Abstract:
The extraction of building footprints from high-resolution remote sensing images is widely used in fields such as land survey and urban planning, but high resolution also brings complex background information. To improve the effectiveness of building footprint extraction, this paper proposes RefineSegFormer, which combines a hierarchical transformer encoder with a cross-level feedback Refine feature pyramid network implemented through deformable convolution. RefineSegFormer achieved an F1 score of 96.55% and an IoU of 93.33% on the widely recognized WHU building data set, an F1 of 93.19% and an IoU of 87.25% on the 0.3-m Mandalay City data set, and an F1 of 90.53% and an IoU of 82.70% on the 2-m Dingri County data set. These were the best results on all three data sets, demonstrating the effectiveness and practical application value of the proposed model. The improved lightweight model for building footprint extraction performs better than the unimproved medium model. For building footprint extraction from submeter remote sensing images, the improved lightweight model has only 20% of the parameters of the medium model yet achieves more than 98% of the performance of the improved medium model, balancing extraction accuracy and inference speed. For building footprint extraction from meter-resolution remote sensing images with few buildings and high similarity between buildings and the background, the improved medium model is recommended. Compared with the unimproved medium model, F1 increased by 6.71% and IoU by 10.55%. The model and code of RefineSegFormer are available at: https://github.com/amanforinteresting/RefineSegFormer.
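For a single foreground class, F1 and IoU computed from the same pixel counts are tied together by F1 = 2·IoU / (1 + IoU). The sketch below checks the three reported score pairs against that identity. It assumes both metrics were computed for the building class from the same confusion counts, which the abstract does not state; the function name is illustrative.

```python
def f1_from_iou(iou: float) -> float:
    """Convert a binary-class IoU (as a fraction) to the equivalent F1 score."""
    return 2.0 * iou / (1.0 + iou)


# (IoU, F1) pairs reported in the abstract, written as fractions.
reported = {
    "WHU": (0.9333, 0.9655),
    "Mandalay City": (0.8725, 0.9319),
    "Dingri County": (0.8270, 0.9053),
}

for name, (iou, f1) in reported.items():
    derived = f1_from_iou(iou)
    # Print the F1 implied by the IoU next to the reported F1, both in percent.
    print(f"{name}: derived F1 = {100 * derived:.2f}%, reported F1 = {100 * f1:.2f}%")
```

Under that assumption, each derived value agrees with the reported F1 to two decimal places.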
Research Articles

IT-Seg: Morphology-Constrained and Spatially Guided Individual Tree Segmentation from Terrestrial and Mobile Point Clouds

Available online: May 28, 2026
Authors: Chen, Maolin; Yin, Hanwen; Zhang, Zhiqing; Zhang, Huan; Bo, Liming;
Abstract:
Individual trees are essential components of forest ecosystems, and accurate tree-level segmentation provides a crucial foundation for forest ecosystem modeling and biodiversity assessment. We propose a novel point cloud-based individual tree segmentation method guided by morphological prior constraints and single-tree canopy radiative effective extent (SCREE), a metric used to define the canopy boundary for each tree, computed from high-dimensional features output by a network. First, the original point cloud is divided into a central point set and a buffer point set based on morphological prior constraints. On this basis, different strategies are applied for trunk extraction from the central and buffer points: trunks in the central point set are extracted through adaptive density-based filtering, while trunks in the buffer point set are extracted using a semantic segmentation network. Finally, canopy points are assigned to individual trunks to obtain complete tree structures. This assignment is guided by the proposed SCREE, inferred using a diameter at breast height–height generative model to generate a constrained space. Validated on six forest scenes from three public data sets, the proposed method achieves an average instance-level accuracy of 84.71% and an average point-level accuracy of 81.85%, demonstrating strong robustness across diverse forest environments. Moreover, compared with existing methods, our method exhibits higher stability when handling variations in samples and the presence of small trees across different types of scanners.
Research Articles

A Local–Global Feature Aggregation Method for Semantic Segmentation of Lidar Point Clouds in Railway Scenarios

Available online: May 27, 2026
Authors: Peng, Cheng; Zhang, Jiuyan; Gan, Jun; Li, Te; Yang, Juntao; Kang, Zhizhong;
Abstract:
Railways serve as critical national infrastructures, and anomalies in key facilities can pose serious threats to transportation safety. Due to complex spatial structures and large-scale variations in railway point clouds, existing semantic segmentation methods struggle with multi-scale feature representation and semantic modeling. To address this issue, we propose a graph convolution–based point cloud semantic segmentation method. The network uses a multi-level feature aggregation framework in which dilated residual blocks facilitate adaptive feature fusion, and hierarchical downsampling progressively expands the receptive field to capture global structural information. Furthermore, a coordinate-guided graph convolution network is introduced to enhance local structural perception and encode topological relationships by constructing graphs from point coordinates and propagating features to model geometric relations and long-range dependencies. Together, these components achieve a unified multi-scale semantic representation for railway point clouds. On the WHU-Railway3D data set, our method achieved a mean intersection over union of 72.40% and an overall accuracy of 91.10% in the urban scenarios, and a mean intersection over union of 77.89% and an overall accuracy of 95.50% in the rural scenarios, which is state-of-the-art performance among current methods.
Research Articles Open Access

Dashcam Video: A Complementary Low-Cost Data Stream for On-Demand Forest-Infrastructure System Monitoring

Available online: May 20, 2026
Authors: Joshi, Durga; Witharana, Chandi; Fahey, Robert; Worthley, Thomas; Zhu, Zhe; Cerrai, Diego;
Abstract:
Urban green infrastructure plays a critical role in enhancing ecological resilience and reducing infrastructure vulnerability in metropolitan settings. However, achieving scalable, high-resolution monitoring of urban green infrastructure remains a persistent challenge due to visual occlusion, structural complexity, and the cost or inaccessibility of conventional three-dimensional remote sensing technologies. This study introduces a novel, low-cost, and reproducible framework for near real-time, object-level structural assessment and geo-location of roadside vegetation and infrastructure using commonly available but underused dashboard camera (i.e., dashcam) video data. A pipeline was developed that combines monocular depth estimation, supervised calibration of a monocular depth proxy, and geometric triangulation to generate accurate spatial and structural data from continuous street-level video streams acquired from vehicle-mounted dashcams. Depth outputs were treated as a relative proxy and calibrated via a gradient-boosted regression model to metric camera-to-object distance, particularly for distant objects. The depth correction model achieved strong predictive performance (R² = 0.92, mean absolute error = 0.31 on transformed scale), significantly reducing bias beyond 15 m. Further, object locations were estimated using global positioning system–based triangulation, whereas object heights were calculated using pinhole camera geometry. This method was evaluated under varying conditions of camera placement and vehicle speed. The configuration involving interior-mounted cameras and low-speed travel yielded the highest accuracy, with mean geo-location error of 2.83 m (interquartile range = 2.64 m) and mean absolute error in height estimation of 2.09 m for trees and 0.88 m for poles. This approach complements conventional overhead remote sensing methods, such as lidar and stereo imaging, by enabling low-cost, frequent object-level monitoring of vegetation risks and infrastructure exposure for utility and urban planners, whereas broader transferability across locations, camera models, and environmental conditions requires further multi-site evaluation.
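The abstract notes that object heights were calculated using pinhole camera geometry. A minimal sketch of that relation follows, using similar triangles: real height = distance × pixel height / focal length in pixels. It assumes an undistorted image, a distance measured along the optical axis, and an object roughly parallel to the image plane. The function name and all input values are hypothetical, not taken from the study.

```python
def object_height_m(distance_m: float, pixel_height: float, focal_length_px: float) -> float:
    """Estimate real-world object height from its image height (pinhole model).

    distance_m      -- camera-to-object distance along the optical axis, in meters
    pixel_height    -- vertical extent of the object in the image, in pixels
    focal_length_px -- camera focal length expressed in pixels
    """
    if focal_length_px <= 0:
        raise ValueError("focal length must be positive")
    return distance_m * pixel_height / focal_length_px


# Hypothetical example: a pole 20 m away spans 450 px in a frame
# captured with a focal length of 1500 px.
height = object_height_m(distance_m=20.0, pixel_height=450.0, focal_length_px=1500.0)
print(f"estimated height: {height:.2f} m")
```

Because the estimate scales linearly with distance, any error in the calibrated camera-to-object distance carries over proportionally to the height.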
Research Articles

Exploring the Utility of a Multivariate Soil Hyperspectral Reflectance Model for Estimating Soil Moisture Using Sentinel-2 Data

Available online: May 14, 2026
Authors: Atyosi, Yonwaba; Cho, Moses Azong; Majozi, Nobuhle Patience; Bonnet, Wessel; Ramoelo, Abel;
Abstract:
Accurate and spatially transferable estimation of soil moisture is critical for sustainable agriculture, water resource management, and drought monitoring, particularly in data-scarce semiarid regions. However, soil moisture retrieval from optical satellite data remains challenging due to heterogeneous soil conditions and limited model generalizability, especially when interactions between soil moisture and clay content are neglected. This study presents a physically informed, simulation-based multivariate framework for estimating soil moisture from freely available Sentinel-2 multispectral imagery that explicitly accounts for soil clay content and its interaction with moisture. A Monte Carlo look-up table comprising 100,000 synthetic soil reflectance spectra was generated under varying soil moisture and clay conditions and resampled to Sentinel-2 spectral bands. Soil moisture-sensitive spectral band combinations, ratios, and newly developed soil moisture indices were derived and used to train machine learning models, which were evaluated using group-aware cross-validation to assess spatial robustness and transferability. Model application across multiple agricultural sites in South Africa’s Eastern Cape and Limpopo provinces, regions geographically distinct from calibration areas and spanning contrasting ecological and climatic conditions, demonstrated high predictive performance (R² up to 0.91; RMSE as low as 0.71) and strong spatial transferability. The results indicate that explicitly integrating soil property interactions within a synthetic spectral modeling framework substantially improves Sentinel-2–based soil moisture estimation. The proposed approach advances operational optical remote sensing of soil moisture by bridging physically consistent spectral simulations and scalable multispectral observations, providing a transferable methodology for precision irrigation, drought early warning, and sustainable agricultural water management in semiarid environments.
Research Articles

An Anchor Point–Assisted Image-Matching Method for Weakly Textured Scenes Using Structured Features

Available online: May 13, 2026
Authors: Chen, Ming; Shi, Jiangong; Ma, Zhenling; Xie, Hong; Wang, Zhengjie;
Abstract:
Photogrammetric mapping missions using robots or unmanned aerial vehicles (UAVs) often encounter environments where sparse strong-texture structures are found alongside extensive weak-texture surfaces, such as indoor corridors with frames and aerial views of shorelines. Establishing accurate correspondences under such circumstances remains a major technical challenge. Traditional methods find few or no correspondences. Deep learning algorithms can yield stable matches in weak-texture regions. However, they rely on large-scale annotated data and computational resources, limiting their real-time application and generalization capability. This article proposes an anchor point–assisted image-matching method, using Euclidean distances and angular relationships between corner points in the weak-texture area and anchor points in the strong-texture area to establish structured features, enhancing the distinctiveness of the corner points in the weak-texture area. Meanwhile, epipolar geometry constraints are applied to restrict the candidate corner point search to a one-dimensional range, thereby improving the accuracy, efficiency, and reliability of the corner point matching. Comparative experiments on indoor, outdoor, and UAV data sets demonstrate that the proposed method achieves a greater number of correct matches than scale-invariant feature transform (SIFT), oriented FAST and rotated BRIEF (ORB), KAZE, AKAZE, grid-based motion statistics (GMS), SuperGlue, and LightGlue with a conventional CPU configuration. The presented method, SuperGlue, and LightGlue achieve a success rate (SR) of 100% on three data sets and secure sufficient correspondences in weak-texture regions. In contrast, the SR obtained by traditional methods ranges from 70% to 80%. In addition, our method runs in under 3 seconds, whereas the deep learning methods still require over 10 seconds even on a lightweight CPU. The effects of anchor point numbers, dual threshold settings, and epipolar search range on matching performance are analyzed to identify the optimal parameter configuration. The proposed method offers potential for robot and UAV photogrammetric mapping applications in sparsely textured environments on resource-constrained platforms.
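The epipolar constraint mentioned above can be sketched as follows: a point in the first image maps, through a fundamental matrix F, to a line in the second image, and only candidate corners close to that line need to be compared. This is a generic illustration, not the authors' implementation. The function names, the pixel threshold, and the example matrix (a rectified pair whose epipolar lines are horizontal) are all assumptions.

```python
import math


def epipolar_line(F, pt):
    """Return line coefficients (a, b, c) of l = F @ [x, y, 1] in the second image."""
    x, y = pt
    return tuple(F[r][0] * x + F[r][1] * y + F[r][2] for r in range(3))


def point_line_distance(line, pt):
    """Perpendicular distance in pixels from pt to the line a*x + b*y + c = 0."""
    a, b, c = line
    return abs(a * pt[0] + b * pt[1] + c) / math.hypot(a, b)


def filter_candidates(F, pt, candidates, max_dist_px=2.0):
    """Keep only candidates lying within max_dist_px of the epipolar line of pt."""
    line = epipolar_line(F, pt)
    return [c for c in candidates if point_line_distance(line, c) <= max_dist_px]


# Hypothetical fundamental matrix of a rectified pair (pure horizontal shift):
# the epipolar line of (x, y) is simply the row y' = y in the second image.
F = [[0, 0, 0],
     [0, 0, -1],
     [0, 1, 0]]

corner = (100, 50)
candidates = [(120, 50.5), (80, 60), (300, 49)]
print(filter_candidates(F, corner, candidates))
```

Here the candidate at row 60 is rejected because it lies 10 px from the epipolar line, so descriptor comparison is limited to the two remaining corners.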
Research Articles

Monitoring Large Gradient Deformation by Means of Time-Series LuTan-1 Synthetic Aperture Radar Images

Available online: April 27, 2026
Authors: Zhang, Xiang; Li, Tao; Zhao, Hui; Huang, Hai;
Abstract:
In order to provide a feasible technical strategy for large gradient subsidence monitoring, a time-series interferometric synthetic aperture radar (InSAR) technique using high spatial and temporal resolution LuTan-1 SAR data was applied for subsidence monitoring. The capability of LuTan-1 images for large gradient subsidence monitoring was analyzed and discussed, and a quantitative evaluation was carried out against field measurements. An accurate and recent digital elevation model generated from bistatic LuTan-1 data was used for terrain phase removal. The dynamic subsidence of the Datong mining area was extracted using time-series InSAR techniques. In combination with the field leveling measurements and Sentinel-1 monitoring results, several conclusions were derived. First, there were four main deformation areas in the monitoring area from January to May 2023, and the maximum cumulative deformation reached −4.206 m. The dynamic spatial and temporal deformation details of the mining areas were characterized, which also revealed the relationship between the mining subsidence law and mining activities. Second, all of the subsidence details were extracted from the LuTan-1 data, which indicates its capability for large gradient deformation monitoring. Third, based on the field leveling measurements, the root mean square error of the deformation monitoring results from LuTan-1 data is less than 40 mm, and the maximum relative error is less than 4%. Complete and accurate subsidence details of the mining area with large gradient deformation were obtained. The results indicate that the LuTan-1 satellites provide effective support for large gradient deformation monitoring, which is significant for mining subsidence law research, disaster monitoring, and management.
Research Articles

Topologically Aware Roof Wireframe Reconstruction from Airborne Laser Scanning Point Clouds Using Persistent Homology and Hybrid Geometric–Topological Constraints

Available online: April 25, 2026
Authors: Huo, Pengpeng; Jiang, Jinguang; Kong, Gefei; Fan, Hongchao;
Abstract:
Airborne laser scanning point clouds constitute a primary data source for city-scale 3D building modeling. However, the automated reconstruction of geometrically accurate and topologically consistent roof wireframes persists as a formidable challenge, hindered by data noise, uneven point density, and the topological complexity of urban roofs. Existing data-driven methods exhibit high sensitivity to local fitting errors, frequently leading to topological inconsistencies, whereas model-driven approaches are constrained by predefined primitive libraries, limiting their generalization to complex composite roofs. To address these limitations, this paper proposes a novel framework for roof wireframe reconstruction that fuses topological data analysis (TDA) with hybrid geometric-topological constraints. A core innovation of this framework is the application of persistent homology to extract globally stable topological skeletons from 2.5D height scalar fields, serving as a robust topological prior independent of local geometric noise. Specifically, the method first generates regularized 3D eave lines constrained by 2D building footprints. Subsequently, a “geometry–topology” dual-domain cross-validation mechanism is used to validate geometric hypotheses derived from planar adjacencies against TDA-extracted topological critical points (ridges and valleys), thereby effectively suppressing spurious structure lines. Finally, a global optimization model governed by the minimum description length principle enforces implicit regularization to rectify local geometric distortions and guarantee topological closure. Experiments on the Trondheim data set demonstrate that the proposed method achieves a root mean square error of 0.443 m, while enhancing the length-weighted completeness and correctness of structure lines to 96.23% and 97.41%, respectively. These results validate the efficacy of integrating height-field–based topological skeletons for reconstructing complex roof wireframes, offering a scalable and robust solution for automated large-scale urban 3D modeling.
Research Articles

A Shallow Sea Bathymetric Inversion Method from Active–Passive Satellite Remote Sensing Data Based on a Residual Correction Model

Available online: April 23, 2026
Authors: Dong, Zhipeng; Li, Zhixian; Liu, Yanxiong; Feng, Yikai; Chen, Yilan; Huang, Guoan;
Abstract:
Satellite-derived bathymetry is an important method for obtaining shallow sea water depth. To address the problem that the traditional band log-ratio model exhibits significant discrepancies in bathymetric inversion accuracy in both shallow sea (<2 m) and deep sea (>12 m), where water depth is overestimated in shallow areas and underestimated in deep areas, this paper proposes a bathymetric inversion method from active–passive satellite remote sensing data based on a residual correction model. First, the proposed method uses the denoised Ice, Cloud, and Land Elevation Satellite-2 (ICESat-2) ATL03 data set as precise depth control points to construct and implement the traditional band log-ratio model, generating initial bathymetric inversion results. Second, the initial inversion result is compared with the ICESat-2 depth control points, and the residuals between the initial water depth estimates and the true water depths in the local region are calculated. Then, based on the relationship between the spectral characteristics of remote sensing images and water depth residuals, a random forest model is constructed to fit the residual distribution across the entire study area. Finally, using the residual distribution derived from this fitting process, the initial bathymetric inversion results for the entire region are corrected, thereby improving the accuracy and precision of the water depth data. Experimental results demonstrate that the proposed method achieves a bathymetric inversion accuracy with the root mean square error better than 1.57 m and the mean absolute error better than 1.15 m for islands with varying seabed topographies, providing high-precision shallow sea bathymetric inversion results.
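The four steps above (fit a log-ratio model, compute residuals at control points, learn the residuals from spectral features, correct the initial depths) can be sketched in plain Python. This is an illustration under stated assumptions, not the paper's implementation. It uses a commonly used form of the band log-ratio model, depth = m1 · ln(n·R_blue) / ln(n·R_green) − m0. A k-nearest-neighbor mean stands in for the random forest so that the example needs no third-party library. The control points are synthetic values invented for illustration, and all names are hypothetical.

```python
import math


def log_ratio(blue, green, n=1000.0):
    """Band log-ratio feature; the constant n keeps both logarithms positive."""
    return math.log(n * blue) / math.log(n * green)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def knn_mean(query, features, values, k):
    """Mean of the values attached to the k features nearest to query.

    Stand-in for the random forest residual model described in the abstract.
    """
    order = sorted(range(len(features)), key=lambda i: math.dist(query, features[i]))
    nearest = order[:k]
    return sum(values[i] for i in nearest) / len(nearest)


# Synthetic control points: (blue reflectance, green reflectance, depth in m).
control = [(0.080, 0.090, 1.5), (0.075, 0.070, 4.0), (0.070, 0.055, 7.5),
           (0.065, 0.045, 10.0), (0.060, 0.035, 13.5)]
features = [(b, g) for b, g, _ in control]
depths = [d for _, _, d in control]

# Step 1: initial log-ratio inversion calibrated on the control depths.
ratios = [log_ratio(b, g) for b, g in features]
slope, intercept = fit_line(ratios, depths)
initial = [slope * r + intercept for r in ratios]

# Step 2: residuals between control depths and initial estimates.
residuals = [d - z for d, z in zip(depths, initial)]


def corrected_depth(blue, green, k=2):
    """Steps 3 and 4: predict the residual from spectra, add it to the initial depth."""
    z0 = slope * log_ratio(blue, green) + intercept
    return z0 + knn_mean((blue, green), features, residuals, k)


print(round(corrected_depth(0.072, 0.060), 2))
```

In practice the residual model would be trained on many more control points and evaluated on pixels held out from calibration. The k-nearest-neighbor stand-in only conveys the structure of the correction.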
Research Articles

Reference-free Evaluation of Geometric–Textural Consistency for Photogrammetric Meshes

Available online: April 22, 2026
Authors: Wang, Wenxiang; Liu, Yihang; Zhou, Yang; Jiang, Yuying; Yang, Fan; Hu, Zhongwen;
Abstract:
Photogrammetric mesh has emerged as a novel type of remote sensing data, finding wide application in diverse remote sensing scenarios, such as urban management, forestry monitoring, and agricultural surveillance. Semantic analysis of photogrammetric meshes serves as the foundation for numerous applications, in which the quality of triangular facets exerts a significant effect on the accuracy of face-wise mesh segmentation. Specifically, the inconsistency between geometric edges and textural edges often gives rise to faces with mixed semantic labels, which substantially undermines the reliability and precision of subsequent semantic analysis. Thus, the evaluation of such quality is essential for ensuring the high quality of photogrammetric meshes for semantic analysis. However, current mesh quality assessment methods predominantly focus solely on geometric and visual fidelity, lacking dedicated metrics for assessing the intrinsic consistency between geometric edges and textural boundaries. To address this limitation, we propose a reference-free approach for photogrammetric meshes based on assessing the consistency between geometric edges and texture-derived boundaries. The proposed approach projects 3D mesh faces onto the 2D texture space and compares their geometric edges with texture-derived object contours. We quantify this consistency using a novel buffer-zone–matching strategy combined with an area-weighted intersection over union (AIoU) metric. The experiments were carried out on a multi-resolution data set and the public SUM Parts benchmark data set. The results indicate that the metric exhibits high sensitivity to quality variations across different levels of detail. Furthermore, a statistically significant correlation between AIoU and upper-bound semantic purity is observed, validating its reliability in reflecting intrinsic semantic segmentation potential. The proposed method also provides new insight into 3D reconstruction and mesh quality evaluation. The code is available at https://github.com/zwhoo/geometric-textural-consistency-evaluation
Research Articles Open Access

Revealing Feature Contribution Mechanisms for Remote Sensing Scene Understanding

Available online: April 20, 2026
Authors: Chen, He; Zheng, Xianwei; Gong, Jianya;
Abstract:
Deep learning plays a central role in remote sensing scene understanding, making interpretability essential for analyzing and trusting model decisions. Feature contribution analysis is a key interpretability tool, yet existing methods often rely on artificial feature conflicts or feature suppression, which are easily confounded by strong semantic correlations in remote sensing imagery. (In particular, the fixed overhead viewing geometry tightly couples object shape with semantic category, which biases feature contribution estimation and misleads the interpretation of intrinsic model preferences, thus obscuring the genuine feature utilization patterns of models.) To address these limitations, we propose a systematic feature contribution analysis framework that integrates multi-modal feature decoupling with dynamic contribution aggregation. By disentangling shape, texture, and spectrum representations and progressively aggregating them, the proposed method enables unbiased quantification of feature contributions. The framework supports cross-architecture and cross–data set analysis. Extensive experiments reveal clear architectural- and data set–dependent feature preference patterns: convolutional neural networks exhibit an inherent texture bias across remote sensing tasks, while Vision Transformers realize balanced integration of shape, texture, and spectrum in object-level classification and shift to spectral feature dominance in scene-level land cover classification. We further find that the feature preference of remote sensing deep learning models is jointly determined by network inductive biases and data set characteristics rather than a single architectural attribute, offering new insights into remote sensing deep learning models.
Research Articles

Three-dimensional Drainage Infrastructure Modeling for Flood Vulnerability in Southeast Texas

Available online: April 15, 2026
Authors: Ibironke, Ademola; Dhakal, Uddav; Bhusal, Shishir; Wu, Xing; Kim, Yong Je; Lai, Feilin;
Abstract:
Urban flooding poses escalating risks to transportation infrastructure in low-lying, flood-prone regions, underscoring the need for scalable, elevation-aware assessment frameworks. This study evaluates the flood vulnerability of drainage infrastructure in Southeast Texas by developing an automated 3D modeling framework that integrates digital elevation models, airborne lidar data, and water surface elevation scenarios through a fully automated pipeline incorporating large-language-model–assisted data standardization, Python/ArcPy geometry generation, and attribute-driven computer-generated architecture procedural modeling. Using ArcGIS Pro and CityEngine, 842 culverts and 223 bridges were modeled following quality-control filtering, and flood exposure was quantified using an elevation-differencing metric for 100- and 500-year flood events. The results indicate that fewer than 2% of culverts remain above water during a 100-year flood, while fewer than 1% remain accessible under a 500-year event, with median inundation depths of approximately 0.54 and 0.90 m, respectively. The proposed workflow reduces manual processing time by approximately 80% while preserving vertical consistency, providing a transferable and uncertainty-aware framework to support regional flood-resilience planning and future integration with hydrodynamic models. The code and supporting materials associated with this study are publicly available via GitHub at: aibironke/3D-Drainage-Flood-Vulnerability
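An elevation-differencing exposure metric of the kind described above can be sketched as the water surface elevation minus the structure elevation, clipped at zero. The sign convention, the function names, and the sample elevations are assumptions for illustration only. The study's actual ArcPy workflow is not reproduced here.

```python
from statistics import median


def inundation_depth_m(structure_elev_m: float, water_surface_elev_m: float) -> float:
    """Depth of water over a structure; 0.0 when the structure stays above water."""
    return max(0.0, water_surface_elev_m - structure_elev_m)


def summarize(structure_elevs_m, water_surface_elev_m):
    """Return (share of structures above water, median depth of the flooded ones)."""
    depths = [inundation_depth_m(e, water_surface_elev_m) for e in structure_elevs_m]
    flooded = [d for d in depths if d > 0.0]
    share_dry = (len(depths) - len(flooded)) / len(depths)
    return share_dry, (median(flooded) if flooded else 0.0)


# Hypothetical culvert elevations (m) and one flood-scenario water surface elevation (m).
culverts = [9.2, 9.8, 10.1, 10.4, 11.6]
share_dry, median_depth = summarize(culverts, water_surface_elev_m=10.5)
print(f"above water: {share_dry:.0%}, median depth of flooded culverts: {median_depth:.2f} m")
```

Running the same summary against a second, higher water surface elevation would give the comparison between flood scenarios.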
Research Articles

Fine 3D Modeling of Tunnel Surrounding Rock and Extraction of Rock Mass Structure Features Based on RGB-D Camera

Available online: April 14, 2026
Authors: Zhang, Rongchun; Han, Jiaqi; Chen, Song; Jiang, Yi; He, Yanlan; Ge, Yufan; Shi, Shang; Liu, Lanfa;
Abstract:
In the context of large-scale hydropower infrastructure, the accurate extraction of tunnel rock mass structural information is vital for informed engineering design. However, this task remains inherently difficult due to the complexity of subsurface geological conditions and the inherent limitations of conventional survey techniques. This study proposes an integrated methodology that combines high-resolution 3D reconstruction with the Geo-AINet ensemble learning framework to enable automated identification of structural planes. The process involves generating detailed tunnel models, extracting multi-dimensional semantic features, performing initial segmentation, and subsequently applying cluster analysis to refine the structural interpretations. Experimental validation confirms the method’s high level of precision: the extracted structural orientations exhibit average angular deviations of less than 3 degrees, with a maximum error margin not exceeding 5 degrees relative to manual measurements. These findings demonstrate the method’s capacity to meet stringent engineering standards while enhancing both the efficiency and safety of geological data acquisition in tunneling projects.
Research Articles

Cross-Domain Few-Shot Hyperspectral Classification with Feature Enhancement and Loss Optimization

Available online: April 13, 2026
Authors: Dang, Yuanyuan; Li, Mengyu; Li, Hao
Abstract:
Cross-domain few-shot learning has attracted increasing attention in hyperspectral image classification. However, in scenarios with scarce annotations and significant domain shifts, existing methods struggle to simultaneously adapt to the inherent spectral-spatial coupling of hyperspectral data while maintaining high-dimensional data-processing efficiency and balancing the learning of few-shot and cross-domain hard samples. This study proposes a collaborative optimization scheme based on improvements to the cross-domain few-shot learning with cross-modal alignment and supervised contrastive learning method. Specifically, an enhanced SimAM module dynamically weights spectral-spatial features to strengthen discriminative information of ground objects and mitigate domain shifts. A Channel Relation Attention module is introduced to capture channel-spatial dependencies and adaptively select key channels, improving high-dimensional feature processing efficiency. Moreover, a weighted joint loss function combining cross-entropy and focal loss is constructed, with class weights adjusted according to the joint distribution of source and target domains to balance sample learning priorities. Experiments on three benchmark hyperspectral data sets demonstrate that the proposed method achieves significant improvements in classification accuracy, robustness, and cross-domain adaptability.
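A weighted joint loss that combines cross-entropy and focal loss can be sketched for a single sample as below. The mixing factor `lam`, the focusing parameter `gamma`, and the example class weights are assumptions. The abstract does not give the exact formulation or say how the class weights are derived from the source and target distributions.

```python
import math


def cross_entropy(probs, target, class_weights):
    """Class-weighted cross-entropy for one sample with predicted probabilities."""
    return -class_weights[target] * math.log(probs[target])


def focal_loss(probs, target, class_weights, gamma=2.0):
    """Class-weighted focal loss: down-weights samples that are already well classified."""
    p = probs[target]
    return -class_weights[target] * (1.0 - p) ** gamma * math.log(p)


def joint_loss(probs, target, class_weights, lam=0.5, gamma=2.0):
    """Convex combination of the two losses; lam = 1 gives pure cross-entropy."""
    ce = cross_entropy(probs, target, class_weights)
    fl = focal_loss(probs, target, class_weights, gamma)
    return lam * ce + (1.0 - lam) * fl


# Hypothetical class weights (for example, larger for rarer classes).
weights = [1.0, 2.0, 1.5]
easy = joint_loss([0.90, 0.05, 0.05], target=0, class_weights=weights)
hard = joint_loss([0.30, 0.40, 0.30], target=0, class_weights=weights)
print(f"easy sample: {easy:.4f}, hard sample: {hard:.4f}")
```

The focal term shrinks the contribution of confidently classified samples, so the hard sample dominates the combined loss more strongly than it would under cross-entropy alone.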
Research Articles

Lidar-based Simultaneous Localization and Mapping with Connecting Traverse Constraints for Underground Mine Mapping

Available online: April 12, 2026
Authors: Xu, Zhihua; Li, Yuanyuan; Xu, Ershuai; Wu, Jing; Lin, Jiaxuan; Zhang, Yuansheng; Yang, Bisheng; Peng, Suping;
Abstract:
Underground 3D mapping plays a vital role in mineral resource exploration and development. Lidar-based simultaneous localization and mapping (SLAM) has become a key technology in this domain, offering autonomous navigation and real-time mapping capabilities. However, in narrow and geometrically repetitive underground mine corridors, characterized by long passages and a lack of loop closures, conventional lidar-SLAM systems suffer from significant error accumulation, leading to degraded mapping accuracy. To overcome these limitations, we propose a novel lidar-SLAM method incorporating connecting traverse constraints. First, edge and planar features are extracted from lidar scans via a segmentation strategy and refined using least-squares fitting. These features are then integrated into a factor graph framework to optimize pose estimation and construct a consistent point cloud map. Furthermore, we introduce an external traverse connection mechanism to impose additional constraints in the pose graph, effectively correcting trajectory drift and 3D point coordinates. Our approach mitigates the effect of noise and error accumulation commonly encountered in traditional lidar-SLAM techniques. Evaluation on a self-collected data set and two public benchmarks demonstrates that the proposed method consistently outperforms four state-of-the-art systems: LOAM, LeGO-LOAM, S-LOAM, and F-LOAM. In our self-collected data set, the trajectory root mean square error (RMSE) values for LOAM, LeGO-LOAM, S-LOAM, F-LOAM, and the proposed method are 0.693, 0.506, 4.062, 2.542, and 0.467 m, respectively. Compared to these baseline methods, our approach achieves error reductions of 32.6%, 7.7%, 88.5%, and 81.6%, respectively. On the two public data sets, the trajectory RMSE values for the same methods are 1.132, 0.351, 5.848, 8.850, and 0.084 m and 0.612, 0.590, 3.671, 9.279, and 0.558 m. This corresponds to relative error reductions of 88.6%, 67.2%, 97.5%, and 98.5% and of 8.8%, 5.4%, 84.8%, and 94.0%, respectively, confirming its robustness and accuracy in challenging underground environments.
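The error reductions quoted for the self-collected data set follow from the usual relative-reduction formula, (baseline − proposed) / baseline. The short sketch below reproduces them from the RMSE values given in the abstract; the function name is illustrative.

```python
def error_reduction_pct(baseline_rmse: float, proposed_rmse: float) -> float:
    """Relative RMSE reduction of the proposed method over a baseline, in percent."""
    return 100.0 * (baseline_rmse - proposed_rmse) / baseline_rmse


# Trajectory RMSE values (m) on the self-collected data set, from the abstract.
baselines = {"LOAM": 0.693, "LeGO-LOAM": 0.506, "S-LOAM": 4.062, "F-LOAM": 2.542}
proposed = 0.467

reductions = {name: round(error_reduction_pct(rmse, proposed), 1)
              for name, rmse in baselines.items()}
print(reductions)
```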
Research Articles

Terrain Variation Aware Surface Height Estimation Network for High-Precision Multi-view Satellite Stereo

Available online: April 4, 2026
Authors: Guo, Wei; Chen, Min; Fang, Tong; Zhao, Junqi; Zhang, Jinbo; Zhang, Zhanhao; Liu, Kai; Hu, Han; Ge, Xuming; Zhu, Qing; Xu, Bo;
Abstract:
Deep learning–based multi-view stereo (MVS) has shown great potential for large-scale 3D reconstruction from satellite imagery. However, existing methods struggle with the large depth ranges and complex scenes inherent to this domain. By relying on rigid range-guided sampling and local consistency assumptions, these approaches often neglect specific terrain variations. This leads to misallocated hypothesis budgets and excessive smoothing across discontinuities in complex terrain. To address these limitations, we propose a terrain variation–aware multi-view satellite imagery surface height estimation network that introduces surface normal information to construct an end-to-end terrain-guided height estimation framework. To effectively capture terrain variation, the network computes normal maps from height maps and measures local terrain variation within the neighborhood of reference pixels based on height and normal information. The network further incorporates a local terrain-guided height hypothesis module and a terrain variation–aware feature enhancement module that optimize height hypothesis sampling precision and enhance the terrain-aware features in regions with significant terrain variation, respectively. Additionally, we propose a joint height-normal loss to implicitly optimize height estimation. Extensive experiments conducted on the WHU-TLC and MVS3D data sets demonstrate that our method achieves state-of-the-art performance across all evaluation metrics. Notably, in urban areas and mixed scenes, the proposed approach achieves a significant improvement in the proportion of high-precision pixels compared to existing methods, with the maximum improvement reaching 30.25%.
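One standard way to compute a normal map from a height map is to treat the surface as z = h(x, y), so that the unnormalized normal is (−∂h/∂x, −∂h/∂y, 1). The sketch below estimates the gradients with finite differences. It is a generic illustration, not the network's actual layer. The grid spacing parameter and the function name are assumptions, and the input grid must have at least two rows and two columns.

```python
import math


def normal_map(height, spacing=1.0):
    """Unit surface normals for a 2D height grid (list of rows), one per cell.

    Uses central differences in the interior and one-sided differences at the
    borders; spacing is the ground distance between neighboring cells.
    """
    rows, cols = len(height), len(height[0])
    normals = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
            c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
            dzdx = (height[r][c1] - height[r][c0]) / ((c1 - c0) * spacing)
            dzdy = (height[r1][c] - height[r0][c]) / ((r1 - r0) * spacing)
            norm = math.sqrt(dzdx * dzdx + dzdy * dzdy + 1.0)
            normals[r][c] = (-dzdx / norm, -dzdy / norm, 1.0 / norm)
    return normals


# A 3 x 3 ramp rising by 1 m per cell along x: every normal tilts back by 45 degrees.
ramp = [[0.0, 1.0, 2.0],
        [0.0, 1.0, 2.0],
        [0.0, 1.0, 2.0]]
nx, ny, nz = normal_map(ramp)[1][1]
print(round(nx, 4), round(ny, 4), round(nz, 4))
```

The angle between neighboring normals, or the spread of heights in a window, could then serve as a simple measure of local terrain variation.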
Research Articles Open Access

UAV Lidar vs. Structure-from-Motion (SfM) Photogrammetry for Coastal Dune Monitoring: A Two-Year Multi-Temporal Case Study from the Southern Baltic Sea

Available online: April 1, 2026
Authors: Śledziowski, Jakub
Abstract:
Unmanned aerial vehicles (UAVs) are increasingly used for high-resolution coastal monitoring. This study compares two widely used approaches for deriving digital elevation models (DEMs) from eight repeat UAV surveys (coacquired lidar + red–green–blue [RGB] in the same flight): (1) UAV lidar and (2) structure-from-motion photogrammetry (SfM) based on RGB imagery. A total of 16 DEMs (8 field campaigns in both 2022 and 2023) were acquired with a DJI Matrice 300 real-time kinematic drone equipped with a Zenmuse L1 sensor over a 1-km dune–beach system in Mrzeżyno, southern Baltic coast (Poland). Using an automated transect-based workflow (895 cross-shore profiles), shoreline position, beach and dune widths, elevations, slopes, and volumetric change were extracted and related to tide-gauge–based hydrometeorological conditions using deep neural networks and complementary driver-consistency analyses. Both approaches captured consistent temporal trends and converged on the same dominant environmental drivers, with storm activity (event count and duration) together with wave and sea-level conditions explaining most of the observed variability. Differences were largest over water surfaces and locally within vegetation-affected dune areas, where lidar produced fewer artifacts. Despite these local deviations, the net beach + dune volume change between the first and last survey differed by only 0.48% between lidar and SfM, indicating that SfM can provide comparable coastal-change information for many monitoring tasks when survey geometry and quality control are carefully managed.
Research Articles

Enhanced Visual Relocalization: A Cross-Modal Scene Coordinate Regression Approach

Available online: March 26, 2026
Authors: Ma, Wei; Wang, Xinyu; Zhang, Yujie; Zhu, Chao; Dai, Hengming; Ge, Liang
Abstract:
Visual relocalization finds application across a multitude of domains. Within this realm, scene coordinate regression methods are particularly noteworthy, as they bypass traditional intermediate steps and directly estimate camera pose by regressing 2D–3D point correspondences. However, these methods rely solely on reprojection constraints and must implicitly triangulate points. Without the guidance of a ground-truth 3D point cloud, the model’s ability to achieve high positioning accuracy is compromised. In this study, we address the challenge by incorporating the concept of cross-modal feature detection loss into our network architecture. We introduce the cross-modal feature detection-based scene coordinate regression network (CFDN), a novel network that integrates a randomization technique to blend image-derived features with corresponding pixel positions, camera intrinsics, and ground-truth poses. This integration effectively mitigates correlated gradients, thereby significantly enhancing the efficiency of the training process. The network culminates in a regression layer that maps 2D pixel coordinates to their corresponding 3D scene coordinates with high precision. Notably, we have engineered a novel cross-modal feature detection loss by introducing explicit 3D geometric constraints based on the idea of contrastive learning on top of the 2D reprojection loss to refine the accuracy of scene regression. Empirical results demonstrate that our method achieves state-of-the-art performance. Specifically, CFDN achieves a relocalization accuracy of 97.2% and 99.9% on the indoor 7Scenes and 12Scenes data sets, respectively. On the outdoor Cambridge landmarks data set, it reduces the average median pose error to 17 cm/0.2°, outperforming existing baselines while maintaining a compact model footprint without requiring 3D models or depth maps for supervision.
Research Articles Open Access

Beyond Global Navigation Satellite Systems–Based Georeferencing: Recent Trends in Positioning and Navigation

Available online: March 18, 2026
Authors: Eid, Mohamedelmustafa Omer; Remondino, Fabio; Toth, Charles
Abstract:
Positioning, navigation, and timing (PNT) information is essential for a range of critical civil, industrial, and defense applications. PNT relies predominantly on global navigation satellite systems (GNSSs), but GNSS signals are becoming increasingly vulnerable to natural and man-made disruptions, which has motivated the development of alternative PNT technologies that can complement or substitute GNSS in denied environments. These approaches fall into three categories: opportunistic, infrastructure-based, and environment-based PNT. While numerous studies have examined individual techniques, the literature remains fragmented and lacks a comprehensive review. This paper reviews and synthesizes recent advances across all three categories, highlighting application scenarios, achieved accuracy, and limitations. Opportunistic PNT, i.e., using signals of opportunity, has evolved rapidly over the past decade with the introduction of 5G cellular systems and the large-scale deployment of low Earth orbit constellations, although their accuracy remains at the meter level. Infrastructure-based methods can achieve meter- to subcentimeter-level accuracy, but with trade-offs between accuracy, coverage, and reliability. Environment-based PNT requires no external infrastructure and continues to advance through improvements in visual and lidar simultaneous localization and mapping (SLAM) algorithms, while radar-SLAM offers a more robust alternative in environments where visual- or lidar-based systems may struggle. Despite all these alternatives, GNSS remains unmatched due to decades of infrastructure investment and operational evolution. Future resilient PNT in GNSS-denied environments will rely on adaptive multi-sensor integration, combining ambient signals, dedicated infrastructure, and SLAM to meet application-specific needs.
Research Articles

An Improved Manifold Learning Algorithm for Dimension Reduction of Fusion Data for Multispectral and Synthetic Aperture Radar Images

Available online: March 16, 2026
Authors: Zheng, Zezhong; Huang, Jingfan; Yu, Shuang; Yu, Zhenlu
Abstract:
In the application of remote sensing data, the data obtained by a single sensor often cannot meet actual needs because the information they contain is limited. Data fusion can improve the availability of these data, but it also brings the problem of information redundancy. In this paper, a complete multi-source remote sensing image-processing framework was first constructed. Then, to address the problem of neighborhood selection in the isometric mapping (ISOMAP) algorithm, an improved ISOMAP algorithm based on the L1 norm (a sparse optimization technique that minimizes absolute value sums) was proposed to mine the inherent low-dimensional structure of multi-source remote sensing data and reduce the dimension of the data. This version of ISOMAP based on the L1 norm (L1-ISOMAP) improves the traditional manifold learning algorithm for dimensionality reduction tasks. After dimensionality reduction, the support vector machine (SVM) and random forest (RF) are used to classify the dimension-reduced data. The experimental results demonstrate that compared with other dimensionality reduction methods such as Pairwise Controlled Manifold Approximation Projection (PaCMAP), Uniform Manifold Approximation and Projection (UMAP), traditional ISOMAP, and other improved ISOMAP variants (achieving 96.90% overall accuracy with SVM and 98.69% with RF, with kappa coefficients of 0.96 and 0.98, respectively), our L1-ISOMAP achieves superior overall classification accuracy and exhibits stronger robustness in processing multi-source remote sensing data.
Research Articles

ICTNet: Interactive Convolution and Transformer Network for Hyperspectral Image Classification

Available online: March 10, 2026
Authors: An, Jinliang; Wang, Muzi; Dai, Longlong; Zhang, Weidong
Abstract:
The classification of a hyperspectral image (HSI) plays a critical role and serves as the foundation for many related applications. The combination of convolutional neural networks (CNNs) and transformers has shown promising performance in HSI classification by using the advantages of the two networks. However, existing hybrid models often offer limited feature interaction and fusion between the two branches. Here, a novel interactive convolution and transformer network (ICTNet) for HSI classification is proposed. Specifically, raw HSI is first fed into a multi-scale feature-enhancement module, where convolution operations with varying kernel sizes are used to extract multi-scale features, and shuffle attention is used to further enhance feature representation. Enhanced features are then processed by a dual-branch CNN and transformer fusion module to leverage local information and long-range dependencies. Additionally, the feature interaction and fusion module is proposed to facilitate dynamic interaction and fusion of features during extraction and propagation between the two branches, enhancing the diversity and interactivity of the features. Extensive experimental results on three real HSI data sets demonstrate that the proposed ICTNet outperforms state-of-the-art HSI classification methods.
Research Articles Open Access

Toward Automatic Vector Extraction from Scanned Historical Aerial Photo Indexes

Available online: March 9, 2026
Authors: Malek, Salim; Farella, Elisa Mariarosaria; Perda, Giulio; Cantoro, Gianluca; Remondino, Fabio
Abstract:
National archives worldwide have recently dedicated significant efforts to preserve fragile historical collections (e.g., maps, documents, photographs) through scanning and digitization. Advancements in digital technologies have opened new opportunities for processing and accessing these materials; however, as these collections grow, significant challenges arise in efficiently managing, linking, and using vast volumes of digital data. This paper addresses the challenge posed by the digital transformation of vast archive collections, with a focus on scanned materials preserved in historical photographic and mapping archives and intended for the creation of digital databases and online catalogues. This study targets the semi-automatic vectorization of aerial photo indexes (APIs), often referred to as finding aids for aerial reconnaissance sorties (i.e., maps on which aerial photo footprints [each corresponding to a historical aerial photo] from different surveying flight missions were manually drawn onto base topographic maps). Aerial images were captured during the 20th century for military, reconnaissance, and mapping purposes, and their footprints were then manually transposed onto reference maps. Currently, archives and mapping institutions managing and creating digital databases and catalogues are manually addressing a demanding and time-consuming vectorization task, and automated solutions are much needed. This contribution proposes a novel and comprehensive semi-automatic vectorization pipeline that leverages and integrates computer vision techniques and uses deep learning and computational geometry for detecting, isolating, and polygonizing aerial image footprints in APIs. The methodology is applied to a large collection of APIs scanned by the Italian National Historical AirPhoto Archive at the Italian National Institute for Cataloguing and Documentation of the Ministry of Culture, and the results will be added to the online database and WebGIS catalogue recently set up to disseminate copies of historical aerial photographs.
Research Articles

Lidar–PTZ Fusion System for Environmental Perception and Analysis Detection

Available online: March 6, 2026
Authors: Ye, Tao; Seo, Minwu; Lee, Chul-Hee
Abstract:
With the rapid proliferation of unmanned aerial vehicles (UAVs), concerns regarding privacy breaches, unauthorized intrusions, and public safety have become increasingly critical. To address these challenges, this study developed a lidar–pan-tilt-zoom (PTZ) camera fusion system for real-time UAV detection and tracking. The proposed approach integrates YOLOv11-based visual detection, PointPillars point-cloud processing, and an adaptive extended Kalman filter (EKF) within a unified framework to achieve robust and temporally consistent target state estimation under sparse observation conditions. A spatially constrained virtual-point augmentation scheme is introduced to improve feature stability and model generalization under sparse lidar point-cloud conditions. Furthermore, a dual-layer fusion EKF strategy is designed to jointly exploit PointPillars detection outputs and adaptive noise–aware centroid measurements, enabling robust dynamic state estimation and continuous tracking even with intermittent data. The proposed system was implemented and evaluated using both public data sets and real-world outdoor experiments. Experimental results demonstrate that the lidar–PTZ fusion system achieves an average positional deviation of ±1.62 m, an effective detection range of about 30 m, and a maximum valid range of approximately 42 m, while maintaining a frame-level detection continuity of 96.8% and operating at around 32 frames/second. The evaluation results demonstrate that the fusion strategy effectively mitigates the limitations of single-sensor detection and provides a reliable foundation for multi-target cooperative sensing and airspace security monitoring applications.
Research Articles

Perspective-Invariant Matching of Aerial–Terrestrial Images Based on Multi-Coplanar Geometry for Building Modeling

Available online: March 5, 2026
Authors: He, Haiqing; Chen, Longyu; Zhou, Fuyang; Yuan, Ye; Liu, Jiahao; Wang, Zhenda
Abstract:
Multi-view images captured from both aerial and terrestrial perspectives exhibit substantial variations in viewpoints, pronounced geometric distortions, and significant scale changes, posing considerable challenges for the integrated three-dimensional (3D) modeling of buildings within an aerial–terrestrial collaborative framework. To address these challenges, this paper introduces a perspective-invariant matching method for aerial–terrestrial images based on multi-coplanar geometry, specifically tailored for building modeling. The proposed method develops an innovative multi-perspective transformation model that iteratively extracts multiple coplanar geometric relationships from two-dimensional imagery using local matching correlations. Furthermore, it deduces the functional interrelations among these planes to construct a tailored multi-coplanar geometric model for local matching domains. Subsequently, the local imagery is segmented into subimages across various perspective spaces, and a deep matching graph neural network (GNN) is used to discern point and line structures within these images. By applying inverse perspective transformations, the corresponding lines and points within the coplanar spaces are accurately reprojected onto the original multi-view images. Ultimately, aerial triangulation is conducted using cross-platform corresponding features to achieve refined building modeling. Experimental results demonstrate that our method reconstructs more detailed 3D models of buildings than existing techniques. Notably, the accuracy of cross-platform image matching exceeds 99.27%, highlighting the effectiveness and robustness of our proposed method.
Research Articles

Reconstructing Heritage Interiors from Point Clouds via Architectural Element Segmentation

Available online: March 4, 2026
Authors: Wang, Feng; Yang, Zexin; Liu, Yang; Zhang, Ming; Ye, Qin; Luo, Junqi
Abstract:
Indoor 3D reconstruction is crucial for digital documentation, preservation, and management of historical buildings by capturing their detailed spatial layouts and architectural structures. However, reconstructing heritage interiors from point clouds remains challenging due to data incompleteness and imperfect segmentation caused by complex geometries, occlusions, and intricate spatial arrangements. To address these challenges, we proposed a semantically guided framework for structured 3D reconstruction of historical interiors. Our approach began by extracting architectural elements using learning-based semantic and instance segmentation, augmented by an implicit feature enhancement module. We then generated a 2D floor plan from wall and ceiling points, followed by constructing an enclosure model composed of walls, ceilings, and floors via Markov random field optimization. Remaining structural elements (e.g., beams, columns, windows, and doors) were reconstructed from detected geometric primitives. Finally, we integrated the enclosure model and structural element models into a complete indoor reconstruction. Experiments on point clouds captured by mobile laser scanning, terrestrial laser scanning, and photogrammetry demonstrated that our method achieved superior accuracy, compactness, and computational efficiency compared to existing approaches, validating its robustness across diverse data sources.
Research Articles

MFOR: A Benchmark for Multi-scene Military Fine-Grained Object Recognition

Available online: March 3, 2026
Authors: Yan, Qiuyu; Su, Yu; Wang, Shu; Yu, Yongan; Hua, Yixin; Dai, Chenguang
Abstract:
Given the growing demand for military fine-grained object reconnaissance, warning, and intelligence analysis, the automated recognition of military objects using high-resolution remote sensing imagery has become increasingly significant. Nevertheless, the existing remote sensing military object recognition data sets are characterized by a limited number of fine-grained category levels and the difficulty in encompassing the typical military objects. These shortcomings limit the accurate recognition of military objects in different scenes, and the data sets lack associations between different class levels. To address the above issues, this research uses satellite remote sensing data platforms, such as Google Earth and PIE Engine, to collect high-resolution remote sensing image data from developed countries. Military objects are identified across various scenes, including air, sea, and ground. Combined with common related standards, this research uses rotated boxes and a four-level category system (scene, kind, function, and type) to conduct annotations. A military fine-grained object recognition data set named MFOR (for Military Fine-grained Object Recognition; data set download link: https://drive.google.com/file/d/1sjH7p94wM16fPr_shCjZ4o0swDUSVP2N/view?usp=drive_link) has been constructed and made publicly accessible. MFOR is a large-scale military object fine-grained recognition data set that covers different scenes, including sea, land, and air. It consists of 12 633 images, 145 distinct fine-grained categories, and 38 759 object instances. To establish a benchmark for military fine-grained object recognition in remote sensing imagery, this research conducts experiments and analysis on eight rotated-box object recognition networks using the MFOR data set, aiming to offer valuable references for related research.
Research Articles

Localized Brightness–Shape–Moisture Soil Parameterization for Improved PROSAIL-5 Canopy Simulation

Available online: March 2, 2026
Authors: Bonnet, Wessel; Cho, Moses Azong; Chirwa, Paxie W.; Masemola, Cecilia
Abstract:
The mapping and modeling of canopies are important for the understanding, monitoring, and management of vegetation systems. To effectively model or simulate canopy spectra at the hyperspectral level, background soil spectra must first be simulated accurately. Spectrometer data simulation such as this will become increasingly relevant as newer imaging spectroscopy missions such as PRISMA and AVIRIS-NG start to produce more data sets. In this study, the brightness–shape–moisture (BSM) radiative transfer model (RTM) with localized parameter distributions was used in conjunction with the PROSAIL-5 RTM to simulate canopy spectra for different biomes in the semiarid regions of Southern Africa as captured with PRISMA or AVIRIS-NG sensors. It is hypothesized that a PROSAIL-5 RTM with biome-specific BSM parameters can more accurately simulate PRISMA and AVIRIS-NG spectra than a PROSAIL-5 model simulated with the default soil spectrum, as represented by lower relative root mean square error (RRMSE) values against actual spectra. It is demonstrated that the PRISMA and AVIRIS-NG canopy spectra are more effectively simulated using their respective biome-specific BSM parameter regimes, with RRMSE improvements from 14.56% to 13.91% and from 13.64% to 10.71% for two tested images captured with these sensors.
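The RRMSE comparison of simulated and measured spectra can be sketched as follows. This is an illustration only: the abstract does not state which normalization the authors use, so the sketch assumes the common definition of RMSE divided by the mean of the observed values, and the reflectance values are made up.

```python
import math


def rrmse_percent(simulated, observed):
    """Relative RMSE in percent, assuming normalization by the mean observed value."""
    if len(simulated) != len(observed) or not observed:
        raise ValueError("spectra must be non-empty and of equal length")
    n = len(observed)
    rmse = math.sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)) / n)
    return 100.0 * rmse / (sum(observed) / n)


# Made-up reflectance values at four bands, for illustration only.
observed = [0.10, 0.20, 0.40, 0.30]
default_soil = [0.14, 0.16, 0.44, 0.26]
localized_soil = [0.11, 0.19, 0.41, 0.29]
print(rrmse_percent(default_soil, observed))
print(rrmse_percent(localized_soil, observed))
```

A lower value for the second simulation would indicate the kind of improvement the abstract reports, for example from 13.64% to 10.71%.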
Research Articles

Remote Sensing Image Denoising Network Based on Locally Enhanced Window Transformer

Available online: March 2, 2026
Authors: Zhang, Shaohua; Li, Wenyun; Bian, Hefang
Abstract:
Remote sensing images serve as indispensable carriers of information in diverse fields, including natural resource management, ecological environment protection, and urban–rural construction planning. Acquiring high-quality remote sensing images is particularly critical for practical applications. However, limitations such as adverse weather conditions and equipment constraints often hinder the acquisition of high-quality data. To tackle this challenge, this paper proposes a U-shaped Transformer network architecture, denoted ULTNet, that incorporates locally enhanced windows to restore low-quality remote sensing images to high-quality ones. The network leverages a locally enhanced transformer (LT) block, equipped with locally enhanced windows, for attention extraction. This block applies a local self-attention mechanism within nonoverlapping spatial windows, effectively reducing computational complexity while capturing richer global dependencies to enable end-to-end remote sensing image restoration. Both quantitative and qualitative analyses demonstrate that the proposed method exhibits superior restoration performance and strong generalization capabilities.
Research Articles

Reserve Identification Model for Oil Tank Based on Satellite Image Analysis under Complex Observation Angles

Available online: March 1, 2026
Authors: Li, Yang; Yang, Jin-Rong; Liu, Ke-Hang; Chen, Hong-Wei
Abstract:
The reserve volume of oil tanks can be obtained through in-depth analysis of satellite images. This approach addresses the urgent demand for precise and rapid acquisition of global oil tank reserves. A novel method for extracting oil tank shadows based on the projection characteristics of a cylinder was proposed that extracts the shadows obscured by the oil tank’s side wall under complex observation angles. A preprocessed satellite image was used to extract the oil tank shadow via arc feature extraction, Hough transform, and an edge detection algorithm. Subsequently, a reserve identification model for oil tanks based on geometric projection and shadow boundary extraction was constructed. This model used the parallel projection characteristics of satellite imaging and combined the geometric structure of the oil tank to infer the shadow length from the geometric relationship between the extreme points of the shadow boundary and the tank centers. The oil tank height was then accurately determined, and finally, the reserve was identified using height difference and tank radius. The proposed method was validated through the establishment of a tank experimental system and the application of an oil tank case study from the United Arab Emirates. The research results show that the method exhibited high accuracy and reliability. Through multiple sets of tank experiments, the proposed method yielded relative errors ranging from 0.49% to 2.14% in identifying the reserve of the tank model. To further verify its accuracy, the method was applied to oil tank No. 18 at Jebel Dhanna Port, yielding a relative error of 0.44% in volume identification.
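The final geometric step, going from shadow lengths to a stored volume, can be sketched under strong simplifying assumptions. The sketch below is not the authors' model: it assumes a nadir view (so it ignores the complex observation angles the paper addresses), a known sun elevation, and a floating-roof tank whose fill height is the wall height minus the roof depth below the rim. All function and parameter names are hypothetical.

```python
import math


def height_from_shadow(shadow_length_m, sun_elevation_deg):
    """Height of a vertical edge from its shadow length on flat ground (nadir view assumed)."""
    return shadow_length_m * math.tan(math.radians(sun_elevation_deg))


def stored_volume(radius_m, wall_height_m, roof_depth_m):
    """Cylinder volume below a floating roof: pi * r^2 * (wall height - roof depth)."""
    fill_height = wall_height_m - roof_depth_m
    if fill_height < 0:
        raise ValueError("roof depth cannot exceed wall height")
    return math.pi * radius_m ** 2 * fill_height


# Hypothetical measurements: exterior shadow 12 m, interior shadow 4 m, sun elevation 45 degrees.
wall = height_from_shadow(12.0, 45.0)
depth = height_from_shadow(4.0, 45.0)
print(round(stored_volume(20.0, wall, depth), 1), "cubic meters")
```

Because the volume scales linearly with the height difference, a relative error in the recovered shadow lengths carries over directly to the reserve estimate, which is why accurate shadow boundary extraction matters.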
Research Articles Open Access

Evaluating U.S. Geological Survey Three-dimensional Elevation Program Lidar Data as a Source for Building Height

Available online: February 27, 2026
Authors: Liu, Jung-kuan; Shavers, Ethan; Arundel, Samantha T.; Qin, Rongjun
Abstract:
Accurate building height information is used for numerous urban applications, including infrastructure modeling, disaster risk assessment, and energy demand estimation. The U.S. Geological Survey collects airborne lidar data to support the Three-dimensional Elevation Program (3DEP). The high point density and vertical accuracy of the lidar point clouds are optimal for 3D mapping. Few studies have systematically evaluated building height extraction from 3DEP lidar point clouds in urban areas. This study assesses the accuracy and consistency of building height estimates from two publicly available data sets by comparing them to the 3DEP lidar data. The public building height data sets are USA Structures and Microsoft Global Building Footprints (GBF), and they are evaluated across two contrasting urban sites. Building heights from USA Structures are also lidar-derived, whereas GBF estimates are based on stereo matching of optical imagery. Building heights from 3DEP were extracted by classifying lidar points using a deep learning model, isolating roof and ground returns, and computing the height as the median elevation difference between the roof and surrounding ground points. For both sites, 3DEP lidar-derived heights show strong agreement with USA Structures (R² = 0.87 and 0.86; MAE = 1.8 and 2.2 m), confirming the internal consistency and reliability of lidar-derived measurements. In contrast, comparisons with GBF height estimates reveal significantly lower correlations (R² = 0.54 and 0.60; MAE = 2.5 and 3.1 m), highlighting the limitations of imagery-based height reconstruction, particularly in dense or heterogeneous urban environments. These findings indicate that lidar can be used as an accurate source for vertical structure characterization and underscore the benefits of improved integration and bias correction when using imagery-derived building heights. The results provide support for hybrid approaches that combine the broad spatial coverage of optical data sets with the accuracy of lidar for scalable 3D urban mapping.
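The height computation and the MAE comparison can be sketched in a few lines. This is one reading of the abstract, not the authors' pipeline: it assumes roof and ground returns have already been classified, interprets "median elevation difference" as the median roof elevation minus the median ground elevation, and uses made-up sample values.

```python
from statistics import median


def building_height(roof_elevations_m, ground_elevations_m):
    """Median roof elevation minus median surrounding ground elevation (assumed reading)."""
    return median(roof_elevations_m) - median(ground_elevations_m)


def mean_absolute_error(heights_a_m, heights_b_m):
    """MAE between two equally long lists of per-building heights."""
    if len(heights_a_m) != len(heights_b_m) or not heights_a_m:
        raise ValueError("height lists must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(heights_a_m, heights_b_m)) / len(heights_a_m)


# Made-up elevations; 15.0 mimics a rooftop structure that the median suppresses.
roof = [12.1, 12.3, 12.2, 15.0]
ground = [2.0, 2.1, 2.2]
print(round(building_height(roof, ground), 2))  # 10.15

# Made-up per-building heights from two sources.
print(mean_absolute_error([10.0, 20.0], [12.0, 17.0]))  # 2.5
```

Using the median rather than the mean keeps isolated high returns, such as antennas or rooftop equipment, from inflating the estimate.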
Research Articles

Incorporating Directional Relationship Modelling for Building Change Detection from Very-High-Resolution Satellite Images

Available online: February 26, 2026
Authors: Zha, Ying; Li, Mengmeng; Liu, Xuanguang
Abstract:
Detecting building changes from bitemporal very-high-resolution satellite images is often hindered by false positives, especially for tall structures, due to variations in sun azimuth and satellite viewing angles. To address this issue, we propose MDR-Net, a novel change detection network that models the directional relationship between buildings and their shadows. Unlike existing methods that rely on handcrafted rules and metadata (e.g., solar angles), MDR-Net automatically learns directional features from images using a customized directional modeling module (DMM). To further enhance feature representation, we introduce an adaptive feature fusion module (AFFM), which uses adaptive selection and dynamic adjustment strategies to emphasize critical features and effectively fuse directional information with bitemporal semantics. We evaluated MDR-Net on three building change detection data sets, i.e., Fuzhou (FZ), Wuhan (WH), and Guangzhou (GZ), and compared it with seven state-of-the-art methods: SMCD-Net-m, ChangeFormer, BIT, FC-Siam-diff, SNUNet, DGMA2-Net, and MSPSNet. The results showed that MDR-Net achieved the highest F1 scores of 93.84% and 82.22% on the FZ and WH data sets, respectively, clearly reducing false detections, particularly under large viewing-angle differences. In addition, our method, with fixed DMM weights and without sun azimuth information, successfully learns the directional relationship when retrained from the FZ to GZ data sets, demonstrating strong generalization to the public data set.
Research Articles

Momentum-Based Perturbation Decomposition for Transferable Three-Dimensional Point-Cloud Attacks

Available online: February 15, 2026
Authors: Liu, Weiquan; Xie, Min; Huang, Xingwang; Su, Jiasheng; Sun, Yanwen; Lin, Shiwei; Su, Jinhe; Wang, Zongyue; Cai, Guorong
Abstract:
In the field of photogrammetry and remote sensing, three-dimensional point cloud data (acquired via airborne lidar, satellite stereo imagery, or terrestrial laser scanning) has emerged as the core data type for applications such as terrain mapping, urban modeling, disaster monitoring, and infrastructure inspection. The accuracy and reliability of these applications largely depend on the precise interpretation of point-cloud data. However, existing deep neural networks exhibit pronounced vulnerability to adversarial perturbation attacks that can mislead models into misclassifying targets, potentially leading to catastrophic consequences. Most of the current point cloud–perturbation methods rely on the white box–attack assumption, which is difficult to satisfy in real application scenarios. To address this gap, this paper focuses on enhancing the transferability of point-cloud perturbations, that is, the ability of adversarial samples generated for one model to mislead other uninvolved models. A novel adversarial point cloud–generation method, momentum-based decompose perturbation (MBDP), is proposed. The MBDP method decomposes adversarial perturbations into two orthogonal subperturbations. By integrating a momentum-based iterative fast sign algorithm, the MBDP method synchronously optimizes both subperturbations to generate adversarial samples that lie far from the original model’s decision boundary while maintaining cross-model generalizability. On both real-world remote sensing data sets and synthetic data sets, the MBDP method achieves strong performance. By exposing cross-model vulnerabilities in photogrammetric point clouds, this work equips developers with a diagnostic tool to audit and subsequently harden artificial-intelligence–driven mapping and monitoring systems, laying the groundwork for more reliable geospatial products in disaster response, urban planning, and environmental conservation.
Research Articles

Comparative Evaluation of Deep-Learning Models for Point Cloud Upsampling: Insights from Indoor Parking Lot Data Set

Available online: January 2, 2026
Authors: Fatholahi, Sarah Narges; Yin, Shunde; Hu, Kristie; He, Hongjie; Yao, Kevin Yikai; Zhang, Dedong; Lu, Dening; Li, Jonathan
Abstract:
Upsampling of point clouds is a critical process in three-dimensional data processing, aimed at enhancing the resolution, uniformity, and overall quality of sparse and irregular point cloud data. This task becomes especially necessary in underground parking lots, where factors such as occlusions, reflective surfaces, and limited sensor coverage often result in incomplete or uneven point distributions. Accurate upsampling in these environments can significantly improve downstream tasks such as semantic segmentation, object detection, and spatial mapping by providing denser and more geometrically consistent reconstructions. In this paper, a comprehensive evaluation of four state-of-the-art deep-learning algorithms for point cloud upsampling is presented. Leveraging a novel, high-quality data set captured in a real-world indoor parking lot environment, the performance of these models is systematically assessed under conditions characterized by structural complexity and varying point densities. This comparative study evaluates each model’s ability to address sparsity and non-uniformity while preserving geometric fidelity and achieving uniform distribution in challenging indoor scenarios.
Research Articles Open Access

Semantic Change Detection with Constrained Dual-Head Convolutional Neural Network Architecture for Oil/Gas Well Site Monitoring

Available online: December 1, 2025
Authors: Xu, Hongzhang; He, Hongjie; Zhang, Ying; Zhang, Dedong; Li, Jonathan
Abstract:
High-resolution mapping of land disturbance and reclamation is important for assessing the cumulative environmental effects of oil/gas production. The growing availability of high-resolution satellite imagery, combined with recent advances in deep learning, offers a desirable solution for detecting land surface changes on disturbed land. In this study, we constructed the Alberta oil/gas wells semantic change detection (SCD) data set in Alberta, Canada, based on high-resolution satellite imagery from WorldView-2 and SPOT-6. The data set consists of 328 pairs of bitemporal images (512 × 512 pixels at 1.5-m resolution), along with corresponding semantic change maps, binary change maps, and land cover maps. In addition, we proposed a constrained dual-head convolutional neural network (CNN) framework that jointly learns semantic change and binary change tasks. Specifically, two segmentation heads are designed, one for semantic changes and one for binary changes, and are explicitly connected through a cosine similarity loss that enforces consistency between the two tasks. Taking High-Resolution Net (HRNet)-v2 as the backbone, our model was pretrained on the large-scale SEmantic Change detectiON Data Set (SECOND) and fine-tuned on our developed data set. Comparative experiments with BiSRNet, HGINet, and SCanNet demonstrate that our approach achieves superior performance, with the highest mean intersection over union (mIoU) (79.47%) and separated Kappa (SeK) (28.40%) after fine-tuning. Incorporating land cover maps as additional supervision further boosts results, with our approach reaching an mIoU of 80.05% and a SeK of 29.71%. These findings highlight the effectiveness of the proposed constrained dual-head CNN architecture and the benefit of leveraging land cover information for advancing SCD in remote sensing.
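One way a cosine similarity loss could tie a binary change head to a semantic change head is sketched below. The abstract does not give the exact formulation, so this is an assumption: the semantic head's change probability is taken as one minus its "no change" probability, and the loss is one minus the cosine similarity between that vector and the binary head's output. The function name and the `no_change_index` parameter are hypothetical.

```python
import math


def cosine_consistency_loss(binary_prob, semantic_probs, no_change_index=0):
    """Return 1 - cosine similarity between two per-pixel change estimates.

    binary_prob: change probability per pixel from the binary head.
    semantic_probs: per-pixel class probability lists from the semantic head,
        where the entry at no_change_index is the "no change" probability
        (an assumed layout, not taken from the paper).
    """
    # Change probability implied by the semantic head.
    semantic_change = [1.0 - probs[no_change_index] for probs in semantic_probs]
    dot = sum(a * b for a, b in zip(binary_prob, semantic_change))
    norm_a = math.sqrt(sum(a * a for a in binary_prob))
    norm_b = math.sqrt(sum(b * b for b in semantic_change))
    if norm_a == 0.0 or norm_b == 0.0:
        # Both heads predicting "no change everywhere" counts as consistent.
        return 0.0 if norm_a == norm_b else 1.0
    return 1.0 - dot / (norm_a * norm_b)
```

The loss is near zero when both heads agree on where change occurs and grows toward one as their change maps diverge, so it can be added to the two segmentation losses as a consistency term.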
Research Articles Open Access

RSIDetNet: An Efficient Oriented Small Object Detection Model for Remote Sensing Images Based on Cross-Scale Feature Fusion and Large Kernel Decomposition

Available online: October 22, 2025
Authors: Kang, Zizhuang; He, Bing; Luo, Wen; Fu, Ying; He, Wei; Han, Yihui; Jia, Mingquan
Abstract:
Small object detection in remote sensing images is crucial for maximizing data utility, but it is challenging because small objects have limited pixel coverage, low resolution, and high susceptibility to background noise. This paper proposes an oriented small object detection model for remote sensing images based on cross-scale feature fusion and large kernel decomposition. The model consists of four main components: the image feature extraction module, the multi-scale feature fusion module, the cross-fusion region proposal network for generating candidate regions, and the dual detection head for predicting target categories and rotated bounding boxes. Experiments are conducted on two data sets, SODA-A and HRSC-2016, and the results show that the proposed model improves the mean average precision (mAP) by at least 6.3% over classical 1-stage models and by at least 2.6% over classical 2-stage models. In particular, when detecting very small objects (area less than 144 pixels), the mAP value is as high as 17.2%, which is a significant improvement compared with other models, indicating that it is very effective in dealing with the difficult task of small object detection.
Research Articles

Stripe Noise Removal of ZY1-02D Hyperspectral Images Using an Improved Three-Dimensional U-Net Network

Available online: September 26, 2025
Authors: Gao, Ruoheng; Dong, Xinfeng; Li, Na; Cui, Jing; Li, Tongtong; Wu, Jingkai; Bai, Wei; Zhang, Rui
Abstract:
The ZY1-02D satellite, equipped with China’s first civilian hyperspectral payload, provides valuable data for remote sensing applications. However, its hyperspectral images (HSIs) are often degraded by stripe noise, significantly limiting their practical utility. Traditional denoising methods are challenged by the complex spatial and spectral characteristics inherent to HSIs, frequently resulting in compromised image quality. Fusion residual block and attention mechanism U-Net (FEA–U-Net), a novel three-dimensional destriping network, is proposed to eliminate stripe noise in hyperspectral imagery. This framework innovatively integrates cross-dimensional attention mechanisms with deep residual learning. A composite loss function combining mean squared error and spectral angle was designed to ensure spectral fidelity before and after denoising. Through systematic evaluation across varying input band numbers, the optimal network configuration was determined. When evaluated on ZY1-02D data sets, state-of-the-art performance is achieved by FEA–U-Net, demonstrating superior geological information preservation and computational efficiency. Compared with existing methods, the highest reported denoising performance was observed, with peak signal-to-noise ratio and structural similarity index reaching 48.1681 and 0.9998, respectively. Spectral curve integrity is effectively preserved, enhancing lithological classification and mineral identification accuracy in hyperspectral imagery.
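A composite loss of the kind described above, combining mean squared error with a spectral angle term, might look like the following sketch. The weighting factor `angle_weight` and the simple weighted sum are assumptions; the abstract does not state how the two terms are balanced.

```python
import math


def mean_squared_error(pred, target):
    """Mean squared error between two spectra of equal length."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def spectral_angle(pred, target):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in target))
    if norm_p == 0.0 or norm_t == 0.0:
        return 0.0
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_t)))
    return math.acos(cosine)


def composite_loss(pred_pixels, target_pixels, angle_weight=0.1):
    """Average of MSE plus weighted spectral angle over all pixels.

    angle_weight is a hypothetical balancing factor.
    """
    total = 0.0
    for pred, target in zip(pred_pixels, target_pixels):
        total += mean_squared_error(pred, target)
        total += angle_weight * spectral_angle(pred, target)
    return total / len(pred_pixels)
```

The two terms penalize different errors: a denoised spectrum that is a scaled copy of the reference has a spectral angle near zero but a nonzero squared error, while a spectrum with the right overall magnitude and a distorted shape is caught by the angle term.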
Research Articles

Scale-adaptive Knowledge Distillation with Superpixel for Hyperspectral Image Classification

Available online: September 26, 2025
Authors: Dong, Shuang; Li, Ying; Xie, Ming; Han, Tingting
Abstract:
Hyperspectral image (HSI) classification is a critical area in remote sensing with broad applications in geoscience. While deep learning methods have gained popularity for HSI classification, their potential remains underexplored due to limited labeled data. To address this, we propose a scale-adaptive knowledge distillation with superpixel framework that trains deep neural networks using unlabeled samples. The proposed framework incorporates three core components: (1) scale-adaptive superpixel knowledge distillation, (2) bilateral spatial–spectral attention mechanisms, and (3) three-dimensional (3D) hyperspectral data transformation. The distillation module implements self-supervised learning through dynamically generated soft labels based on cross-dimensional similarity metrics. The workflow proceeds through three stages: Initially, spatial–spectral joint distance metrics evaluate the affinity between unlabeled superpixels and target classes. Subsequently, these measurements inform probabilistic soft label assignments for each superpixel cluster. Finally, an end-to-end trainable dense convolutional network with dual attention pathways is refined by optimizing the divergence between the adaptive label distributions and network predictions. Additionally, 3D transformations, including spectral and spatial rotations of the HSI cube, are applied to maximize the utility of labeled data. Experiments on three public HSI data sets demonstrate that the proposed method achieves competitive accuracy and efficiency compared to existing approaches. The implementation code is available at https://github.com/San-dow/Awnsome-SAKDS_HSI.
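The three-stage workflow in the abstract above (distances to classes, soft labels, divergence against predictions) can be illustrated with a minimal sketch. The specific choices here are assumptions rather than the authors' method: soft labels are a softmax over negative distances with a hypothetical `temperature`, and the divergence is taken to be Kullback-Leibler.

```python
import math


def soft_labels(distances, temperature=1.0):
    """Turn per-class distances into a probability vector.

    Smaller distance means higher probability (softmax over negative
    distances). The temperature parameter is a hypothetical knob.
    """
    smallest = min(distances)
    # Subtract the minimum distance for numerical stability.
    weights = [math.exp(-(d - smallest) / temperature) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(target, predicted, eps=1e-12):
    """Kullback-Leibler divergence from predicted to target distribution."""
    return sum(
        t * math.log((t + eps) / (p + eps))
        for t, p in zip(target, predicted)
        if t > 0.0
    )


# Usage: distances from one unlabeled superpixel to three classes
# (illustrative numbers), compared with a network's predicted probabilities.
labels = soft_labels([0.2, 1.5, 3.0])
loss = kl_divergence(labels, [0.6, 0.3, 0.1])
```

Minimizing this divergence pulls the network's predictions for an unlabeled superpixel toward the distance-derived soft labels, which is the sense in which the unlabeled samples supply a training signal.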