Arrow Research search

Author name cluster

Wei Dong

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

32 papers
2 author rows

Possible papers

TMLR Journal 2026 Journal Article

Incorporating New Knowledge into Federated Learning: Advances, Insights, and Future Directions

  • Lixu Wang
  • Sun Yinggang
  • Yang Zhao
  • Jiaqi Wu
  • Jiahua Dong
  • Ating Yin
  • Qinbin Li
  • Qingqing Ye

Federated Learning (FL) is a distributed learning approach that allows participants to collaboratively train machine learning models without sharing the raw data. It is developing rapidly in an era where privacy protection is increasingly valued. This rapid development, along with the continuous emergence of new real-world demands for FL, prompts us to focus on a very important problem: How to Incorporate New Knowledge into Federated Learning? The primary challenge is to incorporate various kinds of new knowledge into existing FL systems effectively and in a timely manner, and to evolve these systems to reduce costs, upgrade functionalities, and facilitate sustainable development. At the same time, established FL systems should preserve existing functionalities while new knowledge is incorporated. In this paper, we systematically define the main sources of new knowledge in FL, including new features, tasks, models, and algorithms. For each source, we thoroughly analyze and discuss the technical approaches for incorporating new knowledge into existing FL systems and examine the impact of the form and timing of new knowledge arrival on the incorporation process. Unlike prior surveys that primarily catalogue FL techniques under a fixed system specification, we adopt a lifecycle evolution perspective and synthesize methods that enable time-varying integration of new features, tasks, models, and aggregation algorithms while preserving existing functionality. Furthermore, we comprehensively discuss potential future directions for FL that incorporates new knowledge, considering a variety of factors, including scenario setups, security and privacy threats, and incentives.

AAAI Conference 2026 Conference Paper

RPGen: Robust and Differentially Private Synthetic Image Generation

  • Zihao Wang
  • Hao Peng
  • Wei Dong
  • Yuecen Wei
  • Li Sun
  • Zhengtao Yu

Differentially private (DP) image synthesis enables the generation of realistic images while bounding privacy leakage, facilitating secure data sharing across organizations. However, the Gaussian noise injected during DP training, such as via DP-SGD, often severely degrades synthesis quality by disrupting model convergence. To address this, we introduce RPGen, a novel framework that enhances diffusion models' parameter robustness to mitigate DP noise effects without compromising privacy guarantees. At its core, RPGen employs adversarial model perturbation (AMP) during public pre-training to build resilience against perturbations, but we identify and tackle the critical issue of robustness transferability across domains. RPGen achieves this through a three-step process: (1) A pre-trained classifier infers labels for private images, which are aggregated into a class distribution noised with the Gaussian mechanism for DP, and public samples are selected to match this privatized distribution for domain alignment; (2) The diffusion model is pre-trained on this curated subset with adversarial model perturbation to foster robustness; (3) The model undergoes fine-tuning on private data using DP-SGD. This synergy of robustness augmentation and transferability optimization yields high-fidelity synthesis. Extensive evaluations using ImageNet for pre-training, with CelebA and CIFAR-10 for synthesis, show RPGen outperforming state-of-the-art baselines across ε ∈ {1, 5, 10}. On average, it achieves 20.18% lower FID and 5.45% higher classification accuracy. Ablations confirm the efficacy of domain curation and modest perturbations, establishing RPGen as a new benchmark for privacy-utility trade-offs in image generation.
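Step (1) of the abstract above can be sketched roughly as follows. This is an illustrative sketch only, not the RPGen implementation: the function names, the clipping of negative counts, and the per-class quota rule are assumptions, and calibrating `sigma` to a target (ε, δ) is not shown.

```python
import random


def privatized_class_distribution(labels, num_classes, sigma, rng):
    """Histogram of inferred labels with Gaussian noise added to each count.

    `sigma` is assumed to be calibrated elsewhere to the desired privacy level.
    """
    counts = [0.0] * num_classes
    for y in labels:
        counts[y] += 1.0
    # Add Gaussian noise per class, then clip negatives (post-processing).
    noisy = [max(c + rng.gauss(0.0, sigma), 0.0) for c in counts]
    total = sum(noisy)
    if total == 0.0:
        return [1.0 / num_classes] * num_classes
    return [c / total for c in noisy]


def select_public_subset(public_labels, target_dist, budget):
    """Pick public sample indices so class proportions follow target_dist.

    Hypothetical selection rule: take the first round(budget * p_k) samples
    of each class k, or fewer if the class has fewer samples.
    """
    by_class = {}
    for idx, y in enumerate(public_labels):
        by_class.setdefault(y, []).append(idx)
    chosen = []
    for k, p in enumerate(target_dist):
        quota = int(round(budget * p))
        chosen.extend(by_class.get(k, [])[:quota])
    return chosen
```

With `sigma = 0` the function returns the exact empirical distribution, which is a convenient sanity check; any real use needs a nonzero, properly calibrated `sigma`.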

AAAI Conference 2026 Conference Paper

Seeing Beyond Illusion: Generalized and Efficient Mirror Detection

  • Mingfeng Zha
  • Guoqing Wang
  • Tianyu Li
  • Wei Dong
  • Peng Wang
  • Yang Yang

Reflective imaging gives mirror images and physical entities identical attributes, e.g., color and shape. Current mirror detection (MD) methods primarily rely on designing functional components to establish the correlations and disparities between the images and entities, thereby identifying the mirror regions. However, extended scenes with dynamic content changes are rarely investigated. Therefore, we propose MirrorSAM, a model designed for MD based on the Segment Anything Model (SAM). Specifically, due to the varying reflections produced by mirrors in different positions and the complex visual space that interferes with localization, we design the hierarchical mixture of direction experts (HMDE) in the low-rank space to reduce biases towards entities in SAM and dynamically adjust experts based on the input scene. We observe differences in depth between mirrors and adjacent areas, and propose the depth token calibration (DTC), which introduces a learnable depth token to generate the depth map and serve as an error correction factor. We further formulate the selective pixel-prototype contrastive (SPPC) loss, selecting partially confusable samples to promote the decoupling of mirror and non-mirror representations. Extensive experiments conducted on four mirror benchmarks and two settings demonstrate that our approach surpasses state-of-the-art methods with few trainable parameters and FLOPs. We further extend to four transparent surface benchmarks to validate generalization.

EAAI Journal 2026 Journal Article

Self-discharge estimation for lithium-ion batteries based on formation data in production

  • Haoyuan Zheng
  • Shaobin Yang
  • Weihua Xue
  • Shouzhen Xiao
  • Ding Shen
  • Wei Dong
  • Xu Zhang

Global annual shipments of lithium-ion batteries reached 1545.1 gigawatt-hours (GWh) in 2024, representing a substantial increase. Notably, the energy-storage segment alone experienced a year-on-year growth of 64.9%. Prior to dispatch, lithium-ion batteries must undergo self-discharge testing to ensure safety and reliability. In practice, identifying the approximately 2% of batteries exhibiting excessive self-discharge requires a prolonged resting period (10-30 days) to track self-discharge voltage drop (SDV-drop), which accounts for nearly two-thirds of the overall production cycle and severely limits manufacturing efficiency. Rapid and accurate prediction of self-discharge behavior has thus become a pressing engineering challenge. This study presents an artificial intelligence enabled framework that predicts a 28-day voltage drop using formation-stage data, thereby obviating the prolonged rest period. The approach integrates latent feature extraction from charge-discharge curves, unsupervised clustering, and transfer learning. Specifically, both comprehensive temporal and static features are automatically extracted from current, voltage, and capacity trajectories, along with scalar performance indicators. A hybrid K-means-t-distributed stochastic neighbor embedding (t-SNE) algorithm partitions the dataset into internally homogeneous clusters, enhancing intra-cluster consistency and inter-cluster separability. During transfer learning, maximum mean discrepancy aligns feature distributions between source and target domains, while a feature-label consistency constraint further mitigates domain shift and improves generalization. Comparative experiments demonstrate that the proposed model markedly outperforms state-of-the-art baselines in predicting SDV-drop. This framework thus provides a theoretical foundation and practical pathway for rapid self-discharge assessment, which enables significant reductions in production cycle time and improves manufacturing efficiency.

AAAI Conference 2026 Conference Paper

SemanticNN: Compressive and Error-Resilient Semantic Offloading for Extremely Weak Devices

  • Jiaming Huang
  • Yi Gao
  • Fuchang Pan
  • Renjie Li
  • Wei Dong

With the rapid growth of the Internet of Things (IoT), integrating artificial intelligence (AI) on extremely weak embedded devices has garnered significant attention, enabling improved real-time performance and enhanced data privacy. However, the resource limitations of such devices and unreliable network conditions necessitate error-resilient device-edge collaboration systems. Traditional approaches focus on bit-level transmission correctness, which can be inefficient under dynamic channel conditions. In contrast, we propose SemanticNN, a semantic codec that tolerates bit-level errors in pursuit of semantic-level correctness, enabling compressive and resilient collaborative inference offloading under strict computational and communication constraints. It incorporates a Bit Error Rate (BER)-aware decoder that adapts to dynamic channel conditions and a Soft Quantization (SQ)-based encoder to learn compact representations. Building on this architecture, we introduce Feature-augmentation Learning, a novel training strategy that enhances offloading efficiency. To address encoder-decoder capability mismatches from asymmetric resources, we propose XAI-based Asymmetry Compensation to enhance decoding semantic fidelity. We conduct extensive experiments on STM32 using three models and six datasets across image classification and object detection tasks. Experimental results demonstrate that, under varying transmission error rates, SemanticNN significantly reduces feature transmission volume by 56.82–344.83× while maintaining superior inference accuracy.

AAAI Conference 2026 Conference Paper

Zero-Reference Joint Low-Light Enhancement and Deblurring via Visual Autoregressive Modeling with VLM-Derived Modulation

  • Wei Dong
  • Han Zhou
  • Junwei Lin
  • Jun Chen

Real-world dark images commonly exhibit not only low visibility and contrast but also complex noise and blur, posing significant restoration challenges. Existing methods often rely on paired data or fail to model dynamic illumination and blur characteristics, leading to poor generalization. To tackle this, we propose a generative framework based on visual autoregressive (VAR) modeling, guided by perceptual priors from the vision-language model (VLM). Specifically, to supply informative conditioning cues for VAR models, we deploy an adaptive curve estimation scheme to modulate the diverse illumination based on VLM-derived visibility scores. In addition, we integrate dynamic and spatial-frequency-aware Rotary Positional Encodings (SF-RoPE) into VAR to enhance its ability to model structures degraded by blur. Furthermore, we propose a recursive phase-domain modulation strategy that mitigates blur-induced artifacts in the phase domain via bounded iterative refinement guided by VLM-assessed blur scores. Our framework is fully unsupervised and achieves state-of-the-art performance on benchmark datasets.

TMLR Journal 2025 Journal Article

A Survey on Large Language Model Acceleration based on KV Cache Management

  • Haoyang Li
  • Yiming Li
  • Anxin Tian
  • Tianhao Tang
  • Zhanchao Xu
  • Xuejia Chen
  • Nicole HU
  • Wei Dong

Large Language Models (LLMs) have revolutionized a wide range of domains such as natural language processing, computer vision, and multi-modal tasks due to their ability to comprehend context and perform logical reasoning. However, the computational and memory demands of LLMs, particularly during inference, pose significant challenges when scaling them to real-world, long-context, and real-time applications. Key-Value (KV) cache management has emerged as a critical optimization technique for accelerating LLM inference by reducing redundant computations and improving memory utilization. This survey provides a comprehensive overview of KV cache management strategies for LLM acceleration, categorizing them into token-level, model-level, and system-level optimizations. Token-level strategies include KV cache selection, budget allocation, merging, quantization, and low-rank decomposition, while model-level optimizations focus on architectural innovations and attention mechanisms to enhance KV reuse. System-level approaches address memory management, scheduling, and hardware-aware designs to improve efficiency across diverse computing environments. Additionally, the survey provides an overview of both text and multimodal datasets and benchmarks used to evaluate these strategies. By presenting detailed taxonomies and comparative analyses, this work aims to offer useful insights for researchers and practitioners to support the development of efficient and scalable KV cache management techniques, contributing to the practical deployment of LLMs in real-world applications.

EAAI Journal 2025 Journal Article

Adaptive weighted multiple imputation with generative adversarial networks for improving wind speed data integrity

  • Weirui Jiang
  • Jinxing Che
  • Kun Hu
  • Yifan Xu
  • Wei Dong

The integrity of wind speed data is crucial for energy planning and management. The Generative Adversarial Imputation Network (GAIN) provides a new approach to a significant challenge in data science: accurately reconstructing missing data. However, the autocorrelation in time series often leads to reliance on time windows for data imputation, which tends to ignore global correlations within the data. To this end, this paper proposes mapping the data to a high-dimensional space and embedding high-dimensional positional encodings, using a Euclidean norm penalty (L2 penalty) to constrain the generator's high-dimensional parameters, to improve the traditional GAIN model's ability to capture global correlations. This approach not only overcomes the limitations of time windows but also effectively captures the autocorrelation in time series, providing a global perspective for the imputation task. To enhance the imputation performance under high missing rates, an adaptive weighted multiple imputation mechanism is proposed, which assigns appropriate weights to each imputation result, thereby significantly reducing randomness and uncertainty. Experimental results, based on wind speed data with varying missing rates and distributions across four seasons, demonstrate that the proposed imputation framework achieves superior imputation accuracy and performs exceptionally well across all scenarios. Compared to the traditional GAIN model, our approach achieves average improvements of 44.97%, 48.35%, and 61.00% under low, medium, and high missing rate conditions, respectively. The corresponding average Root Mean Square Error (RMSE) values are reduced to as low as 0.50, 0.58, and 0.75, highlighting the robustness and effectiveness of the proposed method.

AAAI Conference 2025 Conference Paper

Low-Light Image Enhancement via Generative Perceptual Priors

  • Han Zhou
  • Wei Dong
  • Xiaohong Liu
  • Yulun Zhang
  • Guangtao Zhai
  • Jun Chen

Although significant progress has been made in enhancing visibility, retrieving texture details, and mitigating noise in Low-Light (LL) images, the challenge persists in applying current Low-Light Image Enhancement (LLIE) methods to real-world scenarios, primarily due to the diverse illumination conditions encountered. Furthermore, the quest for generating enhancements that are visually realistic and attractive remains an underexplored realm. In response to these challenges, we present a novel LLIE framework with the guidance of Generative Perceptual Priors (GPP-LLIE) derived from vision-language models (VLMs). Specifically, we first propose a pipeline that guides VLMs to assess multiple visual attributes of the LL image and quantify the assessment to output the global and local perceptual priors. Subsequently, to incorporate these generative perceptual priors to benefit LLIE, we introduce a transformer-based backbone in the diffusion process, and develop a new layer normalization (GPP-LN) and an attention mechanism (LPP-Attn) guided by global and local perceptual priors. Extensive experiments demonstrate that our model outperforms current SOTA methods on paired LL datasets and exhibits superior generalization on real-world data.

NeurIPS Conference 2025 Conference Paper

Sum Estimation under Personalized Local Differential Privacy

  • Dajun Sun
  • Wei Dong
  • Yuan Qiu
  • Ke Yi
  • Graham Cormode

People have diverse privacy requirements. This is best modeled using a personalized local differential privacy model where each user privatizes their data using a possibly different privacy parameter. While the model of personalized local differential privacy is a natural and important one, prior work has failed to give meaningful error bounds. In this paper, we study the foundational sum/mean estimation problem under this model. We present two novel protocols that achieve strong error guarantees. The first gives a guarantee based on the radius of the data, suiting inputs that are centered around zero. The second extends the guarantee to the diameter of the data, capturing the case when the points are situated arbitrarily. Experimental results on both synthetic and real data show that our protocols significantly outperform existing methods in terms of accuracy while providing a strong level of privacy.

AAAI Conference 2025 Conference Paper

TG-LLaVA: Text Guided LLaVA via Learnable Latent Embeddings

  • Dawei Yan
  • Pengcheng Li
  • Yang Li
  • Hao Chen
  • Qingguo Chen
  • Weihua Luo
  • Wei Dong
  • Qingsen Yan

Currently, inspired by the success of vision-language models (VLMs), an increasing number of researchers are focusing on improving VLMs and have achieved promising results. However, most existing methods concentrate on optimizing the connector and enhancing the language model component, while neglecting improvements to the vision encoder itself. In contrast, we propose Text Guided LLaVA (TG-LLaVA) in this paper, which optimizes VLMs by guiding the vision encoder with text, offering a new and orthogonal optimization direction. Specifically, inspired by the purpose-driven logic inherent in human behavior, we use learnable latent embeddings as a bridge to analyze textual instruction and add the analysis results to the vision encoder as guidance, refining it. Subsequently, another set of latent embeddings extracts additional detailed text-guided information from high-resolution local patches as auxiliary information. Finally, with the guidance of text, the vision encoder can extract text-related features, similar to how humans focus on the most relevant parts of an image when considering a question. This results in generating better answers. Experiments on various datasets validate the effectiveness of the proposed method. Remarkably, without the need for additional training data, our proposed method can bring more benefits to the baseline (LLaVA-1.5) compared with other concurrent methods. Furthermore, the proposed method consistently brings improvement in different settings.

YNIMG Journal 2025 Journal Article

Uncovering the neural basis of risk preferences in cooperative Dyads: A fNIRS study

  • Qianlan Yin
  • Jing Wen
  • Shuo Chen
  • Tianya Hou
  • Ying Liu
  • Danni Yang
  • Guorui Liu
  • Peiqi Shi

BACKGROUND: Individuals' risk preferences have been shown to influence their decision-making in various contexts. However, the neural mechanisms underlying the relationship between risk preference and decision-making in a social setting remain unclear. This study utilized functional near-infrared spectroscopy (fNIRS) to investigate the neural correlates of dyadic decision-making under risk and the modulating effect of individual risk preference. METHOD: This study examined the impact of risk preference on group decision-making using a two-phase experimental design. Based on G-power software calculations, 168 right-handed participants (62 males, 106 females, mean age 21.26 ± 1.70) were recruited. Participants first completed a single-player Sequential Risk Task to measure risk preference, followed by group classification into three groups: Risky&Risky, Risky&Safe, and Safe&Safe. Task performance and decision-making behavior were recorded. fNIRS was employed to measure cortical activation in the prefrontal cortex, focusing on inter-brain synchrony and coupling directionality using wavelet coherence and Granger causality (GC) analyses. Data were preprocessed to remove noise, and statistical analyses included repeated measures ANOVAs, Support Vector Regression, and multiple regression analyses. RESULTS: = 0.173 and 0.191). CONCLUSION: This study employed fNIRS hyperscanning to investigate how individual differences in risk preference impact decision-making in dyadic contexts. The results indicated that variations in connectivity and information transfer between the orbitofrontal and medial prefrontal cortices underlie the distinct risk-taking behaviors exhibited by dyadic pairs. These findings underscore the pivotal role of affective and cognitive control mechanisms and individual risk personality traits in cooperative decision-making under conditions of uncertainty.

IROS Conference 2024 Conference Paper

BEVRender: Vision-based Cross-view Vehicle Registration in Off-road GNSS-denied Environment

  • Lihong Jin
  • Wei Dong
  • Wenshan Wang
  • Michael Kaess

We introduce BEVRender, a novel learning-based approach for the localization of ground vehicles in Global Navigation Satellite System (GNSS)-denied off-road scenarios. These environments are typically challenging for conventional vision-based state estimation due to the lack of distinct visual landmarks and the instability of vehicle poses. To address this, BEVRender generates high-quality local bird’s-eye-view (BEV) images of the local terrain. Subsequently, these images are aligned with a georeferenced aerial map through template matching to achieve accurate cross-view registration. Our approach overcomes the inherent limitations of visual inertial odometry systems and the substantial storage requirements of image-retrieval localization strategies, which are susceptible to drift and scalability issues, respectively. Extensive experimentation validates BEVRender’s advancement over existing GNSS-denied visual localization methods, demonstrating notable enhancements in both localization accuracy and update frequency.

NeurIPS Conference 2024 Conference Paper

ECMamba: Consolidating Selective State Space Model with Retinex Guidance for Efficient Multiple Exposure Correction

  • Wei Dong
  • Han Zhou
  • Yulun Zhang
  • Xiaohong Liu
  • Jun Chen

Exposure Correction (EC) aims to recover proper exposure conditions for images captured under over-exposure or under-exposure scenarios. While existing deep learning models have shown promising results, few have fully embedded Retinex theory into their architecture, highlighting a gap in current methodologies. Additionally, the balance between high performance and efficiency remains an under-explored problem for the exposure correction task. Inspired by Mamba, which demonstrates powerful and highly efficient sequence modeling, we introduce a novel framework based on Mamba for Exposure Correction (ECMamba) with dual pathways, dedicated to the restoration of the reflectance and illumination maps, respectively. Specifically, we first derive the Retinex theory and train a Retinex estimator capable of mapping inputs into two intermediary spaces, approximating the target reflectance and illumination maps, respectively. This setup facilitates the refined restoration process of the subsequent Exposure Correction Mamba Module (ECMM). Moreover, we develop a novel 2D Selective State-space layer guided by Retinex information (Retinex-SS2D) as the core operator of ECMM. This architecture incorporates an innovative 2D scanning strategy based on deformable feature aggregation, thereby enhancing both efficiency and effectiveness. Extensive experimental results and comprehensive ablation studies demonstrate the outstanding performance of our proposed ECMamba and the importance of each of its components. Code is available at https://github.com/LowlevelAI/ECMamba.

NeurIPS Conference 2024 Conference Paper

Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation

  • Wei Dong
  • Yuan Sun
  • Yiting Yang
  • Xing Zhang
  • Zhijun Lin
  • Qingsen Yan
  • Haokui Zhang
  • Peng Wang

A common strategy for Parameter-Efficient Fine-Tuning (PEFT) of pre-trained Vision Transformers (ViTs) involves adapting the model to downstream tasks by learning a low-rank adaptation matrix. This matrix is decomposed into a product of down-projection and up-projection matrices, with the bottleneck dimensionality being crucial for reducing the number of learnable parameters, as exemplified by prevalent methods like LoRA and Adapter. However, these low-rank strategies typically employ a fixed bottleneck dimensionality, which limits their flexibility in handling layer-wise variations. To address this limitation, we propose a novel PEFT approach inspired by Singular Value Decomposition (SVD) for representing the adaptation matrix. SVD decomposes a matrix into the product of a left unitary matrix, a diagonal matrix of scaling values, and a right unitary matrix. We utilize Householder transformations to construct orthogonal matrices that efficiently mimic the unitary matrices, requiring only a vector. The diagonal values are learned in a layer-wise manner, allowing them to flexibly capture the unique properties of each layer. This approach enables the generation of adaptation matrices with varying ranks across different layers, providing greater flexibility in adapting pre-trained models. Experiments on standard downstream vision tasks demonstrate that our method achieves promising fine-tuning performance.
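The SVD-style construction described above can be illustrated with a small sketch. It assumes square matrices, one Householder reflection per side, and plain Python lists; the function names are hypothetical and this is not the authors' code. A Householder matrix H = I - 2vvᵀ/(vᵀv) is orthogonal and is fully determined by the single vector v.

```python
def householder(v):
    """Return the Householder reflection H = I - 2 v v^T / (v^T v)."""
    n = len(v)
    norm_sq = sum(x * x for x in v)
    return [
        [(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / norm_sq for j in range(n)]
        for i in range(n)
    ]


def matmul(a, b):
    """Plain dense matrix product for lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def adaptation_matrix(u, s, v):
    """Build delta_W = H(u) @ diag(s) @ H(v), mimicking U @ Sigma @ V^T.

    u and v are the learnable Householder vectors; s holds the learnable
    per-layer diagonal values (assumed here to have the same length).
    """
    diag = [[s[i] if i == j else 0.0 for j in range(len(s))] for i in range(len(s))]
    return matmul(matmul(householder(u), diag), householder(v))
```

Because the number of nonzero entries in `s` bounds the rank of the result, learning `s` per layer lets the effective rank differ from layer to layer, which is the flexibility the abstract points to.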

NeurIPS Conference 2023 Conference Paper

Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing

  • Wei Dong
  • Dawei Yan
  • Zhijun Lin
  • Peng Wang

The advent of high-capacity pre-trained models has revolutionized problem-solving in computer vision, shifting the focus from training task-specific models to adapting pre-trained models. Consequently, effectively adapting large pre-trained models to downstream tasks in an efficient manner has become a prominent research area. Existing solutions primarily concentrate on designing lightweight adapters and their interaction with pre-trained models, with the goal of minimizing the number of parameters requiring updates. In this study, we propose a novel Adapter Re-Composing (ARC) strategy that addresses efficient pre-trained model adaptation from a fresh perspective. Our approach considers the reusability of adaptation parameters and introduces a parameter-sharing scheme. Specifically, we leverage symmetric down-/up-projections to construct bottleneck operations, which are shared across layers. By learning low-dimensional re-scaling coefficients, we can effectively re-compose layer-adaptive adapters. This parameter-sharing strategy in adapter design allows us to further reduce the number of new parameters while maintaining satisfactory performance, thereby offering a promising approach to compress the adaptation cost. We conduct experiments on 24 downstream image classification tasks using various Vision Transformer variants to evaluate our method. The results demonstrate that our approach achieves compelling transfer learning performance with a reduced parameter count. Our code is available at https://github.com/DavidYanAnDe/ARC.
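A rough sketch of the parameter-sharing idea in this abstract: one down-projection is shared by all layers, the up-projection is taken to be its transpose (one reading of "symmetric", assumed here), and each layer learns only a short vector of re-scaling coefficients. The function name and the residual form are illustrative assumptions, not the released ARC code.

```python
def arc_adapter(x, w_down, coeffs):
    """Apply x + W^T diag(coeffs) W x with a shared projection W.

    x       : input feature vector of length d
    w_down  : shared r x d down-projection (the same object for every layer)
    coeffs  : r layer-specific re-scaling coefficients
    """
    # Down-project into the r-dimensional bottleneck.
    z = [sum(w * xi for w, xi in zip(row, x)) for row in w_down]
    # Layer-specific re-scaling of each bottleneck dimension.
    z = [c * zi for c, zi in zip(coeffs, z)]
    # Up-project with the transpose of the shared matrix.
    up = [sum(w_down[k][j] * z[k] for k in range(len(z))) for j in range(len(x))]
    # Residual connection.
    return [xi + ui for xi, ui in zip(x, up)]
```

Under these assumptions, each additional layer costs only `r` new parameters, while the `r x d` projection is paid for once.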

NeurIPS Conference 2022 Conference Paper

Differentially Private Covariance Revisited

  • Wei Dong
  • Yuting Liang
  • Ke Yi

In this paper, we present two new algorithms for covariance estimation under concentrated differential privacy (zCDP). The first algorithm achieves a Frobenius error of $\tilde{O}(d^{1/4}\sqrt{\mathrm{tr}}/\sqrt{n} + \sqrt{d}/n)$, where $\mathrm{tr}$ is the trace of the covariance matrix. By taking $\mathrm{tr}=1$, this also implies a worst-case error bound of $\tilde{O}(d^{1/4}/\sqrt{n})$, which improves the standard Gaussian mechanism's $\tilde{O}(d/n)$ for the regime $d>\widetilde{\Omega}(n^{2/3})$. Our second algorithm offers a tail-sensitive bound that could be much better on skewed data. The corresponding algorithms are also simple and efficient. Experimental results show that they offer significant improvements over prior work.
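The stated regime can be checked with a short calculation. Ignoring the polylogarithmic factors hidden by the tilde notation, the worst-case bound of the first algorithm beats the Gaussian mechanism's bound exactly when:

```latex
\[
\frac{d^{1/4}}{\sqrt{n}} < \frac{d}{n}
\;\iff\; \sqrt{n} < d^{3/4}
\;\iff\; d > n^{2/3}.
\]
```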

IROS Conference 2022 Conference Paper

Learned Depth Estimation of 3D Imaging Radar for Indoor Mapping

  • Ruoyang Xu
  • Wei Dong
  • Akash Sharma
  • Michael Kaess

3D imaging radar offers robust perception capability through visually demanding environments due to the unique penetrative and reflective properties of millimeter waves (mmWave). Current approaches for 3D perception with imaging radar require knowledge of environment geometry, accumulation of data from multiple frames for perception, or access to between-frame motion. Imaging radar presents an additional difficulty due to the complexity of its data representation. To address these issues, and make imaging radar easier to use for downstream robotics tasks, we propose a learning-based method that regresses radar measurements into cylindrical depth maps using LiDAR supervision. Due to the limitation of the regression formulation, directions where the radar beam could not reach will still generate a valid depth. To address this issue, our method additionally learns a 3D filter to remove those pixels. Experiments show that our system generates visually accurate depth estimation. Furthermore, we confirm the overall ability to generalize in the indoor scene using the estimated depth for probabilistic occupancy mapping with ground truth trajectory. The code and model will be released at https://github.com/rpl-cmu/learned-depth-imaging-radar.

ICRA Conference 2021 Conference Paper

Compositional and Scalable Object SLAM

  • Akash Sharma
  • Wei Dong
  • Michael Kaess

We present a fast, scalable, and accurate Simultaneous Localization and Mapping (SLAM) system that represents indoor scenes as a graph of objects. Leveraging the observation that artificial environments are structured and occupied by recognizable objects, we show that a compositional and scalable object mapping formulation is amenable to a robust SLAM solution for drift-free large-scale indoor reconstruction. To achieve this, we propose a novel semantically assisted data association strategy that results in unambiguous persistent object landmarks and a 2.5D compositional rendering method that enables reliable frame-to-model RGB-D tracking. Consequently, we deliver an optimized online implementation that can run at near frame rate with a single graphics card, and provide a comprehensive evaluation against state-of-the-art baselines. An open-source implementation will be provided at https://github.com/rpl-cmu/object-slam.

IROS Conference 2021 Conference Paper

Map Compressibility Assessment for LiDAR Registration

  • Ming-Fang Chang
  • Wei Dong
  • Joshua G. Mangelson
  • Michael Kaess
  • Simon Lucey

We aim to assess the performance of LiDAR-to-map registration on compressive maps. Modern autonomous vehicles utilize pre-built HD (High-Definition) maps to perform sensor-to-map registration, which recovers pose estimation failures and reduces drift in a large-scale environment. However, sensor-to-map registration is usually realized by registering the sensor to a dense 3D model, which occupies massive storage space in the HD map and requires much data processing overhead. Although smaller 3D models are preferable, the optimal compressive map format for preservation of the best registration performance remains unclear. In this paper, we propose a novel and challenging benchmark to evaluate existing LiDAR-to-map registration methods from three perspectives: map compressibility, robustness, and precision. We compared various map formats, including raw points, hierarchical GMMs, and feature points, and show their performance trade-offs between compressibility and robustness on real-world LiDAR datasets: KITTI Odometry Dataset and Argoverse Tracking Dataset. Our benchmark reveals that state-of-the-art deep feature point based methods outperform traditional methods significantly when the map size budget is high. However, when map size budget is low, deep methods are outperformed by the methods using simpler models in Argoverse Tracking Dataset due to poor spatial coverage. In addition, we observe that the recently published TEASER++ significantly outperforms RANSAC for the feature point methods. Our analysis provides a valuable reference for the community to design budgeted real-world systems and find potential research opportunities. We will release the benchmark for public use.

JBHI Journal 2021 Journal Article

Towards Predictive Analysis on Disease Progression: A Variational Hawkes Process Model

  • Zhaohong Sun
  • Zhoujian Sun
  • Wei Dong
  • Jinlong Shi
  • Zhengxing Huang

Massively available longitudinal data about long-term disease trajectories of patients provides a golden mine for the understanding of disease progression and efficient health service delivery. It calls for quantitative modeling of disease progression, which is a tricky problem due to the complexity of the disease progression process as well as the irregularity of time documented in trajectories. In this study, we tackle the problem with the goal of predictively analyzing disease progression. Specifically, we propose a novel Variational Hawkes Process (VHP) model to generalize disease progression and predict future patient states based on the clinical observational data of past disease trajectories. First, Hawkes Process captures the intensity of irregular visits in a trajectory documented to medical facilities and controls the aforementioned information flowing into future visits. Thereafter, the captured intensity is incorporated into a Variational Auto-Encoder to generate the representation of the future partial disease trajectory for a target patient in a predictive manner. To further improve the prediction performance, we equip the proposed model with a disease trajectory discriminator to distinguish the generated trajectories from real ones. We evaluate the proposed model on two public datasets from the MIMIC-III database pertaining to heart failure and sepsis patients, respectively, and one real-world dataset from a Chinese hospital pertaining to heart failure patients with multiple admissions. Experimental results demonstrate that the proposed model significantly outperforms state-of-the-art baselines, and may derive a set of practical implications that can benefit a wide spectrum of management and applications on disease progression.

JBHI Journal 2020 Journal Article

On Clinical Event Prediction in Patient Treatment Trajectory Using Longitudinal Electronic Health Records

  • Huilong Duan
  • Zhoujian Sun
  • Wei Dong
  • Kunlun He
  • Zhengxing Huang

Healthcare processes leave patient treatment trajectories (PTT), described as sequences of interdependent clinical events affiliated with a large volume of longitudinal therapy and treatment information. Predicting the future clinical event in PTT, as a vital and essential task for providing insights into the entire treatment trajectory, can serve as an efficient and proactive altering service for health service delivery. However, it is challenging because there are long-term dependencies between clinical events, which are irregularly distributed along the temporal axis with varying time intervals. This characteristic inevitably impedes the performance of clinical event prediction (CEP) using the existing approaches. To address this challenge, we propose a novel approach to learn representative and discriminative PTT features for CEP. In detail, a multivariate Hawkes process (HP) is adopted to uncover the mutual excitation intensities between clinical event pairs in an interpretable manner. Thereafter, the captured spontaneous and interactional intensities of events are incorporated into recurrent neural networks (RNN) to encode PTT in latent representations, while jointly performing the CEP task based on the extracted trajectory representations. We evaluate the performance of the proposed approach on a real clinical dataset consisting of 13,545 visits of 2,102 heart failure patients. Compared to state-of-the-art methods, our best model achieves 6.4% and 4.1% AUC performance gains on three-month and one-year CEP tasks, respectively. The experimental results demonstrate that the proposed approach outperforms state-of-the-art models in CEP, and can be profitably exploited as a basis for PTT analysis and optimization.
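
As a rough illustration of the "spontaneous and interactional intensities" mentioned in the abstract above, the sketch below evaluates the conditional intensity of a multivariate Hawkes process in its common textbook form. It is not taken from the paper: the exponential decay kernel, the function name `hawkes_intensity`, and the parameters `mu`, `alpha`, and `beta` are all assumptions made for illustration.

```python
import math


def hawkes_intensity(i, t, history, mu, alpha, beta):
    """Conditional intensity of event type i at time t.

    Assumed (illustrative) parameterization:
      history     -- list of (time, event_type) pairs observed so far
      mu[i]       -- baseline (spontaneous) rate of event type i
      alpha[i][j] -- excitation a past type-j event adds to type i
      beta        -- decay rate of the assumed exponential kernel
    """
    rate = mu[i]
    for t_k, j in history:
        if t_k < t:  # only strictly earlier events contribute
            rate += alpha[i][j] * math.exp(-beta * (t - t_k))
    return rate


# Example: one past event of type 1 at time 0 excites type 0.
mu = [0.2, 0.1]
alpha = [[0.0, 0.5], [0.0, 0.0]]
rate_now = hawkes_intensity(0, 1.0, [(0.0, 1)], mu, alpha, beta=1.0)
```

In this form, `mu` plays the role of the spontaneous intensity and `alpha` the pairwise (interactional) excitation between event types; how the paper feeds these quantities into the RNN is not specified here.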

JBHI Journal 2019 Journal Article

Adversarial MACE Prediction After Acute Coronary Syndrome Using Electronic Health Records

  • Zhengxing Huang
  • Wei Dong

Acute coronary syndrome (ACS), as an emergent and severe syndrome due to decreased blood flow in the coronary arteries, is a leading cause of death and serious long-term disability globally. ACS is usually caused by one of three problems: ST elevation myocardial infarction, non-ST elevation myocardial infarction, or unstable angina. Major adverse cardiac event (MACE) prediction, as a critical tool to estimate the likelihood an individual is at risk of ACS, has been widely adopted in the early prevention and intervention of ACS. Although valuable, existing MACE prediction models are designed to predict the overall probability of MACE occurrence for ACS patients, and lack the ability to distinguish the different subtypes of ACS in a fine-grained manner. It is interesting to exploit the different subtypes of ACS and mine their private and shared underlying knowledge to improve the performance of MACE prediction. In this study, we propose utilizing a large volume of heterogeneous electronic health records for the application of MACE prediction. In detail, we address the multi-subtype-oriented MACE prediction for ACS as a multi-task learning (MTL) problem, present an MTL-based model to predict MACE of ACS patients with the different subtypes, and incorporate adversarial learning into the model to keep the shared and private latent feature spaces of each subtype of ACS from interfering with each other. A real clinical dataset containing 2,863 ACS patient samples is collected from a Chinese hospital to validate the proposed model. Experimental results demonstrate that our proposed model obtains a significant improvement in prediction performance compared to single-subtype-oriented MACE prediction models.

IROS Conference 2019 Conference Paper

GPU Accelerated Robust Scene Reconstruction

  • Wei Dong
  • Jaesik Park
  • Yi Yang
  • Michael Kaess

We propose a fast and accurate 3D reconstruction system that takes a sequence of RGB-D frames and produces a globally consistent camera trajectory and a dense 3D geometry. We redesign core modules of a state-of-the-art offline reconstruction pipeline to maximally exploit the power of GPU. We introduce GPU accelerated core modules that include RGBD odometry, geometric feature extraction and matching, point cloud registration, volumetric integration, and mesh extraction. Therefore, while being able to reproduce the results of the high-fidelity offline reconstruction system, our system runs more than 10 times faster on average. Nearly 10Hz can be achieved in medium size indoor scenes, making our offline system even comparable to online Simultaneous Localization and Mapping (SLAM) systems in terms of the speed. Experimental results show that our system produces more accurate results than several state-of-the-art online systems. The system is open source at https://github.com/theNded/Open3D.

ICRA Conference 2019 Conference Paper

Surfel-Based Dense RGB-D Reconstruction With Global And Local Consistency

  • Yi Yang
  • Wei Dong
  • Michael Kaess

Achieving high surface reconstruction accuracy in dense mapping has been a desirable target for both robotics and vision communities. In the robotics literature, simultaneous localization and mapping (SLAM) systems use RGB-D cameras to reconstruct a dense map of the environment. They leverage the depth input to provide accurate local pose estimation and a locally consistent model. However, drift in the pose tracking over time leads to misalignments and artifacts. On the other hand, offline computer vision methods, such as the pipeline that combines structure-from-motion (SfM) and multi-view stereo (MVS), estimate the camera poses by performing batch optimization. These methods achieve global consistency, but suffer from heavy computation loads. We propose a novel approach that integrates both methods to achieve locally and globally consistent reconstruction. First, we estimate poses of keyframes in the offline SfM pipeline to provide strong global constraints at relatively low cost. Afterwards, we compute odometry between frames driven by off-the-shelf SLAM systems with high local accuracy. We fuse the two pose estimations using factor graph optimization to generate accurate camera poses for dense reconstruction. Experiments on real-world and synthetic datasets demonstrate that our approach produces more accurate models compared to existing dense SLAM systems, while achieving significant speedup with respect to state-of-the-art SfM-MVS pipelines.

ICRA Conference 2018 Conference Paper

An Efficient Volumetric Mesh Representation for Real-Time Scene Reconstruction Using Spatial Hashing

  • Wei Dong
  • Jieqi Shi
  • Weijie Tang
  • Xin Wang 0072
  • Hongbin Zha

Mesh plays an indispensable role in dense realtime reconstruction essential in robotics. Efforts have been made to maintain flexible data structures for 3D data fusion, yet an efficient incremental framework specifically designed for online mesh storage and manipulation is missing. We propose a novel framework to compactly generate, update, and refine mesh for scene reconstruction upon a volumetric representation. Maintaining a spatial-hashed field of cubes, we distribute vertices with continuous value on discrete edges that support O(1) vertex accessing and forbid memory redundancy. By introducing Hamming distance in mesh refinement, we further improve the mesh quality regarding the triangle type consistency with a low cost. Lock-based and lock-free operations were applied to avoid thread conflicts in GPU parallel computation. Experiments demonstrate that the mesh memory consumption is significantly reduced while the running speed is kept in the online reconstruction process.
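
To make the idea of edge-attached vertices with O(1) access more concrete, here is a minimal CPU-side sketch. It is not the authors' implementation: a Python dictionary stands in for the GPU spatial hash table, cubes are assumed to have unit size, and the class name `EdgeVertexStore` and the convention that each edge is owned by the cube at its lower corner are assumptions made for illustration.

```python
class EdgeVertexStore:
    """Stores at most one mesh vertex per cube edge.

    Each edge is keyed by (x, y, z, axis): the integer coordinate of the
    cube at the edge's lower corner plus the axis (0, 1, or 2) the edge
    runs along. Sharing that key between neighbouring cubes means a
    vertex is never stored twice (assumed ownership convention).
    """

    def __init__(self):
        self._vertices = {}  # (x, y, z, axis) -> offset along edge in [0, 1]

    def __len__(self):
        return len(self._vertices)

    def set_vertex(self, cube, axis, offset):
        # Insert or overwrite the vertex on this edge; average O(1).
        self._vertices[(*cube, axis)] = offset

    def get_vertex(self, cube, axis):
        # Average O(1) lookup; returns the 3D position or None.
        offset = self._vertices.get((*cube, axis))
        if offset is None:
            return None
        position = [float(c) for c in cube]
        position[axis] += offset  # continuous value on a discrete edge
        return tuple(position)


store = EdgeVertexStore()
store.set_vertex((1, 2, 3), 0, 0.25)
vertex = store.get_vertex((1, 2, 3), 0)
```

Updating the same edge again overwrites the existing entry rather than adding a new one, which is the property that avoids redundant vertex storage in this sketch.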

AIIM Journal 2015 Journal Article

On local anomaly detection and analysis for clinical pathways

  • Zhengxing Huang
  • Wei Dong
  • Lei Ji
  • Liangying Yin
  • Huilong Duan

Objective: Anomaly detection, as an imperative task for clinical pathway (CP) analysis and improvement, can provide useful and actionable knowledge of interest to clinical experts to be potentially exploited. Existing studies mainly focused on the detection of global anomalous inpatient traces of CPs using similarity measures in a structured manner. Bringing order to the chaos of CPs in this way may reduce the accuracy of similarity measures between inpatient traces and may impair the efficiency of anomaly detection. In addition, local anomalies that exist in some subsegments of events or behaviors in inpatient traces are easily overlooked by existing approaches since they are designed for detecting global or large anomalies.
Method: In this study, we employ a probabilistic topic model to discover underlying treatment patterns, and assume any significant unexplainable deviations from the normal behaviors surmised by the derived patterns are strongly correlated with anomalous behaviors. In this way, we can figure out the detailed local abnormal behaviors and the associations between these anomalies such that diagnostic information on local anomalies can be provided.
Results: The proposed approach is evaluated via a clinical dataset, including 2954 unstable angina patient traces and 483,349 clinical events, extracted from a Chinese hospital. Using the proposed method, local anomalies are detected from the log. In addition, the identified associations between the detected local anomalies are derived from the log, which lead to clinical concern on the reason resulting in these anomalies in CPs. The correctness of the proposed approach has been evaluated by three experienced cardiologists of the hospital. For four types of local anomalies (i.e., unexpected events, early events, delay events, and absent events), the proposed approach achieves 94%, 71%, 77%, and 93.2% in terms of recall, respectively. This is quite remarkable as we do not use prior knowledge.
Conclusion: Substantial experimental results show that the proposed approach can effectively detect local anomalies in CPs, and also provide diagnostic information on the detected anomalies in an informative manner.

JBHI Journal 2014 Journal Article

Similarity Measure Between Patient Traces for Clinical Pathway Analysis: Problem, Method, and Applications

  • Zhengxing Huang
  • Wei Dong
  • Huilong Duan
  • Haomin Li

Clinical pathways leave traces, described as event sequences with regard to a mixture of various latent treatment behaviors. Measuring similarities between patient traces can profitably be exploited further as a basis for providing insights into the pathways, and complementing existing techniques of clinical pathway analysis (CPA), which mainly focus on looking at aggregated data seen from an external perspective. Most existing methods measure similarities between patient traces via computing the relative distance between their event sequences. However, clinical pathways, as typical human-centered processes, always take place in an unstructured fashion, i.e., clinical events occur arbitrarily without a particular order. Bringing order in the chaos of clinical pathways may decline the accuracy of similarity measure between patient traces, and may distort the efficiency of further analysis tasks. In this paper, we present a behavioral topic analysis approach to measure similarities between patient traces. More specifically, a probabilistic graphical model, i.e., latent Dirichlet allocation (LDA), is employed to discover latent treatment behaviors of patient traces for clinical pathways such that similarities of pairwise patient traces can be measured based on their underlying behavioral topical features. The presented method provides a basis for further applications in CPA. In particular, three possible applications are introduced in this paper, i.e., patient trace retrieval, clustering, and anomaly detection. The proposed approach and the presented applications are evaluated via a real-world dataset of several specific clinical pathways collected from a Chinese hospital.

IROS Conference 2010 Conference Paper

Kinematics parameters estimation for an AFM/robot integrated micro-force measurement system

  • Wei Dong
  • David Rostoucher
  • Michaël Gauthier

This paper introduces a novel atomic force microscope (AFM) and parallel robot integrated micro-force measurement system whose objective is the measurement of adhesion force between planar micro-objects. This paper is mainly focused on the kinematics parameters estimation between the objects to be measured, the parallel robot and the AFM system in order to position both objects during measurement. A substrate is placed on the end-platform of the parallel robot system, on which three markers are utilized as the reference information to the kinematics parameters estimation. The markers are identified by the AFM scanning in order to identify the kinematics parameters of the whole system. Based on the classic Gauss-Newton algorithm, the position and orientation can be solved. Finally, the effectiveness of the proposed method is demonstrated through the experiments on the prototype of the micro-force measurement system. The parameters estimation methodology outlined is generic and also can be extended to a variety of applications in calibration of micro-robots.

ICRA Conference 2008 Conference Paper

A low-cost motion tracker and its error analysis

  • Wei Dong
  • Kwang Yong Lim
  • Young Koon Goh
  • Kim Doang Nguyen
  • I-Ming Chen 0001
  • Song Huat Yeo
  • Henry Been-Lirn Duh

This paper develops a physical model of an inertial/magnetic measurement unit by effectively integrating an accelerometer, a magnetometer, and two gyroscopes for low-g motion tracking applications. The proposed model breaks down the errors contributed by individual components, then determines error elimination methods based on sensor behavior and characteristics, and finally constructs a feedback loop for continuous self-calibration. Measurement errors are reduced by adopting a systematic design methodology: 1) tilt errors are minimized through a careful selection of A/D converter resolution and by compensating for sensor bias and scale factor; 2) heading errors are reduced by cancelling out nearby ferrous distortions and applying tilt compensation to the magnetometer; 3) errors from gyroscope measurements are eliminated via the least squares algorithm and continuous corrections using orientation data at the steady-state position. Preliminary tests for low-g motion sensing show that the motion tracker can achieve less than ±0.5° accuracy in tilt and less than ±1° accuracy in yaw angle measurement with the above-mentioned methods.

ICRA Conference 2008 Conference Paper

A wearable, self-calibrating, wireless sensor network for body motion processing

  • Kwang Yong Lim
  • F. Young Koon Goh
  • Wei Dong
  • Kim Doang Nguyen
  • I-Ming Chen 0001
  • Song Huat Yeo
  • Henry Been-Lirn Duh
  • Chung Gon Kim

A novel self-calibrating sensing technology using miniature linear encoders and an Inertial/magnetic Measurement Unit (IMU) provides the accuracy, fast response and robustness required by many body motion processing applications. Our sensor unit consists of an accelerometer, a 3-axis magnetic sensor, 2 gyroscopes and a miniature linear encoder. The fusion of data from the sensors is accomplished by extracting the gravity related term from the accelerometer and consistently calibrating the gyroscopes and linear encoder when the sensor unit is under static conditions. Using the fused sensors, we developed a complete motion processing system that consists of a gateway where the human kinematics modeling is embedded. A time divided multiple access wireless architecture is adopted to synchronize the sensor network at 100Hz. Experimental results show that the combination of the IMU and linear encoder produces a low RMS error of 3.5° and correlation coefficient of 99.01%. A video showing the capture of a performer’s upper body motion is also provided.