Arrow Research search

Author name cluster

Lei He

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

20 papers
2 author rows

Possible papers

20

EAAI Journal 2026 Journal Article

A doubly reinforced local search for solving the Quadratic Multiple Knapsack Problem

  • Yingsong Nie
  • Lingyan Zhang
  • Xiaolu Liu
  • Lei He
  • Tao Guan
  • Yan Jin

The Quadratic Multiple Knapsack Problem (QMKP) is a computationally challenging combinatorial optimization problem with important real-world applications in manufacturing and production planning. Traditional methods often struggle to effectively balance intensification and diversification, relying heavily on expert experience to guide the search process. To address these limitations, we propose a doubly reinforced local search algorithm (denoted as DRLS) that integrates two distinct reinforcement learning methods into a multi-neighborhood tabu search framework, each activated at different stages of the search process to improve adaptive decision-making. Specifically, a multi-armed bandit mechanism is incorporated into the neighborhood selection phase to dynamically select promising neighborhood operators in a stateless environment for effective search. In addition, a Q-learning model is employed in the item removal phase to remove items in a state-dependent manner, enabling the search to escape from local optima. Evaluations on 720 benchmark instances across four datasets demonstrate that DRLS consistently outperforms six state-of-the-art algorithms in both solution quality and runtime efficiency. In particular, DRLS discovers new best-known solutions for over 50% of the instances, highlighting its effectiveness. Additional experiments are presented to gain insight into the role of the reinforcement learning components.
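As a rough illustration of the stateless neighborhood-selection idea described in this abstract, the sketch below uses the standard UCB1 bandit rule to choose among operators. It is not the DRLS implementation: the abstract does not say which bandit policy or reward the authors use, and the class name, the operator names ("swap", "shift", "exchange") and the reward values are all hypothetical.

```python
import math


class UCB1OperatorSelector:
    """Stateless multi-armed bandit over neighborhood operators (UCB1 rule).

    Assumption: rewards are numbers in [0, 1], e.g. 1.0 when an operator
    improved the current solution and 0.0 otherwise.
    """

    def __init__(self, operators, c=math.sqrt(2)):
        self.operators = list(operators)
        self.c = c  # exploration weight
        self.counts = {op: 0 for op in self.operators}    # times each operator was used
        self.values = {op: 0.0 for op in self.operators}  # running mean reward
        self.total = 0  # total number of updates so far

    def select(self):
        # Use every operator once before applying the UCB1 formula.
        for op in self.operators:
            if self.counts[op] == 0:
                return op

        def score(op):
            bonus = self.c * math.sqrt(math.log(self.total) / self.counts[op])
            return self.values[op] + bonus

        return max(self.operators, key=score)

    def update(self, op, reward):
        # Incremental update of the mean reward for the chosen operator.
        self.counts[op] += 1
        self.total += 1
        self.values[op] += (reward - self.values[op]) / self.counts[op]


if __name__ == "__main__":
    # Hypothetical operators; here only "swap" ever yields an improvement.
    selector = UCB1OperatorSelector(["swap", "shift", "exchange"])
    for _ in range(200):
        chosen = selector.select()
        selector.update(chosen, 1.0 if chosen == "swap" else 0.0)
    print(selector.counts)
```

In a local search loop, `select` would be called before each neighborhood move and `update` afterwards with a reward derived from the change in objective value; how that reward is defined is a design choice the abstract leaves open.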

AAAI Conference 2026 Conference Paper

Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark

  • Jiahao Wang
  • Xiangyu Cao
  • Jiaru Zhong
  • Yuner Zhang
  • Zeyu Han
  • Haibao Yu
  • Chuang Zhang
  • Lei He

While cooperative perception can overcome the limitations of single-vehicle systems, the practical implementation of vehicle-to-vehicle and vehicle-to-infrastructure systems is often impeded by significant economic barriers. Aerial-ground cooperation (AGC), which pairs ground vehicles with drones, presents a more economically viable and rapidly deployable alternative. However, this emerging field has been held back by a critical lack of high-quality public datasets and benchmarks. To bridge this gap, we present Griffin, a comprehensive AGC 3D perception dataset, featuring over 250 dynamic scenes (37k+ frames). It incorporates varied drone altitudes (20-60m), diverse weather conditions, realistic drone dynamics via CARLA-AirSim co-simulation, and critical occlusion-aware 3D annotations. Accompanying the dataset is a unified benchmarking framework for cooperative detection and tracking, with protocols to evaluate communication efficiency, altitude adaptability, and robustness to communication latency, data loss and localization noise. Through experiments across different cooperative paradigms, we demonstrate the effectiveness and limitations of current methods and provide crucial insights for future research.

AAAI Conference 2026 Conference Paper

Mixture-of-Trees: Learning to Select and Weigh Reasoning Paths for Efficient LLM Inference

  • Yangbo Wei
  • Zhen Huang
  • Shaoqiang Lu
  • Junhong Qian
  • Dongge Qin
  • Ting Jung Lin
  • Wei W. Xing
  • Chen Wu

We introduce Mixture-of-Trees (MoT), a novel framework that integrates sparse expert activation with structured tree-based reasoning for efficient LLM inference. MoT employs a learned gating mechanism to selectively activate only the most relevant expert reasoning trees for each problem, where experts use models of varying capacities based on task complexity. The framework features three key innovations: (1) sparse expert activation through unified gating networks, (2) specialized expert trees that leverage domain-specific expertise while optimizing the quality-efficiency trade-off, and (3) collaborative debate mechanisms for conflicting solutions. Additionally, MoT includes a shared baseline tree with early stopping—activated experts perform lightweight validation and terminate early when confidence is high. Experiments across five benchmarks (GSM8K, MATH, AIME 2024, MMLU, HotpotQA) show that MoT achieves 2-7 percentage point accuracy improvements while reducing LLM calls by 37-40% compared to existing multi-path methods.

AAAI Conference 2026 Conference Paper

SparseCoop: Cooperative Perception with Kinematic-Grounded Queries

  • Jiahao Wang
  • Zhongwei Jiang
  • Wenchao Sun
  • Jiaru Zhong
  • Haibao Yu
  • Yuner Zhang
  • Chenyang Lu
  • Chuang Zhang

Cooperative perception is critical for autonomous driving, overcoming the inherent limitations of a single vehicle, such as occlusions and constrained fields-of-view. However, current approaches sharing dense Bird's-Eye-View (BEV) features are constrained by quadratically-scaling communication costs and the lack of flexibility and interpretability for precise alignment across asynchronous or disparate viewpoints. While emerging sparse query-based methods offer an alternative, they often suffer from inadequate geometric representations, suboptimal fusion strategies, and training instability. In this paper, we propose SparseCoop, a fully sparse cooperative perception framework for 3D detection and tracking that completely discards intermediate BEV representations. Our framework features a trio of innovations: a kinematic-grounded instance query that uses an explicit state vector with 3D geometry and velocity for precise spatio-temporal alignment; a coarse-to-fine aggregation module that effectively integrates information from both matched and unmatched instances; and a cooperative instance denoising task that provides stable, abundant supervision to accelerate and stabilize training. Experiments on V2X-Seq and Griffin datasets show SparseCoop achieves state-of-the-art performance. Notably, it delivers this performance with superior computational efficiency and a highly competitive transmission cost, while showing remarkable robustness to real-world challenges like communication latency.

EAAI Journal 2025 Journal Article

A generative design method of airfoil based on conditional variational autoencoder

  • Xu Wang
  • Weiqi Qian
  • Tun Zhao
  • Hai Chen
  • Lei He
  • Haisheng Sun
  • Yuan Tian

The challenges in multi-objective and multi-dimensional optimization design of airfoils, marked by prolonged optimization cycles and low accuracy, call for an efficient solution to expedite airfoil design. This study presents an innovative airfoil generative design model based on a conditional variational autoencoder (CVAE). Initially, to overcome the limitation of insufficient training data, the model leverages the variational autoencoder (VAE) to learn the spatial distribution of University of Illinois at Urbana-Champaign (UIUC) airfoils, enabling the generation of a diverse set of airfoils with similar distributions. Subsequently, two CVAE-based airfoil generation models, the airfoil freedom design model and the airfoil precision design model, are proposed, which can realize diverse airfoil design under different conditions, such as shape and aerodynamic conditions. Furthermore, two measurements of roughness and diversity are introduced to evaluate the quality of the generated airfoils. The impact of different conditions and network parameters on the model’s generation performance is thoroughly analyzed. Results indicate that our proposed model achieves a 65% lower error compared to physics-guided conditional Wasserstein generative adversarial networks (PG-cWGAN) when generating airfoils that satisfy a specific lift coefficient and a 99.99% lower error compared to airfoil pressure distributions generative adversarial networks (Airfoil-Cp-GAN) when generating airfoils that satisfy specific pressure distributions. This method introduces a more creative and accurate approach for aircraft designers in the realm of airfoil design. The code used for this paper is available at https://github.com/liujun39/airfoilvae.

IROS Conference 2025 Conference Paper

Controllable Traffic Simulation through LLM-Guided Hierarchical Reasoning and Refinement

  • Zhiyuan Liu
  • Leheng Li
  • Yuning Wang
  • Haotian Lin 0006
  • Hao Chen
  • Zhizhe Liu
  • Lei He
  • Jianqiang Wang 0003

Evaluating autonomous driving systems in complex and diverse traffic scenarios through controllable simulation is essential to ensure their safety and reliability. However, existing traffic simulation methods face challenges in their controllability. To address this, we propose a novel diffusion-based and LLM-enhanced traffic simulation framework. Our approach incorporates a high-level understanding module and a low-level refinement module, which systematically examines the hierarchical structure of traffic elements, guides LLMs to thoroughly analyze traffic scenario descriptions step by step, and refines the generation by self-reflection, enhancing their understanding of complex situations. Furthermore, we propose a Frenet-frame-based cost function framework that provides LLMs with geometrically meaningful quantities, improving their grasp of spatial relationships in a scenario and enabling more accurate cost function generation. Experiments on the Waymo Open Motion Dataset (WOMD) demonstrate that our method can handle more intricate descriptions and generate a broader range of scenarios in a controllable manner.

AAAI Conference 2025 Conference Paper

Drop the Beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation

  • Ziqian Ning
  • Shuai Wang
  • Yuepeng Jiang
  • Jixun Yao
  • Lei He
  • Shifeng Pan
  • Jie Ding
  • Lei Xie

Rap, a prominent genre of vocal performance, remains underexplored in vocal generation. General vocal synthesis depends on precise note and duration inputs, requiring users to have related musical knowledge, which limits flexibility. In contrast, rap typically features simpler melodies, with a core focus on a strong rhythmic sense that harmonizes with accompanying beats. In this paper, we propose Freestyler, the first system that generates rapping vocals directly from lyrics and accompaniment inputs. Freestyler utilizes language model-based token generation, followed by a conditional flow matching model to produce spectrograms and a neural vocoder to restore audio. It allows a 3-second prompt to enable zero-shot timbre control. Due to the scarcity of publicly available rap datasets, we also present RapBank, a rap song dataset collected from the internet, alongside a meticulously designed processing pipeline. Experimental results show that Freestyler produces high-quality rapping voice generation with enhanced naturalness and strong alignment with accompanying beats, both stylistically and rhythmically.

ICRA Conference 2025 Conference Paper

Hierarchical End-to-End Autonomous Driving: Integrating BEV Perception with Deep Reinforcement Learning

  • Siyi Lu
  • Lei He
  • Shengbo Eben Li
  • Yugong Luo
  • Jianqiang Wang 0003
  • Keqiang Li 0002

End-to-end autonomous driving offers a streamlined alternative to the traditional modular pipeline, integrating perception, prediction, and planning within a single framework. While Deep Reinforcement Learning (DRL) has recently gained traction in this domain, existing approaches often overlook the critical connection between feature extraction of DRL and perception. In this paper, we bridge this gap by mapping the DRL feature extraction network directly to the perception phase, enabling clearer interpretation through semantic segmentation. By leveraging Bird's-Eye-View (BEV) representations, we propose a novel DRL-based end-to-end driving framework that utilizes multi-sensor inputs to construct a unified three-dimensional understanding of the environment. This BEV-based system extracts and translates critical environmental features into high-level abstract states for DRL, facilitating more informed control. Extensive experimental evaluations demonstrate that our approach not only enhances interpretability but also significantly outperforms state-of-the-art methods in autonomous driving control tasks, reducing the collision rate by 20%.

ICRA Conference 2025 Conference Paper

Unveiling the Black Box: Independent Functional Module Evaluation for Bird's-Eye-View Perception Model

  • Ludan Zhang
  • Xiaokang Ding
  • Yuqi Dai
  • Lei He
  • Keqiang Li

End-to-end models are emerging as the mainstream in autonomous driving perception. However, the inability to meticulously deconstruct their internal mechanisms results in diminished development efficacy and impedes the establishment of trust. Pioneering in the issue, we present the Independent Functional Module Evaluation for Bird's-Eye-View Perception Model (BEV-IFME), a novel framework that juxtaposes the module's feature maps against Ground Truth within a unified semantic Representation Space to quantify their similarity, thereby assessing the training maturity of individual functional modules. The core of the framework lies in the process of feature map encoding and representation aligning, facilitated by our proposed two-stage Alignment AutoEncoder, which ensures the preservation of salient information and the consistency of feature structure. The metric for evaluating the training maturity of functional modules, Similarity Score, demonstrates a robust positive correlation with BEV metrics, with an average correlation coefficient of 0.9387, attesting to the framework's reliability for assessment purposes.

AAAI Conference 2025 Conference Paper

USDRL: Unified Skeleton-Based Dense Representation Learning with Multi-Grained Feature Decorrelation

  • Wanjiang Weng
  • Hongsong Wang
  • Junbo Wang
  • Lei He
  • Guo-Sen Xie

Contrastive learning has achieved great success in skeleton-based representation learning recently. However, the prevailing methods are predominantly negative-based, necessitating an additional momentum encoder and memory bank to obtain negative samples, which increases the difficulty of model training. Furthermore, these methods primarily concentrate on learning a global representation for recognition and retrieval tasks, while overlooking the rich and detailed local representations that are crucial for dense prediction tasks. To alleviate these issues, we introduce a Unified Skeleton-based Dense Representation Learning framework based on feature decorrelation, called USDRL, which employs feature decorrelation across temporal, spatial, and instance domains in a multi-grained manner to reduce redundancy among dimensions of the representations and thereby maximize information extraction from features. Additionally, we design a Dense Spatio-Temporal Encoder (DSTE) to capture fine-grained action representations effectively, thereby enhancing the performance of dense prediction tasks. Comprehensive experiments, conducted on the benchmarks NTU-60, NTU-120, PKU-MMD I, and PKU-MMD II, across diverse downstream tasks including action recognition, action retrieval, and action detection, conclusively demonstrate that our approach significantly outperforms the current state-of-the-art (SOTA) approaches.

IROS Conference 2025 Conference Paper

Vision-Driven 2D Supervised Fine-Tuning Framework for Bird's Eye View Perception

  • Lei He
  • Qiaoyi Wang
  • Honglin Sun
  • Qing Xu 0010
  • Bolin Gao
  • Shengbo Eben Li
  • Jianqiang Wang 0003
  • Keqiang Li 0002

Visual bird’s eye view (BEV) perception, due to its excellent perceptual capabilities, is progressively replacing costly LiDAR-based perception systems, especially in the realm of urban intelligent driving. However, this type of perception still relies on LiDAR data to construct ground truth databases, a process that is both cumbersome and time-consuming. Additionally, most mass-produced autonomous driving systems are equipped solely with surround camera sensors and lack the LiDAR data necessary for precise annotation. To tackle this challenge, we propose a fine-tuning method for BEV perception networks based on visual 2D semantic perception, aimed at enhancing the model’s generalization capabilities in new scene data. Leveraging the maturity of 2D perception technologies, our method utilizes only 2D semantic segmentation labels and monocular depth estimations, thereby significantly reducing the dependence on expensive BEV ground truths and offering strong potential for industrial deployment. Extensive experiments and comparative analyses on the nuScenes and Waymo datasets demonstrate the effectiveness of our method. Specifically, it improves mAP and NDS by 2.51% and 1.93% on nuScenes, and by 1.21% and 0.78% on Waymo, respectively, validating its practical utility and robustness across diverse domains.

NeurIPS Conference 2024 Conference Paper

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

  • Leying Zhang
  • Yao Qian
  • Long Zhou
  • Shujie Liu
  • Dongmei Wang
  • Xiaofei Wang
  • Midia Yousefi
  • Yanmin Qian

Recent advancements in zero-shot text-to-speech (TTS) modeling have led to significant strides in generating high-fidelity and diverse speech. However, dialogue generation, along with achieving human-like naturalness in speech, continues to be a challenge. In this paper, we introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-round dialogue speech generation. CoVoMix first converts dialogue text into multiple streams of discrete tokens, with each token stream representing semantic information for individual talkers. These token streams are then fed into a flow-matching based acoustic model to generate mixed mel-spectrograms. Finally, the speech waveforms are produced using a HiFi-GAN model. Furthermore, we devise a comprehensive set of metrics for measuring the effectiveness of dialogue modeling and generation. Our experimental results show that CoVoMix can generate dialogues that are not only human-like in their naturalness and coherence but also involve multiple talkers engaging in multiple rounds of conversation. This is exemplified by instances generated in a single channel where one speaker's utterance is seamlessly mixed with another's interjections or laughter, indicating the latter's role as an attentive listener. Audio samples are enclosed in the supplementary.

NeurIPS Conference 2023 Conference Paper

AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models

  • Yuancheng Wang
  • Zeqian Ju
  • Xu Tan
  • Lei He
  • Zhizheng Wu
  • Jiang Bian
  • Sheng Zhao

Audio editing is applicable for various purposes, such as adding background sound effects, replacing a musical instrument, and repairing damaged audio. Recently, some diffusion-based methods achieved zero-shot audio editing by using a diffusion and denoising process conditioned on the text description of the output audio. However, these methods still have some problems: 1) they have not been trained on editing tasks and cannot ensure good editing effects; 2) they can erroneously modify audio segments that do not require editing; 3) they need a complete description of the output audio, which is not always available or necessary in practical scenarios. In this work, we propose AUDIT, an instruction-guided audio editing model based on latent diffusion models. Specifically, AUDIT has three main design features: 1) we construct triplet training data (instruction, input audio, output audio) for different audio editing tasks and train a diffusion model using instruction and input (to be edited) audio as conditions and generating output (edited) audio; 2) it can automatically learn to only modify segments that need to be edited by comparing the difference between the input and output audio; 3) it only needs edit instructions instead of full target audio descriptions as text input. AUDIT achieves state-of-the-art results in both objective and subjective metrics for several audio editing tasks (e.g., adding, dropping, replacement, inpainting, super-resolution). Demo samples are available at https://audit-demopage.github.io/.

AAAI Conference 2023 Conference Paper

VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing

  • Yihan Wu
  • Junliang Guo
  • Xu Tan
  • Chen Zhang
  • Bohan Li
  • Ruihua Song
  • Lei He
  • Sheng Zhao

Video dubbing aims to translate the original speech in a film or television program into speech in a target language, which can be achieved with a cascaded system consisting of speech recognition, machine translation and speech synthesis. To ensure that the translated speech is well aligned with the corresponding video, the length/duration of the translated speech should be as close as possible to that of the original speech, which requires strict length control. Previous works usually control the number of words or characters generated by the machine translation model to be similar to the source sentence, without considering the isochronicity of speech, as the speech duration of words/characters varies across languages. In this paper, we propose VideoDubber, a machine translation system tailored for the task of video dubbing, which directly considers the speech duration of each token in translation, to match the length of source and target speech. Specifically, we control the speech length of the generated sentence by guiding the prediction of each word with duration information, including the speech duration of the word itself as well as how much duration is left for the remaining words. We design experiments on four language directions (German -> English, Spanish -> English, Chinese <-> English), and the results show that VideoDubber achieves better length control ability on the generated speech than baseline methods. To make up for the lack of real-world datasets, we also construct a real-world test set collected from films to provide comprehensive evaluations on the video dubbing task.

NeurIPS Conference 2022 Conference Paper

BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis

  • Yichong Leng
  • Zehua Chen
  • Junliang Guo
  • Haohe Liu
  • Jiawei Chen
  • Xu Tan
  • Danilo Mandic
  • Lei He

Binaural audio plays a significant role in constructing immersive augmented and virtual realities. As it is expensive to record binaural audio from the real world, synthesizing it from mono audio has attracted increasing attention. This synthesis process involves not only the basic physical warping of the mono audio, but also room reverberations and head/ear related filtration, which, however, are difficult to accurately simulate in traditional digital signal processing. In this paper, we formulate the synthesis process from a different perspective by decomposing the binaural audio into a common part that is shared by the left and right channels as well as a specific part that differs in each channel. Accordingly, we propose BinauralGrad, a novel two-stage framework equipped with diffusion models to synthesize them respectively. Specifically, in the first stage, the common information of the binaural audio is generated with a single-channel diffusion model conditioned on the mono audio, based on which the binaural audio is generated by a two-channel diffusion model in the second stage. Combining this novel perspective of two-stage synthesis with advanced generative models (i.e., the diffusion models), the proposed BinauralGrad is able to generate accurate and high-fidelity binaural audio samples. Experiment results show that on a benchmark dataset, BinauralGrad outperforms the existing baselines by a large margin in terms of both objective and subjective evaluation metrics (Wave L2: 0.128 vs. 0.157, MOS: 3.80 vs. 3.61). The generated audio samples (https://speechresearch.github.io/binauralgrad) and code (https://github.com/microsoft/NeuralSpeech/tree/master/BinauralGrad) are available online.

NeurIPS Conference 2022 Conference Paper

TreeMoCo: Contrastive Neuron Morphology Representation Learning

  • Hanbo Chen
  • Jiawei Yang
  • Daniel Iascone
  • Lijuan Liu
  • Lei He
  • Hanchuan Peng
  • Jianhua Yao

Morphology of neuron trees is a key indicator to delineate neuronal cell-types, analyze brain development process, and evaluate pathological changes in neurological diseases. Traditional analysis mostly relies on heuristic features and visual inspections. A quantitative, informative, and comprehensive representation of neuron morphology is largely absent but desired. To fill this gap, in this work, we adopt a Tree-LSTM network to encode neuron morphology and introduce a self-supervised learning framework named TreeMoCo to learn features without the need for labels. We test TreeMoCo on 2403 high-quality 3D neuron reconstructions of mouse brains from three different public resources. Our results show that TreeMoCo is effective in both classifying major brain cell-types and identifying sub-types. To the best of our knowledge, TreeMoCo is the very first to explore learning the representation of neuron tree morphology with contrastive learning. It has great potential to shed new light on quantitative neuron morphology analysis. Code is available at https://github.com/TencentAILabHealthcare/NeuronRepresentation.

IROS Conference 2021 Conference Paper

CLMM-Net: Robust Cascaded LiDAR Map Matching based on Multi-Level Intensity Map

  • Kai Chen 0028
  • Lei He
  • Xiaofeng Wang
  • Yuqian Liu
  • Ming Zhao

LiDAR map matching (LMM) is a critical localization technique in autonomous driving, but existing methods have problems in terms of both accuracy and robustness when driving in scenes with poor structure information (e.g., highways). This paper puts forward a multi-level intensity map based cascaded network for LiDAR map matching in autonomous driving. The network uses an effective multi-level intensity map representation to compactly encode the appearance and structure information of point clouds, which effectively reduces the position ambiguity in structure-less scenarios. In addition, this method leverages the multi-scale nature of deep neural networks and matches the online LiDAR observation with the offline map in a coarse-to-fine manner so as to balance time consumption and precision. Extensive experiments on diverse autonomous driving environments demonstrate the superiority of our proposed method over other existing state-of-the-art methods.

NeurIPS Conference 2021 Conference Paper

Exploring Forensic Dental Identification with Deep Learning

  • Yuan Liang
  • Weikun Han
  • Liang Qiu
  • Chen Wu
  • Yiting Shao
  • Kun Wang
  • Lei He

Dental forensic identification aims to identify persons from dental traces. The task is vital for the investigation of criminal scenes and mass disasters because of the resistance of dental structures and the wide existence of dental imaging. However, no widely accepted automated solution is available for this labour-costly task. In this work, we pioneer the study of deep learning for dental forensic identification based on panoramic radiographs. We construct a comprehensive benchmark with various dental variations that can adequately reflect the difficulties of the task. By considering the task's unique challenges, we propose FoID, a deep learning method featuring: (i) clinical-inspired attention localization, (ii) domain-specific augmentations that enable instance discriminative learning, and (iii) a transformer-based self-attention mechanism that dynamically reasons about the relative importance of attentions. We show that FoID can outperform traditional approaches by at least 22.98% in terms of Rank-1 accuracy, and outperform strong CNN baselines by at least 10.50% in terms of mean Average Precision (mAP). Moreover, extensive ablation studies verify the effectiveness of each building block of FoID. Our work can be a first step towards an automated system for forensic identification among large-scale multi-site databases. The proposed techniques, e.g., the self-attention mechanism, can also be meaningful for other identification tasks, e.g., pedestrian re-identification. Related data and codes can be found at https://github.com/liangyuandg/FoID.

AAAI Conference 2021 Conference Paper

Oral-3D: Reconstructing the 3D Structure of Oral Cavity from Panoramic X-ray

  • Weinan Song
  • Yuan Liang
  • Jiawei Yang
  • Kun Wang
  • Lei He

Panoramic X-ray (PX) provides a 2D picture of the patient’s mouth in a panoramic view to help dentists observe the invisible disease inside the gum. However, it provides limited 2D information compared with cone-beam computed tomography (CBCT), another dental imaging method that generates a 3D picture of the oral cavity but with more radiation dose and a higher price. Consequently, it is of great interest to reconstruct the 3D structure from a 2D X-ray image, which can greatly explore the application of X-ray imaging in dental surgeries. In this paper, we propose a framework, named Oral-3D, to reconstruct the 3D oral cavity from a single PX image and prior information of the dental arch. Specifically, we first train a generative model to learn the cross-dimension transformation from 2D to 3D. Then we restore the shape of the oral cavity with a deformation module with the dental arch curve, which can be obtained simply by taking a photo of the patient’s mouth. Notably, Oral-3D can restore both the density of bony tissues and the curved mandible surface. Experimental results show that Oral-3D can efficiently and effectively reconstruct the 3D oral structure and show critical information in clinical applications, e.g., tooth pulling and dental implants. To the best of our knowledge, we are the first to explore this domain transformation problem between these two imaging methods.

ICRA Conference 2020 Conference Paper

Integrated moment-based LGMD and deep reinforcement learning for UAV obstacle avoidance

  • Lei He
  • Nabil Aouf
  • James F. Whidborne
  • Bifeng Song

In this paper, a bio-inspired monocular vision perception method combined with a learning-based reaction local planner for obstacle avoidance of micro UAVs is presented. The system is more computationally efficient than other vision-based perception and navigation methods such as SLAM and optical flow because it does not need to calculate accurate distances. To improve the robustness of perception against illuminance change, the input image is remapped using image moment which is independent of illuminance variation. After perception, a local planner is trained using deep reinforcement learning for mapless navigation. The proposed perception and navigation methods are evaluated in some realistic simulation environments. The result shows that this light-weight monocular perception and navigation system works well in different complex environments without accurate depth information.