Arrow Research search

Author name cluster

Kai Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

42 papers
2 author rows

Possible papers


JBHI Journal 2026 Journal Article

A 3D Edge-Attention Denoising Diffusion Network for Prostate Segmentation in Puncture Biopsy

  • Haomin Kuang
  • Jiaxin Guo
  • Kai Xu
  • Yun-Hui Liu

Prostate cancer is the second most common cancer in men, and transrectal ultrasound (TRUS) guided biopsy is the standard method to diagnose prostate cancer. Accurate prostate segmentation in TRUS images is crucial for precise biopsy. Manual segmentation is laborious, while automated segmentation faces significant challenges due to the low signal-to-noise ratio, blurred boundaries, and presence of noise and artifacts. To address these issues, this paper proposes a 3D edge-attention denoising diffusion network, aiming to achieve high accuracy and generalizability for prostate segmentation in TRUS-guided biopsy. The proposed network incorporates an edge attention denoising U-Net (EAD U-Net) to extract and utilize desired edge information in TRUS images, improving the segmentation accuracy in challenging regions of the prostate. To reduce uncertainty and enhance network accuracy, we incorporate a Kalman fusion module, which utilizes the Kalman filter and all estimations from the EAD U-Net in the reverse process to obtain the optimal segmentation estimation. The proposed network was evaluated using 1834 3D ultrasound images from two open-source datasets. Comparative experiments with existing methods demonstrate that our method surpasses state-of-the-art techniques, proving its effectiveness in prostate segmentation from TRUS images. The proposed method achieved an average Dice similarity coefficient of 92.92% and 94.0%, and a 95th percentile Hausdorff distance of 1.07 mm and 0.77 mm on the two datasets, demonstrating the potential to facilitate accurate MRI-TRUS fusion guided prostate biopsy.
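For readers unfamiliar with the Dice similarity coefficient reported in this abstract, a minimal sketch of the standard metric follows. It is not the authors' evaluation code; the function name `dice_coefficient`, the flat 0/1 mask representation, and the empty-mask convention are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks (flat 0/1 sequences)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example: 2 overlapping voxels, 3 foreground voxels in each mask.
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]
dice_score = dice_coefficient(pred_mask, true_mask)  # 2 * 2 / (3 + 3)
```

A 3D volume can be flattened to a single sequence before calling the function, since the metric only counts voxels.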

AAAI Conference 2026 Conference Paper

AnchorHOI: Zero-shot Generation of 4D Human-Object Interaction via Anchor-based Prior Distillation

  • Sisi Dai
  • Kai Xu

Despite significant progress in text-driven 4D human-object interaction (HOI) generation with supervised methods, the scalability remains limited by the scarcity of large-scale 4D HOI datasets. To overcome this, recent approaches attempt zero-shot 4D HOI generation with pre-trained image diffusion models. However, interaction cues are minimally distilled during the generation process, restricting their applicability across diverse scenarios. In this paper, we propose AnchorHOI, a novel framework that thoroughly exploits hybrid priors by incorporating video diffusion models beyond image diffusion models, advancing 4D HOI generation. Nevertheless, directly optimizing high-dimensional 4D HOI with such priors remains challenging, particularly for human pose and compositional motion. To address this challenge, AnchorHOI introduces an anchor-based prior distillation strategy, which constructs interaction-aware anchors and then leverages them to guide generation in a tractable two-step process. Specifically, two tailored anchors are designed for 4D HOI generation: anchor Neural Radiance Fields (NeRFs) for expressive interaction composition, and anchor keypoints for realistic motion synthesis. Extensive experiments demonstrate that AnchorHOI outperforms previous methods with superior diversity and generalization.

EAAI Journal 2026 Journal Article

Cross-modal correlation-guided hierarchical multiscale network for cloud removal of optical remote sensing imagery

  • Anling Wang
  • Kai Xu
  • Wenxin Wang
  • Chengcheng Fan

Synthetic aperture radar (SAR) provides significant complementary information for cloud removal in optical remote sensing imagery, enabling the recovery of large-scale missing regions. To achieve high-quality reconstruction, this study focuses on two key challenges: extracting valid complementary information from SAR data and maintaining a balance between global scene coherence and fine-grained detail restoration. Therefore, we propose a cross-modal correlation-guided hierarchical multiscale network, which synergizes multimodal fusion and multiscale optimization to restore scene details occluded by clouds. For cross-modal fusion, the correlation propagated fusion module propagates SAR-derived global correlations to cloud-contaminated optical images. Furthermore, the hierarchical multiscale image reconstruction combines location-driven feature aggregation, which aggregates information between adjacent scales, with the optimization capabilities of a deep supervision mechanism across multiple levels. Extensive experiments demonstrate that the proposed method surpasses current methods by over 0.4 in peak signal-to-noise ratio on both real and synthetic datasets, producing cloud-free results with clearer object details and more harmonious overall appearance. Five-fold cross-validation and comparisons under varying cloud cover levels further validate the proposed method's strong generalization and robustness. Moreover, the validation of land cover classification after cloud removal highlights its practical real-world applicability.

JBHI Journal 2026 Journal Article

Fundus Image Enhancement With Pyramid Conditional Flow

  • Kai Xu
  • Zhen Liang
  • Wenjun Wei
  • Huaian Chen
  • Yi Jin

Deep learning-based approaches, which learn pixel-to-pixel mapping from input to output images, have demonstrated exceptional performance in enhancing low-quality fundus images. However, due to the ambiguous definition of the ground-truth high-quality image, the pixel-to-pixel mapping encounters an ill-posed problem arising from the complex one-to-many relationship between low-quality fundus images and their corresponding high-quality versions. To address this problem, this work proposes PCFlow, the first normalizing flow method that learns the complex distributions of high-quality fundus images rather than a pixel-to-pixel mapping. Unlike existing natural image enhancement methods that aim to restore images with comfortable visual quality, PCFlow enhances fundus images by prioritizing clinically significant information. To this end, we design a condition module that utilizes retinal structure as a conditioning factor to constrain the optimization of PCFlow, and then build an invertible coupling layer that employs a pyramid structure for identifying each frequency component of retinal features. With the cooperation and interactions of these key components, the proposed PCFlow preserves the retinal structures and pathological characteristics essential for clinical applications. Extensive experiments on real and synthetic fundus datasets demonstrate that our method achieves better performance.

AAAI Conference 2026 Conference Paper

Topology-Inspired Backward-Free Framework for Test-Time Adaptation in Medical Detection

  • Bin Pu
  • Xingguo Lv
  • Jiewen Yang
  • Kai Xu
  • Lei Zhao
  • Zuozhu Liu
  • Kenli Li

Recently, Test-Time Adaptation (TTA) has gained increasing attention in medical imaging due to its ability to improve model generalization under domain shifts without retraining. In particular, directly applying a well-trained model across various medical centers faces significant performance degradation caused by variations in equipment, operators, imaging conditions, and scanning skill levels of sonographers. Existing TTA methods either rely on parameter adaptation that increases computational cost or apply simple prediction fusion that ignores anatomical structure knowledge. To address these limitations, we propose a novel backward-free Topology-aware TTA framework named T^3 that integrates Structural Perception Modeling (SPM) and Box Regression Adaptation (BRA). SPM is implemented through an organ space heatmap generated via Gaussian kernel superposition. This heatmap encodes anatomical topology without requiring additional training or source data. BRA further improves localization and classification by fusing detection outputs based on the contribution of detected results to anatomically meaningful peak points from the heatmaps. Extensive experiments were conducted across six cross-domain scenarios, and the results demonstrate that our method achieves state-of-the-art cross-domain detection performance while maintaining high efficiency, offering a practical and robust solution for real-world medical diagnostic applications.
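The "Gaussian kernel superposition" mentioned in this abstract can be illustrated with a small sketch. It is a generic example of summing Gaussian kernels on a grid, not the paper's SPM implementation; the function name `gaussian_heatmap`, the use of (row, column) centre points, and the single shared `sigma` are assumptions.

```python
import math


def gaussian_heatmap(centers, height, width, sigma=2.0):
    """Sum one isotropic Gaussian kernel per centre point on a height x width grid."""
    heatmap = [[0.0] * width for _ in range(height)]
    for cy, cx in centers:
        for y in range(height):
            for x in range(width):
                squared_dist = (y - cy) ** 2 + (x - cx) ** 2
                # Each centre contributes a kernel with peak value 1 at (cy, cx).
                heatmap[y][x] += math.exp(-squared_dist / (2.0 * sigma ** 2))
    return heatmap


# Hypothetical centres, e.g. midpoints of detected boxes (assumption).
toy_heatmap = gaussian_heatmap([(2, 3), (2, 5)], height=5, width=9, sigma=1.0)
```

Peaks of the resulting map sit near the centres, and overlapping kernels reinforce each other where structures are close together.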

AAAI Conference 2026 Conference Paper

Unified Mixture-of-Experts Framework for Joint Cardiac and Vascular Ultrasound Analysis and Report Generation

  • Bin Pu
  • Jiewen Yang
  • Xingguo Lv
  • Kai Xu
  • Kenli Li

Echocardiography and vascular ultrasound are essential for comprehensive cardiovascular assessment, yet manual evaluation and report writing are labor-intensive, time-consuming, and require expertise from both cardiology and vascular surgery departments. Current automated report generation systems mainly focus on X-ray or CT, often neglecting echocardiographic modalities and critical quantitative parameters like aortic diameter and main pulmonary artery diameter, limiting their clinical utility. Moreover, the interdependence between cardiac and peripheral vascular health necessitates cross-departmental insights, which existing methods fail to incorporate. To address these limitations, we first propose a vision-language framework named Echo-Cardiac-Vascular (ECV) for joint cardiac and vascular ultrasound report generation and parameter measurement. ECV introduces a Mixture-of-Experts vision encoder tailored for distinct ultrasound subtypes, a structured parameter measurement module for accurate quantification, and task-specific decoders that generate interpretable, multimodal diagnostic reports. Our framework, trained on 10K+ paired records, achieves high accuracy, improving diagnostic efficiency, consistency, and cross-disciplinary clinical applicability.

EAAI Journal 2025 Journal Article

A time and frequency convolutional Autoencoder for anomaly detection in industrial robots based on inertial measurement unit error calibration

  • Jianlong Li
  • Xiaoqin Liu
  • Xing Wu
  • Dongxiao Wang
  • Kai Xu
  • Yashan Li

In the realm of industrial robots, ensuring operational reliability and long-term autonomy hinges on the accurate detection of anomalies. However, sample differences caused by noise, joint random errors, and sensor errors make robot anomaly detection more challenging. To address this problem, an unsupervised deep learning method based on inertial measurement unit (IMU) error calibration is proposed. First, the attitude signals acquired by the IMU at the end of the robot are calibrated using Kalman filtering. The three-dimensional (3D) free acceleration is corrected based on the calibrated attitude signal, and the calibrated 3D free acceleration signal is used as a signal sample. Second, a time and frequency convolutional autoencoder model (TFCAE) is proposed. The distribution of the different component signals is fitted by stacking multiple encoder modules, and 3D-TFCAE is used as the reconstruction model for the 3D free acceleration signal. Then, the error sphere radius is calculated from the reconstruction error of the 3D free acceleration signal and used as the anomaly detection threshold to achieve robust detection of different types of anomalies. The model was evaluated on a constructed anomaly dataset. This study contributes an innovative 3D-TFCAE architecture, integrating Kalman filtering with time-frequency feature fusion, markedly enhancing anomaly detection in complex signal environments. Experimental findings reveal that 3D-TFCAE significantly outperforms 18 baseline models, improving detection accuracy by about 20% to 40%, offering an effective solution for high-precision anomaly detection in industrial robots. The code for this project is available at https://github.com/LJlong977/3DTFCAE.

ICLR Conference 2025 Conference Paper

EvA: Erasing Spurious Correlations with Activations

  • Qiyuan He
  • Kai Xu
  • Angela Yao

Spurious correlations often arise when models associate features strongly correlated with, but not causally related to, the label, e.g., an image classifier associates bodies of water with ducks. To mitigate spurious correlations, existing methods focus on learning unbiased representations or incorporating additional information about the correlations during training. This work removes spurious correlations by "Erasing with Activations" (EvA). EvA learns a class-specific spurious indicator on each channel for the fully connected layer of pretrained networks. By erasing spurious connections during re-weighting, EvA achieves state-of-the-art performance across diverse datasets (6.2% relative gain on BAR and 4.1% on Waterbirds). For biased datasets without any information about the spurious correlations, EvA can outperform previous methods (4.8% relative gain on Waterbirds) with 6 orders of magnitude less compute, highlighting its data and computational efficiency.

AAAI Conference 2025 Conference Paper

Hierarchically-Structured Open-Vocabulary Indoor Scene Synthesis with Pre-trained Large Language Model

  • Weilin Sun
  • Xinran Li
  • Manyi Li
  • Kai Xu
  • Xiangxu Meng
  • Lei Meng

Indoor scene synthesis aims to automatically produce plausible, realistic, and diverse 3D indoor scenes, especially given arbitrary user requirements. Recently, the promising generalization ability of pre-trained large language models (LLMs) has assisted open-vocabulary indoor scene synthesis. However, the challenge lies in converting the LLM-generated outputs into reasonable and physically feasible scene layouts. In this paper, we propose to generate hierarchically structured scene descriptions with an LLM and then compute the scene layouts. Specifically, we train a hierarchy-aware network to infer the fine-grained relative positions between objects and design a divide-and-conquer optimization to solve for scene layouts. The advantages of using a hierarchically structured scene representation are two-fold. First, the hierarchical structure provides a rough grounding for object arrangement, which alleviates contradictory placements with dense relations and enhances the generalization ability of the network to infer fine-grained placements. Second, it naturally supports the divide-and-conquer optimization, by first arranging the sub-scenes and then the entire scene, to more effectively solve for a feasible layout. We conduct extensive comparison experiments and ablation studies with both qualitative and quantitative evaluations to validate the effectiveness of our key designs with the hierarchically structured scene representation. Our approach can generate more reasonable scene layouts that are better aligned with the user requirements and LLM descriptions. We also present open-vocabulary scene synthesis and interactive scene design results to show the strength of our approach in applications.

AAAI Conference 2025 Conference Paper

Physical-aware Neural Radiance Fields for Efficient Exposure Correction

  • Kai Xu
  • Mingwen Shao
  • Yuanjian Qiao
  • Yan Wang

Neural Radiance Fields (NeRF) has achieved remarkable success in synthesizing impressive novel views. However, existing methods usually fail to handle scenes with adverse lighting conditions caused by external time variations and different camera settings, leading to poor visual quality. To address this challenge, we propose a physical-aware NeRF for efficient exposure correction, named PHY-NeRF. Specifically, we design Adaptive Lighting Particles inspired by the theory of light scattering and absorption, which can adjust the illumination intensity during volume rendering. Subsequently, we can handle scenes with different lighting conditions by jointly optimizing camera parameters and these lighting particles. Moreover, to promote natural brightness transitions, we devise a global illumination consistency module to control the lighting intensity across views at the feature level while completing more details. Benefiting from the above designs, our PHY-NeRF can tackle arbitrary low-light or overexposed scenes in an unsupervised manner. Extensive experiments show that our PHY-NeRF achieves state-of-the-art results in addressing adverse lighting problems while ensuring high rendering efficiency.

NeurIPS Conference 2025 Conference Paper

Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods

  • Isha Puri
  • Shivchander Sudalairaj
  • Guangxuan Xu
  • Abhishek Bhandwaldar
  • Kai Xu
  • Akash Srivastava

Large language models (LLMs) have achieved significant performance gains via scaling up model sizes and/or data. However, recent evidence suggests diminishing returns from such approaches, motivating a pivot to scaling test-time compute. Existing deterministic inference-time scaling methods, usually with reward models, cast the task as a search problem, but suffer from a key limitation: early pruning. Due to inherently imperfect reward models, promising trajectories may be discarded prematurely, leading to suboptimal performance. We propose a novel inference-time scaling approach by adapting particle-based Monte Carlo methods. Our method maintains a diverse set of candidates and robustly balances exploration and exploitation. Our empirical evaluation demonstrates that our particle filtering methods have a 4-16x better scaling rate over deterministic search counterparts on both challenging mathematical tasks and more general reasoning tasks. Using our approach, we show that Qwen2.5-Math-1.5B-Instruct surpasses GPT-4o accuracy in only 4 rollouts, while Qwen2.5-Math-7B-Instruct scales to o1 level accuracy in only 32 rollouts. Our work not only presents an effective method for inference-time scaling, but also connects rich literature in probabilistic inference with inference-time scaling of LLMs to develop more robust algorithms in future work.
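The contrast between hard pruning and particle-based resampling described in this abstract can be sketched with a generic particle filter loop. This is a toy illustration, not the authors' algorithm: the softmax weighting of rewards, the function names, and the digit-appending "generator" are all assumptions standing in for an LLM and a reward model.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def particle_filter(init, extend, reward, n_particles=8, n_steps=4, seed=0):
    """Propagate partial candidates, weight them by reward, and resample."""
    rng = random.Random(seed)
    particles = [init for _ in range(n_particles)]
    for _ in range(n_steps):
        # Propagate: each particle is extended by one (stochastic) step.
        particles = [extend(p, rng) for p in particles]
        # Weight: softmax of rewards (an assumed weighting scheme).
        weights = softmax([reward(p) for p in particles])
        # Resample with replacement, so low-reward candidates can survive
        # by chance instead of being pruned deterministically.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return particles


# Hypothetical stand-ins for an LLM step and a reward model.
def extend_with_digit(partial, rng):
    return partial + (rng.randint(0, 9),)


def digit_sum_reward(partial):
    return float(sum(partial))


final_particles = particle_filter((), extend_with_digit, digit_sum_reward)
```

Because resampling is stochastic, a candidate with a temporarily low reward keeps a nonzero chance of being continued, which is the property the abstract contrasts with early pruning in deterministic search.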

JBHI Journal 2025 Journal Article

UA-VLFM: An Uncertainty-aware Vision-Language Foundation Model for Auxiliary Diagnosis of Vitreoretinal Lymphoma

  • Wenwen Wang
  • Aidi Lin
  • Tian Lin
  • Zhen Liang
  • Kai Xu
  • Tao Li
  • Dan Liang
  • Shanshan Yu

Vitreoretinal lymphoma (VRL) is a rare malignant ocular tumor, and its early diagnosis is crucial for patient prognosis. However, due to its insidious and diverse clinical manifestations, it is often misdiagnosed as other ophthalmic diseases, leading to blindness or even fatal outcomes. In this study, an uncertainty-aware vision-language foundation model (UA-VLFM) based on contrastive learning and uncertainty estimation is developed to achieve automatic classification of VRL and five other retinal diseases. First, we integrate MAE-based pretraining knowledge on large-scale optical coherence tomography (OCT) images and an efficient low-rank adaptation (LoRA) optimization strategy to enhance the representation ability and optimization efficiency of the model. Moreover, an uncertainty-aware contrastive learning method based on the Dirichlet distribution within the contrastive vision-language pretraining framework is proposed to further align vision and language features in the high-dimensional embedding space and obtain prediction results with corresponding uncertainty scores, thereby enhancing the reliability of VRL diagnosis. On the test dataset with 5,563 OCT images, UA-VLFM achieves a higher average F1 score of 0.9684 than other state-of-the-art algorithms (0.8186-0.9427) and improves to 0.9839 with the threshold strategy. Notably, the proposed UA-VLFM achieves F1 scores of 0.9217 and 0.9544 before and after thresholding on VRL, the most challenging category, significantly outperforming other methods (0.5089-0.9366 and 0.6639-0.9133). Our UA-VLFM provides a trustworthy method for aiding in the diagnosis of VRL on retinal OCT images. The code has been released on GitHub: https://github.com/wang-wen-wen/UA-VLFM.

AAAI Conference 2024 Conference Paper

DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection

  • Yunfan Ye
  • Kai Xu
  • Yuhang Huang
  • Renjiao Yi
  • Zhiping Cai

Limited by the encoder-decoder architecture, learning-based edge detectors usually have difficulty predicting edge maps that satisfy both correctness and crispness. With the recent success of the diffusion probabilistic model (DPM), we found it is especially suitable for accurate and crisp edge detection since the denoising process is directly applied to the original image size. Therefore, we propose the first diffusion model for the task of general edge detection, which we call DiffusionEdge. To avoid expensive computational resources while retaining the final performance, we apply DPM in the latent space and enable the classic cross-entropy loss which is uncertainty-aware in pixel level to directly optimize the parameters in latent space in a distillation manner. We also adopt a decoupled architecture to speed up the denoising process and propose a corresponding adaptive Fourier filter to adjust the latent features of specific frequencies. With all the technical designs, DiffusionEdge can be stably trained with limited resources, predicting crisp and accurate edge maps with much fewer augmentation strategies. Extensive experiments on four edge detection benchmarks demonstrate the superiority of DiffusionEdge both in correctness and crispness. On the NYUDv2 dataset, compared to the second best, we increase the ODS, OIS (without post-processing) and AC by 30.2%, 28.1% and 65.1%, respectively. Code: https://github.com/GuHuangAI/DiffusionEdge.

ICML Conference 2024 Conference Paper

GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding

  • Cunxiao Du
  • Jing Jiang 0001
  • Yuanchen Xu
  • Jiawei Wu 0003
  • Sicheng Yu
  • Yongqi Li 0001
  • Shenggui Li
  • Kai Xu

Speculative decoding is a relatively new decoding framework that leverages small and efficient draft models to reduce the latency of LLMs. In this study, we introduce GliDe and CaPE, two low-hassle modifications to vanilla speculative decoding to further improve the decoding speed of a frozen LLM. Specifically, GliDe is a modified draft model architecture that reuses the cached keys and values from the target LLM, while CaPE is a proposal expansion method that uses the draft model’s confidence scores to help select additional candidate tokens for verification. Extensive experiments on different benchmarks demonstrate that our proposed GliDe draft model significantly reduces the expected decoding latency. Additional evaluation using walltime reveals that GliDe can accelerate Vicuna models up to 2.17x and further extend the improvement to 2.61x with CaPE. We will release our code, data, and the trained draft models.

TMLR Journal 2024 Journal Article

LInK: Learning Joint Representations of Design and Performance Spaces through Contrastive Learning for Mechanism Synthesis

  • Amin Heyrani Nobari
  • Akash Srivastava
  • Dan Gutfreund
  • Kai Xu
  • Faez Ahmed

In this paper, we introduce LInK, a novel framework that integrates contrastive learning of performance and design space with optimization techniques for solving complex inverse problems in engineering design with discrete and continuous variables. We focus on the path synthesis problem for planar linkage mechanisms. By leveraging a multimodal and transformation-invariant contrastive learning framework, LInK learns a joint representation that captures complex physics and design representations of mechanisms, enabling rapid retrieval from a vast dataset of over 10 million mechanisms. This approach improves precision through the warm start of a hierarchical unconstrained nonlinear optimization algorithm, combining the robustness of traditional optimization with the speed and adaptability of modern deep learning methods. Our results on an existing benchmark demonstrate that LInK outperforms existing methods, with 28 times less error than a state-of-the-art approach while taking 20 times less time. Moreover, we introduce a significantly more challenging benchmark, named LINK-ABC, which involves synthesizing linkages that trace the trajectories of capital letters of the English alphabet, an inverse design benchmark task that existing methods struggle with due to large nonlinearities and a tiny feasible space. Our results demonstrate that LInK not only advances the field of mechanism design but also broadens the applicability of contrastive learning and optimization to other areas of engineering. The code and data are publicly available at https://github.com/ahnobari/LInK.

ICML Conference 2024 Conference Paper

Practical Hamiltonian Monte Carlo on Riemannian Manifolds via Relativity Theory

  • Kai Xu
  • Hong Ge

Hamiltonian Monte Carlo (HMC) samples from an unnormalized density by numerically integrating Hamiltonian dynamics. Girolami & Calderhead (2011) extend HMC to Riemannian manifolds, but the resulting method faces integration instability issues for practical usage. While previous works have tackled this challenge by using more robust metric tensors than Fisher’s information metric, our work focuses on designing numerically stable Hamiltonian dynamics. To do so, we start with the idea from Lu et al. (2017), which designs momentum distributions to upper-bound the particle speed. Then, we generalize this Lu et al. (2017) method to Riemannian manifolds. In our generalization, the upper bounds of velocity norm become position-dependent, which intrinsically limits step sizes used in high curvature regions and, therefore, significantly reduces numerical errors. We also derive a more tractable algorithm to sample from relativistic momentum distributions without relying on the mean-field assumption.
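To make the "upper-bound the particle speed" idea concrete, the Euclidean relativistic kinetic energy in the style of Lu et al. (2017) can be written out as below. This is a sketch of that known construction only; the symbols $m$ (mass) and $c$ (speed limit) are notation introduced here, and the position-dependent Riemannian generalization from the abstract is not reproduced.

```latex
% Relativistic kinetic energy for momentum p, mass m, speed limit c
K(p) = m c^{2} \sqrt{\frac{p^{\top} p}{m^{2} c^{2}} + 1}

% Velocity is the gradient of K with respect to p
\dot{q} = \nabla_{p} K(p) = \frac{p}{m \sqrt{\dfrac{p^{\top} p}{m^{2} c^{2}} + 1}}

% Its norm is bounded by c for every momentum
\lVert \dot{q} \rVert = \frac{c \, \lVert p \rVert}{\sqrt{\lVert p \rVert^{2} + m^{2} c^{2}}} < c
```

Since the position update per leapfrog step is the step size times the velocity, the bound caps how far a particle can move in one step regardless of how large the momentum becomes.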

NeurIPS Conference 2024 Conference Paper

Privacy without Noisy Gradients: Slicing Mechanism for Generative Model Training

  • Kristjan Greenewald
  • Yuancheng Yu
  • Hao Wang
  • Kai Xu

Training generative models with differential privacy (DP) typically involves injecting noise into gradient updates or adapting the discriminator's training procedure. As a result, such approaches often struggle with hyper-parameter tuning and convergence. We consider the slicing privacy mechanism that injects noise into random low-dimensional projections of the private data, and provide strong privacy guarantees for it. These noisy projections are used for training generative models. To enable optimizing generative models using this DP approach, we introduce the smoothed-sliced $f$-divergence and show it enjoys statistical consistency. Moreover, we present a kernel-based estimator for this divergence, circumventing the need for adversarial training. Extensive numerical experiments demonstrate that our approach can generate synthetic data of higher quality compared with baselines. Beyond performance improvement, our method, by sidestepping the need for noisy gradients, offers data scientists the flexibility to adjust generator architecture and hyper-parameters, run the optimization over any number of epochs, and even restart the optimization process, all without incurring additional privacy costs.
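The idea of adding noise to random projections of the data, rather than to gradients, can be sketched as follows. This is a simplified illustration, not the paper's mechanism: one-dimensional projections, Gaussian noise, and the name `noisy_slices` are assumptions, and the calibration of `sigma` to a privacy budget is deliberately left out.

```python
import math
import random


def noisy_slices(data, n_slices, sigma, seed=0):
    """Project each record onto random unit directions and add Gaussian noise."""
    rng = random.Random(seed)
    dim = len(data[0])
    slices = []
    for _ in range(n_slices):
        # Draw a direction uniformly on the unit sphere.
        raw = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in raw))
        theta = [v / norm for v in raw]
        # Noise is added once, to the projected data (sigma is NOT calibrated
        # to any (epsilon, delta) budget in this sketch).
        projected = [
            sum(t * x for t, x in zip(theta, row)) + rng.gauss(0.0, sigma)
            for row in data
        ]
        slices.append((theta, projected))
    return slices


private_data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
released = noisy_slices(private_data, n_slices=4, sigma=0.5)
```

Once the noisy projections are released, any amount of downstream optimization reuses the same values, which is why the abstract notes that further training incurs no additional privacy cost.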

ICLR Conference 2024 Conference Paper

Scaling for Training Time and Post-hoc Out-of-distribution Detection Enhancement

  • Kai Xu
  • Rongyu Chen
  • Gianni Franchi
  • Angela Yao

Activation shaping has proven highly effective for identifying out-of-distribution (OOD) samples post-hoc. Activation shaping prunes and scales network activations before estimating the OOD energy score; such an extremely simple approach achieves state-of-the-art OOD detection with minimal in-distribution (ID) accuracy drops. This paper analyzes the working mechanism behind activation shaping. We directly show that the benefits for OOD detection derive only from scaling, while pruning is detrimental. Based on our analysis, we propose SCALE, an even simpler yet more effective post-hoc network enhancement method for OOD detection. SCALE attains state-of-the-art OOD detection performance without any compromises on ID accuracy. Furthermore, we integrate scaling concepts into learning and propose Intermediate Tensor SHaping (ISH) for training-time OOD detection enhancement. ISH achieves significant AUROC improvements for both near- and far-OOD, highlighting the importance of activation distributions in emphasizing ID data characteristics. Our code and models are available at https://github.com/kai422/SCALE.
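A rough sketch of scaling without pruning, followed by an energy-style score, is given below. The abstract does not state the exact scaling rule, so the factor `exp(sum(all) / sum(top-k))` used here is an assumption based on common activation-shaping recipes, as are the names `scale_activations` and `log_sum_exp`; the linked repository is the authoritative reference.

```python
import math


def scale_activations(acts, keep_fraction=0.15):
    """Scale every activation by exp(r), r = sum(all) / sum(largest fraction).

    Assumes non-negative (post-ReLU) activations with a positive top-k sum.
    No activation is pruned: all values are kept and multiplied by one factor.
    """
    k = max(1, int(round(len(acts) * keep_fraction)))
    top_sum = sum(sorted(acts, reverse=True)[:k])
    factor = math.exp(sum(acts) / top_sum)
    return [a * factor for a in acts]


def log_sum_exp(logits):
    """Negative free energy of the logits, a common OOD score (higher = more ID)."""
    top = max(logits)
    return top + math.log(sum(math.exp(z - top) for z in logits))


features = [0.0, 1.0, 2.0, 3.0, 4.0]
scaled_features = scale_activations(features, keep_fraction=0.4)
```

In a full pipeline the scaled features would pass through the classifier head before `log_sum_exp` is applied to the resulting logits; that head is omitted here.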

IROS Conference 2024 Conference Paper

VRExplorer: An Efficient View-Region based Autonomous Exploration Method in Unknown Environments for UAV

  • Kai Xu
  • Lanxiang Zheng
  • Mingxin Wei
  • Hui Cheng

Autonomous exploration plays a crucial role in robotics applications like rescue and scene reconstruction. This work addresses the challenges of autonomous exploration in intricate unknown environments by presenting a novel UAV autonomous exploration method based on a new concept of the view-region. Our proposed approach leverages the view-region to replace the conventional viewpoint generation and selection process, streamlining the planning process for exploration. Simultaneously, we model the problem of maximizing frontier coverage within the field of view during exploration, and jointly optimize it with the exploration path optimization problem. This approach ensures exploration path safety and effectiveness while being aggressive. Additionally, a gimbal is incorporated beneath the camera, with an associated optimization problem designed to minimize UAV self-rotation and enhance exploration efficiency. Simulations and real-world experiments demonstrate that the proposed method outperforms existing state-of-the-art methods in terms of runtime and distance traveled.

ICLR Conference 2023 Conference Paper

DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training

  • Joya Chen
  • Kai Xu
  • Yuhui Wang
  • Yifei Cheng 0002
  • Angela Yao

A standard hardware bottleneck when training deep neural networks is GPU memory. The bulk of memory is occupied by caching intermediate tensors for gradient computation in the backward pass. We propose a novel method to reduce this footprint: Dropping Intermediate Tensors (DropIT). DropIT drops min-k elements of the intermediate tensors and approximates gradients from the sparsified tensors in the backward pass. Theoretically, DropIT reduces noise on estimated gradients and therefore has a higher rate of convergence than vanilla-SGD. Experiments show that we can drop up to 90% of the intermediate tensor elements in fully-connected and convolutional layers while achieving higher testing accuracy for Visual Transformers and Convolutional Neural Networks on various tasks (e.g., classification, object detection, instance segmentation). Our code and models are available at https://github.com/chenjoya/dropit.
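The "drop min-k elements" step can be sketched on a flat list of values. This is an illustration only: ranking by absolute value and the name `drop_min_k` are assumptions, and a real implementation would store the surviving values and their indices sparsely instead of keeping zeros.

```python
def drop_min_k(tensor, drop_rate):
    """Zero out the drop_rate fraction of elements with the smallest magnitude."""
    n_drop = int(len(tensor) * drop_rate)
    # Indices ordered from smallest to largest absolute value (assumed criterion).
    order = sorted(range(len(tensor)), key=lambda i: abs(tensor[i]))
    dropped = set(order[:n_drop])
    return [0.0 if i in dropped else v for i, v in enumerate(tensor)]


cached = [0.1, -3.0, 0.5, 2.0]
sparse_cached = drop_min_k(cached, drop_rate=0.5)  # keeps -3.0 and 2.0
```

The sparsified tensor would then replace the full cached activation when gradients are computed in the backward pass.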

TMLR Journal 2023 Journal Article

Estimating the Density Ratio between Distributions with High Discrepancy using Multinomial Logistic Regression

  • Akash Srivastava
  • Seungwook Han
  • Kai Xu
  • Benjamin Rhodes
  • Michael U. Gutmann

Functions of the ratio of the densities $p/q$ are widely used in machine learning to quantify the discrepancy between the two distributions $p$ and $q$. For high-dimensional distributions, binary classification-based density ratio estimators have shown great promise. However, when densities are well-separated, estimating the density ratio with a binary classifier is challenging. In this work, we show that the state-of-the-art density ratio estimators do perform poorly on well-separated cases and demonstrate that this is due to distribution shifts between training and evaluation time. We present an alternative method that leverages multi-class classification for density ratio estimation and does not suffer from distribution shift issues. The method uses a set of auxiliary densities $\{m_k\}_{k=1}^K$ and trains a multi-class logistic regression to classify the samples from $p, q$ and $\{m_k\}_{k=1}^K$ into $K+2$ classes. We show that if these auxiliary densities are constructed such that they overlap with $p$ and $q$, then a multi-class logistic regression allows for estimating $\log p/q$ on the domain of any of the $K+2$ distributions and resolves the distribution shift problems of the current state-of-the-art methods. We compare our method to state-of-the-art density ratio estimators on both synthetic and real datasets and demonstrate its superior performance on the tasks of density ratio estimation, mutual information estimation, and representation learning.
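The step from a multi-class classifier to a density ratio follows from Bayes' rule and can be written out as below. This is a sketch under the assumption of a Bayes-optimal softmax classifier; the class priors $\pi_c$ and logits $h_c(x)$ are notation introduced here, not taken from the abstract.

```latex
% Bayes-optimal posterior over the K+2 classes c \in \{p, q, m_1, \dots, m_K\}
P(c \mid x) = \frac{\pi_c \, p_c(x)}{\sum_{c'} \pi_{c'} \, p_{c'}(x)}
            = \frac{\exp h_c(x)}{\sum_{c'} \exp h_{c'}(x)}

% The shared normalizer cancels in the ratio of the p and q posteriors
\log \frac{p(x)}{q(x)}
  = \log \frac{P(c = p \mid x)}{P(c = q \mid x)} - \log \frac{\pi_p}{\pi_q}
  = h_p(x) - h_q(x) + \log \frac{\pi_q}{\pi_p}
```

With equal numbers of samples per class the prior term vanishes, so the log ratio is simply the difference of the two logits, and it is well defined wherever any of the $K+2$ distributions places training data.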

AAAI Conference 2023 Conference Paper

Multi-Resolution Monocular Depth Map Fusion by Self-Supervised Gradient-Based Composition

  • Yaqiao Dai
  • Renjiao Yi
  • Chenyang Zhu
  • Hongjun He
  • Kai Xu

Monocular depth estimation is a challenging problem on which deep neural networks have demonstrated great potential. However, depth maps predicted by existing deep models usually lack fine-grained details due to convolution operations and down-samplings in networks. We find that increasing input resolution is helpful to preserve more local details while the estimation at low resolution is more accurate globally. Therefore, we propose a novel depth map fusion module to combine the advantages of estimations with multi-resolution inputs. Instead of merging the low- and high-resolution estimations equally, we adopt the core idea of Poisson fusion, trying to implant the gradient domain of high-resolution depth into the low-resolution depth. While classic Poisson fusion requires a fusion mask as supervision, we propose a self-supervised framework based on guided image filtering. We demonstrate that this gradient-based composition performs much better in noise immunity, compared with the state-of-the-art depth map fusion method. Our lightweight depth fusion is one-shot and runs in real-time, making it 80X faster than a state-of-the-art depth fusion method. Quantitative evaluations demonstrate that the proposed method can be integrated into many fully convolutional monocular depth estimation backbones with a significant performance boost, leading to state-of-the-art results of detail enhancement on depth maps. Code is released at https://github.com/yuinsky/gradient-based-depth-map-fusion.

ICLR Conference 2023 Conference Paper

Synthetic Data Generation of Many-to-Many Datasets via Random Graph Generation

  • Kai Xu
  • Georgi Ganev
  • Emile Joubert
  • Rees Davison
  • Olivier Van Acker
  • Luke Robinson

Synthetic data generation (SDG) has become a popular approach to release private datasets. In SDG, a generative model is fitted on the private real data, and samples drawn from the model are released as the protected synthetic data. While real-world datasets usually consist of multiple tables with potential many-to-many relationships (i.e. many-to-many datasets), recent research in SDG mostly focuses on modeling tables independently or only considers generating datasets with special cases of many-to-many relationships such as one-to-many. In this paper, we first study challenges of building faithful generative models for many-to-many datasets, identifying limitations of existing methods. We then present a novel factorization for many-to-many generative models, which leads to a scalable generation framework by combining recent results from random graph theory and representation learning. Finally, we extend the framework to establish the notion of $(\epsilon,\delta)$-differential privacy. Through a real-world dataset, we demonstrate that our method can generate synthetic datasets while preserving information within and across tables better than its closest competitor.

AAAI Conference 2022 Conference Paper

Efficient One-Pass Multi-View Subspace Clustering with Consensus Anchors

  • Suyuan Liu
  • Siwei Wang
  • Pei Zhang
  • Kai Xu
  • Xinwang Liu
  • Changwang Zhang
  • Feng Gao

Multi-view subspace clustering (MVSC) optimally integrates multiple graph structure information to improve clustering performance. Recently, many anchor-based variants have been proposed to reduce the computational complexity of MVSC. Though they achieve considerable acceleration, we observe that most of them adopt fixed anchor points that are separate from the subsequent anchor graph construction, which may adversely affect the clustering performance. In addition, post-processing is required to generate discrete clustering labels, with additional time consumption. To address these issues, we propose a scalable and parameter-free MVSC method that directly outputs the clustering labels with an optimal anchor graph, termed Efficient One-pass Multi-view Subspace Clustering with Consensus Anchors (EOMSC-CA). Specifically, we combine anchor learning and graph construction into a unified framework to boost clustering performance. Meanwhile, by imposing a graph connectivity constraint, our algorithm directly outputs the clustering labels without the post-processing procedures that previous methods require. Our proposed EOMSC-CA is proven to have linear complexity with respect to the data size. The superiority of our EOMSC-CA in both effectiveness and efficiency is demonstrated by extensive experiments. Our code is publicly available at https://github.com/Tracesource/EOMSC-CA.

AAAI Conference 2022 Conference Paper

Fusion Multiple Kernel K-means

  • Yi Zhang
  • Xinwang Liu
  • Jiyuan Liu
  • Sisi Dai
  • Changwang Zhang
  • Kai Xu
  • En Zhu

Multiple kernel clustering aims to seek an appropriate combination of base kernels to mine inherent non-linear information for optimal clustering. Late fusion algorithms generate base partitions independently and integrate them in the following clustering procedure, improving the overall efficiency. However, the separate base partition generation leads to inadequate negotiation with the clustering procedure and a great loss of beneficial information in corresponding kernel matrices, which negatively affects the clustering performance. To address this issue, we propose a novel algorithm, termed Fusion Multiple Kernel k-means (FMKKM), which unifies base partition learning and late fusion clustering into one single objective function, and adopts an early fusion technique to capture more of the information in kernel matrices. Specifically, the early fusion helps base partitions keep more beneficial kernel details, and the base partition learning further guides the generation of the consensus partition in the late fusion stage, while the late fusion provides positive feedback on the two former procedures. The close collaboration of the three procedures results in a promising performance improvement. Subsequently, an alternate optimization method with promising convergence is developed to solve the resultant optimization problem. Comprehensive experimental results demonstrate that our proposed algorithm achieves state-of-the-art performance on multiple public datasets, validating its effectiveness. The code of this work is publicly available at https://github.com/ethan-yizhang/Fusion-Multiple-Kernel-K-means.

NeurIPS Conference 2021 Conference Paper

A Bayesian-Symbolic Approach to Reasoning and Learning in Intuitive Physics

  • Kai Xu
  • Akash Srivastava
  • Dan Gutfreund
  • Felix Sosa
  • Tomer Ullman
  • Josh Tenenbaum
  • Charles Sutton

Humans can reason about intuitive physics in fully or partially observed environments even after being exposed to a very limited set of observations. This sample-efficient intuitive physical reasoning is considered a core domain of human common sense knowledge. One hypothesis to explain this remarkable capacity posits that humans quickly learn approximations to the laws of physics that govern the dynamics of the environment. In this paper, we propose a Bayesian-symbolic framework (BSP) for physical reasoning and learning that is close to human-level sample-efficiency and accuracy. In BSP, the environment is represented by a top-down generative model of entities, which are assumed to interact with each other under unknown force laws over their latent and observed properties. BSP models each of these entities as random variables, and uses Bayesian inference to estimate their unknown properties. For learning the unknown forces, BSP leverages symbolic regression on a novel grammar of Newtonian physics in a bilevel optimization setup. These inference and regression steps are performed in an iterative manner using expectation-maximization, allowing BSP to simultaneously learn force laws while maintaining uncertainty over entity properties. We show that BSP is more sample-efficient compared to neural alternatives on controlled synthetic datasets, demonstrate BSP's applicability to real-world common sense scenes and study BSP's performance on tasks previously used to study human physical reasoning.

IJCAI Conference 2021 Conference Paper

Objective-aware Traffic Simulation via Inverse Reinforcement Learning

  • Guanjie Zheng
  • Hanyang Liu
  • Kai Xu
  • Zhenhui Li

Traffic simulators act as an essential component in the operating and planning of transportation systems. Conventional traffic simulators usually employ a calibrated physical car-following model to describe vehicles' behaviors and their interactions with traffic environment. However, there is no universal physical model that can accurately predict the pattern of vehicle's behaviors in different situations. A fixed physical model tends to be less effective in a complicated environment given the non-stationary nature of traffic dynamics. In this paper, we formulate traffic simulation as an inverse reinforcement learning problem, and propose a parameter sharing adversarial inverse reinforcement learning model for dynamics-robust simulation learning. Our proposed model is able to imitate a vehicle's trajectories in the real world while simultaneously recovering the reward function that reveals the vehicle's true objective which is invariant to different dynamics. Extensive experiments on synthetic and real-world datasets show the superior performance of our approach compared to state-of-the-art methods and its robustness to variant dynamics of traffic.

AAAI Conference 2021 Conference Paper

Online 3D Bin Packing with Constrained Deep Reinforcement Learning

  • Hang Zhao
  • Qijin She
  • Chenyang Zhu
  • Yin Yang
  • Kai Xu

We solve a challenging yet practically useful variant of 3D Bin Packing Problem (3D-BPP). In our problem, the agent has limited information about the items to be packed into a single bin, and an item must be packed immediately after its arrival without buffering or readjusting. The item’s placement is also subject to the constraints of order dependence and physical stability. We formulate this online 3D-BPP as a constrained Markov decision process (CMDP). To solve the problem, we propose an effective and easy-to-implement constrained deep reinforcement learning (DRL) method under the actor-critic framework. In particular, we introduce a prediction-and-projection scheme: The agent first predicts a feasibility mask for the placement actions as an auxiliary task and then uses the mask to modulate the action probabilities output by the actor during training. Such supervision and projection help the agent learn feasible policies very efficiently. Our method can be easily extended to handle lookahead items, multi-bin packing, and item re-orienting. We have conducted extensive evaluation showing that the learned policy significantly outperforms the state-of-the-art methods. A preliminary user study even suggests that our method might attain a human-level performance.
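
The projection half of the prediction-and-projection scheme can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the mask is binary and that "modulating" means multiplying the actor's probabilities by the mask and renormalising. The function name `apply_feasibility_mask` is hypothetical.

```python
def apply_feasibility_mask(probs, mask):
    """Multiply action probabilities by a 0/1 feasibility mask and renormalise."""
    masked = [p * m for p, m in zip(probs, mask)]
    total = sum(masked)
    if total == 0.0:
        # Every action was masked out; there is nothing valid to sample from.
        raise ValueError("no feasible action")
    return [v / total for v in masked]


# Three candidate placements; the second one is predicted to be infeasible.
actor_probs = [0.5, 0.25, 0.25]
feasibility = [1, 0, 1]
projected = apply_feasibility_mask(actor_probs, feasibility)
```

The infeasible placement receives zero probability, and the remaining mass is redistributed in proportion to the actor's original preferences.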

YNIMG Journal 2021 Journal Article

Relationship between the disrupted topological efficiency of the structural brain connectome and glucose hypometabolism in normal aging

  • Qiuhui Bi
  • Wenxiao Wang
  • Na Niu
  • He Li
  • Yezhou Wang
  • Weijie Huang
  • Kewei Chen
  • Kai Xu

Normal aging is accompanied by structural degeneration and glucose hypometabolism in the human brain. However, the relationship between structural network disconnections and hypometabolism in normal aging remains largely unknown. In the present study, by combining MRI and PET techniques, we investigated the metabolic mechanism of the structural brain connectome and its relationship with normal aging in a cross-sectional, community-based cohort of 42 cognitively normal elderly individuals aged 57–84 years. The structural connectome was constructed based on diffusion MRI tractography, and the network efficiency metrics were quantified using graph theory analyses. FDG-PET scanning was performed to evaluate the glucose metabolic level in the cortical regions of the individuals. The results of this study demonstrated that both network efficiency and cortical metabolism decrease with age (both p < 0.05). In the subregions of the bilateral thalamus, significant correlations between nodal efficiency and cortical metabolism could be observed across subjects. Individual-level analyses indicated that brain regions with higher nodal efficiency tend to exhibit higher metabolic levels, implying a tight coupling between nodal efficiency and glucose metabolism (r = 0.56, p = 1.15 × 10^-21). Moreover, the efficiency-metabolism coupling coefficient significantly increased with age (r = 0.44, p = 0.0046). Finally, the main findings were also reproducible in the ADNI dataset. Together, our results demonstrate a close coupling between structural brain connectivity and cortical metabolism in normal elderly individuals and provide new insights that improve the present understanding of the metabolic mechanisms of structural brain disconnections in normal aging.

NeurIPS Conference 2021 Conference Paper

Targeted Neural Dynamical Modeling

  • Cole Hurwitz
  • Akash Srivastava
  • Kai Xu
  • Justin Jude
  • Matthew Perich
  • Lee Miller
  • Matthias Hennig

Latent dynamics models have emerged as powerful tools for modeling and interpreting neural population activity. Recently, there has been a focus on incorporating simultaneously measured behaviour into these models to further disentangle sources of neural variability in their latent space. These approaches, however, are limited in their ability to capture the underlying neural dynamics (e.g. linear) and in their ability to relate the learned dynamics back to the observed behaviour (e.g. no time lag). To this end, we introduce Targeted Neural Dynamical Modeling (TNDM), a nonlinear state-space model that jointly models the neural activity and external behavioural variables. TNDM decomposes neural dynamics into behaviourally relevant and behaviourally irrelevant dynamics; the relevant dynamics are used to reconstruct the behaviour through a flexible linear decoder and both sets of dynamics are used to reconstruct the neural activity through a linear decoder with no time lag. We implement TNDM as a sequential variational autoencoder and validate it on simulated recordings and recordings taken from the premotor and motor cortex of a monkey performing a center-out reaching task. We show that TNDM is able to learn low-dimensional latent dynamics that are highly predictive of behaviour without sacrificing its fit to the neural data.

AAAI Conference 2020 Conference Paper

Learning Part Generation and Assembly for Structure-Aware Shape Synthesis

  • Jun Li
  • Chengjie Niu
  • Kai Xu

Learning powerful deep generative models for 3D shape synthesis is largely hindered by the difficulty in ensuring plausibility encompassing correct topology and reasonable geometry. Indeed, learning the distribution of plausible 3D shapes seems a daunting task for the holistic approaches, given the significant topological variations of 3D objects even within the same category. Enlightened by the fact that 3D shape structure is characterized as part composition and placement, we propose to model 3D shape variations with a part-aware deep generative network, coined PAGENet. The network is composed of an array of per-part VAE-GANs, generating semantic parts composing a complete shape, followed by a part assembly module that estimates a transformation for each part to correlate and assemble them into a plausible structure. Through delegating the learning of part composition and part placement into separate networks, the difficulty of modeling structural variations of 3D shapes is greatly reduced. We demonstrate through both qualitative and quantitative evaluations that PAGENet generates 3D shapes with plausible, diverse and detailed structure, and show two applications, i.e., semantic shape segmentation and part-based shape editing.

AAAI Conference 2020 Conference Paper

MetaLight: Value-Based Meta-Reinforcement Learning for Traffic Signal Control

  • Xinshi Zang
  • Huaxiu Yao
  • Guanjie Zheng
  • Nan Xu
  • Kai Xu
  • Zhenhui Li

Using reinforcement learning for traffic signal control has attracted increasing interest recently. Various value-based reinforcement learning methods have been proposed to deal with this classical transportation problem and have achieved better performance than traditional transportation methods. However, current reinforcement learning models rely on tremendous training data and computational resources, which may have bad consequences (e.g., traffic jams or accidents) in the real world. In traffic signal control, some algorithms have been proposed to empower quick learning from scratch, but little attention is paid to learning by transferring and reusing learned experience. In this paper, we propose a novel framework, named MetaLight, to speed up the learning process in new scenarios by leveraging the knowledge learned from existing scenarios. MetaLight is a value-based meta-reinforcement learning workflow based on the representative gradient-based meta-learning algorithm (MAML), which includes periodically alternating individual-level adaptation and global-level adaptation. Moreover, MetaLight improves the state-of-the-art reinforcement learning model FRAP in traffic signal control by optimizing its model structure and updating paradigm. The experiments on four real-world datasets show that our proposed MetaLight not only adapts more quickly and stably in new traffic scenarios, but also achieves better performance.

AAAI Conference 2020 Conference Paper

NeoNav: Improving the Generalization of Visual Navigation via Generating Next Expected Observations

  • Qiaoyun Wu
  • Dinesh Manocha
  • Jun Wang
  • Kai Xu

We propose improving the cross-target and cross-scene generalization of visual navigation through learning an agent that is guided by conceiving the next observations it expects to see. This is achieved by learning a variational Bayesian model, called NeoNav, which generates the next expected observations (NEO) conditioned on the current observations of the agent and the target view. Our generative model is learned through optimizing a variational objective encompassing two key designs. First, the latent distribution is conditioned on current observations and the target view, leading to a model-based, target-driven navigation. Second, the latent space is modeled with a Mixture of Gaussians conditioned on the current observation and the next best action. Our use of mixture-of-posteriors prior effectively alleviates the issue of over-regularized latent space, thus significantly boosting the model generalization for new targets and in novel scenes. Moreover, the NEO generation models the forward dynamics of agent-environment interaction, which improves the quality of approximate inference and hence benefits data efficiency. We have conducted extensive evaluations on both real-world and synthetic benchmarks, and show that our model consistently outperforms the state-of-the-art models in terms of success rate, data efficiency, and generalization.

NeurIPS Conference 2020 Conference Paper

PIE-NET: Parametric Inference of Point Cloud Edges

  • Xiaogang Wang
  • Yuelang Xu
  • Kai Xu
  • Andrea Tagliasacchi
  • Bin Zhou
  • Ali Mahdavi-Amiri
  • Hao Zhang

We introduce an end-to-end learnable technique to robustly identify feature edges in 3D point cloud data. We represent these edges as a collection of parametric curves (i.e., lines, circles, and B-splines). Accordingly, our deep neural network, coined PIE-NET, is trained for parametric inference of edges. The network relies on a "region proposal" architecture, where a first module proposes an over-complete collection of edge and corner points, and a second module ranks each proposal to decide whether it should be considered. We train and evaluate our method on the ABC dataset, a large dataset of CAD models, and compare our results to those produced by traditional (non-learning) processing pipelines, as well as a recent deep learning based edge detector (EC-NET). Our results significantly improve over the state-of-the-art from both a quantitative and qualitative standpoint.

NeurIPS Conference 2020 Conference Paper

Telescoping Density-Ratio Estimation

  • Benjamin Rhodes
  • Kai Xu
  • Michael U. Gutmann

Density-ratio estimation via classification is a cornerstone of unsupervised learning. It has provided the foundation for state-of-the-art methods in representation learning and generative modelling, with the number of use-cases continuing to proliferate. However, it suffers from a critical limitation: it fails to accurately estimate ratios p/q for which the two densities differ significantly. Empirically, we find this occurs whenever the KL divergence between p and q exceeds tens of nats. To resolve this limitation, we introduce a new framework, telescoping density-ratio estimation (TRE), that enables the estimation of ratios between highly dissimilar densities in high-dimensional spaces. Our experiments demonstrate that TRE can yield substantial improvements over existing single-ratio methods for mutual information estimation, representation learning and energy-based modelling.
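
The abstract does not spell out the construction, but the telescoping idea can be sketched as follows. The intermediate distributions $p_1, \dots, p_{m-1}$ and the count $m$ are notation assumed here for illustration.

```latex
% Insert intermediate distributions between p_0 = p and p_m = q.
\frac{p(x)}{q(x)} = \frac{p_0(x)}{p_m(x)} = \prod_{k=0}^{m-1} \frac{p_k(x)}{p_{k+1}(x)}

% In log space the product becomes a sum of easier, neighbouring ratios.
\log \frac{p(x)}{q(x)} = \sum_{k=0}^{m-1} \log \frac{p_k(x)}{p_{k+1}(x)}
```

Each factor compares two neighbouring distributions that are closer to each other than $p$ and $q$ are, which is the regime where classification-based estimators are reported to work well.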

AAAI Conference 2020 Conference Paper

Toward A Thousand Lights: Decentralized Deep Reinforcement Learning for Large-Scale Traffic Signal Control

  • Chacha Chen
  • Hua Wei
  • Nan Xu
  • Guanjie Zheng
  • Ming Yang
  • Yuanhao Xiong
  • Kai Xu
  • Zhenhui Li

Traffic congestion plagues cities around the world. Recent years have witnessed an unprecedented trend in applying reinforcement learning for traffic signal control. However, the primary challenge is to control and coordinate traffic lights in large-scale urban networks. No one has ever tested RL models on a network of more than a thousand traffic lights. In this paper, we tackle the problem of multi-intersection traffic signal control, especially for large-scale networks, based on RL techniques and transportation theories. This problem is quite difficult because there are challenges such as scalability, signal coordination, data feasibility, etc. To address these challenges, we (1) design our RL agents using the ‘pressure’ concept to achieve signal coordination at the region level; (2) show that implicit coordination could be achieved by individual control agents with well-crafted reward design, thus reducing the dimensionality; and (3) conduct extensive experiments on multiple scenarios, including a real-world scenario with 2510 traffic lights in Manhattan, New York City.

EAAI Journal 2019 Journal Article

Online probabilistic goal recognition and its application in dynamic shortest-path local network interdiction

  • Kai Xu
  • Yunxiu Zeng
  • Qi Zhang
  • Quanjun Yin
  • Lin Sun
  • Kaiming Xiao

Goal recognition is the task of inferring an agent’s goals given some or all of the agent’s observed actions. However, little research focuses on how to improve the usage effectiveness of knowledge produced by a goal recognition system. In this work, we propose a probabilistic goal recognition approach tailored to a dynamic shortest-path network interdiction problem. Apart from inferring a probability distribution over the possible goals of an agent, our work has another four key novelties: (i) a dynamic shortest-path local network interdiction model that allocates resources locally per step using goal recognition information; (ii) two behavior modeling approaches, including a data-driven learning method based on Inverse Reinforcement Learning as well as a heuristic method taking advantage of the network information, to help solve both the data-intensive and no available data situations; (iii) a heuristic named Subjective Confidence that uses variance in the particle system for flexible resource allocation adjustment. The empirical test results show the effectiveness of our goal recognition method, and also verify the practical implications of these methods in solving scalable multi-terminus network interdiction problems.

NeurIPS Conference 2019 Conference Paper

Scalable Spike Source Localization in Extracellular Recordings using Amortized Variational Inference

  • Cole Hurwitz
  • Kai Xu
  • Akash Srivastava
  • Alessio Buccino
  • Matthias Hennig

Determining the positions of neurons in an extracellular recording is useful for investigating the functional properties of the underlying neural circuitry. In this work, we present a Bayesian modelling approach for localizing the source of individual spikes on high-density, microelectrode arrays. To allow for scalable inference, we implement our model as a variational autoencoder and perform amortized variational inference. We evaluate our method on both biophysically realistic simulated and real extracellular datasets, demonstrating that it is more accurate than and can improve spike sorting performance over heuristic localization methods such as center of mass.

AAAI Conference 2018 Conference Paper

Reduced-Rank Linear Dynamical Systems

  • Qi She
  • Yuan Gao
  • Kai Xu
  • Rosa Chan

Linear Dynamical Systems are widely used to study the underlying patterns of multivariate time series. A basic assumption of these models is that high-dimensional time series can be characterized by some underlying, low-dimensional and time-varying latent states. However, existing approaches to LDS modeling mostly learn the latent space with a prescribed dimensionality. When dealing with short-length high-dimensional time series data, such models would be easily overfitted. We propose Reduced-Rank Linear Dynamical Systems (RRLDS), to automatically retrieve the intrinsic dimensionality of the latent space during model learning. Our key observation is that the rank of the dynamics matrix of LDS captures the intrinsic dimensionality, and the variational inference with a reduced-rank regularization finally leads to a concise, structured, and interpretable latent space. To enable our method to handle count-valued data, we introduce the dispersion-adaptive distribution to accommodate the over-, equal-, and under-dispersion nature of such data. Results on both simulated and experimental data demonstrate our model can robustly learn latent space from short-length, noisy, count-valued data and significantly improve the prediction performance over the state-of-the-art methods.

IJCAI Conference 2017 Conference Paper

Bridging the Gap between Observation and Decision Making: Goal Recognition and Flexible Resource Allocation in Dynamic Network Interdiction

  • Kai Xu
  • Kaiming Xiao
  • Quanjun Yin
  • Yabing Zha
  • Cheng Zhu

Goal recognition, which is the task of inferring an agent’s goals given some or all of the agent’s observed actions, is one of the important approaches in bridging the gap between the observation and decision making within an observe-orient-decide-act cycle. Unfortunately, little research focuses on how to improve the utilization of knowledge produced by a goal recognition system. In this work, we propose a Markov Decision Process-based goal recognition approach tailored to a dynamic shortest-path local network interdiction (DSPLNI) problem. We first introduce a novel DSPLNI model and its solvable dual form so as to incorporate real-time knowledge acquired from the goal recognition system. Then a Markov Decision Process-based goal recognition model, along with its dynamic Bayesian network representation and the applied goal inference method, is proposed to identify the evader’s real goal within the DSPLNI context. Based on that, we further propose an efficient scalable technique for maintaining the action utility map used in fast goal inference, and develop a flexible resource assignment mechanism in DSPLNI using knowledge from the goal recognition system. Experimental results show the effectiveness and accuracy of our methods both in goal recognition and dynamic network interdiction.

ICRA Conference 2012 Conference Paper

Configuration comparison for surgical robotic systems using a single access port and continuum mechanisms

  • Kai Xu
  • Xidian Zheng

Research on robot-assisted laparoscopic SPA (Single Port Access) surgery and N.O.T.E.S (Natural Orifice Translumenal Endoscopic Surgery) has thrived in the past few years. A configuration similarity between these surgical robotic slaves is that two robotic arms are extended from the same access port (either a laparoscope or an endoscope) for surgical interventions. However, upon designing such a surgical robotic slave, the structure of the extended robotic arms has not been explored thoroughly based on evaluation of their distal dexterity. This paper presents a simulation-based comparison among three different structures which could be used to form these extended robotic arms. Results presented in this paper could serve as a design reference for surgical robotic slaves which use a single access port and continuum mechanisms.

ICRA Conference 2002 Conference Paper

Control System Design of THBIP-I Humanoid Robot

  • Mingguo Zhao
  • Li Liu
  • Jingsong Wang
  • Ken Chen
  • Jiandong Zhao
  • Kai Xu

Describes the progress of the control system design and implementation of the THBIP-I humanoid robot. The robot has 32 degrees of freedom and each joint is driven by a brushless DC electronic motor. A screw/nut transmission mechanism is adopted in some joints of the lower limbs to achieve compact and good dynamic performance. The control system of the robot has four subsystems: remote brain work station, mobile controller, distributed control units and sensor processing unit. At the present stage, the lower limbs and upper limbs have been built and tested with off-line gait planning. The distributed control units use PID schemes to servo the pre-generated joint trajectories. Under this architecture, the robot can perform stable walking with 30-centimeter steps at 20 seconds per step.