Arrow Research search

Author name cluster

Xu Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

49 papers
2 author rows

Possible papers

49

AAAI Conference 2026 Conference Paper

Beyond Step Pruning: Information Theory Based Step-level Optimization for Self-Refining Large Language Models

  • Jinman Zhao
  • Erxue Min
  • Hui Wu
  • Ziheng Li
  • Zexu Sun
  • Hengyi Cai
  • Shuaiqiang Wang
  • Xu Chen

Large language models (LLMs) have shown impressive capabilities in natural language tasks, yet they continue to struggle with multi-step mathematical reasoning, where correctness depends on a precise chain of intermediate steps. Preference optimization methods such as Direct Preference Optimization (DPO) have improved answer-level alignment, but they often overlook the reasoning process itself, providing little supervision over intermediate steps that are critical for complex problem-solving. Existing fine-grained approaches typically rely on strong annotators or reward models to assess the quality of individual steps. However, reward models are vulnerable to reward hacking. To address this, we propose ISLA, a reward-model-free framework that constructs step-level preference data directly from SFT gold traces. ISLA also introduces a self-improving pruning mechanism that identifies informative steps based on two signals: their marginal contribution to final accuracy (relative accuracy) and the model’s uncertainty, inspired by the concept of information gain. Empirically, ISLA achieves better performance than DPO while using only 12% of the training tokens, demonstrating that careful step-level selection can significantly improve both reasoning accuracy and training efficiency.

AAAI Conference 2026 System Paper

PHOTONS: Pose-Free Human-Centric Photo-Realistic Real-Time Novel View Synthesis from Sparse Views

  • Yongyang Cheng
  • Boqin Qin
  • Zhao Hui
  • Xu Chen
  • Tao Zhang
  • Shang Sun
  • Haiquan Kang
  • Xiaojie Xu

We present PHOTONS (Pose-Free Human-Centric Photo-Realistic Real-Time Novel View Synthesis from Sparse Views), a real-time framework for novel view synthesis without requiring camera calibration. Our method reconstructs consistent 3D Gaussian point clouds and synthesizes 2K photo-realistic novel views from an arbitrary number (>=2) of freely placed cameras. PHOTONS faithfully renders dynamic human bodies amid complex backgrounds, including interactive object manipulation and fine-grained details (e.g., hair strands), while maintaining 25 FPS throughput on a commodity GPU such as the NVIDIA RTX 4090. By combining pose-free spatial point cloud reconstruction with Gaussian parameter estimation, our method demonstrates strong resilience to occlusions and camera perturbations. Additionally, we develop a 3D stereo system that drastically reduces setup complexity compared to existing solutions. Experiments on public and custom datasets show that PHOTONS outperforms state-of-the-art methods in both efficiency and visual quality.

NeurIPS Conference 2025 Conference Paper

CAM: A Constructivist View of Agentic Memory for LLM-Based Reading Comprehension

  • Rui Li
  • Zeyu Zhang
  • Xiaohe Bo
  • Zihang Tian
  • Xu Chen
  • Quanyu Dai
  • Zhenhua Dong
  • Ruiming Tang

Current Large Language Models (LLMs) are confronted with overwhelming information volume when comprehending long-form documents. This challenge raises the imperative of a cohesive memory module, which can elevate vanilla LLMs into autonomous reading agents. Despite the emergence of some heuristic approaches, a systematic design principle remains absent. To fill this void, we draw inspiration from Jean Piaget's Constructivist Theory, illuminating three traits of the agentic memory: structured schemata, flexible assimilation, and dynamic accommodation. This blueprint forges a clear path toward a more robust and efficient memory system for LLM-based reading comprehension. To this end, we develop CAM, a prototype implementation of Constructivist Agentic Memory that simultaneously embodies structurality, flexibility, and dynamicity. At its core, CAM is endowed with an incremental overlapping clustering algorithm for structured memory development, supporting both coherent hierarchical summarization and online batch integration. During inference, CAM adaptively explores the memory structure to activate query-relevant information for contextual response, akin to the human associative process. Compared to existing approaches, our design demonstrates dual advantages in both performance and efficiency across diverse long-text reading comprehension tasks, including question answering, query-based summarization, and claim verification.

NeurIPS Conference 2025 Conference Paper

Distributional LLM-as-a-Judge

  • Luyu Chen
  • Zeyu Zhang
  • Haoran Tan
  • Quanyu Dai
  • Yang Hao
  • Zhenhua Dong
  • Xu Chen

LLMs have emerged as powerful evaluators in the LLM-as-a-Judge paradigm, offering significant efficiency and flexibility compared to human judgments. However, previous methods primarily rely on single-point evaluations, overlooking the inherent diversity and uncertainty in human evaluations. This approach leads to information loss and decreases the reliability of evaluations. To address this limitation, we propose a novel training framework that explicitly aligns the LLM-generated judgment distribution with human evaluation distributions. Specifically, we propose a distributional alignment objective based on KL divergence, combined with an auxiliary cross-entropy regularization to stabilize the training process. Furthermore, due to limited human annotations, empirical human distributions are merely noisy estimates of the true underlying distribution. We therefore incorporate adversarial training to ensure a robust alignment with this true distribution, rather than overfitting to its imperfect approximation. Extensive experiments across various LLM backbones and evaluation tasks demonstrate that our framework significantly outperforms existing closed-source LLMs and conventional single-point alignment methods, with superior alignment quality, strong robustness, and competitive evaluation accuracy.

NeurIPS Conference 2025 Conference Paper

Dual Data Alignment Makes AI-Generated Image Detector Easier Generalizable

  • Ruoxin Chen
  • Junwei Xi
  • Zhiyuan Yan
  • Ke-Yue Zhang
  • Shuang Wu
  • Jingyi Xie
  • Xu Chen
  • Lei Xu

The rapid increase in AI-generated images (AIGIs) underscores the need for detection methods. Existing detectors are often trained on biased datasets, leading to overfitting on spurious correlations between non-causal image attributes and real/synthetic labels. While these biased features enhance performance on the training data, they result in substantial performance degradation when tested on unbiased datasets. A common solution is to perform data alignment through generative reconstruction, matching the content between real and synthetic images. However, we find that pixel-level alignment alone is inadequate, as the reconstructed images still suffer from frequency-level misalignment, perpetuating spurious correlations. To illustrate, we observe that reconstruction models restore the high-frequency details lost in real images, inadvertently creating a frequency-level misalignment, where synthetic images appear to have richer high-frequency content than real ones. This misalignment leads to models associating high-frequency features with synthetic labels, further reinforcing biased cues. To resolve this, we propose Dual Data Alignment (DDA), which aligns both the pixel and frequency domains. DDA generates synthetic images that closely resemble real ones by fusing real and synthetic image pairs in both domains, enhancing the detector's ability to identify forgeries without relying on biased features. Moreover, we introduce two new test sets: DDA-COCO, containing DDA-aligned synthetic images, and EvalGEN, featuring the latest generative models. Our extensive evaluations demonstrate that a detector trained exclusively on DDA-aligned MSCOCO improves across diverse benchmarks. Code is available at https://github.com/roy-ch/Dual-Data-Alignment.

JBHI Journal 2025 Journal Article

Enhancing Trustworthiness of Semantic Segmentation in Cataract Surgery Videos via Intra-Phase Label Propagation

  • Mingen Zhang
  • Yuanyuan Gu
  • Xu Chen
  • Botian Zheng
  • Donghan Wu
  • Jinxian Zhang
  • Yufei Wu
  • Yonghuai Liu

Accurate segmentation of semantic features is a pivotal procedure for cataract surgery assistance, surgical skill assessment and related applications. However, previous studies have failed to consider the instance-level feature similarity of instruments across different surgical phases in cataract surgery videos, leading to unreliable decision-making regarding instrument categories. In this study, we propose a label propagation framework to effectively leverage the consistency of phase-specific instruments, which utilizes the initial frame labels from each surgical phase to predict masks for the remaining frames, achieving precise and trustworthy semantic segmentation of cataract surgery videos. Specifically, we design a pseudo-label generation and filtering strategy to automatically obtain highly reliable initial frame labels for each surgical phase. In addition, we establish a fixed-size memory bank with an adaptive update module to ensure long-term applicability in real surgical environments. To address the common problem of blurred edges in cataract surgery scenes, we develop a semantic edge perception module to allow the model to focus on and distinguish the edges of different objects. The proposed method achieved an mIoU of 80.7% and 88.8% on a publicly available dataset (14 categories) and a private dataset (12 categories) with a total of 9,723 frames, respectively, significantly outperforming the state-of-the-art methods and other label propagation-based approaches. Furthermore, our method minimizes memory consumption and maintains about 30 FPS while processing long video sequences.

NeurIPS Conference 2025 Conference Paper

Iterative Missing Data Imputation with Model Form Adaptation and Non-Missing Feature Supervision

  • Hao Wang
  • zhengnan li
  • Zhichao Chen
  • Xu Chen
  • Shuting He
  • Guangyi Liu
  • Haoxuan Li
  • Zhouchen Lin

Iterative imputation is a prevalent method for missing data imputation, where each feature is imputed iteratively by treating it as a target variable estimated from all other features. However, the iterative imputation method suffers from two principal limitations: (1) it imposes a single parametric model form to impute all features, neglecting the potential for optimal models to vary among features, which risks model misspecification; and (2) it assumes every feature contains missing values, overlooking the potential presence of non-missing features, termed oracle features, which are informative for imputation. To address these limitations, we propose kernel point imputation (KPI), a bi-level optimization framework for iterative missing data imputation. At the inner level, KPI adaptively learns the optimal model form for each feature within a reproducing kernel Hilbert space, addressing limitation (1). At the outer level, KPI utilizes oracle features as supervisory signals to iteratively refine the imputations, addressing limitation (2). Experiments demonstrate that KPI outperforms competitive imputation methods. Code is available at https://github.com/FMLYD/kpi.git.

NeurIPS Conference 2025 Conference Paper

MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants

  • Zeyu Zhang
  • Quanyu Dai
  • Luyu Chen
  • Zeren Jiang
  • Rui Li
  • Jieming Zhu
  • Xu Chen
  • Yi Xie

LLM-based agents have been widely applied as personal assistants, capable of memorizing information from user messages and responding to personal queries. However, there still lacks an objective and automatic evaluation on their memory capability, largely due to the challenges in constructing reliable questions and answers (QAs) according to user messages. In this paper, we propose MemSim, a Bayesian simulator designed to automatically construct reliable QAs from generated user messages, simultaneously keeping their diversity and scalability. Specifically, we introduce the Bayesian Relation Network (BRNet) and a causal generation mechanism to mitigate the impact of LLM hallucinations on factual information, facilitating the automatic creation of an evaluation dataset. Based on MemSim, we generate a dataset in the daily-life scenario, named MemDaily, and conduct extensive experiments to assess the effectiveness of our approach. We also provide a benchmark for evaluating different memory mechanisms in LLM-based agents with the MemDaily dataset.

NeurIPS Conference 2025 Conference Paper

MF-LLM: Simulating Population Decision Dynamics via a Mean-Field Large Language Model Framework

  • Qirui Mi
  • Mengyue Yang
  • Xiangning Yu
  • Zhiyu Zhao
  • Cheng Deng
  • Bo An
  • Haifeng Zhang
  • Xu Chen

Simulating collective decision-making involves more than aggregating individual behaviors; it emerges from dynamic interactions among individuals. While large language models (LLMs) offer strong potential for social simulation, achieving quantitative alignment with real-world data remains a key challenge. To bridge this gap, we propose the Mean-Field LLM (MF-LLM) framework, the first to incorporate mean field theory into LLM-based social simulation. MF-LLM models bidirectional interactions between individuals and the population through an iterative process, generating population signals to guide individual decisions, which in turn update the signals. This interplay produces coherent trajectories of collective behavior. To improve alignment with real-world data, we introduce IB-Tune, a novel fine-tuning method inspired by the Information Bottleneck principle, which retains population signals most predictive of future actions while filtering redundant history. Evaluated on a real-world social dataset, MF-LLM reduces KL divergence to human population distributions by 47% compared to non-mean-field baselines, enabling accurate trend forecasting and effective intervention planning. Generalizing across 7 domains and 4 LLM backbones, MF-LLM provides a scalable, high-fidelity foundation for social simulation.

NeurIPS Conference 2025 Conference Paper

Time-o1: Time-Series Forecasting Needs Transformed Label Alignment

  • Hao Wang
  • Licheng Pan
  • Zhichao Chen
  • Xu Chen
  • Qingyang Dai
  • Lei Wang
  • Haoxuan Li
  • Zhouchen Lin

Training time-series forecasting models poses unique challenges in loss function design. Most existing approaches adopt temporal mean squared error, but this study reveals two critical limitations: (1) it ignores the presence of label autocorrelation, which biases it from the true label sequence likelihood; (2) it involves an excessive number of tasks, which complicates optimization, especially for long-term forecasting. To address these issues, we introduce Time-o1, a transform-enhanced loss function for time-series forecasting. The central idea is to transform the label sequence into decorrelated components with discriminated significance. Models are then trained to align the most significant components, thereby effectively mitigating label autocorrelation and reducing the number of tasks. Experiments demonstrate that Time-o1 achieves state-of-the-art performance and is compatible with various forecast models. Code is available at https://github.com/Master-PLC/Time-o1.

AAAI Conference 2024 Conference Paper

A Diffusion-Based Framework for Multi-Class Anomaly Detection

  • Haoyang He
  • Jiangning Zhang
  • Hongxu Chen
  • Xuhai Chen
  • Zhishan Li
  • Xu Chen
  • Yabiao Wang
  • Chengjie Wang

Reconstruction-based approaches have achieved remarkable outcomes in anomaly detection. The exceptional image reconstruction capabilities of recently popular diffusion models have sparked research efforts to utilize them for enhanced reconstruction of anomalous images. Nonetheless, these methods might face challenges related to the preservation of image categories and pixel-wise structural integrity in the more practical multi-class setting. To solve the above problems, we propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection, which consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor. First, the SG network is proposed for reconstructing anomalous regions while preserving the original image's semantic information. Second, we introduce a Spatial-aware Feature Fusion (SFF) block to maximize reconstruction accuracy when dealing with extensively reconstructed areas. Third, the input and reconstructed images are processed by a pre-trained feature extractor to generate anomaly maps based on features extracted at different scales. Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach, which surpasses the state-of-the-art methods, e.g., achieving 96.8/52.6 and 97.2/99.0 (AUROC/AP) for localization and detection respectively on the multi-class MVTec-AD dataset. Code will be available at https://lewandofskee.github.io/projects/diad.

AAAI Conference 2024 Conference Paper

AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model

  • Teng Hu
  • Jiangning Zhang
  • Ran Yi
  • Yuzhen Du
  • Xu Chen
  • Liang Liu
  • Yabiao Wang
  • Chengjie Wang

Anomaly inspection plays an important role in industrial manufacture. Existing anomaly inspection methods are limited in their performance due to insufficient anomaly data. Although anomaly generation methods have been proposed to augment the anomaly data, they either suffer from poor generation authenticity or inaccurate alignment between the generated anomalies and masks. To address the above problems, we propose AnomalyDiffusion, a novel diffusion-based few-shot anomaly generation model, which utilizes the strong prior information of latent diffusion model learned from large-scale dataset to enhance the generation authenticity under few-shot training data. Firstly, we propose Spatial Anomaly Embedding, which consists of a learnable anomaly embedding and a spatial embedding encoded from an anomaly mask, disentangling the anomaly information into anomaly appearance and location information. Moreover, to improve the alignment between the generated anomalies and the anomaly masks, we introduce a novel Adaptive Attention Re-weighting Mechanism. Based on the disparities between the generated anomaly image and normal sample, it dynamically guides the model to focus more on the areas with less noticeable generated anomalies, enabling generation of accurately-matched anomalous image-mask pairs. Extensive experiments demonstrate that our model significantly outperforms the state-of-the-art methods in generation authenticity and diversity, and effectively improves the performance of downstream anomaly inspection tasks. The code and data are available in https://github.com/sjtuplayer/anomalydiffusion.

AAAI Conference 2024 Conference Paper

Efficient Online Crowdsourcing with Complex Annotations

  • Reshef Meir
  • Viet-An Nguyen
  • Xu Chen
  • Jagdish Ramakrishnan
  • Udi Weinsberg

Crowdsourcing platforms use various truth discovery algorithms to aggregate annotations from multiple labelers. In an online setting, however, the main challenge is to decide whether to ask for more annotations for each item to efficiently trade off cost (i.e., the number of annotations) for quality of the aggregated annotations. In this paper, we propose a novel approach for general complex annotation (such as bounding boxes and taxonomy paths), that works in an online crowdsourcing setting. We prove that the expected average similarity of a labeler is linear in their accuracy conditional on the reported label. This enables us to infer reported label accuracy in a broad range of scenarios. We conduct extensive evaluations on real-world crowdsourcing data from Meta and show the effectiveness of our proposed online algorithms in improving the cost-quality trade-off.

IROS Conference 2024 Conference Paper

Learned Slip-Detection-Severity Framework using Tactile Deformation Field Feedback for Robotic Manipulation

  • Neel Jawale
  • Navneet Kaur
  • Amy Santoso
  • Xiaohai Hu
  • Xu Chen

Safely handling objects and avoiding slippage are fundamental challenges in robotic manipulation, yet traditional techniques often oversimplify the issue by treating slippage as a binary occurrence. Our research presents a framework that both identifies slip incidents and measures their severity. We introduce a set of features based on detailed vector field analysis of tactile deformation data captured by the GelSight Mini sensor. Two distinct machine learning models use these features: one focuses on slip detection, and the other evaluates the slip's severity, which is the slipping velocity of the object against the sensor surface. Our slip detection model achieves an average accuracy of 92%, and the slip severity estimation model exhibits a mean absolute error (MAE) of 0.6 cm/s for unseen objects. To demonstrate the synergistic approach of this framework, we employ both models in a tactile feedback-guided vertical sliding task. Leveraging the high accuracy of slip detection, we utilize it as the foundational and corrective model and integrate the slip severity estimation into the feedback control loop to address slips without overcompensating. Videos and demonstrations are available at: https://sites.google.com/uw.edu/lsds

ICML Conference 2024 Conference Paper

Multiply Robust Estimation for Local Distribution Shifts with Multiple Domains

  • Steven Wilkins-Reeves
  • Xu Chen
  • Qi Ma
  • Christine Agarwal
  • Aude Hofleitner

Distribution shifts are ubiquitous in real-world machine learning applications, posing a challenge to the generalization of models trained on one data distribution to another. We focus on scenarios where data distributions vary across multiple segments of the entire population and only make local assumptions about the differences between training and test (deployment) distributions within each segment. We propose a two-stage multiply robust estimation method to improve model performance on each individual segment for tabular data analysis. The method involves fitting a linear combination of the base models, learned using clusters of training data from multiple segments, followed by a refinement step for each segment. Our method is designed to be implemented with commonly used off-the-shelf machine learning models. We establish theoretical guarantees on the generalization bound of the method on the test risk. With extensive experiments on synthetic and real datasets, we demonstrate that the proposed method substantially improves over existing alternatives in prediction accuracy and robustness on both regression and classification tasks. We also assess its effectiveness on a user city prediction dataset from Meta.

NeurIPS Conference 2024 Conference Paper

Reflective Multi-Agent Collaboration based on Large Language Models

  • Xiaohe Bo
  • Zeyu Zhang
  • Quanyu Dai
  • Xueyang Feng
  • Lei Wang
  • Rui Li
  • Xu Chen
  • Ji-Rong Wen

Benefiting from the powerful language expression and planning capabilities of Large Language Models (LLMs), LLM-based autonomous agents have achieved promising performance in various downstream tasks. Recently, based on the development of single-agent systems, researchers propose to construct LLM-based multi-agent systems to tackle more complicated tasks. In this paper, we propose a novel framework, named COPPER, to enhance the collaborative capabilities of LLM-based agents with the self-reflection mechanism. To improve the quality of reflections, we propose to fine-tune a shared reflector, which automatically tunes the prompts of actor models using our counterfactual PPO mechanism. On the one hand, we propose counterfactual rewards to assess the contribution of a single agent’s reflection within the system, alleviating the credit assignment problem. On the other hand, we propose to train a shared reflector, which enables the reflector to generate personalized reflections according to agent roles, while reducing the computational resource requirements and improving training stability. We conduct experiments on three datasets to evaluate the performance of our model in multi-hop question answering, mathematics, and chess scenarios. Experimental results show that COPPER possesses stronger reflection capabilities and exhibits excellent generalization performance across different actor models.

AAAI Conference 2024 Conference Paper

Rethinking Reverse Distillation for Multi-Modal Anomaly Detection

  • Zhihao Gu
  • Jiangning Zhang
  • Liang Liu
  • Xu Chen
  • Jinlong Peng
  • Zhenye Gan
  • Guannan Jiang
  • Annan Shu

In recent years, there has been significant progress in employing color images for anomaly detection in industrial scenarios, but it is insufficient for identifying anomalies that are invisible in RGB images alone. As a supplement, introducing extra modalities such as depth and surface normal maps can be helpful to detect these anomalies. To this end, we present a novel Multi-Modal Reverse Distillation (MMRD) paradigm that consists of a frozen multi-modal teacher encoder to generate distillation targets and a learnable student decoder that aims to restore multi-modal representations from the teacher. Specifically, the teacher extracts complementary visual features from different modalities via a siamese architecture and then fuses this information from multiple levels, in a parameter-free manner, as the targets of distillation. The student learns modality-related priors from the teacher representations of normal training data and performs interaction between them to form multi-modal representations for target reconstruction. Extensive experiments show that our MMRD outperforms recent state-of-the-art methods on both anomaly detection and localization on the MVTec-3D AD and Eyecandies benchmarks. Code will be available upon acceptance.

TIST Journal 2024 Journal Article

Robust Structure-Aware Graph-based Semi-Supervised Learning: Batch and Recursive Processing

  • Xu Chen

Graph-based semi-supervised learning plays an important role in large-scale image classification tasks. However, the problem becomes very challenging in the presence of noisy labels and outliers. Moreover, traditional robust semi-supervised learning solutions suffer from prohibitive computational burdens and thus cannot be computed for streaming data. Motivated by this, we present a novel unified framework for robust structure-aware semi-supervised learning, Unified RSSL (URSSL), for both batch and recursive processing, robust to both outliers and noisy labels. In particular, URSSL iteratively applies joint semi-supervised dimensionality reduction with robust estimators and network sparse regularization on the graph Laplacian matrix to preserve the intrinsic graph structure and ensure robustness to the compound noise. First, to relieve the influence of outliers, a novel semi-supervised robust dimensionality reduction is applied, relying on robust estimators to suppress outliers. Meanwhile, to tackle noisy labels, the denoised graph similarity information is encoded into the network regularization. Moreover, by identifying the strong relevance of dimensionality reduction and network regularization in the context of robust semi-supervised learning (RSSL), a two-step alternating optimization is derived to compute optimal solutions with guaranteed convergence. We further adapt our framework to large-scale semi-supervised learning, making it particularly suitable for large-scale image classification, and demonstrate the model's robustness under different adversarial attacks. For recursive processing, we rely on reparameterization to transform the formulation and unlock the challenging problem of robust streaming-based semi-supervised learning. Last but not least, we extend our solution to distributed settings to resolve the challenging issue of distributed robust semi-supervised learning when images are captured by multiple cameras at different locations. Extensive experimental results demonstrate the promising performance of this framework when applied to multiple benchmark datasets with respect to state-of-the-art approaches for important applications in the areas of image classification and spam data analysis.

ICRA Conference 2024 Conference Paper

STT: Stateful Tracking with Transformers for Autonomous Driving

  • Longlong Jing
  • Ruichi Yu
  • Xu Chen
  • Zhengli Zhao
  • Shiwei Sheng
  • Colin Graber
  • Qi Chen
  • Qinru Li

Tracking objects in three-dimensional space is critical for autonomous driving. To ensure safety while driving, the tracker must be able to reliably track objects across frames and accurately estimate their current states, such as velocity and acceleration. Existing works frequently focus on the association task while either neglecting the model's performance on state estimation or deploying complex heuristics to predict the states. In this paper, we propose STT, a Stateful Tracking model built with Transformers, that can consistently track objects in the scenes while also predicting their states accurately. STT consumes rich appearance, geometry, and motion signals through a long-term history of detections and is jointly optimized for both data association and state estimation tasks. Since the standard tracking metrics like MOTA and MOTP do not capture the combined performance of the two tasks in the wider spectrum of object states, we extend them with new metrics called S-MOTA and MOTPS that address this limitation. STT achieves competitive real-time performance on the Waymo Open Dataset.

AAAI Conference 2024 Conference Paper

Text-to-Image Generation for Abstract Concepts

  • Jiayi Liao
  • Xu Chen
  • Qiang Fu
  • Lun Du
  • Xiangnan He
  • Xiang Wang
  • Shi Han
  • Dongmei Zhang

Recent years have witnessed substantial progress of large-scale models across various domains, such as natural language processing and computer vision, facilitating the expression of concrete concepts. Unlike concrete concepts that are usually directly associated with physical objects, expressing abstract concepts through natural language requires considerable effort since they are characterized by intricate semantics and connotations. An alternative approach is to leverage images to convey rich visual information as a supplement. Nevertheless, existing Text-to-Image (T2I) models are primarily trained on concrete physical objects and often struggle to visualize abstract concepts. Inspired by the three-layer artwork theory, which identifies three critical factors in artistic creation (intent, object, and form), we propose a framework of Text-to-Image generation for Abstract Concepts (TIAC). The abstract concept is clarified into a clear intent with a detailed definition to avoid ambiguity. LLMs then transform it into semantic-related physical objects, and the concept-dependent form is retrieved from an LLM-extracted form pattern set. Information from these three aspects is integrated to generate prompts for T2I models via an LLM. Evaluation results from human assessments and our newly designed metric, concept score, demonstrate the effectiveness of our framework in creating images that can sufficiently express abstract concepts.

AAAI Conference 2024 Conference Paper

Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries

  • Xinyi He
  • Mengyu Zhou
  • Xinrun Xu
  • Xiaojun Ma
  • Rui Ding
  • Lun Du
  • Yan Gao
  • Ran Jia

Tabular data analysis is crucial in various fields, and large language models show promise in this area. However, current research mostly focuses on rudimentary tasks like Text2SQL and TableQA, neglecting advanced analysis like forecasting and chart generation. To address this gap, we developed the Text2Analysis benchmark, incorporating advanced analysis tasks that go beyond SQL-compatible operations and require more in-depth analysis. We also develop five innovative and effective annotation methods, harnessing the capabilities of large language models to enhance data quality and quantity. Additionally, we include unclear queries that resemble real-world user questions to test how well models can understand and tackle such challenges. Finally, we collect 2249 query-result pairs with 347 tables. We evaluate five state-of-the-art models using three different metrics, and the results show that our benchmark presents considerable challenges in the field of tabular data analysis, paving the way for more advanced research opportunities.

AAAI Conference 2024 Conference Paper

Would You Like Your Data to Be Trained? A User Controllable Recommendation Framework

  • Lei Wang
  • Xu Chen
  • Zhenhua Dong
  • Quanyu Dai

Recommender systems have a significant impact on various real-world applications, shaping people's daily lives and enhancing productivity. Traditional recommender models aim to collect extensive user information to accurately estimate user preferences. However, in practical scenarios, users may not want all their behaviors to be included in the model training process. This paper introduces a novel recommendation paradigm that allows users to indicate their ``willingness'' regarding which data should contribute to model training. The models are then optimized to maximize utility, which considers the trade-off between recommendation performance and respecting user preferences. The recommendation problem is formulated as a multiplayer game, with each user acting as a player and using a selection vector to indicate their willingness to include specific interacted items in training. To efficiently solve this game, an influence function-based model is proposed to approximate recommendation performances for different actions without re-optimizing the model. Furthermore, an enhanced model leveraging multiple anchor actions for the influence function is introduced to improve performance approximation accuracy. The convergence rate of the algorithm is theoretically analyzed, and the advantages of incorporating multiple anchor actions are demonstrated. Extensive experiments on both simulated and real-world datasets validate the effectiveness of the proposed models in balancing recommendation quality and user willingness. To promote this research direction, we have released our project at https://paitesanshi.github.io/IFRQE/.

AAMAS Conference 2023 Conference Paper

A Hybrid Framework of Reinforcement Learning and Physics-Informed Deep Learning for Spatiotemporal Mean Field Games

  • Xu Chen
  • Shuo Liu
  • Xuan Di

Mean field games (MFG) are developed to solve equilibria in multiagent systems (MAS) with many agents. The majority of the literature on MFGs is focused on finite states and actions. In many engineering applications such as autonomous driving, however, each agent (e.g., an autonomous vehicle) makes a continuous-time-space (or spatiotemporal dynamic) decision to optimize a nonlinear cumulative reward. In this paper, we focus on a class of generic MFGs with continuous states and actions defined over a spatiotemporal domain for a finite horizon, named "spatiotemporal MFG (ST-MFG)." The mean field equilibria (MFE) for such games are challenging to solve with numerical methods at a satisfactory resolution in time and space, while it is critical to deploy smooth dynamic control in autonomous driving. Thus, we propose two methods: one is a joint reinforcement learning (RL) and physics-informed deep learning framework, which iteratively solves agents' optimal policies using RL and propagates population density using physics-informed deep learning (PIDL). The other is a pure PIDL framework that updates agents' states and population density altogether using deep neural networks. Both proposed methods are mesh-free (i.e., not restricted by mesh granularity) and have been shown to be efficient in learning equilibria in autonomous driving MFGs. The PIDL method alone is faster to train than the integrated RL-PIDL method when the environment dynamics are known.

NeurIPS Conference 2023 Conference Paper

Bayesian Active Causal Discovery with Multi-Fidelity Experiments

  • Zeyu Zhang
  • Chaozhuo Li
  • Xu Chen
  • Xing Xie

This paper studies the problem of active causal discovery when experiments can be conducted with multi-fidelity oracles, where higher-fidelity experiments are more precise and expensive, while lower ones are cheaper but less accurate. We formally define the task of multi-fidelity active causal discovery and design a probabilistic model for solving this problem. Specifically, we first introduce a mutual-information-based acquisition function to determine which variable should be intervened on at which fidelity, and then propose a cascading model to capture the correlations between different fidelity oracles. Beyond this basic framework, we also extend it to the batch intervention scenario. We find that the theoretical foundations behind the widely used and efficient greedy method do not hold in our problem. To solve this, we introduce a new concept called ε-submodularity and design a constraint-based fidelity model to theoretically validate the greedy method. We conduct extensive experiments to demonstrate the effectiveness of our model.

TMLR Journal 2023 Journal Article

Contrastive Attraction and Contrastive Repulsion for Representation Learning

  • Huangjie Zheng
  • Xu Chen
  • Jiangchao Yao
  • Hongxia Yang
  • Chunyuan Li
  • Ya Zhang
  • Hao Zhang
  • Ivor Tsang

Contrastive learning (CL) methods effectively learn data representations in a self-supervised manner, where the encoder contrasts each positive sample against multiple negative samples via a one-vs-many softmax cross-entropy loss. By leveraging large amounts of unlabeled image data, recent CL methods have achieved promising results when pretrained on large-scale datasets, such as ImageNet. However, most of them consider augmented views of the same instance to be positive pairs and views from other instances to be negative ones. Such a binary partition insufficiently considers the relation between samples and tends to yield worse performance when generalized to images in the wild. In this paper, to further improve the performance of CL and enhance its robustness on various datasets, we propose a doubly CL strategy that contrasts positive samples and negative ones within themselves separately. We realize this strategy with contrastive attraction and contrastive repulsion (CACR), which makes the query not only exert a greater force to attract more distant positive samples but also do so to repel closer negative samples. Theoretical analysis reveals that CACR generalizes CL's behavior of positive attraction and negative repulsion. It further considers the intra-contrastive relation within the positive and negative pairs to narrow the gap between the sampled and true distributions, which is important when datasets are less curated. Extensive large-scale experiments on standard vision tasks show that CACR not only consistently outperforms existing CL methods on benchmark datasets, but also shows better robustness when generalized to imbalanced image datasets.
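For reference, the one-vs-many softmax cross-entropy mentioned at the start of this abstract is the standard InfoNCE-style contrastive loss. The sketch below is a generic, dependency-free Python version under the usual assumptions (unit-normalized embeddings, one positive per query, a fixed temperature); it is the baseline that CACR generalizes, not the CACR objective itself.

```python
# Hedged sketch of the standard one-vs-many softmax cross-entropy (InfoNCE)
# contrastive loss: the query should score high against its positive and low
# against all negatives. Embeddings are plain lists of floats, assumed
# unit-normalized; temperature=0.1 is a common default, not a paper value.
import math

def info_nce(query, positive, negatives, temperature=0.1):
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    # Logit 0 is the positive pair; the rest are the negatives.
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp, then cross-entropy with target index 0.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]
```

The loss is near zero when the positive is far more similar to the query than any negative, and grows as negatives become competitive, which is the "binary partition" behavior the abstract argues is insufficient on less curated data.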

NeurIPS Conference 2023 Conference Paper

Offline Imitation Learning with Variational Counterfactual Reasoning

  • Zexu Sun
  • Bowei He
  • Jinxin Liu
  • Xu Chen
  • Chen Ma
  • Shuai Zhang

In offline imitation learning (IL), an agent aims to learn an optimal expert behavior policy without additional online environment interactions. However, in many real-world scenarios, such as robotics manipulation, the offline dataset is collected from suboptimal behaviors without rewards. Due to the scarce expert data, agents usually suffer from simply memorizing poor trajectories and are vulnerable to variations in the environment, lacking the capability to generalize to new environments. To automatically generate high-quality expert data and improve the generalization ability of the agent, we propose a framework named Offline Imitation Learning with Counterfactual data Augmentation (OILCA) based on counterfactual inference. In particular, we leverage an identifiable variational autoencoder to generate counterfactual samples for expert data augmentation. We theoretically analyze the influence of the generated expert data and the improvement in generalization. Moreover, we conduct extensive experiments to demonstrate that our approach significantly outperforms various baselines on both the DeepMind Control Suite benchmark for in-distribution performance and the CausalWorld benchmark for out-of-distribution generalization.

NeurIPS Conference 2023 Conference Paper

REASONER: An Explainable Recommendation Dataset with Comprehensive Labeling Ground Truths

  • Xu Chen
  • Jingsen Zhang
  • Lei Wang
  • Quanyu Dai
  • Zhenhua Dong
  • Ruiming Tang
  • Rui Zhang
  • Li Chen

Explainable recommendation has attracted much attention from the industry and academic communities. It has shown great potential to improve recommendation persuasiveness, informativeness and user satisfaction. In the past few years, while a lot of promising explainable recommender models have been proposed, the datasets used to evaluate them still suffer from several limitations; for example, the explanation ground truths are not labeled by real users, and the explanations are mostly single-modal and around only one aspect. To bridge these gaps, in this paper, we build a new explainable recommendation dataset, which, to our knowledge, is the first contribution that provides a large amount of real-user-labeled multi-modal and multi-aspect explanation ground truths. Specifically, we first develop a video recommendation platform, where a series of questions around recommendation explainability are carefully designed. Then, we recruit about 3000 high-quality labelers with different backgrounds to use the system, and collect their behaviors and feedback to our questions. In this paper, we detail the construction process of our dataset and also provide extensive analysis of its characteristics. In addition, we develop a library, where ten well-known explainable recommender models are implemented in a unified framework. Based on this library, we build several benchmarks for different explainable recommendation tasks. Finally, we present many new opportunities brought by our dataset, which are expected to promote the field of explainable recommendation. Our dataset, library and the related documents have been released at https://reasoner2023.github.io/.

ICRA Conference 2022 Conference Paper

Depth Estimation Matters Most: Improving Per-Object Depth Estimation for Monocular 3D Detection and Tracking

  • Longlong Jing
  • Ruichi Yu
  • Henrik Kretzschmar
  • Kang Li
  • Charles R. Qi
  • Hang Zhao 0021
  • Alper Ayvaci
  • Xu Chen

Monocular image-based 3D perception has become an active research area in recent years owing to its applications in autonomous driving. Approaches to monocular 3D perception including detection and tracking, however, often yield inferior performance when compared to LiDAR-based techniques. Through systematic analysis, we identified that per-object depth estimation accuracy is a major factor bounding the performance. Motivated by this observation, we propose a multi-level fusion method that combines different representations (RGB and pseudo-LiDAR) and temporal information across multiple frames for objects (tracklets) to enhance per-object depth estimation. Our proposed fusion method achieves the state-of-the-art performance of per-object depth estimation on the Waymo Open Dataset, the KITTI detection dataset, and the KITTI MOT dataset. We further demonstrate that by simply replacing estimated depth with fusion-enhanced depth, we can achieve significant improvements in monocular 3D perception tasks, including detection and tracking.

IJCAI Conference 2022 Conference Paper

FastRE: Towards Fast Relation Extraction with Convolutional Encoder and Improved Cascade Binary Tagging Framework

  • Guozheng Li
  • Xu Chen
  • Peng Wang
  • Jiafeng Xie
  • Qiqing Luo

Recent work on extracting relations from texts has achieved excellent performance. However, most existing methods pay less attention to efficiency, making it still challenging to quickly extract relations from massive or streaming text data in realistic scenarios. The main efficiency bottleneck is that these methods use a Transformer-based pre-trained language model for encoding, which heavily affects training and inference speed. To address this issue, we propose a fast relation extraction model (FastRE) based on a convolutional encoder and an improved cascade binary tagging framework. Compared to previous work, FastRE employs several innovations to improve efficiency while keeping promising performance. Concretely, FastRE adopts a novel convolutional encoder architecture combining dilated convolution, gated units and residual connections, which significantly reduces the computation cost of training and inference while maintaining satisfactory performance. Moreover, to improve the cascade binary tagging framework, FastRE first introduces a type-relation mapping mechanism to accelerate tagging efficiency and alleviate relation redundancy, and then utilizes a position-dependent adaptive thresholding strategy to obtain higher tagging accuracy and better model generalization. Experimental results demonstrate that FastRE is well balanced between efficiency and performance, achieving 3-10× faster training, 7-15× faster inference, and 1/100 the parameters compared to state-of-the-art models, while the performance is still competitive. Our code is available at https://github.com/seukgcode/FastRE.

AAAI Conference 2022 Conference Paper

Learning to Identify Top Elo Ratings: A Dueling Bandits Approach

  • Xue Yan
  • Yali Du
  • Binxin Ru
  • Jun Wang
  • Haifeng Zhang
  • Xu Chen

The Elo rating system is widely adopted to evaluate the skills of (chess) game and sports players. Recently, it has also been integrated into machine learning algorithms for evaluating the performance of computerised AI agents. However, an accurate estimation of the Elo rating (for the top players) often requires many rounds of competition, which can be expensive to carry out. In this paper, to improve the sample efficiency of Elo evaluation (for top players), we propose an efficient online match scheduling algorithm. Specifically, we identify and match the top players through a dueling bandits framework and tailor the bandit algorithm to the gradient-based update of Elo. We show that it reduces the per-step memory and time complexity to constant, compared to traditional likelihood maximization approaches requiring O(t) time. Our algorithm has a regret guarantee of Õ(√T), sublinear in the number of competition rounds, and has been extended to multidimensional Elo ratings for handling intransitive games. We empirically demonstrate that our method achieves superior convergence speed and time efficiency on a variety of gaming tasks.
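The gradient-based Elo update that the scheduling algorithm builds on can be sketched as follows. This is the standard online Elo rule with the conventional K-factor and 400-point logistic scale, stated here as background rather than as the paper's bandit algorithm.

```python
# Hedged sketch: the classical online (gradient-style) Elo update. The
# K-factor of 32 and the 400-point scale are the usual chess conventions,
# assumed here for illustration, not values taken from the paper.
def expected_score(r_a, r_b):
    # Probability that player A beats player B under the Elo logistic model.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=32.0):
    # One online step: move each rating toward the observed outcome.
    # score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for a loss.
    e_a = expected_score(r_a, r_b)
    r_a_new = r_a + k * (score_a - e_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return r_a_new, r_b_new
```

Because each match touches only the two players' ratings, the per-step memory and time cost is constant, which is the property the abstract contrasts with O(t) likelihood-maximization approaches.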

NeurIPS Conference 2022 Conference Paper

Neuron with Steady Response Leads to Better Generalization

  • Qiang Fu
  • Lun Du
  • Haitao Mao
  • Xu Chen
  • Wei Fang
  • Shi Han
  • Dongmei Zhang

Regularization can mitigate the generalization gap between training and inference by introducing inductive bias. Existing works have already proposed various inductive biases from diverse perspectives. However, none of them explores inductive bias from the perspective of class-dependent response distribution of individual neurons. In this paper, we conduct a substantial analysis of the characteristics of such distribution. Based on the analysis results, we articulate the Neuron Steadiness Hypothesis: the neuron with similar responses to instances of the same class leads to better generalization. Accordingly, we propose a new regularization method called Neuron Steadiness Regularization (NSR) to reduce neuron intra-class response variance. Based on the Complexity Measure, we theoretically guarantee the effectiveness of NSR for improving generalization. We conduct extensive experiments on Multilayer Perceptron, Convolutional Neural Networks, and Graph Neural Networks with popular benchmark datasets of diverse domains, which show that our Neuron Steadiness Regularization consistently outperforms the vanilla version of models with significant gain and low additional computational overhead.

KER Journal 2021 Journal Article

A comprehensive overview of RDF for spatial and spatiotemporal data management

  • Fu Zhang
  • Qingzhe Lu
  • Zhenjun Du
  • Xu Chen
  • Chunhong Cao

Currently, a large amount of spatial and spatiotemporal RDF data is shared and exchanged on the Internet and in various applications. The Resource Description Framework (RDF) is widely accepted for representing and processing data in different (including spatiotemporal) application domains. The effective management of spatial and spatiotemporal RDF data is becoming more and more important. A lot of work has been done to study how to represent, query, store, and manage spatial and spatiotemporal RDF data. To capture the main ideas and research results on spatial and spatiotemporal RDF data, in this paper, we provide a comprehensive overview of RDF for spatial and spatiotemporal data management. We summarize spatial and spatiotemporal RDF data management along several essential aspects such as representation, querying, storage, performance assessment, datasets, and management tools. In addition, directions for future research, along with comparisons and analysis, are discussed in depth.

JBHI Journal 2021 Journal Article

Estimating Reference Bony Shape Models for Orthognathic Surgical Planning Using 3D Point-Cloud Deep Learning

  • Deqiang Xiao
  • Chunfeng Lian
  • Hannah Deng
  • Tianshu Kuang
  • Qin Liu
  • Lei Ma
  • Daeseung Kim
  • Yankun Lang

Orthognathic surgical outcomes rely heavily on the quality of surgical planning. Automatic estimation of a reference facial bone shape significantly reduces experience-dependent variability and improves planning accuracy and efficiency. We propose an end-to-end deep learning framework to estimate patient-specific reference bony shape models for patients with orthognathic deformities. Specifically, we apply a point-cloud network to learn a vertex-wise deformation field from a patient's deformed bony shape, represented as a point cloud. The estimated deformation field is then used to correct the deformed bony shape to output a patient-specific reference bony surface model. To train our network effectively, we introduce a simulation strategy to synthesize deformed bones from any given normal bone, producing a relatively large and diverse dataset of shapes for training. Our method was evaluated using both synthetic and real patient data. Experimental results show that our framework estimates realistic reference bony shape models for patients with varying deformities. The performance of our method is consistently better than an existing method and several deep point-cloud networks. Our end-to-end estimation framework based on geometric deep learning shows great potential for improving clinical workflows.

IJCAI Conference 2021 Conference Paper

HifiFace: 3D Shape and Semantic Prior Guided High Fidelity Face Swapping

  • Yuhan Wang
  • Xu Chen
  • Junwei Zhu
  • Wenqing Chu
  • Ying Tai
  • Chengjie Wang
  • Jilin Li
  • Yongjian Wu

In this work, we propose a high fidelity face swapping method, called HifiFace, which can well preserve the face shape of the source face and generate photo-realistic results. Unlike other existing face swapping works that only use a face recognition model to keep the identity similarity, we propose 3D shape-aware identity to control the face shape with geometric supervision from 3DMM and a 3D face reconstruction method. Meanwhile, we introduce the Semantic Facial Fusion module to optimize the combination of encoder and decoder features and perform adaptive blending, which makes the results more photo-realistic. Extensive experiments on faces in the wild demonstrate that our method can preserve identity better, especially the face shape, and can generate more photo-realistic results than previous state-of-the-art methods. Code is available at: https://johann.wang/HifiFace

AAMAS Conference 2021 Conference Paper

Learning Correlated Communication Topology in Multi-Agent Reinforcement learning

  • Yali Du
  • Bo Liu
  • Vincent Moens
  • Ziqi Liu
  • Zhicheng Ren
  • Jun Wang
  • Xu Chen
  • Haifeng Zhang

Communication improves the efficiency and convergence of multiagent learning. Existing studies of agent communication have been limited to predefined fixed connections. While an attention mechanism is useful for scheduling the communication between agents, it largely ignores the dynamical nature of communication and thus the correlation between agents' connections. In this work, we adopt a normalizing flow to encode the correlation between agents' interactions. The dynamical communication topology is directly learned by maximizing the agent rewards. In our end-to-end formulation, the communication structure is learned by treating it as a hidden dynamical variable. We realize centralized training of critics and the graph reasoning policy, and decentralized execution from local observations and messages received through the learned dynamical communication topology. Experiments on cooperative navigation in the particle world and adaptive traffic control tasks demonstrate the effectiveness of our method.

IJCAI Conference 2021 Conference Paper

TrafficStream: A Streaming Traffic Flow Forecasting Framework Based on Graph Neural Networks and Continual Learning

  • Xu Chen
  • Junshan Wang
  • Kunqing Xie

With the rapid growth of deployed traffic sensors, a massive amount of traffic flow data is collected, revealing the long-term evolution of traffic flows and the gradual expansion of traffic networks. Accurately forecasting these traffic flows has attracted the attention of researchers, as it is of great significance for improving the efficiency of transportation systems. However, existing methods mainly focus on the spatial-temporal correlation of static networks, leaving the problem of efficiently learning models on networks with expansion and evolving patterns less studied. To tackle this problem, we propose a Streaming Traffic Flow Forecasting Framework, TrafficStream, based on Graph Neural Networks (GNNs) and Continual Learning (CL), achieving accurate predictions and high efficiency. Firstly, we design a traffic pattern fusion method, cleverly integrating new patterns that emerge during the long-term period into the model. A JS-divergence-based algorithm is proposed to mine new traffic patterns. Secondly, we introduce CL to consolidate the knowledge learned previously and transfer it to the current model. Specifically, we adopt two strategies: historical data replay and parameter smoothing. We construct a streaming traffic dataset to verify the efficiency and effectiveness of our model. Extensive experiments demonstrate its excellent potential to extract traffic patterns with high efficiency in long-term streaming network scenarios. The source code is available at https://github.com/AprLie/TrafficStream.
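The JS-divergence score underlying the pattern-mining step above can be illustrated generically. The sketch below computes the Jensen-Shannon divergence between two discretized flow distributions; how the distributions are binned and thresholded to flag a "new pattern" is an assumption for illustration, not the paper's exact procedure.

```python
# Hedged sketch: Jensen-Shannon divergence between two probability vectors,
# e.g. histograms of a node's traffic flow in two time periods. A drifted
# node would show a large JS score; the threshold is application-specific.
import math

def kl(p, q):
    # Kullback-Leibler divergence; terms with p_i == 0 contribute zero.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    # Symmetric, bounded by log(2); p and q must sum to 1 over the same bins.
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Unlike raw KL, the JS divergence is symmetric and finite even when one distribution assigns zero mass to a bin, which makes it convenient for comparing empirical histograms.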

ICML Conference 2021 Conference Paper

Unified Robust Semi-Supervised Variational Autoencoder

  • Xu Chen

In this paper, we propose a novel noise-robust semi-supervised deep generative model that jointly tackles noisy labels and outliers in a unified robust semi-supervised variational autoencoder (URSVAE). The uncertainty of input data is characterized by placing an uncertainty prior on the parameters of the probability density distributions in order to ensure the robustness of the variational encoder towards outliers. Subsequently, a noise transition model is integrated naturally into our model to alleviate the detrimental effects of noisy labels. Moreover, a robust divergence measure is employed to further enhance the robustness, where a novel variational lower bound is derived and optimized to infer the network parameters. By proving that the influence function of the proposed evidence lower bound is bounded, we demonstrate the enormous potential of the proposed model for classification in the presence of compound noise. The experimental results highlight the superiority of the proposed framework through evaluation on image classification tasks and comparison with state-of-the-art approaches.

AAAI Conference 2020 Conference Paper

AutoDAL: Distributed Active Learning with Automatic Hyperparameter Selection

  • Xu Chen
  • Brett Wujek

Automated machine learning (AutoML) strives to establish an appropriate machine learning model for any dataset automatically with minimal human intervention. Although extensive research has been conducted on AutoML, most of it has focused on supervised learning. Research on automated semi-supervised learning and active learning algorithms is still limited. Implementation becomes more challenging when the algorithm is designed for a distributed computing environment. With this as motivation, we propose a novel automated learning system for distributed active learning (AutoDAL) to address these challenges. First, automated graph-based semi-supervised learning is conducted by aggregating the proposed cost functions from different compute nodes in a distributed manner. Subsequently, automated active learning is addressed by jointly optimizing hyperparameters in both the classification and query selection stages, leveraging graph loss minimization and entropy regularization. Moreover, we propose an efficient distributed active learning algorithm which is scalable for big data by first partitioning the unlabeled data and replicating the labeled data to different worker nodes in the classification stage, and then aggregating the data in the controller in the query selection stage. The proposed AutoDAL algorithm is applied to multiple benchmark datasets and a real-world electrocardiogram (ECG) dataset for classification. We demonstrate that the proposed AutoDAL algorithm is capable of achieving significantly better performance compared to several state-of-the-art AutoML approaches and active learning algorithms.

AAAI Conference 2019 Conference Paper

Dynamic Explainable Recommendation Based on Neural Attentive Models

  • Xu Chen
  • Yongfeng Zhang
  • Zheng Qin

Providing explanations in a recommender system is getting more and more attention in both industry and research communities. Most existing explainable recommender models regard user preferences as invariant and generate static explanations. However, in real scenarios, a user's preference is always dynamic, and she may be interested in different product features at different states. The mismatch between the explanation and the user's preference may degrade customers' satisfaction, confidence and trust in the recommender system. To fill this gap, in this paper, we build a novel Dynamic Explainable Recommender (called DER) for more accurate user modeling and explanations. Specifically, we design a time-aware gated recurrent unit (GRU) to model user dynamic preferences, and profile an item by its review information based on a sentence-level convolutional neural network (CNN). By attentively learning the important review information according to the user's current state, we are not only able to improve the recommendation performance, but can also provide explanations tailored to the user's current preferences. We conduct extensive experiments to demonstrate the superiority of our model for improving recommendation performance. And to evaluate the explainability of our model, we first present examples to provide intuitive analysis of the highlighted review information, and then conduct crowd-sourcing-based evaluations to quantitatively verify our model's superiority.

IROS Conference 2018 Conference Paper

StreetMap - Mapping and Localization on Ground Planes using a Downward Facing Camera

  • Xu Chen
  • Anurag Sai Vempati
  • Paul A. Beardsley

This paper describes a system to map a ground-plane, and to subsequently use the map for localization of a mobile robot. The robot has a downward-facing camera, and works on a variety of ground textures including general texture like tarmac, man-made designs like carpet, and rectilinear textures like indoor tiles or outdoor slabs. Such textures provide a basis for measuring relative motion (i.e., computer mouse functionality). But the goal here is the more challenging one of absolute localization. The paper describes a complete working pipeline to build a globally consistent map of a given ground-plane and subsequently to localize within this map in real time. Two algorithms are described. The first is a feature-based approach which is general to any ground-plane texture. The second algorithm takes advantage of the extra constraints available for common rectilinear textures like indoor tiling, paving slabs, and laid brickwork. Quantitative and qualitative experimental results are shown for mapping and localization on a variety of ground-planes.

NeurIPS Conference 2014 Conference Paper

Unsupervised Deep Haar Scattering on Graphs

  • Xu Chen
  • Xiuyuan Cheng
  • Stephane Mallat

The classification of high-dimensional data defined on graphs is particularly difficult when the graph geometry is unknown. We introduce a Haar scattering transform on graphs, which computes invariant signal descriptors. It is implemented with a deep cascade of additions, subtractions and absolute values, which iteratively compute orthogonal Haar wavelet transforms. Multiscale neighborhoods of unknown graphs are estimated by minimizing an average total variation, with a pair matching algorithm of polynomial complexity. Supervised classification with dimension reduction is tested on databases of scrambled images, and for signals sampled on unknown irregular grids on a sphere.
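One layer of the Haar scattering cascade described above (additions, subtractions and absolute values over paired graph nodes) can be sketched as follows, assuming the node pairing is already known; the paper's contribution includes learning that pairing by average total-variation minimization, which is not shown here.

```python
# Hedged sketch: one Haar scattering layer on a graph signal with a given
# pairing of nodes. Each pair (i, j) emits the Haar sum x[i] + x[j] (low-pass)
# and the rectified Haar difference |x[i] - x[j]| (nonlinear high-pass);
# stacking such layers yields the deep cascade the abstract describes.
def haar_scattering_layer(x, pairs):
    # x: list of node values; pairs: list of (i, j) index pairs covering x.
    out = []
    for i, j in pairs:
        out.append(x[i] + x[j])        # addition: Haar low-pass coefficient
        out.append(abs(x[i] - x[j]))   # subtraction + absolute value
    return out
```

Because every operation is an addition, a subtraction or an absolute value, the transform is cheap to compute, and the absolute value makes the descriptors invariant to the sign of local differences.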