Arrow Research search

Author name cluster

Wei Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

48 papers
2 author rows

Possible papers


AAAI Conference 2026 Conference Paper

Kronos: A Foundation Model for the Language of Financial Markets

  • Yu Shi
  • Zongliang Fu
  • Shuo Chen
  • Bohan Zhao
  • Wei Xu
  • Changshui Zhang
  • Jian Li

The success of the large-scale pre-training paradigm, exemplified by Large Language Models (LLMs), has inspired the development of Time Series Foundation Models (TSFMs). However, their application to financial candlestick (K-line) data remains limited, and they often underperform non-pre-trained architectures. Moreover, existing TSFMs often overlook crucial downstream tasks such as volatility prediction and synthetic data generation. To address these limitations, we propose Kronos, a unified, scalable pre-training framework tailored to financial K-line modeling. Kronos introduces a specialized tokenizer that discretizes continuous market information into token sequences, preserving both price dynamics and trade activity patterns. We pre-train Kronos using an autoregressive objective on a massive, multi-market corpus of over 12 billion K-line records from 45 global exchanges, enabling it to learn nuanced temporal and cross-asset representations. Kronos excels in a zero-shot setting across a diverse set of financial tasks. On benchmark datasets, Kronos improves price series forecasting RankIC by 93% over the leading TSFM and by 87% over the best non-pre-trained baseline. It also achieves a 9% lower MAE in volatility forecasting and a 22% improvement in generative fidelity for synthetic K-line sequences. These results establish Kronos as a robust, versatile foundation model for end-to-end financial time series analysis.
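The abstract says Kronos discretizes continuous market data into tokens but does not describe the scheme. As a hypothetical illustration only (not Kronos's tokenizer), the sketch below maps close-to-close log-returns to token ids by quantile binning; the names `fit_bin_edges`, `log_returns`, and `tokenize_closes` are invented for this example.

```python
import bisect
import math
from typing import List, Sequence


def fit_bin_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Return n_bins - 1 interior edges taken at evenly spaced quantiles of `values`."""
    s = sorted(values)
    return [s[min(len(s) - 1, k * len(s) // n_bins)] for k in range(1, n_bins)]


def log_returns(closes: Sequence[float]) -> List[float]:
    """Close-to-close log-returns; one fewer value than `closes`."""
    return [math.log(cur / prev) for prev, cur in zip(closes, closes[1:])]


def tokenize_closes(closes: Sequence[float], edges: List[float]) -> List[int]:
    """Map each log-return to a token id in [0, len(edges)] (hypothetical scheme)."""
    return [bisect.bisect_right(edges, r) for r in log_returns(closes)]


# Usage: fit the edges on a training window, then tokenize new bars with them.
train = [100, 101, 100, 102, 99, 100, 103, 101]
edges = fit_bin_edges(log_returns(train), n_bins=4)
tokens = tokenize_closes([101, 102, 100, 100], edges)
```

A tokenizer closer to the abstract would also need to encode the open, high, low, and volume fields to preserve the trade-activity patterns it mentions.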

TIST Journal 2026 Journal Article

Toward Reliable Detection of LLM-Generated Texts: A Comprehensive Evaluation Framework with CUDRT

  • Zhen Tao
  • YanFang Chen
  • Dinghao Xi
  • Zhiyu Li
  • Wei Xu

The increasing prevalence of large language models (LLMs) has significantly advanced text generation, but the human-like quality of LLM outputs presents major challenges in reliably distinguishing between human-authored and LLM-generated texts. Existing detection benchmarks are constrained by their reliance on static datasets, scenario-specific tasks (e.g., question answering and text refinement), and a primary focus on English, overlooking the diverse linguistic and operational subtleties of LLMs. To address these gaps, we propose CUDRT, a comprehensive evaluation framework and bilingual benchmark in Chinese and English, categorizing LLM activities into five key operations: Create, Update, Delete, Rewrite, and Translate. CUDRT provides extensive datasets tailored to each operation, featuring outputs from state-of-the-art LLMs to assess the reliability of LLM-generated text detectors. This framework supports scalable, reproducible experiments and enables in-depth analysis of how operational diversity, bilingual training sets, and LLM architectures influence detection performance. Our extensive experiments demonstrate the framework’s capacity to optimize detection systems and provide practical guidance for training model-based detectors, revealing that training on specific operations and outputs from certain LLMs significantly improves model-based detector generalization. By advancing robust methodologies for identifying LLM-generated texts, this work contributes to the development of intelligent systems capable of meeting real-world bilingual detection challenges. Source code and dataset are available at GitHub.

NeurIPS Conference 2025 Conference Paper

A Generalized Bisimulation Metric of State Similarity between Markov Decision Processes: From Theoretical Propositions to Applications

  • Zhenyu Tao
  • Wei Xu
  • Xiaohu You

The bisimulation metric (BSM) is a powerful tool for computing state similarities within a Markov decision process (MDP), revealing that states closer in BSM have more similar optimal value functions. While BSM has been successfully utilized in reinforcement learning (RL) for tasks like state representation learning and policy exploration, its application to multiple-MDP scenarios, such as policy transfer, remains challenging. Prior work has attempted to generalize BSM to pairs of MDPs, but a lack of rigorous analysis of its mathematical properties has limited further theoretical progress. In this work, we formally establish a generalized bisimulation metric (GBSM) between pairs of MDPs and rigorously prove that it satisfies three fundamental properties: GBSM symmetry, inter-MDP triangle inequality, and the distance bound on identical states. Leveraging these properties, we theoretically analyse policy transfer, state aggregation, and sampling-based estimation in MDPs, obtaining explicit bounds that are strictly tighter than those derived from the standard BSM. Additionally, GBSM provides a closed-form sample complexity for estimation, improving upon existing asymptotic results based on BSM. Numerical results validate our theoretical findings and demonstrate the effectiveness of GBSM in multi-MDP scenarios.
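For intuition about what a state-similarity metric between two MDPs computes, here is a minimal sketch of the standard bisimulation-metric recursion applied across two MDPs. It assumes deterministic transitions and a shared action set, so the Wasserstein term between next-state distributions collapses to the distance between the two successor states. This is not the paper's GBSM definition, whose exact form the abstract does not give; `cross_bisim` is a hypothetical name.

```python
from typing import List, Tuple

# Deterministic finite MDP: (rewards[s][a], next_state[s][a]); both MDPs share actions.
MDP = Tuple[List[List[float]], List[List[int]]]


def cross_bisim(m1: MDP, m2: MDP, gamma: float = 0.9, iters: int = 200) -> List[List[float]]:
    """Fixed-point iteration of d(s, t) = max_a |r1(s,a) - r2(t,a)| + gamma * d(s', t')."""
    r1, p1 = m1
    r2, p2 = m2
    n_actions = len(r1[0])
    d = [[0.0] * len(r2) for _ in r1]
    for _ in range(iters):
        # Build the new table entirely from the previous one (synchronous update).
        d = [[max(abs(r1[s][a] - r2[t][a]) + gamma * d[p1[s][a]][p2[t][a]]
                  for a in range(n_actions))
              for t in range(len(r2))]
             for s in range(len(r1))]
    return d


# Usage: two MDPs with two self-looping states and a single action.
mdp_a = ([[1.0], [0.0]], [[0], [1]])
mdp_b = ([[0.5], [0.0]], [[0], [1]])
d_ab = cross_bisim(mdp_a, mdp_b)  # d_ab[0][0] approaches 0.5 / (1 - 0.9) = 5
```

With gamma < 1 the update is a contraction in the max norm, so the iteration converges; `iters` only needs to be large enough for the desired tolerance.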

AAAI Conference 2025 Conference Paper

CROSSNEWS: A Cross-Genre Authorship Verification and Attribution Benchmark

  • Marcus Ma
  • Duong Minh Le
  • Junmo Kang
  • Yao Dou
  • John Cadigan
  • Dayne Freitag
  • Alan Ritter
  • Wei Xu

Authorship models have historically generalized poorly to new domains because of the wide distribution of author-identifying signals across domains. In particular, the effects of topic and genre are highly domain-dependent and greatly impact authorship analysis performance. This paper addresses the existing data gap in authorship resources by introducing CROSSNEWS, a novel cross-genre dataset that connects formal journalistic articles and casual social media posts. CROSSNEWS is the largest authorship dataset of its kind for supporting both verification and attribution tasks, with comprehensive topic and genre annotations. We use CROSSNEWS to demonstrate that current models exhibit poor performance in genre transfer scenarios, underscoring the need for authorship models robust to genre-specific effects. We also explore SELMA, a new LLM embedding approach for large-scale authorship setups that outperforms existing models in both same-genre and cross-genre settings.

EAAI Journal 2025 Journal Article

Deep reinforcement learning explanation-assisted integer variable reduction method for security-constrained unit commitment

  • Yuchen Dai
  • Wei Xu
  • Minghui Yan
  • Feng Xue
  • Jianfeng Zhao

The large-scale security-constrained unit commitment (SCUC) problem is pivotal for ensuring the secure and economical operation of modern power systems. Because SCUC is formulated as a mixed-integer nonlinear programming problem, mathematical model-based methods struggle to balance computational efficiency and solution accuracy. While artificial intelligence methods offer promising potential, they face several obstacles, including limited interpretability and generalizability. In light of these challenges, this paper proposes an interpretation method for deep reinforcement learning models that is used to reduce integer variables for the large-scale SCUC problem. This method employs a Gaussian Mixture Model to cluster the decision outcomes of the agents and uses an improved decision tree to interpret the clustering results. We analyze the physical implications behind the observation that unit output distributions exhibit multiple independent Gaussian distributions. These interpretations are then applied to identify active integer variables, thereby reducing the complexity of the SCUC problem and enhancing solution efficiency. Furthermore, an improved Markov decision process model incorporating power-system domain knowledge is constructed to enhance the interpretability and reliability of the agents. A distinctive feature of this model is a bidirectional mapping between unsafe and safe actions. Case studies on the SG-126 system demonstrate that the proposed method significantly increases solution speed without loss of accuracy. The identified active integer variables are proven to be accurate and effective, improving the computational efficiency of unit commitment. The proposed method also provides a novel explainable artificial intelligence-assisted approach for complex decision-making problems in other fields.

YNIMG Journal 2025 Journal Article

Facilitating cognitive neuroscience research with 80-sensor optically pumped magnetometer magnetoencephalography (OPM-MEG)

  • Wei Xu
  • Pan Liao
  • Miao Cao
  • David J. White
  • Bingjiang Lyu
  • Jia-Hong Gao

Recent advancements in optically pumped magnetometer magnetoencephalography (OPM-MEG) make it a promising alternative to conventional SQUID-MEG systems. Nonetheless, as reported in the literature, current OPM-MEG systems are often constrained by a limited number of sampling points, which restricts their capability to match the full-head coverage offered by SQUID-MEG systems. Additionally, whether OPM-MEG can deliver results comparable to SQUID-MEG in practical cognitive neuroscience applications remains largely unexplored. In this study, we introduce a high-density, full-head coverage OPM-MEG system with 80 sensors and systematically compare the performance of OPM-MEG and SQUID-MEG, from sensor- to source-level analysis, across various classic cognitive tasks. Our results demonstrate that visual and auditory evoked fields captured using OPM-MEG align closely with those obtained from SQUID-MEG. Furthermore, steady-state visual evoked field and finger-tapping-induced beta power change recorded with OPM-MEG are accurately localized to corresponding brain regions, with activation centers highly congruent to those observed with SQUID-MEG. For resting-state recordings, the two modalities exhibit similar power distributions, functional connectomes, and microstate clusters. These findings indicate that the 80-sensor OPM-MEG system provides spatial and temporal characteristics comparable to those of traditional SQUID-MEG. Thus, our study offers empirical evidence supporting the efficacy of high-density OPM-MEG and suggests that OPM-MEG, with dense sampling capability, represents a compelling alternative to conventional SQUID-MEG, facilitating further exploration of human cognition.

TMLR Journal 2025 Journal Article

Generalized Tangent Kernel: A Unified Geometric Foundation for Natural Gradient and Standard Gradient

  • Qinxun Bai
  • Steven Rosenberg
  • Wei Xu

Natural gradients have been widely studied from both theoretical and empirical perspectives, and it is commonly believed that natural gradients have advantages over standard (Euclidean) gradients in capturing the intrinsic geometric structure of the underlying function space and being invariant under reparameterization. However, for function optimization, a fundamental theoretical issue regarding the existence of natural gradients on the function space remains underexplored. We address this issue by providing a geometric perspective and mathematical framework for studying both natural gradient and standard gradient that is more complete than existing studies. The key tool that unifies natural gradient and standard gradient is a generalized form of the Neural Tangent Kernel (NTK), which we name the Generalized Tangent Kernel (GTK). Using a novel orthonormality property of GTK, we show that for a fixed parameterization, GTK determines a Riemannian metric on the entire function space which makes the standard gradient as “natural” as the natural gradient in capturing the intrinsic structure of the parameterized function space. Many aspects of this approach relate to RKHS theory. For the practical side of this theory paper, we showcase that our framework motivates new solutions to the non-immersion/degenerate case of natural gradient and leads to new families of natural/standard gradient descent methods.

JBHI Journal 2025 Journal Article

High-Frequency SSVEP-BCI With Row-Column Dual-Frequency Encoding and Decoding Strategy for Reduced Training Data

  • Yufeng Ke
  • Xiaohe Chen
  • Wei Xu
  • Tao Wang
  • Shuaishuai Shen
  • Dong Ming

Steady-state visual evoked potentials (SSVEP)-based brain-computer interfaces (BCIs) have the potential to be utilized in various fields due to their high accuracies and information transfer rates (ITR). High-frequency (HF) visual stimuli have shown promise in reducing visual fatigue and enhancing user comfort. However, these HF-SSVEP-BCIs often face limitations in the number of commands and typically require extensive individual training data to achieve high performance. In this study, we proposed a row-column dual-frequency encoding and decoding method using HF stimulation to develop a comfortable BCI system that supports multiple commands and reduces training costs. We arranged 20 targets in a matrix of five rows and four columns, with each target modulated by left-and-right field stimulation using two frequency-phase combinations. Targets in each row or column share a unique frequency-phase combination, allowing EEG data from the same row or column to be used collectively to train a row/column index decoding model for target identification. To evaluate the performance of our method, we constructed a 20-target asynchronous robotic arm control system with the adaptive window method. With only four training trials per target, the online system achieved an ITR of 105.14 ± 14.15 bits/min, a true positive rate of 98.18 ± 2.87%, a false positive rate of 7.39 ± 6.73%, and a classification accuracy of 91.88 ± 5.75%, with an average data length of 925.70 ± 45.44 ms. These results indicate that the proposed protocol can deliver accurate and rapid command outputs for a comfortable SSVEP-based BCI with minimal training data and fewer frequencies.
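The row-column idea means a 20-target decision can be split into a 5-way row decision and a 4-way column decision. A minimal sketch of the final combination step, assuming some upstream decoder (not shown, and not the authors' model) already produces one score per row and one per column. It also includes the commonly used Wolpaw ITR formula, B = log2 N + P log2 P + (1 - P) log2((1 - P)/(N - 1)) bits per selection, scaled to bits/min; the abstract does not state which timing convention the reported online ITR uses, so this formula may not reproduce it. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

N_ROWS, N_COLS = 5, 4  # 20 targets in five rows and four columns, as in the abstract


def decode_target(row_scores: Sequence[float], col_scores: Sequence[float]) -> Tuple[int, int, int]:
    """Choose the best row and column independently, then map them to a target index."""
    r = max(range(N_ROWS), key=lambda i: row_scores[i])
    c = max(range(N_COLS), key=lambda j: col_scores[j])
    return r, c, r * N_COLS + c


def _xlog2(x: float, y: float) -> float:
    # Treat 0 * log2(0) as 0, the usual convention in the ITR formula.
    return 0.0 if x == 0.0 else x * math.log2(y)


def wolpaw_itr(n_targets: int, accuracy: float, seconds_per_selection: float) -> float:
    """Wolpaw information transfer rate in bits/min."""
    bits = (math.log2(n_targets) + _xlog2(accuracy, accuracy)
            + _xlog2(1.0 - accuracy, (1.0 - accuracy) / (n_targets - 1)))
    return bits * 60.0 / seconds_per_selection


# Usage: row 2 and column 1 win, which is target 9 in row-major order.
row, col, target = decode_target([0.1, 0.2, 0.9, 0.0, 0.3], [0.2, 0.8, 0.1, 0.4])
```

Splitting the decision this way is what lets training data be pooled: every trial in a row informs the row decoder, regardless of its column.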

AAMAS Conference 2025 Conference Paper

IBGP: Imperfect Byzantine Generals Problem for Zero-Shot Robustness in Communicative Multi-Agent Systems

  • Yihuan Mao
  • Yipeng Kang
  • Peilun Li
  • Ning Zhang
  • Wei Xu
  • Chongjie Zhang

As AI agents become integral to infrastructure, robust coordination and message synchronization are crucial. The Byzantine Generals Problem (BGP) models resilience in multi-agent systems (MAS) under adversarial conditions, handling scenarios with malicious agents—stemming from AI hallucinations or external attacks. Traditional BGP demands global consensus, which is often unnecessary and inefficient in practice. We introduce Imperfect BGP (IBGP), aligning with the local coordination patterns in MAS to address this gap, offering provable resilience against communication attacks and adaptability to changing environments, as validated by empirical results.

IJCAI Conference 2025 Conference Paper

Map2Traj: Street Map Piloted Zero-shot Trajectory Generation Method for Wireless Network Optimization

  • Zhenyu Tao
  • Wei Xu
  • Xiaohu You

In modern wireless networks, user mobility modeling plays a pivotal role in learning-based network optimization, particularly in tasks such as user association and resource allocation. Traditional random mobility models, e.g., random waypoint and Gauss-Markov models, often fail to accurately capture the distribution patterns of users within real-world areas. While trace-based mobility models and advanced learning-based trajectory generation methods offer improvements, they are frequently limited by the scarcity of real-world trajectory data in target areas, primarily due to privacy concerns. This paper introduces Map2Traj, a novel zero-shot trajectory generation method that leverages the diffusion model to capture the intrinsic relationship between street maps and user mobility. With solely the street map of an unobserved area, Map2Traj generates synthetic user trajectories that closely resemble the real-world ones in trajectory pattern and spatial distribution. This enables the creation of high-fidelity individual user channel states and an accurate representation of the overall network user distribution, facilitating effective wireless network optimization. Extensive experiments across multiple regions in Xi'an and Chengdu, China demonstrate the effectiveness of our proposed method for zero-shot trajectory generation. A case study applying Map2Traj to user association and load balancing in wireless networks is also presented to validate its efficacy in network optimization.

NeurIPS Conference 2025 Conference Paper

NAUTILUS: A Large Multimodal Model for Underwater Scene Understanding

  • Wei Xu
  • Cheng Wang
  • Dingkang Liang
  • Zongchuang Zhao
  • Xingyu Jiang
  • Peng Zhang
  • Xiang Bai

Underwater exploration offers critical insights into our planet and attracts increasing attention for its broader applications in resource exploration, national security, etc. We study underwater scene understanding methods, which aim to achieve automated underwater exploration. The underwater scene understanding task demands multi-task perception at multiple granularities. However, the absence of large-scale underwater multi-task instruction-tuning datasets hinders the progress of this research. To bridge this gap, we construct NautData, a dataset containing 1.45M image-text pairs supporting eight underwater scene understanding tasks. It enables the development and thorough evaluation of underwater scene understanding models. Underwater image degradation is a widely recognized challenge that interferes with underwater tasks. To improve the robustness of underwater scene understanding, we introduce physical priors derived from underwater imaging models and propose a plug-and-play vision feature enhancement (VFE) module, which explicitly restores clear underwater information. We integrate this module into the well-known baselines LLaVA-1.5 and Qwen2.5-VL and build our underwater LMM, NAUTILUS. Experiments conducted on NautData and public underwater datasets demonstrate the effectiveness of the VFE module, which consistently improves the performance of both baselines on the majority of supported tasks, thus ensuring the superiority of NAUTILUS in the underwater scene understanding area. Data and models are available at https://github.com/H-EmbodVis/NAUTILUS.
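The abstract mentions physical priors from underwater imaging models without stating the model. A widely used simplified form treats each color channel as I = J * t + B * (1 - t), where J is the clear scene radiance, B the background (veiling) light, and t = exp(-beta * d) the transmission over distance d. The sketch below applies and inverts this model for one pixel. It assumes beta, d, and B are known, which a real method must estimate, and it is not the VFE module; the function names and numeric values are illustrative.

```python
import math
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def degrade(j: Sequence[float], depth_m: float, beta: Sequence[float],
            veil: Sequence[float]) -> RGB:
    """Simplified per-channel model: I = J * t + B * (1 - t), with t = exp(-beta * d)."""
    out = []
    for jc, bc, vc in zip(j, beta, veil):
        t = math.exp(-bc * depth_m)
        out.append(jc * t + vc * (1.0 - t))
    return (out[0], out[1], out[2])


def restore(i: Sequence[float], depth_m: float, beta: Sequence[float],
            veil: Sequence[float], t_min: float = 0.1) -> RGB:
    """Invert the model per channel, clamping t so heavily attenuated channels are not blown up."""
    out = []
    for ic, bc, vc in zip(i, beta, veil):
        t = max(math.exp(-bc * depth_m), t_min)
        out.append((ic - vc * (1.0 - t)) / t)
    return (out[0], out[1], out[2])


# Usage with illustrative values: red attenuates fastest, veiling light is blue-green.
clear = (0.6, 0.5, 0.4)
beta = (0.4, 0.1, 0.05)
veil = (0.05, 0.3, 0.35)
observed = degrade(clear, 2.0, beta, veil)
recovered = restore(observed, 2.0, beta, veil)
```

The inversion is exact only while t stays above `t_min`; the clamp trades accuracy in strongly attenuated channels for stability against noise.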

EAAI Journal 2025 Journal Article

Pointer type instrument reading method based on key point detection

  • Yongjie Zhai
  • Wei Xu
  • Zhenyuan Zhao
  • Guotian Yang
  • Biqiang Du

Power distribution plants require accurate and generalizable recognition of multiple types of pointer instruments. However, instruments with limited visibility are difficult to read. A global geometric network (GGNet) key point detection model is proposed for pointer-type instrument recognition; it analyzes the geometric structure of these instruments and detects the scale key points, the pointer, and the dial center. First, a spatial channel fusion (SCF) module is used to obtain a global receptive field and enhance the detection of pointer-tip key points. Second, an ellipse-aware feature aggregator (EFA) module is designed to improve the overall robustness of the model by adaptively capturing the global geometric information of the key points on the instrument scale. Together these modules form the GGNet model, which improves the detection of hard-to-detect scale points in scenes with limited visibility and ultimately achieves high-precision readings. The experimental results show that the algorithm achieves an accuracy of 94.0% in key point detection and an average reading error of only 0.37% for the pointer gauge recognition task. A comparison of GGNet with four other network models shows that the proposed model performs best on a number of error metrics.
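Once a key point detector has located the dial center, the scale end points, and the pointer tip, a reading can be obtained by angle interpolation. A minimal sketch, assuming a linear scale that runs clockwise on a circular dial seen head-on; the abstract's ellipse-aware module suggests perspective distortion matters in practice, and this sketch ignores it. `read_gauge` and its arguments are hypothetical, and the abstract does not say GGNet converts key points to readings this way.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, y pointing down


def _angle(center: Point, p: Point) -> float:
    # With y pointing down, this angle increases clockwise as seen on screen.
    return math.atan2(p[1] - center[1], p[0] - center[0])


def read_gauge(center: Point, zero_mark: Point, full_mark: Point, tip: Point,
               min_value: float, max_value: float) -> float:
    """Linear-scale reading from key points, assuming a circular dial seen head-on."""
    tau = 2.0 * math.pi
    a0 = _angle(center, zero_mark)
    sweep_full = (_angle(center, full_mark) - a0) % tau  # clockwise span of the scale
    sweep_tip = (_angle(center, tip) - a0) % tau
    if sweep_tip > sweep_full:
        # Tip falls in the gap between full and zero marks: snap to the nearer end.
        sweep_tip = sweep_full if sweep_tip - sweep_full < tau - sweep_tip else 0.0
    return min_value + (max_value - min_value) * sweep_tip / sweep_full


# Usage: a 0-10 scale from lower-left to lower-right; a pointer straight up reads 5.
reading = read_gauge((0, 0), (-1, 1), (1, 1), (0, -1), 0.0, 10.0)
```

Snapping tips in the dead zone to the nearer end also absorbs rounding when the pointer rests exactly on the zero mark.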

YNIMG Journal 2025 Journal Article

Temporal dynamics of quantity processing: distinct time course and representational patterns revealed by multivariate pattern analysis

  • Jinhua Tian
  • Wei Xu
  • Bailu Si
  • Guochen Sun
  • Ke Zhou

People employ both discrete and continuous quantities to quantify aspects of their environment. However, the temporal dynamics and interactions underlying the processing of this quantitative information remain insufficiently understood. Our study aimed to address this gap by employing a one-back task in conjunction with magnetoencephalography (MEG) to investigate neural responses to dot stimuli representing both discrete (e.g., number of dots) and continuous (e.g., distribution of dots in space) quantities. Our primary finding, derived from representational similarity analysis (RSA), was that processing of field area and numerosity information preceded that of individual information (e.g., individual area and shape), suggesting different timing in the processing of these visual dimensions. Furthermore, within-dimensional temporal generalization analysis revealed distinct temporal patterns for these two types of information: numerosity and field area exhibited a combination of chain-like (sequential, non-overlapping processes) and reactivated (initially active, then silent, then reactivated) patterns. Notably, an intermediate 'silent' phase emerged between the initial generalizable representation and subsequent trials, indicating the retrieval of early information to meet subsequent task demands (e.g., the one-back response). In contrast, individual area and shape predominantly followed a chain-like pattern. In addition, cross-dimensional temporal generalization analysis showed that numerosity and individual area representations could generalize to each other point-to-point in time, and that early numerosity representations and late individual area representations also generalized to each other, implying both parallel and sequential shared representation of these quantities. Field area showed limited generalization to numerosity and individual area, suggesting that they are processed independently.
In summary, our results suggest that quantity processing involves temporally distinct operations with different processing timings and a shared encoding pattern of numerosity and individual area that links these temporally distinct processes.

NeurIPS Conference 2024 Conference Paper

A Unified Framework for 3D Scene Understanding

  • Wei Xu
  • Chunsheng Shi
  • Sifan Tu
  • Xin Zhou
  • Dingkang Liang
  • Xiang Bai

We propose UniSeg3D, a unified 3D scene understanding framework that achieves panoptic, semantic, instance, interactive, referring, and open-vocabulary segmentation tasks within a single model. Most previous 3D segmentation approaches are tailored to a specific task, limiting their understanding of 3D scenes to a task-specific perspective. In contrast, the proposed method unifies six tasks into unified representations processed by the same Transformer. It facilitates inter-task knowledge sharing, thereby promoting comprehensive 3D scene understanding. To take advantage of multi-task unification, we enhance performance by establishing explicit inter-task associations. Specifically, we design knowledge distillation and contrastive learning to transfer task-specific knowledge across different tasks. Experiments on three benchmarks, ScanNet20, ScanRefer, and ScanNet200, demonstrate that UniSeg3D consistently outperforms current SOTA methods, even those specialized for individual tasks. We hope UniSeg3D can serve as a solid unified baseline and inspire future work. Code and models are available at https://dk-liang.github.io/UniSeg3D/.

ICML Conference 2024 Conference Paper

FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning

  • Yuwei Fu
  • Haichao Zhang
  • Di Wu 0044
  • Wei Xu
  • Benoit Boulet

In this work, we investigate how to leverage pre-trained visual-language models (VLM) for online Reinforcement Learning (RL). In particular, we focus on sparse reward tasks with pre-defined textual task descriptions. We first identify the problem of reward misalignment when applying VLM as a reward in RL tasks. To address this issue, we introduce a lightweight fine-tuning method, named Fuzzy VLM reward-aided RL (FuRL), based on reward alignment and relay RL. Specifically, we enhance the performance of SAC/DrQ baseline agents on sparse reward tasks by fine-tuning VLM representations and using relay RL to avoid local minima. Extensive experiments on the Meta-world benchmark tasks demonstrate the efficacy of the proposed method. Code is available at: https://github.com/fuyw/FuRL.

EAAI Journal 2024 Journal Article

Intelligent control of district heating system based on RDPG

  • Mingju Gong
  • Yan Liu
  • Jiawang Sun
  • Wei Xu
  • Wenxiang Li
  • Changcheng Yan
  • Wencheng Fu

Given the continuous expansion of heating areas in recent years, the design of a precise and dependable district heating system (DHS) has become increasingly crucial. Traditional control decisions are made based on real-time environmental temperature feedback, often leading to uneven heating on the user side and affecting residents' comfort. This paper proposes an intelligent control strategy based on the deep reinforcement learning recurrent deterministic policy gradient (RDPG) algorithm for DHSs. To explore the control performance of the RDPG algorithm on DHS, we have meticulously modeled the pivotal components of the DHSs, namely plate heat exchangers, secondary heating pipe networks, and heat users. Moreover, taking into account the periodic factors in heating regulation, the traditional recurrent neural network (RNN) in the RDPG algorithm has been replaced with the long short-term memory (LSTM) network. The proposed algorithm was trained using actual data from a heat exchange station in Tianjin and compared with reinforcement learning algorithms such as TD3, DPPO, DDPG, and A3C in terms of training rewards, effectiveness, and training stability. The results of the models are evaluated and visualized. Experimental results show that the proposed control method based on the RDPG algorithm, compared to other control schemes, can achieve the highest training reward and the most stable control performance, with an indoor temperature fluctuation range of only 0.1 °C.

ICLR Conference 2024 Conference Paper

Kill Two Birds with One Stone: Rethinking Data Augmentation for Deep Long-tailed Learning

  • Binwu Wang
  • Pengkun Wang 0001
  • Wei Xu
  • Xu Wang 0029
  • Yudong Zhang 0005
  • Kun Wang 0056
  • Yang Wang 0015

Real-world tasks are universally associated with training samples that exhibit a long-tailed class distribution, and traditional deep learning models are not suited to fitting this distribution, resulting in biased trained models. To overcome this dilemma, numerous deep long-tailed learning studies have sought inter-class fairness by designing sophisticated sampling strategies or improving existing model structures and loss functions. Typically, these studies also apply data augmentation strategies to improve the generalization performance of their models. However, augmentation strategies designed for balanced distributions may not be the best option for long-tailed distributions. To understand data augmentation more deeply, we first theoretically analyze the gains of traditional augmentation strategies in long-tailed learning and observe that augmentation causes the long-tailed distribution to become imbalanced again, resulting in an intertwined imbalance: inherent data-wise imbalance and extrinsic augmentation-wise imbalance, i.e., two 'birds' coexist in long-tailed learning. Motivated by this observation, we propose an adaptive Dynamic Optional Data Augmentation (DODA) method to address this intertwined imbalance, i.e., one 'stone' simultaneously 'kills' two 'birds'; it allows each class to choose appropriate augmentation methods by maintaining a corresponding augmentation probability distribution for each class during training. Extensive experiments on mainstream long-tailed recognition benchmarks (e.g., CIFAR-100-LT, ImageNet-LT, and iNaturalist 2018) demonstrate the effectiveness and flexibility of DODA in overcoming the intertwined imbalance.
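The abstract describes keeping an augmentation probability distribution per class so that each class can choose its own augmentations during training. A minimal sketch of that bookkeeping, in which each class holds softmax logits over a fixed list of augmentation operations. The update rule (adding a scaled feedback signal to the chosen operation's logit) and the class name `PerClassAugPolicy` are assumptions, since the abstract does not give DODA's update.

```python
import math
import random
from typing import List


class PerClassAugPolicy:
    """Hypothetical per-class policy over a fixed list of augmentation operations."""

    def __init__(self, n_classes: int, n_ops: int, lr: float = 0.5, seed: int = 0):
        self.logits = [[0.0] * n_ops for _ in range(n_classes)]  # uniform at start
        self.lr = lr
        self.rng = random.Random(seed)

    def probs(self, cls: int) -> List[float]:
        """Softmax over this class's logits (max-subtracted for numerical stability)."""
        row = self.logits[cls]
        m = max(row)
        w = [math.exp(z - m) for z in row]
        s = sum(w)
        return [x / s for x in w]

    def sample(self, cls: int) -> int:
        """Draw an augmentation op index for one training sample of class `cls`."""
        n_ops = len(self.logits[cls])
        return self.rng.choices(range(n_ops), weights=self.probs(cls), k=1)[0]

    def update(self, cls: int, op: int, feedback: float) -> None:
        """Assumed rule: shift the chosen op's logit by a scaled feedback signal."""
        self.logits[cls][op] += self.lr * feedback


# Usage: sample an op for a class-1 sample, then reward it with a hypothetical signal
# such as the drop in that class's validation loss.
policy = PerClassAugPolicy(n_classes=3, n_ops=4)
op = policy.sample(cls=1)
policy.update(cls=1, op=op, feedback=0.2)
```

Keeping one distribution per class, rather than one global policy, is what lets tail classes drift toward different augmentations than head classes.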

IJCAI Conference 2024 Conference Paper

Make Bricks with a Little Straw: Large-Scale Spatio-Temporal Graph Learning with Restricted GPU-Memory Capacity

  • Binwu Wang
  • Pengkun Wang
  • Zhengyang Zhou
  • Zhe Zhao
  • Wei Xu
  • Yang Wang

Traffic prediction plays a key role in various smart city applications, which can help traffic managers make traffic plans in advance, assist online ride-hailing companies in deploying vehicles reasonably, and provide early warning of congestion for safety authorities. While increasingly complex models achieve impressive prediction performance, there are concerns about the effectiveness of these models in handling large-scale road networks. Especially for researchers who don't have access to powerful GPU devices, the expensive memory burden limits the usefulness of these models. In this paper, we take the first step of learning on the large-scale spatio-temporal graph and propose a divide-and-conquer training strategy for Large Spatio-Temporal Graph Learning, namely LarSTL. The core idea behind this strategy is to divide the large graph into multiple subgraphs, which are treated as task streams to sequentially train the model to conquer each subgraph one by one. We introduce a novel perspective based on the continuous learning paradigm to achieve this goal. In order to overcome forgetting the knowledge learned from previous subgraphs, an experience-replay strategy consolidates the learned knowledge by replaying nodes sampled from previous subgraphs. Moreover, we configure specific feature adaptors for each subgraph to extract personalized features, and it is also beneficial to consolidate the learned knowledge from the perspective of parameters. We conduct experiments using multiple large-scale traffic network datasets on a V100 GPU with only 16GB memory, and the results demonstrate that our LarSTL can achieve competitive performance and high efficiency.
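The divide-and-conquer schedule with experience replay can be sketched as a loop over subgraphs in which each step trains on the current subgraph's nodes plus nodes replayed from earlier subgraphs. A minimal sketch, assuming integer node ids and a replay buffer filled by reservoir sampling (a swapped-in choice; the abstract only says nodes are sampled from previous subgraphs). The model, the per-subgraph feature adaptors, and all names here are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple


class ReplayBuffer:
    """Fixed-size node buffer filled by reservoir sampling."""

    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.items: List[int] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, node: int) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(node)
        else:
            # Keep each seen node with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = node

    def sample(self, k: int) -> List[int]:
        return self.rng.sample(self.items, min(k, len(self.items)))


def training_schedule(subgraphs: Sequence[Sequence[int]], capacity: int,
                      replay_k: int) -> Iterator[Tuple[int, List[int]]]:
    """Yield (subgraph_id, nodes) for sequential training: current nodes plus replayed ones."""
    buf = ReplayBuffer(capacity)
    for g, nodes in enumerate(subgraphs):
        replay = buf.sample(replay_k)  # drawn only from earlier subgraphs
        yield g, list(nodes) + replay
        for n in nodes:
            buf.add(n)


# Usage: three subgraphs trained one after another with a 4-node buffer.
sched = list(training_schedule([[0, 1, 2], [3, 4, 5], [6, 7]], capacity=4, replay_k=2))
```

The buffer size, not the graph size, bounds the extra memory per step, which matches the abstract's goal of training under a restricted GPU-memory budget.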

IROS Conference 2024 Conference Paper

MFCalib: Single-shot and Automatic Extrinsic Calibration for LiDAR and Camera in Targetless Environments Based on Multi-Feature Edge

  • Tianyong Ye
  • Wei Xu
  • Chunran Zheng
  • Yukang Cui 0001

This paper presents MFCalib, an innovative extrinsic calibration technique for LiDAR and RGB cameras that operates automatically in targetless environments with a single data capture. At the heart of this method is the use of a rich set of edge information, which significantly enhances calibration accuracy and robustness. Specifically, we extract both depth-continuous and depth-discontinuous edges, along with intensity-discontinuous edges on planes. This comprehensive edge extraction strategy ensures our ability to achieve accurate calibration with just one round of data collection, even in complex and varied settings. Addressing the uncertainty of depth-discontinuous edges, we delve into the physical measurement principles of LiDAR and develop a beam model, effectively mitigating the edge inflation caused by the LiDAR beam. Extensive experimental results demonstrate that MFCalib outperforms state-of-the-art targetless calibration methods across various scenes, matching and often surpassing the precision of multi-scene calibrations with a single-shot collection. To support community development, our code is available as open source on GitHub.
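Extrinsic calibration estimates the rotation R and translation t that map LiDAR points into the camera frame; edge-based targetless methods typically score a candidate (R, t) by projecting LiDAR edge points into the image and comparing them with image edges. A minimal sketch of that projection under a standard pinhole model without lens distortion (an assumption). MFCalib's edge extraction, matching, and beam model are not shown, and `project_lidar_point` is a hypothetical name.

```python
from typing import Optional, Sequence, Tuple


def project_lidar_point(p_lidar: Sequence[float], R: Sequence[Sequence[float]],
                        t: Sequence[float], fx: float, fy: float,
                        cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Project a LiDAR point to pixel (u, v) using extrinsics (R, t) and pinhole intrinsics."""
    # Rigid transform into the camera frame: p_cam = R @ p_lidar + t
    x, y, z = (sum(R[i][k] * p_lidar[k] for k in range(3)) + t[i] for i in range(3))
    if z <= 0.0:
        return None  # point is behind the camera
    return fx * x / z + cx, fy * y / z + cy


# Usage: identity extrinsics and a 640x480 camera with 100-pixel focal length.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
uv = project_lidar_point([1.0, 2.0, 4.0], I3, [0.0, 0.0, 0.0], 100.0, 100.0, 320.0, 240.0)
```

A calibration routine would search over (R, t) to minimize the distance between such projected edge points and nearby image edges.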

NeurIPS Conference 2024 Conference Paper

PointMamba: A Simple State Space Model for Point Cloud Analysis

  • Dingkang Liang
  • Xin Zhou
  • Wei Xu
  • Xingkui Zhu
  • Zhikang Zou
  • Xiaoqing Ye
  • Xiao Tan
  • Xiang Bai

Transformers have become one of the foundational architectures in point cloud analysis tasks due to their excellent global modeling ability. However, the attention mechanism has quadratic complexity, making the design of a linear complexity method with global modeling appealing. In this paper, we propose PointMamba, transferring the success of Mamba, a recent representative state space model (SSM), from NLP to point cloud analysis tasks. Unlike traditional Transformers, PointMamba employs a linear complexity algorithm, presenting global modeling capacity while significantly reducing computational costs. Specifically, our method leverages space-filling curves for effective point tokenization and adopts an extremely simple, non-hierarchical Mamba encoder as the backbone. Comprehensive evaluations demonstrate that PointMamba achieves superior performance across multiple datasets while significantly reducing GPU memory usage and FLOPs. This work underscores the potential of SSMs in 3D vision-related tasks and presents a simple yet effective Mamba-based baseline for future research. The code is available at https://github.com/LMD0311/PointMamba.
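According to the abstract, PointMamba uses space-filling curves to turn an unordered point cloud into a token sequence. As an illustration only, the sketch below orders points along a Z-order (Morton) curve, a simple space-filling curve chosen here for brevity. The paper's choice of curve, the grid resolution, and how ordered points are grouped into tokens may differ, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def morton_code(ix: int, iy: int, iz: int, bits: int) -> int:
    """Interleave the bits of three grid coordinates (x takes the lowest slot)."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code


def serialize_points(points: Sequence[Point], bits: int = 10) -> List[int]:
    """Return point indices sorted along a Z-order curve over a 2**bits grid."""
    lo = [min(p[d] for p in points) for d in range(3)]
    hi = [max(p[d] for p in points) for d in range(3)]
    scale = (1 << bits) - 1

    def quantize(p: Point) -> Tuple[int, int, int]:
        q = []
        for d in range(3):
            span = (hi[d] - lo[d]) or 1.0  # avoid division by zero on flat axes
            q.append(int(round((p[d] - lo[d]) / span * scale)))
        return q[0], q[1], q[2]

    codes = [morton_code(*quantize(p), bits) for p in points]
    return sorted(range(len(points)), key=lambda i: codes[i])
```

Points that are close in 3D tend to end up close in this 1D order. That is the property a sequence model like Mamba relies on when it consumes the points as a sequence.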

NeurIPS Conference 2024 Conference Paper

Robot Policy Learning with Temporal Optimal Transport Reward

  • Yuwei Fu
  • Haichao Zhang
  • Di Wu
  • Wei Xu
  • Benoit Boulet

Reward specification is one of the trickiest problems in Reinforcement Learning and usually requires tedious hand engineering in practice. One promising approach to tackle this challenge is to use existing expert video demonstrations for policy learning. Some recent work investigates how to learn robot policies from only one or a few expert video demonstrations. For example, reward labeling via Optimal Transport (OT) has been shown to be an effective strategy for generating a proxy reward by measuring the alignment between the robot trajectory and the expert demonstrations. However, previous work mostly overlooks that the OT reward is invariant to temporal order information, which can introduce extra noise into the reward signal. To address this issue, in this paper, we introduce the Temporal Optimal Transport (TemporalOT) reward, which incorporates temporal order information to learn a more accurate OT-based proxy reward. Extensive experiments on the Meta-world benchmark tasks validate the efficacy of the proposed method. Our code is available at: https://github.com/fuyw/TemporalOT.
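The order-invariance problem and one way to address it can be illustrated with a small entropic OT reward. The sketch below is our own simplification, not TemporalOT's exact construction. It computes a log-domain Sinkhorn plan between agent and expert frame embeddings, then gives each agent step the reward $r_i = -\sum_j P_{ij} C_{ij}$. The optional `window` adds a soft cost `penalty` to temporally distant pairs. Squared-Euclidean cost, uniform marginals, `eps`, `window`, and `penalty` are all assumptions.

```python
import math
from typing import List, Optional, Sequence

Vec = Sequence[float]


def _logsumexp(xs: List[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def sinkhorn_log(cost: List[List[float]], eps: float = 0.1, iters: int = 300) -> List[List[float]]:
    """Entropic OT plan with uniform marginals (rows: agent steps, cols: expert steps)."""
    n, m = len(cost), len(cost[0])
    log_a, log_b = math.log(1.0 / n), math.log(1.0 / m)
    f = [0.0] * n  # row log-scalings
    g = [0.0] * m  # column log-scalings
    for _ in range(iters):
        f = [log_a - _logsumexp([g[j] - cost[i][j] / eps for j in range(m)]) for i in range(n)]
        g = [log_b - _logsumexp([f[i] - cost[i][j] / eps for i in range(n)]) for j in range(m)]
    return [[math.exp(f[i] - cost[i][j] / eps + g[j]) for j in range(m)] for i in range(n)]


def ot_rewards(agent: List[Vec], expert: List[Vec], window: Optional[float] = None,
               penalty: float = 10.0, eps: float = 0.1) -> List[float]:
    """Per-step proxy reward r_i = -sum_j P_ij * C_ij (higher is better).

    Without `window`, the plan ignores temporal order. With it, pairs whose
    time-rescaled indices differ by more than `window` cost `penalty` extra
    when computing the plan (but not when computing the reward).
    """
    n, m = len(agent), len(expert)
    cost = [[sum((x - y) ** 2 for x, y in zip(a, e)) for e in expert] for a in agent]
    plan_cost = [row[:] for row in cost]
    if window is not None:
        scale = (m - 1) / max(n - 1, 1)
        for i in range(n):
            for j in range(m):
                if abs(i * scale - j) > window:
                    plan_cost[i][j] += penalty
    plan = sinkhorn_log(plan_cost, eps)
    return [-sum(plan[i][j] * cost[i][j] for j in range(m)) for i in range(n)]
```

Consider a trajectory that visits the expert's states in reverse order. Plain OT can match it almost perfectly by permuting the frames, so it scores near zero. With the temporal window, the plan is pushed toward time-aligned pairs and the reversal is penalized.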

ICML Conference 2024 Conference Paper

Two Fists, One Heart: Multi-Objective Optimization Based Strategy Fusion for Long-tailed Learning

  • Zhe Zhao 0008
  • Pengkun Wang 0001
  • Haibin Wen
  • Wei Xu
  • Song Lai 0001
  • Qingfu Zhang 0001
  • Yang Wang 0015

Real-world data generally follows a long-tailed distribution, which prevents traditional high-performance training strategies from showing their usual effects. Various insights have been proposed to alleviate this challenging distribution. However, some observations indicate that models trained on long-tailed distributions always show a trade-off between the performance of head and tail classes. For a profound understanding of the trade-off, we first theoretically analyze the trade-off problem in long-tailed learning and transform it into a multi-objective optimization (MOO) problem. Motivated by these analyses, we propose the idea of strategy fusion for MOO long-tailed learning and point out the potential conflict problem. We further design a Multi-Objective Optimization based Strategy Fusion (MOOSF), which effectively resolves conflicts and achieves an efficient fusion of heterogeneous strategies. Comprehensive experiments on mainstream datasets show that even the simplest strategy fusion can outperform complex long-tailed strategies. More importantly, it provides a new perspective for generalized long-tailed learning. The code is available in the accompanying supplementary materials.

JBHI Journal 2024 Journal Article

Using the Cocktail Party Effect to Add the Coding Dimension of Auditory Event Related Potential Brain-Computer Interface

  • Wei Xu
  • Jiabei Tang
  • Hongzhi Qi

Objective: The auditory event-related potential based brain–computer interface (aERP-BCI) is a classical paradigm of brain–computer communication. To improve the coding efficiency of aERP-BCI, this study proposes a method that uses two parallel voice channels to add a coding dimension based on the cocktail party effect. Methods: The novel paradigm used male and female voices to establish two parallel oddball sound stimulus sequences. In comparison, the baseline paradigm presented only male or only female stimulus sequences. Offline experiments were carried out under both the double voice condition (DVC) and the single voice condition (SVC) paradigms, and an online experiment was also carried out under DVC. Subsequently, the EEG signals and BCI operation results were compared and analyzed. Conclusion: The cocktail party effect caused a significant difference in the EEG responses to non-target stimuli between the focused vocal channel and the ignored vocal channel under the DVC paradigm, and the focused and ignored channels achieved a recognition accuracy of 97.2%. The target recognition rate of DVC was 82.3%, not significantly different from the 85% of SVC, while the information transfer rate (ITR) of DVC, at 15.3 bits/min, was significantly higher than that of SVC. Significance: The cocktail party effect improves coding efficiency by adding parallel channels without reducing target/non-target stimulus recognition in the focused vocal channel. This provides a novel direction for improving aERP-BCI performance.
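The abstract reports the information transfer rate in bits/min but does not state the formula. A common choice in BCI work is Wolpaw's definition, which the sketch below assumes. For $N$ equiprobable targets and accuracy $P$, it gives $B = \log_2 N + P\log_2 P + (1-P)\log_2\frac{1-P}{N-1}$ bits per selection, scaled by selections per minute. The number of targets and seconds per selection are inputs we leave to the caller. The sketch is not meant to reproduce the reported 15.3 bits/min.

```python
import math


def wolpaw_bits_per_selection(n_targets: int, accuracy: float) -> float:
    """Wolpaw information per selection (bits) for N equiprobable targets."""
    n, p = n_targets, accuracy
    if p <= 1.0 / n:
        return 0.0  # convention used here: at or below chance, report zero
    bits = math.log2(n) + p * math.log2(p)
    if p < 1.0:
        bits += (1.0 - p) * math.log2((1.0 - p) / (n - 1))
    return bits


def itr_bits_per_min(n_targets: int, accuracy: float, seconds_per_selection: float) -> float:
    """Scale bits per selection to bits per minute."""
    return wolpaw_bits_per_selection(n_targets, accuracy) * 60.0 / seconds_per_selection
```

For example, `itr_bits_per_min(4, 1.0, 6.0)` returns 20.0 (2 bits per selection, 10 selections per minute). With this definition, adding parallel channels raises ITR mainly by increasing the number of distinguishable codes or selections per unit time, as long as accuracy holds up.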

YNIMG Journal 2024 Journal Article

Wireless optically pumped magnetometer MEG

  • Hao Cheng
  • Kaiyan He
  • Congcong Li
  • Xiao Ma
  • Fufu Zheng
  • Wei Xu
  • Pan Liao
  • Rui Yang

The current magnetoencephalography (MEG) systems, which rely on cables for control and signal transmission, do not fully realize the potential of wearable optically pumped magnetometers (OPM). This study presents a significant advancement in wireless OPM-MEG by reducing magnetization in the electronics and developing a tailored wireless communication protocol. Our protocol effectively eliminates electromagnetic interference, particularly in the critical frequency bands of MEG signals, and accurately synchronizes the acquisition and stimulation channels with the host computer's clock. We have successfully achieved single-channel wireless OPM-MEG measurement and demonstrated its reliability by replicating three well-established experiments: the alpha rhythm, auditory evoked field, and steady-state visual evoked field in the human brain. Our prototype wireless OPM-MEG system not only streamlines the measurement process but also represents a major step forward in the development of wearable OPM-MEG applications in both neuroscience and clinical research.

JBHI Journal 2022 Journal Article

HMRNet: High and Multi-Resolution Network With Bidirectional Feature Calibration for Brain Structure Segmentation in Radiotherapy

  • Hao Fu
  • Guotai Wang
  • Wenhui Lei
  • Wei Xu
  • Qianfei Zhao
  • Shichuan Zhang
  • Kang Li
  • Shaoting Zhang

Accurate segmentation of Anatomical brain Barriers to Cancer spread (ABCs) plays an important role in the automatic delineation of the Clinical Target Volume (CTV) of brain tumors in radiotherapy. Although variants of U-Net are state-of-the-art segmentation models, they have limited performance when dealing with ABCs structures of various shapes and sizes, especially thin structures (e.g., the falx cerebri) that span only a few slices. To deal with this problem, we propose a High and Multi-Resolution Network (HMRNet) that consists of a multi-scale feature learning branch and a high-resolution branch, which can maintain high-resolution contextual information and extract more robust representations of anatomical structures at various scales. We further design a Bidirectional Feature Calibration (BFC) block that enables the two branches to generate spatial attention maps for mutual feature calibration. Considering the different sizes and positions of ABCs structures, our network was applied after a rough localization of each structure to obtain fine segmentation results. Experiments on the MICCAI 2020 ABCs challenge dataset showed that: 1) our proposed two-stage segmentation strategy largely outperformed methods segmenting all the structures in just one stage; 2) the proposed HMRNet with two branches can maintain high-resolution representations and is effective in improving performance on thin structures; 3) the proposed BFC block outperformed existing attention methods using monodirectional feature calibration. Our method won second place in the ABCs 2020 challenge and has the potential for more accurate and reasonable delineation of the CTV of brain tumors.

YNIMG Journal 2022 Journal Article

Multimodal neuroimaging with optically pumped magnetometers: A simultaneous MEG-EEG-fNIRS acquisition system

  • Xingyu Ru
  • Kaiyan He
  • Bingjiang Lyu
  • Dongxu Li
  • Wei Xu
  • Wenyu Gu
  • Xiao Ma
  • Jiayi Liu

Multimodal neuroimaging plays an important role in neuroscience research. Integrated noninvasive neuroimaging modalities, such as magnetoencephalography (MEG), electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS), allow neural activity and related physiological processes in the brain to be precisely and comprehensively depicted, providing an effective and advanced platform to study brain function. Noncryogenic optically pumped magnetometer (OPM) MEG has high signal power due to its on-scalp sensor layout and enables more flexible configurations than traditional commercial superconducting MEG. Here, we integrate OPM-MEG with EEG and fNIRS to develop a multimodal neuroimaging system that can simultaneously measure brain electrophysiology and hemodynamics. We conducted a series of experiments to demonstrate the feasibility and robustness of our MEG-EEG-fNIRS acquisition system. The complementary neural and physiological signals simultaneously collected by our multimodal imaging system provide opportunities for a wide range of potential applications in neurovascular coupling, wearable neuroimaging, hyperscanning and brain-computer interfaces.

NeurIPS Conference 2022 Conference Paper

PaCo: Parameter-Compositional Multi-task Reinforcement Learning

  • Lingfeng Sun
  • Haichao Zhang
  • Wei Xu
  • Masayoshi Tomizuka

The purpose of multi-task reinforcement learning (MTRL) is to train a single policy that can be applied to a set of different tasks. Sharing parameters allows us to take advantage of the similarities among tasks. However, the gaps in content and difficulty between tasks make it challenging to decide which tasks should share parameters and which parameters should be shared, and parameter sharing also brings optimization challenges. In this work, we introduce a parameter-compositional approach (PaCo) as an attempt to address these challenges. In this framework, a policy subspace represented by a set of parameters is learned. Policies for all the single tasks lie in this subspace and can be composed by interpolating with the learned set. This allows not only flexible parameter sharing but also a natural way to improve training. We demonstrate state-of-the-art performance on Meta-World benchmarks, verifying the effectiveness of the proposed approach.
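A policy that "lies in the subspace spanned by a learned parameter set" can be read as a linear combination $\theta_\tau = \Phi w_\tau$. Here $\Phi$ stacks $K$ shared parameter vectors and $w_\tau$ is a task-specific weight vector. The sketch below shows only that composition step, under this reading of the abstract. The plain-list representation and the function name are hypothetical, and training of $\Phi$ and $w_\tau$ is not shown.

```python
from typing import List


def compose_params(param_set: List[List[float]], weights: List[float]) -> List[float]:
    """theta = sum_k weights[k] * param_set[k]; all vectors share one length."""
    if len(param_set) != len(weights):
        raise ValueError("need exactly one weight per shared parameter vector")
    dim = len(param_set[0])
    return [sum(w * phi[d] for w, phi in zip(weights, param_set)) for d in range(dim)]


# Two tasks share the same parameter set Phi but use different weights.
phi = [[1.0, 0.0, 2.0], [0.0, 1.0, 2.0]]
theta_task_a = compose_params(phi, [0.25, 0.75])
theta_task_b = compose_params(phi, [1.0, 0.0])
```

Because every task's parameters are built from the same $\Phi$, updating $\Phi$ affects all tasks (shared knowledge), while each $w_\tau$ stays specific to its task.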

NeurIPS Conference 2022 Conference Paper

Society of Agents: Regret Bounds of Concurrent Thompson Sampling

  • Yan Chen
  • Perry Dong
  • Qinxun Bai
  • Maria Dimakopoulou
  • Wei Xu
  • Zhengyuan Zhou

We consider the concurrent reinforcement learning problem where $n$ agents simultaneously learn to make decisions in the same environment by sharing experience with each other. Existing works in this emerging area have empirically demonstrated that Thompson sampling (TS) based algorithms provide a particularly attractive alternative for inducing cooperation, because each agent can independently sample a belief environment (and compute a corresponding optimal policy) from the joint posterior computed by aggregating all agents' data, which induces diversity in exploration among agents while benefiting from the shared experience of all agents. However, theoretical guarantees in this area remain under-explored; in particular, no regret bound is known for TS-based concurrent RL algorithms. In this paper, we fill this gap by considering two settings. In the first, we study the simple finite-horizon episodic RL setting, where TS is naturally adapted into the concurrent setup by having each agent sample from the current joint posterior at the beginning of each episode. We establish a $\tilde{O}(HS\sqrt{\frac{AT}{n}})$ per-agent regret bound, where $H$ is the horizon of the episode, $S$ is the number of states, $A$ is the number of actions, $T$ is the number of episodes and $n$ is the number of agents. In the second setting, we consider the infinite-horizon RL problem, where a policy is measured by its long-run average reward. Here, despite the absence of natural episodic breakpoints, we show that by using a doubling-horizon schedule, we can adapt TS to the infinite-horizon concurrent learning setting to achieve a regret bound of $\tilde{O}(DS\sqrt{ATn})$, where $D$ is the standard notion of diameter of the underlying MDP and $T$ is the number of timesteps. Note that in both settings, the per-agent regret decreases at an optimal rate of $\Theta(\frac{1}{\sqrt{n}})$, which demonstrates the power of cooperation in concurrent RL.
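The core mechanism (each agent samples its own belief from one shared posterior, then every agent's data updates that posterior) is easiest to see in a much simpler setting than the paper's MDPs. The sketch below uses a Bernoulli multi-armed bandit with a Beta posterior per arm. The bandit setting, the arm means, and the function name are hypothetical simplifications, not the paper's algorithm or analysis.

```python
import random
from typing import List


def concurrent_thompson(
    true_means: List[float], n_agents: int, rounds: int, seed: int = 0
) -> List[int]:
    """Concurrent Thompson sampling on a Bernoulli bandit; returns pulls per arm."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1.0] * k  # shared Beta(1, 1) prior, updated with every agent's data
    beta = [1.0] * k
    pulls = [0] * k
    for _ in range(rounds):
        # Each agent independently samples its own belief from the joint posterior,
        # which diversifies exploration across agents.
        choices = []
        for _ in range(n_agents):
            sample = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
            choices.append(max(range(k), key=lambda a: sample[a]))
        # All agents' outcomes are pooled into the shared posterior.
        for a in choices:
            reward = 1.0 if rng.random() < true_means[a] else 0.0
            alpha[a] += reward
            beta[a] += 1.0 - reward
            pulls[a] += 1
    return pulls
```

With more agents, the shared posterior concentrates faster per round. This is the intuition behind the per-agent regret shrinking as $n$ grows, though the bandit sketch does not establish the bound itself.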

NeurIPS Conference 2022 Conference Paper

Towards Safe Reinforcement Learning with a Safety Editor Policy

  • Haonan Yu
  • Wei Xu
  • Haichao Zhang

We consider the safe reinforcement learning (RL) problem of maximizing utility with extremely low constraint violation rates. Assuming no prior knowledge or pre-training of the environment safety model for a given task, an agent has to learn, via exploration, which states and actions are safe. A popular approach in this line of research is to combine a model-free RL algorithm with the Lagrangian method to dynamically adjust the weight of the constraint reward relative to the utility reward. It relies on a single policy to handle the conflict between utility and constraint rewards, which is often challenging. We present SEditor, a two-policy approach that learns a safety editor policy transforming potentially unsafe actions proposed by a utility maximizer policy into safe ones. The safety editor is trained to maximize the constraint reward while minimizing a hinge loss of the utility state-action values before and after an action is edited. SEditor extends existing safety layer designs, which assume simplified safety models, to general safe RL scenarios where the safety model can in theory be arbitrarily complex. As a first-order method, it is easy to implement and efficient for both inference and training. On 12 Safety Gym tasks and 2 safe racing tasks, SEditor obtains a much higher overall safety-weighted-utility (SWU) score than the baselines, and demonstrates outstanding utility performance with constraint violation rates as low as once per 2k time steps, even in obstacle-dense environments. On some tasks, this low violation rate is up to 200 times lower than that of an unconstrained RL method with similar utility performance. Code is available at https://github.com/hnyu/seditor.

JBHI Journal 2021 Journal Article

Attention-Guided Deep Neural Network With Multi-Scale Feature Fusion for Liver Vessel Segmentation

  • Qingsen Yan
  • Bo Wang
  • Wei Zhang
  • Chuan Luo
  • Wei Xu
  • Zhengqing Xu
  • Yanning Zhang
  • Qinfeng Shi

Liver vessel segmentation is fast becoming a key instrument in the diagnosis and surgical planning of liver diseases. In clinical practice, liver vessels are normally manually annotated by clinicians on each slice of CT images, which is extremely laborious. Several deep learning methods exist for liver vessel segmentation; however, improving segmentation performance remains a major challenge due to the large variations and complex structure of liver vessels. Previous methods mainly use the existing UNet architecture, but not all features of the encoder are useful for segmentation, and some even cause interference. To overcome this problem, we propose a novel deep neural network for liver vessel segmentation, called LVSNet, which employs special designs to obtain the accurate structure of the liver vessels. Specifically, we design an Attention-Guided Concatenation (AGC) module to adaptively select useful context features from low-level features under the guidance of high-level features. The proposed AGC module focuses on capturing rich complementary information to obtain more details. In addition, we introduce an innovative multi-scale fusion block by constructing hierarchical residual-like connections within a single residual block, which is of great importance for effectively linking local blood vessel fragments together. Furthermore, we construct a new dataset containing 40 thin-slice cases (0.625 mm) consisting of CT volumes and annotated vessels. To evaluate the effectiveness of the method on minor vessels, we also propose an automatic stratification method to split major and minor liver vessels. Extensive experimental results demonstrate that the proposed LVSNet outperforms previous methods on liver vessel segmentation datasets. Additionally, we conduct a series of ablation studies that comprehensively support the superiority of the underlying concepts.

ICLR Conference 2021 Conference Paper

Mutual Information State Intrinsic Control

  • Rui Zhao 0011
  • Yang Gao 0029
  • Pieter Abbeel
  • Volker Tresp
  • Wei Xu

Reinforcement learning has been shown to be highly successful at many challenging tasks. However, success heavily relies on well-shaped rewards. Intrinsically motivated RL attempts to remove this constraint by defining an intrinsic reward function. Motivated by the self-consciousness concept in psychology, we make a natural assumption that the agent knows what constitutes itself, and propose a new intrinsic objective that encourages the agent to have maximum control over the environment. We mathematically formalize this reward as the mutual information between the agent state and the surrounding state under the current agent policy. With this new intrinsic motivation, we are able to outperform previous methods, including being able to complete the pick-and-place task for the first time without using any task reward. A video showing experimental results is available at https://youtu.be/AUCwc9RThpk.

NeurIPS Conference 2021 Conference Paper

TAAC: Temporally Abstract Actor-Critic for Continuous Control

  • Haonan Yu
  • Wei Xu
  • Haichao Zhang

We present temporally abstract actor-critic (TAAC), a simple but effective off-policy RL algorithm that incorporates closed-loop temporal abstraction into the actor-critic framework. TAAC adds a second-stage binary policy to choose between the previous action and a new action output by an actor. Crucially, its "act-or-repeat" decision hinges on the actually sampled action instead of the expected behavior of the actor. This post-acting switching scheme lets the overall policy make more informed decisions. TAAC has two important features: a) persistent exploration, and b) a new compare-through Q operator for multi-step TD backup, specially tailored to the action repetition scenario. We demonstrate TAAC's advantages over several strong baselines across 14 continuous control tasks. Surprisingly, we find that while achieving top performance, TAAC is able to "mine" a significant number of repeated actions with the trained policy, even on continuous tasks whose problem structures on the surface seem to repel action repetition. This suggests that, aside from encouraging persistent exploration, action repetition can find its place in good policy behavior. Code is available at https://github.com/hnyu/taac.
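The "act-or-repeat" step described above can be sketched as follows. The actor first samples a concrete new action. A binary choice then compares the critic's value of that specific action against the value of repeating the previous action. Here the binary policy is assumed to be a Boltzmann (logistic) choice over the two Q-values. The temperature, the callables, and the function names are our assumptions, not TAAC's exact parameterization.

```python
import math
import random
from typing import Callable, Sequence, Tuple

Action = Tuple[float, ...]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def act_or_repeat(
    state: Sequence[float],
    prev_action: Action,
    actor: Callable[[Sequence[float]], Action],
    q_fn: Callable[[Sequence[float], Action], float],
    temperature: float,
    rng: random.Random,
) -> Tuple[Action, bool]:
    """Second-stage binary choice; returns (action_to_execute, repeated?)."""
    new_action = actor(state)  # the action actually sampled by the actor
    # Compare the two concrete candidates, not the actor's expected behavior.
    z = (q_fn(state, new_action) - q_fn(state, prev_action)) / temperature
    if rng.random() < _sigmoid(z):
        return new_action, False
    return prev_action, True
```

Because the decision is made after sampling, a poor draw from the actor can be rejected in favor of repeating, which is what the abstract calls the post-acting switching scheme.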

AAAI Conference 2020 Conference Paper

Discourse Level Factors for Sentence Deletion in Text Simplification

  • Yang Zhong
  • Chao Jiang
  • Wei Xu
  • Junyi Jessy Li

This paper presents a data-driven study focusing on analyzing and predicting sentence deletion (a prevalent but understudied phenomenon in document simplification) on a large English text simplification corpus. We inspect various document and discourse factors associated with sentence deletion, using a new manually annotated sentence alignment corpus we collected. We reveal that professional editors utilize different strategies to meet readability standards of elementary and middle schools. To predict whether a sentence will be deleted during simplification to a certain level, we harness automatically aligned data to train a classification model. Evaluated on our manually annotated data, our best models reached F1 scores of 65.2 and 59.7 for this task at the levels of elementary and middle school, respectively. We find that discourse level factors contribute to the challenging task of predicting sentence deletion for simplification.

IJCAI Conference 2020 Conference Paper

Feature Statistics Guided Efficient Filter Pruning

  • Hang Li
  • Chen Ma
  • Wei Xu
  • Xue Liu

Building compact convolutional neural networks (CNNs) with reliable performance is a critical but challenging task, especially when deploying them in real-world applications. As a common approach to reducing the size of CNNs, pruning methods delete some of the CNN filters according to metrics such as the l1-norm. However, previous methods rarely leverage the information variance within a single feature map or the similarity among feature maps. In this paper, we propose a novel filter pruning method that incorporates two kinds of feature map selection: diversity-aware selection (DFS) and similarity-aware selection (SFS). DFS aims to discover features with low information diversity, while SFS removes features that are highly similar to others. We conduct extensive experiments with various CNN architectures on publicly available datasets. The experimental results demonstrate that our model achieves up to a 91.6% parameter reduction and an 83.7% FLOPs reduction with almost no accuracy loss.
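The two selection criteria can be illustrated on flattened feature maps. The abstract does not say which statistics are used. In this sketch, the variance of a map's activations is a hypothetical proxy for "information diversity", and cosine similarity is a hypothetical proxy for "similarity with others". The thresholds and function names are also assumptions.

```python
import math
from typing import List, Sequence


def _variance(xs: Sequence[float]) -> float:
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def filters_to_prune(feature_maps: List[Sequence[float]],
                     min_var: float = 1e-3, max_sim: float = 0.95) -> List[int]:
    """Indices of filters whose (flattened) output maps should be removed."""
    pruned = set()
    # Diversity-aware step: a near-constant map carries little information.
    for i, fm in enumerate(feature_maps):
        if _variance(fm) < min_var:
            pruned.add(i)
    # Similarity-aware step: drop maps that nearly duplicate an already-kept map.
    kept: List[int] = []
    for i, fm in enumerate(feature_maps):
        if i in pruned:
            continue
        if any(_cosine(fm, feature_maps[j]) > max_sim for j in kept):
            pruned.add(i)
        else:
            kept.append(i)
    return sorted(pruned)
```

Removing filter `i` also removes the corresponding input channel of the next layer. That downstream removal is where the parameter and FLOPs savings come from, and it is not shown here.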

IJCAI Conference 2020 Conference Paper

Financial Risk Prediction with Multi-Round Q&A Attention Network

  • Zhen Ye
  • Yu Qin
  • Wei Xu

Financial risk is an essential indicator for investment and can help investors understand the market and companies better. Among the many factors influencing financial risk, researchers have found the earnings conference call to be the most significant. Predicting financial volatility after the earnings conference call is critical to beneficiaries, including investors and company managers. However, previous work mainly focuses on feature extraction at the word level or document level, ignoring the vital structure of conference calls: the alternating dialogue. In this paper, we introduce our Multi-Round Q&A Attention Network, which explicitly takes this dialogue form into account. Based on earnings call transcripts, we apply our model to extract features from each round of dialogue through a bidirectional attention mechanism and predict the volatility after earnings conference call events. The results show that our model significantly outperforms the previous state-of-the-art methods and other baselines in three different periods.

IJCAI Conference 2020 Conference Paper

Multi-hop Reading Comprehension across Documents with Path-based Graph Convolutional Network

  • Zeyun Tang
  • Yongliang Shen
  • Xinyin Ma
  • Wei Xu
  • Jiale Yu
  • Weiming Lu

Multi-hop reading comprehension across multiple documents has attracted much attention recently. In this paper, we propose a novel approach to this multi-hop reading comprehension problem. Inspired by the human reasoning process, we introduce a path-based graph with reasoning paths extracted from supporting documents. The path-based graph combines the ideas of graph-based and path-based approaches, which makes it better suited to multi-hop reasoning. Meanwhile, we propose Gated-GCN to accumulate evidence on the path-based graph; it contains a new question-aware gating mechanism that regulates the usefulness of information propagating across documents and adds question information during reasoning. We evaluate our approach on the WikiHop dataset, where it achieves state-of-the-art accuracy compared with previously published approaches. In particular, our ensemble model surpasses human performance by 4.2%.

AAAI Conference 2019 Conference Paper

RSA: Byzantine-Robust Stochastic Aggregation Methods for Distributed Learning from Heterogeneous Datasets

  • Liping Li
  • Wei Xu
  • Tianyi Chen
  • Georgios B. Giannakis
  • Qing Ling

In this paper, we propose a class of robust stochastic subgradient methods for distributed learning from heterogeneous datasets in the presence of an unknown number of Byzantine workers. During the learning process, the Byzantine workers may send arbitrary incorrect messages to the master due to data corruption, communication failures or malicious attacks, and consequently bias the learned model. The key to the proposed methods is a regularization term incorporated into the objective function so as to robustify the learning task and mitigate the negative effects of Byzantine attacks. The resultant subgradient-based algorithms are termed Byzantine-Robust Stochastic Aggregation methods, justifying our acronym RSA used henceforth. In contrast to most existing algorithms, RSA does not rely on the assumption that the data are independent and identically distributed (i.i.d.) across the workers, and hence suits a wider class of applications. Theoretically, we show that: i) RSA converges to a near-optimal solution with a learning error that depends on the number of Byzantine workers; ii) the convergence rate of RSA under Byzantine attacks is the same as that of the stochastic gradient descent method free of Byzantine attacks. Numerically, experiments on a real dataset corroborate the competitive performance of RSA and a complexity reduction compared with state-of-the-art alternatives.
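To see why a regularization term can blunt Byzantine messages, consider an l1-norm penalty $\lambda\sum_i |x_0 - x_i|$ coupling the master model $x_0$ to worker models $x_i$. Each received message then enters the master update only through $\mathrm{sign}(x_0 - x_i)$, so a single worker can move the master by at most $\lambda$ times the step size, however extreme its value. The sketch below is a scalar toy instance of that idea as we understand it, not the paper's full algorithm. The quadratic local losses, $\lambda$, the step size, the zero master loss, and the attack values are all hypothetical.

```python
from typing import List, Tuple


def _sign(x: float) -> int:
    return (x > 0) - (x < 0)


def rsa_l1(targets: List[float], byzantine_values: List[float], lam: float = 0.5,
           step: float = 0.01, iters: int = 5000) -> Tuple[float, List[float]]:
    """Toy l1-penalized RSA run: honest worker i minimizes 0.5 * (x - targets[i])**2.

    The master's own loss is taken as zero. Returns (master model, honest models).
    """
    x0 = 0.0
    xs = [0.0] * len(targets)
    for _ in range(iters):
        # Byzantine workers may send anything; here they send fixed values.
        received = xs + list(byzantine_values)
        # Master: every message enters only through sign(), bounding its influence.
        new_x0 = x0 - step * lam * sum(_sign(x0 - xi) for xi in received)
        # Honest workers: local gradient plus the l1 pull toward the master.
        xs = [xi - step * ((xi - c) + lam * _sign(xi - x0)) for xi, c in zip(xs, targets)]
        x0 = new_x0
    return x0, xs
```

For example, `rsa_l1([1.0, 2.0, 3.0], [1e6])` keeps the master inside the honest workers' range. Averaging the four received messages would instead give roughly 250,000.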

AAAI Conference 2018 Conference Paper

Unsupervised Learning of Geometry From Videos With Edge-Aware Depth-Normal Consistency

  • Zhenheng Yang
  • Peng Wang
  • Wei Xu
  • Liang Zhao
  • Ramakant Nevatia

Learning to reconstruct depth from a single image by watching unlabeled videos via a deep convolutional network (DCN) has attracted significant attention in recent years, e.g., (Zhou et al. 2017). In this paper, we propose to use a surface normal representation in an unsupervised depth estimation framework. Our estimated depths are constrained to be compatible with the predicted normals, yielding more robust geometry. Specifically, we formulate an edge-aware depth-normal consistency term and solve it by constructing a depth-to-normal layer and a normal-to-depth layer inside the DCN. The depth-to-normal layer takes estimated depths as input and computes normal directions using cross products based on neighboring pixels. Then, given the estimated normals, the normal-to-depth layer outputs a regularized depth map through local planar smoothness. Both layers are computed with awareness of edges inside the image to help address depth/normal discontinuities and preserve sharp edges. Finally, to train the network, we apply the photometric error and a gradient smoothness term to supervise both depth and normal predictions. We conducted experiments on both outdoor (KITTI) and indoor (NYUv2) datasets and showed that our algorithm vastly outperforms the state of the art, which demonstrates the benefits of our approach.
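The depth-to-normal computation can be illustrated for a single pixel. The pixel and two neighbors are back-projected to 3D with a pinhole camera model. The cross product of the two difference vectors gives the surface normal. This sketch assumes pinhole intrinsics `(fx, fy, cx, cy)` and uses a single right/down neighbor pair. The layer described in the paper aggregates over neighbors with edge-aware weights, which is omitted here, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def backproject(u: int, v: int, depth: float, fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Pinhole back-projection of pixel (u, v) at the given depth."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def normal_at(depth: List[List[float]], u: int, v: int, intrinsics: Sequence[float]) -> Vec3:
    """Unit normal at (u, v) from its right and lower neighbours (no edge weighting)."""
    fx, fy, cx, cy = intrinsics
    p = backproject(u, v, depth[v][u], fx, fy, cx, cy)
    p_right = backproject(u + 1, v, depth[v][u + 1], fx, fy, cx, cy)
    p_down = backproject(u, v + 1, depth[v + 1][u], fx, fy, cx, cy)
    n = _cross(_sub(p_right, p), _sub(p_down, p))
    norm = math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)
    # With x right, y down and z forward, this orientation points away from the
    # camera (+z for a fronto-parallel plane); negate it for camera-facing normals.
    return (n[0] / norm, n[1] / norm, n[2] / norm)
```

On a constant-depth map, `normal_at` returns `(0.0, 0.0, 1.0)`, the normal of a plane facing the camera head-on. Any depth gradient between neighbors tilts the normal accordingly.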

TIST Journal 2018 Journal Article

Visual Analytics of Heterogeneous Data Using Hypergraph Learning

  • Cong Xie
  • Wen Zhong
  • Wei Xu
  • Klaus Mueller

For real-world learning tasks (e.g., classification), graph-based models are commonly used to fuse the information distributed in diverse data sources, which can be heterogeneous, redundant, and incomplete. These models represent the relations in different datasets as pairwise links. However, these links cannot deal with high-order relations which connect multiple objects (e.g., in public health datasets, more than two patient groups admitted by the same hospital in 2014). In this article, we propose a visual analytics approach for the classification on heterogeneous datasets using the hypergraph model. The hypergraph is an extension to traditional graphs in which a hyperedge connects multiple vertices instead of just two. We model various high-order relations in heterogeneous datasets as hyperedges and fuse different datasets with a unified hypergraph structure. We use the hypergraph learning algorithm for predicting missing labels in the datasets. To allow users to inject their domain knowledge into the model-learning process, we augment the traditional learning algorithm in a number of ways. In addition, we also propose a set of visualizations which enable the user to construct the hypergraph structure and the parameters of the learning model interactively during the analysis. We demonstrate the capability of our approach via two real-world cases.

JMLR Journal 2017 Journal Article

A General Distributed Dual Coordinate Optimization Framework for Regularized Loss Minimization

  • Shun Zheng
  • Jialei Wang
  • Fen Xia
  • Wei Xu
  • Tong Zhang

In modern large-scale machine learning applications, the training data are often partitioned and stored on multiple machines. It is customary to employ the data parallelism approach, where the aggregated training loss is minimized without moving data across machines. In this paper, we introduce a novel distributed dual formulation for regularized loss minimization problems that can directly handle data parallelism in the distributed setting. This formulation allows us to systematically derive dual coordinate optimization procedures, which we refer to as Distributed Alternating Dual Maximization (DADM). The framework extends earlier studies described in (Boyd et al., 2011; Ma et al., 2017; Jaggi et al., 2014; Yang, 2013) and has rigorous theoretical analyses. Moreover, with the help of the new formulation, we develop the accelerated version of DADM (Acc-DADM) by generalizing the acceleration technique from (Shalev-Shwartz and Zhang, 2014) to the distributed setting. We also provide theoretical results for the proposed accelerated version, and the new result improves on previous ones (Yang, 2013; Ma et al., 2017) whose iteration complexities grow linearly with the condition number. Our empirical studies validate our theory and show that our accelerated approach significantly improves on the previous state-of-the-art distributed dual coordinate optimization algorithms.

AAAI Conference 2017 Short Paper

Discovering Conversational Dependencies between Messages in Dialogs

  • Wenchao Du
  • Pascal Poupart
  • Wei Xu

We investigate the task of inferring conversational dependencies between messages in one-on-one online chat, which has become one of the most popular forms of customer service. We propose a novel probabilistic classifier that leverages conversational, lexical and semantic information. The approach is evaluated empirically on a set of customer service chat logs from a Chinese e-commerce website. It outperforms heuristic baselines.

IJCAI Conference 2017 Conference Paper

Joint Training for Pivot-based Neural Machine Translation

  • Yong Cheng
  • Qian Yang
  • Yang Liu
  • Maosong Sun
  • Wei Xu

While recent neural machine translation approaches have delivered state-of-the-art performance for resource-rich language pairs, they suffer from the data scarcity problem for resource-scarce language pairs. Although this problem can be alleviated by exploiting a pivot language to bridge the source and target languages, the source-to-pivot and pivot-to-target translation models are usually independently trained. In this work, we introduce a joint training algorithm for pivot-based neural machine translation. We propose three methods to connect the two models and enable them to interact with each other during training. Experiments on Europarl and WMT corpora show that joint training of source-to-pivot and pivot-to-target models leads to significant improvements over independent training across various languages.

AAAI Conference 2017 Conference Paper

Maximum Reconstruction Estimation for Generative Latent-Variable Models

  • Yong Cheng
  • Yang Liu
  • Wei Xu

Generative latent-variable models are important for natural language processing because they can provide compact representations of data. Conventional maximum likelihood estimation (MLE) tends to focus on explaining irrelevant but common correlations in data. We therefore apply maximum reconstruction estimation (MRE) instead to learn generative latent-variable models. MRE seeks the model parameters that maximize the probability of reconstructing the observed data. We develop tractable algorithms that directly learn hidden Markov models and IBM translation models under the MRE criterion, without introducing a separate reconstruction model to facilitate efficient inference. Experiments on unsupervised part-of-speech induction and unsupervised word alignment show that our approach enables generative latent-variable models to better discover intended correlations in data and significantly outperforms maximum likelihood estimators.

AAAI Conference 2016 Conference Paper

Discovering User Attribute Stylistic Differences via Paraphrasing

  • Daniel Preotiuc-Pietro
  • Wei Xu
  • Lyle Ungar

User attribute prediction from social media text has proven successful and useful for downstream tasks. In previous studies, differences in user trait language use have been limited primarily to the presence or absence of words that indicate topical preferences. In this study, we aim to find linguistic style distinctions across three different user attributes: gender, age and occupational class. By combining paraphrases with a simple yet effective method, we capture a wide set of stylistic differences that are exempt from topic bias. We show their predictive power in user profiling, conformity with human perception and psycholinguistic hypotheses, and potential use in generating natural language tailored to specific user traits.

NeurIPS Conference 2015 Conference Paper

Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering

  • Haoyuan Gao
  • Junhua Mao
  • Jie Zhou
  • Zhiheng Huang
  • Lei Wang
  • Wei Xu

In this paper, we present the mQA model, which is able to answer questions about the content of an image. The answer can be a sentence, a phrase or a single word. Our model contains four components: a Long Short-Term Memory (LSTM) to extract the question representation, a Convolutional Neural Network (CNN) to extract the visual representation, an LSTM for storing the linguistic context in an answer, and a fusing component to combine the information from the first three components and generate the answer. We construct a Freestyle Multilingual Image Question Answering (FM-IQA) dataset to train and evaluate our mQA model. It contains over 150,000 images and 310,000 freestyle Chinese question-answer pairs and their English translations. Human judges evaluate the quality of the answers generated by our mQA model on this dataset through a Turing Test. Specifically, we mix the answers provided by humans and by our model, and the judges must distinguish our model's answers from the human ones. They also provide a score (i.e. 0, 1, 2; the larger the better) indicating the quality of each answer. We propose strategies to monitor the quality of this evaluation process. The experiments show that in 64.7% of cases, the human judges cannot distinguish our model from humans. The average score is 1.454 (1.918 for humans). The details of this work, including the FM-IQA dataset, can be found on the project page: http://idl.baidu.com/FM-IQA.html.

ICRA Conference 2011 Conference Paper

An improved ZMP trajectory design for the biped robot BHR

  • Wei Xu
  • Qiang Huang 0002
  • Jing Li 0074
  • Zhangguo Yu
  • Xuechao Chen
  • Qian Xu

This paper designs an improved ZMP (Zero Moment Point) trajectory for a biped robot that imitates the actual ZMP trajectory of a walking human. It also provides a new method of walking pattern generation based on a forward-moving ZMP in the SSP (Single Support Phase). During SSP, this method keeps the ZMP moving forward instead of staying at the center of the support region, which helps increase walking speed. We have been developing BHR, which has 38 DOFs (degrees of freedom). The effectiveness of the method is verified by simulation and by walking experiments on BHR.

EAAI Journal 2010 Journal Article

Real-time driving danger-level prediction

  • Jinjun Wang
  • Wei Xu
  • Yihong Gong

This paper introduces a driving danger-level prediction system that uses multiple sensor inputs and statistical modeling to predict driving risk. Three types of features were collected for the research: vehicle dynamic parameters, the driver's physiological data and the driver's behavioral features. To model the temporal patterns that lead to a safe or dangerous driving state, the paper evaluates several sequential supervised learning algorithms, including hidden Markov models, conditional random fields and reinforcement learning. Experimental results showed that a reinforcement-learning-based method using the vehicle dynamic parameter features outperforms the other algorithms. Adding the other two feature types further improved prediction accuracy. Based on these results, a live driving danger-level prediction prototype system was developed. Much previous research focused on monitoring the driver's vigilance level to infer the possibility of potential driving risk. Our live system, by contrast, is non-intrusive to the driver, which makes it well suited to driving danger prevention applications. A subjective online user study of the prototype system gave promising results.

NeurIPS Conference 2008 Conference Paper

Deep Learning with Kernel Regularization for Visual Recognition

  • Kai Yu
  • Wei Xu
  • Yihong Gong

In this paper we focus on training deep neural networks for visual recognition tasks. One challenge is the lack of an informative regularization on the network parameters that would impose meaningful control over the computed function. We propose a training strategy that takes advantage of kernel methods, where an existing kernel function represents useful prior knowledge about the learning task of interest. We derive an efficient algorithm using stochastic gradient descent and demonstrate very positive results on a wide range of visual recognition tasks.