Arrow Research search

Author name cluster

Xin Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

79 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

G-IR: Geometric Image Representation for Learning

  • Xin Chen
  • Qi Zhao
  • Wei Zeng
  • Zongben Xu

Images are generally represented by pixel intensities or color values, which are usually used as direct inputs for learning. This study innovatively proposes a geometric image representation method and refreshes the general learning model (e.g., autoencoder) in the diffeomorphic space. Based on the theory of geometric optimal transport and quasiconformal mapping, we equivalently transform the intensity representation into a shape representation. The image space becomes a diffeomorphic space, where any image can be uniquely represented as a Beltrami coefficient function defined on a uniform grid reference, and vice versa. This innovative geometric image representation (G-IR) captures the fine-grained structure inherent in the entire image, which is different from the traditional feature extraction that focuses on the internal geometric objects of the image (such as boundaries and axes). The diffeomorphic property preserves structure in the generation process, which is very necessary in the field of real physics. It can be assembled into existing pipelines as a plug-in, providing structure-preserving properties for the entire framework. Experiments on image restoration and interpolation validated the high efficiency, efficacy and applicability of the G-IR method, demonstrating its superior performance compared to common pixel-level image appearance representations.

AAAI Conference 2026 Conference Paper

Parallelizable Riemannian Alternating Direction Method of Multipliers for Non-convex Pose Graph Optimization

  • Xin Chen
  • Chunfeng Cui
  • Deren Han
  • Liqun Qi

Pose graph optimization (PGO) is fundamental to robot perception and navigation systems, serving as the mathematical backbone for solving simultaneous localization and mapping (SLAM). Existing solvers suffer from polynomial growth in computational complexity with graph size, hindering real-time deployment in large-scale scenarios. In this paper, by duplicating variables and introducing equality constraints, we reformulate the problem and propose a Parallelizable Riemannian Alternating Direction Method of Multipliers (PRADMM) to solve it efficiently. Compared with the state-of-the-art methods that usually exhibit polynomial time complexity growth with graph size, PRADMM enables efficient parallel computation across vertices regardless of graph size. Crucially, all subproblems admit closed-form solutions, ensuring PRADMM maintains exceptionally stable performance. Furthermore, by carefully exploiting the structures of the coefficient matrices in the constraints, we establish the global convergence of PRADMM under mild conditions, enabling larger relaxation step sizes within the interval (0,2). Extensive empirical validation on two synthetic datasets and multiple real-world 3D SLAM benchmarks confirms the superior computational performance of PRADMM.

EAAI Journal 2026 Journal Article

Physics-aware dynamic graph embedding with contrastive feature alignment for transient stability prediction under grid topology variations

  • Zijian Lyu
  • Xin Chen
  • Gengfeng Li

Accurate, low-latency online prediction of transient stability is essential for secure operation of modern power systems subject to disturbances. Although data-driven deep learning methods have shown strong predictive performance, their accuracy often deteriorates when system parameters or operating conditions change, particularly under grid topology variations. This lack of adaptability limits their effectiveness in real-world applications. By introducing a physics-informed inductive bias derived from multi-machine swing dynamics, this paper proposes a physics-aware Dynamic Graph Embedding (DGE) that encodes time-synchronized phasor measurement unit (PMU) signals together with network structural information into compact, node-wise representations, and a DGE-based Supervised Contrastive Learning (DGE-SCL) framework utilizing a lightweight Convolutional Neural Network (CNN) backbone. This framework combines topology-invariant data augmentation with supervised contrastive feature learning to obtain topology-robust, class-discriminative embeddings. These components are applied to real-time transient-stability classification, enabling efficient transfer of pretrained predictors across different network configurations. The method is evaluated on the Institute of Electrical and Electronics Engineers (IEEE) 39- and 145-bus test systems under multiple N-1 and N-m-1 topology scenarios; results show consistently improved generalization and robustness compared to baselines while maintaining low inference latency suitable for near-real-time deployment.

AAAI Conference 2026 Conference Paper

PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis

  • Jiao Xu
  • Junwei Liu
  • Jiangwei Lao
  • Qi Zhu
  • Yunpeng Zhao
  • Congyun Jin
  • Shinan Liu
  • Zhihong Lu

Recent advances in medical multi-modal models focus on specialized image analysis like dermatology, pathology, or radiology. However, they do not fully capture the complexity of real-world clinical diagnostics, which involve heterogeneous inputs and require ongoing contextual understanding during patient-physician interactions. To bridge this gap, we introduce PulseMind, a new family of multi-modal diagnostic models that integrates a systematically curated dataset, a comprehensive evaluation benchmark, and a tailored training framework. Specifically, we first construct a diagnostic dataset, MediScope, which comprises 98,000 real-world multi-turn consultations and 601,500 medical images, spanning over 10 major clinical departments and more than 200 sub-specialties. Then, to better reflect the requirements of real-world clinical diagnosis, we develop the PulseMind Benchmark, a multi-turn diagnostic consultation benchmark with a four-dimensional evaluation protocol comprising proactiveness, accuracy, usefulness, and language quality. Finally, we design a training framework tailored for multi-modal clinical diagnostics, centered around a core component named Comparison-based Reinforcement Policy Optimization (CRPO). Compared to absolute score rewards, CRPO uses relative preference signals from multi-dimensional comparisons to provide stable and human-aligned training guidance. Extensive experiments demonstrate that PulseMind achieves competitive performance on both the diagnostic consultation benchmark and public medical benchmarks.

JBHI Journal 2026 Journal Article

Source-Resilient Joint Learning Framework for Preserving Stable Generalization on Diverse Ultrasonic Source Scenarios

  • Bin Huang
  • Zhong Liu
  • Ziyue Xu
  • Shing-Chow Chan
  • Huiying Wen
  • Chao HOU
  • Qicai Huang
  • Meiqin Jiang

Joint learning on diverse ultrasonic source scenarios presents a challenge in preserving stable generalization due to the combination of heterogeneity of different sources and the inconsistency of joint learning features. Previous joint learning studies, which are not source-resilient frameworks, may not preserve stable generalization when trained on diverse source scenarios. Furthermore, the limited variations in single-source data and the interference from ultrasound imaging, which are common in ultrasonic source scenarios, further decrease generalization. To address these problems, we proposed a source-resilient joint learning framework consisting of three stages: 1) Source transforming, where our 1-to-N transformation unifies diverse source scenarios for source-resiliency. 2) Our feature enhancement modules model the source-resilient joint learning network, including a manifold-constraint normalization module (MCNM) for addressing heterogeneity by minimizing manifold-based loss, a task-consistent attention module (TCAM) that shares the multi-scale features with self-attention to address inconsistency, and an adaptive feature-shifting module (AFSM) for feature-level augmentation to overcome the limited variation of single-source data. 3) Our ultrasound-hybrid linear mapping (USmapping) cascades speckle randomization and mask-guiding Monge-Kantorovitch linear mapping to achieve ultrasonic style randomization for addressing the interference of ultrasonic data. Our framework was evaluated on eight ultrasound datasets from various scanners at multiple centers and surpassed previous comparable studies in both segmentation (DSC $WAvg$ of 75.7%) and classification (AUROC $WAvg$ of 68.8%) tasks. Our framework has the potential to serve as a general framework for enhancing the performance of joint learning under diverse ultrasonic source scenarios.

EAAI Journal 2025 Journal Article

A cross-domain multi-scale feature fusion network based on graph convolution for intelligent fault diagnosis

  • Quanyu Zhong
  • Qiang Li
  • Junxiao Ren
  • Xin Chen
  • Dong Liu
  • Qiang Yang

Fault diagnosis of mechanical equipment plays a critical role in enhancing system stability and ensuring operational safety. Multi-modal monitoring data provides a more comprehensive view of the equipment’s condition, enabling more precise state diagnosis through the processing of such data. A major challenge in contemporary multi-modal information fusion lies in the effective integration of cross-modal information. To overcome the inherent limitations of single-view approaches often found in existing multi-modal methods, this paper presents a multi-view, multi-modal information fusion approach based on graph convolutional networks. This method not only extracts key features from signals efficiently but also uncovers the interrelationships between different modal signals through a multi-view interaction mechanism, achieving more robust information fusion and enhanced fault diagnosis performance. This method consists of three core modules: feature extraction, information fusion, and classification. The feature extraction module utilizes a multi-level residual architecture, Mini-Long Short Term Memory, and self-attention mechanisms to capture both local and global signal features, model temporal dependencies, and refine feature selection. The information fusion module combines data and feature-level signal correlations using graph convolutional networks for cross-modal interaction. The classification module employs a two-layer fully connected network for fault diagnosis. This method quantitatively assesses the contribution of each modality, providing a theoretical basis for understanding their physical significance. Experimental results and ablation studies demonstrate its superior performance and enhanced accuracy in fault diagnosis, offering a novel approach for condition monitoring of complex equipment.

YNIMG Journal 2025 Journal Article

Changes in white matter predict efficacy of repetitive transcranial magnetic stimulation in Parkinson's disease

  • Jinying Han
  • Lingling Lv
  • Xin Chen
  • Mengqi Wang
  • Lili Hu
  • Fengbo Xing
  • Pingping Liu
  • Liuzhenxiong Yu

BACKGROUND: The efficacy of continuous theta burst stimulation (cTBS) in Parkinson's disease (PD) exhibits considerable variability. Emerging evidence links changes in brain white matter (WM) activity to the onset and progression of PD, offering novel insights into its pathophysiology. OBJECTIVE: Exploring activity patterns within different WM regions to predict the therapeutic efficacy of cTBS. METHODS: This retrospective study included 68 patients with PD who underwent a 14-day cTBS targeting the supplementary motor area (25,200 pulses). Patients were classified as responders (R, n = 20) or non-responders (NR, n = 48) based on whether their UPDRS III score improved by ≥30 %. Pre-intervention differences in WM amplitude of low-frequency fluctuations (ALFF) and fractional ALFF in fMRI were analyzed, along with their correlation with motor symptom improvement. A support vector machine (SVM) model was developed to predict cTBS efficacy and validated in an independent cohort (n = 22). RESULTS: Compared to the NR group, R patients exhibited greater improvements in rigidity and axial symptoms, accompanied by lower baseline ALFF in multiple WM tracts. SVM analysis identified higher baseline UPDRS III and rigidity scores, along with reduced ALFF in the left corticospinal tract, right ILF, and left anterior thalamic radiation, as predictors of better motor outcomes. In an independent cohort, predicted and actual UPDRS III improvements showed a concordance correlation coefficient (CCC) of 0.630. A combined model incorporating rigidity scores and ILF_R ALFF achieved moderate accuracy in predicting rigidity improvement (CCC = 0.725). CONCLUSION: Baseline WM function may serve as a biomarker for predicting motor response to cTBS.
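For readers unfamiliar with the agreement metric quoted above, the concordance correlation coefficient is conventionally defined as follows (Lin's definition). The abstract does not state which estimator the authors used, so this is given as the standard form only; $x$ and $y$ stand for the predicted and actual improvements, with means $\mu$, variances $\sigma^2$, and Pearson correlation $\rho$.

```latex
\rho_c \;=\; \frac{2\,\rho\,\sigma_x \sigma_y}{\sigma_x^{2} + \sigma_y^{2} + (\mu_x - \mu_y)^{2}}
```

Unlike the Pearson correlation, $\rho_c$ is reduced by any shift in mean or difference in scale between predictions and observations, and it equals 1 only when the two agree exactly.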

IROS Conference 2025 Conference Paper

ChatBuilder: LLM-assisted Modular Robot Creation

  • Xin Chen
  • Xifeng Gao
  • Lifeng Zhu
  • Aiguo Song
  • Zherong Pan

Modular robotic structures simplify robot design and manufacturing by using standardized modules, enhancing flexibility and adaptability. However, the need for manual input in design and assembly limits their potential. Current methods to automate this process still require significant human effort and technical expertise. This paper introduces a novel approach that employs Large Language Models (LLMs) as intelligent agents to automate the creation of modular robotic structures. We decompose the modular robot creation task and develop two LLM-based agents to plan and assemble the modular robots from text prompts. By inputting a textual description, users can generate robot designs that are validated in both simulated and real-world environments. This method reduces the need for manual intervention and lowers the technical barrier to creating complex robotic systems.

AAAI Conference 2025 Conference Paper

CohEx: A Generalized Framework for Cohort Explanation

  • Fanyu Meng
  • Xin Liu
  • Zhaodan Kong
  • Xin Chen

eXplainable Artificial Intelligence (XAI) has garnered significant attention for enhancing transparency and trust in machine learning models. However, the scopes of most existing explanation techniques focus either on offering a holistic view of the explainee model (global explanation) or on individual instances (local explanation), while the middle ground, i.e., cohort-based explanation, is less explored. Cohort explanations offer insights into the explainee's behavior on a specific group or cohort of instances, enabling a deeper understanding of model decisions within a defined context. In this paper, we discuss the unique challenges and opportunities associated with measuring cohort explanations, define their desired properties, and create a generalized framework for generating cohort explanations based on supervised clustering.

ICML Conference 2025 Conference Paper

Disentangled Graph Spectral Domain Adaptation

  • Liang Yang 0002
  • Xin Chen
  • Jiaming Zhuo
  • Di Jin 0001
  • Chuan Wang 0002
  • Xiaochun Cao
  • Zhen Wang 0004
  • Yuanfang Guo

Distribution shifts and the scarcity of labels prevent graph learning methods, especially graph neural networks (GNNs), from generalizing across domains. Compared to Unsupervised Domain Adaptation (UDA) with embedding alignment, Unsupervised Graph Domain Adaptation (UGDA) becomes more challenging in light of the attribute and topology entanglement in the representation. Beyond embedding alignment, UGDA turns to topology alignment but is limited by the ability of the employed topology model and the estimation of pseudo labels. To alleviate this issue, this paper proposes a Disentangled Graph Spectral Domain Adaptation (DGSDA) method that disentangles attribute and topology alignments and directly aligns flexible graph spectral filters beyond topology. Specifically, Bernstein polynomial approximation, which mimics the behavior of the function to be approximated to a remarkable degree, is employed to capture complicated topology characteristics and avoid the expensive eigenvalue decomposition. Theoretical analysis reveals the tight GDA bound of DGSDA and the rationality of polynomial coefficient regularization. Quantitative and qualitative experiments justify the superiority of the proposed DGSDA.

NeurIPS Conference 2025 Conference Paper

DoDo-Code: an Efficient Levenshtein Distance Embedding-based Code for 4-ary IDS Channel

  • Alan J. X. Guo
  • Sihan Sun
  • Xiang Wei
  • Mengyi Wei
  • Xin Chen

With the emergence of new storage and communication methods, the insertion, deletion, and substitution (IDS) channel has attracted considerable attention. However, many topics on the IDS channel and the associated Levenshtein distance remain open, making the invention of a novel IDS-correcting code a hard task. Furthermore, current studies on single-IDS-correcting codes are misaligned with the requirements of applications, which necessitate the correction of multiple errors. Compromise solutions have involved shortening codewords to reduce the chance of multiple errors. However, the code rates of existing codes are poor at short lengths, diminishing the overall storage density. In this study, a novel method is introduced for designing high-code-rate single-IDS-correcting codewords through deep Levenshtein distance embedding. A deep learning model is utilized to project the sequences into embedding vectors that preserve the Levenshtein distances between the original sequences. This embedding space serves as a proxy for the complex Levenshtein domain, within which algorithms for codeword search and segment correction are developed. While the concept underpinning this approach is straightforward, it bypasses the mathematical challenges typically encountered in code design. The proposed method results in a code rate that outperforms existing combinatorial solutions, particularly for designing short-length codewords.
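As background for the distance this abstract refers to, the following is a minimal sketch of the classic dynamic-programming computation of Levenshtein distance. It illustrates the metric only, not the paper's learned embedding or code construction; the function name `levenshtein` and the example strings are chosen here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string is i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if symbols match)
            ))
        prev = curr
    return prev[-1]


# Sequences over a 4-ary alphabet, as in the IDS channel setting.
print(levenshtein("ACGT", "AGT"))   # one deletion
print(levenshtein("ACGT", "ACCT"))  # one substitution
```

The computation costs time proportional to the product of the two sequence lengths.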

YNIMG Journal 2025 Journal Article

Effect of intelligence quotient discrepancy on attention and executive function in children with attention deficit hyperactivity disorder: an fNIRS study

  • Xin Chen
  • Liang-liang Chen
  • Jing-rong Wang
  • Ying-ying Cai
  • Xiao-dan Yu

Intelligence quotient discrepancy (IQD) is associated with neurodevelopmental disorders, but its impact on attention and executive function (EF) deficits in children with attention deficit hyperactivity disorder (ADHD) is unknown. This study aimed to examine the effect of IQD by functional near-infrared spectroscopy (fNIRS). The current study included 114 children with ADHD and a full-scale IQ ≥ 70, encompassing verbal IQ (VIQ) and performance IQ (PIQ). Participants were divided based on IQ discrepancies into the NON-IQD (n = 60, |PIQ-VIQ| 1 standard deviation) groups, with 27 and 19 individuals undergoing fNIRS, respectively. Both the Behavior Rating Inventory of Executive Function (BRIEF) scale and fNIRS during a go/no-go task were utilized for the assessment of EF. Attention was measured with the Swanson, Nolan, and Pelham Version IV (SNAP-IV) scale, the Integrated Visual and Auditory Continuous Performance Test (IVA/CPT), and monitoring of 1-48 channel aberrant hemodynamics with fNIRS during the task. The study indicates that IQD plays a role in attention and EF impairment in children with ADHD, linked to abnormal hemodynamics in the right medial prefrontal cortex (RmPFC).

AAAI Conference 2025 Conference Paper

ERL-MPP: Evolutionary Reinforcement Learning with Multi-head Puzzle Perception for Solving Large-scale Jigsaw Puzzles of Eroded Gaps

  • Xingke Song
  • Xiaoying Yang
  • Chenglin Yao
  • Jianfeng Ren
  • Ruibin Bai
  • Xin Chen
  • Xudong Jiang

Solving jigsaw puzzles has been extensively studied. While most existing models focus on solving either small-scale puzzles or puzzles with no gap between fragments, solving large-scale puzzles with gaps presents distinctive challenges in both image understanding and combinatorial optimization. To tackle these challenges, we propose a framework of Evolutionary Reinforcement Learning with Multi-head Puzzle Perception (ERL-MPP) to derive a better set of swapping actions for solving the puzzles. Specifically, to tackle the challenges of perceiving the puzzle with gaps, a Multi-head Puzzle Perception Network (MPPN) with a shared encoder is designed, where multiple puzzlet heads comprehensively perceive the local assembly status, and a discriminator head provides a global assessment of the puzzle. To explore the large swapping action space efficiently, an Evolutionary Reinforcement Learning (EvoRL) agent is designed, where an actor recommends a set of suitable swapping actions from a large action space based on the perceived puzzle status, a critic updates the actor using the estimated rewards and the puzzle status, and an evaluator coupled with evolutionary strategies evolves the actions aligning with the historical assembly experience. The proposed ERL-MPP is comprehensively evaluated on the JPLEG-5 dataset with large gaps and the MIT dataset with large-scale puzzles. It significantly outperforms all state-of-the-art models on both datasets.

AAAI Conference 2025 Conference Paper

Exploring Enhanced Contextual Information for Video-Level Object Tracking

  • Ben Kang
  • Xin Chen
  • Simiao Lai
  • Yang Liu
  • Yi Liu
  • Dong Wang

Contextual information at the video level has become increasingly crucial for visual object tracking. However, existing methods typically use only a few tokens to convey this information, which can lead to information loss and limit their ability to fully capture the context. To address this issue, we propose a new video-level visual object tracking framework called MCITrack. It leverages Mamba's hidden states to continuously record and transmit extensive contextual information throughout the video stream, resulting in more robust object tracking. The core component of MCITrack is the Contextual Information Fusion module, which consists of the mamba layer and the cross-attention layer. The mamba layer stores historical contextual information, while the cross-attention layer integrates this information into the current visual features of each backbone block. This module enhances the model's ability to capture and utilize contextual information at multiple levels through deep integration with the backbone. Experiments demonstrate that MCITrack achieves competitive performance across numerous benchmarks. For instance, it gets 76.6% AUC on LaSOT and 80.0% AO on GOT-10k, establishing a new state-of-the-art performance.

JBHI Journal 2025 Journal Article

Fluid Intake Action Detection Based on Egocentric Videos and YOLOv8 Models

  • Xin Chen
  • Xinqi Bao
  • Ernest N. Kamavuako

Dehydration in older adults poses significant health risks, requiring effective monitoring solutions. This study addresses the challenge of detecting fluid intake accurately using a first-person, vision-based approach with wearable cameras and advanced object detection models. We developed a comprehensive dataset comprising 17 hours of drinking footage (∼3100 events) and 15 hours of non-drinking activities (∼3600 events) recorded as interference, from 36 participants, collected between October 2022 and January 2023 at King's College London. We include various container types and daily activities to enhance the model's robustness and generalizability. YOLOv8 models were used to detect drinking-related objects, and a mechanism was developed to analyse the size and position of the detection output to identify hand-container interactions and movements. The models achieved mAP@50 over 0.97 and F1-score over 0.95 in detecting drinking-related objects. Action detection testing results from video streams demonstrated an F1-score of 0.917, which dropped to 0.863 when interference activities were added. Additionally, the model detected the start of drinking activities with an average latency of 0.24 seconds and the end with 0.04 seconds, indicating high temporal accuracy. These results demonstrate the feasibility of egocentric, vision-based fluid-intake detection and its potential application in preventing dehydration. To our knowledge, this is the first vision-based dataset focusing on fluid-intake actions from a first-person viewpoint, offering a novel foundation for advancing hydration monitoring in older adults and various real-world contexts.

ICML Conference 2025 Conference Paper

LaRoSA: Enhancing LLM Efficiency via Layerwise Rotated Sparse Activation

  • Kai Liu
  • Bowen Xu
  • Shaoyu Wu
  • Xin Chen
  • Hao Zhou
  • Yongliang Tao
  • Lulu Hu

Activation sparsity can reduce the computational overhead and memory transfers during the forward pass of Large Language Model (LLM) inference. Existing methods face limitations, either demanding time-consuming recovery training that hinders real-world adoption, or relying on empirical magnitude-based pruning, which causes fluctuating sparsity and unstable inference speed-up. This paper introduces LaRoSA (Layerwise Rotated Sparse Activation), a novel method for activation sparsification designed to improve LLM efficiency without requiring additional training or magnitude-based pruning. We leverage layerwise orthogonal rotations to transform input activations into rotated forms that are more suitable for sparsification. By employing a Top-K selection approach within the rotated activations, we achieve consistent model-level sparsity and reliable wall-clock time speed-up. LaRoSA is effective across various sizes and types of LLMs, demonstrating minimal performance degradation and robust inference acceleration. Specifically, for LLaMA2-7B at 40% sparsity, LaRoSA achieves a mere 0.17 perplexity gap with a consistent 1.30$\times$ wall-clock time speed-up, and reduces the accuracy gap in zero-shot tasks compared to the dense model to just 0.54%, while surpassing TEAL by 1.77% and CATS by 17.14%.
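The rotate-then-Top-K idea described in this abstract can be sketched roughly as below. This is an illustrative assumption-laden sketch, not the authors' implementation: the abstract does not say how the rotations are obtained, so a random orthogonal matrix from a QR decomposition stands in for the layerwise rotation, and the name `rotated_topk_sparsify` is hypothetical.

```python
import numpy as np


def rotated_topk_sparsify(x: np.ndarray, q: np.ndarray, k: int) -> np.ndarray:
    """Rotate activations by orthogonal q, then keep the k largest-magnitude entries per row."""
    z = x @ q  # rotated activations, same shape as x
    # Indices of the k largest |z| values in each row (order within the k is arbitrary).
    idx = np.argpartition(np.abs(z), -k, axis=-1)[..., -k:]
    mask = np.zeros(z.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    return np.where(mask, z, 0.0)  # every row has exactly k kept entries


rng = np.random.default_rng(0)
# Stand-in rotation (assumption): orthogonal factor of a random Gaussian matrix.
q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
x = rng.standard_normal((4, 8))
z_sparse = rotated_topk_sparsify(x, q, k=5)
print(np.count_nonzero(z_sparse, axis=-1))
```

Because exactly k entries are kept in every row, the sparsity level is fixed by construction, which is the property the abstract contrasts with threshold-based magnitude pruning.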

ICML Conference 2025 Conference Paper

Learning Safety Constraints for Large Language Models

  • Xin Chen
  • Yarden As
  • Andreas Krause 0001

Large language models (LLMs) have emerged as powerful tools but pose significant safety risks through harmful outputs and vulnerability to adversarial attacks. We propose SaP (short for Safety Polytope), a geometric approach to LLM safety that learns and enforces multiple safety constraints directly in the model’s representation space. We develop a framework that identifies safe and unsafe regions via the polytope’s facets, enabling both detection and correction of unsafe outputs through geometric steering. Unlike existing approaches that modify model weights, SaP operates post-hoc in the representation space, preserving model capabilities while enforcing safety constraints. Experiments across multiple LLMs demonstrate that our method can effectively detect unethical inputs and reduce adversarial attack success rates while maintaining performance on standard tasks, thus highlighting the importance of having an explicit geometric model for safety. Analysis of the learned polytope facets reveals emergence of specialization in detecting different semantic notions of safety, providing interpretable insights into how safety is captured in LLMs’ representation space.

NeurIPS Conference 2025 Conference Paper

MS-BART: Unified Modeling of Mass Spectra and Molecules for Structure Elucidation

  • Yang Han
  • Pengyu Wang
  • Kai Yu
  • Xin Chen
  • Lu Chen

Mass spectrometry (MS) plays a critical role in molecular identification, significantly advancing scientific discovery. However, structure elucidation from MS data remains challenging due to the scarcity of annotated spectra. While large-scale pretraining has proven effective in addressing data scarcity in other domains, applying this paradigm to mass spectrometry is hindered by the complexity and heterogeneity of raw spectral signals. To address this, we propose MS-BART, a unified modeling framework that maps mass spectra and molecular structures into a shared token vocabulary, enabling cross-modal learning through large-scale pretraining on reliably computed fingerprint–molecule datasets. Multi-task pretraining objectives further enhance MS-BART's generalization by jointly optimizing denoising and translation tasks. The pretrained model is subsequently transferred to experimental spectra through finetuning on fingerprint predictions generated with MIST, a pre-trained spectral inference model, thereby enhancing robustness to real-world spectral variability. While finetuning alleviates the distributional difference, MS-BART still suffers from molecular hallucination and requires further alignment. We therefore introduce a chemical feedback mechanism that guides the model toward generating molecules closer to the reference structure. Extensive evaluations demonstrate that MS-BART achieves SOTA performance across 5/12 key metrics on MassSpecGym and NPLIB1 and is faster by one order of magnitude than competing diffusion-based methods, while comprehensive ablation studies systematically validate the model's effectiveness and robustness. We provide the data and code at https://github.com/OpenDFM/MS-BART.

EAAI Journal 2025 Journal Article

Real-time and explainable rock mass classification under imbalanced tunnel boring machine data using hybrid resampling and ensemble learning

  • Rui Li
  • Junlong Yan
  • Yueji He
  • Shaoxuan Guo
  • Qingsong Zhang
  • Rentai Liu
  • Yanyi Liu
  • Xuanyue Feng

The construction safety and efficiency of Tunnel Boring Machines (TBMs) are highly dependent on the accurate identification of surrounding rock mass grades. This study develops a data-driven rock mass prediction model using tunneling parameters collected from the Yinchao Jiliao diversion project. Mutual information coefficient, Spearman correlation analysis, and kernel density estimation were comprehensively applied to identify the most relevant statistical features derived from key tunneling parameters that are associated with surrounding rock classes. Seven individual models and three ensemble learning models were established, with hyperparameters optimized via a Tree-structured Parzen Estimator (TPE) based Bayesian algorithm and stratified five-fold cross-validation. To address the core challenges of highly imbalanced sample distribution and inter-class feature overlap, this study introduced Synthetic Minority Over-sampling Technique (SMOTE) and SMOTE-Tomek for data preprocessing. Considering the asymmetric risk associated with misclassification of different rock mass grades in practical tunneling engineering, a risk preference metric termed High-Risk Average Recall (HRAR) was proposed to evaluate the models, prioritizing the prevention of misclassifying high-risk rock masses (Class IV and V) as low-risk rock masses (Class II and III). Based on comprehensive metrics, the SMOTE-Tomek-preprocessed Soft-Voting ensemble model achieved superior macro-average performance and a high HRAR value. To enhance model transparency and credibility, SHapley Additive exPlanations (SHAP) was employed for explainability analysis. This method elucidated the contribution and influence of key features (thrust, torque, advance rate) on rock mass classification across different models. This study provides a systematic solution and technical foundation for geological perception and risk early-warning in intelligent TBM tunneling.

EAAI Journal 2025 Journal Article

Recent advances in flotation froth image analysis via deep learning

  • Xin Chen
  • Dan Liu
  • Longzhou Yu
  • Ping Shao
  • Mingyan An
  • Shuming Wen

Flotation froth image analysis with computer vision systems has witnessed a transformative evolution through the integration of deep learning. Deep learning outperforms traditional feature design by effectively learning intricate feature representations, thus enhancing the assessment of froth flotation processes' operational performance. Flotation froth image analysis via deep learning facilitates real-time monitoring of dynamic flotation processes, guiding the adjustment of operational variables through predicting performance indicators, recognizing froth states and segmenting foam edges, which promotes resource efficiency and supports the sustainable development of beneficiation. Despite the vast potential of deep learning for time-series forecasting within the multistage flotation cycle, its capabilities remain underexplored. To fill this gap, based on recent research, we discuss the application of temporal and multistage information in the flotation cycle. We introduce the development trends of deep learning in various processes of flotation froth image analysis, including data collection, dataset preprocessing, feature extraction, and modeling. We particularly discuss advanced techniques for extracting time-series features and for developing multistage models and innovative data collection methods, so as to emphasize the importance of using temporal information. Finally, the review explores several trends and challenges for future research. This review is expected to leave readers with deeper thoughts about algorithm design and data collection in the flotation domain, thereby promoting further research and development in beneficiation automation.

AAAI Conference 2025 Conference Paper

SUTrack: Towards Simple and Unified Single Object Tracking

  • Xin Chen
  • Ben Kang
  • Wanting Geng
  • Jiawen Zhu
  • Yi Liu
  • Dong Wang
  • Huchuan Lu

In this paper, we propose a simple yet unified single object tracking (SOT) framework, dubbed SUTrack. It consolidates five SOT tasks (RGB-based, RGB-Depth, RGB-Thermal, RGB-Event, RGB-Language Tracking) into a unified model trained in a single session. Due to the distinct nature of the data, current methods typically design individual architectures and train separate models for each task. This fragmentation results in redundant training processes, repetitive technological innovations, and limited cross-modal knowledge sharing. In contrast, SUTrack demonstrates that a single model with a unified input representation can effectively handle various SOT tasks, eliminating the need for task-specific designs and separate training sessions. Additionally, we introduce a task-recognition training strategy and a soft token type embedding to further enhance SUTrack's performance with minimal overhead. Experiments show that SUTrack outperforms previous task-specific counterparts across 11 datasets spanning five SOT tasks. Moreover, we provide a range of models catering to edge devices as well as high-performance GPUs, striking a good trade-off between speed and accuracy. We hope SUTrack can serve as a strong foundation for further compelling research into unified tracking models.

AAAI Conference 2025 Conference Paper

Two-stream Beats One-stream: Asymmetric Siamese Network for Efficient Visual Tracking

  • Jiawen Zhu
  • Huayi Tang
  • Xin Chen
  • Xinying Wang
  • Dong Wang
  • Huchuan Lu

Efficient tracking has garnered attention for its ability to operate on resource-constrained platforms for real-world deployment beyond desktop GPUs. Current efficient trackers mainly follow precision-oriented trackers, adopting a one-stream framework with lightweight modules. However, blindly adhering to the one-stream paradigm may not be optimal, as incorporating template computation in every frame leads to redundancy, and pervasive semantic interaction between template and search region places stress on edge devices. In this work, we propose a novel asymmetric Siamese tracker named AsymTrack for efficient tracking. AsymTrack disentangles template and search streams into separate branches, with the template computed only once during initialization to generate modulation signals. Building on this architecture, we devise an efficient template modulation mechanism to unidirectionally inject crucial cues into the search features, and design an object perception enhancement module that integrates abstract semantics and local details to overcome the limited representation in lightweight trackers. Extensive experiments demonstrate that AsymTrack offers superior speed-precision trade-offs across different platforms compared to the current state of the art. For instance, AsymTrack-T achieves 60.8% AUC on LaSOT and 224/81/84 FPS on GPU/CPU/AGX, surpassing HiT-Tiny by 6.0% AUC with higher speeds.

NeurIPS Conference 2024 Conference Paper

3DET-Mamba: Causal Sequence Modelling for End-to-End 3D Object Detection

  • Mingsheng Li
  • Jiakang Yuan
  • Sijin Chen
  • Lin Zhang
  • Anyu Zhu
  • Xin Chen
  • Tao Chen

Transformer-based architectures have been proven successful in detecting 3D objects from point clouds. However, the quadratic complexity of the attention mechanism struggles to encode rich information as point cloud resolution increases. Recently, state space models (SSM) such as Mamba have gained great attention due to their linear complexity and long sequence modeling ability for language understanding. To exploit the potential of Mamba on 3D scene-level perception, for the first time, we propose 3DET-Mamba, which is a novel SSM-based model designed for indoor 3D object detection. Specifically, we divide the point cloud into different patches and use a lightweight yet effective Inner Mamba to capture local geometric information. To observe the scene from a global perspective, we introduce a novel Dual Mamba module that models the point cloud in terms of spatial distribution and continuity. Additionally, we design a Query-aware Mamba module that decodes context features into object sets under the guidance of learnable queries. Extensive experiments demonstrate that 3DET-Mamba surpasses previous 3DETR on indoor 3D detection benchmarks such as ScanNet, improving AP25/AP50 from 65.0%/47.0% to 70.4%/54.4%, respectively.

NeurIPS Conference 2024 Conference Paper

Achieving Optimal Clustering in Gaussian Mixture Models with Anisotropic Covariance Structures

  • Xin Chen
  • Anderson Ye Zhang

We study clustering under anisotropic Gaussian Mixture Models (GMMs), where covariance matrices from different clusters are unknown and are not necessarily the identity matrix. We analyze two anisotropic scenarios: homogeneous, with identical covariance matrices, and heterogeneous, with distinct matrices per cluster. For these models, we derive minimax lower bounds that illustrate the critical influence of covariance structures on clustering accuracy. To solve the clustering problem, we consider a variant of Lloyd's algorithm, adapted to estimate and utilize covariance information iteratively. We prove that the adjusted algorithm not only achieves the minimax optimality but also converges within a logarithmic number of iterations, thus bridging the gap between theoretical guarantees and practical efficiency.
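As a rough illustration of a covariance-aware Lloyd iteration, the sketch below covers only the homogeneous scenario (one covariance matrix shared by all clusters). It alternates between assigning points by Mahalanobis distance and re-estimating the centers and the pooled covariance. The function name, the caller-supplied initial centers, the fixed iteration count, and the small ridge term are assumptions made for this example; the paper's exact procedure and initialization may differ.

```python
import numpy as np


def anisotropic_lloyd(X, init_centers, n_iter=10):
    """Lloyd-style clustering with a shared, iteratively estimated covariance.

    Assumptions: homogeneous case (one covariance for all clusters) and
    caller-supplied initial centers of shape (k, d).
    """
    centers = np.asarray(init_centers, dtype=float).copy()
    k, d = centers.shape
    cov = np.eye(d)  # start from the isotropic (Euclidean) metric
    for _ in range(n_iter):
        # Assignment step: squared Mahalanobis distance to every center.
        prec = np.linalg.inv(cov)
        diff = X[:, None, :] - centers[None, :, :]          # shape (n, k, d)
        dist = np.einsum("nkd,de,nke->nk", diff, prec, diff)
        labels = dist.argmin(axis=1)
        # Update step: cluster means, leaving empty clusters unchanged.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
        # Update step: pooled within-cluster covariance (small ridge for stability).
        resid = X - centers[labels]
        cov = resid.T @ resid / len(X) + 1e-8 * np.eye(d)
    return labels, centers, cov
```

In the heterogeneous scenario each cluster would keep its own covariance estimate, and the assignment rule would typically also include a log-determinant term; that variant is omitted here.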

NeurIPS Conference 2024 Conference Paper

Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models

  • Yimeng Zhang
  • Xin Chen
  • Jinghan Jia
  • Yihua Zhang
  • Chongyu Fan
  • Jiancheng Liu
  • Mingyi Hong
  • Ke Ding

Diffusion models (DMs) have achieved remarkable success in text-to-image generation, but they also pose safety risks, such as the potential generation of harmful content and copyright violations. The techniques of machine unlearning, also known as concept erasing, have been developed to address these risks. However, these techniques remain vulnerable to adversarial prompt attacks, which can prompt DMs post-unlearning to regenerate undesired images containing concepts (such as nudity) meant to be erased. This work aims to enhance the robustness of concept erasing by integrating the principle of adversarial training (AT) into machine unlearning, resulting in the robust unlearning framework referred to as AdvUnlearn. However, achieving this effectively and efficiently is highly nontrivial. First, we find that a straightforward implementation of AT compromises DMs’ image generation quality post-unlearning. To address this, we develop a utility-retaining regularization on an additional retain set, optimizing the trade-off between concept erasure robustness and model utility in AdvUnlearn. Moreover, we identify the text encoder as a more suitable module for robustification compared to UNet, ensuring unlearning effectiveness. And the acquired text encoder can serve as a plug-and-play robust unlearner for various DM types. Empirically, we perform extensive experiments to demonstrate the robustness advantage of AdvUnlearn across various DM unlearning scenarios, including the erasure of nudity, objects, and style concepts. In addition to robustness, AdvUnlearn also achieves a balanced tradeoff with model utility. To our knowledge, this is the first work to systematically explore robust DM unlearning through AT, setting it apart from existing methods that overlook robustness in concept erasing. Codes are available at https://github.com/OPTML-Group/AdvUnlearn. Warning: This paper contains model outputs that may be offensive in nature.

EAAI Journal 2024 Journal Article

Discriminative feature learning based on multi-view attention network with diffusion joint loss for speech emotion recognition

  • Yang Liu
  • Xin Chen
  • Yuan Song
  • Yarong Li
  • Shengbei Wang
  • Weitao Yuan
  • Yongwei Li
  • Zhen Zhao

In speech emotion recognition, existing models often struggle to accurately classify emotions with high similarity. In this paper, we propose a novel architecture that integrates a multi-view attention network (MVAN) and diffusion joint loss to alleviate confusion by placing a stronger focus on emotions that are challenging to classify accurately. First, we use logarithmic Mel-spectrograms (log-Mels), deltas, and delta-deltas of log-Mels as three-dimensional features to minimize external interference. Then, we design the MVAN to extract effective multi-time scale emotion features, where the channel and spatial attention are used to selectively localize the regions in the input features related to the target emotion. A multi-time view bidirectional long short-term memory network is used to extract the shallow edge features and deep semantic features, and multi-scale self-attention fuses these features through cross-scale attention fusion to obtain multi-time scale emotion features. Finally, a diffusion joint loss strategy is introduced to distinguish the emotional embeddings with high similarity by the generated complex emotion triplets in a diffusing fashion. We evaluated our proposed method on the Interactive Emotional Dyadic Motion Capture (IEMOCAP), Institute of Automation, Chinese Academy of Sciences (CASIA), and Berlin Database of Emotional Speech (EMODB) corpora. The results show significant improvements over existing methods, achieving 86.87% WA, 86.60% UA, and 86.82% WF1 on IEMOCAP; 70.74% WA, 70.74% UA, and 70.25% WF1 on CASIA; and 93.65% WA, 91.13% UA, and 92.26% WF1 on EMODB. These results confirm the superiority of our method. Our code and model are available at https://github.com/Littleznnz/MVAN-DiffSEG.

UAI Conference 2024 Conference Paper

Gradient descent in matrix factorization: Understanding large initialization

  • Hengchao Chen
  • Xin Chen
  • Mohamad Elmasri
  • Qiang Sun

Gradient Descent (GD) has been proven effective in solving various matrix factorization problems. However, its optimization behavior with large initial values remains less understood. To address this gap, this paper presents a novel theoretical framework for examining the convergence trajectory of GD with a large initialization. The framework is grounded in signal-to-noise ratio concepts and inductive arguments. The results uncover an implicit incremental learning phenomenon in GD and offer a deeper understanding of its performance in large initialization scenarios.

JBHI Journal 2024 Journal Article

Improving Tumor Classification by Reusing Self-Predicted Segmentation of Medical Images as Guiding Knowledge

  • Xiaoyi Lin
  • Mingyu Wang
  • Fei Li
  • Ziyue Xu
  • Jia Chen
  • Xin Chen
  • Chenglang Yuan
  • Songxiong Wu

Differential diagnosis of tumors is important for computer-aided diagnosis. In computer-aided diagnosis systems, expert knowledge of lesion segmentation masks is limited as it is only used during preprocessing or as supervision to guide feature extraction. To improve the utilization of lesion segmentation masks, this study proposes a simple and effective multitask learning network that improves medical image classification using self-predicted segmentation as guiding knowledge; we call this network RS$^{2}$-net. In RS$^{2}$-net, the predicted segmentation probability map obtained from the initial segmentation inference is added to the original image to form a new input, which is then fed back into the network for the final classification inference. We validated the proposed RS$^{2}$-net using three datasets: the pNENs-Grade dataset, which tested the prediction of pancreatic neuroendocrine neoplasm grading; the HCC-MVI dataset, which tested the prediction of microvascular invasion of hepatocellular carcinoma; and the public ISIC 2017 skin lesion dataset. The experimental results indicate that the proposed strategy of reusing self-predicted segmentation is effective, and RS$^{2}$-net outperforms other popular networks and existing state-of-the-art studies. Interpretive analytics based on feature visualization demonstrates that the improved classification performance of our reuse strategy is due to the semantic information that can be acquired in advance in a shallow network.

IJCAI Conference 2024 Conference Paper

LocMoE: A Low-overhead MoE for Large Language Model Training

  • Jing Li
  • Zhijie Sun
  • Xuan He
  • Li Zeng
  • Yi Lin
  • Entong Li
  • Binfan Zheng
  • Rongqian Zhao

The Mixtures-of-Experts (MoE) model is a widespread distributed and integrated learning method for large language models (LLM), which is favored due to its ability to sparsify and expand models efficiently. However, the performance of MoE is limited by load imbalance and high latency of All-to-All communication, along with relatively redundant computation owing to large expert capacity. Load imbalance may result from existing routing policies that consistently tend to select certain experts. The frequent inter-node communication in the All-to-All procedure also significantly prolongs the training time. To alleviate the above performance problems, we propose a novel routing strategy that combines load balance and locality by converting partial inter-node communication to that of intra-node. Notably, we elucidate that there is a minimum threshold for expert capacity, calculated through the maximal angular deviation between the gating weights of the experts and the assigned tokens. We port these modifications on the PanGu-Σ model based on the MindSpore framework with multi-level routing and conduct experiments on Ascend clusters. The experiment results demonstrate that the proposed LocMoE reduces training time per epoch by 12.68% to 22.24% compared to classical routers, such as hash router and switch router, without impacting the model accuracy.

NeurIPS Conference 2024 Conference Paper

MeshXL: Neural Coordinate Field for Generative 3D Foundation Models

  • Sijin Chen
  • Xin Chen
  • Anqi Pang
  • Xianfang Zeng
  • Wei Cheng
  • Yijun Fu
  • Fukun Yin
  • Zhibin Wang

The polygon mesh representation of 3D data exhibits great flexibility, fast rendering speed, and storage efficiency, and is widely preferred in various applications. However, given its unstructured graph representation, the direct generation of high-fidelity 3D meshes is challenging. Fortunately, with a pre-defined ordering strategy, 3D meshes can be represented as sequences, and the generation process can be seamlessly treated as an auto-regressive problem. In this paper, we validate that Neural Coordinate Field (NeurCF), an explicit coordinate representation with implicit neural embeddings, is a simple-yet-effective representation for large-scale sequential mesh modeling. After that, we present MeshXL, a family of generative pre-trained auto-regressive models that addresses 3D mesh generation with modern large language model approaches. Extensive experiments show that MeshXL is able to generate high-quality 3D meshes, and can also serve as foundation models for various down-stream applications.

AAAI Conference 2024 Conference Paper

Plug-In Diffusion Model for Sequential Recommendation

  • Haokai Ma
  • Ruobing Xie
  • Lei Meng
  • Xin Chen
  • Xu Zhang
  • Leyu Lin
  • Zhanhui Kang

Pioneering efforts have verified the effectiveness of the diffusion models in exploring the informative uncertainty for recommendation. Considering the difference between recommendation and image synthesis tasks, existing methods have undertaken tailored refinements to the diffusion and reverse process. However, these approaches typically use the highest-score item in corpus for user interest prediction, leading to the ignorance of the user's generalized preference contained within other items, thereby remaining constrained by the data sparsity issue. To address this issue, this paper presents a novel Plug-in Diffusion Model for Recommendation (PDRec) framework, which employs the diffusion model as a flexible plugin to jointly take full advantage of the diffusion-generating user preferences on all items. Specifically, PDRec first infers the users' dynamic preferences on all items via a time-interval diffusion model and proposes a Historical Behavior Reweighting (HBR) mechanism to identify the high-quality behaviors and suppress noisy behaviors. In addition to the observed items, PDRec proposes a Diffusion-based Positive Augmentation (DPA) strategy to leverage the top-ranked unobserved items as the potential positive samples, bringing in informative and diverse soft signals to alleviate data sparsity. To alleviate the false negative sampling issue, PDRec employs Noise-free Negative Sampling (NNS) to select stable negative samples for ensuring effective model optimization. Extensive experiments and analyses on four datasets have verified the superiority of the proposed PDRec over the state-of-the-art baselines and showcased the universality of PDRec as a flexible plugin for commonly-used sequential encoders in different recommendation scenarios. The code is available in https://github.com/hulkima/PDRec.

AAAI Conference 2024 Conference Paper

PM-INR: Prior-Rich Multi-Modal Implicit Large-Scale Scene Neural Representation

  • Yiying Yang
  • Fukun Yin
  • Wen Liu
  • Jiayuan Fan
  • Xin Chen
  • Gang Yu
  • Tao Chen

Recent advancements in implicit neural representations have contributed to high-fidelity surface reconstruction and photorealistic novel view synthesis. However, with the expansion of the scene scale, such as block or city level, existing methods will encounter challenges because traditional sampling cannot cope with the cubically growing sampling space. To alleviate the dependence on filling the sampling space, we explore using multi-modal priors to assist individual points to obtain more global semantic information and propose a prior-rich multi-modal implicit neural representation network, PM-INR, for the outdoor unbounded large-scale scene. The core of our method is multi-modal prior extraction and cross-modal prior fusion modules. The former encodes codebooks from different modality inputs and extracts valuable priors, while the latter fuses priors to maintain view consistency and preserve unique features among multi-modal priors. Finally, feature-rich cross-modal priors are injected into the sampling regions to allow each region to perceive global information without filling the sampling space. Extensive experiments have demonstrated the effectiveness and robustness of our method for outdoor unbounded large-scale scene novel view synthesis, which outperforms state-of-the-art methods in terms of PSNR, SSIM, and LPIPS.

AAAI Conference 2024 Conference Paper

REGLO: Provable Neural Network Repair for Global Robustness Properties

  • Feisi Fu
  • Zhilu Wang
  • Weichao Zhou
  • Yixuan Wang
  • Jiameng Fan
  • Chao Huang
  • Qi Zhu
  • Xin Chen

We present REGLO, a novel methodology for repairing pretrained neural networks to satisfy global robustness and individual fairness properties. A neural network is said to be globally robust with respect to a given input region if and only if all the input points in the region are locally robust. This notion of global robustness also captures the notion of individual fairness as a special case. We prove that any counterexample to a global robustness property must exhibit a corresponding large gradient. For ReLU networks, this result allows us to efficiently identify the linear regions that violate a given global robustness property. By formulating and solving a suitable robust convex optimization problem, REGLO then computes a minimal weight change that will provably repair these violating linear regions.

JBHI Journal 2023 Journal Article

Automatic Diagnosis of Significant Liver Fibrosis From Ultrasound B-Mode Images Using a Handcrafted-Feature-Assisted Deep Convolutional Neural Network

  • Zhong Liu
  • Bin Huang
  • Huiying Wen
  • Zhicheng Lu
  • Qicai Huang
  • Meiqin Jiang
  • Changfeng Dong
  • Yingxia Liu

The accurate diagnosis of significant liver fibrosis ($\geq$ F2) in patients with chronic liver disease (CLD) is critical, as $\geq$ F2 is a crucial factor that should be considered in selecting an antiviral therapy for these patients. This article proposes a handcrafted-feature-assisted deep convolutional neural network (HFA-DCNN) that helps radiologists automatically and accurately diagnose significant liver fibrosis from ultrasound (US) brightness (B)-mode images. The HFA-DCNN model has three main branches: one for automatic region of interest (ROI) segmentation in the US images, another for attention deep feature learning from the segmented ROI, and the third for handcrafted feature extraction. The attention deep learning features and handcrafted features are fused in the back end of the model to enable more accurate diagnosis of significant liver fibrosis. The usefulness and effectiveness of the proposed model were validated on a dataset built upon 321 CLD patients with liver fibrosis stages confirmed by pathological evaluations. In a fivefold cross validation (FFCV), the proposed model achieves accuracy, sensitivity, specificity, and area under the receiver-operating-characteristic (ROC) curve (AUC) values of 0.863 (95% confidence interval (CI) 0.820–0.899), 0.879 (95% CI 0.823–0.920), 0.872 (95% CI 0.800–0.925), and 0.925 (95% CI 0.891–0.952), which are significantly better than those obtained by the comparative methods. Given its excellent performance, the proposed HFA-DCNN model can serve as a promising tool for the noninvasive and accurate diagnosis of significant liver fibrosis in CLD patients.

NeurIPS Conference 2023 Conference Paper

Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation

  • Zibo Zhao
  • Wen Liu
  • Xin Chen
  • Xianfang Zeng
  • Rui Wang
  • Pei Cheng
  • Bin Fu
  • Tao Chen

We present a novel alignment-before-generation approach to tackle the challenging task of generating general 3D shapes based on 2D images or texts. Directly learning a conditional generative model from images or texts to 3D shapes is prone to producing results inconsistent with the conditions because 3D shapes have an additional dimension whose distribution significantly differs from that of 2D images and texts. To bridge the domain gap among the three modalities and facilitate multi-modal-conditioned 3D shape generation, we explore representing 3D shapes in a shape-image-text-aligned space. Our framework comprises two models: a Shape-Image-Text-Aligned Variational Auto-Encoder (SITA-VAE) and a conditional Aligned Shape Latent Diffusion Model (ASLDM). The former model encodes the 3D shapes into the shape latent space aligned to the image and text and reconstructs the fine-grained 3D neural fields corresponding to given shape embeddings via the transformer-based decoder. The latter model learns a probabilistic mapping function from the image or text space to the latent shape space. Our extensive experiments demonstrate that our proposed approach can generate higher-quality and more diverse 3D shapes that better semantically conform to the visual or textual conditional inputs, validating the effectiveness of the shape-image-text-aligned space for cross-modality 3D shape generation.

NeurIPS Conference 2023 Conference Paper

MotionGPT: Human Motion as a Foreign Language

  • Biao Jiang
  • Xin Chen
  • Wen Liu
  • Jingyi Yu
  • Gang Yu
  • Tao Chen

Though the advancement of pre-trained large language models unfolds, the exploration of building a unified model for language and other multimodal data, such as motion, remains challenging and untouched so far. Fortunately, human motion displays a semantic coupling akin to human language, often perceived as a form of body language. By fusing language data with large-scale motion models, motion-language pre-training that can enhance the performance of motion-related tasks becomes feasible. Driven by this insight, we propose MotionGPT, a unified, versatile, and user-friendly motion-language model to handle multiple motion-relevant tasks. Specifically, we employ the discrete vector quantization for human motion and transfer 3D motion into motion tokens, similar to the generation process of word tokens. Building upon this "motion vocabulary", we perform language modeling on both motion and text in a unified manner, treating human motion as a specific language. Moreover, inspired by prompt learning, we pre-train MotionGPT with a mixture of motion-language data and fine-tune it on prompt-based question-and-answer tasks. Extensive experiments demonstrate that MotionGPT achieves state-of-the-art performances on multiple motion tasks including text-driven motion generation, motion captioning, motion prediction, and motion in-between.

TMLR Journal 2023 Journal Article

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

  • Stephen Casper
  • Xander Davies
  • Claudia Shi
  • Thomas Krendl Gilbert
  • Jérémy Scheurer
  • Javier Rando
  • Rachel Freedman
  • Tomek Korbak

Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques to understand, improve, and complement RLHF in practice; and (3) propose auditing and disclosure standards to improve societal oversight of RLHF systems. Our work emphasizes the limitations of RLHF and highlights the importance of a multi-layered approach to the development of safer AI systems.

NeurIPS Conference 2023 Conference Paper

PDF: Point Diffusion Implicit Function for Large-scale Scene Neural Representation

  • Yuhan Ding
  • Fukun Yin
  • Jiayuan Fan
  • Hui Li
  • Xin Chen
  • Wen Liu
  • Chongshan Lu
  • Gang Yu

Recent advances in implicit neural representations have achieved impressive results by sampling and fusing individual points along sampling rays in the sampling space. However, due to the explosively growing sampling space, finely representing and synthesizing detailed textures remains a challenge for unbounded large-scale outdoor scenes. To alleviate the dilemma of using individual points to perceive the entire colossal space, we explore learning the surface distribution of the scene to provide structural priors and reduce the samplable space, and propose a Point Diffusion implicit Function, PDF, for large-scale scene neural representation. The core of our method is a large-scale point cloud super-resolution diffusion module that enhances the sparse point cloud reconstructed from several training images into a dense point cloud as an explicit prior. Then in the rendering stage, only sampling points with prior points within the sampling radius are retained. That is, the sampling space is reduced from the unbounded space to the scene surface. Meanwhile, to fill in the background of the scene that cannot be provided by point clouds, the region sampling based on Mip-NeRF 360 is employed to model the background representation. Extensive experiments have demonstrated the effectiveness of our method for large-scale scene novel view synthesis, which outperforms relevant state-of-the-art baselines.

JBHI Journal 2023 Journal Article

Self-Supervised Tumor Segmentation With Sim2Real Adaptation

  • Xiaoman Zhang
  • Weidi Xie
  • Chaoqin Huang
  • Ya Zhang
  • Xin Chen
  • Qi Tian
  • Yanfeng Wang

This paper targets self-supervised tumor segmentation. We make the following contributions: (i) taking inspiration from the observation that tumors are often characterised independently of their contexts, we propose a novel proxy task “layer-decomposition” that closely matches the goal of the downstream task, and design a scalable pipeline for generating synthetic tumor data for pre-training; (ii) we propose a two-stage Sim2Real training regime for unsupervised tumor segmentation, where we first pre-train a model with simulated tumors, and then adopt a self-training strategy for downstream data adaptation; (iii) when evaluating on different tumor segmentation benchmarks, e.g., BraTS2018 for brain tumor segmentation and LiTS2017 for liver tumor segmentation, our approach achieves state-of-the-art segmentation performance under the unsupervised setting. When transferring the model for tumor segmentation under a low-annotation regime, the proposed approach also outperforms all existing self-supervised approaches; (iv) we conduct extensive ablation studies to analyse the critical components in data simulation, and validate the necessity of different proxy tasks. We demonstrate that, with sufficient texture randomization in simulation, a model trained on synthetic data can effortlessly generalise to datasets with real tumors.

ICML Conference 2023 Conference Paper

Sketched Ridgeless Linear Regression: The Role of Downsampling

  • Xin Chen
  • Yicheng Zeng
  • Siyue Yang
  • Qiang Sun

Overparametrization often helps improve the generalization performance. This paper presents a dual view of overparametrization suggesting that downsampling may also help generalize. Focusing on the proportional regime $m\asymp n \asymp p$, where $m$ represents the sketching size, $n$ is the sample size, and $p$ is the feature dimensionality, we investigate two out-of-sample prediction risks of the sketched ridgeless least square estimator. Our findings challenge conventional beliefs by showing that downsampling does not always harm generalization but can actually improve it in certain cases. We identify the optimal sketching size that minimizes out-of-sample prediction risks and demonstrate that the optimally sketched estimator exhibits stabler risk curves, eliminating the peaks of those for the full-sample estimator. To facilitate practical implementation, we propose an empirical procedure to determine the optimal sketching size. Finally, we extend our analysis to cover central limit theorems and misspecified models. Numerical studies strongly support our theory.
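To make the estimator concrete, here is a minimal sketch of a sketched ridgeless (minimum-norm) least squares fit: the data are compressed from $n$ rows to $m$ rows by a sketching matrix $S$, and the minimum-norm solution is obtained with the pseudoinverse. The i.i.d. Gaussian sketch and the function name are assumptions chosen for illustration; the abstract does not specify which sketching matrices are analyzed.

```python
import numpy as np


def sketched_ridgeless(X, y, m, rng):
    """Minimum-norm least squares on sketched data (S X, S y).

    Assumption: S is an m x n matrix with i.i.d. N(0, 1/m) entries.
    """
    n = X.shape[0]
    S = rng.standard_normal((m, n)) / np.sqrt(m)
    # pinv returns the minimum-norm solution, which is the ridgeless
    # limit of ridge regression as the penalty tends to zero.
    return np.linalg.pinv(S @ X) @ (S @ y)


rng = np.random.default_rng(0)
n, p, m = 50, 5, 20
X = rng.standard_normal((n, p))
beta = np.arange(1.0, p + 1.0)
y = X @ beta                      # noiseless responses for a sanity check
beta_hat = sketched_ridgeless(X, y, m, rng)
```

In this noiseless example with $m \geq p$, the sketched system is still consistent, so the estimator recovers `beta` up to floating-point error. The effect of the sketching size on prediction risk only appears once noise is added.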

ECAI Conference 2023 Conference Paper

Specializing Small Language Models Towards Complex Style Transfer via Latent Attribute Pre-Training

  • Yongfeng Huang
  • Xin Chen
  • Lin Zhang

In this work, we introduce the concept of complex text style transfer tasks, and construct complex text datasets based on two widely applicable scenarios. Our dataset is the first large-scale dataset of its kind, with 700 rephrased sentences and 1,000 sentences from the game Genshin Impact. While large language models (LLM) have shown promise in complex text style transfer, they have drawbacks such as data privacy concerns, network instability, and high deployment costs. To address these issues, we explore the effectiveness of small models (smaller than T5-3B) with implicit style pre-training through contrastive learning. We also propose a method for automated evaluation of text generation quality based on alignment with human evaluations using ChatGPT. Finally, we compare our approach with existing methods and show that our model achieves state-of-the-art performance among few-shot text style transfer models.

AAAI Conference 2022 Conference Paper

Anisotropic Fourier Features for Neural Image-Based Rendering and Relighting

  • Huangjie Yu
  • Anpei Chen
  • Xin Chen
  • Lan Xu
  • Ziyu Shao
  • Jingyi Yu

Recent neural rendering techniques have greatly benefited image-based modeling and relighting tasks. They provide a continuous, compact, and parallelizable representation by modeling the plenoptic function as multilayer perceptrons (MLPs). However, vanilla MLPs suffer from spectral biases on multidimensional datasets. Recent remedies based on isotropic Fourier features mapping mitigate the problem but still fall short of handling heterogeneity across different dimensions, causing imbalanced regression and visual artifacts such as excessive blurs. We present an anisotropic random Fourier features (RFF) mapping scheme to tackle spectral biases. We first analyze the influence of bandwidth from a different perspective: we show that the optimal bandwidth exhibits strong correlations with the frequency spectrum of the training data across various dimensions. We then introduce an anisotropic feature mapping scheme with multiple bandwidths to model the multidimensional signal characteristics. We further propose an efficient bandwidth searching scheme through iterative golden-section search that can significantly reduce the training overhead from polynomial to logarithmic time. Our anisotropic scheme directly applies to neural surface light-field rendering and image-based relighting. Comprehensive experiments show that our scheme can more faithfully model lighting conditions and object features as well as preserve fine texture details and smooth view transitions even when angular and spatial samples are highly imbalanced.
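The golden-section search mentioned in the abstract is a standard one-dimensional method for minimizing a unimodal function on an interval. It needs only one new function evaluation per step, and the number of steps grows logarithmically with the required precision. The sketch below shows the generic algorithm on a toy objective; in a bandwidth search the objective would be something like a validation loss as a function of one bandwidth, which is a hypothetical stand-in here and not the authors' code.

```python
import math


def golden_section_search(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2          # 1 / golden ratio, about 0.618
    c = hi - inv_phi * (hi - lo)              # left interior probe
    d = lo + inv_phi * (hi - lo)              # right interior probe
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            # Minimum lies in [lo, d]; the old left probe becomes the right probe.
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:
            # Minimum lies in [c, hi]; the old right probe becomes the left probe.
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2


# Toy objective standing in for a loss as a function of one bandwidth.
best = golden_section_search(lambda x: (x - 2.0) ** 2, 0.0, 5.0)
```

Each iteration shrinks the bracket by a constant factor of about 0.618, so reaching a tolerance `tol` takes on the order of `log((hi - lo) / tol)` evaluations.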

EAAI Journal 2022 Journal Article

Life prediction of underground structure by sulfate corrosion using Harris hawks optimizing genetic programming

  • Yuan Xie
  • Wei Gao
  • Yiwei Wang
  • Xin Chen
  • Shuangshuang Ge
  • Sen Wang

A corrosive sulfate environment can cause strong deterioration and destruction of reinforced concrete (RC) underground structures and seriously reduce their service life. Thus, it is very important to predict the service life of RC underground structures in corrosive sulfate environments. However, the service life of underground structures is affected by numerous complicated engineering and environmental factors and cannot be determined by traditional theoretical and experimental investigations. Therefore, to solve this problem, a new data-driven method based on Harris hawks optimizing genetic programming (HHO-GP) is proposed. In this new method, to improve the traditional genetic programming (GP), a new global optimization algorithm called Harris hawks optimization (HHO) is adopted to optimize its main controlling parameters. Based on 25 groups of real engineering data, the life prediction model of underground structures in corrosive sulfate environments with 12 main engineering and environmental influence factors is established by the HHO-GP method. The results show that the average relative training error (5.5%) and predicting error (6.3%) of the new prediction model are small. Therefore, the proposed HHO-GP method can construct a suitable life prediction model based on only real engineering data, regardless of how many complicated influencing factors are considered. Moreover, our data-driven life prediction model is described by one explicit polynomial function based on 12 influencing factors. Thus, it can be applied in real engineering simply and easily. Finally, the influence of the main controlling parameters of the HHO-GP on its accuracy and efficiency is analyzed. The results reveal that considering the computing accuracy and efficiency and the model completeness, the small population size and maximum iterations of HHO are suitable, whose recommended values are all 15. The population size and maximum number of iterations of GP have little influence on the prediction accuracy. Their recommended values can all be 50.

JBHI Journal 2022 Journal Article

Multiparametric Quantitative US Examination of Liver Fibrosis: A Feature-Engineering and Machine-Learning Based Analysis

  • Huiying Wen
  • Wei Zheng
  • Min Li
  • Qing Li
  • Qiang Liu
  • Jianhua Zhou
  • Zhong Liu
  • Xin Chen

Quantitative ultrasound (QUS), which attempts to extract quantitative features from the US radiofrequency (RF) or envelope data for tissue characterization, is becoming a promising technique for noninvasive assessments of liver fibrosis. However, the number of feature variables examined and finally used in the existing QUS methods is typically small, limiting the diagnostic performance. Therefore, this paper devises a new multiparametric QUS (MP-QUS) method which enables the extraction of a large number of feature variables from US RF signals and allows for the use of feature-engineering and machine-learning based algorithms for liver fibrosis assessment. In the MP-QUS, eighty-four feature variables were extracted from multiple QUS parametric maps derived from the RF signals and the envelope data. Afterwards, feature reduction and selection were performed in turn to remove the feature redundancy and identify the best combination of features in the reduced feature set. Finally, a variety of machine-learning algorithms were tested for fibrosis classification with the selected features, based on the results of which the optimal classifier was established. The performance of the proposed MP-QUS method for staging liver fibrosis was evaluated on an animal model, with histologic examination as the reference standard. The mean accuracy, sensitivity, specificity and area under the receiver-operating-characteristic curve achieved by MP-QUS are respectively 83.38%, 86.04%, 80.82%, and 0.891 for recognizing significant liver fibrosis, and 85.50%, 88.92%, 85.24%, and 0.924 for diagnosing liver cirrhosis. The proposed MP-QUS method paves a way for its future extension to assess liver fibrosis in human subjects.

IJCAI Conference 2022 Conference Paper

Pruning-as-Search: Efficient Neural Architecture Search via Channel Pruning and Structural Reparameterization

  • Yanyu Li
  • Pu Zhao
  • Geng Yuan
  • Xue Lin
  • Yanzhi Wang
  • Xin Chen

Neural architecture search (NAS) and network pruning are widely studied efficient AI techniques, but not yet perfect. NAS performs exhaustive candidate architecture search, incurring tremendous search cost. Though (structured) pruning can simply shrink model dimension, it remains unclear how to decide the per-layer sparsity automatically and optimally. In this work, we revisit the problem of layer-width optimization and propose Pruning-as-Search (PaS), an end-to-end channel pruning method to search out the desired sub-network automatically and efficiently. Specifically, we add a depth-wise binary convolution to learn pruning policies directly through gradient descent. By combining the structural reparameterization and PaS, we successfully searched out a new family of VGG-like and lightweight networks, which enable the flexibility of arbitrary width with respect to each layer instead of each stage. Experimental results show that our proposed architecture outperforms prior art by around 1.0% top-1 accuracy under similar inference speed on the ImageNet-1000 classification task. Furthermore, we demonstrate the effectiveness of our width search on complex tasks including instance segmentation and image translation. Code and models are released.

JBHI Journal 2021 Journal Article

2D and 3D CT Radiomic Features Performance Comparison in Characterization of Gastric Cancer: A Multi-Center Study

  • Lingwei Meng
  • Di Dong
  • Xin Chen
  • Mengjie Fang
  • Rongpin Wang
  • Jing Li
  • Zaiyi Liu
  • Jie Tian

Objective: Radiomics, an emerging tool for medical image analysis, has the potential to precisely characterize gastric cancer (GC). Whether to use one-slice 2D annotation or whole-volume 3D annotation remains a long-standing debate, especially for heterogeneous GC. We comprehensively compared the representation and discrimination capacity of 2D and 3D radiomic features regarding GC, via three tasks ($T^{LNM}$, prediction of lymph node metastasis; $T^{LVI}$, prediction of lymphovascular invasion; $T^{pT}$, classification of pT4 versus other pT stages). Methods: A total of 539 GC patients from four centers were retrospectively enrolled and divided into training and validation cohorts. Radiomic features were extracted from 2D or 3D regions of interest (ROIs) annotated by radiologists. Feature selection and model construction procedures were customized for each combination of the two modalities (2D or 3D) and the three tasks. Subsequently, six machine learning models ($\mathrm{Model}_{2D}^{LNM}$, $\mathrm{Model}_{3D}^{LNM}$; $\mathrm{Model}_{2D}^{LVI}$, $\mathrm{Model}_{3D}^{LVI}$; $\mathrm{Model}_{2D}^{pT}$, $\mathrm{Model}_{3D}^{pT}$) were derived and evaluated to reflect the modalities’ performance in characterizing GC. Furthermore, we performed an auxiliary experiment to assess the modalities’ performance under different resampling spacings. Results: For the three tasks, the areas under the curve (AUCs) were: 0.712 (95% confidence interval, 0.613–0.811) for $\mathrm{Model}_{2D}^{LNM}$ and 0.680 (0.584–0.775) for $\mathrm{Model}_{3D}^{LNM}$; 0.677 (0.595–0.761) for $\mathrm{Model}_{2D}^{LVI}$ and 0.615 (0.528–0.703) for $\mathrm{Model}_{3D}^{LVI}$; 0.840 (0.779–0.901) for $\mathrm{Model}_{2D}^{pT}$ and 0.813 (0.747–0.879) for $\mathrm{Model}_{3D}^{pT}$. Moreover, the auxiliary experiment indicated that the $\mathrm{Models}_{2D}$ were statistically more advantageous than the $\mathrm{Models}_{3D}$ with different resampling spacings. Conclusion: Models constructed with 2D radiomic features showed performance comparable to those constructed with 3D features in characterizing GC. Significance: Our work indicated that time-saving 2D annotation would be the better choice in GC, and provided a related reference for further radiomics-based research.

JBHI Journal 2021 Journal Article

Accurate and Feasible Deep Learning Based Semi-Automatic Segmentation in CT for Radiomics Analysis in Pancreatic Neuroendocrine Neoplasms

  • Bingsheng Huang
  • Xiaoyi Lin
  • Jingxian Shen
  • Xin Chen
  • Jia Chen
  • Zi-Ping Li
  • Mingyu Wang
  • Chenglang Yuan

Current clinical practice or radiomics studies of pancreatic neuroendocrine neoplasms (pNENs) require manual delineation of the lesions in computed tomography (CT) images, which is time-consuming and subjective. We used a semi-automatic deep learning (DL) method for segmentation of pNENs and verified its feasibility in radiomics analysis. This retrospective study included two datasets: Dataset 1, contrast-enhanced CT images (CECT) of 80 and 18 patients respectively collected from two centers; and Dataset 2, CECT of 56 and 16 patients respectively from two centers. A DL-based semi-automatic segmentation model was developed and validated with Dataset 1 and Dataset 2, and the segmentation results were used for radiomics analysis, the performance of which was compared against that based on manual segmentation. The mean Dice similarity coefficient of the trained segmentation model was 81.8% and 74.8% for external validation with Dataset 1 and Dataset 2, respectively. Four classifiers frequently used in radiomics studies were trained and tested with a leave-one-out cross-validation strategy. For pathological grading prediction with Dataset 1, the area under the receiver operating characteristic curve (AUC) with semi-automatic segmentation was up to 0.76 and 0.87 for internal and external validation, respectively. For the recurrence study with Dataset 2, the AUC with semi-automatic segmentation was up to 0.78. None of these AUCs differed significantly from the corresponding results based on manual segmentation. Our study showed that DL-based semi-automatic segmentation is accurate and feasible for the radiomics analysis in pNENs.
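For readers unfamiliar with the Dice similarity coefficient reported above, a minimal sketch of the standard definition, 2|A ∩ B| / (|A| + |B|), is given here. The flattened binary masks are made-up toy data, not taken from the study, and the convention for two empty masks is an assumption.

```python
def dice_coefficient(pred, truth):
    """Dice similarity between two equally long binary masks (flattened)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of elements")
    # Count voxels marked as lesion in both masks.
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # ASSUMPTION: two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy masks: 3 predicted voxels, 3 reference voxels, 2 shared.
predicted = [1, 1, 0, 0, 1, 0]
reference = [1, 0, 0, 0, 1, 1]
print(dice_coefficient(predicted, reference))
```

A value of 1 means the automatic and manual masks coincide exactly, and 0 means they do not overlap at all.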

NeurIPS Conference 2021 Conference Paper

An Empirical Investigation of Representation Learning for Imitation

  • Cynthia Chen
  • Xin Chen
  • Sam Toyer
  • Cody Wild
  • Scott Emmons
  • Ian Fischer
  • Kuang-Huei Lee
  • Neel Alex

Imitation learning often needs a large demonstration set in order to handle the full range of situations that an agent might find itself in during deployment. However, collecting expert demonstrations can be expensive. Recent work in vision, reinforcement learning, and NLP has shown that auxiliary representation learning objectives can reduce the need for large amounts of expensive, task-specific data. Our Empirical Investigation of Representation Learning for Imitation (EIRLI) investigates whether similar benefits apply to imitation learning. We propose a modular framework for constructing representation learning algorithms, then use our framework to evaluate the utility of representation learning for imitation across several environment suites. In the settings we evaluate, we find that existing algorithms for image-based representation learning provide limited value relative to a well-tuned baseline with image augmentations. To explain this result, we investigate differences between imitation learning and other settings where representation learning has provided significant benefit, such as image classification. Finally, we release a well-documented codebase which both replicates our findings and provides a modular framework for creating new representation learning algorithms out of reusable components.

AAAI Conference 2021 Conference Paper

Correlation-Aware Heuristic Search for Intelligent Virtual Machine Provisioning in Cloud Systems

  • Chuan Luo
  • Bo Qiao
  • Wenqian Xing
  • Xin Chen
  • Pu Zhao
  • Chao Du
  • Randolph Yao
  • Hongyu Zhang

Resource optimization is crucial for the operation of public cloud systems such as Microsoft Azure, as well as servers dedicated to the workloads of large customers such as Microsoft 365. Those optimization tasks often need to take unknown parameters into consideration and can be formulated as Prediction+Optimization problems. This paper proposes a new Prediction+Optimization method named Correlation-Aware Heuristic Search (CAHS) that is capable of accounting for the uncertainty in unknown parameters and delivering effective solutions to difficult optimization problems. We apply this method to solving the predictive virtual machine (VM) provisioning (PreVMP) problem, where the VM provisioning plans are optimized based on the predicted demands of different VM types, to ensure rapid provisioning upon customers’ requests and to pursue high resource utilization. Unlike the current state-of-the-art PreVMP approaches that assume independence among the demands for different VM types, CAHS incorporates demand correlation when conducting prediction and optimization in a novel and effective way. Our experiments on two public benchmarks and one industrial benchmark demonstrate that CAHS can achieve better performance than its nine state-of-the-art competitors. CAHS has been successfully deployed in Microsoft Azure and significantly improved its performance. The main ideas of CAHS have also been leveraged to improve the efficiency and the reliability of the cloud services provided by Microsoft 365.

IJCAI Conference 2021 Conference Paper

Few-shot Neural Human Performance Rendering from Sparse RGBD Videos

  • Anqi Pang
  • Xin Chen
  • Haimin Luo
  • Minye Wu
  • Jingyi Yu
  • Lan Xu

Recent neural rendering approaches for human activities achieve remarkable view synthesis results, but still rely on dense input views or dense training with all the captured frames, leading to deployment difficulty and inefficient training overhead. Moreover, existing approaches become ill-posed if the input is both spatially and temporally sparse. To fill this gap, in this paper we propose a few-shot neural human rendering approach (FNHR) from only sparse RGBD inputs, which exploits the temporal and spatial redundancy to generate photo-realistic free-view output of human activities. Our FNHR is trained only on the key-frames which expand the motion manifold in the input sequences. We introduce a two-branch neural blending to combine the neural point render and classical graphics texturing pipeline, which integrates reliable observations over sparse key-frames. Furthermore, we adopt a patch-based adversarial training process to make use of the local redundancy and avoid over-fitting to the key-frames, which generates fine-detailed rendering results. Extensive experiments demonstrate the effectiveness of our approach in generating high-quality free-viewpoint results for challenging human performances under the sparse setting.

NeurIPS Conference 2021 Conference Paper

FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark

  • Mingjie Li
  • Wenjia Cai
  • Rui Liu
  • Yuetian Weng
  • Xiaoyun Zhao
  • Cong Wang
  • Xin Chen
  • Zhong Liu

The automatic generation of long and coherent medical reports given medical images (e.g., chest X-ray and Fundus Fluorescein Angiography (FFA)) has great potential to support clinical practice. Researchers have explored advanced methods from computer vision and natural language processing to incorporate medical domain knowledge for the generation of readable medical reports. However, existing medical report generation (MRG) benchmarks lack both explainable annotations and reliable evaluation tools, hindering the current research advances from two aspects: firstly, existing methods can only predict reports without accurate explanation, undermining the trustworthiness of the diagnostic methods; secondly, the comparison among the predicted reports from different MRG methods is unreliable using the evaluation metrics of natural-language generation (NLG). To address these issues, in this paper, we propose an explainable and reliable MRG benchmark based on FFA Images and Reports (FFA-IR). Specifically, FFA-IR is large, with 10,790 reports along with 1,048,584 FFA images from clinical practice; it includes explainable annotations, based on a schema of 46 categories of lesions; and it is bilingual, providing both English and Chinese reports for each case. Besides using the widely used NLG metrics, we propose a set of nine human evaluation criteria to evaluate the generated reports. We envision FFA-IR as a testbed for explainable and reliable medical report generation. We also hope that it can broadly accelerate medical imaging research and facilitate interaction between the fields of medical imaging, computer vision, and natural language processing.

AAAI Conference 2021 Conference Paper

Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks

  • Xin Chen
  • Lingxi Xie
  • Jun Wu
  • Longhui Wei
  • Yuhui Xu
  • Qi Tian

Neural architecture search has attracted wide attention in both academia and industry. To accelerate it, researchers proposed weight-sharing methods which first train a super-network to reuse computation among different operators, from which exponentially many sub-networks can be sampled and efficiently evaluated. These methods enjoy great advantages in terms of computational costs, but the sampled sub-networks are not guaranteed to be estimated precisely unless an individual training process is taken. This paper attributes such inaccuracy to the inevitable mismatch between assembled network layers, so that there is a random error term added to each estimation. We alleviate this issue by training a graph convolutional network to fit the performance of sampled sub-networks so that the impact of random errors becomes minimal. With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates, which consequently leads to better performance of the final architecture. In addition, our approach also enjoys the flexibility of being used under different hardware constraints, since the graph convolutional network has provided an efficient lookup table of the performance of architectures in the entire search space.

TCS Journal 2021 Journal Article

Multiple facility location games with envy ratio

  • Wenjing Liu
  • Yuan Ding
  • Xin Chen
  • Qizhi Fang
  • Qingqin Nong

We study deterministic mechanism design without money for k-facility location games with envy ratio on a real line segment, where a set of strategic agents report their locations and a social planner locates k facilities for minimizing the envy ratio. The objective of envy ratio, which is defined as the maximum over the ratios between any two agents' utilities, is derived from fair division to measure the fairness with respect to a certain facility location profile. The problem is studied in two settings. In the homogeneous k-facility location game where k facilities serve the same purpose, we propose a $2k/(2k-1)$-approximate deterministic group strategyproof mechanism which is also the best deterministic strategyproof mechanism. In the heterogeneous k-facility location game where each facility serves a different purpose, when k is even, we devise an optimal and group strategyproof mechanism; when k is odd, we provide a $(k+1)/(k-1)$-approximate deterministic group strategyproof mechanism.
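To make the envy-ratio objective concrete, the sketch below evaluates it for a given facility placement in the homogeneous setting. The abstract does not state the utility function, so the code assumes agents and facilities lie on the segment [0, 1] and that an agent's utility is 1 minus the distance to its nearest facility; that choice, the function name, and the sample locations are all illustrative assumptions.

```python
def envy_ratio(agents, facilities):
    """Largest ratio between any two agents' utilities for a placement."""
    # ASSUMPTION: utility = 1 - distance to the nearest facility on [0, 1].
    utilities = [1.0 - min(abs(x - f) for f in facilities) for x in agents]
    lowest = min(utilities)
    if lowest <= 0.0:
        # An agent with zero utility makes the ratio unbounded.
        return float("inf")
    # The maximum over all ordered pairs equals max utility / min utility.
    return max(utilities) / lowest


# Three agents, one facility placed at the median agent's location.
print(envy_ratio([0.1, 0.4, 0.9], [0.4]))
```

An envy ratio of 1 means every agent is equally well served, which is the fairest possible outcome under this objective.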

NeurIPS Conference 2021 Conference Paper

On the Bias-Variance-Cost Tradeoff of Stochastic Optimization

  • Yifan Hu
  • Xin Chen
  • Niao He

We consider stochastic optimization when one only has access to biased stochastic oracles of the objective, and obtaining stochastic gradients with low biases comes at high costs. This setting captures a variety of optimization paradigms widely used in machine learning, such as conditional stochastic optimization, bilevel optimization, and distributionally robust optimization. We examine a family of multi-level Monte Carlo (MLMC) gradient methods that exploit a delicate trade-off among the bias, the variance, and the oracle cost. We provide a systematic study of their convergence and total computational complexity for strongly convex, convex, and nonconvex objectives, and demonstrate their superiority over the naive biased stochastic gradient method. Moreover, when applied to conditional stochastic optimization, the MLMC gradient methods significantly improve the best-known sample complexity in the literature.

EAAI Journal 2021 Journal Article

The Barzilai–Borwein Method for distributed optimization over unbalanced directed networks

  • Jinhui Hu
  • Xin Chen
  • Lifeng Zheng
  • Ling Zhang
  • Huaqing Li

This paper studies optimization problems over multi-agent systems, in which all agents cooperatively minimize a global objective function expressed as a sum of local cost functions. Each agent in the system uses only local computation and communication in the overall process without leaking its private information. Based on the Barzilai–Borwein (BB) method and multi-consensus inner loops, a distributed algorithm that allows larger step-sizes and accelerated convergence, named ADBB, is proposed. Moreover, owing to the employment of only row-stochastic weight matrices, ADBB can solve optimization problems over unbalanced directed networks without requiring each agent to know its neighbors’ out-degrees. By establishing contraction relationships between the consensus error, the optimality gap, and the gradient tracking error, ADBB is theoretically proved to converge linearly to the global optimal solution. A real-world data set is used in simulations to validate the correctness of the theoretical analysis.
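The Barzilai–Borwein step-size rule underlying the abstract above can be sketched in its classical centralized, single-agent form. This is not the distributed ADBB algorithm: there is no consensus loop or gradient tracking, and the quadratic test problem, the function names, and the initial step size are assumptions made for illustration.

```python
def bb_gradient_descent(grad, x0, alpha0=0.01, iters=100, tol=1e-12):
    """Gradient descent with the Barzilai-Borwein step alpha = s.s / s.y."""
    x = list(x0)
    g = grad(x)
    alpha = alpha0
    for _ in range(iters):
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break  # gradient is numerically zero
        x_new = [xi - alpha * gi for xi, gi in zip(x, g)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]  # change in iterate
        y = [a - b for a, b in zip(g_new, g)]  # change in gradient
        sy = sum(si * yi for si, yi in zip(s, y))
        if sy > 0:
            # BB step: a scalar estimate of the inverse curvature.
            alpha = sum(si * si for si in s) / sy
        x, g = x_new, g_new
    return x


def quad_grad(x):
    # HYPOTHETICAL test problem: f(x) = 0.5*(x0^2 + 10*x1^2) - x0 - 10*x1,
    # whose unique minimizer is (1, 1).
    return [x[0] - 1.0, 10.0 * x[1] - 10.0]


print([round(v, 6) for v in bb_gradient_descent(quad_grad, [0.0, 0.0])])
```

The step size adapts to local curvature without a line search, which is why BB-type methods tolerate larger steps than a fixed, conservatively chosen step size.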

NeurIPS Conference 2020 Conference Paper

Biased Stochastic First-Order Methods for Conditional Stochastic Optimization and Applications in Meta Learning

  • Yifan Hu
  • Siqi Zhang
  • Xin Chen
  • Niao He

Conditional stochastic optimization covers a variety of applications ranging from invariant learning and causal inference to meta-learning. However, constructing unbiased gradient estimators for such problems is challenging due to the composition structure. As an alternative, we propose a biased stochastic gradient descent (BSGD) algorithm and study the bias-variance tradeoff under different structural assumptions. We establish the sample complexities of BSGD for strongly convex, convex, and weakly convex objectives under smooth and non-smooth conditions. Our lower bound analysis shows that the sample complexities of BSGD cannot be improved for general convex objectives and nonconvex objectives except for smooth nonconvex objectives with Lipschitz continuous gradient estimator. For this special setting, we propose an accelerated algorithm called biased SpiderBoost (BSpiderBoost) that matches the lower bound complexity. We further conduct numerical experiments on invariant logistic regression and model-agnostic meta-learning to illustrate the performance of BSGD and BSpiderBoost.

TCS Journal 2020 Journal Article

Flow shop for dual CPUs in dynamic voltage scaling

  • Vincent Chau
  • Xin Chen
  • Ken C.K. Fong
  • Minming Li
  • Kai Wang

We study the following flow shop scheduling problem on two processors. We are given n jobs with a common deadline D, where each job j has workload $p_{i,j}$ on processor i and a set of processors which can vary their speed dynamically. Job j can be executed on the second processor only after the execution of job j is completed on the first processor. Our objective is to find a feasible schedule such that all jobs are completed by the common deadline D with minimized energy consumption. For this model, we present a linear program for the discrete speed case, where the processor can only run at specific speeds in $S = \{s_1, s_2, \dots, s_q\}$ and the job execution order is fixed. We also provide an $m^{\alpha-1}$-approximation algorithm for the arbitrary order case and for the continuous speed model, where m is the number of processors and α is a parameter of the processor. We then introduce a new variant of the flow shop scheduling problem called the sense-and-aggregate model, motivated by data aggregation in wireless sensor networks where the base station needs to receive data from sensors and then compute a single aggregate result. In this model, the first processor will receive unit size data from sensors and the second processor is responsible for calculating the aggregate result. The second processor can decide when to aggregate; the workload needed to aggregate x data will be $f(x)$, and another unit size data will be generated as the result of the partial aggregation, which will then be used in the next round of aggregation. Our objective is to find a schedule such that all data are received and aggregated by the deadline with minimum energy consumption. We present an $O(n^5)$ dynamic programming algorithm when $f(x) = x$ and a greedy algorithm when $f(x) = x - 1$. Finally, we investigate the performance of the flowshop problem when the order of jobs is fixed by comparing it to the approximation algorithm with an arbitrary order. We show experimentally that the approximation ratio is close to 1 when there are few machines and when there are more jobs.

TCS Journal 2020 Journal Article

General Rumor Blocking: An efficient random algorithm with martingale approach

  • Qizhi Fang
  • Xin Chen
  • Qingqin Nong
  • Zongchao Zhang
  • Yongchang Cao
  • Yan Feng
  • Tao Sun
  • Suning Gong

Rumor Blocking, an important optimization problem in social networks, has been extensively studied in the literature. Given a social network $G = (V, E)$ and a rumor seed set A, the goal is to find k protector seeds that protect the largest expected number of social individuals by truth. However, in real situations the source of a rumor is always uncertain, rather than being predicted or known in advance, while rumor spreads like wildfire on the Internet. This paper presents General Rumor Blocking with an unpredicted rumor seed set (randomized A) and various personal profits while being protected (weights of nodes in V). We first show that the objective function of this problem is non-decreasing and submodular, and thus a $(1 - 1/e)$-approximate solution can be returned by the greedy approach. We then propose an efficient random algorithm R-GRB which returns a $(1 - 1/e - \varepsilon)$-approximate solution with probability at least $1 - n^{-\ell}$. We show that it runs in $O(m(n-r)(k\log(n-r) + \ell\log n)/\varepsilon^2)$ expected time, where $m = |E|$, $n = |V|$, $r = |A|$ and k is the number of protector seeds. Finally, we conduct extensive experiments to evaluate R-GRB and show that it is superior in both theory and experiment.
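The greedy approach behind the $(1 - 1/e)$ guarantee for non-decreasing submodular objectives can be sketched on a toy weighted-coverage function. This is not R-GRB and involves no rumor diffusion model: the `COVERAGE` and `WEIGHTS` tables and the function names are hypothetical, chosen only so that the objective is monotone and submodular.

```python
def greedy_select(candidates, k, value):
    """Pick up to k items, each time adding the largest marginal gain."""
    chosen = []
    for _ in range(k):
        base = value(chosen)
        best, best_gain = None, 0.0
        for item in candidates:
            if item in chosen:
                continue
            gain = value(chosen + [item]) - base
            if gain > best_gain:
                best, best_gain = item, gain
        if best is None:
            break  # no remaining item improves the objective
        chosen.append(best)
    return chosen


# HYPOTHETICAL toy data: nodes each protector seed would protect,
# and the profit (weight) of protecting each node.
COVERAGE = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}
WEIGHTS = {1: 3.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0}


def covered_weight(seeds):
    """Total weight of nodes protected by at least one chosen seed."""
    protected = set()
    for seed in seeds:
        protected |= COVERAGE[seed]
    return sum(WEIGHTS[node] for node in protected)


picked = greedy_select(list(COVERAGE), 2, covered_weight)
print(picked, covered_weight(picked))
```

In the actual problem the objective is an expectation over random diffusion and a random rumor seed set, so each marginal gain would itself have to be estimated by sampling rather than computed exactly as here.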

NeurIPS Conference 2020 Conference Paper

Graph Stochastic Neural Networks for Semi-supervised Learning

  • Haibo Wang
  • Chuan Zhou
  • Xin Chen
  • Jia Wu
  • Shirui Pan
  • Jilong Wang

Graph Neural Networks (GNNs) have achieved remarkable performance in the task of semi-supervised node classification. However, most existing models learn a deterministic classification function, which lacks sufficient flexibility to explore better choices in the presence of imperfect observed data such as scarce labeled nodes and noisy graph structure. To address the rigidity and inflexibility of deterministic classification functions, this paper proposes a novel framework named Graph Stochastic Neural Networks (GSNN), which aims to model the uncertainty of the classification function by simultaneously learning a family of functions, i.e., a stochastic function. Specifically, we introduce a learnable graph neural network coupled with a high-dimensional latent variable to model the distribution of the classification function, and further adopt amortised variational inference to approximate the intractable joint posterior for missing labels and the latent variable. By maximizing the lower bound of the likelihood for observed node labels, the instantiated models can be trained in an end-to-end manner effectively. Extensive experiments on three real-world datasets show that GSNN achieves substantial performance gain in different scenarios compared with state-of-the-art baselines.

IJCAI Conference 2020 Conference Paper

Intelligent Virtual Machine Provisioning in Cloud Computing

  • Chuan Luo
  • Bo Qiao
  • Xin Chen
  • Pu Zhao
  • Randolph Yao
  • Hongyu Zhang
  • Wei Wu
  • Andrew Zhou

Virtual machine (VM) provisioning is a common and critical problem in cloud computing. In industrial cloud platforms, a huge number of VMs are provisioned per day. Because of its complexity and resource constraints, VM provisioning needs to be carefully optimized so that cloud platforms utilize resources effectively. Moreover, in practice, provisioning a VM from scratch takes a fairly long time, which would degrade the customer experience. Hence, it is advisable to provision VMs ahead of time for upcoming demands. In this work, we formulate the practical scenario as the predictive VM provisioning (PreVMP) problem, where upcoming demands are unknown and need to be predicted in advance, and then the VM provisioning plan is optimized based on the predicted demands. Further, we propose Uncertainty-Aware Heuristic Search (UAHS) for solving the PreVMP problem. UAHS first models the prediction uncertainty, and then utilizes the prediction uncertainty in optimization. Moreover, UAHS leverages Bayesian optimization to let prediction and optimization interact, improving its practical performance. Extensive experiments show that UAHS performs much better than state-of-the-art competitors on two public datasets and an industrial dataset. UAHS has been successfully applied in Microsoft Azure and brought practical benefits in real-world applications.

IROS Conference 2020 Conference Paper

Path Planning Under MIMO Network Constraints for Throughput Enhancement in Multi-robot Data Aggregation Tasks

  • Alexandra Pogue
  • Samer S. Hanna
  • Andy Nichols
  • Xin Chen
  • Danijela Cabric
  • Ankur Mehta

Under line-of-sight (LOS) network conditions, multi-input multi-output (MIMO) wireless communications can increase the channel capacity between a team of robots and a multi-antenna array at a stationary base station. This increased capacity can result in greater data throughput, shortening the time necessary to complete channel-limited data aggregation tasks. To take advantage of this higher capacity channel, the robots in the team must be positioned to maximize complex channel orthogonality between each robot and receiver antenna. Using geometrically motivated assumptions, we derive transmitter spacing rules that can easily be added on to existing path plans to improve backhaul throughput for data offloading from the robot team, with minimal impact on other system objectives. We demonstrate the effectiveness of the approach, both in ideal channels and in realistic channels outside the domain of our simplifying assumptions, with numerical examples of robot-coordinated path plans in two example environments, achieving up to 42% improvement in task completion times.

YNICL Journal 2020 Journal Article

Protein-based amide proton transfer-weighted MR imaging of amnestic mild cognitive impairment

  • Zewen Zhang
  • Caiqing Zhang
  • Jian Yao
  • Xin Chen
  • Fei Gao
  • Shanshan Jiang
  • Weibo Chen
  • Jinyuan Zhou

Amide proton transfer-weighted (APTw) MRI is a novel molecular imaging technique that can noninvasively detect endogenous cellular proteins and peptides in tissue. Here, we demonstrate the feasibility of protein-based APTw MRI in characterizing amnestic mild cognitive impairment (aMCI). Eighteen patients with confirmed aMCI and 18 matched normal controls were scanned at 3 Tesla. The APTw, as well as conventional magnetization transfer ratio (MTR), signal differences between aMCI and normal groups were assessed by the independent samples t-test, and the receiver-operator-characteristic analysis was used to assess the diagnostic performance of APTw. Compared with the normal control group, aMCI brains typically had relatively higher APTw signals. Quantitatively, APTw intensity values were significantly higher in nine of 12 regions of interest in aMCI patients than in normal controls. The largest areas under the receiver-operator-characteristic curves were 0.88 (gray matter in occipital lobe) and 0.82 (gray matter in temporal lobe, white matter in occipital lobe) in diagnosing aMCI patients. In contrast, MTR intensity values were significantly higher in only three of 12 regions of interest in the aMCI group. Additionally, the age dependency analyses revealed that these cross-sectional APTw/MTR signals had an increasing trend with age in most brain regions for normal controls, but a decreasing trend with age in most brain regions for aMCI patients. Our early results show the potential of the APTw signal as a new imaging biomarker for the noninvasive molecular diagnosis of aMCI.

YNICL Journal 2019 Journal Article

Disrupted functional and structural connectivity within default mode network contribute to WMH-related cognitive impairment

  • Xin Chen
  • Lili Huang
  • Qing Ye
  • Dan Yang
  • Ruomeng Qin
  • Caimei Luo
  • Mengchun Li
  • Bing Zhang

AIMS: The prevalence of white matter hyperintensities (WMH) rises dramatically with aging. Both the progression of WMH and changing patterns of default mode network (DMN) have been proven to be closely associated with cognitive function. The present study hypothesized that changes in functional connectivity and structural connectivity of DMN contributed to WMH-related cognitive impairment. METHODS: A total of 116 subjects were enrolled from the Cerebral Small Vessel Disease Register in Drum Tower Hospital of Nanjing University, and were distributed across three categories according to Fazekas rating scale: WMH I (n = 57), WMH II (n = 34), and WMH III (n = 25). All participants underwent neuropsychological tests and multimodal MRI scans, including diffusion tensor imaging and resting-state fMRI. The alterations of functional connectivity and structural connectivity within the DMN were further explored. RESULTS: Age and hypertension were risk factors for WMH progression. Subjects with a higher WMH burden displayed higher DMN functional connectivity in the medial frontal gyrus, but lower DMN functional connectivity in the thalamus. After adjusting for aging, gender, and education, the increased DMN functional connectivity in the medial frontal gyrus, and the increased mean diffusivity of the white matter tracts between the hippocampus and posterior cingulate cortex were independent indicators of worse performance in memory. Moreover, the decreased DMN functional connectivity in the thalamus and increased mean diffusivity of the white matter tracts between the thalamus and posterior cingulate cortex were independent risk factors for a slower processing speed. CONCLUSION: The changes in functional connectivity and structural connectivity within the DMN attributed to WMH progression were responsible for the development of cognitive impairment.

NeurIPS Conference 2019 Conference Paper

Online Optimal Control with Linear Dynamics and Predictions: Algorithms and Regret Analysis

  • Yingying Li
  • Xin Chen
  • Na Li

This paper studies the online optimal control problem with time-varying convex stage costs for a time-invariant linear dynamical system, where a finite lookahead window of accurate predictions of the stage costs is available at each time. We design online algorithms, Receding Horizon Gradient-based Control (RHGC), that utilize the predictions through finite steps of gradient computations. We study the algorithm performance measured by dynamic regret: the online performance minus the optimal performance in hindsight. It is shown that the dynamic regret of RHGC decays exponentially with the size of the lookahead window. In addition, we provide a fundamental limit of the dynamic regret for any online algorithm by considering linear quadratic tracking problems. The regret upper bound of one RHGC method almost reaches the fundamental limit, demonstrating the effectiveness of the algorithm. Finally, we numerically test our algorithms for both linear and nonlinear systems to show the effectiveness and generality of our RHGC.

YNICL Journal 2019 Journal Article

Reorganization of the somatosensory pathway after subacute incomplete cervical cord injury

  • Qian Chen
  • Weimin Zheng
  • Xin Chen
  • Xuejing Li
  • Ling Wang
  • Wen Qin
  • Kuncheng Li
  • Nan Chen

OBJECTIVE: The main purpose of the present study was to investigate the possible somatosensory-related brain functional reorganization after traumatic spinal cord injury (SCI). METHODS: Thirteen patients with subacute incomplete cervical cord injury (ICCI) and thirteen age- and sex-matched healthy controls (HCs) were recruited. Eleven patients and all the HCs underwent both sensory task-related brain functional scanning and whole brain structural scanning on a 3.0 Tesla MRI system, and two patients underwent only structural scanning; structural scanning was thus completed on all thirteen patients, while functional scanning was applied to only eleven patients. We performed sensory task-related functional MRI (fMRI) to investigate the functional changes in the brain. In addition, voxel-based morphometry (VBM) was applied to explore whether any sensory-related brain structural changes occur in the whole brain after SCI. RESULTS: Compared with HCs, ICCI patients exhibited decreased activation in the left postcentral gyrus (postCG), the brainstem (midbrain and right pons) and the right cerebellar lobules IV-VI. Moreover, a significant positive association was found between the activation in the left postCG and the activation in both the brainstem and the right cerebellar lobules IV-VI. Additionally, a decrease in gray matter volume (GMV) was detected in the left superior parietal lobule (SPL). A decrease in white matter volume (WMV) was observed in the right temporal lobe, the right occipital lobe, and the right calcarine gyrus. No structural change in the primary sensory cortex (S1), the secondary somatosensory cortex (S2) or the thalamus was detected. CONCLUSION: These functional and structural findings may demonstrate the existence of an alternative pathway in the impairment of somatosensory function after SCI, which consists of the ipsilateral cerebellum, the brainstem and the contralateral postCG. It provides a new theoretical basis for the mechanism of sensory-related brain alteration in SCI patients and the rehabilitation therapy based on this pathway in the future.

YNICL Journal 2018 Journal Article

Gray-matter-specific MR imaging improves the detection of epileptogenic zones in focal cortical dysplasia: A new sequence called fluid and white matter suppression (FLAWS)

  • Xin Chen
  • Tianyi Qian
  • Tobias Kober
  • Guojun Zhang
  • Zhiwei Ren
  • Tao Yu
  • Yueshan Piao
  • Nan Chen

Objectives: To evaluate the diagnostic value and characteristic features of FCD epileptogenic zones using a novel sequence called fluid and white matter suppression (FLAWS). Materials and methods: Thirty-nine patients with pathologically confirmed FCD and good surgery outcomes (class I or II, according to the Engel Epilepsy Surgery Outcome Scale) were retrospectively included in the study. All the patients underwent a preoperative whole-brain MRI examination that included conventional sequences (T2WI, T1WI, two-dimensional (2D) axial, coronal fluid-attenuated inversion recovery [FLAIR]) and FLAWS. An additional 3D-FLAIR MRI sequence was performed in 17 patients. To evaluate the sensitivity and specificity of FLAWS and investigate the cause of false-positives, 36 healthy volunteers were recruited as normal controls. Two radiologists evaluated all the image data. The detection rates of the FCD epileptogenic zone on different sequences were compared based on five criteria: abnormal cortical morphology (thickening, thinning, or abnormally deep sulcus); abnormal cortical signal intensity; blurred gray-white matter junction; abnormal signal intensity of the subcortical white matter, and the transmantle sign. The sensitivity and specificity of FLAWS for detecting the FCD lesions were calculated with the reviewers blinded to all the clinical information, i.e. to the patient identity and the location of the resected regions. To explore how many features were sufficient for the diagnosis of the epileptogenic zones, the frequency of each criterion in the resected regions and their combinations were assessed on FLAWS, according to the results of the assessment when the reviewers were aware of the location of the resected regions. 
Based on the findings of the 17 patients with an additional 3D-FLAIR scan when the reviewers were aware of the location of the resected regions, quantitative analysis of the regions of interest was used to compare the tissue contrast among 2D-axial FLAIR, 3D-FLAIR, and the FLAWS sequence. Visualization score analysis was used to evaluate the visualization of the five features on conventional, 3D-FLAIR, and FLAWS images. Finally, to explore the reason for false-positive results, a further evaluation of the whole brain FLAWS images was conducted for all the subjects. Results: The sensitivity and specificity for detecting the FCD lesions on the FLAWS sequence were 71.9% and 71.1%, respectively. When the reviewers were blinded to the location of the resected regions, the detection rate of the FLAWS sequence was significantly higher than that of the conventional sequences (P = 0.00). In the 17 patients who underwent an additional 3D FLAIR scan, no statistically significant difference was found between the FLAWS and the 3D-FLAIR (P = 0.25). All the patients had at least two imaging features, one of which was "the blurred junction of the gray-white matter." The transmantle sign, which is widely believed to be a specific feature of FCD type II, could also be observed in type I on the FLAWS sequence. The relative tissue contrast of FLAWS was higher than that of the 2D-FLAIR with respect to lesion/white matter (WM), deep gray matter (GM)/WM, and cortex/WM (P = 0.00 for all three measures) and higher than that of the 3D-FLAIR with respect to the lesion/WM (P = 0.01). The visualization score analysis showed that the visualization of FLAWS was more enhanced than that of the conventional and 3D-FLAIR images with respect to the blurred junction (P = 0.00 for both comparisons) and the abnormal signal intensity of the subcortical white matter (P = 0.01 for both comparisons). 
The thin-threadlike signal and individual FCD features outside the epileptogenic regions were considered the primary cause of the false-positive results of FLAWS. Conclusions: FLAWS can help in the detection of FCD epileptogenic zones. It is recommended that epileptogenic zone on FLAWS be diagnosed based on a combination of two features, one of which should be the "blurred junction of the gray-white matter" in types I and II. In type III, the combination of "the blurred junction of the gray-white matter" with "abnormal signal intensity of subcortical white matter" is recommended.

TAAS Journal 2017 Journal Article

Integrating Reinforcement Learning with Multi-Agent Techniques for Adaptive Service Composition

  • Hongbing Wang
  • Xin Chen
  • Qin Wu
  • Qi Yu
  • Xingguo Hu
  • Zibin Zheng
  • Athman Bouguettaya

Service-oriented architecture is a widely used software engineering paradigm to cope with complexity and dynamics in enterprise applications. Service composition, which provides a cost-effective way to implement software systems, has attracted significant attention from both industry and research communities. As online services may keep evolving over time and thus lead to a highly dynamic environment, service composition must be self-adaptive to tackle uninformed behavior during the evolution of services. In addition, service composition should also maintain high efficiency for large-scale services, which are common for enterprise applications. This article presents a new model for large-scale adaptive service composition based on multi-agent reinforcement learning. The model integrates reinforcement learning and game theory, where the former is to achieve adaptation in a highly dynamic environment and the latter is to enable agents to work for a common task (i.e., composition). In particular, we propose a multi-agent Q-learning algorithm for service composition, which is expected to achieve better performance when compared with the single-agent Q-learning method and multi-agent SARSA (State-Action-Reward-State-Action) method. Our experimental results demonstrate the effectiveness and efficiency of our approach.
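The abstract above contrasts Q-learning with SARSA. As a rough illustration only, the following minimal sketch shows the textbook single-agent tabular update rules for the two methods; it is not the paper's multi-agent algorithm, and the state and action labels, `alpha`, and `gamma` are hypothetical values chosen for the example.

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action available in s_next."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken in s_next."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])

# Hypothetical example: two candidate services "x" and "y" at each step.
actions = ["x", "y"]

Q1 = defaultdict(float)
Q1[("s1", "x")], Q1[("s1", "y")] = 1.0, 2.0
q_learning_update(Q1, "s0", "x", 1.0, "s1", actions, alpha=0.5, gamma=0.5)

Q2 = defaultdict(float)
Q2[("s1", "x")], Q2[("s1", "y")] = 1.0, 2.0
sarsa_update(Q2, "s0", "x", 1.0, "s1", "x", alpha=0.5, gamma=0.5)

print(Q1[("s0", "x")], Q2[("s0", "x")])
```

The only difference between the two rules is the bootstrap target: Q-learning uses the maximum value over next actions, while SARSA uses the value of the next action the agent actually selects.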

IJCAI Conference 2017 Conference Paper

Switched Linear Multi-Robot Navigation Using Hierarchical Model Predictive Control

  • Chao Huang
  • Xin Chen
  • Yifan Zhang
  • Shengchao Qin
  • Yifeng Zeng
  • Xuandong Li

Multi-robot navigation control in the absence of a reference trajectory is rather challenging, as it is expected to ensure stability and feasibility while still offering fast computation of control decisions. The intrinsic high complexity of switched linear dynamical robots makes the problem even more challenging. In this paper, we propose a novel HMPC-based method to address the navigation problem of multiple robots with switched linear dynamics. We develop a new technique to compute the reachable sets of switched linear systems and use them to enable the parallel computation of control parameters. We present theoretical results on stability, feasibility and complexity of the proposed approach, and demonstrate its empirical advance in performance against other approaches.

IJCAI Conference 2016 Conference Paper

Hierarchical Model Predictive Control for Multi-Robot Navigation

  • Chao Huang
  • Xin Chen
  • Yifan Zhang
  • Shengchao Qin
  • Yifeng Zeng
  • Xuandong Li

Ensuring the stability is the most important requirement for the navigation control of multi-robot systems with no reference trajectory. The popular heuristic-search methods cannot provide theoretical guarantees on stability. In this paper, we propose a Hierarchical Model Predictive Control scheme that employs reachable sets to decouple the navigation problem of linear dynamical multi-robot systems. The proposed control scheme guarantees the stability and feasibility, and is more efficient and viable than other Model Predictive Control schemes, as evidenced by our simulation results.

EAAI Journal 2015 Journal Article

Coordinated learning based on time-sharing tracking framework and Gaussian regression for continuous multi-agent systems

  • Xin Chen
  • Penghuan Xie
  • Yong He
  • Min Wu

Applying multi-agent reinforcement learning (MARL) in continuous distributed control systems is an attractive issue, because it enables agents to adaptively construct a cooperative behavior, even if the dynamics of such a distributed system are unknown a priori. However, the implementation of MARL always suffers from dimension explosion, nonstationary learning, and generalization in continuous systems. This paper presents a continuous coordinated learning algorithm with time-sharing tracking framework (CCL-TT) to deal with these problems, in which the dimension of the value function is reduced to lighten dimension explosion, the time-sharing tracking framework (TTF) is developed to solve nonstationary learning, and Gaussian regression modeling is applied to realize generalization. With TTF, a macroscopic concurrent learning is set up to meet the requirements of temporal stationary condition in value learning and generalization. Finally, the simulation illustrates how CCL-TT realizes cooperative learning without knowledge about the dynamics of the system, even with disturbance.

EAAI Journal 2013 Journal Article

Fuzzy SVM learning control system considering time properties of biped walking samples

  • Liyang Wang
  • Zhi Liu
  • C.L. Philip Chen
  • Yun Zhang
  • Sukhan Lee
  • Xin Chen

To learn biped walking dynamics accurately, and then compensate for time-varying external disturbances in a timely manner, a time-sequence-based fuzzy SVM (TSF-SVM) learning control system considering time properties of biped walking samples is proposed. For the first time, time-sequence-based triangular and Gaussian fuzzy membership functions are proposed for the single support phase (SSP) and the double support phase (DSP), respectively, according to time properties of different biped phases, which provides an effective way to formulate time properties of biped walking samples in the context of time-varying external disturbances. In addition, a time-sequence-based moving learning window (TS-MLW) is proposed for online training of the proposed TSF-SVM. The performance of the proposed TSF-SVM is compared with other typical intelligent methods; simulation results demonstrate that the proposed method is more sensitive to occasional external disturbances, which increases the stability margin and prevents the robot from falling down.

TCS Journal 2011 Journal Article

Optimal algorithms for online scheduling with bounded rearrangement at the end

  • Xin Chen
  • Yan Lan
  • Attila Benko
  • György Dósa
  • Xin Han

In this paper, we consider an online non-preemptive scheduling problem on two related machines, where at most K jobs are allowed to be rearranged, but only after all jobs have been revealed and (temporarily) scheduled. We minimize the makespan, and we call the problem Online scheduling with bounded rearrangement at the end (BRE), which is a semi-online problem. Jobs arrive one by one over a list. After all the jobs have arrived and been scheduled, we are informed that the input sequence is over; then at most K already scheduled jobs can be reassigned. With respect to the worst case ratio, we close the gap between the lower bound and upper bound, improving the previous result as well. In particular, for the lower bound, (i) for $s \ge 2$ an improved lower bound $\frac{s+2}{s+1}$ is obtained, which is better than $\frac{(s+1)^2}{s^2+s+1}$ (Liu et al. (2009) [9]); (ii) for $\frac{1+\sqrt{5}}{2} \le s < 2$, an improved lower bound $\frac{s^2}{s^2-s+1}$ is obtained, which is better than $\frac{(s+1)^2}{s^2+s+1}$ (Liu et al. (2009) [9]). For the upper bound, (i) for $s \ge 2$ and $K = 1$, a new upper bound $\frac{s+2}{s+1}$ is obtained, which is optimal and better than the one $\frac{s+1}{s}$ in Liu et al. (2009) [9]; (ii) for $\frac{1+\sqrt{5}}{2} \le s < 2$ and $K = 2$, an upper bound $\frac{s^2}{s^2-s+1}$ is proposed, which is optimal and better than the previous one $\frac{s+1}{s}$ in Liu et al. (2009) [9]; (iii) for $s < \frac{1+\sqrt{5}}{2}$ and $K = 2$, an upper bound $\frac{(s+1)^2}{s^2+s+1}$ is obtained, which is also optimal and better than the previous one $\min\left\{\frac{s+1}{s}, \frac{(s+1)^2}{s+2}\right\}$ in Liu et al. (2009) [9].
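To make the piecewise bounds quoted in the abstract above easier to compare, here is a minimal sketch that evaluates them numerically. It assumes that s denotes the speed ratio of the two machines with s ≥ 1 (the abstract does not define s), and the function names are hypothetical. Note that the tight ratios hold for different K: K = 1 for s ≥ 2, and K = 2 otherwise.

```python
from math import sqrt

GOLDEN = (1 + sqrt(5)) / 2  # breakpoint (1 + sqrt(5)) / 2 from the abstract

def tight_ratio(s: float) -> float:
    """Worst-case ratio that the abstract reports as optimal for a given s."""
    if s < 1:
        raise ValueError("assumed domain: s >= 1")
    if s >= 2:                               # K = 1
        return (s + 2) / (s + 1)
    if s >= GOLDEN:                          # K = 2
        return s**2 / (s**2 - s + 1)
    return (s + 1) ** 2 / (s**2 + s + 1)     # K = 2

def previous_lower_bound(s: float) -> float:
    """Lower bound attributed to Liu et al. (2009) in the abstract."""
    return (s + 1) ** 2 / (s**2 + s + 1)

for s in (1.0, 1.5, GOLDEN, 1.8, 2.0, 3.0):
    print(f"s={s:.3f}  tight={tight_ratio(s):.4f}  "
          f"previous lower bound={previous_lower_bound(s):.4f}")
```

The three pieces agree at the breakpoints, so the resulting ratio is continuous in s; for s above the golden ratio it lies strictly above the previous lower bound.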

ICRA Conference 2010 Conference Paper

Wearable accelerometer based extendable activity recognition system

  • Jie Yang 0002
  • Shuangquan Wang
  • Ningjiang Chen
  • Xin Chen
  • Pengfei Shi

Recognizing the human activities of daily living (ADL) is an important research issue in the pervasive environment. Activity recognition is treated as a classification problem, and a multi-class classifier is often used. Though the multi-class classifier can obtain high classification accuracy, it cannot detect noise activities or unknown activities, and the system has no extendable recognition capability. In this paper, we propose a recognition system which can recognize known activities and detect unknown activities simultaneously. For each known activity, a one-class classification model is built, and the combined one-class classification models are used to judge whether a test sample belongs to the known activities. For the known samples, the multi-class classifier is used to recognize their types. For the continuous unknown samples, training samples of new activities are extracted using a segmentation algorithm and added into the recognition system to extend the system's recognition capability.

IROS Conference 2006 Conference Paper

On-Line Vibration Source Detection of Running Trains Based on Acceleration Measurement

  • Chengyou Wang
  • Qiugen Xiao
  • Hua Liang
  • Xin Chen
  • Xuanping Cai
  • Yun-Hui Liu 0001

To ensure safety of railway operation, it is important to regularly check railway conditions such as deformation of the rails. To monitor rail deformation, this paper presents a method for on-line detection of the sources of vibrations of a running train by measuring accelerations, which include the train bogie's lateral acceleration and the crossbeam's lateral and vertical accelerations. A series of detection algorithms, including peak-peak value entropy comparison and weighted correlation coefficient comparison, are proposed in the method, according to the different characteristics of vibrations from the train itself and from rail deformation. To eliminate the vibration due to the train itself, the algorithm employs the peak-peak value entropy comparison. To identify the order of the vibrations between crossbeam and bogie, a weighted correlation coefficient is applied. Finally, the weight center and maximum position are used for detection. The algorithms were implemented on a passenger train using an ARM processor, and real experiments were conducted on the train on the railway between Shenyang and Dalian in China. The experiments demonstrated that the proposed method can produce satisfactory results.

NeurIPS Conference 2005 Conference Paper

The Role of Top-down and Bottom-up Processes in Guiding Eye Movements during Visual Search

  • Gregory Zelinsky
  • Wei Zhang
  • Bing Yu
  • Xin Chen
  • Dimitris Samaras

To investigate how top-down (TD) and bottom-up (BU) information is weighted in the guidance of human search behavior, we manipulated the proportions of BU and TD components in a saliency-based model. The model is biologically plausible and implements an artificial retina and a neuronal population code. The BU component is based on feature-contrast. The TD component is defined by a feature-template match to a stored target representation. We compared the model’s behavior at different mixtures of TD and BU components to the eye movement behavior of human observers performing the identical search task. We found that a purely TD model provides a much closer match to human behavior than any mixture model using BU information. Only when biological constraints are removed (e.g., eliminating the retina) did a BU/TD mixture model begin to approximate human behavior.