Arrow Research search

Author name cluster

Shuo Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

52 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

ADAPT: Adaptive Decentralized Architecture with Perception-Aligned Training for Structural Generalization in Multi-Agent RL

  • Zhixiang Zhang
  • Shuo Chen
  • Yexin Li
  • Feng Wang

Multi-agent reinforcement learning (MARL) excels in cooperative and competitive tasks, but most architectures are tied to fixed input-output sizes and require retraining when the number of perceptible or controllable objects changes. While structural generalization techniques mitigate this, they rely on centralized training, raising concerns about scalability and privacy. We propose ADAPT, the first framework to support structural generalization under a decentralized training and decentralized execution (DTDE) paradigm. Every agent adopts an object-centric view, encoding each observed object into a feature vector and aggregating them into a variable-length set representation. To enable each agent to infer task-level contexts from this dynamic input independently, we propose a dynamic-consistency loss that enforces spatio-temporal alignment between context representations and observed environmental dynamics. Agents then condition their policies on the inferred contexts to make locally aligned decisions. For zero-shot transfer, we propose FINE (Foresight INdex for multi-agEnt), a metric that considers Q-value overestimation and enables cross-policy comparison of long-term impact, facilitating effective policy transfer. Experiments show that ADAPT surpasses existing DTDE methods and outperforms CTDE baselines in zero-shot generalization.

TIST Journal 2026 Journal Article

Atom-Motif Contrastive Transformer for Molecular Property Prediction

  • Wentao Yu
  • Shuo Chen
  • Chen Gong
  • Bo Han
  • Gang Niu
  • Masashi Sugiyama

Recently, Graph Transformer (GT) models have been widely used in the task of Molecular Property Prediction (MPP) due to their high reliability in characterizing the latent relationship among graph nodes (i.e., the atoms in a molecule). However, most existing GT-based methods usually explore the basic interactions between pairwise atoms, and thus they fail to consider the important interactions among critical motifs (e.g., functional groups consisting of several atoms) of molecules. As motifs in a molecule are significant patterns that are of great importance for determining molecular properties (e.g., toxicity and solubility), overlooking motif interactions inevitably hinders the effectiveness of MPP. To address this issue, we propose a novel Atom-Motif Contrastive Transformer (AMCT), which not only explores the atom-level interactions but also considers the motif-level interactions. Since the representations of atoms and motifs for a given molecule are actually two different views of the same instance, they are naturally aligned to generate the self-supervisory signals for model training. Meanwhile, the same motif can exist in different molecules, and hence we also employ the contrastive loss to maximize the representation agreement of identical motifs across different molecules. Finally, in order to clearly identify the motifs that are critical in deciding the properties of each molecule, we further incorporate a property-aware attention mechanism into our learning framework. Our proposed AMCT is extensively evaluated on 10 popular benchmark datasets, and both quantitative and qualitative results firmly demonstrate its effectiveness when compared with the state-of-the-art methods.

AAAI Conference 2026 Conference Paper

Kronos: A Foundation Model for the Language of Financial Markets

  • Yu Shi
  • Zongliang Fu
  • Shuo Chen
  • Bohan Zhao
  • Wei Xu
  • Changshui Zhang
  • Jian Li

The success of the large-scale pre-training paradigm, exemplified by Large Language Models (LLMs), has inspired the development of Time Series Foundation Models (TSFMs). However, their application to financial candlestick (K-line) data remains limited, often underperforming non-pre-trained architectures. Moreover, existing TSFMs often overlook crucial downstream tasks such as volatility prediction and synthetic data generation. To address these limitations, we propose Kronos, a unified, scalable pre-training framework tailored to financial K-line modeling. Kronos introduces a specialized tokenizer that discretizes continuous market information into token sequences, preserving both price dynamics and trade activity patterns. We pre-train Kronos using an autoregressive objective on a massive, multi-market corpus of over 12 billion K-line records from 45 global exchanges, enabling it to learn nuanced temporal and cross-asset representations. Kronos excels in a zero-shot setting across a diverse set of financial tasks. On benchmark datasets, Kronos boosts price series forecasting RankIC by 93% over the leading TSFM and 87% over the best non-pre-trained baseline. It also achieves a 9% lower MAE in volatility forecasting and a 22% improvement in generative fidelity for synthetic K-line sequences. These results establish Kronos as a robust, versatile foundation model for end-to-end financial time series analysis.
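The RankIC figure quoted in this abstract is, in common quantitative-finance usage, the Spearman rank correlation between predicted scores and realized returns. The abstract does not describe the evaluation protocol, so the sketch below only illustrates the metric itself; the function names and the sample inputs are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_ic(predicted, realized):
    """Spearman rank correlation: Pearson correlation computed on ranks.

    Assumes both inputs have the same length and are not constant
    (a constant input would make the denominator zero).
    """
    rp, rr = average_ranks(predicted), average_ranks(realized)
    mp, mr = mean(rp), mean(rr)
    cov = sum((a - mp) * (b - mr) for a, b in zip(rp, rr))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_r = sum((b - mr) ** 2 for b in rr)
    return cov / (var_p * var_r) ** 0.5


# Hypothetical scores for three assets and their realized returns.
print(rank_ic([0.1, 0.2, 0.3], [0.01, 0.02, 0.05]))
```

Because the metric depends only on ordering, a model can score well on RankIC even when its predicted magnitudes are poorly calibrated.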

EAAI Journal 2026 Journal Article

Mining high average-efficiency itemsets based on a compact list structure

  • Gufeng Li
  • Xuanwei Zhang
  • Tao Shang
  • Shuo Chen

High utility itemset mining is a pivotal research direction in data mining, aiming to discover high-value itemsets within datasets. However, traditional approaches ignore product costs, complicating the identification of truly high-revenue combinations. To address this issue, high efficiency itemset mining has been proposed, which defines efficiency as the utility-to-cost ratio. Nevertheless, this measure fails to consider itemset sizes, potentially causing efficiency to inflate as the itemset expands and posing a fairness problem when a uniform threshold is used to evaluate itemsets of different sizes. Consequently, high average-efficiency itemset mining has been introduced for more equitable assessment. To enhance the performance of this algorithm, we propose an improved high average-efficiency itemset mining algorithm based on a compact list structure. Our algorithm introduces a novel average-efficiency list structure, derives a tight upper bound for the maximum average efficiency, and incorporates an innovative pruning strategy. Furthermore, by leveraging an estimated average-efficiency co-occurrence structure, our algorithm significantly reduces the number of join operations. These optimizations collectively result in a substantial improvement in the mining of high average-efficiency itemsets. Experimental results confirm that the proposed algorithm achieves significant improvements in both computational efficiency and scalability.

AAAI Conference 2026 Conference Paper

RMLer: Synthesizing Novel Objects Across Diverse Categories via Reinforcement Mixing Learning

  • Jun Li
  • Zikun Chen
  • Haibo Chen
  • Shuo Chen
  • Jian Yang

Novel object synthesis by integrating distinct textual concepts from diverse categories remains a significant challenge in text-to-image generation. Existing methods often suffer from insufficient concept mixing, lack of rigorous evaluation, and suboptimal outputs, resulting in conceptual imbalance, superficial combinations, or mere juxtapositions. To address these limitations, we propose Reinforcement Mixing Learning (RMLer), a framework that formulates cross-category concept fusion as a reinforcement learning problem: mixed features serve as states, mixing strategies as actions, and visual outcomes as rewards. Specifically, we design an MLP policy network to predict dynamic coefficients for blending cross-category text embeddings. We further introduce visual rewards based on (1) semantic similarity and (2) compositional balance between the fused object and its constituent concepts, and optimize the policy via proximal policy optimization. At inference time, a selection strategy leverages these rewards to curate the highest-quality fused objects. Extensive experiments demonstrate that RMLer synthesizes coherent, high-fidelity objects from diverse categories and consistently outperforms existing methods. Our work provides a robust framework for generating novel visual concepts, with promising applications in film, gaming, and design.

ICLR Conference 2025 Conference Paper

Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering

  • Xingrui Wang
  • Wufei Ma
  • Angtian Wang
  • Shuo Chen
  • Adam Kortylewski
  • Alan L. Yuille

For vision-language models (VLMs), understanding the dynamic properties of objects and their interactions in 3D scenes from videos is crucial for effective reasoning about high-level temporal and action semantics. Although humans are adept at understanding these properties by constructing 3D and temporal (4D) representations of the world, current video understanding models struggle to extract these dynamic semantics, arguably because these models use cross-frame reasoning without underlying knowledge of the 3D/4D scenes. In this work, we introduce **DynSuperCLEVR**, the first video question answering dataset that focuses on language understanding of the dynamic properties of 3D objects. We concentrate on three physical concepts—*velocity*, *acceleration*, and *collisions*—within 4D scenes. We further generate three types of questions, including factual queries, future predictions, and counterfactual reasoning that involve different aspects of reasoning on these 4D dynamic properties. To further demonstrate the importance of explicit scene representations in answering these 4D dynamics questions, we propose **NS-4DPhysics**, a **N**eural-**S**ymbolic VideoQA model integrating **Physics** prior for **4D** dynamic properties with explicit scene representation of videos. Instead of answering the questions directly from the video text input, our method first estimates the 4D world states with a 3D generative model powered by a physical prior, and then uses neural symbolic reasoning to answer the questions based on the 4D world states. Our evaluation on all three types of questions in DynSuperCLEVR shows that previous video question answering models and large multimodal models struggle with questions about 4D dynamics, while our NS-4DPhysics significantly outperforms previous state-of-the-art models.

TMLR Journal 2025 Journal Article

CyberThreat-Eval: Can Large Language Models Automate Real-World Threat Research?

  • Xiangsen Chen
  • Xuan Feng
  • Shuo Chen
  • Matthieu Maitre
  • Sudipto Rakshit
  • Diana Duvieilh
  • Ashley Picone
  • Nan Tang

Analyzing Open Source Intelligence (OSINT) from large volumes of data is critical for drafting and publishing comprehensive CTI reports. This process usually follows a three-stage workflow: triage, deep search and TI drafting. While Large Language Models (LLMs) offer a promising route toward automation, existing benchmarks still have limitations. These benchmarks often consist of tasks that do not reflect real-world analyst workflows. For example, human analysts rarely receive tasks in the form of multiple-choice questions. Also, existing benchmarks often rely on model-centric metrics that emphasize lexical overlap rather than actionable, detailed insights essential for security analysts. Moreover, they typically fail to cover the complete three-stage workflow. To address these issues, we introduce CyberThreat-Eval, which is collected from the daily CTI workflow of a world-leading company. This expert-annotated benchmark assesses LLMs on practical tasks across all three stages as mentioned above. It utilizes analyst-centric metrics that measure factual accuracy, content quality, and operational costs. Our evaluation using this benchmark reveals important insights into the limitations of current LLMs. For example, LLMs often lack the nuanced expertise required to handle complex details and struggle to distinguish between correct and incorrect information. To address these challenges, the CTI workflow incorporates both external ground-truth databases and human expert knowledge. TRA allows human experts to iteratively provide feedback for continuous improvement. The code of CyberThreat-Eval benchmark is available at https://github.com/secintelligence/CyberThreat-Eval.

NeurIPS Conference 2025 Conference Paper

Enhancing Contrastive Learning with Variable Similarity

  • Haowen Cui
  • Shuo Chen
  • Jun Li
  • Jian Yang

Contrastive learning has achieved remarkable success in self-supervised learning by pretraining a generalizable feature representation based on the augmentation invariance. Most existing approaches assume that different augmented views of the same instance (i.e., the positive pairs) remain semantically invariant. However, the augmentation results with varying extent may introduce semantic discrepancies or even content distortion, and thus the conventional (pseudo) supervision from augmentation invariance may lead to misguided learning objectives. In this paper, we propose a novel method called Contrastive Learning with Variable Similarity (CLVS) to accurately characterize the intrinsic similarity relationships between different augmented views. Our method dynamically adjusts the similarity based on the augmentation extent, and it ensures that strongly augmented views are always assigned lower similarity scores than weakly augmented ones. We provide a theoretical analysis to guarantee the effectiveness of the variable similarity in improving model generalizability. Extensive experiments demonstrate the superiority of our approach, achieving gains of 2.1% on ImageNet-100 and 1.4% on ImageNet-1k compared with the state-of-the-art methods.

TAAS Journal 2025 Journal Article

HAG-MTF: Higher-Order Adaptive Generative Graph for Massive Traffic Forecasting in Industry 5.0

  • Lei Wang
  • Huaming Wu
  • Fan Zhang
  • Keqiu Li
  • Wei Yu
  • Shuo Chen

With the evolution of urban smart transportation, the complexity of urban traffic networks escalates, emphasizing the importance of large-scale traffic data prediction in traffic management and urban planning. Traditional spatiotemporal graph models, such as Graph-WaveNet and MTGCN, face exponentially increasing computational complexity as the spatial dimensions expand. To address this challenge, we propose a novel Higher-order Adaptive Generative graph for Massive Traffic Forecasting (HAG-MTF) approach, which utilizes generative AI and high-order graph structures to model the intricate spatial dependencies in large-scale traffic data. The HAG-MTF incorporates a high-order dimensionality reduction module to optimize traffic node processing, utilizing prior graph relationships to generate a fusion graph that dynamically incorporates neighborhood information for efficient, localized graph convolution. The model further incorporates the high-order spatiotemporal relationship extraction module (H-net), enhancing the capacity and speed of traffic data processing while boosting prediction accuracy for complex spatial structures. Furthermore, HAG-MTF introduces a fusion loss function that hierarchically balances multiple objectives, ensuring both precision and computational efficiency. HAG-MTF adaptively handles large-scale real-world traffic data, meeting the needs of traffic controllers and urban planners for predicting massive datasets in practical settings. It supports efficient, flexible interactions via parameter tuning and model outputs, ultimately integrating human insights into traffic analysis and decision-making. This dynamic human-machine collaboration differs from non-Industry 5.0 approaches, which rely on purely automated systems without human input. These lead to inflexible, brittle conclusions and recommendations, neglecting shifts in traffic patterns driven by human behavior. Extensive experiments on real-world traffic datasets demonstrate that HAG-MTF significantly improves processing efficiency for high-complexity spatial data while delivering precise, human-informed predictions through generative AI-driven operations.

AAAI Conference 2025 Conference Paper

Hybrid Data-Free Knowledge Distillation

  • Jialiang Tang
  • Shuo Chen
  • Chen Gong

Data-free knowledge distillation aims to learn a compact student network from a pre-trained large teacher network without using the original training data of the teacher network. Existing collection-based and generation-based methods train student networks by collecting massive real examples and generating synthetic examples, respectively. However, they inevitably become weak in practical scenarios due to the difficulties in gathering or emulating sufficient real-world data. To solve this problem, we propose a novel method called Hybrid Data-Free Distillation (HiDFD), which leverages only a small amount of collected data as well as generates sufficient examples for training student networks. Our HiDFD comprises two primary modules, i.e., the teacher-guided generation and student distillation. The teacher-guided generation module guides a Generative Adversarial Network (GAN) by the teacher network to produce high-quality synthetic examples from very few real-world collected examples. Specifically, we design a feature integration mechanism to prevent the GAN from overfitting and facilitate the reliable representation learning from the teacher network. Meanwhile, we drive a category frequency smoothing technique via the teacher network to balance the generative training of each category. In the student distillation module, we explore a data inflation strategy to properly utilize a blend of real and synthetic data to train the student network via a classifier-sharing-based feature alignment technique. Intensive experiments across multiple benchmarks demonstrate that our HiDFD can achieve state-of-the-art performance using 120 times less collected data than existing methods.

IJCAI Conference 2025 Conference Paper

Label Distribution Learning with Biased Annotations Assisted by Multi-Label Learning

  • Zhiqiang Kou
  • Si Qin
  • Hailin Wang
  • Jing Wang
  • Mingkun Xie
  • Shuo Chen
  • Yuheng Jia
  • Tongliang Liu

Multi-label learning (MLL) has gained attention for its ability to represent real-world data. Label Distribution Learning (LDL), an extension of MLL to learning from label distributions, faces challenges in collecting accurate label distributions. To address the issue of biased annotations, based on the low-rank assumption, existing works recover true distributions from biased observations by exploring the label correlations. However, recent evidence shows that the label distribution tends to be full-rank, and naively applying low-rank approximation to biased observations leads to inaccurate recovery and performance degradation. In this paper, we address the LDL with biased annotations problem from a novel perspective, where we first degenerate the soft label distribution into a hard multi-hot label and then recover the true label information for each instance. This idea stems from an insight that assigning hard multi-hot labels is often easier than assigning a soft label distribution, and it shows stronger immunity to noise disturbances, leading to smaller label bias. Moreover, assuming that the multi-label space for predicting label distributions is low-rank offers a more reasonable approach to capturing label correlations. Theoretical analysis and experiments confirm the effectiveness and robustness of our method on real-world datasets.

AAAI Conference 2025 Conference Paper

Learning Generalized Residual Exchange-Correlation-Uncertain Functional for Density Functional Theory

  • Sizhuo Jin
  • Shuo Chen
  • Jianjun Qian
  • Ying Tai
  • Jun Li

Density Functional Theory (DFT) stands as a widely used and efficient approach for addressing the many-electron Schrödinger equation across various domains such as physics, chemistry, and biology. However, a core challenge that persists over the long term pertains to refining the exchange-correlation (XC) approximation. This approximation significantly influences the triumphs and shortcomings observed in DFT applications. Nonetheless, a prevalent issue among XC approximations is the presence of systematic errors, stemming from deviations from the mathematical properties of the exact XC functional. For example, although both B3LYP and DM21 (DeepMind 21) exhibit improvements over previous benchmarks, there is still potential for further refinement. In this paper, we propose a strategy for enhancing XC approximations by estimating the neural uncertainty of the XC functional, named Residual XC-Uncertain Functional. Specifically, our approach involves training a neural network to predict both the mean and variance of the XC functional, treating it as a Gaussian distribution. To ensure stability in each sampling point, we construct the mean by combining traditional XC approximations with our neural predictions, mitigating the risk of divergence or vanishing values. It is crucial to highlight that our methodology excels particularly in cases where systematic errors are pronounced. Empirical outcomes from three benchmark tests substantiate the superiority of our approach over existing state-of-the-art methods. Our approach not only surpasses related techniques but also significantly outperforms both the popular B3LYP and the recent DM21 methods, achieving average RMSE improvements of 62% and 37%, respectively, across the three benchmarks: W4-17, G21EA, and G21IP.
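The abstract above describes predicting a mean and a variance for the XC functional, with the mean built from a traditional approximation plus a neural prediction. One way to read this is a per-point Gaussian negative log-likelihood with an additive residual mean. The sketch below follows that reading; the additive combination, the log-variance parameterization, the loss itself, and all names are assumptions for illustration, not the paper's stated formulation.

```python
import math


def residual_gaussian_nll(target, baseline_xc, predicted_residual, predicted_log_variance):
    """Negative log-likelihood of `target` under N(mean, variance).

    Assumption: mean = traditional XC value + neural residual, so the
    prediction stays anchored to the baseline when the residual is small.
    Assumption: the network outputs log-variance, which keeps the
    variance strictly positive after exponentiation.
    """
    mean = baseline_xc + predicted_residual
    variance = math.exp(predicted_log_variance)
    squared_error = (target - mean) ** 2
    return 0.5 * (math.log(2 * math.pi) + predicted_log_variance + squared_error / variance)


# Hypothetical values at a single sampling point.
loss = residual_gaussian_nll(
    target=-0.52, baseline_xc=-0.50, predicted_residual=-0.01, predicted_log_variance=-4.0
)
print(loss)
```

Under this reading, a larger predicted variance down-weights the squared error at points where the baseline is systematically off, at the price of the log-variance penalty term.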

AAAI Conference 2025 Conference Paper

Modeling Inter-Intra Heterogeneity for Graph Federated Learning

  • Wentao Yu
  • Shuo Chen
  • Yongxin Tong
  • Tianlong Gu
  • Chen Gong

Heterogeneity is a fundamental and challenging issue in federated learning, especially for graph data due to the complex relationships among the graph nodes. To deal with the heterogeneity, many existing methods perform the weighted federation based on their calculated similarities between pairwise clients (i.e., subgraphs). However, their inter-subgraph similarities estimated with the outputs of local models are less reliable, because the final outputs of local models may not comprehensively represent the real distribution of subgraph data. In addition, they ignore the critical intra-heterogeneity which usually exists within each subgraph itself. To address these issues, we propose a novel Federated learning method by integrally modeling the Inter-Intra Heterogeneity (FedIIH). For the inter-subgraph relationship, we propose a novel hierarchical variational model to infer the whole distribution of subgraph data in a multi-level form, so that we can accurately characterize the inter-subgraph similarities with the global perspective. For the intra-heterogeneity, we disentangle the subgraph into multiple latent factors and partition the model parameters into multiple parts, where each part corresponds to a single latent factor. Our FedIIH not only properly computes the distribution similarities between subgraphs, but also learns disentangled representations that are robust to irrelevant factors within subgraphs, so that it successfully considers the inter- and intra-heterogeneity simultaneously. Extensive experiments on six homophilic and five heterophilic graph datasets in both non-overlapping and overlapping settings demonstrate the effectiveness of our method when compared with eight state-of-the-art methods. Specifically, on average, FedIIH outperforms the second-best method by a large margin of 5.79% on all heterophilic datasets.

IROS Conference 2025 Conference Paper

Peg-in-hole assembly method based on visual reinforcement learning and tactile pose estimation

  • Yong Tao
  • Shuo Chen
  • Haitao Liu
  • He Gao
  • Yu Tao
  • Yixian Chen
  • Hongxing Wei

When robots replicate human actions in peg-in-hole assembly tasks, such as USB Type-A insertion and removal, the complexity of the process and frequent obstructions from the inner walls make it difficult for robots to handle collisions or avoid jamming. These difficulties contribute to a low success rate in assembly. This paper proposes a vision-guided reinforcement learning pre-assembly combined with tactile feedback-based pose estimation adjustment method for peg-in-hole assembly, achieving significant improvement in success rates for complex assembly tasks. First, during the pretraining process of reinforcement learning, high-reward sample data is collected, and a behaviour cloning (BC) algorithm is constructed based on sample data structure. The network is pretrained as a policy regression layer. Under sparse reward conditions, outputs of the twin delayed deep deterministic policy gradient (TD3) network and the BC network are combined to improve training stability and accelerate convergence, enhancing the efficiency of vision-based assembly. Then, to address the instability caused by collisions with the inner and outer walls of the hole when vision-based assembly remains incomplete, an in-hand pose estimation algorithm based on the Gelsight visuotactile sensor is integrated. This algorithm facilitates real-time adjustments to the position of the robot’s end-effector, improving the likelihood of successful peg-in-hole assembly. Finally, to validate the effectiveness of the proposed method, experiments were conducted using the V-REP simulation platform and the real Franka robot platform. In the experiments, success rates of 90-93% and 80-85%, respectively, were achieved.

NeurIPS Conference 2025 Conference Paper

RankMatch: A Novel Approach to Semi-Supervised Label Distribution Learning Leveraging Rank Correlation between Labels

  • Zhiqiang Kou
  • Yucheng Xie
  • Hailin Wang
  • Junyang Chen
  • Jingq Wang
  • Ming-Kun Xie
  • Shuo Chen
  • Yuheng Jia

Pseudo label based semi-supervised learning (SSL) for single-label and multi-label classification tasks has been extensively studied; however, semi-supervised label distribution learning (SSLDL) remains a largely unexplored area. Existing SSL methods fail in SSLDL because the pseudo-labels they generate only ensure overall similarity to the ground truth but do not preserve the ranking relationships between true labels, as they rely solely on KL divergence as the loss function during training. These skewed pseudo-labels lead the model to learn incorrect semantic relationships, resulting in reduced performance accuracy. To address these issues, we propose a novel SSLDL method called *RankMatch*. *RankMatch* fully considers the ranking relationships between different labels during the training phase with labeled data to generate higher-quality pseudo-labels. Furthermore, our key observation is that a flexible utilization of pseudo-labels can enhance SSLDL performance. Specifically, focusing solely on the ranking relationships between labels while disregarding their margins helps prevent model overfitting. Theoretically, we prove that incorporating ranking correlations enhances SSLDL performance and establish generalization error bounds for *RankMatch*. Finally, extensive real-world experiments validate its effectiveness.

TIST Journal 2025 Journal Article

Robust Learning under Hybrid Noise

  • Yang Wei
  • Shuo Chen
  • Shanshan Ye
  • Bo Han
  • Chen Gong

Feature noise and label noise are ubiquitous in practical scenarios, which pose great challenges for training a robust machine learning model. Most previous approaches usually deal with only a single problem of either feature noise or label noise. However, in real-world applications, hybrid noise, which contains both feature noise and label noise, is very common due to the unreliable data collection and annotation processes. Although some results have been achieved by a few representation learning based attempts, this issue is still far from being addressed with promising performance and guaranteed theoretical analyses. To address the challenge, we propose a novel unified learning framework called Feature and Label Recovery (FLR) to combat the hybrid noise from the perspective of data recovery, where we concurrently reconstruct both the feature matrix and the label matrix of input data. Specifically, the clean feature matrix is discovered by the low-rank approximation, and the ground-truth label matrix is embedded based on the recovered features with a nuclear norm regularization. Meanwhile, the feature noise and label noise are characterized by their respective adaptive matrix norms to satisfy the corresponding maximum likelihood. As this framework leads to a non-convex optimization problem, we develop the non-convex Alternating Direction Method of Multipliers (ADMM) with the convergence guarantee to solve our learning objective. We also provide the theoretical analysis to show that the generalization error of FLR can be upper-bounded in the presence of hybrid noise. Experimental results on several typical benchmark datasets clearly demonstrate the superiority of our proposed method over the state-of-the-art robust learning approaches for various noises.

AAAI Conference 2025 Conference Paper

Towards Better Spherical Sliced-Wasserstein Distance Learning with Data-Adaptive Discriminative Projection Direction

  • Hongliang Zhang
  • Shuo Chen
  • Lei Luo
  • Jian Yang

Spherical Sliced-Wasserstein (SSW) has recently been proposed to measure the discrepancy between spherical data distributions in various fields, such as geology, medical domains, computer vision, and deep representation learning. However, in the original SSW, all projection directions are treated equally, which is too idealistic and cannot accurately reflect the importance of different projection directions for various data distributions. To address this issue, we propose a novel data-adaptive Discriminative Spherical Sliced-Wasserstein (DSSW) distance, which utilizes a projected energy function to determine the discriminative projection direction for SSW. In our new DSSW, we introduce two types of projected energy functions to generate the weights for projection directions with complete theoretical guarantees. The first type employs a non-parametric deterministic function that transforms the projected Wasserstein distance into its corresponding weight in each projection direction. This improves the performance of the original SSW distance with negligible additional computational overhead. The second type utilizes a neural network-induced function that learns the projection direction weight through a parameterized neural network based on data projections. This further enhances the performance of the original SSW distance with less extra computational overhead. Finally, we evaluate the performance of our proposed DSSW by comparing it with several state-of-the-art methods across a variety of machine learning tasks, including gradient flows, density estimation on real earth data, and self-supervised learning.

YNIMG Journal 2025 Journal Article

Uncovering the neural basis of risk preferences in cooperative Dyads: A fNIRS study

  • Qianlan Yin
  • Jing Wen
  • Shuo Chen
  • Tianya Hou
  • Ying Liu
  • Danni Yang
  • Guorui Liu
  • Peiqi Shi

BACKGROUND: Individuals' risk preferences have been shown to influence their decision-making in various contexts. However, the neural mechanisms underlying the relationship between risk preference and decision-making in a social setting remain unclear. This study utilized functional near-infrared spectroscopy (fNIRS) to investigate the neural correlates of dyadic decision-making under risk and the modulating effect of individual risk preference. METHOD: This study examined the impact of risk preference on group decision-making using a two-phase experimental design. Based on G-power software calculations, 168 right-handed participants (62 males, 106 females, mean age 21.26±1.70) were recruited. Participants first completed a single-player Sequential Risk Task to measure risk preference, followed by group classification into three groups: Risky&Risky, Risky&Safe, and Safe&Safe. Task performance and decision-making behavior were recorded. Functional Near-Infrared Spectroscopy (fNIRS) was employed to measure cortical activation in the prefrontal cortex, focusing on inter-brain synchrony and coupling directionality using wavelet coherence and Granger causality (GC) analyses. Data were preprocessed to remove noise, and statistical analyses included repeated measures ANOVAs, Support Vector Regression and multiple regression analyses. RESULTS: = 0.173 and 0.191). CONCLUSION: This study employed fNIRS hyperscanning to investigate how individual differences in risk preference impact decision-making in dyadic contexts. The results indicated that variations in connectivity and information transfer between the orbitofrontal and medial prefrontal cortices underlie the distinct risk-taking behaviors exhibited by dyadic pairs. These findings underscore the pivotal role of affective and cognitive control mechanisms and individual risk personality traits in cooperative decision-making under conditions of uncertainty.

TMLR Journal 2024 Journal Article

Can We Count on LLMs? The Fixed-Effect Fallacy and Claims of GPT-4 Capabilities

  • Thomas Ball
  • Shuo Chen
  • Cormac Herley

In this paper we explore evaluation of LLM capabilities. We present measurements of GPT-4 performance on several deterministic tasks; each task involves a basic calculation and takes as input parameter some element drawn from a large well-defined population (e.g., count elements in a list, multiply two k-digit numbers, etc.). We examine several conditions per task and perform enough trials so that statistically significant differences can be detected. This allows us to investigate the sensitivity of task accuracy both to query phrasing and input parameter population. We find that seemingly trivial modifications in the task prompt or input population can yield differences far larger than can be explained by sampling effects. For example, performance on a simple list-counting task varies with query phrasing and list length, but also with list composition (i.e., the thing-to-be-counted) and object frequency (e.g., success when an element accounts for ≈ 50% of a list is different from when it accounts for ≈ 70%, etc.). We conclude that efforts to quantify LLM capabilities easily succumb to the language-as-fixed-effect fallacy, where experimental observations are improperly generalized beyond what the data supports. A consequence appears to be that intuitions that have been formed based on interactions with humans form a very unreliable guide as to which input modifications should "make no difference" to LLM performance.

IJCAI Conference 2024 Conference Paper

Efficiency Calibration of Implicit Regularization in Deep Networks via Self-paced Curriculum-Driven Singular Value Selection

  • Zhe Li
  • Shuo Chen
  • Jian Yang
  • Lei Luo

The generalization of neural networks has been a major focus of research in deep learning. It is often interpreted as an implicit bias towards solutions with specific properties. Especially, in practical applications, it has been observed that linear neural networks (LNN) tend to favor low-rank solutions for matrix completion tasks. However, most existing methods rely on increasing the depth of the neural network to enhance the low rank of solutions, resulting in higher complexity. In this paper, we propose a new explicit regularization method that calibrates the implicit bias towards low-rank trends in matrix completion tasks. Our approach automatically incorporates smaller singular values into the training process using a self-paced learning strategy, gradually restoring matrix information. By jointly using both implicit and explicit regularization, we effectively capture the low-rank structure of LNN and accelerate its convergence. We also analyze how our proposed penalty term interacts with implicit regularization and provide theoretical guarantees for our new model. To evaluate the effectiveness of our method, we conduct a series of experiments on both simulated and real-world data. Our experimental results clearly demonstrate that our method has better robustness and generalization ability compared with other methods.

ICML Conference 2024 Conference Paper

Learning High-Frequency Functions Made Easy with Sinusoidal Positional Encoding

  • Chuanhao Sun
  • Zhihang Yuan
  • Kai Xu 0014
  • Luo Mai
  • N. Siddharth 0001
  • Shuo Chen
  • Mahesh K. Marina

Fourier features based positional encoding (PE) is commonly used in machine learning tasks that involve learning high-frequency features from low-dimensional inputs, such as 3D view synthesis and time series regression with neural tangent kernels. Despite their effectiveness, existing PEs require manual, empirical adjustment of crucial hyperparameters, specifically the Fourier features, tailored to each unique task. Further, PEs face challenges in efficiently learning high-frequency functions, particularly in tasks with limited data. In this paper, we introduce sinusoidal PE (SPE), designed to efficiently learn adaptive frequency features closely aligned with the true underlying function. Our experiments demonstrate that SPE, without hyperparameter tuning, consistently achieves enhanced fidelity and faster training across various tasks, including 3D view synthesis, Text-to-Speech generation, and 1D regression. SPE is implemented as a direct replacement for existing PEs. Its plug-and-play nature lets numerous tasks easily adopt and benefit from SPE.
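For orientation, the fixed-frequency Fourier-feature encoding that the abstract treats as the baseline can be sketched as below. This is a minimal illustration of the generic technique, not the SPE method itself, which the abstract does not specify. The function name `fourier_features`, the parameter `num_bands`, and the power-of-two frequency schedule are assumptions chosen for the example.

```python
import math

def fourier_features(x, num_bands=4):
    """Encode a scalar x as [sin(2^k * pi * x), cos(2^k * pi * x)] for k = 0..num_bands-1.

    num_bands (hypothetical name) is the kind of hand-tuned hyperparameter
    the abstract says existing encodings need adjusted per task.
    """
    feats = []
    for k in range(num_bands):
        angle = (2.0 ** k) * math.pi * x  # frequency doubles with each band
        feats.append(math.sin(angle))
        feats.append(math.cos(angle))
    return feats

# A 1D input becomes a 2 * num_bands dimensional feature vector.
encoded = fourier_features(0.3, num_bands=4)
```

A larger `num_bands` exposes higher frequencies to the downstream network at the cost of a longer input vector.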

NeurIPS Conference 2024 Conference Paper

Novel Object Synthesis via Adaptive Text-Image Harmony

  • Zeren Xiong
  • Zedong Zhang
  • Zikun Chen
  • Shuo Chen
  • Xiang Li
  • Gan Sun
  • Jian Yang
  • Jun Li

In this paper, we study an object synthesis task that combines an object text with an object image to create a new object image. However, most diffusion models struggle with this task, i.e., often generating an object that predominantly reflects either the text or the image due to an imbalance between their inputs. To address this issue, we propose a simple yet effective method called Adaptive Text-Image Harmony (ATIH) to generate novel and surprising objects. First, we introduce a scale factor and an injection step to balance text and image features in cross-attention and to preserve image information in self-attention during the text-image inversion diffusion process, respectively. Second, to better integrate object text and image, we design a balanced loss function with a noise parameter, ensuring both optimal editability and fidelity of the object image. Third, to adaptively adjust these parameters, we present a novel similarity score function that not only maximizes the similarities between the generated object image and the input text/image but also balances these similarities to harmonize text and image integration. Extensive experiments demonstrate the effectiveness of our approach, showcasing remarkable object creations such as colobus-glass jar. https://xzr52.github.io/ATIH/

AAAI Conference 2024 Conference Paper

On the Unstable Convergence Regime of Gradient Descent

  • Shuo Chen
  • Jiaying Peng
  • Xiaolong Li
  • Yao Zhao

Traditional gradient descent (GD) has been fully investigated for convex or L-smooth functions, and it is widely utilized in current neural network optimization. The classical descent lemma ensures that for an L-smooth function, the GD trajectory converges stably towards the minimum when the learning rate is below 2/L. This convergence is marked by a consistent reduction in the loss function throughout the iterations. However, recent experimental studies have demonstrated that even when the L-smoothness condition is not met, or when the learning rate is increased so that the loss function oscillates during iterations, the GD trajectory still converges over the long run. This phenomenon is referred to as the unstable convergence regime of GD. In this paper, we present a theoretical perspective to offer a qualitative analysis of this phenomenon. The unstable convergence is in fact an inherent property of GD for general twice differentiable functions. Specifically, the forward-invariance of GD is established, i.e., it ensures that any point within a local region will always remain within this region under GD iteration. Then, based on the forward-invariance, for an initialization outside an open set containing the local minimum, the loss function will oscillate for the first several iterations and then become monotonically decreasing after the GD trajectory has jumped into the open set. This work theoretically clarifies the unstable convergence phenomenon of GD discussed in previous experimental works. The unstable convergence of GD mainly depends on the selection of the initialization, and it is actually inevitable due to the complex nature of the loss function.
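The classical 2/L threshold mentioned in the abstract can be checked on a one-dimensional quadratic. The sketch below is an illustration under assumed values (the function `gd_losses`, the curvature `L = 4.0`, and the two learning rates are all chosen for this example). It shows only the stable side and the divergent side of the threshold on a quadratic; it does not reproduce the unstable-convergence behavior the paper analyzes for general twice differentiable functions.

```python
def gd_losses(lr, L=4.0, x0=1.0, steps=20):
    """Run gradient descent on f(x) = (L / 2) * x**2 and return the loss history.

    The gradient is L * x, so each update multiplies x by (1 - lr * L).
    The iterates shrink exactly when |1 - lr * L| < 1, i.e. 0 < lr < 2 / L.
    """
    x = x0
    losses = [0.5 * L * x * x]
    for _ in range(steps):
        x -= lr * L * x  # gradient step
        losses.append(0.5 * L * x * x)
    return losses

# With L = 4.0 the threshold is 2 / L = 0.5.
stable = gd_losses(lr=0.4)    # below the threshold: loss decreases every step
unstable = gd_losses(lr=0.6)  # above the threshold: loss grows on this quadratic
```

On a pure quadratic the above-threshold run diverges; the long-run convergence described in the abstract relies on properties of non-quadratic losses that this sketch does not model.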

NeurIPS Conference 2023 Conference Paper

Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models

  • Shuo Chen
  • Jindong Gu
  • Zhen Han
  • Yunpu Ma
  • Philip Torr
  • Volker Tresp

Various adaptation methods, such as LoRA, prompts, and adapters, have been proposed to enhance the performance of pre-trained vision-language models in specific domains. As test samples in real-world applications usually differ from adaptation data, the robustness of these adaptation methods against distribution shifts is essential. In this study, we assess the robustness of 11 widely-used adaptation methods across 4 vision-language datasets under multimodal corruptions. Concretely, we introduce 7 benchmark datasets, including 96 visual and 87 textual corruptions, to investigate the robustness of different adaptation methods, the impact of available adaptation examples, and the influence of trainable parameter size during adaptation. Our analysis reveals that: 1) Adaptation methods are more sensitive to text corruptions than visual corruptions. 2) Full fine-tuning does not consistently provide the highest robustness; instead, adapters can achieve better robustness with comparable clean performance. 3) Contrary to expectations, our findings indicate that increasing the number of adaptation data and parameters does not guarantee enhanced robustness; instead, it results in even lower robustness. We hope this study could benefit future research in the development of robust multimodal adaptation methods. The benchmark, code, and dataset used in this study can be accessed at https://adarobustness.github.io.

AAAI Conference 2023 Conference Paper

Bidirectional Optical Flow NeRF: High Accuracy and High Quality under Fewer Views

  • Shuo Chen
  • Binbin Yan
  • Xinzhu Sang
  • Duo Chen
  • Peng Wang
  • Xiao Guo
  • Chongli Zhong
  • Huaming Wan

Neural Radiance Fields (NeRF) can implicitly represent 3D-consistent RGB images and geometry by optimizing an underlying continuous volumetric scene function using a sparse set of input views, which has greatly benefited view synthesis tasks. However, NeRF fails to estimate correct geometry when given fewer views, resulting in failure to synthesize novel views. Existing works rely on introducing depth images or adding depth estimation networks to resolve the problem of poor synthetic views in NeRF with fewer views. However, due to the lack of spatial consistency of the single-depth image and the poor performance of depth estimation with fewer views, the existing methods still have difficulty addressing this problem. This paper therefore proposes Bidirectional Optical Flow NeRF (BOF-NeRF), which addresses this problem by mining optical flow information between 2D images. Our key insight is that utilizing 2D optical flow images to design a loss can effectively guide NeRF to learn the correct geometry and synthesize the right novel view. We also propose a view-enhanced fusion method based on geometry and color consistency to solve the problem of novel view detail loss in NeRF. We conduct extensive experiments on the NeRF-LLFF and DTU MVS benchmarks for novel view synthesis tasks with fewer images in different complex real scenes. We further demonstrate the robustness of BOF-NeRF under different baseline distances on the Middlebury dataset. In all cases, BOF-NeRF outperforms current state-of-the-art baselines for novel view synthesis and scene geometry estimation.

IROS Conference 2023 Conference Paper

BlinkFlow: A Dataset to Push the Limits of Event-Based Optical Flow Estimation

  • Yijin Li
  • Zhaoyang Huang
  • Shuo Chen
  • Xiaoyu Shi 0002
  • Hongsheng Li 0001
  • Hujun Bao
  • Zhaopeng Cui
  • Guofeng Zhang 0001

Event cameras provide high temporal precision, low data rates, and high dynamic range visual perception, which are well-suited for optical flow estimation. While data-driven optical flow estimation has obtained great success in RGB cameras, its generalization performance is seriously hindered in event cameras mainly due to the limited and biased training data. In this paper, we present a novel simulator, BlinkSim, for the fast generation of large-scale data for event-based optical flow. BlinkSim incorporates a configurable rendering engine alongside an event simulation suite. By leveraging the wealth of current 3D assets, the rendering engine enables us to automatically build up thousands of scenes with different objects, textures, and motion patterns and render very high-frequency images for realistic event data simulation. Based on BlinkSim, we construct a large training dataset and evaluation benchmark BlinkFlow that contains sufficient, diversiform, and challenging event data with optical flow ground truth. Experiments show that BlinkFlow improves the generalization performance of state-of-the-art methods by more than 40% on average and up to 90%. Moreover, we further propose an Event-based optical Flow transFormer (E-FlowFormer) architecture. Powered by our BlinkFlow, E-FlowFormer outperforms the SOTA methods by up to 91% on the MVSEC dataset and 14% on the DSEC dataset and presents the best generalization performance. The source code and data are available at https://zju3dv.github.io/blinkflow/.

EAAI Journal 2023 Journal Article

Hyperbolic embedding steered spatiotemporal graph convolutional network for video-based remote heart rate estimation

  • Hang Shao
  • Lei Luo
  • Shuo Chen
  • Chuanfei Hu
  • Jian Yang

Remote heart rate estimation aims to predict cardiac activity signals from facial videos without any physical contact, which has been showing promising results recently. However, existing estimation methods based on deep convolutional networks only focus on the rigid receptive field, while ignoring potential spatial correlations of different facial regions, which obviously cannot reduce the overfitting caused by various noise and motion interference unrelated to cardiac activity. To address these issues, this paper proposes PhysGCN, an end-to-end spatiotemporal graph convolutional network with the hyperbolic embedding, to coordinate the contributions of intra- and inter-frame features of facial videos for long-term heart rate estimation. Specifically, firstly, we convert the facial video captured by the vision system into a graph-structure spatiotemporal map, and use the link set of the graph to determine and lock the spatial relative positions of multiple skin sub-regions formed by intra-frame face segmentation and projection. Secondly, to purify the signal and prevent the interference from heart rate irrelevant features, we integrate and measure the similarity between sub-regions within the graph in a non-Euclidean space by a hyperbolic embedding module, which can characterize the correlation more distinctly compared to the plane space. Finally, we dynamically and elaborately orchestrate the inherent temporal and learned spatial features in a graph convolutional module to obtain reliable heart rate waveforms. We conduct extensive comparative experiments and ablation studies on multiple public datasets to verify the superiority and robustness of our method. Experiments show that our method can effectively estimate heart rate from facial videos, and its performance surpasses or matches the state-of-the-art methods.

NeurIPS Conference 2023 Conference Paper

Self-Weighted Contrastive Learning among Multiple Views for Mitigating Representation Degeneration

  • Jie Xu
  • Shuo Chen
  • Yazhou Ren
  • Xiaoshuang Shi
  • Hengtao Shen
  • Gang Niu
  • Xiaofeng Zhu

Recently, numerous studies have demonstrated the effectiveness of contrastive learning (CL), which learns feature representations by pulling in positive samples while pushing away negative samples. Many successes of CL lie in that there exists semantic consistency between data augmentations of the same instance. In multi-view scenarios, however, CL might cause representation degeneration when the collected multiple views inherently have inconsistent semantic information or their representations subsequently do not capture sufficient discriminative information. To address this issue, we propose a novel framework called SEM: SElf-weighted Multi-view contrastive learning with reconstruction regularization. Specifically, SEM is a general framework where we propose to first measure the discrepancy between pairwise representations and then minimize the corresponding self-weighted contrastive loss, and thus making SEM adaptively strengthen the useful pairwise views and also weaken the unreliable pairwise views. Meanwhile, we impose a self-supervised reconstruction term to regularize the hidden features of encoders, to assist CL in accessing sufficient discriminative information of data. Experiments on public multi-view datasets verified that SEM can mitigate representation degeneration in existing CL methods and help them achieve significant performance improvements. Ablation studies also demonstrated the effectiveness of SEM with different options of weighting strategies and reconstruction terms.

IJCAI Conference 2022 Conference Paper

Active Contrastive Set Mining for Robust Audio-Visual Instance Discrimination

  • Hanyu Xuan
  • Yihong Xu
  • Shuo Chen
  • Zhiliang Wu
  • Jian Yang
  • Yan Yan
  • Xavier Alameda-Pineda

The recent success of audio-visual representation learning can be largely attributed to their pervasive property of audio-visual synchronization, which can be used as self-annotated supervision. As a state-of-the-art solution, Audio-Visual Instance Discrimination (AVID) extends instance discrimination to the audio-visual realm. Existing AVID methods construct the contrastive set by random sampling based on the assumption that the audio and visual clips from all other videos are not semantically related. We argue that this assumption is rough, since the resulting contrastive sets have a large number of faulty negatives. In this paper, we overcome this limitation by proposing a novel Active Contrastive Set Mining (ACSM) that aims to mine the contrastive sets with informative and diverse negatives for robust AVID. Moreover, we also integrate a semantically-aware hard-sample mining strategy into our ACSM. The proposed ACSM is implemented into two most recent state-of-the-art AVID methods and significantly improves their performance. Extensive experiments conducted on both action and sound recognition on multiple datasets show the remarkably improved performance of our method.

YNIMG Journal 2022 Journal Article

Cerebral blood flow and cardiovascular risk effects on resting brain regional homogeneity

  • Bhim M. Adhikari
  • L. Elliot Hong
  • Zhiwei Zhao
  • Danny J.J. Wang
  • Paul M. Thompson
  • Neda Jahanshad
  • Alyssa H. Zhu
  • Stefan Holiga

Regional homogeneity (ReHo) is a measure of local functional brain connectivity that has been reported to be altered in a wide range of neuropsychiatric disorders. Computed from brain resting-state functional MRI time series, ReHo is also sensitive to fluctuations in cerebral blood flow (CBF) that in turn may be influenced by cerebrovascular health. We assessed cerebrovascular health with the Framingham cardiovascular risk score (FCVRS). We hypothesize that the ReHo signal may be influenced by regional CBF, and that these associations can be summarized as FCVRS→CBF→ReHo. We used three independent samples to test this hypothesis. A test-retest sample of N = 30 healthy volunteers was used for test-retest evaluation of CBF effects on ReHo. The Amish Connectome Project (ACP) sample (N = 204, healthy individuals) was used to evaluate the association between FCVRS and ReHo and to test whether the association diminishes given CBF. The UKBB sample (N = 6,285, healthy participants) was used to replicate the effects of FCVRS on ReHo. We observed strong CBF→ReHo links (p < 2.5 × 10⁻³) using a three-point longitudinal sample. In the ACP sample, marginal and partial correlation analyses demonstrated that both CBF and FCVRS were significantly correlated with the whole-brain average (p < 10⁻⁶) and regional ReHo values, with the strongest correlations observed in frontal, parietal, and temporal areas. Yet, the association between ReHo and FCVRS became insignificant once the effect of CBF was accounted for. In contrast, CBF→ReHo remained significantly linked after adjusting for FCVRS and demographic covariates (p < 10⁻⁶). Analysis in N = 6,285 replicated the FCVRS→ReHo effect (p = 2.7 × 10⁻²⁷). In summary, ReHo alterations in health and neuropsychiatric illnesses may be partially driven by region-specific variability in CBF, which is, in turn, influenced by cardiovascular factors.

NeurIPS Conference 2022 Conference Paper

Learning Contrastive Embedding in Low-Dimensional Space

  • Shuo Chen
  • Chen Gong
  • Jun Li
  • Jian Yang
  • Gang Niu
  • Masashi Sugiyama

Contrastive learning (CL) pretrains feature embeddings to scatter instances in the feature space so that the training data can be well discriminated. Most existing CL techniques usually encourage learning such feature embeddings in the high-dimensional space to maximize the instance discrimination. However, this practice may lead to undesired results where the scattering instances are sparsely distributed in the high-dimensional feature space, making it difficult to capture the underlying similarity between pairwise instances. To this end, we propose a novel framework called contrastive learning with low-dimensional reconstruction (CLLR), which adopts a regularized projection layer to reduce the dimensionality of the feature embedding. In CLLR, we build the sparse / low-rank regularizer to adaptively reconstruct a low-dimensional projection space while preserving the basic objective for instance discrimination, thus successfully learning contrastive embeddings that alleviate the above issue. Theoretically, we prove a tighter error bound for CLLR; empirically, the superiority of CLLR is demonstrated across multiple domains. Both theoretical and experimental results emphasize the significance of learning low-dimensional contrastive embeddings.

AAAI Conference 2022 Conference Paper

Linearity-Aware Subspace Clustering

  • Yesong Xu
  • Shuo Chen
  • Jun Li
  • Jianjun Qian

Obtaining a good similarity matrix is extremely important in subspace clustering. Current state-of-the-art methods learn the similarity matrix through self-expressive strategy. However, these methods directly adopt original samples as a set of basis to represent itself linearly. It is difficult to accurately describe the linear relation between samples in the real-world applications, and thus is hard to find an ideal similarity matrix. To better represent the linear relation of samples, we present a subspace clustering model, Linearity-Aware Subspace Clustering (LASC), which can consciously learn the similarity matrix by employing a linearity-aware metric. This is a new subspace clustering method that combines metric learning and subspace clustering into a joint learning framework. In our model, we first utilize the self-expressive strategy to obtain an initial subspace structure and discover a low-dimensional representation of the original data. Subsequently, we use the proposed metric to learn an intrinsic similarity matrix with linearity-aware on the obtained subspace. Based on such a learned similarity matrix, the inter-cluster distance becomes larger than the intra-cluster distances, and thus successfully obtaining a good subspace cluster result. In addition, to enrich the similarity matrix with more consistent knowledge, we adopt a collaborative learning strategy for self-expressive subspace learning and linearity-aware subspace learning. Moreover, we provide detailed mathematical analysis to show that the metric can properly characterize the linear correlation between samples.

NeurIPS Conference 2021 Conference Paper

Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model

  • Jiangning Zhang
  • Chao Xu
  • Jian Li
  • Wenzhou Chen
  • Yabiao Wang
  • Ying Tai
  • Shuo Chen
  • Chengjie Wang

Inspired by biological evolution, we explain the rationality of Vision Transformer by analogy with the proven practical Evolutionary Algorithm (EA) and derive that both of them have consistent mathematical representation. Analogous to the dynamic local population in EA, we improve the existing transformer structure and propose a more efficient EAT model, and design task-related heads to deal with different tasks more flexibly. Moreover, we introduce the spatial-filling curve into the current vision transformer to sequence image data into a uniform sequential format. Thus we can design a unified EAT framework to address multi-modal tasks, separating the network architecture from the data format adaptation. Our approach achieves state-of-the-art results on the ImageNet classification task compared with recent vision transformer works while having fewer parameters and greater throughput. We further conduct multi-modal tasks to demonstrate the superiority of the unified EAT, e.g., Text-Based Image Retrieval, and our approach improves the rank-1 by +3.7 points over the baseline on the CSS dataset.

YNIMG Journal 2021 Journal Article

Comparing empirical kinship derived heritability for imaging genetics traits in the UK biobank and human connectome project

  • Si Gao
  • Brian Donohue
  • Kathryn S. Hatch
  • Shuo Chen
  • Tianzhou Ma
  • Yizhou Ma
  • Mark D. Kvarta
  • Heather Bruce

Imaging genetics analyses use neuroimaging traits as intermediate phenotypes to infer the degree of genetic contribution to brain structure and function in health and/or illness. Coefficients of relatedness (CR) summarize the degree of genetic similarity among subjects and are used to estimate the heritability, the proportion of phenotypic variance explained by genetic factors. The CR can be inferred directly from genome-wide genotype data to explain the degree of shared variation in common genetic polymorphisms (SNP-heritability) among related or unrelated subjects. We developed a central processing and graphics processing unit (CPU and GPU) accelerated Fast and Powerful Heritability Inference (FPHI) approach that linearizes likelihood calculations to overcome the ∼N²–N³ computational effort dependency on sample size of classical likelihood approaches. We calculated heritability for 60 regional and 1.3 × 10⁵ voxel-wise traits in N = 1,206 twin and sibling participants from the Human Connectome Project (HCP) (550 M/656 F, age = 28.8 ± 3.7 years) and N = 37,432 (17,531 M/19,901 F; age = 63.7 ± 7.5 years) participants from the UK Biobank (UKBB). The FPHI estimates were in excellent agreement with heritability values calculated using Genome-wide Complex Trait Analysis software (r = 0.96 and 0.98 in the HCP and UKBB samples) while significantly reducing computational effort (10²–10⁴ times). The regional and voxel-wise trait heritability estimates for the HCP and UKBB were likewise in excellent agreement (r = 0.63–0.76, p < 10⁻¹⁰). In summary, the hardware-accelerated FPHI made it practical to calculate heritability values for voxel-wise neuroimaging traits, even in very large samples such as the UKBB. The patterns of additive genetic variance in neuroimaging traits measured in a large sample of related and unrelated individuals showed excellent agreement regardless of the estimation method. The code and instructions to execute these analyses are available at www.solar-eclipse-genetics.org.

YNICL Journal 2021 Journal Article

Comparison of regional brain deficit patterns in common psychiatric and neurological disorders as revealed by big data

  • Peter Kochunov
  • Meghann C. Ryan
  • Qifan Yang
  • Kathryn S. Hatch
  • Alyssa Zhu
  • Sophia I. Thomopoulos
  • Neda Jahanshad
  • Lianne Schmaal

Neurological and psychiatric illnesses are associated with regional brain deficit patterns that bear unique signatures and capture illness-specific characteristics. The Regional Vulnerability Index (RVI) was developed to quantify brain similarity by comparing individual white matter microstructure, cortical gray matter thickness and subcortical gray matter structural volume measures with neuroanatomical deficit patterns derived from large-scale meta-analytic studies. We tested the specificity of the RVI approach for major depressive disorder (MDD) and Alzheimer’s disease (AD) in a large epidemiological sample of UK Biobank (UKBB) participants (N = 19,393; 9,138 M/10,255 F; age = 64.8 ± 7.4 years). Compared to controls free of neuropsychiatric disorders, participants with MDD (N = 2,248; 805 M/1,443 F; age = 63.4 ± 7.4) had significantly higher RVI-MDD values (t = 5.6, p = 1·10⁻⁸), but showed no detectable difference in RVI-AD (t = 2.0, p = 0.10). Subjects with dementia (N = 7; 4 M/3 F; age = 68.6 ± 8.6 years) showed significant elevation in RVI-AD (t = 4.2, p = 3·10⁻⁵) but not RVI-MDD (t = 2.1, p = 0.10) compared to controls. Even within affective illnesses, participants with bipolar disorder (N = 54) and anxiety disorder (N = 773) showed no significant elevation in whole-brain RVI-MDD. Participants with Parkinson’s disease (N = 37) showed elevation in RVI-AD (t = 2.4, p = 0.01) while subjects with stroke (N = 247) showed no such elevation (t = 1.1, p = 0.3). In summary, we demonstrated elevation in RVI-MDD and RVI-AD measures in the respective illnesses with strong replicability that is relatively specific to the respective diagnoses. These neuroanatomic deviation patterns offer a useful biomarker for population-wide assessments of similarity to neuropsychiatric illnesses.

YNICL Journal 2021 Journal Article

Mapping local and long-distance resting connectivity markers of TMS-related inhibition reduction in schizophrenia

  • Stephanie M. Hare
  • Xiaoming Du
  • Bhim M. Adhikari
  • Shuo Chen
  • Chen Mo
  • Ann Summerfelt
  • Mark D. Kvarta
  • Laura Garcia

Short interval intracortical inhibition (SICI) is a biomarker for altered motor inhibition in schizophrenia, but the manner in which distant sites influence the inhibitory cortical-effector response remains elusive. Our study investigated local and long-distance resting state functional connectivity (rsFC) markers of SICI in a sample of N = 23 patients with schizophrenia and N = 29 controls. Local functional connectivity was quantified using regional homogeneity (ReHo) analysis and long-range connectivity was estimated using seed-based rsFC analysis. Direct and indirect effects of connectivity measures on SICI were modeled using mediation analysis. Higher SICI ratios (indicating reduced inhibition) in patients were associated with lower ReHo in the right insula. Follow-up rsFC analyses showed that higher SICI scores (indicating reduced inhibition) were associated with reduced connectivity between right insula and hubs of the corticospinal pathway: sensorimotor cortex and basal ganglia. Mediation analysis supported a model in which the direct effect of local insular connectivity strength on SICI is mediated by the interhemispheric connectivity between insula and left sensorimotor cortex. The broader clinical implications of these findings are discussed with emphasis on how these preliminary findings might inform novel interventions designed to restore or improve SICI in schizophrenia and deepen our understanding of motor inhibitory control and impact of abnormal signaling in motor-inhibitory pathways in schizophrenia.

NeurIPS Conference 2021 Conference Paper

Spectrum-to-Kernel Translation for Accurate Blind Image Super-Resolution

  • Guangpin Tao
  • Xiaozhong Ji
  • Wenzhuo Wang
  • Shuo Chen
  • Chuming Lin
  • Yun Cao
  • Tong Lu
  • Donghao Luo

Deep-learning based Super-Resolution (SR) methods have exhibited promising performance under the non-blind setting where the blur kernel is known; however, blur kernels of Low-Resolution (LR) images in different practical applications are usually unknown. This may lead to a significant performance drop when the degradation process of training images deviates from that of real images. In this paper, we propose a novel blind SR framework to super-resolve LR images degraded by an arbitrary blur kernel with accurate kernel estimation in the frequency domain. To the best of our knowledge, this is the first deep learning method which conducts blur kernel estimation in the frequency domain. Specifically, we first demonstrate that feature representation in the frequency domain is more conducive to blur kernel reconstruction than in the spatial domain. Next, we present a Spectrum-to-Kernel (S²K) network to estimate general blur kernels in diverse forms. We use a conditional GAN (CGAN) combined with an SR-oriented optimization target to learn the end-to-end translation from degraded images' spectra to unknown kernels. Extensive experiments on both synthetic and real-world images demonstrate that our proposed method sufficiently reduces blur kernel estimation error, thus enables the off-the-shelf non-blind SR methods to work under the blind setting effectively, and achieves superior performance over state-of-the-art blind SR methods, on average by 1.39dB, 0.48dB (Gaussian kernels) and 6.15dB, 4.57dB (motion kernels) for scales 2× and 4× respectively.

YNICL Journal 2021 Journal Article

Temporal-thalamic and cingulo-opercular connectivity in people with schizophrenia

  • Adam J. Culbreth
  • Qiong Wu
  • Shuo Chen
  • Bhim M. Adhikari
  • L. Elliot Hong
  • James M. Gold
  • James A. Waltz

A growing body of research has suggested that people with schizophrenia (SZ) exhibit altered patterns of functional and anatomical brain connectivity. For example, many previous resting state functional connectivity (rsFC) studies have shown that, compared to healthy controls (HC), people with SZ demonstrate hyperconnectivity between subregions of the thalamus and sensory cortices, as well as hypoconnectivity between subregions of the thalamus and prefrontal cortex. In addition to thalamic findings, hypoconnectivity between cingulo-opercular brain regions thought to be involved in salience detection has also been commonly reported in people with SZ. However, previous studies have largely relied on seed-based analyses. Seed-based approaches require researchers to define a single a priori brain region, which is then used to create a rsFC map across the entire brain. While useful for testing specific hypotheses, these analyses are limited in that only a subset of connections across the brain are explored. In the current manuscript, we leverage novel network statistical techniques in order to detect latent functional connectivity networks with organized topology that successfully differentiate people with SZ from HCs. Importantly, these techniques do not require a priori seed selection and allow for whole brain investigation, representing a comprehensive, data-driven approach to determining differential connectivity between diagnostic groups. Across two samples, (Sample 1: 35 SZ, 44 HC; Sample 2: 65 SZ, 79 HC), we found evidence for differential rsFC within a network including temporal and thalamic regions. Connectivity in this network was greater for people with SZ compared to HCs. In the second sample, we also found evidence for hypoconnectivity within a cingulo-opercular network of brain regions in people with SZ compared to HCs. 
In summary, our results replicate and extend previous studies suggesting hyperconnectivity between the thalamus and sensory cortices and hypoconnectivity between cingulo-opercular regions in people with SZ using data-driven statistical and graph theoretical techniques.

AAAI Conference 2020 Conference Paper

AATEAM: Achieving the Ad Hoc Teamwork by Employing the Attention Mechanism

  • Shuo Chen
  • Ewa Andrejczuk
  • Zhiguang Cao
  • Jie Zhang

In the ad hoc teamwork setting, a team of agents needs to perform a task without prior coordination. The most advanced approach learns policies based on previous experiences and reuses one of the policies to interact with new teammates. However, the selected policy in many cases is sub-optimal. Switching between policies to adapt to new teammates’ behaviour takes time, which threatens the successful performance of a task. In this paper, we propose AATEAM, a method that uses attention-based neural networks to cope with new teammates’ behaviour in real time. We train one attention network per teammate type. The attention networks learn both to extract the temporal correlations from the sequence of states (i.e., contexts) and the mapping from contexts to actions. Each attention network also learns to predict a future state given the current context and its output action. The prediction accuracies help to determine which actions the ad hoc agent should take. We perform extensive experiments to show the effectiveness of our method.

AAAI Conference 2020 Conference Paper

Cross-Modal Attention Network for Temporal Inconsistent Audio-Visual Event Localization

  • Hanyu Xuan
  • Zhenyu Zhang
  • Shuo Chen
  • Jian Yang
  • Yan Yan

In human multi-modality perception systems, the benefits of integrating auditory and visual information are extensive as they provide plenty of supplementary cues for understanding the events. Although some recent methods have been proposed for such applications, they cannot deal with practical conditions with temporal inconsistency. Inspired by the human system, which puts different focuses on specific locations, time segments and media while performing multi-modality perception, we provide an attention-based method to simulate such a process. Similar to the human mechanism, our network can adaptively select “where” to attend, “when” to attend and “which” to attend for audio-visual event localization. In this way, even with large temporal inconsistency between vision and audio, our network is able to adaptively trade information between different modalities and successfully achieve event localization. Our method achieves state-of-the-art performance on the AVE (Audio-Visual Event) dataset collected in real life. In addition, we also systematically investigate audio-visual event localization tasks. The visualization results also help us better understand how our model works.

NeurIPS Conference 2020 Conference Paper

Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection

  • Xiang Li
  • Wenhai Wang
  • Lijun Wu
  • Shuo Chen
  • Xiaolin Hu
  • Jun Li
  • Jinhui Tang
  • Jian Yang

One-stage detector basically formulates object detection as dense classification and localization (i.e., bounding box regression). The classification is usually optimized by Focal Loss and the box location is commonly learned under Dirac delta distribution. A recent trend for one-stage detectors is to introduce an individual prediction branch to estimate the quality of localization, where the predicted quality facilitates the classification to improve detection performance. This paper delves into the representations of the above three fundamental elements: quality estimation, classification and localization. Two problems are discovered in existing practices, including (1) the inconsistent usage of the quality estimation and classification between training and inference, and (2) the inflexible Dirac delta distribution for localization. To address the problems, we design new representations for these elements. Specifically, we merge the quality estimation into the class prediction vector to form a joint representation, and use a vector to represent arbitrary distribution of box locations. The improved representations eliminate the inconsistency risk and accurately depict the flexible distribution in real data, but contain continuous labels, which is beyond the scope of Focal Loss. We then propose Generalized Focal Loss (GFL) that generalizes Focal Loss from its discrete form to the continuous version for successful optimization. On COCO test-dev, GFL achieves 45.0% AP using ResNet-101 backbone, surpassing state-of-the-art SAPD (43.5%) and ATSS (43.6%) with higher or comparable inference speed.

AAAI Conference 2020 Conference Paper

Understanding the Disharmony between Weight Normalization Family and Weight Decay

  • Xiang Li
  • Shuo Chen
  • Jian Yang

The merits of fast convergence and potentially better performance of the weight normalization family have drawn increasing attention in recent years. These methods use standardization or normalization that changes the weight $W'$ to $W$, which makes $W$ independent of the magnitude of $W'$. Surprisingly, $W'$ must be decayed during gradient descent, otherwise we will observe a severe under-fitting problem, which is very counter-intuitive since weight decay is widely known to prevent deep networks from over-fitting. Moreover, if we substitute (e.g., weight normalization) $W = \frac{W'}{\|W'\|}$ in the original loss function $\sum_i L(f(x_i; W), y_i) + \frac{1}{2}\lambda\|W\|^2$, it is observed that the regularization term $\frac{1}{2}\lambda\|W\|^2$ will be canceled as a constant $\frac{1}{2}\lambda$ in the optimization objective. Therefore, to decay $W'$, we need to explicitly append: $\frac{1}{2}\lambda\|W'\|^2$. In this paper, we theoretically prove that $\frac{1}{2}\lambda\|W'\|^2$ improves optimization only by modulating the effective learning rate and fairly has no influence on generalization when the weight normalization family is compositely employed. Furthermore, we also expose several serious problems when introducing a weight decay term to the weight normalization family, including the missing of global minimum, training instability and sensitivity of initialization. To address these problems, we propose an Adaptive Weight Shrink (AWS) scheme, which gradually shrinks the weights during optimization by a dynamic coefficient proportional to the magnitude of the parameter. This simple yet effective method appropriately controls the effective learning rate, which significantly improves the training stability and makes optimization more robust to initialization.

IJCAI Conference 2019 Conference Paper

ATSIS: Achieving the Ad hoc Teamwork by Sub-task Inference and Selection

  • Shuo Chen
  • Ewa Andrejczuk
  • Athirai A. Irissappane
  • Jie Zhang

In an ad hoc teamwork setting, the team needs to coordinate their activities to perform a task without prior agreement on how to achieve it. The ad hoc agent cannot communicate with its teammates but it can observe their behaviour and plan accordingly. To do so, the existing approaches rely on the teammates' behaviour models. However, the models may not be accurate, which can compromise teamwork. For this reason, we present Ad Hoc Teamwork by Sub-task Inference and Selection (ATSIS) algorithm that uses a sub-task inference without relying on teammates' models. First, the ad hoc agent observes its teammates to infer which sub-tasks they are handling. Based on that, it selects its own sub-task using a partially observable Markov decision process that handles the uncertainty of the sub-task inference. Last, the ad hoc agent uses the Monte Carlo tree search to find the set of actions to perform the sub-task. Our experiments show the benefits of ATSIS for robust teamwork.

NeurIPS Conference 2019 Conference Paper

Curvilinear Distance Metric Learning

  • Shuo Chen
  • Lei Luo
  • Jian Yang
  • Chen Gong
  • Jun Li
  • Heng Huang

Distance Metric Learning aims to learn an appropriate metric that faithfully measures the distance between two data points. Traditional metric learning methods usually calculate the pairwise distance with fixed distance functions (e.g., Euclidean distance) in the projected feature spaces. However, they fail to learn the underlying geometries of the sample space, and thus cannot exactly predict the intrinsic distances between data points. To address this issue, we first reveal that the traditional linear distance metric is equivalent to the cumulative arc length between the data pair's nearest points on the learned straight measurer lines. After that, by extending such straight lines to general curved forms, we propose a Curvilinear Distance Metric Learning (CDML) method, which adaptively learns the nonlinear geometries of the training data. By virtue of Weierstrass theorem, the proposed CDML is equivalently parameterized with a 3-order tensor, and the optimization algorithm is designed to learn the tensor parameter. Theoretical analysis is derived to guarantee the effectiveness and soundness of CDML. Extensive experiments on the synthetic and real-world datasets validate the superiority of our method over the state-of-the-art metric learning models.

AAAI Conference 2019 Conference Paper

Data-Adaptive Metric Learning with Scale Alignment

  • Shuo Chen
  • Chen Gong
  • Jian Yang
  • Ying Tai
  • Le Hui
  • Jun Li

The central problem for most existing metric learning methods is to find a suitable projection matrix on the differences of all pairs of data points. However, a single unified projection matrix can hardly characterize all data similarities accurately as the practical data are usually very complicated, and simply adopting one global projection matrix might ignore important local patterns hidden in the dataset. To address this issue, this paper proposes a novel method dubbed “Data-Adaptive Metric Learning” (DAML), which constructs a data-adaptive projection matrix for each data pair by selectively combining a set of learned candidate matrices. As a result, every data pair can obtain a specific projection matrix, enabling the proposed DAML to flexibly fit the training data and produce discriminative projection results. The model of DAML is formulated as an optimization problem which jointly learns candidate projection matrices and their sparse combination for every data pair. Nevertheless, the over-fitting problem may occur due to the large number of parameters to be learned. To tackle this issue, we adopt the Total Variation (TV) regularizer to align the scales of data embedding produced by all candidate projection matrices, and thus the generated metrics of these learned candidates are generally comparable. Furthermore, we extend the basic linear DAML model to the kernelized version (denoted “KDAML”) to handle the non-linear cases, and the Iterative Shrinkage-Thresholding Algorithm (ISTA) is employed to solve the optimization model. Intensive experimental results on various applications including retrieval, classification, and verification clearly demonstrate the superiority of our algorithm to other state-of-the-art metric learning methodologies.

IJCAI Conference 2018 Conference Paper

Adversarial Metric Learning

  • Shuo Chen
  • Chen Gong
  • Jian Yang
  • Xiang Li
  • Yang Wei
  • Jun Li

In the past decades, intensive efforts have been put into designing various loss functions and metric forms for the metric learning problem. These improvements have shown promising results when the test data is similar to the training data. However, the trained models often fail to produce reliable distances on the ambiguous test pairs due to the different samplings between training set and test set. To address this problem, the Adversarial Metric Learning (AML) is proposed in this paper, which automatically generates adversarial pairs to remedy the sampling bias and facilitate robust metric learning. Specifically, AML consists of two adversarial stages, i.e., confusion and distinguishment. In the confusion stage, the ambiguous but critical adversarial data pairs are adaptively generated to mislead the learned metric. In the distinguishment stage, a metric is exhaustively learned to try its best to distinguish both adversarial pairs and original training pairs. Thanks to the challenges posed by the confusion stage in such a competing process, the AML model is able to grasp plentiful difficult knowledge that has not been contained by the original training pairs, so the discriminability of AML can be significantly improved. The entire model is formulated as an optimization framework, of which the global convergence is theoretically proved. The experimental results on toy data and practical datasets clearly demonstrate the superiority of AML to representative state-of-the-art metric learning models.

AAAI Conference 2018 Conference Paper

POMDP-Based Decision Making for Fast Event Handling in VANETs

  • Shuo Chen
  • Athirai Irissappane
  • Jie Zhang

Malicious vehicle agents broadcast fake information about traffic events and thereby undermine the benefits of vehicle-to-vehicle communication in vehicular ad-hoc networks (VANETs). Trust management schemes addressing this issue do not focus on effective/fast decision making in reacting to traffic events. We propose a Partially Observable Markov Decision Process (POMDP) based approach to balance the trade-off between information gathering and exploiting actions resulting in faster responses. Our model copes with malicious behavior by maintaining it as part of a small state space, thus is scalable for large VANETs. We also propose an algorithm to learn model parameters in a dynamic behavior setting. Experimental results demonstrate that our model can effectively balance the decision quality and response time while still being robust to sophisticated malicious attacks.

AAMAS Conference 2017 Conference Paper

Automatic Construction of Agent-based Simulation Using Business Process Diagrams and Ontology-based Models

  • Donghun Kang
  • Zhenchao C. Bing
  • Wen Song
  • Zehong Hu
  • Shuo Chen
  • Jie Zhang
  • Hui Xi

In this paper, we present a tool for the business users to analyze different business scenarios using business process diagrams and ontology-based models. The business scenarios involve different types of entities where the business process diagrams are suitable for describing entities’ behaviors. The ontology-based model is proposed to capture entities’ attributes and their relations in a hierarchical manner. The tool can automatically construct agent-based simulation models, which can be executed instantly on the agent-based simulation engine without the help of software developers.

IJCAI Conference 2017 Conference Paper

Deep Multi-species Embedding

  • Di Chen
  • Yexiang Xue
  • Daniel Fink
  • Shuo Chen
  • Carla P. Gomes

Understanding how species are distributed across landscapes over time is a fundamental question in biodiversity research. Unfortunately, most species distribution models only target a single species at a time, despite strong ecological evidence that species are not independently distributed. We propose Deep Multi-Species Embedding (DMSE), which jointly embeds vectors corresponding to multiple species as well as vectors representing environmental covariates into a common high-dimensional feature space via a deep neural network. Applied to bird observational data from the citizen science project eBird, we demonstrate how the DMSE model discovers inter-species relationships to outperform single-species distribution models (random forests and SVMs) as well as competing multi-label models. Additionally, we demonstrate the benefit of using a deep neural network to extract features within the embedding and show how they improve the predictive performance of species distribution modelling. An important domain contribution of the DMSE model is the ability to discover and describe species interactions while simultaneously learning the shared habitat preferences among species. As an additional contribution, we provide a graphical embedding of hundreds of bird species in the Northeast US.

YNIMG Journal 2012 Journal Article

Determining functional connectivity using fMRI data with diffusion-based anatomical weighting

  • F. DuBois Bowman
  • Lijun Zhang
  • Gordana Derado
  • Shuo Chen

There is strong interest in investigating both functional connectivity (FC) using functional magnetic resonance imaging (fMRI) and structural connectivity (SC) using diffusion tensor imaging (DTI). There is also emerging evidence of correspondence between functional and structural pathways within many networks (Greicius et al., 2009; Skudlarski et al., 2008; van den Heuvel et al., 2009), although some regions without SC exhibit strong FC (Honey et al., 2008). These findings suggest that FC may be mediated by (direct or indirect) anatomical connections, offering an opportunity to supplement fMRI data with DTI data when determining FC. We develop a novel statistical method for determining FC, called anatomically weighted FC (awFC), which combines fMRI and DTI data. Our awFC approach implements a hierarchical clustering algorithm that establishes neural processing networks using a new distance measure consisting of two components, a primary functional component that captures correlations between fMRI signals from different regions and a secondary anatomical weight reflecting probabilities of SC. The awFC approach defaults to conventional unweighted clustering for specific parameter settings. We optimize awFC parameters using a strictly functional criterion, therefore our approach will generally perform at least as well as an unweighted analysis, with respect to intracluster coherence or autocorrelation. AwFC also yields more informative results since it provides structural properties associated with identified functional networks. We apply awFC to two fMRI data sets: resting-state data from 6 healthy subjects and data from 17 subjects performing an auditory task. In these examples, awFC leads to more highly autocorrelated networks than a conventional analysis. We also conduct a simulation study, which demonstrates accurate performance of awFC and confirms that awFC generally yields comparable, if not superior, accuracy relative to a standard approach.

AAAI Conference 2010 Conference Paper

What if the Irresponsible Teachers Are Dominating?

  • Shuo Chen
  • Jianwen Zhang
  • Guangyun Chen
  • Changshui Zhang

As Internet-based crowdsourcing services become more and more popular, learning from multiple teachers or sources has received more attention from researchers in the machine learning area. In this setting, the learning system is dealing with samples and labels provided by multiple teachers, who, in common cases, are non-expert. Their labeling styles and behaviors are usually diverse, some of which are even detrimental to the learning system. Thus, simply putting them together and utilizing the algorithms designed for the single-teacher scenario would be not only improper, but also damaging. The problem calls for more specific methods. Our work focuses on a case where the teachers are composed of good ones and irresponsible ones. By irresponsible, we mean a teacher who does not take the labeling task seriously and labels the sample at random without inspecting the sample itself. This behavior is quite common when the task is not attractive enough and the teacher just wants to finish it as soon as possible. Sometimes, the irresponsible teachers could make up a considerable part of all the teachers. If we do not remove their effects, our learning system would undoubtedly be ruined. In this paper, we propose a method for picking out the good teachers with promising experimental results. It works even when the irresponsible teachers are dominating in numbers.

IJCAI Conference 2009 Conference Paper

  • Shuo Chen
  • Changshui Zhang

The Universum sample, which is defined as the sample that doesn’t belong to any of the classes the learning task concerns, has been proved to be helpful in both supervised and semi-supervised settings. The former works treat the Universum samples equally. Our research found that not all the Universum samples are helpful, and we propose a method to pick the informative ones, i.e., in-between Universum samples. We also set up a new semi-supervised framework to incorporate the in-between Universum samples. Empirical experiments show that our method outperforms the former ones.