Author name cluster

Shuo Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

52 papers

2 author rows

AAAI Conference 2026 Conference Paper

ADAPT: Adaptive Decentralized Architecture with Perception-Aligned Training for Structural Generalization in Multi-Agent RL

Zhixiang Zhang
Shuo Chen
Yexin Li
Feng Wang

Multi-agent reinforcement learning (MARL) excels in cooperative and competitive tasks, but most architectures are tied to fixed input-output sizes and require retraining when the number of perceptible or controllable objects changes. While structural generalization techniques mitigate this, they rely on centralized training, raising concerns about scalability and privacy. We propose ADAPT, the first framework to support structural generalization under a decentralized training and decentralized execution (DTDE) paradigm. Every agent adopts an object-centric view, encoding each observed object into a feature vector and aggregating them into a variable-length set representation. To enable each agent to infer task-level contexts from this dynamic input independently, we propose a dynamic-consistency loss that enforces spatio-temporal alignment between context representations and observed environmental dynamics. Agents then condition their policies on the inferred contexts to make locally aligned decisions. For zero-shot transfer, we propose FINE (Foresight INdex for multi-agEnt), a metric that considers Q-value overestimation and enables cross-policy comparison of long-term impact, facilitating effective policy transfer. Experiments show that ADAPT surpasses existing DTDE methods and outperforms CTDE baselines in zero-shot generalization.


TIST Journal 2026 Journal Article

Atom-Motif Contrastive Transformer for Molecular Property Prediction

Wentao Yu
Shuo Chen
Chen Gong
Bo Han
Gang Niu
Masashi Sugiyama

Recently, Graph Transformer (GT) models have been widely used for Molecular Property Prediction (MPP) due to their high reliability in characterizing the latent relationships among graph nodes (i.e., the atoms in a molecule). However, most existing GT-based methods explore only the basic interactions between pairs of atoms and thus fail to consider the important interactions among critical motifs (e.g., functional groups consisting of several atoms) of molecules. Because motifs are significant patterns that largely determine molecular properties (e.g., toxicity and solubility), overlooking motif interactions inevitably limits the effectiveness of MPP. To address this issue, we propose a novel Atom-Motif Contrastive Transformer (AMCT), which explores not only atom-level interactions but also motif-level interactions. Since the atom and motif representations of a given molecule are two different views of the same instance, they can be naturally aligned to generate self-supervisory signals for model training. Meanwhile, the same motif can appear in different molecules, so we also employ a contrastive loss to maximize the representation agreement of identical motifs across different molecules. Finally, to clearly identify the motifs that are critical in determining the properties of each molecule, we further incorporate a property-aware attention mechanism into our learning framework. AMCT is extensively evaluated on 10 popular benchmark datasets, and both quantitative and qualitative results demonstrate its effectiveness compared with state-of-the-art methods.


AAAI Conference 2026 Conference Paper

Kronos: A Foundation Model for the Language of Financial Markets

Yu Shi
Zongliang Fu
Shuo Chen
Bohan Zhao
Wei Xu
Changshui Zhang
Jian Li

The success of the large-scale pre-training paradigm, exemplified by Large Language Models (LLMs), has inspired the development of Time Series Foundation Models (TSFMs). However, their application to financial candlestick (K-line) data remains limited, and they often underperform non-pre-trained architectures. Moreover, existing TSFMs often overlook crucial downstream tasks such as volatility prediction and synthetic data generation. To address these limitations, we propose Kronos, a unified, scalable pre-training framework tailored to financial K-line modeling. Kronos introduces a specialized tokenizer that discretizes continuous market information into token sequences, preserving both price dynamics and trade activity patterns. We pre-train Kronos using an autoregressive objective on a massive, multi-market corpus of over 12 billion K-line records from 45 global exchanges, enabling it to learn nuanced temporal and cross-asset representations. Kronos excels in a zero-shot setting across a diverse set of financial tasks. On benchmark datasets, Kronos boosts price series forecasting RankIC by 93% over the leading TSFM and 87% over the best non-pre-trained baseline. It also achieves a 9% lower MAE in volatility forecasting and a 22% improvement in generative fidelity for synthetic K-line sequences. These results establish Kronos as a robust, versatile foundation model for end-to-end financial time series analysis.


EAAI Journal 2026 Journal Article

Mining high average-efficiency itemsets based on a compact list structure

Gufeng Li
Xuanwei Zhang
Tao Shang
Shuo Chen

High utility itemset mining is a pivotal research direction in data mining, aiming to discover high-value itemsets within datasets. However, traditional approaches ignore product costs, which makes it difficult to identify truly high-revenue combinations. To address this issue, high efficiency itemset mining has been proposed, which defines efficiency as the ratio of utility to cost. Nevertheless, this measure does not account for itemset size: efficiency can inflate as an itemset grows, which creates a fairness problem when a uniform threshold is used to evaluate itemsets of different sizes. Consequently, high average-efficiency itemset mining has been introduced for a more equitable assessment. To improve the performance of high average-efficiency itemset mining, we propose an improved algorithm based on a compact list structure. Our algorithm introduces a novel average-efficiency list structure, derives a tight upper bound on the maximum average efficiency, and incorporates a new pruning strategy. Furthermore, by leveraging an estimated average-efficiency co-occurrence structure, our algorithm significantly reduces the number of join operations. Together, these optimizations substantially improve the mining of high average-efficiency itemsets. Experimental results confirm that the proposed algorithm achieves significant improvements in both computational efficiency and scalability.


AAAI Conference 2026 Conference Paper

RMLer: Synthesizing Novel Objects Across Diverse Categories via Reinforcement Mixing Learning

Jun Li
Zikun Chen
Haibo Chen
Shuo Chen
Jian Yang

Novel object synthesis by integrating distinct textual concepts from diverse categories remains a significant challenge in text-to-image generation. Existing methods often suffer from insufficient concept mixing, lack of rigorous evaluation, and suboptimal outputs, resulting in conceptual imbalance, superficial combinations, or mere juxtapositions. To address these limitations, we propose Reinforcement Mixing Learning (RMLer), a framework that formulates cross-category concept fusion as a reinforcement learning problem: mixed features serve as states, mixing strategies as actions, and visual outcomes as rewards. Specifically, we design an MLP policy network to predict dynamic coefficients for blending cross-category text embeddings. We further introduce visual rewards based on (1) semantic similarity and (2) compositional balance between the fused object and its constituent concepts, and optimize the policy via proximal policy optimization. At inference time, a selection strategy leverages these rewards to curate the highest-quality fused objects. Extensive experiments demonstrate that RMLer synthesizes coherent, high-fidelity objects from diverse categories and consistently outperforms existing methods. Our work provides a robust framework for generating novel visual concepts, with promising applications in film, gaming, and design.


ICLR Conference 2025 Conference Paper

Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering

Xingrui Wang
Wufei Ma
Angtian Wang
Shuo Chen
Adam Kortylewski
Alan L. Yuille

For vision-language models (VLMs), understanding the dynamic properties of objects and their interactions in 3D scenes from videos is crucial for effective reasoning about high-level temporal and action semantics. Although humans are adept at understanding these properties by constructing 3D and temporal (4D) representations of the world, current video understanding models struggle to extract these dynamic semantics, arguably because they use cross-frame reasoning without underlying knowledge of the 3D/4D scenes. In this work, we introduce DynSuperCLEVR, the first video question answering dataset that focuses on language understanding of the dynamic properties of 3D objects. We concentrate on three physical concepts within 4D scenes: velocity, acceleration, and collisions. We further generate three types of questions (factual queries, future predictions, and counterfactual reasoning) that involve different aspects of reasoning about these 4D dynamic properties. To further demonstrate the importance of explicit scene representations in answering these 4D dynamics questions, we propose NS-4DPhysics, a Neural-Symbolic VideoQA model that integrates a physics prior for 4D dynamic properties with an explicit scene representation of videos. Instead of answering the questions directly from the video and text input, our method first estimates the 4D world states with a 3D generative model powered by a physical prior, and then uses neural-symbolic reasoning to answer the questions based on those 4D world states. Our evaluation on all three question types in DynSuperCLEVR shows that previous video question answering models and large multimodal models struggle with questions about 4D dynamics, while NS-4DPhysics significantly outperforms previous state-of-the-art models.


TMLR Journal 2025 Journal Article

CyberThreat-Eval: Can Large Language Models Automate Real-World Threat Research?

Xiangsen Chen
Xuan Feng
Shuo Chen
Matthieu Maitre
Sudipto Rakshit
Diana Duvieilh
Ashley Picone
Nan Tang

Analyzing Open Source Intelligence (OSINT) from large volumes of data is critical for drafting and publishing comprehensive cyber threat intelligence (CTI) reports. This process usually follows a three-stage workflow: triage, deep search, and TI drafting. While Large Language Models (LLMs) offer a promising route toward automation, existing benchmarks have several limitations. First, they often consist of tasks that do not reflect real-world analyst workflows; for example, human analysts rarely receive tasks in the form of multiple-choice questions. Second, they often rely on model-centric metrics that emphasize lexical overlap rather than the actionable, detailed insights that security analysts need. Third, they typically fail to cover the complete three-stage workflow. To address these issues, we introduce CyberThreat-Eval, a benchmark collected from the daily CTI workflow of a world-leading company. This expert-annotated benchmark assesses LLMs on practical tasks across all three stages and uses analyst-centric metrics that measure factual accuracy, content quality, and operational costs. Our evaluation with this benchmark reveals important limitations of current LLMs; for example, they often lack the nuanced expertise required to handle complex details and struggle to distinguish correct from incorrect information. To address these challenges, the CTI workflow incorporates both external ground-truth databases and human expert knowledge. TRA allows human experts to iteratively provide feedback for continuous improvement. The code for the CyberThreat-Eval benchmark is available at https://github.com/secintelligence/CyberThreat-Eval.