Arrow Research search

Author name cluster

Shiyu Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers
2 author rows

Possible papers

14

AAAI Conference 2026 Conference Paper

Context-aware Graph Meta-learning

  • Ningbo Huang
  • Gang Zhou
  • Meng Zhang
  • Shunhang Li
  • Ling Wang
  • Shiyu Wang
  • Yi Xia

Developing a universal graph model capable of generalizing across diverse graph domains has consistently been a key objective in graph learning. Recently, many studies have focused on achieving in-context learning (ICL) on graphs, which can generalize to novel tasks without fine-tuning, similar to large language models (LLMs) such as GPT-3. These studies can be broadly divided into graph-based methods and LLM-based methods. However, the generalization performance of the former is limited by the representation capability of GNNs, while the latter faces the challenge of LLMs understanding graph structures. We therefore propose CAGML, a context-aware graph meta-learning model, which learns to generalize to cross-domain and cross-granularity graph tasks using a meta-trained Transformer. First, we formulate graph few-shot learning tasks as a structure-aware sequence modeling problem to unify cross-domain and cross-granularity tasks. Then, a structure-aware Transformer (SAT) is introduced as a graph in-context learner to make predictions from a few labels and the task-specific structural context. Finally, we pre-train SAT in a meta-optimization manner on large-scale citation networks and knowledge graphs. Experiments on 6 cross-domain graph datasets show that, without fine-tuning, CAGML achieves state-of-the-art (SOTA) average performance across cross-granularity tasks.

EAAI Journal 2026 Journal Article

Fine-tuning a vulnerability-specific large language model for a hybrid software vulnerability detection method

  • Shiyu Wang
  • Yuyao Jiang
  • Shuaijianni Xu
  • Yiwen Liu
  • Ming Yin
  • Guofeng He

Detecting vulnerabilities in large-scale, multi-file software systems remains a critical challenge, as traditional techniques and current large language models (LLMs) struggle with long code dependencies and complex control flows. This challenge is further compounded for Java, which dominates enterprise software yet remains relatively underexplored in vulnerability detection research. In this study, we first fine-tune a vulnerability-specific LLM called VulDetLLM and then propose a hybrid vulnerability detection method using VulDetLLM and a program-assisted language (PAL) model for large-scale source code (HyVD-VP). HyVD-VP performs static detection using semantically meaningful slices and external knowledge, verifies predictions through lightweight dynamic checks at runtime, and integrates static and dynamic signals via an LLM-based decision module to achieve accurate and explainable detection. We evaluate HyVD-VP primarily on Java, which is widely used in mission-critical enterprise domains such as banking, telecommunications, and large-scale web services. The method consistently outperforms traditional analyzers (Fortify, SpotBugs) and recent research baselines, achieving 96.3% accuracy and a 95.7% F1-score while reducing false negatives. Importantly, it also identified 16 previously unknown vulnerabilities in six real-world enterprise projects, underscoring its industrial relevance. Based on our current evaluation in Java, the detection pipeline shows moderate runtime overhead, mainly from dynamic validation, as indicated by runtime profiling. A feasibility study further suggests that HyVD-VP can be adapted to multiple programming languages and constrained hardware scenarios simultaneously, indicating both practical applicability and extensibility. These results establish HyVD-VP as a promising step toward scalable, industry-ready solutions for reliable vulnerability detection in large-scale software systems.

NeurIPS Conference 2025 Conference Paper

APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay

  • Akshara Prabhakar
  • Zuxin Liu
  • Ming Zhu
  • Jianguo Zhang
  • Tulika Manoj Awalgaonkar
  • Shiyu Wang
  • Zhiwei Liu
  • Haolin Chen

Training effective AI agents for multi-turn interactions requires high-quality data that captures realistic human-agent dynamics, yet such data is scarce and expensive to collect manually. We introduce APIGen-MT, a two-phase framework that generates verifiable and diverse multi-turn agent data. In the first phase, our agentic pipeline produces detailed task blueprints with ground-truth actions, leveraging a committee of LLM reviewers and iterative feedback loops. These blueprints are then transformed into complete interaction trajectories through simulated human-agent interplay. We train a family of models---the xLAM-2-fc-r series with sizes ranging from 1B to 70B parameters. Our models outperform frontier models such as GPT-4o and Claude 3.5 on $\tau$-bench and BFCL benchmarks, with the smaller models surpassing their larger counterparts, particularly in multi-turn settings, while maintaining superior consistency across multiple trials. Comprehensive experiments demonstrate that our verified blueprint-to-details approach yields high-quality training data, enabling the development of more reliable, efficient, and capable agents. We open-source both the synthetic data collected and the trained xLAM-2-fc-r models to advance research in AI agents. Dataset: https://huggingface.co/datasets/Salesforce/APIGen-MT-5k & Models: https://huggingface.co/collections/Salesforce/xlam-2-67ef5be12949d8dcdae354c4

ICML Conference 2025 Conference Paper

Fast, Accurate Manifold Denoising by Tunneling Riemannian Optimization

  • Shiyu Wang
  • Mariam Avagyan
  • Yihan Shen
  • Arnaud Lamy
  • Tingran Wang
  • Szabolcs Márka
  • Zsuzsa Márka
  • John Wright 0001

Learned denoisers play a fundamental role in various signal generation (e.g., diffusion models) and reconstruction (e.g., compressed sensing) architectures, whose success derives from their ability to leverage low-dimensional structure in data. Existing denoising methods, however, either rely on local approximations that require a linear scan of the entire dataset or treat denoising as a generic function approximation problem, sacrificing efficiency and interpretability. We consider the problem of efficiently denoising a new noisy data point sampled from an unknown manifold $\mathcal{M} \subset \mathbb{R}^D$, using only noisy samples. This work proposes a framework for test-time efficient manifold denoising, framing "learning-to-denoise" as "learning-to-optimize". We make two technical innovations: (i) online learning methods which learn to optimize over the manifold of clean signals using only noisy data, effectively "growing" an optimizer one sample at a time; (ii) mixed-order methods which guarantee that the learned optimizers achieve global optimality, ensuring both efficiency and near-optimal denoising performance. We corroborate these claims with theoretical analyses of both the complexity and denoising performance of mixed-order traversal. Our experiments on scientific manifolds demonstrate significantly improved complexity-performance tradeoffs compared to nearest neighbor search, which underpins existing provable denoising approaches based on exhaustive search.

AAAI Conference 2025 Conference Paper

Text2Data: Low-Resource Data Generation with Textual Control

  • Shiyu Wang
  • Yihao Feng
  • Tian Lan
  • Ning Yu
  • Yu Bai
  • Ran Xu
  • Huan Wang
  • Caiming Xiong

Natural language serves as a common and straightforward control signal for humans to interact seamlessly with machines. Recognizing the importance of this interface, the machine learning community is investing considerable effort in generating data that is semantically coherent with textual instructions. While strides have been made in text-to-data generation spanning image editing, audio synthesis, video creation, and beyond, low-resource areas characterized by expensive annotations or complex data structures, such as molecules, motion dynamics, and time series, often lack textual labels. This deficiency impedes supervised learning, thereby constraining the application of advanced generative models for text-to-data tasks. In response to these challenges in the low-resource scenario, we propose Text2Data, a novel approach that utilizes unlabeled data to learn the underlying data distribution through an unsupervised diffusion model. The model then undergoes controllable finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting. Comprehensive experiments demonstrate that Text2Data achieves enhanced controllability across various modalities, including molecules, motions, and time series, compared to existing baselines.

ICLR Conference 2025 Conference Paper

Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts

  • Xiaoming Shi
  • Shiyu Wang
  • Yuqi Nie
  • Dianqi Li
  • Zhou Ye 0001
  • Qingsong Wen
  • Ming Jin

Deep learning for time series forecasting has seen significant advancements over the past decades. However, despite the success of large-scale pre-training in language and vision domains, pre-trained time series models remain limited in scale and operate at a high cost, hindering the development of larger, more capable forecasting models in real-world applications. In response, we introduce Time-MoE, a scalable and unified architecture designed to pre-train larger, more capable forecasting foundation models while reducing inference costs. By leveraging a sparse mixture-of-experts (MoE) design, Time-MoE enhances computational efficiency by activating only a subset of networks for each prediction, reducing computational load while maintaining high model capacity. This allows Time-MoE to scale effectively without a corresponding increase in inference costs. Time-MoE comprises a family of decoder-only transformer models that operate in an auto-regressive manner and support flexible forecasting horizons with varying input context lengths. We pre-trained these models on our newly introduced large-scale dataset Time-300B, which spans 9 domains and encompasses over 300 billion time points. For the first time, we scaled a time series foundation model up to 2.4 billion parameters, achieving significantly improved forecasting precision. Our results validate the applicability of scaling laws for training tokens and model size in the context of time series forecasting. Compared to dense models with the same number of activated parameters or equivalent computation budgets, our models consistently outperform them by a large margin. These advancements position Time-MoE as a state-of-the-art solution for tackling real-world time series forecasting challenges with superior capability, efficiency, and flexibility. Code is available at https://github.com/Time-MoE/Time-MoE
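The sparse routing this abstract describes, activating only a few of many expert networks per prediction, can be sketched with a top-k gating layer. This is an illustrative sketch, not code from the Time-MoE repository; all names and dimensions are assumptions.

```python
# Minimal top-k mixture-of-experts routing sketch: only K of E experts run
# per token, so compute scales with K rather than E.
import numpy as np

rng = np.random.default_rng(0)
D, E, K = 8, 4, 2                                 # hidden size, experts, experts kept

gate_w = rng.normal(size=(D, E))                  # router (gating) weights
experts = [rng.normal(size=(D, D)) for _ in range(E)]  # each expert: a linear map

def moe_forward(x):
    """Route one token x (shape [D]) to its top-K experts."""
    logits = x @ gate_w
    top = np.argsort(logits)[-K:]                 # indices of K highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                      # softmax over selected experts only
    # Only K expert matmuls execute; the other E-K experts are skipped entirely.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

y = moe_forward(rng.normal(size=D))
print(y.shape)  # (8,)
```

The output has the same shape as a dense layer's, but per-token compute is K/E of the dense cost, which is the efficiency argument the abstract makes.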

NeurIPS Conference 2024 Conference Paper

NeuroBOLT: Resting-state EEG-to-fMRI Synthesis with Multi-dimensional Feature Mapping

  • Yamin Li
  • Ange Lou
  • Ziyuan Xu
  • Shengchao Zhang
  • Shiyu Wang
  • Dario J. Englot
  • Soheil Kolouri
  • Daniel Moyer

Functional magnetic resonance imaging (fMRI) is an indispensable tool in modern neuroscience, providing a non-invasive window into whole-brain dynamics at millimeter-scale spatial resolution. However, fMRI is constrained by issues such as high operation costs and immobility. With the rapid advancements in cross-modality synthesis and brain decoding, the use of deep neural networks has emerged as a promising solution for inferring whole-brain, high-resolution fMRI features directly from electroencephalography (EEG), a more widely accessible and portable neuroimaging modality. Nonetheless, the complex projection from neural activity to fMRI hemodynamic responses and the spatial ambiguity of EEG pose substantial challenges both in modeling and interpretability. Relatively few studies to date have developed approaches for EEG-fMRI translation, and although they have made significant strides, the inference of fMRI signals in a given study has been limited to a small set of brain areas and to a single condition (i.e., either resting-state or a specific task). The capability to predict fMRI signals in other brain areas, as well as to generalize across conditions, remains a critical gap in the field. To tackle these challenges, we introduce a novel and generalizable framework: NeuroBOLT, i.e., Neuro-to-BOLD Transformer, which leverages multi-dimensional representation learning from temporal, spatial, and spectral domains to translate raw EEG data to the corresponding fMRI activity signals across the brain. Our experiments demonstrate that NeuroBOLT effectively reconstructs unseen resting-state fMRI signals from primary sensory, high-level cognitive areas, and deep subcortical brain regions, achieving state-of-the-art accuracy with the potential to generalize across varying conditions and sites, which significantly advances the integration of these two modalities.

IJCAI Conference 2023 Conference Paper

Full Scaling Automation for Sustainable Development of Green Data Centers

  • Shiyu Wang
  • Yinbo Sun
  • Xiaoming Shi
  • Zhu Shiyi
  • Lin-Tao Ma
  • James Zhang
  • Yangfei Zheng
  • Liu Jian

The rapid rise in cloud computing has resulted in an alarming increase in data centers' carbon emissions, which now account for >3% of global greenhouse gas emissions, necessitating immediate steps to combat their mounting strain on the global climate. An important focus of this effort is to improve resource utilization in order to save electricity usage. Our proposed Full Scaling Automation (FSA) mechanism is an effective method of dynamically adapting resources to accommodate changing workloads in large-scale cloud computing clusters, enabling the clusters in data centers to maintain their desired CPU utilization target and thus improve energy efficiency. FSA harnesses the power of deep representation learning to accurately predict the future workload of each service and automatically stabilize the corresponding target CPU usage level, unlike previous autoscaling methods, such as Autopilot or FIRM, that adjust computing resources with statistical models and expert knowledge. Our approach achieves significant performance improvement compared to existing work on real-world datasets. We also deployed FSA on large-scale cloud computing clusters in industrial data centers, and according to the certification of the China Environmental United Certification Center (CEC), a reduction of 947 tons of carbon dioxide, equivalent to a saving of 1,538,000 kWh of electricity, was achieved during the Double 11 shopping festival of 2022, marking a critical step for our company's strategic goal of carbon neutrality by 2030.
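The scaling decision FSA automates, provisioning just enough capacity to hold a predicted workload at a target CPU utilization, reduces at its simplest to a capacity calculation like the following sketch. This is an illustrative control rule under assumed names, not FSA's learned policy.

```python
import math

def replicas_needed(predicted_cpu_cores: float, target_util: float,
                    cores_per_replica: float) -> int:
    """Smallest replica count that keeps average utilization at or below target."""
    return max(1, math.ceil(predicted_cpu_cores / (target_util * cores_per_replica)))

# A service predicted to need 10 CPU cores, targeting 50% utilization
# on 4-core replicas, requires 5 replicas: 10 / (0.5 * 4) = 5.
print(replicas_needed(10.0, 0.5, 4.0))  # prints 5
```

Accurate workload forecasts let this formula track demand tightly instead of over-provisioning, which is where the electricity savings come from.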

AAAI Conference 2023 Conference Paper

SLOTH: Structured Learning and Task-Based Optimization for Time Series Forecasting on Hierarchies

  • Fan Zhou
  • Chen Pan
  • Lintao Ma
  • Yu Liu
  • Shiyu Wang
  • James Zhang
  • Xinxin Zhu
  • Xuanwei Hu

Multivariate time series forecasting with hierarchical structure is widely used in real-world applications, e.g., sales predictions for the geographical hierarchy formed by cities, states, and countries. Hierarchical time series (HTS) forecasting comprises two sub-tasks, i.e., forecasting and reconciliation. In previous works, hierarchical information is integrated only in the reconciliation step to maintain coherency, not in the forecasting step to improve accuracy. In this paper, we propose two novel tree-based feature integration mechanisms, i.e., top-down convolution and bottom-up attention, to leverage the information of the hierarchical structure to improve forecasting performance. Moreover, unlike most previous reconciliation methods, which either rely on strong assumptions or focus on coherence constraints only, we utilize deep neural optimization networks, which not only achieve coherency without any assumptions, but also allow more flexible and realistic constraints to achieve task-based targets, e.g., a lower under-estimation penalty and meaningful decision-making loss to facilitate subsequent downstream tasks. Experiments on real-world datasets demonstrate that our tree-based feature integration mechanism achieves superior performance on hierarchical forecasting tasks compared to state-of-the-art methods, and our neural optimization networks can be applied to real-world tasks effectively without any additional effort under coherence and task-based constraints.
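The coherency constraint that reconciliation enforces, each aggregate forecast equaling the sum of its children, is conventionally written with a summing matrix. A minimal sketch in standard HTS notation follows; the two-city hierarchy is hypothetical, and this illustrates the constraint itself, not SLOTH's neural optimization networks.

```python
import numpy as np

# Hypothetical 2-level hierarchy: total = cityA + cityB.
S = np.array([[1, 1],    # total
              [1, 0],    # cityA
              [0, 1]])   # cityB  -- the "summing" matrix

b = np.array([3.0, 5.0])     # base forecasts for the leaf (bottom-level) series
y = S @ b                    # coherent forecasts for all series: [8., 3., 5.]
assert y[0] == y[1] + y[2]   # the aggregate matches the sum of its children
```

Mapping any bottom-level forecast through S yields forecasts that satisfy the hierarchy by construction; reconciliation methods differ in how they choose b (or adjust incoherent forecasts) under this constraint.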

YNIMG Journal 2022 Journal Article

An awareness-dependent mapping of saliency in the human visual system

  • Lijuan Wang
  • Ling Huang
  • Mengsha Li
  • Xiaotong Wang
  • Shiyu Wang
  • Yuefa Lin
  • Xilin Zhang

The allocation of exogenously cued spatial attention is governed by a saliency map. Yet, how salience is mapped when multiple salient stimuli are present simultaneously, and how this mapping interacts with awareness, remains unclear. These questions were addressed here using either visible or invisible displays presenting two foreground stimuli (whose bars were oriented differently from the bars in the otherwise uniform background): a high-salience target and a distractor of varied, lesser salience. Whether or not the distractor interfered with the effective salience of the target served to index a graded or non-graded nature of salience mapping, respectively. The invisible and visible displays were empirically validated by a two-alternative forced choice test (detecting the quadrant of the target) demonstrating subjects' performance at or above chance level, respectively. By combining psychophysics, fMRI, and effective connectivity analysis, we found a graded distribution of salience with awareness, changing to a non-graded distribution without awareness. Crucially, we further revealed that the graded distribution was contingent upon feedback from the posterior intraparietal sulcus (pIPS, especially from the right pIPS), whereas the non-graded distribution was innate to V1. Together, this awareness-dependent mapping of saliency reconciles several previous, seemingly contradictory findings regarding the nature of the saliency map.

NeurIPS Conference 2022 Conference Paper

Deep Generative Model for Periodic Graphs

  • Shiyu Wang
  • Xiaojie Guo
  • Liang Zhao

Periodic graphs are graphs consisting of repetitive local structures, such as crystal nets and polygon meshes. Their generative modeling has great potential in real-world applications such as material design and graphics synthesis. Classical models either rely on domain-specific predefined generation principles (e.g., in crystal net design) or follow geometry-based prescribed rules. Recently, deep generative models have shown great promise in automatically generating general graphs. However, their advancement into periodic graphs has not been well explored due to several key challenges in 1) maintaining graph periodicity; 2) disentangling local and global patterns; and 3) efficiency in learning repetitive patterns. To address them, this paper proposes the Periodical-Graph Disentangled Variational Auto-encoder (PGD-VAE), a new deep generative model for periodic graphs that can automatically learn, disentangle, and generate local and global graph patterns. Specifically, we develop a new periodic graph encoder consisting of a global-pattern encoder and a local-pattern encoder, which ensures that the representation is disentangled into global and local semantics. We then propose a new periodic graph decoder consisting of a local structure decoder, a neighborhood decoder, and a global structure decoder, as well as an assembler of their outputs that guarantees periodicity. Moreover, we design a new model learning objective that helps ensure the invariance of local-semantic representations for graphs with the same local structure. Comprehensive experimental evaluations have been conducted to demonstrate the effectiveness of the proposed method.

NeurIPS Conference 2022 Conference Paper

Multi-objective Deep Data Generation with Correlated Property Control

  • Shiyu Wang
  • Xiaojie Guo
  • Xuanyang Lin
  • Bo Pan
  • Yuanqi Du
  • Yinkai Wang
  • Yanfang Ye
  • Ashley Petersen

Developing deep generative models has been an emerging field due to the ability to model and generate complex data for various purposes, such as image synthesis and molecular design. However, the advance of deep generative models is limited by the challenges of generating objects that possess multiple desired properties because: 1) the existence of complex correlation among real-world properties is common but hard to identify; 2) controlling an individual property implicitly and partially controls its correlated properties, which is difficult to model; 3) controlling multiple properties under various manners simultaneously is hard and underexplored. We address these challenges by proposing a novel deep generative framework that recovers the semantics and correlation of properties through disentangled latent vectors. The correlation is handled via an explainable mask pooling layer, and properties are precisely retained by the generated objects via the mutual dependence between latent vectors and properties. Our generative model preserves properties of interest while handling correlation and conflicts of properties under a multi-objective optimization framework. The experiments demonstrate our model's superior performance in generating objects with desired properties.

NeurIPS Conference 2021 Conference Paper

GraphGT: Machine Learning Datasets for Graph Generation and Transformation

  • Yuanqi Du
  • Shiyu Wang
  • Xiaojie Guo
  • Hengning Cao
  • Shujie Hu
  • Junji Jiang
  • Aishwarya Varala
  • Abhinav Angirekula

Graph generation has shown great potential in applications like network design and mobility synthesis and is one of the fastest-growing domains in machine learning for graphs. Despite the success of graph generation, the corresponding real-world datasets are few and limited to areas such as molecules and citation networks. To fill the gap, we introduce GraphGT, a large dataset collection for graph generation and transformation problems, which contains 36 datasets from 9 domains across 6 subjects. To assist researchers with better exploration of the datasets, we provide a systematic review and classification of the datasets based on research tasks, graph types, and application domains. We have significantly (re)processed all the data from different domains to fit the unified framework of graph generation and transformation problems. In addition, GraphGT provides an easy-to-use graph generation pipeline that simplifies the process of graph data loading, experimental setup, and model evaluation. Finally, we compare the performance of popular graph generative models on 16 graph generation and 17 graph transformation datasets, showing the power of GraphGT in differentiating and evaluating model capabilities and drawbacks. GraphGT is regularly updated and welcomes input from the community. GraphGT is publicly available at \url{https://graphgt.github.io/} and can also be accessed via an open Python library.

YNIMG Journal 2017 Journal Article

Neural correlates of believing

  • Xiaochun Han
  • Ting Zhang
  • Shiyu Wang
  • Shihui Han

Beliefs provide a fundamental cognitive basis for human behavior. But how the brain believes remains a mystery. We investigated the neural underpinnings of believing by scanning healthy adults using functional magnetic resonance imaging when they made yes/no responses to the questions whether they believe or think that a trait adjective describes themselves or a celebrity. We found that, relative to thinking, believing was characterized with better memory of self-related adjectives. Moreover, believing (vs. thinking) was associated with stronger activations in the left anterior insula/inferior frontal cortex, stronger functional connectivity between the medial prefrontal cortex and left occipital cortex during judgments of one's own personality traits, and stronger intrinsic connectivity between the left occipital cortex and the left anterior insula/inferior frontal cortex. Our findings shed new light on the neurocognitive processes that characterize believing as a mental process in healthy adults.