Arrow Research search

Author name cluster

Yang Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

146 papers
2 author rows

Possible papers


EAAI Journal 2026 Journal Article

A two-stage framework for photovoltaic power forecasting: Integrating adaptive hybrid decomposition with a novel predictor

  • Xiaonan Shen
  • Junjie Shen
  • Tianle Zhang
  • Yuting Zhang
  • Yang Wang

To address the severe non-stationarity and multi-scale fluctuations in photovoltaic (PV) power output, this paper proposes a novel two-stage forecasting framework that integrates complexity-aware adaptive hybrid decomposition with an xLSTM (Extended Long Short-Term Memory)-KAN (Kolmogorov-Arnold Network) model. Initially, the original time series is decomposed using Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN). A K-Means clustering method based on Sample Entropy (SE) is then employed to identify the high-frequency component exhibiting the greatest stochasticity. This component subsequently undergoes a secondary adaptive decomposition via Sequential Variational Mode Decomposition (SVMD), thereby effectively isolating noise and enhancing the signal's purity. The optimized components are then fed into the xLSTM-KAN prediction model. Unlike traditional “black-box” deep learning architectures, this model integrates xLSTM to capture long-term dependencies and enhance parallel computation efficiency, while also leveraging KAN's learnable activation functions, which are parameterized by B-splines, to significantly improve approximation accuracy and model interpretability. Experimental results from four actual photovoltaic power plants (30–130 MW) show that the proposed temporal forecasting model, xLSTM-KAN, reduces MAE (Mean Absolute Error) by an average of 5.81% and RMSE (Root Mean Square Error) by 7.45% compared to other advanced architectures like iTransformer and LSTMformer. It also exhibits superior stability, with VAE (Variance of Absolute Error) decreasing by an average of 15.02%. Moreover, the proposed adaptive decomposition strategy, SVMD-ICEEMDAN, lowers MAE by an average of 9.12% and RMSE by 18.44% compared to traditional hybrid methods such as VMD-CEEMDAN.
These results validate the framework's robustness across different scales and climatic conditions, providing reliable and interpretable decision support for power systems with high renewable energy penetration.
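The abstract reports MAE, RMSE, and VAE together with average percentage reductions. A minimal sketch of how such metrics could be computed is below; it assumes VAE is the population variance of the absolute errors, and the helper names `forecast_errors` and `relative_reduction` are hypothetical, not taken from the paper.

```python
from statistics import fmean, pvariance


def forecast_errors(y_true, y_pred):
    """Return (MAE, RMSE, VAE) for paired sequences of observations and forecasts.

    Assumption: VAE is taken as the population variance of |error|.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    mae = fmean(abs_err)
    rmse = fmean([e * e for e in abs_err]) ** 0.5
    vae = pvariance(abs_err)  # spread of the absolute errors (stability)
    return mae, rmse, vae


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` lowers an error metric relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline
```

For instance, a baseline RMSE of 10.0 against a proposed RMSE of 9.0 corresponds to a 10% reduction under this definition.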

AAAI Conference 2026 Conference Paper

Agent-SAMA: State-Aware Mobile Assistant

  • Linqiang Guo
  • Wei Liu
  • Yi Wen Heng
  • Tse-Hsun (Peter) Chen
  • Yang Wang

Mobile Graphical User Interface (GUI) agents aim to autonomously complete tasks within or across apps based on user instructions. While recent Multimodal Large Language Models (MLLMs) enable these agents to interpret UI screens and perform actions, existing agents remain fundamentally reactive. They reason over the current UI screen but lack a structured representation of the app navigation flow, limiting GUI agents’ ability to understand execution context, detect unexpected execution results, and recover from errors. We introduce Agent-SAMA, a state-aware multi-agent framework that models app execution as a Finite State Machine (FSM), treating UI screens as states and user actions as transitions. Agent-SAMA implements four specialized agents that collaboratively construct and use FSMs in real time to guide task planning, execution verification, and recovery. We evaluate Agent-SAMA on two types of benchmarks: cross-app (Mobile-Eval-E, SPA-Bench) and mostly single-app (AndroidWorld). On Mobile-Eval-E, Agent-SAMA achieves an 84.0% success rate and a 71.9% recovery rate. On SPA-Bench, it reaches an 80.0% success rate with a 66.7% recovery rate. Compared to prior methods, Agent-SAMA improves task success by up to 12% and recovery success by 13.8%. On AndroidWorld, Agent-SAMA achieves a 63.7% success rate, outperforming the baselines. Our results demonstrate that structured state modeling enhances robustness and can serve as a lightweight, model-agnostic memory layer for future GUI agents.
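The FSM abstraction described above (screens as states, actions as transitions) can be illustrated with a toy sketch. This is not Agent-SAMA's implementation: the class `AppFSM`, its methods, and the screen and action names are hypothetical. It records observed transitions, flags an observed screen that contradicts a recorded transition, and uses breadth-first search to find a route back to a known screen.

```python
from collections import deque


class AppFSM:
    """Hypothetical toy FSM: UI screens are states, user actions are transitions."""

    def __init__(self):
        self.transitions = {}  # (screen, action) -> next screen

    def record(self, screen, action, next_screen):
        """Store a transition observed during execution."""
        self.transitions[(screen, action)] = next_screen

    def expected(self, screen, action):
        """Screen the FSM predicts after `action`, or None if the transition is unknown."""
        return self.transitions.get((screen, action))

    def is_unexpected(self, screen, action, observed):
        """True if a known transition exists and the observed screen differs from it."""
        exp = self.expected(screen, action)
        return exp is not None and exp != observed

    def recovery_path(self, start, goal):
        """Breadth-first search for the shortest action sequence from start to goal."""
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            screen, path = queue.popleft()
            if screen == goal:
                return path
            for (src, action), dst in self.transitions.items():
                if src == screen and dst not in seen:
                    seen.add(dst)
                    queue.append((dst, path + [action]))
        return None  # goal not reachable with the recorded transitions
```

A linear scan over transitions keeps the sketch short; a larger graph would index outgoing edges per screen.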

AAAI Conference 2026 Conference Paper

AlignTrack: Top-Down Spatiotemporal Resolution Alignment for RGB-Event Visual Tracking

  • Chuanyu Sun
  • Jiqing Zhang
  • Yang Wang
  • Yuanchen Wang
  • Yutong Jiang
  • Baocai Yin
  • Xin Yang

Most existing RGB-Event trackers rely on strictly aligned datasets, overlooking the asynchronous spatio-temporal resolutions common in real-world scenarios. This methodological limitation impedes effective RGB-Event feature alignment and ultimately degrades tracking performance. To overcome this limitation, we propose AlignTrack, a novel tracking framework built upon a Top-Down Alignment (TDA) strategy inspired by the human visual system. Our TDA framework follows an encode-decode-align paradigm: it first encodes multimodal features to generate target-related priors, which are then progressively decoded to guide a subsequent feature alignment pass. Within this framework, we introduce two key innovations: (1) a Cross-Prior Attention (CPA) module that effectively generates and integrates cross-modal priors, and (2) a Cross-Modal Semantic Alignment (CSA) loss that maximizes mutual information to enforce semantic consistency between modalities. Extensive experiments show that AlignTrack achieves state-of-the-art performance on four challenging RGB-Event tracking benchmarks, demonstrating its robustness in both aligned and unaligned scenarios. Ablation studies further validate the significant contribution of each proposed component.

AAAI Conference 2026 Conference Paper

Aware Distillation for Robust Vision-Language Tracking Under Linguistic Sparsity

  • Guangtong Zhang
  • Bineng Zhong
  • Shirui Yang
  • Yang Wang
  • Tian Bai

Vision-language object tracking overcomes the limitations of relying solely on visual features by leveraging language descriptions of objects to provide cross-modal semantic information, thereby enhancing model robustness in complex scenarios. However, most existing high-performance vision-language trackers are trained jointly on pure visual data and vision-language multimodal data. Due to the relative sparsity of language annotations in the data, the trackers tend to prioritize the localization role of visual features, diminishing the model's attention to language information. To mitigate this issue, we propose a novel vision-language tracker: Aware Distillation for Robust Vision-Language Tracking under Linguistic Sparsity (ADTrack). We introduce a knowledge distillation framework employing a knowledge-rich teacher model and a lightweight student model to establish modality correlations between vision and language, enabling efficient modeling between visual information and language descriptions. Specifically, our lightweight student module simultaneously distills language encoding capabilities from large language models through teacher-guided learning on input language, while performing target-aware perception on template images using language descriptions to generate more effective template features for subsequent visual extraction. Furthermore, to ensure perceptual robustness in linguistically sparse scenarios, we simulate language-deficient conditions during training and employ contrastive learning to enhance model adaptability. Extensive experiments demonstrate that ADTrack reduces parameters by over 50% while achieving state-of-the-art (SOTA) performance and speed on vision-language tracking benchmarks, including LaSOT, LaSOText, TNL2K, OTB-Lang and MGIT.

EAAI Journal 2026 Journal Article

BoA-SQL: Executable Blueprint-of-Action for Text-to-SQL with reinforcement learning

  • Yang Wang
  • Zhilong Xie
  • Lin Zhang
  • Lingyun Gu
  • Qing Li

Recent progress in large language models has opened new possibilities for querying databases using natural language instead of SQL. Yet existing methods, often relying on linear reasoning, struggle with complex nested logic and are susceptible to error propagation. We propose BoA-SQL, which turns reasoning into a durable, structured plan. Our contributions are threefold: (1) To resolve schema ambiguity and context overload, a lightweight, knowledge graph-driven linker grounds the query by pruning irrelevant schema before planning. (2) To overcome the structural mismatch of linear plans, a persistent, tree-structured blueprint aligns with SQL’s hierarchy, enabling localized repair of faulty segments without full re-computation. (3) To align the language model’s text-generation objective with execution correctness, a two-stage reinforcement learning policy optimizes the entire blueprint for task success. Extensive evaluations on the public Spider benchmark and a complex real-world database validate this approach. BoA-SQL achieves 85.6% execution accuracy, which rises to 88.2% after correcting benchmark label errors. These findings suggest that durable, structure-aware planning, combined with schema grounding and planning-aligned optimization, is a practical path to reliable Text-to-SQL.
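To illustrate why a tree-structured plan permits localized repair, here is a hypothetical sketch; the `PlanNode` class and `repair` function are illustrative assumptions, not BoA-SQL's blueprint format. Each node renders a SQL fragment from its children, and repairing one subtree rebuilds only the nodes on the path to it while every other subtree is reused unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """Hypothetical plan step: `template` is SQL with {0}, {1}, ... slots for child fragments."""
    template: str
    children: list = field(default_factory=list)

    def render(self) -> str:
        """Recursively fill each slot with the rendered child fragment."""
        return self.template.format(*(child.render() for child in self.children))


def repair(node: PlanNode, path: list, replacement: PlanNode) -> PlanNode:
    """Return a new tree with the subtree at `path` (child indices) replaced.

    Only nodes along `path` are copied; all other subtrees are shared as-is.
    """
    if not path:
        return replacement
    i, rest = path[0], path[1:]
    new_children = list(node.children)
    new_children[i] = repair(node.children[i], rest, replacement)
    return PlanNode(node.template, new_children)
```

Because templates are filled with `str.format`, literal braces in SQL would need escaping as `{{` and `}}`; a fuller planner would more likely keep a structured AST.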

AAAI Conference 2026 Conference Paper

Dynamic Weight Adaptation in Spiking Neural Networks Inspired by Biological Homeostasis

  • Yunduo Zhou
  • Bo Dong
  • Chang Li
  • Yuanchen Wang
  • Xuefeng Yin
  • Yang Wang
  • Xin Yang

Homeostatic mechanisms play a crucial role in maintaining optimal functionality within the neural circuits of the brain. By regulating physiological and biochemical processes, these mechanisms ensure the stability of an organism’s internal environment, enabling it to better adapt to external changes. Among these mechanisms, the Bienenstock, Cooper, and Munro (BCM) theory has been extensively studied as a key principle for maintaining the balance of synaptic strengths in biological systems. Despite the extensive development of spiking neural networks (SNNs) as a model for bionic neural networks, no prior work in the machine learning community has integrated biologically plausible BCM formulations into SNNs to provide homeostasis. In this study, we propose a Dynamic Weight Adaptation Mechanism (DWAM) for SNNs, inspired by the BCM theory. DWAM can be integrated into the host SNN, dynamically adjusting network weights in real time to regulate neuronal activity, providing homeostasis to the host SNN without any fine-tuning. We validated our method through dynamic obstacle avoidance and continuous control tasks under both normal and specifically designed degraded conditions. Experimental results demonstrate that DWAM not only enhances the performance of SNNs without existing homeostatic mechanisms under various degraded conditions but also further improves the performance of SNNs that already incorporate homeostatic mechanisms.
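For reference, the classic BCM rule with a sliding modification threshold can be written as a discrete update for a single rate-based neuron. This sketch is the textbook BCM form, not DWAM itself, and it uses a linear rate neuron rather than a spiking one; the learning rate `lr` and averaging constant `tau` are illustrative assumptions.

```python
def bcm_step(weights, inputs, theta, lr=0.01, tau=10.0):
    """One discrete BCM update for a linear rate neuron (textbook form, not DWAM).

    y      = sum_i w_i * x_i
    dw_i   = lr * x_i * y * (y - theta)   # potentiate if y > theta, depress if y < theta
    theta += (y**2 - theta) / tau         # sliding threshold tracks a running mean of y**2
    """
    y = sum(w * x for w, x in zip(weights, inputs))
    new_weights = [w + lr * x * y * (y - theta) for w, x in zip(weights, inputs)]
    new_theta = theta + (y * y - theta) / tau
    return new_weights, new_theta, y
```

The homeostatic effect comes from the threshold: when sustained activity pushes `theta` above the current output `y`, the update changes sign and the weights are depressed rather than potentiated.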

AAAI Conference 2026 Conference Paper

Experiential Fairness: Bridging the Gap Between User Experience and Resource-Centric Fairness in Online LLM Services

  • Jiahua Huang
  • Wentai Wu
  • Yongheng Liu
  • Guozhi Liu
  • Yang Wang
  • Weiwei Lin

Conventional fairness in multi-tenant Large Language Model (LLM) inference services is typically defined by system-centric metrics such as equitable resource allocation. We argue that this view is one-sided and creates a gap between measured system performance and actual user-perceived quality. We challenge this notion by introducing and formalizing Experiential Fairness, a user-centric paradigm that shifts the objective from equality of opportunity (resource access) to equity of outcome (user experience). With this motivation, we propose ExFairS, a lightweight scheduling framework that models each user's satisfaction as a composite measure of Service Level Objective (SLO) compliance and resource consumption, and dynamically reorders the serving queue using a credit-based priority mechanism. Extensive experiments on an 8-GPU NVIDIA V100 node show that ExFairS reduces the SLO violation rate by up to 100% and improves system throughput by 14–21.9%, outperforming state-of-the-art schedulers and delivering a demonstrably higher degree of Experiential Fairness.
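A credit-based reordering of the serving queue could look roughly like the sketch below. The satisfaction formula (a weighted SLO-compliance term minus a weighted, normalized token-usage term, with weight `alpha`) and all function names are hypothetical assumptions; the abstract does not give ExFairS's actual formula.

```python
import heapq


def satisfaction(slo_met, slo_total, tokens_used, tokens_budget, alpha=0.7):
    """Hypothetical composite: weighted SLO compliance minus weighted normalized usage."""
    compliance = slo_met / slo_total if slo_total else 1.0
    usage = min(tokens_used / tokens_budget, 1.0) if tokens_budget else 0.0
    return alpha * compliance - (1 - alpha) * usage


def reorder_queue(requests, user_stats):
    """Serve requests of the least-satisfied users first.

    requests: list of (request_id, user) in arrival order.
    user_stats: user -> keyword arguments for satisfaction().
    Ties keep arrival order via the arrival index in the heap key.
    """
    heap = []
    for arrival, (req_id, user) in enumerate(requests):
        credit = -satisfaction(**user_stats[user])  # less satisfied -> more credit
        heapq.heappush(heap, (-credit, arrival, req_id))  # min-heap pops largest credit first
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

A user whose SLOs are mostly being missed while consuming few resources accumulates credit and moves ahead of users who are already well served.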

AAAI Conference 2026 Conference Paper

FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning

  • Jiajun Cao
  • Qizhe Zhang
  • Peidong Jia
  • Xuhui Zhao
  • Bo Lan
  • Xiaoan Zhang
  • Lizhuo
  • Xiaobao Wei

Vision-Language-Action (VLA) models have demonstrated significant potential in complex scene understanding and action reasoning, leading to their increasing adoption in end-to-end autonomous driving systems. However, the long visual token sequences of VLA models greatly increase computational costs. Current visual token pruning methods for Vision-Language Models (VLMs) rely on either visual token similarity or visual-text attention, but both have shown poor performance in autonomous driving scenarios. Given that human drivers concentrate on relevant foreground areas while driving, we assert that retaining visual tokens containing this foreground information is essential for effective decision-making. Inspired by this, we propose FastDriveVLA, a novel reconstruction-based vision token pruning framework designed specifically for autonomous driving. FastDriveVLA includes a plug-and-play visual token pruner called ReconPruner, which prioritizes foreground information through MAE-style pixel reconstruction. A novel adversarial foreground-background reconstruction strategy is designed to train ReconPruner for the visual encoder of VLA models. Once trained, ReconPruner can be seamlessly applied to different VLA models with the same visual encoder without retraining. To train ReconPruner, we also introduce a large-scale dataset called nuScenes-FG, consisting of 241K image-mask pairs with annotated foreground regions. Our approach achieves state-of-the-art results on the nuScenes open-loop planning benchmark across different pruning ratios.
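Once a pruner has assigned each visual token a score, pruning at a given ratio reduces to a top-k selection. In this sketch the scores are a hypothetical stand-in for ReconPruner's foreground saliency, and `prune_tokens` is an illustrative name; the kept tokens stay in their original order.

```python
def prune_tokens(tokens, scores, keep_ratio):
    """Keep the highest-scoring fraction of tokens, preserving their original order.

    `scores` is a hypothetical per-token importance (e.g. a foreground saliency estimate).
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have equal length")
    k = max(1, round(len(tokens) * keep_ratio))
    # Indices of the k highest scores, then re-sorted so sequence order is unchanged.
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(top)]
```

In a real VLA pipeline the selection would run on tensors of visual-encoder outputs; plain lists keep the idea visible.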

AAAI Conference 2026 Conference Paper

MSAnchor: De Novo Molecular Generation from Mass Spectrometry Data with Anchor-Extended Molecular Scaffolds

  • Xiaohan Qin
  • Chao Wang
  • Zhengyang Zhou
  • Linjiang Chen
  • Wenjie Du
  • Yang Wang

Tandem mass spectrometry (MS/MS) is a critical tool for identifying molecular structures. By efficiently separating molecular fragments based on their mass-to-charge (m/z) ratios, it facilitates molecular generation and subsequent scientific discoveries. However, de novo molecular generation from MS/MS spectra remains fundamentally constrained by two paramount challenges: the vast chemical space requires effective structural constraints, and the absence of fine-grained substructural generation weakens the correspondences between spectral features and molecular structures. In this work, we propose MSAnchor, a novel two-stage framework for MS/MS-based molecular structure generation. We mitigate the search space challenge through the introduction of Anchor-Extended Molecular Scaffold (AEMS) representation that explicitly encodes side-chain anchoring points, thereby dramatically reducing combinatorial complexity. Leveraging the explicit attachment sites provided by AEMS, we develop anchor-specific priors that establish effective alignments between spectral features and molecular substructures. This fine-grained substructural correspondence is further enhanced by a modified Conditional Information Bottleneck (CIB) module that extracts the most informative spectral components in a structure-aware manner. These innovations enable MSAnchor to generate molecular structures that closely reflect spectral characteristics while constraining combinatorial complexity. Extensive experiments on the CANOPUS and MassSpecGym datasets demonstrate that MSAnchor achieves state-of-the-art performance in molecular structure prediction from MS/MS spectra, with performance improvements that are particularly pronounced for molecules with higher complexity.

AAAI Conference 2026 Conference Paper

Talk2Image: A Multi-Agent System for Multi-Turn Image Generation and Editing

  • Shichao Ma
  • Yunhe Guo
  • Jiahao Su
  • Qihe Huang
  • Zhengyang Zhou
  • Yang Wang

Text-to-image generation tasks have driven remarkable advances in diverse media applications, yet most focus on single-turn scenarios and struggle with iterative, multi-turn creative tasks. Recent dialogue-based systems attempt to bridge this gap, but their single-agent, sequential paradigm often causes intention drift and incoherent edits. To address these limitations, we present Talk2Image, a novel multi-agent system for interactive image generation and editing in multi-turn dialogue scenarios. Our approach integrates three key components: intention parsing from dialogue history, task decomposition and collaborative execution across specialized agents, and feedback-driven refinement based on a multi-view evaluation mechanism. Talk2Image enables step-by-step alignment with user intention and consistent image editing. Experiments demonstrate that Talk2Image outperforms existing baselines in controllability, coherence, and user satisfaction across iterative image generation and editing tasks.

JMLR Journal 2026 Journal Article

Transformers Can Overcome the Curse of Dimensionality: A Theoretical Study from an Approximation Perspective

  • Yuling Jiao
  • Yanming Lai
  • Yang Wang
  • Bokai Yan

The Transformer model is widely used in various application areas of machine learning, such as natural language processing. This paper investigates the approximation of the Hölder continuous function class $\mathcal{H}_{Q}^{\beta}\left([0,1]^{d\times n},\mathbb{R}^{d\times n}\right)$ by Transformers and constructs several Transformers that can overcome the curse of dimensionality. These Transformers consist of one self-attention layer with one head and the softmax function as the activation function, along with several feedforward layers. For example, to achieve an approximation accuracy of $\epsilon$, if the activation functions of the feedforward layers in the Transformer are ReLU and floor, only $\mathcal{O}\left(\log\frac{1}{\epsilon}\right)$ feedforward layers are needed, with widths not exceeding $\mathcal{O}\left(\frac{1}{\epsilon^{2/\beta}}\log\frac{1}{\epsilon}\right)$. If other activation functions are allowed in the feedforward layers, the width of the feedforward layers can be further reduced to a constant. These results demonstrate that Transformers have a strong expressive capability. The construction in this paper is based on the Kolmogorov-Arnold Superposition Theorem and does not require the concept of contextual mapping, hence our proof is more intuitively clear compared to previous Transformer approximation works. Additionally, the translation technique proposed in this paper helps to apply the previous approximation results of feedforward neural networks to Transformer research. © JMLR 2026.
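The Transformers considered above use one self-attention layer with a single head and the softmax activation. A plain-Python sketch of standard single-head scaled dot-product attention follows; the paper's exact parametrization (for example, residual connections or an output projection) may differ, and `wq`, `wk`, `wv` are assumed projection matrices given as nested lists of rows.

```python
import math


def matvec(m, v):
    """Multiply matrix `m` (list of rows) by vector `v`."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def single_head_attention(tokens, wq, wk, wv):
    """One-head scaled dot-product self-attention over a list of token vectors."""
    q = [matvec(wq, t) for t in tokens]
    k = [matvec(wk, t) for t in tokens]
    v = [matvec(wv, t) for t in tokens]
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        # Attention weights of this query over all keys.
        weights = softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
        # Weighted average of the value vectors, coordinate by coordinate.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out
```

With an all-zero query matrix every score is equal, so each output is simply the mean of the value vectors; this is a quick sanity check of the softmax averaging.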

AAAI Conference 2026 Conference Paper

U2B: Scale-unbiased Representation Converter for Graph Classification with Imbalanced and Balanced Scale Distributions

  • Guanjun Wang
  • Jianhao Zhang
  • Jiaming Ma
  • Sheng Huang
  • Pengkun Wang
  • Zhengyang Zhou
  • Binwu Wang
  • Yang Wang

Graph classification is a critical task in analyzing graph data, with applications across various domains. While graph neural networks (GNNs) have achieved remarkable results, their ability to generalize across graphs of varying scales remains a challenge. Conventional models often perform well on large-scale graphs but struggle with distributions that are skewed towards small scales. Conversely, models tailored to address scale imbalances frequently prioritize small-scale graphs, leading to diminished performance in more balanced scenarios. To overcome these limitations, we introduce an Unbalanced-Balanced Representation Converter (U2B), which exhibits no explicit bias toward graph scales. U2B employs a two-step workflow: a distillation phase to extract base features from both node-level and graph-level representations, followed by a refinement phase to generate unbiased representations for improved balance. In the distillation phase, a static constraint guides node-level adjustments, improving the representation of nodes in small graphs. Simultaneously, a dynamic constraint in the graph-level process mitigates biases toward features from large graphs. To ensure harmony between the representations, a consistency alignment loss is introduced, aligning node-level and graph-level features to create more cohesive and balanced graph representations. Extensive experiments on multiple datasets show that U2B achieves competitive performance.

EAAI Journal 2025 Journal Article

A knowledge-refined hybrid graph model for quality prediction of industrial processes

  • Yang Wang
  • Feifan Shen
  • Lingjian Ye

The complexity of industrial processes has spurred the application of soft sensor techniques for predicting key quality variables from easily measurable process variables. Currently, data-driven soft sensors based on artificial intelligence techniques have become mainstream. However, these soft sensing models rely heavily on the quality of training data, and domain knowledge is often ignored. Meanwhile, a significant amount of labeled data is not fully utilized. To address these issues, this paper proposes a supervised framework based on a knowledge-refined hybrid graph network, which contributes to the artificial intelligence application of nonlinear dynamic soft sensors. The problems of applying traditional artificial intelligence models to soft sensing are addressed by reconstructing the input module of graph neural networks with knowledge-guided approaches. Both spatial and temporal correlations of process data are captured, and the hybrid network significantly improves the reliability and interpretability of the soft sensing model. By incorporating labeled data into the model, the representation of quality information is also enhanced. Finally, the proposed framework was applied to an industrial debutanizer column, and the experimental results demonstrate the effectiveness and superiority of the method.

EAAI Journal 2025 Journal Article

A metro rail corrugation detection framework based on car body vibration signals and unsupervised learning

  • Yang Wang
  • Hong Xiao
  • Mahantesh M. Nadakatti
  • Zhihai Zhang
  • Yihao Chi
  • Xiubo Liu

Onboard detection of rail corrugation traditionally relies on axle box acceleration data and requires labeled data for supervised learning. To address data acquisition and labeling challenges, this study proposes an unsupervised learning framework using car body vertical vibration acceleration signals. First, the vibration response and transmission characteristics of train components were analyzed. Then, car body acceleration signals were transformed into time-frequency spectrograms using the Synchrosqueezed Wave Packet Transform (SSWPT). An unsupervised learning framework based on momentum contrastive learning was trained on these spectrograms, followed by fine-tuning for tasks such as corrugation wavelength classification and amplitude assessment. The results confirm that rail corrugation wavelength and amplitude can be effectively characterized using the SSWPT spectrograms. The baseline model achieved a Top-1 accuracy of 97.70% and a Top-5 accuracy of 98.92%. Incorporating the Convolutional Block Attention Module (CBAM) and multi-crop augmentation further improved Top-1 and Top-5 accuracy by 0.93% and 0.49%, respectively. For binary classification of corrugation presence, wavelength classification accuracy ranged from 95% to 100%, while amplitude assessment exceeded 95%. Field verification results indicate that the proposed detection framework offers a robust data-driven method for onboard detection of rail corrugation.
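Momentum contrastive learning, as popularized by MoCo, rests on two pieces: a key encoder updated as an exponential moving average of the query encoder, and an InfoNCE loss that contrasts one positive key with a queue of negatives. The sketch below shows both on plain Python lists; it is a generic illustration, not the paper's training code, parameter lists stand in for full network weights, and the usual L2 normalization of embeddings is omitted for brevity.

```python
import math


def momentum_update(key_params, query_params, m=0.999):
    """MoCo-style EMA: key encoder parameters drift slowly toward the query encoder."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """Contrastive loss for one query: -log softmax over [positive, negatives...] at index 0."""
    logits = [_dot(query, positive_key) / temperature]
    logits += [_dot(query, k) / temperature for k in negative_keys]
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(l - top) for l in logits))  # stable log-sum-exp
    return log_sum - logits[0]
```

The loss is zero when there are no negatives and grows as negatives become more similar to the query than the positive key is.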

AAAI Conference 2025 Conference Paper

Anchor Learning with Potential Cluster Constraints for Multi-view Clustering

  • Yawei Chen
  • Huibing Wang
  • Jinjia Peng
  • Yang Wang

Anchor-based multi-view clustering has received extensive attention due to its efficient performance. Existing methods focus only on how to dynamically learn anchors from the original data while constructing anchor graphs that describe the relationships between samples and performing clustering, ignoring whether the anchors are realistic, i.e., high-quality anchors should be generated uniformly from different clusters of data rather than scattered outside the clusters. To deal with this problem, we propose a novel method termed Anchor Learning with Potential Cluster Constraints for Multi-view Clustering (ALPC). Specifically, ALPC first establishes a shared latent semantic module to constrain anchors to be generated from specific clusters, and subsequently improves the representativeness and discriminability of anchors by adapting the anchor graph to capture the common clustering center of mass from samples and anchors, respectively. Finally, ALPC combines anchor learning and graph construction into a unified framework for collaborative learning and mutual optimization to improve clustering performance. Extensive experiments demonstrate the effectiveness of our proposed method compared to several state-of-the-art multi-view clustering (MVC) methods.
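Anchor-based methods replace an n×n sample graph with a much smaller n×m sample-to-anchor graph. The sketch below builds a Gaussian-kernel anchor graph per view and averages the views into a consensus graph; it is a common baseline construction, not ALPC, which learns anchors and graphs jointly, and `sigma` and all function names are assumptions.

```python
import math


def anchor_graph(samples, anchors, sigma=1.0):
    """Row-stochastic sample-to-anchor affinities using a Gaussian kernel."""
    graph = []
    for x in samples:
        sims = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, anc)) / (2 * sigma ** 2))
            for anc in anchors
        ]
        total = sum(sims)
        graph.append([s / total for s in sims])  # each row sums to 1
    return graph


def consensus_graph(views):
    """Average per-view anchor graphs; each view is a (samples, anchors) pair of equal sizes."""
    graphs = [anchor_graph(s, a) for s, a in views]
    n, m = len(graphs[0]), len(graphs[0][0])
    return [[sum(g[i][j] for g in graphs) / len(graphs) for j in range(m)] for i in range(n)]
```

Assigning each sample to its highest-affinity anchor gives a naive clustering; a full method would instead run spectral clustering on the anchor graph.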

AAAI Conference 2025 Conference Paper

BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving

  • Tao Tang
  • Dafeng Wei
  • Zhengyu Jia
  • Tian Gao
  • Changwei Cai
  • Chengkai Hou
  • Peng Jia
  • Kun Zhan

The rapid development of the autonomous driving industry has led to a significant accumulation of autonomous driving data. Consequently, there is a growing demand for retrieving data to support specialized optimization. However, directly applying previous image retrieval methods faces several challenges, such as the lack of global feature representation and inadequate text retrieval ability for complex driving scenes. To address these issues, we first propose the BEV-TSR framework, which leverages descriptive text as input to retrieve corresponding scenes in the Bird’s Eye View (BEV) space. Then, to facilitate complex scene retrieval with extensive text descriptions, we employ a large language model (LLM) to extract the semantic features of the text inputs and incorporate knowledge graph embeddings to enhance the semantic richness of the language embedding. To achieve feature alignment between the BEV feature and language embedding, we propose Shared Cross-modal Embedding with a set of shared learnable embeddings to bridge the gap between these two modalities, and employ a caption generation task to further enhance the alignment. Furthermore, there is a lack of well-formed retrieval datasets for effective evaluation. To this end, we establish a multi-level retrieval dataset, nuScenes-Retrieval, based on the widely adopted nuScenes dataset. Experimental results on the multi-level nuScenes-Retrieval show that BEV-TSR achieves state-of-the-art performance, e.g., 85.78% and 87.66% top-1 accuracy on scene-to-text and text-to-scene retrieval, respectively.
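Top-1 retrieval accuracy of the kind reported above can be computed from paired embeddings with cosine similarity. In this sketch, text query i is assumed to be paired with scene i, and the embeddings are hypothetical inputs rather than BEV-TSR outputs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def top1_accuracy(text_embs, scene_embs):
    """Fraction of text queries whose most similar scene is the paired one (i <-> i)."""
    hits = 0
    for i, t in enumerate(text_embs):
        best = max(range(len(scene_embs)), key=lambda j: cosine(t, scene_embs[j]))
        hits += int(best == i)
    return hits / len(text_embs)
```

Swapping the roles of the two lists gives the scene-to-text direction.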

AAAI Conference 2025 Conference Paper

Boosting Image De-Raining via Central-Surrounding Synergistic Convolution

  • Long Peng
  • Yang Wang
  • Xin Di
  • Peizhe Xia
  • Xueyang Fu
  • Yang Cao
  • Zheng-Jun Zha

Rainy images suffer from quality degradation due to the synergistic effect of rain streaks and accumulation. The rain streaks are anisotropic and show a specific directional arrangement, while the rain accumulation is isotropic and shows a consistent concentration distribution in local regions. This distribution difference makes unified representation learning for rain streaks and accumulation challenging, which may lead to structure distortion and contrast degradation in the deraining results. To address this problem, a Synergistic Convolution (SC) inspired by the central-surrounding mechanism is proposed to extract rain streak and accumulation features simultaneously. Specifically, the SC consists of two parallel novel convolutions: Central-Surrounding Difference Convolution (CSD) and Central-Surrounding Addition Convolution (CSA). In CSD, the difference operation between central and surrounding pixels is injected into the feature extraction process of convolution to perceive the direction distribution of rain streaks. In CSA, the addition operation between central and surrounding pixels is injected into the feature extraction process of convolution to facilitate the modeling of rain accumulation properties. The SC can be used as a general unit to substitute Vanilla Convolution (VC) in current de-raining networks to boost performance. To reduce computational costs, CSA and CSD in SC are merged into a single VC kernel by our parameter equivalent transformation before inference. Evaluations of twelve de-raining methods on nine public datasets demonstrate that our proposed SC can comprehensively improve the performance of twelve de-raining networks under various rainy conditions without changing the original network structure or introducing extra computational costs. Even for the current SOTA methods, SC can further achieve SOTA++ performance. The source code will be made publicly available.
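The parameter equivalent transformation works because both branches are linear in the input patch. Assuming CSD computes sum_i w_i (x_i - x_c) and CSA computes sum_i u_i (x_i + x_c) over a 3×3 window with center x_c (the paper's exact definitions may differ), their sum equals sum_i (w_i + u_i) x_i + (sum(u) - sum(w)) x_c, i.e. one vanilla kernel k = w + u whose center entry is shifted by sum(u) - sum(w). The hypothetical helpers below check this at a single position.

```python
def patch_response(kernel, patch):
    """Vanilla convolution (cross-correlation) response at one 3x3 position."""
    return sum(k * p for krow, prow in zip(kernel, patch) for k, p in zip(krow, prow))


def csd_response(w, patch):
    """Assumed central-surrounding difference form: sum_i w_i * (x_i - x_center)."""
    c = patch[1][1]
    return sum(wi * (xi - c) for wrow, prow in zip(w, patch) for wi, xi in zip(wrow, prow))


def csa_response(u, patch):
    """Assumed central-surrounding addition form: sum_i u_i * (x_i + x_center)."""
    c = patch[1][1]
    return sum(ui * (xi + c) for urow, prow in zip(u, patch) for ui, xi in zip(urow, prow))


def merge_kernels(w, u):
    """Fold CSD(w) + CSA(u) into one 3x3 vanilla kernel (both branches are linear)."""
    merged = [[wi + ui for wi, ui in zip(wrow, urow)] for wrow, urow in zip(w, u)]
    merged[1][1] += sum(map(sum, u)) - sum(map(sum, w))  # center absorbs the x_c terms
    return merged
```

Because the merge happens once on the weights, inference then costs a single vanilla convolution, which matches the abstract's claim of no extra computational cost.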

NeurIPS Conference 2025 Conference Paper

Bridging the Gap Between Cross-Domain Theory and Practical Application: A Case Study on Molecular Dissolution

  • Sihan Wang
  • Wenjie Du
  • Qing Zhu
  • Yang Wang

Artificial intelligence (AI) has played a transformative role in chemical research, greatly facilitating the prediction of small molecule properties, simulation of catalytic processes, and material design. These advances are driven by increases in computing power, open-source machine learning frameworks, and extensive chemical datasets. However, a persistent challenge is the limited amount of high-quality real-world data, while models trained on large amounts of theoretical data are often costly and difficult to deploy, which hinders the applicability of AI models in real-world scenarios. In this study, we enhance the prediction of solute-solvent properties by proposing a novel sample selection method: the iterative core subset extraction (CSIE) framework. CSIE iteratively updates the core sample subset based on information gain to remove redundant features in theoretical data and optimize the performance of the model on real chemical datasets. Furthermore, we introduce an asymmetric molecular interaction graph neural network (AMGNN) that combines positional information and bidirectional edge connections to simulate real-world chemical reaction scenarios and better capture solute-solvent interactions. Experimental results show that our method can accurately extract the core subset and improve the prediction accuracy.

IJCAI Conference 2025 Conference Paper

Can We Verify Step by Step for Incorrect Answer Detection?

  • Xin Xu
  • Shizhe Diao
  • Can Yang
  • Yang Wang

Chain-of-Thought (CoT) prompting has marked a significant advancement in enhancing the reasoning capabilities of large language models (LLMs). Previous studies have developed various extensions of CoT, which focus primarily on enhancing end-task performance. In addition, there has been research on assessing the quality of reasoning chains in CoT. This raises an intriguing question: Is it possible to predict the accuracy of LLM outputs by scrutinizing the reasoning chains they generate? To answer this research question, we introduce a benchmark, R2PE, designed specifically to explore the relationship between reasoning chains and performance in various reasoning tasks spanning five different domains. This benchmark aims to measure the falsehood of the final output of LLMs based on the reasoning steps. To make full use of information in multiple reasoning chains, we propose the process discernibility score (PDS) framework, which beats the answer-checking baseline by a large margin. Concretely, this results in an average increase of 5.1% in the F1 score and a 2.97% improvement in AUC-PR across all 45 subsets within R2PE. We further demonstrate the efficacy of PDS in advancing open-domain QA accuracy. Our code will be released in the final version. Code and data are available at https://github.com/XinXU-USTC/R2PE.git. For further details on the appendix, please refer to https://arxiv.org/abs/2402.10528.

IJCAI Conference 2025 Conference Paper

Causal Learning Meet Covariates: Empowering Lightweight and Effective Nationwide Air Quality Forecasting

  • Jiaming Ma
  • Zhiqing Cui
  • Binwu Wang
  • Pengkun Wang
  • Zhengyang Zhou
  • Zhe Zhao
  • Yang Wang

Air quality prediction plays a crucial role in the development of smart cities, garnering significant attention from both academia and industry. Current air quality prediction models have two major limitations. First, their high computational complexity prevents them from scaling to nationwide datasets. Second, they often treat weather covariates as optional auxiliary information. In reality, weather covariates can have a substantial impact on air quality indices (AQI), exhibiting a significant causal association. In this paper, we first present a nationwide air quality dataset to address the lack of open-source, large-scale datasets in this field. We then propose a causal learning model, CauAir, for air quality prediction. CauAir harnesses the powerful representation capabilities of the Transformer to explicitly model the causal association between weather covariates and AQI. To address the high complexity of traditional Transformers, we design CachLormer, which features two key innovations. The first is a simplified architecture with redundant components removed. The second is a cache-attention mechanism that uses learnable embeddings to perceive the causal association between AQI and weather covariates from a coarse-grained perspective. We use information theory to illustrate the superiority of the proposed model. Finally, experimental results on three datasets against 28 baselines demonstrate that our model achieves competitive performance while maintaining high training efficiency and low memory consumption. The source code is available in the official CauAir repository.

IJCAI Conference 2025 Conference Paper

Consensus-Guided Incomplete Multi-view Clustering via Cross-view Affinities Learning

  • Qian Liu
  • Huibing Wang
  • Jinjia Peng
  • Yawei Chen
  • Mingze Yao
  • Xianping Fu
  • Yang Wang

Incomplete multi-view clustering (IMC) has garnered substantial attention due to its capacity to handle unlabeled data. Existing methods predominantly explore pairwise consistency between every two views. However, such consistency is highly susceptible to missing samples and outliers within a given view and thus deviates from the true clustering distribution. Moreover, dual-view interaction neglects the collaborative effects of multiple views, making it challenging to capture holistic characteristics across views. In response to these issues, we propose a novel method, Consensus-Guided Incomplete Multi-view Clustering via Cross-view Affinities Learning (CAL). Specifically, CAL reconstructs views with available instances to mine sample-wise affinities and harness comprehensive content information within views. Next, to extract clean structural information, CAL imposes a structured sparse constraint on the representation tensor to eliminate biased errors. Furthermore, CAL integrates the consensus representation into a representation tensor. This allows CAL to exploit high-order interaction among multiple views to depict the semantic correlation between views while learning a unified structural graph across all views. Extensive experiments on seven benchmark datasets demonstrate that CAL outperforms several state-of-the-art methods in clustering performance. The code is available at https://github.com/whbdmu/CAL.

ICML Conference 2025 Conference Paper

Counterfactual Contrastive Learning with Normalizing Flows for Robust Treatment Effect Estimation

  • Jiaxuan Zhang
  • Emadeldeen Eldele
  • Fuyuan Cao
  • Yang Wang
  • Xiaoli Li 0001
  • Jiye Liang

Estimating Individual Treatment Effects (ITE) from observational data is challenging due to covariate shift and the absence of counterfactuals. Existing methods attempt to balance distributions globally, but they often lack fine-grained sample-level alignment, especially in scenarios with significant individual heterogeneity. To address these issues, we reconsider counterfactuals as a proxy to emulate balanced randomization. Furthermore, we derive a theoretical bound that links the expected ITE estimation error to both factual prediction errors and representation distances between factuals and counterfactuals. Building on this theoretical foundation, we propose FCCL, a novel method that captures the nuances of potential outcomes under different treatments in two ways. (i) It generates diffeomorphic counterfactuals that adhere to the data manifold while remaining semantically similar to their factual counterparts. (ii) It mitigates distribution shift via sample-level alignment grounded in our derived generalization-error bound, which considers factual-counterfactual similarity and category consistency. Extensive evaluations on benchmark datasets demonstrate that FCCL outperforms 13 state-of-the-art methods, particularly in capturing individual-level heterogeneity and handling sparse boundary samples.

NeurIPS Conference 2025 Conference Paper

Deciphering the Extremes: A Novel Approach for Pathological Long-tailed Recognition in Scientific Discovery

  • Zhe Zhao
  • Haibin Wen
  • Xianfu Liu
  • Rui Mao
  • Pengkun Wang
  • Liheng Yu
  • Linjiang Chen
  • Bo An

Scientific discovery across diverse fields increasingly grapples with datasets exhibiting pathological long-tailed distributions, in which a few common phenomena overshadow a multitude of rare yet scientifically critical instances. Unlike standard benchmarks, these scientific datasets often combine extreme imbalance with a modest number of classes and a limited overall sample volume, rendering existing long-tailed recognition (LTR) techniques ineffective. Such methods are either biased toward majority classes or prone to overfitting on scarce tail data. As a result, they frequently fail to identify the very instances that drive scientific breakthroughs, such as novel materials, rare disease biomarkers, and faint astronomical signals. This paper introduces a novel, end-to-end framework explicitly designed to address pathological long-tailed recognition in scientific contexts. Our approach synergizes a Balanced Supervised Contrastive Learning (B-SCL) mechanism, which enhances the representation of tail classes by dynamically re-weighting their contributions, with a Smooth Objective Regularization (SOR) strategy that manages the inherent tension between tail-class focus and overall classification performance. We introduce and analyze the real-world ZincFluor chemical dataset ($\mathcal{T}=137.54$) and synthetic benchmarks with controllable extreme imbalances (CIFAR-LT variants). Extensive evaluations demonstrate our method's superior ability to decipher these extremes. Notably, on ZincFluor, our approach achieves a Tail Top-2 accuracy of $66.84\%$, significantly outperforming existing techniques. On CIFAR-10-LT with an imbalance ratio of $1000$ ($\mathcal{T}=100$), our method achieves a tail-class accuracy of $38.99\%$, substantially ahead of the next-best method. These results underscore our framework's potential to unlock novel insights from complex, imbalanced scientific datasets, thereby accelerating discovery.

IJCAI Conference 2025 Conference Paper

Directing Mamba to Complex Textures: An Efficient Texture-Aware State Space Model for Image Restoration

  • Long Peng
  • Xin Di
  • ZhanFeng Feng
  • Wenbo Li
  • Renjing Pei
  • Yang Wang
  • Xueyang Fu
  • Yang Cao

Image restoration aims to recover details and enhance contrast in degraded images. With the growing demand for high-quality imaging (e.g., 4K and 8K), achieving a balance between restoration quality and computational efficiency has become increasingly critical. Existing methods, primarily based on CNNs, Transformers, or their hybrid approaches, apply uniform deep representation extraction across the image. However, these methods often struggle to effectively model long-range dependencies and largely overlook the spatial characteristics of image degradation (regions with richer textures tend to suffer more severe damage), making it hard to achieve the best trade-off between restoration quality and efficiency. To address these issues, we propose a novel texture-aware image restoration method, TAMambaIR, which simultaneously perceives image textures and achieves a trade-off between performance and efficiency. Specifically, we introduce a novel Texture-Aware State Space Model, which enhances texture awareness and improves efficiency by modulating the transition matrix of the state-space equation and focusing on regions with complex textures. Additionally, we design a Multi-Directional Perception Block to improve multi-directional receptive fields while maintaining low computational overhead. Extensive experiments on benchmarks for image super-resolution, deraining, and low-light image enhancement demonstrate that TAMambaIR achieves state-of-the-art performance with significantly improved efficiency, establishing it as a robust and efficient framework for image restoration.

IJCAI Conference 2025 Conference Paper

DO-CoLM: Dynamic 3D Conformation Relationships Capture with Self-Adaptive Ordering Molecular Relational Modeling in Language Models

  • Zhuo Chen
  • Jiahui Zhang
  • Sihan Wang
  • Hongxin Xiang
  • Jianmin Wang
  • Wenjie Du
  • Yang Wang

Molecular Relational Learning (MRL) aims to understand interactions between molecular pairs, playing a critical role in advancing biochemical research. Recently, Large Language Models (LLMs), with their extensive knowledge bases and advanced reasoning capabilities, have emerged as powerful tools for MRL. However, existing LLMs, which primarily rely on SMILES strings and molecular graphs, face two major challenges. First, they struggle to capture molecular stereochemistry and dynamics. Molecules possess multiple 3D conformations with varying reactivity and dynamic transformation relationships. These are essential for accurately predicting molecular interactions, but 1D SMILES strings or 2D molecular graphs cannot effectively represent them. Second, these models do not consider the autoregressive nature of LLMs, overlooking the impact of input order on model performance. To address these issues, we propose DO-CoLM: a Dynamic relationship capture and self-adaptive Ordering 3D molecular Conformation LM for MRL. DO-CoLM introduces modules that dynamically model intra-molecular and inter-molecular conformational relationships and adaptively adjust the order of molecular modality inputs. Experimental results on 12 cross-domain datasets demonstrate its superior performance.

AAAI Conference 2025 Conference Paper

Drawing Informative Gradients from Sources: A One-stage Transfer Learning Framework for Cross-city Spatiotemporal Forecasting

  • Yudong Zhang
  • Xu Wang
  • Xuan Yu
  • Zhaoyang Sun
  • Kai Wang
  • Yang Wang

Spatiotemporal forecasting (STF) is pivotal in urban computing, yet data scarcity in developing cities hampers robust model training. Addressing this, recent studies leverage transfer learning to migrate knowledge from data-rich (source) to data-poor (target) cities. This strategy, while effective, faces challenges as pre-trained models risk absorbing noise and harmful information due to data distribution disparities, potentially undermining the accuracy of forecasts for target cities. To address this issue, we propose a one-stage STF framework named Target-Skewed Joint Training (TSJT). Central to TSJT is a novel Target-Skewed Backward training strategy that selectively refines gradients from source city data, preserving only the elements that positively impact the target city. To further enhance the quality of these gradients, we have designed a Node Prompting Module (NPM). TSJT is crafted for seamless integration with existing STF models, endowing them with the capability to efficiently tackle challenges stemming from data scarcity. Experimental results on several real-world datasets from multiple cities substantiate the efficacy of TSJT in the realm of cross-city transfer learning.
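The abstract does not spell out how TSJT "refines" source-city gradients. One plausible reading, sketched below purely as an assumption, is an element-wise sign-agreement filter. Under this filter, a source-city gradient component is kept only when it points the same way as the target-city gradient, so a small step along it does not increase the target loss to first order. The function names are hypothetical and this is not the TSJT implementation.

```python
from typing import List, Sequence


def target_skewed_filter(source_grad: Sequence[float],
                         target_grad: Sequence[float]) -> List[float]:
    """Hypothetical sketch: zero out source-gradient entries whose sign disagrees
    with the target-city gradient (an assumed rule, not the paper's exact one)."""
    if len(source_grad) != len(target_grad):
        raise ValueError("gradients must have the same length")
    # s * t > 0 means both gradients push this parameter in the same direction.
    return [s if s * t > 0 else 0.0 for s, t in zip(source_grad, target_grad)]


def joint_update(params: Sequence[float], source_grad: Sequence[float],
                 target_grad: Sequence[float], lr: float = 0.1) -> List[float]:
    """One SGD step using the target gradient plus the filtered source gradient."""
    kept = target_skewed_filter(source_grad, target_grad)
    return [p - lr * (t + s) for p, t, s in zip(params, target_grad, kept)]


if __name__ == "__main__":
    print(target_skewed_filter([1.0, -2.0, 3.0], [0.5, 1.0, -1.0]))  # [1.0, 0.0, 0.0]
```

A hard zero mask is the simplest choice. Softer variants, such as projecting out only the conflicting component, are equally consistent with the abstract's wording.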

NeurIPS Conference 2025 Conference Paper

Dynamic and Chemical Constraints to Enhance the Molecular Masked Graph Autoencoders

  • Jiahui Zhang
  • Wenjie Du
  • Yang Wang

Masked Graph Autoencoders (MGAEs) have gained significant attention recently. Their proxy tasks typically involve random corruption of input graphs followed by reconstruction. However, two main issues arise in the molecular domain. First, a predetermined mask ratio and fixed reconstruction objectives can yield tasks that are overly simple or overly complex, leading to suboptimal performance or negative transfer. Second, these tasks may deviate from chemical priors. To tackle these challenges, we propose Dynamic and Chemical Constraints (DyCC) for MGAEs. DyCC includes a masking strategy called GIBMS, which preserves essential semantic information during graph masking while adaptively adjusting the mask ratio and content for each molecule. Additionally, we introduce a Soft Label Generator (SLG) that reconstructs masked tokens as learnable prototypes (soft labels) rather than hard labels. These components adhere to chemical constraints and allow proxy tasks to vary dynamically during training. We integrate the model-agnostic DyCC into various MGAEs and conduct comprehensive experiments, demonstrating significant performance improvements. Our code is available at https://github.com/forever-ly/DyCC.

EAAI Journal 2025 Journal Article

Energy performance prediction of centrifugal pumps based on adaptive support vector regression

  • Huican Luo
  • Peijian Zhou
  • Jiayi Cui
  • Yang Wang
  • Haisheng Zheng
  • Yantian Wang

Energy performance prediction methods can significantly speed up the development and optimization of pumps. Machine learning is widely used to predict the performance of centrifugal pumps because its predictions are fast and accurate. However, the performance of prediction models varies markedly across different geometry and performance parameters. This paper proposes an adaptive support vector regression (SVR) model for predicting centrifugal pump energy performance. The model incorporates input-output correlation analysis and differential evolution to automatically adjust the input parameter weights. The model's performance was validated against experimental data, yielding mean absolute residuals (MAR) of 0.174 for head, 0.113 for power, and 1.658 for efficiency. Additionally, the model achieved an R² of 0.995 and a mean square error (MSE) of 2.99. Under multiple operating conditions, adjusting the parameter vector allowed the adaptive SVR to reduce the mean absolute relative error (MARE) of head, power, and efficiency to 0.443%, 1.07%, and 6.63%, respectively. These are improvements of 79.6%, 86.2%, and 31.6% over the original SVR model. The proposed model also outperformed adaptive least squares support vector regression (LSSVR).
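The error metrics quoted above are standard. The sketch below gives their usual textbook definitions in plain Python so the reported figures are easier to interpret. The paper may normalize them differently (for example, whether MARE is reported as a fraction or as a percentage, as here), so treat these as illustrative definitions rather than the authors' evaluation code. The function names are hypothetical.

```python
from typing import Sequence


def mean_absolute_residual(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average |y - y_hat|, in the same units as the target (e.g. head)."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def mare_percent(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute relative error in percent; assumes no target equals zero."""
    return 100.0 * sum(abs(a - b) / abs(a) for a, b in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean square error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Percentage reduction of an error metric relative to a baseline model."""
    return 100.0 * (baseline_error - new_error) / baseline_error
```

Under this reading, an "improvement of 79.6%" in MARE means the new error is about one fifth of the baseline's. The baseline values themselves are not given in the abstract.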

NeurIPS Conference 2025 Conference Paper

Enhancing the Maximum Effective Window for Long-Term Time Series Forecasting

  • Jiahui Zhang
  • Zhengyang Zhou
  • Wenjie Du
  • Yang Wang

Long-term time series forecasting (LTSF) aims to predict future trends based on historical data. While longer lookback windows theoretically offer more comprehensive insights, Transformer-based models often struggle with them. On one hand, longer windows introduce more noise and redundancy, hindering the model's learning process. On the other hand, Transformers suffer from attention dispersion and are prone to overfitting to noise, especially when processing long sequences. In this paper, we introduce the Maximum Effective Window (MEW) metric to assess a model's ability to effectively utilize the lookback window. We also propose two model-agnostic modules to enhance MEW, enabling models to better leverage historical data for improved performance. Specifically, to reduce redundancy and noise, we introduce the Information Bottleneck Filter (IBF), which employs information bottleneck theory to extract the most essential subsequences from the input. Additionally, we propose the Hybrid-Transformer-Mamba (HTM), which incorporates the Mamba mechanism for selective forgetting of long sequences while harnessing the Transformer's strong modeling capabilities for shorter sequences. We integrate these two modules into various Transformer-based models, and experimental results show that they effectively enhance MEW, leading to improved overall performance. Our code is available at https://github.com/forever-ly/PIH.

NeurIPS Conference 2025 Conference Paper

Eyes Wide Open: Ego Proactive Video-LLM for Streaming Video

  • Xueyang Yu
  • Cheng Shi
  • Yang Wang
  • Sibei Yang

Envision an AI capable of functioning in human-like settings, moving beyond mere observation to actively understand, anticipate, and proactively respond to unfolding events. Towards this vision, we focus on the innovative task where, given ego-streaming video input, an assistant proactively answers diverse, evolving questions at the opportune moment, while maintaining synchronized perception and reasoning. This task embodies three key properties: (1) Proactive Coherence, (2) Just-in-Time Responsiveness, and (3) Synchronized Efficiency. To evaluate and address these properties, we first introduce ESTP-Bench (Ego Streaming Proactive Benchmark) alongside the ESTP-F1 metric—a novel framework designed for their rigorous assessment. Secondly, we propose a comprehensive technical pipeline to enable models to tackle this challenging task. This pipeline comprises: (1) a data engine, (2) a multi-stage training strategy, and (3) a proactive dynamic compression technique. Our proposed model effectively addresses these critical properties while achieving state-of-the-art (SOTA) performance on the standard COIN benchmark.

AAAI Conference 2025 Conference Paper

Formal Synthesis of Barrier Certificates Using Fourier Kolmogorov-Arnold Network

  • Xiongqi Zhang
  • Junwei Xu
  • Yang Wang
  • Dongming Xiang
  • Wang Lin
  • Zuohua Ding

Barrier certificate generation is an efficient and powerful technique for formally verifying safety properties of cyber-physical systems. Feed-forward neural networks (FNNs) are commonly used to synthesize barrier certificates, but their fixed activation functions limit efficiency and scalability. In this paper, we propose a novel method for generating barrier certificates using Fourier Kolmogorov-Arnold Networks (KANs). Specifically, the method uses Fourier KANs instead of FNNs as the template for barrier certificates. A Fourier KAN has learnable activation functions built from trigonometric basis functions. This improves representational power and makes the network easy to train as a neural barrier certificate. The method then formally verifies the validity of candidate Fourier KAN barrier certificates using both the Lipschitz method and Satisfiability Modulo Theories (SMT), improving the efficiency and success rate of verification. We implement the tool KAN4BC and evaluate its performance on a set of benchmarks. The experimental results demonstrate the effectiveness and efficiency of our method.
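The sketch below makes the Fourier KAN template concrete by evaluating a single layer. Every input-output edge carries its own learnable univariate function, written as a truncated Fourier series φ(x) = Σ_k [a_k cos(kx) + b_k sin(kx)], and each output sums its incoming edge functions. The coefficient layout and function names are assumptions for illustration. Training, the barrier-certificate loss, and the Lipschitz/SMT verification steps are not shown.

```python
import math
from typing import List, Sequence, Tuple

# Assumed layout: coeffs[j][i] = (cos_coeffs, sin_coeffs) for the edge from input i
# to output j; entry k of each list multiplies frequency k + 1.
EdgeCoeffs = Tuple[Sequence[float], Sequence[float]]


def fourier_edge(x: float, cos_coeffs: Sequence[float], sin_coeffs: Sequence[float]) -> float:
    """Learnable 1-D activation: a truncated Fourier series evaluated at x."""
    return sum(a * math.cos((k + 1) * x) + b * math.sin((k + 1) * x)
               for k, (a, b) in enumerate(zip(cos_coeffs, sin_coeffs)))


def fourier_kan_layer(x: Sequence[float],
                      coeffs: Sequence[Sequence[EdgeCoeffs]]) -> List[float]:
    """Kolmogorov-Arnold style layer: y_j = sum_i phi_ij(x_i)."""
    return [sum(fourier_edge(xi, *coeffs[j][i]) for i, xi in enumerate(x))
            for j in range(len(coeffs))]


if __name__ == "__main__":
    # One output, two inputs: phi_00(x) = cos(x), phi_01(x) = 2 sin(x).
    layer = [[([1.0], [0.0]), ([0.0], [2.0])]]
    print(fourier_kan_layer([0.0, math.pi / 2], layer))
```

The trigonometric basis makes each edge function smooth with easily bounded derivatives. This is one reason such templates can suit Lipschitz-based verification, although the paper's exact argument may differ.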

NeurIPS Conference 2025 Conference Paper

GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling

  • Tianhao Chen
  • Xin Xu
  • Zijing Liu
  • Pengxiang Li
  • Xinyuan Song
  • AJAY JAISWAL
  • Fan Zhang
  • Jishan Hu

Modern Large Language Models, such as the LLaMA, Qwen and DeepSeek series, predominantly adopt the Pre-LayerNorm (Pre-LN) Transformer architecture. Pre-LN is stable during pretraining and scales to large model sizes, but it suffers from exponential growth in activation variance across layers. As a result, the shortcut dominates the sub-layer outputs in the residual connection, limiting the learning capacity of deeper layers. To mitigate this issue, we propose Gradient-Preserving Activation Scaling (GPAS), a simple technique that can be used in combination with existing approaches. GPAS scales down the intermediate activations while keeping their gradients unchanged. This leaves the information in the activations intact and avoids the gradient vanishing problem associated with gradient downscaling. Extensive experiments across model sizes from 71M to 1B show that GPAS achieves consistent performance gains. Beyond enhancing Pre-LN Transformers, GPAS also shows promise in improving alternative architectures such as Sandwich-LN and DeepNorm. This demonstrates its versatility and its potential for improving training dynamics in a wide range of settings. Our code is available at https://github.com/dandingsky/GPAS.
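The core idea, scaling the forward activation while leaving the backward signal untouched, is commonly expressed with a stop-gradient: y = x − sg((1 − s)·x). Its value is s·x, but its derivative with respect to x is 1. The framework-free sketch below writes out the forward and backward rules explicitly. The fixed scale `s` and the class name are assumptions. GPAS itself uses its own (possibly learnable) scaling, which is not reproduced here.

```python
from typing import List, Sequence


class GradientPreservingScale:
    """Hypothetical sketch of gradient-preserving scaling with a fixed factor.

    Forward:  y = scale * x        (activations shrink)
    Backward: dL/dx = dL/dy        (gradient passes through unscaled)
    This mirrors y = x - stop_gradient((1 - scale) * x).
    """

    def __init__(self, scale: float):
        if not 0.0 < scale <= 1.0:
            raise ValueError("scale must be in (0, 1]")
        self.scale = scale

    def forward(self, x: Sequence[float]) -> List[float]:
        return [self.scale * v for v in x]

    def backward(self, grad_out: Sequence[float]) -> List[float]:
        # The stop-gradient term has zero derivative, so the Jacobian is the identity.
        return list(grad_out)


def plain_scale_backward(scale: float, grad_out: Sequence[float]) -> List[float]:
    """For contrast: ordinary scaling y = scale * x also shrinks the gradient."""
    return [scale * g for g in grad_out]


if __name__ == "__main__":
    layer = GradientPreservingScale(0.5)
    print(layer.forward([2.0, -4.0]))              # [1.0, -2.0]
    print(layer.backward([1.0, 1.0]))              # [1.0, 1.0]
    print(plain_scale_backward(0.5, [1.0, 1.0]))   # [0.5, 0.5]
```

Stacking many ordinary scalings multiplies the gradient by `scale` once per layer. This is the gradient-vanishing effect the abstract attributes to gradient downscaling, and the gradient-preserving version avoids it.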

NeurIPS Conference 2025 Conference Paper

Integrating Drug Substructures and Longitudinal Electronic Health Records for Personalized Drug Recommendation

  • Wenjie Du
  • Xuqiang Li
  • Jinke Feng
  • Shuai Zhang
  • Wen Zhang
  • Yang Wang

Drug recommendation systems aim to identify optimal drug combinations for patient care, balancing therapeutic efficacy and safety. Advances in large-scale longitudinal EHRs have enabled learning-based approaches that leverage patient histories, such as diagnoses, procedures, and previously prescribed drugs, to model complex patient-drug relationships. Yet many existing solutions overlook standard clinical practices that favor certain drugs for specific conditions. They also fail to fully integrate the influence of molecular substructures on drug efficacy and safety. In response, we propose SubRec, a unified framework that integrates representation learning across both patient and drug spaces. Specifically, SubRec introduces a conditional information bottleneck to extract the core drug substructures most relevant to patient conditions, thereby enhancing interpretability and clinical alignment. Meanwhile, an adaptive vector quantization mechanism encodes patient–drug interaction patterns into a condition-aware codebook. The codebook reuses clinically meaningful patterns, reduces training overhead, and provides a controllable latent space for recommendation. Crucially, the synergy between condition-specific substructure learning and discrete patient prototypes allows SubRec to make accurate and personalized drug recommendations. Experimental results on the real-world MIMIC-III and MIMIC-IV datasets demonstrate our model's advantages. The source code is available at https://anonymous.4open.science/r/DrugRecommendation-5173.

EAAI Journal 2025 Journal Article

Intra- and inter-instance Location Correlation Network for human–object interaction detection

  • Minglang Lu
  • Guanci Yang
  • Yang Wang
  • Kexin Luo

Objective: Human–object interaction detection aims to detect human–object pairs and identify their interactions. This is of great significance for improving the perception and decision-making abilities of embodied artificial intelligence robots such as service robots. From a robot's viewpoint in complex indoor scenes, imbalanced instance sizes and crowded instances make human–object detection and matching particularly difficult. Novelty: Focusing on the new task of robotic human–object interaction detection in complex indoor scenes, we propose an intra- and inter-instance location correlation network (ILCN) for human–object interaction detection. First, to improve the detection and matching of complex combinations, we put forward a parallel human–object spatial association (PSA) method. PSA links humans and objects in parallel by concatenating their instance features. It then exploits spatial prior knowledge to strengthen the features' focus on instances layer by layer. Next, we design an interaction network block guided by the instance spatial layout relationship (SLG). SLG extracts spatial layout relations from the inner human–object locations and fuses them with interaction queries through self-attention. In cross-attention, a skip-connection mechanism splits the fused features to support interaction classification. Findings: Comparison experiments with state-of-the-art methods on three public benchmark datasets show that ILCN achieves the best overall performance on the dataset with complex combinations.

TMLR Journal 2025 Journal Article

LanPaint: Training-Free Diffusion Inpainting with Asymptotically Exact and Fast Conditional Sampling

  • Candi Zheng
  • Yuan Lan
  • Yang Wang

Diffusion models excel at joint pixel sampling for image generation but lack efficient training-free methods for partial conditional sampling (e.g., inpainting with known pixels). Prior works typically formulate this as an intractable inverse problem, relying on coarse variational approximations, heuristic losses requiring expensive backpropagation, or slow stochastic sampling. These limitations preclude (1) accurate distributional matching in inpainting results, (2) efficient inference modes without gradients, and (3) compatibility with fast ODE-based samplers. To address these limitations, we propose LanPaint: a training-free, asymptotically exact partial conditional sampling method for ODE-based and rectified-flow diffusion models. By leveraging carefully designed Langevin dynamics, LanPaint enables fast, backpropagation-free Monte Carlo sampling. Experiments demonstrate that our approach achieves superior performance with precise partial conditioning and visually coherent inpainting across diverse tasks. Code is available on https://github.com/scraed/LanPaint.
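The abstract does not describe LanPaint's specific Langevin scheme. As a generic illustration of the underlying idea only, the sketch below runs unadjusted Langevin dynamics, b ← b + ε·∂_b log p(a, b) + √(2ε)·z, on one unknown "pixel" b while a known pixel a stays clamped. A correlated bivariate Gaussian with an analytic score stands in for a diffusion model's learned score. Its exact conditional, b | a ~ N(ρa, 1 − ρ²), lets the chain's statistics be checked. The Gaussian target, step size, and names are all assumptions.

```python
import math
import random
from typing import Tuple


def joint_score_b(a: float, b: float, rho: float) -> float:
    """d/db log p(a, b) for a standard bivariate Gaussian with correlation rho.

    Stands in for a learned score model; the exact conditional is
    b | a ~ N(rho * a, 1 - rho**2).
    """
    return -(b - rho * a) / (1.0 - rho ** 2)


def langevin_inpaint_pixel(a_known: float, rho: float, step: float = 0.01,
                           n_steps: int = 50000, burn_in: int = 5000,
                           seed: int = 0) -> Tuple[float, float]:
    """Sample the unknown pixel b with the known pixel a clamped (never updated).

    Returns the post-burn-in mean and variance of the chain.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    rng = random.Random(seed)
    b = 0.0
    total, total_sq, count = 0.0, 0.0, 0
    for t in range(n_steps):
        # Unadjusted Langevin step: drift along the score plus Gaussian noise.
        b += step * joint_score_b(a_known, b, rho) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        if t >= burn_in:
            total += b
            total_sq += b * b
            count += 1
    mean = total / count
    return mean, total_sq / count - mean ** 2


if __name__ == "__main__":
    # Expect roughly (1.6, 0.36): the conditional mean rho * a and variance 1 - rho**2.
    print(langevin_inpaint_pixel(a_known=2.0, rho=0.8))
```

Because the known pixel only enters through the score, no backpropagation through a model is needed here. Note, however, that this plain scheme carries a small step-size bias and mixes slowly, which are the kinds of issues a carefully designed sampler aims to address.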

NeurIPS Conference 2025 Conference Paper

Less but More: Linear Adaptive Graph Learning Empowering Spatiotemporal Forecasting

  • Jiaming Ma
  • Binwu Wang
  • Guanjun Wang
  • Kuo Yang
  • Zhengyang Zhou
  • Pengkun Wang
  • Xu Wang
  • Yang Wang

The effectiveness of Spatiotemporal Graph Neural Networks (STGNNs) critically hinges on the quality of the underlying graph topology. While end-to-end adaptive graph learning methods have demonstrated promising results in capturing latent spatiotemporal dependencies, they often suffer from high computational complexity and limited expressive capacity. In this paper, we propose MAGE for efficient spatiotemporal forecasting. We first conduct a theoretical analysis demonstrating that the ReLU activation function employed in existing methods amplifies edge-level noise during graph topology learning, thereby compromising the fidelity of the learned graph structures. To enhance model expressiveness, we introduce a sparse yet balanced mixture-of-experts strategy, where each expert perceives the unique underlying graph through kernel-based functions and operates with linear complexity relative to the number of nodes. The sparsity mechanism ensures that each node interacts exclusively with compatible experts, while the balancing mechanism promotes uniform activation across all experts, enabling diverse and adaptive graph representations. Furthermore, we theoretically establish that a single graph convolution using the learned graph in MAGE is mathematically equivalent to multiple convolutional steps under conventional graphs. We evaluate MAGE against advanced baselines on multiple real-world spatiotemporal datasets. MAGE achieves competitive performance while maintaining strong computational efficiency.
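The abstract does not give MAGE's routing equations. As a generic illustration of "sparse yet balanced" mixture-of-experts routing, the sketch below uses standard top-k gating, so each node is sent only to its k highest-scoring experts. It adds a Switch-Transformer-style auxiliary loss that is smallest when expert usage is uniform. MAGE's kernel-based experts and its actual balancing mechanism are not reproduced, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    z = sum(exps)
    return [e / z for e in exps]


def top_k_route(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Sparse routing for one node: keep the k most probable experts, renormalize."""
    probs = softmax(logits)
    chosen = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:k]
    mass = sum(probs[e] for e in chosen)
    return [(e, probs[e] / mass) for e in chosen]


def load_balance_loss(all_logits: Sequence[Sequence[float]], k: int) -> float:
    """Auxiliary balance loss: n_experts * sum_e f_e * P_e.

    f_e is the fraction of routing slots assigned to expert e and P_e the mean
    router probability of expert e; the value is 1.0 when routing is balanced.
    """
    n_nodes, n_experts = len(all_logits), len(all_logits[0])
    counts = [0] * n_experts
    mean_probs = [0.0] * n_experts
    for logits in all_logits:
        for e, _ in top_k_route(logits, k):
            counts[e] += 1
        for e, p in enumerate(softmax(logits)):
            mean_probs[e] += p / n_nodes
    fractions = [c / (n_nodes * k) for c in counts]
    return n_experts * sum(f * p for f, p in zip(fractions, mean_probs))
```

Routing two nodes to different experts gives a loss of 1.0, while sending both to the same expert gives a larger value. Adding this term to the training loss therefore nudges the router toward uniform expert activation.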

NeurIPS Conference 2025 Conference Paper

Many Minds, One Goal: Time Series Forecasting via Sub-task Specialization and Inter-agent Cooperation

  • Qihe Huang
  • Zhengyang Zhou
  • Yangze Li
  • Kuo Yang
  • Binwu Wang
  • Yang Wang

Time series forecasting is a critical and complex task, characterized by diverse temporal patterns, varying statistical properties, and different prediction horizons across datasets and domains. Conventional approaches typically rely on a single, unified model architecture to handle all forecasting scenarios. However, such monolithic models struggle to generalize across dynamically evolving time series with shifting patterns. In reality, different types of time series may require distinct modeling strategies. Some benefit from homogeneous multi-scale forecasting awareness, while others rely on more complex and heterogeneous signal perception. Relying on a single model to capture all temporal diversity and structural variations leads to limited performance and poor interpretability. To address this challenge, we propose a Multi-Agent Forecasting System (MAFS) that abandons the one-size-fits-all paradigm. MAFS decomposes the forecasting task into multiple sub-tasks, each handled by a dedicated agent trained on specific temporal perspectives (e.g., different forecasting resolutions or signal characteristics). Furthermore, to achieve holistic forecasting, agents share and refine information through different communication topologies, enabling cooperative reasoning across different temporal views. A lightweight voting aggregator then integrates their outputs into consistent final predictions. Extensive experiments across 11 benchmarks demonstrate that MAFS significantly outperforms traditional single-model approaches, yielding more robust and adaptable forecasts.

NeurIPS Conference 2025 Conference Paper

ModuLM: Enabling Modular and Multimodal Molecular Relational Learning with Large Language Models

  • Zhuo Chen
  • Yizhen Zheng
  • Huan Yee Koh
  • Hongxin Xiang
  • Linjiang Chen
  • Wenjie Du
  • Yang Wang

Molecular Relational Learning (MRL) aims to understand interactions between molecular pairs, playing a critical role in advancing biochemical research. With the recent development of large language models (LLMs), a growing number of studies have explored the integration of MRL with LLMs and achieved promising results. However, the increasing availability of diverse LLMs and molecular structure encoders has significantly expanded the model space, presenting major challenges for benchmarking. Currently, there is no LLM framework that supports both flexible molecular input formats and dynamic architectural switching. To address these challenges, reduce redundant coding, and ensure fair model comparison, we propose ModuLM, a framework designed to support flexible LLM-based model construction and diverse molecular representations. ModuLM provides a rich suite of modular components, including 8 types of 2D molecular graph encoders, 11 types of 3D molecular conformation encoders, 7 types of interaction layers, and 7 mainstream LLM backbones. Owing to its highly flexible model assembly mechanism, ModuLM enables the dynamic construction of over 50,000 distinct model configurations. In addition, we provide comprehensive benchmark results to demonstrate the effectiveness of ModuLM in supporting LLM-based MRL tasks.

NeurIPS Conference 2025 Conference Paper

MoFo: Empowering Long-term Time Series Forecasting with Periodic Pattern Modeling

  • Jiaming Ma
  • Binwu Wang
  • Qihe Huang
  • Guanjun Wang
  • Pengkun Wang
  • Zhengyang Zhou
  • Yang Wang

The stable periodic patterns present in the time series data serve as the foundation for long-term forecasting. However, existing models suffer from limitations such as continuous and chaotic input partitioning, as well as weak inductive biases, which restrict their ability to capture such recurring structures. In this paper, we propose MoFo, which interprets periodicity as both the correlation of period-aligned time steps and the trend of period-offset time steps. We first design period-structured patches—2D tensors generated through discrete sampling—where each row contains only period-aligned time steps, enabling direct modeling of periodic correlations. Period-offset time steps within a period are aligned in columns. To capture trends across these offset time steps, we introduce a period-aware modulator. This modulator introduces an adaptive strong inductive bias through a regulated relaxation function, encouraging the model to generate attention coefficients that align with periodic trends. This function is end-to-end trainable, enabling the model to adaptively capture the distinct periodic patterns across diverse datasets. Extensive empirical results on widely used benchmark datasets demonstrate that MoFo achieves competitive performance while maintaining high memory efficiency and fast training speed.
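The sketch below shows the period-structured patching described above. It assumes the period is known in advance (for example, 24 for hourly data with a daily cycle) and that leading samples not filling a whole period are trimmed. Each row then holds period-aligned time steps, and each column holds the consecutive, period-offset steps of one period. The function name is hypothetical, and the period-aware modulator is not reproduced.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def period_structured_patch(series: Sequence[T], period: int) -> List[List[T]]:
    """Arrange a 1-D series into a (period x n_periods) grid.

    Row r holds the period-aligned steps r, r + period, r + 2 * period, ...
    Column c holds the consecutive (period-offset) steps of the c-th period.
    Leading samples that do not fill a whole period are dropped (an assumption).
    """
    if period <= 0:
        raise ValueError("period must be positive")
    n_periods = len(series) // period
    if n_periods == 0:
        raise ValueError("series is shorter than one period")
    start = len(series) - n_periods * period
    window = list(series[start:])
    return [[window[c * period + r] for c in range(n_periods)] for r in range(period)]


if __name__ == "__main__":
    for row in period_structured_patch(list(range(9)), period=4):
        print(row)
    # [1, 5]
    # [2, 6]
    # [3, 7]
    # [4, 8]
```

Attention or mixing along a row compares the same phase across periods, while operating along a column follows the trend within a period. This is the split the abstract describes between correlation and trend modeling.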

IJCAI Conference 2025 Conference Paper

MTGIB-UNet: A Multi-Task Graph Information Bottleneck and Uncertainty Weighted Network for ADMET Prediction

  • Xuqiang Li
  • Wenjie Du
  • Jun Xia
  • Jianmin Wang
  • Xiaoqi Wang
  • Yang Yang
  • Yang Wang

Accurate prediction of ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) properties is crucial in drug development, as these properties directly impact a drug's efficacy and safety. However, existing multi-task learning models often face challenges related to noise interference and task conflicts when dealing with complex molecular structures. To address these issues, we propose a novel multi-task Graph Neural Network (GNN) model, MTGIB-UNet. The model begins by encoding molecular graphs to capture intricate molecular structure information. Subsequently, based on the Graph Information Bottleneck (GIB) principle, the model compresses the information flow by extracting subgraphs, retaining task-relevant features while removing noise for each task. These embeddings are then fused through a gated network that dynamically adjusts the contribution weights of auxiliary tasks to the primary task. Specifically, an uncertainty weighting (UW) strategy is applied, with additional emphasis placed on the primary task, allowing dynamic adjustment of task weights while strengthening the influence of the primary task on model training. Experiments on standard ADMET datasets demonstrate that our model outperforms existing methods. Additionally, the model shows good interpretability by identifying key molecular substructures related to specific ADMET endpoints.
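Uncertainty weighting is a known multi-task technique; the abstract does not give MTGIB-UNet's exact formula, so the sketch below shows a common simplified form of homoscedastic uncertainty weighting, in which each task loss is scaled by a learned precision exp(-s) and regularized by s. The `primary_boost` multiplier is a hypothetical stand-in for the "additional emphasis placed on the primary task", and the function name is illustrative only.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars, primary=0, primary_boost=2.0):
    """Combine per-task losses with learned homoscedastic-uncertainty weights.

    log_vars[i] = s_i = log(sigma_i^2) would be a trainable parameter in practice.
    Each task contributes exp(-s_i) * L_i + s_i: a noisy task (large s_i) is
    down-weighted, while the + s_i term stops s_i from growing without bound.
    primary_boost is a hypothetical extra multiplier on the primary task's term.
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("need one log-variance per task")
    total = 0.0
    for i, (loss, s) in enumerate(zip(task_losses, log_vars)):
        weight = math.exp(-s)  # learned precision 1 / sigma_i^2
        if i == primary:
            weight *= primary_boost  # extra emphasis on the primary task
        total += weight * loss + s
    return total
```

For example, `uncertainty_weighted_loss([1.0, 2.0], [0.0, 0.0])` gives `4.0`: with zero log-variances both precisions are 1, and the primary loss is doubled by the default boost.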

AAAI Conference 2025 Conference Paper

Multi-label Self Knowledge Distillation

  • Xucong Wang
  • Pengkun Wang
  • Shurui Zhang
  • Miao Fang
  • Yang Wang

Self-Knowledge Distillation (SKD) leverages the student's own knowledge to create a virtual teacher for distillation when a pre-trained bulky teacher is not available. Whilst existing SKD approaches demonstrate impressive efficiency in single-label learning, applying them directly to multi-label learning leads to dramatic degradation due to an inherent imbalance: targets with unified labels but multifarious visual scales are crammed into one image, resulting in biased learning of major targets and an imbalance between precision and recall. To address this issue, this paper proposes a novel SKD method for multi-label learning named Multi-label Self-knowledge Distillation (MSKD), incorporating three Spatial Decoupling mechanisms (i.e., Locality-SD (L-SD), Reconstruction-SD (R-SD), and Step-SD (S-SD)). L-SD exploits relational dark knowledge from regional outputs to amplify the model's perception of visual details. R-SD reconstructs global semantics by integrating regional outputs from local patches and leverages these semantics to guide the model. S-SD aligns outputs of the same input at different steps, aiming to find a synthetical optimization direction and avoid overconfidence. In addition, MSKD combines these mechanisms with our tailored loss, named MBD, for balanced distillation. Exhaustive experiments demonstrate that MSKD not only outperforms previous approaches but also effectively mitigates biased learning and makes the model more robust.

AAAI Conference 2025 Conference Paper

Navigating Towards Fairness with Data Selection

  • Yixuan Zhang
  • Zhidong Li
  • Yang Wang
  • Fang Chen
  • Xuhui Fan
  • Feng Zhou

Machine learning algorithms often struggle to eliminate inherent data biases, particularly those arising from unreliable labels, which poses a significant challenge in ensuring fairness. Existing fairness techniques that address label bias typically involve modifying models and intervening in the training process, but these lack flexibility for large-scale datasets. To address this limitation, we introduce a data selection method designed to efficiently and flexibly mitigate label bias, tailored to more practical needs. Our approach utilizes a zero-shot predictor as a proxy model that simulates training on a clean holdout set. This strategy, supported by peer predictions, ensures the fairness of the proxy model and eliminates the need for an additional holdout set, which is a common requirement in previous methods. Without altering the classifier's architecture, our modality-agnostic method effectively selects appropriate training data and has proven efficient and effective in handling label bias and improving fairness across diverse datasets in experimental evaluations.

EAAI Journal 2025 Journal Article

On-board detection of rail corrugation using improved convolutional block attention mechanism

  • Yang Wang
  • Hong Xiao
  • Chaozhi Ma
  • Zhihai Zhang
  • Xuhao Cui
  • Aimin Xu

Leveraging acceleration sensors affixed to the train body enables continuous monitoring of rail corrugation, offering cost-effectiveness, operational efficiency, and portability. However, establishing the correlation between vertical body acceleration and rail corrugation poses a substantial challenge. To enable uninterrupted monitoring of rail corrugation, a train-track integrated simulation model accounting for the dynamics of flexible wheelsets and tracks was first constructed to generate a simulated dataset of vertical body acceleration. The conventional Convolutional Block Attention Module (CBAM) architecture was then improved, leading to a deep one-dimensional convolutional residual network named Train Body Vertical Acceleration Network (TBVA-Net), built on the improved CBAM framework. Training on the simulated dataset showed that the improved CBAM architecture reduces model complexity and total parameter count while notably increasing classification accuracy. TBVA-Net with the refined CBAM consistently achieved test accuracies above 95%, averaging 98.6% on the simulated dataset. Validation with field-measured data confirmed the soundness of the proposed TBVA-Net architecture. Fine-tuning with a limited subset of labeled field data yielded a transfer accuracy of 98.5%. This paper presents an innovative approach for detecting rail corrugation from vertical acceleration signals obtained from operational vehicles.

NeurIPS Conference 2025 Conference Paper

One Stone with Two Birds: A Null-Text-Null Frequency-Aware Diffusion Models for Text-Guided Image Inpainting

  • Haipeng Liu
  • Yang Wang
  • Meng Wang

Text-guided image inpainting aims to reconstruct masked regions according to text prompts. The longstanding challenges are preserving the unmasked regions while achieving semantic consistency between the unmasked and the inpainted masked regions. Previous methods fail to address both, typically remedying one at the expense of the other. As we observed, this stems from the entanglement of hybrid (e.g., mid- and low-) frequency bands that encode different image properties and exhibit different robustness to text prompts during the denoising process. In this paper, we propose null-text-null frequency-aware diffusion models, dubbed NTN-Diff, for text-guided image inpainting. NTN-Diff decomposes semantic consistency across masked and unmasked regions into per-frequency-band consistencies while preserving the unmasked regions, addressing both challenges at once. Based on the diffusion process, we further divide denoising into early (high-level noise) and late (low-level noise) stages, in which the mid- and low-frequency bands are disentangled. As observed, the stable mid-frequency band is progressively denoised into semantic alignment during the text-guided denoising process; it then serves as guidance for the null-text denoising process, which denoises the low-frequency band of the masked regions. A subsequent text-guided denoising process at the late stage achieves semantic consistency for the mid- and low-frequency bands across masked and unmasked regions while preserving the unmasked regions. Extensive experiments validate the superiority of NTN-Diff over state-of-the-art text-guided diffusion models. Our code can be accessed from https://github.com/htyjers/NTN-Diff.

IROS Conference 2025 Conference Paper

PC²P: Multi-Agent Path Finding via Personalized-Enhanced Communication and Crowd Perception

  • Guotao Li
  • Shaoyun Xu
  • Yuexing Hao
  • Yang Wang
  • Yuhui Sun

Distributed Multi-Agent Path Finding (MAPF) integrated with Multi-Agent Reinforcement Learning (MARL) has emerged as a prominent research focus, enabling real-time cooperative decision-making in partially observable environments through inter-agent communication. However, due to insufficient collaborative and perceptual capabilities, existing methods do not scale well across diverse environmental conditions. To address these challenges, we propose PC²P, a novel distributed MAPF method derived from a Q-learning-based MARL framework. First, we introduce a personalized-enhanced communication mechanism based on dynamic graph topology, which determines the core aspects of "who" and "what" in the interaction process through three stages: selection, generation, and aggregation. Concurrently, we incorporate local crowd perception to enrich agents' heuristic observations, strengthening the model's guidance toward effective actions by integrating static spatial constraints with dynamic occupancy changes. To resolve extreme deadlocks, we propose a region-based deadlock-breaking strategy that leverages expert guidance to coordinate agents efficiently within confined areas. Experimental results demonstrate that PC²P outperforms state-of-the-art distributed MAPF methods in varied environments. Ablation studies further confirm the contribution of each module to overall performance.

NeurIPS Conference 2025 Conference Paper

PMQ-VE: Progressive Multi-Frame Quantization for Video Enhancement

  • ZhanFeng Feng
  • Long Peng
  • Xin Di
  • Yong Guo
  • Wenbo Li
  • Yulun Zhang
  • Renjing Pei
  • Yang Wang

Multi-frame video enhancement tasks aim to improve the spatial and temporal resolution and quality of video sequences by leveraging temporal information from multiple frames, which are widely used in streaming video processing, surveillance, and generation. Although numerous Transformer-based enhancement methods have achieved impressive performance, their computational and memory demands hinder deployment on edge devices. Quantization offers a practical solution by reducing the bit-width of weights and activations to improve efficiency. However, directly applying existing quantization methods to video enhancement tasks often leads to significant performance degradation and loss of fine details. This stems from two limitations: (a) inability to allocate varying representational capacity across frames, which results in suboptimal dynamic range adaptation; (b) over-reliance on full-precision teachers, which limits the learning of low-bit student models. To tackle these challenges, we propose a novel quantization method for video enhancement: Progressive Multi-Frame Quantization for Video Enhancement (PMQ-VE). This framework features a coarse-to-fine two-stage process: Backtracking-based Multi-Frame Quantization (BMFQ) and Progressive Multi-Teacher Distillation (PMTD). BMFQ utilizes a percentile-based initialization and iterative search with pruning and backtracking for robust clipping bounds. PMTD employs a progressive distillation strategy with both full-precision and multiple high-bit (INT) teachers to enhance low-bit models' capacity and quality. Extensive experiments demonstrate that our method outperforms existing approaches, achieving state-of-the-art performance across multiple tasks and benchmarks. The code will be made publicly available.
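BMFQ starts from percentile-based clipping bounds before its search refines them. The sketch below shows what such an initialization and a plain uniform fake-quantizer look like in Python; it is a generic illustration, the 1st/99th percentile defaults and all function names are assumptions, and it omits the paper's iterative search with pruning and backtracking.

```python
def percentile(values, p):
    """Linear-interpolation percentile (p in [0, 100]) of a non-empty list."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def init_clip_bounds(values, lower_pct=1.0, upper_pct=99.0):
    """Percentile-based initial clipping bounds (default percentiles are assumptions)."""
    return percentile(values, lower_pct), percentile(values, upper_pct)


def fake_quantize(x, lo, hi, bits):
    """Clamp x to [lo, hi], round onto 2**bits uniform levels, then dequantize."""
    if hi <= lo:
        raise ValueError("upper bound must exceed lower bound")
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels
    q = round((min(max(x, lo), hi) - lo) / scale)  # integer code in [0, levels]
    return lo + q * scale
```

Clipping at percentiles rather than at the raw minimum and maximum keeps rare outliers from stretching the quantization grid, which is why it makes a reasonable starting point for a subsequent search over bounds.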

NeurIPS Conference 2025 Conference Paper

PointMAC: Meta-Learned Adaptation for Robust Test-Time Point Cloud Completion

  • Linlian Jiang
  • Rui Ma
  • Li Gu
  • Ziqiang Wang
  • Xinxin Zuo
  • Yang Wang

Point cloud completion is essential for robust 3D perception in safety-critical applications such as robotics and augmented reality. However, existing models perform static inference and rely heavily on inductive biases learned during training, limiting their ability to adapt to novel structural patterns and sensor-induced distortions at test time. To address this limitation, we propose PointMAC, a meta-learned framework for robust test-time adaptation in point cloud completion. It enables sample-specific refinement without requiring additional supervision. Our method optimizes the completion model under two self-supervised auxiliary objectives that simulate structural and sensor-level incompleteness. A meta-auxiliary learning strategy based on Model-Agnostic Meta-Learning (MAML) ensures that adaptation driven by auxiliary objectives is consistently aligned with the primary completion task. During inference, we adapt the shared encoder on-the-fly by optimizing auxiliary losses, with the decoder kept fixed. To further stabilize adaptation, we introduce Adaptive λ-Calibration, a meta-learned mechanism for balancing gradients between primary and auxiliary objectives. Extensive experiments on synthetic, simulated, and real-world datasets demonstrate that PointMAC achieves state-of-the-art results by refining each sample individually to produce high-quality completions. To the best of our knowledge, this is the first work to apply meta-auxiliary test-time adaptation to point cloud completion.

IJCAI Conference 2025 Conference Paper

Representation Learning with Mutual Influence of Modalities for Node Classification in Multi-Modal Heterogeneous Networks

  • Jiafan Li
  • Jiaqi Zhu
  • Liang Chang
  • Yilin Li
  • Miaomiao Li
  • Yang Wang
  • Yi Yang
  • Hongan Wang

Nowadays, numerous online platforms can be described as multi-modal heterogeneous networks (MMHNs), such as Douban's movie networks and Amazon's product review networks. Accurately categorizing nodes within these networks is crucial for analyzing the corresponding entities, which requires effective representation learning on nodes. However, existing multi-modal fusion methods often adopt either early fusion strategies, which may lose the unique characteristics of individual modalities, or late fusion approaches, which overlook cross-modal guidance in GNN-based information propagation. In this paper, we propose a novel model for node classification in MMHNs, named Heterogeneous Graph Neural Network with Inter-Modal Attention (HGNN-IMA). It learns node representations by capturing the mutual influence of multiple modalities during the information propagation process, within the framework of the heterogeneous graph transformer. Specifically, a nested inter-modal attention mechanism is integrated into the inter-node attention to achieve adaptive multi-modal fusion, and modality alignment is also taken into account to encourage propagation among nodes with consistent similarities across all modalities. Moreover, an attention loss is added to mitigate the impact of missing modalities. Extensive experiments validate the superiority of the model in the node classification task, providing an innovative view for handling multi-modal data, especially when accompanied by network structures. The full version including Appendix is available at http://arxiv.org/abs/2505.07895.

IJCAI Conference 2025 Conference Paper

Revealing Concept Shift in Spatio-Temporal Graphs via State Learning

  • Kuo Yang
  • Yunhe Guo
  • Qihe Huang
  • Zhengyang Zhou
  • Yang Wang

Dynamic graphs are ubiquitous in the real world, capturing the temporal evolution of individuals within spatial associations. Research on dynamic graph learning is flourishing, striving to capture evolutionary patterns and spatial correlations more effectively. However, existing methods still fail to address concept shift in dynamic graphs. Concept shift manifests as a distribution shift in the mapping between historical observations and future evolution. It arises because some environment variables in dynamic graphs exert varying effects on evolution patterns, but these variables are not effectively captured by existing models, which makes concept shift intractable. To tackle this issue, we propose Samen, a State-driven environment inference framework that equips dynamic graph learning with concept generalization ability. First, we propose a two-stage environment inference and compression strategy. From the perspective of state space, we introduce a prefix-suffix collaborative state learning mechanism to model spatio-temporal states bidirectionally. A hierarchical state compressor is further designed to refine the state information responsible for concept shift. Second, we propose a skip-connection spatio-temporal prediction module, which effectively utilizes the inferred environments to improve the model's generalization capability. Finally, we select seven datasets from different domains to validate the effectiveness of our model. By comparing the performance of different models on samples with concept shift, we verify that Samen achieves a generalization capacity that existing methods lack.

AAAI Conference 2025 Conference Paper

SMamba: Sparse Mamba for Event-based Object Detection

  • Nan Yang
  • Yang Wang
  • Zhanwen Liu
  • Meng Li
  • Yisheng An
  • Xiangmo Zhao

Transformer-based methods have achieved remarkable performance in event-based object detection, owing to their global modeling ability. However, they neglect the influence of non-event and noisy regions and process all regions uniformly, leading to high computational overhead. To reduce computation cost, some researchers propose window-attention-based sparsification strategies that discard unimportant regions, which sacrifices global modeling ability and results in suboptimal performance. To achieve a better trade-off between accuracy and efficiency, we propose Sparse Mamba (SMamba), which performs adaptive sparsification to reduce computational effort while maintaining global modeling capability. Specifically, a Spatio-Temporal Continuity Assessment module is proposed to measure the information content of tokens and discard uninformative ones by leveraging the differences in spatiotemporal distribution between activity and noise events. Based on the assessment results, an Information-Prioritized Local Scan strategy is designed to shorten the scan distance between high-information tokens, facilitating interactions among them in the spatial dimension. Furthermore, to extend global interaction from 2D space to 3D representations, a Global Channel Interaction module is proposed to aggregate channel information from a global spatial perspective. Results on three datasets (Gen1, 1Mpx, and eTram) demonstrate that our model outperforms other methods in both performance and efficiency.

AAAI Conference 2025 Conference Paper

STEM-LTS: Integrating Semantic-Temporal Dynamics in LLM-driven Time Series Analysis

  • Zhe Zhao
  • Pengkun Wang
  • Haibin Wen
  • Shuang Wang
  • Liheng Yu
  • Yang Wang

Time series forecasting plays a crucial role in domains such as finance, healthcare, and climate science. However, as modern time series data become increasingly complex, featuring high dimensionality, intricate spatiotemporal dependencies, and multi-scale evolutionary patterns, traditional analytical methods and existing predictive models face significant challenges. Although Large Language Models (LLMs) excel in capturing long-range dependencies, they still struggle with multi-scale dynamics and seasonal patterns. Moreover, while LLMs' semantic representation capabilities are rich, they often lack explicit alignment with the numerical patterns and temporal structures of time series data, leading to limitations in predictive accuracy and interpretability. To address these challenges, this paper proposes a novel framework, STEM-LTS (Semantic-TEmporal Modeling for Large-scale Time Series). STEM-LTS enhances the ability to capture complex spatiotemporal dependencies by integrating time series decomposition techniques with LLM-based modeling. The semantic-temporal alignment mechanism within the framework significantly improves LLMs' ability to interpret and forecast time series data. Additionally, we develop an adaptive multi-task learning strategy to optimize the model's performance across multiple dimensions. Through extensive experiments on various real-world datasets, we demonstrate that STEM-LTS achieves significant improvements in prediction accuracy, robustness to noise, and interpretability. Our work not only advances LLM-based time series analysis but also offers new perspectives on handling complex temporal data.

NeurIPS Conference 2025 Conference Paper

SwitchLingua: The First Large-Scale Multilingual and Multi-Ethnic Code-Switching Dataset

  • Peng Xie
  • Xingyuan Liu
  • Yequan Bie
  • Tsz Wai Chan
  • Yangqiu Song
  • Yang Wang
  • Hao Chen
  • Kani Chen

Code-switching (CS) is the alternating use of two or more languages within a conversation or utterance, often influenced by social context and speaker identity. This linguistic phenomenon poses challenges for Automatic Speech Recognition (ASR) systems, which are typically designed for a single language and struggle to handle multilingual inputs. The growing global demand for multilingual applications, including Code-Switching ASR (CSASR), Text-to-Speech (TTS), and Cross-Lingual Information Retrieval (CLIR), highlights the inadequacy of existing monolingual datasets. Although some code-switching datasets exist, most are limited to bilingual mixing within homogeneous ethnic groups, leaving a critical need for a large-scale, diverse benchmark akin to ImageNet in computer vision. To bridge this gap, we introduce LinguaMaster, a multi-agent collaboration framework specifically designed for efficient and scalable multilingual data synthesis. Leveraging this framework, we curate SwitchLingua, the first large-scale multilingual and multi-ethnic code-switching dataset, including: (1) 420K CS textual samples across 12 languages, and (2) over 80 hours of audio recordings from 174 speakers representing 18 countries/regions and 63 racial/ethnic backgrounds, based on the textual data. This dataset captures rich linguistic and cultural diversity, offering a foundational resource for advancing multilingual and multicultural research. Furthermore, to address the issue that existing ASR evaluation metrics lack sensitivity to code-switching scenarios, we propose the Semantic-Aware Error Rate (SAER), a novel evaluation metric that incorporates semantic information, providing a more accurate and context-aware assessment of system performance. Benchmark experiments on SwitchLingua with state-of-the-art ASR models reveal substantial performance gaps, underscoring the dataset’s utility as a rigorous benchmark for CS capability evaluation.
In addition, SwitchLingua aims to encourage further research to promote cultural inclusivity and linguistic diversity in speech technology, fostering equitable progress in the ASR field. LinguaMaster (Code): github.com/Shelton1013/SwitchLingua, SwitchLingua (Data): https://huggingface.co/datasets/Shelton1013/SwitchLingua text, https://huggingface.co/datasets/Shelton1013/SwitchLingua audio

NeurIPS Conference 2025 Conference Paper

SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning

  • Xiao Liang
  • Zhong-Zhi Li
  • Yeyun Gong
  • Yang Wang
  • Hengyuan Zhang
  • Yelong Shen
  • Ying Nian Wu
  • Weizhu Chen

Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective for training large language models (LLMs) on complex reasoning tasks, such as mathematical problem solving. A prerequisite for the scalability of RLVR is a high-quality problem set with precise and verifiable answers. However, the scarcity of well-crafted human-labeled math problems and limited-verification answers in existing distillation-oriented synthetic datasets limit their effectiveness in RL. Additionally, most problem synthesis strategies indiscriminately expand the problem set without considering the model’s capabilities, leading to low efficiency in generating useful questions. To mitigate this issue, we introduce a Self-aware Weakness-driven problem Synthesis framework (SwS) that systematically identifies model deficiencies and leverages them for problem augmentation. Specifically, we define weaknesses as questions that the model consistently fails to learn through its iterative sampling during RL training. We then extract the core concepts from these failure cases and synthesize new problems to strengthen the model's weak areas in subsequent augmented training, enabling it to focus on and gradually overcome its weaknesses. Without relying on external knowledge distillation, our framework enables robust generalization by empowering the model to self-identify and address its weaknesses in RL, yielding average performance gains of 10% and 7.7% on 7B and 32B models across eight mainstream reasoning benchmarks. Our code and data are available at https://anonymous.4open.science/r/SwS-E6F5/
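The weakness criterion above ("questions that the model consistently fails to learn through its iterative sampling") can be made concrete with a small filter over per-question accuracy histories. This is a minimal sketch under stated assumptions: the history format, the 0.5 threshold, and the rule "below threshold at every recorded epoch" are hypothetical choices, not the paper's exact definition.

```python
def find_weaknesses(history, threshold=0.5):
    """Return ids of questions the model consistently fails on.

    history maps a question id to the accuracy of its sampled responses at each
    recorded training epoch. A question counts as a weakness if its accuracy
    stays below `threshold` at every recorded epoch (hypothetical criterion).
    Questions with no recorded epochs are skipped.
    """
    return sorted(
        qid for qid, accs in history.items()
        if accs and all(acc < threshold for acc in accs)
    )


history = {"q1": [0.1, 0.2, 0.3], "q2": [0.2, 0.8], "q3": [0.0, 0.0]}
print(find_weaknesses(history))
```

Here `q2` is excluded because the model eventually learns it, while `q1` and `q3` are returned as weaknesses. Extracting their core concepts and synthesizing new problems relies on an LLM and is outside this sketch.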

IJCAI Conference 2025 Conference Paper

Time-Frequency Disentanglement Boosted Pre-Training: A Universal Spatio-Temporal Modeling Framework

  • Yudong Zhang
  • Zhaoyang Sun
  • Xu Wang
  • Xuan Yu
  • Kai Wang
  • Yang Wang

Current spatio-temporal modeling techniques largely rely on abundant data and the design of task-specific models. However, many cities lack well-established digital infrastructures, making data scarcity and the high cost of model development significant barriers to application deployment. Therefore, this work aims to enable spatio-temporal learning to cope with the problems of few-shot data modeling and model generalizability. To this end, we propose a Universal Spatio-Temporal Correlationship pre-training framework (USTC) for spatio-temporal modeling across different cities and tasks. To enhance the spatio-temporal representations during pre-training, we propose to decouple the time-frequency patterns within data, and leverage contrastive learning to maintain the time-frequency consistency. To further improve the adaptability to downstream tasks, we design a prompt generation module to mine personalized spatio-temporal patterns on the target city, which can be integrated with the learned common spatio-temporal representations to collaboratively serve downstream tasks. Extensive experiments conducted on real-world datasets demonstrate that USTC significantly outperforms advanced baselines in forecasting, imputation, and extrapolation across cities.

NeurIPS Conference 2025 Conference Paper

TS-MOF: Two-Stage Multi-Objective Fine-tuning for Long-Tailed Recognition

  • Zhe Zhao
  • Zhiheng Gong
  • Pengkun Wang
  • Haibin Wen
  • Cankun Guo
  • Bo Xue
  • Xi Lin
  • Zhenkun Wang

Long-Tailed Recognition (LTR) presents a significant challenge due to extreme class imbalance, where existing methods often struggle to balance performance across head and tail classes. Directly applying multi-objective optimization (MOO) to leverage multiple LTR strategies can be complex and unstable. To address this, we propose TS-MOF (Two-Stage Multi-Objective Fine-tuning), a novel framework that strategically decouples feature learning from classifier adaptation. After standard pre-training, TS-MOF freezes the feature backbone and focuses on efficient multi-objective fine-tuning of specialized classifier heads. The core of TS-MOF's second stage lies in two innovations: Refined Performance Level Agreement for adaptive task weighting based on real-time per-class performance, and Robust Deterministic Projective Conflict Gradient for stable gradient conflict resolution and constructive fusion. This approach enables effective synergy between diverse LTR strategies, leading to significant and balanced performance improvements. Extensive experiments on CIFAR100-LT, ImageNet-LT, and iNaturalist 2018 demonstrate that TS-MOF achieves state-of-the-art results, particularly enhancing tail class accuracy (e.g., +3.3% on the tail classes of CIFAR100-LT with IR=100) while improving head class performance, all within a remarkably short fine-tuning period of 20 epochs.
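The abstract does not specify Robust Deterministic Projective Conflict Gradient, so the sketch below only illustrates the general gradient-projection idea its name suggests, in the style of PCGrad but with a fixed rather than random projection order: when two task gradients have a negative dot product, one is projected onto the normal plane of the other before the gradients are summed. The function names are hypothetical, and this is not the paper's method.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_conflicting(g, other):
    """Remove from g its component along `other` when the two gradients conflict.

    Two gradients conflict when their dot product is negative; projecting g onto
    the normal plane of `other` leaves a direction that no longer opposes it.
    """
    d = dot(g, other)
    norm_sq = dot(other, other)
    if d >= 0 or norm_sq == 0:
        return list(g)
    coef = d / norm_sq
    return [a - coef * b for a, b in zip(g, other)]


def fuse_gradients(grads):
    """Project each gradient against all others in a fixed order, then sum."""
    fused = [0.0] * len(grads[0])
    for i, g in enumerate(grads):
        adjusted = list(g)
        for j, other in enumerate(grads):
            if i != j:
                adjusted = project_conflicting(adjusted, other)
        fused = [f + a for f, a in zip(fused, adjusted)]
    return fused
```

For two conflicting gradients `[1.0, 0.0]` and `[-1.0, 1.0]`, the first is projected to `[0.5, 0.5]` (orthogonal to the second), the second to `[0.0, 1.0]`, and the fused update is `[0.5, 1.5]`. Iterating in a fixed order makes the result reproducible from run to run.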

EAAI Journal 2025 Journal Article

Uncertainty-aware focal loss for object segmentation

  • Lei Chen
  • Yang Wang
  • Jibin Yang
  • Yunfei Zheng
  • Tong Han
  • Bo Zhang
  • Tieyong Cao

In the loss function of object segmentation models, misclassified pixels, whose predictions are opposite to the ground truth, and uncertain pixels, whose predicted probability is close to 0.5, are more important for model training. Focusing on misclassified pixels can improve the segmentation accuracy of the model, and focusing on uncertain pixels can help the model form better decision surfaces. However, existing methods fail to take both types of pixels into account simultaneously. To enhance learning on these two types of important pixels, the Uncertainty-aware Focal Loss (UFL) is proposed based on an analysis of the Uncertainty-aware Loss (UAL). Then, by leveraging the S-shaped property of the sigmoid function, a loss function is constructed that simultaneously increases the loss and loss derivatives of misclassified and uncertain pixels. To solve the gradient vanishing problem of the sigmoid function on well-classified pixels, a regularization term is defined whose value is the square of the predicted probability. Finally, the pixel loss value is dynamically adjusted at different stages of training according to the changing contributions of misclassified and uncertain pixels to model training, which improves targeted learning for these pixels. Experimental results on two different types of network structures and six datasets demonstrate that the proposed method better segments uncertain and misclassified pixels. In particular, on the DUT-O dataset, UFL improves mean Intersection over Union (mIoU) by almost 2.7% compared to UAL.
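The two pixel types that UFL emphasizes can be made concrete with a small helper that follows the definitions given above: a pixel is misclassified when thresholding its probability at 0.5 contradicts the ground truth, and uncertain when its probability is close to 0.5. The `margin` that defines "close" and the function name are hypothetical; the loss itself (the sigmoid-shaped reweighting and the squared-probability regularizer) is not reproduced because the abstract does not give its exact form.

```python
def categorize_pixels(probs, targets, margin=0.1):
    """Split pixel indices into misclassified and uncertain sets.

    probs: predicted foreground probabilities in [0, 1]; targets: 0/1 labels.
    Misclassified: thresholding at 0.5 gives the opposite of the ground truth.
    Uncertain: the probability lies within `margin` of 0.5 (margin is hypothetical).
    A pixel may belong to both sets.
    """
    misclassified, uncertain = [], []
    for i, (p, t) in enumerate(zip(probs, targets)):
        if int(p > 0.5) != t:
            misclassified.append(i)
        if abs(p - 0.5) < margin:
            uncertain.append(i)
    return misclassified, uncertain
```

For instance, `categorize_pixels([0.9, 0.1, 0.52, 0.3], [1, 1, 0, 0])` returns `([1, 2], [2])`: pixel 2 is both misclassified and uncertain, which is exactly the kind of pixel a loss like UFL should weight most heavily.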

AAAI Conference 2025 Conference Paper

Unsupervised Domain Adaptive Person Search via Dual Self-Calibration

  • Linfeng Qi
  • Huibing Wang
  • Jiqing Zhang
  • Jinjia Peng
  • Yang Wang

Unsupervised Domain Adaptive (UDA) person search focuses on applying a model trained on a labeled source-domain dataset to a target-domain dataset without any additional annotations. Most effective UDA person search methods use the ground truth of the source domain together with pseudo-labels derived from clustering during training for domain adaptation. However, the performance of these approaches is significantly limited by the noisy pseudo-labels that result from inter-domain disparities. In this paper, we propose a Dual Self-Calibration (DSCA) framework for UDA person search that effectively eliminates the interference of noisy pseudo-labels from both image-level and instance-level feature perspectives. Specifically, we first present a simple yet effective Perception-Driven Adaptive Filter (PDAF) that adaptively predicts a dynamic filter threshold based on input features. This threshold helps eliminate noisy pseudo-boxes and other background interference, allowing our approach to focus on foreground targets and avoid indiscriminate domain adaptation. Besides, we propose a Cluster Proxy Representation (CPR) module to improve the update strategy of cluster representations, which mitigates the pollution of clusters by misidentified instances and effectively streamlines training on unlabeled target domains. With these designs, our method achieves state-of-the-art (SOTA) performance on two benchmark datasets, with 80.2% mAP and 81.7% top-1 on the CUHK-SYSU dataset and 39.9% mAP and 81.6% top-1 on the PRW dataset, which is comparable to or even exceeds the performance of some fully supervised methods.

EAAI Journal 2025 Journal Article

WB-YOLO: An efficient wild bat detection method for ecological monitoring in complex environments

  • Yang Wang
  • Chang Ma
  • Chuanxin Zhao
  • Huijuan Xia
  • Congxi Chen
  • Ying Zhang

The study of bat species and their distribution is vital for understanding the origins and transmission pathways of epidemic diseases. However, detecting wild bats is challenging because of their complex natural habitats and the frequent occlusions caused by their social behavior. To address these issues, we propose WB-YOLO, an object detection method based on an improved You Only Look Once version 7 (YOLOv7) for efficient wild bat detection. This method integrates a Vision Transformer encoder module to improve global context integration, adopts deformable convolution, and optimizes the spatial pyramid pooling structure for effective multi-scale feature fusion while reducing computational complexity. Furthermore, a hybrid attention mechanism is introduced to capture both spatial and channel information, enhancing robustness in complex environments. Experimental results on a dataset of wild bat images collected in Anhui Province demonstrate that WB-YOLO achieves a precision of 90.7%, a recall of 89.0%, and a mean average precision (mAP) of 94.7%, significantly outperforming other deep learning models in detecting bats in complex scenes and under occlusion. Our approach offers an efficient and accurate solution for real-time wild bat detection, with potential applications in ecological research and disease prevention. Code and data related to this work are publicly available at https://github.com/macandzzz/WB-YOLO.

AAAI Conference 2024 Conference Paper

A Twist for Graph Classification: Optimizing Causal Information Flow in Graph Neural Networks

  • Zhe Zhao
  • Pengkun Wang
  • Haibin Wen
  • Yudong Zhang
  • Zhengyang Zhou
  • Yang Wang

Graph neural networks (GNNs) have achieved state-of-the-art results on many graph representation learning tasks by exploiting statistical correlations. However, numerous observations have shown that such correlations may not reflect the true causal mechanisms underlying the data and may thus hamper the model's ability to generalize beyond the observed distribution. To address this problem, we propose an Information-based Causal Learning (ICL) framework that combines information theory and causality to analyze and improve graph representation learning, transforming information relevance into causal dependence. Specifically, we first introduce a multi-objective mutual information optimization objective, derived from information-theoretic analysis and causal learning principles, that simultaneously extracts invariant and interpretable causal information and reduces reliance on non-causal information in correlations. To optimize this multi-objective target, we introduce a causal disentanglement layer that effectively decouples the causal and non-causal information in the graph representations. Moreover, because mutual information estimation is intractable, we derive variational bounds that transform the above objective into a tractable loss function. To balance the multiple information objectives and avoid optimization conflicts, we leverage multi-objective gradient descent to achieve a stable and efficient transformation from informational correlation to causal dependency. Our approach provides important insights into modulating the information flow in GNNs to enhance their reliability and generalization. Extensive experiments demonstrate that our approach significantly improves the robustness and interpretability of GNNs across different distribution shifts. Visual analysis demonstrates how our method converts informative dependencies in representations into causal dependencies.

ICML Conference 2024 Conference Paper

Advancing DRL Agents in Commercial Fighting Games: Training, Integration, and Agent-Human Alignment

  • Chen Zhang
  • Qiang He
  • Yuan Zhou
  • Elvis S. Liu
  • Hong Wang
  • Jian Zhao 0010
  • Yang Wang

Deep Reinforcement Learning (DRL) agents have demonstrated impressive success in a wide range of game genres. However, existing research primarily focuses on optimizing DRL competence rather than addressing the challenge of prolonged player interaction. In this paper, we propose a practical DRL agent system for fighting games named Shūkai, which has been successfully deployed to Naruto Mobile, a popular fighting game with over 100 million registered users. Shūkai quantifies the state to enhance generalizability and introduces Heterogeneous League Training (HELT) to balance competence, generalizability, and training efficiency. Furthermore, Shūkai implements specific rewards to align the agent's behavior with human expectations. Shūkai's ability to generalize is demonstrated by its consistent competence across all characters, even though it was trained on only 13% of them. Additionally, HELT exhibits a remarkable 22% improvement in sample efficiency. Shūkai serves as a valuable training partner for players in Naruto Mobile, enabling them to enhance their abilities and skills.

IJCAI Conference 2024 Conference Paper

Apprenticeship-Inspired Elegance: Synergistic Knowledge Distillation Empowers Spiking Neural Networks for Efficient Single-Eye Emotion Recognition

  • Yang Wang
  • Haiyang Mei
  • Qirui Bao
  • Ziqi Wei
  • Mike Zheng Shou
  • Haizhou Li
  • Bo Dong
  • Xin Yang

We introduce a novel multimodal synergistic knowledge distillation scheme tailored for efficient single-eye emotion recognition. This method allows a lightweight, unimodal student spiking neural network (SNN) to extract rich knowledge from an event-frame multimodal teacher network. The core strength of this approach is its ability to utilize the ample but coarser temporal cues found in conventional frames for effective emotion recognition. Consequently, our method interprets both temporal and spatial information from the conventional frame domain, eliminating the need for specialized sensing devices such as event-based cameras. The effectiveness of our approach is thoroughly demonstrated on both existing single-eye emotion recognition datasets and our own compiled dataset, achieving superior accuracy and efficiency over existing state-of-the-art methods.

NeurIPS Conference 2024 Conference Paper

Benchmarking PtO and PnO Methods in the Predictive Combinatorial Optimization Regime

  • Haoyu Geng
  • Hang Ruan
  • Runzhong Wang
  • Yang Li
  • Yang Wang
  • Lei Chen
  • Junchi Yan

Predictive combinatorial optimization, where the parameters of combinatorial optimization (CO) are unknown at decision-making time, precisely models many real-world applications, including energy-cost-aware scheduling and advertising budget allocation. Tackling such a problem usually involves a prediction model and a CO solver. These two modules are integrated into the predictive CO pipeline following one of two design principles: "Predict-then-Optimize (PtO)", which learns predictions by supervised training and then solves CO using the predicted coefficients, and "Predict-and-Optimize (PnO)", which directly optimizes toward the ultimate decision quality and claims to yield better decisions than traditional PtO approaches. However, a systematic benchmark of both approaches is lacking, including the specific design choices at the module level and an evaluation dataset that covers representative real-world scenarios. To this end, we develop a modular framework to benchmark 11 existing PtO/PnO methods on 8 problems, including a new industrial dataset for combinatorial advertising that will be released. Our study shows that PnO approaches outperform PtO on 7 of the 8 benchmarks, but no silver bullet is found among the specific design choices of PnO. A comprehensive categorization of current approaches and an integration of typical scenarios are provided under a unified benchmark. This paper can therefore serve as a comprehensive benchmark for future PnO development and also offer fast prototyping for application-focused development. The code is available at https://github.com/Thinklab-SJTU/PredictiveCO-Benchmark.
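
The difference between the two principles shows up in how decisions are scored. The sketch below uses a hypothetical toy budget-allocation instance, solved as a 0-1 knapsack by brute force, and computes regret: the true objective of the optimal decision minus the true objective of the decision made with predicted coefficients. A PtO model would be trained on prediction error (e.g. MSE) and only then plugged into `solve_knapsack`, whereas PnO methods target decision-quality measures such as this regret directly. All names and numbers are illustrative assumptions, not taken from the benchmark.

```python
from itertools import combinations
from typing import Sequence, Tuple


def solve_knapsack(values: Sequence[float], costs: Sequence[float], budget: float) -> Tuple[int, ...]:
    """Exact 0-1 knapsack by enumerating subsets (fine for a handful of items)."""
    best: Tuple[int, ...] = ()
    best_val = 0.0
    n = len(values)
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if sum(costs[i] for i in subset) <= budget:
                val = sum(values[i] for i in subset)
                if val > best_val:
                    best, best_val = subset, val
    return best


def regret(true_values, pred_values, costs, budget) -> float:
    """True value of the optimal decision minus true value of the decision made with predictions."""
    optimal = solve_knapsack(true_values, costs, budget)
    chosen = solve_knapsack(pred_values, costs, budget)  # PtO: optimize with predicted coefficients
    return sum(true_values[i] for i in optimal) - sum(true_values[i] for i in chosen)


true_values = [6.0, 5.0, 4.0]
pred_values = [7.0, 3.0, 3.0]  # hypothetical predictions
costs, budget = [3.0, 2.0, 2.0], 4.0
print(solve_knapsack(true_values, costs, budget))  # (1, 2)
print(solve_knapsack(pred_values, costs, budget))  # (0,)
print(regret(true_values, pred_values, costs, budget))  # 3.0
```

Here the predictions err by at most 2 per item, yet they flip the decision and lose a third of the achievable value.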

NeurIPS Conference 2024 Conference Paper

Breaking Long-Tailed Learning Bottlenecks: A Controllable Paradigm with Hypernetwork-Generated Diverse Experts

  • Zhe Zhao
  • Haibin Wen
  • Zikang Wang
  • Pengkun Wang
  • Fanfu Wang
  • Song Lai
  • Qingfu Zhang
  • Yang Wang

Traditional long-tailed learning methods often perform poorly when dealing with inconsistencies between training and test data distributions, and they cannot flexibly adapt to different user preferences for trade-offs between head and tail classes. To address this issue, we propose a novel long-tailed learning paradigm that aims to tackle distribution shift in real-world scenarios and accommodate different user preferences for the trade-off between head and tail classes. We generate a set of diverse expert models via hypernetworks to cover all possible distribution scenarios, and optimize the model ensemble to adapt to any test distribution. Crucially, in any distribution scenario, we can flexibly output a dedicated model solution that matches the user's preference. Extensive experiments demonstrate that our method not only achieves higher performance ceilings but also effectively overcomes distribution shift while allowing controllable adjustments according to user preferences. We provide new insights and a paradigm for the long-tailed learning problem, greatly expanding its applicability in practical scenarios. The code can be found at https://github.com/DataLab-atom/PRL.

IJCAI Conference 2024 Conference Paper

ELF-UA: Efficient Label-Free User Adaptation in Gaze Estimation

  • Yong Wu
  • Yang Wang
  • Sanqing Qu
  • Zhijun Li
  • Guang Chen

We consider the problem of user-adaptive 3D gaze estimation. The performance of person-independent gaze estimation is limited by interpersonal anatomical differences. Our goal is to provide a personalized gaze estimation model specifically adapted to a target user. Previous work on user-adaptive gaze estimation requires some labeled images of the target person to fine-tune the model at test time. However, this can be unrealistic in real-world applications, since it is cumbersome for an end-user to provide labeled images. In addition, previous work requires the training data to have both gaze labels and person IDs, which makes it infeasible to use some of the available data. To tackle these challenges, this paper proposes a new problem called efficient label-free user adaptation in gaze estimation. Our model needs only a few unlabeled images of a target user for adaptation. During offline training, we have some labeled source data without person IDs and some unlabeled person-specific data. Our proposed method uses a meta-learning approach to learn how to adapt to a new user with only a few unlabeled images. Our key technical innovation is to use a generalization bound from domain adaptation to define the loss function in meta-learning, so that our method can effectively make use of both the labeled source data and the unlabeled person-specific data during training. Extensive experiments validate the effectiveness of our method on several challenging benchmarks.

AAAI Conference 2024 Conference Paper

Fair Graph Learning Using Constraint-Aware Priority Adjustment and Graph Masking in River Networks

  • Erhu He
  • Yiqun Xie
  • Alexander Sun
  • Jacob Zwart
  • Jie Yang
  • Zhenong Jin
  • Yang Wang
  • Hassan Karimi

Accurate prediction of water quality and quantity is crucial for sustainable development and human well-being. However, existing data-driven methods often suffer from spatial biases in model performance due to heterogeneous data, limited observations, and noisy sensor data. To overcome these challenges, we propose Fair-Graph, a novel graph-based recurrent neural network that leverages interrelated knowledge from multiple rivers to predict water flow and temperature within large-scale stream networks. Additionally, we introduce node-specific graph masks for information aggregation and adaptation to enhance prediction over heterogeneous river segments. To reduce performance disparities across river segments, we introduce a centralized coordination strategy that adjusts training priorities for segments. We evaluate the prediction of water temperature within the Delaware River Basin, and the prediction of streamflow using simulated data from U.S. National Water Model in the Houston River network. The results showcase improvements in predictive performance and highlight the proposed model's ability to maintain spatial fairness over different river segments.

IJCAI Conference 2024 Conference Paper

Fast One-Stage Unsupervised Domain Adaptive Person Search

  • Tianxiang Cui
  • Huibing Wang
  • Jinjia Peng
  • Ruoxi Deng
  • Xianping Fu
  • Yang Wang

Unsupervised person search aims to localize a particular target person from a gallery set of scene images without annotations, which is extremely challenging due to the unexpected variations of the unlabeled domains. However, most existing methods are devoted to developing multi-stage models that adapt to domain variations while using clustering for iterative model training, which inevitably increases model complexity. To address this issue, we propose Fast One-stage Unsupervised person Search (FOUS), which complementarily integrates domain adaptation with label adaptation in an end-to-end manner without iterative clustering. To minimize the domain discrepancy, FOUS introduces an Attention-based Domain Alignment Module (ADAM), which not only aligns various domains for both detection and ReID tasks but also constructs an attention mechanism to reduce the adverse impact of low-quality candidates resulting from unsupervised detection. Moreover, to avoid the redundant iterative clustering mode, FOUS adopts a prototype-guided labeling method that minimizes redundant correlation computations for partial samples and efficiently assigns noisy coarse label groups. The coarse label groups are continuously refined via a label-flexible training network with an adaptive selection strategy. With the adapted domains and labels, FOUS achieves state-of-the-art (SOTA) performance on two benchmark datasets, CUHK-SYSU and PRW. The code is available at https://github.com/whbdmu/FOUS.

NeurIPS Conference 2024 Conference Paper

Get Rid of Isolation: A Continuous Multi-task Spatio-Temporal Learning Framework

  • Zhongchao Yi
  • Zhengyang Zhou
  • Qihe Huang
  • Yanjiang Chen
  • Liheng Yu
  • Xu Wang
  • Yang Wang

Spatiotemporal learning has become a pivotal technique for enabling urban intelligence. Traditional spatiotemporal models mostly focus on a specific task and assume the same distribution in the training and testing sets. However, because urban systems are usually dynamic and multi-sourced, with imbalanced data distributions, current task-specific models fail to generalize to new urban conditions and adapt to new domains without explicitly modeling interdependencies across various dimensions and types of urban data. To this end, we argue that it is essential to propose a Continuous Multi-task Spatio-Temporal learning framework (CMuST) to empower collective urban intelligence, which reforms urban spatiotemporal learning from single-domain learning into cooperative multi-dimensional and multi-task learning. Specifically, CMuST proposes a new multi-dimensional spatiotemporal interaction network (MSTI) that exposes cross-interactions between context and main observations as well as self-interactions within spatial and temporal aspects, which is also the core for capturing task-level commonality and personalization. To ensure continuous task learning, a novel Rolling Adaptation training scheme (RoAda) is devised, which not only preserves task uniqueness by constructing data summarization-driven task prompts but also harnesses correlated patterns among tasks through iterative model behavior modeling. We further establish a benchmark of three cities for multi-task spatiotemporal learning and empirically demonstrate the superiority of CMuST via extensive evaluations on these datasets. CMuST achieves impressive improvements over existing SOTA methods on both few-shot streaming data and new domain tasks. Code is available at https://github.com/DILab-USTCSZ/CMuST.

TCS Journal 2024 Journal Article

Hardness of Entropic Module-LWE

  • Hao Lin
  • Mingqiang Wang
  • Jincheng Zhuang
  • Yang Wang

The Learning with Errors (LWE) problem is a versatile basis for building post-quantum schemes for various purposes. Goldwasser et al. [ISC 2010] initiated the study of a variant of this problem called the Entropic LWE problem, where the LWE secret is drawn from a distribution with a certain min-entropy. Brakerski and Döttling recently extended the study in this field, first proving the hardness of the Entropic LWE problem with unbounded secrets [Eurocrypt 2020] and then giving a similar result for the Entropic Ring-LWE problem [TCC 2020]. In this work, we systematically study the hardness of the Entropic Module-LWE problem. Adapting the “lossiness approach” to the module setting, we give lower entropy bounds for the secret distributions that guarantee the hardness of the Entropic Module-LWE problem in both the search and decision cases, with results divided into two settings: bounded and unbounded norm. We also show that our search entropy lower bound in the unbounded case is essentially tight. An application of our bounded result is deducing the hardness of the Binary Module-LWE problem. One of our central techniques is a new generalized leftover hash lemma over rings, which might be of independent interest.
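
To make "a secret drawn from a distribution with a certain min-entropy" concrete, the toy generator below produces plain LWE samples b = A·s + e (mod q) with a uniform binary secret, which carries exactly n bits of min-entropy. This is a minimal sketch under loud assumptions: it is ordinary LWE rather than Module-LWE, the error is uniform on a small interval instead of a discrete Gaussian, the parameters are tiny and insecure, and the names `binary_secret_lwe` and `centered` are hypothetical.

```python
import random
from typing import List, Tuple


def centered(x: int, q: int) -> int:
    """Map x mod q to its representative in (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x


def binary_secret_lwe(
    n: int, m: int, q: int, noise_bound: int, seed: int = 0
) -> Tuple[List[List[int]], List[int], List[int], List[int]]:
    """Toy LWE samples (A, b) with a binary secret s; also returns s and e for inspection."""
    rng = random.Random(seed)
    # Uniform over {0,1}^n: min-entropy n bits, far below a uniform secret in Z_q^n.
    s = [rng.randint(0, 1) for _ in range(n)]
    A = [[rng.randrange(q) for _ in range(n)] for _ in range(m)]
    # Small error in [-noise_bound, noise_bound], standing in for a discrete Gaussian.
    e = [rng.randint(-noise_bound, noise_bound) for _ in range(m)]
    b = [(sum(a * x for a, x in zip(row, s)) + err) % q for row, err in zip(A, e)]
    return A, b, s, e


A, b, s, e = binary_secret_lwe(n=8, m=16, q=97, noise_bound=2, seed=1)
# Knowing s, the residual b - A*s recovers the small error exactly.
residuals = [centered(bi - sum(a * x for a, x in zip(row, s)), 97) for row, bi in zip(A, b)]
print(residuals == e)  # True
```

Without s, recovering it from (A, b) is the search problem whose hardness such entropy bounds address.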

AAAI Conference 2024 Conference Paper

HDMixer: Hierarchical Dependency with Extendable Patch for Multivariate Time Series Forecasting

  • Qihe Huang
  • Lei Shen
  • Ruixin Zhang
  • Jiahuan Cheng
  • Shouhong Ding
  • Zhengyang Zhou
  • Yang Wang

Multivariate time series (MTS) prediction has been widely adopted in various scenarios. Recently, some methods have employed patching to enhance local semantics and improve model performance. However, fixed-length patches are prone to losing temporal boundary information, such as complete peaks and periods. Moreover, existing methods mainly focus on modeling long-term dependencies across patches while paying little attention to other dimensions (e.g., short-term dependencies within patches and complex interactions among cross-variable patches). To address these challenges, we propose HDMixer, a pure MLP-based model that aims to acquire patches with richer semantic information and efficiently model hierarchical interactions. Specifically, we design a Length-Extendable Patcher (LEP) tailored to MTS, which enriches the boundary information of patches and alleviates semantic incoherence in series. Subsequently, we devise a Hierarchical Dependency Explorer (HDE) based on pure MLPs. This explorer effectively models short-term dependencies within patches, long-term dependencies across patches, and complex interactions among variables. Extensive experiments on 9 real-world datasets demonstrate the superiority of our approach. The code is available at https://github.com/hqh0728/HDMixer.
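
The boundary problem with fixed-length patches is easy to reproduce. The sketch below implements plain fixed-length patching, the baseline scheme the abstract critiques, not HDMixer's Length-Extendable Patcher; the function name `fixed_patches` and the toy series are hypothetical.

```python
from typing import List


def fixed_patches(series: List[float], patch_len: int, stride: int) -> List[List[float]]:
    """Cut a univariate series into fixed-length patches; a trailing remainder is dropped."""
    return [series[i:i + patch_len] for i in range(0, len(series) - patch_len + 1, stride)]


# A single peak spanning indices 1..5 (values 1, 3, 7, 3, 1).
series = [0, 1, 3, 7, 3, 1, 0, 0]
print(fixed_patches(series, patch_len=3, stride=3))  # [[0, 1, 3], [7, 3, 1]]
```

The rising edge of the peak lands in the first patch and its maximum in the second, so neither patch contains the complete peak; this is the kind of lost boundary information that motivates length-extendable patches.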

NeurIPS Conference 2024 Conference Paper

Improving Generalization of Dynamic Graph Learning via Environment Prompt

  • Kuo Yang
  • Zhengyang Zhou
  • Qihe Huang
  • Limin Li
  • Yuxuan Liang
  • Yang Wang

The out-of-distribution (OOD) generalization issue is a well-known challenge in deep learning tasks. In dynamic graphs, changes in the temporal environment are regarded as the main cause of data distribution shift. While numerous OOD studies focusing on environment factors have achieved remarkable performance, they still fail to systematically solve the two issues of environment inference and utilization. In this work, we propose a novel dynamic graph learning model named EpoD, based on prompt learning and a structural causal model, to comprehensively enhance both environment inference and utilization. Inspired by the superior performance of prompt learning in understanding underlying semantic and causal associations, we first design a self-prompted learning mechanism to infer unseen environment factors. We then rethink the role of environment variables within the spatio-temporal causal structure model and introduce a novel causal pathway in which dynamic subgraphs serve as mediating variables. The extracted dynamic subgraphs can effectively capture the data distribution shift by incorporating the inferred environment variables into the node-wise dependencies. Theoretical discussions and intuitive analysis support the generalizability and interpretability of EpoD. Extensive experiments on seven real-world datasets across domains showcase the superiority of EpoD over baselines, and toy-example experiments further verify its strong interpretability and rationality.

IJCAI Conference 2024 Conference Paper

LeRet: Language-Empowered Retentive Network for Time Series Forecasting

  • Qihe Huang
  • Zhengyang Zhou
  • Kuo Yang
  • Gengyu Lin
  • Zhongchao Yi
  • Yang Wang

Time series forecasting (TSF) plays a pivotal role in many real-world applications. Recently, the use of Large Language Models (LLMs) in TSF has demonstrated exceptional predictive performance, surpassing most task-specific forecasting models. The success of LLM-based forecasting methods underscores the importance of causal dependence modeling and pre-trained knowledge transfer. However, challenges persist in directly applying LLMs to TSF, i.e., the unacceptable parameter scales for resource-intensive model optimization and the significant feature-space gap between structured numerical time series and natural language. To this end, we propose LeRet, a Language-empowered Retentive network for TSF. Technically, inspired by the causal extraction in LLMs, we propose a causal dependence learner, enhanced by a patch-level pre-training task, to capture sequential causal evolution. To minimize the gap between the numerical and language modalities, we initialize a language description protocol for time series and design a TS-related language knowledge extractor to learn from language descriptions, avoiding training with large-scale parameters. Finally, we design a Language-TS Modality Integrator to fuse the two types of data, enabling language-empowered sequence forecasting. Extensive evaluations demonstrate the effectiveness of LeRet, especially its superiority on few-shot and zero-shot forecasting tasks. Code is available at https://github.com/hqh0728/LeRet.

NeurIPS Conference 2024 Conference Paper

LLM-AutoDA: Large Language Model-Driven Automatic Data Augmentation for Long-tailed Problems

  • Pengkun Wang
  • Zhe Zhao
  • Haibin Wen
  • Fanfu Wang
  • Binwu Wang
  • Qingfu Zhang
  • Yang Wang

The long-tailed distribution is the underlying nature of real-world data, and it presents unprecedented challenges for training deep learning models. Existing long-tailed learning paradigms based on re-balancing or data augmentation have partially alleviated the long-tailed problem. However, they still have limitations, such as relying on manually designed augmentation strategies, having a limited search space, and using fixed augmentation strategies. To address these limitations, this paper proposes a novel LLM-based long-tailed data augmentation framework called LLM-AutoDA, which leverages large-scale pretrained models to automatically search for optimal augmentation strategies suited to long-tailed data distributions. It then applies the resulting strategy to the original imbalanced data to create an augmented dataset and fine-tunes the underlying long-tailed learning model. The performance improvement on the validation set serves as a reward signal to update the generation model, enabling the generation of more effective augmentation strategies in the next iteration. We conducted extensive experiments on multiple mainstream long-tailed learning benchmarks. The results show that LLM-AutoDA significantly outperforms state-of-the-art data augmentation methods and other re-balancing methods.

IJCAI Conference 2024 Conference Paper

Make Bricks with a Little Straw: Large-Scale Spatio-Temporal Graph Learning with Restricted GPU-Memory Capacity

  • Binwu Wang
  • Pengkun Wang
  • Zhengyang Zhou
  • Zhe Zhao
  • Wei Xu
  • Yang Wang

Traffic prediction plays a key role in various smart city applications: it can help traffic managers make traffic plans in advance, assist online ride-hailing companies in deploying vehicles reasonably, and provide early warning of congestion for safety authorities. While increasingly complex models achieve impressive prediction performance, there are concerns about their effectiveness in handling large-scale road networks. Especially for researchers without access to powerful GPU devices, the expensive memory burden limits the usefulness of these models. In this paper, we take the first step toward learning on large-scale spatio-temporal graphs and propose a divide-and-conquer training strategy for Large Spatio-Temporal Graph Learning, namely LarSTL. The core idea behind this strategy is to divide the large graph into multiple subgraphs, which are treated as task streams to train the model sequentially, conquering each subgraph one by one. We introduce a novel perspective based on the continuous learning paradigm to achieve this goal. To avoid forgetting the knowledge learned from previous subgraphs, an experience-replay strategy consolidates the learned knowledge by replaying nodes sampled from previous subgraphs. Moreover, we configure specific feature adaptors for each subgraph to extract personalized features, which also helps consolidate the learned knowledge from the perspective of parameters. We conduct experiments on multiple large-scale traffic network datasets using a V100 GPU with only 16GB of memory, and the results demonstrate that LarSTL achieves competitive performance and high efficiency.
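
The divide-and-conquer schedule with experience replay described above can be sketched as a task stream. This is a minimal illustration under stated assumptions, not LarSTL's implementation: the contiguous-id partition stands in for a real graph partitioner, the replay budget is a fixed hypothetical count, and `partition_nodes`, `training_stream`, and the commented-out `train_step` are hypothetical names.

```python
import random
from typing import List, Tuple


def partition_nodes(num_nodes: int, num_parts: int) -> List[List[int]]:
    """Split node ids 0..num_nodes-1 into at most num_parts contiguous chunks.

    Contiguous id ranges are a placeholder; a real system would partition by graph topology.
    """
    if num_nodes <= 0 or num_parts <= 0:
        raise ValueError("num_nodes and num_parts must be positive")
    size = -(-num_nodes // num_parts)  # ceiling division
    return [list(range(i, min(i + size, num_nodes))) for i in range(0, num_nodes, size)]


def training_stream(
    parts: List[List[int]], replay_per_task: int, seed: int = 0
) -> List[Tuple[int, List[int], List[int]]]:
    """Return (task_id, subgraph_nodes, replayed_nodes) for each sequential task."""
    rng = random.Random(seed)
    seen: List[int] = []
    schedule = []
    for task_id, nodes in enumerate(parts):
        # Experience replay: sample nodes from all previously trained subgraphs.
        replay = rng.sample(seen, min(replay_per_task, len(seen)))
        # train_step(model, nodes + replay)  # hypothetical per-task training call
        schedule.append((task_id, nodes, replay))
        seen.extend(nodes)
    return schedule


parts = partition_nodes(10, 3)
print(parts)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
for task_id, nodes, replay in training_stream(parts, replay_per_task=2):
    print(task_id, nodes, replay)
```

Only one subgraph plus a small replay set needs to be resident at a time, which is the memory argument behind training on subgraphs sequentially.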

IJCAI Conference 2024 Conference Paper

MMGNN: A Molecular Merged Graph Neural Network for Explainable Solvation Free Energy Prediction

  • Wenjie Du
  • Shuai Zhang
  • Di Wu
  • Jun Xia
  • Ziyuan Zhao
  • Junfeng Fang
  • Yang Wang

In this paper, we address the challenge of accurately modeling and predicting the Gibbs free energy of solute-solvent interactions, a pivotal yet complex aspect of chemical modeling. Traditional approaches, primarily relying on deep learning models, face limitations in capturing the intricate dynamics of these interactions. To overcome these constraints, we introduce a novel framework, the molecular merged graph neural network (MMGNN), which more closely mirrors real-world chemical processes. Specifically, MMGNN explicitly models atomic interactions such as hydrogen bonds by initially forming indiscriminate connections between intermolecular atoms, which are then refined using an attention-based aggregation method tailored to specific solute-solvent pairs. To address the challenges of non-interactive or repulsive atomic interactions, MMGNN incorporates interpreters for nodes and edges in the merged graph, enhancing explainability and reducing redundancy. MMGNN is the first framework to explicitly align with real chemical processes, providing a more accurate and scientifically sound approach to modeling solute-solvent interactions. The infusion of explainability allows for the extraction of key subgraphs, which are pivotal for further research in solute-solvent dynamics. Extensive experimental validation confirms the efficacy and enhanced explainability of MMGNN.

AAAI Conference 2024 Conference Paper

NondBREM: Nondeterministic Offline Reinforcement Learning for Large-Scale Order Dispatching

  • Hongbo Zhang
  • Guang Wang
  • Xu Wang
  • Zhengyang Zhou
  • Chen Zhang
  • Zheng Dong
  • Yang Wang

One of the most important tasks in ride-hailing is order dispatching, i.e., assigning unserved orders to available drivers. Order dispatching has recently improved significantly thanks to advances in reinforcement learning, which has been shown to effectively address sequential decision-making problems like order dispatching. However, most existing reinforcement learning methods require agents to learn the optimal policy by interacting with environments online, which is challenging or impractical for real-world deployment due to high costs or safety concerns. For example, due to spatiotemporally unbalanced supply and demand, online reinforcement learning-based order dispatching may significantly impact the revenue of the ride-hailing platform and the passenger experience during the policy learning period. Hence, in this work, we develop an offline deep reinforcement learning framework called NondBREM for large-scale order dispatching, which learns a policy only from accumulated logged data to avoid costly and unsafe interactions with the environment. In NondBREM, a Nondeterministic Batch-Constrained Q-learning (NondBCQ) module is developed to reduce the algorithm's extrapolation error, and a Random Ensemble Mixture (REM) module that integrates multiple value networks with multi-head networks is used to improve model generalization and robustness. Extensive experiments on large-scale real-world ride-hailing datasets show the superiority of our design.
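
The core of a Random Ensemble Mixture, as introduced for offline RL by Agarwal et al. (2020), is to combine the Q-value estimates of several heads with random convex weights and use the same mixture in the TD target. The sketch below shows only that mixing step with plain lists; how NondBREM wires REM into its value and multi-head networks is not specified in the abstract, and the names `random_convex_weights`, `mixed_q`, and `rem_td_target` are hypothetical.

```python
import random
from typing import List, Sequence


def random_convex_weights(num_heads: int, rng: random.Random) -> List[float]:
    """Draw alpha_k ~ U(0, 1) and normalize so the weights are non-negative and sum to 1."""
    raw = [rng.random() for _ in range(num_heads)]
    total = sum(raw)
    return [r / total for r in raw]


def mixed_q(head_qs: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Convex combination of per-head Q-value vectors (one entry per action)."""
    num_actions = len(head_qs[0])
    return [sum(w * q[a] for w, q in zip(weights, head_qs)) for a in range(num_actions)]


def rem_td_target(reward: float, gamma: float, next_head_qs, weights) -> float:
    """One-step TD target using the same mixture weights on the next state's heads."""
    return reward + gamma * max(mixed_q(next_head_qs, weights))


rng = random.Random(0)
w = random_convex_weights(3, rng)
print(round(sum(w), 6))  # 1.0
print(mixed_q([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5]))  # [2.0, 3.0]
```

In training, a fresh set of weights would be drawn per mini-batch, and the TD loss would compare the mixed Q-value of the logged action with `rem_td_target` computed from a target network.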

AAAI Conference 2024 Conference Paper

OSFFNet: Omni-Stage Feature Fusion Network for Lightweight Image Super-Resolution

  • Yang Wang
  • Tao Zhang

Recently, several lightweight methods have been proposed to implement single-image super-resolution (SISR) on resource-constrained devices. However, these methods primarily focus on simplifying network structures without fully utilizing shallow features. Yet shallow features encompass crucial details for the super-resolution task, including edges, textures, and colors. It is therefore necessary to develop a novel architecture that can effectively integrate features from different levels and capitalize on their mutual complementarity. We first analyze the relationship between multi-stage features and the restoration task in a classic lightweight SR method. Based on these observations, we propose an Omni-Stage Feature Fusion (OSFF) architecture, which incorporates Original Image Stacked Initialisation, Shallow Feature Global Connection, and Multi-Receptive Field Dynamic Fusion. An Attention-Enhanced Feature Distillation module is also designed to enhance model performance. Finally, leveraging these contributions, we construct an Omni-Stage Feature Fusion Network (OSFFNet). Extensive experiments on various benchmark datasets show that the proposed model outperforms state-of-the-art methods. Notably, it achieves a 0.26 dB PSNR improvement over the second-best method for ×2 SR on the Urban100 dataset.
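
To put the 0.26 dB figure in perspective, recall that PSNR = 10·log10(MAX²/MSE), so at a fixed MAX a PSNR gain of g dB corresponds to an MSE ratio of 10^(−g/10). The sketch below computes both on flat pixel lists; the function names are hypothetical, and it ignores the colour-space conversion and border cropping that SR benchmarks often apply before measuring PSNR.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float], max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """MSE(new) / MSE(old) implied by a PSNR gain of gain_db at a fixed MAX."""
    return 10 ** (-gain_db / 10)


print(round(mse_ratio_for_gain(0.26), 4))  # 0.9419
```

In other words, a 0.26 dB gain means roughly 5.8% lower mean squared error on that benchmark.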

IJCAI Conference 2024 Conference Paper

Scene-Adaptive Person Search via Bilateral Modulations

  • Yimin Jiang
  • Huibing Wang
  • Jinjia Peng
  • Xianping Fu
  • Yang Wang

Person search aims to localize a specific target person from a gallery set of images with various scenes. As the scene around a moving pedestrian changes, the captured person images inevitably introduce considerable background and foreground noise into the person feature; this noise is completely unrelated to the person's identity and leads to severe performance degradation. To address this issue, we present a Scene-Adaptive Person Search (SEAS) model that introduces bilateral modulations to simultaneously eliminate scene noise and maintain a consistent person representation across various scenes. In SEAS, a Background Modulation Network (BMN) is designed to encode the feature extracted from the detected bounding box into a multi-granularity embedding, which reduces the input of background noise at multiple levels in a norm-aware manner. Additionally, to mitigate the effect of foreground noise on the person feature, SEAS introduces a Foreground Modulation Network (FMN) that computes a clutter reduction offset for the person embedding based on the feature map of the scene image. Through bilateral modulation of both background and foreground in an end-to-end manner, SEAS obtains consistent feature representations free of scene noise. SEAS achieves state-of-the-art (SOTA) performance on two benchmark datasets: 97.1% mAP on CUHK-SYSU and 60.5% mAP on PRW. The code is available at https://github.com/whbdmu/SEAS.

AAAI Conference 2024 Conference Paper

Test-Time Domain Adaptation by Learning Domain-Aware Batch Normalization

  • Yanan Wu
  • Zhixiang Chi
  • Yang Wang
  • Konstantinos N. Plataniotis
  • Songhe Feng

Test-time domain adaptation aims to adapt a model trained on source domains to unseen target domains using a few unlabeled images. Emerging research has shown that label information and domain information are embedded separately, in the weight matrices and in the batch normalization (BN) layers, respectively. Previous works typically update the whole network naively without explicitly decoupling label and domain knowledge, which leads to knowledge interference and defective distribution adaptation. In this work, we propose to reduce such learning interference and strengthen domain knowledge learning by manipulating only the BN layers. However, the normalization step in BN is intrinsically unstable when the statistics are re-estimated from a few samples. We find that this ambiguity can be greatly reduced by updating only the two affine parameters in BN while keeping the source-domain statistics. To further enhance domain knowledge extraction from unlabeled data, we construct an auxiliary branch with label-independent self-supervised learning (SSL) to provide supervision. Moreover, we propose a bi-level optimization based on meta-learning to align the learning objectives of the auxiliary and main branches. The goal is to use the auxiliary branch to adapt to the domain so that the main task benefits during subsequent inference. Our method keeps the same computational cost at inference, because the auxiliary branch can be discarded entirely after adaptation. Extensive experiments show that our method outperforms prior works on five WILDS real-world domain shift datasets. Our method can also be combined with label-dependent optimization methods to further push the performance boundary. Our code is available at https://github.com/ynanwu/MABN.
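
The core mechanism described above, normalizing with frozen source statistics while adapting only the BN affine parameters, can be sketched in plain Python. This is an illustrative sketch, not the authors' MABN code: the class name `AffineOnlyBatchNorm` is hypothetical, and the gradient of whatever test-time loss drives adaptation (in the paper, the self-supervised auxiliary objective, which is not reproduced here) is assumed to be supplied by the caller as `grad_out`.

```python
import math


class AffineOnlyBatchNorm:
    """Per-channel batch norm that never re-estimates its statistics.

    Inputs are always normalized with the frozen source-domain mean/variance;
    only the affine parameters gamma and beta are updated at test time.
    """

    def __init__(self, source_mean, source_var, eps=1e-5):
        self.mean = list(source_mean)          # frozen source statistics
        self.var = list(source_var)            # frozen source statistics
        self.eps = eps
        self.gamma = [1.0] * len(self.mean)    # adapted at test time
        self.beta = [0.0] * len(self.mean)     # adapted at test time

    def _normalize(self, sample):
        return [(x - m) / math.sqrt(v + self.eps)
                for x, m, v in zip(sample, self.mean, self.var)]

    def forward(self, batch):
        """batch: list of samples, each a list with one value per channel."""
        return [[g * xh + b for g, xh, b in zip(self.gamma, self._normalize(s), self.beta)]
                for s in batch]

    def sgd_step(self, batch, grad_out, lr):
        """One SGD step on gamma/beta, given dLoss/dOutput for every sample."""
        xhats = [self._normalize(s) for s in batch]
        for c in range(len(self.gamma)):
            grad_gamma = sum(go[c] * xh[c] for go, xh in zip(grad_out, xhats))
            grad_beta = sum(go[c] for go in grad_out)
            self.gamma[c] -= lr * grad_gamma
            self.beta[c] -= lr * grad_beta


bn = AffineOnlyBatchNorm(source_mean=[1.0], source_var=[4.0], eps=0.0)
print(bn.forward([[3.0]]))  # [[1.0]]: (3 - 1) / sqrt(4), then gamma=1, beta=0
```

Even a single-sample batch is normalized stably here, because the mean and variance are never recomputed from it. In frameworks such as PyTorch, a similar effect is commonly obtained by keeping BN layers in evaluation mode, so their running statistics are used, and leaving only their `weight` and `bias` parameters trainable.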

AAAI Conference 2024 Conference Paper

Test-Time Personalization with Meta Prompt for Gaze Estimation

  • Huan Liu
  • Julia Qi
  • Zhenhao Li
  • Mohammad Hassanpour
  • Yang Wang
  • Konstantinos N. Plataniotis
  • Yuanhao Yu

Despite recent remarkable achievements in gaze estimation, efficient and accurate personalization of gaze estimation without labels is a practical problem that is rarely addressed in the literature. To achieve efficient personalization, we take inspiration from recent advances in Natural Language Processing (NLP) and update a negligible number of parameters, "prompts", at test time. Specifically, the prompt is attached without perturbing the original network and can contain fewer than 1% of ResNet-18's parameters. Our experiments show that the prompt tuning approach is highly efficient: the proposed method can adapt 10 times faster than the compared methods. However, updating the prompt for personalized gaze estimation without labels is non-trivial. At test time, it is essential to ensure that minimizing a particular unsupervised loss also minimizes the gaze estimation error. To address this difficulty, we propose to meta-learn the prompt so that its updates align with this goal. Our experiments show that the meta-learned prompt can be adapted effectively even with a simple symmetry loss. In addition, experiments on four cross-dataset validations demonstrate the clear advantages of the proposed method.

AAAI Conference 2024 Conference Paper

Towards Dynamic Spatial-Temporal Graph Learning: A Decoupled Perspective

  • Binwu Wang
  • Pengkun Wang
  • Yudong Zhang
  • Xu Wang
  • Zhengyang Zhou
  • Lei Bai
  • Yang Wang

With the progress of urban transportation systems, a significant amount of high-quality traffic data is continuously collected in a streaming manner, which has propelled the field of spatial-temporal graph prediction. In this paper, rather than focusing solely on designing powerful models for static graphs, we shift our focus to spatial-temporal graph prediction in the dynamic scenario, where the underlying graph continuously expands and evolves. To address the inherent challenges, we propose a decoupled learning framework (DLF), which consists of a spatial-temporal graph learning network (DSTG) and a specialized decoupling training strategy. By incorporating the inductive biases of time-series structures, DSTG decomposes temporal dependencies into latent trend and seasonal terms. To enable prompt adaptation to the evolving distribution of the dynamic graph, our decoupling training strategy iteratively updates these two types of patterns. Specifically, to learn seasonal patterns, we train the model thoroughly on a long time series (e.g., three months of data), and we introduce a masked auto-encoding mechanism to enhance the model's learning ability. During this period, we frequently update trend patterns to incorporate new information from the dynamic graph. Considering both effectiveness and efficiency, we develop a subnet sampling strategy that selects a few representative nodes for fine-tuning the model weights. These sampled nodes cover both unseen patterns and previously learned patterns. Experiments on dynamic spatial-temporal graph datasets demonstrate the competitive performance, superior efficiency, and strong scalability of the proposed framework.
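
DSTG learns its trend/seasonal split inside the network, and the abstract does not give its exact form. As a simple stand-in that conveys the idea (not DSTG's mechanism), the classical moving-average decomposition splits a series into a smooth trend and a residual that carries the periodic, seasonal component. The function name `decompose` and the edge-padding choice are assumptions of this sketch.

```python
def decompose(series, window):
    """Split a series into a moving-average trend and a seasonal residual.

    The series is edge-padded by repeating its first/last value so the trend
    has the same length as the input; `window` must be a positive odd integer.
    """
    if not series:
        raise ValueError("series must be non-empty")
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    # Centered moving average: each trend value averages `window` neighbors.
    trend = [sum(padded[i:i + window]) / window for i in range(len(series))]
    # Whatever the trend does not explain is treated as the seasonal term.
    seasonal = [x - t for x, t in zip(series, trend)]
    return trend, seasonal
```

By construction, trend plus seasonal reproduces the input exactly, so the two terms can be updated on different schedules and recombined, which is the spirit of the decoupled training strategy described above.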

ICLR Conference 2023 Conference Paper

A Laplace-inspired Distribution on SO(3) for Probabilistic Rotation Estimation

  • Yingda Yin
  • Yang Wang
  • He Wang 0010
  • Baoquan Chen

Estimating the 3DoF rotation from a single RGB image is an important yet challenging problem. Probabilistic rotation regression has attracted increasing attention because it expresses uncertainty information along with the prediction. Although modeling noise with the Gaussian-like Bingham distribution or the matrix Fisher distribution is natural, these distributions have been shown to be sensitive to outliers because they penalize deviations quadratically. In this paper, we draw inspiration from the multivariate Laplace distribution and propose a novel Rotation Laplace distribution on SO(3). The Rotation Laplace distribution is robust to outliers and places larger gradients on the low-error region, resulting in better convergence. Extensive experiments show that the proposed distribution achieves state-of-the-art performance on rotation regression tasks, outperforming both probabilistic and non-probabilistic baselines. Our project page is at pku-epic.github.io/RotationLaplace.

NeurIPS Conference 2023 Conference Paper

CrossGNN: Confronting Noisy Multivariate Time Series Via Cross Interaction Refinement

  • Qihe Huang
  • Lei Shen
  • Ruixin Zhang
  • Shouhong Ding
  • Binwu Wang
  • Zhengyang Zhou
  • Yang Wang

Recently, multivariate time series (MTS) forecasting techniques have developed rapidly and found widespread applications across various fields. Transformer-based and GNN-based methods have shown promising potential due to their strong ability to model interactions across time and variables. However, a comprehensive analysis of real-world data shows that existing methods do not handle temporal fluctuations and heterogeneity between variables well. To address these issues, we propose CrossGNN, a linear-complexity GNN model that refines cross-scale and cross-variable interactions for MTS. To deal with unexpected noise in the time dimension, an adaptive multi-scale identifier (AMSI) constructs multi-scale time series with reduced noise. A Cross-Scale GNN is proposed to extract the scales with clearer trends and weaker noise, and a Cross-Variable GNN exploits the homogeneity and heterogeneity between different variables. By focusing on edges with higher saliency scores while constraining those with lower scores, CrossGNN achieves time and space complexity linear in the input sequence length $L$ (i.e., $O(L)$). Extensive experiments on 8 real-world MTS datasets demonstrate the effectiveness of CrossGNN compared with state-of-the-art methods.

IROS Conference 2023 Conference Paper

Design and Evaluation of Bidirectional Continuous Rotation and Variable Curvature Needle Steering Algorithm

  • Farid Tavakkolmoghaddam
  • Charles Bales
  • Yang Wang
  • Zhanyue Zhao
  • Gregory S. Fischer

The success rate of robotic-assisted needle-guided interventions, such as tissue biopsy and targeted drug delivery, relies heavily on the accuracy of needle placement. Tissue shift and needle tip deflection caused by needle-tissue interaction are among the factors that can adversely affect the outcome of these procedures. In this paper, we present a novel algorithm for robotically steered bevel-tip needles that provides variable needle curvatures by continuously controlling the needle's rotation speed in a bidirectional manner. Our algorithm extends the Continuous Rotation and Variable Curvature (CURV) algorithm and enables its use with wired sensorized needles. Additionally, we present algorithms for implementing the proposed method for closed-loop needle steering in robotic systems with image or sensor feedback. To validate our approach, we perform two benchtop needle insertion experiments in a gelatin phantom and ex vivo tissue. In the first experiment, we demonstrate the ability of the proposed algorithm to achieve variable curvatures and compare it with the CURV algorithm and with our simulation results. The second experiment studies the effect of unidirectional and bidirectional needle steering on tissue wind-up using a novel force collection setup. Our results highlight the ability of the proposed algorithm to achieve variable curvature profiles and suggest a potential advantage over the original method in reducing the imbalanced forces sensed at the load cell due to needle-tissue friction buildup.

YNICL Journal 2023 Journal Article

Longitudinal alterations in cerebral perfusion following a season of adolescent contact sport participation compared to non-contact athletes

  • Benjamin L. Brett
  • Alex D. Cohen
  • Michael A. McCrea
  • Yang Wang

BACKGROUND: Cerebral blood flow (CBF) change, a non-invasive marker of head injury, has yet to be thoroughly investigated as a potential consequence of repetitive head impacts (RHI) from contact sport participation in youth athletes. We examined pre- to post-season differences in relative CBF (rCBF), arterial transit time (ATT), and neurocognition between adolescent contact sport (CS; 79.4% of whom were football players) and non-contact sport (NCS) athletes. METHODS: Adolescent athletes (N = 57; age = 14.70 ± 1.97) completed pre- and post-season clinical assessments and neuroimaging. Brain perfusion was evaluated using an advanced 3D pseudo-continuous ASL sequence with Hadamard-encoded multiple post-labeling delays. Mixed-effect models tested group-by-time interactions for rCBF, ATT, and neurocognition. RESULTS: A significant group-by-time interaction was observed for rCBF in a cluster consisting primarily of frontal and parietal lobe regions, with regional rCBF increasing in CS and decreasing among NCS athletes. No significant interaction was observed for ATT. A significant group-by-time interaction was observed for verbal memory and visual motor speed, with NCS athletes improving and CS athletes exhibiting lower performance from pre- to post-season in comparison. CONCLUSIONS: Alterations in rCBF and variability in cognition, but not purported neurovascular changes (measured by ATT), were observed following one season of CS participation. Further study of the clinical meaningfulness of these findings, as they relate to adverse long-term outcomes, is needed.

AAAI Conference 2023 Conference Paper

MetaZSCIL: A Meta-Learning Approach for Generalized Zero-Shot Class Incremental Learning

  • Yanan Wu
  • Tengfei Liang
  • Songhe Feng
  • Yi Jin
  • Gengyu Lyu
  • Haojun Fei
  • Yang Wang

Generalized zero-shot learning (GZSL) aims to recognize samples whose categories may not have been seen during training. Standard GZSL cannot handle the dynamic addition of new seen and unseen classes. To address this limitation, some recent attempts have been made to develop continual GZSL methods. However, these methods require end-users to continuously collect and annotate numerous seen-class samples, which is unrealistic and limits their real-world applicability. Accordingly, in this paper, we propose a more practical and challenging setting named Generalized Zero-Shot Class Incremental Learning (CI-GZSL). Our setting aims to incrementally learn unseen classes without any training samples, while recognizing all classes previously encountered. We further propose a bi-level meta-learning based method called MetaZSCIL that directly optimizes the network to learn how to learn incrementally. Specifically, we sample sequential tasks from seen classes during offline training to simulate the incremental learning process. For each task, the model is learned with a meta-objective so that it is capable of fast adaptation without forgetting. Note that our optimization can be flexibly combined with most existing generative methods to tackle CI-GZSL. This work introduces a feature generative framework that leverages visual feature distribution alignment to produce replayed samples of previously seen classes, reducing catastrophic forgetting. Extensive experiments on five widely used benchmarks demonstrate the superiority of the proposed method.

TIST Journal 2023 Journal Article

Recent Few-shot Object Detection Algorithms: A Survey with Performance Comparison

  • Tianying Liu
  • Lu Zhang
  • Yang Wang
  • Jihong Guan
  • Yanwei Fu
  • Jiajia Zhao
  • Shuigeng Zhou

The generic object detection (GOD) task has been successfully tackled by recent deep neural networks trained on an avalanche of annotated samples from some common classes. However, it is still non-trivial to generalize these object detectors to novel long-tailed object classes that have only a few labeled training samples. To this end, Few-Shot Object Detection (FSOD) has recently become topical, as it mimics the human ability of learning to learn and intelligently transfers learned generic object knowledge from the common heavy-tailed classes to the novel long-tailed object classes. In particular, research in this emerging field has flourished in recent years, with various benchmarks, backbones, and methodologies proposed. Several insightful FSOD survey articles [58, 59, 74, 78] review these works by systematically studying and comparing them as fine-tuning/transfer learning methods and meta-learning methods. In contrast, we review existing FSOD algorithms from a new perspective, under a new taxonomy based on their contributions: data-oriented, model-oriented, and algorithm-oriented. On this basis, we conduct a comprehensive survey with performance comparison of recent FSOD achievements. We also analyze the technical challenges and the merits and demerits of these methods, and envision future directions for FSOD. Specifically, we give an overview of FSOD, including the problem definition, common datasets, and evaluation protocols. We then propose a taxonomy that groups FSOD methods into three types and, following it, provide a systematic review of advances in FSOD. Finally, we present further discussion of performance, challenges, and future directions.

AAAI Conference 2023 Conference Paper

Rethinking Data-Free Quantization as a Zero-Sum Game

  • Biao Qian
  • Yang Wang
  • Richang Hong
  • Meng Wang

Data-free quantization (DFQ) recovers the performance of a quantized network (Q) without accessing real data; instead, it generates fake samples via a generator (G) that learns from the full-precision network (P). However, this sample generation process is entirely independent of Q: it fails to consider the adaptability of the generated samples to the learning process of Q (i.e., whether they are beneficial or adversarial), resulting in non-negligible performance loss. This raises several crucial questions that motivate us to revisit DFQ: how can the sample adaptability to Q be measured and exploited under varied bit-width scenarios, and how can samples with desirable adaptability be generated to benefit the quantized network? In this paper, we answer these questions from a game-theory perspective, specializing DFQ as a zero-sum game between two players, a generator and a quantized network, and further propose an Adaptability-aware Sample Generation (AdaSG) method. Technically, AdaSG reformulates DFQ as a dynamic maximization-vs-minimization game process anchored on sample adaptability. The maximization process aims to generate samples with desirable adaptability, and this adaptability is then reduced by the minimization process after Q is calibrated for performance recovery. The Balance Gap is defined to guide the stationarity of the game process so as to maximally benefit Q. Theoretical analysis and empirical studies verify the superiority of AdaSG over state-of-the-art methods. Our code is available at https://github.com/hfutqian/AdaSG.

JBHI Journal 2023 Journal Article

Stress Detection Through Wrist-Based Electrodermal Activity Monitoring and Machine Learning

  • Lili Zhu
  • Petros Spachos
  • Pai Chet Ng
  • Yuanhao Yu
  • Yang Wang
  • Konstantinos Plataniotis
  • Dimitrios Hatzinakos

Stress is an inevitable part of modern life. While stress can negatively impact a person's life and health, positive stress that is kept under control can also help people generate creative solutions to problems encountered in daily life. Although it is hard to eliminate stress, we can learn to monitor and control its physical and psychological effects. It is essential to provide feasible and immediate solutions that expand mental health counselling and support programs to help people relieve stress and improve their mental health. Popular wearable devices, such as smartwatches with multiple sensing capabilities including physiological signal monitoring, can help alleviate this problem. This work investigates the feasibility of using wrist-based electrodermal activity (EDA) signals collected from wearable devices to predict people's stress status, and identifies possible factors affecting stress classification accuracy. We use data collected from wrist-worn devices to examine binary classification discriminating stress from non-stress. Five machine learning-based classifiers were examined, and we explore classification performance on four available EDA databases under different feature selections. According to the results, the Support Vector Machine (SVM) outperforms the other machine learning approaches, with an accuracy of 92.9% for stress prediction. Additionally, when subject classification included gender information, the performance analysis showed significant differences between males and females. We further examine a multimodal approach for stress classification. The results indicate that wearable devices with EDA sensors have great potential to provide helpful insight for improved mental health monitoring.

JMLR Journal 2022 Journal Article

An Error Analysis of Generative Adversarial Networks for Learning Distributions

  • Jian Huang
  • Yuling Jiao
  • Zhen Li
  • Shiao Liu
  • Yang Wang
  • Yunfei Yang

This paper studies how well generative adversarial networks (GANs) learn probability distributions from finite samples. Our main results establish the convergence rates of GANs under a collection of integral probability metrics defined through Hölder classes, including the Wasserstein distance as a special case. We also show that GANs are able to adaptively learn data distributions that have low-dimensional structures or Hölder densities, when the network architectures are chosen properly. In particular, for distributions concentrated around a low-dimensional set, we show that the learning rates of GANs depend not on the high ambient dimension but on the lower intrinsic dimension. Our analysis is based on a new oracle inequality that decomposes the estimation error into the generator and discriminator approximation errors and the statistical error, which may be of independent interest.
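
The Wasserstein distance mentioned above (W1, the integral probability metric taken over 1-Lipschitz test functions) has a simple closed form in one dimension, which makes it easy to compute between two empirical samples. The minimal sketch below assumes both samples have the same number of points; it illustrates the metric only, not the paper's error analysis, and the function name is illustrative.

```python
def wasserstein1_1d(xs, ys):
    """W1 distance between two equal-size empirical samples on the real line.

    In 1-D the optimal transport plan pairs order statistics, so W1 equals the
    mean absolute difference between the sorted samples.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Shifting every point by 1 moves the whole distribution by W1 = 1.
print(wasserstein1_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # 1.0
```

Sorting makes the cost O(n log n); in higher dimensions no such shortcut exists, which is one reason convergence rates in this setting depend on the dimension.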

AIJ Journal 2022 Journal Article

Bayesian feature interaction selection for factorization machines

  • Yifan Chen
  • Yang Wang
  • Pengjie Ren
  • Meng Wang
  • Maarten de Rijke

Factorization machines are a generic supervised method for a wide range of tasks in artificial intelligence, such as prediction and inference, and can effectively model feature interactions. However, handling combinations of features is expensive because the number of feature interactions grows exponentially with the interaction order. In practice, not all feature interactions are equally useful for prediction. Recently, many methods that perform feature interaction selection have attracted considerable attention because of their effectiveness at filtering out useless feature interactions. Current feature interaction selection methods suffer from the following limitations: (1) they assume that all users share the same feature interactions; and (2) they select only pairwise feature interactions. In this paper, we propose novel Bayesian variable selection methods targeting feature interaction selection for factorization machines, which effectively reduce the number of interactions. We study personalized feature interaction selection to account for individual preferences, and further extend the model to investigate higher-order feature interaction selection on higher-order factorization machines. We provide empirical evidence for the advantages of the proposed Bayesian feature interaction selection methods on different prediction tasks.
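
For context, the model whose interactions are being selected is the standard second-order factorization machine. The sketch below (plain Python, not the paper's Bayesian selection method) shows its prediction and the well-known reformulation that computes all pairwise interaction terms in O(kn) rather than O(kn^2) time; a naive double loop is included for comparison. Function names and the example data are illustrative.

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction in O(k * n) time.

    x: n feature values; w0: global bias; w: n linear weights;
    V: n latent factor vectors, each of length k. Uses the identity
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2].
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0]) if V else 0
    pairwise = 0.0
    for f in range(k):
        terms = [V[i][f] * x[i] for i in range(len(x))]
        total = sum(terms)
        pairwise += 0.5 * (total * total - sum(t * t for t in terms))
    return linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same model with an explicit O(k * n^2) loop over feature pairs."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            pairwise += dot * x[i] * x[j]
    return linear + pairwise


x, w0, w = [1.0, 2.0, 1.0], 0.5, [1.0, 0.0, -1.0]
V = [[1.0, 2.0], [3.0, 1.0], [1.0, 1.0]]
print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))  # 21.5 21.5
```

Every pair (i, j) contributes through the shared factors, which is exactly the set of terms that interaction selection methods like the one above try to prune.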

IJCAI Conference 2022 Conference Paper

Beyond Homophily: Structure-aware Path Aggregation Graph Neural Network

  • Yifei Sun
  • Haoran Deng
  • Yang Yang
  • Chunping Wang
  • Jiarong Xu
  • Renhong Huang
  • Linfeng Cao
  • Yang Wang

Graph neural networks (GNNs) have been intensively studied for various real-world tasks. However, the homophily assumption underlying GNNs' aggregation functions limits their representation learning ability on heterophily graphs. In this paper, we shed light on path-level patterns in graphs, which can explicitly reflect rich semantic and structural information. We therefore propose a novel Structure-aware Path Aggregation Graph Neural Network (PathNet) that generalizes GNNs to both homophily and heterophily graphs. Specifically, we first introduce a maximal entropy path sampler, which samples a number of paths containing structural context. We then introduce a structure-aware recurrent cell, consisting of order-preserving and distance-aware components, to learn the semantic information of neighborhoods. Finally, after path encoding, we model the preference of the target node for different paths. Experimental results demonstrate that our model achieves superior performance in node classification on both heterophily and homophily graphs.

NeurIPS Conference 2022 Conference Paper

DGraph: A Large-Scale Financial Dataset for Graph Anomaly Detection

  • Xuanwen Huang
  • Yang Yang
  • Yang Wang
  • Chunping Wang
  • Zhisheng Zhang
  • Jiarong Xu
  • Lei Chen
  • Michalis Vazirgiannis

Graph Anomaly Detection (GAD) has recently become a hot research topic due to its practicality and theoretical value. Since GAD emphasizes application and anomalous samples are rare, enriching the variety of its datasets is fundamental. Thus, this paper presents DGraph, a real-world dynamic graph in the finance domain. DGraph overcomes many limitations of current GAD datasets. It contains about 3M nodes, 4M dynamic edges, and 1M ground-truth nodes. We provide a comprehensive observation of DGraph, revealing that anomalous nodes and normal nodes generally differ in structure, neighbor distribution, and temporal dynamics. It also suggests that the 2M background nodes are essential for detecting fraudsters. Furthermore, we conduct extensive experiments on DGraph. Observations and experiments demonstrate that DGraph can propel GAD research and enable in-depth exploration of anomalous nodes.

NeurIPS Conference 2022 Conference Paper

Domain Generalization by Learning and Removing Domain-specific Features

  • Yu Ding
  • Lei Wang
  • Bin Liang
  • Shuming Liang
  • Yang Wang
  • Fang Chen

Deep Neural Networks (DNNs) suffer from domain shift when the test dataset follows a distribution different from the training dataset. Domain generalization aims to tackle this issue by learning a model that can generalize to unseen domains. In this paper, we propose a new approach that aims to explicitly remove domain-specific features for domain generalization. Following this approach, we propose a novel framework called Learning and Removing Domain-specific features for Generalization (LRDG) that learns a domain-invariant model by tactically removing domain-specific features from the input images. Specifically, we design a classifier to effectively learn the domain-specific features for each source domain, respectively. We then develop an encoder-decoder network to map each input image into a new image space where the learned domain-specific features are removed. With the images output by the encoder-decoder network, another classifier is designed to learn the domain-invariant features to conduct image classification. Extensive experiments demonstrate that our framework achieves superior performance compared with state-of-the-art methods.

YNICL Journal 2022 Journal Article

Frequency-dependent white-matter functional network changes associated with cognitive deficits in subcortical vascular cognitive impairment

  • Juanwei Ma
  • Feng Liu
  • Yang Wang
  • Lin Ma
  • Yali Niu
  • Jing Wang
  • Zhaoxiang Ye
  • Jing Zhang

Vascular cognitive impairment (VCI) refers to all forms of cognitive decline associated with cerebrovascular diseases, in which white matter (WM) is highly vulnerable. Although previous studies have shown that blood oxygen level-dependent (BOLD) signals inside WM can effectively reflect neural activity, it remains largely unknown whether WM BOLD signal alterations are present in VCI and what role they play in its cognitive impairment. In this study, 36 subcortical VCI (SVCI) patients and 36 healthy controls were enrolled to evaluate WM dysfunction. Specifically, fourteen distinct WM networks were identified from resting-state functional MRI using K-means clustering analysis. Subsequently, between-network functional connectivity (FC) and within-network BOLD signal amplitude of WM networks were calculated in three frequency bands (band A: 0.01-0.15 Hz, band B: 0.08-0.15 Hz, and band C: 0.01-0.08 Hz). Patients with SVCI showed decreased FC mainly in bilateral parietal WM regions, the forceps major, and the superior and inferior longitudinal fasciculi. These connections extensively linked distinct WM networks with one another and with gray-matter networks such as the frontoparietal control, dorsal attention, and ventral attention networks, and exhibited frequency-specific alterations in SVCI. Additionally, extensive amplitude reductions were found in SVCI, showing frequency-dependent properties in the parietal, anterior corona radiata, pre/postcentral, and superior and inferior longitudinal fasciculus networks. Furthermore, the decreased FC and amplitudes showed significant positive correlations with cognitive performance in SVCI and high diagnostic performance for SVCI, especially when all bands were combined. Our study indicates that VCI-related cognitive deficits are characterized by frequency-dependent WM functional abnormalities, offering novel applicable neuromarkers for VCI.
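
The study measures within-network BOLD amplitude separately in three frequency bands, but the abstract does not give the exact amplitude definition. A minimal ALFF-style sketch, assuming amplitude is the mean single-sided DFT magnitude over the frequency bins that fall inside a band and that `tr` is the repetition time in seconds, could look like this; the function name and the `BANDS` table are illustrative, with band limits taken from the abstract.

```python
import math


def band_amplitude(signal, tr, low_hz, high_hz):
    """Mean single-sided DFT amplitude of a time series within [low_hz, high_hz].

    signal: one BOLD time series; tr: repetition time in seconds (fs = 1 / tr).
    The series is mean-centered first; the Nyquist bin is excluded.
    """
    n = len(signal)
    if n < 3:
        raise ValueError("need at least three time points")
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    amplitudes = []
    for k in range(1, (n - 1) // 2 + 1):
        freq = k / (n * tr)  # frequency of DFT bin k in Hz
        if low_hz <= freq <= high_hz:
            re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(centered))
            im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(centered))
            amplitudes.append(2.0 * math.hypot(re, im) / n)
    return sum(amplitudes) / len(amplitudes) if amplitudes else 0.0


# Band limits from the abstract, in Hz.
BANDS = {"A": (0.01, 0.15), "B": (0.08, 0.15), "C": (0.01, 0.08)}
```

A pure 0.05 Hz oscillation sampled at TR = 2 s contributes to bands A and C but not to band B, which is the kind of frequency-specific contrast the study reports.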

NeurIPS Conference 2022 Conference Paper

Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts

  • Tao Zhong
  • Zhixiang Chi
  • Li Gu
  • Yang Wang
  • Yuanhao Yu
  • Jin Tang

In this paper, we tackle the problem of domain shift. Most existing methods train a single model on multiple source domains and apply the same trained model to all unseen target domains. Such solutions are sub-optimal because each target domain has its own characteristics, to which the model is not adapted. Furthermore, expecting single-model training to learn extensive knowledge from multiple source domains is counterintuitive: such a model is biased toward learning only domain-invariant features, which may result in negative knowledge transfer. In this work, we propose a novel framework for unsupervised test-time adaptation, formulated as a knowledge distillation process to address domain shift. Specifically, we incorporate a Mixture-of-Experts (MoE) as teachers, where each expert is trained separately on a different source domain to maximize its specialty. Given a test-time target domain, a small set of unlabeled data is sampled to query the knowledge of the MoE. As the source domains are correlated with the target domains, a transformer-based aggregator then combines the domain knowledge by examining the interconnections among them. The output is treated as a supervision signal to adapt a student prediction network toward the target domain. We further employ meta-learning to enforce the aggregator to distill positive knowledge and the student network to achieve fast adaptation. Extensive experiments demonstrate that the proposed method outperforms the state of the art and validate the effectiveness of each proposed component. Our code is available at https://github.com/n3il666/Meta-DMoE.
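
The abstract treats the aggregated teacher output as a supervision signal for the student but does not specify the loss. One common choice in knowledge distillation, shown here as an assumption rather than Meta-DMoE's actual objective, is the KL divergence between temperature-softened teacher and student class distributions, scaled by T^2. Function names and the default temperature are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """T^2 * KL(teacher || student) on temperature-softened class distributions."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher(s)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's softened distribution and positive otherwise; the T^2 factor keeps gradient magnitudes roughly comparable across temperatures.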

AAAI Conference 2022 Conference Paper

ProgressiveMotionSeg: Mutually Reinforced Framework for Event-Based Motion Segmentation

  • Jinze Chen
  • Yang Wang
  • Yang Cao
  • Feng Wu
  • Zheng-Jun Zha

Dynamic Vision Sensors (DVS) asynchronously output events reflecting the apparent motion of objects with microsecond resolution, and show great application potential in monitoring and other fields. However, the event stream output by existing DVS inevitably contains background activity noise (BA noise) caused by dark current and junction leakage current, which disrupts the temporal correlation of objects and deteriorates motion estimation performance. In particular, existing filter-based denoising methods cannot be directly applied to suppress noise in the event stream, since the events lack spatial correlation. To address this issue, this paper presents a novel progressive framework in which a Motion Estimation (ME) module and an Event Denoising (ED) module are jointly optimized in a mutually reinforced manner. Specifically, based on the maximum sharpness criterion, the ME module divides the input events into several segments by adaptive clustering in a motion-compensating warp field, and captures the temporal correlation of the event stream according to the clustered motion parameters. Taking the temporal correlation as guidance, the ED module calculates the confidence that each event is a real activity event and transmits it to the ME module to update the energy function of motion segmentation for noise suppression. The two steps are iterated until stable motion segmentation results are obtained. Extensive experimental results on both synthetic and real datasets demonstrate the superiority of the proposed approach over state-of-the-art (SOTA) methods.
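
The maximum sharpness criterion can be illustrated with a simplified contrast-maximization sketch: warp each event back to a reference time under a candidate motion, accumulate the warped events into pixels, and prefer the motion that yields the sharpest image. This is an assumption-laden illustration, not the paper's ME module: it uses a single global constant-velocity model, integer-pixel binning, and a small grid of candidate velocities, and it omits clustering and the ED module. Over a fixed pixel grid with a fixed number of events, maximizing the variance of the event image is equivalent to maximizing the sum of squared pixel counts, which is what the code computes. Function names and the example events are hypothetical.

```python
def sharpness(events, velocity, t_ref=0.0):
    """Sum of squared pixel counts after warping events to time t_ref.

    events: iterable of (x, y, t); velocity: (vx, vy) in pixels per time unit.
    """
    vx, vy = velocity
    counts = {}
    for x, y, t in events:
        # Undo the candidate motion so events from one edge land on one pixel.
        pixel = (round(x - vx * (t - t_ref)), round(y - vy * (t - t_ref)))
        counts[pixel] = counts.get(pixel, 0) + 1
    return sum(c * c for c in counts.values())


def best_velocity(events, candidates):
    """Return the candidate velocity whose warped event image is sharpest."""
    return max(candidates, key=lambda v: sharpness(events, v))


# Hypothetical events from one point moving at 2 px per time unit along x.
events = [(10 + 2 * t, 5, t) for t in range(5)]
print(best_velocity(events, [(0, 0), (1, 0), (2, 0), (3, 0)]))  # (2, 0)
```

BA noise events do not follow any object's motion, so they stay scattered under every candidate warp; this is why a per-event confidence from the denoising step can help the sharpness objective.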

AAAI Conference 2022 Conference Paper

Self-Supervised Spatiotemporal Representation Learning by Exploiting Video Continuity

  • Hanwen Liang
  • Niamul Quader
  • Zhixiang Chi
  • Lizhe Chen
  • Peng Dai
  • Juwei Lu
  • Yang Wang

Recent self-supervised video representation learning methods have found significant success by exploring essential properties of videos, e.g., speed and temporal order. This work exploits an essential yet under-explored property of videos, video continuity, to obtain supervision signals for self-supervised representation learning. Specifically, we formulate three novel continuity-related pretext tasks, i.e., continuity justification, discontinuity localization, and missing section approximation, which jointly supervise a shared backbone for video representation learning. This self-supervision approach, termed Continuity Perception Network (CPNet), solves the three tasks together and encourages the backbone network to learn local and long-range motion and context representations. It outperforms prior art on multiple downstream tasks, such as action recognition, video retrieval, and action localization. Additionally, video continuity can be complementary to other coarse-grained video properties for representation learning, and integrating the proposed pretext tasks into prior methods can yield considerable performance gains.
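All three pretext tasks need clips in which a section of frames may have been removed. A minimal sketch of how such training samples could be drawn from frame indices follows; the helper name, the 50/50 split between continuous and discontinuous clips, and the single gap per clip are assumptions for illustration, not CPNet's published sampling procedure. The returned `cut` position can serve as a discontinuity-localization target, and `missing` lists the frames a missing-section head would have to approximate.

```python
import random


def make_continuity_sample(num_frames, clip_len, gap_len, rng):
    """Draw one hypothetical pretext sample from a video of num_frames frames.

    Returns (indices, is_continuous, cut, missing):
      indices:       clip_len frame indices fed to the backbone
      is_continuous: target for continuity justification
      cut:           clip position where the gap starts (None if continuous)
      missing:       skipped frame indices (empty if continuous)
    """
    span = clip_len + gap_len
    if clip_len < 2 or num_frames < span:
        raise ValueError("need clip_len >= 2 and num_frames >= clip_len + gap_len")
    start = rng.randrange(num_frames - span + 1)
    if rng.random() < 0.5:
        return list(range(start, start + clip_len)), True, None, []
    cut = rng.randrange(1, clip_len)  # keep at least one frame on each side of the gap
    before = list(range(start, start + cut))
    missing = list(range(start + cut, start + cut + gap_len))
    after = list(range(start + cut + gap_len, start + span))
    return before + after, False, cut, missing


rng = random.Random(0)
indices, is_continuous, cut, missing = make_continuity_sample(100, 16, 4, rng)
print(len(indices), is_continuous, cut, missing)
```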

YNIMG Journal 2021 Journal Article

Fronto-occipital mismatch responses in pre-attentive detection of visual changes: Implication on a generic brain network underlying Mismatch Negativity (MMN)

  • Chun-Yu Tse
  • Yu-Hei Shum
  • Xue-Zhen Xiao
  • Yang Wang

Current theories of pre-attentive change detection suggest a regularity or prediction violation mechanism involving a frontotemporal network. Modulations of the early inferior frontal cortex (IFC) mismatch response, representing the effort in comparing a stimulus to the prediction, the superior temporal cortex (STC) response, indicating deviance detection, and the late IFC response, representing prediction model updating, were consistently demonstrated in auditory change detection using the event-related optical signal (EROS). If the prediction violation hypothesis is universal, a generic neural mechanism should be found in all sensory modalities. We postulated a generic fronto-sensory cortical network underlying the prediction violation mechanism: the IFC is responsible for non-modality-specific prediction processes, while the sensory cortices are responsible for modality-specific error signal generation. This study examined the involvement of the IFC-occipital cortex (OC) network in visual pre-attentive change detection. The EROS mismatch responses to deviant bar arrays violating a fixed orientation regularity (low in regularity abstractness) were compared to those of deviants violating a rotational orientation regularity (high in abstractness), while the information available for establishing the prediction model was manipulated by varying the number of standards preceding the deviants. Modulations of the IFC-OC mismatch response patterns by abstractness and train length reflected the processing demands on the prediction processes and were similar to those of the IFC-STC network in auditory change detection. These findings demonstrated that the fronto-sensory cortical network is not unique to auditory pre-attentive change detection and provided support for a universal neural mechanism across sensory modalities, as suggested by the prediction violation hypothesis.

NeurIPS Conference 2021 Conference Paper

Generalized Linear Bandits with Local Differential Privacy

  • Yuxuan Han
  • Zhipeng Liang
  • Yang Wang
  • Jiheng Zhang

Contextual bandit algorithms are useful for personalized online decision-making. However, many applications, such as personalized medicine and online advertising, require the use of individual-specific information for effective learning, while users' data should remain private from the server due to privacy concerns. This motivates introducing local differential privacy (LDP), a stringent notion of privacy, to contextual bandits. In this paper, we design LDP algorithms for stochastic generalized linear bandits that achieve the same regret bound as in non-private settings. Our main idea is to develop a stochastic gradient-based estimator and update mechanism to ensure LDP. We then exploit the flexibility of stochastic gradient descent (SGD), whose theoretical guarantees for bandit problems are rarely explored, in dealing with generalized linear bandits. We also develop an estimator and update mechanism based on Ordinary Least Squares (OLS) for linear bandits. Finally, we conduct experiments on both simulated and real-world datasets to demonstrate the consistently superb performance of our algorithms under LDP constraints with reasonably small parameters $(\varepsilon, \delta)$ that ensure strong privacy protection.
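The abstract does not detail its privatization mechanism. A standard way to make each user's contribution (ε, δ)-LDP is to clip the per-user gradient and add Gaussian noise before it leaves the user, so the server only ever sees noisy gradients. The sketch below assumes a logistic link, a hypothetical L2 clipping bound, and the classic Gaussian-mechanism calibration σ = Δ·sqrt(2 ln(1.25/δ))/ε (valid for ε < 1). It illustrates that general recipe, not the authors' estimator or its regret analysis.

```python
import math
import random


def gaussian_sigma(epsilon, delta, l2_sensitivity):
    """Noise scale of the classic Gaussian mechanism (valid for 0 < epsilon < 1)."""
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("this calibration assumes 0 < epsilon < 1 and 0 < delta < 1")
    return l2_sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


def clip_l2(vec, bound):
    """Scale vec down so that its Euclidean norm is at most bound."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= bound:
        return list(vec)
    return [v * bound / norm for v in vec]


def private_logistic_gradient(x, reward, theta, epsilon, delta, clip_bound, rng):
    """User side: clipped logistic-loss gradient plus Gaussian noise.

    Any two clipped gradients differ by at most 2 * clip_bound in L2 norm,
    which is the sensitivity used to calibrate the noise.
    """
    z = sum(xi * ti for xi, ti in zip(x, theta))
    mu = 0.5 * (1.0 + math.tanh(0.5 * z))  # logistic sigmoid, written to avoid overflow
    grad = clip_l2([(mu - reward) * xi for xi in x], clip_bound)
    sigma = gaussian_sigma(epsilon, delta, 2 * clip_bound)
    return [g + rng.gauss(0.0, sigma) for g in grad]


def sgd_step(theta, noisy_grad, lr):
    """Server side: a plain SGD update using only the privatized gradient."""
    return [t - lr * g for t, g in zip(theta, noisy_grad)]


rng = random.Random(0)
theta = [0.0, 0.0, 0.0]
for _ in range(5):
    noisy = private_logistic_gradient([1.0, 0.5, -0.2], 1.0, theta, 0.5, 1e-5, 1.0, rng)
    theta = sgd_step(theta, noisy, lr=0.01)
print(theta)
```

Clipping is what makes the sensitivity finite; without it, a single user's gradient could be arbitrarily large and no fixed noise scale would suffice.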

YNIMG Journal 2021 Journal Article

Improved resting state functional connectivity sensitivity and reproducibility using a multiband multi-echo acquisition

  • Alexander D. Cohen
  • Baolian Yang
  • Brice Fernandez
  • Suchandrima Banerjee
  • Yang Wang

Recent advances in functional MRI techniques include multiband (MB) imaging and multi-echo (ME) imaging. In MB imaging, multiple slices are acquired simultaneously, leading to significant increases in temporal and spatial resolution. Multi-echo imaging enables multiple echoes to be acquired in one shot, and the ME images can be used to denoise the BOLD time series and increase BOLD sensitivity. In this study, resting state fMRI (rs-fMRI) data were collected using a combined MBME sequence and compared to an MB single-echo sequence. In total, 29 subjects were imaged, and 18 of them returned within two weeks for repeat imaging. Participants underwent one MBME scan with three echoes and one MB scan with one echo. Both datasets were processed using standard denoising and advanced denoising. Advanced denoising included multi-echo independent component analysis (ME-ICA) for the MBME data and ICA-AROMA for the MB data. Resting state functional connectivity (RSFC) was evaluated using both selective seed-based and whole grey matter (GM) region-of-interest (ROI)-based approaches. The reproducibility of connectivity metrics was also analyzed in the repeat subjects. In addition, functional connectivity density (FCD), a data-driven approach that counts the number of significant connections with each voxel, both within a local cluster and globally, was analyzed. Regardless of the standard or advanced denoising technique, all seed-based RSFC was significantly higher for MBME compared to MB. Many more GM ROI combinations showed significantly higher RSFC for MBME vs. MB. Reproducibility, evaluated using the Dice coefficient, was significantly higher for MBME relative to MB data. Finally, FCD was also higher for MBME vs. MB data. This study showed higher RSFC for MBME vs. MB data using selected seed-based, whole GM ROI-based, and data-driven approaches. Reproducibility was also higher for MBME data.
Taken together, these results indicate that MBME is a promising technique for rs-fMRI.
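Reproducibility in this study is scored with the Dice coefficient, 2|A ∩ B| / (|A| + |B|), between two binarized maps. A minimal sketch of that computation, assuming each session's connectivity map has already been flattened to a list of voxel values and binarized with a hypothetical threshold (the study's exact thresholding is not given here):

```python
def binarize(values, threshold):
    """Mark voxels whose connectivity exceeds a (hypothetical) threshold."""
    return [v > threshold for v in values]


def dice_coefficient(mask_a, mask_b):
    """Dice = 2|A & B| / (|A| + |B|) for two equal-length boolean masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must cover the same voxels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(mask_a) + sum(mask_b)
    return 2 * overlap / total if total else 1.0  # two empty maps: treated as identical


session_1 = [0.1, 0.5, 0.7, 0.2, 0.9]
session_2 = [0.3, 0.6, 0.1, 0.2, 0.8]
print(dice_coefficient(binarize(session_1, 0.4), binarize(session_2, 0.4)))  # 0.8
```

Returning 1.0 for two empty masks is a convention chosen here; other implementations return 0 or raise instead.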

NeurIPS Conference 2021 Conference Paper

Non-asymptotic Error Bounds for Bidirectional GANs

  • Shiao Liu
  • Yunfei Yang
  • Jian Huang
  • Yuling Jiao
  • Yang Wang

We derive nearly sharp bounds for the bidirectional GAN (BiGAN) estimation error under the Dudley distance between the latent joint distribution and the data joint distribution, with an appropriately specified architecture of the neural networks used in the model. To the best of our knowledge, this is the first theoretical guarantee for the bidirectional GAN learning approach. An appealing feature of our results is that they do not assume the reference and data distributions to have the same dimension or to have bounded support. These assumptions are commonly made in existing convergence analyses of unidirectional GANs but may not be satisfied in practice. Our results are also applicable to the Wasserstein bidirectional GAN if the target distribution is assumed to have bounded support. To prove these results, we construct neural network functions that push forward an empirical distribution to another arbitrary empirical distribution on a possibly different-dimensional space. We also develop a novel decomposition of the integral probability metric for the error analysis of bidirectional GANs. These basic theoretical results are of independent interest and can be applied to other related learning problems.

AAAI Conference 2021 Conference Paper

Partial-Label and Structure-constrained Deep Coupled Factorization Network

  • Yan Zhang
  • Zhao Zhang
  • Yang Wang
  • Zheng Zhang
  • Li Zhang
  • Shuicheng Yan
  • Meng Wang

In this paper, we propose an enriched prior guided framework, called Dual-constrained Deep Semi-Supervised Coupled Factorization Network (DS2CF-Net), for discovering hierarchical coupled data representations. To extract hidden deep features, DS2CF-Net is formulated as a partial-label and geometrical structure-constrained framework. Specifically, DS2CF-Net designs a deep factorization architecture using multiple layers of linear transformations, which update both the basis vectors and the new representations in each layer in a coupled manner. To make the learned deep representations and coefficients discriminative, we also enrich the supervised prior by joint deep coefficient-based label prediction, and then incorporate the enriched prior information as additional label and structure constraints. The label constraint enables intra-class samples to have the same coordinates in the feature space, and the structure constraint forces the coefficients in each layer to be block-diagonal, so that the enriched prior obtained by self-expressive label propagation is more accurate. Our network also integrates adaptive dual-graph learning to retain the local structures of both the data and feature manifolds in each layer. Extensive experiments on image datasets demonstrate the effectiveness of DS2CF-Net for representation learning and clustering.

YNIMG Journal 2021 Journal Article

Using multiband multi-echo imaging to improve the robustness and repeatability of co-activation pattern analysis for dynamic functional connectivity

  • Alexander D. Cohen
  • Catie Chang
  • Yang Wang

Emerging evidence has shown that functional connectivity is dynamic and changes over the course of a scan. Furthermore, connectivity patterns can arise from short periods of co-activation on the order of seconds. Recently, a dynamic co-activation patterns (CAPs) analysis was introduced to examine the co-activation of voxels resulting from individual timepoints. The goal of this study was to apply CAPs analysis to resting state fMRI data collected using an advanced multiband multi-echo (MBME) sequence, in comparison with a multiband (MB) sequence with a single echo. Data from 28 healthy control subjects were examined. Subjects underwent two resting state scans, one MBME and one MB, and 19 subjects returned within two weeks for a repeat scan session. Data preprocessing included advanced denoising, namely multi-echo independent component analysis (ME-ICA) for the MBME data and an ICA-based strategy for Automatic Removal of Motion Artifacts (ICA-AROMA) for the MB data. The CAPs analysis was conducted using the newly published TbCAPs toolbox. CAPs were extracted using both seed-based and seed-free approaches. Timepoints were clustered using k-means clustering. The following metrics were compared between the MBME and MB datasets: mean activation in each CAP, the spatial correlation and mean squared error (MSE) between each timepoint and the centroid CAP it was assigned to, within-dataset variance across timepoints assigned to the same CAP, and the between-session spatial correlation of each CAP. Co-activation was heightened for MBME data in the majority of CAPs. For MBME data, the spatial correlation between each timepoint and its assigned centroid CAP was higher and the MSE was lower. The within-dataset variance was also lower for MBME data. Finally, the between-session spatial correlation was higher for MBME data.
Overall, our findings suggest that the advanced MBME sequence is a promising avenue for the measurement of dynamic co-activation patterns by increasing the robustness and reproducibility of the CAPs.
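The seed-based CAPs steps described above (select suprathreshold timepoints, cluster them with k-means, then score each timepoint against its assigned centroid) can be sketched as a toy example. The assumptions here are stated plainly: frames are flattened voxel vectors, frame selection uses a hypothetical threshold on the seed time course, and clustering is a plain Lloyd's k-means with random initialization. This does not reproduce the TbCAPs toolbox or its interface.

```python
import math
import random


def select_frames(frames, seed_signal, threshold):
    """Keep timepoints where the seed time course exceeds the threshold."""
    return [f for f, s in zip(frames, seed_signal) if s > threshold]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def spatial_correlation(a, b):
    """Pearson correlation between two spatial maps."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / den if den else 0.0


frames = [[1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0],
          [0.0, 0.0, 0.0, 1.0], [0.0, 0.1, 0.0, 0.9], [0.2, 0.2, 0.2, 0.2]]
seed_signal = [1.5, 1.2, 1.4, 1.1, 0.3]
selected = select_frames(frames, seed_signal, threshold=1.0)
labels, centroids = kmeans(selected, k=2)
print([round(spatial_correlation(f, centroids[lab]), 3) for f, lab in zip(selected, labels)])
```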

JMLR Journal 2020 Journal Article

Efficient Inference for Nonparametric Hawkes Processes Using Auxiliary Latent Variables

  • Feng Zhou
  • Zhidong Li
  • Xuhui Fan
  • Yang Wang
  • Arcot Sowmya
  • Fang Chen

The expressive ability of classic Hawkes processes is limited by the parametric assumptions on the baseline intensity and triggering kernel. Therefore, it is desirable to perform inference in a data-driven, nonparametric manner. Many recent works have proposed nonparametric Hawkes process models based on Gaussian processes (GP). However, the likelihood is non-conjugate to the prior, resulting in a complicated and time-consuming inference procedure. To address this problem, we present the sigmoid Gaussian Hawkes process model in this paper: the baseline intensity and triggering kernel are both modeled as the sigmoid transformation of random trajectories drawn from a GP. By introducing auxiliary latent random variables (branching structure, Pólya-Gamma random variables, and latent marked Poisson processes), the likelihood is converted into two decoupled components with a Gaussian form, which allows for efficient conjugate analytical inference. Using the augmented likelihood, we derive an efficient Gibbs sampling algorithm to sample from the posterior, an efficient expectation-maximization (EM) algorithm to obtain the maximum a posteriori (MAP) estimate, and an efficient mean-field variational inference algorithm to approximate the posterior. To further accelerate inference, a sparse GP approximation is introduced to reduce complexity. We demonstrate the performance of our three algorithms on both simulated and real data. The experiments show that our proposed inference algorithms can recover the underlying prompting characteristics well and efficiently.
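In this model, the baseline intensity and the triggering kernel are sigmoid transformations of latent functions (scaled here by an assumed upper bound), and each past event adds its triggering contribution to the baseline. The sketch below only evaluates that conditional intensity, λ(t) = μ(t) + Σ_{t_i < t} φ(t - t_i). It uses fixed stand-in latent functions in place of GP draws and a hypothetical finite kernel support, and it implements none of the paper's Gibbs, EM, or mean-field inference.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_transformed(upper_bound, latent_fn):
    """Return x -> upper_bound * sigmoid(latent_fn(x)), which stays in (0, upper_bound)."""
    return lambda x: upper_bound * sigmoid(latent_fn(x))


def hawkes_intensity(t, history, baseline_fn, kernel_fn, kernel_support=math.inf):
    """Conditional intensity: baseline plus the triggering effect of earlier events."""
    return baseline_fn(t) + sum(
        kernel_fn(t - ti) for ti in history if 0 < t - ti <= kernel_support
    )


# Stand-ins for GP sample paths (hypothetical, for illustration only).
mu = sigmoid_transformed(2.0, lambda t: 0.0)            # constant baseline of 1.0
phi = sigmoid_transformed(1.0, lambda tau: 1.0 - tau)  # trigger that decays with lag
print(hawkes_intensity(3.0, [1.0, 2.0, 3.0], mu, phi, kernel_support=1.5))  # 1.5
```

Because every component is bounded by its upper bound, the intensity stays positive and finite for any latent function, which is one motivation for the sigmoid link.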

IJCAI Conference 2020 Conference Paper

Financial Risk Analysis for SMEs with Graph-based Supply Chain Mining

  • Shuo Yang
  • Zhiqiang Zhang
  • Jun Zhou
  • Yang Wang
  • Wang Sun
  • Xingyu Zhong
  • Yanming Fang
  • Quan Yu

Small and Medium-sized Enterprises (SMEs) play a vital role in the modern economy. In recent years, financial risk analysis for SMEs has attracted considerable attention from financial institutions. However, financial risk analysis for SMEs usually suffers from data deficiency, especially for mobile financial institutions, which seldom collect credit-related data directly from SMEs. Fortunately, although sufficient credit-related information about SMEs is hard to acquire, the interactive relationships between SMEs, which may contain valuable information about financial risk, are usually available to mobile financial institutions. Identifying credit-related relationships among SMEs from massive interactions helps model the SMEs comprehensively and thus improves the performance of financial risk analysis. In this paper, to tackle the data deficiency problem in financial risk analysis for SMEs, we propose an innovative financial risk analysis framework with graph-based supply chain mining. Specifically, to capture the credit-related topological structure and temporal variation information of SMEs, we design and employ a novel spatial-temporal aware graph neural network to mine supply chain relationships on an SME graph, and then analyze the credit risk based on the mined supply chain graph. Experimental results on real-world financial datasets demonstrate the effectiveness of our proposal for financial risk analysis of SMEs.

JBHI Journal 2020 Journal Article

Multimodal Data Analysis of Alzheimer's Disease Based on Clustering Evolutionary Random Forest

  • Xia-an Bi
  • Xi Hu
  • Hao Wu
  • Yang Wang

Alzheimer's disease (AD) has become a severe medical challenge. Technological advances have produced high-dimensional data of different modalities, including functional magnetic resonance imaging (fMRI) and single nucleotide polymorphism (SNP) data. Understanding the complex association patterns among these heterogeneous and complementary data benefits the diagnosis and prevention of AD. In this paper, we apply an appropriate correlation analysis method to detect the relationships between brain regions and genes, and propose “brain region-gene pairs” as the multimodal features of each sample. In addition, we put forward a novel data analysis method, cluster evolutionary random forest (CERF), which is suited to “brain region-gene pairs”. The idea of clustering evolution is introduced to improve the generalization performance of the random forest, which is constructed by randomly selecting samples and sample features. Through hierarchical clustering of the decision trees in the random forest, decision trees with higher similarity are grouped into one class, and the decision trees with the best performance are retained to enhance the diversity among decision trees. Furthermore, based on CERF, we integrate feature construction, feature selection, and sample classification to find the optimal combination of different methods, and design a comprehensive diagnostic framework for AD. The framework is validated on samples with both fMRI and SNP data from ADNI. The results show that the framework can effectively identify AD patients and discover brain regions and genes significantly associated with AD. These findings are conducive to the clinical treatment and prevention of AD.

IJCAI Conference 2020 Conference Paper

Recovering Accurate Labeling Information from Partially Valid Data for Effective Multi-Label Learning

  • Ximing Li
  • Yang Wang

Partial Multi-label Learning (PML) aims to induce a multi-label predictor from datasets with noisy supervision, where each training instance is associated with several candidate labels, only some of which are valid. To address this noise, existing PML methods basically recover the ground-truth labels by leveraging the ground-truth confidence of each candidate label, i.e., the likelihood of a candidate label being a ground-truth one. However, they neglect the information from non-candidate labels, which can potentially contribute to ground-truth label recovery. In this paper, we propose to recover the ground-truth labels, i.e., estimate the ground-truth confidences, from the label enrichment, composed of the relevance degrees of candidate labels and the irrelevance degrees of non-candidate labels. Building on this observation, we further develop a novel two-stage PML method, namely Partial Multi-Label Learning with Label Enrichment-Recovery (PML3ER): in the first stage, it estimates the label enrichment with unconstrained label propagation; it then jointly learns the ground-truth confidence and the multi-label predictor given the label enrichment. Experimental results validate that PML3ER outperforms state-of-the-art PML methods.

IJCAI Conference 2020 Conference Paper

Recurrent Relational Memory Network for Unsupervised Image Captioning

  • Dan Guo
  • Yang Wang
  • Peipei Song
  • Meng Wang

Unsupervised image captioning without annotations is an emerging challenge in computer vision, where existing methods usually adopt GAN (Generative Adversarial Network) models. In this paper, we propose a novel memory-based network instead of a GAN, named Recurrent Relational Memory Network (R2M). Unlike complicated and sensitive adversarial learning, which performs poorly on long sentence generation, R2M implements a concepts-to-sentence memory translator through two-stage memory mechanisms, fusion and recurrent memories, which correlate the relational reasoning between common visual concepts and the generated words over long periods. R2M encodes visual context through unsupervised training on images, while enabling the memory to learn from an irrelevant textual corpus in a supervised fashion. Our solution has fewer learnable parameters and higher computational efficiency than GAN-based methods, which suffer heavily from parameter sensitivity. We experimentally validate the superiority of R2M over state-of-the-art methods on all benchmark datasets.

AAAI Conference 2020 Conference Paper

RiskOracle: A Minute-Level Citywide Traffic Accident Forecasting Framework

  • Zhengyang Zhou
  • Yang Wang
  • Xike Xie
  • Lianliang Chen
  • Hengchang Liu

Real-time traffic accident forecasting is increasingly important for public safety and urban management (e.g., real-time safe route planning and emergency response deployment). Previous works on accident forecasting usually operate at the hour level, using existing neural networks that take static region-wise correlations into account. However, forecasting at finer granularity remains challenging because of the highly dynamic nature of road networks and the inherent rarity of accident records in each training sample, which lead to biased results and a zero-inflation issue. In this work, we propose a novel framework, RiskOracle, that improves the prediction granularity to the minute level. Specifically, we first transform the zero-risk values in labels to fit the training network. Then, we propose the Differential Time-varying Graph neural network (DTGN) to capture the immediate changes of traffic status and dynamic inter-subregion correlations. Furthermore, we adopt multi-task and region selection schemes to highlight the citywide subregions most likely to have accidents, bridging the gap between biased risk values and the sporadic accident distribution. Extensive experiments on two real-world datasets demonstrate the effectiveness and scalability of our RiskOracle framework.

EAAI Journal 2020 Journal Article

Robust RGB-D tracking via compact CNN features

  • Yong Wang
  • Xian Wei
  • Lingkun Luo
  • Wen Wen
  • Yang Wang

Feature representation is at the core of visual tracking. This paper presents a robust tracking method for RGB-D videos. First, the RGB and depth images are separately encoded using hierarchical convolutional neural network (CNN) features. Second, to reduce computational cost, we exploit random projection to compress the CNN features: the high-dimensional CNN features are randomly projected into a low-dimensional feature space. The correlation filter tracking framework is then carried out independently on the RGB and depth images, and a backward tracking scheme is adopted to evaluate the tracking results in these two images. The final position is determined according to the tracked locations in the two image channels. In addition, the model is updated adaptively. Our tracker is evaluated on two RGB-D benchmark datasets and achieves results comparable to those of other state-of-the-art RGB-D tracking methods.
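Random projection compresses a feature vector by multiplying it with a fixed random matrix. A minimal sketch, assuming a dense Gaussian matrix with i.i.d. N(0, 1/k) entries (k = output dimension), which preserves squared norms in expectation; the abstract does not say which random matrix the paper uses (sparse variants are also common). Fixing the seed keeps the projection identical across frames, so compressed features stay comparable over time for the correlation filter.

```python
import math
import random


def gaussian_projection_matrix(in_dim, out_dim, seed=0):
    """Dense random matrix with i.i.d. N(0, 1/out_dim) entries (fixed by seed)."""
    rng = random.Random(seed)
    std = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, std) for _ in range(in_dim)] for _ in range(out_dim)]


def project(matrix, features):
    """Compress a feature vector: y = R x."""
    return [sum(r * f for r, f in zip(row, features)) for row in matrix]


R = gaussian_projection_matrix(in_dim=512, out_dim=128, seed=42)
rng = random.Random(7)
cnn_feature = [rng.random() for _ in range(512)]  # stand-in for one CNN feature vector
compressed = project(R, cnn_feature)
ratio = sum(v * v for v in compressed) / sum(v * v for v in cnn_feature)
print(len(compressed), round(ratio, 3))  # ratio should be close to 1
```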

TIST Journal 2020 Journal Article

Understanding the Long-Term Evolution of Electric Taxi Networks

  • Guang Wang
  • Fan Zhang
  • Huijun Sun
  • Yang Wang
  • Desheng Zhang

Due to the ever-growing concerns over air pollution and energy security, more and more cities have started to replace their conventional taxi fleets with electric ones. Even though environmentally friendly, the rapid promotion of electric taxis raises problems for both taxi drivers and governments, e.g., prolonged waiting/charging time, unbalanced utilization of charging infrastructures, and inadequate taxi supply due to the long charging time. In this article, we conduct the first longitudinal measurement study to understand the long-term evolution of mobility and charging patterns by utilizing 5-year data from one of the largest electric taxi networks in the world, i.e., the Shenzhen electric taxi network in China. In particular, (1) we first perform an electric taxi contextualization of their operation and charging activities; (2) then we design a generic charging event extraction algorithm based on GPS data and charging station data; and (3) based on the contextualization and extracted charging activities, we perform a comprehensive measurement study called ePat to explore the evolution of the electric taxi network from the mobility and charging perspectives. Our ePat is based on 4.8 TB of taxi GPS data, 240 GB of taxi transaction data, and metadata from 117 charging stations, during an evolution process from 427 electric taxis in 2013 to 13,178 in 2018. Moreover, ePat also explores the impacts of various contexts and benefits during the evolution process. As a comprehensive measurement of electric taxi network mobility and charging evolution, ePat has the potential to advance the understanding of the evolution patterns of electric taxi networks and pave the way for analyzing future shared autonomous vehicles.

IJCAI Conference 2020 Conference Paper

Unsupervised Vehicle Re-identification with Progressive Adaptation

  • Jinjia Peng
  • Yang Wang
  • Huibing Wang
  • Zhao Zhang
  • Xianping Fu
  • Meng Wang

Vehicle re-identification (reID) aims at identifying vehicles across different non-overlapping camera views. Existing methods rely heavily on well-labeled datasets for ideal performance, which inevitably causes a severe performance drop due to the domain bias between the training domain and real-world scenes; worse still, these approaches require full annotations, which is labor-intensive. To tackle these challenges, we propose a novel Progressive Adaptation Learning method for vehicle reID, named PAL, which learns from abundant data without annotations. In PAL, a data adaptation module is applied to the source domain to generate images with a data distribution similar to that of the unlabeled target domain, which serve as “pseudo target samples”. These pseudo samples are combined with unlabeled samples selected by a dynamic sampling strategy to make training faster. We further propose a weighted label smoothing (WLS) loss, which considers the similarity between samples in different clusters to balance the confidence of pseudo labels. Comprehensive experimental results validate the advantages of PAL on both the VehicleID and VeRi-776 datasets.

AAAI Conference 2020 Conference Paper

When AWGN-Based Denoiser Meets Real Noises

  • Yuqian Zhou
  • Jianbo Jiao
  • Haibin Huang
  • Yang Wang
  • Jue Wang
  • Honghui Shi
  • Thomas Huang

Discriminative learning based image denoisers have achieved promising performance on synthetic noise such as Additive White Gaussian Noise (AWGN). The synthetic noise adopted in most previous work is pixel-independent, but real noise is mostly spatially/channel-correlated and spatially/channel-variant. This domain gap yields unsatisfactory performance on images with real noise if the model is trained only with AWGN. In this paper, we propose a novel approach to boost the performance of a real image denoiser that is trained only with synthetic pixel-independent noise data dominated by AWGN. First, we train a deep model that consists of a noise estimator and a denoiser with mixed AWGN and Random Value Impulse Noise (RVIN). We then investigate a Pixel-shuffle Down-sampling (PD) strategy to adapt the trained model to real noise. Extensive experiments demonstrate the effectiveness and generalization of the proposed approach. Notably, our method achieves state-of-the-art performance on real sRGB images in the DND benchmark among models trained with synthetic noise. Code is available at https://github.com/yzhouas/PD-Denoising-pytorch.
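Pixel-shuffle down-sampling rearranges an image into stride-s sub-images; neighbouring pixels land in different sub-images, so spatially correlated noise looks closer to pixel-independent within each one. A minimal sketch on nested lists, assuming image sides divisible by the stride and a hypothetical `denoise_fn` standing in for the AWGN-trained model. Any refinement the paper applies after reassembly is not shown.

```python
def pd_split(img, s):
    """Split an H x W image into s*s sub-images: sub (i, j) = img[i::s, j::s]."""
    h, w = len(img), len(img[0])
    if h % s or w % s:
        raise ValueError("image sides must be divisible by the stride")
    return [[row[j::s] for row in img[i::s]] for i in range(s) for j in range(s)]


def pd_merge(subs, s):
    """Inverse of pd_split: put every sub-image pixel back at its original location."""
    sub_h, sub_w = len(subs[0]), len(subs[0][0])
    img = [[None] * (sub_w * s) for _ in range(sub_h * s)]
    for k, sub in enumerate(subs):
        i, j = divmod(k, s)  # pd_split orders sub-images row-major over (i, j)
        for r, row in enumerate(sub):
            for c, value in enumerate(row):
                img[r * s + i][c * s + j] = value
    return img


def pd_denoise(img, s, denoise_fn):
    """Denoise each sub-image independently, then reassemble the full image."""
    return pd_merge([denoise_fn(sub) for sub in pd_split(img, s)], s)


img = [[4 * r + c for c in range(4)] for r in range(4)]
print(pd_split(img, 2)[0])  # [[0, 2], [8, 10]]
```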

IJCAI Conference 2019 Conference Paper

Approximate Optimal Transport for Continuous Densities with Copulas

  • Jinjin Chi
  • Jihong Ouyang
  • Ximing Li
  • Yang Wang
  • Meng Wang

Optimal Transport (OT) provides a powerful framework for comparing probability distributions, and it has attracted increasing attention within the machine learning community. However, it suffers from a severe computational burden due to the intractable objective with respect to the distributions of interest. In particular, there have been very few attempts at continuous OT, i.e., OT for comparing continuous densities. To this end, we develop a novel continuous OT method, namely Copula OT (Cop-OT). The basic idea is to transform the primal objective of continuous OT into a tractable form with respect to the copula parameter, which can be efficiently solved by stochastic optimization with lower time and memory requirements. Empirical results on real image retrieval applications and synthetic data demonstrate that Cop-OT gives more accurate approximations to continuous OT values than state-of-the-art baselines.

IJCAI Conference 2019 Conference Paper

Discriminative Sample Generation for Deep Imbalanced Learning

  • Ting Guo
  • Xingquan Zhu
  • Yang Wang
  • Fang Chen

In this paper, we propose a discriminative variational autoencoder (DVAE) to assist deep learning from data with imbalanced class distributions. DVAE is designed to alleviate class imbalance by explicitly learning class boundaries between training samples, and uses the learned class boundaries to guide feature learning and sample generation. To learn class boundaries, DVAE learns a latent two-component mixture distribution, conditioned on the class labels, so that the latent features can help differentiate minority-class from majority-class samples. To balance the training data so that deep learning emphasizes the minority class, we combine DVAE and generative adversarial networks (GAN) into a unified model, DVAAN, which generates synthetic instances close to the class boundaries as training data to learn latent features and update the model. Experiments and comparisons confirm that DVAAN significantly alleviates class imbalance and delivers accurate models for deep learning from imbalanced data.

NeurIPS Conference 2019 Conference Paper

Region Mutual Information Loss for Semantic Segmentation

  • Shuai Zhao
  • Yang Wang
  • Zheng Yang
  • Deng Cai

Semantic segmentation is a fundamental problem in computer vision. It is treated as a pixel-wise classification problem in practice, and most segmentation models use a pixel-wise loss as their optimization criterion. However, the pixel-wise loss ignores the dependencies between pixels in an image. Several ways to exploit the relationship between pixels have been investigated, e.g., conditional random fields (CRF) and pixel affinity based methods. Nevertheless, these methods usually require additional model branches, large amounts of extra memory, or more inference time. In this paper, we develop a region mutual information (RMI) loss to model the dependencies among pixels more simply and efficiently. In contrast to the pixel-wise loss, which treats pixels as independent samples, RMI uses one pixel and its neighbour pixels to represent this pixel. Then, for each pixel in an image, we get a multi-dimensional point that encodes the relationship between pixels, and the image is cast into a multi-dimensional distribution of these high-dimensional points. The prediction and ground truth can thus achieve high-order consistency through maximizing the mutual information (MI) between their multi-dimensional distributions. Moreover, as the actual value of the MI is hard to calculate, we derive a lower bound of the MI and maximize the lower bound to maximize the real value of the MI. RMI requires only a few extra computational resources in the training stage, and there is no overhead during testing. Experimental results demonstrate that RMI achieves substantial and consistent performance improvements on the PASCAL VOC 2012 and CamVid datasets. The code is available at https://github.com/ZJULearning/RMI.
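The first step of RMI, representing each pixel by itself plus its neighbour pixels, can be sketched as follows. Assumptions: a square window of a hypothetical `radius`, and border pixels without a full window are skipped rather than padded. The subsequent step, the lower bound on the mutual information between the predicted and ground-truth point sets, is not shown.

```python
def region_points(img, radius=1):
    """Turn each interior pixel into a (2*radius+1)**2-dimensional point.

    The point lists the pixel's square neighbourhood in row-major order;
    pixels closer than `radius` to the border are skipped.
    """
    h, w = len(img), len(img[0])
    points = []
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            points.append([img[y + dy][x + dx]
                           for dy in range(-radius, radius + 1)
                           for dx in range(-radius, radius + 1)])
    return points


prediction = [[0.1, 0.2, 0.3, 0.4],
              [0.5, 0.6, 0.7, 0.8],
              [0.9, 0.8, 0.7, 0.6],
              [0.5, 0.4, 0.3, 0.2]]
points = region_points(prediction, radius=1)
print(len(points), len(points[0]))  # 4 9
```

Applying the same function to the prediction and the ground truth yields paired point sets, which is the input the mutual-information bound operates on.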

TIST Journal 2018 Journal Article

Adaptive Online One-Class Support Vector Machines with Applications in Structural Health Monitoring

  • Ali Anaissi
  • Nguyen Lu Dang Khoa
  • Thierry Rakotoarivelo
  • Mehrisadat Makki Alamdari
  • Yang Wang

One-class support vector machine (OCSVM) has been widely used in the area of structural health monitoring, where only data from one class (i.e., healthy) are available. Incremental learning of OCSVM is critical for online applications in which huge data streams continuously arrive and the healthy data distribution may vary over time. This article proposes a novel adaptive self-advised online OCSVM that incrementally tunes the kernel parameter and decides whether a model update is required or not. As opposed to existing methods, this novel online algorithm does not rely on any fixed threshold, but it uses the slack variables in the OCSVM to determine which new data points should be included in the training set and trigger a model update. The algorithm also incrementally tunes the kernel parameter of OCSVM automatically based on the spatial locations of the edge and interior samples in the training data with respect to the constructed hyperplane of OCSVM. This new online OCSVM algorithm was extensively evaluated using synthetic data and real data from case studies in structural health monitoring. The results showed that the proposed method significantly improved the classification error rates, was able to assimilate the changes in the positive data distribution over time, and maintained a high damage detection accuracy in all case studies.
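The abstract's key idea is that the OCSVM slack of an incoming sample, rather than a fixed threshold, decides whether the model must be updated. The sketch below shows only that trigger for an already-trained model with an RBF kernel; the support vectors, weights `alpha`, offset `rho` and `gamma` are assumed inputs, and the paper's incremental kernel-parameter tuning is not reproduced.

```python
import numpy as np


def rbf(a: np.ndarray, b: np.ndarray, gamma: float) -> np.ndarray:
    """RBF kernel between each row of a and the vector b."""
    return np.exp(-gamma * np.sum((a - b) ** 2, axis=-1))


def decision(x, sv, alpha, rho, gamma) -> float:
    """OCSVM decision value f(x) = sum_i alpha_i K(sv_i, x) - rho."""
    return float(np.sum(alpha * rbf(sv, x, gamma)) - rho)


def slack(x, sv, alpha, rho, gamma) -> float:
    """Slack of x: how far it falls on the outlier side of the boundary."""
    return max(0.0, -decision(x, sv, alpha, rho, gamma))


def needs_update(x, sv, alpha, rho, gamma) -> bool:
    """Hypothetical trigger: any positive slack means the model should be refit."""
    return slack(x, sv, alpha, rho, gamma) > 0.0
```

Points inside the learned region have zero slack and leave the model untouched, so updates are only paid for when the healthy-data distribution appears to drift.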

IROS Conference 2018 Conference Paper

Design and Implementation of a Novel Aerial Manipulator with Tandem Ducted Fans

  • Yibo Zhang
  • Changle Xiang
  • Bin Xu
  • Yang Wang
  • Xiaoliang Wang

This paper proposes a novel aerial manipulator with tandem ducted fans, which takes both trafficability and effective loading into account. The aerial manipulator is particularly suitable for grasping in complex and narrow environments that traditional multi-rotors and helicopters cannot access. A comprehensive integrated dynamic model is established by treating the aerial vehicle dynamics and the manipulator dynamics as a whole. On this basis, a multilayer composite controller with feedforward compensation is designed, which accounts for the mutual reactive influence between the aerial vehicle and the manipulator to improve the stability of the system while the manipulator moves. Simulations and actual flight tests verify the effectiveness of the design and show good stability and tracking performance of the system.

IJCAI Conference 2018 Conference Paper

Enhancing Semantic Representations of Bilingual Word Embeddings with Syntactic Dependencies

  • Linli Xu
  • Wenjun Ouyang
  • Xiaoying Ren
  • Yang Wang
  • Liang Jiang

Cross-lingual representation is a technique that can both represent different languages in the same latent vector space and enable knowledge transfer across languages. To learn such representations, most existing works require parallel sentences with word-level alignments and assume that aligned words have similar Bag-of-Words (BoW) contexts. However, due to differences in grammatical structure among languages, the contexts of aligned words in different languages may appear at different positions of the sentence. To address this syntactic divergence across languages, we propose a model of bilingual word embeddings integrating syntactic dependencies (DepBiWE), which produces dependency parse trees that encode the accurate relative positions of the contexts of aligned words. In addition, a new method is proposed to learn bilingual word embeddings from dependency-based contexts and BoW contexts jointly. Extensive experimental results on a real-world dataset clearly validate the superiority of the proposed DepBiWE model on various natural language processing (NLP) tasks.
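The difference between BoW contexts and dependency-based contexts can be made concrete with a small extractor. This sketch assumes a common dependency-context scheme (each word is paired with its head labelled by the relation, and the head with the word labelled by the inverse relation); the token format and the `-1` inverse marker are illustrative, and DepBiWE's exact context definition may differ.

```python
from typing import List, Tuple

# (word, index of head or -1 for the root, dependency relation)
Token = Tuple[str, int, str]


def dependency_contexts(sent: List[Token]) -> List[Tuple[str, str]]:
    """(word, context) pairs taken from head-modifier arcs of a parse."""
    pairs = []
    for word, head, rel in sent:
        if head < 0:  # the root has no head arc
            continue
        head_word = sent[head][0]
        pairs.append((word, f"{head_word}/{rel}-1"))  # modifier sees its head
        pairs.append((head_word, f"{word}/{rel}"))    # head sees its modifier
    return pairs


def bow_contexts(words: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """(word, context) pairs from a symmetric linear window."""
    pairs = []
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                pairs.append((w, words[j]))
    return pairs
```

For "australian scientist discovers star", the dependency contexts link "discovers" to both its subject and its object regardless of word order, while the linear window depends on surface positions, which is exactly what varies across languages.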

YNIMG Journal 2018 Journal Article

Establishing the functional connectivity of the frontotemporal network in pre-attentive change detection with Transcranial Magnetic Stimulation and event-related optical signal

  • Chun-Yu Tse
  • Long-Yin Yip
  • Troby Ka-Yan Lui
  • Xue-Zhen Xiao
  • Yang Wang
  • Winnie Chiu Wing Chu
  • Nathan Allen Parks
  • Sandra Sau-Man Chan

Current theories of pre-attentive deviant detection postulate that before the Superior Temporal Cortex (STC) detects a change, the Inferior Frontal Cortex (IFC) engages in stimulus analysis, which is particularly critical for ambiguous deviations (e.g., a deviant preceded by a short train of standards). These theories rest on the assumption that IFC and STC are functionally connected, which has only been supported by correlational brain imaging studies. We examined this functional connectivity assumption by applying Transcranial Magnetic Stimulation (TMS) to disrupt IFC function, while measuring the later STC mismatch response with the event-related optical signal (EROS). EROS can localize brain activity in both spatial and temporal dimensions via measurement of optical property changes associated with neuronal activity, and is inert to the electromagnetic interference produced by TMS. Specifically, the STC mismatch response at 120–180 ms elicited by a deviant preceded by a short standard train when IFC TMS was applied at 80 ms was compared with the STC mismatch responses in temporal control (TMS with 200 ms delay), spatial control (sham TMS at vertex), auditory control (TMS pulse noise only), and cognitive control (deviant preceded by a long standard train) conditions. The STC mismatch response to deviants preceded by the short train was abolished by TMS of the IFC at 80 ms, while the STC responses remained intact in all other control conditions. These results confirm the involvement of the IFC in the STC mismatch response and support a functional connection between IFC and STC.

IJCAI Conference 2018 Conference Paper

Real-time Traffic Pattern Analysis and Inference with Sparse Video Surveillance Information

  • Yang Wang
  • Yiwei Xiao
  • Xike Xie
  • Ruoyu Chen
  • Hengchang Liu

Recent advances in video surveillance systems enable a new paradigm for intelligent urban traffic management systems. Since surveillance cameras are usually sparsely located to cover key regions of the road network, it is challenging to perform a complete real-time traffic pattern analysis based on incomplete, sparse surveillance information. As a result, existing works mostly focus on predicting traffic volumes with historical records available at a particular location and may not provide a complete picture of real-time traffic patterns. To this end, in this paper, we go beyond existing works and tackle the challenges of traffic flow analysis from three perspectives. First, we train transition probabilities to capture vehicles' movement patterns. The transition probabilities are trained from third-party vehicle GPS data and thus work even in areas without cameras. Second, we exploit the Multivariate Normal Distribution model together with the transferred probabilities to estimate the unobserved traffic patterns. Third, we propose an algorithm for real-time traffic inference with surveillance as a complementary source of information. Finally, experiments on real-world data show the effectiveness of our approach.
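The first step, learning transition probabilities from GPS trajectories, amounts to counting segment-to-segment moves and normalising per segment. The sketch below assumes trajectories are already map-matched to named road segments and adds a hypothetical one-step propagation of volumes; it does not include the paper's Multivariate Normal Distribution model or its real-time inference algorithm.

```python
from collections import defaultdict
from typing import Dict, List

Probs = Dict[str, Dict[str, float]]


def transition_probabilities(trajectories: List[List[str]]) -> Probs:
    """Maximum-likelihood P(next segment | current segment) from trajectories."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1
    probs: Probs = {}
    for a, nexts in counts.items():
        total = sum(nexts.values())
        probs[a] = {b: c / total for b, c in nexts.items()}
    return probs


def propagate(volume: Dict[str, float], probs: Probs) -> Dict[str, float]:
    """Expected vehicles per segment after one transition step (illustrative)."""
    out: Dict[str, float] = defaultdict(float)
    for a, v in volume.items():
        for b, p in probs.get(a, {}).items():
            out[b] += v * p
    return dict(out)
```

Because the probabilities come from GPS data rather than cameras, a volume observed at one camera can be pushed onto downstream segments that have no camera at all.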

TAAS Journal 2017 Journal Article

On Service Migrations in the Cloud for Mobile Accesses

  • Yang Wang
  • Bharadwaj Veeravalli
  • Chen-Khong Tham
  • Shuibing He
  • Chengzhong Xu

We study the problem of dynamically migrating a service in the cloud to satisfy an online sequence of mobile batch-request demands in a cost-effective way. The service may have single or multiple replicas, each running on a virtual machine. As the origin of mobile accesses frequently changes over time, this problem is particularly important for time-bounded services to achieve enhanced Quality of Service and cost effectiveness. Moving the service closer to the client locations not only reduces the service access latency but also minimizes the network costs for service providers. However, these benefits are not free. Migration comes at the cost of bulk-data transfer and service disruption, which increases the overall service costs. To gain the benefits of service migration while minimizing the resulting monetary costs, we propose an efficient search-based algorithm, Dmig, to migrate a single server, and then extend it to a scalable algorithm, called mDmig, for the multi-server situation, a more general case in the cloud. Both algorithms are fully distributed, symmetric, and characterized by the effective use of historical access information to conduct virtual migration, so that the limitations of local search in cost reduction can be overcome. To evaluate the algorithms, we compared them with several existing algorithms and an offline algorithm. Our simulation results showed that the proposed algorithms exhibit better performance in service migration by adapting to changes in mobile access patterns in a cost-effective way.
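The trade-off the abstract describes (lower access cost after moving versus the one-off cost of migrating) can be illustrated with a simple greedy rule over historical access information. This is a hypothetical single-server sketch, not Dmig or mDmig: `dist`, `history` and `migration_cost` are assumed inputs, and the rule moves only when the historical saving exceeds the migration cost.

```python
from typing import Dict, List, Tuple

Dist = Dict[Tuple[str, str], float]


def access_cost(server: str, requests: List[str], dist: Dist) -> float:
    """Network cost of serving one period's requests from `server`."""
    return sum(dist[(server, c)] for c in requests)


def choose_location(current: str, history: List[List[str]], nodes: List[str],
                    dist: Dist, migration_cost: float) -> str:
    """Greedy rule: migrate only if past access savings exceed the move cost."""
    def total(node: str) -> float:
        return sum(access_cost(node, reqs, dist) for reqs in history)

    best = min(nodes, key=total)
    saving = total(current) - total(best)
    return best if saving > migration_cost else current
```

A purely local rule like this can stall when no single move pays off, which is the limitation the paper's virtual migration over historical information is meant to overcome.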

IJCAI Conference 2016 Conference Paper

Bayesian Optimization of Partition Layouts for Mondrian Processes

  • Yi Wang
  • Bin Li
  • Xuhui Fan
  • Yang Wang
  • Fang Chen

The Mondrian process (MP) produces hierarchical partitions on a product space as a kd-tree, which can serve as a flexible yet parsimonious partition prior for relational modeling. Due to the recursive generation of partitions and the varying dimensionality of the partition state space, inference for MP relational modeling is extremely difficult. The prevalent inference method for this problem, reversible-jump MCMC, requires a number of unnecessary retrospective steps to transition from one partition state to a very similar one, and it is prone to getting trapped in local optima. In this paper, we attempt to circumvent these drawbacks by proposing an alternative method for inferring the MP partition structure. Based on the observation that similar cutting rate measures on the partition space lead to similar partition layouts, we propose to impose a nonhomogeneous cutting rate measure on the partition space to control the layouts of the generated partitions; the original MCMC sampling problem is thus transformed into a Bayesian global optimization problem. Empirical tests demonstrate that Bayesian optimization is able to find better partition structures than MCMC sampling with the same number of partition structure proposals.
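For readers unfamiliar with the MP, the standard recursive generative process under a homogeneous cutting rate can be sketched in a few lines. This shows only the prior that the paper starts from; the paper's nonhomogeneous cutting rate measure and its Bayesian optimization procedure are not reproduced, and the function name is illustrative.

```python
import random
from typing import List, Tuple

Box = List[Tuple[float, float]]  # one (low, high) interval per dimension


def sample_mondrian(box: Box, budget: float, rng: random.Random) -> List[Box]:
    """Sample the leaf blocks of a Mondrian partition of `box`."""
    lengths = [hi - lo for lo, hi in box]
    # Time to the next cut is exponential with rate = sum of side lengths.
    cost = rng.expovariate(sum(lengths))
    if cost > budget:
        return [box]  # budget exhausted: this block is a leaf
    # Cut dimension chosen proportionally to side length; location is uniform.
    d = rng.choices(range(len(box)), weights=lengths)[0]
    lo, hi = box[d]
    cut = rng.uniform(lo, hi)
    left, right = list(box), list(box)
    left[d] = (lo, cut)
    right[d] = (cut, hi)
    remaining = budget - cost
    return sample_mondrian(left, remaining, rng) + sample_mondrian(right, remaining, rng)
```

Larger budgets produce finer kd-tree partitions; the leaves always tile the original box exactly.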

NeurIPS Conference 2016 Conference Paper

Infinite Hidden Semi-Markov Modulated Interaction Point Process

  • Matt Zhang
  • Peng Lin
  • Ting Guo
  • Yang Wang
  • Fang Chen

The correlation between events is ubiquitous and important for temporal events modelling. In many cases, the correlation exists not only between events' emitted observations, but also between their arrival times. State space models (e.g., the hidden Markov model) and stochastic interaction point process models (e.g., the Hawkes process) have been studied extensively, yet separately, for the two types of correlations. In this paper, we propose a Bayesian nonparametric approach that considers both types of correlations via unifying and generalizing the hidden semi-Markov model and the interaction point process model. The proposed approach can simultaneously model both the observations and arrival times of temporal events, and determine the number of latent states from data. A Metropolis-within-particle-Gibbs sampler with ancestor resampling is developed for efficient posterior inference. The approach is tested on both synthetic and real-world data with promising outcomes.

AAAI Conference 2016 Conference Paper

Interaction Point Processes via Infinite Branching Model

  • Peng Lin
  • Bang Zhang
  • Ting Guo
  • Yang Wang
  • Fang Chen

Many natural and social phenomena can be modeled by interaction point processes (IPPs) (Diggle et al. 1994), stochastic point processes that consider the interaction between points. In this paper, we propose the infinite branching model (IBM), a Bayesian statistical model that can generalize and extend some popular IPPs, e.g., the Hawkes process (Hawkes 1971; Hawkes and Oakes 1974). It treats an IPP as a mixture of basis point processes with the aid of a distance-dependent prior over the branching structure that describes the relationship between points. The IBM can estimate the point event intensity, the interaction mechanism and the branching structure simultaneously. A generic Metropolis-within-Gibbs sampling method is also developed for model parameter inference. Experiments on synthetic and real-world data demonstrate the superiority of the IBM.
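As a reference point for the IPPs these two abstracts generalise, the Hawkes process with an exponential kernel (one common choice, assumed here) has intensity $\lambda(t) = \mu + \sum_{t_i < t} \alpha\beta e^{-\beta(t - t_i)}$. Its branching view assigns each event either to the background rate or to an earlier "parent" event, with probabilities proportional to each term's contribution. The sketch computes both; it is not the IBM's distance-dependent prior or its sampler.

```python
import math
from typing import List, Sequence


def hawkes_intensity(t: float, events: Sequence[float],
                     mu: float, alpha: float, beta: float) -> float:
    """Conditional intensity with an exponential excitation kernel."""
    return mu + sum(alpha * beta * math.exp(-beta * (t - ti))
                    for ti in events if ti < t)


def parent_probabilities(j: int, events: Sequence[float],
                         mu: float, alpha: float, beta: float) -> List[float]:
    """P(parent of event j); index 0 is the background, index i+1 is event i."""
    tj = events[j]
    weights = [mu] + [alpha * beta * math.exp(-beta * (tj - ti)) if ti < tj else 0.0
                      for ti in events[:j]]
    total = sum(weights)
    return [w / total for w in weights]
```

Sampling a parent for every event from these probabilities yields a branching structure, the latent object the IBM places a prior over and infers jointly with the intensity.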

IJCAI Conference 2016 Conference Paper

Iterative Views Agreement: An Iterative Low-Rank Based Structured Optimization Method to Multi-View Spectral Clustering

  • Yang Wang
  • Wenjie Zhang
  • Lin Wu
  • Xuemin Lin
  • Meng Fang
  • Shirui Pan

Multi-view spectral clustering, which aims to produce a consensus grouping of data objects across multiple views from their graph Laplacian matrices, is a fundamental clustering problem. Among the existing methods, the Low-Rank Representation (LRR) based method is notably superior in terms of effectiveness, intuitiveness and robustness to noise corruption. However, it aggressively tries to learn a common low-dimensional subspace for multi-view data while ignoring the local manifold structure in each view, which is critically important to spectral clustering; worse still, low-rank minimization is enforced to achieve data correlation consensus among all views, failing to flexibly preserve the local manifold structure of each view. In this paper, 1) we propose a multi-graph Laplacian regularized LRR, with each graph Laplacian corresponding to one view to characterize its local manifold structure. 2) Instead of directly enforcing low-rank minimization among all views for correlation consensus, we separately impose a low-rank constraint on each view, coupled with a mutual structural consensus constraint, which not only preserves the local manifold structure of each view well but also serves as a constraint for the other views, iteratively making the views more agreeable. Extensive experiments on real-world multi-view data sets demonstrate its superiority.

AAAI Conference 2016 Conference Paper

The Ostomachion Process

  • Xuhui Fan
  • Bin Li
  • Yi Wang
  • Yang Wang
  • Fang Chen

Stochastic partition processes for exchangeable graphs produce axis-aligned blocks on a product space. In relational modeling, the resulting blocks uncover the underlying interactions between two sets of entities of the relational data. Although some flexible axis-aligned partition processes, such as the Mondrian process, have been able to capture complex interacting patterns in a hierarchical fashion, they still fall short in capturing dependence between dimensions. To overcome this limitation, we propose the Ostomachion process (OP), which relaxes the cutting direction by allowing for oblique cuts. The partitions generated by an OP are convex polygons that can capture inter-dimensional dependence. The OP also exhibits interesting properties: 1) along the timeline, the cutting times can be characterized by a homogeneous Poisson process, and 2) on the partition space, the areas of the resulting components follow a Dirichlet distribution. We can thus control the expected number of cuts and the expected areas of components through hyper-parameters. We adapt the reversible-jump MCMC algorithm for inferring OP partition structures. Experimental results on relational modeling and decision tree classification validate the merit of the OP.

IJCAI Conference 2016 Conference Paper

Tri-Party Deep Network Representation

  • Shirui Pan
  • Jia Wu
  • Xingquan Zhu
  • Chengqi Zhang
  • Yang Wang

Information network mining often requires examination of linkage relationships between nodes for analysis. Recently, network representation has emerged to represent each node in a vector format that embeds the network structure, so that off-the-shelf machine learning methods can be directly applied for analysis. To date, existing methods focus on only one aspect of node information and cannot leverage node labels. In this paper, we propose TriDNR, a tri-party deep network representation model, using information from three parties: node structure, node content, and node labels (if available) to jointly learn optimal node representations. TriDNR is based on our new coupled deep natural language module, whose learning is enforced at three levels: (1) at the network structure level, TriDNR exploits inter-node relationships by maximizing the probability of observing surrounding nodes given a node in random walks; (2) at the node content level, TriDNR captures node-word correlation by maximizing the co-occurrence of the word sequence given a node; and (3) at the node label level, TriDNR models label-word correspondence by maximizing the probability of the word sequence given a class label. The tri-party information is jointly fed into the neural network model so that the parties mutually enhance each other in learning optimal representations, resulting in up to 79% classification accuracy gain compared to state-of-the-art methods.
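Level (1), "observing surrounding nodes given a node in random walks", relies on the DeepWalk-style recipe of generating truncated random walks and extracting skip-gram (center, context) pairs. The sketch below shows that data-preparation step only; the adjacency format, walk length and window size are assumptions, and the content and label levels of TriDNR are not shown.

```python
import random
from typing import Dict, List, Tuple


def random_walk(adj: Dict[int, List[int]], start: int, length: int,
                rng: random.Random) -> List[int]:
    """Uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        nbrs = adj[walk[-1]]
        if not nbrs:  # dead end: stop the walk early
            break
        walk.append(rng.choice(nbrs))
    return walk


def skipgram_pairs(walk: List[int], window: int) -> List[Tuple[int, int]]:
    """(center, context) node pairs within `window` positions of each other."""
    pairs = []
    for i, u in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((u, walk[j]))
    return pairs
```

These pairs would then train node vectors the same way word pairs train word embeddings, which is what lets the structure level share a model with the text-based levels.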

TCS Journal 2015 Journal Article

Optimistic fair exchange in the enhanced chosen-key model

  • Yang Wang
  • Man Ho Au
  • Willy Susilo

Optimistic fair exchange (OFE) is a kind of protocol that guarantees fairness for the parties involved in an exchange with the help of an arbitrator. A fundamental task in optimistic fair exchange is to define security models capturing realistic attacks and to design schemes that are secure in practical models. It is essential that security models capture practical situations, as this ensures that the protocols can be adopted in practice. The contributions of this paper are threefold. First, we observe that the existing OFE models do not capture a realistic situation in which the adversary can actually observe the full signatures generated by the signer prior to launching the actual attack. That is, in these models the adversary is not provided with a signing oracle that produces full signatures generated by the signer. It is commonly believed that full signatures generated by the signer can be simulated by full signatures generated by the arbitrator. Unfortunately, we show that this perception is false. Second, we propose an enhanced model of OFE that explicitly provides the adversary with a signing oracle, which outputs the full signatures generated by the signer. We demonstrate the difference between our enhanced model and the existing chosen-key model through two concrete OFE schemes that serve as counterexamples. Finally, we revisit two existing generic constructions of optimistic fair exchange schemes, one based on verifiably encrypted signatures, and the other based on conventional signatures and ring signatures. Our result shows that the two generic approaches can still offer schemes secure in our enhanced model, which captures the real scenario in which dishonest users may have access to the full signatures generated by the signer.

EAAI Journal 2014 Journal Article

A tabu search based memetic algorithm for the maximum diversity problem

  • Yang Wang
  • Jin-Kao Hao
  • Fred Glover
  • Zhipeng Lü

This paper presents a highly effective memetic algorithm for the maximum diversity problem based on tabu search. The tabu search component uses a successive filter candidate list strategy, and the solution combination component employs a combination operator based on identifying strongly determined and consistent variables. Computational experiments on three sets of 40 popular benchmark instances indicate that our tabu search/memetic algorithm (TS/MA) can easily obtain the best known results for all the tested instances (which no previous algorithm had achieved) as well as improved results for six instances. Comparisons with state-of-the-art algorithms demonstrate statistically that our TS/MA competes very favorably with the best performing algorithms. Key elements and properties of TS/MA are also analyzed to disclose the benefits of integrating tabu search (using a successive filter candidate list strategy) and solution combination (based on critical variables).
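The maximum diversity problem selects m of n elements to maximise the sum of pairwise distances. The tabu search component can be illustrated with a plain swap-neighbourhood search that evaluates each swap in O(1) from per-element contributions. This is a simplified sketch: the paper's successive filter candidate list strategy and its memetic combination operator are not reproduced, and the tenure, iteration count and aspiration rule are illustrative choices.

```python
import random
from typing import List, Set, Tuple


def diversity(sel: Set[int], d: List[List[int]]) -> int:
    """Sum of pairwise distances inside the selected set."""
    s = sorted(sel)
    return sum(d[a][b] for i, a in enumerate(s) for b in s[i + 1:])


def tabu_search_mdp(d: List[List[int]], m: int, iters: int = 200,
                    tenure: int = 3, seed: int = 0) -> Tuple[Set[int], int]:
    n = len(d)
    rng = random.Random(seed)
    sel = set(rng.sample(range(n), m))
    f = diversity(sel, d)
    best, best_f = set(sel), f
    tabu_until = [0] * n
    for it in range(1, iters + 1):
        # contrib[k] = sum of distances from k to the current selection.
        contrib = [sum(d[k][s] for s in sel) for k in range(n)]
        move, move_gain = None, float("-inf")
        for i in sel:
            for j in range(n):
                if j in sel:
                    continue
                # Gain of swapping out i and swapping in j.
                gain = contrib[j] - contrib[i] - d[i][j]
                is_tabu = tabu_until[i] > it or tabu_until[j] > it
                # Aspiration: a tabu move is allowed if it beats the best so far.
                if is_tabu and f + gain <= best_f:
                    continue
                if gain > move_gain:
                    move, move_gain = (i, j), gain
        if move is None:  # every move is tabu and none aspires
            break
        i, j = move
        sel.remove(i)
        sel.add(j)
        f += move_gain
        tabu_until[i] = tabu_until[j] = it + tenure  # forbid undoing the swap for a while
        if f > best_f:
            best, best_f = set(sel), f
    return best, best_f
```

Accepting the best non-tabu swap even when it worsens the objective is what lets the search leave local optima, while the tabu list prevents it from immediately cycling back.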

TCS Journal 2014 Journal Article

Attribute-based optimistic fair exchange: How to restrict brokers with policies

  • Yang Wang
  • Man Ho Au
  • Willy Susilo

Optimistic fair exchange (OFE) is a class of protocols for solving the fair exchange problem between two participants with the help of an arbitrator that only needs to be involved when a dispute occurs. To the best of our knowledge, no previous work on OFE takes into account users' attributes such as nationality and age. We identify that in some applications, the attributes could play an important role in whether the exchange takes place, and OFE may not be suitable for these scenarios. We introduce a new notion named attribute-based optimistic fair exchange (ABOFE) to solve the fair exchange problem in the attribute-based setting. We formalise the notion of ABOFE and present a security model in the multi-user setting under the chosen-key attack. We also present a generic construction of ABOFE from existing cryptographic primitives and prove that our proposal is secure with respect to our definition in the standard model. An instantiation in the standard model is discussed.

YNIMG Journal 2013 Journal Article

Robust estimation of fractal measures for characterizing the structural complexity of the human brain: Optimization and reproducibility

  • Joaquín Goñi
  • Olaf Sporns
  • Hu Cheng
  • Maite Aznárez-Sanado
  • Yang Wang
  • Santiago Josa
  • Gonzalo Arrondo
  • Vincent P. Mathews

High-resolution isotropic three-dimensional reconstructions of human brain gray and white matter structures can be characterized to quantify aspects of their shape, volume and topological complexity. In particular, methods based on fractal analysis have been applied in neuroimaging studies to quantify the structural complexity of the brain in both healthy and impaired conditions. The usefulness of such measures for characterizing individual differences in brain structure critically depends on their within-subject reproducibility in order to allow the robust detection of between-subject differences. This study analyzes key analytic parameters of three fractal-based methods that rely on the box-counting algorithm with the aim to maximize within-subject reproducibility of the fractal characterizations of different brain objects, including the pial surface, the cortical ribbon volume, the white matter volume and the gray matter/white matter boundary. Two separate datasets originating from different imaging centers were analyzed, comprising 50 subjects with three and 24 subjects with four successive scanning sessions per subject, respectively. The reproducibility of fractal measures was statistically assessed by computing their intra-class correlations. Results reveal differences between different fractal estimators and allow the identification of several parameters that are critical for high reproducibility. Highest reproducibility with intra-class correlations in the range of 0.9–0.95 is achieved with the correlation dimension. Further analyses of the fractal dimensions of parcellated cortical and subcortical gray matter regions suggest robustly estimated and region-specific patterns of individual variability.
These results are valuable for defining appropriate parameter configurations when studying changes in fractal descriptors of human brain structure, for instance in studies of neurological diseases that do not allow repeated measurements or for disease-course longitudinal studies.
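The box-counting principle behind these estimators covers the object with boxes of side s, counts the occupied boxes N(s), and takes the slope of log N(s) against log(1/s) as the dimension. The sketch is a simplified 2D version on a binary mask; the study works on 3D brain reconstructions and compares three estimators (including the correlation dimension), and the box sizes here are an assumed choice.

```python
import numpy as np


def box_count(mask: np.ndarray, size: int) -> int:
    """Number of size-by-size boxes containing at least one foreground pixel."""
    h, w = mask.shape
    count = 0
    for y in range(0, h, size):
        for x in range(0, w, size):
            if mask[y:y + size, x:x + size].any():
                count += 1
    return count


def box_counting_dimension(mask: np.ndarray, sizes=(1, 2, 4, 8, 16)) -> float:
    """Slope of log N(s) versus log(1/s), fitted by least squares."""
    counts = [box_count(mask, s) for s in sizes]
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes, dtype=float)), np.log(counts), 1)
    return float(slope)
```

A filled square gives a dimension of 2 and a straight line gives 1; the range of box sizes is one of the analytic parameters whose choice affects reproducibility.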

YNIMG Journal 2012 Journal Article

Characteristics and variability of structural networks derived from diffusion tensor imaging

  • Hu Cheng
  • Yang Wang
  • Jinhua Sheng
  • William G. Kronenberger
  • Vincent P. Mathews
  • Tom A. Hummer
  • Andrew J. Saykin

Structural brain networks were constructed based on diffusion tensor imaging (DTI) data of 59 young healthy male adults. The networks had 68 nodes, derived from FreeSurfer parcellation of the cortical surface. By means of streamline tractography, the edge weight was defined as the number of streamlines between two nodes normalized by their mean volume. Specifically, two weighting schemes were adopted by considering various biases from fiber tracking. The weighting schemes were tested for possible bias toward the physical size of the nodes. A novel thresholding method was proposed using the variance of the number of streamlines in fiber tracking. The backbone networks were extracted, and various network analyses were applied to investigate the features of the binary and weighted backbone networks. For weighted networks, a high correlation was observed between nodal strength and betweenness centrality. Despite similar small-worldness features, binary networks and weighted networks are distinctive in many aspects, such as modularity and nodal betweenness centrality. Inter-subject variability was examined for the weighted networks, along with the test–retest reliability from two repeated scans on 44 of the 59 subjects. The inter-/intra-subject variability of weighted networks was discussed at three levels: edge weights, local metrics, and global metrics. The variance of edge weights can be very large. Although local metrics show less variability than the edge weights, they still have considerable amounts of variability. Weighting scheme one, which scales the number of streamlines by their lengths, demonstrates stable intra-class correlation coefficients against thresholding for global efficiency, clustering coefficient and diversity. The intra-class correlation analysis suggests that the current approach of constructing weighted networks has reasonably high reproducibility for most global metrics.
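The basic edge definition (streamline count between two nodes divided by the mean of their volumes) and the nodal strength used in the analysis can be written directly. This sketch covers only that basic normalization; the paper's length-based weighting scheme and its variance-based thresholding are not reproduced, and the input arrays are assumed to be a symmetric count matrix and per-node volumes.

```python
import numpy as np


def edge_weights(streamlines, volumes) -> np.ndarray:
    """Streamline counts normalized by the mean volume of each node pair."""
    counts = np.asarray(streamlines, dtype=float)
    vol = np.asarray(volumes, dtype=float)
    mean_vol = (vol[:, None] + vol[None, :]) / 2.0  # pairwise mean volumes
    w = counts / mean_vol
    np.fill_diagonal(w, 0.0)  # no self-connections
    return w


def nodal_strength(w: np.ndarray) -> np.ndarray:
    """Sum of edge weights incident to each node."""
    return w.sum(axis=1)
```

Dividing by node volume counters the tendency of larger regions to collect more streamlines, which is the size bias the weighting schemes were tested for.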

JMLR Journal 2012 Journal Article

Discriminative Hierarchical Part-based Models for Human Parsing and Action Recognition

  • Yang Wang
  • Duan Tran
  • Zicheng Liao
  • David Forsyth

We consider the problem of parsing human poses and recognizing their actions in static images with part-based models. Most previous work in part-based models only considers rigid parts (e.g., torso, head, half limbs) guided by human anatomy. We argue that this representation of parts is not necessarily appropriate. In this paper, we introduce hierarchical poselets, a new representation for modeling the pose configuration of human bodies. Hierarchical poselets can be rigid parts, but they can also be parts that cover large portions of human bodies (e.g., torso + left arm). In the extreme case, they can be whole bodies. The hierarchical poselets are organized in a hierarchical way via a structured model. Human parsing can be achieved by inferring the optimal labeling of this hierarchical model. The pose information captured by this hierarchical model can also be used as an intermediate representation for other high-level tasks. We demonstrate this in action recognition from static images.

NeurIPS Conference 2012 Conference Paper

Kernel Latent SVM for Visual Recognition

  • Weilong Yang
  • Yang Wang
  • Arash Vahdat
  • Greg Mori

Latent SVMs (LSVMs) are a class of powerful tools that have been successfully applied to many applications in computer vision. However, a limitation of LSVMs is that they rely on linear models. For many computer vision tasks, linear models are suboptimal and nonlinear models learned with kernels typically perform much better. It is therefore desirable to develop a kernel version of LSVM. In this paper, we propose kernel latent SVM (KLSVM), a new learning framework that combines latent SVMs and kernel methods. We develop an iterative training algorithm to learn the model parameters. We demonstrate the effectiveness of KLSVM using three different applications in visual recognition. Our KLSVM formulation is very general and can be applied to solve a wide range of applications in computer vision and machine learning.

YNIMG Journal 2011 Journal Article

Regional reproducibility of pulsed arterial spin labeling perfusion imaging at 3T

  • Yang Wang
  • Andrew J. Saykin
  • Josef Pfeuffer
  • Chen Lin
  • Kristine M. Mosier
  • Li Shen
  • Sungeun Kim
  • Gary D. Hutchins

Arterial spin labeling (ASL) is a promising non-invasive magnetic resonance imaging (MRI) technique for measuring regional cerebral blood flow (rCBF) or perfusion in vivo. To evaluate the feasibility of ASL as a biomarker for clinical trials, it is important to examine test-retest reproducibility. We investigated both inter- and intra-session reproducibility of perfusion MRI using a pulsed ASL (PASL) sequence PICORE Q2TIPS with an echo-planar imaging (EPI) readout. Structural MRI regions of interest (ROIs) were extracted individually by automated parcellation and segmentation methods using FreeSurfer. These cortical and subcortical ROIs were used to assess regional perfusion stability. Our results indicated regional variability in grey matter rCBF. Although rCBF measurements were characterized by intersubject variation, our results also indicated relatively less within-subject variability estimated as within-subject standard deviation (SDW) (intersession SDW: 2.0 to 8.8; intrasession SDW: 2.8 to 9.6) and acceptable reliabilities as measured using intraclass correlation coefficient (ICC) (intersession ICC: 0.68 to 0.94; intrasession ICC: 0.66 to 0.95) for regional MRI perfusion measurements using the PICORE Q2TIPS technique. Overall, our findings suggest that PASL is a technique with good within and between session reproducibility. Further reproducibility studies in target populations relevant for specific clinical trials of neurovascular related agents will be important and the present results provide a framework for such assessments.
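The two reproducibility statistics reported here can be computed from a subjects × sessions matrix. The abstract does not state which ICC form was used, so this sketch assumes the one-way random-effects ICC(1,1) and the common definition of SDW as the square root of the within-subject mean square; both choices are assumptions for illustration.

```python
import numpy as np


def one_way_icc(x) -> tuple:
    """Return (ICC(1,1), within-subject SD) for a subjects-by-sessions array."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    subj_means = x.mean(axis=1)
    grand = x.mean()
    # Between-subject and within-subject mean squares from one-way ANOVA.
    msb = k * np.sum((subj_means - grand) ** 2) / (n - 1)
    msw = np.sum((x - subj_means[:, None]) ** 2) / (n * (k - 1))
    icc = (msb - msw) / (msb + (k - 1) * msw)
    return float(icc), float(np.sqrt(msw))
```

When repeated sessions agree perfectly, SDW is 0 and the ICC is 1; as within-subject scatter approaches between-subject spread, the ICC falls toward 0 or below.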

NeurIPS Conference 2010 Conference Paper

A Discriminative Latent Model of Image Region and Object Tag Correspondence

  • Yang Wang
  • Greg Mori

We propose a discriminative latent model for annotating images with unaligned object-level textual annotations. Instead of using the bag-of-words image representation currently popular in the computer vision community, our model explicitly captures more intricate relationships underlying visual and textual information. In particular, we model the mapping that translates image regions to annotations. This mapping allows us to relate image regions to their corresponding annotation terms. We also model the overall scene label as latent information, which allows us to cluster test images. Our training data consist of images and their associated annotations, but we do not have access to the ground-truth region-to-annotation mapping or the overall scene label. We develop a novel variant of the latent SVM framework to model them as latent variables. Our experimental results demonstrate the effectiveness of the proposed model compared with other baseline methods.

NeurIPS Conference 2010 Conference Paper

Beyond Actions: Discriminative Models for Contextual Group Activities

  • Tian Lan
  • Yang Wang
  • Weilong Yang
  • Greg Mori

We propose a discriminative model for recognizing group activities. Our model jointly captures the group activity, the individual person actions, and the interactions among them. Two new types of contextual information, group-person interaction and person-person interaction, are explored in a latent variable framework. Different from most of the previous latent structured models which assume a predefined structure for the hidden layer, e.g., a tree structure, we treat the structure of the hidden layer as a latent variable and implicitly infer it during learning and inference. Our experimental results demonstrate that by inferring this contextual information together with adaptive structures, the proposed model can significantly improve activity recognition performance.

NeurIPS Conference 2009 Conference Paper

A Rate Distortion Approach for Semi-Supervised Conditional Random Fields

  • Yang Wang
  • Gholamreza Haffari
  • Shaojun Wang
  • Greg Mori

We propose a novel information-theoretic approach for semi-supervised learning of conditional random fields. Our approach defines a training objective that combines the conditional likelihood on labeled data and the mutual information on unlabeled data. Unlike previous minimum conditional entropy methods for semi-supervised discriminative learning, our approach can be naturally cast into the rate distortion theory framework in information theory. We analyze the tractability of the framework for structured prediction and present a convergent variational training algorithm to overcome the combinatorial explosion of terms in the sum over label configurations. Our experimental results show that the rate distortion approach outperforms standard $l_2$ regularization and minimum conditional entropy regularization on both multi-class classification and sequence labeling problems.

NeurIPS Conference 2008 Conference Paper

Learning a discriminative hidden part model for human action recognition

  • Yang Wang
  • Greg Mori

We present a discriminative part-based approach for human action recognition from video sequences using motion features. Our model is based on the recently proposed hidden conditional random field (hCRF) for object recognition. Similar to hCRF for object recognition, we model a human action by a flexible constellation of parts conditioned on image observations. Unlike in object recognition, our model combines both large-scale global features and local patch features to distinguish various actions. Our experimental results show that our model is comparable to other state-of-the-art approaches in action recognition. In particular, our experimental results demonstrate that combining large-scale global features and local patch features performs significantly better than directly applying hCRF on local patches alone.

NeurIPS Conference 2005 Conference Paper

Fast Krylov Methods for N-Body Learning

  • Nando Freitas
  • Yang Wang
  • Maryam Mahdaviani
  • Dustin Lang

This paper addresses the issue of numerical computation in machine learning domains based on similarity metrics, such as kernel methods, spectral techniques and Gaussian processes. It presents a general solution strategy based on Krylov subspace iteration and fast N-body learning methods. The experiments show significant gains in computation and storage on datasets arising in image segmentation, object detection and dimensionality reduction. The paper also presents theoretical bounds on the stability of these methods.
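The pairing of Krylov iteration with N-body methods rests on a simple property: a Krylov solver such as conjugate gradient touches the kernel matrix only through matrix-vector products, and a matrix-vector product with a kernel matrix is exactly the kind of sum that fast N-body methods approximate. The sketch below assumes a 1-D Gaussian kernel and a regularized system $(K + \sigma I)\alpha = y$ of the kind that arises in Gaussian process regression. It uses plain conjugate gradient with an exact $O(n^2)$ product; the helper names and parameters are hypothetical, and no N-body approximation is implemented.

```python
import math
from typing import Callable, List

Vector = List[float]


def gaussian_kernel_matvec(points: List[float], bandwidth: float,
                           noise: float) -> Callable[[Vector], Vector]:
    """Return a function v -> (K + noise * I) v for a 1-D Gaussian kernel K.

    The double loop is the exact O(n^2) kernel sum; this is the step a
    fast N-body method would replace with an approximation (hypothetical setup).
    """
    def matvec(v: Vector) -> Vector:
        out = []
        for i, xi in enumerate(points):
            s = noise * v[i]  # diagonal regularization term
            for j, xj in enumerate(points):
                s += math.exp(-((xi - xj) ** 2) / (2 * bandwidth ** 2)) * v[j]
            out.append(s)
        return out
    return matvec


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(matvec: Callable[[Vector], Vector], b: Vector,
                       tol: float = 1e-10, max_iter: int = 0) -> Vector:
    """Solve A x = b for symmetric positive definite A, given only v -> A v."""
    n = len(b)
    max_iter = max_iter or 10 * n
    x = [0.0] * n
    r = list(b)  # residual b - A x, with the initial guess x = 0
    p = list(r)  # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break  # residual is small enough
        ap = matvec(p)  # the only access to the matrix
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        # New direction, conjugate to the previous ones with respect to A
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Usage with illustrative data
points = [0.0, 0.5, 1.0, 1.5, 2.0]
targets = [1.0, 2.0, 0.0, -1.0, 0.5]
A = gaussian_kernel_matvec(points, bandwidth=0.5, noise=0.1)
weights = conjugate_gradient(A, targets)
```

Because the solver only calls `matvec`, swapping its body for an approximate kernel sum leaves `conjugate_gradient` untouched, which is why the two techniques combine cleanly.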

TCS Journal 2004 Journal Article

A space-efficient algorithm for sequence alignment with inversions and reversals

  • Zhi-Zhong Chen
  • Yong Gao
  • Guohui Lin
  • Robert Niewiadomski
  • Yang Wang
  • Junfeng Wu

A dynamic programming algorithm to find an optimal alignment for a pair of DNA sequences has been described by Schöniger and Waterman. The alignments use not only substitutions, insertions, and deletions of single nucleotides, but also inversions (reversed complements) of substrings of the sequences. With the restriction that the inversions are pairwise non-intersecting, their algorithm runs in $O(n^2m^2)$ time and consumes $O(n^2m^2)$ space, where $n$ and $m$ are the lengths of the input sequences. We develop a space-efficient algorithm that computes such an optimal alignment in only $O(nm)$ space within the same amount of time. Our algorithm enables the computation for a pair of DNA sequences of length up to 10,000 to be carried out on an ordinary desktop computer. A simulation study is conducted to verify some biological facts about gene shuffling across species.
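Space reductions of this kind build on a general dynamic-programming pattern: when each cell depends only on a bounded window of earlier cells, the table can be evaluated row by row and older rows discarded. The sketch below illustrates that pattern only, assuming substitutions and single-nucleotide insertions/deletions with unit costs; it does not handle inversions, which the paper's recurrence does. It computes the optimal alignment cost while storing two rows. The function name and cost parameters are illustrative, and recovering the alignment itself in reduced space needs an additional technique, such as Hirschberg's divide-and-conquer, which is not shown.

```python
def alignment_cost(a: str, b: str, sub: int = 1, indel: int = 1) -> int:
    """Minimum cost to align a and b using substitutions, insertions and deletions.

    Only the previous and current rows of the DP table are kept, so memory is
    O(len(b)) rather than O(len(a) * len(b)). Inversions are NOT modeled.
    """
    # Row for the empty prefix of a: aligning against b[:j] costs j insertions
    prev = [j * indel for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * indel] + [0] * len(b)  # a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else sub)
            curr[j] = min(diag,                 # match or substitution
                          prev[j] + indel,      # deletion from a
                          curr[j - 1] + indel)  # insertion into a
        prev = curr  # discard the older row
    return prev[-1]


# Usage with illustrative sequences
print(alignment_cost("ACGTACGT", "ACGACGTT"))
```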

IJCAI Conference 2003 Conference Paper

Switching Hypothesized Measurements: A Dynamic Model with Applications to Occlusion Adaptive Joint Tracking

  • Yang Wang
  • Tele Tan
  • Kia-Fock Loe

This paper proposes a dynamic model supporting multimodal state space probability distributions and presents the application of the model in dealing with visual occlusions when tracking multiple objects jointly. For a set of hypotheses, multiple measurements are acquired at each time instant. The model switches among a set of hypothesized measurements during the propagation. Two computationally efficient filtering algorithms are derived for online joint tracking. Both the occlusion relationship and state of the objects are recursively estimated from the history of measurement data. The switching hypothesized measurements (SHM) model is generally applicable to describe various dynamic processes with multiple alternative measurement methods.