Arrow Research search

Author name cluster

Chao Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

79 papers
2 author rows

Possible papers


AAAI Conference 2026 Conference Paper

Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching

  • Yanhao Dong
  • Yubo Miao
  • Weinan Li
  • Xiao Zheng
  • Chao Wang
  • Jiesheng Wu
  • Feng Lyu

Large Language Models (LLMs) exhibit pronounced memory-bound characteristics during inference due to High Bandwidth Memory (HBM) bandwidth constraints. In this paper, we propose an L2 Cache-oriented asynchronous KV Cache prefetching method to break through the memory bandwidth bottleneck in LLM inference through computation-load overlap. By strategically scheduling idle memory bandwidth during active computation windows, our method proactively prefetches required KV Cache into GPU L2 cache, enabling high-speed L2 cache hits for subsequent accesses and effectively hiding HBM access latency within computational cycles. Extensive experiments on NVIDIA H20 GPUs demonstrate that the proposed method achieves 2.15× improvement in attention kernel efficiency and up to 1.97× end-to-end throughput enhancement, surpassing state-of-the-art baseline FlashAttention-3. Notably, our solution maintains orthogonality to existing optimization techniques and can be integrated with current inference frameworks, providing a scalable latency-hiding solution for next-generation LLM inference engines.

AAAI Conference 2026 Conference Paper

AEDR: Training-Free AI-Generated Image Attribution via Autoencoder Double-Reconstruction

  • Chao Wang
  • Zijin Yang
  • Yaofei Wang
  • Weiming Zhang
  • Kejiang Chen

The rapid advancement of image-generation technologies has made it possible for anyone to create photorealistic images using generative models, raising significant security concerns. To mitigate malicious use, tracing the origin of such images is essential. Reconstruction-based attribution methods offer a promising solution, but they often suffer from reduced accuracy and high computational costs when applied to state-of-the-art (SOTA) models. To address these challenges, we propose AEDR (AutoEncoder Double-Reconstruction), a novel training-free attribution method designed for generative models with continuous autoencoders. Unlike existing reconstruction-based approaches that rely on the value of a single reconstruction loss, AEDR performs two consecutive reconstructions using the model’s autoencoder, and adopts the ratio of these two reconstruction losses as the attribution signal. This signal is further calibrated using the image homogeneity metric to improve accuracy, which inherently cancels out absolute biases caused by image complexity, with autoencoder-based reconstruction ensuring superior computational efficiency. Experiments on eight top latent diffusion models show that AEDR achieves 25.5% higher attribution accuracy than existing reconstruction-based methods, while requiring only 1% of the computational time.
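The double-reconstruction signal described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy autoencoder, the use of mean squared error, and the direction of the ratio (second loss over first) are all assumptions, and the homogeneity calibration step is omitted.

```python
def mse(a, b):
    """Mean squared error between two equally sized pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def double_reconstruction_ratio(image, autoencoder, eps=1e-12):
    """Reconstruct twice and return the ratio of the two reconstruction losses.

    `autoencoder` is any callable mapping an image to its reconstruction
    (hypothetical stand-in for a generative model's encoder-decoder pair).
    The ratio direction (second / first) is an assumption of this sketch.
    """
    first = autoencoder(image)    # first reconstruction
    second = autoencoder(first)   # reconstruction of the reconstruction
    loss_1 = mse(image, first)
    loss_2 = mse(first, second)
    return loss_2 / (loss_1 + eps)  # eps guards against division by zero


def toy_autoencoder(pixels):
    """Hypothetical lossy autoencoder: shrinks every pixel by 10%."""
    return [0.9 * p for p in pixels]


ratio = double_reconstruction_ratio([1.0, 2.0, 3.0], toy_autoencoder)
```

With this toy autoencoder the ratio comes out at about 0.81 regardless of the image content, which illustrates why a ratio is less sensitive to image complexity than a single absolute loss.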

AAAI Conference 2026 Conference Paper

CloserToMe: A Unified Framework for Accurate and Transferable Latency Prediction Across Heterogeneous Devices

  • Cheng Tang
  • Guochong Sui
  • Wenqi Lou
  • Zihan Wang
  • Jiayi Tuo
  • Wenqian Xie
  • Yinkang Gao
  • Yixuan Zhu

Hardware accelerators such as GPUs, NPUs, and FPGAs are essential to meeting AI’s computational demands. With the proliferation of heterogeneous devices across cloud and edge, various model optimization techniques adapt to diverse hardware characteristics through operator transformations and structural modifications. Accurate, efficient latency prediction enables rapid selection of optimal strategies across hardware backends. Many existing methods treat hardware as a black-box executor, directly regressing latency without explicitly modeling the intricate interactions between neural network (NN) structures and device-specific execution behaviors. To address these challenges, we introduce a new modeling perspective that captures the interaction between neural architectures and hardware execution. To capture device-specific characteristics, we propose two complementary modeling strategies. The Device Behavior Signature Selector (DBSel) characterizes hardware execution behavior by selectively probing a small set of representative architectures, forming a compact, workload-driven profile. In parallel, we construct capability vectors that capture the hierarchical memory of each device and compute characteristics, providing a structured abstraction of its architectural capacity. To unify both behavioral and structural views, we introduce the Hardware–Operation Dialogue Module (HODM), which models fine-grained interactions between neural operators and hardware properties. Together, these components empower CloserToMe to deliver accurate and transferable latency predictions across unseen and diverse platforms.

AAAI Conference 2026 Conference Paper

Enhancing Conversational Recommender Systems with Tree-Structured Knowledge and Pretrained Language Models

  • Yongwen Ren
  • Chao Wang
  • Peng Du
  • Chuan Qin
  • Dazhong Shen
  • Hui Xiong

Recent advances in pretrained language models (PLMs) have significantly improved conversational recommender systems (CRS), enabling more fluent and context-aware interactions. To further enhance accuracy and mitigate hallucination, many methods integrate PLMs with knowledge graphs (KGs), but face key challenges: failing to fully exploit PLM reasoning over graph relationships, indiscriminately incorporating retrieved knowledge without context filtering, and neglecting collaborative preferences in multi-turn dialogues. To this end, we propose PCRS-TKA, a prompt-based framework employing retrieval-augmented generation to integrate PLMs with KGs. PCRS-TKA constructs dialogue-specific knowledge trees from KGs and serializes them into texts, enabling structure-aware reasoning while capturing rich entity semantics. Our approach selectively filters context-relevant knowledge and explicitly models collaborative preferences using specialized supervision signals. A semantic alignment module harmonizes heterogeneous inputs, reducing noise and enhancing accuracy. Extensive experiments demonstrate that PCRS-TKA consistently outperforms all baselines in both recommendation and conversational quality.

EAAI Journal 2026 Journal Article

GMFIMamba: Remote sensing change detection based on group Mamba feature interaction

  • Wenliang Xu
  • Suting Chen
  • Feilong Bi
  • Chao Wang
  • Xiao Shu

With the advancement of satellite technology, high-resolution remote sensing images have been widely used in the field of change detection. Building Change Detection (BCD) and Building Damage Assessment (BDA) are both sub-tasks of change detection. BCD aims to detect structural changes in buildings over time, whereas BDA focuses on assessing the level of building damage after a disaster. BCD is of great value for urban planning, while BDA plays a crucial role in post-disaster rescue efforts. To address these tasks, we propose a change detection method based on Mamba, named GMFIMamba. Specifically, we design a Convolution–Visual State Space (Conv-VSS) block, which combines the local feature extraction capability of Convolutional Neural Networks (CNNs) with the global feature modeling ability of Mamba. By integrating local and global features, our approach improves the accuracy of change region detection. To tackle the issue of insufficient feature extraction for small-scale buildings in existing models, we introduce the Multi-branch Dilated Convolution Feature Enhancement Module (MCFEM). In addition, we design the Grouped Mamba-Based Bitemporal Features Interaction Module (GMBFIM) to facilitate effective interaction between bitemporal images, leading to more accurate change feature extraction. Experiments on three public datasets demonstrate that the proposed method achieves superior performance in both BCD and BDA tasks, proving its effectiveness.

AAAI Conference 2026 Conference Paper

Image Restoration via Primal Dual Hybrid Gradient and Flow Generative Model

  • Ji Li
  • Chao Wang

Regularized optimization has been a classical approach to solving imaging inverse problems, where the regularization term enforces desirable properties of the unknown image. Recently, the integration of flow matching generative models into image restoration has garnered significant attention, owing to their powerful prior modeling capabilities. In this work, we incorporate such generative priors into a Plug-and-Play (PnP) framework based on proximal splitting, where the proximal operator associated with the regularizer is replaced by a time-dependent denoiser derived from the generative model. While existing PnP methods have achieved notable success in inverse problems with smooth squared ℓ2 data fidelity--typically associated with Gaussian noise--their applicability to more general data fidelity terms remains underexplored. To address this, we propose a general and efficient PnP algorithm inspired by the primal-dual hybrid gradient (PDHG) method. Our approach is computationally efficient, memory-friendly, and accommodates a wide range of fidelity terms. In particular, it supports both ℓ1 and ℓ2 norm-based losses, enabling robustness to non-Gaussian noise types such as Poisson and impulse noise. We validate our method on several image restoration tasks, including denoising, super-resolution, deblurring, and inpainting, and demonstrate that ℓ1 and ℓ2 fidelity terms outperform the conventional squared ℓ2 loss in the presence of non-Gaussian noise.

AAAI Conference 2026 Conference Paper

Integrating Reweighted Least Squares with Plug-and-Play Diffusion Priors for Noisy Image Restoration

  • Ji Li
  • Chao Wang

Existing plug-and-play image restoration methods typically employ off-the-shelf Gaussian denoisers as proximal operators within classical optimization frameworks based on variable splitting. Recently, denoisers induced by generative priors have been successfully integrated into regularized optimization methods for image restoration under Gaussian noise. However, their application to non-Gaussian noise--such as impulse noise--remains largely unexplored. In this paper, we propose a plug-and-play image restoration framework based on generative diffusion priors for robust removal of general noise types, including impulse noise. Within the maximum a posteriori (MAP) estimation framework, the data fidelity term is adapted to the specific noise model. Departing from the conventional least-squares loss used for Gaussian noise, we introduce a generalized Gaussian scale mixture-based loss, which approximates a wide range of noise distributions and leads to an ℓq-norm fidelity term. This optimization problem is addressed using an iteratively reweighted least squares (IRLS) approach, wherein the proximal step involving the generative prior is efficiently performed via a diffusion-based denoiser. Experimental results on benchmark datasets demonstrate that the proposed method effectively removes non-Gaussian impulse noise and achieves superior restoration performance.
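To make the IRLS idea concrete, the sketch below minimizes an ℓq data-fidelity term for a single unknown value. It is only an illustration under stated assumptions: the function name and the scalar setting are hypothetical, and the diffusion-based proximal step from the abstract is left out entirely.

```python
def irls_location(samples, q=1.0, iterations=200, eps=1e-8):
    """Estimate x minimizing sum(|y_i - x|**q) by iteratively reweighted least squares.

    Each iteration solves a weighted least-squares problem whose weights
    |r_i|**(q - 2) come from the current residuals r_i = y_i - x.
    `eps` smooths the weights so that zero residuals do not divide by zero.
    """
    estimate = sum(samples) / len(samples)  # start from the least-squares solution
    for _ in range(iterations):
        weights = [((y - estimate) ** 2 + eps) ** (q / 2.0 - 1.0) for y in samples]
        estimate = sum(w * y for w, y in zip(weights, samples)) / sum(weights)
    return estimate


# Four clean measurements near 1.0 and one impulse-like outlier.
observations = [1.0, 1.1, 0.9, 1.0, 50.0]
robust_estimate = irls_location(observations, q=1.0)
least_squares_estimate = irls_location(observations, q=2.0)
```

With q = 2 every weight equals one and the result is the plain mean, which the outlier drags far from the clean values; with q = 1 the outlier is progressively down-weighted and the estimate settles near the median.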

EAAI Journal 2026 Journal Article

Mixed-variable optimization of bi-directional functionally graded porous plates based on multi-patch isogeometric analysis

  • Yun Chong
  • Chao Wang
  • Chenxu Chu
  • Liangliang Ma

Bi-directional functionally graded (2D-FG) porous plates are widely used in aerospace and high-end engineering owing to their lightweight and tailorable properties. This study proposes, for the first time, a mixed-variable optimization framework for 2D-FG porous plates based on fixed Hamming weight, which integrates intelligent optimization strategies into the design of 2D-FG porous plates to achieve unified modeling and adaptive optimization of material distribution and porosity type. In the mechanical analysis, based on the third-order shear deformation theory and Nitsche’s method, the static behavior of 2D-FG porous plates is numerically analyzed and verified by isogeometric analysis. In the optimization design problem, the first natural frequency is maximized as the objective, with design variables including material volume fraction, porosity, and porosity distribution type. Four numerical examples are presented to confirm the effectiveness and robustness of the proposed method in enhancing dynamic performance and material distribution quality, offering a new perspective for the optimal design of functionally graded porous composite structures.

AAAI Conference 2026 Conference Paper

MSAnchor: De Novo Molecular Generation from Mass Spectrometry Data with Anchor-Extended Molecular Scaffolds

  • Xiaohan Qin
  • Chao Wang
  • Zhengyang Zhou
  • Linjiang Chen
  • Wenjie Du
  • Yang Wang

Tandem mass spectrometry (MS/MS) is a critical tool for identifying molecular structures. By efficiently separating molecular fragments based on their mass-to-charge (m/z) ratios, it facilitates molecular generation and subsequent scientific discoveries. However, de novo molecular generation from MS/MS spectra remains fundamentally constrained by two paramount challenges: the vast chemical space requires effective structural constraints, and the absence of fine-grained substructural generation weakens the correspondences between spectral features and molecular structures. In this work, we propose MSAnchor, a novel two-stage framework for MS/MS-based molecular structure generation. We mitigate the search space challenge through the introduction of the Anchor-Extended Molecular Scaffold (AEMS) representation, which explicitly encodes side-chain anchoring points, thereby dramatically reducing combinatorial complexity. Leveraging the explicit attachment sites provided by AEMS, we develop anchor-specific priors that establish effective alignments between spectral features and molecular substructures. This fine-grained substructural correspondence is further enhanced by a modified Conditional Information Bottleneck (CIB) module that extracts the most informative spectral components in a structure-aware manner. These innovations enable MSAnchor to generate molecular structures that closely reflect spectral characteristics while constraining combinatorial complexity. Extensive experiments on the CANOPUS and MassSpecGym datasets demonstrate that MSAnchor achieves state-of-the-art performance in molecular structure prediction from MS/MS spectra, with performance improvements that are particularly pronounced for molecules with higher complexity.

JBHI Journal 2026 Journal Article

SHIELD: Blockchain-Enabled Lightweight Authentication Framework for Secure Wearable Health Monitoring in IoMT Environments

  • Chao Wang
  • Xingsi Xue
  • Jing Wang
  • Huamao Jiang

Consumer health devices generate massive volumes of sensitive medical data requiring secure authentication mechanisms that accommodate the resource constraints of wearable sensors and portable diagnostic equipment. Traditional centralized authentication approaches in Internet of Medical Things (IoMT) environments suffer from single points of failure, privacy vulnerabilities, and scalability limitations when managing diverse health monitoring devices. This paper presents secure healthcare IoMT enhanced lightweight device authentication (SHIELD), a blockchain-based lightweight authentication framework designed for resource-constrained consumer health devices. The framework leverages blockchain's immutable and decentralized properties, combined with efficient elliptic curve cryptography, to ensure secure storage and verification of device identities while providing mutual authentication between health devices and medical data servers. Security analysis demonstrates that SHIELD satisfies twelve critical security properties, including decentralization, resistance to password guessing and replay attacks, perfect forward secrecy, and session key security. Performance evaluation reveals that SHIELD achieves an authentication latency of 9.837 milliseconds, a 31% improvement over the previous best-performing schemes. The framework requires only 1384 bits of communication overhead and maintains minimal average delay times suitable for real-time health monitoring applications. Blockchain implementation analysis confirms practical deployment feasibility with operational costs of 0.0356 MGas per authentication session.

AAAI Conference 2026 Conference Paper

Small but Mighty: Dynamic Wavelet Expert-Guided Fine-Tuning of Large-Scale Models for Optical Remote Sensing Object Segmentation

  • Yanguang Sun
  • Chao Wang
  • Jian Yang
  • Lei Luo

Accurately localizing and segmenting relevant objects from optical remote sensing images (ORSIs) is critical for advancing remote sensing applications. Existing methods are typically built upon moderate-scale pre-trained models and employ diverse optimization strategies to achieve promising performance under full-parameter fine-tuning. In fact, deeper and larger-scale foundation models can provide stronger support for performance improvement. However, due to their massive number of parameters, directly adopting full-parameter fine-tuning leads to pronounced training difficulties, such as excessive GPU memory consumption and high computational costs, which result in extremely limited exploration of large-scale models in existing works. In this paper, we propose a novel dynamic wavelet expert-guided fine-tuning paradigm with fewer trainable parameters, dubbed WEFT, which efficiently adapts large-scale foundation models to ORSIs segmentation tasks by leveraging the guidance of wavelet experts. Specifically, we introduce a task-specific wavelet expert extractor to model wavelet experts from different perspectives and dynamically regulate their outputs, thereby generating trainable features enriched with task-specific information for subsequent fine-tuning. Furthermore, we construct an expert-guided conditional adapter that first enhances the fine-grained perception of frozen features for specific tasks by injecting trainable features, and then iteratively updates the information of both types of feature, allowing for efficient fine-tuning. Extensive experiments show that our WEFT not only outperforms 21 state-of-the-art (SOTA) methods on three ORSIs datasets, but also achieves optimal results in camouflage, natural, and medical scenarios.

TMLR Journal 2026 Journal Article

SpikingMamba: Towards Energy-Efficient Large Language Models via Knowledge Distillation from Mamba

  • Yulong Huang
  • Jianxiong Tang
  • Chao Wang
  • Ziyi Wang
  • Jianguo Zhang
  • Zhichao Lu
  • Bojun Cheng
  • Luziwei Leng

Large Language Models (LLMs) have achieved remarkable performance across tasks but remain energy-intensive due to dense matrix operations. Spiking neural networks (SNNs) improve energy efficiency by replacing dense matrix multiplications with sparse accumulations. Their sparse spike activity enables efficient LLM deployment on edge devices. However, prior SNN-based LLMs often sacrifice performance for efficiency, and recovering accuracy typically requires full pretraining, which is costly and impractical. To address this, we propose SpikingMamba, an energy-efficient SNN-based LLM distilled from Mamba that improves energy efficiency with minimal accuracy sacrifice. SpikingMamba integrates two key components: (a) SI-LIF, a signed-integer spiking neuron that preserves semantic polarity through signed multi-level spike representations; and (b) a training-exclusive Smoothed Gradient Compensation (SGC) path that mitigates quantization loss while preserving spike-driven efficiency. We employ a single-stage distillation strategy to transfer the zero-shot ability of pretrained Mamba and further enhance it via reinforcement learning (RL). Experiments show that SpikingMamba-1.3B achieves a 4.76× energy benefit, with only a 4.78% zero-shot accuracy gap compared to the original Mamba. The model achieves a further 2.55% accuracy improvement after RL, narrowing the performance gap from 4.78% to 2.23%.

EAAI Journal 2025 Journal Article

A real-time prediction model for instantaneous dam-break flood evolution of concrete gravity dams based on attention mechanism and spatiotemporal multiple features

  • Chao Wang
  • Yaofei Zhang
  • Sherong Zhang
  • Xiaohua Wang
  • Xingbo Zhou
  • Yishu Lai

Simulating the flood evolution following the sudden breach of concrete gravity dams is crucial for enabling prompt emergency flood control decisions. The real-time performance and reliability of these flood propagation simulations are essential for improving the accuracy and speed of emergency responses. This study introduces a deep learning model that integrates an attention mechanism to predict flood evolution parameters in real time. Initially, parameters such as water depth and flow rate were measured under 32 distinct dam-break scenarios using a hydrodynamic model. By combining terrain data with time-series flood discharge data, we compiled a dataset containing 1984 entries, enhanced through reduced-order methods. A novel deep learning model, the Flood-Swin-Transformer, was then developed to predict the spatiotemporal evolution of dam-break floods. This model was benchmarked against 11 baseline models and four state-of-the-art deep learning models. The results indicate: (1) Baseline models accurately predict water depth but are less effective at predicting flow rate parameters; (2) Deep learning models outperform baseline models in both accuracy and classification capabilities for water depth and flow rate parameters, showing robust performance; (3) Extensive analyses, including error, classification accuracy, effectiveness, robustness, and flood parameter error mapping, demonstrate the superior performance of the proposed model; (4) The proposed model predicts flood evolution up to 43.75 times faster than traditional hydrodynamic models, facilitating real-time prediction capabilities.

NeurIPS Conference 2025 Conference Paper

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding

  • Ahmed Masry
  • Juan Rodriguez
  • Tianyu Zhang
  • Suyuchen Wang
  • Chao Wang
  • Aarash Feizi
  • Akshay Kalkunte Suresh
  • Abhay Puri

Aligning visual features with language embeddings is a key challenge in vision-language models (VLMs). The performance of such models hinges on having a good connector that maps visual features generated by a vision encoder to a shared embedding space with the LLM while preserving semantic similarity. Existing connectors, such as multilayer perceptrons (MLPs), lack inductive bias to constrain visual features within the linguistic structure of the LLM’s embedding space, making them data-hungry and prone to cross-modal misalignment. In this work, we propose a novel vision-text alignment method, AlignVLM, that maps visual features to a weighted average of LLM text embeddings. Our approach leverages the linguistic priors encoded by the LLM to ensure that visual features are mapped to regions of the space that the LLM can effectively interpret. AlignVLM is particularly effective for document understanding tasks, where visual and textual modalities are highly correlated. Our extensive experiments show that AlignVLM achieves state-of-the-art performance compared to prior alignment methods, with larger gains on document understanding and under low-resource setups. We provide further analysis demonstrating its efficiency and robustness to noise.
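The central mapping in this abstract, a visual feature expressed as a weighted average of LLM text embeddings, can be sketched as below. This is a hedged illustration only: producing the weights with a linear projection followed by a softmax over the vocabulary is an assumption of the sketch, and all names, shapes, and values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def align_connector(visual_feature, projection, text_embeddings):
    """Map one visual feature to a convex combination of text embeddings.

    projection: one row per vocabulary entry (V x d_visual), assumed learnable.
    text_embeddings: the LLM embedding table (V x d_text), assumed frozen.
    """
    logits = [sum(w * x for w, x in zip(row, visual_feature)) for row in projection]
    weights = softmax(logits)  # non-negative and summing to one
    dim = len(text_embeddings[0])
    output = [
        sum(weights[v] * text_embeddings[v][k] for v in range(len(weights)))
        for k in range(dim)
    ]
    return weights, output


# Toy example: 3-entry vocabulary, 2-d visual features, 2-d text embeddings.
toy_projection = [[0.2, 0.1], [1.0, 0.0], [-0.3, 0.4]]
toy_text_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, aligned = align_connector([1.0, -0.5], toy_projection, toy_text_embeddings)
```

Because the weights are non-negative and sum to one, the output always lies inside the convex hull of the text embeddings, which is one way to read the abstract's claim that visual features land in regions the LLM can interpret.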

EAAI Journal 2025 Journal Article

Bogie doctor: Automated fault diagnosis in high-speed trains via deep learning-based maintenance log analysis

  • Zhenyu Zhang
  • Runsheng Miao
  • Chao Wang
  • Yong Qin

The bogie, with its numerous components and complex fault evolution patterns, has fault modes that mutually influence each other, significantly impacting the stability and safety of the train. During the train operation and maintenance process, a large number of maintenance logs are generated, detailing fault phenomena, locations, and impacts. Mining these maintenance logs for automated bogie fault diagnosis can enhance fault handling efficiency and effectively ensure the safe and efficient operation of trains. Therefore, a new deep learning framework is developed to address the challenges of feature representation and low diagnostic accuracy. First, the fault textual data in the maintenance logs is preprocessed, and a specialized dictionary related to bogie faults is constructed to improve the quality of text representation. Then, the developed deep learning framework is used as a classifier: the Robustly optimized Bidirectional Encoder Representations from Transformers approach (RoBERTa) model maps words into real-valued vectors, and the advantages of the Bidirectional Gated Recurrent Unit (BiGRU) and the Text Convolutional Neural Network (TextCNN) in extracting global and local features are leveraged to enhance fault diagnosis accuracy. Finally, the model's effectiveness is validated using high-speed railway maintenance logs. The results indicate that the proposed model achieves good diagnostic accuracy, with precision and recall rates of 88.75% and 88.27%, respectively.

NeurIPS Conference 2025 Conference Paper

FACE: A General Framework for Mapping Collaborative Filtering Embeddings into LLM Tokens

  • Chao Wang
  • Yixin Song
  • Jinhui Ye
  • Chuan Qin
  • Dazhong Shen
  • Lingfeng Liu
  • Xiang Wang
  • Yanyong Zhang

Recently, large language models (LLMs) have been explored for integration with collaborative filtering (CF)-based recommendation systems, which are crucial for personalizing user experiences. However, a key challenge is that LLMs struggle to interpret the latent, non-semantic embeddings produced by CF approaches, limiting recommendation effectiveness and further applications. To address this, we propose FACE, a general interpretable framework that maps CF embeddings into pre-trained LLM tokens. Specifically, we introduce a disentangled projection module to decompose CF embeddings into concept-specific vectors, followed by a quantized autoencoder to convert continuous embeddings into LLM tokens (descriptors). Then, we design a contrastive alignment objective to ensure that the tokens align with corresponding textual signals. Hence, the model-agnostic FACE framework achieves semantic alignment without fine-tuning LLMs and enhances recommendation performance by leveraging their pre-trained capabilities. Empirical results on three real-world recommendation datasets demonstrate performance improvements in benchmark models, with interpretability studies confirming the interpretability of the descriptors. Code is available at https://github.com/YixinRoll/FACE.

TAAS Journal 2025 Journal Article

Factorization-based Attribute Residual Summary for Adaptive Edge-based Autonomous System Security

  • Jiuzhen Zeng
  • Laurence T. Yang
  • Chao Wang
  • Xin Nie
  • Bocheng Ren
  • Honglu Zhao

Due to the particularity of the marginal environment, edge-based autonomous systems face significant risks associated with security operations. Traffic anomaly detection in edge-based autonomous systems has become increasingly crucial for ensuring the security of these systems. Existing works lack consideration of the relationship between traffic attributes and anomaly types. In particular, existing solutions struggle with detecting anomalies that primarily manifest statistical signs in only a few attributes. To address this, we propose a nonnegative factorization-based attribute residual summary and a nonparametric statistic framework for adaptive security monitoring in edge-based autonomous systems. Specifically, the nonnegative factorization, which depends on the multiplicative update rules, is introduced to extract attribute features. Using the tensor linear representation, the attribute residual summary is built, which depicts the statistical discrepancy well even if only a few traffic attributes are affected, to implement adaptive security monitoring for various attacks in edge-based autonomous systems. Then, a nonparametric statistic framework is developed, which achieves real-time detection by accumulating and comparing statistical evidence. Extensive experiments with real-world traffic trace datasets validate the adaptivity, accuracy, real-time performance, and superiority of our method, particularly in dealing with anomalies that exhibit statistical signs in only a few traffic attributes.

ICML Conference 2025 Conference Paper

Generative Audio Language Modeling with Continuous-valued Tokens and Masked Next-Token Prediction

  • Shu-Wen Yang
  • Byeonggeun Kim
  • Kuan-Po Huang
  • Qingming Tang
  • Huy Phan
  • Bo-Ru Lu
  • Harshavardhan Sundar
  • Shalini Ghosh

Autoregressive next-token prediction with the Transformer decoder has become a de facto standard in large language models (LLMs), achieving remarkable success in Natural Language Processing (NLP) at scale. Extending this paradigm to audio poses unique challenges due to its inherently continuous nature. We study audio generation with a causal language model (LM) without discrete tokens. We leverage token-wise diffusion to model the continuous distribution of the next continuous-valued token. Our approach delivers significant improvements over the previous discrete solution, AudioGen, achieving 20% and 40% relative gains on AudioCaps in Frechet Audio Distance (FAD) and Kullback-Leibler (KL) divergence, respectively. Additionally, we propose a novel masked next-token prediction task that incorporates masked prediction into the causal LM framework. On AudioCaps, the innovation yields 41% and 33% relative FAD improvements over AudioGen Base (285M) and AudioGen Large (1B) models, respectively, and is on par with the state-of-the-art (SOTA) diffusion models. Furthermore, we achieve these results with significantly fewer parameters: 193M for our Base and 462M for our Large models.

YNIMG Journal 2025 Journal Article

Hippocampal subfields in aging: Sex-specific trajectories in structure and hemodynamics

  • Jiaqi Wen
  • Chenyang Li
  • Zhe Sun
  • Chao Wang
  • Jiangyang Zhang
  • Xiaojun Guan
  • Xiaojun Xu
  • Thomas Wisniewski

Sex differences in hippocampal aging have been increasingly recognized, with females showing greater vulnerability to neurodegeneration, particularly after menopause. However, the underlying neurobiological mechanisms remain unclear, especially at the level of hippocampal subfields. Leveraging high-resolution T1-, T2-weighted, and multi-delay arterial spin labeling MRI from 650 adults in the Human Connectome Project-Aging dataset, we examined sex-specific alterations in hippocampal subfield volume, arterial transit time (ATT), and cerebral blood flow (CBF) across the adult lifespan. All hippocampal subfields showed age-related atrophy and ATT prolongation. An age × sex interaction effect on ATT was observed in CA1 and CA2, indicating that age-related increases in ATT were more pronounced in females than in males in these subfields. Moreover, females exhibited more pronounced hippocampal subfield CBF reductions with aging and atrophy, while males showed relatively preserved CBF, with an increase in subiculum perfusion. Furthermore, CA1 showed the lowest perfusion and the strongest association with atrophy among hippocampal subfields. To investigate the potential impact of menopausal hormonal changes on sex-specific patterns, we explored the hypothalamic structure and hemodynamic alterations during aging and their effects on the hippocampus, given that the hypothalamus regulates gonadal hormone secretion through the hypothalamic-pituitary-gonadal axis. We found significant hypothalamic atrophy during aging in both sexes, accompanied by ATT prolongation exclusively in females, which was associated with hippocampal atrophy and impaired hemodynamics. Our study highlights the intricate interplay between hippocampal structure and vascular function, revealing sex- and subfield-specific aging trajectories. These findings provide a normative quantitative imaging reference for age-related neurodegenerative diseases such as Alzheimer's Disease.

ICML Conference 2025 Conference Paper

IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling

  • Kuan-Po Huang
  • Shu-Wen Yang
  • Huy Phan
  • Bo-Ru Lu
  • Byeonggeun Kim
  • Sashank Macha
  • Qingming Tang
  • Shalini Ghosh

Text-to-audio generation synthesizes realistic sounds or music given a natural language prompt. Diffusion-based frameworks, including the Tango and the AudioLDM series, represent the state-of-the-art in text-to-audio generation. Despite achieving high audio fidelity, they incur significant inference latency due to the slow diffusion sampling process. MAGNET, a mask-based model operating on discrete tokens, addresses slow inference through iterative mask-based parallel decoding. However, its audio quality still lags behind that of diffusion-based models. In this work, we introduce IMPACT, a text-to-audio generation framework that achieves high performance in audio quality and fidelity while ensuring fast inference. IMPACT utilizes iterative mask-based parallel decoding in a continuous latent space powered by diffusion modeling. This approach eliminates the fidelity constraints of discrete tokens while maintaining competitive inference speed. Results on AudioCaps demonstrate that IMPACT achieves state-of-the-art performance on key metrics including Fréchet Distance (FD) and Fréchet Audio Distance (FAD) while significantly reducing latency compared to prior models. The project website is available at https://audio-impact.github.io/.

EAAI Journal 2025 Journal Article

Multimodal stock market emotion recognition model trained with a large language model

  • Chao Liu
  • Yuxia Miao
  • Qi Zhao
  • Chao Wang
  • Xiangyu Zhu

Stock market emotion recognition models are often trained on public textual datasets. These datasets are well-labelled but may not have the same distribution as real-world data and consequently fail to reflect the real world. Furthermore, stock market emotion is reflected not only in textual comments but also in images; emotion recognition on the basis of textual comments alone is one-sided. Considering these issues, in this paper, we propose a multimodal emotion recognition framework that is trained via imitation learning from a large language model. The proposed framework contains two main innovations. First, by leveraging the compositionality capability of large language models, the proposed framework generates pseudo labels from textual comments. By imitating the patterns of the pseudo labels, the framework can be trained directly on unlabelled real-world data, which addresses the distribution drift of current models between public datasets and the real world. Second, the proposed framework is equipped with multimodal fusion, enabling emotion recognition from textual stock market comments and images simultaneously. Compared with existing methods, the proposed method significantly improves the performance of stock market emotion recognition by leveraging large language models and multimodal stock market data. The experimental results demonstrate that the proposed emotion recognition framework outperforms the existing single-mode models and multimodal models, with accuracy, precision, recall, and F1 score of 82.97%, 83.03%, 83.05%, and 82.88%, respectively. This framework provides a new pathway for multimodal and unsupervised/semi-supervised emotion recognition in the stock market.

IROS Conference 2025 Conference Paper

Parameter Selections and Applications for Soft Bellows Actuators (SBAs) with Various Performance Metrics

  • Wenjing Zou
  • Zhekai Li
  • Ziting Xiao
  • Kailan Zheng
  • Chao Lin
  • Haolin Chen
  • Peifeng Yu
  • Yi Niu

Soft bellows actuators (SBAs), a particular type of soft pneumatic actuators (SPAs), are widely used in various applications, such as climbing robots, industrial grippers, and wearable devices. Despite their advantages of uniform motion and high efficiency, the design of SBAs often relies on experiential methods rather than standardized guidelines. This results in unclear optimization pathways and a misalignment between SBA performance and specific application requirements. This study identifies six critical parameters of linear pneumatic SBAs: Shore hardness (SH), number of units (N), thickness (t), mid-diameter (R_m), unit width (x), and unit depth (h). We explore how these parameters influence load capacity, displacement efficiency, and bending resistance. Experimental findings indicate that increasing SH, t, x, and h and decreasing N enhance load capacity. Moreover, increases in N, R_m, x, and h, along with decreases in SH and t, improve displacement efficiency. Furthermore, enhancing SH, t, and R_m and reducing N, x, and h strengthen bending resistance. Based on these insights, we design three types of SBAs tailored to specific tasks, which are implemented in a high-load pneumatic gripper, a high-efficiency displacement table, and a pneumatic worm-inspired climbing robot. This research contributes to the targeted design of SBAs, offering a novel approach for the effective optimization and performance prediction of particular SPAs, thereby facilitating the broader application of soft robots.

EAAI Journal 2025 Journal Article

Plug-and-play dynamic optimization for three-dimensional Gaussian generation

  • Qixuan Li
  • Haoyang Li
  • Chao Wang
  • Yang Zhou
  • Yan Peng

Recent advancements in Three-Dimensional (3D) asset generation have demonstrated remarkable progress in generation efficiency, enabling transformative applications across creative industries and mission-critical domains including autonomous systems. Current 3D asset generation primarily employs Score Distillation Sampling (SDS) to derive 3D priors from Two-Dimensional (2D) diffusion models. While this contribution ensures high generation quality, it is time-consuming. Recent methods have utilized 3D Gaussian Splatting for image rendering, which, despite enhancing generation speed, compromised on quality. Our method aims to balance the quality and speed of 3D asset generation by designing a plug-and-play optimization process that combines the strengths of both methods. We propose a rapid 3D Gaussian generation framework that begins with constructing a pipeline to generate multi-view images from text input using pre-trained generative models. Then our method utilizes 3D Gaussian Splatting for quick 3D asset initialization and subsequently performs detail optimization using Gaussian Filter and SDS-based 2D diffusion model optimizer. Additionally, we have optimized the loss function for 3D Gaussian Splatting and ensured the entire optimization process is plug-and-play, offering high generation quality and speed. Our method demonstrates strong adaptability in representative single-object 3D Gaussian generation tasks, indicating promising generalization potential. Achieving high-quality 3D generation on a single Graphics Processing Unit (GPU), our framework outperforms most popular optimization-based models in generation speed (5×+ speedup). Furthermore, when juxtaposed with the latest inference-based models, our optimization architecture offers a notable enhancement in generation quality (Contrastive Language-Image Pre-Training Score 33.8 vs. 27.3) within an acceptable amount of time.

AAAI Conference 2025 Conference Paper

RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs

  • Jiaxing Wu
  • Lin Ning
  • Luyang Liu
  • Harrison Lee
  • Neo Wu
  • Chao Wang
  • Sushant Prakash
  • Shawn O'Banion

LLM-powered personalization agent systems employ Large Language Models (LLMs) to predict users’ behavior from their past activities. However, their effectiveness often hinges on the ability to effectively leverage extensive, long user historical data, which is difficult due to the inherent noise and length of such data. Existing pre-trained LLMs may generate summaries that are concise but lack the necessary context for downstream tasks, hindering their utility in personalization systems. To address these challenges, we introduce Reinforcement Learning from Prediction Feedback (RLPF). RLPF fine-tunes LLMs to generate concise, human-readable user summaries that are optimized for downstream task performance. By maximizing the usefulness of the generated summaries, RLPF effectively distills extensive user history data while preserving essential information for downstream tasks. Our empirical evaluation demonstrates significant improvements in both extrinsic downstream task utility and intrinsic summary quality, surpassing baseline methods by up to 22% and achieving an up to 84.59% win rate on Factuality, Abstractiveness, and Readability. RLPF also achieves a remarkable 74% reduction in context length while improving performance on 16 out of 19 unseen tasks and/or datasets, showcasing its generalizability. This approach offers a promising solution for enhancing LLM personalization by effectively transforming long, noisy user histories into informative and human-readable representations.

EAAI Journal 2025 Journal Article

Shapelet and Graph Convolutional Network with transformer for Channel Access Attacks classification

  • Yiting Hou
  • Jianhua Fan
  • Xianglin Wei
  • Chao Wang

The open and broadcast nature of wireless communication makes signals susceptible to Channel Access Attacks (CAA) at the Medium Access Control (MAC) layer, disrupting network performance. Existing Graph Neural Network (GNN)-based detection methods face critical limitations: graph construction from time series often disrupts temporal continuity, and detection accuracy degrades as the distance from the attack source increases. To address these challenges, we propose a novel node-level classification framework, Shapelet and Transformer Graph Convolutional Network (SHA-TGCN). The key innovation lies in leveraging the inherent advantages of GNNs: lower data requirements and lower computational complexity. SHA-TGCN requires less training data and has lower complexity than traditional neural networks, benefiting from local feature aggregation and our compact Shapelet-based structure, making it suitable for real-time CAA detection in resource-constrained wireless network environments. Experimental results demonstrate that SHA-TGCN outperforms other GNNs with an average classification accuracy of 80.31% and fast computational efficiency.

AAAI Conference 2025 Conference Paper

SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models

  • Shuaijie Shen
  • Chao Wang
  • Renzhuo Huang
  • Yan Zhong
  • Qinghai Guo
  • Zhichao Lu
  • Jianguo Zhang
  • Luziwei Leng

Known as low energy consumption networks, spiking neural networks (SNNs) have gained a lot of attention within the past decades. While SNNs are increasingly competitive with artificial neural networks (ANNs) for vision tasks, they are rarely used for long sequence tasks, despite their intrinsic temporal dynamics. In this work, we develop spiking state space models (SpikingSSMs) for long sequence learning by leveraging the sequence learning abilities of state space models (SSMs). Inspired by dendritic neuron structure, we hierarchically integrate neuronal dynamics with the original SSM block while realizing sparse synaptic computation. Furthermore, to resolve the conflict between event-driven neuronal dynamics and parallel computing, we propose a light-weight surrogate dynamic network that accurately predicts the after-reset membrane potential and is compatible with learnable thresholds, enabling orders-of-magnitude acceleration in training speed compared with conventional iterative methods. On the long range arena benchmark task, SpikingSSM achieves performance competitive with state-of-the-art SSMs while realizing on average 90% network sparsity. On language modeling, our network significantly surpasses existing spiking large language models (spikingLLMs) on the WikiText-103 dataset with only a third of the model size, demonstrating its potential as a backbone architecture for low computation cost LLMs.

TMLR Journal 2025 Journal Article

TempFlex: Advancing MLLMs with Temporal Perception and Natively Scalable Resolution Encoding

  • Zhanyu Wang
  • Chen Tang
  • Haoyu He
  • Kuan Feng
  • Chao Wang
  • Bingni Zhang
  • Xiaolei Xu
  • SHEN WANG

Multimodal large language models (MLLMs) have made significant progress across vision-language tasks, yet many designs still suffer from two core limitations. (i) Excessive visual tokens and broken global context: Tiled Patch Encoding fragments high-resolution images, leading to token overload and disrupting global attention modeling. (ii) Lack of temporal reasoning: Most models process video as independent frames using static image encoders, failing to capture temporal dynamics. We present TempFlex-VL, a token-efficient and temporally aware MLLM that addresses both issues through lightweight architectural enhancements. First, we introduce a resolution-agnostic visual encoder that directly processes full images without tiling, preserving global context while substantially reducing visual tokens. Second, we propose Temporal Fiber Fusion (TFF), a plug-and-play module with three complementary pathways: (1) a dynamic local-convolution branch for fine-grained motion, (2) a gated memory accumulator for long-term dependencies, and (3) a periodic encoder for modeling cyclic patterns. These signals are softly fused, enabling the model to adapt to diverse temporal structures without overfitting. To support large-scale video-language pretraining, we curate TempFlex-2M, a high-quality synthetic video–text corpus generated in a single stage via GPT-4o with direct visual prompting. We instantiate TempFlex-VL using two different language backbones, Gemma3-4B and Qwen3-4B, demonstrating the generality of our design across architectures. Both variants achieve state-of-the-art or competitive results on a wide range of image and video benchmarks while markedly improving token efficiency. Code is publicly available at: https://github.com/wang-zhanyu/TempFlex.

TAAS Journal 2025 Journal Article

The Comp-TSSs Scheme for Anomaly Detection in AI-Powered Autonomous Driving

  • Jiuzhen Zeng
  • Laurence T. Yang
  • Chao Wang
  • Lei Zhang

Given the vulnerability of vehicular networks to security attacks and the criticality of secure AI-powered autonomous driving, this paper emphasizes the security issue concerning vehicular networks in AI-powered autonomous vehicles. Novel complementary tensor summary statistics, named Comp-TSSs, are proposed for the statistical depiction of the discrepancy between normal and abnormal volume instances in vehicular networks. Comp-TSSs enhance vehicular network security by incorporating reconstruction and regularization statistic terms derived from TPCA, which is extended from PCA through a fresh perspective of fully diagonalizing the covariance tensor. Comp-TSSs effectively capture multi-dimensional correlations in vehicular network volume data, providing complementary measures for representation residuals and weighted distances of instances projected in the principal tensor subspace. Building upon Comp-TSSs, a non-parametric statistic framework is developed for real-time detection of diverse volume anomalies, ensuring the security of AI-powered autonomous driving. The theoretical analyses concerning its detection performance and parameter selection are provided as well. Extensive experiments on synthetic and real-world datasets validate our superior vehicular network security monitoring system for AI-powered autonomous vehicles. It demonstrates higher true positive rates, lower false alarm rates, and minimal detection delays, even when both energy and variance anomalies are present.

EAAI Journal 2025 Journal Article

Traffic congestion recognition based on convolutional neural networks in different scenarios

  • Chao Wang
  • Qiang Shang
  • Kun Liu
  • Wenxue Zhang

Traffic congestion has become a common problem worldwide, leading to the waste of time and resources. Accurate recognition of traffic congestion is crucial to solve these problems. Traditional traffic congestion recognition methods can only be accurate in a specific scenario. To improve the generalization of traffic congestion recognition under different lighting and weather conditions, this paper proposes a traffic congestion recognition framework based on convolutional neural networks to recognize traffic congestion in different scenarios such as daytime, nighttime, rainy days, and foggy days. Since fog obscures critical information in traffic images, and sensitivity analysis reveals that foggy conditions have a significant impact on traffic images, this paper proposes a Residual Attention-based Gated Fusion Defogging Network (RAGFNet) to process foggy traffic images. You Only Look Once (YOLO) demonstrates outstanding performance and efficiency in the field of image recognition, so the eighth version of You Only Look Once combined with local feature extraction (LFE-YOLOv8) is used to recognize traffic images. The model achieves the highest accuracy among the compared mainstream models, 98.2%, with a recall of 97.7% and an F1 value of 98.6%. Finally, to validate the applicability of the model in Intelligent Transportation Systems, a real-time evaluation was performed using Frames Per Second (FPS); congested areas were visualized using Gradient-weighted Class Activation Mapping (Grad-CAM) to explain the model and make it easier for traffic engineers to understand the model's decisions.

AAAI Conference 2025 Conference Paper

Unaligned Message-Passing and Contextualized-Pretraining for Robust Geo-Entity Resolution

  • Yuwen Ji
  • Wenbo Xie
  • Jiaqi Zhang
  • Chao Wang
  • Ning Guo
  • Lei Shi
  • Yue Zhang

Geo-entity resolution involves linking records that refer to the same entities across different spatial datasets, which underpins location-based services. Given the varying quality of geo-data, this task is known to be challenging, as directly comparing the semantic-centric representations of two entities is no longer reliable. To robustify geo-entity resolution in this context, the main research question is how to effectively extend the current semantics-centric representations of geo-entity with geographical context from its spatial neighbors. Existing methods consider names from neighbors, but they struggle to fully utilize the unaligned neighbor attributes. In this paper, we study the representation of geo-context for robust geo-entity resolution and propose two adaptations that efficiently leverage unaligned geo-entity attributes across spatial neighbors: (1) A plugin module, namely Unaligned Message-Passing (UMP), that propagates unaligned neighbor features to integrate geo-context into the token embeddings output by language model; (2) a contextualized pretraining framework (CP) that allows the former to leverage unlabelled geo-entity data. Experiments show that our method surpasses the baselines, achieving higher F1 scores on 8 real-world geo-datasets in terms of robustness, with an improvement of up to 7.9%. The ablation study further justifies our proposal.

EAAI Journal 2025 Journal Article

Underwater acoustic signal recognition system with multi-scale hybrid cepstral feature strategy and joint deep network

  • Hong Yang
  • Jinmei Li
  • Guohui Li
  • Chao Wang

In this paper, we propose a new underwater acoustic signal recognition system to address the recognition difficulties caused by the susceptibility of signals to complex noise interference in underwater acoustic environments. Specifically, the proposed system includes two stages: feature extraction and recognition. Feature extraction: a multi-scale hybrid cepstral feature strategy is proposed. It uses new singular spectrum decomposition to obtain multi-scale components and then extracts the Mel-frequency cepstral coefficients, inverse Mel-frequency cepstral coefficients, Gammatone frequency cepstral coefficients, and linear prediction cepstral coefficients of each component. After feature enhancement and selection, a novel multi-scale hybrid cepstral feature set is constructed. This feature set realizes the complementarity and enhancement of different cepstral features and effectively solves the problems of single feature expression and data redundancy. Recognition: a new joint deep network model is proposed. It adopts the unique design of one-dimensional convolutional neural network (1DCNN) and bidirectional gated recursive unit (BiGRU), which realizes the mutual complement of spatial information extracted by 1DCNN and dependent information captured by BiGRU and effectively improves the processing ability of the model for complex feature sets. In addition, the Kepler optimization algorithm and self-concern mechanism are introduced into the network, which solves the problem of selecting network parameters and improves the focus ability of the model on key features. By setting up multiple groups of comparison and ablation experiments, the recognition results of underwater acoustic data, including ship-radiated noise signals and marine biological signals, show that the recognition accuracy of the proposed system reaches 96.11% and 98.67%, respectively, which is better than all comparison methods. In addition, we further verified that the system still has high robustness under a low signal-to-noise ratio, which provides new ideas for research in the field of underwater acoustic signal recognition.

AIJ Journal 2024 Journal Article

A crossword solving system based on Monte Carlo tree search

  • Jingping Liu
  • Lihan Chen
  • Sihang Jiang
  • Chao Wang
  • Sheng Zhang
  • Jiaqing Liang
  • Yanghua Xiao
  • Rui Song

Although the development of AI in games is remarkable, intelligent machines still lag behind humans in games that require the ability of language understanding. In this paper, we focus on the crossword puzzle resolution task. Solving crossword puzzles is a challenging task since it requires the ability to answer natural language questions with knowledge and the ability to execute a search over possible answers to find an optimal set of solutions for the grid. Previous solutions are devoted to exploiting heuristic strategies in search to find solutions while having limited ability to explore the search space. We build a comprehensive system for crossword puzzle resolution based on Monte Carlo Tree Search (MCTS). As far as we know, we are the first to model the crossword puzzle resolution problem as a Markov Decision Process and apply the MCTS to solve it. We construct a dataset for crossword puzzle resolution based on daily puzzles from The New York Times with detailed specifications of both the puzzle and clue database selection. Our method achieves state-of-the-art performance on the dataset. The code of the system and experiments in this paper is publicly available: https://www.github.com/lhlclhl/CP.

EAAI Journal 2024 Journal Article

A robustness division based multi-population evolutionary algorithm for solving vehicle routing problems with uncertain demand

  • Hao Jiang
  • Yanhui Tong
  • Bowen Song
  • Chao Wang
  • Jiahang Li
  • Qi Liu
  • Xingyi Zhang

The vehicle routing problem with uncertain demand (VRPUD) is an extension of the capacitated vehicle routing problem (CVRP), where the demand of each customer is unknown when dispatching the vehicles to service customers. Since it is more practical than CVRP, the VRPUD has aroused wide attention. Although evolutionary algorithms (EAs) have demonstrated promising performance on solving VRPUD, most EAs only consider the robustness of a solution after generating offspring, which limits the quality of solutions found by EAs. To this end, in this paper, a robustness division based multi-population evolutionary algorithm (RDMPEA) is developed for VRPUDs, where the robustness is considered before, during, and after offspring generation. Specifically, before generating offspring, the RDMPEA first divides the individuals into different subpopulations according to their robustness level, and only the individuals within the same subpopulation can match each other and generate offspring. While generating offspring, the RDMPEA employs a route based crossover operator, where the routes with higher robustness have a greater probability of being inherited by the offspring. After generating offspring, a dedicated environment selection strategy is applied to retain the individuals with better robustness and travel cost. In the experiments, the proposed RDMPEA is compared to three state-of-the-art heuristic methods tailored for VRPUDs on a variety of instances obtained by using three widely used vehicle routing problem benchmarks. The experimental results indicate that the proposed RDMPEA is superior to the three compared algorithms, and can find solutions with better travel cost and robustness.

AAAI Conference 2024 Conference Paper

Beyond Entities: A Large-Scale Multi-Modal Knowledge Graph with Triplet Fact Grounding

  • Jingping Liu
  • Mingchuan Zhang
  • Weichen Li
  • Chao Wang
  • Shuang Li
  • Haiyun Jiang
  • Sihang Jiang
  • Yanghua Xiao

Much effort has been devoted to building multi-modal knowledge graphs by visualizing entities on images, but ignoring the multi-modal information of the relation between entities. Hence, in this paper, we aim to construct a new large-scale multi-modal knowledge graph with triplet facts grounded on images that reflect not only entities but also their relations. To achieve this purpose, we propose a novel pipeline method, including triplet fact filtering, image retrieving, entity-based image filtering, relation-based image filtering, and image clustering. In this way, a multi-modal knowledge graph named ImgFact is constructed, which contains 247,732 triplet facts and 3,730,805 images. In experiments, the manual and automatic evaluations prove the reliable quality of our ImgFact. We further use the obtained images to enhance model performance on two tasks. In particular, the model optimized by our ImgFact achieves an impressive 8.38% and 9.87% improvement over the solutions enhanced by an existing multi-modal knowledge graph and VisualChatGPT on F1 of relation classification. We release ImgFact and its instructions at https://github.com/kleinercubs/ImgFact.

EAAI Journal 2024 Journal Article

Combustion process modeling based on deep sparse least squares support vector regression

  • Wei Zheng
  • Chao Wang
  • Da Liu

In the face of massive historical data of coal-fired power plants, the method of Deep Sparse Least Squares Support Vector Regression (DS-LSSVR) is proposed for building the combustion process model with ideal prediction accuracy and speed. The sparsity process contains two stages. In the first stage, bottom compensation clustering based on grey relational entropy is proposed for deleting similar training samples on the basis of preserving model information as much as possible. In the second stage, the contributive and weighted particle swarm optimization algorithm is proposed to achieve the deep sparsity of DS-LSSVR model. The simulation experiments show that the DS-LSSVR model owns a higher sparsity rate than other sparse LSSVR models. In the application experiment, DS-LSSVR is used to develop the model of NOx emissions and obtains a sparsity rate of 91% with a high prediction accuracy. Moreover, the prediction time of DS-LSSVR model of NOx emissions is less than 1 ms. Therefore, the DS-LSSVR model is able to provide a powerful support for predicting and optimizing the performance index of combustion process online.

TCS Journal 2024 Journal Article

Decision algorithms for reversibility of 1D cellular automata under reflective boundary conditions

  • Junchi Ma
  • Chen Wang
  • Weilin Chen
  • Defu Lin
  • Chao Wang

Reversibility is one of the most significant properties of cellular automata (CA). In this paper, we focus on the reversibility of one-dimensional finite CA under reflective boundary conditions (RBC). We present two algorithms for deciding the reversibility of one-dimensional CA under RBC. Both algorithms work for not only linear rules but also non-linear rules. The first algorithm is to determine what we call the “strict reversibility” of CA. The second algorithm is to compute what we call the “reversibility function” of CA. Reversibility functions are proved to be periodic. Based on the algorithms, we list some experiment results of one-dimensional CA under RBC and analyse some features of this family of CA.
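
As a rough illustration of what reversibility means here, the following is a minimal brute-force sketch, not the paper's decision algorithms: it enumerates all configurations of a small CA and checks whether the global map is a bijection. It assumes binary states, radius-1 (elementary) rules in Wolfram numbering, and one common reflective convention in which the virtual cell beyond each edge mirrors the cell one step inside; the function names `step` and `is_reversible` are hypothetical.

```python
from itertools import product


def step(config, rule):
    """Apply one synchronous update of a binary, radius-1 CA.

    Assumed reflective boundary: the virtual cell left of cell 0 copies
    cell 1, and the virtual cell right of cell n-1 copies cell n-2.
    """
    n = len(config)
    nxt = []
    for i in range(n):
        left = config[i - 1] if i > 0 else config[1 if n > 1 else 0]
        right = config[i + 1] if i < n - 1 else config[n - 2 if n > 1 else 0]
        # Wolfram numbering: the neighbourhood (left, centre, right)
        # indexes a bit of the rule number.
        idx = (left << 2) | (config[i] << 1) | right
        nxt.append((rule >> idx) & 1)
    return tuple(nxt)


def is_reversible(rule, n):
    """Return True if the global map on n cells is a bijection.

    Exhaustive check over all 2**n configurations, so only practical
    for small n.
    """
    images = {step(c, rule) for c in product((0, 1), repeat=n)}
    return len(images) == 2 ** n
```

For example, `is_reversible(204, 4)` checks the identity rule on four cells. The exhaustive check costs time exponential in the number of cells, which is the kind of cost dedicated decision algorithms are meant to avoid.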

IJCAI Conference 2024 Conference Paper

DGR: A General Graph Desmoothing Framework for Recommendation via Global and Local Perspectives

  • Leilei Ding
  • Dazhong Shen
  • Chao Wang
  • Tianfu Wang
  • Le Zhang
  • Yanyong Zhang

Graph Convolutional Networks (GCNs) have become pivotal in recommendation systems for learning user and item embeddings by leveraging the user-item interaction graph's node information and topology. However, these models often face the famous over-smoothing issue, leading to indistinct user and item embeddings and reduced personalization. Traditional desmoothing methods in GCN-based systems are model-specific, lacking a universal solution. This paper introduces a novel, model-agnostic approach named Desmoothing Framework for GCN-based Recommendation Systems (DGR). It effectively addresses over-smoothing on general GCN-based recommendation models by considering both global and local perspectives. Specifically, we first introduce vector perturbations during each message passing layer to penalize the tendency of node embeddings to become overly similar, under the guidance of the global topological structure. Meanwhile, we further develop a tailored loss term for the readout embeddings to preserve the local collaborative relations between users and their neighboring items. In particular, items that exhibit a high correlation with neighboring items are also incorporated to enhance the local topological information. To validate our approach, we conduct extensive experiments on 5 benchmark datasets based on 5 well-known GCN-based recommendation models, demonstrating the effectiveness and generalization of our proposed framework. Our code is available at https://github.com/me-sonandme/DGR.

ICML Conference 2024 Conference Paper

DiffFPR: Diffusion Prior for Oversampled Fourier Phase Retrieval

  • Ji Li
  • Chao Wang

This paper tackled the challenging Fourier phase retrieval problem, the absolute uniqueness of which does not hold. The existence of equivalent solution (a.k.a. trivial solution ambiguity) hinders the successful recovery, especially for multi-channel color image. The traditional iterative engine, such as the Relaxed Averaged Alternating Reflections (RAAR), can be applied to reconstruct the image channel-wisely. However, due to the relative uniqueness of the solution, the restoration is not automatically aligned with the accurate orientation for each channel, resulting in a reconstructed image that deviates significantly from the true solution manifold. To address this issue, by penalizing the mismatch of the image channels, a diffusion model as the strong prior of the color image is integrated into the iterative engine. The combination of the traditional iterative engine and the diffusion model provides an effective solution to the oversampled Fourier phase retrieval. The formed algorithm, DiffFPR, is validated by experiments. The code is available at https://github.com/Chilie/DiffFPR.

AAAI Conference 2024 Conference Paper

Emergent Communication for Numerical Concepts Generalization

  • Enshuai Zhou
  • Yifan Hao
  • Rui Zhang
  • Yuxuan Guo
  • Zidong Du
  • Xishan Zhang
  • Xinkai Song
  • Chao Wang

Research on emergent communication has recently gained significant traction as a promising avenue for the linguistic community to unravel human language's origins and explore artificial intelligence's generalization capabilities. Current research has predominantly concentrated on recognizing qualitative patterns of object attributes (e.g., shape and color) and paid little attention to the quantitative relationship among object quantities, which is known as part of numerical concepts. The ability to generalize numerical concepts, i.e., counting and calculations with unseen quantities, is essential, as it mirrors humans' foundational abstract reasoning abilities. In this work, we introduce the NumGame, leveraging the referential game framework, forcing agents to communicate and generalize the numerical concepts effectively. Inspired by the human learning process of numbers, we present a two-stage training approach that sequentially fosters a rudimentary numerical sense followed by the ability of arithmetic calculation, ultimately aiding agents in generating semantically stable and unambiguous language for numerical concepts. The experimental results indicate the impressive generalization capabilities to unseen quantities and regularity of the language emergence from communication.

EAAI Journal 2024 Journal Article

Fatigue life prediction driven by mesoscopic defect data

  • Chao Wang
  • Yali Yang
  • Hao Chen
  • Sha Xu
  • Yongfang Li
  • Ruoping Zhang
  • Ming Ling

The research of predicting fatigue life through defect features is somewhat limited. In order to further study the influence of defect characteristics on fatigue life, a modification of Murakami model was proposed to calculate relative stress intensity factor, which is related with the influence of location, size, length-diameter ratio, flattening rate and adjacent interaction of defects comprehensively. A physics-informed neural network (PINN) was constructed with a physical information loss function transformed based on the relative stress intensity factor. Recognizing the challenges posed by inadequate training data, a mega trend diffusion technique based Gaussian distribution (G-MTD) is proposed to augment the dataset and maintain the distribution of the original data. By merging the G-MTD technique with the PINN, a comprehensive machine learning framework is established for the fatigue life prediction. The research findings demonstrate that this framework yields higher prediction accuracy and efficiency than the purely data-driven methods.

IJCAI Conference 2024 Conference Paper

FlagVNE: A Flexible and Generalizable Reinforcement Learning Framework for Network Resource Allocation

  • Tianfu Wang
  • Qilin Fan
  • Chao Wang
  • Long Yang
  • Leilei Ding
  • Nicholas Jing Yuan
  • Hui Xiong

Virtual network embedding (VNE) is an essential resource allocation task in network virtualization, aiming to map virtual network requests (VNRs) onto physical infrastructure. Reinforcement learning (RL) has recently emerged as a promising solution to this problem. However, existing RL-based VNE methods are limited by the unidirectional action design and one-size-fits-all training strategy, resulting in restricted searchability and generalizability. In this paper, we propose a flexible and generalizable RL framework for VNE, named FlagVNE. Specifically, we design a bidirectional action-based Markov decision process model that enables the joint selection of virtual and physical nodes, thus improving the exploration flexibility of solution space. To tackle the expansive and dynamic action space, we design a hierarchical decoder to generate adaptive action probability distributions and ensure high training efficiency. Furthermore, to overcome the generalization issue for varying VNR sizes, we propose a meta-RL-based training method with a curriculum scheduling strategy, facilitating specialized policy training for each VNR size. Finally, extensive experimental results show the effectiveness of FlagVNE across multiple key metrics. Our code is available at https://github.com/GeminiLight/flag-vne.

AAAI Conference 2024 Short Paper

icsPLMs: Exploring Pre-trained Language Models in Intelligent Customer Service (Student Abstract)

  • Shixuan Liu
  • Chao Wang
  • Shuangyong Song

Pre-trained language models have shown high text-processing performance in intelligent customer service platforms. However, these models do not leverage domain-specific information. In this paper, we propose icsPLMs optimized for intelligent customer service on both word and sentence levels. Our experimental results show that using targeted strategies can further improve the performance of pre-trained language models in this field.

NeurIPS Conference 2024 Conference Paper

Job-SDF: A Multi-Granularity Dataset for Job Skill Demand Forecasting and Benchmarking

  • Xi Chen
  • Chuan Qin
  • Chuyu Fang
  • Chao Wang
  • Chen Zhu
  • Fuzhen Zhuang
  • Hengshu Zhu
  • Hui Xiong

In a rapidly evolving job market, skill demand forecasting is crucial as it enables policymakers and businesses to anticipate and adapt to changes, ensuring that workforce skills align with market needs, thereby enhancing productivity and competitiveness. Additionally, by identifying emerging skill requirements, it directs individuals towards relevant training and education opportunities, promoting continuous self-learning and development. However, the absence of comprehensive datasets presents a significant challenge, impeding research and the advancement of this field. To bridge this gap, we present Job-SDF, a dataset designed to train and benchmark job-skill demand forecasting models. Based on millions of public job advertisements collected from online recruitment platforms, this dataset encompasses monthly recruitment demand. Our dataset uniquely enables evaluating skill demand forecasting models at various granularities, including occupation, company, and regional levels. We benchmark a range of models on this dataset, evaluating their performance in standard scenarios, in predictions focused on lower value ranges, and in the presence of structural breaks, providing new insights for further research. Our code and dataset are publicly accessible at https://github.com/Job-SDF/benchmark.

AAAI Conference 2024 Conference Paper

Multi-Domain Multi-Scale Diffusion Model for Low-Light Image Enhancement

  • Kai Shang
  • Mingwen Shao
  • Chao Wang
  • Yuanshuo Cheng
  • Shuigen Wang

Diffusion models have achieved remarkable progress in low-light image enhancement. However, there remain two practical limitations: (1) existing methods mainly focus on the spatial domain for the diffusion process, while neglecting the essential features in the frequency domain; (2) conventional patch-based sampling strategy inevitably leads to severe checkerboard artifacts due to the uneven overlapping. To address these limitations in one go, we propose a Multi-Domain Multi-Scale (MDMS) diffusion model for low-light image enhancement. In particular, we introduce a spatial-frequency fusion module to seamlessly integrate spatial and frequency information. By leveraging the Multi-Domain Learning (MDL) paradigm, our proposed model is endowed with the capability to adaptively facilitate noise distribution learning, thereby enhancing the quality of the generated images. Meanwhile, we propose a Multi-Scale Sampling (MSS) strategy that follows a divide-ensemble manner by merging the restored patches under different resolutions. Such a multi-scale learning paradigm explicitly derives patch information from different granularities, thus leading to smoother boundaries. Furthermore, we empirically adopt the Bright Channel Prior (BCP), which indicates natural statistical regularity, as an additional restoration guidance. Experimental results on LOL and LOLv2 datasets demonstrate that our method achieves state-of-the-art performance for the low-light image enhancement task. Codes are available at https://github.com/Oliiveralien/MDMS.

NeurIPS Conference 2024 Conference Paper

On the Target-kernel Alignment: a Unified Analysis with Kernel Complexity

  • Chao Wang
  • Xin He
  • Yuwen Wang
  • Junhui Wang

This paper investigates the impact of alignment between the target function of interest and the kernel matrix on a variety of kernel-based methods based on a general loss belonging to a rich loss function family, which covers many commonly used methods in regression and classification problems. We consider the truncated kernel-based method (TKM) which is estimated within a reduced function space constructed by using the spectral truncation of the kernel matrix and compare its theoretical behavior to that of the standard kernel-based method (KM) under various settings. By using the kernel complexity function that quantifies the complexity of the induced function space, we derive the upper bounds for both TKM and KM, and further reveal their dependencies on the degree of target-kernel alignment. Specifically, for the alignment with polynomial decay, the established results indicate that under the just-aligned and weakly-aligned regimes, TKM and KM share the same learning rate. Yet, under the strongly-aligned regime, KM suffers the saturation effect, while TKM can be continuously improved as the alignment becomes stronger. This further implies that TKM has a strong ability to capture the strong alignment and provide a theoretically guaranteed solution to eliminate the phenomena of saturation effect. The minimax lower bound is also established for the squared loss to confirm the optimality of TKM. Extensive numerical experiments further support our theoretical findings. The Python code for reproducing the numerical experiments is available at https://github.com/wywangen.

NeurIPS Conference 2024 Conference Paper

OwMatch: Conditional Self-Labeling with Consistency for Open-World Semi-Supervised Learning

  • Shengjie Niu
  • Lifan Lin
  • Jian Huang
  • Chao Wang

Semi-supervised learning (SSL) offers a robust framework for harnessing the potential of unannotated data. Traditionally, SSL mandates that all classes possess labeled instances. However, the emergence of open-world SSL (OwSSL) introduces a more practical challenge, wherein unlabeled data may encompass samples from unseen classes. This scenario leads to misclassification of unseen classes as known ones, consequently undermining classification accuracy. To overcome this challenge, this study revisits two methodologies from self-supervised and semi-supervised learning, self-labeling and consistency, tailoring them to address the OwSSL problem. Specifically, we propose an effective framework called OwMatch, combining conditional self-labeling and open-world hierarchical thresholding. Theoretically, we analyze the estimation of class distribution on unlabeled data through rigorous statistical analysis, thus demonstrating that OwMatch can ensure the unbiasedness of the label assignment estimator with reliability. Comprehensive empirical analyses demonstrate that our method yields substantial performance enhancements across both known and unknown classes in comparison to previous studies. Code is available at https://github.com/niusj03/OwMatch.

IJCAI Conference 2024 Conference Paper

Pre-DyGAE: Pre-training Enhanced Dynamic Graph Autoencoder for Occupational Skill Demand Forecasting

  • Xi Chen
  • Chuan Qin
  • Zhigaoyuan Wang
  • Yihang Cheng
  • Chao Wang
  • Hengshu Zhu
  • Hui Xiong

Occupational skill demand (OSD) forecasting seeks to predict dynamic skill demand specific to occupations, beneficial for employees and employers to grasp occupational nature and maintain a competitive edge in the rapidly evolving labor market. Although recent research has proposed data-driven techniques for forecasting skill demand, the focus has remained predominantly on overall trends rather than occupational granularity. In this paper, we propose a novel Pre-training Enhanced Dynamic Graph Autoencoder (Pre-DyGAE), forecasting skill demand from an occupational perspective. Specifically, we aggregate job descriptions (JDs) by occupation and segment them into several timestamps. Subsequently, in the initial timestamps, we pre-train a graph autoencoder (GAE), consisting of a semantically-aware cross-attention enhanced uncertainty-aware encoder and decoders for link prediction and edge regression to achieve graph reconstruction. In particular, we utilize contrastive learning on skill cooccurrence clusters to solve the data sparsity and a unified Tweedie and ranking loss for predicting the imbalanced distribution. Afterward, we incorporate an adaptive temporal encoding unit and a temporal shift module into GAE to achieve a dynamic GAE (DyGAE). Furthermore, we fine-tune the DyGAE with a two-stage optimization strategy and infer future representations. Extensive experiments on four real-world datasets validate the effectiveness of Pre-DyGAE compared with state-of-the-art baselines.

IJCAI Conference 2024 Conference Paper

Prompt Learning with Extended Kalman Filter for Pre-trained Language Models

  • Quan Li
  • Xike Xie
  • Chao Wang
  • S. Kevin Zhou

Prompt learning has gained popularity as a means to leverage the knowledge embedded in pre-trained language models (PLMs) for NLP tasks while using a limited number of trainable parameters. While it has shown promise in tasks like sentiment classification and natural language inference, generating suitable prompts for PLMs, as opposed to human prompts, remains a challenge. In this paper, we introduce an abstraction of the prompt learning process using an extended Kalman filter. Our approach, called Conditional Extended Kalman Filter based on Neural Networks (CEKFNN), effectively infers more appropriate prompt tokens by enhancing the classic extended Kalman filter with PLM's contextual representation power. Specifically, CEKFNN learns transition and emission functions from PLM embeddings of input sentences to infer latent prompt tokens. We refine CEKFNN using an alternate-training approach, retraining a PLM's emission function with prompt tokens inferred by prompt models (PMs), as well as the initial and transition functions. PLM's output labels assist in PMs' training. When updating the pre-trained language model (PLM), we use an adapter approach with few trainable parameters, leaving PLM parameters frozen. We evaluate CEKFNN across open-source PLMs, demonstrating performance improvements over state-of-the-art methods while using a limited number of trainable parameters. It shows that CEKFNN performs on-par or better than fine-tuning, which requires updating all parameters in the PLM.

AAAI Conference 2024 Conference Paper

Temporal Graph Contrastive Learning for Sequential Recommendation

  • Shengzhe Zhang
  • Liyi Chen
  • Chao Wang
  • Shuangli Li
  • Hui Xiong

Sequential recommendation is a crucial task in understanding users' evolving interests and predicting their future behaviors. While existing approaches on sequence or graph modeling to learn interaction sequences of users have shown promising performance, how to effectively exploit temporal information and deal with the uncertainty noise in evolving user behaviors is still quite challenging. To this end, in this paper, we propose a Temporal Graph Contrastive Learning method for Sequential Recommendation (TGCL4SR) which leverages not only local interaction sequences but also global temporal graphs to comprehend item correlations and analyze user behaviors from a temporal perspective. Specifically, we first devise a Temporal Item Transition Graph (TITG) to fully leverage global interactions to understand item correlations, and augment this graph by dual transformations based on neighbor sampling and time disturbance. Accordingly, we design a Temporal item Transition graph Convolutional network (TiTConv) to capture temporal item transition patterns in TITG. Then, a novel Temporal Graph Contrastive Learning (TGCL) mechanism is designed to enhance the uniformity of representations between augmented graphs from identical sequences. For local interaction sequences, we design a temporal sequence encoder to incorporate time interval embeddings into the architecture of Transformer. At the training stage, we take maximum mean discrepancy and TGCL losses as auxiliary objectives. Extensive experiments on several real-world datasets show the effectiveness of TGCL4SR against state-of-the-art baselines of sequential recommendation.

ICML Conference 2024 Conference Paper

Towards Theoretical Understanding of Learning Large-scale Dependent Data via Random Features

  • Chao Wang
  • Xin Bing
  • Xin He
  • Caixing Wang

Random feature (RF) mapping is an attractive and powerful technique for solving large-scale nonparametric regression. Yet, the existing theoretical analysis crucially relies on the i.i.d. assumption that individuals in the data are independent and identically distributed. It is still unclear whether learning accuracy would be compromised when the i.i.d. assumption is violated. This paper aims to provide theoretical understanding of the kernel ridge regression (KRR) with RFs for large-scale dependent data. Specifically, we consider two types of data dependence structure, namely, the $\tau$-mixing process with exponential decay coefficient, and that with polynomial decay coefficient. Theoretically, we prove that the kernel ridge estimator with RFs achieves the minimax optimality under the exponential decay scenario, but yields a sub-optimal result under the polynomial decay case. Our analysis further reveals how the decay rate of the $\tau$-mixing coefficient impacts the learning accuracy of the kernel ridge estimator with RFs. Extensive numerical experiments on both synthetic and real examples further validate our theoretical findings and support the effectiveness of the KRR with RFs in dealing with dependent data.
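As a rough illustration of the setting in this abstract, the sketch below fits ridge regression on random Fourier features of a Gaussian kernel, using inputs drawn from a simple AR(1) sequence as a stand-in for temporally dependent data. It is not the authors' code: the function names, the AR(1) generator, the bandwidth, the number of features, and the regularization value are all assumptions chosen for illustration, and the solver is plain Gaussian elimination to keep the example dependency-free.

```python
import math
import random


def make_random_features(num_features, input_dim, bandwidth, seed=0):
    """Draw frequencies ~ N(0, 1/bandwidth^2) and phases ~ U[0, 2*pi) (Gaussian kernel)."""
    rng = random.Random(seed)
    freqs = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(input_dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return freqs, phases


def feature_map(x, freqs, phases):
    """Map x to sqrt(2/D) * cos(w . x + b) for each of the D random features."""
    scale = math.sqrt(2.0 / len(freqs))
    return [scale * math.cos(sum(wj * xj for wj, xj in zip(w, x)) + b)
            for w, b in zip(freqs, phases)]


def solve_linear_system(a, b):
    """Solve a @ sol = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - rest) / aug[r][r]
    return sol


def fit_rf_ridge(xs, ys, freqs, phases, lam):
    """Minimize (1/n) * sum (y - z(x)^T theta)^2 + lam * ||theta||^2."""
    z = [feature_map(x, freqs, phases) for x in xs]
    n, d = len(z), len(freqs)
    gram = [[sum(z[i][p] * z[i][q] for i in range(n)) / n + (lam if p == q else 0.0)
             for q in range(d)] for p in range(d)]
    rhs = [sum(z[i][p] * ys[i] for i in range(n)) / n for p in range(d)]
    return solve_linear_system(gram, rhs)


def predict(x, theta, freqs, phases):
    return sum(t * f for t, f in zip(theta, feature_map(x, freqs, phases)))


def demo(n=200, num_features=30, lam=1e-3, seed=1):
    """Fit on AR(1) inputs (hypothetical dependent data) and return (train MSE, mean y^2)."""
    rng = random.Random(seed)
    xs, ys, state = [], [], 0.0
    for _ in range(n):
        state = 0.5 * state + rng.gauss(0.0, 1.0)  # each input depends on the previous one
        xs.append([state])
        ys.append(math.sin(state) + rng.gauss(0.0, 0.1))
    freqs, phases = make_random_features(num_features, 1, bandwidth=1.0, seed=seed)
    theta = fit_rf_ridge(xs, ys, freqs, phases, lam)
    mse = sum((predict(xi, theta, freqs, phases) - yi) ** 2
              for xi, yi in zip(xs, ys)) / n
    baseline = sum(yi * yi for yi in ys) / n  # loss of the all-zero predictor
    return mse, baseline
```

Calling `demo()` returns the training error of the fitted model alongside the error of the all-zero predictor. The sketch only shows the mechanics of the estimator; it says nothing about the convergence rates the paper analyzes.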

EAAI Journal 2023 Journal Article

A two-stream network with complementary feature fusion for pest image classification

  • Chao Wang
  • Jinrui Zhang
  • Jin He
  • Wei Luo
  • Xiaohui Yuan
  • Lichuan Gu

Pests are diverse and the available datasets often contain an uneven number of examples for different pests (a.k.a. the long-tail distribution). This poses a great challenge to learning-based classification methods, especially deep networks, and often leads to degraded performance, especially for the minority (tail) classes. This paper presents a deep learning integration architecture based on decoupling training and fusion learning, which integrates different models with complementary performance on pest datasets with a long-tailed distribution to improve the overall classification performance of pests. A deep neural network is designed that fuses two complementary deep learning models at the feature level, which consists of a convolution neural network (ConvNeXt) and a Swin Transformer model for decoupling training. Experiments are conducted using three datasets (d0, insect, and IP102), and evaluation on accuracy, recall, and F1-Score is reported. For the large-scale pest dataset with long-tailed distribution IP102, the accuracy achieves 76.1%, which outperforms the state-of-the-art methods. In addition, the accuracy for d0 and insect datasets are 98.5% and 92.3%, respectively.

EAAI Journal 2023 Journal Article

Autonomous dispatch trajectory planning on flight deck: A search-resampling-optimization framework

  • Xinwei Wang
  • Bai Li
  • Xichao Su
  • Haijun Peng
  • Lei Wang
  • Chen Lu
  • Chao Wang

There is a growing expectation to realize the autonomous dispatch on flight deck, where dispatch trajectory planning is seen as the key technique. Optimal-control based method has shown great advantages in high degree of constraint satisfaction over its counterparts in the last decade. However, it suffers from low computational efficiency even numerical divergence under scenarios with complicated obstacles. To deal with such an issue, a search-resampling-optimization (SRO) framework is proposed in this paper. A hybrid A* algorithm is employed to generate a coarse path according to the boundary conditions in the search stage. Then a resampling process is implemented to pave a series of safe dispatch corridors (SDCs) along the coarse path. Finally, by replacing the common one-to-one collision-avoidance with the constructed within-SDC constraints, an optimal control problem whose scale is totally independent of the number of obstacles can be formulated. The resampled result is further fed into the optimization stage to facilitate the numerical solution. Dispatch trajectory planning for taxiing aircraft and tractor can be treated uniformly under this framework. And numerical simulations demonstrate that the SRO framework is efficient and robust even with narrow accessible tunnels. The SRO is inherently flexible and can be easily extended to the trajectory planning problem in other fields. A video of the main idea and numerical simulations in this paper is available at www.bilibili.com/video/BV1tP4y1d7xy/.

EAAI Journal 2023 Journal Article

Deep transfer learning-based damage detection of composite structures by fusing monitoring data with physical mechanism

  • Cheng Liu
  • Xuebing Xu
  • Jun Wu
  • Haiping Zhu
  • Chao Wang

Damage detection of carbon fiber reinforced plastics (CFRP) composites is a challenging issue owing to their intrinsic high degree of anisotropy with complex failure modes. Data-driven approaches have great potential in damage detection for CFRP composites. However, the lack of physical interpretability makes data-driven methods highly dependent on the amount of data, which involves significant effort in experiment and is impractical to obtain for all damage conditions. To overcome the above challenges, a robust and generalizable framework for damage detection of CFRP composite structures is proposed by fusing monitoring data with physical mechanisms. In this framework, the numerical method is leveraged to build physical models of the composite structures to generate data under various damage conditions. Then, a deep transfer learning-based model is applied in the fusion of experiment and simulation data, which mitigates the discrepancy between the physical model and real experiment. With the combination of domain adaptation and domain adversarial training in the model, the domain invariant features can be generalized from the source domain (simulation data) to the target domain (experiment data). Therefore, the physical interpretability is supplemented in the data-driven model. To verify the adaptability of the method, different transfer tasks based on the accelerated aging experiments are performed. The results show that the proposed method can reduce the dependence of the data-driven methods on real monitoring data and prevails in all evaluation indicators than other methods. Eventually, this method can reduce the experiment effort of damage detection for composites while holding a great detection accuracy.

YNICL Journal 2023 Journal Article

Difference of mean Hounsfield units (dHU) between follow-up and initial noncontrast CT scan predicts 90-day poor outcome in spontaneous supratentorial acute intracerebral hemorrhage with deep convolutional neural networks

  • Xiaona Xia
  • Xiaoqian Zhang
  • Jiufa Cui
  • Qingjun Jiang
  • Shuai Guan
  • Kongming Liang
  • Hao Wang
  • Chao Wang

OBJECTIVES: This study aimed to investigate the usefulness of a new non-contrast CT scan (NCCT) sign called the dHU, which represented the difference in mean Hounsfield unit values between follow-up and the initial NCCT for predicting 90-day poor functional outcomes in acute supratentorial spontaneous intracerebral hemorrhage (sICH) using deep convolutional neural networks. METHODS: A total of 377 consecutive patients with sICH from center 1 and 91 patients from center 2 (external validation set) were included. A receiver operating characteristic (ROC) analysis was performed to determine the critical value of dHU for predicting poor outcome at 90 days. Modified Rankin score (mRS) >3 or >2 was defined as the primary and secondary poor outcome, respectively. Two multivariate models were developed to test whether dHU was an independent predictor of the two unfavorable functional outcomes. RESULTS: The ROC analysis showed that a dHU >2.5 was a critical value to predict the poor outcomes (mRS >3) in sICH. The sensitivity, specificity, and accuracy of dHU >2.5 for poor outcome prediction were 37.5%, 86.0%, and 70.6%, respectively. In multivariate models developed after adjusting for all elements of the ICH score and hematoma expansion, dHU >2.5 was an independent predictor of both primary and secondary poor outcomes (OR = 2.61, 95% CI [1.32, 5.13], P = 0.006; OR = 2.63, 95% CI [1.36, 5.10], P = 0.004, respectively). After adjustment for all possible significant predictors (p 2.5 had a positive association with primary and secondary poor outcomes (OR = 3.25, 95% CI [1.52, 6.98], P = 0.002; OR = 3.42, 95% CI [1.64, 7.15], P = 0.001). CONCLUSIONS: The dHU of hematoma based on serial CT scans is independently associated with poor outcomes after acute sICH, which may help predict clinical evolution and guide therapy for sICH patients.
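The dHU definition in this abstract can be written as a short calculation. The sketch below is illustrative only and is not the study's pipeline: it assumes the hematoma voxels have already been segmented and are passed in as plain lists of Hounsfield unit values, and it assumes the sign convention "follow-up minus initial". The function names are hypothetical; only the 2.5 cutoff comes from the abstract.

```python
from statistics import mean

# Cutoff reported in the abstract for predicting poor outcome (mRS > 3).
DHU_CUTOFF = 2.5


def dhu(initial_hu, followup_hu):
    """Mean HU of the follow-up scan minus mean HU of the initial scan.

    Both arguments are assumed to be lists of HU values taken from the
    segmented hematoma region; the sign convention is an assumption.
    """
    return mean(followup_hu) - mean(initial_hu)


def exceeds_cutoff(value, cutoff=DHU_CUTOFF):
    """True when the dHU value is strictly above the cutoff (dHU > 2.5)."""
    return value > cutoff
```

For example, hypothetical hematoma voxels of `[50, 52, 54]` on the initial scan and `[55, 56, 57]` at follow-up give means of 52 and 56, so `dhu` returns 4, which `exceeds_cutoff` flags as above the reported threshold.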

TMLR Journal 2023 Journal Article

GIT-Net: Generalized Integral Transform for Operator Learning

  • Chao Wang
  • Alexandre H. Thiery

This article introduces GIT-Net, a deep neural network architecture for approximating Partial Differential Equation (PDE) operators, inspired by integral transform operators. GIT-Net harnesses the fact that differential operators commonly used for defining PDEs can often be represented parsimoniously when expressed in specialized functional bases (e.g., Fourier basis). Unlike rigid integral transforms, GIT-Net parametrizes adaptive generalized integral transforms with deep neural networks. When compared to several recently proposed alternatives, GIT-Net's computational and memory requirements scale gracefully with mesh discretizations, facilitating its application to PDE problems on complex geometries. Numerical experiments demonstrate that GIT-Net is a competitive neural network operator, exhibiting small test errors and low evaluations across a range of PDE problems. This stands in contrast to existing neural network operators, which typically excel in just one of these areas.

AAAI Conference 2023 Conference Paper

Incremental Image De-raining via Associative Memory

  • Yi Gu
  • Chao Wang
  • Jie Li

While deep learning models have achieved the state-of-the-art performance on single-image rain removal, most methods only consider learning fixed mapping rules on the single synthetic dataset for lifetime. This limits the real-life application as iterative optimization may change mapping rules and training samples. However, when models learn a sequence of datasets in multiple incremental steps, they are susceptible to catastrophic forgetting that adapts to new incremental episodes while failing to preserve previously acquired mapping rules. In this paper, we argue the importance of sample diversity in the episodes on the iterative optimization, and propose a novel memory management method, Associative Memory, to achieve incremental image de-raining. It bridges connections between current and past episodes for feature reconstruction by sampling domain mappings of past learning steps, and guides the learning to trace the current pathway back to the historical environment without storing extra data. Experiments demonstrate that our method can achieve better performance than existing approaches on both inhomogeneous and incremental datasets within the spectrum of highly compact systems.

EAAI Journal 2023 Journal Article

Integration of ROV and vision-based underwater inspection for Limnoperna fortunei in water conveyance structure

  • Xin Fang
  • Heng Li
  • Sherong Zhang
  • Jikang Zhang
  • Chao Wang
  • Xiaohua Wang
  • Ziao Ma
  • He Jia

The invasion of Limnoperna fortunei (L. fortunei) has been identified as one major biofouling in the operation of hydraulic engineering, which not only corrodes the concrete structures but also reduces the pipe diameter and increases the surface roughness, leading to the decrease of water conveyance capacity and the increase of the project operation cost. To better cope with this problem, an automated underwater inspection analysis scheme for the biofouling of L. fortunei is provided in this study, which innovatively integrates the underwater remote operating rover (ROV) and computer vision techniques to inspect and evaluate the invasion of L. fortunei in water conveyance structure. This scheme first presents an image enhancement approach based on the fusion strategy to improve the quality of images extracted from underwater robot inspection videos. Then, the L. fortunei is segmented by U-Net in the enhanced underwater images, and the definition of adherent area ratio quantitatively assesses the biofouling severity. At last, the underwater inspection analysis scheme is implemented in a typical aqueduct, and the automatic analysis results are compared with the field investigation during the emptying maintenance of the aqueduct. In this study, the dataset of real ROV inspection video sequences was first used to evaluate the effectiveness of the proposed method for inspecting L. fortunei invasions in realistic scenarios, and then for the comparison with state-of-the-art methods. The results show that the proposed automated inspection scheme is capable of efficiently improving the underwater imaging quality and accurately detecting the L. fortunei.

JBHI Journal 2023 Journal Article

Sequential Active Contour Based on Morphological-Driven Thresholding for Ultrasound Image Segmentation of Ascites

  • Amirhossein Fallahdizcheh
  • Sandeep Laroia
  • Chao Wang

Paracentesis is a high-demanding and routine operation, which has great potentials and benefits if semi-autonomous procedures can be developed. One of the most important techniques that facilitate semi-autonomous paracentesis is to segment the ascites from ultrasound images accurately and efficiently. The ascites, however, is usually with significantly different shapes and noise among different patients, and its shape/size changes dynamically during the paracentesis. This makes most of existing image segmentation methods either time consuming or inaccurate for segmenting ascites from its background. In this article, we propose a two-stage active contour method to facilitate accurate and efficient segmentation of ascites. First, a morphological-driven thresholding method is developed to locate the initial contour of the ascites automatically. Then, the identified initial contour is fed into a novel sequential active contour algorithm to segment the ascites from background accurately. The proposed method is tested and compared with state-of-the-art active contour methods on over 100 real ultrasound images of ascites, and the results show the superiority of our method in both accuracy and time efficiency.

NeurIPS Conference 2023 Conference Paper

Towards a Unified Analysis of Kernel-based Methods Under Covariate Shift

  • Xingdong Feng
  • Xin He
  • Caixing Wang
  • Chao Wang
  • Jingnan Zhang

Covariate shift occurs prevalently in practice, where the input distributions of the source and target data are substantially different. Despite its practical importance in various learning problems, most of the existing methods only focus on some specific learning tasks and are not well validated theoretically and numerically. To tackle this problem, we propose a unified analysis of general nonparametric methods in a reproducing kernel Hilbert space (RKHS) under covariate shift. Our theoretical results are established for a general loss belonging to a rich loss function family, which includes many commonly used methods as special cases, such as mean regression, quantile regression, likelihood-based classification, and margin-based classification. Two types of covariate shift problems are the focus of this paper and the sharp convergence rates are established for a general loss function to provide a unified theoretical analysis, which concurs with the optimal results in literature where the squared loss is used. Extensive numerical studies on synthetic and real examples confirm our theoretical findings and further illustrate the effectiveness of our proposed method.

AAAI Conference 2022 Conference Paper

DeepVisualInsight: Time-Travelling Visualization for Spatio-Temporal Causality of Deep Classification Training

  • Xianglin Yang
  • Yun Lin
  • Ruofan Liu
  • Zhenfeng He
  • Chao Wang
  • Jin Song Dong
  • Hong Mei

Understanding how the predictions of deep learning models are formed during the training process is crucial to improve model performance and fix model defects, especially when we need to investigate nontrivial training strategies such as active learning, and track the root cause of unexpected training results such as performance degeneration. In this work, we propose a time-travelling visual solution DeepVisualInsight (DVI), aiming to manifest the spatiotemporal causality while training a deep learning image classifier. The spatio-temporal causality demonstrates how the gradient-descent algorithm and various training data sampling techniques can influence and reshape the layout of learnt input representation and the classification boundaries in consecutive epochs. Such causality allows us to observe and analyze the whole learning process in the visible low dimensional space. Technically, we propose four spatial and temporal properties and design our visualization solution to satisfy them. These properties preserve the most important information when projecting and inverse-projecting input samples between the visible low-dimensional and the invisible high-dimensional space, for causal analyses. Our extensive experiments show that, comparing to baseline approaches, we achieve the best visualization performance regarding the spatial/temporal properties and visualization efficiency. Moreover, our case study shows that our visual solution can well reflect the characteristics of various training scenarios, showing good potential of DVI as a debugging tool for analyzing deep learning training processes.

AIJ Journal 2022 Journal Article

VoCSK: Verb-oriented commonsense knowledge mining with taxonomy-guided induction

  • Jingping Liu
  • Tao Chen
  • Chao Wang
  • Jiaqing Liang
  • Lihan Chen
  • Yanghua Xiao
  • Yunwen Chen
  • Ke Jin

Commonsense knowledge acquisition is one of the fundamental issues in realizing human-level AI. However, commonsense knowledge is difficult to obtain because it is a human consensus and rarely explicitly appears in texts or other data. In this paper, we focus on the automatic acquisition of a typical kind of implicit verb-oriented commonsense knowledge (e.g., “person eats food”), which is the concept-level knowledge of verb phrases. For this purpose, we propose a taxonomy-guided induction method to mine verb-oriented commonsense knowledge from verb phrases with the help of a probabilistic taxonomy. First, we design an entropy-based triplet filter to cope with noisy verb phrases. Then, we propose a joint model based on the minimum description length principle and a neural language model to generate verb-oriented commonsense knowledge. Besides, we introduce two strategies to accelerate the computation, including the simulated annealing-based approximate solution and the verb phrase clustering method. Finally, we conduct extensive experiments to prove that our solution is more effective than competitors in mining verb-oriented commonsense knowledge. We construct a commonsense knowledge base called VoCSK, containing 259 verbs and 18,406 verb-oriented commonsense knowledge. To verify the usefulness of VoCSK, we utilize the knowledge in this KB to improve the model performance on two downstream applications.

EAAI Journal 2022 Journal Article

Welding sequence optimization to reduce welding distortion based on coupled artificial neural network and swarm intelligence algorithm

  • Chunbiao Wu
  • Chao Wang
  • Jae-Woong Kim

This study aims to develop a welding sequence optimization (WSO) framework based on coupled artificial neural network (ANN) and swarm intelligence algorithm for minimizing welding distortion of thin-walled squared Al–Mg–Si alloy tube components. This framework is mainly composed of two critical computer programs. Firstly, a multilayer feedforward backpropagation neural network (BPNN) system was established to rapidly estimate residual distortion for an arbitrary welding sequence so that welding sequence can be optimized for achieving desired welding quality. For this purpose, a series of nonlinear thermo-elastic–plastic finite element (FE) simulations were conducted and verified with experiments to generate the input database of the neural network. Subsequently, a reliable BPNN model was successfully created and trained within an acceptable error. Secondly, a novel swarm intelligence algorithm, namely, bees algorithm (BA) was proposed to solve the complicated WSO problems. In this optimization process, the trained BPNN model was implanted into this proposed BA for computing the fitness value of arbitrary welding sequences. Moreover, welding experiments were also performed to confirm the performance of the proposed optimization method. Comparing the results from experimental measurements, FE simulations, and proposed WSO framework, it is demonstrated that this proposed BPNN-and-BA-based WSO framework can be successfully applied in practical engineering to obtain an optimal welding sequence for minimizing final welding distortion.

JBHI Journal 2021 Journal Article

Efficient 3D Junction Detection in Biomedical Images Based on a Circular Sampling Model and Reverse Mapping

  • Lan Shen
  • Min Liu
  • Chao Wang
  • Changhao Guo
  • Erik Meijering
  • Yaonan Wang

Detection and localization of terminations and junctions is a key step in the morphological reconstruction of tree-like structures in images. Previously, a ray-shooting model was proposed to detect termination points automatically. In this paper, we propose an automatic method for 3D junction point detection in biomedical images, relying on a circular sampling model and a 2D-to-3D reverse mapping approach. First, the existing ray-shooting model is improved to a circular sampling model to extract the pixel intensity distribution feature across the potential branches around the point of interest. The computation cost can be reduced dramatically compared to the existing ray-shooting model. Then, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is employed to detect 2D junction points in maximum intensity projections (MIPs) of sub-volume images in a given 3D image, by determining the number of branches in the candidate junction region. Further, a 2D-to-3D reverse mapping approach is used to map these detected 2D junction points in MIPs to the 3D junction points in the original 3D images. The proposed 3D junction point detection method is implemented as a built-in tool in the Vaa3D platform. Experiments on multiple 2D images and 3D images show average precision and recall rates of 87.11% and 88.33%, respectively. In addition, the proposed algorithm is dozens of times faster than the existing deep-learning based model. The proposed method has excellent performance in both detection precision and computation efficiency for junction detection even in large-scale biomedical images.

AAAI Conference 2021 Conference Paper

Learning Term Embeddings for Lexical Taxonomies

  • Jingping Liu
  • Menghui Wang
  • Chao Wang
  • Jiaqing Liang
  • Lihan Chen
  • Haiyun Jiang
  • Yanghua Xiao
  • Yunwen Chen

Lexical taxonomies, a special kind of knowledge graph, are essential for natural language understanding. This paper studies the problem of lexical taxonomy embedding. Most existing graph embedding methods are difficult to apply to lexical taxonomies since 1) they ignore implicit but important information, namely, sibling relations, which are not explicitly mentioned in lexical taxonomies and 2) there are lots of polysemous terms in lexical taxonomies. In this paper, we propose a novel method for lexical taxonomy embedding. This method optimizes an objective function that models both hyponym-hypernym relations and sibling relations. A term-level attention mechanism and a random walk based metric are then proposed to assist the modeling of these two kinds of relations, respectively. Finally, a novel training method based on curriculum learning is proposed. We conduct extensive experiments on two tasks to show that our approach outperforms other embedding methods and we use the learned term embeddings to enhance the performance of the state-of-the-art models that are based on BERT and RoBERTa on text classification.

IJCAI Conference 2021 Conference Paper

Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness

  • Dazhong Shen
  • Chuan Qin
  • Chao Wang
  • Hengshu Zhu
  • Enhong Chen
  • Hui Xiong

As one of the most popular generative models, Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference. However, when the decoder network is sufficiently expressive, VAE may lead to posterior collapse; that is, uninformative latent representations may be learned. To this end, in this paper, we propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space, and thus the representation can be learned in a meaningful and compact manner. Specifically, we first theoretically demonstrate that it will result in better latent space with high diversity and low uncertainty awareness by controlling the distribution of posterior’s parameters across the whole data accordingly. Then, without the introduction of new loss terms or modifying training strategies, we propose to exploit Dropout on the variances and Batch-Normalization on the means simultaneously to regularize their distributions implicitly. Furthermore, to evaluate the generalization effect, we also exploit DU-VAE for inverse autoregressive flow based-VAE (VAE-IAF) empirically. Finally, extensive experiments on three benchmark datasets clearly show that our approach can outperform state-of-the-art baselines on both likelihood estimation and underlying classification tasks.

IROS Conference 2021 Conference Paper

Temporally-Continuous Probabilistic Prediction using Polynomial Trajectory Parameterization

  • Zhaoen Su
  • Chao Wang
  • Henggang Cui
  • Nemanja Djuric
  • Carlos Vallespi-Gonzalez
  • David Bradley

A commonly-used representation for motion prediction of actors is a sequence of waypoints (comprising positions and orientations) for each actor at discrete future time-points. While regressing waypoints is simple and flexible, it can exhibit unrealistic higher-order derivatives (such as acceleration) and approximation errors at intermediate time steps. To address this issue we propose a general representation for temporally-continuous probabilistic trajectory prediction that regresses polynomial parameterization coefficients. We evaluate the proposed representation on supervised trajectory prediction tasks using two large self-driving data sets. The results show realistic higher-order derivatives and better accuracy at interpolated time-points, as well as the benefits of the inferred noise distributions over the trajectories. Extensive experimental studies based on existing state-of-the-art models demonstrate the effectiveness of the proposed approach relative to other representations in predicting the future motions of vehicle, bicyclist, and pedestrian traffic actors.
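To make the idea of a polynomial trajectory parameterization concrete, here is a minimal sketch, assuming a single coordinate is modeled as x(t) = c0 + c1·t + c2·t² + ... so that position, velocity, and acceleration can be read off at any time rather than only at discrete waypoints. The helper names `poly_eval` and `poly_derivative` and the coefficient values are hypothetical; the paper's network, its probabilistic outputs, and its choice of polynomial degree are not reproduced here.

```python
def poly_eval(coeffs, t):
    """Evaluate c0 + c1*t + c2*t**2 + ... using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result


def poly_derivative(coeffs):
    """Return the coefficients of the derivative polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]


# Hypothetical coefficients for one coordinate: x(t) = 1 + 2t + 0.5t^2
x_coeffs = [1.0, 2.0, 0.5]
v_coeffs = poly_derivative(x_coeffs)  # velocity polynomial: 2 + t
a_coeffs = poly_derivative(v_coeffs)  # acceleration polynomial: 1

t = 1.5  # any query time, not only a discrete waypoint
position = poly_eval(x_coeffs, t)
velocity = poly_eval(v_coeffs, t)
acceleration = poly_eval(a_coeffs, t)
```

Because the derivatives come from the same coefficients, velocity and acceleration stay consistent with the position curve by construction, which is the property the abstract contrasts with independently regressed waypoints.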

NeurIPS Conference 2021 Conference Paper

Topic Modeling Revisited: A Document Graph-based Neural Network Perspective

  • Dazhong Shen
  • Chuan Qin
  • Chao Wang
  • Zheng Dong
  • Hengshu Zhu
  • Hui Xiong

Most topic modeling approaches are based on the bag-of-words assumption, where each word is required to be conditionally independent in the same document. As a result, both of the generative story and the topic formulation have totally ignored the semantic dependency among words, which is important for improving the semantic comprehension and model interpretability. To this end, in this paper, we revisit the task of topic modeling by transforming each document into a directed graph with word dependency as edges between word nodes, and develop a novel approach, namely Graph Neural Topic Model (GNTM). Specifically, in GNTM, a well-defined probabilistic generative story is designed to model both the graph structure and word sets with multinomial distributions on the vocabulary and word dependency edge set as the topics. Meanwhile, a Neural Variational Inference (NVI) approach is proposed to learn our model with graph neural networks to encode the document graphs. Besides, we theoretically demonstrate that Latent Dirichlet Allocation (LDA) can be derived from GNTM as a special case with similar objective functions. Finally, extensive experiments on four benchmark datasets have clearly demonstrated the effectiveness and interpretability of GNTM compared with state-of-the-art baselines.

YNICL Journal 2020 Journal Article

Increased thalamic volume and decreased thalamo-precuneus functional connectivity are associated with smoking relapse

  • Chao Wang
  • Shuyue Wang
  • Zhujing Shen
  • Wei Qian
  • Yeerfan Jiaerken
  • Xiao Luo
  • Kaicheng Li
  • Qingze Zeng

The thalamus, with the highest density of nicotinic acetylcholine receptor (nAChR) in the brain, plays a central role in thalamo-cortical circuits that are implicated in nicotine addiction. However, little is known about whether the thalamo-cortical circuits are potentially predictive of smoking relapse. In the current study, a total of 125 participants (84 treatment-seeking male smokers and 41 age-matched male nonsmokers) were recruited. Structural and functional magnetic resonance images (MRI) were acquired from all participants. After a 12-week smoking cessation treatment with varenicline, the smokers were then divided into relapsers (n = 54) and nonrelapsers (n = 30). Then, we compared thalamic volume and seed-based thalamo-cortical resting state functional connectivity (rsFC) prior to the cessation treatment among relapsers, nonrelapsers and nonsmokers to investigate the associations between thalamic structure/function and smoking relapse. Increased thalamic volume was detected in smokers relative to nonsmokers, and in relapsers relative to nonrelapsers, especially on the left side. Moreover, decreased left thalamo-precuneus rsFC was detected in relapsers relative to nonrelapsers. Additionally, a logistic regression analysis showed that the thalamic volume and thalamo-precuneus rsFC predicted smoking relapse with an accuracy of 75.7%. These novel findings indicate that increased thalamic volume and decreased thalamo-precuneus rsFC are associated with smoking relapse, and these thalamic measures may be used to predict treatment efficacy of nicotine addiction and serve as a potential biomarker for personalized medicine.

YNICL Journal 2020 Journal Article

Inter-channel phase differences during sleep spindles are altered in Veterans with PTSD

  • Chao Wang
  • Srinivas Laxminarayan
  • J. David Cashmere
  • Anne Germain
  • Jaques Reifman

Sleep disturbances are common complaints in patients with post-traumatic stress disorder (PTSD). To date, however, objective markers of PTSD during sleep remain elusive. Sleep spindles are distinctive bursts of brain oscillatory activity during non-rapid eye movement (NREM) sleep and have been implicated in sleep protection and sleep-dependent memory processes. In healthy sleep, spindles observed in electroencephalogram (EEG) data are highly synchronized across different regions of the scalp. Here, we aimed to investigate whether the spatiotemporal synchronization patterns between EEG channels during sleep spindles, as quantified by the phase-locking value (PLV) and the mean phase difference (MPD), are altered in PTSD. Using high-density (64-channel) EEG data recorded from 78 combat-exposed Veteran men (31 with PTSD and 47 without PTSD) during two consecutive nights of sleep, we examined group differences in the PLV and MPD for slow (10-13 Hz) and fast (13-16 Hz) spindles separately. To evaluate the reproducibility of our findings, we set apart the first 47 consecutive participants (18 with PTSD) for the initial discovery and reserved the remaining 31 participants (13 with PTSD) for replication analysis. In the discovery analysis, compared to the non-PTSD group, the PTSD group showed smaller MPDs during slow spindles between the frontal and centro-parietal channel pairs on both nights. We obtained reproducible results in the replication analysis in terms of statistical significance and effect size. The PLVs during slow or fast spindles did not significantly differ between groups. The reduced inter-channel phase difference during slow spindles in PTSD may reflect pathological changes in the underlying thalamocortical circuits. This novel finding, if independently validated, may prove useful in developing sleep-focused PTSD diagnostics and interventions.
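As a rough illustration of the two synchronization measures named in the abstract above, the sketch below computes a phase-locking value (PLV) and a mean phase difference (MPD) from two sequences of instantaneous phases, using their standard circular-statistics definitions. The function name `plv_and_mpd` and the toy phase series are hypothetical; how the study obtained phases from EEG (filtering, spindle detection, channel pairing) is not described here and is not reproduced.

```python
import cmath


def plv_and_mpd(phases_a, phases_b):
    """Return (PLV, MPD in radians) for two equal-length phase sequences."""
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("phase sequences must be non-empty and of equal length")
    # Average the unit vectors exp(i * (phi_a - phi_b)) over all samples.
    mean_vec = sum(
        cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b)
    ) / len(phases_a)
    plv = abs(mean_vec)          # 1.0 = constant phase relation, near 0 = none
    mpd = cmath.phase(mean_vec)  # circular mean of the phase difference
    return plv, mpd


# Toy example (assumed data): channel B lags channel A by a constant 0.3 rad.
phases_a = [0.1 * k for k in range(200)]
phases_b = [p - 0.3 for p in phases_a]
plv, mpd = plv_and_mpd(phases_a, phases_b)
```

The two quantities are complementary: PLV measures how consistent the phase relation is, while MPD measures what that relation is, so a group difference in MPD with no difference in PLV, as reported above, is a coherent outcome.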

AAAI Conference 2020 Short Paper

Session-Level User Satisfaction Prediction for Customer Service Chatbot in E-Commerce (Student Abstract)

  • Riheng Yao
  • Shuangyong Song
  • Qiudan Li
  • Chao Wang
  • Huan Chen
  • Haiqing Chen
  • Daniel Dajun Zeng

This paper aims to predict user satisfaction for customer service chatbots at the session level, which is of great practical significance yet remains largely unexplored. This requires exploring the relationship between questions and answers across different rounds of interaction and handling user bias. We propose an approach that models multi-round conversations within one session and takes user information into account. Experimental results on a dataset from a real-world industrial customer service chatbot, Alime, demonstrate the good performance of our proposed model.

AAAI Conference 2020 Conference Paper

SetRank: A Setwise Bayesian Approach for Collaborative Ranking from Implicit Feedback

  • Chao Wang
  • Hengshu Zhu
  • Chen Zhu
  • Chuan Qin
  • Hui Xiong

The recent development of online recommender systems has a focus on collaborative ranking from implicit feedback, such as user clicks and purchases. Different from explicit ratings, which reflect graded user preferences, the implicit feedback only generates positive and unobserved labels. While considerable efforts have been made in this direction, the well-known pairwise and listwise approaches have still been limited by various challenges. Specifically, for the pairwise approaches, the assumption of independent pairwise preference does not always hold in practice. Also, the listwise approaches cannot efficiently accommodate “ties” due to the precondition of the entire list permutation. To this end, in this paper, we propose a novel setwise Bayesian approach for collaborative ranking, namely SetRank, to inherently accommodate the characteristics of implicit feedback in recommender systems. Specifically, SetRank aims at maximizing the posterior probability of novel setwise preference comparisons and can be implemented with matrix factorization and neural networks. Meanwhile, we also present the theoretical analysis of SetRank to show that the bound of excess risk can be proportional to M/N, where M and N are the numbers of items and users, respectively. Finally, extensive experiments on four real-world datasets clearly validate the superiority of SetRank compared with various state-of-the-art baselines.

YNICL Journal 2019 Journal Article

Gray matter structural covariance networks changes along the Alzheimer's disease continuum

  • Kaicheng Li
  • Xiao Luo
  • Qingze Zeng
  • Peiyu Huang
  • Zhujing Shen
  • Xiaojun Xu
  • Jingjing Xu
  • Chao Wang

Alzheimer's disease (AD) has a long neuropathological accumulation phase before the onset of dementia. Such AD neuropathological deposition between neurons impairs the synaptic communication, resulting in network disorganization. Our study aimed to explore the evolution patterns of gray matter structural covariance networks (SCNs) along the AD continuum. Based on the AT(N) (i.e., Amyloid/Tau/Neurodegeneration) pathological classification system, we classified subjects into four groups using cerebrospinal fluid amyloid-beta1–42 (A) and phosphorylated tau protein181 (T). We identified 101 subjects with normal AD biomarkers (A−T−), 40 subjects with Alzheimer's pathologic change (A+T−), 101 subjects with biological AD (A+T+) and 91 AD with dementia (demented subjects with A+T+). We used four regions of interest to anchor default mode network (DMN, medial temporal subsystem and midline core subsystem), salience network (SN) and executive control network (ECN). Finally, we used a multi-regression model-based linear-interaction analysis to assess the SCN changes. Along the disease progression, DMN and SN showed increased structural association at the early stage while decreased structural association at the late stage. Moreover, ECN showed progressively increased structural association as AD neuropathological profiles progress. In conclusion, this study found the dynamic trajectory of SCN changes along the AD continuum and supports the network disconnection hypothesis underlying AD neuropathological progression. Further, SCN may potentially serve as an effective AD biomarker.

AAAI Conference 2019 Conference Paper

Molecular Property Prediction: A Multilevel Quantum Interactions Modeling Perspective

  • Chengqiang Lu
  • Qi Liu
  • Chao Wang
  • Zhenya Huang
  • Peize Lin
  • Lixin He

Predicting molecular properties (e.g., atomization energy) is an essential issue in quantum chemistry, which could speed up much research progress, such as drug designing and substance discovery. Traditional studies based on density functional theory (DFT) in physics have proved to be time-consuming for predicting a large number of molecules. Recently, machine learning methods, which consider much rule-based information, have also shown potential for this issue. However, the complex inherent quantum interactions of molecules are still largely underexplored by existing solutions. In this paper, we propose a generalizable and transferable Multilevel Graph Convolutional neural Network (MGCN) for molecular property prediction. Specifically, we represent each molecule as a graph to preserve its internal structure. Moreover, the well-designed hierarchical graph neural network directly extracts features from the conformation and spatial information followed by the multilevel interactions. As a consequence, the multilevel overall representations can be utilized to make the prediction. Extensive experiments on both datasets of equilibrium and off-equilibrium molecules demonstrate the effectiveness of our model. Furthermore, the detailed results also prove that MGCN is generalizable and transferable for the prediction.

IJCAI Conference 2019 Conference Paper

Relation Extraction Using Supervision from Topic Knowledge of Relation Labels

  • Haiyun Jiang
  • Li Cui
  • Zhe Xu
  • Deqing Yang
  • Jindong Chen
  • Chenguang Li
  • Jingping Liu
  • Jiaqing Liang

Explicitly exploring the semantics of a relation is significant for high-accuracy relation extraction, which is, however, not fully studied in previous work. In this paper, we mine the topic knowledge of a relation to explicitly represent the semantics of this relation, and model relation extraction as a matching problem. That is, the matching score between a sentence and a candidate relation is predicted for an entity pair. To this end, we propose a deep matching network to precisely model the semantic similarity between a sentence-relation pair. Besides, the topic knowledge also allows us to derive the importance information of samples as well as two knowledge-guided negative sampling strategies in the training process. We conduct extensive experiments to evaluate the proposed framework and observe improvements in AUC of 11.5% and max F1 of 5.4% over the baselines with state-of-the-art performance.

AAAI Conference 2018 Conference Paper

Confidence-Aware Matrix Factorization for Recommender Systems

  • Chao Wang
  • Qi Liu
  • Runze Wu
  • Enhong Chen
  • Chuanren Liu
  • Xunpeng Huang
  • Zhenya Huang

Collaborative filtering (CF), particularly matrix factorization (MF) based methods, have been widely used in recommender systems. The literature has reported that matrix factorization methods often produce superior accuracy of rating prediction in recommender systems. However, existing matrix factorization methods rarely consider confidence of the rating prediction and thus cannot support advanced recommendation tasks. In this paper, we propose a Confidence-aware Matrix Factorization (CMF) framework to simultaneously optimize the accuracy of rating prediction and measure the prediction confidence in the model. Specifically, we introduce variance parameters for both users and items in the matrix factorization process. Then, prediction interval can be computed to measure confidence for each predicted rating. These confidence quantities can be used to enhance the quality of recommendation results based on Confidence-aware Ranking (CR). We also develop two effective implementations of our framework to compute the confidence-aware matrix factorization for large-scale data. Finally, extensive experiments on three real-world datasets demonstrate the effectiveness of our framework from multiple perspectives.

YNIMG Journal 2018 Journal Article

The frequency of alpha oscillations: Task-dependent modulation and its functional significance

  • Immanuel Babu Henry Samuel
  • Chao Wang
  • Zhenhong Hu
  • Mingzhou Ding

Power (amplitude) and frequency are two important characteristics of EEG alpha oscillations (8–12 Hz). There is an extensive literature showing that alpha power can be modulated in a goal-oriented manner to either enhance or suppress sensory information processing. Only a few studies to date have examined the task-dependent modulation of alpha frequency. Instead, alpha frequency is often viewed as a trait variable, and used to characterize individual differences in cognitive functioning. We performed two experiments to examine the task-dependent modulation of alpha frequency and its functional significance. In the first experiment, high-density EEG was recorded from 21 participants performing a Sternberg working memory task. The results showed that: (1) during memory encoding, alpha frequency decreased with increasing memory load, whereas during memory retention and retrieval, alpha frequency increased with increasing memory load, (2) higher alpha frequency prior to the onset of probe was associated with longer reaction time, and (3) higher alpha frequency prior to the onset of cue or probe was associated with weaker early cue-evoked or probe-evoked neural responses. In the second experiment, simultaneous EEG-fMRI was recorded from 59 participants during resting state. An EEG-informed fMRI analysis revealed that the spontaneous fluctuations of alpha frequency, but not alpha power, were inversely associated with BOLD activity in the visual cortex. Taken together, these findings suggest that alpha frequency is task-dependent, may serve as an indicator of cortical excitability, and along with alpha power, provides more comprehensive indexing of sensory gating.

AAMAS Conference 2016 Conference Paper

Multi-Agent Continuous Transportation with Online Balanced Partitioning (Extended Abstract)

  • Chao Wang
  • Somchaya Liemhetcharat
  • Kian Hsiang Low

We introduce the concept of continuous transportation task to the context of multi-agent systems. A continuous transportation task is one in which a multi-agent team visits a number of fixed locations, picks up objects, and delivers them to a transportation hub. The goal is to maximize the rate of transportation while the objects are replenished over time. In this extended abstract, we present a hybrid of centralized and distributed approaches that minimize communications in the multi-agent team. We contribute a novel online partitioning-transportation algorithm with information gathering in the multi-agent team.

YNIMG Journal 2015 Journal Article

The cortical surface area of the insula mediates the effect of DBH rs7040170 on novelty seeking

  • Jin Li
  • Yue Cui
  • Karen Wu
  • Bing Liu
  • Yun Zhang
  • Chao Wang
  • Tianzi Jiang

Novelty seeking (NS) is a personality trait important for adaptive functioning, but an excessive level of NS has been linked to psychiatric disorders such as ADHD and substance abuse. Previous research has investigated separately the neural and genetic bases of the NS trait, but results were mixed and neural and genetic bases have yet to be examined within the same study. In this study, we examined the interrelationships among the dopamine beta-hydroxylase (DBH) gene, brain structure, and the NS trait in 359 healthy Han Chinese subjects. We focused on the DBH gene because it encodes a key enzyme for dopamine metabolism, NS is believed to be related to the dopaminergic system and has been reported associated with DBH variation. Results showed a significant positive association between the cortical surface area of the left insula and NS score. Furthermore, the DBH genetic polymorphism at the SNP rs7040170 was strongly associated with both the surface area of the left insula and NS score, with G carriers having a larger left insula surface area and a higher NS score than AA homozygotes. Subsequent path analysis suggested that the insula partially mediated the association between the DBH gene and the NS trait. Our data provided the first evidence for the involvement of the insula in the dopamine–NS relationship. Future studies of molecular mechanisms underlying the NS personality trait and related psychiatric disorders should consider the mediation effect of the neural structure.

NeurIPS Conference 2014 Conference Paper

Encoding High Dimensional Local Features by Sparse Coding Based Fisher Vectors

  • Lingqiao Liu
  • Chunhua Shen
  • Lei Wang
  • Anton van den Hengel
  • Chao Wang

Deriving from the gradient vector of a generative model of local features, Fisher vector coding (FVC) has been identified as an effective coding method for image classification. Most, if not all, FVC implementations employ the Gaussian mixture model (GMM) to characterize the generation process of local features. This choice has been shown to be sufficient for traditional low dimensional local features, e.g., SIFT; and typically, good performance can be achieved with only a few hundred Gaussian distributions. However, the same number of Gaussians is insufficient to model the feature space spanned by higher dimensional local features, which have become popular recently. In order to improve the modeling capacity for high dimensional features, it turns out to be inefficient and computationally impractical to simply increase the number of Gaussians. In this paper, we propose a model in which each local feature is drawn from a Gaussian distribution whose mean vector is sampled from a subspace. With certain approximation, this model can be converted to a sparse coding procedure and the learning/inference problems can be readily solved by standard sparse coding methods. By calculating the gradient vector of the proposed model, we derive a new Fisher vector encoding strategy, termed Sparse Coding based Fisher Vector Coding (SCFVC). Moreover, we adopt the recently developed Deep Convolutional Neural Network (CNN) descriptor as a high dimensional local feature and implement image classification with the proposed SCFVC. Our experimental evaluations demonstrate that our method not only significantly outperforms the traditional GMM based Fisher vector encoding but also achieves the state-of-the-art performance in generic object recognition, indoor scene, and fine-grained image classification problems.