Arrow Research search

Author name cluster

Jia Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

29 papers
2 author rows

Possible papers

29

AAAI Conference 2026 Conference Paper

A Hybrid Space Model for Misaligned Multi-modality Image Fusion

  • Yi Xiao
  • Jia Wang
  • Zhu Liu
  • Di Wang
  • Jinyuan Liu
  • Risheng Liu

Infrared and visible image fusion aims to integrate complementary information, such as thermal saliency from infrared imagery and fine-grained texture details from visible imagery. However, real-world multi-modal misalignment and geometric deformation often introduce severe artifacts. Most existing methods focus on feature extraction within Euclidean space, thereby neglecting the inherent hierarchical structures embedded in multimodal representations. While Euclidean space excels at preserving local structural details and supporting efficient computation, hyperbolic space is naturally suited for modeling hierarchical relationships due to its geometric properties. Building upon these observations, this paper proposes a unified framework that jointly optimizes image registration and fusion through a dual-space architecture. This architecture synergistically combines the local fidelity of Euclidean geometry with the hierarchical modeling capability of hyperbolic geometry to enhance multimodal representation learning. Specifically, this paper introduces Hyperbolic Coupled Contrastive Learning Optimization (HCCLO), which aligns and optimizes the hierarchical structures of infrared and visible embeddings in hyperbolic space. Moreover, this paper designs a task-adaptive dual-space feature fusion mechanism, which dynamically balances and fuses Euclidean local features with hyperbolic hierarchical representations, thereby improving adaptability for downstream tasks. Extensive experiments on misaligned multimodal datasets demonstrate that our method achieves state-of-the-art performance, while effectively capturing both spatial dependencies and hierarchical semantics.

AAAI Conference 2026 Conference Paper

Bidirectional Noise Injection: Enhancing Diffusion Models via Coordinated Input-Output Perturbation

  • Tianyi Zheng
  • Jiayang Gao
  • Peng-tao Jiang
  • Fengxiang Yang
  • Ben Wan
  • Hao Zhang
  • Jinwei Chen
  • Jia Wang

Diffusion models have demonstrated remarkable success in image generation, yet a persistent challenge remains: the bias between model predictions and the target distribution. In this paper, we propose a Bidirectional Noise Injection framework for enhancing diffusion models, implemented via Coordinated Input-Output Perturbation (CIOP). Our approach mitigates this bias by randomly applying synchronized noise injection to both the model inputs and the prediction targets during the training stage. This stochastic, synchronized noise injection acts as a smoothing mechanism that effectively reduces the 2-Wasserstein distance between the predicted and target distributions, as substantiated by our theoretical analysis based on optimal transport theory. Extensive experiments on multiple benchmark datasets and various generative tasks demonstrate that our method improves generation quality and training efficiency without incurring additional computational cost. Furthermore, the design of CIOP enables seamless integration with existing diffusion model improvements and advanced frameworks, thereby broadening its applicability. These results highlight the potential of Bidirectional Noise Injection via CIOP to alleviate bias in diffusion-based generative models across a wide range of settings.
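
The coordinated-perturbation idea described in the abstract can be sketched in a few lines. This is an illustrative toy only; the function name, probability, and noise scale are assumptions, not the authors' implementation: with some probability, the same Gaussian noise is added to both the model input and the prediction target.

```python
import numpy as np

def ciop_perturb(x, target, p=0.5, sigma=0.1, rng=None):
    # With probability p, inject the SAME Gaussian noise into both the
    # model input and the prediction target (coordinated perturbation).
    rng = rng or np.random.default_rng()
    if rng.random() < p:
        eps = sigma * rng.standard_normal(x.shape)
        return x + eps, target + eps
    return x, target

x = np.zeros((4, 8))
t = np.ones((4, 8))
xp, tp = ciop_perturb(x, t, p=1.0, rng=np.random.default_rng(0))
# The perturbations are synchronized: the input shift equals the target shift.
assert np.allclose(xp - x, tp - t)
```

Because the same noise realization is applied on both sides, the regression target moves with the input, which is what allows the perturbation to act as smoothing rather than extra label noise.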

AAAI Conference 2026 Conference Paper

Breaking Down Market Barriers: Distilled Prompt-Tuning Approach for Cross-Market Recommendation

  • Leqi Zhang
  • Wayne Lu
  • Haiyang Zhang
  • Elliott Wen
  • Zhixuan Liang
  • Jia Wang

Cross-market recommendation (CMR) faces severe challenges from distribution shifts between data-rich source markets and sparse target markets. Existing methods rely on a pre-training and fine-tuning paradigm for knowledge transfer, yet suffer from two key limitations: i) the objective gap between pre-training and full-parameter fine-tuning causes loss of generalized knowledge from source markets; ii) the high computational costs of extensive fine-tuning hinder scalability. To this end, we propose DCMPT, a novel Distilled Cross-Market Prompt-Tuning approach. DCMPT reframes the problem under a more efficient pre-training and prompt-tuning paradigm. Instead of full fine-tuning, we adapt a pre-trained universal backbone by freezing its weights and injecting a minimal set of learnable prompts to form a "student" model. To effectively optimize these prompts on sparse data, we introduce a novel teacher-student architecture: a specialized "teacher" model, trained exclusively on the target market, provides dense, market-specific supervision. This guidance is delivered via a dual distillation strategy designed to transfer global ranking patterns and adapt to local consumer tastes. Extensive experiments on real-world market datasets demonstrate that DCMPT significantly outperforms state-of-the-art methods, achieving superior target market performance with substantial parameter-efficiency.

AAAI Conference 2026 Conference Paper

From IDs to Semantics: A Generative Framework for Cross-Domain Recommendation with Adaptive Semantic Tokenization

  • Peiyu Hu
  • Wayne Lu
  • Jia Wang

Cross-domain recommendation (CDR) is crucial for improving recommendation accuracy and generalization, yet traditional methods are often hindered by their reliance on shared user/item IDs, which are unavailable in most real-world scenarios. Consequently, many efforts have focused on learning disentangled representations through multi-domain joint training to bridge the domain gaps. Although recent Large Language Model (LLM)-based approaches show promise, they still face critical challenges, including: (1) the item ID tokenization dilemma, which leads to vocabulary explosion and fails to capture high-order collaborative knowledge; and (2) insufficient domain-specific modeling of the complex evolution of user interests and item semantics. To address these limitations, we propose GenCDR, a novel Generative Cross-Domain Recommendation framework. GenCDR first employs a Domain-adaptive Tokenization module, which generates disentangled semantic IDs for items by dynamically routing between a universal encoder and domain-specific adapters. Symmetrically, a Cross-domain Autoregressive Recommendation module models user preferences by fusing universal and domain-specific interests. Finally, a Domain-aware Prefix-tree enables efficient and accurate generation. Extensive experiments on multiple real-world datasets demonstrate that GenCDR significantly outperforms state-of-the-art baselines. Our code is available in the supplementary materials.

AAAI Conference 2025 Conference Paper

DiffusionREC: Diffusion Model with Adaptive Condition for Referring Expression Comprehension

  • Jingcheng Ke
  • Waikeung Wong
  • Jia Wang
  • Mu Li
  • Lunke Fei
  • Jie Wen

The objective of referring expression comprehension (REC) is to accurately identify the object in an image described by a given expression. Existing REC methods, including transformer-based and graph-based approaches among others, have shown robust performance on REC tasks. In this study, we present a groundbreaking framework named DiffusionREC for the REC task. This framework reimagines REC as a text-guided bounding-box denoising diffusion process, through which noisy bounding boxes are refined and distilled to pinpoint the target box. Throughout the training process, the bounding box of the target object diffuses from its ground-truth position towards a random distribution. Simultaneously, a filtering-based object decoder is introduced to reverse this diffusion of noise, conditioned on the provided expression, the result from the previous denoising step, and the interaction between the expression and the image. At the inference stage, we begin by randomly generating a collection of boxes. Subsequently, the filtering-based object decoder is iteratively employed to refine and prune these bounding boxes, taking into account the given expression, the results from the previous denoising step, and the interaction between the expression and the image. Extensive experiments conducted on six datasets demonstrate that DiffusionREC outperforms previous REC methods, yielding superior performance.

IROS Conference 2025 Conference Paper

DPGP: A Hybrid 2D-3D Dual Path Potential Ghost Probe Zone Prediction Framework for Safe Autonomous Driving

  • Weiming Qu
  • Jiawei Du
  • Shenghai Yuan 0001
  • Jia Wang
  • Yang Sun
  • Shengyi Liu
  • Yuanhao Zhu
  • Jiayi Rao

Modern robots must coexist with humans in dense urban environments. A key challenge is the ghost probe problem, where pedestrians or objects unexpectedly rush into traffic paths. This issue affects both autonomous vehicles and human drivers. Existing works propose vehicle-to-everything (V2X) strategies and non-line-of-sight (NLOS) imaging for ghost probe zone detection. However, most require high computational power or specialized hardware, limiting real-world feasibility. Additionally, many methods do not explicitly address this issue. To tackle this, we propose DPGP, a hybrid 2D-3D fusion framework for ghost probe zone prediction using only a monocular camera during training and inference. With unsupervised depth prediction, we observe ghost probe zones align with depth discontinuities, but different depth representations offer varying robustness. To exploit this, we fuse multiple feature embeddings to improve prediction. To validate our approach, we created a 12K-image dataset annotated with ghost probe zones, carefully sourced and cross-checked for accuracy. Experimental results show our framework outperforms existing methods while remaining cost-effective. To our knowledge, this is the first work extending ghost probe zone prediction beyond vehicles, addressing diverse non-vehicle objects. We will open-source our code and dataset for community benefit.

NeurIPS Conference 2025 Conference Paper

GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning

  • Haolong Yan
  • Yeqing Shen
  • Xin Huang
  • Jia Wang
  • Kaijun Tan
  • Zhixuan Liang
  • Hongxin Li
  • Zheng Ge

With the rapid development of Large Vision Language Models, the focus of Graphical User Interface (GUI) agent tasks has shifted from single-screen tasks to complex screen navigation challenges. However, real-world GUI environments, such as PC software and mobile apps, are often complex and proprietary, making it difficult to obtain the comprehensive environment information needed for agent training and evaluation. This limitation hinders systematic investigation and benchmarking of agent navigation capabilities. To address this limitation, we introduce GUI Exploration Lab, a simulation environment engine for GUI agent navigation research that enables flexible definition and composition of screens, icons, and navigation graphs, while providing full access to environment information for comprehensive agent training and evaluation. Through extensive experiments, we find that supervised fine-tuning enables effective memorization of fundamental knowledge, serving as a crucial foundation for subsequent training. Building on this, single-turn reinforcement learning further enhances generalization to unseen scenarios. Finally, multi-turn reinforcement learning encourages the development of exploration strategies through interactive trial and error, leading to further improvements in screen navigation performance. We validate our methods on both static and interactive benchmarks, demonstrating that our findings generalize effectively to real-world scenarios. These findings demonstrate the advantages of reinforcement learning approaches in GUI navigation and offer practical guidance for building more capable and generalizable GUI agents.

IJCAI Conference 2025 Conference Paper

MMET: A Multi-Input and Multi-Scale Transformer for Efficient PDEs Solving

  • Yichen Luo
  • Jia Wang
  • Dapeng Lan
  • Yu Liu
  • Zhibo Pang

Partial Differential Equations (PDEs) are fundamental for modeling physical systems, yet solving them in a generic and efficient manner using machine learning-based approaches remains challenging due to limited multi-input and multi-scale generalization capabilities, as well as high computational costs. This paper proposes the Multi-input and Multi-scale Efficient Transformer (MMET), a novel framework designed to address the above challenges. MMET decouples mesh and query points as two sequences and feeds them into the encoder and decoder, respectively, and uses a Gated Condition Embedding (GCE) layer to embed input variables or functions with varying dimensions, enabling effective solutions for multi-scale and multi-input problems. Additionally, a Hilbert curve-based reserialization and patch embedding mechanism decreases the input length, which significantly reduces the computational cost when dealing with large-scale geometric models. These innovations enable efficient representations and support multi-scale resolution queries for large-scale and multi-input PDE problems. Experimental evaluations on diverse benchmarks spanning different physical fields demonstrate that MMET outperforms SOTA methods in both accuracy and computational efficiency. This work highlights the potential of MMET as a robust and scalable solution for real-time PDE solving in engineering and physics-based applications, paving the way for future explorations into pre-trained large-scale models in specific domains. This work is open-sourced at https://github.com/YichenLuo-0/MMET.

ICLR Conference 2025 Conference Paper

On Designing General and Expressive Quantum Graph Neural Networks with Applications to MILP Instance Representation

  • Xinyu Ye
  • Hao Xiong 0003
  • Jianhao Huang
  • Ziang Chen
  • Jia Wang
  • Junchi Yan

Graph-structured data is ubiquitous, and graph learning models have recently been extended to address complex problems like mixed-integer linear programming (MILP). However, studies have shown that the vanilla message-passing based graph neural networks (GNNs) suffer inherent limitations in learning MILP instance representation, i.e., GNNs may map two different MILP instance graphs to the same representation. In this paper, we introduce an expressive quantum graph learning approach, leveraging quantum circuits to recognize patterns that are difficult for classical methods to learn. Specifically, the proposed General Quantum Graph Learning Architecture (GQGLA) is composed of a node feature layer, a graph message interaction layer, and an optional auxiliary layer. Its generality is reflected in effectively encoding features of nodes and edges while ensuring node permutation equivariance and flexibly creating different circuit structures for various expressive requirements and downstream tasks. GQGLA is well suited for learning complex graph tasks like MILP representation. Experimental results highlight the effectiveness of GQGLA in capturing and learning representations for MILPs. In comparison to traditional GNNs, GQGLA exhibits superior discriminative capabilities and demonstrates enhanced generalization across various problem instances, making it a promising solution for complex graph tasks.

NeurIPS Conference 2025 Conference Paper

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

  • Yana Wei
  • Liang Zhao
  • Jianjian Sun
  • Kangheng Lin
  • jisheng yin
  • Jingcheng Hu
  • Yinmin Zhang
  • En Yu

The remarkable reasoning capability of large language models (LLMs) stems from cognitive behaviors that emerge through reinforcement with verifiable rewards. This work investigates how to transfer this principle to Multimodal LLMs (MLLMs) to unlock advanced visual reasoning. We introduce a two-stage paradigm built on Qwen2.5-VL-7B: a massive linguistic cold-start fine-tuning, followed by multimodal reinforcement learning (RL) spanning nearly 1,000 steps, surpassing all previous open-source efforts in scale. This pioneering work reveals three fundamental insights: 1) Behavior transfer emerges surprisingly early in cold start due to linguistic mental imagery. 2) Cold start broadly memorizes visual behaviors, while RL critically discerns and scales up effective patterns. 3) Transfer strategically favors high-utility behaviors such as visual reflection. Our resulting model, Open-Vision-Reasoner (OVR), achieves state-of-the-art performance on a suite of reasoning benchmarks, including 95.3% on MATH500, 51.8% on MathVision and 54.6% on MathVerse. We release our model, data, and training dynamics to catalyze the development of more capable, behavior-aligned multimodal reasoners.

AAAI Conference 2025 Conference Paper

PScalpel: A Machine Learning-based Guider for Protein Phase-Separating Behaviour Alteration

  • Jia Wang
  • Liyan Zhu
  • Zhe Wang
  • Chenqiu Zhang
  • Yaoxing Wu
  • Jun Cui
  • Jianqiang Li

Missense mutations could affect the Liquid-Liquid Phase Separation (LLPS) propensity of proteins and lead to aberrant phase-separating behaviours, which are recently found to be associated with many diseases including Alzheimer's and cancer. However, the regulatory role of mutations in LLPS remains unclear due to challenges in accurately characterizing the LLPS ability of mutants, including the high similarity in features, lack of labeled data, and vast amounts of data involved. To bridge this gap and facilitate the discovery of therapeutic strategies, we propose the first machine learning-based guider for protein phase-separating behaviour alteration, PScalpel. PScalpel leverages both structural information and an auxiliary tasks-based graph contrastive learning framework to distinguish the mutants’ LLPS ability, and incorporates a genetic algorithms-based recommendation method to identify mutants with desired LLPS properties. Comprehensive computational and biological experiments validate the effectiveness of PScalpel as a versatile tool for guiding alterations in protein phase separation behavior.

AAAI Conference 2025 Conference Paper

Reinforcement Active Client Selection for Federated Heterogeneous Graph Learning

  • Jia Wang
  • Yawen Li
  • Yingxia Shao
  • Zhe Xue
  • Zeli Guan
  • Ang Li
  • Guanhua Ye

Carefully selecting clients to participate in aggregation can assist the global model in achieving better performance. However, existing research on federated heterogeneous graph learning (FHGL) has shown limited attention to the client selection (CS) problem. Current CS algorithms face challenges in accurately evaluating client contributions and selecting appropriate participants in the context of FHGL, leading to a dilemma between convergence and accuracy. In this paper, we propose Reinforcement Active client selection based Federated Heterogeneous Graph Learning (RAFHGL), which precisely evaluates the importance of local heterogeneous graph data and selects high-contributing clients for aggregation. RAFHGL employs an active learning agent to select representative nodes for local training. The statistical features of the active scores are used to assess client contributions. A client selection agent then chooses clients conducive to global model convergence for aggregation. To address the heterogeneity introduced by sample and client selection, the training process is stabilized by correcting local losses based on data prototypes. Experimental results on 4 publicly available heterogeneous graph datasets show that RAFHGL outperforms existing client selection algorithms in federated heterogeneous graph learning scenarios in terms of performance and convergence.

NeurIPS Conference 2025 Conference Paper

Role-aware Multi-agent Reinforcement Learning for Coordinated Emergency Traffic Control

  • Ming Cheng
  • Hao Chen
  • Zhiqing Li
  • Jia Wang
  • Senzhang Wang

Emergency traffic control presents an increasingly critical challenge, requiring seamless coordination among emergency vehicles, regular vehicles, and traffic lights to ensure efficient passage for all vehicles. Existing models focus primarily on traffic light control, leaving emergency and regular vehicles prone to delay due to the lack of navigation strategies. To address this issue, we propose the Role-aware Multi-agent Traffic Control (RMTC) framework, which dynamically assigns appropriate roles to traffic components for better cooperation by considering their relations with emergency vehicles and adaptively adjusting their policies. Specifically, RMTC introduces a Heterogeneous Temporal Traffic Graph (HTTG) to model the spatial and temporal relationships among all traffic components (traffic lights, regular and emergency vehicles) at each time step. Furthermore, we develop a Dynamic Role Learning model to infer the evolving roles of traffic lights and regular vehicles based on the HTTG. Finally, we present a Role-aware Multi-agent Reinforcement Learning approach that learns traffic policies conditioned on the dynamically inferred roles. Extensive experiments across four public traffic scenarios show that RMTC outperforms existing traffic light control methods by significantly reducing emergency vehicle travel time, while effectively preserving traffic efficiency for regular vehicles. The code is released at https://github.com/mingchenghexi/RMTC.

IROS Conference 2025 Conference Paper

SILM: A Subjective Intent Based Low-Latency Framework for Multiple Traffic Participants Joint Trajectory Prediction

  • Weiming Qu
  • Jia Wang
  • Jiawei Du
  • Yuanhao Zhu
  • Jianfeng Yu
  • Rui Xia
  • Song Cao
  • Xihong Wu

Trajectory prediction is a fundamental technology for advanced autonomous driving systems and represents one of the most challenging problems in the field of cognitive intelligence. Accurately predicting the future trajectories of each traffic participant is a prerequisite for building high-safety and high-reliability decision-making, planning, and control capabilities in autonomous driving. However, existing methods often focus solely on the motion of other traffic participants without considering the underlying intent behind that motion, which increases the uncertainty in trajectory prediction. Autonomous vehicles operate in real-time environments, meaning that trajectory prediction algorithms must be able to process data and generate predictions in real time. While many existing methods achieve high accuracy, they often struggle to effectively handle heterogeneous traffic scenarios. In this paper, we propose a Subjective Intent-based Low-latency framework for Multiple traffic participants joint trajectory prediction. Our method explicitly incorporates the subjective intent of traffic participants based on their key points, and predicts the future trajectories jointly without a map, which ensures promising performance while significantly reducing the prediction latency. Additionally, we introduce a novel dataset designed specifically for trajectory prediction. The related code and dataset will be available soon.

UAI Conference 2024 Conference Paper

Bias-aware Boolean Matrix Factorization Using Disentangled Representation Learning

  • Xiao Wang 0099
  • Jia Wang
  • Tong Zhao 0002
  • Yijie Wang
  • Nan Zhang
  • Yong Zang
  • Sha Cao
  • Chi Zhang 0021

Boolean matrix factorization (BMF) has been widely utilized in fields such as recommendation systems, graph learning, text mining, and -omics data analysis. Traditional BMF methods decompose a binary matrix into the Boolean product of two lower-rank Boolean matrices plus homoscedastic random errors. However, real-world binary data typically involves biases arising from heterogeneous row- and column-wise signal distributions. Such biases can lead to suboptimal fitting and unexplainable predictions if not accounted for. In this study, we reconceptualize the binary data generation as the Boolean sum of three components: a binary pattern matrix, a background bias matrix influenced by heterogeneous row or column distributions, and random flipping errors. We introduce a novel Disentangled Representation Learning for Binary matrices (DRLB) method, which employs a dual auto-encoder network to reveal the true patterns. DRLB can be seamlessly integrated with existing BMF techniques to facilitate bias-aware BMF. Our experiments with both synthetic and real-world datasets show that DRLB significantly enhances the precision of traditional BMF methods while offering high scalability. Moreover, the bias matrix detected by DRLB accurately reflects the inherent biases in synthetic data, and the patterns identified in the bias-corrected real-world data exhibit enhanced interpretability.
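
The three-component generative view described in the abstract, a binary matrix as the Boolean sum (elementwise OR) of a low-rank pattern, a row/column-driven bias, plus random flipping errors, can be sketched as follows. Dimensions, rates, and names are illustrative assumptions, not DRLB's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 20, 15, 3                 # rows, columns, Boolean rank

# Low-rank Boolean pattern: Boolean product of two binary factor matrices.
U = rng.random((n, k)) < 0.3
V = rng.random((k, m)) < 0.3
pattern = (U.astype(int) @ V.astype(int)) > 0

# Background bias driven by heterogeneous row- and column-wise signal rates.
row_p = rng.random((n, 1)) * 0.2
col_p = rng.random((1, m)) * 0.2
bias = rng.random((n, m)) < (row_p + col_p)

# Boolean sum (OR) of pattern and bias, then random flipping errors (XOR).
flips = rng.random((n, m)) < 0.01
observed = (pattern | bias) ^ flips
```

In this view, a bias-aware method tries to recover `pattern` from `observed` rather than fitting `pattern | bias` as if it were all signal.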

AAAI Conference 2024 Conference Paper

Practical Privacy-Preserving MLaaS: When Compressive Sensing Meets Generative Networks

  • Jia Wang
  • Wuqiang Su
  • Zushu Huang
  • Jie Chen
  • Chengwen Luo
  • Jianqiang Li

The Machine-Learning-as-a-Service (MLaaS) framework allows one to reap the low-hanging fruit of machine learning techniques and data science without much expertise in this sophisticated sphere or the provision of specific infrastructure. However, the requirement of revealing all training data to the service provider raises new concerns in terms of privacy leakage, storage consumption, efficiency, bandwidth, etc. In this paper, we propose a lightweight privacy-preserving MLaaS framework by combining Compressive Sensing (CS) and Generative Networks. It builds on the favorable facts observed in recent works that general inference tasks can be fulfilled with a generative network and a classifier trained on compressed measurements, since the generator can model the data distribution and capture discriminative information useful for classification. To improve the performance of the MLaaS framework, the supervised generative models of the server are trained and optimized with prior knowledge provided by the client. In order to prevent the service provider from recovering the original data as well as identifying the queried results, a noise-addition mechanism is designed and applied in the compressed data domain. Empirical results confirm its superiority in accuracy and resource consumption over state-of-the-art privacy-preserving MLaaS frameworks.
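
The client-side pipeline described above rests on two simple operations: random linear compression of the raw data, followed by noise addition in the compressed domain before upload. A toy sketch, where the dimensions, sensing matrix, and noise level are illustrative assumptions rather than the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 256, 64                                   # signal dim, measurements (m << n)

x = rng.standard_normal(n)                       # stand-in for the client's raw datum
Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random sensing matrix

y = Phi @ x                                      # compressed measurements
y_private = y + 0.1 * rng.standard_normal(m)     # noise addition before upload
```

The server would then train its generative model and classifier on vectors like `y_private`; the 4x compression ratio here is arbitrary.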

AAAI Conference 2023 Conference Paper

Certifiable Out-of-Distribution Generalization

  • Nanyang Ye
  • Lin Zhu
  • Jia Wang
  • Zhaoyu Zeng
  • Jiayao Shao
  • Chensheng Peng
  • Bikang Pan
  • Kaican Li

Machine learning methods suffer from test-time performance degeneration when faced with out-of-distribution (OoD) data whose distribution is not necessarily the same as training data distribution. Although a plethora of algorithms have been proposed to mitigate this issue, it has been demonstrated that achieving better performance than ERM simultaneously on different types of distributional shift datasets is challenging for existing approaches. Besides, it is unknown how and to what extent these methods work on any OoD datum without theoretical guarantees. In this paper, we propose a certifiable out-of-distribution generalization method that provides provable OoD generalization performance guarantees via a functional optimization framework leveraging random distributions and max-margin learning for each input datum. With this approach, the proposed algorithmic scheme can provide certified accuracy for each input datum's prediction on the semantic space and achieves better performance simultaneously on OoD datasets dominated by correlation shifts or diversity shifts. Our code is available at https://github.com/ZlatanWilliams/StochasticDisturbanceLearning.

AAAI Conference 2022 Short Paper

A Multi-Factor Classification Framework for Completing Users’ Fuzzy Queries (Student Abstract)

  • Yaning Zhang
  • Liangqing Wu
  • Yangyang Wang
  • Jia Wang
  • Xiaoguang Yu
  • Shuangyong Song
  • Youzheng Wu
  • Xiaodong He

Intent identification is a key technology in dialogue systems. However, not all online queries are clear or complete. To accurately identify users’ intents from such fuzzy queries, this paper proposes a multi-factor classification framework at the query level. Experimental results on our online serving system JIMI demonstrate the effectiveness of the proposed framework.

AAAI Conference 2022 Short Paper

SimCTC: A Simple Contrast Learning Method of Text Clustering (Student Abstract)

  • Chen Li
  • Xiaoguang Yu
  • Shuangyong Song
  • Jia Wang
  • Bo Zou
  • Xiaodong He

This paper presents SimCTC, a simple contrastive learning (CL) method that greatly advances the state-of-the-art text clustering models. In SimCTC, a pre-trained BERT model first maps the input sequence to the representation space, which is then followed by three different loss function heads: Clustering head, Instance-CL head and Cluster-CL head. Experimental results on multiple benchmark datasets demonstrate that SimCTC remarkably outperforms 6 competitive text clustering methods with 1%-6% improvement on Accuracy (ACC) and 1%-4% improvement on Normalized Mutual Information (NMI). Moreover, our results also show that the clustering performance can be further improved by setting an appropriate number of clusters in the cluster-level objective.
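
The Instance-CL head mentioned above is typically an InfoNCE-style contrastive loss between two augmented views of each text, with the clustering and cluster-level heads added as further terms. A minimal NumPy sketch of such an instance-level term (names, temperature, and dimensions are illustrative, not SimCTC's code):

```python
import numpy as np

def instance_cl_loss(z1, z2, tau=0.5):
    # InfoNCE over cosine similarities; row i of z1 matches row i of z2.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / tau
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                    # positives on the diagonal

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 16))
loss_aligned = instance_cl_loss(z, z + 0.01 * rng.standard_normal((8, 16)))
loss_random = instance_cl_loss(z, rng.standard_normal((8, 16)))
# Aligned views yield a lower contrastive loss than unrelated embeddings.
assert loss_aligned < loss_random
```

A full SimCTC-style objective would sum a term like this with the clustering-head and cluster-level losses, weighted appropriately.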

JBHI Journal 2022 Journal Article

Stroke Risk Prediction With Hybrid Deep Transfer Learning Framework

  • Jie Chen
  • Yingru Chen
  • Jianqiang Li
  • Jia Wang
  • Zijie Lin
  • Asoke K. Nandi

Stroke has become a leading cause of death and long-term disability in the world, with no effective treatment. Deep learning-based approaches have the potential to outperform existing stroke risk prediction models, but they rely on large, well-labeled data. Due to strict privacy protection policies in health-care systems, stroke data is usually distributed among different hospitals in small pieces. In addition, the positive and negative instances of such data are extremely imbalanced. Transfer learning can address the small-data issue by exploiting the knowledge of a correlated domain, especially when multiple sources of data are available. In this work, we propose a novel Hybrid Deep Transfer Learning-based Stroke Risk Prediction (HDTL-SRP) scheme to exploit the knowledge structure of multiple correlated sources (i.e., external stroke data and chronic disease data, such as hypertension and diabetes). The proposed framework has been extensively tested in synthetic and real-world scenarios, and it outperforms state-of-the-art stroke risk prediction models. It also shows the potential for real-world deployment among multiple hospitals aided by 5G/B5G infrastructures.

NeurIPS Conference 2021 Conference Paper

Bounds all around: training energy-based models with bidirectional bounds

  • Cong Geng
  • Jia Wang
  • Zhiyong Gao
  • Jes Frellsen
  • Søren Hauberg

Energy-based models (EBMs) provide an elegant framework for density estimation, but they are notoriously difficult to train. Recent work has established links to generative adversarial networks, where the EBM is trained through a minimax game with a variational value function. We propose a bidirectional bound on the EBM log-likelihood, such that we maximize a lower bound and minimize an upper bound when solving the minimax game. We link one bound to a gradient penalty that stabilizes training, thereby providing grounding for best engineering practice. To evaluate the bounds, we develop a new and efficient estimator of the Jacobian determinant of the EBM generator. We demonstrate that these developments stabilize training and yield high-quality density estimation and sample generation.

IJCAI Conference 2019 Conference Paper

CLVSA: A Convolutional LSTM Based Variational Sequence-to-Sequence Model with Attention for Predicting Trends of Financial Markets

  • Jia Wang
  • Tong Sun
  • Benyuan Liu
  • Yu Cao
  • Hongwei Zhu

Financial markets are a complex dynamical system. The complexity comes from the interaction between a market and its participants: the aggregate activity of all participants determines the market's trend, while the trend in turn affects those activities. These interwoven interactions keep financial markets evolving. Inspired by stochastic recurrent models that successfully capture the variability observed in natural sequential data such as speech and video, we propose CLVSA, a hybrid model that combines stochastic recurrent networks, the sequence-to-sequence architecture, self- and inter-attention mechanisms, and convolutional LSTM units to variationally capture the underlying features of raw financial trading data. Our model outperforms baseline models, such as the convolutional neural network, vanilla LSTM network, and sequence-to-sequence model with attention, based on backtesting results for six futures from January 2010 to December 2017. Our experimental results show that, by introducing an approximate posterior, CLVSA gains an extra regularizer based on the Kullback-Leibler divergence that prevents it from falling into overfitting traps.
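The Kullback-Leibler regularizer the abstract mentions has, under the standard variational setup with a diagonal-Gaussian approximate posterior and a standard-normal prior, a well-known closed form. A minimal numpy sketch of that term (function and argument names are illustrative, not the paper's code):

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims.

    Closed form: 0.5 * sum(exp(log_var) + mu^2 - 1 - log_var).
    Adding this term to the reconstruction loss penalizes approximate
    posteriors that drift far from the prior, acting as a regularizer.
    """
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

# A posterior identical to the prior incurs zero penalty ...
print(kl_to_standard_normal(np.zeros(4), np.zeros(4)))  # -> 0.0
# ... while a sharp, shifted posterior pays a positive penalty.
print(kl_to_standard_normal(2.0 * np.ones(4), -3.0 * np.ones(4)))
```

Because the penalty grows as the posterior becomes overconfident or shifts away from the prior, it discourages the encoder from memorizing individual training sequences, which is the overfitting-trap effect the abstract refers to.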

AAAI Conference 2019 Conference Paper

Understanding VAEs in Fisher-Shannon Plane

  • Huangjie Zheng
  • Jiangchao Yao
  • Ya Zhang
  • Ivor W. Tsang
  • Jia Wang

In information theory, Fisher information and Shannon information (entropy) quantify, respectively, the uncertainty associated with modeling a distribution and the uncertainty in specifying the outcome of a given variable. These two quantities are complementary and are jointly applied to information-behavior analysis in most cases. The uncertainty principle asserts a fundamental trade-off between Fisher information and Shannon information, which sheds light on the relationship between the encoder and the decoder in variational auto-encoders (VAEs). In this paper, we investigate VAEs in the Fisher-Shannon plane and demonstrate that representation learning and log-likelihood estimation are intrinsically related to these two information quantities. Through extensive qualitative and quantitative experiments, we provide a better understanding of VAEs in tasks such as high-resolution reconstruction and representation learning from the perspective of Fisher information and Shannon information. We further propose a variant of VAEs, termed the Fisher auto-encoder (FAE), to balance Fisher information and Shannon information for practical needs. Our experimental results demonstrate its promise in improving reconstruction accuracy and avoiding the non-informative latent codes observed in previous works.
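As a concrete instance of the trade-off described above: for a univariate Gaussian, the Fisher information with respect to the mean is 1/sigma^2, while the Shannon differential entropy is (1/2)ln(2*pi*e*sigma^2), so sharper distributions carry more Fisher information and less entropy. Stam's inequality, I(X) * N(X) >= 1 with entropy power N(X) = exp(2H)/(2*pi*e), holds with equality for Gaussians. A small numpy check of that equality (illustrative, not code from the paper):

```python
import numpy as np

def gaussian_entropy(sigma):
    # Shannon differential entropy of N(mu, sigma^2)
    return 0.5 * np.log(2 * np.pi * np.e * sigma**2)

def gaussian_fisher_info(sigma):
    # Fisher information of N(mu, sigma^2) with respect to the mean
    return 1.0 / sigma**2

def entropy_power(h):
    # N(X) = exp(2H) / (2 * pi * e)
    return np.exp(2 * h) / (2 * np.pi * np.e)

for sigma in (0.5, 1.0, 3.0):
    h = gaussian_entropy(sigma)
    i = gaussian_fisher_info(sigma)
    # Stam's inequality I(X) * N(X) >= 1 is tight for Gaussians:
    print(sigma, round(i * entropy_power(h), 6))  # product is 1.0 (up to float rounding)
```

Moving sigma up or down trades one quantity against the other while their product stays pinned at the bound, which is the same tension the paper exploits between the VAE encoder and decoder.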

AAAI Conference 2018 Conference Paper

GraphGAN: Graph Representation Learning With Generative Adversarial Nets

  • Hongwei Wang
  • Jia Wang
  • Jialin Wang
  • Miao Zhao
  • Weinan Zhang
  • Fuzheng Zhang
  • Xing Xie
  • Minyi Guo

The goal of graph representation learning is to embed each vertex in a graph into a low-dimensional vector space. Existing graph representation learning methods can be classified into two categories: generative models that learn the underlying connectivity distribution in the graph, and discriminative models that predict the probability of edge existence between a pair of vertices. In this paper, we propose GraphGAN, an innovative graph representation learning framework unifying the above two classes of methods, in which the generative model and discriminative model play a game-theoretical minimax game. Specifically, for a given vertex, the generative model tries to fit its underlying true connectivity distribution over all other vertices and produces “fake” samples to fool the discriminative model, while the discriminative model tries to detect whether a sampled vertex comes from the ground truth or was generated by the generative model. Through this competition, both models alternately and iteratively boost their performance. Moreover, for the implementation of the generative model, we propose a novel graph softmax to overcome the limitations of the traditional softmax function; it provably satisfies the desirable properties of normalization, graph-structure awareness, and computational efficiency. Through extensive experiments on real-world datasets, we demonstrate that GraphGAN achieves substantial gains over state-of-the-art baselines in a variety of applications, including link prediction, node classification, and recommendation.
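The graph softmax sketched in the abstract can be pictured as factorizing the probability of reaching a vertex into per-step softmaxes along its path in a tree rooted at the given vertex, so that only a vertex's neighbors are scored at each step while the probabilities over targets stay normalized. A toy numpy sketch under that reading (the tiny tree and the inner-product relevance score `emb[n] @ emb[cur]` are illustrative assumptions, not the paper's exact parameterization):

```python
import numpy as np

def local_softmax(scores):
    # Numerically stable softmax over the current vertex's neighbors only.
    e = np.exp(scores - scores.max())
    return e / e.sum()

def path_probability(path, neighbors, emb):
    """Multiply per-step softmax probabilities along a root-to-target path,
    so each step scores only the current vertex's tree neighbors."""
    prob = 1.0
    for cur, nxt in zip(path, path[1:]):
        nbrs = neighbors[cur]
        scores = np.array([emb[n] @ emb[cur] for n in nbrs])
        prob *= local_softmax(scores)[nbrs.index(nxt)]
    return prob

# Tiny tree rooted at vertex 0: children of 0 are {1, 2}; child of 1 is {3}.
emb = {0: np.array([1.0, 0.0]), 1: np.array([0.5, 0.5]),
       2: np.array([0.0, 1.0]), 3: np.array([0.3, 0.7])}
neighbors = {0: [1, 2], 1: [3]}

p_13 = path_probability([0, 1, 3], neighbors, emb)
p_2 = path_probability([0, 2], neighbors, emb)
print(p_13 + p_2)  # leaf probabilities on this tree sum to 1
```

Because each step normalizes only over a vertex's neighbors, computing a target's probability costs time proportional to the path length rather than to the total number of vertices, which is the efficiency property the abstract claims.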

AAAI Conference 2010 Conference Paper

News Recommendation in Forum-Based Social Media

  • Jia Wang
  • Qing Li
  • Yuanzhu Chen
  • Jiafen Liu
  • Chen Zhang
  • Zhangxi Lin

Self-publication of news on Web sites has become a common application platform for enabling more engaging interaction among users. Discussion in the form of comments following a news posting can be effectively facilitated if the service provider can recommend articles based not only on the original news item itself but also on the evolving thread of comments. This turns traditional news recommendation into a “discussion moderator” that can intelligently assist online forums. In this work, we present a framework to implement such adaptive news recommendation. In addition, to alleviate the problem of recommending essentially identical articles, we investigate the relationship (duplication, generalization, or specialization) between suggested news articles and the original posting. Experiments indicate that our proposed solutions provide an enhanced news recommendation service in forum-based social media.