Arrow Research search

Author name cluster

Jun Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

269 papers
2 author rows

Possible papers

269

AAAI Conference 2026 Conference Paper

A General Anchor-Based Framework for Scalable Fair Clustering

  • Shengfei Wei
  • Suyuan Liu
  • Jun Wang
  • Ke Liang
  • Miaomiao Li
  • Lei Luo

Fair clustering is crucial for mitigating bias in unsupervised learning, yet existing algorithms often suffer from quadratic or super-quadratic computational complexity, rendering them impractical for large-scale datasets. To bridge this gap, we introduce the Anchor-based Fair Clustering Framework (AFCF), a novel, general, and plug-and-play framework that empowers arbitrary fair clustering algorithms with linear-time scalability. Our approach first selects a small but representative set of anchors using a novel fair sampling strategy. Then, any off-the-shelf fair clustering algorithm can be applied to this small anchor set. The core of our framework lies in a novel anchor graph construction module, where we formulate an optimization problem to propagate labels while preserving fairness. This is achieved through a carefully designed group-label joint constraint, which we prove theoretically ensures that the fairness of the final clustering on the entire dataset matches that of the anchor clustering. We solve this optimization efficiently using an ADMM-based algorithm. Extensive experiments on multiple large-scale benchmarks demonstrate that AFCF drastically accelerates state-of-the-art methods, reducing computation time by orders of magnitude while maintaining strong clustering performance and fairness guarantees.

AAAI Conference 2026 Conference Paper

Bridging the Copyright Gap: Do Large Vision-Language Models Recognize and Respect Copyrighted Content?

  • Naen Xu
  • Jinghuai Zhang
  • Changjiang Li
  • Hengyu An
  • Chunyi Zhou
  • Jun Wang
  • Boyu Xu
  • Yuyuan Li

Large vision-language models (LVLMs) have achieved remarkable advancements in multimodal reasoning tasks. However, their widespread accessibility raises critical concerns about potential copyright infringement. Will LVLMs accurately recognize and comply with copyright regulations when encountering copyrighted content (i.e., user input, retrieved documents) in the context? Failure to comply with copyright regulations may lead to serious legal and ethical consequences, particularly when LVLMs generate responses based on copyrighted materials (e.g., retrieved book excerpts, news reports). In this paper, we present a comprehensive evaluation of various LVLMs, examining how they handle copyrighted content, such as book excerpts, news articles, music lyrics, and code documentation, when it is presented as visual input. To systematically measure copyright compliance, we introduce a large-scale benchmark dataset comprising 50,000 multimodal query-content pairs designed to evaluate how effectively LVLMs handle queries that could lead to copyright infringement. Given that real-world copyrighted content may or may not include a copyright notice, the dataset includes query-content pairs in two distinct scenarios: with and without a copyright notice. For the former, we extensively cover four types of copyright notices to account for different cases. Our evaluation reveals that even state-of-the-art closed-source LVLMs exhibit significant deficiencies in recognizing and respecting copyrighted content, even when presented with a copyright notice. To address this limitation, we introduce a novel tool-augmented defense framework for copyright compliance, which reduces infringement risks in all scenarios. Our findings underscore the importance of developing copyright-aware LVLMs to ensure the responsible and lawful use of copyrighted content.

AAAI Conference 2026 Conference Paper

Cancer Survival Prediction by Cyclic Generation and Multi-grained Alignment

  • Yongqi Bu
  • Qinggang Niu
  • Zhen Li
  • Yanyu Xu
  • Jun Wang
  • Guoxian Yu

Cancer survival analysis with multimodal data is crucial for precise treatments and patient benefits. However, the following challenges hinder the integration of histopathology and genomics: (i) multimodal data are not always complete, especially for the more costly genomics data; (ii) intricate interactions between different modalities are difficult to capture and understand. In response, we propose an end-to-end framework (CIMA) that coordinates Cyclic modality generation and Multi-grained multimodal Alignment. Specifically, CIMA designs a cyclic modality reconstruction module to reciprocally impute missing modalities and infer the interactions between them. Next, it introduces a multi-grained alignment module over the imputed data and interactions to mine fine-grained alignments between histopathology (slide patches) and genomics (biological pathways). CIMA then constructs an adaptive fusion module to leverage multimodal data and alignments for survival prediction. Extensive experiments on cancer benchmark datasets demonstrate that CIMA outperforms existing methods and exhibits good interpretability, providing valuable insights into intricate relationships between pathological phenotypes and biological pathways. Our code is released in the supplementary materials.

AAAI Conference 2026 Conference Paper

Compression Artifacts Removal for VVC with Frequency Domain Mixture of Experts Network

  • Qijun Wang
  • Kang Wang
  • Jun Wang

In recent years, lossy compression algorithms such as H.264/AVC, H.265/HEVC, and H.266/VVC have been proposed and widely applied in image and video encoding. However, these compression algorithms inevitably introduce various complex types of compression artifacts, which severely degrade image quality. Although existing methods have attempted to remove artifacts through filter design or probabilistic prior modeling, they are often effective only for specific types of artifacts, lacking generalization and adaptability. To address this, we propose a novel image compression artifact removal model, ARMoE, which combines multiple frequency domain transformations with a Mixture of Experts (MoE). Considering the differences in frequency and energy distributions across images, we introduce various frequency domain transformations as expert branches and use a Sparse Activation Strategy to adaptively select the optimal frequency domain expert to suppress compression artifacts, yielding an efficient artifact removal method. Furthermore, we re-encode and decode multiple original uncompressed high-quality datasets, including DF2K and Kodak24, using the VTM-20.0 codec under the H.266/VVC standard, constructing a more challenging artifact dataset. Rigorous comparative experiments against current state-of-the-art image restoration methods demonstrate that ARMoE exhibits outstanding image restoration capability.

AAAI Conference 2026 Conference Paper

Counterfactual Fairness with Imperfect Causal Graphs

  • Cong Su
  • Qiaoyu Tan
  • Carlotta Domeniconi
  • Lizhen Cui
  • Jun Wang
  • Guoxian Yu

Fairness-aware machine learning aims to build predictive models that comply with fairness requirements, particularly concerning sensitive attributes such as race, gender, and age. Among causality-based fairness notions, counterfactual fairness is widely adopted for its individual-level guarantees, requiring that an individual’s predicted outcome remains unchanged in a counterfactual world where its sensitive attribute is altered. However, existing methods critically assume that the true causal graph is fully known, which is rarely the case in practice. Moreover, counterfactual fairness suffers from inherent identifiability limitations, as counterfactual quantities cannot always be uniquely estimated from observational data, especially under incomplete causal knowledge. To address these challenges, we propose a principled framework (CF-ICG) for counterfactual fairness under imperfectly known causal graphs, e.g., Completed Partially Directed Acyclic Graphs (CPDAGs). We first introduce a criterion to determine the identifiability, and bound the counterfactual quantities under CPDAGs. Building upon this, we develop an efficient local algorithm that avoids the exhaustive enumeration of all DAGs, ensuring robustness against worst-case fairness violations. Experimental results on synthetic and real-world datasets demonstrate the practical effectiveness and theoretical soundness of CF-ICG.

AAAI Conference 2026 Conference Paper

DMCAR: Disentangled Mixture-of-Experts with Context-Aware Routing for Multi-View Clustering

  • Baili Xiao
  • Ke Liang
  • Jiaqi Jin
  • Jun Wang
  • Yinbo Xu
  • Siwei Wang
  • En Zhu

Multi-View Clustering (MVC) aims to enhance clustering performance by integrating multi-source complementary information. However, existing deep MVC methods face inherent challenges in balancing the learning of shared consensus representations with the preservation of view-specific information: independent encoders hinder effective cross-view collaboration, while a single shared encoder tends to sacrifice representation diversity. Although the recently introduced Mixture-of-Experts (MoE) model offers a novel approach to facilitating view collaboration, its flattened expert pool design often leads to entanglement between shared and specific information, and its routing mechanism limits collaboration potential by neglecting cross-view context. To address these challenges, this paper proposes a novel deep multi-view clustering framework, Decoupled Mixture-of-Experts with Context-Aware Routing for Multi-View Clustering (DMCAR-MVC). At its core is an innovative Decoupled MoE (D-MoE) architecture. We establish a public expert pool to learn cross-view shared representations while equipping each view with an independent private expert pool to capture its unique information, thereby structurally enforcing the decoupling of shared and specific representations. Building on this, we further design a Context-Aware Hierarchical Routing (CAHR) mechanism. When routing for the public expert pool, this mechanism introduces a global context vector to guide expert selection, enabling more efficient and globally informed cross-view collaboration. Finally, to optimize the model, we adopt a multi-level contrastive learning paradigm: on one hand, a cross-view alignment loss ensures semantic consistency in shared representations; on the other, an orthogonality constraint is imposed to further enhance separability between shared and specific representations. Extensive experiments on multiple benchmark datasets demonstrate that DMCAR-MVC significantly outperforms state-of-the-art methods across key clustering metrics. Additionally, comprehensive ablation studies thoroughly validate the effectiveness and necessity of each proposed component.

AAAI Conference 2026 Conference Paper

GloTok: Global Perspective Tokenizer for Image Reconstruction and Generation

  • Xuan Zhao
  • Zhongyu Zhang
  • Yuge Huang
  • Yuxi Mi
  • Guodong Mu
  • Shouhong Ding
  • Jun Wang
  • Rizen Guo

Existing state-of-the-art image tokenization methods leverage diverse semantic features from pre-trained vision models for additional supervision, to expand the distribution of latent representations and thereby improve the quality of image reconstruction and generation. These methods employ a locally supervised approach for semantic supervision, which limits the uniformity of semantic distribution. However, VA-VAE proves that a more uniform feature distribution yields better generation performance. In this work, we introduce a Global Perspective Tokenizer (GloTok), which utilizes global relational information to model a more uniform semantic distribution of tokenized features. Specifically, a codebook-wise histogram relation learning method is proposed to transfer the semantics, which are modeled by pre-trained models on the entire dataset, to the semantic codebook. Then, we design a residual learning module which recovers the fine-grained details to minimize the reconstruction error caused by quantization. Through the above design, GloTok delivers more uniformly distributed semantic latent representations, which facilitates the training of autoregressive (AR) models for generating high-quality images without requiring direct access to pre-trained models during the training process. Experiments on the standard ImageNet-1k benchmark clearly show that our proposed method achieves state-of-the-art reconstruction performance and generation quality.

JBHI Journal 2026 Journal Article

Improving Medical Visual Representation Learning With Pathological-Level Cross-Modal Alignment and Correlation Exploration

  • Jun Wang
  • Lixing Zhu
  • Xiaohan Yu
  • Abhir Bhalerao
  • Yulan He

Learning medical visual representations from image-report pairs through joint learning has garnered increasing research attention due to its potential for transferring acquired knowledge to various downstream medical tasks. Previous works have predominantly focused on instance-wise or token-wise cross-modal alignment, often neglecting the importance of pathological-level consistency. This paper presents PLACE, a novel framework that promotes Pathological-Level Alignment and enriches fine-grained details via Correlation Exploration without additional human annotations. Specifically, we propose a novel pathological-level cross-modal alignment (PCMA) approach to maximize the consistency of pathology observations from both images and reports. To facilitate this, a Visual Pathology Observation Extractor is introduced to extract visual pathological observation representations from localized tokens. The PCMA module operates independently of any external disease annotations, enhancing the generalizability and robustness of our method. Furthermore, we design a proxy task that enforces the model to identify correlations among image patches, thereby enriching the fine-grained details crucial for various downstream tasks. Experimental results demonstrate that our proposed framework achieves new state-of-the-art performance on multiple downstream tasks, including classification, image-to-text retrieval, semantic segmentation, object detection and report generation.

AAAI Conference 2026 Conference Paper

LSAP-PV: High-Fidelity Palm Vein Image Synthesis via Layered Spectral Absorption Projection-Guided Diffusion Model

  • Sheng Shang
  • Chenglong Zhao
  • Ruixin Zhang
  • Jianlong Jin
  • Jingyun Zhang
  • Jun Wang
  • Yang Zhao
  • Shouhong Ding

Palm vein recognition has emerged as a promising biometric technology, yet its development remains constrained by the scarcity of large-scale publicly available datasets. Several methods of palm vein image generation have been proposed to address this issue. These methods usually focus on the anatomical realism of palm vein patterns, but overlook the biophysical correlation between identities and vein patterns, particularly in simulating identity-specific vein contrast. To tackle this limitation, we propose a novel biophysics-driven synthesis method. Our method constructs a 3D palm vascular tree via an established modeling method. Then, a projection model is proposed to map the 3D tree into 2D space to derive palm vein patterns. The projection model is based on skin spectral absorption and simulates the natural attenuation of light passing through the skin using a layer integration method. For different identities, we sample different skin parameters, resulting in varying degrees of attenuation. This method effectively simulates the variation in vein contrast across different identities. Furthermore, we introduce a conditional diffusion model that uses the projected patterns as identity conditions to generate palm vein images. To the best of our knowledge, this is the first palm vein generation method based on the diffusion model. Experimental results demonstrate that our method not only outperforms existing methods, but also enables a recognition model trained on our synthetic data to achieve superior performance compared to a model trained on real-world data at a scale of 2,000 IDs, under an open-set 1:1 protocol measured by TAR at FAR=1e-4.

AAAI Conference 2026 Conference Paper

MLLM Enriched Explainable Multiple Clustering

  • Shan Zhang
  • Liangrui Ren
  • Qiaoyu Tan
  • Carlotta Domeniconi
  • Wei Du
  • Jun Wang
  • Guoxian Yu

Multiple clustering aims to uncover diverse latent structures within the data, enabling a more comprehensive understanding of complex datasets. However, existing approaches either heavily rely on user-supplied keywords or disregard user-interested clustering types, limiting the ability to discover the full range of explainable clusterings of interest, particularly in high-dimensional settings. Furthermore, existing methods insufficiently leverage the rich textual semantics and fall short in fully integrating multi-modal information. To address these challenges, we propose MLLM enriched Multiple Clustering (MLLMMC), a novel framework that leverages a multi-modal large language model (MLLM) to explore explainable non-redundant clustering. Specifically, MLLMMC first employs the MLLM to generate sample descriptions, which serve as input for an LLM to perform prompt-driven reasoning and infer latent clustering types, and then merges them with user-interested types to obtain diverse and explainable clustering types. For each selected type, MLLMMC utilizes the MLLM to generate sample-level textual descriptions and aligns them with corresponding visual features through a cross-attention fusion module, which produces a semantically aligned and enriched representation for the target clustering type. Extensive experiments on six benchmark datasets from diverse domains demonstrate that MLLMMC achieves diverse, explainable, and high-quality clustering outcomes, outperforming state-of-the-art multiple clustering methods by a large margin.

JBHI Journal 2026 Journal Article

Privacy Preserved Blood Glucose Level Cross-Prediction: An Asynchronous Decentralized Federated Learning Approach

  • Chengzhe Piao
  • Taiyu Zhu
  • Yu Wang
  • Stephanie E Baldeweg
  • Paul Taylor
  • Pantelis Georgiou
  • Jiahao Sun
  • Jun Wang

Newly diagnosed Type 1 Diabetes (T1D) patients often struggle to obtain effective Blood Glucose (BG) prediction models due to the lack of sufficient BG data from Continuous Glucose Monitoring (CGM), presenting a significant “cold start” problem in patient care. Utilizing population models to address this challenge is a potential solution, but collecting patient data for training population models in a privacy-conscious manner is challenging, especially given that such data is often stored on personal devices. Considering the privacy protection and addressing the “cold start” problem in diabetes care, we propose “GluADFL”, blood Glucose prediction by Asynchronous Decentralized Federated Learning. We compared GluADFL with eight baseline methods using four distinct T1D datasets, comprising 298 participants, which demonstrated its superior performance in accurately predicting BG levels for cross-patient analysis. Furthermore, patients’ data might be stored and shared across various communication networks in GluADFL, ranging from highly interconnected (e.g., random, which performs best among them) to more structured topologies (e.g., cluster and ring), suitable for various social networks. The asynchronous training framework supports flexible participation. By adjusting the ratio of inactive participants, we found that performance remains stable as long as fewer than 70% are inactive. Our results confirm that GluADFL offers a practical, privacy-preserved solution for BG prediction in T1D, significantly enhancing the quality of diabetes management.

AAAI Conference 2026 Conference Paper

Proactive Constrained Policy Optimization with Preemptive Penalty

  • Ning Yang
  • Pengyu Wang
  • Guoqing Liu
  • Haifeng Zhang
  • Pin Lyu
  • Jun Wang

Safe Reinforcement Learning (RL) often faces significant issues such as constraint violations and instability, necessitating the use of constrained policy optimization, which seeks optimal policies while ensuring adherence to specific constraints like safety. Typically, constrained optimization problems are addressed by the Lagrangian method, a post-violation remedial approach that may result in oscillations and overshoots. Motivated by this, we propose a novel method named Proactive Constrained Policy Optimization (PCPO) that incorporates a preemptive penalty mechanism. This mechanism integrates barrier terms into the objective function as the policy nears the boundary, imposing a cost. Meanwhile, we introduce a constraint-aware intrinsic reward to guide boundary-aware exploration, which is activated only when the policy approaches the constraint boundary. We establish theoretical upper and lower bounds for the duality gap and the performance of the PCPO update, shedding light on the method's convergence characteristics. Additionally, to enhance the optimization performance, we adopt a policy iteration approach. An interesting finding is that PCPO demonstrates significant stability in experiments. Experimental results indicate that the PCPO framework provides a robust solution for policy optimization under constraints, with important implications for future research and practical applications.

TMLR Journal 2026 Journal Article

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

  • Guibin Zhang
  • Hejia Geng
  • Xiaohang Yu
  • Zhenfei Yin
  • Zaibin Zhang
  • Zelin Tan
  • Heng Zhou
  • Zhong-Zhi Li

The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM RL), reframing LLMs from passive sequence generators into autonomous, decision-making agents embedded in complex, dynamic worlds. This survey formalizes this conceptual shift by contrasting the degenerate single-step Markov Decision Processes (MDPs) of LLM RL with the temporally extended Partially Observable Markov Decision Processes (POMDPs) that define Agentic RL. Building on this foundation, we propose a comprehensive twofold taxonomy: one organized around core agentic capabilities, including planning, tool use, memory, reasoning, self-improvement, and perception, and the other around their applications across diverse task domains. Central to our thesis is that reinforcement learning serves as the critical mechanism for transforming these capabilities from static, heuristic modules into adaptive, robust agentic behavior. To support and accelerate future research, we consolidate the landscape of open-source environments, benchmarks, and frameworks into a practical compendium. By synthesizing over five hundred recent works, this survey charts the contours of this rapidly evolving field and highlights the opportunities and challenges that will shape the development of scalable, general-purpose AI agents.

AAAI Conference 2026 Conference Paper

TVChain: Leveraging Textual-Visual Prompt Chains for Jailbreaking Large Vision-Language Models

  • Hao Yu
  • Ke Liang
  • Junxian Duan
  • Jun Wang
  • Siwei Wang
  • Chuan Ma
  • Xinwang Liu

Large Vision-Language Models (LVLMs) enhance the capabilities of Large Language Models by integrating visual inputs, thereby enabling advanced multimodal reasoning across diverse applications. However, these enhanced reasoning capabilities introduce new security risks, particularly susceptibility to jailbreaking attacks that bypass built-in safety mechanisms to elicit harmful or unauthorized outputs. While recent efforts have explored adversarial and typographic prompts, most existing attacks suffer from three key limitations: reliance on auxiliary models, limited effectiveness in black-box scenarios, and inadequate exploitation of the LVLMs' intrinsic reasoning abilities. In this work, we propose TVChain, a novel black-box jailbreaking framework that explicitly intervenes in both the visual and textual reasoning processes of LVLMs. TVChain decomposes malicious prompts into a sequence of semantically meaningful sub-images that represent relevant objects and behaviors, thereby circumventing direct exposure of illicit content. In parallel, a carefully designed chain-of-thought (CoT) textual prompt is employed to steer the model's reasoning toward reconstructing the intended activity in a covert yet effective manner. We demonstrate that this compositional prompting strategy reduces the likelihood of triggering safety mechanisms while preserving attack efficacy. Extensive evaluations on eleven LVLMs (seven open-source and four commercial) across two benchmark datasets and three state-of-the-art defenses validate the effectiveness and robustness of TVChain.

AAAI Conference 2026 Conference Paper

Vision-Language Reasoning for Geolocalization: A Reinforcement Learning Approach

  • Biao Wu
  • Meng Fang
  • Ling Chen
  • Ke Xu
  • Tao Cheng
  • Jun Wang

Recent advances in vision-language models have opened up new possibilities for reasoning-driven image geolocalization. However, existing approaches often rely on synthetic reasoning annotations or external image retrieval, which can limit interpretability and generalizability. In this paper, we present Geo-R, a retrieval-free framework that uncovers structured reasoning paths from existing ground-truth coordinates and optimizes geolocation accuracy via reinforcement learning. We propose the Chain of Region, a rule-based hierarchical reasoning paradigm that generates precise, interpretable supervision by mapping GPS coordinates to geographic entities (e.g., country, province, city) without relying on model-generated or synthetic labels. Building on this, we introduce a lightweight reinforcement learning strategy with coordinate-aligned rewards based on Haversine distance, enabling the model to refine predictions through spatially meaningful feedback. Our approach bridges structured geographic reasoning with direct spatial supervision, yielding improved localization accuracy, stronger generalization, and more transparent inference. Experimental results across multiple benchmarks confirm the effectiveness of Geo-R, establishing a new retrieval-free paradigm for scalable and interpretable image geolocalization. To facilitate further research and ensure reproducibility, both the model and code will be made publicly available.
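The coordinate-aligned reward described in this abstract is built on the Haversine great-circle distance, which is standard and easy to sketch. A minimal Python illustration follows; the `geo_reward` shaping and its `scale_km` parameter are hypothetical, introduced only to show how a distance can be turned into a smooth reward, and are not taken from the paper:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def geo_reward(pred, target, scale_km=500.0):
    """Hypothetical reward shaping: decays smoothly as the prediction drifts from the target."""
    return math.exp(-haversine_km(*pred, *target) / scale_km)
```

Under this (assumed) exponential shaping, an exact hit scores 1.0 and the reward decays continuously with distance, giving the spatially meaningful feedback the abstract refers to.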

AAAI Conference 2025 Conference Paper

4D Diffusion for Dynamic Protein Structure Prediction with Reference and Motion Guidance

  • Kaihui Cheng
  • Ce Liu
  • Qingkun Su
  • Jun Wang
  • Liwei Zhang
  • Yining Tang
  • Yao Yao
  • Siyu Zhu

Protein structure prediction is pivotal for understanding the structure-function relationship of proteins, advancing biological research, and facilitating pharmaceutical development and experimental design. While deep learning methods and the expanded availability of experimental 3D protein structures have accelerated structure prediction, the dynamic nature of protein structures has received limited attention. This study introduces an innovative 4D diffusion model incorporating molecular dynamics (MD) simulation data to learn dynamic protein structures. Our approach is distinguished by the following components: (1) a unified diffusion model capable of generating dynamic protein structures, including both the backbone and side chains, utilizing atomic grouping and side-chain dihedral angle predictions; (2) a reference network that enhances structural consistency by integrating the latent embeddings of the initial 3D protein structures; and (3) a motion alignment module aimed at improving temporal structural coherence across multiple time steps. To our knowledge, this is the first diffusion-based model aimed at predicting protein trajectories across multiple time steps simultaneously. Validation on benchmark datasets demonstrates that our model exhibits high accuracy in predicting dynamic 3D structures of proteins containing up to 256 amino acids over 32 time steps, effectively capturing both local flexibility in stable states and significant conformational changes.

NeurIPS Conference 2025 Conference Paper

A Principle of Targeted Intervention for Multi-Agent Reinforcement Learning

  • Anjie Liu
  • Jianhong Wang
  • Samuel Kaski
  • Jun Wang
  • Mengyue Yang

Steering cooperative multi-agent reinforcement learning (MARL) towards desired outcomes is challenging, particularly when the global guidance from a human on the whole multi-agent system is impractical in a large-scale MARL. On the other hand, designing external mechanisms (e.g., intrinsic rewards and human feedback) to coordinate agents mostly relies on empirical studies, lacking an easy-to-use research tool. In this work, we employ multi-agent influence diagrams (MAIDs) as a graphical framework to address the above issues. First, we introduce the concept of MARL interaction paradigms (orthogonal to MARL learning paradigms), using MAIDs to analyze and visualize both unguided self-organization and global guidance mechanisms in MARL. Then, we design a new MARL interaction paradigm, referred to as the targeted intervention paradigm, that is applied to only a single targeted agent, so the problem of global guidance can be mitigated. In implementation, we introduce a causal inference technique, referred to as Pre-Strategy Intervention (PSI), to realize the targeted intervention paradigm. Since MAIDs can be regarded as a special class of causal diagrams, a composite desired outcome that integrates the primary task goal and an additional desired outcome can be achieved by maximizing the corresponding causal effect through the PSI. Moreover, the bundled relevance graph analysis of MAIDs provides a tool to identify whether an MARL learning paradigm is workable under the design of an MARL interaction paradigm. In experiments, we demonstrate the effectiveness of our proposed targeted intervention, and verify the result of the relevance graph analysis.

JBHI Journal 2025 Journal Article

A Trustworthy Curriculum Learning Guided Multi-Target Domain Adaptation Network for Autism Spectrum Disorder Classification

  • Jiale Dun
  • Jun Wang
  • Juncheng Li
  • Qianhui Yang
  • Wenlong Hang
  • Xiaofeng Lu
  • Shihui Ying
  • Jun Shi

Domain adaptation has demonstrated success in classification of multi-center autism spectrum disorder (ASD). However, current domain adaptation methods primarily focus on classifying data in a single target domain with the assistance of one or multiple source domains, lacking the capability to address the clinical scenario of identifying ASD in multiple target domains. In response to this limitation, we propose a Trustworthy Curriculum Learning Guided Multi-Target Domain Adaptation (TCL-MTDA) network for identifying ASD in multiple target domains. To effectively handle varying degrees of data shift in multiple target domains, we propose a trustworthy curriculum learning procedure based on the Dempster-Shafer (D-S) Theory of Evidence. Additionally, a domain-contrastive adaptation method is integrated into the TCL-MTDA process to align data distributions between source and target domains, facilitating the learning of domain-invariant features. The proposed TCL-MTDA method is evaluated on 437 subjects (including 220 ASD patients and 217 NCs) from the Autism Brain Imaging Data Exchange (ABIDE). Experimental results validate the effectiveness of our proposed method in multi-target ASD classification, achieving an average accuracy of 71.46% (95% CI: 68.85%-74.06%) across four target domains, significantly outperforming most baseline methods (p<0.05).

IJCAI Conference 2025 Conference Paper

Aligning Contrastive Multiple Clusterings with User Interests

  • Shan Zhang
  • Liangrui Ren
  • Jun Wang
  • Yanyu Xu
  • Carlotta Domeniconi
  • Guoxian Yu

Multiple clustering approaches aim to partition complex data in different ways. These methods often exhibit a one-to-many relationship in their results, and relying solely on the data context may be insufficient to capture the patterns relevant to the user. The user's expectations are key to the multiple clustering task. Two main challenges exist: identifying the significant features to represent user interests and aligning those interests with the clustering results. To address these challenges, we propose Contrastive Multiple Clusterings (CMClusts), which extends contrastive learning to multiple clustering by elevating traditional instance-level contrast to clustering-level contrast. Furthermore, CMClusts integrates user expectations or interests by extracting desired features through tailored data augmentations, enabling the model to effectively capture user-relevant clustering features. Experimental results on benchmark datasets show that CMClusts can generate interpretable and high-quality clusterings, which reflect different user interests.

ICLR Conference 2025 Conference Paper

Breaking Free from MMI: A New Frontier in Rationalization by Probing Input Utilization

  • Wei Liu 0144
  • Zhiying Deng
  • Zhongyu Niu
  • Jun Wang
  • Haozhao Wang
  • Zhigang Zeng
  • Ruixuan Li 0001

Extracting a small subset of crucial rationales from the full input is a key problem in explainability research. The most widely used fundamental criterion for rationale extraction is the maximum mutual information (MMI) criterion. In this paper, we first demonstrate that MMI suffers from diminishing marginal returns. Once part of the rationale has been identified, finding the remaining portions contributes only marginally to increasing the mutual information, making it difficult to use MMI to locate the rest. In contrast to MMI, which aims to reproduce the prediction, we seek to identify the parts of the input that the network can actually utilize. This is achieved by comparing how different rationale candidates match the capability space of the weight matrix. The weight matrix of a neural network is typically low-rank, meaning that the linear combinations of its column vectors can only cover part of the directions in a high-dimensional space (where the dimension is that of an input vector). If an input is fully utilized by the network, it generally matches these directions (e.g., a portion of a hypersphere), resulting in a representation with a high norm. Conversely, if an input primarily falls outside (orthogonal to) these directions, its representation norm will approach zero, behaving like noise that the network cannot effectively utilize. Building on this, we propose using the norms of rationale candidates as an alternative objective to MMI. Through experiments on four text classification datasets and one graph classification dataset using three network architectures (GRUs, BERT, and GCN), we show that our method outperforms MMI and its improved variants in identifying better rationales. We also compare our method with a representative LLM (llama-3.1-8b-instruct) and find that our simple method achieves comparable results and can sometimes even outperform it.
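The norm argument in the abstract above rests on a simple linear-algebra fact: a low-rank weight matrix maps inputs inside its row space to high-norm representations and inputs orthogonal to that space to near-zero ones. A minimal NumPy sketch of that fact (illustrative only, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

# A low-rank "weight matrix": its rows and columns span only a
# 5-dimensional subspace of a 64-dimensional input space.
d, r = 64, 5
W = rng.normal(size=(d, r)) @ rng.normal(size=(r, d))

# An input the network can "utilize": a vector inside the row space of W.
basis = np.linalg.svd(W)[2][:r]          # top-r right singular vectors
inside = basis.T @ rng.normal(size=r)

# An input the network cannot utilize: project a random vector out of
# the row space, leaving only directions orthogonal to it.
x = rng.normal(size=d)
outside = x - basis.T @ (basis @ x)

# Representation norms: the "utilized" input keeps a large norm, while
# the orthogonal one is mapped essentially to zero.
norm_inside = np.linalg.norm(W @ inside)
norm_outside = np.linalg.norm(W @ outside)
print(norm_inside, norm_outside)
```

The paper's criterion scores rationale candidates by exactly this kind of representation norm instead of by mutual information with the label.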

NeurIPS Conference 2025 Conference Paper

Causal Sufficiency and Necessity Improves Chain-of-Thought Reasoning

  • Xiangning Yu
  • Zhuohan Wang
  • Linyi Yang
  • Haoxuan Li
  • Anjie Liu
  • Xiao Xue
  • Jun Wang
  • Mengyue Yang

Chain-of-Thought (CoT) prompting plays an indispensable role in endowing large language models (LLMs) with complex reasoning capabilities. However, CoT currently faces two fundamental challenges: (1) Sufficiency, which ensures that the generated intermediate inference steps comprehensively cover and substantiate the final conclusion; and (2) Necessity, which identifies the inference steps that are truly indispensable for the soundness of the resulting answer. We propose a causal framework that characterizes CoT reasoning through the dual lenses of sufficiency and necessity. Incorporating causal Probability of Sufficiency and Necessity allows us not only to determine which steps are logically sufficient or necessary to the prediction outcome, but also to quantify their actual influence on the final reasoning outcome under different intervention scenarios, thereby enabling the automated addition of missing steps and the pruning of redundant ones. Extensive experimental results on various mathematical and commonsense reasoning benchmarks confirm substantial improvements in reasoning efficiency and reduced token usage without sacrificing accuracy. Our work provides a promising direction for improving LLM reasoning performance and cost-effectiveness. The code will be publicly available upon acceptance at: https://anonymous.4open.science/r/causalmath-1CEF.
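For reference, the causal quantities named in this abstract are usually instantiated by Pearl's probabilities of necessity and sufficiency; the paper's exact formulation may differ. For a binary treatment $X$ (here, a reasoning step being included) and outcome $Y$ (the answer being correct):

```latex
% Probability of necessity: given the step was present and the answer
% correct, would removing the step have broken the answer?
\mathrm{PN} = P\big(Y_{X=0}=0 \mid X=1,\; Y=1\big)

% Probability of sufficiency: given the step was absent and the answer
% wrong, would adding the step have fixed the answer?
\mathrm{PS} = P\big(Y_{X=1}=1 \mid X=0,\; Y=0\big)
```

High PN flags steps whose removal should prune accuracy (candidates to keep), while high PS flags missing steps worth adding.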

NeurIPS Conference 2025 Conference Paper

CMoB: Modality Valuation via Causal Effect for Balanced Multimodal Learning

  • Jun Wang
  • Fuyuan Cao
  • Zhixin Xue
  • Xingwang Zhao
  • Jiye Liang

Existing early and late fusion frameworks in multimodal learning are confronted with the fundamental challenge of modality imbalance, wherein disparities in representational capacities induce inter-modal competition during training. Current research methodologies primarily rely on modality-level contribution assessments to measure gaps in representational capabilities and enhance poorly learned modalities, overlooking the dynamic variations of modality contributions across individual samples. To address this, we propose a Causal-aware Modality valuation approach for Balanced multimodal learning (CMoB). We define a benefit function based on Shannon's theory of informational uncertainty to evaluate the changes in the importance of samples across different stages of multimodal training. Inspired by human cognitive science, we propose a causal-aware modality contribution quantification method from a causal perspective to capture fine-grained changes in modality contribution degrees within samples. In the iterative training of multimodal learning, we develop targeted modal enhancement strategies that dynamically select and optimize modalities based on real-time evaluation of their contribution variations across training samples. Our method enhances the discriminative ability of key modalities and the learning capacity of weak modalities while achieving fine-grained balance in multimodal learning. Extensive experiments on benchmark multimodal datasets and multimodal frameworks demonstrate the superiority of our CMoB approach for balanced multimodal learning.

NeurIPS Conference 2025 Conference Paper

Cognitive Mirrors: Exploring the Diverse Functional Roles of Attention Heads in LLM Reasoning

  • Xueqi Ma
  • Jun Wang
  • Yanbei Jiang
  • Sarah Erfani
  • Tongliang Liu
  • James Bailey

Large language models (LLMs) have achieved state-of-the-art performance in a variety of tasks, but remain largely opaque in terms of their internal mechanisms. Understanding these mechanisms is crucial to improve their reasoning abilities. Drawing inspiration from the interplay between neural processes and human cognition, we propose a novel interpretability framework to systematically analyze the roles and behaviors of attention heads, which are key components of LLMs. We introduce CogQA, a dataset that decomposes complex questions into step-by-step subquestions with a chain-of-thought design, each associated with specific cognitive functions such as retrieval or logical reasoning. By applying a multi-label probing method, we identify the attention heads responsible for these functions. Our analysis across multiple LLM families reveals that attention heads exhibit functional specialization, characterized as cognitive heads. These cognitive heads exhibit several key properties: they are universally sparse, and vary in number and distribution across different cognitive functions, and they display interactive and hierarchical structures. We further show that cognitive heads play a vital role in reasoning tasks—removing them leads to performance degradation, while augmenting them enhances reasoning accuracy. These insights offer a deeper understanding of LLM reasoning and suggest important implications for model design, training and fine-tuning strategies.

AAAI Conference 2025 Conference Paper

Coherency Improved Explainable Recommendation via Large Language Model

  • Shijie Liu
  • Ruixin Ding
  • Weihai Lu
  • Jun Wang
  • Mo Yu
  • Xiaoming Shi
  • Wei Zhang

Explainable recommender systems are designed to elucidate the explanation behind each recommendation, enabling users to comprehend the underlying logic. Previous works perform rating prediction and explanation generation in a multi-task manner. However, these works suffer from incoherence between predicted ratings and explanations. To address the issue, we propose a novel framework that employs a large language model (LLM) to generate a rating, transforms it into a rating vector, and finally generates an explanation based on the rating vector and user-item information. Moreover, we propose utilizing publicly available LLMs and pre-trained sentiment analysis models to automatically evaluate the coherence without human annotations. Extensive experimental results on three datasets of explainable recommendation show that the proposed framework is effective, outperforming state-of-the-art baselines with improvements of 7.3% in explainability and 4.4% in text quality.

NeurIPS Conference 2025 Conference Paper

Curious Causality-Seeking Agents in Open-ended Worlds

  • Zhiyu Zhao
  • Haoxuan Li
  • Haifeng Zhang
  • Jun Wang
  • Francesco Faccio
  • Jürgen Schmidhuber
  • Mengyue Yang

When building a world model, a common assumption is that the environment has a single, unchanging underlying causal rule, like applying Newton's laws to every situation. However, in truly open-ended environments, the apparent causal mechanism may drift over time because the agent continually encounters novel contexts and operates within a limited observational window. This creates a problem: even subtle shifts in policy or environment states can alter the very causal mechanisms the world model observes. In this work, we introduce the Meta-Causal Graph as a world model for open-ended environments, a minimal unified representation that efficiently encodes the transformation rules governing how causal structures shift across different latent world states. A single Meta-Causal Graph is composed of multiple causal subgraphs, each triggered by a meta state in the latent state space. Building on this representation, we introduce a Causality-Seeking Agent whose objectives are to (1) identify the meta states that trigger each subgraph, (2) discover the corresponding causal relationships through a curiosity-driven intervention policy, and (3) iteratively refine the Meta-Causal Graph through ongoing curiosity-driven exploration and agent experiences. Experiments on both synthetic tasks and a challenging robot arm manipulation task demonstrate that our method robustly captures shifts in causal dynamics and generalizes effectively to previously unseen contexts.

JBHI Journal 2025 Journal Article

DC-ASTGCN: EEG Emotion Recognition Based on Fusion Deep Convolutional and Adaptive Spatio-Temporal Graph Convolutional Networks

  • Xiaodong Yang
  • Zhengping Zhu
  • Guangkang Jiang
  • Dandan Wu
  • Aijun He
  • Jun Wang

Thanks to advancements in artificial intelligence and brain-computer interface (BCI) research, there has been increasing attention towards emotion recognition techniques based on electroencephalogram (EEG) recently. The complexity of EEG data poses a challenge when it comes to accurately classifying emotions by integrating time, frequency, and spatial domain features. To address this challenge, this paper proposes a fusion model called DC-ASTGCN, which combines the strengths of deep convolutional neural network (DCNN) and adaptive spatio-temporal graph convolutional neural network (ASTGCN) to comprehensively analyze and understand EEG signals. The DCNN focuses on extracting frequency-domain and local spatial features from EEG signals to identify brain region activity patterns, while the ASTGCN, with its spatio-temporal attention mechanism and adaptive brain topology layer, reveals the functional connectivity features between brain regions in different emotional states. This integration significantly enhances the model's ability to understand and recognize emotional states. Extensive experiments conducted on the DEAP and SEED datasets demonstrate that the DC-ASTGCN model outperforms existing state-of-the-art methods in terms of emotion recognition accuracy.

IJCAI Conference 2025 Conference Paper

Deep Learning for Multivariate Time Series Imputation: A Survey

  • Jun Wang
  • Wenjie Du
  • Yiyuan Yang
  • Linglong Qian
  • Wei Cao
  • Keli Zhang
  • Wenjia Wang
  • Yuxuan Liang

Missing values are ubiquitous in multivariate time series (MTS) data, posing significant challenges for accurate analysis and downstream applications. In recent years, deep learning-based methods have successfully handled missing data by leveraging complex temporal dependencies and learned data distributions. In this survey, we provide a comprehensive summary of deep learning approaches for multivariate time series imputation (MTSI) tasks. We propose a novel taxonomy that categorizes existing methods based on two key perspectives: imputation uncertainty and neural network architecture. Furthermore, we summarize existing MTSI toolkits with a particular emphasis on the PyPOTS Ecosystem, which provides an integrated and standardized foundation for MTSI research. Finally, we discuss key challenges and future research directions, which give insight for further MTSI research. This survey aims to serve as a valuable resource for researchers and practitioners in the field of time series analysis and missing data imputation tasks. A well-maintained MTSI paper and tool list is available at https://github.com/WenjieDu/Awesome_Imputation.

NeurIPS Conference 2025 Conference Paper

DGCBench: A Deep Graph Clustering Benchmark

  • Benyu Wu
  • Yue Liu
  • Qiaoyu Tan
  • Xinwang Liu
  • Wei Du
  • Jun Wang
  • Guoxian Yu

Deep graph clustering (DGC) aims to partition graph nodes into distinct clusters in an unsupervised manner. Despite rapid advancements in this field, DGC remains inherently challenging due to the absence of ground-truth, which complicates the design of effective algorithms and impedes the establishment of standardized benchmarks. The lack of unified datasets, evaluation protocols, and metrics further exacerbates these challenges, making it difficult to systematically assess and compare DGC methods. To address these limitations, we introduce $\texttt{DGCBench}$, the first comprehensive and unified benchmark for DGC methods. It evaluates 12 state-of-the-art DGC methods across 12 datasets from diverse domains and scales, spanning 6 critical dimensions: $\textbf{discriminability}$, $\textbf{effectiveness}$, $\textbf{scalability}$, $\textbf{efficiency}$, $\textbf{stability}$, and $\textbf{robustness}$. Additionally, we develop $\texttt{PyDGC}$, an open-source Python library that standardizes the DGC training and evaluation paradigm. Through systematic experiments, we reveal persistent limitations in existing methods, specifically regarding the homophily bottleneck, training instability, vulnerability to perturbations, efficiency plateau, scalability challenges, and poor discriminability, thereby offering actionable insights for future research. We hope that $\texttt{DGCBench}$, $\texttt{PyDGC}$, and our analyses will collectively accelerate the progress in the DGC community. The code is available at https://github.com/Marigoldwu/PyDGC.

IJCAI Conference 2025 Conference Paper

DUQ: Dual Uncertainty Quantification for Text-Video Retrieval

  • Xin Liu
  • Shibai Yin
  • Jun Wang
  • Jiaxin Zhu
  • Xingyang Wang
  • Yee-Hong Yang

Text-video retrieval establishes accurate similarity relationships between text and video through feature enhancement and granularity alignment. However, relying solely on similarity to associate intra-pair features and distinguish inter-pair features is insufficient, e.g., when querying a multi-scene video with sparse text or selecting the most relevant video from many similar candidates. In this paper, we propose a novel Dual Uncertainty Quantification (DUQ) model that separately handles uncertainties in intra-pair interaction and inter-pair exclusion. Specifically, to enhance intra-pair interaction, we propose an intra-pair similarity uncertainty module to provide similarity-based trustworthy predictions and explicitly model this uncertainty. To increase inter-pair exclusion, we propose an inter-pair distance uncertainty module to construct a distance-based diversity probability embedding, thereby widening the gap between similar features. The two components work synergistically, jointly improving the calculation of similarity between features. We evaluate our model on six benchmark datasets: MSRVTT (51.2%), DiDeMo, MSVD, LSMDC, Charades, and VATEX, achieving state-of-the-art retrieval performance.

NeurIPS Conference 2025 Conference Paper

EconGym: A Scalable AI Testbed with Diverse Economic Tasks

  • Qirui Mi
  • Qipeng Yang
  • Zijun Fan
  • Wentian Fan
  • Heyang Ma
  • Chengdong Ma
  • Siyu Xia
  • Bo An

Artificial intelligence (AI) has become a powerful tool for economic research, enabling large-scale simulation and policy optimization. However, applying AI effectively requires simulation platforms for scalable training and evaluation—yet existing environments remain limited to simplified, narrowly scoped tasks, falling short of capturing complex economic challenges such as demographic shifts, multi-government coordination, and large-scale agent interactions. To address this gap, we introduce EconGym, a scalable and modular testbed that connects diverse economic tasks with AI algorithms. Grounded in rigorous economic modeling, EconGym implements 11 heterogeneous role types (e.g., households, firms, banks, governments), their interaction mechanisms, and agent models with well-defined observations, actions, and rewards. Users can flexibly compose economic roles with diverse agent algorithms to simulate rich multi-agent trajectories across 25+ economic tasks for AI-driven policy learning and analysis. Experiments show that EconGym supports diverse and cross-domain tasks—such as coordinating fiscal, pension, and monetary policies—and enables benchmarking across AI, economic methods, and hybrids. Results indicate that richer task composition and algorithm diversity expand the policy space, while AI agents guided by classical economic methods perform best in complex settings. EconGym also scales to 100k agents with high realism and efficiency.

NeurIPS Conference 2025 Conference Paper

Efficient Pre-Training of LLMs via Topology-Aware Communication Alignment on More Than 9600 GPUs

  • Guoliang He
  • Youhe Jiang
  • Wencong Xiao
  • Jiang Kaihua
  • Shuguang Wang
  • Jun Wang
  • Du Zixian
  • Zhuo Jiang

The scaling law for large language models (LLMs) indicates that the path towards machine intelligence necessitates training at large scale. Thus, companies continuously build large-scale GPU clusters and launch training jobs that span thousands of computing nodes. However, LLM pre-training presents unique challenges due to its complex communication patterns, where GPUs exchange data in sparse yet high-volume bursts within specific groups. Inefficient resource scheduling exacerbates bandwidth contention, leading to suboptimal training performance. This paper presents Arnold, a scheduling system summarizing our experience in effectively aligning LLM communication patterns to data center topology at scale. An in-depth characterization study is performed to identify the impact of physical network topology on LLM pre-training jobs. Based on the insights, we develop a scheduling algorithm to effectively align communication patterns to physical network topology in data centers. Through simulation experiments, we show the effectiveness of our algorithm in reducing the maximum spread of communication groups by up to 1.67x. In production training, our scheduling system improves end-to-end performance by 10.6% when training with more than 9600 Hopper GPUs, a significant improvement for our training pipeline.

ICLR Conference 2025 Conference Paper

Efficient Reinforcement Learning with Large Language Model Priors

  • Xue Yan
  • Yan Song
  • Xidong Feng
  • Mengyue Yang
  • Haifeng Zhang
  • Haitham Bou-Ammar
  • Jun Wang

In sequential decision-making (SDM) tasks, methods like reinforcement learning (RL) and heuristic search have made notable advances in specific cases. However, they often require extensive exploration and face challenges in generalizing across diverse environments due to their limited grasp of the underlying decision dynamics. In contrast, large language models (LLMs) have recently emerged as powerful general-purpose tools, due to their capacity to maintain vast amounts of domain-specific knowledge. To harness this rich prior knowledge for efficiently solving complex SDM tasks, we propose treating LLMs as prior action distributions and integrating them into RL frameworks through Bayesian inference methods, making use of variational inference and direct posterior sampling. The proposed approaches facilitate the seamless incorporation of fixed LLM priors into both policy-based and value-based RL frameworks. Our experiments show that incorporating LLM-based action priors significantly reduces exploration and optimization complexity, substantially improving sample efficiency compared to traditional RL techniques, e.g., using LLM priors decreases the number of required samples by over 90\% in offline learning scenarios.
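One simple instance of the "LLM as prior action distribution" idea in the abstract above is a Boltzmann policy reweighted by the prior: the posterior over actions is proportional to prior(a) * exp(Q(s, a) / temperature). A hedged NumPy sketch (function and variable names are illustrative, not the paper's API):

```python
import numpy as np

def posterior_action_dist(prior_logits, q_values, temperature=1.0):
    """Combine an LLM-style prior over actions with learned Q-values.

    Returns a distribution proportional to prior(a) * exp(Q(s, a) / T),
    i.e. Bayesian reweighting of a softmax-Q policy by the prior.
    """
    log_post = np.asarray(prior_logits) + np.asarray(q_values) / temperature
    log_post -= log_post.max()            # numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# Toy example: the prior strongly favors action 0, the Q-values favor
# action 2; the posterior trades the two sources of evidence off.
prior_logits = np.log(np.array([0.7, 0.2, 0.1]))
q_values = np.array([0.0, 0.0, 2.0])
p = posterior_action_dist(prior_logits, q_values)
```

With a sharper prior (or higher temperature) the posterior stays near the LLM's suggestion; with stronger Q-evidence it moves toward the learned values, which is the sample-efficiency mechanism the abstract describes.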

AAAI Conference 2025 Conference Paper

Emergence-Inspired Multi-Granularity Causal Learning

  • Hanwen Luo
  • Guoxian Yu
  • Jun Wang
  • Yanyu Xu
  • Yongqing Zheng
  • Qingzhong Li

Existing causal learning algorithms focus on micro-level causal discovery, confronting significant challenges in identifying the influence of macro systems, composed of micro-level variables, on other variables. This difficulty arises because the causal relationships in macro systems are often mediated through micro-level causal interactions, which, when dispersed, can lead to erroneous causal discovery or omission. To address this issue, we propose the Emergence-inspired Multi-granularity Causal learning (EMCausal) method. Inspired by the emergent phenomenon of micro-level variables aggregating into macro-level representations, EMCausal introduces a progressive mapping encoder to simulate this process, thus capturing the causal relationships driven by these macro entities. Next, it introduces a causal consistency constraint to collaboratively reconstruct micro variables using macro-level representations, enabling the learning of a multi-granular causal structure. Experimental results on both synthetic and real datasets demonstrate that EMCausal can identify causal graphs under the influence of causal emergence, outperforming competitive baselines in terms of accuracy and robustness.

IROS Conference 2025 Conference Paper

Enabling On-Chip Adaptive Linear Optimal Control via Linearized Gaussian Process

  • Yuan Gao
  • Yinyi Lai
  • Jun Wang
  • Yini Fang

Unpredictable and complex aerodynamic effects pose significant challenges to achieving precise flight control, emphasizing the necessity of adaptive control via data-driven models. Moreover, real hardware usually requires high-frequency control and has limited on-chip computation, making it challenging to balance model complexity and computational cost. To address these challenges, we incorporate a linearized Gaussian process (GP) to model the external aerodynamics and combine it with linear model predictive control, enabling real-time computability. More importantly, to compensate for the control performance sacrificed by GP linearization and reduce on-chip GP computations, we design active data collection strategies using Bayesian optimization with an additive GP, reducing the performance sacrifice as much as possible. Specifically, we decompose the performance into force and trajectory partitions, where the force model serves the downstream controller and the trajectory model guides collection. Experimental results show that we can achieve tracking errors comparable to a full GP (not real-time computable) while remaining real-time computable on real Crazyflies.

AAAI Conference 2025 Conference Paper

FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles

  • Tian-Hao Zhang
  • Jiawei Zhang
  • Jun Wang
  • Xinyuan Qian
  • Xu-Cheng Yin

Humans can perceive speakers’ characteristics (e.g., identity, gender, personality and emotion) by their appearance, which are generally aligned to their voice style. Recently, vision-driven text-to-speech (TTS) scholars have grounded their investigations on real-person faces, thereby restricting effective speech synthesis from being applied to vast potential usage scenarios with diverse characters and image styles. To solve this issue, we introduce a novel FaceSpeak approach. It extracts salient identity characteristics and emotional representations from a wide variety of image styles. Meanwhile, it mitigates the extraneous information (e.g., background, clothing, and hair color), resulting in synthesized speech closely aligned with a character’s persona. Furthermore, to overcome the scarcity of multi-modal TTS data, we have devised an innovative dataset, namely Expressive Multi-Modal TTS (EM2TTS), which is diligently curated and annotated to facilitate research in this domain. The experimental results demonstrate our proposed FaceSpeak can generate portrait-aligned voice with satisfactory naturalness and quality.

AAAI Conference 2025 Conference Paper

FedGOG: Federated Graph Out-of-Distribution Generalization with Diffusion Data Exploration and Latent Embedding Decorrelation

  • Pengyang Zhou
  • Chaochao Chen
  • Weiming Liu
  • Xinting Liao
  • Wenkai Shen
  • Jiahe Xu
  • Zhihui Fu
  • Jun Wang

Federated graph learning (FGL) has emerged as a promising approach to enable collaborative training of graph models while preserving data privacy. However, current FGL methods overlook the out-of-distribution (OOD) shifts that occur in real-world scenarios. The distribution shifts between training and testing datasets in each client impact the FGL performance. To address this issue, we propose federated graph OOD generalization framework FedGOG, which includes two modules, i.e., diffusion data exploration (DDE) and latent embedding decorrelation (LED). In DDE, all clients jointly train score models to accurately estimate the global graph data distribution and sufficiently explore sample space using score-based graph diffusion with conditional generation. In LED, each client models a global invariant GNN and a personalized spurious GNN. LED aims to decorrelate spuriousness from invariant relationships by minimizing the mutual information between two categories of latent embeddings from different GNN models. Extensive experiments on six benchmark datasets demonstrate the superiority of FedGOG.

NeurIPS Conference 2025 Conference Paper

GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning

  • Shutong Ding
  • Ke Hu
  • Shan Zhong
  • Haoyang Luo
  • Weinan Zhang
  • Jingya Wang
  • Jun Wang
  • Ye Shi

Recent advances in reinforcement learning (RL) have demonstrated the powerful exploration capabilities and multimodality of generative diffusion-based policies. While substantial progress has been made in offline RL and off-policy RL settings, integrating diffusion policies into on-policy frameworks like PPO remains underexplored. This gap is particularly significant given the widespread use of large-scale parallel GPU-accelerated simulators, such as IsaacLab, which are optimized for on-policy RL algorithms and enable rapid training of complex robotic tasks. A key challenge lies in computing state-action log-likelihoods under diffusion policies, which is straightforward for Gaussian policies but intractable for flow-based models due to irreversible forward-reverse processes and discretization errors (e.g., Euler-Maruyama approximations). To bridge this gap, we propose GenPO, a generative policy optimization framework that leverages exact diffusion inversion to construct invertible action mappings. GenPO introduces a novel doubled dummy action mechanism that enables invertibility via alternating updates, resolving log-likelihood computation barriers. Furthermore, we also use the action log-likelihood for unbiased entropy and KL divergence estimation, enabling KL-adaptive learning rates and entropy regularization in on-policy updates. Extensive experiments on eight IsaacLab benchmarks, including legged locomotion (Ant, Humanoid, Anymal-D, Unitree H1, Go2), dexterous manipulation (Shadow Hand), aerial control (Quadcopter), and robotic arm tasks (Franka), demonstrate GenPO’s superiority over existing RL baselines. Notably, GenPO is the first method to successfully integrate diffusion policies into on-policy RL, unlocking their potential for large-scale parallelized training and real-world robotic deployment.
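The log-likelihood obstacle described in this abstract is, at bottom, the change-of-variables identity for densities: once the action map is invertible, the policy likelihood becomes tractable. In its generic form (the abstract's doubled dummy-action mechanism is what makes $f$ invertible in practice; the exact objective may differ):

```latex
% For an invertible action map a = f(z), z ~ p_z(. | s):
\log \pi(a \mid s)
  = \log p_z\!\big(f^{-1}(a) \mid s\big)
  + \log \left| \det \frac{\partial f^{-1}(a)}{\partial a} \right|
```

With this quantity available, the PPO ratio, entropy bonus, and KL estimates mentioned in the abstract can all be computed exactly rather than approximated.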

JBHI Journal 2025 Journal Article

How Deep is Your Guess? A Fresh Perspective on Deep Learning for Medical Time-Series Imputation

  • Linglong Qian
  • Hugh Logan Ellis
  • Tao Wang
  • Jun Wang
  • Robin Mitra
  • Richard Dobson
  • Zina Ibrahim

We present a comprehensive analysis of deep learning approaches for Electronic Health Record (EHR) time-series imputation, examining how the interplay between architectural and framework design decisions gives rise to higher-level properties of a given deep imputer model and distinct biases towards complex data characteristics. Our investigation reveals the varying capabilities of deep imputers in capturing complex spatio-temporal dependencies within EHRs, and that the effectiveness of a model depends on how its combined biases align with the characteristics of the medical time series. Our experimental evaluation challenges common assumptions about model complexity, demonstrating that larger models do not necessarily improve performance. Rather, carefully designed architectures can better capture the complex patterns inherent in clinical data. The study highlights the need for imputation approaches that prioritise clinically meaningful data reconstruction over statistical accuracy. Our experiments further reveal variations of up to 20% in imputation performance depending on preprocessing and implementation choices, emphasising the need for standardised benchmarking methodologies. Finally, we identify critical gaps between current deep imputation methods and medical requirements, highlighting the importance of integrating clinical insights to achieve more reliable imputation approaches for healthcare applications.

IJCAI Conference 2025 Conference Paper

Imputation-free Incomplete Multi-view Clustering via Knowledge Distillation

  • Benyu Wu
  • Wei Du
  • Jun Wang
  • Guoxian Yu

Incomplete multi-view data presents a significant challenge for multi-view clustering (MVC). Existing incomplete MVC solutions commonly rely on data imputation to convert incomplete data into complete data. However, this paradigm suffers from the risk of error accumulation when clustering unreliable imputed data, causing suboptimal clustering performance. Moreover, using imputation to fulfill missing data is inefficient, while inferring data categories based solely on the existing views is extremely challenging. To this end, we propose an Imputation-free Incomplete MVC (I2MVC) via pseudo-supervised knowledge distillation. Specifically, I2MVC decomposes the incomplete MVC problem into two tasks: an MVC task for complete data and a pseudo-supervised classification task for fully incomplete data. A self-supervised simple contrastive Teacher network is trained for clustering complete data, and its knowledge is distilled into a lightweight pseudo-supervised Student network. The Student network, unrestricted by view completeness, further guides the clustering of fully incomplete data. Finally, the clustering results from both tasks are merged to generate the final clustering outcome. Experimental results on benchmark datasets demonstrate the effectiveness of I2MVC.

NeurIPS Conference 2025 Conference Paper

InfMasking: Unleashing Synergistic Information by Contrastive Multimodal Interactions

  • Liangjian Wen
  • Qun Dai
  • Jianzhuang Liu
  • Jiangtao Zheng
  • Yong Dai
  • Dongkai Wang
  • Zhao Kang
  • Jun Wang

In multimodal representation learning, synergistic interactions between modalities not only provide complementary information but also create unique outcomes through specific interaction patterns that no single modality could achieve alone. Existing methods may struggle to effectively capture the full spectrum of synergistic information, leading to suboptimal performance in tasks where such interactions are critical. This is particularly problematic because synergistic information constitutes the fundamental value proposition of multimodal representation. To address this challenge, we introduce InfMasking, a contrastive synergistic information extraction method designed to enhance synergistic information through an Infinite Masking strategy. InfMasking stochastically occludes most features from each modality during fusion, preserving only partial information to create representations with varied synergistic patterns. Unmasked fused representations are then aligned with masked ones through mutual information maximization to encode comprehensive synergistic information. This infinite masking strategy enables capturing richer interactions by exposing the model to diverse partial modality combinations during training. As computing mutual information estimates with infinite masking is computationally prohibitive, we derive an InfMasking loss to approximate this calculation. Through controlled experiments, we demonstrate that InfMasking effectively enhances synergistic information between modalities. In evaluations on large-scale real-world datasets, InfMasking achieves state-of-the-art performance across seven benchmarks. Code is released at https://github.com/brightest66/InfMasking.

JBHI Journal 2025 Journal Article

MDD2DG-IRA: Multivariate Degree Distribution to Dynamic Graph With Inter-Channel Relevance Attention Mechanism for Multi-Channel Myocardial Infarction ECG Analysis

  • Xiaodong Yang
  • Guangkang Jiang
  • Zhengping Zhu
  • Dandan Wu
  • Aijun He
  • Jun Wang

We introduce a novel methodology, Multivariate Degree Distribution to Dynamic Graph (MDD2DG) with an Inter-channel Relevance Attention (IRA) mechanism, to analyze multi-channel Electrocardiogram (ECG) signals and explore signal connections across different channels. Our methodology comprises three main steps. First, multi-channel cardiac signals are transformed into multi-channel visual graphs to extract crucial degree distribution features. Then, degree distributions are mapped into dynamic graphs using a neural network with an IRA mechanism. After that, critical features are extracted within dynamic graphs utilizing Graph Convolutional Neural Networks (GCNNs), and classification is subsequently performed using a multilayer perceptron. In this model, a method of multi-scale position embedding was introduced, which significantly enhanced the processing efficiency of the model by providing a simpler yet sufficiently effective feature representation. Compared to traditional complex network methods, our approach replaces fixed formula-calculated features with dynamic graph models, resulting in improved recognition accuracy. In the experiments, we achieved an impressive 99.94% classification accuracy for distinguishing ECG signals from five distinct myocardial infarction (MI) locations (AMI, ASMI, ALMI, IMI and ILMI) as well as those of the healthy controls (HC). This work contributes to the analysis of complex physiological signals in the field of multi-channel ECG sequences, and provides a robust approach with promising implications for improving clinical medicine and the early detection of cardiac diseases.

AAMAS Conference 2025 Conference Paper

Mean Field Correlated Imitation Learning

  • Zhiyu Zhao
  • Chengdong Ma
  • Qirui Mi
  • Ning Yang
  • Xue Yan
  • Mengyue Yang
  • Haifeng Zhang
  • Jun Wang

Modeling the behaviors of many-agent games is crucial for capturing the dynamics of large-scale complex systems. This is typically achieved by recovering policies from demonstrations within the Mean Field Game Imitation Learning (MFGIL) framework. However, most MFGIL methods assume that demonstrations are collected from Mean Field Nash Equilibrium (MFNE), implying that agents make decisions independently. When directly applied to situations where agents’ decisions are coordinated, such as publicly routed traffic networks, these techniques often fall short. In this paper, we propose the Adaptive Mean Field Correlated Equilibrium (AMFCE), which introduces a generalized assumption that effectively integrates the correlated behaviors common in real-world systems. We prove the existence of AMFCE under mild conditions and theoretically show that MFNE is a special case of AMFCE. Building upon this, we introduce a new Mean Field Correlated Imitation Learning (MFCIL) algorithm, which recovers expert policy more accurately in scenarios where agents’ decisions are coordinated. We also provide a theoretical upper bound for the error in recovering the expert policy, which is tighter than that of existing methods. Empirical results on real-world traffic flow prediction and large-scale economic simulations demonstrate that MFCIL significantly improves the predictive performance of large populations’ behaviors compared to existing MFGIL baselines. This improvement highlights the potential of MFCIL to model real-world multi-agent systems.

AAAI Conference 2025 Conference Paper

MEATRD: Multimodal Anomalous Tissue Region Detection Enhanced with Spatial Transcriptomics

  • Kaichen Xu
  • Qilong Wu
  • Yan Lu
  • Yinan Zheng
  • Wenlin Li
  • Xingjie Tang
  • Jun Wang
  • Xiaobo Sun

The detection of anomalous tissue regions (ATRs) within affected tissues is crucial in clinical diagnosis and pathological studies. Conventional automated ATR detection methods, primarily based on histology images alone, falter in cases where ATRs and normal tissues have subtle visual differences. The recent spatial transcriptomics (ST) technology profiles gene expressions across tissue regions, offering a molecular perspective for detecting ATRs. However, there is a dearth of ATR detection methods that effectively harness complementary information from both histology images and ST. To address this gap, we propose MEATRD, a novel ATR detection method that integrates histology image and ST data. MEATRD is trained to reconstruct image patches and gene expression profiles of normal tissue spots (inliers) from their multimodal embeddings, followed by learning a one-class classification AD model based on latent multimodal reconstruction errors. This strategy harmonizes the strengths of reconstruction-based and one-class classification approaches. At the heart of MEATRD is an innovative masked graph dual-attention transformer (MGDAT) network, which not only facilitates cross-modality and cross-node information sharing but also addresses the model over-generalization issue commonly seen in reconstruction-based AD methods. Additionally, we demonstrate that modality-specific, task-relevant information is collated and condensed in multimodal bottleneck encoding generated in MGDAT, marking the first theoretical analysis of the informational properties of multimodal bottleneck encoding. Extensive evaluations across eight real ST datasets reveal MEATRD's superior performance in ATR detection, surpassing various state-of-the-art AD methods. Remarkably, MEATRD also proves adept at discerning ATRs that only show slight visual deviations from normal tissues.

AAAI Conference 2025 Conference Paper

MeRino: Entropy-Driven Design for Generative Language Models on IoT Devices

  • Youpeng Zhao
  • Ming Lin
  • Huadong Tang
  • Qiang Wu
  • Jun Wang

Generative Large Language Models (LLMs) stand as a revolutionary advancement in the modern era of artificial intelligence (AI). However, scaling down LLMs for resource-constrained hardware, such as Internet-of-Things (IoT) devices, requires non-trivial efforts and domain knowledge. In this paper, we propose a novel information-entropy framework for designing mobile-friendly generative language models. The whole design procedure involves solving a mathematical programming (MP) problem, which can be done on the CPU within minutes, making it nearly zero-cost. We evaluate our designed models, termed MeRino, across fourteen NLP downstream tasks, showing their competitive performance against the state-of-the-art autoregressive transformer models under the mobile setting. Notably, MeRino achieves similar or better performance on both language modeling and zero-shot learning tasks, compared to the 350M parameter OPT while being 4.9x faster on NVIDIA Jetson Nano with 5.5x reduction in model size.

NeurIPS Conference 2025 Conference Paper

MF-LLM: Simulating Population Decision Dynamics via a Mean-Field Large Language Model Framework

  • Qirui Mi
  • Mengyue Yang
  • Xiangning Yu
  • Zhiyu Zhao
  • Cheng Deng
  • Bo An
  • Haifeng Zhang
  • Xu Chen

Simulating collective decision-making involves more than aggregating individual behaviors; it emerges from dynamic interactions among individuals. While large language models (LLMs) offer strong potential for social simulation, achieving quantitative alignment with real-world data remains a key challenge. To bridge this gap, we propose the Mean-Field LLM (MF-LLM) framework, the first to incorporate mean field theory into LLM-based social simulation. MF-LLM models bidirectional interactions between individuals and the population through an iterative process, generating population signals to guide individual decisions, which in turn update the signals. This interplay produces coherent trajectories of collective behavior. To improve alignment with real-world data, we introduce IB-Tune, a novel fine-tuning method inspired by the Information Bottleneck principle, which retains population signals most predictive of future actions while filtering redundant history. Evaluated on a real-world social dataset, MF-LLM reduces KL divergence to human population distributions by 47% compared to non-mean-field baselines, enabling accurate trend forecasting and effective intervention planning. Generalizing across 7 domains and 4 LLM backbones, MF-LLM provides a scalable, high-fidelity foundation for social simulation.

JBHI Journal 2025 Journal Article

MFRC-Net: Multi-Scale Feature Residual Convolutional Neural Network for Motor Imagery Decoding

  • Xiao Li
  • Zhuowei Yang
  • Xikai Tu
  • Jun Wang
  • Jian Huang

Motor imagery (MI) decoding is the basis of external device control via electroencephalogram (EEG). However, the majority of studies prioritize enhancing the accuracy of decoding methods, often overlooking the magnitude and computational resource demands of deep learning models. In this study, we propose a novel lightweight Multi-Scale Feature Residual Convolutional Neural Network (MFRC-Net). MFRC-Net primarily consists of two blocks: temporal multi-scale residual convolution blocks and cross-domain dual-stream spatial convolution blocks. The former captures dynamic changes in EEG signals across various time scales through multi-scale grouped convolution and backbone temporal convolution skip connections; the latter improves local spatial feature extraction and calibrates feature mapping through the introduction of cross-domain spatial filtering layers. Furthermore, by specifically optimizing the loss function, MFRC-Net effectively reduces sensitivity to outliers. Experiment results on the BCI Competition IV 2a dataset and the SHU dataset demonstrate that, with a parameter size of only 13K, MFRC-Net achieves accuracies of 85.1% and 69.3%, respectively, surpassing current state-of-the-art models. The integration of temporal multi-scale residual convolution blocks and cross-domain dual-stream spatial convolution blocks in lightweight models significantly boosts performance, as evidenced by ablation studies and visualizations.

NeurIPS Conference 2025 Conference Paper

MobileUse: A Hierarchical Reflection-Driven GUI Agent for Autonomous Mobile Operation

  • Ning Li
  • Xiangmou Qu
  • Jiamu Zhou
  • Muning Wen
  • Kounianhua Du
  • Xingyu Lou
  • Qiuying Peng
  • Jun Wang

Recent advances in Multimodal Large Language Models (MLLMs) have enabled the development of mobile agents that can understand visual inputs and follow user instructions, unlocking new possibilities for automating complex tasks on mobile devices. However, applying these models to real-world mobile scenarios remains a significant challenge due to the long-horizon task execution, difficulty in error recovery, and the cold-start problem in unfamiliar environments. To address these challenges, we propose MobileUse, a GUI agent designed for robust and adaptive mobile task execution. To improve resilience in long-horizon tasks and dynamic environments, we introduce a hierarchical reflection architecture that enables the agent to self-monitor, detect, and recover from errors across multiple temporal scales—ranging from individual actions to overall task completion—while maintaining efficiency through a Reflection-on-Demand strategy. To tackle cold-start issues, we further introduce a proactive exploration module, which enriches the agent’s understanding of the environment through self-planned exploration. Evaluations on the AndroidWorld and AndroidLab benchmarks demonstrate that MobileUse establishes new state-of-the-art performance, achieving success rates of 62.9% and 44.2%, respectively. To facilitate real-world applications, we release an out-of-the-box toolkit for automated task execution on physical mobile devices, which is available at https://github.com/MadeAgents/mobile-use.

AAAI Conference 2025 Conference Paper

MTGA: Multi-View Temporal Granularity Aligned Aggregation for Event-Based Lip-Reading

  • Wenhao Zhang
  • Jun Wang
  • Yong Luo
  • Lei Yu
  • Wei Yu
  • Zheng He
  • Jialie Shen

Lip-reading is to utilize the visual information of the speaker’s lip movements to recognize words and sentences. Existing event-based lip-reading solutions integrate different frame rate branches to learn spatio-temporal features of varying granularities. However, aggregating events into event frames inevitably leads to the loss of fine-grained temporal information within frames. To remedy this drawback, we propose a novel framework termed Multi-view Temporal Granularity aligned Aggregation (MTGA). Specifically, we first present a novel event representation method, namely time-segmented voxel graph list, where the most significant local voxels are temporally connected into a graph list. Then we design a spatio-temporal fusion module based on temporal granularity alignment, where the global spatial features extracted from event frames, together with the local relative spatial and temporal features contained in voxel graph list are effectively aligned and integrated. Finally, we design a temporal aggregation module that incorporates positional encoding, which enables the capture of local absolute spatial and global temporal information. Experiments demonstrate that our method outperforms both the event-based and video-based lip-reading counterparts.

AAMAS Conference 2025 Conference Paper

Negotiated Reasoning: On Provably Addressing Relative Over-Generalization

  • Junjie Sheng
  • Wenhao Li
  • Bo Jin
  • Hongyuan Zha
  • Jun Wang
  • Xiangfeng Wang

We focus on the relative over-generalization (RO) issue in fully cooperative multi-agent reinforcement learning (MARL). Existing methods show that endowing agents with reasoning can help mitigate RO empirically, but there is little theoretical insight. We first prove that RO is avoided when agents satisfy a consistent reasoning requirement. We then propose a new negotiated reasoning framework connecting reasoning and RO with theoretical guarantees. Based on it, we develop an algorithm called Stein variational negotiated reasoning (SVNR), which uses Stein variational gradient descent to form a negotiation policy that provably bypasses RO under maximum-entropy policy iteration. SVNR is further parameterized with neural networks for computational efficiency. Experiments demonstrate that SVNR significantly outperforms baselines on RO-challenged tasks, confirming its advantage in achieving better cooperation.

AAAI Conference 2025 Conference Paper

Noise-Injected Spiking Graph Convolution for Energy-Efficient 3D Point Cloud Denoising

  • Zikuan Li
  • Qiaoyun Wu
  • Jialin Zhang
  • Kaijun Zhang
  • Jun Wang

Spiking neural networks (SNNs), inspired by the inherent spiking computation paradigm of the biological neural systems, have exhibited superior energy efficiency in 2D classification tasks over traditional artificial neural networks (ANNs). However, the regression potential of SNNs has not been well explored, especially in 3D point cloud processing. In this paper, we propose noise-injected spiking graph convolutional networks to leverage the full regression potential of SNNs in 3D point cloud denoising. Specifically, we first emulate the noise-injected neuronal dynamics to build noise-injected spiking neurons. On this basis, we design noise-injected spiking graph convolution for promoting disturbance-aware spiking representation learning on 3D points. Starting from the spiking graph convolution, we build two SNN-based denoising networks. One is a purely spiking graph convolutional network, which achieves low accuracy loss compared with some ANN-based alternatives, while resulting in significantly reduced energy consumption on two benchmark datasets, PU-Net and PC-Net. The other is a hybrid architecture, which integrates some ANN-based learning operations and exhibits a high performance-efficiency trade-off with only a few time steps. Our work lights up SNN’s potential for 3D point cloud denoising, injecting new perspectives of exploring the deployment on neuromorphic chips while paving the way for developing energy-efficient 3D data acquisition devices.

ICRA Conference 2025 Conference Paper

Plug-and-Play Physics-Informed Learning Using Uncertainty Quantified Port-Hamiltonian Models

  • Kaiyuan Tan
  • Peilun Li
  • Jun Wang
  • Thomas Beckers 0001

The ability to predict trajectories of surrounding agents and obstacles is a crucial component in many robotic applications. Data-driven approaches are commonly adopted for state prediction in scenarios where the underlying dynamics are unknown. However, the performance, reliability, and uncertainty of data-driven predictors become compromised when encountering out-of-distribution observations relative to the training data. In this paper, we introduce a Plug-and-Play Physics-Informed Machine Learning (PnP-PIML) framework to address this challenge. Our method employs conformal prediction to identify outlier dynamics and, in that case, switches from a nominal predictor to a physics-consistent model, namely distributed Port-Hamiltonian systems (dPHS). We leverage Gaussian processes to model the energy function of the dPHS, enabling not only the learning of system dynamics but also the quantification of predictive uncertainty through its Bayesian nature. In this way, the proposed framework produces reliable physics-informed predictions even for the out-of-distribution scenarios.

NeurIPS Conference 2025 Conference Paper

Quantifying Distributional Invariance in Causal Subgraph for IRM-Free Graph Generalization

  • Yang Qiu
  • Yixiong Zou
  • Jun Wang
  • Wei Liu
  • Xiangyu Fu
  • Ruixuan Li

Out-of-distribution generalization under distributional shifts remains a critical challenge for graph neural networks. Existing methods generally adopt the Invariant Risk Minimization (IRM) framework, requiring costly environment annotations or heuristically generated synthetic splits. To circumvent these limitations, in this work, we aim to develop an IRM-free method for capturing causal subgraphs. We first identify that causal subgraphs exhibit substantially smaller distributional variations than non-causal components across diverse environments, which we formalize as the Invariant Distribution Criterion and theoretically prove in this paper. Building on this criterion, we systematically uncover the quantitative relationship between distributional shift and representation norm for identifying the causal subgraph, and investigate its underlying mechanisms in depth. Finally, we propose an IRM-free method by introducing a norm-guided invariant distribution objective for causal subgraph discovery and prediction. Extensive experiments on two widely used benchmarks demonstrate that our method consistently outperforms state-of-the-art methods in graph generalization. Code is available at https://github.com/anders1123/IDG.

ICLR Conference 2025 Conference Paper

Recovery of Causal Graph Involving Latent Variables via Homologous Surrogates

  • Xiu-Chuan Li
  • Jun Wang
  • Tongliang Liu

Causal discovery with latent variables is an important and challenging problem. To identify latent variables and infer their causal relations, most existing works rely on the assumption that latent variables have pure children. Considering that this assumption is potentially restrictive in practice and not strictly necessary in theory, in this paper, by introducing the concept of homologous surrogate, we eliminate the need for pure children in the context of causal discovery with latent variables. The homologous surrogate fundamentally differs from the pure child in the sense that the latter is characterized by having strictly restricted parents while the former allows for much more flexible parents. We formulate two assumptions involving homologous surrogates and develop theoretical results under each assumption. Under the weaker assumption, our theoretical results imply that we can determine each variable's ancestors, that is, partially recover the causal graph. The stronger assumption further enables us to determine each variable's parents exactly, that is, fully recover the causal graph. Building on these theoretical results, we derive an algorithm that fully leverages the properties of homologous surrogates for causal graph recovery. Also, we validate its efficacy through experiments. Our work broadens the applicability of causal discovery. Our code is available at: https://github.com/XiuchuanLi/ICLR2025-CDHS

NeurIPS Conference 2025 Conference Paper

ReMA: Learning to Meta-Think for LLMs with Multi-agent Reinforcement Learning

  • Ziyu Wan
  • Yunxiang Li
  • Xiaoyu Wen
  • Yan Song
  • Hanjing Wang
  • Linyi Yang
  • Mark Schmidt
  • Jun Wang

Recent research on Reasoning of Large Language Models (LLMs) has sought to further enhance their performance by integrating meta-thinking—enabling models to monitor, evaluate, and control their reasoning processes for more adaptive and effective problem-solving. However, current single-agent work lacks a specialized design for acquiring meta-thinking, resulting in low efficacy. To address this challenge, we introduce Reinforced Meta-thinking Agents (ReMA), a novel framework that leverages Multi-Agent Reinforcement Learning (MARL) to elicit meta-thinking behaviors, encouraging LLMs to think about thinking. ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions. Through iterative reinforcement learning with aligned objectives, these agents explore and learn collaboration, leading to improved generalization and robustness. Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks, including competitive-level mathematical benchmarks and LLM-as-a-Judge benchmarks. Additionally, we further extend ReMA to multi-turn interaction settings, leveraging turn-level ratio and parameter sharing to improve efficiency. Comprehensive ablation studies further illustrate the evolving dynamics of each distinct agent, providing valuable insights into how the meta-thinking reasoning process enhances the reasoning capabilities of LLMs.

NeurIPS Conference 2025 Conference Paper

Risk-aware Direct Preference Optimization under Nested Risk Measure

  • Lijun Zhang
  • Lin Li
  • Yajie Qi
  • Huizhong Song
  • Yaodong Yang
  • Jun Wang
  • Wei Wei

When fine-tuning pre-trained Large Language Models (LLMs) to align with human values and intentions, maximizing the estimated reward can lead to superior performance, but it also introduces potential risks due to deviations from the reference model's intended behavior. Most existing methods typically introduce KL divergence to constrain deviations between the trained model and the reference model; however, this may not be sufficient in certain applications that require tight risk control. In this paper, we introduce Risk-aware Direct Preference Optimization (Ra-DPO), a novel approach that incorporates risk-awareness by employing a class of nested risk measures. This approach formulates a constrained risk-aware advantage function maximization problem and then converts the Bradley-Terry model into a token-level representation. The objective function maximizes the likelihood of the policy while suppressing the deviation between a trained model and the reference model using a sequential risk ratio, thereby enhancing the model's risk-awareness. Experimental results across three open-source datasets: IMDb Dataset, Anthropic HH Dataset, and AlpacaEval, demonstrate the proposed method's superior performance in balancing alignment performance and model drift.

NeurIPS Conference 2025 Conference Paper

Scalable Cross-View Sample Alignment for Multi-View Clustering with View Structure Similarity

  • Jun Wang
  • Zhenglai Li
  • Chang Tang
  • Suyuan Liu
  • Hao Yu
  • Chuan Tang
  • Miaomiao Li
  • Xinwang Liu

Most existing multi-view clustering methods aim to generate a consensus partition across all views, based on the assumption that all views share the same sample arrangement. However, in real-world scenarios, the collected data across different views is often unsynchronized, making it difficult to ensure consistent sample correspondence between views. To address this issue, we propose a scalable sample-alignment-based multi-view clustering method, referred to as SSA-MVC. Specifically, we first employ a cluster-label matching (CLM) algorithm to select the view whose clustering labels best match those of the others as the benchmark view. Then, for each of the remaining views, we construct representations of non-aligned samples by computing their similarities with aligned samples. Based on these representations, we build a similarity graph between the non-aligned samples of each view and those in the benchmark view, which serves as the alignment criterion. This alignment criterion is then integrated into a late-fusion framework to enable clustering without requiring aligned samples. Notably, the learned sample alignment matrix can be used to enhance existing multi-view clustering methods in scenarios where sample correspondence is unavailable. The effectiveness of the proposed SSA-MVC algorithm is validated through extensive experiments conducted on eight real-world multi-view datasets.

NeurIPS Conference 2025 Conference Paper

Self-Evolving Pseudo-Rehearsal for Catastrophic Forgetting with Task Similarity in LLMs

  • Jun Wang
  • Liang Ding
  • Shuai Wang
  • Hongyu Li
  • Yong Luo
  • Huangxuan Zhao
  • Han Hu
  • Bo Du

Continual learning for large language models (LLMs) demands a precise balance between plasticity (the ability to absorb new tasks) and stability (the preservation of previously learned knowledge). Conventional rehearsal methods, which replay stored examples, are limited by long-term data inaccessibility; earlier pseudo-rehearsal methods require additional generation modules, while self-synthesis approaches often generate samples that poorly align with real tasks, suffer from unstable outputs, and ignore task relationships. We present Self-Evolving Pseudo-Rehearsal for Catastrophic Forgetting with Task Similarity (SERS), a lightweight framework that 1) decouples pseudo-input synthesis from label creation, using semantic masking and template guidance to produce diverse, task-relevant prompts without extra modules; 2) applies label self-evolution, blending base-model priors with fine-tuned outputs to prevent over-specialization; and 3) introduces a dynamic regularizer driven by the Wasserstein distance between task distributions, automatically relaxing or strengthening constraints in proportion to task similarity. Experiments across diverse tasks on different LLMs show that our SERS reduces forgetting by over 2 percentage points against strong pseudo-rehearsal baselines, by ensuring efficient data utilization and wisely transferring knowledge. The code will be released at https://github.com/JerryWangJun/LLM_CL_SERS/.

IJCAI Conference 2025 Conference Paper

Self-supervised End-to-end ToF Imaging Based on RGB-D Cross-modal Dependency

  • Weihang Wang
  • Jun Wang
  • Fei Wen

Time-of-Flight (ToF) imaging systems are susceptible to various noise and degradation, which can severely affect image quality. Traditional sequential imaging pipelines often suffer from error accumulation due to separate multi-stage processing. Existing end-to-end methods typically rely on noisy-clean depth image pairs for supervised learning. However, acquiring ground-truth is challenging in real-world scenarios due to factors such as Multi-Path Interference (MPI), phase wrapping, and complex noise patterns. In this paper, we propose a self-supervised learning framework for end-to-end ToF imaging, which does not require any noisy-clean pairs yet generalizes well across various off-the-shelf cameras. Our framework leverages the cross-modal dependencies between RGB and depth data as implicit supervision to effectively suppress noise and maintain image fidelity. Additionally, the loss function integrates the statistical characteristics of raw measurement data, enhancing robustness against noise and artifacts. Extensive experiments on both synthetic and real-world data demonstrate that our approach achieves performance comparable to supervised methods, without requiring paired noisy-clean data for training. Furthermore, our method consistently delivers strong performance across all evaluated cameras, highlighting its generalization capabilities. The code is available at https://github.com/WeihangWANG/RGBD_imaging.

NeurIPS Conference 2025 Conference Paper

Self-Verifying Reflection Helps Transformers with CoT Reasoning

  • Zhongwei Yu
  • Wannian Xia
  • Xue Yan
  • Bo Xu
  • Haifeng Zhang
  • Yali Du
  • Jun Wang

Advanced large language models (LLMs) frequently reflect in reasoning chain-of-thoughts (CoTs), where they self-verify the correctness of current solutions and explore alternatives. However, given recent findings that LLMs detect limited errors in CoTs, how reflection contributes to empirical improvements remains unclear. To analyze this issue, in this paper, we present a minimalistic reasoning framework to support basic self-verifying reflection for small transformers without natural language, which ensures analytic clarity and reduces the cost of comprehensive experiments. Theoretically, we prove that self-verifying reflection guarantees improvements if verification errors are properly bounded. Experimentally, we show that tiny transformers, with only a few million parameters, benefit from self-verification in both training and reflective execution, reaching remarkable LLM-level performance in integer multiplication and Sudoku. Similar to LLM results, we find that reinforcement learning (RL) improves in-distribution performance and incentivizes frequent reflection for tiny transformers, yet RL mainly optimizes shallow statistical patterns without faithfully reducing verification errors. In conclusion, integrating generative transformers with discriminative verification inherently facilitates CoT reasoning, regardless of scaling and natural language.

AAAI Conference 2025 Conference Paper

Sim4Rec: Data-Free Model Extraction Attack on Sequential Recommendation

  • Yihao Wang
  • Jiajie Su
  • Chaochao Chen
  • Meng Han
  • Chi Zhang
  • Jun Wang

Model extraction attacks show promising performance in revealing sequential recommendation (SeqRec) robustness, e.g., as an upstream task of transfer-based attack to provide optimization feedback for downstream attacks. However, existing work either heavily relies on impractical prior knowledge or delivers limited attack performance. In this paper, we focus on data-free model extraction attack on SeqRec, which aims to efficiently train a surrogate model that closely imitates the target model in a practical setting. Conducting such an attack is challenging. First, imitating sequential training data for accurate model extraction is hard without prior knowledge. Second, limited queries to the target model require the attack to be efficient. To address these challenges, we propose a novel adversarial framework, Sim4Rec, which includes two modules, i.e., controllable sequence generation and reinforced adversarial distillation. The former allows a sequential generator to produce synthetic data similar to training data through pre-training with controllable generated samples. The latter efficiently extracts the target model via reinforced adversarial knowledge distillation. Extensive experiments demonstrate the effectiveness of Sim4Rec.

AAAI Conference 2025 Conference Paper

STAIR: Manipulating Collaborative and Multimodal Information for E-Commerce Recommendation

  • Cong Xu
  • Yunhang He
  • Jun Wang
  • Wei Zhang

While the mining of modalities is the focus of most multimodal recommendation methods, we believe that how to fully utilize both collaborative and multimodal information is pivotal in e-commerce scenarios where, as clarified in this work, the user behaviors are rarely determined entirely by multimodal features. In order to combine the two distinct types of information, some additional challenges are encountered: 1) Modality erasure: Vanilla graph convolution, which proves rather useful in collaborative filtering, however erases multimodal information; 2) Modality forgetting: Multimodal information tends to be gradually forgotten as the recommendation loss essentially facilitates the learning of collaborative information. To this end, we propose a novel approach named STAIR, which employs a novel stepwise graph convolution to enable a co-existence of collaborative and multimodal information in e-commerce recommendation. Besides, it starts with the raw multimodal features as an initialization, and the forgetting problem can be significantly alleviated through constrained embedding updates. As a result, STAIR achieves state-of-the-art recommendation performance on three public e-commerce datasets with minimal computational and memory costs.

NeurIPS Conference 2025 Conference Paper

Succeed or Learn Slowly: Sample Efficient Off-Policy Reinforcement Learning for Mobile App Control

  • Georgios Papoudakis
  • Thomas Coste
  • Jianye Hao
  • Jun Wang
  • Kun Shao

Reinforcement learning (RL) using foundation models for policy approximations in multi-turn tasks remains challenging. We identify two main limitations related to sparse reward settings and policy gradient updates, based on which we formulate a key insight: updates from positive samples with high returns typically do not require policy regularisation, whereas updates from negative samples, reflecting undesirable behaviour, can harm model performance. This paper introduces Succeed or Learn Slowly (SoLS), a novel off-policy RL algorithm evaluated on mobile app control tasks. SoLS improves sample efficiency when fine-tuning foundation models for user interface navigation via a modified off-policy actor-critic approach, applying direct policy updates for positive samples and conservative, regularised updates for negative ones to prevent model degradation. We augment SoLS with Successful Transition Replay (STR), which prioritises learning from successful interactions, further improving sample efficiency. We evaluate SoLS on the AndroidWorld benchmark, where it significantly outperforms existing methods (at least 17% relative increase), including prompt-engineering and RL approaches, while requiring substantially fewer computational resources than GPT-4o-based methods with 5-60x faster inference.
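
The asymmetric update rule at the heart of SoLS can be sketched as a per-sample loss; the drift penalty and its coefficient below are illustrative assumptions, not the paper's exact objective:

```python
# Sketch: direct policy-gradient updates for positive-return samples,
# conservative regularised updates for negative ones (illustrative form).
def sols_style_loss(log_prob, ref_log_prob, advantage, kl_coef=0.5):
    pg_loss = -advantage * log_prob        # standard policy-gradient term
    if advantage > 0:
        return pg_loss                     # "succeed": unregularised update
    drift = log_prob - ref_log_prob        # "learn slowly": penalise drift
    return pg_loss + kl_coef * drift       # damped update for failures

loss_pos = sols_style_loss(log_prob=-1.0, ref_log_prob=-1.2, advantage=1.0)
loss_neg = sols_style_loss(log_prob=-1.0, ref_log_prob=-1.2, advantage=-1.0)
```

Negative samples incur an extra penalty whenever the policy drifts from a reference model, which damps the harmful updates the paper identifies.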

NeurIPS Conference 2025 Conference Paper

Switchable Token-Specific Codebook Quantization For Face Image Compression

  • Yongbo Wang
  • Haonan Wang
  • Guodong Mu
  • Ruixin Zhang
  • Jiaqi Chen
  • Jingyun Zhang
  • Jun Wang
  • Yuan Xie

With the ever-increasing volume of visual data, efficient and lossless transmission, along with subsequent interpretation and understanding, has become a critical bottleneck in modern information systems. Emerging codebook-based solutions utilize a globally shared codebook to quantize and dequantize each token, controlling the bpp by adjusting the number of tokens or the codebook size. However, for facial images, which are rich in attributes, such global codebook strategies overlook both the category-specific correlations within images and the semantic differences among tokens, resulting in suboptimal performance, especially at low bpp. Motivated by these observations, we propose a Switchable Token-Specific Codebook Quantization for face image compression, which learns distinct codebook groups for different image categories and assigns an independent codebook to each token. By recording the codebook group to which each token belongs with a small number of bits, our method can reduce the loss incurred when decreasing the size of each codebook group. This enables a larger total number of codebooks under a lower overall bpp, thereby enhancing the expressive capability and improving reconstruction performance. Owing to its generalizable design, our method can be integrated into any existing codebook-based representation learning approach and has demonstrated its effectiveness on face recognition datasets, achieving an average accuracy of 93.51% for reconstructed images at 0.05 bpp.
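
The token-specific idea can be illustrated with a deliberately tiny sketch. The 1-D codes and nearest-neighbour assignment here are simplifying assumptions; the actual method quantizes high-dimensional token embeddings and additionally spends a few bits recording codebook-group membership:

```python
# Each token position owns its own small codebook; only the chosen index
# needs to be transmitted, plus a few bits identifying the codebook group.
def quantize(value, codebook):
    # Nearest-neighbour assignment within this token's codebook.
    idx = min(range(len(codebook)), key=lambda i: abs(codebook[i] - value))
    return idx, codebook[idx]

token_codebooks = [
    [0.0, 0.5, 1.0],     # codebook for token position 0
    [-1.0, 0.0, 1.0],    # codebook for token position 1
]
tokens = [0.62, -0.8]
coded = [quantize(v, cb) for v, cb in zip(tokens, token_codebooks)]
# coded holds (index, reconstructed value) pairs, one per token.
```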

AAMAS Conference 2025 Conference Paper

Taming Multi-Agent Reinforcement Learning with Estimator Variance Reduction

  • Taher Jafferjee
  • Juliusz Ziomek
  • Tianpei Yang
  • Zipeng Dai
  • Jianhong Wang
  • Matthew E. Taylor
  • Kun Shao
  • Jun Wang

Multi-agent reinforcement learning (MARL) enables systems of autonomous agents to solve complex tasks from jointly gathered experiences of the environment. Many MARL algorithms perform centralized training (CT), often in a simulated environment, where at each time-step the critic makes use of a single sample of the agents' joint-action for training. Yet, as agents update their policies during training, these single samples may poorly represent the agents' joint-policy, leading to high-variance gradient estimates that hinder learning. In this paper, we examine the effect on MARL estimators of allowing the number of joint-action samples taken at each time-step to be greater than 1 in training. Our theoretical analysis shows that even modestly increasing the number of joint-action samples shown to the critic leads to TD updates that closely approximate the true expected value under the current joint-policy. In particular, we prove this reduces variance in value estimates similar to that of decentralized training while maintaining the learning benefits of CT. We describe how such a protocol can be seamlessly realized by sharing policy parameters between the agents during training and apply the technique to induce lower variance in estimates in MARL methods within a general apparatus which we call Performance Enhancing Reinforcement Learning Apparatus (PERLA). Lastly, we demonstrate PERLA's performance improvements and estimator variance reduction capabilities in a range of environments including Multi-agent Mujoco and StarCraft II.
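
The variance argument is easy to reproduce numerically; the Gaussian toy "joint-action target" below is an assumption standing in for the critic's TD targets:

```python
import random, statistics

def target_estimate(k, rng):
    # Average k noisy joint-action samples of a value whose true mean is 1.0.
    return sum(1.0 + rng.gauss(0.0, 1.0) for _ in range(k)) / k

rng = random.Random(0)
# Estimator variance with a single sample vs. 10 samples per time-step.
var_single = statistics.variance(target_estimate(1, rng) for _ in range(5000))
var_multi = statistics.variance(target_estimate(10, rng) for _ in range(5000))
# Averaging k samples shrinks estimator variance by roughly a factor of k.
```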

NeurIPS Conference 2025 Conference Paper

ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning

  • Shulin Huang
  • Linyi Yang
  • Yan Song
  • Shawn Chen
  • Leyang Cui
  • Ziyu Wan
  • Qingcheng Zeng
  • Ying Wen

Evaluating large language models (LLMs) poses significant challenges, particularly due to issues of data contamination and the leakage of correct answers. To address these challenges, we introduce ThinkBench, a novel evaluation framework designed to robustly evaluate the reasoning capability of LLMs. ThinkBench proposes a dynamic data generation method for constructing out-of-distribution (OOD) datasets and offers an OOD dataset that contains 2,912 samples drawn from reasoning tasks. ThinkBench unifies the evaluation of reasoning models and non-reasoning models. We evaluate 16 LLMs and 4 PRMs under identical experimental conditions and show that most LLMs are far from robust and face a certain level of data leakage. By dynamically generating OOD datasets, ThinkBench effectively provides a reliable evaluation of LLMs and reduces the impact of data contamination. Our data and codes are available at https://github.com/huangshulin123/ThinkBench.

JBHI Journal 2025 Journal Article

Topological GCN Guided Improved Conformer for Detection of Hip Landmarks From Ultrasound Images

  • Tianxiang Huang
  • Jing Shi
  • Ge Jin
  • Juncheng Li
  • Jun Wang
  • Qian Wang
  • Jun Du
  • Jun Shi

The B-mode ultrasound based computer-aided diagnosis (CAD) has shown its effectiveness for diagnosis of Developmental Dysplasia of the Hip (DDH) in infants within 6 months. Hip landmark detection is a feasible way for the CAD of DDH according to Graf's method. However, existing landmark detection algorithms mainly focus on designing special models to capture the features from hip ultrasound images, but generally ignore the important spatial relations among different landmarks. To this end, a novel weakly supervised learning-based algorithm, the Topological Graph Convolutional Network (TGCN) guided Improved Conformer (TGCN-ICF), is proposed for detecting landmarks from hip ultrasound images. The TGCN-ICF includes two subnetworks: an Improved Conformer (ICF) subnetwork to generate heatmaps and constraint vectors from ultrasound images, and a TGCN subnetwork to additionally explore topological relations among hip landmarks with the guidance of class labels for further refining and improving the detection accuracy. Moreover, a new Mutual Modulation Fusion (MMF) module is developed to fully exchange and fuse the extracted feature information from the convolutional neural network (CNN) and Transformer branches in ICF. Meanwhile, a novel Mutual Supervision Constraint (MSC) strategy is designed to provide a constraint for detection of each hip landmark. The experimental results on two real-world DDH datasets demonstrate that the TGCN-ICF outperforms all the compared algorithms, suggesting its potential applications.

NeurIPS Conference 2025 Conference Paper

Uncertainty-quantified Rollout Policy Adaptation for Unlabelled Cross-domain Video Temporal Grounding

  • Jian Hu
  • Zixu Cheng
  • Shaogang Gong
  • Isabel Guan
  • Jianye Hao
  • Jun Wang
  • Kun Shao

Video Temporal Grounding (TG) aims to temporally locate video segments matching a natural language description (a query) in a long video. While Vision-Language Models (VLMs) are effective at holistic semantic matching, they often struggle with fine-grained temporal localisation. Recently, Group Relative Policy Optimisation (GRPO) reformulates the inference process as a reinforcement learning task, enabling fine-grained grounding and achieving strong in-domain performance. However, GRPO relies on labelled data, making it unsuitable in unlabelled domains. Moreover, because videos are large and expensive to store and process, performing full-scale adaptation introduces prohibitive latency and computational overhead, making it impractical for real-time deployment. To overcome both problems, we introduce a Data-Efficient Unlabelled Cross-domain Temporal Grounding method, in which a model is first trained on a labelled source domain, then adapted to a target domain using only a small number of unlabelled videos from the target domain. This approach eliminates the need for target annotation and keeps both computational and storage overhead low enough to run in real time. Specifically, we introduce Uncertainty-quantified Rollout Policy Adaptation (URPA) for cross-domain knowledge transfer in learning video temporal grounding without target labels. URPA generates multiple candidate predictions using GRPO rollouts, averages them to form a pseudo label, and estimates confidence from the variance across these rollouts. This confidence then weights the training rewards, guiding the model to focus on reliable supervision. Experiments on three datasets across six cross-domain settings show that URPA generalises well using only a few unlabelled target videos. Codes are given in the supplemental materials.
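
The URPA recipe described above (average rollout predictions into a pseudo label, turn rollout variance into a confidence weight) can be sketched directly; the particular confidence mapping below is an illustrative assumption:

```python
import statistics

def urpa_pseudo_label(rollout_preds):
    # rollout_preds: predicted (start, end) segments from multiple GRPO rollouts.
    starts = [s for s, _ in rollout_preds]
    ends = [e for _, e in rollout_preds]
    pseudo = (statistics.mean(starts), statistics.mean(ends))
    # Higher variance across rollouts -> lower confidence in the pseudo label.
    spread = statistics.pvariance(starts) + statistics.pvariance(ends)
    confidence = 1.0 / (1.0 + spread)
    return pseudo, confidence

# Three tightly clustered rollouts yield a confident pseudo label.
pseudo, conf = urpa_pseudo_label([(10.0, 20.0), (10.5, 19.5), (9.5, 20.5)])
```

The confidence would then scale the training reward, so the model learns mostly from pseudo labels the rollouts agree on.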

AAMAS Conference 2025 Conference Paper

Unlocking the Potential of Decentralized LLM-based MAS: Privacy Preservation and Monetization in Collective Intelligence

  • Yingxuan Yang
  • Qiuying Peng
  • Jun Wang
  • Ying Wen
  • Weinan Zhang

Recent advances in large language models (LLMs) have enabled the development of LLM agents—autonomous systems capable of perceiving their environment, reasoning about tasks, and taking actions using external tools. While existing LLM-based Multi-Agent Systems (LaMAS) have shown promising results, they are predominantly centralized, operating within specific tasks or scenarios. These centralized designs simplify coordination but are fundamentally constrained by the limited data and knowledge available within a single entity. As LLM agents see broader deployment, the complexity of tasks increasingly requires collaboration across multiple organizations and data domains. Since organizations cannot and will not fully share their proprietary data, the next frontier of artificial intelligence lies in collective intelligence through decentralized LLM-based Multi-Agent Systems (LaMAS), where LLM agents, each accessing proprietary knowledge and tools, collaborate to solve complex tasks. This paradigm is becoming not just possible but necessary with the growing adoption of LLM agents across diverse organizations. This paper explores the transformative potential of decentralized LaMAS. In decentralized settings, two key issues arise: (1) privacy-preserving mechanisms that enable meaningful collaboration while safeguarding proprietary data and knowledge, and (2) monetization and credit attribution mechanisms that incentivize continuous improvement of agent capabilities and ensure fair value distribution among participants. Our analysis reveals that addressing these challenges can unlock a new paradigm of artificial collective intelligence that overcomes the limitations. This work contributes to decentralized AI by proposing a practical framework for mechanism design that advances both technological innovation and economic sustainability in decentralized LLM Agent networks.

JBHI Journal 2025 Journal Article

Unsupervised Feature Selection-Driven Active Learning for Semi-Supervised Automatic ECG Analysis

  • Xiao Li
  • Yongkang Zhou
  • Songyang An
  • Yu Zeng
  • Xinqi Zhang
  • Jun Wang
  • Yizhe Huang
  • Fan Lin

Automatic analysis methods of electrocardiograms (ECGs) usually require large-scale annotated training data, but the annotation process is extremely time-consuming. While semi-supervised learning can leverage unlabeled data, its performance depends heavily on the quality of the initial labeled subset. Active learning has been used to identify the most informative samples for annotation, but conventional approaches face three critical limitations: (1) dependency on manual intervention for iterative query design, (2) prohibitive computational costs during sample selection, and (3) limited compatibility with semi-supervised learning frameworks. To address these limitations, we proposed an Unsupervised Active Feature-selective Semi-Supervised Learning (UAFSSL) framework for ECG analysis, including an unsupervised feature selection-based active learning module and a semi-supervised learning module. UAFSSL captures latent data distributions via unsupervised feature extraction, selects diverse and representative samples using pseudo-label clustering, and integrates seamlessly with semi-supervised learning to eliminate human intervention. We validated our algorithm on an ECG waveform segmentation task and an atrial fibrillation detection task. In the waveform segmentation task, our method improved the F1-score for P-wave delineation by 2.4% compared to random sampling, using only 5% of labeled samples. For the atrial fibrillation detection task, we evaluated our method on both the AFDB and a 24-hour dataset collected from 500 atrial fibrillation patients. Using only 200 labeled samples for model training, our method achieved AUC improvements of 2.5% and 2.2% over random sampling in five-fold cross validation. This is the first study to integrate unsupervised active learning with semi-supervised learning for automatic ECG analysis, offering a robust, automated solution to reduce annotation costs while enhancing clinical applicability.

JBHI Journal 2025 Journal Article

Ψ-Net: Triple-Branch Network with Cross-Branch Alternately Updated Fusion for Diagnosis of Bicuspid Aortic Valve Via Dual-View Echocardiography

  • Jiayan Chen
  • Zibire Fulati
  • Lina Luan
  • Xueying Zhou
  • Juncheng Li
  • Jun Wang
  • Haiyan Chen
  • Jun Shi

Bicuspid Aortic Valve (BAV) can be diagnosed by Transthoracic Echocardiography (TTE), particularly on the parasternal short-axis view. In this work, a Triple-Branch Network (named Ψ-Net) is proposed as a Computer-Aided Diagnosis (CAD) model for BAV based on the paired TTE images of aortic valve. This Ψ-shaped triple-branch network effectively learns both the view-common and view-specific features from the paired TTE images for improving feature representation. Moreover, a novel cross-branch alternately updated fusion block is developed by implementing an alternately updated clique mechanism across multiple branches, which maximizes cross-branch feature interaction within the Ψ-Net to enhance multi-view feature fusion. On the other hand, a multi-task self-supervised learning framework is developed to capture inherent properties from limited dual-view TTE samples by integrating the dual-view masked image modelling and Disentangled Representation Learning (DRL) into a unified framework. Specifically, an additional view classification task is designed and embedded into this framework for predicting which view a specific feature belongs to, so as to further promote the disentanglement learning of view-common and view-specific features by DRL. Moreover, a Shapley Value based weight adjustment strategy is designed to automatically assign weights to individual losses in the objective function, which can dynamically balance the contribution of each loss term. The experimental results on two BAV TTE datasets demonstrate that Ψ-Net outperforms all the compared algorithms, suggesting its effectiveness in the diagnosis of BAV.

AAMAS Conference 2024 Conference Paper

A Summary of Online Markov Decision Processes with Non-oblivious Strategic Adversary

  • Le Cong Dinh
  • David Henry Mguni
  • Long Tran-Thanh
  • Jun Wang
  • Yaodong Yang

We study a novel setting in Online Markov Decision Processes (OMDPs) where the loss function is chosen by a non-oblivious strategic adversary who follows a no-external-regret algorithm. In this setting, we first demonstrate that MDP-Expert, an existing algorithm that works well with oblivious adversaries, can still apply and achieve a policy regret bound of O(√(T log L) + τ²·√(T log|A|)), where L is the size of the adversary's pure strategy set and |A| denotes the size of the agent's action space. Considering real-world games where the support size of a NE is small, we further propose a new algorithm, MDP-Online Oracle Expert (MDP-OOE), that achieves a policy regret bound of O(√(T log L) + τ²·√(T k log k)), where k depends only on the support size of the NE. MDP-OOE leverages the key benefit of Double Oracle in game theory and thus can solve games with a prohibitively large action space. Finally, to better understand the learning dynamics of no-regret methods, under the same setting of a no-external-regret adversary in OMDPs, we introduce an algorithm that achieves a last-round convergence result to a NE. To the best of our knowledge, this is the first work leading to a last-iteration result in OMDPs.

ICRA Conference 2024 Conference Paper

An LLM-driven Framework for Multiple-Vehicle Dispatching and Navigation in Smart City Landscapes

  • Ruiqing Chen
  • Wenbin Song
  • Weiqin Zu
  • ZiXin Dong
  • Ze Guo
  • Fanglei Sun
  • Zheng Tian
  • Jun Wang

In the context of smart cities, autonomous vehicles, such as unmanned delivery vehicles and taxis, are gradually gaining acceptance. However, their application scenarios remain significantly fragmented. Typically, an Autonomous Multi-Functional Vehicle (AMFV) is not engaged in other scenarios when idle in a specific one. Currently, a unified system capable of coordinating and using these resources efficiently is lacking. Moreover, there is an absence of an advanced navigation algorithm for facilitating coordinated navigation among Heterogeneous Vehicles (HVs). To address these issues, we propose the LLM-driven Multi-vehicle Dispatching and navigation (LiMeda) framework. It comprises an LLM-driven scheduling module that facilitates efficient allocation considering task scenarios and vehicle information, which addresses the issue of incompatible vehicle resources across various smart city scenarios, and a navigation module, founded on the Heterogeneous Agent Reinforcement Learning (HARL) framework we previously proposed, which can effectively perform cooperative navigation tasks among heterogeneous agents, assisting cooperative task completion by HVs in a smart city. Experimental results show our method outperforms both traditional scheduling algorithms and reinforcement learning navigation algorithms across evaluation metrics. Additionally, it shows remarkable scalability and generalization under varying city scales, vehicle numbers, and task numbers.

AAMAS Conference 2024 Conference Paper

Boosting Studies of Multi-Agent Reinforcement Learning on Google Research Football Environment: The Past, Present, and Future

  • Yan Song
  • He Jiang
  • Haifeng Zhang
  • Zheng Tian
  • Weinan Zhang
  • Jun Wang

Even though Google Research Football (GRF) was initially benchmarked and studied as a single-agent environment in its original paper [19], recent years have witnessed an increasing focus on its multi-agent nature by researchers utilizing it as a testbed for Multi-Agent Reinforcement Learning (MARL), especially in cooperative scenarios. However, the absence of standardized environment settings and unified evaluation metrics for multi-agent scenarios hampers the consistent understanding of various studies. Furthermore, the challenging 5 vs 5 and 11 vs 11 full-game scenarios have received limited thorough examination due to their substantial training complexities. To address these gaps, this paper extends the original environment by not only standardizing the environment settings and benchmarking cooperative learning algorithms across different scenarios, including the most challenging full-game scenarios, but also by discussing approaches to enhance football AI from diverse perspectives and introducing related research tools for learning beyond multi-agent cooperation. Specifically, we provide a distributed and asynchronous population-based self-play framework with diverse pre-trained policies for faster training, two football-specific analytical tools for deeper investigation, and an online leaderboard for broader evaluation. The overall expectation of this work is to advance the study of Multi-Agent Reinforcement Learning both on and with the Google Research Football environment, with the ultimate goal of deploying these technologies to real-world applications, such as sports analysis.

IJCAI Conference 2024 Conference Paper

Boundary-aware Decoupled Flow Networks for Realistic Extreme Rescaling

  • Jinmin Li
  • Tao Dai
  • Jingyun Zhang
  • Kang Liu
  • Jun Wang
  • Shaoming Wang
  • Shu-Tao Xia
  • Rizen Guo

Recently developed generative methods, including invertible rescaling network (IRN) based and generative adversarial network (GAN) based methods, have demonstrated exceptional performance in image rescaling. However, IRN-based methods tend to produce over-smoothed results, while GAN-based methods easily generate fake details, which thus hinders their real applications. To address this issue, we propose Boundary-aware Decoupled Flow Networks (BDFlow) to generate realistic and visually pleasing results. Unlike previous methods that model high-frequency information as standard Gaussian distribution directly, our BDFlow first decouples the high-frequency information into semantic high-frequency that adheres to a Boundary distribution and non-semantic high-frequency counterpart that adheres to a Gaussian distribution. Specifically, to capture semantic high-frequency parts accurately, we use Boundary-aware Mask (BAM) to constrain the model to produce rich textures, while non-semantic high-frequency part is randomly sampled from a Gaussian distribution. Comprehensive experiments demonstrate that our BDFlow significantly outperforms other state-of-the-art methods while maintaining lower complexity. Notably, our BDFlow improves the PSNR by 4.4 dB and the SSIM by 0.1 on average over GRAIN, utilizing only 74% of the parameters and 20% of the computation. The code will be available at https://github.com/THU-Kingmin/BAFlow.

IJCAI Conference 2024 Conference Paper

Bridge to Non-Barrier Communication: Gloss-Prompted Fine-Grained Cued Speech Gesture Generation with Diffusion Model

  • Wentao Lei
  • Li Liu
  • Jun Wang

Cued Speech (CS) is an advanced visual phonetic encoding system that integrates lip reading with hand codings, enabling people with hearing impairments to communicate efficiently. CS video generation aims to produce specific lip and gesture movements of CS from audio or text inputs. The main challenge is that, given limited CS data, we strive to simultaneously generate fine-grained hand and finger movements as well as lip movements, while the two kinds of movements need to be asynchronously aligned. Existing CS generation methods are fragile and prone to poor performance due to template-based statistical models and careful hand-crafted pre-processing required to fit the models. Therefore, we propose a novel Gloss-prompted Diffusion-based CS Gesture generation framework (called GlossDiff). Specifically, to integrate additional linguistic rule knowledge into the model, we first introduce a bridging instruction called Gloss, which is an automatically generated descriptive text that establishes a direct and more delicate semantic connection between spoken language and CS gestures. Moreover, we are the first to suggest that rhythm is an important paralinguistic feature of CS that improves communication efficacy. Therefore, we propose a novel Audio-driven Rhythmic Module (ARM) to learn rhythm that matches audio speech. In addition, we design, record, and publish the first Chinese CS dataset with four CS cuers. Extensive experiments demonstrate that our method quantitatively and qualitatively outperforms current state-of-the-art (SOTA) methods. We will release the code and data at glossdiff.github.io/.

JBHI Journal 2024 Journal Article

CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation

  • Jun Wang
  • Abhir Bhalerao
  • Terry Yin
  • Simon See
  • Yulan He

Radiology report generation (RRG) has gained increasing research attention because of its huge potential to mitigate medical resource shortages and aid the process of disease decision making by radiologists. Recent advancements in RRG are largely driven by improving a model's capabilities in encoding single-modal feature representations, while few studies explicitly explore the cross-modal alignment between image regions and words. Radiologists typically focus first on abnormal image regions before composing the corresponding text descriptions; thus cross-modal alignment is of great importance for learning an RRG model which is aware of abnormalities in the image. Motivated by this, we propose a Class Activation Map guided Attention Network (CAMANet) which explicitly promotes cross-modal alignment by employing aggregated class activation maps to supervise cross-modal attention learning, and simultaneously enriches the discriminative information. CAMANet contains three complementary modules: a Visual Discriminative Map Generation module to generate the importance/contribution of each visual token; a Visual Discriminative Map Assisted Encoder to learn the discriminative representation and enrich the discriminative information; and a Visual Textual Attention Consistency module to ensure the attention consistency between the visual and textual tokens, to achieve the cross-modal alignment. Experimental results demonstrate that CAMANet outperforms previous SOTA methods on two commonly used RRG benchmarks.

AAAI Conference 2024 Conference Paper

Decoupling Representation and Knowledge for Few-Shot Intent Classification and Slot Filling

  • Jie Han
  • Yixiong Zou
  • Haozhao Wang
  • Jun Wang
  • Wei Liu
  • Yao Wu
  • Tao Zhang
  • Ruixuan Li

Few-shot intent classification and slot filling are important but challenging tasks due to the scarcity of finely labeled data. Therefore, current works first train a model on source domains with sufficiently labeled data, and then transfer the model to target domains where only rarely labeled data is available. However, experience transferring as a whole usually suffers from gaps that exist among source domains and target domains. For instance, transferring domain-specific-knowledge-related experience is difficult. To tackle this problem, we propose a new method that explicitly decouples the transferring of general-semantic-representation-related experience and the domain-specific-knowledge-related experience. Specifically, for domain-specific-knowledge-related experience, we design two modules to capture intent-slot relation and slot-slot relation respectively. Extensive experiments on Snips and FewJoint datasets show that our method achieves state-of-the-art performance. The method improves the joint accuracy metric from 27.72% to 42.20% in the 1-shot setting, and from 46.54% to 60.79% in the 5-shot setting.

IJCAI Conference 2024 Conference Paper

Domain Adaptive and Fine-grained Anomaly Detection for Single-cell Sequencing Data and Beyond

  • Kaichen Xu
  • Yueyang Ding
  • Suyang Hou
  • Weiqiang Zhan
  • Nisang Chen
  • Jun Wang
  • Xiaobo Sun

Fine-grained anomalous cell detection from affected tissues is critical for clinical diagnosis and pathological research. Single-cell sequencing data provide unprecedented opportunities for this task. However, current anomaly detection methods struggle to handle domain shifts prevalent in multi-sample and multi-domain single-cell sequencing data, leading to suboptimal performance. Moreover, these methods fall short of distinguishing anomalous cells into pathologically distinct subtypes. In response, we propose ACSleuth, a novel, reconstruction deviation-guided generative framework that integrates the detection, domain adaptation, and fine-grained annotating of anomalous cells into a methodologically cohesive workflow. Notably, we present the first theoretical analysis of using reconstruction deviations output by generative models for anomaly detection in the presence of domain shifts. This analysis informs the development of a novel and superior maximum mean discrepancy-based anomaly scorer in ACSleuth. Extensive benchmarks over various single-cell data and other types of tabular data demonstrate ACSleuth's superiority over state-of-the-art methods in identifying and subtyping anomalies in multi-sample and multi-domain contexts. Our code is available at https://github.com/Catchxu/ACsleuth.

AAAI Conference 2024 Conference Paper

Federated Causality Learning with Explainable Adaptive Optimization

  • Dezhi Yang
  • Xintong He
  • Jun Wang
  • Guoxian Yu
  • Carlotta Domeniconi
  • Jinglin Zhang

Discovering causality from observational data is a crucial task in various scientific domains. With increasing awareness of privacy, data are not allowed to be exposed, and it is very hard to learn causal graphs from dispersed data, since these data may have different distributions. In this paper, we propose a federated causal discovery strategy (FedCausal) to learn the unified global causal graph from decentralized heterogeneous data. We design a global optimization formula to naturally aggregate the causal graphs from client data and constrain the acyclicity of the global graph without exposing local data. Unlike other federated causal learning algorithms, FedCausal unifies the local and global optimizations into a complete directed acyclic graph (DAG) learning process with a flexible optimization objective. We prove that this optimization objective is highly interpretable and can adaptively handle homogeneous and heterogeneous data. Experimental results on synthetic and real datasets show that FedCausal can effectively deal with non-independently and identically distributed (non-iid) data and delivers superior performance.

NeurIPS Conference 2024 Conference Paper

FOOGD: Federated Collaboration for Both Out-of-distribution Generalization and Detection

  • Xinting Liao
  • Weiming Liu
  • Pengyang Zhou
  • Fengyuan Yu
  • Jiahe Xu
  • Jun Wang
  • Wenjie Wang
  • Chaochao Chen

Federated learning (FL) is a promising machine learning paradigm that collaborates with client models to capture global knowledge. However, deploying FL models in real-world scenarios remains unreliable due to the coexistence of in-distribution data and unexpected out-of-distribution (OOD) data, such as covariate-shift and semantic-shift data. Current FL research typically addresses either covariate-shift data through OOD generalization or semantic-shift data via OOD detection, overlooking the simultaneous occurrence of various OOD shifts. In this work, we propose FOOGD, a method that estimates the probability density of each client and obtains a reliable global distribution as guidance for the subsequent FL process. Firstly, SM3D in FOOGD estimates a score model for arbitrary distributions without prior constraints, and powerfully detects semantic-shift data. Then SAG in FOOGD provides invariant yet diverse knowledge for both local covariate-shift generalization and client performance generalization. In empirical validations, FOOGD enjoys three main advantages: (1) reliably estimating non-normalized decentralized distributions, (2) detecting semantic-shift data via score values, and (3) generalizing to covariate-shift data by regularizing the feature extractor. The project is open-sourced at https://github.com/XeniaLLL/FOOGD-main.git.

NeurIPS Conference 2024 Conference Paper

Graph-enhanced Optimizers for Structure-aware Recommendation Embedding Evolution

  • Cong Xu
  • Jun Wang
  • Jianyong Wang
  • Wei Zhang

Embeddings play a key role in modern recommender systems because they are virtual representations of real-world entities and the foundation for subsequent decision-making models. In this paper, we propose a novel embedding update mechanism, Structure-aware Embedding Evolution (SEvo for short), to encourage related nodes to evolve similarly at each step. Unlike a GNN (Graph Neural Network), which typically serves as an intermediate module, SEvo is able to directly inject graph structural information into embeddings with minimal computational overhead during training. The convergence properties of SEvo along with its potential variants are theoretically analyzed to justify the validity of the designs. Moreover, SEvo can be seamlessly integrated into existing optimizers for state-of-the-art performance. Particularly, SEvo-enhanced AdamW with moment estimate correction demonstrates consistent improvements across a spectrum of models and datasets, suggesting a novel technical route to effectively utilize graph structural information beyond explicit GNN modules.

AAAI Conference 2024 Conference Paper

Human-Guided Moral Decision Making in Text-Based Games

  • Zijing Shi
  • Meng Fang
  • Ling Chen
  • Yali Du
  • Jun Wang

Training reinforcement learning (RL) agents to achieve desired goals while also acting morally is a challenging problem. Transformer-based language models (LMs) have shown some promise in moral awareness, but their use in different contexts is problematic because of the complexity and implicitness of human morality. In this paper, we build on text-based games, which are challenging environments for current RL agents, and propose the HuMAL (Human-guided Morality Awareness Learning) algorithm, which adaptively learns personal values through human-agent collaboration with minimal manual feedback. We evaluate HuMAL on the Jiminy Cricket benchmark, a set of text-based games with various scenes and dense morality annotations, using both simulated and actual human feedback. The experimental results demonstrate that with a small amount of human feedback, HuMAL can improve task performance and reduce immoral behavior in a variety of games and is adaptable to different personal values.

JBHI Journal 2024 Journal Article

Involution Transformer Based U-Net for Landmark Detection in Ultrasound Images for Diagnosis of Infantile DDH

  • Tianxiang Huang
  • Jing Shi
  • Juncheng Li
  • Jun Wang
  • Jun Du
  • Jun Shi

The B-mode ultrasound based computer-aided diagnosis (CAD) has demonstrated its effectiveness for diagnosis of Developmental Dysplasia of the Hip (DDH) in infants, which can conduct the Graf's method by detecting landmarks in hip ultrasound images. However, it is still necessary to explore more valuable information around these landmarks to enhance feature representation and improve the performance of the detection model. To this end, a novel Involution Transformer based U-Net (IT-UNet) network is proposed for hip landmark detection. The IT-UNet integrates the efficient involution operation into the Transformer to develop an Involution Transformer module (ITM), which consists of an involution attention block and a squeeze-and-excitation involution block. The ITM can capture both the spatial-related information and long-range dependencies from hip ultrasound images to effectively improve feature representation. Moreover, an Involution Downsampling block (IDB) is developed to alleviate the issue of feature loss in the encoder modules, which combines involution and convolution for the purpose of downsampling. The experimental results on two DDH ultrasound datasets show that the proposed IT-UNet achieves the best landmark detection performance, suggesting its potential for clinical application.

NeurIPS Conference 2024 Conference Paper

Is the MMI Criterion Necessary for Interpretability? Degenerating Non-causal Features to Plain Noise for Self-Rationalization

  • Wei Liu
  • Zhiying Deng
  • Zhongyu Niu
  • Jun Wang
  • Haozhao Wang
  • YuanKai Zhang
  • Ruixuan Li

An important line of research in the field of explainability is to extract a small subset of crucial rationales from the full input. The most widely used criterion for rationale extraction is the maximum mutual information (MMI) criterion. However, in certain datasets, there are spurious features that are non-causally correlated with the label yet carry high mutual information, complicating the loss landscape of MMI. Although some penalty-based methods have been developed to penalize the spurious features (e.g., invariance penalty, intervention penalty, etc.) to help MMI work better, these are merely remedial measures. In the optimization objectives of these methods, spurious features are still distinguished from plain noise, which hinders the discovery of causal rationales. This paper aims to develop a new criterion that treats spurious features as plain noise, allowing the model to work on datasets rich in spurious features as if it were working on clean datasets, thereby making rationale extraction easier. We theoretically observe that removing either plain noise or spurious features from the input does not alter the conditional distribution of the remaining components relative to the task label. However, significant changes in the conditional distribution occur only when causal features are eliminated. Based on this discovery, the paper proposes a criterion for Maximizing the Remaining Discrepancy (MRD). Experiments on six widely used datasets show that our MRD criterion improves rationale quality (measured by the overlap with human-annotated rationales) by up to 10.4% as compared to several recent competitive MMI variants. Code: https://github.com/jugechengzi/Rationalization-MRD.

AAAI Conference 2024 Conference Paper

Large Language Models Are Neurosymbolic Reasoners

  • Meng Fang
  • Shilong Deng
  • Yudi Zhang
  • Zijing Shi
  • Ling Chen
  • Mykola Pechenizkiy
  • Jun Wang

A wide range of real-world applications are characterized by their symbolic nature, necessitating a strong capability for symbolic reasoning. This paper investigates the potential application of Large Language Models (LLMs) as symbolic reasoners. We focus on text-based games, significant benchmarks for agents with natural language capabilities, particularly in symbolic tasks like math, map reading, sorting, and applying common sense in text-based worlds. To facilitate these agents, we propose an LLM agent designed to tackle symbolic challenges and achieve in-game objectives. We begin by initializing the LLM agent and informing it of its role. The agent then receives observations and a set of valid actions from the text-based games, along with a specific symbolic module. With these inputs, the LLM agent chooses an action and interacts with the game environments. Our experimental results demonstrate that our method significantly enhances the capability of LLMs as automated agents for symbolic reasoning, and our LLM agent is effective in text-based games involving symbolic tasks, achieving an average performance of 88% across all tasks.

NeurIPS Conference 2024 Conference Paper

Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach

  • Weiyu Ma
  • Qirui Mi
  • Yongcheng Zeng
  • Xue Yan
  • Yuqiao Wu
  • Runji Lin
  • Haifeng Zhang
  • Jun Wang

With the continued advancement of Large Language Model (LLM) agents in reasoning, planning, and decision-making, benchmarks have become crucial in evaluating these skills. However, there is a notable gap in benchmarks for real-time strategic decision-making. StarCraft II (SC2), with its complex and dynamic nature, serves as an ideal setting for such evaluations. To this end, we have developed TextStarCraft II, a specialized environment for assessing LLMs in real-time strategic scenarios within SC2. Addressing the limitations of traditional Chain of Thought (CoT) methods, we introduce the Chain of Summarization (CoS) method, enhancing LLMs' capabilities in rapid and effective decision-making. Our key experiments included: 1. LLM Evaluation: Tested 10 LLMs in TextStarCraft II, most of them defeating the built-in LV5 AI, showcasing effective strategy skills. 2. Commercial Model Knowledge: Evaluated four commercial models on SC2 knowledge; GPT-4 ranked highest by Grandmaster-level experts. 3. Human-AI Matches: Experimental results showed that fine-tuned LLMs performed on par with Gold-level players in real-time matches, demonstrating comparable strategic abilities. All code and data from this study have been made publicly available at https://github.com/histmeisah/Large-Language-Models-play-StarCraftII

NeurIPS Conference 2024 Conference Paper

Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf

  • Xuanfa Jin
  • Ziyan Wang
  • Yali Du
  • Meng Fang
  • Haifeng Zhang
  • Jun Wang

Communication is a fundamental aspect of human society, facilitating the exchange of information and beliefs among people. Despite the advancements in large language models (LLMs), recent agents built on them often neglect control over discussion tactics, which are essential in communication scenarios and games. As a variant of the famous communication game Werewolf, One Night Ultimate Werewolf (ONUW) requires players to develop strategic discussion policies due to the potential role changes that increase the uncertainty and complexity of the game. In this work, we first establish the existence of Perfect Bayesian Equilibria (PBEs) in two scenarios of the ONUW game: one with discussion and one without. The results showcase that the discussion greatly changes players' utilities by affecting their beliefs, emphasizing the significance of discussion tactics. Based on the insights obtained from the analyses, we propose an RL-instructed language agent framework, where a discussion policy trained by reinforcement learning (RL) is employed to determine appropriate discussion tactics to adopt. Our experimental results on several ONUW game settings demonstrate the effectiveness and generalizability of our proposed framework.

AAAI Conference 2024 Conference Paper

Multi-Dimensional Fair Federated Learning

  • Cong Su
  • Guoxian Yu
  • Jun Wang
  • Hui Li
  • Qingzhong Li
  • Han Yu

Federated learning (FL) has emerged as a promising collaborative and secure paradigm for training a model from decentralized data without compromising privacy. Group fairness and client fairness are two dimensions of fairness that are important for FL. Standard FL can result in disproportionate disadvantages for certain clients, and it still faces the challenge of treating different groups equitably in a population. The problem of privately training fair FL models without compromising the generalization capability of disadvantaged clients remains open. In this paper, we propose a method, called mFairFL, to address this problem and achieve group fairness and client fairness simultaneously. mFairFL leverages differential multipliers to construct an optimization objective for empirical risk minimization with fairness constraints. Before aggregating locally trained models, it first detects conflicts among their gradients, and then iteratively curates the direction and magnitude of gradients to mitigate these conflicts. Theoretical analysis proves mFairFL facilitates the fairness in model development. The experimental evaluations based on three benchmark datasets show significant advantages of mFairFL compared to seven state-of-the-art baselines.

AAAI Conference 2024 Conference Paper

Multi-Granularity Causal Structure Learning

  • Jiaxuan Liang
  • Jun Wang
  • Guoxian Yu
  • Shuyin Xia
  • Guoyin Wang

Unveiling, modeling, and comprehending the causal mechanisms underpinning natural phenomena stand as fundamental endeavors across myriad scientific disciplines. Meanwhile, new knowledge emerges when discovering causal relationships from data. Existing causal learning algorithms predominantly focus on the isolated effects of variables, overlooking the intricate interplay of multiple variables and their collective behavioral patterns. Furthermore, the ubiquity of high-dimensional data exacts a substantial temporal cost from causal algorithms. In this paper, we develop a novel method called MgCSL (Multi-granularity Causal Structure Learning), which first leverages a sparse auto-encoder to explore coarse-graining strategies and causal abstractions from micro-variables to macro-ones. MgCSL then takes multi-granularity variables as inputs to train multilayer perceptrons and to delve into the causality between variables. To enhance the efficacy on high-dimensional data, MgCSL introduces a simplified acyclicity constraint to adeptly search the directed acyclic graph among variables. Experimental results show that MgCSL outperforms competitive baselines, and uncovers explainable causal connections on fMRI datasets.

AAAI Conference 2024 Conference Paper

PointAttN: You Only Need Attention for Point Cloud Completion

  • Jun Wang
  • Ying Cui
  • Dongyan Guo
  • Junxia Li
  • Qingshan Liu
  • Chunhua Shen

Point cloud completion, which refers to completing 3D shapes from partial 3D point clouds, is a fundamental problem for 3D point cloud analysis tasks. Benefiting from the development of deep neural networks, research on point cloud completion has made great progress in recent years. However, the explicit local region partition, like the kNNs involved in existing methods, makes them sensitive to the density distribution of point clouds. Moreover, it provides limited receptive fields that prevent capturing features from long-range context information. To solve these problems, we leverage the cross-attention and self-attention mechanisms to design a novel neural network for point cloud completion with implicit local region partition. Two basic units, Geometric Details Perception (GDP) and Self-Feature Augment (SFA), are proposed to establish the structural relationships directly among points in a simple yet effective way via the attention mechanism. Then, based on GDP and SFA, we construct a new framework with the popular encoder-decoder architecture for point cloud completion. The proposed framework, namely PointAttN, is simple, neat and effective, and can precisely capture the structural information of 3D shapes and predict complete point clouds with detailed geometry. Experimental results demonstrate that our PointAttN outperforms state-of-the-art methods on multiple challenging benchmarks. Code is available at: https://github.com/ohhhyeahhh/PointAttN

NeurIPS Conference 2024 Conference Paper

Policy Learning from Tutorial Books via Understanding, Rehearsing and Introspecting

  • Xiong-Hui Chen
  • Ziyan Wang
  • Yali Du
  • Shengyi Jiang
  • Meng Fang
  • Yang Yu
  • Jun Wang

When humans need to learn a new skill, we can acquire knowledge through written books, including textbooks, tutorials, etc. However, current research for decision-making, like reinforcement learning (RL), has primarily required numerous real interactions with the target environment to learn a skill, failing to utilize the existing knowledge already summarized in text. The success of Large Language Models (LLMs) sheds light on utilizing the knowledge behind such books. In this paper, we discuss a new policy learning problem called Policy Learning from tutorial Books (PLfB), built upon LLM systems, which aims to leverage rich resources such as tutorial books to derive a policy network. Inspired by how humans learn from books, we solve the problem via a three-stage framework: Understanding, Rehearsing, and Introspecting (URI). In particular, it first rehearses decision-making trajectories based on the knowledge derived from understanding the books, then introspects on the imaginary dataset to distill a policy network. We build two benchmarks for PLfB based on Tic-Tac-Toe and Football games. In experiments, URI's policy achieves at least a 44% net win rate against GPT-based agents without any real data; in the Football game, a complex scenario, URI's policy beats the built-in AIs with a 37% winning rate, while the GPT-based agent achieves only a 6% winning rate. The project page: https://plfb-football.github.io.

IJCAI Conference 2024 Conference Paper

Provable Acceleration of Nesterov’s Accelerated Gradient Method over Heavy Ball Method in Training Over-Parameterized Neural Networks

  • Xin Liu
  • Wei Tao
  • Wei Li
  • Dazhi Zhan
  • Jun Wang
  • Zhisong Pan

Due to its simplicity and efficiency, the first-order gradient method has been extensively employed in training neural networks. Although the optimization problem of the neural network is non-convex, recent research has proved that the first-order method is capable of attaining a global minimum when training over-parameterized neural networks, where the number of parameters is significantly larger than that of training instances. Momentum methods, including the heavy ball (HB) method and Nesterov's accelerated gradient (NAG) method, are the workhorse of first-order gradient methods owing to their accelerated convergence. In practice, NAG often exhibits superior performance to HB. However, current theoretical works fail to distinguish their convergence difference in training neural networks. To fill this gap, we consider the training problem of the two-layer ReLU neural network under over-parameterization and random initialization. Leveraging high-resolution dynamical systems and neural tangent kernel (NTK) theory, our result not only establishes tighter upper bounds on the convergence rate for both HB and NAG, but also provides the first theoretical guarantee for the acceleration of NAG over HB in training neural networks. Finally, we validate our theoretical results on three benchmark datasets.

NeurIPS Conference 2024 Conference Paper

Reinforcing LLM Agents via Policy Optimization with Action Decomposition

  • Muning Wen
  • Ziyu Wan
  • Jun Wang
  • Weinan Zhang
  • Ying Wen

Language models as intelligent agents push the boundaries of sequential decision-making agents but struggle with limited knowledge of environmental dynamics and exponentially huge action spaces. Recent efforts like GLAM and TWOSOME manually constrain the action space to a restricted subset and employ reinforcement learning to align agents' knowledge with specific environments. However, they overlook fine-grained credit assignment for intra-action tokens, which is essential for efficient language agent optimization, and rely on human prior knowledge to restrict the action space. This paper proposes decomposing language agent optimization from the action level to the token level, offering finer supervision for each intra-action token and manageable optimization complexity in environments with unrestricted action spaces. Beginning with the simplification of flattening all actions, we theoretically explore the discrepancies between action-level optimization and this naive token-level optimization. We then derive the Bellman backup with Action Decomposition (BAD) to integrate credit assignments for both intra-action and inter-action tokens, effectively eliminating the discrepancies. Implementing BAD within the PPO algorithm, we introduce Policy Optimization with Action Decomposition (POAD). POAD benefits from a finer-grained credit assignment process and lower optimization complexity, leading to enhanced learning efficiency and generalization abilities in aligning language agents with interactive environments. We validate POAD across diverse testbeds, with results affirming the advantages of our approach and the correctness of our theoretical analysis. The source code can be accessed directly at this link: https://github.com/morning9393/ADRL.

AAAI Conference 2024 Conference Paper

Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis

  • Rohan Mitta
  • Hosein Hasanbeig
  • Jun Wang
  • Daniel Kroening
  • Yiannis Kantaros
  • Alessandro Abate

This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL), such that the safety constraint violations are bounded at any point during learning. As enforcing safety during training might severely limit the agent’s exploration, we propose here a new architecture that handles the trade-off between efficient progress and safety during exploration. As the exploration progresses, we update via Bayesian inference Dirichlet-Categorical models of the transition probabilities of the Markov decision process that describes the environment dynamics. We then propose a way to approximate moments of belief about the risk associated with the action selection policy. We demonstrate that this approach can be easily interleaved with RL and we present experimental results to showcase the performance of the overall architecture.

AAMAS Conference 2024 Conference Paper

TaxAI: A Dynamic Economic Simulator and Benchmark for Multi-agent Reinforcement Learning

  • Qirui Mi
  • Siyu Xia
  • Yan Song
  • Haifeng Zhang
  • Shenghao Zhu
  • Jun Wang

Taxation and government spending are crucial tools for governments to promote economic growth and maintain social equity. However, the difficulty in accurately predicting the dynamic strategies of diverse self-interested households presents a challenge for governments to implement effective tax policies. Given its proficiency in modeling other agents in partially observable environments and adaptively learning to find optimal policies, Multi-Agent Reinforcement Learning (MARL) is highly suitable for solving dynamic games between the government and numerous households. Although MARL shows more potential than traditional methods such as the genetic algorithm and dynamic programming, there is a lack of large-scale multi-agent reinforcement learning economic simulators. Therefore, we propose a MARL environment, named TaxAI, for dynamic games involving N households, government, firms, and financial intermediaries based on the Bewley-Aiyagari economic model. Our study benchmarks 2 traditional economic methods against 7 MARL methods on TaxAI, demonstrating the effectiveness and superiority of MARL algorithms. Moreover, TaxAI’s scalability in simulating dynamic interactions between the government and 10,000 households, coupled with real-data calibration, grants it a substantial improvement in scale and realism over existing simulators. Therefore, TaxAI is the most realistic economic simulator for optimal tax policy, which aims to generate feasible recommendations for governments and individuals.

JBHI Journal 2024 Journal Article

Towards Wearable and Portable Spine Motion Analysis Through Dynamic Optimization of Smartphone Videos and IMU Data

  • Wei Wang
  • Yinghu Peng
  • Yilun Sun
  • Jun Wang
  • Guanglin Li

Background: Monitoring spine kinematics is crucial for applications like disease evaluation and ergonomics analysis. However, the small scale of vertebrae and the number of degrees of freedom present significant challenges for noninvasive and convenient spine kinematics estimation. Methods: This study developed a dynamic optimization framework for wearable spine motion tracking at the intervertebral joint level by integrating smartphone videos and Inertial Measurement Units (IMUs) with dynamic constraints from a thoracolumbar spine model. Validation involved motion data from 10 healthy males performing static standing, dynamic upright trunk rotations, and gait. This data included rotations of ten IMUs on vertebrae and virtual landmarks from three smartphone videos preprocessed by OpenCap, an application leveraging computer vision for pose estimation. The kinematic measures derived from the optimized solution were compared against simultaneously collected infrared optical marker-based measurements and in vivo literature data. Solutions based only on IMUs or videos were also compared for accuracy evaluation. Results: The proposed optimization approach closely matched the reference data in the intervertebral or segmental rotation range, demonstrating minimal angular differences across all motions and the highest correlation in 3D rotations (maximal Pearson and intraclass correlation coefficients of 0.92 and 0.94, respectively). Time-series changes of joint angles also aligned well with the optical-marker reference. Conclusion: Dynamic optimization of the spine simulation that integrates IMUs and computer vision outperforms the single-modality method. Significance: This markerless 3D spine motion capture method holds potential for spinal health assessment in large cohorts in real-world settings without dedicated laboratories.

NeurIPS Conference 2023 Conference Paper

An Efficient End-to-End Training Approach for Zero-Shot Human-AI Coordination

  • Xue Yan
  • Jiaxian Guo
  • Xingzhou Lou
  • Jun Wang
  • Haifeng Zhang
  • Yali Du

The goal of zero-shot human-AI coordination is to develop an agent that can collaborate with humans without relying on human data. Prevailing two-stage population-based methods require a diverse population of mutually distinct policies to simulate diverse human behaviors. The necessity of such populations severely limits their computational efficiency. To address this issue, we propose E3T, an Efficient End-to-End Training approach for zero-shot human-AI coordination. E3T employs a mixture of ego policy and random policy to construct the partner policy, making it both coordination-skilled and diverse. In this way, the ego agent is end-to-end trained with this mixture policy without the need of a pre-trained population, thus significantly improving the training efficiency. In addition, a partner modeling module is proposed to predict the partner's action from historical information. With the predicted partner's action, the ego policy is able to adapt its policy and take actions accordingly when collaborating with humans of different behavior patterns. Empirical results on the Overcooked environment show that our method significantly improves the training efficiency while achieving comparable or superior performance to the population-based baselines. Demo videos are available at https://sites.google.com/view/e3t-overcooked.

NeurIPS Conference 2023 Conference Paper

ChessGPT: Bridging Policy Learning and Language Modeling

  • Xidong Feng
  • Yicheng Luo
  • Ziyan Wang
  • Hongrui Tang
  • Mengyue Yang
  • Kun Shao
  • David Mguni
  • Yali Du

When solving decision-making tasks, humans typically depend on information from two key sources: (1) historical policy data, which provides interaction replay from the environment, and (2) analytical insights in natural language form, exposing the invaluable thought process or strategic considerations. Despite this, the majority of preceding research focuses on only one source: they either use historical replay exclusively to directly learn policy or value functions, or engage in language model training using only a language corpus. In this paper, we argue that a powerful autonomous agent should cover both sources. Thus, we propose ChessGPT, a GPT model bridging policy learning and language modeling by integrating data from these two sources in chess games. Specifically, we build a large-scale game and language dataset related to chess. Leveraging the dataset, we showcase two model examples, ChessCLIP and ChessGPT, integrating policy learning and language modeling. Finally, we propose a full evaluation framework for evaluating a language model's chess ability. Experimental results validate our model and dataset's effectiveness. We open source our code, model, and dataset at https://github.com/waterhorse1/ChessGPT.

JBHI Journal 2023 Journal Article

Chromosome Detection in Metaphase Cell Images Using Morphological Priors

  • Jun Wang
  • Chengfeng Zhou
  • Songchang Chen
  • Jianwu Hu
  • Minghui Wu
  • Xudong Jiang
  • Chenming Xu
  • Dahong Qian

Reliable chromosome detection in metaphase cell (MC) images can greatly alleviate the workload of cytogeneticists for karyotype analysis and the diagnosis of chromosomal disorders. However, it is still an extremely challenging task due to the complicated characteristics of chromosomes, e.g., dense distributions, arbitrary orientations, and various morphologies. In this article, we propose a novel rotated-anchor-based detection framework, named DeepCHM, for fast and accurate chromosome detection in MC images. Our framework has three main innovations: 1) A deep saliency map representing chromosomal morphological features is learned end-to-end with semantic features. This not only enhances the feature representations for anchor classification and regression but also guides the anchor setting to significantly reduce redundant anchors. This accelerates the detection and improves the performance; 2) A hardness-aware loss weights the contribution of positive anchors, which effectively reinforces the model to identify hard chromosomes; 3) A model-driven sampling strategy addresses the anchor imbalance issue by adaptively selecting hard negative anchors for model training. In addition, a large-scale benchmark dataset with a total of 624 images and 27,763 chromosome instances was built for chromosome detection and segmentation. Extensive experimental results demonstrate that our method outperforms most state-of-the-art (SOTA) approaches and successfully handles chromosome detection, with an AP score of 93.53%.

IJCAI Conference 2023 Conference Paper

CLE-ViT: Contrastive Learning Encoded Transformer for Ultra-Fine-Grained Visual Categorization

  • Xiaohan Yu
  • Jun Wang
  • Yongsheng Gao

Ultra-fine-grained visual classification (ultra-FGVC) targets classifying sub-grained categories of fine-grained objects. This inevitably requires discriminative representation learning within a limited training set. Exploring intrinsic features from the object itself, e.g., predicting the rotation of a given image, has demonstrated great progress towards learning discriminative representations. Yet none of these works consider explicit supervision for learning mutual information at the instance level. To this end, this paper introduces CLE-ViT, a novel contrastive learning encoded transformer, to address this fundamental problem in ultra-FGVC. The core design is a self-supervised module that performs self-shuffling and masking and then distinguishes these altered images from other images. This drives the model to learn an optimized feature space that has a large inter-class distance while remaining tolerant to intra-class variations. By incorporating this self-supervised module, the network acquires more knowledge from the intrinsic structure of the input data, which improves generalization ability without requiring extra manual annotations. CLE-ViT achieves strong performance on 7 publicly available datasets, demonstrating its effectiveness in the ultra-FGVC task. The code is available at https://github.com/Markin-Wang/CLEViT.

NeurIPS Conference 2023 Conference Paper

D-Separation for Causal Self-Explanation

  • Wei Liu
  • Jun Wang
  • Haozhao Wang
  • Ruixuan Li
  • Zhiying Deng
  • YuanKai Zhang
  • Yang Qiu

Rationalization aims to strengthen the interpretability of NLP models by extracting a subset of human-intelligible pieces from their input texts. Conventional works generally employ the maximum mutual information (MMI) criterion to find the rationale that is most indicative of the target label. However, this criterion can be influenced by spurious features that correlate with the causal rationale or the target label. Instead of attempting to rectify the issues of the MMI criterion, we propose a novel criterion to uncover the causal rationale, termed the Minimum Conditional Dependence (MCD) criterion, which is grounded on our finding that the non-causal features and the target label are d-separated by the causal rationale. By minimizing the dependence between the non-selected parts of the input and the target label conditioned on the selected rationale candidate, all the causes of the label are compelled to be selected. In this study, we employ a simple and practical measure of dependence, specifically the KL-divergence, to validate our proposed MCD criterion. Empirically, we demonstrate that MCD improves the F1 score by up to 13.7% compared to previous state-of-the-art MMI-based methods. Our code is in an anonymous repository: https://anonymous.4open.science/r/MCD-CE88.
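
The d-separation intuition behind MCD can be illustrated with a KL-based dependence term: if the rationale already carries everything label-relevant, adding the non-selected text should not move the predicted label distribution. The sketch below is a toy formulation with illustrative function names, not the paper's training objective.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for categorical distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def mcd_penalty(p_y_given_rationale, p_y_given_full):
    """Toy MCD-style dependence measure: when the label is d-separated
    from the non-selected text by the rationale, conditioning on the
    full input should not change the label distribution, so this KL
    term vanishes."""
    return kl_divergence(p_y_given_full, p_y_given_rationale)

# a rationale that already carries all label-relevant information ...
good = mcd_penalty([0.8, 0.2], [0.8, 0.2])
# ... versus one whose complement still shifts the prediction
bad = mcd_penalty([0.6, 0.4], [0.9, 0.1])
```

Minimizing such a penalty over rationale candidates pushes the selector to absorb every cause of the label into the selected span.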

AAAI Conference 2023 Short Paper

Enhancing Dynamic GCN for Node Attribute Forecasting with Meta Spatial-Temporal Learning (Student Abstract)

  • Bo Wu
  • Xun Liang
  • Xiangping Zheng
  • Jun Wang

Node attribute forecasting has recently attracted considerable attention. Recent attempts utilize dynamic graph convolutional networks (GCNs) to predict future node attributes. However, few prior works have noticed the complex spatial and temporal interactions between nodes, which hamper the performance of dynamic GCNs. In this paper, we propose a new dynamic GCN model named meta-DGCN, leveraging meta spatial-temporal tasks to enhance the ability of dynamic GCNs to better capture future node attributes. Experiments show that meta-DGCN effectively models comprehensive spatio-temporal correlations between nodes and outperforms state-of-the-art baselines on various real-world datasets.

IJCAI Conference 2023 Conference Paper

HyperFed: Hyperbolic Prototypes Exploration with Consistent Aggregation for Non-IID Data in Federated Learning

  • Xinting Liao
  • Weiming Liu
  • Chaochao Chen
  • Pengyang Zhou
  • Huabin Zhu
  • Yanchao Tan
  • Jun Wang
  • Yue Qi

Federated learning (FL) collaboratively models user data in a decentralized way. However, in the real world, non-identical and independent data distributions (non-IID) among clients hinder the performance of FL due to three issues, i.e., (1) class statistics shifting, (2) insufficient hierarchical information utilization, and (3) inconsistency in aggregating clients. To address these issues, we propose HyperFed, which contains three main modules, i.e., hyperbolic prototype Tammes initialization (HPTI), hyperbolic prototype learning (HPL), and consistent aggregation (CA). Firstly, HPTI in the server constructs uniformly distributed and fixed class prototypes, and shares them with clients to match class statistics, further guiding consistent feature representation for local clients. Secondly, HPL in each client captures the hierarchical information in local data under the supervision of the shared class prototypes in the hyperbolic model space. Additionally, CA in the server mitigates the impact of inconsistent deviations from clients to the server. Extensive studies on four datasets prove that HyperFed is effective in enhancing the performance of FL under the non-IID setting.

JBHI Journal 2023 Journal Article

Immunotherapy Efficacy Prediction for Non-Small Cell Lung Cancer Using Multi-View Adaptive Weighted Graph Convolutional Networks

  • Qiong Wu
  • Jun Wang
  • Zongqiong Sun
  • Lei Xiao
  • Wenhao Ying
  • Jun Shi

Immunotherapy is an effective way to treat non-small cell lung cancer (NSCLC). The efficacy of immunotherapy differs from person to person and may cause side effects, making it important to predict the efficacy of immunotherapy before surgery. Radiomics based on machine learning has been successfully used to predict the efficacy of NSCLC immunotherapy. However, most studies only considered the radiomic features of the individual patient, ignoring the inter-patient correlations. Besides, they usually concatenated different features as the input of a single-view model, failing to consider the complex correlations among features of multiple types. To this end, we propose a multi-view adaptive weighted graph convolutional network (MVAW-GCN) for the prediction of NSCLC immunotherapy efficacy. Specifically, we group the radiomic features into several views according to the type of filtered images they are extracted from. We construct a graph in each view based on the radiomic features and phenotypic information. An attention mechanism is introduced to automatically assign weights to each view. Considering the view-shared and view-specific knowledge of radiomic features, we propose a separable graph convolution that decomposes the output of the last convolution layer into two components, i.e., the view-shared and view-specific outputs. We maximize the consistency and enhance the diversity among different views in the learning procedure. The proposed MVAW-GCN is evaluated on 107 NSCLC patients, including 52 patients with valid efficacy and 55 patients with invalid efficacy. Our method achieved an accuracy of 77.27% and an area under the curve (AUC) of 0.7780, indicating its effectiveness in NSCLC immunotherapy efficacy prediction.

AAAI Conference 2023 Conference Paper

Incentive-Boosted Federated Crowdsourcing

  • Xiangping Kang
  • Guoxian Yu
  • Jun Wang
  • Wei Guo
  • Carlotta Domeniconi
  • Jinglin Zhang

Crowdsourcing is a favorable computing paradigm for processing computer-hard tasks by harnessing human intelligence. However, generic crowdsourcing systems may lead to privacy leakage through the sharing of worker data. To tackle this problem, we propose a novel approach, called iFedCrowd (incentive-boosted Federated Crowdsourcing), to manage the privacy and quality of crowdsourcing projects. iFedCrowd allows participants to locally process sensitive data and only upload encrypted training models, and then aggregates the model parameters to build a shared server model to protect data privacy. To motivate workers to build a high-quality global model in an efficient way, we introduce an incentive mechanism that encourages workers to constantly collect fresh data to train accurate client models and boosts the global model training. We model the incentive-based interaction between the crowdsourcing platform and participating workers as a Stackelberg game, in which each side maximizes its own profit. We derive the Nash Equilibrium of the game to find the optimal solutions for the two sides. Experimental results confirm that iFedCrowd can complete secure crowdsourcing projects with high quality and efficiency.
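
The Stackelberg structure above (platform leads, workers follow) can be illustrated with a toy quadratic payoff model solved by backward induction. The utility functions here are assumptions for illustration, not the paper's actual incentive model.

```python
import numpy as np

def worker_best_response(r, c=1.0):
    """Follower: the worker maximizes r*e - c*e**2 over effort e,
    giving the closed-form best response e* = r / (2c)."""
    return r / (2.0 * c)

def platform_profit(r, V=10.0, c=1.0):
    """Leader: the platform values effort at V per unit and pays the
    posted rate r per unit, anticipating the worker's best response."""
    e = worker_best_response(r, c)
    return (V - r) * e

# backward induction: the leader optimizes over the follower's reaction curve
rates = np.linspace(0.0, 10.0, 10001)
r_star = rates[np.argmax([platform_profit(r) for r in rates])]
effort_star = worker_best_response(r_star)
```

In this toy instance the analytic Stackelberg solution is r* = V/2 (here 5.0), which the grid search recovers; in the paper both sides' profits are derived from the actual crowdsourcing cost and freshness model.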

NeurIPS Conference 2023 Conference Paper

Interpretable Reward Redistribution in Reinforcement Learning: A Causal Approach

  • Yudi Zhang
  • Yali Du
  • Biwei Huang
  • Ziyan Wang
  • Jun Wang
  • Meng Fang
  • Mykola Pechenizkiy

A major challenge in reinforcement learning is to determine which state-action pairs are responsible for future rewards that are delayed. Reward redistribution serves as a solution to re-assign credits for each time step from observed sequences. While the majority of current approaches construct the reward redistribution in an uninterpretable manner, we propose to explicitly model the contributions of state and action from a causal perspective, resulting in an interpretable reward redistribution and preserving policy invariance. In this paper, we start by studying the role of causal generative models in reward redistribution by characterizing the generation of Markovian rewards and trajectory-wise long-term return, and further propose a framework, called Generative Return Decomposition (GRD), for policy optimization in delayed reward scenarios. Specifically, GRD first identifies the unobservable Markovian rewards and causal relations in the generative process. Then, GRD makes use of the identified causal generative model to form a compact representation to train the policy over the most favorable subspace of the state space of the agent. Theoretically, we show that the unobservable Markovian reward function is identifiable, as well as the underlying causal structure and causal models. Experimental results show that our method outperforms state-of-the-art methods, and the provided visualization further demonstrates the interpretability of our method. The project page is located at https://reedzyd.github.io/GenerativeReturnDecomposition/.

NeurIPS Conference 2023 Conference Paper

Invariant Learning via Probability of Sufficient and Necessary Causes

  • Mengyue Yang
  • Zhen Fang
  • Yonggang Zhang
  • Yali Du
  • Furui Liu
  • Jean-Francois Ton
  • Jianhong Wang
  • Jun Wang

Out-of-distribution (OOD) generalization is indispensable for learning models in the wild, where the testing distribution is typically unknown and different from the training distribution. Recent methods derived from causality have shown great potential in achieving OOD generalization. However, existing methods mainly focus on the invariance property of causes, while largely overlooking the property of sufficiency and necessity conditions. Namely, a necessary but insufficient cause (feature) is invariant to distribution shift, yet it may not have the required accuracy. By contrast, a sufficient yet unnecessary cause (feature) tends to fit specific data well but may have a risk of adapting to a new domain. To capture the information of sufficient and necessary causes, we employ a classical concept, the probability of sufficient and necessary causes (PNS), which indicates the probability of whether one is the necessary and sufficient cause. To associate PNS with OOD generalization, we propose the PNS risk and formulate an algorithm to learn representations with a high PNS value. We theoretically analyze and prove the generalizability of the PNS risk. Experiments on both synthetic and real-world benchmarks demonstrate the effectiveness of the proposed method. The detailed implementation can be found at the GitHub repository: https://github.com/ymy4323460/CaSN.

AAMAS Conference 2023 Conference Paper

Is Nash Equilibrium Approximator Learnable?

  • Zhijian Duan
  • Wenhan Huang
  • Dinghuai Zhang
  • Yali Du
  • Jun Wang
  • Yaodong Yang
  • Xiaotie Deng

In this paper, we investigate the learnability of the function approximator that approximates Nash equilibrium (NE) for games generated from a distribution. First, we offer a generalization bound using the Probably Approximately Correct (PAC) learning model. The bound describes the gap between the expected loss and the empirical loss of the NE approximator. Afterward, we prove the agnostic PAC learnability of the Nash approximator. In addition to the theoretical analysis, we demonstrate an application of the NE approximator in experiments. The trained NE approximator can be used to warm-start and accelerate classical NE solvers. Together, our results show the practicability of approximating NE through function approximation.

TMLR Journal 2023 Journal Article

JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games

  • Yang Li
  • Kun Xiong
  • Yingping Zhang
  • Jiangcheng Zhu
  • Stephen Marcus McAleer
  • Wei Pan
  • Jun Wang
  • Zonghong Dai

This paper presents an empirical exploration of non-transitivity in perfect-information games, specifically focusing on Xiangqi, a traditional Chinese board game comparable in game-tree complexity to chess and shogi. By analyzing over 10,000 records of human Xiangqi play, we highlight the existence of both transitive and non-transitive elements within the game’s strategic structure. To address non-transitivity, we introduce the JiangJun algorithm, an innovative combination of Monte-Carlo Tree Search (MCTS) and Policy Space Response Oracles (PSRO) designed to approximate a Nash equilibrium. We evaluate the algorithm empirically using a WeChat mini program and achieve a Master level with a 99.41% win rate against human players. The algorithm’s effectiveness in overcoming non-transitivity is confirmed by a plethora of metrics, such as relative population performance and visualization results. Our project site is available at https://sites.google.com/view/jiangjun-site/.

AAMAS Conference 2023 Conference Paper

Learning Structured Communication for Multi-Agent Reinforcement Learning

  • Junjie Sheng
  • Xiangfeng Wang
  • Bo Jin
  • Wenhao Li
  • Jun Wang
  • Junchi Yan
  • Tsung-Hui Chang
  • Hongyuan Zha

This paper investigates multi-agent reinforcement learning (MARL) communication mechanisms in large-scale scenarios. We propose a novel framework, Learning Structured Communication (LSC), that leverages a flexible and efficient communication topology. LSC enables adaptive agent grouping to create diverse hierarchical formations over episodes generated through an auxiliary task and a hierarchical routing protocol. We learn a hierarchical graph neural network with the formed topology that facilitates effective message generation and propagation between inter- and intra-group communications. Unlike state-of-the-art communication mechanisms, LSC possesses a detailed and learnable design for hierarchical communication. Numerical experiments on challenging tasks demonstrate that the proposed LSC exhibits high communication efficiency and global cooperation capability.

AAAI Conference 2023 Conference Paper

Learning to Shape Rewards Using a Game of Two Partners

  • David Mguni
  • Taher Jafferjee
  • Jianhong Wang
  • Nicolas Perez-Nieves
  • Wenbin Song
  • Feifei Tong
  • Matthew Taylor
  • Tianpei Yang

Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards. However, RS typically relies on manually engineered shaping-reward functions whose construction is time-consuming and error-prone. It also requires domain knowledge, which runs contrary to the goal of autonomous learning. We introduce Reinforcement Learning Optimising Shaping Algorithm (ROSA), an automated reward shaping framework in which the shaping-reward function is constructed in a Markov game between two agents. A reward-shaping agent (Shaper) uses switching controls to determine which states to add shaping rewards to for more efficient learning, while the other agent (Controller) learns the optimal policy for the task using these shaped rewards. We prove that ROSA, which adopts existing RL algorithms, learns to construct a shaping-reward function that is beneficial to the task, thus ensuring efficient convergence to high-performance policies. We demonstrate ROSA's properties in three didactic experiments and show its superior performance against state-of-the-art RS algorithms in challenging sparse-reward environments.

NeurIPS Conference 2023 Conference Paper

Lending Interaction Wings to Recommender Systems with Conversational Agents

  • Jiarui Jin
  • Xianyu Chen
  • Fanghua Ye
  • Mengyue Yang
  • Yue Feng
  • Weinan Zhang
  • Yong Yu
  • Jun Wang

An intelligent conversational agent (a.k.a., chatbot) could embrace conversational technologies to obtain user preferences online, overcoming the inherent limitations of recommender systems trained over offline historical user behaviors. In this paper, we propose CORE, a new offline-training and online-checking framework to plug a COnversational agent into REcommender systems. Unlike most prior conversational recommendation approaches that systematically combine the conversational and recommender parts through a reinforcement learning framework, CORE bridges the conversational agent and recommender system through a unified uncertainty minimization framework, which can be easily applied to any existing recommendation approach. Concretely, CORE treats a recommender system as an offline estimator that produces an estimated relevance score for each item, while it regards a conversational agent as an online checker that checks these estimated scores in each online session. We define uncertainty as the sum of unchecked relevance scores. In this regard, the conversational agent acts to minimize uncertainty by querying either attributes or items. Towards uncertainty minimization, we derive the certainty gain of querying each attribute and item, and develop a novel online decision tree algorithm to decide what to query at each turn. Our theoretical analysis reveals a bound on the expected number of turns of CORE in the cold-start setting. Experimental results demonstrate that CORE can be seamlessly employed on a variety of recommendation approaches, and can consistently bring significant improvements in both hot-start and cold-start settings.

AAAI Conference 2023 Conference Paper

Long-Tail Cross Modal Hashing

  • Zijun Gao
  • Jun Wang
  • Guoxian Yu
  • Zhongmin Yan
  • Carlotta Domeniconi
  • Jinglin Zhang

Existing Cross Modal Hashing (CMH) methods are mainly designed for balanced data, while imbalanced data with long-tail distributions are more common in the real world. Several long-tail hashing methods have been proposed, but they cannot adapt to multi-modal data due to the complex interplay between labels and the individuality and commonality information of multi-modal data. Furthermore, CMH methods mostly mine the commonality of multi-modal data to learn hash codes, which may override tail labels encoded by the individuality of respective modalities. In this paper, we propose LtCMH (Long-tail CMH) to handle imbalanced multi-modal data. LtCMH first adopts auto-encoders to mine the individuality and commonality of different modalities by minimizing the dependency between the individuality of respective modalities and by enhancing the commonality of these modalities. Then it dynamically combines the individuality and commonality with direct features extracted from respective modalities to create meta features that enrich the representation of tail labels, and binarizes the meta features to generate hash codes. LtCMH significantly outperforms state-of-the-art baselines on long-tail datasets and holds better (or comparable) performance on datasets with balanced labels.

JMLR Journal 2023 Journal Article

MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning

  • Ming Zhou
  • Ziyu Wan
  • Hanjing Wang
  • Muning Wen
  • Runzhe Wu
  • Ying Wen
  • Yaodong Yang
  • Yong Yu

Population-based multi-agent reinforcement learning (PB-MARL) encompasses a range of methods that merge dynamic population selection with multi-agent reinforcement learning (MARL) algorithms. While PB-MARL has demonstrated notable achievements in complex multi-agent tasks, its sequential execution is plagued by low computational efficiency due to the diversity in computing patterns and policy combinations. We propose a solution involving a stateless central task dispatcher and stateful workers to handle PB-MARL's subroutines, thereby capitalizing on parallelism across various components for efficient problem-solving. In line with this approach, we introduce MALib, a parallel framework that incorporates a task control model, independent data servers, and an abstraction of MARL training paradigms. The framework has undergone extensive testing and is available under the MIT license at https://github.com/sjtu-marl/malib.

JBHI Journal 2023 Journal Article

Multi-Scale Efficient Graph-Transformer for Whole Slide Image Classification

  • Saisai Ding
  • Juncheng Li
  • Jun Wang
  • Shihui Ying
  • Jun Shi

The multi-scale information among whole slide images (WSIs) is essential for cancer diagnosis. Although the existing multi-scale vision Transformer has shown its effectiveness for learning multi-scale image representations, it still cannot work well on gigapixel WSIs due to their extremely large image sizes. To this end, we propose a novel Multi-scale Efficient Graph-Transformer (MEGT) framework for WSI classification. The key idea of MEGT is to adopt two independent efficient Graph-based Transformer (EGT) branches to process the low-resolution and high-resolution patch embeddings (i.e., tokens in a Transformer) of WSIs, respectively, and then fuse these tokens via a multi-scale feature fusion module (MFFM). Specifically, we design an EGT to efficiently learn the local-global information of patch tokens, which integrates the graph representation into the Transformer to capture spatially related information of WSIs. Meanwhile, we propose a novel MFFM to alleviate the semantic gap among different-resolution patches during feature fusion, which creates a non-patch token for each branch as an agent to exchange information with the other branch via a cross-attention mechanism. In addition, to expedite network training, a new token pruning module is developed in EGT to reduce the redundant tokens. Extensive experiments on both TCGA-RCC and CAMELYON16 datasets demonstrate the effectiveness of the proposed MEGT.

JBHI Journal 2023 Journal Article

Multi-View Feature Transformation Based SVM+ for Computer-Aided Diagnosis of Liver Cancers With Ultrasound Images

  • Huili Zhang
  • Lehang Guo
  • Jun Wang
  • Shihui Ying
  • Jun Shi

It is feasible to improve the performance of B-mode ultrasound (BUS) based computer-aided diagnosis (CAD) for liver cancers by transferring knowledge from contrast-enhanced ultrasound (CEUS) images. In this work, we propose a novel feature transformation based support vector machine plus (SVM+) algorithm for this transfer learning task by introducing feature transformation into the SVM+ framework (named FSVM+). Specifically, the transformation matrix in FSVM+ is learned to minimize the radius of the enclosing ball of all samples, while the SVM+ is used to maximize the margin between two classes. Moreover, to capture more transferable information from multiple CEUS phase images, a multi-view FSVM+ (MFSVM+) is further developed, which transfers knowledge from three CEUS images from three phases, i.e., arterial phase, portal venous phase, and delayed phase, to the BUS-based CAD model. MFSVM+ innovatively assigns appropriate weights for each CEUS image by calculating the maximum mean discrepancy between a pair of BUS and CEUS images, which can capture the relationship between source and target domains. The experimental results on a bi-modal ultrasound liver cancer dataset demonstrate that MFSVM+ achieves the best classification accuracy of 88.24±1.28%, sensitivity of 88.32±2.88%, specificity of 88.17±2.91%, suggesting its effectiveness in promoting the diagnostic accuracy of BUS-based CAD.
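
The maximum mean discrepancy (MMD) based view weighting described above can be sketched with a standard RBF-kernel MMD estimator. The synthetic features and the inverse-MMD weighting rule below are assumptions for illustration; the paper's exact weighting scheme may differ.

```python
import numpy as np

def rbf_mmd2(X, Y, gamma=1.0):
    """Biased estimator of the squared maximum mean discrepancy between
    two samples under an RBF kernel; a smaller value suggests the two
    feature distributions are closer."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
bus = rng.normal(0.0, 1.0, size=(50, 4))             # B-mode features
ceus_phases = {                                      # synthetic CEUS views
    "arterial": rng.normal(0.2, 1.0, size=(50, 4)),  # close to BUS
    "portal":   rng.normal(1.5, 1.0, size=(50, 4)),  # further away
    "delayed":  rng.normal(3.0, 1.0, size=(50, 4)),  # furthest
}
mmds = {name: rbf_mmd2(bus, feats) for name, feats in ceus_phases.items()}

# turn discrepancies into view weights: smaller MMD -> larger weight
inv = {name: 1.0 / (m + 1e-6) for name, m in mmds.items()}
total = sum(inv.values())
weights = {name: v / total for name, v in inv.items()}
```

Under this rule the CEUS phase whose features best match the BUS domain contributes most to the transferred model, mirroring the source-target relationship the abstract describes.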

NeurIPS Conference 2023 Conference Paper

Online PCA in Converging Self-consistent Field Equations

  • Xihan Li
  • Xiang Chen
  • Rasul Tutunov
  • Haitham Bou Ammar
  • Lei Wang
  • Jun Wang

The self-consistent field (SCF) equation is a type of nonlinear eigenvalue problem in which the matrix to be eigen-decomposed is a function of its own eigenvectors. It is of great significance in computational science for its connection to the Schrödinger equation. Traditional fixed-point iteration methods for solving such equations suffer from non-convergence issues. In this work, we present a novel perspective on SCF equations as principal component analysis (PCA) for non-stationary time series, in which a distribution and its own top principal components are mutually updated over time, and the equilibrium state of the model corresponds to the solution of the SCF equations. Under this new perspective, online PCA techniques can be brought in to drive the model towards the equilibrium state, acting as a new set of tools for converging the SCF equations. With several numerical adaptations, we then develop a new algorithm for converging the SCF equation and demonstrate its high convergence capacity with experiments on both synthesized and real electronic structure scenarios.
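
The classical baseline the abstract refers to, damped fixed-point iteration on an SCF equation, can be shown on a toy instance: find v such that v is the lowest eigenvector of H(v). The problem instance and damping constant below are illustrative, not taken from the paper.

```python
import numpy as np

def solve_scf(H_of_v, n, damping=0.5, tol=1e-10, max_iter=500, seed=0):
    """Damped fixed-point iteration for a toy self-consistent field
    problem: repeatedly eigen-decompose H(v), take the lowest
    eigenvector, and mix it with the previous iterate."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=n)
    v /= np.linalg.norm(v)
    for _ in range(max_iter):
        _, U = np.linalg.eigh(H_of_v(v))
        v_new = U[:, 0]                      # lowest eigenvector of H(v)
        if v_new @ v < 0:                    # fix the sign ambiguity
            v_new = -v_new
        v_next = (1 - damping) * v_new + damping * v
        v_next /= np.linalg.norm(v_next)
        if np.linalg.norm(v_next - v) < tol:
            return v_next
        v = v_next
    return v

# toy SCF operator: a fixed matrix plus a weak density-dependent term
A = np.diag([1.0, 2.0, 3.0])
H = lambda v: A + 0.1 * np.outer(v, v) ** 2
v_star = solve_scf(H, 3)
# eigen-residual ||H(v)v - lambda v|| at the self-consistent solution
residual = np.linalg.norm(H(v_star) @ v_star
                          - (v_star @ H(v_star) @ v_star) * v_star)
```

On this well-conditioned toy problem the damped iteration converges; the non-convergent regimes motivating the paper arise for stiffer couplings, where the online-PCA view is proposed as an alternative.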

AAMAS Conference 2023 Conference Paper

PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination

  • Xingzhou Lou
  • Jiaxian Guo
  • Junge Zhang
  • Jun Wang
  • Kaiqi Huang
  • Yali Du

Zero-shot human-AI coordination holds the promise of collaborating with humans without human data. Prevailing methods try to train the ego agent with a population of partners via self-play. However, these methods suffer from two problems: 1) The diversity of a population with finite partners is limited, thereby limiting the capacity of the trained ego agent to collaborate with a novel human; 2) Current methods only provide a common best response for every partner in the population, which may result in poor zero-shot coordination performance with a novel partner or humans. To address these issues, we first propose the policy ensemble method to increase the diversity of partners in the population, and then develop a context-aware method enabling the ego agent to analyze and identify the partner's potential policy primitives so that it can take different actions accordingly. In this way, the ego agent is able to learn more universal cooperative behaviors for collaborating with diverse partners. We conduct experiments on the Overcooked environment, and evaluate the zero-shot human-AI coordination performance of our method with both behavior-cloned human proxies and real humans. The results demonstrate that our method significantly increases the diversity of partners and enables ego agents to learn more diverse behaviors than baselines, thus achieving state-of-the-art performance in all scenarios. We also open-source a human-AI coordination study framework on Overcooked for the convenience of future studies. Codes and demo videos are available at https://sites.google.com/view/pecan-overcooked.

JBHI Journal 2023 Journal Article

Reconstruction of Quantitative Susceptibility Mapping from Total Field Maps with Local Field Maps Guided UU-Net

  • Zheng Li
  • Shihui Ying
  • Jun Wang
  • Hongjian He
  • Jun Shi

Quantitative susceptibility mapping (QSM) is an emerging computational technique based on the magnetic resonance imaging (MRI) phase signal, which can provide magnetic susceptibility values of tissues. The existing deep learning-based models mainly reconstruct QSM from local field maps. However, the complicated inconsecutive reconstruction steps not only accumulate errors for inaccurate estimation, but also are inefficient in clinical practice. To this end, a novel local field maps guided UU-Net with Self- and Cross-Guided Transformer (LGUU-SCT-Net) is proposed to reconstruct QSM directly from the total field maps. Specifically, we propose to additionally generate the local field maps as the auxiliary supervision during the training stage. This strategy decomposes the more complicated mapping from total maps to QSM into two relatively easier ones, effectively alleviating the difficulty of direct mapping. Meanwhile, an improved U-Net model, named LGUU-SCT-Net, is further designed to promote the nonlinear mapping ability. The long-range connections are designed between two sequentially stacked U-Nets to bring more feature fusions and facilitate the information flow. The Self- and Cross-Guided Transformer integrated into these connections further captures multi-scale channel-wise correlations and guides the fusion of multi-scale transferred features, assisting in the more accurate reconstruction. The experimental results on an in-vivo dataset demonstrate the superior reconstruction results of our proposed algorithm.

AAAI Conference 2023 Conference Paper

Reinforcement Causal Structure Learning on Order Graph

  • Dezhi Yang
  • Guoxian Yu
  • Jun Wang
  • Zhengtian Wu
  • Maozu Guo

Learning a directed acyclic graph (DAG) that describes the causality of observed data is a very challenging but important task. Due to the limited quantity and quality of observed data, and the non-identifiability of the causal graph, it is almost impossible to infer a single precise DAG. Some methods approximate the posterior distribution of DAGs to explore the DAG space via Markov chain Monte Carlo (MCMC), but since the DAG space grows super-exponentially, accurately characterizing the whole distribution over DAGs is intractable. In this paper, we propose Reinforcement Causal Structure Learning on Order Graph (RCL-OG) that uses an order graph instead of MCMC to model different DAG topological orderings and to reduce the problem size. RCL-OG first defines reinforcement learning with a new reward mechanism to approximate the posterior distribution of orderings in an efficient way, and uses deep Q-learning to update and transfer rewards between nodes. Next, it obtains the probability transition model of nodes on the order graph, and computes the posterior probability of different orderings. In this way, we can sample on this model to obtain the ordering with high probability. Experiments on synthetic and benchmark datasets show that RCL-OG provides accurate posterior probability approximation and achieves better results than competitive causal discovery algorithms.
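
The order graph that RCL-OG operates on can be pictured as the lattice of node subsets, in which every path from the empty set to the full node set is one topological ordering. As a minimal illustration of that representation (not the paper's RL algorithm), the sketch below counts the topological orderings of a small DAG by dynamic programming over this lattice; the function name is hypothetical:

```python
def count_topological_orders(n, edges):
    """Count topological orderings of a DAG with nodes 0..n-1 by DP over
    the 'order graph': the lattice of node subsets, where each path from
    the empty set to the full set corresponds to one ordering."""
    preds = [0] * n                      # preds[v] = bitmask of v's parents
    for u, v in edges:
        preds[v] |= 1 << u
    full = (1 << n) - 1
    ways = [0] * (1 << n)                # ways[S] = #orderings placing exactly S
    ways[0] = 1
    for mask in range(full + 1):
        if ways[mask] == 0:
            continue
        for v in range(n):
            # v can be appended if it is unplaced and all its parents are placed
            if not (mask >> v) & 1 and preds[v] & mask == preds[v]:
                ways[mask | (1 << v)] += ways[mask]
    return ways[full]
```

For a chain 0→1→2 there is a single ordering, while an edgeless 3-node graph admits all 3! = 6; sampling orderings proportionally to these counts is the kind of operation the order-graph view makes tractable.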

AAAI Conference 2023 Conference Paper

Self-Decoupling and Ensemble Distillation for Efficient Segmentation

  • Yuang Liu
  • Wei Zhang
  • Jun Wang

Knowledge distillation (KD) is a promising teacher-student learning paradigm that transfers information from a cumbersome teacher to a student network. To avoid the training cost of a large teacher network, recent studies propose to distill knowledge from the student itself, called Self-KD. However, due to the limitations of the performance and capacity of the student, the soft-labels or features distilled by the student barely provide reliable guidance. Moreover, most Self-KD algorithms are specific to classification tasks based on soft-labels, and are not suitable for semantic segmentation. To alleviate these issues, we revisit the label and feature distillation problem in segmentation, and propose Self-Decoupling and Ensemble Distillation for Efficient Segmentation (SDES). Specifically, we design a decoupled prediction ensemble distillation (DPED) algorithm that generates reliable soft-labels with multiple expert decoders, and a decoupled feature ensemble distillation (DFED) mechanism to utilize more important channel-wise feature maps for encoder learning. The extensive experiments on three public segmentation datasets demonstrate the superiority of our approach and the efficacy of each component in the framework through the ablation study.

JBHI Journal 2023 Journal Article

Two-Stage Self-Supervised Cycle-Consistency Transformer Network for Reducing Slice Gap in MR Images

  • Zhiyang Lu
  • Jian Wang
  • Zheng Li
  • Shihui Ying
  • Jun Wang
  • Jun Shi
  • Dinggang Shen

Magnetic resonance (MR) images are usually acquired with large slice gap in clinical practice, i.e., low resolution (LR) along the through-plane direction. It is feasible to reduce the slice gap and reconstruct high-resolution (HR) images with the deep learning (DL) methods. To this end, the paired LR and HR images are generally required to train a DL model in a popular fully supervised manner. However, since the HR images are hardly acquired in clinical routine, it is difficult to get sufficient paired samples to train a robust model. Moreover, the widely used convolutional neural network (CNN) still cannot capture long-range image dependencies to combine useful information of similar contents, which are often spatially far away from each other across neighboring slices. To this end, a Two-stage Self-supervised Cycle-consistency Transformer Network (TSCTNet) is proposed to reduce the slice gap for MR images in this work. A novel self-supervised learning (SSL) strategy is designed with two stages respectively for robust network pre-training and specialized network refinement based on a cycle-consistency constraint. A hybrid Transformer and CNN structure is utilized to build an interpolation model, which explores both local and global slice representations. The experimental results on two public MR image datasets indicate that TSCTNet achieves superior performance over other compared SSL-based algorithms.

NeurIPS Conference 2023 Conference Paper

UltraRE: Enhancing RecEraser for Recommendation Unlearning via Error Decomposition

  • Yuyuan Li
  • Chaochao Chen
  • Yizhao Zhang
  • Weiming Liu
  • Lingjuan Lyu
  • Xiaolin Zheng
  • Dan Meng
  • Jun Wang

With growing concerns regarding privacy in machine learning models, regulations have committed to granting individuals the right to be forgotten while mandating companies to develop non-discriminatory machine learning systems, thereby fueling the study of the machine unlearning problem. Our attention is directed toward a practical unlearning scenario, i.e., recommendation unlearning. As the state-of-the-art framework, i.e., RecEraser, naturally achieves full unlearning completeness, our objective is to enhance it in terms of model utility and unlearning efficiency. In this paper, we rethink RecEraser from an ensemble-based perspective and focus on its three potential losses, i.e., redundancy, relevance, and combination. Under the theoretical guidance of the above three losses, we propose a new framework named UltraRE, which simplifies and powers RecEraser for recommendation tasks. Specifically, for redundancy loss, we incorporate transport weights in the clustering algorithm to optimize the equilibrium between collaboration and balance while enhancing efficiency; for relevance loss, we ensure that sub-models reach convergence on their respective group data; for combination loss, we simplify the combination estimator without compromising its efficacy. Extensive experiments on three real-world datasets demonstrate the effectiveness of UltraRE.

JBHI Journal 2022 Journal Article

A Convolutional Neural Network and Graph Convolutional Network Based Framework for Classification of Breast Histopathological Images

  • Zhiyang Gao
  • Zhiyang Lu
  • Jun Wang
  • Shihui Ying
  • Jun Shi

The spatial correlation among different tissue components is an essential characteristic for diagnosis of breast cancers based on histopathological images. Graph convolutional network (GCN) can effectively capture this spatial feature representation, and has been successfully applied to the histopathological image based computer-aided diagnosis (CAD). However, the current GCN-based approaches need complicated image preprocessing for graph construction. In this work, we propose a novel CAD framework for classification of breast histopathological images, which integrates both convolutional neural network (CNN) and GCN (named CNN-GCN) into a unified framework, where CNN learns high-level features from histopathological images for further adaptive graph construction, and the generated graph is then fed to GCN to learn the spatial features of histopathological images for the classification task. In particular, a novel clique GCN (cGCN) is proposed to learn more effective graph representation, which can arrange both forward and backward connections between any two graph convolution layers. Moreover, a new group graph convolution is further developed to replace the classical graph convolution of each layer in cGCN, so as to reduce redundant information and implicitly select superior fused feature representation. The proposed clique group GCN (cgGCN) is then embedded in the CNN-GCN framework (named CNN-cgGCN) to promote the learned spatial representation for diagnosis of breast cancers. The experimental results on two public breast histopathological image datasets indicate the effectiveness of the proposed CNN-cgGCN with superior performance to all the compared algorithms.

NeurIPS Conference 2022 Conference Paper

A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning

  • Bo Liu
  • Xidong Feng
  • Jie Ren
  • Luo Mai
  • Rui Zhu
  • Haifeng Zhang
  • Jun Wang
  • Yaodong Yang

Gradient-based Meta-RL (GMRL) refers to methods that maintain two-level optimisation procedures wherein the outer-loop meta-learner guides the inner-loop gradient-based reinforcement learner to achieve fast adaptations. In this paper, we develop a unified framework that describes variations of GMRL algorithms and points out that existing stochastic meta-gradient estimators adopted by GMRL are actually \textbf{biased}. Such meta-gradient bias comes from two sources: 1) the compositional bias incurred by the two-level problem structure, which has an upper bound of $\mathcal{O}\big(K\alpha^{K}\hat{\sigma}_{\text{In}}|\tau|^{-0.5}\big)$ \emph{w.r.t.} inner-loop update step $K$, learning rate $\alpha$, estimate variance $\hat{\sigma}^{2}_{\text{In}}$ and sample size $|\tau|$, and 2) the multi-step Hessian estimation bias $\hat{\Delta}_{H}$ due to the use of autodiff, which has a polynomial impact $\mathcal{O}\big((K-1)(\hat{\Delta}_{H})^{K-1}\big)$ on the meta-gradient bias. We study tabular MDPs empirically and offer quantitative evidence that testifies to our theoretical findings on existing stochastic meta-gradient estimators. Furthermore, we conduct experiments on Iterated Prisoner's Dilemma and Atari games to show how other methods such as off-policy learning and low-bias estimator can help fix the gradient bias for GMRL algorithms in general.

JBHI Journal 2022 Journal Article

AGMB-Transformer: Anatomy-Guided Multi-Branch Transformer Network for Automated Evaluation of Root Canal Therapy

  • Yunxiang Li
  • Guodong Zeng
  • Yifan Zhang
  • Jun Wang
  • Qun Jin
  • Lingling Sun
  • Qianni Zhang
  • Qisi Lian

Accurate evaluation of the treatment result on X-ray images is a significant and challenging step in root canal therapy since the incorrect interpretation of the therapy results will hamper timely follow-up which is crucial to the patients' treatment outcome. Nowadays, the evaluation is performed in a manual manner, which is time-consuming, subjective, and error-prone. In this article, we aim to automate this process by leveraging the advances in computer vision and artificial intelligence, to provide an objective and accurate method for root canal therapy result assessment. A novel anatomy-guided multi-branch Transformer (AGMB-Transformer) network is proposed, which first extracts a set of anatomy features and then uses them to guide a multi-branch Transformer network for evaluation. Specifically, we design a polynomial curve fitting segmentation strategy with the help of landmark detection to extract the anatomy features. Moreover, a branch fusion module and a multi-branch structure including our progressive Transformer and Group Multi-Head Self-Attention (GMHSA) are designed to focus on both global and local features for an accurate diagnosis. To facilitate the research, we have collected a large-scale root canal therapy evaluation dataset with 245 root canal therapy X-ray images, and the experiment results show that our AGMB-Transformer can improve the diagnosis accuracy from 57.96% to 90.20% compared with the baseline network. The proposed AGMB-Transformer can achieve a highly accurate evaluation of root canal therapy. To the best of our knowledge, our work is the first to perform automatic root canal therapy evaluation and has important clinical value to reduce the workload of endodontists.

JBHI Journal 2022 Journal Article

Diagnosis of Infantile Hip Dysplasia With B-Mode Ultrasound via Two-Stage Meta-Learning Based Deep Exclusivity Regularized Machine

  • Bangming Gong
  • Jing Shi
  • Xiangmin Han
  • Huan Zhang
  • Yuemin Huang
  • Liwei Hu
  • Jun Wang
  • Jun Du

The B-mode ultrasound (BUS) based computer-aided diagnosis (CAD) has shown its effectiveness for developmental dysplasia of the hip (DDH) in infants. In this work, a two-stage meta-learning based deep exclusivity regularized machine (TML-DERM) is proposed for the BUS-based CAD of DDH. TML-DERM integrates deep neural network (DNN) and exclusivity regularized machine into a unified framework to simultaneously improve the feature representation and classification performance. Moreover, the first-stage meta-learning is mainly conducted on the DNN module to alleviate the overfitting issue caused by the significantly increased parameters in DNN, and a random sampling strategy is adopted to self-generate the meta-tasks; while the second-stage meta-learning mainly learns the combination of multiple weak classifiers by a weight vector to improve the classification performance, and also optimizes the unified framework again. The experimental results on a DDH ultrasound dataset show the proposed TML-DERM algorithm achieves the superior classification performance with the mean accuracy of 85.89%, sensitivity of 86.54%, and specificity of 85.23%.

NeurIPS Conference 2022 Conference Paper

Enhancing Safe Exploration Using Safety State Augmentation

  • Aivar Sootla
  • Alexander Cowen-Rivers
  • Jun Wang
  • Haitham Bou Ammar

Safe exploration is a challenging and important problem in model-free reinforcement learning (RL). Often the safety cost is sparse and unknown, which unavoidably leads to constraint violations - a phenomenon ideally to be avoided in safety-critical applications. We tackle this problem by augmenting the state-space with a safety state, which is nonnegative if and only if the constraint is satisfied. The value of this state also serves as a distance toward constraint violation, while its initial value indicates the available safety budget. This idea allows us to derive policies for scheduling the safety budget during training. We call our approach Simmer (Safe policy IMproveMEnt for RL) to reflect the careful nature of these schedules. We apply this idea to two safe RL problems: RL with constraints imposed on an average cost, and RL with constraints imposed on a cost with probability one. Our experiments suggest that "simmering" a safe algorithm can improve safety during training for both settings. We further show that Simmer can stabilize training and improve the performance of safe RL with average constraints.
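
The core idea, augmenting the state with a nonnegative safety budget that measures distance to constraint violation, can be sketched as a simple environment wrapper. This is an illustrative sketch under an assumed `reset`/`step` interface that returns a per-step cost, not the authors' implementation; all names are hypothetical:

```python
class SafetyAugmentedEnv:
    """Minimal sketch of safety-state augmentation: observations are
    extended with a remaining safety budget z, decremented by the
    per-step cost, so that z >= 0 iff the cumulative-cost constraint
    still holds. The wrapped `env` is assumed to expose reset() -> obs
    and step(action) -> (obs, reward, cost, done)."""

    def __init__(self, env, safety_budget):
        self.env = env
        self.budget = safety_budget
        self.z = safety_budget

    def reset(self):
        self.z = self.budget              # initial value = available budget
        return (self.env.reset(), self.z)

    def step(self, action):
        obs, reward, cost, done = self.env.step(action)
        self.z -= cost                    # distance toward constraint violation
        return (obs, self.z), reward, cost, done or self.z < 0
```

Scheduling the safety budget during training, the "simmering" the abstract describes, then amounts to varying `safety_budget` across episodes.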

IJCAI Conference 2022 Conference Paper

FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis

  • Rongjie Huang
  • Max W. Y. Lam
  • Jun Wang
  • Dan Su
  • Dong Yu
  • Yi Ren
  • Zhou Zhao

Denoising diffusion probabilistic models (DDPMs) have recently achieved leading performance in many generative tasks. However, the cost of their inherent iterative sampling process has hindered their application to speech synthesis. This paper proposes FastDiff, a fast conditional diffusion model for high-quality speech synthesis. FastDiff employs a stack of time-aware location-variable convolutions of diverse receptive field patterns to efficiently model long-term time dependencies with adaptive conditions. A noise schedule predictor is also adopted to reduce the sampling steps without sacrificing the generation quality. Based on FastDiff, we design an end-to-end text-to-speech synthesizer, FastDiff-TTS, which generates high-fidelity speech waveforms without any intermediate feature (e.g., Mel-spectrogram). Our evaluation of FastDiff demonstrates state-of-the-art results with higher-quality (MOS 4.28) speech samples. Also, FastDiff enables a sampling speed 58x faster than real-time on a V100 GPU, making diffusion models practically applicable to speech synthesis deployment for the first time. We further show that FastDiff generalized well to the mel-spectrogram inversion of unseen speakers, and FastDiff-TTS outperformed other competing methods in end-to-end text-to-speech synthesis. Audio samples are available at https://FastDiff.github.io/.
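
The step-reduction idea behind a noise schedule predictor can be seen in a generic DDPM setup: training uses a long discretization, while sampling visits only a short sub-schedule of it. The sketch below shows a plain linear-beta schedule with naive uniform striding, a simplification of FastDiff's learned predictor; the function names and parameter values are illustrative:

```python
import numpy as np

def ddpm_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and cumulative signal level alpha_bar,
    as in a generic DDPM (not FastDiff's learned schedule)."""
    betas = np.linspace(beta_start, beta_end, T)
    alpha_bar = np.cumprod(1.0 - betas)   # strictly decreasing toward 0
    return betas, alpha_bar

def strided_steps(T, n_steps):
    """Pick a short sub-schedule of n_steps timesteps out of T: the
    basic mechanism that makes fewer-step sampling possible."""
    return np.linspace(0, T - 1, n_steps).round().astype(int)
```

With a 1000-step training schedule, `strided_steps(1000, 4)` yields the four timesteps `[0, 333, 666, 999]`; a learned predictor instead places those few steps where they least degrade quality.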

NeurIPS Conference 2022 Conference Paper

FR: Folded Rationalization with a Unified Encoder

  • Wei Liu
  • Haozhao Wang
  • Jun Wang
  • Ruixuan Li
  • Chao Yue
  • YuanKai Zhang

Rationalization aims to strengthen the interpretability of NLP models by extracting a subset of human-intelligible pieces of their input texts. Conventional works generally employ a two-phase model in which a generator selects the most important pieces, followed by a predictor that makes predictions based on the selected pieces. However, such a two-phase model may incur the degeneration problem where the predictor overfits to the noise generated by a not yet well-trained generator and, in turn, leads the generator to converge to a suboptimal model that tends to select senseless pieces. To tackle this challenge, we propose Folded Rationalization (FR) that folds the two phases of the rationale model into one from the perspective of text semantic extraction. The key idea of FR is to employ a unified encoder between the generator and predictor, based on which FR can facilitate a better predictor by access to valuable information blocked by the generator in the traditional two-phase model and thus bring a better generator. Empirically, we show that FR improves the F1 score by up to 10.3% as compared to state-of-the-art methods.

AAMAS Conference 2022 Conference Paper

GCS: Graph-Based Coordination Strategy for Multi-Agent Reinforcement Learning

  • Jingqing Ruan
  • Yali Du
  • Xuantang Xiong
  • Dengpeng Xing
  • Xiyun Li
  • Linghui Meng
  • Haifeng Zhang
  • Jun Wang

Many real-world scenarios involve a team of agents that have to coordinate their policies to achieve a shared goal. Previous studies mainly focus on decentralized control to maximize a common reward and barely consider the coordination among control policies, which is critical in dynamic and complicated environments. In this work, we propose factorizing the joint team policy into a graph generator and graph-based coordinated policy to enable coordinated behaviours among agents. The graph generator adopts an encoder-decoder framework that outputs directed acyclic graphs (DAGs) to capture the underlying dynamic decision structure. We also apply the DAGness-constrained and DAG depth-constrained optimization in the graph generator to balance efficiency and performance. The graph-based coordinated policy exploits the generated decision structure. The graph generator and coordinated policy are trained simultaneously to maximize the discounted return. Empirical evaluations on Collaborative Gaussian Squeeze, Cooperative Navigation, and Google Research Football demonstrate the superiority of the proposed method. The code is available at https://github.com/Amanda-1997/GCS_aamas337.

AAAI Conference 2022 Conference Paper

Generation-Focused Table-Based Intermediate Pre-training for Free-Form Question Answering

  • Peng Shi
  • Patrick Ng
  • Feng Nan
  • Henghui Zhu
  • Jun Wang
  • Jiarong Jiang
  • Alexander Hanbo Li
  • Rishav Chakravarti

Question answering over semi-structured tables has attracted significant attention in the NLP community. However, most of the existing work focuses on questions that can be answered with a short-form answer, i.e., the answer is often a table cell or an aggregation of multiple cells. This can mismatch the intents of users who want to ask more complex questions that require free-form answers such as explanations. To bridge the gap, most recently, pre-trained sequence-to-sequence language models such as T5 are used for generating free-form answers based on the question and table inputs. However, these pre-trained language models have weaker encoding abilities over table cells and schema. To mitigate this issue, in this work, we present an intermediate pre-training framework, Generation-focused Table-based Intermediate Pre-training (GENTAP), that jointly learns representations of natural language questions and tables. GENTAP learns to generate via two training objectives to enhance the question understanding and table representation abilities for complex questions. Based on experimental results, models that leverage the GENTAP framework outperform the existing baselines on the FETAQA benchmark. The pre-trained models are not only useful for free-form question answering, but also for the few-shot data-to-text generation task, thus showing good transfer ability by obtaining new state-of-the-art results.

JAIR Journal 2022 Journal Article

HEBO: Pushing The Limits of Sample-Efficient Hyper-parameter Optimisation

  • Alexander I. Cowen-Rivers
  • Wenlong Lyu
  • Rasul Tutunov
  • Zhi Wang
  • Antoine Grosnit
  • Ryan Rhys Griffiths
  • Alexandre Max Maraval
  • Hao Jianye

In this work we rigorously analyse assumptions inherent to black-box optimisation hyper-parameter tuning tasks. Our results on the Bayesmark benchmark indicate that heteroscedasticity and non-stationarity pose significant challenges for black-box optimisers. Based on these findings, we propose a Heteroscedastic and Evolutionary Bayesian Optimisation solver (HEBO). HEBO performs non-linear input and output warping, admits exact marginal log-likelihood optimisation and is robust to the values of learned parameters. We demonstrate HEBO's empirical efficacy on the NeurIPS 2020 Black-Box Optimisation challenge, where HEBO placed first. Upon further analysis, we observe that HEBO significantly outperforms existing black-box optimisers on 108 machine learning hyper-parameter tuning tasks comprising the Bayesmark benchmark. Our findings indicate that the majority of hyper-parameter tuning tasks exhibit heteroscedasticity and non-stationarity, multi-objective acquisition ensembles with Pareto front solutions improve queried configurations, and robust acquisition maximisers afford empirical advantages relative to their non-robust counterparts. We hope these findings may serve as guiding principles for practitioners of Bayesian optimisation.

JBHI Journal 2022 Journal Article

Joint Localization and Classification of Breast Cancer in B-Mode Ultrasound Imaging via Collaborative Learning With Elastography

  • Weichang Ding
  • Jun Wang
  • Weijun Zhou
  • Shichong Zhou
  • Cai Chang
  • Jun Shi

Convolutional neural networks (CNNs) have been successfully applied in the computer-aided ultrasound diagnosis for breast cancer. Up to now, several CNN-based methods have been proposed. However, most of them consider tumor localization and classification as two separate steps, rather than performing them simultaneously. Besides, they suffer from the limited diagnosis information in the B-mode ultrasound (BUS) images. In this study, we develop a novel network ResNet-GAP that incorporates both localization and classification into a unified procedure. To enhance the performance of ResNet-GAP, we leverage stiffness information in the elastography ultrasound (EUS) modality by collaborative learning in the training stage. Specifically, a dual-channel ResNet-GAP network is developed, one channel for BUS and the other for EUS. In each channel, multiple class activity maps (CAMs) are generated using a series of convolutional kernels of different sizes. The multi-scale consistency of the CAMs in both channels are further considered in network optimization. Experiments on 264 patients in this study show that the newly developed ResNet-GAP achieves an accuracy of 88.6%, a sensitivity of 95.3%, a specificity of 84.6%, and an AUC of 93.6% on the classification task, and a 1.0NLF of 87.9% on the localization task, which is better than some state-of-the-art approaches.

AAAI Conference 2022 Conference Paper

Learning to Identify Top Elo Ratings: A Dueling Bandits Approach

  • Xue Yan
  • Yali Du
  • Binxin Ru
  • Jun Wang
  • Haifeng Zhang
  • Xu Chen

The Elo rating system is widely adopted to evaluate the skills of (chess) game and sports players. Recently it has been also integrated into machine learning algorithms in evaluating the performance of computerised AI agents. However, an accurate estimation of the Elo rating (for the top players) often requires many rounds of competitions, which can be expensive to carry out. In this paper, to improve the sample efficiency of the Elo evaluation (for top players), we propose an efficient online match scheduling algorithm. Specifically, we identify and match the top players through a dueling bandits framework and tailor the bandit algorithm to the gradient-based update of Elo. We show that it reduces the per-step memory and time complexity to constant, compared to the traditional likelihood maximization approaches requiring O(t) time. Our algorithm has a regret guarantee of Õ(√T), sublinear in the number of competition rounds and has been extended to the multidimensional Elo ratings for handling intransitive games. We empirically demonstrate that our method achieves superior convergence speed and time efficiency on a variety of gaming tasks.
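
The gradient-based Elo update the scheduling algorithm is tailored to is the standard logistic one: each match moves both ratings by the prediction error. A minimal sketch of that update (the K-factor of 32 is a conventional default, not a value from the paper):

```python
def elo_expected(r_a, r_b):
    """Logistic win probability of player a against player b under Elo."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=32.0):
    """One gradient-style Elo step: shift both ratings by K times the
    prediction error. `score_a` is 1 for a win, 0.5 for a draw, 0 for a loss."""
    err = score_a - elo_expected(r_a, r_b)
    return r_a + k * err, r_b - k * err
```

Two equally rated players have expected score 0.5, so an upset-free win transfers exactly K/2 rating points; the dueling-bandits scheduler decides which pairs to match so these noisy updates concentrate on the top players.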

NeurIPS Conference 2022 Conference Paper

M2N: Mesh Movement Networks for PDE Solvers

  • Wenbin Song
  • Mingrui Zhang
  • Joseph G Wallwork
  • Junpeng Gao
  • Zheng Tian
  • Fanglei Sun
  • Matthew Piggott
  • Junqing Chen

Numerical Partial Differential Equation (PDE) solvers often require discretizing the physical domain by using a mesh. Mesh movement methods provide the capability to improve the accuracy of the numerical solution without introducing extra computational burden to the PDE solver, by increasing mesh resolution where the solution is not well-resolved, whilst reducing unnecessary resolution elsewhere. However, sophisticated mesh movement methods, such as the Monge-Ampère method, generally require the solution of auxiliary equations. These solutions can be extremely expensive to compute when the mesh needs to be adapted frequently. In this paper, we propose, to the best of our knowledge, the first learning-based end-to-end mesh movement framework for PDE solvers. Key requirements of learning-based mesh movement methods are: alleviating mesh tangling, boundary consistency, and generalization to meshes with different resolutions. To achieve these goals, we introduce the neural spline model and the graph attention network (GAT) into our models respectively. While the Neural-Spline based model provides more flexibility for large mesh deformation, the GAT based model can handle domains with more complicated shapes and is better at performing delicate local deformation. We validate our methods on stationary and time-dependent, linear and non-linear equations, as well as regularly and irregularly shaped domains. Compared to the traditional Monge-Ampère method, our approach can greatly accelerate the mesh adaptation process by three to four orders of magnitude, whilst achieving comparable numerical error reduction.

NeurIPS Conference 2022 Conference Paper

Multi-Agent Reinforcement Learning is a Sequence Modeling Problem

  • Muning Wen
  • Jakub Kuba
  • Runji Lin
  • Weinan Zhang
  • Ying Wen
  • Jun Wang
  • Yaodong Yang

Large sequence models (SM) such as the GPT series and BERT have displayed outstanding performance and generalization capabilities in natural language processing, vision, and recently reinforcement learning. A natural follow-up question is how to abstract multi-agent decision making as a sequence modeling problem as well and benefit from the prosperous development of SMs. In this paper, we introduce a novel architecture named Multi-Agent Transformer (MAT) that effectively casts cooperative multi-agent reinforcement learning (MARL) into an SM problem wherein the objective is to map agents' observation sequences to agents' optimal action sequences. Our goal is to build the bridge between MARL and SMs so that the modeling power of modern sequence models can be unleashed for MARL. Central to our MAT is an encoder-decoder architecture which leverages the multi-agent advantage decomposition theorem to transform the joint policy search problem into a sequential decision making process; this renders only linear time complexity for multi-agent problems and, most importantly, endows MAT with a monotonic performance improvement guarantee. Unlike prior arts such as Decision Transformer, which fit only pre-collected offline data, MAT is trained by online trial and error from the environment in an on-policy fashion. To validate MAT, we conduct extensive experiments on StarCraftII, Multi-Agent MuJoCo, Dexterous Hands Manipulation, and Google Research Football benchmarks. Results demonstrate that MAT achieves superior performance and data efficiency compared to strong baselines including MAPPO and HAPPO. Furthermore, we demonstrate that MAT is an excellent few-shot learner on unseen tasks regardless of changes in the number of agents. See our project page at https://sites.google.com/view/multi-agent-transformer.

AAAI Conference 2022 Conference Paper

Multi-Knowledge Aggregation and Transfer for Semantic Segmentation

  • Yuang Liu
  • Wei Zhang
  • Jun Wang

As a popular deep neural networks (DNN) compression technique, knowledge distillation (KD) has attracted increasing attention recently. Existing KD methods usually utilize one kind of knowledge in an intermediate layer of DNN for classification tasks to transfer useful information from cumbersome teacher networks to compact student networks. However, this paradigm is not very suitable for semantic segmentation, a comprehensive vision task based on both pixel-level and contextual information, since it cannot provide rich information for distillation. In this paper, we propose a novel multi-knowledge aggregation and transfer (MKAT) framework to comprehensively distill knowledge within an intermediate layer for semantic segmentation. Specifically, the proposed framework consists of three parts: Independent Transformers and Encoders module (ITE), Auxiliary Prediction Branch (APB), and Mutual Label Calibration (MLC) mechanism, which can take advantage of abundant knowledge from intermediate features. To demonstrate the effectiveness of our proposed approach, we conduct extensive experiments on three segmentation datasets: Pascal VOC, Cityscapes, and CamVid, showing that MKAT outperforms the other KD methods.

AAMAS Conference 2022 Conference Paper

Multiagent Q-learning with Sub-Team Coordination

  • Wenhan Huang
  • Kai Li
  • Kun Shao
  • Tianze Zhou
  • Jun Luo
  • Dongge Wang
  • Hangyu Mao
  • Jianye Hao

For cooperative multiagent reinforcement learning tasks, we propose a novel value factorization framework in the popular centralized training with decentralized execution paradigm, called multiagent Q-learning with sub-team coordination (QSCAN). This framework could flexibly exploit local coordination within sub-teams for effective factorization while honoring the individual-global-max (IGM) condition. QSCAN encompasses the full spectrum of sub-team coordination according to sub-team size, ranging from the monotonic value function class to the entire IGM function class, with familiar methods such as QMIX and QPLEX located at the respective extremes of the spectrum. Empirical results show that QSCAN’s performance dominates state-of-the-art methods in predator-prey tasks and the Switch challenge in MA-Gym.

NeurIPS Conference 2022 Conference Paper

Multiagent Q-learning with Sub-Team Coordination

  • Wenhan Huang
  • Kai Li
  • Kun Shao
  • Tianze Zhou
  • Matthew Taylor
  • Jun Luo
  • Dongge Wang
  • Hangyu Mao

In many real-world cooperative multiagent reinforcement learning (MARL) tasks, teams of agents can rehearse together before deployment, but then communication constraints may force individual agents to execute independently when deployed. Centralized training and decentralized execution (CTDE), which focuses mainly on this setting, has become increasingly popular in recent years. In the value-based MARL branch, a credit assignment mechanism is typically used to factorize the team reward into each individual's reward; individual-global-max (IGM) is a condition on the factorization ensuring that agents' action choices coincide with the team's optimal joint action. However, current architectures fail to consider local coordination within sub-teams that should be exploited for more effective factorization, leading to faster learning. We propose a novel value factorization framework, called multiagent Q-learning with sub-team coordination (QSCAN), to flexibly represent sub-team coordination while honoring the IGM condition. QSCAN encompasses the full spectrum of sub-team coordination according to sub-team size, ranging from the monotonic value function class to the entire IGM function class, with familiar methods such as QMIX and QPLEX located at the respective extremes of the spectrum. Experimental results show that QSCAN's performance dominates state-of-the-art methods in matrix games, predator-prey tasks, and the Switch challenge in MA-Gym. Additionally, QSCAN achieves comparable performance to those methods in a selection of StarCraft II micro-management tasks.
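
The monotonic end of that spectrum is easy to see concretely: if the joint value is a positively weighted (hence monotonic) mixture of per-agent utilities, the greedy joint action decomposes into each agent's individual greedy action, which is exactly the IGM condition. A toy numpy check, with arbitrary utilities and illustrative weights (not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
q1 = rng.normal(size=3)            # agent 1's utilities over 3 actions
q2 = rng.normal(size=4)            # agent 2's utilities over 4 actions
w = np.array([0.7, 1.3])           # positive weights => monotonic mixing

# Joint value table under a QMIX-style monotonic mixture of utilities
q_tot = w[0] * q1[:, None] + w[1] * q2[None, :]

# IGM: the greedy joint action equals the per-agent greedy actions,
# so decentralized argmaxes recover the centralized optimum.
joint = np.unravel_index(np.argmax(q_tot), q_tot.shape)
assert joint == (np.argmax(q1), np.argmax(q2))
```

Richer factorizations such as QPLEX (and the sub-team classes in between) keep this decomposition property while representing value functions that a purely monotonic mixture cannot.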

IJCAI Conference 2022 Conference Paper

On the Convergence of Fictitious Play: A Decomposition Approach

  • Yurong Chen
  • Xiaotie Deng
  • Chenchen Li
  • David Mguni
  • Jun Wang
  • Xiang Yan
  • Yaodong Yang

Fictitious play (FP) is one of the most fundamental game-theoretical learning frameworks for computing Nash equilibrium in n-player games, which builds the foundation for modern multi-agent learning algorithms. Although FP has provable convergence guarantees on zero-sum games and potential games, many real-world problems are often a mixture of both and the convergence property of FP has not been fully studied yet. In this paper, we extend the convergence results of FP to the combinations of such games and beyond. Specifically, we derive new conditions for FP to converge by leveraging game decomposition techniques. We further develop a linear relationship unifying cooperation and competition in the sense that these two classes of games are mutually transferable. Finally, we analyse a non-convergent example of FP, the Shapley game, and develop sufficient conditions for FP to converge.
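For readers unfamiliar with fictitious play, its core loop is short: each player best-responds to the opponent's empirical mixture of past plays. A minimal self-play sketch on matching pennies (a zero-sum game, so the classical convergence guarantee applies; the payoff matrix and iteration budget here are illustrative, not from the paper):

```python
# Matching pennies: the row player wins on a match (zero-sum).
A = [[1, -1],
     [-1, 1]]

def fictitious_play(A, iters=5000):
    n, m = len(A), len(A[0])
    row_counts, col_counts = [0] * n, [0] * m
    row_counts[0], col_counts[0] = 1, 1   # arbitrary initial pure plays
    for _ in range(iters):
        # Each player best-responds to the opponent's empirical mixture;
        # the column player's payoff is -A.
        row_br = max(range(n),
                     key=lambda i: sum(A[i][j] * col_counts[j] for j in range(m)))
        col_br = max(range(m),
                     key=lambda j: -sum(A[i][j] * row_counts[i] for i in range(n)))
        row_counts[row_br] += 1
        col_counts[col_br] += 1
    tr, tc = sum(row_counts), sum(col_counts)
    return [c / tr for c in row_counts], [c / tc for c in col_counts]

x, y = fictitious_play(A)
```

The empirical frequencies approach the mixed equilibrium (1/2, 1/2) even though actual play keeps cycling; this gap between belief convergence and play convergence is what makes the convergence analysis of FP subtle.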

TMLR Journal 2022 Journal Article

Online Double Oracle

  • Le Cong Dinh
  • Stephen Marcus McAleer
  • Zheng Tian
  • Nicolas Perez-Nieves
  • Oliver Slumbers
  • David Henry Mguni
  • Jun Wang
  • Haitham Bou Ammar

Solving strategic games with huge action spaces is a critical yet under-explored topic in economics, operations research and artificial intelligence. This paper proposes new learning algorithms for solving two-player zero-sum normal-form games where the number of pure strategies is prohibitively large. Specifically, we combine no-regret analysis from online learning with Double Oracle (DO) from game theory. Our method---\emph{Online Double Oracle (ODO)}---is provably convergent to a Nash equilibrium (NE). Most importantly, unlike normal DO, ODO is \emph{rational} in the sense that each agent in ODO can exploit a strategic adversary with a regret bound of $\mathcal{O}(\sqrt{ k \log(k)/T})$, where $k$ is not the total number of pure strategies, but rather the size of \emph{effective strategy set}. In many applications, we empirically show that $k$ is linearly dependent on the support size of the NE. On tens of different real-world matrix games, ODO outperforms DO, PSRO, and no-regret algorithms such as Multiplicative Weights Update by a significant margin, both in terms of convergence rate to a NE, and average payoff against strategic adversaries.
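ODO's inner loop runs a no-regret learner over the current effective strategy set. As a hedged illustration of that ingredient only (a plain Multiplicative Weights Update in self-play on rock-paper-scissors, not the ODO algorithm itself; the step size and horizon are arbitrary choices):

```python
import math

# Rock-paper-scissors payoff for the row player (zero-sum).
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def softmax(scores, eta):
    top = max(scores)                       # max-subtraction for stability
    exps = [math.exp(eta * (s - top)) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def mwu_self_play(A, T=5000, eta=0.05):
    n, m = len(A), len(A[0])
    sr = [0.1] + [0.0] * (n - 1)            # slight asymmetry leaves the fixed point
    sc = [0.0] * m
    avg_r, avg_c = [0.0] * n, [0.0] * m
    for _ in range(T):
        p, q = softmax(sr, eta), softmax(sc, eta)
        for i in range(n):
            avg_r[i] += p[i] / T
        for j in range(m):
            avg_c[j] += q[j] / T
        # Accumulate each action's payoff against the opponent's mixture.
        for i in range(n):
            sr[i] += sum(A[i][j] * q[j] for j in range(m))
        for j in range(m):
            sc[j] -= sum(A[i][j] * p[i] for i in range(n))
    return avg_r, avg_c

p_bar, q_bar = mwu_self_play(A)
```

The time-averaged strategies approach the uniform Nash equilibrium. A DO-style outer loop would run such a learner on a small restricted set and add best responses on demand, which is why the regret depends on the effective strategy set size $k$ rather than the full action space.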

NeurIPS Conference 2022 Conference Paper

Optimistic Tree Searches for Combinatorial Black-Box Optimization

  • Cedric Malherbe
  • Antoine Grosnit
  • Rasul Tutunov
  • Haitham Bou Ammar
  • Jun Wang

The optimization of combinatorial black-box functions is pervasive in computer science and engineering. However, the combinatorial explosion of the search space and lack of natural ordering pose significant challenges for current techniques from a theoretical and practical perspective, and require new algorithmic ideas. In this paper, we propose to adapt the recent advances in tree searches and partitioning techniques to design and analyze novel black-box combinatorial solvers. A first contribution is the analysis of a tree-search algorithm called Optimistic Lipschitz Tree Search (OLTS), which assumes the Lipschitz constant of the function to be known. Linear convergence rates are provided for this algorithm under specific conditions, improving upon the logarithmic rates of baselines. An adaptive version, called Optimistic Combinatorial Tree Search (OCTS), is then introduced for the more realistic setup where we do not have any information on the Lipschitz constant of the function. Similar theoretical guarantees are shown to hold for OCTS and a numerical assessment is provided to illustrate the potential of tree searches with respect to state-of-the-art methods over typical benchmarks.
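The optimistic principle behind this family of methods can be sketched as a best-first search over binary strings, where a known Lipschitz constant (here 1, with respect to Hamming distance) gives an upper bound on every subtree. This is an illustrative reconstruction under those assumptions, not the paper's OLTS/OCTS algorithm; the objective and the zero-completion representatives are made up:

```python
import heapq

# Toy objective: count bits matching a hidden target. Its Lipschitz
# constant w.r.t. Hamming distance is 1 (flipping one bit changes f by 1).
TARGET = [1, 0, 1, 1, 0, 1]
L_CONST = 1

def f(x):
    return sum(int(a == b) for a, b in zip(x, TARGET))

def optimistic_search(n):
    # A node fixes a prefix of bits; its representative completes the
    # prefix with zeros, and f(rep) + L * (#free bits) upper-bounds the
    # best value reachable anywhere in that subtree.
    def node(prefix):
        rep = prefix + [0] * (n - len(prefix))
        val = f(rep)
        ub = val + L_CONST * (n - len(prefix))
        return (-ub, -val, prefix, rep)

    heap = [node([])]
    best_val, best_x = -1, None
    while heap:
        neg_ub, neg_val, prefix, rep = heapq.heappop(heap)
        if -neg_ub <= best_val:
            break                      # optimism cannot beat the incumbent
        if -neg_val > best_val:
            best_val, best_x = -neg_val, rep
        if len(prefix) < n:            # expand: fix the next bit both ways
            heapq.heappush(heap, node(prefix + [0]))
            heapq.heappush(heap, node(prefix + [1]))
    return best_x, best_val

x_best, v_best = optimistic_search(len(TARGET))
```

A node is only expanded while its optimistic bound exceeds the incumbent, so subtrees that provably cannot contain the maximiser are never evaluated.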

JBHI Journal 2022 Journal Article

Self-Supervised Bi-Channel Transformer Networks for Computer-Aided Diagnosis

  • Ronglin Gong
  • Xiangmin Han
  • Jun Wang
  • Shihui Ying
  • Jun Shi

Self-supervised learning (SSL) can alleviate the issue of small sample size and has shown its effectiveness for computer-aided diagnosis (CAD) models. However, since the conventional SSL methods share the identical backbone in both the pretext and downstream tasks, the pretext network generally cannot be well trained in the pre-training stage, if the pretext task is totally different from the downstream one. In this work, we propose a novel task-driven SSL method, namely Self-Supervised Bi-channel Transformer Networks (SSBTN), to improve the diagnostic accuracy of a CAD model by enhancing SSL flexibility. In SSBTN, we innovatively integrate two different networks for the pretext and downstream tasks, respectively, into a unified framework. Consequently, the pretext task can be flexibly designed based on the data characteristics, and the corresponding designed pretext network thus learns more effective feature representation to be transferred to the downstream network. Furthermore, a transformer-based transfer module is developed to efficiently enhance knowledge transfer by conducting feature alignment between two different networks. The proposed SSBTN is evaluated on two publicly available datasets, namely the full-field digital mammography INbreast dataset and the wireless video capsule CrohnIPI dataset. The experimental results indicate that the proposed SSBTN outperforms all the compared algorithms.

AAAI Conference 2022 Conference Paper

Structural Landmarking and Interaction Modelling: A “SLIM” Network for Graph Classification

  • Yaokang Zhu
  • Kai Zhang
  • Jun Wang
  • Haibin Ling
  • Jie Zhang
  • Hongyuan Zha

Graph neural networks are a promising architecture for learning and inference with graph-structured data. Yet, how to generate informative, fixed-dimensional graph-level features for graphs with varying size and topology can still be challenging. Typically, this is achieved through graph-pooling, which summarizes a graph by compressing all its nodes into a single vector after convolutional operations. Is such a “collapsing-style” graph-pooling the only choice for graph classification? From a complex-systems point of view, properties of a complex system arise largely from the interaction among its components. Therefore, we speculate that preserving the interacting relation between parts, instead of pooling them together, could benefit system-level prediction. To verify this, we propose SLIM, a graph neural network model for Structural Landmarking and Interaction Modelling. The main idea is to compute a set of end-to-end optimizable sub-structure landmarks, so that any input graph can be projected onto these (spatially) local structural representatives for a faithful, global characterization. By doing this, explicit interaction between component parts of a graph can be leveraged directly in generating useful graph-level representations despite significant topological variations. Encouraging results are observed on benchmark datasets for graph classification, demonstrating the value of interaction modelling in the design of graph neural networks.

NeurIPS Conference 2022 Conference Paper

Understanding Square Loss in Training Overparametrized Neural Network Classifiers

  • Tianyang Hu
  • Jun Wang
  • Wenjia Wang
  • Zhenguo Li

Deep learning has achieved many breakthroughs in modern classification tasks. Numerous architectures have been proposed for different data structures but when it comes to the loss function, the cross-entropy loss is the predominant choice. Recently, several alternative losses have seen revived interest for deep classifiers. In particular, empirical evidence seems to promote square loss but a theoretical justification is still lacking. In this work, we contribute to the theoretical understanding of square loss in classification by systematically investigating how it performs for overparametrized neural networks in the neural tangent kernel (NTK) regime. Interesting properties regarding the generalization error, robustness, and calibration error are revealed. We consider two cases, according to whether classes are separable or not. In the general non-separable case, fast convergence rate is established for both misclassification rate and calibration error. When classes are separable, the misclassification rate improves to be exponentially fast. Further, the resulting margin is proven to be lower bounded away from zero, providing theoretical guarantees for robustness. We expect our findings to hold beyond the NTK regime and translate to practical settings. To this end, we conduct extensive empirical studies on practical neural networks, demonstrating the effectiveness of square loss in both synthetic low-dimensional data and real image data. Compared to cross-entropy, square loss has comparable generalization error but noticeable advantages in robustness and model calibration.

AAAI Conference 2021 Conference Paper

Adaptive Pattern-Parameter Matching for Robust Pedestrian Detection

  • Mengyin Liu
  • Chao Zhu
  • Jun Wang
  • Xu-Cheng Yin

Pedestrians with challenging patterns, e.g., small scale or heavy occlusion, appear frequently in practical applications like autonomous driving, which remains a tremendous obstacle to higher robustness of detectors. Although plenty of previous works have been dedicated to these problems, properly matching patterns of pedestrian and parameters of detector, i.e., constructing a detector with proper parameter sizes for certain pedestrian patterns of different complexity, has been seldom investigated intensively. Pedestrian instances are usually handled equally with the same amount of parameters, which in our opinion is inadequate for those with more difficult patterns and leads to unsatisfactory performance. Thus, we propose in this paper a novel detection approach via adaptive pattern-parameter matching. The input pedestrian patterns, especially the complex ones, are first disentangled into simpler patterns for detection head by Pattern Disentangling Module (PDM) with various receptive fields. Then, Gating Feature Filtering Module (GFFM) dynamically decides the spatial positions where the patterns are still not simple enough and need further disentanglement by the next-level PDM. Cooperating with these two key components, our approach can adaptively select the best matched parameter size for the input patterns according to their complexity. Moreover, to further explore the relationship between parameter sizes and their performance on the corresponding patterns, two parameter selection policies are designed: 1) extending parameter size to maximum, aiming at more difficult patterns for different occlusion types; 2) specializing parameter size by group division, aiming at complex patterns for scale variations. Extensive experiments on two popular benchmarks, Caltech and CityPersons, show that our proposed method achieves superior performance compared with other state-of-the-art methods on subsets of different scales and occlusion types.

JMLR Journal 2021 Journal Article

Are We Forgetting about Compositional Optimisers in Bayesian Optimisation?

  • Antoine Grosnit
  • Alexander I. Cowen-Rivers
  • Rasul Tutunov
  • Ryan-Rhys Griffiths
  • Jun Wang
  • Haitham Bou-Ammar

Bayesian optimisation presents a sample-efficient methodology for global optimisation. Within this framework, a crucial performance-determining subroutine is the maximisation of the acquisition function, a task complicated by the fact that acquisition functions tend to be non-convex and thus nontrivial to optimise. In this paper, we undertake a comprehensive empirical study of approaches to maximise the acquisition function. Additionally, by deriving novel, yet mathematically equivalent, compositional forms for popular acquisition functions, we recast the maximisation task as a compositional optimisation problem, allowing us to benefit from the extensive literature in this field. We highlight the empirical advantages of the compositional approach to acquisition function maximisation across 3958 individual experiments comprising synthetic optimisation tasks as well as tasks from Bayesmark. Given the generality of the acquisition function maximisation subroutine, we posit that the adoption of compositional optimisers has the potential to yield performance improvements across all domains in which Bayesian optimisation is currently being applied. An open-source implementation is made available at https://github.com/huawei-noah/noah-research/tree/CompBO/BO/HEBO/CompBO.

ICLR Conference 2021 Conference Paper

CT-Net: Channel Tensorization Network for Video Classification

  • Kunchang Li 0002
  • Xianhang Li
  • Yali Wang 0001
  • Jun Wang
  • Yu Qiao 0001

3D convolution is powerful for video classification but often computationally expensive; recent studies mainly focus on decomposing it along spatial-temporal and/or channel dimensions. Unfortunately, most approaches fail to achieve a preferable balance between convolutional efficiency and feature-interaction sufficiency. For this reason, we propose a concise and novel Channel Tensorization Network (CT-Net), by treating the channel dimension of input feature as a multiplication of K sub-dimensions. On one hand, it naturally factorizes convolution in a multi-dimensional way, leading to a light computation burden. On the other hand, it can effectively enhance feature interaction from different channels, and progressively enlarge the 3D receptive field of such interaction to boost classification accuracy. Furthermore, we equip our CT-Module with a Tensor Excitation (TE) mechanism. It can learn to exploit spatial, temporal and channel attention in a high-dimensional manner, to improve the cooperative power of all the feature dimensions in our CT-Module. Finally, we flexibly adapt ResNet as our CT-Net. Extensive experiments are conducted on several challenging video benchmarks, e.g., Kinetics-400, Something-Something V1 and V2. Our CT-Net outperforms a number of recent SOTA approaches, in terms of accuracy and/or efficiency.

AAMAS Conference 2021 Conference Paper

Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems

  • Yaodong Yang
  • Jun Luo
  • Ying Wen
  • Oliver Slumbers
  • Daniel Graves
  • Haitham Bou Ammar
  • Jun Wang
  • Matthew E. Taylor

Multiagent reinforcement learning (MARL) has achieved a remarkable amount of success in solving various types of video games. A cornerstone of this success is the auto-curriculum framework, which shapes the learning process by continually creating new challenging tasks for agents to adapt to, thereby facilitating the acquisition of new skills. In order to extend MARL methods to real-world domains outside of video games, we envision in this blue sky paper that maintaining a diversity-aware auto-curriculum is critical for successful MARL applications. Specifically, we argue that behavioural diversity is a pivotal, yet under-explored, component for real-world multiagent learning systems, and that significant work remains in understanding how to design a diversity-aware auto-curriculum. We list four open challenges for auto-curriculum techniques, which we believe deserve more attention from this community. Towards validating our vision, we recommend modelling realistic interactive behaviours in autonomous driving as an important test bed, and recommend the SMARTS/ULTRA benchmark.

AAAI Conference 2021 Conference Paper

Generalized Relation Learning with Semantic Correlation Awareness for Link Prediction

  • Yao Zhang
  • Xu Zhang
  • Jun Wang
  • Hongru Liang
  • Wenqiang Lei
  • Zhe Sun
  • Adam Jatowt
  • Zhenglu Yang

Developing link prediction models to automatically complete knowledge graphs (KGs) has recently been the focus of significant research interest. The current methods for the link prediction task have two natural problems: 1) the relation distributions in KGs are usually unbalanced, and 2) there are many unseen relations that occur in practical situations. These two problems limit the training effectiveness and practical applications of the existing link prediction models. We advocate a holistic understanding of KGs and we propose in this work a unified Generalized Relation Learning framework GRL to address the above two problems, which can be plugged into existing link prediction models. GRL conducts a generalized relation learning, which is aware of semantic correlations between relations that serve as a bridge to connect semantically similar relations. After training with GRL, the closeness of semantically similar relations in vector space and the discrimination of dissimilar relations are improved. We perform comprehensive experiments on six benchmarks to demonstrate the superior capability of GRL in the link prediction task. In particular, GRL is found to enhance the existing link prediction models making them insensitive to unbalanced relation distributions and capable of learning unseen relations.

AAAI Conference 2021 Conference Paper

Generative Semi-supervised Learning for Multivariate Time Series Imputation

  • Xiaoye Miao
  • Yangyang Wu
  • Jun Wang
  • Yunjun Gao
  • Xudong Mao
  • Jianwei Yin

Missing values, which widely exist in multivariate time series data, hinder effective data analysis. Existing time series imputation methods do not make full use of the label information in real-life time series data. In this paper, we propose a novel semi-supervised generative adversarial network model, named SSGAN, for missing value imputation in multivariate time series data. It consists of three players, i.e., a generator, a discriminator, and a classifier. The classifier predicts labels of time series data, and thus it drives the generator to estimate the missing values (or components), conditioned on observed components and data labels at the same time. We introduce a temporal reminder matrix to help the discriminator better distinguish the observed components from the imputed ones. Moreover, we theoretically prove that, SSGAN using the temporal reminder matrix and the classifier does learn to estimate missing values converging to the true data distribution when the Nash equilibrium is achieved. Extensive experiments on three public real-world datasets demonstrate that, SSGAN yields a more than 15% gain in performance, compared with the state-of-the-art methods.

AAAI Conference 2021 Short Paper

LAMS: A Location-aware Approach for Multimodal Summarization (Student Abstract)

  • Zhengkun Zhang
  • Jun Wang
  • Zhe Sun
  • Zhenglu Yang

Multimodal summarization aims to refine salient information from multiple modalities, among which texts and images are two mostly discussed ones. In recent years, many fantastic works have emerged in this field by modeling image-text interactions; however, they neglect the fact that most of multimodal documents have been elaborately organized by their writers. This means that a critical organized factor has long been short of enough attention, that is, image locations, which may carry illuminating information and imply the key contents of a document. To address this issue, we propose a location-aware approach for multimodal summarization (LAMS) based on Transformer. We investigate image locations for multimodal summarization via a stack of multimodal fusion blocks, which can formulate the high-order interactions among images and texts. An extensive experimental study on an extended multimodal dataset validates the superior summarization performance of the proposed model.

AAAI Conference 2021 Conference Paper

Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training

  • Peng Shi
  • Patrick Ng
  • Zhiguo Wang
  • Henghui Zhu
  • Alexander Hanbo Li
  • Jun Wang
  • Cicero Nogueira dos Santos
  • Bing Xiang

Most recently, there has been significant interest in learning contextual representations for various NLP tasks, by leveraging large scale text corpora to train large neural language models with self-supervised learning objectives, such as Masked Language Model (MLM). However, based on a pilot study, we observe three issues of existing general-purpose language models when they are applied to text-to-SQL semantic parsers: fail to detect column mentions in the utterances, fail to infer column mentions from cell values, and fail to compose complex SQL queries. To mitigate these issues, we present a model pre-training framework, Generation-Augmented Pre-training (GAP), that jointly learns representations of natural language utterances and table schemas by leveraging generation models to generate pre-training data. GAP MODEL is trained on 2M utterance-schema pairs and 30K utterance-schema-SQL triples, whose utterances are produced by generative models. Based on experimental results, neural semantic parsers that leverage GAP MODEL as a representation encoder obtain new state-of-the-art results on both SPIDER and CRITERIA-TO-SQL benchmarks.

AAMAS Conference 2021 Conference Paper

Learning Correlated Communication Topology in Multi-Agent Reinforcement learning

  • Yali Du
  • Bo Liu
  • Vincent Moens
  • Ziqi Liu
  • Zhicheng Ren
  • Jun Wang
  • Xu Chen
  • Haifeng Zhang

Communication improves the efficiency and convergence of multiagent learning. Existing studies of agent communication have been limited to predefined fixed connections. While an attention mechanism exists and is useful for scheduling the communication between agents, it largely ignores the dynamical nature of communication and thus the correlation between agents’ connections. In this work, we adopt a normalizing flow to encode correlation between agents’ interactions. The dynamical communication topology is directly learned by maximizing the agent rewards. In our end-to-end formulation, the communication structure is learned by considering it as a hidden dynamical variable. We realize centralized training of critics and graph reasoning policy, and decentralized execution from local observation and message that are received through the learned dynamical communication topology. Experiments on cooperative navigation in the particle world and adaptive traffic control tasks demonstrate the effectiveness of our method.

JBHI Journal 2021 Journal Article

Multi-Source Transfer Learning Via Multi-Kernel Support Vector Machine Plus for B-Mode Ultrasound-Based Computer-Aided Diagnosis of Liver Cancers

  • Huili Zhang
  • Lehang Guo
  • Dan Wang
  • Jun Wang
  • Lili Bao
  • Shihui Ying
  • Huixiong Xu
  • Jun Shi

B-mode ultrasound (BUS) imaging is a routine tool for diagnosis of liver cancers, while contrast-enhanced ultrasound (CEUS) provides additional information to BUS on the local tissue vascularization and perfusion to promote diagnostic accuracy. In this work, we propose to improve the BUS-based computer-aided diagnosis for liver cancers by transferring knowledge from the multi-view CEUS images, including the arterial phase, portal venous phase, and delayed phase, respectively. To make full use of the shared labels of paired BUS and CEUS images to guide knowledge transfer, support vector machine plus (SVM+), a specifically designed transfer learning (TL) classifier for paired data with shared labels, is adopted for this supervised TL. A nonparallel hyperplane based SVM+ (NHSVM+) is first proposed to improve the TL performance by transferring the per-class knowledge from source domain to the corresponding target domain. Moreover, to handle the issue of multi-source TL, a multi-kernel learning based NHSVM+ (MKL-NHSVM+) algorithm is further developed to effectively transfer multi-source knowledge from multi-view CEUS images. The experimental results indicate that the proposed MKL-NHSVM+ outperforms all the compared algorithms for diagnosis of liver cancers, whose mean classification accuracy, sensitivity, and specificity are 88.18 ± 3.16%, 86.98 ± 4.77%, and 89.42 ± 3.77%, respectively.

JBHI Journal 2021 Journal Article

Multiscale Attention Guided Network for COVID-19 Diagnosis Using Chest X-Ray Images

  • Jingxiong Li
  • Yaqi Wang
  • Shuai Wang
  • Jun Wang
  • Jun Liu
  • Qun Jin
  • Lingling Sun

Coronavirus disease 2019 (COVID-19) is one of the most destructive pandemics of the millennium, forcing the world to tackle a health crisis. Automated lung infections classification using chest X-ray (CXR) images could strengthen diagnostic capability when handling COVID-19. However, classifying COVID-19 from pneumonia cases using CXR image is a difficult task because of shared spatial characteristics, high feature variation and contrast diversity between cases. Moreover, massive data collection is impractical for a newly emerged disease, which limits the performance of data-thirsty deep learning models. To address these challenges, Multiscale Attention Guided deep network with Soft Distance regularization (MAG-SD) is proposed to automatically classify COVID-19 from pneumonia CXR images. In MAG-SD, MA-Net is used to produce prediction vector and attention from multiscale feature maps. To improve the robustness of trained model and relieve the shortage of training data, attention guided augmentations along with a soft distance regularization are posed, which aims at generating meaningful augmentations and reduce noise. Our multiscale attention model achieves better classification performance on our pneumonia CXR image dataset. Plentiful experiments are conducted for MAG-SD, which demonstrate its unique advantage in pneumonia classification over cutting-edge models. The code is available at https://github.com/JasonLeeGHub/MAG-SD.

NeurIPS Conference 2021 Conference Paper

Neural Auto-Curricula in Two-Player Zero-Sum Games

  • Xidong Feng
  • Oliver Slumbers
  • Ziyu Wan
  • Bo Liu
  • Stephen McAleer
  • Ying Wen
  • Jun Wang
  • Yaodong Yang

When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population. Within such a process, the update rules of "who to compete with" (i.e., the opponent mixture) and "how to beat them" (i.e., finding best responses) are underpinned by manually developed game theoretical principles such as fictitious play and Double Oracle. In this paper, we introduce a novel framework—Neural Auto-Curricula (NAC)—that leverages meta-gradient descent to automate the discovery of the learning update rule without explicit human design. Specifically, we parameterise the opponent selection module by neural networks and the best-response module by optimisation subroutines, and update their parameters solely via interaction with the game engine, where both players aim to minimise their exploitability. Surprisingly, even without human design, the discovered MARL algorithms achieve competitive or even better performance compared with the state-of-the-art population-based game solvers (e.g., PSRO) on Games of Skill, differentiable Lotto, non-transitive Mixture Games, Iterated Matching Pennies, and Kuhn Poker. Additionally, we show that NAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO on Leduc Poker. Our work inspires a promising future direction to discover general MARL algorithms solely from data.

AAAI Conference 2021 Conference Paper

News Content Completion with Location-Aware Image Selection

  • Zhengkun Zhang
  • Jun Wang
  • Adam Jatowt
  • Zhe Sun
  • Shao-Ping Lu
  • Zhenglu Yang

News, as one of the fundamental social media types, typically contains both texts and images. Image selection, which involves choosing appropriate images according to some specified contexts, is crucial for formulating good news. However, it presents two challenges: where to place images and which images to use. The difficulties associated with this where-which problem lie in the fact that news typically contains linguistically rich text that delivers complex information and more than one image. In this paper, we propose a novel end-to-end two-stage framework to address these issues comprehensively. In the first stage, we identify key information in news by using location embeddings, which represent the local contextual information of each candidate location for image insertion. Then, in the second stage, we thoroughly examine the candidate images and select the most context-related ones to insert into each location identified in the first stage. We also introduce three insertion strategies to formulate different scenarios influencing the image selection procedure. Extensive experiments demonstrate the consistent superiority of the proposed framework in image selection.

IJCAI Conference 2021 Conference Paper

Ordering-Based Causal Discovery with Reinforcement Learning

  • Xiaoqiang Wang
  • Yali Du
  • Shengyu Zhu
  • Liangjun Ke
  • Zhitang Chen
  • Jianye Hao
  • Jun Wang

It is a long-standing question to discover causal relations among a set of variables in many empirical sciences. Recently, Reinforcement Learning (RL) has achieved promising results in causal discovery from observational data. However, searching the space of directed graphs and enforcing acyclicity by implicit penalties tend to be inefficient and restrict the existing RL-based method to small scale problems. In this work, we propose a novel RL-based approach for causal discovery, by incorporating RL into the ordering-based paradigm. Specifically, we formulate the ordering search problem as a multi-step Markov decision process, implement the ordering generating process with an encoder-decoder architecture, and finally use RL to optimize the proposed model based on the reward mechanisms designed for each ordering. A generated ordering would then be processed using variable selection to obtain the final causal graph. We analyze the consistency and computational complexity of the proposed method, and empirically show that a pretrained model can be exploited to accelerate training. Experimental results on both synthetic and real data sets show that the proposed method achieves much improved performance over the existing RL-based method.

IJCAI Conference 2021 Conference Paper

Pairwise Half-graph Discrimination: A Simple Graph-level Self-supervised Strategy for Pre-training Graph Neural Networks

  • Pengyong Li
  • Jun Wang
  • Ziliang Li
  • Yixuan Qiao
  • Xianggen Liu
  • Fei Ma
  • Peng Gao
  • Sen Song

Self-supervised learning has gradually emerged as a powerful technique for graph representation learning. However, transferable, generalizable, and robust representation learning on graph data still remains a challenge for pre-training graph neural networks. In this paper, we propose a simple and effective self-supervised pre-training strategy, named Pairwise Half-graph Discrimination (PHD), that explicitly pre-trains a graph neural network at graph-level. PHD is designed as a simple binary classification task to discriminate whether two half-graphs come from the same source. Experiments demonstrate that PHD is an effective pre-training strategy that offers comparable or superior performance on 13 graph classification tasks compared with state-of-the-art strategies, and achieves notable improvements when combined with node-level strategies. Moreover, the visualization of learned representations reveals that the PHD strategy indeed empowers the model to learn graph-level knowledge like the molecular scaffold. These results have established PHD as a powerful and effective self-supervised learning strategy in graph-level representation learning.

AAAI Conference 2021 Conference Paper

Predicting Flashover Occurrence using Surrogate Temperature Data

  • Eugene Yujun Fu
  • Wai Cheong Tam
  • Jun Wang
  • Richard Peacock
  • Paul A Reneke
  • Grace Ngai
  • Hong Va Leong
  • Thomas Cleary

Fire fighter fatalities and injuries in the U.S. remain too high and fire fighting too hazardous. Until now, fire fighters have relied only on their experience to avoid life-threatening fire events, such as flashover. In this paper, we describe the development of a flashover prediction model which can be used to warn fire fighters before flashover occurs. Specifically, we consider the use of a fire simulation program to generate a set of synthetic data and an attention-based bidirectional long short-term memory network to learn the complex relationships between temperature signals and flashover conditions. We first validate the fire simulation program with temperature measurements obtained from full-scale fire experiments. Then, we generate a set of synthetic temperature data which account for realistic fire and vent opening conditions in a multi-compartment structure. Results show that our proposed method achieves promising performance for the prediction of flashover even when temperature data is completely lost in the room of fire origin. It is believed that the flashover prediction model can facilitate the transformation of fire fighting tactics from traditional experience-based decision making to data-driven decision making and reduce fire fighter deaths and injuries.

NeurIPS Conference 2021 Conference Paper

Settling the Variance of Multi-Agent Policy Gradients

  • Jakub Grudzien Kuba
  • Muning Wen
  • Linghui Meng
  • Shangding Gu
  • Haifeng Zhang
  • David Mguni
  • Jun Wang
  • Yaodong Yang

Policy gradient (PG) methods are popular reinforcement learning (RL) methods where a baseline is often applied to reduce the variance of gradient estimates. In multi-agent RL (MARL), although the PG theorem can be naturally extended, the effectiveness of multi-agent PG (MAPG) methods degrades as the variance of gradient estimates increases rapidly with the number of agents. In this paper, we offer a rigorous analysis of MAPG methods by, firstly, quantifying the contributions of the number of agents and agents' explorations to the variance of MAPG estimators. Based on this analysis, we derive the optimal baseline (OB) that achieves the minimal variance. In comparison to the OB, we measure the excess variance of existing MARL algorithms such as vanilla MAPG and COMA. For settings that use deep neural networks, we also propose a surrogate version of OB, which can be seamlessly plugged into any existing PG method in MARL. On benchmarks of Multi-Agent MuJoCo and StarCraft challenges, our OB technique effectively stabilises training and improves the performance of multi-agent PPO and COMA algorithms by a significant margin. Code is released at \url{https://github.com/morning9393/Optimal-Baseline-for-Multi-agent-Policy-Gradients}.

IJCAI Conference 2021 Conference Paper

State-Aware Value Function Approximation with Attention Mechanism for Restless Multi-armed Bandits

  • Shuang Wu
  • Jingyu Zhao
  • Guangjian Tian
  • Jun Wang

The restless multi-armed bandit (RMAB) problem is a generalization of the multi-armed bandit with non-stationary rewards. Its optimal solution is intractable due to exponentially large state and action spaces with respect to the number of arms. Existing approximation approaches, e.g., Whittle's index policy, have difficulty capturing either temporal or spatial factors such as impacts from other arms. We propose considering both factors using the attention mechanism, which has achieved great success in deep learning. Our state-aware value function approximation solution comprises an attention-based value function approximator and a Bellman equation solver. The attention-based coordination module captures both spatial and temporal factors for arm coordination. The Bellman equation solver utilizes the decoupling structure of RMABs to acquire solutions with significantly reduced computation overheads. In particular, the time complexity of our approximation is linear in the number of arms. Finally, we illustrate the effectiveness and investigate the properties of our proposed method with numerical experiments.

AAAI Conference 2021 Conference Paper

Tune-In: Training Under Negative Environments with Interference for Attention Networks Simulating Cocktail Party Effect

  • Jun Wang
  • Max W. Y. Lam
  • Dan Su
  • Dong Yu

We study the cocktail party problem and propose a novel attention network called Tune-In, short for training under negative environments with interference. It first learns two separate spaces of speaker-knowledge and speech-stimuli on top of a shared feature space, where a new block structure is designed as the building block for all spaces, and then cooperatively solves different tasks. Between the two spaces, information is cast towards each other via a novel cross- and dual-attention mechanism, mimicking the bottom-up and top-down processes of a human’s cocktail party effect. It turns out that substantially discriminative and generalizable speaker representations can be learnt in severely interfered conditions via our self-supervised training. The experimental results verify this seeming paradox. The learnt speaker embedding has superior discriminative power compared to a standard speaker verification method; meanwhile, Tune-In achieves remarkably better speech separation performance in terms of SI-SNRi and SDRi than state-of-the-art benchmark systems, consistently across all test modes and especially at lower memory and computational consumption.

AAAI Conference 2020 Conference Paper

Bi-Level Actor-Critic for Multi-Agent Coordination

  • Haifeng Zhang
  • Weizhe Chen
  • Zeren Huang
  • Minne Li
  • Yaodong Yang
  • Weinan Zhang
  • Jun Wang

Coordination is one of the essential problems in multi-agent systems. Typically, multi-agent reinforcement learning (MARL) methods treat agents equally and the goal is to solve the Markov game to an arbitrary Nash equilibrium (NE) when multiple equilibria exist, thus lacking a solution for NE selection. In this paper, we treat agents unequally and consider the Stackelberg equilibrium as a potentially better convergence point than the Nash equilibrium in terms of Pareto superiority, especially in cooperative environments. Under Markov games, we formally define the bi-level reinforcement learning problem of finding a Stackelberg equilibrium. We propose a novel bi-level actor-critic learning method that allows agents to have different knowledge bases (and thus different levels of intelligence), while their actions can still be executed simultaneously and in a distributed manner. A convergence proof is given, and the resulting learning algorithm is tested against the state of the art. We find that the proposed bi-level actor-critic algorithm successfully converges to the Stackelberg equilibria in matrix games and finds an asymmetric solution in a highway merge environment.
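For intuition on the solution concept, a pure-strategy Stackelberg equilibrium of a small matrix game can be found by brute-force enumeration: the leader commits to the action whose payoff is best given that the follower best-responds. The payoffs below are invented for illustration; the paper's bi-level actor-critic learns such solutions rather than enumerating them:

```python
def stackelberg(leader_payoff, follower_payoff):
    """Pure-strategy Stackelberg equilibrium of a finite matrix game,
    found by enumeration. payoff[i][j]: leader plays i, follower plays j."""
    best = None
    for i, follower_row in enumerate(follower_payoff):
        # The follower observes the leader's commitment i and best-responds.
        j = max(range(len(follower_row)), key=lambda a: follower_row[a])
        value = leader_payoff[i][j]
        if best is None or value > best[0]:
            best = (value, i, j)
    return best[1], best[2]

# Coordination game with two Nash equilibria, (0, 0) and (1, 1);
# leadership selects the Pareto-superior one.
leader = [[2, 0], [0, 1]]
follower = [[2, 0], [0, 1]]
equilibrium = stackelberg(leader, follower)
```

In this coordination game both (0, 0) and (1, 1) are Nash equilibria, but commitment by the leader singles out the Pareto-superior outcome, which is the NE-selection benefit the abstract describes.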

ICML Conference 2020 Conference Paper

ControlVAE: Controllable Variational Autoencoder

  • Huajie Shao
  • Shuochao Yao
  • Dachun Sun
  • Aston Zhang
  • Shengzhong Liu
  • Dongxin Liu
  • Jun Wang
  • Tarek F. Abdelzaher

Variational Autoencoders (VAE) and their variants have been widely used in a variety of applications, such as dialog generation, image generation and disentangled representation learning. However, the existing VAE models may suffer from KL vanishing in language modeling and low reconstruction quality for disentangling. To address these issues, we propose a novel controllable variational autoencoder framework, ControlVAE, that combines a controller, inspired by automatic control theory, with the basic VAE to improve the performance of the resulting generative models. Specifically, we design a new non-linear PI controller, a variant of proportional-integral-derivative (PID) control, to automatically tune the hyperparameter (weight) added to the VAE objective using the output KL divergence as feedback during model training. The framework is evaluated using three applications, namely language modeling, disentangled representation learning, and image generation. The results show that ControlVAE can achieve much better reconstruction quality than the competing methods at comparable disentanglement performance. For language modeling, it not only averts KL vanishing, but also improves the diversity of generated text. Finally, we also demonstrate that ControlVAE improves the reconstruction quality for image generation compared to the original VAE.
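The feedback loop described in the abstract can be sketched as a plain PI controller that raises the KL weight when the observed KL divergence exceeds its target and lowers it otherwise. This is a simplified linear sketch with invented gains and bounds; ControlVAE's actual controller uses a non-linear variant of the proportional term:

```python
class PIController:
    """Minimal PI controller for the KL weight (beta) in a VAE objective.

    Gains and bounds are illustrative, not the paper's values."""

    def __init__(self, kl_target, kp=0.01, ki=0.0001, beta_min=0.0, beta_max=1.0):
        self.kl_target = kl_target
        self.kp = kp
        self.ki = ki
        self.beta_min = beta_min
        self.beta_max = beta_max
        self.integral = 0.0

    def step(self, kl_observed):
        # Negative error (observed KL above target) pushes beta up,
        # penalizing the KL term harder; positive error lets beta fall.
        error = self.kl_target - kl_observed
        self.integral += error
        beta = self.beta_min - self.kp * error - self.ki * self.integral
        return min(max(beta, self.beta_min), self.beta_max)

ctrl = PIController(kl_target=3.0)
beta_up = ctrl.step(10.0)                            # KL too large -> beta rises
beta_down = PIController(kl_target=3.0).step(0.5)    # KL below target -> beta at floor
```

Calling `step` once per training iteration with the current KL divergence yields the weight to apply to the KL term at that iteration.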

AAAI Conference 2020 Conference Paper

Crowdfunding Dynamics Tracking: A Reinforcement Learning Approach

  • Jun Wang
  • Hefu Zhang
  • Qi Liu
  • Zhen Pan
  • Hanqing Tao

Recent years have witnessed increasing interest in research on crowdfunding mechanisms. In this area, dynamics tracking is a significant issue but is still under-explored. Existing studies either fit the fluctuations of time series or employ regularization terms to constrain learned tendencies. However, few of them take into account the inherent decision-making process between investors and crowdfunding dynamics. To address the problem, in this paper, we propose a Trajectory-based Continuous Control for Crowdfunding (TC3) algorithm to predict the funding progress in crowdfunding. Specifically, actor-critic frameworks are employed to model the relationship between investors and campaigns, where all of the investors are viewed as an agent that interacts with the environment derived from the real dynamics of campaigns. Then, to further explore the in-depth implications of patterns (i.e., typical characteristics) in funding series, we propose to subdivide them into fast-growing and slow-growing ones. Moreover, for the purpose of switching between different kinds of patterns, the actor component of TC3 is extended with a structure of options, which leads to TC3-Options. Finally, extensive experiments on the Indiegogo dataset not only demonstrate the effectiveness of our methods, but also validate our assumption that the entire pattern learned by TC3-Options is indeed the U-shaped one.

IJCAI Conference 2020 Conference Paper

Crowdsourcing with Multiple-Source Knowledge Transfer

  • Guangyang Han
  • Jinzheng Tu
  • Guoxian Yu
  • Jun Wang
  • Carlotta Domeniconi

Crowdsourcing is a new computing paradigm that harnesses human effort to solve computer-hard problems. Budget and quality are two fundamental factors in crowdsourcing, but they are antagonistic and their balance is crucially important. Induction and inference are principled ways for humans to acquire knowledge. Transfer learning can also enable induction and inference processes. When a new task comes, we may not know how to go about approaching it. On the other hand, we may have easy access to relevant knowledge that can help us with the new task. As such, via appropriate knowledge transfer, an improved annotation can be achieved for the task at a small cost. To make this idea concrete, we introduce the Crowdsourcing with Multiple-source Knowledge Transfer (CrowdMKT) approach to transfer knowledge from multiple, similar, but different domains for a new task, and to reduce the negative impact of irrelevant sources. CrowdMKT first learns a set of concentrated high-level feature vectors of tasks using knowledge transfer from multiple sources, and then introduces a probabilistic graphical model to jointly model the tasks with high-level features, workers, and their annotations. Finally, it adopts an EM algorithm to estimate the workers' strengths and consensus. Experimental results on real-world image and text datasets prove the effectiveness of CrowdMKT in improving quality and reducing the budget.

AAAI Conference 2020 Conference Paper

Differentially Private Learning with Small Public Data

  • Jun Wang
  • Zhi-Hua Zhou

Differentially private learning tackles tasks where the data are private and the learning process is subject to differential privacy requirements. In real applications, however, some public data are generally available in addition to private data, and it is interesting to consider how to exploit them. In this paper, we study a common situation where a small amount of public data can be used when solving the Empirical Risk Minimization problem over a private database. Specifically, we propose Private-Public Stochastic Gradient Descent, which utilizes such public information to adjust parameters in differentially private stochastic gradient descent and fine-tunes the final result with model reuse. Our method preserves differential privacy for the private database, and an empirical study validates its superiority compared with existing approaches.
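For context, the differentially private SGD step that such methods build on (per-example gradient clipping plus Gaussian noise) can be sketched as follows. The public-data adjustment and the privacy accounting are omitted, and all constants and names are illustrative:

```python
import math
import random

def dp_sgd_step(params, per_example_grads, clip_norm, noise_mult, lr, rng):
    """One DP-SGD step: clip each per-example gradient to clip_norm in
    L2 norm, sum, add Gaussian noise, average, and descend."""
    dim = len(params)
    summed = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / (norm + 1e-12))   # L2 clipping factor
        for k in range(dim):
            summed[k] += grad[k] * scale
    n = len(per_example_grads)
    sigma = noise_mult * clip_norm                     # Gaussian noise scale
    return [p - lr * ((summed[k] + rng.gauss(0.0, sigma)) / n)
            for k, p in enumerate(params)]

# With noise_mult=0 the step reduces to clipped SGD (no privacy guarantee).
new_params = dp_sgd_step([0.0, 0.0], [[30.0, 40.0]],
                         clip_norm=5.0, noise_mult=0.0, lr=0.1,
                         rng=random.Random(0))
```

Clipping bounds each example's influence on the update, which is what makes the added Gaussian noise sufficient for a differential privacy guarantee in the full algorithm.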

AAAI Conference 2020 Conference Paper

Document Summarization with VHTM: Variational Hierarchical Topic-Aware Mechanism

  • Xiyan Fu
  • Jun Wang
  • Jinghan Zhang
  • Jinmao Wei
  • Zhenglu Yang

Automatic text summarization focuses on distilling summary information from texts. This research field has been considerably explored over the past decades because of its significant role in many natural language processing tasks; however, two challenging issues block its further development: (1) how to yield a summarization model embedding topic inference rather than extending it with a pre-trained one, and (2) how to merge the latent topics into diverse granularity levels. In this study, we propose a variational hierarchical model, dubbed VHTM, to holistically address both issues. Different from previous work assisted by a pre-trained single-grained topic model, VHTM is the first attempt to jointly accomplish summarization with topic inference via a variational encoder-decoder and to merge topics into multi-grained levels through topic embedding and attention. Comprehensive experiments validate the superior performance of VHTM compared with the baselines, along with semantically consistent topics.

AAAI Conference 2020 Conference Paper

Label Enhancement with Sample Correlations via Low-Rank Representation

  • Haoyu Tang
  • Jihua Zhu
  • Qinghai Zheng
  • Jun Wang
  • Shanmin Pang
  • Zhongyu Li

Compared with single-label and multi-label annotations, label distribution describes an instance by multiple labels with different intensities and accommodates more general conditions. Nevertheless, label distribution learning is unavailable in many real-world applications because most existing datasets merely provide logical labels. To handle this problem, a novel label enhancement method, Label Enhancement with Sample Correlations via low-rank representation, is proposed in this paper. Unlike most existing methods, a low-rank representation method is employed to capture the global relationships of samples and predict implicit label correlations to achieve label enhancement. Extensive experiments on 14 datasets demonstrate that the algorithm accomplishes state-of-the-art results compared to previous label enhancement baselines.

AAAI Conference 2020 Conference Paper

Learning to Communicate Implicitly by Actions

  • Zheng Tian
  • Shihao Zou
  • Ian Davies
  • Tim Warr
  • Lisheng Wu
  • Haitham Bou Ammar
  • Jun Wang

In situations where explicit communication is limited, human collaborators act by learning to: (i) infer meaning behind their partner’s actions, and (ii) convey private information about the state to their partner implicitly through actions. The first component of this learning process has been well-studied in multi-agent systems, whereas the second — which is equally crucial for successful collaboration — has not. To mimic both components mentioned above, thereby completing the learning process, we introduce a novel algorithm: Policy Belief Learning (PBL). PBL uses a belief module to model the other agent’s private information and a policy module to form a distribution over actions informed by the belief module. Furthermore, to encourage communication by actions, we propose a novel auxiliary reward which incentivizes one agent to help its partner to make correct inferences about its private information. The auxiliary reward for communication is integrated into the learning of the policy module. We evaluate our approach on a set of environments including a matrix game, a particle environment, and the non-competitive bidding problem from contract bridge. We show empirically that this auxiliary reward is effective and easy to generalize. These results demonstrate that our PBL algorithm can produce strong pairs of agents in collaborative games where explicit communication is disabled.

AAAI Conference 2020 Short Paper

Learning to Model Opponent Learning (Student Abstract)

  • Ian Davies
  • Zheng Tian
  • Jun Wang

Multi-Agent Reinforcement Learning (MARL) considers settings in which a set of coexisting agents interact with one another and their environment. The adaptation and learning of other agents induces non-stationarity in the environment dynamics. This poses a great challenge for value function-based algorithms whose convergence usually relies on the assumption of a stationary environment. Policy search algorithms also struggle in multi-agent settings as the partial observability resulting from an opponent's actions not being known introduces high variance to policy training. Modelling an agent's opponent(s) is often pursued as a means of resolving the issues arising from the coexistence of learning opponents. An opponent model provides an agent with some ability to reason about other agents to aid its own decision making. Most prior works learn an opponent model by assuming the opponent is employing a stationary policy or switching between a set of stationary policies. Such an approach can reduce the variance of training signals for policy search algorithms. However, in the multi-agent setting, agents have an incentive to continually adapt and learn. This means that the assumptions concerning opponent stationarity are unrealistic. In this work, we develop a novel approach to modelling an opponent's learning dynamics which we term Learning to Model Opponent Learning (LeMOL). We show our structured opponent model is more accurate and stable than naive behaviour cloning baselines. We further show that opponent modelling can improve the performance of algorithmic agents in multi-agent settings.

IJCAI Conference 2020 Conference Paper

Modelling Bounded Rationality in Multi-Agent Interactions by Generalized Recursive Reasoning

  • Ying Wen
  • Yaodong Yang
  • Jun Wang

Most multi-agent reinforcement learning (MARL) models assume perfectly rational agents -- a property hardly met in real-world decision making due to individuals' cognitive limitations and/or the intractability of the decision problem. In this paper, we introduce generalized recursive reasoning (GR2) as a novel framework to model agents with different \emph{hierarchical} levels of rationality; our framework enables agents to exhibit varying levels of ``thinking'' ability, thereby allowing higher-level agents to best respond to less sophisticated learners. We contribute both theoretically and empirically. On the theory side, we devise the hierarchical framework of GR2 through probabilistic graphical models and prove the existence of a perfect Bayesian equilibrium. Within GR2, we propose a practical actor-critic solver, and demonstrate its convergence to a stationary point in two-player games through Lyapunov analysis. On the empirical side, we validate our findings on a variety of MARL benchmarks. Precisely, we first illustrate the hierarchical thinking process on the Keynes Beauty Contest, and then demonstrate significant improvements compared to state-of-the-art opponent modeling baselines on normal-form games and the cooperative navigation benchmark.

AAAI Conference 2020 Conference Paper

Multi-View Multiple Clusterings Using Deep Matrix Factorization

  • Shaowei Wei
  • Jun Wang
  • Guoxian Yu
  • Carlotta Domeniconi
  • Xiangliang Zhang

Multi-view clustering aims at integrating complementary information from multiple heterogeneous views to improve clustering results. Existing multi-view clustering solutions can only output a single clustering of the data. Due to their multiplicity, multi-view data can have different groupings that are reasonable and interesting from different perspectives. However, how to find multiple, meaningful, and diverse clustering results from multi-view data is still a rarely studied and challenging topic in multi-view clustering and multiple clusterings. In this paper, we introduce a deep matrix factorization based solution (DMClusts) to discover multiple clusterings. DMClusts gradually factorizes multi-view data matrices into representational subspaces layer by layer and generates one clustering in each layer. To enforce diversity between the generated clusterings, it minimizes a new redundancy quantification term derived from the proximity between samples in these subspaces. We further introduce an iterative optimization procedure to simultaneously seek multiple clusterings with quality and diversity. Experimental results on benchmark datasets confirm that DMClusts outperforms state-of-the-art multiple clustering solutions.

AAAI Conference 2020 Conference Paper

Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning

  • Hangyu Mao
  • Wulong Liu
  • Jianye Hao
  • Jun Luo
  • Dong Li
  • Zhengchao Zhang
  • Jun Wang
  • Zhen Xiao

Social psychology and real experiences show that cognitive consistency plays an important role in keeping human society in order: if people have a more consistent cognition about their environment, they are more likely to achieve better cooperation. Meanwhile, only cognitive consistency within a neighborhood matters because humans interact directly only with their neighbors. Inspired by these observations, we take the first step to introduce neighborhood cognitive consistency (NCC) into multi-agent reinforcement learning (MARL). Our NCC design is quite general and can be easily combined with existing MARL methods. As examples, we propose neighborhood cognition consistent deep Q-learning and Actor-Critic to facilitate large-scale multi-agent cooperation. Extensive experiments on several challenging tasks (i.e., packet routing, wifi configuration and Google football player control) justify the superior performance of our methods compared with state-of-the-art MARL approaches.

AAAI Conference 2020 Conference Paper

NeoNav: Improving the Generalization of Visual Navigation via Generating Next Expected Observations

  • Qiaoyun Wu
  • Dinesh Manocha
  • Jun Wang
  • Kai Xu

We propose improving the cross-target and cross-scene generalization of visual navigation through learning an agent that is guided by conceiving the next observations it expects to see. This is achieved by learning a variational Bayesian model, called NeoNav, which generates the next expected observations (NEO) conditioned on the current observations of the agent and the target view. Our generative model is learned through optimizing a variational objective encompassing two key designs. First, the latent distribution is conditioned on current observations and the target view, leading to a model-based, target-driven navigation. Second, the latent space is modeled with a Mixture of Gaussians conditioned on the current observation and the next best action. Our use of a mixture-of-posteriors prior effectively alleviates the issue of over-regularized latent space, thus significantly boosting the model generalization for new targets and in novel scenes. Moreover, the NEO generation models the forward dynamics of agent-environment interaction, which improves the quality of approximate inference and hence benefits data efficiency. We have conducted extensive evaluations on both real-world and synthetic benchmarks, and show that our model consistently outperforms the state-of-the-art models in terms of success rate, data efficiency, and generalization.

NeurIPS Conference 2020 Conference Paper

Replica-Exchange Nosé-Hoover Dynamics for Bayesian Learning on Large Datasets

  • Rui Luo
  • Qiang Zhang
  • Yaodong Yang
  • Jun Wang

In this paper, we present a new practical method for Bayesian learning that can rapidly draw representative samples from complex posterior distributions with multiple isolated modes in the presence of mini-batch noise. This is achieved by simulating a collection of replicas in parallel with different temperatures and periodically swapping them. When evolving the replicas' states, the Nosé-Hoover dynamics is applied, which adaptively neutralizes the mini-batch noise. To perform proper exchanges, a new protocol is developed with a noise-aware test of acceptance, by which detailed balance is preserved in an asymptotic way. While its efficacy on complex multimodal posteriors has been illustrated by tests over synthetic distributions, experiments with deep Bayesian neural networks on large-scale datasets have shown its significant improvements over strong baselines.

AAAI Conference 2020 Conference Paper

To Avoid the Pitfall of Missing Labels in Feature Selection: A Generative Model Gives the Answer

  • Yuanyuan Xu
  • Jun Wang
  • Jinmao Wei

In multi-label learning, instances have a large number of noisy and irrelevant features, and each instance is associated with a set of class labels wherein label information is generally incomplete. Missing labels are like the two sides of a coin: one cannot predict whether the information they provide for feature selection is favorable (relevant) or not (irrelevant). Existing approaches either superficially consider the missing labels as negative or indiscreetly impute them with predicted values, which may either overestimate unobserved labels or introduce new noise into the selection of discriminative features. To avoid the pitfall of missing labels, a novel unified framework for selecting discriminative features and modeling the incomplete label matrix is proposed from a generative point of view in this paper. Concretely, we relax the Smoothness Assumption to infer label observability, which can reveal the positions of unobserved labels, and employ the spike-and-slab prior to perform feature selection by excluding unobserved labels. A data-augmentation strategy leads to full local conjugacy in our model, facilitating a simple and efficient Expectation Maximization (EM) algorithm for inference. Quantitative and qualitative experimental results demonstrate the superiority of the proposed approach under various evaluation metrics.

IJCAI Conference 2020 Conference Paper

Weakly-Supervised Multi-view Multi-instance Multi-label Learning

  • Yuying Xing
  • Guoxian Yu
  • Jun Wang
  • Carlotta Domeniconi
  • Xiangliang Zhang

Multi-view, Multi-instance, and Multi-label Learning (M3L) can model complex objects (bags), which are represented with different feature views, made of diverse instances, and annotated with discrete non-exclusive labels. Existing M3L approaches assume a complete correspondence between bags and views, and also assume a complete annotation for training. However, in practice, neither the correspondence between bags nor the bags' annotations is complete. To tackle such a weakly-supervised M3L task, a solution called WSM3L is introduced. WSM3L adapts multimodal dictionary learning to learn a shared dictionary (representational space) across views and individual encoding vectors of bags for each view. The label similarity and feature similarity of encoded bags are jointly used to match bags across views. In addition, it replenishes the annotations of a bag based on the annotations of its neighborhood bags, and introduces a dispatch and aggregation term to dispatch bag-level annotations to instances and to reversely aggregate instance-level annotations to bags. WSM3L unifies these objectives and processes in a joint objective function to predict the instance-level and bag-level annotations in a coordinated fashion, and it further introduces an alternative solution for the objective function optimization. Extensive experimental results show the effectiveness of WSM3L on benchmark datasets.

IJCAI Conference 2019 Conference Paper

A Regularized Opponent Model with Maximum Entropy Objective

  • Zheng Tian
  • Ying Wen
  • Zhichen Gong
  • Faiz Punakkath
  • Shihao Zou
  • Jun Wang

In a single-agent setting, reinforcement learning (RL) tasks can be cast into an inference problem by introducing a binary random variable o, which stands for "optimality". In this paper, we redefine the binary random variable o in the multi-agent setting and formalize multi-agent reinforcement learning (MARL) as probabilistic inference. We derive a variational lower bound of the likelihood of achieving the optimality and name it Regularized Opponent Model with Maximum Entropy Objective (ROMMEO). From ROMMEO, we present a novel perspective on opponent modeling and show how it can improve the performance of training agents theoretically and empirically in cooperative games. To optimize ROMMEO, we first introduce a tabular Q-iteration method, ROMMEO-Q, with a proof of convergence. We extend the exact algorithm to complex environments by proposing an approximate version, ROMMEO-AC. We evaluate the two algorithms on a challenging iterated matrix game and a differential game, respectively, and show that they can outperform strong MARL baselines.

IJCAI Conference 2019 Conference Paper

ActiveHNE: Active Heterogeneous Network Embedding

  • Xia Chen
  • Guoxian Yu
  • Jun Wang
  • Carlotta Domeniconi
  • Zhao Li
  • Xiangliang Zhang

Heterogeneous network embedding (HNE) is a challenging task due to the diverse node types and/or diverse relationships between nodes. Existing HNE methods are typically unsupervised. To maximize the benefit of the rare and valuable supervised information in HNE, we develop a novel Active Heterogeneous Network Embedding (ActiveHNE) framework, which includes two components: Discriminative Heterogeneous Network Embedding (DHNE) and Active Query in Heterogeneous Networks (AQHN). In DHNE, we introduce a novel semi-supervised heterogeneous network embedding method based on a graph convolutional neural network. In AQHN, we first introduce three active selection strategies based on uncertainty and representativeness, and then derive a batch selection method that assembles these strategies using a multi-armed bandit mechanism. ActiveHNE aims at improving the performance of HNE by feeding the most valuable supervision obtained by AQHN into DHNE. Experiments on public datasets demonstrate the effectiveness of ActiveHNE and its advantage in reducing the query cost.

IJCAI Conference 2019 Conference Paper

Community Detection and Link Prediction via Cluster-driven Low-rank Matrix Completion

  • Junming Shao
  • Zhong Zhang
  • Zhongjing Yu
  • Jun Wang
  • Yi Zhao
  • Qinli Yang

Community detection and link prediction are highly interdependent: knowing the cluster structure a priori helps identify missing links, and in return, clustering on networks with supplemented missing links improves community detection performance. In this paper, we propose Cluster-driven Low-rank Matrix Completion (CLMC) for performing community detection and link prediction simultaneously in a unified framework. To this end, CLMC decomposes the adjacency matrix of a target network into three additive matrices: a clustering matrix, a noise matrix and a supplement matrix. Community-structure and low-rank constraints are imposed on the clustering matrix, such that noisy edges between communities are removed and the resulting matrix is an ideal block-diagonal matrix. Missing edges are further learned via low-rank matrix completion. Extensive experiments show that CLMC achieves state-of-the-art performance.

ICML Conference 2019 Conference Paper

Greedy Orthogonal Pivoting Algorithm for Non-Negative Matrix Factorization

  • Kai Zhang
  • Sheng Zhang
  • Jun Liu
  • Jun Wang
  • Jie Zhang

Non-negative matrix factorization (NMF) is a powerful tool for learning useful representations of data and has been widely applied in many problems such as data mining and signal processing. Orthogonal NMF, which can improve the locality of decomposition, has drawn considerable interest in solving clustering problems in recent years. However, imposing simultaneous non-negative and orthogonal structure can be quite difficult, so existing algorithms can only solve it approximately. To address this challenge, we propose an innovative procedure called the Greedy Orthogonal Pivoting Algorithm (GOPA). The GOPA algorithm fully exploits the sparsity of non-negative orthogonal solutions to break the global problem into a series of local optimizations, in which an adaptive subset of coordinates is updated in a greedy, closed-form manner. The biggest advantage of GOPA is that it promotes exact orthogonality, and it provides solid empirical evidence that stronger orthogonality does contribute favorably to better clustering performance. In addition, we design randomized and parallel versions of GOPA, which can further reduce the computational cost and improve accuracy, making the method suitable for large data.

AAAI Conference 2019 Conference Paper

Learning Adaptive Random Features

  • Yanjun Li
  • Kai Zhang
  • Jun Wang
  • Sanjiv Kumar

Random Fourier features are a powerful framework to approximate shift-invariant kernels with Monte Carlo integration, which has drawn considerable interest in scaling up kernel-based learning, dimensionality reduction, and information retrieval. In the literature, many sampling schemes have been proposed to improve the approximation performance. However, an interesting theoretical and algorithmic challenge still remains, i.e., how to optimize the design of random Fourier features to achieve good kernel approximation on any input data using a low spectral sampling rate? In this paper, we propose to compute more adaptive random Fourier features with optimized spectral samples (w_j's) and feature weights (p_j's). The learning scheme not only significantly reduces the spectral sampling rate needed for accurate kernel approximation, but also allows joint optimization with any supervised learning framework. We establish generalization bounds using Rademacher complexity, and demonstrate advantages over previous methods. Moreover, our experiments show that the empirical kernel approximation provides effective regularization for supervised learning.
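The baseline Monte Carlo construction that this paper optimizes can be sketched as follows: plain (non-adaptive) random Fourier features for an RBF kernel. The sampling rate D and bandwidth gamma below are illustrative choices, not values from the paper.

```python
import numpy as np

def rff(X, D=5000, gamma=0.5, seed=0):
    """Monte Carlo feature map z(x) with E[z(x) . z(y)] = exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    # Spectral samples w_j for the RBF kernel are Gaussian with variance 2*gamma.
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(X.shape[1], D))
    b = rng.uniform(0.0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
K_approx = rff(X) @ rff(X).T                               # approximate Gram matrix
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-0.5 * sq_dists)                          # exact RBF Gram matrix
```

The adaptive scheme in the paper replaces the fixed Gaussian spectral samples with optimized ones, aiming for the same approximation quality at a much lower D.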

AAAI Conference 2019 Conference Paper

Multi-View Multi-Instance Multi-Label Learning Based on Collaborative Matrix Factorization

  • Yuying Xing
  • Guoxian Yu
  • Carlotta Domeniconi
  • Jun Wang
  • Zili Zhang
  • Maozu Guo

Multi-view Multi-instance Multi-label Learning (M3L) deals with complex objects encompassing diverse instances, represented with different feature views, and annotated with multiple labels. Existing M3L solutions only partially explore the inter- or intra-relations between objects (or bags), instances, and labels, which can convey important contextual information for M3L. As such, they may deliver compromised performance. In this paper, we propose a collaborative matrix factorization based solution called M3Lcmf. M3Lcmf first uses a heterogeneous network composed of nodes of bags, instances, and labels to encode different types of relations via multiple relational data matrices. To preserve the intrinsic structure of the data matrices, M3Lcmf collaboratively factorizes them into low-rank matrices, explores the latent relationships between bags, instances, and labels, and selectively merges the data matrices. An aggregation scheme is further introduced to aggregate the instance-level labels into bag-level labels and to guide the factorization. An empirical study on benchmark datasets shows that M3Lcmf outperforms other related competitive solutions in both instance-level and bag-level prediction.

IJCAI Conference 2019 Conference Paper

Multi-View Multiple Clustering

  • Shixin Yao
  • Guoxian Yu
  • Jun Wang
  • Carlotta Domeniconi
  • Xiangliang Zhang

Multiple clustering aims at exploring alternative clusterings to organize the data into meaningful groups from different perspectives. Existing multiple clustering algorithms are designed for single-view data. We assume that the individuality and commonality of multi-view data can be leveraged to generate high-quality and diverse clusterings. To this end, we propose a novel multi-view multiple clustering (MVMC) algorithm. MVMC first adapts multi-view self-representation learning to explore the individuality encoding matrices and the shared commonality matrix of multi-view data. It additionally reduces the redundancy (i.e., enhances the individuality) among the matrices using the Hilbert-Schmidt Independence Criterion (HSIC), and collects shared information by forcing the shared matrix to be smooth across all views. It then uses matrix factorization on the individual matrices, along with the shared matrix, to generate diverse, high-quality clusterings. We further extend multiple co-clustering to multi-view data and propose a solution called multi-view multiple co-clustering (MVMCC). Our empirical study shows that MVMC (MVMCC) can exploit multi-view data to generate multiple high-quality and diverse clusterings (co-clusterings), with superior performance to the state-of-the-art methods.
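The HSIC redundancy measure used above can be illustrated with the standard biased empirical estimator. This is a generic sketch with RBF kernels, not MVMC's full objective; the sample sizes and bandwidth are arbitrary.

```python
import numpy as np

def hsic(X, Y, gamma=1.0):
    """Biased empirical HSIC with RBF kernels: trace(K H L H) / (n - 1)^2."""
    def rbf(A):
        sq = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(rbf(X) @ H @ rbf(Y) @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
dep = hsic(X, X ** 2)                        # strongly dependent pair
indep = hsic(X, rng.normal(size=(100, 1)))   # independent pair
```

Larger HSIC indicates stronger statistical dependence, so penalizing it pushes the individuality matrices toward non-redundancy.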

NeurIPS Conference 2019 Conference Paper

Multi-View Reinforcement Learning

  • Minne Li
  • Lisheng Wu
  • Jun Wang
  • Haitham Bou Ammar

This paper is concerned with multi-view reinforcement learning (MVRL), which allows for decision making when agents share common dynamics but adhere to different observation models. We define the MVRL framework by extending partially observable Markov decision processes (POMDPs) to support more than one observation model and propose two solution methods through observation augmentation and cross-view policy transfer. We empirically evaluate our method and demonstrate its effectiveness in a variety of environments. Specifically, we show reductions in sample complexities and computational time for acquiring policies that handle multi-view environments.

AAAI Conference 2019 Conference Paper

Multiple Independent Subspace Clusterings

  • Xing Wang
  • Jun Wang
  • Carlotta Domeniconi
  • Guoxian Yu
  • Guoqiang Xiao
  • Maozu Guo

Multiple clustering aims at discovering diverse ways of organizing data into clusters. Despite the progress made, it is still a challenge for users to analyze and understand the distinctive structure of each output clustering. To ease this process, we consider diverse clusterings embedded in different subspaces, and analyze the embedding subspaces to shed light on the structure of each clustering. To this end, we provide a two-stage approach called MISC (Multiple Independent Subspace Clusterings). In the first stage, MISC uses independent subspace analysis to seek multiple statistically independent (i.e., non-redundant) subspaces, and determines the number of subspaces via the minimum description length principle. In the second stage, to account for the intrinsic geometric structure of samples embedded in each subspace, MISC performs graph-regularized semi-nonnegative matrix factorization to explore clusters. It additionally integrates the kernel trick into matrix factorization to handle non-linearly separable clusters. Experimental results on synthetic datasets show that MISC can find different interesting clusterings from the sought independent subspaces, and it also outperforms other related and competitive approaches on real-world datasets.

IJCAI Conference 2019 Conference Paper

Novel Collaborative Filtering Recommender Friendly to Privacy Protection

  • Jun Wang
  • Qiang Tang
  • Afonso Arriaga
  • Peter Y. A. Ryan

Nowadays, recommender systems are indispensable tools in many information services, and a large number of algorithms have been designed and implemented. However, fed with very large datasets, state-of-the-art recommendation algorithms often face an efficiency bottleneck, i.e., it takes a huge amount of computing resources to train a recommendation model. In order to satisfy the needs of privacy-savvy users who do not want to disclose their information to the service provider, the complexity of most existing solutions becomes prohibitive. As such, it is an interesting research question to design simple and efficient recommendation algorithms that achieve reasonable accuracy and facilitate privacy protection at the same time. In this paper, we propose an efficient recommendation algorithm, named CryptoRec, which has two nice properties: (1) it can estimate a new user's preferences by directly using a model pre-learned from an expert dataset, and the new user's data is not required to train the model; (2) it can compute recommendations with only addition and multiplication operations. For evaluation, we first test the recommendation accuracy on three real-world datasets and show that CryptoRec is competitive with state-of-the-art recommenders. Then, we evaluate the performance of the privacy-preserving variants of CryptoRec and show that predictions can be computed in seconds on a PC. In contrast, existing solutions need tens or hundreds of hours on more powerful computers.

AAAI Conference 2019 Conference Paper

Practical Algorithms for Multi-Stage Voting Rules with Parallel Universes Tiebreaking

  • Jun Wang
  • Sujoy Sikdar
  • Tyler Shepherd
  • Zhibing Zhao
  • Chunheng Jiang
  • Lirong Xia

STV and ranked pairs (RP) are two well-studied voting rules for group decision-making. They proceed in multiple rounds, and are affected by how ties are broken in each round. However, the literature is surprisingly vague about how ties should be broken. We propose the first algorithms for computing the set of alternatives that are winners under some tiebreaking mechanism under STV and RP, which is also known as parallel-universes tiebreaking (PUT). Unfortunately, PUT-winners are NP-complete to compute under STV and RP, and standard search algorithms from AI do not apply. We propose multiple DFS-based algorithms along with pruning strategies, heuristics, sampling and machine learning to prioritize the search direction and significantly improve performance. We also propose novel ILP formulations for PUT-winners under STV and RP, respectively. Experiments on synthetic and real-world data show that our algorithms are overall faster than ILP.

AAAI Conference 2019 Conference Paper

Ranking-Based Deep Cross-Modal Hashing

  • Xuanwu Liu
  • Guoxian Yu
  • Carlotta Domeniconi
  • Jun Wang
  • Yazhou Ren
  • Maozu Guo

Cross-modal hashing has been receiving increasing interests for its low storage cost and fast query speed in multi-modal data retrievals. However, most existing hashing methods are based on hand-crafted or raw level features of objects, which may not be optimally compatible with the coding process. Besides, these hashing methods are mainly designed to handle simple pairwise similarity. The complex multilevel ranking semantic structure of instances associated with multiple labels has not been well explored yet. In this paper, we propose a ranking-based deep cross-modal hashing approach (RDCMH). RDCMH firstly uses the feature and label information of data to derive a semi-supervised semantic ranking list. Next, to expand the semantic representation power of hand-crafted features, RDCMH integrates the semantic ranking information into deep cross-modal hashing and jointly optimizes the compatible parameters of deep feature representations and of hashing functions. Experiments on real multi-modal datasets show that RDCMH outperforms other competitive baselines and achieves the state-of-the-art performance in cross-modal retrieval applications.

AAAI Conference 2019 Conference Paper

The Kelly Growth Optimal Portfolio with Ensemble Learning

  • Weiwei Shen
  • Bin Wang
  • Jian Pu
  • Jun Wang

As a competitive alternative to the Markowitz mean-variance portfolio, the Kelly growth optimal portfolio has drawn considerable attention in investment science. While the growth optimal portfolio is theoretically guaranteed to dominate any other portfolio with probability 1 in the long run, it tends in practice to be highly risky in the short term. Moreover, empirical analysis and performance enhancement studies under practical settings are surprisingly scarce. In particular, how to handle the challenging but realistic condition of insufficient training data has barely been investigated. To fill this void, and especially to grapple with the difficulty posed by small samples, in this paper we propose a growth optimal portfolio strategy equipped with ensemble learning. We synergistically leverage the bootstrap aggregating algorithm and the random subspace method in portfolio construction to mitigate estimation error. We analyze the behavior and hyperparameter selection of the proposed strategy by simulation, and then corroborate its effectiveness by comparing its out-of-sample performance with those of 10 competing strategies on four datasets. Experimental results confirm that the new strategy is superior across extensive evaluation criteria.
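The underlying Kelly criterion, and the bootstrap-aggregating idea applied to its estimation, can be sketched for a simple binary bet. This is illustrative only: the paper works with multi-asset portfolios and a random subspace method, whereas the toy below just averages Kelly fractions over resampled win/loss histories.

```python
import numpy as np

def kelly_fraction(p, b):
    """Optimal bet fraction for a wager paying b:1 with win probability p."""
    return p - (1.0 - p) / b

def bagged_kelly(outcomes, b, n_boot=500, seed=0):
    """Bootstrap-aggregated Kelly fraction: re-estimate the win rate on
    resampled histories and average the resulting fractions."""
    rng = np.random.default_rng(seed)
    n = len(outcomes)
    fracs = [kelly_fraction(rng.choice(outcomes, size=n).mean(), b)
             for _ in range(n_boot)]
    return float(np.clip(np.mean(fracs), 0.0, 1.0))

rng = np.random.default_rng(8)
wins = (rng.random(60) < 0.6).astype(float)   # a small win/loss history, true p = 0.6
f = bagged_kelly(wins, b=1.0)
```

Averaging over resamples dampens the sensitivity of the bet size to estimation noise in small samples, which is the spirit of the ensemble approach above.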

AAAI Conference 2018 Conference Paper

A Neural Stochastic Volatility Model

  • Rui Luo
  • Weinan Zhang
  • Xiaojun Xu
  • Jun Wang

In this paper, we show that the recent integration of statistical models with deep recurrent neural networks provides a new way of formulating volatility models (volatility being the degree of variation of a time series), which have been widely used in time series analysis and prediction in finance. The model comprises a pair of complementary stochastic recurrent neural networks: the generative network models the joint distribution of the stochastic volatility process; the inference network approximates the conditional distribution of the latent variables given the observables. Our focus here is on the formulation of the temporal dynamics of volatility under a stochastic recurrent neural network framework. Experiments on real-world stock price datasets demonstrate that the proposed model generates better volatility estimation and prediction, outperforming mainstream methods, e.g., deterministic models such as GARCH and its variants, and stochastic models, namely the MCMC-based stochvol and the Gaussian-process-based model, in terms of average negative log-likelihood.
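For reference, the deterministic GARCH(1,1) baseline mentioned above follows a simple conditional-variance recursion, sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}. The parameter values in this sketch are illustrative, not fitted.

```python
import numpy as np

def garch11_variance(returns, omega=0.05, alpha=0.1, beta=0.85):
    """Conditional variance recursion of GARCH(1,1), seeded with the sample variance."""
    sigma2 = np.empty_like(returns)
    sigma2[0] = returns.var()
    for t in range(1, len(returns)):
        sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

rng = np.random.default_rng(2)
r = rng.normal(scale=1.0, size=500)   # toy return series
s2 = garch11_variance(r)
```

The neural stochastic volatility model replaces this fixed recursion with learned stochastic recurrent dynamics, which is where its reported likelihood gains come from.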

AAMAS Conference 2018 Conference Paper

A Study of AI Population Dynamics with Million-agent Reinforcement Learning

  • Yaodong Yang
  • Lantao Yu
  • Yiwei Bai
  • Ying Wen
  • Weinan Zhang
  • Jun Wang

We conduct an empirical study on discovering the ordered collective dynamics of a population of intelligent agents, driven by million-agent reinforcement learning. Our intention is to put intelligent agents into a simulated natural context and verify whether the principles developed in the real world can also be used to understand an artificially created intelligent population. To achieve this, we simulate a large-scale predator-prey world, where the laws of the world are designed using only findings, or their logical equivalents, that have been discovered in nature. We endow the agents with intelligence based on deep reinforcement learning (DRL). To scale the population size up to millions of agents, we propose a large-scale DRL training platform with a redesigned experience buffer. Our results show that the population dynamics of AI agents, driven only by each agent's individual self-interest, reveal an ordered pattern similar to the Lotka-Volterra model studied in population biology. We further discover emergent behaviors of collective adaptation by studying how the agents' grouping behaviors change with the environmental resources. Both findings can be explained by the self-organization theory in nature.

JBHI Journal 2018 Journal Article

An Unobtrusive Computerized Assessment Framework for Unilateral Peripheral Facial Paralysis

  • Zhexiao Guo
  • Guo Dan
  • Jianghuai Xiang
  • Jun Wang
  • Wanzhang Yang
  • Huijun Ding
  • Oliver Deussen
  • Yongjin Zhou

Unilateral peripheral facial paralysis (UPFP) is a form of facial nerve paralysis, clinically classified according to conditions of facial symmetry. Prompt and precise assessment is crucial to neural rehabilitation of UPFP. The prevalent House-Brackmann (HB) grading system relies on subjective judgments with significant inter-observer variation. Therefore, to explore an objective method for UPFP assessment, clinical image sequences are captured using a web camera setup while 5 healthy and 27 UPFP subjects perform a group of predefined actions, including keeping expressionless, raising the brows, closing the eyes, bulging the cheek, and showing the teeth in turn. First, the facial region is detected using a Haar cascade classifier, and landmark points are then acquired by a supervised descent method. Second, these landmark points are used to generate a group of features reflecting the structural parameters of the eyebrow, eye, nose, and mouth regions, respectively. Third, correlation coefficients are computed between the raw features and HB scores. To reduce feature dimensions, only those with correlation coefficients larger than an empirically selected value, 0.35, are input into a support vector machine to generate a classifier. With the classifier, an exact match rate (discrepancy = 0 between the proposed method's result and HB scores) of 49.9% and a loose match rate (discrepancy = 1) of 87.97% are achieved on the experimental data. After sample augmentation, the final rate increases to 90.01%, outperforming previous reports. In conclusion, this exploratory study demonstrates that encouraging results can be generated with the proposed framework using an unobtrusive web camera setup.

AAAI Conference 2018 Conference Paper

Efficient Architecture Search by Network Transformation

  • Han Cai
  • Tianyao Chen
  • Weinan Zhang
  • Yong Yu
  • Jun Wang

Techniques for automatically designing deep neural network architectures, such as reinforcement learning based approaches, have recently shown promising results. However, their success relies on vast computational resources (e.g., hundreds of GPUs), making them difficult to use widely. A noticeable limitation is that they still design and train each network from scratch during the exploration of the architecture space, which is highly inefficient. In this paper, we propose a new framework for efficient architecture search that explores the architecture space based on the current network and reuses its weights. We employ a reinforcement learning agent as the meta-controller, whose action is to grow the network depth or layer width with function-preserving transformations. As such, previously validated networks can be reused for further exploration, saving a large amount of computational cost. We apply our method to explore the architecture space of plain convolutional neural networks (no skip-connections, branching, etc.) on image benchmark datasets (CIFAR-10, SVHN) with restricted computational resources (5 GPUs). Our method can design highly competitive networks that outperform existing networks using the same design scheme. On CIFAR-10, our model without skip-connections achieves a 4.23% test error rate, exceeding a vast majority of modern architectures and approaching DenseNet. Furthermore, by applying our method to explore the DenseNet architecture space, we are able to achieve more accurate networks with fewer parameters.
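The function-preserving "widen" transformation can be demonstrated on a one-hidden-layer ReLU network: duplicate a hidden unit and halve its outgoing weights, and the widened network computes exactly the same function. This Net2WiderNet-style sketch is illustrative; shapes and values are arbitrary.

```python
import numpy as np

def widen(W1, b1, W2, idx):
    """Duplicate hidden unit `idx` and halve its outgoing weights,
    so the widened network's output is unchanged."""
    W1n = np.vstack([W1, W1[idx:idx + 1]])   # copy the unit's incoming weights
    b1n = np.append(b1, b1[idx])
    W2n = np.vstack([W2, W2[idx:idx + 1]])   # copy its outgoing weights...
    W2n[idx] *= 0.5                          # ...and split the contribution
    W2n[-1] *= 0.5
    return W1n, b1n, W2n

def forward(x, W1, b1, W2):
    return np.maximum(x @ W1.T + b1, 0.0) @ W2   # one ReLU hidden layer

rng = np.random.default_rng(3)
W1, b1, W2 = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=(4, 2))
x = rng.normal(size=(5, 3))
W1n, b1n, W2n = widen(W1, b1, W2, idx=1)
```

Because the transformation preserves the function, training can continue from the widened network without losing what the smaller network had already learned.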

IJCAI Conference 2018 Conference Paper

Incomplete Multi-View Weak-Label Learning

  • Qiaoyu Tan
  • Guoxian Yu
  • Carlotta Domeniconi
  • Jun Wang
  • Zili Zhang

Learning from multi-view multi-label data has wide applications. There are two main challenges in this learning task: incomplete views and missing (weak) labels. The former means that views may not include all data objects. The weak-label setting implies that only a subset of relevant labels is provided for training objects while other labels are missing. Both incomplete views and weak labels can lead to significant performance degradation. In this paper, we propose a novel model (iMVWL) to jointly address the two challenges. iMVWL simultaneously learns a shared subspace from incomplete views with weak labels, together with the local label structure and the predictor in this subspace, which can capture not only cross-view relationships but also the weak-label information of training samples. We further develop an alternative solution to optimize our model; this solution avoids suboptimal results and reinforces the reciprocal effects of the learned components, thus further improving performance. Extensive experimental results on several real-world datasets validate the effectiveness of our model against other competitive algorithms.

IJCAI Conference 2018 Conference Paper

Learning Sequential Correlation for User Generated Textual Content Popularity Prediction

  • Wen Wang
  • Wei Zhang
  • Jun Wang
  • Junchi Yan
  • Hongyuan Zha

Popularity prediction of user generated textual content is critical for prioritizing information on the web, which alleviates heavy information overload for ordinary readers. Most previous studies model each content instance separately for prediction and thus overlook the sequential correlations between instances from a specific user. In this paper, we go deeper into this problem based on two observations for each user, i.e., sequential content correlation and sequential popularity correlation. We propose a novel deep sequential model called User Memory-augmented recurrent Attention Network (UMAN). This model encodes the two correlations by updating external user memories, which are further leveraged for target text representation learning and popularity prediction. The experimental results on several real-world datasets validate the benefits of considering these correlations and demonstrate that UMAN achieves the best performance among several strong competitors.

IJCAI Conference 2018 Conference Paper

Learning to Design Games: Strategic Environments in Reinforcement Learning

  • Haifeng Zhang
  • Jun Wang
  • Zhiming Zhou
  • Weinan Zhang
  • Yin Wen
  • Yong Yu
  • Wenxin Li

In typical reinforcement learning (RL), the environment is assumed given, and the goal of learning is to identify an optimal policy for the agent taking actions through its interactions with the environment. In this paper, we extend this setting by considering an environment that is not given but is instead controllable and learnable through its interaction with the agent. This extension is motivated by environment design scenarios in the real world, including game design, shopping space design and traffic signal design. Theoretically, we find a dual Markov decision process (MDP) w.r.t. the environment to the MDP w.r.t. the agent, and derive a policy gradient solution to optimizing the parametrized environment. Furthermore, discontinuous environments are addressed by a proposed general generative framework. Our experiments on a Maze game design task show the effectiveness of the proposed algorithms in generating diverse and challenging Mazes against various agent settings.

AAAI Conference 2018 Conference Paper

Long Text Generation via Adversarial Training with Leaked Information

  • Jiaxian Guo
  • Sidi Lu
  • Han Cai
  • Weinan Zhang
  • Yong Yu
  • Jun Wang

Automatically generating coherent and semantically meaningful text has many applications in machine translation, dialogue systems, image captioning, etc. Recently, by combining with policy gradient, Generative Adversarial Nets (GANs), which use a discriminative model to guide the training of the generative model as a reinforcement learning policy, have shown promising results in text generation. However, the scalar guiding signal is only available after the entire text has been generated and lacks intermediate information about text structure during the generative process. As such, it limits the success of such approaches when the length of the generated text samples is long (more than 20 words). In this paper, we propose a new framework, called LeakGAN, to address the problem of long text generation. We allow the discriminative net to leak its own high-level extracted features to the generative net to further help the guidance. The generator incorporates such informative signals into all generation steps through an additional MANAGER module, which takes the extracted features of the currently generated words and outputs a latent vector to guide the WORKER module for next-word generation. Our extensive experiments on synthetic data and various real-world tasks with a Turing test demonstrate that LeakGAN is highly effective in long text generation and also improves performance in short text generation scenarios. More importantly, without any supervision, LeakGAN is able to implicitly learn sentence structures purely through the interaction between MANAGER and WORKER.

AAAI Conference 2018 System Paper

MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence

  • Lianmin Zheng
  • Jiacheng Yang
  • Han Cai
  • Ming Zhou
  • Weinan Zhang
  • Jun Wang
  • Yong Yu

We introduce MAgent, a platform to support research and development of many-agent reinforcement learning. Unlike previous research platforms for single- or multi-agent reinforcement learning, MAgent focuses on supporting tasks and applications that require hundreds to millions of agents. Within the interactions among a population of agents, it enables not only the study of learning algorithms for agents' optimal policies, but more importantly, the observation and understanding of individual agents' behaviors and the social phenomena emerging from the AI society, including communication languages, leadership, and altruism. MAgent is highly scalable and can host up to one million agents on a single GPU server. MAgent also provides flexible configurations for AI researchers to design their customized environments and agents. In this demo, we present three environments designed on MAgent and show the collective intelligence that emerges from learning from scratch.

IJCAI Conference 2018 Conference Paper

Multi-Label Co-Training

  • Yuying Xing
  • Guoxian Yu
  • Carlotta Domeniconi
  • Jun Wang
  • Zili Zhang

Multi-label learning aims at assigning a set of appropriate labels to multi-label samples. Although it has been successfully applied in various domains in recent years, most multi-label learning methods require sufficient labeled training samples, because of the large number of possible label sets. Co-training, as an important branch of semi-supervised learning, can leverage unlabeled samples, along with scarce labeled ones, and can potentially help with the large labeled data requirement. However, it is a difficult challenge to combine multi-label learning with co-training. Two distinct issues are associated with the challenge: (i) how to solve the widely-witnessed class-imbalance problem in multi-label learning; and (ii) how to select samples with confidence, and communicate their predicted labels among classifiers for model refinement. To address these issues, we introduce an approach called Multi-Label Co-Training (MLCT). MLCT leverages information concerning the co-occurrence of pairwise labels to address the class-imbalance challenge; it introduces a predictive reliability measure to select samples, and applies label-wise filtering to confidently communicate labels of selected samples among co-training classifiers. MLCT performs favorably against related competitive multi-label learning methods on benchmark datasets and it is also robust to the input parameters.

AAAI Conference 2018 Conference Paper

Tau-FPL: Tolerance-Constrained Learning in Linear Time

  • Ao Zhang
  • Nan Li
  • Jian Pu
  • Jun Wang
  • Junchi Yan
  • Hongyuan Zha

In many real-world applications, learning a classifier with a false-positive rate under a specified tolerance is appealing. Existing approaches either introduce prior-knowledge-dependent label costs or tune parameters based on traditional classifiers; both are methodologically limited since they do not directly incorporate the false-positive rate tolerance. In this paper, we propose a novel scoring-thresholding approach, τ-False Positive Learning (τ-FPL), to address this problem. We show that the scoring problem, which takes the false-positive rate tolerance into account, can be efficiently solved in linear time, and that an out-of-bootstrap thresholding method can transform the learned ranking function into a low false-positive classifier. Both theoretical analysis and experimental results show the superior performance of the proposed τ-FPL over existing approaches.
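The thresholding idea, choosing a cut-off from held-out negative scores so that the false-positive rate stays under tau, can be sketched as below. This is a simplified quantile illustration, not the paper's out-of-bootstrap procedure; the Gaussian score distributions are hypothetical.

```python
import numpy as np

def fpr_threshold(neg_scores, tau):
    """Score threshold such that at most a tau fraction of negatives exceed it."""
    return np.quantile(neg_scores, 1.0 - tau)

rng = np.random.default_rng(4)
neg = rng.normal(0.0, 1.0, size=10000)   # scores of negative examples
pos = rng.normal(2.0, 1.0, size=10000)   # scores of positive examples
thr = fpr_threshold(neg, tau=0.05)
fpr = (neg > thr).mean()                 # achieved false-positive rate
tpr = (pos > thr).mean()                 # achieved true-positive rate
```

The better the learned ranking separates the two score distributions, the higher the true-positive rate attainable at the fixed tolerance.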

NeurIPS Conference 2018 Conference Paper

Thermostat-assisted continuously-tempered Hamiltonian Monte Carlo for Bayesian learning

  • Rui Luo
  • Jianhong Wang
  • Yaodong Yang
  • Jun Wang
  • Zhanxing Zhu

In this paper, we propose a novel sampling method, thermostat-assisted continuously-tempered Hamiltonian Monte Carlo, for multimodal Bayesian learning. It simulates a noisy dynamical system by incorporating both a continuously-varying tempering variable and Nosé-Hoover thermostats. A significant benefit is that it is not only able to efficiently generate i.i.d. samples when the underlying posterior distributions are multimodal, but is also capable of adaptively neutralising the noise arising from the use of mini-batches. While the properties of the approach have been studied using synthetic datasets, our experiments on three real datasets have also shown its performance gains over several strong baselines for Bayesian learning with various types of neural networks plugged in.

AAAI Conference 2017 Conference Paper

Portfolio Selection via Subset Resampling

  • Weiwei Shen
  • Jun Wang

As the cornerstone of modern portfolio theory, Markowitz's mean-variance optimization is a major model adopted in portfolio management. However, the estimation errors in its input parameters substantially deteriorate its performance in practice. Specifically, losses can be huge when the number of assets for investment is not much smaller than the sample size of the historical data. To extend the applicability of Markowitz's portfolio optimization to large portfolios, in this paper we propose a new portfolio strategy via subset resampling. By resampling subsets of the original large universe of assets, we construct the associated subset portfolios with more accurately estimated parameters without requiring additional data. By aggregating a number of constructed subset portfolios, we attain a well-diversified portfolio over all assets. To investigate its performance, we first analyze its efficient frontiers by simulation and provide an analysis of hyperparameter selection, and then empirically compare its out-of-sample performance with those of various competing strategies on diversified datasets. Experimental results corroborate that the proposed portfolio strategy has marked superiority across extensive evaluation criteria.
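The subset-resampling idea can be sketched by averaging minimum-variance weights estimated on random asset subsets. This is an illustrative simplification under assumed settings (i.i.d. toy returns, minimum-variance subset portfolios, arbitrary subset size and count), not the paper's exact construction.

```python
import numpy as np

def subset_resampled_weights(returns, k, n_subsets=200, seed=0):
    """Average minimum-variance weights estimated on random subsets of k assets."""
    rng = np.random.default_rng(seed)
    T, N = returns.shape
    w = np.zeros(N)
    for _ in range(n_subsets):
        idx = rng.choice(N, size=k, replace=False)
        cov = np.cov(returns[:, idx], rowvar=False)
        sub = np.linalg.solve(cov, np.ones(k))   # unnormalized min-variance weights
        w[idx] += sub / sub.sum()
    return w / w.sum()

rng = np.random.default_rng(5)
R = rng.normal(0.001, 0.02, size=(120, 10))   # 120 periods, 10 assets (toy data)
w = subset_resampled_weights(R, k=5)
```

Estimating a 5x5 covariance from 120 observations is far better conditioned than estimating the full 10x10 one, which is exactly the estimation-error trade-off the strategy exploits.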

AAAI Conference 2017 Conference Paper

Random Features for Shift-Invariant Kernels with Moment Matching

  • Weiwei Shen
  • Zhihui Yang
  • Jun Wang

To grapple with the scalability conundrum of kernel-based learning algorithms, the method of approximating nonlinear kernels via random feature maps has attracted wide attention in large-scale learning systems. Specifically, the associated sampling procedure is one critical component that dictates the quality of the random feature maps. However, for high-dimensional features, the standard Monte Carlo sampling method has been shown to be less effective at producing low-variance random samples. In consequence, it demands constructing a large number of features to attain the desired accuracy for downstream use. In this paper, we present a novel sampling algorithm powered by moment matching techniques to reduce the variance of random features. Our extensive empirical studies and comparisons with several highly competitive peer methods verify the superiority of the proposed algorithm in Gram matrix approximation and generalization errors in regression. Our rigorous theoretical proofs justify that the proposed algorithm is guaranteed to achieve lower variance than the standard Monte Carlo method in high-dimensional settings.

AAAI Conference 2017 Conference Paper

SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient

  • Lantao Yu
  • Weinan Zhang
  • Jun Wang
  • Yong Yu

As a new way of training generative models, the Generative Adversarial Net (GAN), which uses a discriminative model to guide the training of the generative model, has enjoyed considerable success in generating real-valued data. However, it has limitations when the goal is to generate sequences of discrete tokens. A major reason is that the discrete outputs from the generative model make it difficult to pass the gradient update from the discriminative model to the generative model. Also, the discriminative model can only assess a complete sequence, while for a partially generated sequence it is non-trivial to balance its current score against the score it will receive once the entire sequence is generated. In this paper, we propose a sequence generation framework, called SeqGAN, to solve these problems. Modeling the data generator as a stochastic policy in reinforcement learning (RL), SeqGAN bypasses the generator differentiation problem by directly performing policy gradient updates. The RL reward signal comes from the GAN discriminator judging a complete sequence, and is passed back to the intermediate state-action steps using Monte Carlo search. Extensive experiments on synthetic data and real-world tasks demonstrate significant improvements over strong baselines.
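The Monte Carlo search step above — scoring a partial sequence by completing it several times and averaging the discriminator's verdicts — can be sketched as follows. The policy and discriminator here are placeholder callables, not the paper's neural models.

```python
import numpy as np

def mc_rollout_reward(partial, rollout_policy, discriminator,
                      seq_len, n_rollouts, rng):
    """Estimate the reward of a partial token sequence, SeqGAN-style:
    complete the sequence n_rollouts times with the rollout policy and
    average the discriminator's scores on the completed sequences."""
    total = 0.0
    for _ in range(n_rollouts):
        seq = list(partial)
        while len(seq) < seq_len:
            seq.append(rollout_policy(seq, rng))  # sample next token
        total += discriminator(seq)               # score full sequence
    return total / n_rollouts
```

This averaged score is what stands in for the reward of an intermediate state-action step in the policy gradient update.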

IJCAI Conference 2016 Conference Paper

Portfolio Blending via Thompson Sampling

  • Weiwei Shen
  • Jun Wang

As a definitive investment guideline for institutions and individuals, Markowitz's modern portfolio theory is ubiquitous in the financial industry. However, its noticeably poor out-of-sample performance, due to inaccurate estimation of parameters, has prompted unremitting efforts to investigate effective remedies. One common retrofit, blending portfolios from disparate investment perspectives, has received growing attention. While even a naive portfolio blending strategy can be empirically successful, how to effectively and robustly blend portfolios to generate stable performance improvement remains less explored. In this paper, we present a novel online algorithm that leverages Thompson sampling in the sequential decision-making process for portfolio blending. By modeling blending coefficients as probabilities of choosing basis portfolios and utilizing Bayes decision rules to update the corresponding distribution functions, our algorithm sequentially determines the optimal coefficients to blend multiple portfolios that embody different investment criteria and market views. Compared with competitive trading strategies across various benchmarks, our method shows its superiority on standard evaluation metrics.
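A toy sketch of the sequential blending loop follows, assuming a simple win/loss Bernoulli update (positive vs. non-positive return of the sampled-best basis portfolio). The paper's actual reward model and Bayes update are not specified here; every name and the update rule are assumptions for illustration.

```python
import numpy as np

def thompson_blend(basis_returns, seed=0):
    """Blend basis portfolios via Thompson sampling: treat blending
    coefficients as probabilities of choosing each basis portfolio,
    sample per-portfolio merit from Beta posteriors, blend with the
    normalized samples, and update the dominant arm's posterior."""
    rng = np.random.default_rng(seed)
    n_steps, n_basis = basis_returns.shape
    alpha = np.ones(n_basis)   # Beta posterior "successes"
    beta = np.ones(n_basis)    # Beta posterior "failures"
    wealth = 1.0
    for t in range(n_steps):
        theta = rng.beta(alpha, beta)      # sampled merit per portfolio
        coeffs = theta / theta.sum()       # blending coefficients
        r = coeffs @ basis_returns[t]      # blended one-period return
        wealth *= 1.0 + r
        k = int(np.argmax(theta))          # update the sampled-best arm
        if basis_returns[t, k] > 0:
            alpha[k] += 1
        else:
            beta[k] += 1
    return wealth
```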

TIST Journal 2015 Journal Article

Multi-Keyword Multi-Click Advertisement Option Contracts for Sponsored Search

  • Bowei Chen
  • Jun Wang
  • Ingemar J. Cox
  • Mohan S. Kankanhalli

In sponsored search, advertisement (abbreviated ad) slots are usually sold by a search engine to an advertiser through an auction mechanism in which advertisers bid on keywords. In theory, auction mechanisms have many desirable economic properties. However, keyword auctions have a number of limitations including: the uncertainty in payment prices for advertisers; the volatility in the search engine’s revenue; and the weak loyalty between advertiser and search engine. In this article, we propose a special ad option that alleviates these problems. In our proposal, an advertiser can purchase an option from a search engine in advance by paying an upfront fee, known as the option price. The advertiser then has the right, but no obligation, to purchase among the prespecified set of keywords at the fixed cost-per-clicks (CPCs) for a specified number of clicks in a specified period of time. The proposed option is closely related to a special exotic option in finance that contains multiple underlying assets (multi-keyword) and is also multi-exercisable (multi-click). This novel structure has many benefits: advertisers can have reduced uncertainty in advertising; the search engine can improve the advertisers’ loyalty as well as obtain a stable and increased expected revenue over time. Since the proposed ad option can be implemented in conjunction with the existing keyword auctions, the option price and corresponding fixed CPCs must be set such that there is no arbitrage between the two markets. Option pricing methods are discussed and our experimental results validate the development. Compared to keyword auctions, a search engine can have an increased expected revenue by selling an ad option.

AAAI Conference 2015 Conference Paper

Multi-View Point Registration via Alternating Optimization

  • Junchi Yan
  • Jun Wang
  • Hongyuan Zha
  • Xiaokang Yang
  • Stephen Chu

Multi-view point registration is a relatively less studied problem compared with two-view point registration. Directly applying pairwise registration often leads to matching discrepancies, as the mapping between two point sets can be determined either by direct correspondences or through any intermediate point set. Moreover, local two-view registration tends to be sensitive to noise. We propose a novel multi-view registration method in which the optimal registration is achieved via an efficient and effective alternating concave minimization process. We further extend our solution to the general practical case of registration among point sets with different cardinalities. Extensive empirical evaluations against peer methods on both synthetic data and real images suggest that our method is robust to large disturbances. In particular, it is shown that our method outperforms peer point matching methods and performs competitively against graph matching approaches. The latter utilize additional second-order information at the cost of exponentially increased run-time, and are thus usually less efficient.

IJCAI Conference 2015 Conference Paper

Optimal Bayesian Hashing for Efficient Face Recognition

  • Qi Dai
  • Jianguo Li
  • Jun Wang
  • Yurong Chen
  • Yu-Gang Jiang

In practical applications, it is often observed that high-dimensional features can yield good performance, while being more costly in both computation and storage. In this paper, we propose a novel method called Bayesian Hashing to learn an optimal Hamming embedding of high-dimensional features, with a focus on the challenging application of face recognition. In particular, a boosted random FERNs classification model is designed to perform efficient face recognition, in which bit correlations are elaborately approximated with a random permutation technique. Without incurring additional storage cost, multiple random permutations are then employed to train a series of classifiers for achieving better discrimination power. In addition, we introduce a sequential forward floating search (SFFS) algorithm to perform model selection, resulting in further performance improvement. Extensive experimental evaluations and comparative studies clearly demonstrate that the proposed Bayesian Hashing approach outperforms other peer methods in both accuracy and speed. We achieve state-of-the-art results on well-known face recognition benchmarks using compact binary codes with significantly reduced computational overhead and storage cost.

IJCAI Conference 2015 Conference Paper

Portfolio Choices with Orthogonal Bandit Learning

  • Weiwei Shen
  • Jun Wang
  • Yu-Gang Jiang
  • Hongyuan Zha

The investigation and development of new methods from diverse perspectives to shed light on portfolio choice problems has never stagnated in financial research. Recently, multi-armed bandits have drawn intensive attention in various machine learning applications in online settings. The tradeoff between exploration and exploitation to maximize rewards in bandit algorithms naturally establishes a connection to portfolio choice problems. In this paper, we present a bandit algorithm for conducting online portfolio choices by effectually exploiting correlations among multiple arms. Through constructing orthogonal portfolios from multiple assets and integrating with the upper confidence bound bandit framework, we derive the optimal portfolio strategy that represents the combination of passive and active investments according to a risk-adjusted reward function. Compared with oft-quoted trading strategies in finance and machine learning fields across representative real-world market datasets, the proposed algorithm demonstrates superiority in both risk-adjusted return and cumulative wealth.
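The upper confidence bound machinery this abstract builds on is classic UCB1. A minimal sketch follows; the paper's orthogonal-portfolio construction and risk-adjusted reward function are not reproduced, and the reward callable is a placeholder.

```python
import numpy as np

def ucb1(reward_fn, n_arms, n_rounds, seed=0):
    """Classic UCB1: play each arm once, then pick the arm maximizing
    empirical mean + sqrt(2 ln t / pulls), trading off exploration
    and exploitation.  Returns per-arm pull counts."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1                    # initialization: play each arm once
        else:
            arm = int(np.argmax(means + np.sqrt(2 * np.log(t) / counts)))
        r = reward_fn(arm, rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # running mean
    return counts
```

Over many rounds the arm with the highest mean reward accumulates the vast majority of the pulls.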

AAAI Conference 2015 Conference Paper

Probabilistic Attributed Hashing

  • Mingdong Ou
  • Peng Cui
  • Jun Wang
  • Fei Wang
  • Wenwu Zhu

Due to their simplicity and efficiency, many hashing methods have recently been developed for large-scale similarity search. Most existing hashing methods focus on mapping low-level features to binary codes, but neglect the attributes that are commonly associated with data samples. Attribute data, such as image tags, product brands, and user profiles, can represent human perception better than low-level features. However, attributes have specific characteristics, including high-dimensional, sparse, and categorical properties, which are hardly leveraged in existing hashing learning frameworks. In this paper, we propose a hashing learning framework, Probabilistic Attributed Hashing (PAH), to integrate attributes with low-level features. The connections between attributes and low-level features are built through a shared common set of latent binary variables, i.e., hash codes, through which attributes and features can complement each other. Finally, we develop an efficient iterative learning algorithm, which is generally feasible for large-scale applications. Extensive experiments and comparative studies are conducted on two public datasets, i.e., DBLP and NUS-WIDE. The results clearly demonstrate that the proposed PAH method substantially outperforms the peer methods.

NeurIPS Conference 2015 Conference Paper

Space-Time Local Embeddings

  • Ke Sun
  • Jun Wang
  • Alexandros Kalousis
  • Stephane Marchand-Maillet

Space-time is a profound concept in physics. This concept was shown to be useful for dimensionality reduction. We present basic definitions with interesting counter-intuitions. We give theoretical propositions to show that space-time is a more powerful representation than Euclidean space. We apply this concept to manifold learning for preserving local information. Empirical results on non-metric datasets show that more information can be preserved in space-time.

AAAI Conference 2015 Conference Paper

Transaction Costs-Aware Portfolio Optimization via Fast Lowner-John Ellipsoid Approximation

  • Weiwei Shen
  • Jun Wang

Merton’s portfolio optimization problem in the presence of transaction costs for multiple assets has been an important and challenging problem in both theory and practice. Most existing work suffers from the curse of dimensionality and difficulties in generalization. In this paper, we develop an approximate dynamic programming method that synergistically combines the Löwner-John ellipsoid approximation with conventional value function iteration to quantify the associated optimal trading policy. By constructing Löwner-John ellipsoids to parameterize the optimal policy and taking Euclidean projections onto the constructed ellipsoids to implement the trading policy, the proposed algorithm cuts computational costs by up to a factor of five hundred while achieving near-optimal risk-adjusted returns across both synthetic and real-world market datasets.

AAAI Conference 2014 Conference Paper

Doubly Regularized Portfolio with Risk Minimization

  • Weiwei Shen
  • Jun Wang
  • Shiqian Ma

Due to recent empirical successes, machine learning algorithms have drawn considerable attention and are becoming important analysis tools in the financial industry. In particular, as the core engine of many financial services such as private wealth and pension fund management, portfolio management calls for the application of such novel algorithms. Most portfolio allocation strategies do not account for costs from market frictions such as transaction costs and capital gains taxes, as the complexity of sensible cost models often renders the induced problem intractable. In this paper, we propose a doubly regularized portfolio that provides a modest but effective solution to this difficulty. Specifically, as all kinds of trading costs primarily root in large transaction volumes, to reduce volumes we synergistically combine two penalty terms with classic risk minimization models to ensure that: (1) only a small set of assets is selected for investment in each period; and (2) portfolios in consecutive trading periods are similar. To assess the new portfolio, we apply standard evaluation criteria and conduct extensive experiments on well-known benchmarks and market datasets. Compared with various state-of-the-art portfolios, the proposed portfolio demonstrates superior performance, with both higher risk-adjusted returns and dramatically decreased transaction volumes.

AAAI Conference 2014 Conference Paper

Privacy and Regression Model Preserved Learning

  • Jinfeng Yi
  • Jun Wang
  • Rong Jin

Sensitive data such as medical records and business reports usually contains valuable information that can be used to build prediction models. However, designing learning models by directly using sensitive data might result in severe privacy and copyright issues. In this paper, we propose a novel matrix completion based framework that aims to tackle two challenging issues simultaneously: i) handling missing and noisy sensitive data, and ii) preserving the privacy of the sensitive data during the learning process. In particular, the proposed framework is able to mask the sensitive data while ensuring that the transformed data are still usable for training regression models. We show that two key properties, namely model preserving and privacy preserving, are satisfied by the transformed data obtained from the proposed framework. In model preserving, we guarantee that the linear regression model built from the masked data closely approximates the regression model learned from the original data. In privacy preserving, we ensure that the original sensitive data cannot be recovered since the transformation procedure is irreversible. Given these two characteristics, the transformed data can be safely released to any learners for designing prediction models without revealing any private content. Our empirical studies with a synthesized dataset and multiple sensitive benchmark datasets verify our theoretical claim as well as the effectiveness of the proposed framework.
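One way to see the model-preserving property is that any orthogonal transformation of the rows of (X, y) leaves the least-squares solution unchanged while making X unrecoverable without the transform. The sketch below is a toy illustration of that principle only, not the paper's matrix-completion framework (which additionally handles missing and noisy entries).

```python
import numpy as np

def mask_data(X, y, seed=0):
    """Mask (X, y) with a random orthogonal transformation Q.
    Since (QX)ᵀ(QX) = XᵀX and (QX)ᵀ(Qy) = Xᵀy, the least-squares
    solution is preserved, yet X cannot be recovered without Q."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)))  # random orthogonal matrix
    return Q @ X, Q @ y
```

Discarding Q after masking makes the transformation irreversible in the same spirit as the abstract's privacy-preserving property.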

IJCAI Conference 2013 Conference Paper

Multiple Task Learning Using Iteratively Reweighted Least Square

  • Jian Pu
  • Yu-Gang Jiang
  • Jun Wang
  • Xiangyang Xue

Multiple task learning (MTL) is becoming popular due to its theoretical advances and empirical successes. The key idea of MTL is to explore the hidden relationships among multiple tasks to enhance learning performance. Recently, many MTL algorithms have been developed and applied to various problems such as feature selection and kernel learning. However, most existing methods rely heavily on specific assumptions about the task relationships. For instance, several works assume that there is a major task group plus several outlier tasks, and use a decomposition approach to identify the group structure and outlier tasks simultaneously. In this paper, we adopt a more general formulation for MTL without making specific structural assumptions. Instead of performing model decomposition, we directly impose an elastic-net regularization with a mixture of the structure and outlier penalties and formulate the objective as an unconstrained convex problem. To derive the optimal solution efficiently, we propose to use an Iteratively Reweighted Least Square (IRLS) method with a preconditioned conjugate gradient, which is computationally affordable for high-dimensional data. Extensive experiments are conducted on both synthetic and real data, and comparisons with several state-of-the-art algorithms clearly show the superior performance of the proposed method.
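The IRLS principle itself is easy to illustrate on least-absolute-deviation regression: each iteration solves a weighted least-squares problem with weights inversely proportional to the current residuals. A minimal sketch follows; the paper's elastic-net MTL objective and preconditioned conjugate gradient solver are omitted, and the function name is an assumption.

```python
import numpy as np

def irls_lad(X, y, n_iter=50, eps=1e-6):
    """IRLS for least-absolute-deviation regression.  Each pass solves
    (Xᵀ W X) w = Xᵀ W y with W = diag(1 / (|residual| + eps)), so the
    quadratic surrogate approximates the L1 loss near the current fit."""
    w = np.linalg.lstsq(X, y, rcond=None)[0]   # ordinary LS warm start
    for _ in range(n_iter):
        r = np.abs(y - X @ w) + eps            # smoothed absolute residuals
        Wt = X.T / r                           # Xᵀ diag(1/r)
        w = np.linalg.solve(Wt @ X, Wt @ y)
    return w
```

Because large residuals get small weights, the fit is far less sensitive to outliers than ordinary least squares.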

JMLR Journal 2013 Journal Article

Semi-Supervised Learning Using Greedy Max-Cut

  • Jun Wang
  • Tony Jebara
  • Shih-Fu Chang

Graph-based semi-supervised learning (SSL) methods play an increasingly important role in practical machine learning systems, particularly in agnostic settings when no parametric information or other prior knowledge is available about the data distribution. Given the constructed graph represented by a weight matrix, transductive inference is used to propagate known labels to predict the values of all unlabeled vertices. Designing a robust label diffusion algorithm for such graphs is a widely studied problem and various methods have recently been suggested. Many of these can be formalized as regularized function estimation through the minimization of a quadratic cost. However, most existing label diffusion methods minimize a univariate cost with the classification function as the only variable of interest. Since the observed labels seed the diffusion process, such univariate frameworks are extremely sensitive to the initial label choice and any label noise. To alleviate the dependency on the initial observed labels, this article proposes a bivariate formulation for graph-based SSL, where both the binary label information and a continuous classification function are arguments of the optimization. This bivariate formulation is shown to be equivalent to a linearly constrained Max-Cut problem. Finally an efficient solution via greedy gradient Max-Cut (GGMC) is derived which gradually assigns unlabeled vertices to each class with minimum connectivity. Once convergence guarantees are established, this greedy Max-Cut based SSL is applied on both artificial and standard benchmark data sets where it obtains superior classification accuracy compared to existing state-of-the-art SSL methods. Moreover, GGMC shows robustness with respect to the graph construction method and maintains high accuracy over extensive experiments with various edge linking and weighting schemes.
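For contrast, the univariate quadratic-cost diffusion that the article identifies as sensitive to initial labels looks like the following iterative propagation (in the style of Zhou et al.); GGMC's bivariate Max-Cut formulation is not reproduced here.

```python
import numpy as np

def propagate_labels(W, y, alpha=0.99, n_iter=200):
    """Iterative graph label propagation with a symmetrically normalized
    weight matrix.  W: symmetric nonnegative weights; y: one-hot label
    matrix with zero rows for unlabeled vertices.  Each step diffuses
    current scores along edges while re-injecting the seed labels."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))        # D^{-1/2} W D^{-1/2}
    F = y.astype(float).copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * y
    return F.argmax(axis=1)
```

Since the seed labels y drive every iteration, flipping even one seed changes the whole diffusion — exactly the fragility the bivariate formulation is designed to alleviate.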

UAI Conference 2012 Conference Paper

Fast Graph Construction Using Auction Algorithm

  • Jun Wang
  • Yinglong Xia

In practical machine learning systems, graph-based data representation has been widely used in various learning paradigms, ranging from unsupervised clustering to supervised classification. Besides applications with natural graph or network structure, such as social network analysis and relational learning, many other applications involve a critical step of converting data vectors into an adjacency graph. In particular, a sparse subgraph extracted from the original graph is often required for both theoretical and practical reasons. Previous study clearly shows that the performance of different learning algorithms, e.g., clustering and classification, benefits from such sparse subgraphs with balanced node connectivity. However, existing graph construction methods are either computationally expensive or deliver unsatisfactory performance. In this paper, we utilize a scalable method called the auction algorithm and its parallel extension to recover a sparse yet nearly balanced subgraph with significantly reduced computational cost. Empirical study and comparison with state-of-the-art approaches clearly demonstrate the superiority of the proposed method in both efficiency and accuracy.
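The auction primitive at the heart of the method is Bertsekas' algorithm for the assignment problem; a serial sketch follows. The parallel extension and the balanced-subgraph (b-matching) wrapper are not shown, and all names are illustrative.

```python
import numpy as np

def auction_assignment(benefit, eps=1e-3):
    """Bertsekas-style auction for the assignment problem.  Each
    unassigned person bids for its most valuable object, raising that
    object's price by the value gap over the second-best option plus
    eps; the previous owner, if any, is evicted and re-bids later.
    benefit[i, j]: value of assigning person i to object j.
    Returns the person -> object assignment array."""
    n = benefit.shape[0]
    prices = np.zeros(n)
    owner = -np.ones(n, dtype=int)       # object -> person
    assigned = -np.ones(n, dtype=int)    # person -> object
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        values = benefit[i] - prices
        j = int(np.argmax(values))
        best = values[j]
        values[j] = -np.inf
        second = values.max()
        prices[j] += best - second + eps  # bid raises the winning price
        if owner[j] != -1:                # evict previous owner
            assigned[owner[j]] = -1
            unassigned.append(owner[j])
        owner[j] = i
        assigned[i] = j
    return assigned
```

Because bids only ever raise prices by at least eps, the process terminates with an assignment within n*eps of optimal.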

NeurIPS Conference 2012 Conference Paper

Parametric Local Metric Learning for Nearest Neighbor Classification

  • Jun Wang
  • Alexandros Kalousis
  • Adam Woznica

We study the problem of learning local metrics for nearest neighbor classification. Most previous work on local metric learning learns a number of unrelated local metrics. While this "independence" approach delivers increased flexibility, its downside is a considerable risk of overfitting. We present a new parametric local metric learning method in which we learn a smooth metric matrix function over the data manifold. Using an approximation error bound of the metric matrix function, we learn local metrics as linear combinations of basis metrics defined on anchor points over different regions of the instance space. We constrain the metric matrix function by imposing manifold regularization on the linear combinations, which makes the learned metric matrix function vary smoothly along the geodesics of the data manifold. Our metric learning method has excellent performance in terms of both predictive power and scalability. We experimented with several large-scale classification problems of tens of thousands of instances, and compared it with several state-of-the-art metric learning methods, both global and local, as well as with SVM with automatic kernel selection, all of which it outperforms significantly.

NeurIPS Conference 2011 Conference Paper

Metric Learning with Multiple Kernels

  • Jun Wang
  • Huyen T.
  • Adam Woznica
  • Alexandros Kalousis

Metric learning has become a very active research field. The most popular representative--Mahalanobis metric learning--can be seen as learning a linear transformation and then computing the Euclidean metric in the transformed space. Since a linear transformation might not always be appropriate for a given learning problem, kernelized versions of various metric learning algorithms exist. However, the problem then becomes finding the appropriate kernel function. Multiple kernel learning addresses this limitation by learning a linear combination of a number of predefined kernels; this approach can also be readily used in the context of multiple-source learning to fuse different data sources. Surprisingly, and despite the extensive work on multiple kernel learning for SVMs, there has been no work in the area of metric learning with multiple kernel learning. In this paper we fill this gap and present a general approach for metric learning with multiple kernel learning. Our approach can be instantiated with different metric learning algorithms provided that they satisfy some constraints. Experimental evidence suggests that our approach outperforms metric learning with an unweighted kernel combination and metric learning with cross-validation based kernel selection.

AAAI Conference 2008 Conference Paper

Generating Application-Specific Benchmark Models for Complex Systems

  • Jun Wang

Automated generators for synthetic models and data can play a crucial role in designing new algorithms/model frameworks, given the sparsity of benchmark models for empirical analysis and the cost of generating models by hand. We describe an automated generator for benchmark models that is based on a compositional modeling framework and employs random-graph models for the system topology. We choose the system topology that best matches the topology of the real-world system using a domain-analysis algorithm. To show the range of models for which this approach is applicable, we demonstrate our model-generation process using two examples of model generation optimized for a specific domain: (1) model-based diagnosis for discrete Boolean circuits, and (2) E. coli TRN networks for simulating gene expression.

IJCAI Conference 2007 Conference Paper

  • Gregory Provan
  • Jun Wang

The task of model-based diagnosis is NP-complete, but it is not known whether it is computationally difficult for the "average" real-world system. There has been no systematic study of the complexity of diagnosing real-world problems, and few good benchmarks exist to test this. Real-world-graphs, a mathematical framework that has been proposed as a model for complex systems, have empirically been shown to capture several topological properties of real-world systems. We describe the adequacy with which a real-world-graph can characterise the complexity of model-based diagnostic inference on real-world systems. We empirically compare the inference complexity of diagnosing models automatically generated using the real-world-graph framework with comparable models from well-known ISCAS circuit benchmarks. We identify parameters necessary for the real-world-graph framework to generate benchmark diagnosis circuit models with realistic properties.