Author name cluster

Ziyan Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

22 papers
2 author rows

Possible papers (22)

JBHI Journal 2026 Journal Article

Advanced Camera-Based Scoliosis Screening via Deep Learning Detection and Fusion of Trunk, Limb, and Skeleton Features

  • Ziyan Wang
  • Yi Zhou
  • Ninghui Xu
  • Yuqin Zhou
  • Heran Zhao
  • Zhiyong Chang
  • Zhigang Hu
  • Xiao Han

Scoliosis significantly impacts quality of life, highlighting the need for effective early scoliosis screening (SS) and intervention. However, current SS methods often involve physical contact, undressing, or radiation exposure. This study introduces an innovative, non-invasive SS approach using a monocular RGB camera that eliminates the need for undressing, sensor attachment, and radiation exposure. Our approach employs Parameterized Human 3D Reconstruction (PH3DR) to reconstruct 3D human models, effectively eliminating clothing obstructions, and integrates an ISANet segmentation network enhanced by our proposed Multi-Scale Fusion Attention (MSFA) module to segment distinct human trunk and limb features (HTLF), capturing body-surface asymmetries related to scoliosis. Additionally, we propose a Swin Transformer (ST)-enhanced CMU-Pose to extract human skeleton features (HSF), identifying skeletal asymmetries crucial for SS. Finally, we develop a fusion model that integrates the HTLF and HSF, combining surface morphology and skeletal features to improve the precision of SS. The experiments demonstrated that PH3DR and MSFA significantly improved the segmentation and extraction of HTLF, whereas the ST-enhanced CMU-Pose substantially enhanced the extraction of HSF. Our final model achieved an F1 (0.895 ± 0.014) comparable to the best-performing baseline model, with only 0.79% of the parameters and 1.64% of the FLOPs, reaching 36 FPS, significantly higher than the best-performing baseline model (10 FPS). Moreover, our model outperformed two spine surgeons, one less experienced and the other moderately experienced. As a patient-friendly, privacy-preserving, and easily deployable solution, this approach is particularly well suited for early SS and routine monitoring.
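
The trained components above are not public, so as a purely illustrative sketch of the final fusion step, a late-fusion head might concatenate pooled trunk/limb features (HTLF) with skeleton features (HSF); all names, dimensions, and the binary screening output below are assumptions:

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Hypothetical late-fusion head: combines pooled trunk/limb features
    (HTLF) with skeleton features (HSF) into one screening score."""
    def __init__(self, htlf_dim=256, hsf_dim=128, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(htlf_dim + hsf_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, htlf, hsf):
        # concatenate the two modalities, then map to a probability
        return torch.sigmoid(self.mlp(torch.cat([htlf, hsf], dim=-1)))

# toy usage: a batch of 4 subjects
model = FusionClassifier()
score = model(torch.randn(4, 256), torch.randn(4, 128))  # (4, 1) in [0, 1]
```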

AAAI Conference 2026 Conference Paper

Graph Domain Adaptation via Homophily-Agnostic Reconstructing Structure

  • Ruiyi Fang
  • Shuo Wang
  • Ruizhi Pu
  • Qiuhao Zeng
  • Hao Zheng
  • Ziyan Wang
  • Jiale Cai
  • Zhimin Mei

Graph Domain Adaptation (GDA) transfers knowledge from labeled source graphs to unlabeled target graphs, addressing the challenge of label scarcity. However, existing GDA methods typically assume that both source and target graphs exhibit homophily, causing them to perform poorly when heterophily is present. Furthermore, the lack of labels in the target graph makes it impossible to assess its homophily level beforehand. To address this challenge, we propose a novel homophily-agnostic approach that effectively transfers knowledge between graphs with varying degrees of homophily. Specifically, we adopt a divide-and-conquer strategy that first reconstructs highly homophilic and highly heterophilic variants of both the source and target graphs, and then performs knowledge alignment separately between corresponding graph variants. Extensive experiments conducted on five benchmark datasets demonstrate the superior performance of our approach, particularly highlighting its substantial advantages on heterophilic graphs.
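
One common way to realize homophilic and heterophilic graph variants (the paper's exact reconstruction procedure is not described here, so this is only a plausible sketch) is to propagate node features through low-pass and high-pass filters derived from the normalized adjacency:

```python
import numpy as np

def graph_variants(A, X):
    """Sketch: low-pass (homophily-friendly) and high-pass
    (heterophily-friendly) propagation from adjacency A and features X."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    A_hat = D_inv_sqrt @ A @ D_inv_sqrt   # symmetric normalized adjacency
    X_homo = A_hat @ X                    # low-pass: smooths over neighbors
    X_hetero = X - A_hat @ X              # high-pass: emphasizes differences
    return X_homo, X_hetero

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = np.random.randn(3, 8)
X_homo, X_hetero = graph_variants(A, X)
```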

AAAI Conference 2026 Conference Paper

Safe Multi-agent Reinforcement Learning with Natural Language Constraints

  • Ziyan Wang
  • Meng Fang
  • Tristan Tomilin
  • Fei Fang
  • Yali Du

Safe Multi-Agent Reinforcement Learning (MARL) typically relies on manually specified numeric cost functions to ensure that policy behaviours respect safety constraints. As systems scale and human-defined constraints become more diverse, context-dependent, and frequently updated, hand-crafting such cost functions becomes prohibitively complex, tedious, and error-prone. Natural language offers an intuitive and flexible alternative for defining constraints, enabling broader accessibility and easier adaptation to new scenarios and evolving rules. However, current MARL frameworks lack effective mechanisms to incorporate free-form textual constraints in a robust and principled way. To bridge this gap, we introduce Safe Multi-Agent Reinforcement Learning with natural Language constraints (SMALL), a framework that leverages fine-tuned language models to parse and encode textual constraints into semantically meaningful embeddings. These embeddings characterise prohibited states or behaviours and enable automatic prediction of constraint violations. We integrate the resulting learned costs directly into MARL training, allowing agents to optimise task performance while simultaneously minimising constraint violations, without requiring manually engineered numeric cost functions. To rigorously evaluate our method, we also propose the LaMaSafe benchmark, a set of diverse multi-agent tasks designed to assess the capability of MARL algorithms to understand and adhere to realistic, human-provided natural language constraints. Experimental results across LaMaSafe environments show that SMALL achieves comparable task performance to strong MARL baselines while significantly reducing constraint violations. While SMALL does not provide formal safety guarantees, it demonstrates that natural language can be used to shape multi-agent behaviour toward safer policies.
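
A minimal sketch of the parse-and-embed idea, predicting a violation cost from a text-constraint embedding and a state embedding; the architecture, dimensions, and cosine-similarity rule here are assumptions, not SMALL's actual design:

```python
import torch
import torch.nn as nn

class CostPredictor(nn.Module):
    """Hypothetical violation predictor: scores how strongly a state
    embedding matches the embedding of a natural-language constraint."""
    def __init__(self, state_dim=64, text_dim=384, joint=128):
        super().__init__()
        self.state_proj = nn.Linear(state_dim, joint)
        self.text_proj = nn.Linear(text_dim, joint)

    def forward(self, state, constraint_emb):
        s = nn.functional.normalize(self.state_proj(state), dim=-1)
        c = nn.functional.normalize(self.text_proj(constraint_emb), dim=-1)
        # cosine similarity mapped to [0, 1]; treat as predicted cost
        return 0.5 * (1.0 + (s * c).sum(-1))

pred = CostPredictor()
cost = pred(torch.randn(4, 64), torch.randn(4, 384))  # per-state cost
```

In a constrained MARL setup, such a predicted cost could then enter the training objective, for instance as a Lagrangian-style penalty alongside the task reward.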

AAAI Conference 2025 Conference Paper

Active Large Language Model-Based Knowledge Distillation for Session-Based Recommendation

  • Yingpeng Du
  • Zhu Sun
  • Ziyan Wang
  • Haoyan Chua
  • Jie Zhang
  • Yew-Soon Ong

Large language models (LLMs) provide a promising way for accurate session-based recommendation (SBR), but they demand substantial computational time and memory. Knowledge distillation (KD)-based methods can alleviate these issues by transferring knowledge to a small student, training the student on the predictions of a cumbersome teacher. However, these methods face difficulties in LLM-based KD for SBR. 1) It is expensive to have LLMs make predictions for all instances in KD. 2) LLMs may make ineffective predictions for some instances in KD, e.g., incorrect predictions for hard instances or predictions similar to an existing recommender's for easy instances. In this paper, we propose an active LLM-based KD method for SBR, contributing to sustainable AI. To distill knowledge from LLMs efficiently and at limited cost, we propose to extract only a small proportion of instances for the LLM to predict. Meanwhile, for more effective distillation, we propose an active learning strategy that extracts instances as effective as possible for KD from a theoretical perspective. Specifically, we first formulate gains based on the potential effects (e.g., effective, similar, and incorrect predictions by LLMs) and difficulties (e.g., easy or hard to fit) of instances for KD. Then, we propose to maximize the minimal gains of distillation to find the optimal selection policy for active learning, which largely avoids extracting ineffective instances in KD. Experiments on real-world datasets show that our method significantly outperforms state-of-the-art methods for SBR.
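
The max-min selection idea can be pictured with a toy greedy routine; the gain matrix below is a placeholder, not the paper's formulation:

```python
import numpy as np

def select_instances(gains, budget):
    """Toy max-min selection: greedily pick the instance whose inclusion
    maximizes the minimum gain over the selected set.

    gains: (n_instances, n_gain_types) matrix of per-instance KD gains.
    """
    selected = []
    remaining = list(range(len(gains)))
    while len(selected) < budget and remaining:
        # for each candidate, the min gain of the set if we added it
        best = max(remaining, key=lambda i: gains[selected + [i]].min())
        selected.append(best)
        remaining.remove(best)
    return selected

gains = np.random.rand(100, 3)  # e.g., effectiveness / novelty / difficulty
idx = select_instances(gains, budget=10)  # instances to send to the teacher
```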

NeurIPS Conference 2025 Conference Paper

DGH: Dynamic Gaussian Hair

  • Junying Wang
  • Yuanlu Xu
  • Edith Tretschk
  • Ziyan Wang
  • Anastasia Ianina
  • Aljaz Bozic
  • Ulrich Neumann
  • Tony Tung

The creation of photorealistic dynamic hair remains a major challenge in digital human modeling because of the complex motions, occlusions, and light scattering. Existing methods often resort to static capture and physics-based models that do not scale as they require manual parameter fine-tuning to handle the diversity of hairstyles and motions, and heavy computation to obtain high-quality appearance. In this paper, we present Dynamic Gaussian Hair (DGH), a novel framework that efficiently learns hair dynamics and appearance. We propose: (1) a coarse-to-fine model that learns temporally coherent hair motion dynamics across diverse hairstyles; (2) a strand-guided optimization module that learns a dynamic 3D Gaussian representation for hair appearance with support for differentiable rendering, enabling gradient-based learning of view-consistent appearance under motion. Unlike prior simulation-based pipelines, our approach is fully data-driven, scales with training data, and generalizes across various hairstyles and head motion sequences. Additionally, DGH can be seamlessly integrated into a 3D Gaussian avatar framework, enabling realistic, animatable hair for high-fidelity avatar representation. DGH achieves promising geometry and appearance results, providing a scalable, data-driven alternative to physics-based simulation and rendering.

NeurIPS Conference 2025 Conference Paper

Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia

  • Chandler Smith
  • Marwa Abdulhai
  • Manfred Díaz
  • Marko Tesic
  • Rakshit Trivedi
  • Sasha Vezhnevets
  • Lewis Hammond
  • Jesse Clifton

Large Language Model (LLM) agents have demonstrated impressive capabilities for social interaction and are increasingly being deployed in situations where they might engage with both human and artificial agents. These interactions represent a critical frontier for LLM-based agents, yet existing evaluation methods fail to measure how well these capabilities generalize to novel social situations. In this paper, we introduce a method for evaluating the ability of LLM-based agents to cooperate in zero-shot, mixed-motive environments using Concordia, a natural language multi-agent simulation environment. Our method measures general cooperative intelligence by testing an agent's ability to identify and exploit opportunities for mutual gain across diverse partners and contexts. We present empirical results from the NeurIPS 2024 Concordia Contest, where agents were evaluated on their ability to achieve mutual gains across a suite of diverse scenarios ranging from negotiation to collective action problems. Our findings reveal significant gaps between current agent capabilities and the robust generalization required for reliable cooperation, particularly in scenarios demanding persuasion and norm enforcement.

TMLR Journal 2025 Journal Article

MACCA: Offline Multi-agent Reinforcement Learning with Causal Credit Assignment

  • Ziyan Wang
  • Yali Du
  • Yudi Zhang
  • Meng Fang
  • Biwei Huang

Offline Multi-agent Reinforcement Learning (MARL) is valuable in scenarios where online interaction is impractical or risky. While independent learning in MARL offers flexibility and scalability, accurately assigning credit to individual agents in offline settings poses challenges because interaction with the environment is prohibited. In this paper, we propose a new framework, Multi-Agent Causal Credit Assignment (MACCA), to address credit assignment in the offline MARL setting. MACCA characterizes the generative process as a Dynamic Bayesian Network, capturing the relationships between environmental variables, states, actions, and rewards. By estimating this model on offline data, MACCA can learn each agent's contribution by analyzing the causal relationships underlying their individual rewards, ensuring accurate and interpretable credit assignment. Additionally, the modularity of our approach allows it to integrate seamlessly with various offline MARL methods. Theoretically, we prove that, under the offline-dataset setting, the underlying causal structure and the functions generating the agents' individual rewards are identifiable, which lays the foundation for the correctness of our modeling. In our experiments, we demonstrate that MACCA not only outperforms state-of-the-art methods but also enhances performance when integrated with other backbones.
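
Stripped of the causal-structure learning (which is MACCA's actual contribution), the credit-assignment backbone amounts to fitting per-agent reward heads whose sum reconstructs the logged team reward; a hedged sketch:

```python
import torch
import torch.nn as nn

class RewardDecomposer(nn.Module):
    """Sketch: per-agent reward heads whose sum should match the team
    reward. MACCA additionally learns a causal mask over inputs (omitted)."""
    def __init__(self, obs_dim, act_dim, n_agents, hidden=64):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1))
            for _ in range(n_agents)
        )

    def forward(self, obs, act):
        # obs/act: (batch, n_agents, dim)
        r_i = torch.stack([h(torch.cat([obs[:, i], act[:, i]], -1)).squeeze(-1)
                           for i, h in enumerate(self.heads)], dim=1)
        return r_i, r_i.sum(dim=1)  # per-agent rewards, reconstructed team reward

model = RewardDecomposer(obs_dim=8, act_dim=2, n_agents=3)
r_i, r_team = model(torch.randn(16, 3, 8), torch.randn(16, 3, 2))
loss = ((r_team - torch.randn(16)) ** 2).mean()  # regress onto logged reward
```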

ICML Conference 2025 Conference Paper

M³HF: Multi-agent Reinforcement Learning from Multi-phase Human Feedback of Mixed Quality

  • Ziyan Wang
  • Zhicheng Zhang
  • Fei Fang 0001
  • Yali Du 0001

Designing effective reward functions in multi-agent reinforcement learning (MARL) is a significant challenge, often leading to suboptimal or misaligned behaviors in complex, coordinated environments. We introduce Multi-agent Reinforcement Learning from Multi-phase Human Feedback of Mixed Quality (M³HF), a novel framework that integrates multi-phase human feedback of mixed quality into the MARL training process. By involving humans with diverse expertise levels in providing iterative guidance, M³HF leverages both expert and non-expert feedback to continuously refine agents' policies. During training, we strategically pause agent learning for human evaluation, parse feedback using large language models to attribute it appropriately, and update reward functions through predefined templates and adaptive weights, adjusting the weights via weight decay and performance-based tuning. Our approach enables the integration of nuanced human insights across various levels of quality, enhancing the interpretability and robustness of multi-agent cooperation. Empirical results in challenging environments demonstrate that M³HF significantly outperforms state-of-the-art methods, effectively addressing the complexities of reward design in MARL and enabling broader human participation in the training process.
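
The adaptive-weight update can be pictured as maintaining a weighted sum of reward terms parsed from each feedback phase, with decay plus performance-based adjustment; the update rule below is a toy placeholder, not the paper's exact scheme:

```python
def combined_reward(state, reward_terms, weights):
    """Weighted sum of reward components parsed from human feedback."""
    return sum(w * r(state) for r, w in zip(reward_terms, weights))

def update_weights(weights, perf_deltas, decay=0.99, lr=0.1):
    """Decay all weights, then boost or penalize each term by how much
    performance changed after it was introduced (placeholder rule)."""
    return [max(0.0, decay * w + lr * d) for w, d in zip(weights, perf_deltas)]

weights = [1.0, 0.5]                                    # two feedback phases
weights = update_weights(weights, perf_deltas=[+0.2, -0.1])
```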

NeurIPS Conference 2025 Conference Paper

PoGDiff: Product-of-Gaussians Diffusion Models for Imbalanced Text-to-Image Generation

  • Ziyan Wang
  • Sizhe Wei
  • Xiaoming Huo
  • Hao Wang

Diffusion models have made significant advancements in recent years. However, their performance often deteriorates when trained or fine-tuned on imbalanced datasets. This degradation is largely due to the disproportionate representation of majority and minority data in image-text pairs. In this paper, we propose a general fine-tuning approach, dubbed PoGDiff, to address this challenge. Rather than directly minimizing the KL divergence between the predicted and ground-truth distributions, PoGDiff replaces the ground-truth distribution with a Product of Gaussians (PoG), which is constructed by combining the original ground-truth targets with the predicted distribution conditioned on a neighboring text embedding. Experiments on real-world datasets demonstrate that our method effectively addresses the imbalance problem in diffusion models, improving both generation accuracy and quality.
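
For intuition, the product of two Gaussian densities is itself Gaussian (up to normalization), with precisions that add and a precision-weighted mean; a quick numeric check, illustrative only and independent of the diffusion setting:

```python
import numpy as np

def product_of_gaussians(mu1, var1, mu2, var2):
    """Closed form for N(mu1, var1) * N(mu2, var2) up to normalization:
    precisions add, and the mean is the precision-weighted average."""
    p1, p2 = 1.0 / var1, 1.0 / var2
    var = 1.0 / (p1 + p2)
    mu = var * (p1 * mu1 + p2 * mu2)
    return mu, var

mu, var = product_of_gaussians(mu1=0.0, var1=1.0, mu2=2.0, var2=0.5)
print(mu, var)  # 1.333..., 0.333...: pulled toward the sharper component
```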

ICLR Conference 2025 Conference Paper

Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing

  • Qi Le
  • Enmao Diao
  • Ziyan Wang
  • Xinran Wang
  • Jie Ding 0002
  • Li Yang
  • Ali Anwar 0001

We introduce Probe Pruning (PP), a novel framework for online, dynamic, structured pruning of Large Language Models (LLMs) applied in a batch-wise manner. PP leverages the insight that not all samples and tokens contribute equally to the model's output: probing a small portion of each batch effectively identifies crucial weights, enabling tailored dynamic pruning for different batches. It comprises three main stages: probing, history-informed pruning, and full inference. In the probing stage, PP selects a small yet crucial set of hidden states, based on residual importance, to run a few model layers ahead. During the history-informed pruning stage, PP strategically integrates the probing states with historical states. Subsequently, it structurally prunes weights based on the integrated states and the PP importance score, a metric developed specifically to assess the importance of each weight channel in maintaining performance. In the final stage, full inference is conducted on the remaining weights. A major advantage of PP is its compatibility with existing models, as it operates without requiring additional neural network modules or fine-tuning. Comprehensive evaluations of PP on LLaMA-2/3 and OPT models reveal that even minimal probing, using just 1.5% of FLOPs, can substantially enhance the efficiency of structured pruning of LLMs. For instance, when evaluated on LLaMA-2-7B with WikiText2, PP achieves a 2.56 times lower ratio of performance degradation per unit of latency reduction compared to the state-of-the-art method at a 40% pruning ratio.
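
The channel-importance idea can be sketched by scoring each weight channel with probed activation magnitudes times weight norms and keeping the top fraction; the actual PP importance score and history integration are more involved, so treat this as a toy stand-in:

```python
import torch

def channel_mask(W, probe_acts, keep_ratio=0.6):
    """Toy structured-pruning mask: score each input channel of a linear
    layer by |activation| * ||weight column||, keep the top keep_ratio.

    W: (out_features, in_features); probe_acts: (n_probe_tokens, in_features)
    """
    act_norm = probe_acts.abs().mean(dim=0)   # per-channel activation scale
    w_norm = W.norm(dim=0)                    # per-channel weight norm
    score = act_norm * w_norm
    k = max(1, int(keep_ratio * W.shape[1]))
    keep = torch.topk(score, k).indices
    mask = torch.zeros(W.shape[1], dtype=torch.bool)
    mask[keep] = True
    return mask                               # apply as W[:, mask]

W = torch.randn(32, 16)
mask = channel_mask(W, probe_acts=torch.randn(8, 16))
```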

AAAI Conference 2025 Conference Paper

Re2LLM: Reflective Reinforcement Large Language Model for Session-based Recommendation

  • Ziyan Wang
  • Yingpeng Du
  • Zhu Sun
  • Haoyan Chua
  • Kaidong Feng
  • Wenya Wang
  • Jie Zhang

Emerging advancements in large language models (LLMs) show significant potential for enhancing recommendations. However, prompt-based methods often struggle to find ideal prompts without task-specific feedback, while fine-tuning-based methods are hindered by high computational demands and dependence on open-source backbones. To address these challenges, we propose a Reflective Reinforcement Large Language Model (Re2LLM) for session-based recommendation, which refines LLMs to generate and utilize specialized knowledge effectively and efficiently. Specifically, we first devise the Reflective Exploration Module to extract and present knowledge in a form that LLMs can easily process. This module enables LLMs to reflect on their recommendation mistakes and construct a hint knowledge base to rectify them effectively. Next, we design the Reinforcement Utilization Module to train a lightweight retrieval agent that elicits correct LLM reasoning. This module recognizes hints as signals to facilitate LLM recommendations and learns to select appropriate hints from the constructed knowledge base using task-specific feedback efficiently. Lastly, we conduct experiments on real-world datasets and demonstrate the superiority of our Re2LLM over state-of-the-art methods.

ICLR Conference 2025 Conference Paper

Towards Domain Adaptive Neural Contextual Bandits

  • Ziyan Wang
  • Xiaoming Huo
  • Hao Wang 0014

Contextual bandit algorithms are essential for solving real-world decision-making problems. In practice, collecting a contextual bandit's feedback from different domains may involve different costs; for example, measuring drug reactions from mice (as a source domain) versus from humans (as a target domain). Unfortunately, adapting a contextual bandit algorithm from a source domain to a target domain under distribution shift remains a major and largely unexplored challenge. In this paper, we introduce the first general domain adaptation method for contextual bandits. Our approach learns a bandit model for the target domain by collecting feedback from the source domain. Our theoretical analysis shows that our algorithm maintains a sub-linear regret bound even when adapting across domains. Empirical results show that our approach outperforms state-of-the-art contextual bandit algorithms on real-world datasets. Code will soon be available at https://github.com/Wang-ML-Lab/DABand.

NeurIPS Conference 2024 Conference Paper

Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf

  • Xuanfa Jin
  • Ziyan Wang
  • Yali Du
  • Meng Fang
  • Haifeng Zhang
  • Jun Wang

Communication is a fundamental aspect of human society, facilitating the exchange of information and beliefs among people. Despite the advancements in large language models (LLMs), recent agents built with them often neglect control over discussion tactics, which are essential in communication scenarios and games. As a variant of the famous communication game Werewolf, One Night Ultimate Werewolf (ONUW) requires players to develop strategic discussion policies, since potential role changes increase the uncertainty and complexity of the game. In this work, we first establish the existence of Perfect Bayesian Equilibria (PBEs) in two scenarios of the ONUW game: one with discussion and one without. The results show that discussion greatly changes players' utilities by affecting their beliefs, emphasizing the significance of discussion tactics. Based on the insights obtained from these analyses, we propose an RL-instructed language agent framework, in which a discussion policy trained by reinforcement learning (RL) determines the appropriate discussion tactics to adopt. Our experimental results on several ONUW game settings demonstrate the effectiveness and generalizability of our proposed framework.

NeurIPS Conference 2024 Conference Paper

Policy Learning from Tutorial Books via Understanding, Rehearsing and Introspecting

  • Xiong-Hui Chen
  • Ziyan Wang
  • Yali Du
  • Shengyi Jiang
  • Meng Fang
  • Yang Yu
  • Jun Wang

When humans need to learn a new skill, we can acquire knowledge through written books, including textbooks, tutorials, etc. However, current research on decision-making, such as reinforcement learning (RL), has primarily required numerous real interactions with the target environment to learn a skill, failing to utilize the existing knowledge already summarized in text. The success of Large Language Models (LLMs) sheds light on utilizing the knowledge behind such books. In this paper, we discuss a new policy learning problem called Policy Learning from tutorial Books (PLfB), built on the shoulders of LLM systems, which aims to leverage rich resources such as tutorial books to derive a policy network. Inspired by how humans learn from books, we solve the problem via a three-stage framework: Understanding, Rehearsing, and Introspecting (URI). In particular, it first rehearses decision-making trajectories based on the knowledge derived from understanding the books, then introspects on the imaginary dataset to distill a policy network. We build two benchmarks for PLfB based on Tic-Tac-Toe and Football games. In experiments, URI's policy achieves at least a 44% net win rate against GPT-based agents without any real data; in the more complex Football game, URI's policy beats the built-in AI with a 37% winning rate, while the GPT-based agent achieves only 6%. The project page: https://plfb-football.github.io.

AAMAS Conference 2024 Conference Paper

Safe Reinforcement Learning with Free-form Natural Language Constraints and Pre-Trained Language Models

  • Xingzhou Lou
  • Junge Zhang
  • Ziyan Wang
  • Kaiqi Huang
  • Yali Du

Safe reinforcement learning (RL) agents accomplish given tasks while adhering to specific constraints. Employing constraints expressed via easily understandable human language offers considerable potential for real-world applications due to its accessibility and non-reliance on domain expertise. Previous safe RL methods with natural language constraints typically adopt a recurrent neural network, which leads to limited capabilities when dealing with various forms of human language input. Furthermore, these methods often require a ground-truth cost function, necessitating domain expertise to convert language constraints into a well-defined cost function that determines constraint violation. To address these issues, we propose to use pre-trained language models (LMs) to facilitate RL agents' comprehension of natural language constraints and allow them to infer costs for safe policy learning. Through the use of pre-trained LMs and the elimination of the need for a ground-truth cost, our method enhances safe policy learning under a diverse set of human-derived free-form natural language constraints. Experiments on grid-world navigation and robot control show that the proposed method can achieve strong performance while adhering to the given constraints. The use of pre-trained LMs allows our method to comprehend complicated constraints and learn safe policies without the need for a ground-truth cost at any stage of training or evaluation. Extensive ablation studies demonstrate the efficacy of each part of our method.

IJCAI Conference 2024 Conference Paper

ZeroDDI: A Zero-Shot Drug-Drug Interaction Event Prediction Method with Semantic Enhanced Learning and Dual-modal Uniform Alignment

  • Ziyan Wang
  • Zhankun Xiong
  • Feng Huang
  • Xuan Liu
  • Wen Zhang

Drug-drug interactions (DDIs) can result in various pharmacological changes, which can be categorized into different classes known as DDI events (DDIEs). In recent years, previously unobserved/unseen DDIEs have been emerging, posing a new classification task in which unseen classes have no labelled instances at the training stage, formulated as the zero-shot DDIE prediction (ZS-DDIE) task. However, existing computational methods are not directly applicable to ZS-DDIE, which poses two primary challenges: obtaining suitable DDIE representations and handling the class imbalance issue. To overcome these challenges, we propose a novel method named ZeroDDI for the ZS-DDIE task. Specifically, we design a biological-semantic-enhanced DDIE representation learning module, which emphasizes the key biological semantics and distills discriminative molecular substructure-related semantics for DDIE representation learning. Furthermore, we propose a dual-modal uniform alignment strategy to distribute drug-pair representations and DDIE semantic representations uniformly on the unit sphere and align the matched ones, which can mitigate the issue of class imbalance. Extensive experiments show that ZeroDDI surpasses the baselines, indicating that it is a promising tool for detecting unseen DDIEs. Our code has been released at https://github.com/wzy-Sarah/ZeroDDI.
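
The uniform-alignment strategy is reminiscent of the alignment and uniformity losses of Wang & Isola (2020); whether ZeroDDI uses these exact forms is an assumption, but they convey the idea:

```python
import torch

def alignment_loss(x, y):
    """Pull matched (drug-pair, DDIE-semantic) embeddings together."""
    x = torch.nn.functional.normalize(x, dim=-1)
    y = torch.nn.functional.normalize(y, dim=-1)
    return (x - y).pow(2).sum(-1).mean()

def uniformity_loss(x, t=2.0):
    """Spread embeddings over the unit sphere
    (log of the mean pairwise Gaussian potential)."""
    x = torch.nn.functional.normalize(x, dim=-1)
    return torch.pdist(x).pow(2).mul(-t).exp().mean().log()

x, y = torch.randn(32, 64), torch.randn(32, 64)
loss = alignment_loss(x, y) + uniformity_loss(x) + uniformity_loss(y)
```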

NeurIPS Conference 2023 Conference Paper

ChessGPT: Bridging Policy Learning and Language Modeling

  • Xidong Feng
  • Yicheng Luo
  • Ziyan Wang
  • Hongrui Tang
  • Mengyue Yang
  • Kun Shao
  • David Mguni
  • Yali Du

When solving decision-making tasks, humans typically depend on information from two key sources: (1) historical policy data, which provides interaction replay from the environment, and (2) analytical insights in natural language form, exposing the invaluable thought process or strategic considerations. Despite this, the majority of preceding research focuses on only one source: they either use historical replay exclusively to directly learn policy or value functions, or engage in language model training using only a language corpus. In this paper, we argue that a powerful autonomous agent should cover both sources. Thus, we propose ChessGPT, a GPT model bridging policy learning and language modeling by integrating data from these two sources in chess games. Specifically, we build a large-scale game and language dataset related to chess. Leveraging the dataset, we showcase two model examples, ChessCLIP and ChessGPT, integrating policy learning and language modeling. Finally, we propose a full framework for evaluating a language model's chess ability. Experimental results validate our model and dataset's effectiveness. We open-source our code, model, and dataset at https://github.com/waterhorse1/ChessGPT.

NeurIPS Conference 2023 Conference Paper

Interpretable Reward Redistribution in Reinforcement Learning: A Causal Approach

  • Yudi Zhang
  • Yali Du
  • Biwei Huang
  • Ziyan Wang
  • Jun Wang
  • Meng Fang
  • Mykola Pechenizkiy

A major challenge in reinforcement learning is to determine which state-action pairs are responsible for future rewards that are delayed. Reward redistribution serves as a solution to re-assign credit for each time step from observed sequences. While the majority of current approaches construct the reward redistribution in an uninterpretable manner, we propose to explicitly model the contributions of state and action from a causal perspective, resulting in an interpretable reward redistribution that preserves policy invariance. In this paper, we start by studying the role of causal generative models in reward redistribution, characterizing the generation of Markovian rewards and trajectory-wise long-term return, and further propose a framework, called Generative Return Decomposition (GRD), for policy optimization in delayed-reward scenarios. Specifically, GRD first identifies the unobservable Markovian rewards and causal relations in the generative process. Then, GRD makes use of the identified causal generative model to form a compact representation and trains the policy over the most favorable subspace of the agent's state space. Theoretically, we show that the unobservable Markovian reward function is identifiable, as well as the underlying causal structure and causal models. Experimental results show that our method outperforms state-of-the-art methods, and the provided visualizations further demonstrate the interpretability of our method. The project page is located at https://reedzyd.github.io/GenerativeReturnDecomposition/.

AAAI Conference 2023 Conference Paper

Multi-Relational Contrastive Learning Graph Neural Network for Drug-Drug Interaction Event Prediction

  • Zhankun Xiong
  • Shichao Liu
  • Feng Huang
  • Ziyan Wang
  • Xuan Liu
  • Zhongfei Zhang
  • Wen Zhang

Drug-drug interactions (DDIs) could lead to various unexpected adverse consequences, so-called DDI events. Predicting DDI events can reduce the potential risk of combinatorial therapy and improve the safety of medication use, and has attracted much attention in the deep learning community. Recently, graph neural network (GNN)-based models have aroused broad interest and achieved satisfactory results in the DDI event prediction. Most existing GNN-based models ignore either drug structural information or drug interactive information, but both aspects of information are important for DDI event prediction. Furthermore, accurately predicting rare DDI events is hindered by their inadequate labeled instances. In this paper, we propose a new method, Multi-Relational Contrastive learning Graph Neural Network, MRCGNN for brevity, to predict DDI events. Specifically, MRCGNN integrates the two aspects of information by deploying a GNN on the multi-relational DDI event graph attributed with the drug features extracted from drug molecular graphs. Moreover, we implement a multi-relational graph contrastive learning with a designed dual-view negative counterpart augmentation strategy, to capture implicit information about rare DDI events. Extensive experiments on two datasets show that MRCGNN outperforms the state-of-the-art methods. Besides, we observe that MRCGNN achieves satisfactory performance when predicting rare DDI events.

NeurIPS Conference 2023 Conference Paper

Variational Imbalanced Regression: Fair Uncertainty Quantification via Probabilistic Smoothing

  • Ziyan Wang
  • Hao Wang

Existing regression models tend to fall short in both accuracy and uncertainty estimation when the label distribution is imbalanced. In this paper, we propose a probabilistic deep learning model, dubbed variational imbalanced regression (VIR), which not only performs well in imbalanced regression but naturally produces reasonable uncertainty estimation as a byproduct. Different from typical variational autoencoders, which assume i.i.d. representations (a data point's representation is not directly affected by other data points), our VIR borrows data with similar regression labels to compute the latent representation's variational distribution; furthermore, different from deterministic regression models producing point estimates, VIR predicts entire normal-inverse-gamma distributions and modulates the associated conjugate distributions to impose probabilistic reweighting on the imbalanced data, thereby providing better uncertainty estimation. Experiments on several real-world datasets show that our VIR outperforms state-of-the-art imbalanced regression models in terms of both accuracy and uncertainty estimation. Code will soon be available at https://github.com/Wang-ML-Lab/variational-imbalanced-regression.
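
For context, a normal-inverse-gamma head yields a Student-t predictive distribution with simple moment formulas (standard in evidential regression; whether VIR uses exactly this parameterization is an assumption):

```python
def nig_predictive(mu, nu, alpha, beta):
    """Predictive mean and variance under a Normal-Inverse-Gamma posterior
    N(mu, sigma^2 / nu) * InvGamma(alpha, beta); requires alpha > 1."""
    mean = mu
    var = beta * (1.0 + 1.0 / nu) / (alpha - 1.0)  # aleatoric + epistemic
    return mean, var

mean, var = nig_predictive(mu=0.3, nu=2.0, alpha=3.0, beta=1.0)  # var = 0.75
```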

ICML Conference 2022 Conference Paper

Saute RL: Almost Surely Safe Reinforcement Learning Using State Augmentation

  • Aivar Sootla
  • Alexander I. Cowen-Rivers
  • Taher Jafferjee
  • Ziyan Wang
  • David Henry Mguni
  • Jun Wang 0012
  • Haitham Bou-Ammar

Satisfying safety constraints almost surely (or with probability one) can be critical for the deployment of Reinforcement Learning (RL) in real-life applications. For example, plane landing and take-off should ideally occur with probability one. We address the problem by introducing Safety Augmented (Saute) Markov Decision Processes (MDPs), where the safety constraints are eliminated by augmenting them into the state-space and reshaping the objective. We show that Saute MDP satisfies the Bellman equation and moves us closer to solving Safe RL with constraints satisfied almost surely. We argue that Saute MDP allows viewing the Safe RL problem from a different perspective, enabling new features. For instance, our approach has a plug-and-play nature, i.e., any RL algorithm can be "Sauteed". Additionally, state augmentation allows for policy generalization across safety constraints. We finally show that Saute RL algorithms can outperform their state-of-the-art counterparts when constraint satisfaction is of high importance.
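
State augmentation is easy to picture: carry the remaining safety budget in the state and remove task reward once it is exhausted. A minimal wrapper sketch; the env.step signature returning (obs, reward, cost, done) is an assumption:

```python
import numpy as np

class SauteWrapper:
    """Sketch of Saute-style state augmentation: append the remaining
    (normalized) safety budget to the observation and reshape the reward."""
    def __init__(self, env, budget):
        self.env, self.budget = env, budget

    def reset(self):
        self.remaining = self.budget
        return np.append(self.env.reset(), 1.0)   # full budget left

    def step(self, action):
        obs, reward, cost, done = self.env.step(action)  # assumed env API
        self.remaining -= cost
        z = self.remaining / self.budget           # normalized budget left
        if self.remaining <= 0:                    # budget exhausted:
            reward = 0.0                           # no more task reward
        return np.append(obs, z), reward, done
```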

NeurIPS Conference 2018 Conference Paper

Geometry-Aware Recurrent Neural Networks for Active Visual Recognition

  • Ricson Cheng
  • Ziyan Wang
  • Katerina Fragkiadaki

We present recurrent geometry-aware neural networks that integrate visual information across multiple views of a scene into 3D latent feature tensors, while maintaining a one-to-one mapping between 3D physical locations in the world scene and latent feature locations. Object detection, object segmentation, and 3D reconstruction are then carried out directly from the constructed 3D feature memory, as opposed to any of the input 2D images. The proposed models are equipped with differentiable egomotion-aware feature warping and (learned) depth-aware unprojection operations to achieve a geometrically consistent mapping between the features in the input frame and the constructed latent model of the scene. We empirically show that the proposed model generalizes much better than geometry-unaware LSTM/GRU networks, especially in the presence of multiple objects and cross-object occlusions. Combined with active view selection policies, our model learns to select informative viewpoints from which to integrate information, "undoing" cross-object occlusions and seamlessly combining geometry with learning from experience.
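
The depth-aware unprojection step can be sketched as lifting per-pixel features into a voxel grid with the camera intrinsics; this toy version scatters features by nearest voxel, whereas the paper's operation is learned and differentiable:

```python
import numpy as np

def unproject_to_voxels(feat, depth, K, grid=16, extent=4.0):
    """Toy depth-aware unprojection: scatter per-pixel features into a
    3D feature grid covering [-extent/2, extent/2] in camera coordinates.

    feat: (H, W, C) features; depth: (H, W) metric depth; K: 3x3 intrinsics.
    """
    H, W, C = feat.shape
    vol = np.zeros((grid, grid, grid, C))
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    us, vs = np.meshgrid(np.arange(W), np.arange(H))
    x = (us - cx) * depth / fx                 # back-project pixels to 3D
    y = (vs - cy) * depth / fy
    pts = np.stack([x, y, depth], axis=-1)
    idx = ((pts / extent + 0.5) * grid).astype(int)
    valid = ((idx >= 0) & (idx < grid)).all(axis=-1)
    # accumulate features of all pixels landing in the same voxel
    np.add.at(vol, (idx[valid, 0], idx[valid, 1], idx[valid, 2]), feat[valid])
    return vol

vol = unproject_to_voxels(np.random.rand(8, 8, 4), np.full((8, 8), 1.5),
                          K=np.array([[8.0, 0, 4], [0, 8, 4], [0, 0, 1]]))
```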