Author name cluster

Quan Yuan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers

2 author rows

AAAI Conference 2026 Conference Paper

From Discriminative to Generative: A Diffusion-Based Paradigm for Multi-Agent Collaborative Perception

Kexin Gong
Puyi Yao
Guiyang Luo
Quan Yuan
Tiange Fu
Hui Zhang
Jinglin Li

Collaborative perception leveraging intermediate feature fusion has emerged as a leading paradigm to significantly enhance the environmental perception capabilities of autonomous driving systems. However, existing methods typically rely on discriminative supervision guided by downstream tasks. This paradigm compels models to learn minimal, task-specific representations, which conflicts with the goal of cooperative perception to capture comprehensive information, thereby limiting generalization. To address this issue, we propose DiGS-CP, a novel two-stage generative supervised collaborative perception framework. Specifically, we introduce a diffusion-based generative task that conditions on fused object-level features to generate representations of object-level point clouds. The proposed generative supervision provides fine-grained, task-agnostic signals that encourage the fusion module to learn comprehensive representations beyond task-specific requirements. By preserving and integrating complementary information from collaborative agents, our approach overcomes the limitations of task-specific learning and enhances the generalizability of the learned features. Furthermore, our two-stage architecture requires agents to transmit only object-level features, significantly reducing communication overhead. Extensive experiments on three benchmark datasets demonstrate that DiGS-CP achieves state-of-the-art performance in 3D object detection, while maintaining low bandwidth requirements and exhibiting excellent generalization ability.


NeurIPS Conference 2025 Conference Paper

NegoCollab: A Common Representation Negotiation Approach for Heterogeneous Collaborative Perception

Congzhang Shao
Quan Yuan
Guiyang Luo
Yue Hu
Danni Wang
Liu Yilin
Rui Pan
Bo Chen

Collaborative perception improves task performance by expanding the perception range through information sharing among agents. Immutable heterogeneity poses a significant challenge in collaborative perception, as participating agents may employ different and fixed perception models. This leads to domain gaps in the intermediate features shared among agents, consequently degrading collaborative performance. Aligning the features of all agents to a common representation can eliminate domain gaps with low training cost. However, in existing methods, the common representation is designated as the representation of a specific agent, making it difficult for agents with significant domain discrepancies from this specific agent to achieve proper alignment. This paper proposes NegoCollab, a heterogeneous collaboration method based on the negotiated common representation. It introduces a negotiator during training to derive the common representation from the local representations of each modality's agent, effectively reducing the inherent domain gap with the various local representations. In NegoCollab, the mutual transformation of features between the local representation space and the common representation space is achieved by a pair of sender and receiver. To better align local representations to the common representation containing multimodal information, we introduce structural alignment loss and pragmatic alignment loss in addition to the distribution alignment loss to supervise the training. This enables the knowledge in the common representation to be fully distilled into the sender. The experimental results demonstrate that NegoCollab significantly outperforms existing methods in common representation-based collaboration approaches. The mechanism of obtaining common representations through negotiation provides a more reliable and flexible option for common representations in heterogeneous collaborative perception.


EAAI Journal 2025 Journal Article

Study on image acquisition and camera positioning of depth recognition model in the tobacco curing stage

Chuan Feng
Shiping Zhu
Maojie Tang
Hu Zhao
Quan Yuan
Bojun Wang

To address the challenges in the digital transformation of the tobacco industry regarding the recognition of data acquisition standards and the construction of an Internet of Things (IoT) system for tobacco leaf curing stages, deep neural networks are applied in this field to construct an identification model. Data were collected under various conditions, including different camera types (with or without distortion, focal length), installation positions, and lighting conditions, to obtain curing stage data. After preprocessing each type of image data, ten-stage classification recognition datasets were established based on the “three stages and six steps” curing process. Six recognition models were developed using ResNeXt-50, ShuffleNetV2 (1.0), MobileNetV3-S, VanillaNet-10, EfficientNetV2-S, and EfficientNetV2-M as backbone networks. Evaluation and analysis were conducted using multiple performance metrics such as accuracy and F1 score, as well as various graphical representations including curve plots, radar charts, bar charts, and confusion matrices. The results indicate: 1. The ShuffleNetV2 (1.0) (MT: 98.71%) and EfficientNetV2 (MT: 99.84%) series networks exhibit superior recognition performance. 2. Recognition performance varies between high-temperature and low-temperature areas. 3. Combining multiple perspectives can improve recognition accuracy. 4. Cameras with larger focal lengths, cool white lighting, and distortion are more conducive to recognition. 5. The application prospects of image-assisted tobacco leaf curing are promising. Code available at: https://github.com/vontran2021/CuringStage.


IROS Conference 2024 Conference Paper

Development of a Novel Redundant Parallel Mechanism with Enlarged Workspace and Enhanced Dexterity for Fracture Reduction Surgery

Quan Yuan
Xu Liang
Tingting Su
Weibang Bai

The limited workspace and complex singularity issues are predominant factors impeding the clinical applicability of fracture reduction parallel robots. To address these challenges, this paper proposes a novel redundant parallel mechanism (NRPM) for robotic-assisted fracture reduction with an enlarged workspace and enhanced dexterity capabilities based on the traditional Stewart parallel mechanism (SPM). With six redundant degrees-of-freedom (DOFs) added to the novel mechanism, the kinematics of NRPM needs to be thoroughly analyzed. Furthermore, the calculation of its workspace and determination of its dexterity are deduced. Both the analytical simulation and real experiment results demonstrated the effectiveness and superior performance of the proposed NRPM compared to SPM.


AAAI Conference 2024 Conference Paper

TaskLAMA: Probing the Complex Task Understanding of Language Models

Quan Yuan
Mehran Kazemi
Xin Xu
Isaac Noble
Vaiva Imbrasaite
Deepak Ramachandran

Structured Complex Task Decomposition (SCTD) is the problem of breaking down a complex real-world task (such as planning a wedding) into a directed acyclic graph over individual steps that contribute to achieving the task, with edges specifying temporal dependencies between steps. SCTD is an important component of assistive planning tools, and a challenge for commonsense reasoning systems. We probe how accurately SCTD can be done with the knowledge extracted from pre-trained Large Language Models (LLMs). We introduce a new high-quality human-annotated dataset for this problem and novel metrics to fairly assess performance of LLMs against several baselines. Our experiments reveal that LLMs are able to decompose complex tasks into individual steps effectively, with a relative improvement of 15% to 280% over the best baseline. We also propose a number of approaches to further improve their performance, with a relative improvement of 7% to 37%. However, we find that LLMs still struggle to predict pairwise temporal dependencies, which reveals a gap in their understanding of complex tasks.
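
The graph structure described in this abstract can be illustrated with a small sketch. It is not the paper's dataset or method: the step names and dependencies below are hypothetical, invented around the abstract's "planning a wedding" example, and the ordering comes from Python's standard-library `graphlib` module rather than from a language model.

```python
from graphlib import TopologicalSorter

# Hypothetical steps (not from the TaskLAMA dataset).
# Each key maps a step to the set of steps that must happen before it,
# i.e. the temporal-dependency edges of the directed acyclic graph.
dependencies = {
    "set a budget": set(),
    "book a venue": {"set a budget"},
    "send invitations": {"book a venue"},
    "hire a caterer": {"set a budget", "book a venue"},
}


def is_valid_order(order, deps):
    """Return True if every step appears after all of its prerequisites."""
    position = {step: i for i, step in enumerate(order)}
    return all(
        position[prereq] < position[step]
        for step, prereqs in deps.items()
        for prereq in prereqs
    )


# static_order() raises graphlib.CycleError if the graph contains a cycle.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Any ordering that passes `is_valid_order` respects the pairwise temporal dependencies, which is the part the abstract reports as still difficult for LLMs to predict.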


AAAI Conference 2023 Conference Paper

AlphaRoute: Large-Scale Coordinated Route Planning via Monte Carlo Tree Search

Guiyang Luo
Yantao Wang
Hui Zhang
Quan Yuan
Jinglin Li

This paper proposes AlphaRoute, an AlphaGo-inspired algorithm for coordinating large-scale routes, built upon graph attention reinforcement learning and Monte Carlo Tree Search (MCTS). We first partition the road network into regions and model large-scale coordinated route planning as a Markov game, where each partitioned region is treated as a player instead of each driver. Then, AlphaRoute applies a bilevel optimization framework, consisting of several region planners and a global planner, where each region planner coordinates the route choices for vehicles located in its region and generates several strategies, and the global planner evaluates the combination of strategies. AlphaRoute is built on a graph attention network for evaluating each state and the MCTS algorithm for dynamically visiting and simulating future states to narrow down the search space. AlphaRoute is capable of 1) bridging user fairness and system efficiency, 2) achieving higher search efficiency by alleviating the curse of dimensionality, and 3) making effective and informed route plans by simulating the future to capture traffic dynamics. Comprehensive experiments are conducted on two real-world road networks and compared with several baselines to evaluate the performance, and results show that AlphaRoute achieves the lowest travel time and is efficient and effective for coordinating large-scale routes and alleviating the traffic congestion problem. The code will be publicly available.


NeurIPS Conference 2023 Conference Paper

BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory Information

Mehran Kazemi
Quan Yuan
Deepti Bhatia
Najoung Kim
Xin Xu
Vaiva Imbrasaite
Deepak Ramachandran

Automated reasoning with unstructured natural text is a key requirement for many potential applications of NLP and for developing robust AI systems. Recently, Language Models (LMs) have demonstrated complex reasoning capacities even without any finetuning. However, existing evaluation for automated reasoning assumes access to a consistent and coherent set of information over which models reason. When reasoning in the real world, the available information is frequently inconsistent or contradictory, and therefore models need to be equipped with a strategy to resolve such conflicts when they arise. One widely applicable way of resolving conflicts is to impose preferences over information sources (e.g., based on source credibility or information recency) and adopt the source with higher preference. In this paper, we formulate the problem of reasoning with contradictory information guided by preferences over sources as the classical problem of defeasible reasoning, and develop a dataset called BoardgameQA for measuring the reasoning capacity of LMs in this setting. BoardgameQA also incorporates reasoning with implicit background knowledge, to better reflect reasoning problems in downstream applications. We benchmark various LMs on BoardgameQA and the results reveal a significant gap in the reasoning capacity of state-of-the-art LMs on this problem, showing that reasoning with conflicting information does not surface out-of-the-box in LMs. While performance can be improved with finetuning, it nevertheless remains poor.
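
The conflict-resolution idea in this abstract, adopting the source with higher preference, can be sketched in a few lines. This is a toy illustration only, not the BoardgameQA format or the paper's reasoning procedure: the source names, propositions, and preference scores are all hypothetical, and real defeasible reasoning involves rules and multi-step inference that this sketch omits.

```python
def resolve_conflicts(claims, preference):
    """Keep, for each proposition, the value asserted by the most preferred source.

    claims: iterable of (source, proposition, value) tuples.
    preference: dict mapping a source name to a score; higher means more trusted.
    """
    best = {}
    for source, proposition, value in claims:
        current = best.get(proposition)
        # Replace the stored claim only if this source is strictly more preferred.
        if current is None or preference[source] > preference[current[0]]:
            best[proposition] = (source, value)
    return {prop: value for prop, (_, value) in best.items()}


# Hypothetical example: two sources disagree, and recency decides the preference.
claims = [
    ("old_report", "the bridge is open", True),
    ("recent_report", "the bridge is open", False),
    ("old_report", "the road is paved", True),
]
preference = {"old_report": 1, "recent_report": 2}

beliefs = resolve_conflicts(claims, preference)
print(beliefs)
```

Here the contradictory claim about the bridge is settled in favor of the more recent source, while the uncontested claim about the road is kept unchanged.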


IJCAI Conference 2023 Conference Paper

GPLight: Grouped Multi-agent Reinforcement Learning for Large-scale Traffic Signal Control

Yilin Liu
Guiyang Luo
Quan Yuan
Jinglin Li
Lei Jin
Bo Chen
Rui Pan

The use of multi-agent reinforcement learning (MARL) methods in coordinating traffic lights (CTL) has become increasingly popular, treating each intersection as an agent. However, existing MARL approaches either treat the agents as absolutely homogeneous, i.e., the same network and parameters for each agent, or treat the agents as completely heterogeneous, i.e., different networks and parameters for each agent. This creates a difficult balance between accuracy and complexity, especially in large-scale CTL. To address this challenge, we propose a grouped MARL method named GPLight. We first mine the similarity between agent environments, considering both real-time traffic flow and static fine-grained road topology. Then we propose two loss functions to maintain a learnable and dynamic clustering, one that uses mutual information estimation for better stability, and the other that maximizes separability between groups. Finally, GPLight enforces the agents in a group to share the same network and parameters. This approach reduces complexity by promoting cooperation within the same group of agents while reflecting differences between groups to ensure accuracy. To verify the effectiveness of our method, we conduct experiments on both synthetic and real-world datasets, with up to 1,089 intersections. Compared with state-of-the-art methods, experiment results demonstrate the superiority of our proposed method, especially in large-scale CTL.


IJCAI Conference 2021 Conference Paper

A Novel Sequence-to-Subgraph Framework for Diagnosis Classification

Jun Chen
Quan Yuan
Chao Lu
Haifeng Huang

Text-based diagnosis classification is a critical problem in AI-enabled healthcare studies, which assists clinicians in making correct decisions and lowering the rate of diagnostic errors. Previous studies follow the routine of sequence-based deep learning models in the NLP literature to deal with clinical notes. However, recent studies find that structural information is important in clinical content and greatly impacts the predictions. In this paper, a novel sequence-to-subgraph framework is introduced to process clinical texts for classification, which changes the paradigm of managing texts. Moreover, a new classification model under the framework is proposed that incorporates a subgraph convolutional network and a hierarchical diagnostic attentive network to extract the layered structural features of clinical texts. The evaluation conducted on both real-world English and Chinese datasets shows that the proposed method outperforms the state-of-the-art deep learning based diagnosis classification models.


ICML Conference 2021 Conference Paper

SCC: an efficient deep reinforcement learning agent mastering the game of StarCraft II

Xiangjun Wang
Junxiao Song
Penghui Qi
Peng Peng
Zhenkun Tang
Wei Zhang
Weimin Li
Xiongjun Pi

AlphaStar, the AI that reaches GrandMaster level in StarCraft II, is a remarkable milestone demonstrating what deep reinforcement learning can achieve in complex Real-Time Strategy (RTS) games. However, the complexities of the game, algorithms and systems, and especially the tremendous amount of computation needed are big obstacles for the community to conduct further research in this direction. We propose a deep reinforcement learning agent, StarCraft Commander (SCC). With order of magnitude less computation, it demonstrates top human performance defeating GrandMaster players in test matches and top professional players in a live event. Moreover, it shows strong robustness to various human strategies and discovers novel strategies unseen from human plays. In this paper, we’ll share the key insights and optimizations on efficient imitation learning and reinforcement learning for StarCraft II full game.


IJCAI Conference 2020 Conference Paper

The Graph-based Mutual Attentive Network for Automatic Diagnosis

Quan Yuan
Jun Chen
Chao Lu
Haifeng Huang

Automatic diagnosis has been suffering from the problem of an inadequate reliable corpus to train a trustworthy predictive model. Besides, most of the previous deep learning based diagnosis models adopt sequence learning techniques (CNN or RNN), which have difficulty extracting the complex structural information, e.g., graph structure, between the critical medical entities. In this paper, we propose to build the diagnosis model based on high-standard EMR documents from real hospitals to improve the accuracy and the credibility of the resulting model. Meanwhile, we introduce the Graph Convolutional Network into the model, which alleviates the sparse feature problem and facilitates the extraction of structural information for diagnosis. Moreover, we propose the mutual attentive network to enhance the representation of inputs towards better model performance. The evaluation conducted on the real EMR documents demonstrates that the proposed model is more accurate compared to the previous sequence learning based diagnosis models. The proposed model has been integrated into the information systems in hundreds of primary health care facilities in China to assist physicians in the diagnostic process.


TIST Journal 2018 Journal Article

GeoBurst+

Chao Zhang
Dongming Lei
Quan Yuan
Honglei Zhuang
Lance Kaplan
Shaowen Wang
Jiawei Han

The real-time discovery of local events (e.g., protests, disasters) has been widely recognized as a fundamental socioeconomic task. Recent studies have demonstrated that the geo-tagged tweet stream serves as an unprecedentedly valuable source for local event detection. Nevertheless, how to effectively extract local events from massive geo-tagged tweet streams in real time remains challenging. To bridge the gap, we propose a method for effective and real-time local event detection from geo-tagged tweet streams. Our method, named GeoBurst+, first leverages a novel cross-modal authority measure to identify several pivots in the query window. Such pivots reveal different geo-topical activities and naturally attract similar tweets to form candidate events. GeoBurst+ further summarizes the continuous stream and compares the candidates against the historical summaries to pinpoint truly interesting local events. Better still, as the query window shifts, GeoBurst+ is capable of updating the event list with little time cost, thus achieving continuous monitoring of the stream. We used crowdsourcing to evaluate GeoBurst+ on two million-scale datasets and found it significantly more effective than existing methods while being orders of magnitude faster.


IJCAI Conference 2017 Conference Paper

ContextCare: Incorporating Contextual Information Networks to Representation Learning on Medical Forum Data

Stan Zhao
Meng Jiang
Quan Yuan
Bing Qin
Ting Liu
ChengXiang Zhai

Online users have generated a large amount of health-related data on medical forums and search engines. However, exploiting these rich data for orienting patients online and assisting medical checkups offline is nontrivial due to the sparseness of existing symptom-disease links, which is caused by the natural and chatty expressions of symptoms. In this paper, we propose a novel and general representation learning method, ContextCare, for human-generated health-related data, which learns the latent relationship between symptoms and diseases from the symptom-disease diagnosis network for disease prediction, disease category prediction and disease clustering. To alleviate the network sparseness, ContextCare adopts regularizations from rich contextual information networks including a symptom co-occurrence network and a disease evolution network. Therefore, our representations of symptoms and diseases incorporate knowledge from these three networks. Extensive experiments on medical forum data demonstrate that ContextCare outperforms the state-of-the-art methods in disease category prediction, disease prediction and disease clustering.


IJCAI Conference 2016 Conference Paper

Collaborative Multi-Level Embedding Learning from Reviews for Rating Prediction

Wei Zhang
Quan Yuan
Jiawei Han
Jianyong Wang

We investigate the problem of personalized review-based rating prediction, which aims at predicting users' ratings for items that they have not evaluated by using their historical reviews and ratings. Most existing methods solve this problem by integrating a topic model and a latent factor model to learn interpretable user and item factors. However, these methods cannot utilize word local context information of reviews. Moreover, they simply restrict user and item representations to be equivalent to their review representations, which may bring in some irrelevant information from the review text and harm the accuracy of rating prediction. In this paper, we propose a novel Collaborative Multi-Level Embedding (CMLE) model to address these limitations. The main technical contribution of CMLE is to integrate a word embedding model with a standard matrix factorization model through a projection level. This allows CMLE to inherit the ability of capturing word local context information from the word embedding model and relax the strict equivalence requirement by projecting review embeddings to user and item embeddings. A joint optimization problem is formulated and solved through an efficient stochastic gradient ascent algorithm. Empirical evaluations on real datasets show that CMLE outperforms several competitive methods and can solve the two limitations well.


AAAI Conference 2015 Conference Paper

A Tri-Role Topic Model for Domain-Specific Question Answering

Zongyang Ma
Aixin Sun
Quan Yuan
Gao Cong

Stack Overflow and MedHelp are examples of domain-specific community-based question answering (CQA) systems. Different from CQA systems for general topics (e.g., Yahoo! Answers, Baidu Knows), questions and answers in domain-specific CQA systems are mostly in the same topical domain, enabling more comprehensive interaction between users on fine-grained topics. In such systems, users are more likely to ask questions on unfamiliar topics and to answer questions matching their expertise. Users can also vote on answers based on their judgements. In this paper, we propose a Tri-Role Topic Model (TRTM) to model the tri-roles of users (i.e., as askers, answerers, and voters, respectively) and the activities of each role, including composing questions, selecting questions to answer, and contributing and voting on answers. The proposed model can be used to enhance CQA systems from many perspectives. As a case study, we conducted experiments on ranking answers for questions on Stack Overflow, a CQA system for professional and enthusiast programmers. Experimental results show that TRTM is effective in helping users obtain ideal rankings of answers, particularly for new and less popular questions. Evaluated on nDCG, TRTM outperforms state-of-the-art methods.
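
For readers unfamiliar with the nDCG metric named at the end of this abstract, the following is a minimal sketch of one common variant (linear gain with a log2 position discount). The abstract does not say which variant or cutoff the paper uses, so the formula choice and the relevance scores below are assumptions made for illustration.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: gain at 0-based rank i is divided by log2(i + 2)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical relevance scores of answers, listed in the order a model ranked them.
perfect_ranking = [3, 2, 1]
swapped_ranking = [1, 3]

print(ndcg(perfect_ranking))
print(ndcg(swapped_ranking))
```

A ranking that already lists answers from most to least relevant scores 1.0, and any ranking that places a less relevant answer above a more relevant one scores strictly less.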


IJCAI Conference 2015 Conference Paper

Personalized Ranking Metric Embedding for Next New POI Recommendation

Shanshan Feng
Xutao Li
Yifeng Zeng
Gao Cong
Yeow Meng Chee
Quan Yuan

The rapid growth of Location-based Social Networks (LBSNs) provides a vast amount of check-in data, which enables many services, e.g., point-of-interest (POI) recommendation. In this paper, we study the next new POI recommendation problem, in which new POIs with respect to users’ current location are to be recommended. The challenge lies in the difficulty in precisely learning users’ sequential information and personalizing the recommendation model. To this end, we resort to the Metric Embedding method for the recommendation, which avoids drawbacks of the Matrix Factorization technique. We propose a personalized ranking metric embedding method (PRME) to model personalized check-in sequences. We further develop a PRME-G model, which integrates sequential information, individual preference, and geographical influence, to improve the recommendation performance. Experiments on two real-world LBSN datasets demonstrate that our new algorithm outperforms the state-of-the-art next POI recommendation methods.


TIST Journal 2011 Journal Article

Who is Doing What and When

Shiwan Zhao
Michelle X. Zhou
Xiatian Zhang
Quan Yuan
Wentao Zheng
Rongyao Fu

Content-centric social Web sites, such as discussion forums and blog sites, have flourished during the past several years. These sites often contain overwhelming amounts of information that are also being updated rapidly. To help users locate their interests at such sites (e.g., interesting blogs to read or discussion forums to join), researchers have developed a number of recommendation technologies. However, it is difficult to make effective recommendations for new users (a.k.a. the cold start problem) due to a lack of user information (e.g., preferences and interests). Furthermore, the complexity of recommendation algorithms often prevents users from comprehending, let alone trusting, the recommended results. To tackle these two challenges, we are building a social map-based recommender system called Pharos. A social map summarizes users’ content-related social behavior over time (e.g., reading, writing, and commenting behavior during the past week) as a set of latent communities. For a given time interval, each community is characterized by the theme of the content being discussed and the key people involved. By discovering, ranking, and displaying the most popular latent communities at different time intervals, Pharos creates a time-sensitive, visual social map of a Web site. This enables new users to obtain a quick overview of the site, alleviating the cold start problem. Furthermore, we use the social map as a context to help explain Pharos-recommended content and people. Users can also interactively explore the social map to locate the content in which they are interested or people that are not being explicitly recommended, compensating for the imperfections in the recommendation algorithms. We have developed several Pharos applications, one of which is deployed within our company. Our preliminary evaluation of the deployed application shows the usefulness of Pharos.
