Arrow Research search

Author name cluster

Quan Yuan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers
2 author rows

Possible papers


AAAI Conference 2026 Conference Paper

From Discriminative to Generative: A Diffusion-Based Paradigm for Multi-Agent Collaborative Perception

  • Kexin Gong
  • Puyi Yao
  • Guiyang Luo
  • Quan Yuan
  • Tiange Fu
  • Hui Zhang
  • Jinglin Li

Collaborative perception leveraging intermediate feature fusion has emerged as a leading paradigm to significantly enhance the environmental perception capabilities of autonomous driving systems. However, existing methods typically rely on discriminative supervision guided by downstream tasks. This paradigm compels models to learn minimal, task-specific representations, which conflicts with the goal of cooperative perception to capture comprehensive information, thereby limiting generalization. To address this issue, we propose DiGS-CP, a novel two-stage generative supervised collaborative perception framework. Specifically, we introduce a diffusion-based generative task that conditions on fused object-level features to generate representations of object-level point clouds. The proposed generative supervision provides fine-grained, task-agnostic signals that encourage the fusion module to learn comprehensive representations beyond task-specific requirements. By preserving and integrating complementary information from collaborative agents, our approach overcomes the limitations of task-specific learning and enhances the generalizability of the learned features. Furthermore, our two-stage architecture requires agents to transmit only object-level features, significantly reducing communication overhead. Extensive experiments on three benchmark datasets demonstrate that DiGS-CP achieves state-of-the-art performance in 3D object detection, while maintaining low bandwidth requirements and exhibiting excellent generalization ability.

NeurIPS Conference 2025 Conference Paper

NegoCollab: A Common Representation Negotiation Approach for Heterogeneous Collaborative Perception

  • CONGZHANG SHAO
  • Quan Yuan
  • Guiyang Luo
  • Yue Hu
  • Danni Wang
  • Liu Yilin
  • Rui Pan
  • Bo Chen

Collaborative perception improves task performance by expanding the perception range through information sharing among agents. Immutable heterogeneity poses a significant challenge in collaborative perception, as participating agents may employ different and fixed perception models. This leads to domain gaps in the intermediate features shared among agents, consequently degrading collaborative performance. Aligning the features of all agents to a common representation can eliminate domain gaps with low training cost. However, in existing methods, the common representation is designated as the representation of a specific agent, making it difficult for agents with significant domain discrepancies from this specific agent to achieve proper alignment. This paper proposes NegoCollab, a heterogeneous collaboration method based on the negotiated common representation. It introduces a negotiator during training to derive the common representation from the local representations of each modality's agent, effectively reducing the inherent domain gap with the various local representations. In NegoCollab, the mutual transformation of features between the local representation space and the common representation space is achieved by a pair of sender and receiver. To better align local representations to the common representation containing multimodal information, we introduce structural alignment loss and pragmatic alignment loss in addition to the distribution alignment loss to supervise the training. This enables the knowledge in the common representation to be fully distilled into the sender. The experimental results demonstrate that NegoCollab significantly outperforms existing methods in common representation-based collaboration approaches. The mechanism of obtaining common representations through negotiation provides a more reliable and flexible option for common representations in heterogeneous collaborative perception.

EAAI Journal 2025 Journal Article

Study on image acquisition and camera positioning of depth recognition model in the tobacco curing stage

  • Chuan Feng
  • Shiping Zhu
  • Maojie Tang
  • Hu Zhao
  • Quan Yuan
  • Bojun Wang

To address the challenges in the digital transformation of the tobacco industry regarding the recognition of data acquisition standards and the construction of an Internet of Things (IoT) system for tobacco leaf curing stages, deep neural networks are applied in this field to construct an identification model. Data was collected under various conditions, including different camera types (with or without distortion, focal length), installation positions, and lighting conditions, to obtain curing stage data. After preprocessing each type of image data, ten-stage classification recognition datasets were established based on the “three stages and six steps” curing process. Six recognition models were developed using ResNeXt-50, ShuffleNetV2(1.0), MobileNetV3-S, VanillaNet-10, EfficientNetV2-S, and EfficientNetV2-M as backbone networks. Evaluation and analysis were conducted using multiple performance metrics such as accuracy and F1 score, as well as various graphical representations including curve plots, radar charts, bar charts, and confusion matrices. The results indicate: 1. ShuffleNetV2(1.0) (MT: 98.71%) and EfficientNetV2 (MT: 99.84%) series networks exhibit superior recognition performance. 2. Recognition performance varies between high-temperature and low-temperature areas. 3. Combining multiple perspectives can improve recognition accuracy. 4. Cameras with larger focal lengths, cool white lighting, and distortion are more conducive to recognition. 5. The application prospects of image-assisted tobacco leaf curing are promising. Code available at: https://github.com/vontran2021/CuringStage.

IROS Conference 2024 Conference Paper

Development of a Novel Redundant Parallel Mechanism with Enlarged Workspace and Enhanced Dexterity for Fracture Reduction Surgery

  • Quan Yuan
  • Xu Liang
  • Tingting Su
  • Weibang Bai

The limited workspace and complex singularity issues are predominant factors impeding the clinical applicability of fracture reduction parallel robots. To address these challenges, this paper proposes a novel redundant parallel mechanism (NRPM) for robotic-assisted fracture reduction with an enlarged workspace and enhanced dexterity capabilities based on the traditional Stewart parallel mechanism (SPM). With six redundant degrees-of-freedom (DOFs) added to the novel mechanism, the kinematics of NRPM needs to be thoroughly analyzed. Furthermore, the calculation of its workspace and determination of its dexterity are deduced. Both the analytical simulation and real experiment results demonstrated the effectiveness and superior performance of the proposed NRPM compared to SPM.

AAAI Conference 2024 Conference Paper

TaskLAMA: Probing the Complex Task Understanding of Language Models

  • Quan Yuan
  • Mehran Kazemi
  • Xin Xu
  • Isaac Noble
  • Vaiva Imbrasaite
  • Deepak Ramachandran

Structured Complex Task Decomposition (SCTD) is the problem of breaking down a complex real-world task (such as planning a wedding) into a directed acyclic graph over individual steps that contribute to achieving the task, with edges specifying temporal dependencies between steps. SCTD is an important component of assistive planning tools, and a challenge for commonsense reasoning systems. We probe how accurately SCTD can be done with the knowledge extracted from pre-trained Large Language Models (LLMs). We introduce a new high-quality human-annotated dataset for this problem and novel metrics to fairly assess performance of LLMs against several baselines. Our experiments reveal that LLMs are able to decompose complex tasks into individual steps effectively, with a relative improvement of 15% to 280% over the best baseline. We also propose a number of approaches to further improve their performance, with a relative improvement of 7% to 37%. However, we find that LLMs still struggle to predict pairwise temporal dependencies, which reveals a gap in their understanding of complex tasks.
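The decomposition described in this abstract (steps as nodes, temporal dependencies as edges of a directed acyclic graph) can be pictured with a small sketch. The step names and the `valid_order` helper below are hypothetical illustrations, not the TaskLAMA dataset or its metrics; the sketch only shows how a step graph yields an execution order and how a cyclic graph is rejected.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical decomposition of "plan a wedding": each step maps to the
# set of steps that must be finished before it (its temporal dependencies).
steps = {
    "set budget": set(),
    "choose date": set(),
    "book venue": {"set budget", "choose date"},
    "send invitations": {"book venue"},
}

def valid_order(graph):
    """Return one step order consistent with the dependencies,
    or None if the graph contains a cycle (so it is not a DAG)."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None

order = valid_order(steps)
```

Any order returned this way places every step after all of its prerequisites; several such orders may exist for the same graph.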

AAAI Conference 2023 Conference Paper

AlphaRoute: Large-Scale Coordinated Route Planning via Monte Carlo Tree Search

  • Guiyang Luo
  • Yantao Wang
  • Hui Zhang
  • Quan Yuan
  • Jinglin Li

This paper proposes AlphaRoute, an AlphaGo inspired algorithm for coordinating large-scale routes, built upon graph attention reinforcement learning and Monte Carlo Tree Search (MCTS). We first partition the road network into regions and model large-scale coordinated route planning as a Markov game, where each partitioned region is treated as a player instead of each driver. Then, AlphaRoute applies a bilevel optimization framework, consisting of several region planners and a global planner, where the region planner coordinates the route choices for vehicles located in the region and generates several strategies, and the global planner evaluates the combination of strategies. AlphaRoute is built on graph attention network for evaluating each state and MCTS algorithm for dynamically visiting and simulating the future state for narrowing down the search space. AlphaRoute is capable of 1) bridging user fairness and system efficiency, 2) achieving higher search efficiency by alleviating the curse of dimensionality problems, and 3) making an effective and informed route planning by simulating over the future to capture traffic dynamics. Comprehensive experiments are conducted on two real-world road networks as compared with several baselines to evaluate the performance, and results show that AlphaRoute achieves the lowest travel time, and is efficient and effective for coordinating large-scale routes and alleviating the traffic congestion problem. The code will be publicly available.

NeurIPS Conference 2023 Conference Paper

BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory Information

  • Mehran Kazemi
  • Quan Yuan
  • Deepti Bhatia
  • Najoung Kim
  • Xin Xu
  • Vaiva Imbrasaite
  • Deepak Ramachandran

Automated reasoning with unstructured natural text is a key requirement for many potential applications of NLP and for developing robust AI systems. Recently, Language Models (LMs) have demonstrated complex reasoning capacities even without any finetuning. However, existing evaluation for automated reasoning assumes access to a consistent and coherent set of information over which models reason. When reasoning in the real-world, the available information is frequently inconsistent or contradictory, and therefore models need to be equipped with a strategy to resolve such conflicts when they arise. One widely-applicable way of resolving conflicts is to impose preferences over information sources (e.g., based on source credibility or information recency) and adopt the source with higher preference. In this paper, we formulate the problem of reasoning with contradictory information guided by preferences over sources as the classical problem of defeasible reasoning, and develop a dataset called BoardgameQA for measuring the reasoning capacity of LMs in this setting. BoardgameQA also incorporates reasoning with implicit background knowledge, to better reflect reasoning problems in downstream applications. We benchmark various LMs on BoardgameQA and the results reveal a significant gap in the reasoning capacity of state-of-the-art LMs on this problem, showing that reasoning with conflicting information does not surface out-of-the-box in LMs. While performance can be improved with finetuning, it nevertheless remains poor.
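The conflict-resolution strategy mentioned in this abstract (prefer the source with the higher preference) can be shown with a toy sketch. The `resolve_conflicts` function, the source names, and the numeric ranks are all assumptions made for illustration; this is not the BoardgameQA data format or its evaluation procedure.

```python
def resolve_conflicts(claims, preference):
    """For each statement, keep the truth value asserted by the most preferred source.

    claims: iterable of (source, statement, value) triples.
    preference: dict mapping source -> rank; a higher rank wins a conflict.
    """
    best = {}  # statement -> (rank, value) of the best source seen so far
    for source, statement, value in claims:
        rank = preference[source]
        if statement not in best or rank > best[statement][0]:
            best[statement] = (rank, value)
    return {statement: value for statement, (rank, value) in best.items()}

# Hypothetical sources: the newer report is preferred over the older one.
claims = [
    ("old_report", "the bridge is open", True),
    ("new_report", "the bridge is open", False),
    ("old_report", "the road is paved", True),
]
preference = {"old_report": 1, "new_report": 2}
resolved = resolve_conflicts(claims, preference)
```

Here the two reports contradict each other about the bridge, and the value from `new_report` is kept because it has the higher rank; the uncontested statement passes through unchanged.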

IJCAI Conference 2023 Conference Paper

GPLight: Grouped Multi-agent Reinforcement Learning for Large-scale Traffic Signal Control

  • Yilin Liu
  • Guiyang Luo
  • Quan Yuan
  • Jinglin Li
  • Lei Jin
  • Bo Chen
  • Rui Pan

The use of multi-agent reinforcement learning (MARL) methods in coordinating traffic lights (CTL) has become increasingly popular, treating each intersection as an agent. However, existing MARL approaches either treat all agents as absolutely homogeneous, i.e., the same network and parameters for each agent, or treat all agents as completely heterogeneous, i.e., different networks and parameters for each agent. This creates a difficult balance between accuracy and complexity, especially in large-scale CTL. To address this challenge, we propose a grouped MARL method named GPLight. We first mine the similarity between agent environments, considering both real-time traffic flow and static fine-grained road topology. Then we propose two loss functions to maintain a learnable and dynamic clustering, one that uses mutual information estimation for better stability, and the other that maximizes separability between groups. Finally, GPLight enforces the agents in a group to share the same network and parameters. This approach reduces complexity by promoting cooperation within the same group of agents while reflecting differences between groups to ensure accuracy. To verify the effectiveness of our method, we conduct experiments on both synthetic and real-world datasets, with up to 1,089 intersections. Compared with state-of-the-art methods, experiment results demonstrate the superiority of our proposed method, especially in large-scale CTL.

IJCAI Conference 2021 Conference Paper

A Novel Sequence-to-Subgraph Framework for Diagnosis Classification

  • Jun Chen
  • Quan Yuan
  • Chao Lu
  • Haifeng Huang

Text-based diagnosis classification is a critical problem in AI-enabled healthcare studies, which assists clinicians in making correct decisions and lowering the rate of diagnostic errors. Previous studies follow the routine of sequence-based deep learning models in NLP literature to deal with clinical notes. However, recent studies find that structural information is important in clinical contents and greatly impacts the predictions. In this paper, a novel sequence-to-subgraph framework is introduced to process clinical texts for classification, which changes the paradigm of managing texts. Moreover, a new classification model under the framework is proposed that incorporates a subgraph convolutional network and a hierarchical diagnostic attentive network to extract the layered structural features of clinical texts. The evaluation conducted on both the real-world English and Chinese datasets shows that the proposed method outperforms the state-of-the-art deep learning based diagnosis classification models.

ICML Conference 2021 Conference Paper

SCC: an efficient deep reinforcement learning agent mastering the game of StarCraft II

  • Xiangjun Wang
  • Junxiao Song
  • Penghui Qi
  • Peng Peng
  • Zhenkun Tang
  • Wei Zhang
  • Weimin Li
  • Xiongjun Pi

AlphaStar, the AI that reaches GrandMaster level in StarCraft II, is a remarkable milestone demonstrating what deep reinforcement learning can achieve in complex Real-Time Strategy (RTS) games. However, the complexities of the game, algorithms and systems, and especially the tremendous amount of computation needed are big obstacles for the community to conduct further research in this direction. We propose a deep reinforcement learning agent, StarCraft Commander (SCC). With order of magnitude less computation, it demonstrates top human performance defeating GrandMaster players in test matches and top professional players in a live event. Moreover, it shows strong robustness to various human strategies and discovers novel strategies unseen from human plays. In this paper, we’ll share the key insights and optimizations on efficient imitation learning and reinforcement learning for StarCraft II full game.

IJCAI Conference 2020 Conference Paper

The Graph-based Mutual Attentive Network for Automatic Diagnosis

  • Quan Yuan
  • Jun Chen
  • Chao Lu
  • Haifeng Huang

Automatic diagnosis has been suffering from the problem of an inadequate reliable corpus to train a trustworthy predictive model. Besides, most of the previous deep learning based diagnosis models adopt sequence learning techniques (CNN or RNN), which have difficulty extracting the complex structural information, e.g., graph structure, between the critical medical entities. In this paper, we propose to build the diagnosis model based on the high-standard EMR documents from real hospitals to improve the accuracy and the credibility of the resulting model. Meanwhile, we introduce the Graph Convolutional Network into the model, which alleviates the sparse feature problem and facilitates the extraction of structural information for diagnosis. Moreover, we propose the mutual attentive network to enhance the representation of inputs towards better model performance. The evaluation conducted on the real EMR documents demonstrates that the proposed model is more accurate compared to the previous sequence learning based diagnosis models. The proposed model has been integrated into the information systems in over hundreds of primary health care facilities in China to assist physicians in the diagnostic process.

TIST Journal 2018 Journal Article

GeoBurst+

  • Chao Zhang
  • Dongming Lei
  • Quan Yuan
  • Honglei Zhuang
  • Lance Kaplan
  • Shaowen Wang
  • Jiawei Han

The real-time discovery of local events (e.g., protests, disasters) has been widely recognized as a fundamental socioeconomic task. Recent studies have demonstrated that the geo-tagged tweet stream serves as an unprecedentedly valuable source for local event detection. Nevertheless, how to effectively extract local events from massive geo-tagged tweet streams in real time remains challenging. To bridge the gap, we propose a method for effective and real-time local event detection from geo-tagged tweet streams. Our method, named GeoBurst+, first leverages a novel cross-modal authority measure to identify several pivots in the query window. Such pivots reveal different geo-topical activities and naturally attract similar tweets to form candidate events. GeoBurst+ further summarizes the continuous stream and compares the candidates against the historical summaries to pinpoint truly interesting local events. Better still, as the query window shifts, GeoBurst+ is capable of updating the event list with little time cost, thus achieving continuous monitoring of the stream. We used crowdsourcing to evaluate GeoBurst+ on two million-scale datasets and found it significantly more effective than existing methods while being orders of magnitude faster.

IJCAI Conference 2017 Conference Paper

ContextCare: Incorporating Contextual Information Networks to Representation Learning on Medical Forum Data

  • Stan Zhao
  • Meng Jiang
  • Quan Yuan
  • Bing Qin
  • Ting Liu
  • ChengXiang Zhai

Online users have generated a large amount of health-related data on medical forums and search engines. However, exploiting these rich data for orienting patients online and assisting medical checkups offline is nontrivial due to the sparseness of existing symptom-disease links, which is caused by the natural and chatty expressions of symptoms. In this paper, we propose a novel and general representation learning method, ContextCare, for human-generated health-related data, which learns the latent relationship between symptoms and diseases from the symptom-disease diagnosis network for disease prediction, disease category prediction and disease clustering. To alleviate the network sparseness, ContextCare adopts regularizations from rich contextual information networks including a symptom co-occurrence network and a disease evolution network. Therefore, our representations of symptoms and diseases incorporate knowledge from these three networks. Extensive experiments on medical forum data demonstrate that ContextCare outperforms the state-of-the-art methods in disease category prediction, disease prediction and disease clustering.

IJCAI Conference 2016 Conference Paper

Collaborative Multi-Level Embedding Learning from Reviews for Rating Prediction

  • Wei Zhang
  • Quan Yuan
  • Jiawei Han
  • Jianyong Wang

We investigate the problem of personalized review-based rating prediction, which aims at predicting users' ratings for items that they have not evaluated by using their historical reviews and ratings. Most existing methods solve this problem by integrating a topic model and a latent factor model to learn interpretable user and item factors. However, these methods cannot utilize the local word context information of reviews. Moreover, they simply restrict user and item representations to be equivalent to their review representations, which may bring in irrelevant information from the review text and harm the accuracy of rating prediction. In this paper, we propose a novel Collaborative Multi-Level Embedding (CMLE) model to address these limitations. The main technical contribution of CMLE is to integrate a word embedding model with a standard matrix factorization model through a projection level. This allows CMLE to inherit the ability to capture local word context information from the word embedding model and to relax the strict equivalence requirement by projecting review embeddings to user and item embeddings. A joint optimization problem is formulated and solved through an efficient stochastic gradient ascent algorithm. Empirical evaluations on real datasets show that CMLE outperforms several competitive methods and can solve the two limitations well.

AAAI Conference 2015 Conference Paper

A Tri-Role Topic Model for Domain-Specific Question Answering

  • Zongyang Ma
  • Aixin Sun
  • Quan Yuan
  • Gao Cong

Stack Overflow and MedHelp are examples of domain-specific community-based question answering (CQA) systems. Different from CQA systems for general topics (e.g., Yahoo! Answers, Baidu Knows), questions and answers in domain-specific CQA systems are mostly in the same topical domain, enabling more comprehensive interaction between users on fine-grained topics. In such systems, users are more likely to ask questions on unfamiliar topics and to answer questions matching their expertise. Users can also vote on answers based on their judgements. In this paper, we propose a Tri-Role Topic Model (TRTM) to model the tri-roles of users (i.e., as askers, answerers, and voters, respectively) and the activities of each role, including composing questions, selecting questions to answer, and contributing and voting on answers. The proposed model can be used to enhance CQA systems from many perspectives. As a case study, we conducted experiments on ranking answers for questions on Stack Overflow, a CQA system for professional and enthusiast programmers. Experimental results show that TRTM is effective in helping users get ideal rankings of answers, particularly for new and less popular questions. Evaluated on nDCG, TRTM outperforms state-of-the-art methods.

IJCAI Conference 2015 Conference Paper

Personalized Ranking Metric Embedding for Next New POI Recommendation

  • Shanshan Feng
  • Xutao Li
  • Yifeng Zeng
  • Gao Cong
  • Yeow Meng Chee
  • Quan Yuan

The rapid growth of Location-based Social Networks (LBSNs) provides a vast amount of check-in data, which enables many services, e.g., point-of-interest (POI) recommendation. In this paper, we study the next new POI recommendation problem, in which new POIs with respect to users’ current location are to be recommended. The challenge lies in the difficulty of precisely learning users’ sequential information and personalizing the recommendation model. To this end, we resort to the Metric Embedding method for the recommendation, which avoids drawbacks of the Matrix Factorization technique. We propose a personalized ranking metric embedding method (PRME) to model personalized check-in sequences. We further develop a PRME-G model, which integrates sequential information, individual preference, and geographical influence, to improve the recommendation performance. Experiments on two real-world LBSN datasets demonstrate that our new algorithm outperforms the state-of-the-art next POI recommendation methods.
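The idea of ranking candidates by distance in an embedding space, combining a user-preference term with a sequential term, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the toy two-dimensional embeddings, and the linear weight `alpha` are hypothetical choices made here, no training is shown, and the geographical component of PRME-G is omitted.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rank_next_pois(user_vec, current_poi, candidates, pref_emb, seq_emb, alpha=0.5):
    """Rank candidate POIs from best to worst; a smaller combined distance is better.

    pref_emb: POI -> vector in the (assumed) user-preference space.
    seq_emb:  POI -> vector in the (assumed) sequential-transition space.
    alpha:    assumed weight balancing preference against sequence.
    """
    scores = {}
    for poi in candidates:
        d_pref = sq_dist(user_vec, pref_emb[poi])
        d_seq = sq_dist(seq_emb[current_poi], seq_emb[poi])
        scores[poi] = alpha * d_pref + (1 - alpha) * d_seq
    return sorted(candidates, key=scores.get)

# Toy embeddings, chosen by hand for illustration only.
pref_emb = {"cafe": (0.0, 0.0), "museum": (3.0, 4.0)}
seq_emb = {"hotel": (0.0, 0.0), "cafe": (1.0, 0.0), "museum": (5.0, 0.0)}
ranking = rank_next_pois((0.0, 0.0), "hotel", ["museum", "cafe"], pref_emb, seq_emb)
```

With these toy values the cafe is close to both the user vector and the current POI, so it is ranked ahead of the museum.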

TIST Journal 2011 Journal Article

Who is Doing What and When

  • Shiwan Zhao
  • Michelle X. Zhou
  • Xiatian Zhang
  • Quan Yuan
  • Wentao Zheng
  • Rongyao Fu

Content-centric social Web sites, such as discussion forums and blog sites, have flourished during the past several years. These sites often contain overwhelming amounts of information that are also being updated rapidly. To help users locate their interests at such sites (e.g., interesting blogs to read or discussion forums to join), researchers have developed a number of recommendation technologies. However, it is difficult to make effective recommendations for new users (a.k.a. the cold start problem) due to a lack of user information (e.g., preferences and interests). Furthermore, the complexity of recommendation algorithms often prevents users from comprehending, let alone trusting, the recommended results. To tackle these two challenges, we are building a social map-based recommender system called Pharos. A social map summarizes users’ content-related social behavior over time (e.g., reading, writing, and commenting behavior during the past week) as a set of latent communities. For a given time interval, each community is characterized by the theme of the content being discussed and the key people involved. By discovering, ranking, and displaying the most popular latent communities at different time intervals, Pharos creates a time-sensitive, visual social map of a Web site. This enables new users to obtain a quick overview of the site, alleviating the cold start problem. Furthermore, we use the social map as a context to help explain Pharos-recommended content and people. Users can also interactively explore the social map to locate the content in which they are interested or people that are not being explicitly recommended, compensating for the imperfections in the recommendation algorithms. We have developed several Pharos applications, one of which is deployed within our company. Our preliminary evaluation of the deployed application shows the usefulness of Pharos.