Arrow Research search

Author name cluster

Qi Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

156 papers
2 author rows

Possible papers

156

AAAI Conference 2026 Conference Paper

Channel-masked Asymmetric Distribution Matching for Cross-Domain Generalized Dataset Distillation

  • Qi Liu
  • Chenghao Xu
  • Jiexi Yan
  • Guangtao Lyu
  • Erkun Yang
  • Guihai Chen
  • Yanhua Yang

Dataset distillation has achieved remarkable progress as an effective approach for data compression. However, real-world data often comes from diverse domains, leading to potential mismatches between the domains of synthesized images and those of the evaluation set. Existing methods primarily assume domain alignment between them, which limits their generalization ability in the above cross-domain scenarios. In this paper, we aim to ensure that images synthesized from known domains maintain robust performance on unseen domains and propose a novel framework called Channel-masked Asymmetric Distribution Matching (CADM). During asymmetric distribution matching, domain-sensitive channels of real data are selectively masked at different layers to extract domain-invariant features that guide synthetic data optimization. To further improve synthetic data representation, we introduce a class-focused domain-agnostic regularization to capture class-relevant knowledge while ignoring domain-specific information. Experiments show that our method produces domain-robust synthetic data and substantially improves generalization performance on unseen domains.
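As a rough intuition for the channel-masking step, the sketch below marks channels whose per-domain mean activations vary most across domains as "domain-sensitive" and zeroes them out; the variance criterion and the `mask_ratio` parameter are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np

def domain_sensitive_mask(feats_by_domain, mask_ratio=0.25):
    """Build a binary channel mask that zeroes out the channels whose
    per-domain mean activations vary most across domains (a simple,
    hypothetical proxy for 'domain-sensitive' channels)."""
    # (num_domains, num_channels): mean activation of each channel per domain
    means = np.stack([f.mean(axis=0) for f in feats_by_domain])
    sensitivity = means.var(axis=0)  # high variance => domain-sensitive
    num_channels = means.shape[1]
    k = int(num_channels * mask_ratio)
    mask = np.ones(num_channels)
    mask[np.argsort(sensitivity)[::-1][:k]] = 0.0
    return mask
```

Multiplying real-data features by such a mask before distribution matching would retain only the low-variance (domain-invariant) channels to guide synthetic-data optimization.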

AAAI Conference 2026 Conference Paper

Conversational Learning Diagnosis via Reasoning Multi-Turn Interactive Learning

  • Fangzhou Yao
  • Sheng Chang
  • Weibo Gao
  • Qi Liu

Learning diagnosis is a critical task that monitors students' cognitive state during educational activities, with the goal of enhancing learning outcomes. With advancements in language models (LMs), many AI-driven educational studies have shifted towards conversational learning scenarios, where students engage in multi-turn interactive dialogues with tutors. However, conversational learning diagnosis remains underdeveloped, and most existing techniques acquire students' cognitive state through intuitive instructional prompts on LMs to analyze the dialogue text. This direct prompting approach lacks a solid psychological foundation and fails to ensure the reliability of the generated analytical text. In this study, we introduce ParLD, a preview-analyze-reason framework for conversational learning diagnosis, which leverages multi-agent collaboration to diagnose students' cognitive state over multiple dialogue turns. Specifically, ParLD comprises three main components: (1) Behavior Previewer, which generates a student behavior schema based on previous states and learning content; (2) State Analyzer, which analyzes the tutor-student dialogue and behavior schema to update the cognitive state; and (3) Performance Reasoner, which predicts the student's future responses and provides verifiable feedback to support ParLD's self-reflection with the Chain Reflector. They operate sequentially and iteratively during each interaction turn to diagnose the student's cognitive state. We conduct experiments to evaluate both performance prediction and tutoring support, emphasizing the effectiveness of ParLD in providing reliable and insightful learning diagnosis.

AAAI Conference 2026 Conference Paper

DMGIN: How Multimodal LLMs Enhance Large Recommendation Models for Lifelong User Post-click Behaviors

  • Zhuoxing Wei
  • Qingchen Xie
  • Qi Liu
  • Jingsong Yu

Modeling user interest based on lifelong user behavior sequences is crucial for enhancing Click-Through Rate (CTR) prediction. However, long post-click behavior sequences themselves pose severe performance issues: the sheer volume of data leads to high computational costs and inefficiencies in model training and inference. Traditional methods address this by introducing two-stage approaches, but this compromises model effectiveness due to incomplete utilization of the full sequence context. More importantly, integrating multimodal embeddings into existing large recommendation models (LRMs) presents significant challenges: these embeddings often exacerbate computational burdens and mismatch with LRM architectures. To address these issues and enhance the model's efficiency and accuracy, we introduce the Deep Multimodal Group Interest Network (DMGIN). Given the observation that user post-click behavior sequences contain a large number of repeated items with varying behaviors and timestamps, DMGIN employs multimodal LLMs (MLLMs) for grouping to reorganize complete lifelong post-click behavior sequences more effectively, with almost no additional computational overhead, as opposed to directly introducing multimodal embeddings. To mitigate the potential information loss from grouping, we implement two key strategies. First, we analyze behaviors within each group using both interest statistics and intra-group transformers to capture group traits. Second, we apply inter-group transformers to temporally ordered groups to capture the evolution of user group interests. Our extensive experiments on both industrial and public datasets confirm the effectiveness and efficiency of DMGIN. The A/B test in our LBS advertising system shows that DMGIN improves CTR by 4.7% and Revenue per Mille by 2.3%.

AAAI Conference 2026 Conference Paper

From Diagnosis to Generalization: A Cognitive Approach to Data Selection for Educational LLMs

  • Yuxiang Guo
  • Yan Zhuang
  • Qi Liu
  • Zhenya Huang
  • Xianquan Wang
  • Liyang He
  • Jiatong Li
  • Rui Li

Specializing Large Language Models for educational domains is a key frontier in creating personalized learning tools. The central challenge is not data scarcity but its abundance: efficiently selecting a curated data subset from vast corpora to enhance specialized skills and foster generalization, without degrading existing abilities. Existing data selection paradigms, relying on superficial semantic similarity or model training dynamics, often lack a principled framework to identify data that promotes true cognitive growth. Our work proposes a paradigm shift from leveraging indirect proxies of learning value, such as semantic similarity and training dynamics, towards a framework that performs a direct, cognitive-level modeling of the learner's state. We introduce CASS, a novel framework that implements this cognitive approach through a clear pipeline, moving from an initial Diagnosis to the ultimate goal of expanding the model's cognitive frontier. First, CASS diagnoses the LLM's cognitive frontier using Multidimensional Item Response Theory. Leveraging this diagnosis, it then employs Fisher Information to select a data subset situated at the LLM's cognitive frontier that offers maximum informational gain. Finally, the model is fine-tuned on this curated data using a structured, easy-to-hard curriculum to ensure effective learning. Experiments on our new multi-subject dataset show that models trained with CASS not only achieve superior accuracy in the target domain but also exhibit enhanced generalization. CASS provides a more efficient, effective, and theoretically grounded paradigm for building expert educational LLMs.

AAAI Conference 2026 Conference Paper

Generic Adversarial Attack Framework Against Graph-based Vertical Federated Learning

  • Yimin Liu
  • Peng Jiang
  • Qi Liu
  • Liehuang Zhu

Graph-based vertical federated learning (GVFL) enables multiple parties to collaboratively train and infer over aligned nodes, where each party contributes its own local embedding derived from different attributes and adjacency relations. Adversarial inputs injected by an attacker can skew the joint prediction toward its desired outcomes while diminishing the influence and contributions of benign parties. However, most existing attacks rest on pre-set assumptions, such as access to the server architecture, model queries, or in-domain auxiliary graphs. In this paper, we propose SGAC, an attack framework that enables domination of joint inference without relying on the above assumptions. SGAC learns label-indicative embeddings and class-transferable probabilities to generate a surrogate that closely mimics the server-side classification behavior by exploiting auxiliary graphs from non-training domains. SGAC then leverages saliency over node attributes and edges on the auxiliary graphs to construct a diverse set of shadow inputs resembling highly influential test instances. With this surrogate fidelity and input diversity, SGAC crafts transferable contribution-monopoly adversarial inputs that hijack GVFL incentives. Extensive experiments across diverse model architectures validate SGAC's effectiveness.

AAAI Conference 2026 Conference Paper

Look as You Think: Unifying Reasoning and Visual Evidence Attribution for Verifiable Document RAG via Reinforcement Learning

  • Shuochen Liu
  • Pengfei Luo
  • Chao Zhang
  • Yuhao Chen
  • Haotian Zhang
  • Qi Liu
  • Xin Kou
  • Tong Xu

Aiming to identify precise evidence sources from visual documents, visual evidence attribution for visual document retrieval-augmented generation (VD-RAG) ensures reliable and verifiable predictions from vision-language models (VLMs) in multimodal question answering. Most existing methods adopt end-to-end training to facilitate intuitive answer verification. However, they lack fine-grained supervision and progressive traceability throughout the reasoning process. In this paper, we introduce the Chain-of-Evidence (CoE) paradigm for VD-RAG. CoE unifies Chain-of-Thought (CoT) reasoning and visual evidence attribution by grounding reference elements in reasoning steps to specific regions with bounding boxes and page indexes. To enable VLMs to generate such evidence-grounded reasoning, we propose Look As You Think (LAT), a reinforcement learning framework that trains models to produce verifiable reasoning paths with consistent attribution. During training, LAT evaluates the attribution consistency of each evidence region and provides rewards only when the CoE trajectory yields correct answers, encouraging process-level self-verification. Experiments on vanilla Qwen2.5-VL-7B-Instruct with the Paper- and Wiki-VISA benchmarks show that LAT consistently improves the vanilla model in both single- and multi-image settings, yielding average gains of 8.23% in soft exact match (EM) and 47.0% in [email protected]. Meanwhile, LAT not only outperforms the supervised fine-tuning baseline, which is trained to directly produce answers with attribution, but also exhibits stronger generalization across domains.

AAAI Conference 2026 Conference Paper

Melodia: Training-Free Music Editing Guided by Attention Probing in Diffusion Models

  • Yi Yang
  • Haowen Li
  • Tianxiang Li
  • Boyu Cao
  • Xiaohan Zhang
  • Liqun Chen
  • Qi Liu

Text-to-music generation technology is progressing rapidly, creating new opportunities for musical composition and editing. However, existing music editing methods often fail to preserve the source music's temporal structure, including melody and rhythm, when altering particular attributes like instrument, genre, and mood. To address this challenge, this paper conducts an in-depth probing analysis on attention maps within AudioLDM 2, a diffusion-based model commonly used as the backbone for existing music editing methods. We reveal a key finding: cross-attention maps encompass details regarding distinct musical characteristics, and interventions on these maps frequently result in ineffective modifications. In contrast, self-attention maps are essential for preserving the temporal structure of the source music during its conversion into the target music. Building upon this understanding, we present Melodia, a training-free technique that selectively manipulates self-attention maps in particular layers during the denoising process and leverages an attention repository to store source music information, achieving accurate modification of musical characteristics while preserving the original structure without requiring textual descriptions of the source music. Additionally, we propose two novel metrics to better evaluate music editing methods. Both objective and subjective experiments demonstrate that our approach achieves superior results in terms of textual adherence and structural integrity across various datasets. This research enhances comprehension of internal mechanisms within music generation models and provides improved control for music creation.

AAAI Conference 2026 Conference Paper

QueryCraft: Transformer-Guided Query Initialization for Enhanced Human-Object Interaction Detection

  • Yuxiao Wang
  • Wolin Liang
  • Yu Lei
  • Weiying Xue
  • Nan Zhuang
  • Qi Liu

Human-Object Interaction (HOI) detection aims to localize human-object pairs and recognize their interactions in images. Although DETR-based methods have recently emerged as the mainstream framework for HOI detection, they still suffer from a key limitation: randomly initialized queries lack explicit semantics, leading to suboptimal detection performance. To address this challenge, we propose QueryCraft, a novel plug-and-play HOI detection framework that incorporates semantic priors and guided feature learning through transformer-based query initialization. Central to our approach is ACTOR (Action-aware Cross-modal TransfORmer), a cross-modal Transformer encoder that jointly attends to visual regions and textual prompts to extract action-relevant features. Rather than merely aligning modalities, ACTOR leverages language-guided attention to infer interaction semantics and produce semantically meaningful query representations. To further enhance object-level query quality, we introduce a Perceptual Distilled Query Decoder (PDQD), which distills object category awareness from a pre-trained detector to serve as object query initialization. This dual-branch query initialization enables the model to generate more interpretable and effective queries for HOI detection. Extensive experiments on the HICO-Det and V-COCO benchmarks demonstrate that our method achieves state-of-the-art performance and strong generalization.

AAAI Conference 2026 Conference Paper

TEMPLE: Incentivizing Temporal Understanding of Video Large Language Models via Progressive Pre-SFT Alignment

  • Shicheng Li
  • Lei Li
  • Kun Ouyang
  • Shuhuai Ren
  • Yuanxin Liu
  • Yuanxing Zhang
  • Fuzheng Zhang
  • Lingpeng Kong

Video Large Language Models (Video LLMs) have achieved significant success by adopting the paradigm of large-scale pre-training followed by supervised fine-tuning (SFT). However, existing approaches struggle with temporal reasoning due to weak temporal correspondence in the data and over-reliance on the next-token prediction paradigm, which collectively result in an absence of explicit temporal supervision. To address these limitations, we propose TEMPLE (TEMporal Preference Learning), a systematic framework that enhances temporal reasoning capabilities through Direct Preference Optimization (DPO). To address temporal information scarcity in data, we introduce an automated pipeline for systematically constructing temporality-intensive preference pairs comprising three steps: selecting temporally rich videos, designing video-specific perturbation strategies, and evaluating model responses on clean and perturbed inputs. Complementing this data pipeline, we provide additional supervision signals via preference learning and propose a novel Progressive Pre-SFT Alignment strategy featuring two key innovations: a curriculum learning strategy that progressively increases perturbation difficulty to maximize data efficiency, and applying preference optimization before instruction tuning to incentivize fundamental temporal alignment. Extensive experiments demonstrate that our approach consistently improves Video LLM performance across multiple benchmarks with a relatively small set of self-generated DPO data. Our findings highlight TEMPLE as a scalable and efficient complement to SFT-based methods, paving the way for developing reliable Video LLMs.

AAAI Conference 2026 Conference Paper

Themis: Automated Constraint-Aware Test Synthesis Framework for Code Reinforcement Learning

  • Shengyu Ye
  • Qi Liu
  • Hao Jiang
  • Zheng Zhang
  • Heng Yu
  • Zhenya Huang

Reinforcement learning (RL) has shown promise for enhancing code generation capabilities in large language models (LLMs), yet its effectiveness critically depends on high-quality test suites for reliable reward signals. Current approaches suffer from inadequate test case quantity and quality, leading to false positives (incorrect solutions passing verification) and slow positives (valid but suboptimal implementations), which corrupt RL training dynamics. We address these challenges through three key contributions: (1) we systematically analyze how low-quality test suites degrade Code RL performance via reward misalignment; (2) we propose Themis, an automated framework that transforms test case generation into code synthesis—first extracting problem constraints via template-guided parsing, then generating executable test generators through LLM-powered code synthesis, and finally validating tests through constraint-aware filtering; (3) we develop an error-guided test case reduction method that preserves error detection efficacy while reducing test set cardinality, thereby enhancing reinforcement learning training efficiency. Evaluated on programming competition datasets, Themis achieves a 95% error detection rate, outperforming the original test suites in most cases. When integrated into RL pipelines, models trained with Themis-generated tests demonstrate consistent 3-5% improvements across HumanEval, MBPP, and LiveCodeBench compared to the baseline, matching performance levels achieved with manually curated test suites. Our constraint-aware test synthesis framework ensures full automation while preserving semantic validity—critical for scaling RL training to complex code generation tasks. The framework's modular design also enables seamless integration with existing code data synthesis frameworks.
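As a toy illustration of treating test generation as code synthesis, the sketch below parses a bound from a problem statement and returns an executable input generator. The `1 <= n <= X` constraint format and both helper names are hypothetical, and the real framework uses LLM-powered synthesis rather than a regex.

```python
import random
import re

def parse_constraints(statement):
    """Toy template-guided parse: pull an upper bound from a
    '1 <= n <= X' clause (hypothetical statement format)."""
    m = re.search(r"1 <= n <= (\d+)", statement)
    return {"n_max": int(m.group(1)) if m else 100}

def make_generator(constraints, seed=0):
    """Return an executable test-input generator respecting the bound."""
    rng = random.Random(seed)
    def gen():
        return rng.randint(1, constraints["n_max"])
    return gen
```

Because the generator is itself code, every emitted input can be re-checked against the parsed constraints, mirroring the constraint-aware filtering step.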

AAAI Conference 2026 Conference Paper

What-Meets-Where: Unified Learning of Action and Contact Localization in Images

  • Yuxiao Wang
  • Yu Lei
  • Wolin Liang
  • Weiying Xue
  • Zhenao Wei
  • Nan Zhuang
  • Qi Liu

People control their bodies to establish contact with the environment. To comprehensively understand actions across diverse visual contexts, it is essential to simultaneously consider what action is occurring and where it is happening. Current methodologies, however, often inadequately capture this duality, typically failing to jointly model both action semantics and their spatial contextualization within scenes. To bridge this gap, we introduce a novel vision task that simultaneously predicts high-level action semantics and fine-grained body-part contact regions. Our proposed framework, PaIR-Net, comprises three key components: the Contact Prior Aware Module (CPAM) for identifying contact-relevant body parts, the Prior-Guided Concat Segmenter (PGCS) for pixel-wise contact segmentation, and the Interaction Inference Module (IIM) responsible for integrating global interaction relationships. To facilitate this task, we present PaIR (Part-aware Interaction Representation), a comprehensive dataset containing 13,979 images that encompass 654 actions, 80 object categories, and 17 body parts. Experimental evaluation demonstrates that PaIR-Net significantly outperforms baseline approaches, while ablation studies confirm the efficacy of each architectural component.

NeurIPS Conference 2025 Conference Paper

A Closed-Form Solution for Fast and Reliable Adaptive Testing

  • Yan Zhuang
  • Chenye Ke
  • Zirui Liu
  • Qi Liu
  • Yuting Ning
  • Zhenya Huang
  • Weizhe Huang
  • Qingyang Mao

Human ability estimation is essential for educational assessment, career advancement, and professional certification. Adaptive Testing systems can improve estimation efficiency by selecting fewer, targeted questions, and are widely used in exams, e.g., the GRE, GMAT, and Duolingo English Test. However, selecting an optimal subset of questions remains a challenging nested optimization problem. Existing methods rely on costly approximations or data-intensive training, making them unsuitable for today's large-scale and complex testing environments. Thus, we propose a Closed-Form solution for question subset selection in Adaptive Testing. It directly minimizes ability estimation error by reducing the ability parameter's gradient bias while maintaining Hessian stability, which enables a simple greedy algorithm for question selection. Moreover, it can quantify the impact of human behavioral perturbations on ability estimation. Extensive experiments on large-scale educational datasets demonstrate that it reduces the number of required questions by 10% compared to SOTA methods while maintaining the same estimation accuracy.
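For intuition about greedy question selection, the sketch below implements the textbook maximum-Fisher-information rule under a 2PL IRT model; the paper's closed-form criterion additionally controls the gradient bias and Hessian stability of the ability estimate, which this illustration omits.

```python
import numpy as np

def item_information(theta, a, b):
    """Fisher information of a 2PL item with discrimination a and
    difficulty b, evaluated at ability estimate theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

def greedy_select(theta_hat, a, b, k):
    """Greedily pick the k items carrying the most information
    at the current ability estimate theta_hat."""
    info = item_information(theta_hat, np.asarray(a), np.asarray(b))
    return np.argsort(info)[::-1][:k].tolist()
```

With equal discriminations this rule simply prefers items whose difficulty sits closest to the current ability estimate, which is why adaptive tests converge with fewer questions than a fixed form.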

NeurIPS Conference 2025 Conference Paper

Activation Control for Efficiently Eliciting Long Chain-of-thought Ability of Language Models

  • Zekai Zhao
  • Qi Liu
  • Kun Zhou
  • Zihan Liu
  • Yifei Shao
  • Zhiting Hu
  • Biwei Huang

Despite their remarkable reasoning performance, eliciting the long chain-of-thought (CoT) ability of large language models (LLMs) typically requires costly reinforcement learning or supervised fine-tuning on high-quality distilled data. We investigate the internal mechanisms behind this capability and show that a small set of high-impact activations in the last few layers largely governs long-form reasoning attributes, e.g., output length and self-reflection. By simply amplifying these activations and adding "wait" tokens, the long CoT ability can be invoked without training, leading to significantly increased self-reflection rates and accuracy. We also find that the activation changes follow predictable trajectories, i.e., a sharp rise after special tokens followed by an exponential decay. Based on these insights, we introduce a general training-free activation control technique. It uses a few contrastive examples to identify the relevant activations, and then applies simple analytic functions to adjust their values at inference time to elicit long CoTs. Extensive experiments verify the effectiveness of our methods in efficiently eliciting the long CoT ability of LLMs and improving performance. Furthermore, we propose a parameter-efficient fine-tuning method that trains only the last-layer activation amplification module and a few LoRA layers, outperforming LoRA on reasoning benchmarks with far fewer parameters. Our code and data will be fully publicly released.
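The contrastive identification step can be sketched in a few lines; ranking dimensions by their mean-activation gap and applying a uniform scale factor are illustrative assumptions standing in for the analytic adjustment functions described above.

```python
import numpy as np

def find_high_impact_dims(acts_long, acts_short, top_k=8):
    """Rank activation dimensions by the gap between their mean values
    on long-CoT vs. short-CoT contrastive examples (illustrative)."""
    gap = acts_long.mean(axis=0) - acts_short.mean(axis=0)
    return np.argsort(np.abs(gap))[::-1][:top_k]

def amplify(hidden, dims, scale=2.0):
    """Scale the selected last-layer activations at inference time."""
    out = np.array(hidden, copy=True)
    out[..., dims] *= scale
    return out
```

In a real model the `amplify` step would run inside a forward hook on the final layers rather than on raw arrays.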

AAAI Conference 2025 Conference Paper

Agent4Edu: Generating Learner Response Data by Generative Agents for Intelligent Education Systems

  • Weibo Gao
  • Qi Liu
  • Linan Yue
  • Fangzhou Yao
  • Rui Lv
  • Zheng Zhang
  • Hao Wang
  • Zhenya Huang

Personalized learning represents a promising educational strategy within intelligent educational systems, aiming to enhance learners' practice efficiency. However, the scarcity of offline practice response data (e.g., answer correctness) and potential biases in human online practice create a significant gap between offline metrics and the actual online performance of personalized learning services. To address this challenge, we introduce Agent4Edu, a novel personalized learning simulator leveraging recent advancements in human intelligence through large language models (LLMs). Agent4Edu features LLM-powered generative agents equipped with learner profile, memory, and action modules tailored to personalized learning algorithms. The learner profiles are initialized using real-world response data, capturing practice styles and cognitive factors. Inspired by psychology theory, the memory module records practice facts and high-level summaries, integrating reflection mechanisms. The action module supports various behaviors, including exercise understanding, analysis, and response generation. Each agent can interact with personalized learning algorithms, such as computerized adaptive testing, enabling a multifaceted evaluation and enhancement of customized services. Through a comprehensive assessment, we explore the strengths and weaknesses of Agent4Edu, emphasizing the consistency and discrepancies in responses between agents and human learners.

NeurIPS Conference 2025 Conference Paper

Auto-Connect: Connectivity-Preserving RigFormer with Direct Preference Optimization

  • Jingfeng Guo
  • Jian Liu
  • Jinnan Chen
  • Shiwei Mao
  • Changrong Hu
  • Puhua Jiang
  • Junlin Yu
  • Jing Xu

We introduce Auto-Connect, a novel approach for automatic rigging that explicitly preserves skeletal connectivity through a connectivity-preserving tokenization scheme. Unlike previous methods that predict bone positions represented as two joints or first predict points before determining connectivity, our method employs special tokens to define endpoints for each joint's children and for each hierarchical layer, effectively automating connectivity relationships. This approach significantly enhances topological accuracy by integrating connectivity information directly into the prediction framework. To further guarantee high-quality topology, we implement a topology-aware reward function that quantifies topological correctness, which is then utilized in a post-training phase through reward-guided Direct Preference Optimization. Additionally, we incorporate implicit geodesic features for latent top-k bone selection, which substantially improves skinning quality. By leveraging geodesic distance information within the model's latent space, our approach intelligently determines the most influential bones for each vertex, effectively mitigating common skinning artifacts. This combination of connectivity-preserving tokenization, reward-guided fine-tuning, and geodesic-aware bone selection enables our model to consistently generate more anatomically plausible skeletal structures with superior deformation properties.

NeurIPS Conference 2025 Conference Paper

Bootstrapping Hierarchical Autoregressive Formal Reasoner with Chain-of-Proxy-Autoformalization

  • Qi Liu
  • Xinhao Zheng
  • Renqiu Xia
  • Qinxiang Cao
  • Junchi Yan

Deductive formal problem-solving (D-FPS) enables process-verified, human-aligned problem-solving by implementing deductive solving processes within formal theorem proving (FTP) environments. However, current methods fail to address the misalignment between informal and formal reasoning granularity and suffer from inefficiency due to backtracking and error propagation. Moreover, the extreme scarcity of formal problem-solution pairs further hinders progress. For the first gap, we propose HAR (Hierarchical Autoregressive Formal Reasoner), a novel reasoning pipeline. HAR decouples informal-aligned drafting from detailed proving, and formulates solution construction as autoregressive generation with per-step feedback. For the second, we propose CoPA (Chain-of-Proxy-Autoformalization), a data generation pipeline that cascades statement autoformalization, proof drafting, and proof search as a proxy autoformalization path. Experiments demonstrate significant improvements: trained on data bootstrapped by CoPA, HAR achieves superior performance on FormalMath500 (15.50% → 44.09%) and MiniF2F-Solving (21.87% → 56.58%) with a lower computational budget. Explorations reveal promising directions in formal solution pruning and informal dataset denoising.

IJCAI Conference 2025 Conference Paper

CoderAgent: Simulating Student Behavior for Personalized Programming Learning with Large Language Models

  • Yi Zhan
  • Qi Liu
  • Weibo Gao
  • Zheng Zhang
  • Tianfu Wang
  • Shuanghong Shen
  • Junyu Lu
  • Zhenya Huang

Personalized programming tutoring, such as exercise recommendation, can enhance learners' efficiency, motivation, and outcomes, which is increasingly important in modern digital education. However, the lack of sufficient and high-quality programming data, combined with the mismatch between offline evaluation and real-world learning, hinders the practical deployment of such systems. To address this challenge, many approaches attempt to simulate learner practice data, yet they often overlook the fine-grained, iterative nature of programming learning, resulting in a lack of interpretability and granularity. To fill this gap, we propose an LLM-based agent, CoderAgent, to simulate students' programming processes in a fine-grained manner without relying on real data. Specifically, we equip each human learner with an intelligent agent, the core of which lies in capturing the cognitive states of the human programming practice process. Inspired by ACT-R, a cognitive architecture framework, we design the structure of CoderAgent to align with human cognitive architecture by focusing on the mastery of programming knowledge and the application of coding ability. Recognizing the inherent patterns in multi-layered cognitive reasoning, we introduce the Programming Tree of Thought (PTOT), which breaks down the process into four steps: why, how, where, and what. This approach enables a detailed analysis of iterative problem-solving strategies. Finally, experimental evaluations on real-world datasets demonstrate that CoderAgent provides interpretable insights into learning trajectories and achieves accurate simulations, paving the way for personalized programming education.

IROS Conference 2025 Conference Paper

Deep Learning-based Proactive Hazard Prediction for Human-Robot Collaboration with Sensor Malfunctions

  • Yuliang Ma
  • Zilin Jin
  • Qi Liu
  • Ilshat Mamaev
  • Andrey Morozov 0001

Safety is a critical concern in human-robot collaboration (HRC). As collaborative robots take on increasingly complex tasks in human environments, their systems have become more sophisticated through the integration of multimodal sensors, including force-torque sensors, cameras, LiDARs, and IMUs. However, existing studies on HRC safety primarily focus on ensuring safety under normal operating conditions, overlooking scenarios where internal sensor faults occur. While anomaly detection modules can help identify sensor errors and mitigate hazards, two key challenges remain: (1) no anomaly detector is flawless, and (2) not all sensor malfunctions directly threaten human safety. Relying solely on anomaly detection can lead to missed errors or excessive false alarms. To enhance safety in real-world HRC applications, this paper introduces a deep learning-based method that proactively predicts hazards following the detection of sensory anomalies. We simulate two common types of faults—bias and noise—affecting joint sensors and monitor abnormal manipulator behaviors that could pose risks in fenceless HRC environments. A dataset of 2,400 real-world samples is collected to train the proposed hazard prediction model. The approach leverages multimodal inputs, including RGB-D images, human pose, joint states, and planned robot paths, to assess whether sensor malfunctions could lead to hazardous events. Experimental results show that the proposed method outperforms state-of-the-art models, while offering faster inference speed. Additionally, cross-scenario testing confirms its strong generalization capabilities. The code and datasets are available at: DL-based-Hazard-Prediction.

AAAI Conference 2025 Conference Paper

Distribution-Driven Dense Retrieval: Modeling Many-to-One Query-Document Relationship

  • Junfeng Kang
  • Rui Li
  • Qi Liu
  • Zhenya Huang
  • Zheng Zhang
  • Yanjiang Chen
  • Linbo Zhu
  • Yu Su

Dense retrieval has emerged as the leading approach in information retrieval, aiming to find semantically relevant documents based on natural language queries. Given that a single document can be retrieved by multiple distinct queries, existing methods aim to represent a document with multiple vectors. Each vector is aligned with a different query to model the many-to-one relationship between queries and documents. However, these multiple vector-based approaches encounter challenges such as Increased Storage, Vector Collapse, and Search Efficiency. To address these issues, we introduce the Distribution-Driven Dense Retrieval framework (DDR). Specifically, we use vectors to represent queries and distributions to represent documents. This approach not only captures the relationships between multiple queries corresponding to the same document but also avoids the need to use multiple vectors to represent the document. Furthermore, to ensure search efficiency for DDR, we propose a dot product-based computation method to calculate the similarity between documents represented by distributions and queries represented by vectors. This allows for seamless integration with existing approximate nearest neighbor (ANN) search algorithms for efficient search. Finally, we conduct extensive experiments on real-world datasets, which demonstrate that our method significantly outperforms traditional dense retrieval methods.

IROS Conference 2025 Conference Paper

Dynamic Residual Safe Reinforcement Learning for Multi-Agent Safety-Critical Scenarios Decision-Making

  • Kaifeng Wang
  • Yinsong Chen
  • Qi Liu
  • Xueyuan Li
  • Xin Gao

In multi-agent safety-critical scenarios, traditional autonomous driving frameworks face significant challenges in balancing safety constraints and task performance. These frameworks struggle to quantify dynamic interaction risks in real-time and depend heavily on manual rules, resulting in low computational efficiency and conservative strategies. To address these limitations, we propose a Dynamic Residual Safe Reinforcement Learning (DRS-RL) framework grounded in a safety-enhanced networked Markov decision process. This is the first time that weak-to-strong theory has been introduced into multi-agent decision-making, enabling lightweight dynamic calibration of safety boundaries via a weak-to-strong safety correction paradigm. Based on the multi-agent dynamic conflict zone model, our framework accurately captures spatiotemporal coupling risks among heterogeneous traffic participants and surpasses the static constraints of conventional geometric rules. Moreover, a risk-aware prioritized experience replay mechanism mitigates data distribution bias by mapping risk to sampling probability. Experimental results reveal that the proposed method significantly outperforms traditional RL algorithms in safety, efficiency, and comfort. Specifically, it reduces the collision rate by up to 92.17%, while the safety model accounts for merely 27% of the main model’s parameters.

JBHI Journal 2025 Journal Article

Fusing Micro- and Macro-Scale Information to Predict Anticancer Synergistic Drug Combinations

  • Xiaowen Wang
  • Hongming Zhu
  • Qi Liu
  • Qin Liu

Drug combination therapy is highly regarded in cancer treatment. Computational methods offer a time- and cost-effective opportunity to explore the vast combination space. Although deep learning-based prediction methods lead the field, their generalization ability remains unsatisfactory. Few previous studies have the ability to finely characterize drugs and cell lines at both the micro-scale and macro-scale. Furthermore, the interaction of cross-scale information is often overlooked. These two points limit models' ability to predict the synergism of drug combinations in cell lines. To address these issues, we propose a novel anticancer synergistic drug combination prediction method termed MMFSynergy in this article. The construction of MMFSynergy involves three phases. First, MMFSynergy pretrains two micro encoders and a macro graph encoder, which can capture micro- or macro-scale information from large volumes of unlabeled data and generate generic features for drugs and proteins. Second, it represents drugs and proteins by fusing cross-scale information through a self-supervised task. Finally, it employs a Transformer Encoder-based model to predict synergy scores, taking representations of drugs in the combinations and the associated proteins of cell lines as input. We compared our method with eight advanced methods across three typical scenarios based on two public datasets. The results consistently demonstrated that the proposed method's generalization ability surpasses that of six advanced methods. We also conducted experiments, including but not limited to an ablation study and a case study, to further exhibit the effectiveness of MMFSynergy.

AAAI Conference 2025 Conference Paper

GenAL: Generative Agent for Adaptive Learning

  • Rui Lv
  • Qi Liu
  • Weibo Gao
  • Haotian Zhang
  • Junyu Lu
  • Linbo Zhu

Adaptive learning, also known as adaptive teaching, relies on learning path recommendations that sequentially suggest personalized learning items (such as lectures and exercises) to meet the unique needs of each learner. Despite the extensive research in this field, previous approaches have primarily modeled the interaction sequences between learners and items using simple indexing, leading to three issues: (1) The utilization of information from both learners and items is not sufficient. For instance, these models are unable to leverage the semantic information contained within the textual content of the items. (2) Models need to be retrained on different datasets separately, which makes it difficult to adapt to the continuously expanding item pool in online educational scenarios. (3) The existing recommendation paradigm, based on trained reinforcement learning frameworks, suffers from unstable recommendation performance on sparse learning logs. To address these challenges, we propose a generalized Generative Agent for Adaptive Learning (GenAL), which integrates educational tools with LLMs' semantic understanding to enable effective and generalizable learning path recommendations across diverse data distributions. Specifically, our framework consists of two components: the Global Thinking Agent, which updates the learner profile and reflects on recommendation outcomes based on the learner's historical learning records, and the Local Teaching Agent, which recommends items using educational prior knowledge. Leveraging the LLM's robust semantic understanding, our framework does not rely on item indexing but instead extracts relevant information from the textual content. We evaluated our approach on three real-world datasets, and the experimental results demonstrate that our GenAL not only consistently outperforms all baselines but also exhibits strong generalization ability.

AAAI Conference 2025 Conference Paper

Geometry-Aware 3D Salient Object Detection Network

  • Chen Wang
  • Liyuan Zhang
  • Le Hui
  • Qi Liu
  • Yuchao Dai

Point cloud salient object detection has attracted the attention of researchers in recent years. Since existing works do not fully utilize the geometry context of 3D objects, blurry boundaries are generated when segmenting objects with complex backgrounds. In this paper, we propose a geometry-aware 3D salient object detection network that explicitly clusters points into superpoints to enhance the geometric boundaries of objects, thereby segmenting complete objects with clear boundaries. Specifically, we first propose a simple yet effective superpoint partition module to cluster points into superpoints. In order to improve the quality of superpoints, we present a point cloud class-agnostic loss to learn discriminative point features for clustering superpoints from the object. After obtaining superpoints, we then propose a geometry enhancement module that utilizes superpoint-point attention to aggregate geometric information into point features for predicting the salient map of the object with clear boundaries. Extensive experiments show that our method achieves new state-of-the-art performance on the PCSOD dataset.

TAAS Journal 2025 Journal Article

GSFL: A Privacy-Preserving Grouping-Split Federated Learning Approach in Resource-Constrained Edge Computing Scenarios

  • Qi Liu
  • Zhilu Wang
  • Xiaokang Zhou
  • Yonghong Zhang
  • Xiaodong Liu
  • Haiyang Lin

The advancement of mobile multimedia communications, 5G, and the Internet of Things (IoT) has led to the widespread use of edge devices, including sensors, smartphones, and wearables. This has generated a large amount of distributed data, leading to new prospects for deep learning. However, this data is confined within data silos and contains sensitive information, making it difficult to process in a centralized manner, particularly under stringent data privacy regulations. Federated learning (FL) offers a solution by enabling collaborative learning while ensuring privacy. Nonetheless, data and device heterogeneity complicate FL implementation. This research presents a specialized FL algorithm for heterogeneous edge computing. It integrates a lightweight grouping strategy for homogeneous devices, a scheduling algorithm within groups, and a Split Learning (SL) approach. These contributions enhance model accuracy and training speed, alleviate the burden on resource-constrained devices, and strengthen privacy. Experimental results demonstrate that GSFL outperforms FedAvg and SplitFed by 6.53× and 1.18×, respectively. Under experimental conditions with \(\alpha=0.05\), representing a highly heterogeneous data distribution typical of extreme Non-IID scenarios, GSFL improved accuracy over FedAvg by 10.64%, over HACCS by 4.53%, and over Cluster-HSFL by 1.16%. GSFL effectively balances privacy protection and computational efficiency for real-world applications in mobile multimedia communications.

TIST Journal 2025 Journal Article

Hierarchical Multimodal LLMs with Semantic Space Alignment for Enhanced Time Series Classification

  • Xiaoyu Tao
  • Tingyue Pan
  • Mingyue Cheng
  • Yucong Luo
  • Qi Liu
  • Enhong Chen

Time series classification plays a fundamental role in a wide range of real-world applications. Recently, large language models (LLMs) have demonstrated strong generalization and reasoning capacities, but directly applying them to time series classification remains non-trivial due to the representation gap between numerical sequences and linguistic semantics. In this paper, we propose HiTime, a hierarchical LLM-based framework for multimodal time series classification that bridges structured temporal representations with semantic reasoning in a generative paradigm. Specifically, we design a hierarchical sequence feature encoding module composed of a data-specific encoder and a task-specific encoder to extract complementary temporal features. To mitigate the embedding gap between time series representations and textual semantics, we further introduce a semantic space alignment module that jointly performs coarse-grained global modeling and fine-grained cross-modal correspondence. Building upon the above representations, we employ a parameter-efficient supervised fine-tuning strategy to activate the generative classification capability of the aligned LLMs, thereby transforming conventional discriminative time series classification into a generative task. Extensive experiments on multiple benchmarks demonstrate that the proposed framework consistently outperforms state-of-the-art baselines.

NeurIPS Conference 2025 Conference Paper

Improving Time Series Forecasting via Instance-aware Post-hoc Revision

  • Zhiding Liu
  • Mingyue Cheng
  • Guanhao Zhao
  • Jiqian Yang
  • Qi Liu
  • Enhong Chen

Time series forecasting plays a pivotal role in various real-world applications and has attracted significant attention in recent decades. While recent methods have achieved remarkable accuracy by incorporating advanced inductive biases and training strategies, we observe that instance-level variations remain a significant challenge. These variations—stemming from distribution shifts, missing data, and long-tail patterns—often lead to suboptimal forecasts for specific instances, even when overall performance appears strong. To address this issue, we propose a model-agnostic framework, PIR, designed to enhance forecasting performance through Post-forecasting Identification and Revision. Specifically, PIR first identifies biased forecast instances by estimating their predictive accuracy. Based on this, the framework revises the forecasts using contextual information, including covariates and historical time series, from both local and global perspectives in a post-processing fashion. Extensive experiments on real-world datasets with mainstream forecasting models demonstrate that PIR effectively mitigates instance-level errors and significantly improves forecasting reliability.

AAAI Conference 2025 Conference Paper

LLM Agents Can Be Choice-Supportive Biased Evaluators: An Empirical Study

  • Nan Zhuang
  • Boyu Cao
  • Yi Yang
  • Jing Xu
  • Mingda Xu
  • Yuxiao Wang
  • Qi Liu

With Large Language Model (LLM) agents taking on more evaluation responsibilities in decision-making, it is essential to recognize their possible biases to guarantee fair and trustworthy AI-supported decisions. This study is the first to thoroughly examine the choice-supportive bias in LLM agents, a cognitive bias that is known to impact human decision-making and evaluation. We conduct experiments across 19 open- and closed-source LLMs in up to five scenarios, employing both memory-based and evaluation-based tasks adapted and redesigned from human cognitive studies. Our findings show that LLM agents may exhibit biased attribution or evaluation that supports their initial choices, and such bias may persist even if contextual hallucination is not observable. Key findings show that bias manifestation can differ greatly depending on prompt construction and context preservation, and the bias may be mitigated in larger models. Significantly, we observe that the bias increases when the agents perceive they are in control. Our extensive study involving 284 well-educated humans shows that, despite bias, certain LLM agents can still perform better than humans in similar evaluation tasks. This research contributes to the growing area of AI psychology, and the findings underscore the importance of addressing cognitive biases in LLM Agent systems, with wide-ranging implications spanning from improving AI-assisted decision-making to advancing AI safety and ethics.

NeurIPS Conference 2025 Conference Paper

Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving

  • Daoguang Zan
  • Zhirong Huang
  • Wei Liu
  • Hanwu Chen
  • Shulin Xin
  • Linhao Zhang
  • Qi Liu
  • Li Aoyan

The task of issue resolving aims to modify a codebase to generate a patch that addresses a given issue. However, most existing benchmarks focus almost exclusively on Python, making them insufficient for evaluating Large Language Models (LLMs) across different programming languages. To bridge this gap, we introduce a multilingual issue-resolving benchmark, called Multi-SWE-bench, covering 8 languages: Python, Java, TypeScript, JavaScript, Go, Rust, C, and C++. In particular, this benchmark includes a total of 2,132 high-quality instances, carefully curated by 68 expert annotators, ensuring a reliable and accurate evaluation of LLMs on the issue-resolving task. Based on human-annotated results, the issues are further classified into three difficulty levels. We evaluate a series of state-of-the-art models on Multi-SWE-bench, utilizing both procedural and agent-based frameworks for issue resolving. Our experiments reveal three key findings: (1) Limited generalization across languages: While existing LLMs perform well on Python issues, their ability to generalize across other languages remains limited; (2) Performance aligned with human-annotated difficulty: LLM-based agents' performance closely aligns with human-assigned difficulty, with resolution rates decreasing as issue complexity rises; and (3) Performance drop on cross-file issues: The performance of current methods significantly deteriorates when handling cross-file issues. These findings highlight the limitations of current LLMs and underscore the need for more robust models capable of handling a broader range of programming languages and complex issue scenarios.

NeurIPS Conference 2025 Conference Paper

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

  • Ling Fu
  • Zhebin Kuang
  • Jiajun Song
  • Mingxin Huang
  • Biao Yang
  • Yuzhe Li
  • Linghao Zhu
  • Qidi Luo

Scoring the Optical Character Recognition (OCR) capabilities of Large Multimodal Models (LMMs) has witnessed growing interest. Existing benchmarks have highlighted the impressive performance of LMMs in text recognition; however, their abilities in certain challenging tasks, such as text localization, handwritten content extraction, and logical reasoning, remain underexplored. To bridge this gap, we introduce OCRBench v2, a large-scale bilingual text-centric benchmark with currently the most comprehensive set of tasks ($4\times$ more tasks than the previous multi-scene benchmark OCRBench), the widest coverage of scenarios ($31$ diverse scenarios), and thorough evaluation metrics, with $10,000$ human-verified question-answering pairs and a high proportion of difficult samples. Moreover, we construct a private test set with $1,500$ manually annotated images. The consistent evaluation trends observed across both public and private test sets validate OCRBench v2's reliability. After carefully benchmarking state-of-the-art LMMs, we find that most LMMs score below $50$ ($100$ in total) and suffer from five types of limitations, including less frequently encountered text recognition, fine-grained perception, layout perception, complex element parsing, and logical reasoning. The benchmark and evaluation scripts are available at https://github.com/Yuliang-Liu/MultimodalOCR.

NeurIPS Conference 2025 Conference Paper

Personalized Visual Content Generation in Conversational Systems

  • Xianquan Wang
  • Zhaocheng Du
  • Huibo Xu
  • Shukang Yin
  • Yupeng Han
  • Jieming Zhu
  • Kai Zhang
  • Qi Liu

With the rapid progress of large language models (LLMs) and diffusion models, there has been growing interest in personalized content generation. However, current conversational systems often present the same recommended content to all users, falling into the dilemma of "one-size-fits-all." To break this limitation and boost user engagement, in this paper, we introduce PCG (Personalized Visual Content Generation), a unified framework for personalizing item images within conversational systems. We tackle two key bottlenecks: the depth of personalization and the fidelity of generated images. Specifically, an LLM-powered Inclinations Analyzer is adopted to capture user likes and dislikes from context to construct personalized prompts. Moreover, we design a dual-stage LoRA mechanism: Global LoRA for understanding task-specific visual style, and Local LoRA for capturing preferred visual elements from conversation history. During training, we introduce the visual content condition method to ensure LoRA learns historical visual context while maintaining fidelity to the original item images. Extensive experiments on benchmark conversational datasets, including objective metrics and GPT-based evaluations, demonstrate that our framework outperforms strong baselines, highlighting its potential to redefine personalization in visual content generation for conversational scenarios such as e-commerce and real-world recommendation.

NeurIPS Conference 2025 Conference Paper

PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models

  • Shi Qiu
  • Shaoyang Guo
  • Zhuo-Yang Song
  • Yunbo Sun
  • Zeyu Cai
  • Jiashen Wei
  • Tianyu Luo
  • Yixuan Yin

Current benchmarks for evaluating the reasoning capabilities of Large Language Models (LLMs) face significant limitations: task oversimplification, data contamination, and flawed evaluation items. These deficiencies necessitate more rigorous assessment methods. To address these limitations, we introduce PHYBench, a benchmark of 500 original physics problems ranging from high school to Physics Olympiad difficulty. PHYBench addresses data contamination through original content and employs a systematic curation pipeline to eliminate flawed items. Evaluations show that PHYBench activates more tokens and provides stronger differentiation between reasoning models compared to other baselines like AIME 2024, OlympiadBench and GPQA. Even the best-performing model, Gemini 2.5 Pro, achieves only 36.9% accuracy compared to human experts' 61.9%. To further enhance evaluation precision, we introduce the Expression Edit Distance (EED) Score for mathematical expression assessment, which improves sample efficiency by 204% over binary scoring. Moreover, PHYBench effectively elicits multi-step and multi-condition reasoning, providing a platform for examining models' reasoning robustness, preferences, and deficiencies. The benchmark results and dataset are publicly available at https://www.phybench.cn/.

AAAI Conference 2025 Conference Paper

Precision-Enhanced Human-Object Contact Detection via Depth-Aware Perspective Interaction and Object Texture Restoration

  • Yuxiao Wang
  • Wenpeng Neng
  • Zhenao Wei
  • Yu Lei
  • Weiying Xue
  • Nan Zhuang
  • Yanwu Xu
  • Xinyu Jiang

Human-object contact (HOT) detection is designed to accurately identify the areas where humans and objects come into contact. Current methods often fail to account for scenarios where objects block the view, resulting in inaccurate identification of contact areas. To tackle this problem, we propose a perspective interaction HOT detector called PIHOT, which utilizes a depth map generation model to provide depth information of humans and objects relative to the camera, thereby preventing false interaction detection. Furthermore, we use mask dilation and object restoration techniques to restore the texture details in covered areas, improve the boundaries between objects, and enhance the perception of humans interacting with objects. Moreover, a spatial awareness perception mechanism is designed to concentrate on the characteristic features close to the points of contact. The experimental results show that the PIHOT algorithm achieves state-of-the-art performance on three benchmark datasets for HOT detection tasks. Compared to the most recent DHOT, our method enjoys an average improvement of 13%, 27.5%, 16%, and 18.5% on SC-Acc., C-Acc., mIoU, and wIoU metrics, respectively.

ICLR Conference 2025 Conference Paper

Rethinking and Improving Autoformalization: Towards a Faithful Metric and a Dependency Retrieval-based Approach

  • Qi Liu
  • Xinhao Zheng
  • Xudong Lu
  • Qinxiang Cao
  • Junchi Yan

As a central component in formal verification, statement autoformalization has been widely studied, including recent efforts from the machine learning community, but it remains a widely recognized difficult and open problem. In this paper, we delve into two critical yet under-explored gaps: 1) absence of faithful and universal automated evaluation for autoformalization results; 2) agnosia of contextual information, inducing severe hallucination of formal definitions and theorems. To address the first issue, we propose **BEq** (_**B**idirectional **E**xtended Definitional E**q**uivalence_), an automated neuro-symbolic method to determine the equivalence between two formal statements, which is formal-grounded and well-aligned with human intuition. For the second, we propose **RAutoformalizer** (_**R**etrieval-augmented **Autoformalizer**_), augmenting statement autoformalization by _Dependency Retrieval_, retrieving potentially dependent objects from formal libraries. We parse the dependencies of libraries and propose to _structurally informalise_ formal objects by the topological order of dependencies. To evaluate OOD generalization and research-level capabilities, we build a novel benchmark, _Con-NF_, consisting of 961 informal-formal statement pairs from frontier mathematical research. Experiments validate the effectiveness of our approaches: BEq is evaluated on 200 diverse formal statement pairs with expert-annotated equivalence labels, exhibiting significantly improved accuracy ($82.50\% \mapsto 90.50\%$) and precision ($70.59\% \mapsto 100.0\%$). For dependency retrieval, a strong baseline is devised. Our RAutoformalizer substantially outperforms SOTA baselines on both the in-distribution ProofNet benchmark ($12.83\% \mapsto 18.18\%$, BEq@8) and the OOD Con-NF scenario ($4.58\% \mapsto 16.86\%$, BEq@8).

AAAI Conference 2025 Conference Paper

Semi-IIN: Semi-Supervised Intra-Inter Modal Interaction Learning Network for Multimodal Sentiment Analysis

  • Jinhao Lin
  • Yifei Wang
  • Yanwu Xu
  • Qi Liu

Despite multimodal sentiment analysis being a fertile research ground that merits further investigation, current approaches incur high annotation costs and suffer from label ambiguity, hindering the acquisition of high-quality labeled data. Furthermore, choosing the right interactions is essential because the significance of intra- or inter-modal interactions can differ among various samples. To this end, we propose Semi-IIN, a Semi-supervised Intra-inter modal Interaction learning Network for multimodal sentiment analysis. Semi-IIN integrates masked attention and gating mechanisms, enabling effective dynamic selection after independently capturing intra- and inter-modal interactive information. Combined with the self-training approach, Semi-IIN fully utilizes the knowledge learned from unlabeled data. Experimental results on two public datasets, MOSI and MOSEI, demonstrate the effectiveness of Semi-IIN, establishing a new state-of-the-art on several metrics.

NeurIPS Conference 2025 Conference Paper

SGN: Shifted Window-Based Hierarchical Variable Grouping for Multivariate Time Series Classification

  • Zenan Ying
  • Zhi Zheng
  • huijun hou
  • Tong Xu
  • Qi Liu
  • Jinke Wang
  • Wei Chen

Multivariate time series (MTS) classification has attracted increasing attention across various domains. Existing methods either decompose MTS into separate univariate series, ignoring inter-variable dependencies, or jointly model all variables, which may lead to over-smoothing and loss of semantic structure. These limitations become particularly pronounced when dealing with complex and heterogeneous variable types. To address these challenges, we propose SwinGroupNet (SGN), which explores a novel perspective for constructing variable interaction and temporal dependency. Specifically, SGN processes multi-scale time series using (1) Variable Group Embedding (VGE), which partitions variables into groups and performs independent group-wise embedding; (2) Multi-Scale Group Window Mixing (MGWM), which reconstructs variable interactions by modeling both intra-group and inter-group dependencies while extracting multi-scale temporal features; and (3) Periodic Window Shifting and Merging (PWSM), which exploits inherent periodic patterns to enable hierarchical temporal interaction and feature aggregation. Extensive experiments on diverse benchmark datasets from multiple domains demonstrate that SGN consistently achieves state-of-the-art performance, with an average improvement of 4.2% over existing methods. We release the source code at https://anonymous.4open.science/r/SGN.

AAAI Conference 2025 Conference Paper

SUMO: Search-Based Uncertainty Estimation for Model-Based Offline Reinforcement Learning

  • Zhongjian Qiao
  • Jiafei Lyu
  • Kechen Jiao
  • Qi Liu
  • Xiu Li

The performance of offline reinforcement learning (RL) suffers from the limited size and quality of static datasets. Model-based offline RL addresses this issue by generating synthetic samples through a dynamics model to enhance overall performance. To evaluate the reliability of the generated samples, uncertainty estimation methods are often employed. However, model ensemble, the most commonly used uncertainty estimation method, is not always the best choice. In this paper, we propose a Search-based Uncertainty estimation method for Model-based Offline RL (SUMO) as an alternative. SUMO characterizes the uncertainty of synthetic samples by measuring their cross entropy against the in-distribution dataset samples, and uses an efficient search-based method for implementation. In this way, SUMO can achieve trustworthy uncertainty estimation. We integrate SUMO into several model-based offline RL algorithms including MOPO and Adapted MOReL (AMOReL), and provide theoretical analysis for them. Extensive experimental results on D4RL datasets demonstrate that SUMO can provide accurate uncertainty estimation and boost the performance of base algorithms. These indicate that SUMO could be a better uncertainty estimator for model-based offline RL when used in either reward penalty or trajectory truncation.

IJCAI Conference 2025 Conference Paper

TCDM: A Temporal Correlation-Empowered Diffusion Model for Time Series Forecasting

  • Huibo Xu
  • Likang Wu
  • Xianquan Wang
  • Zhiding Liu
  • Qi Liu

Although previous studies have applied diffusion models to time series forecasting, these efforts have struggled to preserve the intrinsic temporal correlations within the series, leading to suboptimal predictive outcomes. This failure primarily results from the introduction of independent, identically distributed (i.i.d.) noise. In the forward process, the addition of i.i.d. noise to the time series gradually diminishes these temporal correlations. The reverse process starts with i.i.d. noise and lacks priors related to temporal correlations, which can result in directional biases during sampling. From a frequency-domain perspective, noise disrupts the low-frequency-dominated structure of trend components, making it difficult for the model to learn long-term temporal dependencies. To address these limitations, we introduce a decomposition prediction framework to complement the novel Temporal Correlation-Empowered Diffusion Model. Overall, we decompose the time series into trend and residual components, predict them using a base model and a diffusion model, and then combine the results. Specifically, a frequency-domain MLP model is adopted as the base model because it does not distort the original sequence and better captures long-range temporal dependencies. The diffusion model incorporates two key modules to capture short- and mid-range temporal correlations: the Maintaining Temporal Correlation Module and the Redesigned Initial Module. Extensive experiments across multiple datasets demonstrate that the proposed method significantly outperforms related strong baselines.

AAAI Conference 2025 Conference Paper

Toy-GS: Assembling Local Gaussians for Precisely Rendering Large-Scale Free Camera Trajectories

  • Xiaohan Zhang
  • Zhenyu Sun
  • Yukui Qiu
  • Junyan Su
  • Qi Liu

Currently, 3D rendering for large-scale free camera trajectories, namely, arbitrary input camera trajectories, poses significant challenges: 1) The distribution and observation angles of the cameras are irregular, and various types of scenes are included in the free trajectories; 2) Processing the entire point cloud and all images at once for large-scale scenes requires a substantial amount of GPU memory. This paper presents a Toy-GS method for accurately rendering large-scale free camera trajectories. Specifically, we propose an adaptive spatial division approach for free trajectories to divide cameras and the sparse point cloud of the entire scene into various regions according to camera poses. Training each local Gaussian in parallel for each area enables us to concentrate on texture details and minimize GPU memory usage. Next, we use the multi-view constraint and position-aware point adaptive control (PPAC) to improve the rendering quality of texture details. In addition, our regional fusion approach combines local and global Gaussians to enhance rendering quality with an increasing number of divided areas. Extensive experiments have been carried out to confirm the effectiveness and efficiency of Toy-GS, leading to state-of-the-art results on two public large-scale datasets as well as our SCUTic dataset. Our proposal demonstrates an enhancement of 1.19 dB in PSNR and conserves 7 GB of GPU memory when compared to various benchmarks.

AAAI Conference 2025 Conference Paper

VERSE: Verification-based Self-Play for Code Instructions

  • Hao Jiang
  • Qi Liu
  • Rui Li
  • Yuze Zhao
  • Yixiao Ma
  • Shengyu Ye
  • Junyu Lu
  • Yu Su

Instruction-tuned Code Large Language Models (Code LLMs) have excelled in diverse code-related tasks, such as program synthesis, automatic program repair, and code explanation. To collect training datasets for instruction-tuning, a popular method involves having models autonomously generate instructions and corresponding responses. However, the direct generation of responses does not ensure functional correctness, a crucial requirement for generating responses to code instructions. To overcome this, we present Verification-Based Self-Play (VERSE), aiming to enhance model proficiency in generating correct responses. VERSE establishes a robust verification framework that covers various code instructions. Employing VERSE, Code LLMs engage in self-play to generate instructions and corresponding verifications. They evaluate execution results and self-consistency as verification outcomes, using them as scores to rank generated data for self-training. Experiments show that VERSE improves multiple base Code LLMs (average 7.6%) across various languages and tasks on many benchmarks, affirming its effectiveness.

IJCAI Conference 2025 Conference Paper

WDMIR: Wavelet-Driven Multimodal Intent Recognition

  • Weiyin Gong
  • Kai Zhang
  • Yanghai Zhang
  • Qi Liu
  • Xinjie Sun
  • Junyu Lu
  • Linbo Zhu

Multimodal intent recognition (MIR) seeks to accurately interpret user intentions by integrating verbal and non-verbal information across video, audio and text modalities. While existing approaches prioritize text analysis, they often overlook the rich semantic content embedded in non-verbal cues. This paper presents a novel Wavelet-Driven Multimodal Intent Recognition (WDMIR) framework that enhances intent understanding through frequency-domain analysis of non-verbal information. To be more specific, we propose: (1) a wavelet-driven fusion module that performs synchronized decomposition and integration of video-audio features in the frequency domain, enabling fine-grained analysis of temporal dynamics; (2) a cross-modal interaction mechanism that facilitates progressive feature enhancement from bimodal to trimodal integration, effectively bridging the semantic gap between verbal and non-verbal information. Extensive experiments on MIntRec demonstrate that our approach achieves state-of-the-art performance, surpassing previous methods by 1.13% on accuracy. Ablation studies further verify that the wavelet-driven fusion module significantly improves the extraction of semantic information from non-verbal sources, with a 0.41% increase in recognition accuracy when analyzing subtle emotional cues.

NeurIPS Conference 2025 Conference Paper

Why 1 + 1 < 1 in Visual Token Pruning: Beyond Naive Integration via Multi-Objective Balanced Covering

  • Yangfu Li
  • Hongjian Zhan
  • Tianyi Chen
  • Qi Liu
  • Yu-Jie Xiong
  • Yue Lu

Existing visual token pruning methods target prompt alignment and visual preservation with static strategies, overlooking the varying relative importance of these objectives across tasks, which leads to inconsistent performance. To address this, we derive the first closed-form error bound for visual token pruning based on the Hausdorff distance, uniformly characterizing the contributions of both objectives. Moreover, leveraging $\epsilon$-covering theory, we reveal an intrinsic trade-off between these objectives and quantify their optimal attainment levels under a fixed budget. To practically handle this trade-off, we propose Multi-Objective Balanced Covering (MoB), which reformulates visual token pruning as a bi-objective covering problem. In this framework, the attainment trade-off reduces to budget allocation via greedy radius trading. MoB offers a provable performance bound and linear scalability with respect to the number of input visual tokens, enabling adaptation to challenging pruning scenarios. Extensive experiments show that MoB preserves 96.4\% of performance for LLaVA-1.5-7B using only 11.1\% of the original visual tokens and accelerates LLaVA-Next-7B by 1.3-1.5$\times$ with negligible performance loss. Additionally, evaluations on Qwen2-VL and Video-LLaVA confirm that MoB integrates seamlessly into advanced MLLMs and diverse vision-language tasks. The code will be made available soon.

NeurIPS Conference 2024 Conference Paper

3D Focusing-and-Matching Network for Multi-Instance Point Cloud Registration

  • Liyuan Zhang
  • Le Hui
  • Qi Liu
  • Bo Li
  • Yuchao Dai

Multi-instance point cloud registration aims to estimate the pose of all instances of a model point cloud in the whole scene. Existing methods all adopt the strategy of first obtaining the global correspondence and then clustering to obtain the pose of each instance. However, due to the cluttered and occluded objects in the scene, it is difficult to obtain an accurate correspondence between the model point cloud and all instances in the scene. To this end, we propose a simple yet powerful 3D focusing-and-matching network for multi-instance point cloud registration by learning the multiple pair-wise point cloud registration. Specifically, we first present a 3D multi-object focusing module to locate the center of each object and generate object proposals. By using self-attention and cross-attention to associate the model point cloud with structurally similar objects, we can locate potential matching instances by regressing object centers. Then, we propose a 3D dual-masking instance matching module to estimate the pose between the model point cloud and each object proposal. It applies an instance mask and an overlap mask to accurately predict the pair-wise correspondence. Extensive experiments on two public benchmarks, Scan2CAD and ROBI, show that our method achieves a new state-of-the-art performance on the multi-instance point cloud registration task.

IJCAI Conference 2024 Conference Paper

A Teacher Classroom Dress Assessment Method Based on a New Assessment Dataset

  • Ming Fang
  • Qi Liu
  • Yunpeng Zhou
  • Xinning Du
  • Qiwen Liang
  • Shuhua Liu

Proper attire is a professional requirement for teachers, and teachers' dress influences students' perceptions of teacher quality. Therefore, evaluating teacher attire can better regulate and improve the teacher's dress. However, the lack of a dataset on teacher attire hinders the development of this field. For this purpose, this paper constructs a Teachers' Classroom Dress Assessment (TCDA) dataset. To our knowledge, it is the first dataset focused on teacher attire. This dataset is entirely from the classroom environment, covering 25 teacher attributes, with a total of 11879 teacher dress samples and sufficient positive and negative examples. Therefore, the TCDA dataset is a challenging evaluation dataset with characteristics such as data diversity. In order to verify the effectiveness of the dataset, this paper systematically explores a new perspective on human attribute information and proposes for the first time a Teachers' Dress Assessment Method (TDAM), aiming to use predicted teacher attributes to score the overall attire of each teacher, thereby promoting the development of the teacher's classroom teaching field. The experimental results demonstrate the rationality of the TCDA dataset and the effectiveness of the TDAM method. The dataset and code can be openly obtained at https://github.com/MingZier/TCDA-dataset.

AAAI Conference 2024 Conference Paper

AT4CTR: Auxiliary Match Tasks for Enhancing Click-Through Rate Prediction

  • Qi Liu
  • Xuyang Hou
  • Defu Lian
  • Zhe Wang
  • Haoran Jin
  • Jia Cheng
  • Jun Lei

Click-through rate (CTR) prediction is a vital task in industrial recommendation systems. Most existing methods focus on the network architecture design of the CTR model for better accuracy and suffer from the data sparsity problem. Especially in industrial recommendation systems, the widely applied negative sample down-sampling technique, adopted due to resource limitations, worsens the problem, resulting in a decline in performance. In this paper, we propose Auxiliary Match Tasks for enhancing Click-Through Rate (AT4CTR) prediction accuracy by alleviating the data sparsity problem. Specifically, we design two match tasks inspired by collaborative filtering to enhance the relevance modeling between user and item. As the "click" action is a strong signal that directly indicates the user's preference towards the item, we make the first match task aim at pulling closer the representations of the user and the item for positive samples. Since the user's past click behaviors can also be treated as a representation of the user, we apply next item prediction as the second match task. For both match tasks, we choose InfoNCE as the loss function. The two match tasks provide meaningful training signals that speed up the model's convergence and alleviate data sparsity. We conduct extensive experiments on one public dataset and one large-scale industrial recommendation dataset. The results demonstrate the effectiveness of the proposed auxiliary match tasks. AT4CTR has been deployed in a real industrial advertising system and has gained remarkable revenue.

NeurIPS Conference 2024 Conference Paper

Collaborative Cognitive Diagnosis with Disentangled Representation Learning for Learner Modeling

  • Weibo Gao
  • Qi Liu
  • Linan Yue
  • Fangzhou Yao
  • Hao Wang
  • Yin Gu
  • Zheng Zhang

Learners sharing similar implicit cognitive states often display comparable observable problem-solving performances. Leveraging collaborative connections among such similar learners proves valuable in comprehending human learning. Motivated by the success of collaborative modeling in various domains, such as recommender systems, we aim to investigate how collaborative signals among learners contribute to the diagnosis of human cognitive states (i.e., knowledge proficiency) in the context of intelligent education. The primary challenges lie in identifying implicit collaborative connections and disentangling the entangled cognitive factors of learners for improved explainability and controllability in learner Cognitive Diagnosis (CD). However, there has been no work on CD capable of simultaneously modeling collaborative and disentangled cognitive states. To address this gap, we present Coral, a $\underline{Co}$llabo$\underline{ra}$tive cognitive diagnosis model with disentang$\underline{l}$ed representation learning. Specifically, Coral first introduces a disentangled state encoder to achieve the initial disentanglement of learners' states. Subsequently, a meticulously designed collaborative representation learning procedure captures collaborative signals. It dynamically constructs a collaborative graph of learners by iteratively searching for optimal neighbors in a context-aware manner. Using the constructed graph, collaborative information is extracted through node representation learning. Finally, a decoding process aligns the initial cognitive states and collaborative states, achieving co-disentanglement with practice performance reconstructions. Extensive experiments demonstrate the superior performance of Coral, showcasing significant improvements over state-of-the-art methods across several real-world datasets. Our code is available at https://github.com/bigdata-ustc/Coral.

NeurIPS Conference 2024 Conference Paper

Computerized Adaptive Testing via Collaborative Ranking

  • Zirui Liu
  • Yan Zhuang
  • Qi Liu
  • Jiatong Li
  • Yuren Zhang
  • Zhenya Huang
  • Jinze Wu
  • Shijin Wang

With the deep integration of machine learning and intelligent education, Computerized Adaptive Testing (CAT) has received more and more research attention. Compared to traditional paper-and-pencil tests, CAT can deliver both personalized and interactive assessments by automatically adjusting testing questions according to the performance of students during the test process. Therefore, CAT has been recognized as an efficient testing methodology capable of accurately estimating a student's ability with a minimal number of questions, leading to its widespread adoption in mainstream selective exams such as the GMAT and GRE. However, just improving the accuracy of ability estimation is far from satisfactory in real-world scenarios, since an accurate ranking of students is usually more important (e.g., in high-stakes exams). Considering the shortage of existing CAT solutions in student ranking, this paper emphasizes the importance of aligning test outcomes (student ranks) with the true underlying abilities of students. Along this line, different from the conventional independent testing paradigm among students, we propose a novel collaborative framework, Collaborative Computerized Adaptive Testing (CCAT), that leverages inter-student information to enhance student ranking. By using collaborative students as anchors to assist in ranking test-takers, CCAT can give both theoretical guarantees and experimental validation for ensuring ranking consistency.

AAAI Conference 2024 Conference Paper

CONSIDER: Commonalities and Specialties Driven Multilingual Code Retrieval Framework

  • Rui Li
  • Liyang He
  • Qi Liu
  • Yuze Zhao
  • Zheng Zhang
  • Zhenya Huang
  • Yu Su
  • Shijin Wang

Multilingual code retrieval aims to find code snippets relevant to a user's query from a multilingual codebase, which plays a crucial role in software development and expands their application scenarios compared to classical monolingual code retrieval. Despite the performance improvements achieved by previous studies, two crucial problems are overlooked in the multilingual scenario. First, certain programming languages face data scarcity in specific domains, resulting in limited representation capabilities within those domains. Second, different programming languages can be used interchangeably within the same domain, making it challenging for multilingual models to accurately identify the intended programming language of a user's query. To address these issues, we propose the CommONalities and SpecIalties Driven Multilingual CodE Retrieval Framework (CONSIDER), which includes two modules. The first module enhances the representation of various programming languages by modeling pairwise and global commonalities among them. The second module introduces a novel contrastive learning negative sampling algorithm that leverages language confusion to automatically extract specific language features. Through our experiments, we confirm the significant benefits of our model in real-world multilingual code retrieval scenarios in various aspects. Furthermore, an evaluation demonstrates the effectiveness of our proposed CONSIDER framework in monolingual scenarios as well. Our source code is available at https://github.com/smsquirrel/consider.

NeurIPS Conference 2024 Conference Paper

Decompose, Analyze and Rethink: Solving Intricate Problems with Human-like Reasoning Cycle

  • Shangzi Xue
  • Zhenya Huang
  • Jiayu Liu
  • Xin Lin
  • Yuting Ning
  • Binbin Jin
  • Xin Li
  • Qi Liu

In this paper, we introduce DeAR (Decompose-Analyze-Rethink), a framework that iteratively builds a reasoning tree to tackle intricate problems within a single large language model (LLM). Unlike approaches that extend or search for rationales, DeAR is featured by 1) adopting a tree-based question decomposition manner to plan the organization of rationales, which mimics the logical planning inherent in human cognition; 2) globally updating the rationales at each reasoning step through natural language feedback. Specifically, the Decompose stage decomposes the question into simpler sub-questions, storing them as new nodes; the Analyze stage generates and self-checks rationales for sub-questions at each node level; and the Rethink stage updates parent-node rationales based on feedback from their child nodes. By generating and updating the reasoning process from a more global perspective, DeAR constructs more adaptive and accurate logical structures for complex problems, facilitating timely error correction compared to rationale-extension and search-based approaches such as Tree-of-Thoughts (ToT) and Graph-of-Thoughts (GoT). We conduct extensive experiments on three reasoning benchmarks, including ScienceQA, StrategyQA, and GSM8K, which cover a variety of reasoning tasks, demonstrating that our approach significantly reduces logical errors and enhances performance across various LLMs. Furthermore, we validate that DeAR is an efficient method that achieves a superior trade-off between accuracy and reasoning time compared to ToT and GoT.

NeurIPS Conference 2024 Conference Paper

DeltaDock: A Unified Framework for Accurate, Efficient, and Physically Reliable Molecular Docking

  • Jiaxian Yan
  • Zaixi Zhang
  • Jintao Zhu
  • Kai Zhang
  • Jianfeng Pei
  • Qi Liu

Molecular docking, a technique for predicting ligand binding poses, is crucial in structure-based drug design for understanding protein-ligand interactions. Recent advancements in docking methods, particularly those leveraging geometric deep learning (GDL), have demonstrated significant efficiency and accuracy advantages over traditional sampling methods. Despite these advancements, current methods are often tailored for specific docking settings, and limitations such as the neglect of protein side-chain structures, difficulties in handling large binding pockets, and challenges in predicting physically valid structures exist. To accommodate various docking settings and achieve accurate, efficient, and physically reliable docking, we propose a novel two-stage docking framework, DeltaDock, consisting of pocket prediction and site-specific docking. We innovatively reframe the pocket prediction task as a pocket-ligand alignment problem rather than direct prediction in the first stage. Then we follow a bi-level coarse-to-fine iterative refinement process to perform site-specific docking. Comprehensive experiments demonstrate the superior performance of DeltaDock. Notably, in the blind docking setting, DeltaDock achieves a 31\% relative improvement in docking success rate compared with the previous state-of-the-art GDL model DiffDock. With the consideration of physical validity, this improvement increases to about 300\%.

NeurIPS Conference 2024 Conference Paper

FlexSBDD: Structure-Based Drug Design with Flexible Protein Modeling

  • Zaixi Zhang
  • Mengdi Wang
  • Qi Liu

Structure-based drug design (SBDD), which aims to generate 3D ligand molecules binding to target proteins, is a fundamental task in drug discovery. Existing SBDD methods typically treat proteins as rigid and neglect protein structural change when binding with ligand molecules, leading to a significant gap with real-world scenarios and inferior generation qualities (e.g., many steric clashes). To bridge the gap, we propose FlexSBDD, a deep generative model capable of accurately modeling the flexible protein-ligand complex structure for ligand molecule generation. FlexSBDD adopts an efficient flow matching framework and leverages an E(3)-equivariant network with scalar-vector dual representation to model dynamic structural changes. Moreover, novel data augmentation schemes based on structure relaxation/sidechain repacking are adopted to boost performance. Extensive experiments demonstrate that FlexSBDD achieves state-of-the-art performance in generating high-affinity molecules and effectively modeling the protein's conformation change to increase favorable protein-ligand interactions (e.g., hydrogen bonds) and decrease steric clashes.

NeurIPS Conference 2024 Conference Paper

Generalized Protein Pocket Generation with Prior-Informed Flow Matching

  • Zaixi Zhang
  • Marinka Zitnik
  • Qi Liu

Designing ligand-binding proteins, such as enzymes and biosensors, is essential in bioengineering and protein biology. One critical step in this process involves designing protein pockets, the protein interface binding with the ligand. Current approaches to pocket generation often suffer from time-intensive physical computations or template-based methods, as well as compromised generation quality due to the overlooking of domain knowledge. To tackle these challenges, we propose PocketFlow, a generative model that incorporates protein-ligand interaction priors based on flow matching. During training, PocketFlow learns to model key types of protein-ligand interactions, such as hydrogen bonds. In the sampling, PocketFlow leverages multi-granularity guidance (overall binding affinity and interaction geometry constraints) to facilitate generating high-affinity and valid pockets. Extensive experiments show that PocketFlow outperforms baselines on multiple benchmarks, e.g., achieving an average improvement of 1.29 in Vina Score and 0.05 in scRMSD. Moreover, modeling interactions makes PocketFlow a generalized generative model across multiple ligand modalities, including small molecules, peptides, and RNA.

IJCAI Conference 2024 Conference Paper

Making LLMs as Fine-Grained Relation Extraction Data Augmentor

  • Yifan Zheng
  • Wenjun Ke
  • Qi Liu
  • Yuting Yang
  • Ruizhuo Zhao
  • Dacheng Feng
  • Jianwei Zhang
  • Zhi Fang

Relation Extraction (RE) identifies relations between entities in text, typically relying on supervised models that demand abundant high-quality data. Various approaches, including Data Augmentation (DA), have been proposed as promising solutions for addressing low-resource challenges in RE. However, existing DA methods in RE often struggle to ensure consistency and contextual diversity in generated data due to the fine-grained nature of RE. Inspired by the extensive generative capabilities of large language models (LLMs), we introduce a novel framework named ConsistRE, aiming to maintain context consistency in RE. ConsistRE initiates by collecting a substantial corpus from external resources and employing statistical algorithms and semantics to identify keyword hints closely related to relation instances. These keyword hints are subsequently integrated as contextual constraints in sentence generation, ensuring the preservation of relation dependence and diversity with LLMs. Additionally, we implement syntactic dependency selection to enhance the syntactic structure of the generated sentences. Experimental results from the evaluation of SemEval, TACRED, and TACREV datasets unequivocally demonstrate that ConsistRE outperforms other baselines in F1 values by 1.76%, 3.92%, and 2.53%, respectively, particularly when operating under low-resource experimental conditions.

TIST Journal 2024 Journal Article

Model-Agnostic Adaptive Testing for Intelligent Education Systems via Meta-learned Gradient Embeddings

  • Haoyang Bi
  • Qi Liu
  • Han Wu
  • Weidong He
  • Zhenya Huang
  • Yu Yin
  • Haiping Ma
  • Yu Su

The field of education has undergone a significant revolution with the advent of intelligent systems and technology, which aim to personalize the learning experience, catering to the unique needs and abilities of individual learners. In this pursuit, a fundamental challenge is designing proper tests for assessing students' cognitive status on knowledge and skills accurately and efficiently. One promising approach, referred to as Computerized Adaptive Testing (CAT), is to administer computer-automated tests that alternately select the next item for each examinee and estimate their cognitive states given their responses to the selected items. Nevertheless, existing CAT systems suffer from inflexibility in item selection and ineffectiveness in cognitive state estimation. In this article, we propose a Model-Agnostic adaptive testing framework via Meta-learned Gradient Embeddings, MAMGE for short, improving both item selection and cognitive state estimation simultaneously. For item selection, we design a Gradient Embedding-based Item Selector (GEIS) which incorporates the concept of gradient embeddings to represent items and selects the best ones that are both informative and representative. For cognitive state estimation, we propose a Meta-learned Cognitive State Estimator (MCSE) to automatically control the estimation process by learning to learn a proper initialization and dynamically inferred updates. Both MCSE and GEIS are inherently model-agnostic, and the two modules have an ingenious connection via meta-learned gradient embeddings. Finally, extensive experiments evaluate the effectiveness and flexibility of MAMGE.

NeurIPS Conference 2024 Conference Paper

PertEval: Unveiling Real Knowledge Capacity of LLMs with Knowledge-Invariant Perturbations

  • Jiatong Li
  • Renjun Hu
  • Kunzhe Huang
  • Yan Zhuang
  • Qi Liu
  • Mengxiao Zhu
  • Xing Shi
  • Wei Lin

Expert-designed close-ended benchmarks are indispensable in assessing the knowledge capacity of large language models (LLMs). Despite their widespread use, concerns have mounted regarding their reliability due to limited test scenarios and an unavoidable risk of data contamination. To rectify this, we present PertEval, a toolkit devised for in-depth probing of LLMs' knowledge capacity through knowledge-invariant perturbations. These perturbations employ human-like restatement techniques to generate on-the-fly test samples from static benchmarks, meticulously retaining knowledge-critical content while altering irrelevant details. Our toolkit further includes a suite of response consistency analyses that compare performance on raw vs. perturbed test sets to precisely assess LLMs' genuine knowledge capacity. Six representative LLMs are re-evaluated using PertEval. Results reveal significantly inflated performance of the LLMs on raw benchmarks, including an absolute 25.8% overestimation for GPT-4. Additionally, through a nuanced response pattern analysis, we discover that PertEval retains LLMs' uncertainty to specious knowledge, and reveals their potential rote memorization of correct options, which leads to overestimated performance. We also find that the detailed response consistency analyses by PertEval could illuminate various weaknesses in existing LLMs' knowledge mastery and guide their refinement. Our findings provide insights for advancing more robust and genuinely knowledgeable LLMs. Our code is available at https://github.com/aigc-apps/PertEval.

JMLR Journal 2024 Journal Article

Pygmtools: A Python Graph Matching Toolkit

  • Runzhong Wang
  • Ziao Guo
  • Wenzheng Pan
  • Jiale Ma
  • Yikai Zhang
  • Nan Yang
  • Qi Liu
  • Longxuan Wei

Graph matching aims to find node-to-node matching among multiple graphs, which is a fundamental yet challenging problem. To facilitate graph matching in scientific research and industrial applications, pygmtools is released, which is a Python graph matching toolkit that implements a comprehensive collection of two-graph matching and multi-graph matching solvers, covering both learning-free solvers and learning-based neural graph matching solvers. Our implementation supports numerical backends including NumPy, PyTorch, Jittor, and Paddle, runs on Windows, macOS and Linux, and is friendly to install and configure. Comprehensive documentation covering a beginner's guide, API reference and examples is available online. pygmtools is open-sourced under the Mulan PSL v2 license.

AAAI Conference 2024 Conference Paper

R3CD: Scene Graph to Image Generation with Relation-Aware Compositional Contrastive Control Diffusion

  • Jinxiu Liu
  • Qi Liu

Image generation tasks have achieved remarkable performance using large-scale diffusion models. However, these models are limited in capturing the abstract relations (viz., interactions excluding positional relations) among multiple entities of complex scene graphs. Two main problems exist: 1) failure to depict more concise and accurate interactions via abstract relations; 2) failure to generate complete entities. To address these, we propose a novel Relation-aware Compositional Contrastive Control Diffusion method, dubbed R3CD, that leverages large-scale diffusion models to learn abstract interactions from scene graphs. Herein, a scene graph transformer based on node and edge encoding is first designed to perceive both local and global information from input scene graphs, whose embeddings are initialized by a T5 model. Then a joint contrastive loss based on attention maps and denoising steps is developed to control the diffusion model to understand and further generate images, whose spatial structures and interaction features are consistent with the prior relations. Extensive experiments are conducted on two datasets, Visual Genome and COCO-Stuff, and demonstrate that the proposal outperforms existing models in both quantitative and qualitative metrics, generating more realistic and diverse images according to different scene graph specifications.

NeurIPS Conference 2024 Conference Paper

SocraticLM: Exploring Socratic Personalized Teaching with Large Language Models

  • Jiayu Liu
  • Zhenya Huang
  • Tong Xiao
  • Jing Sha
  • Jinze Wu
  • Qi Liu
  • Shijin Wang
  • Enhong Chen

Large language models (LLMs) are considered a crucial technology for advancing intelligent education since they exhibit the potential for an in-depth understanding of teaching scenarios and providing students with personalized guidance. Nonetheless, current LLM-based applications in personalized teaching predominantly follow a "Question-Answering" paradigm, where students are passively provided with answers and explanations. In this paper, we propose SocraticLM, which achieves a Socratic "Thought-Provoking" teaching paradigm that fulfills the role of a real classroom teacher in actively engaging students in the thought process required for genuine problem-solving mastery. To build SocraticLM, we first propose a novel "Dean-Teacher-Student" multi-agent pipeline to construct a new dataset, SocraTeach, which contains $35$K meticulously crafted Socratic-style multi-round (equivalent to $208$K single-round) teaching dialogues grounded in fundamental mathematical problems. Our dataset simulates authentic teaching scenarios, interacting with six representative types of simulated students with different cognitive states, and strengthening four crucial teaching abilities. SocraticLM is then fine-tuned on SocraTeach with three strategies balancing its teaching and reasoning abilities. Moreover, we contribute a comprehensive evaluation system encompassing five pedagogical dimensions for assessing the teaching quality of LLMs. Extensive experiments verify that SocraticLM achieves significant improvements in teaching performance, outperforming GPT-4 by more than 12\%. Our dataset and code are available at https://github.com/Ljyustc/SocraticLM.

NeurIPS Conference 2024 Conference Paper

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy

  • Weichao Zhao
  • Hao Feng
  • Qi Liu
  • Jingqun Tang
  • Shu Wei
  • Binghong Wu
  • Lei Liao
  • Yongjie Ye

Tables contain factual and quantitative data accompanied by various structures and contents that pose challenges for machine comprehension. Previous methods generally design task-specific architectures and objectives for individual tasks, resulting in modal isolation and intricate workflows. In this paper, we present a novel large vision-language model, TabPedia, equipped with a concept synergy mechanism. In this mechanism, all the involved diverse visual table understanding (VTU) tasks and multi-source visual embeddings are abstracted as concepts. This unified framework allows TabPedia to seamlessly integrate VTU tasks, such as table detection, table structure recognition, table querying, and table question answering, by leveraging the capabilities of large language models (LLMs). Moreover, the concept synergy mechanism enables table perception-related and comprehension-related tasks to work in harmony, as they can effectively leverage the needed clues from the corresponding source perception embeddings. Furthermore, to better evaluate the VTU task in real-world scenarios, we establish a new and comprehensive table VQA benchmark, ComTQA, featuring approximately 9,000 QA pairs. Extensive quantitative and qualitative experiments on both table perception and comprehension tasks, conducted across various public benchmarks, validate the effectiveness of our TabPedia. The superior performance further confirms the feasibility of using LLMs for understanding visual tables when all concepts work in synergy. The benchmark ComTQA has been open-sourced at https://huggingface.co/datasets/ByteDance/ComTQA. The source code and model have also been released at https://github.com/zhaowc-ustc/TabPedia.

NeurIPS Conference 2024 Conference Paper

Towards Accurate and Fair Cognitive Diagnosis via Monotonic Data Augmentation

  • Zheng Zhang
  • Wei Song
  • Qi Liu
  • Qingyang Mao
  • Yiyan Wang
  • Weibo Gao
  • Zhenya Huang
  • Shijin Wang

Intelligent education stands as a prominent application of machine learning. Within this domain, cognitive diagnosis (CD) is a key research focus that aims to diagnose students' proficiency levels in specific knowledge concepts. As a crucial task within the field of education, cognitive diagnosis encompasses two fundamental requirements: accuracy and fairness. Existing studies have achieved significant success by primarily utilizing observed historical logs of student-exercise interactions. However, real-world scenarios often present a challenge, where a substantial number of students engage with a limited number of exercises. This data sparsity issue can lead to both inaccurate and unfair diagnoses. To this end, we introduce a monotonic data augmentation framework, CMCD, to tackle the data sparsity issue and thereby achieve accurate and fair CD results. Specifically, CMCD integrates the monotonicity assumption, a fundamental educational principle in CD, to establish two constraints for data augmentation. These constraints are general and can be applied to the majority of CD backbones. Furthermore, we provide theoretical analysis to guarantee the accuracy and convergence speed of CMCD. Finally, extensive experiments on real-world datasets showcase the efficacy of our framework in addressing the data sparsity issue with accurate and fair CD results.

AAAI Conference 2024 Conference Paper

ViTree: Single-Path Neural Tree for Step-Wise Interpretable Fine-Grained Visual Categorization

  • Danning Lao
  • Qi Liu
  • Jiazi Bu
  • Junchi Yan
  • Wei Shen

As computer vision continues to advance and finds widespread applications across various domains, the need for interpretability in deep learning models becomes paramount. Existing methods often resort to post-hoc techniques or prototypes to explain the decision-making process, which can be indirect and lack intrinsic illustration. In this research, we introduce ViTree, a novel approach for fine-grained visual categorization that combines the popular vision transformer as a feature extraction backbone with neural decision trees. By traversing the tree paths, ViTree effectively selects patches from transformer-processed features to highlight informative local regions, thereby refining representations in a step-wise manner. Unlike previous tree-based models that rely on soft distributions or ensembles of paths, ViTree selects a single tree path, offering a clearer and simpler decision-making process. This patch and path selectivity enhances model interpretability of ViTree, enabling better insights into the model's inner workings. Remarkably, extensive experimentation validates that this streamlined approach surpasses various strong competitors and achieves state-of-the-art performance while maintaining exceptional interpretability which is proved by multi-perspective methods. Code can be found at https://github.com/SJTU-DeepVisionLab/ViTree.

AAAI Conference 2024 Conference Paper

Zero-1-to-3: Domain-Level Zero-Shot Cognitive Diagnosis via One Batch of Early-Bird Students towards Three Diagnostic Objectives

  • Weibo Gao
  • Qi Liu
  • Hao Wang
  • Linan Yue
  • Haoyang Bi
  • Yin Gu
  • Fangzhou Yao
  • Zheng Zhang

Cognitive diagnosis seeks to estimate the cognitive states of students by exploring their logged practice quiz data. It plays a pivotal role in personalized learning guidance within intelligent education systems. In this paper, we focus on an important, practical, yet often underexplored task: domain-level zero-shot cognitive diagnosis (DZCD), which arises due to the absence of student practice logs in newly launched domains. Recent cross-domain diagnostic models have been demonstrated to be a promising strategy for DZCD. These methods primarily focus on how to transfer student states across domains. However, they might inadvertently incorporate non-transferable information into student representations, thereby limiting the efficacy of knowledge transfer. To tackle this, we propose Zero-1-to-3, a domain-level zero-shot cognitive diagnosis framework via one batch of early-bird students towards three diagnostic objectives. Our approach initiates with pre-training a diagnosis model with dual regularizers, which decouples student states into domain-shared and domain-specific parts. The shared cognitive signals can be transferred to the target domain, enriching the cognitive priors for the new domain, which ensures the cognitive state propagation objective. Subsequently, we devise a strategy to generate simulated practice logs for cold-start students by analyzing the behavioral patterns of early-bird students, fulfilling the domain-adaptation goal. Consequently, we refine the cognitive states of cold-start students as diagnostic outcomes via virtual data, aligning with the diagnosis-oriented goal. Finally, extensive experiments on six real-world datasets highlight the efficacy of our model for DZCD and its practical application in question recommendation. The code is publicly available at https://github.com/bigdata-ustc/Zero-1-to-3.

AAAI Conference 2024 Conference Paper

π-Light: Programmatic Interpretable Reinforcement Learning for Resource-Limited Traffic Signal Control

  • Yin Gu
  • Kai Zhang
  • Qi Liu
  • Weibo Gao
  • Longfei Li
  • Jun Zhou

The recent advancements in Deep Reinforcement Learning (DRL) have significantly enhanced the performance of adaptive Traffic Signal Control (TSC). However, DRL policies are typically represented by neural networks, which are over-parameterized black-box models. As a result, the learned policies often lack interpretability and cannot be deployed directly on real-world edge hardware due to resource constraints. In addition, DRL methods often exhibit limited generalization performance, struggling to generalize the learned policy to other geographical regions. These factors limit the practical application of learning-based approaches. To address these issues, we suggest the use of an inherently interpretable program for representing the control policy. We present a new approach, Programmatic Interpretable reinforcement learning for traffic signal control (π-Light), designed to autonomously discover non-differentiable programs. Specifically, we define a Domain Specific Language (DSL) and transformation rules for constructing programs, and utilize Monte Carlo Tree Search (MCTS) to find the optimal program in a discrete space. Extensive experiments demonstrate that our method consistently outperforms baseline approaches. Moreover, π-Light exhibits superior generalization capabilities compared to DRL, enabling training and evaluation across intersections from different cities. Finally, we analyze how the learned program policies can be directly deployed on edge devices with extremely limited resources.

NeurIPS Conference 2023 Conference Paper

A Bounded Ability Estimation for Computerized Adaptive Testing

  • Yan Zhuang
  • Qi Liu
  • Guanhao Zhao
  • Zhenya Huang
  • Weizhe Huang
  • Zachary Pardos
  • Enhong Chen
  • Jinze Wu

Computerized adaptive testing (CAT), as a tool that can efficiently measure a student's ability, has been widely used in various standardized tests (e.g., GMAT and GRE). The adaptivity of CAT refers to the selection of the most informative questions for each student, reducing test length. Existing CAT methods do not explicitly target ability estimation accuracy since there is no ground truth for a student's true ability; therefore, these methods cannot be guaranteed to make the estimate converge to the true ability given such limited responses. In this paper, we analyze the statistical properties of estimation and find a theoretical approximation of the true ability: the ability estimated from full responses to the question bank. Based on this, a Bounded Ability Estimation framework for CAT (BECAT) is proposed in a data-summary manner, which selects a question subset that closely matches the gradient of the full responses. Thus, we develop an expected gradient difference approximation to design a simple greedy selection algorithm, and show rigorous theoretical and error upper-bound guarantees for its ability estimate. Experiments on both real-world and synthetic datasets show that it can reach the same estimation accuracy using 15% fewer questions on average, significantly reducing test length.
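The "select a subset whose gradient matches the full responses" step can be illustrated with a toy greedy sketch (all names hypothetical; BECAT's actual algorithm works with an expected gradient difference approximation and carries error-bound guarantees, whereas this sketch just matches mean per-question gradient vectors in squared Euclidean distance):

```python
from statistics import fmean

def greedy_gradient_match(grads, k):
    """Greedily pick k questions whose mean gradient vector best
    approximates the mean gradient over the full question bank.

    grads: list of per-question gradient vectors (lists of floats).
    Returns the indices of the chosen questions, in selection order.
    """
    full_mean = [fmean(col) for col in zip(*grads)]
    chosen, remaining = [], list(range(len(grads)))
    for _ in range(k):
        def error_if_added(idx):
            # Squared distance between the candidate subset's mean
            # gradient and the full-bank mean gradient.
            pick = chosen + [idx]
            mean = [fmean(col) for col in zip(*(grads[j] for j in pick))]
            return sum((a - b) ** 2 for a, b in zip(mean, full_mean))
        best = min(remaining, key=error_if_added)
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

With one-dimensional "gradients" `[[1.0], [3.0], [2.0]]` the full-bank mean is 2.0, so a single-question budget picks index 2, whose gradient matches the mean exactly.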

IJCAI Conference 2023 Conference Paper

Actor-Multi-Scale Context Bidirectional Higher Order Interactive Relation Network for Spatial-Temporal Action Localization

  • Jun Yu
  • Yingshuai Zheng
  • Shulan Ruan
  • Qi Liu
  • Zhiyuan Cheng
  • Jinze Wu

The key to video action detection lies in understanding the interaction between persons and background objects in a video. Current methods usually employ object detectors to extract objects directly or use grid features to represent objects in the environment, which underestimates the great potential of multi-scale context information (e.g., objects and scenes of different sizes). How to exactly represent the multi-scale context and make full use of it still remains an unresolved challenge for spatial-temporal action localization. In this paper, we propose a novel Actor-Multi-Scale Context Bidirectional Higher Order Interactive Relation Network (AMCRNet) that extracts multi-scale context through multiple pooling layers of different sizes. Specifically, we develop an Interactive Relation Extraction module to model the higher-order relation between the target person and the context (e.g., other persons and objects). Along this line, we further propose a History Feature Bank and Interaction method to achieve better performance by modeling such relations across consecutive video clips. Extensive experimental results on AVA2.2 and UCF101-24 demonstrate the superiority and rationality of our proposed AMCRNet.

NeurIPS Conference 2023 Conference Paper

Adaptive Normalization for Non-stationary Time Series Forecasting: A Temporal Slice Perspective

  • Zhiding Liu
  • Mingyue Cheng
  • Zhi Li
  • Zhenya Huang
  • Qi Liu
  • Yanhu Xie
  • Enhong Chen

Deep learning models have progressively advanced time series forecasting due to their powerful capacity for capturing sequence dependence. Nevertheless, it is still challenging to make accurate predictions due to the non-stationarity of real-world data, whereby the data distribution changes rapidly over time. To mitigate this dilemma, several efforts have been made to reduce non-stationarity via normalization operations. However, these methods typically overlook the distribution discrepancy between the input series and the horizon series, and assume that all time points within the same instance share the same statistical properties, which is too idealistic and may lead to suboptimal relative improvements. To this end, we propose Slice-level Adaptive Normalization (SAN), a novel scheme for empowering time series forecasting with more flexible normalization and denormalization. SAN includes two crucial designs. First, SAN eliminates the non-stationarity of time series in units of a local temporal slice (i.e., sub-series) rather than a global instance. Second, SAN employs a lightweight network module to independently model the evolving trends of the statistical properties of the raw time series. Consequently, SAN can serve as a general model-agnostic plugin and better alleviate the impact of the non-stationary nature of time series data. We instantiate the proposed SAN on four widely used forecasting models and test their prediction results on benchmark datasets to evaluate its effectiveness. We also report some insightful findings to deeply analyze and understand our proposed SAN. We make our code publicly available.
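The slice-level normalization idea described above can be sketched minimally (hypothetical names; SAN itself additionally uses a small network to predict each future slice's statistics, which this sketch omits):

```python
from statistics import fmean, pstdev

def slice_normalize(series, slice_len):
    """Normalize each local temporal slice (sub-series) independently,
    instead of using one global mean/std for the whole instance."""
    slices, stats = [], []
    for i in range(0, len(series), slice_len):
        chunk = series[i:i + slice_len]
        mu = fmean(chunk)
        sigma = pstdev(chunk) or 1.0  # guard against constant slices
        stats.append((mu, sigma))
        slices.append([(v - mu) / sigma for v in chunk])
    return slices, stats

def slice_denormalize(slices, stats):
    """Invert the per-slice normalization; SAN would apply this to the
    model's predictions using predicted future statistics."""
    out = []
    for chunk, (mu, sigma) in zip(slices, stats):
        out.extend(v * sigma + mu for v in chunk)
    return out
```

After normalization every slice has zero mean and unit variance regardless of how the distribution drifts across slices, and denormalization restores the original scale.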

NeurIPS Conference 2023 Conference Paper

AdaptSSR: Pre-training User Model with Augmentation-Adaptive Self-Supervised Ranking

  • Yang Yu
  • Qi Liu
  • Kai Zhang
  • Yuren Zhang
  • Chao Song
  • Min Hou
  • Yuqing Yuan
  • Zhihao Ye

User modeling, which aims to capture users' characteristics or interests, heavily relies on task-specific labeled data and suffers from the data sparsity issue. Several recent studies tackled this problem by pre-training the user model on massive user behavior sequences with a contrastive learning task. Generally, these methods assume different views of the same behavior sequence constructed via data augmentation are semantically consistent, i.e., reflecting similar characteristics or interests of the user, and thus maximize their agreement in the feature space. However, due to the diverse interests and heavy noise in user behaviors, existing augmentation methods tend to lose certain characteristics of the user or introduce noisy behaviors. Thus, forcing the user model to directly maximize the similarity between the augmented views may result in a negative transfer. To this end, we propose to replace the contrastive learning task with a new pretext task: Augmentation-Adaptive Self-Supervised Ranking (AdaptSSR), which alleviates the requirement of semantic consistency between the augmented views while pre-training a discriminative user model. Specifically, we adopt a multiple pairwise ranking loss which trains the user model to capture the similarity orders between the implicitly augmented view, the explicitly augmented view, and views from other users. We further employ an in-batch hard negative sampling strategy to facilitate model training. Moreover, considering the distinct impacts of data augmentation on different behavior sequences, we design an augmentation-adaptive fusion mechanism to automatically adjust the similarity order constraint applied to each sample based on the estimated similarity between the augmented views. Extensive experiments on both public and industrial datasets with six downstream tasks verify the effectiveness of AdaptSSR.

IJCAI Conference 2023 Conference Paper

Beyond Homophily: Robust Graph Anomaly Detection via Neural Sparsification

  • Zheng Gong
  • Guifeng Wang
  • Ying Sun
  • Qi Liu
  • Yuting Ning
  • Hui Xiong
  • Jingyu Peng

Recently, graph-based anomaly detection (GAD) has attracted rising attention due to its effectiveness in identifying anomalies in relational and structured data. Unfortunately, the performance of most existing GAD methods suffers from the inherent structural noises of graphs induced by hidden anomalies connected with considerable benign nodes. In this work, we propose SparseGAD, a novel GAD framework that sparsifies the structures of target graphs to effectively reduce noises and collaboratively learns node representations. It then robustly detects anomalies by uncovering the underlying dependency among node pairs in terms of homophily and heterophily, two essential connection properties of GAD. Extensive experiments on real-world datasets of GAD demonstrate that the proposed framework achieves significantly better detection quality compared with the state-of-the-art methods, even when the graph is heavily attacked. Code will be available at https://github.com/KellyGong/SparseGAD.git.

NeurIPS Conference 2023 Conference Paper

Evaluating Self-Supervised Learning for Molecular Graph Embeddings

  • Hanchen Wang
  • Jean Kaddour
  • Shengchao Liu
  • Jian Tang
  • Joan Lasenby
  • Qi Liu

Graph Self-Supervised Learning (GSSL) provides a robust pathway for acquiring embeddings without expert labelling, a capability that carries profound implications for molecular graphs due to the staggering number of potential molecules and the high cost of obtaining labels. However, GSSL methods are designed not for optimisation within a specific domain but rather for transferability across a variety of downstream tasks. This broad applicability complicates their evaluation. Addressing this challenge, we present "Molecular Graph Representation Evaluation" (MOLGRAPHEVAL), generating detailed profiles of molecular graph embeddings with interpretable and diversified attributes. MOLGRAPHEVAL offers a suite of probing tasks grouped into three categories: (i) generic graph, (ii) molecular substructure, and (iii) embedding space properties. By leveraging MOLGRAPHEVAL to benchmark existing GSSL methods against both current downstream datasets and our suite of tasks, we uncover significant inconsistencies between inferences drawn solely from existing datasets and those derived from more nuanced probing. These findings suggest that current evaluation methodologies fail to capture the entirety of the landscape.

IJCAI Conference 2023 Conference Paper

Exploiting Non-Interactive Exercises in Cognitive Diagnosis

  • Fangzhou Yao
  • Qi Liu
  • Min Hou
  • Shiwei Tong
  • Zhenya Huang
  • Enhong Chen
  • Jing Sha
  • Shijin Wang

Cognitive Diagnosis aims to quantify the proficiency level of students on specific knowledge concepts. Existing studies merely leverage observed historical student-exercise interaction logs to assess proficiency levels. Despite their effectiveness, observed interactions usually exhibit a power-law distribution, where the long tail consisting of students with few records lacks supervision signals. This phenomenon leads to inferior diagnosis for students with few records. In this paper, we propose the Exercise-aware Informative Response Sampling (EIRS) framework to address the long-tail problem. EIRS is a general framework that explores the partial order between observed and unobserved responses as auxiliary ranking-based training signals to supplement cognitive diagnosis. Considering the abundance and complexity of unobserved responses, we first design an Exercise-aware Candidates Selection module, which helps our framework produce reliable potential responses for effective supplementary training. Then, we develop an Expected Ability Change-weighted Informative Sampling strategy to adaptively sample informative potential responses that contribute greatly to model training. Experiments on real-world datasets demonstrate the superiority of our framework on long-tailed data.

NeurIPS Conference 2023 Conference Paper

FairLISA: Fair User Modeling with Limited Sensitive Attributes Information

  • Zheng Zhang
  • Qi Liu
  • Hao Jiang
  • Fei Wang
  • Yan Zhuang
  • Le Wu
  • Weibo Gao
  • Enhong Chen

User modeling techniques profile users' latent characteristics (e.g., preference) from their observed behaviors, and play a crucial role in decision-making. Unfortunately, traditional user models may unconsciously capture biases related to sensitive attributes (e.g., gender) from behavior data, even when this sensitive information is not explicitly provided. This can lead to unfairness and discrimination against certain groups based on these sensitive attributes. Recent studies have proposed to improve fairness by explicitly decorrelating user modeling results and sensitive attributes. However, most existing approaches assume that full sensitive attribute labels are available in the training set, which is unrealistic due to collection limitations like privacy concerns, and hence suffer from limited performance. In this paper, we focus on a practical situation with limited sensitive data and propose a novel FairLISA framework, which can efficiently utilize data with known and unknown sensitive attributes to facilitate fair model training. We first propose a novel theoretical perspective to build the relationship between data with both known and unknown sensitive attributes and the fairness objective. Then, based on this, we provide a general adversarial framework to effectively leverage the whole user data for fair user modeling. We conduct experiments on representative user modeling tasks including recommender systems and cognitive diagnosis. The results demonstrate that our FairLISA can effectively improve fairness while retaining high accuracy in scenarios with different ratios of missing sensitive attributes.

NeurIPS Conference 2023 Conference Paper

Full-Atom Protein Pocket Design via Iterative Refinement

  • Zaixi Zhang
  • Zepu Lu
  • Hao Zhongkai
  • Marinka Zitnik
  • Qi Liu

The design of de novo functional proteins that bind with specific ligand molecules is crucial in various domains like therapeutics and bio-engineering. One vital yet challenging step is to design the protein pocket, the cavity region of the protein where the ligand binds. Existing methods suffer from inefficient generation, insufficient context modeling (of the ligand molecule), and an inability to generate sidechain atoms. To overcome these limitations, we propose a Full-Atom Iterative Refinement framework (FAIR) for protein pocket sequence (i.e., residue types) and 3D structure co-design. Generally, FAIR consists of two steps that follow a coarse-to-fine pipeline (backbone atoms to full atoms including sidechains) for full-atom generation. For efficiency, all residue types and structures are updated together in each round (i.e., full-shot refinement). In the first step, the residue types and backbone coordinates are updated with a hierarchical context encoder and two structure refinement modules capturing inter-residue and pocket-ligand interactions. The second step further models the sidechain atoms of pockets and updates residue types to achieve sequence-structure consistency. The structure of the binding ligand is also updated along with the above refinement iterations, accounting for its flexibility. Finally, extensive evaluations show that FAIR outperforms baselines in efficiently designing high-quality pocket sequences and structures. Specifically, the average improvements on AAR and RMSD are over 10%.

NeurIPS Conference 2023 Conference Paper

GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning

  • Haiteng Zhao
  • Shengchao Liu
  • Ma Chang
  • Hannan Xu
  • Jie Fu
  • Zhihong Deng
  • Lingpeng Kong
  • Qi Liu

Molecule property prediction has gained significant attention in recent years. The main bottleneck is the label insufficiency caused by expensive lab experiments. In order to alleviate this issue and to better leverage textual knowledge for tasks, this study investigates the feasibility of employing natural language instructions to accomplish molecule-related tasks in a zero-shot setting. We discover that existing molecule-text models perform poorly in this setting due to inadequate treatment of instructions and limited capacity for graphs. To overcome these issues, we propose GIMLET, which unifies language models for both graph and text data. By adopting generalized position embedding, our model is extended to encode both graph structures and instruction text without additional graph encoding modules. GIMLET also decouples encoding of the graph from task instructions in the attention mechanism, enhancing the generalization of graph features across novel tasks. We construct a dataset consisting of more than two thousand molecule tasks with corresponding instructions derived from task descriptions. We pretrain GIMLET on the molecule tasks along with instructions, enabling the model to transfer effectively to a broad range of tasks. Experimental results demonstrate that GIMLET significantly outperforms molecule-text baselines in instruction-based zero-shot learning, even achieving results close to supervised GNN models on tasks such as ToxCast and MUV.

JBHI Journal 2023 Journal Article

ICL-Net: Global and Local Inter-Pixel Correlations Learning Network for Skin Lesion Segmentation

  • Weiwei Cao
  • Gang Yuan
  • Qi Liu
  • Chengtao Peng
  • Jing Xie
  • Xiaodong Yang
  • Xinye Ni
  • Jian Zheng

Skin lesion segmentation is a fundamental procedure in computer-aided melanoma diagnosis. However, due to the diverse shape, variable size, blurry boundary, and noise interference of lesion regions, existing methods may struggle with the challenge of inconsistency within classes and indiscrimination between classes. In view of this, we propose a novel method to learn and model inter-pixel correlations from both global and local aspects, which can increase inter-class variances and intra-class similarities. Specifically, under the encoder-decoder architecture, we first design a pyramid transformer inter-pixel correlations (PTIC) module, aiming at capturing the non-local context information of different levels and further exploring the global pixel-level relationship to deal with the large variance of shape and size. Further, we devise a local neighborhood metric learning (LNML) module to strengthen the local semantic correlations learning capability and increase the separability between classes in the feature space. These two modules can complementarily strengthen the feature representation capability via exploiting the inter-pixel semantic correlations, thus further improving intra-class consistency and inter-class variance. Comprehensive experiments are performed on public skin lesion segmentation datasets: ISIC 2018, ISIC 2016, and PH2, and experimental results demonstrate that the proposed method achieves better segmentation performance than other state-of-the-art methods.

IJCAI Conference 2023 Conference Paper

Keep Skills in Mind: Understanding and Implementing Skills in Commonsense Question Answering

  • Meikai Bao
  • Qi Liu
  • Kai Zhang
  • Ye Liu
  • Linan Yue
  • Longfei Li
  • Jun Zhou

Commonsense Question Answering (CQA) aims to answer questions that require human commonsense. Closed-book CQA, as one of its subtasks, requires the model to answer questions without retrieving external knowledge, which emphasizes the importance of the model's problem-solving ability. Most previous methods relied on large-scale pre-trained models to generate question-related knowledge while ignoring the crucial role of skills in the process of answering commonsense questions. Generally, skills refer to the learned ability to perform a specific task or activity, derived from knowledge and experience. In this paper, we introduce a new approach named Dynamic Skill-aware Commonsense Question Answering (DSCQA), which transcends the limitations of traditional methods by informing the model about the need for each skill in questions and by utilizing skills as a critical driver in the CQA process. To be specific, DSCQA first employs a commonsense skill extraction module to generate various skill representations. Then, DSCQA utilizes a dynamic skill module to generate dynamic skill representations. Finally, in the perception and emphasis module, the various skill and dynamic skill representations are used to support the question-answering process. Experimental results on two publicly available CQA datasets show the effectiveness of our proposed model and the considerable impact of introducing skills.

AAAI Conference 2023 Conference Paper

Learning by Applying: A General Framework for Mathematical Reasoning via Enhancing Explicit Knowledge Learning

  • Jiayu Liu
  • Zhenya Huang
  • ChengXiang Zhai
  • Qi Liu

Mathematical reasoning is one of the crucial abilities of general artificial intelligence, which requires machines to master mathematical logic and knowledge from solving problems. However, existing approaches are not transparent (thus not interpretable) in terms of what knowledge has been learned and applied in the reasoning process. In this paper, we propose a general Learning by Applying (LeAp) framework to enhance existing models (backbones) in a principled way by explicit knowledge learning. In LeAp, we perform knowledge learning in a novel problem-knowledge-expression paradigm, with a Knowledge Encoder to acquire knowledge from problem data and a Knowledge Decoder to apply knowledge for expression reasoning. The learned mathematical knowledge, including word-word relations and word-operator relations, forms an explicit knowledge graph, which bridges the knowledge “learning” and “applying” organically. Moreover, for problem solving, we design a semantics-enhanced module and a reasoning-enhanced module that apply knowledge to improve the problem comprehension and symbol reasoning abilities of any backbone, respectively. We theoretically prove the superiority of LeAp's autonomous learning mechanism. Experiments on three real-world datasets show that LeAp improves all backbones' performances, learns accurate knowledge, and achieves a more interpretable reasoning process.

AILAW Journal 2023 Journal Article

LK-IB: a hybrid framework with legal knowledge injection for compulsory measure prediction

  • Xiang Zhou
  • Qi Liu
  • Yiquan Wu
  • Qiangchao Chen
  • Kun Kuang

The interpretability of AI is just as important as its performance. In the LegalAI field, there have been efforts to enhance the interpretability of models, but a trade-off between interpretability and prediction accuracy remains inevitable. In this paper, we introduce a novel framework called LK-IB for compulsory measure prediction (CMP), one of the critical tasks in LegalAI. LK-IB leverages Legal Knowledge and combines an Interpretable model and a Black-box model to balance interpretability and prediction performance. Specifically, LK-IB involves three steps: (1) inputting cases into the first module, where first-order logic (FOL) rules are used to make predictions and output them directly if possible; (2) sending cases to the second module if FOL rules are not applicable, where a case distributor categorizes them as either “simple” or “complex”; and (3) sending simple cases to an interpretable model with strong interpretability and complex cases to a black-box model with outstanding performance. Experimental results demonstrate that the LK-IB framework provides more interpretable and accurate predictions than other state-of-the-art models. Given that the majority of cases in LegalAI are simple, the idea of model combination has significant potential for practical applications.

AAMAS Conference 2023 Conference Paper

Multi-Agent Path Finding with Time Windows: Preliminary Results

  • Jianqi Gao
  • Qi Liu
  • Shiyu Chen
  • Kejian Yan
  • Xinyi Li
  • Yanjie Li

We formalize the problem of multi-agent path finding with time windows (MAPF-TW). The optimization objective is to maximize the average customer satisfaction for all agents when they reach their respective goal vertices without path conflicts. We first prove that solving MAPF-TW optimally is NP-hard. We then reduce the MAPF-TW problem into a multi-commodity flow problem and propose an integer linear programming (ILP) model. Next, we propose the conflict-based search with time windows (CBS-TW) for the MAPF-TW problem, which is also optimal. Finally, we conduct simulation experiments on two different maps with random obstacles.

IJCAI Conference 2023 Conference Paper

Towards Incremental NER Data Augmentation via Syntactic-aware Insertion Transformer

  • Wenjun Ke
  • Zongkai Tian
  • Qi Liu
  • Peng Wang
  • Jinhua Gao
  • Rui Qi

Named entity recognition (NER) aims to locate and classify named entities in natural language texts. Most existing high-performance NER models employ a supervised paradigm, which requires a large quantity of high-quality annotated data during training. To help NER models perform well in few-shot scenarios, data augmentation approaches attempt to build extra data by means of random editing or end-to-end generation with PLMs. However, these methods focus only on the fluency of generated sentences, ignoring the syntactic correlation between the new and raw sentences. This lack of correlation also brings low diversity and inconsistent labeling of synthetic samples. To fill this gap, we present SAINT (Syntactic-Aware InsertioN Transformer), a hard-constraint controlled text generation model that incorporates syntactic information. The proposed method operates by inserting new tokens between existing entities in a parallel manner. During the insertion procedure, new tokens are added taking both semantic and syntactic factors into account. Hence, the resulting sentence retains syntactic correctness with respect to the raw data. Experimental results on two benchmark datasets, i.e., OntoNotes and WikiAnn, demonstrate performance of SAINT comparable to that of the state-of-the-art baselines.

AAAI Conference 2023 Conference Paper

Untargeted Attack against Federated Recommendation Systems via Poisonous Item Embeddings and the Defense

  • Yang Yu
  • Qi Liu
  • Likang Wu
  • Runlong Yu
  • Sanshi Lei Yu
  • Zaixi Zhang

Federated recommendation (FedRec) can train personalized recommenders without collecting user data, but the decentralized nature makes it susceptible to poisoning attacks. Most previous studies focus on the targeted attack to promote certain items, while the untargeted attack that aims to degrade the overall performance of the FedRec system remains less explored. In fact, untargeted attacks can disrupt the user experience and bring severe financial loss to the service provider. However, existing untargeted attack methods are either inapplicable or ineffective against FedRec systems. In this paper, we delve into the untargeted attack and its defense for FedRec systems. (i) We propose ClusterAttack, a novel untargeted attack method. It uploads poisonous gradients that converge the item embeddings into several dense clusters, which make the recommender generate similar scores for these items in the same cluster and perturb the ranking order. (ii) We propose a uniformity-based defense mechanism (UNION) to protect FedRec systems from such attacks. We design a contrastive learning task that regularizes the item embeddings toward a uniform distribution. Then the server filters out these malicious gradients by estimating the uniformity of updated item embeddings. Experiments on two public datasets show that ClusterAttack can effectively degrade the performance of FedRec systems while circumventing many defense methods, and UNION can improve the resistance of the system against various untargeted attacks, including our ClusterAttack.
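UNION's idea of regularizing item embeddings "toward a uniform distribution" can be pictured with the standard Gaussian-potential uniformity objective of Wang and Isola (2020). The sketch below is under that assumption; the paper's exact contrastive task and the toy embeddings are not from the source:

```python
import math

def normalize(v):
    """Project an embedding onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def uniformity_loss(embeddings, t=2.0):
    """Log of the mean pairwise Gaussian potential on the sphere.
    Lower values mean the embeddings are spread more evenly, which is
    the property a clustering attack destroys."""
    points = [normalize(e) for e in embeddings]
    total, count = 0.0, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            total += math.exp(-t * d2)
            count += 1
    return math.log(total / count)

# A collapsed cluster scores worse (higher) than well-spread embeddings,
# so estimating this quantity can flag poisoned updates.
clustered = [[1.0, 0.01], [1.0, 0.02], [1.0, 0.03]]
spread = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
assert uniformity_loss(clustered) > uniformity_loss(spread)
```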

AAAI Conference 2022 Conference Paper

Anisotropic Additive Quantization for Fast Inner Product Search

  • Jin Zhang
  • Qi Liu
  • Defu Lian
  • Zheng Liu
  • Le Wu
  • Enhong Chen

Maximum Inner Product Search (MIPS) plays an important role in many applications, ranging from information retrieval and recommender systems to natural language processing and machine learning. However, exhaustive MIPS is often expensive and impractical when there are a large number of candidate items. The state-of-the-art approximated MIPS is product quantization with a score-aware loss, which weighs more heavily on items with larger inner product scores. However, it is challenging to extend the score-aware loss for additive quantization due to the parallel-orthogonal decomposition of the residual error. Learning additive quantization with respect to this loss is important since additive quantization can achieve a lower approximation error than product quantization. To this end, we propose a quantization method called Anisotropic Additive Quantization to combine the score-aware anisotropic loss and additive quantization. To efficiently update the codebooks in this algorithm, we develop a new alternating optimization algorithm. The proposed algorithm is extensively evaluated on three real-world datasets. The experimental results show that it outperforms the state-of-the-art baselines with respect to approximate search accuracy while guaranteeing a similar retrieval efficiency.
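The parallel-orthogonal decomposition mentioned above is concrete enough to sketch: split the quantization residual into components parallel and orthogonal to the datapoint and weight the parallel component more heavily, as score-aware anisotropic losses do. The weights and toy vectors below are illustrative assumptions, not the paper's settings:

```python
def anisotropic_loss(x, x_hat, h_par=4.0, h_orth=1.0):
    """Score-aware anisotropic quantization loss: penalize the residual
    component parallel to the datapoint more, since that component
    perturbs inner-product scores the most."""
    r = [a - b for a, b in zip(x_hat, x)]
    x_sq = sum(v * v for v in x)
    # Projection of the residual onto the direction of x.
    scale = sum(a * b for a, b in zip(r, x)) / x_sq
    r_par = [scale * v for v in x]
    r_orth = [a - b for a, b in zip(r, r_par)]
    par = sum(v * v for v in r_par)
    orth = sum(v * v for v in r_orth)
    return h_par * par + h_orth * orth

x = [1.0, 0.0]
# Same residual norm, but a parallel error costs more than an orthogonal one.
parallel_err = anisotropic_loss(x, [1.1, 0.0])
orthogonal_err = anisotropic_loss(x, [1.0, 0.1])
assert parallel_err > orthogonal_err
```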

NeurIPS Conference 2022 Conference Paper

DARE: Disentanglement-Augmented Rationale Extraction

  • Linan Yue
  • Qi Liu
  • Yichao Du
  • Yanqing An
  • Li Wang
  • Enhong Chen

Rationale extraction can be considered a straightforward method of improving model explainability, where rationales are a subsequence of the original inputs that can be extracted to support the prediction results. Existing methods mainly cascade a selector, which extracts the rationale tokens, with a predictor, which makes the prediction based on the selected tokens. Since previous works fail to fully exploit the original input, ignoring the information of non-selected tokens, in this paper we propose a Disentanglement-Augmented Rationale Extraction (DARE) method, which encapsulates more information from the input to extract rationales. Specifically, it first disentangles the input into rationale representations and non-rationale ones, and then learns more comprehensive rationale representations for extraction by minimizing the mutual information (MI) between the two disentangled representations. Besides, to improve the performance of MI minimization, we develop a new MI estimator by exploring existing MI estimation methods. Extensive experimental results on three real-world datasets and simulation studies clearly validate the effectiveness of our proposed method. Code is released at https://github.com/yuelinan/DARE.

AAAI Conference 2022 Conference Paper

Fully Adaptive Framework: Neural Computerized Adaptive Testing for Online Education

  • Yan Zhuang
  • Qi Liu
  • Zhenya Huang
  • Zhi Li
  • Shuanghong Shen
  • Haiping Ma

Computerized Adaptive Testing (CAT) refers to an efficient and personalized test mode in online education, aiming to accurately measure a student's proficiency level on the required subject/domain. The key component of CAT is the “adaptive” question selection algorithm, which automatically selects the best-suited question for a student based on his/her current estimated proficiency, reducing the test length. Existing algorithms rely on manually designed and pre-fixed informativeness/uncertainty metrics of questions for selection, which is labor-intensive and insufficient for capturing the complex relations between students and questions. In this paper, we propose a fully adaptive framework named Neural Computerized Adaptive Testing (NCAT), which formally redefines CAT as a reinforcement learning problem and directly learns the selection algorithm from real-world data. Specifically, a bilevel optimization is defined and simplified under CAT's application scenarios to make the algorithm learnable. Furthermore, to address the CAT task effectively, we tackle it as an equivalent reinforcement learning problem and propose an attentive neural policy to model complex non-linear interactions. Extensive experiments on real-world datasets demonstrate the effectiveness and robustness of NCAT compared with several state-of-the-art methods.

NeurIPS Conference 2022 Conference Paper

Hierarchical Graph Transformer with Adaptive Node Sampling

  • Zaixi Zhang
  • Qi Liu
  • Qingyong Hu
  • Chee-Kong Lee

The Transformer architecture has achieved remarkable success in a number of domains including natural language processing and computer vision. However, when it comes to graph-structured data, transformers have not achieved competitive performance, especially on large graphs. In this paper, we identify the main deficiencies of current graph transformers: (1) Existing node sampling strategies in Graph Transformers are agnostic to the graph characteristics and the training process. (2) Most sampling strategies only focus on local neighbors and neglect the long-range dependencies in the graph. We conduct experimental investigations on synthetic datasets to show that existing sampling strategies are sub-optimal. To tackle the aforementioned problems, we formulate the optimization strategies of node sampling in Graph Transformer as an adversarial bandit problem, where the rewards are related to the attention weights and can vary in the training procedure. Meanwhile, we propose a hierarchical attention scheme with graph coarsening to capture the long-range interactions while reducing computational complexity. Finally, we conduct extensive experiments on real-world datasets to demonstrate the superiority of our method over existing graph transformers and popular GNNs.
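The bandit formulation above builds, in spirit, on the classic EXP3 algorithm for adversarial bandits. The generic sketch below is not the paper's sampler: the three "sampling heuristic" arms and the 0/1 reward are toy assumptions (the paper derives rewards from attention weights):

```python
import math
import random

def exp3_choose(weights):
    """Sample an arm in proportion to the current weights."""
    total = sum(weights)
    probs = [w / total for w in weights]
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i, p
    return len(probs) - 1, probs[-1]

def exp3_update(weights, arm, prob, reward, eta=0.1):
    """EXP3 update: importance-weight the observed reward by the
    sampling probability (keeping the estimate unbiased), then
    exponentiate to reweight the chosen arm."""
    weights[arm] *= math.exp(eta * reward / prob)
    return weights

random.seed(0)
weights = [1.0, 1.0, 1.0]  # three hypothetical sampling heuristics
for _ in range(200):
    arm, prob = exp3_choose(weights)
    reward = 1.0 if arm == 2 else 0.0  # heuristic 2 is secretly best
    weights = exp3_update(weights, arm, prob, reward)
assert max(range(3), key=lambda i: weights[i]) == 2
```

Because the rewards may drift as training proceeds, an adversarial (rather than stochastic) bandit is the natural fit, which is the point the abstract makes.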

AAAI Conference 2022 Conference Paper

ProtGNN: Towards Self-Explaining Graph Neural Networks

  • Zaixi Zhang
  • Qi Liu
  • Hao Wang
  • Chengqiang Lu
  • Cheekong Lee

Despite the recent progress in Graph Neural Networks (GNNs), it remains challenging to explain the predictions made by GNNs. Existing explanation methods mainly focus on post-hoc explanations where another explanatory model is employed to provide explanations for a trained GNN. The fact that post-hoc methods fail to reveal the original reasoning process of GNNs raises the need of building GNNs with built-in interpretability. In this work, we propose Prototype Graph Neural Network (ProtGNN), which combines prototype learning with GNNs and provides a new perspective on the explanations of GNNs. In ProtGNN, the explanations are naturally derived from the case-based reasoning process and are actually used during classification. The prediction of ProtGNN is obtained by comparing the inputs to a few learned prototypes in the latent space. Furthermore, for better interpretability and higher efficiency, a novel conditional subgraph sampling module is incorporated to indicate which part of the input graph is most similar to each prototype in ProtGNN+. Finally, we evaluate our method on a wide range of datasets and perform concrete case studies. Extensive results show that ProtGNN and ProtGNN+ can provide inherent interpretability while achieving accuracy on par with the non-interpretable counterparts.

NeurIPS Conference 2021 Conference Paper

Causal Effect Inference for Structured Treatments

  • Jean Kaddour
  • Yuchen Zhu
  • Qi Liu
  • Matt J. Kusner
  • Ricardo Silva

We address the estimation of conditional average treatment effects (CATEs) for structured treatments (e.g., graphs, images, texts). Given a weak condition on the effect, we propose the generalized Robinson decomposition, which (i) isolates the causal estimand (reducing regularization bias), (ii) allows one to plug in arbitrary models for learning, and (iii) possesses a quasi-oracle convergence guarantee under mild assumptions. In experiments with small-world and molecular graphs we demonstrate that our approach outperforms prior work in CATE estimation.
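The generalized Robinson decomposition extends the classic residual-on-residual recipe to structured treatments. A scalar-treatment toy version (simulated data and linear nuisance fits are my assumptions, not the paper's estimator) shows how residualizing isolates the causal estimand:

```python
import random

def fit_linear(xs, ys):
    """Closed-form ordinary least squares for y ≈ a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    b = cov / var
    return lambda x, a=my - b * mx, b=b: a + b * x

random.seed(1)
n = 2000
X = [random.gauss(0, 1) for _ in range(n)]
T = [x + random.gauss(0, 1) for x in X]        # treatment confounded by X
tau = 1.5                                      # true constant effect
Y = [2 * x + tau * t + random.gauss(0, 0.1) for x, t in zip(X, T)]

# Robinson decomposition: residualize both Y and T on X, then regress
# the outcome residual on the treatment residual to recover tau.
m = fit_linear(X, Y)   # m(x) ≈ E[Y | X=x]
e = fit_linear(X, T)   # e(x) ≈ E[T | X=x]
Yr = [y - m(x) for x, y in zip(X, Y)]
Tr = [t - e(x) for x, t in zip(X, T)]
tau_hat = sum(a * b for a, b in zip(Yr, Tr)) / sum(b * b for b in Tr)
assert abs(tau_hat - tau) < 0.1
```

The residualization step is what "(i) isolates the causal estimand" refers to: the nuisance fits absorb the confounding through X, leaving only the treatment effect in the residual relationship.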

AAAI Conference 2021 Conference Paper

Coupling Macro-Sector-Micro Financial Indicators for Learning Stock Representations with Less Uncertainty

  • Guifeng Wang
  • Longbing Cao
  • Hongke Zhao
  • Qi Liu
  • Enhong Chen

While stock movement prediction has been intensively studied, existing work suffers from weak generalization because of uncertainty in both data and modeling. On one hand, training a stock representation on stochastic stock data in an end-to-end manner may lead to excessive modeling, which introduces model uncertainty. On the other hand, the analysis of correlating stock data with its relevant factors involves data uncertainty. To simultaneously address such uncertainty from both data and modeling perspectives, a fundamental yet challenging task is to learn a better stock representation with less uncertainty by considering hierarchical couplings from the macro-level to the sector- and micro-levels. Accordingly, we propose a copula-based contrastive predictive coding (Co-CPC) method. Co-CPC first models the dependence between a certain stock sector and relevant macroeconomic variables that are sequential and heterogeneous, e.g., macro-variables are associated with different time intervals, scales, and distributions. Then, by involving a macro-sector context, stock representations are learned in a self-supervised way and can further be used for downstream tasks like stock movement prediction. Extensive experiments on two typical stock datasets verify the effectiveness of our Co-CPC method.

AAAI Conference 2021 Conference Paper

Cross-Oilfield Reservoir Classification via Multi-Scale Sensor Knowledge Transfer

  • Zhi Li
  • Zhefeng Wang
  • Zhicheng Wei
  • Xiangguang Zhou
  • Yijun Wang
  • Baoxing Huai
  • Qi Liu
  • Nicholas Jing Yuan

Reservoir classification is an essential step for the exploration and production process in the oil and gas industry. An appropriate automatic reservoir classification will not only reduce the manual workloads of experts, but also help petroleum companies to make optimal decisions efficiently, which in turn will dramatically reduce the costs. Existing methods mainly focused on generating reservoir classification in a single geological block but failed to work well on a new oilfield block. Indeed, how to transfer the subsurface characteristics and make accurate reservoir classification across the geological oilfields is a very important but challenging problem. To that end, in this paper, we present a focused study on the cross-oilfield reservoir classification task. Specifically, we first propose a Multi-scale Sensor Extraction (MSE) module to extract the multi-scale feature representations of geological characteristics from multivariate well logs. Furthermore, we design an encoder-decoder module, i.e., Specific Feature Learning (SFL), to take advantage of specific information of both oilfields. Then, we develop a Knowledge-Attentive Transfer (KAT) module to learn the feature-invariant representation and transfer the geological knowledge from a source oilfield to a target oilfield. Finally, we evaluate our approaches by conducting extensive experiments with real-world industrial datasets. The experimental results clearly demonstrate the effectiveness of our proposed approaches to transfer the geological knowledge and generate the cross-oilfield reservoir classifications.

IJCAI Conference 2021 Conference Paper

GraphMI: Extracting Private Graph Data from Graph Neural Networks

  • Zaixi Zhang
  • Qi Liu
  • Zhenya Huang
  • Hao Wang
  • Chengqiang Lu
  • Chuanren Liu
  • Enhong Chen

As machine learning becomes more widely used for critical applications, the need to study its privacy implications becomes urgent. Given access to the target model and auxiliary information, a model inversion attack aims to infer sensitive features of the training dataset, which raises serious privacy concerns. Despite its success in the grid domain, directly applying model inversion techniques to non-grid domains such as graphs achieves poor attack performance due to the difficulty of fully exploiting the intrinsic properties of graphs and the attributes of graph nodes used in GNN models. To bridge this gap, we present the Graph Model Inversion attack, which aims to infer edges of the training graph by inverting Graph Neural Networks, one of the most popular graph analysis tools. Specifically, the projected gradient module in our method can tackle the discreteness of graph edges while preserving the sparsity and smoothness of graph features. Moreover, a well-designed graph autoencoder module can efficiently exploit graph topology, node attributes, and target model parameters. With the proposed method, we study the connection between model inversion risk and edge influence and show that edges with greater influence are more likely to be recovered. Extensive experiments over several public datasets demonstrate the effectiveness of our method. We also show that differential privacy in its canonical form can hardly defend against our attack while preserving decent utility.
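A projected gradient step for discrete edges typically alternates a continuous gradient update on a relaxed adjacency with a projection back onto a feasible set. The helper below is a loose, hypothetical illustration of the projection half only (the clip-to-[0,1] plus top-k sparsity rule is my simplification, not the paper's exact projection):

```python
def project_adjacency(a, budget):
    """Project relaxed edge weights onto a feasible set: clip each
    weight to [0, 1], then keep only the `budget` largest entries so
    the relaxed graph stays sparse."""
    clipped = [min(1.0, max(0.0, v)) for v in a]
    if sum(v > 0 for v in clipped) <= budget:
        return clipped
    top = sorted(range(len(clipped)), key=lambda i: clipped[i], reverse=True)[:budget]
    keep = set(top)
    return [v if i in keep else 0.0 for i, v in enumerate(clipped)]

print(project_adjacency([1.5, -0.2, 0.7, 0.4], budget=2))
```

After optimization converges, the surviving relaxed weights can be thresholded or sampled to recover a discrete edge set.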

IJCAI Conference 2021 Conference Paper

Guided Attention Network for Concept Extraction

  • Songtao Fang
  • Zhenya Huang
  • Ming He
  • Shiwei Tong
  • Xiaoqing Huang
  • Ye Liu
  • Jie Huang
  • Qi Liu

Concept extraction aims to find words or phrases describing a concept in massive texts. Recently, researchers have proposed many neural network-based methods to automatically extract concepts. Although these methods show promising results, they ignore structured information in the raw textual data (e.g., title, topic, and clue words). In this paper, we propose a novel model, named Guided Attention Concept Extraction Network (GACEN), which uses title, topic, and clue words as additional supervision to provide guidance directly. Specifically, GACEN comprises two attention networks: one gathers the relevant title and topic information for each context word in the document; the other models the implicit connection between informative words (clue words) and concepts. Finally, we aggregate information from the two networks as input to a Conditional Random Field (CRF) to model dependencies in the output. We collected clue words for three well-studied datasets. Extensive experiments demonstrate that our model outperforms the baseline models by a large margin, especially when labeled data is insufficient.

AAAI Conference 2021 Conference Paper

HMS: A Hierarchical Solver with Dependency-Enhanced Understanding for Math Word Problem

  • Xin Lin
  • Zhenya Huang
  • Hongke Zhao
  • Enhong Chen
  • Qi Liu
  • Hao Wang
  • Shijin Wang

Automatically solving math word problems is a crucial task for exploring the intelligence levels of machines in the general AI domain. It is highly challenging since it requires not only natural language understanding but also mathematical expression inference. Existing solutions usually explore sequence-to-sequence models to generate expressions, where the problems are simply encoded sequentially. However, such models generally fall far short of human-like problem understanding and lead to incorrect answers. To this end, in this paper, we propose a novel Hierarchical Math Solver (HMS) for deep understanding and exploitation of problems. In problem understanding, imitating human reading habits, we propose a hierarchical word-clause-problem encoder. Specifically, we first split each problem into several clauses and learn problem semantics from the local clause level to the global problem level. Then, in clause understanding, we propose a dependency-based module to enhance clause semantics with the dependency structure of the problem. Next, in expression inference, we propose a novel tree-based decoder to generate the mathematical expression for the answer. In the decoder, we apply a hierarchical attention mechanism to enhance the problem semantics with context from different levels, and a pointer-generator network to guide the model to copy existing information and infer extra knowledge. Extensive experimental results on two widely used datasets demonstrate that HMS achieves not only better answers but also more reasonable inference.

AAAI Conference 2021 Conference Paper

Ideography Leads Us to the Field of Cognition: A Radical-Guided Associative Model for Chinese Text Classification

  • Hanqing Tao
  • Shiwei Tong
  • Kun Zhang
  • Tong Xu
  • Qi Liu
  • Enhong Chen
  • Min Hou

Cognitive psychology research shows that humans have the instinct for abstract thinking, where association plays an essential role in language comprehension. Especially for Chinese, its ideographic writing system allows radicals to trigger semantic association without the need of phonetics. In fact, subconsciously using the associative information guided by radicals is a key for readers to ensure the robustness of semantic understanding. Fortunately, many basic and extended concepts related to radicals are systematically included in Chinese language dictionaries, which leaves a handy but unexplored way for improving Chinese text representation and classification. To this end, we draw inspiration from cognitive principles between ideography and human associative behavior to propose a novel Radical-guided Associative Model (RAM) for Chinese text classification. RAM comprises two coupled spaces, namely Literal Space and Associative Space, which imitates the real process in people’s mind when understanding a Chinese text. To be specific, we first devise a serialized modeling structure in Literal Space to thoroughly capture the sequential information of Chinese text. Then, based on the authoritative information provided by Chinese language dictionaries, we design an association module and put forward a strategy called Radical-Word Association to use ideographic radicals as the medium to associate prior concept words in Associative Space. Afterwards, we design an attention module to imitate people’s matching and decision-making between Literal Space and Associative Space, which can balance the importance of each associative word under specific contexts. Finally, extensive experiments on two real-world datasets prove the effectiveness and rationality of RAM, with good cognitive insights for future language modeling.

IJCAI Conference 2021 Conference Paper

Item Response Ranking for Cognitive Diagnosis

  • Shiwei Tong
  • Qi Liu
  • Runlong Yu
  • Wei Huang
  • Zhenya Huang
  • Zachary A. Pardos
  • Weijie Jiang

Cognitive diagnosis, a fundamental task in the education domain, aims at providing an approach to reveal the proficiency level of students on knowledge concepts. Monotonicity is one of the basic conditions in cognitive diagnosis theory, which assumes that a student's proficiency is monotonic with the probability of giving the right response to a test item. However, few previous methods consider monotonicity during optimization. To this end, we propose the Item Response Ranking framework (IRR), aiming at introducing pairwise learning into cognitive diagnosis to well model the monotonicity between item responses. Specifically, we first use an item-specific sampling method to sample item responses and construct response pairs based on their partial order, where we propose two-branch sampling methods to handle unobserved responses. After that, we use a pairwise objective function to exploit the monotonicity in the pair formulation. In fact, IRR is a general framework which can be applied to most contemporary cognitive diagnosis models. Extensive experiments demonstrate the effectiveness and interpretability of our method.
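The pairwise objective can be pictured with a BPR-style loss over response pairs: for two students who answered the same item, the one who answered correctly should have the higher predicted proficiency. A minimal sketch, assuming a simple proficiency lookup (the student names and values are made up, and the paper's item-specific two-branch sampling is omitted):

```python
import math

def pairwise_loss(theta, pairs):
    """BPR-style objective on response pairs: for each (correct, wrong)
    pair on the same item, push P(correct) above P(wrong) via the
    students' proficiency gap. `theta` maps student -> proficiency."""
    loss = 0.0
    for winner, loser in pairs:
        gap = theta[winner] - theta[loser]
        loss += -math.log(1.0 / (1.0 + math.exp(-gap)))
    return loss / len(pairs)

theta = {"alice": 1.2, "bob": -0.3}
# Alice answered the item correctly, Bob did not: the ordering that
# respects monotonicity incurs the lower loss.
good_order = [("alice", "bob")]
bad_order = [("bob", "alice")]
assert pairwise_loss(theta, good_order) < pairwise_loss(theta, bad_order)
```

Minimizing this loss drives the diagnosis model toward parameters where proficiency and response correctness are consistently ordered, which is the monotonicity condition the abstract emphasizes.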

NeurIPS Conference 2021 Conference Paper

Motif-based Graph Self-Supervised Learning for Molecular Property Prediction

  • Zaixi Zhang
  • Qi Liu
  • Hao Wang
  • Chengqiang Lu
  • Chee-Kong Lee

Predicting molecular properties with data-driven methods has drawn much attention in recent years. Particularly, Graph Neural Networks (GNNs) have demonstrated remarkable success in various molecular generation and prediction tasks. In cases where labeled data is scarce, GNNs can be pre-trained on unlabeled molecular data to first learn the general semantic and structural information before being fine-tuned for specific tasks. However, most existing self-supervised pre-training frameworks for GNNs only focus on node-level or graph-level tasks. These approaches cannot capture the rich information in subgraphs or graph motifs. For example, functional groups (frequently occurring subgraphs in molecular graphs) often carry indicative information about the molecular properties. To bridge this gap, we propose Motif-based Graph Self-supervised Learning (MGSSL) by introducing a novel self-supervised motif generation framework for GNNs. First, for motif extraction from molecular graphs, we design a molecule fragmentation method that leverages the retrosynthesis-based algorithm BRICS and additional rules for controlling the size of the motif vocabulary. Second, we design a general motif-based generative pre-training framework in which GNNs are asked to make topological and label predictions. This generative framework can be implemented in two different ways, i.e., breadth-first or depth-first. Finally, to take the multi-scale information in molecular graphs into consideration, we introduce a multi-level self-supervised pre-training. Extensive experiments on various downstream benchmark tasks show that our methods outperform all state-of-the-art baselines.

AAAI Conference 2021 Conference Paper

NeuralAC: Learning Cooperation and Competition Effects for Match Outcome Prediction

  • Yin Gu
  • Qi Liu
  • Kai Zhang
  • Zhenya Huang
  • Runze Wu
  • Jianrong Tao

Match outcome prediction in group comparison setting is a challenging but important task. Existing works mainly focus on learning individual effects or mining limited interactions between teammates, which is not sufficient for capturing complex interactions between teammates as well as between opponents. Besides, the importance of interacting with different characters is still largely underexplored. To this end, we propose a novel Neural Attentional Cooperation-competition model (NeuralAC), which incorporates weighted-cooperation effects (i.e., intra-team interactions) and weighted-competition effects (i.e., inter-team interactions) for predicting match outcomes. Specifically, we first project individuals to latent vectors and learn complex interactions through deep neural networks. Then, we design two novel attention-based mechanisms to capture the importance of intra-team and inter-team interactions, which enhance NeuralAC with both accuracy and interpretability. Furthermore, we demonstrate NeuralAC can generalize several previous works. To evaluate the performances of NeuralAC, we conduct extensive experiments on four E-sports datasets. The experimental results clearly verify the effectiveness of NeuralAC compared with several state-of-the-art methods.

IJCAI Conference 2021 Conference Paper

Preference-Adaptive Meta-Learning for Cold-Start Recommendation

  • Li Wang
  • Binbin Jin
  • Zhenya Huang
  • Hongke Zhao
  • Defu Lian
  • Qi Liu
  • Enhong Chen

In recommender systems, the cold-start problem is a critical issue. To alleviate this problem, an emerging direction adopts meta-learning frameworks and achieves success. Most existing works aim to learn globally shared prior knowledge across all users so that it can be quickly adapted to a new user with sparse interactions. However, globally shared prior knowledge may be inadequate to discern users’ complicated behaviors and causes poor generalization. Therefore, we argue that prior knowledge should be locally shared by users with similar preferences who can be recognized by social relations. To this end, in this paper, we propose a Preference-Adaptive Meta-Learning approach (PAML) to improve existing meta-learning frameworks with better generalization capacity. Specifically, to address two challenges imposed by social relations, we first identify reliable implicit friends to strengthen a user’s social relations based on our defined palindrome paths. Then, a coarse-fine preference modeling method is proposed to leverage social relations and capture the preference. Afterwards, a novel preference-specific adapter is designed to adapt the globally shared prior knowledge to the preference-specific knowledge so that users who have similar tastes share similar knowledge. We conduct extensive experiments on two publicly available datasets. Experimental results validate the power of social relations and the effectiveness of PAML.

IJCAI Conference 2021 Conference Paper

Towards a New Generation of Cognitive Diagnosis

  • Qi Liu

Cognitive diagnosis is a type of assessment for automatically measuring individuals' proficiency profiles from their observed behaviors, e.g., quantifying the mastery level of examinees on specific knowledge concepts/skills. As one of the fundamental research tasks in domains like intelligent education, a number of Cognitive Diagnosis Models (CDMs) have been developed in the past decades. Though these solutions are usually well designed based on psychometric theories, they still suffer from the limited ability of the handcrafted diagnosis functions, especially when dealing with heterogeneous data. In this paper, I will share my personal understanding of cognitive diagnosis and review our recent developments of CDMs mostly from a machine learning perspective. Meanwhile, I will show the wide applications of cognitive diagnosis.

AAAI Conference 2020 Conference Paper

Adaptive Quantitative Trading: An Imitative Deep Reinforcement Learning Approach

  • Yang Liu
  • Qi Liu
  • Hongke Zhao
  • Zhen Pan
  • Chuanren Liu

In recent years, considerable efforts have been devoted to developing AI techniques for finance research and applications. For instance, AI techniques (e.g., machine learning) can help traders in quantitative trading (QT) by automating two tasks: market condition recognition and trading strategy execution. However, existing methods in QT face challenges such as representing noisy high-frequency financial data and finding the balance between exploration and exploitation for the trading agent with AI techniques. To address these challenges, we propose an adaptive trading model, namely iRDPG, to automatically develop QT strategies via an intelligent trading agent. Our model is enhanced by deep reinforcement learning (DRL) and imitation learning techniques. Specifically, considering the noisy financial data, we formulate the QT process as a Partially Observable Markov Decision Process (POMDP). Also, we introduce imitation learning to leverage classical trading strategies, which help balance exploration and exploitation. For better simulation, we train our trading agent in the real financial market using minute-frequency data. Experimental results demonstrate that our model can extract robust market features and adapt to different markets.

AAAI Conference 2020 Conference Paper

Crowdfunding Dynamics Tracking: A Reinforcement Learning Approach

  • Jun Wang
  • Hefu Zhang
  • Qi Liu
  • Zhen Pan
  • Hanqing Tao

Recent years have witnessed increasing interest in research on crowdfunding mechanisms. In this area, dynamics tracking is a significant issue but is still under exploration. Existing studies either fit the fluctuations of time series or employ regularization terms to constrain learned tendencies. However, few of them take into account the inherent decision-making process between investors and crowdfunding dynamics. To address the problem, in this paper, we propose a Trajectory-based Continuous Control for Crowdfunding (TC3) algorithm to predict the funding progress in crowdfunding. Specifically, actor-critic frameworks are employed to model the relationship between investors and campaigns, where all of the investors are viewed as an agent that could interact with the environment derived from the real dynamics of campaigns. Then, to further explore the in-depth implications of patterns (i.e., typical characteristics) in funding series, we propose to subdivide them into fast-growing and slow-growing ones. Moreover, for the purpose of switching between different kinds of patterns, the actor component of TC3 is extended with a structure of options, which leads to TC3-Options. Finally, extensive experiments on the Indiegogo dataset not only demonstrate the effectiveness of our methods, but also validate our assumption that the entire pattern learned by TC3-Options is indeed a U-shaped one.

AAAI Conference 2020 Conference Paper

Estimating Early Fundraising Performance of Innovations via Graph-Based Market Environment Model

  • Likang Wu
  • Zhi Li
  • Hongke Zhao
  • Zhen Pan
  • Qi Liu
  • Enhong Chen

Well begun is half done. In the crowdfunding market, the early fundraising performance of a project is a key concern for both creators and platforms. However, estimating early fundraising performance before the project is published is very challenging and still under-explored. To that end, in this paper, we present a focused study on this important problem from a market modeling view. Specifically, we propose a Graph-based Market Environment model (GME) for estimating the early fundraising performance of the target project by exploiting the market environment. In addition, we discriminatively model the market competition and market evolution by designing two graph-based neural network architectures and incorporating them into the joint optimization stage. Finally, we conduct extensive experiments on real-world crowdfunding data collected from Indiegogo.com. The experimental results clearly demonstrate the effectiveness of our proposed model for modeling and estimating the early fundraising performance of the target project.

IJCAI Conference 2020 Conference Paper

Learning the Compositional Visual Coherence for Complementary Recommendations

  • Zhi Li
  • Bo Wu
  • Qi Liu
  • Likang Wu
  • Hongke Zhao
  • Tao Mei

Complementary recommendations, which aim to provide users with product suggestions that are supplementary to and compatible with their obtained items, have become a hot topic in both academia and industry in recent years. Existing work has mainly focused on modeling the co-purchase relations between two items, but the compositional associations of item collections remain largely unexplored. Actually, when a user chooses complementary items for purchased products, it is intuitive that she will consider the visual semantic coherence (such as color collocations and texture compatibilities) in addition to global impressions. Towards this end, in this paper, we propose a novel Content Attentive Neural Network (CANN) to model comprehensive compositional coherence on both global contents and semantic contents. Specifically, we first propose a Global Coherence Learning (GCL) module based on multi-head attention to model global compositional coherence. Then, we generate semantic-focal representations from different semantic regions and design a Focal Coherence Learning (FCL) module to learn focal compositional coherence from the different semantic-focal representations. Finally, we optimize CANN with a novel compositional optimization strategy. Extensive experiments on large-scale real-world data clearly demonstrate the effectiveness of CANN compared with several state-of-the-art methods.

AAAI Conference 2020 Conference Paper

Multi-Task Self-Supervised Learning for Disfluency Detection

  • Shaolei Wang
  • Wanxiang Che
  • Qi Liu
  • Pengda Qin
  • Ting Liu
  • William Yang Wang

Most existing approaches to disfluency detection heavily rely on human-annotated data, which is expensive to obtain in practice. To tackle the training data bottleneck, we investigate methods for combining multiple self-supervised tasks, i.e., supervised tasks where data can be collected without manual labeling. First, we construct large-scale pseudo training data by randomly adding or deleting words from unlabeled news data, and propose two self-supervised pre-training tasks: (i) a tagging task to detect the added noisy words; (ii) sentence classification to distinguish original sentences from grammatically incorrect sentences. We then combine these two tasks to jointly train a network. The pre-trained network is then fine-tuned using human-annotated disfluency detection training data. Experimental results on the commonly used English Switchboard test set show that our approach can achieve competitive performance compared to previous systems (trained using the full dataset) by using less than 1% (1000 sentences) of the training data. Our method trained on the full dataset significantly outperforms previous methods, reducing the error by 21% on English Switchboard.
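The pseudo-data construction the abstract describes can be sketched roughly as follows; the corruption probabilities and the exact insert/delete procedure here are illustrative assumptions, not taken from the paper:

```python
import random

def corrupt(sentence, vocab, p=0.15, seed=0):
    """Build one pseudo-disfluent training example by randomly inserting
    or deleting words, in the spirit of the self-supervised data
    construction above.  Returns (tokens, tags): tag 1 marks an inserted
    noise word that the tagging task should detect."""
    rng = random.Random(seed)
    tokens, tags = [], []
    for w in sentence.split():
        r = rng.random()
        if r < p:                    # insert a random noise word before w
            tokens.append(rng.choice(vocab))
            tags.append(1)
            tokens.append(w)
            tags.append(0)
        elif r < 2 * p:              # delete w entirely
            continue
        else:                        # keep w unchanged
            tokens.append(w)
            tags.append(0)
    return tokens, tags

toks, tags = corrupt("uh i want to book a flight to denver",
                     ["well", "um", "the"], p=0.2)
```

The sentence-classification task would then label any corrupted token sequence as "not original".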

AAAI Conference 2020 Conference Paper

Neural Cognitive Diagnosis for Intelligent Education Systems

  • Fei Wang
  • Qi Liu
  • Enhong Chen
  • Zhenya Huang
  • Yuying Chen
  • Yu Yin
  • Zai Huang
  • Shijin Wang

Cognitive diagnosis is a fundamental issue in intelligent education, which aims to discover the proficiency level of students on specific knowledge concepts. Existing approaches usually mine linear interactions of the student exercising process with manually designed functions (e.g., the logistic function), which is not sufficient for capturing the complex relations between students and exercises. In this paper, we propose a general Neural Cognitive Diagnosis (NeuralCD) framework, which incorporates neural networks to learn the complex exercising interactions, to obtain both accurate and interpretable diagnosis results. Specifically, we project students and exercises to factor vectors and leverage multiple neural layers to model their interactions, where the monotonicity assumption is applied to ensure the interpretability of both factors. Furthermore, we propose two implementations of NeuralCD by specializing the required concepts of each exercise, i.e., NeuralCDM with a traditional Q-matrix and the improved NeuralCDM+ exploring rich text content. Extensive experimental results on real-world datasets show the effectiveness of the NeuralCD framework in terms of both accuracy and interpretability.
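A minimal sketch of a monotone interaction layer of the kind NeuralCD describes; shapes, layer sizes, and the use of absolute-value weights are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def neuralcd_forward(theta, beta, disc, W1, W2):
    """theta: student proficiency per concept; beta: exercise difficulty;
    disc: exercise discrimination (non-negative).  Keeping every network
    weight non-negative makes the output monotonically non-decreasing in
    theta, which is the interpretability (monotonicity) assumption."""
    x = disc * (theta - beta)            # factor-wise student-exercise interaction
    h = np.maximum(0.0, x @ np.abs(W1))  # ReLU layer with non-negative weights
    logit = float(h @ np.abs(W2))
    return 1.0 / (1.0 + np.exp(-logit))  # probability of a correct answer

# A more proficient student never gets a lower predicted probability:
rng = np.random.default_rng(0)
K = 4
beta, disc = rng.random(K), rng.random(K)
W1, W2 = rng.standard_normal((K, 8)), rng.standard_normal(8)
p_lo = neuralcd_forward(np.zeros(K), beta, disc, W1, W2)
p_hi = neuralcd_forward(np.ones(K), beta, disc, W1, W2)
```

Because the non-negative weights compose monotone functions, increasing any component of `theta` can only raise the predicted probability.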

AAAI Conference 2020 Conference Paper

Pointwise Rotation-Invariant Network with Adaptive Sampling and 3D Spherical Voxel Convolution

  • Yang You
  • Yujing Lou
  • Qi Liu
  • Yu-Wing Tai
  • Lizhuang Ma
  • Cewu Lu
  • Weiming Wang

Point cloud analysis without pose priors is very challenging in real applications, as the orientations of point clouds are often unknown. In this paper, we propose a brand-new point-set learning framework, PRIN, namely Pointwise Rotation-Invariant Network, focusing on rotation-invariant feature extraction in point cloud analysis. We construct spherical signals by Density-Aware Adaptive Sampling to deal with distorted point distributions in spherical space. In addition, we propose Spherical Voxel Convolution and Point Re-sampling to extract rotation-invariant features for each point. Our network can be applied to tasks ranging from object classification and part segmentation to 3D feature matching and label alignment. We show that, on datasets with randomly rotated point clouds, PRIN demonstrates better performance than state-of-the-art methods without any data augmentation. We also provide theoretical analysis of the rotation-invariance achieved by our methods.

NeurIPS Conference 2020 Conference Paper

Sampling-Decomposable Generative Adversarial Recommender

  • Binbin Jin
  • Defu Lian
  • Zheng Liu
  • Qi Liu
  • Jianhui Ma
  • Xing Xie
  • Enhong Chen

Recommendation techniques are important approaches for alleviating information overload. Often trained on implicit user feedback, many recommenders suffer from the sparsity challenge due to the lack of explicitly negative samples. GAN-style recommenders (i.e., IRGAN) address the challenge by learning a generator and a discriminator adversarially, such that the generator produces increasingly difficult samples for the discriminator, accelerating the optimization of the discrimination objective. However, producing samples from the generator is very time-consuming, and our empirical study shows that the discriminator performs poorly in top-k item recommendation. To this end, a theoretical analysis is made for the GAN-style algorithms, showing that a generator of limited capacity diverges from the optimal generator. This may explain the limitation of the discriminator's performance. Based on these findings, we propose a Sampling-Decomposable Generative Adversarial Recommender (SD-GAR). In this framework, the divergence between the generator and the optimum is compensated by self-normalized importance sampling, and the efficiency of sample generation is improved with a sampling-decomposable generator, such that each sample can be generated in O(1) time with the Vose-Alias method. Interestingly, due to the decomposability of sampling, the generator can be optimized with closed-form solutions in an alternating manner, differing from the policy gradient used in GAN-style algorithms. We extensively evaluate the proposed algorithm on five real-world recommendation datasets. The results show that SD-GAR outperforms IRGAN by 12.4% and the SOTA recommender by 10% on average. Moreover, discriminator training can be 20x faster on the dataset with more than 120K items.
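The O(1)-per-sample property the abstract attributes to the Vose-Alias method comes from a precomputed alias table; this is the standard construction, shown for reference, not code from SD-GAR itself:

```python
import random

def build_alias(probs):
    """Vose's alias method: O(n) table construction, after which each
    draw from the discrete distribution costs O(1)."""
    n = len(probs)
    scaled = [p * n for p in probs]
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    prob, alias = [0.0] * n, [0] * n
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]            # donate mass to fill column s
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:                     # leftovers are exactly full columns
        prob[i] = 1.0
    return prob, alias

def draw(prob, alias, rng=random):
    """O(1): pick a column uniformly, then flip one biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

With more than 120K items, replacing an O(n) (or even O(log n)) sampler with this table is what makes per-sample generation constant-time.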

IJCAI Conference 2020 Conference Paper

Smart Contract Vulnerability Detection using Graph Neural Network

  • Yuan Zhuang
  • Zhenguang Liu
  • Peng Qian
  • Qi Liu
  • Xiang Wang
  • Qinming He

The security problems of smart contracts have drawn extensive attention due to the enormous financial losses caused by vulnerabilities. Existing methods on smart contract vulnerability detection heavily rely on fixed expert rules, leading to low detection accuracy. In this paper, we explore using graph neural networks (GNNs) for smart contract vulnerability detection. Particularly, we construct a contract graph to represent both syntactic and semantic structures of a smart contract function. To highlight the major nodes, we design an elimination phase to normalize the graph. Then, we propose a degree-free graph convolutional neural network (DR-GCN) and a novel temporal message propagation network (TMP) to learn from the normalized graphs for vulnerability detection. Extensive experiments show that our proposed approach significantly outperforms state-of-the-art methods in detecting three different types of vulnerabilities.

AAAI Conference 2019 Conference Paper

A Radical-Aware Attention-Based Model for Chinese Text Classification

  • Hanqing Tao
  • Shiwei Tong
  • Hongke Zhao
  • Tong Xu
  • Binbin Jin
  • Qi Liu

In recent years, Chinese text classification has attracted more and more research attention. However, most existing techniques, which specifically target English materials, may lose effectiveness on this task due to the huge differences between Chinese and English. Actually, as a special kind of hieroglyphics, Chinese characters and radicals are semantically useful but still unexplored for text classification. To that end, in this paper, we first analyze the motives for using multi-granularity features to represent a Chinese text by inspecting the characteristics of radicals, characters and words. To better represent Chinese text and then implement Chinese text classification, we propose a novel Radical-aware Attention-based Four-Granularity (RAFG) model to take full advantage of Chinese characters, words, character-level radicals, and word-level radicals simultaneously. Specifically, RAFG applies a serialized BLSTM structure which is context-aware and able to capture long-range information, modeling the character-sharing property of Chinese and the sequence characteristics of texts. Further, we design an attention mechanism to enhance the effects of radicals and thus model the radical-sharing property when integrating granularities. Finally, we conduct extensive experiments, where the experimental results not only show the superiority of our model, but also validate the effectiveness of radicals in the task of Chinese text classification.

AAAI Conference 2019 Conference Paper

Estimating the Days to Success of Campaigns in Crowdfunding: A Deep Survival Perspective

  • Binbin Jin
  • Hongke Zhao
  • Enhong Chen
  • Qi Liu
  • Yong Ge

Crowdfunding is an emerging mechanism for entrepreneurs or individuals to solicit funding from the public for their creative ideas. However, on these platforms, quite a large proportion of campaigns (projects) fail to raise enough money from backers' support by the declared expiration date, so it is urgent to predict the exact success time of campaigns. This problem has not been well explored due to a series of domain and technical challenges. In this paper, we observe that the distribution of backing behaviors, an implicit factor, has a positive impact on estimating the success time of a campaign. Therefore, we present a focused study on two specific tasks, i.e., backing distribution prediction and success time prediction of campaigns. Specifically, we propose a Seq2seq-based model with Multi-facet Priors (SMP), which can integrate heterogeneous features to jointly model the backing distribution and success time. Additionally, to keep the change of backing distributions smoother as backing behaviors accumulate, we develop a linear evolutionary prior for backing distribution prediction. Furthermore, due to the high failure rate, the success time of most campaigns is unobservable. We model this censoring phenomenon from a survival analysis perspective and also develop a non-increasing prior and a partial prior for success time prediction. Finally, we conduct extensive experiments on a real-world dataset from Indiegogo. Experimental results clearly validate the effectiveness of SMP.

IJCAI Conference 2019 Conference Paper

Explainable Fashion Recommendation: A Semantic Attribute Region Guided Approach

  • Min Hou
  • Le Wu
  • Enhong Chen
  • Zhi Li
  • Vincent W. Zheng
  • Qi Liu

In fashion recommender systems, each product usually consists of multiple semantic attributes (e.g., sleeves, collar, etc.). When making clothing decisions, people usually show preferences for different semantic attributes (e.g., clothes with a v-neck collar). Nevertheless, most previous fashion recommendation models comprehend clothing images with a global content representation and lack a detailed understanding of users' semantic preferences, which usually leads to inferior recommendation performance. To bridge this gap, we propose a novel Semantic Attribute Explainable Recommender System (SAERS). Specifically, we first introduce a fine-grained interpretable semantic space. We then develop a Semantic Extraction Network (SEN) and a Fine-grained Preferences Attention (FPA) module to project items and users into this space, respectively. With SAERS, we are capable of not only providing clothing recommendations for users, but also explaining why we recommend an item through intuitive visual attribute semantic highlights in a personalized manner. Extensive experiments conducted on real-world datasets clearly demonstrate the effectiveness of our approach compared with state-of-the-art methods.

NeurIPS Conference 2019 Conference Paper

Hyperbolic Graph Neural Networks

  • Qi Liu
  • Maximilian Nickel
  • Douwe Kiela

Learning from graph-structured data is an important task in machine learning and artificial intelligence, for which Graph Neural Networks (GNNs) have shown great promise. Motivated by recent advances in geometric representation learning, we propose a novel GNN architecture for learning representations on Riemannian manifolds with differentiable exponential and logarithmic maps. We develop a scalable algorithm for modeling the structural properties of graphs, comparing Euclidean and hyperbolic geometry. In our experiments, we show that hyperbolic GNNs can lead to substantial improvements on various benchmark datasets.
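The differentiable exponential and logarithmic maps mentioned here have simple closed forms at the origin of the Poincaré ball with curvature -1; this sketch shows those standard maps for reference, not the paper's full GNN architecture:

```python
import numpy as np

def exp0(v, eps=1e-9):
    """Exponential map at the origin of the Poincare ball (curvature -1):
    carries a Euclidean tangent vector onto the manifold (norm < 1)."""
    n = np.linalg.norm(v) + eps
    return np.tanh(n) * (v / n)

def log0(x, eps=1e-9):
    """Logarithmic map at the origin: the inverse of exp0, carrying a
    point of the ball back to the tangent space."""
    n = np.linalg.norm(x) + eps
    return np.arctanh(n) * (x / n)
```

A hyperbolic GNN layer can then aggregate in the (Euclidean) tangent space via `log0`, apply ordinary linear maps, and return to the manifold via `exp0`.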

AAAI Conference 2019 Conference Paper

Interactive Attention Transfer Network for Cross-Domain Sentiment Classification

  • Kai Zhang
  • Hefu Zhang
  • Qi Liu
  • Hongke Zhao
  • Hengshu Zhu
  • Enhong Chen

Cross-domain sentiment classification refers to utilizing useful knowledge from a source domain to help sentiment classification in a target domain which has few or no labeled data. Most existing methods mainly concentrate on extracting common features between domains. Unfortunately, they cannot fully consider the effects of the aspect information of sentences (e.g., the battery life in a review of an electronic product). To better solve this problem, we propose an Interactive Attention Transfer Network (IATN) for cross-domain sentiment classification. IATN provides an interactive attention transfer mechanism, which can better transfer sentiment across domains by incorporating information from both sentences and aspects. Specifically, IATN comprises two attention networks: one identifies the common features between domains through domain classification, and the other extracts information from the aspects by using the common features as a bridge. Then, we conduct interactive attention learning for these two networks so that both the sentences and the aspects can influence the final sentiment representation. Extensive experiments on the Amazon reviews dataset and a crowdfunding reviews dataset not only demonstrate the effectiveness and universality of our method, but also give an interpretable way to track the attention information for sentiment.

AAAI Conference 2019 Conference Paper

Molecular Property Prediction: A Multilevel Quantum Interactions Modeling Perspective

  • Chengqiang Lu
  • Qi Liu
  • Chao Wang
  • Zhenya Huang
  • Peize Lin
  • Lixin He

Predicting molecular properties (e.g., atomization energy) is an essential issue in quantum chemistry, and it could speed up much research progress, such as drug design and substance discovery. Traditional approaches based on density functional theory (DFT) in physics have proven time-consuming for predicting large numbers of molecules. Recently, machine learning methods, which incorporate much rule-based information, have also shown potential for this issue. However, the complex inherent quantum interactions of molecules are still largely underexplored by existing solutions. In this paper, we propose a generalizable and transferable Multilevel Graph Convolutional neural Network (MGCN) for molecular property prediction. Specifically, we represent each molecule as a graph to preserve its internal structure. Moreover, the well-designed hierarchical graph neural network directly extracts features from the conformation and spatial information, followed by the multilevel interactions. As a consequence, the multilevel overall representations can be utilized to make the prediction. Extensive experiments on datasets of both equilibrium and off-equilibrium molecules demonstrate the effectiveness of our model. Furthermore, the detailed results also prove that MGCN is generalizable and transferable for the prediction.

IJCAI Conference 2019 Conference Paper

Multi-Group Encoder-Decoder Networks to Fuse Heterogeneous Data for Next-Day Air Quality Prediction

  • Yawen Zhang
  • Qin Lv
  • Duanfeng Gao
  • Si Shen
  • Robert Dick
  • Michael Hannigan
  • Qi Liu

Accurate next-day air quality prediction is essential to enable warning and prevention measures for cities and individuals to cope with potential air pollution, such as vehicle restriction, factory shutdown, and limiting outdoor activities. The problem is challenging because air quality is affected by a diverse set of complex factors. There has been prior work on short-term (e.g., next 6 hours) prediction; however, there is limited research on modeling local weather influences or fusing heterogeneous data for next-day air quality prediction. This paper tackles this problem through three key contributions: (1) we leverage multi-source data, especially high-frequency grid-based weather data, to model air pollutant dynamics at the station level; (2) we add convolution operators on grid weather data to capture the impacts of various weather parameters on air pollutant variations; and (3) we automatically group (cross-domain) features based on their correlations, and propose multi-group Encoder-Decoder networks (MGED-Net) to effectively fuse multiple feature groups for next-day air quality prediction. The experiments with real-world data demonstrate the improved prediction performance of MGED-Net over state-of-the-art solutions (4.2% to 9.6% improvement in MAE and 9.2% to 16.4% improvement in RMSE).

NeurIPS Conference 2019 Conference Paper

Quaternion Knowledge Graph Embeddings

  • Shuai Zhang
  • Yi Tay
  • Lina Yao
  • Qi Liu

In this work, we move beyond traditional complex-valued representations, introducing more expressive hypercomplex representations to model entities and relations for knowledge graph embeddings. More specifically, quaternion embeddings, hypercomplex-valued embeddings with three imaginary components, are utilized to represent entities, and relations are modelled as rotations in quaternion space. The advantages of the proposed approach are: (1) latent inter-dependencies (between all components) are aptly captured with the Hamilton product, encouraging a more compact interaction between entities and relations; (2) quaternions enable expressive rotation in four-dimensional space and have more degrees of freedom than rotation in the complex plane; (3) the proposed framework is a generalization of ComplEx on hypercomplex space while offering better geometrical interpretations, concurrently satisfying the key desiderata of relational representation learning (i.e., modeling symmetry, anti-symmetry and inversion). Experimental results demonstrate that our method achieves state-of-the-art performance on four well-established knowledge graph completion benchmarks.
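The Hamilton product underlying these quaternion embeddings is fully determined by the identities i² = j² = k² = ijk = -1; a direct implementation (illustrative, not the authors' code):

```python
import numpy as np

def hamilton(q, p):
    """Hamilton product of quaternions (a, b, c, d) = a + bi + cj + dk.
    In quaternion embeddings of this kind, a normalized relation
    quaternion rotates the head-entity quaternion via this product."""
    a1, b1, c1, d1 = q
    a2, b2, c2, d2 = p
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,   # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,   # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,   # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,   # k component
    ])
```

Since |qp| = |q||p|, multiplying by a unit quaternion preserves norms, which is the sense in which relations act as rotations in four-dimensional space.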

AAAI Conference 2018 Conference Paper

Confidence-Aware Matrix Factorization for Recommender Systems

  • Chao Wang
  • Qi Liu
  • Runze Wu
  • Enhong Chen
  • Chuanren Liu
  • Xunpeng Huang
  • Zhenya Huang

Collaborative filtering (CF), particularly matrix factorization (MF) based methods, have been widely used in recommender systems. The literature has reported that matrix factorization methods often produce superior accuracy of rating prediction in recommender systems. However, existing matrix factorization methods rarely consider confidence of the rating prediction and thus cannot support advanced recommendation tasks. In this paper, we propose a Confidence-aware Matrix Factorization (CMF) framework to simultaneously optimize the accuracy of rating prediction and measure the prediction confidence in the model. Specifically, we introduce variance parameters for both users and items in the matrix factorization process. Then, prediction interval can be computed to measure confidence for each predicted rating. These confidence quantities can be used to enhance the quality of recommendation results based on Confidence-aware Ranking (CR). We also develop two effective implementations of our framework to compute the confidence-aware matrix factorization for large-scale data. Finally, extensive experiments on three real-world datasets demonstrate the effectiveness of our framework from multiple perspectives.
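One standard way to turn learned user and item variance parameters into a prediction interval is shown below; this is a hedged sketch assuming additive Gaussian noise, and the paper's exact formulation may differ:

```python
import math

def prediction_interval(r_hat, var_user, var_item, z=1.96):
    """If a predicted rating r_hat carries additive user- and
    item-specific variance terms, an approximately 95% Gaussian
    prediction interval follows from the combined standard deviation."""
    sd = math.sqrt(var_user + var_item)
    return r_hat - z * sd, r_hat + z * sd

# A narrower interval marks a more confident prediction, which a
# confidence-aware ranking can then prioritize.
lo, hi = prediction_interval(3.5, 0.04, 0.05)
```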

NeurIPS Conference 2018 Conference Paper

Constrained Graph Variational Autoencoders for Molecule Design

  • Qi Liu
  • Miltiadis Allamanis
  • Marc Brockschmidt
  • Alexander Gaunt

Graphs are ubiquitous data structures for representing interactions between entities. With an emphasis on applications in chemistry, we explore the task of learning to generate graphs that conform to a distribution observed in training data. We propose a variational autoencoder model in which both encoder and decoder are graph-structured. Our decoder assumes a sequential ordering of graph extension steps and we discuss and analyze design choices that mitigate the potential downsides of this linearization. Experiments compare our approach with a wide range of baselines on the molecule generation task and show that our method is successful at matching the statistics of the original dataset on semantically important metrics. Furthermore, we show that by using appropriate shaping of the latent space, our model allows us to design molecules that are (locally) optimal in desired properties.

AAAI Conference 2018 Conference Paper

Exercise-Enhanced Sequential Modeling for Student Performance Prediction

  • Yu Su
  • Qingwen Liu
  • Qi Liu
  • Zhenya Huang
  • Yu Yin
  • Enhong Chen
  • Chris Ding
  • Si Wei

In online education systems, to offer proactive services to students (e.g., personalized exercise recommendation), a crucial demand is to predict student performance (e.g., scores) on future exercising activities. Existing prediction methods mainly exploit the historical exercising records of students, where each exercise is usually represented by its manually labeled knowledge concepts, and the richer information contained in the text descriptions of exercises is still underexplored. In this paper, we propose a novel Exercise-Enhanced Recurrent Neural Network (EERNN) framework for student performance prediction that takes full advantage of both student exercising records and the text of each exercise. Specifically, for modeling the student exercising process, we first design a bidirectional LSTM to learn each exercise representation from its text description without any expert labeling or information loss. Then, we propose a new LSTM architecture to trace student states (i.e., knowledge states) in their sequential exercising process in combination with the exercise representations. For making final predictions, we design two strategies under EERNN, i.e., EERNNM with the Markov property and EERNNA with an Attention mechanism. Extensive experiments on large-scale real-world data clearly demonstrate the effectiveness of the EERNN framework. Moreover, by incorporating exercise correlations, EERNN can well handle the cold-start problem from both the student and exercise perspectives.

TIST Journal 2018 Journal Article

Fuzzy Cognitive Diagnosis for Modelling Examinee Performance

  • Qi Liu
  • Runze Wu
  • Enhong Chen
  • Guandong Xu
  • Yu Su
  • Zhigang Chen
  • Guoping Hu

Recent decades have witnessed the rapid growth of educational data mining (EDM), which aims at automatically extracting valuable information from large repositories of data generated by or related to people's learning activities in educational settings. One of the key EDM tasks is cognitive modelling with examination data, and cognitive modelling tries to profile examinees by discovering their latent knowledge state and cognitive level (e.g., the proficiency of specific skills). However, to the best of our knowledge, the problem of extracting information from both objective and subjective examination problems to achieve more precise and interpretable cognitive analysis remains underexplored. To this end, we propose a fuzzy cognitive diagnosis framework (FuzzyCDF) for examinees' cognitive modelling with both objective and subjective problems. Specifically, to handle the partially correct responses on subjective problems, we first fuzzify the skill proficiency of examinees. Then we combine fuzzy set theory and educational hypotheses to model the examinees' mastery on the problems based on their skill proficiency. Finally, we simulate the generation of examination scores on each problem by considering slip and guess factors. In this way, the whole diagnosis framework is built. For further comprehensive verification, we apply our FuzzyCDF to three classical cognitive assessment tasks, i.e., predicting examinee performance, slip and guess detection, and cognitive diagnosis visualization. Extensive experiments on three real-world datasets for these assessment tasks prove that FuzzyCDF can reveal the knowledge states and cognitive level of examinees effectively and interpretably.
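The slip-and-guess score generation described above follows the classical response model used throughout cognitive diagnosis (the DINA family), here with mastery allowed to be fuzzy; a minimal sketch:

```python
def correct_prob(mastery, slip, guess):
    """Slip-and-guess response model: an examinee who has mastered the
    required skills answers correctly unless they slip, while one who
    has not may still guess right.  `mastery` may be fuzzy, i.e. any
    value in [0, 1] rather than binary."""
    return (1.0 - slip) * mastery + guess * (1.0 - mastery)
```

For example, with slip 0.1 and guess 0.2, a full master answers correctly with probability 0.9 and a complete non-master with probability 0.2; fuzzy mastery interpolates between the two.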

AAAI Conference 2018 Conference Paper

Multi-Modal Multi-Task Learning for Automatic Dietary Assessment

  • Qi Liu
  • Yue Zhang
  • Zhenguang Liu
  • Ye Yuan
  • Li Cheng
  • Roger Zimmermann

We investigate the task of automatic dietary assessment: given meal images and descriptions uploaded by real users, our task is to automatically rate the meals and deliver advisory comments for improving users’ diets. To address this practical yet challenging problem, which is multi-modal and multi-task in nature, an end-to-end neural model is proposed. In particular, comprehensive meal representations are obtained from images, descriptions and user information. We further introduce a novel memory network architecture to store meal representations and reason over the meal representations to support predictions. Results on a real-world dataset show that our method outperforms two strong image captioning baselines significantly.

IJCAI Conference 2018 Conference Paper

Patent Litigation Prediction: A Convolutional Tensor Factorization Approach

  • Qi Liu
  • Han Wu
  • Yuyang Ye
  • Hongke Zhao
  • Chuanren Liu
  • Dongfang Du

Patent litigation is an expensive legal process faced by many companies. To reduce the cost of patent litigation, one effective approach is proactive management based on predictive analysis. However, automatic prediction of patent litigation is still an open problem due to the complexity of lawsuits. In this paper, we propose a data-driven framework, Convolutional Tensor Factorization (CTF), to identify the patents that may cause litigation between two companies. Specifically, CTF is a hybrid modeling approach, where the content features of the patents are represented by a Network embedding-combined Convolutional Neural Network (NCNN) and the lawsuit records of companies are summarized in a tensor, respectively. Then, CTF integrates NCNN and tensor factorization to systematically exploit both content information and collaborative information from large amounts of data. Finally, the risky patents are returned by a learning-to-rank strategy. Extensive experimental results on real-world data demonstrate the effectiveness of our framework.

AAAI Conference 2017 Conference Paper

A Context-Enriched Neural Network Method for Recognizing Lexical Entailment

  • Kun Zhang
  • Enhong Chen
  • Qi Liu
  • Chuanren Liu
  • Guangyi Lv

Recognizing lexical entailment (RLE) always plays an important role in natural language inference, i.e., identifying whether one word entails another; for example, fox entails animal. In the literature, automatically recognizing lexical entailment for word pairs relies deeply on the words' contextual representations. However, as a "prototype" vector, a single representation cannot reveal the multifaceted aspects of words due to their homonymy and polysemy. In this paper, we propose a supervised Context-Enriched Neural Network (CENN) method for recognizing lexical entailment. To be specific, we first utilize multiple embedding vectors from different contexts to represent the input word pairs. Then, through different combination methods and an attention mechanism, we integrate the different embedding vectors and optimize their weights to predict whether there are entailment relations in word pairs. Moreover, our proposed framework is flexible and open to handling different word contexts and entailment perspectives in the text corpus. Extensive experiments on five datasets show that our approach significantly improves the performance of automatic RLE in comparison with several state-of-the-art methods.

IJCAI Conference 2017 Conference Paper

Enhancing Campaign Design in Crowdfunding: A Product Supply Optimization Perspective

  • Qi Liu
  • Guifeng Wang
  • Hongke Zhao
  • Chuanren Liu
  • Tong Xu
  • Enhong Chen

Crowdfunding is an emerging Internet application in which creators design campaigns (projects) to collect funds from public investors. Usually, the limited budget of the creator is manually divided into several perks (reward options), which should fit various market demands and in turn bring different monetary contributions to the campaign. Therefore, it is very challenging for each creator to design an effective campaign. To this end, in this paper, we aim to enhance the funding performance of newly proposed campaigns, with a focus on optimizing the product supply of perks. Specifically, given the expected budget and the perks of a campaign, we propose a novel solution to automatically recommend the optimal product supply for every perk, balancing the expected return of the campaign against its risk. Along this line, we formulate it as a constrained portfolio selection problem, where the risk of each campaign is measured by a multi-task learning method. Finally, experimental results on real-world crowdfunding data clearly show that the optimized product supply can significantly improve campaign performance, and meanwhile, our multi-task learning method can more precisely estimate the risk of each campaign.
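The constrained portfolio view described above can be written, in a standard Markowitz-style form (the exact constraints and risk estimator used in the paper may differ from this sketch), as:

```latex
\max_{x \ge 0} \;\; \mu^{\top} x \;-\; \lambda \, x^{\top} \Sigma \, x
\quad \text{s.t.} \quad \sum_{j} c_j \, x_j \le B,
```

where $x_j$ is the product supply allocated to perk $j$, $\mu$ the expected returns, $\Sigma$ the estimated risk covariance (here, learned via multi-task learning), $c_j$ the unit cost of perk $j$, $B$ the budget, and $\lambda$ the risk-aversion trade-off.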

IJCAI Conference 2017 Conference Paper

Incremental Matrix Factorization: A Linear Feature Transformation Perspective

  • Xunpeng Huang
  • Le Wu
  • Enhong Chen
  • Hengshu Zhu
  • Qi Liu
  • Yijun Wang

Matrix Factorization (MF) is among the most widely used techniques for collaborative filtering based recommendation. Along this line, a critical demand is to incrementally refine the MF models when new ratings come in an online scenario. However, most of existing incremental MF algorithms are limited by specific MF models or strict use restrictions. In this paper, we propose a general incremental MF framework by designing a linear transformation of user and item latent vectors over time. This framework shows a relatively high accuracy with a computation and space efficient training process in an online scenario. Meanwhile, we explain the framework with a low-rank approximation perspective, and give an upper bound on the training error when this framework is used for incremental learning in some special cases. Finally, extensive experimental results on two real-world datasets clearly validate the effectiveness, efficiency and storage performance of the proposed framework.

TIST Journal 2017 Journal Article

P2P Lending Survey

  • Hongke Zhao
  • Yong Ge
  • Qi Liu
  • Guifeng Wang
  • Enhong Chen
  • Hefu Zhang

P2P lending is an emerging Internet-based application where individuals can directly borrow money from each other. The past decade has witnessed the rapid development and prevalence of online P2P lending platforms, examples of which include Prosper, LendingClub, and Kiva. Meanwhile, extensive research has been done that mainly focuses on the studies of platform mechanisms and transaction data. In this article, we provide a comprehensive survey of the research on P2P lending, which, to the best of our knowledge, is the first focused effort in this field. Specifically, we first provide a systematic taxonomy for P2P lending by summarizing the different types of mainstream platforms and comparing their working mechanisms in detail. Then, we review and organize the recent advances in P2P lending from various perspectives (e.g., the economics and sociology perspective, and the data-driven perspective). Finally, we offer our opinions on the prospects of P2P lending and suggest some future research directions in this field. Meanwhile, throughout this article, some analyses of real-world data collected from Prosper and Kiva are also conducted.

AAAI Conference 2017 Conference Paper

Question Difficulty Prediction for READING Problems in Standard Tests

  • Zhenya Huang
  • Qi Liu
  • Enhong Chen
  • Hongke Zhao
  • Mingyong Gao
  • Si Wei
  • Yu Su
  • Guoping Hu

Standard tests aim to evaluate the performance of examinees using different tests with consistent difficulties. Thus, a critical demand is to predict the difficulty of each test question before the test is conducted. Existing studies are usually based on the judgments of education experts (e.g., teachers), which may be subjective and labor-intensive. In this paper, we propose a novel Test-aware Attention-based Convolutional Neural Network (TACNN) framework to automatically solve this Question Difficulty Prediction (QDP) task for READING problems (a typical problem style in English tests) in standard tests. Specifically, given the abundant historical test logs and text materials of questions, we first design a CNN-based architecture to extract sentence representations for the questions. Then, we utilize an attention strategy to quantify the difficulty contribution of each sentence to a question. Considering the incomparability of question difficulties across different tests, we propose a test-dependent pairwise strategy for training TACNN and generating the difficulty prediction value. Extensive experiments on a real-world dataset not only show the effectiveness of TACNN, but also give interpretable insights by tracking the attention information for questions.

AAAI Conference 2016 Conference Paper

Modeling Users’ Preferences and Social Links in Social Networking Services: A Joint-Evolving Perspective

  • Le Wu
  • Yong Ge
  • Qi Liu
  • Enhong Chen
  • Bai Long
  • Zhenya Huang

Researchers have long agreed that the evolution of a Social Networking Service (SNS) platform is driven by the interplay between users’ preferences (reflected in user-item consumption behavior) and the social network structure (reflected in user-user interaction behavior), with both kinds of user behavior changing over time. However, traditional approaches either modeled these two kinds of behavior in isolation or relied on a static assumption about the SNS. Thus, it is still unclear how users’ historical preferences and the dynamic social network structure jointly affect the evolution of SNSs. Furthermore, can jointly modeling users’ temporal behaviors in SNSs benefit both behavior prediction tasks? In this paper, we leverage the underlying social theories (i.e., social influence and the homophily effect) to investigate the interplay and evolution of SNSs. We propose a probabilistic approach that fuses these social theories for jointly modeling users’ temporal behaviors in SNSs, so the proposed model has both explanatory ability and predictive power. Experimental results on two real-world datasets demonstrate the effectiveness of our proposed model.

IJCAI Conference 2016 Conference Paper

Natural Supervised Hashing

  • Qi Liu
  • Hongtao Lu

Among learning-based hashing methods, supervised hashing tries to find hash codes that preserve the semantic similarities of the original data. Recent years have witnessed much effort devoted to designing objective functions and optimization methods for supervised hashing, in order to improve search accuracy and reduce training cost. In this paper, we propose a very straightforward supervised hashing algorithm and demonstrate its superiority over several state-of-the-art methods. The key idea of our approach is to treat label vectors as binary codes and to learn target codes that have a similar structure to the label vectors. To circumvent direct optimization on large Gram matrices, we identify an inner-product-preserving transformation and use it to bring label vectors and hash codes close together without changing the structure. The optimization process is very efficient and scales well. In our experiments, training 16-bit and 96-bit codes on NUS-WIDE cost only 3 and 6 minutes, respectively.
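
The "labels as target codes" idea can be illustrated in its simplest form: encode each label as a ±1 indicator vector and fit a linear hash function toward it by ridge regression. This is a toy sketch in which the code length equals the number of classes; the paper's inner-product-preserving transformation for longer codes is not shown, and all data here is synthetic:

```python
import numpy as np

def learn_hash(X, labels, n_classes, reg=1.0):
    """Treat +/-1 label-indicator vectors as target codes and fit a
    linear hash function toward them by ridge regression."""
    Xb = np.hstack([X, np.ones((len(X), 1))])       # append a bias column
    Y = -np.ones((len(labels), n_classes))
    Y[np.arange(len(labels)), labels] = 1.0         # +/-1 label codes
    return np.linalg.solve(Xb.T @ Xb + reg * np.eye(Xb.shape[1]), Xb.T @ Y)

def encode(X, W):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.sign(Xb @ W)                          # one bit per class

# Synthetic 3-class data around well-separated centers.
rng = np.random.default_rng(1)
centers = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 3.0]])
labels = rng.integers(0, 3, size=300)
X = centers[labels] + rng.normal(scale=0.5, size=(300, 2))
W = learn_hash(X, labels, n_classes=3)
codes = encode(X, W)    # points of the same class should share a code
```

Because the targets are the label vectors themselves, points of the same class collapse onto (nearly) identical codes, which is exactly the structure-preservation property the abstract describes.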

AAAI Conference 2016 Conference Paper

Reading the Videos: Temporal Labeling for Crowdsourced Time-Sync Videos Based on Semantic Embedding

  • Guangyi Lv
  • Tong Xu
  • Enhong Chen
  • Qi Liu
  • Yi Zheng

Recent years have witnessed the boom of online media-content sharing, which raises significant challenges for effective management and retrieval. Though a large amount of effort has been made, precise retrieval of video shots on certain topics has been largely ignored. At the same time, thanks to the popularity of novel time-sync comments, or so-called “bullet-screen comments”, video semantics can now be combined with timestamps to support further research on temporal video labeling. In this paper, we propose a novel video understanding framework to assign temporal labels to highlighted video shots. Specifically, due to the informal expression of bullet-screen comments, we first propose a temporal deep structured semantic model (T-DSSM) to represent comments as semantic vectors by taking advantage of their temporal correlation. Then, video highlights are recognized and labeled via these semantic vectors in a supervised way. Extensive experiments on a real-world dataset prove that our framework can label video highlights with a significant margin over baselines, which clearly validates the potential of our framework for video understanding as well as bullet-screen comment interpretation.

TIST Journal 2016 Journal Article

Relevance Meets Coverage

  • Le Wu
  • Qi Liu
  • Enhong Chen
  • Nicholas Jing Yuan
  • Guangming Guo
  • Xing Xie

Collaborative filtering (CF) models offer users personalized recommendations by measuring the relevance between the active user and each individual candidate item. Following this idea, user-based collaborative filtering (UCF) usually selects the locally popular items from the like-minded neighbor users. However, these traditional relevance-based models only consider the individuals (i.e., each neighbor user and candidate item) separately during neighbor set selection and recommendation set generation, thus usually incurring highly similar recommendations that lack diversity. While many researchers have recognized the importance of diversified recommendations, the proposed solutions either needed additional semantic information about items or sacrificed accuracy in the process. In this article, we describe how to generate both accurate and diversified recommendations from a new perspective. Along this line, we first introduce a simple measure of coverage that quantifies the usefulness of the whole set, that is, the neighbor user set and the recommended item set as a complete entity. Then we propose a recommendation framework named REC that considers both traditional relevance-based scores and the new coverage measure based on UCF. Under REC, we further prove that the goals of maximizing relevance and coverage measures simultaneously in both the neighbor set selection step and the recommendation set generation step are NP-hard. Luckily, we can solve them effectively and efficiently by exploiting the inherent submodular property. Furthermore, we generalize the coverage notion and the REC framework from both a data perspective and an algorithm perspective. Finally, extensive experimental results on three real-world datasets show that the REC-based recommendation models can naturally generate more diversified recommendations without decreasing accuracy compared to some state-of-the-art models.

IROS Conference 2015 Conference Paper

Adaptive motor patterns and reflexes for bipedal locomotion on rough terrain

  • Qi Liu
  • Jie Zhao 0001
  • Steffen Schütz
  • Karsten Berns

The Bio-inspired Behavior-Based Bipedal Locomotion Control (B4LC) system consists of control units encapsulating feed-forward and feedback mechanisms, namely motor patterns and reflexes. To optimize the performance of motor patterns and reflexes for stable locomotion on both even and uneven terrain, we present a learning scheme embedded in the B4LC system. Combining the Particle Swarm Optimization (PSO) method and the Expectation-Maximization-based Reinforcement Learning (EM-RL) method, the learning unit comprises an optimization module and a learning module embedded in the hierarchical control structure. The optimization module optimizes the motor patterns at the hip and ankle joints with respect to energy consumption, stability, and velocity control. The learning module generates compensating torques against disturbances at the ankle joints by combining basis functions derived from state information with the policy parameters. The optimization and learning procedures are conducted on a simulated robot with 21 DoFs. The simulation results show that the robot with optimized motor patterns and learned reflexes achieves more robust and stable locomotion on even and uneven terrain.

IJCAI Conference 2015 Conference Paper

Cognitive Modelling for Predicting Examinee Performance

  • Runze Wu
  • Qi Liu
  • Yuping Liu
  • Enhong Chen
  • Yu Su
  • Zhigang Chen
  • Guoping Hu

Cognitive modelling can discover the latent characteristics of examinees for predicting their performance (i.e., scores) on each problem. As cognitive modelling is important for numerous applications, e.g., personalized remedy recommendation, some solutions have been designed in the literature. However, the problem of extracting information from both objective and subjective problems to obtain more precise and interpretable cognitive analysis is still underexplored. To this end, we propose a fuzzy cognitive diagnosis framework (FuzzyCDF) for examinees’ cognitive modelling with both objective and subjective problems. Specifically, to handle the partially correct responses on subjective problems, we first fuzzify the skill proficiency of examinees. Then, we combine fuzzy set theory and educational hypotheses to model the examinees’ mastery of the problems. Further, we simulate the generation of examination scores by considering both slip and guess factors. Extensive experiments on three real-world datasets prove that FuzzyCDF can predict examinee performance more effectively, and that its output is also interpretable.
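
The slip-and-guess scoring step can be illustrated with a DINA-style formula: a fuzzy mastery level is aggregated from the proficiencies of the required skills, then blended with slip and guess probabilities. This is a minimal sketch with illustrative numbers; the paper's full generative model, including partial credit on subjective problems, is not reproduced here:

```python
import numpy as np

def mastery(proficiency, q_row, conjunctive=True):
    """Fuzzy mastery of a problem: intersection (min) over the required
    skills for conjunctive skills, union (max) for compensatory ones."""
    required = proficiency[q_row.astype(bool)]
    return required.min() if conjunctive else required.max()

def p_correct(eta, slip, guess):
    """Probability of a correct response given mastery eta,
    with slip and guess factors."""
    return (1 - slip) * eta + guess * (1 - eta)

prof = np.array([0.9, 0.4, 0.8])   # fuzzy proficiency on three skills
q = np.array([1, 1, 0])            # the problem requires skills 0 and 1
eta = mastery(prof, q)             # limited by the weakest skill: 0.4
prob = p_correct(eta, slip=0.1, guess=0.2)   # 0.9*0.4 + 0.2*0.6 = 0.48
```

Fuzzifying proficiency makes eta continuous in [0, 1], which is what allows partially correct responses to be modeled rather than only right/wrong outcomes.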

IJCAI Conference 2015 Conference Paper

Matrix Factorization with Scale-Invariant Parameters

  • Guangxiang Zeng
  • Hengshu Zhu
  • Qi Liu
  • Ping Luo
  • Enhong Chen
  • Tong Zhang

Tuning hyper-parameters for large-scale matrix factorization (MF) is very time-consuming and sometimes unacceptable. Intuitively, we would like to tune hyper-parameters on a small sub-matrix sample and then apply them to the original large-scale matrix. However, most existing MF methods are scale-variant, meaning the optimal hyper-parameters usually change with the scale of the matrix. To this end, in this paper we propose a scale-invariant parametric MF method, where a set of scale-invariant parameters is defined for model-complexity regularization. The proposed method thus frees us from tuning hyper-parameters on the large-scale matrix and achieves good performance more efficiently. Extensive experiments on real-world datasets clearly validate both the effectiveness and the efficiency of our method.

IJCAI Conference 2015 Conference Paper

Maximizing the Coverage of Information Propagation in Social Networks

  • Zhefeng Wang
  • Enhong Chen
  • Qi Liu
  • Yu Yang
  • Yong Ge
  • Biao Chang

Social networks, due to their popularity, have been studied extensively in recent years. A rich body of this work concerns influence maximization, which aims to select a set of seed nodes that maximizes the expected number of active nodes at the end of the propagation process. However, the set of active nodes cannot fully represent the true coverage of information propagation: a node may be informed of the information when any of its neighbours becomes active and tries to activate it, even though this node (namely, an informed node) remains inactive. Therefore, we need to consider both active nodes and informed nodes that are aware of the information when studying the coverage of information propagation in a network. Along this line, in this paper we propose a new problem called Information Coverage Maximization, which aims to maximize the expected number of both active and informed nodes. After proving that this problem is NP-hard and that its objective is submodular under the independent cascade model, we design two algorithms to solve it. Extensive experiments on three real-world data sets demonstrate the performance of the proposed algorithms.
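
The active-versus-informed distinction can be made concrete with a Monte-Carlo simulation of the independent cascade model that counts both kinds of nodes. This is a generic sketch of the coverage objective, not the paper's algorithms; the graph and parameters are toy assumptions:

```python
import random

def ic_coverage(graph, seeds, p=0.2, runs=2000, rng=None):
    """Monte-Carlo estimate of information coverage under the independent
    cascade model: expected number of active + informed nodes.
    graph: dict mapping node -> list of neighbours."""
    rng = rng or random.Random(0)
    total = 0
    for _ in range(runs):
        active, informed, frontier = set(seeds), set(), list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in graph[u]:
                    if v in active:
                        continue
                    informed.add(v)            # v hears about it either way
                    if rng.random() < p:       # activation succeeds w.p. p
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active | informed)
    return total / runs

# Line graph 0-1-2-3 seeded at node 0; with p=0.5 the expected coverage
# is 1 + 1 + 0.5 + 0.25 = 2.75, larger than the expected active count.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
est = ic_coverage(line, seeds=[0], p=0.5)
```

Note that node 1 is always informed even when its activation fails, which is exactly why coverage exceeds the classic influence spread.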

ICRA Conference 2014 Conference Paper

Experimental verification of an approach for disturbance estimation and compensation on a simulated biped during perturbed stance

  • Jie Zhao 0001
  • Qi Liu
  • Steffen Schütz
  • Karsten Berns

Humans show remarkable skill in reactive balance control against unknown disturbances while standing and walking. Though current bipedal robots can walk, run, and step over obstacles, they normally perform in a well-controlled environment. Unexpected perturbations can cause bipedal robots to tumble, as they possess only a limited capability to reject disturbances. Studies of the neurology and psychophysics of human stance attempt to trace how humans deal with external disturbances. This paper introduces a methodology for disturbance estimation and compensation (DEC) for a bipedal robot during standing. Previous psychophysical studies of human self-motion perception indicate that humans estimate and compensate disturbances as follows: first, multi-sensory inputs are fused to provide explicit measures, and estimates of the external disturbances are computed from them; these estimates are then fed into a local feedback control loop that compensates the disturbances. An approach for disturbance estimation and compensation is thus developed according to these psychophysical aspects of human balance. Various experiments are implemented, for instance standing on a rotating plate with varying frequencies and amplitudes, and applying continuous external contact forces to the torso of a bipedal robot. By analyzing and verifying the experimental results on a simulated biped, one can state that the DEC approach successfully serves the bipedal robot during perturbed stance.

TIST Journal 2014 Journal Article

Object-Oriented Travel Package Recommendation

  • Chang Tan
  • Qi Liu
  • Enhong Chen
  • Hui Xiong
  • Xiang Wu

Providing better travel services for tourists is one of the important applications in urban computing. Though many recommender systems have been developed to enhance the quality of travel service, most of them lack a systematic and open framework for dynamically incorporating the multiple types of additional context information that exist in the tourism domain, such as the travel area, season, and price of travel packages. To that end, in this article, we propose an open framework, the Object-Oriented Recommender System (ORS), for developers performing personalized travel package recommendation for tourists. This framework can import all the available additional context information into the travel package recommendation process in a cost-effective way. Specifically, the different types of additional information are extracted and uniformly represented as feature-value pairs. Then, we define the Object, which is the collection of these feature-value pairs. We propose two models that can be used in the ORS framework for extracting the implicit relationships among Objects. The Object-Oriented Topic Model (OTM) can extract the topics conditioned on the intrinsic feature-value pairs of the Objects. The Object-Oriented Bayesian Network (OBN) can effectively infer the co-travel probability of two tourists by counting the co-occurrences of feature-value pairs belonging to different kinds of Objects. Based on the relationships mined by OTM or OBN, the recommendation list is generated by the collaborative filtering method. Finally, we evaluate these two models and the ORS framework on real-world travel package data, and the experimental results show that the ORS framework is more flexible in incorporating additional context information and thus leads to better performance for travel package recommendation. Meanwhile, for feature selection in ORS, we define the feature information entropy, and the experimental results demonstrate that using features with lower entropies usually leads to better recommendation results.

IJCAI Conference 2013 Conference Paper

PageRank with Priors: An Influence Propagation Perspective

  • Biao Xiang
  • Qi Liu
  • Enhong Chen
  • Hui Xiong
  • Yi Zheng
  • Yu Yang

Recent years have witnessed increased interest in measuring authority and modelling influence in social networks. For a long time, PageRank has been widely used for authority computation and has also been adopted as a solid baseline for evaluating social-influence-related applications. However, the connection between authority measurement and influence modelling has not been clearly established. To this end, in this paper, we provide a focused study on the understanding of PageRank as well as the relationship between PageRank and social influence analysis. Along this line, we first propose a linear social influence model and reveal that this model is essentially PageRank with priors. We also show that the authority computation by PageRank can be enhanced with more generalized priors. Moreover, to deal with the computational challenge of PageRank with general priors, we provide an upper bound for identifying the top authoritative nodes. Finally, experimental results on a scientific collaboration network validate the effectiveness of the proposed social influence model.
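
PageRank with a prior is the familiar personalized-PageRank recursion, where the teleport vector plays the role of the prior. A generic power-iteration sketch (the toy graph and parameter values are illustrative, not the paper's linear influence model):

```python
import numpy as np

def pagerank_with_prior(adj, prior, alpha=0.85, iters=200):
    """Power iteration for PageRank with a prior (teleport) vector:
    r <- alpha * P^T r + (1 - alpha) * prior, with P row-stochastic."""
    P = adj / adj.sum(axis=1, keepdims=True)   # assumes no dangling rows
    prior = prior / prior.sum()
    r = prior.copy()
    for _ in range(iters):
        r = alpha * P.T @ r + (1 - alpha) * prior
    return r

# 4-node toy graph (row u lists u's out-links).
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 0, 0]], dtype=float)
uniform = pagerank_with_prior(adj, np.ones(4))                 # classic PageRank
biased = pagerank_with_prior(adj, np.array([1.0, 0.0, 0.0, 0.0]))
```

Concentrating the prior on one node shifts authority mass toward that node and its out-neighbourhood, which is the sense in which generalized priors turn PageRank into an influence measure.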