Arrow Research search

Author name cluster

Gong Cheng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers
1 author row

Possible papers

16

AAAI Conference 2026 Conference Paper

Exploring Modality-Aware Fusion and Decoupled Temporal Propagation for Multi-Modal Object Tracking

  • Shilei Wang
  • Pujian Lai
  • Dong Gao
  • Jifeng Ning
  • Gong Cheng

Most existing multi-modal trackers adopt uniform fusion strategies, overlooking the inherent differences between modalities. Moreover, they propagate temporal information through mixed tokens, leading to entangled and less discriminative temporal representations. To address these limitations, we propose MDTrack, a novel framework for modality-aware fusion and decoupled temporal propagation in multi-modal object tracking. Specifically, for modality-aware fusion, we allocate dedicated experts to each modality (Infrared, Event, Depth, and RGB) to process their respective representations. The gating mechanism within the Mixture of Experts (MoE) then dynamically selects the optimal experts based on the input features, enabling adaptive and modality-specific fusion. For decoupled temporal propagation, we introduce two separate State Space Model (SSM) structures to independently store and update the hidden states of the RGB and X-modal streams, effectively capturing their distinct temporal information. To ensure synergy between the two temporal representations, we incorporate a set of cross-attentions between the input features of the two SSMs, facilitating implicit information exchange. The resulting temporally enriched features are then integrated into the backbone via another set of cross-attentions, enhancing MDTrack's ability to leverage temporal information. Extensive experiments demonstrate the effectiveness of the proposed method: both MDTrack-S (Modality-Specific Training) and MDTrack-U (Unified-Modality Training) achieve state-of-the-art performance across five multi-modal tracking benchmarks.
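The gating step the abstract describes can be illustrated with a generic top-k MoE router in NumPy. This is a minimal sketch of the standard technique, not MDTrack's implementation; all names (`moe_gate`, `gate_weights`, `top_k`) are hypothetical:

```python
import numpy as np

def moe_gate(features, gate_weights, top_k=2):
    """Generic MoE gate: score every expert against the input features
    with a softmax and route to the top-k scoring experts."""
    logits = gate_weights @ features            # one logit per expert
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                        # softmax over experts
    chosen = np.argsort(probs)[-top_k:][::-1]   # top-k experts, best first
    return chosen, probs

# toy setup: 4 modality experts (Infrared, Event, Depth, RGB), 8-dim feature
rng = np.random.default_rng(0)
feat = rng.standard_normal(8)
gates = rng.standard_normal((4, 8))
chosen, probs = moe_gate(feat, gates, top_k=2)
```

In a learned system the gate weights would be trained jointly with the experts; here they are random simply to show the routing mechanics.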

AAAI Conference 2026 Conference Paper

Unified Interaction Consistency Learning for Single-Source Domain-Generalized Object Detection in Urban Scene

  • Peng Zhang
  • Xiang Yuan
  • Gong Cheng

Domain generalization remains a critical challenge for deploying neural networks, particularly for out-of-distribution object detection. The distributional discrepancy between training conditions (e.g., daytime-sunny) and realistic deployment conditions (e.g., night-rainy) inevitably produces imprecise localization and wrong classification. To address these issues, we propose a unified interaction consistency learning (UICL) framework, a novel single-source domain-generalized method designed to learn intra-class domain-invariant representations. Specifically, we put forth a cross-domain interaction mechanism that exchanges region proposals between the original and augmented pipelines, enriching the diversity of instance-level representations. Building upon this, we propose prediction-guided consistency learning to unify the interaction mechanism and harmonize the cross-domain representations, contributing to a discriminative prediction distribution under domain shift. In addition, we devise a cyclic interaction resilient detection strategy, which mitigates inaccurate predictions caused by partial occlusion and ambiguous boundaries across domains. Extensive experiments demonstrate that UICL significantly improves the robustness of detectors on several target domains, achieving state-of-the-art generalization performance on the diverse weather benchmark.

AAAI Conference 2026 Conference Paper

UQ-ViT: Harmonizing Extreme Activations with Hardware-Friendly Uniform Quantization in Vision Transformers

  • Tao Jiang
  • Yucheng Jiang
  • Xiwen Yao
  • Gong Cheng
  • Junwei Han

Post-Training Quantization enables efficient deployment of Vision Transformers (ViTs) from only a small calibration set, and its prevalent use of uniform quantization harnesses AI accelerator matrix cores for high-speed inference. However, applying uniform quantization is fundamentally challenged by the extreme non-uniformity of activation distributions. Specifically, the power-law nature of post-Softmax attention scores and the significant inter-channel variance in post-GELU activations create a dilemma for conventional quantization, which struggles to preserve critical high-magnitude values without sacrificing overall precision. To resolve this core conflict, we introduce UQ-ViT (Uniform Quantization for Vision Transformers), a novel uniform quantization framework designed to reconcile high precision with hardware efficiency. Central to UQ-ViT are two operators: Dynamic Elimination of Maximum (DeMax) and Normalization Quantization (NormQuant). DeMax is a uniform-quantization operator for post-Softmax attention scores: it dynamically eliminates the dominant values from quantization while preserving them, effectively mitigating the quantization loss caused by the extreme values of the power-law distribution. NormQuant applies a per-channel strategy during quantization and reverts to a per-tensor format for dequantization, achieving both high accuracy and computational efficiency. Crucially, it is applicable to any linear layer, enabling effective quantization of post-GELU activations in ViTs. Through extensive experiments on various ViTs and vision tasks, including image classification, object detection, and instance segmentation, we demonstrate that the proposed approach outperforms existing methods, achieving superior accuracy while ensuring hardware friendliness.
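The motivation for per-channel scales — channels with very different magnitudes — can be seen with a minimal symmetric uniform quantizer in NumPy. This is a generic textbook sketch, not the NormQuant operator, and all names are hypothetical:

```python
import numpy as np

def quantize_per_channel(x, n_bits=8):
    """Symmetric uniform quantization with one scale per channel.
    x: (channels, tokens). Returns int8 codes and per-channel scales."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)            # guard all-zero channels
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# two channels with very different magnitudes: a single per-tensor scale
# would crush the small channel, but per-channel scales keep both accurate
x = np.array([[0.01, -0.02, 0.015],
              [5.0, -3.0, 4.0]])
q, s = quantize_per_channel(x)
x_hat = dequantize(q, s)
```

With per-channel scales the reconstruction error of each channel is bounded by half that channel's own step size, independent of the other channels' magnitudes.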

IJCAI Conference 2024 Conference Paper

A Survey on Extractive Knowledge Graph Summarization: Applications, Approaches, Evaluation, and Future Directions

  • Xiaxia Wang
  • Gong Cheng

With the continuous growth of large Knowledge Graphs (KGs), extractive KG summarization becomes a trending task. Aiming at distilling a compact subgraph with condensed information, it facilitates various downstream KG-based tasks. In this survey paper, we are among the first to provide a systematic overview of its applications and define a taxonomy for existing methods from its interdisciplinary studies. Future directions are also laid out based on our extensive and comparative review.

NeurIPS Conference 2024 Conference Paper

MindMerger: Efficiently Boosting LLM Reasoning in non-English Languages

  • Zixian Huang
  • Wenhao Zhu
  • Gong Cheng
  • Lei Li
  • Fei Yuan

Reasoning capabilities are crucial for Large Language Models (LLMs), yet a notable gap exists between English and non-English languages. To bridge this disparity, some works fine-tune LLMs to relearn reasoning capabilities in non-English languages, while others replace non-English inputs with an external model's outputs, such as English translations, to circumvent the challenge of understanding non-English text. Unfortunately, these methods often underutilize the built-in reasoning and language understanding capabilities of LLMs. To better exploit both, we propose a new method, MindMerger, which merges LLMs with the external language understanding capabilities of multilingual models to boost multilingual reasoning performance. Furthermore, a two-step training scheme is introduced that first embeds the external capabilities into LLMs and then trains the collaborative use of the external and built-in capabilities. Experiments on three multilingual reasoning datasets and a language understanding dataset demonstrate that MindMerger consistently outperforms all baselines, especially in low-resource languages. Without updating the parameters of the LLM, average accuracy improves by 6.7 and 8.0 points across all languages and low-resource languages, respectively, on the MGSM dataset.

AAAI Conference 2023 Conference Paper

DyRRen: A Dynamic Retriever-Reranker-Generator Model for Numerical Reasoning over Tabular and Textual Data

  • Xiao Li
  • Yin Zhu
  • Sichen Liu
  • Jiangzhou Ju
  • Yuzhong Qu
  • Gong Cheng

Numerical reasoning over hybrid data containing tables and long texts has recently received research attention from the AI community. To generate an executable reasoning program consisting of math and table operations to answer a question, state-of-the-art methods use a retriever-generator pipeline. However, their retrieval results are static, while different generation steps may rely on different sentences. To attend to the retrieved information that is relevant to each generation step, in this paper, we propose DyRRen, an extended retriever-reranker-generator framework where each generation step is enhanced by a dynamic reranking of retrieved sentences. It outperforms existing baselines on the FinQA dataset.
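The dynamic reranking idea — a different sentence ordering at each generation step — amounts to re-scoring retrieved sentence embeddings against the current decoder state. Below is a toy NumPy illustration of that mechanism with hypothetical names, not DyRRen's model:

```python
import numpy as np

def rerank_per_step(sentence_vecs, step_state):
    """Re-score retrieved sentences against the current decoder state,
    so different generation steps can attend to different sentences."""
    scores = sentence_vecs @ step_state   # similarity per sentence
    return np.argsort(-scores)            # sentence indices, best first

sents = np.array([[1.0, 0.0], [0.0, 1.0]])             # two sentence embeddings
order1 = rerank_per_step(sents, np.array([0.9, 0.1]))  # decoder state, step 1
order2 = rerank_per_step(sents, np.array([0.1, 0.9]))  # decoder state, step 2
```

A static retriever would fix one ranking for the whole program; here the two steps prefer different sentences, which is the behavior the dynamic reranker is designed to exploit.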

IJCAI Conference 2022 Conference Paper

Beyond the Prototype: Divide-and-conquer Proxies for Few-shot Segmentation

  • Chunbo Lang
  • Binfei Tu
  • Gong Cheng
  • Junwei Han

Few-shot segmentation, which aims to segment unseen-class objects given only a handful of densely labeled samples, has received widespread attention from the community. Existing approaches typically follow the prototype learning paradigm to perform meta-inference, which fails to fully exploit the underlying information from support image-mask pairs, resulting in various segmentation failures, e.g., incomplete objects, ambiguous boundaries, and distractor activation. To this end, we propose a simple yet versatile framework in the spirit of divide-and-conquer. Specifically, a novel self-reasoning scheme is first implemented on the annotated support image, and then the coarse segmentation mask is divided into multiple regions with different properties. Leveraging effective masked average pooling operations, a series of support-induced proxies are thus derived, each playing a specific role in conquering the above challenges. Moreover, we devise a unique parallel decoder structure that integrates proxies with similar attributes to boost the discrimination power. Our proposed approach, named divide-and-conquer proxies (DCP), allows for the development of appropriate and reliable information as a guide at the “episode” level, not just about the object cues themselves. Extensive experiments on PASCAL-5i and COCO-20i demonstrate the superiority of DCP over conventional prototype-based approaches (up to 5-10% on average), which also establishes a new state-of-the-art. Code is available at github.com/chunbolang/DCP.
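Masked average pooling, which the abstract uses to derive the support-induced proxies, reduces to averaging feature vectors under a binary mask. A minimal NumPy sketch of the generic operation (hypothetical names, not the DCP code):

```python
import numpy as np

def masked_average_pool(features, mask):
    """Average the feature vectors at positions selected by a binary mask,
    producing one proxy vector for that region."""
    m = mask.astype(bool)
    if not m.any():
        return np.zeros(features.shape[-1])
    return features[m].mean(axis=0)     # (C,) proxy for this region

# toy: divide a coarse 2x2 mask into two regions, pool one proxy per region
feats = np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3)  # (H, W, C)
region_a = np.array([[1, 0], [0, 0]])
region_b = np.array([[0, 1], [1, 1]])
proxy_a = masked_average_pool(feats, region_a)
proxy_b = masked_average_pool(feats, region_b)
```

Dividing the mask into several regions and pooling one proxy per region yields the "series of proxies" the abstract refers to, each summarizing a different part of the support object.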

IJCAI Conference 2021 Conference Paper

Keyword-Based Knowledge Graph Exploration Based on Quadratic Group Steiner Trees

  • Yuxuan Shi
  • Gong Cheng
  • Trung-Kien Tran
  • Jie Tang
  • Evgeny Kharlamov

Exploring complex structured knowledge graphs (KGs) is challenging for non-experts as it requires knowledge of query languages and the underlying structure of the KGs. Keyword-based exploration is a convenient paradigm, and computing a group Steiner tree (GST) as an answer is a popular implementation. Recent studies suggested improving the cohesiveness of an answer where entities have small semantic distances from each other. However, how to efficiently compute such an answer is open. In this paper, to model cohesiveness in a generalized way, the quadratic group Steiner tree problem (QGSTP) is formulated where the cost function extends GST with quadratic terms representing semantic distances. For QGSTP we design a branch-and-bound best-first (B3F) algorithm where we exploit combinatorial methods to estimate lower bounds for costs. This exact algorithm shows practical performance on medium-sized KGs.
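The quadratic cost the abstract describes — GST edge weights plus pairwise semantic distances over the nodes an answer tree touches — can be written down directly. This is a toy evaluation of that cost with hypothetical names, not the B3F search algorithm itself:

```python
import itertools

def qgst_cost(tree_edges, edge_weight, semantic_distance):
    """Cost of a candidate answer tree: sum of edge weights (the GST part)
    plus pairwise semantic distances between nodes (the quadratic part)."""
    nodes = set()
    cost = 0.0
    for u, v in tree_edges:
        cost += edge_weight[(u, v)]
        nodes.update((u, v))
    for a, b in itertools.combinations(sorted(nodes), 2):
        cost += semantic_distance.get((a, b), 0.0)
    return cost

# toy answer tree connecting two keyword nodes q1, q2 through entity e
edges = [("q1", "e"), ("e", "q2")]
w = {("q1", "e"): 1.0, ("e", "q2"): 2.0}                       # edge weights
d = {("e", "q1"): 0.5, ("e", "q2"): 0.3, ("q1", "q2"): 0.2}    # semantic distances
cost = qgst_cost(edges, w, d)
```

The quadratic terms penalize trees whose entities are semantically far apart, which is exactly the cohesiveness criterion the paper formalizes.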

AAAI Conference 2021 Conference Paper

TSQA: Tabular Scenario Based Question Answering

  • Xiao Li
  • Yawei Sun
  • Gong Cheng

Scenario-based question answering (SQA) has attracted an increasing research interest. Compared with the well-studied machine reading comprehension (MRC), SQA is a more challenging task: a scenario may contain not only a textual passage to read but also structured data like tables, i.e., tabular scenario based question answering (TSQA). AI applications of TSQA such as answering multiple-choice questions in high-school exams require synthesizing data in multiple cells and combining tables with texts and domain knowledge to infer answers. To support the study of this task, we construct GeoTSQA. This dataset contains 1k real questions contextualized by tabular scenarios in the geography domain. To solve the task, we extend state-of-the-art MRC methods with TTGen, a novel table-to-text generator. It generates sentences from variously synthesized tabular data and feeds the downstream MRC method with the most useful sentences. Its sentence ranking model fuses the information in the scenario, question, and domain knowledge. Our approach outperforms a variety of strong baseline methods on GeoTSQA.

IJCAI Conference 2020 Conference Paper

Enriching Documents with Compact, Representative, Relevant Knowledge Graphs

  • Shuxin Li
  • Zixian Huang
  • Gong Cheng
  • Evgeny Kharlamov
  • Kalpa Gunaratna

A prominent application of knowledge graphs (KGs) is document enrichment. Existing methods identify mentions of entities in a background KG and enrich documents with entity types and direct relations. We compute an entity relation subgraph (ERG) that can more expressively represent indirect relations among a set of mentioned entities. To find compact, representative, and relevant ERGs for effective enrichment, we propose an efficient best-first search algorithm to solve a new combinatorial optimization problem that achieves a trade-off between representativeness and compactness, and then we exploit ontological knowledge to rank ERGs by entity-based document-KG and intra-KG relevance. Extensive experiments and user studies show the promising performance of our approach.

IJCAI Conference 2020 Conference Paper

Neural Entity Summarization with Joint Encoding and Weak Supervision

  • Junyou Li
  • Gong Cheng
  • Qingxia Liu
  • Wen Zhang
  • Evgeny Kharlamov
  • Kalpa Gunaratna
  • Huajun Chen

In a large-scale knowledge graph (KG), an entity is often described by a large number of triple-structured facts. Many applications require abridged versions of entity descriptions, called entity summaries. Existing solutions to entity summarization are mainly unsupervised. In this paper, we present a supervised approach NEST that is based on our novel neural model to jointly encode graph structure and text in KGs and generate high-quality diversified summaries. Since it is costly to obtain manually labeled summaries for training, our supervision is weak as we train with programmatically labeled data which may contain noise but is free of manual work. Evaluation results show that our approach significantly outperforms the state of the art on two public benchmarks.

AAAI Conference 2020 Conference Paper

SPARQA: Skeleton-Based Semantic Parsing for Complex Questions over Knowledge Bases

  • Yawei Sun
  • Lingling Zhang
  • Gong Cheng
  • Yuzhong Qu

Semantic parsing transforms a natural language question into a formal query over a knowledge base. Many existing methods rely on syntactic parsing like dependencies. However, the accuracy of producing such expressive formalisms is not satisfying on long complex questions. In this paper, we propose a novel skeleton grammar to represent the high-level structure of a complex question. This dedicated coarse-grained formalism with a BERT-based parsing algorithm helps to improve the accuracy of the downstream fine-grained semantic parsing. Besides, to align the structure of a question with the structure of a knowledge base, our multi-strategy method combines sentence-level and word-level semantics. Our approach shows promising performance on several datasets.

IJCAI Conference 2018 Conference Paper

Multi-scale and Discriminative Part Detectors Based Features for Multi-label Image Classification

  • Gong Cheng
  • Decheng Gao
  • Yang Liu
  • Junwei Han

Convolutional neural networks (CNNs) have shown their promise for image classification task. However, global CNN features still lack geometric invariance for addressing the problem of intra-class variations and so are not optimal for multi-label image classification. This paper proposes a new and effective framework built upon CNNs to learn Multi-scale and Discriminative Part Detectors (MsDPD)-based feature representations for multi-label image classification. Specifically, at each scale level, we (i) first present an entropy-rank based scheme to generate and select a set of discriminative part detectors (DPD), and then (ii) obtain a number of DPD-based convolutional feature maps with each feature map representing the occurrence probability of a particular part detector and learn DPD-based features by using a task-driven pooling scheme. The two steps are formulated into a unified framework by developing a new objective function, which jointly trains part detectors incrementally and integrates the learning of feature representations into the classification task. Finally, the multi-scale features are fused to produce the predictions. Experimental results on PASCAL VOC 2007 and VOC 2012 datasets demonstrate that the proposed method achieves better accuracy when compared with the existing state-of-the-art multi-label classification methods.

IJCAI Conference 2017 Conference Paper

Relatedness-based Multi-Entity Summarization

  • Kalpa Gunaratna
  • Amir Hossein Yazdavar
  • Krishnaprasad Thirunarayan
  • Amit Sheth
  • Gong Cheng

Representing world knowledge in a machine processable format is important as entities and their descriptions have fueled tremendous growth in knowledge-rich information processing platforms, services, and systems. Prominent applications of knowledge graphs include search engines (e.g., Google Search and Microsoft Bing), email clients (e.g., Gmail), and intelligent personal assistants (e.g., Google Now, Amazon Echo, and Apple's Siri). In this paper, we present an approach that can summarize facts about a collection of entities by analyzing their relatedness in preference to summarizing each entity in isolation. Specifically, we generate informative entity summaries by selecting: (i) inter-entity facts that are similar and (ii) intra-entity facts that are important and diverse. We employ a constrained knapsack problem solving approach to efficiently compute entity summaries. We perform both qualitative and quantitative experiments and demonstrate that our approach yields promising results compared to two other stand-alone state-of-the-art entity summarization approaches.
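The constrained-knapsack view of summary construction — pick facts that maximize value within a size budget — can be sketched with textbook 0/1 knapsack dynamic programming. This is a generic illustration with hypothetical names, not the paper's exact formulation (which also accounts for inter-entity similarity and diversity):

```python
def knapsack_select(values, costs, budget):
    """0/1 knapsack by dynamic programming: choose a subset of facts whose
    total cost fits the budget while maximizing total value."""
    n = len(values)
    dp = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, c = values[i - 1], costs[i - 1]
        for b in range(budget + 1):
            dp[i][b] = dp[i - 1][b]
            if c <= b:
                dp[i][b] = max(dp[i][b], dp[i - 1][b - c] + v)
    chosen, b = [], budget              # backtrack to recover the subset
    for i in range(n, 0, -1):
        if dp[i][b] != dp[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    return sorted(chosen), dp[n][budget]

# toy: three candidate facts with importance scores and length costs
chosen, best = knapsack_select([6, 10, 12], [1, 2, 3], budget=5)
```

Here the budget plays the role of a summary-size limit and the values stand in for fact importance scores.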

IJCAI Conference 2016 Conference Paper

HIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization

  • Gong Cheng
  • Cheng Jin
  • Yuzhong Qu

The rapid growth of open data on the Web promotes the development of data portals that facilitate finding useful datasets. To help users quickly inspect a dataset found in a portal, we propose to summarize its contents and generate a hierarchical grouping of entities connected by relations. Our generic approach, called HIEDS, considers coverage of the dataset, height of the hierarchy, cohesion within groups, overlap between groups, and homogeneity of groups, and integrates these configurable factors into a combinatorial optimization problem to solve. We present an efficient solution that serves users dynamically configured summaries with acceptable latency. We systematically experiment with our approach on real-world RDF datasets.

IJCAI Conference 2016 Conference Paper

Taking Up the Gaokao Challenge: An Information Retrieval Approach

  • Gong Cheng
  • Weixi Zhu
  • Ziwei Wang
  • Jianghui Chen
  • Yuzhong Qu

Answering questions from a university entrance examination like China's Gaokao challenges AI technology. As a preliminary attempt to take up this challenge, we focus on multiple-choice questions in Gaokao and propose a three-stage approach that exploits and extends information retrieval techniques. Taking Wikipedia as the source of knowledge, our approach obtains knowledge relevant to a question by retrieving pages from Wikipedia via string matching and context-based disambiguation, and then ranks and filters pages using multiple strategies to draw critical evidence, based on which the truth of each option is assessed via relevance-based entailment. It achieves encouraging results on real-life questions from recent history tests, significantly outperforming baseline approaches.