Author name cluster

Zhongming Jin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers

1 author row

AAAI Conference 2026 Conference Paper

Enhancing Spatial Reasoning Through Visual and Textual Thinking

Xun Liang
Xin Guo
Zhongming Jin
Weihang Pan
Penghui Shang
Deng Cai
Binbin Lin
Jieping Ye

The spatial reasoning task aims to reason about the spatial relationships in 2D and 3D space, which is a fundamental capability for Visual Question Answering (VQA) and robotics. Although vision language models (VLMs) have developed rapidly in recent years, they are still struggling with the spatial reasoning task. In this paper, we introduce a method that can enhance Spatial reasoning through Visual and Textual thinking Simultaneously (SpatialVTS). In the spatial visual thinking phase, our model is trained to generate location-related specific tokens of important targets automatically. Not only are the objects mentioned in the problem addressed, but also the potential objects related to the reasoning are considered. During the spatial textual thinking phase, our model conducts long-term thinking based on visual cues and dialogues and gradually inferences the answers to spatial reasoning problems. To effectively support the model's training, we made manual corrections to the existing spatial reasoning dataset, eliminating numerous incorrect labels resulting from automatic annotation, restructuring the data input format to enhance generalization, and developing a reasoning framework for model thinking. Without introducing any additional information (such as masks or depth), our model's overall average level in several spatial understanding tasks has significantly improved compared with other models.

PDF Details DOI

AAAI Conference 2026 Conference Paper

FGD-Align: Pluralistic Alignment for Large Language Models via Fuzzy Group Decision-Making

Weihang Pan
Zhengxu Yu
Yong Wu
Xun Liang
Zhongming Jin
Qiang Fu
Penghui Shang
Binbin Lin

Ensuring alignment with human values is essential for modern large language models (LLMs), especially amid growing concerns around AI safety and social impact. Yet achieving such alignment remains challenging due to the limited, noisy, and often conflicting nature of human feedback from diverse annotators. Most existing approaches, such as Direct Preference Optimization (DPO), assume consistent and conflict-free supervision, overlooking the ambiguity, inconsistency, and value trade-offs inherent in real-world preferences—often leading to reduced robustness and exclusion of minority views. To address this, we propose FGD-Align, a novel pluralistic alignment framework grounded in Fuzzy Group Decision-Making theory. Our approach rigorously models and aggregates human preferences while retaining the complexity of real-world value trade-offs. Unlike traditional methods that rely on coarse-grained preference pairs, FGD-Align introduces fuzzy preference modeling via triangular fuzzy numbers to capture nuanced, multi-criteria human judgments. We further develop a new training objective, Probabilistic Fuzzy DPO, which incorporates fuzzy preference strength as adaptive loss weights and gradient filters, enhancing robustness to ambiguity and inconsistency in feedback. Comprehensive experiments demonstrate that FGD-Align consistently outperforms both DPO variants and advanced preference aggregation methods in terms of preference accuracy and robustness to ambiguity. It achieves superior alignment stability and better preserves minority preferences, all with minimal computational overhead. Our work bridges the gap between algorithmic tractability and the nuanced landscape of human values, enabling more scalable, inclusive, and socially-aware AI alignment.

PDF Details DOI

NeurIPS Conference 2025 Conference Paper

TokenSqueeze: Performance-Preserving Compression for Reasoning LLMs

Yuxiang Zhang
Zhengxu Yu
Weihang Pan
Zhongming Jin
Qiang Fu
Deng Cai
Binbin Lin
Jieping Ye

Emerging reasoning LLMs such as OpenAI-o1 and DeepSeek-R1 have achieved strong performance on complex reasoning tasks by generating long chain-of-thought (CoT) traces. However, these long CoTs result in increased token usage, leading to higher inference latency and memory consumption. As a result, balancing accuracy and reasoning efficiency has become essential for deploying reasoning LLMs in practical applications. Existing long-to-short (Long2Short) methods aim to reduce inference length but often sacrifice accuracy, revealing a need for an approach that maintains performance while lowering token costs. To address this efficiency-accuracy tradeoff, we propose TokenSqueeze, a novel Long2Short method that condenses reasoning paths while preserving performance and relying exclusively on self-generated data. First, to prevent performance degradation caused by excessive compression of reasoning depth, we propose to select self-generated samples whose reasoning depth is adaptively matched to the complexity of the problem. To further optimize the linguistic expression without altering the underlying reasoning paths, we introduce a distribution-aligned linguistic refinement method that enhances the clarity and conciseness of the reasoning path while preserving its logical integrity. Comprehensive experimental results demonstrated the effectiveness of TokenSqueeze in reducing token usage while maintaining accuracy. Notably, DeepSeek‑R1‑Distill‑Qwen‑7B fine-tuned by using our proposed method achieved a 50\% average token reduction while preserving accuracy on the MATH500 benchmark. TokenSqueeze exclusively utilizes the model's self-generated data, enabling efficient and high-fidelity reasoning without relying on manually curated short-answer datasets across diverse applications. Our code is available at \url{https: //github. com/zhangyx1122/TokenSqueeze}.

PDF Details

NeurIPS Conference 2024 Conference Paper

Low Precision Local Training is Enough for Federated Learning

Zhiwei Li
Yiqiu Li
Binbin Lin
Zhongming Jin
Weizhong Zhang

Federated Learning (FL) is a prevalent machine learning paradigm designed to address challenges posed by heterogeneous client data while preserving data privacy. Unlike distributed training, it typically orchestrates resource-constrained edge devices to communicate via a low-bandwidth communication network with a central server. This urges the development of more computation and communication efficient training algorithms. In this paper, we propose an efficient FL paradigm, where the local models in the clients are trained with low-precision operations and communicated with the server in low precision format, while only the model aggregation in the server is performed with high-precision computation. We surprisingly find that high precision models can be recovered from the low precision local models with proper aggregation in the server. In this way, both the workload in the client-side and the communication cost can be significantly reduced. We theoretically show that our proposed paradigm can converge to the optimal solution as the training goes on, which demonstrates that low precision local training is enough for FL. Our paradigm can be integrated with existing FL algorithms flexibly. Experiments across extensive benchmarks are conducted to showcase the effectiveness of our proposed method. Notably, the models trained by our method with the precision as low as 8 bits are comparable to those from the full precision training. As a by-product, we show that low precision local training can relieve the over-fitting issue in local training, which under heterogeneous client data can cause the client models drift further away from each other and lead to the failure in model aggregation. Code is released at https: //github. com/digbangbang/LPT-FL.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Plan-on-Graph: Self-Correcting Adaptive Planning of Large Language Model on Knowledge Graphs

Liyi Chen
Panrong Tong
Zhongming Jin
Ying Sun
Jieping Ye
Hui Xiong

Large Language Models (LLMs) have shown remarkable reasoning capabilities on complex tasks, but they still suffer from out-of-date knowledge, hallucinations, and opaque decision-making. In contrast, Knowledge Graphs (KGs) can provide explicit and editable knowledge for LLMs to alleviate these issues. Existing paradigm of KG-augmented LLM manually predefines the breadth of exploration space and requires flawless navigation in KGs. However, this paradigm cannot adaptively explore reasoning paths in KGs based on the question semantics and self-correct erroneous reasoning paths, resulting in a bottleneck in efficiency and effect. To address these limitations, we propose a novel self-correcting adaptive planning paradigm for KG-augmented LLM named Plan-on-Graph (PoG), which first decomposes the question into several sub-objectives and then repeats the process of adaptively exploring reasoning paths, updating memory, and reflecting on the need to self-correct erroneous reasoning paths until arriving at the answer. Specifically, three important mechanisms of Guidance, Memory, and Reflection are designed to work together, to guarantee the adaptive breadth of self-correcting planning for graph reasoning. Finally, extensive experiments on three real-world datasets demonstrate the effectiveness and efficiency of PoG.

PDF Details DOI

IJCAI Conference 2021 Conference Paper

CIMON: Towards High-quality Hash Codes

Xiao Luo
Daqing Wu
Zeyu Ma
Chong Chen
Minghua Deng
Jinwen Ma
Zhongming Jin
Jianqiang Huang

Recently, hashing is widely used in approximate nearest neighbor search for its storage and computational efficiency. Most of the unsupervised hashing methods learn to map images into semantic similarity-preserving hash codes by constructing local semantic similarity structure from the pre-trained model as the guiding information, i. e. , treating each point pair similar if their distance is small in feature space. However, due to the inefficient representation ability of the pre-trained model, many false positives and negatives in local semantic similarity will be introduced and lead to error propagation during the hash code learning. Moreover, few of the methods consider the robustness of models, which will cause instability of hash codes to disturbance. In this paper, we propose a new method named Comprehensive sImilarity Mining and cOnsistency learNing (CIMON). First, we use global refinement and similarity statistical distribution to obtain reliable and smooth guidance. Second, both semantic and contrastive consistency learning are introduced to derive both disturb-invariant and discriminative hash codes. Extensive experiments on several benchmark datasets show that the proposed method outperforms a wide range of state-of-the-art methods in both retrieval performance and robustness.

PDF Details DOI

AAAI Conference 2021 Conference Paper

Traffic Flow Prediction with Vehicle Trajectories

Mingqian Li
Panrong Tong
Mo Li
Zhongming Jin
Jianqiang Huang
Xian-Sheng Hua

This paper proposes a spatiotemporal deep learning framework, Trajectory-based Graph Neural Network (TrGNN), that mines the underlying causality of flows from historical vehicle trajectories and incorporates that into road traffic prediction. The vehicle trajectory transition patterns are studied to explicitly model the spatial traffic demand via graph propagation along the road network; an attention mechanism is designed to learn the temporal dependencies based on neighborhood traffic status; and finally, a fusion of multi-step prediction is integrated into the graph neural network design. The proposed approach is evaluated with a real-world trajectory dataset. Experiment results show that the proposed TrGNN model achieves over 5% error reduction when compared with the state-of-the-art approaches across all metrics for normal traffic, and up to 14% for atypical traffic during peak hours or abnormal events. The advantage of trajectory transitions especially manifest itself in inferring high fluctuation of flows as well as non-recurrent flow patterns.

PDF Details

IJCAI Conference 2020 Conference Paper

MaCAR: Urban Traffic Light Control via Active Multi-agent Communication and Action Rectification

Zhengxu Yu
Shuxian Liang
Long Wei
Zhongming Jin
Jianqiang Huang
Deng Cai
Xiaofei He
Xian-Sheng Hua

Urban traffic light control is an important and challenging real-world problem. By regarding intersections as agents, most of the Reinforcement Learning (RL) based methods generate actions of agents independently. They can cause action conflict and result in overflow or road resource waste in adjacent intersections. Recently, some collaborative methods have alleviated the above problems by extending the observable surroundings of agents, which can be considered as inactive cross-agent communication methods. However, when agents act synchronously in these works, the perceived action value is biased and the information exchanged is insufficient. In this work, we propose a novel Multi-agent Communication and Action Rectification (MaCAR) framework. It enables active communication between agents by considering the impact of synchronous actions of agents. MaCAR consists of two parts: (1) an active Communication Agent Network (CAN) involving a Message Propagation Graph Neural Network (MPGNN); (2) a Traffic Forecasting Network (TFN) which learns to predict the traffic after agents' synchronous actions and the corresponding action values. By using predicted information, we mitigate the action value bias during training to help rectify agents' future actions. In experiments, we show that our proposal can outperforms state-of-the-art methods on both synthetic and real-world datasets.

PDF Details DOI

IJCAI Conference 2019 Conference Paper

Progressive Transfer Learning for Person Re-identification

Zhengxu Yu
Zhongming Jin
Long Wei
Jishun Guo
Jianqiang Huang
Deng Cai
Xiaofei He
Xian-Sheng Hua

Model fine-tuning is a widely used transfer learning approach in person Re-identification (ReID) applications, which fine-tuning a pre-trained feature extraction model into the target scenario instead of training a model from scratch. It is challenging due to the significant variations inside the target scenario, e. g. , different camera viewpoint, illumination changes, and occlusion. These variations result in a gap between the distribution of each mini-batch and the distribution of the whole dataset when using mini-batch training. In this paper, we study model fine-tuning from the perspective of the aggregation and utilization of the global information of the dataset when using mini-batch training. Specifically, we introduce a novel network structure called Batch-related Convolutional Cell (BConv-Cell), which progressively collects the global information of the dataset into a latent state and uses this latent state to rectify the extracted feature. Based on BConv-Cells, we further proposed the Progressive Transfer Learning (PTL) method to facilitate the model fine-tuning process by joint training the BConv-Cells and the pre-trained ReID model. Empirical experiments show that our proposal can improve the performance of the ReID model greatly on MSMT17, Market-1501, CUHK03 and DukeMTMC-reID datasets. The code will be released later on at \url{https: //github. com/ZJULearning/PTL}

PDF Details

IJCAI Conference 2013 Conference Paper

A Unified Approximate Nearest Neighbor Search Scheme by Combining Data Structure and Hashing

Debing Zhang
Genmao Yang
Yao Hu
Zhongming Jin
Deng Cai
Xiaofei He

Nowadays, Nearest Neighbor Search becomes more and more important when facing the challenge of big data. Traditionally, to solve this problem, researchers mainly focus on building effective data structures such as hierarchical k-means tree or using hashing methods to accelerate the query process. In this paper, we propose a novel unified approximate nearest neighbor search scheme to combine the advantages of both the effective data structure and the fast Hamming distance computation in hashing methods. In this way, the searching procedure can be further accelerated. Computational complexity analysis and extensive experiments have demonstrated the effectiveness of our proposed scheme.

PDF Details DOI

IJCAI Conference 2013 Conference Paper

Active Learning via Neighborhood Reconstruction

Yao Hu
Debing Zhang
Zhongming Jin
Deng Cai
Xiaofei He

In many real world scenarios, active learning methods are used to select the most informative points for labeling to reduce the expensive human action. One direction for active learning is selecting the most representative points, ie. , selecting the points that other points can be approximated by linear combination of the selected points. However, these methods fails to consider the local geometrical information of the data space. In this paper, we propose a novel framework named Active Learning via Neighborhood Reconstruction (ALNR) by taking into account the locality information directly during the selection. Speciﬁcally, for the linear reconstruction of target point, the nearer neighbors should have a greater effect and the selected points distant from the target point should be penalized severely. We further develop an efﬁcient two-stage iterative procedure to solve the ﬁnal optimization problem. Our empirical study shows encouraging results of the proposed algorithms in comparison to other state-of-the-art active learning algorithms on both synthetic and real visual data sets.

PDF Details DOI