Arrow Research search

Author name cluster

Jie Jiang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers
2 author rows

Possible papers (17)

AAAI Conference 2026 Conference Paper

Cross-Scale Collaboration between LLMs and Lightweight Sequential Recommenders with Domain-Specific Latent Reasoning

  • Yipeng Zhang
  • Xin Wang
  • Hong Chen
  • Junwei Pan
  • Qian Li
  • Jun Zhang
  • Jie Jiang
  • Hong Mei

Sequential recommendation aims to predict the next item based on historical interactions. To further enhance the reasoning capability in sequential recommendation, LLMs are employed to predict the next item or generate semantic IDs for item representation, given LLMs' extensive domain knowledge and reasoning ability. However, existing LLM-based methods suffer from two limitations. (i) The scarcity of recommendation data with reasoning paths makes it challenging to design suitable chain-of-thought prompting templates, and the full potential of LLMs' reasoning abilities remains underutilized. (ii) Upon obtaining semantic IDs, the LLMs and their representations are excluded from the subsequent recommendation model training, preventing downstream models from fully utilizing the rich semantic information encoded within these IDs. To address these issues, we propose a novel CoderRec framework, which is capable of fully exploiting the information encoded in semantic IDs to guide the recommendation process. Specifically, to address the problem of scarcity in reasoning path-augmented data, we introduce latent reasoning into sequential recommendation and treat the representation captured by the downstream model as domain-specific latent thought, enabling implicit logical inference without requiring explicit CoT annotations. To ensure that the downstream recommendation models are able to deeply leverage the semantic information within IDs, we propose a novel cross-scale model collaboration strategy, which employs cross-scale IDs and a two-phase approach to align LLM-derived semantics with recommendation objectives. Extensive experiments have shown the effectiveness of our proposed CoderRec framework.

IROS Conference 2025 Conference Paper

Emergent Cooperative Strategies for Pursuit-Evasion in Cluttered Environments: A Knowledge-Enhanced Multi-Agent Deep Reinforcement Learning Approach

  • Yihao Sun
  • Chao Yan
  • Han Zhou
  • Xiaojia Xiang
  • Jie Jiang

Deep reinforcement learning (DRL) has recently emerged as a promising tool for tackling pursuit-evasion tasks. However, most existing DRL-based pursuit approaches still rely on individual rewards and struggle with complex scenarios. To address these challenges, we propose a knowledge-enhanced DRL approach for multi-agent pursuit-evasion in complex environments. Specifically, the cooperative pursuit problem is modeled as a decentralized partially observable Markov decision process from each pursuer's perspective, where the team reward function is elaborately designed to encourage collaborative behavior and enhance team coordination. Then, a novel knowledge-enhanced multi-agent twin delayed deep deterministic policy gradient (KE-MATD3) algorithm is presented to efficiently learn the cooperative pursuit policy. By integrating a knowledge enhancement mechanism that extracts effective information from an improved artificial potential field method, the cooperative pursuit policy achieves more robust convergence, mitigating the local optima that typically arise from individual reward-based learning. Finally, extensive numerical simulations and real-world experiments validate the efficiency and superiority of the proposed approach, demonstrating emergent cooperative behaviors among the pursuers.
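The knowledge-enhancement mechanism injects guidance from an artificial potential field. A minimal sketch of the classical APF force the abstract refers to (the function name, gains `k_att`/`k_rep`, and cutoff `d0` are illustrative assumptions; the paper's improved variant is not detailed here):

```python
import numpy as np

def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=100.0, d0=2.0):
    """Classical artificial potential field: attractive pull toward the
    goal plus repulsive push away from obstacles closer than d0."""
    force = k_att * (goal - pos)  # attractive term
    for obs in obstacles:
        diff = pos - obs
        d = np.linalg.norm(diff)
        if 0 < d < d0:
            # repulsive term grows sharply as the agent nears the obstacle
            force += k_rep * (1.0 / d - 1.0 / d0) / d**2 * (diff / d)
    return force
```

Obstacles farther than `d0` exert no force, so the agent heads straight for the goal in open space.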

JBHI Journal 2025 Journal Article

MedFILIP: Medical Fine-Grained Language-Image Pre-Training

  • Xinjie Liang
  • Xiangyu Li
  • Fanding Li
  • Jie Jiang
  • Qing Dong
  • Wei Wang
  • Kuanquan Wang
  • Suyu Dong

Medical vision-language pretraining (VLP) that leverages naturally paired medical image-report data is crucial for medical image analysis. However, existing methods struggle to accurately characterize associations between images and diseases, leading to inaccurate or incomplete diagnostic results. In this work, we propose MedFILIP, a fine-grained VLP model that introduces medical image-specific knowledge through contrastive learning. Specifically: 1) An information extractor based on a large language model is proposed to decouple comprehensive disease details from reports; it excels at extracting disease details through flexible prompt engineering, effectively reducing text complexity while retaining rich information at a tiny cost. 2) A knowledge injector is proposed to construct relationships between categories and visual attributes, which helps the model make judgments based on image features and fosters knowledge extrapolation to unfamiliar disease categories. 3) A semantic similarity matrix based on fine-grained annotations is proposed, providing smoother, more informative labels and thus allowing fine-grained image-text alignment. 4) We validate MedFILIP on numerous datasets, e.g., RSNA-Pneumonia, NIH ChestX-ray14, VinBigData, and COVID-19. For single-label, multi-label, and fine-grained classification, our model achieves state-of-the-art performance, improving classification accuracy by up to 6.69%.
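The semantic similarity matrix in point 3 amounts to replacing the one-hot targets of standard image-text contrastive learning with soft, similarity-weighted targets. A minimal sketch of that idea (the loss shape and row normalization here are assumptions, not MedFILIP's published formulation):

```python
import numpy as np

def soft_contrastive_loss(logits, sim):
    """Image-text contrastive loss with soft targets: rows of `sim`
    (fine-grained label similarity) replace the usual identity matrix."""
    # row-normalize the similarity matrix into target distributions
    targets = sim / sim.sum(axis=1, keepdims=True)
    # numerically stable log-softmax over each image's text logits
    log_p = logits - logits.max(axis=1, keepdims=True)
    log_p = log_p - np.log(np.exp(log_p).sum(axis=1, keepdims=True))
    return -(targets * log_p).sum(axis=1).mean()
```

With `sim` equal to the identity matrix this reduces to the ordinary InfoNCE-style cross-entropy; a smoother `sim` spreads credit over semantically related pairs.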

AAAI Conference 2025 Conference Paper

PointCFormer: A Relation-Based Progressive Feature Extraction Network for Point Cloud Completion

  • Yi Zhong
  • Weize Quan
  • Dong-Ming Yan
  • Jie Jiang
  • Yingmei Wei

Point cloud completion aims to reconstruct the complete 3D shape from incomplete point clouds, and it is crucial for tasks such as 3D object detection and segmentation. Despite the continuous advances in point cloud analysis techniques, feature extraction methods are still confronted with apparent limitations. The sparse sampling of point clouds, used as inputs in most methods, often results in a certain loss of global structure information. Meanwhile, traditional local feature extraction methods usually struggle to capture the intricate geometric details. To overcome these drawbacks, we introduce PointCFormer, a transformer framework optimized for robust global retention and precise local detail capture in point cloud completion. This framework offers several key advantages. First, we propose a relation-based local feature extraction method to perceive local delicate geometry characteristics. This approach establishes a fine-grained relationship metric between the target point and its k-nearest neighbors, quantifying each neighboring point's contribution to the target point's local features. Second, we introduce a progressive feature extractor that integrates our local feature perception method with self-attention. Starting with a denser sampling of points as input, it iteratively queries long-distance global dependencies and local neighborhood relationships. This extractor maintains enhanced global structure and refined local details, without generating substantial computational overhead. Additionally, we develop a correction module after generating point proxies in the latent space to reintroduce denser information from the input points, enhancing the representation capability of the point proxies. PointCFormer demonstrates state-of-the-art performance on several widely used benchmarks.
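The relation metric in the first contribution assigns each of the target point's k nearest neighbors a contribution weight when building the local feature. A toy illustration (the softmax-over-negative-distance weighting is an assumed stand-in for the paper's learned relation metric):

```python
import numpy as np

def relation_weighted_local_feature(points, feats, idx, k=4):
    """Sketch of a relation-based local descriptor: weight each of the
    k nearest neighbors of the target point by a softmax over negative
    distances, so closer neighbors contribute more to the local feature."""
    target = points[idx]
    d = np.linalg.norm(points - target, axis=1)
    nn = np.argsort(d)[1:k + 1]          # k nearest, excluding the point itself
    w = np.exp(-d[nn])
    w /= w.sum()                          # normalized relation weights
    return (w[:, None] * feats[nn]).sum(axis=0)
```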

IROS Conference 2025 Conference Paper

ThermalLoc: A Vision Transformer-Based Approach for Robust Thermal Camera Relocalization in Large-Scale Environments

  • Yu Liu
  • Yangtao Meng
  • Xianfei Pan
  • Jie Jiang
  • Changhao Chen

Thermal cameras capture environmental data through heat emission, a fundamentally different mechanism compared to visible light cameras, which rely on pinhole imaging. As a result, traditional visual relocalization methods designed for visible light images are not directly applicable to thermal images. Despite significant advancements in deep learning for camera relocalization, approaches specifically tailored for thermal camera-based relocalization remain underexplored. To address this gap, we introduce ThermalLoc, a novel end-to-end deep learning method for thermal image relocalization. ThermalLoc effectively extracts both local and global features from thermal images by integrating EfficientNet with Transformers, and performs absolute pose regression using two MLP networks. We evaluated ThermalLoc on both the publicly available thermal-odometry dataset and our own dataset. The results demonstrate that ThermalLoc outperforms existing representative methods employed for thermal camera relocalization, including AtLoc, MapNet, PoseNet, and RobustLoc, achieving superior accuracy and robustness.

AAAI Conference 2024 Conference Paper

Decoupled Training: Return of Frustratingly Easy Multi-Domain Learning

  • Ximei Wang
  • Junwei Pan
  • Xingzhuo Guo
  • Dapeng Liu
  • Jie Jiang

Multi-domain learning (MDL) aims to train a model with minimal average risk across multiple overlapping but non-identical domains. To tackle the challenges of dataset bias and domain domination, numerous MDL approaches have been proposed from the perspectives of seeking commonalities by aligning distributions to reduce domain gap or reserving differences by implementing domain-specific towers, gates, and even experts. MDL models are becoming more and more complex with sophisticated network architectures or loss functions, introducing extra parameters and enlarging computation costs. In this paper, we propose a frustratingly easy and hyperparameter-free multi-domain learning method named Decoupled Training (D-Train). D-Train is a tri-phase general-to-specific training strategy that first pre-trains on all domains to warm up a root model, then post-trains on each domain by splitting into multi-heads, and finally fine-tunes the heads by fixing the backbone, enabling decoupled training to achieve domain independence. Despite its extraordinary simplicity and efficiency, D-Train performs remarkably well in extensive evaluations of various datasets from standard benchmarks to applications of satellite imagery and recommender systems.
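The tri-phase strategy can be sketched on a toy linear model (all sizes, names, learning rates, and the synthetic domains below are illustrative assumptions, not D-Train's actual setup):

```python
import numpy as np

# Toy setup: a linear backbone B shared across two domains, plus a
# linear head per domain; data is synthetic least-squares regression.
rng = np.random.default_rng(0)
true_w = {"d1": np.array([1.0, -1.0, 0.5, 0.0]),
          "d2": np.array([1.0, -1.0, 0.0, 0.5])}
domains = {}
for name, w in true_w.items():
    X = rng.normal(size=(64, 4))
    domains[name] = (X, X @ w)

B = rng.normal(scale=0.1, size=(4, 2))   # shared backbone
heads = {}

def step(B, h, X, y, lr=0.05, train_backbone=True):
    # one gradient step on the squared error of pred = X @ B @ h
    Z = X @ B
    res = Z @ h - y
    h_new = h - lr * Z.T @ res / len(y)
    if train_backbone:
        B = B - lr * X.T @ np.outer(res, h) / len(y)
    return B, h_new

# Phase 1: pre-train backbone and a single head on all domains pooled,
# warming up a root model.
h0 = np.zeros(2)
X_all = np.vstack([X for X, _ in domains.values()])
y_all = np.hstack([y for _, y in domains.values()])
for _ in range(300):
    B, h0 = step(B, h0, X_all, y_all)

# Phase 2: split into per-domain heads (each initialized from the root)
# and post-train on each domain.
for name, (X, y) in domains.items():
    h = h0.copy()
    for _ in range(100):
        B, h = step(B, h, X, y)
    heads[name] = h

# Phase 3: fine-tune each head with the backbone frozen, decoupling
# the domains from one another.
for name, (X, y) in domains.items():
    for _ in range(200):
        _, heads[name] = step(B, heads[name], X, y, train_backbone=False)
```

After phase 3 the domains share one backbone but own independent heads, so a gradient from one domain can no longer perturb another domain's predictor.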

AAAI Conference 2024 Conference Paper

STEM: Unleashing the Power of Embeddings for Multi-Task Recommendation

  • Liangcai Su
  • Junwei Pan
  • Ximei Wang
  • Xi Xiao
  • Shijie Quan
  • Xihua Chen
  • Jie Jiang

Multi-task learning (MTL) has gained significant popularity in recommender systems as it enables simultaneous optimization of multiple objectives. A key challenge in MTL is negative transfer; existing studies have explored it on all samples jointly, overlooking the inherent complexities within them. We split the samples according to the relative amount of positive feedback among tasks. Surprisingly, negative transfer still occurs in existing MTL methods on samples that receive comparable feedback across tasks. Existing work commonly employs a shared-embedding paradigm, limiting the ability to model diverse user preferences across tasks. In this paper, we introduce a novel Shared and Task-specific EMbeddings (STEM) paradigm that aims to incorporate both shared and task-specific embeddings to effectively capture task-specific user preferences. Under this paradigm, we propose a simple model, STEM-Net, which is equipped with an All Forward Task-specific Backward gating network to facilitate the learning of task-specific embeddings and direct knowledge transfer across tasks. Remarkably, STEM-Net demonstrates exceptional performance on comparable samples, achieving positive transfer. Comprehensive evaluation on three public MTL recommendation datasets demonstrates that STEM-Net outperforms state-of-the-art models by a substantial margin. Our code is released at https://github.com/LiangcaiSu/STEM.
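The All Forward Task-specific Backward gate can be illustrated with scalar embeddings and one manual gradient step (everything below is a simplified assumption, not STEM-Net's actual architecture):

```python
# Toy scalars: one shared embedding plus one embedding per task (A, B).
e_shared, e_a, e_b = 0.5, 0.2, -0.3
lr = 0.1

def loss_and_grad(pred, target):
    # squared error and its gradient w.r.t. the tower input
    return (pred - target) ** 2, 2 * (pred - target)

# Task A's tower: the FORWARD pass uses ALL embeddings...
pred_a = e_shared + e_a + e_b
_, g = loss_and_grad(pred_a, target=1.0)

# ...but the BACKWARD pass stops the gradient into task B's embedding,
# so task A's loss cannot overwrite task B's representation.
e_shared -= lr * g   # shared embedding: gradient flows
e_a -= lr * g        # task A's own embedding: gradient flows
# e_b is left untouched: stop-gradient on other tasks' embeddings
```

In an autograd framework the same effect would come from detaching the other tasks' embeddings before the forward sum.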

AAAI Conference 2023 Conference Paper

AdaTask: A Task-Aware Adaptive Learning Rate Approach to Multi-Task Learning

  • Enneng Yang
  • Junwei Pan
  • Ximei Wang
  • Haibin Yu
  • Li Shen
  • Xihua Chen
  • Lei Xiao
  • Jie Jiang

Multi-task learning (MTL) models have demonstrated impressive results in computer vision, natural language processing, and recommender systems. Even though many approaches have been proposed, how well these approaches balance different tasks on each parameter still remains unclear. In this paper, we propose to measure the task dominance degree of a parameter by the total updates of each task on this parameter. Specifically, we compute the total updates by the exponentially decaying Average of the squared Updates (AU) on a parameter from the corresponding task. Based on this novel metric, we observe that many parameters in existing MTL methods, especially those in the higher shared layers, are still dominated by one or several tasks. The dominance of AU is mainly due to the dominance of accumulative gradients from one or several tasks. Motivated by this, we propose a Task-wise Adaptive learning rate approach, AdaTask in short, to separate the accumulative gradients and hence the learning rate of each task for each parameter in adaptive learning rate approaches (e.g., AdaGrad, RMSProp, and Adam). Comprehensive experiments on computer vision and recommender system MTL datasets demonstrate that AdaTask significantly improves the performance of dominated tasks, resulting in SOTA average task-wise performance. Analysis on both synthetic and real-world datasets shows that AdaTask balances parameters in every shared layer well.
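The task-wise separation of accumulative gradients can be sketched for AdaGrad (a minimal illustration; the function name and per-task update order are assumptions):

```python
import numpy as np

def adatask_adagrad_step(w, task_grads, accum, lr=0.1, eps=1e-8):
    """One AdaTask-style step on a shared parameter `w` (AdaGrad variant):
    each task keeps its OWN accumulator of squared gradients, so a task
    with large accumulated updates cannot shrink the effective learning
    rate of the other tasks on that parameter."""
    for t, g in task_grads.items():
        accum[t] += g ** 2                          # task-wise accumulator
        w = w - lr * g / (np.sqrt(accum[t]) + eps)  # task-wise step size
    return w, accum
```

With a dominant task (gradient 100) and a minor task (gradient 0.1), each task's first step moves the shared parameter by about `-lr`, because each gradient is normalized by its own accumulator rather than a shared one.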

NeurIPS Conference 2023 Conference Paper

ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning

  • Junguang Jiang
  • Baixu Chen
  • Junwei Pan
  • Ximei Wang
  • Dapeng Liu
  • Jie Jiang
  • Mingsheng Long

Auxiliary-Task Learning (ATL) aims to improve the performance of the target task by leveraging the knowledge obtained from related tasks. Occasionally, learning multiple tasks simultaneously results in lower accuracy than learning only the target task, which is known as negative transfer. This problem is often attributed to the gradient conflicts among tasks, and is frequently tackled by coordinating the task gradients in previous works. However, these optimization-based methods largely overlook the auxiliary-target generalization capability. To better understand the root cause of negative transfer, we experimentally investigate it from both optimization and generalization perspectives. Based on our findings, we introduce ForkMerge, a novel approach that periodically forks the model into multiple branches, automatically searches the varying task weights by minimizing target validation errors, and dynamically merges all branches to filter out detrimental task-parameter updates. On a series of auxiliary-task learning benchmarks, ForkMerge outperforms existing methods and effectively mitigates negative transfer.
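A single ForkMerge-style round can be sketched as fork, train, and merge-by-validation (the one-step branches, the two-gradient interface, and the coarse grid search below are simplifying assumptions, not the paper's algorithm):

```python
import numpy as np

def forkmerge_round(theta, grads, val_error, weights=(0.0, 0.5, 1.0), lr=0.1):
    """Simplified ForkMerge round: fork the parameters into branches that
    weight the auxiliary-task gradient differently, take one step per
    branch, then merge by picking the convex combination of two branches
    that minimizes target validation error (coarse grid search)."""
    g_target, g_aux = grads
    branches = [theta - lr * (g_target + lam * g_aux) for lam in weights]
    best, best_err = branches[0], val_error(branches[0])
    for i in range(len(branches)):
        for j in range(len(branches)):
            for a in np.linspace(0, 1, 11):
                cand = a * branches[i] + (1 - a) * branches[j]
                err = val_error(cand)
                if err < best_err:
                    best, best_err = cand, err
    return best
```

When the auxiliary gradient is harmful, the merge gravitates toward the branch that ignored it, filtering the detrimental update out.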

AAAI Conference 2023 Conference Paper

Towards In-Distribution Compatible Out-of-Distribution Detection

  • Boxi Wu
  • Jie Jiang
  • Haidong Ren
  • Zifan Du
  • Wenxiao Wang
  • Zhifeng Li
  • Deng Cai
  • Xiaofei He

Deep neural networks, despite their remarkable capability of discriminating targeted in-distribution samples, show poor performance on detecting anomalous out-of-distribution data. To address this defect, state-of-the-art solutions choose to train deep networks on an auxiliary dataset of outliers. Various training criteria for these auxiliary outliers are proposed based on heuristic intuitions. However, we find that these intuitively designed outlier training criteria can hurt in-distribution learning and eventually lead to inferior performance. To this end, we identify three causes of the in-distribution incompatibility: contradictory gradient, false likelihood, and distribution shift. Based on our new understandings, we propose a new out-of-distribution detection method by adapting both the top-design of deep models and the loss function. Our method achieves in-distribution compatibility by pursuing less interference with the probabilistic characteristic of in-distribution features. On several benchmarks, our method not only achieves the state-of-the-art out-of-distribution detection performance but also improves the in-distribution accuracy.

IJCAI Conference 2020 Conference Paper

Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning

  • Longteng Guo
  • Jing Liu
  • Xinxin Zhu
  • Xingjian He
  • Jie Jiang
  • Hanqing Lu

Most image captioning models are autoregressive, i.e., they generate each word by conditioning on previously generated words, which leads to heavy latency during inference. Recently, non-autoregressive decoding has been proposed in machine translation to speed up the inference time by generating all words in parallel. Typically, these models use the word-level cross-entropy loss to optimize each word independently. However, such a learning process fails to consider the sentence-level consistency, thus resulting in inferior generation quality of these non-autoregressive models. In this paper, we propose a Non-Autoregressive Image Captioning (NAIC) model with a novel training paradigm: Counterfactuals-critical Multi-Agent Learning (CMAL). CMAL formulates NAIC as a multi-agent reinforcement learning system where positions in the target sequence are viewed as agents that learn to cooperatively maximize a sentence-level reward. Besides, we propose to utilize massive unlabeled images to boost captioning performance. Extensive experiments on the MSCOCO image captioning benchmark show that our NAIC model achieves a performance comparable to state-of-the-art autoregressive models, while bringing a 13.9x decoding speedup.
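The counterfactual baseline gives each position (agent) the change in sentence-level reward when its word alone is swapped out, isolating that agent's individual contribution. A minimal sketch (the baseline token and the reward-function interface are assumptions):

```python
def cmal_advantages(words, reward_fn, baseline_word="<unk>"):
    """Counterfactual credit assignment (sketch): each position's
    advantage is the full-sentence reward minus the reward of the
    sentence with that position's word replaced by a baseline word."""
    r_full = reward_fn(words)
    advantages = []
    for i in range(len(words)):
        counterfactual = words[:i] + [baseline_word] + words[i + 1:]
        advantages.append(r_full - reward_fn(counterfactual))
    return advantages
```

Positions whose words do not affect the sentence-level reward receive zero advantage, so only genuinely contributing agents are reinforced.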

IJCAI Conference 2016 Conference Paper

Collaborative Evolution for User Profiling in Recommender Systems

  • Zhongqi Lu
  • Sinno Jialin Pan
  • Yong Li
  • Jie Jiang
  • Qiang Yang

Accurate user profiling is important for an online recommender system to provide proper personalized recommendations to its users. In many real-world scenarios, the user's interests towards the items may change over time. Therefore, a dynamic and evolutionary user profile is needed. In this work, we come up with a novel evolutionary view of the user's profile by proposing a Collaborative Evolution (CE) model, which learns the evolution of users' profiles through the sparse historical data in recommender systems and outputs the prospective user profile of the future. To verify the effectiveness of the proposed model, we conduct experiments on a real-world dataset, which is obtained from the online shopping website of Tencent (www.51buy.com) and contains more than 1 million users' shopping records over a time span of more than 180 days. Experimental analyses demonstrate that our proposed CE model can be used to make better future recommendations compared to several state-of-the-art methods.

IJCAI Conference 2015 Conference Paper

Image Feature Learning for Cold Start Problem in Display Advertising

  • Kaixiang Mo
  • Bo Liu
  • Lei Xiao
  • Yong Li
  • Jie Jiang

In online display advertising, state-of-the-art Click-Through Rate (CTR) prediction algorithms rely heavily on historical information, and they work poorly on a growing number of new ads without any historical information. This is known as the cold start problem. For image ads, current state-of-the-art systems use handcrafted image features such as multimedia features and SIFT features to capture the attractiveness of ads. However, these handcrafted features are task-dependent, inflexible, and heuristic. In order to tackle the cold start problem in image display ads, we propose a new feature learning architecture to learn the most discriminative image features directly from raw pixels and user feedback in the target task. The proposed method is flexible and does not depend on human heuristics. Extensive experiments on a real-world dataset with 47 billion records show that our feature learning method outperforms existing handcrafted features significantly, and it can extract discriminative and meaningful features.

AAMAS Conference 2013 Conference Paper

Norm Compliance Checking

  • Jie Jiang
  • Virginia Dignum
  • Huib Aldewereld
  • Frank Dignum
  • Yao-Hua Tan

In multi-agent systems, norms are used to regulate agents' behavior so that the objectives of the systems can be realized in a predictable way. Therefore, it is important to check whether agents can comply with the norms imposed on them. However, when norms are interrelated, verification of norm compliance cannot be achieved by checking compliance of each norm separately as done traditionally. To this effect, this extended abstract introduces an approach which first models a set of interrelated norms as Norm Nets, and then maps them to Colored Petri Nets (CPNs), by which compliance checking of both individual agents' behavior and the collective behavior of the system can be performed automatically. With CPNs, it is also possible to identify under which conditions the norms can be complied with.