Arrow Research search

Author name cluster

Jun Ma

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

35 papers
2 author rows

Possible papers

35

AAAI Conference 2026 Conference Paper

Domain Adaptation Guided Infrared and Visible Image Fusion

  • Tianwei Guan
  • Haozhen Wei
  • Yuhan Zhou
  • Jun Ma
  • Zecheng Xu
  • Zhiying Jiang
  • Jinyuan Liu
  • Xingyuan Li

Infrared and Visible Image Fusion (IVIF) integrates complementary information from distinct modalities to enhance image quality. However, the effectiveness declines under unseen conditions such as novel weather or scenes, due to domain shifts primarily from variations of data distribution in the visible modality, while the infrared modality remains relatively stable. To overcome domain shifts caused by the imbalance between modalities during image fusion, we propose a Domain Adaptation Guided Infrared and Visible Image Fusion method, termed DAFusion, leveraging a dual-rank domain adapter to enable fast adaptation to diverse adverse conditions during image fusion. Specifically, trainable low-rank and high-rank embedding spaces are respectively used to capture knowledge common across domains (domain-shared) and knowledge unique to target domains (domain-specific). To leverage the dual-rank adapter more effectively, we develop a homeostatic knowledge allotment strategy to integrate the distinct types of knowledge dynamically based on the uncertainty value of target domains. Since domain adaptation typically optimizes for feature alignment across domains and emphasizes invariance rather than preserving specific cues critical for image fusion, while the fusion objective requires retaining discriminative and complementary features, a conflict between the two modules appears. To reconcile this, we further adopt a bi-level optimization framework that structurally decouples the two objectives, enabling the fusion module to steer the adaptation process while benefiting in return from domain-aligned representations. Experimental results on three benchmarks demonstrate that our method significantly outperforms state-of-the-art approaches, achieving both an enhancement in fusion quality and an improvement in subsequent high-level tasks.

AAAI Conference 2026 Conference Paper

DRM-Net: Explicit Residual Modelling with Subaquatic Multi-Scale Context Fusion for Underwater Image Enhancement

  • Chang Huang
  • Zhou Zhexin
  • Jun Ma
  • Jiatong Shen
  • Peixuan Xiong
  • Huayong Yang
  • Kaishun Wu

Clear and high-quality underwater images are essential for marine applications, including autonomous navigation, ecological monitoring, and infrastructure inspection. However, underwater images typically suffer from severe colour distortion, low contrast, and diminished structural visibility due to wavelength-dependent attenuation, scattering, and uneven illumination conditions. Recent deep learning-based underwater image enhancement (UIE) methods primarily adopt end-to-end frameworks, directly regressing enhanced images from degraded inputs. While these approaches have achieved significant progress, they often lack explicit modeling of the degradation process, leading to limited interpretability and suboptimal recovery of fine-grained details. To address these limitations, we propose DRM-Net, an explicit residual learning framework for UIE. Rather than estimating the enhanced image directly, DRM-Net first predicts a pixel-wise Degradation Residual Map (DRM) in the perceptually uniform CIELab colour space. This map explicitly quantifies local colour, contrast, and structural degradations, thereby enabling the network to precisely reconstruct missing visual information. Furthermore, we design a lightweight Subaquatic Multi-Scale Context Fusion module, which utilizes parallel atrous convolutions with softmax-weighted feature aggregation, significantly enhancing robustness against spatially heterogeneous scattering. Trained jointly with pixel-wise DRM and VGG-based perceptual losses, DRM-Net achieves superior colour fidelity, perceptual realism, and structural detail recovery. Comprehensive experiments conducted on multiple benchmarks demonstrate that our proposed approach attains competitive quantitative results and superior qualitative visual performance compared to state-of-the-art UIE methods, while maintaining low computational overhead, making it particularly suitable for resource-constrained underwater robotic systems.

AAAI Conference 2026 Conference Paper

HATIR: Heat-Aware Diffusion for Turbulent Infrared Video Super-Resolution

  • Yang Zou
  • Xingyue Zhu
  • Kaiqi Han
  • Jun Ma
  • Xingyuan Li
  • Zhiying Jiang
  • Jinyuan Liu

Infrared video has been of great interest in visual tasks under challenging environments, but often suffers from severe atmospheric turbulence and compression degradation. Existing video super-resolution (VSR) methods either neglect the inherent modality gap between infrared and visible images or fail to restore turbulence-induced distortions. Directly cascading turbulence mitigation (TM) algorithms with VSR methods leads to error propagation and accumulation due to the decoupled modeling of degradation between turbulence and resolution. We introduce HATIR, a Heat-Aware Diffusion for Turbulent InfraRed Video Super-Resolution, which injects heat-aware deformation priors into the diffusion sampling path to jointly model the inverse process of turbulent degradation and structural detail loss. Specifically, HATIR constructs a Phasor-Guided Flow Estimator, rooted in the physical principle that thermally active regions exhibit consistent phasor responses over time, enabling reliable turbulence-aware flow to guide the reverse diffusion process. To ensure the fidelity of structural recovery under nonuniform distortions, a Turbulence-Aware Decoder is proposed to selectively suppress unstable temporal cues and enhance edge-aware feature aggregation via turbulence gating and structure-aware attention. We built FLIR-IVSR, the first dataset for turbulent infrared VSR, comprising paired LR-HR sequences from a FLIR T1050sc camera (1024 × 768) spanning 640 diverse scenes with varying camera and object motion conditions. This encourages future research in infrared VSR.

AAAI Conference 2026 Conference Paper

Induce, Align, Predict: Zero-Shot Stance Detection via Cognitive Inductive Reasoning

  • Bowen Zhang
  • Jun Ma
  • Fuqiang Niu
  • Li Dong
  • Jinzhou Cao
  • Genan Dai

Zero-shot stance detection (ZSSD) seeks to determine the stance of text toward previously unseen targets, a task critical for analyzing dynamic and polarized online discourse with limited labeled data. While large language models (LLMs) offer zero-shot capabilities, prompting-based approaches often fall short in handling complex reasoning and lack robust generalization to novel targets. Meanwhile, LLM-enhanced methods still require substantial labeled data and struggle to move beyond instance-level patterns, limiting their interpretability and adaptability. Inspired by cognitive science, we propose the Cognitive Inductive Reasoning Framework (CIRF), a schema-driven method that bridges linguistic inputs and abstract reasoning via automatic induction and application of cognitive reasoning schemas. CIRF abstracts first-order logic patterns from raw text into multi-relational schema graphs in an unsupervised manner, and leverages a schema-enhanced graph kernel model to align input structures with schema templates for robust, interpretable zero-shot inference. Extensive experiments on SemEval-2016, VAST, and COVID-19-Stance benchmarks demonstrate that CIRF not only establishes new state-of-the-art results, but also achieves comparable performance with just 30% of the labeled data, highlighting its strong generalization and efficiency in low-resource settings.

AAAI Conference 2026 Conference Paper

Toward Real-World High-Precision Image Matting and Segmentation

  • Haipeng Zhou
  • Zhaohu Xing
  • Hongqiu Wang
  • Jun Ma
  • Ping Li
  • Lei Zhu

High-precision scene parsing tasks, including image matting and dichotomous segmentation, aim to accurately predict masks with extremely fine details (such as hair). Most existing methods focus on salient, single foreground objects. While interactive methods allow for target adjustment, their class-agnostic design restricts generalization across different categories. Furthermore, the scarcity of high-quality annotation has led to a reliance on inharmonious synthetic data, resulting in poor generalization to real-world scenarios. To this end, we propose a Foreground Consistent Learning model, dubbed FCLM, to address the aforementioned issues. Specifically, we first introduce a Depth-Aware Distillation strategy where we transfer the depth-related knowledge for better foreground representation. Considering the data dilemma, we treat the processing of synthetic data as a domain adaptation problem and propose a domain-invariant learning strategy to focus on foreground learning. To support interactive prediction, we contribute an Object-Oriented Decoder that can receive both visual and language prompts to predict the referring target. Experimental results show that our method quantitatively and qualitatively outperforms state-of-the-art methods.

JBHI Journal 2026 Journal Article

UTADC-Net: Unsupervised Topological-Aware Diffusion Condensation Network for Medical Image Segmentation

  • Yue Peng
  • Ruodai Wu
  • Bing Xiong
  • Fuqiang Chen
  • Jun Ma
  • Yaoqin Xie
  • Jing Cai
  • Wenjian Qin

Medical image segmentation plays a crucial role in computer-aided diagnosis and treatment planning. Unsupervised segmentation methods that can effectively leverage unlabeled data bring significant promise in clinical application. However, maintaining the topological consistency of anatomical structures remains challenging: these methods often produce structural breaks, connectivity errors, or boundary discontinuities. To address these issues, we propose a novel Unsupervised Topological-Aware Diffusion Condensation Network (UTADC-Net) for medical image segmentation. Specifically, we design a diffusion condensation-based framework that achieves structural consistency in segmentation results by effectively modeling long-range dependencies between pixels and incorporating topological constraints. First, to effectively fuse local details and global semantic information, we employ a pixel-centric patch embedding module by simultaneously modeling local structural features and inter-region interactions. Second, to enhance the topological consistency of segmentation results, we introduce an adaptive topological constraint mechanism that guides the network to learn anatomically aligned structural representations through pixel-level topological relationships and corresponding loss functions. Extensive experiments conducted on three public medical image datasets demonstrate that our proposed UTADC-Net significantly outperforms existing unsupervised methods in terms of segmentation accuracy and topological structure preservation. Notably, our method demonstrates segmentation results with excellent anatomical structural consistency. These results indicate that our framework provides a novel and practical solution for unsupervised medical image segmentation.

NeurIPS Conference 2025 Conference Paper

Any Large Language Model Can Be a Reliable Judge: Debiasing with a Reasoning-based Bias Detector

  • Haoyan Yang
  • Runxue Bao
  • Cao (Danica) Xiao
  • Jun Ma
  • Parminder Bhatia
  • Shangqian Gao
  • Taha Kass-Hout

LLM-as-a-Judge has emerged as a promising tool for automatically evaluating generated outputs, but its reliability is often undermined by potential biases in judgment. Existing efforts to mitigate these biases face key limitations: in-context learning-based methods fail to address rooted biases due to the evaluator’s limited capacity for self-reflection, whereas fine-tuning is not applicable to all evaluator types, especially closed-source models. To address this challenge, we introduce the Reasoning-based Bias Detector (RBD), a plug-in module that identifies biased evaluations and generates structured reasoning to guide evaluator self-correction. Rather than modifying the evaluator itself, RBD operates externally and engages in an iterative process of bias detection and feedback-driven revision. To support its development, we design a complete pipeline consisting of biased dataset construction, supervision collection, distilled reasoning-based fine-tuning of RBD, and integration with LLM evaluators. We fine-tune four sizes of RBD models, ranging from 1.5B to 14B, and observe consistent performance improvements across all scales. Experimental results on 4 bias types—verbosity, position, bandwagon, and sentiment—evaluated using 8 LLM evaluators demonstrate RBD’s strong effectiveness. For example, the RBD-8B model improves evaluation accuracy by an average of 18.5% and consistency by 10.9%, and surpasses prompting-based baselines and fine-tuned judges by 12.8% and 17.2%, respectively. These results highlight RBD’s effectiveness and scalability. Additional experiments further demonstrate its strong generalization across biases and domains, as well as its efficiency.

AAAI Conference 2025 Conference Paper

OTIAS: OcTree Implicit Adaptive Sampling for Multispectral and Hyperspectral Image Fusion

  • Shangqi Deng
  • Jun Ma
  • Liang-Jian Deng
  • Ping Wei

Implicit Neural Representation (INR) methods have demonstrated great potential in arbitrary-scale super-resolution tasks. This success is primarily due to their ability to continuously represent images using coordinates. In the task of remote sensing image fusion, INR methods have also shown promising applications. However, the previous INR methods neglect channel-wise modeling, while sharing a single kernel across all channels at each position, resulting in a lack of sensitivity to data specificity. To address these issues, we propose the OcTree Implicit Adaptive Sampling (OTIAS) method, which innovatively applies the octree structure to restore data from both horizontal and vertical directions, effectively incorporating spatial and spectral information from hyperspectral data. Additionally, we introduce a novel method to adaptively generate interpolation kernels based on coordinates. This approach efficiently produces customized interpolation kernel parameters for octree nodes, tailored to different spectral information. Overall, our method achieves state-of-the-art performance on the CAVE and Harvard datasets with 4× and 8× scaling factors, outperforming existing approaches.

NeurIPS Conference 2025 Conference Paper

PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly

  • Liang Ma
  • Jiajun Wen
  • Min Lin
  • Rongtao Xu
  • Xiwen Liang
  • Bingqian Lin
  • Jun Ma
  • Yongxin Wang

While vision-language models (VLMs) have demonstrated promising capabilities in reasoning and planning for embodied agents, their ability to comprehend physical phenomena, particularly within structured 3D environments, remains severely limited. To close this gap, we introduce PhyBlock, a progressive benchmark designed to assess VLMs on physical understanding and planning through robotic 3D block assembly tasks. PhyBlock integrates a novel four-level cognitive hierarchy assembly task alongside targeted Visual Question Answering (VQA) samples, collectively aimed at evaluating progressive spatial reasoning and fundamental physical comprehension, including object properties, spatial relationships, and holistic scene understanding. PhyBlock includes 2600 block tasks (400 assembly tasks, 2200 VQA tasks) and evaluates models across three key dimensions: partial completion, failure diagnosis, and planning robustness. We benchmark 23 state-of-the-art VLMs, highlighting their strengths and limitations in physically grounded, multi-step planning. Our empirical findings indicate that VLMs exhibit pronounced limitations in high-level planning and reasoning, with performance declining notably as task complexity grows. Error analysis reveals persistent difficulties in spatial orientation and dependency reasoning. We position PhyBlock as a unified testbed to advance embodied reasoning, bridging vision-language understanding and real-world physical problem-solving.

AAAI Conference 2025 Conference Paper

SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering

  • Xiaopeng Li
  • Shasha Li
  • Shezheng Song
  • Huijun Liu
  • Bin Ji
  • Xi Wang
  • Jun Ma
  • Jie Yu

The general capabilities of large language models (LLMs) make them the infrastructure for various AI applications, but updating their inner knowledge requires significant resources. Recent model editing is a promising technique for efficiently updating a small amount of knowledge of LLMs and has attracted much attention. In particular, local editing methods, which directly update model parameters, are proven suitable for updating small amounts of knowledge. Local editing methods update weights by computing least squares closed-form solutions and identify edited knowledge by vector-level matching in inference, which achieve promising results. However, these methods still require a lot of time and resources to complete the computation. Moreover, vector-level matching lacks reliability, and such updates disrupt the original organization of the model's parameters. To address these issues, we propose a detachable and expandable Subject Word Embedding Altering (SWEA) framework, which finds the editing embeddings through token-level matching and adds them to the subject word embeddings in Transformer input. To get these editing embeddings, we propose an optimizing-then-suppressing fusion method, which first optimizes learnable embedding vectors for the editing target and then suppresses the Knowledge Embedding Dimensions (KEDs) to obtain final editing embeddings. We thus propose the SWEAOS method for editing factual knowledge in LLMs. We demonstrate the overall state-of-the-art (SOTA) performance of SWEAOS on the CounterFact and zsRE datasets. To further validate the reasoning ability of SWEAOS in editing knowledge, we evaluate it on the more complex RippleEdits benchmark. The results demonstrate that SWEAOS possesses SOTA reasoning ability.

AAAI Conference 2025 Conference Paper

Towards Verifiable Text Generation with Generative Agent

  • Bin Ji
  • Huijun Liu
  • Mingzhe Du
  • Shasha Li
  • Xiaodong Liu
  • Jun Ma
  • Jie Yu
  • See-Kiong Ng

Text generation with citations makes it easy to verify the factuality of Large Language Models’ (LLMs) generations. Existing one-step generation studies exhibit distinct shortcomings in answer refinement and in-context demonstration matching. In light of these challenges, we propose R2-MGA, a Retrieval and Reflection Memory-augmented Generative Agent. Specifically, it first retrieves the memory bank to obtain the best-matched memory snippet, then reflects the retrieved snippet as a reasoning rationale, next combines the snippet and the rationale as the best-matched in-context demonstration. Additionally, it is capable of in-depth answer refinement with two specifically designed modules. We evaluate R2-MGA across five LLMs on the ALCE benchmark. The results reveal R2-MGA’s exceptional capabilities in text generation with citations. In particular, compared to the selected baselines, it delivers up to +58.8% and +154.7% relative performance gains on answer correctness and citation quality, respectively. Extensive analyses strongly support the motivations of R2-MGA.

AAAI Conference 2024 Conference Paper

Confucius: Iterative Tool Learning from Introspection Feedback by Easy-to-Difficult Curriculum

  • Shen Gao
  • Zhengliang Shi
  • Minghang Zhu
  • Bowen Fang
  • Xin Xin
  • Pengjie Ren
  • Zhumin Chen
  • Jun Ma

Augmenting large language models (LLMs) with external tools has emerged as a promising approach to extending the capability of LLMs. Although there are some works that employ open-source LLMs for the tool-learning task, most of them are trained in a controlled environment in which LLMs only learn to execute the human-provided tools. However, selecting proper tools from the large toolset is also a crucial ability for the tool-learning model to be applied in real-world applications. Existing methods usually directly employ self-instruction methods to train the model, which ignores differences in tool complexity. In this paper, we propose Confucius, a novel tool-learning framework to train LLMs to use complicated tools in real-world scenarios, which contains two main phases: (1) We first propose a multi-stage learning method to teach the LLM to use various tools from an easy-to-difficult curriculum; (2) thenceforth, we propose the Iterative Self-instruct from Introspective Feedback (ISIF) to dynamically construct the dataset to improve the ability to use complicated tools. Extensive experiments conducted on both controlled and real-world settings demonstrate the superiority of our tool-learning framework in the real-world application scenario compared to both tuning-free (e.g., ChatGPT, Claude) and tuning-based baselines (e.g., GPT4Tools).

AAAI Conference 2024 Conference Paper

Improving Factual Error Correction by Learning to Inject Factual Errors

  • Xingwei He
  • Qianru Zhang
  • A-Long Jin
  • Jun Ma
  • Yuan Yuan
  • Siu Ming Yiu

Factual error correction (FEC) aims to revise factual errors in false claims with minimal editing, making them faithful to the provided evidence. This task is crucial for alleviating the hallucination problem encountered by large language models. Given the lack of paired data (i.e., false claims and their corresponding correct claims), existing methods typically adopt the ‘mask-then-correct’ paradigm. This paradigm relies solely on unpaired false claims and correct claims, thus being referred to as distantly supervised methods. These methods require a masker to explicitly identify factual errors within false claims before revising with a corrector. However, the absence of paired data to train the masker makes accurately pinpointing factual errors within claims challenging. To mitigate this, we propose to improve FEC by Learning to Inject Factual Errors (LIFE), a three-step distantly supervised method: ‘mask-corrupt-correct’. Specifically, we first train a corruptor using the ‘mask-then-corrupt’ procedure, allowing it to deliberately introduce factual errors into correct text. The corruptor is then applied to correct claims, generating a substantial amount of paired data. After that, we filter out low-quality data, and use the remaining data to train a corrector. Notably, our corrector does not require a masker, thus circumventing the bottleneck associated with explicit factual error identification. Our experiments on a public dataset verify the effectiveness of LIFE in two key aspects: Firstly, it outperforms the previous best-performing distantly supervised method by a notable margin of 10.59 points in SARI Final (19.3% improvement). Secondly, even compared to ChatGPT prompted with in-context examples, LIFE achieves a superiority of 7.16 points in SARI Final.

AAAI Conference 2024 Conference Paper

PMET: Precise Model Editing in a Transformer

  • Xiaopeng Li
  • Shasha Li
  • Shezheng Song
  • Jing Yang
  • Jun Ma
  • Jie Yu

Model editing techniques modify a minor proportion of knowledge in Large Language Models (LLMs) at a relatively low cost, which have demonstrated notable success. Existing methods assume Transformer Layer (TL) hidden states are values of key-value memories of the Feed-Forward Network (FFN). They usually optimize the TL hidden states to memorize target knowledge and use it to update the weights of the FFN in LLMs. However, the information flow of TL hidden states comes from three parts: Multi-Head Self-Attention (MHSA), FFN, and residual connections. Existing methods neglect the fact that the TL hidden states contain information not specifically required for FFN. Consequently, the performance of model editing decreases. To achieve more precise model editing, we analyze hidden states of MHSA and FFN, finding that MHSA encodes certain general knowledge extraction patterns. This implies that MHSA weights do not require updating when new knowledge is introduced. Based on the above findings, we introduce PMET, which simultaneously optimizes Transformer Component (TC, namely MHSA and FFN) hidden states, while only using the optimized TC hidden states of FFN to precisely update FFN weights. Our experiments demonstrate that PMET exhibits state-of-the-art performance on both the CounterFact and zsRE datasets. Our ablation experiments substantiate the effectiveness of our enhancements, further reinforcing the finding that the MHSA encodes certain general knowledge extraction patterns and indicating its storage of a small amount of factual knowledge. Our code is available at https://github.com/xpq-tech/PMET.

IJCAI Conference 2024 Conference Paper

Towards Proactive Interactions for In-Vehicle Conversational Assistants Utilizing Large Language Models

  • Huifang Du
  • Xuejing Feng
  • Jun Ma
  • Meng Wang
  • Shiyu Tao
  • YiJie Zhong
  • Yuan-Fang Li
  • Haofen Wang

Research demonstrates that the proactivity of in-vehicle conversational assistants (IVCAs) can help to reduce distractions and enhance driving safety, better meeting users' cognitive needs. However, existing IVCAs struggle with user intent recognition and context awareness, which leads to suboptimal proactive interactions. Large language models (LLMs) have shown potential for generalizing to various tasks with prompts, but their application in IVCAs and exploration of proactive interaction remain under-explored. This raises questions about how LLMs improve proactive interactions for IVCAs and influence user perception. To investigate these questions systematically, we establish a framework with five proactivity levels across two dimensions—assumption and autonomy—for IVCAs. According to the framework, we propose a "Rewrite + ReAct + Reflect" strategy, aiming to empower LLMs to fulfill the specific demands of each proactivity level when interacting with users. Both feasibility and subjective experiments are conducted. The LLM outperforms the state-of-the-art model in success rate and achieves satisfactory results for each proactivity level. Subjective experiments with 40 participants validate the effectiveness of our framework and show that the proactivity level with strong assumptions and user confirmation is most appropriate.

IJCAI Conference 2020 Conference Paper

Auxiliary Template-Enhanced Generative Compatibility Modeling

  • Jinhuan Liu
  • Xuemeng Song
  • Zhaochun Ren
  • Liqiang Nie
  • Zhaopeng Tu
  • Jun Ma

In recent years, there has been a growing interest in the fashion analysis (e.g., clothing matching) due to the huge economic value of the fashion industry. The essential problem is to model the compatibility between the complementary fashion items, such as the top and bottom in clothing matching. The majority of existing work on fashion analysis has focused on measuring the item-item compatibility in a latent space with deep learning methods. In this work, we aim to improve the compatibility modeling by sketching a compatible template for a given item as an auxiliary link between fashion items. Specifically, we propose an end-to-end Auxiliary Template-enhanced Generative Compatibility Modeling (AT-GCM) scheme, which introduces an auxiliary complementary template generation network equipped with the pixel-wise consistency and compatible template regularization. Extensive experiments on two real-world datasets demonstrate the superiority of the proposed approach.

AAAI Conference 2020 Conference Paper

RefNet: A Reference-Aware Network for Background Based Conversation

  • Chuan Meng
  • Pengjie Ren
  • Zhumin Chen
  • Christof Monz
  • Jun Ma
  • Maarten de Rijke

Existing conversational systems tend to generate generic responses. Recently, Background Based Conversations (BBCs) have been introduced to address this issue. Here, the generated responses are grounded in some background information. The proposed methods for BBCs are able to generate more informative responses; however, they either cannot generate natural responses or have difficulties in locating the right background information. In this paper, we propose a Reference-aware Network (RefNet) to address both issues. Unlike existing methods that generate responses token by token, RefNet incorporates a novel reference decoder that provides an alternative way to learn to directly select a semantic unit (e.g., a span containing complete semantic information) from the background. Experimental results show that RefNet significantly outperforms state-of-the-art methods in terms of both automatic and human evaluations, indicating that RefNet can generate more appropriate and human-like responses.

AAAI Conference 2020 Conference Paper

Thinking Globally, Acting Locally: Distantly Supervised Global-to-Local Knowledge Selection for Background Based Conversation

  • Pengjie Ren
  • Zhumin Chen
  • Christof Monz
  • Jun Ma
  • Maarten de Rijke

Background Based Conversations (BBCs) have been introduced to help conversational systems avoid generating overly generic responses. In a BBC, the conversation is grounded in a knowledge source. A key challenge in BBCs is Knowledge Selection (KS): given a conversational context, try to find the appropriate background knowledge (a text fragment containing related facts or comments, etc.) based on which to generate the next response. Previous work addresses KS by employing attention and/or pointer mechanisms. These mechanisms use a local perspective, i.e., they select a token at a time based solely on the current decoding state. We argue for the adoption of a global perspective, i.e., pre-selecting some text fragments from the background knowledge that could help determine the topic of the next response. We enhance KS in BBCs by introducing a Global-to-Local Knowledge Selection (GLKS) mechanism. Given a conversational context and background knowledge, we first learn a topic transition vector to encode the most likely text fragments to be used in the next response, which is then used to guide the local KS at each decoding timestamp. In order to effectively learn the topic transition vector, we propose a distantly supervised learning schema. Experimental results show that the GLKS model significantly outperforms state-of-the-art methods in terms of both automatic and human evaluation. More importantly, GLKS achieves this without requiring any extra annotations, which demonstrates its high degree of scalability.

AAAI Conference 2019 Conference Paper

RepeatNet: A Repeat Aware Neural Recommendation Machine for Session-Based Recommendation

  • Pengjie Ren
  • Zhumin Chen
  • Jing Li
  • Zhaochun Ren
  • Jun Ma
  • Maarten de Rijke

Recurrent neural networks for session-based recommendation have attracted a lot of attention recently because of their promising performance. Repeat consumption is a common phenomenon in many recommendation scenarios (e.g., e-commerce, music, and TV program recommendations), where the same item is re-consumed repeatedly over time. However, no previous studies have emphasized repeat consumption with neural networks. An effective neural approach is needed to decide when to perform repeat recommendation. In this paper, we incorporate a repeat-explore mechanism into neural networks and propose a new model, called RepeatNet, with an encoder-decoder structure. RepeatNet integrates a regular neural recommendation approach in the decoder with a new repeat recommendation mechanism that can choose items from a user’s history and recommend them at the right time. We report on extensive experiments on three benchmark datasets. RepeatNet outperforms state-of-the-art baselines on all three datasets in terms of MRR and Recall. Furthermore, as the dataset size and the repeat ratio increase, the improvements of RepeatNet over the baselines also increase, which demonstrates its advantage in handling repeat recommendation scenarios.

TIST Journal 2017 Journal Article

Augmented Collaborative Filtering for Sparseness Reduction in Personalized POI Recommendation

  • Chaoran Cui
  • Jialie Shen
  • Liqiang Nie
  • Richang Hong
  • Jun Ma

As mobile device penetration increases, it has become pervasive for images to be associated with locations in the form of geotags. Geotags bridge the gap between the physical world and the cyberspace, giving rise to new opportunities to extract further insights into user preferences and behaviors. In this article, we aim to exploit geotagged photos from online photo-sharing sites for the purpose of personalized Point-of-Interest (POI) recommendation. Owing to the fact that most users have only very limited travel experiences, data sparseness poses a formidable challenge to personalized POI recommendation. To alleviate data sparseness, we propose to augment current collaborative filtering algorithms from multiple perspectives. Specifically, hybrid preference cues comprising user-uploaded and user-favored photos are harvested to study users’ tastes. Moreover, heterogeneous high-order relationship information is jointly captured from user social networks and POI multimodal contents with hypergraph models. We also build upon the matrix factorization algorithm to integrate the disparate sources of preference and relationship information, and apply our approach to directly optimize user preference rankings. Extensive experiments on a large and publicly accessible dataset verify the potential of our approach for addressing data sparseness and offering quality recommendations to users, especially for those who have only limited travel experiences.

TIST Journal 2015 Journal Article

A Hybrid Multigroup Coclustering Recommendation Framework Based on Information Fusion

  • Shanshan Huang
  • Jun Ma
  • Peizhe Cheng
  • Shuaiqiang Wang

Collaborative Filtering (CF) is one of the most successful algorithms in recommender systems. However, it suffers from data sparsity and scalability problems. Although many clustering techniques have been incorporated to alleviate these two problems, most of them fail to achieve further significant improvement in recommendation accuracy. First of all, most of them assume each user or item belongs to a single cluster. Since usually users can hold multiple interests and items may belong to multiple categories, it is more reasonable to assume that users and items can join multiple clusters (groups), where each cluster is a subset of like-minded users and items they prefer. Furthermore, most of the clustering-based CF models only utilize historical rating information in the clustering procedure but ignore other data resources in recommender systems such as the social connections of users and the correlations between items. In this article, we propose HMCoC, a Hybrid Multigroup CoClustering recommendation framework, which can cluster users and items into multiple groups simultaneously with different information resources. In our framework, we first integrate information of user–item rating records, user social networks, and item features extracted from the DBpedia knowledge base. We then use an optimization method to mine meaningful user–item groups with all the information. Finally, we apply the conventional CF method in each cluster to make predictions. By merging the predictions from each cluster, we return the top-n recommendations to the target users. Extensive experimental results demonstrate the superior performance of our approach in top-n recommendation in terms of MAP, NDCG, and F1 compared with other clustering-based CF models.
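The final merging step described in the abstract can be sketched as follows: a target user may belong to several overlapping groups, each producing its own CF predictions, and one simple way to merge them is to average per-item scores and rank. The averaging rule is an assumption for illustration; the abstract does not specify how the per-cluster predictions are combined.

```python
# Hypothetical sketch of merging per-cluster CF predictions into a top-n list.
from collections import defaultdict

def merge_cluster_predictions(per_cluster_preds, n):
    """per_cluster_preds: list of {item: predicted score} dicts,
    one per cluster the target user belongs to."""
    sums, counts = defaultdict(float), defaultdict(int)
    for preds in per_cluster_preds:
        for item, score in preds.items():
            sums[item] += score
            counts[item] += 1
    merged = {item: sums[item] / counts[item] for item in sums}
    return sorted(merged, key=merged.get, reverse=True)[:n]

# Toy example: the user sits in two overlapping groups.
top2 = merge_cluster_predictions(
    [{"a": 4.0, "b": 3.0}, {"b": 5.0, "c": 2.0}], n=2)
```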

TIST Journal 2014 Journal Article

VSRank

  • Shuaiqiang Wang
  • Jiankai Sun
  • Byron J. Gao
  • Jun Ma

Collaborative filtering (CF) is an effective technique addressing the information overload problem. CF approaches generally fall into two categories: rating based and ranking based. The former makes recommendations based on historical rating scores of items and the latter based on their rankings. Ranking-based CF has demonstrated advantages in recommendation accuracy, being able to capture the preference similarity between users even if their rating scores differ significantly. In this study, we propose VSRank, a novel framework that seeks accuracy improvement of ranking-based CF through adaptation of the vector space model. In VSRank, we consider each user as a document and his or her pairwise relative preferences as terms. We then use a novel degree-specialty weighting scheme resembling TF-IDF to weight the terms. Extensive experiments on benchmarks in comparison with the state-of-the-art approaches demonstrate the promise of our approach.
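The "user as document, pairwise relative preference as term" idea can be illustrated in a few lines, with a plain TF-IDF weighting standing in for the paper's degree-specialty scheme (whose details this listing does not give). All data here are toy values.

```python
# Hedged sketch of VSRank-style preference vectors: each user becomes a
# document of pairwise-preference "terms" (item i preferred over item j),
# weighted TF-IDF-style and compared by cosine similarity.
import math
from collections import Counter

def preference_terms(ratings):
    """Turn a user's item ratings into pairwise-preference 'terms' i>j."""
    items = list(ratings)
    return [f"{a}>{b}" for a in items for b in items
            if ratings[a] > ratings[b]]

def tfidf_vectors(users):
    docs = {u: Counter(preference_terms(r)) for u, r in users.items()}
    n = len(docs)
    df = Counter(t for d in docs.values() for t in d)
    return {u: {t: tf * math.log((1 + n) / (1 + df[t]))
                for t, tf in d.items()} for u, d in docs.items()}

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

users = {
    "u1": {"x": 5, "y": 2, "z": 1},
    "u2": {"x": 4, "y": 3, "z": 2},   # different scores, same orderings as u1
    "u3": {"x": 1, "y": 2, "z": 5},   # reversed orderings
}
vecs = tfidf_vectors(users)
```

Note how u1 and u2 come out maximally similar despite disagreeing on every rating score, because their pairwise orderings coincide; this is the preference-similarity property the abstract highlights for ranking-based CF.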

ICAART Conference 2009 Conference Paper

Using Agents' Attitudes and Assessments in Automated Fuzzy Bidding Strategy

  • Madhu Lata Goyal
  • Jun Ma

To be successful in multi-attribute auctions, agents must be capable of adapting to continuously changing bidding prices. This paper presents a novel fuzzy attitude based bidding strategy (FA-Bid), which employs a dual assessment technique, i.e., assessment of multiple attributes of the goods as well as assessment of an agent's attitude (eagerness) to procure an item in automated auction. The assessment of attributes applies fuzzy set techniques to handle the uncertainty of the bidding process, as well as heuristic rules to determine the attitude of bidding agents in simulated auctions to procure goods. The overall assessment is used to determine a price range based on the current bid, from which the best one is finally selected as the new bid.
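The dual assessment can be sketched numerically: a fuzzy membership function scores eagerness, and combining it with the attribute assessment fixes where in the allowed range the next bid falls. The triangular membership and min-combination below are standard fuzzy-logic ingredients chosen for illustration; the paper's actual rules are not reproduced.

```python
# Illustrative sketch of FA-Bid-style bid placement from two fuzzy scores.
def triangular(x, a, b, c):
    """Standard triangular fuzzy membership on [a, c] peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def next_bid(current_bid, max_price, attr_score, eagerness):
    """Place the new bid inside [current_bid, max_price]; both scores
    lie in [0, 1] and are combined with the min t-norm (fuzzy AND)."""
    overall = min(attr_score, eagerness)
    return current_bid + overall * (max_price - current_bid)

# Toy run: strong attribute match, moderately high eagerness.
e = triangular(0.8, 0.5, 1.0, 1.5)
bid = next_bid(current_bid=100.0, max_price=120.0, attr_score=0.9, eagerness=e)
```

A higher overall assessment pushes the new bid closer to the agent's price ceiling, matching the abstract's description of selecting the best bid from a price range based on the current bid.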