Arrow Research search

Author name cluster

Jun Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

36 papers
2 author rows

Possible papers (36)

JBHI Journal 2026 Journal Article

Hybrid Multi-View MRI Fusion for csPCa Diagnosis via Intra- and Inter-View Transformers

  • Yuchen Zhao
  • Danyan Li
  • Teng Zhang
  • Xiangxue Wang
  • Jun Xu
  • Kai Xuan

Accurate diagnosis of clinically significant prostate cancer (csPCa) from multi-view MRI scans (axial, sagittal, and coronal) is essential for effective treatment planning and improved outcomes. Although deep learning has advanced prostate MRI analysis, many existing approaches adopt late fusion strategies that aggregate one-dimensional feature vectors extracted independently from each view, resulting in loss of spatial information and anatomical correspondence across views, ultimately limiting diagnostic performance. While Vision Transformers offer flexibility in processing multi-view patches, their memory requirements scale quadratically with the number of patches, hindering efficient concurrent processing. In contrast, Swin Transformers efficiently capture local features but are typically restricted to single-view processing due to their reliance on regular-grid input constraints. To overcome these limitations, we propose a hybrid fusion framework that decomposes multi-view information integration into iterative intra-view and inter-view interactions across multiple resolutions. The framework preserves spatial coherence and enables fine-grained feature integration while maintaining computational efficiency. Specifically, the inter-view feature exchange module, based on the Vision Transformer, employs bridge tokens to summarize information from localized patch windows, reducing memory usage while preserving spatial relationships across views. The intra-view feature extraction module, built on the Swin Transformer, facilitates dynamic, attention-driven interactions among image patches and bridge tokens within each window. Moreover, shared positional embeddings are explicitly incorporated to enhance spatial correspondence across views. Extensive experiments on a public dataset demonstrate the superiority of our method in csPCa classification. 
Ablation studies highlight the contributions of different components, while attention map visualizations validate the integration of anatomical structures across views.

AAAI Conference 2026 Conference Paper

Thinker: Training LLMs in Hierarchical Thinking for Deep Search via Multi-Turn Interaction

  • Jun Xu
  • Xinkai Du
  • Yu Ao
  • Peilong Zhao
  • Yang Li
  • Ling Zhong
  • Lin Yuan
  • Zhongpu Bo

Efficient retrieval of external knowledge bases and web pages is crucial for enhancing the reasoning abilities of LLMs. Previous works on training LLMs to leverage external retrievers for solving complex problems have predominantly employed end-to-end reinforcement learning. However, these approaches neglect supervision over the reasoning process, making it difficult to guarantee logical coherence and rigor. To address these limitations, we propose Thinker, a hierarchical thinking model for deep search through multi-turn interaction, making the reasoning process supervisable and verifiable. It decomposes complex problems into independently solvable sub-problems, each dually represented in both natural language and an equivalent logical function to support knowledge base and web searches. Concurrently, dependencies between sub-problems are passed as parameters via these logical functions, enhancing the logical coherence of the problem-solving process. To avoid unnecessary external searches, we perform knowledge boundary determination to check if a sub-problem is within the LLM's intrinsic knowledge, allowing it to answer directly. Experimental results indicate that with as few as several hundred training samples, the performance of Thinker is competitive with established baselines. Furthermore, when scaled to the full training set, Thinker significantly outperforms these methods across various datasets and model sizes.

ECAI Conference 2025 Conference Paper

A Self-Adaptive Frequency Domain Network for Continuous Intraoperative Hypotension Prediction

  • Xian Zeng
  • Tianze Xu
  • Kai Yang
  • Jie Sun
  • Youran Wang
  • Jun Xu
  • Mucheng Ren

Intraoperative hypotension (IOH) is strongly associated with postoperative complications, including postoperative delirium and increased mortality, making its early prediction crucial in perioperative care. While several artificial intelligence-based models have been developed to provide IOH warnings, existing methods face limitations in incorporating both time and frequency domain information, capturing short- and long-term dependencies, and handling noise sensitivity in biosignal data. To address these challenges, we propose a novel Self-Adaptive Frequency Domain Network (SAFDNet). Specifically, SAFDNet integrates an adaptive spectral block, which leverages Fourier analysis to extract frequency-domain features and employs self-adaptive thresholding to mitigate noise. Additionally, an interactive attention block is introduced to capture both long-term and short-term dependencies in the data. Extensive internal and external validations on two large-scale real-world datasets demonstrate that SAFDNet achieves up to 97.3% AUROC in IOH early warning, outperforming state-of-the-art models. Furthermore, SAFDNet exhibits robust predictive performance and low sensitivity to noise, making it well-suited for practical clinical applications.
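The core idea of the adaptive spectral block (Fourier-domain feature extraction with thresholding to suppress noise) can be sketched in a few lines. This is a generic illustration using a fixed percentile threshold; SAFDNet learns its thresholds self-adaptively, and `keep_ratio` is an arbitrary value chosen for the example.

```python
import numpy as np

def spectral_threshold_denoise(signal, keep_ratio=0.1):
    """Suppress noise in a 1-D biosignal by zeroing weak frequency components.

    A simple stand-in for SAFDNet's adaptive spectral block: the threshold
    here is a fixed percentile of spectral magnitudes, not a learned,
    self-adaptive one.
    """
    spectrum = np.fft.rfft(signal)
    magnitude = np.abs(spectrum)
    # Keep only the strongest `keep_ratio` fraction of coefficients.
    threshold = np.quantile(magnitude, 1.0 - keep_ratio)
    spectrum[magnitude < threshold] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

# A low-frequency physiological-style wave buried in broadband noise:
t = np.linspace(0, 1, 512, endpoint=False)
clean = np.sin(2 * np.pi * 3 * t)
noisy = clean + 0.3 * np.random.default_rng(0).standard_normal(512)
denoised = spectral_threshold_denoise(noisy)
```

Because a slow rhythm concentrates in a few Fourier coefficients while broadband noise spreads across all of them, thresholding removes most of the noise power while leaving the waveform largely intact.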

AAAI Conference 2025 Conference Paper

AdaO2B: Adaptive Online to Batch Conversion for Out-of-Distribution Generalization

  • Xiao Zhang
  • Sunhao Dai
  • Jun Xu
  • Yong Liu
  • Zhenhua Dong

Online to batch conversion involves constructing a new batch learner by utilizing a series of models generated by an existing online learning algorithm, to achieve generalization guarantees under the i.i.d. assumption. However, when applied to real-world streaming applications such as streaming recommender systems, the data stream may be sampled from time-varying distributions instead of persistently being i.i.d. This poses a challenge in terms of out-of-distribution (OOD) generalization. Existing approaches employ fixed conversion mechanisms that are unable to adapt to novel testing distributions, hindering the testing accuracy of the batch learner. To address these issues, we propose AdaO2B, an adaptive online to batch conversion approach under the bandit setting. AdaO2B is designed to be aware of the distribution shifts in the testing data and achieves OOD generalization guarantees. Specifically, AdaO2B can dynamically combine the sequence of models learned by a contextual bandit algorithm and determine appropriate combination weights using a context-aware weighting function. This innovative approach allows for the conversion of a sequence of models into a batch learner that facilitates OOD generalization. Theoretical analysis provides justification for why and how the learned adaptive batch learner can achieve OOD generalization error guarantees. Experimental results have demonstrated that AdaO2B significantly outperforms state-of-the-art baselines on both synthetic and real-world recommendation datasets.
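The central conversion step (turning a sequence of online-learned models into one batch predictor via a context-aware weighting function) can be sketched as follows. The linear models and the softmax weighting are illustrative assumptions, not AdaO2B's exact parameterization.

```python
import numpy as np

def adaptive_batch_predict(models, weight_params, context, x):
    """Combine a sequence of online-learned linear models into one batch
    prediction, with combination weights that depend on the test context.

    models        : list of weight vectors produced across online rounds
    weight_params : one parameter vector per model for the weighting function
    context       : features describing the current (possibly shifted)
                    test distribution
    """
    # Context-aware weighting function: a softmax over per-model scores.
    scores = np.array([w @ context for w in weight_params])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Batch learner's output: weighted combination of model predictions.
    preds = np.array([m @ x for m in models])
    return weights @ preds
```

With all weighting parameters at zero the combination degenerates to a uniform average, i.e. a fixed conversion mechanism of the kind the paper argues cannot adapt to novel testing distributions.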

NeurIPS Conference 2025 Conference Paper

GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents

  • Yuqi Zhou
  • Sunhao Dai
  • Shuai Wang
  • Kaiwen Zhou
  • Qinglin Jia
  • Jun Xu

Recent Graphical User Interface (GUI) agents replicate the R1-Zero paradigm, coupling online Reinforcement Learning (RL) with explicit chain-of-thought reasoning prior to object grounding and thereby achieving substantial performance gains. In this paper, we first conduct extensive analysis experiments on three key components of that training pipeline: input design, output evaluation, and policy update, each revealing distinct challenges arising from blindly applying general-purpose RL without adapting to GUI grounding tasks. Input design: Current templates encourage the model to generate chain-of-thought reasoning, but longer chains unexpectedly lead to worse grounding performance. Output evaluation: Reward functions based on hit signals or box area allow models to exploit box size, leading to reward hacking and poor localization quality. Policy update: Online RL tends to overfit easy examples due to biases in length and sample difficulty, leading to under-optimization on harder cases. To address these issues, we propose three targeted solutions. First, we adopt a $\textbf{Fast Thinking Template}$ that encourages direct answer generation, reducing excessive reasoning during training. Second, we incorporate a box size constraint into the reward function to mitigate reward hacking. Third, we revise the RL objective by adjusting length normalization and adding a difficulty-aware scaling factor, enabling better optimization on hard samples. Our $\textbf{GUI-G1-3B}$, trained on 17K public samples with Qwen2.5-VL-3B-Instruct, achieves $\textbf{90.3\%}$ accuracy on ScreenSpot and $\textbf{37.1\%}$ on ScreenSpot-Pro. This surpasses all prior models of similar size and even outperforms the larger UI-TARS-7B, establishing a new state-of-the-art in GUI agent grounding.
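The box-size constraint on the reward can be illustrated with a toy reward function: a hit-based reward is granted only when the predicted box also stays below an area cap, removing the incentive to inflate boxes. The 5% cap and the function shape are assumptions for illustration, not the paper's actual reward.

```python
def grounding_reward(pred_box, target_point, screen_area, max_area_ratio=0.05):
    """Hit reward with a box-size constraint (illustrative, not GUI-G1's exact reward).

    pred_box     : (x1, y1, x2, y2) predicted by the agent
    target_point : ground-truth click location
    Without the area cap, predicting a huge box that covers everything would
    always "hit" -- the reward-hacking failure mode described above.
    """
    x1, y1, x2, y2 = pred_box
    px, py = target_point
    hit = x1 <= px <= x2 and y1 <= py <= y2
    area_ok = (x2 - x1) * (y2 - y1) <= max_area_ratio * screen_area
    return 1.0 if (hit and area_ok) else 0.0
```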

JBHI Journal 2025 Journal Article

HealthiVert-GAN: A Novel Framework of Pseudo-Healthy Vertebral Image Synthesis for Interpretable Compression Fracture Grading

  • Qi Zhang
  • Cheng Chuang
  • Shunan Zhang
  • Ziqi Zhao
  • Kun Wang
  • Jun Xu
  • Jianqi Sun

Osteoporotic vertebral compression fractures (OVCFs) are prevalent in the elderly population, typically assessed on computed tomography (CT) scans by evaluating vertebral height loss. This assessment helps determine the fracture's impact on spinal stability and the need for surgical intervention. However, the absence of pre-fracture CT scans and standardized vertebral references leads to measurement errors and inter-observer variability, while irregular compression patterns further challenge the precise grading of fracture severity. While deep learning methods have shown promise in aiding OVCFs screening, they often lack interpretability and sufficient sensitivity, limiting their clinical applicability. To address these challenges, we introduce a novel vertebra synthesis-height loss quantification-OVCFs grading framework. Our proposed model, HealthiVert-GAN, utilizes a coarse-to-fine synthesis network designed to generate pseudo-healthy vertebral images that simulate the pre-fracture state of fractured vertebrae. This model integrates three auxiliary modules that leverage the morphology and height information of adjacent healthy vertebrae to ensure anatomical consistency. Additionally, we introduce the Relative Height Loss of Vertebrae (RHLV) as a quantification metric, which divides each vertebra into three sections to measure height loss between pre-fracture and post-fracture states, followed by fracture severity classification using a Support Vector Machine (SVM). Our approach achieves state-of-the-art classification performance on both the Verse2019 dataset and an in-house dataset, and it provides cross-sectional distribution maps of vertebral height loss. This practical tool enhances diagnostic accuracy in clinical settings and assists in surgical decision-making.

AAAI Conference 2025 Conference Paper

Improving Natural Language Understanding for LLMs via Large-Scale Instruction Synthesis

  • Lin Yuan
  • Jun Xu
  • Honghao Gui
  • Mengshu Sun
  • Zhiqiang Zhang
  • Lei Liang
  • Jun Zhou

High-quality, large-scale instructions are crucial for aligning large language models (LLMs); however, there is a severe shortage of instructions in the field of natural language understanding (NLU). Previous works on constructing NLU instructions mainly focus on information extraction (IE), neglecting tasks such as machine reading comprehension, question answering, and text classification. Furthermore, the lack of diversity in the data has led to a decreased generalization ability of trained LLMs in other NLU tasks and a noticeable decline in the fundamental model's general capabilities. To address this issue, we propose Hum, a large-scale, high-quality synthetic instruction corpus for NLU tasks, designed to enhance the NLU capabilities of LLMs. Specifically, Hum includes IE (either closed IE or open IE), machine reading comprehension, text classification, and instruction generalist tasks, thereby enriching task diversity. Additionally, we introduce a human-LLMs collaborative mechanism to synthesize instructions, which enriches instruction diversity by incorporating guidelines, preference rules, and format variants. We conduct extensive experiments on 5 NLU tasks and 28 general capability evaluation datasets for LLMs. Experimental results show that Hum enhances the NLU capabilities of six LLMs by an average of 3.1%, with no significant decline observed in other general capabilities.

AAAI Conference 2025 Conference Paper

ParseCaps: An Interpretable Parsing Capsule Network for Medical Image Diagnosis

  • Xinyu Geng
  • Jiaming Wang
  • Xiaolin Huang
  • Fanglin Chen
  • Jun Xu

Deep learning has excelled in medical image classification, but its clinical application is limited by poor interpretability. Capsule networks, known for encoding hierarchical relationships and spatial features, show potential in addressing this issue. Nevertheless, traditional capsule networks often underperform due to their shallow structures, and deeper variants lack hierarchical architectures, thereby compromising interpretability. This paper introduces a novel capsule network, ParseCaps, which utilizes the sparse axial attention routing and parse convolutional capsule layer to form a parse-tree-like structure, enhancing both depth and interpretability. Firstly, sparse axial attention routing optimizes connections between child and parent capsules, and emphasizes the weight distribution across instantiation parameters of parent capsules. Secondly, the parse convolutional capsule layer generates capsule predictions aligning with the parse tree. Finally, based on the loss design that is effective whether concept ground truth exists or not, ParseCaps advances interpretability by associating each dimension of the global capsule with a comprehensible concept, thereby facilitating clinician trust and understanding of the model's classification results. Experimental results on three medical datasets show that ParseCaps not only outperforms other capsule network variants in classification accuracy and robustness, but also provides interpretable explanations, regardless of the availability of concept labels.

AAAI Conference 2025 Conference Paper

Trigger3: Refining Query Correction via Adaptive Model Selector

  • Kepu Zhang
  • Zhongxiang Sun
  • Xiao Zhang
  • Xiaoxue Zang
  • Kai Zheng
  • Yang Song
  • Jun Xu

In search scenarios, user experience can be hindered by erroneous queries due to typos, voice errors, or knowledge gaps. Therefore, query correction is crucial for search engines. Current correction models, usually small models trained on specific data, often struggle with queries beyond their training scope or those requiring contextual understanding. While the advent of Large Language Models (LLMs) offers a potential solution, they are still limited by their pre-training data and inference cost, particularly for complex queries, making them not always effective for query correction. To tackle these issues, we propose Trigger3, a large-small model collaboration framework that integrates the traditional correction model and LLM for query correction, capable of adaptively choosing the appropriate correction method based on the query and the correction results from the traditional correction model and LLM. Trigger3 first employs a correction trigger to filter out correct queries. Incorrect queries are then corrected by the traditional correction model. If this fails, an LLM trigger is activated to call the LLM for correction. Finally, for queries that no model can correct, a fallback trigger decides to return the original query. Extensive experiments demonstrate that Trigger3 outperforms correction baselines while maintaining efficiency.
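The cascade described above can be sketched as plain control flow. The trigger predicates and correction models are passed in as callables since in Trigger3 they are learned components; re-checking each candidate with a single predicate is a simplification of the paper's separate trigger models.

```python
def trigger3_correct(query, looks_correct, small_model, llm):
    """Large-small model collaboration for query correction (sketch).

    Stage 1: correction trigger filters out already-correct queries.
    Stage 2: the traditional (small) correction model tries first.
    Stage 3: the LLM trigger escalates to the LLM if the small model fails.
    Stage 4: fallback trigger returns the original query if nothing works.
    """
    if looks_correct(query):
        return query
    candidate = small_model(query)
    if looks_correct(candidate):
        return candidate
    candidate = llm(query)
    if looks_correct(candidate):
        return candidate
    return query  # fallback: no model produced a plausible correction
```

The design point is cost: the cheap small model handles the bulk of typo-style errors, and the expensive LLM is only invoked for the residual hard cases.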

ECAI Conference 2024 Conference Paper

Task-Aware Dynamic Transformer for Efficient Arbitrary-Scale Image Super-Resolution

  • Tianyi Xu
  • Yijie Zhou
  • Xiaotao Hu
  • Kai Zhang
  • Anran Zhang
  • Xingye Qiu
  • Jun Xu

Arbitrary-scale super-resolution (ASSR) aims to learn a single model for image super-resolution at arbitrary magnifying scales. Existing ASSR networks typically comprise an off-the-shelf scale-agnostic feature extractor and an arbitrary scale upsampler. These feature extractors often use fixed network architectures to address different ASSR inference tasks, each of which is characterized by an input image and an upsampling scale. However, this overlooks the difficulty variance of super-resolution on different inference scenarios, where simple images or small SR scales could be resolved with less computational effort than difficult images or large SR scales. To tackle this difficulty variability, in this paper, we propose a Task-Aware Dynamic Transformer (TADT) as an input-adaptive feature extractor for efficient image ASSR. Our TADT consists of a multi-scale feature extraction backbone built upon groups of Multi-Scale Transformer Blocks (MSTBs) and a Task-Aware Routing Controller (TARC). The TARC predicts the inference paths within the feature extraction backbone, specifically selecting MSTBs based on the input images and SR scales. The prediction of the inference path is guided by a new loss function to trade off SR accuracy and efficiency. Experiments demonstrate that, when working with three popular arbitrary-scale upsamplers, our TADT achieves state-of-the-art ASSR performance when compared with mainstream feature extractors, at relatively lower computational cost. The code is available at https://github.com/Tillyhere/TADT.

IROS Conference 2023 Conference Paper

A Safety Filter for Realizing Safe Robot Navigation in Crowds

  • Kaijun Feng
  • Zetao Lu
  • Jun Xu
  • Haoyao Chen
  • Yunjiang Lou

It is challenging to realize the safe navigation of mobile robots in crowds. Most of the previous studies may lead to unsafe robot navigation in crowds, as they lack safety guarantees. To solve this problem, we devise a safety filter (SF) that enables realization of safe robot navigation in crowds, and provides safety guarantees by verifying whether the optimal action recommended by an unsafe method is safe and, if not, corrects the action. The three main processes performed by the SF applied to a given robot are (1) construction of the safe state constraints of the robot using a safe set; (2) construction of the safe action constraints of the robot based on discrete-time generalized velocity obstacles (DGVOs); and (3) determination of a feasible solution of the SF design problem, or, if none can be found, replacement of the above hard constraints with heuristic soft constraints. We used the SF with a reaction-based method and three learning-based methods in simulation experiments of random and non-random crowds, and the results showed that the SF decreases the collision rates and danger rates and thereby increases the success rates of these methods. We also deployed the SF with three learning-based methods on an mr1000 robot in real-world experiments, and the results showed that the SF enabled the robot using learning-based methods to navigate to its goal without colliding with humans.

NeurIPS Conference 2023 Conference Paper

Reward Imputation with Sketching for Contextual Batched Bandits

  • Xiao Zhang
  • Ninglu Shao
  • Zihua Si
  • Jun Xu
  • Wenhan Wang
  • Hanjing Su
  • Ji-Rong Wen

Contextual batched bandit (CBB) is a setting where a batch of rewards is observed from the environment at the end of each episode, but the rewards of the non-executed actions are unobserved, resulting in partial-information feedback. Existing approaches for CBB often ignore the rewards of the non-executed actions, leading to underutilization of feedback information. In this paper, we propose an efficient approach called Sketched Policy Updating with Imputed Rewards (SPUIR) that completes the unobserved rewards using sketching, which approximates the full-information feedback. We formulate reward imputation as an imputation regularized ridge regression problem that captures the feedback mechanisms of both executed and non-executed actions. To reduce time complexity, we solve the regression problem using randomized sketching. We prove that our approach achieves an instantaneous regret with controllable bias and smaller variance than approaches without reward imputation. Furthermore, our approach enjoys a sublinear regret bound against the optimal policy. We also present two extensions, a rate-scheduled version and a version for nonlinear rewards, making our approach more practical. Experimental results show that SPUIR outperforms state-of-the-art baselines on synthetic, public benchmark, and real-world datasets.
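The reward-imputation step reduces to a ridge regression with a closed-form solution. This sketch omits SPUIR's randomized sketching (the part that lowers time complexity) and its imputation regularization details; the variable names are illustrative.

```python
import numpy as np

def impute_rewards(X_obs, r_obs, X_unobs, lam=1.0):
    """Impute rewards of non-executed actions via closed-form ridge regression.

    X_obs   : contexts of executed actions (with observed rewards r_obs)
    X_unobs : contexts of non-executed actions whose rewards were never seen
    Returns predicted rewards for the non-executed actions, approximating
    the full-information feedback the policy update would like to have.
    """
    d = X_obs.shape[1]
    # Ridge solution: theta = (X^T X + lam I)^-1 X^T r
    theta = np.linalg.solve(X_obs.T @ X_obs + lam * np.eye(d), X_obs.T @ r_obs)
    return X_unobs @ theta
```

SPUIR replaces the exact `X_obs.T @ X_obs` computation with a sketched approximation, trading a small, controllable bias for much lower per-round cost.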

AAAI Conference 2021 Conference Paper

Regret Bounds for Online Kernel Selection in Continuous Kernel Space

  • Xiao Zhang
  • Shizhong Liao
  • Jun Xu
  • Ji-Rong Wen

Regret bounds for online kernel selection in a finite kernel set have been well studied, with bounds of order at least O(√(NT)) after T rounds, where N is the number of candidate kernels. But it is still an unsolved problem to achieve sublinear regret bounds for online kernel selection in a continuous kernel space under different learning frameworks. In this paper, to represent different learning frameworks of online kernel selection, we divide online kernel selection approaches in a continuous kernel space into two categories according to the order of selection and training at each round. Then we construct a surrogate hypothesis space that contains all the candidate kernels with bounded norms and inner products, representing the continuously varying hypothesis space. Finally, we decompose the regrets of the proposed online kernel selection categories into different types of instantaneous regrets in the surrogate hypothesis space, and derive optimal regret bounds of order O(√T) under mild assumptions, independent of the cardinality of the continuous kernel space. Empirical studies verified the correctness of the theoretical regret analyses.

IJCAI Conference 2020 Conference Paper

Enhancing Dialog Coherence with Event Graph Grounded Content Planning

  • Jun Xu
  • Zeyang Lei
  • Haifeng Wang
  • Zheng-Yu Niu
  • Hua Wu
  • Wanxiang Che

How to generate informative, coherent and sustainable open-domain conversations is a non-trivial task. Previous work on knowledge-grounded conversation generation focuses on improving dialog informativeness, with little attention paid to dialog coherence. In this paper, to enhance multi-turn dialog coherence, we propose to leverage event chains to help determine a sketch of a multi-turn dialog. We first extract event chains from narrative texts and connect them as a graph. We then present a novel event graph grounded Reinforcement Learning (RL) framework. It conducts high-level response content (simply an event) planning by learning to walk over the graph, and then produces a response conditioned on the planned content. In particular, we devise a novel multi-policy decision making mechanism to foster a coherent dialog with both appropriate content ordering and high contextual relevance. Experimental results indicate the effectiveness of this framework in terms of dialog coherence and informativeness.

NeurIPS Conference 2020 Conference Paper

ICNet: Intra-saliency Correlation Network for Co-Saliency Detection

  • Wen-Da Jin
  • Jun Xu
  • Ming-Ming Cheng
  • Yi Zhang
  • Wei Guo

Intra-saliency and inter-saliency cues have been extensively studied for co-saliency detection (Co-SOD). Model-based methods produce coarse Co-SOD results due to hand-crafted intra- and inter-saliency features. Current data-driven models exploit inter-saliency cues, but undervalue the potential power of intra-saliency cues. In this paper, we propose an Intra-saliency Correlation Network (ICNet) to extract intra-saliency cues from the single image saliency maps (SISMs) predicted by any off-the-shelf SOD method, and obtain inter-saliency cues by correlation techniques. Specifically, we adopt normalized masked average pooling (NMAP) to extract latent intra-saliency categories from the SISMs and semantic features as intra cues. Then we employ a correlation fusion module (CFM) to obtain inter cues by exploiting correlations between the intra cues and single-image features. To improve Co-SOD performance, we propose a category-independent rearranged self-correlation feature (RSCF) strategy. Experiments on three benchmarks show that our ICNet outperforms previous state-of-the-art methods on Co-SOD. Ablation studies validate the effectiveness of our contributions. The PyTorch code is available at https://github.com/blanclist/ICNet.

AAAI Conference 2020 Conference Paper

Knowledge Graph Grounded Goal Planning for Open-Domain Conversation Generation

  • Jun Xu
  • Haifeng Wang
  • Zhengyu Niu
  • Hua Wu
  • Wanxiang Che

Previous neural models on open-domain conversation generation have no effective mechanisms to manage chatting topics, and tend to produce less coherent dialogs. Inspired by the strategies in human-human dialogs, we divide the task of multi-turn open-domain conversation generation into two sub-tasks: explicit goal (chatting about a topic) sequence planning and goal completion by topic elaboration. To this end, we propose a three-layer Knowledge aware Hierarchical Reinforcement Learning based Model (KnowHRL). Specifically, for the first sub-task, the upper-layer policy learns to traverse a knowledge graph (KG) in order to plan a high-level goal sequence towards a good balance between dialog coherence and topic consistency with user interests. For the second sub-task, the middle-layer policy and the lower-layer one work together to produce an in-depth multi-turn conversation about a single topic with a goal-driven generation mechanism. The capability of goal-sequence planning enables chatbots to conduct proactive open-domain conversations towards recommended topics, which has many practical applications. Experiments demonstrate that our model outperforms state-of-the-art baselines in terms of user-interest consistency, dialog coherence, and knowledge accuracy.

AAAI Conference 2019 Conference Paper

Differentiated Distribution Recovery for Neural Text Generation

  • Jianing Li
  • Yanyan Lan
  • Jiafeng Guo
  • Jun Xu
  • Xueqi Cheng

Neural language models based on recurrent neural networks (RNNLM) have significantly improved the performance of text generation, yet the quality of generated text, as represented by the Turing Test pass rate, is still far from satisfactory. Some researchers propose to use adversarial training or reinforcement learning to promote the quality; however, such methods usually introduce great challenges in the training and parameter tuning processes. Through our analysis, we find the problem of RNNLM comes from the usage of maximum likelihood estimation (MLE) as the objective function, which requires the generated distribution to precisely recover the true distribution. Such a requirement favors high generation diversity, which restricts the generation quality. This is not suitable when the overall quality is low, since high generation diversity usually indicates a lot of errors rather than diverse good samples. In this paper, we propose to achieve differentiated distribution recovery, DDR for short. The key idea is to make the optimal generation probability proportional to the β-th power of the true probability, where β > 1. In this way, the generation quality can be greatly improved by sacrificing diversity from noises and rare patterns. Experiments on synthetic data and two public text datasets show that our DDR method achieves a more flexible quality-diversity trade-off and a higher Turing Test pass rate, as compared with baseline methods including RNNLM, SeqGAN and LeakGAN.
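The key transformation (making the generation probability proportional to the β-th power of the true probability) is simple to state concretely; the toy distribution below is a made-up example.

```python
import numpy as np

def ddr_sharpen(p, beta=2.0):
    """Differentiated distribution recovery: q_i proportional to p_i ** beta,
    with beta > 1.

    Raising probabilities to a power > 1 and renormalizing shifts mass toward
    high-probability patterns and suppresses rare/noisy ones, trading
    diversity for quality as described above.
    """
    q = np.asarray(p, dtype=float) ** beta
    return q / q.sum()

p = np.array([0.5, 0.3, 0.2])   # toy "true" next-token distribution
q = ddr_sharpen(p, beta=2.0)    # mass concentrates on the likeliest token
```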

IJCAI Conference 2019 Conference Paper

Generating Multiple Diverse Responses with Multi-Mapping and Posterior Mapping Selection

  • Chaotao Chen
  • Jinhua Peng
  • Fan Wang
  • Jun Xu
  • Hua Wu

In human conversation an input post is open to multiple potential responses, which is typically regarded as a one-to-many problem. Promising approaches mainly incorporate multiple latent mechanisms to build the one-to-many relationship. However, without accurate selection of the latent mechanism corresponding to the target response during training, these methods suffer from a rough optimization of latent mechanisms. In this paper, we propose a multi-mapping mechanism to better capture the one-to-many relationship, where multiple mapping modules are employed as latent mechanisms to model the semantic mappings from an input post to its diverse responses. For accurate optimization of latent mechanisms, a posterior mapping selection module is designed to select the corresponding mapping module according to the target response for further optimization. We also introduce an auxiliary matching loss to facilitate the optimization of posterior mapping selection. Empirical results demonstrate the superiority of our model in generating multiple diverse and informative responses over the state-of-the-art methods.

AAAI Conference 2019 Conference Paper

HAS-QA: Hierarchical Answer Spans Model for Open-Domain Question Answering

  • Liang Pang
  • Yanyan Lan
  • Jiafeng Guo
  • Jun Xu
  • Lixin Su
  • Xueqi Cheng

This paper is concerned with open-domain question answering (i.e., OpenQA). Recently, some works have viewed this problem as a reading comprehension (RC) task, and directly applied successful RC models to it. However, the performance of such models is not as good as in the RC task. In our opinion, the RC perspective ignores three characteristics of the OpenQA task: 1) many paragraphs without the answer span are included in the data collection; 2) multiple answer spans may exist within one given paragraph; 3) the end position of an answer span depends on the start position. In this paper, we first propose a new probabilistic formulation of OpenQA, based on a three-level hierarchical structure, i.e., the question level, the paragraph level and the answer span level. Then a Hierarchical Answer Spans Model (HAS-QA) is designed to capture each probability. HAS-QA has the ability to tackle the above three problems, and experiments on public OpenQA datasets show that it significantly outperforms traditional RC baselines and recent OpenQA baselines.

IJCAI Conference 2019 Conference Paper

HorNet: A Hierarchical Offshoot Recurrent Network for Improving Person Re-ID via Image Captioning

  • Shiyang Yan
  • Jun Xu
  • Yuai Liu
  • Lin Xu

Person re-identification (re-ID) aims to recognize a person-of-interest across different cameras with notable appearance variance. Existing research has focused on the capability and robustness of visual representation. In this paper, instead, we propose a novel hierarchical offshoot recurrent network (HorNet) for improving person re-ID via image captioning. Image captions are semantically richer and more consistent than visual attributes, which could significantly alleviate the variance. We use the similarity preserving generative adversarial network (SPGAN) and an image captioner to fulfill domain transfer and language description generation. Then the proposed HorNet can learn the visual and language representation from both the images and captions jointly, and thus enhance the performance of person re-ID. Extensive experiments are conducted on several benchmark datasets with or without image captions, i.e., CUHK03, Market-1501, and DukeMTMC, demonstrating the superiority of the proposed method. Our method can generate and extract meaningful image captions while achieving state-of-the-art performance.

AAAI Conference 2019 Short Paper

Teaching Machines to Extract Main Content for Machine Reading Comprehension

  • Zhaohui Li
  • Yue Feng
  • Jun Xu
  • Jiafeng Guo
  • Yanyan Lan
  • Xueqi Cheng

Machine reading comprehension, whose goal is to find answers in candidate passages for a given question, has attracted substantial research effort in recent years. One of the key challenges in machine reading comprehension is how to identify the main content from a large, redundant, and overlapping set of candidate sentences. In this paper we propose to tackle the challenge with a Markov Decision Process, in which main content identification is formalized as sequential decision making and each action corresponds to selecting a sentence. Policy gradient is used to learn the model parameters. Experimental results on MS MARCO show that the proposed model, called MC-MDP, can select high-quality main content and significantly improves the performance of answer span prediction.
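The sequential decision process in the abstract can be sketched with a toy REINFORCE-style policy. The scoring function, sampling scheme, and reward mentioned in the final comment are stand-ins, not the paper's model.

```python
# Illustrative sequential sentence selection: each action samples one
# remaining sentence with probability proportional to exp(score).
import math
import random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def select_sentences(scores, k, rng):
    """Sample k sentences without replacement, accumulating the
    log-probability of the whole action sequence."""
    remaining = list(range(len(scores)))
    chosen, logprob = [], 0.0
    for _ in range(k):
        probs = softmax([scores[i] for i in remaining])
        idx = rng.choices(range(len(remaining)), weights=probs)[0]
        logprob += math.log(probs[idx])
        chosen.append(remaining.pop(idx))
    return chosen, logprob

rng = random.Random(0)
chosen, logprob = select_sentences([0.1, 2.0, -1.0, 0.5], k=2, rng=rng)
# Policy gradient would then scale the gradient of logprob by a reward
# (e.g., downstream answer-span quality) to update the scoring model.
```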

AAAI Conference 2018 Short Paper

Fast Approximate Nearest Neighbor Search via k-Diverse Nearest Neighbor Graph

  • Yan Xiao
  • Jiafeng Guo
  • Yanyan Lan
  • Jun Xu
  • Xueqi Cheng

Approximate nearest neighbor search is a fundamental problem that has been studied for decades. Recently, graph-based indexing methods have demonstrated great efficiency; their main idea is to construct a neighborhood graph offline and perform a greedy search over the graph online, starting from some sampled points. Most existing graph-based methods focus on either the precise k-nearest neighbor (k-NN) graph, which has good exploitation ability, or the diverse graph, which has good exploration ability. In this paper, we propose the k-diverse nearest neighbor (k-DNN) graph, which balances the precision and diversity of the graph, yielding good exploitation and exploration abilities simultaneously. We introduce an efficient indexing algorithm for constructing the k-DNN graph, inspired by a well-known diverse ranking algorithm in information retrieval (IR). Experimental results show that our method can outperform both state-of-the-art precise graph and diverse graph methods.
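The online phase shared by the graph-based methods the abstract describes, greedy search over a neighborhood graph, can be sketched in a few lines. The toy graph and points below are invented for illustration and are not a k-DNN graph built by the paper's indexing algorithm.

```python
# Greedy routing on a neighborhood graph: walk to the neighbor closest
# to the query until no neighbor improves (a local minimum).

def greedy_search(graph, points, query, start, dist):
    current = start
    while True:
        best = min(graph[current], key=lambda n: dist(points[n], query))
        if dist(points[best], query) >= dist(points[current], query):
            return current  # no neighbor is closer; stop here
        current = best

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 2.0)]
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
dist = lambda p, q: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
nearest = greedy_search(graph, points, query=(1.9, 1.9), start=0, dist=dist)
```

A purely precise k-NN graph can trap this walk in a local cluster (poor exploration), while an overly diverse graph routes coarsely (poor exploitation); balancing the two is the motivation the abstract states for the k-DNN graph.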

NeurIPS Conference 2018 Conference Paper

Multivariate Time Series Imputation with Generative Adversarial Networks

  • Yonghong Luo
  • Xiangrui Cai
  • Ying Zhang
  • Jun Xu
  • Xiaojie Yuan

Multivariate time series usually contain a large number of missing values, which hinders the application of advanced analysis methods to multivariate time series data. Conventional approaches to the missing-value problem, including mean/zero imputation, case deletion, and matrix factorization-based imputation, are all incapable of modeling the temporal dependencies and the complex distributions of multivariate time series. In this paper, we treat missing value imputation as a data generation problem. Inspired by the success of Generative Adversarial Networks (GAN) in image generation, we propose to learn the overall distribution of a multivariate time series dataset with a GAN, which is then used to generate the missing values for each sample. Unlike image data, time series data are usually incomplete due to the nature of the data recording process. A modified Gated Recurrent Unit is employed in the GAN to model the temporal irregularity of the incomplete time series. Experiments on two multivariate time series datasets show that the proposed model outperforms the baselines in terms of imputation accuracy. Experimental results also show that a simple model trained on the imputed data can achieve state-of-the-art results on prediction tasks, demonstrating the benefits of our model in downstream applications.

IJCAI Conference 2018 Conference Paper

Reinforcing Coherence for Sequence to Sequence Model in Dialogue Generation

  • Hainan Zhang
  • Yanyan Lan
  • Jiafeng Guo
  • Jun Xu
  • Xueqi Cheng

The sequence to sequence (Seq2Seq) approach has gained great attention in the field of single-turn dialogue generation. However, one serious problem is that most existing Seq2Seq-based models tend to generate common responses lacking specific meaning. Our analysis shows that the underlying reason is that Seq2Seq is equivalent to optimizing Kullback–Leibler (KL) divergence and thus does not penalize the case where the generated probability is high while the true probability is low. However, the true probability is unknown, which makes this problem hard to tackle. Inspired by the fact that the coherence (i.e., similarity) between post and response is consistent with human evaluation, we hypothesize that the true probability of a response is proportional to its coherence degree. Coherence scores are then used as the reward function in a reinforcement learning framework to penalize the case where the generated probability is high while the true probability is low. Three types of coherence models are proposed in this paper: an unlearned similarity function, a pretrained semantic matching function, and an end-to-end dual learning architecture. Experimental results on both a Chinese Weibo dataset and an English Subtitle dataset show that the proposed models produce more specific and meaningful responses, outperforming Seq2Seq models in both metric-based and human evaluations.
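The simplest of the three coherence models in the abstract, an unlearned similarity function, can be sketched as a cosine similarity used as a reward. Bag-of-words vectors stand in here for whatever sentence representation the paper actually uses; the example posts and responses are invented.

```python
# Unlearned coherence reward: cosine similarity between bag-of-words
# vectors of the post and the response.
import math
from collections import Counter

def coherence_reward(post, response):
    a, b = Counter(post.split()), Counter(response.split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

specific = coherence_reward("where should I eat tonight",
                            "try the noodle place on main street tonight")
generic = coherence_reward("where should I eat tonight",
                           "sorry , no idea")
# A generic reply shares no content with the post, so it earns a lower
# reward and is penalized by the policy-gradient update.
```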

TIST Journal 2017 Journal Article

Directly Optimize Diversity Evaluation Measures

  • Jun Xu
  • Long Xia
  • Yanyan Lan
  • Jiafeng Guo
  • Xueqi Cheng

The queries issued to search engines are often ambiguous or multifaceted, which requires search engines to return diverse results that fulfill as many different information needs as possible; this is called search result diversification. Recently, the relational learning to rank model, which designs a learnable ranking function following the criterion of maximal marginal relevance, has shown effectiveness in search result diversification [Zhu et al. 2014]. The goodness of a diverse ranking model is usually evaluated with diversity evaluation measures such as α-NDCG [Clarke et al. 2008], ERR-IA [Chapelle et al. 2009], and D#-NDCG [Sakai and Song 2011]. Ideally, the learning algorithm would train a ranking model that directly optimizes the diversity evaluation measures on the training data. Existing relational learning to rank algorithms, however, train ranking models by optimizing loss functions that only loosely relate to the evaluation measures. To deal with this problem, we propose a general framework for learning relational ranking models by directly optimizing any diversity evaluation measure. During learning, a loss function that upper-bounds the basic loss function defined on a diversity evaluation measure is minimized. New diverse ranking algorithms can be derived under the framework, and several are created based on different upper bounds over the basic loss function. We compared the proposed algorithms with conventional diverse ranking methods on the TREC benchmark datasets. Experimental results show that the algorithms derived under the diverse learning to rank framework always significantly outperform the state-of-the-art baselines.
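One of the measures named above, α-NDCG, rewards covering new subtopics and discounts repeats. A minimal sketch, with documents represented as the sets of query subtopics they cover (toy data, not TREC judgments):

```python
# alpha-NDCG following Clarke et al. (2008): a subtopic already seen
# r times contributes (1 - alpha) ** r, discounted by log2 of the rank.
import math

def alpha_dcg(ranking, alpha):
    seen = {}
    score = 0.0
    for k, doc in enumerate(ranking, start=1):
        gain = sum((1 - alpha) ** seen.get(t, 0) for t in doc)
        for t in doc:
            seen[t] = seen.get(t, 0) + 1
        score += gain / math.log2(k + 1)
    return score

def alpha_ndcg(ranking, alpha=0.5):
    # The ideal ranking is built greedily, the standard practice,
    # since finding the exact optimum is intractable in general.
    remaining = list(ranking)
    ideal = []
    while remaining:
        best = max(remaining, key=lambda d: alpha_dcg(ideal + [d], alpha))
        ideal.append(best)
        remaining.remove(best)
    denom = alpha_dcg(ideal, alpha)
    return alpha_dcg(ranking, alpha) / denom if denom else 0.0

# Documents as sets of covered subtopics; a diverse order scores higher.
redundant = [{"a"}, {"a"}, {"b"}]
diverse = [{"a"}, {"b"}, {"a"}]
```

The discontinuous, rank-dependent form of this measure is exactly why the paper minimizes a continuous upper bound on the measure-based loss instead of the measure itself.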

AAAI Conference 2016 Conference Paper

A Deep Architecture for Semantic Matching with Multiple Positional Sentence Representations

  • Shengxian Wan
  • Yanyan Lan
  • Jiafeng Guo
  • Jun Xu
  • Liang Pang
  • Xueqi Cheng

Matching natural language sentences is central to many applications such as information retrieval and question answering. Existing deep models rely on a single sentence representation or multiple-granularity representations for matching. However, such methods cannot well capture the contextualized local information in the matching process. To tackle this problem, we present a new deep architecture that matches two sentences with multiple positional sentence representations. Specifically, each positional sentence representation is a sentence representation at a given position, generated by a bidirectional long short-term memory (Bi-LSTM). The matching score is produced by aggregating interactions between these different positional sentence representations, through k-Max pooling and a multi-layer perceptron. Our model has several advantages: (1) by using a Bi-LSTM, the rich context of the whole sentence is leveraged to capture the contextualized local information in each positional sentence representation; (2) by matching with multiple positional sentence representations, the model can flexibly aggregate the important contextualized local information in a sentence to support the matching; (3) experiments on different tasks such as question answering and sentence completion demonstrate the superiority of our model.
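The aggregation step above can be illustrated with a toy k-Max pooling: the strongest interaction values between positional representations are kept and passed on. Scalar interaction values stand in here for the real interaction tensor over Bi-LSTM states.

```python
# k-Max pooling over interaction values: keep the k largest signals
# (the paper pools over a flattened interaction tensor).

def k_max_pooling(values, k):
    return sorted(values, reverse=True)[:k]

# Made-up interactions between positional representations of two sentences.
interactions = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8]
top_k = k_max_pooling(interactions, k=3)
# top_k would then be fed to a multi-layer perceptron to produce the
# final matching score.
```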

JBHI Journal 2016 Journal Article

Fusing Heterogeneous Features From Stacked Sparse Autoencoder for Histopathological Image Analysis

  • Xiaofan Zhang
  • Hang Dou
  • Tao Ju
  • Jun Xu
  • Shaoting Zhang

In the analysis of histopathological images, both holistic (e.g., architecture) features and local appearance features demonstrate excellent performance, yet their accuracy may vary dramatically for different inputs. This motivates us to investigate how to fuse results from these features to enhance accuracy. In particular, we employ content-based image retrieval approaches to discover morphologically relevant images for image-guided diagnosis, using holistic and local features, both generated from cell detection results by a stacked sparse autoencoder. Because of the dramatically different characteristics and representations of these heterogeneous features (i.e., holistic and local), their results may disagree, causing difficulties for traditional fusion methods. In this paper, we employ a graph-based query-specific fusion approach in which multiple retrieval results (i.e., rank lists) are integrated and reordered based on a fused graph. The proposed method can combine the strengths of local or holistic features adaptively for different inputs. We evaluate our method on a challenging clinical problem, histopathological image-guided diagnosis of intraductal breast lesions, and it achieves 91.67% classification accuracy on 120 breast tissue images from 40 patients.

AAAI Conference 2016 Conference Paper

Inside Out: Two Jointly Predictive Models for Word Representations and Phrase Representations

  • Fei Sun
  • Jiafeng Guo
  • Yanyan Lan
  • Jun Xu
  • Xueqi Cheng

The distributional hypothesis lies at the root of most existing word representation models, which infer word meaning from external contexts. However, distributional models cannot handle rare and morphologically complex words very well and fail to identify some fine-grained linguistic regularities because they ignore word forms. Morphology, by contrast, points out that words are built from basic units, i.e., morphemes. Therefore, the meaning and function of rare words can be inferred from words sharing the same morphemes, and many syntactic relations can be identified directly from word forms. The limitation of morphology, however, is that it cannot infer the relationship between two words that share no morphemes. Considering the advantages and limitations of both approaches, we propose two novel models, called BEING and SEING, that build better word representations by modeling both external contexts and internal morphemes in a jointly predictive way. These two models can also be extended to learn phrase representations according to distributed morphology theory. We evaluate the proposed models on similarity and analogy tasks. The results demonstrate that the proposed models significantly outperform state-of-the-art models on both word and phrase representation learning.

IJCAI Conference 2016 Conference Paper

Match-SRNN: Modeling the Recursive Matching Structure with Spatial RNN

  • Shengxian Wan
  • Yanyan Lan
  • Jun Xu
  • Jiafeng Guo
  • Liang Pang
  • Xueqi Cheng

Semantic matching, which aims to determine the matching degree between two texts, is a fundamental problem for many NLP applications. Recently, deep learning approaches have been applied to this problem and significant improvements have been achieved. In this paper, we propose to view the generation of the global interaction between two texts as a recursive process: the interaction of two texts at each position is a composition of the interactions between their prefixes and the word-level interaction at the current position. Based on this idea, we propose a novel deep architecture, namely Match-SRNN, to model the recursive matching structure. First, a tensor is constructed to capture the word-level interactions. Then a spatial RNN is applied to integrate the local interactions recursively, with importance determined by four types of gates. Finally, the matching score is calculated based on the global interaction. We show that, when degenerated to the exact matching scenario, Match-SRNN can approximate the dynamic programming process of the longest common subsequence, so there exists a clear interpretation for Match-SRNN. Our experiments on two semantic matching tasks show the effectiveness of Match-SRNN and its ability to visualize the learned matching structure.
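The longest common subsequence connection mentioned above is concrete: the classical LCS recursion composes, at each cell, the left, top, and diagonal predecessors with the word-level match at that position, which is the recursive structure a spatial RNN generalizes with learned gates. A minimal LCS sketch:

```python
# Classical LCS dynamic programming over word sequences. Each cell
# h[i][j] depends on h[i-1][j], h[i][j-1], and h[i-1][j-1] plus the
# exact-match signal at (i, j); Match-SRNN replaces this hard max/plus
# composition with a gated, learned function over the same neighbors.

def lcs_length(s1, s2):
    n, m = len(s1), len(s2)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 1 if s1[i - 1] == s2[j - 1] else 0
            h[i][j] = max(h[i - 1][j], h[i][j - 1], h[i - 1][j - 1] + match)
    return h[n][m]

length = lcs_length("the cat sat".split(), "the fat cat".split())
```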

AAAI Conference 2016 Conference Paper

SPAN: Understanding a Question with Its Support Answers

  • Liang Pang
  • Yanyan Lan
  • Jiafeng Guo
  • Jun Xu
  • Xueqi Cheng

Matching a question to its best answer is a common task in community question answering. In this paper, we focus on non-factoid questions and aim to pick out the best answer from the candidate answers. Most existing deep models directly measure the similarity between question and answer by their individual sentence embeddings. To tackle the lack of information in question descriptions and the lexical gap between questions and answers, we propose a novel deep architecture named SPAN. Specifically, we introduce support answers to help understand the question, defined as the best answers of questions similar to the original one. We then obtain two kinds of similarities: one between the question and the candidate answer, and the other between the support answers and the candidate answer. The matching score is generated by combining them. Experiments on Yahoo! Answers demonstrate that SPAN can outperform the baseline models.

AAAI Conference 2016 Conference Paper

Text Matching as Image Recognition

  • Liang Pang
  • Yanyan Lan
  • Jiafeng Guo
  • Jun Xu
  • Shengxian Wan
  • Xueqi Cheng

Matching two texts is a fundamental problem in many natural language processing tasks. An effective way is to extract meaningful matching patterns from words, phrases, and sentences to produce the matching score. Inspired by the success of convolutional neural network in image recognition, where neurons can capture many complicated patterns based on the extracted elementary visual patterns such as oriented edges and corners, we propose to model text matching as the problem of image recognition. Firstly, a matching matrix whose entries represent the similarities between words is constructed and viewed as an image. Then a convolutional neural network is utilized to capture rich matching patterns in a layer-by-layer way. We show that by resembling the compositional hierarchies of patterns in image recognition, our model can successfully identify salient signals such as n-gram and n-term matchings. Experimental results demonstrate its superiority against the baselines.
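The first step described in the abstract, constructing the matching matrix that is then treated as an image, can be sketched directly. The tiny word vectors below are invented for illustration; a real system would use learned embeddings.

```python
# Matching matrix construction: entry (i, j) is a similarity between
# word i of one text and word j of the other; the grid is then treated
# as a single-channel image for a convolutional neural network.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def matching_matrix(words1, words2, embeddings):
    return [[cosine(embeddings[w1], embeddings[w2]) for w2 in words2]
            for w1 in words1]

# Made-up 2-d embeddings; similar word pairs yield bright "pixels".
embeddings = {
    "down": (1.0, 0.0), "the": (0.0, 1.0),
    "ages": (0.9, 0.1), "noise": (0.2, 0.8),
}
m = matching_matrix(["down", "the"], ["ages", "noise"], embeddings)
# A CNN over m can then detect n-gram and n-term matching patterns the
# way it detects edges and corners in images.
```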

AAAI Conference 2015 Conference Paper

A Probabilistic Model for Bursty Topic Discovery in Microblogs

  • Xiaohui Yan
  • Jiafeng Guo
  • Yanyan Lan
  • Jun Xu
  • Xueqi Cheng

Bursty topic discovery in microblogs is important for people to grasp essential and valuable information. However, the task is challenging since microblog posts are particularly short and noisy. This work develops a novel probabilistic model, namely the Bursty Biterm Topic Model (BBTM), to deal with the task. BBTM extends the Biterm Topic Model (BTM) by incorporating the burstiness of biterms as prior knowledge for bursty topic modeling, which enjoys the following merits: 1) it solves the data sparsity problem in topic modeling over short texts as well as BTM does; 2) it can automatically discover high-quality bursty topics in microblogs in a principled and efficient way. Extensive experiments on a standard Twitter dataset show that our approach significantly outperforms state-of-the-art baselines.

JMLR Journal 2011 Journal Article

Learning a Robust Relevance Model for Search Using Kernel Methods

  • Wei Wu
  • Jun Xu
  • Hang Li
  • Satoshi Oyama

This paper points out that many search relevance models in information retrieval, such as the Vector Space Model, BM25, and Language Models for Information Retrieval, can be viewed as a similarity function between pairs of objects of different types, referred to as an S-function. An S-function is defined as the dot product between the images of two objects in a Hilbert space, mapped from two different input spaces. One advantage of this view is that one can take a unified and principled approach to issues of search relevance. The paper then proposes employing a kernel method to learn a robust relevance model as an S-function, which can effectively deal with the term mismatch problem, one of the biggest challenges in search. The kernel method exploits a positive semi-definite kernel referred to as an S-kernel. The paper shows that when using an S-kernel, the model learned by the kernel method is guaranteed to be an S-function, and it gives more general principles for constructing S-kernels. A specific implementation of the kernel method is proposed using Ranking SVM techniques and click-through data. The proposed approach is employed to learn a relevance model as an extension of BM25, referred to as Robust BM25. Experimental results on web search and enterprise search data show that Robust BM25 significantly outperforms baseline methods and can successfully tackle the term mismatch problem.
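The baseline the paper extends, BM25 viewed as a query-document similarity function, can be sketched with the standard formula. The corpus statistics and parameter values below are toy assumptions for illustration, not the paper's experimental setup.

```python
# Okapi BM25 (with the common +1 smoothing inside the idf log) as a
# similarity function between a query and a document.
import math

def bm25(query_terms, doc_terms, doc_freq, n_docs, avgdl, k1=1.2, b=0.75):
    score = 0.0
    dl = len(doc_terms)
    for t in query_terms:
        tf = doc_terms.count(t)
        if tf == 0:
            continue  # term mismatch: the term contributes nothing
        idf = math.log((n_docs - doc_freq[t] + 0.5) / (doc_freq[t] + 0.5) + 1)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
    return score

doc_freq = {"kernel": 3, "method": 40}
doc = "kernel method for search relevance".split()
score = bm25(["kernel", "method"], doc, doc_freq, n_docs=100, avgdl=6.0)
# Robust BM25 addresses exactly the tf == 0 branch above: query terms
# absent from the document score zero under plain BM25 even when the
# document is relevant, which is the term mismatch problem.
```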