Arrow Research search

Author name cluster

Yang Deng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

19 papers
2 author rows

Possible papers

19

AAAI Conference 2026 Conference Paper

Do Retrieval Augmented Language Models Know When They Don’t Know?

  • Youchao Zhou
  • Heyan Huang
  • Yicheng Liu
  • Rui Dai
  • Xinglin Wang
  • Xingchen Zhang
  • Shumin Shi
  • Yang Deng

Existing large language models (LLMs) occasionally generate plausible yet factually incorrect responses, known as hallucinations. Two main approaches have been proposed to mitigate hallucinations: retrieval-augmented language models (RALMs) and refusal post-training. However, current research predominantly focuses on their individual effectiveness while overlooking the evaluation of the refusal capability of RALMs. Ideally, if RALMs know when they do not know, they should refuse to answer. In this study, we ask the fundamental question: Do RALMs know when they don’t know? Specifically, we investigate three questions. First, are RALMs well calibrated with respect to different internal and external knowledge states? We examine the influence of various factors. Contrary to expectations, when all retrieved documents are irrelevant, RALMs still tend to refuse questions they could have answered correctly. Next, given the model's pronounced over-refusal behavior, we raise a second question: How does a RALM's refusal ability align with its calibration quality? Our results show that the over-refusal problem can be mitigated through in-context fine-tuning. However, we observe that improved refusal behavior does not necessarily imply better calibration or higher overall accuracy. Finally, we ask: Can we combine refusal-aware RALMs with uncertainty-based answer abstention to mitigate over-refusal? We develop a simple yet effective refusal mechanism for refusal-post-trained RALMs that improves their overall answer quality by balancing refusal and correct answers. Our study provides a more comprehensive understanding of the factors influencing RALM behavior. Meanwhile, we emphasize that uncertainty estimation for RALMs remains an open problem deserving deeper investigation.
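The abstention idea behind the paper's refusal mechanism can be sketched in a few lines. Everything below (the function name, threshold, and candidate answers) is invented for illustration and is not taken from the paper: answer only when the best candidate's confidence clears a calibrated threshold, otherwise refuse.

```python
def abstain_or_answer(candidates, threshold=0.6):
    """candidates: list of (answer, confidence) pairs from a RALM.
    Returns the top answer, or a refusal when confidence is too low."""
    answer, confidence = max(candidates, key=lambda c: c[1])
    if confidence < threshold:
        return "I don't know."  # abstain rather than risk a hallucination
    return answer

print(abstain_or_answer([("Paris", 0.92), ("Lyon", 0.05)]))  # confident -> "Paris"
print(abstain_or_answer([("Paris", 0.40), ("Lyon", 0.35)]))  # uncertain -> refuse
```

The balance the abstract describes corresponds to tuning the threshold: raising it trades more refusals for fewer wrong answers.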

AAAI Conference 2026 Conference Paper

Pano-GS: Perception-Aware Gaussian Optimization with Gradient Consistency and Multi-Criteria Densification for High-Quality Rendering

  • Yang Deng
  • Zhanke Wang
  • Jiahao Wu
  • Jie Liang
  • Jingui Ma
  • Yang Hu
  • Ronggang Wang

Reconstructing 3D scenes from multi-view image sequences remains a significant challenge in practical applications. While recent advances in 3D Gaussian Splatting have enabled high-quality rendering, existing methods rely heavily on pixel-level L1 loss, which misaligns with human perception, leading to a lack of high-frequency details and the emergence of artifacts. Additionally, the position gradient-based densification strategy often results in under-densified Gaussian primitives, thereby degrading rendering quality. To address these challenges, we propose Pano-GS, a perception-aware Gaussian optimization framework. Specifically, we introduce a gradient consistency-constrained loss to capture high-frequency details, mitigating the inherent shortcomings of traditional L1 loss and enhancing reconstruction fidelity. In addition, we use a multi-criteria densification strategy to reduce the sole reliance on average position gradients. Extensive experiments demonstrate that Pano-GS achieves state-of-the-art performance, confirming its effectiveness and robust generalization across diverse real-world scenes.
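The gradient-consistency idea can be illustrated with a minimal finite-difference sketch (made-up inputs, not the authors' actual loss): penalize mismatch between the spatial gradients of the rendered and reference images, which plain pixel-wise L1 does not directly constrain.

```python
import numpy as np

def gradient_consistency_loss(pred, gt):
    """Compare horizontal and vertical finite differences of the two
    images; mismatched gradients indicate lost high-frequency detail."""
    dx_p, dy_p = np.diff(pred, axis=1), np.diff(pred, axis=0)
    dx_g, dy_g = np.diff(gt, axis=1), np.diff(gt, axis=0)
    return np.abs(dx_p - dx_g).mean() + np.abs(dy_p - dy_g).mean()

img = np.eye(4)
print(gradient_consistency_loss(img, img))  # identical images -> 0.0
```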

AAAI Conference 2026 Conference Paper

RecToM: A Benchmark for Evaluating Machine Theory of Mind in LLM-based Conversational Recommender Systems

  • Mengfan Li
  • Xuanhua Shi
  • Yang Deng

Large language models (LLMs) are revolutionizing conversational recommender systems (CRS) through their impressive capabilities in instruction comprehension, reasoning, and human interaction. A core factor underlying effective dialogue is the ability to infer and reason about others' mental states (such as desire, intention, and belief), a cognitive capacity commonly referred to as Theory of Mind (ToM). Despite growing interest in evaluating ToM in LLMs, current benchmarks predominantly rely on synthetic narratives inspired by the Sally-Anne test, which emphasize physical perception and fail to capture the complexity of mental state inference in real-world conversational settings. Moreover, existing benchmarks often overlook a critical component of human ToM: behavioral prediction, the ability to use inferred mental states to guide strategic decision-making and select appropriate conversational actions for future interactions. To better align LLM-based ToM evaluation with human-like social reasoning, we propose RecToM, a novel benchmark for evaluating ToM abilities in recommendation dialogues. RecToM focuses on two complementary dimensions: Cognitive Inference and Behavioral Prediction. The former focuses on understanding what has been communicated by inferring the underlying mental states. The latter emphasizes what should be done next, evaluating whether LLMs can leverage these inferred mental states to predict, select, and assess appropriate dialogue strategies. Together, these dimensions enable a comprehensive assessment of ToM reasoning in CRS. Extensive experiments on state-of-the-art LLMs demonstrate that RecToM poses a significant challenge. While the models exhibit partial competence in recognizing mental states, they struggle to maintain coherent, strategic ToM reasoning throughout dynamic recommendation dialogues, particularly in tracking evolving intentions and aligning conversational strategies with inferred mental states.

AAAI Conference 2026 Conference Paper

Towards Human-centered Proactive Conversational AI

  • Yang Deng

Conversational AI agents are envisioned to provide social support or functional service to human users via natural language interactions. The popularity of conversational AI has grown unprecedentedly with the advent of ChatGPT, which showcases exceptional proficiency in context understanding and response generation with large language models (LLMs). However, typical conversational systems are built to follow instructions, which means that the conversation is led by the user, and the system simply follows the user’s instructions or intents. My research endows conversational AI with the capabilities of creating or controlling the conversation to achieve conversational goals by taking initiative and anticipating impacts on itself or human users, namely Proactive Conversational AI. I will also highlight the importance of moving towards building human-centered proactive conversational AI that emphasizes human needs and expectations and considers the ethical and social implications of these agents, rather than solely focusing on technological capabilities.

AAAI Conference 2026 Conference Paper

Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilities

  • Weixiang Zhao
  • Xingyu Sui
  • Jiahe Guo
  • Yulin Hu
  • Yang Deng
  • Yanyan Zhao
  • Xuda Zhi
  • Yongbo Huang

Recent advancements in Large Reasoning Models (LRMs), such as OpenAI's o1/o3 and DeepSeek-R1, have demonstrated remarkable performance in specialized reasoning tasks through human-like deliberative thinking and long chain-of-thought reasoning. However, our systematic evaluation across various model families (DeepSeek, Qwen, and LLaMA) and scales (7B to 32B) reveals that acquiring these deliberative reasoning capabilities significantly reduces the foundational capabilities of LRMs, including notable declines in helpfulness and harmlessness, alongside substantially increased inference costs. Importantly, we demonstrate that adaptive reasoning---employing modes like Zero-Thinking, Less-Thinking, and Summary-Thinking---can effectively alleviate these drawbacks. Our empirical insights underline the critical need for developing more versatile LRMs capable of dynamically allocating inference-time compute according to specific task characteristics.

AAAI Conference 2025 Conference Paper

Aligning Large Language Models for Faithful Integrity Against Opposing Argument

  • Yong Zhao
  • Yang Deng
  • See-Kiong Ng
  • Tat-Seng Chua

Large Language Models (LLMs) have demonstrated impressive capabilities in complex reasoning tasks. However, they can be easily misled by unfaithful arguments during conversations, even when their original statements are correct. To this end, we investigate the problem of maintaining faithful integrity in LLMs. This involves ensuring that LLMs adhere to their faithful statements in the face of opposing arguments and are able to correct their incorrect statements when presented with faithful arguments. In this work, we propose a novel framework, named Alignment for Faithful Integrity with Confidence Estimation (AFICE), which aims to align LLM responses with faithful integrity. Specifically, AFICE first designs a Bilateral Confidence Estimation (BCE) approach for estimating the uncertainty of each response generated by the LLM in a given context, which simultaneously estimates the model's confidence in the question, based on the internal states during decoding, and in the answer, based on cumulative probability ratios. With the BCE, we construct a conversational preference dataset composed of context, original statement, and argument, which is adopted for aligning the LLM for faithful integrity using Direct Preference Optimization (DPO). Extensive experimental results on a wide range of benchmarks demonstrate significant improvements in the LLM's ability to maintain faithful responses when encountering opposing arguments, ensuring both the practical utility and trustworthiness of LLMs in complex interactive settings.

AAAI Conference 2025 Conference Paper

LightPROF: A Lightweight Reasoning Framework for Large Language Model on Knowledge Graph

  • Tu Ao
  • Yanhua Yu
  • Yuling Wang
  • Yang Deng
  • Zirui Guo
  • Liang Pang
  • Pinghui Wang
  • Tat-Seng Chua

Large Language Models (LLMs) have impressive capabilities in text understanding and zero-shot reasoning. However, delays in knowledge updates may cause them to reason incorrectly or produce harmful results. Knowledge Graphs (KGs) provide rich and reliable contextual information for the reasoning process of LLMs by structurally organizing and connecting a wide range of entities and relations. Existing KG-based LLM reasoning methods only inject KGs' knowledge into prompts in textual form, ignoring structural information. Moreover, they mostly rely on closed-source models or open-source models with large parameter counts, which leads to high resource consumption. To address this, we propose a novel Lightweight and efficient Prompt learning-ReasOning Framework for KGQA (LightPROF), which leverages the full potential of LLMs to tackle complex reasoning tasks in a parameter-efficient manner. Specifically, LightPROF follows a “Retrieve-Embed-Reason” process, first accurately and stably retrieving the corresponding reasoning graph from the KG through a retrieval module. Next, through a Transformer-based Knowledge Adapter, it finely extracts and integrates factual and structural information from the KG, then maps this information to the LLM’s token embedding space, creating an LLM-friendly prompt to be used by the LLM for the final reasoning. Additionally, LightPROF only requires training the Knowledge Adapter and is compatible with any open-source LLM. Extensive experiments on two public KGQA benchmarks demonstrate that LightPROF achieves superior performance with small-scale LLMs. Furthermore, LightPROF shows significant advantages in terms of input token count and reasoning time.

IROS Conference 2025 Conference Paper

Mapless Collision-Free Flight via MPC using Dual KD-Trees in Cluttered Environments

  • Linzuo Zhang
  • Yu Hu 0019
  • Yang Deng
  • Feng Yu 0024
  • Danping Zou

Collision-free flight in cluttered environments is a critical capability for autonomous quadrotors. Traditional methods often rely on detailed 3D map construction, trajectory generation, and tracking. However, this cascade pipeline can introduce accumulated errors and computational delays, limiting flight agility and safety. In this paper, we propose a novel method for enabling collision-free flight in cluttered environments without explicitly constructing 3D maps or generating and tracking collision-free trajectories. Instead, we leverage Model Predictive Control (MPC) to directly produce safe actions from sparse waypoints and point clouds from a depth camera. These sparse waypoints are dynamically adjusted online based on nearby obstacles detected from point clouds. To achieve this, we introduce a dual KD-Tree mechanism: the Obstacle KD-Tree quickly identifies the nearest obstacle for avoidance, while the Edge KD-Tree provides a robust initial guess for the MPC solver, preventing it from getting stuck in local minima during obstacle avoidance. We validate our approach through extensive simulations and real-world experiments. The results show that our approach significantly outperforms the mapping-based methods and is also superior to imitation learning-based methods, demonstrating reliable obstacle avoidance at up to 12 m/s in simulations and 6 m/s in real-world tests. Our method provides a simple and robust alternative to existing methods. The code is publicly available at https://github.com/SJTU-ViSYS-team/avoid-mpc.
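The obstacle-query half of the dual KD-Tree mechanism can be sketched as follows. This is an illustrative snippet with an invented point cloud and drone pose, not the released code; a brute-force query is used here for clarity, whereas the paper builds a KD-tree over the cloud so the same lookup runs in roughly logarithmic time.

```python
import numpy as np

# Points from a depth camera (made-up coordinates for illustration).
obstacles = np.array([[2.0, 0.0, 1.0],
                      [5.0, 1.0, 1.0],
                      [1.0, -3.0, 1.0]])

def nearest_obstacle(pos):
    """Return the closest obstacle point and its distance, so an MPC
    cost term can penalize proximity to it."""
    dists = np.linalg.norm(obstacles - pos, axis=1)
    i = int(np.argmin(dists))
    return obstacles[i], dists[i]

point, dist = nearest_obstacle(np.array([1.5, 0.2, 1.0]))
print(point, round(dist, 3))
```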

NeurIPS Conference 2025 Conference Paper

SAP: Exact Sorting in Splatting via Screen-Aligned Primitives

  • Zhanke Wang
  • Zhiyan Wang
  • Kaiqiang Xiong
  • Wu Jiahao
  • Yang Deng
  • Ronggang Wang

Recently, 3D Gaussian Splatting (3DGS) has achieved state-of-the-art rendering results. However, its efficiency relies on simplifications that disregard the thickness of Gaussian primitives and their overlapping interactions. These simplifications can lead to popping artifacts due to inaccurate sorting, thereby affecting the rendering quality. In this paper, we propose Screen-Aligned Primitives (SAP), an anisotropic kernel that generates primitives parallel to the image plane for each view. Our rasterization pipeline enables full per-pixel ordering in real time. Since the primitives are parallel for a given viewpoint, a single global sorting operation suffices for correct per-pixel depth ordering. We formulate 3D reconstruction as a combination of a 3D-consistent decoder and 2D view-specific primitives, and further propose a highly efficient decoder to ensure 3D consistency. Moreover, within our framework, the primitive function values remain consistent between view space and screen space, allowing arbitrary radial basis functions (RBFs) to represent the scene without introducing projection errors. Experiments on diverse datasets demonstrate that our method achieves state-of-the-art rendering quality while maintaining real-time performance.

NeurIPS Conference 2025 Conference Paper

The Rise of Parameter Specialization for Knowledge Storage in Large Language Models

  • Yihuai Hong
  • Yiran Zhao
  • Wei Tang
  • Yang Deng
  • Yu Rong
  • Wenxuan Zhang

Over time, a growing wave of large language models from various series has been introduced to the community. Researchers are striving to maximize the performance of language models with constrained parameter sizes. However, from a microscopic perspective, there has been limited research on how to better store knowledge in model parameters, particularly within MLPs, to enable more effective utilization of this knowledge by the model. In this work, we analyze twenty publicly available open-source large language models to investigate the relationship between their strong performance and the way knowledge is stored in their corresponding MLP parameters. Our findings reveal that as language models become more advanced and demonstrate stronger knowledge capabilities, their parameters exhibit increased specialization. Specifically, parameters in the MLPs tend to be more focused on encoding similar types of knowledge. We experimentally validate that this specialized distribution of knowledge contributes to improving the efficiency of knowledge utilization in these models. Furthermore, by conducting causal training experiments, we confirm that this specialized knowledge distribution plays a critical role in improving the model's efficiency in leveraging stored knowledge.

NeurIPS Conference 2025 Conference Paper

Unveiling the Uncertainty in Embodied and Operational Carbon of Large AI Models through a Probabilistic Carbon Accounting Model

  • Xiaoyang Zhang
  • He Fang
  • Yang Deng
  • Dan Wang

The rapid growth of large AI models has raised significant environmental concerns due to their substantial carbon footprint. Existing carbon accounting methods for AI models are fundamentally deterministic and fail to account for inherent uncertainties in embodied and operational carbon emissions. Our work aims to investigate the effect of these uncertainties on embodied and operational carbon footprint estimates for large AI models. We propose a Probabilistic Carbon Accounting Model (PCAM), which quantifies uncertainties in the carbon accounting of large AI models. We develop parameter models to quantify key components (processors, memory, storage) in the carbon footprint of AI models. To characterize the distribution of the parameters, we develop a carbon dataset by aggregating related data from various sources. Then, we generate the probabilistic distribution of the parameters from the collected dataset. We compare the performance of PCAM with LLMCarbon, the state-of-the-art carbon accounting method for large AI models. PCAM achieves $\leq 7.44\%$ error compared to LLMCarbon’s $\leq 108.51\%$.
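A Monte Carlo sketch of the probabilistic-accounting idea is below. All distributions and figures are invented for illustration; PCAM's actual parameter models are fitted to the collected dataset described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # Monte Carlo samples

# Embodied carbon: sum of uncertain per-component estimates (tCO2e).
embodied = (rng.normal(30.0, 5.0, n)      # processors
            + rng.normal(10.0, 2.0, n)    # memory
            + rng.normal(5.0, 1.0, n))    # storage

# Operational carbon: uncertain grid intensity times uncertain energy.
grid_intensity = rng.uniform(0.3, 0.6, n)  # kgCO2e per kWh
energy_mwh = rng.normal(1200.0, 150.0, n)  # training energy (MWh)
operational = grid_intensity * energy_mwh  # 1 MWh * (kg/kWh) = 1 tCO2e

total = embodied + operational
lo, hi = np.percentile(total, [5, 95])
print(f"total: {total.mean():.0f} tCO2e (90% interval {lo:.0f}-{hi:.0f})")
```

The output is a distribution rather than a point estimate, which is exactly the uncertainty a deterministic accounting method discards.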

IJCAI Conference 2025 Conference Paper

Weather Foundation Model Enhanced Decentralized Photovoltaic Power Forecasting Through Spatio-temporal Knowledge Distillation

  • Fang He
  • Jiaqi Fan
  • Yang Deng
  • Xiaoyang Zhang
  • Ka Tai Lau
  • Dan Wang

Solar photovoltaic power forecasting (SPPF) for a PV system is vital for downstream power estimation. Recent approaches for decentralized PV systems require customized models for each PV installation, which is labor-intensive and not scalable. Therefore, developing a general SPPF model for decentralized PV systems is essential. The primary challenge in developing such a model is accounting for regional weather variations. Recent advancements in weather foundation models (WFMs) offer a promising opportunity, providing accurate forecasts with reduced computational demands. However, integrating WFMs into SPPF models remains challenging due to the complexity of WFMs. This paper introduces a novel approach, spatio-temporal knowledge distillation (STKD), to efficiently adapt WFMs for SPPF. The proposed STKD-PV models leverage regional weather and PV power data to forecast power generation from six hours to a day ahead. Evaluated globally across six datasets, STKD-PV models demonstrate superior performance compared to state-of-the-art (SOTA) time-series models and fine-tuned WFMs, achieving significant improvements in forecasting accuracy. This study marks the first application of knowledge distillation from WFMs to SPPF, offering a scalable and cost-effective solution for decentralized PV systems.

NeurIPS Conference 2025 Conference Paper

When Less Language is More: Language-Reasoning Disentanglement Makes LLMs Better Multilingual Reasoners

  • Weixiang Zhao
  • Jiahe Guo
  • Yang Deng
  • Tongtong Wu
  • Wenxuan Zhang
  • Yulin Hu
  • Xingyu Sui
  • Yanyan Zhao

Multilingual reasoning remains a significant challenge for large language models (LLMs), with performance disproportionately favoring high-resource languages. Drawing inspiration from cognitive neuroscience, which suggests that human reasoning functions largely independently of language processing, we hypothesize that LLMs similarly encode reasoning and language as separable components that can be disentangled to enhance multilingual reasoning. To evaluate this, we perform a causal intervention by ablating language-specific representations at inference time. Experiments on 10 open-weight LLMs spanning 11 typologically diverse languages show that this language-specific ablation consistently boosts multilingual reasoning performance. Layer-wise analyses further confirm that language and reasoning representations can be effectively disentangled throughout the model, yielding improved multilingual reasoning capabilities, while preserving top-layer language features remains essential for maintaining linguistic fidelity. Compared to post-training methods such as supervised fine-tuning or reinforcement learning, our training-free language-reasoning disentanglement achieves comparable or superior results with minimal computational overhead. These findings shed light on the internal mechanisms underlying multilingual reasoning in LLMs and suggest a lightweight and interpretable strategy for improving cross-lingual generalization.
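The ablation can be illustrated with a rank-1 toy example. The "language direction" and hidden state below are made up; the paper operates on learned language-specific representations inside real LLMs, but the geometric operation is the same: remove the component of a hidden state along the language-specific direction and keep everything else.

```python
import numpy as np

def ablate_direction(h, v):
    """Project out the component of hidden state h along direction v
    (a rank-1 stand-in for a language-specific subspace)."""
    v = v / np.linalg.norm(v)
    return h - (h @ v) * v

h = np.array([3.0, 4.0, 0.0])          # toy hidden state
v = np.array([1.0, 0.0, 0.0])          # pretend this encodes language identity
h_ablated = ablate_direction(h, v)
print(h_ablated)                        # [0. 4. 0.]
```

Note the operation is idempotent: ablating an already-ablated state changes nothing, since the language component is gone.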

IJCAI Conference 2023 Conference Paper

A Survey on Proactive Dialogue Systems: Problems, Methods, and Prospects

  • Yang Deng
  • Wenqiang Lei
  • Wai Lam
  • Tat-Seng Chua

Proactive dialogue systems, related to a wide range of real-world conversational applications, equip the conversational agent with the capability of leading the conversation direction towards achieving pre-defined targets or fulfilling certain goals from the system side. They are empowered by advanced techniques to progress to more complicated tasks that require strategic and motivational interactions. In this survey, we provide a comprehensive overview of the prominent problems and advanced designs for conversational agents' proactivity in different types of dialogues. Furthermore, we discuss challenges that meet real-world application needs but require greater research focus in the future. We hope that this first survey of proactive dialogue systems can provide the community with quick access to and an overall picture of this practical problem, and stimulate further progress that takes conversational AI to the next level.

NeurIPS Conference 2023 Conference Paper

Blurred-Dilated Method for Adversarial Attacks

  • Yang Deng
  • Weibin Wu
  • Jianping Zhang
  • Zibin Zheng

Deep neural networks (DNNs) are vulnerable to adversarial attacks, which lead to incorrect predictions. In black-box settings, transfer attacks can be conveniently used to generate adversarial examples. However, such examples tend to overfit the specific architecture and feature representations of the source model, resulting in poor attack performance against other target models. To overcome this drawback, we propose a novel model modification-based transfer attack: Blurred-Dilated method (BD) in this paper. In summary, BD works by reducing downsampling while introducing BlurPool and dilated convolutions in the source model. Then BD employs the modified source model to generate adversarial samples. We think that BD can more comprehensively preserve the feature information than the original source model. It thus enables more thorough destruction of the image features, which can improve the transferability of the generated adversarial samples. Extensive experiments on the ImageNet dataset show that adversarial examples generated by BD achieve significantly higher transferability than the state-of-the-art baselines. Besides, BD can be conveniently combined with existing black-box attack techniques to further improve their performance.
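The BlurPool ingredient of BD can be illustrated in one dimension (a minimal sketch, not the paper's implementation): low-pass filter with a small binomial kernel before subsampling, so the downsampling step aliases less than plain striding and preserves more feature information.

```python
import numpy as np

def blur_pool_1d(x):
    """Anti-aliased downsampling: binomial blur, then subsample by 2."""
    kernel = np.array([1.0, 2.0, 1.0]) / 4.0        # binomial blur kernel
    padded = np.pad(x, 1, mode="edge")              # keep output aligned
    blurred = np.convolve(padded, kernel, mode="valid")
    return blurred[::2]                             # stride-2 subsampling

# A high-frequency (alternating) signal that plain striding would mangle.
x = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
print(blur_pool_1d(x))
```

Plain stride-2 sampling of `x` would return all zeros, discarding the oscillation entirely; the blurred version keeps its average energy.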

ICLR Conference 2022 Conference Paper

Blaschke Product Neural Networks (BPNN): A Physics-Infused Neural Network for Phase Retrieval of Meromorphic Functions

  • Juncheng Dong
  • Simiao Ren
  • Yang Deng
  • Omar Khatib
  • Jordan M. Malof
  • Mohammadreza Soltani
  • Willie Padilla
  • Vahid Tarokh

Numerous physical systems are described by ordinary or partial differential equations whose solutions are given by holomorphic or meromorphic functions in the complex domain. In many cases, only the magnitudes of these functions are observed at various points on the purely imaginary $j\omega$-axis, since coherent measurement of their phases is often expensive. However, it is desirable to retrieve the lost phases from the magnitudes when possible. To this end, we propose a physics-infused deep neural network based on Blaschke products for phase retrieval. Inspired by the Helson and Sarason Theorem, we recover coefficients of a rational function of Blaschke products using a Blaschke Product Neural Network (BPNN), based upon the magnitude observations as input. The resulting rational function is then used for phase retrieval. We compare the BPNN to conventional deep neural networks (NNs) on several phase retrieval problems, comprising both synthetic and contemporary real-world problems (e.g., metamaterials, for which data collection requires substantial expertise and is time consuming). On each phase retrieval problem, we compare against a population of conventional NNs of varying size and hyperparameter settings. Even without any hyperparameter search, we find that BPNNs consistently outperform the population of optimized NNs in scarce data scenarios, and do so despite being much smaller models. The results can in turn be applied to calculate the refractive index of metamaterials, which is an important problem in emerging areas of material science.
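For context, the standard textbook form of a finite Blaschke product (not taken from the paper) on the unit disk is:

```latex
% Finite Blaschke product with zeros a_k inside the unit disk.
\[
  B(z) \;=\; e^{i\theta} \prod_{k=1}^{n} \frac{z - a_k}{1 - \bar{a}_k\, z},
  \qquad |a_k| < 1,
  \qquad |B(z)| = 1 \ \text{for } |z| = 1.
\]
```

Because $|B(z)| = 1$ on the boundary, a Blaschke factor changes only the phase of a function, not its boundary magnitude; this is the structure that lets the BPNN fit phase information from magnitude-only observations.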

NeurIPS Conference 2021 Conference Paper

Benchmarking Data-driven Surrogate Simulators for Artificial Electromagnetic Materials

  • Yang Deng
  • Juncheng Dong
  • Simiao Ren
  • Omar Khatib
  • Mohammadreza Soltani
  • Vahid Tarokh
  • Willie Padilla
  • Jordan Malof

Artificial electromagnetic materials (AEMs), including metamaterials, derive their electromagnetic properties from geometry rather than chemistry. With the appropriate geometric design, AEMs have achieved exotic properties not realizable with conventional materials (e.g., cloaking or negative refractive index). However, the relationship between an AEM's structure and its properties is often poorly understood. While computational electromagnetic simulation (CEMS) may help design new AEMs, its use is limited due to its long computational time. Recently, it has been shown that deep learning can be an alternative solution to infer the relationship between an AEM geometry and its properties using a (relatively) small pool of CEMS data. However, the scarcity of publicly released datasets and models, and the absence of a widely used benchmark for comparison, have made applying deep learning approaches even more difficult. Furthermore, configuring CEMS for a specific problem requires substantial expertise and time, making reproducibility challenging. Here, we develop a collection of three classes of AEM problems: metamaterials, nanophotonics, and color filter designs. We also publicly release software, allowing other researchers to conduct additional simulations for each system easily. Finally, we conduct experiments on our benchmark datasets with three recent neural network architectures: the multilayer perceptron (MLP), MLP-Mixer, and Transformer. We identify the methods and models that generalize best over the three problems to establish best practices and baseline results upon which future research can build.

AAAI Conference 2020 Conference Paper

Joint Learning of Answer Selection and Answer Summary Generation in Community Question Answering

  • Yang Deng
  • Wai Lam
  • Yuexiang Xie
  • Daoyuan Chen
  • Yaliang Li
  • Min Yang
  • Ying Shen

Community question answering (CQA) has gained increasing popularity in both academia and industry in recent years. However, the redundancy and lengthiness of crowdsourced answers limit the performance of answer selection and lead to reading difficulties and misunderstandings for community users. To solve these problems, we tackle the tasks of answer selection and answer summary generation in CQA with a novel joint learning model. Specifically, we design a question-driven pointer-generator network, which exploits the correlation information between question-answer pairs to aid in attending to the essential information when generating answer summaries. Meanwhile, we leverage the answer summaries to alleviate noise in the original lengthy answers when ranking the relevancy degrees of question-answer pairs. In addition, we construct a new large-scale CQA corpus, WikiHowQA, which contains long answers for answer selection as well as reference summaries for answer summarization. The experimental results show that the joint learning method can effectively address the answer redundancy issue in CQA and achieves state-of-the-art results on both the answer selection and text summarization tasks. Furthermore, the proposed model shows strong transferability and applicability to resource-poor CQA tasks, which lack reference answer summaries.

AAAI Conference 2019 Conference Paper

Multi-Task Learning with Multi-View Attention for Answer Selection and Knowledge Base Question Answering

  • Yang Deng
  • Yuexiang Xie
  • Yaliang Li
  • Min Yang
  • Nan Du
  • Wei Fan
  • Kai Lei
  • Ying Shen

Answer selection and knowledge base question answering (KBQA) are two important tasks in question answering (QA) systems. Existing methods solve these two tasks separately, which requires a large amount of repetitive work and neglects the rich correlation information between the tasks. In this paper, we tackle answer selection and KBQA simultaneously via multi-task learning (MTL), motivated by two observations. First, both answer selection and KBQA can be regarded as ranking problems, one at the text level and the other at the knowledge level. Second, these two tasks can benefit each other: answer selection can incorporate external knowledge from the knowledge base (KB), while KBQA can be improved by learning contextual information from answer selection. To fulfill the goal of jointly learning these two tasks, we propose a novel multi-task learning scheme that utilizes multi-view attention learned from various perspectives to enable the tasks to interact with each other and to learn more comprehensive sentence representations. Experiments conducted on several real-world datasets demonstrate the effectiveness of the proposed method, improving the performance of both answer selection and KBQA. The multi-view attention scheme also proves effective in assembling attentive information from different representational perspectives.