Arrow Research search

Author name cluster

Ting Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

80 papers
2 author rows

Possible papers

80

AAAI Conference 2026 Conference Paper

Bridging Scale Discrepancies in Robotic Control via Language-Based Action Representations

  • Yuchi Zhang
  • Churui Sun
  • Shiqi Liang
  • Diyuan Liu
  • Chao Ji
  • Weinan Zhang
  • Ting Liu

Recent end-to-end robotic manipulation research increasingly adopts architectures inspired by large language models to enable robust manipulation. However, a critical challenge arises from severe distribution shifts in robotic action data, caused primarily by substantial numerical variations in action commands across diverse robotic platforms and tasks, which hinder the effective transfer of pretrained knowledge. To address this limitation, we propose a semantically grounded linguistic representation to normalize actions for efficient pretraining. Unlike conventional discretized action representations, which are sensitive to numerical scales, our motion representation disregards numeric scale effects and emphasizes directionality instead. This abstraction mitigates distribution shifts, yielding a more generalizable pretraining representation. Moreover, using the motion representation narrows the feature distance between action tokens and standard vocabulary tokens, mitigating modality gaps. Multi-task experiments on two benchmarks demonstrate that the proposed method significantly improves generalization performance and transferability in robotic manipulation tasks.

AAAI Conference 2026 Conference Paper

CultureRL: Internalizing Cultural Principles in Large Language Models via Norm-Driven Reinforcement Learning

  • Weixiang Zhao
  • Haozhen Li
  • Yanyan Zhao
  • Haixiao Liu
  • Biye Li
  • Ting Liu
  • Bing Qin

As large language models (LLMs) are increasingly deployed across culturally diverse regions, ensuring that their responses align with users’ cultural norms has become a critical challenge. Existing approaches to cultural alignment primarily rely on prompting or data-augmentation-based supervised fine-tuning, which teach models to follow norms indirectly through example-based supervision. However, these methods are difficult to scale and often fail to generalize, particularly in low-resource cultural settings. In this work, we propose CultureRL, a culture-norm-driven reinforcement learning framework that directly encodes cultural principles into model behavior. Rather than relying on output imitation, CultureRL provides normative feedback during training, enabling the model to internalize high-level cultural rules. It consists of two key components: (1) Norm Pool Construction (NPC), which clusters data from the World Values Survey into abstract cultural concepts to form a structured and retrievable norm pool; and (2) Norm Cluster-based Reward Mechanism (NCRM), which retrieves the relevant norm for each input and uses an external reward model to assess conformity, guiding model updates toward cultural alignment. We evaluate CultureRL in both one-for-one (per-culture) and one-for-all (multi-culture) settings across nine cultures and three benchmarks. Results show that CultureRL consistently outperforms strong baselines, especially in terms of cultural consistency and adaptability.

AAAI Conference 2026 Conference Paper

MedAtlas: Evaluating LLMs for Multi-Round, Multi-Task Medical Reasoning Across Diverse Imaging Modalities and Clinical Text

  • Ronghao Xu
  • Zhen Huang
  • Yangbo Wei
  • Xiaoqian Zhou
  • Zikang Xu
  • Ting Liu
  • Zihang Jiang
  • S. Kevin Zhou

Artificial intelligence has demonstrated significant potential in clinical decision-making; however, developing models capable of adapting to diverse real-world scenarios and performing complex diagnostic reasoning remains a major challenge. Existing medical multi-modal benchmarks are typically limited to single-image, single-turn tasks, lacking multi-modal medical image integration and failing to capture the longitudinal and multi-modal interactive nature inherent to clinical practice. To address this gap, we introduce MedAtlas, a novel benchmark framework designed to evaluate large language models on realistic medical reasoning tasks. MedAtlas is characterized by four key features: multi-round visual question answering (VQA), joint reasoning over multiple medical imaging modalities, multi-task integration, and high clinical fidelity. It supports four core tasks: open-ended multi-round VQA, closed-ended multi-round VQA, multi-image joint reasoning, and comprehensive disease diagnosis. Each case is derived from real diagnostic workflows and incorporates temporal interactions between textual medical histories and multiple imaging modalities, including CT, MRI, PET, ultrasound, and X-ray, requiring models to perform deep integrative reasoning across images and clinical texts. MedAtlas provides expert-annotated gold standards for all tasks. Furthermore, we propose two novel evaluation metrics: Stage Chain Accuracy (SCA) and Error Propagation Suppression Coefficient (EPSC). Benchmark results with existing multi-modal models reveal substantial performance gaps in multi-stage clinical reasoning. MedAtlas establishes a challenging evaluation platform to advance the development of robust and trustworthy medical AI.

AAAI Conference 2026 Conference Paper

MME-SCI: A Comprehensive and Challenging Science Benchmark for Multimodal Large Language Models

  • Jiacheng Ruan
  • Dan Jiang
  • Xian Gao
  • Ting Liu
  • Yuzhuo Fu
  • Yangyang Kang

Recently, multimodal large language models (MLLMs) have achieved significant advancements across various domains, and corresponding evaluation benchmarks have been continuously refined and improved. In this process, benchmarks in the scientific domain have played an important role in assessing the reasoning capabilities of MLLMs. However, existing benchmarks still face three key challenges: 1) Insufficient evaluation of models' reasoning abilities in multilingual scenarios; 2) Inadequate assessment of MLLMs' comprehensive modality coverage; 3) Lack of fine-grained annotation of scientific knowledge points. To address these gaps, we propose MME-SCI, a comprehensive and challenging benchmark. We carefully collected 1,019 high-quality question-answer pairs, which involve 3 distinct evaluation modes. These pairs cover four subjects, namely mathematics, physics, chemistry, and biology, and support five languages: Chinese, English, French, Spanish, and Japanese. We conducted extensive experiments on 16 open-source models and 4 closed-source models, and the results demonstrate that MME-SCI is widely challenging for existing MLLMs. For instance, under the Image-only evaluation mode, o4-mini achieved accuracies of only 52.11%, 24.73%, 36.57%, and 29.80% in mathematics, physics, chemistry, and biology, respectively, indicating a significantly higher difficulty level compared to existing benchmarks. More importantly, using MME-SCI's multilingual and fine-grained knowledge attributes, we analyzed existing models' performance in depth and identified their weaknesses in specific domains. For example, on questions related to "Magnetic Field", o4-mini correctly answered only 5 out of 33 questions, exposing the model's vulnerabilities at a fine-grained level. These findings highlight the urgent need to enhance the scientific reasoning capabilities of MLLMs.

AAAI Conference 2026 Conference Paper

The Visual Prism: Refracting Images into Parallel Multilingual Descriptions with Structured Visual Guidance

  • Chengpeng Fu
  • Xiaocheng Feng
  • Yichong Huang
  • Wenshuai Huo
  • Baohang Li
  • Yang Xiang
  • Ting Liu

Parallel corpora, as the foundation of machine translation, remain crucial even in the era of large language models (LLMs) for pre-training and fine-tuning. However, annotating parallel corpora is extremely costly, as it requires annotators to be proficient in multiple languages. To reduce this cost, prior work has explored image-pivoted corpus synthesis, generating multilingual captions for the same image as pseudo-parallel data. Unfortunately, these pseudo corpora suffer from the serious issue of multilingual focus divergence, i.e., the model attending to distinct aspects of the image when generating captions in different languages. To address this problem, we propose a method called PRISMS (Parallel Refracting ImageS into Multilingual descriptions with Structured visual guidance), which leverages semantic graphs as structured visual guidance to unify the focus of multilingual captions. To ensure adherence to this guidance, we introduce two key techniques: supervised fine-tuning using self-generated instructional data, and reinforcement learning with a reward signal based on semantic graph consistency. Experimental results on five languages show that our PRISMS significantly improves image-pivoted parallel corpus synthesis, enabling LLMs to achieve translation performance comparable to that of models trained on manually annotated corpora.

AAAI Conference 2026 Conference Paper

Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilities

  • Weixiang Zhao
  • Xingyu Sui
  • Jiahe Guo
  • Yulin Hu
  • Yang Deng
  • Yanyan Zhao
  • Xuda Zhi
  • Yongbo Huang

Recent advancements in Large Reasoning Models (LRMs), such as OpenAI's o1/o3 and DeepSeek-R1, have demonstrated remarkable performance in specialized reasoning tasks through human-like deliberative thinking and long chain-of-thought reasoning. However, our systematic evaluation across various model families (DeepSeek, Qwen, and LLaMA) and scales (7B to 32B) reveals that acquiring these deliberative reasoning capabilities significantly reduces the foundational capabilities of LRMs, including notable declines in helpfulness and harmlessness, alongside substantially increased inference costs. Importantly, we demonstrate that adaptive reasoning---employing modes like Zero-Thinking, Less-Thinking, and Summary-Thinking---can effectively alleviate these drawbacks. Our empirical insights underline the critical need for developing more versatile LRMs capable of dynamically allocating inference-time compute according to specific task characteristics.

AAAI Conference 2026 Conference Paper

Unnoticed Yet Effective: A Hybrid Physical Camouflage Framework Against DNNs and Human Perception

  • Mingye Xie
  • Jiacheng Ruan
  • Xian Gao
  • Ting Liu
  • Yuzhuo Fu

While adversarial attacks can effectively deceive deep neural networks, their real-world applicability is often limited by complex and conspicuous patterns that reveal their attack intent to human observers. To overcome this limitation, we propose UYE, a novel camouflage framework designed to simultaneously mislead DNNs and evade human perception. UYE incorporates two key components: an attention refiner leveraging a pre-trained vision encoder to optimize adversarial patterns for robust attacks across diverse environments, and a perception evaluator trained on a preference dataset curated using tailored prompts from human-aligned large multimodal models to ensure natural and unobtrusive camouflage generation. Extensive experiments demonstrate that UYE outperforms state-of-the-art methods in achieving an optimal balance between human stealth and model deception while maintaining effectiveness in real-world scenarios.

ICLR Conference 2025 Conference Paper

Accelerating Diffusion Transformers with Token-wise Feature Caching

  • Chang Zou
  • Xuyang Liu
  • Ting Liu
  • Siteng Huang
  • Linfeng Zhang 0001

Diffusion transformers have shown significant effectiveness in both image and video synthesis at the expense of huge computation costs. To address this problem, feature caching methods have been introduced to accelerate diffusion transformers by caching the features in previous timesteps and reusing them in the following timesteps. However, previous caching methods ignore that different tokens exhibit different sensitivities to feature caching, and feature caching on some tokens may lead to 10× more damage to the overall generation quality compared with other tokens. In this paper, we introduce token-wise feature caching, allowing us to adaptively select the most suitable tokens for caching, and further enabling us to apply different caching ratios to neural layers of different types and depths. Extensive experiments on PixArt-α, OpenSora, and DiT demonstrate our effectiveness in both image and video generation with no requirements for training. For instance, 2.36× and 1.93× acceleration are achieved on OpenSora and PixArt-α with almost no drop in generation quality. Code has been released in the supplementary material and on GitHub.
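The token-wise selection rule this abstract describes (reuse cached features for insensitive tokens, recompute only the most sensitive ones) can be illustrated with a toy NumPy sketch. The error metric, the oracle recomputation used here to score tokens, and the caching ratio are all illustrative assumptions, not the paper's implementation — in practice sensitivity must be estimated without paying for the full recomputation.

```python
import numpy as np

def tokenwise_cache_step(cached, fresh_fn, tokens, cache_ratio):
    """Recompute only the (1 - cache_ratio) fraction of tokens whose
    features deviate most from the cache; reuse cached features elsewhere.

    Note: this toy computes the fresh features for *all* tokens to rank
    them, which a real accelerator must avoid; it only shows the rule.
    """
    fresh = fresh_fn(tokens)                       # what full compute would give
    error = np.linalg.norm(fresh - cached, axis=-1)
    k = int(len(tokens) * (1 - cache_ratio))       # number of tokens to recompute
    recompute = np.argsort(error)[::-1][:k]        # most cache-sensitive tokens
    out = cached.copy()
    out[recompute] = fresh[recompute]              # refresh only those tokens
    return out, recompute
```

With a 50% caching ratio on four tokens, the two tokens with the largest cached-vs-fresh error are recomputed and the rest keep their cached features.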

IJCAI Conference 2025 Conference Paper

Beyond Fixed Length: Bucket Pre-training is All You Need

  • Qing Yang
  • Qiyao Peng
  • Hongtao Liu
  • Kai Liu
  • Bing Qin
  • Ting Liu

Large Language Models (LLMs) have demonstrated exceptional performance across various tasks, with the pre-training stage serving as the cornerstone of their capabilities. However, the conventional fixed-length data composition strategy for pre-training presents several practical challenges. When using shorter sequences, documents are often truncated, potentially leading to information loss and affecting the model's ability to capture long-range dependencies. Conversely, longer sequences require concatenation of multiple documents, which can introduce noise, disrupt natural document boundaries and semantic coherence, and incur substantial computational overhead. To address these challenges, we first establish three quantitative metrics for evaluating data composition quality: padding ratio, truncation ratio, and concatenation ratio. Building upon these metrics, we propose a novel multi-bucket data composition method that transcends the fixed-length paradigm. Our approach adaptively organizes training data to achieve optimal composition quality as measured by the proposed metrics, offering a more flexible and efficient approach for pre-training. We conduct extensive experiments and the results demonstrate that our proposed method significantly enhances both the efficiency and effectiveness of LLM pre-training. Our proposed method has been adopted in the Du Xiaoman XuanYuan series of financial large language models at https://github.com/Duxiaoman-DI/XuanYuan.
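The three composition-quality metrics named in this abstract, and the multi-bucket idea, can be made concrete with a toy sketch. The metric definitions, the greedy smallest-fitting-bucket rule, and the packing convention are simplifying assumptions for illustration, not the paper's actual algorithm.

```python
def composition_metrics(sequences, seq_len):
    """Score a packing. `sequences` is a list of training sequences, each
    given as the list of document lengths packed into that sequence; every
    sequence is padded/truncated to seq_len tokens."""
    n_seq = len(sequences)
    used = [min(sum(s), seq_len) for s in sequences]
    padding = sum(seq_len - u for u in used)                    # wasted pad tokens
    truncation = sum(max(sum(s) - seq_len, 0) for s in sequences)  # lost tokens
    concat = sum(1 for s in sequences if len(s) > 1)            # multi-doc sequences
    total = sum(sum(s) for s in sequences)
    return {
        "padding_ratio": padding / (seq_len * n_seq),
        "truncation_ratio": truncation / total,
        "concatenation_ratio": concat / n_seq,
    }

def assign_buckets(doc_lens, buckets):
    """Greedily place each document into the smallest bucket length that
    holds it without truncation; oversized documents go to the largest."""
    out = {b: [] for b in buckets}
    for n in doc_lens:
        fit = [b for b in sorted(buckets) if n <= b]
        out[fit[0] if fit else max(buckets)].append(n)
    return out
```

For example, packing documents of 100 and (200 + 100) tokens into two 256-token sequences yields a padding ratio of 156/512, a truncation ratio of 44/400, and a concatenation ratio of 1/2; bucketing instead lets each document land in a length class close to its own size.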

AAAI Conference 2025 Conference Paper

Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation

  • Jiaqi Huang
  • Zunnan Xu
  • Ting Liu
  • Yong Liu
  • Haonan Han
  • Kehong Yuan
  • Xiu Li

In the domain of computer vision, Parameter-Efficient Tuning (PET) is increasingly replacing the traditional paradigm of pre-training followed by full fine-tuning. PET is particularly favored for its effectiveness in large foundation models, as it streamlines transfer learning costs and optimizes hardware utilization. However, the current PET methods are mainly designed for single-modal optimization. While some pioneering studies have undertaken preliminary explorations, they still remain at the level of aligned encoders (e.g., CLIP) and lack exploration of misaligned encoders. These methods show sub-optimal performance with misaligned encoders, as they fail to effectively align the multimodal features during fine-tuning. In this paper, we introduce DETRIS, a parameter-efficient tuning framework designed to enhance low-rank visual feature propagation by establishing dense interconnections between each layer and all preceding layers, which enables effective cross-modal feature interaction and adaptation to misaligned encoders. We also suggest using text adapters to improve textual features. Our simple yet efficient approach greatly surpasses state-of-the-art methods with 0.9% to 1.8% backbone parameter updates, evaluated on challenging benchmarks.

NeurIPS Conference 2025 Conference Paper

How Does Sequence Modeling Architecture Influence Base Capabilities of Pre-trained Language Models? Exploring Key Architecture Design Principles to Avoid Base Capabilities Degradation

  • Xin Lu
  • Yanyan Zhao
  • Si Wei
  • Shijin Wang
  • Bing Qin
  • Ting Liu

Pre-trained language models represented by the Transformer have been proven to possess strong base capabilities, and the representative self-attention mechanism in the Transformer has become a classic in sequence modeling architectures. Unlike work that proposes new sequence modeling architectures to improve the efficiency of the attention mechanism, this work focuses on the impact of sequence modeling architectures on base capabilities. Specifically, our concern is: How exactly do sequence modeling architectures affect the base capabilities of pre-trained language models? In this work, we first point out that the mixed-domain pre-training setting commonly adopted in existing architecture design works fails to adequately reveal the differences in base capabilities among various architectures. To address this, we propose a limited-domain pre-training setting with out-of-distribution testing, which successfully uncovers significant differences in base capabilities among architectures at an early stage. Next, we analyze the base capabilities of stateful sequence modeling architectures, and find that they exhibit significant degradation in base capabilities compared to the Transformer. Then, through a series of architecture component analyses, we summarize a key architecture design principle: a sequence modeling architecture needs to possess full-sequence arbitrary selection capability to avoid degradation in base capabilities. Finally, we empirically validate this principle using an extremely simple Top-1 element selection architecture and further generalize it to a more practical Top-1 chunk selection architecture. Experimental results demonstrate our proposed sequence modeling architecture design principle and suggest that our work can serve as a valuable reference for future architecture improvements and novel designs.

AAAI Conference 2025 Conference Paper

Memory Efficient Matting with Adaptive Token Routing

  • Yiheng Lin
  • Yihan Hu
  • Chenyi Zhang
  • Ting Liu
  • Xiaochao Qu
  • Luoqi Liu
  • Yao Zhao
  • Yunchao Wei

Transformer-based models have recently achieved outstanding performance in image matting. However, their application to high-resolution images remains challenging due to the quadratic complexity of global self-attention. To address this issue, we propose MEMatte, a memory-efficient matting framework for processing high-resolution images. MEMatte incorporates a router before each global attention block, directing informative tokens to the global attention while routing other tokens to a Lightweight Token Refinement Module (LTRM). Specifically, the router employs a local-global strategy to predict the routing probability of each token, and the LTRM utilizes efficient modules to simulate global attention. Additionally, we introduce a Batch-constrained Adaptive Token Routing (BATR) mechanism, which allows each router to dynamically route tokens based on image content and the stage of the attention block in the network. Furthermore, we construct an ultra-high-resolution image matting dataset, UHR-395, comprising 35,500 training images and 1,000 test images, with an average resolution of 4872 × 6017. This dataset is created by compositing 395 different alpha mattes across 11 categories onto various backgrounds, all with high-quality manual annotation. Extensive experiments demonstrate that MEMatte outperforms existing methods on both high-resolution and real-world datasets, significantly reducing memory usage by approximately 88% and latency by 50% on the Composition-1K benchmark.

AAAI Conference 2025 Conference Paper

MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios

  • Jiacheng Ruan
  • Wenzhen Yuan
  • Zehao Lin
  • Ning Liao
  • Zhiyu Li
  • Feiyu Xiong
  • Ting Liu
  • Yuzhuo Fu

Large visual-language models (LVLMs) have achieved great success in multiple applications. However, they still encounter challenges in complex scenes, especially those involving camouflaged objects. This is primarily due to the lack of samples related to camouflaged scenes in the training dataset. To mitigate this issue, we construct the MM-CamObj dataset for the first time, comprising two subsets: CamObj-Align and CamObj-Instruct. Specifically, CamObj-Align contains 11,363 image-text pairs, and it is designed for VL alignment and injecting rich knowledge of camouflaged scenes into LVLMs. CamObj-Instruct is collected for fine-tuning the LVLMs with improved instruction-following capabilities, and it includes 11,363 images and 68,849 conversations with diverse instructions. Based on the MM-CamObj dataset, we propose CamObj-Llava, an LVLM specifically designed for addressing tasks in camouflaged scenes. To facilitate our model's effective acquisition of knowledge about camouflaged objects and scenes, we introduce a curriculum learning strategy with six distinct modes. Additionally, we construct CamObj-Bench to evaluate existing LVLMs' capabilities of understanding, recognition, localization, and counting in camouflaged scenes. This benchmark includes 600 images and 7 tasks, with a total of 9,449 questions. Extensive experiments are conducted on CamObj-Bench with CamObj-Llava, 8 existing open-source LVLMs, and 3 closed-source LVLMs. Surprisingly, the results indicate that our model achieves a 25.84% improvement in 4 out of 7 tasks compared to GPT-4o.

YNIMG Journal 2025 Journal Article

Neurocognitive mechanisms of age-related decline in global motion perception

  • Yaxi Hong
  • Ting Liu
  • Dan Luo
  • Ziliang Zhu
  • Shizhen Yan
  • Hua Jin

Age-related declines in global motion perception (GMP) may result from alterations in visual noise in combination with morphological changes in the visual cortices. However, the neurocognitive mechanisms that link cortical structural alterations to deficits in noise modulation remain unclear. In this study, we integrated psychophysical methods, the perceptual template model (PTM), and structural magnetic resonance imaging to investigate the relationships among brain structure, cognitive processes, and perceptual performance underlying GMP aging. We compared motion coherence thresholds (MCT) of 106 younger and 94 older healthy adults using random-dot kinematograms. The PTM characterized age-related changes in internal additive noise and external noise, while voxel- and surface-based morphometry assessed gray matter volume, cortical thickness, and surface area in visual regions. Mediation models examined how changes in noise mediate the relationship between cortical structure and perceptual performance. PTM analysis revealed that reduced GMP in older adults was significantly associated with increased internal additive noise and external noise. Morphometric analyses indicated that GMP decline was associated with reductions in gray matter volume in right V4v, as well as cortical thinning in left V5 and right V8. Mediation analysis further demonstrated that internal additive noise fully mediated the relationship between cortical thickness in left V5 and MCT, whereas external noise partially mediated the relationships between right V4v gray matter volume and MCT, and between right V8 cortical thickness and MCT. These findings suggest that age-related cortical thickness reduction in left V5 amplifies internal noise, while cortical atrophy in right V4v and V8 impairs the extraction of motion signals from external noise. Overall, this study proposes a novel framework for understanding age-related GMP decline by linking cortical morphology, noise alterations, and perceptual performance.

NeurIPS Conference 2025 Conference Paper

SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning

  • Jiaqi Huang
  • Zunnan Xu
  • Jun Zhou
  • Ting Liu
  • Yicheng Xiao
  • Mingwen Ou
  • Bowen Ji
  • Xiu Li

Leveraging multimodal large models for image segmentation has become a prominent research direction. However, existing approaches typically rely heavily on manually annotated datasets that include explicit reasoning processes, which are costly and time-consuming to produce. Recent advances suggest that reinforcement learning (RL) can endow large models with reasoning capabilities without requiring such reasoning-annotated data. In this paper, we propose SAM-R1, a novel framework that enables multimodal large models to perform fine-grained reasoning in image understanding tasks. Our approach is the first to incorporate fine-grained segmentation settings during the training of multimodal reasoning models. By integrating task-specific, fine-grained rewards with a tailored optimization objective, we further enhance the model's reasoning and segmentation alignment. We also leverage the Segment Anything Model (SAM) as a strong and flexible reward provider to guide the learning process. With only 3k training samples, SAM-R1 achieves strong performance across multiple benchmarks, demonstrating the effectiveness of reinforcement learning in equipping multimodal models with segmentation-oriented reasoning capabilities.

NeurIPS Conference 2025 Conference Paper

Teaching Language Models to Evolve with Users: Dynamic Profile Modeling for Personalized Alignment

  • Weixiang Zhao
  • Xingyu Sui
  • Yulin Hu
  • Jiahe Guo
  • Haixiao Liu
  • Biye Li
  • Yanyan Zhao
  • Bing Qin

Personalized alignment is essential for enabling large language models (LLMs) to engage effectively in user-centric dialogue. While recent prompt-based and offline optimization methods offer preliminary solutions, they fall short in cold-start scenarios and long-term personalization due to their inherently static and shallow designs. In this work, we introduce the Reinforcement Learning for Personalized Alignment (RLPA) framework, in which an LLM interacts with a simulated user model to iteratively infer and refine user profiles through dialogue. The training process is guided by a dual-level reward structure: the Profile Reward encourages accurate construction of user representations, while the Response Reward incentivizes generation of responses consistent with the inferred profile. We instantiate RLPA by fine-tuning Qwen-2.5-3B-Instruct, resulting in Qwen-RLPA, which achieves state-of-the-art performance in personalized dialogue. Empirical evaluations demonstrate that Qwen-RLPA consistently outperforms prompting and offline fine-tuning baselines, and even surpasses advanced commercial models such as Claude-3.5 and GPT-4o. Further analysis highlights Qwen-RLPA's robustness in reconciling conflicting user preferences, sustaining long-term personalization and delivering more efficient inference compared to recent reasoning-focused LLMs. These results emphasize the potential of dynamic profile inference as a more effective paradigm for building personalized dialogue systems.

AAAI Conference 2025 Conference Paper

TTE: Two Tokens Are Enough to Improve Parameter-Efficient Tuning

  • Jiacheng Ruan
  • Mingye Xie
  • Jingsheng Gao
  • Xian Gao
  • Suncheng Xiang
  • Ting Liu
  • Yuzhuo Fu

Existing fine-tuning paradigms are predominantly characterized by Full Parameter Tuning (FPT) and Parameter-Efficient Tuning (PET). FPT fine-tunes all parameters of a pre-trained model on downstream tasks, whereas PET freezes the pre-trained model and employs only a minimal number of learnable parameters for fine-tuning. However, both approaches face issues of overfitting, especially in scenarios where downstream samples are limited. This issue has been thoroughly explored in FPT, but less so in PET. To this end, this paper investigates overfitting in PET, representing a pioneering study in the field. Specifically, across 19 image classification datasets, we employ three classic PET methods (VPT, Adapter/AdaptFormer, and LoRA) and explore various regularization techniques to mitigate overfitting. Regrettably, the results suggest that existing regularization techniques are incompatible with the PET process and may even lead to performance degradation. Consequently, we introduce a new framework named TTE (Two Tokens are Enough), which effectively alleviates overfitting in PET through a novel constraint function based on the learnable tokens. Experiments conducted on 24 datasets across image and few-shot classification tasks demonstrate that our fine-tuning framework not only mitigates overfitting but also significantly enhances PET's performance. Notably, our TTE framework surpasses the highest-performing FPT framework (DR-Tune), utilizing significantly fewer parameters (0.15M vs. 85.84M) and achieving an improvement of 1%.

NeurIPS Conference 2025 Conference Paper

UFO-RL: Uncertainty-Focused Optimization for Efficient Reinforcement Learning Data Selection

  • Yang Zhao
  • Kai Xiong
  • Xiao Ding
  • Li Du
  • Yangou Ouyang
  • Zhouhao Sun
  • Jiannan Guan
  • Wenbin Zhang

A primary impediment to scaling reinforcement learning (RL) for large language model (LLM) training is the substantial computational cost, predominantly arising from the necessity of multi-sampling for policy optimization and evaluation. This underscores the critical yet challenging nature of efficient training data selection. Drawing inspiration from the Zone of Proximal Development (ZPD) theory, which posits that learners acquire knowledge more effectively from tasks of intermediate difficulty, we hypothesize that LLMs learn best from data they have not yet mastered but show the potential to comprehend. Conventional methodologies for assessing data difficulty or informativeness typically rely on computationally intensive multi-sampling or iterative procedures. To address this limitation, we introduce UFO-RL (Uncertainty-Focused Optimization for Reinforcement Learning), a novel framework that employs a computationally efficient single-pass uncertainty estimation technique to identify informative training instances. This method, requiring only a single forward pass and obviating the need for iterative next-token computation, achieves a significant acceleration (up to 185×) in data evaluation compared to multi-sampling approaches. UFO-RL leverages this efficient metric to select data within the model's estimated ZPD for training. Extensive experimentation across diverse LLMs and mathematical benchmarks demonstrates that training with a mere 10% of the data, carefully selected by UFO-RL, yields performance comparable to or even surpassing that of full-data training. Furthermore, this targeted data selection results in up to a 16× reduction in overall training time, concurrently enhancing training stability and improving generalization capabilities. Thus, UFO-RL presents a practical and highly efficient strategy for scaling RL fine-tuning of LLMs by focusing learning efforts on the most informative and valuable data, thereby mitigating the computational bottlenecks associated with traditional RL training.
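The single-pass selection idea in this abstract can be illustrated with a toy sketch: score each example from one forward pass and keep only those of intermediate difficulty. Using mean token-level entropy as the uncertainty score and a fixed band as the ZPD are illustrative assumptions, not UFO-RL's actual estimator.

```python
import math

def entropy(p):
    """Shannon entropy (nats) of one next-token probability distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

def select_zpd(examples, low, high):
    """Keep examples whose mean per-token entropy falls in (low, high):
    neither already mastered (too low) nor out of reach (too high).
    Each example carries the per-position distributions from a single
    forward pass, so no multi-sampling is needed."""
    kept = []
    for ex in examples:
        score = sum(entropy(p) for p in ex["token_probs"]) / len(ex["token_probs"])
        if low < score < high:
            kept.append(ex["id"])
    return kept
```

A deterministic next token scores 0 nats, a uniform two-way split ln 2 ≈ 0.69, and a uniform four-way split ln 4 ≈ 1.39, so a band of (0.3, 1.0) keeps only the intermediate example.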

NeurIPS Conference 2025 Conference Paper

When Less Language is More: Language-Reasoning Disentanglement Makes LLMs Better Multilingual Reasoners

  • Weixiang Zhao
  • Jiahe Guo
  • Yang Deng
  • Tongtong Wu
  • Wenxuan Zhang
  • Yulin Hu
  • Xingyu Sui
  • Yanyan Zhao

Multilingual reasoning remains a significant challenge for large language models (LLMs), with performance disproportionately favoring high-resource languages. Drawing inspiration from cognitive neuroscience, which suggests that human reasoning functions largely independently of language processing, we hypothesize that LLMs similarly encode reasoning and language as separable components that can be disentangled to enhance multilingual reasoning. To evaluate this, we perform a causal intervention by ablating language-specific representations at inference time. Experiments on 10 open-weight LLMs spanning 11 typologically diverse languages show that this language-specific ablation consistently boosts multilingual reasoning performance. Layer-wise analyses further confirm that language and reasoning representations can be effectively disentangled throughout the model, yielding improved multilingual reasoning capabilities, while preserving top-layer language features remains essential for maintaining linguistic fidelity. Compared to post-training methods such as supervised fine-tuning or reinforcement learning, our training-free language-reasoning disentanglement achieves comparable or superior results with minimal computational overhead. These findings shed light on the internal mechanisms underlying multilingual reasoning in LLMs and suggest a lightweight and interpretable strategy for improving cross-lingual generalization.
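The causal intervention this abstract describes, ablating language-specific representations at inference time, can be pictured as removing the component of a hidden state along an estimated language direction. Both the projection and the mean-difference direction estimate are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def language_direction(hidden_lang_a, hidden_lang_b):
    """Estimate a language-specific direction as the difference of mean
    hidden states between two languages (a common probing heuristic)."""
    return np.mean(hidden_lang_a, axis=0) - np.mean(hidden_lang_b, axis=0)

def ablate_direction(h, v):
    """Project hidden state h onto the orthogonal complement of unit
    direction v, removing its language-specific component."""
    v = v / np.linalg.norm(v)
    return h - (h @ v) * v
```

After ablation the state carries no component along the removed direction, so a downstream head can no longer read the language signal from it while the remaining coordinates are untouched.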

NeurIPS Conference 2024 Conference Paper

Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration

  • Yichong Huang
  • Xiaocheng Feng
  • Baohang Li
  • Yang Xiang
  • Hui Wang
  • Ting Liu
  • Bing Qin

Large language models (LLMs) exhibit complementary strengths in various tasks, motivating research on LLM ensembling. However, existing work focuses on training an extra reward model or fusion model to select or combine all candidate answers, posing a great challenge to generalization on unseen data distributions. Besides, prior methods use textual responses as the communication medium, ignoring the valuable information in the internal representations. In this work, we propose a training-free ensemble framework \textsc{DeePEn}, fusing the informative probability distributions yielded by different LLMs at each decoding step. Unfortunately, the vocabulary discrepancy between heterogeneous LLMs makes directly averaging the distributions infeasible due to token misalignment. To address this challenge, \textsc{DeePEn} maps the probability distribution of each model from its own probability space to a universal \textit{relative space} based on relative representation theory, and performs aggregation there. Next, we devise a search-based inverse transformation to map the aggregated result back to the probability space of one of the ensembled LLMs (the main model), in order to determine the next token. We conduct extensive experiments on ensembles of different numbers of LLMs, ensembles of LLMs with different architectures, and ensembles between an LLM and a specialist model. Experimental results show that (i) \textsc{DeePEn} achieves consistent improvements across six benchmarks covering subject examination, reasoning, and knowledge, (ii) a well-performing specialist model can benefit from a less effective LLM through distribution fusion, and (iii) \textsc{DeePEn} has complementary strengths with other ensemble methods such as voting.
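A toy sketch of the relative-space idea outlined above, with hypothetical anchors and embeddings; the paper's search-based inverse transform is replaced here by a simple nearest-token lookup:

```python
import numpy as np

def relative_matrix(emb, anchor_ids):
    """Map each token embedding to its cosine similarities with a set of
    anchor tokens assumed to be shared (in meaning) across models."""
    E = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return E @ E[anchor_ids].T  # (vocab_size, n_anchors)

def fuse_step(dists, rel_mats, main=0):
    """Fuse per-model next-token distributions in the shared relative
    space, then return the main model's token closest to the aggregate
    (a greedy stand-in for the search-based inverse transformation)."""
    rel = [p @ R for p, R in zip(dists, rel_mats)]  # distributions -> relative space
    agg = np.mean(rel, axis=0)                      # aggregation across models
    gaps = np.linalg.norm(rel_mats[main] - agg, axis=1)
    return int(gaps.argmin())
```

Because each model's distribution is expressed through similarities to common anchors, models with different vocabularies become comparable in the same space.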

AAAI Conference 2024 Conference Paper

LAMM: Label Alignment for Multi-Modal Prompt Learning

  • Jingsheng Gao
  • Jiacheng Ruan
  • Suncheng Xiang
  • Zefang Yu
  • Ke Ji
  • Mingye Xie
  • Ting Liu
  • Yuzhuo Fu

With the success of pre-trained visual-language (VL) models such as CLIP in visual representation tasks, transferring pre-trained models to downstream tasks has become a crucial paradigm. Recently, the prompt tuning paradigm, which draws inspiration from natural language processing (NLP), has made significant progress in the VL field. However, preceding methods mainly focus on constructing prompt templates for text and visual inputs, neglecting the gap in class label representations between VL models and downstream tasks. To address this challenge, we introduce an innovative label alignment method named \textbf{LAMM}, which can dynamically adjust the category embeddings of downstream datasets through end-to-end training. Moreover, to achieve a more appropriate label distribution, we propose a hierarchical loss, encompassing alignment of the parameter space, feature space, and logits space. We conduct experiments on 11 downstream vision datasets and demonstrate that our method significantly improves the performance of existing multi-modal prompt learning models in few-shot scenarios, exhibiting an average accuracy improvement of 2.31\% over the state-of-the-art methods at 16 shots. Moreover, our methodology also excels in continual learning compared to other prompt tuning methods. Importantly, our method is synergistic with existing prompt tuning methods and can boost performance on top of them. Our code and dataset will be publicly available at https://github.com/gaojingsheng/LAMM.

AAAI Conference 2024 Conference Paper

Lyapunov-Stable Deep Equilibrium Models

  • Haoyu Chu
  • Shikui Wei
  • Ting Liu
  • Yao Zhao
  • Yuto Miyatake

Deep equilibrium (DEQ) models have emerged as a promising class of implicit layer models, which abandon traditional depth by solving for the fixed points of a single nonlinear layer. Despite their success, the stability of the fixed points for these models remains poorly understood. By considering DEQ models as nonlinear dynamic systems, we propose a robust DEQ model named LyaDEQ with guaranteed provable stability via Lyapunov theory. The crux of our method is ensuring the Lyapunov stability of the DEQ model's fixed points, which enables the proposed model to resist minor initial perturbations. To avoid poor adversarial defense due to Lyapunov-stable fixed points being located near each other, we orthogonalize the layers after the Lyapunov stability module to separate different fixed points. We evaluate LyaDEQ models under well-known adversarial attacks, and experimental results demonstrate significant improvement in robustness. Furthermore, we show that the LyaDEQ model can be combined with other defense methods, such as adversarial training, to achieve even better adversarial robustness.

AAAI Conference 2024 Conference Paper

Manifold-Based Verbalizer Space Re-embedding for Tuning-Free Prompt-Based Classification

  • Haochun Wang
  • Sendong Zhao
  • Chi Liu
  • Nuwa Xi
  • MuZhen Cai
  • Bing Qin
  • Ting Liu

Prompt-based classification adapts tasks to a cloze question format utilizing the [MASK] token, and the filled tokens are then mapped to labels through pre-defined verbalizers. Recent studies have explored the use of verbalizer embeddings to reduce labor in this process. However, all existing studies require a tuning process for either the pre-trained models or additional trainable embeddings. Meanwhile, the distance between high-dimensional verbalizer embeddings should not be measured by Euclidean distance, due to the potential for non-linear manifolds in the representation space. In this study, we propose a tuning-free manifold-based space re-embedding method called Locally Linear Embedding with Intra-class Neighborhood Constraint (LLE-INC) for verbalizer embeddings, which preserves local properties within the same class as guidance for classification. Experimental results indicate that even without tuning any parameters, our LLE-INC is on par with automated verbalizers with parameter tuning. With parameter updating, our approach further enhances prompt-based tuning by up to 3.2%. Furthermore, experiments with LLaMA-7B and LLaMA-13B indicate that LLE-INC is an efficient tuning-free classification approach for hyper-scale language models.

NeurIPS Conference 2024 Conference Paper

Meaningful Learning: Enhancing Abstract Reasoning in Large Language Models via Generic Fact Guidance

  • Kai Xiong
  • Xiao Ding
  • Ting Liu
  • Bing Qin
  • Dongliang Xu
  • Qing Yang
  • Hongtao Liu
  • Yixin Cao

Large language models (LLMs) have demonstrated impressive performance and strong explainability across various reasoning scenarios, marking a significant stride towards mimicking human-like intelligence. Despite this, when tasked with several simple questions supported by a generic fact, LLMs often struggle to abstract and apply the generic fact to provide consistent and precise answers, revealing a deficiency in abstract reasoning abilities. This has sparked a vigorous debate about whether LLMs are genuinely reasoning or merely memorizing. In light of this, we design a preliminary study to quantify and delve into the abstract reasoning abilities of existing LLMs. Our findings reveal a substantial discrepancy between their general reasoning and abstract reasoning performances. To alleviate this problem, we tailor an abstract reasoning dataset (AbsR) together with a meaningful learning paradigm to teach LLMs how to leverage generic facts for reasoning purposes. The results show that our approach not only boosts the general reasoning performance of LLMs but also makes considerable strides towards their capacity for abstract reasoning, moving beyond simple memorization or imitation to a more nuanced understanding and application of generic facts. The code is available at https://github.com/Waste-Wood/MeanLearn.

YNIMG Journal 2024 Journal Article

Structural and functional alterations in MRI-negative drug-resistant epilepsy and associated gene expression features

  • Ting Liu
  • Sheng Wang
  • Yingjie Tang
  • Sisi Jiang
  • Huixia Lin
  • Fei Li
  • Dezhong Yao
  • Xian Zhu

Neuroimaging techniques have been widely used in the study of epilepsy. However, structural and functional changes in MRI-negative drug-resistant epilepsy (DRE) and the genetic mechanisms behind the structural alterations remain poorly understood. Using structural and functional MRI, we analyzed gray matter volume (GMV) and regional homogeneity (ReHo) in DRE, drug-sensitive epilepsy (DSE), and healthy controls. Gene expression data from the Allen Human Brain Atlas and GMV/ReHo were evaluated to obtain drug-resistance-related and epilepsy-associated gene expression, which was compared with real transcriptional data in blood. We found structural and functional alterations in the cerebellum of DRE patients, which may be related to the mechanisms of drug resistance in DRE. Our study confirms that changes in brain morphology and regional activity in DRE patients may be associated with abnormal expression of genes related to nervous system development, and that SP1, an important transcription factor, plays an important role in the mechanism of drug resistance.

NeurIPS Conference 2024 Conference Paper

V-PETL Bench: A Unified Visual Parameter-Efficient Transfer Learning Benchmark

  • Yi Xin
  • Siqi Luo
  • Xuyang Liu
  • Yuntao Du
  • Haodi Zhou
  • Xinyu Cheng
  • Christina Lee
  • Junlong Du

Parameter-efficient transfer learning (PETL) methods show promise in adapting a pre-trained model to various downstream tasks while training only a few parameters. In the computer vision (CV) domain, numerous PETL algorithms have been proposed, but their direct employment or comparison remains inconvenient. To address this challenge, we construct a Unified Visual PETL Benchmark (V-PETL Bench) for the CV domain by selecting 30 diverse, challenging, and comprehensive datasets from image recognition, video action recognition, and dense prediction tasks. On these datasets, we systematically evaluate 25 dominant PETL algorithms and open-source a modular and extensible codebase for fair evaluation of these algorithms. V-PETL Bench runs on NVIDIA A800 GPUs and requires approximately 310 GPU days. We release the complete benchmark, making it more efficient and friendly for researchers. Additionally, V-PETL Bench will be continuously updated with new PETL algorithms and CV tasks.

TMLR Journal 2024 Journal Article

VideoGLUE: Video General Understanding Evaluation of Foundation Models

  • Liangzhe Yuan
  • Nitesh Bharadwaj Gundavarapu
  • Long Zhao
  • Hao Zhou
  • Yin Cui
  • Lu Jiang
  • Xuan Yang
  • Menglin Jia

We evaluate the video understanding capabilities of existing foundation models (FMs) using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring an FM for downstream tasks. Furthermore, we jointly profile FMs’ efficacy and efficiency when adapting to general video understanding tasks using cost measurements during both training and inference. Our main findings are as follows. First, task-specialized models significantly outperform the seven FMs studied in this work, in sharp contrast to what FMs have achieved in natural language and image understanding. Second, video-native FMs, whose pretraining data mainly contains the video modality, are generally better than image-native FMs in classifying motion-rich videos, localizing actions in time, and understanding a video of more than one action. Third, the video-native FMs can perform well on video tasks under light adaptations to downstream tasks (e.g., freezing the FM backbones), while image-native FMs win in full end-to-end finetuning. The first two observations reveal the need and tremendous opportunities to conduct research on video-focused FMs, and the last confirms that both tasks and adaptation methods matter when it comes to the evaluation of FMs. Our code is released at: https://github.com/tensorflow/models/tree/master/official/projects/videoglue

AAAI Conference 2023 Conference Paper

Progressive Neighborhood Aggregation for Semantic Segmentation Refinement

  • Ting Liu
  • Yunchao Wei
  • Yanning Zhang

Multi-scale features from backbone networks have been widely applied to recover object details in segmentation tasks. Generally, the multi-level features are fused in a certain manner for further pixel-level dense prediction, whereas the spatial structure information is not fully explored; that is, similar nearby pixels can be used to complement each other. In this paper, we investigate a progressive neighborhood aggregation (PNA) framework to refine the semantic segmentation prediction, resulting in an end-to-end solution that can perform the coarse prediction and refinement in a unified network. Specifically, we first present a neighborhood aggregation module, in which neighborhood similarity matrices for each pixel are estimated on multi-scale features and then used to progressively aggregate the high-level features for recovering the spatial structure. In addition, to further integrate high-resolution details into the aggregated features, we apply a self-aggregation module on the low-level features to emphasize important semantic information, complementing lost spatial details. Extensive experiments on five segmentation datasets, including Pascal VOC 2012, CityScapes, COCO-Stuff 10k, DeepGlobe, and Trans10k, demonstrate that the proposed framework can be cascaded onto existing segmentation models, providing consistent improvements. In particular, our method achieves new state-of-the-art performance on two challenging datasets, DeepGlobe and Trans10k. The code is available at https://github.com/liutinglt/PNA.
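The similarity-weighted aggregation at the heart of such a module can be illustrated with a global (rather than windowed) simplification; the shapes and the scaling factor are assumptions, not the paper's exact design:

```python
import numpy as np

def neighborhood_aggregate(f_low, f_high):
    """Aggregate high-level features using pixel-similarity weights
    estimated on a finer-scale feature map.

    f_low: (N, d1) finer-scale features (flattened spatial map).
    f_high: (N, d2) high-level features to be refined.
    """
    sim = f_low @ f_low.T / np.sqrt(f_low.shape[1])  # pairwise similarities
    sim = sim - sim.max(axis=1, keepdims=True)       # stable row-wise softmax
    w = np.exp(sim)
    w /= w.sum(axis=1, keepdims=True)
    return w @ f_high  # each pixel mixes features of similar pixels
```

Pixels that look alike at the finer scale end up with similar refined high-level features, which is the spatial-structure complementarity the abstract appeals to.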

AAAI Conference 2023 Conference Paper

Self-Supervised Logic Induction for Explainable Fuzzy Temporal Commonsense Reasoning

  • Bibo Cai
  • Xiao Ding
  • Zhouhao Sun
  • Bing Qin
  • Ting Liu
  • Baojun Wang
  • Lifeng Shang

Understanding temporal commonsense concepts, such as times of occurrence and durations, is crucial for event-centric language understanding. Reasoning about such temporal concepts in a complex context requires reasoning over both the stated context and the world knowledge that underlies it. A recent study shows that massive pre-trained LMs still struggle with such temporal reasoning under complex contexts (e.g., dialog) because they only implicitly encode the relevant contexts and fail to explicitly uncover the underlying logical compositions for complex inference, and thus may not be robust enough. In this work, we propose to augment LMs with a temporal logic induction ability, which frames temporal reasoning through three modular components: a temporal dependency inducer, a temporal concept defuzzifier, and a logic validator. The former two components disentangle the explicit/implicit dependencies between temporal concepts across the context (before, after, ...) and the specific meanings of fuzzy temporal concepts, respectively, while the validator combines the intermediate reasoning clues for robust contextual reasoning about the temporal concepts. Extensive experimental results on TIMEDIAL, a challenging dataset for temporal reasoning over dialog, show that our method, Logic Induction Enhanced Contextualized TEmporal Reasoning (LECTER), yields great improvements over the traditional language model for temporal reasoning.

AAAI Conference 2022 Conference Paper

Mitigating Reporting Bias in Semi-supervised Temporal Commonsense Inference with Probabilistic Soft Logic

  • Bibo Cai
  • Xiao Ding
  • Bowen Chen
  • Li Du
  • Ting Liu

Acquiring high-quality temporal common sense (TCS) knowledge from free-form text is a crucial but challenging problem for event-centric natural language understanding, due to the language reporting bias problem: people rarely report the commonly observed events but highlight the special cases. For example, one may rarely report “I get up from bed in 1 minute”, but we can observe “It takes me an hour to get up from bed every morning” in text. Models directly trained upon such corpora would capture distorted TCS knowledge, which could influence model performance. Prior work addresses this issue mainly by exploiting the interactions among temporal dimensions (e.g., duration, temporal relation between events) in a multi-task view. However, this line of work suffers from implicit, inadequate, and unexplainable interaction modeling. In this paper, we propose a novel neural-logic-based Soft Logic Enhanced Event Temporal Reasoning (SLEER) model for acquiring unbiased TCS knowledge, in which the complementary relationships among dimensions are explicitly represented as logic rules and modeled by t-norm fuzzy logics. SLEER can utilize logic rules to regularize its inference process. Experimental results on four intrinsic evaluation datasets and two extrinsic datasets show the effectiveness of our proposed method.

AAAI Conference 2022 Conference Paper

You Only Infer Once: Cross-Modal Meta-Transfer for Referring Video Object Segmentation

  • Dezhuang Li
  • Ruoqi Li
  • Lijun Wang
  • Yifan Wang
  • Jinqing Qi
  • Lu Zhang
  • Ting Liu
  • Qingquan Xu

We present YOFO (You Only inFer Once), a new paradigm for referring video object segmentation (RVOS) that operates in a one-stage manner. Our key insight is that the language descriptor should serve as target-specific guidance to identify the target object, while a direct feature fusion of image and language can increase feature complexity and thus may be sub-optimal for RVOS. To this end, we propose a meta-transfer module, which is trained in a learning-to-learn fashion and aims to transfer the target-specific information from the language domain to the image domain, while discarding the uncorrelated complex variations of the language description. To bridge the gap between the image and language domains, we develop a multi-scale cross-modal feature mining block that aggregates all the essential features required by RVOS from both domains and generates regression labels for the meta-transfer module. The whole system can be trained in an end-to-end manner and shows competitive performance against state-of-the-art two-stage approaches.

IJCAI Conference 2021 Conference Paper

A Survey on Spoken Language Understanding: Recent Advances and New Frontiers

  • Libo Qin
  • Tianbao Xie
  • Wanxiang Che
  • Ting Liu

Spoken Language Understanding (SLU) aims to extract the semantic frame of user queries, and is a core component of task-oriented dialog systems. With the rise of deep neural networks and the evolution of pre-trained language models, SLU research has achieved significant breakthroughs. However, there remains a lack of a comprehensive survey summarizing existing approaches and recent trends, which motivated the work presented in this article. In this paper, we survey recent advances and new frontiers in SLU. Specifically, we give a thorough review of this research field, covering different aspects including (1) new taxonomy: we provide a new perspective on the SLU field, including single model vs. joint model, implicit joint modeling vs. explicit joint modeling in joint models, and non-pre-trained paradigm vs. pre-trained paradigm; (2) new frontiers: some emerging areas in complex SLU as well as the corresponding challenges; (3) abundant open-source resources: to help the community, we have collected and organized the related papers, baseline projects, and leaderboard on a public website where SLU researchers can directly access the recent progress. We hope that this survey can shed light on future research in the SLU field.

AAAI Conference 2021 Conference Paper

C2C-GenDA: Cluster-to-Cluster Generation for Data Augmentation of Slot Filling

  • Yutai Hou
  • Sanyuan Chen
  • Wanxiang Che
  • Cheng Chen
  • Ting Liu

Slot filling, a fundamental module of spoken language understanding, often suffers from insufficient quantity and diversity of training data. To remedy this, we propose a novel Cluster-to-Cluster generation framework for Data Augmentation (DA), named C2C-GenDA. It enlarges the training set by reconstructing existing utterances into alternative expressions while keeping their semantics. Different from previous DA works that reconstruct utterances one by one independently, C2C-GenDA jointly encodes multiple existing utterances of the same semantics and simultaneously decodes multiple unseen expressions. Jointly generating multiple new utterances allows the model to consider the relations between generated instances and encourages diversity. Besides, encoding multiple existing utterances endows C2C-GenDA with a wider view of existing expressions, helping to reduce generation that duplicates existing data. Experiments on the ATIS and Snips datasets show that instances augmented by C2C-GenDA improve slot filling by 7.99 (11.9%↑) and 5.76 (13.6%↑) F-scores respectively, when there are only hundreds of training utterances. Code: https://github.com/Sanyuan-Chen/C2C-DA.

AAAI Conference 2021 Conference Paper

Co-GAT: A Co-Interactive Graph Attention Network for Joint Dialog Act Recognition and Sentiment Classification

  • Libo Qin
  • Zhouyang Li
  • Wanxiang Che
  • Minheng Ni
  • Ting Liu

In a dialog system, dialog act recognition and sentiment classification are two correlative tasks to capture speakers’ intentions, where dialog act and sentiment can indicate the explicit and the implicit intentions separately. The dialog context information (contextual information) and the mutual interaction information are two key factors that contribute to the two related tasks. Unfortunately, none of the existing approaches consider the two important sources of information simultaneously. In this paper, we propose a Co-Interactive Graph Attention Network (Co-GAT) to jointly perform the two tasks. The core module is a proposed co-interactive graph interaction layer in which a cross-utterances connection and a cross-tasks connection are constructed and iteratively updated with each other, so that the two types of information are considered simultaneously. Experimental results on two public datasets show that our model successfully captures the two sources of information and achieves state-of-the-art performance. In addition, we find that the contributions from the contextual and mutual interaction information do not fully overlap with contextualized word representations (BERT, RoBERTa, XLNet).

AAAI Conference 2021 Conference Paper

Few-shot Learning for Multi-label Intent Detection

  • Yutai Hou
  • Yongkui Lai
  • Yushan Wu
  • Wanxiang Che
  • Ting Liu

In this paper, we study few-shot multi-label classification for user intent detection. For multi-label intent detection, state-of-the-art work estimates label-instance relevance scores and uses a threshold to select multiple associated intent labels. To determine appropriate thresholds with only a few examples, we first learn universal thresholding experience on data-rich domains, and then adapt the thresholds to specific few-shot domains with a calibration based on nonparametric learning. For better calculation of the label-instance relevance score, we introduce label name embeddings as anchor points in the representation space, which refine the representations of different classes to be well-separated from each other. Experiments on two datasets show that the proposed model significantly outperforms strong baselines in both one-shot and five-shot settings. Data and code are available at https://github.com/AtmaHou/FewShotMultiLabel.
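The thresholded multi-label selection and a few-example calibration step can be sketched as follows. This is a simplified nonparametric stand-in (midpoint between the weakest positive and strongest negative score on each support example), not the paper's exact procedure:

```python
import numpy as np

def multilabel_predict(scores, threshold):
    """Select all labels whose relevance score clears the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]

def calibrate_threshold(support_scores, support_labels):
    """Calibrate a threshold from a few labeled support examples: for
    each example take the midpoint between its lowest positive-label
    score and its highest negative-label score, then average."""
    ts = []
    for s, y in zip(support_scores, support_labels):
        pos = min(s[i] for i in range(len(s)) if y[i] == 1)
        neg = max(s[i] for i in range(len(s)) if y[i] == 0)
        ts.append((pos + neg) / 2)
    return float(np.mean(ts))
```

Any threshold in the margin between positives and negatives separates the support labels; the midpoint is one reasonable nonparametric choice.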

AAAI Conference 2021 Conference Paper

Model Uncertainty Guides Visual Object Tracking

  • Lijun Zhou
  • Antoine Ledent
  • Qintao Hu
  • Ting Liu
  • Jianlin Zhang
  • Marius Kloft

Modern object trackers largely rely on the online learning of a discriminative classifier from potentially diverse sample frames. However, noisy or insufficient amounts of samples can deteriorate the classifier’s performance and cause tracking drift. Furthermore, alterations such as occlusion and blurring can cause the target to be lost. In this paper, we make several improvements aimed at tackling uncertainty and improving robustness in object tracking. Our first and most important contribution is to propose a sampling method for the online learning of object trackers based on uncertainty adjustment: our method effectively selects representative sample frames to feed the discriminative branch of the tracker, while filtering out noise samples. Furthermore, to improve the robustness of the tracker to various challenging scenarios, we propose a novel data augmentation procedure, together with a specific improved backbone architecture. All our improvements fit together in one model, which we refer to as the Uncertainty Adjusted Tracker (UATracker), and can be trained in a joint and end-to-end fashion. Experiments on the LaSOT, UAV123, OTB100 and VOT2018 benchmarks demonstrate that our UATracker outperforms state-of-the-art real-time trackers by significant margins.

AAAI Conference 2020 Conference Paper

DCR-Net: A Deep Co-Interactive Relation Network for Joint Dialog Act Recognition and Sentiment Classification

  • Libo Qin
  • Wanxiang Che
  • Yangming Li
  • Mingheng Ni
  • Ting Liu

In a dialog system, dialog act recognition and sentiment classification are two correlative tasks to capture speakers’ intentions, where dialog act and sentiment can indicate the explicit and the implicit intentions separately (Kim and Kim 2018). Most existing systems either treat them as separate tasks or jointly model the two tasks by sharing parameters in an implicit way, without explicitly modeling the mutual interaction and relation. To address this problem, we propose a Deep Co-Interactive Relation Network (DCR-Net) to explicitly consider the cross-impact and model the interaction between the two tasks by introducing a co-interactive relation layer. In addition, the proposed relation layer can be stacked to gradually capture mutual knowledge over multiple steps of interaction. In particular, we thoroughly study different relation layers and their effects. Experimental results on two public datasets (Mastodon and Dailydialog) show that our model outperforms the state-of-the-art joint model by 4.3% and 3.4% in terms of F1 score on the dialog act recognition task, and by 5.7% and 12.4% on sentiment classification, respectively. Comprehensive analysis empirically verifies the effectiveness of explicitly modeling the relation between the two tasks and the multi-step interaction mechanism. Finally, we employ the Bidirectional Encoder Representations from Transformers (BERT) in our framework, which further boosts performance on both tasks.

AAAI Conference 2020 Conference Paper

Discriminative Sentence Modeling for Story Ending Prediction

  • Yiming Cui
  • Wanxiang Che
  • Wei-Nan Zhang
  • Ting Liu
  • Shijin Wang
  • Guoping Hu

Story Ending Prediction is a task that requires selecting an appropriate ending for a given story, which requires the machine to understand the story and sometimes needs commonsense knowledge. To tackle this task, we propose a new neural network called Diff-Net for better modeling the differences between candidate endings. The proposed model discriminates two endings at three semantic levels: contextual representation, story-aware representation, and discriminative representation. Experimental results on the Story Cloze Test dataset show that the proposed model significantly outperforms various systems by a large margin, and detailed ablation studies are given to better understand our model. We also carefully examine the traditional and BERT-based models on both SCT v1.0 and v1.5, with interesting findings that may potentially help future studies.

AAAI Conference 2020 Conference Paper

Generating Persona Consistent Dialogues by Exploiting Natural Language Inference

  • Haoyu Song
  • Wei-Nan Zhang
  • Jingwen Hu
  • Ting Liu

Consistency is one of the major challenges faced by dialogue agents. A human-like dialogue agent should not only respond naturally, but also maintain a consistent persona. In this paper, we exploit the advantages of natural language inference (NLI) techniques to address the issue of generating persona-consistent dialogues. Different from existing work that reranks retrieved responses through an NLI model, we cast the task as a reinforcement learning problem and propose to exploit the NLI signals from response-persona pairs as rewards for the process of dialogue generation. Specifically, our generator employs an attention-based encoder-decoder to generate persona-based responses. Our evaluator consists of two components: an adversarially trained naturalness module and an NLI-based consistency module. Moreover, we use another well-performing NLI model in the evaluation of persona consistency. Experimental results on both human and automatic metrics, including the model-based consistency evaluation, demonstrate that the proposed approach outperforms strong generative baselines, especially in the persona consistency of generated responses.

IJCAI Conference 2020 Conference Paper

Guided Generation of Cause and Effect

  • Zhongyang Li
  • Xiao Ding
  • Ting Liu
  • J. Edward Hu
  • Benjamin Van Durme

We present a conditional text generation framework that posits sentential expressions of possible causes and effects. This framework depends on two novel resources we develop in the course of this work: a very large-scale collection of English sentences expressing causal patterns (CausalBank); and a refinement over previous work on constructing large lexical causal knowledge graphs (Cause Effect Graph). Further, we extend prior work in lexically-constrained decoding to support disjunctive positive constraints. Human assessment confirms that our approach gives high-quality and diverse outputs. Finally, we use CausalBank to perform continued training of an encoder supporting a recent state-of-the-art model for causal reasoning, leading to a 3-point improvement on the COPA challenge set, with no change in model architecture.

AAAI Conference 2020 Conference Paper

Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation

  • Xiaocheng Feng
  • Yawei Sun
  • Bing Qin
  • Heng Gong
  • Yibo Sun
  • Wei Bi
  • Xiaojiang Liu
  • Ting Liu

In this paper, we focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer and aims to preserve text styles while altering the content. In detail, the input is a set of structured records and a reference text describing another recordset. The output is a summary that accurately describes the partial content in the source recordset with the same writing style as the reference. The task is unsupervised due to the lack of parallel data, and it is challenging to select suitable records and style words from the bi-aspect inputs and generate a high-fidelity long document. To tackle these problems, we first build a dataset based on a basketball game report corpus as our testbed, and present an unsupervised neural model with an interactive attention mechanism, which is used for learning the semantic relationship between records and reference texts to achieve better content transfer and better style preservation. In addition, we also explore the effectiveness of back-translation in our task for constructing pseudo-training pairs. Empirical results show the superiority of our approaches over competitive methods, and the models also yield a new state-of-the-art result on a sentence-level dataset.

NeurIPS Conference 2020 Conference Paper

Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness

  • Long Zhao
  • Ting Liu
  • Xi Peng
  • Dimitris Metaxas

Adversarial data augmentation has shown promise for training robust deep neural networks against unforeseen data shifts or corruptions. However, it is difficult to define heuristics to generate effective fictitious target distributions containing "hard" adversarial perturbations that are largely different from the source distribution. In this paper, we propose a novel and effective regularization term for adversarial data augmentation. We theoretically derive it from the information bottleneck principle, which results in a maximum-entropy formulation. Intuitively, this regularization term encourages perturbing the underlying source distribution to enlarge predictive uncertainty of the current model, so that the generated "hard" adversarial perturbations can improve the model robustness during training. Experimental results on three standard benchmarks demonstrate that our method consistently outperforms the existing state of the art by a statistically significant margin.
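The maximum-entropy idea can be illustrated with a minimal, purely illustrative sketch of predictive entropy: the regularizer rewards perturbed inputs on which the current model is uncertain (high softmax entropy), in addition to incurring high loss.

```python
import math

# Illustrative sketch only: Shannon entropy of a model's softmax output.
# A maximum-entropy adversarial regularizer would *maximize* this quantity
# when crafting perturbations, pushing toward uncertain predictions.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def predictive_entropy(logits):
    """Entropy (in nats) of the softmax distribution over the logits."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0)

# A confident prediction has near-zero entropy; a uniform one is maximal.
print(predictive_entropy([10.0, 0.0, 0.0]))  # close to 0
print(predictive_entropy([1.0, 1.0, 1.0]))   # log 3, the maximum for 3 classes
```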

TIST Journal 2020 Journal Article

Multi-Task Learning for Entity Recommendation and Document Ranking in Web Search

  • Jizhou Huang
  • Haifeng Wang
  • Wei Zhang
  • Ting Liu

Entity recommendation, providing users with an improved search experience by proactively recommending related entities to a given query, has become an indispensable feature of today’s Web search engine. Existing studies typically only consider the query issued at the current timestep while ignoring the in-session user search behavior (short-term search history) or historical user search behavior across all sessions (long-term search history) when generating entity recommendations. As a consequence, they may fail to recommend entities of interest relevant to a user’s actual information need. In this work, we believe that both short-term and long-term search history convey valuable evidence that could help understand the user’s search intent behind a query, and take both of them into consideration for entity recommendation. Furthermore, there has been little work on exploring whether the use of other companion tasks in Web search such as document ranking as auxiliary tasks could improve the performance of entity recommendation. To this end, we propose a multi-task learning framework with deep neural networks (DNNs) to jointly learn and optimize two companion tasks in Web search engines: entity recommendation and document ranking, which can be easily trained in an end-to-end manner. Specifically, we regard document ranking as an auxiliary task to improve the main task of entity recommendation, where the representations of queries, sessions, and users are shared across all tasks and optimized by the multi-task objective during training. We evaluate our approach using large-scale, real-world search logs of a widely-used commercial Web search engine. We also perform extensive ablation experiments over a number of facets of the proposed multi-task DNN model to figure out their relative importance.
The experimental results show that both short-term and long-term search history can bring significant improvements in recommendation effectiveness, and the combination of both outperforms using either of them individually. In addition, the experiments show that the performance of both entity recommendation and document ranking can be significantly improved, which demonstrates the effectiveness of using multi-task learning to jointly optimize the two companion tasks in Web search.

AAAI Conference 2020 Conference Paper

Multi-Task Self-Supervised Learning for Disfluency Detection

  • Shaolei Wang
  • Wanxiang Che
  • Qi Liu
  • Pengda Qin
  • Ting Liu
  • William Yang Wang

Most existing approaches to disfluency detection heavily rely on human-annotated data, which is expensive to obtain in practice. To tackle the training data bottleneck, we investigate methods for combining multiple self-supervised tasks, i.e., supervised tasks where data can be collected without manual labeling. First, we construct large-scale pseudo training data by randomly adding or deleting words from unlabeled news data, and propose two self-supervised pre-training tasks: (i) a tagging task to detect the added noisy words, and (ii) a sentence classification task to distinguish original sentences from grammatically incorrect sentences. We then combine these two tasks to jointly train a network. The pre-trained network is then fine-tuned using human-annotated disfluency detection training data. Experimental results on the commonly used English Switchboard test set show that our approach can achieve competitive performance compared to previous systems (trained using the full dataset) by using less than 1% (1000 sentences) of the training data. Our method trained on the full dataset significantly outperforms previous methods, reducing the error by 21% on English Switchboard.
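The pseudo-training-data construction described above can be sketched as follows; the function name, probabilities, and tag set are illustrative assumptions, not the paper's exact recipe. Inserted words receive a "disfluent" tag, kept words a clean one.

```python
import random

# Illustrative sketch: corrupt clean sentences by randomly inserting
# (here, duplicating) or deleting words, keeping per-token tags that
# mark inserted noise.  Tag "D" = inserted word, "O" = original word.

def corrupt(tokens, rng, p_insert=0.2, p_delete=0.1):
    out, tags = [], []
    for tok in tokens:
        if rng.random() < p_insert:    # inject a noisy copy of the word
            out.append(tok)
            tags.append("D")
        if rng.random() < p_delete:    # drop the original word entirely
            continue
        out.append(tok)
        tags.append("O")
    return out, tags

rng = random.Random(0)
print(corrupt("i want to book a flight".split(), rng))
```

A tagging model trained on such pairs learns to spot the added words, which transfers to spotting genuine disfluencies after fine-tuning.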

AAAI Conference 2020 Conference Paper

Understanding Medical Conversations with Scattered Keyword Attention and Weak Supervision from Responses

  • Xiaoming Shi
  • Haifeng Hu
  • Wanxiang Che
  • Zhongqian Sun
  • Ting Liu
  • Junzhou Huang

In this work, we consider the medical slot filling problem, i.e., the problem of converting medical queries into structured representations, which is a challenging task. We analyze the effectiveness of two techniques: scattered keywords in user utterances and weak supervision from responses. We approach medical slot filling as a multi-label classification problem with a label-embedding attentive model that pays more attention to scattered medical keywords, and we learn the classification models under weak supervision from responses. To evaluate the approaches, we annotate a medical slot filling dataset and collect a large-scale unlabeled dataset. The experiments demonstrate that both techniques are promising for improving the task.

AAAI Conference 2019 Conference Paper

A Neural Multi-Task Learning Framework to Jointly Model Medical Named Entity Recognition and Normalization

  • Sendong Zhao
  • Ting Liu
  • Sicheng Zhao
  • Fei Wang

State-of-the-art studies have demonstrated the superiority of joint modeling over pipeline implementation for medical named entity recognition and normalization due to the mutual benefits between the two processes. To exploit these benefits in a more sophisticated way, we propose a novel deep neural multi-task learning framework with explicit feedback strategies to jointly model recognition and normalization. On one hand, our method benefits from the general representations of both tasks provided by multi-task learning. On the other hand, our method successfully converts hierarchical tasks into a parallel multi-task setting while maintaining the mutual supports between tasks. Both of these aspects improve the model performance. Experimental results demonstrate that our method performs significantly better than state-of-the-art approaches on two publicly available medical literature datasets.

AAAI Conference 2019 Conference Paper

A Neural Network Approach to Verb Phrase Ellipsis Resolution

  • Wei-Nan Zhang
  • Yue Zhang
  • Yuanxing Liu
  • Donglin Di
  • Ting Liu

Verb Phrase Ellipsis (VPE) is a linguistic phenomenon in which some verb phrases, as syntactic constituents, are omitted and typically referred to by an auxiliary verb. It is ubiquitous in both formal and informal text, such as news articles and dialogues. Previous work on VPE resolution mainly focused on manually constructing features extracted from auxiliary verbs, syntactic trees, etc. However, the optimization of feature representation, the effectiveness of continuous features and the automatic composition of features are not well addressed. In this paper, we explore the advantages of neural models on VPE resolution in both pipeline and end-to-end processes, comparing the differences between statistical and neural models. Two neural models, namely the multi-layer perceptron and the Transformer, are employed for the subtasks of VPE detection and resolution. Experimental results show that the neural models outperform the state-of-the-art baselines in both subtasks and the end-to-end results.

AAAI Conference 2019 Conference Paper

Devil in the Details: Towards Accurate Single and Multiple Human Parsing

  • Tao Ruan
  • Ting Liu
  • Zilong Huang
  • Yunchao Wei
  • Shikui Wei
  • Yao Zhao

Human parsing has received considerable interest due to its wide application potential. Nevertheless, it is still unclear how to develop an accurate human parsing system in an efficient and elegant way. In this paper, we identify several useful properties, including feature resolution, global context information and edge details, and perform rigorous analyses to reveal how to leverage them to benefit the human parsing task. The advantages of these useful properties finally result in a simple yet effective Context Embedding with Edge Perceiving (CE2P) framework for single human parsing. Our CE2P is end-to-end trainable and can be easily adopted for conducting multiple human parsing. Benefiting from the superiority of CE2P, we won 1st place on all three human parsing tracks in the 2nd Look into Person (LIP) Challenge. Without any bells and whistles, we achieved 56.50% (mIoU), 45.31% (mean APr) and 33.34% (APp0.5) in Track 1, Track 2 and Track 5, outperforming the state of the art by more than 2.06%, 3.81% and 1.87%, respectively. We hope our CE2P will serve as a solid baseline and help ease future research in single/multiple human parsing. Code has been made available at https://github.com/liutinglt/CE2P.

IJCAI Conference 2019 Conference Paper

Exploiting Persona Information for Diverse Generation of Conversational Responses

  • Haoyu Song
  • Wei-Nan Zhang
  • Yiming Cui
  • Dong Wang
  • Ting Liu

In human conversations, people can easily carry out and maintain a conversation because they keep their personalities in mind. Given conversational context together with persona information, it remains a non-trivial task for a chatbot to exploit that information to generate diverse and sustainable conversations. Previous work on persona-based conversational models successfully makes use of predefined persona information and has shown great promise in delivering more realistic responses, but it learns under the assumption that, given a source input, there is only one target response. However, in human conversations, there are many appropriate responses to a given input message. In this paper, we propose a memory-augmented architecture to exploit persona information from context, combined with a conditional variational autoencoder, to generate diverse and sustainable conversations. We evaluate the proposed model on a benchmark persona-chat dataset. Both automatic and human evaluations show that our model can deliver more diverse and more engaging persona-based responses than baseline approaches.

AAAI Conference 2019 Conference Paper

Gaussian Transformer: A Lightweight Approach for Natural Language Inference

  • Maosheng Guo
  • Yu Zhang
  • Ting Liu

Natural Language Inference (NLI) is an active research area, where numerous approaches based on recurrent neural networks (RNNs), convolutional neural networks (CNNs), and self-attention networks (SANs) have been proposed. Although these obtain impressive performance, recurrent approaches are hard to train in parallel, convolutional models tend to require more parameters, and self-attention networks are not good at capturing the local dependency of texts. To address these problems, we introduce a Gaussian prior into the self-attention mechanism to better model the local structure of sentences. We then propose an efficient RNN/CNN-free architecture named Gaussian Transformer for NLI, which consists of encoding blocks modeling both local and global dependency, high-order interaction blocks collecting the evidence of multi-step inference, and a lightweight comparison block saving many parameters. Experiments show that our model achieves new state-of-the-art performance on both the SNLI and MultiNLI benchmarks with significantly fewer parameters and considerably less training time. In addition, evaluation on the Hard NLI datasets demonstrates that our approach is less affected by undesirable annotation artifacts.
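A minimal sketch of the Gaussian-prior idea (an illustrative parameterization, not necessarily the paper's exact one): raw attention scores are penalized by the squared distance between positions before the softmax, so each token attends mostly to its neighbors.

```python
import math

# Illustrative sketch: bias attention scores with a Gaussian distance
# penalty on |i - j|, encouraging locality in self-attention.

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def gaussian_attention(scores, i, sigma=1.0):
    """Attention weights for query position i over all key positions."""
    biased = [s - (i - j) ** 2 / (2 * sigma ** 2)
              for j, s in enumerate(scores)]
    return softmax(biased)

# With uniform raw scores, the weights peak at position 2 and decay
# symmetrically with distance from it.
weights = gaussian_attention([0.0] * 5, i=2)
print([round(w, 3) for w in weights])
```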

IJCAI Conference 2019 Conference Paper

Story Ending Prediction by Transferable BERT

  • Zhongyang Li
  • Xiao Ding
  • Ting Liu

Recent advances, such as GPT and BERT, have shown success in incorporating a pre-trained transformer language model and fine-tuning operation to improve downstream NLP systems. However, this framework still has some fundamental problems in effectively incorporating supervised knowledge from other related tasks. In this study, we investigate a transferable BERT (TransBERT) training framework, which can transfer not only general language knowledge from large-scale unlabeled data but also specific kinds of knowledge from various semantically related supervised tasks, for a target task. Particularly, we propose utilizing three kinds of transfer tasks, including natural language inference, sentiment classification, and next action prediction, to further train BERT based on a pre-trained model. This enables the model to get a better initialization for the target task. We take story ending prediction as the target task to conduct experiments. The final result, an accuracy of 91.8%, dramatically outperforms previous state-of-the-art baseline methods. Several comparative experiments give some helpful suggestions on how to select transfer tasks to improve BERT.

AAAI Conference 2018 Conference Paper

A Neural Transition-Based Approach for Semantic Dependency Graph Parsing

  • Yuxuan Wang
  • Wanxiang Che
  • Jiang Guo
  • Ting Liu

Semantic dependency graph has been recently proposed as an extension of tree-structured syntactic or semantic representation for natural language sentences. It particularly features the structural property of multi-head, which allows nodes to have multiple heads, resulting in a directed acyclic graph (DAG) parsing problem. While most statistical parsers have focused exclusively on shallow bi-lexical tree structures, DAG parsing remains under-explored. In this paper, we propose a neural transition-based parser that uses a variant of the list-based arc-eager transition algorithm for dependency graph parsing. In particular, two non-trivial improvements are proposed for representing the key components of the transition system, to better capture the semantics of segments and internal sub-graph structures. We test our parser on the SemEval-2016 Task 9 dataset (Chinese) and the SemEval-2015 Task 18 dataset (English). On both benchmark datasets, we obtain superior or comparable results relative to the best performing systems. Our parser can be further improved with a simple ensemble mechanism, resulting in state-of-the-art performance.

IJCAI Conference 2018 Conference Paper

Constructing Narrative Event Evolutionary Graph for Script Event Prediction

  • Zhongyang Li
  • Xiao Ding
  • Ting Liu

Script event prediction requires a model to predict the subsequent event given an existing event context. Previous models based on event pairs or event chains cannot make full use of dense event connections, which may limit their capability of event prediction. To remedy this, we propose constructing an event graph to better utilize the event network information for script event prediction. In particular, we first extract narrative event chains from large quantities of news corpus, and then construct a narrative event evolutionary graph (NEEG) based on the extracted chains. NEEG can be seen as a knowledge base that describes event evolutionary principles and patterns. To solve the inference problem on NEEG, we present a scaled graph neural network (SGNN) to model event interactions and learn better event representations. Instead of computing the representations on the whole graph, SGNN processes only the concerned nodes each time, which makes our model feasible for large-scale graphs. By comparing the similarity between input context event representations and candidate event representations, we can choose the most reasonable subsequent event. Experimental results on the widely used New York Times corpus demonstrate that our model significantly outperforms state-of-the-art baseline methods, using the standard multiple choice narrative cloze evaluation.

IJCAI Conference 2018 Conference Paper

Domain Adaptation via Tree Kernel Based Maximum Mean Discrepancy for User Consumption Intention Identification

  • Xiao Ding
  • Bibo Cai
  • Ting Liu
  • Qiankun Shi

Identifying user consumption intention from social media is of great interest to downstream applications. Since the task is domain-dependent, deep neural networks have been applied to learn transferable features for adapting models from a source domain to a target domain. A basic idea for solving this problem is reducing the distribution difference between the source domain and the target domain so that the transfer error can be bounded. However, feature transferability drops dramatically in the higher layers of deep neural networks as domain discrepancy increases. Hence, previous work had to use a small amount of target-domain annotated data to train domain-specific layers. In this paper, we propose a deep transfer learning framework for consumption intention identification that reduces the data bias and enhances transferability in the domain-specific layers. In our framework, the representation of the domain-specific layer is mapped to a reproducing kernel Hilbert space, where the mean embeddings of different domain distributions can be explicitly matched. By using an optimal tree kernel method for measuring the mean embedding matching, the domain discrepancy can be effectively reduced. The framework can learn transferable features in a completely unsupervised manner with statistical guarantees. Experimental results on five different domain datasets show that our approach dramatically outperforms state-of-the-art baselines, and it is general enough to be applied to more scenarios. The source code and datasets can be found at http://ir.hit.edu.cn/~xding/index_english.htm.
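The mean-embedding matching step can be illustrated with a generic squared maximum mean discrepancy (MMD) computation. An RBF kernel on scalars stands in here for the paper's tree kernel, so treat this purely as a sketch of the discrepancy measure, not the proposed method.

```python
import math

# Illustrative sketch: squared MMD between two samples compares mean
# kernel similarities within each sample and across the two samples.
# The RBF kernel below is a stand-in for any valid kernel.

def rbf(x, y, gamma=1.0):
    return math.exp(-gamma * (x - y) ** 2)

def mmd_squared(xs, ys, kernel=rbf):
    k_xx = sum(kernel(a, b) for a in xs for b in xs) / (len(xs) ** 2)
    k_yy = sum(kernel(a, b) for a in ys for b in ys) / (len(ys) ** 2)
    k_xy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2 * k_xy

# Identical samples give zero discrepancy; shifted samples do not.
print(mmd_squared([0.0, 0.1, 0.2], [0.0, 0.1, 0.2]))   # 0.0
print(mmd_squared([0.0, 0.1, 0.2], [5.0, 5.1, 5.2]))   # clearly positive
```

Minimizing such a term between source-domain and target-domain layer representations is what "explicitly matching mean embeddings" amounts to.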

AAAI Conference 2018 Conference Paper

Exploring Implicit Feedback for Open Domain Conversation Generation

  • Wei-Nan Zhang
  • Lingzhi Li
  • Dongyan Cao
  • Ting Liu

User feedback can be an effective indicator of the success of a human-robot conversation. However, to avoid interrupting the online real-time conversation process, explicit feedback is usually collected at the end of a conversation. Alternatively, users’ responses usually contain implicit feedback, such as stance, sentiment and emotion, towards the conversation content or the interlocutors. Exploring this implicit feedback is therefore a natural way to optimize the conversation generation process. In this paper, we propose a novel reward function that exploits implicit feedback to optimize the future reward of a reinforcement learning based neural conversation model. A simulation strategy is applied to explore the state-action space during training and testing. Experimental results show that the proposed approach outperforms the Seq2Seq model and the state-of-the-art reinforcement learning model for conversation generation in automatic and human evaluations on the OpenSubtitles and Twitter datasets.

AAAI Conference 2018 Conference Paper

Hierarchical Attention Flow for Multiple-Choice Reading Comprehension

  • Haichao Zhu
  • Furu Wei
  • Bing Qin
  • Ting Liu

In this paper, we focus on multiple-choice reading comprehension, which aims to answer a question given a passage and multiple candidate options. We present hierarchical attention flow to adequately leverage candidate options for modeling the interactions among passages, questions and candidate options. We observe that leveraging candidate options to boost evidence gathering from the passages plays a vital role in this task, which is ignored in previous works. In addition, we explicitly model the option correlations with an attention mechanism to obtain better option representations, which are further fed into a bilinear layer to obtain a ranking score for each option. On a large-scale multiple-choice reading comprehension dataset (i.e., the RACE dataset), the proposed model outperforms two previous neural network baselines on both the RACE-M and RACE-H subsets and yields the state-of-the-art overall results.
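The bilinear scoring layer mentioned above can be sketched as follows, with toy dimensions and hypothetical names; the real model scores learned option and passage/question representations.

```python
# Illustrative sketch of bilinear option ranking: each option vector o is
# scored against a fused passage/question vector q as o^T W q, and options
# are ranked by that score.  Pure-Python vectors for clarity.

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rank_options(options, q, W):
    """Return option indices sorted by bilinear score, best first."""
    scores = [dot(o, matvec(W, q)) for o in options]
    return sorted(range(len(options)), key=lambda i: -scores[i])

W = [[1.0, 0.0], [0.0, 1.0]]          # identity W: score reduces to o . q
q = [1.0, 0.5]
options = [[0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
print(rank_options(options, q, W))    # [1, 0, 2]
```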

IJCAI Conference 2018 Conference Paper

Improving Entity Recommendation with Search Log and Multi-Task Learning

  • Jizhou Huang
  • Wei Zhang
  • Yaming Sun
  • Haifeng Wang
  • Ting Liu

Entity recommendation, providing search users with an improved experience by assisting them in finding related entities for a given query, has become an indispensable feature of today's Web search engine. Existing studies typically only consider the query issued at the current time step while ignoring the in-session preceding queries. Thus, they typically fail to handle ambiguous queries such as "apple", because the model cannot tell which apple (the company or the fruit) is being talked about. In this work, we believe that in-session contexts convey valuable evidence that could facilitate the semantic modeling of queries, and take that into consideration for entity recommendation. Furthermore, in order to better model the semantics of queries, we learn the model in a multi-task learning setting where the query representation is shared across entity recommendation and context-aware ranking. We evaluate our approach using large-scale, real-world search logs of a widely used commercial Web search engine. The experimental results show that incorporating context information significantly improves entity recommendation, and learning the model in a multi-task learning setting could bring further improvements.

IJCAI Conference 2018 Conference Paper

Improving Low Resource Named Entity Recognition using Cross-lingual Knowledge Transfer

  • Xiaocheng Feng
  • Xiachong Feng
  • Bing Qin
  • Zhangyin Feng
  • Ting Liu

Neural networks have been widely used for high-resource language (e.g., English) named entity recognition (NER) and have shown state-of-the-art results. However, for low-resource languages such as Dutch and Spanish, due to the limitation of resources and lack of annotated data, taggers tend to have lower performance. To narrow this gap, we propose three novel strategies to enrich the semantic representations of low-resource languages: first, we develop neural networks that improve low-resource word representations through knowledge transfer from a high-resource language using bilingual lexicons; second, a lexicon extension strategy is designed to address the out-of-lexicon problem by automatically learning semantic projections; third, we regard word-level entity type distribution features as external language-independent knowledge and incorporate them into our neural architecture. Experiments on two low-resource languages (Dutch and Spanish) demonstrate the effectiveness of these additional semantic representations (an average improvement of 4.8%). Moreover, on the Chinese OntoNotes 4.0 dataset, our approach achieves an F-score of 83.07%, a 2.91% absolute gain over the state-of-the-art results.

IJCAI Conference 2018 Conference Paper

Joint Extraction of Entities and Relations Based on a Novel Graph Scheme

  • Shaolei Wang
  • Yue Zhang
  • Wanxiang Che
  • Ting Liu

Both entity and relation extraction can benefit from being performed jointly, allowing each task to correct the errors of the other. Most existing neural joint methods extract entities and relations separately and achieve joint learning through parameter sharing, leading to a drawback that information between output entities and relations cannot be fully exploited. In this paper, we convert the joint task into a directed graph by designing a novel graph scheme and propose a transition-based approach to generate the directed graph incrementally, which can achieve joint learning through joint decoding. Our method can model underlying dependencies not only between entities and relations, but also between relations. Experiments on the New York Times (NYT) corpus show that our approach outperforms the state-of-the-art methods.

IJCAI Conference 2018 Conference Paper

Topic-to-Essay Generation with Neural Networks

  • Xiaocheng Feng
  • Ming Liu
  • Jiahao Liu
  • Bing Qin
  • Yibo Sun
  • Ting Liu

We focus on essay generation, a challenging task that generates a paragraph-level text with multiple topics. Progress towards understanding different topics and expressing diversity in this task requires more powerful generators and richer training and evaluation resources. To address this, we develop a multi-topic-aware long short-term memory (MTA-LSTM) network. In this model, we maintain a novel multi-topic coverage vector, which learns the weight of each topic and is sequentially updated during the decoding process. Afterwards, this vector is fed to an attention model to guide the generator. Moreover, we automatically construct two paragraph-level Chinese essay corpora comprising 305,000 essay paragraphs and 55,000 question-and-answer pairs. Empirical results show that our approach obtains a much better BLEU score than various baselines. Furthermore, human judgment shows that MTA-LSTM is able to generate essays that are not only coherent but also closely related to the input topics.
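One plausible reading of a coverage-vector update, given purely as an illustrative formulation rather than the paper's exact equation: each topic's remaining coverage is reduced in proportion to the attention it receives, so already-covered topics attract less attention at later decoding steps.

```python
# Illustrative sketch (hypothetical update rule): a per-topic coverage
# value starts at 1.0 and is spent as attention mass is paid to that
# topic, one decoding step at a time.

def update_coverage(coverage, attention, rate=0.5):
    """One decoding step: spend part of each topic's remaining coverage."""
    return [c - rate * a * c for c, a in zip(coverage, attention)]

coverage = [1.0, 1.0, 1.0]                 # three input topics
for attention in ([0.8, 0.1, 0.1], [0.7, 0.2, 0.1]):
    coverage = update_coverage(coverage, attention)
print([round(c, 3) for c in coverage])     # topic 0 is the most "used up"
```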

IJCAI Conference 2017 Conference Paper

A Deep Neural Network for Chinese Zero Pronoun Resolution

  • Qingyu Yin
  • Weinan Zhang
  • Yu Zhang
  • Ting Liu

Existing approaches for Chinese zero pronoun resolution overlook semantic information. This is because zero pronouns have no descriptive information, which results in difficulty in explicitly capturing their semantic similarities with antecedents. Moreover, when dealing with candidate antecedents, traditional systems simply take advantage of the local information of a single candidate antecedent while failing to consider the underlying information provided by the other candidates from a global perspective. To address these weaknesses, we propose a novel zero pronoun-specific neural network, which is capable of representing zero pronouns by utilizing the contextual information at the semantic level. In addition, when dealing with candidate antecedents, a two-level candidate encoder is employed to explicitly capture both the local and global information of candidate antecedents. We conduct experiments on the Chinese portion of the OntoNotes 5.0 corpus. Experimental results show that our approach substantially outperforms the state-of-the-art method in various experimental settings.

IJCAI Conference 2017 Conference Paper

ContextCare: Incorporating Contextual Information Networks to Representation Learning on Medical Forum Data

  • Stan Zhao
  • Meng Jiang
  • Quan Yuan
  • Bing Qin
  • Ting Liu
  • ChengXiang Zhai

Online users have generated a large amount of health-related data on medical forums and search engines. However, exploiting these rich data for orienting patients online and assisting medical checkups offline is non-trivial due to the sparseness of existing symptom-disease links, which is caused by the natural and chatty expressions of symptoms. In this paper, we propose a novel and general representation learning method, ContextCare, for human-generated health-related data, which learns the latent relationships between symptoms and diseases from the symptom-disease diagnosis network for disease prediction, disease category prediction and disease clustering. To alleviate the network sparseness, ContextCare adopts regularizations from rich contextual information networks, including a symptom co-occurrence network and a disease evolution network. Therefore, our representations of symptoms and diseases incorporate knowledge from these three networks. Extensive experiments on medical forum data demonstrate that ContextCare outperforms the state-of-the-art methods in disease category prediction, disease prediction and disease clustering.

IJCAI Conference 2017 Conference Paper

Effective Deep Memory Networks for Distant Supervised Relation Extraction

  • Xiaocheng Feng
  • Jiang Guo
  • Bing Qin
  • Ting Liu
  • Yongjie Liu

Distant supervised relation extraction (RE) has been an effective way of finding novel relational facts in text without labeled training data. Typically, it can be formalized as a multi-instance multi-label problem. In this paper, we introduce a novel neural approach for distant supervised RE with a specific focus on attention mechanisms. Unlike feature-based logistic regression models and compositional neural models such as CNNs, our approach includes two major attention-based memory components, which are capable of explicitly capturing the importance of each context word for modeling the representation of the entity pair, as well as the intrinsic dependencies between relations. Such importance degrees and dependency relationships are calculated with multiple computational layers, each of which is a neural attention model over an external memory. Experiments on real-world datasets show that our approach performs significantly and consistently better than various baselines.

TIST Journal 2017 Journal Article

Personalized Microtopic Recommendation on Microblogs

  • Yang Li
  • Jing Jiang
  • Ting Liu
  • Minghui Qiu
  • Xiaofei Sun

Microblogging services such as Sina Weibo and Twitter allow users to create tags explicitly indicated by the # symbol. In Sina Weibo, these tags are called microtopics, and in Twitter, they are called hashtags. In Sina Weibo, each microtopic has a designated page and can be directly visited or commented on. Recommending these microtopics to users based on their interests can help users efficiently acquire information. However, it is non-trivial to recommend microtopics to users to satisfy their information needs. In this article, we investigate the task of personalized microtopic recommendation, which exhibits two challenges. First, users usually do not give explicit ratings to microtopics. Second, there exists rich information about users and microtopics, for example, users' published content and biographical information, but it is not clear how to best utilize such information. To address the above two challenges, we propose a joint probabilistic latent factor model to integrate rich information into a matrix factorization-based solution to microtopic recommendation. Our model builds on top of collaborative filtering, content analysis, and feature regression. Using two real-world datasets, we evaluate our model with different kinds of content and contextual information. Experimental results show that our model significantly outperforms a few competitive baseline methods, especially in the circumstance where users have few adoption behaviors.

JAIR Journal 2016 Journal Article

A Distributed Representation-Based Framework for Cross-Lingual Transfer Parsing

  • Jiang Guo
  • Wanxiang Che
  • David Yarowsky
  • Haifeng Wang
  • Ting Liu

This paper investigates the problem of cross-lingual transfer parsing, aiming at inducing dependency parsers for low-resource languages while using only training data from a resource-rich language (e.g., English). Existing model transfer approaches typically don't include lexical features, which are not transferable across languages. In this paper, we bridge the lexical feature gap by using distributed feature representations and their composition. We provide two algorithms for inducing cross-lingual distributed representations of words, which map vocabularies from two different languages into a common vector space. Consequently, both lexical features and non-lexical features can be used in our model for cross-lingual transfer. Furthermore, our framework is flexible enough to incorporate additional useful features such as cross-lingual word clusters. Our combined contributions achieve an average relative error reduction of 10.9% in labeled attachment score as compared with the delexicalized parser, trained on English universal treebank and transferred to three other languages. It also significantly outperforms state-of-the-art delexicalized models augmented with projected cluster features on identical data. Finally, we demonstrate that our models can be further boosted with minimal supervision (e.g., 100 annotated sentences) from target languages, which is of great significance for practical usage.

AAAI Conference 2016 Conference Paper

A Representation Learning Framework for Multi-Source Transfer Parsing

  • Jiang Guo
  • Wanxiang Che
  • David Yarowsky
  • Haifeng Wang
  • Ting Liu

Cross-lingual model transfer has been a promising approach for inducing dependency parsers for low-resource languages where annotated treebanks are not available. The major obstacles for the model transfer approach are two-fold: 1. Lexical features are not directly transferable across languages; 2. Target language-specific syntactic structures are difficult to recover. To address these two challenges, we present a novel representation learning framework for multi-source transfer parsing. Our framework allows multi-source transfer parsing using full lexical features straightforwardly. By evaluating on the Google universal dependency treebanks (v2.0), our best models yield an absolute improvement of 6.53% in averaged labeled attachment score, as compared with delexicalized multi-source transfer models. We also significantly outperform the most recently proposed state-of-the-art transfer system.

IJCAI Conference 2016 Conference Paper

Exploring Segment Representations for Neural Segmentation Models

  • Yijia Liu
  • Wanxiang Che
  • Jiang Guo
  • Bing Qin
  • Ting Liu

Many natural language processing (NLP) tasks can be cast as segmentation problems. In this paper, we combine the semi-CRF with neural networks to solve NLP segmentation tasks. Our model represents a segment both by composing the input units and by embedding the entire segment. We thoroughly study different composition functions and different segment embeddings. We conduct extensive experiments on two typical segmentation tasks: named entity recognition (NER) and Chinese word segmentation (CWS). Experimental results show that our neural semi-CRF model benefits from representing the entire segment and achieves state-of-the-art performance on the CWS benchmark dataset and competitive results on the CoNLL03 NER dataset.
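The segment-level decoding a semi-CRF relies on can be sketched as a dynamic program over all segmentations, where each candidate segment [i, j) receives a score from some segment representation. The toy scoring function here is an assumption purely for illustration:

```python
def segment_viterbi(n, score, max_len):
    """Best segmentation of positions 0..n-1; score(i, j) scores segment [i, j)."""
    best = [float("-inf")] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            s = best[i] + score(i, j)
            if s > best[j]:
                best[j], back[j] = s, i
    # Recover segment boundaries by walking the back-pointers.
    segs, j = [], n
    while j > 0:
        i = back[j]
        segs.append((i, j))
        j = i
    return best[n], segs[::-1]

# Toy scorer: reward length-2 segments, as if a learned scorer preferred them.
total, segs = segment_viterbi(6, lambda i, j: 1.0 if j - i == 2 else 0.0, max_len=3)
# → total = 3.0, segs = [(0, 2), (2, 4), (4, 6)]
```

In the paper's setting, `score(i, j)` would come from composing the units inside the segment and embedding the segment as a whole, rather than from a hand-written rule.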

IJCAI Conference 2016 Conference Paper

HC-Search for Incremental Parsing

  • Yijia Liu
  • Wanxiang Che
  • Bing Qin
  • Ting Liu

The standard incremental parsing algorithm employs a single scoring function and beam-search to find the best parse tree in an exponentially large search space. Inspired by the recently proposed HC-search framework, we decompose the incremental parsing algorithm into two steps: first searching for a set of high-quality outputs with beam-search, and then selecting the best output with a ranking model. We learn our incremental parsing model with a relaxed learning objective. We incorporate arbitrary features in our ranking model and learn the model from fine-grained ranking examples. Experimental results on standard English and Chinese datasets show that our method significantly outperforms a strong baseline.
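The two-step decomposition can be sketched generically: a beam search keeps a set of high-quality candidate outputs, and a separate ranking function then selects the final answer from that set. The bit-string task and the toy expansion scores below are assumptions purely for illustration, not the parsing model itself.

```python
import heapq

def beam_search(start, expand, steps, beam_size):
    """Keep the beam_size best-scoring partial hypotheses at each step."""
    beam = [(0.0, start)]
    for _ in range(steps):
        cand = [(s + ds, nxt) for s, h in beam for ds, nxt in expand(h)]
        beam = heapq.nlargest(beam_size, cand, key=lambda x: x[0])
    return beam

# Toy problem: build a bit string. The search-time heuristic slightly
# prefers appending "0", but the ranker (judging complete outputs only,
# with access to "arbitrary features") prefers strings with more "1"s.
expand = lambda h: [(0.1, h + "0"), (0.0, h + "1")]
candidates = beam_search("", expand, steps=3, beam_size=4)
best = max(candidates, key=lambda sh: sh[1].count("1"))[1]
```

The point of the decomposition is visible even in the toy: the output the ranker picks need not be the one the search scored highest.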

IS Journal 2015 Journal Article

Creating a Fine-Grained Corpus for Chinese Sentiment Analysis

  • Yanyan Zhao
  • Bing Qin
  • Ting Liu

Writing comments on products or news has become a popular activity in social media. The amount of opinionated text available online has been growing rapidly, increasing the need for techniques that can analyze opinions expressed in such text so that reviews can be easily absorbed by users. To date, most techniques depend on annotated corpora. However, existing corpora are mostly sentence-level works that ignore important global sentiment information in other sentences. Given the rise of advanced applications, more fine-grained corpora are needed, even at the sentence level. The authors aim to create a fine-grained corpus for Chinese sentiment analysis and, more importantly, explore new sentiment analysis tasks by analyzing the annotated corpus. The proposed fine-grained annotation scheme not only introduces cross-sentence and global sentiment information (such as "target entity") but also includes new sentence-level elements (such as "implicit aspect"). Based on this scheme, the corpus can provide a more fine-grained platform for researchers to study algorithms for advanced applications. In addition, an in-depth analysis of the annotated corpus is made and several important but ignored tasks, such as the target-aspect pair extraction task, are explored, which can give useful hints about future directions.

IJCAI Conference 2015 Conference Paper

Deep Learning for Event-Driven Stock Prediction

  • Xiao Ding
  • Yue Zhang
  • Ting Liu
  • Junwen Duan

We propose a deep learning method for event-driven stock market prediction. First, events are extracted from news text and represented as dense vectors, trained using a novel neural tensor network. Second, a deep convolutional neural network is used to model both short-term and long-term influences of events on stock price movements. Experimental results show that our model can achieve nearly 6% improvement on both S&P 500 index prediction and individual stock prediction, compared to state-of-the-art baseline methods. In addition, market simulation results show that our system is more capable of making profits than previously reported systems trained on S&P 500 stock historical data.

AAAI Conference 2015 Conference Paper

Exploring Key Concept Paraphrasing Based on Pivot Language Translation for Question Retrieval

  • Wei-Nan Zhang
  • Zhao-Yan Ming
  • Yu Zhang
  • Ting Liu
  • Tat-Seng Chua

Question retrieval in current community-based question answering (CQA) services does not, in general, work well for long and complex queries. One of the main difficulties lies in the word mismatch between queries and candidate questions. Existing solutions try to expand the queries at word level, but they usually fail to consider concept level enrichment. In this paper, we explore a pivot language translation based approach to derive the paraphrases of key concepts. We further propose a unified question retrieval model which integrates the key concepts and their paraphrases for the query question. Experimental results demonstrate that the paraphrase enhanced retrieval model significantly outperforms the state-of-the-art models in question retrieval.

AAAI Conference 2015 Conference Paper

Mining User Consumption Intention from Social Media Using Domain Adaptive Convolutional Neural Network

  • Xiao Ding
  • Ting Liu
  • Junwen Duan
  • Jian-Yun Nie

Social media platforms are often used by people to express their needs and desires. Such data offer great opportunities to identify users' consumption intention from user-generated content, so that better tailored products or services can be recommended. However, there have been few efforts on mining commercial intents from social media content. In this paper, we investigate the use of social media data to identify consumption intentions for individuals. We develop a Consumption Intention Mining Model (CIMM) based on a convolutional neural network (CNN) for identifying whether the user has a consumption intention. The task is domain-dependent, and learning a CNN requires a large number of annotated instances, which may be available only in some domains. Hence, we investigate the possibility of transferring the CNN mid-level sentence representation learned from one domain to another by adding an adaptation layer. To demonstrate the effectiveness of CIMM, we conduct experiments on two domains. Our results show that CIMM offers a powerful paradigm for effectively identifying users' consumption intention based on their social media data. Moreover, our results also confirm that the CNN learned in one domain can be effectively transferred to another domain. This suggests great potential for our model to significantly increase the effectiveness of product recommendations and targeted advertising.

IJCAI Conference 2015 Conference Paper

User Modeling with Neural Network for Review Rating Prediction

  • Duyu Tang
  • Bing Qin
  • Ting Liu
  • Yuekui Yang

We present a neural network method for review rating prediction in this paper. Existing neural network methods for sentiment prediction typically only capture the semantics of texts but ignore the user who expresses the sentiment. This is not desirable for review rating prediction, as each user has an influence on how to interpret the textual content of a review. For example, the same word (e.g., "good") might indicate different sentiment strengths when written by different users. We address this issue by developing a new neural network that takes user information into account. The intuition is to factor in user-specific modification to the meaning of a certain word. Specifically, we extend lexical semantic composition models and introduce a user-word composition vector model (UWCVM), which effectively captures how the user acts as a function affecting the continuous word representation. We integrate UWCVM into a supervised learning framework for review rating prediction, and conduct experiments on two benchmark review datasets. Experimental results demonstrate the effectiveness of our method. It shows superior performance over several strong baseline methods.
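The core intuition, a user-specific transform applied to each word vector before composition, can be sketched as follows. The shapes, the per-user random matrix, and the simple averaging composition are illustrative assumptions, not the paper's UWCVM:

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = {"good": 0, "movie": 1, "terrible": 2}
E = rng.standard_normal((3, 4))              # word embeddings, dimension 4

def user_matrix(seed):
    # One transform per user; in practice this would be learned and
    # regularized (e.g., low-rank), not drawn at random.
    return np.eye(4) + 0.1 * np.random.default_rng(seed).standard_normal((4, 4))

def review_vector(words, user_seed):
    M = user_matrix(user_seed)
    vecs = [M @ E[vocab[w]] for w in words]  # user-modified word vectors
    return np.mean(vecs, axis=0)             # toy composition: averaging

a = review_vector(["good", "movie"], user_seed=7)
b = review_vector(["good", "movie"], user_seed=8)   # same words, different user
```

Because `M` differs per user, the same review text yields different composed representations for different users, which is exactly the effect the model exploits for rating prediction.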

IJCAI Conference 2007 Conference Paper

  • Shiqi Zhao
  • Ming Zhou
  • Ting Liu

Question paraphrasing is critical in many Natural Language Processing (NLP) applications, especially for question reformulation in question answering (QA). However, choosing an appropriate data source and developing effective methods are challenging tasks. In this paper, we propose a method that exploits Encarta logs to automatically identify question paraphrases and extract templates. Questions from Encarta logs are partitioned into small clusters, within which a perceptron classifier is used for identifying question paraphrases. Experiments are conducted and the results have shown: (1) Encarta log data is an eligible data source for question paraphrasing, and the user clicks in the data are indicative clues for recognizing paraphrases; (2) the supervised method we present is effective and can evidently outperform the unsupervised method; besides, the features introduced to identify paraphrases are sound; (3) the obtained question paraphrase templates are quite effective in question reformulation, enhancing the MRR from 0.2761 to 0.4939 on the questions of TREC QA 2003.

IJCAI Conference 2007 Conference Paper

  • Shiqi Zhao
  • Ting Liu
  • Xincheng Yuan
  • Sheng Li
  • Yu Zhang

Lexical paraphrasing aims at acquiring word-level paraphrases. It is critical for many Natural Language Processing (NLP) applications, such as Question Answering (QA), Information Extraction (IE), and Machine Translation (MT). Since the meaning and usage of a word can vary in distinct contexts, different paraphrases should be acquired according to the contexts. However, most existing research focuses on constructing paraphrase corpora, in which few contextual constraints for paraphrase application are imposed. This paper presents a method that automatically acquires context-specific lexical paraphrases. In this method, the obtained paraphrases of a word depend on the specific sentence the word occurs in. Two stages are included, i.e., candidate paraphrase extraction and paraphrase validation, both of which are mainly based on web mining. Evaluations are conducted on a news title corpus, and the presented method is compared with a paraphrasing method that exploits a Chinese thesaurus of synonyms -- Tongyi Cilin (Extended) (CilinE for short). Results show that the f-measure of our method (0.4852) is significantly higher than that using CilinE (0.1127). In addition, over 85% of the correct paraphrases derived by our method cannot be found in CilinE, which suggests that our method is effective in acquiring out-of-thesaurus paraphrases.

JMLR Journal 2006 Journal Article

New Algorithms for Efficient High-Dimensional Nonparametric Classification

  • Ting Liu
  • Andrew W. Moore
  • Alexander Gray

This paper is about non-approximate acceleration of high-dimensional nonparametric operations such as k nearest neighbor classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the data points close to the query, but merely need to answer questions about the properties of that set of data points. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. This is applicable to many algorithms in nonparametric statistics, memory-based learning and kernel-based learning. But for clarity, this paper concentrates on pure k-NN classification. We introduce new ball-tree algorithms that on real-world data sets give accelerations from 2-fold to 100-fold compared against highly optimized traditional ball-tree-based k-NN. These results include data sets with up to 10^6 dimensions and 10^5 records, and demonstrate non-trivial speed-ups while giving exact answers.
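The pruning idea behind such ball-tree speedups can be sketched compactly: each node stores a center and radius, and a whole node is skipped whenever the triangle inequality proves it cannot contain a point closer than the current k-th best. This is a simplified illustration of the exact-answer mechanism, not the paper's optimized algorithms.

```python
import heapq
import math
import random

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class Ball:
    def __init__(self, pts):
        self.center = [sum(c) / len(pts) for c in zip(*pts)]
        self.radius = max(dist(p, self.center) for p in pts)
        if len(pts) <= 4:                      # leaf node
            self.pts, self.kids = pts, None
        else:                                   # split on the widest coordinate
            d = max(range(len(pts[0])),
                    key=lambda i: max(p[i] for p in pts) - min(p[i] for p in pts))
            pts = sorted(pts, key=lambda p: p[d])
            m = len(pts) // 2
            self.pts, self.kids = None, (Ball(pts[:m]), Ball(pts[m:]))

def knn(node, q, k, heap=None):
    heap = [] if heap is None else heap        # max-heap of (-dist, point)
    # Prune: the closest any point in this ball can be is
    # dist(q, center) - radius; if that exceeds the k-th best, skip the ball.
    if len(heap) == k and dist(q, node.center) - node.radius > -heap[0][0]:
        return heap
    if node.kids is None:
        for p in node.pts:
            heapq.heappush(heap, (-dist(q, p), p))
            if len(heap) > k:
                heapq.heappop(heap)
    else:
        # Visit the nearer child first to tighten the bound early.
        for kid in sorted(node.kids, key=lambda c: dist(q, c.center)):
            knn(kid, q, k, heap)
    return heap

random.seed(0)
data = [[random.random() for _ in range(3)] for _ in range(200)]
tree = Ball(data)
query = [0.5, 0.5, 0.5]
result = sorted(-d for d, _ in knn(tree, query, k=3))  # 3 nearest distances
```

Because the prune test uses a provable lower bound, the answers remain exact; the speedup comes entirely from the balls the search never has to open.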

NeurIPS Conference 2004 Conference Paper

An Investigation of Practical Approximate Nearest Neighbor Algorithms

  • Ting Liu
  • Andrew Moore
  • Ke Yang
  • Alexander Gray

This paper concerns approximate nearest neighbor searching algorithms, which have become increasingly important, especially in high dimensional perception areas such as computer vision, with dozens of publications in recent years. Much of this enthusiasm is due to a successful new approximate nearest neighbor approach called Locality Sensitive Hashing (LSH). In this paper we ask the question: can earlier spatial data structure approaches to exact nearest neighbor, such as metric trees, be altered to provide approximate answers to proximity queries and if so, how? We introduce a new kind of metric tree that allows overlap: certain datapoints may appear in both the children of a parent. We also introduce new approximate k-NN search algorithms on this structure. We show why these structures should be able to exploit the same random-projection-based approximations that LSH enjoys, but with a simpler algorithm and perhaps with greater efficiency. We then provide a detailed empirical evaluation on five large, high dimensional datasets which show up to 31-fold accelerations over LSH. This result holds true throughout the spectrum of approximation levels.

NeurIPS Conference 2003 Conference Paper

New Algorithms for Efficient High Dimensional Non-parametric Classification

  • Ting Liu
  • Andrew Moore
  • Alexander Gray

This paper is about non-approximate acceleration of high dimensional nonparametric operations such as k nearest neighbor classifiers and the prediction phase of Support Vector Machine classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the datapoints close to the query, but merely need to ask questions about the properties of that set of datapoints. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. For clarity, this paper concentrates on pure k-NN classification and the prediction phase of SVMs. We introduce new ball tree algorithms that on real-world datasets give accelerations of 2-fold up to 100-fold compared against highly optimized traditional ball-tree-based k-NN. These results include datasets with up to 10^6 dimensions and 10^5 records, and show non-trivial speedups while giving exact answers.