Arrow Research search

Author name cluster

Maosong Sun

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

73 papers
1 author row

Possible papers

73

TMLR Journal 2026 Journal Article

ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer

  • Jinyi Hu
  • Shengding Hu
  • Yuxuan Song
  • Yufei Huang
  • Mingxuan Wang
  • Hao Zhou
  • Zhiyuan Liu
  • Wei-Ying Ma

Autoregressive and diffusion models have achieved remarkable progress in language modeling and visual generation, respectively. We present ACDiT, a novel Autoregressive blockwise Conditional Diffusion Transformer that combines the autoregressive and diffusion paradigms for continuous visual information. By introducing a block-wise autoregressive unit, ACDiT offers a flexible interpolation between token-wise autoregression and full-sequence diffusion, bypassing the limitations of discrete tokenization. The generation of each block is formulated as a conditional diffusion process, conditioned on prior blocks. ACDiT is easy to implement, as simple as applying a specially designed Skip-Causal Attention Mask on a standard diffusion transformer during training. During inference, the process iterates between diffusion denoising and autoregressive decoding, which can make full use of the KV-Cache. We validate the effectiveness of ACDiT on image, video, and text generation and show that ACDiT performs best among all autoregressive baselines of similar model scale on visual generation tasks. We also demonstrate that, benefiting from autoregressive modeling, pretrained ACDiT can be transferred to visual understanding tasks despite being trained with a generative objective. The analysis of the trade-off between autoregressive modeling and diffusion demonstrates the potential of ACDiT for long-horizon visual generation tasks. We hope that ACDiT offers a novel perspective on visual autoregressive generation and sheds light on new avenues for unified models.
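The blockwise masking idea is easy to picture in code. Below is a minimal sketch of a block-causal mask that interpolates between token-wise causal attention and full attention; the block layout here is an illustrative assumption, and ACDiT's actual Skip-Causal Attention Mask additionally routes attention between clean and noised copies of each block, which this sketch omits.

```python
import numpy as np

def block_causal_mask(num_blocks: int, block_size: int) -> np.ndarray:
    """Boolean mask (True = attention allowed): tokens attend bidirectionally
    within their own block and causally to all earlier blocks. block_size=1
    recovers token-wise causal attention; num_blocks=1 recovers full attention."""
    n = num_blocks * block_size
    blk = np.arange(n) // block_size      # block index of each position
    return blk[None, :] <= blk[:, None]   # query i may see key j iff blk(j) <= blk(i)

print(block_causal_mask(num_blocks=3, block_size=2).astype(int))
```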

TMLR Journal 2026 Journal Article

Exploring Perceptual Limitations of Multimodal LLMs on Small Visual Objects

  • Jiarui Zhang
  • Jinyi Hu
  • Mahyar Khayatkhoei
  • Filip Ilievski
  • Maosong Sun

Multimodal Large Language Models (MLLMs) have recently achieved remarkable performance on various multimodal benchmarks. However, general benchmarks often do not reveal the specific limits of their visual perception due to a lack of controllability. In this work, we quantitatively study the perception of small visual objects in several widely used MLLMs and reveal a pervasive limitation in answering questions about small objects in images. We then conduct a controlled study of MLLMs' perception, using text reading as a surrogate task for general visual perception, to understand how the quality, size, distractors, and location of an object can independently affect the ability of MLLMs to perceive it in images. Through this controlled study, we find that lower object quality, smaller object size, and the presence of visual distractors can each independently reduce MLLMs' ability to answer visual questions. More surprisingly, even local perturbations of an object by a few pixels can cause a drastic decline in the ability of MLLMs to perceive it. Our study provides a better understanding of the perceptual limitations of MLLMs and contributes new evaluation protocols for analyzing and enhancing the perception of future MLLMs.

AAAI Conference 2026 Conference Paper

IROTE: Human-like Traits Elicitation of Large Language Model via In-Context Self-Reflective Optimization

  • Yuzhuo Bai
  • Shitong Duan
  • Muhua Huang
  • Jing Yao
  • Zhenghao Liu
  • Peng Zhang
  • Tun Lu
  • Xiaoyuan Yi

Trained on various human-authored corpora, Large Language Models (LLMs) have demonstrated a certain capability of reflecting specific human-like traits (e.g., personality or values) via prompting, benefiting applications like personalized LLMs and social simulations. However, existing methods suffer from the superficial elicitation problem: LLMs can only be steered to mimic shallow and unstable stylistic patterns, failing to embody the desired traits precisely and consistently across diverse tasks as humans do. To address this challenge, we propose IROTE, a novel in-context method for stable and transferable trait elicitation. Drawing on psychological theories suggesting that traits are formed through identity-related reflection, our method automatically generates and optimizes a textual self-reflection within prompts, comprising self-perceived experiences, to stimulate LLMs' trait-driven behavior. The optimization is performed by iteratively maximizing an information-theoretic objective that enhances the connections between LLMs' behavior and the target trait while reducing noisy redundancy in the reflection, all without any fine-tuning, yielding evocative and compact trait reflections. Extensive experiments across three human trait systems show that a single IROTE-generated self-reflection can induce LLMs' stable impersonation of the target trait across diverse downstream tasks beyond simple questionnaire answering, consistently outperforming existing strong baselines.

AAAI Conference 2026 Conference Paper

LLaVA-UHD v2: Exploiting Hierarchical Vision Granularity in MLLMs via Inverse Semantic Pyramid

  • Yipeng Zhang
  • Yifan Liu
  • Zonghao Guo
  • Yidan Zhang
  • Xuesong Yang
  • Xiaoying Zhang
  • Chi Chen
  • Jun Song

Vision transformers (ViTs) are widely employed in multimodal large language models (MLLMs) for visual encoding. However, they exhibit inferior performance on tasks requiring fine-grained visual perception. We attribute this to the inherent limitations of ViTs in capturing diverse levels of visual semantics. To address this, we present the Hierarchical window (Hiwin) transformer, a plug-and-play solution for MLLMs centered around our inverse semantic pyramid (ISP). The Hiwin transformer comprises two key modules: (i) a visual detail injection module, which progressively injects low-level visual details into high-level language-aligned semantic features, thereby constructing an ISP, and (ii) a hierarchical window attention module, which leverages cross-scale windows to condense multi-level semantics from the ISP. Notably, our design achieves an average boost of 3.7% across 14 benchmarks compared with the baseline method, including 9.3% on DocVQA.

AAAI Conference 2026 Conference Paper

Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning

  • Shuzheng Si
  • Haozhe Zhao
  • Cheng Gao
  • Yuzhuo Bai
  • Zhitong Wang
  • Bofei Gao
  • Kangyang Luo
  • Wenhao Li

Teaching large language models (LLMs) to be faithful to the provided context is crucial for building reliable information-seeking systems. We therefore propose a systematic framework, CANOE, to reduce the faithfulness hallucinations of LLMs across different downstream tasks without human annotations. Specifically, we first synthesize short-form question-answering (QA) data spanning four diverse tasks to construct high-quality and easily verifiable training data. We also propose Dual-GRPO, a rule-based reinforcement learning method with three tailored rule-based rewards derived from the synthesized short-form QA data, which simultaneously optimizes both short-form and long-form response generation. Notably, Dual-GRPO eliminates the need to manually label preference data for training reward models and avoids over-optimizing short-form generation when relying only on the synthesized short-form QA data. Experimental results show that CANOE greatly improves the faithfulness of LLMs across 11 different tasks, even outperforming the most advanced LLMs, e.g., GPT-4o and OpenAI o1.

NeurIPS Conference 2025 Conference Paper

A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings

  • Xiaoang Xu
  • Shuo Wang
  • Xu Han
  • Zhenghao Liu
  • Huijia Wu
  • Peipei Li
  • Zhiyuan Liu
  • Maosong Sun

Large Reasoning Models (LRMs) achieve superior performance by extending the length of their thought. However, a lengthy thinking trajectory reduces efficiency. Most existing methods assume overthinking and attempt to reason efficiently by compressing the Chain-of-Thought, but this often leads to performance degradation. To address this problem, we introduce A*-Thought, an efficient tree-search-based unified framework designed to identify and isolate the most essential thoughts from the extensive reasoning chains produced by these models. It formulates the reasoning process of LRMs as a search tree, where each node represents a reasoning span in the giant reasoning space. By combining the A* search algorithm with a cost function specific to the reasoning path, it can efficiently compress the chain of thought and determine a reasoning path with high information density and low cost. In addition, we also propose a bidirectional importance estimation mechanism, which further refines this search and enhances its efficiency beyond uniform sampling. Extensive experiments on several advanced math tasks show that A*-Thought effectively balances performance and efficiency over a huge search space. Specifically, A*-Thought can improve the performance of QwQ-32B by 2.39$\times$ in the low-budget setting and reduce output length by nearly 50\% in the high-budget setting. The proposed method is also compatible with several other LRMs, demonstrating its generalization capability. The code can be accessed at: https://github.com/AI9Stars/AStar-Thought.
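The abstract does not spell out the search loop, but the A* component itself is standard. Below is a generic A* skeleton over hashable states; the `successors` and `heuristic` callables, where the paper's reasoning-span scoring and bidirectional importance estimation would plug in, are left as assumptions.

```python
import heapq
import itertools

def a_star(start, is_goal, successors, heuristic):
    """Generic A*: `successors(state)` yields (next_state, step_cost) pairs,
    `heuristic(state)` lower-bounds the remaining cost."""
    counter = itertools.count()  # tie-breaker so states are never compared directly
    frontier = [(heuristic(start), 0.0, next(counter), start, [start])]
    best_g = {}
    while frontier:
        _, g, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path, g
        if best_g.get(state, float("inf")) <= g:
            continue  # already expanded via a cheaper route
        best_g[state] = g
        for nxt, step_cost in successors(state):
            g2 = g + step_cost
            heapq.heappush(frontier,
                           (g2 + heuristic(nxt), g2, next(counter), nxt, path + [nxt]))
    return None, float("inf")

# Toy usage: shortest path on a line graph 0 -> 5.
succ = lambda s: [(s + 1, 1.0)] if s < 5 else []
print(a_star(0, lambda s: s == 5, succ, heuristic=lambda s: 5 - s))
```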

NeurIPS Conference 2025 Conference Paper

DCAD-2000: A Multilingual Dataset across 2000+ Languages with Data Cleaning as Anomaly Detection

  • Yingli Shen
  • Wen Lai
  • Shuo Wang
  • Xueren Zhang
  • Kangyang Luo
  • Alexander Fraser
  • Maosong Sun

The rapid development of multilingual large language models (LLMs) highlights the need for high-quality, diverse, and well-curated multilingual datasets. In this paper, we introduce DCAD-2000 (Data Cleaning as Anomaly Detection), a large-scale multilingual corpus constructed from newly extracted Common Crawl data and existing multilingual sources. DCAD-2000 covers 2,282 languages, 46.72 TB of text, and 8.63 billion documents, spanning 155 high- and medium-resource languages and 159 writing scripts. To overcome the limitations of existing data cleaning approaches, which rely on manually designed heuristic thresholds, we reframe data cleaning as an anomaly detection problem. This dynamic filtering paradigm substantially improves data quality by automatically identifying and removing noisy or anomalous content. By fine-tuning LLMs on DCAD-2000, we demonstrate notable improvements in data quality, robustness of the cleaning pipeline, and downstream performance, particularly for low-resource languages across multiple multilingual benchmarks.
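As a concrete illustration of "data cleaning as anomaly detection", the sketch below runs an off-the-shelf detector over per-document feature vectors and keeps only the inliers. The feature set, the choice of IsolationForest, and the contamination rate are all assumptions, not the paper's pipeline.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical per-document quality features, e.g. character-repetition ratio,
# mean line length, fraction of alphabetic characters, language-ID confidence.
rng = np.random.default_rng(0)
features = rng.normal(size=(10_000, 4))      # stand-in for real features

detector = IsolationForest(contamination=0.05, random_state=0)
labels = detector.fit_predict(features)      # -1 = anomalous, 1 = normal

clean = features[labels == 1]                # drop documents flagged as anomalous
print(f"kept {len(clean)} / {len(features)} documents")
```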

NeurIPS Conference 2025 Conference Paper

Multi-Agent Collaboration via Evolving Orchestration

  • Yufan Dang
  • Chen Qian
  • Xueheng Luo
  • Jingru Fan
  • Zihao Xie
  • Ruijie Shi
  • Weize Chen
  • Cheng Yang

Large language models (LLMs) have achieved remarkable results across diverse downstream tasks, but their monolithic nature restricts scalability and efficiency in complex problem-solving. While recent research explores multi-agent collaboration among LLMs, most approaches rely on static organizational structures that struggle to adapt as task complexity and agent numbers grow, resulting in coordination overhead and inefficiencies. To this end, we propose a puppeteer-style paradigm for LLM-based multi-agent collaboration, where a centralized orchestrator ("puppeteer") dynamically directs agents ("puppets") in response to evolving task states. This orchestrator is trained via reinforcement learning to adaptively sequence and prioritize agents, enabling flexible and evolvable collective reasoning. Experiments on closed- and open-domain scenarios show that this method achieves superior performance with reduced computational costs. Analyses further reveal that the key improvements consistently stem from the emergence of more compact, cyclic reasoning structures under the orchestrator’s evolution. Our code is available at https://github.com/OpenBMB/ChatDev/tree/puppeteer.

IJCAI Conference 2025 Conference Paper

NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms

  • Yashan Wang
  • Shangda Wu
  • Jianhuai Hu
  • Xingjian Du
  • Yueqi Peng
  • Yongxin Huang
  • Shuai Fan
  • Xiaobing Li

We introduce NotaGen, a symbolic music generation model that explores the potential of producing high-quality classical sheet music. Inspired by the success of Large Language Models (LLMs), NotaGen adopts pre-training, fine-tuning, and reinforcement learning paradigms (henceforth referred to as the LLM training paradigms). It is pre-trained on 1.6M pieces of music in ABC notation, and then fine-tuned on approximately 9K high-quality classical compositions conditioned on "period-composer-instrumentation" prompts. For reinforcement learning, we propose the CLaMP-DPO method, which further enhances generation quality and controllability without requiring human annotations or predefined rewards. Our experiments demonstrate the efficacy of CLaMP-DPO in symbolic music generation models with different architectures and encoding schemes. Furthermore, in subjective A/B tests against human compositions, NotaGen outperforms baseline models, greatly advancing musical aesthetics in symbolic music generation.

NeurIPS Conference 2025 Conference Paper

ParamMute: Suppressing Knowledge-Critical FFNs for Faithful Retrieval-Augmented Generation

  • Pengcheng Huang
  • Zhenghao Liu
  • Yukun Yan
  • Haiyan Zhao
  • Xiaoyuan Yi
  • Hao Chen
  • Zhiyuan Liu
  • Maosong Sun

Large language models (LLMs) integrated with retrieval-augmented generation (RAG) have improved factuality by grounding outputs in external evidence. However, they remain susceptible to unfaithful generation, where outputs contradict retrieved context despite its relevance and accuracy. Existing approaches aiming to improve faithfulness primarily focus on enhancing the utilization of external context, but often overlook the persistent influence of internal parametric knowledge during generation. In this work, we investigate the internal mechanisms behind unfaithful generation and identify a subset of mid-to-deep feed-forward networks (FFNs) that are disproportionately activated in such cases. Building on this insight, we propose Parametric Knowledge Muting through FFN Suppression (ParamMute), a framework that improves contextual faithfulness by suppressing the activation of unfaithfulness-associated FFNs and calibrating the model toward retrieved knowledge. To evaluate our approach, we introduce CoFaithfulQA, a benchmark specifically designed to evaluate faithfulness in scenarios where internal knowledge conflicts with accurate external evidence. Experimental results show that ParamMute significantly enhances faithfulness across both CoFaithfulQA and the established ConFiQA benchmark, achieving substantial reductions in reliance on parametric memory. These findings underscore the importance of mitigating internal knowledge dominance and provide a new direction for improving LLM trustworthiness in RAG. All codes are available at https://github.com/OpenBMB/ParamMute.
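Mechanically, suppressing an FFN's contribution can be as simple as down-scaling its output with a forward hook. The sketch below shows that mechanism on a toy block; the scaling factor, module names, and the choice of which layers to mute are assumptions (ParamMute identifies unfaithfulness-associated FFNs empirically and also calibrates the model toward retrieved knowledge, which is not shown here).

```python
import torch
from torch import nn

def suppress_ffn(module: nn.Module, alpha: float = 0.3):
    """Register a hook that scales the module's output by alpha < 1,
    muting its contribution to the residual stream."""
    def hook(mod, inputs, output):
        return alpha * output
    return module.register_forward_hook(hook)

# Toy transformer-like block; only the FFN ("mlp") branch is suppressed.
block = nn.ModuleDict({
    "attn": nn.Linear(16, 16),
    "mlp": nn.Sequential(nn.Linear(16, 64), nn.GELU(), nn.Linear(64, 16)),
})
handle = suppress_ffn(block["mlp"], alpha=0.3)

x = torch.randn(2, 16)
out = x + block["attn"](x) + block["mlp"](x)  # FFN branch is now down-scaled
handle.remove()                               # restores normal behavior
```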

NeurIPS Conference 2025 Conference Paper

The Overthinker's DIET: Cutting Token Calories with DIfficulty-AwarE Training

  • Weize Chen
  • Jiarui yuan
  • Jin Tailin
  • Ning Ding
  • Huimin Chen
  • Zhiyuan Liu
  • Maosong Sun

Recent large language models (LLMs) exhibit impressive reasoning but often \textit{overthink}, generating excessively long responses that hinder efficiency. We introduce DIET (DIfficulty-AwarE Training), a framework that systematically cuts these "token calories" by integrating on-the-fly problem difficulty into the reinforcement learning (RL) process. DIET dynamically adapts token compression strategies by modulating token penalty strength and conditioning target lengths on estimated task difficulty, optimizing the performance-efficiency trade-off. We also theoretically analyze the pitfalls of naive reward weighting in group-normalized RL algorithms like GRPO, and propose an \textit{Advantage Weighting} technique that enables stable and effective implementation of these difficulty-aware objectives. Experimental results demonstrate that DIET significantly reduces token counts while simultaneously improving reasoning performance. Beyond raw token reduction, we show two crucial benefits largely overlooked by prior work: (1) DIET leads to superior \textbf{inference scaling}. By maintaining high per-sample quality with fewer tokens, it enables better scaling performance via majority voting under fixed computational budgets, an area where other methods falter. (2) DIET enhances the natural positive correlation between response length and problem difficulty, ensuring verbosity is appropriately allocated, unlike many existing compression methods that disrupt this relationship. Our analyses provide a principled and effective framework for developing more efficient, practical, and high-performing LLMs.
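The pitfall with naive reward weighting is easy to see numerically: in a group-normalized method like GRPO, a per-sample weight applied to the rewards is largely washed out by the normalization, whereas weighting the normalized advantages survives it. The sketch below illustrates that contrast; the weights and the placement of the weighting are our reading of the idea, not the paper's exact objective.

```python
import numpy as np

rewards = np.array([1.0, 0.5, 0.8, 0.2])  # one group of sampled responses
weights = np.array([1.0, 2.0, 1.0, 2.0])  # e.g. difficulty-dependent weights

# Naive: weight rewards, then group-normalize. Any uniform weight w cancels:
# (w*r - mean(w*r)) / std(w*r) == (r - mean(r)) / std(r).
wr = weights * rewards
naive = (wr - wr.mean()) / (wr.std() + 1e-8)

# Advantage weighting: normalize first, then weight the advantages,
# so the weights actually shape the policy update.
adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
weighted_adv = weights * adv

print(naive)
print(weighted_adv)
```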

NeurIPS Conference 2024 Conference Paper

Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models

  • Xin Li
  • Weize Chen
  • Qizhi Chu
  • Haopeng Li
  • Zhaojun Sun
  • Ran Li
  • Chen Qian
  • Yiwei Wei

The need to analyze graphs is ubiquitous across various fields, from social networks to biological research and recommendation systems. Therefore, enabling large language models (LLMs) to process graphs is an important step toward more advanced general intelligence. However, current LLM benchmarks on graph analysis require models to reason directly over prompts describing graph topology, and are thus limited to small graphs with only a few dozen nodes. In contrast, human experts typically write programs based on popular libraries for task solving, and can thus handle graphs at different scales. To this end, a question naturally arises: can LLMs analyze graphs like professionals? In this paper, we introduce ProGraph, a manually crafted benchmark containing 3 categories of graph tasks. The benchmark expects solutions based on programming instead of direct reasoning over raw inputs. Our findings reveal that the performance of current LLMs is unsatisfactory, with the best model achieving only 36% accuracy. To bridge this gap, we propose the LLM4Graph datasets, which include crawled documents and auto-generated code based on 6 widely used graph libraries. By augmenting closed-source LLMs with document retrieval and fine-tuning open-source ones on the code, we show 11-32% absolute improvements in their accuracy. Our results underscore that the capabilities of LLMs in handling structured data are still under-explored, and show the effectiveness of LLM4Graph in enhancing LLMs' proficiency in graph analysis. The benchmark, datasets, and enhanced open-source models are available at https://github.com/BUPT-GAMMA/ProGraph.

NeurIPS Conference 2024 Conference Paper

Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models

  • Bowen Ping
  • Shuo Wang
  • Hanqing Wang
  • Xu Han
  • Yuzhuang Xu
  • Yukun Yan
  • Yun Chen
  • Baobao Chang

Fine-tuning is a crucial process for adapting large language models (LLMs) to diverse applications. In certain scenarios, such as multi-tenant serving, deploying multiple LLMs becomes necessary to meet complex demands. Recent studies suggest decomposing a fine-tuned LLM into a base model and corresponding delta weights, which are then compressed using low-rank or low-bit approaches to reduce costs. In this work, we observe that existing low-rank and low-bit compression methods can significantly harm the model performance for task-specific fine-tuned LLMs (e.g., WizardMath for math problems). Motivated by the long-tail distribution of singular values in the delta weights, we propose a delta quantization approach using mixed-precision. This method employs higher-bit representation for singular vectors corresponding to larger singular values. We evaluate our approach on various fine-tuned LLMs, including math LLMs, code LLMs, chat LLMs, and even VLMs. Experimental results demonstrate that our approach performs comparably to full fine-tuned LLMs, surpassing both low-rank and low-bit baselines by a considerable margin. Additionally, we show that our method is compatible with various backbone LLMs, such as Llama-2, Llama-3, and Mistral, highlighting its generalizability.
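A minimal sketch of mixed-precision delta compression, assuming SVD plus a uniform quantizer: singular directions with the largest singular values keep more bits, the tail keeps fewer. Bit widths, the split point, and the quantizer are illustrative assumptions rather than the paper's exact scheme.

```python
import numpy as np

def quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Uniform symmetric quantization to 2**bits levels (illustrative only)."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

def mixed_precision_delta(delta, k_hi=8, hi_bits=8, lo_bits=2):
    """Quantize singular vectors of the delta weights: high precision for the
    top-k_hi singular directions, low precision for the long tail."""
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    rebuild = lambda u, s, vt, b: (quantize(u, b) * s) @ quantize(vt, b)
    return (rebuild(U[:, :k_hi], S[:k_hi], Vt[:k_hi], hi_bits)
            + rebuild(U[:, k_hi:], S[k_hi:], Vt[k_hi:], lo_bits))

delta = np.random.default_rng(0).normal(size=(64, 64))
approx = mixed_precision_delta(delta)
print(np.linalg.norm(delta - approx) / np.linalg.norm(delta))  # relative error
```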

TMLR Journal 2024 Journal Article

Exploring Format Consistency for Instruction Tuning

  • Shihao Liang
  • Runchu Tian
  • Kunlun Zhu
  • Yujia Qin
  • Huadong Wang
  • Xin Cong
  • Zhiyuan Liu
  • Xiaojiang Liu

Instruction tuning has emerged as a promising approach to enhancing large language models in following human instructions. It has been shown that increasing the diversity and number of instructions in the training data consistently enhances generalization performance, which has motivated a recent endeavor to collect various instructions and integrate existing instruction tuning datasets into larger collections. However, different users have their unique ways of expressing instructions, and there often exist variations across different datasets in instruction styles and formats, i.e., format inconsistency. In this work, we propose a framework named Unified Instruction Tuning (UIT), which calls OpenAI APIs for automatic format transfer among different instruction tuning datasets such as PromptSource, FLAN, and CrossFit. With the framework, we (1) demonstrate the necessity of maintaining format consistency in instruction tuning; (2) improve generalization performance on unseen instructions with T5-LM-xl; and (3) provide a novel perplexity-based denoising method that reduces the noise of automatic format transfer to make the UIT framework more practical, together with a smaller offline model based on GPT-J that achieves format-transfer capability comparable to the OpenAI APIs to reduce costs in practice. Further analysis regarding variations of targeted formats and other effects is also presented. The code and trained models will soon be available.

NeurIPS Conference 2024 Conference Paper

InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory

  • Chaojun Xiao
  • Pengle Zhang
  • Xu Han
  • Guangxuan Xiao
  • Yankai Lin
  • Zhengyan Zhang
  • Zhiyuan Liu
  • Maosong Sun

Large language models (LLMs) have emerged as a cornerstone in real-world applications with lengthy streaming inputs (e.g., LLM-driven agents). However, existing LLMs, pre-trained on sequences with a restricted maximum length, cannot process longer sequences due to out-of-domain and distraction issues. Common solutions often involve continual pre-training on longer sequences, which introduces expensive computational overhead and uncontrollable changes in model capabilities. In this paper, we unveil the intrinsic capacity of LLMs to understand extremely long sequences without any fine-tuning. To this end, we introduce a training-free memory-based method, InfLLM. Specifically, InfLLM stores distant contexts in additional memory units and employs an efficient mechanism to look up token-relevant units for attention computation. Thereby, InfLLM allows LLMs to efficiently process long sequences with a limited context window and to capture long-distance dependencies well. Without any training, InfLLM enables LLMs pre-trained on sequences of a few thousand tokens to achieve performance comparable to competitive baselines that continually train LLMs on long sequences. Even when the sequence length is scaled to 1,024K, InfLLM still effectively captures long-distance dependencies. Our code can be found at https://github.com/thunlp/InfLLM.
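The lookup step has a simple shape: score each stored unit against the current query and load only the top-scoring units into the attention computation. In the sketch below, each unit is summarized by its mean key; InfLLM instead selects representative tokens per unit, so treat the scoring rule as an assumption.

```python
import numpy as np

def lookup_memory(query, memory_units, top_k=2):
    """Score each memory unit by the dot product between the query and the
    unit's representative key (here: its mean key); return top-k unit ids."""
    scores = np.array([query @ unit.mean(axis=0) for unit in memory_units])
    return sorted(np.argsort(scores)[-top_k:].tolist())

rng = np.random.default_rng(0)
memory = [rng.normal(size=(128, 64)) for _ in range(10)]  # distant-context units
query = rng.normal(size=64)

active = lookup_memory(query, memory, top_k=2)
print("units loaded for attention:", active)  # attention then runs over the
                                              # local window plus these units
```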

IJCAI Conference 2024 Conference Paper

On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models

  • Xinpeng Wang
  • Shitong Duan
  • Xiaoyuan Yi
  • Jing Yao
  • Shanlin Zhou
  • Zhihua Wei
  • Peng Zhang
  • Dongkuan Xu

Big models have achieved revolutionary breakthroughs in the field of AI, but they also pose potential ethical and societal risks to humans. To address such problems, alignment technologies were introduced to make these models conform to human preferences and values. Despite considerable advancements in the past year, various challenges remain in establishing the optimal alignment strategy, such as data cost and scalable oversight, and how to align remains an open question. In this survey paper, we comprehensively investigate value alignment approaches. We first unpack the historical context of alignment, tracing back to the 1920s (where it comes from), then delve into the mathematical essence of alignment (what it is), shedding light on the inherent challenges. Following this foundation, we provide a detailed examination of existing alignment methods, which fall into three categories: RL-based Alignment, SFT-based Alignment, and Inference-Time Alignment, and demonstrate their intrinsic connections, strengths, and limitations, helping readers better understand this research area. In addition, two emerging topics, alignment goals and multimodal alignment, are also discussed as novel frontiers in the field. Looking forward, we discuss potential alignment paradigms and how they could handle the remaining challenges, prospecting where future alignment will go.

NeurIPS Conference 2023 Conference Paper

C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models

  • Yuzhen Huang
  • Yuzhuo Bai
  • Zhihao Zhu
  • Junlei Zhang
  • Jinghan Zhang
  • Tangjun Su
  • Junteng Liu
  • Chuancheng Lv

New NLP benchmarks are urgently needed to align with the rapid development of large language models (LLMs). We present C-Eval, the first comprehensive Chinese evaluation suite designed to assess advanced knowledge and reasoning abilities of foundation models in a Chinese context. C-Eval comprises multiple-choice questions across four difficulty levels: middle school, high school, college, and professional. The questions span 52 diverse disciplines, ranging from humanities to science and engineering. C-Eval is accompanied by C-Eval Hard, a subset of very challenging subjects in C-Eval that requires advanced reasoning abilities to solve. We conduct a comprehensive evaluation of the most advanced LLMs on C-Eval, including both English- and Chinese-oriented models. Results indicate that only GPT-4 could achieve an average accuracy of over 60%, suggesting that there is still significant room for improvement for current LLMs. We anticipate C-Eval will help analyze important strengths and shortcomings of foundation models, and foster their development and growth for Chinese users.

NeurIPS Conference 2023 Conference Paper

H3T: Efficient Integration of Memory Optimization and Parallelism for Large-scale Transformer Training

  • Yuzhong Wang
  • Xu Han
  • Weilin Zhao
  • Guoyang Zeng
  • Zhiyuan Liu
  • Maosong Sun

In recent years, big models based on Transformers have achieved state-of-the-art performance on many artificial intelligence (AI) tasks. Despite the success of these Transformer-based models, their huge parameter size poses a serious challenge to their training, from both the storage and computation perspectives. To this end, memory optimization (e.g., rematerialization and offloading) and parallelism (e.g., data parallelism and model parallelism) are widely explored to make training Transformers more efficient. In this paper, we propose a framework that automatically finds an efficient integration of memory optimization and parallelism for High-Throughput Transformer Training (named H3T), which is rarely considered by existing efforts for training big Transformer-based models. Specifically, we design search algorithms to combine appropriate memory optimization strategies and parallelism schemes to achieve a balance between memory overhead and training efficiency. We implement H3T based on the open-source toolkit BMTrain and then use H3T to train Transformers of different sizes to evaluate its efficiency. The experimental results show that H3T outperforms the most popular deep learning (DL) toolkit Megatron-DeepSpeed by $1.2\times \sim 4.3\times$ in training speed while reducing memory overhead by $34.6\% \sim 80.5\%$. Moreover, H3T can use only 64 NVIDIA A100 GPUs to train GPT-3-175B, which is very difficult for existing DL toolkits. The source code is available at https://github.com/OpenBMB/BMTrain/tree/h3t.
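At its core, the search couples a cost model with a feasibility constraint: among strategy combinations that fit in GPU memory, pick the fastest. The brute-force sketch below conveys only that structure; the cost numbers and strategy menu are invented, and H3T's actual search algorithms are more sophisticated than enumeration.

```python
from itertools import product

# Hypothetical cost model: memory optimization -> (memory GB, time multiplier),
# parallelism scheme -> (memory scale, time multiplier). Numbers are made up.
MEMORY_OPT = {"none": (40, 1.0), "remat": (22, 1.3), "offload": (12, 1.8)}
PARALLELISM = {"data": (1.0, 1.0), "tensor": (0.55, 1.15), "pipeline": (0.6, 1.1)}

def best_config(gpu_memory_gb):
    """Return the fastest (memory_opt, parallelism) combo that fits in memory."""
    best = None
    for (mo, (mem, t1)), (pp, (scale, t2)) in product(MEMORY_OPT.items(),
                                                      PARALLELISM.items()):
        memory, step_time = mem * scale, t1 * t2
        if memory <= gpu_memory_gb and (best is None or step_time < best[0]):
            best = (step_time, mo, pp, memory)
    return best

print(best_config(gpu_memory_gb=24))  # -> (1.1, 'none', 'pipeline', 24.0)
```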

NeurIPS Conference 2023 Conference Paper

Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evaluations

  • Lifan Yuan
  • Yangyi Chen
  • Ganqu Cui
  • Hongcheng Gao
  • FangYuan Zou
  • Xingyi Cheng
  • Heng Ji
  • Zhiyuan Liu

This paper reexamines the research on out-of-distribution (OOD) robustness in the field of NLP. We find that the distribution shift settings in previous studies commonly lack adequate challenges, hindering the accurate evaluation of OOD robustness. To address these issues, we propose a benchmark construction protocol that ensures clear differentiation and challenging distribution shifts. Then we introduce BOSS, a Benchmark suite for Out-of-distribution robustneSS evaluation covering 5 tasks and 20 datasets. Based on BOSS, we conduct a series of experiments on pretrained language models for analysis and evaluation of OOD robustness. First, for vanilla fine-tuning, we examine the relationship between in-distribution (ID) and OOD performance. We identify three typical types that unveil the inner learning mechanism, which could potentially facilitate the forecasting of OOD robustness, correlating with the advancements on ID datasets. Then, we evaluate 5 classic methods on BOSS and find that, despite exhibiting some effectiveness in specific cases, they do not offer significant improvement compared to vanilla fine-tuning. Further, we evaluate 5 LLMs with various adaptation paradigms and find that when sufficient ID data is available, fine-tuned domain-specific models significantly outperform LLMs on ID examples. However, for OOD instances, prioritizing LLMs with in-context learning yields better results. We identify that both fine-tuned small models and LLMs face challenges in effectively addressing downstream tasks. The code is public at https://github.com/lifan-yuan/OOD_NLP.

AAAI Conference 2023 Conference Paper

Visually Grounded Commonsense Knowledge Acquisition

  • Yuan Yao
  • Tianyu Yu
  • Ao Zhang
  • Mengdi Li
  • Ruobing Xie
  • Cornelius Weber
  • Zhiyuan Liu
  • Hai-Tao Zheng

Large-scale commonsense knowledge bases empower a broad range of AI applications, where the automatic extraction of commonsense knowledge (CKE) is a fundamental and challenging problem. CKE from text is known for suffering from the inherent sparsity and reporting bias of commonsense in text. Visual perception, on the other hand, contains rich commonsense knowledge about real-world entities, e.g., (person, can_hold, bottle), which can serve as promising sources for acquiring grounded commonsense knowledge. In this work, we present CLEVER, which formulates CKE as a distantly supervised multi-instance learning problem, where models learn to summarize commonsense relations from a bag of images about an entity pair without any human annotation on image instances. To address the problem, CLEVER leverages vision-language pre-training models for deep understanding of each image in the bag, and selects informative instances from the bag to summarize commonsense entity relations via a novel contrastive attention mechanism. Comprehensive experimental results in held-out and human evaluation show that CLEVER can extract commonsense knowledge in promising quality, outperforming pre-trained language model-based methods by 3.9 AUC and 6.4 mAUC points. The predicted commonsense scores show strong correlation with human judgment with a 0.78 Spearman coefficient. Moreover, the extracted commonsense can also be grounded into images with reasonable interpretability. The data and codes can be obtained at https://github.com/thunlp/CLEVER.

NeurIPS Conference 2022 Conference Paper

A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks

  • Ganqu Cui
  • Lifan Yuan
  • Bingxiang He
  • Yangyi Chen
  • Zhiyuan Liu
  • Maosong Sun

Textual backdoor attacks are a kind of practical threat to NLP systems. By injecting a backdoor in the training phase, the adversary can control model predictions via predefined triggers. As various attack and defense models have been proposed, it is of great significance to perform rigorous evaluations. However, we highlight two issues in previous backdoor learning evaluations: (1) the differences between real-world scenarios (e.g., releasing poisoned datasets or models) are neglected, and we argue that each scenario has its own constraints and concerns, and thus requires specific evaluation protocols; (2) the evaluation metrics only consider whether the attacks can flip the models' predictions on poisoned samples and retain performance on benign samples, but ignore that poisoned samples should also be stealthy and semantic-preserving. To address these issues, we categorize existing works into three practical scenarios in which attackers release datasets, pre-trained models, and fine-tuned models respectively, then discuss their unique evaluation methodologies. On metrics, to evaluate poisoned samples completely, we use grammar error increase and perplexity difference for stealthiness, along with text similarity for validity. After formalizing the frameworks, we develop an open-source toolkit, OpenBackdoor, to foster the implementation and evaluation of textual backdoor learning. With this toolkit, we perform extensive experiments to benchmark attack and defense models under the suggested paradigm. To facilitate the underexplored defenses against poisoned datasets, we further propose CUBE, a simple yet strong clustering-based defense baseline. We hope that our frameworks and benchmarks can serve as cornerstones for future model development and evaluations.
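Of the proposed metrics, the perplexity difference is the easiest to make concrete: score the poisoned sample and its clean counterpart under a language model and compare. The sketch below uses a toy stand-in for the LM scoring function, which in practice would be a real pretrained model; the trigger token "cf" is only an example.

```python
import math

def perplexity(log_prob_fn, text):
    """Perplexity given a function returning the total log-probability of text."""
    n_tokens = max(len(text.split()), 1)
    return math.exp(-log_prob_fn(text) / n_tokens)

def stealthiness_gap(log_prob_fn, clean, poisoned):
    """Perplexity difference between a poisoned sample and its clean version;
    a large gap means the trigger is easy to spot, i.e. the attack is not stealthy."""
    return perplexity(log_prob_fn, poisoned) - perplexity(log_prob_fn, clean)

# Toy stand-in LM: every ordinary token costs 5 nats, the rare trigger costs 9.
toy_log_prob = lambda text: -sum(9.0 if t == "cf" else 5.0 for t in text.split())

print(stealthiness_gap(toy_log_prob, "the movie was great", "the movie cf was great"))
```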

NeurIPS Conference 2022 Conference Paper

Moderate-fitting as a Natural Backdoor Defender for Pre-trained Language Models

  • Biru Zhu
  • Yujia Qin
  • Ganqu Cui
  • Yangyi Chen
  • Weilin Zhao
  • Chong Fu
  • Yangdong Deng
  • Zhiyuan Liu

Despite the great success of pre-trained language models (PLMs) on a large set of natural language processing (NLP) tasks, there has been a growing concern about their security in real-world applications. A backdoor attack, which poisons a small number of training samples by inserting backdoor triggers, is a typical threat to security. Trained on the poisoned dataset, a victim model performs normally on benign samples but predicts the attacker-chosen label on samples containing pre-defined triggers. The vulnerability of PLMs under backdoor attacks has been demonstrated with increasing evidence in the literature. In this paper, we present several simple yet effective training strategies that can defend against such attacks. To the best of our knowledge, this is the first work to explore the possibility of backdoor-free adaptation for PLMs. Our motivation is based on the observation that, when trained on the poisoned dataset, the PLM's adaptation follows a strict order of two stages: (1) a moderate-fitting stage, where the model mainly learns the major features corresponding to the original task instead of subsidiary features of backdoor triggers, and (2) an overfitting stage, where both features are learned adequately. Therefore, if we can properly restrict the PLM's adaptation to the moderate-fitting stage, the model will neglect the backdoor triggers while still achieving satisfying performance on the original task. To this end, we design three methods to defend against backdoor attacks by reducing the model capacity, training epochs, and learning rate, respectively. Experimental results demonstrate the effectiveness of our methods in defending against several representative NLP backdoor attacks. We also perform visualization-based analysis to attain a deeper understanding of how the model learns different features, and explore the effect of the poisoning ratio. Finally, we explore whether our methods can defend against backdoor attacks for pre-trained CV models. The codes are publicly available at https://github.com/thunlp/Moderate-fitting.

NeurIPS Conference 2022 Conference Paper

Sparse Structure Search for Delta Tuning

  • Shengding Hu
  • Zhen Zhang
  • Ning Ding
  • Yadao Wang
  • Yasheng Wang
  • Zhiyuan Liu
  • Maosong Sun

Adapting large pre-trained models (PTMs) through fine-tuning imposes prohibitive computational and storage burdens. Recent studies of delta tuning (DT), i.e., parameter-efficient tuning, find that optimizing only a small portion of parameters conditioned on PTMs can yield on-par performance compared to conventional fine-tuning. Generally, DT methods exquisitely design delta modules (DT modules) which can be applied to arbitrary fine-grained positions inside PTMs. However, the effectiveness of these fine-grained positions largely relies on sophisticated manual designation, thereby usually producing sub-optimal results. In contrast to manual designation, we explore constructing DT modules in an automatic manner. We automatically \textbf{S}earch for the \textbf{S}parse \textbf{S}tructure of \textbf{Delta} Tuning (S$^3$Delta). Based on a unified framework of various DT methods, S$^3$Delta conducts a differentiable DT structure search through bi-level optimization and proposes a shifted global sigmoid method to explicitly control the number of trainable parameters. Extensive experiments show that S$^3$Delta surpasses manual and random structures with fewer trainable parameters. The searched structures preserve more than 99\% of fine-tuning performance with 0.01\% trainable parameters. Moreover, the advantage of S$^3$Delta is amplified with extremely low trainable-parameter budgets (0.0009\%$\sim$0.01\%). The searched structures are transferable and explainable, providing suggestions and guidance for the future design of DT methods. Our codes are publicly available at \url{https://github.com/thunlp/S3Delta}.
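One plausible reading of the shifted global sigmoid, sketched below under that assumption: each candidate position gets a gate logit, and a single global shift is found (here by bisection) so that the gate probabilities sum to the trainable-parameter budget. The paper's exact formulation may differ.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def shifted_gates(logits, budget, iters=50):
    """Bisect a global shift so that sum(sigmoid(logits - shift)) == budget,
    explicitly controlling the expected number of active positions."""
    lo, hi = logits.min() - 20, logits.max() + 20
    for _ in range(iters):
        shift = (lo + hi) / 2
        if sigmoid(logits - shift).sum() > budget:
            lo = shift  # too many active gates: shift further right
        else:
            hi = shift
    return sigmoid(logits - shift)

logits = np.random.default_rng(0).normal(size=100)  # one logit per candidate position
gates = shifted_gates(logits, budget=5)
print(gates.sum())  # ~5.0: roughly five positions effectively active
```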

AAAI Conference 2021 Conference Paper

Adversarial Language Games for Advanced Natural Language Intelligence

  • Yuan Yao
  • Haoxi Zhong
  • Zhengyan Zhang
  • Xu Han
  • Xiaozhi Wang
  • Kai Zhang
  • Chaojun Xiao
  • Guoyang Zeng

We study the problem of adversarial language games, in which multiple agents with conflicting goals compete with each other via natural language interactions. While adversarial language games are ubiquitous in human activities, little attention has been devoted to this field in natural language processing. In this work, we propose a challenging adversarial language game called Adversarial Taboo as an example, in which an attacker and a defender compete around a target word. The attacker is tasked with inducing the defender to utter the target word, which is invisible to the defender, while the defender is tasked with detecting the target word before being induced by the attacker. In Adversarial Taboo, a successful attacker and defender need to hide or infer the intention, and induce or defend during conversations. This requires several advanced language abilities, such as adversarial pragmatic reasoning and goal-oriented language interactions in open domains, which will facilitate many downstream NLP tasks. To instantiate the game, we create a game environment and a competition platform. Comprehensive experiments on several baseline attack and defense strategies show promising and interesting results, based on which we discuss some directions for future research. The code and datasets of this paper can be obtained from https://github.com/thunlp/AdversarialTaboo.

AAAI Conference 2021 Conference Paper

Aspect-Level Sentiment-Controllable Review Generation with Mutual Learning Framework

  • Huimin Chen
  • Yankai Lin
  • Fanchao Qi
  • Jinyi Hu
  • Peng Li
  • Jie Zhou
  • Maosong Sun

Review generation, which aims to automatically generate review text from given information, has been proposed to assist with the unappealing task of review writing. However, most existing methods only consider the overall sentiment of a review and cannot achieve aspect-level sentiment control. Even though some previous studies attempt to generate aspect-level sentiment-controllable reviews, they usually require large-scale human annotations that are unavailable in the real world. To address this issue, we propose a mutual learning framework that takes advantage of unlabeled data to assist aspect-level sentiment-controllable review generation. The framework consists of a generator and a classifier which utilize a confidence mechanism and a reconstruction reward to enhance each other. Experimental results show our model can achieve aspect-sentiment control accuracy of up to 88% without losing generation quality.

AAAI Conference 2020 Conference Paper

Iteratively Questioning and Answering for Interpretable Legal Judgment Prediction

  • Haoxi Zhong
  • Yuzhong Wang
  • Cunchao Tu
  • Tianyang Zhang
  • Zhiyuan Liu
  • Maosong Sun

Legal Judgment Prediction (LJP) aims to predict judgment results according to the facts of cases. In recent years, LJP has rapidly drawn increasing attention from both academia and the legal industry, as it can provide references for legal practitioners and is expected to promote judicial justice. However, the research to date usually suffers from a lack of interpretability, which may lead to ethical issues like inconsistent judgments or gender bias. In this paper, we present QAjudge, a model based on reinforcement learning that visualizes the prediction process and gives interpretable judgments. QAjudge follows two essential principles in legal systems across the world: Presumption of Innocence and Elemental Trial. During inference, a Question Net selects questions from the given set and an Answer Net answers the question according to the fact description. Finally, a Predict Net produces judgment results based on the answers. Reward functions are designed to minimize the number of questions asked. We conduct extensive experiments on several real-world datasets. Experimental results show that QAjudge can provide interpretable judgments while maintaining performance comparable to other state-of-the-art LJP models. The codes can be found at https://github.com/thunlp/QAjudge.

AAAI Conference 2020 Conference Paper

JEC-QA: A Legal-Domain Question Answering Dataset

  • Haoxi Zhong
  • Chaojun Xiao
  • Cunchao Tu
  • Tianyang Zhang
  • Zhiyuan Liu
  • Maosong Sun

We present JEC-QA, the largest question answering dataset in the legal domain, collected from the National Judicial Examination of China. The examination is a comprehensive evaluation of professional skills for legal practitioners. College students are required to pass the examination to be certified as a lawyer or a judge. The dataset is challenging for existing question answering methods, because both retrieving relevant materials and answering questions require logical reasoning ability. Due to the high demand for multiple reasoning abilities to answer legal questions, state-of-the-art models can only achieve about 28% accuracy on JEC-QA, while skilled humans and unskilled humans can reach 81% and 64% accuracy respectively, indicating a huge gap between humans and machines on this task. We will release JEC-QA and our baselines to help improve the reasoning ability of machine comprehension models. You can access the dataset from http://jecqa.thunlp.org/.

AAAI Conference 2020 Conference Paper

MixPoet: Diverse Poetry Generation via Learning Controllable Mixed Latent Space

  • Xiaoyuan Yi
  • Ruoyu Li
  • Cheng Yang
  • Wenhao Li
  • Maosong Sun

As an essential step towards computer creativity, automatic poetry generation has gained increasing attention in recent years. Though recent neural models make prominent progress on some criteria of poetry quality, generated poems still suffer from poor diversity. Research in the related literature shows that different factors, such as life experience and historical background, would influence the composition styles of poets, which considerably contributes to the high diversity of human-authored poetry. Inspired by this, we propose MixPoet, a novel model that absorbs multiple factors to create various styles and promote diversity. Based on a semi-supervised variational autoencoder, our model disentangles the latent space into some subspaces, with each conditioned on one influence factor by adversarial training. In this way, the model learns a controllable latent variable to capture and mix generalized factor-related properties. Different factor mixtures lead to diverse styles and hence further differentiate generated poems from each other. Experimental results on Chinese poetry demonstrate that MixPoet improves both diversity and quality against three state-of-the-art models.

IJCAI Conference 2020 Conference Paper

Modeling Voting for System Combination in Machine Translation

  • Xuancheng Huang
  • Jiacheng Zhang
  • Zhixing Tan
  • Derek F. Wong
  • Huanbo Luan
  • Jingfang Xu
  • Maosong Sun
  • Yang Liu

System combination is an important technique for combining the hypotheses of different machine translation systems to improve translation performance. Although early statistical approaches to system combination have been proven effective in analyzing the consensus between hypotheses, they suffer from the error propagation problem due to the use of pipelines. While this problem has been alleviated by end-to-end training of multi-source sequence-to-sequence models recently, these neural models do not explicitly analyze the relations between hypotheses and fail to capture their agreement because the attention to a word in a hypothesis is calculated independently, ignoring the fact that the word might occur in multiple hypotheses. In this work, we propose an approach to modeling voting for system combination in machine translation. The basic idea is to enable words in hypotheses from different systems to vote on words that are representative and should get involved in the generation process. This can be done by quantifying the influence of each voter and its preference for each candidate. Our approach combines the advantages of statistical and neural methods since it can not only analyze the relations between hypotheses but also allow for end-to-end training. Experiments show that our approach is capable of better taking advantage of the consensus between hypotheses and achieves significant improvements over state-of-the-art baselines on Chinese-English and English-German machine translation tasks.

AAAI Conference 2020 Conference Paper

Multi-Channel Reverse Dictionary Model

  • Lei Zhang
  • Fanchao Qi
  • Zhiyuan Liu
  • Yasheng Wang
  • Qun Liu
  • Maosong Sun

A reverse dictionary takes the description of a target word as input and outputs the target word together with other words that match the description. Existing reverse dictionary methods cannot deal successfully with highly variable input queries and low-frequency target words. Inspired by the description-to-word inference process of humans, we propose the multi-channel reverse dictionary model, which can mitigate the two problems simultaneously. Our model comprises a sentence encoder and multiple predictors. The predictors are expected to identify different characteristics of the target word from the input query. We evaluate our model on English and Chinese datasets including both dictionary definitions and human-written descriptions. Experimental results show that our model achieves state-of-the-art performance, and even outperforms the most popular commercial reverse dictionary system on the human-written description dataset. We also conduct quantitative analyses and a case study to demonstrate the effectiveness and robustness of our model. All the code and data of this work can be obtained at https://github.com/thunlp/MultiRD.

AAAI Conference 2020 Conference Paper

Neural Snowball for Few-Shot Relation Learning

  • Tianyu Gao
  • Xu Han
  • Ruobing Xie
  • Zhiyuan Liu
  • Fen Lin
  • Leyu Lin
  • Maosong Sun

Knowledge graphs typically undergo open-ended growth of new relations. This cannot be well handled by relation extraction that focuses on pre-defined relations with sufficient training data. To address new relations with few-shot instances, we propose a novel bootstrapping approach, Neural Snowball, to learn new relations by transferring semantic knowledge about existing relations. More specifically, we use Relational Siamese Networks (RSN) to learn the metric of relational similarity between instances based on existing relations and their labeled data. Afterwards, given a new relation and its few-shot instances, we use the RSN to accumulate reliable instances from unlabeled corpora; these instances are used to train a relation classifier, which can further identify new facts of the new relation. The process is conducted iteratively like a snowball. Experiments show that our model can gather high-quality instances for better few-shot relation learning and achieves significant improvement compared to baselines. Codes and datasets are released at https://github.com/thunlp/Neural-Snowball.
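The snowball loop itself is compact: start from the few-shot seeds, repeatedly pull in unlabeled instances whose learned similarity to the current set clears a threshold, and retrain. The sketch below uses a dummy stand-in for the RSN similarity and skips classifier retraining, so read it as the control flow only.

```python
def neural_snowball(seeds, unlabeled, similarity, threshold=0.7, rounds=3):
    """Grow a training set for a new relation: each round, add unlabeled
    instances whose similarity to the selected set exceeds the threshold.
    (In the paper an RSN provides `similarity` and a relation classifier
    is retrained between rounds; both are stubbed out here.)"""
    selected = list(seeds)
    for _ in range(rounds):
        new = [x for x in unlabeled
               if x not in selected and similarity(x, selected) > threshold]
        if not new:
            break
        selected.extend(new)  # the snowball grows
    return selected

# Toy similarity: instances are numbers, "similar" = close to the selected mean.
sim = lambda x, sel: 1.0 / (1.0 + abs(x - sum(sel) / len(sel)))
print(neural_snowball([5.0, 5.2], [4.9, 5.1, 9.0, 5.05], sim))
```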

IJCAI Conference 2020 Conference Paper

Text Style Transfer via Learning Style Instance Supported Latent Space

  • Xiaoyuan Yi
  • Zhenghao Liu
  • Wenhao Li
  • Maosong Sun

Text style transfer pursues altering the style of a sentence while keeping its main content unchanged. Due to the lack of parallel corpora, most recent work focuses on unsupervised methods and has achieved noticeable progress. Nonetheless, the intractability of completely disentangling content from style in text leads to a contradiction between content preservation and style transfer accuracy. To address this problem, we propose a style instance supported method, StyIns. Instead of representing styles with embeddings or latent variables learned from single sentences, our model leverages the generative flow technique to extract underlying stylistic properties from multiple instances of each style, which form a more discriminative and expressive latent style space. By combining such a space with an attention-based structure, our model can better maintain the content and simultaneously achieve high transfer accuracy. Furthermore, the proposed method can be flexibly extended to semi-supervised learning so as to utilize available limited paired data. Experiments on three transfer tasks, sentiment modification, formality rephrasing, and poeticness generation, show that StyIns obtains a better balance between content and style, outperforming several recent baselines.

AAAI Conference 2020 Conference Paper

Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets

  • Fanchao Qi
  • Liang Chang
  • Maosong Sun
  • Sicong Ouyang
  • Zhiyuan Liu

A sememe is defined as the minimum semantic unit of human languages. Sememe knowledge bases (KBs), which contain words annotated with sememes, have been successfully applied to many NLP tasks. However, existing sememe KBs are built for only a few languages, which hinders their widespread utilization. To address this issue, we propose to build a unified sememe KB for multiple languages based on BabelNet, a multilingual encyclopedic dictionary. We first build a dataset serving as the seed of the multilingual sememe KB, with manually annotated sememes for over 15 thousand synsets (the entries of BabelNet). Then, we present a novel task of automatic sememe prediction for synsets, aiming to expand the seed dataset into a usable KB. We also propose two simple and effective models, which exploit different information about synsets. Finally, we conduct quantitative and qualitative analyses to explore important factors and difficulties in the task. All the source code and data of this work can be obtained at https://github.com/thunlp/BabelNet-Sememe-Prediction.

NeurIPS Conference 2020 Conference Paper

Towards Interpretable Natural Language Understanding with Explanations as Latent Variables

  • Wangchunshu Zhou
  • Jinyi Hu
  • Hanlin Zhang
  • Xiaodan Liang
  • Maosong Sun
  • Chenyan Xiong
  • Jian Tang

Generating natural language explanations has recently shown very promising results, not only offering interpretable explanations but also providing additional information and supervision for prediction. However, existing approaches usually require a large set of human-annotated explanations for training, and collecting a large set of explanations is not only time-consuming but also expensive. In this paper, we develop a general framework for interpretable natural language understanding that requires only a small set of human-annotated explanations for training. Our framework treats natural language explanations as latent variables that model the underlying reasoning process of a neural model. We develop a variational EM framework for optimization, in which an explanation generation module and an explanation-augmented prediction module are alternately optimized and mutually enhance each other. Moreover, we further propose an explanation-based self-training method under this framework for semi-supervised learning. It alternates between assigning pseudo-labels to unlabeled data and generating new explanations to iteratively improve each other. Experiments on two natural language understanding tasks demonstrate that our framework can not only make effective predictions in both supervised and semi-supervised settings, but also generate good natural language explanations.

IJCAI Conference 2019 Conference Paper

Enhancing Stock Movement Prediction with Adversarial Training

  • Fuli Feng
  • Huimin Chen
  • Xiangnan He
  • Ji Ding
  • Maosong Sun
  • Tat-Seng Chua

This paper contributes a new machine learning solution for stock movement prediction, which aims to predict whether the price of a stock will be up or down in the near future. The key novelty is that we propose to employ adversarial training to improve the generalization of a neural network prediction model. The rationale for adversarial training here is that the input features to stock prediction are typically based on stock price, which is essentially a stochastic variable that continuously changes with time by nature. As such, normal training with static price-based features (e.g., the close price) can easily overfit the data, and is insufficient to obtain reliable models. To address this problem, we propose to add perturbations to simulate the stochasticity of the price variable, and train the model to work well under small yet intentional perturbations. Extensive experiments on two real-world stock datasets show that our method outperforms the state-of-the-art solution [Xu and Cohen, 2018] with 3.11% relative improvement on average w.r.t. accuracy, validating the usefulness of adversarial training for the stock prediction task.
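A fast-gradient-style perturbation conveys the idea: nudge the inputs in the direction that most increases the loss and train on both versions. The model, epsilon, and the choice to perturb raw inputs are all illustrative assumptions (the paper perturbs intermediate features rather than raw inputs).

```python
import torch

# Tiny stand-in predictor over price-based features (not the paper's model).
model = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(),
                            torch.nn.Linear(16, 1))
loss_fn = torch.nn.BCEWithLogitsLoss()

x = torch.randn(32, 8, requires_grad=True)  # price-based features
y = torch.randint(0, 2, (32, 1)).float()    # up/down movement labels

loss = loss_fn(model(x), y)
loss.backward()
eps = 0.01
x_adv = (x + eps * x.grad.sign()).detach()  # simulate small price stochasticity

# Train on clean and perturbed batches so the model stays reliable under
# small, intentional perturbations; optimizer steps omitted for brevity.
total = loss_fn(model(x.detach()), y) + loss_fn(model(x_adv), y)
```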

AAAI Conference 2019 Conference Paper

Hybrid Attention-Based Prototypical Networks for Noisy Few-Shot Relation Classification

  • Tianyu Gao
  • Xu Han
  • Zhiyuan Liu
  • Maosong Sun

Existing methods for relation classification (RC) primarily rely on distant supervision (DS) because large-scale supervised training datasets are not readily available. Although DS automatically annotates adequate amounts of data for model training, the coverage of this data is still quite limited, and many long-tail relations still suffer from data sparsity. Intuitively, people can grasp new knowledge by learning from a few instances. We thus provide a different view on RC by formalizing RC as a few-shot learning (FSL) problem. However, current FSL models mainly focus on low-noise vision tasks, which makes them hard to apply directly to the diversity and noise of text. In this paper, we propose hybrid attention-based prototypical networks for the problem of noisy few-shot RC. We design instance-level and feature-level attention schemes based on prototypical networks to highlight the crucial instances and features respectively, which significantly enhances the performance and robustness of RC models in a noisy FSL scenario. Besides, our attention schemes accelerate the convergence speed of RC models. Experimental results demonstrate that our hybrid attention-based models require fewer training iterations and outperform the state-of-the-art baseline models. The code and datasets are released at https://github.com/thunlp/HATT-Proto.

IJCAI Conference 2019 Conference Paper

Multi-scale Information Diffusion Prediction with Reinforced Recurrent Networks

  • Cheng Yang
  • Jian Tang
  • Maosong Sun
  • Ganqu Cui
  • Zhiyuan Liu

Information diffusion prediction is an important task which studies how information items spread among users. With the success of deep learning techniques, recurrent neural networks (RNNs) have shown their powerful capability in modeling information diffusion as sequential data. However, previous works focused on either microscopic diffusion prediction, which aims at guessing the next influenced user, or macroscopic diffusion prediction, which estimates the total number of influenced users during the diffusion process. To the best of our knowledge, no previous work has proposed a unified model for both microscopic and macroscopic scales. In this paper, we propose a novel multi-scale diffusion prediction model based on reinforcement learning (RL). The RL component incorporates macroscopic diffusion-size information into the RNN-based microscopic diffusion model, addressing the non-differentiability problem. We also employ an effective structural context extraction strategy to utilize the underlying social graph information. Experimental results show that our proposed model outperforms state-of-the-art baseline models on both microscopic and macroscopic diffusion prediction on three real-world datasets.

IJCAI Conference 2019 Conference Paper

Sentiment-Controllable Chinese Poetry Generation

  • Huimin Chen
  • Xiaoyuan Yi
  • Maosong Sun
  • Wenhao Li
  • Cheng Yang
  • Zhipeng Guo

Expressing diverse sentiments is one of the main purposes of human poetry creation. Existing Chinese poetry generation models have made great progress in poetry quality, but they all neglect to endow generated poems with specific sentiments. This defect leads to strong sentiment collapse or bias and thus hurts the diversity and semantics of generated poems. Meanwhile, there are few sentimental Chinese poetry resources available for study. To address this problem, we first collect a manually labelled sentimental poetry corpus with fine-grained sentiment labels. Then we propose a novel semi-supervised conditional Variational Auto-Encoder model for sentiment-controllable poetry generation. Besides, since poetry is discourse-level text in which the polarity and intensity of sentiment can shift across lines, we incorporate a temporal module to capture sentiment transition patterns among different lines. Experimental results show our model can control the sentiment of not only a whole poem but also each line, and improves poetry diversity over state-of-the-art models without losing quality.

NeurIPS Conference 2018 Conference Paper

Bandit Learning with Implicit Feedback

  • Yi Qi
  • Qingyun Wu
  • Hongning Wang
  • Jie Tang
  • Maosong Sun

Implicit feedback, such as user clicks, although abundant in online information service systems, does not provide substantial evidence of users' evaluation of a system's output. Without proper modeling, such incomplete supervision inevitably misleads model estimation, especially in a bandit learning setting where the feedback is acquired on the fly. In this work, we perform contextual bandit learning with implicit feedback by modeling the feedback as a composition of user result examination and relevance judgment. Since users' examination behavior is unobserved, we introduce latent variables to model it. We perform Thompson sampling on top of variational Bayesian inference for arm selection and model update. Our upper regret bound analysis of the proposed algorithm proves the feasibility of learning from implicit feedback in a bandit setting, and extensive empirical evaluations on click logs collected from a major MOOC platform further demonstrate its learning effectiveness in practice.
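
A simplified sketch of the click model P(click) = P(examine) x P(relevant) inside a Thompson-sampling loop, assuming, unlike the paper (which infers examination variationally), that the examination probability is known:

```python
import numpy as np

rng = np.random.default_rng(1)
K, exam_p = 5, 0.6                        # arms; assumed-known examination prob.
true_rel = rng.uniform(0.1, 0.9, K)
a, b = np.ones(K), np.ones(K)             # Beta posterior over each arm's relevance

for _ in range(5000):
    arm = int(np.argmax(rng.beta(a, b)))  # Thompson sampling
    click = (rng.random() < exam_p) and (rng.random() < true_rel[arm])
    if click:
        a[arm] += 1                       # a click is unambiguous positive evidence
    else:
        # examination is latent: discount the negative evidence by the
        # posterior probability that the user examined the result at all
        rel = a[arm] / (a[arm] + b[arm])
        b[arm] += exam_p * (1 - rel) / (1 - exam_p * rel)

print(np.argmax(a / (a + b)), np.argmax(true_rel))   # should agree
```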

AAAI Conference 2018 Conference Paper

Chinese LIWC Lexicon Expansion via Hierarchical Classification of Word Embeddings with Sememe Attention

  • Xiangkai Zeng
  • Cheng Yang
  • Cunchao Tu
  • Zhiyuan Liu
  • Maosong Sun

Linguistic Inquiry and Word Count (LIWC) is a word counting software tool which has been used for quantitative text analysis in many fields. Due to its success and popularity, the core lexicon has been translated into Chinese and many other languages. However, the lexicon contains only several thousand words, which is deficient compared with the number of common words in Chinese. Current approaches often expand the lexicon manually, which takes substantial time and requires linguistic experts. To address this issue, we propose to expand the LIWC lexicon automatically. Specifically, we cast it as a hierarchical classification problem and utilize a Sequence-to-Sequence model to classify words in the lexicon. Moreover, we use sememe information with an attention mechanism to capture the exact meanings of a word, so that we can build a more precise and comprehensive lexicon. The experimental results show that our model has a better understanding of word meanings with the help of sememes and achieves significant and consistent improvements compared with state-of-the-art methods. The source code of this paper can be obtained from https://github.com/thunlp/Auto_CLIWC.

IJCAI Conference 2018 Conference Paper

Chinese Poetry Generation with a Working Memory Model

  • Xiaoyuan Yi
  • Maosong Sun
  • Ruoyu Li
  • Zonghan Yang

As an exquisite and concise literary form, poetry is a gem of human culture. Automatic poetry generation is an essential step towards computer creativity. In recent years, several neural models have been designed for this task. However, coherence in meaning and topic across the lines of a whole poem remains a major challenge. In this paper, inspired by a theoretical concept in cognitive psychology, we propose a novel Working Memory model for poetry generation. Different from previous methods, our model explicitly maintains topics and a limited amount of informative history in a neural memory. During generation, our model reads the most relevant parts from the memory slots to generate the current line. After each line is generated, it writes the most salient parts of the previous line into the memory slots. By dynamically manipulating the memory, our model keeps a coherent information flow and learns to express each topic flexibly and naturally. We experiment on three different genres of Chinese poetry: quatrain, iambic and chinoiserie lyric. Both automatic and human evaluation results show that our model outperforms current state-of-the-art methods.

AAAI Conference 2018 Conference Paper

Improving Neural Fine-Grained Entity Typing With Knowledge Attention

  • Ji Xin
  • Yankai Lin
  • Zhiyuan Liu
  • Maosong Sun

Fine-grained entity typing aims to identify the semantic type of an entity in a particular plain text. It is an important task which can be helpful for many natural language processing (NLP) applications. Most existing methods typically extract features separately from the entity mention and the context words for type classification, and thus inevitably fail to model the complex correlations between entity mentions and context words. They also neglect the rich background information about these entities in knowledge bases (KBs). To address these issues, we take information from KBs into consideration to bridge entity mentions and their context together, and thereby propose Knowledge-Attention Neural Fine-Grained Entity Typing. Experimental results and case studies on real-world datasets demonstrate that our model significantly outperforms other state-of-the-art methods, revealing the effectiveness of incorporating KB information for entity typing. Code and data for this paper can be found at https://github.com/thunlp/KNET.

AAAI Conference 2018 Conference Paper

Neural Knowledge Acquisition via Mutual Attention Between Knowledge Graph and Text

  • Xu Han
  • Zhiyuan Liu
  • Maosong Sun

We propose a general joint representation learning framework for knowledge acquisition (KA) on two tasks, knowledge graph completion (KGC) and relation extraction (RE) from text. In this framework, we learn representations of knowledge graphs (KGs) and text within a unified parameter sharing semantic space. To achieve better fusion, we propose an effective mutual attention between KGs and text. The reciprocal attention mechanism enables us to highlight important features and perform better KGC and RE. Different from conventional joint models, no complicated linguistic analysis or strict alignments between KGs and text are required to train our models. Experiments on relation extraction and entity link prediction show that models trained under our joint framework are significantly improved in comparison with other baselines. Most existing methods for KGC and RE can be easily integrated into our framework due to its flexible architectures. The source code of this paper can be obtained from https://github.com/thunlp/JointNRE.

AAAI Conference 2017 Conference Paper

Bilingual Lexicon Induction from Non-Parallel Data with Minimal Supervision

  • Meng Zhang
  • Haoruo Peng
  • Yang Liu
  • Huanbo Luan
  • Maosong Sun

Building bilingual lexica from non-parallel data is a longstanding natural language processing research problem that could benefit thousands of resource-scarce languages which lack parallel data. Recent advances in continuous word representations have opened up new possibilities for this task, e.g., by establishing a cross-lingual mapping between word embeddings via a seed lexicon. This method is, however, unreliable when there are only a limited number of seeds, which is a reasonable setting for resource-scarce languages. We tackle this limitation by introducing a novel matching mechanism into bilingual word representation learning. It captures extra translation pairs exposed by the seeds to incrementally improve the bilingual word embeddings. In our experiments, we find the matching mechanism substantially improves the quality of the bilingual vector space, which in turn allows us to induce better bilingual lexica with as few as 10 seeds.

IJCAI Conference 2017 Conference Paper

Fast Network Embedding Enhancement via High Order Proximity Approximation

  • Cheng Yang
  • Maosong Sun
  • Zhiyuan Liu
  • Cunchao Tu

Many Network Representation Learning (NRL) methods have recently been proposed to learn vector representations for vertices in a network. In this paper, we summarize most existing NRL methods into a unified two-step framework comprising proximity matrix construction and dimension reduction. We focus on analyzing the proximity matrix construction step and conclude that an NRL method can be improved by exploring higher-order proximities when building the proximity matrix. We propose the Network Embedding Update (NEU) algorithm, which implicitly approximates higher-order proximities with a theoretical approximation bound and can be applied to any NRL method to enhance its performance. We conduct experiments on multi-label classification and link prediction tasks. Experimental results show that NEU yields consistent and significant improvements over a number of NRL methods with almost negligible running time on all three publicly available datasets.
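
A minimal sketch of one NEU-style enhancement step, assuming the update takes the form R + λ1·A·R + λ2·A·(A·R) with a row-normalized adjacency matrix A; the coefficients here are illustrative defaults:

```python
import numpy as np

def neu(R, A, lam1=0.5, lam2=0.25):
    """Enhance embeddings R (n x k) using a row-normalized adjacency A (n x n).

    A @ R mixes in second-order proximity and A @ (A @ R) third-order, so one
    cheap update implicitly raises the order of proximity the embeddings encode.
    """
    return R + lam1 * (A @ R) + lam2 * (A @ (A @ R))
```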

IJCAI Conference 2017 Conference Paper

Image-embodied Knowledge Representation Learning

  • Ruobing Xie
  • Zhiyuan Liu
  • Huanbo Luan
  • Maosong Sun

Entity images could provide significant visual information for knowledge representation learning. Most conventional methods learn knowledge representations merely from structured triples, ignoring rich visual information extracted from entity images. In this paper, we propose a novel Image-embodied Knowledge Representation Learning model (IKRL), where knowledge representations are learned with both triple facts and images. More specifically, we first construct representations for all images of an entity with a neural image encoder. These image representations are then integrated into an aggregated image-based representation via an attention-based method. We evaluate our IKRL models on knowledge graph completion and triple classification. Experimental results demonstrate that our models outperform all baselines on both tasks, which indicates the significance of visual information for knowledge representations and the capability of our models in learning knowledge representations with images.

IJCAI Conference 2017 Conference Paper

Iterative Entity Alignment via Joint Knowledge Embeddings

  • Hao Zhu
  • Ruobing Xie
  • Zhiyuan Liu
  • Maosong Sun

Entity alignment aims to link entities and their counterparts among multiple knowledge graphs (KGs). Most existing methods typically rely on external information of entities such as Wikipedia links and require costly manual feature construction to complete alignment. In this paper, we present a novel approach for entity alignment via joint knowledge embeddings. Our method jointly encodes both entities and relations of various KGs into a unified low-dimensional semantic space according to a small seed set of aligned entities. During this process, we can align entities according to their semantic distance in this joint semantic space. More specifically, we present an iterative and parameter-sharing method to improve alignment performance. Experimental results on real-world datasets show that, compared to baselines, our method achieves significant improvements on entity alignment, and can further improve knowledge graph completion performance on various KGs with the help of joint knowledge embeddings.

IJCAI Conference 2017 Conference Paper

Joint Training for Pivot-based Neural Machine Translation

  • Yong Cheng
  • Qian Yang
  • Yang Liu
  • Maosong Sun
  • Wei Xu

While recent neural machine translation approaches have delivered state-of-the-art performance for resource-rich language pairs, they suffer from the data scarcity problem for resource-scarce language pairs. Although this problem can be alleviated by exploiting a pivot language to bridge the source and target languages, the source-to-pivot and pivot-to-target translation models are usually independently trained. In this work, we introduce a joint training algorithm for pivot-based neural machine translation. We propose three methods to connect the two models and enable them to interact with each other during training. Experiments on Europarl and WMT corpora show that joint training of source-to-pivot and pivot-to-target models leads to significant improvements over independent training across various languages.

IJCAI Conference 2017 Conference Paper

Lexical Sememe Prediction via Word Embeddings and Matrix Factorization

  • Ruobing Xie
  • Xingchi Yuan
  • Zhiyuan Liu
  • Maosong Sun

Sememes are defined as the minimum semantic units of human languages. People have manually annotated lexical sememes for words and formed linguistic knowledge bases. However, manual construction is time-consuming and labor-intensive, with significant annotation inconsistency and noise. In this paper, we explore for the first time the automatic prediction of lexical sememes based on the semantic meanings of words encoded by word embeddings. Moreover, we apply matrix factorization to learn the semantic relations between sememes and words. In our experiments, we take the real-world sememe knowledge base HowNet for training and evaluation, and the results demonstrate the effectiveness of our method for lexical sememe prediction. Our method will be of great use for annotation verification of existing noisy sememe knowledge bases and for annotation suggestion of new words and phrases.

TIST Journal 2017 Journal Article

PRISM

  • Cunchao Tu
  • Zhiyuan Liu
  • Huanbo Luan
  • Maosong Sun

Profession is an important social attribute of people. It plays a crucial role in commercial services such as personalized recommendation and targeted advertising. In practice, profession information is usually unavailable due to privacy and other reasons. In this article, we explore the task of identifying user professions according to their behaviors in social media. The task confronts the following challenges that make it non-trivial: how to incorporate heterogeneous information of user behaviors, how to effectively utilize both labeled and unlabeled data, and how to exploit community structure. To address these challenges, we present a framework called Profession Identification in Social Media. It takes advantage of both personal information and community structure of users in the following aspects: (1) We present a cascaded two-level classifier with heterogeneous personal features to measure the confidence of users belonging to different professions. (2) We present a multi-training process to take advantage of both labeled and unlabeled data to enhance classification performance. (3) We design a profession identification method that jointly considers the confidence scores from personal features and community structure. We collect a real-world dataset to conduct experiments, and experimental results demonstrate the significant effectiveness of our method compared with other baseline methods. By applying prediction to large-scale users, we also analyze characteristics of microblog users, finding that there are significant diversities among users of different professions in demographics, social network structures, and linguistic styles.

IJCAI Conference 2017 Conference Paper

TransNet: Translation-Based Network Representation Learning for Social Relation Extraction

  • Cunchao Tu
  • Zhengyan Zhang
  • Zhiyuan Liu
  • Maosong Sun

Conventional network representation learning (NRL) models learn low-dimensional vertex representations by simply regarding each edge as a binary or continuous value. However, there exists rich semantic information on edges and the interactions between vertices usually preserve distinct meanings, which are largely neglected by most existing NRL models. In this work, we present a novel Translation-based NRL model, TransNet, by regarding the interactions between vertices as a translation operation. Moreover, we formalize the task of Social Relation Extraction (SRE) to evaluate the capability of NRL methods on modeling the relations between vertices. Experimental results on SRE demonstrate that TransNet significantly outperforms other baseline methods by 10% to 20% on hits@1. The source code and datasets can be obtained from https://github.com/thunlp/TransNet.

IJCAI Conference 2016 Conference Paper

Agreement-Based Joint Training for Bidirectional Attention-Based Neural Machine Translation

  • Yong Cheng
  • Shiqi Shen
  • Zhongjun He
  • Wei He
  • Hua Wu
  • Maosong Sun
  • Yang Liu

The attentional mechanism has proven to be effective in improving end-to-end neural machine translation. However, due to the intricate structural divergence between natural languages, unidirectional attention-based models might only capture partial aspects of attentional regularities. We propose agreement-based joint training for bidirectional attention-based end-to-end neural machine translation. Instead of training source-to-target and target-to-source translation models independently, our approach encourages the two complementary models to agree on word alignment matrices on the same training data. Experiments on Chinese-English and English-French translation tasks show that agreement-based joint training significantly improves both alignment and translation quality over independent training.
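
One plausible instantiation of the agreement term that would be added to the two models' training objectives; the paper studies several alternative forms, so this squared-difference version is an illustrative assumption:

```python
import numpy as np

def agreement_penalty(A_st, A_ts):
    """A_st: (src_len, tgt_len) attention from the source-to-target model;
    A_ts: (tgt_len, src_len) attention from the target-to-source model.
    Penalizes disagreement between the two models' word-alignment matrices
    on the same sentence pair."""
    return np.sum((A_st - A_ts.T) ** 2)
```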

AAAI Conference 2016 Conference Paper

Building Earth Mover’s Distance on Bilingual Word Embeddings for Machine Translation

  • Meng Zhang
  • Yang Liu
  • Huanbo Luan
  • Maosong Sun
  • Tatsuya Izuha
  • Jie Hao

Following their monolingual counterparts, bilingual word embeddings are also on the rise. As a major application task, word translation has been relying on the nearest neighbor to connect embeddings cross-lingually. However, the nearest neighbor strategy suffers from its inherently local nature and fails to cope with variations in realistic bilingual word embeddings. Furthermore, it lacks a mechanism to deal with many-to-many mappings that often show up across languages. We introduce Earth Mover’s Distance to this task by providing a natural formulation that translates words in a holistic fashion, addressing the limitations of the nearest neighbor. We further extend the formulation to a new task of identifying parallel sentences, which is useful for statistical machine translation systems, thereby expanding the application realm of bilingual word embeddings. We show encouraging performance on both tasks.
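
A self-contained sketch of the underlying transportation linear program, assuming Euclidean costs between already cross-lingually mapped embeddings and uniform supply/demand masses; the paper's exact cost and weighting choices may differ:

```python
import numpy as np
from scipy.optimize import linprog

def emd_plan(S, T, supply, demand):
    """Optimal transport plan between source embeddings S (n, d) and target
    embeddings T (m, d); supply and demand must carry equal total mass."""
    n, m = len(S), len(T)
    cost = np.linalg.norm(S[:, None, :] - T[None, :, :], axis=-1).ravel()
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1          # mass leaving source word i
    for j in range(m):
        A_eq[n + j, j::m] = 1                   # mass arriving at target word j
    res = linprog(cost, A_eq=A_eq, b_eq=np.concatenate([supply, demand]),
                  bounds=(0, None), method="highs")
    return res.x.reshape(n, m)                  # soft many-to-many translation weights

rng = np.random.default_rng(0)
plan = emd_plan(rng.normal(size=(4, 8)), rng.normal(size=(5, 8)),
                np.full(4, 0.25), np.full(5, 0.2))
```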

IJCAI Conference 2016 Conference Paper

Knowledge Representation Learning with Entities, Attributes and Relations

  • Yankai Lin
  • Zhiyuan Liu
  • Maosong Sun

Distributed knowledge representation (KR) encodes both entities and relations in a low-dimensional semantic space, which has significantly promoted the performance of relation extraction and knowledge reasoning. In many knowledge graphs (KGs), some relations indicate attributes of entities (attributes) while others indicate relations between entities (relations). Existing KR models treat all relations equally and usually suffer from poor accuracy when modeling one-to-many and many-to-one relations, which are mostly composed of attributes. In this paper, we distinguish existing KG relations into attributes and relations, and propose a new KR model with entities, attributes and relations (KR-EAR). The experimental results show that, by modeling attributes separately, KR-EAR significantly outperforms state-of-the-art KR models in the prediction of entities, attributes and relations.

IJCAI Conference 2016 Conference Paper

Max-Margin DeepWalk: Discriminative Learning of Network Representation

  • Cunchao Tu
  • Weicheng Zhang
  • Zhiyuan Liu
  • Maosong Sun

DeepWalk is a typical representation learning method that learns low-dimensional representations for vertices in social networks. Similar to other network representation learning (NRL) models, it encodes the network structure into vertex representations and is learned in an unsupervised manner. However, the learned representations usually lack discriminative power when applied to machine learning tasks such as vertex classification. In this paper, we overcome this challenge by proposing a novel semi-supervised model, max-margin DeepWalk (MMDW). MMDW is a unified NRL framework that jointly optimizes the max-margin classifier and the underlying representation learning model. Influenced by the max-margin classifier, the learned representations not only encode the network structure but are also discriminative. Visualizations of the learned representations indicate that our model is more discriminative than unsupervised ones, and experimental results on vertex classification demonstrate that our method achieves significant improvement over other state-of-the-art methods.

AAAI Conference 2016 Conference Paper

Representation Learning of Knowledge Graphs with Entity Descriptions

  • Ruobing Xie
  • Zhiyuan Liu
  • Jia Jia
  • Huanbo Luan
  • Maosong Sun

Representation learning (RL) of knowledge graphs aims to project both entities and relations into a continuous low-dimensional space. Most methods concentrate on learning representations from knowledge triples indicating relations between entities. In fact, most knowledge graphs also contain concise descriptions for entities, which existing methods cannot exploit well. In this paper, we propose a novel RL method for knowledge graphs that takes advantage of entity descriptions. More specifically, we explore two encoders, a continuous bag-of-words model and a deep convolutional neural model, to encode the semantics of entity descriptions. We further learn knowledge representations with both triples and descriptions. We evaluate our method on two tasks, knowledge graph completion and entity classification. Experimental results on real-world datasets show that our method outperforms other baselines on both tasks, especially under the zero-shot setting, which indicates that our method is capable of building representations for novel entities according to their descriptions. The source code of this paper can be obtained from https://github.com/xrb92/DKRL.

IJCAI Conference 2016 Conference Paper

Representation Learning of Knowledge Graphs with Hierarchical Types

  • Ruobing Xie
  • Zhiyuan Liu
  • Maosong Sun

Representation learning of knowledge graphs aims to encode both entities and relations into a continuous low-dimensional vector space. Most existing methods concentrate only on learning representations from the structured information located in triples, regardless of the rich information located in the hierarchical types of entities, which could be collected in most knowledge graphs. In this paper, we propose a novel method named Type-embodied Knowledge Representation Learning (TKRL) to take advantage of hierarchical entity types. We suggest that entities should have multiple representations under different types. More specifically, we treat hierarchical types as projection matrices for entities, with two type encoders designed to model the hierarchical structures. Meanwhile, type information is also utilized as relation-specific type constraints. We evaluate our models on two tasks, knowledge graph completion and triple classification, and further explore performance on a long-tail dataset. Experimental results show that our models significantly outperform all baselines on both tasks, especially on the long-tail distribution. This indicates that our models are capable of capturing the hierarchical type information that is significant when constructing representations of knowledge graphs. The source code of this paper can be obtained from https://github.com/thunlp/TKRL.

AAAI Conference 2015 Conference Paper

Contrastive Unsupervised Word Alignment with Non-Local Features

  • Yang Liu
  • Maosong Sun

Word alignment is an important natural language processing task that indicates the correspondence between natural languages. Recently, unsupervised learning of log-linear models for word alignment has received considerable attention as it combines the merits of generative and discriminative approaches. However, a major challenge still remains: it is intractable to calculate the expectations of non-local features that are critical for capturing the divergence between natural languages. We propose a contrastive approach that aims to differentiate observed training examples from noise. It not only introduces prior knowledge to guide unsupervised learning but also cancels out partition functions. Based on the observation that the probability mass of log-linear models for word alignment is usually highly concentrated, we propose to use the top-n alignments to approximate the expectations with respect to posterior distributions. This allows for efficient and accurate calculation of the expectations of non-local features. Experiments show that our approach achieves significant improvements over state-of-the-art unsupervised word alignment methods.
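
A minimal sketch of the top-n approximation, assuming candidate alignments arrive with unnormalized log-linear scores and precomputed feature vectors:

```python
import numpy as np

def topn_expectation(scores, feats, n=10):
    """scores: unnormalized log-linear scores of candidate alignments;
    feats: their (non-local) feature vectors, shape (num_candidates, dim).
    The full expectation is intractable, so renormalize over only the n
    best-scoring alignments, which is accurate when the probability mass
    is highly concentrated."""
    idx = np.argsort(scores)[-n:]
    p = np.exp(scores[idx] - scores[idx].max())
    p /= p.sum()                       # posterior restricted to the top-n
    return p @ feats[idx]              # approximate expected feature vector
```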

IJCAI Conference 2015 Conference Paper

Iterative Learning of Parallel Lexicons and Phrases from Non-Parallel Corpora

  • Meiping Dong
  • Yang Liu
  • Huanbo Luan
  • Maosong Sun
  • Tatsuya Izuha
  • Dakun Zhang

While parallel corpora are an indispensable resource for data-driven multilingual natural language processing tasks such as machine translation, they are limited in quantity, quality and coverage. As a result, learning translation models from non-parallel corpora has become increasingly important nowadays, especially for low-resource languages. In this work, we propose a joint model for iteratively learning parallel lexicons and phrases from non-parallel corpora. The model is trained using a Viterbi EM algorithm that alternates between constructing parallel phrases using lexicons and updating lexicons based on the constructed parallel phrases. Experiments on Chinese-English datasets show that our approach learns better parallel lexicons and phrases and improves translation performance significantly.

IJCAI Conference 2015 Conference Paper

Joint Learning of Character and Word Embeddings

  • Xinxiong Chen
  • Lei Xu
  • Zhiyuan Liu
  • Maosong Sun
  • Huanbo Luan

Most word embedding methods take a word as a basic unit and learn embeddings according to words' external contexts, ignoring the internal structures of words. However, in some languages such as Chinese, a word is usually composed of several characters and contains rich internal information. The semantic meaning of a word is also related to the meanings of its composing characters. Hence, we take Chinese as an example and present a character-enhanced word embedding model (CWE). In order to address the issues of character ambiguity and non-compositional words, we propose multiple-prototype character embeddings and an effective word selection method. We evaluate the effectiveness of CWE on word relatedness computation and analogical reasoning. The results show that CWE outperforms other baseline methods which ignore internal character information. The code and data can be accessed at https://github.com/Leonard-Xu/CWE.
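
A minimal sketch of the additive composition, assuming the averaged-and-halved form x = (w + mean(chars)) / 2; with multiple-prototype character embeddings, each character vector would first be disambiguated against the context:

```python
import numpy as np

def cwe_compose(word_vec, char_vecs):
    """Character-enhanced word vector: blend the word embedding with the
    average of its characters' embeddings, keeping the magnitude comparable
    to a plain word embedding."""
    return 0.5 * (word_vec + np.mean(char_vecs, axis=0))
```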

AAAI Conference 2015 Conference Paper

Learning Entity and Relation Embeddings for Knowledge Graph Completion

  • Yankai Lin
  • Zhiyuan Liu
  • Maosong Sun
  • Yang Liu
  • Xuan Zhu

Knowledge graph completion aims to perform link prediction between entities. In this paper, we consider the approach of knowledge graph embeddings. Recently, models such as TransE and TransH build entity and relation embeddings by regarding a relation as a translation from head entity to tail entity. We note that these models simply put both entities and relations within the same semantic space. In fact, an entity may have multiple aspects and various relations may focus on different aspects of entities, which makes a common space insufficient for modeling. In this paper, we propose TransR to build entity and relation embeddings in separate entity and relation spaces. Afterwards, we learn embeddings by first projecting entities from entity space to the corresponding relation space and then building translations between projected entities. In experiments, we evaluate our models on three tasks including link prediction, triple classification and relational fact extraction. Experimental results show significant and consistent improvements compared with state-of-the-art baselines including TransE and TransH. The source code of this paper can be obtained from https://github.com/mrlyk423/relation_extraction.
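
The relation-specific projection can be summarized in a few lines; this sketch assumes column-vector conventions, and training would wrap this score in a margin-based ranking loss over observed versus corrupted triples:

```python
import numpy as np

def transr_score(h, r, t, M_r):
    """TransR energy for a triple (h, r, t): project head and tail entities
    into the relation-specific space via M_r, then measure how well the
    relation vector r translates between them. Lower = more plausible."""
    return np.linalg.norm(M_r @ h + r - M_r @ t)
```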

IJCAI Conference 2015 Conference Paper

Network Representation Learning with Rich Text Information

  • Cheng Yang
  • Zhiyuan Liu
  • Deli Zhao
  • Maosong Sun
  • Edward Chang

Representation learning has shown its effectiveness in many tasks such as image classification and text mining. Network representation learning aims at learning distributed vector representations for the vertices in a network, and is increasingly recognized as an important aspect of network analysis. Most network representation learning methods investigate only network structure for learning. In reality, network vertices contain rich information (such as text), which cannot be readily exploited within the algorithmic frameworks of typical representation learning methods. By proving that DeepWalk, a state-of-the-art network representation method, is actually equivalent to matrix factorization (MF), we propose text-associated DeepWalk (TADW). TADW incorporates the text features of vertices into network representation learning under the framework of matrix factorization. We evaluate our method and various baseline methods on the task of multi-class classification of vertices. The experimental results show that our method outperforms other baselines on all three datasets, especially when the networks are noisy and the training ratio is small. The source code of this paper can be obtained from https://github.com/albertyang33/TADW.
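
A minimal sketch of the induced factorization problem min ||M - WᵀHT||², solved here by alternating ridge updates on random toy data; the dimensions, regularization, and initialization are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, f, k, lam = 30, 12, 4, 0.2
M = rng.random((n, n))         # vertex-vertex proximity matrix (DeepWalk-equivalent)
T = rng.random((f, n))         # text feature matrix of the vertices
W = rng.random((k, n))
H = rng.random((k, f))

for _ in range(30):            # alternate closed-form ridge updates
    B = H @ T                                                        # (k, n)
    W = np.linalg.solve(B @ B.T + lam * np.eye(k), B @ M.T)          # update W
    H = np.linalg.solve(W @ W.T + lam * np.eye(k), W @ M @ T.T) \
        @ np.linalg.inv(T @ T.T + lam * np.eye(f))                   # update H

emb = np.vstack([W, H @ T]).T  # each row: a vertex's 2k-dim text-aware embedding
```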

AAAI Conference 2015 Conference Paper

Phrase Type Sensitive Tensor Indexing Model for Semantic Composition

  • Yu Zhao
  • Zhiyuan Liu
  • Maosong Sun

Compositional semantics aims at constructing the meaning of phrases or sentences according to the compositionality of word meanings. In this paper, we propose to synchronously learn the representations of individual words and extracted high-frequency phrases. Representations of extracted phrases are treated as a gold standard for constructing more general operations to compose the representations of unseen phrases. We propose a grammatical-type-specific model that improves composition flexibility by adopting vector-tensor-vector operations. Our model embodies the compositional characteristics of traditional additive and multiplicative models. Empirical results show that our model outperforms state-of-the-art composition methods on the task of computing phrase similarities.

IJCAI Conference 2015 Conference Paper

Representation Learning for Measuring Entity Relatedness with Rich Information

  • Yu Zhao
  • Zhiyuan Liu
  • Maosong Sun

Incorporating multiple types of relational information from heterogeneous networks has proved effective in data mining. Although Wikipedia is one of the most famous heterogeneous networks, previous work on semantic analysis of Wikipedia has mostly been limited to a single type of relation. In this paper, we aim at incorporating multiple types of relations to measure the semantic relatedness between Wikipedia entities. We propose a coordinate matrix factorization framework to construct low-dimensional continuous representations for entities, categories and words in the same semantic space. We formulate this task as the completion of a sparse entity-entity association matrix, in which each entry quantifies the strength of relatedness between the corresponding entities. We evaluate our model on the task of judging pair-wise word similarity. Experimental results show that our model outperforms both traditional entity relatedness algorithms and other representation learning models.

AAAI Conference 2015 Conference Paper

Topical Word Embeddings

  • Yang Liu
  • Zhiyuan Liu
  • Tat-Seng Chua
  • Maosong Sun

Most word embedding models typically represent each word using a single vector, which makes them unable to discriminate the ubiquitous homonymy and polysemy. In order to enhance discriminativeness, we employ latent topic models to assign topics to each word in the text corpus, and learn topical word embeddings (TWE) based on both words and their topics. In this way, contextual word embeddings can be flexibly obtained to measure contextual word similarity. We can also build document representations, which are more expressive than some widely used document models such as latent topic models. In the experiments, we evaluate the TWE models on two tasks, contextual word similarity and text classification. The experimental results show that our models outperform typical word embedding models, including the multi-prototype version, on contextual word similarity, and also exceed latent topic models and other representative document models on text classification. The source code of this paper can be obtained from https://github.com/largelymfs/topical_word_embeddings.

AAAI Conference 2013 Conference Paper

An Extended GHKM Algorithm for Inducing Lambda-SCFG

  • Peng Li
  • Yang Liu
  • Maosong Sun

Semantic parsing, which aims at mapping a natural language (NL) sentence into its formal meaning representation (e.g., a logical form), has received increasing attention in recent years. While synchronous context-free grammar (SCFG) augmented with lambda calculus (λ-SCFG) provides an effective mechanism for semantic parsing, how to learn such λ-SCFG rules still remains a challenge because of the difficulty in determining the correspondence between NL sentences and logical forms. To alleviate this structural divergence problem, we extend the GHKM algorithm, a state-of-the-art algorithm for learning synchronous grammars in statistical machine translation, to induce λ-SCFG from pairs of NL sentences and logical forms. By treating logical forms as trees, we reformulate the theory behind GHKM to give formal semantics to the alignment between NL words and logical form tokens. Experiments on the GEOQUERY dataset show that our semantic parser achieves an F-measure of 90.2%, the best result published to date.

NeurIPS Conference 2012 Conference Paper

Monte Carlo Methods for Maximum Margin Supervised Topic Models

  • Qixia Jiang
  • Jun Zhu
  • Maosong Sun
  • Eric Xing

An effective strategy to exploit the supervising side information for discovering predictive topic representations is to impose discriminative constraints induced by such information on the posterior distributions under a topic model. This strategy has been adopted by a number of supervised topic models, such as MedLDA, which employs max-margin posterior constraints. However, unlike likelihood-based supervised topic models, where posterior inference can be carried out using Bayes' rule, the max-margin posterior constraints have made Monte Carlo methods infeasible or at least not directly applicable, thereby limiting the choice of inference algorithms to variational approximations with strict mean-field assumptions. In this paper, we develop two efficient Monte Carlo methods under much weaker assumptions for max-margin supervised topic models, based on an importance sampler and a collapsed Gibbs sampler, respectively, in a convex dual formulation. We report thorough experimental results that compare our approach favorably against existing alternatives in both accuracy and efficiency.

IJCAI Conference 2011 Conference Paper

CHIME: An Efficient Error-Tolerant Chinese Pinyin Input Method

  • Yabin Zheng
  • Chen Li
  • Maosong Sun

Chinese Pinyin input methods are very important for Chinese language processing. In many cases, users may make typing errors. For example, a user wants to type in "shenme" (meaning "what" in English) but may type in "shenem" instead. Existing Pinyin input methods fail in converting such a Pinyin sequence with errors to the right Chinese words. To solve this problem, we developed an efficient error-tolerant Pinyin input method called "CHIME" that can handle typing errors. By incorporating state-of-the-art techniques and language-specific features, the method achieves a better performance than state-of-the-art input methods. It can efficiently find relevant words in milliseconds for an input Pinyin sequence.
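
A toy stand-in for the error-tolerant lookup using difflib's similarity matching; the real system combines specialized edit operations over a full pinyin syllable lexicon with language-model ranking, and the vocabulary below is a made-up fragment:

```python
from difflib import get_close_matches

# toy syllable-sequence vocabulary; a real input method uses a full pinyin
# lexicon plus a language model to rank the corrected candidates
VALID = ["shenme", "shenma", "women", "neng", "sheng", "shen"]

def tolerant_lookup(typed, n=3):
    """Return legal pinyin strings closest to a possibly mistyped input."""
    return get_close_matches(typed, VALID, n=n, cutoff=0.6)

print(tolerant_lookup("shenem"))   # typo for "shenme" -> ['shenme', ...]
```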

AAAI Conference 2011 Conference Paper

Fast Query Recommendation by Search

  • Qixia Jiang
  • Maosong Sun

Query recommendation can not only effectively facilitate users to obtain their desired information but also increase ads’ click-through rates. This paper presents a general and highly efficient method for query recommendation. Given query sessions, we automatically generate many similar and dissimilar query-pairs as the prior knowledge. Then we learn a transformation from the prior knowledge to move similar queries closer such that similar queries tend to have similar hash values. This is formulated as minimizing the empirical error on the prior knowledge while maximizing the gap between the data and some partition hyperplanes randomly generated in advance. In the recommendation stage, we search queries that have similar hash values to the given query, rank the found queries and return the top K queries as the recommendation result. All the experimental results demonstrate that our method achieves encouraging results in terms of efficiency and recommendation performance.
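
A minimal sketch of hashing against randomly generated partition hyperplanes followed by Hamming-distance search; the paper additionally learns a transformation from session-derived similar/dissimilar query pairs before hashing, and the embeddings here are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
d, bits = 32, 16
Q = rng.normal(size=(10_000, d))            # embeddings of historical queries
planes = rng.normal(size=(d, bits))         # random partition hyperplanes
codes = Q @ planes > 0                      # one 16-bit hash code per query

def recommend(q_vec, k=5):
    """Top-k historical queries whose hash codes are nearest in Hamming space."""
    code = q_vec @ planes > 0
    ham = (codes != code).sum(axis=1)       # Hamming distance to every query
    return np.argsort(ham)[:k]

print(recommend(Q[0]))                      # query 0 itself should rank first
```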

TIST Journal 2011 Journal Article

PLDA+

  • Zhiyuan Liu
  • Yuzhou Zhang
  • Edward Y. Chang
  • Maosong Sun

Previous methods of distributed Gibbs sampling for LDA run into either memory or communication bottlenecks. To improve scalability, we propose four strategies: data placement, pipeline processing, word bundling, and priority-based scheduling. Experiments show that our strategies significantly reduce the unparallelizable communication bottleneck and achieve good load balancing, and hence improve scalability of LDA.

IJCAI Conference 2009 Conference Paper

Incorporating User Behaviors in New Word Detection

  • Yabin Zheng
  • Zhiyuan Liu
  • Maosong Sun
  • Liyun Ru
  • Yang Zhang

In this paper, we propose a novel method to detect new words in domain-specific fields based on user behaviors. First, we select the most representative words from a domain-specific lexicon. Then, combining these with user behaviors, we discover the potential experts in the field who use those terminologies frequently. Finally, we identify new words from the behaviors of those experts: words used much more frequently in this community than in others are most probably new words. In brief, our method follows a collaborative filtering approach, first from words to professional experts, then from experts to new words, which differs from traditional new word detection methods. Our method achieves up to 0.86 accuracy on a computer-science-related dataset. Moreover, the proposed method can easily be extended to the related-words retrieval task. We compare our method with Google Sets and Bayesian Sets; experiments show that our method and Bayesian Sets give better results than Google Sets.