Arrow Research search

Author name cluster

Rui Yan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

93 papers
2 author rows

Possible papers

93

AAAI Conference 2026 Conference Paper

Cyto-SSL: A Self-Supervised Pretraining Framework for Cytology Foundation Model

  • Yiming Zhang
  • Rui Yan
  • Xiaohua Wan
  • Yifan Zhao
  • Shuang Feng
  • Zhetao Xu
  • Ying Wang
  • Fa Zhang

Cytological images originate from exfoliated cells, collected via liquid-based slides and digitized into whole slide images (WSIs). Unlike histological WSIs that exhibit continuous and well-structured tissue, cytological WSIs are sparse in spatial distribution and unstructured in cellular relationships. Typically, the nucleus serves as the primary diagnostic feature, while surrounding cytoplasmic information plays a supportive role. These unique characteristics limit the development of effective foundation models and hinder the transferability of histology-based models for cytopathology. To address this, we propose **Cyto-SSL**, the first self-supervised pretraining framework for cytological images. It introduces **Nuclei-Centered Perturbation**, which highlights individual nuclei by perturbing non-nuclear regions. We also design an SR-Transformer module, which complements this by using sparse attention to concentrate on diagnostically relevant scattered cells, while iRPE helps the model capture local spatial relationships and avoid unnecessary attention to irrelevant global structures. Experimental results show that **Cyto-SSL** enhances performance across diverse cytological datasets and Multiple Instance Learning (MIL) methods. On a WSI-level dataset, it achieved 95.67% accuracy and outperformed ImageNet-pretrained ResNet-50 by 11.33%, demonstrating superior feature representation for cytological analysis. Additionally, **Cyto-SSL** modules are plug-and-play, easily integrated into other pretraining frameworks, yielding a 2.6% accuracy gain across different SSL methods.
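The nuclei-centered idea above can be sketched in a few lines. The Gaussian-noise perturbation, the mask convention, and the noise level below are illustrative assumptions, not the paper's exact operator:

```python
import numpy as np

def nuclei_centered_perturbation(img, nucleus_mask, noise_std=0.1, rng=None):
    """Illustrative sketch: perturb non-nuclear pixels with Gaussian noise
    while leaving nuclei intact, so pretraining focuses on nuclear features.
    img: float image in [0, 1]; nucleus_mask: 1 on nuclei, 0 elsewhere."""
    if rng is None:
        rng = np.random.default_rng()
    noise = rng.normal(0.0, noise_std, img.shape)
    out = img + noise * (1 - nucleus_mask)  # mask == 1 marks nuclei, kept intact
    return np.clip(out, 0.0, 1.0)
```

Any augmentation that degrades only the cytoplasmic background (blur, color jitter, masking) would fit the same template.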

AAAI Conference 2026 Conference Paper

DS-ATGO: Dual-Stage Synergistic Learning via Forward Adaptive Threshold and Backward Gradient Optimization for Spiking Neural Networks

  • Jiaqiang Jiang
  • Wenfeng Xu
  • Jing Fan
  • Rui Yan

Brain-inspired spiking neural networks (SNNs) are recognized as a promising avenue for achieving efficient, low-energy neuromorphic computing. Direct training of SNNs typically relies on surrogate gradient (SG) learning to estimate derivatives of non-differentiable spiking activity. However, during training, the distribution of neuronal membrane potentials varies across timesteps and progressively deviates toward both sides of the firing threshold. When the firing threshold and SG remain fixed, this may lead to imbalanced spike firing and diminished gradient signals, preventing SNNs from performing well. To address these issues, we propose a novel dual-stage synergistic learning algorithm that achieves forward adaptive thresholding and backward dynamic SG. In forward propagation, we adaptively adjust thresholds based on the distribution of membrane potential dynamics (MPD) at each timestep, which enriches neuronal diversity and effectively balances firing rates across timesteps and layers. In backward propagation, drawing from the underlying association between MPD, threshold, and SG, we dynamically optimize SG to enhance gradient estimation through spatio-temporal alignment, effectively mitigating gradient information loss. Experimental results demonstrate that our method achieves significant performance improvements. Moreover, it allows neurons to fire stable proportions of spikes at each timestep and increases the proportion of neurons that obtain gradients in deeper layers.

AAAI Conference 2026 Conference Paper

FinRpt: Dataset, Evaluation System and LLM-based Multi-agent Framework for Equity Research Report Generation

  • Song Jin
  • Shuqi Li
  • Shukun Zhang
  • Rui Yan

While LLMs have shown great success in financial tasks like stock prediction and question answering, their application in fully automating Equity Research Report generation remains uncharted territory. In this paper, we formulate the Equity Research Report (ERR) Generation task for the first time. To address data scarcity and the absence of evaluation metrics, we present an open-source evaluation benchmark for ERR generation - FinRpt. We design a Dataset Construction Pipeline that integrates 7 financial data types and automatically produces a high-quality ERR dataset, which can be used for model training and evaluation. We also introduce a comprehensive evaluation system including 11 metrics to assess the generated ERRs. Moreover, we propose a multi-agent framework specifically tailored to this task, named FinRpt-Gen, and train several LLM-based agents on the proposed datasets using Supervised Fine-Tuning and Reinforcement Learning. Experimental results indicate the data quality and metric effectiveness of the FinRpt benchmark and the strong performance of FinRpt-Gen, showcasing their potential to drive innovation in the ERR generation field. All code and datasets are publicly available.

AAAI Conference 2026 Conference Paper

MACRec: A Multi-View Subspace Alignment Framework for Contrastive Sampling Calibration in Recommendation

  • Junping Liu
  • Mingchao Yu
  • Xinrong Hu
  • Rui Yan
  • Wanqing Li
  • Jie Yang
  • Yi Guo

Graph Contrastive Learning (GCL) has proven effective in mitigating data sparsity and enhancing representation learning for recommendation. Yet, most GCL frameworks indiscriminately treat all non-anchor nodes as negatives during contrastive sampling, often leading to the false negative problem where semantically similar nodes are incorrectly repelled. Previous attempts to mitigate this issue rely on predetermined heuristics or local neighborhood mining, which struggle to reliably identify false negatives. More critically, they often overlook authentic user-item interactions for anchoring sample relationships. To this end, this paper presents MACRec, a Multi-View subspace-Alignment framework designed to Calibrate contrastive sampling in GCL-based Recommendation. MACRec comprises three core components: (1) a Multi-View Affinity (MVA) module that captures consistent semantic relations across multiple augmentations via self-expression modeling; (2) a Cross-Subspace Alignment (CSA) mechanism that leverages authentic user-item behavioral interactions to enforce semantic consistency across user and item subspaces; and (3) a Calibration-based Contrastive Reweighting (CCR) strategy to dynamically down-weight potential false negatives during the contrastive learning process. Extensive experiments on three real-world benchmarks demonstrate that MACRec consistently improves performance across various augmentation backbones, achieving up to 14.55% relative gains.

AAAI Conference 2026 Conference Paper

Making Every Head Count: Sparse Attention Without the Speed-Performance Trade-off

  • Mingkuan Zhao
  • Wentao Hu
  • Jiayin Wang
  • Xin Lai
  • Tianchen Huang
  • Yuheng Min
  • Rui Yan
  • Xiaoyan Zhu

The design of Large Language Models (LLMs) has long been hampered by a fundamental conflict within their core attention mechanism: its remarkable expressivity is built upon a computational complexity of O(H·N²) that grows quadratically with the context size (N) and linearly with the number of heads (H). This standard implementation harbors significant computational redundancy, as all heads independently compute attention over the same sequence space. Existing sparse methods, meanwhile, often trade information integrity for computational efficiency. To resolve this efficiency-performance trade-off, we propose SPAttention, whose core contribution is the introduction of a new paradigm we term Principled Structural Sparsity. SPAttention does not merely drop connections but instead reorganizes the computational task by partitioning the total attention workload into balanced, non-overlapping distance bands, assigning each head a unique segment. This approach transforms the multi-head attention mechanism from H independent O(N²) computations into a single, collaborative O(N²) computation, fundamentally reducing complexity by a factor of H. The structured inductive bias compels functional specialization among heads, enabling a more efficient allocation of computational resources from redundant modeling to distinct dependencies across the entire sequence span. Extensive empirical validation on the OLMoE-1B-7B and 0.25B-1.75B model series demonstrates that SPAttention delivers an approximately two-fold increase in training throughput while remaining on par with standard dense attention, even surpassing it on select key metrics, and consistently outperforms representative sparse attention methods including Longformer, Reformer, and BigBird across all evaluation metrics.
Our work demonstrates that thoughtfully designed structural sparsity can serve as an effective inductive bias that simultaneously improves both computational efficiency and model performance, opening a new avenue for the architectural design of next-generation, high-performance LLMs.
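The band-partitioning idea above can be illustrated with attention masks. This is a minimal sketch under assumed conventions (equal-width bands over causal query-key distances), not the paper's implementation:

```python
import numpy as np

def band_assignments(seq_len: int, num_heads: int) -> np.ndarray:
    """Assign each query-key distance d in [0, seq_len) to one of
    num_heads contiguous, non-overlapping distance bands
    (hypothetical equal-width partition)."""
    edges = np.linspace(0, seq_len, num_heads + 1)
    d = np.arange(seq_len)
    return np.clip(np.searchsorted(edges, d, side="right") - 1, 0, num_heads - 1)

def head_mask(seq_len: int, num_heads: int, head: int) -> np.ndarray:
    """Boolean causal mask: this head attends only to keys whose
    distance from the query falls inside its own band."""
    bands = band_assignments(seq_len, num_heads)
    dist = np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :]
    return (dist >= 0) & (bands[np.clip(dist, 0, seq_len - 1)] == head)
```

Because the bands are disjoint and jointly cover every distance, the union of all head masks equals the full causal mask while each query-key pair is computed by exactly one head, which is the source of the claimed factor-of-H reduction.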

AAAI Conference 2026 Conference Paper

MPD-SGR: Robust Spiking Neural Networks with Membrane Potential Distribution-Driven Surrogate Gradient Regularization

  • Runhao Jiang
  • Chengzhi Jiang
  • Rui Yan
  • Huajin Tang

The surrogate gradient (SG) method has shown significant promise in enhancing the performance of deep spiking neural networks (SNNs), but it also introduces vulnerabilities to adversarial attacks. Although spike coding strategies and neural dynamics parameters have been extensively studied for their impact on robustness, the critical role of gradient magnitude, which reflects the model's sensitivity to input perturbations, remains underexplored. In SNNs, the gradient magnitude is primarily determined by the interaction between the membrane potential distribution (MPD) and the SG function. In this study, we investigate the relationship between the MPD and SG and their implications for improving the robustness of SNNs. Our theoretical analysis reveals that reducing the proportion of membrane potentials lying within the gradient-available range of the SG function effectively mitigates the sensitivity of SNNs to input perturbations. Building upon this insight, we propose a novel MPD-driven surrogate gradient regularization (MPD-SGR) method, which enhances robustness by explicitly regularizing the MPD based on its interaction with the SG function. Extensive experiments across multiple image classification benchmarks and diverse network architectures confirm that the MPD-SGR method significantly enhances the resilience of SNNs to adversarial perturbations and exhibits strong generalizability across diverse network configurations, SG functions, and spike encoding schemes.

AAAI Conference 2026 Conference Paper

Spatiotemporal-Untrammelled Mixture of Experts for Multi-Person Motion Prediction

  • Zheng Yin
  • Chengjian Li
  • Xiangbo Shu
  • Meiqi Cao
  • Rui Yan
  • Jinhui Tang

Comprehensively and flexibly capturing the complex spatio-temporal dependencies of human motion is critical for multi-person motion prediction. Existing methods grapple with two primary limitations: i) Inflexible spatiotemporal representation due to reliance on positional encodings for capturing spatiotemporal information. ii) High computational costs stemming from the quadratic time complexity of conventional attention mechanisms. To overcome these limitations, we propose the Spatiotemporal-Untrammelled Mixture of Experts (ST-MoE), which flexibly explores complex spatio-temporal dependencies in human motion and significantly reduces computational cost. To adaptively mine complex spatio-temporal patterns from human motion, our model incorporates four distinct types of spatiotemporal experts, each specializing in capturing different spatial or temporal dependencies. To reduce the potential computational overhead while integrating multiple experts, we introduce bidirectional spatiotemporal Mamba as experts, each sharing bidirectional temporal and spatial Mamba in distinct combinations to achieve model efficiency and parameter economy. Extensive experiments on four multi-person benchmark datasets demonstrate that our approach not only outperforms the state of the art in accuracy but also reduces model parameters by 41.38% and achieves a 3.6× speedup in training.

IJCAI Conference 2025 Conference Paper

Adaptive Gradient Learning for Spiking Neural Networks by Exploiting Membrane Potential Dynamics

  • Jiaqiang Jiang
  • Lei Wang
  • Runhao Jiang
  • Jing Fan
  • Rui Yan

Recent advancements have focused on directly training high-performance spiking neural networks (SNNs) by estimating the approximate gradients of spiking activity through a continuous function with constant sharpness, known as surrogate gradient (SG) learning. However, as spikes propagate within neurons and among layers, the distribution of membrane potential dynamics (MPD) will deviate from the gradient-available interval of fixed SG, hindering SNNs from searching the optimal solution space. To maintain the stability of gradient flows, SG needs to align with evolving MPD. Here, we propose a novel adaptive gradient learning for SNNs by exploiting MPD, namely MPD-AGL. It fully accounts for the underlying factors contributing to membrane potential shifts and establishes a dynamic association between SG and MPD at different timesteps to relax gradient estimation, which provides a new degree of freedom for SG learning. Experimental results demonstrate that our method achieves excellent performance at low latency. Moreover, it increases the proportion of neurons that fall into the gradient-available interval compared to fixed SG, effectively mitigating the gradient vanishing problem. Code is available at https://github.com/jqjiang1999/MPD-AGL.
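The gradient-available interval the abstract refers to can be made concrete with a rectangular surrogate gradient. The adaptation rule below (widening the interval with the membrane-potential spread) is an illustrative assumption, not the paper's exact formula:

```python
import numpy as np

def rectangular_sg(v: np.ndarray, theta: float, width: float) -> np.ndarray:
    """Rectangular surrogate gradient: approximate d(spike)/dv as
    1/width inside the interval |v - theta| < width/2, else 0."""
    return (np.abs(v - theta) < width / 2).astype(float) / width

def adaptive_width(v: np.ndarray, k: float = 2.0) -> float:
    """Hypothetical adaptation rule: scale the gradient-available
    interval with the spread of the membrane potential distribution."""
    return k * float(np.std(v))

# membrane potentials that have drifted away from the firing threshold
v = np.random.default_rng(0).normal(loc=1.5, scale=0.8, size=10_000)
theta = 1.0
fixed = rectangular_sg(v, theta, width=1.0)
adapted = rectangular_sg(v, theta, width=adaptive_width(v))
# fraction of neurons inside the gradient-available interval
fixed_frac, adapted_frac = (fixed > 0).mean(), (adapted > 0).mean()
```

When the potentials drift, a fixed window leaves many neurons with zero gradient; tracking the MPD keeps more of them trainable, which is the effect the abstract describes.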

AAAI Conference 2025 Conference Paper

BiDeV: Bilateral Defusing Verification for Complex Claim Fact-Checking

  • Yuxuan Liu
  • Hongda Sun
  • Wenya Guo
  • Xinyan Xiao
  • Cunli Mao
  • Zhengtao Yu
  • Rui Yan

Complex claim fact-checking performs a crucial role in disinformation detection. However, existing fact-checking methods struggle with claim vagueness, specifically in effectively handling latent information and complex relations within claims. Moreover, evidence redundancy, where non-essential information complicates the verification process, remains a significant issue. To tackle these limitations, we propose Bilateral Defusing Verification (BiDeV), a novel fact-checking workflow framework integrating multiple role-played LLMs to mimic the human-expert fact-checking process. BiDeV consists of two main modules: Vagueness Defusing identifies latent information and resolves complex relations to simplify the claim, and Redundancy Defusing eliminates redundant content to enhance the evidence quality. Extensive experimental results on two widely used challenging fact-checking benchmarks (Hover and Feverous-s) demonstrate that our BiDeV can achieve the best performance under both gold and open settings. This highlights the effectiveness of BiDeV in handling complex claims and ensuring precise fact-checking.

ICRA Conference 2025 Conference Paper

Brain-Inspired Spatial Continuous State Encoding for Efficient Spiking-Based Navigation

  • Qingao Chai
  • Jiashuo Wang
  • Runhao Jiang
  • Bo Yang
  • Rui Yan
  • Huajin Tang

Spiking neural networks (SNNs) show great potential in mapless navigation tasks due to their low power consumption, but the continuous representation of spatial information poses a challenge to SNN training. Neuroscience findings reveal that spatial cognition cells encode spatial information through population spike patterns. Inspired by this, we propose a navigation method based on SNNs, leveraging spatial cognition cells, which include grid cells (GCs), head direction cells (HDCs), and boundary vector cells (BVCs). Our method integrates spike-based information to achieve precise navigation goal encoding and egocentric environment perception, significantly improving SNN navigation capabilities in complex environments. Simulation and real-world experiments demonstrate that our method achieves significant improvements in navigation success rate and energy efficiency, showcasing superior adaptability across environments. Our work provides a novel approach to developing efficient brain-inspired navigation systems.

NeurIPS Conference 2025 Conference Paper

DAPO: Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage-Based Policy Optimization

  • Jiacai Liu
  • Chaojie Wang
  • Chris Liu
  • Liang Zeng
  • Rui Yan
  • Yiwen Sun
  • Yang Liu

The role of reinforcement learning (RL) in enhancing the reasoning of large language models (LLMs) is becoming increasingly significant. Despite the success of RL in many scenarios, there are still many challenges in improving the reasoning of LLMs. One key challenge is the sparse reward, which introduces more training variance in policy optimization and makes it difficult to obtain a good estimation for the value function in Actor-Critic (AC) methods. To address these issues, we introduce Direct Advantage-Based Policy Optimization (DAPO), a novel step-level offline RL algorithm with theoretical guarantees for enhancing the reasoning abilities of LLMs. Unlike response-level methods (such as DPO and GRPO), in which the update directions of all reasoning steps are uniformly governed by the outcome reward, DAPO employs a critic function to provide step-level dense signals for policy optimization. Additionally, the actor and critic in DAPO are trained independently, ensuring that the critic is a good estimate of the true state value function and avoiding the co-training instability observed in standard AC methods. We train DAPO on mathematical and code problems and then evaluate its performance on multiple benchmarks. Our results show that DAPO can effectively enhance the mathematical and code capabilities of both SFT models and RL models, demonstrating the effectiveness of DAPO.

IJCAI Conference 2025 Conference Paper

GETMusic: Generating Music Tracks with a Unified Representation and Diffusion Framework

  • Ang Lv
  • Xu Tan
  • Peiling Lu
  • Wei Ye
  • Shikun Zhang
  • Jiang Bian
  • Rui Yan

Symbolic music generation aims to create musical notes, which can help users compose music, such as generating target instrument tracks based on provided source tracks. In practical scenarios where there’s a predefined ensemble of tracks and various composition needs, an efficient and effective generative model that can generate any target tracks based on the other tracks becomes crucial. However, previous efforts have fallen short in addressing this necessity due to limitations in their music representations and models. In this paper, we introduce a framework known as GETMusic, with ``GET'' standing for ``GEnerate music Tracks''. This framework encompasses a novel music representation ``GETScore'' and a diffusion model ``GETDiff''. GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time. At a training step, each track of a music piece is randomly selected as either the target or source. The training involves two processes: in the forward process, target tracks are corrupted by masking their tokens, while source tracks remain as the ground truth; in the denoising process, GETDiff is trained to predict the masked target tokens conditioning on the source tracks. Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with arbitrary source-target track combinations. Our experiments demonstrate that the versatile GETMusic outperforms prior works proposed for certain specific composition tasks.
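The track-masking setup described above can be sketched on a toy 2D score. The mask-token id and the helper name are hypothetical, not from the paper:

```python
import numpy as np

MASK = -1  # hypothetical mask-token id

def make_training_pair(score: np.ndarray, target_track: int):
    """score: (tracks, time) grid of token ids, GETScore-style --
    tracks stacked vertically, time progressing horizontally.
    The forward (corruption) process masks the target track's tokens
    while source tracks stay as ground truth."""
    corrupted = score.copy()
    corrupted[target_track] = MASK
    # the denoising model would be trained to predict these masked
    # tokens conditioned on the uncorrupted source tracks
    return corrupted, score[target_track]
```

Randomly choosing `target_track` per training step is what lets one model later generate any source-target track combination.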

AAAI Conference 2025 Conference Paper

GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL

  • Lang Qin
  • Ziming Wang
  • Runhao Jiang
  • Rui Yan
  • Huajin Tang

Spiking neural networks (SNNs) are widely applied in various fields due to their energy-efficient and fast-inference capabilities. Applying SNNs to reinforcement learning (RL) can significantly reduce the computational resource requirements for agents and improve the algorithm's performance under resource-constrained conditions. However, in current spiking reinforcement learning (SRL) algorithms, the simulation results of multiple time steps can only correspond to a single-step decision in RL. This is quite different from the real temporal dynamics in the brain and also fails to fully exploit the capacity of SNNs to process temporal data. In order to address this temporal mismatch issue and further take advantage of the inherent temporal dynamics of spiking neurons, we propose a novel temporal alignment paradigm (TAP) that leverages the single-step update of spiking neurons to accumulate historical state information in RL and introduces gated units to enhance the memory capacity of spiking neurons. Experimental results show that our method can solve partially observable Markov decision processes (POMDPs) and multi-agent cooperation problems with similar performance as recurrent neural networks (RNNs) but with about 50% power consumption.

ICLR Conference 2025 Conference Paper

Memory Efficient Transformer Adapter for Dense Predictions

  • Dong Zhang
  • Rui Yan
  • Pingcheng Dong
  • Kwang-Ting (Tim) Cheng

While current Vision Transformer (ViT) adapter methods have shown promising accuracy, their inference speed is implicitly hindered by inefficient memory access operations, e.g., standard normalization and frequent reshaping. In this work, we propose META, a simple and fast ViT adapter that can improve the model's memory efficiency and decrease memory time consumption by reducing the inefficient memory access operations. Our method features a memory-efficient adapter block that enables the common sharing of layer normalization between the self-attention and feed-forward network layers, thereby reducing the model's reliance on normalization operations. Within the proposed block, the cross-shaped self-attention is employed to reduce the model's frequent reshaping operations. Moreover, we augment the adapter block with a lightweight convolutional branch that can enhance local inductive biases, particularly beneficial for the dense prediction tasks, e.g., object detection, instance segmentation, and semantic segmentation. The adapter block is finally formulated in a cascaded manner to compute diverse head features, thereby enriching the variety of feature representations. Empirically, extensive evaluations on multiple representative datasets validate that META substantially enhances the predicted quality, while achieving a new state-of-the-art accuracy-efficiency trade-off. Theoretically, we demonstrate that META exhibits superior generalization capability and stronger adaptability.

NeurIPS Conference 2025 Conference Paper

PolarQuant: Leveraging Polar Transformation for Key Cache Quantization and Decoding Acceleration

  • Songhao Wu
  • Ang Lv
  • xiao feng
  • Yufei Zhang
  • Xun Zhang
  • Guojun Yin
  • Wei Lin
  • Rui Yan

The increasing demand for long-context generation has made the KV cache in large language models a bottleneck in memory consumption. Quantizing the cache to lower bit widths is an effective way to reduce memory costs; however, previous methods struggle with key cache quantization due to outliers, resulting in suboptimal performance. We propose a novel quantization approach, PolarQuant, which provides a new perspective on key cache quantization and efficiently addresses the outlier dilemma. We observe that the distribution of the key states reveals well-structured patterns under polar transformation. Outliers generally appear in only one of the two dimensions, which are rotated together by a specific angle when rotary position embeddings are applied. When represented as two-dimensional vectors, these dimensions exhibit well-organized patterns, with radii and angles smoothly distributed in polar space. This alleviates the channel-wise outliers, making them well-suited for key cache quantization. PolarQuant divides key vectors into groups of two-dimensional sub-vectors, encoding them as the quantized radius and the polar angle, rather than quantizing original key vectors directly. PolarQuant achieves superior efficiency in KV cache quantization and accelerates the decoding process by turning the query-key inner product into a table lookup, all while maintaining the downstream performance of full-precision models. Our code is available at https://github.com/ericshwu/PolarQuant.
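The radius-angle encoding of 2-D sub-vectors can be sketched as below. The uniform rounding scheme, per-tensor radius scale, and bit widths are assumptions for illustration, not PolarQuant's actual codebook design:

```python
import numpy as np

def polar_quantize(keys: np.ndarray, r_bits: int = 4, a_bits: int = 4):
    """Group a (..., 2*n) key tensor into n two-dimensional sub-vectors
    and encode each as a quantized (radius, angle) pair."""
    pairs = keys.reshape(*keys.shape[:-1], -1, 2)
    radius = np.linalg.norm(pairs, axis=-1)
    angle = np.arctan2(pairs[..., 1], pairs[..., 0])  # in (-pi, pi]
    r_max = radius.max()  # per-tensor scale (an assumption)
    r_q = np.round(radius / r_max * (2**r_bits - 1)).astype(np.uint8)
    a_q = np.round((angle + np.pi) / (2 * np.pi) * (2**a_bits - 1)).astype(np.uint8)
    return r_q, a_q, r_max

def polar_dequantize(r_q, a_q, r_max, r_bits: int = 4, a_bits: int = 4):
    """Reconstruct approximate key values from the polar codes."""
    radius = r_q / (2**r_bits - 1) * r_max
    angle = a_q / (2**a_bits - 1) * 2 * np.pi - np.pi
    pairs = np.stack([radius * np.cos(angle), radius * np.sin(angle)], axis=-1)
    return pairs.reshape(*pairs.shape[:-2], -1)
```

Because an outlier inflates only the radius of its own pair, the angle stays smoothly distributed, which is the property the abstract exploits; the small discrete (radius, angle) alphabet is also what makes the query-key inner product amenable to a table lookup.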

IJCAI Conference 2025 Conference Paper

Reliable and Diverse Hierarchical Adapter for Zero-shot Video Classification

  • Wenxuan Ge
  • Peng Huang
  • Rui Yan
  • Hongyu Qu
  • Guosen Xie
  • Xiangbo Shu

Adapting pre-trained vision-language models to downstream tasks has emerged as a novel paradigm for zero-shot learning. Existing test-time adaptation (TTA) methods such as TPT attempt to fine-tune visual or textual representations to accommodate downstream tasks but still require expensive optimization costs. To this end, Training-free Dynamic Adapter (TDA) maintains a cache containing visual features for each category in a parameter-free manner and measures sample confidence based on prediction entropy of test samples. Inspired by TDA, this work aims to develop the first training-free adapter for zero-shot video classification. Capturing the intrinsic temporal relationships within video data to construct and maintain the video cache is key to extending TDA to the video domain. In this work, we propose a reliable and diverse Hierarchical Adapter for zero-shot video classification, which consists of Frame-level Cache Refiner and Video-level Cache Updater. Before each video sample enters the corresponding cache, it needs to be refined at frame level based on prediction entropy and temporal probability difference. Due to the limited capacity of the cache, we update the cache during inference based on the principle of diversity. Experiments on four popular video classification benchmarks demonstrate the effectiveness of Hierarchical Adapter. The code is available at https://github.com/Gwxer/Hierarchical-Adapter.

IJCAI Conference 2025 Conference Paper

TEST-V: TEst-time Support-set Tuning for Zero-shot Video Classification

  • Rui Yan
  • Jin Wang
  • Hongyu Qu
  • Xiaoyu Du
  • Dong Zhang
  • Jinhui Tang
  • Tieniu Tan

Recently, adapting Vision Language Models (VLMs) to zero-shot visual classification by tuning class embedding with a few prompts (Test-time Prompt Tuning, TPT) or replacing class names with generated visual samples (support-set) has shown promising results. However, TPT cannot avoid the semantic gap between modalities while the support-set cannot be tuned. To this end, we draw on each other's strengths and propose a novel framework, namely TEst-time Support-set Tuning for zero-shot Video Classification (TEST-V). It first dilates the support-set with multiple prompts (Multi-prompting Support-set Dilation, MSD) and then erodes the support-set via learnable weights to mine key cues dynamically (Temporal-aware Support-set Erosion, TSE). Specifically, i) MSD expands the support samples for each class based on multiple prompts inquired from LLMs to enrich the diversity of the support-set. ii) TSE tunes the support-set with factorized learnable weights according to the temporal prediction consistency in a self-supervised manner to dig pivotal supporting cues for each class. TEST-V achieves state-of-the-art results across four benchmarks and shows good interpretability.

NeurIPS Conference 2025 Conference Paper

Vision-centric Token Compression in Large Language Model

  • Ling Xing
  • Alex Jinpeng Wang
  • Rui Yan
  • Xiangbo Shu
  • Jinhui Tang

Real-world applications are stretching context windows to hundreds of thousands of tokens while Large Language Models (LLMs) swell from billions to trillions of parameters. This dual expansion sends compute and memory costs skyrocketing, making token compression indispensable. We introduce Vision-centric Token Compression (Vist), a slow-fast compression framework that mirrors human reading: the fast path renders distant tokens into images, letting a frozen, lightweight vision encoder skim the low-salience context; the slow path feeds the proximal window into the LLM for fine-grained reasoning. A Probability-Informed Visual Enhancement (PVE) objective masks high-frequency tokens during training, steering the Resampler to concentrate on semantically rich regions, just as a skilled reader glosses over function words. On eleven in-context learning benchmarks, Vist achieves the same accuracy with 2.3× fewer tokens, cutting FLOPs by 16% and memory by 50%. It outperforms the strongest text encoder-based compression method, CEPE, by 7.6% on average over benchmarks like TriviaQA, NQ, PopQA, NLUI, and CLIN, setting a new standard for token efficiency in LLMs. The project is at https://github.com/CSU-JPG/VIST.

NeurIPS Conference 2025 Conference Paper

You Only Communicate Once: One-shot Federated Low-Rank Adaptation of MLLM

  • Binqian Xu
  • Haiyang Mei
  • Zechen Bai
  • Jinjin Gong
  • Rui Yan
  • Guosen Xie
  • Yazhou Yao
  • Basura Fernando

Multimodal Large Language Models (MLLMs) with Federated Learning (FL) can quickly adapt to privacy-sensitive tasks, but face significant challenges such as high communication costs and increased attack risks, due to their reliance on multi-round communication. To address this, One-shot FL (OFL) has emerged, aiming to complete adaptation in a single client-server communication. However, existing adaptive ensemble OFL methods still need more than one round of communication, because correcting heterogeneity-induced local bias relies on aggregated global supervision, meaning they still do not achieve true one-shot communication. In this work, we make the first attempt to achieve true one-shot communication for MLLMs under OFL, by investigating whether implicit (i.e., initial rather than aggregated) global supervision alone can effectively correct local training bias. Our key finding from the empirical study is that imposing directional supervision on local training substantially mitigates client conflicts and local bias. Building on this insight, we propose YOCO, in which directional supervision with sign-regularized LoRA B enforces global consistency, while sparsely regularized LoRA A preserves client-specific adaptability. Experiments demonstrate that YOCO cuts communication to ~0.03% of multi-round FL while surpassing those methods in several multimodal scenarios and consistently outperforming all one-shot competitors.

NeurIPS Conference 2024 Conference Paper

CausalStock: Deep End-to-end Causal Discovery for News-driven Multi-stock Movement Prediction

  • Shuqi Li
  • Yuebo Sun
  • Yuxin Lin
  • Xin Gao
  • Shuo Shang
  • Rui Yan

Two issues in news-driven multi-stock movement prediction remain poorly addressed in existing work. On the one hand, "relation discovery" is a pivotal part when leveraging the price information of other stocks to achieve accurate stock movement prediction. Given that stock relations are often unidirectional, such as the "supplier-consumer" relationship, causal relations are more appropriate to capture the impact between stocks. On the other hand, substantial noise in the news data makes it difficult to extract effective information. With these two issues in mind, we propose a novel framework called CausalStock for news-driven multi-stock movement prediction, which discovers the temporal causal relations between stocks. We design a lag-dependent temporal causal discovery mechanism to model the temporal causal graph distribution. Then a Functional Causal Model is employed to encapsulate the discovered causal relations and predict the stock movements. Additionally, we propose a Denoised News Encoder by taking advantage of the excellent text evaluation ability of large language models (LLMs) to extract useful information from massive news data. The experiment results show that CausalStock outperforms the strong baselines for both news-driven multi-stock movement prediction and multi-stock movement prediction tasks on six real-world datasets collected from the US, China, Japan, and UK markets. Moreover, benefiting from the discovered causal relations, CausalStock offers a clear prediction mechanism with good explainability.

AAAI Conference 2024 Conference Paper

Collaborative Synthesis of Patient Records through Multi-Visit Health State Inference

  • Hongda Sun
  • Hongzhan Lin
  • Rui Yan

Electronic health records (EHRs) have become the foundation of machine learning applications in healthcare, while the utility of real patient records is often limited by privacy and security concerns. Synthetic EHR generation provides an additional perspective to compensate for this limitation. Most existing methods synthesize new records based on real EHR data without considering the different types of events in EHR data, and thus cannot control the event combinations in line with medical common sense. In this paper, we propose MSIC, a Multi-visit health Status Inference model for Collaborative EHR synthesis to address these limitations. First, we formulate the synthetic EHR generation process as a probabilistic graphical model and tightly connect different types of events by modeling the latent health states. Then, we derive a health state inference method tailored for the multi-visit scenario to effectively utilize previous records to synthesize current and future records. Furthermore, we propose to generate medical reports to add textual descriptions for each medical event, providing broader applications for synthesized EHR data. For generating different paragraphs in each visit, we incorporate a multi-generator deliberation framework to coordinate the message passing of multiple generators and employ a two-phase decoding strategy to generate high-quality reports. Our extensive experiments on the widely used benchmarks, MIMIC-III and MIMIC-IV, demonstrate that MSIC advances state-of-the-art results on the quality of synthetic data while maintaining low privacy risks.

IJCAI Conference 2024 Conference Paper

DTS-TPT: Dual Temporal-Sync Test-time Prompt Tuning for Zero-shot Activity Recognition

  • Rui Yan
  • Hongyu Qu
  • Xiangbo Shu
  • Wenbin Li
  • Jinhui Tang
  • Tieniu Tan

Finetuning the large vision-language models on video data with a set of learnable prompts has shown promising performance on zero-shot activity recognition but still requires extra video data and expensive training costs. Inspired by recent Test-time Prompt Tuning (TPT) on the image domain, this work attempts to extend TPT to video data for zero-shot activity recognition. However, monotonous spatial augmentation and short class names cannot meet the need to capture diverse and complicated semantics of human behavior during prompt tuning. To this end, this work proposes a Dual Temporal-Sync Test-time Prompt Tuning (DTS-TPT) framework for zero-shot activity recognition. DTS-TPT tunes the learnable prompts appended to text inputs on video feature sequences of different temporal scales in multiple steps during test time. In each tuning step, we maximize the semantic consistency among the predictions from video feature sequences randomly augmented via AugMix with both original class names and the corresponding descriptions generated through an LLM. Compared with the state-of-the-art methods, the proposed method improves the zero-shot top-1 accuracy by approximately 2%~5% on popular benchmarks. The code is available at https://github.com/quhongyu/DTS-TPT.

AAAI Conference 2024 Conference Paper

Enhancing Job Recommendation through LLM-Based Generative Adversarial Networks

  • Yingpeng Du
  • Di Luo
  • Rui Yan
  • Xiaopei Wang
  • Hongzhi Liu
  • Hengshu Zhu
  • Yang Song
  • Jie Zhang

Recommending suitable jobs to users is a critical task in online recruitment platforms. However, existing job recommendation methods encounter challenges such as the low quality of users' resumes, which hampers their accuracy and practical effectiveness. With the rapid development of large language models (LLMs), utilizing the rich external knowledge encapsulated within them, as well as their powerful reasoning capabilities, is a promising way to complete users' resumes for more accurate recommendations. However, directly leveraging LLMs to enhance recommendation results is not a one-size-fits-all solution, as LLMs may suffer from fabricated generation and few-shot problems, which degrade the quality of resume completion. In this paper, we propose a novel LLM-based approach for job recommendation. To alleviate the limitation of fabricated generation for LLMs, we extract accurate and valuable information beyond users' self-description, which helps the LLMs better profile users for resume completion. Specifically, we not only extract users' explicit properties (e.g., skills, interests) from their self-description but also infer users' implicit characteristics from their behaviors for more accurate and meaningful resume completion. Nevertheless, some users still suffer from few-shot problems, which arise due to scarce interaction records, leading to limited guidance for high-quality resume generation. To address this issue, we propose aligning unpaired low-quality resumes with high-quality generated resumes via Generative Adversarial Networks (GANs), which can refine the resume representations for better recommendation results. Extensive experiments on three large real-world recruitment datasets demonstrate the effectiveness of our proposed method.

IJCAI Conference 2024 Conference Paper

From Skepticism to Acceptance: Simulating the Attitude Dynamics Toward Fake News

  • Yuhan Liu
  • Xiuying Chen
  • Xiaoqing Zhang
  • Xing Gao
  • Ji Zhang
  • Rui Yan

In the digital era, the rapid propagation of fake news and rumors via social networks brings notable societal challenges and impacts public opinion regulation. Traditional fake news modeling typically forecasts the general popularity trends of different groups or numerically represents opinion shifts. However, these methods often oversimplify real-world complexities and overlook the rich semantic information of news text. The advent of large language models (LLMs) provides the possibility of modeling subtle dynamics of opinion. Consequently, in this work, we introduce a Fake news Propagation Simulation framework (FPS) based on LLM, which studies the trends and control of fake news propagation in detail. Specifically, each agent in the simulation represents an individual with a distinct personality. They are equipped with both short-term and long-term memory, as well as a reflective mechanism to mimic human-like thinking. Every day, they engage in random opinion exchanges, reflect on their thinking, and update their opinions. Our simulation results uncover patterns in fake news propagation related to topic relevance and individual traits, aligning with real-world observations. Additionally, we evaluate various intervention strategies and demonstrate that early and appropriately frequent interventions strike a balance between governance cost and effectiveness, offering valuable insights for practical applications. Our study underscores the significant utility and potential of LLMs in combating fake news.

NeurIPS Conference 2024 Conference Paper

Mixture of In-Context Experts Enhance LLMs' Long Context Awareness

  • Hongzhan Lin
  • Ang Lv
  • Yuhan Chen
  • Chen Zhu
  • Yang Song
  • Hengshu Zhu
  • Rui Yan

Many studies have revealed that large language models (LLMs) exhibit uneven awareness of different contextual positions. Their limited context awareness can lead to overlooking critical information and subsequent task failures. While several approaches have been proposed to enhance LLMs' context awareness, achieving both effectiveness and efficiency remains challenging. In this paper, for LLMs utilizing RoPE as position embeddings, we introduce a novel method called "Mixture of In-Context Experts" (MoICE) to address this challenge. MoICE comprises two key components: a router integrated into each attention head within LLMs and a lightweight router-only training optimization strategy: (1) MoICE views each RoPE angle as an 'in-context' expert, demonstrated to be capable of directing the attention of a head to specific contextual positions. Consequently, each attention head flexibly processes tokens using multiple RoPE angles dynamically selected by the router to attend to the needed positions. This approach mitigates the risk of overlooking essential contextual information. (2) The router-only training strategy entails freezing LLM parameters and exclusively updating routers for only a few steps. When applied to open-source LLMs including Llama and Mistral, MoICE surpasses prior methods across multiple tasks on long context understanding and generation, all while maintaining commendable inference efficiency.
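A minimal 2-D illustration of the RoPE mechanism that MoICE builds on (not the MoICE router itself): rotating query/key vectors by position-dependent angles makes their dot product depend only on relative position, and different base angles, which MoICE treats as "in-context experts", emphasize different relative offsets. The vectors and base angle below are made-up values.

```python
import math

def rotate(v, theta):
    """Rotate a 2-D vector by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def score(q, k, pos_q, pos_k, base):
    """Attention score between a RoPE-rotated query and key."""
    qr = rotate(q, base * pos_q)
    kr = rotate(k, base * pos_k)
    return qr[0] * kr[0] + qr[1] * kr[1]

q = k = (1.0, 0.0)
# With identical vectors, the score depends only on the relative offset:
assert abs(score(q, k, 5, 3, 0.5) - math.cos(0.5 * 2)) < 1e-9
```

Sweeping the base angle shifts which relative offsets receive the highest scores, which is the sense in which each RoPE angle can "direct" a head's attention to particular contextual positions.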

IJCAI Conference 2024 Conference Paper

Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation

  • Ang Lv
  • Xu Tan
  • Tao Qin
  • Tie-Yan Liu
  • Rui Yan

Current lyric-to-melody generation methods struggle with the lack of paired lyric-melody data to train, and the lack of adherence to composition guidelines, resulting in melodies that do not sound human-composed. To address these issues, we propose a novel paradigm called Re-creation of Creations (ROC) that combines the strengths of both rule-based and neural-based methods. ROC consists of a two-stage generation-retrieval pipeline: the creation and re-creation stages. In the creation stage, we train a melody language model using melody data to generate high-quality music fragments, which are stored in a database indexed by key features. In the re-creation stage, users provide lyrics and a preferred chord progression, and ROC infers melody features for each lyric sentence. By querying the database, we obtain relevant melody fragments that satisfy composition guidelines, and these candidates are filtered, re-ranked, and concatenated based on the guidelines and the melody language model scores. ROC offers two main advantages: it does not require paired lyric-melody data, and it incorporates commonly used composition guidelines, resulting in music that sounds more human-composed with better controllability. Both objective and subjective evaluation results on English and Chinese lyrics show the effectiveness of ROC.

JBHI Journal 2024 Journal Article

Sparse and Hierarchical Transformer for Survival Analysis on Whole Slide Images

  • Rui Yan
  • Zhilong Lv
  • Zhidong Yang
  • Senlin Lin
  • Chunhou Zheng
  • Fa Zhang

The Transformer-based methods provide a good opportunity for modeling the global context of gigapixel whole slide image (WSI), however, there are still two main problems in applying Transformer to WSI-based survival analysis task. First, the training data for survival analysis is limited, which makes the model prone to overfitting. This problem is even worse for Transformer-based models which require large-scale data to train. Second, WSI is of extremely high resolution (up to 150,000 × 150,000 pixels) and is typically organized as a multi-resolution pyramid. Vanilla Transformer cannot model the hierarchical structure of WSI (such as patch cluster-level relationships), which makes it incapable of learning hierarchical WSI representation. To address these problems, in this article, we propose a novel Sparse and Hierarchical Transformer (SH-Transformer) for survival analysis. Specifically, we introduce sparse self-attention to alleviate the overfitting problem, and propose a hierarchical Transformer structure to learn the hierarchical WSI representation. Experimental results based on three WSI datasets show that the proposed framework outperforms the state-of-the-art methods.

NeurIPS Conference 2024 Conference Paper

StreamingDialogue: Prolonged Dialogue Learning via Long Context Compression with Minimal Losses

  • Jia-Nan Li
  • Quan Tu
  • Cunli Mao
  • Zhengtao Yu
  • Ji-Rong Wen
  • Rui Yan

Standard Large Language Models (LLMs) struggle with handling dialogues with long contexts due to efficiency and consistency issues. According to our observation, dialogue contexts are highly structured, and the special token of End-of-Utterance (EoU) in dialogues has the potential to aggregate information. We refer to the EoU tokens as ``conversational attention sinks'' (conv-attn sinks). Accordingly, we introduce StreamingDialogue, which compresses long dialogue history into conv-attn sinks with minimal losses, and thus reduces computational complexity quadratically with the number of sinks (i.e., the number of utterances). Current LLMs already demonstrate the ability to handle long context windows, e.g., a window size of 200K or more. To this end, by compressing utterances into EoUs, our method has the potential to handle more than 200K utterances, resulting in prolonged dialogue learning. In order to minimize information losses from reconstruction after compression, we design two learning strategies of short-memory reconstruction (SMR) and long-memory reactivation (LMR). Our method outperforms strong baselines in dialogue tasks and achieves a 4× speedup while reducing memory usage by 18× compared to dense attention recomputation.
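A back-of-envelope version of the complexity argument above: dense attention over T tokens costs on the order of T² score computations, while attending only to S conv-attn sinks (roughly one per utterance) costs on the order of T·S. The token and utterance counts below are illustrative.

```python
# Rough attention-cost comparison: dense vs. sink-only attention.
def dense_cost(T):
    return T * T          # every token attends to every token

def sink_cost(T, S):
    return T * S          # every token attends only to S sinks

T, S = 10_000, 100        # e.g. 100 utterances of ~100 tokens each
print(dense_cost(T) // sink_cost(T, S))  # 100x fewer score computations
```

This is why the savings grow with the ratio of tokens to utterances, independent of the specific learning strategies (SMR/LMR) used to limit information loss.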

AAAI Conference 2024 Conference Paper

Successive POI Recommendation via Brain-Inspired Spatiotemporal Aware Representation

  • Gehua Ma
  • He Wang
  • Jingyuan Zhao
  • Rui Yan
  • Huajin Tang

Existing approaches usually perform spatiotemporal representation in the spatial and temporal dimensions, respectively, which isolates the spatial and temporal natures of the target and leads to sub-optimal embeddings. Neuroscience research has shown that the mammalian brain entorhinal-hippocampal system provides efficient graph representations for general knowledge. Moreover, entorhinal grid cells present concise spatial representations, while hippocampal place cells represent perception conjunctions effectively. Thus, the entorhinal-hippocampal system provides a novel angle for spatiotemporal representation, which inspires us to propose the SpatioTemporal aware Embedding framework (STE) and apply it to POIs (STEP). STEP considers two types of POI-specific representations: sequential representation and spatiotemporal conjunctive representation, learned using sparse unlabeled data based on the proposed graph-building policies. Notably, STEP jointly represents the spatiotemporal natures of POIs using both observations and contextual information from integrated spatiotemporal dimensions by constructing a spatiotemporal context graph. Furthermore, we introduce a successive POI recommendation method using STEP, which achieves state-of-the-art performance on two benchmarks. In addition, we demonstrate the excellent performance of the STE representation approach in other spatiotemporal representation-centered tasks through a case study of the traffic flow prediction problem. Therefore, this work provides a novel solution to spatiotemporal representation and paves a new way for spatiotemporal modeling-related tasks.

AAAI Conference 2024 Conference Paper

What Makes Quantization for Large Language Model Hard? An Empirical Study from the Lens of Perturbation

  • Zhuocheng Gong
  • Jiahao Liu
  • Jingang Wang
  • Xunliang Cai
  • Dongyan Zhao
  • Rui Yan

Quantization has emerged as a promising technique for improving the memory and computational efficiency of large language models (LLMs). Though the trade-off between performance and efficiency is well-known, there is still much to be learned about the relationship between quantization and LLM performance. To shed light on this relationship, we propose a new perspective on quantization, viewing it as perturbations added to the weights and activations of LLMs. We call this approach ``the lens of perturbation". Using this lens, we conduct experiments with various artificial perturbations to explore their impact on LLM performance. Our findings reveal several connections between the properties of perturbations and LLM performance, providing insights into the failure cases of uniform quantization and suggesting potential solutions to improve the robustness of LLM quantization. To demonstrate the significance of our findings, we implement a simple non-uniform quantization approach based on our insights. Our experiments show that this approach achieves minimal performance degradation on both 4-bit weight quantization and 8-bit quantization for weights and activations. These results validate the correctness of our approach and highlight its potential to improve the efficiency of LLMs without sacrificing performance.
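To make the "lens of perturbation" concrete: uniform quantization of a weight w with step size Δ can be written as q(w) = round(w/Δ)·Δ, so the induced perturbation δ = q(w) − w is bounded by Δ/2. The step size and weight value below are illustrative, not from the paper's experiments.

```python
# Uniform quantization viewed as bounded additive perturbation.
def uniform_quantize(w, step):
    return round(w / step) * step

w, step = 0.337, 0.1
q = uniform_quantize(w, step)
delta = q - w                      # the induced "perturbation"
assert abs(delta) <= step / 2      # uniform quantization error is bounded
print(round(delta, 3))
```

Artificial perturbations of the same magnitude but different structure can then be injected in place of δ to study which properties of the noise actually hurt performance.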

AAAI Conference 2024 Conference Paper

Your Career Path Matters in Person-Job Fit

  • Zhuocheng Gong
  • Yang Song
  • Tao Zhang
  • Ji-Rong Wen
  • Dongyan Zhao
  • Rui Yan

We are again confronted with one of the most vexing aspects of the advancement of technology: automation and AI technology cause the devaluation of human labor, resulting in unemployment. With this background, automatic person-job fit systems are promising solutions to promote the employment rate. The purpose of person-job fit is to calculate a matching score between the job seeker's resume and the job posting, determining whether the job seeker is suitable for the position. In this paper, we propose a new approach to person-job fit that characterizes the hidden preference derived from the job seeker's career path. We categorize and utilize three types of preferences in the career path: consistency, likeness, and continuity. We prove that understanding the career path enables us to provide more appropriate career suggestions to job seekers. To demonstrate the practical value of our proposed model, we conduct extensive experiments on real-world data extracted from an online recruitment platform and then present detailed cases to show how the career path matters in person-job fit.

IJCAI Conference 2023 Conference Paper

A Low Latency Adaptive Coding Spike Framework for Deep Reinforcement Learning

  • Lang Qin
  • Rui Yan
  • Huajin Tang

In recent years, spiking neural networks (SNNs) have been used in reinforcement learning (RL) due to their low power consumption and event-driven features. However, spiking reinforcement learning (SRL), which suffers from fixed coding methods, still faces the problems of high latency and poor versatility. In this paper, we use learnable matrix multiplication to encode and decode spikes, improving the flexibility of the coders and thus reducing latency. Meanwhile, we train the SNNs using the direct training method and use two different structures for online and offline RL algorithms, which gives our model a wider range of applications. Extensive experiments have revealed that our method achieves optimal performance with ultra-low latency (as low as 0.8% of that of other SRL methods) and excellent energy efficiency (up to 5× that of DNNs) in different algorithms and different environments.
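A minimal sketch of the "learnable matrix multiplication" coding idea above: a state vector is projected by a (trainable) matrix into per-neuron input currents, and a neuron emits a spike when its current crosses a threshold. The matrix values, threshold, and function names are illustrative assumptions.

```python
# Toy spike encoder: project state via a matrix, then threshold to spikes.
def encode(state, W, threshold=1.0):
    currents = [sum(w * s for w, s in zip(row, state)) for row in W]
    return [1 if c >= threshold else 0 for c in currents]

W = [[0.5, 0.5], [1.0, -1.0]]     # would be learned in the actual method
print(encode([1.0, 1.2], W))      # -> spikes [1, 0]
```

Because W is trained jointly with the network rather than fixed (as in rate or population coding), the encoder can adapt the number of timesteps it needs, which is the claimed source of the latency reduction.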

AAAI Conference 2023 Conference Paper

ConvNTM: Conversational Neural Topic Model

  • Hongda Sun
  • Quan Tu
  • Jinpeng Li
  • Rui Yan

Topic models have been thoroughly investigated for multiple years due to their great potential in analyzing and understanding texts. Recently, researchers combine the study of topic models with deep learning techniques, known as Neural Topic Models (NTMs). However, existing NTMs are mainly tested based on general document modeling without considering different textual analysis scenarios. We assume that there are different characteristics to model topics in different textual analysis tasks. In this paper, we propose a Conversational Neural Topic Model (ConvNTM) designed in particular for the conversational scenario. Unlike the general document topic modeling, a conversation session lasts for multiple turns: each short-text utterance complies with a single topic distribution and these topic distributions are dependent across turns. Moreover, there are roles in conversations, a.k.a., speakers and addressees. Topic distributions are partially determined by such roles in conversations. We take these factors into account to model topics in conversations via the multi-turn and multi-role formulation. We also leverage the word co-occurrence relationship as a new training objective to further improve topic quality. Comprehensive experimental results based on the benchmark datasets demonstrate that our proposed ConvNTM achieves the best performance both in topic modeling and in typical downstream tasks within conversational research (i.e., dialogue act classification and dialogue response generation).

NeurIPS Conference 2023 Conference Paper

FABind: Fast and Accurate Protein-Ligand Binding

  • Qizhi Pei
  • Kaiyuan Gao
  • Lijun Wu
  • Jinhua Zhu
  • Yingce Xia
  • Shufang Xie
  • Tao Qin
  • Kun He

Modeling the interaction between proteins and ligands and accurately predicting their binding structures is a critical yet challenging task in drug discovery. Recent advancements in deep learning have shown promise in addressing this challenge, with sampling-based and regression-based methods emerging as two prominent approaches. However, these methods have notable limitations. Sampling-based methods often suffer from low efficiency due to the need for generating multiple candidate structures for selection. On the other hand, regression-based methods offer fast predictions but may experience decreased accuracy. Additionally, the variation in protein sizes often requires external modules for selecting suitable binding pockets, further impacting efficiency. In this work, we propose FABind, an end-to-end model that combines pocket prediction and docking to achieve accurate and fast protein-ligand binding. FABind incorporates a unique ligand-informed pocket prediction module, which is also leveraged for docking pose estimation. The model further enhances the docking process by incrementally integrating the predicted pocket to optimize protein-ligand binding, reducing discrepancies between training and inference. Through extensive experiments on benchmark datasets, our proposed FABind demonstrates strong advantages in terms of effectiveness and efficiency compared to existing methods. Our code is available at https: //github. com/QizhiPei/FABind.

IJCAI Conference 2023 Conference Paper

Learnable Surrogate Gradient for Direct Training Spiking Neural Networks

  • Shuang Lian
  • Jiangrong Shen
  • Qianhui Liu
  • Ziming Wang
  • Rui Yan
  • Huajin Tang

Spiking neural networks (SNNs) have increasingly drawn massive research attention due to biological interpretability and efficient computation. Recent achievements are devoted to utilizing the surrogate gradient (SG) method to avoid the dilemma of non-differentiability of spiking activity to directly train SNNs by backpropagation. However, the fixed width of the SG leads to gradient vanishing and mismatch problems, thus limiting the performance of directly trained SNNs. In this work, we propose a novel perspective to unlock the width limitation of SG, called the learnable surrogate gradient (LSG) method. The LSG method modulates the width of SG according to the change of the distribution of the membrane potentials, which is identified to be related to the decay factors based on our theoretical analysis. Then we introduce the trainable decay factors to implement the LSG method, which can optimize the width of SG automatically during training to avoid the gradient vanishing and mismatch problems caused by the limited width of SG. We evaluate the proposed LSG method on both image and neuromorphic datasets. Experimental results show that the LSG method can effectively alleviate the blocking of gradient propagation caused by the limited width of SG when training deep SNNs directly. Meanwhile, the LSG method can help SNNs achieve competitive performance on both latency and accuracy.
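To ground the width discussion above: a common surrogate gradient replaces the non-differentiable spike derivative with a function that is nonzero only in a band around the firing threshold. The rectangular form, threshold, and width values below are an illustrative stand-in; in LSG the width would be modulated during training via the trainable decay factors rather than fixed.

```python
# Rectangular surrogate gradient with an explicit width parameter.
def surrogate_grad(v, threshold=1.0, width=0.5):
    # Nonzero only within a band of the given width around the threshold;
    # the 1/width scaling keeps the total "mass" of the gradient constant.
    return 1.0 / width if abs(v - threshold) < width / 2 else 0.0

assert surrogate_grad(1.1) == 2.0   # membrane potential inside the band
assert surrogate_grad(2.0) == 0.0   # outside the band: gradient vanishes
```

A width that is too small zeroes out gradients for most neurons (vanishing), while one that is too large assigns gradient to potentials far from the threshold (mismatch), which is the tension LSG's learnable width is meant to resolve.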

AAAI Conference 2023 Conference Paper

Learning towards Selective Data Augmentation for Dialogue Generation

  • Xiuying Chen
  • Mingzhe Li
  • Jiayi Zhang
  • Xiaoqiang Xia
  • Chen Wei
  • Jianwei Cui
  • Xin Gao
  • Xiangliang Zhang

As it is cumbersome and expensive to acquire a huge amount of data for training neural dialog models, data augmentation is proposed to effectively utilize existing training samples. However, current data augmentation techniques on the dialog generation task mostly augment all cases in the training dataset without considering the intrinsic attributes between different cases. We argue that not all cases are beneficial for the augmentation task, and the cases suitable for augmentation should obey the following two attributes: (1) low-quality (the dialog model cannot generate a high-quality response for the case), (2) representative (the case should represent the property of the whole dataset). Herein, we explore this idea by proposing a Selective Data Augmentation framework (SDA) for the response generation task. SDA employs a dual adversarial network to select the lowest quality and most representative data points for augmentation in one stage. Extensive experiments conducted on two publicly available datasets, i.e., DailyDialog and OpenSubtitles, show that our framework can improve the response generation performance with respect to various metrics.

NeurIPS Conference 2023 Conference Paper

Lift Yourself Up: Retrieval-augmented Text Generation with Self-Memory

  • Xin Cheng
  • Di Luo
  • Xiuying Chen
  • Lemao Liu
  • Dongyan Zhao
  • Rui Yan

With direct access to human-written reference as memory, retrieval-augmented generation has achieved much progress in a wide range of text generation tasks. Better memory typically prompts better generation (we define this as the primal problem). The traditional approach for memory retrieval involves selecting memory that exhibits the highest similarity to the input. However, this method is constrained by the quality of the fixed corpus from which memory is retrieved. In this paper, by exploring the duality of the primal problem: better generation also prompts better memory, we propose a novel framework, selfmem, which addresses this limitation by iteratively employing a retrieval-augmented generator to create an unbounded memory pool and using a memory selector to choose one output as memory for the subsequent generation round. This enables the model to leverage its own output, referred to as self-memory, for improved generation. We evaluate the effectiveness of selfmem on three distinct text generation tasks: neural machine translation, abstractive text summarization, and dialogue generation, under two generation paradigms: fine-tuned small model and few-shot LLM. Our approach achieves state-of-the-art results in four directions in the JRC-Acquis translation dataset, 50.3 ROUGE-1 in XSum, and 62.9 ROUGE-1 in BigPatent, demonstrating the potential of self-memory in enhancing retrieval-augmented generation models. Furthermore, we conduct thorough analyses of each component in the selfmem framework to identify current system bottlenecks and provide insights for future research.
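The generate-then-select loop described above can be sketched schematically: a generator produces a candidate conditioned on the current memory, a selector scores the growing pool, and the chosen output becomes memory for the next round. The generator and selector here are trivial stand-ins, not real models, and the loop structure is a simplification of the paper's framework.

```python
# Schematic selfmem-style loop with toy generator and selector functions.
def selfmem_loop(x, generate, select, rounds=3, memory=None):
    pool = []                      # the "unbounded memory pool"
    for _ in range(rounds):
        candidate = generate(x, memory)
        pool.append(candidate)
        memory = select(pool)      # best output so far becomes self-memory
    return memory

# Toy stand-ins: each round appends a token; selector prefers the longest.
gen = lambda x, m: (m or x) + "!"
sel = lambda pool: max(pool, key=len)
print(selfmem_loop("hi", gen, sel))  # "hi!!!"
```

The key property the sketch preserves is that the memory fed to round t+1 is the model's own round-t output rather than a retrieval from a fixed corpus.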

AAAI Conference 2023 Conference Paper

PEN: Prediction-Explanation Network to Forecast Stock Price Movement with Better Explainability

  • Shuqi Li
  • Weiheng Liao
  • Yuhan Chen
  • Rui Yan

Nowadays explainability in stock price movement prediction is attracting increasing attention in banks, hedge funds and asset managers, primarily due to audit or regulatory reasons. Text data such as financial news and social media posts can be part of the reasons for stock price movement. To this end, we propose a novel framework of Prediction-Explanation Network (PEN) jointly modeling text streams and price streams with alignment. The key component of the PEN model is a shared representation learning module that learns which texts are possibly associated with the stock price movement by modeling the interaction between the text data and stock price data with a salient vector characterizing their correlation. In this way, the PEN model is able to predict the stock price movement by identifying and utilizing abundant messages while on the other hand, the selected text messages also explain the stock price movement. Experiments on real-world datasets demonstrate that we are able to kill two birds with one stone: in terms of accuracy, the proposed PEN model outperforms the state-of-the-art baseline; on explainability, the PEN model is demonstrated to be far superior to the attention mechanism, capable of picking out the crucial texts with very high confidence.

AAAI Conference 2023 Conference Paper

Retrosynthesis Prediction with Local Template Retrieval

  • Shufang Xie
  • Rui Yan
  • Junliang Guo
  • Yingce Xia
  • Lijun Wu
  • Tao Qin

Retrosynthesis, which predicts the reactants of a given target molecule, is an essential task for drug discovery. In recent years, machine learning based retrosynthesis methods have achieved promising results. In this work, we introduce RetroKNN, a local reaction template retrieval method to further boost the performance of template-based systems with non-parametric retrieval. We first build an atom-template store and a bond-template store that contain the local templates in the training data, then retrieve from these templates with a k-nearest-neighbor (KNN) search during inference. The retrieved templates are combined with neural network predictions as the final output. Furthermore, we propose a lightweight adapter to adjust the weights when combining neural network and KNN predictions conditioned on the hidden representation and the retrieved templates. We conduct comprehensive experiments on two widely used benchmarks, the USPTO-50K and USPTO-MIT. In particular, we improve the top-1 accuracy by 7.1% on the USPTO-50K dataset and 12.0% on the USPTO-MIT dataset. These results demonstrate the effectiveness of our method.
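The combination step described above can be illustrated as a simple interpolation between the neural network's template distribution and a normalized KNN retrieval distribution. In RetroKNN the mixing weight comes from a learned adapter; here it is a fixed constant, and all scores are made-up toy values.

```python
# Interpolate neural and KNN-retrieved template scores:
# final = lam * p_nn + (1 - lam) * p_knn.
def combine(p_nn, p_knn, lam=0.7):
    return [lam * a + (1.0 - lam) * b for a, b in zip(p_nn, p_knn)]

p_nn = [0.6, 0.3, 0.1]    # neural network template scores (toy)
p_knn = [0.2, 0.7, 0.1]   # normalized KNN retrieval scores (toy)
mixed = combine(p_nn, p_knn)
print([round(v, 2) for v in mixed])  # [0.48, 0.42, 0.1]
```

Conditioning lam on the hidden representation (as the adapter does) lets the model lean on retrieval for rare reactions and on the network for well-covered ones.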

NeurIPS Conference 2023 Conference Paper

SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents

  • Shuzheng Si
  • Wentao Ma
  • Haoyu Gao
  • Yuchuan Wu
  • Ting-En Lin
  • Yinpei Dai
  • Hangyu Li
  • Rui Yan

Task-oriented dialogue (TOD) models have made significant progress in recent years. However, previous studies primarily focus on datasets written by annotators, which has resulted in a gap between academic research and real-world spoken conversation scenarios. While several small-scale spoken TOD datasets are proposed to address robustness issues such as ASR errors, they ignore the unique challenges in spoken conversation. To tackle the limitations, we introduce SpokenWOZ, a large-scale speech-text dataset for spoken TOD, containing 8 domains, 203k turns, 5.7k dialogues and 249 hours of audio from human-to-human spoken conversations. SpokenWOZ further incorporates common spoken characteristics such as word-by-word processing and reasoning in spoken language. Based on these characteristics, we present cross-turn slot and reasoning slot detection as new challenges. We conduct experiments on various baselines, including text-modal models, newly proposed dual-modal models, and LLMs, e.g., ChatGPT. The results show that the current models still have substantial room for improvement in spoken conversation, where the most advanced dialogue state tracker only achieves 25.65% in joint goal accuracy and the SOTA end-to-end model only correctly completes the user request in 52.1% of dialogues. Our dataset, code, and leaderboard are available at https://spokenwoz.github.io/SpokenWOZ-github.io/.

NeurIPS Conference 2023 Conference Paper

Temporal Conditioning Spiking Latent Variable Models of the Neural Response to Natural Visual Scenes

  • Gehua Ma
  • Runhao Jiang
  • Rui Yan
  • Huajin Tang

Developing computational models of neural response is crucial for understanding sensory processing and neural computations. Current state-of-the-art neural network methods use temporal filters to handle temporal dependencies, resulting in an unrealistic and inflexible processing paradigm. Meanwhile, these methods target trial-averaged firing rates and fail to capture important features in spike trains. This work presents the temporal conditioning spiking latent variable models (TeCoS-LVM) to simulate the neural response to natural visual stimuli. We use spiking neurons to produce spike outputs that directly match the recorded trains. This approach helps to avoid losing information embedded in the original spike trains. We exclude the temporal dimension from the model parameter space and introduce a temporal conditioning operation to allow the model to adaptively explore and exploit temporal dependencies in stimuli sequences in a natural paradigm. We show that TeCoS-LVM models can produce more realistic spike activities and fit spike statistics more accurately than powerful alternatives. Additionally, learned TeCoS-LVM models can generalize well to longer time scales. Overall, while remaining computationally tractable, our model effectively captures key features of neural coding systems. It thus provides a useful tool for building accurate predictive computational accounts for various sensory perception circuits.

AAAI Conference 2023 Conference Paper

Video-Text Pre-training with Learned Regions for Retrieval

  • Rui Yan
  • Mike Zheng Shou
  • Yixiao Ge
  • Jinpeng Wang
  • Xudong Lin
  • Guanyu Cai
  • Jinhui Tang

Video-Text pre-training aims at learning transferable representations from large-scale video-text pairs via aligning the semantics between visual and textual information. State-of-the-art approaches extract visual features from raw pixels in an end-to-end fashion. However, these methods operate directly at the frame level and thus overlook the spatio-temporal structure of objects in video, which nonetheless has a strong synergy with nouns in textual descriptions. In this work, we propose a simple yet effective module for video-text representation learning, namely RegionLearner, which can take into account the structure of objects during pre-training on large-scale video-text pairs. Given a video, our module (1) first quantizes continuous visual features via clustering patch-features into the same cluster according to content similarity, then (2) generates learnable masks to aggregate fragmentary features into regions with complete semantics, and finally (3) models the spatio-temporal dependencies between different semantic regions. In contrast to using off-the-shelf object detectors, our proposed module does not require explicit supervision and is much more computationally efficient. We pre-train the proposed approach on the public WebVid2M and CC3M datasets. Extensive evaluations on four downstream video-text retrieval benchmarks clearly demonstrate the effectiveness of our RegionLearner.

NeurIPS Conference 2022 Conference Paper

Debiased, Longitudinal and Coordinated Drug Recommendation through Multi-Visit Clinic Records

  • Hongda Sun
  • Shufang Xie
  • Shuqi Li
  • Yuhan Chen
  • Ji-Rong Wen
  • Rui Yan

AI-empowered drug recommendation has become an important task in healthcare research areas, offering an additional perspective to assist human doctors with more accurate and more efficient drug prescriptions. Generally, drug recommendation is based on patients' diagnosis results in electronic health records. We assume that there are three key factors to be addressed in drug recommendation: 1) elimination of recommendation bias due to limitations of observable information, 2) better utilization of historical health conditions and 3) coordination of multiple drugs to control safety. To this end, we propose DrugRec, a causal-inference-based drug recommendation model. The causal graphical model can identify and deconfound the recommendation bias with front-door adjustment. Meanwhile, we model the multi-visit setting in the causal graph to characterize a patient's historical health conditions. Finally, we model the drug-drug interactions (DDIs) as a propositional satisfiability (SAT) problem, and solving the SAT problem can help better coordinate the recommendation. Comprehensive experimental results show that our proposed model achieves state-of-the-art performance on the widely used datasets MIMIC-III and MIMIC-IV, demonstrating the effectiveness and safety of our method.
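
As a rough illustration of the DDI constraint the abstract casts as a SAT problem: each known interacting pair contributes a clause of the form (NOT drug_a OR NOT drug_b), and a candidate drug set must satisfy all of them. The helper and drug names below are a hypothetical sketch, not the paper's solver:

```python
def ddi_safe(drug_set, ddi_pairs):
    """True iff no known interacting pair is co-prescribed
    (each pair encodes the clause: NOT drug_a OR NOT drug_b)."""
    drugs = set(drug_set)
    return not any(a in drugs and b in drugs for a, b in ddi_pairs)

# hypothetical drug names and interaction list
ddi = [("warfarin", "aspirin")]
print(ddi_safe(["warfarin", "metformin"], ddi))  # True
print(ddi_safe(["warfarin", "aspirin"], ddi))    # False
```

A real SAT encoding would hand these clauses, plus clauses expressing which drugs the model wants to include, to a solver rather than brute-forcing pairs.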

NeurIPS Conference 2022 Conference Paper

Egocentric Video-Language Pretraining

  • Kevin Qinghong Lin
  • Jinpeng Wang
  • Mattia Soldan
  • Michael Wray
  • Rui Yan
  • Eric Z. XU
  • Difei Gao
  • Rong-Cheng Tu

Video-Language Pretraining (VLP), which aims to learn transferable representation to advance a wide range of video-text downstream tasks, has recently received increasing attention. Best performing works rely on large-scale, 3rd-person video-text datasets, such as HowTo100M. In this work, we exploit the recently released Ego4D dataset to pioneer Egocentric VLP along three directions. (i) We create EgoClip, a 1st-person video-text pretraining dataset comprising 3.8M clip-text pairs well-chosen from Ego4D, covering a large variety of human daily activities. (ii) We propose a novel pretraining objective, dubbed EgoNCE, which adapts video-text contrastive learning to the egocentric domain by mining egocentric-aware positive and negative samples. (iii) We introduce EgoMCQ, a development benchmark that is close to EgoClip and hence can support effective validation and fast exploration of our design decisions in EgoClip and EgoNCE. Furthermore, we demonstrate strong performance on five egocentric downstream tasks across three datasets: video-text retrieval on EPIC-KITCHENS-100; action recognition on Charades-Ego; natural language query, moment query, and object state change classification on Ego4D challenge benchmarks. The dataset and code are available at https://github.com/showlab/EgoVLP.

IJCAI Conference 2021 Conference Paper

A Survey on Response Selection for Retrieval-based Dialogues

  • Chongyang Tao
  • Jiazhan Feng
  • Rui Yan
  • Wei Wu
  • Daxin Jiang

Building an intelligent dialogue system capable of naturally and coherently conversing with humans has been a long-standing goal of artificial intelligence. In the past decade, with the development of machine/deep learning technology and the explosive growth of available conversation data in social media, numerous neural models have been developed for context-response matching tasks in retrieval-based dialogue systems, with more fluent and informative responses compared with generative models. This paper presents a comprehensive survey of recent advances in response selection for retrieval-based dialogues. In particular, we first formulate the problem of response selection and review state-of-the-art context-response matching models categorized by their architecture. Then we summarize some recent advances on the research of response selection, including incorporation with extra knowledge and exploration on more effective model learning. Finally, we highlight the challenges which are not yet well addressed in this task and present future research directions.

AAAI Conference 2021 Conference Paper

Content Learning with Structure-Aware Writing: A Graph-Infused Dual Conditional Variational Autoencoder for Automatic Storytelling

  • Meng-Hsuan Yu
  • Juntao Li
  • Zhangming Chan
  • Rui Yan
  • Dongyan Zhao

Recent automatic storytelling methods mainly rely on keyword planning or plot skeleton generation to model long-range dependencies and create consistent narrative texts. However, these approaches generate story plans or plots sequentially, leaving the non-sequential conception and structural design processes of human writers unexplored. To mimic human writers and exploit the fine-grained, intrinsic structural information of each story, we decompose automatic story generation into sub-problems of graph construction, graph generation, and graph-infused sequence generation. Specifically, we propose a graph-infused dual conditional variational autoencoder model to capture multi-level intra-story structures (i.e., graphs) through continuous variational latent variables and generate consistent stories through dual infusion of story structure planning and content learning. Experimental results on the ROCStories dataset and the CMU Movie Summary corpus confirm that our proposed model outperforms strong baselines in both human judgments and widely used automatic metrics.

AAAI Conference 2021 Conference Paper

Empowering Conversational AI is a Trip to Mars: Progress and Future of Open Domain Human-Computer Dialogues

  • Rui Yan
  • Wei Wu

Dialogue systems powered by conversational artificial intelligence (AI) have never been so popular. Interacting with computers through language provides a more natural interface for giving orders and acquiring information---just like human communication. Due to their promising potential as virtual assistants and/or social bots, major NLP, AI and even Search & Mining communities are explicitly calling out for contributions of conversational studies. Learning towards real conversational intelligence is a trip to Mars; perhaps we are still on Earth. We have achieved substantial progress from recent research outputs. Still, we have major obstacles to overcome. In this paper, we present an overview of progress and look forward to future trends so as to shed light on possible directions towards success.

NeurIPS Conference 2021 Conference Paper

KeSpeech: An Open Source Speech Dataset of Mandarin and Its Eight Subdialects

  • Zhiyuan Tang
  • Dong Wang
  • Yanguang Xu
  • Jianwei Sun
  • XiaoNing Lei
  • Shuaijiang Zhao
  • Cheng Wen
  • Xingjun Tan

This paper introduces an open source speech dataset, KeSpeech, which involves 1,542 hours of speech signals recorded by 27,237 speakers in 34 cities in China, and the pronunciation includes standard Mandarin and its 8 subdialects. The new dataset possesses several properties. Firstly, the dataset provides multiple labels including content transcription, speaker identity and subdialect, hence supporting a variety of speech processing tasks, such as speech recognition, speaker recognition, and subdialect identification, as well as other advanced techniques like multi-task learning and conditional learning. Secondly, some of the text samples were parallel recorded with both the standard Mandarin and a particular subdialect, allowing for new applications such as subdialect style conversion. Thirdly, the number of speakers is much larger than other open-source datasets, making it suitable for tasks that require training data from vast speakers. Finally, the speech signals were recorded in two phases, which opens the opportunity for the study of the time variance property of human speech. We present the design principle of the KeSpeech dataset and four baseline systems based on the new data resource: speech recognition, speaker verification, subdialect identification and voice conversion. The dataset is free for all academic usage.

AAAI Conference 2021 Conference Paper

Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues

  • Ruijian Xu
  • Chongyang Tao
  • Daxin Jiang
  • Xueliang Zhao
  • Dongyan Zhao
  • Rui Yan

Building an intelligent dialogue system with the ability to select a proper response according to a multi-turn context is a highly challenging task. Existing studies focus on building a context-response matching model with various neural architectures or pretrained language models (PLMs), typically learning with a single response prediction task. These approaches overlook many potential training signals contained in dialogue data, which might be beneficial for context understanding and produce better features for response prediction. Besides, responses retrieved from existing dialogue systems trained in the conventional way still face some critical challenges, including incoherence and inconsistency. To address these issues, in this paper, we propose learning a context-response matching model with auxiliary self-supervised tasks designed for the dialogue data based on pretrained language models. Specifically, we introduce four self-supervised tasks including next session prediction, utterance restoration, incoherence detection and consistency discrimination, and jointly train the PLM-based response selection model with these auxiliary tasks in a multi-task manner. By this means, the auxiliary tasks can guide the learning of the matching model to achieve a better local optimum and select a more proper response. Experimental results on two benchmarks indicate that the proposed auxiliary self-supervised tasks bring significant improvement for multi-turn response selection in retrieval-based dialogues, and our model achieves new state-of-the-art results on both datasets.

AAAI Conference 2021 Conference Paper

Predictive Adversarial Learning from Positive and Unlabeled Data

  • Wenpeng Hu
  • Ran Le
  • Bing Liu
  • Feng Ji
  • Jinwen Ma
  • Dongyan Zhao
  • Rui Yan

This paper studies learning from positive and unlabeled examples, known as PU learning. It proposes a novel PU learning method called Predictive Adversarial Networks (PAN) based on GAN (Generative Adversarial Networks). GAN learns a generator to generate data (e.g., images) to fool a discriminator which tries to determine whether the generated data belong to a (positive) training class. PU learning can be cast as trying to identify (not generate) likely positive instances from the unlabeled set to fool a discriminator that determines whether the identified likely positive instances from the unlabeled set are indeed positive. However, directly applying GAN is problematic because GAN focuses on only the positive data. The resulting PU learning method will have high precision but low recall. We propose a new objective function based on KL-divergence. Evaluation using both image and text data shows that PAN outperforms state-of-the-art PU learning methods and also a direct adaptation of GAN for PU learning.

AAAI Conference 2021 Conference Paper

Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension

  • Xiuying Chen
  • Zhi Cui
  • Jiayi Zhang
  • Chen Wei
  • Jianwei Cui
  • Bin Wang
  • Dongyan Zhao
  • Rui Yan

In multi-turn dialog, utterances do not always take the full form of sentences (Carbonell 1983), which naturally makes understanding the dialog context more difficult. However, it is essential to fully grasp the dialog context to generate a reasonable response. Hence, in this paper, we propose to improve the response generation performance by examining the model's ability to answer a reading comprehension question, where the question is focused on the omitted information in the dialog. Enlightened by the multi-task learning scheme, we propose a joint framework that unifies these two tasks, sharing the same encoder to extract the common and task-invariant features with different decoders to learn task-specific features. To better fuse information from the question and the dialog history in the encoding part, we propose to augment the Transformer architecture with a memory updater, which is designed to selectively store and update the dialog history information so as to support downstream tasks. For the experiment, we employ human annotators to write and examine a large-scale dialog reading comprehension dataset. Extensive experiments are conducted on this dataset, and the results show that the proposed model brings substantial improvements over several strong baselines on both tasks. In this way, we demonstrate that reasoning can indeed help better response generation and vice versa. We release our large-scale dataset for further research.

NeurIPS Conference 2021 Conference Paper

Stylized Dialogue Generation with Multi-Pass Dual Learning

  • Jinpeng Li
  • Yingce Xia
  • Rui Yan
  • Hongda Sun
  • Dongyan Zhao
  • Tie-Yan Liu

Stylized dialogue generation, which aims to generate a given-style response for an input context, plays a vital role in intelligent dialogue systems. Considering there is no parallel data between the contexts and the responses of target style S_1, existing works mainly use back translation to generate stylized synthetic data for training, where data about the context, target style S_1 and an intermediate style S_0 is used. However, the interaction among these texts is not fully exploited, and the pseudo contexts are not adequately modeled. To overcome the above difficulties, we propose multi-pass dual learning (MPDL), which leverages the duality among the context, response of style S_1 and response of style S_0. MPDL builds mappings among the above three domains, where the context should be reconstructed by the MPDL framework, and the reconstruction error is used as the training signal. To evaluate the quality of synthetic data, we also introduce discriminators that effectively measure how a pseudo sequence matches the specific domain, and the evaluation result is used as the weight for that data. Evaluation results indicate that our method obtains significant improvement over previous baselines.

AAAI Conference 2021 Conference Paper

The Style-Content Duality of Attractiveness: Learning to Write Eye-Catching Headlines via Disentanglement

  • Mingzhe Li
  • Xiuying Chen
  • Min Yang
  • Shen Gao
  • Dongyan Zhao
  • Rui Yan

Eye-catching headlines function as the first device to trigger more clicks, bringing a reciprocal effect between producers and viewers. Producers can obtain more traffic and profits, and readers can have access to outstanding articles. When generating attractive headlines, it is important to not only capture the attractive content but also follow an eye-catching written style. In this paper, we propose a Disentanglement-based Attractive Headline Generator (DAHG) that generates a headline capturing the attractive content in the attractive style. Concretely, we first devise a disentanglement module to divide the style and content of an attractive prototype headline into latent spaces, with two auxiliary constraints to ensure the two spaces are indeed disentangled. The latent content information is then used to further polish the document representation and help capture the salient part. Finally, the generator takes the polished document as input to generate a headline under the guidance of the attractive style. Extensive experiments on the public Kuaibao dataset show that DAHG achieves state-of-the-art performance. Human evaluation also demonstrates that DAHG triggers 22% more clicks than existing models.

AAAI Conference 2020 Conference Paper

A Character-Centric Neural Model for Automated Story Generation

  • Danyang Liu
  • Juntao Li
  • Meng-Hsuan Yu
  • Ziming Huang
  • Gongshen Liu
  • Dongyan Zhao
  • Rui Yan

Automated story generation is a challenging task which aims to automatically generate convincing stories composed of successive plots correlated with consistent characters. Most recent generation models are built upon advanced neural networks, e.g., variational autoencoders, generative adversarial networks, and convolutional sequence-to-sequence models. Although these models have achieved promising results in learning linguistic patterns, very few methods consider the attributes and prior knowledge of the story genre, especially from the perspectives of explainability and consistency. To fill this gap, we propose a character-centric neural storytelling model, where a story is created encircling the given character, i.e., each part of a story is conditioned on a given character and the corresponding context environment. In this way, we explicitly capture the character information and the relations between plots and characters to improve explainability and consistency. Experimental results on an open dataset indicate that our model yields meaningful improvements over several strong baselines on both human and automatic evaluations.

IJCAI Conference 2020 Conference Paper

Adaptively Multi-Objective Adversarial Training for Dialogue Generation

  • Xuemiao Zhang
  • Zhouxing Tan
  • Xiaoning Zhang
  • Yang Cao
  • Rui Yan

Naive neural dialogue generation models tend to produce repetitive and dull utterances. The promising adversarial models train the generator against a well-designed discriminator to push it to improve in the expected direction. However, assessing dialogues requires consideration of many aspects of linguistics, which are difficult to fully cover with a single discriminator. To address this, we reframe the dialogue generation task as a multi-objective optimization problem and propose a novel adversarial dialogue generation framework with multiple discriminators that excel in different objectives for multiple linguistic aspects, called AMPGAN, whose feasibility is proved by theoretical derivations. Moreover, we design an adaptively adjusted sampling distribution to balance the discriminators and promote the overall improvement of the generator by continuing to focus on the objectives on which the generator performs relatively poorly. Experimental results on two real-world datasets show a significant improvement over the baselines.

AAAI Conference 2020 Conference Paper

Cross-Lingual Low-Resource Set-to-Description Retrieval for Global E-Commerce

  • Juntao Li
  • Chang Liu
  • Jian Wang
  • Lidong Bing
  • Hongsong Li
  • Xiaozhong Liu
  • Dongyan Zhao
  • Rui Yan

With the prosperity of cross-border e-commerce, there is an urgent demand for intelligent approaches that assist e-commerce sellers in offering local products to consumers from all over the world. In this paper, we explore a new task of cross-lingual information retrieval, i.e., cross-lingual set-to-description retrieval in cross-border e-commerce, which involves matching product attribute sets in the source language with persuasive product descriptions in the target language. We manually collect a new and high-quality paired dataset, where each pair contains an unordered product attribute set in the source language and an informative product description in the target language. As the dataset construction process is both time-consuming and costly, the new dataset comprises only 13.5k pairs, which is a low-resource setting and can be viewed as a challenging testbed for model development and evaluation in cross-border e-commerce. To tackle this cross-lingual set-to-description retrieval task, we propose a novel cross-lingual matching network (CLMN) with the enhancement of context-dependent cross-lingual mapping upon the pre-trained monolingual BERT representations. Experimental results indicate that our proposed CLMN yields impressive results on the challenging task, and the context-dependent cross-lingual mapping on BERT yields noticeable improvement over the pre-trained multi-lingual BERT model.

AAAI Conference 2020 Conference Paper

Draft and Edit: Automatic Storytelling Through Multi-Pass Hierarchical Conditional Variational Autoencoder

  • Meng-Hsuan Yu
  • Juntao Li
  • Danyang Liu
  • Dongyan Zhao
  • Rui Yan
  • Bo Tang
  • Haisong Zhang

Automatic storytelling has consistently been a challenging area in the field of natural language processing. Although considerable achievements have been made, the gap between automatically generated stories and human-written stories is still significant. Moreover, the limitations of existing automatic storytelling methods are obvious, e.g., in consistency of content and wording diversity. In this paper, we propose a multi-pass hierarchical conditional variational autoencoder model to overcome the challenges and limitations of existing automatic storytelling models. While the conditional variational autoencoder (CVAE) model has been employed to generate diversified content, the hierarchical structure and multi-pass editing scheme allow the story to develop more consistent content. We conduct extensive experiments on the ROCStories dataset. The results verify the validity and effectiveness of our proposed model, which yields substantial improvement over existing state-of-the-art approaches.

IJCAI Conference 2020 Conference Paper

From Standard Summarization to New Tasks and Beyond: Summarization with Manifold Information

  • Shen Gao
  • Xiuying Chen
  • Zhaochun Ren
  • Dongyan Zhao
  • Rui Yan

Text summarization is the research area aiming at creating a short and condensed version of the original document, which conveys the main idea of the document in a few words. This research topic has started to attract the attention of a large community of researchers, and it is nowadays counted as one of the most promising research areas. In general, text summarization algorithms aim at using a plain text document as input and then outputting a summary. However, in real-world applications, most of the data is not in a plain text format. Instead, there is much manifold information to be summarized, such as the summary for a web page based on a query in the search engine, extremely long documents (e.g., academic papers), dialog history and so on. In this paper, we focus on surveying these new summarization tasks and approaches in real-world applications.

AAAI Conference 2020 Short Paper

Learning Sense Representation from Word Representation for Unsupervised Word Sense Disambiguation (Student Abstract)

  • Jie Wang
  • Zhenxin Fu
  • Moxin Li
  • Haisong Zhang
  • Dongyan Zhao
  • Rui Yan

Unsupervised WSD methods do not rely on annotated training datasets and can use WordNet. Since each ambiguous word in the WSD task exists in WordNet and each sense of the word has a gloss, we propose SGM and MGM to learn sense representations for words in WordNet using the glosses. In the WSD task, we calculate the similarity between each sense of the ambiguous word and its context to select the sense with the highest similarity. We evaluate our method on several benchmark WSD datasets and achieve better performance than the state-of-the-art unsupervised WSD systems.

AAAI Conference 2020 Short Paper

RPM-Oriented Query Rewriting Framework for E-commerce Keyword-Based Sponsored Search (Student Abstract)

  • Xiuying Chen
  • Daorui Xiao
  • Shen Gao
  • Guojun Liu
  • Wei Lin
  • Bo Zheng
  • Dongyan Zhao
  • Rui Yan

Sponsored search optimizes revenue and relevance, which is estimated by Revenue Per Mille (RPM). Existing sponsored search models are all based on traditional statistical models, which have poor RPM performance when queries follow a heavy-tailed distribution. Here, we propose an RPM-oriented Query Rewriting Framework (RQRF) which outputs related bid keywords that can yield high RPM. RQRF embeds both queries and bid keywords as vectors in the same implicit space, converting the rewriting probability between each query and keyword to the distance between the two vectors. For label construction, we propose an RPM-oriented sample construction method, labeling keywords based on whether or not they can lead to high RPM. Extensive experiments are conducted to evaluate the performance of RQRF. In one month of large-scale real-world traffic from an e-commerce sponsored search system, the proposed model significantly outperforms the traditional baseline.
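
The core idea of embedding queries and bid keywords in one space, with rewriting probability derived from vector distance, can be sketched as follows; this toy cosine-similarity-plus-softmax scorer is an illustration of the idea, not the paper's actual model:

```python
import math

def rewrite_probs(query_vec, keyword_vecs):
    """Score each bid keyword by cosine similarity to the query in the shared
    embedding space, then normalize the scores into a rewriting distribution."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    sims = [cos(query_vec, kv) for kv in keyword_vecs]
    exps = [math.exp(s) for s in sims]  # softmax over similarities
    z = sum(exps)
    return [e / z for e in exps]

# a keyword aligned with the query should receive the highest probability
probs = rewrite_probs([1.0, 0.0], [[1.0, 0.0], [0.6, 0.8]])
```

In the actual framework the embeddings themselves are learned from RPM-oriented labels; here they are fixed toy vectors.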

IJCAI Conference 2020 Conference Paper

Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model

  • Juntao Li
  • Ruidan He
  • Hai Ye
  • Hwee Tou Ng
  • Lidong Bing
  • Rui Yan

Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements over various cross-lingual and low-resource tasks. Through training on one hundred languages and terabytes of texts, cross-lingual language models have proven to be effective in leveraging high-resource languages to enhance low-resource language processing and outperform monolingual models. In this paper, we further investigate the cross-lingual and cross-domain (CLCD) setting when a pretrained cross-lingual language model needs to adapt to new domains. Specifically, we propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features and domain-invariant features from the entangled pretrained cross-lingual representations, given unlabeled raw texts in the source language. Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts. Experimental results show that our proposed method achieves significant performance improvements over the state-of-the-art pretrained cross-lingual language model in the CLCD setting.

IJCAI Conference 2019 Conference Paper

A Document-grounded Matching Network for Response Selection in Retrieval-based Chatbots

  • Xueliang Zhao
  • Chongyang Tao
  • Wei Wu
  • Can Xu
  • Dongyan Zhao
  • Rui Yan

We present a document-grounded matching network (DGMN) for response selection that can power a knowledge-aware retrieval-based chatbot system. The challenges of building such a model lie in how to ground conversation contexts with background documents and how to recognize important information in the documents for matching. To overcome the challenges, DGMN fuses information in a document and a context into representations of each other, and dynamically determines if grounding is necessary and importance of different parts of the document and the context through hierarchical interaction with a response at the matching step. Empirical studies on two public data sets indicate that DGMN can significantly improve upon state-of-the-art methods and at the same time enjoys good interpretability.

AAAI Conference 2019 Conference Paper

Abstractive Text Summarization by Incorporating Reader Comments

  • Shen Gao
  • Xiuying Chen
  • Piji Li
  • Zhaochun Ren
  • Lidong Bing
  • Dongyan Zhao
  • Rui Yan

In the neural abstractive summarization field, conventional sequence-to-sequence based models often suffer from summarizing the wrong aspect of the document with respect to the main aspect. To tackle this problem, we propose the task of reader-aware abstractive summary generation, which utilizes reader comments to help the model produce a better summary about the main aspect. Unlike the traditional abstractive summarization task, reader-aware summarization confronts two main challenges: (1) Comments are informal and noisy; (2) jointly modeling the news document and the reader comments is challenging. To tackle the above challenges, we design an adversarial learning model named reader-aware summary generator (RASG), which consists of four components: (1) a sequence-to-sequence based summary generator; (2) a reader attention module capturing the reader focused aspects; (3) a supervisor modeling the semantic gap between the generated summary and reader focused aspects; (4) a goal tracker producing the goal for each generation step. The supervisor and the goal tracker are used to guide the training of our framework in an adversarial manner. Extensive experiments are conducted on our large-scale real-world text summarization dataset, and the results show that RASG achieves the state-of-the-art performance in terms of both automatic metrics and human evaluations. The experimental results also demonstrate the effectiveness of each module in our framework. We release our large-scale dataset for further research.

AAAI Conference 2019 Conference Paper

CGMH: Constrained Sentence Generation by Metropolis-Hastings Sampling

  • Ning Miao
  • Hao Zhou
  • Lili Mou
  • Rui Yan
  • Lei Li

In real-world applications of natural language generation, there are often constraints on the target sentences in addition to fluency and naturalness requirements. Existing language generation techniques are usually based on recurrent neural networks (RNNs). However, it is non-trivial to impose constraints on RNNs while maintaining generation quality, since RNNs generate sentences sequentially (or with beam search) from the first word to the last. In this paper, we propose CGMH, a novel approach using Metropolis-Hastings sampling for constrained sentence generation. CGMH allows complicated constraints such as the occurrence of multiple keywords in the target sentences, which cannot be handled in traditional RNN-based approaches. Moreover, CGMH works in the inference stage, and does not require parallel corpora for training. We evaluate our method on a variety of tasks, including keywords-to-sentence generation, unsupervised sentence paraphrasing, and unsupervised sentence error correction. CGMH achieves high performance compared with previous supervised methods for sentence generation. Our code is released at https://github.com/NingMiao/CGMH.
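
A toy sketch of the sampling loop the abstract describes, assuming a symmetric proposal over word-level edits and a hard keyword constraint; the function and the stand-in scoring function are illustrative only, and the released code linked above is the authoritative implementation:

```python
import math
import random

def mh_constrained_sample(logp, init, vocab, keywords, steps=1000, seed=0):
    """Metropolis-Hastings over sentences: propose a local word edit
    (replace/insert/delete), reject any proposal that drops a required
    keyword, and otherwise accept with probability min(1, p(x')/p(x))."""
    rng = random.Random(seed)
    x = list(init)
    for _ in range(steps):
        y = list(x)
        op = rng.choice(["replace", "insert", "delete"])
        if op == "replace":
            y[rng.randrange(len(y))] = rng.choice(vocab)
        elif op == "insert":
            y.insert(rng.randrange(len(y) + 1), rng.choice(vocab))
        elif len(y) > 1:  # delete, but never empty the sentence
            del y[rng.randrange(len(y))]
        if not all(k in y for k in keywords):
            continue  # hard constraint: required keywords must survive
        delta = logp(y) - logp(x)  # symmetric proposal -> Metropolis ratio
        if delta >= 0 or rng.random() < math.exp(delta):
            x = y
    return x

toy_logp = lambda s: -float(len(s))  # stand-in "language model": shorter is better
out = mh_constrained_sample(toy_logp,
                            ["the", "cat", "sat", "on", "the", "mat"],
                            vocab=["the", "a", "sat", "on", "cat", "mat"],
                            keywords=["cat", "mat"])
```

In CGMH proper, `logp` would be a pretrained language model's log-probability rather than a length penalty, but the accept/reject skeleton is the same.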

IJCAI Conference 2019 Conference Paper

Fast and Accurate Classification with a Multi-Spike Learning Algorithm for Spiking Neurons

  • Rong Xiao
  • Qiang Yu
  • Rui Yan
  • Huajin Tang

The formulation of efficient supervised learning algorithms for spiking neurons is complicated and remains challenging. Most existing learning methods based on precise spike firing times often result in relatively low efficiency and poor robustness to noise. To address these limitations, we propose a simple and effective multi-spike learning rule to train neurons to match their output spike number with a desired one. The proposed method quickly finds a local maximum value (directly related to the embedded feature) as the relevant signal for synaptic updates based on the membrane potential trace of a neuron, and constructs an error function defined as the difference between the local maximum membrane potential and the firing threshold. With the presented rule, a single neuron can be trained to learn multi-category tasks, and can successfully mitigate the impact of input noise and discover embedded features. Experimental results show the proposed algorithm has higher precision, lower computation cost, and better noise robustness than current state-of-the-art learning methods across a wide range of learning tasks.
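
The error signal described above, the gap between the local maximum of the membrane-potential trace and the firing threshold, can be sketched as below; the function name, learning-rate handling, and sign conventions are assumptions for illustration, not the paper's exact rule:

```python
def multispike_update(v_trace, theta, n_desired, n_actual, lr=0.01):
    """Return a weight-change magnitude and the index of the membrane-potential
    peak used as the relevant signal for the synaptic update."""
    t_star = max(range(len(v_trace)), key=lambda t: v_trace[t])
    gap = theta - v_trace[t_star]  # positive when the peak sits below threshold
    if n_actual < n_desired:       # too few spikes: push the peak up
        return lr * max(gap, 0.0), t_star
    if n_actual > n_desired:       # too many spikes: push the peak down
        return -lr * max(-gap, 0.0), t_star
    return 0.0, t_star             # spike count already matches the target

# peak below threshold and a spike still needed -> positive update at the peak
delta, t = multispike_update([0.1, 0.8, 0.3], theta=1.0, n_desired=1, n_actual=0)
```

The key point is that updates key off a single salient point of the trace rather than exact target firing times, which is where the efficiency and noise-robustness claims come from.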

AAAI Conference 2019 Short Paper

Find a Reasonable Ending for Stories: Does Logic Relation Help the Story Cloze Test?

  • Mingyue Shang
  • Zhenxin Fu
  • Hongzhi Yin
  • Bo Tang
  • Dongyan Zhao
  • Rui Yan

Natural language understanding is a challenging problem that covers a wide range of tasks. While previous methods generally train each task separately, we consider combining cross-task features to enhance task performance. In this paper, we incorporate logic information, with the help of the Natural Language Inference (NLI) task, into the Story Cloze Test (SCT). Previous work on SCT considered various kinds of semantic information, such as sentiment and topic, but lacked the logic information between sentences, which is an essential element of stories. Thus we propose to extract the logic information over the course of the story to improve the understanding of the whole story. The logic information is modeled with the help of the NLI task. Experimental results demonstrate the strength of the logic information.

IJCAI Conference 2019 Conference Paper

GSN: A Graph-Structured Network for Multi-Party Dialogues

  • Wenpeng Hu
  • Zhangming Chan
  • Bing Liu
  • Dongyan Zhao
  • Jinwen Ma
  • Rui Yan

Existing neural models for dialogue response generation assume that utterances are sequentially organized. However, many real-world dialogues involve multiple interlocutors (i.e., multi-party dialogues), where the assumption does not hold as utterances from different interlocutors can occur ``in parallel.'' This paper generalizes existing sequence-based models to a Graph-Structured neural Network (GSN) for dialogue modeling. The core of GSN is a graph-based encoder that can model the information flow along graph-structured dialogues (two-party sequential dialogues are a special case). Experimental results show that GSN significantly outperforms existing sequence-based models.
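The graph-based information flow can be sketched as a toy message-passing pass over the reply-to graph. The simple averaging aggregation below stands in for GSN's learned encoder and is an illustrative assumption; `replies_to` maps each utterance index to the indices of the utterances it responds to, so a two-party chain is just the special case where each utterance replies to its predecessor.

```python
def gsn_encode(utt_vecs, replies_to, rounds=2):
    """Toy graph-structured encoding: each utterance state is mixed
    with the average state of the utterances it replies to, for a
    fixed number of propagation rounds."""
    states = [list(v) for v in utt_vecs]
    for _ in range(rounds):
        new = []
        for i, v in enumerate(states):
            preds = [states[j] for j in replies_to[i]]
            if preds:
                dim = len(v)
                agg = [sum(p[k] for p in preds) / len(preds) for k in range(dim)]
                new.append([(a + b) / 2 for a, b in zip(v, agg)])
            else:
                new.append(list(v))  # root utterances keep their state
        states = new
    return states
```

With a branching `replies_to` graph, two replies to the same utterance are encoded "in parallel" rather than being forced into an artificial sequence.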

AAAI Conference 2019 Conference Paper

Insufficient Data Can Also Rock! Learning to Converse Using Smaller Data with Augmentation

  • Juntao Li
  • Lisong Qiu
  • Bo Tang
  • Dongmin Chen
  • Dongyan Zhao
  • Rui Yan

Recent successes of open-domain dialogue generation mainly rely on the advances of deep neural networks. The effectiveness of deep neural network models depends on the amount of training data. As it is laborious and expensive to acquire a huge amount of data in most scenarios, how to effectively utilize existing data is the crux of this issue. In this paper, we use data augmentation techniques to improve the performance of neural dialogue models under the condition of insufficient data. Specifically, we propose a novel generative model to augment existing data, where the conditional variational autoencoder (CVAE) is employed as the generator to output more training data with diversified expressions. To improve the correlation of each augmented training pair, we design a discriminator with adversarial training to supervise the augmentation process. Moreover, we thoroughly investigate various data augmentation schemes for neural dialogue systems with generative models, both GAN and CVAE. Experimental results on two open corpora, Weibo and Twitter, demonstrate the superiority of our proposed data augmentation model.

AAAI Conference 2019 Conference Paper

Learning to Write Stories with Thematic Consistency and Wording Novelty

  • Juntao Li
  • Lidong Bing
  • Lisong Qiu
  • Dongmin Chen
  • Dongyan Zhao
  • Rui Yan

Automatic story generation is a challenging task, which involves automatically composing a sequence of sentences or words with a consistent topic and novel wording. Although much attention has been paid to this task and promising progress has been made, there still exists a noticeable gap between generated stories and those created by humans, especially in terms of thematic consistency and wording novelty. To fill this gap, we propose a cache-augmented conditional variational autoencoder for story generation, where the cache module improves thematic consistency while the conditional variational autoencoder part generates stories with less common words by using a continuous latent variable. To combine the cache module and the autoencoder part, we further introduce an effective gate mechanism. Experimental results on ROCStories and WritingPrompts indicate that our proposed model can generate stories with consistency and wording novelty, and outperforms existing models under both automatic metrics and human evaluations.

IJCAI Conference 2019 Conference Paper

Learning towards Abstractive Timeline Summarization

  • Xiuying Chen
  • Zhangming Chan
  • Shen Gao
  • Meng-Hsuan Yu
  • Dongyan Zhao
  • Rui Yan

Timeline summarization aims to concisely summarize the evolution trajectory along a timeline, and existing timeline summarization approaches are all based on extractive methods. In this paper, we propose the task of abstractive timeline summarization, which aims to concisely paraphrase the information in the time-stamped events. Unlike traditional document summarization, timeline summarization needs to model the time-series information of the input events and summarize important events in chronological order. To tackle this challenge, we propose a memory-based timeline summarization model (MTS). Concretely, we propose a time-event memory to establish a timeline, and use the time position of events on this timeline to guide the generation process. Besides, in each decoding step, we incorporate event-level information into word-level attention to avoid confusion between events. Extensive experiments are conducted on a large-scale real-world dataset, and the results show that MTS achieves state-of-the-art performance in terms of both automatic and human evaluations.

AAAI Conference 2019 Conference Paper

Plan-and-Write: Towards Better Automatic Storytelling

  • Lili Yao
  • Nanyun Peng
  • Ralph Weischedel
  • Kevin Knight
  • Dongyan Zhao
  • Rui Yan

Automatic storytelling is challenging since it requires generating long, coherent natural language to describe a sensible sequence of events. Despite considerable efforts on automatic story generation in the past, prior work is either restricted in plot planning or can only generate stories in a narrow domain. In this paper, we explore open-domain story generation that writes stories given a title (topic) as input. We propose a plan-and-write hierarchical generation framework that first plans a storyline and then generates a story based on the storyline. We compare two planning strategies. The dynamic schema interweaves story planning and its surface realization in text, while the static schema plans out the entire storyline before generating stories. Experiments show that with explicit storyline planning, the generated stories are more diverse, coherent, and on topic than those generated without creating a full plan, according to both automatic and human evaluations.

IJCAI Conference 2019 Conference Paper

Relation-Aware Entity Alignment for Heterogeneous Knowledge Graphs

  • Yuting Wu
  • Xiao Liu
  • Yansong Feng
  • Zheng Wang
  • Rui Yan
  • Dongyan Zhao

Entity alignment is the task of linking entities with the same real-world identity from different knowledge graphs (KGs), and has recently been dominated by embedding-based methods. Such approaches work by learning KG representations so that entity alignment can be performed by measuring the similarities between entity embeddings. While promising, prior works in the field often fail to properly capture complex relation information that commonly exists in multi-relational KGs, leaving much room for improvement. In this paper, we propose a novel Relation-aware Dual-Graph Convolutional Network (RDGCN) to incorporate relation information via attentive interactions between the knowledge graph and its dual relation counterpart, and further capture neighboring structures to learn better entity representations. Experiments on three real-world cross-lingual datasets show that our approach delivers better and more robust results than state-of-the-art alignment methods by learning better KG representations.

IJCAI Conference 2018 Conference Paper

"Chitty-Chitty-Chat Bot": Deep Learning for Conversational AI

  • Rui Yan

Conversational AI is of growing importance since it enables an easy interaction interface between humans and computers. Due to its promising potential and alluring commercial value as virtual assistants and/or social chatbots, major AI, NLP, and Search & Mining conferences are explicitly calling out for contributions from conversational studies. It is an active research area of considerable interest. Building a conversational system with moderate intelligence is challenging, and requires abundant dialogue data and interdisciplinary techniques. Along with the Web 2.0, the massive data available greatly facilitate data-driven methods such as deep learning for human-computer conversations. In general, conversational systems can be categorized into 1) task-oriented systems, which aim to help users accomplish goals in vertical domains, and 2) social chatbots, which can converse seamlessly and appropriately with humans, playing the role of a chat companion. In this paper, we focus on surveying non-task-oriented chit-chat bots.

IJCAI Conference 2018 Conference Paper

An Ensemble of Retrieval-Based and Generation-Based Human-Computer Conversation Systems

  • Yiping Song
  • Cheng-Te Li
  • Jian-Yun Nie
  • Ming Zhang
  • Dongyan Zhao
  • Rui Yan

Human-computer conversation systems have attracted much attention in Natural Language Processing. Conversation systems can be roughly divided into two categories: retrieval-based and generation-based systems. Retrieval systems search a user-issued utterance (namely a query) in a large conversational repository and return a reply that best matches the query. Generative approaches synthesize new replies. Both ways have certain advantages but suffer from their own disadvantages. We propose a novel ensemble of retrieval-based and generation-based conversation systems. The retrieved candidates, in addition to the original query, are fed to a reply generator via a neural network, so that the model is aware of more information. The generated reply together with the retrieved ones then participates in a re-ranking process to find the final reply to output. Experimental results show that such an ensemble system outperforms each single module by a large margin.

IJCAI Conference 2018 Conference Paper

Get The Point of My Utterance! Learning Towards Effective Responses with Multi-Head Attention Mechanism

  • Chongyang Tao
  • Shen Gao
  • Mingyue Shang
  • Wei Wu
  • Dongyan Zhao
  • Rui Yan

Attention mechanisms have become a popular and widely used component in sequence-to-sequence models. However, previous neural generative dialogue systems tend to generate universal responses, and the attention distribution learned by the model tends to attend to the same semantic aspect. To solve this problem, in this paper, we propose a novel Multi-Head Attention Mechanism (MHAM) for generative dialog systems, which aims at capturing multiple semantic aspects of the user utterance. Further, a regularizer is formulated to force different attention heads to concentrate on distinct aspects. The proposed mechanism leads to more informative, diverse, and relevant generated responses. Experimental results show that our proposed model outperforms several strong baselines.

IJCAI Conference 2018 Conference Paper

Learning to Converse with Noisy Data: Generation with Calibration

  • Mingyue Shang
  • Zhenxin Fu
  • Nanyun Peng
  • Yansong Feng
  • Dongyan Zhao
  • Rui Yan

The availability of abundant conversational data on the Internet brought prosperity to generation-based open-domain conversation systems. In training generation models, existing methods generally treat all the training data equivalently. However, data crawled from websites may contain much noise. Blindly training with noisy data can harm the performance of the final generation model. In this paper, we propose a generation with calibration framework that allows high-quality data to have more influence on the generation model and reduces the effect of noisy data. Specifically, for each instance in the training set, we employ a calibration network to produce a quality score for it, and the score is then used for the weighted update of the generation model parameters. Experiments show that the calibrated model outperforms baseline methods on both automatic evaluation metrics and human annotations.
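The quality-weighted update can be illustrated with a one-parameter least-squares model standing in for the generation model. The per-instance quality scores are supplied directly here, whereas the paper produces them with a learned calibration network; all names are illustrative.

```python
def calibrated_fit(data, quality, lr=0.01, epochs=500):
    """Sketch of 'generation with calibration' as weighted updates:
    each instance's gradient step is scaled by its quality score, so
    noisy instances (low score) barely move the parameters. The
    scalar model y = a * x is a toy stand-in for a dialogue model."""
    a = 0.0
    for _ in range(epochs):
        for (x, y), q in zip(data, quality):
            grad = 2 * x * (a * x - y)  # d/da of the squared error (a*x - y)^2
            a -= lr * q * grad          # quality-weighted parameter update
    return a
```

Down-weighting the outlier pair recovers the clean trend, while treating all instances equivalently lets the noisy pair dominate — the abstract's argument in miniature.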

IJCAI Conference 2018 Conference Paper

One "Ruler" for All Languages: Multi-Lingual Dialogue Evaluation with Adversarial Multi-Task Learning

  • Xiaowei Tong
  • Zhenxin Fu
  • Mingyue Shang
  • Dongyan Zhao
  • Rui Yan

Automatically evaluating the performance of open-domain dialogue systems is a challenging problem. Recent work on neural network-based metrics has shown promising opportunities for automatic dialogue evaluation. However, existing methods mainly focus on monolingual evaluation, in which the trained metric is not flexible enough to transfer across different languages. To address this issue, we propose an adversarial multi-task neural metric (ADVMT) for multi-lingual dialogue evaluation, with shared feature extraction across languages. We evaluate the proposed model in two different languages. Experiments show that the adversarial multi-task neural metric achieves a high correlation with human annotation, yielding better performance than monolingual and various existing metrics.

AAAI Conference 2018 Conference Paper

RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems

  • Chongyang Tao
  • Lili Mou
  • Dongyan Zhao
  • Rui Yan

Open-domain human-computer conversation has been attracting increasing attention over the past few years. However, there does not exist a standard automatic evaluation metric for open-domain dialog systems; researchers usually resort to human annotation for model evaluation, which is time- and labor-intensive. In this paper, we propose RUBER, a Referenced metric and Unreferenced metric Blended Evaluation Routine, which evaluates a reply by taking into consideration both a ground-truth reply and a query (the previous user-issued utterance). Our metric is learnable, but its training does not require labels of human satisfaction. Hence, RUBER is flexible and extensible to different datasets and languages. Experiments on both retrieval and generative dialog systems show that RUBER has a high correlation with human annotation, and that RUBER has fair transferability over different datasets.
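The blending step can be sketched as follows: the referenced score is a cosine similarity between reply and ground-truth embeddings, while the unreferenced score (a learned reply-query match network in RUBER) is supplied by the caller as a plain number. The `pool` argument stands in for the blending strategies the routine can use; the embedding choice is an assumption for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)

def ruber_style_score(reply_vec, ref_vec, unref_score, pool=max):
    """Blend a referenced metric (similarity of the reply to the
    ground-truth reply) with an unreferenced metric (reply-query
    relatedness, supplied externally). `pool` can be max, min, or an
    averaging function."""
    referenced = cosine(reply_vec, ref_vec)
    return pool(referenced, unref_score)
```

Because neither component needs human satisfaction labels, the blended routine stays unsupervised in the sense the abstract describes.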

AAAI Conference 2018 Conference Paper

Scale Up Event Extraction Learning via Automatic Training Data Generation

  • Ying Zeng
  • Yansong Feng
  • Rong Ma
  • Zheng Wang
  • Rui Yan
  • Chongde Shi
  • Dongyan Zhao

The task of event extraction has long been investigated in a supervised learning paradigm, which is bound by the number and the quality of the training instances. Existing training data must be manually generated through a combination of expert domain knowledge and extensive human involvement. However, due to the drastic effort required in annotating text, the resultant datasets are usually small, which severely affects the quality of the learned model, making it hard to generalize. Our work develops an automatic approach for generating training data for event extraction. Our approach allows us to scale up event extraction training instances from thousands to hundreds of thousands, and it does this at a much lower cost than a manual approach. We achieve this by employing distant supervision to automatically create event annotations from unlabelled text using existing structured knowledge bases or tables. We then develop a neural network model with post inference to transfer the knowledge extracted from structured knowledge bases to automatically annotate typed events with corresponding arguments in text. We evaluate our approach by using the knowledge extracted from Freebase to label texts from Wikipedia articles. Experimental results show that our approach can generate a large number of high-quality training instances. We show that this large volume of training data not only leads to a better event extractor, but also allows us to detect multiple typed events.

IJCAI Conference 2018 Conference Paper

Smarter Response with Proactive Suggestion: A New Generative Neural Conversation Paradigm

  • Rui Yan
  • Dongyan Zhao

Conversational systems are becoming more and more promising by playing an important role in human-computer communication. A conversational system is supposed to be intelligent enough to enable human-like interactions. The long-term goal of smart human-computer conversation is challenging and heavily driven by data. Thanks to the prosperity of the Web 2.0, a large volume of conversational data has become available for establishing human-computer conversational systems. Given a human-issued message, namely a query, a traditional conversational system provides a response after proper training on how to respond like humans. In this paper, we propose a new paradigm for neural generative conversations: a smarter response with a suggestion is provided given the query. We assume that this new conversation mode, which proactively introduces content as next utterances, keeps users actively engaged. To address the task, we propose a novel integrated model to handle both response generation and suggestion generation. The experimental results verify the effectiveness of the new neural generative conversation paradigm.

AAAI Conference 2018 Conference Paper

Style Transfer in Text: Exploration and Evaluation

  • Zhenxin Fu
  • Xiaoye Tan
  • Nanyun Peng
  • Dongyan Zhao
  • Rui Yan

The ability to transfer the style of text or images is an important measure of the advancement of artificial intelligence (AI). However, progress in language style transfer lags behind other domains, such as computer vision, mainly because of the lack of parallel data and reliable evaluation metrics. In response to the challenge of lacking parallel data, we explore learning style transfer from non-parallel data. We propose two models to achieve this goal. The key idea behind the proposed models is to learn separate content representations and style representations using adversarial networks. Considering the problem of lacking principled evaluation metrics, we propose two novel evaluation metrics that measure two aspects of style transfer: transfer strength and content preservation. We benchmark our models and the evaluation metrics on two style transfer tasks: paper-news title transfer, and positive-negative review transfer. Results show that the proposed content preservation metric is highly correlated with human judgments, and the proposed models are able to generate sentences with similar content preservation scores but higher style transfer strength compared to the autoencoder.

AAAI Conference 2018 Conference Paper

Towards a Neural Conversation Model With Diversity Net Using Determinantal Point Processes

  • Yiping Song
  • Rui Yan
  • Yansong Feng
  • Yaoyuan Zhang
  • Dongyan Zhao
  • Ming Zhang

Typically, neural conversation systems generate replies based on the sequence-to-sequence (seq2seq) model. seq2seq tends to produce safe and universal replies, which suffer from a lack of diversity and information. Determinantal Point Processes (DPPs) are probabilistic models defined on item sets, which can select items with good diversity and quality. In this paper, we investigate the diversity issue in two different aspects, namely query-level and system-level diversity. We propose a novel framework that organically combines the seq2seq model with DPPs. The new framework achieves high quality in the generated replies and significantly improves the diversity among them. Experiments show that our model achieves the best performance among various baselines in terms of both quality and diversity.
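DPP-based selection of a diverse, high-quality reply set can be sketched with the standard greedy MAP heuristic on a quality-modulated similarity kernel. The kernel construction `L[i][j] = q_i * s_ij * q_j` and the greedy schedule are generic DPP practice, not necessarily the paper's exact procedure; the determinant routine is a plain Gaussian elimination for small matrices.

```python
def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n, d = len(m), 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        if abs(m[p][i]) < 1e-12:
            return 0.0
        if p != i:
            m[i], m[p] = m[p], m[i]
            d = -d
        d *= m[i][i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= f * m[i][c]
    return d

def dpp_greedy(quality, sim, k):
    """Greedily pick k items maximizing det of the DPP kernel
    restricted to the selected set: near-duplicate items shrink the
    determinant, so quality and diversity are traded off jointly."""
    n = len(quality)
    L = [[quality[i] * sim[i][j] * quality[j] for j in range(n)] for i in range(n)]
    selected = []
    for _ in range(k):
        best, best_gain = None, 0.0
        for j in range(n):
            if j in selected:
                continue
            idx = selected + [j]
            g = det([[L[a][b] for b in idx] for a in idx])
            if g > best_gain:
                best, best_gain = j, g
        if best is None:
            break
        selected.append(best)
    return selected
```

In the test below, the second-best-quality reply is a near-duplicate of the best one, so the greedy step skips it in favor of a distinct, slightly lower-quality reply.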

IJCAI Conference 2016 Conference Paper

i, Poet: Automatic Poetry Composition through Recurrent Neural Networks with Iterative Polishing Schema

  • Rui Yan

Part of the long-lasting cultural heritage of humanity is the art of classical poems, which are created by fitting words into certain formats and representations. Automatic poetry composition by computers is considered a challenging problem that requires high Artificial Intelligence assistance. This study attracts more and more attention in the research community. In this paper, we formulate the poetry composition task as a natural language generation problem using recurrent neural networks. Given user-specified writing intents, the system generates a poem via sequential language modeling. Unlike the traditional one-pass generation of previous neural network models, poetry composition needs polishing to satisfy certain requirements. Hence, we propose a new generative model with a polishing schema that outputs a refined poem composition. In this way, the poem is generated incrementally and iteratively by refining each line. We run experiments on a large dataset of 61,960 classic poems in Chinese. A comprehensive evaluation, using perplexity and BLEU measurements as well as human judgments, has demonstrated the effectiveness of our proposed approach.

IJCAI Conference 2016 Conference Paper

StalemateBreaker: A Proactive Content-Introducing Approach to Automatic Human-Computer Conversation

  • Xiang Li
  • Lili Mou
  • Rui Yan
  • Ming Zhang

Existing open-domain human-computer conversation systems are typically passive: they either synthesize or retrieve a reply provided with a human-issued utterance. It is generally presumed that humans should take the role of leading the conversation and introduce new content when a stalemate occurs, and that computers only need to "respond." In this paper, we propose STALEMATEBREAKER, a conversation system that can proactively introduce new content when appropriate. We design a pipeline to determine when, what, and how to introduce new content during human-computer conversation. We further propose a novel reranking algorithm, Bi-PageRank-HITS, to enable rich interaction between conversation context and candidate replies. Experiments show that both the content-introducing approach and the reranking algorithm are effective. Our full STALEMATEBREAKER model outperforms a state-of-the-practice conversation system by +14.4% p@1 when a stalemate occurs.

IJCAI Conference 2015 Conference Paper

Opportunities or Risks to Reduce Labor in Crowdsourcing Translation? Characterizing Cost versus Quality via a PageRank-HITS Hybrid Model

  • Rui Yan
  • Yiping Song
  • Cheng-Te Li
  • Ming Zhang
  • Xiaohua Hu

Crowdsourcing machine translation has the advantage of lower monetary expense in collecting translated data. Yet, compared with translation by trained professionals, results collected from non-professional translators might yield low-quality outputs. A general solution for crowdsourcing practitioners is to employ a large labor force to gather enough redundant data and then solicit from it. We can further save money by avoiding collecting bad translations. We propose to score Turkers by their authority during observation, and then stop hiring the unqualified Turkers. In this way, we bring both opportunities and risks to crowdsourced translation: we can make it even cheaper, but we might suffer from quality loss. In this paper, we propose a graph-based PageRank-HITS hybrid model to distinguish authoritative workers from unreliable ones. The algorithm captures the intuition that good translations and good workers are mutually reinforced iteratively in the proposed framework. We demonstrate that the algorithm maintains performance while reducing the labor force and hence cutting cost. We run experiments on the NIST 2009 Urdu-to-English evaluation set with Mechanical Turk, and quantitatively evaluate the performance in terms of BLEU score, Pearson correlation, and real money.
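The mutual reinforcement between worker authority and translation quality can be sketched as alternating score propagation on the worker-translation bipartite graph, with a PageRank-style damping term. The update schedule and normalization below are illustrative assumptions, not the paper's exact Hybrid model; `edges` lists `(worker, translation)` contribution pairs.

```python
def hybrid_rank(edges, n_workers, n_items, iters=100, d=0.85):
    """Toy PageRank-HITS hybrid on a worker-translation bipartite
    graph: a translation's quality accumulates the authority of the
    workers who produced it, a worker's authority accumulates the
    quality of their translations, and a damping term (1 - d) keeps
    every node with a baseline score."""
    auth = [1.0 / n_workers] * n_workers
    qual = [1.0 / n_items] * n_items
    for _ in range(iters):
        qual = [(1 - d) / n_items + d * sum(auth[w] for w, t in edges if t == i)
                for i in range(n_items)]
        s = sum(qual)
        qual = [x / s for x in qual]
        auth = [(1 - d) / n_workers + d * sum(qual[t] for w, t in edges if w == j)
                for j in range(n_workers)]
        s = sum(auth)
        auth = [x / s for x in auth]
    return auth, qual
```

Ranking workers by the converged `auth` scores gives a cutoff for which Turkers to stop hiring, which is the cost-saving mechanism the abstract describes.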

IJCAI Conference 2013 Conference Paper

i, Poet: Automatic Chinese Poetry Composition through a Generative Summarization Framework under Constrained Optimization

  • Rui Yan
  • Han Jiang
  • Mirella Lapata
  • Shou-De Lin
  • Xueqiang Lv
  • Xiaoming Li

Part of the long-lasting cultural heritage of China is the classical ancient Chinese poems, which follow strict formats and complicated linguistic rules. Automatic Chinese poetry composition by programs is considered a challenging problem in computational linguistics, requires high Artificial Intelligence assistance, and has not been well addressed. In this paper, we formulate the poetry composition task as an optimization problem based on a generative summarization framework under several constraints. Given the user-specified writing intents, the system retrieves candidate terms out of a large poem corpus, and then orders these terms to fit into poetry formats, satisfying tonal and rhythm requirements. The optimization process under constraints is conducted via iterative term substitutions until convergence, and outputs the subset with the highest utility as the generated poem. For experiments, we perform generation on a large dataset of 61,960 classic poems from the Tang and Song Dynasties of China. A comprehensive evaluation, using both human judgments and ROUGE scores, has demonstrated the effectiveness of our proposed approach.