Arrow Research search

Author name cluster

Jin Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

26 papers
2 author rows

Possible papers

26

TCS Journal 2026 Journal Article

A space improved algorithm for chromatic number

  • Pu Wu
  • Huanyu Gu
  • Huiqin Jiang
  • Zehui Shao
  • Jin Xu

We investigate the chromatic number problem, a classic NP-complete problem identified by Karp among his 21 seminal problems. The chromatic number of a graph G is the smallest integer k such that each vertex of G can be assigned one of k colors, with no two adjacent vertices assigned the same color. The chromatic number problem requires determining this minimum k for a given graph G with n vertices. Two questions remain unresolved: whether an algorithm for the chromatic number problem with time complexity O*(a^n) for some a < 2 exists, and whether one with time complexity O*(2^n) and polynomial space exists. The fastest known algorithm for the chromatic number problem was proposed by Björklund, Husfeldt, and Koivisto (FOCS 2006), with time and space complexity O*(2^n). Subsequently, in their follow-up work (ICALP 2010), the space complexity was reduced to O(1.2916^n). In this work, we present an improved algorithm for the chromatic number problem. Building on prior research, our approach leverages algebraic methods, specifically generating functions and the discrete Fourier transform. Our main contribution demonstrates that by utilizing these algebraic techniques, certain structural properties of graphs can be exploited to reduce space complexity while preserving the best-known time complexity of O*(2^n). Specifically, our algorithm achieves a time complexity of O*(2^n) and a space complexity of O*(2^(9n/25)) = O(1.2835^n).
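For context, the O*(2^n) baseline the abstract refers to has a compact statement: a graph is k-colorable iff its vertex set can be covered by k independent sets, which inclusion-exclusion over vertex subsets detects. The sketch below (plain Python, function names our own) illustrates that classic Björklund–Husfeldt–Koivisto formulation for small graphs; it does not implement the paper's space-improved algorithm.

```python
def chromatic_number(n, edges):
    """Chromatic number via inclusion-exclusion over vertex subsets,
    O*(2^n) time and space. Vertices are 0..n-1; edges are pairs."""
    nbh = [1 << v for v in range(n)]          # closed neighbourhoods as bitmasks
    for u, v in edges:
        nbh[u] |= 1 << v
        nbh[v] |= 1 << u
    # a[S] = number of independent sets (including the empty set) inside S,
    # via the recurrence a(S) = a(S - v) + a(S - N[v]) for any v in S
    a = [0] * (1 << n)
    a[0] = 1
    for S in range(1, 1 << n):
        v = (S & -S).bit_length() - 1         # lowest-index vertex in S
        a[S] = a[S & ~(1 << v)] + a[S & ~nbh[v]]

    def k_colorable(k):
        # V is coverable by k independent sets iff
        # sum over S of (-1)^{n-|S|} * a(S)^k is positive
        total = 0
        for S in range(1 << n):
            sign = -1 if (n - bin(S).count("1")) & 1 else 1
            total += sign * a[S] ** k
        return total > 0

    return next(k for k in range(1, n + 1) if k_colorable(k))
```

The paper's contribution is to evaluate essentially this sum with far less than 2^n stored values, trading memory for algebraic structure.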

AAAI Conference 2026 Conference Paper

ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning

  • Juyuan Wang
  • Rongchen Zhao
  • Wei Wei
  • Yufeng Wang
  • Mo Yu
  • Jie Zhou
  • Jin Xu
  • Liyan Xu

Narrative comprehension of long stories and novels has been a challenging domain, owing to their intricate plotlines and entangled, often evolving relations among characters and entities. Given LLMs' diminished reasoning over extended context and their high computational cost, retrieval-based approaches retain a pivotal role in practice. However, traditional RAG methods can fall short due to their stateless, single-step retrieval process, which often overlooks the dynamic, interconnected nature of relations within long-range context. In this work, we propose ComoRAG, built on the principle that narrative reasoning is not a one-shot process but a dynamic, evolving interplay between new evidence acquisition and past knowledge consolidation, analogous to how human cognition reasons with memory-related signals in the brain. Specifically, when encountering a reasoning impasse, ComoRAG undergoes iterative reasoning cycles while interacting with a dynamic memory workspace. In each cycle, it generates probing queries to devise new exploratory paths, then integrates the retrieved evidence of new aspects into a global memory pool, thereby supporting the emergence of a coherent context for query resolution. Across four challenging long-context narrative benchmarks (200K+ tokens), ComoRAG outperforms strong RAG baselines, with consistent relative gains of up to 11% over the strongest baseline. Further analysis reveals that ComoRAG is particularly advantageous for complex queries requiring global comprehension, offering a principled, cognitively motivated paradigm for retrieval-based stateful reasoning.

NeurIPS Conference 2025 Conference Paper

From Specificity to Generality: Revisiting Generalizable Artifacts in Detecting Face Deepfakes

  • Long Ma
  • Zhiyuan Yan
  • Jin Xu
  • Yize Chen
  • Qinglang Guo
  • Zhen Bi
  • Yong Liao
  • Hui Lin

Detecting deepfakes has been an increasingly important topic, especially given the rapid development of AI generation techniques. In this paper, we ask: How can we build a universal detection framework that is effective for most facial deepfakes? One significant challenge is the wide variety of deepfake generators available, resulting in varying forgery artifacts (e.g., lighting inconsistency, color mismatch, etc.). But should we "teach" the detector to learn all these artifacts separately? It is impossible and impractical to elaborate on them all. So the core idea is to pinpoint the more common and general artifacts across different deepfakes. Accordingly, we categorize deepfake artifacts into two distinct yet complementary types: Face Inconsistency Artifacts (FIA) and Up-Sampling Artifacts (USA). FIA arise from the challenge of generating all intricate details, inevitably causing inconsistencies between the complex facial features and the relatively uniform surrounding areas. USA, on the other hand, are the inevitable traces left by the generator's decoder during the up-sampling process. This categorization stems from the observation that all existing deepfakes typically exhibit one or both of these artifacts. To achieve this, we propose a new data-level pseudo-fake creation framework that constructs fake samples with only the FIA and USA, without introducing extra, less general artifacts. Specifically, we employ super-resolution to simulate the USA, while using image-level self-blending on diverse facial regions to create the FIA. Surprisingly, we find that, with this intuitive design, a standard image classifier trained only with our pseudo-fake data can non-trivially generalize well to previously unseen deepfakes.

NeurIPS Conference 2025 Conference Paper

TANDEM: Bi-Level Data Mixture Optimization with Twin Networks

  • Jiaxing Wang
  • Deping Xiang
  • Jin Xu
  • Mingyang Yi
  • Guoqiang Gong
  • Zicheng Zhang
  • Haoran Li
  • Pengzhang Liu

The capabilities of large language models (LLMs) significantly depend on training data drawn from various domains. Optimizing domain-specific mixture ratios can be modeled as a bi-level optimization problem, which we simplify into a single-level penalized form and solve with twin networks: a proxy model trained on primary data and a dynamically updated reference model trained with additional data. Our proposed method, Twin Networks for bi-level DatA mixturE optiMization (TANDEM), measures the data efficacy through the difference between the twin models and up-weights domains that benefit more from the additional data. TANDEM provides theoretical guarantees and wider applicability, compared to prior approaches. Furthermore, our bi-level perspective suggests new settings to study domain reweighting such as data-restricted scenarios and supervised fine-tuning, where optimized mixture ratios significantly improve the performance. Extensive experiments validate TANDEM's effectiveness in all scenarios.

NeurIPS Conference 2024 Conference Paper

$\boldsymbol{\mu}\mathbf{P^2}$: Effective Sharpness Aware Minimization Requires Layerwise Perturbation Scaling

  • Moritz Haas
  • Jin Xu
  • Volkan Cevher
  • Leena Chennuru Vankadara

Sharpness Aware Minimization (SAM) enhances performance across various neural architectures and datasets. As models are continually scaled up to improve performance, a rigorous understanding of SAM’s scaling behaviour is paramount. To this end, we study the infinite-width limit of neural networks trained with SAM, using the Tensor Programs framework. Our findings reveal that the dynamics of standard SAM effectively reduce to applying SAM solely in the last layer in wide neural networks, even with optimal hyperparameters. In contrast, we identify a stable parameterization with layerwise perturbation scaling, which we call *Maximal Update and Perturbation Parameterization* ($\mu$P$^2$), that ensures all layers are both feature learning and effectively perturbed in the limit. Through experiments with MLPs, ResNets and Vision Transformers, we empirically demonstrate that $\mu$P$^2$ is the first parameterization to achieve hyperparameter transfer of the joint optimum of learning rate and perturbation radius across model scales. Moreover, we provide an intuitive condition to derive $\mu$P$^2$ for other perturbation rules like Adaptive SAM and SAM-ON, also ensuring balanced perturbation effects across all layers.

EAAI Journal 2024 Journal Article

Bit depth enhancement method based on visual contrast perception features

  • Zhizhong Fu
  • Changmeng Peng
  • Xiaoyang Huang
  • Maohan Xia
  • Jin Xu
  • Xiaofeng Li

High-bit images can show more detail than low-bit images. False contours in low-bit images should be suppressed when converting them to high-bit ones. In this paper, a contrast-aware bit depth enhancement (BDE) module is proposed based on a visual perceptual feature that models the relationship between contrast sensitivity and image bit depth, and a bit depth enhancement network is constructed by cascading this module. The experimental results show that the bit depth enhancement algorithm based on this network structure has the best or second-best peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) metrics compared with related algorithms; on average, our PSNR (40.20 dB) is only 1.03% and our SSIM (0.9681) only 0.11% lower than those of the state of the art, while using 6.7% fewer parameters (11.2M). Visual comparisons show that our algorithm can effectively suppress false contours and color distortions, resulting in high-bit images of better quality.

NeurIPS Conference 2024 Conference Paper

cPAPERS: A Dataset of Situated and Multimodal Interactive Conversations in Scientific Papers

  • Anirudh Sundar
  • Jin Xu
  • William Gay
  • Christopher Richardson
  • Larry Heck

An emerging area of research in situated and multimodal interactive conversations (SIMMC) includes interactions in scientific papers. Since scientific papers are primarily composed of text, equations, figures, and tables, SIMMC methods must be developed specifically for each component to support the depth of inquiry and interactions required by research scientists. This work introduces $Conversational Papers$ (cPAPERS), a dataset of conversational question-answer pairs from reviews of academic papers grounded in these paper components and their associated references from scientific documents available on arXiv. We present a data collection strategy to collect these question-answer pairs from OpenReview and associate them with contextual information from $LaTeX$ source files. Additionally, we present a series of baseline approaches utilizing Large Language Models (LLMs) in both zero-shot and fine-tuned configurations to address the cPAPERS dataset.

NeurIPS Conference 2024 Conference Paper

On Feature Learning in Structured State Space Models

  • Leena Chennuru Vankadara
  • Jin Xu
  • Moritz Haas
  • Volkan Cevher

This paper studies the scaling behavior of state-space models (SSMs) and their structured variants, such as Mamba, that have recently arisen in popularity as alternatives to transformer-based neural network architectures. Specifically, we focus on the capability of SSMs to learn features as their network width approaches infinity. Our findings reveal that established scaling rules, such as the Maximal Update Parameterization, fail to support feature learning as these models cannot be represented in the form of Tensor Programs. Additionally, we demonstrate that spectral scaling conditions, shown to be effective for feature learning in a host of other architectures, do not hold the same implications for SSMs. Through a detailed signal propagation analysis in SSMs, both forward and backward, we identify the appropriate scaling necessary for non-trivial feature evolution in the infinite-width limit. Our proposed scaling shows behavior akin to the Maximal Update Parameterization, such as improved stability, better generalization, and transferability of optimal hyper-parameters from small to large scale SSMs.

AAAI Conference 2024 Conference Paper

SIG: Speaker Identification in Literature via Prompt-Based Generation

  • Zhenlin Su
  • Liyan Xu
  • Jin Xu
  • Jiangnan Li
  • Mingdu Huangfu

Identifying the speakers of quotations in narratives is an important task in literary analysis, with challenging scenarios including out-of-domain inference for unseen speakers, and non-explicit cases where there are no speaker mentions in the surrounding context. In this work, we propose SIG, a simple and effective generation-based method that verbalizes the task and quotation input based on designed prompt templates, which also enables easy integration of other auxiliary tasks that further bolster speaker identification performance. The prediction can either come from direct generation by the model, or be determined by the highest generation probability of each speaker candidate. By design, SIG supports out-of-domain evaluation and achieves an open-world classification paradigm able to accept any form of candidate input. We perform both cross-domain and in-domain evaluation on PDNC, the largest dataset for this task, where empirical results suggest that SIG outperforms previous baselines of complicated designs, as well as zero-shot ChatGPT, especially excelling in the hard non-explicit scenarios by up to 17% improvement. Additional experiments on another dataset, WP, further corroborate the efficacy of SIG.

ICLR Conference 2024 Conference Paper

Understanding In-Context Learning from Repetitions

  • Jianhao Yan
  • Jin Xu
  • Chiyu Song
  • Chenming Wu
  • Yafu Li
  • Yue Zhang 0004

This paper explores the elusive mechanism underpinning in-context learning in Large Language Models (LLMs). Our work provides a novel perspective by examining in-context learning via the lens of surface repetitions. We quantitatively investigate the role of surface features in text generation, and empirically establish the existence of \emph{token co-occurrence reinforcement}, a principle that strengthens the relationship between two tokens based on their contextual co-occurrences. Furthermore, we find that similar reinforcements exist in the pretraining corpus, revealing that the phenomenon arises from LLMs' efforts to maximize likelihood. By investigating the dual impacts of these features, our research illuminates the internal workings of in-context learning and expounds on the reasons for its failures. This paper makes an essential contribution to the understanding of in-context learning and its potential limitations, providing a fresh perspective on this exciting capability.

NeurIPS Conference 2023 Conference Paper

Deep Stochastic Processes via Functional Markov Transition Operators

  • Jin Xu
  • Emilien Dupont
  • Kaspar Märtens
  • Thomas Rainforth
  • Yee Whye Teh

We introduce Markov Neural Processes (MNPs), a new class of Stochastic Processes (SPs) which are constructed by stacking sequences of neural parameterised Markov transition operators in function space. We prove that these Markov transition operators can preserve the exchangeability and consistency of SPs. Therefore, the proposed iterative construction adds substantial flexibility and expressivity to the original framework of Neural Processes (NPs) without compromising consistency or adding restrictions. Our experiments demonstrate clear advantages of MNPs over baseline models on a variety of tasks.

NeurIPS Conference 2022 Conference Paper

Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation

  • Jin Xu
  • Xiaojiang Liu
  • Jianhao Yan
  • Deng Cai
  • Huayang Li
  • Jian Li

While large-scale neural language models, such as GPT2 and BART, have achieved impressive results on various text generation tasks, they tend to get stuck in undesirable sentence-level loops with maximization-based decoding algorithms (\textit{e.g.}, greedy search). This phenomenon is counter-intuitive since there are few consecutive sentence-level repetitions in the human corpus (e.g., 0.02\% in Wikitext-103). To investigate the underlying reasons for generating consecutive sentence-level repetitions, we study the relationship between the probability of repetitive tokens and their previous repetitions in context. Through our quantitative experiments, we find that 1) Models have a preference to repeat the previous sentence; 2) The sentence-level repetitions have a \textit{self-reinforcement effect}: the more times a sentence is repeated in the context, the higher the probability of continuing to generate that sentence; 3) The sentences with higher initial probabilities usually have a stronger self-reinforcement effect. Motivated by our findings, we propose a simple and effective training method \textbf{DITTO} (Pseu\underline{D}o-Repet\underline{IT}ion Penaliza\underline{T}i\underline{O}n), where the model learns to penalize probabilities of sentence-level repetitions from synthetic repetitive data. Although our method is motivated by mitigating repetitions, our experiments show that DITTO not only mitigates the repetition issue without sacrificing perplexity, but also achieves better generation quality. Extensive experiments on open-ended text generation (Wikitext-103) and text summarization (CNN/DailyMail) demonstrate the generality and effectiveness of our method.

AAAI Conference 2022 Conference Paper

Procedural Text Understanding via Scene-Wise Evolution

  • Jialong Tang
  • Hongyu Lin
  • Meng Liao
  • Yaojie Lu
  • Xianpei Han
  • Le Sun
  • Weijian Xie
  • Jin Xu

Procedural text understanding requires machines to reason about entity states within dynamic narratives. Current procedural text understanding approaches are commonly entity-wise, which separately track each entity and independently predict different states of each entity. Such an entity-wise paradigm does not consider the interaction between entities and their states. In this paper, we propose a new scene-wise paradigm for procedural text understanding, which jointly tracks the states of all entities in a scene-by-scene manner. Based on this paradigm, we propose Scene Graph Reasoner (SGR), which introduces a series of dynamically evolving scene graphs to jointly formulate the evolution of entities, states and their associations throughout the narrative. In this way, the deep interactions between all entities and states can be jointly captured and simultaneously derived from scene graphs. Experiments show that SGR not only achieves new state-of-the-art performance but also significantly accelerates the speed of reasoning.

TCS Journal 2022 Journal Article

Total coloring of recursive maximal planar graphs

  • Yangyang Zhou
  • Dongyang Zhao
  • Mingyuan Ma
  • Jin Xu

Recursive maximal planar graphs can be obtained from K_4 by repeatedly embedding a new vertex of degree 3 in a triangular face. A total k-coloring of a graph G is a coloring of its vertices and edges such that no two adjacent or incident elements receive the same color. The Total Coloring Conjecture (TCC) states that every simple graph G is totally (Δ+2)-colorable, where Δ is the maximum degree of G. In this paper, we prove that TCC holds for recursive maximal planar graphs; in particular, a main class of recursive maximal planar graphs, the (2,2)-recursive maximal planar graphs, are totally (Δ+1)-colorable. Moreover, we give linear-time algorithms for total coloring of recursive maximal planar graphs and (2,2)-recursive maximal planar graphs, respectively.
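The construction in the abstract's first sentence is simple enough to sketch directly. The toy Python below (our own naming, not from the paper) grows a recursive maximal planar graph by the described face-splitting operation; since each insertion adds three edges and splits one triangular face into three, the maximal planar edge count of 3n - 6 is preserved.

```python
import random

def recursive_maximal_planar(n, seed=0):
    """Build a recursive maximal planar graph on n >= 4 vertices:
    start from K4 and repeatedly embed a new degree-3 vertex inside a
    triangular face, joining it to the three corners of that face.
    Returns the edge set as a set of frozenset pairs."""
    rng = random.Random(seed)
    edges = {frozenset(e) for e in [(0, 1), (0, 2), (0, 3),
                                    (1, 2), (1, 3), (2, 3)]}
    faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]  # triangular faces of K4
    for v in range(4, n):
        a, b, c = faces.pop(rng.randrange(len(faces)))
        edges |= {frozenset((v, a)), frozenset((v, b)), frozenset((v, c))}
        faces += [(a, b, v), (a, c, v), (b, c, v)]  # the chosen face splits into three
    return edges
```

For example, `recursive_maximal_planar(10)` yields a graph with 3·10 − 6 = 24 edges.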

JAIR Journal 2022 Journal Article

Two-phase Multi-document Event Summarization on Core Event Graphs

  • Zengjian Chen
  • Jin Xu
  • Meng Liao
  • Tong Xue
  • Kun He

Succinct event description based on multiple documents is critical to news systems as well as search engines. Different from existing summarization or event tasks, Multi-document Event Summarization (MES) aims at the query-level event sequence generation, which has extra constraints on event expression and conciseness. Identifying and summarizing the key event from a set of related articles is a challenging task that has not been sufficiently studied, mainly because online articles exhibit characteristics of redundancy and sparsity, and a perfect event summarization needs high level information fusion among diverse sentences and articles. To address these challenges, we propose a two-phase framework for the MES task, that first performs event semantic graph construction and dominant event detection via graph-sequence matching, then summarizes the extracted key event by an event-aware pointer generator. For experiments in the new task, we construct two large-scale real-world datasets for training and assessment. Extensive evaluations show that the proposed framework significantly outperforms the related baseline methods, with the most dominant event of the articles effectively identified and correctly summarized.

NeurIPS Conference 2021 Conference Paper

FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition

  • Yichong Leng
  • Xu Tan
  • Linchen Zhu
  • Jin Xu
  • Renqian Luo
  • Linquan Liu
  • Tao Qin
  • Xiangyang Li

Error correction techniques have been used to refine the output sentences from automatic speech recognition (ASR) models and achieve a lower word error rate (WER) than the original ASR outputs. Previous works usually use a sequence-to-sequence model to correct an ASR output sentence autoregressively, which causes large latency and cannot be deployed in online ASR services. A straightforward solution to reduce latency, inspired by non-autoregressive (NAR) neural machine translation, is to use an NAR sequence generation model for ASR error correction, which, however, comes at the cost of a significantly increased ASR error rate. In this paper, observing distinctive error patterns and correction operations (i.e., insertion, deletion, and substitution) in ASR, we propose FastCorrect, a novel NAR error correction model based on edit alignment. In training, FastCorrect aligns each source token from an ASR output sentence to the target tokens of the corresponding ground-truth sentence based on the edit distance between the source and target sentences, and extracts the number of target tokens corresponding to each source token during editing/correction, which is then used to train a length predictor and to adjust the source tokens to match the length of the target sentence for parallel generation. In inference, the token number predicted by the length predictor is used to adjust the source tokens for target sequence generation. Experiments on the public AISHELL-1 dataset and an internal industrial-scale ASR dataset show the effectiveness of FastCorrect for ASR error correction: 1) it speeds up inference by 6-9 times while maintaining accuracy (8-14% WER reduction) compared with the autoregressive correction model; and 2) it outperforms the popular NAR models adopted in neural machine translation and text editing by a large margin.
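The edit-alignment idea — attributing target tokens to source tokens via edit distance so that a length predictor can be trained — can be sketched with a standard Levenshtein DP and backtrace. This is a simplified illustration with our own function name and tie-breaking; FastCorrect's actual alignment uses additional path-selection heuristics.

```python
def edit_alignment_counts(src, tgt):
    """For each source token, count how many target tokens align to it
    (0 = deletion, 1 = keep/substitute, >=2 = insertion attached to it),
    using a standard Levenshtein DP plus a backtrace."""
    m, n = len(src), len(tgt)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if src[i - 1] == tgt[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # delete src[i-1]
                          d[i][j - 1] + 1,        # insert tgt[j-1]
                          d[i - 1][j - 1] + sub)  # keep / substitute
    counts = [0] * m
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (0 if src[i - 1] == tgt[j - 1] else 1):
            counts[i - 1] += 1                    # keep or substitute
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            i -= 1                                # deletion: maps to 0 target tokens
        else:
            counts[i - 1 if i > 0 else 0] += 1    # insertion: attach to nearest source token
            j -= 1
    return counts
```

The per-token counts always sum to the target length, which is exactly the supervision signal a length predictor needs.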

AAAI Conference 2021 Conference Paper

Fully Exploiting Cascade Graphs for Real-time Forwarding Prediction

  • Xiangyun Tang
  • Dongliang Liao
  • Weijie Huang
  • Jin Xu
  • Liehuang Zhu
  • Meng Shen

Real-time forwarding prediction for predicting online contents’ popularity is beneficial to various social applications for enhancing interactive social behaviors. Cascade graphs, formed by online contents’ propagation, play a vital role in real-time forwarding prediction. Existing cascade graph modeling methods are inadequate to embed cascade graphs that have hub structures and deep cascade paths, or they fail to handle the short-term outbreak of forwarding amount. To this end, we propose a novel real-time forwarding prediction method that includes an effective approach for cascade graph embedding and a short-term variation sensitive method for time-series modeling, making the best of cascade graph features. Using two real world datasets, we demonstrate the significant superiority of the proposed method compared with the state-of-the-art. Our experiments also reveal interesting implications hidden in the performance differences between cascade graph embedding and time-series modeling.

NeurIPS Conference 2021 Conference Paper

Group Equivariant Subsampling

  • Jin Xu
  • Hyunjik Kim
  • Thomas Rainforth
  • Yee Teh

Subsampling is used in convolutional neural networks (CNNs) in the form of pooling or strided convolutions, to reduce the spatial dimensions of feature maps and to allow the receptive fields to grow exponentially with depth. However, it is known that such subsampling operations are not translation equivariant, unlike convolutions that are translation equivariant. Here, we first introduce translation equivariant subsampling/upsampling layers that can be used to construct exact translation equivariant CNNs. We then generalise these layers beyond translations to general groups, thus proposing group equivariant subsampling/upsampling. We use these layers to construct group equivariant autoencoders (GAEs) that allow us to learn low-dimensional equivariant representations. We empirically verify on images that the representations are indeed equivariant to input translations and rotations, and thus generalise well to unseen positions and orientations. We further use GAEs in models that learn object-centric representations on multi-object datasets, and show improved data efficiency and decomposition compared to non-equivariant baselines.
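The core observation — that fixed-phase subsampling breaks translation equivariance, while an input-dependent choice of sampling phase can restore it — can be shown with a one-dimensional toy (plain Python, our own naming; the paper's group-equivariant construction is more general):

```python
def equivariant_subsample(x, stride=2):
    """Input-dependent 1-D subsampling sketch: instead of always keeping
    indices 0, stride, 2*stride, ..., choose the sampling phase from the
    signal itself (here: the phase of its argmax), so that shifting the
    input shifts the output rather than changing its values."""
    phase = max(range(len(x)), key=x.__getitem__) % stride
    return x[phase::stride]
```

With `x = [0, 1, 5, 2, 0, 0, 0, 0]`, circularly shifting the input by one sample leaves the subsampled values intact (the phase adapts), whereas naive `x[0::2]` on the shifted input produces entirely different values.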

IJCAI Conference 2021 Conference Paper

GSPL: A Succinct Kernel Model for Group-Sparse Projections Learning of Multiview Data

  • Danyang Wu
  • Jin Xu
  • Xia Dong
  • Meng Liao
  • Rong Wang
  • Feiping Nie
  • Xuelong Li

This paper explores a succinct kernel model for Group-Sparse Projections Learning (GSPL), to handle the multiview feature selection task completely. Compared to previous works, our model has the following useful properties: 1) Strictness: GSPL innovatively learns group-sparse projections strictly on multiview data via an ℓ2,0-norm constraint, which differs from previous works that only encourage group-sparse projections softly. 2) Adaptivity: In the GSPL model, when the total number of selected features is given, the numbers of selected features from different views can be determined adaptively, which avoids artificial settings. Besides, GSPL can capture the differences among multiple views adaptively, which handles the inconsistency problem among different views. 3) Succinctness: Except for the intrinsic parameters of the projection-based feature selection task, GSPL does not introduce extra parameters, which guarantees its applicability in practice. To solve the optimization problem involved in GSPL, a novel iterative algorithm is proposed with rigorous theoretical guarantees. Experimental results demonstrate the superb performance of GSPL on synthetic and real datasets.
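For reference, the strict group-sparsity constraint in property 1) reads, under the standard definition of the ℓ2,0 pseudo-norm on the rows of a projection matrix W (the paper's exact formulation may differ):

```latex
\|W\|_{2,0} \;=\; \bigl|\{\, i : \|w_i\|_2 \neq 0 \,\}\bigr| \;\le\; k,
\qquad W = [w_1, \dots, w_d]^{\top},
```

i.e., at most k rows of W are nonzero, so at most k features are retained by the projection — a hard selection, in contrast to the soft shrinkage induced by an ℓ2,1 penalty.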

AAAI Conference 2021 Conference Paper

Hierarchical Coherence Modeling for Document Quality Assessment

  • Dongliang Liao
  • Jin Xu
  • Gongfu Li
  • Yiru Wang

Text coherence plays a key role in document quality assessment. Most existing text coherence methods only focus on the similarity of adjacent sentences. However, local coherence exists in sentences with broader contexts and diverse rhetoric relations, rather than just adjacent sentence similarity. Besides, high-level text coherence is also an important aspect of document quality. To this end, we propose a hierarchical coherence model for document quality assessment. In our model, we implement a local attention mechanism to capture location semantics, a bilinear tensor layer to measure coherence, and max-coherence pooling to acquire high-level coherence. We evaluate the proposed method on two realistic tasks: news quality judgement and automated essay scoring. Experimental results demonstrate the validity and superiority of our work.

NeurIPS Conference 2021 Conference Paper

Speech-T: Transducer for Text to Speech and Beyond

  • Jiawei Chen
  • Xu Tan
  • Yichong Leng
  • Jin Xu
  • Guihua Wen
  • Tao Qin
  • Tie-Yan Liu

Neural Transducer (e.g., RNN-T) has been widely used in automatic speech recognition (ASR) due to its capabilities of efficiently modeling monotonic alignments between input and output sequences and naturally supporting streaming inputs. Considering that monotonic alignments are also critical to text to speech (TTS) synthesis and that streaming TTS is an important application scenario, in this work we explore the possibility of applying Transducer to TTS and more. However, this is challenging because it is difficult to trade off the emission (continuous mel-spectrogram prediction) probability and the transition (the ASR Transducer predicts a blank token to indicate transition to the next input) probability when calculating the output probability lattice in Transducer, and it is not easy to learn the alignments between text and speech through the output probability lattice. We propose Speech Transducer (Speech-T for short), a Transformer-based Transducer model that 1) uses a new forward algorithm to separate the transition prediction from the continuous mel-spectrogram prediction when calculating the output probability lattice, and uses a diagonal constraint in the probability lattice to help alignment learning; 2) supports both full-sentence and streaming TTS by adjusting the look-ahead context; and 3) further supports both TTS and ASR together for the first time, which enjoys several advantages including fewer parameters as well as streaming synthesis and recognition in a single model. Experiments on the LJSpeech dataset demonstrate that Speech-T 1) is more robust than attention-based autoregressive TTS models due to its inherent monotonic alignments between text and speech; 2) naturally supports streaming TTS with good voice quality; and 3) enjoys the benefit of jointly modeling TTS and ASR in a single network.

AAAI Conference 2020 Conference Paper

Active Learning with Query Generation for Cost-Effective Text Classification

  • Yi-Fan Yan
  • Sheng-Jun Huang
  • Shaoyi Chen
  • Meng Liao
  • Jin Xu

Labeling a text document is usually time consuming because it requires the annotator to read the whole document and check its relevance with each possible class label. It thus becomes rather expensive to train an effective model for text classification when it involves a large dataset of long documents. In this paper, we propose an active learning approach for text classification with lower annotation cost. Instead of scanning all the examples in the unlabeled data pool to select the best one for query, the proposed method automatically generates the most informative examples based on the classification model, and thus can be applied to tasks with large-scale or even infinite unlabeled data. Furthermore, we propose to approximate the generated example with a few summary words by sparse reconstruction, which allows annotators to easily assign the class label by reading a few words rather than the long document. Experiments on different datasets demonstrate that the proposed approach can effectively improve classification performance while significantly reducing annotation cost.

AAAI Conference 2020 Conference Paper

Simultaneous Learning of Pivots and Representations for Cross-Domain Sentiment Classification

  • Liang Li
  • Weirui Ye
  • Mingsheng Long
  • Yateng Tang
  • Jin Xu
  • Jianmin Wang

Cross-domain sentiment classification aims to leverage useful knowledge from a source domain to mitigate the supervision sparsity in a target domain. A series of approaches depend on pivot features that behave similarly for polarity prediction in both domains. However, the engineering of such pivot features remains cumbersome and prevents us from learning disentangled and transferable representations from rich semantic and syntactic information. Towards learning the pivots and representations simultaneously, we propose a new Transferable Pivot Transformer (TPT). Our model consists of two networks: a Pivot Selector that learns to detect transferable n-gram pivots from contexts, and a Transferable Transformer that learns to generate domain-invariant representations by modeling the correlation between pivot and non-pivot words. The Pivot Selector and Transferable Transformer are jointly optimized through end-to-end back-propagation. We experiment with real tasks of cross-domain sentiment classification over 20 domain pairs where our model outperforms prior arts.

AAAI Conference 2020 Conference Paper

Transfer Value Iteration Networks

  • Junyi Shen
  • Hankz Hankui Zhuo
  • Jin Xu
  • Bin Zhong
  • Sinno Pan

Value iteration networks (VINs) have been demonstrated to have good generalization ability for reinforcement learning tasks across similar domains. However, based on our experiments, a policy learned by VINs still fails to generalize well on a domain whose action space and feature space are not identical to those in the domain where it is trained. In this paper, we propose a transfer learning approach on top of VINs, termed Transfer VINs (TVINs), such that a policy learned from a source domain can be generalized to a target domain with only limited training data, even if the source domain and the target domain have domain-specific actions and features. We empirically verify that our proposed TVINs outperform VINs when the source and the target domains have similar but not identical action and feature spaces. Furthermore, we show that the performance improvement is consistent across different environments, maze sizes, dataset sizes, as well as different values of hyperparameters such as the number of iterations and the kernel size.

AAAI Conference 2020 Conference Paper

Weak Supervision for Fake News Detection via Reinforcement Learning

  • Yaqing Wang
  • Weifeng Yang
  • Fenglong Ma
  • Jin Xu
  • Bin Zhong
  • Qiang Deng
  • Jing Gao

Today social media has become the primary source for news. Via social media platforms, fake news travels at unprecedented speed, reaches global audiences and puts users and communities at great risk. Therefore, it is extremely important to detect fake news as early as possible. Recently, deep learning based approaches have shown improved performance in fake news detection. However, the training of such models requires a large amount of labeled data, but manual annotation is time-consuming and expensive. Moreover, due to the dynamic nature of news, annotated samples may become outdated quickly and cannot represent the news articles on newly emerged events. Therefore, how to obtain fresh and high-quality labeled samples is the major challenge in employing deep learning models for fake news detection. In order to tackle this challenge, we propose a reinforced weakly-supervised fake news detection framework, i.e., WeFEND, which can leverage users' reports as weak supervision to enlarge the amount of training data for fake news detection. The proposed framework consists of three main components: the annotator, the reinforced selector and the fake news detector. The annotator can automatically assign weak labels for unlabeled news based on users' reports. The reinforced selector uses reinforcement learning techniques to choose high-quality samples from the weakly labeled data and filter out low-quality ones that may degrade the detector's prediction performance. The fake news detector aims to identify fake news based on the news content. We tested the proposed framework on a large collection of news articles published via WeChat official accounts and associated user reports. Extensive experiments on this dataset show that the proposed WeFEND model achieves the best performance compared with the state-of-the-art methods.
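The three-stage pipeline described above (annotator, reinforced selector, fake news detector) can be sketched as follows. All component logic here is a stand-in assumption: the report-matching rule, the length-based selector (the paper learns this policy with reinforcement learning), and the keyword detector (the paper trains a content-based classifier) are illustrative only.

```python
def annotate(news, reports):
    """Annotator: assign a weak label (1 = fake) to unlabeled news
    based on whether any user report mentions it."""
    reported = {r["news_id"] for r in reports}
    return [{"id": n["id"], "text": n["text"],
             "weak_label": int(n["id"] in reported)} for n in news]

def select(weakly_labeled, min_len=4):
    """Selector (stand-in): keep samples whose text is long enough to
    be informative; WeFEND learns this filtering policy with RL."""
    return [s for s in weakly_labeled if len(s["text"].split()) >= min_len]

def detect(sample, fake_keywords=("miracle", "shocking")):
    """Detector (stand-in): flag news containing sensational keywords;
    WeFEND uses a trained content-based classifier instead."""
    return int(any(k in sample["text"].lower() for k in fake_keywords))

news = [{"id": 1, "text": "Shocking miracle cure found, doctors stunned"},
        {"id": 2, "text": "City council approves the new annual budget plan"}]
reports = [{"news_id": 1}]

training_pool = select(annotate(news, reports))
predictions = {s["id"]: detect(s) for s in training_pool}
print(predictions)  # {1: 1, 2: 0}
```

The point of the middle stage is that weak labels are noisy: filtering before training keeps label noise from degrading the detector.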

AAAI Conference 2019 Conference Paper

Popularity Prediction on Online Articles with Deep Fusion of Temporal Process and Content Features

  • Dongliang Liao
  • Jin Xu
  • Gongfu Li
  • Weijie Huang
  • Weiqing Liu
  • Jing Li

Predicting the popularity of online articles sheds light on many applications such as recommendation, advertising and information retrieval. However, there are several technical challenges to be addressed to achieve the best predictive capability. (1) The popularity fluctuates under the impact of external factors, which are unpredictable and hard to capture. (2) Content and meta-data features, which largely determine online content popularity, are usually multi-modal and nontrivial to model. (3) Besides, it also needs to be figured out how to integrate temporal process and content feature modeling for popularity prediction in different lifecycle stages of online articles. In this paper, we propose a Deep Fusion of Temporal process and Content features (DFTC) method to tackle them. For modeling the temporal popularity process, we adopt the recurrent neural network and convolutional neural network. For multi-modal content features, we exploit the hierarchical attention network and embedding technique. Finally, a temporal attention fusion is employed for dynamically integrating all these parts. Using datasets collected from WeChat, we show that the proposed model significantly outperforms state-of-the-art approaches on popularity prediction.
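The lifecycle-dependent fusion described above can be sketched minimally: early in an article's life the content branch must carry the prediction, while later the observed temporal process dominates. Both branch scorers and the exponential gating function below are illustrative assumptions; the paper uses RNN/CNN for the temporal branch, hierarchical attention for content, and a learned temporal attention for fusion.

```python
import math

def temporal_score(view_counts):
    """Temporal branch (stand-in): summarize the observed popularity
    trajectory; DFTC uses an RNN and a CNN here."""
    return sum(view_counts) / len(view_counts)

def content_score(words, hot_words=("wechat", "ai")):
    """Content branch (stand-in): score content features; DFTC uses
    hierarchical attention over text and meta-data embeddings."""
    return sum(w.lower() in hot_words for w in words) / len(words)

def attention_fusion(t_score, c_score, age_hours, tau=24.0):
    """Fusion: weight the content branch when the article is young and
    few observations exist, then shift weight to the temporal branch.
    The exponential gate is a hand-picked stand-in for DFTC's learned
    temporal attention."""
    alpha = math.exp(-age_hours / tau)  # weight on content features
    return alpha * c_score + (1 - alpha) * t_score
```

For a brand-new article (`age_hours=0`) the fused score equals the content score; as `age_hours` grows, it converges to the temporal score.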