Arrow Research search

Author name cluster

Han Shi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

20 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

DeepWriter: A Multi-Agent Collaboration Framework for Information-rich Ultra-long Book Writing

  • Ming Wang
  • Minghao Hu
  • Xiuli Kang
  • Li He
  • Yu Tian
  • Chunming Liu
  • Han Shi
  • Zhunchen Luo

Long-form books are among the most information-rich and structurally complex forms of written content, often exceeding 100,000 words. While recent methods have enabled basic long-text generation, they remain limited in two key aspects: the inability to generate ultra-long content at book scale, and the lack of mechanisms for integrating rich factual information. To address these limitations, we propose DeepWriter, a multi-agent collaborative framework that follows a structured planning-then-generation paradigm. It first constructs a detailed book outline with narrative arcs and chapter semantics, then incrementally generates content conditioned on retrieved knowledge and contextual signals. DeepWriter supports controllable generation of full-length books exceeding 100,000 words, enriched with citations, trivia and images. To support evaluation beyond surface-level fluency, we introduce DeepWriter-Bench, a bilingual benchmark of 18 annotated books designed to assess book-scale coherence, richness, and factual grounding. Additionally, we propose BookScore, a unified 100-point metric for quantifying book maturity. Experimental results show that DeepWriter achieves a state-of-the-art BookScore of 80.92, consistently outperforming strong baselines.

ICLR Conference 2025 Conference Paper

Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding

  • Yao Teng
  • Han Shi
  • Xian Liu
  • Xuefei Ning
  • Guohao Dai 0001
  • Yu Wang 0002
  • Zhenguo Li
  • Xihui Liu

The current large auto-regressive models can generate high-quality, high-resolution images, but these models require hundreds or even thousands of steps of next-token prediction during inference, resulting in substantial time consumption. In existing studies, Jacobi decoding, an iterative parallel decoding algorithm, has been used to accelerate the auto-regressive generation and can be executed without training. However, the Jacobi decoding relies on a deterministic criterion to determine the convergence of iterations. Thus, it works for greedy decoding but is incompatible with sampling-based decoding which is crucial for visual quality and diversity in the current auto-regressive text-to-image generation. In this paper, we propose a training-free probabilistic parallel decoding algorithm, Speculative Jacobi Decoding (SJD), to accelerate auto-regressive text-to-image generation. By introducing a probabilistic convergence criterion, our SJD accelerates the inference of auto-regressive text-to-image generation while maintaining the randomness in sampling-based token decoding and allowing the model to generate diverse images. Specifically, SJD facilitates the model to predict multiple tokens at each step and accepts tokens based on the probabilistic criterion, enabling the model to generate images with fewer steps than the conventional next-token-prediction paradigm. We also investigate the token initialization strategies that leverage the spatial locality of visual data to further improve the acceleration ratio under specific scenarios. We conduct experiments for our proposed SJD on multiple auto-regressive text-to-image generation models, showing the effectiveness of model acceleration without sacrificing the visual quality. The code of our work is available here: https://github.com/tyshiwo1/Accelerating-T2I-AR-with-SJD/.
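The probabilistic acceptance step described in this abstract can be illustrated with a small sketch. It assumes the familiar speculative-sampling rule (accept a drafted token with probability min(1, p/q)); the abstract does not state the exact criterion, so the rule, the function name, and the parameter names are illustrative assumptions rather than the authors' implementation.

```python
import random

def accept_draft_tokens(draft_tokens, draft_probs, target_probs, rng=None):
    """Return the longest accepted prefix of the drafted tokens.

    draft_probs[i]  : probability q > 0 under which token i was drafted
    target_probs[i] : probability p the current iteration assigns to token i
    Assumed rule (not taken from the abstract): accept with probability
    min(1, p / q) and stop at the first rejection.
    """
    rng = rng or random.Random(0)
    accepted = []
    for tok, q, p in zip(draft_tokens, draft_probs, target_probs):
        if rng.random() < min(1.0, p / q):
            accepted.append(tok)
        else:
            break  # later tokens would be refined in the next iteration
    return accepted
```

Because several tokens can be accepted in one pass, fewer model steps are needed than with strict next-token prediction, while the random draw keeps decoding stochastic.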

TMLR Journal 2025 Journal Article

AlgoFormer: An Efficient Transformer Framework with Algorithmic Structures

  • Yihang Gao
  • Chuanyang Zheng
  • Enze Xie
  • Han Shi
  • Tianyang Hu
  • Yu Li
  • Michael Ng
  • Zhenguo Li

Besides natural language processing, transformers exhibit extraordinary performance in solving broader applications, including scientific computing and computer vision. Previous works try to explain this from the expressive power and capability perspectives that standard transformers are capable of performing some algorithms. To empower transformers with algorithmic capabilities and motivated by the recently proposed looped transformer (Yang et al., 2024; Giannou et al., 2023), we design a novel transformer framework, dubbed Algorithm Transformer (abbreviated as AlgoFormer). We provide an insight that efficient transformer architectures can be designed by leveraging prior knowledge of tasks and the underlying structure of potential algorithms. Compared with the standard transformer and vanilla looped transformer, the proposed AlgoFormer can perform efficiently in algorithm representation in some specific tasks. In particular, inspired by the structure of human-designed learning algorithms, our transformer framework consists of a pre-transformer that is responsible for task preprocessing, a looped transformer for iterative optimization algorithms, and a post-transformer for producing the desired results after post-processing. We provide theoretical evidence of the expressive power of the AlgoFormer in solving some challenging problems, mirroring human-designed algorithms. Furthermore, some theoretical and empirical results are presented to show that the designed transformer has the potential to perform algorithm representation and learning. Experimental results demonstrate the empirical superiority of the proposed transformer in that it outperforms the standard transformer and vanilla looped transformer in some specific tasks. An extensive experiment on real language tasks (e.g., neural machine translation of German and English, and text classification) further validates the expressiveness and effectiveness of AlgoFormer.

IJCAI Conference 2025 Conference Paper

MMNet: Missing-Aware and Memory-Enhanced Network for Multivariate Time Series Imputation

  • Xiaoye Miao
  • Han Shi
  • Yi Yuan
  • Daozhan Pan
  • Yangyang Wu
  • Xiaohua Pan

Multivariate time series (MTS) data in real-world scenarios are often incomplete, which hinders effective data analysis. Therefore, MTS imputation has been widely studied to facilitate various MTS tasks. Existing imputation methods primarily initialize missing values with zeros in order to perform effective incomplete MTS encoding, which impedes the model's capacity to precisely discern the missing distribution. Moreover, these methods often overlook the global similarity in time series and are limited to using local information within the sample. To this end, we propose a novel multivariate time series imputation network model, named MMNet. MMNet introduces a Missing-Aware Embedding (MAE) approach to adaptively represent incomplete MTS, allowing the model to better distinguish between missing and observed data. Furthermore, we design a Memory-Enhanced Encoder (MEE) aimed at modeling prior knowledge through a memory mechanism, enabling better utilization of the global similarity within the time series. Building upon this, MMNet incorporates a Multi-scale Mixing architecture (MSM) that leverages information from multiple scales to enhance the final imputation. Extensive experiments on four public real-world datasets demonstrate that MMNet yields a more than 25% gain in performance compared with the state-of-the-art methods.

NeurIPS Conference 2025 Conference Paper

Multi-Objective One-Shot Pruning for Large Language Models

  • Weiyu Chen
  • Hansi Yang
  • Yunhao Gou
  • Han Shi
  • Enliang Hu
  • Zhenguo Li
  • James Kwok

Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks but require substantial computational resources, limiting their deployment in resource-constrained environments. While one-shot pruning methods can reduce model size without expensive retraining, they typically optimize for single objectives, ignoring LLMs' multi-faceted applications. We introduce Multi-Objective One-Shot Pruning (MOSP), which formulates LLM pruning as a multi-objective optimization problem. MOSP efficiently generates a Pareto set of pruned models representing different capability trade-offs, allowing users to select solutions aligned with their preferences. The proposed approach identifies a shared core support while enabling specialized support. Experiments across various LLMs and sparsity levels demonstrate MOSP's superior performance in navigating multi-objective trade-offs compared to baseline methods.
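The notion of a Pareto set mentioned in this abstract can be made concrete with a generic non-dominated filter. This is not the MOSP algorithm; the score tuples below are hypothetical values for two capabilities, with higher taken to be better.

```python
def pareto_front(points):
    """Keep the points that no other point dominates (higher is better)."""
    def dominates(a, b):
        # a dominates b if it is at least as good everywhere and better somewhere
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (capability A, capability B) scores of four pruned models.
scores = [(0.70, 0.40), (0.60, 0.55), (0.50, 0.50), (0.65, 0.30)]
front = pareto_front(scores)  # the two models that are not dominated
```

A user would then pick from `front` according to which capability matters more for their deployment.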

ICML Conference 2025 Conference Paper

SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator

  • Guoxuan Chen
  • Han Shi
  • Jiawei Li
  • Yihang Gao
  • Xiaozhe Ren
  • Yimeng Chen
  • Xin Jiang
  • Zhenguo Li

Large Language Models (LLMs) have exhibited exceptional performance across a spectrum of natural language processing tasks. However, their substantial sizes pose considerable challenges, particularly in computational demands and inference speed, due to their quadratic complexity. In this work, we have identified a key pattern: certain seemingly meaningless separator tokens (i.e., punctuation marks) contribute disproportionately to attention scores compared to semantically meaningful tokens. This observation suggests that information of the segments between these separator tokens can be effectively condensed into the separator tokens themselves without significant information loss. Guided by this insight, we introduce SepLLM, a plug-and-play framework that accelerates inference by compressing these segments and eliminating redundant tokens. Additionally, we implement efficient kernels for training acceleration. Experimental results across training-free, training-from-scratch, and post-training settings demonstrate SepLLM’s effectiveness. Notably, using the Llama-3-8B backbone, SepLLM achieves over 50% reduction in KV cache on the GSM8K-CoT benchmark while maintaining comparable performance. Furthermore, in streaming settings, SepLLM effectively processes sequences of up to 4 million tokens or more while maintaining consistent language modeling capabilities.
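A rough sketch of the idea of keeping separator tokens in the cache while dropping the segments between them follows. The separator set, the `recent_window` parameter, and the function name are assumptions made for illustration; the abstract does not specify which positions SepLLM retains.

```python
# Hypothetical separator set; a real tokenizer would use token ids.
SEPARATORS = {".", ",", ";", ":", "!", "?", "\n"}

def kept_positions(tokens, recent_window=2):
    """Indices whose key/value entries stay in the cache.

    Assumption: separators are kept, plus a short window of the most
    recent tokens so the current segment is still visible.
    """
    n = len(tokens)
    return [i for i, tok in enumerate(tokens)
            if tok in SEPARATORS or i >= n - recent_window]

tokens = ["The", "cat", "sat", ".", "It", "purred", ",", "then", "slept"]
kept = kept_positions(tokens)
```

Here 4 of 9 positions are retained, which illustrates how the cache can shrink when segments are summarized by their separators.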

NeurIPS Conference 2025 Conference Paper

Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation

  • Yao Teng
  • Fu-Yun Wang
  • Xian Liu
  • Zhekai Chen
  • Han Shi
  • Yu Wang
  • Zhenguo Li
  • Weiyang Liu

As a new paradigm of visual content generation, autoregressive text-to-image models suffer from slow inference due to their sequential token-by-token decoding process, often requiring thousands of model forward passes to generate a single image. To address this inefficiency, we propose Speculative Jacobi-Denoising Decoding (SJD2), a framework that incorporates the denoising process into Jacobi iterations to enable parallel token generation in autoregressive models. Our method introduces a next-clean-token prediction paradigm that enables the pre-trained autoregressive models to accept noise-perturbed token embeddings and predict the next clean tokens through low-cost fine-tuning. This denoising paradigm guides the model towards more stable Jacobi trajectories. During inference, our method initializes token sequences with Gaussian noise and performs iterative next-clean-token-prediction in the embedding space. We employ a probabilistic criterion to verify and accept multiple tokens in parallel, and refine the unaccepted tokens for the next iteration with the denoising trajectory. Experiments show that our method can accelerate generation by reducing model forward passes while maintaining the visual quality of generated images.

NeurIPS Conference 2024 Conference Paper

DAPE: Data-Adaptive Positional Encoding for Length Extrapolation

  • Chuanyang Zheng
  • Yihang Gao
  • Han Shi
  • Minbin Huang
  • Jingyao Li
  • Jing Xiong
  • Xiaozhe Ren
  • Michael Ng

Positional encoding plays a crucial role in transformers, significantly impacting model performance and length generalization. Prior research has introduced absolute positional encoding (APE) and relative positional encoding (RPE) to distinguish token positions in given sequences. However, both APE and RPE remain fixed after model training regardless of input data, limiting their adaptability and flexibility. Hence, we expect that the desired positional encoding should be data-adaptive and can be dynamically adjusted with the given attention. In this paper, we propose a Data-Adaptive Positional Encoding (DAPE) method, which dynamically and semantically adjusts based on input context and learned fixed priors. Experimental validation on real-world datasets (Arxiv, Books3, and CHE) demonstrates that DAPE enhances model performance in terms of trained length and length generalization, where the improvements are statistically significant. The model visualization suggests that our model can keep both local and anti-local information. Finally, we successfully train the model on sequence length 128 and achieve better performance at evaluation sequence length 8192, compared with other static positional encoding methods, revealing the benefit of the adaptive positional encoding method.

NeurIPS Conference 2024 Conference Paper

Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models

  • Jiacheng Ye
  • Shansan Gong
  • Liheng Chen
  • Lin Zheng
  • Jiahui Gao
  • Han Shi
  • Chuan Wu
  • Xin Jiang

Recently, diffusion models have garnered significant interest in the field of text processing due to their many potential advantages compared to conventional autoregressive models. In this work, we propose Diffusion-of-Thought (DoT), a novel approach that integrates diffusion models with Chain-of-Thought, a well-established technique for improving the reasoning ability of autoregressive language models. In contrast to autoregressive language models that make decisions in a left-to-right, token-by-token manner, DoT allows reasoning steps to diffuse over time through a diffusion language model and offers greater flexibility in trading-off computation for reasoning performance. Our experimental results demonstrate the effectiveness of DoT in multi-digit multiplication, boolean logic, and grade school math problems. In addition to that, DoT showcases promising self-correction abilities and benefits from existing reasoning-enhancing techniques like self-consistency decoding. Our findings contribute to the understanding and development of reasoning with diffusion language models.

ICLR Conference 2024 Conference Paper

LEGO-Prover: Neural Theorem Proving with Growing Libraries

  • Haiming Wang
  • Huajian Xin
  • Chuanyang Zheng
  • Zhengying Liu
  • Qingxing Cao
  • Yinya Huang
  • Jing Xiong
  • Han Shi

Despite the success of large language models (LLMs), the task of theorem proving still remains one of the hardest reasoning tasks that is far from being fully solved. Prior methods using language models have demonstrated promising results, but they still struggle to prove even middle school level theorems. One common limitation of these methods is that they assume a fixed theorem library during the whole theorem proving process. However, as we all know, creating new useful theorems or even new theories is not only helpful but crucial and necessary for advancing mathematics and proving harder and deeper results. In this work, we present LEGO-Prover, which employs a growing skill library containing verified lemmas as skills to augment the capability of LLMs used in theorem proving. By constructing the proof modularly, LEGO-Prover enables LLMs to utilize existing skills retrieved from the library and to create new skills during the proving process. These skills are further evolved (by prompting an LLM) to enrich the library on another scale. Modular and reusable skills are constantly added to the library to enable tackling increasingly intricate mathematical problems. Moreover, the learned library further bridges the gap between human proofs and formal proofs by making it easier to impute missing steps. LEGO-Prover advances the state-of-the-art pass rate on miniF2F-valid (48.0% to 57.0%) and miniF2F-test (45.5% to 50.0%). During the proving process, LEGO-Prover also generates over 20,000 skills (theorems/lemmas) and adds them to the growing library. Our ablation study indicates that these newly added skills are indeed helpful for proving theorems, resulting in a 4.9% improvement in success rate.

ICLR Conference 2024 Conference Paper

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

  • Longhui Yu
  • Weisen Jiang
  • Han Shi
  • Jincheng Yu
  • Zhengying Liu
  • Yu Zhang 0006
  • James T. Kwok
  • Zhenguo Li

Large language models (LLMs) have pushed the limits of natural language understanding and exhibited excellent problem-solving ability. Despite the great success, most existing open-source LLMs (e.g., LLaMA-2) are still far away from satisfactory for solving mathematical problems due to the complex reasoning procedures. To bridge this gap, we propose MetaMath, a finetuned language model that specializes in mathematical reasoning. Specifically, we start by bootstrapping mathematical questions by rewriting the question from multiple perspectives, which results in a new dataset called MetaMathQA. Then we finetune the LLaMA-2 models on MetaMathQA. Experimental results on two popular benchmarks (i.e., GSM8K and MATH) for mathematical reasoning demonstrate that MetaMath outperforms a suite of open-source LLMs by a significant margin. Our MetaMath-7B model achieves 66.5% on GSM8K and 19.8% on MATH, exceeding the state-of-the-art models of the same size by 11.5% and 8.7%. Particularly, MetaMath-70B achieves an accuracy of 82.3% on GSM8K, slightly better than GPT-3.5-Turbo. We release the MetaMathQA dataset, the MetaMath models with different model sizes and the training code for public use.

YNICL Journal 2024 Journal Article

Right superior frontal gyrus: A potential neuroimaging biomarker for predicting short-term efficacy in schizophrenia

  • Yongfeng Yang
  • Xueyan Jin
  • Yongjiang Xue
  • Xue Li
  • Yi Chen
  • Ning Kang
  • Wei Yan
  • Peng Li

Antipsychotic drug treatment for schizophrenia (SZ) can alter brain structure and function, but it is unclear if specific regional changes are associated with treatment outcome. Therefore, we examined the effects of antipsychotic drug treatment on regional grey matter (GM) density, white matter (WM) density, and functional connectivity (FC) as well as associations between regional changes and treatment efficacy. SZ patients (n = 163) and healthy controls (HCs) (n = 131) were examined by structural magnetic resonance imaging (sMRI) at baseline, and a subset of SZ patients (n = 77) were re-examined after 8 weeks of second-generation antipsychotic treatment to assess changes in regional GM and WM density. In addition, 88 SZ patients and 81 HCs were examined by resting-state functional MRI (rs-fMRI) at baseline and the patients were re-examined post-treatment to examine FC changes. The Positive and Negative Syndrome Scale (PANSS) and MATRICS Consensus Cognitive Battery (MCCB) were applied to measure psychiatric symptoms and cognitive impairments in SZ. SZ patients were then stratified into response and non-response groups according to PANSS score change (≥50 % decrease or <50 % decrease, respectively). The GM density of the right cingulate gyrus and the WM density of the right superior frontal gyrus (SFG) and 5 other WM tracts were reduced in the response group compared to the non-response group. The FC values between the right anterior cingulate and paracingulate gyrus and left thalamus were reduced in the entire SZ group (n = 88) after treatment, while FC between the right inferior temporal gyrus (ITG) and right medial superior frontal gyrus (SFGmed) was increased in the response group. There were no significant changes in regional FC among the non-response group after treatment and no correlations with symptom or cognition test scores.
These findings suggest that the right SFG is a critical target of antipsychotic drugs and that WM density and FC alterations within this region could be used as potential indicators for predicting the outcome of antipsychotic treatment in SZ.
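The response / non-response split described in this abstract is a simple threshold on the relative PANSS change. The sketch below assumes the percentage is computed on raw total scores; the abstract does not say whether a baseline offset is subtracted first, so that choice and the function name are assumptions.

```python
def panss_response_group(baseline, followup):
    """Classify a patient by relative PANSS decrease (raw scores assumed)."""
    reduction = (baseline - followup) / baseline
    return "response" if reduction >= 0.5 else "non-response"
```

For example, a drop from 100 to 50 counts as a response under this assumption, while a drop from 100 to 60 does not.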

YNICL Journal 2023 Journal Article

Cortical anatomical variations, gene expression profiles, and clinical phenotypes in patients with schizophrenia

  • Yong Han
  • Yongfeng Yang
  • Zhilu Zhou
  • Xueyan Jin
  • Han Shi
  • Minglong Shao
  • Meng Song
  • Xi Su

BACKGROUND AND HYPOTHESIS: Schizophrenia (SZ) patients display significant structural brain abnormalities; nevertheless, the genetic mechanisms regulating cortical anatomical variations and their correlation with the disease phenotype are still ambiguous. STUDY DESIGN: We characterized anatomical variation using a surface-based method derived from structural magnetic resonance imaging of patients with SZ and age- and sex-matched healthy controls (HCs). Partial least-squares regression was performed across cortex regions between anatomical variation and average transcriptional profiles of SZ risk genes and all qualified genes from the Allen Human Brain Atlas. The morphological features of each brain region were correlated to symptomology variables in patients with SZ using partial correlation analysis. STUDY RESULTS: A total of 203 SZ and 201 HCs were included in the final analysis. We observed significant variation of 55 regions of cortical thickness, 23 regions of volume, 7 regions of area, and 55 regions of local gyrification index (LGI) between SZ and HC groups. Expression profiles of 4 SZ risk genes and 96 genes from all qualified genes showed a correlation with anatomical variability; however, after correction for multiple comparisons, the correlations were no longer significant. LGI variability in multiple frontal subregions was associated with specific symptoms of SZ, whereas cognitive function involving attention/vigilance was linked to LGI variability across nine brain regions. CONCLUSIONS: Cortical anatomical variation of patients with schizophrenia is associated with gene transcriptome profiles as well as clinical phenotypes.

YNICL Journal 2022 Journal Article

Abnormal patterns of regional homogeneity and functional connectivity across the adolescent first-episode, adult first-episode and adult chronic schizophrenia

  • Yongfeng Yang
  • Yuqing Sun
  • Yuliang Zhang
  • Xueyan Jin
  • Zheng Li
  • Minli Ding
  • Han Shi
  • Qing Liu

Functional deficits in schizophrenia (SZ) are observed prior to the onset of psychosis and differ at different stages of SZ. However, there is a paucity of studies focused on adolescent first-episode SZ (AOS), adult first-episode SZ (AFES), and adult chronic SZ (CHSZ). In this study, we investigated regional activity and corresponding functional connectivity alterations, comparing the three disease stages simultaneously. The subjects comprised 49 patients with AOS, 57 patients with AFES, 51 patients with CHSZ, 41 adolescent healthy controls, and 138 adult healthy controls. We compared regional homogeneity (ReHo) between patients at each disease stage with matched healthy controls. We focused on the shared brain regions that showed significant differences between SZ patients at the three different disease stages and healthy controls. Further analysis was conducted to explore whether the patterns of the whole brain functional connectivity alterations were similar. The putamen and medial frontal gyrus (MFG) showed consistently abnormal patterns in AOS, AFES, and CHSZ. Commonly decreased ReHo values in the MFG and increased ReHo values in the bilateral putamen were found in AOS, AFES, and CHSZ. Functional connectivity of the MFG remained a common abnormality across the different SZ stages. In conclusion, ReHo abnormalities in the MFG and the putamen may be common abnormal patterns of brain function in the three different stages of SZ. The vmPFC-dlPFC FC abnormality commonly occurs in adolescence and adulthood. This study may provide a more comprehensive understanding of the neurodevelopmental abnormality across the AOS, AFES, and CHSZ.

AAAI Conference 2022 Conference Paper

AutoBERT-Zero: Evolving BERT Backbone from Scratch

  • Jiahui Gao
  • Hang Xu
  • Han Shi
  • Xiaozhe Ren
  • Philip L. H. Yu
  • Xiaodan Liang
  • Xin Jiang
  • Zhenguo Li

Transformer-based pre-trained language models like BERT and its variants have recently achieved promising performance in various natural language processing (NLP) tasks. However, the conventional paradigm constructs the backbone by purely stacking the manually designed global self-attention layers, introducing inductive bias and thus leading to sub-optimal results. In this work, we make the first attempt to automatically discover a novel pre-trained language model (PLM) backbone on a flexible search space containing the most fundamental operations from scratch. Specifically, we propose a well-designed search space which (i) contains primitive math operations in the intra-layer level to explore novel attention structures, and (ii) leverages convolution blocks to be the supplementary for attentions in the inter-layer level to better learn local dependency. To enhance the efficiency for finding promising architectures, we propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm, which optimizes both the search algorithm and evaluation of candidate models. Specifically, we propose Operation-Priority (OP) evolution strategy to facilitate model search via balancing exploration and exploitation. Furthermore, we design a Bi-branch Weight-Sharing (BIWS) training strategy for fast model evaluation. Extensive experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities in various downstream tasks, proving the architecture’s transfer and scaling abilities. Remarkably, AutoBERT-Zero-base outperforms RoBERTa-base (using much more data) and BERT-large (with much larger model size), scoring 2.4 and 1.4 points higher on the GLUE test set.

ICLR Conference 2022 Conference Paper

Revisiting Over-smoothing in BERT from the Perspective of Graph

  • Han Shi
  • Jiahui Gao
  • Hang Xu 0004
  • Xiaodan Liang
  • Zhenguo Li
  • Lingpeng Kong
  • Stephen M. S. Lee
  • James T. Kwok

Recently, an over-smoothing phenomenon in Transformer-based models has been observed in both vision and language fields. However, no existing work has delved deeper to further investigate the main cause of this phenomenon. In this work, we make the attempt to analyze the over-smoothing problem from the perspective of graph, where such problem was first discovered and explored. Intuitively, the self-attention matrix can be seen as a normalized adjacency matrix of a corresponding graph. Based on the above connection, we provide some theoretical analysis and find that layer normalization plays a key role in the over-smoothing issue of Transformer-based models. Specifically, if the standard deviation of layer normalization is sufficiently large, the output of Transformer stacks will converge to a specific low-rank subspace and result in over-smoothing. To alleviate the over-smoothing problem, we consider hierarchical fusion strategies, which combine the representations from different layers adaptively to make the output more diverse. Extensive experiment results on various data sets illustrate the effect of our fusion method.
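The graph view in this abstract (attention as a row-normalized adjacency matrix) can be illustrated with a toy propagation loop. The sketch deliberately omits layer normalization, residual connections, and feed-forward layers, which the paper analyzes; the matrix and feature values are made up for illustration.

```python
def mix(attn, feats):
    """One propagation step: each token becomes an attention-weighted average."""
    dims = range(len(feats[0]))
    return [[sum(a * f[d] for a, f in zip(row, feats)) for d in dims]
            for row in attn]

def spread(feats):
    """Largest per-dimension gap between any two token representations."""
    return max(max(col) - min(col) for col in zip(*feats))

# Hypothetical 3-token attention matrix; every row sums to 1.
attn = [[0.6, 0.3, 0.1],
        [0.2, 0.5, 0.3],
        [0.1, 0.4, 0.5]]
feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

spreads = [spread(feats)]
for _ in range(10):
    feats = mix(attn, feats)
    spreads.append(spread(feats))
```

In this toy setting the spread shrinks at every step, so token representations become nearly indistinguishable, which is the over-smoothing behaviour the abstract refers to.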

ICML Conference 2021 Conference Paper

SparseBERT: Rethinking the Importance Analysis in Self-attention

  • Han Shi
  • Jiahui Gao
  • Xiaozhe Ren
  • Hang Xu 0004
  • Xiaodan Liang
  • Zhenguo Li
  • James T. Kwok

Transformer-based models are popularly used in natural language processing (NLP). Their core component, self-attention, has aroused widespread interest. To understand the self-attention mechanism, a direct method is to visualize the attention map of a pre-trained model. Based on the patterns observed, a series of efficient Transformers with different sparse attention masks have been proposed. From a theoretical perspective, universal approximability of Transformer-based models has also recently been proved. However, the above understanding and analysis of self-attention is based on a pre-trained model. To rethink the importance analysis in self-attention, we study the significance of different positions in the attention matrix during pre-training. A surprising result is that diagonal elements in the attention map are the least important compared with other attention positions. We provide a proof showing that these diagonal elements can indeed be removed without deteriorating model performance. Furthermore, we propose a Differentiable Attention Mask (DAM) algorithm, which further guides the design of the SparseBERT. Extensive experiments verify our interesting findings and illustrate the effect of the proposed algorithm.
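Removing the diagonal of the attention map, as discussed in this abstract, amounts to forbidding each token from attending to itself. The following is a minimal sketch of that masking on raw scores; it is not the DAM algorithm, and the function name and inputs are illustrative assumptions.

```python
import math

def attention_weights(scores, mask_diagonal=False):
    """Row-wise softmax over raw scores; optionally forbid attending to self.

    Assumes at least two tokens when mask_diagonal is True, so every row
    keeps at least one finite score.
    """
    weights = []
    for i, row in enumerate(scores):
        logits = [(-math.inf if mask_diagonal and i == j else s)
                  for j, s in enumerate(row)]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

With `mask_diagonal=True` each row still sums to 1, but the weight a token places on itself is exactly zero.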

NeurIPS Conference 2020 Conference Paper

Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS

  • Han Shi
  • Renjie Pi
  • Hang Xu
  • Zhenguo Li
  • James Kwok
  • Tong Zhang

Neural Architecture Search (NAS) has shown great potential in finding better neural network designs. Sample-based NAS is the most reliable approach which aims at exploring the search space and evaluating the most promising architectures. However, it is computationally very costly. As a remedy, the one-shot approach has emerged as a popular technique for accelerating NAS using weight-sharing. However, due to the weight-sharing of vastly different networks, the one-shot approach is less reliable than the sample-based approach. In this work, we propose BONAS (Bayesian Optimized Neural Architecture Search), a sample-based NAS framework which is accelerated using weight-sharing to evaluate multiple related architectures simultaneously. Specifically, we apply a Graph Convolutional Network predictor as a surrogate model for Bayesian Optimization to select multiple related candidate models in each iteration. We then apply weight-sharing to train multiple candidate models simultaneously. This approach not only accelerates the traditional sample-based approach significantly, but also keeps its reliability. This is because weight-sharing among related architectures is more reliable than that in the one-shot approach. Extensive experiments are conducted to verify the effectiveness of our method over many competing algorithms.

AAAI Conference 2020 Conference Paper

Effective Decoding in Graph Auto-Encoder Using Triadic Closure

  • Han Shi
  • Haozheng Fan
  • James T. Kwok

The (variational) graph auto-encoder and its variants have been popularly used for representation learning on graph-structured data. While the encoder is often a powerful graph convolutional network, the decoder reconstructs the graph structure by only considering two nodes at a time, thus ignoring possible interactions among edges. On the other hand, structured prediction, which considers the whole graph simultaneously, is computationally expensive. In this paper, we utilize the well-known triadic closure property which is exhibited in many real-world networks. We propose the triad decoder, which considers and predicts the three edges involved in a local triad together. The triad decoder can be readily used in any graph-based auto-encoder. In particular, we incorporate this into the (variational) graph auto-encoder. Experiments on link prediction, node clustering and graph generation show that the use of triads leads to more accurate prediction, clustering and better preservation of the graph characteristics.

AAAI Conference 2020 Conference Paper

Table2Analysis: Modeling and Recommendation of Common Analysis Patterns for Multi-Dimensional Data

  • Mengyu Zhou
  • Wang Tao
  • Ji Pengxin
  • Han Shi
  • Zhang Dongmei

Given a table of multi-dimensional data, what analyses would humans create to extract information from it? From scientific exploration to business intelligence (BI), this is a key problem to solve towards automation of knowledge discovery and decision making. In this paper, we propose Table2Analysis to learn commonly conducted analysis patterns from a large amount of (table, analysis) pairs, and recommend analyses for any given table, even one not seen before. Multi-dimensional data as input challenges existing model architectures and training techniques to fulfill the task. Based on deep Q-learning with heuristic search, Table2Analysis does table to sequence generation, with each sequence encoding an analysis. Table2Analysis has 0.78 recall at top-5 and 0.65 recall at top-1 in our evaluation against a large scale spreadsheet corpus on the PivotTable recommendation task.
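The recall-at-top-k figures quoted in this abstract can be read with the following small sketch. It assumes one ground-truth analysis per table and counts a hit when that analysis appears among the first k recommendations; the paper's exact evaluation protocol may differ, and the sample data is hypothetical.

```python
def recall_at_k(ranked_lists, targets, k):
    """Fraction of tables whose true analysis is in the top-k recommendations."""
    hits = sum(1 for ranked, target in zip(ranked_lists, targets)
               if target in ranked[:k])
    return hits / len(targets)

# Hypothetical recommendations for three tables and their true analyses.
ranked = [["sum", "avg", "count"], ["avg", "max", "sum"], ["count", "sum", "min"]]
truth = ["sum", "sum", "max"]
top1 = recall_at_k(ranked, truth, 1)
top3 = recall_at_k(ranked, truth, 3)
```

Recall can only grow or stay the same as k increases, which matches the abstract's top-5 figure being higher than its top-1 figure.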