Arrow Research search

Author name cluster

Lin Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers
2 author rows

Possible papers

18

AAAI Conference 2026 Conference Paper

GenePheno: Interpretable Gene Knockout-Induced Phenotype Abnormality Prediction from Gene Sequences

  • Jingquan Yan
  • Yuwei Miao
  • Lei Yu
  • Yuzhi Guo
  • Xue Xiao
  • Lin Xu
  • Junzhou Huang

Exploring how genetic sequences shape phenotypes is a fundamental challenge in biology and a key step toward scalable, hypothesis-driven experimentation. The task is complicated by the large modality gap between sequences and phenotypes, as well as the pleiotropic nature of gene–phenotype relationships. Existing sequence-based efforts focus on the degree to which variants of specific genes alter a limited set of phenotypes, while general gene knockout-induced phenotype abnormality prediction methods heavily rely on curated genetic information as inputs, which limits scalability and generalizability. As a result, the task of broadly predicting the presence of multiple phenotype abnormalities under gene knockout directly from gene sequences remains underexplored. We introduce GenePheno, the first interpretable multi-label prediction framework that predicts knockout-induced phenotypic abnormalities from gene sequences. GenePheno employs a contrastive multi-label learning objective that captures inter-phenotype correlations, complemented by an exclusive regularization that enforces biological consistency. It further incorporates a gene function bottleneck layer, offering human-interpretable concepts that reflect functional mechanisms behind phenotype formation. To support progress in this area, we curate four datasets with canonical gene sequences as input and multi-label phenotypic abnormalities induced by gene knockouts as targets. Across these datasets, GenePheno achieves state-of-the-art gene-centric Fmax and phenotype-centric AUC, and case studies demonstrate its ability to reveal gene functional mechanisms.

NeurIPS Conference 2025 Conference Paper

Aeolus: A Multi-structural Flight Delay Dataset

  • Lin Xu
  • Xinyun Yuan
  • Yuxuan Liang
  • Suwan Yin
  • Yuankai Wu

We introduce Aeolus, a large-scale Multi-modal Flight Delay Dataset designed to advance research on flight delay prediction and support the development of foundation models for tabular data. Existing datasets in this domain are typically limited to flat tabular structures and fail to capture the spatiotemporal dynamics inherent in delay propagation. Aeolus addresses this limitation by providing three aligned modalities: (i) a tabular dataset with rich operational, meteorological, and airportlevel features for over 50 million flights; (ii) a flight chain module that models delay propagation along sequential flight legs, capturing upstream and downstream dependencies; and (iii) a flight network graph that encodes shared aircraft, crew, and airport resource connections, enabling cross-flight relational reasoning. The dataset is carefully constructed with temporal splits, comprehensive features, and strict leakage prevention to support realistic and reproducible machine learning evaluation. Aeolus supports a broad range of tasks, including regression, classification, temporal structure modeling, and graph learning, serving as a unified benchmark across tabular, sequential, and graph modalities. We release baseline experiments and preprocessing tools to facilitate adoption. Aeolus fills a key gap for both domain-specific modeling and general-purpose structured data research. Our source code and data can be accessed at https: //github. com/Flnny/Delay-data

AAAI Conference 2022 Conference Paper

OA-FSUI2IT: A Novel Few-Shot Cross Domain Object Detection Framework with Object-Aware Few-Shot Unsupervised Image-to-Image Translation

  • Lifan Zhao
  • Yunlong Meng
  • Lin Xu

Unsupervised image-to-image (UI2I) translation methods aim to learn a mapping between different visual domains with well-preserved content and consistent structure. It has been proven that the generated images are quite useful for enhancing the performance of computer vision tasks like object detection in a different domain with distribution discrepancies. Current methods require large amounts of images in both source and target domains for successful translation. However, data collection and annotations in many scenarios are infeasible or even impossible. In this paper, we propose an Object-Aware Few-Shot UI2I Translation (OA-FSUI2IT) framework to address the few-shot cross domain (FSCD) object detection task with limited unlabeled images in the target domain. To this end, we first introduce a discriminator augmentation (DA) module into the OA-FSUI2IT framework for successful few-shot UI2I translation. Then, we present a patch pyramid contrastive learning (PPCL) strategy to further improve the quality of the generated images. Last, we propose a self-supervised content-consistency (SSCC) loss to enforce the content-consistency in the translation. We implement extensive experiments to demonstrate the effectiveness of our OA-FSUI2IT framework for FSCD object detection and achieve state-of-the-art performance on the benchmarks of Normal-to-Foggy, Day-to-Night, and Cross-scene adaptation. The source code of our proposed method is also available at https: //github. com/emdata-ailab/FSCD-Det.

AAAI Conference 2019 Conference Paper

End-to-End Knowledge-Routed Relational Dialogue System for Automatic Diagnosis

  • Lin Xu
  • Qixian Zhou
  • Ke Gong
  • Xiaodan Liang
  • Jianheng Tang
  • Liang Lin

Beyond current conversational chatbots or task-oriented dialogue systems that have attracted increasing attention, we move forward to develop a dialogue system for automatic medical diagnosis that converses with patients to collect additional symptoms beyond their self-reports and automatically makes a diagnosis. Besides the challenges for conversational dialogue systems (e. g. topic transition coherency and question understanding), automatic medical diagnosis further poses more critical requirements for the dialogue rationality in the context of medical knowledge and symptom-disease relations. Existing dialogue systems (Madotto, Wu, and Fung 2018; Wei et al. 2018; Li et al. 2017) mostly rely on datadriven learning and cannot be able to encode extra expert knowledge graph. In this work, we propose an End-to-End Knowledge-routed Relational Dialogue System (KR-DS) that seamlessly incorporates rich medical knowledge graph into the topic transition in dialogue management, and makes it cooperative with natural language understanding and natural language generation. A novel Knowledge-routed Deep Q-network (KR-DQN) is introduced to manage topic transitions, which integrates a relational refinement branch for encoding relations among different symptoms and symptomdisease pairs, and a knowledge-routed graph branch for topic decision-making. Extensive experiments on a public medical dialogue dataset show our KR-DS significantly beats stateof-the-art methods (by more than 8% in diagnosis accuracy). We further show the superiority of our KR-DS on a newly collected medical dialogue system dataset, which is more challenging retaining original self-reports and conversational data between patients and doctors.

IJCAI Conference 2019 Conference Paper

HorNet: A Hierarchical Offshoot Recurrent Network for Improving Person Re-ID via Image Captioning

  • Shiyang Yan
  • Jun Xu
  • Yuai Liu
  • Lin Xu

Person re-identification (re-ID) aims to recognize a person-of-interest across different cameras with notable appearance variance. Existing research works focused on the capability and robustness of visual representation. In this paper, instead, we propose a novel hierarchical offshoot recurrent network (HorNet) for improving person re-ID via image captioning. Image captions are semantically richer and more consistent than visual attributes, which could significantly alleviate the variance. We use the similarity preserving generative adversarial network (SPGAN) and an image captioner to fulfill domain transfer and language descriptions generation. Then the proposed HorNet can learn the visual and language representation from both the images and captions jointly, and thus enhance the performance of person re-ID. Extensive experiments are conducted on several benchmark datasets with or without image captions, i. e. , CUHK03, Market-1501, and Duke-MTMC, demonstrating the superiority of the proposed method. Our method can generate and extract meaningful image captions while achieving state-of-the-art performance.

IJCAI Conference 2015 Conference Paper

Algorithm Runtime Prediction: Methods and Evaluation (Extended Abstract)

  • Frank Hutter
  • Lin Xu
  • Holger Hoos
  • Kevin Leyton-Brown

Perhaps surprisingly, it is possible to predict how long an algorithm will take to run on a previously unseen input, using machine learning techniques to build a model of the algorithm’s runtime as a function of problem-specific instance features. Such models have many important applications and over the past decade, a wide variety of techniques have been studied for building such models. In this extended abstract of our 2014 AI Journal article of the same title, we summarize existing models and describe new model families and various extensions. In a comprehensive empirical analyis using 11 algorithms and 35 instance distributions spanning a wide range of hard combinatorial problems, we demonstrate that our new models yield substantially better runtime predictions than previous approaches in terms of their generalization to new problem instances, to new algorithms from a parameterized space, and to both simultaneously.

SAT Conference 2012 Conference Paper

Evaluating Component Solver Contributions to Portfolio-Based Algorithm Selectors

  • Lin Xu
  • Frank Hutter
  • Holger H. Hoos
  • Kevin Leyton-Brown

Abstract Portfolio-based methods exploit the complementary strengths of a set of algorithms and—as evidenced in recent competitions—represent the state of the art for solving many NP-hard problems, including SAT. In this work, we argue that a state-of-the-art method for constructing portfolio-based algorithm selectors, \(\texttt{SATzilla}\), also gives rise to an automated method for quantifying the importance of each of a set of available solvers. We entered a substantially improved version of \(\texttt{SATzilla}\) to the inaugural “analysis track” of the 2011 SAT competition, and draw two main conclusions from the results that we obtained. First, automatically-constructed portfolios of sequential, non-portfolio competition entries perform substantially better than the winners of all three sequential categories. Second, and more importantly, a detailed analysis of these portfolios yields valuable insights into the nature of successful solver designs in the different categories. For example, we show that the solvers contributing most to \(\texttt{SATzilla}\) were often not the overall best-performing solvers, but instead solvers that exploit novel solution strategies to solve instances that would remain unsolved without them.

AAAI Conference 2012 Conference Paper

Predicting Satisfiability at the Phase Transition

  • Lin Xu
  • Holger Hoos
  • Kevin Leyton-Brown

Uniform random 3-SAT at the solubility phase transition is one of the most widely studied and empirically hardest distributions of SAT instances. For 20 years, this distribution has been used extensively for evaluating and comparing algorithms. In this work, we demonstrate that simple rules can predict the solubility of these instances with surprisingly high accuracy. Specifically, we show how classification accuracies of about 70% can be obtained based on cheaply (polynomial-time) computable features on a wide range of instance sizes. We argue in two ways that classification accuracy does not decrease with instance size: first, we show that our models’ predictive accuracy remains roughly constant across a wide range of problem sizes; second, we show that a classifier trained on small instances is sufficient to achieve very accurate predictions across the entire range of instance sizes currently solvable by complete methods. Finally, we demonstrate that a simple decision tree based on only two features, and again trained only on the smallest instances, achieves predictive accuracies close to those of our most complex model. We conjecture that this twofeature model outperforms random guessing asymptotically; due to the model’s extreme simplicity, we believe that this conjecture is a worthwhile direction for future theoretical work.

AAAI Conference 2010 Conference Paper

Hydra: Automatically Configuring Algorithms for Portfolio-Based Selection

  • Lin Xu
  • Holger Hoos
  • Kevin Leyton-Brown

The AI community has achieved great success in designing high-performance algorithms for hard combinatorial problems, given both considerable domain knowledge and considerable effort by human experts. Two influential methods aim to automate this process: automated algorithm configuration and portfolio-based algorithm selection. The former has the advantage of requiring virtually no domain knowledge, but produces only a single solver; the latter exploits per-instance variation, but requires a set of relatively uncorrelated candidate solvers. Here, we introduce Hydra, a novel technique for combining these two methods, thereby realizing the benefits of both. Hydra automatically builds a set of solvers with complementary strengths by iteratively configuring new algorithms. It is primarily intended for use in problem domains for which an adequate set of candidate solvers does not already exist. Nevertheless, we tested Hydra on a widely studied domain, stochastic local search algorithms for SAT, in order to characterize its performance against a well-established and highly competitive baseline. We found that Hydra consistently achieved major improvements over the best existing individual algorithms, and always at least roughly matched—and indeed often exceeded— the performance of the best portfolios of these algorithms.

IJCAI Conference 2009 Conference Paper

  • Ashiqur R. KhudaBukhsh
  • Lin Xu
  • Holger H. Hoos
  • Kevin Leyton-Brown

Designing high-performance algorithms for computationally hard problems is a difficult and often time-consuming task. In this work, we demonstrate that this task can be automated in the context of stochastic local search (SLS) solvers for the propositional satisfiability problem (SAT). We first introduce a generalised, highly parameterised solver framework, dubbed SATenstein, that includes components gleaned from or inspired by existing high-performance SLS algorithms for SAT. The parameters of SATenstein control the selection of components used in any specific instantiation and the behaviour of these components. SATenstein can be configured to instantiate a broad range of existing high-performance SLSbased SAT solvers, and also billions of novel algorithms. We used an automated algorithm configuration procedure to find instantiations of SATenstein that perform well on several well-known, challenging distributions of SAT instances. Overall, we consistently obtained significant improvements over the previously best-performing SLS algorithms, despite expending minimal manual effort. 1