Arrow Research search

Author name cluster

Jian Peng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

24 papers
2 author rows

Possible papers

24

AAAI Conference 2026 Conference Paper

Rejoining Precious Artifacts: Efficiently Bone Stick Rejoining Based Massive Fragment Images by Contour, Script, and Texture

  • Xingyi Wang
  • Wen Huang
  • Mengqiang Hu
  • Junhui Chen
  • Weixin Zhao
  • Wenzheng Xu
  • Jian Peng

Rejoining fragment images of precious artifacts is a meaningful task because complete artifacts could provide valuable clues for the research of human civilization. However, existing rejoining methods face several challenges including time-consuming manual annotation, insufficient rejoining accuracy, and prohibitive computation cost. For rejoining fragment images of bone sticks (a precious artifact), we propose a lightweight vision graph neural network called RejoinViG to address these challenges. First, our method avoids time-consuming manual annotation of ballast contour data by experts. Specifically, our method directly takes a pair of fragment images as input and then determines whether the image pair is rejoinable. Second, our method improves rejoining accuracy by contour, script, and texture through dynamically constructing local and global graphs. Third, our method improves rejoining accuracy while reducing computation cost by introducing a new attention mechanism named node self-attention. Extensive experiments demonstrate that our method outperforms the state-of-the-art methods significantly. For example, the Top-1 accuracy of our method is 3.9 times that of SFF-Siam. Surprisingly, our method successfully rejoins a pair of previously unknown but rejoinable fragment images of bone sticks in a real-world scenario.

ICLR Conference 2025 Conference Paper

Hotspot-Driven Peptide Design via Multi-Fragment Autoregressive Extension

  • Jiahan Li
  • Tong Chen
  • Shitong Luo
  • Chaoran Cheng
  • Jiaqi Guan
  • Ruihan Guo
  • Sheng Wang
  • Ge Liu

Peptides, short chains of amino acids, interact with target proteins, making them a unique class of protein-based therapeutics for treating human diseases. Recently, deep generative models have shown great promise in peptide generation. However, several challenges remain in designing effective peptide binders. First, not all residues contribute equally to peptide-target interactions. Second, the generated peptides must adopt valid geometries due to the constraints of peptide bonds. Third, realistic tasks for peptide drug development are still lacking. To address these challenges, we introduce PepHAR, a hot-spot-driven autoregressive generative model for designing peptides targeting specific proteins. Building on the observation that certain hot spot residues have higher interaction potentials, we first use an energy-based density model to fit and sample these key residues. Next, to ensure proper peptide geometry, we autoregressively extend peptide fragments by estimating dihedral angles between residue frames. Finally, we apply an optimization process to iteratively refine fragment assembly, ensuring correct peptide structures. By combining hot spot sampling with fragment-based extension, our approach enables de novo peptide design tailored to a target protein and allows the incorporation of key hot spot residues into peptide scaffolds. Extensive experiments, including peptide design and peptide scaffold generation, demonstrate the strong potential of PepHAR in computational peptide binder design. The source code will be available at https://github.com/Ced3-han/PepHAR.

AAAI Conference 2025 Conference Paper

Improving Federated Domain Generalization Through Dynamical Weights Calculated from Data Influences on Global Model Update

  • Zikun Zhou
  • Wen Huang
  • Xingyi Wang
  • Zhishuo Zhang
  • Zhun Zhang
  • Jian Peng
  • Feihu Huang

With the popularity of federated learning, federated domain generalization (FedDG) has attracted increasing attention. Existing work on federated learning indicates that the generalization performance of the global model can be improved when the global model is obtained by aggregating local models with suitable weights. However, existing methods for calculating weights do not fully utilize the data influences on the global model update, which gives us an opportunity to further improve the generalization performance of the global model. In this paper, we propose the method DI (data influences), which utilizes the data influences on the global model update to calculate dynamic weights for local models in each round of training. Specifically, the first component of DI, the data influences calculator (DIC), calculates the local weights of each local model from the influence of each data point on the global model update; we introduce the influence function to complete this calculation. The second component of DI, the data influences adjuster (DIA), calculates the global weights (which are used in the aggregation of the global model) from the local weights. Extensive experiments indicate that our method significantly improves the generalization performance of models. In particular, our method improves model accuracy on the benchmark datasets PACS, OfficeHome, and Office-31 by 1.79%, 1.61%, and 2.39% on average, respectively. Source code is publicly available on GitHub.
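The aggregation step the abstract builds on can be sketched as plain weighted parameter averaging. This is a minimal illustration, not the paper's method: the influence-based weight computation (DIC/DIA) is the contribution and is not reproduced here, so the weights below are placeholders.

```python
import numpy as np

def aggregate(local_models, weights):
    """Weighted aggregation of local model parameters into a global model.

    weights: one scalar per client; in DI these would come from data
    influences on the global model update (not reproduced here).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1
    return sum(wi * m for wi, m in zip(w, local_models))

# Toy example: three clients, parameters as flat vectors.
locals_ = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
new_global = aggregate(locals_, [1.0, 1.0, 2.0])
print(new_global)  # -> [0.75 0.75]
```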

AAAI Conference 2025 Conference Paper

PBECount: Prompt-Before-Extract Paradigm for Class-Agnostic Counting

  • Canchen Yang
  • Tianyu Geng
  • Jian Peng
  • Chun Xu

In the field of class-agnostic counting (CAC), counting only objects of interest that are similar to exemplars in multi-class scenarios has been a challenging task. To address this challenge, recent research has proposed the extract-and-match paradigm based on the vision transformer (ViT) architecture. However, although this paradigm can improve the accuracy of exemplar-similar object identification, it overly emphasizes the role of the ViT structure. To address this shortcoming, this work introduces a more generalized prompt-before-extract paradigm on top of the extract-and-match paradigm and designs a pure convolutional neural network (CNN) model named PBECount. In addition, an innovative loss function, a post-processing strategy, and a dynamic threshold method are proposed to enhance the detection performance of the proposed model when the probability maps are used as ground truth during model training. The experimental results on the FSC-147 and CARPK datasets demonstrate that the proposed PBECount can identify whether unknown class objects are similar to exemplars and outperform the state-of-the-art CAC methods in terms of accuracy and generalization.

NeurIPS Conference 2024 Conference Paper

Categorical Flow Matching on Statistical Manifolds

  • Chaoran Cheng
  • Jiahan Li
  • Jian Peng
  • Ge Liu

We introduce Statistical Flow Matching (SFM), a novel and mathematically rigorous flow-matching framework on the manifold of parameterized probability measures inspired by the results from information geometry. We demonstrate the effectiveness of our method on the discrete generation problem by instantiating SFM on the manifold of categorical distributions whose geometric properties remain unexplored in previous discrete generative models. Utilizing the Fisher information metric, we equip the manifold with a Riemannian structure whose intrinsic geometries are effectively leveraged by following the shortest paths of geodesics. We develop an efficient training and sampling algorithm that overcomes numerical stability issues with a diffeomorphism between manifolds. Our distinctive geometric perspective of statistical manifolds allows us to apply optimal transport during training and interpret SFM as following the steepest direction of the natural gradient. Unlike previous models that rely on variational bounds for likelihood estimation, SFM enjoys exact likelihood calculation for arbitrary probability measures. We show that SFM can learn more complex patterns on the statistical manifold where existing models often fail due to strong prior assumptions. Comprehensive experiments on real-world generative tasks ranging from images and text to biological domains further demonstrate that SFM achieves higher sampling quality and likelihood than other discrete diffusion or flow-based models.
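The Fisher-metric geometry the abstract invokes has a closed form on the categorical simplex: the map p → √p sends the simplex isometrically (up to a factor of 2) onto the positive orthant of the unit sphere, so geodesic distances become great-circle arcs. A small sketch of that standard fact (not the SFM training algorithm itself):

```python
import numpy as np

def fisher_rao_distance(p, q):
    """Fisher-Rao geodesic distance between two categorical distributions.

    Under p -> sqrt(p), geodesics on the simplex become great-circle arcs on
    the sphere, giving d(p, q) = 2 * arccos(sum_i sqrt(p_i * q_i)).
    """
    bc = np.sum(np.sqrt(np.asarray(p) * np.asarray(q)))  # Bhattacharyya coefficient
    return 2.0 * np.arccos(np.clip(bc, -1.0, 1.0))

# Identical distributions sit at distance 0; disjoint supports are maximally far.
print(fisher_rao_distance([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(fisher_rao_distance([1.0, 0.0], [0.0, 1.0]))  # pi
```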

NeurIPS Conference 2024 Conference Paper

Enhancing Protein Mutation Effect Prediction through a Retrieval-Augmented Framework

  • Ruihan Guo
  • Rui Wang
  • Ruidong Wu
  • Zhizhou Ren
  • Jiahan Li
  • Shitong Luo
  • Zuofan Wu
  • Qiang Liu

Predicting the effects of protein mutations is crucial for analyzing protein functions and understanding genetic diseases. However, existing models struggle to effectively extract mutation-related local structure motifs from protein databases, which hinders their predictive accuracy and robustness. To tackle this problem, we design a novel retrieval-augmented framework for incorporating similar structure information in known protein structures. We create a vector database consisting of local structure motif embeddings from a pre-trained protein structure encoder, which allows for efficient retrieval of similar local structure motifs during mutation effect prediction. Our findings demonstrate that leveraging this method results in the SOTA performance across multiple protein mutation prediction datasets, and offers a scalable solution for studying mutation effects.
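The retrieval step described above can be illustrated with a brute-force nearest-neighbor search over embedding vectors. This is a generic stand-in for the paper's vector database of local structure motif embeddings; cosine similarity and the array shapes here are assumptions for illustration.

```python
import numpy as np

def retrieve_top_k(query, database, k=3):
    """Return indices of the k database embeddings most similar to the query.

    A minimal stand-in for a motif-embedding vector database: normalize to
    unit length, score by cosine similarity, take the top k by brute force.
    """
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = db @ q
    return np.argsort(-sims)[:k]

rng = np.random.default_rng(0)
db = rng.normal(size=(100, 8))           # 100 hypothetical motif embeddings
q = db[42] + 0.01 * rng.normal(size=8)   # a query very close to entry 42
print(retrieve_top_k(q, db, k=1))        # expected: [42]
```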

NeurIPS Conference 2024 Conference Paper

Enhancing vision-language models for medical imaging: bridging the 3D gap with innovative slice selection

  • Yuli Wang
  • Jian Peng
  • Yuwei Dai
  • Craig Jones
  • Haris Sair
  • Jinglai Shen
  • Nicolas Loizou
  • Jing Wu

Recent approaches to vision-language tasks are built on the remarkable capabilities of large vision-language models (VLMs). These models excel in zero-shot and few-shot learning, enabling them to learn new tasks without parameter updates. However, their primary challenge lies in their design, which primarily accommodates 2D input, thus limiting their effectiveness for medical images, particularly radiological images like MRI and CT, which are typically 3D. To bridge the gap between state-of-the-art 2D VLMs and 3D medical image data, we developed an innovative, one-pass, unsupervised representative slice selection method called Vote-MI, which selects representative 2D slices from 3D medical imaging. To evaluate the effectiveness of Vote-MI when implemented with VLMs, we introduce BrainMD, a robust, multimodal dataset comprising 2,453 annotated 3D MRI brain scans with corresponding textual radiology reports and electronic health records. Based on BrainMD, we further develop two benchmarks, BrainMD-select (including the most representative 2D slice of each 3D image) and BrainBench (including various vision-language downstream tasks). Extensive experiments on the BrainMD dataset and its two corresponding benchmarks demonstrate that our representative selection method significantly improves performance in zero-shot and few-shot learning tasks. On average, Vote-MI achieves a 14.6% and 16.6% absolute gain for zero-shot and few-shot learning, respectively, compared to randomly selecting examples. Our studies represent a significant step toward integrating AI in medical imaging to enhance patient care and facilitate medical research. We hope this work will serve as a foundation for data selection as vision-language models are increasingly applied to new tasks.

NeurIPS Conference 2023 Conference Paper

Equivariant Neural Operator Learning with Graphon Convolution

  • Chaoran Cheng
  • Jian Peng

We propose a general architecture that combines the coefficient learning scheme with a residual operator layer for learning mappings between continuous functions in the 3D Euclidean space. Our proposed model is guaranteed to achieve SE(3)-equivariance by design. From the graph spectrum view, our method can be interpreted as convolution on graphons (dense graphs with infinitely many nodes), which we term InfGCN. By leveraging both the continuous graphon structure and the discrete graph structure of the input data, our model can effectively capture the geometric information while preserving equivariance. Through extensive experiments on large-scale electron density datasets, we observed that our model significantly outperformed the current state-of-the-art architectures. Multiple ablation studies were also carried out to demonstrate the effectiveness of the proposed architecture.

NeurIPS Conference 2023 Conference Paper

LinkerNet: Fragment Poses and Linker Co-Design with 3D Equivariant Diffusion

  • Jiaqi Guan
  • Xingang Peng
  • PeiQi Jiang
  • Yunan Luo
  • Jian Peng
  • Jianzhu Ma

Targeted protein degradation techniques, such as PROteolysis TArgeting Chimeras (PROTACs), have emerged as powerful tools for selectively removing disease-causing proteins. One challenging problem in this field is designing a linker to connect different molecular fragments to form a stable drug-candidate molecule. Existing models for linker design assume that the relative positions of the fragments are known, which may not be the case in real scenarios. In this work, we address a more general problem where the poses of the fragments are unknown in 3D space. We develop a 3D equivariant diffusion model that jointly learns the generative process of both fragment poses and the 3D structure of the linker. By viewing fragments as rigid bodies, we design a fragment pose prediction module inspired by the Newton-Euler equations in rigid body mechanics. Empirical studies on ZINC and PROTAC-DB datasets demonstrate that our model can generate chemically valid, synthetically-accessible, and low-energy molecules under both unconstrained and constrained generation settings.

NeurIPS Conference 2022 Conference Paper

Antigen-Specific Antibody Design and Optimization with Diffusion-Based Generative Models for Protein Structures

  • Shitong Luo
  • Yufeng Su
  • Xingang Peng
  • Sheng Wang
  • Jian Peng
  • Jianzhu Ma

Antibodies are immune system proteins that protect the host by binding to specific antigens such as viruses and bacteria. The binding between antibodies and antigens is mainly determined by the complementarity-determining regions (CDR) of the antibodies. In this work, we develop a deep generative model that jointly models sequences and structures of CDRs based on diffusion probabilistic models and equivariant neural networks. Our method is the first deep learning-based method that generates antibodies explicitly targeting specific antigen structures and is one of the earliest diffusion probabilistic models for protein structures. The model is a "Swiss Army Knife" capable of sequence-structure co-design, sequence design for given backbone structures, and antibody optimization. We conduct extensive experiments to evaluate the quality of both sequences and structures of designed antibodies. We find that our model could yield competitive results in binding affinity measured by biophysical energy functions and other protein design metrics.

AAMAS Conference 2022 Conference Paper

Characterizing Attacks on Deep Reinforcement Learning

  • Xinlei Pan
  • Chaowei Xiao
  • Warren He
  • Shuang Yang
  • Jian Peng
  • Mingjie Sun
  • Mingyan Liu
  • Bo Li

Recent studies show that Deep Reinforcement Learning (DRL) models are vulnerable to adversarial attacks, which attack DRL models by adding small perturbations to the observations. However, some attacks assume full availability of the victim model, and some require a huge amount of computation, making them less feasible for real-world applications. In this work, we make further explorations of the vulnerabilities of DRL by studying other aspects of attacks on DRL using realistic and efficient attacks. First, we adapt and propose efficient black-box attacks for when we do not have access to DRL model parameters. Second, to address the high computational demands of existing attacks, we introduce efficient online sequential attacks that exploit temporal consistency across consecutive steps. Third, we explore the possibility of an attacker perturbing other aspects in the DRL setting, such as the environment dynamics. Finally, to account for imperfections in how an attacker would inject perturbations in the physical world, we devise a method for generating robust physical perturbations to be printed. The attack is evaluated on a real-world robot under various conditions. We conduct extensive experiments both in simulation, including Atari games, robotics, and autonomous driving, and on real-world robots, to compare the effectiveness of the proposed attacks with baseline approaches. To the best of our knowledge, we are the first to apply adversarial attacks on DRL systems to physical robots.

NeurIPS Conference 2022 Conference Paper

Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation

  • Zhizhou Ren
  • Anji Liu
  • Yitao Liang
  • Jian Peng
  • Jianzhu Ma

Learning new task-specific skills from a few trials is a fundamental challenge for artificial intelligence. Meta reinforcement learning (meta-RL) tackles this problem by learning transferable policies that support few-shot adaptation to unseen tasks. Despite recent advances in meta-RL, most existing methods require access to the environmental reward function of new tasks to infer the task objective, which is not realistic in many practical applications. To bridge this gap, we study the problem of few-shot adaptation in the context of human-in-the-loop reinforcement learning. We develop a meta-RL algorithm that enables fast policy adaptation with preference-based feedback. The agent can adapt to new tasks by querying a human's preference between behavior trajectories instead of using per-step numeric rewards. By extending techniques from information theory, our approach can design query sequences to maximize the information gain from human interactions while tolerating the inherent error of a non-expert human oracle. In experiments, we extensively evaluate our method, Adaptation with Noisy OracLE (ANOLE), on a variety of meta-RL benchmark tasks and demonstrate substantial improvement over baseline algorithms in terms of both feedback efficiency and error tolerance.

NeurIPS Conference 2021 Conference Paper

A 3D Generative Model for Structure-Based Drug Design

  • Shitong Luo
  • Jiaqi Guan
  • Jianzhu Ma
  • Jian Peng

We study a fundamental problem in structure-based drug design --- generating molecules that bind to specific protein binding sites. While we have witnessed the great success of deep generative models in drug design, the existing methods are mostly string-based or graph-based. They are limited by the lack of spatial information and thus unable to be applied to structure-based design tasks. Particularly, such models have no or little knowledge of how molecules interact with their target proteins exactly in 3D space. In this paper, we propose a 3D generative model that generates molecules given a designated 3D protein binding site. Specifically, given a binding site as the 3D context, our model estimates the probability density of atom's occurrences in 3D space --- positions that are more likely to have atoms will be assigned higher probability. To generate 3D molecules, we propose an auto-regressive sampling scheme --- atoms are sampled sequentially from the learned distribution until there is no room for new atoms. Combined with this sampling scheme, our model can generate valid and diverse molecules, which could be applicable to various structure-based molecular design tasks such as molecule sampling and linker design. Experimental results demonstrate that molecules sampled from our model exhibit high binding affinity to specific targets and good drug properties such as drug-likeness even if the model is not explicitly optimized for them.

NeurIPS Conference 2020 Conference Paper

Learning Guidance Rewards with Trajectory-space Smoothing

  • Tanmay Gangwani
  • Yuan Zhou
  • Jian Peng

Long-term temporal credit assignment is an important challenge in deep reinforcement learning (RL). It refers to the ability of the agent to attribute actions to consequences that may occur after a long time interval. Existing policy-gradient and Q-learning algorithms typically rely on dense environmental rewards that provide rich short-term supervision and help with credit assignment. However, they struggle to solve tasks with delays between an action and the corresponding rewarding feedback. To make credit assignment easier, recent works have proposed algorithms to learn dense "guidance" rewards that could be used in place of the sparse or delayed environmental rewards. This paper is in the same vein -- starting with a surrogate RL objective that involves smoothing in the trajectory-space, we arrive at a new algorithm for learning guidance rewards. We show that the guidance rewards have an intuitive interpretation, and can be obtained without training any additional neural networks. Due to the ease of integration, we use the guidance rewards in a few popular algorithms (Q-learning, Actor-Critic, Distributional-RL) and present results in single-agent and multi-agent tasks that elucidate the benefit of our approach when the environmental rewards are sparse or delayed.
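One very simple instance of trajectory-level credit smoothing, shown only to make the "guidance rewards in place of delayed rewards" idea concrete, is to redistribute the episode return uniformly across steps. This is an illustrative scheme, not necessarily the paper's exact estimator.

```python
def uniform_guidance_rewards(env_rewards):
    """Replace a delayed episodic reward signal with dense guidance rewards.

    Assigns every step the mean episode return, so the guidance rewards sum
    to the original return. A minimal illustration of trajectory-level
    smoothing; the paper derives its guidance rewards from a surrogate
    RL objective rather than this fixed rule.
    """
    total = sum(env_rewards)
    steps = len(env_rewards)
    return [total / steps] * steps

# A sparse episode: reward arrives only at the final step.
print(uniform_guidance_rewards([0, 0, 0, 10]))  # [2.5, 2.5, 2.5, 2.5]
```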

NeurIPS Conference 2020 Conference Paper

Off-Policy Interval Estimation with Lipschitz Value Iteration

  • Ziyang Tang
  • Yihao Feng
  • Na Zhang
  • Jian Peng
  • Qiang Liu

Off-policy evaluation provides an essential tool for evaluating the effects of different policies or treatments using only observed data. When applied to high-stakes scenarios such as medical diagnosis or financial decision-making, it is essential to provide provably correct upper and lower bounds of the expected reward, not just a classical single point estimate, to the end-users, as executing a poor policy can be very costly. In this work, we propose a provably correct method for obtaining interval bounds for off-policy evaluation in a general continuous setting. The idea is to search for the maximum and minimum values of the expected reward among all the Lipschitz Q-functions that are consistent with the observations, which amounts to solving a constrained optimization problem on a Lipschitz function space. We go on to introduce a Lipschitz value iteration method to monotonically tighten the interval, which is simple yet efficient and provably convergent. We demonstrate the practical efficiency of our method on a range of benchmarks.
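The constrained-optimization view in this abstract has a clean one-dimensional analogue: over all L-Lipschitz functions consistent with observed points, the tightest pointwise bounds are the classical McShane-Whitney extensions. The sketch below shows only that static interval, not the paper's value-iteration procedure for tightening it.

```python
def lipschitz_interval(x, data, L):
    """Tight upper/lower bounds at x over all L-Lipschitz functions that
    agree with observed (x_i, y_i) pairs.

    lower(x) = max_i (y_i - L * |x - x_i|)   (McShane extension)
    upper(x) = min_i (y_i + L * |x - x_i|)   (Whitney extension)
    A 1-D illustration of interval bounds from a Lipschitz function class.
    """
    lower = max(y - L * abs(x - xi) for xi, y in data)
    upper = min(y + L * abs(x - xi) for xi, y in data)
    return lower, upper

data = [(0.0, 0.0), (2.0, 1.0)]              # two observations
print(lipschitz_interval(1.0, data, L=1.0))  # (0.0, 1.0)
```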

NeurIPS Conference 2019 Conference Paper

Exploration via Hindsight Goal Generation

  • Zhizhou Ren
  • Kefan Dong
  • Yuan Zhou
  • Qiang Liu
  • Jian Peng

Goal-oriented reinforcement learning has recently been a practical framework for robotic manipulation tasks, in which an agent is required to reach a certain goal defined by a function on the state space. However, the sparsity of such reward definitions makes traditional reinforcement learning algorithms very inefficient. Hindsight Experience Replay (HER), a recent advance, has greatly improved sample efficiency and practical applicability for such problems. It exploits previous replays by constructing imaginary goals in a simple heuristic way, acting like an implicit curriculum to alleviate the challenge of sparse reward signals. In this paper, we introduce Hindsight Goal Generation (HGG), a novel algorithmic framework that generates valuable hindsight goals which are easy for an agent to achieve in the short term and also have the potential to guide the agent toward the actual goal in the long term. We have extensively evaluated our goal generation algorithm on a number of robotic manipulation tasks and demonstrated substantial improvement over the original HER in terms of sample efficiency.
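The hindsight relabeling that HER performs, and that HGG builds on, can be sketched in a few lines. This shows only the basic "final" relabeling strategy; HGG's contribution (choosing which hindsight goals are valuable) is not reproduced, and the transition format is an assumption for illustration.

```python
def her_relabel(transitions):
    """Hindsight relabeling: treat an achieved state as if it had been the goal.

    Each transition is (state, action, achieved_state, original_goal). The
    'final' strategy replaces the goal with the last achieved state, so at
    least one transition earns the sparse success reward.
    """
    final_achieved = transitions[-1][2]
    relabeled = []
    for state, action, achieved, _ in transitions:
        reward = 0.0 if achieved == final_achieved else -1.0  # sparse goal reward
        relabeled.append((state, action, achieved, final_achieved, reward))
    return relabeled

episode = [("s0", "a0", "p1", "g"), ("s1", "a1", "p2", "g")]
print(her_relabel(episode)[-1][-1])  # 0.0: the final step reaches the hindsight goal
```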

AAMAS Conference 2019 Conference Paper

Stochastic Variance Reduction for Deep Q-learning

  • Wei-ye Zhao
  • Jian Peng

Recent advances in deep reinforcement learning have achieved human-level performance on a variety of real-world applications. However, current algorithms still suffer from poor gradient estimation with excessive variance, resulting in unstable training and poor sample efficiency. In our paper, we propose an innovative optimization strategy utilizing stochastic variance reduced gradient (SVRG) techniques. In extensive experiments on the Atari domain, our method outperforms the deep Q-learning baselines on 18 out of 20 games.
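The SVRG estimator at the heart of this approach is a control variate: the minibatch gradient is corrected by the same sample's gradient at a stored snapshot plus the snapshot's full gradient. A toy least-squares sketch of the estimator itself (applying it inside deep Q-learning is the paper's contribution):

```python
import numpy as np

def svrg_gradient(grad_fn, i, w, w_snapshot, full_grad_snapshot):
    """SVRG control-variate gradient estimator.

    g = grad_i(w) - grad_i(w_snapshot) + full_grad(w_snapshot)
    Unbiased for the full gradient at w, with variance that shrinks as w
    approaches the snapshot point.
    """
    return grad_fn(i, w) - grad_fn(i, w_snapshot) + full_grad_snapshot

# Toy objective: f(w) = mean_i 0.5 * (w - t_i)^2, so grad_i(w) = w - t_i.
targets = np.array([1.0, 2.0, 3.0])
grad_fn = lambda i, w: w - targets[i]
w_snap = 0.0
full_grad_snap = np.mean(w_snap - targets)  # full gradient at the snapshot: -2.0
g = svrg_gradient(grad_fn, 0, w=0.0, w_snapshot=w_snap,
                  full_grad_snapshot=full_grad_snap)
print(g)  # -2.0: at the snapshot the estimator equals the full gradient exactly
```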

NeurIPS Conference 2019 Conference Paper

Thresholding Bandit with Optimal Aggregate Regret

  • Chao Tao
  • Saúl Blanco
  • Jian Peng
  • Yuan Zhou

We consider the thresholding bandit problem, whose goal is to find arms of mean rewards above a given threshold $\theta$, with a fixed budget of $T$ trials. We introduce LSA, a new, simple and anytime algorithm that aims to minimize the aggregate regret (or the expected number of mis-classified arms). We prove that our algorithm is instance-wise asymptotically optimal. We also provide comprehensive empirical results to demonstrate the algorithm's superior performance over existing algorithms under a variety of different scenarios.
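The aggregate-regret objective can be made concrete with a uniform-allocation baseline: split the budget evenly, then classify each arm by its empirical mean. This is only a baseline for illustration; LSA's adaptive allocation is the paper's contribution and is not shown.

```python
import random

def uniform_thresholding(means, theta, budget, seed=0):
    """Uniform-allocation baseline for the thresholding bandit.

    Pull every arm budget // K times (Gaussian rewards assumed here for
    illustration), then declare each arm above/below the threshold theta by
    its empirical mean. Aggregate regret counts the misclassified arms.
    """
    rng = random.Random(seed)
    pulls = budget // len(means)
    decisions = []
    for mu in means:
        samples = [rng.gauss(mu, 1.0) for _ in range(pulls)]
        decisions.append(sum(samples) / pulls >= theta)
    return decisions

means = [0.1, 0.9, 0.5, 1.5]
decisions = uniform_thresholding(means, theta=0.7, budget=4000)
mistakes = sum(d != (mu >= 0.7) for d, mu in zip(decisions, means))
print(mistakes)
```

With 1000 pulls per arm the empirical means concentrate well inside the gaps to the threshold, so the baseline makes no mistakes on this easy instance; adaptive methods matter when gaps are small relative to the budget.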

IJCAI Conference 2018 Conference Paper

Efficient Localized Inference for Large Graphical Models

  • Jinglin Chen
  • Jian Peng
  • Qiang Liu

We propose a new localized inference algorithm for answering marginalization queries in large graphical models with the correlation decay property. Given a query variable and a large graphical model, we define a much smaller model in a local region around the query variable in the target model so that the marginal distribution of the query variable can be accurately approximated. We introduce two approximation error bounds based on the Dobrushin’s comparison theorem and apply our bounds to derive a greedy expansion algorithm that efficiently guides the selection of neighbor nodes for localized inference. We verify our theoretical bounds on various datasets and demonstrate that our localized inference algorithm can provide fast and accurate approximation for large graphical models.
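The starting point of localized inference, a small model in a local region around the query variable, can be sketched as extracting the radius-r ball in the graph. The greedy, bound-guided expansion the paper derives is not reproduced; this shows only the simplest fixed-radius region.

```python
from collections import deque

def local_region(adjacency, query, radius):
    """Extract all variables within a given graph distance of the query.

    Under correlation decay, marginals of the query variable computed in
    this small local model approximate those in the full graphical model;
    the paper's Dobrushin-based bounds guide how far the region must grow.
    """
    dist = {query: 0}
    frontier = deque([query])
    while frontier:
        v = frontier.popleft()
        if dist[v] == radius:
            continue
        for u in adjacency[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                frontier.append(u)
    return set(dist)

# A path graph 0-1-2-3-4: the radius-1 ball around node 2.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sorted(local_region(adj, 2, 1)))  # [1, 2, 3]
```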

AAAI Conference 2018 Conference Paper

Empower Sequence Labeling with Task-Aware Neural Language Model

  • Liyuan Liu
  • Jingbo Shang
  • Xiang Ren
  • Frank Xu
  • Huan Gui
  • Jian Peng
  • Jiawei Han

Linguistic sequence labeling is a general approach encompassing a variety of problems, such as part-of-speech tagging and named entity recognition. Recent advances in neural networks (NNs) make it possible to build reliable models without handcrafted features. However, in many cases, it is hard to obtain sufficient annotations to train these models. In this study, we develop a neural framework to extract knowledge from raw texts and empower the sequence labeling task. Besides word-level knowledge contained in pretrained word embeddings, character-aware neural language models are incorporated to extract character-level knowledge. Transfer learning techniques are further adopted to mediate different components and guide the language model towards the key knowledge. Compared to previous methods, this task-specific knowledge allows us to adopt a more concise model and conduct more efficient training. Different from most transfer learning methods, the proposed framework does not rely on any additional supervision. It extracts knowledge from the self-contained order information of training sequences. Extensive experiments on benchmark datasets demonstrate the effectiveness of leveraging character-level knowledge and the efficiency of co-training. For example, on the CoNLL03 NER task, model training completes in about 6 hours on a single GPU, reaching an F1 score of 91.71±0.10 without using any extra annotations.

IJCAI Conference 2018 Conference Paper

Energy-efficient Amortized Inference with Cascaded Deep Classifiers

  • Jiaqi Guan
  • Yang Liu
  • Qiang Liu
  • Jian Peng

Deep neural networks have been remarkably successful in various AI tasks but often incur high computation and energy costs in energy-constrained applications such as mobile sensing. We address this problem by proposing a novel framework that optimizes the prediction accuracy and energy cost simultaneously, thus enabling effective cost-accuracy trade-offs at test time. In our framework, each data instance is pushed into a cascade of deep neural networks with increasing sizes, and a selection module is used to sequentially determine when a sufficiently accurate classifier can be used for this data instance. The cascade of neural networks and the selection module are jointly trained in an end-to-end fashion by the REINFORCE algorithm to optimize a trade-off between the computational cost and the predictive accuracy. Our method is able to simultaneously improve the accuracy and efficiency by learning to assign easy instances to fast yet sufficiently accurate classifiers to save computation and energy cost, while assigning harder instances to deeper and more powerful classifiers to ensure satisfactory accuracy. Moreover, we demonstrate our method's effectiveness with extensive experiments on CIFAR-10/100, ImageNet32x32 and the original ImageNet dataset.
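The test-time behavior of such a cascade can be sketched with a fixed confidence threshold standing in for the learned selection module (which the paper trains with REINFORCE). The classifiers and costs below are hypothetical placeholders.

```python
def cascaded_predict(classifiers, costs, x, confidence_threshold=0.9):
    """Push an instance through increasingly large classifiers, stopping at
    the first one that is confident enough.

    Each classifier maps x to (label, confidence); costs are per-model.
    Easy instances exit early and stay cheap; hard ones pay for the deeper
    models. A fixed threshold replaces the paper's learned selection module.
    """
    spent = 0.0
    label = None
    for clf, cost in zip(classifiers, costs):
        spent += cost
        label, confidence = clf(x)
        if confidence >= confidence_threshold:
            break  # sufficiently accurate: stop here and save the rest
    return label, spent

small = lambda x: ("cat", 0.6)   # cheap but unsure
large = lambda x: ("dog", 0.95)  # expensive and confident
print(cascaded_predict([small, large], [1.0, 10.0], x=None))  # ('dog', 11.0)
```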

NeurIPS Conference 2012 Conference Paper

Variational Inference for Crowdsourcing

  • Qiang Liu
  • Jian Peng
  • Alexander Ihler

Crowdsourcing has become a popular paradigm for labeling large datasets. However, it has given rise to the computational task of aggregating the crowdsourced labels provided by a collection of unreliable annotators. We approach this problem by transforming it into a standard inference problem in graphical models, and applying approximate variational methods, including belief propagation (BP) and mean field (MF). We show that our BP algorithm generalizes both majority voting and a recent algorithm by Karger et al., while our MF method is closely related to a commonly used EM algorithm. In both cases, we find that the performance of the algorithms critically depends on the choice of a prior distribution on the workers' reliability; by choosing the prior properly, both BP and MF (and EM) perform surprisingly well on both simulated and real-world datasets, competitive with state-of-the-art algorithms based on more complicated modeling assumptions.
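The majority-voting baseline that the paper's BP algorithm generalizes is a one-liner per item: every worker gets equal weight, ignoring reliability. A minimal sketch (the input format is an assumption for illustration):

```python
from collections import Counter

def majority_vote(labels_by_item):
    """Aggregate crowdsourced labels for each item by simple majority voting.

    labels_by_item maps item -> list of labels from different workers. The
    reliability-aware BP/MF methods in the paper replace this equal-weight
    rule with inference over worker reliabilities.
    """
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in labels_by_item.items()}

votes = {"img1": [1, 1, -1], "img2": [-1, -1, 1]}
print(majority_vote(votes))  # {'img1': 1, 'img2': -1}
```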

NeurIPS Conference 2009 Conference Paper

Conditional Neural Fields

  • Jian Peng
  • Liefeng Bo
  • Jinbo Xu

Conditional random fields (CRF) are quite successful on sequence labeling tasks such as natural language processing and biological sequence analysis. CRF models use linear potential functions to represent the relationship between input features and outputs. However, in many real-world applications such as protein structure prediction and handwriting recognition, the relationship between input features and outputs is highly complex and nonlinear, which cannot be accurately modeled by a linear function. To model the nonlinear relationship between input features and outputs, we propose Conditional Neural Fields (CNF), a new conditional probabilistic graphical model for sequence labeling. Our CNF model extends CRF by adding one (or possibly several) middle layer between input features and outputs. The middle layer consists of a number of hidden parameterized gates, each acting as a local neural network node or feature extractor to capture the nonlinear relationship between input features and outputs. Therefore, conceptually this CNF model is much more expressive than the linear CRF model. To better control the complexity of the CNF model, we also present a hyperparameter optimization procedure within the evidence framework. Experiments on two widely-used benchmarks indicate that this CNF model performs significantly better than a number of popular methods. In particular, our CNF model is the best among about ten machine learning methods for protein secondary structure prediction and also among a few of the best methods for handwriting recognition.
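The gate layer described above can be sketched as a potential function that passes features through logistic gates before a linear combination, in contrast to the purely linear score a CRF would use. The logistic choice and the shapes below are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def cnf_potential(x, U, w):
    """CNF-style potential: features pass through parameterized gates
    before a linear combination, instead of a CRF's linear score w @ x.

    U: (n_gates, n_features) gate weights; w: (n_gates,) output weights.
    Each gate acts as a tiny neural unit extracting a nonlinear feature.
    """
    gates = 1.0 / (1.0 + np.exp(-U @ x))  # logistic gate activations
    return w @ gates

x = np.array([1.0, -1.0])
U = np.array([[2.0, 0.0], [0.0, 2.0]])
w = np.array([1.0, 1.0])
print(cnf_potential(x, U, w))  # sigmoid(2) + sigmoid(-2), which is exactly 1.0
```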