Arrow Research search

Author name cluster

Ting Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

36 papers
2 author rows

Possible papers


JBHI Journal 2026 Journal Article

AI-Generated Motifs Distinguish Altered Spatiotemporal Pain Response in the VTA of Mice with Chronic Pain

  • David Anderson Lloyd
  • Dunyan Yao
  • Austin Ganaway
  • Ting Chen
  • Yasumi Ohta
  • Jun Ohta
  • Yasemin M. Akay
  • Masahiro Ohsawa

More than one fifth of American adults live with chronic pain. As pain chronifies, the underlying neuronal circuitry undergoes maladaptive spatial and temporal changes. We previously developed and used an advanced CMOS sensor to record video of ventral tegmental area (VTA) activity in response to acute pain and pain chronification. Here we use both discriminative and generative AI approaches to spatiotemporally characterize the VTA's complex response to pain and quantify changes in its circuit dynamics in murine chronic pain models. We trained a time-attention convolutional neural network (TA-CNN) and used its gradient-weighted class activation maps (Grad-CAMs) to spatially isolate activity which differentiates pre- and post-surgical responses to stimulation. Next, we implemented an unsupervised vector quantized variational autoencoder (VQ-VAE) to learn a dense, discrete representation of the VTA's response in terms of a codebook of spatiotemporal motifs. The TA-CNN's (test set accuracy 0.787) CAMs help isolate post-surgery activity differences to the inferior segments of the VTA for partial sciatic nerve ligation (PNL) subjects but not sham subjects. The VQ-VAE (validation mean squared error 0.00732) identifies distinct spatiotemporal motifs which spatially correspond to observed VTA sub-regions. Furthermore, these motifs show changes in both spatial organization and time-response to pain after PNL but not after sham operation. These motifs also exhibit intensified spatiotemporal responses to varying intensities of mechanical stimulation in post-PNL recordings. The use of AI to fit complex space-time dynamics to an ordered latent representation or code paves the way for nuanced analysis of previously difficult-to-approach problems.
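As a rough illustration of the codebook idea mentioned in this abstract, the sketch below shows the generic vector-quantization step of a VQ-VAE: each latent vector is replaced by the index of its nearest codebook entry. The function names, the toy codebook, and the use of squared Euclidean distance are assumptions made for illustration; this is not the authors' implementation.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: List[Sequence[float]],
             codebook: List[Sequence[float]]) -> List[int]:
    """Map each latent vector to the index of its nearest codebook entry.

    Hypothetical helper: a real VQ-VAE does this on encoder outputs and
    trains the codebook jointly with the encoder and decoder.
    """
    codes = []
    for z in latents:
        distances = [squared_distance(z, e) for e in codebook]
        codes.append(distances.index(min(distances)))
    return codes


# Toy data (made up): three 2-D "motifs" and three latent vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
latents = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]]
codes = quantize(latents, codebook)
print(codes)
```

Each recording then becomes a sequence of discrete code indices, which is what makes motif-level comparisons between conditions possible.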

AAAI Conference 2026 Conference Paper

RareAgents: Autonomous Multi-disciplinary Team for Rare Disease Diagnosis and Treatment

  • Xuanzhong Chen
  • Ye Jin
  • Xiaohao Mao
  • Lun Wang
  • Shuyang Zhang
  • Ting Chen

Rare diseases, despite their low individual incidence, collectively impact around 300 million people worldwide due to the vast number of diseases. The involvement of multiple organs and systems, and the shortage of specialized doctors with relevant experience, make diagnosing and treating rare diseases more challenging than common diseases. Recently, agents powered by large language models (LLMs) have demonstrated notable applications across various domains. In the medical field, some agent methods have outperformed direct prompts in question-answering tasks from medical examinations. However, current agent frameworks are not well-adapted to real-world clinical scenarios, especially those involving the complex demands of rare diseases. To bridge this gap, we introduce RareAgents, the first LLM-driven multi-disciplinary team decision-support tool designed specifically for the complex clinical context of rare diseases. RareAgents integrates advanced Multidisciplinary Team (MDT) coordination, memory mechanisms, and medical tools utilization, leveraging Llama-3.1-8B/70B as the base model. Experimental results show that RareAgents outperforms state-of-the-art domain-specific models, GPT-4o, and current agent frameworks in diagnosis and treatment for rare diseases. Furthermore, we contribute a novel rare disease dataset, MIMIC-IV-Ext-Rare, to facilitate further research in this field.

EAAI Journal 2026 Journal Article

The intelligent detection method for similar cotton diseases based on high- and low-frequency feature enhancement and attention-guided fusion

  • Ting Chen
  • Zhenhong Jia
  • Jiajia Wang
  • Gang Zhou
  • Peng Ouyang

Accurate and efficient identification of crop diseases is crucial for smart agriculture. However, existing deep learning models often struggle to detect small-scale lesions due to insufficient feature representation. This paper proposes an adaptive and efficient feature fusion model specifically designed to detect cotton leaf diseases (EHDM-Net), which is developed based on you only look once version 11 (YOLOv11) to enhance multiscale feature representation and detection accuracy. EHDM-Net incorporates two novel modules: the high- and low-frequency feature enhancement module (HLAEM), which enhances object boundaries and fine-grained features, applied to the cross stage partial with kernel size 2 (C3k2) to dynamically adjust the receptive field for improved multiscale feature extraction, and the efficient attention-directed fusion module (EAGFM), which applies pixel-level dynamic weighting to suppress redundancy and enhance cross-scale feature interactions. Experiments on a self-constructed cotton leaf disease dataset achieved mean average precision (mAP@50) of 91.4 and F1 score of 91. Validation on a public dataset confirmed the robustness and generalization capability of EHDM-Net. Furthermore, based on the proposed model, an intelligent cotton disease detection system was developed, supporting image- and video-based disease identification and providing a practical tool for field monitoring and analysis of cotton health conditions. Code availability: https://github.com/Tingchen121/EHDM-Net.git.

EAAI Journal 2025 Journal Article

A novel object detection model for sugar beet Cercospora leaf spot in field scenarios based on large kernel decomposition and spatial channel interaction attention

  • Hualong Dong
  • Yi Lu
  • Yurong Qian
  • Xuefei Ning
  • Ting Chen
  • Ke Tang

Cercospora leaf spot (CLS) is a widespread disease that seriously threatens beet yield and sugar quality. Timely detection enables farmers to take early control measures and reduce economic losses. Although artificial intelligence (AI)-based methods are replacing manual inspection in agriculture, CLS detection in complex field environments remains highly challenging due to subtle early-stage symptoms and severe occlusions caused by overlapping leaves and weeds. To address these challenges, this paper presents Cercospora Leaf Spot–You Only Look Once (CLS–YOLO), an enhanced detection model built upon You Only Look Once version 11 (YOLOv11), incorporating novel modules specifically designed for accurate CLS detection under challenging field conditions. To improve the detection of weak and early-stage symptoms, we design the Multi-Scale Large Kernel Decomposition (MSLKD) module, which enhances feature extraction for subtle lesions. Furthermore, we develop the Spatial-Channel Interaction Attention (SCIA) module to mitigate detection errors arising from occlusion and fragmented disease patterns by refining multi-scale feature representations. Experimental results demonstrate CLS–YOLO achieves superior performance, reaching an mAP@0.5 of 73.6% ± 0.2% and an mAP@0.5:0.95 of 40.6% ± 0.3% over five independent runs, outperforming twelve mainstream object detection algorithms while maintaining lightweight efficiency. To validate generalization capability across scenarios, crops, and diseases, we conducted comparative experiments on two public crop disease datasets, where our method achieved superior overall performance. In summary, this study provides an effective AI-driven solution for precise crop disease detection, contributing to the practical advancement of intelligent agriculture.
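For readers unfamiliar with the mAP@0.5 metric quoted in this abstract: a predicted box is usually counted as a match when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. The sketch below computes IoU for axis-aligned boxes; the `(x1, y1, x2, y2)` box format and the function names are assumptions for illustration, and the full mAP computation (ranking by confidence, precision-recall integration) is omitted.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def matches_at(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """True when the prediction overlaps the ground truth enough to count."""
    return iou(pred, truth) >= threshold


# Two 2x2 boxes offset by one unit overlap in a 1x1 square: IoU = 1/7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(matches_at((0, 0, 2, 2), (1, 1, 3, 3)))
```

The mAP@0.5:0.95 figure averages the same procedure over IoU thresholds from 0.5 to 0.95, which is why it is lower than mAP@0.5.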

AAAI Conference 2025 Conference Paper

CoPRA: Bridging Cross-domain Pretrained Sequence Models with Complex Structures for Protein-RNA Binding Affinity Prediction

  • Rong Han
  • Xiaohong Liu
  • Tong Pan
  • Jing Xu
  • Xiaoyu Wang
  • Wuyang Lan
  • Zhenyu Li
  • Zixuan Wang

Accurately measuring protein-RNA binding affinity is crucial in many biological processes and drug design. Previous computational methods for protein-RNA binding affinity prediction rely on either sequence or structure features and are unable to capture the binding mechanisms comprehensively. Recently emerging pre-trained language models trained on massive unsupervised sequences of protein and RNA have shown strong representation ability for various in-domain downstream tasks, including binding site prediction. However, applying different-domain language models collaboratively for complex-level tasks remains unexplored. In this paper, we propose CoPRA to bridge pre-trained language models from different biological domains via Complex structure for Protein-RNA binding Affinity prediction. We demonstrate for the first time that cross-biological modal language models can collaborate to improve binding affinity prediction. We propose a Co-Former to combine the cross-modal sequence and structure information and a bi-scope pre-training strategy for improving Co-Former's interaction understanding. Meanwhile, we build the largest protein-RNA binding affinity dataset PRA310 for performance evaluation. We also test our model on a public dataset for mutation effect prediction. CoPRA reaches state-of-the-art performance on all the datasets. We provide extensive analyses and verify that CoPRA can (1) accurately predict the protein-RNA binding affinity; (2) understand the binding affinity change caused by mutations; and (3) benefit from scaling data and model size.

AAAI Conference 2025 Conference Paper

HeMeNet: Heterogeneous Multichannel Equivariant Network for Protein Multi-task Learning

  • Rong Han
  • Wenbing Huang
  • Lingxiao Luo
  • Xinyan Han
  • Jiaming Shen
  • Zhiqiang Zhang
  • Jun Zhou
  • Ting Chen

Understanding and leveraging the 3D structures of proteins is central to a variety of biological and drug discovery tasks. While deep learning has been applied successfully for structure-based protein function prediction tasks, current methods usually employ distinct training for each task. However, each of the tasks is of small size, and such a single-task strategy hinders the models' performance and generalization ability. As some labeled 3D protein datasets are biologically related, combining multi-source datasets for larger-scale multi-task learning is one way to overcome this problem. In this paper, we propose a neural network model to address multiple tasks jointly upon the input of 3D protein structures. In particular, we first construct a standard structure-based multi-task benchmark called Protein-MT, consisting of 6 biologically relevant tasks, including affinity prediction and property prediction, integrated from 4 public datasets. Then, we develop a novel graph neural network for multi-task learning, dubbed Heterogeneous Multichannel Equivariant Network (HeMeNet), which is E(3) equivariant and able to capture heterogeneous relationships between different atoms. Besides, HeMeNet can achieve task-specific learning via the task-aware readout mechanism. Extensive evaluations of our benchmark verify the effectiveness of multi-task learning, and our model generally surpasses state-of-the-art models.

AAAI Conference 2025 Conference Paper

Multi-axis Prompt and Multi-dimension Fusion Network for All-in-one Weather-degraded Image Restoration

  • Yuanbo Wen
  • Tao Gao
  • Jing Zhang
  • Ziqi Li
  • Ting Chen

Existing approaches aiming to remove adverse weather degradations compromise image quality and incur long processing times. To this end, we introduce a multi-axis prompt and multi-dimension fusion network (MPMF-Net). Specifically, we develop a multi-axis prompts learning block (MPLB), which learns the prompts along three separate axis planes, requiring fewer parameters and achieving superior performance. Moreover, we present a multi-dimension feature interaction block (MFIB), which optimizes intra-scale feature fusion by segregating features along height, width and channel dimensions. This strategy enables more accurate mutual attention and adaptive weight determination. Additionally, we propose the coarse-scale degradation-free implicit neural representations (CDINR) to normalize the degradation levels of different weather conditions. Extensive experiments demonstrate the significant improvements of our model over the recent well-performing approaches in both reconstruction fidelity and inference time.

NeurIPS Conference 2025 Conference Paper

Rethinking Hebbian Principle: Low-Dimensional Structural Projection for Unsupervised Learning

  • Shikuang Deng
  • Jiayuan Zhang
  • Yuhang Wu
  • Ting Chen
  • Shi Gu

Hebbian learning is a biological principle that intuitively describes how neurons adapt their connections through repeated stimuli. However, when applied to machine learning, it suffers serious issues due to the unconstrained updates of the connections and the lack of accounting for feedback mediation. Such shortcomings limit its effective scaling to complex network architectures and tasks. To this end, here we introduce the Structural Projection Hebbian Representation (SPHeRe), a novel unsupervised learning method that integrates orthogonality and structural information preservation through a local auxiliary nonlinear block. The loss for structural information preservation backpropagates to the input through an auxiliary lightweight projection that conceptually serves as feedback mediation while the orthogonality constraints account for the boundedness of updating magnitude. Extensive experimental results show that SPHeRe achieves SOTA performance among unsupervised synaptic plasticity approaches on standard image classification benchmarks, including CIFAR-10, CIFAR-100, and Tiny-ImageNet. Furthermore, the method exhibits strong effectiveness in continual learning and transfer learning scenarios, and image reconstruction tasks show the robustness and generalizability of the extracted features. This work demonstrates the competitiveness and potential of Hebbian unsupervised learning rules within modern deep learning frameworks, demonstrating the possibility of efficient and biologically inspired learning algorithms without the strong dependence on strict backpropagation. Our code is available at https://github.com/brain-intelligence-lab/SPHeRe.

EAAI Journal 2024 Journal Article

A novel dual-stage progressive enhancement network for single image deraining

  • Tao Gao
  • Yuanbo Wen
  • Jing Zhang
  • Ting Chen

The dense rain accumulation in heavy rain can significantly wash out images and thus destroy the background details of images. Although existing deep rain removal models lead to improved performance for heavy rain removal, we find that most of them ignore the detail reconstruction accuracy of rain-free images. In this paper, we propose a dual-stage progressive enhancement network (DPENet-v2) to achieve effective deraining with structure-accurate rain-free images. Three main modules are included in our framework, namely a rain streaks removal network (R²Net), a details reconstruction network (DRNet) and a cross-stage feature interaction module (CFIM). The former aims to achieve accurate rain removal, and the latter is designed to recover the details of rain-free images. We introduce two main strategies within our networks to achieve trade-off between the effectiveness of deraining and the detail restoration of rain-free images. Firstly, a dilated dense residual block (DDRB) within the rain streaks removal network is presented to aggregate high/low level features of heavy rain. Secondly, an enhanced residual pixel-wise attention block (ERPAB) within the details reconstruction network is designed for context information aggregation. Meanwhile, CFIM learns the long-range dependencies and achieves cross-stage information communication. We also propose a comprehensive loss function to highlight the marginal and regional accuracy of rain-free images. Extensive experiments on benchmark public datasets show both efficiency and effectiveness of the proposed method in achieving structure-preserving rain-free images for heavy rain removal.

ICML Conference 2024 Conference Paper

Denoising Autoregressive Representation Learning

  • Yazhe Li
  • Jörg Bornschein
  • Ting Chen

In this paper, we explore a new generative approach for learning visual representations. Our method, DARL, employs a decoder-only Transformer to predict image patches autoregressively. We find that training with Mean Squared Error (MSE) alone leads to strong representations. To enhance the image generation ability, we replace the MSE loss with the diffusion objective by using a denoising patch decoder. We show that the learned representation can be improved by using tailored noise schedules and longer training in larger models. Notably, the optimal schedule differs significantly from the typical ones used in standard image diffusion models. Overall, despite its simple architecture, DARL delivers performance remarkably close to state-of-the-art masked prediction models under the fine-tuning protocol. This marks an important step towards a unified model capable of both visual perception and generation, effectively combining the strengths of autoregressive and denoising diffusion models.

AAAI Conference 2024 Conference Paper

DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization

  • Wentse Chen
  • Shiyu Huang
  • Yuan Chiang
  • Tim Pearce
  • Wei-Wei Tu
  • Ting Chen
  • Jun Zhu

Most reinforcement learning algorithms seek a single optimal strategy that solves a given task. However, it can often be valuable to learn a diverse set of solutions, for instance, to make an agent's interaction with users more engaging, or improve the robustness of a policy to an unexpected perturbation. We propose Diversity-Guided Policy Optimization (DGPO), an on-policy algorithm that discovers multiple strategies for solving a given task. Unlike prior work, it achieves this with a shared policy network trained over a single run. Specifically, we design an intrinsic reward based on an information-theoretic diversity objective. Our final objective alternately places constraints on the diversity of the strategies and on the extrinsic reward. We solve the constrained optimization problem by casting it as a probabilistic inference task and use policy iteration to maximize the derived lower bound. Experimental results show that our method efficiently discovers diverse strategies in a wide variety of reinforcement learning tasks. Compared to baseline methods, DGPO achieves comparable rewards, while discovering more diverse strategies, and often with better sample efficiency.

ICLR Conference 2024 Conference Paper

Perceptual Group Tokenizer: Building Perception with Iterative Grouping

  • Zhiwei Deng
  • Ting Chen
  • Yang Li

The human visual recognition system shows an astonishing capability of compressing visual information into a set of tokens containing rich representations without label supervision. One critical driving principle behind it is perceptual grouping. Despite being widely used in computer vision in the early 2010s, it remains a mystery whether perceptual grouping can be leveraged to derive a neural visual recognition backbone that generates as powerful representations. In this paper, we propose the Perceptual Group Tokenizer, a model that entirely relies on grouping operations to extract visual features and perform self-supervised representation learning, where a series of grouping operations are used to iteratively hypothesize the context for pixels or superpixels to refine feature representations. We show that the proposed model can achieve competitive performance compared to state-of-the-art vision architectures, and inherits desirable properties including adaptive computation without re-training, and interpretability. Specifically, Perceptual Group Tokenizer achieves 79.7% on ImageNet-1K self-supervised learning benchmark with linear probe evaluation, marking new progress under this paradigm.

AAMAS Conference 2023 Conference Paper

DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization

  • Wenze Chen
  • Shiyu Huang
  • Yuan Chiang
  • Ting Chen
  • Jun Zhu

Recent algorithms designed for reinforcement learning tasks focus on finding a single optimal solution. However, in many practical applications, it is important to develop reasonable agents with diverse strategies. In this paper, we propose Diversity-Guided Policy Optimization, an on-policy framework for discovering multiple strategies for the same task. Our algorithm uses diversity objectives to guide a latent code conditioned policy to learn a set of diverse strategies in a single training procedure. Experimental results show that our method efficiently finds diverse strategies in a wide variety of reinforcement learning tasks. We further show that DGPO has similar performance and achieves a higher diversity score or better sample efficiency compared to other baselines.

ICML Conference 2023 Conference Paper

Scalable Adaptive Computation for Iterative Generation

  • Allan Jabri
  • David J. Fleet
  • Ting Chen

Natural data is redundant, yet predominant architectures tile computation uniformly across their input and output space. We propose the Recurrent Interface Network (RIN), an attention-based architecture that decouples its core computation from the dimensionality of the data, enabling adaptive computation for more scalable generation of high-dimensional data. RINs focus the bulk of computation (i.e. global self-attention) on a set of latent tokens, using cross-attention to read and write (i.e. route) information between latent and data tokens. Stacking RIN blocks allows bottom-up (data to latent) and top-down (latent to data) feedback, leading to deeper and more expressive routing. While this routing introduces challenges, this is less problematic in recurrent computation settings where the task (and routing problem) changes gradually, such as iterative generation with diffusion models. We show how to leverage recurrence by conditioning the latent tokens at each forward pass of the reverse diffusion process with those from prior computation, i.e. latent self-conditioning. RINs yield state-of-the-art pixel diffusion models for image and video generation, scaling to 1024×1024 images without cascades or guidance, while being domain-agnostic and up to 10× more efficient than 2D and 3D U-Nets.

NeurIPS Conference 2022 Conference Paper

A Unified Sequence Interface for Vision Tasks

  • Ting Chen
  • Saurabh Saxena
  • Lala Li
  • Tsung-Yi Lin
  • David J. Fleet
  • Geoffrey E. Hinton

While language tasks are naturally expressed in a single, unified, modeling framework, i.e., generating sequences of tokens, this has not been the case in computer vision. As a result, there is a proliferation of distinct architectures and loss functions for different vision tasks. In this work we show that a diverse set of "core" computer vision tasks can also be unified if formulated in terms of a shared pixel-to-sequence interface. We focus on four tasks, namely, object detection, instance segmentation, keypoint detection, and image captioning, all with diverse types of outputs, e.g., bounding boxes or dense masks. Despite that, by formulating the output of each task as a sequence of discrete tokens with a unified interface, we show that one can train a neural network with a single model architecture and loss function on all these tasks, with no task-specific customization. To solve a specific task, we use a short prompt as task description, and the sequence output adapts to the prompt so it can produce task-specific output. We show that such a model can achieve competitive performance compared to well-established task-specific models.

AIIM Journal 2022 Journal Article

Automatic diagnosis of arrhythmia with electrocardiogram using multiple instance learning: From rhythm annotation to heartbeat prediction

  • Xuan Zhang
  • Hui Wu
  • Ting Chen
  • Guangyu Wang

The electrocardiogram (ECG) is a commonly used technique for detecting arrhythmias and many other cardiac diseases. Automatic ECG diagnosis has seen tremendous success in recent years, owing to the rapid development of the deep learning (DL) approach. Existing works on automatic ECG diagnosis can be divided roughly into two categories: prediction at the rhythm level from an ECG record, and prediction at the heartbeat level, although their relationship was seldom studied previously. In this paper, we address the following question: can we train an abnormal heartbeat detection model using solely data annotated at the rhythm level? We first used multiple instance learning (MIL) to model the relationship between an ECG record (whose label is given at the rhythm level and is provided as an input) and the heartbeats in the ECG (whose labels are to be predicted). Then, we sequentially trained two models, a rhythm model for detecting abnormal heartbeats in an ECG record labeled as arrhythmia, and a heartbeat model for classifying heartbeats as normal or various types of arrhythmias. We trained and tested our models using 61,853 ECG records with rhythm annotations. The experimental results demonstrate that the heartbeat model achieves a macro-average F1 score of 0.807 in classifying four types of arrhythmias as well as normal heartbeats. Our model significantly outperforms the model directly trained with 15,385 ECG heartbeats with heartbeat annotations, demonstrating the viability of our strategy for training a high-performing heartbeat-level automatic diagnostic model using only rhythm annotation.
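To make the multiple instance learning framing in this abstract concrete, the sketch below treats an ECG record as a "bag" of heartbeats and derives the record-level score from per-beat scores. The max-pooling aggregation, the 0.5 threshold, and the function names are assumptions chosen because they are the most common MIL formulation; the abstract does not state which aggregation the authors used.

```python
from typing import List


def record_probability(beat_probs: List[float]) -> float:
    """Bag-level score under the standard MIL assumption:
    a record is abnormal if at least one heartbeat is abnormal,
    so the record score is the maximum per-beat score.
    """
    if not beat_probs:
        raise ValueError("a record must contain at least one heartbeat")
    return max(beat_probs)


def flag_abnormal_beats(beat_probs: List[float],
                        threshold: float = 0.5) -> List[int]:
    """Indices of heartbeats whose score reaches the (assumed) threshold."""
    return [i for i, p in enumerate(beat_probs) if p >= threshold]


# Made-up per-beat scores for one record with a rhythm-level label only.
beats = [0.1, 0.8, 0.3]
print(record_probability(beats))
print(flag_abnormal_beats(beats))
```

Training against the record-level label through such an aggregation is what lets beat-level predictions emerge without beat-level annotations.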

TMLR Journal 2022 Journal Article

Decoder Denoising Pretraining for Semantic Segmentation

  • Emmanuel Asiedu Brempong
  • Simon Kornblith
  • Ting Chen
  • Niki Parmar
  • Matthias Minderer
  • Mohammad Norouzi

Semantic segmentation labels are expensive and time consuming to acquire. Hence, pretraining is commonly used to improve the label-efficiency of segmentation models. Typically, the encoder of a segmentation model is pretrained as a classifier and the decoder is randomly initialized. Here, we argue that random initialization of the decoder can be suboptimal, especially when few labeled examples are available. We propose a decoder pretraining approach based on denoising, which can be combined with supervised pretraining of the encoder. We find that decoder denoising pretraining on the ImageNet dataset strongly outperforms encoder-only supervised pretraining. Despite its simplicity, decoder denoising pretraining achieves state-of-the-art results on label-efficient semantic segmentation and offers considerable gains on the Cityscapes, Pascal Context, and ADE20K datasets.

JBHI Journal 2022 Journal Article

Explainable Dynamic Multimodal Variational Autoencoder for the Prediction of Patients With Suspected Central Precocious Puberty

  • Yiming Xu
  • Xiaohong Liu
  • Liyan Pan
  • Xiaojian Mao
  • Huiying Liang
  • Guangyu Wang
  • Ting Chen

Central precocious puberty (CPP) is the most common type of precocious puberty and has a significant effect on children. A gonadotropin-releasing hormone (GnRH)-stimulation test is the gold standard for confirming CPP. This test, however, is costly and unpleasant for patients. Therefore, it is critical to develop alternative methods for CPP diagnosis in order to alleviate patient suffering. This study aims to develop an artificial intelligence (AI) diagnostic system for predicting response to the GnRH-stimulation test using data from laboratory tests, electronic health records (EHRs), and pelvic ultrasonography and left-hand radiography reports. The challenges are in integrating these multimodal features into a comprehensive deep learning model in order to achieve an accurate diagnosis while also accounting for the missing or incomplete modalities. To begin, we developed a dynamic multimodal variational autoencoder (DMVAE) that can exploit intrinsic correlations between different modalities to impute features for missing modalities. Next, we combined features from all modalities to predict the outcome of a CPP diagnosis. The experimental results (AUROC 0.9086) demonstrate that our DMVAE model is superior to standard methods. Additionally, we showed that by setting appropriate operating thresholds, clinicians could diagnose about two-thirds of patients with confidence (1.0 specificity). Only about one-third of patients require confirmation of their diagnoses using GnRH (or GnRH analog)-stimulation tests. To interpret the results, we implemented an explainer Shapley additive explanation (SHAP) to analyze the local and global feature attributions.

AAAI Conference 2022 Conference Paper

Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding

  • Zizhao Zhang
  • Han Zhang
  • Long Zhao
  • Ting Chen
  • Sercan Ö. Arik
  • Tomas Pfister

Hierarchical structures are popular in recent vision transformers; however, they require sophisticated designs and massive datasets to work well. In this paper, we explore the idea of nesting basic local transformers on non-overlapping image blocks and aggregating them in a hierarchical way. We find that the block aggregation function plays a critical role in enabling cross-block non-local information communication. This observation leads us to design a simplified architecture that requires minor code changes upon the original vision transformer. The benefits of the proposed judiciously selected design are threefold: (1) NesT converges faster and requires much less training data to achieve good generalization on both ImageNet and small datasets like CIFAR; (2) when extending our key ideas to image generation, NesT leads to a strong decoder that is 8 times faster than previous transformer-based generators; and (3) we show that decoupling the feature learning and abstraction processes via this nested hierarchy in our design enables constructing a novel method (named GradCAT) for visually interpreting the learned model. Source code is available at https://github.com/google-research/nested-transformer.

NeurIPS Conference 2022 Conference Paper

Understanding and Improving Robustness of Vision Transformers through Patch-based Negative Augmentation

  • Yao Qin
  • Chiyuan Zhang
  • Ting Chen
  • Balaji Lakshminarayanan
  • Alex Beutel
  • Xuezhi Wang

We investigate the robustness of vision transformers (ViTs) through the lens of their special patch-based architectural structure, i.e., they process an image as a sequence of image patches. We find that ViTs are surprisingly insensitive to patch-based transformations, even when the transformation largely destroys the original semantics and makes the image unrecognizable by humans. This indicates that ViTs heavily use features that survived such transformations but are generally not indicative of the semantic class to humans. Further investigations show that these features are useful but non-robust, as ViTs trained on them can achieve high in-distribution accuracy, but break down under distribution shifts. From this understanding, we ask: can training the model to rely less on these features improve ViT robustness and out-of-distribution performance? We use the images transformed with our patch-based operations as negatively augmented views and offer losses to regularize the training away from using non-robust features. This is a complementary view to existing research that mostly focuses on augmenting inputs with semantic-preserving transformations to enforce models' invariance. We show that patch-based negative augmentation consistently improves robustness of ViTs on ImageNet-based robustness benchmarks across 20+ different experimental settings. Furthermore, we find our patch-based negative augmentation is complementary to traditional (positive) data augmentation techniques and batch-based negative examples in contrastive learning.

NeurIPS Conference 2021 Conference Paper

Improved Transformer for High-Resolution GANs

  • Long Zhao
  • Zizhao Zhang
  • Ting Chen
  • Dimitris Metaxas
  • Han Zhang

Attention-based models, exemplified by the Transformer, can effectively model long range dependency, but suffer from the quadratic complexity of self-attention operation, making them difficult to be adopted for high-resolution image generation based on Generative Adversarial Networks (GANs). In this paper, we introduce two key ingredients to Transformer to address this challenge. First, in low-resolution stages of the generative process, standard global self-attention is replaced with the proposed multi-axis blocked self-attention which allows efficient mixing of local and global attention. Second, in high-resolution stages, we drop self-attention while only keeping multi-layer perceptrons reminiscent of the implicit neural function. To further improve the performance, we introduce an additional self-modulation component based on cross-attention. The resulting model, denoted as HiT, has a nearly linear computational complexity with respect to the image size and thus directly scales to synthesizing high definition images. We show in the experiments that the proposed HiT achieves state-of-the-art FID scores of 30.83 and 2.95 on unconditional ImageNet $128 \times 128$ and FFHQ $256 \times 256$, respectively, with a reasonable throughput. We believe the proposed HiT is an important milestone for generators in GANs which are completely free of convolutions. Our code is made publicly available at https://github.com/google-research/hit-gan.

NeurIPS Conference 2021 Conference Paper

Improving Contrastive Learning on Imbalanced Data via Open-World Sampling

  • Ziyu Jiang
  • Tianlong Chen
  • Ting Chen
  • Zhangyang Wang

Contrastive learning approaches have achieved great success in learning visual representations with few labels of the target classes. That implies a tantalizing possibility of scaling them up beyond a curated “seed” benchmark, to incorporate more unlabeled images from internet-scale external sources to enhance their performance. However, in practice, a larger amount of unlabeled data will require more computing resources due to the bigger model size and longer training needed. Moreover, open-world unlabeled data usually follows an implicit long-tail class or attribute distribution, and many of the examples do not belong to the target classes. Blindly leveraging all unlabeled data can hence lead to data imbalance as well as distraction issues. This motivates us to seek a principled approach to strategically select unlabeled data from an external source, in order to learn generalizable, balanced and diverse representations for relevant classes. In this work, we present an open-world unlabeled data sampling framework called Model-Aware K-center (MAK), which follows three simple principles: (1) tailness, which encourages sampling of examples from tail classes, by sorting the empirical contrastive loss expectation (ECLE) of samples over random data augmentations; (2) proximity, which rejects the out-of-distribution outliers that may distract training; and (3) diversity, which ensures diversity in the set of sampled examples. Empirically, using ImageNet-100-LT (without labels) as the seed dataset and two “noisy” external data sources, we demonstrate that MAK can consistently improve both the overall representation quality and the class balancedness of the learned features, as evaluated via linear classifier evaluation on full-shot and few-shot settings. The code is available at: https://github.com/VITA-Group/MAK.

NeurIPS Conference 2021 Conference Paper

Intriguing Properties of Contrastive Losses

  • Ting Chen
  • Calvin Luo
  • Lala Li

We study three intriguing properties of contrastive learning. First, we generalize the standard contrastive loss to a broader family of losses, and we find that various instantiations of the generalized loss perform similarly under the presence of a multi-layer non-linear projection head. Second, we study if instance-based contrastive learning (with a global image representation) can learn well on images with multiple objects present. We find that meaningful hierarchical local features can be learned despite the fact that these objectives operate on global instance-level features. Finally, we study the phenomenon of feature suppression among competing features shared across augmented views, such as "color distribution" vs "object class". We construct datasets with explicit and controllable competing features, and show that, for contrastive learning, a few bits of easy-to-learn shared features can suppress, and even fully prevent, the learning of other sets of competing features. In scenarios where there are multiple objects in an image, the dominant object would suppress the learning of smaller objects. Existing contrastive learning methods critically rely on data augmentation to favor certain sets of features over others, and could suffer from learning saturation for scenarios where existing augmentations cannot fully address the feature suppression. This poses open challenges to existing contrastive learning techniques.

NeurIPS Conference 2021 Conference Paper

Why Do Better Loss Functions Lead to Less Transferable Features?

  • Simon Kornblith
  • Ting Chen
  • Honglak Lee
  • Mohammad Norouzi

Previous work has proposed many new loss functions and regularizers that improve test accuracy on image classification tasks. However, it is not clear whether these loss functions learn better representations for downstream tasks. This paper studies how the choice of training objective affects the transferability of the hidden representations of convolutional neural networks trained on ImageNet. We show that many objectives lead to statistically significant improvements in ImageNet accuracy over vanilla softmax cross-entropy, but the resulting fixed feature extractors transfer substantially worse to downstream tasks, and the choice of loss has little effect when networks are fully fine-tuned on the new tasks. Using centered kernel alignment to measure similarity between hidden representations of networks, we find that differences among loss functions are apparent only in the last few layers of the network. We delve deeper into representations of the penultimate layer, finding that different objectives and hyperparameter combinations lead to dramatically different levels of class separation. Representations with higher class separation obtain higher accuracy on the original task, but their features are less useful for downstream tasks. Our results suggest there exists a trade-off between learning invariant features for the original task and features relevant for transfer tasks.

IJCAI Conference 2020 Conference Paper

Automatic Emergency Diagnosis with Knowledge-Based Tree Decoding

  • Ke Wang
  • Xuyan Chen
  • Ning Chen
  • Ting Chen

Automatic diagnosis based on clinical notes is critical, especially in the emergency department, where a fast and professional result is vital in assuring proper and timely treatment. Previous works formalize this task as plain text classification and fail to utilize the medically significant tree structure of the International Classification of Diseases (ICD) coding system. Besides, external medical knowledge has rarely been used before, and we explore it by extracting relevant materials from Wikipedia or Baidupedia. In this paper, we propose a knowledge-based tree decoding model (K-BTD), whose inference procedure is a top-down decoding process from the root node to leaf nodes. The stepwise inference procedure enables the model to give support for the decision at each step, which visualizes the diagnosis procedure and adds to the interpretability of final predictions. Experiments on real-world data from the emergency department of a large-scale hospital indicate that the proposed model outperforms all baselines in both micro-F1 and macro-F1, and reduces the semantic distance dramatically.

NeurIPS Conference 2020 Conference Paper

Big Self-Supervised Models are Strong Semi-Supervised Learners

  • Ting Chen
  • Simon Kornblith
  • Kevin Swersky
  • Mohammad Norouzi
  • Geoffrey E. Hinton

One paradigm for learning from few labeled examples while making best use of a large amount of unlabeled data is unsupervised pretraining followed by supervised fine-tuning. Although this paradigm uses unlabeled data in a task-agnostic way, in contrast to common approaches to semi-supervised learning for computer vision, we show that it is surprisingly effective for semi-supervised learning on ImageNet. A key ingredient of our approach is the use of big (deep and wide) networks during pretraining and fine-tuning. We find that, the fewer the labels, the more this approach (task-agnostic use of unlabeled data) benefits from a bigger network. After fine-tuning, the big network can be further improved and distilled into a much smaller one with little loss in classification accuracy by using the unlabeled examples for a second time, but in a task-specific way. The proposed semi-supervised learning algorithm can be summarized in three steps: unsupervised pretraining of a big ResNet model using SimCLRv2, supervised fine-tuning on a few labeled examples, and distillation with unlabeled examples for refining and transferring the task-specific knowledge. This procedure achieves 73.9% ImageNet top-1 accuracy with just 1% of the labels ($\le$13 labeled images per class) using ResNet-50, a 10X improvement in label efficiency over the previous state-of-the-art. With 10% of labels, ResNet-50 trained with our method achieves 77.5% top-1 accuracy, outperforming standard supervised training with all of the labels.

NeurIPS Conference 2020 Conference Paper

Graph Contrastive Learning with Augmentations

  • Yuning You
  • Tianlong Chen
  • Yongduo Sui
  • Ting Chen
  • Zhangyang Wang
  • Yang Shen

Generalizable, transferable, and robust representation learning on graph-structured data remains a challenge for current graph neural networks (GNNs). Unlike what has been developed for convolutional neural networks (CNNs) for image data, self-supervised learning and pre-training are less explored for GNNs. In this paper, we propose a graph contrastive learning (GraphCL) framework for learning unsupervised representations of graph data. We first design four types of graph augmentations to incorporate various priors. We then systematically study the impact of various combinations of graph augmentations on multiple datasets, in four different settings: semi-supervised, unsupervised, and transfer learning as well as adversarial attacks. The results show that, even without tuning augmentation extents or using sophisticated GNN architectures, our GraphCL framework can produce graph representations of similar or better generalizability, transferability, and robustness compared to state-of-the-art methods. We also investigate the impact of parameterized graph augmentation extents and patterns, and observe further performance gains in preliminary experiments. Our code is available at https://github.com/Shen-Lab/GraphCL.

NeurIPS Conference 2020 Conference Paper

Robust Pre-Training by Adversarial Contrastive Learning

  • Ziyu Jiang
  • Tianlong Chen
  • Ting Chen
  • Zhangyang Wang

Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness. In this work, we improve robustness-aware self-supervised pre-training by learning representations that are consistent under both data augmentations and adversarial perturbations. Our approach leverages a recent contrastive learning framework, which learns representations by maximizing feature consistency under differently augmented views. This fits particularly well with the goal of adversarial robustness, as one cause of adversarial fragility is the lack of feature invariance, i.e., small input perturbations can result in undesirable large changes in features or even predicted labels. We explore various options to formulate the contrastive task, and demonstrate that by injecting adversarial perturbations, contrastive pre-training can lead to models that are both label-efficient and robust. We empirically evaluate the proposed Adversarial Contrastive Learning (ACL) and show it can consistently outperform existing methods. For example, on the CIFAR-10 dataset, ACL outperforms the previous state-of-the-art unsupervised robust pre-training approach by 2.99% on robust accuracy and 2.14% on standard accuracy. We further demonstrate that ACL pre-training can improve semi-supervised adversarial training, even when only a few labeled examples are available. Our code and pre-trained models have been released at: https://github.com/VITA-Group/Adversarial-Contrastive-Learning.

NeurIPS Conference 2020 Conference Paper

The Origins and Prevalence of Texture Bias in Convolutional Neural Networks

  • Katherine Hermann
  • Ting Chen
  • Simon Kornblith

Recent work has indicated that, unlike humans, ImageNet-trained CNNs tend to classify images by texture rather than by shape. How pervasive is this bias, and where does it come from? We find that, when trained on datasets of images with conflicting shape and texture, CNNs learn to classify by shape at least as easily as by texture. What factors, then, produce the texture bias in CNNs trained on ImageNet? Different unsupervised training objectives and different architectures have small but significant and largely independent effects on the level of texture bias. However, all objectives and architectures still lead to models that make texture-based classification decisions a majority of the time, even if shape information is decodable from their hidden representations. The effect of data augmentation is much larger. By taking less aggressive random crops at training time and applying simple, naturalistic augmentation (color distortion, noise, and blur), we train models that classify ambiguous images by shape a majority of the time, and outperform baselines on out-of-distribution test sets. Our results indicate that apparent differences in the way humans and ImageNet-trained CNNs process images may arise not primarily from differences in their internal workings, but from differences in the data that they see.

ICLR Conference 2020 Conference Paper

Understanding Why Neural Networks Generalize Well Through GSNR of Parameters

  • Jinlong Liu
  • Yunzhi Bai
  • Guoqing Jiang
  • Ting Chen
  • Huayan Wang

As deep neural networks (DNNs) achieve tremendous success across many application domains, researchers have tried to explain from many angles why they generalize well. In this paper, we provide a novel perspective on these issues using the gradient signal to noise ratio (GSNR) of parameters during the training process of DNNs. The GSNR of a parameter is simply defined as the ratio between its gradient's squared mean and variance, over the data distribution. Based on several approximations, we establish a quantitative relationship between model parameters' GSNR and the generalization gap. This relationship indicates that larger GSNR during the training process leads to better generalization performance. Further, we show that, different from that of shallow models (e.g. logistic regression, support vector machines), the gradient descent optimization dynamics of DNNs naturally produces large GSNR during training, which is probably the key to DNNs’ remarkable generalization ability.
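The GSNR definition in this abstract (squared mean of a parameter's gradient divided by its variance) can be illustrated with a minimal sketch. This is not the authors' code: it estimates the ratio from a finite batch of per-sample gradients rather than the full data distribution, the function names `gsnr` and `per_sample_grads_least_squares` are hypothetical, and the small `eps` term is an added assumption to avoid division by zero.

```python
import numpy as np


def gsnr(per_sample_grads, eps=1e-12):
    """Empirical GSNR per parameter.

    per_sample_grads: array of shape (n_samples, n_params), one gradient
    row per data sample. Returns an array of shape (n_params,).
    """
    g = np.asarray(per_sample_grads, dtype=float)
    mean = g.mean(axis=0)  # gradient mean over samples
    var = g.var(axis=0)    # gradient variance over samples
    # eps is an illustrative safeguard, not part of the definition.
    return mean ** 2 / (var + eps)


def per_sample_grads_least_squares(X, y, w):
    """Per-sample gradients of 0.5 * (x @ w - y)**2 with respect to w."""
    residual = X @ w - y          # shape (n_samples,)
    return residual[:, None] * X  # shape (n_samples, n_params)


if __name__ == "__main__":
    # Toy data, generated only for illustration.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=256)
    w = np.zeros(3)
    print(gsnr(per_sample_grads_least_squares(X, y, w)))
```

A parameter whose per-sample gradients mostly agree in sign and size gets a large value, while one whose gradients cancel out across samples gets a value near zero.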

AAAI Conference 2019 Conference Paper

Combo-Action: Training Agent For FPS Game with Auxiliary Tasks

  • Shiyu Huang
  • Hang Su
  • Jun Zhu
  • Ting Chen

Deep reinforcement learning (DRL) has surpassed human performance on Atari games, using raw pixels and rewards to learn everything. However, first-person-shooter (FPS) games in 3D environments contain higher levels of human concepts (enemy, weapon, spatial structure, etc.) and a large action space. In this paper, we explore a novel method that can plan on temporally-extended action sequences, which we refer to as Combo-Actions, to compress the action space. We further train a deep recurrent Q-learning network model as a high-level controller, called the supervisory network, to manage the Combo-Actions. Our method can be boosted with auxiliary tasks (enemy detection and depth prediction), which enable the agent to extract high-level concepts in the FPS games. Extensive experiments show that our method is efficient in the training process and outperforms previous state-of-the-art approaches by a large margin. Ablation study experiments also indicate that our method can boost the performance of the FPS agent in a reasonable way.

IJCAI Conference 2019 Conference Paper

Unsupervised Inductive Graph-Level Representation Learning via Graph-Graph Proximity

  • Yunsheng Bai
  • Hao Ding
  • Yang Qiao
  • Agustin Marinovic
  • Ken Gu
  • Ting Chen
  • Yizhou Sun
  • Wei Wang

We introduce a novel approach to graph-level representation learning, which is to embed an entire graph into a vector space where the embeddings of two graphs preserve their graph-graph proximity. Our approach, UGraphEmb, is a general framework that provides a novel means of performing graph-level embedding in a completely unsupervised and inductive manner. The learned neural network can be considered as a function that receives any graph as input, either seen or unseen in the training set, and transforms it into an embedding. A novel graph-level embedding generation mechanism called Multi-Scale Node Attention (MSNA) is proposed. Experiments on five real graph datasets show that UGraphEmb achieves competitive accuracy in the tasks of graph classification, similarity ranking, and graph visualization.

IJCAI Conference 2016 Conference Paper

Entity Embedding-Based Anomaly Detection for Heterogeneous Categorical Events

  • Ting Chen
  • Lu-An Tang
  • Yizhou Sun
  • Zhengzhang Chen
  • Kai Zhang

Anomaly detection plays an important role in modern data-driven security applications, such as detecting suspicious access to a socket from a process. In many cases, such events can be described as a collection of categorical values that are considered as entities of different types, which we call heterogeneous categorical events. Due to the lack of intrinsic distance measures among entities, and the exponentially large event space, most existing work relies heavily on heuristics to calculate abnormal scores for events. Different from previous work, we propose a principled and unified probabilistic model APE (Anomaly detection via Probabilistic pairwise interaction and Entity embedding) that directly models the likelihood of events. In this model, we embed entities into a common latent space using their observed co-occurrence in different events. More specifically, we first model the compatibility of each pair of entities according to their embeddings. Then we utilize the weighted pairwise interactions of different entity types to define the event probability. Using Noise-Contrastive Estimation with "context-dependent" noise distribution, our model can be learned efficiently regardless of the large event space. Experimental results on real enterprise surveillance data show that our methods can accurately detect abnormal events compared to other state-of-the-art abnormal detection techniques.

IJCAI Conference 2013 Conference Paper

Social Influence Locality for Modeling Retweeting Behaviors

  • Jing Zhang
  • Biao Liu
  • Jie Tang
  • Ting Chen
  • Juanzi Li

We study an interesting phenomenon of social influence locality in a large microblogging network, which suggests that users’ behaviors are mainly influenced by close friends in their ego networks. We provide a formal definition for the notion of social influence locality and develop two instantiation functions based on pairwise influence and structural diversity. The defined influence locality functions have strong predictive power. Without any additional features, we can obtain an F1-score of 71.65% for predicting users’ retweet behaviors by training a logistic regression classifier based on the defined functions. Our analysis also reveals several intriguing discoveries. For example, though the probability of a user retweeting a microblog is positively correlated with the number of friends who have retweeted the microblog, it is surprisingly negatively correlated with the number of connected circles that are formed by those friends.

YNIMG Journal 2012 Journal Article

Construction of a neuroanatomical shape complex atlas from 3D MRI brain structures

  • Ting Chen
  • Anand Rangarajan
  • Stephan J. Eisenschenk
  • Baba C. Vemuri

Brain atlas construction has attracted significant attention lately in the neuroimaging community due to its application to the characterization of neuroanatomical shape abnormalities associated with various neurodegenerative diseases or neuropsychiatric disorders. Existing shape atlas construction techniques usually focus on the analysis of a single anatomical structure in which the important inter-structural information is lost. This paper proposes a novel technique for constructing a neuroanatomical shape complex atlas based on an information geometry framework. A shape complex is a collection of neighboring shapes (for example, the thalamus, amygdala and the hippocampus circuit) which may exhibit changes in shape across multiple structures during the progression of a disease. In this paper, we represent the boundaries of the entire shape complex using the zero level set of a distance transform function $S(x)$. We then re-derive the relationship between the stationary state wave function $\psi(x)$ of the Schrödinger equation $-\hbar^2 \nabla^2 \psi + \psi = 0$ and the eikonal equation $\|\nabla S\| = 1$ satisfied by any distance function. This leads to a one-to-one map (up to scale) between $\psi(x)$ and $S(x)$ via an explicit relationship. We further exploit this relationship by mapping $\psi(x)$ to a unit hypersphere whose Riemannian structure is fully known, thus effectively turning $\psi(x)$ into the square-root of a probability density function. This allows us to make comparisons, using elegant, closed-form analytic expressions, between shape complexes represented as square-root densities. A shape complex atlas is constructed by computing the Karcher mean $\bar{\psi}(x)$ in the space of square-root densities and then inversely mapping it back to the space of distance transforms in order to realize the atlas shape. We demonstrate the shape complex atlas computation technique via a set of experiments on a population of brain MRI scans including controls and epilepsy patients with either right anterior medial temporal or left anterior medial temporal lobectomies.
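The link between the Schrödinger and eikonal equations can be sketched under one assumption that the abstract does not spell out: that the explicit relationship takes the exponential form $\psi = \exp(-S/\hbar)$. Substituting this assumed form into the Schrödinger equation gives:

```latex
\begin{align*}
\psi(x) &= \exp\!\left(-\frac{S(x)}{\hbar}\right)
  && \text{(assumed form)} \\
\nabla \psi &= -\frac{\nabla S}{\hbar}\,\psi \\
\nabla^2 \psi &= \left(\frac{\|\nabla S\|^2}{\hbar^2}
  - \frac{\nabla^2 S}{\hbar}\right)\psi \\
-\hbar^2 \nabla^2 \psi + \psi
  &= \left(1 - \|\nabla S\|^2 + \hbar\,\nabla^2 S\right)\psi = 0
\end{align*}
```

Because $\psi > 0$ under this form, the last line reduces to $\|\nabla S\|^2 - \hbar\,\nabla^2 S = 1$, which approaches the eikonal equation $\|\nabla S\| = 1$ as $\hbar \to 0$. Under the same assumption, $S = -\hbar \log \psi$ recovers the distance function from the wave function.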