Author name cluster

Ying Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

38 papers

2 author rows

EAAI Journal 2026 Journal Article

Clinician-informed offline reinforcement learning for vasopressor administration optimization in shock management

Feier Qiu
Ying Chen
Xiuxian Wang
Na Geng
Zhitao Yang

Details DOI

EAAI Journal 2026 Journal Article

Decarbonization responsive scheduling interactions in cogeneration units based sustainable city energy ecosystem considering carbon capture and thermal storage device with digital social welfare

Jingxuan Dong
Ying Chen

Details DOI

EAAI Journal 2026 Journal Article

Enhancing small object detection in low-altitude remote sensing via high-resolution feature extraction and multi-scale fusion

Xinyuan Le
Ying Chen
Wei Zeng
Xiang Ao
Huiling Chen
Jingyan Xie

Details DOI

AAAI Conference 2026 Conference Paper

GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and a Comprehensive Multimodal Dataset Towards General Medical AI

Tianbin Li
Yanzhou Su
Wei Li
Bin Fu
Zhe Chen
Ziyan Huang
Guoan Wang
Chenglong Ma

Despite significant advancements in general AI, its effectiveness in the medical domain is limited by the lack of specialized medical knowledge. To address this, we formulate GMAI-VL-5.5M, a multimodal medical dataset created by converting hundreds of specialized medical datasets with various annotations into high-quality image-text pairs. This dataset offers comprehensive task coverage, diverse modalities, and rich image-text data. Building upon this dataset, we develop GMAI-VL, a 7B-parameter general medical vision-language model, with a three-stage training strategy that enhances the integration of visual and textual information. This approach significantly improves the model's ability to process multimodal data, supporting accurate diagnoses and clinical decision-making. Experiments show that GMAI-VL achieves state-of-the-art performance across various multimodal medical tasks, including visual question answering and medical image diagnosis.

PDF Details DOI

TAAS Journal 2026 Journal Article

Joint Resource Allocation and Task Slicing for Mobile Multimedia Computing in Edge-based Autonomous Systems

Jiwei Huang
Yajing Leng
Jiarong Bao
Songyuan Li
Ying Chen

Mobile multimedia applications such as real-time video processing, augmented reality, and mobile gaming have raised high requirements for low latency and high efficiency. Edge-based autonomous systems have become a key technology for processing these application tasks. This article focuses on joint resource allocation and task slicing for mobile multimedia computing in edge-based autonomous systems. We propose an efficient resource allocation and task slicing strategy, aiming at the optimization of the overall utility of both edge servers and mobile devices simultaneously. We transform the resource allocation problem into resource pricing and purchasing behaviors. We present a Stackelberg game model and prove theorems for the existence of equilibrium and optimality. Based on the theorems, we design an algorithm namely G-RPTSS for resource purchasing and computation task slicing. Then, we employ Deep Reinforcement Learning (DRL) techniques in resource pricing and propose the DRL-ESRP algorithm which is capable of adaptively responding to dynamic computational scenarios in edge-based autonomous systems. Our scheme leverages the DRL technique for autonomous learning and policy adjustment. Simulation experiments, based on real-world scenario data, demonstrate the superior of our approach in learning efficiency and performance advantages to existing both non-DRL and other DRL algorithms.

Details DOI

JBHI Journal 2026 Journal Article

Point-Supervised Coronary Semantic Segmentation in X-Ray Angiographic Images

Ying Chen
Danni Ai
Jianyu Du
Yuanyuan Wang
Tianyu Fu
Deqiang Xiao
Yucong Lin
Long Shao

Coronary semantic segmentation in X-ray angiography is essential for computer-aided diagnosis and treatment planning of coronary artery disease (CAD). Despite its importance, this task remains highly challenging due to the complex and interconnected vascular topology, as well as the similar visual characteristics among different branches, making dense pixel-level manual annotation difficult and labor-intensive. To alleviate this burden, we propose a point-supervised coronary semantic segmentation framework that significantly reduces annotation effort without compromising segmentation accuracy. The primary challenge of point label based supervision lies in the model's tendency to overfit sparse point labels, leading to limited generalization to pixel-level predictions. To enrich the supervision signals and stabilize the training process with the sparse point labels, we propose an adaptive foreground mask generation module and a region regularization strategy to ensure accurate semantic guidance while maximizing meaningful coverage of the vascular structures. To enhance coronary topology perception and branch differentiation, we propose a multi-task learning framework that jointly performs keypoint detection and coronary semantic segmentation through a shared feature extraction encoder and two task-specific decoders. The experimental results demonstrate that our point-supervised model achieves performance comparable to fully supervised model, and outperforms the existing state-of-the-art point-supervised semantic segmentation methods.

Details DOI

AAAI Conference 2026 Conference Paper

S2-UniSeg: Fast Universal Agglomerative Pooling for Scalable Segment Anything Without Supervision

Huihui Xu
Jin Ye
Hongqiu Wang
Changkai Ji
Jiashi Lin
Ming Hu
Ziyan Huang
Ying Chen

Recent self-supervised image segmentation models have achieved promising performance on semantic segmentation and class-agnostic instance segmentation. However, their pretraining schedule is multi-stage, requiring a time-consuming pseudo-masks generation process between each training epoch. This time-consuming offline process not only makes it difficult to scale with training dataset size, but also leads to sub-optimal solutions due to its discontinuous optimization routine. To solve these, we first present a novel pseudo-mask algorithm, Fast Universal Agglomerative Pooling (UniAP). Each layer of UniAP can identify groups of similar nodes in parallel, allowing to generate both semantic-level and instance-level and multi-granular pseudo-masks within ens of milliseconds for one image. Based on the fast UniAP, we propose the Scalable Self-Supervised Universal Segmentation (S2-UniSeg), which employs a student and a momentum teacher for continuous pretraining. A novel segmentation-oriented pretext task, Query-wise Self-Distillation (QuerySD), is proposed to pretrain S2-UniSeg to learn the local-to-global correspondences. Under the same setting, S2-UniSeg outperforms the SOTA UnSAM model, achieving notable improvements of AP+6.9 on COCO, AR+11.1 on UVO, PixelAcc+4.5 on COCOStuff-27, RQ+8.0 on Cityscapes. After scaling up to a larger 2M-image subset of SA-1B, S2-UniSeg further achieves performance gains on all four benchmarks.

PDF Details DOI

EAAI Journal 2026 Journal Article

Self-supervised multi-level trajectory representation model for field-road trajectory segmentation

Xiaoqiang Zhang
Qianrun Wei
Bingbing Hu
Liwei Pan
Caicong Wu
Claus Aage Grøn Sørensen
Ying Chen
Kun Zhou

Details DOI

AAAI Conference 2026 Conference Paper

Trainable EEG Interpolation and Structure-Sharing Dual-Path Encoders for Brain-Assisted Target Speaker Extraction

Zhao Lv
Haoran Zhou
Ying Chen
Youdian Gao
Xinhui Li
Ruibo Fu
Cunhang Fan

Brain-assisted target speaker extraction (TSE) isolates a target speaker's voice from a mixture by leveraging task-specific representations in Electroencephalogram (EEG) signals. However, existing methods rely on fixed interpolation for EEG-audio alignment, introducing redundant computations. They also employ single-path encoders that extract only target-relevant features while neglecting complementary, irrelevant ones, limiting discriminability. To address these limitations, this paper proposes a Trainable EEG Interpolation and Structure-sharing Dual-path Encoders network (TIDENet). The proposed Trainable EEG Interpolation (TEI) uses a neural network module to leverage cross-sample EEG information during resampling by parameters updating, thereby overcoming the limitations of fixed interpolation. The Structure-sharing Dual-path Encoders (SSDPE) extend existing speech and EEG encoders by introducing dual paths that separately process features relevant and irrelevant to the target speaker and incorporates interactive fusion between them, which enhances the encoder's ability to capture task-relevant information. Experimental results on public datasets demonstrate that TIDENet achieves relative improvements of up to 20.47%, 22.22%, 2.91%, 6.20%, and 15.84% in signal-to-distortion ratio (SDR), scale-invariant SDR (SI-SDR), short-time objective intelligibility (STOI), extended STOI (ESTOI), and perceptual evaluation of speech quality (PESQ), respectively, compared to the state-of-the-art. These significant gains validate the effectiveness of the proposed TEI method and SSDPE architecture.

PDF Details DOI

EAAI Journal 2026 Journal Article

Vibration-induced deformation prediction and multi-staged parameter optimization of coarse-grained soils: Trade-off between energy and time

Ying Chen
Qun Qi
Zhihong Nie

Details DOI

EAAI Journal 2025 Journal Article

A novel convolutional neural network with global perception for bearing fault diagnosis

Xianguo Li
Ying Chen
Yi Liu

Details DOI

JBHI Journal 2025 Journal Article

BioMTAN: A Biological Knowledge-guided Multi-task Attention Network for Co-enhanced Cancer Diagnosis and Prognosis

Ying Chen
Jiajing Xie
Yuxiang Lin
Yuhang Song
Wenxian Yang
Rongshan Yu

With the advancement of precision medicine, gene expression data have become a crucial tool in both cancer diagnosis and prognosis for different cancer types. The incorporation of biological pathways as prior knowledge has gained increasing interest in tackling the difficulties of high dimensionality and noisy information within gene expression data. However, most existing approaches guided by biological pathways ignore the intrinsic link between diagnostic and prognostic tasks in cancer research. They fail to capitalize on the potential of leveraging shared biological information from both tasks to enhance gene pathway representations. To this end, we introduce the Biological Knowledge-guided Multi-task Attention Network (BioMTAN), a novel multi-task learning framework designed for simultaneous prediction of molecular subtypes and survival risk. Specifically, we compile tailored knowledge collections that comprise multiple pathways for the two tasks, model them as unique subgraphs and use a multi-level information fusion strategy to provide a wealth of biological insights. Moreover, we develop a Multi-task Attention Module, which extracts essential global information functioning as the key and value by interacting with biological pathways from different collections, and utilizes task-specific local information as the query, efficiently decoding task-awareness feature for each task and facilitating communication across tasks within cancer diagnosis and prognosis. Extensive validation on the public The Cancer Genome Atlas (TCGA) datasets confirms the enhanced performance of BioMTAN and highlights the significant pathways in each task, underscoring its potential as an instrumental asset in precision oncology.

Details DOI

YNIMG Journal 2025 Journal Article

Disrupted structural connectivity-gray matter covariance coupling and associated cytoarchitectural and transcriptomic profiles in attention-deficit/hyperactivity disorder

Yajing Long
Nanfang Pan
Song Wang
Kun Qin
Qiuxing Chen
Clara S. Vetter
Manpreet K. Singh
Alex Fornito

Details DOI

EAAI Journal 2025 Journal Article

Fully logits guided distillation with intermediate decision learning for deep model compression

Yiqin Wang
Ying Chen

Details DOI

AAAI Conference 2025 Conference Paper

Imitate Before Detect: Aligning Machine Stylistic Preference for Machine-Revised Text Detection

Jiaqi Chen
Xiaoye Zhu
Tianyang Liu
Ying Chen
Chen Xinhui
Yiwen Yuan
Chak Tou Leong
Zuchao Li

Large Language Models (LLMs) have revolutionized text generation, making detecting machine-generated text increasingly challenging. Although past methods have achieved good performance on detecting pure machine-generated text, those detectors have poor performance on distinguishing machine-revised text (rewriting, expansion, and polishing), which can have only minor changes from its original human prompt. As the content of text may originate from human prompts, detecting machine-revised text often involves identifying distinctive machine styles, e.g., worded favored by LLMs. However, existing methods struggle to detect machine-style phrasing hidden within the content contributed by humans. We propose the “Imitate Before Detect” (ImBD) approach, which first imitates the machine-style token distribution, and then compares the distribution of the text to be tested with the machine-style distribution to determine whether the text has been machine-revised. To this end, we introduce Style Preference Optimization (SPO), which aligns a scoring LLM model to the preference of text styles generated by machines. The aligned scoring model is then used to calculate the style-conditional probability curvature (Style-CPC), quantifying the log probability difference between the original and conditionally sampled texts for effective detection. We conduct extensive comparisons across various scenarios, encompassing text revisions by six LLMs, four distinct text domains, and three machine revision types. Compared to existing state-of-the-art methods, our method yields a 13% increase in AUC for detecting text revised by open-source LLMs, and improves performance by 5% and 19% for detecting GPT-3.5 and GPT-4o revised text, respectively. Notably, our method surpasses the commercially trained GPT-Zero with just 1,000 samples and five minutes of SPO, demonstrating its efficiency and effectiveness.

PDF Details DOI

IJCAI Conference 2025 Conference Paper

ListenNet: A Lightweight Spatio-Temporal Enhancement Nested Network for Auditory Attention Detection

Cunhang Fan
Xiaoke Yang
Hongyu Zhang
Ying Chen
Lu Li
Jian Zhou
Zhao Lv

Auditory attention detection (AAD) aims to identify the direction of the attended speaker in multi-speaker environments from brain signals, such as Electroencephalography (EEG) signals. However, existing EEG-based AAD methods overlook the spatio-temporal dependencies of EEG signals, limiting their decoding and generalization abilities. To address these issues, this paper proposes a Lightweight Spatio-Temporal Enhancement Nested Network (ListenNet) for AAD. The ListenNet has three key components: Spatio-temporal Dependency Encoder (STDE), Multi-scale Temporal Enhancement (MSTE), and Cross-Nested Attention (CNA). The STDE reconstructs dependencies between consecutive time windows across channels, improving the robustness of dynamic pattern extraction. The MSTE captures temporal features at multiple scales to represent both fine-grained and long-range temporal patterns. In addition, the CNA integrates hierarchical features more effectively through novel dynamic attention mechanisms to capture deep spatio-temporal correlations. Experimental results on three public datasets demonstrate the superiority of ListenNet over state-of-the-art methods in both subject-dependent and challenging subject-independent settings, while reducing the trainable parameter count by approximately 7 times. Code is available at: https: //github. com/fchest/ListenNet.

PDF Details DOI

IJCAI Conference 2025 Conference Paper

M3ANet: Multi-scale and Multi-Modal Alignment Network for Brain-Assisted Target Speaker Extraction

Cunhang Fan
Ying Chen
Jian Zhou
Zexu Pan
Jingjing Zhang
Youdian Gao
Xiaoke Yang
Zhengqi Wen

The brain-assisted target speaker extraction (TSE) aims to extract the attended speech from mixed speech by utilizing the brain neural activities, for example Electroencephalography (EEG). However, existing models overlook the issue of temporal misalignment between speech and EEG modalities, which hampers TSE performance. In addition, the speech encoder in current models typically uses basic temporal operations (e. g. , one-dimensional convolution), which are unable to effectively extract target speaker information. To address these issues, this paper proposes a multi-scale and multi-modal alignment network (M3ANet) for brain-assisted TSE. Specifically, to eliminate the temporal inconsistency between EEG and speech modalities, the modal alignment module that uses a contrastive learning strategy is applied to align the temporal features of both modalities. Additionally, to fully extract speech information, multi-scale convolutions with GroupMamba modules are used as the speech encoder, which scans speech features at each scale from different directions, enabling the model to capture deep sequence information. Experimental results on three publicly available datasets show that the proposed model outperforms current state-of-the-art methods across various evaluation metrics, highlighting the effectiveness of our proposed method. The source code is available at: https: //github. com/fchest/M3ANet.

PDF Details DOI

JBHI Journal 2025 Journal Article

SleepHybridNet: A Lightweight Hybrid CNN-Transformer Model for Enhanced N1 Sleep Staging From Single-Channel EEG

Hao Zhou
Mengxiang Su
Jeng-Shyang Pan
Chenglong Dai
Ying Chen
Shu-Chuan Chu

This study introduces SleepHybridNet, a lightweight hybrid CNN-Transformer model designed to enhance the classification of non-rapid eye movement stage 1 (N1) sleep using single-channel electroencephalogram (EEG) signals. Accurate identification of the N1 stage is of critical importance in both sleep neuroscience and clinical practice. However, due to the ambiguous features during N1 stage, current deep learning models still struggle to achieve satisfactory performance. To address these challenges, SleepHybridNet integrates multi-scale feature fusion and sequence modeling through a novel architecture. It consists of a Multi-Scale Convolutional Neural Network (MSCNN) module, a Transformer encoder, a spectral feature extraction unit, and a multi-task classifier. Experimental results based on the publicly available Sleep-EDF Expanded dataset demonstrate that SleepHybridNet outperforms existing methods in both classification accuracy and generalization capability. Specifically, the model achieves an overall accuracy of 88. 2% and an F1-score of 0. 633 for the N1 stage, showing superior performance particularly in underrepresented classes such as N1 and N3 stages. With only 5. 1 M parameters, the lightweight design of the model can enable practical deployment in clinical settings, bridging the gap between high-performance deep learning algorithms and practical applicability in sleep medicine. Future work may explore the integration of multimodal data from wearable sensors to further expand its use in diverse application scenarios.

Details DOI

ICRA Conference 2024 Conference Paper

3D Object Detection with VI-SLAM Point Clouds: The Impact of Object and Environment Characteristics on Model Performance

Lin Duan
Tim Scargill
Ying Chen
Maria Gorlatova

3D object detection (OD) is a crucial element in scene understanding. However, most existing 3D OD models have been tailored to work with light detection and ranging (LiDAR) and RGB-D point cloud data, leaving their performance on commonly available visual-inertial simultaneous localization and mapping (VI-SLAM) point clouds unexamined. In this paper, we create and release two datasets: VIP500, 4772 VI-SLAM point clouds covering 500 different object and environment configurations, and VIP500-D, an accompanying set of 20 RGB-D point clouds for the object classes and shapes in VIP500. We then use these datasets to quantify the differences between VI-SLAM point clouds and dense RGB-D point clouds, as well as the discrepancies between VI-SLAM point clouds generated with different object and environment characteristics. Finally, we evaluate the performance of three leading OD models on the diverse data in our VIP500 dataset, revealing the promise of OD models trained on VI-SLAM data; we examine the extent to which both object and environment characteristics impact performance, along with the underlying causes.

Details

AAAI Conference 2024 Conference Paper

Unsupervised Continual Anomaly Detection with Contrastively-Learned Prompt

Jiaqi Liu
Kai Wu
Qiang Nie
Ying Chen
Bin-Bin Gao
Yong Liu
Jinbao Wang
Chengjie Wang

Unsupervised Anomaly Detection (UAD) with incremental training is crucial in industrial manufacturing, as unpredictable defects make obtaining sufficient labeled data infeasible. However, continual learning methods primarily rely on supervised annotations, while the application in UAD is limited due to the absence of supervision. Current UAD methods train separate models for different classes sequentially, leading to catastrophic forgetting and a heavy computational burden. To address this issue, we introduce a novel Unsupervised Continual Anomaly Detection framework called UCAD, which equips the UAD with continual learning capability through contrastively-learned prompts. In the proposed UCAD, we design a Continual Prompting Module (CPM) by utilizing a concise key-prompt-knowledge memory bank to guide task-invariant 'anomaly' model predictions using task-specific 'normal' knowledge. Moreover, Structure-based Contrastive Learning (SCL) is designed with the Segment Anything Model (SAM) to improve prompt learning and anomaly segmentation results. Specifically, by treating SAM's masks as structure, we draw features within the same mask closer and push others apart for general feature representations. We conduct comprehensive experiments and set the benchmark on unsupervised continual anomaly detection and segmentation, demonstrating that our method is significantly better than anomaly detection methods, even with rehearsal training. The code will be available at https://github.com/shirowalker/UCAD.

PDF Details DOI

TIST Journal 2023 Journal Article

Attention-guided Adversarial Attack for Video Object Segmentation

Rui Yao
Ying Chen
Yong Zhou
Fuyuan Hu
Jiaqi Zhao
Bing Liu
Zhiwen Shao

Video Object Segmentation (VOS) methods have made many breakthroughs with the help of the continuous development and advancement of deep learning. However, the deep learning model is vulnerable to malicious adversarial attacks, which mislead the model to make wrong decisions by adding adversarial perturbation that humans cannot perceive to the input image. Threats to deep learning models remind us that video object segmentation methods are also vulnerable to attacks, thereby threatening their security. Therefore, we study adversarial attacks on the VOS task to better identify the vulnerabilities of the VOS method, which in turn provides an opportunity to improve its robustness. In this paper, we propose an attention-guided adversarial attack method, which uses spatial attention blocks to capture features with global dependencies to construct correlations between consecutive video frames, and performs multipath aggregation to effectively integrate spatial-temporal perturbation, thereby guiding the deconvolution network to generate adversarial examples with strong attack capability. Specifically, the class loss function is designed to enable the deconvolution network to better activate noise in other regions and suppress the activation related to the object class based on the enhanced feature map of the object class. At the same time, attentional feature loss is designed to enhance the transferability against attack. The experimental results on the DAVIS dataset show that the proposed attention-guided adversarial attack method can significantly reduce the segmentation accuracy of OSVOS, and the J & F mean on DAVIS 2016 can reach 73.6% drop rate. The generated adversarial examples are also highly transferable to other video object segmentation models.

Details DOI

AAAI Conference 2023 Conference Paper

Copyright-Certified Distillation Dataset: Distilling One Million Coins into One Bitcoin with Your Private Key

Tengjun Liu
Ying Chen
Wanxuan Gu

The rapid development of neural network dataset distillation in recent years has provided new ideas in many areas such as continuous learning, neural network architecture search and privacy preservation. Dataset distillation is a very effective method to distill large training datasets into small data, thus ensuring that the test accuracy of models trained on their synthesized small datasets matches that of models trained on the full dataset. Thus, dataset distillation itself is commercially valuable, not only for reducing training costs, but also for compressing storage costs and significantly reducing the training costs of deep learning. However, copyright protection for dataset distillation has not been proposed yet, so we propose the first method to protect intellectual property by embedding watermarks in the dataset distillation process. Our approach not only popularizes the dataset distillation technique, but also authenticates the ownership of the distilled dataset by the models trained on that distilled dataset.

PDF Details DOI

NeurIPS Conference 2023 Conference Paper

Geometric Analysis of Matrix Sensing over Graphs

Haixiang Zhang
Ying Chen
Javad Lavaei

In this work, we consider the problem of matrix sensing over graphs (MSoG). As a general case of matrix completion and matrix sensing problems, the MSoG problem has not been analyzed in the literature and the existing results cannot be directly applied to the MSoG problem. This work provides the first theoretical results on the optimization landscape of the MSoG problem. More specifically, we propose a new condition, named the $\Omega$-RIP condition, to characterize the optimization complexity of the problem. In addition, with an improved regularizer of the incoherence, we prove that the strict saddle property holds for the MSoG problem with high probability under the incoherence condition and the $\Omega$-RIP condition, which guarantees the polynomial-time global convergence of saddle-avoiding methods. Compared with state-of-the-art results, the bounds in this work are tight up to a constant. Besides the theoretical guarantees, we numerically illustrate the close relation between the $\Omega$-RIP condition and the optimization complexity.

PDF Details

ICLR Conference 2023 Conference Paper

TextShield: Beyond Successfully Detecting Adversarial Sentences in text classification

Lingfeng Shen
Ze Zhang
Haiyun Jiang
Ying Chen

Adversarial attack serves as a major challenge for neural network models in NLP, which precludes the model's deployment in safety-critical applications. A recent line of work, detection-based defense, aims to distinguish adversarial sentences from benign ones. However, {the core limitation of previous detection methods is being incapable of giving correct predictions on adversarial sentences unlike defense methods from other paradigms.} To solve this issue, this paper proposes TextShield: (1) we discover a link between text attack and saliency information, and then we propose a saliency-based detector, which can effectively detect whether an input sentence is adversarial or not. (2) We design a saliency-based corrector, which converts the detected adversary sentences to benign ones. By combining the saliency-based detector and corrector, TextShield extends the detection-only paradigm to a detection-correction paradigm, thus filling the gap in the existing detection-based defense. Comprehensive experiments show that (a) TextShield consistently achieves higher or comparable performance than state-of-the-art defense methods across various attacks on different benchmarks. (b) our saliency-based detector outperforms existing detectors for detecting adversarial sentences.

Details

YNICL Journal 2022 Journal Article

Changes in brain connectivity linked to multisensory processing of pain modulation in migraine with acupuncture treatment

Lu Liu
Tian-Li Lyu
Ming-Yang Fu
Lin-Peng Wang
Ying Chen
Jia-Hui Hong
Qiu-Yi Chen
Yu-Pu Zhu

Details DOI

AAAI Conference 2022 Conference Paper

Guide Local Feature Matching by Overlap Estimation

Ying Chen
Dihe Huang
Shang Xu
Jianlin Liu
Yong Liu

Local image feature matching under large appearance, viewpoint, and distance changes is challenging yet important. Conventional methods detect and match tentative local features across the whole images, with heuristic consistency checks to guarantee reliable matches. In this paper, we introduce a novel Overlap Estimation method conditioned on image pairs with TRansformer, named OETR, to constrain local feature matching in the commonly visible region. OETR performs overlap estimation in a two step process of feature correlation and then overlap regression. As a preprocessing module, OETR can be plugged into any existing local feature detection and matching pipeline, to mitigate potential view angle or scale variance. Intensive experiments show that OETR can boost state of the art local feature matching performance substantially, especially for image pairs with small shared regions. The code will be publicly available at https: //github. com/AbyssGaze/OETR.

PDF Details

AAAI Conference 2022 Conference Paper

KATG: Keyword-Bias-Aware Adversarial Text Generation for Text Classification

Lingfeng Shen
Shoushan Li
Ying Chen

Recent work has shown that current text classification models are vulnerable to a small adversarial perturbation on inputs, and adversarial training that re-trains the models with the support of adversarial examples is the most popular way to alleviate the impact of the perturbation. However, current adversarial training methods have two principal problems: a drop in model’s generalization and ineffective defending against other text attacks. In this paper, we propose a Keywordbias-aware Adversarial Text Generation model (KATG) that implicitly generates adversarial sentences using a generatordiscriminator structure. Instead of using a benign sentence to generate an adversarial sentence, the KATG model utilizes extra multiple benign sentences (namely prior sentences) to guide adversarial sentence generation. Furthermore, to cover more perturbations used in existing attacks, a keyword-biasbased sampling is proposed to select sentences containing biased words as prior sentences. Besides, to effectively utilize prior sentences, a generative flow mechanism is proposed to construct a latent semantic space for learning a latent representation of the prior sentences. Experiments demonstrate that adversarial sentences generated by our KATG model can strengthen the generalization and the robustness of text classification models. Benign Sentence Sixthreezero is good, I’ve used it for a long time, only changed because I got tired of the same old bike. (Pos) Prior Sentences S1: Blackberry may work on the systems, but I’m not willing to take that chance on a new expensive phone. (Neg) S2: Iphone4s is in ok previously used condition as stated. But I was disappointed I couldn’t activate the phone upon arrival. (Neg) Adv. Sentence Amazing Iphone4s, used it for so long, only changed because I got tired of the old expensive Blackberry. (Pos) Table 1: Benign sentence, prior sentences and adversarial sentence used in our KATG model. *the corresponding author Copyright © 2022, Association for the Advancement of Artificial Intelligence (www. aaai. org). All rights reserved.

PDF Details

IJCAI Conference 2021 Conference Paper

KDExplainer: A Task-oriented Attention Model for Explaining Knowledge Distillation

Mengqi Xue
Jie Song
Xinchao Wang
Ying Chen
Xingen Wang
Mingli Song

Knowledge distillation (KD) has recently emerged as an efficacious scheme for learning compact deep neural networks (DNNs). Despite the promising results achieved, the rationale that interprets the behavior of KD has yet remained largely understudied. In this paper, we introduce a novel task-oriented attention model, termed as KDExplainer, to shed light on the working mechanism underlying the vanilla KD. At the heart of KDExplainer is a Hierarchical Mixture of Experts (HME), in which a multi-class classification is reformulated as a multi-task binary one. Through distilling knowledge from a free-form pre-trained DNN to KDExplainer, we observe that KD implicitly modulates the knowledge conflicts between different subtasks, and in reality has much more to offer than label smoothing. Based on such findings, we further introduce a portable tool, dubbed as virtual attention module (VAM), that can be seamlessly integrated with various DNNs to enhance their performance under KD. Experimental results demonstrate that with a negligible additional cost, student models equipped with VAM consistently outperform their non-VAM counterparts across different benchmarks. Furthermore, when combined with other KD methods, VAM remains competent in promoting results, even though it is only motivated by vanilla KD. The code is available at https: // github. com/zju-vipa/KDExplainer.

PDF Details DOI

AAAI Conference 2021 Conference Paper

MANGO: A Mask Attention Guided One-Stage Scene Text Spotter

Liang Qiao
Ying Chen
Zhanzhan Cheng
Yunlu Xu
Yi Niu
Shiliang Pu
Fei Wu

Recently end-to-end scene text spotting has become a popular research topic due to its advantages of global optimization and high maintainability in real applications. Most methods attempt to develop various region of interest (RoI) operations to concatenate the detection part and the sequence recognition part into a two-stage text spotting framework. However, in such framework, the recognition part is highly sensitive to the detected results (e. g. , the compactness of text contours). To address this problem, in this paper, we propose a novel Mask AttentioN Guided One-stage text spotting framework named MANGO, in which character sequences can be directly recognized without RoI operation. Concretely, a positionaware mask attention module is developed to generate attention weights on each text instance and its characters. It allows different text instances in an image to be allocated on different feature map channels which are further grouped as a batch of instance features. Finally, a lightweight sequence decoder is applied to generate the character sequences. It is worth noting that MANGO inherently adapts to arbitraryshaped text spotting and can be trained end-to-end with only coarse position information (e. g. , rectangular bounding box) and text annotations. Experimental results show that the proposed method achieves competitive and even new state-ofthe-art performance on both regular and irregular text spotting benchmarks, i. e. , ICDAR 2013, ICDAR 2015, Total-Text, and SCUT-CTW1500.

PDF Details

NeurIPS Conference 2021 Conference Paper

Refining Language Models with Compositional Explanations

Huihan Yao
Ying Chen
Qinyuan Ye
Xisen Jin
Xiang Ren

Pre-trained language models have been successful on text classification tasks, but are prone to learning spurious correlations from biased datasets, and are thus vulnerable when making inferences in a new domain. Prior work reveals such spurious patterns via post-hoc explanation algorithms which compute the importance of input features. Further, the model is regularized to align the importance scores with human knowledge, so that the unintended model behaviors are eliminated. However, such a regularization technique lacks flexibility and coverage, since only importance scores towards a pre-defined list of features are adjusted, while more complex human knowledge such as feature interaction and pattern generalization can hardly be incorporated. In this work, we propose to refine a learned language model for a target domain by collecting human-provided compositional explanations regarding observed biases. By parsing these explanations into executable logic rules, the human-specified refinement advice from a small set of explanations can be generalized to more training examples. We additionally introduce a regularization term allowing adjustments for both importance and interaction of features to better rectify model behavior. We demonstrate the effectiveness of the proposed approach on two text classification tasks by showing improved performance in target domain as well as improved model fairness after refinement.

PDF Details

AAAI Conference 2021 Conference Paper

Temporal-Coded Deep Spiking Neural Network with Easy Training and Robust Performance

Shibo Zhou
Xiaohua Li
Ying Chen
Sanjeev T. Chandrasekaran
Arindam Sanyal

Spiking neural network (SNN) is promising but the development has fallen far behind conventional deep neural networks (DNNs) because of difficult training. To resolve the training problem, we analyze the closed-form input-output response of spiking neurons and use the response expression to build abstract SNN models for training. This avoids calculating membrane potential during training and makes the direct training of SNN as efficient as DNN. We show that the nonleaky integrate-and-fire neuron with single-spike temporalcoding is the best choice for direct-train deep SNNs. We develop an energy-efficient phase-domain signal processing circuit for the neuron and propose a direct-train deep SNN framework. Thanks to easy training, we train deep SNNs under weight quantizations to study their robustness over low-cost neuromorphic hardware. Experiments show that our direct-train deep SNNs have the highest CIFAR-10 classification accuracy among SNNs, achieve ImageNet classification accuracy within 1% of the DNN of equivalent architecture, and are robust to weight quantization and noise perturbation.

PDF Details

AIIM Journal 2020 Journal Article

ADHD classification by dual subspace learning using resting-state functional connectivity

Ying Chen
Yibin Tang
Chun Wang
Xiaofeng Liu
Li Zhao
Zhishun Wang

Details DOI

JAIR Journal 2019 Journal Article

DSTL: Solution to Limitation of Small Corpus in Speech Emotion Recognition

Ying Chen
Zhongzhe Xiao
Xiaojun Zhang
Zhi Tao

Traditional machine learning methods share a common hypothesis: training and testing datasets must be in a common feature space with the same distribution. However, in reality, the labeled target data may be rare, so that target space does not share the same feature space or distribution as an available training set (source domain). To address the mismatch of domains, we propose a Dual-Subspace Transfer Learning (DSTL) framework that considers both the common and specific information of the two domains. In DSTL, a latent common subspace is first learned to preserve the data properties and reduce the discrepancy of domains. Then, we propose a mapping strategy to transfer the sourcespecific information to the target subspace. The integration of the domain-common and specific information constructs the proposed DSTL framework. In comparison to the stateart-of works, the main contribution of our work is that the DSTL framework not only considers the commonalities, but also exploits the specific information. Experiments on three emotional speech corpora verify the effectiveness of our approach. The results show that the methods which include both domain-common and specific information perform better than the baseline methods which only exploit the domain commonalities.

PDF Details DOI

AAAI Conference 2019 Conference Paper

M2Det: A Single-Shot Object Detector Based on Multi-Level Feature Pyramid Network

Qijie Zhao
Tao Sheng
Yongtao Wang
Zhi Tang
Ying Chen
Ling Cai
Haibin Ling

Feature pyramids are widely exploited by both the state-ofthe-art one-stage object detectors (e. g. , DSSD, RetinaNet, RefineDet) and the two-stage object detectors (e. g. , Mask R- CNN, DetNet) to alleviate the problem arising from scale variation across object instances. Although these object detectors with feature pyramids achieve encouraging results, they have some limitations due to that they only simply construct the feature pyramid according to the inherent multiscale, pyramidal architecture of the backbones which are originally designed for object classification task. Newly, in this work, we present Multi-Level Feature Pyramid Network (MLFPN) to construct more effective feature pyramids for detecting objects of different scales. First, we fuse multi-level features (i. e. multiple layers) extracted by backbone as the base feature. Second, we feed the base feature into a block of alternating joint Thinned U-shape Modules and Feature Fusion Modules and exploit the decoder layers of each Ushape module as the features for detecting objects. Finally, we gather up the decoder layers with equivalent scales (sizes) to construct a feature pyramid for object detection, in which every feature map consists of the layers (features) from multiple levels. To evaluate the effectiveness of the proposed MLFPN, we design and train a powerful end-to-end one-stage object detector we call M2Det by integrating it into the architecture of SSD, and achieve better detection performance than state-of-the-art one-stage detectors. Specifically, on MS- COCO benchmark, M2Det achieves AP of 41. 0 at speed of 11. 8 FPS with single-scale inference strategy and AP of 44. 2 with multi-scale inference strategy, which are the new stateof-the-art results among one-stage detectors. The code will be made available on https: //github. com/qijiezhao/M2Det.

PDF Details

YNICL Journal 2018 Journal Article

Disrupted grey matter network morphology in pediatric posttraumatic stress disorder

Running Niu
Du Lei
Fuqin Chen
Ying Chen
Xueling Suo
Lingjiang Li
Su Lui
Xiaoqi Huang

Details DOI

YNICL Journal 2018 Journal Article

Volume alteration of hippocampal subfields in first-episode antipsychotic-naïve schizophrenia patients before and after acute antipsychotic treatment

Wenbin Li
Kaiming Li
Pujun Guan
Ying Chen
Yuan Xiao
Su Lui
John A. Sweeney
Qiyong Gong

Details DOI

IS Journal 2015 Journal Article

A Study on a Cabled Seafloor Observatory

Fengzhong Qu
Zhenduo Wang
Hong Song
Ying Chen
Liuqing Yang

This article discusses the development of Zhejiang University's Zhairuoshan Island Experimental Research Observatory (ZERO). The authors discuss its background, network structure, components, sea trials, and future plans. The authors predict that ZERO will have an important influence as a collaborative center for scientists, engineers, and the public.

Details DOI

AAAI Conference 2011 Conference Paper

Cross Media Entity Extraction and Linkage for Chemical Documents

Su Yan
Scott Spangler
Ying Chen

Text and images are two major sources of information in scientiﬁc literature. Information from these two media typically reinforce and complement each other, thus simplifying the process for human to extract and comprehend information. However, machines cannot create the links or have the semantic understanding between images and text. We propose to integrate text analysis and image processing techniques to bridge the gap between the two media, and discover knowledge from the combined information sources, which would be otherwise lost by traditional single-media based mining systems. The focus is on the chemical entity extraction task because images are well known to add value to the textual content in chemical literature. Annotation of US chemical patent documents demonstrates the effectiveness of our proposal.

PDF Details