Arrow Research search

Author name cluster

Yan Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

37 papers
2 author rows

Possible papers

37

JBHI Journal 2026 Journal Article

Radar HRV Monitoring With Physiological Prior Inspired Deep Neural Networks

  • Haoyu Wang
  • Jinbo Chen
  • Dongheng Zhang
  • Zhi Lu
  • Yang Hu
  • Qibin Sun
  • Yan Chen

Radar sensing has emerged as a promising solution for the contactless monitoring of Heart Rate Variability (HRV), a crucial indicator of the cardiovascular and autonomic nervous systems. However, due to signal noise and interference that easily obscure heartbeat details, along with variations in heartbeat across different physiological conditions, existing methods remain restricted to laboratory settings with healthy subjects and fail in real-world scenarios involving more complex physiological conditions. In this study, we propose a physiological prior-inspired deep learning framework for robust radar-based HRV monitoring. Specifically, we leverage the prior that internal heartbeats drive movements across the entire torso surface and design a hybrid deep neural network to model the spatio-temporal relationship between full-body radio reflections and heartbeats, effectively mitigating interference. Then, we incorporate the cardiac motion's self-similarity prior to establish a signal augmentation strategy, effectively remodeling the HRV distribution and enhancing performance across diverse physiological conditions. We build and validate our method on a large-scale dataset comprising 7,150 outpatients with complex physiological conditions in real-world scenarios. The experimental results demonstrate that our method achieves a mean IBI error of 19.21 ms, an RMSSD error of 16.23 ms, an SDSD error of 16.70 ms, and a pNN50 error of 7.28%. We further validate the performance by classifying five common cardiac conditions based on HRV results, demonstrating performance comparable to ECG-based methods. These results highlight the great potential of our approach for accurate, contactless HRV monitoring in real-world applications.
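The time-domain HRV metrics reported above (RMSSD, SDSD, pNN50) have standard definitions over a series of inter-beat intervals (IBIs); the sketch below illustrates them with made-up IBI values, and the function name is ours, not the paper's:

```python
import numpy as np

def hrv_metrics(ibi_ms):
    """Standard time-domain HRV metrics from inter-beat intervals (ms)."""
    ibi = np.asarray(ibi_ms, dtype=float)
    d = np.diff(ibi)                           # successive IBI differences
    rmssd = np.sqrt(np.mean(d ** 2))           # root mean square of successive differences
    sdsd = np.std(d, ddof=1)                   # standard deviation of successive differences
    pnn50 = 100.0 * np.mean(np.abs(d) > 50.0)  # % of |differences| exceeding 50 ms
    return rmssd, sdsd, pnn50

# Illustrative IBI sequence (ms), not data from the paper
rmssd, sdsd, pnn50 = hrv_metrics([800, 810, 790, 850, 800])
```

An estimator's RMSSD/SDSD/pNN50 error is then simply the gap between these values computed from radar-derived IBIs and from a reference ECG.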

AAAI Conference 2026 Conference Paper

Semi-Supervised Synthetic Data Generation with Fine-Grained Relevance Control for Short Video Search Relevance Modeling

  • Haoran Li
  • Zhiming Su
  • Junyan Yao
  • Enwei Zhang
  • Yang Ji
  • Yan Chen
  • Kan Zhou
  • Chao Feng

Synthetic data is widely adopted in embedding models to ensure diversity in training data distributions across dimensions such as difficulty, length, and language. However, existing prompt-based synthesis methods struggle to capture domain-specific data distributions, particularly in data-scarce domains, and often overlook fine-grained relevance diversity. In this paper, we present a Chinese short video dataset with 4-level relevance annotations, filling a critical resource void. Further, we propose a semi-supervised synthetic data pipeline where two collaboratively trained models generate domain-adaptive short video data with controllable relevance labels. Our method enhances relevance-level diversity by synthesizing samples for underrepresented intermediate relevance labels, resulting in a more balanced and semantically rich training dataset. Extensive offline experiments show that the embedding model trained on our synthesized data outperforms those using data generated by prompting or vanilla supervised fine-tuning (SFT). Moreover, we demonstrate that incorporating more diverse fine-grained relevance levels in training data enhances the model's sensitivity to subtle semantic distinctions, highlighting the value of fine-grained relevance supervision in embedding learning. In the search-enhanced recommendation pipeline of Douyin's dual-column scenario, online A/B testing showed that the proposed model increased click-through rate (CTR) by 1.45%, raised the Strong Relevance Ratio (SRR) by 4.9%, and improved the Image User Penetration Rate (IUPR) by 0.1054%.

JBHI Journal 2026 Journal Article

WGB-GLFI: A Novel Graph-Based Global-Local Feature Interaction Framework for Automated Seizure Detection

  • Xiang Li
  • Mingxing Zhu
  • Chuqi Yang
  • Ke Zhang
  • Xin Wang
  • Sunday Timothy Aboyeji
  • Fei Chen
  • Chen Yao

Epilepsy detection faces significant challenges due to unpredictable seizures, ranging from brief lapses of awareness to severe convulsions, posing risks to patients' safety and quality of life. In recent years, deep learning has become a mainstream approach in this field, leveraging advanced computational resources and EEG datasets. However, a key challenge remains: existing methods often lack unified spatial modeling and struggle to effectively handle local detailed features, limiting their accuracy and robustness. To address these issues, we propose the Weighted Graph Building Global-Local Feature Interaction (WGB-GLFI) framework, which integrates spatial connectivity and dynamic patterns through a Weighted Graph Building (WGB) module and a Global-Local Feature Interaction (GLFI) module. This approach comprehensively captures the dynamic spatial relationships during epileptic seizures and achieves seamless global-local feature integration, significantly enhancing seizure detection performance. Its effectiveness has been validated across multiple datasets, including CHB-MIT, Siena Scalp, and private datasets, where our model achieves accuracy rates of 99.28%, 99.21%, and 99.30%, respectively. The reliability and robustness of our framework provide epilepsy patients with faster and more dependable seizure detection, enabling timely intervention and improving patients' quality of life.

JBHI Journal 2026 Journal Article

WN-Sleep: Modeling Whole-Night Data for Improved Sleep Staging Classification

  • Fang Zhou
  • Zhi Lu
  • Zhi Wu
  • Gaohan Ye
  • Lingjie Shu
  • Yu Pu
  • Beilei Wang
  • Dong Zhang

Sleep staging, crucial for diagnosing sleep disorders, requires precise recognition of physiological signals within 30-second epochs, a task fundamentally different from managing long-term semantic dependencies in natural language processing (NLP). Our model aims to refine the integration of local and global features for more accurate sleep stage classification. Following the American Academy of Sleep Medicine (AASM) guidelines, it focuses on rigorous intra-epoch feature extraction to ensure reliable identification of sleep stages. Moreover, our approach incorporates a global perspective by analyzing whole-night data, which is essential for handling transitional periods and ambiguities. Existing sequential modeling techniques often overlook the unique requirements of sleep staging, leading to performance declines when epochs extend beyond approximately 200. Our model addresses this by structurally processing local and global information and carefully balancing detailed intra-epoch analysis with an overarching view of sleep cycles through a gating mechanism that selectively integrates long-term dependencies, optimizing the balance between local accuracy and global context. This approach represents a significant advancement over existing models, offering more accurate, reliable, and clinically relevant sleep staging. Extensive experiments on the SHHS, SleepEDF-20, and SleepEDF-78 datasets demonstrate that our method outperforms state-of-the-art approaches.
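The gating idea described above, selectively blending intra-epoch (local) features with whole-night (global) context, can be sketched generically; the sigmoid-gate formulation and random weights below are a minimal illustration, not the paper's architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(local_feat, global_feat, W, b):
    """Generic gating sketch: a sigmoid gate computed from both feature
    vectors decides, per dimension, how much local (intra-epoch) signal
    versus global (whole-night) context enters the fused representation."""
    g = sigmoid(np.concatenate([local_feat, global_feat]) @ W + b)
    return g * local_feat + (1.0 - g) * global_feat

rng = np.random.default_rng(0)
d = 8
local_feat = rng.normal(size=d)
global_feat = rng.normal(size=d)
W = rng.normal(size=(2 * d, d))   # illustrative learned parameters
b = np.zeros(d)
fused = gated_fusion(local_feat, global_feat, W, b)
```

Because the gate lies in (0, 1), each fused coordinate is a convex combination of the local and global values, which is what lets such a gate trade off local accuracy against global context.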

IROS Conference 2025 Conference Paper

Compact R-X-Y Stage and Dual-Finger Micromanipulator under Inverted Optical Microscope for Microassembly

  • Jichao Pang
  • Zhuo Chen 0006
  • Yan Chen
  • Yuke Li
  • Yunsheng Li
  • Qiang Huang 0002
  • Tatsuo Arai
  • Xiaoming Liu 0007

Microassembly plays an important role in fabricating complex structures from small basic components in industrial and biomedical fields. An inverted optical microscope can provide high-quality image feedback for microassembly with its continuously improving resolution. However, a compact stage capable of positioning and reorienting micro-objects while fitting within the limited space under an inverted optical microscope remains unavailable. This paper proposes a compact R-X-Y stage that can transport micro-objects over long distances in the X and Y directions and reorient them through 360-degree continuous rotation. Additionally, instead of the common arrangement of placing the rotational stage on the X-Y stage, we mount the thin X-Y stage on a rotational stage. Thus, after aligning the centers of the visual field and the rotational stage at the beginning, no visible micro-object moves out of the visual field during rotation. We further integrate the R-X-Y stage with the dual-finger micromanipulator and use them to assemble 2-D patterns and a complex 3-D micromachine. The obtained results and preliminary demonstration indicate that the proposed compact R-X-Y stage has great potential in assembling complex micromachines.

ICML Conference 2025 Conference Paper

Concurrent Reinforcement Learning with Aggregated States via Randomized Least Squares Value Iteration

  • Yan Chen
  • Qinxun Bai
  • Yiteng Zhang
  • Maria Dimakopoulou
  • Shi Dong
  • Qi Sun
  • Zhengyuan Zhou

Designing learning agents that explore efficiently in a complex environment has been widely recognized as a fundamental challenge in reinforcement learning. While a number of works have demonstrated the effectiveness of techniques based on randomized value functions on a single agent, it remains unclear, from a theoretical point of view, whether injecting randomization can help a society of agents concurrently explore an environment. The theoretical results established in this work offer an affirmative answer to this question. We adapt the concurrent learning framework to randomized least-squares value iteration (RLSVI) with aggregated state representation. We demonstrate polynomial worst-case regret bounds in both finite- and infinite-horizon environments. In both setups the per-agent regret decreases at an optimal rate of $\Theta\left(\frac{1}{\sqrt{N}}\right)$, highlighting the advantage of concurrent learning. Our algorithm exhibits significantly lower space complexity than Russo (2019) and Agrawal et al. (2021): we reduce the space complexity by a factor of $K$ while incurring only a $\sqrt{K}$ increase in the worst-case regret bound. Interestingly, our algorithm improves the worst-case regret bound of Russo (2019) by a factor of $H^{1/2}$, matching the improvement in Agrawal et al. (2021), but achieves this through a fundamentally different algorithmic enhancement and proof technique. Additionally, we conduct numerical experiments to demonstrate our theoretical findings.
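The per-agent $\Theta(1/\sqrt{N})$ rate quoted above follows the usual pooling intuition; the scaling argument below is an illustrative sketch, not the paper's actual derivation, and $C$ stands for an unspecified horizon- and state-dependent factor:

```latex
% If N agents pool their experience and the cumulative regret summed
% over all agents grows as \tilde{O}(\sqrt{N}\,C), then dividing by N
% gives the per-agent regret:
\frac{\mathrm{Regret}_{\mathrm{total}}(N)}{N}
  = \tilde{O}\!\left(\frac{\sqrt{N}\,C}{N}\right)
  = \tilde{O}\!\left(\frac{C}{\sqrt{N}}\right)
```

That is, each additional agent's data benefits every other agent, so the cost of exploration per agent shrinks as the society grows.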

IROS Conference 2025 Conference Paper

DMPBot: A high-speed, high-precision, omnidirectional, insect-scale piezoelectric robot

  • Yan Chen
  • Shu Chen
  • Zheyu Yang
  • Pengyu Liu
  • Sicheng Chen
  • Ziru Deng
  • Junqi An
  • Qiang Huang 0002

Microrobots have garnered significant attention due to their vast potential applications across various fields. Among various types of microrobots, piezoelectric robots stand out due to their exceptional motion accuracy, low power consumption, and simple structural design. This work introduces a novel piezoelectric microrobot, the Dual-Modal Piezoelectric Robot (DMPBot), which is fabricated with an innovative carbon fiber substrate through a heat-pressing process with a compact size of 6 mm × 9 mm × 1.1 mm and a weight of only 0.05 g. DMPBot can achieve both high-speed and high-precision motion in non-resonant mode, as well as omnidirectional movement by integrating non-resonant and resonant modes. In non-resonant mode, the robot can reach a speed of 33 mm/s (3.67 body lengths per second) and a sub-micron resolution of 0.4 μm by adjusting the applied signal. This work presents an analysis of the design, fabrication, and performance of DMPBot, focusing on its dynamic response, motion mechanisms, high-speed and high-precision motion, and omnidirectional movement capabilities. Experimental results validate the ability of DMPBot to perform high-speed, high-precision, and omnidirectional motion, demonstrating its promising potential in the field of micromanipulation.

AAAI Conference 2025 Conference Paper

Motion-adaptive Transformer for Event-based Image Deblurring

  • Senyan Xu
  • Zhijing Sun
  • Mingchen Zhong
  • Chengzhi Cao
  • Yidi Liu
  • Xueyang Fu
  • Yan Chen

Event cameras, which capture pixel-level brightness changes asynchronously, provide rich motion information that is often missed during traditional frame-based camera exposures, thereby offering fresh perspectives for motion deblurring. Although current approaches incorporate event intensity, they neglect essential spatial motion information. Unlike CNN architectures, Transformers excel at modeling long-range dependencies but struggle to establish relevant non-local connections in sparse events and fail to highlight significant interactions in dense images. To address these limitations, we introduce a Motion-Adaptive Transformer network (MAT) that utilizes spatial motion information to forge robust global connections. The core design is an Adaptive Motion Mask Predictor (AMMP) that identifies key motion regions, guiding the Motion-Sparse Attention (MSA) to eliminate irrelevant event tokens and enabling the Motion-Aware Attention (MAA) to focus on relevant ones, thereby enhancing long-range dependency modeling. Additionally, we elaborately design a Cross-Modal Intensity Gating mechanism that efficiently merges intensity data across modalities while minimizing parameter use. The learnable Expansion-Controlled Spatial Gating further optimizes the transmission of event features. Comprehensive testing confirms that our approach sets a new benchmark in image deblurring, surpassing previous methods by up to 0.60 dB on the GoPro dataset and 1.04 dB on the HS-ERGB dataset, and achieving an average improvement of 0.52 dB across two real-world datasets.

NeurIPS Conference 2025 Conference Paper

RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility

  • Haoyu He
  • Haozheng Luo
  • Yan Chen
  • Qi Wang

Predicting human mobility is inherently challenging due to complex long-range dependencies and multi-scale periodic behaviors. To address this, we introduce RHYTHM (Reasoning with Hierarchical Temporal Tokenization for Human Mobility), a unified framework that leverages large language models (LLMs) as general-purpose spatio-temporal predictors and trajectory reasoners. Methodologically, RHYTHM employs temporal tokenization to partition each trajectory into daily segments and encode them as discrete tokens with hierarchical attention that captures both daily and weekly dependencies, thereby quadratically reducing the sequence length while preserving cyclical information. Additionally, we enrich token representations by adding pre-computed prompt embeddings for trajectory segments and prediction targets via a frozen LLM, and feeding these combined embeddings back into the LLM backbone to capture complex interdependencies. Computationally, RHYTHM keeps the pretrained LLM backbone frozen, yielding faster training and lower memory usage. We evaluate our model against state-of-the-art methods using three real-world datasets. Notably, RHYTHM achieves a 2.4% improvement in overall accuracy, a 5.0% increase on weekends, and a 24.6% reduction in training time. Code is publicly available at https://github.com/he-h/rhythm.

AAAI Conference 2025 Conference Paper

Sharper Error Bounds in Late Fusion Multi-view Clustering with Eigenvalue Proportion Optimization

  • Liang Du
  • Henghui Jiang
  • Xiaodong Li
  • Yiqing Guo
  • Yan Chen
  • Feijiang Li
  • Peng Zhou
  • Yuhua Qian

Multi-view clustering (MVC) aims to integrate complementary information from multiple views to enhance clustering performance. Late Fusion Multi-View Clustering (LFMVC) has shown promise by synthesizing diverse clustering results into a unified consensus. However, current LFMVC methods struggle with noisy and redundant partitions and often fail to capture high-order correlations across views. To address these limitations, we present a novel theoretical framework for analyzing the generalization error bounds of multiple kernel k-means, leveraging local Rademacher complexity and principal eigenvalue proportions. Our analysis establishes a convergence rate of O(1/n), significantly improving upon the existing rate in the order of O(sqrt(k/n)). Building on this insight, we propose a low-pass graph filtering strategy within a multiple linear K-means framework to mitigate noise and redundancy, further refining the principal eigenvalue proportion and enhancing clustering accuracy. Experimental results on benchmark datasets confirm that our approach outperforms state-of-the-art methods in clustering performance and robustness.

JBHI Journal 2024 Journal Article

A Real-Time Hand Gesture Recognition System for Low-Latency HMI via Transient HD-SEMG and In-Sensor Computing

  • Haomeng Qiu
  • Zhitao Chen
  • Yan Chen
  • Chaojie Yang
  • Sihan Wu
  • Fanglin Li
  • Longhan Xie

In real-time human-machine interaction (HMI) applications, hand gesture recognition (HGR) requires high accuracy with low latency. Surface electromyography (sEMG), a physiological electrical signal reflecting muscle activation, is extensively used in HMI. Recently, transient sEMG, generated during gesture transitions, has been employed in HGR to achieve lower observational latency than steady-state sEMG. However, the use of long feature windows (up to 200 ms) still makes it less desirable for low-latency HMI. In addition, most studies have relied on remote computing, where remote data processing and large data transfers result in high computation and network latency. In this paper, we propose a method leveraging transient high-density sEMG (HD-sEMG) and in-sensor computing to achieve low-latency HGR. An sEMG contrastive convolution network (sCCN) was proposed for HGR. The mean absolute value and its average integration were used to train the sCCN in a contrastive learning manner. In addition, all signal acquisition, data processing, and pattern recognition processes were deployed within the designed sensor for in-sensor computing. Compared to the state-of-the-art study using multi-channel 200-ms transient sEMG, our method achieved a comparable HGR accuracy of 0.963 with a 58% lower observational latency of only 84 ms. In-sensor computing achieves a fourfold lower computation latency of 3 ms and significantly reduces the network latency to 2 ms. The proposed method offers a promising approach to low-latency HGR without compromising accuracy, facilitating real-time HMI in biomedical applications such as prostheses, exoskeletons, virtual reality, and video games.

AAAI Conference 2024 Conference Paper

A Unified Knowledge Transfer Network for Generalized Category Discovery

  • Wenkai Shi
  • Wenbin An
  • Feng Tian
  • Yan Chen
  • Yaqiang Wu
  • Qianying Wang
  • Ping Chen

Generalized Category Discovery (GCD) aims to recognize both known and novel categories in an unlabeled dataset by leveraging another labeled dataset with only known categories. Without considering knowledge transfer from known to novel categories, current methods usually perform poorly on novel categories due to the lack of corresponding supervision. To mitigate this issue, we propose a unified Knowledge Transfer Network (KTN), which solves two obstacles to knowledge transfer in GCD. First, the mixture of known and novel categories in unlabeled data makes it difficult to identify transfer candidates (i.e., samples with novel categories). For this, we propose an entropy-based method that leverages knowledge in the pre-trained classifier to differentiate known and novel categories without requiring extra data or parameters. Second, the lack of prior knowledge of novel categories presents challenges in quantifying semantic relationships between categories to decide the transfer weights. For this, we model different categories with prototypes and treat their similarities as transfer weights that quantify the semantic relationships between categories. On the basis of these two treatments, we transfer knowledge from known to novel categories by conducting pre-adjustment of logits and post-adjustment of labels for transfer candidates based on the transfer weights between different categories. With the weighted adjustment, KTN can generate more accurate pseudo-labels for unlabeled data, which helps to learn more discriminative features and boost model performance on novel categories. Extensive experiments show that our method outperforms state-of-the-art models on all evaluation metrics across multiple benchmark datasets. Furthermore, different from previous clustering-based methods that can only work offline with abundant data, KTN can be conveniently deployed online with faster inference speed. Code and data are available at https://github.com/yibai-shi/KTN.
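The prototype-and-similarity scheme described above (a prototype per category, similarities used as transfer weights) can be illustrated with a toy sketch; the softmax-over-cosine weighting below is our assumption for illustration, not necessarily KTN's exact formula:

```python
import numpy as np

def prototypes(features, labels, num_classes):
    """Class prototypes as the mean feature vector of each class."""
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def transfer_weights(novel_proto, known_protos):
    """Illustrative transfer weights: softmax over cosine similarities
    between a novel-category prototype and known-category prototypes."""
    def unit(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    sims = unit(known_protos) @ unit(novel_proto)
    e = np.exp(sims - sims.max())        # numerically stable softmax
    return e / e.sum()

# Toy 2-D features for two known classes
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
known = prototypes(feats, labels, 2)
w = transfer_weights(np.array([0.8, 0.2]), known)   # a novel-class prototype
```

A novel-class prototype lying closer to known class 0 receives a larger weight from class 0, which is how similarity can steer how much knowledge each known category contributes.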

ICRA Conference 2024 Conference Paper

Development of a 3-RRS Micromanipulator Based on Origami-Inspired Spherical Joint

  • Haoqi Han
  • Xiaoming Liu 0007
  • Yan Chen
  • Hao Pang
  • Xiaoqing Tang
  • Dan Liu 0009
  • Qiang Huang 0002
  • Tatsuo Arai

In recent years, micromanipulation technology has achieved extensive applications in industry and the life sciences. Improving the precision and bandwidth of micromanipulators while simultaneously reducing size, weight, and cost poses significant challenges to existing micromanipulator design and fabrication methods. Here, we propose a 3-RRS micromanipulator with an origami-inspired spherical joint based on the PC-MEMS process, aiming for miniaturization and cost-effectiveness. The spherical joint allows rotations of approximately 140° around the x-axis, 140° around the y-axis, and 20° around the z-axis. The micromanipulator weighs 0.8 g, measures 16 mm × 16 mm × 22 mm, and has a workspace of 0.7 mm³. The end platform of the micromanipulator can be equipped with various effectors to accomplish different kinds of tasks. Experimental results validated its high precision and bandwidth, exhibiting its potential to perform intricate micromanipulation tasks.

NeurIPS Conference 2024 Conference Paper

Flipped Classroom: Aligning Teacher Attention with Student in Generalized Category Discovery

  • Haonan Lin
  • Wenbin An
  • Jiahao Wang
  • Yan Chen
  • Feng Tian
  • Mengmeng Wang
  • Guang Dai
  • Qianying Wang

Recent advancements have shown promise in applying traditional Semi-Supervised Learning strategies to the task of Generalized Category Discovery (GCD). Typically, this involves a teacher-student framework in which the teacher imparts knowledge to the student to classify categories, even in the absence of explicit labels. Nevertheless, GCD presents unique challenges, particularly the absence of priors for new classes, which can lead to the teacher's misguidance and unsynchronized learning with the student, culminating in suboptimal outcomes. In our work, we delve into why traditional teacher-student designs falter in generalized category discovery as compared to their success in closed-world semi-supervised learning. We identify inconsistent pattern learning as the crux of this issue and introduce FlipClass—a method that dynamically updates the teacher to align with the student's attention, instead of maintaining a static teacher reference. Our teacher-attention-update strategy refines the teacher's focus based on student feedback, promoting consistent pattern recognition and synchronized learning across old and new classes. Extensive experiments on a spectrum of benchmarks affirm that FlipClass significantly surpasses contemporary GCD methods, establishing new standards for the field.

IJCAI Conference 2024 Conference Paper

Learning-Based Tracking-before-Detect for RF-Based Unconstrained Indoor Human Tracking

  • Zhi Wu
  • Dongheng Zhang
  • Zixin Shang
  • Yuqin Yuan
  • Hanqin Gong
  • Binquan Wang
  • Zhi Lu
  • Yadong Li

Existing efforts on human tracking using wireless signals are primarily focused on constrained scenarios with only a few individuals in empty spaces. However, in practical unconstrained scenarios with severe interference and attenuation, accurate multi-person tracking has been intractable. In this paper, we propose NeuralTBD, which combines the capability of deep models with the Tracking-Before-Detect (TBD) methodology to achieve accurate human tracking. TBD is a classical tracking methodology from signal processing that accumulates measurements in the time domain to distinguish target traces from interference; however, it relies on handcrafted shape/motion models, impeding its efficacy in complex indoor scenarios. To tackle this challenge, we build an end-to-end learning-based TBD framework that leverages the advanced modeling capabilities of deep models to significantly enhance TBD performance. To evaluate NeuralTBD, we collect an RF-based tracking dataset in unconstrained scenarios, encompassing 4 million annotated radar frames with up to 19 individuals acting in 6 different scenarios. NeuralTBD achieves a 70% improvement in performance over conventional TBD methods. To our knowledge, this is the first attempt to deal with RF-based unconstrained human tracking. The code and dataset will be released.
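Classical TBD, accumulating per-frame measurement energy along feasible motion paths before declaring a detection, can be shown with a minimal dynamic-programming sketch on a 1-D grid (a textbook-style toy with a hypothetical motion constraint, not NeuralTBD itself):

```python
import numpy as np

def tbd_accumulate(meas, max_step=1):
    """Classical Tracking-Before-Detect sketch: accumulate measurement
    energy over time along feasible paths on a 1-D grid, where a target
    may move at most `max_step` cells per frame. High accumulated score
    at the final frame indicates a consistent target trace."""
    T, X = meas.shape
    score = meas[0].copy()
    for t in range(1, T):
        # Best accumulated score reachable from the previous frame
        best_prev = np.array([
            score[max(0, x - max_step): x + max_step + 1].max()
            for x in range(X)
        ])
        score = meas[t] + best_prev
    return score

# Weak target moving right one cell per frame on an otherwise empty grid
meas = np.zeros((4, 6))
for t in range(4):
    meas[t, t] = 1.0
final = tbd_accumulate(meas)
```

The accumulated score peaks at the cell where the consistent trace ends, even though each single frame's return is weak; learning-based TBD replaces the handcrafted motion constraint with a trained model.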

JBHI Journal 2024 Journal Article

Robust Epileptic Seizure Detection Based on Biomedical Signals Using an Advanced Multi-View Deep Feature Learning Approach

  • Ijaz Ahmad
  • Zhenzhen Liu
  • Lin Li
  • Inam Ullah
  • Sunday Timothy Aboyeji
  • Xin Wang
  • Oluwarotimi Williams Samuel
  • Guanglin Li

Epilepsy is a neurological disorder characterized by abnormal neuronal discharges that manifest as life-threatening seizures. These are often monitored via EEG signals, a key aspect of biomedical signal processing (BSP). Accurate epileptic seizure (ES) detection significantly depends on the precise identification of key EEG features, which requires a deep understanding of the data's intrinsic domain. Therefore, this study presents an Advanced Multi-View Deep Feature Learning (AMV-DFL) framework based on machine learning (ML) technology to enhance the detection of relevant EEG signal features for ES. Our method initially applies a fast Fourier transform (FFT) on EEG data for traditional frequency domain feature (TFD-F) extraction and directly incorporates time domain (TD) features from the raw EEG signals, establishing a comprehensive traditional multi-view feature (TMV-F) set. Deep features are subsequently extracted autonomously from optimal layers of one-dimensional convolutional neural networks (1D CNN), resulting in multi-view deep features (MV-DF) integrating both time and frequency domains. A multi-view forest (MV-F), an interpretable rule-based advanced ML classifier, is used to construct a robust, generalized classification. Tree-based SHAP explainable artificial intelligence (T-XAI) is incorporated to interpret and explain the underlying rules. Experimental results confirm our method's superiority, surpassing models using TMV-FL and single-view deep features (SV-DF) by 4% and outperforming other state-of-the-art methods by an average of 3% in classification accuracy. The AMV-DFL approach aids clinicians in identifying EEG features indicative of ES, potentially discovering novel biomarkers, and improving diagnostic capabilities in epilepsy management.
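The FFT-based frequency-domain feature step can be illustrated with a minimal band-power sketch; the delta/theta/alpha/beta band edges below are the conventional EEG bands and may differ from the paper's exact feature set:

```python
import numpy as np

def band_powers(eeg_window, fs, bands=((0.5, 4), (4, 8), (8, 13), (13, 30))):
    """FFT-based frequency-domain features: summed spectral power in the
    conventional delta/theta/alpha/beta EEG bands for one signal window."""
    freqs = np.fft.rfftfreq(len(eeg_window), d=1.0 / fs)
    power = np.abs(np.fft.rfft(eeg_window)) ** 2
    return np.array([power[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in bands])

fs = 128
t = np.arange(fs * 2) / fs          # a 2-second window
x = np.sin(2 * np.pi * 10 * t)      # pure 10 Hz tone -> alpha band
feats = band_powers(x, fs)
```

In a full pipeline, vectors like `feats` (one per channel and window) would form the frequency-domain view, concatenated with raw time-domain features to build the multi-view input.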

NeurIPS Conference 2024 Conference Paper

Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing

  • Haonan Lin
  • Yan Chen
  • Jiahao Wang
  • Wenbin An
  • Mengmeng Wang
  • Feng Tian
  • Yong Liu
  • Guang Dai

Text-guided diffusion models have significantly advanced image editing, enabling high-quality and diverse modifications driven by text prompts. However, effective editing requires inverting the source image into a latent space, a process often hindered by prediction errors inherent in DDIM inversion. These errors accumulate during the diffusion process, resulting in inferior content preservation and edit fidelity, especially with conditional inputs. We address these challenges by investigating the primary contributors to error accumulation in DDIM inversion and identify the singularity problem in traditional noise schedules as a key issue. To resolve this, we introduce the Logistic Schedule, a novel noise schedule designed to eliminate singularities, improve inversion stability, and provide a better noise space for image editing. This schedule reduces noise prediction errors, enabling more faithful editing that preserves the original content of the source image. Our approach requires no additional retraining and is compatible with various existing editing methods. Experiments across eight editing tasks demonstrate the Logistic Schedule's superior performance in content preservation and edit fidelity compared to traditional noise schedules, highlighting its adaptability and effectiveness. The project page is available at https://lonelvino.github.io/SYE/.
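Only the abstract is shown, but the appeal of a logistic-shaped schedule can be sketched: a logistic curve for the cumulative signal rate stays strictly inside (0, 1), avoiding the endpoint values where inversion terms degenerate. The parameterization below is purely illustrative and is not the paper's Logistic Schedule:

```python
import numpy as np

def logistic_alpha_bar(T, k=10.0, t0=0.5):
    """Illustrative logistic-shaped cumulative signal rate alpha_bar(t).
    Schedules that reach exactly 1 at t=0 or 0 at t=T make ratios such as
    sqrt(1 - alpha_bar) / sqrt(alpha_bar) degenerate at the endpoints;
    a logistic curve remains strictly within (0, 1) everywhere.
    k (steepness) and t0 (midpoint) are assumed values, not the paper's."""
    t = np.linspace(0.0, 1.0, T)
    return 1.0 / (1.0 + np.exp(k * (t - t0)))

ab = logistic_alpha_bar(1000)
```

The schedule decreases monotonically from near 1 (mostly signal) to near 0 (mostly noise) without ever touching either endpoint.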

AAAI Conference 2024 Conference Paper

Transfer and Alignment Network for Generalized Category Discovery

  • Wenbin An
  • Feng Tian
  • Wenkai Shi
  • Yan Chen
  • Yaqiang Wu
  • Qianying Wang
  • Ping Chen

Generalized Category Discovery (GCD) is a crucial real-world task that aims to recognize both known and novel categories from an unlabeled dataset by leveraging another labeled dataset with only known categories. Despite the improved performance on known categories, current methods perform poorly on novel categories. We attribute the poor performance to two reasons: biased knowledge transfer between labeled and unlabeled data and noisy representation learning on the unlabeled data. The former leads to unreliable estimation of learning targets for novel categories and the latter hinders models from learning discriminative features. To mitigate these two issues, we propose a Transfer and Alignment Network (TAN), which incorporates two knowledge transfer mechanisms to calibrate the biased knowledge and two feature alignment mechanisms to learn discriminative features. Specifically, we model different categories with prototypes and transfer the prototypes in labeled data to correct model bias towards known categories. On the one hand, we pull instances with known categories in unlabeled data closer to these prototypes to form more compact clusters and avoid boundary overlap between known and novel categories. On the other hand, we use these prototypes to calibrate noisy prototypes estimated from unlabeled data based on category similarities, which allows for more accurate estimation of prototypes for novel categories that can be used as reliable learning targets later. After knowledge transfer, we further propose two feature alignment mechanisms to acquire both instance- and category-level knowledge from unlabeled data by aligning instance features with both augmented features and the calibrated prototypes, which can boost model performance on both known and novel categories with less noise. Experiments on three benchmark datasets show that our model outperforms SOTA methods, especially on novel categories. 
Theoretical analysis is provided for an in-depth understanding of our model. Our code and data are available at https://github.com/Lackel/TAN.

IJCAI Conference 2023 Conference Paper

Less Learn Shortcut: Analyzing and Mitigating Learning of Spurious Feature-Label Correlation

  • Yanrui Du
  • Jing Yan
  • Yan Chen
  • Jing Liu
  • Sendong Zhao
  • Qiaoqiao She
  • Hua Wu
  • Haifeng Wang

Recent research has revealed that deep neural networks often take dataset biases as a shortcut to make decisions rather than understand tasks, leading to failures in real-world applications. In this study, we focus on the spurious correlation between word features and labels that models learn from the biased data distribution of training data. In particular, we define a word that highly co-occurs with a specific label as a biased word, and an example containing a biased word as a biased example. Our analysis shows that biased examples are easier for models to learn, while at prediction time biased words make a significantly higher contribution to the models' predictions, and models tend to assign predicted labels by over-relying on the spurious correlation between words and labels. To mitigate models' over-reliance on this shortcut (i.e., the spurious correlation), we propose a training strategy, Less-Learn-Shortcut (LLS): our strategy quantifies the biased degree of biased examples and down-weights them accordingly. Experimental results on Question Matching, Natural Language Inference and Sentiment Analysis tasks show that LLS is a task-agnostic strategy and can improve model performance on adversarial data while maintaining good performance on in-domain data.
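
A minimal sketch of the down-weighting idea, under stated assumptions: here the biased degree of a word is estimated as the empirical co-occurrence probability P(label | word), and an example's biased degree is the maximum over its words. The function name, the exponent `gamma`, and these exact definitions are illustrative, not necessarily the paper's:

```python
from collections import Counter, defaultdict

def example_weights(examples, labels, gamma=1.0):
    """Down-weight examples whose words co-occur strongly with their own label.

    examples: list of token lists; labels: list of labels (same length).
    bias(w, y) = P(y | w) from co-occurrence counts; an example's biased
    degree is the max over its words, and its training weight shrinks as
    that degree grows.
    """
    word_label = defaultdict(Counter)
    for toks, y in zip(examples, labels):
        for w in set(toks):
            word_label[w][y] += 1
    weights = []
    for toks, y in zip(examples, labels):
        degrees = [word_label[w][y] / sum(word_label[w].values()) for w in set(toks)]
        bias = max(degrees) if degrees else 0.0
        # Small floor so no example is dropped entirely
        weights.append((1.0 - bias) ** gamma + 1e-6)
    return weights
```

The returned weights would then multiply each example's loss term during training, so strongly biased examples contribute less to the gradient.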

JBHI Journal 2023 Journal Article

Sleep Classification With Artificial Synthetic Imaging Data Using Convolutional Neural Networks

  • Lan Shi
  • Marianthie Wank
  • Yan Chen
  • Yibo Wang
  • Yachuan Liu
  • Emily C. Hector
  • Peter X.K. Song

Objective: We propose a new analytic framework, “Artificial Synthetic Imaging Data (ASID) Workflow,” for sleep classification from a wearable device comprising: 1) the creation of ASID from data collected by a non-invasive wearable device that permits real-time multi-modal physiological monitoring on heart rate (HR), 3-axis accelerometer, electrodermal activity, and skin temperature, denoted as “Temporal E4 Data” (TED) and 2) the use of an image classification supervised learning algorithm, convolutional neural network (CNN), to classify periods of sleep. Methods: We investigate ASID Workflow under 6 settings (3 data resolutions × 2 HR scenarios). Competing machine/deep learning classification algorithms, including logistic regression, support vector machine, random forest, k-nearest neighbors, and Long Short-Term Memory, are applied to TED as comparisons, termed “Competing Workflow.” Results: The ASID Workflow achieves excellent performance with mean weighted accuracy across settings of 94.7%, and is superior to the Competing Workflow with high and low resolution data regardless of the inclusion of HR modality. This superiority is maximized for low resolution data without HR. Additionally, CNN has a relatively low subject-wise test computational cost compared with competing algorithms. Conclusion: We demonstrate the utility of creating ASID from multi-modal physiological data and applying a preexisting image classification algorithm to achieve better classification accuracy. We shed light on the influence of data resolution and HR modality on the Workflow's performance. Significance: Applying CNN to ASID allows us to capture both temporal and spatial dependency among physiological variables and modalities by using 2D images' topological structure that competing algorithms fail to utilize.
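
One plausible way to turn a multi-modal sensor window into a 2D "image" for a CNN, as the ASID idea suggests, is to min-max scale each channel and tile channel rows so modalities sit adjacently along one image axis. This is only a hedged sketch of the general technique; the paper's actual ASID construction may differ, and the function name and `img_height` parameter are assumptions:

```python
import numpy as np

def window_to_image(window, img_height=64):
    """Convert one sensor window of shape (time, channels) into a 2D array.

    Each channel is min-max scaled to [0, 1], then repeated across rows so
    that a CNN can exploit spatial structure across modalities as well as
    temporal structure along the width axis.
    """
    t, c = window.shape
    lo = window.min(axis=0, keepdims=True)
    rng = window.max(axis=0, keepdims=True) - lo + 1e-8
    scaled = (window - lo) / rng                          # each channel in [0, 1]
    rows_per_channel = max(1, img_height // c)
    img = np.repeat(scaled.T, rows_per_channel, axis=0)   # (c * rows_per_channel, t)
    return img
```

The resulting array can be fed to any off-the-shelf image classifier, which is the appeal of the workflow: no bespoke time-series architecture is needed.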

NeurIPS Conference 2022 Conference Paper

Society of Agents: Regret Bounds of Concurrent Thompson Sampling

  • Yan Chen
  • Perry Dong
  • Qinxun Bai
  • Maria Dimakopoulou
  • Wei Xu
  • Zhengyuan Zhou

We consider the concurrent reinforcement learning problem where $n$ agents simultaneously learn to make decisions in the same environment by sharing experience with each other. Existing works in this emerging area have empirically demonstrated that Thompson sampling (TS) based algorithms provide a particularly attractive alternative for inducing cooperation, because each agent can independently sample a belief environment (and compute a corresponding optimal policy) from the joint posterior computed by aggregating all agents' data, which induces diversity in exploration among agents while still benefiting from the shared experience of all agents. However, theoretical guarantees in this area remain under-explored; in particular, no regret bound is known for TS based concurrent RL algorithms. In this paper, we fill this gap by considering two settings. In the first, we study the simple finite-horizon episodic RL setting, where TS is naturally adapted into the concurrent setup by having each agent sample from the current joint posterior at the beginning of each episode. We establish a $\tilde{O}(HS\sqrt{\frac{AT}{n}})$ per-agent regret bound, where $H$ is the horizon of the episode, $S$ is the number of states, $A$ is the number of actions, $T$ is the number of episodes and $n$ is the number of agents. In the second setting, we consider the infinite-horizon RL problem, where a policy is measured by its long-run average reward. Here, despite not having natural episodic breakpoints, we show that by a doubling-horizon schedule, we can adapt TS to the infinite-horizon concurrent learning setting to achieve a regret bound of $\tilde{O}(DS\sqrt{ATn})$, where $D$ is the standard notion of diameter of the underlying MDP and $T$ is the number of timesteps. Note that in both settings, the per-agent regret decreases at an optimal rate of $\Theta(\frac{1}{\sqrt{n}})$, which manifests the power of cooperation in concurrent RL.
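
The core mechanism (agents pooling data into one joint posterior while each sampling its own belief) is easiest to see in a Bernoulli bandit, a much simpler setting than the MDPs analyzed in the paper. The sketch below is illustrative only; all names and parameter values are assumptions:

```python
import numpy as np

def concurrent_ts(true_means, n_agents=4, horizon=500, seed=0):
    """Concurrent Thompson sampling on a Bernoulli bandit.

    All agents share one Beta posterior per arm, but each agent draws its own
    posterior sample every round, which keeps exploration diverse while every
    observation improves the shared posterior. Returns average per-pull reward.
    """
    rng = np.random.default_rng(seed)
    k = len(true_means)
    alpha = np.ones(k)
    beta = np.ones(k)                             # shared Beta(1, 1) priors
    total_reward = 0.0
    for _ in range(horizon):
        for _ in range(n_agents):
            theta = rng.beta(alpha, beta)         # independent belief sample per agent
            arm = int(np.argmax(theta))
            r = float(rng.random() < true_means[arm])
            alpha[arm] += r                       # every agent's data flows into
            beta[arm] += 1.0 - r                  # the one shared posterior
            total_reward += r
    return total_reward / (horizon * n_agents)
```

With more agents, the shared posterior concentrates faster per unit of wall-clock time, which is the intuition behind the $\Theta(1/\sqrt{n})$ per-agent regret improvement the abstract highlights.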

IJCAI Conference 2018 Conference Paper

IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

  • Qiangpeng Yang
  • Mengli Cheng
  • Wenmeng Zhou
  • Yan Chen
  • Minghui Qiu
  • Wei Lin

Incidental scene text detection, especially for multi-oriented text regions, is one of the most challenging tasks in many computer vision applications. Different from the common object detection task, scene text often suffers from a large variance of aspect ratio, scale, and orientation. To solve this problem, we propose a novel end-to-end scene text detector, IncepText, from an instance-aware segmentation perspective. We design a novel Inception-Text module and introduce deformable PSROI pooling to deal with multi-oriented text detection. Extensive experiments on ICDAR2015, RCTW-17, and MSRA-TD500 datasets demonstrate our method's superiority in terms of both effectiveness and efficiency. Our proposed method achieves the 1st place result on the ICDAR2015 challenge and state-of-the-art performance on the other datasets. Moreover, we have released our implementation as an OCR product which is available for public access.

AAAI Conference 2013 Conference Paper

From Interest to Function: Location Estimation in Social Media

  • Yan Chen
  • Jichang Zhao
  • Xia Hu
  • Xiaoming Zhang
  • Zhoujun Li
  • Tat-Seng Chua

Recent years have witnessed the tremendous development of social media, which attracts a vast number of Internet users. The high-dimension content generated by these users provides a unique opportunity to understand their behavior deeply. As one of the most fundamental topics, location estimation attracts more and more research effort. Different from the previous literature, we find that a user's location is strongly related to user interest. Based on this, we first build a detection model to mine user interest from short text. We then establish the mapping between location function and user interest before presenting an efficient framework to predict the user's location with convincing fidelity. Thorough evaluations and comparisons on an authentic data set show that our proposed model significantly outperforms state-of-the-art approaches. Moreover, the high efficiency of our model also guarantees its applicability in real-world scenarios.