Arrow Research search

Author name cluster

Hui Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

46 papers
2 author rows

Possible papers

46

EAAI Journal 2026 Journal Article

Deep reinforcement learning-driven joint filtering for extended target tracking under non-stationary heavy-tailed noise

  • Hui Chen
  • Yue Jiang
  • Xuxin Wang
  • Hongyun Zhang
  • Wenxu Zhang
  • Ziwen Zhao

Conventional filtering techniques often struggle to maintain accuracy in engineering applications plagued by non-stationary and heavy-tailed noise. To address this issue, we propose a deep reinforcement learning (DRL)-driven autonomous filtering framework that adapts online to complex noise environments. The framework employs a Twin Delayed Deep Deterministic Policy Gradient (TD3) agent to dynamically adjust the parameters of a Gaussian–Student’s t mixture (GSTM) model for both process and measurement noise. This adaptive noise model is embedded within an Unscented Kalman Filter (UKF) that tracks star-convex extended targets using a Random Hypersurface Model (RHM). Crucially, the framework forms a closed Observation–Policy–Modeling–Estimation (OPME) loop: the TD3 policy updates the noise model based on filtering innovations and posterior uncertainty, which in turn refines the state estimates, creating a feedback cycle of continuous self-improvement. The learning is guided by a reward derived from the posterior Cramér–Rao bound (PCRB), ensuring the optimization directly targets estimation accuracy. Extensive simulations for extended-target tracking demonstrate that our approach significantly outperforms state-of-the-art filters. It achieves superior accuracy in both kinematic state estimation and spatial contour reconstruction, particularly under abrupt noise changes and heavy-tailed disturbances. These results demonstrate the effectiveness of the proposed framework as a self-adaptive filtering approach for robust perception in challenging engineering scenarios.
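The Gaussian–Student's t mixture (GSTM) noise model named in the abstract can be illustrated with a minimal sketch. The mixing weight `mix`, shared `scale`, and degrees of freedom `dof` below are illustrative parameters, not the paper's actual parameterization (which the TD3 agent would adapt online):

```python
import numpy as np

def gstm_noise(n, mix, scale, dof, rng):
    """Draw n scalar noise samples from a Gaussian-Student's t mixture.

    mix   -- probability of the heavy-tailed Student's t component
    scale -- shared scale of both components
    dof   -- degrees of freedom of the t component (lower = heavier tails)
    """
    from_t = rng.random(n) < mix
    gauss = rng.normal(0.0, scale, size=n)
    student = rng.standard_t(dof, size=n) * scale
    return np.where(from_t, student, gauss)

rng = np.random.default_rng(0)
light = gstm_noise(100_000, mix=0.0, scale=1.0, dof=3, rng=rng)  # pure Gaussian
heavy = gstm_noise(100_000, mix=0.5, scale=1.0, dof=3, rng=rng)  # half heavy-tailed
# The mixture produces far more extreme outliers than the Gaussian alone,
# which is what defeats fixed-noise Kalman-style filters and motivates
# adapting the mixture parameters online.
```
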

EAAI Journal 2026 Journal Article

Dynamic multi-scale feature fusion with spatial-gated frequency attention for oriented object detection in remote sensing images

  • Bo Tian
  • Hui Chen
  • Haoyang Zhang

High-precision oriented object detection in remote sensing imagery is challenged by large scale variation, dense object layouts, and arbitrary object orientations. We present a two-stage detection framework, termed Dynamic Multi-Scale Feature Fusion with Spatial-Gated Frequency Attention (DFGFNet). In the proposed Dynamic Dual-Path Upsampling Fusion (DDPUF), features are decomposed into a local detail stream and a global semantic stream; learnable sampling offsets and pixel-shuffle operations enable adaptive alignment between high-resolution details and low-resolution semantics across pyramid levels. The Spatial-Gated Frequency Attention Module (SGFAM) first uses a lightweight spatial gate to emphasize salient regions, then performs local frequency calibration to amplify orientation-relevant high-frequency components while suppressing irrelevant background signals. The fused, refined pyramid features are fed into a shared detection head for classification and oriented box regression. Extensive experiments on the Dataset for Object Detection in Aerial Images (DOTA-v1.0) show that DFGFNet attains a single-scale mean average precision (mAP) of 77.75% and a multi-scale mAP of 81.08%, demonstrating the approach’s effectiveness and robustness for real-world remote sensing tasks.

AIJ Journal 2026 Journal Article

Federated neural nonparametric point processes

  • Hui Chen
  • Xuhui Fan
  • Hengyu Liu
  • Yaqiong Li
  • Zhilin Zhao
  • Feng Zhou
  • Christopher John Quinn
  • Longbing Cao

Temporal point processes (TPPs) are effective for modeling event occurrences over time, but they struggle with sparse and uncertain events in federated systems, where privacy is a major concern. To address this, we propose FedPP, a Federated neural nonparametric Point Process model. FedPP integrates neural embeddings into Sigmoidal Gaussian Cox Processes (SGCPs), a flexible and expressive class of TPPs, on the client side, allowing it to generate highly flexible intensity functions that capture client-specific event dynamics and uncertainties while efficiently summarizing historical records. For global aggregation, FedPP introduces a divergence-based mechanism that communicates the distributions of SGCPs' kernel hyperparameters between the server and clients, while keeping client-specific parameters local to ensure privacy and personalization. FedPP effectively captures event uncertainty and sparsity, and extensive experiments demonstrate its superior performance in federated settings, particularly with KL divergence and Wasserstein distance-based global aggregation.
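The abstract does not spell out the divergence-based aggregation mechanism. As one hypothetical instance, a server could combine per-client Gaussian posteriors over a shared kernel hyperparameter into a single global Gaussian by moment matching, which is the Gaussian minimizing the summed forward KL divergence from the client posteriors:

```python
def kl_barycenter(posteriors):
    """Aggregate per-client Gaussian posteriors (mu_i, var_i) over a shared
    kernel hyperparameter into one global Gaussian.

    The Gaussian q minimizing sum_i KL(p_i || q) moment-matches the uniform
    mixture of the client posteriors: mean of means, and mixture variance.
    """
    n = len(posteriors)
    mu = sum(m for m, _ in posteriors) / n
    var = sum(v + m * m for m, v in posteriors) / n - mu * mu
    return mu, var

# Three clients, each holding a local posterior over e.g. a kernel lengthscale.
clients = [(0.8, 0.04), (1.2, 0.09), (1.0, 0.01)]
mu_g, var_g = kl_barycenter(clients)
# The global variance exceeds the average client variance because it also
# reflects disagreement between client means.
```
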

EAAI Journal 2026 Journal Article

Machine learning to enhance strain-resilience humidity sensing on flexible surface acoustic wave platform

  • Yanhong Xia
  • Zhangbin Ji
  • Jian Zhou
  • Yihao Guo
  • Hui Chen
  • Jinbo Zhang
  • Yongqing Fu

Flexible surface acoustic wave (SAW) humidity sensors have garnered considerable attention in fields such as environmental monitoring and healthcare, mainly owing to advantages such as wearability, applicability in non-planar scenarios, quasi-digital output, and wireless passive capabilities. However, improving the performance of these flexible SAW humidity sensors faces great challenges, such as a low electromechanical coupling coefficient, poor humidity response or sensitivity, and detection errors introduced by mechanical strain interference. Herein, we developed a flexible SAW humidity sensor utilizing an aluminum scandium nitride (AlScN) piezoelectric film deposited on ultrathin glass substrates, incorporating ternary nanocomposites of graphene quantum dots-polyethyleneimine-silica nanoparticles (GQDs-PEI-SiO2 NPs) as the sensitive layers, which demonstrated an ultra-high sensitivity of 5.02 kHz per % relative humidity (RH). To address the critical issue of strain interference under random bending or deformation conditions, we applied machine learning (ML) algorithms to establish correlations between the sensor's response signal features and humidity labels, thereby effectively mitigating unreliable humidity measurements caused by significant strain interference, with improved precision and specificity. After comprehensive evaluation and analysis using various artificial intelligence algorithms, a multilayer perceptron regression model was identified as the best performer in humidity prediction under strain interference, with a coefficient of determination as high as 0.997 and a mean square error of ∼0.479. The reliability and generalization capabilities of this model were verified. This strategy not only significantly enhances the performance metrics of flexible humidity sensors but also provides an innovative and precise solution for flexible SAW sensors under various strain interferences.

EAAI Journal 2026 Journal Article

Tensile property prediction of titanium and aluminum alloys dissimilar joint by plasma plume characteristics based on a multi-stage cascade model

  • Chuang Cai
  • Fashuai Xiong
  • Zilin Chen
  • Hui Chen
  • Ping Tang
  • Xuanyu Jin
  • Zejun Xian
  • Hua Tang

In this study, a multi-stage cascade machine learning model was established to rapidly predict the tensile property of titanium and aluminum (Ti/Al) alloy dissimilar joints based on plasma plume characteristics during the welding process. To predict the joint tensile property accurately, a multi-stage modeling strategy was adopted for the cascade model. Three stage models, mapping plasma plume characteristics to weld cross-sectional size, weld cross-sectional size to zirconium (Zr) element content, and Zr element content to joint tensile property, were trained using a backpropagation neural network (BPNN), an extreme gradient boosting (XGBoost) model, and a random forest (RF) model, respectively. The three models were then cascaded and verified. The mean absolute error (MAE) of the predicted value in the cascade model was 0.0714 kN, and the coefficient of determination (R2) was 0.9133. The MAE of the BPNN stage exhibited the greatest influence on that of the cascade model (ΔMAE1→3 = 0.106). Although error was transferred and amplified in the multi-stage cascade model, the cascade design of the XGBoost and RF models partially absorbed and rebalanced it; consequently, the amplification of the error was suppressed and kept within a reasonable range. In addition, compared with a single BPNN model based on plasma plume characteristics, the multi-stage cascade model achieved an MAE of 0.0710 kN (reduced from 0.6700 kN) and increased the R2 by 45.3%.

IS Journal 2025 Journal Article

A Generative Random Modality Dropout Framework for Robust Multimodal Emotion Recognition

  • Yang Zhang
  • Hui Chen
  • Imad Rida
  • Xianxun Zhu

Multimodal sentiment analysis faces significant challenges in real-world applications due to the frequent absence of modalities caused by privacy concerns, device limitations, or security policies. This article introduces a generative random modality dropout framework (RMDG), designed to enhance the robustness and performance of multimodal models under various modality absence scenarios. The RMDG method employs a generative approach during the training phase, where random modality dropout is applied to simulate missing modalities. By leveraging the remaining modalities to predict and regenerate the key features of the missing ones, the model effectively adapts to dynamic and unpredictable modality absences. This strategy not only eliminates the need for separate training or adjustments for each modality combination but also significantly improves the efficiency and accuracy of sentiment analysis in incomplete multimodal data scenarios. Extensive experiments demonstrate that RMDG outperforms existing methods, achieving superior performance in both complete and missing modality conditions.
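A minimal, hypothetical sketch of the training-time masking half of the idea (the generative network that regenerates missing features is not shown). The function name, the dict-of-features interface, and the guarantee of keeping at least one modality are assumptions, not the paper's exact recipe:

```python
import random

def random_modality_dropout(features, p_drop=0.3, rng=random):
    """Randomly zero out whole modalities during training, keeping at least one.

    features -- dict modality -> feature vector (list of floats)
    Returns (masked_features, dropped), where dropped lists the modalities the
    generative branch must reconstruct from the ones that remain.
    """
    present = list(features)
    dropped = [m for m in present if rng.random() < p_drop]
    if len(dropped) == len(present):          # never drop every modality
        dropped.remove(rng.choice(dropped))
    masked = {m: ([0.0] * len(v) if m in dropped else v)
              for m, v in features.items()}
    return masked, dropped

rng = random.Random(0)
feats = {"text": [0.2, 0.5], "audio": [1.0, -1.0], "vision": [0.3, 0.3]}
masked, dropped = random_modality_dropout(feats, p_drop=0.5, rng=rng)
# With this seed, "vision" is dropped and zeroed; the model is then trained
# to regenerate its features from "text" and "audio".
```
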

JBHI Journal 2025 Journal Article

Dual Transformer Network for Predicting Joint Angles and Torques From Multi-Channel EMG Signals in the Lower Limbs

  • Zhuo Wang
  • Chunjie Chen
  • Hui Chen
  • Yizhe Zhou
  • Xiangyang Wang
  • Xinyu Wu

Accurate estimation of lower limb joint kinematics and kinetics using wearable sensors enables biomechanical analysis beyond laboratory settings and facilitates real-time adaptation of exoskeleton assistance profiles. This study introduces a Dual Transformer Network (DTN) designed to concurrently estimate multiple joint angles and moments from multi-channel surface electromyography (sEMG) signals in the lower limbs. The performance evaluation of the predicted joint angles for the hip, knee, and ankle showed average root mean square error (RMSE) values of 1.1827°, 1.4312°, and 0.8113°, Pearson correlation coefficients (ρ) of 0.9992, 0.9993, and 0.9991, and coefficients of determination (R²) of 0.9847, 0.9858, and 0.9838, respectively. For the predicted joint moments, the corresponding values were RMSE of 0.0458, 0.0341, and 0.0522 Nm/kg, ρ of 0.9978, 0.9972, and 0.9990, and R² of 0.9825, 0.9801, and 0.9902. Angular velocities, derived by differentiating the estimated joint angles, achieved an RMSE below 0.6530 rad/s, ρ exceeding 0.9534, and R² above 0.9552. Additionally, joint power, computed as the dot product of predicted joint moments and angular velocities, resulted in RMSE below 0.3823 W/kg, ρ above 0.9771, and R² above 0.8925. These results demonstrate the effectiveness of the proposed network in continuously estimating lower limb kinematics and kinetics, contributing to advancements in assist-as-needed exoskeleton control strategies.

IJCAI Conference 2025 Conference Paper

Exploiting Position Information in Convolutional Kernels for Structural Re-parameterization

  • Tianxiang Hao
  • Hui Chen
  • Guiguang Ding

In order to boost the performance of a convolutional neural network (CNN), several approaches have shown the benefit of enhancing the spatial encoding of feature maps. However, few works have paid attention to the positional properties of convolutional kernels. In this paper, we demonstrate that different kernel positions are of different importance, depending on the task, dataset, and architecture, and that adaptively emphasizing the informative parts of convolutional kernels can lead to considerable improvement. Therefore, we propose a novel structural re-parameterization Position Boosting Convolution (PBConv) to exploit and enhance the position information in the convolutional kernel. PBConv consists of several concurrent small convolutional kernels, which can be equivalently converted to the original kernel and bring no extra inference cost. Different from existing structural re-parameterization methods, PBConv searches for the optimal re-parameterized structure with a fast heuristic algorithm based on the dispersion of kernel weights. This heuristic search is efficient yet effective, adapting well to varying kernel weight distributions. As a result, PBConv can significantly improve the representational power of a model, especially its ability to extract fine-grained low-level features. Importantly, PBConv is orthogonal to procedural re-parameterization methods and can further boost performance based on them.
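The core re-parameterization identity (by linearity of convolution, a concurrent small kernel can be folded into the original kernel at inference with no extra cost) can be checked in a few lines. This is a toy single-channel sketch; the center-row placement of the small kernel and the absence of the heuristic search are simplifying assumptions:

```python
import numpy as np

def conv2d(x, k):
    """'Valid' 2-D cross-correlation of a single-channel image with kernel k."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + kh, j:j + kw] * k).sum()
    return out

rng = np.random.default_rng(1)
x = rng.normal(size=(8, 8))
k_full = rng.normal(size=(3, 3))    # original 3x3 kernel
k_small = rng.normal(size=(1, 3))   # concurrent small kernel

# Training-time branches: full kernel + small kernel placed at the center row.
pad_small = np.zeros((3, 3))
pad_small[1, :] = k_small
y_train = conv2d(x, k_full) + conv2d(x, pad_small)

# Inference-time re-parameterization: fold both branches into one 3x3 kernel,
# so the two-branch training structure costs nothing at inference.
k_merged = k_full + pad_small
y_infer = conv2d(x, k_merged)
```
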

AAAI Conference 2025 Conference Paper

GraphAvatar: Compact Head Avatars with GNN-Generated 3D Gaussians

  • Xiaobao Wei
  • Peng Chen
  • Ming Lu
  • Hui Chen
  • Feng Tian

Rendering photorealistic head avatars from arbitrary viewpoints is crucial for various applications like virtual reality. Although previous methods based on Neural Radiance Fields (NeRF) can achieve impressive results, they lack fidelity and efficiency. Recent methods using 3D Gaussian Splatting (3DGS) have improved rendering quality and real-time performance but still require significant storage overhead. In this paper, we introduce a method called GraphAvatar that utilizes Graph Neural Networks (GNN) to generate 3D Gaussians for the head avatar. Specifically, GraphAvatar trains a geometric GNN and an appearance GNN to generate the attributes of the 3D Gaussians from the tracked mesh. Therefore, our method can store the GNN models instead of the 3D Gaussians, significantly reducing the storage overhead to just 10MB. To reduce the impact of face-tracking errors, we also present a novel graph-guided optimization module to refine face-tracking parameters during training. Finally, we introduce a 3D-aware enhancer for post-processing to enhance the rendering quality. We conduct comprehensive experiments to demonstrate the advantages of GraphAvatar, surpassing existing methods in both visual fidelity and storage efficiency. The ablation study sheds light on the trade-offs between rendering quality and model size.

NeurIPS Conference 2025 Conference Paper

MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research

  • Hui Chen
  • Miao Xiong
  • Yujie Lu
  • Wei Han
  • Ailin Deng
  • Yufei He
  • Jiaying Wu
  • Yibo Li

Recent advancements in AI agents have demonstrated their growing potential to drive and support scientific discovery. In this work, we introduce MLR-Bench, a comprehensive benchmark for evaluating AI agents on open-ended machine learning research. MLR-Bench includes three key components: (1) 201 research tasks sourced from NeurIPS, ICLR, and ICML workshops covering diverse ML topics; (2) MLR-Judge, an automated evaluation framework combining LLM-based reviewers with carefully designed review rubrics to assess research quality; and (3) MLR-Agent, a modular agent scaffold capable of completing research tasks through four stages: idea generation, proposal formulation, experimentation, and paper writing. Our framework supports both stepwise assessment across these distinct research stages, and end-to-end evaluation of the final research paper. We then use MLR-Bench to evaluate six frontier LLMs and an advanced coding agent, finding that while LLMs are effective at generating coherent ideas and well-structured papers, current coding agents frequently (e.g., in 80% of the cases) produce fabricated or invalidated experimental results, posing a major barrier to scientific reliability. We validate MLR-Judge through human evaluation, showing high agreement with expert reviewers, supporting its potential as a scalable tool for research evaluation. We open-source MLR-Bench to help the community benchmark, diagnose, and improve AI research agents toward trustworthy and transparent scientific discovery.

NeurIPS Conference 2025 Conference Paper

PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation

  • Ao Wang
  • Hui Chen
  • Jianchao Tan
  • Kefeng Zhang
  • Xunliang Cai
  • Zijia Lin
  • Jungong Han
  • Guiguang Ding

Recently, large vision-language models (LVLMs) have rapidly gained popularity for their strong generation and reasoning capabilities given diverse multimodal inputs. However, these models incur significant computational and memory overhead during inference, which greatly hinders the efficient deployment in practical scenarios. The extensive key-value (KV) cache, necessitated by the lengthy input and output sequences, notably contributes to the high inference cost. Based on this, recent works have investigated ways to reduce the KV cache size for higher efficiency. Although effective, they generally overlook the distinct importance distributions of KV vectors across layers and maintain the same cache size for each layer during the next token prediction. This results in the significant contextual information loss for certain layers, leading to notable performance decline. To address this, we present PrefixKV. It reframes the challenge of determining KV cache sizes for all layers into the task of searching for the optimal global prefix configuration. With an adaptive layer-wise KV retention recipe based on binary search, the maximum contextual information can thus be preserved in each layer, facilitating the generation. Extensive experiments demonstrate that our method achieves the state-of-the-art performance compared with others. It exhibits superior inference efficiency and generation quality trade-offs, showing promising potential for practical applications. Code is available at https://github.com/THU-MIG/PrefixKV.
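A hypothetical sketch in the spirit of the binary-search retention recipe described above: bisect a global importance threshold so that each layer keeps a number of KV entries matching its own importance distribution, while the total stays within one cache budget. The scores, budget, and function name are illustrative, not the released implementation:

```python
def layerwise_retention(importance, budget):
    """Binary-search a global importance threshold t so that keeping, in each
    layer, every cached position with score >= t stays within the total
    KV-cache budget. Layers with many important positions keep more entries.

    importance -- list of per-layer score lists; budget -- max total entries.
    Returns a per-layer boolean retention mask.
    """
    lo, hi = 0.0, max(max(layer) for layer in importance) + 1e-9
    for _ in range(50):                       # bisection to fixed precision
        t = (lo + hi) / 2
        kept = sum(sum(s >= t for s in layer) for layer in importance)
        if kept > budget:
            lo = t                            # too many entries: raise threshold
        else:
            hi = t                            # within budget: try keeping more
    return [[s >= hi for s in layer] for layer in importance]

scores = [[0.9, 0.8, 0.1], [0.7, 0.2, 0.05], [0.95, 0.6, 0.5]]
mask = layerwise_retention(scores, budget=5)
# Layers 0 and 2 keep two entries each, layer 1 keeps one: cache sizes adapt
# per layer instead of being uniform.
```
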

AAAI Conference 2025 Conference Paper

Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning

  • Hui-Yue Yang
  • Hui Chen
  • Ao Wang
  • Kai Chen
  • Zijia Lin
  • Yongliang Tang
  • Pengcheng Gao
  • Yuming Quan

Segment Anything Model (SAM) has made great progress in anomaly segmentation tasks due to its impressive generalization ability. However, existing methods that directly apply SAM through prompting often overlook the domain shift issue, where SAM performs well on natural images but struggles in industrial scenarios. Parameter-Efficient Fine-Tuning (PEFT) offers a promising solution, but it may yield suboptimal performance by not adequately addressing the perception challenges during adaptation to anomaly images. In this paper, we propose a novel Self-Perception Tuning (SPT) method, aiming to enhance SAM's perception capability for anomaly segmentation. The SPT method incorporates a self-drafting tuning strategy, which generates an initial coarse draft of the anomaly mask, followed by a refinement process. Additionally, a visual-relation-aware adapter is introduced to improve the perception of discriminative relational information for mask generation. Extensive experimental results on several benchmark datasets demonstrate that our SPT method can significantly outperform baseline methods, validating its effectiveness.

AAAI Conference 2025 Conference Paper

Scaffold-BPE: Enhancing Byte Pair Encoding for Large Language Models with Simple and Effective Scaffold Token Removal

  • Haoran Lian
  • Yizhe Xiong
  • Jianwei Niu
  • Shasha Mo
  • Zhenpeng Su
  • Zijia Lin
  • Hui Chen
  • Jungong Han

Byte Pair Encoding (BPE) serves as a foundation method for text tokenization in the Natural Language Processing (NLP) field. Despite its wide adoption, the original BPE algorithm harbors an inherent flaw: it inadvertently introduces a frequency imbalance for tokens in the text corpus. Since BPE iteratively merges the most frequent token pair in the text corpus to generate a new token and keeps all generated tokens in the vocabulary, it unavoidably holds tokens that primarily act as components of a longer token and appear infrequently on their own. We term such tokens as Scaffold Tokens. Due to their infrequent occurrences in the text corpus, Scaffold Tokens pose a learning imbalance issue. To address that issue, we propose Scaffold-BPE, which incorporates a dynamic scaffold token removal mechanism by parameter-free, computation-light, and easy-to-implement modifications to the original BPE method. This novel approach ensures the exclusion of low-frequency Scaffold Tokens from the token representations for given texts, thereby mitigating the issue of frequency imbalance and facilitating model training. On extensive experiments across language modeling and even machine translation, Scaffold-BPE consistently outperforms the original BPE, well demonstrating its effectiveness.
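The scaffold-token notion can be illustrated with a toy, assumption-laden heuristic: flag vocabulary tokens that occur far more often inside longer merged tokens than on their own. The actual Scaffold-BPE removal mechanism operates during BPE vocabulary construction and differs in detail:

```python
def find_scaffold_tokens(vocab_counts, merges, ratio=0.1):
    """Flag tokens that occur far less on their own than inside longer merged
    tokens built from them -- the 'scaffold tokens' of the abstract.

    vocab_counts -- token -> standalone frequency in the tokenized corpus
    merges       -- list of (left, right, merged) BPE merge records
    ratio        -- standalone/component frequency ratio below which a token
                    is considered a scaffold
    """
    inside = {}
    for left, right, merged in merges:
        freq = vocab_counts.get(merged, 0)
        inside[left] = inside.get(left, 0) + freq
        inside[right] = inside.get(right, 0) + freq
    return {t for t, own in vocab_counts.items()
            if inside.get(t, 0) > 0 and own < ratio * inside[t]}

# "ing" almost never surfaces alone but is a component of frequent "going":
counts = {"ing": 5, "go": 900, "going": 800, "lowest": 300, "est": 40, "low": 200}
merges = [("go", "ing", "going"), ("low", "est", "lowest")]
scaffolds = find_scaffold_tokens(counts, merges)
```
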

ICLR Conference 2025 Conference Paper

SimpleTM: A Simple Baseline for Multivariate Time Series Forecasting

  • Hui Chen
  • Viet Luong
  • Lopamudra Mukherjee
  • Vikas Singh

The versatility of large Transformer-based models has led to many efforts focused on adaptations to other modalities, including time-series data. For instance, one could start from a pre-trained checkpoint of a large language model and attach adapters to recast the new modality (e.g., time-series) as "language". Alternatively, one can use a suitably large Transformer-based model, and make some modifications for time-series data. These ideas offer good performance across available benchmarks. But temporal data are quite heterogeneous (e.g., wearable sensors, physiological measurements in healthcare), and unlike text/image corpora, much of it is not publicly available. So, these models need a fair bit of domain-specific fine-tuning to achieve good performance; this is often expensive or difficult with limited resources. In this paper, we study and characterize the performance profile of a non-generalist approach: our SimpleTM model is specialized for multivariate time-series forecasting. By simple, we mean that the model is lightweight. It is restricted to tokenization based on textbook signal processing ideas (shown to be effective in vision) which are then allowed to attend/interact: via self-attention but also via ways that are a bit more general than dot-product attention, accomplished via basic geometric algebra operations. We show that even a single- or two-layer model gives results that are competitive with much bigger models, including large transformer-based architectures, on most benchmarks commonly reported in the literature.

NeurIPS Conference 2025 Conference Paper

Single-Step Operator Learning for Conditioned Time-Series Diffusion Models

  • Hui Chen
  • Vikas Singh

Diffusion models have achieved significant success, yet their application to time series data, particularly with regard to efficient sampling, remains an active area of research. We describe an operator-learning approach for conditioned time-series diffusion models that gives efficient single-step generation by leveraging insights from the frequency-domain characteristics of both the time-series data and the diffusion process itself. The forward diffusion process induces a structured, frequency-dependent smoothing of the data's probability density function. However, this frequency smoothing is related (e.g., via the likelihood function) to easily accessible frequency components of time-series data. This suggests that a module operating in the frequency space of the time-series can, potentially, more effectively learn to reverse the frequency-dependent smoothing of the data distribution induced by the diffusion process. We set up an operator learning task, based on frequency-aware building blocks, which satisfies semi-group properties, while exploiting the structure of time-series data. Evaluations on multiple datasets show that our single-step generation proposal achieves forecasting/imputation results comparable (or superior) to many multi-step diffusion schemes while significantly reducing inference costs.

AIIM Journal 2024 Journal Article

Development and validation of a deep interpretable network for continuous acute kidney injury prediction in critically ill patients

  • Meicheng Yang
  • Songqiao Liu
  • Tong Hao
  • Caiyun Ma
  • Hui Chen
  • Yuwen Li
  • Changde Wu
  • Jianfeng Xie

Early detection of acute kidney injury (AKI) may provide a crucial window of opportunity to prevent further injury, which helps improve clinical outcomes. This study aimed to develop a deep interpretable network for continuously predicting the 24-hour AKI risk in real-time and evaluate its performance internally and externally in critically ill patients. A total of 21,163 patients' electronic health records sourced from Beth Israel Deaconess Medical Center (BIDMC) were first included in building the model. Two external validation populations included 3025 patients from the Philips eICU Research Institute and 2625 patients from Zhongda Hospital Southeast University. A total of 152 intelligently engineered predictors were extracted on an hourly basis. The prediction model, referred to as DeepAKI, was designed with the basic framework of squeeze-and-excitation networks with dilated causal convolution embedded. The integrated gradients method was utilized to explain the prediction model. When evaluated on the internal validation set (3175 [15%] patients from BIDMC) and the two external validation sets, DeepAKI obtained areas under the curve of 0.799 (95% CI 0.791–0.806), 0.763 (95% CI 0.755–0.771) and 0.676 (95% CI 0.668–0.684) for continuous AKI prediction, respectively. For model interpretability, clinically relevant important variables contributing to the model prediction were identified, and individual explanations along the timeline were explored to show how AKI risk arose. The potential threats to generalisability in deep learning-based models when deployed across health systems in real-world settings were analyzed.

AAAI Conference 2024 Conference Paper

Geometry-Guided Domain Generalization for Monocular 3D Object Detection

  • Fan Yang
  • Hui Chen
  • Yuwei He
  • Sicheng Zhao
  • Chenghao Zhang
  • Kai Ni
  • Guiguang Ding

Monocular 3D object detection (M3OD) is important for autonomous driving. However, existing deep learning-based methods easily suffer from performance degradation in real-world scenarios due to the substantial domain gap between training and testing. M3OD's domain gaps are complex, including camera intrinsic parameters, extrinsic parameters, image appearance, etc. Existing works primarily focus on the domain gaps of camera intrinsic parameters, ignoring other key factors. Moreover, at the feature level, conventional domain invariant learning methods generally cause the negative transfer issue, due to the ignorance of dependency between geometry tasks and domains. To tackle these issues, in this paper, we propose MonoGDG, a geometry-guided domain generalization framework for M3OD, which effectively addresses the domain gap at both camera and feature levels. Specifically, MonoGDG consists of two major components. One is geometry-based image reprojection, which mitigates the impact of camera discrepancy by unifying intrinsic parameters, randomizing camera orientations, and unifying the field of view range. The other is geometry-dependent feature disentanglement, which overcomes the negative transfer problems by incorporating domain-shared and domain-specific features. Additionally, we leverage a depth-disentangled domain discriminator and a domain-aware geometry regression attention mechanism to account for the geometry-domain dependency. Extensive experiments on multiple autonomous driving benchmarks demonstrate that our method achieves state-of-the-art performance in domain generalization for M3OD.

NeurIPS Conference 2024 Conference Paper

MM-WLAuslan: Multi-View Multi-Modal Word-Level Australian Sign Language Recognition Dataset

  • Xin Shen
  • Heming Du
  • Hongwei Sheng
  • Shuyun Wang
  • Hui Chen
  • Huiqiang Chen
  • Zhuojie Wu
  • Xiaobiao Du

Isolated Sign Language Recognition (ISLR) focuses on identifying individual sign language glosses. Considering the diversity of sign languages across geographical regions, developing region-specific ISLR datasets is crucial for supporting communication and research. Auslan, as a sign language specific to Australia, still lacks a dedicated large-scale word-level dataset for the ISLR task. To fill this gap, we curate the first large-scale Multi-view Multi-modal Word-Level Australian Sign Language recognition dataset, dubbed MM-WLAuslan. Compared to other publicly available datasets, MM-WLAuslan exhibits three significant advantages: (1) the largest amount of data, (2) the most extensive vocabulary, and (3) the most diverse set of multi-modal camera views. Specifically, we record 282K+ sign videos covering 3,215 commonly used Auslan glosses presented by 73 signers in a studio environment. Moreover, our filming system includes two different types of cameras, i.e., three Kinect-V2 cameras and a RealSense camera. We position cameras hemispherically around the front half of the model and simultaneously record videos using all four cameras. Furthermore, we benchmark results with state-of-the-art methods for various multi-modal ISLR settings on MM-WLAuslan, including multi-view, cross-camera, and cross-view. Experiment results indicate that MM-WLAuslan is a challenging ISLR dataset, and we hope this dataset will contribute to the development of Auslan and the advancement of sign languages worldwide. All datasets and benchmarks are available at MM-WLAuslan.

IJCAI Conference 2024 Conference Paper

More is Better: Deep Domain Adaptation with Multiple Sources

  • Sicheng Zhao
  • Hui Chen
  • Hu Huang
  • Pengfei Xu
  • Guiguang Ding

In many practical applications, it is often difficult and expensive to obtain large-scale labeled data to train state-of-the-art deep neural networks. Therefore, transferring the learned knowledge from a separate, labeled source domain to an unlabeled or sparsely labeled target domain becomes an appealing alternative. However, direct transfer often results in significant performance decay due to domain shift. Domain adaptation (DA) aims to address this problem by aligning the distributions between the source and target domains. Multi-source domain adaptation (MDA) is a powerful and practical extension in which the labeled data may be collected from multiple sources with different distributions. In this survey, we first define various MDA strategies. Then we systematically summarize and compare modern MDA methods in the deep learning era from different perspectives, followed by commonly used datasets and a brief benchmark. Finally, we discuss future research directions for MDA that are worth investigating.

TIST Journal 2024 Journal Article

Quintuple-based Representation Learning for Bipartite Heterogeneous Networks

  • Cangqi Zhou
  • Hui Chen
  • Jing Zhang
  • Qianmu Li
  • Dianming Hu

Recent years have seen rapid progress in network representation learning, which removes the need for burdensome feature engineering and facilitates downstream network-based tasks. In reality, networks often exhibit heterogeneity, which means there may exist multiple types of nodes and interactions. Heterogeneous networks raise new challenges to representation learning, as awareness of node and edge types is required. In this article, we study a basic building block of general heterogeneous networks, the heterogeneous networks with two types of nodes. Many problems can be solved by decomposing general heterogeneous networks into multiple bipartite ones. Recently, to overcome the demerits of non-metric measures used in the embedding space, metric learning-based approaches have been leveraged to tackle heterogeneous network representation learning. These approaches first generate triplets of samples, in which an anchor node, a positive counterpart, and a negative one co-exist, and then try to pull positive samples closer and push negative ones away. However, when dealing with heterogeneous networks, even the simplest two-typed ones, triplets cannot simultaneously involve both positive and negative samples from different parts of networks. To address this incompatibility of triplet-based metric learning, we propose a novel quintuple-based method for learning node representations in bipartite heterogeneous networks. Specifically, we generate quintuples that contain positive and negative samples from two different parts of networks. We formulate two learning objectives that accommodate quintuple-based learning samples: a proximity-based loss that models the relations in quintuples by sigmoid probabilities, and an angular loss that more robustly maintains similarity structures. In addition, we parameterize feature learning by using one-dimensional convolution operators around nodes’ neighborhoods. Extensive experiments on two downstream tasks, comparing against eight methods, manifest the effectiveness of our approach.
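As a rough illustration of the quintuple idea, the proximity-based objective can be sketched with plain dot-product similarities; the embeddings, the similarity function, and the exact loss shape below are illustrative assumptions of this sketch, not the paper's formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def quintuple_proximity_loss(anchor, pos_u, neg_u, pos_v, neg_v):
    """Sketch of a quintuple proximity loss: one anchor plus a positive and a
    negative sample from each of the two node parts (u and v) of a bipartite
    network. Positives are pulled closer and negatives pushed away via
    sigmoid probabilities of dot-product similarities (an assumption here)."""
    pull = -np.log(sigmoid(anchor @ pos_u)) - np.log(sigmoid(anchor @ pos_v))
    push = -np.log(sigmoid(-(anchor @ neg_u))) - np.log(sigmoid(-(anchor @ neg_v)))
    return float(pull + push)

rng = np.random.default_rng(0)
anchor = rng.normal(size=8)
# Aligned quintuple: positives point with the anchor, negatives against it.
loss_aligned = quintuple_proximity_loss(anchor, anchor, -anchor, anchor, -anchor)
# Reversing the roles should incur a larger loss.
loss_reversed = quintuple_proximity_loss(anchor, -anchor, anchor, -anchor, anchor)
```

The point of the quintuple over a triplet is visible in the signature: both node parts contribute a positive and a negative sample to the same training example.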

EAAI Journal 2024 Journal Article

SDG: A global large-scale airport perception disparity cognition modeling method based on deep learning and geographic knowledge

  • Ning Li
  • Liang Cheng
  • Hui Chen
  • Yalu Zhang
  • Lei Wang
  • Chen Ji
  • Manchun Li

Global airport perception levels vary due to natural geographical factors and economic development disparities. Understanding these differences is crucial for assessing regional airport development and its correlation with geographical patterns. However, there are limited methods available to effectively comprehend these disparities. To address this issue, this paper proposes a Salience, Disturbance, and Geographic-knowledge (SDG) approach for the cognitive analysis of global large-scale airport perception differences. Salience is assessed using a two-class deep learning model to evaluate the prominence of known airports. Disturbance is evaluated using an object detection model to measure background interference in large-scale airport perception. Geographic-knowledge analysis considers the correlation between regional airports and their surrounding geographic environment. The results rank perception difficulties for 17 regions worldwide, with Tajikistan exhibiting the highest difficulty at 0.922, while the Jiangsu–Zhejiang–Shanghai region in China has the lowest at 0.102. We also performed correlation analyses to validate the effectiveness of our model. To our knowledge, this paper pioneers the cognitive analysis of target perception difficulty differences across multiple global regions.

EAAI Journal 2024 Journal Article

Spatio-temporal features for fast early warning of unplanned self-extubation in ICU

  • Yang Chen
  • Ling Wang
  • Guorong Wang
  • Shuang Yang
  • Yingying Wang
  • MingFang Xiang
  • Xuan Zhang
  • Hui Chen

Patients’ behaviors in Intensive Care Units (ICUs) have garnered research attention, particularly regarding the impact of Unplanned Extubation (UEX). However, there is currently no existing report on methods for early warning of UEX actions in RGB video. Applying traditional human action recognition algorithms to UEX in the complex ICU environment proves challenging. To address this issue, we propose a novel feature for early warning of UEX actions in patients using RGB videos. First, we employ the YOLOv3 detection method to extract the region of interest (ROI), which corresponds to the region where the patient is located. Subsequently, we develop a spatio-temporal (ST) feature for human action tracking using the L-K optical flow algorithm. This ST feature encompasses the optical flow corner number, trajectory distance, and wavelet transform features. Finally, we utilize a support vector machine (SVM) for patient action classification and early warning. Experimental results on the ICU monitoring dataset demonstrate the superior performance of the proposed feature in UEX prediction.
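The spatio-temporal feature described above can be sketched from tracked corner trajectories; the toy trajectory inputs and the FFT-based stand-in for the wavelet-transform features below are assumptions of this illustration, not the paper's implementation.

```python
import numpy as np

def st_feature(trajectories):
    """Sketch of a spatio-temporal feature vector in the spirit of the paper:
    number of tracked corners, total trajectory distance, and a crude
    frequency-energy summary standing in for the wavelet-transform features.
    `trajectories` is a list of (T, 2) arrays of corner positions per frame."""
    n_corners = len(trajectories)
    # Per-frame step lengths for each corner trajectory.
    step_lengths = [np.linalg.norm(np.diff(t, axis=0), axis=1) for t in trajectories]
    dist = float(sum(s.sum() for s in step_lengths))
    # Frequency content of the speed signal (stand-in for wavelet features).
    speeds = np.concatenate(step_lengths)
    freq_energy = float(np.abs(np.fft.rfft(speeds))[1:].sum())
    return np.array([n_corners, dist, freq_energy])

# Two toy corner trajectories tracked over 4 frames.
traj_a = np.array([[0, 0], [1, 0], [2, 0], [3, 0]], dtype=float)
traj_b = np.array([[0, 0], [0, 2], [0, 4], [0, 6]], dtype=float)
feat = st_feature([traj_a, traj_b])
```

In the paper's pipeline such a vector would be computed inside the YOLOv3-detected ROI and fed to the SVM classifier.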

IJCAI Conference 2024 Conference Paper

TaD: A Plug-and-Play Task-Aware Decoding Method to Better Adapt LLMs on Downstream Tasks

  • Xinhao Xu
  • Hui Chen
  • Zijia Lin
  • Jungong Han
  • Lixing Gong
  • Guoxin Wang
  • Yongjun Bao
  • Guiguang Ding

Fine-tuning pre-trained models on downstream tasks is a common practice in leveraging large language models (LLMs) today. A critical issue is how to adapt pre-trained models to downstream tasks better, thereby enhancing their performance. This paper introduces Task-aware Decoding (TaD), a plug-and-play method that exploits the difference in probability distributions before and after fine-tuning to boost the performance of LLMs on downstream tasks. The proposed TaD argues that the difference between the pre-finetuning probability distribution and the post-finetuning one represents the direction from common knowledge towards specific downstream-task knowledge. Aligning the final output probability distribution to that direction can probably result in superior downstream task performance, compared to the original fine-tuned model. Experiments on various datasets across four different task categories well demonstrate TaD's effectiveness on different LLMs, i.e., GPT, BLOOM, and LLaMA, with different fine-tuning methods. Moreover, further experiments reveal that TaD better enhances model performance in data-scarce scenarios.
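A minimal numerical sketch of this decoding idea, assuming the adjustment takes the common contrastive form of extrapolating the fine-tuned logits along the pre-to-post-finetuning direction; the mixing weight `alpha` and the toy distributions are illustrative assumptions, not values from the paper.

```python
import numpy as np

def task_aware_decode(logits_pre, logits_ft, alpha=1.0):
    """Sketch of TaD-style decoding: shift the fine-tuned logits further along
    the direction from the pre-finetuning distribution to the post-finetuning
    one, then renormalize. `alpha` is an assumed hyperparameter."""
    direction = logits_ft - logits_pre        # common -> task-specific knowledge
    adjusted = logits_ft + alpha * direction  # extrapolate along that direction
    exp = np.exp(adjusted - adjusted.max())   # numerically stable softmax
    return exp / exp.sum()

# Toy vocabulary of 4 tokens; fine-tuning raised token 2's logit.
logits_pre = np.array([1.0, 0.5, 0.2, -0.3])
logits_ft = np.array([0.8, 0.4, 1.5, -0.5])
p = task_aware_decode(logits_pre, logits_ft, alpha=0.5)
```

With a positive `alpha`, tokens whose probability grew during fine-tuning are amplified further, which is the intuition behind aligning the output distribution with the task-specific direction.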

NeurIPS Conference 2024 Conference Paper

TPR: Topology-Preserving Reservoirs for Generalized Zero-Shot Learning

  • Hui Chen
  • Yanbin Liu
  • Yongqiang Ma
  • Nanning Zheng
  • Xin Yu

Pre-trained vision-language models (VLMs) such as CLIP have shown excellent performance for zero-shot classification. Based on CLIP, recent methods design various learnable prompts to evaluate the zero-shot generalization capability on a base-to-novel setting. This setting assumes test samples are already divided into either base or novel classes, limiting its application to realistic scenarios. In this paper, we focus on a more challenging and practical setting: generalized zero-shot learning (GZSL), i.e., testing with no information about the base/novel division. To address this challenging zero-shot problem, we introduce two unique designs that enable us to classify an image without the need of knowing whether it comes from seen or unseen classes. Firstly, most existing methods only adopt a single latent space to align visual and linguistic features, which has a limited ability to represent complex visual-linguistic patterns, especially for fine-grained tasks. Instead, we propose a dual-space feature alignment module that effectively augments the latent space with a novel attribute space induced by a well-devised attribute reservoir. In particular, the attribute reservoir consists of a static vocabulary and learnable tokens complementing each other for flexible control over feature granularity. Secondly, finetuning CLIP models (e.g., prompt learning) on seen base classes usually sacrifices the model's original generalization capability on unseen novel classes. To mitigate this issue, we present a new topology-preserving objective that can enforce feature topology structures of the combined base and novel classes to resemble the topology of CLIP. In this manner, our model will inherit the generalization ability of CLIP through maintaining the pairwise class angles in the attribute space. Extensive experiments on twelve object recognition datasets demonstrate that our model, termed Topology-Preserving Reservoir (TPR), outperforms strong baselines including both prompt learning and conventional generative-based zero-shot methods.

AAAI Conference 2024 Conference Paper

Uncertainty Quantification for Data-Driven Change-Point Learning via Cross-Validation

  • Hui Chen
  • Yinxu Jia
  • Guanghui Wang
  • Changliang Zou

Accurately detecting multiple change-points is critical for various applications, but determining the optimal number of change-points remains a challenge. Existing approaches based on information criteria attempt to balance goodness-of-fit and model complexity, but their performance varies depending on the model. Recently, data-driven selection criteria based on cross-validation have been proposed, but these methods can be prone to slight overfitting in finite samples. In this paper, we introduce a method that controls the probability of overestimation and provides uncertainty quantification for learning multiple change-points via cross-validation. We frame this problem as a sequence of model comparison problems and leverage high-dimensional inferential procedures. We demonstrate the effectiveness of our approach through experiments on finite-sample data, showing superior uncertainty quantification for overestimation compared to existing methods. Our approach has broad applicability and can be used in diverse change-point models.

NeurIPS Conference 2024 Conference Paper

YOLOv10: Real-Time End-to-End Object Detection

  • Ao Wang
  • Hui Chen
  • Lihao Liu
  • Kai Chen
  • Zijia Lin
  • Jungong Han
  • Guiguang Ding

Over the past years, YOLOs have emerged as the predominant paradigm in the field of real-time object detection owing to their effective balance between computational cost and detection performance. Researchers have explored the architectural designs, optimization objectives, data augmentation strategies, and more for YOLOs, achieving notable progress. However, the reliance on non-maximum suppression (NMS) for post-processing hampers the end-to-end deployment of YOLOs and adversely impacts the inference latency. Besides, the design of various components in YOLOs lacks comprehensive and thorough inspection, resulting in noticeable computational redundancy and limiting the model's capability. This results in suboptimal efficiency, along with considerable potential for performance improvements. In this work, we aim to further advance the performance-efficiency boundary of YOLOs from both the post-processing and the model architecture. To this end, we first present consistent dual assignments for NMS-free training of YOLOs, which brings competitive performance and low inference latency simultaneously. Moreover, we introduce a holistic efficiency-accuracy driven model design strategy for YOLOs. We comprehensively optimize various components of YOLOs from both the efficiency and accuracy perspectives, which greatly reduces the computational overhead and enhances the capability. The outcome of our effort is a new generation of the YOLO series for real-time end-to-end object detection, dubbed YOLOv10. Extensive experiments show that YOLOv10 achieves state-of-the-art performance and efficiency across various model scales. For example, our YOLOv10-S is 1.8× faster than RT-DETR-R18 under a similar AP on COCO, while enjoying 2.8× fewer parameters and FLOPs. Compared with YOLOv9-C, YOLOv10-B has 46% less latency and 25% fewer parameters for the same performance. Code and models are available at https://github.com/THU-MIG/yolov10.

EAAI Journal 2023 Journal Article

A hybrid adaptive Differential Evolution based on Gaussian tail mutation

  • Hui Chen
  • Shaolang Li
  • Xiaobo Li
  • Yuxin Zhao
  • Junwei Dong

In this paper, we propose an improved version of JADE, named HJADE-GT, by hybridizing the JADE algorithm with a Gaussian tail, a modified hunger games search (HGS) algorithm, and a distance-based multi-population (DbMP) approach. In the proposed algorithm, two sets of operators (a modified HGS operator and a JADE operator with a Gaussian tail) are utilized to generate offspring to further enhance the exploration and exploitation abilities, and the DbMP approach is proposed to make full use of feedback information from the whole population. In HJADE-GT, the main population is divided into three fixed-size subpopulations: an exploration subpopulation, a balanced subpopulation, and an exploitation subpopulation. The modified HGS operator is incorporated into the exploration subpopulation to improve global search ability, while the JADE operator with a Gaussian tail is utilized to enhance the exploitation subpopulation. Finally, the DbMP approach is applied to the balanced subpopulation to choose an appropriate operator for current individuals, making full use of feedback information from the exploration and exploitation subpopulations. In the experimental studies, the proposed algorithm demonstrates competitive performance against 13 well-known algorithms, including jDE, SaDE, JADE, MPEDE, SHADE, CoDE, SaJADE, iLSHADE, jSO, CMAES, MGFPA, ESSA, and PPSO, on the CEC2017 benchmark functions. Four engineering problems and three dynamic economic emission dispatch (DEED) problems were utilized to verify the performance of HJADE-GT, and the experiments on DEED problems confirm that HJADE-GT is an efficient algorithm for solving engineering and large-scale constrained DEED problems.

YNICL Journal 2023 Journal Article

Aberrant dynamic Functional-Structural connectivity coupling of Large-scale brain networks in poststroke motor dysfunction

  • Xiaoying Liu
  • Shuting Qiu
  • Xiaoyang Wang
  • Hui Chen
  • Yuting Tang
  • Yin Qin

BACKGROUND AND PURPOSE: Stroke may lead to widespread functional and structural reorganization in the brain. Several studies have reported a potential correlation between functional network changes and structural network changes after stroke. However, it is unclear how functional-structural relationships change dynamically over the course of one resting-state fMRI scan in patients following a stroke; furthermore, we know little about their relationships with the severity of motor dysfunction. Therefore, this study aimed to investigate dynamic functional and structural connectivity (FC-SC) coupling and its relationship with motor function in subcortical stroke from the perspective of network dynamics. METHODS: Resting-state functional magnetic resonance imaging and diffusion tensor imaging were obtained from 39 stroke patients (19 severe and 20 moderate) and 22 healthy controls (HCs). Brain structural networks were constructed by tracking fiber tracts in diffusion tensor imaging, and structural network topology metrics were calculated using a graph-theoretic approach. Independent component analysis, the sliding window method, and k-means clustering were used to calculate dynamic functional connectivity and to estimate different dynamic connectivity states. The temporal patterns and intergroup differences of FC-SC coupling were analyzed within each state. We also calculated dynamic FC-SC coupling and its relationship with functional network efficiency. In addition, the correlation between FC-SC coupling and the Fugl-Meyer assessment scale was analyzed. RESULTS: For SC, stroke patients showed lower global efficiency than HCs (all P < 0.05), and severely affected patients had a higher characteristic path length (P = 0.003). For FC and FC-SC coupling, stroke patients predominantly showed lower local efficiency and reduced FC-SC coupling than HCs in state 2 (all P < 0.05). Furthermore, severely affected patients also showed lower local efficiency (P = 0.031) and reduced FC-SC coupling (P = 0.043) in state 3, which was markedly linked to the severity of motor dysfunction after stroke. In addition, FC-SC coupling was correlated with functional network efficiency in state 2 in moderately affected patients (r = 0.631, P = 0.004) but not significantly in severely affected patients. CONCLUSIONS: Stroke patients show abnormal dynamic FC-SC coupling characteristics, especially in individuals with severe injuries. These findings may contribute to a better understanding of the anatomical functional interactions underlying motor deficits in stroke patients and provide useful information for personalized rehabilitation strategies.

EAAI Journal 2023 Journal Article

Airport detection in remote sensing real-open world using deep learning

  • Ning Li
  • Liang Cheng
  • Chen Ji
  • Hui Chen
  • WanXuan Geng
  • WeiMing Yang

The remote sensing real-open world of large-scale areas brings a high false alarm rate to object detection because of highly complex backgrounds. In this study, we constructed a two-stage extraction framework, candidate region extraction (CRE)–multi-core binary analysis (MCBA), abbreviated CRE-MCBA, to improve the correct detection rate (DR) and reduce the error DR for airport extraction in large-scale remote sensing real-open areas. First, global sample labeling and large-scale runway CRE were conducted. Open-sourced data were applied to match the detection results spatially, and the MCBA was built to address the issue of unbalanced positive and negative samples when mining potential airports. A minimum penalty term δ was also introduced into the focal loss to improve detection ability in remote sensing real-open world areas. In the 219,041 km² study area at the Yangtze River Delta in China, the detection and error reduction rates were 100% and 97.3%, respectively. A total of 37 airports with prominent runway characteristics were detected, including 9 newly added airports. We also tested the CRE-MCBA framework in Japan, the Korean Peninsula, and Madhya Pradesh, India. Compared with other detection methods, ours has more robust regional adaptability and generalization ability and realizes the practical mining of potential objects.

IJCAI Conference 2023 Conference Paper

Bayesian Federated Learning: A Survey

  • Longbing Cao
  • Hui Chen
  • Xuhui Fan
  • Joao Gama
  • Yew-Soon Ong
  • Vipin Kumar

Federated learning (FL) demonstrates its advantages in integrating distributed infrastructure, communication, computing and learning in a privacy-preserving manner. However, the robustness and capabilities of existing FL methods are challenged by limited and dynamic data and conditions, complexities including heterogeneities and uncertainties, and analytical explainability. Bayesian federated learning (BFL) has emerged as a promising approach to address these issues. This survey presents a critical overview of BFL, including its basic concepts, its relations to Bayesian learning in the context of FL, and a taxonomy of BFL from both Bayesian and federated perspectives. We categorize and discuss client- and server-side and FL-based BFL methods and their pros and cons. The limitations of the existing BFL methods and the future directions of BFL research further address the intricate requirements of real-life FL applications.

AAAI Conference 2023 Conference Paper

Learning to Shape Rewards Using a Game of Two Partners

  • David Mguni
  • Taher Jafferjee
  • Jianhong Wang
  • Nicolas Perez-Nieves
  • Wenbin Song
  • Feifei Tong
  • Matthew Taylor
  • Tianpei Yang

Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards. However, RS typically relies on manually engineered shaping-reward functions whose construction is time-consuming and error-prone. It also requires domain knowledge which runs contrary to the goal of autonomous learning. We introduce Reinforcement Learning Optimising Shaping Algorithm (ROSA), an automated reward shaping framework in which the shaping-reward function is constructed in a Markov game between two agents. A reward-shaping agent (Shaper) uses switching controls to determine which states to add shaping rewards for more efficient learning while the other agent (Controller) learns the optimal policy for the task using these shaped rewards. We prove that ROSA, which adopts existing RL algorithms, learns to construct a shaping-reward function that is beneficial to the task thus ensuring efficient convergence to high-performance policies. We demonstrate ROSA’s properties in three didactic experiments and show its superior performance against state-of-the-art RS algorithms in challenging sparse reward environments.

IJCAI Conference 2022 Conference Paper

Bridging the Gap between Reality and Ideality of Entity Matching: A Revisiting and Benchmark Re-Construction

  • Tianshu Wang
  • Hongyu Lin
  • Cheng Fu
  • Xianpei Han
  • Le Sun
  • Feiyu Xiong
  • Hui Chen
  • Minlong Lu

Entity matching (EM) is the most critical step for entity resolution (ER). While current deep learning-based methods achieve very impressive performance on standard EM benchmarks, their real-world application performance is far less satisfactory. In this paper, we highlight that such a gap between reality and ideality stems from an unreasonable benchmark construction process, which is inconsistent with the nature of entity matching and therefore leads to biased evaluations of current EM approaches. To this end, we build a new EM corpus and re-construct EM benchmarks to challenge critical assumptions implicit in the previous benchmark construction process by step-wisely changing the restricted entities, balanced labels, and single-modal records in previous benchmarks into open entities, imbalanced labels, and multi-modal records in an open environment. Experimental results demonstrate that the assumptions made in the previous benchmark construction process do not hold in the open environment, which conceals the main challenges of the task and therefore significantly overestimates the current progress of entity matching. The constructed benchmarks and code are publicly released at https://github.com/tshu-w/ember.

AAAI Conference 2022 Short Paper

Learning to Ask for Data-Efficient Event Argument Extraction (Student Abstract)

  • Hongbin Ye
  • Ningyu Zhang
  • Zhen Bi
  • Shumin Deng
  • Chuanqi Tan
  • Hui Chen
  • Fei Huang
  • Huajun Chen

Event argument extraction (EAE) is an important task for information extraction to discover specific argument roles. In this study, we cast EAE as a question-based cloze task and empirically analyze fixed discrete token template performance. As generating human-annotated question templates is often time-consuming and labor-intensive, we further propose a novel approach called “Learning to Ask,” which can learn optimized question templates for EAE without human annotations. Experiments using the ACE-2005 dataset demonstrate that our method based on optimized questions achieves state-of-the-art performance in both the few-shot and supervised settings.

NeurIPS Conference 2022 Conference Paper

Neural-Symbolic Entangled Framework for Complex Query Answering

  • Zezhong Xu
  • Wen Zhang
  • Peng Ye
  • Hui Chen
  • Huajun Chen

Answering complex queries over knowledge graphs (KG) is an important yet challenging task because of the KG incompleteness issue and cascading errors during reasoning. Recent query embedding (QE) approaches embed the entities and relations in a KG and the first-order logic (FOL) queries into a low-dimensional space, so that queries can be answered by dense similarity search. However, previous works mainly concentrate on the target answers, ignoring the usefulness of intermediate entities, which is essential for relieving the cascading error problem in logical query answering. In addition, these methods are usually designed with their own geometric or distributional embeddings to handle logical operators like union, intersection, and negation, sacrificing the accuracy of the basic operator, projection, and they cannot absorb other embedding methods into their models. In this work, we propose a Neural and Symbolic Entangled framework (ENeSy) for complex query answering, which enables neural and symbolic reasoning to enhance each other to alleviate the cascading error and KG incompleteness. The projection operator in ENeSy can be any embedding method with the capability of link prediction, and the other FOL operators are handled without parameters. With both neural and symbolic reasoning results contained, ENeSy answers queries in an ensemble manner. We evaluate ENeSy on complex query answering benchmarks, and it achieves state-of-the-art performance, especially in the setting of training the model only with the link prediction task.

IROS Conference 2022 Conference Paper

Quantity over Quality: Training an AV Motion Planner with Large Scale Commodity Vision Data

  • Lukas Platinsky
  • Tayyab Naseer
  • Hui Chen
  • Ben Haines
  • Haoyue Zhu
  • Hugo Grimmett
  • Luca Del Pero

With the Autonomous Vehicle (AV) industry shifting towards machine-learned approaches for motion planning [1], the performance of self-driving systems is starting to rely heavily on large quantities of expert driving demonstrations. However, collecting this demonstration data typically involves expensive HD sensor suites (LiDAR + RADAR + cameras), which quickly becomes financially infeasible at the scales required. This motivates the use of commodity sensors like cameras for data collection, which are an order of magnitude cheaper than HD sensor suites, but offer lower fidelity. Leveraging these sensors for training an AV motion planner opens a financially viable path to observe the ‘long tail’ of driving events. As our main contribution we show it is possible to train a high-performance motion planner using commodity vision data which outperforms planners trained on HD-sensor data for a fraction of the cost. To the best of our knowledge, we are the first to demonstrate this using real-world data. We compare the performance of the autonomy system on these two different sensor configurations, and show that we can compensate for the lower sensor fidelity by means of increased quantity: a planner trained on 100h of commodity vision data outperforms the one with 25h of expensive HD data (see Fig. 1). We also share the engineering challenges we had to tackle to make this work.

JBHI Journal 2021 Journal Article

Deep Learning Methods for Lung Cancer Segmentation in Whole-Slide Histopathology Images—The ACDC@LungHP Challenge 2019

  • Zhang Li
  • Jiehua Zhang
  • Tao Tan
  • Xichao Teng
  • Xiaoliang Sun
  • Hong Zhao
  • Lihong Liu
  • Yang Xiao

Accurate segmentation of lung cancer in pathology slides is a critical step in improving patient care. We proposed the ACDC@LungHP (Automatic Cancer Detection and Classification in Whole-slide Lung Histopathology) challenge for evaluating different computer-aided diagnosis (CADs) methods on the automatic diagnosis of lung cancer. The ACDC@LungHP 2019 focused on segmentation (pixel-wise detection) of cancer tissue in whole slide imaging (WSI), using an annotated dataset of 150 training images and 50 test images from 200 patients. This paper reviews this challenge and summarizes the top 10 submitted methods for lung cancer segmentation. All methods were evaluated using precision, accuracy, sensitivity, specificity, and the DICE coefficient (DC). The DC ranged from 0.7354 ± 0.1149 to 0.8372 ± 0.0858. The DC of the best method was close to the inter-observer agreement (0.8398 ± 0.0890). All methods were based on deep learning and categorized into two groups: multi-model methods and single-model methods. In general, multi-model methods were significantly better (p < 0.01) than single-model methods, with mean DCs of 0.7966 and 0.7544, respectively. Deep learning-based methods could potentially help pathologists find suspicious regions for further analysis of lung cancer in WSI.
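The DICE coefficient used for scoring has a simple closed form, 2|A∩B| / (|A| + |B|); a minimal sketch on boolean masks (the toy masks below are illustrative):

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice coefficient (DC): twice the overlap between two segmentation
    masks divided by the sum of their sizes. Masks are boolean/0-1 arrays."""
    pred, target = pred.astype(bool), target.astype(bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(pred, target).sum() / denom

pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
dc = dice_coefficient(pred, target)  # overlap 2, sizes 3 + 3 -> 2/3
```

A DC of 1.0 indicates a pixel-perfect match, which is why the best submission's 0.8372 ± 0.0858 being close to the inter-observer agreement of 0.8398 ± 0.0890 is notable.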

NeurIPS Conference 2020 Conference Paper

A Variational Approach for Learning from Positive and Unlabeled Data

  • Hui Chen
  • Fangqing Liu
  • Yin Wang
  • Liyue Zhao
  • Hao Wu

Learning binary classifiers only from positive and unlabeled (PU) data is an important and challenging task in many real-world applications, including web text classification, disease gene identification and fraud detection, where negative samples are difficult to verify experimentally. Most recent PU learning methods are developed based on the misclassification risk of the supervised learning type, and they may suffer from inaccurate estimates of class prior probabilities. In this paper, we introduce a variational principle for PU learning that allows us to quantitatively evaluate the modeling error of the Bayesian classifier directly from given data. This leads to a loss function which can be efficiently calculated without involving class prior estimation or any other intermediate estimation problems, and the variational learning method can then be employed to optimize the classifier under general conditions. We illustrate the effectiveness of the proposed variational method on a number of benchmark examples.

AAAI Conference 2020 Conference Paper

An Iterative Polishing Framework Based on Quality Aware Masked Language Model for Chinese Poetry Generation

  • Liming Deng
  • Jie Wang
  • Hangming Liang
  • Hui Chen
  • Zhiqiang Xie
  • Bojin Zhuang
  • Shaojun Wang
  • Jing Xiao

Owing to its unique literal and aesthetical characteristics, automatic generation of Chinese poetry is still challenging in Artificial Intelligence, and can hardly be straightforwardly realized by end-to-end methods. In this paper, we propose a novel iterative polishing framework for highly qualified Chinese poetry generation. In the first stage, an encoder-decoder structure is utilized to generate a poem draft. Afterwards, our proposed Quality-Aware Masked Language Model (QA-MLM) is employed to polish the draft towards higher quality in terms of linguistics and literalness. Based on a multi-task learning scheme, QA-MLM is able to determine whether polishing is needed based on the poem draft. Furthermore, QA-MLM is able to localize improper characters of the poem draft and substitute them with newly predicted ones accordingly. Benefiting from the masked language model structure, QA-MLM incorporates global context information into the polishing process, which can obtain more appropriate polishing results than unidirectional sequential decoding. Moreover, the iterative polishing process is terminated automatically when QA-MLM regards the processed poem as a qualified one. Both human and automatic evaluation have been conducted, and the results demonstrate that our approach is effective in improving the performance of the encoder-decoder structure.
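The iterative polishing loop can be sketched generically; the `quality` and `propose` callables below are toy stand-ins for QA-MLM's judge-and-substitute roles, not the paper's model.

```python
def iterative_polish(draft, quality, propose, max_iters=10):
    """Sketch of the iterative polishing loop: a quality model decides whether
    the draft still needs polishing and localizes the worst position; if so,
    a proposal step substitutes a newly predicted character there. Both
    `quality` and `propose` are assumed interfaces for this illustration."""
    poem = list(draft)
    for _ in range(max_iters):
        ok, worst_pos = quality(poem)
        if ok:  # the judge deems the poem qualified: stop automatically
            break
        poem[worst_pos] = propose(poem, worst_pos)  # substitute predicted char
    return "".join(poem)

# Toy setup: '?' marks a low-quality character; the proposer always emits 'x'.
quality = lambda poem: ('?' not in poem, poem.index('?') if '?' in poem else -1)
propose = lambda poem, i: 'x'
polished = iterative_polish("ab?cd?e", quality, propose)
```

The termination condition mirrors the paper's design: polishing stops as soon as the quality judge accepts the draft, rather than after a fixed number of passes.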

AAAI Conference 2020 Conference Paper

Enhanced Meta-Learning for Cross-Lingual Named Entity Recognition with Minimal Resources

  • Qianhui Wu
  • Zijia Lin
  • Guoxin Wang
  • Hui Chen
  • Börje F. Karlsson
  • Biqing Huang
  • Chin-Yew Lin

For languages with no annotated resources, transferring knowledge from rich-resource languages is an effective solution for named entity recognition (NER). While all existing methods directly transfer from a source-learned model to a target language, in this paper, we propose to fine-tune the learned model with a few similar examples given a test case, which could benefit the prediction by leveraging the structural and semantic information conveyed in such similar examples. To this end, we present a meta-learning algorithm to find a good model parameter initialization that can fast adapt to the given test case, and propose to construct multiple pseudo-NER tasks for meta-training by computing sentence similarities. To further improve the model’s generalization ability across different languages, we introduce a masking scheme and augment the loss function with an additional maximum term during meta-training. We conduct extensive experiments on cross-lingual named entity recognition with minimal resources over five target languages. The results show that our approach significantly outperforms existing state-of-the-art methods across the board.

AIIM Journal 2019 Journal Article

Accurate prediction of potential druggable proteins based on genetic algorithm and Bagging-SVM ensemble classifier

  • Jianying Lin
  • Hui Chen
  • Shan Li
  • Yushuang Liu
  • Xuan Li
  • Bin Yu

Discovering and accurately locating drug targets is of great significance for the research and development of new drugs. As an alternative to traditional drug development, machine learning algorithms can predict drug targets by mining data; because of their short turnaround time and low cost, they have received increasing attention in recent years. In this paper, we propose a novel method for predicting druggable proteins. Firstly, the features of the protein sequence are extracted by combining Chou’s pseudo amino acid composition (PseAAC), dipeptide composition (DPC) and reduced sequence (RS), yielding a 591-dimensional representation of the drug target dataset. Then, the feature information of the druggable proteins dataset is selected by a genetic algorithm (GA). Finally, we use Bagging ensemble learning to improve the SVM classifier and obtain the final prediction model. The predictive accuracy reaches 93.78% under 5-fold cross-validation, and the method is compared with other state-of-the-art predictive methods. The results indicate that the method proposed in this paper has high reference value for the prediction of potential drug targets and can play a key role in drug research and development. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/GA-Bagging-SVM.
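
The Bagging step of this pipeline can be illustrated with a toy sketch: base classifiers are trained on bootstrap resamples and combined by majority vote. A 1-nearest-neighbour rule stands in for the SVM base learner, and the feature extraction (PseAAC, DPC, RS) and GA feature selection stages are outside this sketch.

```python
import random

# Toy Bagging ensemble: each base model sees a bootstrap resample of the
# training set; predictions are combined by majority vote.

def fit_bagging(X, y, n_estimators=7, seed=0):
    rng = random.Random(seed)
    bags = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in X]        # bootstrap resample
        bags.append(([X[i] for i in idx], [y[i] for i in idx]))
    return bags

def predict(bags, x):
    votes = []
    for Xs, ys in bags:
        # 1-NN inside this bag (stand-in for an SVM trained on the bag)
        j = min(range(len(Xs)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(Xs[i], x)))
        votes.append(ys[j])
    return max(set(votes), key=votes.count)             # majority vote

# Two well-separated toy classes.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
     (1.0, 1.0), (0.9, 1.0), (1.0, 0.9), (1.1, 1.1)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
bags = fit_bagging(X, y)
print(predict(bags, (0.05, 0.05)), predict(bags, (1.0, 1.0)))
```

Resampling decorrelates the base learners, which is what lets the vote reduce variance relative to a single classifier.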

AAAI Conference 2019 Conference Paper

GRN: Gated Relation Network to Enhance Convolutional Neural Network for Named Entity Recognition

  • Hui Chen
  • Zijia Lin
  • Guiguang Ding
  • Jianguang Lou
  • Yusen Zhang
  • Borje Karlsson

The dominant approaches for named entity recognition (NER) mostly adopt complex recurrent neural networks (RNN), e.g., long short-term memory (LSTM). However, RNNs are limited by their recurrent nature in terms of computational efficiency. In contrast, convolutional neural networks (CNN) can fully exploit GPU parallelism with their feedforward architectures. However, little attention has been paid to performing NER with CNNs, mainly owing to their difficulty in capturing long-term context information in a sequence. In this paper, we propose a simple but effective CNN-based network for NER, i.e., the gated relation network (GRN), which is more capable than common CNNs of capturing long-term context. Specifically, in GRN we first employ CNNs to explore the local context features of each word. We then model the relations between words and use them as gates to fuse local context features into global ones for predicting labels. Without using recurrent layers that process a sentence sequentially, our GRN allows computations to be performed in parallel across the entire sentence. Experiments on two benchmark NER datasets (i.e., CoNLL-2003 and OntoNotes 5.0) show that our proposed GRN can achieve state-of-the-art performance with or without external knowledge, and it also enjoys lower training and test time costs.
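
The gating idea can be sketched numerically: pairwise relation scores between word positions act as sigmoid gates that mix every word's local feature into a global feature for each position. A dot-product relation stands in for the paper's learned relation layer, and the CNN that produces the local features is assumed to have already run.

```python
import math

# Sketch of gated relation fusion: for each position i, every position j's
# local feature is weighted by a sigmoid gate on the i-j relation score,
# and the gated features are averaged into a global feature for i.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_fusion(local_feats):
    """local_feats: list of per-word feature vectors (lists of floats)."""
    n = len(local_feats)
    global_feats = []
    for i in range(n):
        fused = [0.0] * len(local_feats[i])
        for j in range(n):
            # relation score between positions i and j (stand-in: dot product)
            r = sum(a * b for a, b in zip(local_feats[i], local_feats[j]))
            g = sigmoid(r)                      # gate in (0, 1)
            fused = [f + g * x / n for f, x in zip(fused, local_feats[j])]
        global_feats.append(fused)
    return global_feats

out = gated_fusion([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(out)
```

Because each position's fusion depends only on the already-computed local features, all positions can be processed in parallel, which is the efficiency argument made above.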

IJCAI Conference 2018 Conference Paper

Show, Observe and Tell: Attribute-driven Attention Model for Image Captioning

  • Hui Chen
  • Guiguang Ding
  • Zijia Lin
  • Sicheng Zhao
  • Jungong Han

Despite the fact that attribute-based approaches and attention-based approaches have been proven effective in image captioning, most attribute-based approaches simply predict attributes independently, without taking the co-occurrence dependencies among attributes into account. Besides, most attention-based captioning models directly leverage the feature map extracted from a CNN, in which many features may be redundant in relation to the image content. In this paper, we focus on training a good attribute-inference model via a recurrent neural network (RNN) for image captioning, where the co-occurrence dependencies among attributes can be maintained. The uniqueness of our inference model lies in its use of an RNN with a visual attention mechanism to observe the image before generating captions. Additionally, we notice that compact and attribute-driven features are more useful for the attention-based captioning model. To this end, we extract a context feature for each attribute and guide the captioning model to adaptively attend to these context features. We verify the effectiveness and superiority of the proposed approach over other captioning approaches by conducting extensive experiments and comparisons on the MS COCO image captioning dataset.
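
Attending over per-attribute context features can be sketched as a softmax-weighted sum. Dot-product scoring against a query vector stands in for the learned attention layer; the query would come from the captioning RNN's hidden state.

```python
import math

# Sketch of soft attention: score each context feature against a query,
# softmax-normalize the scores into weights, and return the weighted sum.

def softmax(scores):
    m = max(scores)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attend(query, context_feats):
    scores = [sum(q * c for q, c in zip(query, feat)) for feat in context_feats]
    weights = softmax(scores)
    dim = len(context_feats[0])
    # attended vector: weighted sum of the context features
    return [sum(w * feat[d] for w, feat in zip(weights, context_feats))
            for d in range(dim)]

v = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(v)
```

Restricting attention to a handful of attribute-specific context features, rather than the full CNN feature map, is exactly the compactness argument made in the abstract.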

AAAI Conference 2018 Conference Paper

Temporal-Difference Learning With Sampling Baseline for Image Captioning

  • Hui Chen
  • Guiguang Ding
  • Sicheng Zhao
  • Jungong Han

Existing methods for image captioning usually train the language model under the cross-entropy loss, which results in exposure bias and a mismatch between the training objective and the evaluation metric. Recent research has shown that these two issues can be well addressed by the policy gradient method from the reinforcement learning domain, thanks to its unique capability of directly optimizing discrete and non-differentiable evaluation metrics. In this paper, we utilize a reinforcement learning method to train the image captioning model. Specifically, we train our image captioning model to maximize the overall reward of the sentences by adopting the temporal-difference (TD) learning method, which takes the correlation between temporally successive actions into account. In this way, we assign different values to different words in one sampled sentence through a discounted coefficient when back-propagating the gradient with the REINFORCE algorithm, enabling the correlation between actions to be learned. Besides, instead of estimating a “baseline” to normalize the rewards with another network, we utilize the reward of another Monte-Carlo sample as the “baseline” to avoid high variance. We show that our proposed method can improve the quality of generated captions and outperforms the state-of-the-art methods on the benchmark dataset MS COCO in terms of seven evaluation metrics.
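
The reward shaping described above can be sketched in a few lines: the reward of a second Monte-Carlo sample serves as the baseline, and a discount coefficient assigns different credit to successive words. How exactly the discount is oriented over time steps is an assumption here (gamma**t down-weights later words); the reward values are arbitrary toy numbers.

```python
# Sketch of per-word REINFORCE weights with a sampled baseline: the
# advantage (sample reward minus baseline reward) is scaled per time step
# by a discounted coefficient, giving each word a different value.

def per_word_weights(reward_sample, reward_baseline, n_words, gamma=0.9):
    """Discounted advantage for each word position t."""
    advantage = reward_sample - reward_baseline
    return [advantage * gamma ** t for t in range(n_words)]

w = per_word_weights(reward_sample=0.8, reward_baseline=0.5, n_words=4)
print(w)
```

Using a second sampled sentence's reward as the baseline avoids training a separate value network while still centering the advantage, which is the variance-reduction argument made in the abstract.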

AAAI Conference 2017 Conference Paper

Reference Based LSTM for Image Captioning

  • Minghai Chen
  • Guiguang Ding
  • Sicheng Zhao
  • Hui Chen
  • Qiang Liu
  • Jungong Han

Image captioning is an important problem in artificial intelligence, related to both computer vision and natural language processing. There are two main problems in existing methods: in the training phase, it is difficult to determine which parts of the captions are more essential to the image; in the caption generation phase, the objects or scenes are sometimes misrecognized. In this paper, we consider the training images as references and propose a Reference-based Long Short Term Memory (R-LSTM) model, aiming to solve both problems with one approach. When training the model, we assign different weights to different words, which enables the network to better learn the key information of the captions. When generating a caption, the consensus score is utilized to exploit the reference information of neighbor images, which can fix misrecognitions and make the descriptions more natural-sounding. The proposed R-LSTM model outperforms the state-of-the-art approaches on the benchmark dataset MS COCO and ranks in the top two on 11 of the 14 metrics on the online test server.
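
A consensus score of the kind used at generation time can be sketched as follows: a candidate caption is scored by its average similarity to the captions of the query image's nearest-neighbour training images. Word-overlap similarity stands in for the paper's actual similarity measure, and the neighbour captions are toy examples.

```python
# Sketch of a consensus score: average similarity of a candidate caption
# to the reference captions of neighbouring images.

def overlap(a, b):
    """Word-overlap similarity (a stand-in for the paper's measure)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / max(len(sa), len(sb))

def consensus_score(candidate, neighbor_captions):
    return sum(overlap(candidate, c) for c in neighbor_captions) / len(neighbor_captions)

neighbors = ["a dog runs on the grass", "a brown dog plays on grass"]
print(consensus_score("a dog plays on the grass", neighbors))
```

A candidate that contradicts most neighbour captions (a misrecognized object, say) gets a low consensus score and can be down-ranked during beam selection.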

YNIMG Journal 2017 Journal Article

The neural circuits for arithmetic principles

  • Jie Liu
  • Han Zhang
  • Chuansheng Chen
  • Hui Chen
  • Jiaxin Cui
  • Xinlin Zhou

Arithmetic principles are the regularities underlying arithmetic computation. Little is known about how the brain supports the processing of arithmetic principles. The current fMRI study examined neural activation and functional connectivity during the processing of verbalized arithmetic principles, as compared to numerical computation and general language processing. As expected, arithmetic principles elicited stronger activation in the bilateral horizontal intraparietal sulcus and right supramarginal gyrus than did language processing, and stronger activation in the left middle temporal lobe and the left orbital part of the inferior frontal gyrus than did computation. In contrast, computation elicited greater activation in the bilateral horizontal intraparietal sulcus (extending to the posterior superior parietal lobule) than did either arithmetic principles or language processing. Functional connectivity analysis with the psychophysiological interaction (PPI) approach showed that left temporal-parietal (MTG-HIPS) connectivity was stronger during the processing of arithmetic principles and language than during computation, whereas parietal-occipital connectivities were stronger during computation than during the processing of arithmetic principles and language. Additionally, left fronto-parietal (orbital IFG-HIPS) connectivity was stronger during the processing of arithmetic principles than during computation. The results suggest that verbalized arithmetic principles engage a neural network that overlaps with, but is distinct from, the networks for computation and language processing.

YNICL Journal 2014 Journal Article

Multimodal neuroimaging in presurgical evaluation of drug-resistant epilepsy

  • Jing Zhang
  • Weifang Liu
  • Hui Chen
  • Hong Xia
  • Zhen Zhou
  • Shanshan Mei
  • Qingzhu Liu
  • Yunlin Li

Intracranial EEG (icEEG) monitoring is critical in epilepsy surgical planning, but it has limitations. Advances in neuroimaging have made it possible to reveal epileptic abnormalities that could not be identified previously and to improve the localization of the seizure focus and the vital cortex. A frequently asked question in the field is whether non-invasive neuroimaging could replace invasive icEEG, or reduce the need for it, in presurgical evaluation. This review considers promising neuroimaging techniques in epilepsy presurgical assessment in order to address this question. In addition, because the accuracies of neuroimaging vary widely across epilepsy centers, multicenter neuroimaging studies are reviewed, and randomized controlled trials (RCTs) are much needed to better reveal the utility of presurgical neuroimaging. The results of multiple studies indicate that non-invasive neuroimaging cannot replace invasive icEEG in surgical planning, especially in non-lesional or extratemporal lobe epilepsies, but it could reduce the need for icEEG in certain cases. With technical advances, multimodal neuroimaging may play a greater role in presurgical evaluation, reducing the costs and risks of epilepsy surgery and providing surgical options for more patients with drug-resistant epilepsy.