Arrow Research search

Author name cluster

Gang Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

65 papers
2 author rows

Possible papers

65

NeurIPS Conference 2025 Conference Paper

DyMoDreamer: World Modeling with Dynamic Modulation

  • Boxuan Zhang
  • Runqing Wang
  • Wei Xiao
  • Weipu Zhang
  • Jian Sun
  • Gao Huang
  • Jie Chen
  • Gang Wang

A critical bottleneck in deep reinforcement learning (DRL) is sample inefficiency, as training high-performance agents often demands extensive environmental interactions. Model-based reinforcement learning (MBRL) mitigates this by building world models that simulate environmental dynamics and generate synthetic experience, improving sample efficiency. However, conventional world models process observations holistically, failing to decouple dynamic objects and temporal features from static backgrounds. This approach is computationally inefficient, especially for visual tasks where dynamic objects significantly influence rewards and decision-making performance. To address this, we introduce DyMoDreamer, a novel MBRL algorithm that incorporates a dynamic modulation mechanism to improve the extraction of dynamic features and enrich the temporal information. DyMoDreamer employs differential observations derived from a novel inter-frame differencing mask, explicitly encoding object-level motion cues and temporal dynamics. Dynamic modulation is modeled as stochastic categorical distributions and integrated into a recurrent state-space model (RSSM), enhancing the model's focus on reward-relevant dynamics. Experiments demonstrate that DyMoDreamer sets a new state-of-the-art on the Atari $100$k benchmark with a $156.6$\% mean human-normalized score, establishes a new record of $832$ on the DeepMind Visual Control Suite, and gains a $9.5$\% performance improvement after $1$M steps on the Crafter benchmark.
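The inter-frame differencing idea described above can be illustrated with a minimal NumPy sketch; the threshold value and function name are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def differential_observation(prev_frame, frame, threshold=0.05):
    """Keep only pixels whose intensity changed by more than `threshold`
    between consecutive frames, masking out the static background."""
    diff = np.abs(frame.astype(np.float32) - prev_frame.astype(np.float32))
    mask = diff > threshold            # binary inter-frame differencing mask
    return frame * mask                # observation emphasizing dynamic objects

# Toy example: a single bright "object" moves one pixel to the right.
prev = np.zeros((4, 4), dtype=np.float32); prev[1, 1] = 1.0
curr = np.zeros((4, 4), dtype=np.float32); curr[1, 2] = 1.0
masked = differential_observation(prev, curr)  # nonzero only at the moved object
```

Per the abstract, DyMoDreamer then models such motion cues as stochastic categorical distributions and integrates them into the RSSM, rather than feeding the mask through directly as here.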

AAAI Conference 2025 Conference Paper

EvdCLIP: Improving Vision-Language Retrieval with Entity Visual Descriptions from Large Language Models

  • GuangHao Meng
  • Sunan He
  • Jinpeng Wang
  • Tao Dai
  • Letian Zhang
  • Jieming Zhu
  • Qing Li
  • Gang Wang

Vision-language retrieval (VLR) has attracted significant attention in both academia and industry, which involves using text (or images) as queries to retrieve corresponding images (or text). However, existing methods often neglect the rich visual semantic knowledge of entities, thus leading to incorrect retrieval results. To address this problem, we propose the Entity Visual Description enhanced CLIP (EvdCLIP), designed to leverage the visual knowledge of entities to enrich queries. Specifically, since humans recognize entities through visual cues, we employ a large language model (LLM) to generate Entity Visual Descriptions (EVDs) as alignment cues to complement textual data. These EVDs are then integrated into raw queries to create visually-rich, EVD-enhanced queries. Furthermore, recognizing that EVD-enhanced queries may introduce noise or low-quality expansions, we develop a novel, trainable EVD-aware Rewriter (EaRW) for vision-language retrieval tasks. EaRW utilizes EVD knowledge and the generative capabilities of the language model to effectively rewrite queries. With our specialized training strategy, EaRW can generate high-quality and low-noise EVD-enhanced queries. Extensive quantitative and qualitative experiments on image-text retrieval benchmarks validate the superiority of EvdCLIP on vision-language retrieval tasks.

AAAI Conference 2025 Conference Paper

Genomics Data Lossless Compression with (S, K)-Mer Encoding and Deep Neural Networks

  • Hui Sun
  • Liping Yi
  • Huidong Ma
  • Yongxia Sun
  • Yingfeng Zheng
  • Wenwen Cui
  • Meng Yan
  • Gang Wang

Learning-based compression shows competitive compression ratios for genomics data. It often includes three types of compressors: static, adaptive and semi-adaptive. However, these existing compressors suffer from inferior compression ratios or throughput, and adaptive compressors also face model cold-start problems. To address these issues, we propose DeepGeCo, a novel genomics data lossless adaptive compression framework with (s,k)-mer encoding and deep neural networks, involving three compression modes (MINI for static, PLUS for adaptive, ULTRA for semi-adaptive) for flexible requirements of compression ratios or throughput. In DeepGeCo, (1) we develop BiGRU and Transformer as the backbone to build Warm-Start and Supporter models to mitigate cold-start problems. (2) We introduce (s,k)-mer encoding to pre-process genomics data before feeding it into the DNN model to improve model throughput, and we propose a new metric, Ranking of Throughput and Compression Ratio (RTCR), for effective encoding parameter selection. (3) We design a threshold controller and a probabilistic mixer within the backbone to balance compression ratios and model throughput. Experiments on 10 real-world datasets show that DeepGeCo's three compression modes achieve up to a 22.949X average throughput improvement and up to a 31.095% average compression ratio improvement while occupying low CPU or GPU memory.
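As a rough illustration of the encoding idea, an (s,k)-mer tokenization can be sketched as k-length substrings sampled every s bases; the exact scheme and parameter choices in DeepGeCo may differ:

```python
def sk_mers(seq, s=2, k=4):
    """Split a genomic sequence into k-length substrings ((s,k)-mers)
    sampled every s bases; tokens overlap when s < k."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, s)]

def encode(tokens, vocab=None):
    """Map each k-mer token to an integer id, growing the vocabulary on the fly."""
    vocab = {} if vocab is None else vocab
    ids = [vocab.setdefault(t, len(vocab)) for t in tokens]
    return ids, vocab

tokens = sk_mers("ACGTACGTAC", s=2, k=4)   # ['ACGT', 'GTAC', 'ACGT', 'GTAC']
ids, vocab = encode(tokens)                # ids == [0, 1, 0, 1]
```

Sampling fewer, longer tokens shortens the stream fed to the DNN model, which is presumably the throughput-versus-ratio trade-off the RTCR metric is meant to score.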

JBHI Journal 2025 Journal Article

Multi-Perturbation Consistency Learning for Semi-Supervised Medical Image Segmentation

  • Zhiyuan Zhang
  • Yu Zhang
  • Jing Chen
  • Wenlong Feng
  • Zihao Zhou
  • Jie Zou
  • Uzair Aslam Bhatti
  • Gang Wang

Existing semi-supervised learning (SSL) methods primarily rely on consistency learning to enhance model performance. However, most current approaches only validate the effectiveness of consistency learning under single perturbations, while introducing multiple perturbations may lead to the failure of consistency learning and degrade model performance. To address this issue and effectively leverage multiple perturbations for consistency learning, we propose a semi-supervised medical image segmentation method based on multi-perturbation consistency learning. Specifically, we design a cross-teaching framework integrating sparsely annotated 3D and 2D networks, introducing network perturbations through multidimensional architectures while combining strong and weak data augmentation techniques to achieve input perturbations. Furthermore, to address the instability issue in multi-perturbation consistency learning, we develop two complementary uncertainty-aware correction algorithms targeting labeled and unlabeled data. These designs effectively enhance the model's robustness to both labeled and unlabeled data, overcoming the instability problem in multi-perturbation consistency learning. To validate the proposed method, we conducted experiments on four datasets (ProstateX, HPH55, ACDC, and LA). Experimental results demonstrate that our algorithm outperforms existing methods across all validation datasets and exhibits strong generalization capabilities. This indicates that our approach can maintain excellent performance with limited annotated data while achieving efficient medical image segmentation. The project code will be made publicly available upon acceptance.

IJCAI Conference 2025 Conference Paper

Optical Flow Estimation for Tiny Objects: New Problem, Specialized Benchmark, and Bioinspired Scheme

  • Xueyao Ji
  • Gang Wang
  • Yizheng Wang

Optical flow is pivotal in video-based tasks, yet existing methods mostly focus on medium-/large-size objects, while underperforming when characterizing the motion of tiny objects. To bridge this gap, we introduce the On-off Time-delay with Hassenstein-Reichardt correlator (OTHR), a computationally efficient scheme inspired by the primate visual cortex's direction selectivity mechanism. OTHR kernels, applied across multiple frames, discern bright/dark luminance changes along a specific direction over a time delay, effectively estimating motion of tiny objects amidst noise and static backgrounds. Notably, OTHR integrates seamlessly with leading deep learning flow estimation models such as RAFT and FlowFormer. We also propose refined evaluation metrics for tiny objects and contribute a new dataset featuring such objects to aid algorithm development. Our experiments confirm OTHR's superiority over competing methods, particularly in enhancing state-of-the-art models' performance on tiny object motion estimation at minimal cost. Specifically, for objects less than 100 pixels, OTHR reduces RAFT and FlowFormer's errors by 22.03% and 83.50%, respectively. The codes will be accessible at https://github.com/JaneEliot/OTHR.
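The Hassenstein-Reichardt correlator at the heart of OTHR is a classic direction-selectivity model: the delayed signal of one receptor is multiplied with the undelayed signal of its neighbor, and subtracting the mirrored term yields a signed, direction-selective response. A textbook toy version (not the paper's on-off kernel):

```python
import numpy as np

def hrc_response(a, b, delay=1):
    """Textbook Hassenstein-Reichardt correlator for two adjacent
    receptor signals `a` (left) and `b` (right); a positive output
    indicates motion from a towards b."""
    a_d = np.roll(a, delay)            # delayed copy of channel a
    b_d = np.roll(b, delay)            # delayed copy of channel b
    return float(np.sum(a_d * b - b_d * a))

# A pulse moving left-to-right reaches the left receptor one step earlier.
left  = np.array([0.0, 1.0, 0.0, 0.0, 0.0])
right = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
rightward = hrc_response(left, right)   # positive for rightward motion
leftward  = hrc_response(right, left)   # negative for leftward motion
```

The sign of the output encodes direction, which is what lets such kernels pick out tiny moving objects against a static background.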

AAAI Conference 2025 Conference Paper

pFedES: Generalized Proxy Feature Extractor Sharing for Model Heterogeneous Personalized Federated Learning

  • Liping Yi
  • Han Yu
  • Chao Ren
  • Gang Wang
  • Xiaoguang Liu
  • Xiaoxiao Li

Federated learning (FL), as a privacy-preserving collaborative machine learning paradigm, has attracted significant interest from industry and academia. To allow each data owner (FL client) to train a heterogeneous and personalized local model based on its local data distribution, system resources and requirements on model structure, the field of model-heterogeneous personalized federated learning (MHPFL) has emerged. Existing MHPFL approaches either rely on the availability of a public dataset with special characteristics to facilitate knowledge transfer, incur high computational and communication costs, or face potential model leakage risks. To address these limitations, we propose a model-heterogeneous personalized Federated learning approach based on generalized proxy feature Extractor Sharing (pFedES) for supervised image classification tasks. (1) We devise a shared small proxy homogeneous feature extractor before each client's heterogeneous local model. (2) Clients train these via the proposed iterative learning method to enable the exchange of global generalized knowledge and local personalized knowledge. (3) The small proxy local homogeneous extractors produced after local training are uploaded to the server for aggregation to facilitate knowledge fusion across clients. We theoretically prove that pFedES converges at a non-convex rate of O(1/T). Experiments on 3 benchmark datasets against 9 baselines demonstrate that pFedES achieves state-of-the-art model accuracy while maintaining efficient communication and computation.

NeurIPS Conference 2025 Conference Paper

PurpCode: Reasoning for Safer Code Generation

  • Jiawei Liu
  • Nirav Diwan
  • Zhe Wang
  • Haoyu Zhai
  • Xiaona Zhou
  • Kiet Nguyen
  • Tianjiao Yu
  • Muntasir Wahed

We introduce PurpCode, the first post-training recipe for training safe code reasoning models towards generating secure code and defending against malicious cyberactivities. PurpCode trains a reasoning model in two stages: (i) Rule Learning, which explicitly teaches the model to reference cybersafety rules to generate vulnerability-free code and to avoid facilitating malicious cyberactivities; and (ii) Reinforcement Learning, which optimizes model safety and preserves model utility through diverse, multi-objective reward mechanisms. To empower the training pipelines with comprehensive cybersafety data, we conduct internal red-teaming to synthesize comprehensive and high-coverage prompts based on real-world tasks for inducing unsafe cyberactivities in the model. Based on PurpCode, we develop a reasoning-based coding model, namely PurpCode-32B, which demonstrates state-of-the-art cybersafety, outperforming various frontier models. Moreover, our alignment method decreases the model overrefusal rates in both general and cybersafety-specific scenarios, while preserving model utility in both code generation and common security knowledge.

IJCAI Conference 2025 Conference Paper

SepALM: Audio Language Models Are Error Correctors for Robust Speech Separation

  • Zhaoxi Mu
  • Xinyu Yang
  • Gang Wang

While contemporary speech separation technologies adeptly process lengthy mixed audio waveforms, they are frequently challenged by the intricacies of real-world environments, including noisy and reverberant settings, which can result in artifacts or distortions in the separated speech. To overcome these limitations, we introduce SepALM, a pioneering approach that employs audio language models (ALMs) to rectify and re-synthesize speech within the text domain following preliminary separation. SepALM comprises four core components: a separator, a corrector, a synthesizer, and an aligner. By integrating an ALM-based end-to-end error correction mechanism, we mitigate the risk of error accumulation and circumvent the optimization hurdles typically encountered in conventional methods that amalgamate automatic speech recognition (ASR) with large language models (LLMs). Additionally, we have developed Chain-of-Thought (CoT) prompting and knowledge distillation techniques to facilitate the reasoning and training processes of the ALM. Our experiments substantiate that SepALM not only elevates the precision of speech separation but also markedly bolsters adaptability in novel acoustic environments.

JBHI Journal 2024 Journal Article

A Feature Fusion Model Based on Temporal Convolutional Network for Automatic Sleep Staging Using Single-Channel EEG

  • Jiameng Bao
  • Guangming Wang
  • Tianyu Wang
  • Ning Wu
  • Shimin Hu
  • Won Hee Lee
  • Sio-Long Lo
  • Xiangguo Yan

Sleep staging is a crucial task in sleep monitoring and diagnosis, but clinical sleep staging is both time-consuming and subjective. In this study, we proposed a novel deep learning algorithm named feature fusion temporal convolutional network (FFTCN) for automatic sleep staging using single-channel EEG data. This algorithm employed a one-dimensional convolutional neural network (1D-CNN) to extract temporal features from raw EEG, and a two-dimensional CNN (2D-CNN) to extract time-frequency features from spectrograms generated through continuous wavelet transform (CWT) at the epoch level. These features were subsequently fused and further fed into a temporal convolutional network (TCN) to classify sleep stages at the sequence level. Moreover, a two-step training strategy was used to enhance the model's performance on an imbalanced dataset. Our proposed method exhibits superior performance in the 5-class classification task for healthy subjects, as evaluated on the SHHS-1, Sleep-EDF-153, and ISRUC-S1 datasets. This work provided a straightforward and promising method for improving the accuracy of automatic sleep staging using only single-channel EEG, and the proposed method exhibited great potential for future applications in professional sleep monitoring, which could effectively alleviate the workload of sleep technicians.

JBHI Journal 2024 Journal Article

Classification of Three Anesthesia Stages Based on Near-Infrared Spectroscopy Signals

  • Zhian Liu
  • Lichengxi Si
  • Shaoxian Shi
  • Jing Li
  • Jing Zhu
  • Won Hee Lee
  • Sio-Long Lo
  • Xiangguo Yan

Proper monitoring of anesthesia stages can guarantee the safe performance of clinical surgeries. In this study, different anesthesia stages were classified using near-infrared spectroscopy (NIRS) signals with machine learning. The cerebral hemodynamic variables of right proximal oxyhemoglobin (HbO2) in the maintenance (MNT), emergence (EM) and consciousness (CON) stages were collected, and the differences between the three stages were compared by phase-amplitude coupling (PAC). These were then combined with time-domain features, both linear (mean, standard deviation, max, min and range) and nonlinear (sample entropy), as well as frequency-domain power features; feature selection was performed, and classification was finally carried out with a support vector machine (SVM) classifier. The results show that the PAC of the NIRS signal gradually strengthened with deepening anesthesia. A good three-class classification accuracy of 69.27% was obtained, exceeding the result of classification with any single feature category. These results indicate the feasibility of NIRS signals in performing three or even more anesthesia stage classifications, providing insight into the development of new anesthesia monitoring modalities.
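The time-domain features listed, including sample entropy, are standard signal-processing quantities and can be sketched as follows (a naive O(n^2) sample entropy; the tolerance and template length are conventional defaults, not values from the paper):

```python
import numpy as np

def time_domain_features(x):
    """Linear time-domain features of a signal window."""
    return {"mean": x.mean(), "std": x.std(), "max": x.max(),
            "min": x.min(), "range": x.max() - x.min()}

def sample_entropy(x, m=2, r=None):
    """Naive O(n^2) sample entropy: -ln(A/B), where B counts pairs of
    matching length-m templates and A pairs of length-(m+1) templates,
    using Chebyshev tolerance r (default 0.2 * std, a common convention)."""
    x = np.asarray(x, dtype=float)
    r = 0.2 * x.std() if r is None else r
    def matches(m):
        t = np.array([x[i:i + m] for i in range(len(x) - m)])
        d = np.max(np.abs(t[:, None] - t[None, :]), axis=2)
        return (np.sum(d <= r) - len(t)) / 2      # pairs, excluding self-matches
    B, A = matches(m), matches(m + 1)
    return -np.log(A / B) if A > 0 and B > 0 else np.inf
```

A regular signal such as a sinusoid yields a much lower sample entropy than white noise, which is what makes it useful as a nonlinear feature alongside the linear ones.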

AAAI Conference 2024 Conference Paper

DGA-GNN: Dynamic Grouping Aggregation GNN for Fraud Detection

  • Mingjiang Duan
  • Tongya Zheng
  • Yang Gao
  • Gang Wang
  • Zunlei Feng
  • Xinyu Wang

Fraud detection has increasingly become a prominent research field due to the dramatically increased incidents of fraud. The complex connections involving thousands, or even millions of nodes, present challenges for fraud detection tasks. Many researchers have developed various graph-based methods to detect fraud from these intricate graphs. However, those methods neglect two distinct characteristics of the fraud graph: the non-additivity of certain attributes and the distinguishability of grouped messages from neighbor nodes. This paper introduces the Dynamic Grouping Aggregation Graph Neural Network (DGA-GNN) for fraud detection, which addresses these two characteristics by dynamically grouping attribute value ranges and neighbor nodes. In DGA-GNN, we initially propose the decision tree binning encoding to transform non-additive node attributes into bin vectors. This approach aligns well with the GNN’s aggregation operation and avoids nonsensical feature generation. Furthermore, we devise a feedback dynamic grouping strategy to classify graph nodes into two distinct groups and then employ a hierarchical aggregation. This method extracts more discriminative features for fraud detection tasks. Extensive experiments on five datasets suggest that our proposed method achieves a 3%–16% improvement over existing SOTA methods. Code is available at https://github.com/AtwoodDuan/DGA-GNN.
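The binning encoding can be sketched in a few lines: split thresholds (which DGA-GNN derives from a decision tree) turn a non-additive attribute into a one-hot bin vector, so that summation during GNN aggregation counts neighbors per bin instead of adding raw values. The thresholds below are purely hypothetical:

```python
import numpy as np

def bin_encode(values, edges):
    """Map a non-additive numeric attribute into one-hot bin vectors
    using split thresholds `edges`; summing these vectors during
    neighbor aggregation counts neighbors per bin, which is meaningful
    where adding raw values (e.g. account ages) would not be."""
    idx = np.digitize(values, edges)          # bin index for each node
    return np.eye(len(edges) + 1)[idx]        # one-hot bin vector per node

ages = np.array([3.0, 15.0, 40.0, 80.0])
edges = [10.0, 30.0, 60.0]                    # hypothetical tree split points
vecs = bin_encode(ages, edges)                # one hot bin per row
```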

NeurIPS Conference 2024 Conference Paper

Federated Model Heterogeneous Matryoshka Representation Learning

  • Liping Yi
  • Han Yu
  • Chao Ren
  • Gang Wang
  • Xiaoguang Liu
  • Xiaoxiao Li

Model heterogeneous federated learning (MHeteroFL) enables FL clients to collaboratively train models with heterogeneous structures in a distributed fashion. However, existing MHeteroFL methods rely on training loss to transfer knowledge between the client model and the server model, resulting in limited knowledge exchange. To address this limitation, we propose the **Fed**erated model heterogeneous **M**atryoshka **R**epresentation **L**earning (**FedMRL**) approach for supervised learning tasks. It adds an auxiliary small homogeneous model shared by clients with heterogeneous local models. (1) The generalized and personalized representations extracted by the two models' feature extractors are fused by a personalized lightweight representation projector. This step enables representation fusion to adapt to local data distribution. (2) The fused representation is then used to construct Matryoshka representations with multi-dimensional and multi-granular embedded representations learned by the global homogeneous model header and the local heterogeneous model header. This step facilitates multi-perspective representation learning and improves model learning capability. Theoretical analysis shows that FedMRL achieves a $O(1/T)$ non-convex convergence rate. Extensive experiments on benchmark datasets demonstrate its superior model accuracy with low communication and computational costs compared to seven state-of-the-art baselines. It achieves up to 8.48% and 24.94% accuracy improvement compared with the state-of-the-art and the best same-category baseline, respectively.

IJCAI Conference 2024 Conference Paper

FedSSA: Semantic Similarity-based Aggregation for Efficient Model-Heterogeneous Personalized Federated Learning

  • Liping Yi
  • Han Yu
  • Zhuan Shi
  • Gang Wang
  • Xiaoguang Liu
  • Lizhen Cui
  • Xiaoxiao Li

Federated learning (FL) is a privacy-preserving collaborative machine learning paradigm. Traditional FL requires all data owners (a.k.a. FL clients) to train the same local model. This design is not well-suited for scenarios involving data and/or system heterogeneity. Model-Heterogeneous Personalized FL (MHPFL) has emerged to address this challenge. Existing MHPFL approaches often rely on a public dataset with the same nature as the learning task, or incur high computation and communication costs. To address these limitations, we propose the Federated Semantic Similarity Aggregation (FedSSA) approach for supervised classification tasks, which splits each client's model into a heterogeneous (structure-different) feature extractor and a homogeneous (structure-same) classification header. It performs local-to-global knowledge transfer via semantic similarity-based header parameter aggregation. In addition, global-to-local knowledge transfer is achieved via an adaptive parameter stabilization strategy which fuses the seen-class parameters of historical local headers with those of the latest global header for each client. FedSSA does not rely on public datasets, while only requiring partial header parameter transmission to save costs. Theoretical analysis proves the convergence of FedSSA. Extensive experiments show that FedSSA achieves up to 3.62% higher accuracy, 15.54 times higher communication efficiency, and 15.52 times higher computational efficiency compared to 7 state-of-the-art MHPFL baselines.

ICRA Conference 2024 Conference Paper

Increasing the Absolute Position Accuracy of Industrial Robots by Means of a Deep Continual Evidential Regression Model

  • Eckart Uhlmann
  • Mitchel Polte
  • Julian Blumberg
  • Sheng Yin
  • Gang Wang

The use of industrial robots represents a key technology for increasing productivity and efficiency in manufacturing. However, their low absolute position accuracy still prevents the broad substitution of machine tools by industrial robots. In this paper, a data-driven method for accuracy enhancement of industrial robots under consideration of kinematic, elastic, and thermal effects is presented. A continual learning algorithm is proposed, which allows the model to be trained in a process-parallel manner without suffering from catastrophic forgetting. Furthermore, the model is able to determine confidence intervals of the prediction values and thus supports further processing in safety-relevant applications. The effectiveness of the model is demonstrated using a large data stream with about 3,000 real data points. As a result, it is shown that the absolute position accuracy of the industrial robot can be improved by 96% with the proposed method.

AAAI Conference 2024 Conference Paper

Multi-Constellation-Inspired Single-Shot Global LiDAR Localization

  • Tongzhou Zhang
  • Gang Wang
  • Yu Chen
  • Hai Zhang
  • Jue Hu

Global localization is a challenging task for intelligent robots, as its accuracy directly contributes to the performance of downstream navigation and planning tasks. However, the existing literature focuses more on place retrieval and the success rate of localization, with limited attention given to the metrics of position estimation. In this paper, a single-shot global LiDAR localization method is proposed with the ultimate goal of achieving high position accuracy, inspired by the positioning approach of multi-constellation localization systems. Initially, we perform coarse localization using global descriptors and select observation points along with their corresponding coordinates based on the obtained coarse localization results. Coordinates can be acquired from a pre-built map, GNSS, or other devices. Then, a lightweight LiDAR odometry method is designed to estimate the distance between the retrieved data and the observation points. Ultimately, the localization problem is transformed into an optimization problem of solving a system of multiple sphere equations. The experimental results on the KITTI dataset and the self-collected dataset demonstrate that our method achieves an average localization error (including errors in the z-axis) of 0.89 meters. In addition, it achieves retrieval efficiency of 0.357 s per frame on the former dataset and 0.214 s per frame on the latter one. Code and data are available at https://github.com/jlurobot/multi-constellation-localization.
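The final optimization step resembles classical multilateration: subtracting one sphere equation from the others linearizes the system, which least squares can then solve. A minimal sketch under that standard formulation, not the paper's exact solver:

```python
import numpy as np

def multilaterate(points, dists):
    """Estimate a position p from distances to known observation points
    by subtracting the first sphere equation ||p - points[0]||^2 = d0^2
    from the others, yielding a linear least-squares system in p."""
    p0, d0 = points[0], dists[0]
    A = 2.0 * (points[1:] - p0)
    b = (d0**2 - dists[1:]**2
         + np.sum(points[1:]**2, axis=1) - np.sum(p0**2))
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Toy check: recover a known 3D position from four observation points.
true_p = np.array([0.3, 0.2, 0.1])
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
est = multilaterate(pts, np.linalg.norm(pts - true_p, axis=1))
```

With four or more non-coplanar observation points the linearized system is well-determined, and least squares absorbs noise in the odometry-estimated ranges.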

IJCAI Conference 2024 Conference Paper

Unified Single-Stage Transformer Network for Efficient RGB-T Tracking

  • Jianqiang Xia
  • Dianxi Shi
  • Ke Song
  • Linna Song
  • Xiaolei Wang
  • Songchang Jin
  • Chenran Zhao
  • Yu Cheng

Most existing RGB-T tracking networks extract modality features in a separate manner, which lacks interaction and mutual guidance between modalities. This limits the network's ability to adapt to the diverse dual-modality appearances of targets and the dynamic relationships between the modalities. Additionally, the three-stage fusion tracking paradigm followed by these networks significantly restricts the tracking speed. To overcome these problems, we propose a unified single-stage Transformer RGB-T tracking network, namely USTrack, which unifies the above three stages into a single ViT (Vision Transformer) backbone through joint feature extraction, fusion and relation modeling. With this structure, the network can not only extract the fusion features of templates and search regions under the interaction of modalities, but also significantly improve tracking speed through the single-stage fusion tracking paradigm. Furthermore, we introduce a novel feature selection mechanism based on modality reliability to mitigate the influence of invalid modalities for final prediction. Extensive experiments on three mainstream RGB-T tracking benchmarks show that our method achieves the new state-of-the-art while achieving the fastest tracking speed of 84.2 FPS. Code is available at https://github.com/xiajianqiang/USTrack.

NeurIPS Conference 2024 Conference Paper

Weak-eval-Strong: Evaluating and Eliciting Lateral Thinking of LLMs with Situation Puzzles

  • Qi Chen
  • Bowen Zhang
  • Gang Wang
  • Qi Wu

While advancements in NLP have significantly improved the performance of Large Language Models (LLMs) on tasks requiring vertical thinking, their lateral thinking capabilities remain under-explored and challenging to measure due to the complexity of assessing creative thought processes and the scarcity of relevant data. To address these challenges, we introduce SPLAT, a benchmark leveraging Situation Puzzles to evaluate and elicit LAteral Thinking of LLMs. This benchmark, containing 975 graded situation puzzles across three difficulty levels, employs a new multi-turn player-judge framework instead of the traditional model-based evaluation, which often necessitates a stronger evaluation model. This framework simulates an interactive game where the model (player) asks the evaluation model (judge) questions about an incomplete story to infer the full scenario. The judge answers based on a detailed reference scenario or evaluates if the player's predictions align with the reference one. This approach lessens dependence on more robust evaluation models, enabling the assessment of state-of-the-art LLMs. The experiments demonstrate that a robust evaluation model, such as WizardLM-2, closely matches human judgements in both intermediate question-answering and final scenario accuracy, achieving over 80% agreement, similar to the agreement levels among humans. Furthermore, applying data and reasoning processes from our benchmark to other lateral thinking-related benchmarks, e.g., RiddleSense and BrainTeaser, leads to performance enhancements. This suggests that our benchmark effectively evaluates and elicits the lateral thinking abilities of LLMs.

IJCAI Conference 2023 Conference Paper

JEPOO: Highly Accurate Joint Estimation of Pitch, Onset and Offset for Music Information Retrieval

  • Haojie Wei
  • Jun Yuan
  • Rui Zhang
  • Yueguo Chen
  • Gang Wang

Melody extraction is a core task in music information retrieval, and the estimation of pitch, onset and offset are key sub-tasks in melody extraction. Existing methods have limited accuracy, and work for only one type of data, either single-pitch or multi-pitch. In this paper, we propose a highly accurate method for joint estimation of pitch, onset and offset, named JEPOO. We address the challenges of joint learning optimization and handling both single-pitch and multi-pitch data through novel model design and a new optimization technique named Pareto modulated loss with loss weight regularization. This is the first method that can accurately handle both single-pitch and multi-pitch music data, and even a mix of them. A comprehensive experimental study on a wide range of real datasets shows that JEPOO outperforms state-of-the-art methods by up to 10.6\%, 8.3\% and 10.3\% for the prediction of Pitch, Onset and Offset, respectively, and JEPOO is robust for various types of data and instruments. The ablation study validates the effectiveness of each component of JEPOO.

NeurIPS Conference 2023 Conference Paper

STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning

  • Weipu Zhang
  • Gang Wang
  • Jian Sun
  • Yetian Yuan
  • Gao Huang

Recently, model-based reinforcement learning algorithms have demonstrated remarkable efficacy in visual input environments. These approaches begin by constructing a parameterized simulation world model of the real environment through self-supervised learning. By leveraging the imagination of the world model, the agent's policy is enhanced without the constraints of sampling from the real environment. The performance of these algorithms heavily relies on the sequence modeling and generation capabilities of the world model. However, constructing a perfectly accurate model of a complex unknown environment is nearly impossible. Discrepancies between the model and reality may cause the agent to pursue virtual goals, resulting in subpar performance in the real environment. Introducing random noise into model-based reinforcement learning has been proven beneficial. In this work, we introduce Stochastic Transformer-based wORld Model (STORM), an efficient world model architecture that combines the strong sequence modeling and generation capabilities of Transformers with the stochastic nature of variational autoencoders. STORM achieves a mean human performance of $126.7\%$ on the Atari $100$k benchmark, setting a new record among state-of-the-art methods that do not employ lookahead search techniques. Moreover, training an agent with $1.85$ hours of real-time interaction experience on a single NVIDIA GeForce RTX 3090 graphics card requires only $4.3$ hours, showcasing improved efficiency compared to previous methodologies.

NeurIPS Conference 2023 Conference Paper

ZoomTrack: Target-aware Non-uniform Resizing for Efficient Visual Tracking

  • Yutong Kou
  • Jin Gao
  • Bing Li
  • Gang Wang
  • Weiming Hu
  • Yizheng Wang
  • Liang Li

Recently, the transformer has enabled the speed-oriented trackers to approach state-of-the-art (SOTA) performance with high-speed thanks to the smaller input size or the lighter feature extraction backbone, though they still substantially lag behind their corresponding performance-oriented versions. In this paper, we demonstrate that it is possible to narrow or even close this gap while achieving high tracking speed based on the smaller input size. To this end, we non-uniformly resize the cropped image to have a smaller input size while the resolution of the area where the target is more likely to appear is higher and vice versa. This enables us to solve the dilemma of attending to a larger visual field while retaining more raw information for the target despite a smaller input size. Our formulation for the non-uniform resizing can be efficiently solved through quadratic programming (QP) and naturally integrated into most of the crop-based local trackers. Comprehensive experiments on five challenging datasets based on two kinds of transformer trackers, i.e., OSTrack and TransT, demonstrate consistent improvements over them. In particular, applying our method to the speed-oriented version of OSTrack even outperforms its performance-oriented counterpart by 0.6\% AUC on TNL2K, while running 50\% faster and saving over 55\% MACs. Codes and models are available at https://github.com/Kou-99/ZoomTrack.

JBHI Journal 2022 Journal Article

Flexible Dual-Channel Digital Auscultation Patch With Active Noise Reduction for Bowel Sound Monitoring and Application

  • Gang Wang
  • Yingyun Yang
  • Siyu Chen
  • Ji Fu
  • Dong Wu
  • Aiming Yang
  • Yinji Ma
  • Xue Feng

Bowel sounds (BSs) have important clinical value in the auxiliary diagnosis of digestive diseases, but due to the inconvenience of long-term monitoring and too much interference from environmental noise, they have not been well studied. Most of the current electronic stethoscopes are hard and bulky without the function of noise reduction, and their application for long-term wearable monitoring of BS in noisy clinical environments is very limited. In this paper, a flexible dual-channel digital auscultation patch with active noise reduction is designed and developed, which is wireless, wearable, and conformably attached to abdominal skin to record BS more accurately. The ambient noise can be greatly reduced through active noise reduction based on the adaptive filter. At the same time, some nonstationary noises appearing intermittently (e.g., frictional noise) can also be removed from BS by the cross validation of multichannel simultaneous acquisition. Then, two kinds of typical BS signals are taken as examples, and the feature parameters of the BS in the time domain and frequency domain are extracted through the time-frequency analysis algorithm. Furthermore, based on the short-term energy ratio between the four channels of dual patches, the two-dimensional localization of BS on the abdomen mapping plane is realized. Finally, the continuous wearable monitoring of BS for patients with postoperative ileus (POI) in the noisy ward from pre-operation (POD0) to postoperative Day 7 (POD7) was carried out. The obtained change curve of the occurrence frequency of BS provides guidance for doctors to choose a reasonable feeding time for patients after surgery and accelerate their recovery. Therefore, flexible dual-channel digital auscultation patches with active noise reduction will have promising applications in the clinical auxiliary diagnosis of digestive diseases.

IJCAI Conference 2022 Conference Paper

Learning from Students: Online Contrastive Distillation Network for General Continual Learning

  • Jin Li
  • Zhong Ji
  • Gang Wang
  • Qiang Wang
  • Feng Gao

The goal of General Continual Learning (GCL) is to preserve learned knowledge and learn new knowledge with constant memory from an infinite data stream where task boundaries are blurry. Distilling the model's response of reserved samples between the old and the new models is an effective way to achieve promising performance on GCL. However, it accumulates the inherent old model's response bias and is not robust to model changes. To this end, we propose an Online Contrastive Distillation Network (OCD-Net) to tackle these problems, which explores the merit of the student model in each time step to guide the training process of the student model. Concretely, the teacher model is devised to help the student model consolidate the learned knowledge, and is trained online by integrating the model weights of the student model to accumulate the new knowledge. Moreover, our OCD-Net incorporates both relation and adaptive response to help the student model alleviate catastrophic forgetting, which also helps the teacher model preserve the learned knowledge. Extensive experiments on six benchmark datasets demonstrate that our proposed OCD-Net significantly outperforms state-of-the-art approaches by 3.26%~8.71% with various buffer sizes. Our code is available at https://github.com/lijincm/OCD-Net.

NeurIPS Conference 2021 Conference Paper

Collaborative Uncertainty in Multi-Agent Trajectory Forecasting

  • Bohan Tang
  • Yiqi Zhong
  • Ulrich Neumann
  • Gang Wang
  • Siheng Chen
  • Ya Zhang

Uncertainty modeling is critical in trajectory-forecasting systems for both interpretation and safety reasons. To better predict the future trajectories of multiple agents, recent works have introduced interaction modules to capture interactions among agents. This approach leads to correlations among the predicted trajectories. However, the uncertainty brought by such correlations is neglected. To fill this gap, we propose a novel concept, collaborative uncertainty (CU), which models the uncertainty resulting from the interaction module. We build a general CU-based framework to make a prediction model learn the future trajectory and the corresponding uncertainty. The CU-based framework is integrated as a plugin module to current state-of-the-art (SOTA) systems and deployed in two special cases based on multivariate Gaussian and Laplace distributions. In each case, we conduct extensive experiments on two synthetic datasets and two public, large-scale benchmarks of trajectory forecasting. The results are promising: 1) The results of synthetic datasets show that the CU-based framework allows the model to nicely rebuild the ground-truth distribution. 2) The results of trajectory forecasting benchmarks demonstrate that the CU-based framework steadily helps SOTA systems improve their performance. Specifically, the proposed CU-based framework helps VectorNet improve by 57 cm in Final Displacement Error on the nuScenes dataset. 3) The visualization results of CU illustrate that the value of CU is highly related to the amount of interactive information among agents.

JBHI Journal 2021 Journal Article

Effective Brain State Estimation During Propofol-Induced Sedation Using Advanced EEG Microstate Spectral Analysis

  • Yamin Li
  • Wen Shi
  • Zhian Liu
  • Jing Li
  • Qiang Wang
  • Xiangguo Yan
  • Zehong Cao
  • Gang Wang

Brain states are patterns of neuronal synchrony, and the electroencephalogram (EEG) microstate provides a promising tool to characterize and analyze the synchronous neural firing. However, the topographical spectral information for each predominant microstate is still unclear during the switch of consciousness, such as sedation, and the practical usage of the EEG microstate is worth probing. Also, the mechanism behind the anesthetic-induced alterations of brain states remains poorly understood. In this study, an advanced EEG microstate spectral analysis was utilized using multivariate empirical mode decomposition in the Hilbert-Huang transform. The practicability was further investigated in scalp EEG recordings during the propofol-induced transition of consciousness. The process of transition from the awake baseline to moderate sedation was accompanied by apparent increases in microstate (A, B, and F) energy, especially in the whole-brain delta band, frontal alpha band and beta band. In comparison to other effective EEG-based parameters that are commonly used to measure anesthetic depth, using the selected spectral features reached better performance (80% sensitivity, 90% accuracy) to estimate the brain states during sedation. The changes in microstate energy also exhibited high correlations with individual behavioral data during sedation. In a nutshell, the EEG microstate spectral analysis is an effective method to estimate brain states during propofol-induced sedation, giving great insights into the underlying mechanism. The generated spectral features can be promising markers to dynamically assess the consciousness level.

TIST Journal 2021 Journal Article

Linking Multiple User Identities of Multiple Services from Massive Mobility Traces

  • Huandong Wang
  • Yong Li
  • Gang Wang
  • Depeng Jin

Understanding the linkability of online user identifiers (IDs) is critical to both service providers (for business intelligence) and individual users (for assessing privacy risks). Existing methods are designed to match IDs across two services but face key challenges of matching multiple services in practice, particularly when users have multiple IDs per service. In this article, we propose a novel system to link IDs across multiple services by exploring the spatial-temporal features of user activities, of which the core idea is that the same user's online IDs are more likely to repeatedly appear at the same location. Specifically, we first utilize a contact graph to capture the “co-location” of all IDs across multiple services. Based on this graph, we propose a set-wise matching algorithm to discover candidate ID sets and use Bayesian inference to generate confidence scores for candidate ranking, which is proved to be optimal. We evaluate our system using two real-world ground-truth datasets from an Internet service provider (4 services, 815K IDs) and Twitter-Foursquare (2 services, 770 IDs). Extensive results show that our system significantly outperforms the state-of-the-art algorithms in accuracy (AUC is higher by 0.1–0.2), and it is highly robust against data quality, matching order, and number of services.

ICRA Conference 2021 Conference Paper

MDANet: Multi-Modal Deep Aggregation Network for Depth Completion

  • Yanjie Ke
  • Kun Li
  • Wei Yang 0011
  • Zhenbo Xu
  • Dayang Hao
  • Liusheng Huang
  • Gang Wang

Depth completion aims to recover the dense depth map from sparse depth data and an RGB image. However, due to the huge difference between the multi-modal signal inputs, vanilla convolutional neural networks and simple fusion strategies cannot extract features from sparse data and aggregate multi-modal information effectively. To tackle this problem, we design a novel network architecture that takes full advantage of multi-modal features for depth completion. An effective Pre-completion algorithm is first put forward to increase the density of the input depth map and to provide distribution priors. Moreover, to effectively fuse the image features and the depth features, we propose a multi-modal deep aggregation block that consists of multiple connection and aggregation pathways for deeper fusion. Furthermore, based on the intuition that semantic image features are beneficial for accurate contours, we introduce the deformable guided fusion layer to guide the generation of the dense depth map. The resulting architecture, called MDANet, outperforms all the state-of-the-art methods on the popular KITTI Depth Completion Benchmark, while using fewer parameters than recent methods. The code of this work will be available at https://github.com/USTC-Keyanjie/MDANet_ICRA2021.

ICRA Conference 2021 Conference Paper

Route Coverage Testing for Autonomous Vehicles via Map Modeling

  • Yun Tang 0003
  • Yuan Zhou 0005
  • Fenghua Wu
  • Yang Liu 0003
  • Jun Sun 0001
  • Wuling Huang
  • Gang Wang

Autonomous vehicles (AVs) play an important role in transforming our transportation systems and relieving traffic congestion. To guarantee their safety, AVs must be sufficiently tested before they are deployed to public roads. Existing testing often focuses on AVs’ collision avoidance on a given route. There is little work on the systematic testing for AVs’ route planning and tracking on a map. In this paper, we propose CROUTE, a novel testing method based on a new AV testing criterion called route coverage. First, the map is modeled as a labeled Petri net, where roads, junctions, and traffic signs are modeled as places, transitions, and labels, respectively. Second, based on the Petri net, we define junctions’ topology features and route features for junction classification. The topology feature describes the topology of roads forming the junction, and the route feature identifies the actions that a vehicle can take to follow a route. They can characterize route types on a map. Hence, route coverage measures how many route types are covered. We then propose a systematic method that aims to cover all route types for a well-designed AV system with a small number of test cases. We implement and evaluate CROUTE on Baidu Apollo running with the LGSVL simulator. We carry out testing on the map from a section of San Francisco and find six different types of issues in Apollo. The experiment results show the validity of route coverage and the efficiency of CROUTE.

NeurIPS Conference 2020 Conference Paper

Decentralized TD Tracking with Linear Function Approximation and its Finite-Time Analysis

  • Gang Wang
  • Songtao Lu
  • Georgios Giannakis
  • Gerald Tesauro
  • Jian Sun

The present contribution deals with decentralized policy evaluation in multi-agent Markov decision processes using temporal-difference (TD) methods with linear function approximation for scalability. The agents cooperate to estimate the value function of such a process by observing continual state transitions of a shared environment over the graph of interconnected nodes (agents), along with locally private rewards. Different from existing consensus-type TD algorithms, the approach here develops a simple decentralized TD tracker by wedding TD learning with gradient tracking techniques. The non-asymptotic properties of the novel TD tracker are established for both independent and identically distributed (i.i.d.) as well as Markovian transitions through a unifying multistep Lyapunov analysis. In contrast to the prior art, the novel algorithm forgoes the limiting error bounds on the number of agents, which endows it with performance comparable to that of centralized TD methods that are the sharpest known to date.

AAAI Conference 2019 Conference Paper

Connecting the Digital and Physical World: Improving the Robustness of Adversarial Attacks

  • Steve T.K. Jan
  • Joseph Messou
  • Yen-Chen Lin
  • Jia-Bin Huang
  • Gang Wang

While deep learning models have achieved unprecedented success in various domains, there is also a growing concern of adversarial attacks against related applications. Recent results show that by adding a small amount of perturbations to an image (imperceptible to humans), the resulting adversarial examples can force a classifier to make targeted mistakes. So far, most existing works focus on crafting adversarial examples in the digital domain, while limited efforts have been devoted to understanding the physical domain attacks. In this work, we explore the feasibility of generating robust adversarial examples that remain effective in the physical domain. Our core idea is to use an image-to-image translation network to simulate the digital-to-physical transformation process for generating robust adversarial examples. To validate our method, we conduct a large-scale physical-domain experiment, which involves manually taking more than 3000 physical domain photos. The results show that our method outperforms existing ones by a large margin and demonstrates a high level of robustness and transferability.

IS Journal 2019 Journal Article

Identifying Adverse Drug Events From Social Media Using An Improved Semisupervised Method

  • Jing Liu
  • Gang Wang
  • Gang Chen

Adverse drug event (ADE) is a serious health concern. Social media has provided patients a broad platform to share their ADE experiences, impelling the development of social media-based pharmacovigilance. However, social media analysis of ADEs presents several important challenges that need to be addressed for high-performing ADE identification. To address these challenges, a feature weighted-based improved disagreement-based semisupervised learning method, named WIDSSL, is proposed for effectively identifying ADEs from non-ADEs. Empirical results demonstrate the effectiveness of WIDSSL. Our proposed WIDSSL method can reduce the reliance on a large number of labeled instances for high-performing ADE identification, and hence enhance the feasibility of conducting social media-based pharmacovigilance.

AAAI Conference 2019 Conference Paper

Modeling Local Dependence in Natural Language with Multi-Channel Recurrent Neural Networks

  • Chang Xu
  • Weiran Huang
  • Hongwei Wang
  • Gang Wang
  • Tie-Yan Liu

Recurrent Neural Networks (RNNs) have been widely used in processing natural language tasks and achieve huge success. Traditional RNNs usually treat each token in a sentence uniformly and equally. However, this may miss the rich semantic structure information of a sentence, which is useful for understanding natural languages. Since semantic structures such as word dependence patterns are not parameterized, it is a challenge to capture and leverage structure information. In this paper, we propose an improved variant of RNN, Multi-Channel RNN (MC-RNN), to dynamically capture and leverage local semantic structure information. Concretely, MC-RNN contains multiple channels, each of which represents a local dependence pattern at a time. An attention mechanism is introduced to combine these patterns at each step, according to the semantic information. Then we parameterize structure information by adaptively selecting the most appropriate connection structures among channels. In this way, diverse local structures and dependence patterns in sentences can be well captured by MC-RNN. To verify the effectiveness of MC-RNN, we conduct extensive experiments on typical natural language processing tasks, including neural machine translation, abstractive summarization, and language modeling. Experimental results on these tasks all show significant improvements of MC-RNN over current top systems.

JBHI Journal 2019 Journal Article

Multimodal Depression Detection: Fusion of Electroencephalography and Paralinguistic Behaviors Using a Novel Strategy for Classifier Ensemble

  • Xiaowei Zhang
  • Jian Shen
  • Zia ud Din
  • Jinyong Liu
  • Gang Wang
  • Bin Hu

Currently, depression has become a common mental disorder and one of the main causes of disability worldwide. Due to the difference in depressive symptoms evoked by individual differences, how to design comprehensive and effective depression detection methods has become an urgent demand. This study explored from physiological and behavioral perspectives simultaneously and fused pervasive electroencephalography (EEG) and vocal signals to make the detection of depression more objective, effective and convenient. After extraction of several effective features for these two types of signals, we trained six representational classifiers on each modality, then denoted diversity and correlation of decisions from different classifiers using co-decision tensor and combined these decisions into the ultimate classification result with multi-agent strategy. Experimental results on 170 (81 depressed patients and 89 normal controls) subjects showed that the proposed multi-modal depression detection strategy is superior to the single-modal classifiers or other typical late fusion strategies in accuracy, f1-score and sensitivity. This work indicates that late fusion of pervasive physiological and behavioral signals is promising for depression detection and the multi-agent strategy can take advantage of diversity and correlation of different classifiers effectively to gain a better final decision.

IJCAI Conference 2019 Conference Paper

Polygon-Net: A General Framework for Jointly Boosting Multiple Unsupervised Neural Machine Translation Models

  • Chang Xu
  • Tao Qin
  • Gang Wang
  • Tie-Yan Liu

Neural machine translation (NMT) has achieved great success. However, collecting large-scale parallel data for training is costly and laborious. Recently, unsupervised neural machine translation has attracted more and more attention, due to its demand for monolingual corpora only, which are common and easy to obtain, and its great potential for low-resource or even zero-resource machine translation. In this work, we propose a general framework called Polygon-Net, which leverages multiple auxiliary languages for jointly boosting unsupervised neural machine translation models. Specifically, we design a novel loss function for multi-language unsupervised neural machine translation. In addition, unlike prior work that updates just one or two models individually, Polygon-Net enables multiple unsupervised models in the framework to update in turn and enhance each other for the first time. In this way, multiple unsupervised translation models are associated with each other for training to achieve better performance. Experiments on benchmark datasets including the UN Corpus and WMT show that our approach significantly improves over the two-language based methods, and achieves better performance with more languages introduced to the framework.

AAAI Conference 2018 Conference Paper

Stack-Captioning: Coarse-to-Fine Learning for Image Captioning

  • Jiuxiang Gu
  • Jianfei Cai
  • Gang Wang
  • Tsuhan Chen

The existing image captioning approaches typically train a one-stage sentence decoder, which makes it difficult to generate rich fine-grained descriptions. On the other hand, multi-stage image captioning models are hard to train due to the vanishing gradient problem. In this paper, we propose a coarse-to-fine multi-stage prediction framework for image captioning, composed of multiple decoders each of which operates on the output of the previous stage, producing increasingly refined image descriptions. Our proposed learning approach addresses the difficulty of vanishing gradients during training by providing a learning objective function that enforces intermediate supervisions. Particularly, we optimize our model with a reinforcement learning approach which utilizes the output of each intermediate decoder’s test-time inference algorithm as well as the output of its preceding decoder to normalize the rewards, which simultaneously solves the well-known exposure bias problem and the loss-evaluation mismatch problem. We extensively evaluate the proposed approach on MSCOCO and show that our approach can achieve state-of-the-art performance.

NeurIPS Conference 2017 Conference Paper

Solving Most Systems of Random Quadratic Equations

  • Gang Wang
  • Georgios Giannakis
  • Yousef Saad
  • Jie Chen

This paper deals with finding an $n$-dimensional solution $\bm{x}$ to a system of quadratic equations $y_i=|\langle\bm{a}_i, \bm{x}\rangle|^2$, $1\le i \le m$, which in general is known to be NP-hard. We put forth a novel procedure, that starts with a \emph{weighted maximal correlation initialization} obtainable with a few power iterations, followed by successive refinements based on \emph{iteratively reweighted gradient-type iterations}. The novel techniques distinguish themselves from prior works by the inclusion of a fresh (re)weighting regularization. For certain random measurement models, the proposed procedure returns the true solution $\bm{x}$ with high probability in time proportional to reading the data $\{(\bm{a}_i; y_i)\}_{1\le i \le m}$, provided that the number $m$ of equations is some constant $c>0$ times the number $n$ of unknowns, that is, $m\ge cn$. Empirically, the upshots of this contribution are: i) perfect signal recovery in the high-dimensional regime given only an \emph{information-theoretic limit number} of equations; and, ii) (near-)optimal statistical accuracy in the presence of additive noise. Extensive numerical tests using both synthetic data and real images corroborate its improved signal recovery performance and computational efficiency relative to state-of-the-art approaches.

JBHI Journal 2016 Journal Article

Epileptic Seizure Detection Based on Partial Directed Coherence Analysis

  • Gang Wang
  • Zhongjiang Sun
  • Ran Tao
  • Kuo Li
  • Gang Bao
  • Xiangguo Yan

Long-term video EEG epilepsy monitoring can help doctors diagnose and cure epilepsy. The workload of doctors to read the EEG signals of epilepsy patients can be effectively reduced by automatic seizure detection. The application of partial directed coherence (PDC) analysis as a mechanism for feature extraction in the scalp EEG recordings for seizure detection could reflect the physiological changes of brain activity before and after seizure onsets. In this study, a new approach on the basis of PDC was proposed to detect the seizure intervals of epilepsy patients. First of all, the multivariate autoregressive model was established for a moving window and the direction and intensity of information flow based on PDC analysis was calculated. Then, the outflow information related to a certain EEG channel could be obtained by summing up the intensity of information flow propagated to other EEG channels in order to reduce the feature dimensionality. At last, according to the pathological features of epileptic seizures, the outflow information was regarded as the input vectors to a support vector machine classifier for discriminating interictal periods and ictal periods of EEG signals. The proposed method achieved a good performance with the correct rate of 98.3%, the selectivity rate of 67.88%, the sensitivity rate of 91.44%, the specificity rate of 99.34%, and the average detection rate of 95.39%, which demonstrated that this method was suitable for detecting the seizure intervals of epilepsy patients. By comparing with other existing techniques, the proposed method based on PDC analysis achieved significant improvement in terms of seizure detection.

NeurIPS Conference 2016 Conference Paper

Solving Random Systems of Quadratic Equations via Truncated Generalized Gradient Flow

  • Gang Wang
  • Georgios Giannakis

This paper puts forth a novel algorithm, termed \emph{truncated generalized gradient flow} (TGGF), to solve for $\bm{x}\in\mathbb{R}^n/\mathbb{C}^n$ a system of $m$ quadratic equations $y_i=|\langle\bm{a}_i, \bm{x}\rangle|^2$, $i=1, 2, \ldots, m$, which even for $\left\{\bm{a}_i\in\mathbb{R}^n/\mathbb{C}^n\right\}_{i=1}^m$ random is known to be \emph{NP-hard} in general. We prove that as soon as the number of equations $m$ is on the order of the number of unknowns $n$, TGGF recovers the solution exactly (up to a global unimodular constant) with high probability and complexity growing linearly with the time required to read the data $\left\{\left(\bm{a}_i; \, y_i\right)\right\}_{i=1}^m$. Specifically, TGGF proceeds in two stages: s1) A novel \emph{orthogonality-promoting} initialization that is obtained with simple power iterations; and, s2) a refinement of the initial estimate by successive updates of scalable \emph{truncated generalized gradient iterations}. The former is in sharp contrast to the existing spectral initializations, while the latter handles the rather challenging nonconvex and nonsmooth \emph{amplitude-based} cost function. Numerical tests demonstrate that: i) The novel orthogonality-promoting initialization method returns more accurate and robust estimates relative to its spectral counterparts; and ii) even with the same initialization, our refinement/truncation outperforms Wirtinger-based alternatives, all corroborating the superior performance of TGGF over state-of-the-art algorithms.

JBHI Journal 2016 Journal Article

The Removal of EOG Artifacts From EEG Signals Using Independent Component Analysis and Multivariate Empirical Mode Decomposition

  • Gang Wang
  • Chaolin Teng
  • Kuo Li
  • Zhonglin Zhang
  • Xiangguo Yan

The recorded electroencephalography (EEG) signals are usually contaminated by electrooculography (EOG) artifacts. In this paper, by using independent component analysis (ICA) and multivariate empirical mode decomposition (MEMD), the ICA-based MEMD method was proposed to remove EOG artifacts (EOAs) from multichannel EEG signals. First, the EEG signals were decomposed by the MEMD into multiple multivariate intrinsic mode functions (MIMFs). The EOG-related components were then extracted by reconstructing the MIMFs corresponding to EOAs. After performing the ICA of EOG-related signals, the EOG-linked independent components were distinguished and rejected. Finally, the clean EEG signals were reconstructed by implementing the inverse transform of ICA and MEMD. The results of simulated and real data suggested that the proposed method could successfully eliminate EOAs from EEG signals and preserve useful EEG information with little loss. By comparing with other existing techniques, the proposed method achieved substantial improvement in terms of the increase of signal-to-noise ratio and the decrease of mean square error after removing EOAs.

IJCAI Conference 2011 Conference Paper

Distance Metric Learning under Covariate Shift

  • Bin Cao
  • Xiaochuan Ni
  • Jian-Tao Sun
  • Gang Wang
  • Qiang Yang

Learning distance metrics is a fundamental problem in machine learning. Previous distance-metric learning research assumes that the training and test data are drawn from the same distribution, which may be violated in practical applications. When the distributions differ, a situation referred to as covariate shift, the metric learned from training data may not work well on the test data. In this case the metric is said to be inconsistent. In this paper, we address this problem by proposing a novel metric learning framework known as consistent distance metric learning (CDML), which solves the problem under covariate shift situations. We theoretically analyze the conditions when the metrics learned under covariate shift are consistent. Based on the analysis, a convex optimization problem is proposed to deal with the CDML problem. An importance sampling method is proposed for metric learning and two importance weighting strategies are proposed and compared in this work. Experiments are carried out on synthetic and real world datasets to show the effectiveness of the proposed method.

AAAI Conference 2008 Conference Paper

Semi-supervised Classification Using Local and Global Regularization

  • Fei Wang
  • Gang Wang

In this paper, we propose a semi-supervised learning (SSL) algorithm based on local and global regularization. In the local regularization part, our algorithm constructs a regularized classifier for each data point using its neighborhood, while the global regularization part adopts a Laplacian regularizer to smooth the data labels predicted by those local classifiers. We show that some existing SSL algorithms can be derived from our framework. Finally we present some experimental results to show the effectiveness of our method.