Arrow Research search

Author name cluster

Dan Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

19 papers
2 author rows

Possible papers

19

JBHI Journal 2026 Journal Article

Automatic Sleep Staging of Single-Channel Ear-EEG Signals With a Probabilistic Ensemble Learning Approach

  • Hongyu Liang
  • Yongxuan Wang
  • Le Yang
  • Meimei Wu
  • Dan Wang
  • Xiaohong Wang
  • Rong Liu

Accurate sleep staging is crucial for the early diagnosis of neurodegenerative diseases and the management of sleep disorders. To provide a user-friendly, non-intrusive, and long-term monitoring solution, we explored the potential clinical applications of ear-electroencephalogram (ear-EEG). This study proposes a probabilistic ensemble learning approach for automatic sleep staging using single-channel ear-EEG data. The proposed method integrates Extreme Gradient Boosting (XGBoost) with Linear Discriminant Analysis (LDA), augmented by transition matrix correction and probability weighting strategies, to capture temporal sleep patterns without compromising data integrity or requiring intensive preprocessing. An ear-EEG with polysomnography (ear-PSG) dataset, collected from twenty subjects using our custom-developed ear-EEG sensor, was compared with two public datasets, ear-Feature and Sleep-EDF, to validate both the reliability of the data and the effectiveness of the proposed approach. The results indicate that transition matrix correction is particularly effective when training and testing are conducted using single-epoch inputs, whereas model weighting demonstrates greater stability as the number of epochs increases. When using seven-epoch input sequences, leave-one-subject-out (LOSO) cross-validation achieved 0.814 accuracy with a 0.749 kappa coefficient on ear-PSG (earL-R), and 0.841 accuracy with a 0.779 kappa coefficient on the ear-Feature dataset. The design of a single-channel cross-ear intra-auricular ear-EEG configuration, combined with an ensemble learning framework, effectively balances device portability and classification performance, offering new insights for the clinical translation of wearable sleep monitoring technology and laying a foundation for the development of portable sleep monitoring devices.
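The two post-processing strategies named in the abstract can be illustrated with a small sketch. This is an illustrative reconstruction, not the paper's implementation: the blending weight `w`, the five-stage label set, and the toy transition matrix are all assumptions.

```python
import numpy as np

def ensemble_stage_probs(p_xgb, p_lda, w=0.5):
    """Probability weighting: blend per-epoch class probabilities
    from the two base models with a scalar weight."""
    return w * p_xgb + (1.0 - w) * p_lda

def transition_correct(probs, trans, prev_stage):
    """Transition matrix correction: re-weight the blended probabilities
    by the empirical transition probabilities out of the previously
    predicted stage, then renormalize."""
    corrected = probs * trans[prev_stage]
    return corrected / corrected.sum()

# Toy example with 5 sleep stages (W, N1, N2, N3, REM).
rng = np.random.default_rng(0)
trans = rng.dirichlet(np.ones(5), size=5)   # row-stochastic transition matrix
p_xgb = np.array([0.1, 0.2, 0.5, 0.1, 0.1])
p_lda = np.array([0.2, 0.1, 0.4, 0.2, 0.1])

blended = ensemble_stage_probs(p_xgb, p_lda, w=0.6)
final = transition_correct(blended, trans, prev_stage=2)
print(final.argmax())
```

In a real pipeline the transition matrix would be estimated from training hypnograms rather than sampled randomly.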

AAAI Conference 2026 Conference Paper

Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation

  • Rongyu Zhang
  • Aosong Cheng
  • Yulin Luo
  • Gaole Dai
  • Huanrui Yang
  • Jiaming Liu
  • Ran Xu
  • Li Du

Continual Test-Time Adaptation (CTTA), which aims to adapt the pre-trained model to ever-evolving target domains, emerges as an important task for vision models. As current vision models appear to be heavily biased towards texture, continuously adapting the model from one domain distribution to another can result in serious catastrophic forgetting. Drawing inspiration from the encoding characteristics of neuron activation in neural networks, we propose the Mixture-of-Activation-Sparsity-Experts (MoASE) for the CTTA task. Given the distinct reaction of neurons with low and high activation to domain-specific and agnostic features, MoASE decomposes the neural activation into high-activation and low-activation components in each expert with a Spatial Differentiable Dropout (SDD). Based on the decomposition, we devise a Domain-Aware Router (DAR) that utilizes domain information to adaptively weight experts that process the post-SDD sparse activations, and the Activation Sparsity Gate (ASG) that adaptively assigns feature selection thresholds of the SDD for different experts for more precise feature decomposition. Finally, we introduce a Homeostatic-Proximal (HP) loss to maintain update consistency between the teacher and student experts to prevent error accumulation. Extensive experiments substantiate that MoASE achieves state-of-the-art performance in both classification and segmentation tasks.

AAAI Conference 2026 Conference Paper

MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation

  • Rongyu Zhang
  • Menghang Dong
  • Yuan Zhang
  • Liang Heng
  • Xiaowei Chi
  • Gaole Dai
  • Li Du
  • Dan Wang

Vision-Language-Action (VLA) models enable robotic systems to perform embodied tasks but face deployment challenges due to the high computational demands of the dense Large Language Models (LLMs), with existing early-exit-based sparsification methods often overlooking the critical semantic role of final layers in downstream tasks. Aligning with the recent breakthrough of the Shallow Brain Hypothesis (SBH) in neuroscience and the mixture of experts in model sparsification, we conceptualize each LLM layer as an expert and propose a Mixture-of-LayEr Vision Language Action model (MoLe-VLA or simply MoLe) architecture for dynamic LLM layer activation. Specifically, we introduce a Spatial-Temporal Aware Router (STAR) for MoLe to selectively activate only parts of the layers based on the robot’s current state, mimicking the brain's distinct signal pathways specialized for cognition and causal reasoning. Additionally, to compensate for the cognition ability of LLM lost during the layer-skipping, we devise a Cognitive self-Knowledge Distillation (CogKD) to enhance the understanding of task demands and generate task-relevant action sequences by leveraging cognition features. Extensive experiments in RLBench simulations and real-world environments demonstrate the superiority of MoLe-VLA in both efficiency and performance, improving the mean success rate by 9.7% across ten simulation tasks while accelerating inference by 36.8% over OpenVLA.

ICML Conference 2025 Conference Paper

M3-JEPA: Multimodal Alignment via Multi-gate MoE based on the Joint-Embedding Predictive Architecture

  • Hongyang Lei
  • Xiaolong Cheng
  • Qi Qin
  • Dan Wang
  • Huazhen Huang
  • Qingqing Gu
  • Yetao Wu
  • Luo Ji

Current multimodal learning strategies primarily optimize in the original token space. Such a framework is easy to incorporate with the backbone of a pretrained language model, but might result in modality collapse. To alleviate such issues, we leverage the Joint-Embedding Predictive Architecture (JEPA) on multimodal tasks, which converts the input embedding into the output embedding space by a predictor and then conducts the cross-modal alignment in the latent space. We implement this predictor by a Multi-Gate Mixture of Experts (MMoE) and accordingly name the framework M3-JEPA. The gating function disentangles the modality-specific and shared information and derives information-theoretic optimality. The framework is implemented with both contrastive and regularization losses, and solved by alternating gradient descent (AGD) between different multimodal tasks. Through thoroughly designed experiments, we show that M3-JEPA can obtain state-of-the-art performance on different modalities and tasks, generalize to unseen datasets and domains, and is computationally efficient in both training and inference. Our observations suggest that M3-JEPA might become a new basis for self-supervised learning in the open world.

NeurIPS Conference 2025 Conference Paper

Synthetic Series-Symbol Data Generation for Time Series Foundation Models

  • Wenxuan Wang
  • Kai Wu
  • yujian li
  • Dan Wang
  • Xiaoyu Zhang

Foundation models for time series analysis (TSA) have attracted significant attention. However, challenges such as training data scarcity and imbalance continue to hinder their development. Inspired by complex dynamic system theories, we design a series-symbol data generation mechanism, enabling the unrestricted creation of high-quality time series data paired with corresponding symbolic expressions. To leverage series-symbol data pairs with strong correlations, we develop SymTime, a pre-trained foundation model for enhancing time series representation using symbolic information. SymTime demonstrates competitive performance across five major TSA tasks when fine-tuned on downstream tasks, rivaling foundation models pre-trained on real-world datasets. This approach underscores the potential of series-symbol data generation and pretraining mechanisms in overcoming data scarcity and enhancing task performance. The code is available at https://github.com/wwhenxuan/SymTime.

NeurIPS Conference 2025 Conference Paper

Unveiling the Uncertainty in Embodied and Operational Carbon of Large AI Models through a Probabilistic Carbon Accounting Model

  • Xiaoyang Zhang
  • He Fang
  • Yang Deng
  • Dan Wang

The rapid growth of large AI models has raised significant environmental concerns due to their substantial carbon footprint. Existing carbon accounting methods for AI models are fundamentally deterministic and fail to account for inherent uncertainties in embodied and operational carbon emissions. Our work aims to investigate the effect of these uncertainties on embodied and operational carbon footprint estimates for large AI models. We propose a Probabilistic Carbon Accounting Model (PCAM), which quantifies uncertainties in the carbon accounting of large AI models. We develop parameter models to quantify key components (processors, memory, storage) in the carbon footprint of AI models. To characterize the distribution of the parameters, we develop a carbon dataset by aggregating related data from various sources. Then, we generate the probabilistic distribution of the parameters from the collected dataset. We compare the performance of PCAM with LLMCarbon, the state-of-the-art carbon accounting method for large AI models. PCAM achieves $\leq 7.44\%$ error compared to LLMCarbon’s $\leq 108.51\%$.
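The core of PCAM's approach, propagating parameter distributions through a carbon model via Monte Carlo sampling, can be sketched as follows. All distributions, component counts, and magnitudes here are invented for illustration; the paper derives its parameter distributions from a collected carbon dataset.

```python
import numpy as np

def carbon_mc(n=100_000, seed=0):
    """Monte Carlo sketch of probabilistic carbon accounting: sample
    uncertain parameters and propagate them to embodied + operational
    carbon, returning the 5th/50th/95th percentiles (kg CO2e)."""
    rng = np.random.default_rng(seed)
    # Embodied carbon: per-unit footprints for processors, memory, storage
    # (lognormal spreads and unit counts are illustrative assumptions).
    cpu = rng.lognormal(mean=np.log(20.0), sigma=0.3, size=n) * 1000
    mem = rng.lognormal(mean=np.log(5.0), sigma=0.4, size=n) * 4000
    sto = rng.lognormal(mean=np.log(2.0), sigma=0.4, size=n) * 2000
    # Operational carbon: training energy (MWh) x grid intensity (kg/kWh).
    energy_mwh = rng.normal(1300.0, 100.0, size=n)
    intensity = rng.triangular(0.2, 0.4, 0.7, size=n)
    total = cpu + mem + sto + energy_mwh * 1000.0 * intensity
    return np.percentile(total, [5, 50, 95])

lo, med, hi = carbon_mc()
print(f"total carbon (kg CO2e): p5={lo:.0f}, median={med:.0f}, p95={hi:.0f}")
```

The width of the resulting interval is exactly the uncertainty that a single deterministic point estimate hides.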

AAAI Conference 2025 Conference Paper

VVRec: Reconstruction Attacks on DL-based Volumetric Video Upstreaming via Latent Diffusion Model with Gamma Distribution

  • Rui Lu
  • Bihai Zhang
  • Dan Wang

With the popularity of 3D volumetric video applications, such as Autonomous Driving, Virtual Reality, and Mixed Reality, current developers have turned to deep learning for compressing volumetric video frames, i.e., point clouds, for video upstreaming. The latest deep learning-based solutions offer higher efficiency, lower distortion, and better hardware support compared to traditional ones like MPEG and JPEG. However, privacy threats arise, especially reconstruction attacks that aim to recover the original input point cloud from the intermediate results. In this paper, we design VVRec, which is, to the best of our knowledge, the first reconstruction attack scheme targeting DL-based volumetric video upstreaming. VVRec demonstrates the ability to reconstruct high-quality point clouds from intercepted intermediate transmission results using four well-trained neural network modules we design. Leveraging the latest latent diffusion models with Gamma distribution and a refinement algorithm, VVRec excels in reconstruction quality and color recovery and surpasses existing defenses. We evaluate VVRec using three volumetric video datasets. The results demonstrate that VVRec achieves 64.70dB reconstruction accuracy, with an impressive 46.39% reduction of distortion over baselines.

IJCAI Conference 2025 Conference Paper

Weather Foundation Model Enhanced Decentralized Photovoltaic Power Forecasting Through Spatio-temporal Knowledge Distillation

  • Fang He
  • Jiaqi Fan
  • Yang Deng
  • Xiaoyang Zhang
  • Ka Tai Lau
  • Dan Wang

The solar photovoltaic power forecasting (SPPF) of a PV system is vital for downstream power estimation. Existing approaches for decentralized PV systems require a customized model for each PV installation, which is labor-intensive and not scalable. Therefore, developing a general SPPF model for decentralized PV systems is essential. The primary challenge in developing such a model is accounting for regional weather variations. Recent advancements in weather foundation models (WFMs) offer a promising opportunity, providing accurate forecasts with reduced computational demands. However, integrating WFMs into SPPF models remains challenging due to the complexity of WFMs. This paper introduces a novel approach, spatio-temporal knowledge distillation (STKD), to efficiently adapt WFMs for SPPF. The proposed STKD-PV models leverage regional weather and PV power data to forecast power generation from six hours to a day ahead. Evaluated globally across six datasets, STKD-PV models demonstrate superior performance compared to state-of-the-art (SOTA) time-series models and fine-tuned WFMs, achieving significant improvements in forecasting accuracy. This study marks the first application of knowledge distillation from WFMs to SPPF, offering a scalable and cost-effective solution for decentralized PV systems.

NeurIPS Conference 2024 Conference Paper

MMM-RS: A Multi-modal, Multi-GSD, Multi-scene Remote Sensing Dataset and Benchmark for Text-to-Image Generation

  • Jialin Luo
  • Yuanzhi Wang
  • Ziqi Gu
  • Yide Qiu
  • Shuaizhen Yao
  • Fuyun Wang
  • Chunyan Xu
  • Wenhua Zhang

Recently, the diffusion-based generative paradigm has achieved impressive general image generation capabilities with text prompts due to its accurate distribution modeling and stable training process. However, generating diverse remote sensing (RS) images that are tremendously different from general images in terms of scale and perspective remains a formidable challenge due to the lack of a comprehensive remote sensing image generation dataset with various modalities, ground sample distances (GSD), and scenes. In this paper, we propose a Multi-modal, Multi-GSD, Multi-scene Remote Sensing (MMM-RS) dataset and benchmark for text-to-image generation in diverse remote sensing scenarios. Specifically, we first collect nine publicly available RS datasets and conduct standardization for all samples. To bridge RS images to textual semantic information, we utilize a large-scale pretrained vision-language model to automatically output text prompts and perform hand-crafted rectification, resulting in information-rich text-image pairs (including multi-modal images). In particular, we design methods to obtain images with different GSD and various environments (e.g., low-light, foggy) in a single sample. With extensive manual screening and refined annotations, we ultimately obtain the MMM-RS dataset, which comprises approximately 2.1 million text-image pairs. Extensive experimental results verify that our proposed MMM-RS dataset allows off-the-shelf diffusion models to generate diverse RS images across various modalities, scenes, weather conditions, and GSD. The dataset is available at https://github.com/ljl5261/MMM-RS.

JBHI Journal 2021 Journal Article

Multi-Source Transfer Learning Via Multi-Kernel Support Vector Machine Plus for B-Mode Ultrasound-Based Computer-Aided Diagnosis of Liver Cancers

  • Huili Zhang
  • Lehang Guo
  • Dan Wang
  • Jun Wang
  • Lili Bao
  • Shihui Ying
  • Huixiong Xu
  • Jun Shi

B-mode ultrasound (BUS) imaging is a routine tool for the diagnosis of liver cancers, while contrast-enhanced ultrasound (CEUS) provides additional information to BUS on local tissue vascularization and perfusion to promote diagnostic accuracy. In this work, we propose to improve BUS-based computer-aided diagnosis of liver cancers by transferring knowledge from multi-view CEUS images, including the arterial phase, portal venous phase, and delayed phase. To make full use of the shared labels of paired BUS and CEUS images to guide knowledge transfer, support vector machine plus (SVM+), a transfer learning (TL) classifier specifically designed for paired data with shared labels, is adopted for this supervised TL. A nonparallel hyperplane based SVM+ (NHSVM+) is first proposed to improve the TL performance by transferring per-class knowledge from the source domain to the corresponding target domain. Moreover, to handle the issue of multi-source TL, a multi-kernel learning based NHSVM+ (MKL-NHSVM+) algorithm is further developed to effectively transfer multi-source knowledge from multi-view CEUS images. The experimental results indicate that the proposed MKL-NHSVM+ outperforms all the compared algorithms for diagnosis of liver cancers, with mean classification accuracy, sensitivity, and specificity of 88.18 ± 3.16%, 86.98 ± 4.77%, and 89.42 ± 3.77%, respectively.

IJCAI Conference 2020 Conference Paper

An Attention-based Model for Conversion Rate Prediction with Delayed Feedback via Post-click Calibration

  • Yumin Su
  • Liang Zhang
  • Quanyu Dai
  • Bo Zhang
  • Jinyao Yan
  • Dan Wang
  • Yongjun Bao
  • Sulong Xu

Conversion rate (CVR) prediction is becoming increasingly important in the multi-billion dollar online display advertising industry. It has two major challenges: firstly, the scarce user history data is very complicated and non-linear; secondly, the time delay between clicks and the corresponding conversions can be very large, e.g., ranging from seconds to weeks. Existing models usually suffer from such scarce and delayed conversion behaviors. In this paper, we propose a novel deep learning framework to tackle the two challenges. Specifically, we extract pre-trained embeddings from impressions/clicks to assist the conversion models and propose an inner/self-attention mechanism to capture fine-grained personalized product purchase interests from the sequential click data. Besides, to overcome the time-delay issue, we calibrate the delay model by learning a dynamic hazard function from the abundant post-click data, bringing it more in line with the real distribution. Empirical experiments with real-world user behavior data prove the effectiveness of the proposed method.
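The delay-calibration idea, treating unconverted clicks as censored observations rather than definite negatives via a survival/hazard formulation, can be sketched with the classic exponential delay model. This is a simplification: the paper learns a dynamic hazard function, whereas the constant rate `lam` below is an assumption for illustration.

```python
import math

def delayed_feedback_loglik(p, lam, converted, elapsed_days):
    """Log-likelihood of one click under a simple exponential delay model.
      p            - predicted conversion probability for this click
      lam          - hazard rate of the exponential click-to-conversion delay
      converted    - True if a conversion was observed
      elapsed_days - delay to conversion if converted, else time since click
    """
    if converted:
        # Converted after delay d: density is p * lam * exp(-lam * d).
        return math.log(p) + math.log(lam) - lam * elapsed_days
    # Not (yet) converted after e days: either a true negative (prob 1-p)
    # or a conversion that simply has not happened yet (prob p * exp(-lam*e)).
    return math.log((1.0 - p) + p * math.exp(-lam * elapsed_days))

ll_pos = delayed_feedback_loglik(0.3, 0.5, True, 2.0)
ll_neg = delayed_feedback_loglik(0.3, 0.5, False, 7.0)
print(ll_pos, ll_neg)
```

Maximizing this likelihood jointly over the CVR model and the delay parameters is what keeps recent, not-yet-converted clicks from dragging the predicted CVR down.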

TIST Journal 2020 Journal Article

Contextual Anomaly Detection in Solder Paste Inspection with Multi-Task Learning

  • Zimu Zheng
  • Jie Pu
  • Linghui Liu
  • Dan Wang
  • Xiangming Mei
  • Sen Zhang
  • Quanyu Dai

In this article, we study solder paste inspection (SPI), an important stage in the semiconductor manufacturing industry, where abnormal boards should be detected. A highly accurate SPI can substantially reduce human expert involvement, as well as reduce the waste of disposing of boards in good condition. A key difference today is that, because of increasing demand for board customization, the number of board types increases substantially while the quantity of boards produced in each type decreases. Thus, the previous approaches, where a fine-tuned model is developed for each board type, are no longer viable. Intrinsically, our problem is an anomaly detection problem. A major peculiarity of today’s SPI is that the target tasks for prediction cannot be fully pre-determined, due to context changes during the solder paste printing stage. Our experience shows that a conventional approach, which first defines a set of tasks and trains them offline, leads to low accuracy. Here, we propose a novel multi-task approach, where the performance of all target tasks is ensured simultaneously. We note that the SPI process is streamlined and automatic, allowing only a few seconds for SPI. We propose a fast clustering algorithm that reuses existing models to avoid retraining and fine-tuning in the inference phase. We evaluate our approach using 3-month data collected from production lines. We show that we can reduce 81.28% of false alarms. This can translate to annual savings of $11.3 million.

AAAI Conference 2020 Conference Paper

Distributed Machine Learning through Heterogeneous Edge Systems

  • Hanpeng Hu
  • Dan Wang
  • Chuan Wu

Many emerging AI applications request distributed machine learning (ML) among edge systems (e.g., IoT devices and PCs at the edge of the Internet), where data cannot be uploaded to a central venue for model training, due to their large volumes and/or security/privacy concerns. Edge devices are intrinsically heterogeneous in computing capacity, posing significant challenges to parameter synchronization for parallel training with the parameter server (PS) architecture. This paper proposes ADSP, a parameter synchronization model for distributed machine learning (ML) with heterogeneous edge systems. Eliminating the significant waiting time occurring with existing parameter synchronization models, the core idea of ADSP is to let faster edge devices continue training, while committing their model updates at strategically decided intervals. We design algorithms that decide time points for each worker to commit its model update, and ensure not only global model convergence but also faster convergence. Our testbed implementation and experiments show that ADSP outperforms existing parameter synchronization models significantly in terms of ML model convergence time, scalability and adaptability to large heterogeneity.

JBHI Journal 2019 Journal Article

Classification and Quantification of Emphysema Using a Multi-Scale Residual Network

  • Liying Peng
  • Yen-Wei Chen
  • Lanfen Lin
  • Hongjie Hu
  • Huali Li
  • Qingqing Chen
  • Xiaoli Ling
  • Dan Wang

Automated tissue classification is an essential step for the quantitative analysis and treatment of emphysema. Although many studies have been conducted in this area, two major challenges remain. First, different emphysematous tissues appear at different scales, which we call “inter-class variations.” Second, the intensities of CT images acquired from different patients, scanners, or scanning protocols may vary, which we call “intra-class variations.” In this paper, we present a novel multi-scale residual network with two channels: the raw CT image and its differential excitation component. We incorporate multi-scale information into our networks to address the challenge of inter-class variations. In addition to the conventional raw CT image, we use its differential excitation component as a pair of inputs to handle intra-class variations. Experimental results show that our approach has superior performance over the state-of-the-art methods, achieving a classification accuracy of 93.74% on our original emphysema database. Based on the classification results, we also perform a quantitative analysis of emphysema in 50 subjects by correlating the quantitative results (the area percentage of each class) with pulmonary functions. We show that centrilobular emphysema (CLE) and panlobular emphysema (PLE) have a strong correlation with pulmonary functions, and the sum of CLE and PLE can be used as a new and accurate measure of emphysema severity instead of the conventional measure (the sum of all subtypes of emphysema). The correlations between the new measure and various pulmonary functions are up to |r| = 0.922 (r is the correlation coefficient).

IJCAI Conference 2019 Conference Paper

Metadata-driven Task Relation Discovery for Multi-task Learning

  • Zimu Zheng
  • Yuqi Wang
  • Quanyu Dai
  • Huadi Zheng
  • Dan Wang

Task Relation Discovery (TRD), i.e., revealing the relations among tasks, has notable value: it is the key concept underlying Multi-task Learning (MTL) and provides a principled way of identifying redundancies across tasks. However, task relations are usually determined specifically by data scientists, resulting in additional human effort for TRD, while transfer based on brute-force methods or mere training samples may cause negative effects that degrade learning performance. To avoid negative transfer in an automatic manner, our idea is to leverage context attributes commonly available in modern systems, i.e., the metadata. In this paper, we, for the first time, introduce metadata into TRD for MTL and propose a novel Metadata Clustering method, which jointly uses historical samples and additional metadata to automatically exploit the true relatedness. It also avoids negative transfer by identifying reusable samples between related tasks. Experimental results on five real-world datasets demonstrate that the proposed method is effective for MTL with TRD, and particularly useful in complicated systems with diverse metadata but insufficient data samples. In general, this study helps in automatic relation discovery among partially related tasks and sheds new light on the development of TRD in MTL through the use of metadata as a priori information.

AAAI Conference 2018 Conference Paper

Adversarial Network Embedding

  • Quanyu Dai
  • Qiang Li
  • Jian Tang
  • Dan Wang

Learning low-dimensional representations of networks has proved effective in a variety of tasks such as node classification, link prediction, and network visualization. Existing methods can effectively encode different structural properties into the representations, such as neighborhood connectivity patterns, global structural role similarities, and other high-order proximities. However, beyond objectives that capture network structural properties, most of them lack additional constraints for enhancing the robustness of representations. In this paper, we aim to exploit the strengths of generative adversarial networks in capturing latent features, and investigate their contribution to learning stable and robust graph representations. Specifically, we propose an Adversarial Network Embedding (ANE) framework, which leverages the adversarial learning principle to regularize the representation learning. It consists of two components, i.e., a structure-preserving component and an adversarial learning component. The former aims to capture network structural properties, while the latter contributes to learning robust representations by matching the posterior distribution of the latent representations to given priors. As shown by the empirical results, our method is competitive with or superior to state-of-the-art approaches on benchmark network embedding tasks. The source code will be available online.

AAAI Conference 2018 Conference Paper

Supervised Deep Hashing for Hierarchical Labeled Data

  • Dan Wang
  • Heyan Huang
  • Chi Lu
  • Bo-Si Feng
  • Guihua Wen
  • Liqiang Nie
  • Xian-Ling Mao

Recently, hashing methods have been widely used in large-scale image retrieval. However, most existing supervised hashing methods do not consider the hierarchical relation of labels, which means that they ignore the rich semantic information stored in the hierarchy. Moreover, most previous works treat each bit in a hash code equally, which does not suit the scenario of hierarchical labeled data. To tackle the aforementioned problems, in this paper, we propose a novel deep hashing method, called supervised hierarchical deep hashing (SHDH), to perform hash code learning for hierarchical labeled data. Specifically, we define a novel similarity formula for hierarchical labeled data by weighting each level, and design a deep neural network to obtain a hash code for each data point. Extensive experiments on two real-world public datasets show that the proposed method outperforms the state-of-the-art baselines in the image retrieval task.
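The idea of a level-weighted similarity for hierarchically labeled data can be sketched as follows. The specific weights (coarser levels counting more) and the root-to-leaf path representation are illustrative assumptions, not the exact SHDH formula.

```python
def hierarchical_similarity(path_a, path_b, level_weights):
    """Similarity for hierarchically labeled items: accumulate the weight of
    each level, from the root down, on which the two label paths agree,
    stopping at the first disagreement."""
    sim = 0.0
    for label_a, label_b, weight in zip(path_a, path_b, level_weights):
        if label_a != label_b:
            break
        sim += weight
    return sim

# Label paths from root to leaf; coarser levels weighted more heavily.
weights = [0.5, 0.3, 0.2]
s_close = hierarchical_similarity(["animal", "dog", "husky"],
                                  ["animal", "dog", "poodle"], weights)
s_far = hierarchical_similarity(["animal", "dog", "husky"],
                                ["animal", "cat", "siamese"], weights)
print(s_close, s_far)  # 0.8 0.5
```

Under such a similarity, two dog breeds are more similar than a dog and a cat, which is exactly the semantic signal a flat 0/1 label similarity throws away.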