Arrow Research search

Author name cluster

Hua Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
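
As a rough illustration of what grouping by case-insensitive exact name match implies, the sketch below buckets author rows by a case-folded key. The row format and the helper name `cluster_by_exact_name` are assumptions made for this example; they are not Arrow's actual implementation.

```python
from collections import defaultdict


def cluster_by_exact_name(author_rows):
    """Group (row_id, name) pairs by case-insensitive exact name.

    Hypothetical helper: no identity disambiguation is attempted, so
    different people who share a name end up in the same cluster.
    """
    clusters = defaultdict(list)
    for row_id, name in author_rows:
        key = name.strip().casefold()  # case-insensitive comparison key
        clusters[key].append(row_id)
    return dict(clusters)


# Illustrative rows only; the IDs and spellings are made up.
rows = [(1, "Hua Wang"), (2, "hua wang"), (3, "HUA WANG"), (4, "Huan Wang")]
clusters = cluster_by_exact_name(rows)
```

Here the first three rows share one cluster and "Huan Wang" gets its own, which is why a cluster like this one can mix papers by several distinct researchers.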

57 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

Decoding with Structured Awareness: Integrating Directional, Frequency-Spatial, and Structural Attention for Medical Image Segmentation

  • Fan Zhang
  • Zhiwei Gu
  • Hua Wang

To address the limitations of Transformer decoders in capturing edge details, recognizing local textures and modeling spatial continuity, this paper proposes a novel decoder framework specifically designed for medical image segmentation, comprising three core modules. First, the Adaptive Cross-Fusion Attention (ACFA) module integrates channel feature enhancement with spatial attention mechanisms and introduces learnable guidance in three directions (planar, horizontal, and vertical) to enhance responsiveness to key regions and structural orientations. Second, the Triple Feature Fusion Attention (TFFA) module fuses features from Spatial, Fourier and Wavelet domains, achieving joint frequency-spatial representation that strengthens global dependency and structural modeling while preserving local information such as edges and textures, making it particularly effective in complex and blurred boundary scenarios. Finally, the Structural-aware Multi-scale Masking Module (SMMM) optimizes the skip connections between encoder and decoder by leveraging multi-scale context and structural saliency filtering, effectively reducing feature redundancy and improving semantic interaction quality. Working synergistically, these modules not only address the shortcomings of traditional decoders but also significantly enhance performance in high-precision tasks such as tumor segmentation and organ boundary extraction, improving both segmentation accuracy and model generalization. Experimental results demonstrate that this framework provides an efficient and practical solution for medical image segmentation.

AAAI Conference 2026 Conference Paper

Exploiting All Mamba Fusion for Efficient RGB-D Tracking

  • Ge Ying
  • Dawei Zhang
  • Chengzhuan Yang
  • Wei Liu
  • Sang-Woon Jeon
  • Hua Wang
  • Changqin Huang
  • Zhonglong Zheng

Despite the progress made through deep learning, existing Visual Object Tracking (VOT) frameworks struggle with real-world challenges. Recent approaches incorporate additional modalities like Depth, Thermal Infrared, and Language to enhance the robustness of VOT, particularly with the improvement of the depth sensor precision, facilitating RGB-D tracking. However, current RGB-D trackers often copy RGB tracking paradigms, leading to inefficiency due to two-stream architectures that fail to exploit heterogeneous features, and reliance on simplistic or large-parameter fusion methods. To address these challenges, we propose AMTrack, a one-stream RGB-D tracker leveraging Mamba's linear complexity for simultaneous feature extraction and two-stage cross-modal feature fusion. Our innovation also includes a low-parameter Multimodal Mix Mamba (3M) module, which optimizes deep feature fusion and reduces computational overhead. The advantage of the 3M module stems from our Multimodal State Space Model (MSSM), a multimodal feature interaction component reconstructed based on SSM. Experiments across multiple RGB-D tracking datasets indicate that AMTrack achieves superior performance with fewer parameters and lower memory demands than state-of-the-art methods.

AAAI Conference 2026 Conference Paper

IdealTSF: Can Non-Ideal Data Contribute to Enhancing the Performance of Time Series Forecasting Models?

  • Hua Wang
  • Jinghao Lu
  • Fan Zhang

Deep learning has shown strong performance in time series forecasting tasks. However, issues such as missing values and anomalies in sequential data hinder its further development in prediction tasks. Previous research has primarily focused on extracting feature information from sequence data or treating such suboptimal data as positive samples for knowledge transfer. A more effective approach would be to leverage these non-ideal negative samples to enhance event prediction. In response, this study highlights the advantages of non-ideal negative samples and proposes the IdealTSF framework, which integrates both ideal positive and negative samples for time series forecasting. IdealTSF consists of three progressive steps: pretraining, training, and optimization. It first pretrains the model by extracting knowledge from negative sample data, then transforms the sequence data into ideal positive samples during training. Additionally, a negative optimization mechanism with adversarial disturbances is applied. Extensive experiments demonstrate that negative sample data unlocks significant potential within the basic attention architecture for time series forecasting. Therefore, IdealTSF is particularly well-suited for applications with noisy samples or low-quality data.

AAAI Conference 2026 Conference Paper

IGIANet: Illumination Guided Implicit Alignment Network for Infrared–Visible UAV Detection

  • Xiangqi Chen
  • Dawei Zhang
  • Li Zhao
  • Chengzhuan Yang
  • Zhongyu Chen
  • Jungang Lou
  • Zhonglong Zheng
  • Sang-Woon Jeon

Visible-Infrared (RGB-IR) Unmanned Aerial Vehicle (UAV) object detection integrates complementary cues from visible and infrared sensors, offering broad application potential. However, due to sensor parallax, it still faces the challenge of weak spatial misalignment, which significantly limits its performance in UAV-based object detection. Existing methods emphasize strict alignment, overlooking spectral heterogeneity under varying illumination. To address these issues, we propose the Illumination Guided Implicit Alignment Network (IGIANet) to mitigate modality heterogeneity without explicit alignment. Specifically, we integrate three novel modules. First, we propose an illumination-guided frequency modulation module that adaptively allocates fusion weights to visible and infrared features based on global illumination estimation, effectively alleviating modality imbalance under varying lighting conditions. Second, we introduce a frequency-guided cross-modality differential enhancement module, which computes differential cues across frequency domains to enhance complementary information and highlight weakly aligned and low-contrast regions. Finally, we introduce an implicit alignment-driven dynamic fusion module that actively estimates offsets and generates dynamic, position-adaptive fusion kernels to align and fuse modalities. Extensive experiments demonstrate that IGIANet outperforms state-of-the-art models on various benchmarks, achieving 80.9% mAP on DroneVehicle, 57.1% mAP on VEDAI, and 49.4% mAP on FLIR.

AAAI Conference 2026 Conference Paper

MoEA-Net: Modality-Incremental Expert Aggregation Network for Retinal Prognostic Prediction

  • Hua Wang
  • Xiaodan Zhang
  • Yanzhao Shi
  • Chengxin Zheng
  • Wanyu Zhang
  • Zhen Wang
  • Jianing Wang
  • Xiaobing Yu

Automated analysis of temporal changes in multimodal retinal images is critical for the prognostic assessment of ophthalmic diseases. Unlike traditional single-timepoint diagnosis, tracking longitudinal changes across multiple imaging modalities introduces significant data bias challenges: (1) Imbalanced modality samples compromise the integration of knowledge within minority modalities; (2) Heterogeneous visual patterns across modalities undermine the perception of disease-relevant biomarkers. To tackle these issues, we propose a Modality-Incremental Expert Aggregation Network (MoEA-Net), which unifies the inter-modal integration and intra-modal perception for enhanced retinal prognostic prediction. Specifically, we employ the large language model (LLM) with incremental LoRA layers for specific modalities to effectively integrate knowledge from imbalanced data. Besides, we introduce a Spatiotemporal-aware Expert (SAE) module to better perceive both the anatomical structures and longitudinal changes within modalities. By progressively combining the SAE module with incremental LoRA, MoEA-Net supports continual knowledge accumulation and improves accurate reasoning. Experimental results show that MoEA-Net achieves state-of-the-art performance on subretinal fluid change and visual recovery classification tasks, validating its effectiveness.

AAAI Conference 2026 Conference Paper

Neural Outline Cache for Real-time Anti-aliasing Font Rendering

  • Jiashuaizi Mo
  • Sang-Woon Jeon
  • Hua Wang
  • Xiangqi Chen
  • Yanchao Wang
  • Minglu Li
  • Zhonglong Zheng

Neural textures have emerged as pivotal assets in next-generation neural rendering pipelines. However, hardware limitations and programming interface constraints lead to suboptimal performance in multi-instance real-time rendering scenarios. This bottleneck becomes particularly acute for texture-intensive tasks such as font rendering. To address this, we propose Neural Outline Cache (NOC), a novel neural font texture supporting real-time anti-aliased rendering and procedural editing within modern neural graphics pipelines. NOC's lightweight network leverages multi-resolution hash encoding to cache spline-derived SDFs, delivering anti-aliased rendering via standard graphics pipelines. For massive-instance scalability, our cache buffer layout (CBL) and batch-fused inference (BFI), tailored for NOC, mitigate neural texture streaming bottlenecks. We constructed an evaluation dataset using five font styles. In offline rendering, our proposed method achieves overall average results of 57.35 dB PSNR, 0.998 SSIM, and 1.1584e-3 pixel RMSE, while maintaining approximately 0.5ms frame latency with 500 real-time instances. To demonstrate its versatility, we integrated a procedural editor for visual effects editing of NOC textures. These results all prove that NOC is a reliable, production-ready neural asset.

TIST Journal 2026 Journal Article

Towards Evolutionary Differential Privacy in Cross-Platform Spatial Crowdsourcing

  • Yong-Feng Ge
  • Hua Wang
  • Elisa Bertino
  • Jinli Cao
  • Yanchun Zhang
  • Zhonglong Zheng

The development of mobile web services has brought significant attention to spatial crowdsourcing. The uneven distribution of tasks and workers has led to recent research on Cross-Platform Spatial Crowdsourcing (CPSC), aiming for a multi-win situation for platforms, workers and task requesters. Previous studies on CPSC problems focused on task assignment and worker selection performance, overlooking the importance of privacy preservation. This paper addresses the existing challenges of privacy preservation and service quality by formulating a Privacy-Preserving Cross-Platform Spatial Crowdsourcing (PP-CPSC) problem and proving it to be NP-hard. We propose an Evolutionary Differential Privacy (Evo-DP) approach to optimize PP-CPSC. Evo-DP's evolutionary framework enables efficient and flexible optimization of privacy budget allocation. Within Evo-DP, each solution to the privacy budget allocation is represented as an individual in the population. To approximate the optimal solution, three evolutionary operations (mutation, crossover, and scaling) are employed for population updates, along with a selection process. A hybrid population model is introduced to balance exploration and exploitation abilities. Experimental results demonstrate Evo-DP's superiority over previous strategies in terms of solution quality, convergence speed, and scalability.

TIST Journal 2026 Journal Article

Transformer-Enhanced Adaptive Graph Convolutional Network for Traffic Flow Prediction

  • Enfu Huang
  • Zhanshan Zhao
  • Jiao Yin
  • Jinli Cao
  • Hua Wang

Traffic flow prediction is vital in urban traffic management, planning, and development. With the continuous advancement of urbanization, there is an increasing demand for traffic flow prediction models to achieve higher accuracy and long-range forecasting capabilities. Against this backdrop, traditional methods that rely on local feature extraction and static spatial graph construction often fall short of expectations. This highlights the urgent need for advanced approaches to dynamically model spatio-temporal features while capturing global dependencies, effectively meeting the demands of complex traffic flow prediction tasks. To achieve this, we propose the Transformer-Enhanced Adaptive Graph Convolutional Network (T-AGCN), a novel model designed to capture global temporal relationships and dynamically extract rich spatial information. T-AGCN incorporates an Adaptive Graph Learner module to model dynamic relationships among traffic nodes and a Transformer-Based Spatio-Temporal graph convolutional module to capture long-range temporal dependencies in historical traffic data effectively. These innovations enable T-AGCN to jointly learn dynamic spatial interactions and complex temporal patterns, offering a comprehensive representation of traffic network dynamics. We evaluate T-AGCN on three real-world datasets: PeMSD7(M), PeMS08, and METR-LA. The experimental results demonstrate that T-AGCN, which builds on the baseline Spatial-Temporal Graph Convolutional Network (STGCN), significantly improves on that baseline. Moreover, T-AGCN consistently outperforms state-of-the-art models, including the Transformer-Based Interactive Temporal and Adaptive Network (TITAN) and the Spatial-Temporal Decoupled Masked Autoencoder (STD-MAE). The implementation is available on GitHub at https://github.com/time1722/T-AGCN.

AAAI Conference 2026 Conference Paper

URPO: A Unified Reward & Policy Optimization Framework for Large Language Models

  • Songshuo Lu
  • Hua Wang
  • Zhi Chen
  • Yaohua Tang

Large-scale alignment pipelines typically pair a policy model with a separately trained reward model whose parameters remain frozen during reinforcement learning (RL). This separation creates a complex, resource-intensive pipeline and leads to a performance ceiling. We propose a novel framework, Unified Reward & Policy Optimization (URPO), that unifies instruction-following (“player”) and reward modeling (“referee”) into a single model and a single training phase. Our method recasts all alignment data (including preference pairs, verifiable reasoning, and open-ended instructions) into a unified generative format optimized by a single Group-Relative Policy Optimization (GRPO) loop. This enables the model to learn from ground-truth preferences and verifiable logic while simultaneously generating its own rewards for open-ended tasks. Experiments on the Qwen2.5-7B model demonstrate that URPO significantly outperforms a strong baseline using a separate generative reward model, boosting the instruction-following score on AlpacaEval to 44.84 and achieving a 36% relative improvement on the challenging AIME reasoning benchmark. Furthermore, URPO cultivates a superior internal evaluator as a byproduct of training, achieving a RewardBench score of 85.15 and surpassing the dedicated reward model it replaces (83.55). By eliminating the need for a separate reward model and fostering a co-evolutionary dynamic, URPO presents a simpler, more efficient, and more effective path towards robustly aligned language models.
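
For readers unfamiliar with the group-relative part of GRPO, the sketch below shows the commonly described advantage computation: rewards for a group of responses to the same prompt are normalised by the group mean and standard deviation. This is a generic illustration under that assumption; the function name and the `eps` constant are hypothetical, and the abstract does not specify how URPO itself constructs or scales rewards.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise each reward against the other samples in its group.

    Generic GRPO-style normalisation (an assumption for illustration):
    advantage_i = (r_i - mean(group)) / (std(group) + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, scored 1.0 (good) or 0.0 (bad).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses scored above the group average receive positive advantages and the rest negative ones, so no separate value network is needed to form a baseline.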

EAAI Journal 2026 Journal Article

Variational inference for learning-based orbital pursuit-evasion under incomplete information

  • Junhua He
  • Hua Wang
  • Haitao Wang
  • Chengyi Huo
  • Heng Jing

Observation is a key factor influencing the strategy design of orbital games. This paper focuses on the orbital pursuit-evasion game with observation errors and information delays (ED-OPEG), proposing an artificial intelligence-based autonomous decision-making method. The game model for ED-OPEG is built based on orbital dynamics and game theory, and is further reconstructed under the partially observable Markov decision process framework, decomposing the game strategy solution into belief state inference and strategy mapping. Accordingly, the Histories-Enhanced Variational Twin Delay Deep Deterministic policy gradient (HE-VTD3) algorithm is proposed. Simulation experiments demonstrate that the HE-VTD3 algorithm exhibits strong resistance to local convergence while ensuring training stability. Against targets adopting adversarial strategies, HE-VTD3 enables the pursuer to achieve a capture success rate of 80.4%, showing improvements of 20.2% and 67.1% over the long-short-term-memory-based twin delayed deep deterministic policy gradient and delayed turn-based multi-agent deep deterministic policy gradient algorithms, respectively. Under the multivariate 95% confidence ellipse, HE-VTD3's empirical confidence level for non-cooperative target state uncertainty estimation reaches 97.77%. Generalization analysis further confirms that HE-VTD3 exhibits robustness to environmental uncertainties and disturbances.

EAAI Journal 2025 Journal Article

A novel dual-channel model with adaptive multi-scale attention for time series forecasting

  • Shuqing Wang
  • Jinghao Lu
  • Ren Wang
  • Xiaofeng Zhang
  • Hua Wang
  • Yujuan Sun

Time series forecasting plays a crucial role in various domains, including finance, traffic management, energy, and healthcare. However, as application scenarios continue to expand, the complexity of time series data has significantly increased, posing substantial challenges in capturing trend fluctuations of multivariate features and the dynamic relationships among them. To address these issues, this paper proposes a novel architecture, DASformer (Dual-Channel model with Adaptive multi-Scale attention), which enhances time series analysis by leveraging a dual-channel multivariate extractor and an adaptive multi-scale attention mechanism. Specifically, the dual-channel multivariate extractor comprises two independent yet interactive streams, focusing on capturing information at different levels of the time series, thereby effectively decoupling complex dynamic relationships. Moreover, to alleviate the problem of feature forgetting and loss in the long-term trend stream, the model incorporates an adaptive multi-scale attention module. This module adopts multi-scale processing and a dynamic weighting mechanism to learn dependencies across different scales and effectively capture their dynamic variations. Experimental results show that DASformer consistently achieves state-of-the-art performance on nine widely used benchmark datasets, delivering superior prediction accuracy, particularly in long-term forecasting tasks. The source code is available at: https://github.com/LDU-TSA/DASformer.

EAAI Journal 2025 Journal Article

Autonomous dynamic formation for maritime target tracking using multi-agent reinforcement learning

  • Hua Wang
  • Jiaxin Li
  • Hao Tao
  • Junnan Liu
  • Chaochao Li
  • Ke Wang
  • Mingliang Xu

In various maritime missions such as escort and roundup, dynamic formation target tracking plays a crucial role. Most existing dynamic formation methods require user intervention before formation changes, resulting in poor flexibility and low automation, and they do not consider variations in the abilities of individual members. To address these issues, we propose an autonomous dynamic formation planning method based on multi-agent reinforcement learning, integrating formation configuration into the strategy. This method can automatically adjust the formation based on its current state, providing greater flexibility and adaptability. In addition, a staged reward function is devised for the training process to guide agents in progressively learning dynamic formation tasks. Finally, we validate the effectiveness and generalization of our proposed method through various experiments.

EAAI Journal 2025 Journal Article

Periodic decomposition and feature enhancement fusion for traffic forecasting

  • Xiaofei Kong
  • Hua Wang
  • Mingli Zhang
  • Fan Zhang

With the rapid acceleration of urbanization, traffic prediction plays a crucial role in smart city development. This paper proposes an architecture called Periodic Decomposition and Feature Enhancement Fusion (PDGM) aimed at addressing the periodicity issue overlooked in existing traffic prediction methods. PDGM utilizes downsampling techniques to decompose the original traffic data into periodic components and enhances missing data through feature enhancement fusion, thereby improving the accuracy of traffic data prediction. Experimental results of this study demonstrate that PDGM outperforms state-of-the-art baseline models on three benchmark datasets, offering new possibilities for traffic data analysis and prediction tasks.

EAAI Journal 2025 Journal Article

Probabilistic intervals prediction based on adaptive regression with attention residual connections and covariance constraints

  • Fan Zhang
  • Min Wang
  • Lin Li
  • Yepeng Liu
  • Hua Wang

This paper introduces a novel prediction interval method called Adaptive Regression with Attention Residual Connection and Covariance Constraint (AR-ARCC). By integrating Monte Carlo and Bayesian methods, we leverage the strengths of both to achieve a more flexible and accurate method for generating prediction intervals. Additionally, through the optimization of the loss function, introduction of penalty terms, and improvement of mean squared error calculations, the model’s performance in interval prediction tasks is enhanced. Finally, the integration of an interactive channel heterogeneous self-attention module, combined with residual blocks, enhances the modeling capability of the neural network. The comprehensive application of these methods results in superior performance of the model in handling uncertainty and local variations.

EAAI Journal 2025 Journal Article

Robust memory-based graph neural networks for noisy and sparse graphs

  • Linling Jiang
  • Wenchang Zhang
  • Hua Wang
  • Fan Zhang

Real-world graphs often suffer from structural noise and label sparsity, two critical challenges that severely impair the performance of graph neural networks (GNNs). While prior methods have focused on addressing either of these problems in isolation, their co-occurrence in practical scenarios remains largely unsolved. To address this challenge, we propose a robust memory-based GNN for noisy and sparse graphs that stores and updates node similarity information within a memory module to assist in predicting missing edges and eliminating noisy ones. This reconstruction densifies the graph structure, effectively mitigating the impact of noisy edges and alleviating the challenges posed by label sparsity through enhanced information propagation. Furthermore, the reconstructed graph adopts an edge regularization strategy that models the confidence of predicted edges and suppresses uncertain connections during training, thereby smoothing the label propagation for unlabeled nodes and improving the robustness of GNN training. Extensive evaluations conducted on real-world benchmark datasets, including Cora, Citeseer, and Pubmed, demonstrate that our proposed method, the robust memory graph neural network (RMGNN), significantly enhances GNN performance on noisy graphs with limited labeled nodes, with a notable performance boost of up to 17.8% on the Cora dataset. Our experimental analysis further confirms the effectiveness and efficiency of the proposed memory-based graph structure learning approach in the presence of edge noise and sparse labels, validating the robustness of the framework in complex graph scenarios.

TIST Journal 2025 Journal Article

Scalable Multi-Instance Multi-Shape Support Vector Machine for Whole Slide Breast Histopathology

  • Hoon Seo
  • Yuze Bai
  • Lodewijk Brand
  • Lucia Saldana Barco
  • Hua Wang

Analysis of histopathological images is critical in cancer diagnosis and treatment. Due to the huge size of histopathological images and the varied number of imaging records per patient, many existing works analyze the Whole Slide Image (WSI) as a bag in which its patches are instances. However, these approaches are limited to analyzing the patches in a fixed shape, while the malignant lesions can form varied shapes. To address this challenge, in this article we propose a Multi-Instance Multi-Shape Support Vector Machine (MIMSSVM) to analyze the multiple images (instances) jointly where each instance consists of multiple patches in various shapes. In our approach, we can identify the different morphologic abnormalities of nuclei shapes from the multiple images. In addition to the multi-instance multi-shape learning capability, we derive an efficient solution algorithm to optimize the proposed model that scales well to a large number of features. Our experimental results show our new method outperforms the existing SVMs and deep learning models in histopathological classification. The proposed model also identifies the tissue segments in an image exhibiting an indication of an abnormality which provides utility in the early detection of malignant tumors. All these promising experimental results have demonstrated the effectiveness of our new method. We anticipate that our new method is of interest to biomedical engineering communities beyond WSI research and have open sourced the code of our method online. The implementation of our proposed MIMSSVM model is publicly available at https://github.com/hoonseo0409/MIMSSVM.

EAAI Journal 2025 Journal Article

UTCR-Dehaze: U-Net and transformer-based cycle-consistent generative adversarial network for unpaired remote sensing image dehazing

  • Canlin Li
  • Xiangfei Zhang
  • Hua Wang
  • Zhiwen Shao
  • Lizhuang Ma

To address issues of feature loss and color differences in existing unpaired dehazing methods for Remote Sensing images, we propose a method based on a U-Net and Transformer-based Cycle-Consistent Generative Adversarial Network for unpaired remote sensing image dehazing (UTCR-Dehaze). In this model, considering that paired hazy images are difficult to obtain, a cycle-consistent generative adversarial network (CycleGAN) is used to achieve remote sensing image dehazing. Due to the multi-scale features of remote sensing images, U-Net is combined with Transformer as the generator of CycleGAN. The generator learns the relationship between low-frequency and high-frequency features of the image at multiple scales. The U-Net encoder–decoder processes the high-frequency features, and the transformer at the bottleneck of U-Net learns the low-frequency feature relationship to restore image details and structures. Secondly, to further improve the details and clarity of dehazed images, a Mixed Cascade Group Attention module (MCGA) is designed. MCGA captures the global information of the image through cascade group attention and focuses on local information through Dehaze input-dependent depthwise convolution, thus better learning image features. In addition, to reduce feature loss and color differences in dehazed images, a Cycle Perceptual Identity Consistency Loss is designed, which combines perceptual and identity losses to maintain the details of input images through cycle consistency. Numerous experiments on synthetic and real remote sensing datasets show that, compared with previous methods, this method not only removes haze more accurately but also preserves image details and colors to the greatest extent.

EAAI Journal 2024 Journal Article

Combining optical flow and Swin Transformer for Space-Time video super-resolution

  • Xin Wang
  • Hua Wang
  • Mingli Zhang
  • Fan Zhang

Space–time video super-resolution is a task that aims to interpolate low frame rate, low resolution videos to high frame rate, high resolution ones. While existing Transformer-based methods have achieved results comparable to convolutional neural networks-based methods, the computational cost of Transformer limits its performance with constrained computational resources. Moreover, Swin Transformer may fail to fully exploit the spatio-temporal information of video frames due to the limitation of window size, impeding its effectiveness in handling large motions. To address these limitations, we propose an end-to-end space–time video super-resolution architecture based on optical flow alignment and Swin Transformer. The alignment module is introduced to extract spatio-temporal information from adjacent frames without significantly increasing the computational burden. Additionally, we design a residual convolution layer to enhance the translational invariance of the features extracted by Swin Transformer and introduce additional nonlinear transformations. Experimental results demonstrate that our proposed method achieves superior performance on various benchmark datasets compared to state-of-the-art methods. In terms of Peak Signal-to-Noise Ratio, our method outperforms the state-of-the-art methods by at least 0.15 dB on the Vimeo-Medium dataset.
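
To put the reported 0.15 dB gain in context, the sketch below computes Peak Signal-to-Noise Ratio using its standard definition, PSNR = 10 · log10(MAX² / MSE). The function name, the flat-list pixel representation, and the default `max_value` of 255 are assumptions for this example; the paper's exact evaluation protocol (colour space, cropping) is not stated in the abstract.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy values: every pixel is off by 1 on a 0..10 scale, so MSE = 1.
toy_psnr = psnr([0.0, 5.0, 9.0], [1.0, 6.0, 10.0], max_value=10.0)
```

Because the scale is logarithmic, a fixed dB improvement corresponds to a fixed multiplicative reduction in mean squared error rather than a fixed absolute one.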

IJCAI Conference 2024 Conference Paper

Skip-Timeformer: Skip-Time Interaction Transformer for Long Sequence Time-Series Forecasting

  • Wenchang Zhang
  • Hua Wang
  • Fan Zhang

Recent studies have raised questions about the suitability of the Transformer architecture for long sequence time-series forecasting. These forecasting models leverage Transformers to capture dependencies between multiple time steps in a time series, with embedding tokens composed of data from individual time steps. However, challenges arise when applying Transformers to predict long sequences with strong periodicity, leading to performance degradation and increased computational burden. Furthermore, embedding tokens formed one time step at a time may struggle to reveal meaningful information in long sequences, failing to capture correlations between different time steps. In this study, we propose Skip-Timeformer, a Transformer-based model that utilizes a skip-time interaction for long sequence time-series forecasting. Specifically, we decompose the time series into multiple subsequences based on different time intervals, embedding various time steps into variable tokens across multiple sequences. The skip-time interaction mechanism utilizes these variable tokens to capture dependencies in the skip-time dimension. Additionally, skip-time interaction is employed to learn dependencies between sequences missed by multiple skip time steps. The Skip-Timeformer model demonstrates state-of-the-art performance on various real-world datasets, further enhancing the long sequence forecasting capabilities of the Transformer variations and better adapting to arbitrary lookback windows.
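
One plausible reading of "decompose the time series into multiple subsequences based on different time intervals" is strided sampling, sketched below. This is an illustrative assumption and not the authors' code; `skip_decompose` and its `interval` parameter are hypothetical names.

```python
def skip_decompose(series, interval):
    """Split a series into `interval` subsequences by strided sampling.

    Subsequence k holds the elements at positions k, k + interval,
    k + 2 * interval, and so on, so neighbouring elements inside a
    subsequence are `interval` time steps apart in the original series.
    """
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    return [series[offset::interval] for offset in range(interval)]


# Ten time steps split with a skip interval of 3.
subsequences = skip_decompose(list(range(10)), 3)
```

With a strongly periodic signal and an interval that matches the period, each subsequence collects the same phase across cycles, which is the kind of structure a skip-time mechanism could then attend over.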

ICRA Conference 2023 Conference Paper

3D Reconstruction of Tibia and Fibula using One General Model and Two X-ray Images

  • Kai Pan
  • Shuai Zhang 0029
  • Liang Zhao 0003
  • Shoudong Huang
  • Yanhao Zhang 0003
  • Hua Wang
  • Qi Luo

The 3D reconstruction of patient-specific bone models plays a crucial role in orthopaedic surgery for clinical evaluation, surgical planning and precise implant design or selection. This paper considers the problem of reconstructing a patient-specific 3D tibia and fibula model from only two 2D X-ray images and one 3D general model segmented from the lower leg CT scans of one randomly selected patient. Currently, 3D bone reconstruction mainly relies on computed tomography (CT) and magnetic resonance imaging (MRI) scanning-based model segmentation, which results in high radiation exposure or high costs. In contrast, the proposed algorithm can accurately and efficiently deform a 3D general model to achieve a patient-specific 3D model that matches the patient's tibia and fibula projections in two 2D X-rays. The algorithm undergoes a preliminary deformation, 2D contour registration, and optimisation based on the deformation graph that represents the shape deformation of models. Evaluations using simulations, cadaver and in-vivo experiments demonstrate that the proposed algorithm can effectively reconstruct the patient's 3D tibia and fibula surface model with high accuracy.

NeurIPS Conference 2023 Conference Paper

DP-HyPO: An Adaptive Private Framework for Hyperparameter Optimization

  • Hua Wang
  • Sheng Gao
  • Huanyu Zhang
  • Weijie Su
  • Milan Shen

Hyperparameter optimization, also known as hyperparameter tuning, is a widely recognized technique for improving model performance. Regrettably, when training private ML models, many practitioners often overlook the privacy risks associated with hyperparameter optimization, which could potentially expose sensitive information about the underlying dataset. Currently, the sole existing approach to allow privacy-preserving hyperparameter optimization is to uniformly and randomly select hyperparameters for a number of runs, subsequently reporting the best-performing hyperparameter. In contrast, in non-private settings, practitioners commonly utilize "adaptive" hyperparameter optimization methods such as Gaussian Process-based optimization, which select the next candidate based on information gathered from previous outputs. This substantial contrast between private and non-private hyperparameter optimization underscores a critical concern. In our paper, we introduce DP-HyPO, a pioneering framework for "adaptive" private hyperparameter optimization, aiming to bridge the gap between private and non-private hyperparameter optimization. To accomplish this, we provide a comprehensive differential privacy analysis of our framework. Furthermore, we empirically demonstrate the effectiveness of DP-HyPO on a diverse set of real-world datasets.

TMLR Journal 2023 Journal Article

On the Convergence and Calibration of Deep Learning with Differential Privacy

  • Zhiqi Bu
  • Hua Wang
  • Zongyu Dai
  • Qi Long

Differentially private (DP) training preserves data privacy, usually at the cost of slower convergence (and thus lower accuracy) as well as more severe mis-calibration than its non-private counterpart. To analyze the convergence of DP training, we formulate a continuous-time analysis through the lens of the neural tangent kernel (NTK), which characterizes the per-sample gradient clipping and the noise addition in DP training, for arbitrary network architectures and loss functions. Interestingly, we show that the noise addition only affects the privacy risk but not the convergence or calibration, whereas the per-sample gradient clipping (under both flat and layerwise clipping styles) only affects the convergence and calibration. Furthermore, we observe that DP models trained with a small clipping norm usually achieve the best accuracy but are poorly calibrated and thus unreliable. In sharp contrast, DP models trained with a large clipping norm enjoy the same privacy guarantee and similar accuracy, but are significantly better calibrated. Our code can be found at https://github.com/woodyx218/opacus_global_clipping.

IJCAI Conference 2022 Conference Paper

I2CNet: An Intra- and Inter-Class Context Information Fusion Network for Blastocyst Segmentation

  • Hua Wang
  • Linwei Qiu
  • Jingfei Hu
  • Jicong Zhang

The quality of a blastocyst directly determines the embryo's implantation potential, thus making it essential to objectively and accurately identify the blastocyst morphology. In this work, we propose an automatic framework named I2CNet to perform the blastocyst segmentation task in human embryo images. The I2CNet contains two components: the IntrA-Class Context Module (IACCM) and the InteR-Class Context Module (IRCCM). The IACCM aggregates the representations of specific areas sharing the same category for each pixel, where the categorized regions are learned under the supervision of the ground truth. This aggregation decomposes a K-category recognition task into K two-label recognition tasks while maintaining the ability to gather intra-class features. In addition, the IRCCM is designed based on the blastocyst morphology to compensate for inter-class information, which is gradually gathered from the inside out. Meanwhile, a weighted mapping function is applied to emphasize the edges between classes and to stimulate some hard samples. Eventually, the learned intra- and inter-class cues are integrated from coarse to fine, rendering sufficient information interaction and fusion between multi-scale features. Quantitative and qualitative experiments demonstrate the superiority of our model compared with other representative methods. The I2CNet achieves an accuracy of 94.14% and a Jaccard index of 85.25% on a public blastocyst dataset.
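
Two ideas named in this abstract, decomposing a K-category task into K two-label tasks and scoring a segmentation with the Jaccard index, can be sketched in a few lines. The helper names `one_vs_rest` and `jaccard` are hypothetical, and flat lists of per-pixel labels stand in for images; this is a generic illustration, not the I2CNet modules.

```python
from typing import List, Sequence


def one_vs_rest(labels: Sequence[int], num_classes: int) -> List[List[int]]:
    """Turn one K-class label map into K binary maps (1 = pixel belongs to class k)."""
    return [[1 if y == k else 0 for y in labels] for k in range(num_classes)]


def jaccard(pred: Sequence[int], target: Sequence[int]) -> float:
    """Jaccard index |A intersect B| / |A union B| of two binary masks."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Two empty masks agree perfectly, so define their score as 1.0.
    return inter / union if union else 1.0


# Toy per-pixel labels for a 3-class problem (values are made up).
target = [0, 1, 1, 2, 2, 2]
pred = [0, 1, 2, 2, 2, 1]
per_class = [
    jaccard(p, t)
    for p, t in zip(one_vs_rest(pred, 3), one_vs_rest(target, 3))
]
```

Averaging `per_class` gives a mean Jaccard score; whether the figure reported in the abstract is a mean over classes is not stated there.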

JBHI Journal 2022 Journal Article

Multi-Scale Interactive Network With Artery/Vein Discriminator for Retinal Vessel Classification

  • Jingfei Hu
  • Hua Wang
  • Guang Wu
  • Zhaohui Cao
  • Lei Mou
  • Yitian Zhao
  • Jicong Zhang

Automatic classification of retinal arteries and veins plays an important role in assisting clinicians to diagnose cardiovascular and eye-related diseases. However, due to the high degree of anatomical variation across the population and the inconsistent labels that annotators' subjective judgment introduces into the available training data, most existing methods suffer from blood vessel discontinuity and arteriovenous confusion, and the artery/vein (A/V) classification task still faces great challenges. In this work, we propose a multi-scale interactive network with an A/V discriminator for retinal artery and vein recognition, which can reduce arteriovenous confusion and alleviate the disturbance of noisy labels. A multi-scale interaction (MI) module is designed in the encoder to realize cross-space multi-scale feature interaction for fundus images, effectively integrating high-level and low-level context information. In particular, we also design an ingenious A/V discriminator (AVD) that utilizes the independent and shared information between arteries and veins and combines it with a topology loss to further strengthen the model's ability to resolve arteriovenous confusion. In addition, we adopt a sample re-weighting (SW) strategy to effectively alleviate the disturbance from data labeling errors. The proposed model is verified on three publicly available fundus image datasets (AV-DRIVE, HRF, LES-AV) and a private dataset. We achieve accuracies of 97.47%, 96.91%, 97.79%, and 98.18%, respectively, on these four datasets. Extensive experimental results demonstrate that our method achieves competitive performance compared with state-of-the-art methods for A/V classification. To address the problem of training data scarcity, we publicly release 100 fundus images with A/V annotations to promote relevant research in the community.

JBHI Journal 2022 Journal Article

Ultrasound Entropy Imaging for Detection and Monitoring of Thermal Lesion During Microwave Ablation of Liver

  • Xiejing Li
  • Xin Jia
  • Ting Shen
  • Mengke Wang
  • Guang Yang
  • Hua Wang
  • Qinli Sun
  • Mingxi Wan

Ultrasonic B-mode imaging offers non-invasive, real-time monitoring of thermal ablation treatment in clinical use; however, it faces challenges of moderate lesion-normal contrast and detection accuracy. Quantitative ultrasound imaging techniques have been proposed as promising tools to evaluate the microstructure of ablated tissue. In this study, we introduced Shannon entropy, a non-model-based statistical measurement of disorder, to quantitatively detect and monitor microwave-induced ablation in porcine livers. The performance of typical Shannon entropy (TSE), weighted Shannon entropy (WSE), and horizontally normalized Shannon entropy (hNSE) was explored and compared with conventional B-mode imaging. TSE estimated from non-normalized probability distribution histograms was found to have insufficient ability to discern different degrees of disorder in the data. WSE, which improves on TSE by adding signal amplitudes as weights, obtained an area under the receiver operating characteristic curve (AUROC) of 0.895, but it underestimated the periphery of the lesion region. hNSE provided superior ablated-area prediction, with a correlation coefficient of 0.90 against ground truth, an AUROC of 0.868, and remarkable lesion-normal contrast, with a contrast-to-noise ratio of 5.86 that was significantly higher than that of the other imaging methods. Data distributions shown in horizontally normalized probability distribution histograms indicated that the disorder of the backscattered envelope signal from the ablated region increased as treatment went on. These findings suggest that hNSE imaging could be a promising technique to assist ultrasound-guided percutaneous thermal ablation.
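
For readers unfamiliar with the quantity underlying this abstract, the following is a minimal sketch of the textbook Shannon entropy of a histogram, H = -sum(p_i * log2(p_i)). The function name, the bin count, and the value range are assumptions for illustration. The abstract does not give the exact definitions of its TSE, WSE, or hNSE variants, so this is the plain definition only.

```python
import math
from typing import Sequence


def shannon_entropy(samples: Sequence[float], bins: int, lo: float, hi: float) -> float:
    """Entropy (in bits) of a normalized histogram of `samples` over [lo, hi]."""
    if bins < 1 or hi <= lo:
        raise ValueError("need bins >= 1 and hi > lo")
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in samples:
        if lo <= x <= hi:
            # Clamp so that x == hi falls into the last bin.
            idx = min(int((x - lo) / width), bins - 1)
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return 0.0
    # Empty bins contribute nothing (the limit of p * log p as p -> 0 is 0).
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Toy envelope samples (made-up values): spread-out data has higher entropy.
ordered = shannon_entropy([0.1] * 8, bins=4, lo=0.0, hi=1.0)
disordered = shannon_entropy([0.1, 0.3, 0.6, 0.9], bins=4, lo=0.0, hi=1.0)
```

In an imaging setting this value would typically be computed in a sliding window over the envelope data to form a parametric map; the window size is a further design choice that the abstract does not state.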

TCS Journal 2021 Journal Article

A constrained two-stage submodular maximization

  • Ruiqi Yang
  • Shuyang Gu
  • Chuangen Gao
  • Weili Wu
  • Hua Wang
  • Dachuan Xu

In this paper, we investigate the two-stage submodular maximization problem, where there is a collection $F = \{f_1, \ldots, f_m\}$ of $m$ submodular functions defined on the same ground set $\Omega$. The goal is to select a subset $S \subseteq \Omega$ of size at most $\ell$ such that $\frac{1}{m} \sum_{f \in F} \max_{T \subseteq S,\, T \in I} f(T)$ is maximized, where $I$ denotes a specifically-defined independence system. We consider the two-stage submodular maximization with a $P$-matroid constraint and present a $\frac{1}{P+1}\left(1 - \frac{1}{e^{P+1}}\right)$-approximation algorithm. Furthermore, we extend the algorithm to the two-stage submodular maximization with a more general $P$-exchange system constraint and show that the approximation ratio can be maintained with slight modifications of the algorithm.

TCS Journal 2021 Journal Article

Enumeration of subtrees and BC-subtrees with maximum degree no more than k in trees

  • Yu Yang
  • Xiao-xiao Li
  • Meng-yuan Jin
  • Long Li
  • Hua Wang
  • Xiao-Dong Zhang

The subtrees and BC-subtrees (subtrees in which any two leaves are at even distance apart) have been intensively studied in recent years. Such structures, under special constraints on degrees, have a wide range of applications in many fields. By way of an approach based on generating functions, we present novel recursive algorithms for enumerating various subtrees and BC-subtrees of maximum degree $\le k$ in trees. The algorithms are explained through detailed examples. We also briefly discuss, in trees, the densities of subtrees (resp. BC-subtrees) with maximum degree $\le k$ among all subtrees (resp. BC-subtrees). For a tree of order $n$, the newly proposed algorithms have multiple advantages. (1) Novel $(k+2)$-variable (resp. $(2k+3)$-variable) generating functions are introduced to construct the algorithms. (2) The proposed algorithms solve the fast enumeration problem for subtrees (resp. BC-subtrees) with a maximum degree constraint, and also make the subtree (resp. BC-subtree) enumeration algorithms proposed by Yan and Yeh [1] (resp. Yang et al. [2]) a special case of ours with $k = n - 1$. (3) The time complexity of our algorithm for subtrees (resp. BC-subtrees) is $O(kn)$ (resp. $O(kn^2)$), which is much faster than the $O(n^2)$ (resp. $O(kn^3)$) time method based on the algorithm proposed in [1] (resp. [2]).
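
To make the counting problem in this abstract concrete, here is a minimal sketch that counts the subtrees of a tree whose maximum degree is at most k. It is a plain bottom-up dynamic program written for illustration, not the generating-function algorithms of the paper, and it does not handle BC-subtrees. The function name and the adjacency-dictionary input format are assumptions.

```python
from typing import Dict, List


def count_subtrees_max_degree(adj: Dict[int, List[int]], k: int) -> int:
    """Count non-empty subtrees of a tree whose maximum vertex degree is at most k."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if not adj:
        return 0
    root = next(iter(adj))
    # Iterative DFS so that deep trees do not hit the recursion limit.
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for w in adj[v]:
            if w != parent[v]:
                parent[w] = v
                stack.append(w)
    below = {}  # below[v]: subtrees with topmost vertex v that may still attach to v's parent
    total = 0
    for v in reversed(order):  # children are processed before their parents
        # e[j] = number of ways to hang subtrees from exactly j children of v (j <= k)
        e = [1] + [0] * k
        for w in adj[v]:
            if w == parent[v]:
                continue
            for j in range(k, 0, -1):
                e[j] += e[j - 1] * below[w]
        total += sum(e)        # v is the topmost vertex: up to k children
        below[v] = sum(e[:k])  # v also uses its parent edge: up to k - 1 children
    return total


path = {0: [1], 1: [0, 2], 2: [1]}           # path on three vertices
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}  # star with three leaves
```

Each subtree is counted once, at its vertex closest to the root. For the star, `count_subtrees_max_degree(star, 3)` counts all 11 subtrees, while `k = 2` excludes only the whole star.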

NeurIPS Conference 2021 Conference Paper

Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations

  • Jiayao Zhang
  • Hua Wang
  • Weijie Su

Understanding the training dynamics of deep learning models is perhaps a necessary step toward demystifying the effectiveness of these models. In particular, how do training data from different classes gradually become separable in their feature spaces when training neural networks using stochastic gradient descent? In this paper, we model the evolution of features during deep learning training using a set of stochastic differential equations (SDEs), each corresponding to a training sample. As a crucial ingredient in our modeling strategy, each SDE contains a drift term that reflects the impact of backpropagation at an input on the features of all samples. Our main finding uncovers a sharp phase transition phenomenon regarding the intra-class impact: if the SDEs are locally elastic in the sense that the impact is more significant on samples from the same class as the input, the features of training data become linearly separable, meaning vanishing training loss; otherwise, the features are not separable, no matter how long the training time is. In the presence of local elasticity, moreover, an analysis of our SDEs shows the emergence of a simple geometric structure called neural collapse of the features. Taken together, our results shed light on the decisive role of local elasticity underlying the training dynamics of neural networks. We corroborate our theoretical analysis with experiments on a synthesized dataset of geometric shapes as well as on CIFAR-10.

AAAI Conference 2021 Conference Paper

Integrating Static and Dynamic Data for Improved Prediction of Cognitive Declines Using Augmented Genotype-Phenotype Representations

  • Hoon Seo
  • Lodewijk Brand
  • Hua Wang
  • Feiping Nie

Alzheimer’s Disease (AD) is a chronic neurodegenerative disease that causes severe problems in patients’ thinking, memory, and behavior. An early diagnosis is crucial to prevent AD progression; to this end, many algorithmic approaches have recently been proposed to predict cognitive decline. However, these predictive models often fail to integrate heterogeneous genetic and neuroimaging biomarkers and struggle to handle missing data. In this work we propose a novel objective function and an associated optimization algorithm to identify cognitive decline related to AD. Our approach is designed to incorporate dynamic neuroimaging data by way of a participant-specific augmentation combined with multimodal data integration aligned via a regression task. Our approach, in order to incorporate additional side-information, utilizes structured regularization techniques popularized in recent AD literature. Armed with the fixed-length vector representation learned from the multimodal dynamic and static modalities, conventional machine learning methods can be used to predict the clinical outcomes associated with AD. Our experimental results show that the proposed augmentation model improves the prediction performance on cognitive assessment scores for a collection of popular machine learning algorithms. The results of our approach are interpreted to validate existing genetic and neuroimaging biomarkers that have been shown to be predictive of cognitive decline.

AIIM Journal 2020 Journal Article

Ensemble neural network approach detecting pain intensity from facial expressions

  • Ghazal Bargshady
  • Xujuan Zhou
  • Ravinesh C. Deo
  • Jeffrey Soar
  • Frank Whittaker
  • Hua Wang

This paper reports on research to design an ensemble deep learning framework that integrates a fine-tuned, three-stream hybrid deep neural network (i.e., Ensemble Deep Learning Model, EDLM), employing a Convolutional Neural Network (CNN) to extract facial image features and to detect and accurately classify pain. To develop the approach, VGGFace is fine-tuned, integrated with Principal Component Analysis, and employed to extract features from images in the Multimodal Intensity Pain database at the early phase of the model fusion. Subsequently, a late-fusion, three-layer hybrid CNN and recurrent neural network algorithm is developed, with their outputs merged to produce image-classified features to classify pain levels. The EDLM model is then benchmarked against single-stream deep learning models, including several competing models based on deep learning methods. The results indicate that the proposed framework outperforms the competing methods when applied to a multi-level pain detection database, producing a feature classification accuracy that exceeds 89%, with a receiver operating characteristic of 93%. To evaluate the generalization of the proposed EDLM model, the UNBC-McMaster Shoulder Pain dataset is used as a test dataset for all of the modelling experiments, which reveals the efficacy of the proposed method for pain classification from facial images. The study concludes that the proposed EDLM model can accurately classify pain and generate multi-class pain levels for potential applications in the medical informatics area, and should therefore be explored further in expert systems for detecting and classifying the pain intensity of patients and for automatically evaluating patients’ pain levels.

AAAI Conference 2020 Conference Paper

Learning Multi-Modal Biomarker Representations via Globally Aligned Longitudinal Enrichments

  • Lyujian Lu
  • Saad Elbeleidy
  • Lauren Zoe Baker
  • Hua Wang

Alzheimer’s Disease (AD) is a chronic neurodegenerative disease that severely impacts patients’ thinking, memory and behavior. To aid automatic AD diagnoses, many longitudinal learning models have been proposed to predict clinical outcomes and/or disease status, which, though, often fail to consider missing temporal phenotypic records of the patients that can convey valuable information of AD progressions. Another challenge in AD studies is how to integrate heterogeneous genotypic and phenotypic biomarkers to improve diagnosis prediction. To cope with these challenges, in this paper we propose a longitudinal multi-modal method to learn enriched genotypic and phenotypic biomarker representations in the format of fixed-length vectors that can simultaneously capture the baseline neuroimaging measurements of the entire dataset and progressive variations of the varied counts of follow-up measurements over time of every participant from different biomarker sources. The learned global and local projections are aligned by a soft constraint and the structured-sparsity norm is used to uncover the multi-modal structure of heterogeneous biomarker measurements. While the proposed objective is clearly motivated to characterize the progressive information of AD developments, it is a nonsmooth objective that is difficult to efficiently optimize in general. Thus, we derive an efficient iterative algorithm, whose convergence is rigorously guaranteed in mathematics. We have conducted extensive experiments on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data using one genotypic and two phenotypic biomarkers. Empirical results have demonstrated that the learned enriched biomarker representations are more effective in predicting the outcomes of various cognitive assessments. Moreover, our model has successfully identified disease-relevant biomarkers supported by existing medical findings that additionally warrant the correctness of our method from the clinical perspective.

NeurIPS Conference 2020 Conference Paper

The Complete Lasso Tradeoff Diagram

  • Hua Wang
  • Yachong Yang
  • Zhiqi Bu
  • Weijie Su

A fundamental problem in high-dimensional regression is to understand the tradeoff between type I and type II errors or, equivalently, false discovery rate (FDR) and power in variable selection. To address this important problem, we offer the first complete diagram that distinguishes all pairs of FDR and power that can be asymptotically realized by the Lasso from the remaining pairs, in a regime of linear sparsity under random designs. The tradeoff between the FDR and power characterized by our diagram holds no matter how strong the signals are. In particular, our results complete the earlier Lasso tradeoff diagram in previous literature by recognizing two simple constraints on the pairs of FDR and power. The improvement is more substantial when the regression problem is above the Donoho-Tanner phase transition. Finally, we present extensive simulation studies to confirm the sharpness of the complete Lasso tradeoff diagram.
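
The two quantities traded off in this abstract can be made concrete with a small sketch. For one selected set of variables, the false discovery proportion (FDP) and true positive proportion (TPP) are computed as follows; FDR and power are their expectations over the randomness of the data. The function name and the toy index sets are assumptions for illustration, and no Lasso fit is performed here.

```python
from typing import Set, Tuple


def fdp_and_tpp(selected: Set[int], true_support: Set[int]) -> Tuple[float, float]:
    """False discovery proportion and true positive proportion of one selection."""
    true_pos = len(selected & true_support)
    false_pos = len(selected - true_support)
    # Use max(..., 1) so an empty selection or empty support yields 0 rather than an error.
    fdp = false_pos / max(len(selected), 1)
    tpp = true_pos / max(len(true_support), 1)
    return fdp, tpp


# Toy example (made-up indices): 4 true signals, 4 selected, 2 of them correct.
fdp, tpp = fdp_and_tpp(selected={0, 1, 2, 5}, true_support={0, 1, 3, 4})
```

Sweeping a regularization parameter and recording one (TPP, FDP) pair per value would trace out a single path through the kind of diagram the abstract describes.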

TCS Journal 2020 Journal Article

Viral marketing of online game by DS decomposition in social networks

  • Chuangen Gao
  • Hai Du
  • Weili Wu
  • Hua Wang

In social networks, the spread of influence has been studied extensively, but most efforts in existing literature are made on the product used by a single person. This paper attempts to address the product which is used by many persons such as the online game. When multiple people participate in one game, interaction between users is accompanied by browsing and clicking on advertisements, and operators can also earn certain advertising revenues. All these revenues are related to information interaction between people involved in one game. We use game profit to represent all of the revenues gained from players involved in one game and model the game profit maximization problem in social networks, which finds a seed set to maximize the game profit between players who are influenced to buy the game. We prove that the problem is NP-hard and the objective function is neither submodular nor supermodular. To solve it, we decompose it into the Difference between two Submodular functions (DS decomposition) and propose four heuristic algorithms. To address the complexity of computing objective function, we design a new sampling method based on reverse reachable set technology. Experiment results on real datasets show that our approaches perform well.

IJCAI Conference 2019 Conference Paper

Learning Robust Distance Metric with Side Information via Ratio Minimization of Orthogonally Constrained L21-Norm Distances

  • Kai Liu
  • Lodewijk Brand
  • Hua Wang
  • Feiping Nie

Metric Learning, which aims at learning a distance metric for a given data set, plays an important role in measuring the distance or similarity between data objects. Due to its broad usefulness, it has attracted a lot of interest in machine learning and related areas in the past few decades. This paper proposes to learn the distance metric from the side information in the forms of must-links and cannot-links. Given the pairwise constraints, our goal is to learn a Mahalanobis distance that minimizes the ratio of the distances of the data pairs in the must-links to those in the cannot-links. Different from many existing papers that use the traditional squared L2-norm distance, we develop a robust model that is less sensitive to data noise or outliers by using the not-squared L2-norm distance. In our objective, the orthonormal constraint is enforced to avoid degenerate solutions. To solve our objective, we have derived an efficient iterative solution algorithm. We have conducted extensive experiments, which demonstrated the superiority of our method over state-of-the-art.
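
As a rough illustration of the kind of objective this abstract describes, the sketch below evaluates a ratio of summed non-squared L2-norm distances over must-link pairs to those over cannot-link pairs, after projecting the data with a matrix W. The exact objective, constraints, and solver of the paper are not given in the abstract, so the formula, the function names, and the toy data are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[int, int]


def project(w_cols: Sequence[Vector], x: Vector) -> List[float]:
    """Compute W^T x, where the columns of W are passed as separate vectors."""
    return [sum(wi * xi for wi, xi in zip(col, x)) for col in w_cols]


def link_distance(w_cols: Sequence[Vector], data: Sequence[Vector], pairs: Sequence[Pair]) -> float:
    """Sum of non-squared L2 distances between projected pairs."""
    total = 0.0
    for i, j in pairs:
        diff = [a - b for a, b in zip(data[i], data[j])]
        total += math.sqrt(sum(v * v for v in project(w_cols, diff)))
    return total


def ratio_objective(w_cols, data, must: Sequence[Pair], cannot: Sequence[Pair]) -> float:
    """Assumed objective: must-link distance divided by cannot-link distance (smaller is better)."""
    denom = link_distance(w_cols, data, cannot)
    if denom == 0.0:
        raise ValueError("cannot-link pairs collapse under this projection")
    return link_distance(w_cols, data, must) / denom


# Toy 2-D data: points 0 and 1 should be close, points 0 and 3 should be far apart.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (4.0, 3.0)]
along_x = ratio_objective([(1.0, 0.0)], data, must=[(0, 1)], cannot=[(0, 3)])
along_y = ratio_objective([(0.0, 1.0)], data, must=[(0, 1)], cannot=[(0, 3)])
```

Here projecting onto the second axis gives the smaller ratio, because it collapses the must-link pair while keeping the cannot-link pair apart. Both candidate projections have unit-norm columns, in the spirit of the orthonormal constraint the abstract mentions.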

IJCAI Conference 2019 Conference Paper

Learning Strictly Orthogonal p-Order Nonnegative Laplacian Embedding via Smoothed Iterative Reweighted Method

  • Haoxuan Yang
  • Kai Liu
  • Hua Wang
  • Feiping Nie

Laplacian Embedding (LE) is a powerful method to reveal the intrinsic geometry of high-dimensional data by using graphs. Imposing the orthogonal and nonnegative constraints onto the LE objective has proved to be effective to avoid degenerate and negative solutions, which, though, are challenging to achieve simultaneously because they are nonlinear and nonconvex. In addition, recent studies have shown that using the p-th order of the L2-norm distances in LE can find the best solution for clustering and promote the robustness of the embedding model against outliers, although this makes the optimization objective nonsmooth and difficult to efficiently solve in general. In this work, we study LE that uses the p-th order of the L2-norm distances and satisfies both orthogonal and nonnegative constraints. We introduce a novel smoothed iterative reweighted method to tackle this challenging optimization problem and rigorously analyze its convergence. We demonstrate the effectiveness and potential of our proposed method by extensive empirical studies on both synthetic and real data sets.

AAAI Conference 2019 Conference Paper

Visual Place Recognition via Robust ℓ2-Norm Distance Based Holism and Landmark Integration

  • Kai Liu
  • Hua Wang
  • Fei Han
  • Hao Zhang

Visual place recognition is essential for large-scale simultaneous localization and mapping (SLAM). Long-term robot operations across different times of day, months, and seasons introduce new challenges from significant environment appearance variations. In this paper, we propose a novel method to learn a location representation that can integrate the semantic landmarks of a place with its holistic representation. To promote the robustness of our new model against the drastic appearance variations due to long-term visual changes, we formulate our objective to use non-squared ℓ2-norm distances, which leads to a difficult optimization problem that minimizes the ratio of the ℓ2,1-norms of matrices. To solve our objective, we derive a new efficient iterative algorithm, whose convergence is rigorously guaranteed by theory. In addition, because our solution is strictly orthogonal, the learned location representations can have better place recognition capabilities. We evaluate the proposed method using two large-scale benchmark data sets, the CMU-VL and Nordland data sets. Experimental results have validated the effectiveness of our new method in long-term visual place recognition applications.

IJCAI Conference 2018 Conference Paper

High-Order Co-Clustering via Strictly Orthogonal and Symmetric L1-Norm Nonnegative Matrix Tri-Factorization

  • Kai Liu
  • Hua Wang

Unlike traditional clustering methods that deal with one single type of data, High-Order Co-Clustering (HOCC) aims to cluster multiple types of data simultaneously by utilizing the inter- and/or intra-type relationships across different data types. In existing HOCC methods, data points routinely enter the objective functions with squared residual errors. As a result, outlying data samples can dominate the objective functions, which may lead to incorrect clustering results. Moreover, existing methods usually suffer from soft clustering, where the probabilities of belonging to different groups can be very close. In this paper, we propose an L1-norm symmetric nonnegative matrix tri-factorization method to solve the HOCC problem. Due to the orthogonal constraints and the symmetric L1-norm formulation in our new objective, the conventional auxiliary function approach no longer works. Thus, we derive the solution algorithm using the alternating direction method of multipliers. Extensive experiments have been conducted on a real-world data set, in which promising empirical results, including less time consumption, a strictly orthogonal membership matrix, lower local minima, etc., have demonstrated the effectiveness of our proposed method.

AAAI Conference 2018 Conference Paper

Learning Integrated Holism-Landmark Representations for Long-Term Loop Closure Detection

  • Fei Han
  • Hua Wang
  • Hao Zhang

Loop closure detection is a critical component of large-scale simultaneous localization and mapping (SLAM) in loopy environments. This capability is challenging to achieve in long-term SLAM, when the environment appearance exhibits significant long-term variations across various times of day, months, and even seasons. In this paper, we introduce a novel formulation to learn an integrated long-term representation based upon both holistic and landmark information, which integrates two previous insights under a unified framework: (1) holistic representations outperform keypoint-based representations, and (2) landmarks as an intermediate representation provide informative cues to detect challenging locations. Our new approach learns the representation by projecting input visual data into a low-dimensional space, which preserves both the global consistency (to minimize representation error) and the local consistency (to preserve landmarks’ pairwise relationship) of the input data. To solve the formulated optimization problem, a new algorithm is developed with theoretically guaranteed convergence. Extensive experiments have been conducted using two large-scale public benchmark data sets, in which the promising performances have demonstrated the effectiveness of the proposed approach.

TIST Journal 2018 Journal Article

Traffic Simulation and Visual Verification in Smog

  • Mingliang Xu
  • Hua Wang
  • Shili Chu
  • Yong Gan
  • Xiaoheng Jiang
  • Yafei Li
  • Bing Zhou

Smog causes low visibility on the road and can compromise traffic safety. Modeling traffic in smog will have a significant impact on realistic traffic simulations. Most existing traffic models assume that drivers have optimal vision in the simulations, making these simulations unsuitable for modeling smog weather conditions. In this article, we introduce the Smog Full Velocity Difference Model (SMOG-FVDM) for a realistic simulation of traffic in smog weather conditions. In this model, we present a stadia model for drivers in smog conditions. We introduce it into a car-following traffic model using both psychological force and body force concepts, and then we introduce the SMOG-FVDM. Considering that there are many parameters in the SMOG-FVDM, we design a visual verification system based on SMOG-FVDM to arrive at an adequate solution, which can show visual simulation results under different road scenarios and different degrees of smog by reconciling the parameters. Experimental results show that our model can give a realistic and efficient traffic simulation of smog weather conditions.

AAAI Conference 2017 Conference Paper

Semi-Supervised Classifications via Elastic and Robust Embedding

  • Yun Liu
  • Yiming Guo
  • Hua Wang
  • Feiping Nie
  • Heng Huang

Transductive semi-supervised learning can only predict labels for unlabeled data appearing in the training data, and cannot predict labels for testing data that never appear in the training set. To handle this out-of-sample problem, many inductive methods impose a constraint such that the predicted label matrix should be exactly equal to a linear model. In practice, this constraint might be too rigid to capture the manifold structure of data. In this paper, we relax this rigid constraint and propose to use an elastic constraint on the predicted label matrix such that the manifold structure can be better explored. Moreover, since unlabeled data are often very abundant in practice and usually there are some outliers, we use a non-squared loss instead of the traditional squared loss to learn a robust model. The derived problem, although convex, has many nonsmooth terms, which make it very challenging to solve. In the paper, we propose an efficient optimization algorithm to solve a more general problem, based on which we find the optimal solution to the derived problem.

IJCAI Conference 2016 Conference Paper

Enforcing Template Representability and Temporal Consistency for Adaptive Sparse Tracking

  • Xue Yang
  • Fei Han
  • Hua Wang
  • Hao Zhang

Sparse representation has been widely studied in visual tracking, which has shown promising tracking performance. Despite a lot of progress, the visual tracking problem is still a challenging task due to appearance variations over time. In this paper, we propose a novel sparse tracking algorithm that well addresses temporal appearance changes, by enforcing template representability and temporal consistency (TRAC). By modeling temporal consistency, our algorithm addresses the issue of drifting away from a tracking target. By exploring the templates' long-term-short-term representability, the proposed method adaptively updates the dictionary using the most descriptive templates, which significantly improves the robustness to target appearance changes. We compare our TRAC algorithm against the state-of-the-art approaches on 12 challenging benchmark image sequences. Both qualitative and quantitative results demonstrate that our algorithm significantly outperforms previous state-of-the-art trackers.

AAAI Conference 2016 Conference Paper

New l1-Norm Relaxations and Optimizations for Graph Clustering

  • Feiping Nie
  • Hua Wang
  • Cheng Deng
  • Xinbo Gao
  • Xuelong Li
  • Heng Huang

In recent data mining research, graph clustering methods, such as normalized cut and ratio cut, have been well studied and applied to solve many unsupervised learning applications. The original graph clustering methods are NP-hard problems. Traditional approaches used spectral relaxation to solve the graph clustering problems. The main disadvantage of these approaches is that the obtained spectral solutions could severely deviate from the true solution. To solve this problem, in this paper, we propose a new relaxation mechanism for graph clustering methods. Instead of minimizing the squared distances of clustering results, we use the 1-norm distance. More importantly, considering the normalized consistency, we also use the 1-norm for the normalized terms in the new graph clustering relaxations. Due to the sparse result from the 1-norm minimization, the solutions of our new relaxed graph clustering methods get discrete values with many zeros, which are close to the ideal solutions. Our new objectives are difficult to optimize, because the minimization problem involves the ratio of nonsmooth terms. The existing sparse learning optimization algorithms cannot be applied to solve this problem. In this paper, we propose a new optimization algorithm to solve this difficult non-smooth ratio minimization problem. Extensive experiments have been performed on three two-way clustering and eight multi-way clustering benchmark data sets. All empirical results show that our new relaxation methods consistently enhance the normalized cut and ratio cut clustering results.

AAAI Conference 2015 Conference Paper

Collaborative Topic Ranking: Leveraging Item Meta-Data for Sparsity Reduction

  • Weilong Yao
  • Jing He
  • Hua Wang
  • Yanchun Zhang
  • Jie Cao

Pair-wise ranking methods have been widely used in recommender systems to deal with implicit feedback. They attempt to discriminate between a handful of observed items and the large set of unobserved items. In these approaches, however, user preferences and item characteristics cannot be estimated reliably due to overfitting given highly sparse data. To alleviate this problem, in this paper, we propose a novel hierarchical Bayesian framework which incorporates “bag-of-words” type meta-data on items into pair-wise ranking models for one-class collaborative filtering. The main idea of our method lies in extending the pair-wise ranking with probabilistic topic modeling. Instead of regularizing item factors through a zero-mean Gaussian prior, our method introduces item-specific topic proportions as priors for item factors. As a by-product, interpretable latent factors for users and items may help explain recommendations in some applications. We conduct an experimental study on a real and publicly available dataset, and the results show that our algorithm is effective in providing accurate recommendation and interpreting user factors and item factors.

TCS Journal 2015 Journal Article

Enumeration of BC-subtrees of trees

  • Yu Yang
  • Hongbo Liu
  • Hua Wang
  • Scott Makeig

A BC-tree (block-cutpoint-tree) is a tree (with at least two vertices) where the distance between any two leaves is even. Motivated by the study of the “core” of a graph, BC-trees form an interesting class of trees. We provide a comprehensive study of questions related to BC-trees. As the analogue of the study of extremal questions on subtrees of trees, we first characterize the general extremal trees that maximize or minimize the number of BC-subtrees or leaf-containing BC-subtrees. We further discuss the “middle part” of a tree with respect to the number of BC-subtrees, namely the BC-subtree-core, which behaves in a rather different way than all previously known “middle parts” of a tree. Last but not least, fast algorithms are proposed (following similar ideas as those of the enumeration of subtrees) for enumerating various classes of BC-subtrees of a tree.
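
The definition in the abstract above (a tree with at least two vertices in which every leaf-to-leaf distance is even) can be checked directly. The following is a minimal sketch, not the paper's enumeration algorithm: the function name `is_bc_tree` and the adjacency-dictionary input format are assumptions made for illustration, and the input is assumed to already be a tree (connected and acyclic). It relies on the fact that in a tree the parity of the distance between two vertices equals the parity of the sum of their depths from any root, so rooting at one leaf reduces the test to checking that every leaf sits at even depth.

```python
from collections import deque


def is_bc_tree(adj):
    """Return True if the tree `adj` satisfies the BC-tree definition.

    `adj` maps each vertex to a list of its neighbours (hypothetical
    input format). The input is assumed to be a tree; this is not verified.
    """
    if len(adj) < 2:
        return False  # the definition requires at least two vertices

    # Leaves are the vertices of degree one.
    leaves = [v for v, nbrs in adj.items() if len(nbrs) == 1]

    # Breadth-first search from one leaf to get every vertex's depth.
    start = leaves[0]
    depth = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in depth:
                depth[w] = depth[u] + 1
                queue.append(w)

    # The start leaf has depth 0, so all leaf-to-leaf distances are even
    # exactly when every leaf has even depth.
    return all(depth[leaf] % 2 == 0 for leaf in leaves)


# A path on three vertices: its two leaves are at distance 2.
path3 = {0: [1], 1: [0, 2], 2: [1]}
# A single edge: its two leaves are at distance 1.
edge = {0: [1], 1: [0]}
print(is_bc_tree(path3), is_bc_tree(edge))
```

Under these assumptions the check runs in time linear in the number of vertices, since it performs a single breadth-first traversal.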

AAAI Conference 2015 Conference Paper

Learning Robust Locality Preserving Projection via p-Order Minimization

  • Hua Wang
  • Feiping Nie
  • Heng Huang

Locality preserving projection (LPP) is an effective dimensionality reduction method based on manifold learning, which is defined over the graph weighted squared 2-norm distances in the projected subspace. Since squared 2-norm distance is prone to outliers, it is desirable to develop a robust LPP method. In this paper, motivated by existing studies that improve the robustness of statistical learning models via 1-norm or not-squared 2-norm formulations, we propose a robust LPP (rLPP) formulation to minimize the p-th order of the 2-norm distances, which can better tolerate large outlying data samples because it suppresses the introduced bias more than the 1-norm or not-squared 2-norm minimizations. However, solving the formulated objective is very challenging because it is not only non-smooth but also non-convex. As an important theoretical contribution of this work, we systematically derive an efficient iterative algorithm to solve the general p-th order 2-norm minimization problem, which, to the best of our knowledge, is solved for the first time in the literature. Extensive empirical evaluations on the proposed rLPP method have been performed, in which our new method outperforms the related state-of-the-art methods in a variety of experimental settings and demonstrates its effectiveness in seeking better subspaces on both noiseless and noisy data.

IJCAI Conference 2013 Conference Paper

Adaptive Loss Minimization for Semi-Supervised Elastic Embedding

  • Feiping Nie
  • Hua Wang
  • Heng Huang
  • Chris Ding

Semi-supervised learning usually only predicts labels for unlabeled data appearing in training data, and cannot effectively predict labels for testing data never appearing in the training set. To handle this out-of-sample problem, many inductive methods make a constraint such that the predicted label matrix should be exactly equal to a linear model. In practice, this constraint is too rigid to capture the manifold structure of data. Motivated by this deficiency, we relax the rigid linear embedding constraint and propose to use an elastic embedding constraint on the predicted label matrix such that the manifold structure can be better explored. To solve our new objective and also a more general optimization problem, we study a novel adaptive loss with an efficient optimization algorithm. Our new adaptive loss minimization method combines the advantages of both the L1 norm and the L2 norm: it is robust to data outliers under a Laplacian distribution and can efficiently learn the normal data under a Gaussian distribution. Experiments have been performed on image classification tasks and our approach outperforms other state-of-the-art methods.

IJCAI Conference 2013 Conference Paper

Early Active Learning via Robust Representation and Structured Sparsity

  • Feiping Nie
  • Hua Wang
  • Heng Huang
  • Chris Ding

Labeling training data is quite time-consuming but essential for supervised learning models. To solve this problem, active learning has been studied and applied to select the informative and representative data points for labeling. However, during the early stage of experiments, only a small number (or none) of labeled data points exist, thus the most representative samples should be selected first. In this paper, we propose a novel robust active learning method to handle the early stage experimental design problem and select the most representative data points. Selecting the representative samples is an NP-hard problem, thus we employ the structured sparsity-inducing norm to relax the objective to an efficient convex formulation. Meanwhile, the robust sparse representation loss function is utilized to reduce the effect of outliers. A new efficient optimization algorithm is introduced to solve our non-smooth objective with low computational cost and proven global convergence. Empirical results on both single-label and multi-label classification benchmark data sets show the promising results of our method.

IJCAI Conference 2013 Conference Paper

Protein Function Prediction via Laplacian Network Partitioning Incorporating Function Category Correlations

  • Hua Wang
  • Heng Huang
  • Chris Ding

Understanding the molecular mechanisms of life requires decoding the functions of the proteins in an organism. Various high-throughput experimental techniques have been developed to characterize biological systems at the genome scale. A fundamental challenge of the post-genomic era is to assign biological functions to all the proteins encoded by the genome using high-throughput biological data. To address this challenge, we propose a novel Laplacian Network Partitioning incorporating function category Correlations (LNPC) method to predict protein function on protein-protein interaction (PPI) networks by optimizing a Laplacian-based quotient objective function that seeks the optimal network configuration to maximize consistent function assignments over edges on the whole graph. Unlike the existing approaches that have no unique optimization solutions, our optimization problem has a unique global solution obtained by eigen-decomposition methods. The correlations among protein function categories are quantified and incorporated into a correlated protein affinity graph which is integrated into the PPI graph to significantly improve the protein function prediction accuracy. We apply our new method to the BioGRID dataset for the Saccharomyces cerevisiae species using the MIPS annotation scheme. Our new method outperforms other related state-of-the-art approaches by more than 63% in average precision of function prediction and 53% in average F1 score.

NeurIPS Conference 2012 Conference Paper

High-Order Multi-Task Feature Learning to Identify Longitudinal Phenotypic Markers for Alzheimer's Disease Progression Prediction

  • Hua Wang
  • Feiping Nie
  • Heng Huang
  • Jingwen Yan
  • Sungeun Kim
  • Shannon Risacher
  • Andrew Saykin
  • Li Shen

Alzheimer disease (AD) is a neurodegenerative disorder characterized by progressive impairment of memory and other cognitive functions. Regression analysis has been studied to relate neuroimaging measures to cognitive status. However, whether these measures have further predictive power to infer a trajectory of cognitive performance over time is still an under-explored but important topic in AD research. We propose a novel high-order multi-task learning model to address this issue. The proposed model explores the temporal correlations existing in data features and regression tasks by the structured sparsity-inducing norms. In addition, the sparsity of the model enables the selection of a small number of MRI measures while maintaining high prediction accuracy. The empirical studies, using the baseline MRI and serial cognitive data of the ADNI cohort, have yielded promising results.

IJCAI Conference 2011 Conference Paper

Fast Nonnegative Matrix Tri-Factorization for Large-Scale Data Co-Clustering

  • Hua Wang
  • Feiping Nie
  • Heng Huang
  • Fillia Makedon

Nonnegative Matrix Factorization (NMF) based co-clustering methods have attracted increasing attention in recent years because of their mathematical elegance and encouraging empirical results. However, the algorithms to solve NMF problems usually involve intensive matrix multiplications, which make them computationally inefficient. In this paper, instead of constraining the factor matrices of NMF to be nonnegative as existing methods do, we propose a novel Fast Nonnegative Matrix Tri-factorization (FNMTF) approach to constrain them to be cluster indicator matrices, a special type of nonnegative matrices. As a result, the optimization problem of our approach can be decoupled, which results in much smaller subproblems requiring far fewer matrix multiplications, such that our approach works well for large-scale input data. Moreover, the resulting factor matrices can directly assign cluster labels to data points and features due to the nature of indicator matrices. In addition, through exploiting the manifold structures in both data and feature spaces, we further introduce the Locality Preserved FNMTF (LP-FNMTF) approach, by which the clustering performance is improved. The promising results in extensive experimental evaluations validate the effectiveness of the proposed methods.

AAAI Conference 2011 Conference Paper

Learning Instance Specific Distance for Multi-Instance Classification

  • Hua Wang
  • Feiping Nie
  • Heng Huang

Multi-Instance Learning (MIL) deals with problems where each training example is a bag, and each bag contains a set of instances. Multi-instance representation is useful in many real world applications, because it is able to capture more structural information than traditional flat single-instance representation. However, it also brings new challenges. Specifically, the distance between data objects in MIL is a set-to-set distance, which is harder to estimate than vector distances used in single-instance data. Moreover, because in MIL labels are assigned to bags instead of instances, although a bag belongs to a class, some, or even most, of its instances may not be truly related to the class. In order to address these difficulties, in this paper we propose a novel Instance Specific Distance (ISD) method for MIL, which computes the Class-to-Bag (C2B) distance by further considering the relevance of training instances with respect to their labeled classes. Taking into account the outliers caused by the weak label association in MIL, we learn ISD by solving an ℓ0+-norm minimization problem. An efficient algorithm to solve the optimization problem is presented, together with a rigorous proof of its convergence. The promising results on five benchmark multi-instance data sets and two real world multi-instance applications validate the effectiveness of the proposed method.

NeurIPS Conference 2011 Conference Paper

Maximum Margin Multi-Instance Learning

  • Hua Wang
  • Heng Huang
  • Farhad Kamangar
  • Feiping Nie
  • Chris Ding

Multi-instance learning (MIL) considers input as bags of instances, in which labels are assigned to the bags. MIL is useful in many real-world applications. For example, in image categorization semantic meanings (labels) of an image mostly arise from its regions (instances) instead of the entire image (bag). Existing MIL methods typically build their models using the Bag-to-Bag (B2B) distance, which is often computationally expensive and may not truly reflect the semantic similarities. To tackle this, in this paper we approach MIL problems from a new perspective using the Class-to-Bag (C2B) distance, which directly assesses the relationships between the classes and the bags. Taking into account the two major challenges in MIL, high heterogeneity in data and weak label association, we propose a novel Maximum Margin Multi-Instance Learning (M³I) approach to parameterize the C2B distance by introducing the class specific distance metrics and the locally adaptive significance coefficients. We apply our new approach to the automatic image categorization tasks on three (one single-label and two multi-label) benchmark data sets. Extensive experiments have demonstrated promising results that validate the proposed method.

AAAI Conference 2010 Conference Paper

Discriminant Laplacian Embedding

  • Hua Wang
  • Heng Huang
  • Chris Ding

Many real life applications brought by modern technologies often have multiple data sources, which are usually characterized by both attributes and pairwise similarities at the same time. For example in webpage ranking, a webpage is usually represented by a vector of term values, and meanwhile the internet linkages induce pairwise similarities among the webpages. Although both attributes and pairwise similarities are useful for class membership inference, many traditional embedding algorithms only deal with one type of input data. In order to make use of both types of data simultaneously, in this work, we propose a novel Discriminant Laplacian Embedding (DLE) approach. Supervision information from training data is integrated into DLE to improve the discriminativity of the resulting embedding space. By solving the ambiguity problem in computing the scatter matrices caused by data points with multiple labels, we successfully extend the proposed DLE to multi-label classification. In addition, through incorporating the label correlations, the classification performance using multi-label DLE is further enhanced. Promising experimental results in extensive empirical evaluations have demonstrated the effectiveness of our approaches.

AAAI Conference 2010 Conference Paper

Multi-Label Classification: Inconsistency and Class Balanced K-Nearest Neighbor

  • Hua Wang
  • Chris Ding
  • Heng Huang

Many existing approaches employ the one-vs-rest method to decompose a multi-label classification problem into a set of 2-class classification problems, one for each class. This method is valid in traditional single-label classification; however, it incurs training inconsistency in multi-label classification, because in the latter a data point could belong to more than one class. In order to deal with this problem, in this work, we further develop the classical K-Nearest Neighbor classifier and propose a novel Class Balanced K-Nearest Neighbor approach for multi-label classification by emphasizing balanced usage of data from all the classes. In addition, we also propose a Class Balanced Linear Discriminant Analysis approach to address high-dimensional multi-label input data. Promising experimental results on three broadly used multi-label data sets demonstrate the effectiveness of our approach.