Author name cluster

Hua Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

57 papers

2 author rows

AAAI Conference 2026 Conference Paper

Decoding with Structured Awareness: Integrating Directional, Frequency-Spatial, and Structural Attention for Medical Image Segmentation

Fan Zhang
Zhiwei Gu
Hua Wang

To address the limitations of Transformer decoders in capturing edge details, recognizing local textures and modeling spatial continuity, this paper proposes a novel decoder framework specifically designed for medical image segmentation, comprising three core modules. First, the Adaptive Cross-Fusion Attention (ACFA) module integrates channel feature enhancement with spatial attention mechanisms and introduces learnable guidance in three directions (planar, horizontal, and vertical) to enhance responsiveness to key regions and structural orientations. Second, the Triple Feature Fusion Attention (TFFA) module fuses features from Spatial, Fourier and Wavelet domains, achieving joint frequency-spatial representation that strengthens global dependency and structural modeling while preserving local information such as edges and textures, making it particularly effective in complex and blurred boundary scenarios. Finally, the Structural-aware Multi-scale Masking Module (SMMM) optimizes the skip connections between encoder and decoder by leveraging multi-scale context and structural saliency filtering, effectively reducing feature redundancy and improving semantic interaction quality. Working synergistically, these modules not only address the shortcomings of traditional decoders but also significantly enhance performance in high-precision tasks such as tumor segmentation and organ boundary extraction, improving both segmentation accuracy and model generalization. Experimental results demonstrate that this framework provides an efficient and practical solution for medical image segmentation.

AAAI Conference 2026 Conference Paper

Exploiting All Mamba Fusion for Efficient RGB-D Tracking

Ge Ying
Dawei Zhang
Chengzhuan Yang
Wei Liu
Sang-Woon Jeon
Hua Wang
Changqin Huang
Zhonglong Zheng

Despite the progress made through deep learning, existing Visual Object Tracking (VOT) frameworks struggle with real-world challenges. Recent approaches incorporate additional modalities such as Depth, Thermal Infrared, and Language to enhance the robustness of VOT, and improvements in depth sensor precision have particularly facilitated RGB-D tracking. However, current RGB-D trackers often copy RGB tracking paradigms, leading to inefficiency due to two-stream architectures that fail to exploit heterogeneous features, and to reliance on simplistic or large-parameter fusion methods. To address these challenges, we propose AMTrack, a one-stream RGB-D tracker leveraging Mamba's linear complexity for simultaneous feature extraction and two-stage cross-modal feature fusion. Our innovation also includes a low-parameter Multimodal Mix Mamba (3M) module, which optimizes deep feature fusion and reduces computational overhead. The advantage of the 3M module stems from our Multimodal State Space Model (MSSM), a multimodal feature interaction component reconstructed based on SSM. Experiments across multiple RGB-D tracking datasets indicate that AMTrack achieves superior performance with fewer parameters and lower memory demands than state-of-the-art trackers.

AAAI Conference 2026 Conference Paper

IdealTSF: Can Non-Ideal Data Contribute to Enhancing the Performance of Time Series Forecasting Models?

Hua Wang
Jinghao Lu
Fan Zhang

Deep learning has shown strong performance in time series forecasting tasks. However, issues such as missing values and anomalies in sequential data hinder its further development in prediction tasks. Previous research has primarily focused on extracting feature information from sequence data or on treating such suboptimal data as positive samples for knowledge transfer. A more effective approach would be to leverage these non-ideal negative samples to enhance event prediction. In response, this study highlights the advantages of non-ideal negative samples and proposes the IdealTSF framework, which integrates both ideal positive and negative samples for time series forecasting. IdealTSF consists of three progressive steps: pretraining, training, and optimization. It first pretrains the model by extracting knowledge from negative sample data, then transforms the sequence data into ideal positive samples during training. Additionally, a negative optimization mechanism with adversarial disturbances is applied. Extensive experiments demonstrate that negative sample data unlocks significant potential within the basic attention architecture for time series forecasting. Therefore, IdealTSF is particularly well-suited for applications with noisy samples or low-quality data.

AAAI Conference 2026 Conference Paper

IGIANet: Illumination Guided Implicit Alignment Network for Infrared–Visible UAV Detection

Xiangqi Chen
Dawei Zhang
Li Zhao
Chengzhuan Yang
Zhongyu Chen
Jungang Lou
Zhonglong Zheng
Sang-Woon Jeon

Visible-Infrared (RGB-IR) Unmanned Aerial Vehicle (UAV) object detection integrates complementary cues from visible and infrared sensors, offering broad application potential. However, due to sensor parallax, it still faces the challenge of weak spatial misalignment, which significantly limits its performance in UAV-based object detection. Existing methods emphasize strict alignment, overlooking spectral heterogeneity under varying illumination. To address these issues, we propose the Illumination Guided Implicit Alignment Network (IGIANet) to mitigate modality heterogeneity without explicit alignment. Specifically, we integrate three novel modules. First, we propose an illumination-guided frequency modulation module that adaptively allocates fusion weights to visible and infrared features based on global illumination estimation, effectively alleviating modality imbalance under varying lighting conditions. Second, we introduce a frequency-guided cross-modality differential enhancement module, which computes differential cues across frequency domains to enhance complementary information and highlight weakly aligned and low-contrast regions. Finally, we introduce an implicit alignment-driven dynamic fusion module that actively estimates offsets and generates dynamic, position-adaptive fusion kernels to align and fuse modalities. Extensive experiments demonstrate that IGIANet outperforms state-of-the-art models on various benchmarks, achieving 80.9% mAP on DroneVehicle, 57.1% mAP on VEDAI, and 49.4% mAP on FLIR.

AAAI Conference 2026 Conference Paper

MoEA-Net: Modality-Incremental Expert Aggregation Network for Retinal Prognostic Prediction

Hua Wang
Xiaodan Zhang
Yanzhao Shi
Chengxin Zheng
Wanyu Zhang
Zhen Wang
Jianing Wang
Xiaobing Yu

Automated analysis of temporal changes in multimodal retinal images is critical for the prognostic assessment of ophthalmic diseases. Unlike traditional single-timepoint diagnosis, tracking longitudinal changes across multiple imaging modalities introduces significant data bias challenges: (1) Imbalanced modality samples compromise the integration of knowledge within minority modalities; (2) Heterogeneous visual patterns across modalities undermine the perception of disease-relevant biomarkers. To tackle these issues, we propose a Modality-Incremental Expert Aggregation Network (MoEA-Net), which unifies inter-modal integration and intra-modal perception for enhanced retinal prognostic prediction. Specifically, we employ a large language model (LLM) with incremental LoRA layers for specific modalities to effectively integrate knowledge from imbalanced data. In addition, we introduce a Spatiotemporal-aware Expert (SAE) module to better perceive both the anatomical structures and longitudinal changes within modalities. By progressively combining the SAE module with incremental LoRA, MoEA-Net supports continual knowledge accumulation and improves reasoning accuracy. Experimental results show that MoEA-Net achieves state-of-the-art performance on subretinal fluid change and visual recovery classification tasks, validating its effectiveness.

AAAI Conference 2026 Conference Paper

Neural Outline Cache for Real-time Anti-aliasing Font Rendering

Jiashuaizi Mo
Sang-Woon Jeon
Hua Wang
Xiangqi Chen
Yanchao Wang
Minglu Li
Zhonglong Zheng

Neural textures have emerged as pivotal assets in next-generation neural rendering pipelines. However, hardware limitations and programming interface constraints lead to suboptimal performance in multi-instance real-time rendering scenarios. This bottleneck becomes particularly acute for texture-intensive tasks such as font rendering. To address this, we propose Neural Outline Cache (NOC), a novel neural font texture supporting real-time anti-aliased rendering and procedural editing within modern neural graphics pipelines. NOC's lightweight network leverages multi-resolution hash encoding to cache spline-derived SDFs, delivering anti-aliased rendering via standard graphics pipelines. For massive-instance scalability, our cache buffer layout (CBL) and batch-fused inference (BFI), tailored for NOC, mitigate neural texture streaming bottlenecks. We constructed an evaluation dataset using five font styles. In offline rendering, our proposed method achieves overall average results of 57.35 dB PSNR, 0.998 SSIM, and 1.1584e-3 pixel RMSE, while maintaining approximately 0.5ms frame latency with 500 real-time instances. To demonstrate its versatility, we integrated a procedural editor for visual effects editing of NOC textures. These results all prove that NOC is a reliable, production-ready neural asset.

TIST Journal 2026 Journal Article

Towards Evolutionary Differential Privacy in Cross-Platform Spatial Crowdsourcing

Yong-Feng Ge
Hua Wang
Elisa Bertino
Jinli Cao
Yanchun Zhang
Zhonglong Zheng

The development of mobile web services has brought significant attention to spatial crowdsourcing. The uneven distribution of tasks and workers has led to recent research on Cross-Platform Spatial Crowdsourcing (CPSC), aiming for a multi-win situation for platforms, workers and task requesters. Previous studies on CPSC problems focused on task assignment and worker selection performance, overlooking the importance of privacy preservation. This paper addresses the existing challenges of privacy preservation and service quality by formulating a Privacy-Preserving Cross-Platform Spatial Crowdsourcing (PP-CPSC) problem and proving it to be NP-hard. We propose an Evolutionary Differential Privacy (Evo-DP) approach to optimize PP-CPSC. Evo-DP's evolutionary framework enables efficient and flexible optimization of privacy budget allocation. Within Evo-DP, each solution to the privacy budget allocation is represented as an individual in the population. To approximate the optimal solution, three evolutionary operations (mutation, crossover, and scaling) are employed for population updates, along with a selection process. A hybrid population model is introduced to balance exploration and exploitation abilities. Experimental results demonstrate Evo-DP's superiority over previous strategies in terms of solution quality, convergence speed, and scalability.

TIST Journal 2026 Journal Article

Transformer-Enhanced Adaptive Graph Convolutional Network for Traffic Flow Prediction

Enfu Huang
Zhanshan Zhao
Jiao Yin
Jinli Cao
Hua Wang

Traffic flow prediction is vital in urban traffic management, planning, and development. With the continuous advancement of urbanization, there is an increasing demand for traffic flow prediction models to achieve higher accuracy and long-range forecasting capabilities. Against this backdrop, traditional methods that rely on local feature extraction and static spatial graph construction often fall short of expectations. This highlights the urgent need for advanced approaches that dynamically model spatio-temporal features while capturing global dependencies, effectively meeting the demands of complex traffic flow prediction tasks. To achieve this, we propose the Transformer-Enhanced Adaptive Graph Convolutional Network (T-AGCN), a novel model designed to capture global temporal relationships and dynamically extract rich spatial information. T-AGCN incorporates an Adaptive Graph Learner module to model dynamic relationships among traffic nodes and a Transformer-Based Spatio-Temporal graph convolutional module to effectively capture long-range temporal dependencies in historical traffic data. These innovations enable T-AGCN to jointly learn dynamic spatial interactions and complex temporal patterns, offering a comprehensive representation of traffic network dynamics. We evaluate T-AGCN on three real-world datasets: PeMSD7(M), PeMS08, and METR-LA. The experimental results demonstrate that T-AGCN, which is inspired by the baseline Spatial-Temporal Graph Convolutional Network (STGCN), significantly improves on that baseline's design. Moreover, T-AGCN consistently outperforms state-of-the-art models, including the Transformer-Based Interactive Temporal and Adaptive Network (TITAN) and the Spatial-Temporal Decoupled Masked Autoencoder (STD-MAE). The implementation is available on GitHub at https://github.com/time1722/T-AGCN.

AAAI Conference 2026 Conference Paper

URPO: A Unified Reward & Policy Optimization Framework for Large Language Models

Songshuo Lu
Hua Wang
Zhi Chen
Yaohua Tang

Large-scale alignment pipelines typically pair a policy model with a separately trained reward model whose parameters remain frozen during reinforcement learning (RL). This separation creates a complex, resource-intensive pipeline and leads to a performance ceiling. We propose a novel framework, Unified Reward & Policy Optimization (URPO), that unifies instruction-following (“player”) and reward modeling (“referee”) into a single model and a single training phase. Our method recasts all alignment data (including preference pairs, verifiable reasoning, and open-ended instructions) into a unified generative format optimized by a single Group-Relative Policy Optimization (GRPO) loop. This enables the model to learn from ground-truth preferences and verifiable logic while simultaneously generating its own rewards for open-ended tasks. Experiments on the Qwen2.5-7B model demonstrate that URPO significantly outperforms a strong baseline using a separate generative reward model, boosting the instruction-following score on AlpacaEval to 44.84 and achieving a 36% relative improvement on the challenging AIME reasoning benchmark. Furthermore, URPO cultivates a superior internal evaluator as a byproduct of training, achieving a RewardBench score of 85.15 and surpassing the dedicated reward model it replaces (83.55). By eliminating the need for a separate reward model and fostering a co-evolutionary dynamic, URPO presents a simpler, more efficient, and more effective path towards robustly aligned language models.
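
The group-relative step that GRPO is named after can be illustrated with a minimal sketch. This is the commonly described GRPO advantage normalization, not URPO's actual training code: the function name `group_relative_advantages`, the `eps` stabilizer, and the use of the population standard deviation are assumptions made for illustration, and the example rewards are invented.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Each response's advantage is its reward minus the group mean, divided
    by the group standard deviation (population std is an assumption here;
    eps avoids division by zero when all rewards are equal).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled responses to one prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Responses scored above the group mean receive positive advantages and those below receive negative ones, so no separate value network is needed to provide a baseline.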

EAAI Journal 2026 Journal Article

Variational inference for learning-based orbital pursuit-evasion under incomplete information

Junhua He
Hua Wang
Haitao Wang
Chengyi Huo
Heng Jing

Observation is a key factor influencing the strategy design of orbital games. This paper focuses on the orbital pursuit-evasion game with observation errors and information delays (ED-OPEG), proposing an artificial intelligence-based autonomous decision-making method. The game model for ED-OPEG is built based on orbital dynamics and game theory, and is further reconstructed under the partially observable Markov decision process framework, decomposing the game strategy solution into belief state inference and strategy mapping. Accordingly, the Histories-Enhanced Variational Twin Delay Deep Deterministic policy gradient (HE-VTD3) algorithm is proposed. Simulation experiments demonstrate that the HE-VTD3 algorithm exhibits strong resistance to local convergence while ensuring training stability. Against targets adopting adversarial strategies, HE-VTD3 enables the pursuer to achieve a capture success rate of 80.4%, showing improvements of 20.2% and 67.1% over the long-short-term-memory-based twin delayed deep deterministic policy gradient and delayed turn-based multi-agent deep deterministic policy gradient algorithms, respectively. Under the multivariate 95% confidence ellipse, HE-VTD3's empirical confidence level for non-cooperative target state uncertainty estimation reaches 97.77%. Generalization analysis further confirms that HE-VTD3 exhibits robustness to environmental uncertainties and disturbances.

EAAI Journal 2025 Journal Article

A novel dual-channel model with adaptive multi-scale attention for time series forecasting

Shuqing Wang
Jinghao Lu
Ren Wang
Xiaofeng Zhang
Hua Wang
Yujuan Sun

Time series forecasting plays a crucial role in various domains, including finance, traffic management, energy, and healthcare. However, as application scenarios continue to expand, the complexity of time series data has significantly increased, posing substantial challenges in capturing trend fluctuations of multivariate features and the dynamic relationships among them. To address these issues, this paper proposes a novel architecture, DASformer (Dual-Channel model with Adaptive multi-Scale attention), which enhances time series analysis by leveraging a dual-channel multivariate extractor and an adaptive multi-scale attention mechanism. Specifically, the dual-channel multivariate extractor comprises two independent yet interactive streams, focusing on capturing information at different levels of the time series, thereby effectively decoupling complex dynamic relationships. Moreover, to alleviate the problem of feature forgetting and loss in the long-term trend stream, the model incorporates an adaptive multi-scale attention module. This module adopts multi-scale processing and a dynamic weighting mechanism to learn dependencies across different scales and effectively capture their dynamic variations. Experimental results show that DASformer consistently achieves state-of-the-art performance on nine widely used benchmark datasets, delivering superior prediction accuracy, particularly in long-term forecasting tasks. The source code is available at: https://github.com/LDU-TSA/DASformer.

EAAI Journal 2025 Journal Article

Autonomous dynamic formation for maritime target tracking using multi-agent reinforcement learning

Hua Wang
Jiaxin Li
Hao Tao
Junnan Liu
Chaochao Li
Ke Wang
Mingliang Xu

In various maritime missions such as escort and roundup, dynamic formation target tracking plays a crucial role. Most existing dynamic formation methods require user intervention before formation changes, resulting in poor flexibility and low automation. They also do not consider variations in the abilities of individual members. To address these issues, we propose an autonomous dynamic formation planning method based on multi-agent reinforcement learning, integrating formation configuration into the strategy. This method can automatically adjust the formation based on its current state, providing greater flexibility and adaptability. In addition, a staged reward function is devised for the training process to guide agents in progressively learning dynamic formation tasks. Finally, we validate the effectiveness and generalization of our proposed method through various experiments.

EAAI Journal 2025 Journal Article

Periodic decomposition and feature enhancement fusion for traffic forecasting

Xiaofei Kong
Hua Wang
Mingli Zhang
Fan Zhang

With the rapid acceleration of urbanization, traffic prediction plays a crucial role in smart city development. This paper proposes an architecture called Periodic Decomposition and Feature Enhancement Fusion (PDGM) aimed at addressing the periodicity issue overlooked in existing traffic prediction methods. PDGM utilizes downsampling techniques to decompose the original traffic data into periodic components and enhances missing data through feature enhancement fusion, thereby improving the accuracy of traffic data prediction. Experimental results of this study demonstrate that PDGM outperforms state-of-the-art baseline models on three benchmark datasets, offering new possibilities for traffic data analysis and prediction tasks.

EAAI Journal 2025 Journal Article

Probabilistic intervals prediction based on adaptive regression with attention residual connections and covariance constraints

Fan Zhang
Min Wang
Lin Li
Yepeng Liu
Hua Wang

This paper introduces a novel prediction interval method called Adaptive Regression with Attention Residual Connection and Covariance Constraint (AR-ARCC). By integrating Monte Carlo and Bayesian methods, we leverage the strengths of both to achieve a more flexible and accurate method for generating prediction intervals. Additionally, through the optimization of the loss function, introduction of penalty terms, and improvement of mean squared error calculations, the model’s performance in interval prediction tasks is enhanced. Finally, the integration of an interactive channel heterogeneous self-attention module, combined with residual blocks, enhances the modeling capability of the neural network. The comprehensive application of these methods results in superior performance of the model in handling uncertainty and local variations.

EAAI Journal 2025 Journal Article

Robust memory-based graph neural networks for noisy and sparse graphs

Linling Jiang
Wenchang Zhang
Hua Wang
Fan Zhang

Real-world graphs often suffer from structural noise and label sparsity, two critical challenges that severely impair the performance of graph neural networks (GNNs). While prior methods have focused on addressing either of these problems in isolation, their co-occurrence in practical scenarios remains largely unsolved. To address this challenge, we propose a robust memory-based GNN for noisy and sparse graphs that stores and updates node similarity information within a memory module to assist in predicting missing edges and eliminating noisy ones. This reconstruction densifies the graph structure, effectively mitigating the impact of noisy edges and alleviating the challenges posed by label sparsity through enhanced information propagation. Furthermore, the reconstructed graph adopts an edge regularization strategy that models the confidence of predicted edges and suppresses uncertain connections during training, thereby smoothing the label propagation for unlabeled nodes and improving the robustness of GNN training. Extensive evaluations conducted on real-world benchmark datasets, including Cora, Citeseer, and Pubmed, demonstrate that our proposed method, the robust memory graph neural network (RMGNN), significantly enhances GNN performance on noisy graphs with limited labeled nodes, with a notable performance boost of up to 17.8% on the Cora dataset. Our experimental analysis further confirms the effectiveness and efficiency of the proposed memory-based graph structure learning approach in the presence of edge noise and sparse labels, validating the robustness of the framework in complex graph scenarios.

TIST Journal 2025 Journal Article

Scalable Multi-Instance Multi-Shape Support Vector Machine for Whole Slide Breast Histopathology

Hoon Seo
Yuze Bai
Lodewijk Brand
Lucia Saldana Barco
Hua Wang

Analysis of histopathological images is critical in cancer diagnosis and treatment. Due to the huge size of histopathological images and the varied number of imaging records per patient, many existing works analyze the Whole Slide Image (WSI) as a bag in which its patches are instances. However, these approaches are limited to analyzing the patches in a fixed shape, while the malignant lesions can form varied shapes. To address this challenge, in this article we propose a Multi-Instance Multi-Shape Support Vector Machine (MIMSSVM) to analyze the multiple images (instances) jointly where each instance consists of multiple patches in various shapes. In our approach, we can identify the different morphologic abnormalities of nuclei shapes from the multiple images. In addition to the multi-instance multi-shape learning capability, we derive an efficient solution algorithm to optimize the proposed model that scales well to a large number of features. Our experimental results show our new method outperforms the existing SVMs and deep learning models in histopathological classification. The proposed model also identifies the tissue segments in an image exhibiting an indication of an abnormality which provides utility in the early detection of malignant tumors. All these promising experimental results have demonstrated the effectiveness of our new method. We anticipate that our new method is of interest to biomedical engineering communities beyond WSI research and have open sourced the code of our method online. The implementation of our proposed MIMSSVM model is publicly available at https://github.com/hoonseo0409/MIMSSVM.

EAAI Journal 2025 Journal Article

UTCR-Dehaze: U-Net and transformer-based cycle-consistent generative adversarial network for unpaired remote sensing image dehazing

Canlin Li
Xiangfei Zhang
Hua Wang
Zhiwen Shao
Lizhuang Ma

To address issues of feature loss and color differences in existing unpaired dehazing methods for Remote Sensing images, we propose a method based on a U-Net and Transformer-based Cycle-Consistent Generative Adversarial Network for unpaired remote sensing image dehazing (UTCR-Dehaze). In this model, considering that paired hazy images are difficult to obtain, a cycle-consistent generative adversarial network (CycleGAN) is used to achieve remote sensing image dehazing. Due to the multi-scale features of remote sensing images, U-Net is combined with Transformer as the generator of CycleGAN. The generator learns the relationship between low-frequency and high-frequency features of the image at multiple scales. The U-Net encoder–decoder processes the high-frequency features, and the transformer at the bottleneck of U-Net learns the low-frequency feature relationship to restore image details and structures. Secondly, to further improve the details and clarity of dehazed images, a Mixed Cascade Group Attention module (MCGA) is designed. MCGA captures the global information of the image through cascade group attention and focuses on local information through Dehaze input-dependent depthwise convolution, thus better learning image features. In addition, to reduce feature loss and color differences in dehazed images, a Cycle Perceptual Identity Consistency Loss is designed, which combines perceptual and identity losses to maintain the details of input images through cycle consistency. Numerous experiments on synthetic and real remote sensing datasets show that, compared with previous methods, this method not only removes haze more accurately but also preserves image details and colors to the greatest extent.

EAAI Journal 2024 Journal Article

Combining optical flow and Swin Transformer for Space-Time video super-resolution

Xin Wang
Hua Wang
Mingli Zhang
Fan Zhang

Space–time video super-resolution is a task that aims to interpolate low frame rate, low resolution videos to high frame rate, high resolution ones. While existing Transformer-based methods have achieved results comparable to convolutional neural network-based methods, the computational cost of Transformer limits its performance with constrained computational resources. Moreover, Swin Transformer may fail to fully exploit the spatio-temporal information of video frames due to the limitation of window size, impeding its effectiveness in handling large motions. To address these limitations, we propose an end-to-end space–time video super-resolution architecture based on optical flow alignment and Swin Transformer. The alignment module is introduced to extract spatio-temporal information from adjacent frames without significantly increasing the computational burden. Additionally, we design a residual convolution layer to enhance the translational invariance of the features extracted by Swin Transformer and introduce additional nonlinear transformations. Experimental results demonstrate that our proposed method achieves superior performance on various benchmark datasets compared to state-of-the-art methods. In terms of Peak Signal-to-Noise Ratio, our method outperforms the state-of-the-art methods by at least 0.15 dB on the Vimeo-Medium dataset.
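
For readers unfamiliar with the metric, Peak Signal-to-Noise Ratio is conventionally defined as 10·log10(MAX² / MSE). The sketch below applies that standard definition to flat lists of pixel values; it is not the paper's evaluation code, and the function name `psnr`, the 8-bit default `max_value=255.0`, and the sample pixel values are assumptions chosen for illustration.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical two-pixel example: each pixel is off by one intensity level.
score = psnr([100, 100], [101, 99])
```

Because the scale is logarithmic, a fixed gain in dB corresponds to a fixed proportional reduction in mean squared error, whatever the starting quality.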

IJCAI Conference 2024 Conference Paper

Skip-Timeformer: Skip-Time Interaction Transformer for Long Sequence Time-Series Forecasting

Wenchang Zhang
Hua Wang
Fan Zhang

Recent studies have raised questions about the suitability of the Transformer architecture for long sequence time-series forecasting. These forecasting models leverage Transformers to capture dependencies between multiple time steps in a time series, with embedding tokens composed of data from individual time steps. However, challenges arise when applying Transformers to predict long sequences with strong periodicity, leading to performance degradation and increased computational burden. Furthermore, embedding tokens formed one time step at a time may struggle to reveal meaningful information in long sequences, failing to capture correlations between different time steps. In this study, we propose Skip-Timeformer, a Transformer-based model that utilizes a skip-time interaction for long sequence time-series forecasting. Specifically, we decompose the time series into multiple subsequences based on different time intervals, embedding various time steps into variable tokens across multiple sequences. The skip-time interaction mechanism utilizes these variable tokens to capture dependencies in the skip-time dimension. Additionally, skip-time interaction is employed to learn dependencies between sequences missed by multiple skip time steps. The Skip-Timeformer model demonstrates state-of-the-art performance on various real-world datasets, further enhancing the long sequence forecasting capabilities of the Transformer variations and better adapting to arbitrary lookback windows.
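
The decomposition into subsequences by time interval can be pictured with a small sketch. It only illustrates interval-based splitting with strided slicing; whether Skip-Timeformer splits the series in exactly this way is an assumption, and the function name `skip_time_subsequences` and the toy series are hypothetical.

```python
def skip_time_subsequences(series, interval):
    """Split a series into `interval` subsequences of every interval-th step.

    Subsequence k holds the values at time steps k, k + interval,
    k + 2 * interval, and so on.
    """
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    return [series[offset::interval] for offset in range(interval)]


# Toy series of ten time steps, split with a skip interval of 3.
series = list(range(10))
subsequences = skip_time_subsequences(series, 3)
```

With a strongly periodic series and an interval equal to the period, each subsequence gathers the values observed at the same phase of the cycle, which is one way to expose dependencies along the skip-time dimension.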

ICRA Conference 2023 Conference Paper

3D Reconstruction of Tibia and Fibula using One General Model and Two X-ray Images

Kai Pan
Shuai Zhang 0029
Liang Zhao 0003
Shoudong Huang
Yanhao Zhang 0003
Hua Wang
Qi Luo

The 3D reconstruction of patient-specific bone models plays a crucial role in orthopaedic surgery for clinical evaluation, surgical planning and precise implant design or selection. This paper considers the problem of reconstructing a patient-specific 3D tibia and fibula model from only two 2D X-ray images and one 3D general model segmented from the lower leg CT scans of one randomly selected patient. Currently, 3D bone reconstruction mainly relies on computed tomography (CT) and magnetic resonance imaging (MRI) scanning-based model segmentation, which results in high radiation exposure or high costs. In contrast, the proposed algorithm can accurately and efficiently deform a 3D general model to achieve a patient-specific 3D model that matches the patient's tibia and fibula projections in two 2D X-rays. The algorithm performs a preliminary deformation, 2D contour registration, and optimisation based on the deformation graph that represents the shape deformation of models. Evaluations using simulations, cadaver and in-vivo experiments demonstrate that the proposed algorithm can effectively reconstruct the patient's 3D tibia and fibula surface model with high accuracy.

NeurIPS Conference 2023 Conference Paper

DP-HyPO: An Adaptive Private Framework for Hyperparameter Optimization

Hua Wang
Sheng Gao
Huanyu Zhang
Weijie Su
Milan Shen

Hyperparameter optimization, also known as hyperparameter tuning, is a widely recognized technique for improving model performance. Regrettably, when training private ML models, many practitioners often overlook the privacy risks associated with hyperparameter optimization, which could potentially expose sensitive information about the underlying dataset. Currently, the sole existing approach to allow privacy-preserving hyperparameter optimization is to uniformly and randomly select hyperparameters for a number of runs, subsequently reporting the best-performing hyperparameter. In contrast, in non-private settings, practitioners commonly utilize "adaptive" hyperparameter optimization methods such as Gaussian Process-based optimization, which select the next candidate based on information gathered from previous outputs. This substantial contrast between private and non-private hyperparameter optimization underscores a critical concern. In our paper, we introduce DP-HyPO, a pioneering framework for "adaptive" private hyperparameter optimization, aiming to bridge the gap between private and non-private hyperparameter optimization. To accomplish this, we provide a comprehensive differential privacy analysis of our framework. Furthermore, we empirically demonstrate the effectiveness of DP-HyPO on a diverse set of real-world datasets.