Arrow Research search

Author name cluster

Hang Zhang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

32 papers
2 author rows

Possible papers

32

AAAI Conference 2026 Conference Paper

Can Protective Watermarking Safeguard the Copyright of 3D Gaussian Splatting?

  • Wenkai Huang
  • Yijia Guo
  • Gaolei Li
  • Lei Ma
  • Hang Zhang
  • Liwen Hu
  • Jiazheng Wang
  • Jianhua Li

3D Gaussian Splatting (3DGS) has emerged as a powerful representation for 3D scenes, widely adopted due to its exceptional efficiency and high-fidelity visual quality. Given the significant value of 3DGS assets, recent works have introduced specialized watermarking schemes to ensure copyright protection and ownership verification. However, can existing 3D Gaussian watermarking approaches genuinely guarantee robust protection of the 3D assets? In this paper, for the first time, we systematically explore and validate possible vulnerabilities of 3DGS watermarking frameworks. We demonstrate that conventional watermark removal techniques designed for 2D images do not effectively generalize to the 3DGS scenario due to the specialized rendering pipeline and the unique attributes of each Gaussian primitive. Motivated by this insight, we propose GSPure, the first watermark purification framework designed specifically for watermarked 3DGS representations. By analyzing view-dependent rendering contributions and exploiting geometrically accurate feature clustering, GSPure precisely isolates and effectively removes watermark-related Gaussian primitives while preserving scene integrity. Extensive experiments demonstrate that GSPure achieves the best watermark purification performance, reducing watermark PSNR by up to 16.34 dB while limiting degradation of original scene fidelity to less than 1 dB of PSNR loss. Moreover, it consistently outperforms existing methods in both effectiveness and generalization.

JBHI Journal 2026 Journal Article

Hierarchical Multi-View Graph Diffusion Weighted Model for Cancer Subtype Identification

  • Yunhe Wang
  • Hang Zhang
  • Zhengyu Du
  • Yanchi Su
  • Xiangtao Li

Accurate cancer subtype identification is crucial for personalized medicine, as it enables precise diagnosis based on molecular characteristics. With the advent of large-scale multi-omics data from various sources, researchers now have unprecedented opportunities to explore cancer subtypes comprehensively. However, the inherent complexity, high dimensionality, and heterogeneity of these datasets present significant statistical and computational challenges, often leading to suboptimal clustering performance when inter-omics heterogeneity is overlooked. To address these challenges, we propose a novel method called the Hierarchical Multi-view Graph Diffusion Weighted (HMGDW) model for cancer subtype identification. Our approach begins with the generation of multiple base clusterings through random feature sampling, effectively mitigating the impact of high dimensionality. These base clusterings are subsequently integrated via a late integration strategy to yield the consensus clustering result. Then, we introduce a graph diffusion weighted mechanism that prioritizes views with the most significant contributions to the unified graph representation. Lastly, we conduct extensive experiments on both generic multi-view datasets and multi-omics cancer datasets. The experimental results demonstrate that HMGDW consistently outperforms several state-of-the-art methods, achieving robust and accurate clustering. Additionally, a case study on the acute myeloid leukemia (AML) dataset validates the practical efficacy of our model in identifying clinically relevant subtypes.

AAAI Conference 2026 Conference Paper

Multi-level Style Preference Optimization: An Adaptive Detection Framework for Human-Machine Hybrid Text

  • Zehao Wang
  • Lianwei Wu
  • Wenbo An
  • Hang Zhang
  • Yaxiong Wang

Large language model (LLM)-generated texts now rival human quality, creating four text categories: purely machine-generated, machine-rewritten, machine-polished, and human-written content. Traditional detection methods face significant challenges in human-machine hybrid scenarios where LLMs perform rewriting or polishing, as existing approaches focus on single-level features and fail to capture subtle, multi-layered machine traces. To address this, we propose the Multi-level Style Preference Optimization (MSPO) framework, capturing machine style features at multiple granularities: sequence-level (overall consistency), phrase-level (distinctive n-gram patterns), and lexical-level (word selection distributions). We further incorporate four text complexity indicators (Type-Token Ratio, Average Sentence Length, Average Word Length, and Punctuation Ratio) to dynamically adjust optimization parameters based on human-machine text complexity differences, enhancing adaptability across diverse text types. Additionally, we construct a comprehensive detection dataset spanning three representative domains (scientific writing, news articles, and creative writing) across four text types (human-written, purely machine-generated, machine-rewritten, and machine-polished), generated using state-of-the-art LLMs for robust evaluation. Experimental results demonstrate that MSPO significantly outperforms existing methods across all text types. On the challenging rewritten texts, MSPO achieves up to 82.14% AUROC, representing an improvement of 11.15 percentage points over the strongest baseline ImBD, while maintaining robust cross-domain generalizability across scientific, news, and creative writing domains.
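The four complexity indicators named in the abstract are all surface statistics of the text. A minimal sketch using conventional definitions (the paper's exact formulas are not given in the abstract, so these definitions are assumptions):

```python
import re
import string

def complexity_indicators(text: str) -> dict:
    """Compute the four surface-level indicators named in the MSPO abstract:
    Type-Token Ratio, Average Sentence Length, Average Word Length, and
    Punctuation Ratio. Definitions here are conventional ones, not the
    paper's."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    punct = [c for c in text if c in string.punctuation]
    n_words = max(len(words), 1)
    return {
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
        "avg_sentence_length": n_words / max(len(sentences), 1),  # words per sentence
        "avg_word_length": sum(len(w) for w in words) / n_words,  # characters per word
        "punctuation_ratio": len(punct) / max(len(text), 1),
    }
```

In the framework described above, such indicators would feed the dynamic adjustment of optimization parameters; here they are simply computed per input text.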

AAAI Conference 2026 Conference Paper

Persistent Autoregressive Mapping with Traffic Rules for Autonomous Driving

  • Shiyi Liang
  • Xinyuan Chang
  • Changjie Wu
  • Huiyuan Yan
  • Yifan Bai
  • Xinran Liu
  • Hang Zhang
  • Yujian Yuan

Safe autonomous driving requires both accurate HD map construction and persistent awareness of traffic rules, even when their associated signs are no longer visible. However, existing methods either focus solely on geometric elements or treat rules as temporary classifications, failing to capture their persistent effectiveness across extended driving sequences. In this paper, we present PAMR (Persistent Autoregressive Mapping with Traffic Rules), a novel framework that performs autoregressive co-construction of lane vectors and traffic rules from visual observations. Our approach introduces two key mechanisms: Map-Rule Co-Construction for processing driving scenes in temporal segments, and Map-Rule Cache for maintaining rule consistency across these segments. To properly evaluate continuous and consistent map generation, we develop MapDRv2, featuring improved lane geometry annotations. Extensive experiments demonstrate that PAMR achieves superior performance in joint vector-rule mapping tasks, while maintaining persistent rule effectiveness throughout extended driving sequences.

AAAI Conference 2026 Conference Paper

Plug-and-Play Clarifier: A Zero-Shot Multimodal Framework for Egocentric Intent Disambiguation

  • Sicheng Yang
  • Yukai Huang
  • Weitong Cai
  • Shitong Sun
  • You He
  • Jiankang Deng
  • Hang Zhang
  • Jifei Song

The performance of egocentric AI agents is fundamentally limited by multimodal intent ambiguity. This challenge arises from a combination of underspecified language, imperfect visual data, and deictic gestures, which frequently leads to task failure. Existing monolithic Vision-Language Models (VLMs) struggle to resolve these multimodal ambiguous inputs, often failing silently or hallucinating responses. To address these ambiguities, we introduce the Plug-and-Play Clarifier, a zero-shot and modular framework that decomposes the problem into discrete, solvable sub-tasks. Specifically, our framework consists of three synergistic modules: (1) a text clarifier that uses dialogue-driven reasoning to interactively disambiguate linguistic intent, (2) a vision clarifier that delivers real-time guidance feedback, instructing users to adjust their positioning for improved capture quality, and (3) a cross-modal clarifier with a grounding mechanism that robustly interprets 3D pointing gestures and identifies the specific objects users are pointing to. Extensive experiments demonstrate that our framework improves the intent clarification performance of small language models (4-8B) by approximately 30%, making them competitive with significantly larger counterparts. We also observe consistent gains when applying our framework to these larger models. Furthermore, our vision clarifier increases corrective guidance accuracy by over 20%, and our cross-modal clarifier improves semantic answer accuracy for referential grounding by 5%. Overall, our method provides a plug-and-play framework that effectively resolves multimodal ambiguity and significantly enhances user experience in egocentric interaction.

AAAI Conference 2026 Conference Paper

SCoNE: Spherical Consistent Neighborhoods Ensemble for Effective and Efficient Multi-View Anomaly Detection

  • Yang Xu
  • Hang Zhang
  • Yixiao Ma
  • Ye Zhu
  • Kai Ming Ting

The core problem in multi-view anomaly detection is to represent local neighborhoods of normal instances consistently across all views. Recent approaches consider a representation of the local neighborhood in each view independently, and then capture the consistent neighbors across all views via a learning process. They suffer from two key issues. First, there is no guarantee that they can capture consistent neighbors well, especially when the same neighbors lie in regions of varied densities in different views, resulting in inferior detection accuracy. Second, the learning process has a high computational cost of O(N^2), rendering them inapplicable to large datasets. To address these issues, we propose a novel method termed Spherical Consistent Neighborhoods Ensemble (SCoNE). It has two unique features: (a) the consistent neighborhoods are represented with multi-view instances directly, requiring no intermediate representations as used in existing approaches; and (b) the neighborhoods have data-dependent properties, which lead to large neighborhoods in sparse regions and small neighborhoods in dense regions. The data-dependent properties enable local neighborhoods in different views to be represented well as consistent neighborhoods, without learning. This leads to O(N) time complexity. Empirical evaluations show that SCoNE has superior detection accuracy and runs orders of magnitude faster on large datasets than existing approaches.
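The "large neighborhoods in sparse regions, small neighborhoods in dense regions" property can be illustrated with a simple stand-in (not SCoNE's actual construction, which the abstract does not specify): take each point's distance to its k-th nearest neighbor as a data-dependent neighborhood radius.

```python
import numpy as np

def adaptive_radii(points: np.ndarray, k: int = 3) -> np.ndarray:
    """Illustrative data-dependent neighborhood radii: for each point in the
    (N, D) array, return the distance to its k-th nearest neighbor. Sparse
    regions yield large radii, dense regions small ones."""
    d = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(-1))
    d.sort(axis=1)          # sort each row of pairwise distances in place
    return d[:, k]          # column 0 is each point's zero distance to itself
```

This is only meant to convey the density-adaptive behavior the abstract attributes to SCoNE's neighborhoods; the actual method builds consistent neighborhoods across views.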

AAAI Conference 2026 Conference Paper

Splats in Splats: Robust and Effective 3D Steganography Towards Gaussian Splatting

  • Yijia Guo
  • Wenkai Huang
  • Yang Li
  • Gaolei Li
  • Hang Zhang
  • Liwen Hu
  • Jianhua Li
  • Tiejun Huang

3D Gaussian splatting (3DGS) has demonstrated impressive 3D reconstruction performance with explicit scene representations. Given the widespread application of 3DGS in 3D reconstruction and generation tasks, there is an urgent need to protect the copyright of 3DGS assets. However, existing copyright protection techniques for 3DGS overlook the usability of 3D assets, posing challenges for practical deployment. Here we describe splats in splats, the first 3DGS steganography framework that embeds 3D content in 3DGS itself without modifying any attributes. To achieve this, we take a close look at spherical harmonics (SH) and devise an importance-graded SH coefficient encryption strategy to embed the hidden SH coefficients. Furthermore, we employ a convolutional autoencoder to establish a mapping between the original Gaussian primitives' opacity and the hidden Gaussian primitives' opacity. Extensive experiments indicate that our method significantly outperforms existing 3D steganography techniques, with 5.31% higher scene fidelity and 3x faster rendering speed, while ensuring security, robustness, and user experience.

AAAI Conference 2026 Conference Paper

UniMapGen: A Generative Framework for Large-Scale Map Construction from Multi-modal Data

  • Yujian Yuan
  • Changjie Wu
  • Xinyuan Chang
  • Sijin Wang
  • Hang Zhang
  • Shiyi Liang
  • Shuang Zeng
  • Mu Xu

Large-scale map construction is foundational for critical applications such as autonomous driving and navigation systems. Traditional large-scale map construction approaches mainly rely on costly and inefficient special data collection vehicles and labor-intensive annotation processes. While existing satellite-based methods have demonstrated promising potential in enhancing the efficiency and coverage of map construction, they exhibit two major limitations: (1) inherent drawbacks of satellite data (e.g., occlusions, outdatedness) and (2) inefficient vectorization from perception-based methods, resulting in discontinuous and rough roads that require extensive post-processing. This paper presents a novel generative framework, UniMapGen, for large-scale map construction, offering three key innovations: (1) representing lane lines as discrete sequences and establishing an iterative strategy to generate more complete and smooth map vectors than traditional perception-based methods; (2) proposing a flexible architecture that supports multi-modal inputs, enabling dynamic selection among BEV, PV, and text prompts, to overcome the drawbacks of satellite data; (3) developing a state update strategy for global continuity and consistency of the constructed large-scale map. UniMapGen achieves state-of-the-art performance on the OpenSatMap dataset. Furthermore, UniMapGen can infer occluded roads and predict roads missing from dataset annotations.

IJCAI Conference 2025 Conference Paper

GraphProt: Certified Black-Box Shielding Against Backdoored Graph Models

  • Xiao Yang
  • Yuni Lai
  • Kai Zhou
  • Gaolei Li
  • Jianhua Li
  • Hang Zhang

Graph learning models have been empirically proven to be vulnerable to backdoor threats, wherein adversaries submit trigger-embedded inputs to manipulate the model predictions. Current graph backdoor defenses manifest several limitations: 1) dependence on model-related details, 2) necessitation of additional fine-tuning, and 3) reliance on extra explainability tools, all of which are infeasible under stringent privacy policies. To address these limitations, we propose GraphProt, a certified black-box defense method to suppress backdoor attacks on GNN-based graph classifiers. GraphProt operates in a model-agnostic manner and solely leverages the graph input. Specifically, GraphProt first applies a purpose-designed topology-feature filtration to mitigate graph anomalies. Subsequently, subgraphs are sampled via a formulated strategy integrating topology and features, followed by robust model inference through a majority-vote-based subgraph prediction ensemble. Our results across benchmark attacks and datasets show GraphProt effectively reduces attack success rates while preserving regular graph classification accuracy.

ICLR Conference 2025 Conference Paper

OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces

  • Zehan Wang 0001
  • Ziang Zhang
  • Minjie Hong
  • Hang Zhang
  • Luping Liu
  • Rongjie Huang 0001
  • Xize Cheng
  • Shengpeng Ji

Recently, human-computer interaction with various modalities has shown promising applications, like GPT-4o and Gemini. Meanwhile, multimodal representation models have emerged as the foundation for these versatile multimodal understanding and generation pipelines. Models like CLIP, CLAP and ImageBind can map their specialized modalities into respective joint spaces. To construct a high-quality omni representation space that can be shared across modalities and is expert in each, we propose merging these advanced models into a unified space at scale. With this insight, we present OmniBind, advanced multimodal joint representation models built via fusing the knowledge of 14 pre-trained spaces, which support 3D, audio, image, video and language inputs. To alleviate the interference between different knowledge sources in the integrated space, we dynamically assign weights to different spaces by learning routers with two objectives: cross-modal overall alignment and language representation decoupling. Notably, since binding and routing spaces only require lightweight networks, OmniBind is extremely training-efficient. Extensive experiments demonstrate the versatility and superiority of OmniBind as an omni representation model, highlighting its great potential for diverse applications, such as any-query and composable multimodal understanding.

NeurIPS Conference 2025 Conference Paper

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

  • Sicong Leng
  • Yun Xing
  • Zesen Cheng
  • Yang Zhou
  • Hang Zhang
  • Xin Li
  • Deli Zhao
  • Shijian Lu

Recent advancements in large multimodal models (LMMs) have significantly enhanced performance across diverse tasks, with ongoing efforts to further integrate additional modalities such as video and audio. However, most existing LMMs remain vulnerable to hallucinations, the discrepancy between the factual multimodal input and the generated textual output, which has limited their applicability in various real-world scenarios. This paper presents the first systematic investigation of hallucinations in LMMs involving the three most common modalities: language, visual, and audio. Our study reveals two key contributors to hallucinations: overreliance on unimodal priors and spurious inter-modality correlations. To address these challenges, we introduce the benchmark The Curse of Multi-Modalities (CMM), which comprehensively evaluates hallucinations in LMMs, providing a detailed analysis of their underlying issues. Our findings highlight key vulnerabilities, including imbalances in modality integration and biases from training data, underscoring the need for balanced cross-modal learning and enhanced hallucination mitigation strategies. Based on our observations and findings, we suggest potential research directions that could enhance the reliability of LMMs.

JAIR Journal 2025 Journal Article

Towards a Robust Persistence Diagram via Data-dependent Kernel

  • Hang Zhang
  • Kaifeng Zhang
  • Kai Ming Ting
  • Ye Zhu

Topological Data Analysis (TDA) is used to extract topological features such as rings from point clouds. Recent works have identified that existing methods, which construct persistence diagrams in TDA, are not robust to noise and varied densities in a point cloud. This causes these methods to obtain incorrect topological features. We analyze the necessary properties of an approach that can address these two issues, and propose a new filter function for TDA based on a new data-dependent kernel that possesses these properties. Our empirical evaluation reveals that (i) the proposed kernel provides a better means for UMAP dimensionality reduction; (ii) the proposed filter function can significantly improve the performance of Topological Point Cloud Clustering; and (iii) the proposed filter function is a more effective way of constructing a persistence diagram for t-SNE visualization and SVM classification than three existing TDA methods. In addition, we explore the proposed filter's performance on a more complex deformation named Riemannian stretching. Our proposed filter equipped with Sample Fermat distance outperforms all the other filters when noise and Riemannian stretching coexist. Code is available at https://github.com/IsolationKernel/Codes/tree/main/Lambda-kernel.

NeurIPS Conference 2024 Conference Paper

Empowering and Assessing the Utility of Large Language Models in Crop Science

  • Hang Zhang
  • Jiawei Sun
  • Renqi Chen
  • Wei Liu
  • Zhonghang Yuan
  • Xinzhe Zheng
  • Zhefan Wang
  • Zhiyuan Yang

Large language models (LLMs) have demonstrated remarkable efficacy across knowledge-intensive tasks. Nevertheless, their untapped potential in crop science presents an opportunity for advancement. To narrow this gap, we introduce CROP, which includes a novel instruction tuning dataset specifically designed to enhance LLMs' professional capabilities in the crop science sector, along with a benchmark that serves as a comprehensive evaluation of LLMs' understanding of the domain knowledge. The CROP dataset is curated through a task-oriented and LLM-human integrated pipeline, comprising 210,038 single-turn and 1,871 multi-turn dialogues related to crop science scenarios. The CROP benchmark includes 5,045 multiple-choice questions covering three difficulty levels. Our experiments based on the CROP benchmark demonstrate notable enhancements in crop science-related tasks when LLMs are fine-tuned with the CROP dataset. To the best of our knowledge, the CROP dataset is the first-ever instruction tuning dataset in the crop science domain. We anticipate that CROP will accelerate the adoption of LLMs in the domain of crop science, ultimately contributing to global food production.

NeurIPS Conference 2024 Conference Paper

Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective

  • Yongxin Zhu
  • Bocheng Li
  • Hang Zhang
  • Xin Li
  • Linli Xu
  • Lidong Bing

Latent-based image generative models, such as Latent Diffusion Models (LDMs) and Mask Image Models (MIMs), have achieved notable success in image generation tasks. These models typically leverage reconstructive autoencoders like VQGAN or VAE to encode pixels into a more compact latent space and learn the data distribution in the latent space instead of directly from pixels. However, this practice raises a pertinent question: Is it truly the optimal choice? In response, we begin with an intriguing observation: despite sharing the same latent space, autoregressive models significantly lag behind LDMs and MIMs in image generation. This finding contrasts sharply with the field of NLP, where the autoregressive model GPT has established a commanding presence. To address this discrepancy, we introduce a unified perspective on the relationship between latent space and generative models, emphasizing the stability of latent space in image generative modeling. Furthermore, we propose a simple but effective discrete image tokenizer to stabilize the latent space for image generative modeling by applying K-Means on the latent features of self-supervised learning models. Experimental results show that image autoregressive modeling with our tokenizer (DiGIT) benefits both image understanding and image generation with the next token prediction principle, which is inherently straightforward for GPT models but challenging for other generative models. Remarkably, for the first time, a GPT-style autoregressive model for images outperforms LDMs, which also exhibits substantial improvement akin to GPT when scaling up model size. Our findings underscore the potential of an optimized latent space and the integration of discrete tokenization in advancing the capabilities of image generative models. The code is available at https://github.com/DAMO-NLP-SG/DiGIT.
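The tokenizer recipe in the abstract, K-Means over latent features with the centroids serving as a discrete codebook, can be sketched as follows. In the paper the features would come from a frozen self-supervised encoder; here any (N, D) array stands in, and the farthest-point initialization is an implementation convenience, not the paper's choice.

```python
import numpy as np

def fit_codebook(features: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Plain K-Means (Lloyd's algorithm) on (N, D) latent features; the
    resulting (k, D) centroids act as the discrete codebook."""
    # Farthest-point initialization: deterministic and well spread out.
    centroids = features[:1].copy()
    for _ in range(k - 1):
        d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).min(1)
        centroids = np.vstack([centroids, features[d.argmax()]])
    for _ in range(iters):
        # Assign each feature to its nearest centroid, then update centroids.
        d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            if (assign == j).any():
                centroids[j] = features[assign == j].mean(0)
    return centroids

def tokenize(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each feature vector to the index of its nearest codebook entry."""
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(1)
```

Image patches encoded by the self-supervised model would then be replaced by these integer tokens before autoregressive next-token training.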

NeurIPS Conference 2023 Conference Paper

Bounded rationality in structured density estimation

  • Tianyuan Teng
  • Kevin Li
  • Hang Zhang

Learning to accurately represent environmental uncertainty is crucial for adaptive and optimal behaviors in various cognitive tasks. However, it remains unclear how the human brain, constrained by finite cognitive resources, constructs an internal model from an infinite space of probability distributions. In this study, we explore how these learned distributions deviate from the ground truth, resulting in observable inconsistency in a novel structured density estimation task. During each trial, human participants were asked to form and report the latent probability distribution functions underlying sequentially presented independent observations. As the number of observations increased, the reported predictive density became closer to the ground truth. Nevertheless, we observed an intriguing inconsistency in human structure estimation, specifically a large error in the number of reported clusters. Such inconsistency is invariant to the scale of the distribution and persists across stimulus modalities. We modeled uncertainty learning as approximate Bayesian inference in a nonparametric mixture prior of distributions. Human reports were best explained under resource rationality embodied in a decaying tendency towards model expansion. Our study offers insights into human cognitive processes under uncertainty and lays the groundwork for further exploration of resource-rational representations in the brain under more complex tasks.

IJCAI Conference 2023 Conference Paper

Spatially Covariant Lesion Segmentation

  • Hang Zhang
  • Rongguang Wang
  • Jinwei Zhang
  • Dongdong Liu
  • Chao Li
  • Jiahao Li

Compared to natural images, medical images usually show stronger visual patterns, which adds flexibility and elasticity to resource-limited clinical applications by allowing proper priors to be injected into neural networks. In this paper, we propose the spatially covariant pixel-aligned classifier (SCP) to improve computational efficiency while maintaining or increasing accuracy for lesion segmentation. SCP relaxes the spatial invariance constraint imposed by convolutional operations and optimizes an underlying implicit function that maps image coordinates to network weights; the parameters of this function are obtained along with the backbone network training and later used for generating network weights to capture spatially covariant contextual information. We demonstrate the effectiveness and efficiency of the proposed SCP using two lesion segmentation tasks from different imaging modalities: white matter hyperintensity segmentation in magnetic resonance imaging and liver tumor segmentation in contrast-enhanced abdominal computerized tomography. The network using SCP achieves a 23.8%, 64.9%, and 74.7% reduction in GPU memory usage, FLOPs, and network size, respectively, with similar or better accuracy for lesion segmentation.
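The central idea, a learned function mapping pixel coordinates to per-pixel classifier weights, can be sketched as a toy forward pass. All names and sizes here are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def coord_grid(h: int, w: int) -> np.ndarray:
    """Normalized (x, y) coordinates in [-1, 1] for every pixel, shape (h*w, 2)."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    return np.stack([xs.ravel(), ys.ravel()], 1)

class SpatiallyCovariantHead:
    """Toy sketch of the idea: a tiny MLP maps each pixel's coordinates to
    the weights of that pixel's own linear classifier, which is then applied
    to the backbone feature at that pixel. Weights are random here; in
    training they would be optimized with the backbone."""
    def __init__(self, feat_dim: int, n_classes: int, hidden: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0, 0.5, (2, hidden))
        self.w2 = rng.normal(0, 0.5, (hidden, feat_dim * n_classes))
        self.feat_dim, self.n_classes = feat_dim, n_classes

    def __call__(self, feats: np.ndarray) -> np.ndarray:  # feats: (H, W, feat_dim)
        h, w, _ = feats.shape
        hid = np.tanh(coord_grid(h, w) @ self.w1)
        pix_w = (hid @ self.w2).reshape(h * w, self.feat_dim, self.n_classes)
        # Per-pixel classifier: logits[p] = feats[p] @ pix_w[p]
        logits = np.einsum("pd,pdc->pc", feats.reshape(h * w, -1), pix_w)
        return logits.reshape(h, w, self.n_classes)
```

The efficiency argument in the abstract comes from replacing heavier decoding layers with such a lightweight coordinate-conditioned head; this sketch only shows the shape flow.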

JBHI Journal 2022 Journal Article

Symmetry-Aware Deep Learning for Cerebral Ventricle Segmentation With Intra-Ventricular Hemorrhage

  • Yineng Hua
  • Zengqiang Yan
  • Zhuo Kuang
  • Hang Zhang
  • Xianbo Deng
  • Li Yu

Cerebral ventricles are one of the prominent structures in the brain, segmenting which can provide rich information for brain-related disease diagnosis. Unfortunately, cerebral ventricle segmentation in complex clinical cases, such as in the coexistence with other lesions/hemorrhages, remains unexplored. In this paper, we, for the first time, focus on cerebral ventricle segmentation with the presence of intra-ventricular hemorrhages (IVH). To overcome the occlusions formed by IVH, we propose a symmetry-aware deep learning approach inspired by contrastive self-supervised learning. Specifically, for each slice, we jointly employ the raw slice and the horizontally flipped slice as inputs and penalize the consistency loss between the corresponding segmentation maps in addition to their segmentation losses. In this way, the symmetry of cerebral ventricles is enforced to eliminate the occlusions brought by IVH. Extensive experimental results show that the proposed symmetry-aware deep learning approach achieves consistent performance improvements for ventricle segmentation in both normal (i.e., without IVH) and challenging cases (i.e., with IVH). Through evaluation of multiple backbone networks, we demonstrate the architecture-independence of the proposed approach for performance improvements. Moreover, we re-design an end-to-end version of symmetry-aware deep learning, making it more extendable to other approaches for brain-related analysis.
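The consistency penalty described in the abstract, between the segmentation of a slice and the flipped-back segmentation of its horizontal mirror, can be sketched framework-free. The mean-squared form is an assumption; the paper may use a different distance:

```python
import numpy as np

def symmetry_consistency_loss(model, image: np.ndarray) -> float:
    """Run the model on a 2D slice and on its horizontal mirror, flip the
    second prediction back, and penalize their disagreement (MSE here).
    `model` is any callable mapping an (H, W) array to an (H, W) map."""
    pred = model(image)
    pred_flip = model(image[:, ::-1])[:, ::-1]  # flip input, un-flip output
    return float(((pred - pred_flip) ** 2).mean())
```

In training this term would be added to the ordinary segmentation losses of both inputs; a perfectly flip-equivariant model incurs zero penalty.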

AAAI Conference 2021 Conference Paper

Efficient Folded Attention for Medical Image Reconstruction and Segmentation

  • Hang Zhang
  • Jinwei Zhang
  • Rongguang Wang
  • Qihao Zhang
  • Pascal Spincemaille
  • Thanh D. Nguyen
  • Yi Wang

Recently, 3D medical image reconstruction (MIR) and segmentation (MIS) based on deep neural networks have been developed with promising results, and attention mechanism has been further designed for performance enhancement. However, the large size of 3D volume images poses a great computational challenge to traditional attention methods. In this paper, we propose a folded attention (FA) approach to improve the computational efficiency of traditional attention methods on 3D medical images. The main idea is that we apply tensor folding and unfolding operations to construct four small sub-affinity matrices to approximate the original affinity matrix. Through four consecutive sub-attention modules of FA, each element in the feature tensor can aggregate spatial-channel information from all other elements. Compared to traditional attention methods, with the moderate improvement of accuracy, FA can substantially reduce the computational complexity and GPU memory consumption. We demonstrate the superiority of our method on two challenging tasks for 3D MIR and MIS, which are quantitative susceptibility mapping and multiple sclerosis lesion segmentation.

JMLR Journal 2020 Journal Article

GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing

  • Jian Guo
  • He He
  • Tong He
  • Leonard Lausen
  • Mu Li
  • Haibin Lin
  • Xingjian Shi
  • Chenguang Wang

We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating). These toolkits provide state-of-the-art pre-trained models, training scripts, and training logs, to facilitate rapid prototyping and promote reproducible research. We also provide modular APIs with flexible building blocks to enable efficient customization. Leveraging the MXNet ecosystem, the deep learning models in GluonCV and GluonNLP can be deployed onto a variety of platforms with different programming languages. The Apache 2.0 license has been adopted by GluonCV and GluonNLP to allow for software distribution, modification, and usage.

RLDM Conference 2013 Conference Abstract

Learned Myopic or Far-Sighted: Experience Shapes Human Temporal Horizon in Sequential

  • Hang Zhang
  • Hyoseok Kim
  • Nathaniel Daw
  • Laurence Maloney

We investigated how well people make sequential decisions to achieve the long-term goal. In video-game-like settings, a spaceship flew across a row of three mountains of increasing heights. Before each mountain, subjects could elevate the spaceship by either a constant and small height (CS) or a variant but on average larger height (VL) to avoid crashing. The goal was to survive beyond the last mountain. The optimal choice before a specific mountain depended on the heights of all future mountains. We tested whether subjects could learn the optimal policy or base their choices only on a short horizon, i.e., on the immediate mountain. Methods: We constructed two combinations of mountain heights, A and B, which differed in how early a short horizon would be penalized. For A, a short horizon would yield the optimal choice before the first mountain and not increase crash rate until the last mountain. In contrast, for B, a short horizon would increase crash rate as early as the second mountain. Each subject completed 4 blocks of 60 trials, in the block order of ABAB or BABA. Sixteen naïve subjects were evenly assigned to the two groups. Results: The two groups differed in their learning trajectories. (1) The ABAB group achieved a higher probability of survival in the last (.63) than in the first two blocks (.50), but the BABA did not (both .54). (2) The ABAB had a shorter horizon than the BABA: When VL was the optimal choice and involved long-term considerations, ABAB chose VL less than BABA did (53% vs. 67%). (3) The BABA appeared to be far-sighted: When CS was the optimal choice and reduced crash at the immediate mountain, BABA chose CS less than ABAB did (60% vs. 81%). Conclusion: Human individuals' temporal horizon in a sequential-decision task depends on their initial experience with the task. People may learn to be myopic or far-sighted.