Arrow Research search

Author name cluster

Dan Zhang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

47 papers
2 author rows

Possible papers

47

AAAI Conference 2026 Conference Paper

SOAR: Semi-Supervised Open-Vocabulary Aerial Object Detection via Dual-Aware Enhanced Prior Denoising

  • Xu Liu
  • Yihong Huang
  • Dan Zhang
  • Lingling Li
  • Long Sun
  • Licheng Jiao

Open-Vocabulary Object Detection (OVOD) shows promise in remote sensing (RS), but the unique characteristics of RS imagery introduce challenges such as the predominance of background regions, sparse labels, limited semantic information, and difficulties in semi-supervised training. To tackle these challenges, we propose the Semi-Supervised Open-Vocabulary Aerial Object Detection with Dual-Perception Prior Denoising (SOAR), which explicitly models the background embeddings of each scene to indirectly construct foreground priors, thereby capitalizing on the abundant background information present in RS imagery. We further introduce a query enhancement module that integrates language and foreground prior information to enhance the effectiveness of query selection and feature augmentation. During the decoding stage of semi-supervised training, we perform denoising and reconstruction of the foreground priors to generate pseudo-labels that support the training process. Additionally, we address the sparsity of label information through expansion and aggregation techniques, further improving model performance. Experimental evaluations reveal that, in the open-vocabulary object detection task on the DIOR dataset, our method achieves a mean Average Precision (mAP) of 68.5% and Harmonic Mean (HM) of 55.9%, outperforming the previous state-of-the-art model's mAP of 61.6% and HM of 53.6%. Our approach offers a novel solution to the open-vocabulary challenge in aerial object detection.

AAAI Conference 2026 Conference Paper

VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation

  • Jiazheng Xu
  • Yu Huang
  • Jiale Cheng
  • Yuanming Yang
  • Jiajun Xu
  • Yuan Wang
  • Wenbo Duan
  • Shen Yang

Visual generative models have achieved remarkable progress in synthesizing photorealistic images and videos, yet aligning their outputs with human preferences across critical dimensions remains a persistent challenge. Though reinforcement learning from human feedback offers promise for preference alignment, existing reward models for visual generation face limitations, including black-box scoring without interpretability and potentially resultant unexpected biases. We present VisionReward, a general framework for learning human visual preferences in both image and video generation. Specifically, we employ a hierarchical visual assessment framework to capture fine-grained human preferences, and leverage linear weighting to enable interpretable preference learning. Furthermore, we propose a multi-dimensional consistent strategy when using VisionReward as a reward model during preference optimization for visual generation. Experiments show that VisionReward can significantly outperform existing image and video reward models on both machine metrics and human evaluation. Notably, VisionReward surpasses VideoScore by 17.2% in preference prediction accuracy, and text-to-video models with VisionReward achieve a 31.6% higher pairwise win rate compared to the same models using VideoScore.

JBHI Journal 2025 Journal Article

$\text{MR}^{2}$-Net: Retinal OCTA Image Stitching via Multi-Scale Representation Learning and Dynamic Location Guidance

  • Haiting Mao
  • Yuhui Ma
  • Dan Zhang
  • Yanda Meng
  • Shaodong Ma
  • Yuchuan Qiao
  • Huazhu Fu
  • Caifeng Shan

Optical coherence tomography angiography (OCTA) plays a crucial role in quantifying and analyzing retinal vascular diseases. However, the limited field of view (FOV) inherent in most commercial OCTA imaging systems poses a significant challenge for clinicians, restricting the ability to analyze larger retinal regions at high resolution. Automatic stitching of OCTA scans in adjacent regions may provide a promising solution to extend the region of interest. However, commonly-used stitching algorithms face difficulties in achieving effective alignment due to noise, artifacts and dense vasculature present in OCTA images. To address these challenges, we propose a novel retinal OCTA image stitching network, named $\text{MR}^{2}$-Net, which integrates multi-scale representation learning and dynamic location guidance. In the first stage, an image registration network with a progressive multi-resolution feature fusion is proposed to derive deep semantic information effectively. Additionally, we introduce a dynamic guidance strategy to locate the foveal avascular zone (FAZ) and constrain registration errors in overlapping vascular regions. In the second stage, an image fusion network based on multiple mask constraints and adjacent image aggregation (AIA) strategies is developed to further eliminate the artifacts in the overlapping areas of stitched images, thereby achieving precise vessel alignment. To validate the effectiveness of our method, we conduct a series of experiments on two delicately constructed datasets, i.e., OPTOVUE-OCTA and SVision-OCTA. Experimental results demonstrate that our method outperforms other image stitching methods and effectively generates high-quality wide-field OCTA images, achieving structural similarity index (SSIM) scores of 0.8264 and 0.8014 on the two datasets, respectively.

ICML Conference 2025 Conference Paper

A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models

  • Mengyang Sun
  • Yihao Wang
  • Tao Feng 0014
  • Dan Zhang
  • Yifan Zhu 0001
  • Jie Tang 0001

In order to streamline the fine-tuning of foundation models, Low-Rank Adapters (LoRAs) have been substantially adopted across various fields, including instruction tuning and domain adaptation. The underlying concept of LoRA involves decomposing a full-rank matrix into the product of two lower-rank matrices, which reduces storage consumption and accelerates the training process. Furthermore, to address the limited expressive capacity of LoRA, the Mixture-of-Expert (MoE) has been introduced for incorporating multiple LoRA adapters. The integration of LoRA experts leads to a visible improvement across several downstream scenarios. However, the mixture of LoRAs (MoE-LoRA) still exhibits low robustness during tuning and inference. Inspired by the Riemannian Preconditioners which train LoRA as a sub-space projector, we propose a new training strategy for MoE-LoRA, to stabilize and boost its feature learning by gate-rescaled multi-space projections. We provide both a theoretical solution as well as an alternative engineering strategy. Experiments with SGD and AdamW optimizers demonstrate the effectiveness of our methodology. Source code is available at https://github.com/THUDM/MoELoRA_Riemannian.
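The low-rank decomposition the abstract builds on can be illustrated with a small NumPy sketch (the dimensions, rank, scaling, and zero-initialized up-projection are illustrative assumptions, not details taken from this paper):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 64, 4                            # hidden size and LoRA rank (toy values)
W = rng.normal(size=(d, d))             # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01      # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection, zero-initialized
alpha = 8.0                             # scaling hyperparameter

def lora_forward(x):
    # Frozen full-rank path plus low-rank update: h = Wx + (alpha / r) * B A x.
    # Only A and B (2*d*r parameters) are trained, not the d*d matrix W.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d)
# With B initialized to zero, the adapter starts as an exact no-op,
# so fine-tuning begins from the pretrained model's behavior.
assert np.allclose(lora_forward(x), W @ x)
```

For d = 64 and r = 4 this stores 2 * 64 * 4 = 512 trainable parameters instead of 4096, which is the storage saving the abstract refers to.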

NeurIPS Conference 2025 Conference Paper

Can Large Language Models Master Complex Card Games?

  • Wei Wang
  • Fuqing Bie
  • Junzhe Chen
  • Dan Zhang
  • Shiyu Huang
  • Evgeny Kharlamov
  • Jie Tang

Complex games have long been an important benchmark for testing the progress of artificial intelligence algorithms. AlphaGo, AlphaZero, and MuZero have defeated top human players in Go and Chess, garnering widespread societal attention towards artificial intelligence. Concurrently, large language models (LLMs) have exhibited remarkable capabilities across various tasks, raising the question of whether LLMs can achieve similar success in complex games. In this paper, we explore the potential of LLMs in mastering complex card games. We systematically assess the learning capabilities of LLMs across eight diverse card games, evaluating the impact of fine-tuning on high-quality gameplay data, and examining the models' ability to retain general capabilities while mastering these games. Our findings indicate that: (1) LLMs can approach the performance of strong game AIs through supervised fine-tuning on high-quality data, (2) LLMs can achieve a certain level of proficiency in multiple complex card games simultaneously, with performance augmentation for games with similar rules and conflicts for dissimilar ones, and (3) LLMs experience a decline in general capabilities when mastering complex games, but this decline can be mitigated by integrating a certain amount of general instruction data. The evaluation results demonstrate strong learning ability and versatility of LLMs. The code is available at https://github.com/THUDM/LLM4CardGame.

AAAI Conference 2025 Conference Paper

DivGCL: A Graph Contrastive Learning Model for Diverse Recommendation

  • Wenwen Gong
  • Yangliao Geng
  • Dan Zhang
  • Yifan Zhu
  • Xiaolong Xu
  • Haolong Xiang
  • Amin Beheshti
  • Xuyun Zhang

Graph Contrastive Learning (GCL), as a primary paradigm of graph self-supervised learning, spurs a fruitful line of research in tackling the data sparsity issue by maximizing the consistency of user/item embeddings between different augmented views with random perturbations. However, diversity, as a crucial metric for recommendation performance and user satisfaction, has received rather little attention. In fact, there exists a challenging dilemma in balancing accuracy and diversity. To address these issues, we propose DivGCL, a new graph contrastive learning model for diverse recommendation. Inspired by the excellence of the determinantal point process (DPP), DivGCL adopts a DPP likelihood-based loss function to achieve an ideal trade-off between diversity and accuracy, optimizing it jointly with the advanced Gaussian noise-augmented GCL objective. Extensive experiments on four popular datasets demonstrate that DivGCL surpasses existing approaches in balancing accuracy and diversity, with an improvement of 23.47% at T@20 (abbreviation for trade-off metric) on ML-1M.

EAAI Journal 2025 Journal Article

Enhanced underwater acoustic target recognition using parallel dual-branch network with attention mechanism

  • Jingpu Xu
  • Xiaowei Li
  • Dan Zhang
  • Yaoran Chen
  • Yan Peng
  • Wenhu Liu

Ship-radiated noise serves as a crucial source of underwater acoustic signals for vessel classification, but its identification is often hindered by environmental variability and internal noise interference. To address these challenges, we propose a dual-branch Residual Attention-Long Short-Term Memory (ResA-LSTM) network for underwater acoustic target recognition. The proposed model integrates a Residual Attention (ResA) branch to extract spatial features and a Bidirectional Long Short-Term Memory (Bi-LSTM) branch to capture long-term temporal dependencies from Mel spectrograms. The ResA module incorporates attention mechanisms and residual connections to enhance feature selection and improve robustness in noisy environments. Evaluations conducted on two public datasets, ShipsEar and DeepShip, demonstrate the effectiveness of our approach, achieving classification accuracies of 98.55% and 99.31%, respectively. Sensitivity analysis further confirms the model's ability to handle long-duration acoustic sequences, highlighting its potential for practical deployment in real-world underwater recognition tasks.

JBHI Journal 2025 Journal Article

Fine-Grained Hierarchical Progressive Modal-Aware Network for Brain Tumor Segmentation

  • Chenggang Lu
  • Jianwei Zhang
  • Dan Zhang
  • Lei Mou
  • Jinli Yuan
  • Kewen Xia
  • Zhitao Guo
  • Jiong Zhang

Brain tumors are highly lethal and debilitating pathological changes that require timely diagnosis and treatment. Magnetic resonance imaging (MRI), a non-invasive diagnostic tool, provides complementary multi-modal information crucial for accurate tumor detection and delineation. However, existing methods struggle to effectively fuse multi-modal information from MRI sequences and often fail to perform modality-specific feature extraction, which hinders accurate tumor segmentation. Furthermore, the inherent challenges posed by the blurred boundaries and complex morphological characteristics of tumor structures present additional substantial obstacles to achieving precise segmentation. To address these issues, we propose FiHam, a fine-grained hierarchical progressive modal-aware network that introduces a novel multi-modal fusion strategy and an advanced feature extraction mechanism. Specifically, FiHam employs a progressive fusion strategy that extracts modality-specific features at lower levels and integrates multi-modal features at higher levels to effectively leverage complementary information from tumor images. Additionally, we design a gated cross-attention modal-fusion module that adaptively selects and integrates dual-modal features using cross-attention mechanisms to enhance modality fusion. To further refine segmentation accuracy, we incorporate a tiny U-Net into the encoder to capture boundary features and complex tumor morphology. Extensive experiments on three large-scale, multi-modal brain tumor datasets demonstrate that FiHam achieves state-of-the-art performance, delivering significant improvements in segmentation accuracy and generalizability across diverse MRI modalities.

UAI Conference 2025 Conference Paper

Generative Uncertainty in Diffusion Models

  • Metod Jazbec
  • Eliot Wong-Toi
  • Guoxuan Xia
  • Dan Zhang
  • Eric T. Nalisnick
  • Stephan Mandt

Diffusion models have recently driven significant breakthroughs in generative modeling. While state-of-the-art models produce high-quality samples on average, individual samples can still be low quality. Detecting such samples without human inspection remains a challenging task. To address this, we propose a Bayesian framework for estimating generative uncertainty of synthetic samples. We outline how to make Bayesian inference practical for large, modern generative models and introduce a new semantic likelihood (evaluated in the latent space of a feature extractor) to address the challenges posed by high-dimensional sample spaces. Through our experiments, we demonstrate that the proposed generative uncertainty effectively identifies poor-quality samples and significantly outperforms existing uncertainty-based methods. Notably, our Bayesian framework can be applied post-hoc to any pretrained diffusion or flow matching model (via the Laplace approximation), and we propose simple yet effective techniques to minimize its computational overhead during sampling.

EAAI Journal 2025 Journal Article

Global–local adaptive resampling strategy for enhancing the performance of Physics-informed neural networks

  • Lei Gao
  • Dan Zhang
  • Yaoran Chen
  • Xiaowei Li
  • Chunxin Li
  • Yan Peng

In recent years, the rapid development of artificial intelligence (AI) has brought innovative solutions to the field of engineering. Physics-informed neural networks (PINNs) have provided a new paradigm for solving partial differential equations (PDEs). However, PINNs are extremely sensitive to the number and distribution of collocation points; insufficient or unevenly distributed collocation points can lead to solution failure. To address this issue, this study proposes a global–local adaptive resampling strategy (GLAR) that combines Monte Carlo integration with PINNs. During the training process, the collocation points are resampled in both global and local regions according to the Monte Carlo integral values to improve the accuracy of the model. Numerical experiments show that GLAR-PINNs are comparable to existing resampling methods when dealing with linear PDEs (such as the Diffusion and Wave equations). When solving nonlinear PDEs (such as the Burgers, Korteweg–de Vries, and Allen–Cahn equations), the accuracy is improved by 14.65, 38.04, and 7.14 times, respectively. In addition, we applied this strategy to reconstruct the flow field around a two-dimensional triangular cylinder, and the relative error of the reconstructed velocity field was less than 0.001. This significantly promotes the application and development of artificial intelligence in the engineering field.
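The general idea of concentrating collocation points where the PDE residual is large can be sketched generically; the following is a minimal residual-proportional resampling loop with a made-up 1D residual function, not the paper's GLAR algorithm or its Monte Carlo integral criterion:

```python
import numpy as np

rng = np.random.default_rng(1)

def residual(x):
    # Stand-in for the PDE residual magnitude |N[u](x)|; in a real PINN this
    # would come from automatic differentiation of the network output.
    return np.abs(np.sin(8 * x))

# Start from uniform collocation points on [0, 1].
pts = rng.uniform(0.0, 1.0, size=1000)

for _ in range(5):
    r = residual(pts)
    p = r / r.sum()                          # residual-proportional density
    # "Local" step: redraw half the points near current high-residual points.
    idx = rng.choice(len(pts), size=500, p=p, replace=True)
    local_new = pts[idx] + rng.normal(scale=0.01, size=500)
    # "Global" step: refresh the other half uniformly to keep domain coverage.
    fresh = rng.uniform(0.0, 1.0, size=500)
    pts = np.clip(np.concatenate([local_new, fresh]), 0.0, 1.0)

# The resampled set should carry a higher mean residual than a uniform draw,
# i.e. training effort is concentrated where the PDE is solved worst.
baseline = rng.uniform(0.0, 1.0, size=2000)
assert residual(pts).mean() > residual(baseline).mean()
```

In an actual PINN training loop, this resampling would be interleaved with gradient steps on the physics loss evaluated at `pts`.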

ICLR Conference 2025 Conference Paper

SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models

  • Jiale Cheng
  • Xiao Liu 0036
  • Cunxiang Wang
  • Xiaotao Gu
  • Yida Lu
  • Dan Zhang
  • Yuxiao Dong
  • Jie Tang 0001

Instruction-following is a fundamental capability of language models, requiring the model to recognize even the most subtle requirements in the instructions and accurately reflect them in its output. Such an ability is well-suited for and often optimized by preference learning. However, existing methods often directly sample multiple independent responses from the model when creating preference pairs. Such practice can introduce content variations irrelevant to whether the instruction is precisely followed (e.g., different expressions about the same semantic), interfering with the goal of teaching models to recognize the key differences that lead to improved instruction following. In light of this, we introduce SPaR, a self-play framework integrating tree-search self-refinement to yield valid and comparable preference pairs free from distractions. By playing against itself, an LLM employs a tree-search strategy to refine its previous responses with respect to the instruction while minimizing unnecessary variations. Our experiments show that a LLaMA3-8B model, trained over three iterations guided by SPaR, surpasses GPT-4-Turbo on the IFEval benchmark without losing general capabilities. Furthermore, SPaR demonstrates promising scalability, greatly enhancing models like GLM-4-9B and LLaMA3-70B. We also identify how inference scaling in tree search would impact model performance. Our code and data are publicly available at https://github.com/thu-coai/SPaR.

TMLR Journal 2025 Journal Article

Temporal Test-Time Adaptation with State-Space Models

  • Mona Schirmer
  • Dan Zhang
  • Eric Nalisnick

Distribution shifts between training and test data are inevitable over the lifecycle of a deployed model, leading to performance decay. Adapting a model on test samples can help mitigate this drop in performance. However, most test-time adaptation methods have focused on synthetic corruption shifts, leaving a variety of distribution shifts underexplored. In this paper, we focus on distribution shifts that evolve gradually over time, which are common in the wild but challenging for existing methods, as we show. To address this, we propose STAD, a Bayesian filtering method that adapts a deployed model to temporal distribution shifts by learning the time-varying dynamics in the last set of hidden features. Without requiring labels, our model infers time-evolving class prototypes that act as a dynamic classification head. Through experiments on real-world temporal distribution shifts, we show that our method excels in handling small batch sizes and label shift.

ICML Conference 2025 Conference Paper

ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think

  • Tao Feng 0014
  • Wei Li
  • Didi Zhu
  • Hangjie Yuan
  • Wendi Zheng
  • Dan Zhang
  • Jie Tang 0001

Backpropagation provides a generalized configuration for overcoming catastrophic forgetting. Optimizers such as SGD and Adam are commonly used for weight updates in continual learning and continual pre-training. However, access to gradient information is not always feasible in practice due to black-box APIs, hardware constraints, or non-differentiable systems, a challenge we refer to as the gradient bans. To bridge this gap, we introduce ZeroFlow, the first benchmark designed to evaluate gradient-free optimization algorithms for overcoming forgetting. ZeroFlow examines a suite of forward pass-based methods across various algorithms, forgetting scenarios, and datasets. Our results show that forward passes alone can be sufficient to mitigate forgetting. We uncover novel optimization principles that highlight the potential of forward pass-based methods in mitigating forgetting, managing task conflicts, and reducing memory demands. Additionally, we propose new enhancements that further improve forgetting resistance using only forward passes. This work provides essential tools and insights to advance the development of forward-pass-based methods for continual learning.

YNIMG Journal 2024 Journal Article

Contrastive learning of shared spatiotemporal EEG representations across individuals for naturalistic neuroscience

  • Xinke Shen
  • Lingyi Tao
  • Xuyang Chen
  • Sen Song
  • Quanying Liu
  • Dan Zhang

Neural representations induced by naturalistic stimuli offer insights into how humans respond to stimuli in daily life. Understanding neural mechanisms underlying naturalistic stimuli processing hinges on the precise identification and extraction of the shared neural patterns that are consistently present across individuals. Targeting the Electroencephalogram (EEG) technique, known for its rich spatial and temporal information, this study presents a framework for Contrastive Learning of Shared SpatioTemporal EEG Representations across individuals (CL-SSTER). CL-SSTER utilizes contrastive learning to maximize the similarity of EEG representations across individuals for identical stimuli, contrasting with those for varied stimuli. The network employs spatial and temporal convolutions to simultaneously learn the spatial and temporal patterns inherent in EEG. The versatility of CL-SSTER was demonstrated on three EEG datasets, including a synthetic dataset, a natural speech comprehension EEG dataset, and an emotional video watching EEG dataset. CL-SSTER attained the highest inter-subject correlation (ISC) values compared to the state-of-the-art ISC methods. The latent representations generated by CL-SSTER exhibited reliable spatiotemporal EEG patterns, which can be explained by properties of the naturalistic stimuli. CL-SSTER serves as an interpretable and scalable framework for the identification of inter-subject shared neural representations in naturalistic neuroscience.

ICML Conference 2024 Conference Paper

Directly Denoising Diffusion Models

  • Dan Zhang
  • Jingjing Wang
  • Feng Luo

In this paper, we present Directly Denoising Diffusion Models (DDDMs): a simple and generic approach for generating realistic images with few-step sampling, while multistep sampling is still preserved for better performance. DDDMs require neither delicately designed samplers nor distillation from pre-trained models. DDDMs train the diffusion model conditioned on an estimated target that was generated from previous training iterations of its own. To generate images, samples generated from the previous timestep are also taken into consideration, guiding the generation process iteratively. We further propose Pseudo-LPIPS, a novel metric loss that is more robust to various hyperparameter values. Despite its simplicity, the proposed approach can achieve strong performance on benchmark datasets. Our model achieves FID scores of 2.57 and 2.33 on CIFAR-10 in one-step and two-step sampling respectively, surpassing those obtained from GANs and distillation-based models. By extending the sampling to 1000 steps, we further reduce the FID score to 1.79, aligning with state-of-the-art methods in the literature. For ImageNet 64x64, our approach stands as a competitive contender against leading models.

UAI Conference 2024 Conference Paper

Early-Exit Neural Networks with Nested Prediction Sets

  • Metod Jazbec
  • Patrick Forré
  • Stephan Mandt
  • Dan Zhang
  • Eric T. Nalisnick

Early-exit neural networks (EENNs) facilitate adaptive inference by producing predictions at multiple stages of the forward pass. In safety-critical applications, these predictions are only meaningful when complemented with reliable uncertainty estimates. Yet, due to their sequential structure, an EENN’s uncertainty estimates should also be *consistent*: labels that are deemed improbable at one exit should not reappear within the confidence interval / set of later exits. We show that standard uncertainty quantification techniques, like Bayesian methods or conformal prediction, can lead to inconsistency across exits. We address this problem by applying anytime-valid confidence sequences (AVCSs) to the exits of EENNs. By design, AVCSs maintain consistency across exits. We examine the theoretical and practical challenges of applying AVCSs to EENNs and empirically validate our approach on both regression and classification tasks.
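The consistency requirement described above can be illustrated with a toy running-intersection construction (a generic sketch of nestedness, not the anytime-valid confidence sequence machinery the paper actually applies; the probabilities and threshold below are invented):

```python
import numpy as np

def prediction_set(probs, threshold=0.1):
    # Labels whose predicted probability exceeds the threshold at this exit.
    return {i for i, p in enumerate(probs) if p >= threshold}

# Softmax outputs of three early exits for one input (toy numbers).
exits = [
    np.array([0.40, 0.30, 0.20, 0.10]),
    np.array([0.55, 0.05, 0.30, 0.10]),   # label 1 dips below the threshold
    np.array([0.60, 0.25, 0.10, 0.05]),   # ...then rises above it again
]

raw_sets = [prediction_set(p) for p in exits]
# Raw per-exit sets need not be nested: label 1 is excluded at exit 2
# but reappears at exit 3 -- exactly the inconsistency the paper targets.
assert 1 not in raw_sets[1] and 1 in raw_sets[2]

# Enforce consistency with a running intersection: once a label has been
# ruled out at some exit, it can never re-enter at a later exit.
consistent, current = [], set(range(4))
for s in raw_sets:
    current &= s
    consistent.append(set(current))

assert all(consistent[i] >= consistent[i + 1] for i in range(len(consistent) - 1))
```

Naively intersecting arbitrary per-exit sets can destroy coverage guarantees; AVCSs are constructed so that validity survives exactly this kind of sequential intersection, which is what motivates their use here.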

IJCAI Conference 2024 Conference Paper

FactCHD: Benchmarking Fact-Conflicting Hallucination Detection

  • Xiang Chen
  • Duanzheng Song
  • Honghao Gui
  • Chenxi Wang
  • Ningyu Zhang
  • Yong Jiang
  • Fei Huang
  • Chengfei Lyu

Despite their impressive generative capabilities, LLMs are hindered by fact-conflicting hallucinations in real-world applications. The accurate identification of hallucinations in texts generated by LLMs, especially in complex inferential scenarios, is a relatively unexplored area. To address this gap, we present FactCHD, a dedicated benchmark designed for the detection of fact-conflicting hallucinations from LLMs. FactCHD features a diverse dataset that spans various factuality patterns, including vanilla, multi-hop, comparison, and set operation. A distinctive element of FactCHD is its integration of fact-based evidence chains, significantly enhancing the depth of evaluating the detectors' explanations. Experiments on different LLMs expose the shortcomings of current approaches in detecting factual errors accurately. Furthermore, we introduce TRUTH-TRIANGULATOR which synthesizes reflective considerations by tool-enhanced ChatGPT and LoRA-tuning based on Llama2, aiming to yield more credible detection through the amalgamation of predictive results and evidence.

NeurIPS Conference 2024 Conference Paper

Fast yet Safe: Early-Exiting with Risk Control

  • Metod Jazbec
  • Alexander Timans
  • Tin H. Veljković
  • Kaspar Sakmann
  • Dan Zhang
  • Christian A. Naesseth
  • Eric Nalisnick

Scaling machine learning models significantly improves their performance. However, such gains come at the cost of inference being slow and resource-intensive. Early-exit neural networks (EENNs) offer a promising solution: they accelerate inference by allowing intermediate layers to exit and produce a prediction early. Yet a fundamental issue with EENNs is how to determine when to exit without severely degrading performance. In other words, when is it 'safe' for an EENN to go 'fast'? To address this issue, we investigate how to adapt frameworks of risk control to EENNs. Risk control offers a distribution-free, post-hoc solution that tunes the EENN's exiting mechanism so that exits only occur when the output is of sufficient quality. We empirically validate our insights on a range of vision and language tasks, demonstrating that risk control can produce substantial computational savings, all the while preserving user-specified performance goals.

ECAI Conference 2024 Conference Paper

GLIMMER: Incorporating Graph and Lexical Features in Unsupervised Multi-Document Summarization

  • Ran Liu 0011
  • Ming Liu 0003
  • Min Yu 0001
  • Jianguo Jiang
  • Gang Li 0009
  • Dan Zhang
  • Jingyuan Li 0002
  • Xiang Meng

Pre-trained language models are increasingly being used in multi-document summarization tasks. However, these models need large-scale corpora for pre-training and are domain-dependent. Other non-neural unsupervised summarization approaches mostly rely on key sentence extraction, which can lead to information loss. To address these challenges, we propose a lightweight yet effective unsupervised approach called GLIMMER: a Graph and LexIcal features based unsupervised Multi-docuMEnt summaRization approach. It first constructs a sentence graph from the source documents, then automatically identifies semantic clusters by mining low-level features from raw texts, thereby improving intra-cluster correlation and the fluency of generated sentences. Finally, it summarizes clusters into natural sentences. Experiments conducted on Multi-News, Multi-XScience and DUC-2004 demonstrate that our approach outperforms existing unsupervised approaches. Furthermore, it surpasses state-of-the-art pre-trained multi-document summarization models (e.g., PEGASUS and PRIMERA) under zero-shot settings in terms of ROUGE scores. Additionally, human evaluations indicate that summaries generated by GLIMMER achieve high readability and informativeness scores. Our code is available at https://github.com/Oswald1997/GLIMMER.

YNIMG Journal 2024 Journal Article

Harmonizing three-dimensional MRI using pseudo-warping field guided GAN

  • Jiaying Lin
  • Zhuoshuo Li
  • Youbing Zeng
  • Xiaobo Liu
  • Liang Li
  • Neda Jahanshad
  • Xinting Ge
  • Dan Zhang

In pursuit of cultivating automated models for magnetic resonance imaging (MRI) to aid in diagnostics, an escalating demand for extensive, multisite, and heterogeneous brain imaging datasets has emerged. This potentially introduces biased outcomes when directly applied for subsequent analysis. Researchers have endeavored to address this issue by pursuing the harmonization of MRIs. However, most existing image-based harmonization methods for MRI are tailored for 2D slices, which may introduce inter-slice variations when they are combined into a 3D volume. In this study, we aim to resolve inconsistencies between slices by introducing a pseudo-warping field. This field is created randomly and utilized to transform a slice into an artificially warped subsequent slice. The objective of this pseudo-warping field is to ensure that generators can consistently harmonize adjacent slices to another domain, without being affected by the varying content present in different slices. Furthermore, we construct unsupervised spatial and recycle loss to enhance the spatial accuracy and slice-wise consistency across the 3D images. The results demonstrate that our model effectively mitigates inter-slice variations and successfully preserves the anatomical details of the images during the harmonization process. Compared to generative harmonization models that employ 3D operators, our model exhibits greater computational efficiency and flexibility.

IROS Conference 2024 Conference Paper

LiDAR-based HD Map Localization using Semantic Generalized ICP with Road Marking Detection

  • Yansong Gong
  • Xinglian Zhang
  • Jingyi Feng
  • Xiao He
  • Dan Zhang

In GPS-denied scenarios, a robust environmental perception and localization system becomes crucial for autonomous driving. In this paper, a LiDAR-based online localization system is developed, incorporating road marking detection and registration on a high-definition (HD) map. Within our system, a road marking detection approach is proposed with real-time performance, in which an adaptive segmentation technique is first introduced to isolate high-reflectance points correlated with road markings, enhancing real-time efficiency. Then, a spatio-temporal probabilistic local map is formed by aggregating historical LiDAR scans, providing a dense point cloud. Finally, a LiDAR bird's-eye view (LiBEV) image is generated, and an instance segmentation network is applied to accurately label the road markings. For road marking registration, a semantic generalized iterative closest point (SG-ICP) algorithm is designed. Linear road markings are modeled as 1-manifolds embedded in 2D space, mitigating the influence of constraints along the linear direction, addressing the under-constrained problem and achieving lower localization errors on HD maps than ICP. Extensive experiments are conducted in real-world scenarios, demonstrating the effectiveness and robustness of our system.

NeurIPS Conference 2024 Conference Paper

Renovating Names in Open-Vocabulary Segmentation Benchmarks

  • Haiwen Huang
  • Songyou Peng
  • Dan Zhang
  • Andreas Geiger

Names are essential to both human cognition and vision-language models. Open-vocabulary models utilize class names as text prompts to generalize to categories unseen during training. However, the precision of these names is often overlooked in existing datasets. In this paper, we address this underexplored problem by presenting a framework for "renovating" names in open-vocabulary segmentation benchmarks (RENOVATE). Our framework features a renaming model that enhances the quality of names for each visual segment. Through experiments, we demonstrate that our renovated names help train stronger open-vocabulary models with up to 15% relative improvement and significantly enhance training efficiency with improved data quality. We also show that our renovated names improve evaluation by better measuring misclassification and enabling fine-grained model analysis. We provide our code and relabelings for several popular segmentation datasets to the research community on our project page: https://andrehuang.github.io/renovate.

NeurIPS Conference 2024 Conference Paper

ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search

  • Dan Zhang
  • Sining Zhoubian
  • Ziniu Hu
  • Yisong Yue
  • Yuxiao Dong
  • Jie Tang

Recent methodologies in LLM self-training mostly rely on the LLM generating responses and filtering those with correct output answers as training data. This approach often yields a low-quality fine-tuning training set (e.g., incorrect plans or intermediate reasoning). In this paper, we develop a reinforced self-training approach, called ReST-MCTS*, based on integrating process reward guidance with tree search MCTS* for collecting higher-quality reasoning traces as well as per-step values to train policy and reward models. ReST-MCTS* circumvents the per-step manual annotation typically used to train process rewards by tree-search-based reinforcement learning: given oracle final correct answers, ReST-MCTS* is able to infer the correct process rewards by estimating the probability that a step helps lead to the correct answer. These inferred rewards serve dual purposes: they act as value targets for further refining the process reward model and also facilitate the selection of high-quality traces for policy model self-training. We first show that the tree-search policy in ReST-MCTS* achieves higher accuracy compared with prior LLM reasoning baselines such as Best-of-N and Tree-of-Thought, within the same search budget. We then show that by using traces searched by this tree-search policy as training data, we can continuously enhance three language models for multiple iterations, and outperform other self-training algorithms such as ReST$^\text{EM}$ and Self-Rewarding LM.
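
The reward-inference idea above can be reduced to a minimal Monte Carlo sketch (our simplification; `rollout` is a hypothetical completion sampler, not the paper's tree search): the value of a partial reasoning step is the estimated probability that continuations from it reach the oracle-verified answer.

```python
import random

def estimate_process_reward(state, rollout, oracle_answer, n=64, seed=0):
    """Monte Carlo stand-in for ReST-MCTS*-style process reward inference:
    sample n completions from the partial trace `state` and score the step
    by the fraction that end at the oracle-verified final answer."""
    rng = random.Random(seed)
    hits = sum(rollout(state, rng) == oracle_answer for _ in range(n))
    return hits / n
```

In the actual method these value estimates come from the MCTS* search itself and serve both as targets for the process reward model and as a filter for self-training traces.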

NeurIPS Conference 2024 Conference Paper

SciInstruct: a Self-Reflective Instruction Annotated Dataset for Training Scientific Language Models

  • Dan Zhang
  • Ziniu Hu
  • Sining Zhoubian
  • Zhengxiao Du
  • Kaiyu Yang
  • Zihan Wang
  • Yisong Yue
  • Yuxiao Dong

Large Language Models (LLMs) have shown promise in assisting scientific discovery. However, such applications are currently limited by LLMs' deficiencies in understanding intricate scientific concepts, deriving symbolic equations, and solving advanced numerical calculations. To bridge these gaps, we introduce SciInstruct, a suite of scientific instructions for training scientific language models capable of college-level scientific reasoning. Central to our approach is a novel self-reflective instruction annotation framework to address the data scarcity challenge in the science domain. This framework leverages existing LLMs to generate step-by-step reasoning for unlabelled scientific questions, followed by a process of self-reflective critic-and-revise. Applying this framework, we curated a diverse and high-quality dataset encompassing physics, chemistry, math, and formal proofs. We analyze the curated SciInstruct from multiple interesting perspectives (e. g. , domain, scale, source, question type, answer length, etc. ). To verify the effectiveness of SciInstruct, we fine-tuned different language models with SciInstruct, i. e. , ChatGLM3 (6B and 32B), Llama3-8B-Instruct, and Mistral-7B: MetaMath, enhancing their scientific and mathematical reasoning capabilities, without sacrificing the language understanding capabilities of the base model. We release all codes and SciInstruct at https: //github. com/THUDM/SciGLM.

NeurIPS Conference 2023 Conference Paper

Controlling Text-to-Image Diffusion by Orthogonal Finetuning

  • Zeju Qiu
  • Weiyang Liu
  • Haiwen Feng
  • Yuxuan Xue
  • Yao Feng
  • Zhen Liu
  • Dan Zhang
  • Adrian Weller

Large text-to-image diffusion models have impressive capabilities in generating photorealistic images from text prompts. How to effectively guide or control these powerful models to perform different downstream tasks becomes an important open problem. To tackle this challenge, we introduce a principled finetuning method -- Orthogonal Finetuning (OFT), for adapting text-to-image diffusion models to downstream tasks. Unlike existing methods, OFT can provably preserve hyperspherical energy which characterizes the pairwise neuron relationship on the unit hypersphere. We find that this property is crucial for preserving the semantic generation ability of text-to-image diffusion models. To improve finetuning stability, we further propose Constrained Orthogonal Finetuning (COFT) which imposes an additional radius constraint to the hypersphere. Specifically, we consider two important finetuning text-to-image tasks: subject-driven generation where the goal is to generate subject-specific images given a few images of a subject and a text prompt, and controllable generation where the goal is to enable the model to take in additional control signals. We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
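
One standard way to realize an orthogonal finetuning transform (a sketch of the general idea, not necessarily the paper's exact parameterization) is the Cayley transform of a skew-symmetric matrix. Because the resulting transform is orthogonal, the Gram matrix of the weight's columns, and hence the pairwise neuron angles underlying hyperspherical energy, is preserved:

```python
import numpy as np

def cayley(S):
    """Map any square matrix to an orthogonal one via the Cayley transform
    of its skew-symmetric part: R = (I + S)^-1 (I - S)."""
    S = (S - S.T) / 2.0                  # enforce skew-symmetry
    I = np.eye(S.shape[0])
    return np.linalg.solve(I + S, I - S)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))          # stand-in for a pretrained weight
R = cayley(0.1 * rng.standard_normal((4, 4)))
W_ft = R @ W                             # orthogonally "finetuned" weight
# R is orthogonal, so W_ft.T @ W_ft equals W.T @ W: pairwise inner
# products (and angles) between the weight's columns survive finetuning.
```

This is why OFT can adapt the model while provably keeping the pairwise neuron relationships, and COFT additionally bounds how far R may drift from the identity.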

ICLR Conference 2023 Conference Paper

GOOD: Exploring geometric cues for detecting objects in an open world

  • Haiwen Huang
  • Andreas Geiger 0001
  • Dan Zhang

We address the task of open-world class-agnostic object detection, i.e., detecting every object in an image by learning from a limited number of base object classes. State-of-the-art RGB-based models suffer from overfitting the training classes and often fail at detecting novel-looking objects. This is because RGB-based models primarily rely on appearance similarity to detect novel objects and are also prone to overfitting short-cut cues such as textures and discriminative parts. To address these shortcomings of RGB-based object detectors, we propose incorporating geometric cues such as depth and normals, predicted by general-purpose monocular estimators. Specifically, we use the geometric cues to train an object proposal network for pseudo-labeling unannotated novel objects in the training set. Our resulting Geometry-guided Open-world Object Detector (GOOD) significantly improves detection recall for novel object categories and already performs well with only a few training classes. Using a single ``person'' class for training on the COCO dataset, GOOD surpasses SOTA methods by 5.0% AR@100, a relative improvement of 24%. The code has been made available at https://github.com/autonomousvision/good.

YNIMG Journal 2023 Journal Article

Leading and following: Noise differently affects semantic and acoustic processing during naturalistic speech comprehension

  • Xinmiao Zhang
  • Jiawei Li
  • Zhuoran Li
  • Bo Hong
  • Tongxiang Diao
  • Xin Ma
  • Guido Nolte
  • Andreas K. Engel

Despite the distortion of speech signals caused by unavoidable noise in daily life, our ability to comprehend speech in noisy environments is relatively stable. However, the neural mechanisms underlying reliable speech-in-noise comprehension remain to be elucidated. The present study investigated the neural tracking of acoustic and semantic speech information during noisy naturalistic speech comprehension. Participants listened to narrative audio recordings mixed with spectrally matched stationary noise at three signal-to-noise ratio (SNR) levels (no noise, 3 dB, -3 dB), and 60-channel electroencephalography (EEG) signals were recorded. A temporal response function (TRF) method was employed to derive event-related-like responses to the continuous speech stream at both the acoustic and the semantic levels. Whereas the amplitude envelope of the naturalistic speech was taken as the acoustic feature, word entropy and word surprisal were extracted via natural language processing methods as two semantic features. Theta-band frontocentral TRF responses to the acoustic feature were observed at around 400 ms following speech fluctuation onset over all three SNR levels, and the response latencies were more delayed with increasing noise. Delta-band frontal TRF responses to the semantic feature of word entropy were observed at around 200 to 600 ms preceding speech fluctuation onset over all three SNR levels. The response latencies became more leading with increasing noise and decreasing speech comprehension and intelligibility. While the following responses to speech acoustics were consistent with previous studies, our study revealed the robustness of leading responses to speech semantics, which suggests a possible predictive mechanism at the semantic level for maintaining reliable speech comprehension in noisy environments.

NeurIPS Conference 2023 Conference Paper

Learning Sample Difficulty from Pre-trained Models for Reliable Prediction

  • Peng Cui
  • Dan Zhang
  • Zhijie Deng
  • Yinpeng Dong
  • Jun Zhu

Large-scale pre-trained models have achieved remarkable success in many applications, but how to leverage them to improve the prediction reliability of downstream models is undesirably under-explored. Moreover, modern neural networks have been found to be poorly calibrated and make overconfident predictions regardless of inherent sample difficulty and data uncertainty. To address this issue, we propose to utilize large-scale pre-trained models to guide downstream model training with sample difficulty-aware entropy regularization. Pre-trained models that have been exposed to large-scale datasets and do not overfit the downstream training classes enable us to measure each training sample's difficulty via feature-space Gaussian modeling and relative Mahalanobis distance computation. Importantly, by adaptively penalizing overconfident prediction based on the sample difficulty, we simultaneously improve accuracy and uncertainty calibration across challenging benchmarks (e.g., +0.55% ACC and −3.7% ECE on ImageNet1k using ResNet34), consistently surpassing competitive baselines for reliable prediction. The improved uncertainty estimate further improves selective classification (abstaining from erroneous predictions) and out-of-distribution detection.
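
The difficulty measure described above can be sketched in a few lines (our toy version; the paper computes it in the pre-trained model's feature space with fitted covariances): fit a class-conditional Gaussian and a class-agnostic background Gaussian, then score a sample by the difference of the two Mahalanobis distances.

```python
import numpy as np

def mahalanobis_sq(x, mean, cov_inv):
    """Squared Mahalanobis distance of x from a Gaussian (mean, cov_inv)."""
    d = np.asarray(x, dtype=float) - mean
    return float(d @ cov_inv @ d)

def relative_mahalanobis(x, cls_mean, cls_cov_inv, bg_mean, bg_cov_inv):
    """Relative Mahalanobis distance: class-conditional distance minus the
    class-agnostic (background) distance. Larger values mark samples the
    class Gaussian explains poorly relative to the overall feature
    distribution, i.e. harder samples that warrant a softer target."""
    return mahalanobis_sq(x, cls_mean, cls_cov_inv) - mahalanobis_sq(x, bg_mean, bg_cov_inv)
```

In the paper this per-sample score then modulates an entropy regularizer, penalizing overconfidence more strongly on hard samples.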

EAAI Journal 2023 Journal Article

Multi-scale split dual calibration network with periodic information for interpretable fault diagnosis of rotating machinery

  • Yongyi Chen
  • Dan Zhang
  • Hongjie Ni
  • Jun Cheng
  • Hamid Reza Karimi

Conventional intelligent fault diagnosis algorithms based on signal processing and pattern recognition have high demands on expert experience and poor generalization performance, and may not perform well in complex industrial fields. Meanwhile, the data acquisition system may suffer from cyber attacks when collecting vibration signals. The vibration signal has a very low signal-to-noise ratio (SNR), which seriously affects the accuracy of fault diagnosis. Aiming at the problem of fault diagnosis under low SNR, a new fault diagnosis framework based on a Multi-scale Split Dual Calibration Network with Periodic Information (PI-MSDCN) is proposed in this paper. In the fault diagnosis framework, a periodic block is constructed to automatically learn the periodic information of vibration signals through the neural network. The learned periodic information and the raw vibration signal are used as the input data of the MSDCN. Specifically, the MSDCN uses convolution kernels of different sizes for different channels of input features to generate multi-scale features, and obtains mixed-domain attention features for the features at different scales respectively. Then, the attention feature is used as a threshold to adaptively remove the redundant information in the multi-scale features. Finally, in order to calibrate the contribution of different scale features to fault diagnosis, the mixed-domain attention coefficients are applied to the corresponding features to obtain richer multi-scale attention features. Experimental studies under different levels of interference demonstrate that the average accuracy of the proposed method is 92.91% (±5.08%), which is superior to other existing results in the literature.

JBHI Journal 2023 Journal Article

Personality in Daily Life: Multi-Situational Physiological Signals Reflect Big-Five Personality Traits

  • Xinyu Shui
  • Yiling Chen
  • Xin Hu
  • Fei Wang
  • Dan Zhang

The popularity of wearable physiological recording devices has opened up new possibilities for the assessment of personality traits in everyday life. Compared with traditional questionnaires or laboratory assessments, wearable device-based measurements can collect rich data about individual physiological activities in real-life situations without interfering with normal life, enabling a more comprehensive description of individual differences. The present study aimed to explore the assessment of individuals' Big-Five personality traits by physiological signals in daily life situations. A commercial bracelet was used to track the heart rate (HR) data from eighty college students (all male) enrolled in a special training program with a strictly-controlled daily schedule for ten consecutive working days. Their HR activities were divided into five daily situations (morning exercise, morning classes, afternoon classes, free time in the evening, and self-study situations) according to their daily schedule. Regression analyses with HR-based features in these five situations averaged across the ten days revealed significant cross-validated quantitative prediction correlations of 0.32 and 0.26 for the dimensions of Openness and Extraversion, with the prediction correlations trending toward significance for Conscientiousness and Neuroticism. Moreover, the multi-situation HR-based results were in general superior to those based on single-situation HR-based features, as well as those based on the multi-situation self-reported emotion ratings. Together, our findings demonstrate the link between personality and daily HR measures using state-of-the-art commercial devices and could shed light on the development of Big-Five personality assessment based on daily multi-situation physiological measures.

NeurIPS Conference 2023 Conference Paper

Towards Anytime Classification in Early-Exit Architectures by Enforcing Conditional Monotonicity

  • Metod Jazbec
  • James Allingham
  • Dan Zhang
  • Eric Nalisnick

Modern predictive models are often deployed to environments in which computational budgets are dynamic. Anytime algorithms are well-suited to such environments as, at any point during computation, they can output a prediction whose quality is a function of computation time. Early-exit neural networks have garnered attention in the context of anytime computation due to their capability to provide intermediate predictions at various stages throughout the network. However, we demonstrate that current early-exit networks are not directly applicable to anytime settings, as the quality of predictions for individual data points is not guaranteed to improve with longer computation. To address this shortcoming, we propose an elegant post-hoc modification, based on the Product-of-Experts, that encourages an early-exit network to become gradually confident. This gives our deep models the property of conditional monotonicity in the prediction quality---an essential building block towards truly anytime predictive modeling using early-exit architectures. Our empirical results on standard image-classification tasks demonstrate that such behaviors can be achieved while preserving competitive accuracy on average.
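
The Product-of-Experts modification can be sketched compactly (our simplification, assuming the per-exit softmax outputs are given): the anytime prediction after exit k is the re-normalized elementwise product of the first k exit distributions, so later exits refine, rather than arbitrarily overturn, the evidence accumulated so far.

```python
import numpy as np

def poe_anytime(exit_probs):
    """Post-hoc Product-of-Experts over early-exit softmax outputs: the
    anytime distribution at budget k is the normalized elementwise product
    of exits 1..k. Returns one probability vector per budget level."""
    combined = np.ones_like(exit_probs[0])
    anytime = []
    for p in exit_probs:
        combined = combined * p
        anytime.append(combined / combined.sum())
    return anytime
```

When the exits agree on a class, the combined confidence in that class can only sharpen with each additional exit, which is the gradually-confident behavior the paper enforces.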

EAAI Journal 2023 Journal Article

Underwater image enhancement via multi-scale fusion and adaptive color-gamma correction in low-light conditions

  • Dan Zhang
  • Zongxin He
  • Xiaohuan Zhang
  • Zhen Wang
  • Wenyi Ge
  • Taian Shi
  • Yi Lin

In dark underwater areas, existing single-model underwater image enhancement methods have poor enhancement effects. We propose an underwater image enhancement method based on color correction and multi-scale fusion (CCMF). Specifically, we first design a color correction method with red channel compensation, which compensates for the red channel according to light attenuation and removes color bias. We propose a contrast enhancement method based on guided filtering to enhance edge texture details. The image is decomposed into a base layer and a detail layer in the logarithmic domain, with layered enhancement. Secondly, we propose an adaptive gamma correction method that dynamically adjusts correction parameters based on the gray image values. This approach prevents over-enhancement and effectively enhances the exposure in dark areas. We extract weight maps that represent different features from the input images and employ a multi-scale pyramid fusion technique to integrate the aforementioned feature information. This approach enables the mutual complementarity of various features and enhances the overall visual effect. Experimental results show that our method can effectively integrate the advantages of different enhancement methods, and the objective indicators of UCIQE, UIQM, and EG are better than other related state-of-the-art methods.
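
The adaptive-gamma idea can be illustrated with a minimal sketch (the mapping below is a common brightness-driven heuristic, not the paper's exact formula): the exponent is derived from the image's mean gray level, so dark frames are brightened (gamma < 1) while bright frames are darkened (gamma > 1).

```python
import numpy as np

def adaptive_gamma(img):
    """Brightness-adaptive gamma correction for an image in [0, 1]:
    choose gamma so that the mean gray value maps to 0.5, i.e.
    gamma = log(0.5) / log(mean). Dark images get gamma < 1
    (brightening); bright images get gamma > 1 (darkening)."""
    img = np.clip(np.asarray(img, dtype=float), 0.0, 1.0)
    mean = float(np.clip(img.mean(), 1e-3, 1.0 - 1e-3))  # guard log(0), log(1)
    gamma = np.log(0.5) / np.log(mean)
    return img ** gamma
```

Tying the exponent to the observed gray level is what prevents over-enhancement: an already well-exposed image yields gamma near 1 and is left almost unchanged.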

IROS Conference 2023 Conference Paper

VIW-Fusion: Extrinsic Calibration and Pose Estimation for Visual-IMU-Wheel Encoder System

  • Chunxiao Qiao
  • Shuying Zhao
  • Yunzhou Zhang
  • Yahui Wang
  • Dan Zhang

The data fusion of camera, IMU, and wheel encoder measurements has proved its effectiveness in localizing ground robots, and obtaining accurate sensor extrinsic parameters is its premise. We propose an extrinsic parameter calibration algorithm and a multi-sensor-based pose estimation algorithm for the camera-IMU-wheel encoder system. First, we propose a joint calibration algorithm for the extrinsic parameters of the camera-IMU-wheel encoder system, which improves the accuracy and robustness of the camera-wheel encoder calibration. We then extend the visual-inertial odometry (VIO) to incorporate the measurements from the wheel encoder and weight the wheel encoder measurements according to angular velocity in global optimization to improve the performance. We further propose a novel method for VIO initialization by integrating wheel encoder information, which significantly reduces the scale error in initialization. We conduct extrinsic parameter calibration experiments on a real self-driving car and validate the performance of our multi-sensor-based localization system on the KAIST dataset and a dataset collected by our self-driving vehicles by performing an exhaustive comparison with state-of-the-art algorithms. Our implementations are open source: https://github.com/chunxiaoqiao/VIW-Fusion.git.

YNIMG Journal 2022 Journal Article

Similar brains blend emotion in similar ways: Neural representations of individual difference in emotion profiles

  • Xin Hu
  • Fei Wang
  • Dan Zhang

Our daily emotional experience is a complex construct that usually involves multiple emotions blended in a context-dependent manner. However, the co-occurring and context-dependent nature of human emotions was understated in previous studies when addressing the individual difference in emotional experiences. The present study proposed a situated and blended 'profile' perspective to characterize individualized emotional experiences. Eighty participants watched a series of emotional videos with their EEG recorded, and the individual differences in their emotion profiles were measured as the vector distances between their multidimensional emotion ratings for these video stimuli. This measure was found to be a reliable descriptor of individualized emotional experiences and could efficiently predict classical emotional complexity indices. More importantly, inter-subject representational analyses revealed that similar emotion profiles were associated with similar delta-band activities over the prefrontal and temporo-parietal regions and similar theta-band activities over the frontal regions. Furthermore, left- and right-lateralized temporo-parietal representations were observed for positive and negative emotion profiles, respectively. Our findings demonstrate the potential of taking a 'profile' perspective for understanding individual differences in human emotions.

JBHI Journal 2022 Journal Article

Sparse-Based Domain Adaptation Network for OCTA Image Super-Resolution Reconstruction

  • Huaying Hao
  • Cong Xu
  • Dan Zhang
  • Qifeng Yan
  • Jiong Zhang
  • Yue Liu
  • Yitian Zhao

Retinal Optical Coherence Tomography Angiography (OCTA) with high resolution is important for the quantification and analysis of retinal vasculature. However, the resolution of OCTA images is inversely proportional to the field of view at the same sampling frequency, which is not conducive to clinicians for analyzing larger vascular areas. In this paper, we propose a novel Sparse-based domain Adaptation Super-Resolution network (SASR) for the reconstruction of realistic 6×6 mm²/low-resolution (LR) OCTA images to high-resolution (HR) representations. To be more specific, we first perform a simple degradation of the 3×3 mm²/high-resolution (HR) image to obtain the synthetic LR image. An efficient registration method is then employed to register the synthetic LR with its corresponding 3×3 mm² image region within the 6×6 mm² image to obtain the cropped realistic LR image. We then propose a multi-level super-resolution model for the fully-supervised reconstruction of the synthetic data, guiding the reconstruction of the realistic LR images through a generative-adversarial strategy that allows the synthetic and realistic LR images to be unified in the feature domain. Finally, a novel sparse edge-aware loss is designed to dynamically optimize the vessel edge structure. Extensive experiments on two OCTA datasets have shown that our method performs better than state-of-the-art super-resolution reconstruction methods. In addition, we have investigated the performance of the reconstruction results on retinal structure segmentations, which further validates the effectiveness of our approach.

EAAI Journal 2021 Journal Article

Local–Global Attentive Adaptation for Object Detection

  • Dan Zhang
  • Jingjing Li
  • Xingpeng Li
  • Zhekai Du
  • Lin Xiong
  • Mao Ye

Adversarial adaptive methods have been proven to be useful for domain transfer in many fields such as image recognition and semantic segmentation, etc. However, for object detection, since each image could have different combinations of objects, brutally aligning all the images without considering their transferability may cause the notorious phenomenon named 'negative transfer'. On the other hand, strongly matching the local-level features makes sense, as it not only reduces the discrepancy between different domain distributions, but also preserves the category-level semantic information. However, it is hard to markedly achieve domain invariance using a simple adversarial adaptive method. In this work, we propose an effective method termed Local–Global Attentive Adaptation for object Detection (LGAAD). Our method can alleviate the negative transfer caused by improper global alignments by leveraging an adaptively and dynamically weighted transferability to highlight the more transferable images. Furthermore, the proposed method also achieves strong matching between the two domains at local-level features to alleviate the cross-domain discrepancy by using the attention mechanism after multiple local discriminators. Additionally, we also consider the domain impacts of instance-wise features and backgrounds in images with large domain divergence, a non-negligible factor for improving the performance of the domain adaptive detection model. Extensive experiments on various domain shift scenarios show that our method exceeds the state-of-the-art results on several public datasets. Furthermore, qualitative visualization and ablation analyses demonstrate the validity of our approach in attending to the regions and instances of interest during domain adaptation.

YNIMG Journal 2021 Journal Article

Non-rhythmic temporal prediction involves phase resets of low-frequency delta oscillations

  • Jonathan Daume
  • Peng Wang
  • Alexander Maye
  • Dan Zhang
  • Andreas K. Engel

The phase of neural oscillatory signals aligns to the predicted onset of upcoming stimulation. Whether such phase alignments represent phase resets of underlying neural oscillations or just rhythmically evoked activity, and whether they can be observed in a rhythm-free visual context, however, remains unclear. Here, we recorded the magnetoencephalogram while participants were engaged in a temporal prediction task, judging the visual or tactile reappearance of a uniformly moving stimulus. The prediction conditions were contrasted with a control condition to dissociate phase adjustments of neural oscillations from stimulus-driven activity. We observed stronger delta band inter-trial phase consistency (ITPC) in a network of sensory, parietal and frontal brain areas, but no power increase reflecting stimulus-driven or prediction-related evoked activity. Delta ITPC further correlated with prediction performance in the cerebellum and visual cortex. Our results provide evidence that phase alignments of low-frequency neural oscillations underlie temporal predictions in a non-rhythmic visual and crossmodal context.

YNIMG Journal 2021 Journal Article

Speech frequency-following response in human auditory cortex is more than a simple tracking

  • Ning Guo
  • Xiaopeng Si
  • Yang Zhang
  • Yue Ding
  • Wenjing Zhou
  • Dan Zhang
  • Bo Hong

The human auditory cortex has recently been found to contribute to the frequency-following response (FFR), and the cortical component has been shown to be more relevant to speech perception. However, it is not clear how the cortical FFR may contribute to the processing of the speech fundamental frequency (F0) and dynamic pitch. Using intracranial EEG recordings, we observed a significant FFR at the fundamental frequency (F0) for both speech and speech-like harmonic complex stimuli in the human auditory cortex, even in the missing-fundamental condition. Both the spectral amplitude and phase coherence of the cortical FFR showed a significant harmonic preference, and attenuated from the primary auditory cortex to the surrounding associative auditory cortex. The phase coherence of the speech FFR was found to be significantly higher than that of the harmonic complex stimuli, especially in the left hemisphere, showing a high timing fidelity of the cortical FFR in tracking the dynamic F0 in speech. Spectrally, the frequency band of the cortical FFR largely overlapped with the range of the human vocal pitch. Taken together, our study parses the intrinsic properties of the cortical FFR and reveals a preference for speech-like sounds, supporting its potential role in processing speech intonation and lexical tones.

JBHI Journal 2020 Journal Article

GP-CNN-DTEL: Global-Part CNN Model With Data-Transformed Ensemble Learning for Skin Lesion Classification

  • Peng Tang
  • Qiaokang Liang
  • Xintong Yan
  • Shao Xiang
  • Dan Zhang

Precise skin lesion classification is still challenging due to two problems, i.e., (1) inter-class similarity and intra-class variation of skin lesion images, and (2) the weak generalization ability of a single Deep Convolutional Neural Network trained with limited data. Therefore, we propose a Global-Part Convolutional Neural Network (GP-CNN) model, which treats the fine-grained local information and global context information with equal importance. The Global-Part model consists of a Global Convolutional Neural Network (G-CNN) and a Part Convolutional Neural Network (P-CNN). Specifically, the G-CNN is trained with downscaled dermoscopy images, and is used to extract the global-scale information of dermoscopy images and produce the Classification Activation Map (CAM). The P-CNN is trained with the CAM-guided cropped image patches and is used to capture local-scale information of skin lesion regions. Additionally, we present a data-transformed ensemble learning strategy, which can further boost the classification performance by integrating the different discriminant information from GP-CNNs that are trained with original images, color constancy transformed images, and feature saliency transformed images, respectively. The proposed method is evaluated on the ISIC 2016 and ISIC 2017 Skin Lesion Challenge (SLC) classification datasets. Experimental results indicate that the proposed method can achieve state-of-the-art skin lesion classification performance (i.e., an AP value of 0.718 on the ISIC 2016 SLC dataset and an average AUC value of 0.926 on the ISIC 2017 SLC dataset) without any external data, compared with other current methods which need to use external data.

NeurIPS Conference 2020 Conference Paper

Understanding Anomaly Detection with Deep Invertible Networks through Hierarchies of Distributions and Features

  • Robin Schirrmeister
  • Yuxuan Zhou
  • Tonio Ball
  • Dan Zhang

Deep generative networks trained via maximum likelihood on a natural image dataset like CIFAR10 often assign high likelihoods to images from datasets with different objects (e.g., SVHN). We refine previous investigations of this failure at anomaly detection for invertible generative networks and provide a clear explanation of it as a combination of model bias and domain prior: convolutional networks learn similar low-level feature distributions when trained on any natural image dataset, and these low-level features dominate the likelihood. Hence, when the discriminative features between inliers and outliers are on a high level, e.g., object shapes, anomaly detection becomes particularly challenging. To remove the negative impact of model bias and domain prior on detecting high-level differences, we propose two methods. First, we use the log-likelihood ratios of two identical models, one trained on the in-distribution data (e.g., CIFAR10) and the other on a more general distribution of images (e.g., 80 Million Tiny Images). We also derive a novel outlier loss for the in-distribution network on samples from the more general distribution to further improve the performance. Second, using a multi-scale model like Glow, we show that low-level features are mainly captured at early scales. Therefore, using only the likelihood contribution of the final scale performs remarkably well for detecting high-level feature differences between the out-of-distribution and the in-distribution data. This method is especially useful if one does not have access to a suitable general distribution. Overall, our methods achieve strong anomaly detection performance in the unsupervised setting, and only slightly underperform state-of-the-art classifier-based methods in the supervised setting. Code can be found at https://github.com/boschresearch/hierarchical_anomaly_detection.
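
The first method reduces to a one-line score once both networks' log-likelihoods are available. A toy sketch, with 1-D Gaussians standing in for the in-distribution and general-distribution generative models (names and parameters are ours):

```python
import math

def log_gauss(x, mu, sigma):
    """Log-density of a 1-D Gaussian, a stand-in for a generative model's
    log-likelihood."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def anomaly_score(x):
    """Log-likelihood-ratio anomaly score: log p_general(x) - log p_in(x).
    Shared low-level structure cancels in the ratio, so high scores flag
    inputs whose high-level content the in-distribution model explains
    worse than the broader model. Here N(0, 1) plays the in-distribution
    model and N(0, 3) the general-distribution model."""
    return log_gauss(x, 0.0, 3.0) - log_gauss(x, 0.0, 1.0)
```

An input far from the in-distribution mode receives a much higher score than a typical inlier, which is exactly the separation the ratio is designed to expose.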

NeurIPS Conference 2019 Conference Paper

Progressive Augmentation of GANs

  • Dan Zhang
  • Anna Khoreva

Training of Generative Adversarial Networks (GANs) is notoriously fragile, requiring a careful balance to be maintained between the generator and the discriminator in order to perform well. To mitigate this issue we introduce a new regularization technique: progressive augmentation of GANs (PA-GAN). The key idea is to gradually increase the task difficulty of the discriminator by progressively augmenting its input or feature space, thus enabling continuous learning of the generator. We show that the proposed progressive augmentation preserves the original GAN objective, does not compromise the discriminator's optimality, and encourages a healthy competition between the generator and discriminator, leading to a better-performing generator. We experimentally demonstrate the effectiveness of PA-GAN across different architectures and on multiple benchmarks for the image synthesis task, achieving on average a 3-point improvement of the FID score.

JBHI Journal 2019 Journal Article

Weakly Supervised Biomedical Image Segmentation by Reiterative Learning

  • Qiaokang Liang
  • Yang Nan
  • Gianmarc Coppola
  • Kunglin Zou
  • Wei Sun
  • Dan Zhang
  • Yaonan Wang
  • Guanzhen Yu

Recent advances in deep learning have produced encouraging results for biomedical image segmentation; however, outcomes rely heavily on comprehensive annotation. In this paper, we propose a neural network architecture and a new algorithm, known as overlapped region forecast, for the automatic segmentation of gastric cancer images. To the best of our knowledge, this is the first report of deep learning applied to the segmentation of gastric cancer images. Moreover, a reiterative learning framework that achieves superior performance without pretraining or further manual annotation is presented to train a simple network on weakly annotated biomedical images. We customize the loss function to make the model converge faster while avoiding becoming trapped in local minima. Patch boundary errors are eliminated by our overlapped region forecast algorithm. By studying the characteristics of the model trained using two different patch extraction methods, we train iteratively and integrate predictions and weak annotations to improve the quality of the training data. Using these methods, a mean Intersection over Union coefficient of 0.883 and a mean accuracy of 91.09% were achieved on the partially labeled dataset, thereby securing a win in the 2017 China Big Data and Artificial Intelligence Innovation and Entrepreneurship Competition.

YNIMG Journal 2013 Journal Article

Toward a minimally invasive brain–computer interface using a single subdural channel: A visual speller study

  • Dan Zhang
  • Huaying Song
  • Rui Xu
  • Wenjing Zhou
  • Zhipei Ling
  • Bo Hong

Electrocorticography (ECoG) has attracted increasing interest for implementing advanced brain–computer interfaces (BCIs) in the past decade. However, real-life application of ECoG BCI demands mitigation of its invasive nature by minimizing both the size of the involved brain regions and the number of implanted electrodes. In this study, we employed a recently proposed BCI paradigm that utilizes the attentional modulation of visual motion response. With ECoG data collected from five epilepsy patients, power increase in the high gamma (60–140 Hz) frequency range was found to be associated with the overtly attended moving visual stimuli in the parietal-temporal-occipital junction and the occipital cortex. Event-related potentials (ERPs) were elicited as well but with broader cortical distribution. We achieved significantly higher BCI classification accuracy by employing both high gamma and ERP responses from a single ECoG electrode than by using ERP responses only (84.22±5.54% vs. 75.48±4.18%, p < 0.005, paired t-test, 3-trial averaging, binary results of attended vs. unattended). More importantly, the high gamma responses were located within brain regions specialized in visual motion processing as mapped by fMRI, suggesting the spatial location for electrode implantation can be determined prior to surgery using non-invasive imaging. Our findings demonstrate the feasibility of implementing a minimally invasive ECoG BCI.

NeurIPS Conference 2011 Conference Paper

Multiple Instance Learning on Structured Data

  • Dan Zhang
  • Yan Liu
  • Luo Si
  • Jian Zhang
  • Richard Lawrence

Most existing Multiple-Instance Learning (MIL) algorithms assume data instances and/or data bags are independently and identically distributed. But there often exists rich additional dependency/structure information between instances/bags within many applications of MIL, and ignoring this structure information limits the performance of existing MIL algorithms. This paper explores this research problem as multiple instance learning on structured data (MILSD) and formulates a novel framework that considers additional structure information. In particular, an effective and efficient optimization algorithm is proposed to solve the original non-convex optimization problem using a combination of the Concave-Convex Constraint Programming (CCCP) method and an adapted Cutting Plane method, which deals with two sets of constraints caused by learning on instances within individual bags and learning on structured data. Our method has a nice convergence property, with specified precision on each set of constraints. Experimental results on three different applications, i.e., webpage classification, market targeting, and protein fold identification, clearly demonstrate the advantages of the proposed method over state-of-the-art methods.

AAAI Conference 2011 Conference Paper

Transfer Latent Semantic Learning: Microblog Mining with Less Supervision

  • Dan Zhang
  • Yan Liu
  • Richard Lawrence
  • Vijil Chenthamarakshan

The increasing volume of information generated on microblogging sites such as Twitter raises several challenges to traditional text mining techniques. First, most texts from those sites are abbreviated due to the constraint of limited characters per post; second, the input usually comes in large-volume streams. Therefore, it is of significant importance to develop effective and efficient representations of abbreviated texts for better filtering and mining. In this paper, we introduce a novel transfer learning approach, namely transfer latent semantic learning, that utilizes a large number of related tagged documents with rich information from other sources (source domain) to help build a robust latent semantic space for the abbreviated texts (target domain). This is achieved by simultaneously minimizing the document reconstruction error and the classification error of the labeled examples from the source domain by building a classifier with hinge loss in the latent semantic space. We demonstrate the effectiveness of our method by applying it to the task of classifying and tagging abbreviated texts. Experimental results on both synthetic datasets and real application datasets, including Reuters-21578 and Twitter data, suggest substantial improvements using our approach over existing ones.

IJCAI Conference 2009 Conference Paper

M3IC: Maximum Margin Multiple Instance Clustering

  • Dan Zhang
  • Fei Wang
  • Luo Si
  • Tao Li

Clustering, classification, and regression are three major research topics in machine learning. So far, much work has been conducted on solving multiple instance classification and multiple instance regression problems, where supervised training patterns are given as bags and each bag consists of some instances, but research on unsupervised multiple instance clustering is still limited. This paper formulates a novel Maximum Margin Multiple Instance Clustering (M3IC) problem for the multiple instance clustering task. To avoid solving a non-convex optimization problem directly, M3IC is further relaxed, which enables an efficient optimization solution with a combination of the Constrained Concave-Convex Procedure (CCCP) and the Cutting Plane method. Furthermore, this paper analyzes some important properties of the proposed method and its relationship to other related methods. An extensive set of empirical results demonstrates the advantages of the proposed method over existing research in both effectiveness and efficiency.

AAAI Conference 2008 Conference Paper

Multi-View Local Learning

  • Dan Zhang
  • Changshui Zhang

The idea of local learning, i.e., classifying a particular example based on its neighbors, has recently been successfully applied to many semi-supervised and clustering problems. However, the local learning methods developed so far are all devised for single-view problems. In fact, in many real-world applications, examples are represented by multiple sets of features. In this paper, we extend the idea of local learning to multi-view problems, design a multi-view local model for each example, and propose a Multi-View Local Learning Regularization (MVLL-Reg) matrix. Both linear and kernel versions are given. Experiments are conducted to demonstrate the superiority of the proposed method over several state-of-the-art ones.