Author name cluster

Peng Zhang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

114 papers

2 author rows

EAAI Journal 2026 Journal Article

A general framework for interactive semantic segmentation refinement of point clouds

Peng Zhang
Ting Wu
Jinsheng Sun
Weiqing Li
Zhiyong Su

Details DOI

EAAI Journal 2026 Journal Article

A multi-scale adaptive frequency domain network for few-shot current sensor fault diagnosis

Bin Chen
Hongmei Li
Haonan Zhao
Peng Zhang
Jiandong Huang

Details DOI

EAAI Journal 2026 Journal Article

A trend-aware reinforcement learning approach for adaptive motion planning of robotic manipulators in dynamic environments

Dexian Wang
Peng Zhang
Pengfei Ding
Junliang Wang
Jie Zhang

Details DOI

AAAI Conference 2026 Conference Paper

DarkBench+: An Extended Benchmark for Evaluating Dark Patterns in Large Language Models

Yaowen Liu
Shenjia Jing
Yufei Wei
Shoumin Zhang
Jinglu Zhang
Zhen Mei
Liangliang Yue
Jiarui Wang

With the widespread deployment of large language models (LLMs) in human-computer interaction, dark patterns have extended from traditional visual interfaces to conversational AI systems. While existing research has confirmed the prevalence of dark patterns in LLMs, current evaluation benchmarks face critical challenges including limited classification coverage, overlooked risks specific to reasoning models, and inadequate consideration of cross-linguistic differences. To address these limitations, we propose DarkBench+, an extended benchmark for evaluating dark patterns in LLMs. We construct an expanded taxonomy containing 10 major categories and 24 subcategories, introduce an annotation workflow combining manual and automated methods, and design 2,088 bilingual test samples in Chinese and English. This benchmark is the first to develop specialized evaluation dimensions for reasoning models and systematically evaluates dark pattern behaviors across nearly 40 mainstream LLMs. Experimental results demonstrate significant manipulation risks in reasoning models' transparency displays, while cross-linguistic evaluation analyzes AI manipulation behavior differences across different linguistic environments, promoting more ethical and responsible LLM development.

PDF Details DOI

AAAI Conference 2026 Conference Paper

Differentiated Directional Intervention: A Framework for Evading LLM Safety Alignment

Peng Zhang
Peijie Sun

Safety alignment instills in Large Language Models (LLMs) a critical capacity to refuse malicious requests. Prior works have modeled this refusal mechanism as a single linear direction in the activation space. We posit that this is an oversimplification that conflates two functionally distinct neural processes: the detection of harm and the execution of a refusal. In this work, we deconstruct this single representation into a Harm Detection Direction and a Refusal Execution Direction. Leveraging this fine-grained model, we introduce Differentiated Bi-Directional Intervention (DBDI), a new white-box framework that precisely neutralizes the safety alignment at a critical layer. DBDI applies adaptive projection nullification to the refusal execution direction while suppressing the harm detection direction via direct steering. Extensive experiments demonstrate that DBDI outperforms prominent jailbreaking methods, achieving up to a 97.88% attack success rate on models such as Llama-2. By providing a more granular and mechanistic framework, our work offers a new direction for the in-depth understanding of LLM safety alignment.

PDF Details DOI

AAAI Conference 2026 Conference Paper

Elite Pattern Reinforcement for Vehicle Routing Problems

Ning Li
Peng Lin
Peng Zhang
Ruichen Tian

Machine learning methods have been increasingly applied to solve Vehicle Routing Problems (VRPs). A high-efficiency approach is to learn solution construction using deep neural networks. However, their tendency toward premature convergence is a critical barrier, severely hindering generalization across diverse distributions and scales. To overcome this, we introduce Elite-Pattern Reinforcement (EPR), a novel strategy designed to create a synergy between the diverse, exploratory nature of reinforcement learning and the high-quality, structured knowledge from classical heuristics. The strategy guides the learning process by reinforcing structural patterns from elite solutions, employing an elite-guided score modulation to integrate this external knowledge. The inherent symmetry of path patterns is also exploited to augment the structural information. This steers the policy away from premature convergence by enabling it to distinguish and favour elite path patterns over inferior ones. Integrating our strategy with four construction methods yields substantial performance improvements on the CVRPLIB and TSPLIB benchmarks. Furthermore, our approach outperforms state-of-the-art learning-based methods, demonstrating superior generalization across diverse distributions and scales.

PDF Details DOI

AAAI Conference 2026 Conference Paper

FaNe: Towards Fine-Grained Cross-Modal Contrast with False-Negative Reduction and Text-Conditioned Sparse Attention

Peng Zhang
Zhihui Lai
Wenting Chen
Xu Wu
Heng Kong

Medical vision-language pre-training (VLP) offers significant potential for advancing medical image understanding by leveraging paired image-report data. However, existing methods are limited by False Negatives (FaNe) induced by semantically similar texts and insufficient fine-grained cross-modal alignment. To address these limitations, we propose FaNe, a semantic-enhanced VLP framework. To mitigate false negatives, we introduce a semantic-aware positive pair mining strategy based on text-text similarity with adaptive normalization. Furthermore, we design a text-conditioned sparse attention pooling module to enable fine-grained image-text alignment through localized visual representations guided by textual cues. To strengthen intra-modal discrimination, we develop a hard-negative aware contrastive loss that adaptively reweights semantically similar negatives. Extensive experiments on five downstream medical imaging benchmarks demonstrate that FaNe achieves state-of-the-art performance across image classification, object detection, and semantic segmentation, validating the effectiveness of our framework.

PDF Details DOI

AAAI Conference 2026 Conference Paper

FUSE: Fine-Grained and Semantic-Aware Learning for Unified Image Understanding and Generation

Peng Zhang
Wanggui He
Mushui Liu
Wenyi Xiao
Siyu Zou
Yuan Li
Xingjian Wang
Guanghao Zhang

Recent unified models have demonstrated that the reasoning capacity of Multimodal Large Language Models (MLLMs) can be leveraged to facilitate diffusion-based image generation with impressive flexibility and performance. However, approaches that rely heavily on MLLMs for high-level semantic encoding often struggle with fine-grained visual tasks like image editing and virtual try-on. To address this gap, we propose FUSE, a unified framework excelling at both high-level vision–language understanding and fine-grained generation. First, we introduce a Semantic-to-Detail Connector that pre-aligns fine-grained visual features with the MLLM's semantic space. This design counteracts the low-level information loss inherent in MLLM encodings, creating a unified representation that steers the diffusion process with both global semantics and rich local details. Second, to further enhance semantic awareness and detail preservation, we introduce Adaptive-GRPO, a post-training objective that dynamically balances semantic coherence against pixel-level fidelity. The integration of these two innovations allows FUSE to generate images that are both semantically faithful and visually fine-grained. Comprehensive experiments on text-to-image and instruction-guided editing benchmarks show that FUSE significantly outperforms existing unified baselines, achieving 0.89 on Geneval, 0.65 on WISE, and 3.88 on ImageEdit.

PDF Details DOI

AAAI Conference 2026 Conference Paper

IROTE: Human-like Traits Elicitation of Large Language Model via In-Context Self-Reflective Optimization

Yuzhuo Bai
Shitong Duan
Muhua Huang
Jing Yao
Zhenghao Liu
Peng Zhang
Tun Lu
Xiaoyuan Yi

Trained on various human-authored corpora, Large Language Models (LLMs) have demonstrated a certain capability of reflecting specific human-like traits (e.g., personality or values) by prompting, benefiting applications like personalized LLMs and social simulations. However, existing methods suffer from the superficial elicitation problem: LLMs can only be steered to mimic shallow and unstable stylistic patterns, failing to embody the desired traits precisely and consistently across diverse tasks like humans. To address this challenge, we propose IROTE, a novel in-context method for stable and transferable trait elicitation. Drawing on psychological theories suggesting that traits are formed through identity-related reflection, our method automatically generates and optimizes a textual self-reflection within prompts, which comprises self-perceived experience, to stimulate LLMs' trait-driven behavior. The optimization is performed by iteratively maximizing an information-theoretic objective that enhances the connections between LLMs' behavior and the target trait, while reducing noisy redundancy in reflection without any fine-tuning, leading to evocative and compact trait reflection. Extensive experiments across three human trait systems manifest that one single IROTE-generated self-reflection can induce LLMs' stable impersonation of the target trait across diverse downstream tasks beyond simple questionnaire answering, consistently outperforming existing strong baselines.

PDF Details DOI

AAAI Conference 2026 Conference Paper

MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions

Yanxu Zhu
Shitong Duan
Xiangxu Zhang
Jitao Sang
Peng Zhang
Tun Lu
Xiao Zhou
Jing Yao

Recently, Multimodal Large Language Models (MLLMs) have achieved considerable advancements in vision-language tasks, yet produce potentially harmful or untrustworthy content. Despite substantial work investigating the trustworthiness of language models, MLLMs' capability to act honestly, especially when faced with visually unanswerable questions, remains largely underexplored. This work presents the first systematic assessment of honesty behaviors across various MLLMs. We ground honesty in models' response behaviors to unanswerable visual questions, define four representative types of such questions, and construct MoHoBench, a large-scale MLLM honesty benchmark, consisting of 12k+ visual question samples, whose quality is guaranteed by multi-stage filtering and human verification. Using MoHoBench, we benchmarked the honesty of 28 popular MLLMs and conducted a comprehensive analysis. Our findings show that: (1) most models fail to appropriately refuse to answer when necessary, and (2) MLLMs' honesty is not solely a language modeling issue, but is deeply influenced by visual information, necessitating the development of dedicated methods for multimodal honesty alignment. Therefore, we implemented initial alignment methods using supervised and preference learning to improve honesty behavior, providing a foundation for future work on trustworthy MLLMs.

PDF Details DOI

AAAI Conference 2026 Conference Paper

Phased One-Step Adversarial Equilibrium for Video Diffusion Models

Jiaxiang Cheng
Bing Ma
Xuhua Ren
Hongyi Henry Jin
Kai Yu
Peng Zhang
Wenyue Li
Yuan Zhou

Video diffusion generation suffers from critical sampling efficiency bottlenecks, particularly for large-scale models and long contexts. Existing video acceleration methods, adapted from image-based techniques, lack a single-step distillation ability for large-scale video models and task generalization for conditional downstream tasks. To bridge this gap, we propose the Video Phased Adversarial Equilibrium (V-PAE), a distillation framework that enables high-quality, single-step video generation from large-scale video models. Our approach employs a two-phase process. (i) Stability priming is a warm-up process to align the distributions of real and generated videos. It improves the stability of single-step adversarial distillation in the following process. (ii) Unified adversarial equilibrium is a flexible self-adversarial process that reuses generator parameters for the discriminator backbone. It achieves a co-evolutionary adversarial equilibrium in the Gaussian noise space. For the conditional tasks, we primarily preserve video-image subject consistency, which is caused by semantic degradation and conditional frame collapse during the distillation training in image-to-video (I2V) generation. Comprehensive experiments on VBench-I2V demonstrate that V-PAE outperforms existing acceleration methods by an average of 5.8% in the overall quality score, including semantic alignment, temporal coherence, and frame quality. In addition, our approach reduces the diffusion latency of the large-scale video model (e.g., Wan2.1-I2V-14B) by 100 times, while preserving competitive performance.

PDF Details DOI

YNIMG Journal 2026 Journal Article

The amplitude and latency of the earliest signal in V1 encode bottom-up saliency by feature conjunction

Chen Wu
Xiaoning Li
Huan Li
Xuan Wang
Ziang Yin
Zeyu Wang
Peng Zhang
Zhikuan Yang

Details DOI

AAAI Conference 2026 Conference Paper

Unified Interaction Consistency Learning for Single-Source Domain-Generalized Object Detection in Urban Scene

Peng Zhang
Xiang Yuan
Gong Cheng

Domain generalization remains a critical challenge for deploying neural networks, particularly in out-of-distribution object detection. The distributional discrepancy between training (e.g., daytime-sunny) and the realistic condition (e.g., night-rainy) inevitably produces imprecise localization and wrong classification. To address these issues, we propose a unified interaction consistency learning (UICL) framework, a novel single-source domain-generalized method designed to learn intra-class domain-invariant representations. Specifically, we put forth a cross-domain interaction mechanism to exchange region proposals between original and augmented pipelines, enriching the diversity of instance-level representations. Building upon this, we propose prediction-guided consistency learning to unify the interaction mechanism and harmonize the cross-domain representations, contributing to a discriminative prediction distribution under domain shift. In addition, we devise a cyclic interaction resilient detection strategy, which mitigates inaccurate predictions suffering from partial occlusion and ambiguous boundaries among different domains. Extensive experiments evidence that UICL significantly improves the robustness of detectors over several target domains, achieving state-of-the-art generalization performance on the diverse weather benchmark.

PDF Details DOI

AAAI Conference 2025 Conference Paper

3D-RPE: Enhancing Long-Context Modeling Through 3D Rotary Position Encoding

Xindian Ma
Wenyuan Liu
Peng Zhang
Nan Xu

An essential component in Large Language Models (LLMs) is Rotary Position Encoding (RoPE), which efficiently manages positional dependencies in long-context modeling. However, when the number of input tokens surpasses the pretrained capacity of LLMs, their ability to process and generate text is markedly weakened. Although position interpolation techniques for RoPE can mitigate this issue, an increase in interpolations leads to a decrease in positional resolution. To tackle this challenge, drawing inspiration from the Bloch Sphere representation, we propose a novel rotary position encoding on a three-dimensional sphere, named 3D Rotary Position Encoding (3D-RPE). 3D-RPE is an advanced version of the widely used 2D RoPE, with two major advantages for modeling long contexts: controllable long-term decay and improved position resolution. For controllable long-term decay, 3D-RPE allows for the regulation of long-term decay within the chunk size, ensuring the modeling of relative positional information between tokens at a distant relative position. For improved position resolution, 3D-RPE can mitigate the degradation of position resolution caused by position interpolation on RoPE. We have conducted experiments on long-context Natural Language Understanding (NLU) and long sequence Language Modeling (LM) tasks. From the experimental results, 3D-RPE achieved performance improvements over RoPE, especially in long-context NLU tasks.

PDF Details DOI

EAAI Journal 2025 Journal Article

A multimodal multi-scale fusion network for leak detection in marine piping systems

Peng Zhang
Chaozhe Li
Shitao Peng
Bomu Tian
Si Luo
Yuewen Zhang
Taili Du

Details DOI

JAIR Journal 2025 Journal Article

A New Literature Review of 3D Object Detection on Autonomous Driving

Peng Zhang
Xin Li
Xin Lin
Liang He

In recent years, the realm of computer vision has experienced a significant surge in the importance of 3D object detection, especially in the context of autonomous driving. The capability to precisely identify the locations, dimensions, and types of key 3D objects surrounding an autonomous vehicle is crucial, rendering 3D object detection a vital component of any advanced perception system. This review delivers an extensive overview of the emerging technologies in 3D object detection tailored for autonomous vehicles. It encompasses a thorough examination, evaluation, and integration of the current research landscape in this domain, staying up-to-date with the latest advancements in 3D object detection and suggesting prospective avenues for future research. Our survey begins by clarifying the principles of 3D object detection and addressing its present challenges in the 3D domain. We then introduce three distinct taxonomies: camera-based, point cloud-based, and multi-modality-based approaches, providing a comprehensive classification of contemporary 3D object detection methodologies from various angles. Diverging from previous reviews, this paper also highlights and scrutinizes common issues and solutions for specific scenarios (such as pedestrian detection, lane lines, roadside cameras, and weather conditions) in object detection. Furthermore, we conduct an in-depth analysis and comparison of different classifications and methods, utilizing various datasets and experimental outcomes. Conclusively, we suggest several potential research directions, offering valuable insights for the ongoing evolution of 3D object detection technology. This review aims to serve as a comprehensive resource for researchers and practitioners in the field, guiding future innovations in 3D object detection for autonomous driving.

PDF Details DOI

AAAI Conference 2025 Conference Paper

Aerodynamic Coefficients Prediction via Cross-Attention Fusion and Physical-Informed Training

Yueqing Wang
Peng Zhang
Yushuang Liu
Jianing Zhao
Jie Lin
Yi Chen

Aerodynamic coefficient prediction is pivotal in aircraft and vehicles' design, performance evaluation, and motion control. Integrating artificial neural networks into aerodynamic coefficient prediction offers a promising alternative to traditional numerical methods burdened by extensive computations and high costs. Nevertheless, this data-driven approach faces several critical challenges, which limit its further performance enhancement: i) The current research lacks a profound understanding of the complex interplay between the shape of an object and its aerodynamic characteristics. ii) The scarcity of high-quality aerodynamic data poses a significant barrier. The models trained on limited datasets lack generalization ability, struggling to accurately predict and adapt to diverse aerodynamic performance under new shapes or conditions. To overcome these challenges, we introduce an innovative framework that employs cross-attention to capture the intimate interplay between shape and flow conditions and allows for the direct utilization of pre-trained models on general shape datasets to mitigate the scarcity of aerodynamic data. Furthermore, to bolster the inference capabilities of this data-driven approach, we integrate physical information constraints into the model, leveraging them as guiding principles to enhance the model's predictive power under unknown conditions. Experimental validation demonstrates that our proposed method performs excellently in multiple aerodynamic prediction tasks. This achievement brings a new technological breakthrough to the field of aerodynamic prediction and provides robust support for the design optimization of complex systems such as aircraft and vehicles.

PDF Details DOI

YNICL Journal 2025 Journal Article

Altered functional connectivity of brainstem ARAS nuclei unveils the mechanisms of disorders of consciousness in sTBI: an exploratory study

Peng Zhang
Yinan Zhou
Haoqi Ni
Zhaoneng Huang
Can Tang
Qichuan Zhuge
Lun Dong
Jun Zhang

Details DOI

NeurIPS Conference 2025 Conference Paper

Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games

Runyu Lu
Peng Zhang
Ruochuan Shi
Yuanheng Zhu
Dongbin Zhao
Yang Liu
Dong Wang
Cesare Alippi

Equilibrium learning in adversarial games is an important topic widely examined in the fields of game theory and reinforcement learning (RL). Pursuit-evasion game (PEG), as an important class of real-world games from the fields of robotics and security, requires exponential time to be accurately solved. When the underlying graph structure varies, even the state-of-the-art RL methods require recomputation or at least fine-tuning, which can be time-consuming and impair real-time applicability. This paper proposes an Equilibrium Policy Generalization (EPG) framework to effectively learn a generalized policy with robust cross-graph zero-shot performance. In the context of PEGs, our framework is generally applicable to both pursuer and evader sides in both no-exit and multi-exit scenarios. These two generalizability properties, to our knowledge, are the first to appear in this domain. The core idea of the EPG framework is to train an RL policy across different graph structures against the equilibrium policy for each single graph. To construct an equilibrium oracle for single-graph policies, we present a dynamic programming (DP) algorithm that provably generates pure-strategy Nash equilibrium with near-optimal time complexity. To guarantee scalability with respect to pursuer number, we further extend DP and RL by designing a grouping mechanism and a sequence model for joint policy decomposition, respectively. Experimental results show that, using equilibrium guidance and a distance feature proposed for cross-graph PEG training, the EPG framework guarantees desirable zero-shot performance in various unseen real-world graphs. Besides, when trained under an equilibrium heuristic proposed for the graphs with exits, our generalized pursuer policy can even match the performance of the fine-tuned policies from the state-of-the-art PEG methods.

PDF Details

ICML Conference 2025 Conference Paper

Gamma Distribution PCA-Enhanced Feature Learning for Angle-Robust SAR Target Recognition

Chong Zhang
Peng Zhang
Mengke Li 0001

Scattering characteristics of synthetic aperture radar (SAR) targets are typically related to observed azimuth and depression angles. However, in practice, it is difficult to obtain adequate training samples at all observation angles, which probably leads to poor robustness of deep networks. In this paper, we first propose a Gamma-Distribution Principal Component Analysis ($\Gamma$PCA) model that fully accounts for the statistical characteristics of SAR data. The $\Gamma$PCA derives consistent convolution kernels to effectively capture the angle-invariant features of the same target at various attitude angles, thus alleviating deep models’ sensitivity to angle changes in SAR target recognition task. We validate $\Gamma$PCA model based on two commonly used backbones, ResNet and ViT, and conduct multiple robustness experiments on the MSTAR benchmark dataset. The experimental results demonstrate that $\Gamma$PCA effectively enables the model to withstand substantial distributional discrepancy caused by angle changes. Additionally, $\Gamma$PCA convolution kernel is designed to require no parameter updates, introducing no extra computational burden to the network. The source code is available at https://github.com/ChGrey/GammaPCA.

Details

AAAI Conference 2025 Conference Paper

Large Language Models Enhanced Personalized Graph Neural Architecture Search in Federated Learning

Hui Fang
Yang Gao
Peng Zhang
Jiangchao Yao
Hongyang Chen
Haishuai Wang

Personalized federated learning (PFL) on graphs is an emerging field focusing on the collaborative development of architectures across multiple clients, each with distinct graph data distributions while adhering to strict privacy standards. This area often requires extensive expert intervention in model design, which is a significant limitation. Recent advancements have aimed to automate the search for graph neural network architectures, incorporating large language models (LLMs) for their advanced reasoning and self-reflection capabilities. However, two technical challenges persist. First, although LLMs are effective in natural language processing, their ability to meet the complex demands of graph neural architecture search (GNAS) is still being explored. Second, while LLMs can guide the architecture search process, they do not directly solve the issue of client drift due to heterogeneous data distributions. To address these challenges, we introduce a novel method, Personalized Federated Graph Neural Architecture Search (PFGNAS). This approach employs a task-specific prompt to identify and integrate optimal GNN architectures continuously. To counteract client drift, PFGNAS utilizes a weight-sharing strategy of supernet, which optimizes the local architectures while ensuring client-specific personalization. Extensive evaluations show that PFGNAS significantly outperforms traditional PFL methods, highlighting the advantages of integrating LLMs into personalized federated learning environments.

PDF Details DOI

EAAI Journal 2025 Journal Article

Mutual transfer learning for cuff-less blood pressure estimation using photoplethysmography-based visibility graphs

Chenbin Ma
Zhenchang Liu
Peng Zhang
Lishuang Guo
Haonan Zhang
Zeyu Liu
Guanglei Zhang

Details DOI

NeurIPS Conference 2025 Conference Paper

NAUTILUS: A Large Multimodal Model for Underwater Scene Understanding

Wei Xu
Cheng Wang
Dingkang Liang
Zongchuang Zhao
Xingyu Jiang
Peng Zhang
Xiang Bai

Underwater exploration offers critical insights into our planet and attracts increasing attention for its broader applications in resource exploration, national security, etc. We study the underwater scene understanding methods, which aim to achieve automated underwater exploration. The underwater scene understanding task demands multi-task perceptions from multiple granularities. However, the absence of large-scale underwater multi-task instruction-tuning datasets hinders the progress of this research. To bridge this gap, we construct NautData, a dataset containing 1.45 M image-text pairs supporting eight underwater scene understanding tasks. It enables the development and thorough evaluation of the underwater scene understanding models. Underwater image degradation is a widely recognized challenge that interferes with underwater tasks. To improve the robustness of underwater scene understanding, we introduce physical priors derived from underwater imaging models and propose a plug-and-play vision feature enhancement (VFE) module, which explicitly restores clear underwater information. We integrate this module into renowned baselines LLaVA-1.5 and Qwen2.5-VL and build our underwater LMM, NAUTILUS. Experiments conducted on the NautData and public underwater datasets demonstrate the effectiveness of the VFE module, consistently improving the performance of both baselines on the majority of supported tasks, thus ensuring the superiority of NAUTILUS in the underwater scene understanding area. Data and models are available at https://github.com/H-EmbodVis/NAUTILUS.

PDF Details

IJCAI Conference 2025 Conference Paper

NovPhy: A Physical Reasoning Benchmark for Open-World AI Systems (Abstract Reprint)

Vimukthini Pinto
Chathura Gamage
Cheng Xue
Peng Zhang
Ekaterina Nikonova
Matthew Stephenson
Jochen Renz

Due to the emergence of AI systems that interact with the physical environment, there is an increased interest in incorporating physical reasoning capabilities into those AI systems. But is it enough to only have physical reasoning capabilities to operate in a real physical environment? In the real world, we constantly face novel situations we have not encountered before. As humans, we are competent at successfully adapting to those situations. Similarly, an agent needs to have the ability to function under the impact of novelties in order to properly operate in an open-world physical environment. To facilitate the development of such AI systems, we propose a new benchmark, NovPhy, that requires an agent to reason about physical scenarios in the presence of novelties and take actions accordingly. The benchmark consists of tasks that require agents to detect and adapt to novelties in physical scenarios. To create tasks in the benchmark, we develop eight novelties representing a diverse novelty space and apply them to five commonly encountered scenarios in a physical environment, related to applying forces and motions such as rolling, falling, and sliding of objects. According to our benchmark design, we evaluate two capabilities of an agent: the performance on a novelty when it is applied to different physical scenarios and the performance on a physical scenario when different novelties are applied to it. We conduct a thorough evaluation with human players, learning agents, and heuristic agents. Our evaluation shows that humans' performance is far beyond the agents' performance. Some agents, even with good normal task performance, perform significantly worse when there is a novelty, and the agents that can adapt to novelties typically adapt slower than humans. We promote the development of intelligent agents capable of performing at the human level or above when operating in open-world physical environments. Benchmark website: https://github.com/phy-q/novphy

PDF Details DOI

NeurIPS Conference 2025 Conference Paper

OmniTalker: One-shot Real-time Text-Driven Talking Audio-Video Generation With Multimodal Style Mimicking

Zhongjian Wang
Peng Zhang
Jinwei Qi
Wang Yuan
Sheng Xu
Bang Zhang

Although significant progress has been made in audio-driven talking head generation, text-driven methods remain underexplored. In this work, we present OmniTalker, a unified framework that jointly generates synchronized talking audio-video content from input text while emulating the target identity's speaking and facial movement styles, including speech characteristics, head motion, and facial dynamics. Our framework adopts a dual-branch diffusion transformer (DiT) architecture, with one branch dedicated to audio generation and the other to video synthesis. At the shallow layers, cross-modal fusion modules are introduced to integrate information between the two modalities. In deeper layers, each modality is processed independently, with the generated audio decoded by a vocoder and the video rendered using a GAN-based high-quality visual renderer. Leveraging DiT’s in-context learning capability through a masked-infilling strategy, our model can simultaneously capture both audio and visual styles without requiring explicit style extraction modules. Thanks to the efficiency of the DiT backbone and the optimized visual renderer, OmniTalker achieves real-time inference at 25 FPS. To the best of our knowledge, OmniTalker is the first one-shot framework capable of jointly modeling speech and facial styles in real time. Extensive experiments demonstrate its superiority over existing methods in terms of generation quality, particularly in preserving style consistency and ensuring precise audio-video synchronization, all while maintaining efficient inference.

PDF Details

JBHI Journal 2025 Journal Article

PPG-Based Continuous BP Waveform Estimation Using Polarized Attention-Guided Conditional Adversarial Learning Model

Chenbin Ma
Yangfan Xu
Peng Zhang
Fan Song
Yangyang Sun
Youdan Feng
Yufang He
Guanglei Zhang

The blood pressure (BP) waveform is a vital source of physiological and pathological information concerning the cardiovascular system. This study proposes a novel attention-guided conditional generative adversarial network (cGAN), named PPG2BP-cGAN, to estimate BP waveforms based on photoplethysmography (PPG) signals. The proposed model comprises a generator and a discriminator. Specifically, the UNet3+-based generator integrates a full-scale skip connection structure with a modified polarized self-attention module based on a spatial-temporal attention mechanism. Additionally, its discriminator comprises PatchGAN, which augments the discriminative power of the generated BP waveform by increasing the perceptual field through fully convolutional layers. We demonstrate the superior BP waveform prediction performance of our proposed method compared to state-of-the-art (SOTA) techniques on two independent datasets. Our approach was first pre-trained on a dataset containing 683 subjects and then tested on a public dataset. Experimental results from the Multi-parameter Intelligent Monitoring in Intensive Care dataset show that the proposed method achieves a root mean square error of 3.54, mean absolute error of 2.86, and Pearson coefficient of 0.99 for BP waveform estimation. Furthermore, the estimation errors (mean error ± standard deviation error) for systolic BP and diastolic BP are 0.72 ± 4.34 mmHg and 0.41 ± 2.48 mmHg, respectively, meeting the American Association for the Advancement of Medical Instrumentation standard. Our approach exhibits significant superiority over SOTA techniques on independent datasets, thus highlighting its potential for future applications in continuous cuffless BP waveform measurement.
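
The error figures quoted in this abstract (mean error ± standard deviation, mean absolute error, root mean square error) can be illustrated with a short sketch. This is not the authors' evaluation code: the function names and the sample readings are hypothetical, and the thresholds assume the commonly cited AAMI criterion of a mean error within 5 mmHg and a standard deviation within 8 mmHg.

```python
import math
import statistics


def error_stats(reference, estimate):
    """Return (mean error, SD of error, MAE, RMSE) for paired readings in mmHg."""
    errors = [e - r for r, e in zip(reference, estimate)]
    mean_error = statistics.mean(errors)
    sd_error = statistics.stdev(errors)  # sample standard deviation
    mae = statistics.mean([abs(x) for x in errors])
    rmse = math.sqrt(statistics.mean([x * x for x in errors]))
    return mean_error, sd_error, mae, rmse


def meets_aami(mean_error, sd_error, max_mean=5.0, max_sd=8.0):
    """Assumed AAMI criterion: |mean error| <= 5 mmHg and SD <= 8 mmHg."""
    return abs(mean_error) <= max_mean and sd_error <= max_sd


# Hypothetical systolic readings, for illustration only.
reference = [120, 118, 125, 130]
estimate = [122, 117, 124, 133]

mean_error, sd_error, mae, rmse = error_stats(reference, estimate)
print(f"{mean_error:.2f} ± {sd_error:.2f} mmHg, MAE {mae:.2f}, RMSE {rmse:.2f}")
print("within assumed AAMI limits:", meets_aami(mean_error, sd_error))
```

Under the same assumed thresholds, the systolic and diastolic figures reported in the abstract (0.72 ± 4.34 mmHg and 0.41 ± 2.48 mmHg) both fall inside the limits.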

Details DOI

JBHI Journal 2025 Journal Article

Robust R-Peak Detection in Noisy ECGs via Strip-Attention YOLO with Multilead Fusion

Wang Peng
Peng Tang
Hao Wang
Yuhang Liu
Qiang Li
Peng Zhang

The R-peak in electrocardiogram (ECG) signals is a critical physiological marker for the diagnosis of cardiovascular diseases. Although various R-peak detection methods have been proposed, their performance is often hindered by noise, especially in dynamic ECG monitoring. Furthermore, the potential of harnessing complementary information from 12-lead ECG signals has not been fully exploited. To address these challenges, this study conceptualized 12-lead ECG data as two-dimensional images and employed YOLOv5 as the model's backbone for R-peak detection, effectively transforming a signal segmentation task into an object detection task in images. Specifically, considering the characteristics of consistent R-peak positions across different leads, we proposed a strip attention mechanism to treat horizontal or vertical strips as tokens for computing inter- and intra-strip attention, enhancing the model's ability to capture R-peak positional information and likelihood. Additionally, a one-dimensional Manhattan distance-based NMS algorithm was used to minimize redundant detection frames, thereby enhancing model performance. The proposed model was rigorously evaluated on two publicly available datasets, INCART and LUDB, under varying noise conditions. On the INCART dataset, the model achieved F1 scores of 99.97%, 99.86%, 99.63%, and 98.00% at noise levels of Original, SNR = 10 dB, SNR = 5 dB, and SNR = 0 dB, respectively. Similarly, on the LUDB dataset, the F1 scores were 99.89%, 100%, 100%, and 99.86% for the corresponding noise levels. Extensive testing across multiple datasets and noise scenarios demonstrated that the proposed model outperformed existing state-of-the-art methods in terms of accuracy, noise robustness, and generalization capability.
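
The one-dimensional, Manhattan distance-based NMS step mentioned in this abstract can be sketched as a greedy filter over candidate peak positions. The abstract does not give the algorithm's details, so the following is only a minimal illustration under assumptions: each detection is reduced to a hypothetical `(position, score)` pair, the distance between two detections is `|x1 - x2|` (the Manhattan distance in one dimension), and `min_distance` is an invented threshold in samples.

```python
from typing import List, Tuple

Detection = Tuple[float, float]  # (position in samples, confidence score)


def nms_1d(detections: List[Detection], min_distance: float) -> List[Detection]:
    """Greedy 1D non-maximum suppression.

    Visit detections from highest to lowest score and keep one only if it is
    at least `min_distance` away from every detection already kept.
    """
    kept: List[Detection] = []
    for pos, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(abs(pos - kept_pos) >= min_distance for kept_pos, _ in kept):
            kept.append((pos, score))
    return sorted(kept)  # return in temporal order


# Hypothetical candidates: two clusters of overlapping detections.
candidates = [(100, 0.95), (103, 0.60), (98, 0.40), (350, 0.90), (352, 0.92)]
peaks = nms_1d(candidates, min_distance=50)
print(peaks)
```

With these sample values, each cluster collapses to its highest-scoring candidate, which is the redundancy reduction the abstract describes.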

Details DOI

AAAI Conference 2025 Conference Paper

SIGraph: Saliency Image-Graph Network for Retinal Disease Classification in Fundus Image

Peng Zhang
Yuan Li
Haotian Song
Yankai Jiang
Yubo Tao
Hai Lin
Hongguang Cui

An efficient and precise diagnosis of retinal diseases is a fundamental goal for auxiliary diagnostic systems in ophthalmology. Inspired by the importance of scattered subtle lesions in manual retinal disease diagnosis, recent research has achieved state-of-the-art performance by mining information related to subtle lesions, including their texture and shape. However, the spatial distribution patterns of subtle lesion areas, which are also crucial in manual diagnosis, have been overlooked in existing research. Neglecting these spatial distribution patterns (e.g., the ring distribution of microaneurysms in diabetic macular edema) may negatively impact the diagnostic process. In this paper, we introduce the Saliency-Image-Graph (SIGraph) network to capture the spatial distribution patterns of lesion areas. We first employ saliency-based perception to identify latent lesion pixels. Subsequently, we propose a novel image-graph block to efficiently capture the global distribution of abundant lesion pixels with minimal information loss. By leveraging additional distribution patterns, SIGraph achieves state-of-the-art performance with at least a 1.5% performance gain across three datasets. Furthermore, ablation studies demonstrate that our image-graph block can be integrated into other visual backbones and effectively boost performance.

PDF Details DOI

JBHI Journal 2025 Journal Article

Unsupervised Feature Selection-Driven Active Learning for Semi-Supervised Automatic ECG Analysis

Xiao Li
Yongkang Zhou
Songyang An
Yu Zeng
Xinqi Zhang
Jun Wang
Yizhe Huang
Fan Lin

Automatic analysis methods of electrocardiograms (ECGs) usually require large-scale annotated training data, but the annotation process is extremely time-consuming. While semi-supervised learning can leverage unlabeled data, its performance depends heavily on the quality of the initial labeled subset. Active learning has been used to identify the most informative samples for annotation, but conventional approaches face three critical limitations: (1) dependency on manual intervention for iterative query design, (2) prohibitive computational costs during sample selection, and (3) limited compatibility with semi-supervised learning frameworks. To address these limitations, we proposed an Unsupervised Active Feature-selective Semi-Supervised Learning (UAFSSL) framework for ECG analysis, including an unsupervised feature selection-based active learning module and a semi-supervised learning module. UAFSSL captures latent data distributions via unsupervised feature extraction, selects diverse and representative samples using pseudo-label clustering, and integrates seamlessly with semi-supervised learning to eliminate human intervention. We validated our algorithm on an ECG waveform segmentation task and an atrial fibrillation detection task. In the waveform segmentation task, our method improved the F1-score for P-wave delineation by 2.4% compared to random sampling, using only 5% of labeled samples. For the atrial fibrillation detection task, we evaluated our method on both the AFDB and a 24-hour dataset collected from 500 atrial fibrillation patients. Using only 200 labeled samples for model training, our method achieved AUC improvements of 2.5% and 2.2% over random sampling in five-fold cross validation. This is the first study to integrate unsupervised active learning with semi-supervised learning for automatic ECG analysis, offering a robust, automated solution to reduce annotation costs while enhancing clinical applicability.

Details DOI

JBHI Journal 2024 Journal Article

A Novel Feature Engineering Method Based on Latent Representation Learning for Radiomics: Application in NSCLC Subtype Classification

Fan Song
Jiaxin Tian
Peng Zhang
Chenbin Ma
Yangyang Sun
Youdan Feng
Tianyi Zhang
Yanli Lei

Radiomics refers to the high-throughput extraction of quantitative features from medical images, and is widely used to construct machine learning models for the prediction of clinical outcomes, while feature engineering is the most important work in radiomics. However, current feature engineering methods fail to fully and effectively utilize the heterogeneity of features when dealing with different kinds of radiomics features. In this work, latent representation learning is first presented as a novel feature engineering approach to reconstruct a set of latent space features from original shape, intensity and texture features. This proposed method projects features into a subspace called latent space, in which the latent space features are obtained by minimizing a unique hybrid loss function including a clustering-like loss and a reconstruction loss. The former ensures the separability among each class while the latter narrows the gap between the original features and latent space features. Experiments were performed on a multi-center non-small cell lung cancer (NSCLC) subtype classification dataset from 8 international open databases. Results showed that compared with four traditional feature engineering methods (baseline, PCA, Lasso and L2,1-norm minimization), latent representation learning could significantly improve the classification performance of various machine learning classifiers on the independent test set (all p < 0.001). Furthermore, on two additional test sets, latent representation learning also showed a significant improvement in generalization performance. Our research shows that latent representation learning is a more effective feature engineering method, which has the potential to be used as a general technology in a wide range of radiomics research.

Details DOI

JBHI Journal 2024 Journal Article

Auto Diagnosis of Parkinson's Disease Via a Deep Learning Model Based on Mixed Emotional Facial Expressions

Wei Huang
Wenqiang Xu
Renjie Wan
Peng Zhang
Yufei Zha
Meng Pang

Parkinson's disease (PD) is a common degenerative disease of the nervous system in the elderly. The early diagnosis of PD is very important for potential patients to receive prompt treatment and avoid the aggravation of the disease. Recent studies have found that PD patients always suffer from emotional expression disorder, thus forming the characteristics of “masked faces”. Based on this, we propose an automatic PD diagnosis method based on mixed emotional facial expressions. Specifically, the proposed method is cast into four steps: Firstly, we synthesize virtual face images containing six basic expressions (i.e., anger, disgust, fear, happiness, sadness, and surprise) via generative adversarial learning, in order to approximate the premorbid expressions of PD patients; Secondly, we design an effective screening scheme to assess the quality of the above synthesized facial expression images and then shortlist the high-quality ones; Thirdly, we train a deep feature extractor accompanied with a facial expression classifier based on the mixture of the original facial expression images of the PD patients, the high-quality synthesized facial expression images of PD patients, and the normal facial expression images from other public face datasets; Finally, we use the well-trained deep feature extractor to extract the latent expression features for six facial expression images of a potential PD patient to conduct PD/non-PD prediction. To show real-world impacts, we also collected a new facial expression dataset of PD patients in collaboration with a hospital. Extensive experiments are conducted to validate the effectiveness of the proposed method for PD diagnosis and facial expression recognition.

Details DOI

AAAI Conference 2024 Conference Paper

CcDPM: A Continuous Conditional Diffusion Probabilistic Model for Inverse Design

Yanxuan Zhao
Peng Zhang
Guopeng Sun
Zhigong Yang
Jianqiang Chen
Yueqing Wang

Engineering design methods aim to generate new designs that meet desired performance requirements. Past work has directly introduced conditional Generative Adversarial Networks (cGANs) into this field and achieved promising results in single-point design problems (one performance requirement under one working condition). However, these methods assume that the performance requirements are distributed in categorical space, which is not reasonable in these scenarios. Although Continuous conditional GANs (CcGANs) introduce Vicinal Risk Minimization (VRM) to reduce the performance loss caused by this assumption, they still face the following challenges: 1) CcGANs cannot handle multi-point design problems (multiple performance requirements under multiple working conditions). 2) Their training process is time-consuming due to the high computational complexity of the vicinal loss. To address these issues, a Continuous conditional Diffusion Probabilistic Model (CcDPM) is proposed, which for the first time introduces the diffusion model into the engineering design area and VRM into the diffusion model. CcDPM adopts a novel sampling method called multi-point design sampling to deal with multi-point design problems. Moreover, the k-d tree is used in the training process of CcDPM to shorten the calculation time of vicinal loss and speed up the training process by 2-300 times in our experiments. Experiments on a synthetic problem and three real-world design problems demonstrate that CcDPM outperforms the state-of-the-art GAN models.

PDF Details DOI

EAAI Journal 2024 Journal Article

MBRB: Micro-belief rule Base model based on cautious conjunctive rule for interpretable fault diagnosis

Chunchao Zhang
Zhijie Zhou
Pengyun Ning
Peng Zhang
Zheng Lian
Zhichao Ming

Details DOI

EAAI Journal 2024 Journal Article

Multi-attribute group decision-making method using single-valued neutrosophic credibility numbers with the Dombi extended power aggregation operator and its application in intelligent transportation system data collection scheme selection

Pingqing Liu
Junxin Shen
Peng Zhang

Details DOI

JBHI Journal 2024 Journal Article

Multi-Feature Decision Fusion Network for Heart Sound Abnormality Detection and Classification

Haobo Zhang
Peng Zhang
Zhiwei Wang
Lianying Chao
Yuting Chen
Qiang Li

The heart sound reflects the movement status of the cardiovascular system and contains the early pathological information of cardiovascular diseases. Automatic heart sound diagnosis plays an essential role in the early detection of cardiovascular diseases. In this study, we aim to develop a novel end-to-end heart sound abnormality detection and classification method, which can be adapted to different heart sound diagnosis tasks. Specifically, we developed a Multi-feature Decision Fusion Network (MDFNet) composed of a Multi-dimensional Feature Extraction (MFE) module and a Multi-dimensional Decision Fusion (MDF) module. The MFE module extracted spatial features, multi-level temporal features and spatial-temporal fusion features to learn heart sound characteristics from multiple perspectives. Through deep supervision and decision fusion, the MDF module made the multi-dimensional features extracted by the MFE module more discriminative, and fused the decision results of multi-dimensional features to integrate complementary information. Furthermore, attention modules were embedded in the MDFNet to emphasize the fundamental heart sounds containing effective feature information. Finally, we proposed an efficient data augmentation method to circumvent the diagnosis performance degradation caused by the lack of cardiac cycle segmentation in other end-to-end methods. The developed method achieved an overall accuracy of 94.44% and an F1-score of 86.90% on the binary classification task and an F1-score of 99.30% on the five-classification task. Our method outperformed other state-of-the-art methods and had good clinical application prospects.

Details DOI

AIJ Journal 2024 Journal Article

NovPhy: A physical reasoning benchmark for open-world AI systems

Vimukthini Pinto
Chathura Gamage
Cheng Xue
Peng Zhang
Ekaterina Nikonova
Matthew Stephenson
Jochen Renz

Details DOI

IJCAI Conference 2024 Conference Paper

On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models

Xinpeng Wang
Shitong Duan
Xiaoyuan Yi
Jing Yao
Shanlin Zhou
Zhihua Wei
Peng Zhang
Dongkuan Xu

Big models have achieved revolutionary breakthroughs in the field of AI, but they also pose potential ethical and societal risks to humans. Addressing such problems, alignment technologies were introduced to make these models conform to human preferences and values. Despite the considerable advancements in the past year, various challenges lie in establishing the optimal alignment strategy, such as data cost and scalable oversight, and how to align remains an open question. In this survey paper, we comprehensively investigate value alignment approaches. We first unpack the historical context of alignment tracing back to the 1920s (where it comes from), then delve into the mathematical essence of alignment (what it is), shedding light on the inherent challenges. Following this foundation, we provide a detailed examination of existing alignment methods, which fall into three categories: RL-based Alignment, SFT-based Alignment, and Inference-Time Alignment, and demonstrate their intrinsic connections, strengths, and limitations, helping readers better understand this research area. In addition, two emerging topics, alignment goal and multimodal alignment, are also discussed as novel frontiers in the field. Looking forward, we discuss potential alignment paradigms and how they could handle remaining challenges, prospecting where future alignment will go.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

PuLID: Pure and Lightning ID Customization via Contrastive Alignment

Zinan Guo
Yanze Wu
Zhuowei Chen
Lang Chen
Peng Zhang
Qian He

We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method for text-to-image generation. By incorporating a Lightning T2I branch with a standard diffusion one, PuLID introduces both contrastive alignment loss and accurate ID loss, minimizing disruption to the original model and ensuring high ID fidelity. Experiments show that PuLID achieves superior performance in both ID fidelity and editability. Another attractive property of PuLID is that the image elements (e.g., background, lighting, composition, and style) before and after the ID insertion are kept as consistent as possible. Codes and models are available at https://github.com/ToTheBeginning/PuLID

PDF Details DOI

ICML Conference 2024 Conference Paper

Quantum Implicit Neural Representations

Jiaming Zhao
Wenbo Qiao
Peng Zhang
Hui Gao

Implicit neural representations have emerged as a powerful paradigm to represent signals such as images and sounds. This approach aims to utilize neural networks to parameterize the implicit function of the signal. However, when representing implicit functions, traditional neural networks such as ReLU-based multilayer perceptrons face challenges in accurately modeling high-frequency components of signals. Recent research has begun to explore the use of Fourier Neural Networks (FNNs) to overcome this limitation. In this paper, we propose Quantum Implicit Representation Network (QIREN), a novel quantum generalization of FNNs. Furthermore, through theoretical analysis, we demonstrate that QIREN possesses a quantum advantage over classical FNNs. Lastly, we conducted experiments in signal representation, image superresolution, and image generation tasks to show the superior performance of QIREN compared to state-of-the-art (SOTA) models. Our work not only incorporates quantum advantages into implicit neural representations but also uncovers a promising application direction for Quantum Neural Networks.

Details

AAAI Conference 2024 Conference Paper

Quantum-Inspired Neural Network with Runge-Kutta Method

Zipeng Fan
Jing Zhang
Peng Zhang
Qianxi Lin
Hui Gao

In recent years, researchers have developed novel Quantum-Inspired Neural Network (QINN) frameworks for Natural Language Processing (NLP) tasks, inspired by the theoretical investigations of quantum cognition. However, we have found that the training efficiency of QINNs is significantly lower than that of classical networks. We analyze the unitary transformation modules of existing QINNs based on the time displacement symmetry of quantum mechanics and discover that their mathematical form resembles the first-order Euler method. The high truncation error associated with the Euler method affects the training efficiency of QINNs. In order to enhance the training efficiency of QINNs, we generalize QINNs' unitary transformation modules to the Quantum-like high-order Runge-Kutta methods (QRKs). Moreover, we present the results of experiments on conversation emotion recognition and text classification tasks to validate the effectiveness of the proposed approach.
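
The contrast this abstract draws between the first-order Euler method and higher-order Runge-Kutta methods can be made concrete on a toy problem. The sketch below is not the paper's QRK module; it assumes a single complex amplitude evolving under `dy/dt = -i*y`, whose exact solution `exp(-i*t)` has unit norm, and compares the error of Euler and classical fourth-order Runge-Kutta steps. The step size and step count are arbitrary illustrative choices.

```python
import cmath


def f(y: complex) -> complex:
    """Right-hand side of the toy equation dy/dt = -i * y."""
    return -1j * y


def euler_step(y: complex, h: float) -> complex:
    """First-order Euler update."""
    return y + h * f(y)


def rk4_step(y: complex, h: float) -> complex:
    """Classical fourth-order Runge-Kutta update."""
    k1 = f(y)
    k2 = f(y + 0.5 * h * k1)
    k3 = f(y + 0.5 * h * k2)
    k4 = f(y + h * k3)
    return y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(step, h: float, n: int) -> complex:
    """Apply `step` n times starting from y(0) = 1."""
    y = 1 + 0j
    for _ in range(n):
        y = step(y, h)
    return y


h, n = 0.1, 10  # integrate up to t = 1
exact = cmath.exp(-1j * h * n)
euler_error = abs(integrate(euler_step, h, n) - exact)
rk4_error = abs(integrate(rk4_step, h, n) - exact)
print(f"Euler error: {euler_error:.2e}, RK4 error: {rk4_error:.2e}")
```

At the same step size, the Euler result drifts away from the unit circle while the Runge-Kutta result stays several orders of magnitude closer to the exact value, which is the kind of truncation-error gap the abstract attributes to Euler-like unitary modules.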

PDF Details DOI

JAAMAS Journal 2023 Journal Article

Accelerating deep reinforcement learning via knowledge-guided policy network

Yuanqiang Yu
Peng Zhang
Jianye Hao

Deep reinforcement learning has contributed to dramatic advances in many tasks, such as playing games, controlling robots, and navigating complex environments. However, it requires many interactions with the environment. This is different from the human learning process since humans can use prior knowledge, which can significantly speed up the learning process as it avoids unnecessary exploration. Previous works integrating knowledge in RL did not model uncertainty in human cognition, which reduces the reliability of knowledge. In this paper, we propose a knowledge-guided policy network, a novel framework that combines suboptimal human knowledge with reinforcement learning. Our framework consists of a fuzzy rule controller representing human knowledge and a refined module to fine-tune suboptimal prior knowledge. The proposed framework is end-to-end and can be combined with existing reinforcement learning algorithms such as PPO, AC, and SAC. We conduct experiments on both discrete and continuous control tasks. The empirical results show that our approach, which combines suboptimal human knowledge and RL, significantly improves the learning efficiency of basic RL algorithms, even with very low-performance human prior knowledge. Additional experiments are conducted on the number of fuzzy rules and the interpretability of the policy, which make our proposed framework more complete and reasonable. The code for this research is released under the project page of https://github.com/yuyuanq/reinforcement-learning-using-knowledge-controller.

Details DOI

EAAI Journal 2023 Journal Article

DGFaceNet: Lightweight and efficient face recognition

Feng Zhao
Peng Zhang
Ran Zhang
Mengwei Li

Details DOI

EAAI Journal 2023 Journal Article

Efficient thermal infrared tracking with cross-modal compress distillation

Hangfei Li
Yufei Zha
Huanyu Li
Peng Zhang
Wei Huang

Details DOI

YNIMG Journal 2023 Journal Article

Improved sensitivity and microvascular weighting of 3T laminar fMRI with GE-BOLD using NORDIC and phase regression

Lasse Knudsen
Christopher J. Bailey
Jakob U. Blicher
Yan Yang
Peng Zhang
Torben E. Lund

Details DOI

EAAI Journal 2023 Journal Article

Inference and analysis on the evidential reasoning rule with time-lagged dependencies

Peng Zhang
Zhijie Zhou
Zhichao Feng
Jie Wang
Yijun Zhang

Details DOI

JBHI Journal 2023 Journal Article

KD-Informer: A Cuff-Less Continuous Blood Pressure Waveform Estimation Approach Based on Single Photoplethysmography

Chenbin Ma
Peng Zhang
Fan Song
Yangyang Sun
Guangda Fan
Tianyi Zhang
Youdan Feng
Guanglei Zhang

Ambulatory blood pressure (BP) monitoring plays a critical role in the early prevention and diagnosis of cardiovascular diseases. However, cuff-based inflatable devices cannot be used for continuous BP monitoring, while pulse transit time or multi-parameter-based methods require more bioelectrodes to acquire electrocardiogram signals. Thus, estimating the BP waveforms only based on photoplethysmography (PPG) signals for continuous BP monitoring has essential clinical values. Nevertheless, extracting useful features from raw PPG signals for fine-grained BP waveform estimation is challenging due to the physiological variation and noise interference. For single PPG analysis utilizing deep learning methods, the previous works depend mainly on stacked convolution operation, which ignores the underlying complementary time-dependent information. Thus, this work presents a novel Transformer-based method with knowledge distillation (KD-Informer) for BP waveform estimation. Meanwhile, we integrate the prior information of PPG patterns, selected by a novel backward elimination algorithm, into the knowledge transfer branch of the KD-Informer. With these strategies, the model can effectively capture the discriminative features through a lightweight architecture during the learning process. Then, we further adopt an effective transfer learning technique to demonstrate the excellent generalization capability of the proposed model using two independent multicenter datasets. Specifically, we first fine-tuned the KD-Informer with a large and high-quality dataset (Mindray dataset) and then transferred the pre-trained model to the target domain (MIMIC dataset). The experimental test results on the MIMIC dataset showed that the KD-Informer exhibited an estimation error of 0.02 ± 5.93 mmHg for systolic BP (SBP) and 0.01 ± 3.87 mmHg for diastolic BP (DBP), which complied with the Association for the Advancement of Medical Instrumentation (AAMI) standard. These results demonstrate that the KD-Informer has high reliability and elegant robustness to measure continuous BP waveforms.

Details DOI

YNICL Journal 2023 Journal Article

MRI histogram analysis of tumor-infiltrating CD8+ T cell levels in patients with glioblastoma

Caiqiang Xue
Qing Zhou
Peng Zhang
Bin Zhang
Qiu Sun
Shenglin Li
Juan Deng
Xianwang Liu

Details DOI

JBHI Journal 2023 Journal Article

MSHT: Multi-Stage Hybrid Transformer for the ROSE Image Analysis of Pancreatic Cancer

Tianyi Zhang
Yunlu Feng
Yu Zhao
Guangda Fan
Aiming Yang
Shangqing Lyu
Peng Zhang
Fan Song

Pancreatic cancer is one of the most malignant cancers with high mortality. The rapid on-site evaluation (ROSE) technique can significantly accelerate the diagnostic workflow of pancreatic cancer by immediately analyzing the fast-stained cytopathological images with on-site pathologists. However, the broader expansion of ROSE diagnosis has been hindered by the shortage of experienced pathologists. Deep learning has great potential for the automatic classification of ROSE images in diagnosis. But it is challenging to model the complicated local and global image features. The traditional convolutional neural network (CNN) structure can effectively extract spatial features, while it tends to ignore global features when the prominent local features are misleading. In contrast, the Transformer structure has excellent advantages in capturing global features and long-range relations, while it has limited ability in utilizing local features. We propose a multi-stage hybrid Transformer (MSHT) to combine the strengths of both, where a CNN backbone robustly extracts multi-stage local features at different scales as the attention guidance, and a Transformer encodes them for sophisticated global modeling. Going beyond the strength of each single method, the MSHT can simultaneously enhance the Transformer global modeling ability with the local guidance from CNN features. To evaluate the method in this unexplored field, a dataset of 4240 ROSE images is collected where MSHT achieves 95.68% in classification accuracy with more accurate attention regions. The distinctively superior results compared to the state-of-the-art models make MSHT extremely promising for cytopathological image analysis.

Details DOI

TCS Journal 2023 Journal Article

New algorithms for a simple measure of network partitioning

Xueyang Zhao
Binghao Yan
Peng Zhang

Details DOI

TCS Journal 2023 Journal Article

New approximation algorithms for the rooted Budgeted Cycle Cover problem

Jiangkun Li
Peng Zhang

Details DOI

AAAI Conference 2023 Conference Paper

ReGANIE: Rectifying GAN Inversion Errors for Accurate Real Image Editing

Bingchuan Li
Tianxiang Ma
Peng Zhang
Miao Hua
Wei Liu
Qian He
Zili Yi

The StyleGAN family succeed in high-fidelity image generation and allow for flexible and plausible editing of generated images by manipulating the semantic-rich latent style space. However, projecting a real image into its latent space encounters an inherent trade-off between inversion quality and editability. Existing encoder-based or optimization-based StyleGAN inversion methods attempt to mitigate the trade-off but see limited performance. To fundamentally resolve this problem, we propose a novel two-phase framework by designating two separate networks to tackle editing and reconstruction respectively, instead of balancing the two. Specifically, in Phase I, a W-space-oriented StyleGAN inversion network is trained and used to perform image inversion and editing, which assures the editability but sacrifices reconstruction quality. In Phase II, a carefully designed rectifying network is utilized to rectify the inversion errors and perform ideal reconstruction. Experimental results show that our approach yields near-perfect reconstructions without sacrificing the editability, thus allowing accurate manipulation of real images. Further, we evaluate the performance of our rectifying network, and see great generalizability towards unseen manipulation types and out-of-domain images.

PDF Details DOI

ICML Conference 2023 Conference Paper

SpeedDETR: Speed-aware Transformers for End-to-end Object Detection

Peiyan Dong
Zhenglun Kong
Xin Meng
Peng Zhang
Hao Tang 0005
Yanzhi Wang 0001
Chih-Hsien Chou

Vision Transformers (ViTs) have continuously achieved new milestones in object detection. However, the considerable computation and memory burden compromise their efficiency and generalization of deployment on resource-constrained devices. Besides, efficient transformer-based detectors designed by existing works can hardly achieve a realistic speedup, especially on multi-core processors (e.g., GPUs). The main issue is that the current literature solely concentrates on building algorithms with minimal computation, oblivious that the practical latency can also be affected by the memory access cost and the degree of parallelism. Therefore, we propose SpeedDETR, a novel speed-aware transformer for end-to-end object detectors, achieving high-speed inference on multiple devices. Specifically, we design a latency prediction model which can directly and accurately estimate the network latency by analyzing network properties, hardware memory access pattern, and degree of parallelism. Following the effective local-to-global visual modeling process and the guidance of the latency prediction model, we build our hardware-oriented architecture design and develop a new family of SpeedDETR. Experiments on the MS COCO dataset show SpeedDETR outperforms current DETR-based methods on Tesla V100. Acceptable inference speed can even be achieved on edge GPUs.

Details

YNIMG Journal 2023 Journal Article

White matter BOLD signals at 7 Tesla reveal visual field maps in optic radiation and vertical occipital fasciculus

Huan Wang
Xiaoxiao Wang
Yanming Wang
Du Zhang
Yan Yang
Yifeng Zhou
Bensheng Qiu
Peng Zhang

Details DOI

YNICL Journal 2022 Journal Article

Changes in sensory-related brain networks of patients with moyamoya disease with limb paresthesia: A resting-state fMRI-based functional connectivity analysis

Rujing Sun
Shi-Yu Zhang
Xu Cheng
Peng Zhang
Peng-Gang Qiao
Gong-Jie Li

Details DOI

AAAI Conference 2022 Conference Paper

CODE: Contrastive Pre-training with Adversarial Fine-Tuning for Zero-Shot Expert Linking

Bo Chen
Jing Zhang
Xiaokang Zhang
Xiaobin Tang
Lingfan Cai
Hong Chen
Cuiping Li
Peng Zhang

Expert finding, a popular service provided by many online websites such as Expertise Finder, LinkedIn, and AMiner, is beneficial to seeking candidate qualifications, consultants, and collaborators. However, its quality is suffered from lack of ample sources of expert information. This paper employs AMiner as the basis with an aim at linking any external experts to the counterparts on AMiner. As it is infeasible to acquire sufficient linkages from arbitrary external sources, we explore the problem of zero-shot expert linking. In this paper, we propose CODE, which first pre-trains an expert linking model by contrastive learning on AMiner such that it can capture the representation and matching patterns of experts without supervised signals, then it is fine-tuned between AMiner and external sources to enhance the model’s transferability in an adversarial manner. For evaluation, we first design two intrinsic tasks, author identification and paper clustering, to validate the representation and matching capability endowed by contrastive learning. Then the final external expert linking performance on two genres of external sources also implies the superiority of the adversarial fine-tuning method. Additionally, we show the online deployment of CODE, and continuously improve its online performance via active learning.

PDF Details

YNIMG Journal 2022 Journal Article

Corrigendum to “Laminar perfusion imaging with zoomed arterial spin labeling at 7 Tesla” [NeuroImage volume 245, 2021, 118724]

Xingfeng Shao
Fanhua Guo
Qinyang Shou
Kai Wang
Kay Jann
Lirong Yan
Arthur W. Toga
Peng Zhang

Details DOI

NeurIPS Conference 2022 Conference Paper

DART: Articulated Hand Model with Diverse Accessories and Rich Textures

Daiheng Gao
Yuliang Xiu
Kailin Li
Lixin Yang
Feng Wang
Peng Zhang
Bang Zhang
Cewu Lu

Hand, the bearer of human productivity and intelligence, is receiving much attention due to the recent fever of digital twins. Among different hand morphable models, MANO has been widely used in vision and graphics community. However, MANO disregards textures and accessories, which largely limits its power to synthesize photorealistic hand data. In this paper, we extend MANO with Diverse Accessories and Rich Textures, namely DART. DART is composed of 50 daily 3D accessories which varies in appearance and shape, and 325 hand-crafted 2D texture maps covers different kinds of blemishes or make-ups. Unity GUI is also provided to generate synthetic hand data with user-defined settings, e.g., pose, camera, background, lighting, textures, and accessories. Finally, we release DARTset, which contains large-scale (800K), high-fidelity synthetic hand images, paired with perfect-aligned 3D labels. Experiments demonstrate its superiority in diversity. As a complement to existing hand datasets, DARTset boosts the generalization in both hand pose estimation and mesh recovery tasks. Raw ingredients (textures, accessories), Unity GUI, source code and DARTset are publicly available at dart2022.github.io.

PDF Details

NeurIPS Conference 2022 Conference Paper

MorphTE: Injecting Morphology in Tensorized Embeddings

Guobing Gan
Peng Zhang
Sunzhu Li
Xiuqing Lu
Benyou Wang

In the era of deep learning, word embeddings are essential when dealing with text tasks. However, storing and accessing these embeddings requires a large amount of space. This is not conducive to the deployment of these models on resource-limited devices. Combining the powerful compression capability of tensor products, we propose a word embedding compression method with morphological augmentation, Morphologically-enhanced Tensorized Embeddings (MorphTE). A word consists of one or more morphemes, the smallest units that bear meaning or have a grammatical function. MorphTE represents a word embedding as an entangled form of its morpheme vectors via the tensor product, which injects prior semantic and grammatical knowledge into the learning of embeddings. Furthermore, the dimensionality of the morpheme vector and the number of morphemes are much smaller than those of words, which greatly reduces the parameters of the word embeddings. We conduct experiments on tasks such as machine translation and question answering. Experimental results on four translation datasets of different languages show that MorphTE can compress word embedding parameters by about 20 times without performance loss and significantly outperforms related embedding compression methods.

PDF Details

NeurIPS Conference 2022 Conference Paper

Parameter-free Dynamic Graph Embedding for Link Prediction

Jiahao Liu
Dongsheng Li
Hansu Gu
Tun Lu
Peng Zhang
Ning Gu

Dynamic interaction graphs have been widely adopted to model the evolution of user-item interactions over time. There are two crucial factors when modelling user preferences for link prediction in dynamic interaction graphs: 1) collaborative relationship among users and 2) user personalized interaction patterns. Existing methods often implicitly consider these two factors together, which may lead to noisy user modelling when the two factors diverge. In addition, they usually require time-consuming parameter learning with back-propagation, which is prohibitive for real-time user preference modelling. To this end, this paper proposes FreeGEM, a parameter-free dynamic graph embedding method for link prediction. Firstly, to take advantage of the collaborative relationships, we propose an incremental graph embedding engine to obtain user/item embeddings, which is an Online-Monitor-Offline architecture consisting of an Online module to approximately embed users/items over time, a Monitor module to estimate the approximation error in real time and an Offline module to calibrate the user/item embeddings when the online approximation errors exceed a threshold. Meanwhile, we integrate attribute information into the model, which enables FreeGEM to better model users belonging to some underrepresented groups. Secondly, we design a personalized dynamic interaction pattern modeller, which combines dynamic time decay with attention mechanism to model user short-term interests. Experimental results on two link prediction tasks show that FreeGEM can outperform the state-of-the-art methods in accuracy while achieving over 36X improvement in efficiency. All code and datasets can be found in https://github.com/FudanCISL/FreeGEM.

PDF Details

JBHI Journal 2022 Journal Article

Semi-Supervised Learning for Automatic Atrial Fibrillation Detection in 24-Hour Holter Monitoring

Peng Zhang
Yuting Chen
Fan Lin
Sifan Wu
Xiaoyun Yang
Qiang Li

Paroxysmal atrial fibrillation (AF) is generally diagnosed by long-term dynamic electrocardiogram (ECG) monitoring. Identifying AF episodes from long-term ECG data can place a heavy burden on clinicians. Many machine-learning-based automatic AF detection methods have been proposed to solve this issue. However, these methods require numerous annotated data to train the model, and the annotation of AF in long-term ECG is extremely time-consuming. Reducing the demand for labeled data can effectively improve the clinical practicability of automatic AF detection methods. In this study, we developed a novel semi-supervised learning method that generated modified low-entropy labels of unlabeled samples for training a deep learning model to automatically detect paroxysmal AF in 24 h Holter monitoring data. Our method employed a 1D CNN-LSTM neural network with RR intervals as input and used few labeled training data with numerous unlabeled data for training the neural network. This method was evaluated using a 24 h Holter monitoring dataset collected from 1000 paroxysmal AF patients. Using labeled samples from only 10 patients for model training, our method achieved a sensitivity of 97.8%, specificity of 97.9%, and accuracy of 97.9% in five-fold cross-validation. Compared to the supervised learning method with complete labeled samples, the detection accuracy of our method was only 0.5% lower, while the workload of data annotation was significantly reduced by more than 98%. In general, this is the first study to apply semi-supervised learning techniques for automatic AF detection using ECG. Our method can effectively reduce the demand for AF data annotations and can improve the clinical practicability of automatic AF detection.

Details DOI

IJCAI Conference 2022 Conference Paper

Subgraph Neighboring Relations Infomax for Inductive Link Prediction on Knowledge Graphs

Xiaohan Xu
Peng Zhang
Yongquan He
Chengpeng Chao
Chaoyang Yan

Inductive link prediction for knowledge graph aims at predicting missing links between unseen entities, those not shown in training stage. Most previous works learn entity-specific embeddings of entities, which cannot handle unseen entities. Recent several methods utilize enclosing subgraph to obtain inductive ability. However, all these works only consider the enclosing part of subgraph without complete neighboring relations, which leads to the issue that partial neighboring relations are neglected, and sparse subgraphs are hard to be handled. To address that, we propose Subgraph Neighboring Relations Infomax, SNRI, which sufficiently exploits complete neighboring relations from two aspects: neighboring relational feature for node feature and neighboring relational path for sparse subgraph. To further model neighboring relations in a global way, we innovatively apply mutual information (MI) maximization for knowledge graph. Experiments show that SNRI outperforms existing state-of-art methods by a large margin on inductive link prediction task, and verify the effectiveness of exploring complete neighboring relations in a global way to characterize node features and reason on sparse subgraphs.

PDF Details DOI

IJCAI Conference 2022 Conference Paper

Text/Speech-Driven Full-Body Animation

Wenlin Zhuang
Jinwei Qi
Peng Zhang
Bang Zhang
Ping Tan

Due to the increasing demand in films and games, synthesizing 3D avatar animation has attracted much attention recently. In this work, we present a production-ready text/speech-driven full-body animation synthesis system. Given the text and corresponding speech, our system synthesizes face and body animations simultaneously, which are then skinned and rendered to obtain a video stream output. We adopt a learning-based approach for synthesizing facial animation and a graph-based approach to animate the body, which generates high-quality avatar animation efficiently and robustly. Our results demonstrate the generated avatar animations are realistic, diverse and highly text/speech-correlated.

PDF Details DOI

AAAI Conference 2022 Conference Paper

Towards Explainable Action Recognition by Salient Qualitative Spatial Object Relation Chains

Hua Hua
Dongxu Li
Ruiqi Li
Peng Zhang
Jochen Renz
Anthony Cohn

In order to be trusted by humans, Artificial Intelligence agents should be able to describe rationales behind their decisions. One such application is human action recognition in critical or sensitive scenarios, where trustworthy and explainable action recognizers are expected. For example, reliable pedestrian action recognition is essential for self-driving cars and explanations for real-time decision making are critical for investigations if an accident happens. In this regard, learning-based approaches, despite their popularity and accuracy, are disadvantageous due to their limited interpretability. This paper presents a novel neuro-symbolic approach that recognizes actions from videos with human-understandable explanations. Specifically, we first propose to represent videos symbolically by qualitative spatial relations between objects called qualitative spatial object relation chains. We further develop a neural saliency estimator to capture the correlation between such object relation chains and the occurrence of actions. Given an unseen video, this neural saliency estimator is able to tell which object relation chains are more important for the action recognized. We evaluate our approach on two real-life video datasets, with respect to recognition accuracy and the quality of generated action explanations. Experiments show that our approach achieves superior performance on both aspects to previous symbolic approaches, thus facilitating trustworthy intelligent decision making. Our approach can be used to augment state-of-the-art learning approaches with explainability.

PDF Details

TCS Journal 2021 Journal Article

Approximating Max k-Uncut via LP-rounding plus greed, with applications to Densest k-Subgraph

Peng Zhang
Zhendong Liu

Details DOI

AAAI Conference 2021 Conference Paper

Continuous Self-Attention Models with Neural ODE Networks

Jing Zhang
Peng Zhang
Baiwen Kong
Junqiu Wei
Xin Jiang

Stacked self-attention models receive widespread attention, due to its ability of capturing global dependency among words. However, the stacking of many layers and components generates huge parameters, leading to low parameter efficiency. In response to this issue, we propose a lightweight architecture named Continuous Self-Attention models with neural ODE networks (CSAODE). In CSAODE, continuous dynamical models (i.e., neural ODEs) are coupled with our proposed self-attention block to form a self-attention ODE solver. This solver continuously calculates and optimizes the hidden states via only one layer of parameters to improve the parameter efficiency. In addition, we design a novel accelerated continuous dynamical model to reduce computing costs, and integrate it in CSAODE. Moreover, since the original self-attention ignores local information, CSAODE makes use of N-gram convolution to encode local representations, and a fusion layer with only two trainable scalars are designed for generating sentence vectors. We perform a series of experiments on text classification, natural language inference (NLI) and text matching tasks. With fewer parameters, CSAODE outperforms state-of-the-art models on text classification tasks (e.g., 1.3% accuracy improved on SUBJ task), and has competitive performances for NLI and text matching tasks as well.

PDF Details

IJCAI Conference 2021 Conference Paper

HIP Network: Historical Information Passing Network for Extrapolation Reasoning on Temporal Knowledge Graph

Yongquan He
Peng Zhang
Luchen Liu
Qi Liang
Wenyuan Zhang
Chuang Zhang

In recent years, temporal knowledge graph (TKG) reasoning has received significant attention. Most existing methods assume that all timestamps and corresponding graphs are available during training, which makes it difficult to predict future events. To address this issue, recent works learn to infer future events based on historical information. However, these methods do not comprehensively consider the latent patterns behind temporal changes, to pass historical information selectively, update representations appropriately and predict events accurately. In this paper, we propose the Historical Information Passing (HIP) network to predict future events. HIP network passes information from temporal, structural and repetitive perspectives, which are used to model the temporal evolution of events, the interactions of events at the same time step, and the known events respectively. In particular, our method considers the updating of relation representations and adopts three scoring functions corresponding to the above dimensions. Experimental results on five benchmark datasets show the superiority of HIP network, and the significant improvements on Hits@1 prove that our method can more accurately predict what is going to happen.

PDF Details DOI

YNIMG Journal 2021 Journal Article

Laminar perfusion imaging with zoomed arterial spin labeling at 7 Tesla

Xingfeng Shao
Fanhua Guo
Qinyang Shou
Kai Wang
Kay Jann
Lirong Yan
Arthur W. Toga
Peng Zhang

Details DOI

IJCAI Conference 2021 Conference Paper

MDNN: A Multimodal Deep Neural Network for Predicting Drug-Drug Interaction Events

Tengfei Lyu
Jianliang Gao
Ling Tian
Zhao Li
Peng Zhang
Ji Zhang

The interaction of multiple drugs could lead to serious events, which causes injuries and huge medical costs. Accurate prediction of drug-drug interaction (DDI) events can help clinicians make effective decisions and establish appropriate therapy programs. Recently, many AI-based techniques have been proposed for predicting DDI associated events. However, most existing methods pay less attention to the potential correlations between DDI events and other multimodal data such as targets and enzymes. To address this problem, we propose a Multimodal Deep Neural Network (MDNN) for DDI events prediction. In MDNN, we design a two-pathway framework including drug knowledge graph (DKG) based pathway and heterogeneous feature (HF) based pathway to obtain drug multimodal representations. Finally, a multimodal fusion neural layer is designed to explore the complementary among the drug multimodal representations. We conduct extensive experiments on real-world dataset. The results show that MDNN can accurately predict DDI events and outperform the state-of-the-art models.

PDF Details DOI

TIST Journal 2021 Journal Article

TARA-Net: A Fusion Network for Detecting Takeaway Rider Accidents

Yifan He
Zhao Li
Lei Fu
Anhui Wang
Peng Zhang
Shuigeng Zhou
Ji Zhang
Ting Yu

In the emerging business of food delivery, rider traffic accidents raise financial cost and social traffic burden. Although there has been much effort on traffic accident forecasting using temporal-spatial prediction models, none of the existing work studies the problem of detecting the takeaway rider accidents based on food delivery trajectory data. In this article, we aim to detect whether a takeaway rider meets an accident on a certain time period based on trajectories of food delivery and riders’ contextual information. The food delivery data has a heterogeneous information structure and carries contextual information such as weather and delivery history, and trajectory data are collected as a spatial-temporal sequence. In this article, we propose a TakeAway Rider Accident detection fusion network TARA-Net to jointly model these heterogeneous and spatial-temporal sequence data. We utilize the residual network to extract basic contextual information features and take advantage of a transformer encoder to capture trajectory features. These embedding features are concatenated into a pyramidal feed-forward neural network. We jointly train the above three components to combine the benefits of spatial-temporal trajectory data and sparse basic contextual data for early detecting traffic accidents. Furthermore, although traffic accidents rarely happen in food delivery, we propose a sampling mechanism to alleviate the imbalance of samples when training the model. We evaluate the model on a transportation mode classification dataset Geolife and a real-world Ele.me dataset with over 3 million riders. The experimental results show that the proposed model is superior to the state-of-the-art.

Details DOI

YNIMG Journal 2021 Journal Article

Ultra-high field fMRI reveals origins of feedforward and feedback activity within laminae of human ocular dominance columns

Gilles de Hollander
Wietske van der Zwaag
Chencan Qian
Peng Zhang
Tomas Knapen

Details DOI

YNIMG Journal 2021 Journal Article

Visual adaptation and 7T fMRI reveal facial identity processing in the human brain under shallow interocular suppression

Runnan Cao
Chencan Qian
Shiwen Ren
Zhifen He
Sheng He
Peng Zhang

Details DOI

EAAI Journal 2020 Journal Article

Collaborative weighted multi-view feature extraction

Jinxin Zhang
Peng Zhang
Liming Liu
Naiyang Deng
Ling Jing

Details DOI

IJCAI Conference 2020 Conference Paper

Discrete Embedding for Latent Networks

Hong Yang
Ling Chen
Minglong Lei
Lingfeng Niu
Chuan Zhou
Peng Zhang

Discrete network embedding emerged recently as a new direction of network representation learning. Compared with traditional network embedding models, discrete network embedding aims to compress model size and accelerate model inference by learning a set of short binary codes for network vertices. However, existing discrete network embedding methods usually assume that the network structures (e.g., edge weights) are readily available. In real-world scenarios such as social networks, sometimes it is impossible to collect explicit network structure information and it usually needs to be inferred from implicit data such as information cascades in the networks. To address this issue, we present an end-to-end discrete network embedding model for latent networks DELN that can learn binary representations from underlying information cascades. The essential idea is to infer a latent Weisfeiler-Lehman proximity matrix that captures node dependence based on information cascades and then to factorize the latent Weisfeiler-Lehman matrix under the binary node representation constraint. Since the learning problem is a mixed integer optimization problem, an efficient maximal likelihood estimation based cyclic coordinate descent (MLE-CCD) algorithm is used as the solution. Experiments on real-world datasets show that the proposed model outperforms the state-of-the-art network embedding methods.

PDF Details DOI

IJCAI Conference 2020 Conference Paper

Graph Neural Architecture Search

Yang Gao
Hong Yang
Peng Zhang
Chuan Zhou
Yue Hu

Graph neural networks (GNNs) emerged recently as a powerful tool for analyzing non-Euclidean data such as social network data. Despite their success, the design of graph neural networks requires heavy manual work and domain knowledge. In this paper, we present a graph neural architecture search method (GraphNAS) that enables automatic design of the best graph neural architecture based on reinforcement learning. Specifically, GraphNAS uses a recurrent network to generate variable-length strings that describe the architectures of graph neural networks, and trains the recurrent network with policy gradient to maximize the expected accuracy of the generated architectures on a validation data set. Furthermore, to improve the search efficiency of GraphNAS on big networks, GraphNAS restricts the search space from an entire architecture space to a sequential concatenation of the best search results built on each single architecture layer. Experiments on real-world datasets demonstrate that GraphNAS can design a novel network architecture that rivals the best human-invented architecture in terms of validation set accuracy. Moreover, in a transfer learning task we observe that graph neural architectures designed by GraphNAS, when transferred to new datasets, still gain improvement in terms of prediction accuracy.

PDF Details DOI

IJCAI Conference 2020 Conference Paper

KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge

Peng Zhang
Jianye Hao
Weixun Wang
Hongyao Tang
Yi Ma
Yihai Duan
Yan Zheng

Reinforcement learning agents usually learn from scratch, which requires a large number of interactions with the environment. This is quite different from the learning process of human. When faced with a new task, human naturally have the common sense and use the prior knowledge to derive an initial policy and guide the learning process afterwards. Although the prior knowledge may be not fully applicable to the new task, the learning process is significantly sped up since the initial policy ensures a quick-start of learning and intermediate guidance allows to avoid unnecessary exploration. Taking this inspiration, we propose knowledge guided policy network (KoGuN), a novel framework that combines human prior suboptimal knowledge with reinforcement learning. Our framework consists of a fuzzy rule controller to represent human knowledge and a refine module to finetune suboptimal prior knowledge. The proposed framework is end-to-end and can be combined with existing policy-based reinforcement learning algorithm. We conduct experiments on several control tasks. The empirical results show that our approach, which combines suboptimal human knowledge and RL, achieves significant improvement on learning efficiency of flat RL algorithms, even with very low-performance human prior knowledge.

PDF Details DOI

YNIMG Journal 2020 Journal Article

Neural mechanisms of AVPR1A RS3-RS1 haplotypes that impact verbal learning and memory

Yan Zhang
Dan Zhu
Peng Zhang
Wei Li
Wen Qin
Feng Liu
Jiayuan Xu
Qiang Xu

Details DOI

IJCAI Conference 2020 Conference Paper

Overcoming Language Priors with Self-supervised Learning for Visual Question Answering

Xi Zhu
Zhendong Mao
Chunxiao Liu
Peng Zhang
Bin Wang
Yongdong Zhang

Most Visual Question Answering (VQA) models suffer from the language prior problem, which is caused by inherent data biases. Specifically, VQA models tend to answer questions (e.g., what color is the banana?) based on the high-frequency answers (e.g., yellow) ignoring image contents. Existing approaches tackle this problem by creating delicate models or introducing additional visual annotations to reduce question dependency and strengthen image dependency. However, they are still subject to the language prior problem since the data biases have not been fundamentally addressed. In this paper, we introduce a self-supervised learning framework to solve this problem. Concretely, we first automatically generate labeled data to balance the biased data, and then propose a self-supervised auxiliary task to utilize the balanced data to assist the VQA model to overcome language priors. Our method can compensate for the data biases by generating balanced data without introducing external annotations. Experimental results show that our method achieves state-of-the-art performance, improving the overall accuracy from 49.50% to 57.59% on the most commonly used benchmark VQA-CP v2. In other words, we can increase the performance of annotation-based methods by 16% without using external annotations. Our code is available on GitHub.

PDF Details DOI

AAAI Conference 2019 Conference Paper

A Generalized Language Model in Tensor Space

Lipeng Zhang
Peng Zhang
Xindian Ma
Shuqin Gu
Zhan Su
Dawei Song

Sentence semantic matching requires an agent to determine the semantic relation between two sentences, which is widely used in various natural language tasks such as Natural Language Inference (NLI) and Paraphrase Identification (PI). Among all matching methods, attention mechanism plays an important role in capturing the semantic relations and properly aligning the elements of two sentences. Previous methods utilized attention mechanism to select important parts of sentences at one time. However, the important parts of the sentence during semantic matching are dynamically changing with the degree of sentence understanding. Selecting the important parts at one time may be insufficient for semantic understanding. To this end, we propose a Dynamic Re-read Network (DRr-Net) approach for sentence semantic matching, which is able to pay close attention to a small region of sentences at each step and re-read the important words for better sentence semantic understanding. To be specific, we first employ Attention Stack-GRU (ASG) unit to model the original sentence repeatedly and preserve all the information from bottom-most word embedding input to up-most recurrent output. Second, we utilize Dynamic Re-read (DRr) unit to pay close attention to one important word at one time with the consideration of learned information and re-read the important words for better sentence semantic understanding. Extensive experiments on three sentence matching benchmark datasets demonstrate that DRr-Net has the ability to model sentence semantic more precisely and significantly improve the performance of sentence semantic matching. In addition, it is very interesting that some of finding in our experiments are consistent with the findings of psychological research.

PDF Details

NeurIPS Conference 2019 Conference Paper

A Tensorized Transformer for Language Modeling

Xindian Ma
Peng Zhang
Shuai Zhang
Nan Duan
Yuexian Hou
Ming Zhou
Dawei Song

Latest development of neural models has connected the encoder and decoder through a self-attention mechanism. In particular, Transformer, which is solely based on self-attention, has led to breakthroughs in Natural Language Processing (NLP) tasks. However, the multi-head attention mechanism, as a key component of Transformer, limits the effective deployment of the model to a resource-limited setting. In this paper, based on the ideas of tensor decomposition and parameters sharing, we propose a novel self-attention model (namely Multi-linear attention) with Block-Term Tensor Decomposition (BTD). We test and verify the proposed attention method on three language modeling tasks (i.e., PTB, WikiText-103 and One-billion) and a neural machine translation task (i.e., WMT-2016 English-German). Multi-linear attention can not only largely compress the model parameters but also obtain performance improvements, compared with a number of language modeling approaches, such as Transformer, Transformer-XL, and Transformer with tensor train decomposition.

PDF Details

IJCAI Conference 2019 Conference Paper

Low-Bit Quantization for Attributed Network Representation Learning

Hong Yang
Shirui Pan
Ling Chen
Chuan Zhou
Peng Zhang

Attributed network embedding plays an important role in transferring network data into compact vectors for effective network analysis. Existing attributed network embedding models are designed either in continuous Euclidean spaces which introduce data redundancy or in binary coding spaces which incur significant loss of representation accuracy. To this end, we present a new Low-Bit Quantization for Attributed Network Representation Learning model (LQANR for short) that can learn compact node representations with low bitwidth values while preserving high representation accuracy. Specifically, we formulate a new representation learning function based on matrix factorization that can jointly learn the low-bit node representations and the layer aggregation weights under the low-bit quantization constraint. Because the new learning function falls into the category of mixed integer optimization, we propose an efficient mixed-integer based alternating direction method of multipliers (ADMM) algorithm as the solution. Experiments on real-world node classification and link prediction tasks validate the promising results of the proposed LQANR model.

PDF Details

IJCAI Conference 2019 Conference Paper

Quantum-Inspired Interactive Networks for Conversational Sentiment Analysis

Yazhou Zhang
Qiuchi Li
Dawei Song
Peng Zhang
Panpan Wang

Conversational sentiment analysis is an emerging, yet challenging Artificial Intelligence (AI) subtask. It aims to discover the affective state of each participant in a conversation. There exists a wealth of interaction information that affects the sentiment of speakers. However, the existing sentiment analysis approaches are insufficient in dealing with this task due to ignoring the interactions and dependency relationships between utterances. In this paper, we aim to address this issue by modeling intra-utterance and inter-utterance interaction dynamics. We propose an approach called quantum-inspired interactive networks (QIN), which leverages the mathematical formalism of quantum theory (QT) and the long short term memory (LSTM) network, to learn such interaction dynamics. Specifically, a density matrix based convolutional neural network (DM-CNN) is proposed to capture the interactions within each utterance (i.e., the correlations between words), and a strong-weak influence model inspired by quantum measurement theory is developed to learn the interactions between adjacent utterances (i.e., how one speaker influences another). Extensive experiments are conducted on the MELD and IEMOCAP datasets. The experimental results demonstrate the effectiveness of the QIN model.

PDF Details

IJCAI Conference 2019 Conference Paper

Robust Embedding with Multi-Level Structures for Link Prediction

Zihan Wang
Zhaochun Ren
Chunyu He
Peng Zhang
Yue Hu

Knowledge Graph (KG) embedding has become crucial for the task of link prediction. Recent work applies encoder-decoder models to tackle this problem, where an encoder is formulated as a graph neural network (GNN) and a decoder is represented by an embedding method. These approaches enforce embedding techniques with structure information. Unfortunately, existing GNN-based frameworks still confront 3 severe problems: low representational power, stacking in a flat way, and poor robustness to noise. In this work, we propose a novel multi-level graph neural network (M-GNN) to address the above challenges. We first identify an injective aggregate scheme and design a powerful GNN layer using multi-layer perceptrons (MLPs). Then, we define graph coarsening schemes for various kinds of relations, and stack GNN layers on a series of coarsened graphs, so as to model hierarchical structures. Furthermore, attention mechanisms are adopted so that our approach can make predictions accurately even on the noisy knowledge graph. Results on WN18 and FB15k datasets show that our approach is effective in the standard link prediction task, significantly and consistently outperforming competitive baselines. Furthermore, robustness analysis on FB15k-237 dataset demonstrates that our proposed M-GNN is highly robust to sparsity and noise.

PDF Details

TCS Journal 2018 Journal Article

A quantum-inspired multimodal sentiment analysis framework

Yazhou Zhang
Dawei Song
Peng Zhang
Panpan Wang
Jingfei Li
Xiang Li
Benyou Wang

Details DOI

TCS Journal 2018 Journal Article

Approximation and hardness results for the Max k-Uncut problem

Peng Zhang
Chenchen Wu
Dachuan Xu

Details DOI

TCS Journal 2018 Journal Article

Computing and estimating the volume of the solution space of SMT(LA) constraints

Cunjing Ge
Feifei Ma
Peng Zhang
Jian Zhang

Details DOI

AAAI Conference 2018 Conference Paper

End-to-End Quantum-like Language Models with Application to Question Answering

Peng Zhang
Jiabin Niu
Zhan Su
Benyou Wang
Liqun Ma
Dawei Song

Language Modeling (LM) is a fundamental research topic in a range of areas. Recently, inspired by quantum theory, a novel Quantum Language Model (QLM) has been proposed for Information Retrieval (IR). In this paper, we aim to broaden the theoretical and practical basis of QLM. We develop a Neural Network based Quantum-like Language Model (NNQLM) and apply it to Question Answering. Specifically, based on word embeddings, we design a new density matrix, which represents a sentence (e.g., a question or an answer) and encodes a mixture of semantic subspaces. Such a density matrix, together with a joint representation of the question and the answer, can be integrated into neural network architectures (e.g., 2-dimensional convolutional neural networks). Experiments on the TREC-QA and WIKIQA datasets have verified the effectiveness of our proposed models.

PDF Details

TIST Journal 2017 Journal Article

A Distribution Separation Method Using Irrelevance Feedback Data for Information Retrieval

Peng Zhang
Qian Yu
Yuexian Hou
Dawei Song
Jingfei Li
Bin Hu

In many research and application areas, such as information retrieval and machine learning, we often encounter dealing with a probability distribution that is mixed by one distribution that is relevant to our task in hand and the other that is irrelevant and that we want to get rid of. Thus, it is an essential problem to separate the irrelevant distribution from the mixture distribution. This article is focused on the application in Information Retrieval, where relevance feedback is a widely used technique to build a refined query model based on a set of feedback documents. However, in practice, the relevance feedback set, even provided by users explicitly or implicitly, is often a mixture of relevant and irrelevant documents. Consequently, the resultant query model (typically a term distribution) is often a mixture rather than a true relevance term distribution, leading to a negative impact on the retrieval performance. To tackle this problem, we recently proposed a Distribution Separation Method (DSM), which aims to approximate the true relevance distribution by separating a seed irrelevance distribution from the mixture one. While it achieved a promising performance in an empirical evaluation with simulated explicit irrelevance feedback data, it has not been deployed in the scenario where one should automatically obtain the irrelevance feedback data. In this article, we propose a substantial extension of the basic DSM from two perspectives: developing a further regularization framework and deploying DSM in the automatic irrelevance feedback scenario. Specifically, in order to avoid the output distribution of DSM drifting away from the true relevance distribution when the quality of seed irrelevant distribution (as the input to DSM) is not guaranteed, we propose a DSM regularization framework to constrain the estimation for the relevance distribution. 
This regularization framework includes three algorithms, each corresponding to a regularization strategy incorporated in the objective function of DSM. In addition, we exploit DSM in automatic (i.e., pseudo) irrelevance feedback, by automatically detecting the seed irrelevant documents via three different document reranking methods. We have carried out extensive experiments based on various TREC datasets, in order to systematically evaluate the proposed methods. The experimental results demonstrate the effectiveness of our proposed approaches in comparison with various strong baselines.

Details DOI

TCS Journal 2017 Journal Article

A probabilistic study of generalized solution concepts in satisfiability testing and constraint programming

Peng Zhang
Yong Gao

Details DOI

IS Journal 2017 Journal Article

Collective Hyping Detection System for Identifying Online Spam Activities

Qinzhe Zhang
Jia Wu
Peng Zhang
Guodong Long
Chengqi Zhang

Although existing antispam strategies detect traditional spam activities effectively, evolving spam schemes can successfully cheat conventional testing by buying the comments that are written by genuine users and sold by specific web markets. Such spam activities turn into a kind of advertising campaign among business owners to maintain their rank in top positions. This article proposes a new collaborative marketing hyping detection solution that aims to identify spam comments generated by the Spam Reviewer Cloud and detect products that adopt an evolving spam strategy for promotion. The authors propose an unsupervised learning model that combines heterogeneous product review networks in an attempt to discover collective hyping activities. Their experiments validate the existence of the collaborative marketing hyping activities on a real-life e-commerce platform and demonstrate that their model can effectively and accurately identify these advanced spam activities.

Details DOI

TCS Journal 2016 Journal Article

A new approximation algorithm for the unbalanced Min s–t Cut problem

Peng Zhang

Details DOI

AAAI Conference 2016 Conference Paper

Angry Birds as a Challenge for Artificial Intelligence

Jochen Renz
XiaoYu Ge
Rohan Verma
Peng Zhang

The Angry Birds AI Competition has been held annually since 2012 in conjunction with some of the major AI conferences, most recently with IJCAI 2015. The goal of the competition is to build AI agents that can play new Angry Birds levels as good as or better than the best human players. Successful agents should be able to quickly analyze new levels and to predict physical consequences of possible actions in order to select actions that solve a given level with a high score. Agents have no access to the game internal physics, but only receive screenshots of the live game. In this paper we describe why this problem is a challenge for AI, and why it is an important step towards building AI that can successfully interact with the real world. We also summarise some highlights of past competitions, including a new competition track we introduced recently.

PDF Details

AAAI Conference 2016 Conference Paper

Direct Discriminative Bag Mapping for Multi-Instance Learning

Jia Wu
Shirui Pan
Peng Zhang
Xingquan Zhu

Multi-instance learning (MIL) is useful for tackling labeling ambiguity in learning tasks, by allowing a bag of instances to share one label. Recently, bag mapping methods, which transform a bag to a single instance in a new space via instance selection, have drawn significant attentions. To date, most existing works are developed based on the original space, i.e., utilizing all instances for bag mapping, and instance selection is indirectly tied to the MIL objective. As a result, it is hard to guarantee the distinguish capacity of the selected instances in the new bag mapping space for MIL. In this paper, we propose a direct discriminative mapping approach for multi-instance learning (MILDM), which identifies instances to directly distinguish bags in the new mapping space. Experiments and comparisons on real-world learning tasks demonstrate the algorithm performance.

PDF Details

AAAI Conference 2016 Conference Paper

Iterative Project Quasi-Newton Algorithm for Training RBM

Shuai Mi
Xiaozhao Zhao
Yuexian Hou
Peng Zhang
Wenjie Li
Dawei Song

The restricted Boltzmann machine (RBM) has been used as building blocks for many successful deep learning models, e.g., deep belief networks (DBN) and deep Boltzmann machine (DBM) etc. The training of RBM can be extremely slow in pathological regions. The second order optimization methods, such as quasi-Newton methods, were proposed to deal with this problem. However, the non-convexity results in many obstructions for training RBM, including the infeasibility of applying second order optimization methods. In order to overcome this obstruction, we introduce an em-like iterative project quasi-Newton (IPQN) algorithm. Specifically, we iteratively perform the sampling procedure where it is not necessary to update parameters, and the sub-training procedure that is convex. In sub-training procedures, we apply quasi-Newton methods to deal with the pathological problem. We further show that Newton’s method turns out to be a good approximation of the natural gradient (NG) method in RBM training. We evaluate IPQN in a series of density estimation experiments on the artificial dataset and the MNIST digit dataset. Experimental results indicate that IPQN achieves an improved convergent performance over the traditional CD method.

PDF Details

AAAI Conference 2016 Conference Paper

On the Minimum Differentially Resolving Set Problem for Diffusion Source Inference in Networks

Chuan Zhou
Wei-Xue Lu
Peng Zhang
Jia Wu
Yue Hu
Li Guo

In this paper we theoretically study the minimum Differentially Resolving Set (DRS) problem derived from the classical sensor placement optimization problem in network source locating. A DRS of a graph G = (V, E) is defined as a subset S ⊆ V where any two elements in V can be distinguished by their different differential characteristic sets defined on S. The minimum DRS problem aims to find a DRS S in the graph G with minimum total weight ∑_{v∈S} w(v). In this paper we establish a group of Integer Linear Programming (ILP) models as the solution. By the weighted set cover theory, we propose an approximation algorithm with the Θ(ln n) approximability for the minimum DRS problem on general graphs, where n is the graph size.

PDF Details

TCS Journal 2016 Journal Article

The label cut problem with respect to path length and label frequency

Peng Zhang
Bin Fu

Details DOI

IJCAI Conference 2016 Conference Paper

Trend-Based Prediction of Spatial Change

XiaoYu Ge
Jae Hee Lee
Jochen Renz
Peng Zhang

The capability to predict changes of spatial regions is important for an intelligent system that interacts with the physical world. For example, in a disaster management scenario, predicting potentially endangered areas and inferring safe zones is essential for planning evacuations and countermeasures. Existing approaches usually predict such spatial changes by simulating the physical world based on specific models. Thus, these simulation-based methods will not be able to provide reliable predictions when the scenario is not similar to any of the models in use or when the input parameters are incomplete. In this paper, we present a prediction approach that overcomes the aforementioned problem by using a more general model and by analysing the trend of the spatial changes. The method is also flexible to adopt to new observations and to adapt its prediction to new situations.

PDF Details

TCS Journal 2015 Journal Article

Algorithmic aspects of homophyly of networks

Peng Zhang
Angsheng Li

Details DOI

JBHI Journal 2015 Journal Article

An Implantable RFID Sensor Tag toward Continuous Glucose Monitoring

Zhibin Xiao
Xi Tan
Xianliang Chen
Sizheng Chen
Zijian Zhang
Hualei Zhang
Junyu Wang
Yue Huang

This paper presents a wirelessly powered implantable electrochemical sensor tag for continuous blood glucose monitoring. The system is remotely powered by a 13.56-MHz inductive link and utilizes an ISO 15693 radio frequency identification (RFID) standard for communication. This paper provides reliable and accurate measurement for changing glucose level. The sensor tag employs a long-term glucose sensor, a winding ferrite antenna, an RFID front-end, a potentiostat, a 10-bit sigma-delta analog to digital converter, an on-chip temperature sensor, and a digital baseband for protocol processing and control. A high-frequency external reader is used to power, command, and configure the sensor tag. The only off-chip support circuitry required is a tuned antenna and a glucose microsensor. The integrated chip fabricated in SMIC 0.13-μm CMOS process occupies an area of 1.2 mm × 2 mm and consumes 50 μW. The power sensitivity of the whole system is -4 dBm. The sensor tag achieves a measured glucose range of 0-30 mM with a sensitivity of 0.75 nA/mM.

Details DOI

TCS Journal 2015 Journal Article

Computing on binary strings

Tian-Ming Bu
Chen Yuan
Peng Zhang

Details DOI

IJCAI Conference 2015 Conference Paper

From Raw Sensor Data to Detailed Spatial Knowledge

Peng Zhang
Jae Hee Lee
Jochen Renz

Qualitative spatial reasoning deals with relational spatial knowledge and with how this knowledge can be processed efficiently. Identifying suitable representations for spatial knowledge and checking whether the given knowledge is consistent has been the main research focus in the past two decades. However, where the spatial information comes from, what kind of information can be obtained and how it can be obtained has been largely ignored. This paper is an attempt to start filling this gap. We present a method for extracting detailed spatial information from sensor measurements of regions. We analyse how different sparse sensor measurements can be integrated and what spatial information can be extracted from sensor measurements. Different from previous approaches to qualitative spatial reasoning, our method allows us to obtain detailed information about the internal structure of regions. The result has practical implications, for example, in disaster management scenarios, which include identifying the safe zones in bushfire and flood regions.

PDF Details

TCS Journal 2015 Journal Article

Improved parameterized and exact algorithms for cut problems on trees

Iyad Kanj
Guohui Lin
Tian Liu
Weitian Tong
Ge Xia
Jinhui Xu
Boting Yang
Fenghui Zhang

Details DOI

IJCAI Conference 2015 Conference Paper

Influence Maximization in Big Networks: An Incremental Algorithm for Streaming Subgraph Influence Spread Estimation

Wei-Xue Lu
Peng Zhang
Chuan Zhou
Chunyi Liu
Li Gao

Influence maximization plays a key role in social network viral marketing. Although the problem has been widely studied, it is still challenging to estimate influence spread in big networks with hundreds of millions of nodes. Existing heuristic algorithms and greedy algorithms incur heavy computation cost in big networks and are incapable of processing dynamic network structures. In this paper, we propose an incremental algorithm for influence spread estimation in big networks. The incremental algorithm breaks down big networks into small subgraphs and continuously estimates influence spread on these subgraphs as data streams. The challenge of the incremental algorithm is that subgraphs derived from a big network are not independent and MC simulations on each subgraph (defined as snapshots) may conflict with each other. In this paper, we assume that different combinations of MC simulations on subgraphs generate independent samples. In so doing, the incremental algorithm on streaming subgraphs can estimate influence spread with fewer simulations. Experimental results demonstrate the performance of the proposed algorithm.

PDF Details

YNIMG Journal 2015 Journal Article

Layer-specific response properties of the human lateral geniculate nucleus and superior colliculus

Peng Zhang
Hao Zhou
Wen Wen
Sheng He

Details DOI

IJCAI Conference 2015 Conference Paper

Modeling Quantum Entanglements in Quantum Language Models

Mengjiao Xie
Yuexian Hou
Peng Zhang
Jingfei Li
Wenjie Li
Dawei Song

Recently, a Quantum Language Model (QLM) was proposed to model term dependencies upon Quantum Theory (QT) framework and successively applied in Information Retrieval (IR). Nevertheless, QLM’s dependency is based on co-occurrences of terms and has not yet taken into account the Quantum Entanglement (QE), which is a key quantum concept and has a significant cognitive implication. In QT, an entangled state can provide a more complete description for the nature of realities, and determine intrinsic correlations of considered objects globally, rather than those co-occurrences on the surface. It is, however, a real challenge to decide and measure QE using the classical statistics of texts in a post-measurement configuration. In order to circumvent this problem, we theoretically prove the connection between QE and statistically Unconditional Pure Dependence (UPD). Since UPD has an implementable deciding algorithm, we can in turn characterize QE by extracting the UPD patterns from texts. This leads to a measurable QE, based on which we further advance the existing QLM framework. We empirically compare our model with related models, and the results demonstrate the effectiveness of our model.

PDF Details

AAAI Conference 2014 Conference Paper

Combining Heterogenous Social and Geographical Information for Event Recommendation

Zhi Qiao
Peng Zhang
Yanan Cao
Chuan Zhou
Li Guo
Binxing Fang

With the rapid growth of event-based social networks (EBSNs) like Meetup, the demand for event recommendation becomes increasingly urgent. In EBSNs, event recommendation plays a central role in recommending the most relevant events to users who are likely to participate in. Different from traditional recommendation problems, event recommendation encounters three new types of information, i.e., heterogenous online+offline social relationships, geographical features of events and implicit rating data from users. Yet combining the three types of data for offline event recommendation has not been considered. Therefore, we present a Bayesian latent factor model that can unify these data for event recommendation. Experimental results on real-world data sets show the performance of our method.

PDF Details

AAAI Conference 2014 Conference Paper

Event Recommendation in Event-Based Social Networks

Zhi Qiao
Peng Zhang
Chuan Zhou
Yanan Cao
Li Guo
Yanchuan Zhang

With the rapid growth of event-based social networks, the demand for event recommendation becomes increasingly important. Different from classic recommendation problems, event recommendation generally faces the problems of heterogenous online and offline social relationships among users and implicit feedback data. In this paper, we present a Bayesian probability model that can fully unleash the power of heterogenous social relations and efficiently tackle the implicit feedback characteristic for event recommendation. Experimental results on several real-world datasets demonstrate the utility of our method.

PDF Details

KR Conference 2014 Conference Paper

Qualitative Spatial Representation and Reasoning in Angry Birds: the Extended Rectangle Algebra

Peng Zhang
Jochen Renz

TCS Journal 2009 Journal Article

An approximation algorithm to the k-Steiner Forest problem

Peng Zhang
Mingji Xia

Details DOI

TCS Journal 2007 Journal Article

A new approximation algorithm for the k-facility location problem

Peng Zhang

Details DOI

TCS Journal 2007 Journal Article

Computational complexity of counting problems on 3-regular planar graphs

Mingji Xia
Peng Zhang
Wenbo Zhao

Details DOI

TCS Journal 2006 Journal Article

A network flow approach to the Minimum Common Integer Partition Problem

Wenbo Zhao
Peng Zhang
Tao Jiang

Details DOI

YNIMG Journal 2006 Journal Article

The effect of visuospatial attentional load on the processing of irrelevant acoustic distractors

Peng Zhang
Xiangchuan Chen
Peng Yuan
Daren Zhang
Sheng He

Details DOI

YNIMG Journal 2005 Journal Article

Age-dependent brain activation during forward and backward digit recall revealed by fMRI

Xiwen Sun
Xiaochu Zhang
Xiangchuan Chen
Peng Zhang
Min Bao
Daren Zhang
Jing Chen
Sheng He

Details DOI

AAAI Conference 2005 Conference Paper

Finite Sample Error Bound for Parzen Windows

Peng Zhang

Parzen Windows as a nonparametric method has been applied to a variety of density estimation as well as classification problems. Similar to nearest neighbor methods, Parzen Windows does not involve learning. While it converges to true but unknown probability densities in the asymptotic limit, there is a lack of theoretical analysis on its performance with finite samples. In this paper we establish a finite sample error bound for Parzen Windows. We first show that Parzen Windows is an approximation to regularized least squares (RLS) methods that have been well studied in statistical learning theory. We then derive the finite sample error bound for Parzen Windows, and discuss the properties of the error bound and its relationship to the error bound for RLS. This analysis provides interesting insight to Parzen Windows as well as the nearest neighbor method from the point of view of learning theory. Finally, we provide empirical results on the performance of Parzen Windows and other methods such as nearest neighbors, RLS and SVMs on a number of real data sets. These results corroborate well our theoretical analysis.

PDF Details