Arrow Research search

Author name cluster

Li Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

96 papers
2 author rows

Possible papers


AAAI Conference 2026 Conference Paper

Beyond Sharpness: The Role of Nonuniformity in Generalization

  • Yingcong Zhou
  • Pingfan Wu
  • Li Wang
  • Zhiguo Fu
  • Fengqin Yang

Sharpness-aware minimization (SAM) is widely recognized for enhancing the generalization performance of deep neural networks. However, recent works have challenged the claim that flatness implies generalization, demonstrating that it is insufficient as an indicator of generalization. In this paper, we reveal an insightful phenomenon: among minima of similar sharpness, stochastic optimization algorithms tend to prefer those with lower nonuniformity. We define nonuniformity by both the magnitude and structure of the gradient noise, and show that it fundamentally differs from sharpness and plays a critical role in generalization. Specifically, we first theoretically prove that the expected generalization gap of models trained via stochastic optimization algorithms is positively correlated with nonuniformity (the magnitude of the gradient noise). Empirically, we show that nonuniformity exhibits a stronger correlation with generalization than sharpness, especially in Transformer models. Furthermore, we demonstrate that nonuniformity (the structure of the gradient noise) more effectively guides the algorithm towards sparser solutions and exhibits better generalization performance than sharpness-based methods in the high-dimensional sparse regression problem. Finally, extensive experiments on various datasets and models confirm the advantages of nonuniformity for generalization: (1) optimization guided by nonuniformity achieves better generalization than optimization guided by flatness (including standard training, transfer learning, hyperparameter sensitivity and robustness to label noise); (2) model architecture (such as depth and width) is closely related to nonuniformity.
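The magnitude component of gradient-noise nonuniformity can be illustrated with a toy estimator (a hypothetical sketch, not the authors' implementation): for a one-dimensional squared loss, compute the per-sample gradients at a point and take their variance around the full-batch gradient.

```python
def gradient_noise_magnitude(w, data):
    """Toy estimate of gradient-noise magnitude for the loss
    L(w) = mean_i (w - x_i)^2, whose per-sample gradient is 2*(w - x_i).
    The magnitude term of nonuniformity is illustrated here as the
    variance of per-sample gradients around the full-batch gradient."""
    grads = [2.0 * (w - x) for x in data]
    mean_grad = sum(grads) / len(grads)
    return sum((g - mean_grad) ** 2 for g in grads) / len(grads)

# Per-sample gradients at w=0 for data [0,1,2,3] are [0,-2,-4,-6];
# their mean is -3, so the variance is (9+1+1+9)/4 = 5.0.
print(gradient_noise_magnitude(0.0, [0.0, 1.0, 2.0, 3.0]))
```

In a real network the same idea would be applied per minibatch over the full parameter vector; this scalar version only conveys why two minima of equal sharpness can differ in noise magnitude.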

AAAI Conference 2026 Conference Paper

Decoupling Understanding from Reasoning via Problem Space Mapping for Small-Scale Model Reasoning

  • Li Wang
  • Changhao Zhang
  • Zengqi Xiu
  • Kai Lu
  • Xin Yu
  • Kui Zhang
  • Wenjun Wu

Despite recent advances in the reasoning capabilities of Large Language Models (LLMs), improving the reasoning ability of Small Language Models (SLMs, e.g., up to 1.5B parameters) remains challenging. A key obstacle lies in the complexity and variability of natural language: essentially equivalent problems often appear in diverse surface forms, frequently obscured by redundant or distracting details. This imposes a dual burden on SLMs: they must first extract the core problem from complex linguistic input, and then perform reasoning based on that understanding. The resulting vast and noisy problem space hinders optimization, particularly for models with limited capacity. To address this, we propose a new framework that decouples understanding from reasoning by mapping natural language problems into a canonical problem space, a semantically simplified yet expressive domain. This enables SLMs to focus on reasoning over standardized inputs, free from linguistic variability. Within this framework, we introduce DURIT (Decoupled Understanding from Reasoning via Iterative Training), a three-step algorithm that iteratively (1) maps natural language problems into the problem space via reinforcement learning, (2) aligns reasoning trajectories through self-distillation, and (3) trains reasoning policies in the problem space. The mapper and reasoner are co-trained in an alternating loop throughout this process. Experiments show that DURIT substantially improves SLMs' performance on both in-domain and out-of-domain mathematical and logical reasoning tasks. Beyond improving reasoning capabilities, DURIT also improves the robustness of reasoning, validating decoupling understanding from reasoning as an effective strategy for strengthening SLMs.

JBHI Journal 2026 Journal Article

Direct PET-to-CT Generation for Attenuation Correction: A Slice-to-Slice Continual Transformer Segmentation-Aware Network

  • Rongjun Ge
  • Hanyuan Zheng
  • Yuxin Liu
  • Liutao Yang
  • Li Wang
  • Xu Ji
  • Jingtao Shen
  • Nan Li

Direct synthetic computed tomography (CT) generation from positron emission tomography (PET) plays a crucial role in PET attenuation correction, while also providing detailed structural information to complement functional imaging. Compared to the widely used PET/CT and indirect PET/MR-CT, the direct PET-to-CT translation method (denoted as PET-to-CT) offers several advantages: 1) The CT required for PET-to-CT is directly obtained from PET, thereby avoiding the intermediate errors generated in the inter-step processes of multimodal scanning in PET/CT and PET/MR-CT. 2) Direct PET-to-CT eliminates the requirement for supplementary imaging equipment, thereby reducing complexity and scan duration in contrast to PET/CT and PET/MR-CT imaging. Thus, direct PET-to-CT is highly promising for clinical applications. However, it faces challenges, including spatial resolution mismatches between PET and CT, as well as voxel-wise semantic differences arising from functional and structural imaging. To address these challenges, this paper proposes a 2D hierarchical method called the S2SCT (Slice-to-Slice Continual Transformer)-SA (Segmentation-Aware) Network. It uses a slice-continual network to acquire semantic transformation knowledge from each PET slice to a CT slice, facilitating the conversion between functional and structural imaging domains. Subsequently, the segmentation-aware network is designed to further capture spatial correlations both between slices and within each slice, resulting in improved CT spatial resolution. The experimental results demonstrate that our proposed method outperforms mainstream methods in both CT generation and attenuation correction, as evidenced by both visual results and metric values.

JBHI Journal 2026 Journal Article

GPFD-Net: A Geometry-Pose Frequency Decoupling Network for Privacy-Preserving Human Action Recognition in Healthcare

  • Xing Li
  • Jingfan Liang
  • Ge Gao
  • Li Wang
  • Haifeng Wang
  • Shihao Han

Human Action Recognition (HAR) holds significant application value in healthcare informatics, facilitating tasks such as clinical diagnosis and rehabilitation monitoring. Point cloud sequences have emerged as a pivotal modality for balancing privacy preservation with high-fidelity geometric structural representation, ensuring anonymity while retaining critical 3D behavioral information. However, existing point cloud sequence encoding methods struggle to precisely encode micro-geometric details and macro-pose contours within the spatial dimension, as well as the dynamic heterogeneity of actions within the temporal dimension. These limitations impede the realization of high-precision clinical motion analysis. To address these challenges, we propose a Geometry-Pose Frequency Decoupling Network (GPFD-Net) for human action recognition. First, we design a Geometry-Pose Parallel-Collaborative Spatial Encoder (GPCSE). This module employs a parallel dual-stream architecture to explicitly capture and fuse complementary micro-geometric details and macro-pose contours, generating an informative geometry-enhanced pose feature sequence. Second, we introduce a Frequency-Decoupled Temporal Capturer (FDTC). This module adaptively decomposes the geometry-enhanced pose feature sequence into a smooth trend sequence and a transient detail sequence, which are subsequently processed by two parallel expert encoders via differentiated encoding to achieve robust human action recognition. Extensive experiments on four public benchmark datasets demonstrate that GPFD-Net achieves superior performance. The proposed method provides a novel paradigm for high-precision and privacy-preserving motion analysis in healthcare applications.

AAAI Conference 2026 Conference Paper

MoMoREC: A Multi-agent Motivation Generation Framework for Residual Semantic ID-Aware Recommendation

  • Yige Wang
  • Mingming Li
  • Li Wang
  • Kaichen Zhao
  • Wangming Li
  • Weipeng Jiang
  • Xueying Li

Recent advances in the field of sequential recommendation have highlighted the potential of Large Language Models (LLMs) in enhancing item embeddings and improving user understanding. However, existing approaches face three major limitations: 1) insufficient understanding of the reasons behind users' purchase decisions, 2) poor compatibility between the high-dimensional embeddings directly produced by LLMs and traditional low-dimensional ID embeddings, and 3) reliance on additional fine-tuning and high inference overhead to adapt LLMs to the recommendation task. In this paper, we propose MoMoREC, a simple yet effective user-understanding-based recommendation strategy. This method leverages the intrinsic comprehension capabilities of LLMs combined with residual semantic IDs to better understand users. Specifically, starting from common user purchasing behaviors and incorporating item characteristics, we employ a multi-agent framework to utilize LLMs in analyzing user shopping motivations and extracting high-dimensional dense embeddings. These embeddings are then transformed into low-dimensional IDs using a residual semantic ID approach via clustering and residual dimensionality reduction, which can be fed into the recommendation model. MoMoREC effectively integrates the understanding power of LLMs with the strengths of recommendation systems, preserving rich semantic language embeddings while reducing or eliminating the need for auxiliary trainable modules. As a result, it seamlessly adapts to any sequential recommendation framework. Experiments on three benchmark datasets show that MoMoREC significantly improves traditional recommendation models, demonstrating its effectiveness and flexibility.
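Residual semantic IDs of this kind are commonly produced by residual quantization: match the embedding against a first codebook, subtract the matched centroid, and quantize the residual with the next codebook. A minimal sketch, assuming toy two-dimensional codebooks rather than the paper's learned clusters:

```python
def nearest(vec, codebook):
    """Index of the codebook entry closest to vec (squared Euclidean)."""
    best, best_dist = 0, float("inf")
    for i, centroid in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vec, centroid))
        if dist < best_dist:
            best, best_dist = i, dist
    return best

def residual_quantize(vec, codebooks):
    """Encode vec as a short sequence of IDs, one per codebook level:
    at each level, pick the nearest centroid and carry the residual
    forward to the next level."""
    ids, residual = [], list(vec)
    for codebook in codebooks:
        i = nearest(residual, codebook)
        ids.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return ids

# Two toy levels: [1.2, 1.0] snaps to centroid [1, 1] (ID 1), and the
# residual [0.2, 0.0] snaps to [0, 0] (ID 0), giving IDs [1, 0].
print(residual_quantize([1.2, 1.0],
                        [[[0.0, 0.0], [1.0, 1.0]],
                         [[0.0, 0.0], [0.5, 0.0]]]))
```

Each ID indexes a small codebook, so the sequence is cheap to embed in a conventional recommender while successive levels preserve progressively finer semantic detail.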

JBHI Journal 2026 Journal Article

TriFuse-Net: A Tri-Branch PET/CT Fusion Pyramid Network Enhanced by Lesion-Guided Structural-Metabolic Attention for Lung Cancer Diagnosis and Prognosis

  • Yuyu Liu
  • Jieqin Lv
  • Fangfang Yang
  • Huiqin Wu
  • Xiang Pan
  • Li Wang
  • Han Bai
  • Shunfang Wang

Diagnosis and prognosis of lung cancer via PET/CT imaging have long been major clinical concerns. However, existing multimodal approaches often focus on feature aggregation rather than cross-modal interactive collaboration, failing to capture the structural-metabolic correlations and multi-scale synergy essential for characterizing complex lesions. Therefore, this study proposes TriFuse-Net, a tri-branch PET/CT fusion pyramid network (FPN) enhanced by lesion-guided structural-metabolic attention (LSMA) to improve both diagnosis and prognosis prediction tasks. The model is composed of two identical unimodal branches (PET/CT) and one pyramid branch with interacting channel and spatial attention. The pyramid structure enables bidirectional multiscale feature extraction and fusion, capturing both local details and global semantic information of lesions. Comprehensive experiments validated the model's superiority across three clinical tasks. TriFuse-Net achieved a C-index of 0.747 for progression-free survival (PFS) prediction, showing improvements of 14.7% and 11.0% over ResNet-CT and ResNet-PET, respectively. Additionally, the clinical-integrated model (TriFuse-Net-Cli) achieved AUCs of 0.947 for differentiating lung cancer from tuberculosis and 0.937 for identifying lymph-node metastasis. Ablation studies further confirmed the essential contributions of both FPN and LSMA. In summary, the proposed framework demonstrates that integrating multi-scale structural-metabolic relationships significantly enhances diagnosis and prognosis in lung cancer.

JBHI Journal 2026 Journal Article

Whisperization and Masked CycleGAN-Based Framework for Electrolaryngeal Speech Enhancement

  • Jie Zhou
  • Li Wang
  • Fengji Li
  • Shaochuan Zhang
  • Fan Fan
  • Tao Liu
  • Xiaohong Chen
  • Haijun Niu

Electrolarynx (EL) provides an effective approach to voice rehabilitation for patients with phonation disorders. However, due to its reliance on an external mechanical source, EL speech suffers from limited acoustic cues, leading to degraded quality and restricting the potential of subsequent modeling and enhancement. This paper proposes a novel EL speech enhancement framework that combines whisperization with a Masked CycleGAN model. The whisperization step removes redundant constant excitation and mechanical noise, generating an intermediate speech form, whisper-like EL (W-EL) speech, whose acoustic and perceptual properties are closer to natural whisper. Subsequently, the Masked CycleGAN employs a frame-level masking strategy to guide the generator in reconstructing missing prosodic and linguistic features. Thus, we achieve a dual-stage enhancement of “redundancy removal” and “deficiency compensation.” Acoustic feature analysis demonstrates that the converted W-EL speech is more similar to normal speech in terms of spectrogram, fundamental frequency (F0) values, and F0 contours, while also compensating for the missing low-frequency energy below 500 Hz. Objective evaluations show significant improvements across multiple metrics. Subjective evaluations confirm that W-EL speech exhibits higher naturalness and intelligibility compared to original EL speech. Moreover, the combined “whisperization + voice conversion” framework further enhances perceptual quality. This study not only offers a novel pathway for EL speech enhancement, but may also provide valuable insights for improving other types of pathological speech.

ICML Conference 2025 Conference Paper

Cape: Context-Aware Prompt Perturbation Mechanism with Differential Privacy

  • Haoqi Wu
  • Wei Dai
  • Li Wang
  • Qiang Yan

Large Language Models (LLMs) have gained significant popularity due to their remarkable capabilities in text understanding and generation. However, despite their widespread deployment in inference services such as ChatGPT, concerns about the potential leakage of sensitive user data have arisen. Existing solutions primarily rely on privacy-enhancing technologies to mitigate such risks, facing the trade-off among efficiency, privacy, and utility. To narrow this gap, we propose Cape, a context-aware prompt perturbation mechanism based on differential privacy, to enable efficient inference with an improved privacy-utility trade-off. Concretely, we introduce a hybrid utility function that better captures the token similarity. Additionally, we propose a bucketized sampling mechanism to handle the large sampling space, which might otherwise lead to long-tail phenomena. Extensive experiments across multiple datasets, along with ablation studies, demonstrate that Cape achieves a better privacy-utility trade-off compared to prior state-of-the-art works.
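Token perturbation under differential privacy is typically built on the exponential mechanism: a replacement token is sampled with probability proportional to exp(ε·u / 2Δu) for a utility score u. The sketch below is a generic illustration with made-up similarity scores, not Cape's hybrid utility or its bucketized sampler:

```python
import math
import random

def exponential_mechanism(utilities, epsilon, sensitivity=1.0, rng=None):
    """Sample an index with probability proportional to
    exp(epsilon * u / (2 * sensitivity)), the standard exponential
    mechanism for differentially private selection."""
    rng = rng or random.Random()
    weights = [math.exp(epsilon * u / (2.0 * sensitivity)) for u in utilities]
    r = rng.random() * sum(weights)
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w
        if r <= cumulative:
            return i
    return len(utilities) - 1

# Hypothetical candidate replacements for the token "cat", scored by
# similarity to the original; larger epsilon concentrates probability
# on the most similar candidate (weaker privacy, higher utility).
candidates = ["cat", "kitten", "dog", "car"]
similarity = [1.0, 0.9, 0.4, 0.0]
rng = random.Random(0)
picks = [candidates[exponential_mechanism(similarity, epsilon=8.0, rng=rng)]
         for _ in range(1000)]
```

Lowering `epsilon` flattens the sampling distribution toward uniform, which is exactly the privacy-utility trade-off the abstract describes.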

NeurIPS Conference 2025 Conference Paper

Empirical Study on Robustness and Resilience in Cooperative Multi-Agent Reinforcement Learning

  • Simin Li
  • Zihao Mao
  • Hanxiao Li
  • Zonglei Jing
  • Zhuohang bian
  • Jun Guo
  • Li Wang
  • Zhuoran Han

In cooperative Multi-Agent Reinforcement Learning (MARL), it is common practice to tune hyperparameters in ideal simulated environments to maximize cooperative performance. However, policies tuned for cooperation often fail to maintain robustness and resilience under real-world uncertainties. Building trustworthy MARL systems requires a deep understanding of robustness, which ensures stability under uncertainties, and resilience, the ability to recover from disruptions, a concept extensively studied in control systems but largely overlooked in MARL. In this paper, we present a large-scale empirical study comprising over 82,620 experiments to evaluate cooperation, robustness, and resilience in MARL across 4 real-world environments, 13 uncertainty types, and 15 hyperparameters. Our key findings are: (1) Under mild uncertainty, optimizing cooperation improves robustness and resilience, but this link weakens as perturbations intensify. Robustness and resilience also vary by algorithm and uncertainty type. (2) Robustness and resilience do not generalize across uncertainty modalities or agent scopes: policies robust to action noise for all agents may fail under observation noise on a single agent. (3) Hyperparameter tuning is critical for trustworthy MARL: surprisingly, standard practices like parameter sharing, GAE, and PopArt can hurt robustness, while early stopping, high critic learning rates, and Leaky ReLU consistently help. By optimizing hyperparameters only, we observe substantial improvement in cooperation, robustness and resilience across all MARL backbones, with the phenomenon also generalizing to robust MARL methods across these backbones.

YNIMG Journal 2025 Journal Article

Image-based meta- and mega-analysis (IBMMA): A unified framework for large-scale, multi-site, neuroimaging data analysis

  • Nick Steele
  • Ashley A. Huggins
  • Rajendra A. Morey
  • Ahmed Hussain
  • Courtney Russell
  • Benjamin Suarez-Jimenez
  • Elena Pozzi
  • Hadis Jameei

The increasing scale and complexity of neuroimaging datasets aggregated from multiple study sites present substantial analytic challenges, as existing statistical analysis tools struggle to handle missing voxel data, suffer from limited computational speed and inefficient memory allocation, and are restricted in the types of statistical designs they are able to model. We introduce Image-Based Meta- & Mega-Analysis (IBMMA), a novel software package implemented in R and Python that provides a unified framework for analyzing diverse neuroimaging features, efficiently handles large-scale datasets through parallel processing, offers flexible statistical modeling options, and properly manages the missing voxel data commonly encountered in multi-site studies. IBMMA successfully analyzed a large-n dataset of several thousand participants and revealed findings in brain regions that some traditional software overlooked due to missing voxel data resulting in gaps in brain coverage. IBMMA has the potential to accelerate discoveries in neuroscience and enhance the clinical utility of neuroimaging findings.

YNIMG Journal 2025 Journal Article

Linking visual-frontoparietal network neural dynamics to spontaneous cognitive processing

  • Leinian Li
  • Li Wang

Previous studies in neuroscience have predominantly focused on the role of the default mode network (DMN) in spontaneous thought, with the contributions of other brain regions remaining largely unexplored. In this study, we hypothesized that the visual-frontoparietal network (VFPN) would exhibit distinct macroscopic patterns associated with spontaneous cognitive processing. To test this hypothesis, we analyzed four functional magnetic resonance imaging (fMRI) datasets. Our results revealed that self-reported cognitive states during rest were strongly correlated with specific macroscopic patterns in the VFPN. These patterns were also observed during movie viewing/listening and had previously been identified in multistable perception tasks. Further analysis showed that the microscopic activation patterns in the visual areas were closely linked to self-reported cognitive states. Additionally, we found that memory replay in the visual areas was more pronounced when the frontoparietal network was active, compared to when it was inactive. Finally, fluctuations in the VFPN and their coupling with the hippocampus were significant predictors of offline memory enhancement. In conclusion, these findings demonstrate consistent patterns in the visual and frontoparietal brain regions during resting states that are closely associated with cognitive activity, providing strong evidence for the significant roles of regions beyond the DMN in spontaneous thought.

JBHI Journal 2025 Journal Article

PKAN: Leveraging Kolmogorov–Arnold Networks and Multi-Modal Learning for Peptide Prediction With Advanced Language Models

  • Li Wang
  • Xiangzheng Fu
  • Xiucai Ye
  • Tetsuya Sakurai
  • Xiangxiang Zeng
  • Yiping Liu

Peptides can offer highly specific biological activities, serving as essential mediators of intercellular signaling, which are critical for advancing precision medicine and drug development. Their primary structure can be depicted either as an amino acid sequence or as a chemical molecule consisting of atoms and chemical bonds. Large language models (LLMs) hold the potential to thoroughly elucidate the intricate intrinsic properties of peptides. Here we present the Peptide Kolmogorov-Arnold Network (PKAN), a framework leveraging multi-modal representations inspired by advanced language models for peptide activity and functionality prediction. Comparative experiments across tasks show that PKAN outperforms state-of-the-art models while maintaining a streamlined design with superior predictive capabilities. The multi-modal feature importance scoring, anchored in global structures and the significant marginal impacts of derived features on the model, coupled with intricate symbolic regression of specific activation functions, further demonstrates the robustness and precision of the PKAN framework in identifying and elucidating key determinants of peptide functionality. This work provides scientific evidence for investigating the complex mechanisms of peptide materials and supports the progression of peptide language paradigms in biology.

YNIMG Journal 2025 Journal Article

STF: A spherical transformer for versatile cortical surfaces applications

  • Jiale Cheng
  • Fenqiang Zhao
  • Zhengwang Wu
  • Xinrui Yuan
  • Li Wang
  • John H Gilmore
  • Weili Lin
  • Xin Zhang

Inspired by the remarkable success of attention mechanisms in various applications, there is a growing need to adapt the Transformer architecture from conventional Euclidean domains to non-Euclidean spaces commonly encountered in medical imaging. Structures such as brain cortical surfaces, represented by triangular meshes, exhibit spherical topology and present unique challenges. To address this, we propose the Spherical Transformer (STF), a versatile backbone that leverages self-attention for analyzing cortical surface data. Our approach involves mapping cortical surfaces onto a sphere, dividing them into overlapping patches, and tokenizing both patches and vertices. By performing self-attention at patch and vertex levels, the model simultaneously captures global dependencies and preserves fine-grained contextual information within each patch. Overlapping regions between neighboring patches naturally enable efficient cross-patch information sharing. To handle longitudinal cortical surface data, we introduce the spatiotemporal self-attention mechanism, which jointly captures spatial context and temporal developmental patterns within a single layer. This innovation enhances the representational power of the model, making it well-suited for dynamic surface data. We evaluate the Spherical Transformer on key tasks, including cognition prediction at the surface level and two vertex-level tasks: cortical surface parcellation and cortical property map prediction. Across these applications, our model consistently outperforms state-of-the-art methods, demonstrating its ability to effectively model global dependencies and preserve detailed spatial information. The results highlight its potential as a general-purpose framework for cortical surface analysis.

JBHI Journal 2025 Journal Article

Synergistic Drug Combination Prediction via Dual-Level Feature Aggregation and Knowledge Graph-Based Deep Neural Network

  • Ying Zuo
  • Yan Zhang
  • Li Wang
  • Jianping Yu
  • Jiawei Luo
  • Qiu Xiao

Identifying synergistic drug combinations is a critical but difficult challenge in cancer treatment, owing to the sheer complexity and enormous number of possible drug combinations. However, most existing computational methods rely on a single data perspective, often overlooking the complexity of interactions between different biological entities. Furthermore, they fail to fully integrate the intrinsic properties of drugs and cell lines with the broader biological relationships that play a crucial role in drug synergy. To address these challenges, we propose a novel framework called LGSyn that integrates two types of information: local features, including molecular fingerprints, descriptors, and gene expression profiles, as well as global features that encompass broader biological interactions, including drug-protein, protein-cell line, protein-protein, and cell line-tissue interactions. By combining these two types of features, LGSyn leverages the full spectrum of biological knowledge to predict drug synergy. In LGSyn, we developed three fusion strategies to effectively integrate local and global information and identify the most suitable strategy. The resulting fused feature vectors are then fed into a deep neural network for training and synergy prediction. Experimental results demonstrate that the proposed method outperforms current state-of-the-art models, achieving superior accuracy and stability in drug synergy prediction.

NeurIPS Conference 2025 Conference Paper

V2X-Radar: A Multi-modal Dataset with 4D Radar for Cooperative Perception

  • Lei Yang
  • Xinyu Zhang
  • Jun Li
  • Chen Wang
  • Jiaqi Ma
  • Zhiying Song
  • Tong Zhao
  • Ziying Song

Modern autonomous vehicle perception systems often struggle with occlusions and limited perception range. Previous studies have demonstrated the effectiveness of cooperative perception in extending the perception range and overcoming occlusions, thereby enhancing the safety of autonomous driving. In recent years, a series of cooperative perception datasets have emerged; however, these datasets primarily focus on cameras and LiDAR, neglecting 4D Radar, a sensor used in single-vehicle autonomous driving to provide robust perception in adverse weather conditions. In this paper, to bridge the gap created by the absence of 4D Radar datasets in cooperative perception, we present V2X-Radar, the first large-scale, real-world multi-modal dataset featuring 4D Radar. The V2X-Radar dataset is collected using a connected vehicle platform and an intelligent roadside unit equipped with 4D Radar, LiDAR, and multi-view cameras. The collected data encompasses sunny and rainy weather conditions, spanning daytime, dusk, and nighttime, as well as various typical challenging scenarios. The dataset consists of 20K LiDAR frames, 40K camera images, and 20K 4D Radar frames, including 350K annotated boxes across five categories. To support various research domains, we have established V2X-Radar-C for cooperative perception, V2X-Radar-I for roadside perception, and V2X-Radar-V for single-vehicle perception. Furthermore, we provide comprehensive benchmarks across these three sub-datasets.

YNIMG Journal 2024 Journal Article

A common and specialized neural code for social attention triggered by eye gaze and biological motion

  • Ruidi Wang
  • Tian Yuan
  • Li Wang
  • Yi Jiang

Humans appear to be endowed with the ability to readily share attention with interactive partners through the utilization of social direction cues, such as eye gaze and biological motion (BM). Here, we investigated the specialized brain mechanism underlying this fundamental social attention ability by incorporating different types of social (i.e., BM, gaze) and non-social (arrow) cues and combining functional magnetic resonance imaging (fMRI) with a modified central cueing paradigm. Using multi-voxel pattern analysis (MVPA), we found that although gaze- and BM-mediated attentional orienting could be decoded from neural activity in a wide range of brain areas, only the right anterior and posterior superior temporal sulcus (aSTS and pSTS) could specifically decode attentional orienting triggered by social but not non-social cues. Critically, cross-category MVPA further revealed that social attention could be decoded across BM and gaze cues in the right STS and the right superior temporal gyrus (STG). However, these regions could not decode attentional orienting across social and non-social cues. These findings together provide evidence for the existence of a specialized social attention module in the human brain, with the right STS/STG being the critical neural site dedicated to social attention.

YNIMG Journal 2024 Journal Article

Common neural dysfunction of economic decision-making across psychiatric conditions

  • Chunliang Feng
  • Qingxia Liu
  • Chuangbing Huang
  • Ting Li
  • Li Wang
  • Feilong Liu
  • Simon B. Eickhoff
  • Chen Qu

Adaptive decision-making, which is often impaired in various psychiatric conditions, is essential for well-being. Recent evidence has indicated that decision-making capacity in multiple tasks could be accounted for by latent dimensions, raising the question of whether there is a common disruption of brain networks in economic decision-making across psychiatric conditions. Here, we addressed the issue by combining activation/lesion network mapping analyses with a transdiagnostic brain imaging meta-analysis. Our findings indicate that there were transdiagnostic alterations in the thalamus and ventral striatum during the decision or outcome stage of decision-making. The identified regions represent key nodes in a large-scale network, which is composed of multiple heterogeneous brain regions and plays a causal role in motivational functioning. The findings suggest that disturbances in the network associated with emotion- and reward-related processing play a key role in dysfunctions of decision-making observed in various psychiatric conditions. This study provides the first meta-analytic evidence of common neural alterations linked to deficits in economic decision-making.

YNIMG Journal 2024 Journal Article

Large-scale meta-analyses and network analyses of neural substrates underlying human escalated aggression

  • Li Wang
  • Ting Li
  • Ruolei Gu
  • Chunliang Feng

Escalated aggression represents a frequent and severe form of violence, sometimes manifesting as antisocial behavior. Driven by the pressures of modern life, escalated aggression is of particular concern due to its rising prevalence and its destructive impact on both individual well-being and socioeconomic stability. However, a consistent neural circuitry underpinning it remains to be definitively identified. Here, we addressed this issue by comparing brain alterations between individuals with escalated aggression and those without such behavioral manifestations. We first conducted a meta-analysis to synthesize previous neuroimaging studies on functional and structural alterations of escalated aggression (325 experiments, 2997 foci, 16,529 subjects). Follow-up network and functional decoding analyses were conducted to provide quantitative characterizations of the identified brain regions. Our results revealed that brain regions consistently involved in escalated aggression were localized in the subcortical network (amygdala and lateral orbitofrontal cortex) associated with emotion processing, the default mode network (dorsal medial prefrontal cortex and middle temporal gyrus) associated with mentalizing, and the salience network (anterior cingulate cortex and anterior insula) associated with cognitive control. These findings were further supported by additional meta-analyses on emotion processing, mentalizing, and cognitive control, all of which showed conjunction with the brain regions identified in the escalated aggression analysis. Together, these findings advance the understanding of the risk biomarkers of escalated aggressive populations and refine theoretical models of human aggression.

IJCAI Conference 2024 Conference Paper

LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory

  • Zicheng Liu
  • Li Wang
  • Siyuan Li
  • Zedong Wang
  • Haitao Lin
  • Stan Z. Li

Transformer models have been successful in various sequence processing tasks, but the self-attention mechanism's computational cost limits its practicality for long sequences. Although there are existing attention variants that improve computational efficiency, they have a limited ability to abstract global information effectively based on their hand-crafted mixing strategies. On the other hand, state-space models (SSMs) are tailored for long sequences but cannot capture complicated local information. Combining the two as a unified token mixer has therefore become a trend in recent long-sequence models. However, the linearized attention degrades performance significantly even when equipped with SSMs. To address the issue, we propose a new method called LongVQ. LongVQ uses the vector quantization (VQ) technique to compress the global abstraction into a fixed-length codebook, enabling linear-time computation of the attention matrix. This technique effectively maintains dynamic global and local patterns, which helps compensate for the lack of long-range dependencies. Our experiments on the Long Range Arena benchmark, autoregressive language modeling, and image and speech classification demonstrate the effectiveness of LongVQ. Our model achieves significant improvements over other sequence models, including variants of Transformers, Convolutions, and recent State Space Models.
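As a rough illustration of the fixed-length-codebook idea described above, the sketch below compresses keys/values into a small codebook and lets queries attend over the codes in linear time. Everything here (nearest-neighbor assignment, mean pooling, softmax over codes) is an illustrative assumption, not LongVQ's learned quantization.

```python
import numpy as np

def vq_attention(q, k, v, codebook):
    """Toy linear-time attention via a fixed-size codebook.

    Each key is snapped to its nearest code (vector quantization),
    values are mean-pooled per code, and queries attend over the
    m codes instead of all n tokens -> O(n*m) rather than O(n^2).
    """
    n, d = k.shape
    m = codebook.shape[0]
    # nearest-code assignment for every key (illustrative VQ step)
    dists = ((k[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (n, m)
    assign = dists.argmin(axis=1)                                  # (n,)
    # pool values into the codebook slots
    pooled = np.zeros((m, v.shape[1]))
    counts = np.zeros(m)
    for i, c in enumerate(assign):
        pooled[c] += v[i]
        counts[c] += 1
    pooled[counts > 0] /= counts[counts > 0][:, None]
    # attend from queries to the m codes only
    scores = q @ codebook.T / np.sqrt(d)                           # (n, m)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ pooled                                        # (n, d_v)

rng = np.random.default_rng(0)
n, m, d = 64, 8, 16
out = vq_attention(rng.normal(size=(n, d)), rng.normal(size=(n, d)),
                   rng.normal(size=(n, d)), rng.normal(size=(m, d)))
print(out.shape)  # (64, 16)
```

The attention matrix is n×m rather than n×n, which is the source of the linear-time claim when the codebook size m is fixed.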

YNIMG Journal 2024 Journal Article

nBEST: Deep-learning-based non-human primates Brain Extraction and Segmentation Toolbox across ages, sites and species

  • Tao Zhong
  • Xueyang Wu
  • Shujun Liang
  • Zhenyuan Ning
  • Li Wang
  • Yuyu Niu
  • Shihua Yang
  • Zhuang Kang

Accurate processing and analysis of non-human primate (NHP) brain magnetic resonance imaging (MRI) serves an indispensable role in understanding brain evolution, development, aging, and diseases. Despite the accumulation of diverse NHP brain MRI datasets at various developmental stages and from various imaging sites/scanners, existing computational tools designed for human MRI typically perform poorly on NHP data, due to huge differences in brain sizes, morphologies, and imaging appearances across species, sites, and ages, highlighting the imperative for NHP-specialized MRI processing tools. To address this issue, in this paper, we present a robust, generic, and fully automated computational pipeline, called the non-human primates Brain Extraction and Segmentation Toolbox (nBEST), whose main functionality includes brain extraction, non-cerebrum removal, and tissue segmentation. Building on cutting-edge deep learning techniques, employing lifelong learning to flexibly integrate data from diverse NHP populations, and constructing a novel 3D U-NeXt architecture, nBEST can reliably handle structural NHP brain MR images from multiple species, sites, and developmental stages (from neonates to the elderly). We extensively validated nBEST on, to our knowledge, the largest assemblage dataset in NHP brain studies, encompassing 1,469 scans from 11 species (e.g., rhesus macaques, cynomolgus macaques, chimpanzees, marmosets, squirrel monkeys, etc.) across 23 independent datasets. Compared to alternative tools, nBEST outperforms them in precision, applicability, robustness, comprehensiveness, and generalizability, greatly benefiting downstream longitudinal, cross-sectional, and cross-species quantitative analyses. We have made nBEST an open-source toolbox (https://github.com/TaoZhong11/nBEST) and are committed to its continual refinement through lifelong learning with incoming data to contribute to the research field.

JMLR Journal 2024 Journal Article

Nonparametric Regression for 3D Point Cloud Learning

  • Xinyi Li
  • Shan Yu
  • Yueying Wang
  • Guannan Wang
  • Li Wang
  • Ming-Jun Lai

In recent years, there has been an exponential increase in the amount of point cloud data collected with irregular shapes in various areas. Motivated by the importance of solid modeling for point clouds, we develop a novel and efficient smoothing tool based on multivariate splines over a triangulation to extract the underlying signal and build a 3D solid model from the point cloud. The proposed method can denoise or deblur the point cloud effectively, provide a multi-resolution reconstruction of the actual signal, and handle sparse and irregularly distributed point clouds to recover the underlying trajectory. In addition, our method provides a natural way of numerosity data reduction. We establish the theoretical guarantees of the proposed method, including the convergence rate and asymptotic normality of the estimator, and show that the estimator achieves the optimal nonparametric convergence rate. We also introduce a bootstrap method to quantify the uncertainty of the estimators. Through extensive simulation studies and a real data example, we demonstrate the superiority of the proposed method over traditional smoothing methods in terms of estimation accuracy and efficiency of data reduction.

JBHI Journal 2024 Journal Article

Progressive Dual Priori Network for Generalized Breast Tumor Segmentation

  • Li Wang
  • Lihui Wang
  • Zixiang Kuai
  • Lei Tang
  • Yingfeng Ou
  • Min Wu
  • Tianliang Shi
  • Chen Ye

To promote the generalization ability of breast tumor segmentation models, and to improve segmentation performance for breast tumors of small size, low contrast, and irregular shape, we propose a progressive dual priori network (PDPNet) to segment breast tumors from dynamic contrast-enhanced magnetic resonance images (DCE-MRI) acquired at different centers. PDPNet first crops tumor regions with a coarse-segmentation-based localization module, then progressively refines the breast tumor mask using weak semantic priors and cross-scale correlation prior knowledge. To validate the effectiveness of PDPNet, we compared it with several state-of-the-art methods on multi-center datasets. The results showed that, compared with the second-best method, the DSC and HD95 of PDPNet improved by at least 5.13% and 7.58%, respectively, on multi-center test sets. In addition, through ablations, we demonstrated that the proposed localization module decreases the influence of normal tissues and therefore improves the generalization ability of the model. The weak semantic priors allow the model to focus on tumor regions, avoiding missed small and low-contrast tumors. The cross-scale correlation priors are beneficial for promoting shape awareness for irregular tumors. Integrating them in a unified framework thus improved multi-center breast tumor segmentation performance.

JBHI Journal 2024 Journal Article

RClaNet: An Explainable Alzheimer's Disease Diagnosis Framework by Joint Registration and Classification

  • Liang Wu
  • Shunbo Hu
  • Duanwei Wang
  • Changchun Liu
  • Li Wang

Alzheimer's disease (AD) is an irreversible neurodegenerative disease that impairs people's ability to carry out daily activities. Unfortunately, there is currently no known cure for AD. Thus, early detection of AD plays a key role in preventing and controlling its progression. As one representative method for measuring brain atrophy, image registration has been widely adopted for AD diagnosis. In this study, an assistive AD diagnosis framework based on joint registration and classification is proposed. Specifically, to capture more local deformation information, a novel patch-based joint brain image registration and classification network (RClaNet) is proposed to estimate local dense deformation fields (DDF) and disease risk probability maps (DRM) that highlight high-risk areas for AD patients. RClaNet consists of a registration network and a classification network, in which the deformation field from the registration network is fed into the classification network to enhance the prediction accuracy of the disease. Then, an exponential distance weighting method is used to obtain the global DDF and the global DRM without grid-like artifacts. Finally, the global classification network uses the global DRM for the early detection of AD. We evaluated the proposed method on the OASIS-3, AIBL, ADNI and COVID-19 datasets, and experimental results show that the proposed RClaNet achieves superior registration performance compared with several state-of-the-art methods. Early diagnosis of AD using the global DRM also yielded competitive results. These experiments prove that the deformation information in the registration process can be used to characterize subtle changes in degenerative diseases and further assist clinicians in diagnosis.

YNICL Journal 2024 Journal Article

Relationship of irisin with disease severity and dopamine uptake in Parkinson's disease patients

  • Xiaoxue Shi
  • Qi Gu
  • Chang Fu
  • Jianjun Ma
  • Dongsheng Li
  • Jinhua Zheng
  • Siyuan Chen
  • Zonghan She

BACKGROUND: This study was designed to investigate the relationship of irisin with the severity of Parkinson's disease (PD) and dopamine (DOPA) uptake in patients with PD and to understand the role of irisin in PD. METHODS: The plasma levels of irisin and α-syn were measured by enzyme-linked immunosorbent assay (ELISA). Motor and nonmotor symptoms were assessed with the relevant scales. DOPA uptake was measured with DOPA positron emission tomography (PET)/magnetic resonance imaging (MRI). RESULTS: The plasma levels of α-syn and irisin in patients with PD gradually increased and decreased, respectively, with the progression of the disease. There was a negative correlation between plasma α-syn and irisin levels in patients with PD. The level of irisin in plasma was negatively correlated with Unified Parkinson's Disease Rating Scale (UPDRS)-III scores and positively correlated with Montreal Cognitive Assessment (MoCA) scores. The striatal/occipital lobe uptake ratios (SORs) of the ipsilateral and contralateral caudate nucleus and anterior and posterior putamen in the high-irisin group were significantly higher than those in the low-irisin group, and the SORs of the caudate nucleus and anterior and posterior putamen contralateral to the affected limb were lower than those on the ipsilateral side. The level of irisin was positively correlated with the SORs of the ipsilateral and contralateral caudate nucleus and putamen in PD patients. CONCLUSIONS: Irisin plays a neuroprotective role by decreasing the level of α-syn. Irisin is negatively correlated with the severity of motor symptoms and cognitive impairment. More importantly, irisin can improve DOPA uptake in the striatum of patients with PD, especially on the side contralateral to the affected limb.

IJCAI Conference 2024 Conference Paper

RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM

  • Ziying Song
  • Guoxing Zhang
  • Lin Liu
  • Lei Yang
  • Shaoqing Xu
  • Caiyan Jia
  • Feiyang Jia
  • Li Wang

Multi-modal 3D object detectors are dedicated to exploring secure and reliable perception systems for autonomous driving (AD). Although achieving state-of-the-art (SOTA) performance on clean benchmark datasets, they tend to overlook the complexity and harsh conditions of real-world environments. With the emergence of visual foundation models (VFMs), opportunities and challenges are presented for improving the robustness and generalization of multi-modal 3D object detection in AD. Therefore, we propose RoboFusion, a robust framework that leverages VFMs like SAM to tackle out-of-distribution (OOD) noise scenarios. We first adapt the original SAM for AD scenarios named SAM-AD. To align SAM or SAM-AD with multi-modal methods, we then introduce AD-FPN for upsampling the image features extracted by SAM. We employ wavelet decomposition to denoise the depth-guided images for further noise reduction and weather interference. At last, we employ self-attention mechanisms to adaptively reweight the fused features, enhancing informative features while suppressing excess noise. In summary, RoboFusion significantly reduces noise by leveraging the generalization and robustness of VFMs, thereby enhancing the resilience of multi-modal 3D object detection. Consequently, RoboFusion achieves SOTA performance in noisy scenarios, as demonstrated by the KITTI-C and nuScenes-C benchmarks. Code is available at https://github.com/adept-thu/RoboFusion.

ICML Conference 2024 Conference Paper

Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences

  • Zicheng Liu 0006
  • Siyuan Li 0002
  • Li Wang
  • Zedong Wang
  • Yunfan Liu 0002
  • Stan Z. Li

To mitigate the computational complexity of the self-attention mechanism on long sequences, linear attention utilizes computation tricks to achieve linear complexity, while state space models (SSMs) popularize a favourable practice of using a non-data-dependent memory pattern, i.e., emphasizing the near and neglecting the distant, to process sequences. Recent studies have shown the benefits of combining the two. However, the efficiency of linear attention remains only at the theoretical level in a causal setting, and SSMs require various designed constraints to operate effectively on specific data. Therefore, in order to unveil the true power of the hybrid design, the following two issues need to be addressed: (1) hardware-efficient implementation of linear attention and (2) stabilization of SSMs. To achieve this, we leverage the ideas of tiling and hierarchy to propose CHELA (short-long Convolutions with Hardware-Efficient Linear Attention), which replaces SSMs with short-long convolutions and implements linear attention in a divide-and-conquer manner. This approach enjoys global abstraction and data-dependent selection from stable SSMs and linear attention while maintaining real linear complexity. Our comprehensive experiments on the Long Range Arena benchmark and language modeling tasks demonstrate the effectiveness of the proposed method.
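The O(n²) bottleneck and the linear-attention trick mentioned above can be sketched as follows: instead of forming an n×n score matrix, one computes φ(K)ᵀV once and reuses it for every query. The feature map `phi` (a shifted ReLU) is an illustrative assumption, not the kernel used by CHELA.

```python
import numpy as np

def linear_attention(q, k, v, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """Kernelized linear attention: phi(Q) (phi(K)^T V) costs O(n*d^2),
    avoiding the O(n^2) score matrix of softmax attention.
    phi must map into nonnegative features so the normalizer is positive."""
    qp, kp = phi(q), phi(k)          # (n, d) feature-mapped queries/keys
    kv = kp.T @ v                    # (d, d_v) global summary, computed once
    z = qp @ kp.sum(axis=0)          # (n,) per-query normalizer
    return (qp @ kv) / z[:, None]    # (n, d_v)

rng = np.random.default_rng(1)
n, d = 128, 8
out = linear_attention(rng.normal(size=(n, d)), rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)))
print(out.shape)  # (128, 8)
```

In the causal setting mentioned in the abstract, `kv` and `z` would instead be accumulated as running prefix sums, which is where naive implementations lose hardware efficiency.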

NeurIPS Conference 2024 Conference Paper

Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs

  • Qinpeng Cui
  • Yixuan Liu
  • Xinyi Zhang
  • Qiqi Bao
  • Qingmin Liao
  • Li Wang
  • Tian Lu
  • Zicheng Liu

Diffusion-based image super-resolution (SR) models have attracted substantial interest due to their powerful image restoration capabilities. However, prevailing diffusion models often struggle to strike an optimal balance between efficiency and performance. Typically, they either neglect to exploit the potential of existing extensive pretrained models, limiting their generative capacity, or they necessitate dozens of forward passes starting from random noise, compromising inference efficiency. In this paper, we present DoSSR, a $\textbf{Do}$main $\textbf{S}$hift diffusion-based SR model that capitalizes on the generative powers of pretrained diffusion models while significantly enhancing efficiency by initiating the diffusion process with low-resolution (LR) images. At the core of our approach is a domain shift equation that integrates seamlessly with existing diffusion models. This integration not only improves the use of the diffusion prior but also boosts inference efficiency. Moreover, we advance our method by transitioning the discrete shift process to a continuous formulation, termed DoS-SDEs. This advancement leads to fast and customized solvers that further enhance sampling efficiency. Empirical results demonstrate that our proposed method achieves state-of-the-art performance on synthetic and real-world datasets, while notably requiring $\textbf{\emph{only 5 sampling steps}}$. Compared to previous diffusion-prior-based methods, our approach achieves a remarkable speedup of 5-7 times, demonstrating its superior efficiency.

YNIMG Journal 2023 Journal Article

An attention-based context-informed deep framework for infant brain subcortical segmentation

  • Liangjun Chen
  • Zhengwang Wu
  • Fenqiang Zhao
  • Ya Wang
  • Weili Lin
  • Li Wang
  • Gang Li

Precise segmentation of subcortical structures from infant brain magnetic resonance (MR) images plays an essential role in studying early subcortical structural and functional developmental patterns and diagnosis of related brain disorders. However, due to the dynamic appearance changes, low tissue contrast, and tiny subcortical size in infant brain MR images, infant subcortical segmentation is a challenging task. In this paper, we propose a context-guided, attention-based, coarse-to-fine deep framework to precisely segment the infant subcortical structures. At the coarse stage, we aim to directly predict the signed distance maps (SDMs) from multi-modal intensity images, including T1w, T2w, and the ratio of T1w and T2w images, with an SDM-Unet, which can leverage the spatial context information, including the structural position information and the shape information of the target structure, to generate high-quality SDMs. At the fine stage, the predicted SDMs, which encode spatial-context information of each subcortical structure, are integrated with the multi-modal intensity images as the input to a multi-source and multi-path attention Unet (M2A-Unet) for achieving refined segmentation. Both the 3D spatial and channel attention blocks are added to guide the M2A-Unet to focus more on the important subregions and channels. We additionally incorporate the inner and outer subcortical boundaries as extra labels to help precisely estimate the ambiguous boundaries. We validate our method on an infant MR image dataset and on an unrelated neonatal MR image dataset. Compared to eleven state-of-the-art methods, the proposed framework consistently achieves higher segmentation accuracy in both qualitative and quantitative evaluations of infant MR images and also exhibits good generalizability in the neonatal dataset.

AAAI Conference 2023 Conference Paper

BERT-ERC: Fine-Tuning BERT Is Enough for Emotion Recognition in Conversation

  • Xiangyu Qin
  • Zhiyu Wu
  • Tingting Zhang
  • Yanran Li
  • Jian Luan
  • Bin Wang
  • Li Wang
  • Jinshi Cui

Previous works on emotion recognition in conversation (ERC) follow a two-step paradigm, which can be summarized as first producing context-independent features via fine-tuning pretrained language models (PLMs) and then analyzing contextual information and dialogue structure information among the extracted features. However, we discover that this paradigm has several limitations. Accordingly, we propose a novel paradigm, i.e., exploring contextual information and dialogue structure information in the fine-tuning step, and adapting the PLM to the ERC task in terms of input text, classification structure, and training strategy. Furthermore, we develop our model BERT-ERC according to the proposed paradigm, which improves ERC performance in three aspects, namely suggestive text, fine-grained classification module, and two-stage training. Compared to existing methods, BERT-ERC achieves substantial improvement on four datasets, indicating its effectiveness and generalization capability. Besides, we also set up the limited resources scenario and the online prediction scenario to approximate real-world scenarios. Extensive experiments demonstrate that the proposed paradigm significantly outperforms the previous one and can be adapted to various scenarios.

YNICL Journal 2023 Journal Article

Brain development mediates the relationship between self-reported poor parental monitoring and adolescent anxiety

  • Yiman Li
  • Zheyi Zhou
  • Yuqi Zhang
  • Hui Ai
  • Mingfang Liu
  • Jing Liu
  • Li Wang
  • Jiang Qiu

Adolescence is the peak period for the onset of generalized anxiety disorder (GAD). Brain networks of cognitive and affective control in adolescents are not well developed when their exposure to external stimuli suddenly increases. Reasonable parental monitoring is especially important during this period. To examine the role of parental monitoring in the development of functional brain networks of GAD, we conducted a cross-validation-based predictive study based on the functional brain networks of 192 participants. We found that a set of functional brain networks, especially the default mode network and its connectivity with the frontoparietal network, could predict the ages of adolescents, which was replicated in three independent samples. Importantly, the difference between predicted age and chronological age significantly mediated the relationship between parental monitoring and anxiety levels. These findings suggest that inadequate parental monitoring plays a crucial role in the delayed development of specific brain networks associated with GAD in adolescents. Our work highlights the important role of parental monitoring in adolescent development.

YNIMG Journal 2023 Journal Article

Neuroimaging-based classification of PTSD using data-driven computational approaches: A multisite big data study from the ENIGMA-PGC PTSD consortium

  • Xi Zhu
  • Yoojean Kim
  • Orren Ravid
  • Xiaofu He
  • Benjamin Suarez-Jimenez
  • Sigal Zilcha-Mano
  • Amit Lazarov
  • Seonjoo Lee

BACKGROUND: Recent advances in data-driven computational approaches have been helpful in devising tools to objectively diagnose psychiatric disorders. However, current machine learning studies are limited to small homogeneous samples, different methodologies, and different imaging collection protocols, limiting the ability to directly compare and generalize their results. Here we aimed to classify individuals with PTSD versus controls and assess generalizability using large heterogeneous brain datasets from the ENIGMA-PGC PTSD Working Group. METHODS: We analyzed brain MRI data from 3,477 structural MRI, 2,495 resting-state fMRI, and 1,952 diffusion MRI scans. First, we identified the brain features that best distinguish individuals with PTSD from controls using traditional machine learning methods. Second, we assessed the utility of the denoising variational autoencoder (DVAE) and evaluated its classification performance. Third, we assessed the generalizability and reproducibility of both models using a leave-one-site-out cross-validation procedure for each modality. RESULTS: We found lower performance in classifying PTSD vs. controls with data from over 20 sites (60% test AUC for s-MRI, 59% for rs-fMRI and 56% for d-MRI), as compared to other studies run on single-site data. The performance increased when classifying PTSD from healthy controls without trauma history in each modality (75% AUC). The classification performance remained intact when applying the DVAE framework, which reduced the number of features. Finally, we found that the DVAE framework achieved better generalization to unseen datasets compared with the traditional machine learning frameworks, albeit performance was only slightly above chance. CONCLUSION: These results have the potential to provide a baseline classification performance for PTSD when using large-scale neuroimaging datasets. Our findings show that the choice of control group can heavily affect classification performance.
The DVAE framework provided better generalizability for the multi-site data. This may be more significant in clinical practice, since neuroimaging-based diagnostic DVAE classification models are much less site-specific, rendering them more generalizable.
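The leave-one-site-out cross-validation procedure used above to assess cross-site generalizability can be sketched in a few lines; the helper name and toy site labels below are hypothetical.

```python
def leave_one_site_out(site_ids):
    """Yield (held_out_site, train_idx, test_idx) triples in which each
    held-out fold contains all subjects from exactly one acquisition site,
    so test performance reflects generalization to an unseen site."""
    sites = sorted(set(site_ids))
    for held_out in sites:
        train = [i for i, s in enumerate(site_ids) if s != held_out]
        test = [i for i, s in enumerate(site_ids) if s == held_out]
        yield held_out, train, test

# toy example: six subjects scanned at three sites
sites = ["A", "A", "B", "C", "C", "C"]
for held_out, train, test in leave_one_site_out(sites):
    print(held_out, train, test)  # e.g. first fold: A [2, 3, 4, 5] [0, 1]
```

Each fold trains on all remaining sites and tests on the held-out one, which is stricter than a random split because site-specific scanner effects never leak into training.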

AAAI Conference 2023 Conference Paper

The Implicit Regularization of Momentum Gradient Descent in Overparametrized Models

  • Li Wang
  • Zhiguo Fu
  • Yingcong Zhou
  • Zili Yan

The study of the implicit regularization induced by gradient-based optimization in deep learning is a long-standing pursuit. In the present paper, we characterize the implicit regularization of momentum gradient descent (MGD) in the continuous-time view, the so-called momentum gradient flow (MGF). We show that the components of the weight vector of a deep linear neural network are learned at different evolution rates, and that this evolution gap increases with depth. Firstly, we show that if the depth equals one, the evolution gap between the weight vector components is linear, which is consistent with the behavior of ridge regression. In particular, we establish a tight coupling between MGF and ridge regression for the least squares regression. In detail, we show that when the regularization parameter of ridge is inversely proportional to the square of the time parameter of MGF, the risk of MGF is no more than 1.54 times that of ridge, and their relative Bayesian risks are almost indistinguishable. Secondly, if the model becomes deeper, i.e., the depth is greater than or equal to 2, the evolution gap becomes more significant, which implies an implicit bias towards sparse solutions. Numerical experiments strongly support our theoretical results.
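The ridge coupling stated above can be written compactly; the notation below (design matrix X, response y, MGF time parameter t) is assumed for illustration, not taken from the paper.

```latex
% Ridge estimator:
\hat{\beta}_{\mathrm{ridge}}(\lambda) \;=\; \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda\,\|\beta\|_2^2 .
% Claimed coupling: choosing the regularization parameter inversely
% proportional to the square of the MGF time parameter,
\lambda(t) \;\propto\; \frac{1}{t^{2}},
% the abstract's risk bound reads
\mathrm{Risk}\bigl(\mathrm{MGF}(t)\bigr) \;\le\; 1.54 \cdot \mathrm{Risk}\bigl(\hat{\beta}_{\mathrm{ridge}}(\lambda(t))\bigr).
```

Intuitively, running MGF longer corresponds to weaker explicit regularization, mirroring the familiar early-stopping/ridge correspondence for plain gradient flow.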

AAAI Conference 2023 Conference Paper

Transfer Learning Enhanced DeepONet for Long-Time Prediction of Evolution Equations

  • Wuzhe Xu
  • Yulong Lu
  • Li Wang

Deep operator network (DeepONet) has demonstrated great success in various learning tasks, including learning solution operators of partial differential equations. In particular, it provides an efficient approach to predicting evolution equations over a finite time horizon. Nevertheless, the vanilla DeepONet suffers from stability degradation in long-time prediction. This paper proposes a transfer-learning aided DeepONet to enhance stability. Our idea is to use transfer learning to sequentially update the DeepONets as surrogates for propagators learned in different time frames. The evolving DeepONets can better track the varying complexities of the evolution equations, while only needing to be updated by efficient training of a tiny fraction of the operator networks. Through systematic experiments, we show that the proposed method not only improves the long-time accuracy of DeepONet while maintaining similar computational cost but also substantially reduces the sample size of the training set.

AAAI Conference 2023 Conference Paper

Video-Audio Domain Generalization via Confounder Disentanglement

  • Shengyu Zhang
  • Xusheng Feng
  • Wenyan Fan
  • Wenjing Fang
  • Fuli Feng
  • Wei Ji
  • Shuo Li
  • Li Wang

Existing video-audio understanding models are trained and evaluated in an intra-domain setting, facing performance degeneration in real-world applications where multiple domains and distribution shifts naturally exist. The key to video-audio domain generalization (VADG) lies in alleviating spurious correlations over multi-modal features. To achieve this goal, we resort to causal theory and attribute such correlation to confounders affecting both video-audio features and labels. We propose a DeVADG framework that conducts uni-modal and cross-modal deconfounding through back-door adjustment. DeVADG performs cross-modal disentanglement and obtains fine-grained confounders at both class-level and domain-level using half-sibling regression and unpaired domain transformation, which essentially identifies domain-variant factors and class-shared factors that cause spurious correlations between features and false labels. To promote VADG research, we collect a VADG-Action dataset for video-audio action recognition with over 5,000 video clips across four domains (e.g., cartoon and game) and ten action classes (e.g., cooking and riding). We conduct extensive experiments, i.e., multi-source DG, single-source DG, and qualitative analysis, validating the rationality of our causal analysis and the effectiveness of the DeVADG framework.

YNIMG Journal 2022 Journal Article

A 4D infant brain volumetric atlas based on the UNC/UMN baby connectome project (BCP) cohort

  • Liangjun Chen
  • Zhengwang Wu
  • Dan Hu
  • Ya Wang
  • Fenqiang Zhao
  • Tao Zhong
  • Weili Lin
  • Li Wang

Spatiotemporal (four-dimensional) infant-dedicated brain atlases are essential for neuroimaging analysis of early dynamic brain development. However, due to the substantial technical challenges in the acquisition and processing of infant brain MR images, 4D atlases densely covering the dynamic brain development during infancy are still scarce. Few existing ones generally have fuzzy tissue contrast and low spatiotemporal resolution, leading to degraded accuracy of atlas-based normalization and subsequent analyses. To address this issue, in this paper, we construct a 4D structural MRI atlas for infant brains based on the UNC/UMN Baby Connectome Project (BCP) dataset, which features a high spatial resolution, extensive age-range coverage, and densely sampled time points. Specifically, 542 longitudinal T1w and T2w scans from 240 typically developing infants up to 26 months of age were utilized for our atlas construction. To improve the co-registration accuracy of the infant brain images, which typically exhibit dynamic appearance with low tissue contrast, we employed the state-of-the-art registration method and leveraged our generated reliable brain tissue probability maps in addition to the intensity images to improve the alignment of individual images. To achieve consistent region labeling on both infant and adult brain images for facilitating region-based analysis across ages, we mapped the widely used Desikan cortical parcellation onto our atlas in an age-decreasing manner. Meanwhile, the typical subcortical structures were manually delineated to facilitate the studies related to the subcortex. Compared with the existing infant brain atlases, our 4D atlas has much higher spatiotemporal resolution and preserves more structural details, and thus can boost accuracy in neurodevelopmental analysis during infancy.

YNIMG Journal 2022 Journal Article

A comparison of methods to harmonize cortical thickness measurements across scanners and sites

  • Delin Sun
  • Gopalkumar Rakesh
  • Courtney C. Haswell
  • Mark Logue
  • C. Lexi Baird
  • Erin N. O'Leary
  • Andrew S. Cotton
  • Hong Xie

Results of neuroimaging datasets aggregated from multiple sites may be biased by site-specific profiles in participants’ demographic and clinical characteristics, as well as MRI acquisition protocols and scanning platforms. We compared the impact of four different harmonization methods on results obtained from analyses of cortical thickness data: (1) linear mixed-effects model (LME) that models site-specific random intercepts (LME-INT), (2) LME that models both site-specific random intercepts and age-related random slopes (LME-INT+SLP), (3) ComBat, and (4) ComBat with a generalized additive model (ComBat-GAM). Our test case for comparing harmonization methods was cortical thickness data aggregated from 29 sites, which included 1,340 cases with posttraumatic stress disorder (PTSD) (6.2–81.8 years old) and 2,057 trauma-exposed controls without PTSD (6.3–85.2 years old). We found that, compared to the other data harmonization methods, data processed with ComBat-GAM was more sensitive to the detection of significant case-control differences (χ²(3) = 63.704, p < 0.001) as well as case-control differences in age-related cortical thinning (χ²(3) = 12.082, p = 0.007). Both ComBat and ComBat-GAM outperformed LME methods in detecting sex differences (χ²(3) = 9.114, p = 0.028) in regional cortical thickness. ComBat-GAM also led to stronger estimates of age-related declines in cortical thickness (corrected p-values < 0.001), stronger estimates of case-related cortical thickness reduction (corrected p-values < 0.001), weaker estimates of age-related declines in cortical thickness in cases than controls (corrected p-values < 0.001), stronger estimates of cortical thickness reduction in females than males (corrected p-values < 0.001), and stronger estimates of cortical thickness reduction in females relative to males in cases than controls (corrected p-values < 0.001).
Our results support the use of ComBat-GAM to minimize confounds and increase statistical power when harmonizing data with non-linear effects, and the use of either ComBat or ComBat-GAM for harmonizing data with linear effects.
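For intuition about what site harmonization does, a much-simplified per-site location/scale adjustment can be sketched as below. This is not ComBat (which additionally shrinks site effects with empirical Bayes and can preserve covariates of interest such as age and diagnosis); the function name and data are hypothetical.

```python
import numpy as np

def simple_site_harmonize(x, sites):
    """Standardize each feature within each site, then rescale to the
    pooled mean/std. A toy stand-in for location/scale harmonization
    methods like ComBat; x is (subjects x features), sites labels rows."""
    x = np.asarray(x, dtype=float)
    grand_mu, grand_sd = x.mean(axis=0), x.std(axis=0)
    out = np.empty_like(x)
    for s in np.unique(sites):
        idx = np.asarray(sites) == s
        mu, sd = x[idx].mean(axis=0), x[idx].std(axis=0)
        # remove this site's additive and multiplicative effect
        out[idx] = (x[idx] - mu) / np.where(sd > 0, sd, 1.0)
    return out * grand_sd + grand_mu

rng = np.random.default_rng(2)
sites = np.repeat(["siteA", "siteB"], 50)
x = rng.normal(size=(100, 4))
x[:50] += 2.0  # simulated scanner/site offset at siteA
h = simple_site_harmonize(x, sites)
# after harmonization the two site means agree (up to float error)
print(np.allclose(h[:50].mean(axis=0), h[50:].mean(axis=0)))  # True
```

The cost of such a naive scheme is exactly what ComBat-GAM addresses: it would also erase genuine nonlinear covariate effects (e.g., age) that happen to differ across sites.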

AAAI Conference 2022 Conference Paper

Cross-Dataset Collaborative Learning for Semantic Segmentation in Autonomous Driving

  • Li Wang
  • Dong Li
  • Han Liu
  • Jinzhang Peng
  • Lu Tian
  • Yi Shan

Semantic segmentation is an important task for scene understanding in self-driving cars and robotics, which aims to assign dense labels for all pixels in the image. Existing work typically improves semantic segmentation performance by exploring different network architectures on a target dataset. Little attention has been paid to building a unified system by simultaneously learning from multiple datasets, due to the inherent distribution shift across different datasets. In this paper, we propose a simple, flexible, and general method for semantic segmentation, termed Cross-Dataset Collaborative Learning (CDCL). Our goal is to train a unified model for improving the performance in each dataset by leveraging information from all the datasets. Specifically, we first introduce a family of Dataset-Aware Blocks (DAB) as the fundamental computing units of the network, which help capture homogeneous convolutional representations and heterogeneous statistics across different datasets. Second, we present a Dataset Alternation Training (DAT) mechanism to facilitate the collaborative optimization procedure. We conduct extensive evaluations on diverse semantic segmentation datasets for autonomous driving. Experiments demonstrate that our method consistently achieves notable improvements over prior single-dataset and cross-dataset training methods without introducing extra FLOPs. Particularly, with the same architecture of PSPNet (ResNet-18), our method outperforms the single-dataset baseline by 5.65%, 6.57%, and 5.79% mIoU on the validation sets of Cityscapes, BDD100K, and CamVid, respectively. We also apply CDCL for point cloud 3D semantic segmentation and achieve improved performance, which further validates the superiority and generality of our method. Code and models will be released.

NeurIPS Conference 2022 Conference Paper

DARE: Disentanglement-Augmented Rationale Extraction

  • Linan Yue
  • Qi Liu
  • Yichao Du
  • Yanqing An
  • Li Wang
  • Enhong Chen

Rationale extraction can be considered as a straightforward method of improving the model explainability, where rationales are a subsequence of the original inputs, and can be extracted to support the prediction results. Existing methods are mainly cascaded with the selector which extracts the rationale tokens, and the predictor which makes the prediction based on selected tokens. Since previous works fail to fully exploit the original input, where the information of non-selected tokens is ignored, in this paper, we propose a Disentanglement-Augmented Rationale Extraction (DARE) method, which encapsulates more information from the input to extract rationales. Specifically, it first disentangles the input into the rationale representations and the non-rationale ones, and then learns more comprehensive rationale representations for extracting by minimizing the mutual information (MI) between the two disentangled representations. Besides, to improve the performance of MI minimization, we develop a new MI estimator by exploring existing MI estimation methods. Extensive experimental results on three real-world datasets and simulation studies clearly validate the effectiveness of our proposed method. Code is released at https://github.com/yuelinan/DARE.

NeurIPS Conference 2022 Conference Paper

HSDF: Hybrid Sign and Distance Field for Modeling Surfaces with Arbitrary Topologies

  • Li Wang
  • Jie Yang
  • Weikai Chen
  • Xiaoxu Meng
  • Bo Yang
  • Jintao Li
  • Lin Gao

Neural implicit function based on signed distance field (SDF) has achieved impressive progress in reconstructing 3D models with high fidelity. However, such approaches can only represent closed shapes. Recent works based on unsigned distance function (UDF) are proposed to handle both watertight and open surfaces. Nonetheless, as UDF is signless, its direct output is limited to point cloud, which imposes an additional challenge on extracting high-quality meshes from discrete points. To address this issue, we present a new learnable implicit representation, coined HSDF, that connects the good ends of SDF and UDF. In particular, HSDF is able to represent arbitrary topologies containing both closed and open surfaces while being compatible with existing iso-surface extraction techniques for easy field-to-mesh conversion. In addition to predicting a UDF, we propose to learn an additional sign field via a simple classifier. Unlike traditional SDF, HSDF is able to locate the surface of interest before level surface extraction by generating surface points following NDF (Chibane et al., 2020). We are then able to obtain open surfaces via an adaptive meshing approach that only instantiates regions containing surface into a polygon mesh. We also propose HSDF-Net, a dedicated learning framework that factorizes the learning of HSDF into two easier problems. Experiments on multiple datasets show that HSDF outperforms state-of-the-art techniques both qualitatively and quantitatively.

ICLR Conference 2022 Conference Paper

HyAR: Addressing Discrete-Continuous Action Reinforcement Learning via Hybrid Action Representation

  • Boyan Li
  • Hongyao Tang
  • Yan Zheng 0002
  • Jianye Hao
  • Pengyi Li 0001
  • Zhen Wang 0004
  • Zhaopeng Meng
  • Li Wang

Discrete-continuous hybrid action space is a natural setting in many practical problems, such as robot control and game AI. However, most previous Reinforcement Learning (RL) works only demonstrate the success in controlling with either discrete or continuous action space, while seldom taking into account the hybrid action space. One naive way to address hybrid action RL is to convert the hybrid action space into a unified homogeneous action space by discretization or continualization, so that conventional RL algorithms can be applied. However, this ignores the underlying structure of hybrid action space and also induces the scalability issue and additional approximation difficulties, thus leading to degenerated results. In this paper, we propose Hybrid Action Representation (HyAR) to learn a compact and decodable latent representation space for the original hybrid action space. HyAR constructs the latent space and embeds the dependence between discrete action and continuous parameter via an embedding table and conditional Variational Auto-Encoder (VAE). To further improve the effectiveness, the action representation is trained to be semantically smooth through unsupervised environmental dynamics prediction. Finally, the agent then learns its policy with conventional DRL algorithms in the learned representation space and interacts with the environment by decoding the hybrid action embeddings to the original action space. We evaluate HyAR in a variety of environments with discrete-continuous action space. The results demonstrate the superiority of HyAR when compared with previous baselines, especially for high-dimensional action spaces.

ICML Conference 2022 Conference Paper

Individual Reward Assisted Multi-Agent Reinforcement Learning

  • Li Wang
  • Yupeng Zhang
  • Yujing Hu
  • Weixun Wang
  • Chongjie Zhang
  • Yang Gao 0001
  • Jianye Hao
  • Tangjie Lv

In many real-world multi-agent systems, the sparsity of team rewards often makes it difficult for an algorithm to successfully learn a cooperative team policy. At present, the common way for solving this problem is to design some dense individual rewards for the agents to guide the cooperation. However, most existing works utilize individual rewards in ways that do not always promote teamwork and sometimes are even counterproductive. In this paper, we propose Individual Reward Assisted Team Policy Learning (IRAT), which learns two policies for each agent from the dense individual reward and the sparse team reward with discrepancy constraints for updating the two policies mutually. Experimental results in different scenarios, such as the Multi-Agent Particle Environment and the Google Research Football Environment, show that IRAT significantly outperforms the baseline methods and can greatly promote team policy learning without deviating from the original team objective, even when the individual rewards are misleading or conflict with the team rewards.

YNIMG Journal 2022 Journal Article

Longitudinal brain atlases of early developing cynomolgus macaques from birth to 48 months of age

  • Tao Zhong
  • Jingkuan Wei
  • Kunhua Wu
  • Liangjun Chen
  • Fenqiang Zhao
  • Yuchen Pei
  • Ya Wang
  • Hongjiang Zhang

Longitudinal brain imaging atlases with densely sampled time-points and ancillary anatomical information are of fundamental importance in studying early developmental characteristics of human and non-human primate brains during infancy, which feature extremely dynamic imaging appearance, brain shape and size. However, for non-human primates, which are highly valuable animal models for understanding human brains, the existing brain atlases are mainly developed based on adults or adolescents, denoting a notable lack of temporally densely-sampled atlases covering the dynamic early brain development. To fill this critical gap, in this paper, we construct a comprehensive set of longitudinal brain atlases and associated tissue probability maps (gray matter, white matter, and cerebrospinal fluid) with a total of 12 time-points from birth to 4 years of age (i.e., 1, 2, 3, 4, 5, 6, 9, 12, 18, 24, 36, and 48 months of age) based on 175 longitudinal structural MRI scans from 39 typically-developing cynomolgus macaques, by leveraging state-of-the-art computational techniques tailored for early developing brains. Furthermore, to facilitate region-based analysis using our atlases, we also provide two popular hierarchy parcellations, i.e., cortical hierarchy maps (6 levels) and subcortical hierarchy maps (6 levels), on our longitudinal macaque brain atlases. These early developing atlases, which have the densest time-points during infancy (to the best of our knowledge), will greatly facilitate the studies of macaque brain development.

AAAI Conference 2022 Conference Paper

Privacy-Preserving Face Recognition in the Frequency Domain

  • Yinggui Wang
  • Jian Liu
  • Man Luo
  • Le Yang
  • Li Wang

Some applications require performing face recognition (FR) on third-party servers, which could be accessed by attackers with malicious intents to compromise the privacy of users’ face information. This paper advocates a practical privacy-preserving frequency-domain FR scheme without key management. The new scheme first collects the components with the same frequency from different blocks of a face image to form component channels. Only part of the channels are retained and fed into the analysis network that performs an interpretable privacy-accuracy trade-off analysis to identify channels important for face image visualization but not crucial for maintaining high FR accuracy. For this purpose, the loss function of the analysis network consists of the empirical FR error loss and a face visualization penalty term, and the network is trained in an end-to-end manner. We find that with the developed analysis network, more than 94% of the image energy can be dropped while the face recognition accuracy stays almost undegraded. In order to further protect the remaining frequency components, we propose a fast masking method. Effectiveness of the new scheme in removing the visual information of face images while maintaining their distinguishability is validated over several large face datasets. Results show that the proposed scheme achieves a recognition performance and inference time comparable to ArcFace operating on original face images directly.

TIST Journal 2022 Journal Article

Toward Scalable and Privacy-preserving Deep Neural Network via Algorithmic-Cryptographic Co-design

  • Jun Zhou
  • Longfei Zheng
  • Chaochao Chen
  • Yan Wang
  • Xiaolin Zheng
  • Bingzhe Wu
  • Cen Chen
  • Li Wang

Deep Neural Networks (DNNs) have achieved remarkable progress in various real-world applications, especially when abundant training data are provided. However, data isolation has become a serious problem currently. Existing works build privacy-preserving DNN models from either algorithmic perspective or cryptographic perspective. The former mainly splits the DNN computation graph between data holders or between data holders and server, which demonstrates good scalability but suffers from accuracy loss and potential privacy risks. In contrast, the latter leverages time-consuming cryptographic techniques, which has strong privacy guarantee but poor scalability. In this article, we propose SPNN—a Scalable and Privacy-preserving deep Neural Network learning framework, from an algorithmic-cryptographic co-perspective. From algorithmic perspective, we split the computation graph of DNN models into two parts, i.e., the private-data-related computations that are performed by data holders and the rest heavy computations that are delegated to a semi-honest server with high computation ability. From cryptographic perspective, we propose using two types of cryptographic techniques, i.e., secret sharing and homomorphic encryption, for the isolated data holders to conduct private-data-related computations privately and cooperatively. Furthermore, we implement SPNN in a decentralized setting and introduce user-friendly APIs. Experimental results conducted on real-world datasets demonstrate the superiority of our proposed SPNN.

AAAI Conference 2022 Conference Paper

Two-Stage Octave Residual Network for End-to-End Image Compression

  • Fangdong Chen
  • Yumeng Xu
  • Li Wang

Octave Convolution (OctConv) is a generic convolutional unit that has already achieved good performance in many computer vision tasks. Recent studies also have shown the potential of applying the OctConv in end-to-end image compression. However, considering the characteristic of image compression task, current works of OctConv may limit the performance of the image compression network due to the loss of spatial information caused by the sampling operations of inter-frequency communication. Besides, the correlation between multi-frequency latents produced by OctConv is not utilized in current architectures. In this paper, to address these problems, we propose a novel Two-stage Octave Residual (ToRes) block which strips the sampling operation from OctConv to strengthen the capability of preserving useful information. Moreover, to capture the redundancy between the multi-frequency latents, a context transfer module is designed. The results show that both ToRes block and the incorporation of context transfer module help to improve the Rate-Distortion performance, and the combination of these two strategies makes our model achieve the state-of-the-art performance and outperform the latest compression standard Versatile Video Coding (VVC) in terms of both PSNR and MS-SSIM.

IJCAI Conference 2022 Conference Paper

Vertically Federated Graph Neural Network for Privacy-Preserving Node Classification

  • Chaochao Chen
  • Jun Zhou
  • Longfei Zheng
  • Huiwen Wu
  • Lingjuan Lyu
  • Jia Wu
  • Bingzhe Wu
  • Ziqi Liu

Recently, Graph Neural Network (GNN) has achieved remarkable progress in various real-world tasks on graph data, consisting of node features and the adjacent information between different nodes. High-performance GNN models always depend on both rich features and complete edge information in graph. However, such information could possibly be isolated by different data holders in practice, which is the so-called data isolation problem. To solve this problem, in this paper, we propose VFGNN, a federated GNN learning paradigm for privacy-preserving node classification task under data vertically partitioned setting, which can be generalized to existing GNN models. Specifically, we split the computation graph into two parts. We leave the private data (i.e., features, edges, and labels) related computations on data holders, and delegate the rest of computations to a semi-honest server. We also propose to apply differential privacy to prevent potential information leakage from the server. We conduct experiments on three benchmarks and the results demonstrate the effectiveness of VFGNN.

NeurIPS Conference 2022 Conference Paper

Weighted Mutual Learning with Diversity-Driven Model Compression

  • Miao Zhang
  • Li Wang
  • David Campos
  • Wei Huang
  • Chenjuan Guo
  • Bin Yang

Online distillation attracts attention from the community as it simplifies the traditional two-stage knowledge distillation process into a single stage. Online distillation collaboratively trains a group of peer models, which are treated as students, and all students gain extra knowledge from each other. However, memory consumption and diversity among peers are two key challenges to the scalability and quality of online distillation. To address the two challenges, this paper presents a framework called Weighted Mutual Learning with Diversity-Driven Model Compression (WML) for online distillation. First, at the base of a hierarchical structure where peers share different parts, we leverage the structured network pruning to generate diversified peer models and reduce the memory requirements. Second, rather than taking the average of peers, this paper, for the first time, leverages a bi-level formulation to estimate the relative importance of peers in a closed form, to further boost the effectiveness of the distillation from each other. Extensive experiments show the generalization of the proposed framework, which outperforms existing online distillation methods on a variety of deep neural networks. More interestingly, as a byproduct, WML produces a series of pruned models under different model sizes in a single run, which also achieves competitive results compared with existing channel pruning methods.

AAAI Conference 2022 Conference Paper

What about Inputting Policy in Value Function: Policy Representation and Policy-Extended Value Function Approximator

  • Hongyao Tang
  • Zhaopeng Meng
  • Jianye Hao
  • Chen Chen
  • Daniel Graves
  • Dong Li
  • Changmin Yu
  • Hangyu Mao

We study Policy-extended Value Function Approximator (PeVFA) in Reinforcement Learning (RL), which extends conventional value function approximator (VFA) to take as input not only the state (and action) but also an explicit policy representation. Such an extension enables PeVFA to preserve values of multiple policies at the same time and brings an appealing characteristic, i.e., value generalization among policies. We formally analyze the value generalization under Generalized Policy Iteration (GPI). From theoretical and empirical lenses, we show that generalized value estimates offered by PeVFA may have lower initial approximation error to true values of successive policies, which is expected to improve consecutive value approximation during GPI. Based on above clues, we introduce a new form of GPI with PeVFA which leverages the value generalization along policy improvement path. Moreover, we propose a representation learning framework for RL policy, providing several approaches to learn effective policy embeddings from policy network parameters or state-action pairs. In our experiments, we evaluate the efficacy of value generalization offered by PeVFA and policy representation learning in several OpenAI Gym continuous control tasks. For a representative instance of algorithm implementation, Proximal Policy Optimization (PPO) re-implemented under the paradigm of GPI with PeVFA achieves about 40% performance improvement on its vanilla counterpart in most environments.

YNIMG Journal 2021 Journal Article

DIKA-Nets: Domain-invariant knowledge-guided attention networks for brain skull stripping of early developing macaques

  • Tao Zhong
  • Fenqiang Zhao
  • Yuchen Pei
  • Zhenyuan Ning
  • Lufan Liao
  • Zhengwang Wu
  • Yuyu Niu
  • Li Wang

As non-human primates, macaques have a close phylogenetic relationship to human beings and have been proven to be a valuable and widely used animal model in human neuroscience research. Accurate skull stripping (aka. brain extraction) of brain magnetic resonance imaging (MRI) is a crucial prerequisite in neuroimaging analysis of macaques. Most of the current skull stripping methods can achieve satisfactory results for human brains, but when applied to macaque brains, especially during early brain development, the results are often unsatisfactory. In fact, the early dynamic, regionally-heterogeneous development of macaque brains, accompanied by poor and age-related contrast between different anatomical structures, poses significant challenges for accurate skull stripping. To overcome these challenges, we propose a fully-automated framework to effectively fuse the age-specific intensity information and domain-invariant prior knowledge as important guiding information for robust skull stripping of developing macaques from 0 to 36 months of age. Specifically, we generate Signed Distance Map (SDM) and Center of Gravity Distance Map (CGDM) based on the intermediate segmentation results as guidance. Instead of using local convolution, we fuse all information using the Dual Self-Attention Module (DSAM), which can capture global spatial and channel-dependent information of feature maps. To extensively evaluate the performance, we adopt two relatively-large challenging MRI datasets from rhesus macaques and cynomolgus macaques, respectively, with a total of 361 scans from two different scanners with different imaging protocols. We perform cross-validation by using one dataset for training and the other one for testing. Our method outperforms five popular brain extraction tools and three deep-learning-based methods on cross-source MRI datasets without any transfer learning.

IJCAI Conference 2021 Conference Paper

Preference-Adaptive Meta-Learning for Cold-Start Recommendation

  • Li Wang
  • Binbin Jin
  • Zhenya Huang
  • Hongke Zhao
  • Defu Lian
  • Qi Liu
  • Enhong Chen

In recommender systems, the cold-start problem is a critical issue. To alleviate this problem, an emerging direction adopts meta-learning frameworks and achieves success. Most existing works aim to learn globally shared prior knowledge across all users so that it can be quickly adapted to a new user with sparse interactions. However, globally shared prior knowledge may be inadequate to discern users’ complicated behaviors and causes poor generalization. Therefore, we argue that prior knowledge should be locally shared by users with similar preferences who can be recognized by social relations. To this end, in this paper, we propose a Preference-Adaptive Meta-Learning approach (PAML) to improve existing meta-learning frameworks with better generalization capacity. Specifically, to address two challenges imposed by social relations, we first identify reliable implicit friends to strengthen a user’s social relations based on our defined palindrome paths. Then, a coarse-fine preference modeling method is proposed to leverage social relations and capture the preference. Afterwards, a novel preference-specific adapter is designed to adapt the globally shared prior knowledge to the preference-specific knowledge so that users who have similar tastes share similar knowledge. We conduct extensive experiments on two publicly available datasets. Experimental results validate the power of social relations and the effectiveness of PAML.

NeurIPS Conference 2021 Conference Paper

Progressive Coordinate Transforms for Monocular 3D Object Detection

  • Li Wang
  • Li Zhang
  • Yi Zhu
  • Zhi Zhang
  • Tong He
  • Mu Li
  • Xiangyang Xue

Recognizing and localizing objects in the 3D space is a crucial ability for an AI agent to perceive its surrounding environment. While significant progress has been achieved with expensive LiDAR point clouds, it poses a great challenge for 3D object detection given only a monocular image. While there exist different alternatives for tackling this problem, it is found that they are either equipped with heavy networks to fuse RGB and depth information or empirically ineffective to process millions of pseudo-LiDAR points. With in-depth examination, we realize that these limitations are rooted in inaccurate object localization. In this paper, we propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT) to facilitate learning coordinate representations. Specifically, a localization boosting mechanism with confidence-aware loss is introduced to progressively refine the localization prediction. In addition, semantic image representation is also exploited to compensate for the usage of patch proposals. Despite being lightweight and simple, our strategy allows us to establish a new state-of-the-art among the monocular 3D detectors on the competitive KITTI benchmark. At the same time, our proposed PCT shows great generalization to most coordinate-based 3D detection frameworks.

YNIMG Journal 2021 Journal Article

The maturation and cognitive relevance of structural brain network organization from early infancy to childhood

  • Mackenzie Woodburn
  • Cheyenne L. Bricken
  • Zhengwang Wu
  • Gang Li
  • Li Wang
  • Weili Lin
  • Margaret A. Sheridan
  • Jessica R. Cohen

The interactions of brain regions with other regions at the network level likely provide the infrastructure necessary for cognitive processes to develop. Specifically, it has been theorized that in infancy brain networks become more modular, or segregated, to support early cognitive specialization, before integration across networks increases to support the emergence of higher-order cognition. The present study examined the maturation of structural covariance networks (SCNs) derived from longitudinal cortical thickness data collected between infancy and childhood (0-6 years). We assessed modularity as a measure of network segregation and global efficiency as a measure of network integration. At the group level, we observed trajectories of increasing modularity and decreasing global efficiency between early infancy and six years. We further examined subject-based maturational coupling networks (sbMCNs) in a subset of this cohort with cognitive outcome data at 8-10 years, which allowed us to relate the network organization of longitudinal cortical thickness maturation to cognitive outcomes in middle childhood. We found that lower global efficiency of sbMCNs throughout early development (across the first year) related to greater motor learning at 8-10 years. Together, these results provide novel evidence characterizing the maturation of brain network segregation and integration across the first six years of life, and suggest that specific trajectories of brain network maturation contribute to later cognitive outcomes.

JBHI Journal 2020 Journal Article

Adaptive-Guided-Coupling-Probability Level Set for Retinal Layer Segmentation

  • Yue Sun
  • Sijie Niu
  • Xizhan Gao
  • Jie Su
  • Jiwen Dong
  • Yuehui Chen
  • Li Wang

Quantitative assessment of retinal layer thickness in spectral domain-optical coherence tomography (SD-OCT) images is vital for clinicians to determine the degree of ophthalmic lesions. However, due to the complex retinal tissues, high-level speckle noises and low intensity constraint, how to accurately recognize the retinal layer structure still remains a challenge. To overcome this problem, this paper proposes an adaptive-guided-coupling-probability level set method for retinal layer segmentation in SD-OCT images. Specifically, based on Bayes's theorem, each voxel probability representation is composed of two probability terms in our method. The first term is constructed as neighborhood Gaussian fitting distribution to characterize intensity information for each intra-retinal layer. The second one is boundary probability map generated by combining anatomical priors and adaptive thickness information to ensure surfaces evolve within a proper range. Then, the voxel probability representation is introduced into the proposed segmentation framework based on coupling probability level set to detect layer boundaries. A total of 1792 retinal B-scan images from 4 SD-OCT cubes in healthy eyes, 5 cubes in abnormal eyes with central serous chorioretinopathy and 5 SD-OCT cubes in abnormal eyes with age-related macular disease are used to evaluate the proposed method. The experiment demonstrates that the segmentation results obtained by the proposed method have a good consistency with ground truth, and the proposed method outperforms six methods in the layer segmentation of uneven retinal SD-OCT images.

YNICL Journal 2020 Journal Article

Altered resting-state dynamic functional brain networks in major depressive disorder: Findings from the REST-meta-MDD consortium

  • Yicheng Long
  • Hengyi Cao
  • Chaogan Yan
  • Xiao Chen
  • Le Li
  • Francisco Xavier Castellanos
  • Tongjian Bai
  • Qijing Bo

BACKGROUND: Major depressive disorder (MDD) is known to be characterized by altered brain functional connectivity (FC) patterns. However, whether and how the features of dynamic FC would change in patients with MDD are unclear. In this study, we aimed to characterize dynamic FC in MDD using a large multi-site sample and a novel dynamic network-based approach. METHODS: Resting-state functional magnetic resonance imaging (fMRI) data were acquired from a total of 460 MDD patients and 473 healthy controls, as a part of the REST-meta-MDD consortium. Resting-state dynamic functional brain networks were constructed for each subject by a sliding-window approach. Multiple spatio-temporal features of dynamic brain networks, including temporal variability, temporal clustering and temporal efficiency, were then compared between patients and healthy subjects at both global and local levels. RESULTS: Corresponding local changes in MDD were mainly found in the default-mode, sensorimotor and subcortical areas. Measures of temporal variability and characteristic temporal path length were significantly correlated with depression severity in patients (corrected p < 0.05). Moreover, the observed between-group differences were robustly present in both first-episode, drug-naïve (FEDN) and non-FEDN patients. CONCLUSIONS: Our findings suggest that excessive temporal variations of brain FC, reflecting abnormal communications between large-scale brain networks over time, may underlie the neuropathology of MDD.
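As an illustrative aside (not code from the paper above), the sliding-window construction of dynamic FC can be sketched as follows: correlate regional time series within each window, slide the window, and summarize how the resulting connectivity matrices fluctuate. Window length, stride, and the variability summary are all hypothetical choices here, not the study's parameters:

```python
import numpy as np

def sliding_window_fc(ts, win_len=30, stride=5):
    """Dynamic functional connectivity via a sliding window.

    ts : (n_timepoints, n_regions) regional BOLD time series
    Returns a (n_windows, n_regions, n_regions) stack of Pearson
    correlation matrices, one per window.
    """
    n_t, _ = ts.shape
    mats = []
    for start in range(0, n_t - win_len + 1, stride):
        window = ts[start:start + win_len]
        mats.append(np.corrcoef(window, rowvar=False))
    return np.stack(mats)

def temporal_variability(fc_stack):
    """Standard deviation of each edge across windows, averaged over
    edges -- one simple summary of how much FC fluctuates over time."""
    iu = np.triu_indices(fc_stack.shape[1], k=1)
    edges = fc_stack[:, iu[0], iu[1]]  # (n_windows, n_edges)
    return edges.std(axis=0).mean()
```

Graph-theoretic measures such as temporal clustering and temporal efficiency would then be computed on the thresholded window-by-window networks rather than on this scalar summary.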

YNICL Journal 2020 Journal Article

Biotypes of major depressive disorder: Neuroimaging evidence from resting-state default mode network patterns

  • Sugai Liang
  • Wei Deng
  • Xiaojing Li
  • Andrew J. Greenshaw
  • Qiang Wang
  • Mingli Li
  • Xiaohong Ma
  • Tong-Jian Bai

BACKGROUND: Major depressive disorder (MDD) is a heterogeneous disorder associated with aberrant functional connectivity within the default mode network (DMN). This study focused on data-driven identification and validation of potential DMN-pattern-based MDD subtypes to parse heterogeneity of the disorder. METHODS: The sample comprised 1397 participants including 690 patients with MDD and 707 healthy controls (HC) registered from multiple sites based on the REST-meta-MDD Project in China. Baseline resting-state functional magnetic resonance imaging (rs-fMRI) data was recorded for each participant. Discriminative features were selected from DMN between patients and HC. Patient subgroups were defined by K-means and principal component analysis in the multi-site datasets and validated in an independent single-site dataset. Statistical significance of the resultant clustering was confirmed. Demographic and clinical variables were compared between identified patient subgroups. RESULTS: Two MDD subgroups with differing functional connectivity profiles of DMN were identified in the multi-site datasets, and relatively stable in different validation samples. The predominant dysfunctional connectivity profiles were detected among superior frontal cortex, ventral medial prefrontal cortex, posterior cingulate cortex and precuneus, whereas one subgroup exhibited increases of connectivity (hyperDMN MDD) and another subgroup showed decreases of connectivity (hypoDMN MDD). The hyperDMN subgroup in the discovery dataset had age-related severity of depressive symptoms. Patient subgroups had comparable demographic and clinical symptom variables. CONCLUSIONS: Findings suggest the existence of two neural subtypes of MDD associated with different dysfunctional DMN connectivity patterns, which may provide useful evidence for parsing heterogeneity of depression and be valuable to inform the search for personalized treatment strategies.

AAAI Conference 2020 Conference Paper

Characterizing Membership Privacy in Stochastic Gradient Langevin Dynamics

  • Bingzhe Wu
  • Chaochao Chen
  • Shiwan Zhao
  • Cen Chen
  • Yuan Yao
  • Guangyu Sun
  • Li Wang
  • Xiaolu Zhang

Bayesian deep learning has recently been regarded as an intrinsic way to characterize the weight uncertainty of deep neural networks (DNNs). Stochastic Gradient Langevin Dynamics (SGLD) is an effective method to enable Bayesian deep learning on large-scale datasets. Previous theoretical studies have shown various appealing properties of SGLD, ranging from convergence properties to generalization bounds. In this paper, we study the properties of SGLD from the novel perspective of membership privacy protection (i.e., preventing the membership attack). The membership attack, which aims to determine whether a specific sample was used to train a given DNN model, has emerged as a common threat against deep learning algorithms. To this end, we build a theoretical framework to analyze the information leakage (w.r.t. the training dataset) of a model trained using SGLD. Based on this framework, we demonstrate that SGLD can prevent the information leakage of the training dataset to a certain extent. Moreover, our theoretical analysis can be naturally extended to other types of Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) methods. Empirical results on different datasets and models verify our theoretical findings and suggest that the SGLD algorithm can not only reduce the information leakage but also improve the generalization ability of DNN models in real-world applications.
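To make the algorithm under study concrete: an SGLD update is a stochastic gradient step plus Gaussian noise whose variance matches the step size. The sketch below is a minimal illustration on a toy Gaussian target, not the paper's implementation; the step size and target are made up for demonstration.

```python
import numpy as np

def sgld_step(theta, grad_log_post, step_size, rng):
    """One SGLD update: theta <- theta + (eps/2) * grad log p(theta|data) + N(0, eps)."""
    noise = rng.normal(0.0, np.sqrt(step_size), size=theta.shape)
    return theta + 0.5 * step_size * grad_log_post(theta) + noise

# Illustrative target: standard Gaussian posterior, so grad log p(theta) = -theta.
rng = np.random.default_rng(0)
theta = np.zeros(2)
samples = []
for _ in range(5000):
    theta = sgld_step(theta, lambda t: -t, step_size=0.1, rng=rng)
    samples.append(theta.copy())
samples = np.asarray(samples)
# After burn-in, the chain's empirical variance approaches the target's unit variance.
```

The injected noise is what distinguishes SGLD from plain SGD, and it is the source of both the posterior-sampling behavior and the leakage-damping effect the paper analyzes.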

AAAI Conference 2020 Conference Paper

Feature Variance Regularization: A Simple Way to Improve the Generalizability of Neural Networks

  • Ranran Huang
  • Hanbo Sun
  • Ji Liu
  • Lu Tian
  • Li Wang
  • Yi Shan
  • Yu Wang

To improve the generalization ability of neural networks, we propose a novel regularization method that regularizes the empirical risk using a penalty on the empirical variance of the features. Intuitively, our approach introduces confusion into feature extraction and prevents the models from learning features that may relate to specific training samples. According to our theoretical analysis, our method encourages models to generate closer feature distributions for the training set and the unobservable true data, and to minimize the expected risk as well, which allows the model to adapt to new samples better. We provide a thorough empirical justification of our approach, which achieves greater improvement than other regularization methods. The experimental results show the effectiveness of our method on multiple visual tasks, including classification (CIFAR100, ImageNet, fine-grained datasets) and semantic segmentation (Cityscapes).
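The penalty described in the abstract can be sketched as the per-dimension variance of a batch's feature activations, averaged and added to the task loss. This is a schematic numpy version with an illustrative weight `lam`, not the authors' code:

```python
import numpy as np

def variance_penalty(features):
    """Empirical variance of the features, averaged over feature dimensions.
    features: (batch_size, feature_dim) activations from some layer."""
    return features.var(axis=0).mean()

def regularized_loss(task_loss, features, lam=0.1):
    """Empirical risk plus the feature-variance penalty (lam is a made-up weight)."""
    return task_loss + lam * variance_penalty(features)

feats = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
# Both feature dimensions have variance 1.0 across the batch, so the penalty is 1.0
# and the regularized loss is 0.5 + 0.1 * 1.0 = 0.6.
print(regularized_loss(0.5, feats))
```

Minimizing this term pushes feature vectors within a batch toward each other, which matches the abstract's intuition of preventing features tied to specific training samples.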

TIST Journal 2020 Journal Article

Practical Privacy Preserving POI Recommendation

  • Chaochao Chen
  • Jun Zhou
  • Bingzhe Wu
  • Wenjing Fang
  • Li Wang
  • Yuan Qi
  • Xiaolin Zheng

Point-of-Interest (POI) recommendation has been extensively studied and successfully applied in industry recently. However, most existing approaches build centralized models on the basis of collecting users’ data. Both private data and models are held by the recommender, which causes serious privacy concerns. In this article, we propose a novel Privacy preserving POI Recommendation (PriRec) framework. First, to protect data privacy, users’ private data (features and actions) are kept on their own side, e.g., on a cellphone or tablet. Meanwhile, the public data that need to be accessed by all the users are kept by the recommender to reduce the storage costs of users’ devices. Those public data include: (1) static data only related to the status of a POI, such as POI categories, and (2) dynamic data dependent on user-POI actions, such as visited counts. The dynamic data could be sensitive, and we develop local differential privacy techniques to release such data to the public with privacy guarantees. Second, PriRec follows the representation of Factorization Machines (FM), which consists of a linear model and a feature interaction model. To protect the model privacy, the linear models are saved on the users’ side, and we propose a secure decentralized gradient descent protocol for users to learn them collaboratively. The feature interaction model is kept by the recommender since there is no privacy risk, and we adopt a secure aggregation strategy in a federated learning paradigm to learn it. To this end, PriRec keeps users’ private raw data and models in users’ own hands, and protects user privacy to a large extent. We apply PriRec to real-world datasets, and comprehensive experiments demonstrate that, compared with FM, PriRec achieves comparable or even better recommendation accuracy.
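As a rough illustration of releasing sensitive counts with differential privacy, the classic Laplace mechanism adds noise scaled to sensitivity/epsilon. This is a generic sketch of the idea, not necessarily PriRec's exact local-DP protocol, and all names here are illustrative:

```python
import numpy as np

def laplace_release(true_count, epsilon, sensitivity=1.0, rng=None):
    """Release a count with Laplace noise calibrated to sensitivity/epsilon.
    Generic Laplace mechanism; the paper's mechanism may differ in detail."""
    rng = rng if rng is not None else np.random.default_rng()
    scale = sensitivity / epsilon
    return true_count + rng.laplace(0.0, scale)

rng = np.random.default_rng(42)
# Noisy releases of a POI's visited count: unbiased, variance 2 * (sensitivity/eps)^2.
noisy = [laplace_release(100, epsilon=1.0, rng=rng) for _ in range(10000)]
print(np.mean(noisy))  # averages out close to the true count of 100
```

Smaller epsilon gives stronger privacy at the cost of noisier released counts, which is the trade-off the abstract's "privacy guarantees" refer to.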

AAAI Conference 2020 Conference Paper

Show, Recall, and Tell: Image Captioning with Recall Mechanism

  • Li Wang
  • Zechen Bai
  • Yonghua Zhang
  • Hongtao Lu

Generating natural and accurate descriptions in image captioning has always been a challenge. In this paper, we propose a novel recall mechanism to imitate the way humans conduct captioning. There are three parts in our recall mechanism: the recall unit, the semantic guide (SG) and the recalled-word slot (RWS). The recall unit is a text-retrieval module designed to retrieve recalled words for images. SG and RWS are designed to make the best use of recalled words. The SG branch generates a recalled context, which guides the process of caption generation. The RWS branch is responsible for copying recalled words into the caption. Inspired by the pointing mechanism in text summarization, we adopt a soft switch to balance the generated-word probabilities between SG and RWS. In the CIDEr optimization step, we also introduce an individual recalled-word reward (WR) to boost training. Our proposed methods (SG+RWS+WR) achieve BLEU-4 / CIDEr / SPICE scores of 36.6 / 116.9 / 21.3 with cross-entropy loss and 38.7 / 129.1 / 22.4 with CIDEr optimization on the MSCOCO Karpathy test split, which surpass the results of other state-of-the-art methods.
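The soft switch described above can be sketched as a convex combination of the two word distributions, gated by a scalar (in practice typically a sigmoid over the decoder state). The distributions and gate value below are made up for illustration:

```python
import numpy as np

def soft_switch(p_sg, p_rws, gate):
    """Final word distribution: gate * P_SG + (1 - gate) * P_RWS.
    gate in (0, 1) decides how much to generate vs. copy a recalled word."""
    return gate * p_sg + (1.0 - gate) * p_rws

p_sg = np.array([0.7, 0.2, 0.1])   # generation distribution (semantic guide)
p_rws = np.array([0.1, 0.1, 0.8])  # copy distribution over recalled words
mixed = soft_switch(p_sg, p_rws, gate=0.6)
# mixed = [0.46, 0.16, 0.38]; a convex combination of distributions still sums to 1.
```

Because both inputs are valid probability distributions and the gate is in (0, 1), the output is itself a valid distribution, so the decoder can sample or argmax from it directly.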

AAAI Conference 2020 Conference Paper

Spatio-Temporal Deformable Convolution for Compressed Video Quality Enhancement

  • Jianing Deng
  • Li Wang
  • Shiliang Pu
  • Cheng Zhuo

Recent years have witnessed remarkable success of deep learning methods in quality enhancement for compressed video. To better explore temporal information, existing methods usually estimate optical flow for temporal motion compensation. However, since compressed video could be seriously distorted by various compression artifacts, the estimated optical flow tends to be inaccurate and unreliable, thereby resulting in ineffective quality enhancement. In addition, optical flow estimation for consecutive frames is generally conducted in a pairwise manner, which is computationally expensive and inefficient. In this paper, we propose a fast yet effective method for compressed video quality enhancement by incorporating a novel Spatio-Temporal Deformable Fusion (STDF) scheme to aggregate temporal information. Specifically, the proposed STDF takes a target frame along with its neighboring reference frames as input to jointly predict an offset field to deform the spatio-temporal sampling positions of convolution. As a result, complementary information from both target and reference frames can be fused within a single Spatio-Temporal Deformable Convolution (STDC) operation. Extensive experiments show that our method achieves state-of-the-art performance in compressed video quality enhancement in terms of both accuracy and efficiency.

AAAI Conference 2019 Short Paper

An Optimal Rewiring Strategy for Cooperative Multiagent Social Learning

  • Hongyao Tang
  • Jianye Hao
  • Li Wang
  • Tim Baarslag
  • Zan Wang

Multiagent coordination in cooperative multiagent systems (MASs) has been widely studied in both the fixed-agent repeated interaction setting and the static social learning framework. However, two aspects of dynamics in real-world MASs are currently missing. First, the network topologies can dynamically change during the course of interaction. Second, the interaction utilities between each pair of agents may not be identical and are not known a priori. Both issues mentioned above increase the difficulty of coordination. In this paper, we consider multiagent social learning in a dynamic environment in which agents can alter their connections and interact with randomly chosen neighbors with unknown utilities beforehand. We propose an optimal rewiring strategy to select the most beneficial peers to maximize the accumulated payoffs in long-run interactions. We empirically demonstrate the effects of our approach in large-scale MASs.

AAMAS Conference 2019 Conference Paper

An Optimal Rewiring Strategy for Cooperative Multiagent Social Learning

  • Hongyao Tang
  • Jianye Hao
  • Li Wang
  • Zan Wang
  • Tim Baarslag

Multiagent coordination is a key problem in cooperative multiagent systems (MASs). It has been widely studied in both the fixed-agent repeated interaction setting and the static social learning framework. However, two aspects of dynamics in real-world MASs are currently neglected. First, the network topologies can change dynamically during the course of interaction. Second, the interaction utilities can differ between each pair of agents and are usually unknown before interaction. Both issues mentioned above increase the difficulty of coordination. In this paper, we consider multiagent social learning in a dynamic environment in which agents can alter their connections and interact with randomly chosen neighbors with unknown utilities beforehand. We propose an optimal rewiring strategy to select the most beneficial peers to maximize the accumulated payoffs in long-run interactions. We empirically demonstrate the effects of our approach in a variety of large-scale MASs.

AAAI Conference 2019 Conference Paper

Difficulty-Aware Attention Network with Confidence Learning for Medical Image Segmentation

  • Dong Nie
  • Li Wang
  • Lei Xiang
  • Sihang Zhou
  • Ehsan Adeli
  • Dinggang Shen

Medical image segmentation is a key step for various applications, such as image-guided radiation therapy and diagnosis. Recently, deep neural networks have provided promising solutions for automatic image segmentation; however, they often perform well only on regular samples (i.e., easy-to-segment samples), since the datasets are dominated by easy and regular samples. For medical images, due to huge inter-subject variations or disease-specific effects on subjects, there exist several difficult-to-segment cases that are often overlooked by previous works. To address this challenge, we propose a difficulty-aware deep segmentation network with confidence learning for end-to-end segmentation. The proposed framework has two main contributions: 1) Besides the segmentation network, we also propose a fully convolutional adversarial network for confidence learning to provide voxel-wise and region-wise confidence information for the segmentation network. We relax the adversarial learning to confidence learning by decreasing the priority of adversarial learning, so that we can avoid the training imbalance between generator and discriminator. 2) We propose a difficulty-aware attention mechanism to properly handle hard samples or hard regions considering structural information, which may overcome the shortcomings of focal loss. We further propose a fusion module to selectively fuse the concatenated feature maps in encoder-decoder architectures. Experimental results on clinical and challenge datasets show that our proposed network can achieve state-of-the-art segmentation accuracy. Further analysis also indicates that each individual component of our proposed network contributes to the overall performance improvement.

NeurIPS Conference 2019 Conference Paper

Generalization in Generative Adversarial Networks: A Novel Perspective from Privacy Protection

  • Bingzhe Wu
  • Shiwan Zhao
  • Chaochao Chen
  • Haoyang Xu
  • Li Wang
  • Xiaolu Zhang
  • Guangyu Sun
  • Jun Zhou

In this paper, we aim to understand the generalization properties of generative adversarial networks (GANs) from a new perspective of privacy protection. Theoretically, we prove that a differentially private learning algorithm used for training the GAN does not overfit to a certain degree, i.e., the generalization gap can be bounded. Moreover, some recent works, such as the Bayesian GAN, can be re-interpreted based on our theoretical insight from privacy protection. Quantitatively, to evaluate the information leakage of well-trained GAN models, we perform various membership attacks on these models. The results show that previous Lipschitz regularization techniques are effective in not only reducing the generalization gap but also alleviating the information leakage of the training dataset.

IJCAI Conference 2018 Conference Paper

A Reinforced Topic-Aware Convolutional Sequence-to-Sequence Model for Abstractive Text Summarization

  • Li Wang
  • Junlin Yao
  • Yunzhe Tao
  • Li Zhong
  • Wei Liu
  • Qiang Du

In this paper, we propose a deep learning approach to tackle automatic summarization tasks by incorporating topic information into the convolutional sequence-to-sequence (ConvS2S) model and using self-critical sequence training (SCST) for optimization. Through jointly attending to topics and word-level alignment, our approach can improve the coherence, diversity, and informativeness of generated summaries via a biased probability generation mechanism. On the other hand, reinforcement training, like SCST, directly optimizes the proposed model with respect to the non-differentiable metric ROUGE, which also avoids exposure bias during inference. We carry out the experimental evaluation against state-of-the-art methods over the Gigaword, DUC-2004, and LCSTS datasets. The empirical results demonstrate the superiority of our proposed method in abstractive summarization.

AAAI Conference 2018 Conference Paper

Efficient Test-Time Predictor Learning With Group-Based Budget

  • Li Wang
  • Dajiang Zhu
  • Yujie Chi

Learning a test-time efficient predictor is becoming important for many real-world applications for which accessing the necessary features of test data is costly. In this paper, we propose a novel approach to learn a linear predictor by introducing binary indicator variables for selecting feature groups and imposing an explicit budget constraint to upper-bound the total cost of selected groups. We solve the convex relaxation of the resulting problem, with the optimal solution proved to be integer-valued for most of the elements at the optimum and independent of the specific forms of loss functions used. We propose a general and efficient algorithm to solve the relaxation problem by leveraging existing SVM solvers with various loss functions. For certain loss functions, the proposed algorithm can further take advantage of SVM solvers in the primal to tackle large-scale and high-dimensional data. Experiments on various datasets demonstrate the effectiveness and efficiency of the proposed method by comparing with various baselines.

AAAI Conference 2017 Conference Paper

Latent Smooth Skeleton Embedding

  • Li Wang
  • Qi Mao
  • Ivor Tsang

Learning a smooth skeleton in a low-dimensional space from noisy data becomes important in computer vision and computational biology. Existing methods assume that the manifold constructed from the data is smooth, but they lack the ability to model skeleton structures from noisy data. To overcome this issue, we propose a novel probabilistic structured learning model to learn the density of latent embedding given high-dimensional data and its neighborhood graph. The embedded points that form a smooth skeleton structure are obtained by maximum a posteriori (MAP) estimation. Our analysis shows that the resulting similarity matrix is sparse and unique, and its associated kernel has eigenvalues that follow a power law distribution, which leads to the embeddings of a smooth skeleton. The model is extended to learn a sparse similarity matrix when the graph structure is unknown. Extensive experiments demonstrate the effectiveness of the proposed methods on various datasets by comparing them with existing methods.

AAAI Conference 2016 Conference Paper

Learning Sparse Confidence-Weighted Classifier on Very High Dimensional Data

  • Mingkui Tan
  • Yan Yan
  • Li Wang
  • Anton van den Hengel
  • Ivor W. Tsang
  • Qinfeng (Javen) Shi

Confidence-weighted (CW) learning is a successful online learning paradigm which maintains a Gaussian distribution over classifier weights and adopts a covariance matrix to represent the uncertainties of the weight vectors. However, there are two deficiencies in existing full CW learning paradigms, these being the sensitivity to irrelevant features, and the poor scalability to high dimensional data due to the maintenance of the covariance structure. In this paper, we begin by presenting an online-batch CW learning scheme, and then present a novel paradigm to learn sparse CW classifiers. The proposed paradigm essentially identifies feature groups and naturally builds a block diagonal covariance structure, making it very suitable for CW learning over very high-dimensional data. Extensive experimental results demonstrate the superior performance of the proposed methods over state-of-the-art counterparts on classification and feature selection tasks.

JMLR Journal 2014 Journal Article

Towards Ultrahigh Dimensional Feature Selection for Big Data

  • Mingkui Tan
  • Ivor W. Tsang
  • Li Wang

In this paper, we present a new adaptive feature scaling scheme for ultrahigh-dimensional feature selection on Big Data, and then reformulate it as a convex semi-infinite programming (SIP) problem. To address the SIP, we propose an efficient feature generating paradigm. Different from traditional gradient-based approaches that conduct optimization on all input features, the proposed paradigm iteratively activates a group of features, and solves a sequence of multiple kernel learning (MKL) subproblems. To further speed up the training, we propose to solve the MKL subproblems in their primal forms through a modified accelerated proximal gradient approach. Due to this optimization scheme, some efficient cache techniques are also developed. The feature generating paradigm is guaranteed to converge globally under mild conditions, and can achieve lower feature selection bias. Moreover, the proposed method can tackle two challenging tasks in feature selection: 1) group-based feature selection with complex structures, and 2) nonlinear feature selection with explicit feature mappings. Comprehensive experiments on a wide range of synthetic and real-world data sets of tens of millions of data points with $O(10^{14})$ features demonstrate the competitive performance of the proposed method over state-of-the-art feature selection methods in terms of generalization performance and training efficiency.

AAAI Conference 2012 Conference Paper

Convex Matching Pursuit for Large-Scale Sparse Coding and Subset Selection

  • Mingkui Tan
  • Ivor Tsang
  • Li Wang
  • Xinming Zhang

In this paper, a new convex matching pursuit scheme is proposed for tackling large-scale sparse coding and subset selection problems. In contrast with current matching pursuit algorithms such as subspace pursuit (SP), the proposed algorithm has a convex formulation and guarantees that the objective value can be monotonically decreased. Moreover, theoretical analysis and experimental results show that the proposed method achieves better scalability while maintaining similar or better decoding ability compared with state-of-the-art methods on large-scale problems.