EAAI Journal 2026 Journal Article
A hybrid method for anomaly data detection and reconstruction in proton exchange membrane fuel cells to enhance life prediction accuracy
- Donghai Hu
- Yan Sun
- Yinjie Xu
- Yuan Li
- Biaoyi Liu
- Hua Ding
- Jing Wang
- Hongwei Liu
Author name cluster
Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
EAAI Journal 2026 Journal Article
AIIM Journal 2026 Journal Article
EAAI Journal 2026 Journal Article
AAAI Conference 2026 Conference Paper
Machine learning under limited computational resources has gained increasing attention recently. A common yet challenging scenario is managing multiple time-constrained learning tasks with budgeted computational resources, known as Computational Resource Efficient Learning (CoRE-Learning). To this end, a recently proposed framework, Learning with Adaptive Resource Allocation (LARA), offers a preliminary approach. In this paper, we point out the limitations of LARA, including its reliance on interpolation-based extrapolation methods, the need for a fixed exploration phase, and the use of high-frequency re-estimation and reallocation strategies. To address these issues, we propose Look-ahead and immediate Resource Allocation (LaiRA). Our approach incorporates an efficient Dynamic Kalman Filtering (DKF) for look-ahead feasibility check with limited data and a weight-based online estimator for immediate performance evaluation. For resource allocation, LaiRA constructs an Upper Confidence Bound (UCB) to enable adaptive exploration and introduces an adaptive time-slicing method to reduce task switching costs. Empirical studies validate the effectiveness of our approach.
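The UCB-based adaptive exploration mentioned in the abstract can be illustrated with the classic UCB1 index (a generic sketch of the technique, not LaiRA's actual estimator; the function names are hypothetical):

```python
import math

def ucb1_score(mean_reward, n_pulls, total_pulls, c=2.0):
    """Classic UCB1 index: empirical mean plus an exploration bonus
    that shrinks as a task receives more trials."""
    if n_pulls == 0:
        return float("inf")  # always try an unexplored task first
    return mean_reward + math.sqrt(c * math.log(total_pulls) / n_pulls)

def pick_task(means, pulls):
    """Allocate the next slice of compute to the task with the
    highest upper confidence bound."""
    total = sum(pulls)
    scores = [ucb1_score(m, n, total) for m, n in zip(means, pulls)]
    return max(range(len(scores)), key=scores.__getitem__)
```

For equally explored tasks the rule reduces to greedy selection (`pick_task([0.5, 0.9], [10, 10])` picks task 1), while an untried task always wins the next allocation.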
AAAI Conference 2026 Conference Paper
Multimodal Emotion Recognition in Conversation (MERC) aims to predict speakers’ emotions by integrating textual, acoustic, and visual cues. Existing approaches either struggle to capture complex cross‑modal interactions or experience gradient conflicts and unstable training when using deeper architectures. To address these issues, we propose Cross-Space Synergy (CSS), which couples a representation component with an optimization component. Synergistic Polynomial Fusion (SPF) serves the representation role, leveraging low-rank tensor factorization to efficiently capture high-order cross-modal interactions. Pareto Gradient Modulator (PGM) serves the optimization role, steering updates along Pareto-optimal directions across competing objectives to alleviate gradient conflicts and improve stability. Experiments show that CSS outperforms existing representative methods on IEMOCAP and MELD in both accuracy and training stability, demonstrating its effectiveness in complex multimodal scenarios.
JBHI Journal 2026 Journal Article
The convergence of continuous physiological monitoring and intelligent building systems in smart clinics offers a transformative opportunity for patient-centered care, yet it introduces the challenge of harmonizing clinical fidelity, patient comfort, and operational sustainability. We present DT-ECO, a privacy-preserving digital twins framework that enables decision-centric co-management of multi-modal patient monitoring and clinical environmental systems. DT-ECO constructs a hybrid digital twin that integrates a physics-informed building model with graph-temporal physiological inference and battery electrochemistry, enabling real-time synchronization between patient state, IoT device operation, and environmental dynamics within a differentiable programming environment. On this foundation, a hierarchical control strategy is developed, in which a constrained deep reinforcement learning agent adaptively schedules wearable IoT sensor sampling to extend device lifetime, while a model predictive controller orchestrates HVAC operation and on-site energy resources to maintain a therapeutic environment. Extensive evaluations on DOE reference hospitals and public ECG datasets demonstrate that DT-ECO achieves a 31.8% reduction in annual energy consumption and extends median wearable battery life by 28%, while rigorously maintaining clinical standards, as evidenced by less than 0.6% thermal comfort violation and no degradation in arrhythmia detection capability (F1-score 0.956). By bridging the gap between patient physiology and the clinical environment, DT-ECO establishes a pathway toward precision healthcare facilities that are simultaneously patient-centric, diagnostically robust, and operationally sustainable.
YNIMG Journal 2026 Journal Article
JBHI Journal 2026 Journal Article
Hyperspectral imaging (HSI) holds immense potential for medical diagnostics by capturing tissue-specific spectral signatures that facilitate precise disease detection. However, effective HSI classification in clinical settings is hindered by two main challenges: (i) the severe lack of labelled medical HSI samples constrains model training. Prototypical networks, as a few-shot learning paradigm, have been adopted to address label scarcity. However, current Euclidean-based prototypical methods typically assume equal feature variance and spherical distributions, while ignoring intraclass covariance and spectral correlations; (ii) significant domain shifts across heterogeneous medical HSI datasets undermine model generalisation, impair multi-domain interpretability, and force expensive per-dataset retraining. To overcome these limitations, we propose a novel distance-learning-based prototypical network with multi-domain adaptation for few-shot hyperspectral medical image classification. First, by embedding a class-covariance-aware Mahalanobis metric within the prototypical block, our module adapts similarity measures to each class's intrinsic spectral–spatial covariance and scale variations, thereby enhancing prototype robustness under severe label scarcity and significantly reducing misclassification compared with existing few-shot networks. Secondly, we introduce the domain-aware adapter block designed to address domain shift and multi-domain variability by dynamically fusing shared spectral–spatial representations with domain-specific characteristics via spectral integration and switchable adapters. We undertook extensive experiments on three publicly available hyperspectral medical datasets: skin dermoscopy, multidimensional choledochal, and an in-vivo brain dataset.
Compared to state-of-the-art classifiers, the proposed method achieved excellent performance on all three datasets, paving the way for generalisable HSI solutions in clinical workflows and biomedical research.
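A class-covariance-aware Mahalanobis metric of the kind described above can be sketched as a nearest-prototype rule (a simplified, generic illustration, not the paper's module; the shrinkage term and function name are assumptions added to keep the few-shot covariance invertible):

```python
import numpy as np

def mahalanobis_prototype_classify(support, labels, query, shrink=0.1):
    """Assign `query` to the class whose prototype is nearest under a
    class-specific Mahalanobis metric. `support`: (N, D) features,
    `labels`: (N,) class ids. Shrinking the covariance toward the
    identity keeps it invertible when shots are scarce."""
    best, best_dist = None, np.inf
    for c in np.unique(labels):
        x = support[labels == c]            # shots of class c
        proto = x.mean(axis=0)              # class prototype
        cov = np.cov(x, rowvar=False) if len(x) > 1 else np.eye(x.shape[1])
        cov = (1 - shrink) * cov + shrink * np.eye(support.shape[1])
        diff = query - proto
        d = diff @ np.linalg.inv(cov) @ diff  # squared Mahalanobis distance
        if d < best_dist:
            best, best_dist = int(c), d
    return best
```

Unlike the Euclidean prototype rule, this distance stretches along directions of high within-class variance, which is the covariance-awareness the abstract contrasts with spherical assumptions.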
AAAI Conference 2026 Conference Paper
Diffusion models have advanced from text-to-image (T2I) to image-to-image (I2I) generation by incorporating structured inputs such as depth maps, enabling fine-grained spatial control. However, existing methods either train separate models for each condition or rely on unified architectures with entangled representations, resulting in poor generalization and high adaptation costs for novel conditions. To this end, we propose DivControl, a decomposable pretraining framework for unified controllable generation and efficient adaptation. DivControl factorizes ControlNet via SVD into basic components—pairs of singular vectors—which are disentangled into condition-agnostic learngenes and condition-specific tailors through knowledge diversion during multi-condition training. Knowledge diversion is implemented via a dynamic gate that performs soft routing over tailors based on the semantics of condition instructions, enabling zero-shot generalization and parameter-efficient adaptation to novel conditions. To further improve condition fidelity and training efficiency, we introduce a representation alignment loss that aligns condition embeddings with early diffusion features. Extensive experiments demonstrate that DivControl achieves state-of-the-art controllability with 36.4× less training cost, while simultaneously improving average performance on basic conditions. It also delivers strong zero-shot and few-shot performance on unseen conditions, demonstrating superior scalability, modularity, and transferability.
AAAI Conference 2026 Conference Paper
Despite the remarkable developments achieved by recent 3D generation works, scaling these methods to geographic extents, such as modeling thousands of square kilometers of Earth’s surface, remains an open challenge. We address this through a dual innovation in data infrastructure and model architecture. First, we introduce Aerial-Earth3D, the largest 3D aerial dataset to date, consisting of 50k curated scenes (each measuring 600m) captured across the U.S. mainland, comprising 45M multi-view Google Earth frames. Each scene provides pose-annotated multi-view images, depth maps, normals, semantic segmentation, and camera poses, with explicit quality control to ensure terrain diversity. Building on this foundation, we propose EarthCrafter, a tailored framework for large-scale 3D Earth generation via sparse-decoupled latent diffusion. Our architecture separates structural and textural generation: 1) Dual sparse 3D-VAEs compress high-resolution geometric voxels and textural 2D Gaussian Splats (2DGS) into compact latent spaces, largely alleviating the costly computation suffering from vast geographic scales while preserving critical information. 2) We propose condition-aware flow matching models trained on mixed inputs (semantics, images, or neither) to flexibly model latent geometry and texture features independently. Extensive experiments demonstrate that EarthCrafter performs substantially better in extremely large-scale generation. The framework further supports versatile applications, from semantic-guided urban layout generation to unconditional terrain synthesis, while maintaining geographic plausibility through our rich data priors from Aerial-Earth3D.
AAAI Conference 2026 Conference Paper
Biological intelligence has driven significant progress in artificial intelligence (AI), but a critical gap remains: biological systems inherit innate abilities from genes, with brains initialized by blueprints refined over 3.5 billion years of evolution, while machines rely heavily on inefficient, data-driven learning from scratch. This gap arises from the lack of a genetic mechanism in machines to transfer and accumulate inheritable knowledge across generations. To bridge this gap, we propose learngenes, network fragments that act as inheritable 'genes' for machines. Unlike conventional knowledge transfer methods, learngenes enable efficient and universal knowledge transfer by selectively encapsulating task-agnostic knowledge. To facilitate the transfer and accumulation of task-agnostic knowledge across generations, we introduce Genetic Reinforcement Learning (GRL), a framework that simulates the learning and evolution of organisms in intelligent agents following Lamarckian principles. Through GRL, we identify learngenes as network fragments within agents' policy networks, equipping newborn agents with innate abilities for rapid adaptation to novel tasks. We demonstrate the advantages of learngene-based knowledge transfer over evolution-based search and traditional pre-trained models, and show how learngenes evolve through the accumulation of task-agnostic knowledge. Overall, this work establishes a novel paradigm for knowledge transfer and model initialization in AI, offering new possibilities for more adaptive, efficient, and scalable learning systems.
AAAI Conference 2026 Conference Paper
The Diffusion Transformer plays a pivotal role in advancing text-to-image and text-to-video generation, owing primarily to its inherent scalability. However, existing controlled diffusion transformer methods incur significant parameter and computational overheads and suffer from inefficient resource allocation due to their failure to account for the varying relevance of control information across different transformer layers. To address this, we propose the Relevance-Guided Efficient Controllable Generation framework, RelaCtrl, enabling efficient and resource-optimized integration of control signals into the Diffusion Transformer. First, we evaluate the relevance of each layer in the Diffusion Transformer to the control information by assessing the ControlNet Relevance Score, which measures the impact of skipping each control layer on both the quality of generation and the control effectiveness during inference. Based on the strength of the relevance, we then tailor the positioning, parameter scale, and modeling capacity of the control layers to reduce unnecessary parameters and redundant computations. Additionally, to further improve efficiency, we replace the self-attention and FFN in the commonly used copy block with the carefully designed Two-Dimensional Shuffle Mixer (TDSM), enabling efficient implementation of both the token mixer and channel mixer. Both qualitative and quantitative experimental results demonstrate that our approach achieves superior performance with only 15% of the parameters and computational complexity compared to PixArt-delta.
AAAI Conference 2026 Conference Paper
Denoising Diffusion Probabilistic Models (DDPMs) have shown success in robust 3D object detection tasks. Existing methods often rely on the score matching from 3D boxes or pre-trained diffusion priors. However, they typically require multi-step iterations in inference, which limits efficiency. To address this, we propose a Robust single-stage fully Sparse 3D object Detection Network with a Detachable Latent Framework (DLF) of DDPMs, named RSDNet. Specifically, RSDNet learns the denoising process in latent feature spaces through lightweight denoising networks like multi-level denoising autoencoders (DAEs). This enables RSDNet to effectively understand scene distributions under multi-level perturbations, achieving robust and reliable detection. Meanwhile, we reformulate the noising and denoising mechanisms of DDPMs, enabling DLF to construct multi-type and multi-level noise samples and targets, enhancing RSDNet robustness to multiple perturbations. Furthermore, a semantic-geometric conditional guidance is introduced to perceive the object boundaries and shapes, alleviating the center feature missing problem in sparse representations, enabling RSDNet to perform in a fully sparse detection pipeline. Moreover, the detachable denoising network design of DLF enables RSDNet to perform single-step detection in inference, further enhancing detection efficiency. Extensive experiments on public benchmarks show that RSDNet can outperform existing methods, achieving state-of-the-art detection.
JBHI Journal 2026 Journal Article
Consumer health devices generate massive volumes of sensitive medical data requiring secure authentication mechanisms that accommodate the resource constraints of wearable sensors and portable diagnostic equipment. Traditional centralized authentication approaches in Internet of Medical Things (IoMT) environments suffer from single points of failure, privacy vulnerabilities, and scalability limitations when managing diverse health monitoring devices. This paper presents secure healthcare IoMT enhanced lightweight device authentication (SHIELD), a blockchain-based lightweight authentication framework designed for resource-constrained consumer health devices. The framework leverages blockchain's immutable and decentralized properties, combined with efficient elliptic curve cryptography, to ensure secure storage and verification of device identities while providing mutual authentication between health devices and medical data servers. Security analysis demonstrates that SHIELD satisfies twelve critical security properties, including decentralization, resistance to password guessing and replay attacks, perfect forward secrecy, and session key security. Performance evaluation reveals that SHIELD achieves an authentication latency of 9.837 milliseconds, representing a 31% improvement over previous best-performing schemes. The framework requires only 1384 bits of communication overhead and maintains minimal average delay times suitable for real-time health monitoring applications. Blockchain implementation analysis confirms practical deployment feasibility with 0.0356 MGas operational costs per authentication session.
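SHIELD itself uses elliptic curve cryptography and a blockchain; as a much simpler stand-in, a symmetric challenge–response handshake shows the basic shape of mutual authentication between a device and a server (a generic sketch with hypothetical names, not the SHIELD protocol):

```python
import hashlib
import hmac
import os

def respond(key: bytes, challenge: bytes) -> bytes:
    """Prove possession of the shared key by MACing the challenge."""
    return hmac.new(key, challenge, hashlib.sha256).digest()

def mutual_auth(device_key: bytes, server_key: bytes) -> bool:
    """Each side challenges the other with a fresh nonce; both checks
    succeed only if the two parties hold the same pre-shared key.
    Fresh nonces are what resist replay of old responses."""
    c_from_server, c_from_device = os.urandom(16), os.urandom(16)
    ok_device = hmac.compare_digest(respond(device_key, c_from_server),
                                    respond(server_key, c_from_server))
    ok_server = hmac.compare_digest(respond(server_key, c_from_device),
                                    respond(device_key, c_from_device))
    return ok_device and ok_server
```

`hmac.compare_digest` performs a timing-safe comparison; an ECC scheme like SHIELD's replaces the pre-shared key with public-key material and adds forward secrecy, which this sketch does not provide.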
YNIMG Journal 2025 Journal Article
AIIM Journal 2025 Journal Article
YNIMG Journal 2025 Journal Article
NeurIPS Conference 2025 Conference Paper
Existing imitation learning methods decouple perception and action, which overlooks the causal reciprocity between sensory representations and action execution that humans naturally leverage for adaptive behaviors. To bridge this gap, we introduce Action-Guided Diffusion Policy (DP-AG), a unified representation learning framework that explicitly models a dynamic interplay between perception and action through probabilistic latent dynamics. DP-AG encodes latent observations into a Gaussian posterior via variational inference and evolves them using an action-guided SDE, where the Vector–Jacobian Product (VJP) of the diffusion policy's noise predictions serves as a structured stochastic force driving latent updates. To promote bidirectional learning between perception and action, we introduce a cycle-consistent contrastive loss that organizes the gradient flow of the noise predictor into a coherent perception–action loop, enforcing mutually consistent transitions in both latent updates and action refinements. Theoretically, we derive a variational lower bound for the action-guided SDE, and prove that the contrastive objective enhances continuity in both latent and action trajectories. Empirically, DP-AG significantly outperforms state-of-the-art methods across simulation benchmarks and real-world UR5 manipulation tasks. As a result, our DP-AG offers a promising step toward bridging biological adaptability and artificial policy learning. Code is available on our project website: https://jingwang18.github.io/dp-ag.github.io/.
EAAI Journal 2025 Journal Article
IJCAI Conference 2025 Conference Paper
Precise 3D hand posture is essential for learning musical instruments. Reconstructing highly precise 3D hand gestures enables learners to correct and master proper techniques through 3D simulation and Extended Reality. However, existing methods typically rely on precisely calibrated multi-camera systems, which are not easily deployable in everyday environments. In this paper, we focus on calibration-free multi-view 3D hand reconstruction in unconstrained scenarios. Establishing correspondences between multi-view images is particularly challenging without camera extrinsics. To address this, we propose A^3-Net, a multi-level alignment framework that utilizes 3D structural representations with hierarchical geometric and explicit semantic information as alignment proxies, facilitating multi-view feature interaction in both 3D geometric space and 2D visual space. Specifically, we first perform global geometric alignment to map multi-view features into a canonical space. Subsequently, we aggregate information into predefined sparse and dense proxies to further integrate cross-view semantics through mutual interaction. Finally, we perform 2D alignment to align projected 2D visual features with 2D observations. Our method achieves state-of-the-art results in the multi-view 3D hand reconstruction task, demonstrating the effectiveness of our proposed framework.
JBHI Journal 2025 Journal Article
Low-dose computed tomography (LDCT) is a specialized CT scan with a lower radiation dose than normal-dose CT. However, the reduced radiation dose can introduce noise and artifacts, affecting diagnostic accuracy. To enhance the LDCT image quality, we propose a Contextual Contrast Detail Attention Feature Fusion Network (CDAF-Net) for LDCT denoising. Firstly, the LDCT image, with dimensions 1 × H × W, is mapped to a feature map with dimensions C × H × W, and it is processed through the Contextual Contrast Detail Attention (CCDA) module and the Selective Kernel Feature Fusion (SKFF) module. The CCDA module combines a global contextual attention mechanism with detail-enhanced differential convolutions to better understand the overall semantics and structure of the LDCT image, capturing subtle changes and details. The SKFF module effectively merges shallow features extracted by the encoder with deep features from the decoder, integrating feature representations from different levels. This process is repeated across four different resolution feature maps, and the denoised LDCT image is output through a skip connection. We conduct experiments on the Mayo dataset, the LDCT-and-Projection-Data dataset, and the Piglet dataset. Specifically, the CDAF-Net achieves the optimal metrics with a PSNR of 33.7262 dB, an SSIM of 0.9254, and an RMSE of 5.3731 on the Mayo dataset. Improvements are also observed in head CT and ultra-low-dose chest CT images of the LDCT-and-Projection-Data dataset and the Piglet dataset. Experimental results show that the proposed CDAF-Net algorithm provides superior denoising performance compared with the state-of-the-art (SOTA) algorithms.
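The PSNR and RMSE figures quoted above follow their standard definitions; a minimal sketch (generic metric code, not the paper's evaluation pipeline):

```python
import numpy as np

def rmse(ref, img):
    """Root-mean-square error between a reference and a denoised image."""
    diff = ref.astype(np.float64) - img.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(ref, img, data_range=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the
    reference. `data_range` is the maximum possible pixel value."""
    e = rmse(ref, img)
    if e == 0:
        return float("inf")  # identical images
    return float(20.0 * np.log10(data_range / e))
```

For CT data the `data_range` is usually taken over the Hounsfield-unit window used in evaluation rather than 255, so reported PSNR values depend on that convention.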
ICLR Conference 2025 Conference Paper
The rapid advancements of AI rely on the support of integrated circuits (ICs). However, the growing complexity of digital ICs makes the traditional IC design process costly and time-consuming. In recent years, AI-assisted IC design methods have demonstrated great potential, but most methods are task-specific or focus solely on the circuit structure in graph format, overlooking other circuit modalities with rich functional information. In this paper, we introduce CircuitFusion, the first multimodal and implementation-aware circuit encoder. It encodes circuits into general representations that support different downstream circuit design tasks. To learn from circuits, we propose to fuse three circuit modalities: hardware code, structural graph, and functionality summary. More importantly, we identify four unique properties of circuits: parallel execution, functional equivalent transformation, multiple design stages, and circuit reusability. Based on these properties, we propose new strategies for both the development and application of CircuitFusion: 1) During circuit preprocessing, utilizing the parallel nature of circuits, we split each circuit into multiple sub-circuits based on sequential-element boundaries, each sub-circuit in three modalities. It enables fine-grained encoding at the sub-circuit level. 2) During CircuitFusion pre-training, we introduce three self-supervised tasks that utilize equivalent transformations both within and across modalities. We further utilize the multi-stage property of circuits to align representation with ultimate circuit implementation. 3) When applying CircuitFusion to downstream tasks, we propose a new retrieval-augmented inference method, which retrieves similar known circuits as a reference for predictions. It improves fine-tuning performance and even enables zero-shot inference. 
Evaluated on five different circuit design tasks, CircuitFusion consistently outperforms the state-of-the-art supervised method specifically developed for every single task, demonstrating its generalizability and ability to learn circuits' inherent properties.
AAAI Conference 2025 Conference Paper
Current collaborative perception methods often rely on fully annotated datasets, which can be expensive to obtain in practical situations. To reduce annotation costs, some works adopt sparsely supervised learning techniques and generate pseudo labels for the missing instances. However, these methods fail to achieve an optimal confidence threshold that harmonizes the quality and quantity of pseudo labels. To address this issue, we propose an end-to-end Collaborative perception Dual Teacher-Student framework (CoDTS), which employs adaptive complementary learning to produce both high-quality and high-quantity pseudo labels. Specifically, the Main Foreground Mining (MFM) module generates high-quality pseudo labels based on the prediction of the static teacher. Subsequently, the Supplement Foreground Mining (SFM) module ensures a balance between the quality and quantity of pseudo labels by adaptively identifying missing instances based on the prediction of the dynamic teacher. Additionally, the Neighbor Anchor Sampling (NAS) module is incorporated to enhance the representation of pseudo labels. To promote the adaptive complementary learning, we implement a staged training strategy that trains the student and dynamic teacher in a mutually beneficial manner. Extensive experiments demonstrate that the CoDTS effectively ensures an optimal balance of pseudo labels in both quality and quantity, establishing a new state-of-the-art in sparsely supervised collaborative perception.
ICML Conference 2025 Conference Paper
Open-Set Domain Adaptation (OSDA) aims to transfer knowledge from the labeled source domain to the unlabeled target domain that contains unknown categories, thus facing the challenges of domain shift and unknown category recognition. While recent works have demonstrated the potential of causality for domain alignment, little exploration has been conducted on causal-inspired theoretical frameworks for OSDA. To fill this gap, we introduce the concept of Susceptibility and propose a novel Counterfactual-based susceptibility risk framework for OSDA, termed COSDA. Specifically, COSDA consists of three novel components: (i) a Susceptibility Risk Estimator (SRE) for capturing causal information, along with comprehensive derivations of the computable theoretical upper bound, forming a risk minimization framework under the OSDA paradigm; (ii) a Contrastive Feature Alignment (CFA) module, which is theoretically proven based on mutual information to satisfy the Exogeneity assumption and facilitate cross-domain feature alignment; (iii) a Virtual Multi-unknown-categories Prototype (VMP) pseudo-labeling strategy, providing label information by measuring how similar samples are to known and multiple virtual unknown category prototypes, thereby assisting in open-set recognition and intra-class discriminative feature learning. Extensive experiments demonstrate that our approach achieves state-of-the-art performance.
EAAI Journal 2025 Journal Article
EAAI Journal 2025 Journal Article
ICML Conference 2025 Conference Paper
We explore the potential of AI-enhanced combinatorial optimization theory, taking online bipartite matching (OBM) as a case study. In the theoretical study of OBM, the hardness corresponds to a performance upper bound of a specific online algorithm or any possible online algorithms. Typically, these upper bounds derive from challenging instances meticulously designed by theoretical computer scientists. Zhang et al. (ICML 2024) recently provide an example demonstrating how reinforcement learning techniques enhance the hardness result of a specific OBM model. Their attempt is inspiring but preliminary. It is unclear whether their methods can be applied to other OBM problems with similar breakthroughs. This paper takes a further step by introducing DiMa, a unified and novel framework that aims at understanding the hardness of OBM problems based on denoising diffusion probabilistic models (DDPMs). DiMa models the process of generating hard instances as denoising steps, and optimizes them by a novel reinforcement learning algorithm, named shortcut policy gradient (SPG). We first examine DiMa on the classic OBM problem by reproducing its known hardest input instance in literature. Further, we apply DiMa to two well-known variants of OBM, for which the exact hardness remains an open problem, and we successfully improve their theoretical state-of-the-art upper bounds.
EAAI Journal 2025 Journal Article
IROS Conference 2025 Conference Paper
Large language models have demonstrated powerful reasoning capabilities, and their integration with robotics has revolutionized human-computer interaction and automated task planning. However, LLMs are unaware of environmental knowledge and possible state changes in the environment during planning, which makes the generated tasks unexecutable, particularly when dealing with complex long-horizon tasks involving crowded objects and dynamic relations. In this paper, we propose an LLM-based robot task planning framework with support for environmental knowledge injection, which is called DRP (Decomposition-Reflection-Prediction). The DRP framework combines LLMs with rule-based task decomposition, multi-perspective reflection and environmental prediction to generate admissible actions for complex long-horizon tasks. We only leverage few-shot prompting to implement our framework, which avoids the need for additional model training work. Experiments on the VirtualHome household task dataset show that the task plans generated by our method improve executability by 25.23%, the subgoal success rate by 64.29%, and the success rate by 58.06%, in comparison to state-of-the-art baseline methods. The complete code of our framework has been made public at https://github.com/lab-bj/taskplanning.
NeurIPS Conference 2025 Conference Paper
Flow-based generative models have gained popularity for image generation and editing. For instruction-based image editing, it is critical to ensure that modifications are confined to the targeted regions. Yet existing methods often fail to maintain consistency in non-targeted regions between the original and edited images. Our primary contribution is to identify the cause of this limitation as the error accumulation across individual editing steps and to address it by incorporating the historical editing trajectory. Specifically, we formulate image editing as a control problem and leverage the Kalman filter to integrate the historical editing trajectory. Our proposed algorithm, dubbed Kalman-Edit, reuses early-stage details from the historical trajectory to enhance the structural consistency of the editing results. To speed up editing, we introduce a shortcut technique based on approximate vector field velocity estimation. Extensive experiments on several datasets demonstrate its superior performance compared to previous state-of-the-art methods.
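The Kalman filter that Kalman-Edit builds on can be shown in its simplest scalar form (a textbook sketch of the filtering idea, not the paper's latent-space algorithm; the noise parameters are arbitrary):

```python
def kalman_update(x, p, z, q=1e-3, r=1e-1):
    """One predict/update cycle of a 1-D Kalman filter.
    x: state estimate, p: estimate variance, z: new measurement,
    q: process-noise variance, r: measurement-noise variance."""
    # Predict: the state persists, uncertainty grows by process noise.
    p = p + q
    # Update: blend prediction and measurement by the Kalman gain.
    k = p / (p + r)
    x = x + k * (z - x)
    p = (1 - k) * p
    return x, p

def smooth(measurements, x0=0.0, p0=1.0):
    """Filter a noisy scalar sequence, returning the running estimates."""
    x, p = x0, p0
    out = []
    for z in measurements:
        x, p = kalman_update(x, p, z)
        out.append(x)
    return out
```

Because the gain weighs the whole history of observations rather than only the latest step, per-step errors stop accumulating, which is the property the abstract exploits for step-wise editing trajectories.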
IJCAI Conference 2025 Conference Paper
Synthesizing motion-rich and temporally consistent videos remains a challenge in artificial intelligence, especially when dealing with extended durations. Existing text-to-video (T2V) models commonly employ spatial cross-attention for text control, equivalently guiding different frame generations without frame-specific textual guidance. Thus, the model's capacity to comprehend the temporal logic conveyed in prompts and generate videos with coherent motion is restricted. To tackle this limitation, we introduce FancyVideo, an innovative video generator that improves the existing text-control mechanism with the well-designed Cross-frame Textual Guidance Module (CTGM). Specifically, CTGM incorporates the Temporal Information Injector (TII) and Temporal Affinity Refiner (TAR) at the beginning and end of cross-attention, respectively, to achieve frame-specific textual guidance. Firstly, TII injects frame-specific information from latent features into text conditions, thereby obtaining cross-frame textual conditions. Then, TAR refines the correlation matrix between cross-frame textual conditions and latent features along the time dimension. Extensive experiments comprising both quantitative and qualitative evaluations demonstrate the effectiveness of FancyVideo. Our approach achieves state-of-the-art T2V generation results on the EvalCrafter benchmark and facilitates the synthesis of dynamic and consistent videos. Note that the T2V process of FancyVideo essentially involves a text-to-image step followed by T+I2V. This means it also supports the generation of videos from user images, i.e., the image-to-video (I2V) task. A significant number of experiments have shown that its performance is also outstanding.
YNIMG Journal 2025 Journal Article
NeurIPS Conference 2025 Conference Paper
Medical time-series analysis differs fundamentally from general time-series analysis in requiring specialized domain knowledge to interpret complex signals and clinical context. Large language models (LLMs) hold great promise for augmenting medical time-series analysis by complementing raw series with rich contextual knowledge drawn from biomedical literature and clinical guidelines. However, realizing this potential depends on precise and meaningful prompts that guide the LLM to key information. Yet, determining what constitutes effective prompt content remains non-trivial—especially in medical settings where signal interpretation often hinges on subtle, expert-defined decision-making indicators. To this end, we propose InDiGO, a knowledge-aware evolutionary learning framework that integrates clinical signals and decision-making indicators through iterative optimization. Across four medical benchmarks, InDiGO consistently outperforms prior methods. The code is available at: https://github.com/jinxyBJTU/InDiGO.
NeurIPS Conference 2025 Conference Paper
Aircraft manufacturing is the jewel in the crown of industry, in which generating high-fidelity airfoil geometries with controllable and editable representations remains a fundamental challenge. Existing deep learning methods, which typically rely on predefined parametric representations (e.g., Bézier curves) or discrete point sets, face an inherent trade-off between expressive power and resolution adaptability. To tackle this challenge, we introduce FuncGenFoil, a novel function-space generative model that directly reconstructs airfoil geometries as function curves. Our method inherits the advantages of arbitrary-resolution sampling and smoothness from parametric functions, as well as the strong expressiveness of discrete point-based representations. Empirical evaluations demonstrate that FuncGenFoil improves upon state-of-the-art methods in airfoil generation, achieving a relative 74.4% reduction in label error and a 23.2% increase in diversity on the AF-200K dataset. Our results highlight the advantages of function-space modeling for aerodynamic shape optimization, offering a powerful and flexible framework for high-fidelity airfoil design.
NeurIPS Conference 2025 Conference Paper
Recent advances in hand-object interaction modeling have employed implicit representations, such as Signed Distance Functions (SDF) and Neural Radiance Fields (NeRF) to reconstruct hands and objects with arbitrary topology and photo-realistic detail. However, these methods often rely on dense 3D surface annotations, or are tailored to short clips constrained in motion trajectories and scene contexts, limiting their generalization to diverse environments and movement patterns. In this work, we present HOGS, an adaptively perceptive 3D Gaussian Splatting (3DGS) framework for generalizable hand-object modeling from unconstrained monocular RGB images. By integrating photometric cues from the visual modality with the physically grounded structure of 3D Gaussians, HOGS disentangles inherent geometry from transient lighting and motion-induced appearance changes. This endows hand-object assets with the ability to generalize to unseen environments and dynamic motion patterns. Experiments on two challenging datasets demonstrate that HOGS outperforms state-of-the-art methods in monocular hand-object reconstruction and photo-realistic rendering.
EAAI Journal 2025 Journal Article
JBHI Journal 2025 Journal Article
To improve the performance of object recognition under artificial prosthetic vision, this study proposes a two-stage method. The first stage extracts the saliency and edge mask of the object (SMP, EMP). The irregular visual information of the object is then processed using Irregularity Correction (IC). We design eye-hand coordination tasks and simulate artificial vision with retinal prostheses to validate the strategy's effectiveness, selecting direct pixelation (DP) as a control group. Each subject retained a phosphene map in the same stochastic pattern in all of his/her trials. The real-time experimental results showed that the deep saliency-based optimization strategies improved the subjects' performance when completing tasks, in terms of head movement, recognition accuracy, response time, and success counts for small-object recognition. The subjects had the smallest average head movement (76.53 deg ± 20.75 deg), higher average object recognition accuracy (91.18% ± 2.52%), less time for finishing the task (35.71 s ± 8.66 s), and more successful searches for small target objects (1.35 ± 0.33) under the SMP strategy. When integrating with IC, subjects' average performance further improved to 63.39 ± 15.38 deg, 94.22% ± 3.94%, 25.76 s ± 6.24 s, and 1.05 ± 0.30 respectively, which also significantly outperformed the DP condition. These results indicated that when utilizing deep-learning-based saliency detection and IC processing, subjects could shorten the search process and discern the target objects more reliably. This work could inform future prosthetic devices considering implementation of artificial intelligence techniques.
IJCAI Conference 2025 Conference Paper
Multi-label learning (MLL) has gained attention for its ability to represent real-world data. Label Distribution Learning (LDL), an extension of MLL to learning from label distributions, faces challenges in collecting accurate label distributions. To address the issue of biased annotations, existing works, based on the low-rank assumption, recover true distributions from biased observations by exploring label correlations. However, recent evidence shows that label distributions tend to be full-rank, and naively applying low-rank approximation to biased observations leads to inaccurate recovery and performance degradation. In this paper, we address the problem of LDL with biased annotations from a novel perspective: we first degenerate the soft label distribution into a hard multi-hot label and then recover the true label information for each instance. This idea stems from the insight that assigning hard multi-hot labels is often easier than assigning a soft label distribution, and it shows stronger immunity to noise disturbances, leading to smaller label bias. Moreover, assuming that the multi-label space for predicting label distributions is low-rank offers a more reasonable approach to capturing label correlations. Theoretical analysis and experiments confirm the effectiveness and robustness of our method on real-world datasets.
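The rank argument above can be made concrete with a toy example: a soft label-distribution matrix can be full-rank while its thresholded multi-hot version is low-rank. The matrices and threshold below are made up for illustration, and the rank is computed by plain Gaussian elimination.

```python
# Compare the rank of a soft label-distribution matrix with its
# hard multi-hot (thresholded) counterpart.

def matrix_rank(m, tol=1e-9):
    """Rank via Gauss-Jordan elimination on a copy of the matrix."""
    m = [row[:] for row in m]
    rank, rows, cols = 0, len(m), len(m[0])
    for col in range(cols):
        pivot = next((r for r in range(rank, rows) if abs(m[r][col]) > tol), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rows):
            if r != rank and abs(m[r][col]) > tol:
                f = m[r][col] / m[rank][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

soft = [[0.7, 0.2, 0.1],
        [0.6, 0.3, 0.1],
        [0.1, 0.2, 0.7]]        # distinct rows: full rank
hard = [[1 if v >= 0.2 else 0 for v in row] for row in soft]
# rows 1 and 2 binarize identically, so the multi-hot matrix loses a rank
```

Small annotation noise perturbs every entry of `soft` and tends to keep it full-rank, whereas the thresholded `hard` matrix absorbs such noise, which is the intuition behind recovering hard labels first.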
AIJ Journal 2025 Journal Article
YNIMG Journal 2025 Journal Article
JBHI Journal 2025 Journal Article
There exists a tremendous amount of multimodal data in the Internet of Medical Things (IoMT); retrieval technology can extract target data on demand from this extensive multimodal medical data space, which is crucial for aiding diagnosis and medical informatization. However, existing methods focus only on single-modal data such as medical texts, without considering the privacy protection and retrieval needs of users' multimodal data. Furthermore, these methods only match keywords and fail to effectively mine the semantic features of multimodal data, thereby limiting the performance of retrieval systems. To address these issues, this paper proposes a multimodal encrypted retrieval method for the IoMT based on semantic feature fusion and designs a multimodal semantic feature extraction model based on searchable encryption technology to enable encrypted retrieval of multimodal data. Specifically, an edge-cloud collaboration concept is introduced to underpin a secure semantic search architecture tailored for multimodal data, which ensures low-latency encrypted retrieval while safeguarding user privacy. Besides, a semantic-aware multimodal feature extraction method is designed, enhancing the capability of mining semantic features and replacing the traditional keyword retrieval mode with semantic feature retrieval. Moreover, a multimodal data encrypted retrieval method is proposed, employing a blocking idea and a parallel search tree structure, which achieves rapid semantic-similarity retrieval at low cost while preserving privacy. Simulation results demonstrate that the proposed method significantly outperforms the latest research regarding precision, search delay, and storage overhead.
TMLR Journal 2025 Journal Article
Neural message passing serves as a cornerstone framework in graph neural networks, providing a clear and intuitive mathematical guideline for the propagation and aggregation of information among interconnected nodes within graphs. Throughout this process, node representations undergo dynamic updates, considering both the individual states and connections of neighboring nodes. Concurrently, social networks, as prominent forms of interconnected data, form dynamic systems that achieve stability through continuous internal communications and opinion exchanges among social actors along their social ties. Drawing upon the shared concepts between these two domains, our study establishes an explicit connection between message passing and opinion dynamics in sociology. Moreover, we introduce a novel continuous message passing scheme termed ODNet, which integrates bounded confidence to refine the influence weight of local nodes for message propagation. By adjusting the similarity cutoffs of bounded confidence and influence weights within ODNet, we define opinion exchange rules that align with the characteristics of neural message passing and can effectively mitigate the oversmoothing issue. We extend the framework to hypergraphs and formulate corresponding continuous message passing rules, which reveal a close association with particle dynamics. Empirically, we showcase that ODNet enhances prediction performance across various social networks presented as homophilic graphs, heterophilic graphs, and hypergraphs. Notably, our proposed ODNet outperforms existing GNNs with its straightforward construction and robust theoretical foundation.
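The bounded-confidence idea that ODNet builds on can be sketched with a classic Hegselmann–Krause-style update: each node averages only those neighbors whose state lies within a confidence radius, so dissimilar nodes stop influencing each other. The graph, opinions, and cutoff below are toy assumptions, not the ODNet update rule.

```python
# Bounded-confidence aggregation: a node averages itself with neighbors
# whose "opinion" is within a similarity cutoff eps.

def bounded_confidence_step(opinions, adj, eps=0.5):
    new = []
    for i, xi in enumerate(opinions):
        # keep self plus sufficiently similar neighbors only
        close = [opinions[j] for j in adj[i] if abs(opinions[j] - xi) <= eps]
        close.append(xi)
        new.append(sum(close) / len(close))
    return new

# Two loosely connected camps: within-camp consensus forms while the
# camps stay apart, i.e. no global oversmoothing.
ops = [0.0, 0.1, 0.2, 0.9, 1.0]
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
for _ in range(20):
    ops = bounded_confidence_step(ops, adj, eps=0.3)
```

In a standard mean-aggregation GNN all five nodes would drift toward one value; the cutoff is what lets distinct clusters retain distinct representations.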
NeurIPS Conference 2025 Conference Paper
Affective brain-computer interfaces (aBCIs) play a crucial role in personalized human–computer interaction and neurofeedback modulation. To develop practical and effective aBCI paradigms and to investigate the spatial-temporal dynamics of brain activity under emotional inducement, portable electroencephalography (EEG) signals have been widely adopted. To further enhance spatial-temporal perception, functional near-infrared spectroscopy (fNIRS) has attracted increasing interest in the aBCI field and has been explored in combination with EEG. However, existing datasets typically provide only static fixation labels, overlooking the dynamic changes in subjects' emotions. Notably, some studies have attempted to collect continuously annotated emotional data, but they have recorded only peripheral physiological signals without directly observing brain activity, limiting insight into underlying neural states under different emotions. To address these challenges, we present the Real-time labeled EEG-fNIRS Dataset (REFED). To the best of our knowledge, this is the first EEG-fNIRS dataset with real-time dynamic emotional annotations. REFED simultaneously records brain signals from both EEG and fNIRS modalities while providing continuous, real-time annotations of valence and arousal. The results of the data analysis demonstrate the effectiveness of emotion inducement and the reliability of real-time annotation. This dataset offers the possibility for studying the neurovascular coupling mechanism under emotional evolution and for developing dynamic, robust affective BCIs.
AIIM Journal 2025 Journal Article
YNIMG Journal 2025 Journal Article
NeurIPS Conference 2025 Conference Paper
Multimodal Large Language Models (MLLMs) have recently achieved remarkable progress in video understanding. However, their effectiveness in real-time streaming scenarios remains limited due to storage constraints of historical visual features and insufficient real-time spatiotemporal reasoning. To address these challenges, we propose StreamForest, a novel architecture specifically designed for streaming video understanding. Central to StreamForest is the Persistent Event Memory Forest, a memory mechanism that adaptively organizes video frames into multiple event-level tree structures. This process is guided by penalty functions based on temporal distance, content similarity, and merge frequency, enabling efficient long-term memory retention under limited computational resources. To enhance real-time perception, we introduce a Fine-grained Spatiotemporal Window, which captures detailed short-term visual cues to improve current scene perception. Additionally, we present OnlineIT, an instruction-tuning dataset tailored for streaming video tasks. OnlineIT significantly boosts MLLM performance in both real-time perception and future prediction. To evaluate generalization in practical applications, we introduce ODV-Bench, a new benchmark focused on real-time streaming video understanding in autonomous driving scenarios. Experimental results demonstrate that StreamForest achieves state-of-the-art performance, with accuracies of 77.3% on StreamingBench, 60.5% on OVBench, and 55.6% on OVO-Bench. In particular, even under extreme visual token compression (limited to 1024 tokens), the model retains 96.8% of its average accuracy across eight benchmarks relative to the default setting. These results underscore the robustness, efficiency, and generalizability of StreamForest for streaming video understanding.
JBHI Journal 2025 Journal Article
Sleep stage classification is an important step in the diagnosis and treatment of sleep disorders. Despite the high classification performance of previous sleep stage classification work, some challenges remain unresolved: 1) How to effectively capture salient waves in sleep signals to improve sleep stage classification results. 2) How to capture salient waves affected by inter-subject variability. 3) How to adaptively regulate the importance of different modals for different sleep stages. To address these challenges, we propose SleepWaveNet, a multimodal salient wave detection network, which is motivated by the salient object detection task in computer vision. It has a U-Transformer structure to detect salient waves in sleep signals. Meanwhile, the subject-adaptation wave extraction architecture based on transfer learning can adapt to the information of target individuals and extract salient waves with inter-subject variability. In addition, the multimodal attention module can adaptively enhance the importance of specific modal data for sleep stage classification tasks. Experiments on three datasets show that SleepWaveNet has better overall performance than existing baselines. Moreover, visualization experiments show that the model has the ability to capture salient waves with inter-subject variability.
AAAI Conference 2025 Conference Paper
The general capabilities of large language models (LLMs) make them the infrastructure for various AI applications, but updating their inner knowledge requires significant resources. Recent model editing is a promising technique for efficiently updating a small amount of knowledge of LLMs and has attracted much attention. In particular, local editing methods, which directly update model parameters, are proven suitable for updating small amounts of knowledge. Local editing methods update weights by computing least-squares closed-form solutions and identify edited knowledge by vector-level matching in inference, achieving promising results. However, these methods still require a lot of time and resources to complete the computation. Moreover, vector-level matching lacks reliability, and such updates disrupt the original organization of the model's parameters. To address these issues, we propose a detachable and expandable Subject Word Embedding Altering (SWEA) framework, which finds the editing embeddings through token-level matching and adds them to the subject word embeddings in the Transformer input. To get these editing embeddings, we propose an optimizing-then-suppressing fusion method, which first optimizes learnable embedding vectors for the editing target and then suppresses the Knowledge Embedding Dimensions (KEDs) to obtain the final editing embeddings. We thus propose the SWEAOS method for editing factual knowledge in LLMs. We demonstrate the overall state-of-the-art (SOTA) performance of SWEAOS on the CounterFact and zsRE datasets. To further validate the reasoning ability of SWEAOS in editing knowledge, we evaluate it on the more complex RippleEdits benchmark. The results demonstrate that SWEAOS possesses SOTA reasoning ability.
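The token-level matching and embedding-altering step can be sketched as follows: whenever the subject's token-id sequence appears in the input, a learned editing vector is added to those positions' embeddings. Names, shapes, and the toy vectors are illustrative assumptions, not the SWEA implementation.

```python
# Input-side embedding altering: add an editing vector at positions where
# the subject's token ids match exactly (token-level matching).

def alter_embeddings(token_ids, embeds, subject_ids, edit_vec):
    """token_ids: list[int]; embeds: one vector (list[float]) per token;
    subject_ids: token ids of the edited subject; edit_vec: editing embedding."""
    n = len(subject_ids)
    out = [vec[:] for vec in embeds]  # copy, leave originals untouched
    for start in range(len(token_ids) - n + 1):
        if token_ids[start:start + n] == subject_ids:   # exact token match
            for pos in range(start, start + n):
                out[pos] = [a + b for a, b in zip(out[pos], edit_vec)]
    return out

ids = [5, 42, 7, 9]                  # toy input; the subject is tokens [42, 7]
emb = [[0.0, 0.0] for _ in ids]
edited = alter_embeddings(ids, emb, subject_ids=[42, 7], edit_vec=[1.0, -1.0])
```

Because the alteration is keyed on exact token sequences rather than vector similarity, it is detachable: removing the editing vector restores the original embeddings unchanged.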
EAAI Journal 2025 Journal Article
ICML Conference 2025 Conference Paper
Network quantization, one of the most widely studied model compression methods, effectively quantizes a floating-point model to obtain a fixed-point one with negligible accuracy loss. Although great success was achieved in reducing the model size, it may exacerbate the unfairness in model accuracy across different groups of datasets. This paper considers two widely used algorithms, Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT), in an attempt to understand how they cause this critical issue. Theoretical analysis with empirical verification reveals two responsible factors, as well as how they influence a metric of fairness in depth. A comparison between PTQ and QAT is then made, explaining an observation that QAT behaves even worse than PTQ in fairness, although it often preserves higher accuracy at lower bit-widths in quantization. Finally, the paper finds that several simple data augmentation methods can be adopted to alleviate the disparate impacts of quantization, based on a further observation that class imbalance produces distinct values of the aforementioned factors among different attribute classes. We experiment on either imbalanced (UTK-Face and FER2013) or balanced (CIFAR-10 and MNIST) datasets using ResNet and VGG models for empirical evaluation.
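A minimal sketch of symmetric per-tensor int8 quantization illustrates one mechanism by which quantization error can fall unevenly: a single scale must cover the whole value range, so an outlier makes small values coarse. This is a generic PTQ-style sketch under assumed parameters, not the paper's analysis or any specific library's API.

```python
# Symmetric per-tensor post-training quantization to int8.

def quantize_int8(values):
    # one scale for the whole tensor, chosen from the largest magnitude
    scale = max(abs(v) for v in values) / 127.0 or 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.02, -0.01, 0.015, 2.0]       # one outlier dominates the scale
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# the small weights suffer far larger relative error than the outlier does
```

If the groups of a dataset rely on different subsets of such weights, the rounding error, and hence the accuracy loss, need not be shared evenly across groups.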
NeurIPS Conference 2025 Conference Paper
Multi-view 3D human pose estimation (HPE) leverages complementary information across views to improve accuracy and robustness. Traditional methods rely on camera calibration to establish geometric correspondences, which is sensitive to calibration accuracy and lacks flexibility in dynamic settings. Calibration-free approaches address these limitations by learning adaptive view interactions, typically leveraging expressive and flexible continuous representations. However, as the multi-view interaction relationship is learned entirely from data without constraint, these approaches are vulnerable to noisy input, whose errors can propagate, amplify, and accumulate across all views, severely corrupting the final estimated pose. To mitigate this, we propose a novel framework that integrates a noise-resilient discrete prior into the continuous representation-based model. Specifically, we introduce the \textit{UniCodebook}, a unified, compact, robust, and discrete representation complementary to continuous features, allowing the model to benefit from robustness to noise while preserving regression capability. Furthermore, we propose an attribute-preserving and complementarity-enhancing Discrete-Continuous Spatial Attention (DCSA) mechanism to facilitate interaction between discrete priors and continuous pose features. Extensive experiments on three representative datasets demonstrate that our approach outperforms both calibration-required and calibration-free methods, achieving state-of-the-art performance.
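The noise-resilience of a discrete prior can be sketched with toy vector quantization: snapping a perturbed continuous feature to its nearest codeword bounds how far noise can push the representation. The codebook and vectors below are made up for illustration, not the UniCodebook training procedure.

```python
# Discrete prior via a codebook: nearest-codeword lookup.

def nearest_codeword(feature, codebook):
    """Return the codeword closest to the feature in squared L2 distance."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(codebook, key=lambda c: dist2(feature, c))

codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
noisy = [0.9, 1.2]                      # perturbed version of [1.0, 1.0]
snapped = nearest_codeword(noisy, codebook)
```

Any perturbation smaller than half the gap between codewords is erased entirely by the lookup, which is why the discrete branch resists the error propagation that corrupts purely continuous features.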
EAAI Journal 2025 Journal Article
NeurIPS Conference 2025 Conference Paper
Recent work on latent diffusion models (LDMs) has focused almost exclusively on generative tasks, leaving their potential for discriminative transfer largely unexplored. We introduce Discriminative Vicinity Diffusion (DVD), a novel LDM-based framework for a more practical variant of source-free domain adaptation (SFDA): the source provider may share not only a pre-trained classifier but also an auxiliary latent diffusion module, trained once on the source data and never exposing raw source samples. DVD encodes each source feature’s label information into its latent vicinity by fitting a Gaussian prior over its k-nearest neighbors and training the diffusion network to drift noisy samples back to label-consistent representations. During adaptation, we sample from each target feature’s latent vicinity, apply the frozen diffusion module to generate source-like cues, and use a simple InfoNCE loss to align the target encoder to these cues, explicitly transferring decision boundaries without source access. Across standard SFDA benchmarks, DVD outperforms state-of-the-art methods. We further show that the same latent diffusion module enhances the source classifier’s accuracy on in-domain data and boosts performance in supervised classification and domain generalization experiments. DVD thus reinterprets LDMs as practical, privacy-preserving bridges for explicit knowledge transfer, addressing a core challenge in source-free domain adaptation that prior methods have yet to solve. Code is available on our GitHub: https://github.com/JingWang18/DVD-SFDA.
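The InfoNCE alignment step mentioned above has a standard form: the loss is the negative log-softmax of the positive pair's similarity against the negatives. The toy features below are assumptions for illustration; DVD's encoder and diffusion module are not modeled here, only the loss value.

```python
# InfoNCE on toy features: low loss when the query matches its positive,
# high loss when the "positive" is actually a poor match.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def info_nce(query, positive, negatives, tau=0.1):
    logits = [cosine(query, positive) / tau]
    logits += [cosine(query, n) / tau for n in negatives]
    # numerically stable -log softmax of the positive (index 0)
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)

q = [1.0, 0.1]                       # target feature
pos = [0.9, 0.2]                     # source-like cue for q
negs = [[-1.0, 0.3], [0.0, 1.0]]     # cues for other samples
loss_aligned = info_nce(q, pos, negs)
loss_misaligned = info_nce(q, negs[0], [pos, negs[1]])
```

Minimizing this loss pulls the target encoder's output toward its generated source-like cue and away from other cues, which is how decision boundaries transfer without any raw source data.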
NeurIPS Conference 2025 Conference Paper
Recent advances in text-to-video (T2V) generation, exemplified by models such as Sora and Kling, have demonstrated strong potential for constructing world simulators. However, existing T2V models still struggle to understand abstract physical principles and to generate videos that faithfully obey physical laws. This limitation stems primarily from the lack of explicit physical guidance, caused by a significant gap between high-level physical concepts and the generative capabilities of current models. To address this challenge, we propose the World Simulator Assistant (WISA), a novel framework designed to systematically decompose and integrate physical principles into T2V models. Specifically, WISA decomposes physical knowledge into three hierarchical levels: textual physical descriptions, qualitative physical categories, and quantitative physical properties. It then incorporates several carefully designed modules—such as Mixture-of-Physical-Experts Attention (MoPA) and a Physical Classifier—to effectively encode these attributes and enhance the model’s adherence to physical laws during generation. In addition, most existing video datasets feature only weak or implicit representations of physical phenomena, limiting their utility for learning explicit physical principles. To bridge this gap, we present WISA-80K, a new dataset comprising 80,000 human-curated videos that depict 17 fundamental physical laws across three core domains of physics: dynamics, thermodynamics, and optics. Experimental results show that WISA substantially improves the alignment of T2V models (such as CogVideoX and Wan2.1) with real-world physical laws, achieving notable gains on the VideoPhy benchmark. Our data, code, and models are available in the Project Page.
JBHI Journal 2024 Journal Article
Accurate and fully automated brain structure examination and prediction from 3D volumetric magnetic resonance imaging (MRI) is a necessary step in medical imaging analysis, which can assist greatly in clinical diagnosis. Traditional deep learning models suffer from severe performance degradation when applied to clinically acquired unlabeled data. The performance degradation is mainly caused by domain discrepancy such as different device types and parameter settings for data acquisition. However, existing approaches focus on the reduction of domain discrepancies but ignore the entanglement of semantic features and domain information. In this article, we explore the feature invariance of categories and domains in different projection spaces and propose a Siamese-Transport Domain Adaptation (STDA) method using a joint optimal transport theory and contrastive learning for automatic 3D MRI classification and glioma multi-grade prediction. Specifically, the learning framework updates the distribution of features across domains and categories by Siamese transport network training with an Optimal Cost Transfer Strategy (OCTS) and a Mutual Invariant Constraint (MIC) in two projective spaces to find multiple invariants in potential heterogeneity. We design three sets of transfer task scenarios with different source and target domains, and demonstrate that STDA yields substantially higher generalization performance than other state-of-the-art unsupervised domain adaptation (UDA) methods. The method is applicable to 3D MRI data from glioma to Alzheimer's disease and has promising applications in future clinical diagnosis and treatment of brain diseases.
AAAI Conference 2024 Conference Paper
Few-Shot Segmentation (FSS) aims to accomplish the novel-class segmentation task with a few annotated images. Current FSS research based on meta-learning focuses on designing a complex interaction mechanism between the query and support features. However, unlike humans, who can rapidly learn new things from limited samples, the existing approach relies solely on fixed feature matching to tackle new tasks, lacking adaptability. In this paper, we propose a novel framework based on the adapter mechanism, namely Adaptive FSS, which can efficiently adapt the existing FSS model to novel classes. In detail, we design the Prototype Adaptive Module (PAM), which utilizes accurate category information provided by the support set to derive class prototypes, enhancing class-specific information in the multi-stage representation. In addition, our approach is compatible with diverse FSS methods with different backbones by simply inserting PAM between the layers of the encoder. Experiments demonstrate that our method effectively improves the performance of FSS models (e.g., MSANet, HDMNet, FPTrans, and DCAMA) and achieves new state-of-the-art (SOTA) results (i.e., 72.4% and 79.1% mIoU on PASCAL-5i 1-shot and 5-shot settings, 52.7% and 60.0% mIoU on COCO-20i 1-shot and 5-shot settings). Our code is available at https://github.com/jingw193/AdaptiveFSS.
NeurIPS Conference 2024 Conference Paper
Data-driven generative models have emerged as promising approaches towards achieving efficient mechanical inverse design. However, due to the prohibitively high cost in time and money, there is still a lack of open-source and large-scale benchmarks in this field. This is mainly the case for airfoil inverse design, which requires generating and editing diverse geometrically and aerodynamically qualified airfoils following multimodal instructions, \emph{i.e.,} dragging points and physical parameters. This paper presents open-source endeavors in airfoil inverse design, \emph{AFBench}, including a large-scale dataset with 200 thousand airfoils and high-quality aerodynamic and geometric labels, two novel and practical airfoil inverse design tasks, \emph{i.e.,} conditional generation on multimodal physical parameters and controllable editing, and comprehensive metrics to evaluate various existing airfoil inverse design methods. Our aim is to establish \emph{AFBench} as an ecosystem for training and evaluating airfoil inverse design methods, with a specific focus on data-driven controllable inverse design models driven by multimodal instructions, capable of bridging the gap between ideas and execution, and between academic research and industrial applications. We have provided baseline models, comprehensive experimental observations, and analysis to accelerate future research. Our baseline model is trained on an RTX 3090 GPU within 16 hours. The codebase, datasets, and benchmarks will be available at \url{https://hitcslj.github.io/afbench/}.
EAAI Journal 2024 Journal Article
YNIMG Journal 2024 Journal Article
NeurIPS Conference 2024 Conference Paper
In recent years, the merging of vast datasets with powerful computational resources has led to the emergence of large pre-trained models in the field of deep learning. However, common practices often overgeneralize the applicability of these models, overlooking task-specific resource constraints. To mitigate this issue, we propose \textbf{Cluster-Learngene}, which effectively clusters critical internal modules from a large ancestry model and then inherits them to initialize descendant models of elastic scales. Specifically, based on the density characteristics of attention heads, our method adaptively clusters the attention heads of each layer and the position-wise feed-forward networks (FFNs) in the ancestry model as the learngene. Moreover, we introduce priority weight-sharing and learnable parameter transformations that expand the learngene to initialize descendant models of elastic scales. Through extensive experimentation, we demonstrate that Cluster-Learngene is not only more efficient than other initialization methods but also customizes models of elastic scales according to downstream task resources.
IJCAI Conference 2024 Conference Paper
Existing Deep Multi-view Clustering (DMVC) approaches typically concentrate on capturing consensus semantics from multiple views, where contrastive learning is widely used to align view-specific representations of each view. Unfortunately, view-specific representations are extracted from the content information of the corresponding instance, neglecting the relationships among different instances. Furthermore, the existing contrastive loss introduces numerous false-negative pairs that conflict with the clustering objectives. In response to these challenges, we propose a contraStive and viEw-interaction stRucture learning framework for multI-viEw cluStering (SERIES). Our method takes into account the structural relations among instances and boosts the contrastive loss to improve intra-class compactness. Meanwhile, a cross-view dual relation generation mechanism is introduced to achieve the consensus structural graph across multiple views for clustering. Specifically, we initially acquire view-specific representations using multiple graph autoencoders to exploit both content information and structural information. Furthermore, to pull together instances of the same cluster, a soft negative-pair-aware contrastive loss is employed to distinguish dissimilar instances while attracting similar ones. Thereafter, the view-specific representations are fed into cross-view dual relation generation layers to generate the affinity matrices of each other, aiming to reveal a consistent structural graph across various views. Extensive experiments conducted on six benchmarks illustrate the superiority of our method compared to other state-of-the-art approaches.
YNIMG Journal 2024 Journal Article
EAAI Journal 2024 Journal Article
IJCAI Conference 2024 Conference Paper
Label Distribution Learning (LDL) is a novel machine learning paradigm that assigns a label distribution to each instance. Numerous LDL methods have been proposed to leverage label correlation in the learning process to cope with the exponentially large output space; among these, many exploit the low-rank structure of label distributions to capture label correlation. However, recent research has unveiled that label distribution matrices typically maintain full rank, posing a challenge to approaches relying on low-rank label correlation. Notably, low-rank label correlation finds widespread adoption in the multi-label learning (MLL) literature due to the often low-rank nature of multi-label matrices. Inspired by this, we introduce an auxiliary MLL process within the LDL framework, capturing low-rank label correlation within this auxiliary MLL component rather than in the LDL itself. By doing so, we adeptly exploit low-rank label correlation in our LDL methods. We conduct comprehensive experiments and demonstrate that our methods are superior to existing LDL methods. Besides, ablation studies justify the advantages of exploiting low-rank label correlation in the auxiliary MLL.
EAAI Journal 2024 Journal Article
YNIMG Journal 2024 Journal Article
EAAI Journal 2024 Journal Article
ECAI Conference 2024 Conference Paper
Spiking neural networks (SNNs) have the potential to simulate sparse and spatio-temporal dynamics observed in biological neurons, making them promising for achieving energy-efficient artificial general intelligence. While backpropagation through time (BPTT) ensures reliable precision for training SNNs, it is hampered by high computation and storage complexity and does not conform to the instantaneous learning mechanism in brains. On the contrary, online training algorithms, which are biologically interpretable, offer low latency and memory efficiency, and are well-suited for on-chip learning applications. However, recent research exhibits a deficiency in the scientific comprehension of online gradients, which leads to certain limitations. To address this issue, we conduct an in-depth analysis of the calculation deviation in chain derivations induced by weight update and find two pivotal factors that affect the accuracy of online gradients: completeness and timeliness. To further enhance the performance of online training leveraging these findings, we propose spatio-temporal online learning (STOL), which substantially ameliorates the accuracy of the online gradients and demonstrates superior computation and memory efficiency. Our experiments on CIFAR-10, CIFAR-100, ImageNet, CIFAR10-DVS, and DVS128-Gesture datasets demonstrate that our method achieves state-of-the-art performance across most of these tasks. Besides, it shows a great improvement compared with existing online training algorithms.
NeurIPS Conference 2024 Conference Paper
Subsampling is effective in tackling computational challenges for massive data with rare events. Overly aggressive subsampling may adversely affect estimation efficiency, and optimal subsampling is essential to mitigate the information loss. However, existing optimal subsampling probabilities depend on the data scale, and some scaling transformations may result in inefficient subsamples. This problem is more significant when there are inactive features, because their influence on the subsampling probabilities can be arbitrarily magnified by inappropriate scaling transformations. We tackle this challenge and introduce a scale-invariant optimal subsampling function in the context of sparse models, where inactive features are commonly assumed. Instead of focusing on estimating model parameters, we define an optimal subsampling function to minimize the prediction error, using adaptive lasso as an example to outline the estimation procedure and study its theoretical guarantee. We first introduce the adaptive lasso estimator for rare-events data and establish its oracle properties, thereby validating the use of subsampling. Then we derive a scale-invariant optimal subsampling function that minimizes the prediction error of the inverse probability weighted (IPW) adaptive lasso. Finally, we present an estimator based on the maximum sampled conditional likelihood (MSCL) to further improve the estimation efficiency. We conduct numerical experiments using both simulated and real-world data sets to demonstrate the performance of the proposed methods.
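The inverse-probability-weighting correction that the IPW adaptive lasso relies on can be shown on a much simpler estimand. This is a minimal sketch of IPW under non-uniform subsampling, applied to a mean estimate rather than the paper's lasso; the sampling probabilities here are an arbitrary choice for illustration.

```python
import numpy as np

# Each retained point is weighted by 1/pi_i, so the subsample estimate stays
# (approximately) unbiased even when inclusion probabilities are far from
# uniform; the unweighted estimate is badly biased toward oversampled points.
rng = np.random.default_rng(42)
x = rng.exponential(scale=2.0, size=100_000)       # full data, true mean = 2
pi = np.clip(x / x.max(), 0.01, 1.0)               # non-uniform inclusion probs
keep = rng.random(x.size) < pi                     # Poisson subsampling
naive = x[keep].mean()                             # biased: oversamples large x
ipw = np.sum(x[keep] / pi[keep]) / np.sum(1.0 / pi[keep])  # Hajek IPW estimate
print(f"naive={naive:.3f}  ipw={ipw:.3f}  full-data mean={x.mean():.3f}")
```

With inclusion probability proportional to the value itself, the naive subsample mean lands near the size-biased mean (about twice the truth for an exponential), while the IPW estimate recovers the full-data mean.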
AAAI Conference 2024 Conference Paper
Deep Multi-view Graph Clustering (DMGC) aims to partition instances into different groups using the graph information extracted from multi-view data. The mainstream framework of DMGC methods applies graph neural networks to embed structure information into the view-specific representations and fuse them for the consensus representation. However, on one hand, we find that the graph learned in advance is not ideal for clustering as it is constructed by original multi-view data and localized connecting. On the other hand, most existing methods learn the consensus representation in a late fusion manner, which fails to propagate the structure relations across multiple views. Inspired by these observations, we propose a Structure-adaptive Unified gRaph nEural network for multi-view clusteRing (SURER), which can jointly learn a heterogeneous multi-view unified graph and robust graph neural networks for multi-view clustering. Specifically, we first design a graph structure learning module to refine the original view-specific attribute graphs, which removes false edges and discovers potential connections. We then integrate the view-specific refined attribute graphs into a unified heterogeneous graph by linking the representations of the same sample from different views. Furthermore, we use the unified heterogeneous graph as the input of the graph neural network to learn the consensus representation for each instance, effectively integrating complementary information from various views. Extensive experiments on diverse datasets demonstrate the superior effectiveness of our method compared to other state-of-the-art approaches.
JBHI Journal 2024 Journal Article
In clinical settings, the implementation of deep neural networks is impeded by the prevalent problems of label scarcity and class imbalance in medical images. To mitigate the need for labeled data, semi-supervised learning (SSL) has gained traction. However, existing SSL schemes exhibit certain limitations. 1) They commonly fail to address the class imbalance problem. Training with imbalanced data makes the model's prediction biased towards majority classes, consequently introducing prediction bias. 2) They usually suffer from training bias arising from unreasonable training strategies, such as strong coupling between the generation and utilization of pseudo labels. To address these problems, we propose a novel SSL framework called Tri-Net with Cross-Balanced pseudo supervision (TNCB). Specifically, two student networks focusing on different learning tasks and a teacher network equipped with an adaptive balancer are designed. This design enables the teacher model to focus more on minority classes, thereby reducing prediction bias. Additionally, we propose a virtual optimization strategy to further enhance the teacher model's resistance to class imbalance. Finally, to fully exploit valuable knowledge from unlabeled images, we employ cross-balanced pseudo supervision, where an adaptive cross loss function is introduced to reduce training bias. Extensive evaluation on four datasets with different diseases, image modalities, and imbalance ratios consistently demonstrates the superior performance of TNCB over state-of-the-art SSL methods. These results indicate the effectiveness and robustness of TNCB in addressing imbalanced medical image classification challenges.
TMLR Journal 2024 Journal Article
Source-free domain adaptation (SFDA) involves adapting a model originally trained using a labeled dataset (source domain) to perform effectively on an unlabeled dataset (target domain) without relying on any source data during adaptation. This adaptation is especially crucial when significant disparities in data distributions exist between the two domains and when there are privacy concerns regarding the source model's training data. The absence of access to source data during adaptation makes it challenging to analytically estimate the domain gap. To tackle this issue, various techniques have been proposed, such as unsupervised clustering, contrastive learning, and continual learning. In this paper, we first conduct an extensive theoretical analysis of SFDA based on contrastive learning, primarily because it has demonstrated superior performance compared to other techniques. Motivated by the obtained insights, we then introduce a straightforward yet highly effective latent augmentation method tailored for contrastive SFDA. This augmentation method leverages the dispersion of latent features within the neighborhood of the query sample, guided by the source pre-trained model, to enhance the informativeness of positive keys. Our approach, based on a single InfoNCE-based contrastive loss, outperforms state-of-the-art SFDA methods on widely recognized benchmark datasets.
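The single InfoNCE-based loss the abstract builds on is standard enough to sketch directly. This is a generic InfoNCE implementation (not the paper's latent-augmentation method): the query is pulled toward its positive key and pushed away from negative keys, and the loss drops as the positive becomes easier to identify.

```python
import numpy as np

def info_nce(query, pos_key, neg_keys, tau=0.07):
    """InfoNCE loss for one query against one positive and a bank of negatives.

    All vectors are L2-normalized; the loss is cross-entropy over the
    (positive + negatives) similarity logits with the positive as the target.
    """
    q = query / np.linalg.norm(query)
    k_pos = pos_key / np.linalg.norm(pos_key)
    k_neg = neg_keys / np.linalg.norm(neg_keys, axis=1, keepdims=True)
    logits = np.concatenate(([q @ k_pos], k_neg @ q)) / tau
    logits -= logits.max()                        # numerical stability
    return float(-np.log(np.exp(logits[0]) / np.exp(logits).sum()))

rng = np.random.default_rng(0)
q = rng.normal(size=64)
# informative positive key (a slightly perturbed copy of the query)
easy = info_nce(q, q + 0.01 * rng.normal(size=64), rng.normal(size=(32, 64)))
# uninformative positive key (unrelated random vector)
hard = info_nce(q, rng.normal(size=64), rng.normal(size=(32, 64)))
print(f"aligned positive: {easy:.3f}, random positive: {hard:.3f}")
```

The comparison at the end mirrors the paper's motivation for making positive keys more informative: a well-aligned positive yields a much smaller loss than a random one.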
EAAI Journal 2023 Journal Article
TMLR Journal 2023 Journal Article
Geostatistical learning problems are frequently characterized by spatial autocorrelation in the input features and/or the potential for covariate shift at test time. These realities violate the classical assumption of independent, identically distributed data, upon which most cross-validation algorithms rely in order to estimate the generalization performance of a model. In this paper, we present a theoretical criterion for unbiased cross-validation estimators in the geospatial setting. We also introduce a new cross-validation algorithm to evaluate models, inspired by the challenges of geospatial problems. We apply a framework for categorizing problems into different types of geospatial scenarios to help practitioners select an appropriate cross-validation strategy. Our empirical analyses compare cross-validation algorithms on both simulated and several real datasets to develop recommendations for a variety of geospatial settings. This paper aims to draw attention to some challenges that arise in model evaluation for geospatial problems and to provide guidance for users.
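One strategy in the family of cross-validation schemes this paper studies is spatial blocking. The sketch below is illustrative only (grid size and fold assignment are arbitrary choices, not the paper's algorithm): points are grouped into spatial blocks so that train and test points are not near-duplicates of each other, which plain random K-fold would allow under spatial autocorrelation.

```python
import numpy as np

def spatial_block_folds(coords, n_blocks_per_axis=3):
    """Assign each (x, y) point to a grid cell; each cell is one CV fold."""
    mins, maxs = coords.min(axis=0), coords.max(axis=0)
    # map coordinates to integer grid indices in [0, n_blocks_per_axis - 1]
    idx = np.floor((coords - mins) / (maxs - mins + 1e-12) * n_blocks_per_axis)
    idx = np.clip(idx, 0, n_blocks_per_axis - 1).astype(int)
    return idx[:, 0] * n_blocks_per_axis + idx[:, 1]   # fold id per point

rng = np.random.default_rng(1)
coords = rng.uniform(0, 100, size=(500, 2))
folds = spatial_block_folds(coords)
for f in np.unique(folds):
    test = folds == f
    # train on all other blocks, test on this block (model fitting omitted)
    print(f"fold {f}: {test.sum()} test points, {(~test).sum()} train points")
```

Because each held-out block is spatially contiguous and separated from the training blocks, the resulting score is closer to performance under the covariate shift the abstract describes than a random split would be.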
AAMAS Conference 2023 Conference Paper
We initiate the study of how to perturb the reward in a zero-sum Markov game with two players to induce a desirable Nash equilibrium, namely arbitrating. Such a problem admits a bi-level optimization formulation. The lower level requires solving the Nash equilibrium under a given reward function, which makes the overall problem challenging to optimize in an end-to-end way. We propose a backpropagation scheme that differentiates through the Nash equilibrium, which provides the gradient feedback for the upper level. In particular, our method only requires a black-box solver for the (regularized) Nash equilibrium (NE). We develop the convergence analysis for the proposed framework with proper black-box NE solvers and demonstrate the empirical successes in two multi-agent reinforcement learning (MARL) environments. Supplementary material for all the proofs in this paper can be found at https://arxiv.org/abs/2302.10058.
JBHI Journal 2023 Journal Article
In recent years, more and more people suffer from voice-related diseases, and current pathological speech conversion methods are limited in that each can only convert a single kind of pathological voice. In this study, we propose a novel Encoder-Decoder Generative Adversarial Network (E-DGAN) to generate personalized speech for pathological-to-normal voice conversion, which is suitable for multiple kinds of pathological voices. Our proposed method also addresses the problems of improving the intelligibility of pathological voices and personalizing the converted speech. Feature extraction is performed using a mel filter bank. The conversion network is an encoder-decoder structure, used to convert the mel spectrogram of pathological voices to the mel spectrogram of normal voices. After being converted by the residual conversion network, the personalized normal speech is synthesized by a neural vocoder. In addition, we propose a subjective evaluation metric named “content similarity” to evaluate the consistency between the converted pathological voice content and the reference content. The Saarbrücken Voice Database (SVD) is used to verify the proposed method. The intelligibility and content similarity of pathological voices are increased by 18.67% and 2.60%, respectively. Besides, an intuitive spectrogram-based analysis shows a significant improvement. The results show that our proposed method can improve the intelligibility of pathological voices and personalize their conversion into the normal voices of 20 different speakers. Compared with five other pathological voice conversion methods, our proposed method achieves the best evaluation results.
AAAI Conference 2023 Conference Paper
Label distribution covers a certain number of labels, representing the degree to which each label describes an instance. The learning process on the instances labeled by label distributions is called Label Distribution Learning (LDL). Although LDL has been applied successfully to many practical applications, one problem with existing LDL methods is that they are limited to data with balanced label information. However, annotation information in real-world data often exhibits imbalanced distributions, which significantly degrades the performance of existing methods. In this paper, we investigate the Imbalanced Label Distribution Learning (ILDL) problem. To handle this challenging problem, we delve into the characteristics of ILDL and empirically find that the representation distribution shift is the underlying reason for the performance degradation of existing methods. Inspired by this finding, we present a novel method named Representation Distribution Alignment (RDA). RDA aligns the distributions of feature representations and label representations to alleviate the impact of the distribution gap between the training set and the test set caused by the imbalance issue. Extensive experiments verify the superior performance of RDA. Our work fills the gap in benchmarks and techniques for practical ILDL problems.
JBHI Journal 2023 Journal Article
Accurate identification of lesions is a key step in surgical planning. However, this task presents two main challenges: 1) Due to the complex anatomical shapes of different lesions, most segmentation methods only achieve outstanding performance for a specific structure, rather than for other lesions with location differences. 2) The huge number of parameters limits existing transformer-based segmentation models. To overcome these problems, we propose a novel slight dual-path network (SDPN) to accurately segment lesions or organs whose locations differ significantly. First, we design a dual-path module to integrate local with global features without obvious memory consumption. Second, a novel multi-spectrum attention module is proposed to pay further attention to detailed information, which can automatically adapt to the variable segmentation target. Then, a compression module based on tensor ring decomposition is designed to compress convolutional and transformer structures. In the experiments, four datasets, including three benchmark datasets and a clinical dataset, are used to evaluate SDPN. The results show that SDPN performs better than other state-of-the-art methods for brain tumor, liver tumor, endometrial tumor and cardiac segmentation. To ensure generalizability, we train the network on Kvasir-SEG and test on CVC-ClinicDB, which was collected from a different institution. The quantitative analysis shows that the clinical evaluation results are consistent with the experts'. Therefore, this model may be a potential candidate for the segmentation of lesions and organs with variable locations in clinical applications.
EAAI Journal 2022 Journal Article
YNICL Journal 2022 Journal Article
JBHI Journal 2022 Journal Article
Multimodal medical image fusion can combine salient information from different source images of the same part and reduce the redundancy of information. In this paper, an efficient hybrid image decomposition (HID) method is proposed. It combines the advantages of spatial domain and transform domain methods and breaks through the limitations of algorithms based on a single category of features. The accurate separation of the base layer and texture details is conducive to a better effect of the fusion rules. First, the source anatomical images are decomposed into a series of high frequencies and a low frequency via nonsubsampled shearlet transform (NSST). Second, the low frequency is further decomposed using the designed optimization model based on structural similarity and structure tensor to get an energy texture layer and a base layer. Then, the modified choosing maximum (MCM) is designed to fuse base layers. The sum of modified Laplacian (SML) is used to fuse high frequencies and energy texture layers. Finally, the fused low frequency can be obtained by adding the fused energy texture layer and base layer. And the fused image is reconstructed by the inverse NSST. The superiority of the proposed method is verified by extensive experiments on 50 pairs of magnetic resonance imaging (MRI) images and computed tomography (CT) images and others, and by comparison with 12 state-of-the-art medical image fusion methods. It is demonstrated that the proposed hybrid decomposition model has a better ability to extract texture information than conventional ones.
JBHI Journal 2022 Journal Article
Clinically, physicians collect benchmark medical data to establish an archive for a stroke patient and then regularly add follow-up data, which is of great significance for prognosis prediction in stroke patients. In this paper, we present an interpretable deep learning model to predict the one-year mortality risk after stroke. We design sub-modules to reconstruct features from the original clinical data that highlight the dissimilarity and temporality of different variables. The model consists of a Bidirectional Long Short-Term Memory (Bi-LSTM) network, in which a novel correlation attention module is proposed that takes the correlation of variables into consideration. In the experiments, the dataset was collected clinically from the department of neurology in a local AAA hospital. It consists of 2,275 stroke patients hospitalized in the department of neurology from 2014 to 2016. Our model achieves a precision of 0.9414, a recall of 0.9502 and an F1-score of 0.9415. In addition, we provide an analysis of interpretability through visualizations with reference to clinical professional guidelines.
TMLR Journal 2022 Journal Article
Recent years have witnessed a surge of successful applications of machine reading comprehension. Of central importance to these tasks is the availability of massive amounts of labeled data, which facilitates training of large-scale neural networks. However, in many real-world problems, annotated data are expensive to gather not only because of time cost and budget, but also because of certain domain-specific restrictions such as privacy for healthcare data. In this regard, we propose an uncertainty-based active learning algorithm for reading comprehension, which interleaves data annotation and model updating to mitigate the demand for labeling. Our key techniques are two-fold: 1) an unsupervised uncertainty-based sampling scheme that queries the labels of the most informative instances with respect to the currently learned model; and 2) an adaptive loss minimization paradigm that simultaneously fits the data and controls the degree of model updating. We demonstrate on benchmark datasets that 25% fewer labeled samples suffice to guarantee similar, or even improved, performance. Our results show strong evidence that for label-demanding scenarios, the proposed approach offers a practical guide on data collection and model training.
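The core of uncertainty-based sampling can be sketched in a few lines. This is a generic entropy-based query selector, not the paper's exact scheme: it picks the unlabeled instances whose predictive distribution has the highest entropy, i.e. those about which the current model is least certain.

```python
import numpy as np

def entropy_sampling(probs, budget):
    """Return the indices of the `budget` most uncertain instances.

    probs: (n_instances, n_classes) predicted class probabilities from the
    currently learned model. Uncertainty is measured by Shannon entropy.
    """
    eps = 1e-12                                  # avoid log(0)
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    return np.argsort(entropy)[::-1][:budget]    # most uncertain first

# model predictions over 5 unlabeled instances, 3 classes
probs = np.array([[0.98, 0.01, 0.01],    # confident -> skip
                  [0.34, 0.33, 0.33],    # near-uniform -> query first
                  [0.70, 0.20, 0.10],
                  [0.50, 0.45, 0.05],
                  [0.90, 0.05, 0.05]])
picked = entropy_sampling(probs, budget=2)
print(picked)   # → [1 3]
```

In an active learning loop, the selected instances would be sent for annotation, the model retrained, and the probabilities recomputed before the next query round.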
JBHI Journal 2021 Journal Article
The classification of six types of white blood cells (WBCs) is considered essential for leukemia diagnosis, while the classification is labor-intensive and demanding in clinical experience. To relieve this complicated process with an efficient and automatic method, we propose the Attention-aware Residual Network based Manifold Learning model (ARML) to classify WBCs. The proposed ARML model leverages adaptive attention-aware residual learning to exploit category-relevant image-level features and strengthen the first-order feature representation ability. To learn more discriminatory information than the first-order features provide, second-order features are characterized. Afterwards, ARML encodes both the first- and second-order features with Gaussian embedding into the Riemannian manifold to learn the underlying non-linear structure of the features for classification. ARML can be trained in an end-to-end fashion, and the learnable parameters are iteratively optimized. 10,800 WBC images (1,800 images for each type) are collected; 9,000 images with five-fold cross-validation are used for training and validation of the model, while an additional 1,800 images are used for testing. The results show that ARML achieves an average classification accuracy of 0.953, outperforming other state-of-the-art methods with fewer trainable parameters. In the ablation study, ARML achieves improved accuracy against its three variants: without manifold learning (AR), without attention-aware learning (RML), and AR without attention-aware learning. The t-SNE results illustrate that ARML has learned more distinguishable features than the comparison methods, which benefits WBC classification. ARML provides a clinically feasible WBC classification solution for leukemia diagnosis in an efficient manner.
NeurIPS Conference 2021 Conference Paper
Graph embedding, which represents real-world entities in a mathematical space, has enabled numerous applications such as analyzing natural languages, social networks, biochemical networks, and knowledge bases. It has been experimentally shown that graph embedding in hyperbolic space can represent hierarchical tree-like data more effectively than embedding in linear space, owing to hyperbolic space's exponential growth property. However, since the theoretical comparison has been limited to ideal noiseless settings, the potential for the hyperbolic space's property to worsen the generalization error for practical data has not been analyzed. In this paper, we provide a generalization error bound applicable for graph embedding both in linear and hyperbolic spaces under various negative sampling settings that appear in graph embedding. Our bound states that error is polynomial and exponential with respect to the embedding space's radius in linear and hyperbolic spaces, respectively, which implies that hyperbolic space's exponential growth property worsens the error. Using our bound, we clarify the data size condition on which graph embedding in hyperbolic space can represent a tree better than in Euclidean space by discussing the bias-variance trade-off. Our bound also shows that imbalanced data distribution, which often appears in graph embedding, can worsen the error.
IJCAI Conference 2021 Conference Paper
Although Label Distribution Learning (LDL) has found wide applications in varieties of classification problems, it may face the challenge of objective mismatch -- LDL neglects the optimal label for the sake of learning the whole label distribution, which leads to performance deterioration. To improve classification performance and solve the objective mismatch, we propose a new LDL algorithm called LDL-HR. LDL-HR provides a new perspective of label distribution, i.e., a combination of the highest label and the rest label description degrees. It works as follows. First, we learn the highest label by fitting the degenerated label distribution and large margin. Second, we learn the rest label description degrees to exploit generalization. Theoretical analysis shows the generalization of LDL-HR. Besides, the experimental results on 18 real-world datasets validate the statistical superiority of our method.
IJCAI Conference 2021 Conference Paper
Sleep staging is fundamental for sleep assessment and disease diagnosis. Although previous attempts to classify sleep stages have achieved high classification performance, several challenges remain open: 1) How to effectively extract salient waves in multimodal sleep data; 2) How to capture the multi-scale transition rules among sleep stages; 3) How to adaptively seize the key role of specific modality for sleep staging. To address these challenges, we propose SalientSleepNet, a multimodal salient wave detection network for sleep staging. Specifically, SalientSleepNet is a temporal fully convolutional network based on the U²-Net architecture that was originally proposed for salient object detection in computer vision. It is mainly composed of two independent U²-like streams to extract the salient features from multimodal data, respectively. Meanwhile, the multi-scale extraction module is designed to capture multi-scale transition rules among sleep stages. Besides, the multimodal attention module is proposed to adaptively capture valuable information from multimodal data for the specific sleep stage. Experiments on the two datasets demonstrate that SalientSleepNet outperforms the state-of-the-art baselines. It is worth noting that this model has the least amount of parameters compared with the existing deep neural network models.
TCS Journal 2021 Journal Article
IJCAI Conference 2020 Conference Paper
Existing multi-label learning (MLL) approaches mainly assume all the labels are observed and construct classification models with a fixed set of target labels (known labels). However, in some real applications, multiple latent labels may exist outside this set and hide in the data, especially for large-scale data sets. Discovering and exploring the latent labels hidden in the data may not only find interesting knowledge but also help us to build a more robust learning model. In this paper, a novel approach named DLCL (i.e., Discovering Latent Class Labels for MLL) is proposed which can not only discover the latent labels in the training data but also predict new instances with the latent and known labels simultaneously. Extensive experiments show a competitive performance of DLCL against other state-of-the-art MLL approaches.
IJCAI Conference 2020 Conference Paper
Sleep stage classification is essential for sleep assessment and disease diagnosis. However, how to effectively utilize brain spatial features and transition information among sleep stages continues to be challenging. In particular, owing to the limited knowledge of the human brain, predefining a suitable spatial brain connection structure for sleep stage classification remains an open question. In this paper, we propose a novel deep graph neural network, named GraphSleepNet, for automatic sleep stage classification. The main advantage of the GraphSleepNet is to adaptively learn the intrinsic connection among different electroencephalogram (EEG) channels, represented by an adjacency matrix, thereby best serving the spatial-temporal graph convolution network (ST-GCN) for sleep stage classification. Meanwhile, the ST-GCN consists of graph convolutions for extracting spatial features and temporal convolutions for capturing the transition rules among sleep stages. Experiments on the Montreal Archive of Sleep Studies (MASS) dataset demonstrate that the GraphSleepNet outperforms the state-of-the-art baselines.
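The building block this architecture stacks, a graph convolution over a learned channel adjacency, can be sketched in isolation. This is an illustrative single GCN step, not GraphSleepNet's full ST-GCN; the adjacency A and weights W below are random stand-ins for the parameters the network would learn.

```python
import numpy as np

def graph_conv(X, A, W):
    """One spatial graph-convolution step with symmetric normalization.

    X: (channels, features), A: (channels, channels) learned adjacency,
    W: (features, out_features). Computes ReLU(D^-1/2 (A+I) D^-1/2 X W),
    the standard GCN propagation rule.
    """
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    d = A_hat.sum(axis=1)                        # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))          # 20 EEG channels, 8 features each
A = np.abs(rng.normal(size=(20, 20)))
A = (A + A.T) / 2                     # symmetric stand-in for learned adjacency
H = graph_conv(X, A, rng.normal(size=(8, 4)))
print(H.shape)                        # (20, 4)
```

Making A a trainable parameter (rather than a fixed, predefined brain-connectivity graph) is what lets the model adapt the channel connections to the sleep-staging objective.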
ICLR Conference 2020 Conference Paper
This paper aims to analyze knowledge consistency between pre-trained deep neural networks. We propose a generic definition for knowledge consistency between neural networks at different fuzziness levels. A task-agnostic method is designed to disentangle feature components, which represent the consistent knowledge, from raw intermediate-layer features of each neural network. As a generic tool, our method can be broadly used for different applications. In preliminary experiments, we have used knowledge consistency as a tool to diagnose representations of neural networks. Knowledge consistency provides new insights to explain the success of existing deep-learning techniques, such as knowledge distillation and network compression. More crucially, knowledge consistency can also be used to refine pre-trained networks and boost performance.
AAAI Conference 2020 Conference Paper
Logo classification has gained increasing attention for its various applications, such as copyright infringement detection, product recommendation and contextual advertising. Compared with other types of object images, real-world logo images have larger variety in logo appearance and more complexity in their background. Therefore, recognizing the logo from images is challenging. To support efforts towards a scalable logo classification task, we have curated Logo-2K+, a new large-scale publicly available real-world logo dataset with 2,341 categories and 167,140 images. Compared with existing popular logo datasets, such as FlickrLogos-32 and LOGO-Net, Logo-2K+ has more comprehensive coverage of logo categories and a larger quantity of logo images. Moreover, we propose a Discriminative Region Navigation and Augmentation Network (DRNA-Net), which is capable of discovering more informative logo regions and augmenting these image regions for logo classification. DRNA-Net consists of four sub-networks: the navigator sub-network first selects informative logo-relevant regions guided by the teacher sub-network, which evaluates each region's confidence of belonging to the ground-truth logo class. The data augmentation sub-network then augments the selected regions via both region cropping and region dropping. Finally, the scrutinizer sub-network fuses features from the augmented regions and the whole image for logo classification. Comprehensive experiments on Logo-2K+ and three other existing benchmark datasets demonstrate the effectiveness of the proposed method. Logo-2K+ and the proposed strong baseline DRNA-Net are expected to further the development of scalable logo image recognition, and the Logo-2K+ dataset can be found at https://github.com/msn199959/Logo-2k-plus-Dataset.
JBHI Journal 2020 Journal Article
Objective: accurately classifying the malignancy of lesions detected in a screening scan is critical for reducing false positives. Radiomics holds great potential to differentiate malignant from benign tumors by extracting and analyzing a large number of quantitative image features. Since not all radiomic features contribute to an effective classifying model, selecting an optimal feature subset is critical. Methods: this work proposes a new multi-objective based feature selection (MO-FS) algorithm that considers sensitivity and specificity simultaneously as the objective functions during feature selection. For MO-FS, we developed a modified entropy-based termination criterion that stops the algorithm automatically rather than relying on a preset number of generations. We also designed a solution selection methodology for multi-objective learning that uses the evidential reasoning approach (SMOLER) to automatically select the optimal solution from the Pareto-optimal set. Furthermore, we developed an adaptive mutation operation to generate the mutation probability in MO-FS automatically. Results: we evaluated MO-FS for classifying lung nodule malignancy in low-dose CT and breast lesion malignancy in digital breast tomosynthesis. Conclusion: the experimental results demonstrated that the feature set selected by MO-FS achieved better classification performance than features selected by other commonly used methods. Significance: the proposed method is a general and more effective radiomic feature selection strategy.
IJCAI Conference 2019 Conference Paper
Existing methods on representation-based subspace clustering mainly treat all features of data as a whole to learn a single self-representation and get one clustering solution. Real data, however, are often complex and consist of multiple attributes or sub-features, such as a face image having expression or gender attributes. Each attribute is distinct and complementary in depicting the data. Failing to explore attributes and capture the complementary information among them may lead to an inaccurate representation. Moreover, a single clustering solution is rather limited in depicting data, which can often be interpreted from different aspects and grouped into multiple clusters according to attributes. Therefore, we propose an innovative model called attributed subspace clustering (ASC). It simultaneously learns multiple self-representations on latent representations derived from original data. By utilizing the Hilbert-Schmidt Independence Criterion as a co-regularizing term, ASC enforces that each self-representation is independent and corresponds to a specific attribute. A more comprehensive self-representation is then established by adding these self-representations. Experiments on several benchmark image datasets have demonstrated the effectiveness of ASC not only in terms of clustering accuracy achieved by the integrated representation, but also in the diverse interpretation of data, which is beyond what current approaches can offer.
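The co-regularizer named here, the Hilbert-Schmidt Independence Criterion, has a simple empirical estimator. Below is the common biased HSIC estimate with linear kernels, shown as a standalone independence measure rather than ASC's full objective: HSIC is near zero for independent representations and grows with their statistical dependence, so penalizing it pushes two self-representations toward capturing different attributes.

```python
import numpy as np

def hsic(X, Y):
    """Biased empirical HSIC with linear kernels: tr(KHLH) / (n-1)^2,
    where K = XX^T, L = YY^T and H is the centering matrix."""
    n = X.shape[0]
    K, L = X @ X.T, Y @ Y.T                 # linear kernel matrices
    H = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
indep = hsic(X, rng.normal(size=(200, 5)))          # unrelated representations
dep = hsic(X, X + 0.1 * rng.normal(size=(200, 5)))  # nearly identical ones
print(f"independent: {indep:.3f}, dependent: {dep:.3f}")
```

With linear kernels this quantity equals the squared Frobenius norm of the centered cross-covariance, so it is always non-negative and much larger for the dependent pair.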
IJCAI Conference 2019 Conference Paper
Label Distribution Learning (LDL) is a novel learning paradigm whose aim is to minimize the distance between the model output and the ground-truth label distribution. We notice that, in real-world applications, the learned label distribution model is generally treated as a classification model, with the label corresponding to the highest model output as the predicted label, which unfortunately prompts an inconsistency between the training phase and the test phase. To resolve the inconsistency, we propose in this paper a new Label Distribution Learning algorithm for Classification (LDL4C). Firstly, instead of KL-divergence, absolute loss is applied as the measure for LDL4C. Secondly, samples are re-weighted with information entropy. Thirdly, a large-margin classifier is adapted to boost discrimination precision. We then reveal that LDL4C theoretically seeks a balance between generalization and discrimination. Finally, we compare LDL4C with existing LDL algorithms on 17 real-world datasets, and experimental results demonstrate the effectiveness of LDL4C in classification.
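Two of the ingredients named above, the absolute loss between distributions and the entropy used for re-weighting, are simple to state; the sketch below shows only these two basic quantities, not LDL4C's actual weighting scheme:

```python
import math

def abs_loss(p, q):
    """Absolute (L1) distance between a predicted and a ground-truth
    label distribution -- the measure LDL4C adopts instead of KL."""
    return sum(abs(a - b) for a, b in zip(p, q))

def entropy(p, eps=1e-12):
    """Shannon entropy of a label distribution, the quantity LDL4C's
    sample re-weighting is based on (eps guards log(0))."""
    return -sum(pi * math.log(pi + eps) for pi in p if pi > 0)
```

A one-hot distribution has entropy near 0, while a uniform one over two labels has entropy log 2, so the entropy separates "confident" from "ambiguous" samples.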
IJCAI Conference 2019 Conference Paper
Image paragraph generation is the task of producing a coherent story (usually a paragraph) that describes the visual content of an image. The problem nevertheless is not trivial, especially when there are multiple descriptive and diverse gists to be considered for paragraph generation, which often happens in real images. A valid question is how to encapsulate such gists/topics that are worthy of mention from an image, and then describe the image from one topic to another but holistically with a coherent structure. In this paper, we present a new design --- Convolutional Auto-Encoding (CAE) that purely employs a convolutional and deconvolutional auto-encoding framework for topic modeling on the region-level features of an image. Furthermore, we propose an architecture, namely CAE plus Long Short-Term Memory (dubbed as CAE-LSTM), that novelly integrates the learnt topics in support of paragraph generation. Technically, CAE-LSTM capitalizes on a two-level LSTM-based paragraph generation framework with an attention mechanism. The paragraph-level LSTM captures the inter-sentence dependency in a paragraph, while the sentence-level LSTM generates one sentence conditioned on each learnt topic. Extensive experiments are conducted on the Stanford image paragraph dataset, and superior results are reported when comparing to state-of-the-art approaches. More remarkably, CAE-LSTM increases CIDEr performance from 20.93% to 25.15%.
AAAI Conference 2019 Conference Paper
Semi-supervised representation-based subspace clustering is to partition data into their underlying subspaces by finding effective data representations with partial supervisions. Essentially, an effective and accurate representation should be able to uncover and preserve the true data structure. Meanwhile, a reliable and easy-to-obtain supervision is desirable for practical learning. To meet these two objectives, in this paper we make the first attempt towards utilizing the orderly relationship, such as "data point a is closer to b than to c", as a novel supervision. We propose an orderly subspace clustering (OSC) approach with a novel regularization term. OSC enforces the learned representations to simultaneously capture the intrinsic subspace structure and reveal an orderly structure that is faithful to the true data relationship. Experimental results with several benchmarks have demonstrated that, aside from more accurate clustering against state-of-the-arts, OSC interprets orderly data structure, which is beyond what current approaches can offer.
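The orderly supervision can be read as a triplet constraint; a minimal hinge-style penalty illustrating it (a simplified stand-in, not OSC's actual regularization term):

```python
def order_hinge(d_ab, d_ac, margin=0.0):
    """Zero when the learned representation keeps a closer to b than to c
    (d_ab < d_ac), positive otherwise -- the orderly relationship "a is
    closer to b than to c" expressed as a hinge penalty."""
    return max(0.0, d_ab - d_ac + margin)
```

Summing such penalties over all supervised triplets gives one plausible way to score how faithfully a representation preserves the given order.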
IJCAI Conference 2019 Conference Paper
Reinforcement learning methods for recommender systems optimize recommendations for long-term user engagement. However, since users are often presented with slates of multiple items---which may have interacting effects on user choice---methods are required to deal with the combinatorics of the RL action space. We develop SlateQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates. Under mild assumptions on user choice behavior, we show that the long-term value (LTV) of a slate can be decomposed into a tractable function of its component item-wise LTVs. We demonstrate our methods in simulation, and validate the scalability and effectiveness of decomposed TD-learning on YouTube.
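The LTV decomposition the abstract states can be sketched as follows; this is a minimal illustration assuming a multinomial-logit user choice model and ignoring the no-click alternative that the full SlateQ formulation also handles:

```python
import math

def slate_ltv(item_scores, item_ltvs):
    """Core SlateQ-style identity (clicked-item case): the long-term value
    of a slate decomposes into a choice-probability-weighted sum of
    item-wise LTVs. Variable names here are illustrative."""
    exps = [math.exp(s) for s in item_scores]  # logit choice model
    z = sum(exps)
    return sum((e / z) * q for e, q in zip(exps, item_ltvs))
```

With equal scores the slate value is just the mean of the item LTVs; as one item's score dominates, the slate value approaches that item's LTV, which is what makes item-wise Q-learning tractable.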
AAAI Conference 2019 Conference Paper
As a novel learning paradigm, label distribution learning (LDL) explicitly models label ambiguity with the definition of label description degree. Although lots of work has been done to deal with real-world applications, theoretical results on LDL remain unexplored. In this paper, we rethink LDL from theoretical aspects, towards analyzing the learnability of LDL. Firstly, risk bounds for three representative LDL algorithms (AA-kNN, AA-BP and SA-ME) are provided. For AA-kNN, Lipschitzness of the label distribution function is assumed to bound the risk, and for AA-BP and SA-ME, Rademacher complexity is utilized to give data-dependent risk bounds. Secondly, a generalized plug-in decision theorem is proposed to understand the relation between LDL and classification, uncovering that approximation to the conditional probability distribution function in absolute loss guarantees approaching the optimal classifier, and also data-dependent error probability bounds are presented for the corresponding LDL algorithms to perform classification. As far as we know, this is perhaps the first research on the theory of LDL.
AAAI Conference 2018 Conference Paper
Motivated by the important archaeological application of exploring cultural heritage objects, in this paper we study the challenging problem of automatically segmenting curve structures that are very weakly stamped or carved on an object surface in the form of a highly noisy depth map. Different from most classical low-level image segmentation methods that are known to be very sensitive to noise and occlusions, we propose a new supervised learning algorithm based on Convolutional Neural Network (CNN) to implicitly learn and utilize more curve geometry and pattern information for addressing this challenging problem. More specifically, we first propose a Fully Convolutional Network (FCN) to estimate the skeleton of curve structures and, at each skeleton pixel, a scale value is estimated to reflect the local curve width. Then we propose a dense prediction network to refine the estimated curve skeletons. Based on the estimated scale values, we finally develop an adaptive thresholding algorithm to achieve the final segmentation of curve structures. In the experiment, we validate the performance of the proposed method on a dataset of depth images scanned from unearthed pottery sherds dating to the Woodland period of Southeastern North America.
AIIM Journal 2018 Journal Article
EAAI Journal 2018 Journal Article
IJCAI Conference 2018 Conference Paper
Trust prediction, aiming to predict the trust relations between users in a social network, is a key to helping users discover the reliable information. Many trust prediction methods are proposed based on the low-rank assumption of a trust network. However, one typical property of the trust network is that the trust relations follow the power-law distribution, i.e., few users are trusted by many other users, while most tail users have few trustors. Due to these tail users, the fundamental low-rank assumption made by existing methods is seriously violated and becomes unrealistic. In this paper, we propose a simple yet effective method to address the problem of the violated low-rank assumption. Instead of discovering the low-rank component of the trust network alone, we learn a sparse component of the trust network to describe the tail users simultaneously. With both of the learned low-rank and sparse components, the trust relations in the whole network can be better captured. Moreover, the transitive closure structure of the trust relations is also integrated into our model. We then derive an effective iterative algorithm to infer the parameters of our model, along with the proof of correctness. Extensive experimental results on real-world trust networks demonstrate the superior performance of our proposed method over the state-of-the-arts.
IJCAI Conference 2018 Conference Paper
Nonnegative matrix factorization (NMF), a well-known technique to find parts-based representations of nonnegative data, has been widely studied. In reality, ordinal relations often exist among data, such as data i is more related to j than to q. Such relative order is naturally available, and more importantly, it truly reflects the latent data structure. Preserving the ordinal relations enables us to find structured representations of data that are faithful to the relative order, so that the learned representations become more discriminative. However, current NMFs pay no attention to this. In this paper, we make the first attempt towards incorporating the ordinal relations and propose a novel ranking preserving nonnegative matrix factorization (RPNMF) approach, which enforces the learned representations to be ranked according to the relations. We derive iterative updating rules to solve RPNMF's objective function with convergence guaranteed. Experimental results with several datasets for clustering and classification have demonstrated that RPNMF achieves greater performance against the state-of-the-arts, not only in terms of accuracy, but also interpretation of orderly data structure.
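For context on the factorization RPNMF augments, here is a minimal pure-Python sketch of the standard multiplicative NMF updates (Lee-Seung); RPNMF's ranking-preserving term would add to this objective and modify these updates, which is omitted here:

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def nmf_step(V, W, H, eps=1e-9):
    """One round of multiplicative updates for V ~= W H with nonnegative
    factors; each update is guaranteed not to increase the squared error."""
    Wt = transpose(W)
    num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
    H = [[H[i][j] * num[i][j] / (den[i][j] + eps)
          for j in range(len(H[0]))] for i in range(len(H))]
    Ht = transpose(H)
    num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
    W = [[W[i][j] * num[i][j] / (den[i][j] + eps)
          for j in range(len(W[0]))] for i in range(len(W))]
    return W, H

def frob_err(V, W, H):
    WH = matmul(W, H)
    return sum((V[i][j] - WH[i][j]) ** 2
               for i in range(len(V)) for j in range(len(V[0])))
```

Iterating `nmf_step` from a nonnegative initialization drives `frob_err` monotonically downward, which is the convergence behavior RPNMF's derivation also has to preserve under its extra constraint.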
AAAI Conference 2018 Conference Paper
Impressive image captioning results (i.e., an objective description for an image) are achieved with plenty of training pairs. In this paper, we take one step further to investigate the creation of a narrative paragraph for a photo stream. This task is even more challenging due to the difficulty in modeling an ordered photo sequence and in generating a relevant paragraph with an expressive language style for storytelling. The difficulty can even be exacerbated by the limited training data, so that existing approaches mostly focus on search-based solutions. To deal with these challenges, we propose a sequence-to-sequence modeling approach with reinforcement learning and adversarial training. First, to model the ordered photo stream, we propose a hierarchical recurrent neural network as the story generator, which is optimized by reinforcement learning with rewards. Second, to generate relevant and story-style paragraphs, we design the rewards with two critic networks, including a multi-modal and a language-style discriminator. Third, we further consider the story generator and reward critics as adversaries. The generator aims to create paragraphs indistinguishable from human-level stories, whereas the critics aim at distinguishing them and further improving the generator by policy gradient. Experiments on three widely-used datasets show the effectiveness against state-of-the-art methods, with a relative increase of 20.2% in METEOR. We also show the subjective preference for the proposed approach over the baselines through a user study with 30 human subjects.
YNIMG Journal 2017 Journal Article
AAAI Conference 2017 Conference Paper
Network embedding, aiming to learn the low-dimensional representations of nodes in networks, is of paramount importance in many real applications. One basic requirement of network embedding is to preserve the structure and inherent properties of the networks. While previous network embedding methods primarily preserve the microscopic structure, such as the first- and second-order proximities of nodes, the mesoscopic community structure, which is one of the most prominent features of networks, is largely ignored. In this paper, we propose a novel Modularized Nonnegative Matrix Factorization (M-NMF) model to incorporate the community structure into network embedding. We exploit the consensus relationship between the representations of nodes and community structure, and then jointly optimize the NMF based representation learning model and the modularity based community detection model in a unified framework, which enables the learned representations of nodes to preserve both the microscopic and community structures. We also provide efficient updating rules to infer the parameters of our model, together with the correctness and convergence guarantees. Extensive experimental results on a variety of real-world networks show the superior performance of the proposed method over the state-of-the-arts.
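The community-detection side of M-NMF rests on modularity; a minimal sketch of the modularity matrix that term is built from (the textbook definition for an undirected, unweighted adjacency matrix, not the paper's full joint model):

```python
def modularity_matrix(A):
    """B[i][j] = A[i][j] - k_i * k_j / (2m), where k_i is the degree of
    node i and m the number of edges: observed connectivity minus the
    expectation under a random graph with the same degree sequence."""
    k = [sum(row) for row in A]   # node degrees
    two_m = sum(k)                # 2m = sum of degrees
    n = len(A)
    return [[A[i][j] - k[i] * k[j] / two_m for j in range(n)]
            for i in range(n)]
```

Each row of B sums to zero, a standard sanity check; community assignments that concentrate positive B entries inside groups have high modularity.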
YNICL Journal 2017 Journal Article
IJCAI Conference 2017 Conference Paper
Real data are usually complex and contain various components. For example, face images have expressions and genders. Each component mainly reflects one aspect of data and provides information others do not have. Therefore, exploring the semantic information of multiple components as well as the diversity among them is of great benefit to understand data comprehensively and in-depth. However, this cannot be achieved by current nonnegative matrix factorization (NMF)-based methods, despite that NMF has shown remarkable competitiveness in learning parts-based representation of data. To overcome this limitation, we propose a novel multi-component nonnegative matrix factorization (MCNMF). Instead of seeking for only one representation of data, MCNMF learns multiple representations simultaneously, with the help of the Hilbert Schmidt Independence Criterion (HSIC) as a diversity term. HSIC explores the diverse information among the representations, where each representation corresponds to a component. By integrating the multiple representations, a more comprehensive representation is then established. A new iterative updating optimization scheme is derived to solve the objective function of MCNMF, along with its correctness and convergence guarantees. Extensive experimental results on real-world datasets have shown that MCNMF not only achieves more accurate performance over the state-of-the-arts using the aggregated representation, but also interprets data from different aspects with the multiple representations, which is beyond what current NMFs can offer.
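A minimal illustration of the HSIC diversity idea for the special case of scalar variables with linear kernels, where the biased empirical estimator collapses to a squared centered cross-moment; MCNMF applies HSIC between whole representation matrices, which this sketch does not reproduce:

```python
def hsic_linear_1d(x, y):
    """Biased empirical HSIC with linear kernels on scalar samples.
    For linear kernels, trace(KHLH) reduces to the squared inner product
    of the centered variables; zero means no linear dependence."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    xc = [v - mx for v in x]
    yc = [v - my for v in y]
    dot = sum(a * b for a, b in zip(xc, yc))
    return (dot / (n - 1)) ** 2
```

Used as a penalty between two learned components, driving this quantity toward zero pushes the components to carry non-redundant information, which is the role HSIC plays in MCNMF's objective.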
YNIMG Journal 2017 Journal Article
IJCAI Conference 2013 Conference Paper
Online feature selection with dynamic features has become an active research area in recent years. However, in some real-world applications such as image analysis and email spam filtering, features may arrive by groups. Existing online feature selection methods evaluate features individually, while existing group feature selection methods cannot handle online processing. Motivated by this, we formulate the online group feature selection problem, and propose a novel selection approach for this problem. Our proposed approach consists of two stages: online intra-group selection and online inter-group selection. In the intra-group selection, we use spectral analysis to select discriminative features in each group when it arrives. In the inter-group selection, we use Lasso to select a globally optimal subset of features. This 2-stage procedure continues until there are no more features to come or some predefined stopping conditions are met. Extensive experiments conducted on benchmark and real-world data sets demonstrate that our proposed approach outperforms other state-of-the-art online feature selection methods.
YNIMG Journal 2012 Journal Article
AAAI Conference 2010 Conference Paper
There is a large body of work on the evolution of graphs in various domains, which shows that many real graphs evolve in a similar manner. In this paper we study a novel type of network formed by mentor-apprentice relationships in a massively multiplayer online role playing game. We observe that some of the static and dynamic laws which have been observed in many other real world networks are not observed in this network. Consequently, well-known graph generators like Preferential Attachment, Forest Fire, Butterfly, RTM, etc., cannot be applied to such mentoring networks. We propose a novel generative model to generate networks with the characteristics of mentoring networks.
NeurIPS Conference 2006 Conference Paper
The locally linear embedding (LLE) is improved by introducing multiple linearly independent local weight vectors for each neighborhood. We characterize the reconstruction weights and show the existence of the linearly independent weight vectors at each neighborhood. The modified locally linear embedding (MLLE) proposed in this paper is much more stable. It can retrieve the ideal embedding if MLLE is applied on data points sampled from an isometric manifold. MLLE is also compared with the local tangent space alignment (LTSA). Numerical examples are given that show the improvement and efficiency of MLLE.
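For a concrete picture of the reconstruction weights being characterized, here is a sketch of the standard single-weight-vector LLE solve for one point with two neighbors; MLLE's contribution is to keep several linearly independent weight vectors per neighborhood (drawn from the near-null space of the local Gram matrix) rather than this single solution:

```python
def lle_weights_2nn(x, n1, n2, reg=1e-6):
    """Minimize ||x - w1*n1 - w2*n2||^2 subject to w1 + w2 = 1 via the
    local Gram matrix G[i][j] = (x - n_i) . (x - n_j): solve G w = 1,
    then normalize so the weights sum to one."""
    d = [[xi - ni for xi, ni in zip(x, n)] for n in (n1, n2)]
    G = [[sum(a * b for a, b in zip(d[i], d[j])) for j in range(2)]
         for i in range(2)]
    G[0][0] += reg
    G[1][1] += reg  # regularize for numerical stability
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    # 2x2 inverse applied to the all-ones vector
    w = [(G[1][1] - G[0][1]) / det, (G[0][0] - G[1][0]) / det]
    s = w[0] + w[1]
    return [w[0] / s, w[1] / s]
```

When x sits at the midpoint of its two neighbors, the solve returns equal weights of 0.5, and the instability MLLE targets shows up when G is nearly singular and this single solution is ill-conditioned.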
NeurIPS Conference 2004 Conference Paper
Recently, there have been several advances in the machine learning and pattern recognition communities for developing manifold learning algorithms to construct nonlinear low-dimensional manifolds from sample data points embedded in high-dimensional spaces. In this paper, we develop algorithms that address two key issues in manifold learning: 1) the adaptive selection of the neighborhood sizes; and 2) better fitting the local geometric structure to account for the variations in the curvature of the manifold and its interplay with the sampling density of the data set. We also illustrate the effectiveness of our methods on some synthetic data sets.
TCS Journal 1998 Journal Article
ICRA Conference 1995 Conference Paper
A media access protocol, CSMA/CD-W (Carrier Sense Multiple Access with Collision Detection for Wireless) is proposed to support broadcasting and point-to-point communication in mobile robot based distributed robotic systems (DRS). Distinct from many existing experimental systems built with off-the-shelf wireless communication products for computers, no centralized mechanism such as a communication server, or "ground support" is used, which is consistent with basic principles of DRS. The proposed protocol supports wireless data communication among mobile robots on a shared radio communication channel. It differs from CSMA and its variations with the capability of detecting, in a wireless network, collisions of broadcast (undesignated) messages without using any centralized devices. Satisfactory performance of the protocol is demonstrated with a rigorously designed discrete event simulation.
ICRA Conference 1995 Conference Paper
A fully distributed algorithm is presented, which when executed by each robot, collectively allows multiple autonomous mobile robots to travel through a discrete traffic network composed of passage segments, intersections, and terminals, all of which are of only finite capacity. Each robot may establish dynamically its own route not known to others. Treating passage segments, intersections and terminals as shared, discrete resources, the algorithm guarantees ordered traffic flow in a discrete network such that (i) finite capacity constraints of passage segments and terminals are always enforced; (ii) no collision occurs at any intersection; (iii) deadlocks are detected and resolved. The system operates under the model of Distributed Robotic Systems (DRS), assuming no centralized mechanism, synchronized clock, shared memory or ground support. Inter-robot communication is only required among spatially adjacent robotic units. The algorithm is implementable with today's technology.
ICRA Conference 1995 Conference Paper
Two distributed operating primitives (1 out of N and deadlock detection) are presented to support fully distributed traffic regulation and control for multiple autonomous mobile robots operating in a 2-D discrete network consisting of passage segments, intersections and terminals, all of which are of only finite capacity. In consistency with the model of distributed robotic systems (DRS), no centralized mechanism, synchronized clock, shared memory or ground support is assumed. It is shown that simple, low bandwidth inter-robot communication is only required among a finite, small number of spatially adjacent robotic units. The correctness of these two distributed algorithms is provable.
ICRA Conference 1994 Conference Paper
Inter-robot communication based on the conceptual mechanism of "sign-board" in distributed robotic systems (DRS) is discussed. Equipped by each robot, a sign-board can be written only by the robot that carries it, and be read by robots in the neighborhood. Consistent with DRS principles, the sign-board model is not supported by any centralized mechanism, and is considered a natural way of interaction among autonomous robotic units. It is shown that along with message passing, the sign-board model is one of the two important mechanisms for inter-robot communication. Previous research on DRS algorithms employing the sign-board model assumes zero signal propagation delay. These algorithms may fail if non-zero propagation delay is taken into account. A simple fix for these algorithms exists if the propagation delay is bounded. Implementation strategies for the conceptual sign-board are also discussed.
IROS Conference 1994 Conference Paper
Resource sharing is crucial in any multi-agent system, and a distributed robotic system (DRS) is no exception. A new, general strategy of sharing multiple, discrete resources with predetermined capacities under the DRS model is proposed. It is based upon a media access protocol, CSMA/CD-W (Carrier Sense Multiple Access with Collision Detection for Wireless), which supports wireless inter-robot communication among multiple autonomous mobile robots without using any centralized mechanism. This resource sharing strategy is derived based on the fact that, with the single, time-multiplexed communication channel, asynchronous events for requesting and releasing resources are effectively serialized. It is shown that the control protocol is effective, efficient, reliable and robust.
IROS Conference 1993 Conference Paper
Distributed mutual exclusion (DME) is an important concept in any multi-agent system, including the distributed robotic system (DRS). Several basic DME algorithms employing sign-board as their inter-robot communication mechanism are presented. It is shown that a large number of DRS operating primitives, such as leader finding, dynamic ordering of robots and events, job assignment, and resource sharing, can be effectively implemented with algorithms based on DME.
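A minimal sketch of the sign-board access rule these DME algorithms rely on (owner-only writes, neighbor reads); the class and method names are illustrative, not from the paper:

```python
class SignBoard:
    """Conceptual sign-board: writable only by the robot that carries it,
    readable by robots in the neighborhood. No centralized mechanism is
    involved -- each robot owns exactly one board."""

    def __init__(self, owner):
        self.owner = owner
        self._msg = None

    def write(self, robot, msg):
        # Enforce the owner-only write rule of the sign-board model.
        if robot != self.owner:
            raise PermissionError("only the owner may write its sign-board")
        self._msg = msg

    def read(self):
        # Any neighboring robot may read the posted message.
        return self._msg
```

A DME algorithm would then have each robot post its request state (e.g. a timestamped claim on a resource) to its own board and decide entry by reading its neighbors' boards.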
IROS Conference 1991 Conference Paper
A model for studying fully distributed traffic control strategies for many-AGV systems in an operating field of a network of stations and passages is proposed. As a basic operating primitive, distributed mutual exclusion on a resource of capacity M (0