Arrow Research search

Author name cluster

Jun Zhou

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

80 papers
2 author rows

Possible papers

80

AAAI Conference 2026 Conference Paper

Auto-PRE: An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation

  • Junjie Chen
  • Weihang Su
  • Zhumin Chu
  • Haitao Li
  • Yujia Zhou
  • Dingbo Yuan
  • Xudong Wang
  • Jun Zhou

The rapid development of large language models (LLMs) has highlighted the need for efficient and reliable methods to evaluate their performance. Traditional evaluation methods often face challenges like high costs, limited task formats, dependence on human references, and systematic biases. To address these limitations, we propose Auto-PRE, an automatic LLM evaluation framework inspired by the peer review process. Unlike previous approaches that rely on human annotations, Auto-PRE automatically selects evaluator LLMs based on three core traits: consistency, pertinence, and self-confidence, which correspond to the instruction, content, and response stages, respectively, and collectively cover the entire evaluation process. Experiments on three representative tasks, including summarization, non-factoid QA, and dialogue generation, demonstrate that Auto-PRE achieves state-of-the-art performance while significantly reducing evaluation costs. Furthermore, the structured and scalable design of our automatic qualification exam framework provides valuable insights into automating the evaluation of LLMs-as-judges, paving the way for more advanced LLM-based evaluation frameworks.

JBHI Journal 2026 Journal Article

Distance Learning-Based Prototypical Network With Multi-Domain Adaptation for Few-Shot Hyperspectral Medical Image Classification

  • Favour Ekong
  • Jun Zhou
  • Jing Wang
  • Mohammad Aminul Islam
  • Yongsheng Gao

Hyperspectral imaging (HSI) holds immense potential for medical diagnostics by capturing tissue-specific spectral signatures that facilitate precise disease detection. However, effective HSI classification in clinical settings is hindered by two main challenges: (i) the severe lack of labelled medical HSI samples constrains model training. Prototypical networks, as a few-shot learning paradigm, have been adopted to address label scarcity. However, current Euclidean-based prototypical methods typically assume equal feature variance and spherical distributions, while ignoring intraclass covariance and spectral correlations; (ii) significant domain shifts across heterogeneous medical HSI datasets undermine model generalisation, impair multi-domain interpretability, and force expensive per-dataset retraining. To overcome these limitations, we propose a novel distance-learning-based prototypical network with multi-domain adaptation for few-shot hyperspectral medical image classification. First, by embedding a class-covariance-aware Mahalanobis metric within the prototypical block, our module adapts similarity measures to each class's intrinsic spectral–spatial covariance and scale variations, thereby enhancing prototype robustness under severe label scarcity and significantly reducing misclassification compared with existing few-shot networks. Secondly, we introduce the domain-aware adapter block designed to address domain shift and multi-domain variability by dynamically fusing shared spectral–spatial representations with domain-specific characteristics via spectral integration and switchable adapters. We undertook extensive experiments on three publicly available hyperspectral medical datasets: skin dermoscopy, multidimensional choledochal, and in-vivo brain dataset. 
Compared to state-of-the-art classifiers, the proposed method achieved excellent performance on all three datasets, paving the way for generalisable HSI solutions in clinical workflows and biomedical research.
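As a rough illustration of the class-covariance-aware Mahalanobis metric this abstract describes (not the paper's implementation; the function name and the shrinkage regularizer `reg` are illustrative assumptions), a prototypical classifier can replace the usual Euclidean distance with a per-class Mahalanobis distance:

```python
import numpy as np

def mahalanobis_prototype_classify(support, support_labels, query, reg=1e-3):
    """Classify query embeddings by Mahalanobis distance to class prototypes.

    support: (n, d) support embeddings; support_labels: (n,) integer labels;
    query: (m, d) query embeddings. A shrinkage term reg*I keeps the per-class
    covariance invertible under few-shot label scarcity.
    """
    classes = np.unique(support_labels)
    d = support.shape[1]
    protos, inv_covs = [], []
    for c in classes:
        x = support[support_labels == c]
        mu = x.mean(axis=0)                       # class prototype
        cov = np.cov(x, rowvar=False) + reg * np.eye(d)
        protos.append(mu)
        inv_covs.append(np.linalg.inv(cov))
    preds = []
    for q in query:
        # squared Mahalanobis distance to each class prototype
        dists = [(q - mu) @ icv @ (q - mu) for mu, icv in zip(protos, inv_covs)]
        preds.append(classes[int(np.argmin(dists))])
    return np.array(preds)
```

Unlike the Euclidean case, this metric adapts to each class's intrinsic covariance, which is the property the abstract credits for improved prototype robustness.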

AAAI Conference 2026 Conference Paper

Note2Chat: Improving LLMs for Multi-Turn Clinical History Taking Using Medical Notes

  • Yang Zhou
  • Zhenting Sheng
  • Mingrui Tan
  • Yuting Song
  • Jun Zhou
  • Yu Heng Kwan
  • Lian Leng Low
  • Yang Bai

Effective clinical history taking is a foundational yet underexplored component of clinical reasoning. While large language models (LLMs) have shown promise on static benchmarks, they often fall short in dynamic, multi-turn diagnostic settings that require iterative questioning and hypothesis refinement. To address this gap, we propose Note2Chat, a note-driven framework that trains LLMs to conduct structured history taking and diagnosis by learning from widely available medical notes. Instead of relying on scarce and sensitive dialogue data, we convert real-world medical notes into high-quality doctor-patient dialogues using a decision tree-guided generation and refinement pipeline. We then propose a three-stage fine-tuning strategy combining supervised learning, simulated data augmentation, and preference learning. Furthermore, we propose a novel single-turn reasoning paradigm that reframes history taking as a sequence of single-turn reasoning problems. This design enhances interpretability and enables local supervision, dynamic adaptation, and greater sample efficiency. Experimental results show that our method substantially improves clinical reasoning, achieving gains of +16.9 F1 and +21.0 Top-1 diagnostic accuracy over GPT-4o.

AAAI Conference 2026 Conference Paper

Thinker: Training LLMs in Hierarchical Thinking for Deep Search via Multi-Turn Interaction

  • Jun Xu
  • Xinkai Du
  • Yu Ao
  • Peilong Zhao
  • Yang Li
  • Ling Zhong
  • Lin Yuan
  • Zhongpu Bo

Efficient retrieval of external knowledge bases and web pages is crucial for enhancing the reasoning abilities of LLMs. Previous works on training LLMs to leverage external retrievers for solving complex problems have predominantly employed end-to-end reinforcement learning. However, these approaches neglect supervision over the reasoning process, making it difficult to guarantee logical coherence and rigor. To address these limitations, we propose Thinker, a hierarchical thinking model for deep search through multi-turn interaction, making the reasoning process supervisable and verifiable. It decomposes complex problems into independently solvable sub-problems, each dually represented in both natural language and an equivalent logical function to support knowledge base and web searches. Concurrently, dependencies between sub-problems are passed as parameters via these logical functions, enhancing the logical coherence of the problem-solving process. To avoid unnecessary external searches, we perform knowledge boundary determination to check if a sub-problem is within the LLM's intrinsic knowledge, allowing it to answer directly. Experimental results indicate that with as few as several hundred training samples, the performance of Thinker is competitive with established baselines. Furthermore, when scaled to the full training set, Thinker significantly outperforms these methods across various datasets and model sizes.

AAAI Conference 2026 Conference Paper

VPHO: Joint Visual-Physical Cue Learning and Aggregation for Hand-Object Pose Estimation

  • Jun Zhou
  • Chi Xu
  • Kaifeng Tang
  • Yuting Ge
  • Tingrui Guo
  • Li Cheng

Estimating the 3D poses of hands and objects from a single RGB image is a fundamental yet challenging problem, with broad applications in augmented reality and human-computer interaction. Existing methods largely rely on visual cues alone, often producing results that violate physical constraints such as interpenetration or non-contact. Recent efforts to incorporate physics reasoning typically depend on post-optimization or non-differentiable physics engines, which compromise visual consistency and end-to-end trainability. To overcome these limitations, we propose a novel framework that jointly integrates visual and physical cues for hand-object pose estimation. This integration is achieved through two key ideas: 1) joint visual-physical cue learning: The model is trained to extract 2D visual cues and 3D physical cues, thereby enabling more comprehensive representation learning for hand-object interactions; 2) candidate pose aggregation: A novel refinement process that aggregates multiple diffusion-generated candidate poses by leveraging both visual and physical predictions, yielding a final estimate that is visually consistent and physically plausible. Extensive experiments demonstrate that our method significantly outperforms existing state-of-the-art approaches in both pose accuracy and physical plausibility.

AAAI Conference 2026 Conference Paper

Zo3T: Zero-Shot 3D-Aware Trajectory-Guided Image-to-Video Generation via Test-Time Training

  • Ruicheng Zhang
  • Jun Zhou
  • Zunnan Xu
  • Zihao Liu
  • Jiehui Huang
  • Mingyang Zhang
  • Yu Sun
  • Xiu Li

Trajectory-Guided image-to-video (I2V) generation aims to synthesize videos that adhere to user-specified motion instructions. Existing methods typically rely on computationally expensive fine-tuning on scarce annotated datasets. Although some zero-shot methods attempt trajectory control in the latent space, they may yield unrealistic motion by neglecting 3D perspective and creating a misalignment between the manipulated latents and the network's noise predictions. To address these challenges, we introduce Zo3T, a novel zero-shot test-time-training framework for trajectory-guided generation with three core innovations: First, we incorporate a 3D-Aware Kinematic Projection, leveraging inferred scene depth to derive perspective-correct affine transformations for target regions. Second, we introduce Trajectory-Guided Test-Time LoRA, a mechanism that dynamically injects and optimizes ephemeral LoRA adapters into the denoising network alongside the latent state. Driven by a regional feature consistency loss, this co-adaptation effectively enforces motion constraints while allowing the pre-trained model to locally adapt its internal representations to the manipulated latent, thereby ensuring generative fidelity and on-manifold adherence. Finally, we develop Guidance Field Rectification, which refines the denoising evolutionary path by optimizing the conditional guidance field through a one-step lookahead strategy, ensuring efficient generative progression towards the target trajectory. Zo3T significantly enhances 3D realism and motion accuracy in trajectory-controlled I2V generation, demonstrating superior performance over existing training-based and zero-shot approaches.

JMLR Journal 2025 Journal Article

Adaptive Client Sampling in Federated Learning via Online Learning with Bandit Feedback

  • Boxin Zhao
  • Lingxiao Wang
  • Ziqi Liu
  • Zhiqiang Zhang
  • Jun Zhou
  • Chaochao Chen
  • Mladen Kolar

Due to the high cost of communication, federated learning (FL) systems need to sample a subset of clients that are involved in each round of training. As a result, client sampling plays an important role in FL systems as it affects the convergence rate of optimization algorithms used to train machine learning models. Despite its importance, there is limited work on how to sample clients effectively. In this paper, we cast client sampling as an online learning task with bandit feedback, which we solve with an online stochastic mirror descent (OSMD) algorithm designed to minimize the sampling variance. We then theoretically show how our sampling method can improve the convergence speed of federated optimization algorithms over the widely used uniform sampling. Through both simulated and real data experiments, we empirically illustrate the advantages of the proposed client sampling algorithm over uniform sampling and existing online learning-based sampling strategies. The proposed adaptive sampling procedure is applicable beyond the FL problem studied here and can be used to improve the performance of stochastic optimization procedures such as stochastic gradient descent and stochastic coordinate descent.
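The bandit-feedback flavor of this client-sampling idea can be sketched with a simplified multiplicative-weights update (an assumption-laden stand-in for the paper's OSMD algorithm, not its exact form): the sampled client's observed signal is importance-weighted by its sampling probability, then probabilities are re-normalized with an exploration floor.

```python
import numpy as np

def update_sampling_probs(probs, chosen, observed_signal, lr=0.1, floor=0.01):
    """One bandit-feedback update of client sampling probabilities (sketch).

    probs: current sampling distribution over clients; chosen: index of the
    client sampled this round; observed_signal: its observed statistic (e.g.
    a gradient-norm estimate). Only the chosen client is observed, so the
    estimate is divided by probs[chosen] to stay unbiased.
    """
    n = len(probs)
    est = np.zeros(n)
    est[chosen] = observed_signal / probs[chosen]  # importance-weighted estimate
    w = probs * np.exp(lr * est)                   # upweight informative clients
    w = w / w.sum()
    w = np.maximum(w, floor)                       # exploration floor
    return w / w.sum()
```

Repeatedly observing a large signal from one client shifts sampling mass toward it, which is the variance-reduction behavior the abstract contrasts with uniform sampling.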

NeurIPS Conference 2025 Conference Paper

ARGenSeg: Image Segmentation with Autoregressive Image Generation Model

  • Xiaolong Wang
  • Lixiang Ru
  • Ziyuan Huang
  • Kaixiang Ji
  • DanDan Zheng
  • Jingdong Chen
  • Jun Zhou

We propose a novel AutoRegressive Generation-based paradigm for image Segmentation (ARGenSeg), achieving multimodal understanding and pixel-level perception within a unified framework. Prior works integrating image segmentation into multimodal large language models (MLLMs) typically employ either boundary points representation or dedicated segmentation heads. These methods rely on discrete representations or semantic prompts fed into task-specific decoders, which limits the ability of the MLLM to capture fine-grained visual details. To address these challenges, we introduce a segmentation framework for MLLM based on image generation, which naturally produces dense masks for target objects. We leverage MLLM to output visual tokens and detokenize them into images using a universal VQ-VAE, making the segmentation fully dependent on the pixel-level understanding of the MLLM. To reduce inference latency, we employ a next-scale-prediction strategy to generate required visual tokens in parallel. Extensive experiments demonstrate that our method surpasses prior state-of-the-art approaches on multiple segmentation datasets with a remarkable boost in inference speed, while maintaining strong understanding capabilities.

AAAI Conference 2025 Conference Paper

CREST: An Efficient Conjointly-trained Spike-driven Framework for Event-based Object Detection Exploiting Spatiotemporal Dynamics

  • Ruixin Mao
  • Aoyu Shen
  • Lin Tang
  • Jun Zhou

Event-based cameras feature high temporal resolution, wide dynamic range, and low power consumption, which are ideal for high-speed and low-light object detection. Spiking neural networks (SNNs) are promising for event-based object recognition and detection due to their spiking nature but lack efficient training methods, leading to gradient vanishing and high computational complexity, especially in deep SNNs. Additionally, existing SNN frameworks often fail to effectively handle multi-scale spatiotemporal features, leading to increased data redundancy and reduced accuracy. To address these issues, we propose CREST, a novel conjointly trained spike-driven framework to exploit spatiotemporal dynamics in event-based object detection. We introduce the conjoint learning rule to accelerate SNN learning and alleviate gradient vanishing. It also supports dual operation modes for efficient and flexible implementation on different hardware types. Additionally, CREST features a fully spike-driven framework with a multi-scale spatiotemporal event integrator (MESTOR) and a spatiotemporal-IoU (ST-IoU) loss. Our approach achieves superior object recognition & detection performance and energy efficiency compared with state-of-the-art SNN algorithms on three datasets, providing an efficient solution for event-based object detection algorithms suitable for SNN hardware implementation.

AAAI Conference 2025 Conference Paper

Explore What LLM Does Not Know in Complex Question Answering

  • Xin Lin
  • Zhenya Huang
  • Zhiqiang Zhang
  • Jun Zhou
  • Enhong Chen

Complex question answering (QA) is a challenging task in artificial intelligence research which requires reasoning based on related knowledge. Retrieval-augmented generation (RAG) based on large language models (LLMs) has become one promising solution in QA. To facilitate RAG more effectively, the LLM needs to precisely evaluate knowledge required in QA. That is, first, the LLM needs to examine its knowledge boundary (what the LLM does not know) to retrieve external knowledge as supplement. Second, the LLM needs to evaluate the utility of the retrieved knowledge (whether it helps in reasoning) for robust RAG. To this end, in this paper, we propose a novel Question Answering with Knowledge Evaluation (KEQA) framework to promote the effectiveness and efficiency of RAG in QA. First, inspired by quizzes in classroom, we propose a quiz-based method to precisely examine the knowledge state of the uninterpretable LLM for QA. We ask indicative quizzes on each required knowledge, and inspect whether the LLM can consistently answer the quiz to examine its knowledge boundary. Second, we retrieve the unknown knowledge from external sources, and evaluate its utility to pick the helpful ones for reasoning. We design a reasoning-based metric to evaluate utility, and construct a demonstration set in training data for reference to guide knowledge picking in inference. We conduct extensive experiments on four widely-used QA datasets, and the results demonstrate the effectiveness of the proposed method.

AAAI Conference 2025 Conference Paper

HeMeNet: Heterogeneous Multichannel Equivariant Network for Protein Multi-task Learning

  • Rong Han
  • Wenbing Huang
  • Lingxiao Luo
  • Xinyan Han
  • Jiaming Shen
  • Zhiqiang Zhang
  • Jun Zhou
  • Ting Chen

Understanding and leveraging the 3D structures of proteins is central to a variety of biological and drug discovery tasks. While deep learning has been applied successfully for structure-based protein function prediction tasks, current methods usually employ distinct training for each task. However, each of the tasks is of small size, and such a single-task strategy hinders the models' performance and generalization ability. As some labeled 3D protein datasets are biologically related, combining multi-source datasets for larger-scale multi-task learning is one way to overcome this problem. In this paper, we propose a neural network model to address multiple tasks jointly upon the input of 3D protein structures. In particular, we first construct a standard structure-based multi-task benchmark called Protein-MT, consisting of 6 biologically relevant tasks, including affinity prediction and property prediction, integrated from 4 public datasets. Then, we develop a novel graph neural network for multi-task learning, dubbed Heterogeneous Multichannel Equivariant Network (HeMeNet), which is E(3) equivariant and able to capture heterogeneous relationships between different atoms. Besides, HeMeNet can achieve task-specific learning via the task-aware readout mechanism. Extensive evaluations of our benchmark verify the effectiveness of multi-task learning, and our model generally surpasses state-of-the-art models.

AAAI Conference 2025 Conference Paper

Improving Natural Language Understanding for LLMs via Large-Scale Instruction Synthesis

  • Lin Yuan
  • Jun Xu
  • Honghao Gui
  • Mengshu Sun
  • Zhiqiang Zhang
  • Lei Liang
  • Jun Zhou

High-quality, large-scale instructions are crucial for aligning large language models (LLMs); however, there is a severe shortage of instructions in the field of natural language understanding (NLU). Previous works on constructing NLU instructions mainly focus on information extraction (IE), neglecting tasks such as machine reading comprehension, question answering, and text classification. Furthermore, the lack of diversity in the data has led to a decreased generalization ability of trained LLMs in other NLU tasks and a noticeable decline in the fundamental model's general capabilities. To address this issue, we propose Hum, a large-scale, high-quality synthetic instruction corpus for NLU tasks, designed to enhance the NLU capabilities of LLMs. Specifically, Hum includes IE (both closed IE and open IE), machine reading comprehension, text classification, and instruction generalist tasks, thereby enriching task diversity. Additionally, we introduce a human-LLMs collaborative mechanism to synthesize instructions, which enriches instruction diversity by incorporating guidelines, preference rules, and format variants. We conduct extensive experiments on 5 NLU tasks and 28 general capability evaluation datasets for LLMs. Experimental results show that Hum enhances the NLU capabilities of six LLMs by an average of 3.1%, with no significant decline observed in other general capabilities.

NeurIPS Conference 2025 Conference Paper

Large Language Diffusion Models

  • Shen Nie
  • Fengqi Zhu
  • Zebin You
  • Xiaolu Zhang
  • Jingyang Ou
  • Jun Hu
  • Jun Zhou
  • Yankai Lin

The capabilities of large language models (LLMs) are widely regarded as relying on autoregressive models (ARMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA employs a forward data masking process and a reverse generation process, parameterized by a Transformer to predict masked tokens. It provides a principled generative approach for probabilistic inference by optimizing a likelihood lower bound. Across extensive benchmarks on general tasks, math, code, and so on, LLaDA demonstrates strong scalability and performs comparably to our self-constructed ARM baselines. Remarkably, LLaDA 8B is competitive with strong LLMs like LLaMA3 8B in in-context learning and, after SFT, exhibits impressive instruction-following abilities in case studies such as multi-turn dialogue. Moreover, LLaDA addresses the reversal curse, surpassing GPT-4o in a reversal poem completion task. Our findings show the promise of diffusion models for language modeling at scale and challenge the common assumption that core LLM capabilities discussed above inherently depend on ARMs. Project page and code: https://ml-gsai.github.io/LLaDA-demo/.
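The forward data masking process described above can be sketched in a few lines (a minimal illustration, not LLaDA's code; the `MASK` token id is an invented placeholder): each token is independently replaced by a mask token with probability t, and the model would be trained to recover the originals.

```python
import numpy as np

MASK = -1  # hypothetical mask-token id for illustration

def forward_mask(tokens, t, rng):
    """Forward masking process of a masked-diffusion language model (sketch).

    Each token is independently masked with probability t in [0, 1]; at t=1
    the whole sequence is masked, at t=0 it is untouched. The reverse process
    (a Transformer predicting the masked positions) is what gets trained.
    """
    tokens = np.asarray(tokens)
    mask = rng.random(tokens.shape) < t
    out = tokens.copy()
    out[mask] = MASK
    return out
```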

AAAI Conference 2025 Conference Paper

Learning Causal Transition Matrix for Instance-dependent Label Noise

  • Jiahui Li
  • Tai-Wei Chang
  • Kun Kuang
  • Ximing Li
  • Long Chen
  • Jun Zhou

Noisy labels are both inevitable and problematic in machine learning methods, as they negatively impact models' generalization ability by causing overfitting. In the context of learning with noise, the transition matrix plays a crucial role in the design of statistically consistent algorithms. However, the transition matrix is often considered unidentifiable. One strand of methods typically addresses this problem by assuming that the transition matrix is instance-independent; that is, the probability of mislabeling a particular instance is not influenced by its characteristics or attributes. This assumption is clearly invalid in complex real-world scenarios. To better understand the transition relationship and relax this assumption, we propose to study the data generation process of noisy labels from a causal perspective. We discover that an unobservable latent variable can affect either the instance itself, the label annotation procedure, or both, which complicates the identification of the transition matrix. To address various scenarios, we have unified these observations within a new causal graph. In this graph, the input instance is divided into a noise-resistant component and a noise-sensitive component based on whether they are affected by the latent variable. These two components contribute to identifying the “causal transition matrix”, which approximates the true transition matrix with theoretical guarantee. In line with this, we have designed a novel training framework that explicitly models this causal relationship and, as a result, achieves a more accurate model for inferring the clean label.
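To see where a transition matrix plugs into training, here is the standard forward-correction loss it parameterizes (a generic construction for context, not the paper's causal estimation procedure): the predicted clean-label distribution is pushed through T to get a noisy-label distribution, and the loss is taken against the observed noisy label.

```python
import numpy as np

def forward_corrected_nll(probs_clean, noisy_label, T):
    """Forward-correction negative log-likelihood with a transition matrix.

    probs_clean: model's predicted distribution over C clean labels;
    T[i, j] = P(noisy label j | clean label i), shape (C, C);
    noisy_label: the observed (possibly corrupted) label index.
    """
    probs_noisy = probs_clean @ T          # distribution over noisy labels
    return -np.log(probs_noisy[noisy_label] + 1e-12)
```

With T equal to the identity this reduces to the ordinary cross-entropy on clean labels; an instance-dependent (or causal) transition matrix replaces the single global T.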

ICLR Conference 2025 Conference Paper

LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch

  • Caigao Jiang
  • Xiang Shu
  • Hong Qian
  • Xingyu Lu
  • Jun Zhou
  • Aimin Zhou
  • Yang Yu

Optimization problems are prevalent across various scenarios. Formulating and then solving optimization problems described by natural language often requires highly specialized human expertise, which could block the widespread application of optimization-based decision making. To automate problem formulation and solving, leveraging large language models (LLMs) has emerged as a potential way. However, this kind of approach suffers from the issue of optimization generalization. Namely, the accuracy of most current LLM-based methods and the generality of optimization problem types that they can model are still limited. In this paper, we propose a unified learning-based framework called LLMOPT to boost optimization generalization. Starting from the natural language descriptions of optimization problems and a pre-trained LLM, LLMOPT constructs the introduced five-element formulation as a universal model for learning to define diverse optimization problem types. Then, LLMOPT employs multi-instruction tuning to enhance both problem formalization and solver code generation accuracy and generality. After that, to prevent hallucinations in LLMs, such as sacrificing solving accuracy to avoid execution errors, the model alignment and self-correction mechanism are adopted in LLMOPT. We evaluate the optimization generalization ability of LLMOPT and compared methods across six real-world datasets covering roughly 20 fields such as health, environment, energy, and manufacturing. Extensive experiment results show that LLMOPT is able to model various optimization problem types such as linear/nonlinear programming, mixed integer programming, and combinatorial optimization, and achieves a notable 11.08% average solving accuracy improvement compared with the state-of-the-art methods. The code is available at https://github.com/caigaojiang/LLMOPT.

AAAI Conference 2025 Conference Paper

Making Large Vision Language Models to Be Good Few-Shot Learners

  • Fan Liu
  • Wenwen Cai
  • Jian Huo
  • Chuanyi Zhang
  • Delong Chen
  • Jun Zhou

Few-shot classification (FSC) is a fundamental yet challenging task in computer vision that involves recognizing novel classes from limited data. While previous methods have focused on enhancing visual features or incorporating additional modalities, Large Vision Language Models (LVLMs) offer a promising alternative due to their rich knowledge and strong visual perception. However, LVLMs risk learning specific response formats rather than effectively extracting useful information from support data in FSC. In this paper, we investigate LVLMs' performance in FSC and identify key issues such as insufficient learning and the presence of severe position biases. To tackle the above challenges, we adopt the meta-learning strategy to teach models to "learn to learn". By constructing a rich set of meta-tasks for instruction fine-tuning, LVLMs enhance the ability to extract information from few-shot support data for classification. Additionally, we further boost LVLM's few-shot learning capabilities through label augmentation (LA) and candidate selection (CS) in the fine-tuning and inference stages, respectively. LA is implemented via a character perturbation strategy to ensure the model focuses on support information. CS leverages attribute descriptions to filter out unreliable candidates and simplify the task. Extensive experiments demonstrate that our approach achieves superior performance on both general and fine-grained datasets. Furthermore, our candidate selection strategy has been proven beneficial for training-free LVLMs.

NeurIPS Conference 2025 Conference Paper

Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging

  • Jinluan Yang
  • Dingnan Jin
  • Anke Tang
  • Li Shen
  • Didi Zhu
  • Zhengyu Chen
  • Ziyu Zhao
  • Daixin Wang

Achieving balanced alignment of large language models (LLMs) in terms of Helpfulness, Honesty, and Harmlessness (3H optimization) constitutes a cornerstone of responsible AI. Existing methods like data mixture strategies face limitations, including heavy reliance on expert knowledge and conflicting optimization signals. While model merging offers parameter-level conflict-resolution strategies through integrating specialized models' parameters, its potential for 3H optimization remains underexplored. This paper systematically compares the effectiveness of model merging and data mixture methods in constructing 3H-aligned LLMs for the first time, revealing previously overlooked collaborative and conflict relationships among the 3H dimensions and discussing the advantages and drawbacks of data mixture (data-level) and model merging (parameter-level) methods in mitigating the conflict for balanced 3H optimization. Specifically, we propose a novel Reweighting Enhanced task Singular Merging method, RESM, through outlier weighting and sparsity-aware rank selection strategies to address the challenges of preference noise accumulation and layer sparsity adaptation inherent in 3H-aligned LLM merging. Extensive evaluations verify the effectiveness and robustness of RESM compared to previous data mixture (2%-5% gain) and model merging (1%-3% gain) methods in achieving balanced LLM alignment.
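The parameter-level merging that this abstract builds on can be illustrated with plain task arithmetic (a skeleton only; RESM's outlier reweighting and sparsity-aware rank selection are richer than this, and the dict-of-arrays representation is an assumption):

```python
import numpy as np

def merge_task_vectors(base, finetuned_models, weights=None):
    """Merge fine-tuned models via weighted task arithmetic (sketch).

    Each model is a dict mapping parameter name -> array. A model's task
    vector is (finetuned - base); merging adds a weighted sum of task
    vectors back onto the base parameters.
    """
    k = len(finetuned_models)
    if weights is None:
        weights = [1.0 / k] * k               # uniform merge by default
    merged = {}
    for name, p in base.items():
        tv = sum(w * (m[name] - p) for w, m in zip(weights, finetuned_models))
        merged[name] = p + tv
    return merged
```

Data mixture operates before training by blending datasets; this merge operates after training on the weights, which is the data-level vs. parameter-level contrast the abstract draws.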

NeurIPS Conference 2025 Conference Paper

SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning

  • Jiaqi Huang
  • Zunnan Xu
  • Jun Zhou
  • Ting Liu
  • Yicheng Xiao
  • Mingwen Ou
  • Bowen Ji
  • Xiu Li

Leveraging multimodal large models for image segmentation has become a prominent research direction. However, existing approaches typically rely heavily on manually annotated datasets that include explicit reasoning processes, which are costly and time-consuming to produce. Recent advances suggest that reinforcement learning (RL) can endow large models with reasoning capabilities without requiring such reasoning-annotated data. In this paper, we propose SAM-R1, a novel framework that enables multimodal large models to perform fine-grained reasoning in image understanding tasks. Our approach is the first to incorporate fine-grained segmentation settings during the training of multimodal reasoning models. By integrating task-specific, fine-grained rewards with a tailored optimization objective, we further enhance the model's reasoning and segmentation alignment. We also leverage the Segment Anything Model (SAM) as a strong and flexible reward provider to guide the learning process. With only 3k training samples, SAM-R1 achieves strong performance across multiple benchmarks, demonstrating the effectiveness of reinforcement learning in equipping multimodal models with segmentation-oriented reasoning capabilities.

IROS Conference 2025 Conference Paper

TerraX: Visual Terrain Classification Enhanced by Vision-Language Models

  • Hongze Li
  • Xuchuan Huang
  • Xinhai Chang
  • Jun Zhou
  • Huijing Zhao

Visual Terrain Classification (VTC) plays a vital role in enabling unmanned ground vehicles to understand complex environments. Existing research relies on image-label pairs annotated by static label sets, where semantic ambiguity and high annotation costs constrain fine-grained terrain characterization. These limitations hinder the model’s adaptation to real-world terrain diversity and restrict its applicability. To address these issues, we propose TerraX, a vision-language learning framework that integrates multi-modal image-label-text data, unifying structured annotations with fine-grained natural language descriptions. The framework introduces a composite dataset TerraData, an evaluation benchmark suite TerraBench, and a CLIP-based visual terrain classification model TerraCLIP. TerraData aggregates multi-source terrain images from public and self-collected datasets, annotated through a VLM-based vision-language data annotation pipeline. TerraBench defines three evaluation benchmarks to systematically assess model robustness and adaptability in real-world terrain classification scenarios. Built on the CLIP model, TerraCLIP utilizes multi-granularity contrastive loss and LoRA fine-tuning to enhance understanding for terrain categories and attributes, and incorporates confidence-weighted inference for accurate predictions. Extensive experiments across benchmarks and real-world platforms demonstrate that our approach significantly enhances VTC performance, highlighting its potential for deployment in complex environments.

NeurIPS Conference 2025 Conference Paper

VADB: A Large-Scale Video Aesthetic Database with Professional and Multi-Dimensional Annotations

  • Qianqian Qiao
  • DanDan Zheng
  • Yihang Bo
  • Bao Peng
  • Heng Huang
  • Longteng Jiang
  • Jingdong Chen
  • Jun Zhou

Video aesthetic assessment, a vital area in multimedia computing, integrates computer vision with human cognition. Its progress is limited by the lack of standardized datasets and robust models, as the temporal dynamics of video and multimodal fusion challenges hinder direct application of image-based methods. This study introduces VADB, the largest video aesthetic database with 10,490 diverse videos annotated by 37 professionals across multiple aesthetic dimensions, including overall and attribute-specific aesthetic scores, rich language comments and objective tags. We propose VADB-Net, a dual-modal pre-training framework with a two-stage training strategy, which outperforms existing video quality assessment models in scoring tasks and supports downstream video aesthetic assessment tasks. The dataset and source code are available at https://github.com/BestiVictory/VADB.

AAAI Conference 2024 Conference Paper

Backdoor Adjustment via Group Adaptation for Debiased Coupon Recommendations

  • Junpeng Fang
  • Gongduo Zhang
  • Qing Cui
  • Caizhi Tang
  • Lihong Gu
  • Longfei Li
  • Jinjie Gu
  • Jun Zhou

Accurate prediction of coupon usage is crucial for promoting user consumption through targeted coupon recommendations. However, in real-world coupon recommendations, the coupon allocation process is not solely determined by the model trained with historical interaction data but is also interfered with by marketing tactics designed to fulfill specific commercial goals. This interference creates an imbalance in the interactions, which causes the data to deviate from the user's natural preferences. We refer to this deviation as the matching bias. Such biased interaction data affects the efficacy of the model, and thus it is necessary to employ debiasing techniques to prevent any negative impact. We investigate the mitigation of matching bias in coupon recommendations from a causal-effect perspective. By treating the attributes of users and coupons associated with marketing tactics as confounders, we find the confounders open the backdoor path between user-coupon matching and the conversion, which introduces spurious correlation. To remove this harmful effect, we propose a novel training paradigm named Backdoor Adjustment via Group Adaptation (BAGA) for debiased coupon recommendations, which performs intervened training and inference, i.e., separately modeling each user-coupon group pair. However, modeling all possible group pairs greatly increases the computational complexity and cost. To address the efficiency challenge, we further present a simple but effective dual-tower multi-task framework and leverage the Customized Gate Control (CGC) model architecture, which separately models each user and coupon group with a separate expert module. We instantiate BAGA on five representative models: FM, DNN, NCF, MASKNET, and DEEPFM, and conduct comprehensive offline and online experiments to demonstrate the efficacy of our proposed paradigm.
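The backdoor adjustment at the core of BAGA can be illustrated with a toy calculation (the group variable and all numbers below are hypothetical, not from the paper): conditioning on the confounding group g and re-weighting by its marginal distribution gives P(convert | do(match)) = Σ_g P(convert | match, g) P(g).

```python
# Toy backdoor adjustment over a confounding group variable g.
# All probabilities here are hypothetical, for illustration only.
p_g = {"g0": 0.7, "g1": 0.3}                      # P(g): marginal group frequencies
p_conv = {("g0", 1): 0.10, ("g0", 0): 0.05,       # P(convert | match, g)
          ("g1", 1): 0.40, ("g1", 0): 0.20}

def backdoor_adjusted(match):
    """P(convert | do(match)) = sum_g P(convert | match, g) * P(g)."""
    return sum(p_conv[(g, match)] * p_g[g] for g in p_g)

# De-confounded effect of matching the user with the coupon:
effect = backdoor_adjusted(1) - backdoor_adjusted(0)
```

Modeling each (user group, coupon group) pair separately, as BAGA does, is what makes the conditional terms P(convert | match, g) estimable in the first place.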

NeurIPS Conference 2024 Conference Paper

Collaborative Refining for Learning from Inaccurate Labels

  • Bin Han
  • Yi-Xuan Sun
  • Ya-Lin Zhang
  • Libang Zhang
  • Haoran Hu
  • Longfei Li
  • Jun Zhou
  • Guo Ye

This paper considers the problem of learning from multiple sets of inaccurate labels, which can be easily obtained from low-cost annotators, such as rule-based annotators. Previous works typically concentrate on aggregating information from all the annotators, overlooking the significance of data refinement. This paper presents a collaborative refining approach for learning from inaccurate labels. To refine the data, we introduce the annotator agreement as an instrument, which refers to whether multiple annotators agree or disagree on the labels for a given sample. For samples where some annotators disagree, a comparative strategy is proposed to filter noise. Through theoretical analysis, the connections among multiple sets of labels, the respective models trained on them, and the true labels are uncovered to identify relatively reliable labels. For samples where all annotators agree, an aggregating strategy is designed to mitigate potential noise. Guided by theoretical bounds on loss values, a sample selection criterion is introduced and modified to be more robust against potentially problematic values. Through these two methods, all the samples are refined during training, and these refined samples are used to train a lightweight model simultaneously. Extensive experiments are conducted on benchmark and real-world datasets to demonstrate the superiority of our methods.

AAAI Conference 2024 Conference Paper

GMP-AR: Granularity Message Passing and Adaptive Reconciliation for Temporal Hierarchy Forecasting

  • Fan Zhou
  • Chen Pan
  • Lintao Ma
  • Yu Liu
  • Siqiao Xue
  • James Zhang
  • Jun Zhou
  • Hongyuan Mei

Time series forecasts of different temporal granularity are widely used in real-world applications, e.g., sales prediction in days and weeks for making different inventory plans. However, these tasks are usually solved separately without ensuring coherence, which is crucial for aligning downstream decisions. Previous works mainly focus on ensuring coherence with some straightforward methods, e.g., aggregation from forecasts of fine granularity to the coarse ones, and allocation from the coarse granularity to the fine ones. These methods merely exploit the temporal hierarchical structure to maintain coherence without improving the forecasting accuracy. In this paper, we propose a novel granularity message-passing mechanism (GMP) that leverages temporal hierarchy information to improve forecasting performance and also utilizes an adaptive reconciliation (AR) strategy to maintain coherence without performance loss. Furthermore, we introduce an optimization module to achieve task-based targets while adhering to more real-world constraints. Experiments on real-world datasets demonstrate that our framework (GMP-AR) achieves superior performance on temporal hierarchical forecasting tasks compared to state-of-the-art methods. In addition, our framework has been successfully applied to a real-world task of payment traffic management in Alipay by integrating with the task-based optimization module.
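The two straightforward coherence methods the abstract contrasts can be sketched with hypothetical numbers (generic bottom-up aggregation and top-down allocation, not GMP-AR itself):

```python
import numpy as np

# Hypothetical daily sales forecast (fine granularity).
daily = np.array([10., 12., 9., 11., 13., 8., 7.])

# Bottom-up aggregation: define the coarse forecast as the sum of the fine
# ones, so the two granularities are coherent by construction.
weekly_coherent = daily.sum()

# Top-down allocation: distribute an independently produced weekly forecast
# back to days in proportion to the daily pattern, restoring coherence.
weekly_independent = 77.0
daily_allocated = weekly_independent * daily / daily.sum()
```

Both tricks enforce coherence but leave the underlying forecasts unchanged, which is exactly the accuracy limitation the paper's message-passing mechanism targets.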

ICML Conference 2024 Conference Paper

Improving Equivariant Graph Neural Networks on Large Geometric Graphs via Virtual Nodes Learning

  • Yuelin Zhang
  • Jiacheng Cen
  • Jiaqi Han
  • Zhiqiang Zhang
  • Jun Zhou
  • Wenbing Huang 0001

Equivariant Graph Neural Networks (GNNs) have achieved remarkable success in a variety of scientific applications. However, existing equivariant GNNs face efficiency issues on large geometric graphs and perform poorly if the input is reduced to a sparse local graph to accelerate computation. In this paper, we propose FastEGNN, an enhanced model of equivariant GNNs on large geometric graphs. The central idea is leveraging a small ordered set of virtual nodes to approximate the large unordered graph of real nodes. In particular, we distinguish the message passing and aggregation for different virtual nodes to encourage mutual distinctiveness, and minimize the Maximum Mean Discrepancy (MMD) between virtual and real coordinates to realize global distributedness. FastEGNN meets all necessary E(3) symmetries and provides certain universal expressivity guarantees. Our experiments on N-body systems (100 nodes), proteins (800 nodes) and water-3D (8000 nodes) demonstrate that FastEGNN achieves a promising balance between accuracy and efficiency, and outperforms EGNN in accuracy even after dropping all edges in real systems like proteins and water-3D.
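A minimal sketch of the MMD term mentioned above, using an RBF kernel between virtual and real node coordinates (a generic biased V-statistic estimator; the function name, kernel choice, and bandwidth are assumptions, not FastEGNN's actual code):

```python
import numpy as np

def rbf_mmd2(x, y, sigma=1.0):
    """Squared MMD between two point sets under an RBF kernel (V-statistic)."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
real = rng.normal(size=(100, 3))     # coordinates of real nodes
virtual = rng.normal(size=(4, 3))    # small set of virtual nodes
loss = rbf_mmd2(virtual, real)       # driven toward 0 during training
```

Minimizing this term pulls the empirical distribution of the few virtual coordinates toward that of the many real ones, which is the "global distributedness" role the abstract assigns to MMD.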

AAAI Conference 2024 Conference Paper

LLMRG: Improving Recommendations through Large Language Model Reasoning Graphs

  • Yan Wang
  • Zhixuan Chu
  • Xin Ouyang
  • Simeng Wang
  • Hongyan Hao
  • Yue Shen
  • Jinjie Gu
  • Siqiao Xue

Recommendation systems aim to provide users with relevant suggestions, but often lack interpretability and fail to capture higher-level semantic relationships between user behaviors and profiles. In this paper, we propose a novel approach that leverages large language models (LLMs) to construct personalized reasoning graphs. These graphs link a user's profile and behavioral sequences through causal and logical inferences, representing the user's interests in an interpretable way. Our approach, LLM reasoning graphs (LLMRG), has four components: chained graph reasoning, divergent extension, self-verification and scoring, and knowledge base self-improvement. The resulting reasoning graph is encoded using graph neural networks, which serves as additional input to improve conventional recommender systems, without requiring extra user or item information. Our approach demonstrates how LLMs can enable more logical and interpretable recommender systems through personalized reasoning graphs. LLMRG allows recommendations to benefit from both engineered recommendation systems and LLM-derived reasoning graphs. We demonstrate the effectiveness of LLMRG on benchmarks and real-world scenarios in enhancing base recommendation models.

NeurIPS Conference 2024 Conference Paper

Lower Bounds of Uniform Stability in Gradient-Based Bilevel Algorithms for Hyperparameter Optimization

  • Rongzhen Wang
  • Chenyu Zheng
  • Guoqiang Wu
  • Xu Min
  • Xiaolu Zhang
  • Jun Zhou
  • Chongxuan Li

Gradient-based bilevel programming leverages unrolling differentiation (UD) or implicit function theorem (IFT) to solve hyperparameter optimization (HO) problems, and is proven effective and scalable in practice. To understand their generalization behavior, existing works establish upper bounds on the uniform stability of these algorithms, while their tightness is still unclear. To this end, this paper attempts to establish stability lower bounds for UD-based and IFT-based algorithms. A central technical challenge arises from the dependency of each outer-level update on the concurrent stage of inner optimization in bilevel programming. To address this problem, we introduce lower-bounded expansion properties to characterize the instability in update rules which can serve as general tools for lower-bound analysis. These properties guarantee the hyperparameter divergence at the outer level and the Lipschitz constant of inner output at the inner level in the context of HO. Guided by these insights, we construct a quadratic example that yields tight lower bounds for the UD-based algorithm and meaningful bounds for a representative IFT-based algorithm. Our tight result indicates that uniform stability has reached its limit in stability analysis for the UD-based algorithm.

AAAI Conference 2024 Conference Paper

MDGNN: Multi-Relational Dynamic Graph Neural Network for Comprehensive and Dynamic Stock Investment Prediction

  • Hao Qian
  • Hongting Zhou
  • Qian Zhao
  • Hao Chen
  • Hongxiang Yao
  • Jingwei Wang
  • Ziqi Liu
  • Fei Yu

The stock market is a crucial component of the financial system, but predicting the movement of stock prices is challenging due to the dynamic and intricate relations arising from various aspects such as economic indicators, financial reports, global news, and investor sentiment. Traditional sequential methods and graph-based models have been applied in stock movement prediction, but they have limitations in capturing the multifaceted and temporal influences in stock price movements. To address these challenges, the Multi-relational Dynamic Graph Neural Network (MDGNN) framework is proposed, which utilizes a discrete dynamic graph to comprehensively capture multifaceted relations among stocks and their evolution over time. The representation generated from the graph offers a complete perspective on the interrelationships among stocks and associated entities. Additionally, the power of the Transformer structure is leveraged to encode the temporal evolution of multiplex relations, providing a dynamic and effective approach to predicting stock investment. Further, our proposed MDGNN framework achieves the best performance in public datasets compared with the state-of-the-art stock investment methods.

NeurIPS Conference 2024 Conference Paper

Rethinking Memory and Communication Costs for Efficient Data Parallel Training of Large Language Models

  • Hanxiao Zhang
  • Lin Ju
  • Chan Wu
  • Jinjing Huang
  • Youshao Xiao
  • Zhenglei Zhou
  • Zhiming Fan
  • Zhaoxin Huan

Recently, various strategies for distributed training of large language models (LLMs) have been proposed. By categorizing them into basic strategies and composite strategies, we have discovered that existing basic strategies provide limited options in specific scenarios, leaving considerable room for optimization in training speed. In this paper, we rethink the impact of memory and communication costs on the training speed of LLMs, taking into account the impact of intra- and inter-group communication performance disparities, and then propose a new set of basic strategies named the Partial Redundancy Optimizer (PaRO). PaRO Data Parallelism (PaRO-DP) accelerates LLM training through refined model state partitioning and tailored training procedures. At the same time, PaRO Collective Communications (PaRO-CC) speeds up collective communication operations by rearranging the topology. We also propose a guideline for choosing different DP strategies based on simple quantitative calculations, which yields minimal ranking errors. Our experiments demonstrate that PaRO improves the training speed of LLMs by up to 266% over ZeRO-3 when used as the basic DP strategy. Moreover, employing PaRO-CC independently for model parallel strategies, such as Megatron, can also boost the training speed by 17%.

AAAI Conference 2024 Conference Paper

Structural Information Enhanced Graph Representation for Link Prediction

  • Lei Shi
  • Bin Hu
  • Deng Zhao
  • Jianshan He
  • Zhiqiang Zhang
  • Jun Zhou

Link prediction is a fundamental task of graph machine learning, and Graph Neural Network (GNN) based methods have become the mainstream approach due to their good performance. However, the typical practice learns node representations through neighborhood aggregation, lacking awareness of the structural relationships between target nodes. Recently, some methods have attempted to address this issue by node labeling tricks. However, they still rely on the node-centric neighborhood message passing of GNNs, which we believe involves two limitations in terms of information perception and transmission for link prediction. First, it cannot perceive long-range structural information due to the restricted receptive fields. Second, a node-centric model may lose information on a link-centric task. In addition, we empirically find that neighbor node features can introduce noise for link prediction. To address these issues, we propose a structural information enhanced link prediction framework, which removes the neighbor node features so that the GNN can focus on fitting neighborhood graph structures. Furthermore, we introduce a Binary Structural Transformer (BST) to encode the structural relationships between target nodes, complementing the deficiency of the GNN. Our approach achieves remarkable results on multiple popular benchmarks, including ranking first on ogbl-ppa, ogbl-citation2 and Pubmed.

AAAI Conference 2024 Conference Paper

π-Light: Programmatic Interpretable Reinforcement Learning for Resource-Limited Traffic Signal Control

  • Yin Gu
  • Kai Zhang
  • Qi Liu
  • Weibo Gao
  • Longfei Li
  • Jun Zhou

The recent advancements in Deep Reinforcement Learning (DRL) have significantly enhanced the performance of adaptive Traffic Signal Control (TSC). However, DRL policies are typically represented by neural networks, which are over-parameterized black-box models. As a result, the learned policies often lack interpretability and cannot be deployed directly on real-world edge hardware due to resource constraints. In addition, DRL methods often exhibit limited generalization performance, struggling to generalize the learned policy to other geographical regions. These factors limit the practical application of learning-based approaches. To address these issues, we suggest the use of an inherently interpretable program for representing the control policy. We present a new approach, Programmatic Interpretable reinforcement learning for traffic signal control (π-Light), designed to autonomously discover non-differentiable programs. Specifically, we define a Domain Specific Language (DSL) and transformation rules for constructing programs, and utilize Monte Carlo Tree Search (MCTS) to find the optimal program in a discrete space. Extensive experiments demonstrate that our method consistently outperforms baseline approaches. Moreover, π-Light exhibits superior generalization capabilities compared to DRL, enabling training and evaluation across intersections from different cities. Finally, we analyze how the learned program policies can be deployed directly on edge devices with extremely limited resources.

NeurIPS Conference 2023 Conference Paper

FAST: a Fused and Accurate Shrinkage Tree for Heterogeneous Treatment Effects Estimation

  • Jia Gu
  • Caizhi Tang
  • Han Yan
  • Qing Cui
  • Longfei Li
  • Jun Zhou

This paper proposes a novel strategy for estimating the heterogeneous treatment effect called the Fused and Accurate Shrinkage Tree (FAST). Our approach utilizes both trial and observational data to improve the accuracy and robustness of the estimator. Inspired by the concept of shrinkage estimation in statistics, we develop an optimal weighting scheme and a corresponding estimator that balances the unbiased estimator based on the trial data with the potentially biased estimator based on the observational data. Specifically, combined with tree-based techniques, we introduce a new split criterion that utilizes both trial data and observational data to more accurately estimate the treatment effect. Furthermore, we confirm the consistency of our proposed tree-based estimator and demonstrate the effectiveness of our criterion in reducing prediction error through theoretical analysis. The advantageous finite-sample performance of FAST and its ensemble version over existing methods is demonstrated via simulations and real data analysis.
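The shrinkage idea can be illustrated with a textbook MSE-optimal combination of an unbiased trial estimate and a biased observational estimate (the notation and weight formula below are a stylized stand-in, not the paper's exact scheme):

```python
def shrinkage_combine(theta_trial, var_trial, theta_obs, var_obs, bias_obs):
    """Combine an unbiased but noisy trial estimate with a precise but
    possibly biased observational estimate. Minimizing
    E[(w*theta_obs + (1-w)*theta_trial - theta)^2] over w (assuming
    independence) gives the textbook weight below."""
    w = var_trial / (var_trial + var_obs + bias_obs ** 2)
    return w * theta_obs + (1 - w) * theta_trial

# Noisy trial estimate vs. precise but biased observational estimate:
est = shrinkage_combine(theta_trial=1.0, var_trial=4.0,
                        theta_obs=1.5, var_obs=0.1, bias_obs=0.5)
```

The weight shrinks toward the trial estimate as the observational bias grows, which is the balancing behavior the abstract attributes to the optimal weighting scheme.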

IJCAI Conference 2023 Conference Paper

Few-shot Classification via Ensemble Learning with Multi-Order Statistics

  • Sai Yang
  • Fan Liu
  • Delong Chen
  • Jun Zhou

Transfer learning has been widely adopted for few-shot classification. Recent studies reveal that obtaining a good generalization representation of images on novel classes is the key to improving few-shot classification accuracy. To address this need, we prove theoretically that leveraging ensemble learning on the base classes can correspondingly reduce the true error on the novel classes. Following this principle, a novel method named Ensemble Learning with Multi-Order Statistics (ELMOS) is proposed in this paper. In this method, after the backbone network, we use multiple branches to create the individual learners in the ensemble, with the goal of reducing storage cost. We then introduce different order statistics pooling in each branch to increase the diversity of the individual learners. The learners are optimized with supervised losses during the pre-training phase. After pre-training, features from different branches are concatenated for classifier evaluation. Extensive experiments demonstrate that each branch can complement the others and our method can produce state-of-the-art performance on multiple few-shot classification benchmark datasets.
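As a rough illustration of what different order statistics pooling per branch could look like, here are first-order (mean) and second-order (covariance) pooling over a feature map (a generic sketch; shapes and function names are assumptions, not ELMOS's implementation):

```python
import numpy as np

def first_order_pool(feat):
    """Global average pooling over spatial positions: (HW, C) -> (C,)."""
    return feat.mean(axis=0)

def second_order_pool(feat):
    """Covariance pooling: (HW, C) -> flattened (C*C,) second-order statistic."""
    centered = feat - feat.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / feat.shape[0]
    return cov.reshape(-1)

feat = np.random.default_rng(0).normal(size=(49, 8))  # e.g. 7x7 map, 8 channels
# After pre-training, branch features are concatenated for evaluation:
embedding = np.concatenate([first_order_pool(feat), second_order_pool(feat)])
```

Because each statistic summarizes the same feature map differently, branches built on different orders naturally yield diverse individual learners.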

IJCAI Conference 2023 Conference Paper

Keep Skills in Mind: Understanding and Implementing Skills in Commonsense Question Answering

  • Meikai Bao
  • Qi Liu
  • Kai Zhang
  • Ye Liu
  • Linan Yue
  • Longfei Li
  • Jun Zhou

Commonsense Question Answering (CQA) aims to answer questions that require human commonsense. Closed-book CQA, as one of the subtasks, requires the model to answer questions without retrieving external knowledge, which emphasizes the importance of the model's problem-solving ability. Most previous methods relied on large-scale pre-trained models to generate question-related knowledge while ignoring the crucial role of skills in the process of answering commonsense questions. Generally, skills refer to the learned ability to perform a specific task or activity, derived from knowledge and experience. In this paper, we introduce a new approach named Dynamic Skill-aware Commonsense Question Answering (DSCQA), which transcends the limitations of traditional methods by informing the model about the need for each skill in questions and utilizes skills as a critical driver in the CQA process. To be specific, DSCQA first employs a commonsense skill extraction module to generate various skill representations. Then, DSCQA utilizes a dynamic skill module to generate dynamic skill representations. Finally, in the perception and emphasis module, the various skill and dynamic skill representations are used to support the question-answering process. Experimental results on two publicly available CQA datasets show the effectiveness of our proposed model and the considerable impact of introducing skills.

NeurIPS Conference 2023 Conference Paper

Language Models Can Improve Event Prediction by Few-Shot Abductive Reasoning

  • Xiaoming Shi
  • Siqiao Xue
  • Kangrui Wang
  • Fan Zhou
  • James Zhang
  • Jun Zhou
  • Chenhao Tan
  • Hongyuan Mei

Large language models have shown astonishing performance on a wide range of reasoning tasks. In this paper, we investigate whether they could reason about real-world events and help improve the prediction performance of event sequence models. We design LAMP, a framework that integrates a large language model in event prediction. Particularly, the language model performs abductive reasoning to assist an event sequence model: the event model proposes predictions on future events given the past; instructed by a few expert-annotated demonstrations, the language model learns to suggest possible causes for each proposal; a search module finds out the previous events that match the causes; a scoring function learns to examine whether the retrieved events could actually cause the proposal. Through extensive experiments on several challenging real-world datasets, we demonstrate that our framework---thanks to the reasoning capabilities of large language models---could significantly outperform the state-of-the-art event sequence models.

NeurIPS Conference 2023 Conference Paper

Prompt-augmented Temporal Point Process for Streaming Event Sequence

  • Siqiao Xue
  • Yan Wang
  • Zhixuan Chu
  • Xiaoming Shi
  • Caigao JIANG
  • Hongyan Hao
  • Gangwei Jiang
  • Xiaoyun Feng

Neural Temporal Point Processes (TPPs) are the prevalent paradigm for modeling continuous-time event sequences, such as user activities on the web and financial transactions. In real world applications, the event data typically comes in a streaming manner, where the distribution of the patterns may shift over time. Under the privacy and memory constraints commonly seen in real scenarios, how to continuously monitor a TPP to learn the streaming event sequence is an important yet under-investigated problem. In this work, we approach this problem by adopting Continual Learning (CL), which aims to enable a model to continuously learn a sequence of tasks without catastrophic forgetting. While CL for event sequence is less well studied, we present a simple yet effective framework, PromptTPP, by integrating the base TPP with a continuous-time retrieval prompt pool. In our proposed framework, prompts are small learnable parameters, maintained in a memory space and jointly optimized with the base TPP so that the model is properly instructed to learn event streams arriving sequentially without buffering past examples or task-specific attributes. We formalize a novel and realistic experimental setup for modeling event streams, where PromptTPP consistently achieves state-of-the-art performance across two real user behavior datasets.

ECAI Conference 2023 Conference Paper

Self-Expressive Network-Based Subspace Clustering for Deep Embedding

  • Tingting Leng
  • Ling Zhao
  • Xiaolong Xiong
  • Peng Cheng 0011
  • Jun Zhou

Existing deep subspace clustering algorithms are difficult to scale to large-scale data, for two reasons. First, almost all existing subspace clustering algorithms need to compute, in one pass, a self-expressive coefficient matrix whose size is proportional to the square of the dataset size. Second, spectral clustering needs to solve for the eigenvectors of the affinity matrix. These two points make the computational complexity of clustering very high when the data scale is large. This paper proposes Self-Expressive Network-Based Deep Embedded Subspace Clustering (SE-DESC), a subspace clustering method that can be applied to large-scale single-view and multi-view data. Using the idea of siamese networks, we design a self-expressive network to calculate the self-expressive coefficient between two data points, reducing the parameter count of the self-expressive model to a constant and effectively avoiding the high computational cost. Then, we use a deep embedding network to learn an embedding for each data point to map the data into the spectral space, avoiding the high computational complexity of spectral clustering. Extensive experiments demonstrate that SE-DESC improves clustering performance on large-scale data compared to state-of-the-art methods.

JMLR Journal 2023 Journal Article

SQLFlow: An Extensible Toolkit Integrating DB and AI

  • Jun Zhou
  • Ke Zhang
  • Lin Wang
  • Hua Wu
  • Yi Wang
  • Chaochao Chen

Integrating AI algorithms into databases is an ongoing effort in both academia and industry. We introduce SQLFlow, a toolkit seamlessly combining data manipulations and AI operations that can be run locally or remotely. SQLFlow extends SQL syntax to support typical AI tasks including model training, inference, interpretation, and mathematical optimization. It is compatible with a variety of database management systems (DBMS) and AI engines, including MySQL, TiDB, MaxCompute, and Hive, as well as TensorFlow, scikit-learn, and XGBoost. Documentation and case studies are available at https://sqlflow.org. The source code and additional details can be found at https://github.com/sql-machine-learning/sqlflow.

NeurIPS Conference 2023 Conference Paper

Unleashing the Power of Graph Data Augmentation on Covariate Distribution Shift

  • Yongduo Sui
  • Qitian Wu
  • Jiancan Wu
  • Qing Cui
  • Longfei Li
  • Jun Zhou
  • Xiang Wang
  • Xiangnan He

The issue of distribution shifts is emerging as a critical concern in graph representation learning. From the perspective of invariant learning and stable learning, a recently well-established paradigm for out-of-distribution generalization, stable features of the graph are assumed to causally determine labels, while environmental features tend to be unstable and can lead to the two primary types of distribution shifts. The correlation shift is often caused by the spurious correlation between environmental features and labels that differs between the training and test data; the covariate shift often stems from the presence of new environmental features in test data. However, most strategies, such as invariant learning or graph augmentation, typically struggle with limited training environments or perturbed stable features, thus exposing limitations in handling the problem of covariate shift. To address this challenge, we propose a simple-yet-effective data augmentation strategy, Adversarial Invariant Augmentation (AIA), to handle the covariate shift on graphs. Specifically, given the training data, AIA aims to extrapolate and generate new environments, while concurrently preserving the original stable features during the augmentation process. Such a design equips the graph classification model with an enhanced capability to identify stable features in new environments, thereby effectively tackling the covariate shift in data. Extensive experiments with in-depth empirical analysis demonstrate the superiority of our approach. The implementation codes are publicly available at https://github.com/yongduosui/AIA.

NeurIPS Conference 2022 Conference Paper

Debiased Causal Tree: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding

  • Caizhi Tang
  • Huiyuan Wang
  • Xinyu Li
  • Qing Cui
  • Ya-Lin Zhang
  • Feng Zhu
  • Longfei Li
  • Jun Zhou

Unmeasured confounding poses a significant threat to the validity of causal inference. Despite that various ad hoc methods are developed to remove confounding effects, they are subject to certain fairly strong assumptions. In this work, we consider the estimation of conditional causal effects in the presence of unmeasured confounding using observational data and historical controls. Under an interpretable transportability condition, we prove the partial identifiability of conditional average treatment effect on the treated group (CATT). For tree-based models, a new notion, \emph{confounding entropy}, is proposed to measure the discrepancy introduced by unobserved confounders between the conditional outcome distribution of the treated and control groups. The confounding entropy generalizes conventional confounding bias, and can be estimated effectively using historical controls. We develop a new method, debiased causal tree, whose splitting rule is to minimize the empirical risk regularized by the confounding entropy. Notably, our method integrates current observational data (for empirical risk) and their historical controls (for confounding entropy) harmoniously. We highlight that, debiased causal tree can not only estimate CATT well in the presence of unmeasured confounding, but also is a robust estimator of conditional average treatment effect (CATE) against the imbalance of the treated and control populations when all confounders are observed. An extension of combining multiple debiased causal trees to further reduce biases by gradient boosting is considered. The computational feasibility and statistical power of our method are evidenced by simulations and a study of a credit card balance dataset.

IJCAI Conference 2022 Conference Paper

Learning Mixture of Neural Temporal Point Processes for Multi-dimensional Event Sequence Clustering

  • Yunhao Zhang
  • Junchi Yan
  • Xiaolu Zhang
  • Jun Zhou
  • Xiaokang Yang

Multi-dimensional event sequence clustering applies to many scenarios, e.g., e-commerce and electronic health. Traditional clustering models fail to characterize complex real-world processes due to their strong parametric assumptions, while Neural Temporal Point Processes (NTPPs) mainly focus on modeling similar sequences rather than clustering them. To fill the gap, we propose Mixture of Neural Temporal Point Processes (NTPP-MIX), a general framework that can utilize many existing NTPPs for multi-dimensional event sequence clustering. In NTPP-MIX, the prior distribution of coefficients for cluster assignment is modeled by a Dirichlet distribution. When the assignment is given, the conditional probability of a sequence is modeled by the mixture of a series of NTPPs. We combine the variational EM algorithm and Stochastic Gradient Descent (SGD) to efficiently train the framework. In the E-step, we fix the parameters of the NTPPs and approximate the true posterior with variational distributions. In the M-step, we fix the variational distributions and use SGD to update the parameters of the NTPPs. Extensive experimental results on four synthetic datasets and three real-world datasets show the effectiveness of NTPP-MIX against state-of-the-art methods.
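The E-step described above amounts to a standard mixture-model responsibility computation; a minimal sketch, assuming per-cluster sequence log-likelihoods are already available (this is the generic calculation, not NTPP-MIX's code):

```python
import numpy as np

def e_step_responsibilities(log_lik, log_pi):
    """Posterior cluster responsibilities r[n, k] ∝ pi_k * p_k(sequence_n).
    log_lik: (N, K) per-cluster sequence log-likelihoods;
    log_pi: (K,) log mixing weights (e.g. the Dirichlet posterior mean)."""
    logits = log_lik + log_pi                       # (N, K) joint log-scores
    logits -= logits.max(axis=1, keepdims=True)     # stabilize the exponent
    r = np.exp(logits)
    return r / r.sum(axis=1, keepdims=True)         # normalize per sequence

log_lik = np.array([[-10.0, -12.0],                 # 2 sequences, 2 clusters
                    [-30.0, -25.0]])
r = e_step_responsibilities(log_lik, np.log([0.5, 0.5]))
```

In the M-step these responsibilities would weight each sequence's contribution to the SGD updates of the corresponding NTPP's parameters.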

IJCAI Conference 2022 Conference Paper

MERIT: Learning Multi-level Representations on Temporal Graphs

  • Binbin Hu
  • Zhengwei Wu
  • Jun Zhou
  • Ziqi Liu
  • Zhigang Huangfu
  • Zhiqiang Zhang
  • Chaochao Chen

Recently, representation learning on temporal graphs has drawn increasing attention; it aims to learn temporal patterns that characterize the evolving nature of dynamic graphs in real-world applications. Despite their effectiveness, these methods commonly ignore the individual- and combinatorial-level patterns derived from different types of interactions (e.g., user-item), which are at the heart of representation learning on temporal graphs. To fill this gap, we propose MERIT, a novel multi-level graph attention network for inductive representation learning on temporal graphs. We adaptively embed the original timestamps into a higher-dimensional continuous space for learning individual-level periodicity through a Personalized Time Encoding (PTE) module. Furthermore, we equip MERIT with a Continuous-time and Context-aware Attention (Coco-Attention) mechanism, which chronologically locates the most relevant neighbors by jointly capturing multi-level context on temporal graphs. Finally, MERIT performs multiple aggregations and propagations to explore and exploit high-order structural information for downstream tasks. Extensive experiments on four public datasets demonstrate the effectiveness of MERIT on both (inductive/transductive) link prediction and node classification tasks.

JBHI Journal 2022 Journal Article

Multi-Dimensional Feature Combination Method for Continuous Blood Pressure Measurement Based on Wrist PPG Sensor

  • Pan Yao
  • Ning Xue
  • Siyuan Yin
  • Changhua You
  • Yusen Guo
  • Yi Shi
  • Tiezhu Liu
  • Lei Yao

The cuff-less blood pressure (BP) monitoring method based on photoplethysmogram (PPG) makes long-term BP monitoring possible for preventing and treating cardiovascular and cerebrovascular events. In this paper, a portable BP prediction system based on feature combination and an artificial neural network (ANN) is implemented. The robustness of the model is improved in three ways. First, an adaptive peak extraction algorithm is used to improve the accuracy of peak and trough detection. Second, multi-dimensional features are extracted and fused, including three groups of PPG-based features and one group of demographics-based features. Finally, a two-layer feedforward artificial neural network is used for regression. Thirty-three subjects distributed across the three BP groups were recruited. The proposed method passed the European Society of Hypertension International Protocol revision 2010 (ESP-IP2). Experimental results show that the proposed method exhibits good accuracy for a diverse population, with an estimation error of −0.07 ± 4.47 mmHg for SBP and 0.00 ± 3.61 mmHg for DBP. Moreover, the model tracked the BP of two subjects for half a month, laying the foundation for daily BP monitoring. This work will contribute to long-term wellness management and the rehabilitation process, enabling timely detection and improvement of the user's physical health.

AAAI Conference 2022 Conference Paper

Regularizing Graph Neural Networks via Consistency-Diversity Graph Augmentations

  • Deyu Bo
  • Binbin Hu
  • Xiao Wang
  • Zhiqiang Zhang
  • Chuan Shi
  • Jun Zhou

Despite the remarkable performance of graph neural networks (GNNs) in semi-supervised learning, they are criticized for not making full use of unlabeled data and for overfitting. Recently, graph data augmentation, used to improve both the accuracy and generalization of GNNs, has received considerable attention. However, one fundamental question is how to evaluate the quality of graph augmentations in principle. In this paper, we propose two metrics, Consistency and Diversity, from the aspects of augmentation correctness and generalization. Moreover, we discover that existing augmentations fall into a dilemma between these two metrics. Can we find a graph augmentation satisfying both consistency and diversity? A well-informed answer can help us understand the mechanism behind graph augmentation and improve the performance of GNNs. To tackle this challenge, we analyze two representative semi-supervised learning algorithms: label propagation (LP) and consistency regularization (CR). We find that LP utilizes the prior knowledge of graphs to improve consistency, while CR adopts variable augmentations to promote diversity. Based on this discovery, we treat neighbors as augmentations to capture the prior knowledge embodying the homophily assumption, which promises high consistency of augmentations. To further promote diversity, we randomly replace the immediate neighbors of each node with its remote neighbors. A neighbor-constrained regularization is then proposed to enforce that the predictions of the augmented neighbors are consistent with each other. Extensive experiments on five real-world graphs validate the superiority of our method in improving the accuracy and generalization of GNNs.

AAAI Conference 2022 Conference Paper

Robust Heterogeneous Graph Neural Networks against Adversarial Attacks

  • Mengmei Zhang
  • Xiao Wang
  • Meiqi Zhu
  • Chuan Shi
  • Zhiqiang Zhang
  • Jun Zhou

Heterogeneous Graph Neural Networks (HGNNs) have drawn increasing attention in recent years and achieved outstanding performance in many tasks. However, despite their wide use, there is currently no understanding of their robustness to adversarial attacks. In this work, we first systematically study the robustness of HGNNs and show that they can be easily fooled by adding an adversarial edge between the target node and a large-degree node (i.e., a hub). Furthermore, we identify two key reasons for such vulnerabilities of HGNNs: one is the perturbation enlargement effect, i.e., HGNNs, failing to encode transiting probability, enlarge the effect of the adversarial hub compared with GCNs; the other is the soft attention mechanism, which assigns positive attention values to obviously unreliable neighbors. Based on these two facts, we propose RoHe, a novel robust HGNN framework against topology adversarial attacks, equipped with an attention purifier that can prune malicious neighbors based on topology and features. Specifically, to eliminate the perturbation enlargement, we introduce the metapath-based transiting probability as the prior criterion of the purifier, restraining the confidence of malicious neighbors from the adversarial hub. The purifier then learns to mask out neighbors with low confidence, thus effectively alleviating the negative effect of malicious neighbors in the soft attention mechanism. Extensive experiments on different benchmark datasets for multiple HGNNs demonstrate the effectiveness and generalization ability of our defense framework under adversarial attacks.

AAAI Conference 2022 Conference Paper

SAIL: Self-Augmented Graph Contrastive Learning

  • Lu Yu
  • Shichao Pei
  • Lizhong Ding
  • Jun Zhou
  • Longfei Li
  • Chuxu Zhang
  • Xiangliang Zhang

This paper studies learning node representations with graph neural networks (GNNs) in the unsupervised scenario. Specifically, we derive a theoretical analysis and provide an empirical demonstration of the unsteady performance of GNNs over different graph datasets when the supervision signals are not appropriately defined. The performance of GNNs depends on both the smoothness of node features and the locality of the graph structure. To smooth the discrepancy between node proximity measured by graph topology and by node features, we propose SAIL, a novel Self-Augmented graph contrastive Learning framework with two complementary self-distilling regularization modules, i.e., intra- and inter-graph knowledge distillation. We demonstrate the competitive performance of SAIL on a variety of graph applications. Even with a single GNN layer, SAIL achieves consistently competitive or even better performance on various benchmark datasets compared with state-of-the-art baselines.

TIST Journal 2022 Journal Article

Toward Scalable and Privacy-preserving Deep Neural Network via Algorithmic-Cryptographic Co-design

  • Jun Zhou
  • Longfei Zheng
  • Chaochao Chen
  • Yan Wang
  • Xiaolin Zheng
  • Bingzhe Wu
  • Cen Chen
  • Li Wang

Deep Neural Networks (DNNs) have achieved remarkable progress in various real-world applications, especially when abundant training data are provided. However, data isolation has recently become a serious problem. Existing works build privacy-preserving DNN models from either an algorithmic or a cryptographic perspective. The former mainly splits the DNN computation graph between data holders, or between data holders and a server, which demonstrates good scalability but suffers from accuracy loss and potential privacy risks. In contrast, the latter leverages time-consuming cryptographic techniques, which offer strong privacy guarantees but poor scalability. In this article, we propose SPNN—a Scalable and Privacy-preserving deep Neural Network learning framework, from an algorithmic-cryptographic co-design perspective. From the algorithmic perspective, we split the computation graph of DNN models into two parts, i.e., the private-data-related computations performed by data holders and the remaining heavy computations delegated to a semi-honest server with high computational capacity. From the cryptographic perspective, we propose using two types of cryptographic techniques, i.e., secret sharing and homomorphic encryption, for the isolated data holders to conduct private-data-related computations privately and cooperatively. Furthermore, we implement SPNN in a decentralized setting and introduce user-friendly APIs. Experimental results on real-world datasets demonstrate the superiority of the proposed SPNN.
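The secret-sharing side of such a design can be illustrated in a few lines (a toy sketch over the reals with a hypothetical `share` helper; real systems of this kind use fixed-point arithmetic over a finite ring):

```python
import numpy as np

def share(x, rng):
    # Additively split a private vector into two random-looking shares
    # that reveal nothing individually but sum back to the secret.
    r = rng.standard_normal(x.shape)
    return r, x - r

rng = np.random.default_rng(42)
x = np.array([1.0, 2.0, 3.0])  # data holder A's private input
y = np.array([4.0, 5.0, 6.0])  # data holder B's private input
xa, xb = share(x, rng)
ya, yb = share(y, rng)
# Each side combines only the shares it holds; the plaintext sum
# appears only when the two partial results are brought together.
z = (xa + ya) + (xb + yb)
```

Linear operations (sums, and matrix products with public weights) commute with this sharing, which is what lets a server carry out heavy computations without ever seeing the raw private data.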

JBHI Journal 2022 Journal Article

ULECGNet: An Ultra-Lightweight End-to-End ECG Classification Neural Network

  • Jianbiao Xiao
  • Jiahao Liu
  • Huanqi Yang
  • Qingsong Liu
  • Ning Wang
  • Zhen Zhu
  • Yulong Chen
  • Yu Long

ECG classification is a key technology in intelligent electrocardiogram (ECG) monitoring. In the past, traditional machine learning methods such as support vector machine (SVM) and K-nearest neighbor (KNN) have been used for ECG classification, but with limited classification accuracy. Recently, end-to-end neural networks have been used for ECG classification and show high classification accuracy. However, an end-to-end neural network has large computational complexity, including a large number of parameters and operations. Although dedicated hardware such as field-programmable gate arrays (FPGA) and application-specific integrated circuits (ASIC) can be developed to accelerate the neural network, they result in large power consumption, large design cost, or limited flexibility. In this work, we propose an ultra-lightweight end-to-end ECG classification neural network that has extremely low computational complexity (∼8.2k parameters and ∼227k multiplication/addition operations) and can be squeezed into a low-cost microcontroller (MCU) such as the MSP432 while achieving 99.1% overall classification accuracy. This outperforms the state-of-the-art ECG classification neural networks. Implemented on the MSP432, the proposed design consumes only 0.4 mJ and 3.1 mJ per heartbeat classification for normal and abnormal heartbeats, respectively, for real-time ECG classification.

IJCAI Conference 2022 Conference Paper

Vertically Federated Graph Neural Network for Privacy-Preserving Node Classification

  • Chaochao Chen
  • Jun Zhou
  • Longfei Zheng
  • Huiwen Wu
  • Lingjuan Lyu
  • Jia Wu
  • Bingzhe Wu
  • Ziqi Liu

Recently, Graph Neural Networks (GNNs) have achieved remarkable progress in various real-world tasks on graph data, which consist of node features and adjacency information between different nodes. High-performance GNN models always depend on both rich features and complete edge information in the graph. However, such information can be isolated by different data holders in practice, the so-called data isolation problem. To solve this problem, in this paper we propose VFGNN, a federated GNN learning paradigm for the privacy-preserving node classification task under a vertically partitioned data setting, which can be generalized to existing GNN models. Specifically, we split the computation graph into two parts. We leave the private-data-related computations (i.e., on features, edges, and labels) on the data holders and delegate the rest of the computations to a semi-honest server. We also propose applying differential privacy to prevent potential information leakage from the server. We conduct experiments on three benchmarks, and the results demonstrate the effectiveness of VFGNN.

NeurIPS Conference 2021 Conference Paper

A Bi-Level Framework for Learning to Solve Combinatorial Optimization on Graphs

  • Runzhong Wang
  • Zhigang Hua
  • Gan Liu
  • Jiayi Zhang
  • Junchi Yan
  • Feng Qi
  • Shuang Yang
  • Jun Zhou

Combinatorial Optimization (CO) has been a long-standing challenging research topic, featured by its NP-hard nature. Traditionally, such problems are approximately solved with heuristic algorithms, which are usually fast but may sacrifice solution quality. Machine learning for combinatorial optimization (MLCO) has recently become a trending research topic, but most existing MLCO methods treat CO as a single-level optimization by directly learning end-to-end solutions, which is hard to scale up and mostly limited by the capacity of ML models given the high complexity of CO. In this paper, we propose a hybrid approach to combine the best of both worlds: a bi-level framework with an upper-level learning method that optimizes the graph (e.g., adds, deletes, or modifies edges), fused with a lower-level heuristic algorithm that solves on the optimized graph. Such a bi-level approach simplifies learning on the original hard CO problem and can effectively mitigate the demand for model capacity. Experiments on several popular CO problems, such as Directed Acyclic Graph scheduling, Graph Edit Distance, and the Hamiltonian Cycle Problem, show its effectiveness over manually designed heuristics and single-level learning methods.

IJCAI Conference 2021 Conference Paper

Attention-based Pyramid Dilated Lattice Network for Blind Image Denoising

  • Mohammad Nikzad
  • Yongsheng Gao
  • Jun Zhou

Though convolutional neural networks (CNNs) with residual and dense aggregations have obtained much attention in image denoising, they are incapable of exploiting different levels of contextual information at every convolutional unit in order to infer different levels of noise components with a single model. In this paper, to overcome this shortcoming, we present a novel attention-based pyramid dilated lattice (APDL) architecture and investigate its capability for blind image denoising. The proposed framework can effectively harness the advantages of residual and dense aggregations to achieve a great trade-off between performance, parameter efficiency, and test time. It also employs a novel pyramid dilated convolution strategy to effectively capture contextual information corresponding to different noise levels through the training of a single model. Our extensive experimental investigation verifies the effectiveness and efficiency of the APDL architecture for image denoising as well as JPEG artifact suppression tasks.

IJCAI Conference 2021 Conference Paper

Cross-Domain Recommendation: Challenges, Progress, and Prospects

  • Feng Zhu
  • Yan Wang
  • Chaochao Chen
  • Jun Zhou
  • Longfei Li
  • Guanfeng Liu

To address the long-standing data sparsity problem in recommender systems (RSs), cross-domain recommendation (CDR) has been proposed to leverage the relatively richer information from a richer domain to improve the recommendation performance in a sparser domain. Although CDR has been extensively studied in recent years, there is a lack of a systematic review of the existing CDR approaches. To fill this gap, in this paper, we provide a comprehensive review of existing CDR approaches, including challenges, research progress, and prospects. Specifically, we first summarize existing CDR approaches into four types, including single-target CDR, single-target multi-domain recommendation (MDR), dual-target CDR, and multi-target CDR. We then present the definitions and challenges of these CDR approaches. Next, we propose a full-view categorization and new taxonomies on these approaches and report their research progress in detail. In the end, we share several promising prospects in CDR.

JBHI Journal 2021 Journal Article

Discriminant Tensor-Based Manifold Embedding for Medical Hyperspectral Imagery

  • Meng Lv
  • Wei Li
  • Tianhong Chen
  • Jun Zhou
  • Ran Tao

Medical hyperspectral imagery has recently attracted considerable attention. However, for identification tasks, the high dimensionality of hyperspectral images usually leads to poor performance. Thus, dimensionality reduction (DR) is crucial in hyperspectral image analysis. Motivated by exploiting the underlying structural information of medical hyperspectral images and enhancing the discriminant ability of features, a discriminant tensor-based manifold embedding (DTME) is proposed for discriminant analysis of medical hyperspectral images. Based on the idea of manifold learning, a new discriminant similarity metric is designed that takes into account tensor representation, sparsity, low rank, and distribution characteristics. Then, an inter-class tensor graph and an intra-class tensor graph are constructed using the new similarity metric to reveal the intrinsic manifold of hyperspectral data. Dimensionality reduction is achieved by embedding these supervised tensor graphs into a low-dimensional tensor subspace. Experimental results on membranous nephropathy and white blood cell identification tasks demonstrate the potential clinical value of the proposed DTME.

NeurIPS Conference 2021 Conference Paper

MixSeq: Connecting Macroscopic Time Series Forecasting with Microscopic Time Series Data

  • Zhibo Zhu
  • Ziqi Liu
  • Ge Jin
  • Zhiqiang Zhang
  • Lei Chen
  • Jun Zhou
  • Jianyong Zhou

Time series forecasting is widely used in business intelligence, e.g., to forecast stock market prices and sales, and to aid the analysis of data trends. Most time series of interest are macroscopic time series that are aggregated from microscopic data. However, little prior work has studied forecasting macroscopic time series by leveraging data at the microscopic level, instead of directly modeling the macroscopic series. In this paper, we assume that the microscopic time series follow some unknown mixture of probabilistic distributions. We theoretically show that, as we identify the ground-truth latent mixture components, the estimation of the time series from each component improves because of lower variance, thus also benefiting the estimation of the macroscopic time series. Inspired by the power of Seq2seq and its variants for modeling time series data, we propose Mixture of Seq2seq (MixSeq), an end-to-end mixture model for clustering microscopic time series, where all the components come from a family of Seq2seq models parameterized by different parameters. Extensive experiments on both synthetic and real-world data show the superiority of our approach.

AAAI Conference 2021 Conference Paper

Near Lossless Transfer Learning for Spiking Neural Networks

  • Zhanglu Yan
  • Jun Zhou
  • Weng-Fai Wong

Spiking neural networks (SNNs) significantly reduce energy consumption by replacing weight multiplications with additions, which makes them suitable for energy-constrained platforms. However, due to their discrete activations, training SNNs remains a challenge. A popular approach is to first train an equivalent CNN using traditional backpropagation and then transfer the weights to the intended SNN. Unfortunately, this often results in significant accuracy loss, especially in deeper networks. In this paper, we propose CQ training (Clamped and Quantized training), an SNN-compatible CNN training algorithm with clamping and quantization that achieves near-zero conversion accuracy loss. Essentially, CNN training in CQ training accounts for certain SNN characteristics. Using a 7-layer VGG-∗ and a 21-layer VGG-19 on the CIFAR-10 dataset, we achieved 94.16% and 93.44% accuracy in the respective equivalent SNNs, outperforming other comparable works that we know of. We also demonstrate low-precision weight compatibility for the VGG-19 structure: without retraining, accuracies of 93.43% and 92.82% were achieved using quantized 9-bit and 8-bit weights, respectively. The framework was developed in PyTorch and is publicly available.
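The clamp-and-quantize idea can be sketched as follows (our own illustrative version; the paper's exact quantization grid and clamping bound may differ):

```python
import numpy as np

def clamp_quantize(x, n_bits=8, upper=1.0):
    # Clamp activations to [0, upper], then snap them to 2**n_bits
    # uniform levels, so the CNN only ever sees values that a
    # rate-coded SNN can reproduce with a fixed number of timesteps.
    levels = 2 ** n_bits
    x = np.clip(x, 0.0, upper)
    return np.round(x * levels / upper) / levels * upper
```

Training the CNN with activations passed through such a function confines them to the discrete range the SNN can represent, which is what makes the subsequent weight transfer nearly lossless.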

NeurIPS Conference 2020 Conference Paper

Bandit Samplers for Training Graph Neural Networks

  • Ziqi Liu
  • Zhengwei Wu
  • Zhiqiang Zhang
  • Jun Zhou
  • Shuang Yang
  • Le Song
  • Yuan Qi

Several sampling algorithms with variance reduction have been proposed to accelerate the training of Graph Convolution Networks (GCNs). However, due to the intractable computation of the optimal sampling distribution, these sampling algorithms are suboptimal for GCNs and are not applicable to more general graph neural networks (GNNs) where the message aggregator contains learned weights rather than fixed weights, such as Graph Attention Networks (GAT). The fundamental reason is that the embeddings of the neighbors or learned weights involved in the optimal sampling distribution are \emph{changing} during training and \emph{not known a priori}, but only \emph{partially observed} when sampled, making the derivation of an optimal variance-reduced sampler non-trivial. In this paper, we formulate the optimization of the sampling variance as an adversarial bandit problem, where the rewards are related to the node embeddings and learned weights and can vary constantly. Thus a good sampler needs to acquire variance information about more neighbors (exploration) while at the same time optimizing the immediate sampling variance (exploitation). We theoretically show that our algorithm asymptotically approaches the optimal variance within a factor of 3. We show the efficiency and effectiveness of our approach on multiple datasets.
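A minimal adversarial-bandit sampler in the EXP3 style (a generic sketch, not the paper's algorithm or its reward definition) looks like this:

```python
import numpy as np

def exp3_sampler(reward_fn, n_arms, rounds, gamma=0.1, seed=0):
    # Weights over neighbor "arms" are updated from importance-weighted
    # rewards in [0, 1]; the uniform gamma term guarantees exploration
    # while the weight update concentrates sampling on high-reward arms.
    rng = np.random.default_rng(seed)
    w = np.ones(n_arms)
    pulls = np.zeros(n_arms, dtype=int)
    for _ in range(rounds):
        p = (1 - gamma) * w / w.sum() + gamma / n_arms
        arm = rng.choice(n_arms, p=p)
        r = reward_fn(arm)
        w[arm] *= np.exp(gamma * r / (p[arm] * n_arms))
        pulls[arm] += 1
    return pulls
```

The `gamma` term is the exploration the abstract mentions; in the paper's setting the rewards would come from observed embeddings and attention weights, which is what makes the problem adversarial rather than stochastic.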

AAAI Conference 2020 Conference Paper

Characterizing Membership Privacy in Stochastic Gradient Langevin Dynamics

  • Bingzhe Wu
  • Chaochao Chen
  • Shiwan Zhao
  • Cen Chen
  • Yuan Yao
  • Guangyu Sun
  • Li Wang
  • Xiaolu Zhang

Bayesian deep learning is recently regarded as an intrinsic way to characterize the weight uncertainty of deep neural networks (DNNs). Stochastic Gradient Langevin Dynamics (SGLD) is an effective method to enable Bayesian deep learning on large-scale datasets. Previous theoretical studies have shown various appealing properties of SGLD, ranging from the convergence properties to the generalization bounds. In this paper, we study the properties of SGLD from a novel perspective of membership privacy protection (i.e., preventing the membership attack). The membership attack, which aims to determine whether a specific sample is used for training a given DNN model, has emerged as a common threat against deep learning algorithms. To this end, we build a theoretical framework to analyze the information leakage (w.r.t. the training dataset) of a model trained using SGLD. Based on this framework, we demonstrate that SGLD can prevent the information leakage of the training dataset to a certain extent. Moreover, our theoretical analysis can be naturally extended to other types of Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) methods. Empirical results on different datasets and models verify our theoretical findings and suggest that the SGLD algorithm can not only reduce the information leakage but also improve the generalization ability of the DNN models in real-world applications.
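For reference, the SGLD update itself is a single line: a gradient step on the log-posterior plus Gaussian noise whose variance matches the step size. The demo below (a standard textbook example, not the paper's setup) samples a 1-D standard normal, for which the log-density gradient is −θ.

```python
import numpy as np

def sgld_step(theta, grad_log_post, step, rng):
    # theta_{t+1} = theta_t + (step/2) * grad log p(theta_t | data) + N(0, step)
    noise = rng.standard_normal(theta.shape) * np.sqrt(step)
    return theta + 0.5 * step * grad_log_post(theta) + noise

rng = np.random.default_rng(0)
theta = np.array([5.0])          # deliberately bad starting point
samples = []
for t in range(20000):
    theta = sgld_step(theta, lambda th: -th, step=0.05, rng=rng)
    if t >= 2000:                # discard burn-in
        samples.append(theta[0])
```

The injected noise is also the informal source of the membership-privacy effect the paper analyzes: the released iterates are randomized, not deterministic, functions of the training data.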

AAAI Conference 2020 Conference Paper

Deep Residual-Dense Lattice Network for Speech Enhancement

  • Mohammad Nikzad
  • Aaron Nicolson
  • Yongsheng Gao
  • Jun Zhou
  • Kuldip K. Paliwal
  • Fanhua Shang

Convolutional neural networks (CNNs) with residual links (ResNets) and causal dilated convolutional units have been the network of choice for deep learning approaches to speech enhancement. While residual links improve gradient flow during training, feature diminution of shallow layer outputs can occur due to repetitive summations with deeper layer outputs. One strategy to improve feature re-usage is to fuse both ResNets and densely connected CNNs (DenseNets). DenseNets, however, over-allocate parameters for feature re-usage. Motivated by this, we propose the residual-dense lattice network (RDL-Net), which is a new CNN for speech enhancement that employs both residual and dense aggregations without over-allocating parameters for feature re-usage. This is managed through the topology of the RDL blocks, which limit the number of outputs used for dense aggregations. Our extensive experimental investigation shows that RDL-Nets are able to achieve a higher speech enhancement performance than CNNs that employ residual and/or dense aggregations. RDL-Nets also use substantially fewer parameters and have a lower computational requirement. Furthermore, we demonstrate that RDL-Nets outperform many state-of-the-art deep learning approaches to speech enhancement. Availability: https://github.com/nick-nikzad/RDL-SE.

IJCAI Conference 2020 Conference Paper

Financial Risk Analysis for SMEs with Graph-based Supply Chain Mining

  • Shuo Yang
  • Zhiqiang Zhang
  • Jun Zhou
  • Yang Wang
  • Wang Sun
  • Xingyu Zhong
  • Yanming Fang
  • Quan Yu

Small and Medium-sized Enterprises (SMEs) play a vital role in the modern economy. In recent years, financial risk analysis for SMEs has attracted much attention from financial institutions. However, financial risk analysis for SMEs usually suffers from a data deficiency problem, especially for mobile financial institutions, which seldom collect credit-related data directly from SMEs. Fortunately, although credit-related information about SMEs is hard to acquire in sufficient quantity, the interactive relationships between SMEs, which may contain valuable information about financial risk, are usually available to mobile financial institutions. Discovering credit-related relationships of SMEs from massive interactions helps to model SMEs comprehensively and thus improves the performance of financial risk analysis. In this paper, tackling the data deficiency problem of financial risk analysis for SMEs, we propose an innovative financial risk analysis framework with graph-based supply chain mining. Specifically, to capture the credit-related topological-structure and temporal-variation information of SMEs, we design and employ a novel spatial-temporal aware graph neural network to mine supply chain relationships on an SME graph, and then analyze credit risk based on the mined supply chain graph. Experimental results on real-world financial datasets prove the effectiveness of our proposal for financial risk analysis for SMEs.

TIST Journal 2020 Journal Article

Practical Privacy Preserving POI Recommendation

  • Chaochao Chen
  • Jun Zhou
  • Bingzhe Wu
  • Wenjing Fang
  • Li Wang
  • Yuan Qi
  • Xiaolin Zheng

Point-of-Interest (POI) recommendation has been extensively studied and successfully applied in industry recently. However, most existing approaches build centralized models on the basis of collecting users’ data. Both private data and models are held by the recommender, which causes serious privacy concerns. In this article, we propose a novel Privacy preserving POI Recommendation (PriRec) framework. First, to protect data privacy, users’ private data (features and actions) are kept on their own side, e.g., Cellphone or Pad. Meanwhile, the public data that need to be accessed by all the users are kept by the recommender to reduce the storage costs of users’ devices. Those public data include: (1) static data only related to the status of POI, such as POI categories, and (2) dynamic data dependent on user-POI actions such as visited counts. The dynamic data could be sensitive, and we develop local differential privacy techniques to release such data to the public with privacy guarantees. Second, PriRec follows the representations of Factorization Machine (FM) that consists of a linear model and the feature interaction model. To protect the model privacy, the linear models are saved on the users’ side, and we propose a secure decentralized gradient descent protocol for users to learn it collaboratively. The feature interaction model is kept by the recommender since there is no privacy risk, and we adopt a secure aggregation strategy in a federated learning paradigm to learn it. To this end, PriRec keeps users’ private raw data and models in users’ own hands, and protects user privacy to a large extent. We apply PriRec in real-world datasets, and comprehensive experiments demonstrate that, compared with FM, PriRec achieves comparable or even better recommendation accuracy.
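The abstract does not spell out the perturbation used for the dynamic counts, but a standard local-release mechanism for a sensitivity-1 count is Laplace noise, sketched here as a generic (hypothetical) example rather than PriRec's actual scheme:

```python
import numpy as np

def ldp_count(true_count, epsilon, rng):
    # Release a visit count with Laplace(1/epsilon) noise; since one
    # user changes the count by at most 1, the released value satisfies
    # epsilon-differential privacy for that count.
    return true_count + rng.laplace(scale=1.0 / epsilon)

rng = np.random.default_rng(0)
noisy = [ldp_count(7, epsilon=0.5, rng=rng) for _ in range(20000)]
```

Individual releases are noisy (scale 1/ε = 2 here), but aggregates remain usable because the noise is zero-mean.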

JBHI Journal 2020 Journal Article

Progressive Transfer Learning and Adversarial Domain Adaptation for Cross-Domain Skin Disease Classification

  • Yanyang Gu
  • Zongyuan Ge
  • C. Paul Bonnington
  • Jun Zhou

Deep learning has been used to analyze and diagnose various skin diseases through medical imaging. However, recent research shows that a well-trained deep learning model may not generalize well to data from different cohorts due to domain shift. Simple data fusion techniques, such as combining disease samples from different data sources, are not effective in solving this problem. In this paper, we present two methods for the novel task of cross-domain skin disease recognition. Starting from a fully supervised deep convolutional neural network classifier pre-trained on ImageNet, we explore a two-step progressive transfer learning technique by fine-tuning the network on two skin disease datasets. We then propose adopting adversarial learning as a domain adaptation technique to perform invariant attribute translation from the source to the target domain in order to improve recognition performance. To evaluate these two methods, we analyze the generalization capability of the trained model on melanoma detection, cancer detection, and cross-modality learning tasks, using two skin image datasets collected from different clinical settings and cohorts with different disease distributions. The experiments prove the effectiveness of our method in solving the domain shift problem.

AAAI Conference 2020 Conference Paper

Pruning from Scratch

  • Yulong Wang
  • Xiaolu Zhang
  • Lingxi Xie
  • Jun Zhou
  • Hang Su
  • Bo Zhang
  • Xiaolin Hu

Network pruning is an important research field aiming at reducing computational costs of neural networks. Conventional approaches follow a fixed paradigm which first trains a large and redundant network, and then determines which units (e.g., channels) are less important and thus can be removed. In this work, we find that pre-training an over-parameterized model is not necessary for obtaining the target pruned structure. In fact, a fully-trained over-parameterized model will reduce the search space for the pruned structure. We empirically show that more diverse pruned structures can be directly pruned from randomly initialized weights, including potential models with better performance. Therefore, we propose a novel network pruning pipeline which allows pruning from scratch with little training overhead. In the experiments for compressing classification models on CIFAR10 and ImageNet datasets, our approach not only greatly reduces the pre-training burden of traditional pruning methods, but also achieves similar or even higher accuracy under the same computation budgets. Our results facilitate the community to rethink the effectiveness of existing techniques used for network pruning.

JBHI Journal 2020 Journal Article

TSE-CNN: A Two-Stage End-to-End CNN for Human Activity Recognition

  • Jiahui Huang
  • Shuisheng Lin
  • Ning Wang
  • Guanghai Dai
  • Yuxiang Xie
  • Jun Zhou

Human activity recognition has been widely used in healthcare applications such as elderly monitoring, exercise supervision, and rehabilitation monitoring. Compared with other approaches, sensor-based wearable human activity recognition is less affected by environmental noise and is therefore promising for providing higher recognition accuracy. However, one of the major issues of existing wearable human activity recognition methods is that although the average recognition accuracy is acceptable, the recognition accuracy for some activities (e.g., ascending and descending stairs) is low, mainly due to relatively less training data and complex behavior patterns for these activities. Another issue is that the recognition accuracy is low when the training data from the test subject are limited, which is a common case in real practice. In addition, the use of neural networks leads to large computational complexity and thus high power consumption. To address these issues, we propose a new human activity recognition method with a two-stage end-to-end convolutional neural network and a data augmentation method. Compared with the state-of-the-art methods (including neural network based methods and other methods), the proposed methods achieve significantly improved recognition accuracy and reduced computational complexity.

IJCAI Conference 2019 Conference Paper

AntProphet: an Intention Mining System behind Alipay's Intelligent Customer Service Bot

  • Cen Chen
  • Xiaolu Zhang
  • Sheng Ju
  • Chilin Fu
  • Caizhi Tang
  • Jun Zhou
  • Xiaolong Li

We create an intention mining system, named AntProphet, for Alipay's intelligent customer service bot, to alleviate the burden of customer service. Whenever users have any questions, AntProphet is the first stop to help them answer their questions. Our system gathers users' profiles and their historical behavioral trajectories, together with contextual information, to predict users' intention, i.e., the potential questions that users want to resolve. AntProphet takes care of more than 90% of the customer service demands in the Alipay APP and resolves most of the users' problems on the spot, thus significantly reducing the manpower burden. With its help, the overall satisfaction rate of our customer service bot exceeds 85%.

AAAI Conference 2019 Conference Paper

Cash-Out User Detection Based on Attributed Heterogeneous Information Network with a Hierarchical Attention Mechanism

  • Binbin Hu
  • Zhiqiang Zhang
  • Chuan Shi
  • Jun Zhou
  • Xiaolong Li
  • Yuan Qi

As one of the major frauds in financial services, cash-out fraud occurs when users pursue cash gains through illegal or dishonest means. Conventional solutions for cash-out user detection perform subtle feature engineering for each user and then apply a classifier, such as GBDT or a neural network. However, users in financial services have rich interaction relations, which are seldom fully exploited by conventional solutions. In this paper, with real datasets from Ant Credit Pay of Ant Financial Services Group, we first study the cash-out user detection problem and propose a novel hierarchical attention mechanism based cash-out user detection model, called HACUD. Specifically, we model different types of objects and their rich attributes and interaction relations in the scenario of credit payment service with an Attributed Heterogeneous Information Network (AHIN). The HACUD model enhances feature representation of objects through meta-path based neighbors exploiting different aspects of structure information in the AHIN. Furthermore, a hierarchical attention mechanism is elaborately designed to model users' preferences towards attributes and meta-paths. Experimental results on two real datasets show that HACUD outperforms the state-of-the-art methods.

TIST Journal 2019 Journal Article

Distributed Deep Forest and its Application to Automatic Detection of Cash-Out Fraud

  • Ya-Lin Zhang
  • Jun Zhou
  • Wenhao Zheng
  • Ji Feng
  • Longfei Li
  • Ziqi Liu
  • Ming Li
  • Zhiqiang Zhang

Internet companies face the need to handle large-scale machine learning applications on a daily basis, so distributed implementations of machine learning algorithms that can handle extra-large-scale tasks with great performance are widely needed. Deep forest is a recently proposed deep learning framework which uses tree ensembles as its building blocks, and it has achieved highly competitive results on various domains of tasks. However, it has not been tested on extremely large-scale tasks. In this work, based on our parameter server system, we developed a distributed version of deep forest. To meet the needs of real-world tasks, many improvements are introduced to the original deep forest model, including MART (Multiple Additive Regression Tree) as base learners for efficiency and effectiveness, a cost-based method for handling prevalent class-imbalanced data, MART-based feature selection for high-dimensional data, and different evaluation metrics for automatically determining the cascade level. We tested the deep forest model on an extra-large-scale task, i.e., automatic detection of cash-out fraud, with more than 100 million training samples. Experimental results showed that the deep forest model has the best performance according to evaluation metrics from different perspectives, even with very little effort for parameter tuning. This model can block fraudulent transactions worth a large amount of money each day. Even compared with the best deployed model, the deep forest model can additionally bring a significant decrease in economic loss each day.

NeurIPS Conference 2019 Conference Paper

Generalization in Generative Adversarial Networks: A Novel Perspective from Privacy Protection

  • Bingzhe Wu
  • Shiwan Zhao
  • Chaochao Chen
  • Haoyang Xu
  • Li Wang
  • Xiaolu Zhang
  • Guangyu Sun
  • Jun Zhou

In this paper, we aim to understand the generalization properties of generative adversarial networks (GANs) from a new perspective of privacy protection. Theoretically, we prove that a differentially private learning algorithm used for training the GAN does not overfit to a certain degree, i.e., the generalization gap can be bounded. Moreover, some recent works, such as the Bayesian GAN, can be re-interpreted based on our theoretical insight from privacy protection. Quantitatively, to evaluate the information leakage of well-trained GAN models, we perform various membership attacks on these models. The results show that previous Lipschitz regularization techniques are effective in not only reducing the generalization gap but also alleviating the information leakage of the training dataset.

AAAI Conference 2019 Conference Paper

GeniePath: Graph Neural Networks with Adaptive Receptive Paths

  • Ziqi Liu
  • Chaochao Chen
  • Longfei Li
  • Jun Zhou
  • Xiaolong Li
  • Le Song
  • Yuan Qi

In this paper, we propose a new online feature selection algorithm for streaming data. We focus on the following two problems which remain unaddressed in the literature. First, most existing online feature selection algorithms merely utilize the first-order information of the data streams, despite the fact that second-order information explores the correlations between features and significantly improves the performance. Second, most online feature selection algorithms are based on the balanced data assumption, which is not true in many real-world applications. For example, in fraud detection, the number of positive examples is much smaller than that of negative examples because most cases are not fraud. The balanced assumption will make the selected features biased towards the majority class and fail to detect the fraud cases. We propose an Adaptive Sparse Confidence-Weighted (ASCW) algorithm to solve these two problems. We first introduce an ℓ0-norm constraint into second-order confidence-weighted (CW) learning for feature selection. Then the original loss is substituted with a cost-sensitive loss function to address the imbalanced data issue. Furthermore, our algorithm maintains multiple sparse CW learners with corresponding cost vectors to dynamically select an optimal cost. We theoretically extend the theory of sparse CW learning and analyze the performance behavior in F-measure. Empirical studies show superior performance over the state-of-the-art online learning methods in the online-batch setting.

IJCAI Conference 2019 Conference Paper

Latent Distribution Preserving Deep Subspace Clustering

  • Lei Zhou
  • Xiao Bai
  • Dong Wang
  • Xianglong Liu
  • Jun Zhou
  • Edwin Hancock

Subspace clustering is a useful technique for many computer vision applications in which the intrinsic dimension of high-dimensional data is smaller than the ambient dimension. Traditional subspace clustering methods often rely on the self-expressiveness property, which has proven effective for linear subspace clustering. However, they perform unsatisfactorily on real data with complex nonlinear subspaces. More recently, deep autoencoder based subspace clustering methods have achieved success owing to the more powerful representations extracted by the autoencoder network. Unfortunately, these methods, which consider only the reconstruction of the original input data, can hardly guarantee that the latent representation preserves the subspace distribution of the data, which inevitably limits their performance in practice. In this paper, we propose a novel deep subspace clustering method based on a latent distribution-preserving autoencoder, which introduces a distribution consistency loss to guide the learning of distribution-preserving latent representations, and consequently enables a strong capacity for characterizing real-world data for subspace clustering. Experimental results on several public databases show that our method achieves significant improvement compared with the state-of-the-art subspace clustering methods.

TAAS Journal 2018 Journal Article

Adaptive Process Migrations in Coupled Applications for Exchanging Data in Local File Cache

  • Jianwei Liao
  • Zhigang Cai
  • Francois Trahay
  • Jun Zhou
  • Guoqiang Xiao

Many problems in science and engineering are usually emulated as a set of mutually interacting models, resulting in a coupled or multiphysics application. These component models pose challenges originating from their interdisciplinary nature and from their computational and algorithmic complexities. In general, these models are independently developed and maintained, so they commonly employ the global file system for exchanging their data in the coupled application. To effectively use the local file cache on the compute node for exchanging data among the processes of such applications, and consequently boost I/O performance, this article presents a novel mechanism to migrate a process from one compute node to another on the basis of block I/O dependency. In this newly proposed mechanism, the block I/O dependency between two involved processes running on different nodes is profiled as block access similarity by taking advantage of Cohen's kappa statistic. Then, a process is dynamically migrated from its source node to the destination node on which another process with heavy block I/O dependency is running. As a result, both processes can exchange their data by utilizing the local file cache instead of the global file system to reduce I/O time. The experimental results demonstrate that the I/O performance can be significantly improved, and the time required for executing the application can be correspondingly decreased, as expected.

AAAI Conference 2018 Conference Paper

Curve-Structure Segmentation From Depth Maps: A CNN-Based Approach and Its Application to Exploring Cultural Heritage Objects

  • Yuhang Lu
  • Jun Zhou
  • Jing Wang
  • Jun Chen
  • Karen Smith
  • Colin Wilder
  • Song Wang

Motivated by the important archaeological application of exploring cultural heritage objects, in this paper we study the challenging problem of automatically segmenting curve structures that are very weakly stamped or carved on an object surface in the form of a highly noisy depth map. Different from most classical low-level image segmentation methods that are known to be very sensitive to noise and occlusions, we propose a new supervised learning algorithm based on a Convolutional Neural Network (CNN) to implicitly learn and utilize more curve geometry and pattern information for addressing this challenging problem. More specifically, we first propose a Fully Convolutional Network (FCN) to estimate the skeleton of curve structures and, at each skeleton pixel, a scale value that reflects the local curve width. Then we propose a dense prediction network to refine the estimated curve skeletons. Based on the estimated scale values, we finally develop an adaptive thresholding algorithm to achieve the final segmentation of curve structures. In the experiment, we validate the performance of the proposed method on a dataset of depth images scanned from unearthed pottery sherds dating to the Woodland period of Southeastern North America.

AAAI Conference 2018 Conference Paper

cw2vec: Learning Chinese Word Embeddings with Stroke n-gram Information

  • Shaosheng Cao
  • Wei Lu
  • Jun Zhou
  • Xiaolong Li

We propose cw2vec, a novel method for learning Chinese word embeddings. It is based on our observation that exploiting stroke-level information is crucial for improving the learning of Chinese word embeddings. Specifically, we design a minimalist approach to exploit such features, by using stroke n-grams, which capture semantic and morphological level information of Chinese words. Through qualitative analysis, we demonstrate that our model is able to extract semantic information that cannot be captured by existing methods. Empirical results on the word similarity, word analogy, text classification and named entity recognition tasks show that the proposed approach consistently outperforms state-of-the-art approaches such as word-based word2vec and GloVe, character-based CWE, component-based JWE and pixel-based GWE.

AAAI Conference 2018 Conference Paper

Privacy Preserving Point-of-Interest Recommendation Using Decentralized Matrix Factorization

  • Chaochao Chen
  • Ziqi Liu
  • Peilin Zhao
  • Jun Zhou
  • Xiaolong Li

Point-of-interest (POI) recommendation has drawn much attention recently due to the increasing popularity of location-based networks, e.g., Foursquare and Yelp. Among the existing approaches to POI recommendation, Matrix Factorization (MF) based techniques have proven to be effective. However, existing MF approaches suffer from two major problems: (1) Expensive computation and storage due to the centralized model training mechanism: the centralized learners have to maintain the whole user-item rating matrix, and potentially huge low-rank matrices. (2) Privacy issues: the users' preferences are at risk of leaking to malicious attackers via the centralized learner. To solve these, we present a Decentralized MF (DMF) framework for POI recommendation. Specifically, instead of maintaining all the low-rank matrices and sensitive rating data for training, we propose a random walk based decentralized training technique to train MF models on each user's end, e.g., cell phone and Pad. By doing so, the ratings of each user are kept in one's own hands; moreover, decentralized learning can be taken as distributed learning with multiple learners (users), and thus alleviates the computation and storage issue. Experimental results on two real-world datasets demonstrate that, compared with classic and state-of-the-art latent factor models, DMF significantly improves the recommendation performance in terms of precision and recall.

IJCAI Conference 2017 Conference Paper

Locally Linear Factorization Machines

  • Chenghao Liu
  • Teng Zhang
  • Peilin Zhao
  • Jun Zhou
  • Jianling Sun

Factorization Machines (FMs) are a widely used method for efficiently exploiting high-order feature interactions in classification and regression tasks. Unfortunately, despite increasing interest in FMs, existing work only considers high-order information of the input features, which limits their capacity on non-linear problems and fails to capture the underlying structures of more complex data. In this work, we present a novel Locally Linear Factorization Machine (LLFM) which overcomes this limitation by exploring a local coding technique. Existing local coding classifiers involve a phase of unsupervised anchor point learning and a predefined local coding scheme, which is suboptimal because class label information is not exploited in discovering the encoding and can thus result in a poor encoding for prediction. In contrast, we formulate a joint optimization over the anchor points, local coding coordinates, and FM variables to minimize classification or regression risk. Empirically, we demonstrate that our approach achieves much better predictive accuracy than other competitive methods which employ LLFM with unsupervised anchor point learning and a predefined local coding scheme.