Arrow Research search

Author name cluster

Jie Ma

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

27 papers
2 author rows

Possible papers

27

TCS Journal 2026 Journal Article

Dirac-type condition for rainbow chorded pancyclicity

  • Jie Ma
  • Junqing Cai

Suppose H is a collection of graphs H_1, H_2, …, H_{n+1}, not necessarily distinct, each with the same vertex set U, where |U| = n. A subgraph G with V(G) ⊆ U is said to be rainbow if no two distinct edges of G originate from the same H_j. The collection H is said to be rainbow chorded pancyclic if H admits a rainbow chorded cycle of length ℓ for each ℓ ∈ {4, 5, …, n}. In this paper, we prove that if δ(H) = min{δ(H_j) : j = 1, 2, …, n+1} ≥ n/2, then H is rainbow chorded pancyclic, with the following exceptions: (1) n ≥ 4 is even and H_1 = H_2 = ⋯ = H_{n+1} ≅ K_{n/2, n/2}; or (2) n = 6 and H_1 = H_2 = ⋯ = H_7 ≅ K_2 □ K_3.

TCS Journal 2026 Journal Article

Fan-type condition for two completely independent spanning trees

  • Jie Ma
  • Junqing Cai

The spanning trees T_1, T_2, …, T_k of a graph G are called k completely independent spanning trees (CISTs) if for any two vertices u, v ∈ V(G), the paths connecting u and v in any two distinct trees are pairwise edge-disjoint and internally vertex-disjoint. CISTs have significant applications in fault-tolerant broadcasting for interconnection networks, substantially improving network reliability and redundancy. However, determining whether a connected graph contains two CISTs is known to be NP-complete. Araki [J. Graph Theory 77 (2014) 171–179] posed an open question regarding whether certain known sufficient conditions for hamiltonian cycles could guarantee the existence of two CISTs. In this paper, we provide an affirmative answer to this question by proving that every connected graph G of order n ≥ 7 with μ_2(G) ≥ n contains two CISTs. Moreover, both the lower bound on the order n and the degree condition μ_2(G) ≥ n are best possible.

AAAI Conference 2026 Conference Paper

SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries

  • Chenxu Dang
  • Haiyan Liu
  • Jason Bao
  • Pei An
  • Xinyue Tang
  • An Pan
  • Jie Ma
  • Bingchuan Sun

Semantic occupancy has emerged as a powerful representation in world models for its ability to capture rich spatial semantics. However, most existing occupancy world models rely on static and fixed embeddings or grids, which inherently limit the flexibility of perception. Moreover, their "in-place classification" over grids exhibits a potential misalignment with the dynamic and continuous nature of real scenarios. In this paper, we propose SparseWorld, a novel 4D occupancy world model that is flexible, adaptive, and efficient, powered by sparse and dynamic queries. We propose a Range-Adaptive Perception module, in which learnable queries are modulated by the ego vehicle states and enriched with temporal-spatial associations to enable extended-range perception. To effectively capture the dynamics of the scene, we design a State-Conditioned Forecasting module, which replaces classification-based forecasting with a regression-guided formulation, precisely aligning the dynamic queries with the continuity of the 4D environment. In addition, we devise a Temporal-Aware Self-Scheduling training strategy to enable smooth and efficient training. Extensive experiments demonstrate that SparseWorld achieves state-of-the-art performance across perception, forecasting, and planning tasks. Comprehensive visualizations and ablation studies further validate the advantages of SparseWorld in terms of flexibility, adaptability, and efficiency.

NeurIPS Conference 2025 Conference Paper

A Minimalistic Unified Framework for Incremental Learning across Image Restoration Tasks

  • Xiaoxuan Gong
  • Jie Ma

Existing research in low-level vision has shifted its focus from "one-by-one" task-specific methods to "all-in-one" multi-task unified architectures. However, current all-in-one image restoration approaches primarily aim to improve overall performance across a limited number of tasks. In contrast, how to incrementally add new image restoration capabilities on top of an existing model (that is, task-incremental learning) has been largely unexplored. To fill this research gap, we propose a minimalistic and universal paradigm for task-incremental learning called MINI. It addresses the problem of parameter interference across different tasks through a simple yet effective mechanism, enabling nearly forgetting-free task-incremental learning. Specifically, we design a special meta-convolution called MINI-Conv, which generates parameters solely through lightweight embeddings instead of complex convolutional networks or MLPs. This not only significantly reduces the number of parameters and computational overhead but also achieves complete parameter isolation across different tasks. Moreover, MINI-Conv can be seamlessly integrated as a plug-and-play replacement for any convolutional layer within existing backbone networks, endowing them with incremental learning capabilities. Therefore, our method is highly generalizable. Finally, we demonstrate that our method achieves state-of-the-art performance compared to existing incremental learning approaches across five common image restoration tasks. Moreover, the near forgetting-free nature of our method makes it highly competitive even against all-in-one image restoration methods trained in a fully supervised manner. Our code is available at https://github.com.

NeurIPS Conference 2025 Conference Paper

ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding

  • Muye Huang
  • Lingling Zhang
  • Jie Ma
  • Han Lai
  • Fangzhi Xu
  • Yifei Li
  • Wenjun Wu
  • Yaqiang Wu

Charts are high-density visualization carriers for complex data, serving as a crucial medium for information extraction and analysis. Automated chart understanding poses significant challenges to existing multimodal large language models (MLLMs) due to the need for precise and complex visual reasoning. Current step-by-step reasoning models primarily focus on text-based logical reasoning for chart understanding. However, they struggle to refine or correct their reasoning when errors stem from flawed visual understanding, as they lack the ability to leverage multimodal interaction for deeper comprehension. Inspired by human cognitive behavior, we propose ChartSketcher, a multimodal feedback-driven step-by-step reasoning method designed to address these limitations. ChartSketcher is a chart understanding model that employs Sketch-CoT, enabling MLLMs to annotate intermediate reasoning steps directly onto charts using a programmatic sketching library, iteratively feeding these visual annotations back into the reasoning process. This mechanism enables the model to visually ground its reasoning and refine its understanding over multiple steps. We employ a two-stage training strategy: a cold start phase to learn sketch-based reasoning patterns, followed by off-policy reinforcement learning to enhance reflection and generalization. Experiments demonstrate that ChartSketcher achieves promising performance on chart understanding benchmarks and general vision tasks, providing an interactive and interpretable approach to chart comprehension.

AAAI Conference 2025 Conference Paper

Co-Progression Knowledge Distillation with Knowledge Prototype for Industrial Anomaly Detection

  • Bokang Yang
  • Zhe Zhang
  • Jie Ma

Unsupervised anomaly detection has emerged as a powerful technique for identifying abnormal patterns in images without relying on pre-labeled defective samples. Many unsupervised methods use pre-trained feature extractors from large datasets, with knowledge distillation between teacher and student models being a leading technique. However, due to the similar structures of teacher and student, these methods face challenges like excessive specialization and inadequate generalization, reducing detection performance. In this paper, we introduce a Co-Progression Knowledge Distillation (CPKD) framework, enabling bidirectional learning between teacher and student models. This innovative framework enables concurrent evolution of both models, fostering mutual improvement and enhanced adaptability. To maintain system stability and prevent overspecialization, we introduce a knowledge prototype as a regulatory mechanism for the teacher's learning process. Our method effectively addresses key challenges in anomaly detection, including insufficient learning and overadaptation, by striking a balance between acquiring new knowledge and preserving core competencies. We demonstrate significant improvements in detection accuracy, achieving SOTA performance on the MVTec dataset.

AAAI Conference 2025 Conference Paper

Debate on Graph: A Flexible and Reliable Reasoning Framework for Large Language Models

  • Jie Ma
  • Zhitao Gao
  • Qi Chai
  • Wangchun Sun
  • Pinghui Wang
  • Hongbin Pei
  • Jing Tao
  • Lingyun Song

Large Language Models (LLMs) may suffer from hallucinations in real-world applications due to the lack of relevant knowledge. In contrast, knowledge graphs encompass extensive, multi-relational structures that store a vast array of symbolic facts. Consequently, integrating LLMs with knowledge graphs has been extensively explored, with Knowledge Graph Question Answering (KGQA) serving as a critical touchstone for the integration. This task requires LLMs to answer natural language questions by retrieving relevant triples from knowledge graphs. However, existing methods face two significant challenges: excessively long reasoning paths distracting from the answer generation, and false-positive relations hindering the path refinement. In this paper, we propose an iterative interactive KGQA framework that leverages the interactive learning capabilities of LLMs to perform reasoning and Debating over Graphs (DoG). Specifically, DoG employs a subgraph-focusing mechanism, allowing LLMs to attempt an answer after each reasoning step, thereby mitigating the impact of lengthy reasoning paths. On the other hand, DoG utilizes a multi-role debate team to gradually simplify complex questions, reducing the influence of false-positive relations. This debate mechanism ensures the reliability of the reasoning process. Experimental results on five public datasets demonstrate the effectiveness and superiority of our architecture. Notably, DoG outperforms the state-of-the-art method ToG by 23.7% and 9.1% in accuracy on WebQuestions and GrailQA, respectively. Furthermore, the integration experiments with various LLMs on the mentioned datasets highlight the flexibility of DoG.

NeurIPS Conference 2025 Conference Paper

Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs

  • Jie Ma
  • Ning Qu
  • Zhitao Gao
  • Xing Rui
  • Jun Liu
  • Hongbin Pei
  • Jiang Xie
  • Lingyun Song

Knowledge graph-based retrieval-augmented generation seeks to mitigate hallucinations in Large Language Models (LLMs) caused by insufficient or outdated knowledge. However, existing methods often fail to fully exploit the prior knowledge embedded in knowledge graphs (KGs), particularly their structural information and explicit or implicit constraints. The former can enhance the faithfulness of LLMs' reasoning, while the latter can improve the reliability of response generation. Motivated by these, we propose a trustworthy reasoning framework, termed Deliberation over Priors (DP), which sufficiently utilizes the priors contained in KGs. Specifically, DP adopts a progressive knowledge distillation strategy that integrates structural priors into LLMs through a combination of supervised fine-tuning and Kahneman-Tversky Optimization, thereby improving the faithfulness of relation path generation. Furthermore, our framework employs a reasoning-introspection strategy, which guides LLMs to perform refined reasoning verification based on extracted constraint priors, ensuring the reliability of response generation. Extensive experiments on three benchmark datasets demonstrate that DP achieves new state-of-the-art performance, especially an H@1 improvement of 13% on the ComplexWebQuestions dataset, and generates highly trustworthy responses. We also conduct various analyses to verify its flexibility and practicality. Code is available at https://github.com/mira-ai-lab/Deliberation-on-Priors.

AAAI Conference 2025 Conference Paper

EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding

  • Muye Huang
  • Han Lai
  • Xinyu Zhang
  • Wenjun Wu
  • Jie Ma
  • Lingling Zhang
  • Jun Liu

Chart understanding enables automated data analysis for humans, which requires models to achieve highly accurate visual comprehension. While existing Visual Language Models (VLMs) have shown progress in chart understanding, the lack of high-quality training data and comprehensive evaluation benchmarks hinders VLM chart comprehension. In this paper, we introduce EvoChart, a novel self-training method for generating synthetic chart data to enhance VLMs' capabilities in real-world chart comprehension. We also propose EvoChart-QA, a novel benchmark for measuring models' chart comprehension abilities in real-world scenarios. Specifically, EvoChart is a unique self-training data synthesis approach that simultaneously produces a high-quality training corpus and a high-performance chart understanding model. EvoChart-QA consists of 650 distinct real-world charts collected from 140 different websites and 1,250 expert-curated questions that focus on chart understanding. Experimental results on various open-source and proprietary VLMs tested on EvoChart-QA demonstrate that even the best proprietary model, GPT-4o, achieves only 49.8% accuracy. Moreover, the EvoChart method significantly boosts the performance of open-source VLMs on real-world chart understanding tasks, achieving 54.2% accuracy on EvoChart-QA.

IJCAI Conference 2025 Conference Paper

Metapath and Hypergraph Structure-based Multi-Channel Graph Contrastive Learning for Student Performance Prediction

  • Lingyun Song
  • Xiaofan Sun
  • Xinbiao Gan
  • Yudai Pan
  • Xiaolin Han
  • Jie Ma
  • Jun Liu
  • Xuequn Shang

Considerable attention has been paid to predicting student performance on exercises. The performance of prior studies is determined by the quality of the trait features of students and exercises. Nevertheless, most prior studies primarily examine simple pairwise interactions when learning trait features, such as those between students and exercises or between exercises and concepts, while disregarding the complex higher-order interactions that typically exist among these components, which in turn hinders the prediction results. In this paper, we propose an innovative Multi-Channel Graph Contrastive Learning (MCGCL) framework that integrates various high-order interactions for predicting student performance. MCGCL characterizes graph structures reflecting various high-order relationships among students, exercises, and concepts through multiple channels, thereby enhancing the trait features of both students and exercises. Moreover, graph contrastive learning is employed to enhance the representation of trait features acquired from high-order graph structures in diverse views. Extensive experiments on real-world datasets show that MCGCL achieves state-of-the-art results on the task of predicting student performance. The code is available at https://github.com/sunlitsong/MCGCL.

IJCAI Conference 2025 Conference Paper

Multi-Scale Temporal Neural Network for Stock Trend Prediction Enhanced by Temporal Hyperedge Learning

  • Lingyun Song
  • Haodong Li
  • Siyu Chen
  • Xinbiao Gan
  • Binze Shi
  • Jie Ma
  • Yudai Pan
  • Xiaoqi Wang

Existing research in Stock Trend Prediction (STP) focuses on temporal features extracted from a temporal sequence of stock data with a look-back window, which frequently leads to the omission of important periodic patterns, such as weekly and monthly variations in stock prices. Furthermore, these methods examine stocks individually, ignoring the temporal variation patterns among stocks that share higher-order relationships, like those within the same industry. These relationships typically provide contextual insights into market investments influencing stock price fluctuations. To tackle these issues, we propose a Multi-Scale Temporal Neural Network (MSTNN) framework tailored for STP. This architecture explores the periodic fluctuation behaviors of individual stocks through an innovative 3D convolutional neural network, alongside examining temporal variation patterns of stocks linked to specific industries via a temporal hypergraph attention mechanism. Empirical results from two real-world benchmark datasets show that MSTNN significantly outperforms prior state-of-the-art STP methods. The code of our MSTNN is available at https://github.com/sunlitsong/MSTNN.

IJCAI Conference 2025 Conference Paper

Top-I2P: Explore Open-Domain Image-to-Point Cloud Registration Using Topology Relationship

  • Pei An
  • Jiaqi Yang
  • Muyao Peng
  • You Yang
  • Qiong Liu
  • Jie Ma
  • Liangliang Nan

Image-to-point cloud (I2P) registration is a fundamental task in computer vision, which aims to align pixels in 2D images with corresponding points in 3D point clouds. While deep-learning-based methods dominate this field, they often fail to generalize to the open domain. In this paper, we address open-domain I2P registration from the perspective of topology relationships. Firstly, we find that topology relationships reflect sparse connections between pixels and points, which show significant potential in enhancing cross-modality feature interaction in the open domain. Building on this insight, we develop an I2P registration framework using topology relationships. Then, to construct and leverage the topology relationships between the heterogeneous 2D and 3D spaces, we design a registration network, Top-I2P, with correction-based topology reasoning and fast topology feature interaction modules. Extensive experiments on 7-Scenes, RGBD-V2, ScanNet, and self-collected I2P datasets demonstrate that Top-I2P achieves superior registration performance in open-domain scenarios.

YNIMG Journal 2024 Journal Article

Investigating unilateral and bilateral motor imagery control using electrocorticography and fMRI in awake craniotomy

  • Jie Ma
  • Zhengsheng Li
  • Qian Zheng
  • Shichen Li
  • Rui Zong
  • Zhizhen Qin
  • Li Wan
  • Zhenyu Zhao

BACKGROUND: The rapid development of neurosurgical techniques, such as awake craniotomy, has increased opportunities to explore the mysteries of the brain. This is crucial for deepening our understanding of motor control and imagination processes, especially in developing brain-computer interface (BCI) technologies and improving neurorehabilitation strategies for neurological disorders. OBJECTIVE: This study aimed to analyze brain activity patterns in patients undergoing awake craniotomy during actual movements and motor imagery, mainly focusing on the motor control processes of the bilateral limbs. METHODS: We conducted detailed observations of patients undergoing awake craniotomies. The experimenter requested participants to perform and imagine a series of motor tasks involving their hands and tongues. Brain activity during these tasks was recorded using functional magnetic resonance imaging (fMRI) and intraoperative electrocorticography (ECoG). The study included left and right finger tapping, tongue protrusion, hand clenching, and imagined movements corresponding to these actions. RESULTS: fMRI revealed significant activation in the brain's motor areas during task performance, mainly involving bilateral brain regions during imagined movement. ECoG data demonstrated a marked desynchronization pattern in the ipsilateral motor cortex during bilateral motor imagination, especially in bilateral coordination tasks. This finding suggests a potential controlling role of the unilateral cerebral cortex in bilateral motor imagination. CONCLUSION: Our study highlights the unilateral cerebral cortex's significance in controlling bilateral limb motor imagination, offering new insights into future brain network remodeling in patients with hemiplegia. Additionally, these findings provide important insights into understanding motor imagination and its impact on BCI and neurorehabilitation.

NeurIPS Conference 2024 Conference Paper

IR-CM: The Fast and General-purpose Image Restoration Method Based on Consistency Model

  • Xiaoxuan Gong
  • Jie Ma

This paper proposes a fast and general-purpose image restoration method. The key idea is to achieve few-step or even one-step inference by conducting consistency distillation or training on a specific mean-reverting stochastic differential equation. Furthermore, based on this, we propose a novel linear-nonlinear decoupling training strategy, significantly enhancing training effectiveness and surpassing consistency distillation in inference performance. This allows our method to be independent of any pre-trained checkpoint, enabling it to serve as an effective standalone image-to-image transformation model. Finally, to avoid trivial solutions and stabilize model training, we introduce a simple origin-guided loss. To validate the effectiveness of our proposed method, we conducted experiments on tasks including image deraining, denoising, deblurring, and low-light image enhancement. The experiments show that our method achieves highly competitive results with only one-step inference, and with just two-step inference it can achieve state-of-the-art performance in low-light image enhancement. Furthermore, a number of ablation experiments demonstrate the effectiveness of the proposed training strategy. Our code is available at https://github.com/XiaoxuanGong/IR-CM.

NeurIPS Conference 2024 Conference Paper

Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering

  • Jie Ma
  • Min Hu
  • Pinghui Wang
  • Wangchun Sun
  • Lingyun Song
  • Hongbin Pei
  • Jun Liu
  • Youtian Du

Audio-Visual Question Answering (AVQA) is a complex multi-modal reasoning task, demanding intelligent systems to accurately respond to natural language queries based on audio-video input pairs. Nevertheless, prevalent AVQA approaches are prone to overlearning dataset biases, resulting in poor robustness. Furthermore, current datasets may not provide a precise diagnostic for these methods. To tackle these challenges, firstly, we propose a novel dataset, MUSIC-AVQA-R, crafted in two steps: rephrasing questions within the test split of a public dataset (MUSIC-AVQA) and subsequently introducing distribution shifts to split questions. The former leads to a large, diverse test space, while the latter results in a comprehensive robustness evaluation on rare, frequent, and overall questions. Secondly, we propose a robust architecture that utilizes a multifaceted cycle collaborative debiasing strategy to overcome bias learning. Experimental results show that this architecture achieves state-of-the-art performance on MUSIC-AVQA-R, notably obtaining a significant improvement of 9.32%. Extensive ablation experiments are conducted on the two datasets mentioned to analyze the component effectiveness within the debiasing strategy. Additionally, we highlight the limited robustness of existing multi-modal QA methods through the evaluation on our dataset. We also conduct experiments combining various baselines with our proposed strategy on two datasets to verify its plug-and-play capability. Our dataset and code are available at https://github.com/reml-group/MUSIC-AVQA-R.

ICRA Conference 2024 Conference Paper

Towards Visibility Estimation and Noise-Distribution-Based Defogging for LiDAR in Autonomous Driving

  • Jie Zhan
  • Yucong Duan
  • Junfeng Ding
  • Xuzhong Hu
  • Xiao Huang
  • Jie Ma

Point clouds play a crucial role in robots and intelligent vehicles. Noise caused by fog droplets seriously degrades the quality of point clouds. Previous research has shown that the extent of degradation is correlated with visibility, and the fog attenuation coefficient is associated with visibility. In light of this background, this paper proposes a noise-distribution-based defogging method for point clouds. Our approach hinges on the estimation of the fog attenuation coefficient, facilitated by road-based prior knowledge. Subsequently, our method fuses the fog-induced noise distribution inferred from the LiDAR imaging model with the spatially non-uniform distribution of point clouds caused by the LiDAR structure. The fused results are fed into a statistical filter based on the relative sparsity of noise to achieve defogging. This paper is one of the early works focusing on point cloud defogging. Its core insight lies in the estimation of the attenuation coefficient and the use of the fog-induced noise distribution for defogging. Experiments demonstrate that our method accurately mitigates the impact of fog while enhancing the performance of 3D object detection networks.

EAAI Journal 2023 Journal Article

Conditional temporal GAN for intent-aware vessel trajectory prediction in the precautionary area

  • Chengfeng Jia
  • Jie Ma

Accurate vessel trajectory prediction is crucial for ensuring maritime traffic safety and efficiency, particularly in precautionary areas characterized by multi-waterway branch merging and frequent traffic conflicts. However, predicting trajectories in these areas poses significant challenges because the uncertain future intents of different directional branches result in diverse motion patterns and multiple possible paths. To address this issue and minimize the prediction error, we propose a conditional temporal generative adversarial network (CTGAN). Specifically, in this method, the trajectory generator is developed to capture the inherent dynamics of ship motions and output future trajectory proposals, while the intent classifier is designed to evaluate whether the trajectory proposals are consistent with the hidden intention. With the adversarial training strategy, the trajectory generator and intent classifier form a closed loop, feeding informative signals back to each other, which enables generating intent-constrained trajectories. In addition, a mixed adversarial loss function is designed to capture the spatial–temporal dependencies among vessel motions, producing consistent trajectories that comply with plausible ship dynamics. Experiments on extensive naturalistic vessel trajectory data demonstrate that, compared with the baseline methods, the proposed model achieves comparable or better prediction performance.

JBHI Journal 2023 Journal Article

Multi-Task Learning With Hierarchical Guidance for Locating and Stratifying Submucosal Tumors

  • Ruifei Zhang
  • Feng Zhang
  • Si Qin
  • Dejun Fan
  • Chaowei Fang
  • Jie Ma
  • Xiang Wan
  • Guanbin Li

Locating and stratifying submucosal tumors of the digestive tract from endoscopic ultrasound (EUS) images is of vital significance to the preliminary diagnosis of tumors. However, the above problems are challenging due to the poor appearance contrast between different layers of the digestive tract wall (DTW) and the narrowness of each layer. Few existing deep-learning-based diagnosis algorithms are devised to tackle this issue. In this article, we build a multi-task framework for simultaneously locating and stratifying the submucosal tumor. Considering that awareness of the DTW is critical to the localization and stratification of the tumor, we integrate the DTW segmentation task into the proposed multi-task framework. Besides sharing a common backbone model, the three tasks are explicitly directed with a hierarchical guidance module, in which the probability map of the DTW itself is used to locally enhance the feature representation for tumor localization, and the probability maps of the DTW and tumor are jointly employed to locally enhance the feature representation for tumor stratification. Moreover, by means of a dynamic class activation map, the probability maps of the DTW and tumor are reused to enforce the stratification inference process to pay more attention to DTW and tumor regions, contributing to a reliable and interpretable submucosal tumor stratification model. Additionally, considering that relations with respect to other structures are beneficial for stratifying tumors, we devise a graph reasoning module to replenish non-local relation knowledge for the stratification branch. Experiments on a Stomach-Esophagus and an Intestinal EUS dataset prove that our method achieves very appealing performance on both tumor localization and stratification, significantly outperforming state-of-the-art object detection approaches.

NeurIPS Conference 2023 Conference Paper

Structured Federated Learning through Clustered Additive Modeling

  • Jie Ma
  • Tianyi Zhou
  • Guodong Long
  • Jing Jiang
  • Chengqi Zhang

Heterogeneous federated learning without assuming any structure is challenging due to the conflicts among non-identical data distributions of clients. In practice, clients often comprise near-homogeneous clusters, so training a server-side model per cluster mitigates the conflicts. However, FL with client clustering often suffers from "clustering collapse", i.e., one cluster's model excels on an increasing number of clients and reduces to single-model FL. Moreover, cluster-wise models hinder knowledge sharing between clusters, and each model depends on fewer clients. Furthermore, the static clustering assumption on data may not hold for dynamically changing models, which are sensitive to cluster imbalance/initialization or outliers. To address these challenges, we propose "Clustered Additive Modeling (CAM)", which applies a globally shared model $\Theta_g$ on top of the cluster-wise models $\Theta_{1:K}$, i.e., $y=h(x; \Theta_g)+f(x; \Theta_k)$ for clients of cluster-$k$. The global model captures the features shared by all clusters, so $\Theta_{1:K}$ are enforced to focus on the differences among clusters. To train CAM, we develop a novel Fed-CAM algorithm that alternates between client clustering and training the global/cluster models to predict the residual of each other. We can easily modify any existing clustered FL method with CAM and significantly improve its performance without "clustering collapse" in different non-IID settings. We also provide a convergence analysis of the Fed-CAM algorithm.
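The additive decomposition $y=h(x; \Theta_g)+f(x; \Theta_k)$ above can be sketched in a few lines of NumPy. This is a minimal illustration only: the linear forms of h and f, the dimensions, and the random parameters are assumptions for the sketch, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 8, 3                        # feature dimension, number of clusters (illustrative)

theta_g = rng.normal(size=d)       # global model Theta_g, shared by all clusters
theta_k = rng.normal(size=(K, d))  # cluster-wise models Theta_1..Theta_K

def predict(x, k):
    """CAM prediction for a client assigned to cluster k: y = h(x) + f_k(x)."""
    h = x @ theta_g                # shared component: features common to all clusters
    f = x @ theta_k[k]             # cluster component: residual difference of cluster k
    return h + f

x = rng.normal(size=d)
preds = [predict(x, k) for k in range(K)]
# Every cluster's prediction contains the same shared term h(x);
# predictions differ only through the cluster-wise residual f(x; Theta_k).
```

In Fed-CAM training, the two components would be fit alternately, each predicting the residual of the other; the sketch only shows the forward decomposition.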

AAAI Conference 2022 Conference Paper

Distribution Aware VoteNet for 3D Object Detection

  • Junxiong Liang
  • Pei An
  • Jie Ma

Occlusion is common in real 3D scenes, causing boundary ambiguity of the targeted object. This uncertainty brings difficulty for labeling and learning. Current 3D detectors predict the bounding box directly, regarding it as a Dirac delta distribution. However, this does not fully account for such ambiguity. To deal with it, distribution learning is used to efficiently represent the boundary ambiguity. In this paper, we revise the common regression method by predicting the distribution of the 3D box and then present a distribution-aware regression (DAR) module for box refinement and localization quality estimation. It contains a scale-adaptive (SA) encoder and a joint localization quality estimator (JLQE). With its adaptive receptive field, the SA encoder refines discriminative features for precise distribution learning. JLQE provides a reliable location score by further leveraging the distribution statistics, correlating with the localization quality of the targeted object. Combining the DAR module with the baseline VoteNet, we propose a novel 3D detector called DAVNet. Extensive experiments on both the ScanNet V2 and SUN RGB-D datasets demonstrate that the proposed DAVNet achieves significant improvement and outperforms state-of-the-art 3D detectors.

NeurIPS Conference 2022 Conference Paper

Federated Learning from Pre-Trained Models: A Contrastive Learning Approach

  • Yue Tan
  • Guodong Long
  • Jie Ma
  • Lu Liu
  • Tianyi Zhou
  • Jing Jiang

Federated Learning (FL) is a machine learning paradigm that allows decentralized clients to learn collaboratively without sharing their private data. However, excessive computation and communication demands pose challenges to current FL frameworks, especially when training large-scale models. To prevent these issues from hindering the deployment of FL systems, we propose a lightweight framework where clients jointly learn to fuse the representations generated by multiple fixed pre-trained models rather than training a large-scale model from scratch. This leads us to a more practical FL problem by considering how to capture more client-specific and class-relevant information from the pre-trained models and jointly improve each client's ability to exploit those off-the-shelf models. Here, we design a Federated Prototype-wise Contrastive Learning (FedPCL) approach which shares knowledge across clients through their class prototypes and builds client-specific representations in a prototype-wise contrastive manner. Sharing prototypes rather than learnable model parameters allows each client to fuse the representations in a personalized way while keeping the shared knowledge in a compact form for efficient communication. We perform a thorough evaluation of the proposed FedPCL in the lightweight framework, measuring and visualizing its ability to fuse various pre-trained models on popular FL datasets.
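A minimal sketch of the prototype idea, assuming mean-pooled class prototypes and an InfoNCE-style loss; the shapes, temperature, and helper names are illustrative, not the paper's exact formulation:

```python
import numpy as np

def class_prototypes(feats, labels, num_classes):
    """Mean embedding per class: the compact knowledge a client would share."""
    return np.stack([feats[labels == c].mean(axis=0) for c in range(num_classes)])

def proto_contrastive_loss(feat, label, protos, tau=0.5):
    """Pull a representation toward its own class prototype, push from others."""
    unit = lambda v: v / np.linalg.norm(v)
    sims = np.array([unit(feat) @ unit(p) for p in protos]) / tau
    sims -= sims.max()                         # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum())
    return -log_probs[label]                   # cross-entropy over prototype similarities

rng = np.random.default_rng(0)
feats = rng.normal(size=(20, 16))              # local fused representations
labels = np.arange(20) % 4                     # four classes, balanced
protos = class_prototypes(feats, labels, num_classes=4)
loss = proto_contrastive_loss(feats[0], labels[0], protos)
print(protos.shape, loss >= 0)  # (4, 16) True
```

Communicating only the `(num_classes, dim)` prototype matrix instead of model parameters is what keeps the exchanged knowledge compact.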

ICLR Conference 2022 Conference Paper

Pareto Policy Pool for Model-based Offline Reinforcement Learning

  • Yijun Yang
  • Jing Jiang 0002
  • Tianyi Zhou 0001
  • Jie Ma
  • Yuhui Shi 0001

Online reinforcement learning (RL) can suffer from poor exploration, sparse reward, insufficient data, and overhead caused by inefficient interactions between an immature policy and a complicated environment. Model-based offline RL instead trains an environment model using a dataset of pre-collected experiences so online RL methods can learn in an offline manner by solely interacting with the model. However, the uncertainty and accuracy of the environment model can drastically vary across different state-action pairs, so the RL agent may achieve a high model return but perform poorly in the true environment. Unlike previous works that need to carefully tune the trade-off between the model return and uncertainty in a single objective, we study a bi-objective formulation for model-based offline RL that aims at producing a pool of diverse policies on the Pareto front performing different levels of trade-offs, which provides the flexibility to select the best policy for each realistic environment from the pool. Our method, "Pareto Policy Pool (P3)", does not need to tune the trade-off weight but can produce policies allocated at different regions of the Pareto front. For this purpose, we develop an efficient algorithm that solves multiple bi-objective optimization problems with distinct constraints defined by reference vectors targeting diverse regions of the Pareto front. We theoretically prove that our algorithm can converge to the targeted regions. To obtain more Pareto optimal policies without linearly increasing the cost, we leverage the achieved policies as initialization to find more Pareto optimal policies in their neighborhoods. On the D4RL benchmark for offline RL, P3 substantially outperforms several recent baseline methods over multiple tasks, especially when the quality of the pre-collected experiences is low.
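The notion of a pool of policies on the Pareto front of (model return, model uncertainty) can be illustrated with a plain non-dominated filter; the candidate values below are made up, and P3 itself solves constrained bi-objective problems rather than filtering a fixed set:

```python
import numpy as np

def pareto_front(points):
    """Indices of non-dominated points when maximizing return (col 0)
    and minimizing uncertainty (col 1)."""
    idx = []
    for i, (r_i, u_i) in enumerate(points):
        dominated = any(
            (r_j >= r_i and u_j <= u_i) and (r_j > r_i or u_j < u_i)
            for j, (r_j, u_j) in enumerate(points) if j != i
        )
        if not dominated:
            idx.append(i)
    return idx

# Hypothetical policies evaluated as (model return, model uncertainty).
policies = np.array([
    [1.0, 0.2],
    [1.5, 0.5],
    [0.8, 0.1],
    [1.4, 0.6],   # dominated by (1.5, 0.5): lower return, higher uncertainty
    [2.0, 0.9],
])
print(pareto_front(policies))  # [0, 1, 2, 4]
```

Each surviving index corresponds to a different trade-off level, which is the flexibility the pool is meant to provide at deployment time.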

ICLR Conference 2021 Conference Paper

Structured Prediction as Translation between Augmented Natural Languages

  • Giovanni Paolini
  • Ben Athiwaratkun
  • Jason Krone
  • Jie Ma
  • Alessandro Achille
  • Rishita Anubhai
  • Cícero Nogueira dos Santos
  • Bing Xiang

We propose a new framework, Translation between Augmented Natural Languages (TANL), to solve many structured prediction language tasks including joint entity and relation extraction, nested named entity recognition, relation classification, semantic role labeling, event extraction, coreference resolution, and dialogue state tracking. Instead of tackling the problem by training task-specific discriminative classifiers, we frame it as a translation task between augmented natural languages, from which the task-relevant information can be easily extracted. Our approach can match or outperform task-specific models on all tasks, and in particular achieves new state-of-the-art results on joint entity and relation extraction (CoNLL04, ADE, NYT, and ACE2005 datasets), relation classification (FewRel and TACRED), and semantic role labeling (CoNLL-2005 and CoNLL-2012). We accomplish this while using the same architecture and hyperparameters for all tasks, and even when training a single model to solve all tasks at the same time (multi-task learning). Finally, we show that our framework can also significantly improve the performance in a low-resource regime, thanks to better use of label semantics.
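The "augmented natural language" idea is that task output stays readable text from which structure is easy to extract. The bracket markup below is an assumption in the spirit of TANL, not the paper's exact format:

```python
import re

# Illustrative augmented-language output: each entity span is bracketed
# together with its type (hypothetical markup, for illustration only).
augmented = "[ Tolkien | person ] wrote [ The Lord of the Rings | book ]"

def extract_entities(text):
    """Pull (span, type) pairs out of the augmented output string."""
    return [
        (span.strip(), etype.strip())
        for span, etype in re.findall(r"\[([^|\]]+)\|([^\]]+)\]", text)
    ]

print(extract_entities(augmented))
# [('Tolkien', 'person'), ('The Lord of the Rings', 'book')]
```

Because the markup stays close to natural language, a single sequence-to-sequence model can emit it for many structured prediction tasks, and the task-relevant structure is recovered with simple post-processing like this.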

JBHI Journal 2020 Journal Article

Automatic CIN Grades Prediction of Sequential Cervigram Image Using LSTM With Multistate CNN Features

  • Zijie Yue
  • Shuai Ding
  • Weidong Zhao
  • Hao Wang
  • Jie Ma
  • Youtao Zhang
  • Yanchun Zhang

Cervical cancer ranks as the second most common cancer in women worldwide. In clinical practice, colposcopy is an indispensable part of screening for cervical intraepithelial neoplasia (CIN) grades and cervical cancer but exhibits a high misdiagnosis rate. Existing computer-assisted algorithms for analyzing cervigram images have neglected that colposcopy is a sequential and multistate process, which makes them unsuitable for clinical applications. In this work, we construct a cervigram-based recurrent convolutional neural network (C-RCNN) to classify different CIN grades and cervical cancer. Convolutional neural networks are leveraged to extract spatial features. We develop a sequence-encoding module to encode discriminative temporal features and a multistate-aware convolutional layer to integrate features from different states of cervigram images. To train and evaluate the performance of C-RCNN, we leveraged a dataset of 4,753 real cervigrams and obtained 96.13% test accuracy with a specificity and sensitivity of 98.22% and 95.09%, respectively. The areas under the receiver operating characteristic curves are all above 0.94, proving that visual representations and sequential dynamics can be jointly and effectively optimized in the training phase. Comparative analysis demonstrated the effectiveness of the proposed C-RCNN against competing methods, showing significant improvement over focusing on only a single frame. This architecture can be extended to other applications in medical image analysis.

IJCAI Conference 2020 Conference Paper

Stochastic Batch Augmentation with An Effective Distilled Dynamic Soft Label Regularizer

  • Qian Li
  • Qingyuan Hu
  • Yong Qi
  • Saiyu Qi
  • Jie Ma
  • Jian Zhang

Data augmentation has been intensively used in training deep neural networks to improve generalization, whether in the original space (e.g., image space) or the representation space. Despite this success, the connection between the synthesized data and the original data is largely ignored in training: the synthesized samples surround the original sample, but this distribution information is not exploited, so the behavior of the network is not optimized for it. However, that behavior is crucially important for generalization, and even, in the adversarial setting, for the safety of the deep learning system. In this work, we propose a framework called Stochastic Batch Augmentation (SBA) to address these problems. SBA stochastically decides whether to augment at each iteration, controlled by a batch scheduler, and introduces a "distilled" dynamic soft label regularization that incorporates the similarity in the vicinity distribution with respect to the raw samples. The proposed regularization provides direct supervision via the KL-divergence between the output softmax distributions of the original and virtual data. Our experiments on CIFAR-10, CIFAR-100, and ImageNet show that SBA can improve the generalization of neural networks and speed up the convergence of network training.
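The KL-divergence supervision between the output distributions of an original sample and its augmented copy can be sketched as follows; the logits and function names are hypothetical, not the paper's implementation:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two categorical distributions."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

# Hypothetical logits for an original sample and its augmented (virtual) copy.
logits_orig = np.array([2.0, 0.5, -1.0])
logits_aug = np.array([1.6, 0.9, -0.7])

p = softmax(logits_orig)   # soft label distilled from the raw sample
q = softmax(logits_aug)    # prediction on the synthesized neighbor

# The regularizer penalizes disagreement between the two output distributions,
# encouraging consistent behavior in the vicinity of each raw sample.
reg = kl_divergence(p, q)
print(reg >= 0)  # True
```

Adding `reg` to the task loss is what supplies the direct supervision on synthesized samples that plain augmentation pipelines omit.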

AAAI Conference 2019 System Paper

A General Planning-Based Framework for Goal-Driven Conversation Assistant

  • Zhuoxuan Jiang
  • Jie Ma
  • Jingyi Lu
  • Guangyuan Yu
  • Yipeng Yu
  • Shaochun Li

We propose a general framework for a goal-driven conversation assistant based on Planning methods. It aims to rapidly build a dialogue agent with less handcrafting and to make dialogue management more interpretable and efficient in various scenarios. By employing the Planning method, dialogue actions can be efficiently defined and reused, and the transitions of the dialogue are managed by a Planner. The proposed framework consists of a pipeline of Natural Language Understanding (an intent labeler), Planning of Actions (with a World Model), and Natural Language Generation (learned by an attention-based neural network). We demonstrate our approach by creating conversational agents for several independent domains.

ICRA Conference 2011 Conference Paper

Compliant fixture layout design using topology optimization method

  • Jie Ma
  • Michael Yu Wang
  • Xiangyang Zhu

The deformation of the workpiece-fixture system has an essential influence on the locating accuracy of the workpiece, so minimizing the overall deformation of the workpiece-fixture system is an important issue in fixture design. This paper focuses on fixture layout design with a compliant model. A topology optimization approach is presented to reduce the complexity introduced by the high computational cost of solving the finite element equations and the exhaustive search over the point-set domain. Based on finite element analysis, algorithms are developed for the optimization problem of locator synthesis in the point-set domain. Numerical examples are presented to verify the effectiveness of the proposed approach.