Arrow Research search

Author name cluster

Jun Guo

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

31 papers
1 author row

Possible papers

31

JBHI Journal 2026 Journal Article

ECG-MTDA: A Framework for Morphology-Temporal Decoupling and LLM-Guided Text Alignment

  • Huimin Zheng
  • Jun Guo
  • Xin Zhang
  • Xiaofen Xing
  • Xiangmin Xu

Effective automated electrocardiogram (ECG) interpretation hinges on disentangling waveform morphology from rhythm dynamics, a challenge for existing multimodal models that often conflate these heterogeneous attributes and introduce semantic ambiguity. We introduce ECG-MTDA, a framework that explicitly decouples these components. It learns morphology-oriented representations via a PQRST-guided masked autoencoder, while separately modeling temporal dynamics using continuous wavelet transform. Crucially, we align the learned morphology with concise, label-conditioned textual descriptions generated by a large language model (LLM) using a contrastive objective, creating a semantically grounded embedding space. ECG-MTDA demonstrates superior performance on the PTB-XL and CPSC 2018 benchmarks (e.g., AUC 93.16 on PTB-XL Superclass), with statistically significant gains over a strong multimodal baseline. Furthermore, on a challenging in-house cohort (n=620) for short-term paroxysmal atrial fibrillation (pAF) progression prediction, the model achieves an AUC of 0.97 $\pm$ 0.02 with high sensitivity (0.80 $\pm$ 0.04) and specificity (0.98 $\pm$ 0.01). Ablation studies and qualitative analyses confirm the benefits of our decoupled design and morphology-text alignment. Our results demonstrate that this clinically inspired decoupling strategy yields more precise and robust multimodal representations for complex ECG analysis, enhancing both diagnostic classification and near-term risk stratification.

NeurIPS Conference 2025 Conference Paper

Empirical Study on Robustness and Resilience in Cooperative Multi-Agent Reinforcement Learning

  • Simin Li
  • Zihao Mao
  • Hanxiao Li
  • Zonglei Jing
  • Zhuohang Bian
  • Jun Guo
  • Li Wang
  • Zhuoran Han

In cooperative Multi-Agent Reinforcement Learning (MARL), it is a common practice to tune hyperparameters in ideal simulated environments to maximize cooperative performance. However, policies tuned for cooperation often fail to maintain robustness and resilience under real-world uncertainties. Building trustworthy MARL systems requires a deep understanding of robustness, which ensures stability under uncertainties, and resilience, the ability to recover from disruptions, a concept extensively studied in control systems but largely overlooked in MARL. In this paper, we present a large-scale empirical study comprising over 82,620 experiments to evaluate cooperation, robustness, and resilience in MARL across 4 real-world environments, 13 uncertainty types, and 15 hyperparameters. Our key findings are: (1) Under mild uncertainty, optimizing cooperation improves robustness and resilience, but this link weakens as perturbations intensify. Robustness and resilience also vary by algorithm and uncertainty type. (2) Robustness and resilience do not generalize across uncertainty modalities or agent scopes: policies robust to action noise for all agents may fail under observation noise on a single agent. (3) Hyperparameter tuning is critical for trustworthy MARL: surprisingly, standard practices like parameter sharing, GAE, and PopArt can hurt robustness, while early stopping, high critic learning rates, and Leaky ReLU consistently help. By optimizing hyperparameters only, we observe substantial improvement in cooperation, robustness, and resilience across all MARL backbones, with the phenomenon also generalizing to robust MARL methods across these backbones.

NeurIPS Conference 2024 Conference Paper

Animal-Bench: Benchmarking Multimodal Video Models for Animal-centric Video Understanding

  • Yinuo Jing
  • Ruxu Zhang
  • Kongming Liang
  • Yongxiang Li
  • Zhongjiang He
  • Zhanyu Ma
  • Jun Guo

With the emergence of large pre-trained multimodal video models, multiple benchmarks have been proposed to evaluate model capabilities. However, most of the benchmarks are human-centric, with evaluation data and tasks centered around human applications. Animals are an integral part of the natural world, and animal-centric video understanding is crucial for animal welfare and conservation efforts. Yet, existing benchmarks overlook evaluations focused on animals, limiting the application of the models. To address this limitation, our work established an animal-centric benchmark, namely Animal-Bench, to allow for a comprehensive evaluation of model capabilities in real-world contexts, overcoming agent-bias in previous benchmarks. Animal-Bench includes 13 tasks encompassing both common tasks shared with humans and special tasks relevant to animal conservation, spanning 7 major animal categories and 819 species, comprising a total of 41,839 data entries. To generate this benchmark, we defined a task system centered on animals and proposed an automated pipeline for animal-centric data processing. To further validate the robustness of models against real-world challenges, we utilized a video editing approach to simulate realistic scenarios like weather changes and shooting parameters due to animal movements. We evaluated 8 current multimodal video models on our benchmark and found considerable room for improvement. We hope our work provides insights for the community and opens up new avenues for research in multimodal video models. Our data and code will be released at https://github.com/PRIS-CV/Animal-Bench.

AAAI Conference 2024 Conference Paper

Dual-Prior Augmented Decoding Network for Long Tail Distribution in HOI Detection

  • Jiayi Gao
  • Kongming Liang
  • Tao Wei
  • Wei Chen
  • Zhanyu Ma
  • Jun Guo

Human-object interaction (HOI) detection aims at localizing human-object pairs and recognizing their interactions. Trapped by the long-tailed distribution of the data, existing HOI detection methods often have difficulty recognizing the tail categories. Many approaches try to improve the recognition of HOI tasks by utilizing external knowledge (e.g., pre-trained visual-language models). However, these approaches mainly utilize external knowledge at the HOI combination level and achieve limited improvement in the tail categories. In this paper, we propose a dual-prior augmented decoding network by decomposing the HOI task into two sub-tasks: human-object pair detection and interaction recognition. For each subtask, we leverage external knowledge to enhance the model's ability at a finer granularity. Specifically, we acquire the prior candidates from an external classifier and embed them to assist the subsequent decoding process. Thus, the long-tail problem is mitigated from a coarse-to-fine level with the corresponding external knowledge. Our approach outperforms existing state-of-the-art models in various settings and significantly boosts the performance on the tail HOI categories. The source code is available at https://github.com/PRIS-CV/DP-ADN.

AAAI Conference 2023 Conference Paper

Bi-directional Feature Reconstruction Network for Fine-Grained Few-Shot Image Classification

  • Jijie Wu
  • Dongliang Chang
  • Aneeshan Sain
  • Xiaoxu Li
  • Zhanyu Ma
  • Jie Cao
  • Jun Guo
  • Yi-Zhe Song

The main challenge for fine-grained few-shot image classification is to learn feature representations with higher inter-class and lower intra-class variations, with a mere few labelled samples. Conventional few-shot learning methods however cannot be naively adopted for this fine-grained setting -- a quick pilot study reveals that they in fact push for the opposite (i.e., lower inter-class variations and higher intra-class variations). To alleviate this problem, prior works predominantly use a support set to reconstruct the query image and then utilize metric learning to determine its category. Upon careful inspection, we further reveal that such unidirectional reconstruction methods only help to increase inter-class variations and are not effective in tackling intra-class variations. In this paper, we for the first time introduce a bi-reconstruction mechanism that can simultaneously accommodate inter-class and intra-class variations. In addition to using the support set to reconstruct the query set for increasing inter-class variations, we further use the query set to reconstruct the support set for reducing intra-class variations. This design effectively helps the model to explore more subtle and discriminative features, which is key for the fine-grained problem at hand. Furthermore, we also construct a self-reconstruction module to work alongside the bi-directional module to make the features even more discriminative. Experimental results on three widely used fine-grained image classification datasets consistently show considerable improvements compared with other methods. Codes are available at: https://github.com/PRIS-CV/Bi-FRN.

AAAI Conference 2020 Conference Paper

Preserving Ordinal Consensus: Towards Feature Selection for Unlabeled Data

  • Jun Guo
  • Heng Chang
  • Wenwu Zhu

To better pre-process unlabeled data, most existing feature selection methods remove redundant and noisy information by exploring some intrinsic structures embedded in samples. However, these unsupervised studies focus too much on the relations among samples, totally neglecting the feature-level geometric information. This paper proposes an unsupervised triplet-induced graph to explore a new type of potential structure at feature level, and incorporates it into simultaneous feature selection and clustering. In the feature selection part, we design an ordinal consensus preserving term based on a triplet-induced graph. This term enforces the projection vectors to preserve the relative proximity of original features, which contributes to selecting more relevant features. In the clustering part, Self-Paced Learning (SPL) is introduced to gradually learn from ‘easy’ to ‘complex’ samples. SPL alleviates the dilemma of falling into the bad local minima incurred by noise and outliers. Specifically, we propose a compelling regularizer for SPL to obtain a robust loss. Finally, an alternating minimization algorithm is developed to efficiently optimize the proposed model. Extensive experiments on different benchmark datasets consistently demonstrate the superiority of our proposed method.

AAAI Conference 2019 Conference Paper

A Human-Like Semantic Cognition Network for Aspect-Level Sentiment Classification

  • Zeyang Lei
  • Yujiu Yang
  • Min Yang
  • Wei Zhao
  • Jun Guo
  • Yi Liu

In this paper, we propose a novel Human-like Semantic Cognition Network (HSCN) for aspect-level sentiment classification, motivated by the principles of human beings’ reading cognitive process (pre-reading, active reading, post-reading). We first design a word-level interactive perception module to capture the correlation between context words and the given target words, which can be regarded as pre-reading. Second, to mimic the process of active reading, we propose a target-aware semantic distillation module to produce the target-specific context representation for aspect-level sentiment prediction. Third, we further devise a semantic deviation metric module to measure the semantic deviation between the target-specific context representation and the given target, which evaluates the degree to which we understand the target-specific context semantics. The measured semantic deviation is then used to fine-tune the above active reading process in a feedback regulation way. To verify the effectiveness of our approach, we conduct extensive experiments on three widely used datasets. The experiments demonstrate that HSCN achieves impressive results compared to other strong competitors.

AAAI Conference 2019 Conference Paper

Anchors Bring Ease: An Embarrassingly Simple Approach to Partial Multi-View Clustering

  • Jun Guo
  • Jiahui Ye

Clustering on multi-view data has attracted much attention in the past decades. Most previous studies assume that each instance appears in all views, or there is at least one view containing all instances. However, real-world data often suffers from missing some instances in each view, leading to the research problem of partial multi-view clustering. To address this issue, this paper proposes a simple yet effective Anchor-based Partial Multi-view Clustering (APMC) method, which utilizes anchors to reconstruct instance-to-instance relationships for clustering. APMC is conceptually simple and easy to implement in practice; besides, it has clear intuitions and non-trivial empirical guarantees. Specifically, APMC firstly integrates intra- and inter-view similarities through anchors. Then, spectral clustering is performed on the fused similarities to obtain a unified clustering result. Compared with existing partial multi-view clustering methods, APMC has three notable advantages: 1) it can capture more non-linear relations among instances with the help of kernel-based similarities; 2) it has a much lower time complexity in virtue of a non-iterative scheme; 3) it can inherently handle data with negative entries as well as be extended to more than two views. Finally, we extensively evaluate the proposed method on five benchmark datasets. Experimental results demonstrate the superiority of APMC over state-of-the-art approaches.
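The anchor idea this abstract describes can be sketched in a few lines: build kernel-based instance-to-anchor similarities per view, average them over the views in which each instance is present, and recover a unified instance-to-instance similarity for spectral clustering. The following is only an illustrative sketch of that pipeline, not the paper's implementation; the RBF kernel, the masking scheme, and all function names are assumptions.

```python
import numpy as np

def anchor_similarity(X, anchors, gamma=1.0):
    # Kernel-based instance-to-anchor similarity (RBF assumed here),
    # with each row normalized to sum to one.
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=-1)
    Z = np.exp(-gamma * d2)
    return Z / Z.sum(axis=1, keepdims=True)

def fuse_partial_views(Z_views, masks):
    # Average each instance's anchor similarities over the views in which
    # it is present (masks[v][i] is True if instance i appears in view v),
    # then recover an instance-to-instance similarity S = Z Z^T.
    n, m = Z_views[0].shape
    Z = np.zeros((n, m))
    counts = np.zeros(n)
    for Zv, mv in zip(Z_views, masks):
        Z[mv] += Zv[mv]
        counts += mv
    Z /= counts[:, None]          # assumes every instance has >= 1 view
    return Z @ Z.T
```

Spectral clustering would then run on the fused similarity matrix; because missing instances are handled by simple averaging rather than iterative completion, the scheme stays non-iterative, matching the low time complexity the abstract claims.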

AAAI Conference 2018 Conference Paper

Dependence Guided Unsupervised Feature Selection

  • Jun Guo
  • Wenwu Zhu

In the past decade, various sparse learning based unsupervised feature selection methods have been developed. However, most existing studies adopt a two-step strategy, i.e., selecting the top-m features according to a calculated descending order and then performing K-means clustering, resulting in a group of sub-optimal features. To address this problem, we propose a Dependence Guided Unsupervised Feature Selection (DGUFS) method to select features and partition data in a joint manner. Our proposed method enhances the interdependence among original data, cluster labels, and selected features. In particular, a projection-free feature selection model is proposed based on ℓ2,0-norm equality constraints. We utilize the learned cluster labels to fill in the information gap between original data and selected features. Two dependence guided terms are consequently proposed for our model. More specifically, one term increases the dependence of desired cluster labels on original data, while the other term maximizes the dependence of selected features on cluster labels to guide the process of feature selection. Last but not least, an iterative algorithm based on the Alternating Direction Method of Multipliers (ADMM) is designed to solve the constrained minimization problem efficiently. Extensive experiments on different datasets consistently demonstrate that our proposed method significantly outperforms state-of-the-art baselines.
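For contrast, the two-step strategy this abstract argues against can be made concrete: score every feature in isolation, keep the top-m, and only then cluster the reduced data. Below is a minimal illustration of that baseline, not of DGUFS itself; variance is used as a stand-in score, and the function name is hypothetical.

```python
import numpy as np

def two_step_select(X, m):
    # Step 1: rank features by a per-feature score computed independently
    # of clustering (variance here, standing in for a sparse-learning score).
    scores = X.var(axis=0)
    top_m = np.argsort(scores)[::-1][:m]
    # Step 2 (K-means on X[:, top_m]) would then run separately -- exactly
    # the decoupling DGUFS avoids by selecting and clustering jointly.
    return X[:, top_m], top_m
```

Because step 1 never sees the cluster structure, the selected features can be sub-optimal for the clustering that follows, which is the gap the joint formulation targets.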

AAAI Conference 2018 Conference Paper

Partial Multi-View Outlier Detection Based on Collective Learning

  • Jun Guo
  • Wenwu Zhu

In the past decade, various multi-view outlier detection methods have been designed to detect horizontal outliers that exhibit inconsistent across-view characteristics. Existing works assume that all objects are present in all views. However, in real-world applications, every view may suffer from missing samples, resulting in partial objects from which it is difficult to detect outliers. To address this problem, we propose a novel Collective Learning (CL) based framework to detect outliers from partial multi-view data in a self-guided way. More specifically, by exploiting the inter-dependence among different views, we develop a learning-based algorithm to reconstruct missing samples. Furthermore, we propose similarity-based outlier detection to break through the dilemma that the number of clusters is unknown a priori. Then, the calculated outlier scores act as the confidence levels in CL and in turn guide the reconstruction of missing data. Learning-based missing sample recovery and similarity-based outlier detection are iteratively performed in a self-guided manner. Experimental results on benchmark datasets show that our proposed approach consistently and significantly outperforms state-of-the-art baselines.

IJCAI Conference 2018 Conference Paper

Robust Auto-Weighted Multi-View Clustering

  • Pengzhen Ren
  • Yun Xiao
  • Pengfei Xu
  • Jun Guo
  • Xiaojiang Chen
  • Xin Wang
  • Dingyi Fang

Multi-view clustering has played a vital role in real-world applications. It aims to cluster the data points into different groups by exploring complementary information from multiple views. A major challenge of this problem is how to learn an explicit cluster structure from multiple views when there is considerable noise. To solve this challenging problem, we propose a novel Robust Auto-weighted Multi-view Clustering (RAMC) method, which aims to learn an optimal graph with exactly k connected components, where k is the number of clusters. The ℓ1-norm is employed for the robustness of the proposed algorithm, which we validate in the experiments. The new graph learned by the proposed model approximates the original graphs of each individual view but maintains an explicit cluster structure. With this optimal graph, we can immediately obtain the clustering results without any further post-processing. We conduct extensive experiments to confirm the superiority and robustness of the proposed algorithm.

AAAI Conference 2017 Conference Paper

Building an End-to-End Spatial-Temporal Convolutional Network for Video Super-Resolution

  • Jun Guo
  • Hongyang Chao

We propose an end-to-end deep network for video super-resolution. Our network is composed of a spatial component that encodes intra-frame visual patterns, a temporal component that discovers inter-frame relations, and a reconstruction component that aggregates information to predict details. We make the spatial component deep, so that it can better leverage spatial redundancies for rebuilding high-frequency structures. We organize the temporal component in a bidirectional and multi-scale fashion, to better capture how frames change across time. The effectiveness of the proposed approach is highlighted on two datasets, where we observe substantial improvements relative to the state of the art.

JBHI Journal 2017 Journal Article

HEp-2 Cell Classification via Combining Multiresolution Co-Occurrence Texture and Large Region Shape Information

  • Xianbiao Qi
  • Guoying Zhao
  • Chun-Guang Li
  • Jun Guo
  • Matti Pietikainen

Indirect immunofluorescence imaging of human epithelial type 2 (HEp-2) cells provides effective evidence for diagnosing autoimmune diseases. Recently, computer-aided diagnosis of autoimmune diseases by HEp-2 cell classification has attracted great attention. However, the HEp-2 cell classification task is quite challenging due to large intraclass and small interclass variations. In this paper, we propose an effective approach for automatic HEp-2 cell classification by combining multiresolution co-occurrence texture and large regional shape information. To be more specific, we propose to: 1) capture multiresolution co-occurrence texture information by a novel pairwise rotation-invariant co-occurrence of local Gabor binary pattern descriptor; 2) depict large regional shape information by using an improved Fisher vector model with RootSIFT features, which are sampled from large image patches in multiple scales; and 3) combine both features. We systematically evaluate the proposed approach on the IEEE International Conference on Pattern Recognition (ICPR) 2012, the IEEE International Conference on Image Processing (ICIP) 2013, and the ICPR 2014 contest datasets. The proposed method based on the combination of the introduced two features outperforms the winners of the ICPR 2012 contest using the same experimental protocol. Our method also greatly improves on the winner of the ICIP 2013 contest under four different experimental setups. Using the leave-one-specimen-out evaluation strategy, our method achieves comparable performance with the winner of the ICPR 2014 contest, which combined four features.

AAAI Conference 2016 Conference Paper

Discriminative Analysis Dictionary Learning

  • Jun Guo
  • Yanqing Guo
  • Xiangwei Kong
  • Man Zhang
  • Ran He

Dictionary learning (DL) has been successfully applied to various pattern classification tasks in recent years. However, analysis dictionary learning (ADL), as a major branch of DL, has not yet been fully exploited in classification due to its poor discriminability. This paper presents a novel DL method, namely Discriminative Analysis Dictionary Learning (DADL), to improve the classification performance of ADL. First, a code consistent term is integrated into the basic analysis model to improve discriminability. Second, a triplet-constraint-based local topology preserving loss function is introduced to capture the discriminative geometrical structures embedded in data. Third, the correntropy induced metric is employed as a robust measure to better control outliers for classification. Then, half-quadratic minimization and an alternating search strategy are used to speed up the optimization process so that there exist closed-form solutions in each alternating minimization stage. Experiments on several commonly used databases show that our proposed method not only significantly improves the discriminative ability of ADL, but also outperforms state-of-the-art synthesis DL methods.

AAAI Conference 2015 Conference Paper

Building Effective Representations for Sketch Recognition

  • Jun Guo
  • Changhu Wang
  • Hongyang Chao

With the popularity of touch-screen devices, understanding a user’s hand-drawn sketch has become an increasingly important research topic in artificial intelligence and computer vision. However, different from natural images, hand-drawn sketches are often highly abstract, with sparse visual information and large intra-class variance, making the problem more challenging. In this work, we study how to build effective representations for sketch recognition. First, to capture salient patterns of different scales and spatial arrangements, a Gabor-based low-level representation is proposed. Then, based on this representation, to discover more complex patterns in a sketch, a Hybrid Multilayer Sparse Coding (HMSC) model is proposed to learn mid-level representations. An improved dictionary learning algorithm is also leveraged in HMSC to reduce overfitting to common but trivial patterns. Extensive experiments show that the proposed representations are highly discriminative and lead to large improvements over the state of the art.

AAAI Conference 2015 Conference Paper

VecLP: A Realtime Video Recommendation System for Live TV Programs

  • Sheng Gao
  • Dai Zhang
  • Honggang Zhang
  • Chao Huang
  • Yongsheng Zhang
  • Jianxin Liao
  • Jun Guo

In this paper, we propose VecLP, a novel Internet video recommendation system for live TV programs. Given little information on the live TV programs, our proposed VecLP system can effectively collect necessary information on both the programs and the subscribers as well as a large volume of related online videos, and then recommend the relevant Internet videos to the subscribers. For that, key frames are first detected from the live TV programs, and then visual and textual features are extracted from these frames to enhance the understanding of the TV broadcasts. Furthermore, by utilizing the subscribers’ profiles and their social relationships, a user preference model is constructed, which greatly improves the diversity of the recommendations in our system. The subscriber’s browsing history is also recorded and used to make further personalized recommendations. This work also illustrates how the proposed VecLP system achieves this in practice. Finally, we deploy several new recommendation strategies in the system to meet the special needs of diverse live TV programs and shed light on how to fuse these strategies.

AAAI Conference 2013 Conference Paper

A Maximum K-Min Approach for Classification

  • Mingzhi Dong
  • Liang Yin
  • Weihong Deng
  • Li Shang
  • Jun Guo
  • Honggang Zhang

In this paper, a general Maximum K-Min approach for classification is proposed. With the physical meaning of optimizing the classification confidence of the K worst instances, the Maximum K-Min Gain/Minimum K-Max Loss (MKM) criterion is introduced. To make the original optimization problem, which has a combinatorial number of constraints, computationally tractable, optimization techniques are adopted and a general compact representation lemma for the MKM criterion is summarized. Based on the lemma, a Nonlinear Maximum K-Min (NMKM) classifier and a Semi-supervised Maximum K-Min (SMKM) classifier are presented for the traditional classification task and the semi-supervised classification task, respectively. Based on the experimental results on publicly available datasets, our Maximum K-Min methods achieve competitive performance when compared against hinge loss classifiers.
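The "confidence of the K worst instances" criterion in this abstract can be made concrete with a short sketch: compute the signed margins and average the K smallest. This is only an illustrative reading of the criterion, assuming binary labels in {-1, +1} and a real-valued score function; the function name is hypothetical.

```python
import numpy as np

def k_min_margin(scores, labels, k):
    # Signed margin y_i * f(x_i): negative means misclassified,
    # small positive means correct but low-confidence.
    margins = labels * scores
    # The Maximum K-Min idea: the quantity to be maximized is the
    # (here, average) margin of the k worst-classified instances.
    return np.sort(margins)[:k].mean()
```

Maximizing this objective pushes the k hardest examples toward confident, correct classification, interpolating between a worst-case criterion (k=1) and the average margin over all instances (k=n).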