Arrow Research search

Author name cluster

Junwei Han

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

34 papers
1 author row

Possible papers

34

AAAI Conference 2026 Conference Paper

AURORA: Augmented Understanding via Structured Reasoning and Reinforcement Learning for Reference Audio-Visual Segmentation

  • Ziyang Luo
  • Nian Liu
  • Fahad Shahbaz Khan
  • Junwei Han

Reference Audio-Visual Segmentation (Ref-AVS) tasks challenge models to precisely locate sounding objects by integrating visual, auditory, and textual cues. Existing methods often lack genuine semantic understanding, tending to memorize fixed reasoning patterns. Furthermore, jointly training for reasoning and segmentation can compromise pixel-level precision. To address these issues, we introduce AURORA, a novel framework designed to enhance genuine reasoning and language comprehension in reference audio-visual segmentation. We employ a structured Chain-of-Thought (CoT) prompting mechanism to guide the model through a step-by-step reasoning process and introduce a novel segmentation feature distillation loss to effectively integrate these reasoning abilities without sacrificing segmentation performance. To further cultivate the model's genuine reasoning capabilities, we devise a two-stage training strategy: first, a "corrective reflective-style training" stage utilizes self-correction to enhance the quality of reasoning paths, followed by reinforcement learning via Group Reward Policy Optimization (GRPO) to bolster robustness in challenging scenarios. Experiments demonstrate that AURORA achieves state-of-the-art performance on Ref-AVS benchmarks and generalizes effectively to unreferenced segmentation.

AAAI Conference 2026 Conference Paper

UQ-ViT: Harmonizing Extreme Activations with Hardware-Friendly Uniform Quantization in Vision Transformers

  • Tao Jiang
  • Yucheng Jiang
  • Xiwen Yao
  • Gong Cheng
  • Junwei Han

Post-Training Quantization enables efficient Vision Transformer (ViT) deployment with a small amount of calibration data, and its prevalent use of uniform quantization harnesses AI accelerator matrix cores for high-speed inference. However, the application of uniform quantization is fundamentally challenged by the extreme non-uniformity of activation distributions. Specifically, the power-law nature of post-Softmax attention scores and the significant inter-channel variance in post-GELU activations create a dilemma for conventional quantization, as it struggles to preserve critical high-magnitude values without sacrificing overall precision. To resolve this core conflict, we introduce UQ-ViT (Uniform Quantization for Vision Transformers), a novel uniform quantization framework designed to reconcile high precision with hardware efficiency. Central to UQ-ViT are two operators: Dynamic Elimination of Maximum (DeMax) and Normalization Quantization (NormQuant). DeMax is a uniform-quantization operator for post-Softmax attention scores: it dynamically eliminates dominant values from the quantization range while preserving them, effectively mitigating quantization loss from the extreme values of the power-law distribution. NormQuant utilizes a per-channel quantization strategy during quantization and reverts to a per-tensor format for dequantization, achieving both high accuracy and computational efficiency. Crucially, it is applicable to any linear layer, enabling effective quantization of post-GELU activations in ViTs. Through extensive experiments on various ViTs and vision tasks, including image classification, object detection, and instance segmentation, we demonstrate that our proposed approach outperforms existing methods, achieving superior accuracy while ensuring hardware friendliness.
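As a rough illustration of the per-channel half of NormQuant's recipe, the sketch below uniformly quantizes a toy activation tensor with one (scale, zero-point) pair per channel. It shows only the generic per-channel uniform-quantization step; the scale folding that lets UQ-ViT revert to a per-tensor dequantization format, and the DeMax operator, are not shown, and all names here are illustrative rather than taken from the paper.

```python
import numpy as np

def quantize_per_channel(x, n_bits=8):
    """Uniform per-channel quantization: one (scale, zero-point) per channel.

    x: activations of shape (tokens, channels), as in a post-GELU tensor.
    Returns integer codes plus the per-channel parameters for dequantization.
    """
    qmax = 2 ** n_bits - 1
    lo = x.min(axis=0)                              # per-channel minimum
    hi = x.max(axis=0)                              # per-channel maximum
    scale = np.maximum(hi - lo, 1e-8) / qmax        # avoid zero-range channels
    zero = np.round(-lo / scale)                    # per-channel zero point
    q = np.clip(np.round(x / scale + zero), 0, qmax).astype(np.uint8)
    return q, scale, zero

def dequantize(q, scale, zero):
    return (q.astype(np.float32) - zero) * scale

rng = np.random.default_rng(0)
# Simulate post-GELU-like activations with large inter-channel variance.
x = rng.normal(size=(16, 4)) * np.array([0.1, 1.0, 5.0, 50.0])
q, s, z = quantize_per_channel(x)
err = np.abs(dequantize(q, s, z) - x).max()
```

A single per-tensor (scale, zero-point) pair on the same data would be dominated by the widest channel, so the low-variance channels would lose nearly all precision; that is the inter-channel-variance problem the abstract describes.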

JBHI Journal 2025 Journal Article

A Foundational fMRI Model for Representing Continuous Brain States

  • Li Yang
  • Lei Guo
  • Yixuan Yuan
  • Junwei Han
  • Xintao Hu
  • Tuo Zhang

Foundational models have significant potential to advance brain function research, particularly in understanding the dynamics of brain states. However, most existing models process brain signals within fixed time windows, restricting their ability to capture the full temporal complexity of brain activity. In this study, we propose BrainSN (Brain States Network), a novel fMRI foundational model designed to represent continuous brain state information and support diverse downstream tasks. First, leveraging a transformer-based architecture, BrainSN reconstructs input brain states across multiple time scales and predicts future brain activity, effectively capturing both short-term and long-term dependencies. Second, through multiple embeddings and a channel gating module, the model integrates brain state information and applies an attention mechanism to extract critical features. Additionally, we train BrainSN on 1,256 hours of resting-state and naturalistic stimulus fMRI data, enabling it to learn large-scale brain dynamics without relying on task-based paradigms. Without fine-tuning, BrainSN achieves 75.23% and 75.82% accuracy in autism and attention disorder diagnosis tasks, respectively, matching the performance of leading models pretrained on disease-specific data. After fine-tuning, it surpasses these models. In mental state decoding, BrainSN attains 95.31% accuracy without fine-tuning, outperforming the best models trained on large-scale task-based fMRI data. Furthermore, by analyzing BrainSN's embeddings in relation to movie stimuli, we demonstrate that the model effectively captures the semantic content of movie scenes embedded in fMRI signals and is highly sensitive to sequence order. These results highlight BrainSN's ability to model brain state dynamics and underscore its potential advantages for clinical diagnosis, treatment evaluation, and cognitive neuroscience research.

JBHI Journal 2025 Journal Article

Frequency-Aware B-Line and Pleural Line Analysis in Lung Ultrasound Videos

  • Kaihui Yang
  • Guangyu Guo
  • Ying Zhang
  • Linxuan Pang
  • Zhaohui Zheng
  • Ruyu Liu
  • Jin Ding
  • Dingwen Zhang

Accurately identifying B-lines and the pleural line (P-line) in lung ultrasound (LUS) videos is valuable for evaluating certain lung conditions. However, manual interpretation remains subjective and highly dependent on operator expertise. Existing deep learning methods often suffer from performance degradation due to speckle noise and motion artifacts. Moreover, the limited availability of LUS video data annotated for multiple diagnostic features such as B-lines and the P-line limits model development. Therefore, this paper introduces ILD-LUS, a new clinical LUS database built for interstitial lung disease (ILD) analysis with category-level labels, comprising 2,149 ultrasound videos (193,410 frames). Also, we construct an external test set based on the public Covid-BLUES dataset for the evaluation of B-line and P-line recognition in different pulmonary pathologies. Then, we propose a novel video analysis framework that integrates wavelet enhancement with temporal attention modeling. Specifically, we employ a dual-component frequency feature enhancement method using the Discrete Wavelet Transform (DWT), which effectively suppresses noise while preserving important landmarks. Subsequently, an adaptive attention module is introduced to model long-range temporal dependencies and improve dynamic feature representation across consecutive frames. Experimental results show that the proposed method achieves over 94% AUC and 82% ACC for both B-line and P-line classification on both the ILD-LUS and Covid-BLUES datasets, outperforming existing methods. These findings demonstrate the robustness and generalizability of our approach across different pathological conditions. Overall, the proposed framework shows strong potential for supporting clinical decision-making in LUS analysis. The code is available at https://github.com/KaIi-github/WaveLUS.
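The wavelet-enhancement idea can be illustrated with a minimal single-level Haar DWT denoiser. This is only the generic suppress-small-high-frequency-coefficients step, not the paper's dual-component method; the names are illustrative and a synthetic 1-D signal stands in for ultrasound data.

```python
import numpy as np

def haar_dwt(x):
    """One-level 1-D Haar DWT: returns (approximation, detail) coefficients."""
    x = x[: len(x) // 2 * 2]                 # truncate to even length
    a = (x[0::2] + x[1::2]) / np.sqrt(2)     # low-pass: coarse structure
    d = (x[0::2] - x[1::2]) / np.sqrt(2)     # high-pass: noise + sharp edges
    return a, d

def haar_idwt(a, d):
    """Inverse of haar_dwt (perfect reconstruction for even-length input)."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def denoise(x, thresh):
    """Soft-threshold the detail band to suppress small high-frequency noise."""
    a, d = haar_dwt(x)
    d = np.sign(d) * np.maximum(np.abs(d) - thresh, 0.0)
    return haar_idwt(a, d)

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 256)
clean = np.sin(2 * np.pi * 4 * t)            # smooth stand-in for a scan line
noisy = clean + 0.2 * rng.normal(size=t.size)
out = denoise(noisy, thresh=0.25)
```

Because the clean signal is smooth, its detail coefficients are tiny, while white noise spreads evenly across both bands; thresholding the detail band therefore removes mostly noise, which is the sense in which a DWT can "suppress noise while preserving important landmarks".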

JBHI Journal 2025 Journal Article

MHKD: Multi-Step Hybrid Knowledge Distillation for Low-Resolution Whole Slide Images Glomerulus Detection

  • Xiangsen Zhang
  • Longfei Han
  • Chenchu Xu
  • Zhaohui Zheng
  • Jin Ding
  • Xianghui Fu
  • Dingwen Zhang
  • Junwei Han

Glomerulus detection is a critical component of renal histopathology assessment, essential for diagnosing glomerulonephritis. To mitigate the increasing workload on pathologists, AI-assisted diagnostic methods based on high-resolution digital pathology whole slide images have been developed. However, current AI-assisted approaches are limited to high-resolution whole slide images, necessitating expensive digital scanner equipment, high image storage costs, and significant computational complexity. To address this limitation, this paper pioneers a method for glomerulus detection in low-resolution human kidney pathology images. Specifically, we propose a novel multi-step hybrid knowledge distillation method. Our method distills both global features and semantic information through a hybrid knowledge distillation strategy that integrates offline and online knowledge distillation, where information from high-resolution pathological images is successively transferred to the student model, from the global features in the shallow network layers to the semantic information of the back-end, through a multi-step training strategy. Experimental results on two datasets show that the proposed method achieves effective detection outcomes for low-resolution kidney pathology images. Compared to other state-of-the-art detection techniques, our method achieves an $AP_{0.5:0.95}$ improvement of 23.1% on the private LN dataset and 15.9% on the public HUBMAP dataset.

NeurIPS Conference 2025 Conference Paper

STRIDER: Navigation via Instruction-Aligned Structural Decision Space Optimization

  • Diqi He
  • Xuehao Gao
  • Hao Li
  • Junwei Han
  • Dingwen Zhang

The Zero-shot Vision-and-Language Navigation in Continuous Environments (VLN-CE) task requires agents to navigate previously unseen 3D environments using natural language instructions, without any scene-specific training. A critical challenge in this setting lies in ensuring agents' actions align with both spatial structure and task intent over long-horizon execution. Existing methods often fail to achieve robust navigation due to a lack of structured decision-making and insufficient integration of feedback from previous actions. To address these challenges, we propose STRIDER (Instruction-Aligned Structural Decision Space Optimization), a novel framework that systematically optimizes the agent's decision space by integrating spatial layout priors and dynamic task feedback. Our approach introduces two key innovations: 1) a Structured Waypoint Generator that constrains the action space through spatial structure, and 2) a Task-Alignment Regulator that adjusts behavior based on task progress, ensuring semantic alignment throughout navigation. Extensive experiments on the R2R-CE and RxR-CE benchmarks demonstrate that STRIDER significantly outperforms strong SOTA methods across key metrics; in particular, it improves Success Rate (SR) from 29% to 35%, a relative gain of 20.7%. Such results highlight the importance of spatially constrained decision-making and feedback-guided execution in improving navigation fidelity for zero-shot VLN-CE.

JBHI Journal 2024 Journal Article

End-to-End Prediction of EGFR Mutation Status With Denseformer

  • Shijie Zhao
  • Wenyuan Li
  • Zhuoyan Liu
  • Tianji Pang
  • Yang Yang
  • Ning Qiang
  • Jingyi Zhao
  • Bangguo Li

Accurate genotyping of the epidermal growth factor receptor (EGFR) is critical for treatment planning in lung adenocarcinoma. Currently, clinical identification of EGFR genotype relies heavily on biopsy and sequencing tests, which are invasive and complicated. Recent advancements in the integration of computed tomography (CT) imagery with deep learning techniques have yielded a non-invasive and straightforward way of identifying EGFR profiles. However, there are still many limitations to address: 1) most of these methods still require physicians to annotate tumor boundaries, which is time-consuming and prone to subjective errors; 2) most of the existing methods are simply borrowed from the computer vision field and do not sufficiently exploit multi-level features for the final prediction. To solve these problems, we propose a Denseformer framework to identify EGFR mutation status in a truly end-to-end fashion directly from 3D lung CT images. Specifically, we take the 3D whole-lung CT images as the input of the neural network model without manually labeling the lung nodules. This is inspired by the medical finding that the mutational status of EGFR is associated not only with the local tumor nodules but also with the microenvironment of the whole lung. Besides, we design a novel Denseformer network to fully explore the distinctive information across different-level features. The Denseformer is a novel network architecture that combines the advantages of both convolutional neural networks (CNNs) and Transformers. Denseformer learns directly from the 3D whole-lung CT images, which preserves the spatial location information in the CT images. To further improve model performance, we design a combined Transformer module. This module employs the Transformer Encoder to globally integrate the information of different levels and layers and uses it as the basis for the final prediction. The proposed model has been tested on a lung adenocarcinoma dataset collected at the Affiliated Hospital of Zunyi Medical University. Extensive experiments demonstrate that the proposed method can effectively extract meaningful features from 3D CT images to make accurate predictions. Compared with other state-of-the-art methods, Denseformer achieves the best performance among current deep learning methods that predict EGFR mutation status from a single CT modality.

AAAI Conference 2024 Conference Paper

SpFormer: Spatio-Temporal Modeling for Scanpaths with Transformer

  • Wenqi Zhong
  • Linzhi Yu
  • Chen Xia
  • Junwei Han
  • Dingwen Zhang

Saccadic scanpath, a data representation of human visual behavior, has received broad interest in multiple domains. The scanpath is a complex eye-tracking data modality that includes sequences of fixation positions and fixation durations, coupled with image information. However, previous methods usually face the spatial misalignment problem of fixation features and the loss of critical temporal data (including temporal correlation and fixation duration). In this study, we propose a Transformer-based scanpath model, SpFormer, to alleviate these problems. First, we propose a fixation-centric paradigm to extract aligned spatial fixation features and tokenize the scanpaths. Then, following the visual working memory mechanism, we design a local meta attention to reduce the semantic redundancy of fixations and guide the model to focus on the meta scanpath. Finally, we progressively integrate the duration information and fuse it with the fixation features to resolve the location ambiguity that grows as Transformer blocks deepen. We conduct extensive experiments on four databases under three tasks. SpFormer establishes new state-of-the-art results in distinct settings, verifying its flexibility and versatility in practical applications. The code can be obtained from https://github.com/wenqizhong/SpFormer.

YNIMG Journal 2023 Journal Article

Arousal modulates the amygdala-insula reciprocal connectivity during naturalistic emotional movie watching

  • Liting Wang
  • Xintao Hu
  • Yudan Ren
  • Jinglei Lv
  • Shijie Zhao
  • Lei Guo
  • Tianming Liu
  • Junwei Han

Emotional arousal is a complex state recruiting distributed cortical and subcortical structures, in which the amygdala and insula play an important role. Although previous neuroimaging studies have shown that the amygdala and insula manifest reciprocal connectivity, the effective connectivities and modulatory patterns on the amygdala-insula interactions underpinning arousal are still largely unknown. One reason may be the static and discrete laboratory brain imaging paradigms used in most existing studies. In this study, by integrating naturalistic-paradigm (i.e., movie watching) functional magnetic resonance imaging (fMRI) with a computational affective model that predicts dynamic arousal for the movie stimuli, we investigated the effective amygdala-insula interactions and the modulatory effect of the input arousal on the effective connections. Specifically, the predicted dynamic arousal of the movie served as regressors in a general linear model (GLM) analysis, and brain activations were identified accordingly. The regions of interest (i.e., the bilateral amygdala and insula) were localized according to the GLM activation map. The effective connectivity and modulatory effect were then inferred using dynamic causal modeling (DCM). Our experimental results demonstrated that the amygdala was the site of the driving arousal input and that arousal had a modulatory effect on the reciprocal connections between the amygdala and insula. Our study provides novel evidence for the underlying neural mechanisms of arousal in a dynamic naturalistic setting.

YNIMG Journal 2023 Journal Article

Genetic Influence on Gyral Peaks

  • Ying Huang
  • Tuo Zhang
  • Songyao Zhang
  • Weihan Zhang
  • Li Yang
  • Dajiang Zhu
  • Tianming Liu
  • Xi Jiang

Genetic mechanisms have been hypothesized to be a major determinant in the formation of cortical folding. Although there is an increasing number of studies examining the heritability of cortical folding, most of them focus on sulcal pits rather than gyral peaks. Gyral peaks, which reflect the highest local foci on gyri and are consistent across individuals, remain unstudied in terms of heritability. To address this knowledge gap, we used high-resolution data from the Human Connectome Project (HCP) to perform classical twin analysis and estimate the heritability of gyral peaks across various brain regions. Our results showed that the heritability of gyral peaks was heterogeneous across different cortical regions, but relatively symmetric between hemispheres. We also found that pits and peaks differ in a variety of anatomic and functional measures. Further, we explored the relationship between levels of heritability and the formation of cortical folding by utilizing the evolutionary timeline of gyrification. Our findings indicate that the heritability estimates of both gyral peaks and sulcal pits decrease linearly along the evolutionary timeline of gyrification. This suggests that cortical folds which formed earlier during gyrification are subject to stronger genetic influences than later ones. Moreover, pits and peaks paired by their time of appearance are also positively correlated with respect to their heritability estimates. These results fill the knowledge gap regarding genetic influences on gyral peaks and significantly advance our understanding of how genetic factors shape the formation of cortical folding. The comparison between peaks and pits suggests that peaks are not a simple morphological mirror of pits but could help complete the understanding of folding patterns.

IJCAI Conference 2022 Conference Paper

Beyond the Prototype: Divide-and-conquer Proxies for Few-shot Segmentation

  • Chunbo Lang
  • Binfei Tu
  • Gong Cheng
  • Junwei Han

Few-shot segmentation, which aims to segment unseen-class objects given only a handful of densely labeled samples, has received widespread attention from the community. Existing approaches typically follow the prototype learning paradigm to perform meta-inference, which fails to fully exploit the underlying information from support image-mask pairs, resulting in various segmentation failures, e.g., incomplete objects, ambiguous boundaries, and distractor activation. To this end, we propose a simple yet versatile framework in the spirit of divide-and-conquer. Specifically, a novel self-reasoning scheme is first implemented on the annotated support image, and then the coarse segmentation mask is divided into multiple regions with different properties. Leveraging effective masked average pooling operations, a series of support-induced proxies are thus derived, each playing a specific role in conquering the above challenges. Moreover, we devise a unique parallel decoder structure that integrates proxies with similar attributes to boost the discrimination power. Our proposed approach, named divide-and-conquer proxies (DCP), allows for the development of appropriate and reliable information as a guide at the “episode” level, not just about the object cues themselves. Extensive experiments on PASCAL-5i and COCO-20i demonstrate the superiority of DCP over conventional prototype-based approaches (up to 5~10% on average), which also establishes a new state-of-the-art. Code is available at github.com/chunbolang/DCP.
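The masked average pooling primitive that DCP builds its proxies from can be sketched in a few lines of NumPy. The region masks below are toy stand-ins for the regions DCP derives from its self-reasoning scheme, and all names are illustrative, not the paper's implementation.

```python
import numpy as np

def masked_average_pooling(feat, mask):
    """Average a feature map over the masked region only.

    feat: (C, H, W) support feature map; mask: (H, W) binary region mask.
    Returns a (C,) prototype/proxy vector summarizing that region.
    """
    area = mask.sum()
    if area == 0:
        return np.zeros(feat.shape[0])
    # Zero out features outside the region, then average over its area.
    return (feat * mask).reshape(feat.shape[0], -1).sum(axis=1) / area

# Toy example: 3-channel features; the "object" occupies the left half.
feat = np.arange(3 * 4 * 4, dtype=float).reshape(3, 4, 4)
obj = np.zeros((4, 4)); obj[:, :2] = 1      # hypothetical object region
bg = 1 - obj                                # hypothetical background region
proxies = [masked_average_pooling(feat, m) for m in (obj, bg)]
```

Pooling each region separately is what yields one proxy per region property; a single prototype over the whole mask would blur those regions together, which is the failure mode the divide-and-conquer design addresses.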

NeurIPS Conference 2022 Conference Paper

Intermediate Prototype Mining Transformer for Few-Shot Semantic Segmentation

  • Yuanwei Liu
  • Nian Liu
  • Xiwen Yao
  • Junwei Han

Few-shot semantic segmentation aims to segment the target objects in a query image under the condition of a few annotated support images. Most previous works strive to mine more effective category information from the support to match with the corresponding objects in the query. However, they all ignored the category information gap between query and support images. If the objects in them show large intra-class diversity, forcibly migrating the category information from the support to the query is ineffective. To solve this problem, we are the first to introduce an intermediate prototype for mining both deterministic category information from the support and adaptive category knowledge from the query. Specifically, we design an Intermediate Prototype Mining Transformer (IPMT) to learn the prototype in an iterative way. In each IPMT layer, we propagate the object information in both support and query features to the prototype and then use it to activate the query feature map. By conducting this process iteratively, both the intermediate prototype and the query feature can be progressively improved. At last, the final query feature is used to yield a precise segmentation prediction. Extensive experiments on both PASCAL-5i and COCO-20i datasets clearly verify the effectiveness of our IPMT and show that it outperforms previous state-of-the-art methods by a large margin. Code is available at https://github.com/LIUYUANWEI98/IPMT.

AAAI Conference 2020 Conference Paper

Deep Embedded Complementary and Interactive Information for Multi-View Classification

  • Jinglin Xu
  • Wenbin Li
  • Xinwang Liu
  • Dingwen Zhang
  • Ji Liu
  • Junwei Han

Multi-view classification optimally integrates various features from different views to improve classification tasks. Though most of the existing works demonstrate promising performance in various computer vision applications, we observe that they can be further improved by sufficiently utilizing complementary view-specific information, deep interactive information between different views, and the strategy of fusing various views. In this work, we propose a novel multi-view learning framework that seamlessly embeds various view-specific information and deep interactive information and introduces a novel multi-view fusion strategy to make a joint decision during the optimization for classification. Specifically, we utilize different deep neural networks to learn multiple view-specific representations, and model deep interactive information through a shared interactive network using the cross-correlations between attributes of these representations. After that, we adaptively integrate multiple neural networks by flexibly tuning the power exponent of the weights, which not only avoids the trivial weight solution but also provides a new approach to fuse outputs from different deterministic neural networks. Extensive experiments on several public datasets demonstrate the rationality and effectiveness of our method.

IJCAI Conference 2020 Conference Paper

Joint Multi-view 2D Convolutional Neural Networks for 3D Object Classification

  • Jinglin Xu
  • Xiangsen Zhang
  • Wenbin Li
  • Xinwang Liu
  • Junwei Han

Three-dimensional (3D) object classification is widely involved in various computer vision applications, e.g., autonomous driving and simultaneous localization and mapping, and has attracted much attention in the community. However, solving 3D object classification by directly employing 3D convolutional neural networks (CNNs) generally suffers from high computational cost. Besides, existing view-based methods cannot fully explore the content relationships between views. To this end, this work proposes a novel multi-view framework that jointly uses multiple 2D-CNNs to capture discriminative information with relationships, as well as a new multi-view loss fusion strategy, in an end-to-end manner. Specifically, we utilize multiple 2D views of a 3D object as input and integrate the intra-view and inter-view information of each view through the view-specific 2D-CNN and a series of modules (outer product, view pair pooling, 1D convolution, and fully connected transformation). Furthermore, we design a novel view ensemble mechanism that selects several discriminative and informative views to jointly infer the category of a 3D object. Extensive experiments demonstrate that the proposed method is able to outperform current state-of-the-art methods on 3D object classification. More importantly, this work provides a new way to improve 3D object classification from the perspective of fully utilizing well-established 2D-CNNs.

JBHI Journal 2019 Journal Article

Identifying Brain Networks at Multiple Time Scales via Deep Recurrent Neural Network

  • Yan Cui
  • Shijie Zhao
  • Han Wang
  • Li Xie
  • Yaowu Chen
  • Junwei Han
  • Lei Guo
  • Fan Zhou

For decades, task functional magnetic resonance imaging (fMRI) has been a powerful noninvasive tool to explore the organizational architecture of human brain function. Researchers have developed a variety of brain network analysis methods for task fMRI data, including the general linear model, independent component analysis, and sparse representation methods. However, these shallow models are limited in faithfully reconstructing and modeling the hierarchical and temporal structures of brain networks, as demonstrated in a growing number of studies. Recently, recurrent neural networks (RNNs) have exhibited a great ability to model hierarchical and temporal dependence features in the machine learning field, which might be suitable for task fMRI data modeling. To explore such possible advantages of RNNs for task fMRI data, we propose a novel framework of a deep recurrent neural network (DRNN) to model functional brain networks from task fMRI data. Experimental results on the motor task fMRI data of the Human Connectome Project 900-subject release demonstrate that the proposed DRNN can not only faithfully reconstruct functional brain networks, but also identify more meaningful brain networks at multiple time scales which are overlooked by traditional shallow models. In general, this work provides an effective and powerful approach to identifying functional brain networks at multiple time scales from task fMRI data.

TIST Journal 2018 Journal Article

A Review of Co-Saliency Detection Algorithms

  • Dingwen Zhang
  • Huazhu Fu
  • Junwei Han
  • Ali Borji
  • Xuelong Li

Co-saliency detection is a newly emerging and rapidly growing research area in the computer vision community. As a novel branch of visual saliency, co-saliency detection refers to the discovery of common and salient foregrounds from two or more relevant images, and it can be widely used in many computer vision tasks. The existing co-saliency detection algorithms mainly consist of three components: extracting effective features to represent the image regions, exploring the informative cues or factors to characterize co-saliency, and designing effective computational frameworks to formulate co-saliency. Although numerous methods have been developed, the literature is still lacking a deep review and evaluation of co-saliency detection techniques. In this article, we aim at providing a comprehensive review of the fundamentals, challenges, and applications of co-saliency detection. Specifically, we provide an overview of some related computer vision works, review the history of co-saliency detection, summarize and categorize the major algorithms in this research area, discuss some open issues in this area, present the potential applications of co-saliency detection, and finally point out some unsolved challenges and promising future works. We expect this review to be beneficial to both fresh and senior researchers in this field and to give insights to researchers in other related areas regarding the utility of co-saliency detection algorithms.

AAAI Conference 2018 Conference Paper

Generative Adversarial Network Based Heterogeneous Bibliographic Network Representation for Personalized Citation Recommendation

  • Xiaoyan Cai
  • Junwei Han
  • Libin Yang

Network representation has been recently exploited for many applications, such as citation recommendation, multi-label classification and link prediction. It learns a low-dimensional vector representation for each vertex in a network. Existing network representation methods focus only on incomplete aspects of vertex information (i.e., vertex content, network structure or partial integration); moreover, they are commonly designed for homogeneous information networks where all the vertices of a network are of the same type. In this paper, we propose a deep network representation model that integrates network structure and vertex content information into a unified framework by exploiting generative adversarial networks, and represents different types of vertices in the heterogeneous network in a continuous and common vector space. Based on the proposed model, we can obtain heterogeneous bibliographic network representations for efficient citation recommendation. The proposed model also makes personalized citation recommendation possible, an issue that few papers have addressed in the past. When evaluated on the AAN and DBLP datasets, the performance of the proposed heterogeneous bibliographic network based citation recommendation approach is comparable with that of other network representation based citation recommendation approaches. The results also demonstrate that the personalized citation recommendation approach is more effective than the non-personalized one.

IJCAI Conference 2018 Conference Paper

Multi-scale and Discriminative Part Detectors Based Features for Multi-label Image Classification

  • Gong Cheng
  • Decheng Gao
  • Yang Liu
  • Junwei Han

Convolutional neural networks (CNNs) have shown their promise for the image classification task. However, global CNN features still lack geometric invariance for addressing the problem of intra-class variations and so are not optimal for multi-label image classification. This paper proposes a new and effective framework built upon CNNs to learn Multi-scale and Discriminative Part Detectors (MsDPD)-based feature representations for multi-label image classification. Specifically, at each scale level, we (i) first present an entropy-rank based scheme to generate and select a set of discriminative part detectors (DPD), and then (ii) obtain a number of DPD-based convolutional feature maps, with each feature map representing the occurrence probability of a particular part detector, and learn DPD-based features by using a task-driven pooling scheme. The two steps are formulated into a unified framework by developing a new objective function, which jointly trains part detectors incrementally and integrates the learning of feature representations into the classification task. Finally, the multi-scale features are fused to produce the predictions. Experimental results on the PASCAL VOC 2007 and VOC 2012 datasets demonstrate that the proposed method achieves better accuracy when compared with the existing state-of-the-art multi-label classification methods.

AAAI Conference 2017 Conference Paper

Balanced Clustering with Least Square Regression

  • Hanyang Liu
  • Junwei Han
  • Feiping Nie
  • Xuelong Li

Clustering is a fundamental research topic in data mining, and a balanced clustering result is required in a variety of applications. Many existing clustering algorithms have good clustering performance yet fail to produce balanced clusters. In this paper, we propose a novel and simple clustering method, referred to as Balanced Clustering with Least Square regression (BCLS), which minimizes a least-square linear regression with a balance constraint that regularizes the clustering model. In BCLS, linear regression is applied to estimate the class-specific hyperplanes that partition each class of data from the others, thus guiding the clustering of the data points into different clusters. A balance constraint is utilized to regularize the clustering; minimizing it helps produce balanced clusters. In addition, we apply the method of augmented Lagrange multipliers (ALM) to optimize the objective model. Experiments on seven real-world benchmarks demonstrate that our approach not only produces good clustering performance but also guarantees a balanced clustering result.
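What a balance constraint enforces can be illustrated with a minimal sketch (hypothetical code; it uses a simple greedy capacity-constrained assignment rather than the ALM optimization actually used in BCLS):

```python
import numpy as np

def balanced_assign(X, centers):
    """Greedy capacity-constrained assignment: every cluster receives the
    same number of points (assumes len(X) is divisible by len(centers)).
    A toy illustration of balanced clustering, not the BCLS algorithm."""
    n, k = len(X), len(centers)
    cap = n // k
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (n, k) costs
    labels = -np.ones(n, dtype=int)
    counts = np.zeros(k, dtype=int)
    # Visit (point, cluster) pairs from cheapest to most expensive.
    for idx in np.argsort(d, axis=None):
        i, j = divmod(idx, k)
        if labels[i] == -1 and counts[j] < cap:
            labels[i] = j
            counts[j] += 1
    return labels
```

Unlike plain k-means, the capacity check guarantees perfectly balanced cluster sizes even when the data are unevenly distributed.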

AAAI Conference 2017 Conference Paper

Bilateral k-Means Algorithm for Fast Co-Clustering

  • Junwei Han
  • Kun Song
  • Feiping Nie
  • Xuelong Li

With the development of information technology, the amount of data, e.g., text, images and video, has increased rapidly, and efficiently clustering such large-scale data sets is a challenge. To address this problem, this paper proposes a novel co-clustering method named the bilateral k-means algorithm (BKM) for fast co-clustering. Different from traditional k-means algorithms, the proposed method has two indicator matrices P and Q and a diagonal matrix S to be solved, which represent the cluster memberships of samples and features, and the co-cluster centres, respectively. Therefore, it can perform clustering on the samples and the features simultaneously. We also introduce an effective approach to solve the proposed method, which involves fewer multiplications, and analyze the computational complexity. Extensive experiments on various types of data sets are conducted. Compared with state-of-the-art clustering methods, the proposed BKM not only has faster computational speed but also achieves promising clustering results.
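The alternating row/column clustering idea can be sketched as follows (a toy illustration in the spirit of co-clustering, assuming hard cluster labels and block means; it is not the authors' BKM solver with indicator matrices P, Q and diagonal S):

```python
import numpy as np

def bilateral_kmeans(X, k_rows, k_cols, n_iter=20, seed=0):
    """Toy alternating co-clustering: rows and columns are alternately
    reassigned so that each (row-cluster, column-cluster) block of X
    is well approximated by its mean."""
    rng = np.random.default_rng(seed)
    r = rng.integers(0, k_rows, X.shape[0])  # row cluster labels
    c = rng.integers(0, k_cols, X.shape[1])  # column cluster labels
    for _ in range(n_iter):
        # Block means: S[i, j] = mean of entries in row-cluster i, col-cluster j.
        S = np.zeros((k_rows, k_cols))
        for i in range(k_rows):
            for j in range(k_cols):
                block = X[np.ix_(r == i, c == j)]
                S[i, j] = block.mean() if block.size else 0.0
        # Reassign each row to the row cluster whose block profile fits best.
        r = np.array([np.argmin(((X[a] - S[:, c]) ** 2).sum(axis=1))
                      for a in range(X.shape[0])])
        # Reassign each column likewise.
        c = np.array([np.argmin(((X[:, b, None] - S[r]) ** 2).sum(axis=0))
                      for b in range(X.shape[1])])
    return r, c
```

On a matrix with clear block structure, rows (and columns) with identical profiles end up in the same cluster.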

IJCAI Conference 2017 Conference Paper

Feature Selection via Scaling Factor Integrated Multi-Class Support Vector Machines

  • Jinglin Xu
  • Feiping Nie
  • Junwei Han

In data mining, we often encounter high-dimensional and noisy features, which may not only increase the load on computational resources but also lead to model overfitting; feature selection is often adopted to address this issue. In this paper, we propose a novel feature selection method based on the multi-class SVM, which introduces a scaling factor with a flexible parameter to adaptively re-adjust the distribution of feature weights and select the most discriminative features. Concretely, the proposed method designs a scaling factor with a p/2 power to control the distribution of weights adaptively and to search for the optimal sparsity of the weighting matrix. In addition, to solve the proposed model, we provide an alternating, iterative optimization method. It not only solves for the weighting matrix and the scaling factor independently, but also provides a better way to address the problem of solving the L2,0-norm. Comprehensive experiments are conducted on six datasets to demonstrate that this work obtains better performance than a number of existing state-of-the-art multi-class feature selection methods.
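The final selection step that such weight-matrix methods share can be sketched as below (hypothetical code: it only ranks features by the l2-norms of the rows of a given multi-class weight matrix W; learning W with the paper's scaled regularizer is not shown):

```python
import numpy as np

def select_features(W, n_keep):
    """Rank features by the l2-norm of their row in a multi-class weight
    matrix W (d features x c classes) and keep the top n_keep indices.
    A sketch of the selection step only, not the full optimization."""
    scores = np.linalg.norm(W, axis=1)        # one relevance score per feature
    return np.argsort(scores)[::-1][:n_keep]  # indices of the top features
```

Features whose rows of W are driven to (near) zero by a sparsity-inducing regularizer are discarded, which is the effect an L2,0-style penalty aims for.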

IJCAI Conference 2017 Conference Paper

Flexible Orthogonal Neighborhood Preserving Embedding

  • Tianji Pang
  • Feiping Nie
  • Junwei Han

In this paper, we propose a novel linear subspace learning algorithm called Flexible Orthogonal Neighborhood Preserving Embedding (FONPE), which is a linear approximation of the Locally Linear Embedding (LLE) algorithm. Our objective function integrates a term related to manifold smoothness with a flexible penalty defined on the projection fitness. Different from Neighborhood Preserving Embedding (NPE), we relax the hard constraint by modeling the mismatch between the approximate linear embedding and the original nonlinear embedding instead of enforcing them to be equal, which makes the method better able to cope with data sampled from a nonlinear manifold. In addition, instead of enforcing orthogonality among the projected points, we enforce the mapping itself to be orthogonal. In this way, FONPE tends to preserve distances, and thus the overall geometry is preserved. Unlike LLE, FONPE has an explicit linear mapping between the input and the reduced spaces, so it can handle novel testing data straightforwardly. Moreover, when the projection matrix in our model becomes an identity matrix, the model reduces to a denoising LLE (DLLE); compared with standard LLE, we demonstrate that DLLE handles noisy data better. Comprehensive experiments on several benchmark databases demonstrate the effectiveness of our algorithm.

IJCAI Conference 2017 Conference Paper

Linear Manifold Regularization with Adaptive Graph for Semi-supervised Dimensionality Reduction

  • Kai Xiong
  • Feiping Nie
  • Junwei Han

Many previous graph-based methods perform dimensionality reduction on a pre-defined graph. However, due to the noise and redundant information in the original data, the pre-defined graph has no clear structure and may not be appropriate for the subsequent task. To overcome these drawbacks, in this paper we propose a novel approach called linear manifold regularization with adaptive graph (LMRAG) for semi-supervised dimensionality reduction. LMRAG directly incorporates graph construction into the objective function, so the projection matrix and the optimal graph can be optimized simultaneously. Due to the structure constraint, the learned graph is sparse and has a clear structure. Extensive experiments on several benchmark datasets demonstrate the effectiveness of the proposed method.

IJCAI Conference 2017 Conference Paper

Multi-view Feature Learning with Discriminative Regularization

  • Jinglin Xu
  • Junwei Han
  • Feiping Nie

More and more multi-view data, which capture rich information from heterogeneous features, are widely used in real-world applications. How to integrate different types of features, and how to learn low-dimensional and discriminative information from high-dimensional data, are two main challenges. To address them, this paper proposes a novel multi-view feature learning framework regularized by discriminative information. It obtains a feature learning model that contains multiple discriminative feature weighting matrices, one for each view, and then yields multiple low-dimensional features used for subsequent multi-view clustering. To optimize the formulated objective function, we transform the proposed framework into a trace optimization problem that admits a global solution in closed form. Experimental evaluations on four widely used datasets and comparisons with a number of state-of-the-art multi-view clustering algorithms demonstrate the superiority of the proposed work.

IJCAI Conference 2017 Conference Paper

Orthogonal and Nonnegative Graph Reconstruction for Large Scale Clustering

  • Junwei Han
  • Kai Xiong
  • Feiping Nie

Spectral clustering has been widely used in recent years due to its simplicity in solving the graph clustering problem. However, it suffers from high computational cost as data grow in scale, and it is limited by the performance of its post-processing step. To address these two problems simultaneously, in this paper we propose a novel approach, denoted orthogonal and nonnegative graph reconstruction (ONGR), that scales linearly with the data size. For the relaxation of Normalized Cut, we add a nonnegative constraint to the objective. Due to the nonnegativity, ONGR offers interpretability in that the final cluster labels can be obtained directly without post-processing. Extensive experiments on clustering tasks demonstrate the effectiveness of the proposed method.

AAAI Conference 2017 Conference Paper

Parameter Free Large Margin Nearest Neighbor for Distance Metric Learning

  • Kun Song
  • Feiping Nie
  • Junwei Han
  • Xuelong Li

We introduce a novel supervised metric learning algorithm named parameter free large margin nearest neighbor (PFLMNN), which can be seen as an improvement of the classical large margin nearest neighbor (LMNN) algorithm. The contributions of our work consist of two aspects. First, our method discards the cost term that shrinks the distances between an inquiry input and its k target neighbors (the k nearest neighbors with the same label as the inquiry input) in LMNN, and focuses only on pushing the imposters (the samples with labels different from the inquiry input) out of the neighborhood of the inquiry. As a result, our method has no parameter that needs to be tuned on a validation set, which makes it more convenient to use. Second, by leveraging the geometry of the imposters, we construct a novel cost function to penalize the small distances between each inquiry and its imposters. Different from LMNN, which considers every imposter located in the neighborhood of each inquiry, our method takes care of only the nearest imposter, because once the nearest imposter is pushed out of the neighborhood of its inquiry, all the other imposters are out as well. In this way, the constraints in our model are far fewer than those of LMNN, which makes it much easier to find the optimal distance metric. Consequently, our method not only learns a better distance metric than LMNN, but also runs faster. Extensive experiments on data sets of various sizes and difficulties show that PFLMNN achieves better classification results than LMNN.

IJCAI Conference 2017 Conference Paper

Self-paced Mixture of Regressions

  • Longfei Han
  • Dingwen Zhang
  • Dong Huang
  • Xiaojun Chang
  • Jun Ren
  • Senlin Luo
  • Junwei Han

Mixture of regressions (MoR) is a well-established and effective approach to modeling discontinuous and heterogeneous data in regression problems. Existing MoR approaches assume a smooth joint distribution for its good analytic properties. However, this assumption makes existing MoR very sensitive to intra-component outliers (noisy training data residing in certain components) and to inter-component imbalance (different amounts of training data in different components). In this paper, we make one of the earliest efforts to bring Self-paced Learning (SPL) into MoR, i.e., the Self-paced Mixture of Regressions (SPMoR) model. We propose a novel self-paced regularizer based on the Exclusive LASSO, which improves the inter-component balance of the training data. As a robust learning regime, SPL incorporates training samples in order of confidence, from easy to hard. To demonstrate the effectiveness of SPMoR, we conducted experiments on both synthetic examples and real-world applications to age estimation and glucose estimation. The results show that SPMoR outperforms state-of-the-art methods.
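The easy-to-hard regime of self-paced learning can be sketched with a minimal example (hypothetical code: hard binary self-paced weights on a single ordinary least-squares regressor; the exclusive-LASSO regularizer and the mixture components of SPMoR are not modeled):

```python
import numpy as np

def self_paced_linreg(X, y, lam0=0.5, growth=2.0, n_rounds=5):
    """Minimal self-paced regression sketch: repeatedly refit ordinary
    least squares on the samples whose current loss falls below a
    threshold lambda that grows each round (v_i = 1 iff loss_i < lambda)."""
    w = np.linalg.lstsq(X, y, rcond=None)[0]  # initial fit on all samples
    lam = lam0
    for _ in range(n_rounds):
        losses = (X @ w - y) ** 2
        easy = losses < lam                   # current "easy" sample set
        if easy.sum() >= X.shape[1]:          # need enough rows to refit
            w = np.linalg.lstsq(X[easy], y[easy], rcond=None)[0]
        lam *= growth                         # admit harder samples next round
    return w
```

Because a gross outlier keeps a large loss, it stays outside the easy set and stops distorting the fit, which is the intra-component-outlier robustness the abstract refers to.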

IJCAI Conference 2017 Conference Paper

Semi-supervised Orthogonal Graph Embedding with Recursive Projections

  • Hanyang Liu
  • Junwei Han
  • Feiping Nie

Many graph-based semi-supervised dimensionality reduction algorithms use a projection matrix to linearly map the data matrix from the original feature space to a lower-dimensional representation. But the dimensionality after reduction is inevitably restricted to the number of classes, and the learned non-orthogonal projection matrix usually fails to preserve distances well and to balance the weights on different projection directions. This paper proposes a novel dimensionality reduction method, called semi-supervised orthogonal graph embedding with recursive projections (SOGE). We integrate manifold smoothness and label fitness as well as a penalization of the linear mapping mismatch, and learn the orthogonal projection on the Stiefel manifold, which empirically demonstrates better performance. Moreover, we recursively update the projection matrix in its orthocomplemented space to continuously learn more projection vectors, so as to better control the dimension of the reduction. Comprehensive experiments on several benchmarks demonstrate significant improvement over existing methods.

IJCAI Conference 2017 Conference Paper

Two dimensional Large Margin Nearest Neighbor for Matrix Classification

  • Kun Song
  • Feiping Nie
  • Junwei Han

Matrices are a common form of data encountered in a wide range of real applications, and how to classify this kind of data is an important research topic. In this paper, we propose a novel distance metric learning method named two-dimensional large margin nearest neighbor (2DLMNN) for improving the performance of the k-nearest-neighbor (KNN) classifier in matrix classification. In the proposed method, left and right projection matrices are employed to define a matrix-based Mahalanobis distance, which is used to construct an objective aimed at separating points in different classes by a large margin. The number of parameters in these two projection matrices is far smaller than in the vector-based counterpart, so our method reduces the risk of overfitting. We also introduce a framework for solving the proposed 2DLMNN, and analyze its convergence behavior, initialization, and parameter determination. Compared with vector-based methods, 2DLMNN performs better for matrix data classification, and promising experimental results on several data sets demonstrate the effectiveness of our method.
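The matrix-based distance the abstract describes has a simple form: project the difference of two matrices with a left matrix L and a right matrix R and take the Frobenius norm. A minimal sketch (with L and R passed in as illustrative inputs, not learned as in the paper):

```python
import numpy as np

def matrix_distance(A, B, L, R):
    """Matrix-based Mahalanobis-style distance: d(A, B) = ||L (A - B) R||_F.
    In 2DLMNN, L and R are the learned left/right projection matrices;
    here they are just given arguments for illustration."""
    return np.linalg.norm(L @ (A - B) @ R)
```

For m x n inputs, an L of size p x m and an R of size n x q contribute p*m + n*q parameters, versus (m*n)^2 for a full Mahalanobis matrix on vectorized inputs, which is the parameter saving behind the reduced overfitting risk.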

IJCAI Conference 2016 Conference Paper

Bridging Saliency Detection to Weakly Supervised Object Detection Based on Self-Paced Curriculum Learning

  • Dingwen Zhang
  • Deyu Meng
  • Long Zhao
  • Junwei Han

Weakly-supervised object detection (WOD) is a challenging problem in computer vision. The key problem is to simultaneously infer the exact object locations in the training images and train the object detectors, given only training images with weak image-level labels. Intuitively, by simulating the selective attention mechanism of the human visual system, saliency detection can select attractive objects in scenes and is thus a potential way to provide useful priors for WOD. However, adopting saliency detection in WOD is not trivial, since the detected saliency region may be highly ambiguous in complex cases. To this end, this paper first comprehensively analyzes the challenges in applying saliency detection to WOD. Then, we make one of the earliest efforts to bridge saliency detection to WOD via self-paced curriculum learning, which can guide the learning procedure to gradually acquire faithful knowledge of multi-class objects from easy to hard. The experimental results demonstrate that the proposed approach successfully bridges the saliency detection and WOD tasks and achieves state-of-the-art object detection results under weak supervision.

YNICL Journal 2016 Journal Article

Connectome-scale group-wise consistent resting-state network analysis in autism spectrum disorder

  • Yu Zhao
  • Hanbo Chen
  • Yujie Li
  • Jinglei Lv
  • Xi Jiang
  • Fangfei Ge
  • Tuo Zhang
  • Shu Zhang

Understanding the organizational architecture of human brain function and its alteration patterns in diseased brains, such as those of Autism Spectrum Disorder (ASD) patients, is of great interest. In-vivo functional magnetic resonance imaging (fMRI) offers a unique window to investigate the mechanisms of brain function and to identify functional network components of the human brain. Previously, we have shown that multiple concurrent functional networks can be derived from fMRI signals using whole-brain sparse representation. Yet it remains an open question how to derive group-wise consistent networks featured in ASD patients and controls. Here we propose an effective volumetric network descriptor, named the connectivity map, to compactly describe the spatial patterns of brain network maps, and we implement a fast framework in the Apache Spark environment that can effectively identify group-wise consistent networks in a big fMRI dataset. Our experimental results identified 144 group-wise common intrinsic connectivity networks (ICNs) shared between ASD patients and healthy control subjects, some of which are substantially different between the two groups. Moreover, further analysis of the functional connectivity and spatial overlap between these 144 common ICNs reveals connectomics signatures characterizing ASD patients and controls. In particular, the computing time of our Spark-enabled functional connectomics framework is reduced significantly, from 240 hours (C++ code, single core) to 20 hours, exhibiting great potential for handling fMRI big data in the future.

IJCAI Conference 2016 Conference Paper

Robust and Sparse Fuzzy K-Means Clustering

  • Jinglin Xu
  • Junwei Han
  • Kai Xiong
  • Feiping Nie

Partition-based clustering algorithms, like K-Means and fuzzy K-Means, have been among the most widely and successfully used in data mining over the past decades. In this paper, we present a robust and sparse fuzzy K-Means clustering algorithm, an extension of the standard fuzzy K-Means algorithm that incorporates a robust function, rather than the squared data-fitting term, to handle outliers. More importantly, combined with the concept of sparseness, the new algorithm further introduces a penalty term to make the object-to-cluster membership of each sample suitably sparse. Experimental results on benchmark datasets demonstrate that the proposed algorithm not only ensures the robustness of such soft clustering in real-world applications, but also avoids performance degradation by taking membership sparsity into account.
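For reference, the membership update of the standard fuzzy K-Means baseline that the abstract extends looks like this (a sketch of the classic update only; the robust loss and the sparsity penalty of the proposed algorithm are not included):

```python
import numpy as np

def fuzzy_memberships(X, centers, m=2.0):
    """One standard fuzzy K-Means membership step: the membership of
    sample i in cluster j decreases with its squared distance to that
    centre, softened by the fuzzifier m > 1; each row sums to 1."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1) + 1e-12
    inv = d ** (-1.0 / (m - 1.0))             # closer centres get larger weight
    return inv / inv.sum(axis=1, keepdims=True)
```

In the standard update every membership is strictly positive; the paper's penalty term instead drives most of each row toward zero, which is what "suitable sparseness" refers to.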

YNIMG Journal 2014 Journal Article

Fusing DTI and fMRI data: A survey of methods and applications

  • Dajiang Zhu
  • Tuo Zhang
  • Xi Jiang
  • Xintao Hu
  • Hanbo Chen
  • Ning Yang
  • Jinglei Lv
  • Junwei Han

The relationship between brain structure and function has been one of the centers of research in neuroimaging for decades. In recent years, diffusion tensor imaging (DTI) and functional magnetic resonance imaging (fMRI) techniques have become widely available and popular in cognitive and clinical neuroscience for examining the brain's white matter (WM) micro-structure and gray matter (GM) function, respectively. Given the intrinsic integration of WM/GM and the complementary information embedded in DTI/fMRI data, it is natural and well-justified to combine these two neuroimaging modalities to investigate brain structure and function and their relationships simultaneously. In the past decade, there have been remarkable achievements in DTI/fMRI fusion methods and applications in the neuroimaging and human brain mapping community. This survey paper reviews recent advances in methodologies and applications for incorporating multimodal DTI and fMRI data, and offers our perspectives on future research directions. We envision that effective fusion of DTI/fMRI techniques will play an increasingly important role in neuroimaging and brain science in the years to come.

YNIMG Journal 2012 Journal Article

Inferring consistent functional interaction patterns from natural stimulus FMRI data

  • Jiehuan Sun
  • Xintao Hu
  • Xiu Huang
  • Yang Liu
  • Kaiming Li
  • Xiang Li
  • Junwei Han
  • Lei Guo

There has been increasing interest in how the human brain responds to natural stimuli, such as video watching, in the neuroimaging field. Along this direction, this paper presents our effort in inferring consistent and reproducible functional interaction patterns under the natural stimulus of video watching among known functional brain regions identified by task-based fMRI. We applied and compared four statistical approaches, including Bayesian network modeling with three search algorithms, namely greedy equivalence search (GES), Peter and Clark (PC) analysis, and independent multiple greedy equivalence search (IMaGES), as well as the commonly used Granger causality analysis (GCA), to infer consistent and reproducible functional interaction patterns among these brain regions. Interestingly, a number of reliable and consistent functional interaction patterns were identified by the GES, PC and IMaGES algorithms in different participating subjects when they watched multiple video shots of the same semantic category. These interaction patterns are meaningful given current neuroscience knowledge and are reasonably reproducible across different brains and video shots. In particular, these consistent functional interaction patterns are supported by structural connections derived from diffusion tensor imaging (DTI) data, suggesting the structural underpinnings of consistent functional interactions. Our work demonstrates that specific consistent patterns of functional interactions among relevant brain regions might reflect the brain's fundamental mechanisms of online processing and comprehension of video messages.