Arrow Research search

Author name cluster

Baocai Yin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

24 papers
2 author rows

Possible papers

24

AAAI Conference 2026 Conference Paper

AlignTrack: Top-Down Spatiotemporal Resolution Alignment for RGB-Event Visual Tracking

  • Chuanyu Sun
  • Jiqing Zhang
  • Yang Wang
  • Yuanchen Wang
  • Yutong Jiang
  • Baocai Yin
  • Xin Yang

Most existing RGB-Event trackers rely on strictly aligned datasets, overlooking the asynchronous spatio-temporal resolutions common in real-world scenarios. This methodological limitation impedes effective RGB-Event feature alignment and ultimately degrades tracking performance. To overcome this limitation, we propose AlignTrack, a novel tracking framework built upon a Top-Down Alignment (TDA) strategy inspired by the human visual system. Our TDA framework follows an encode-decode-align paradigm: it first encodes multimodal features to generate target-related priors, which are then progressively decoded to guide a subsequent feature alignment pass. Within this framework, we introduce two key innovations: (1) a Cross-Prior Attention (CPA) module that effectively generates and integrates cross-modal priors, and (2) a Cross-Modal Semantic Alignment (CSA) loss that maximizes mutual information to enforce semantic consistency between modalities. Extensive experiments show that AlignTrack achieves state-of-the-art performance on four challenging RGB-Event tracking benchmarks, demonstrating its robustness in both aligned and unaligned scenarios. Ablation studies further validate the significant contribution of each proposed component.
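The CSA loss is described as maximizing mutual information to enforce semantic consistency between modalities. A common way to realize such an objective is an InfoNCE-style contrastive lower bound over paired features; the sketch below is a minimal pure-Python illustration of that idea, not the paper's actual loss (the feature values and temperature are invented):

```python
import math

def infonce_loss(rgb_feats, event_feats, temperature=0.1):
    """InfoNCE lower bound on mutual information between paired
    RGB and event feature vectors (lists of equal-length lists).
    Matched pairs share an index; all other pairs are negatives."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def norm(a):
        n = math.sqrt(dot(a, a))
        return [x / n for x in a]

    rgb = [norm(v) for v in rgb_feats]
    evt = [norm(v) for v in event_feats]
    loss = 0.0
    for i in range(len(rgb)):
        logits = [dot(rgb[i], e) / temperature for e in evt]
        log_denom = math.log(sum(math.exp(l) for l in logits))
        loss += -(logits[i] - log_denom)  # cross-entropy on the positive pair
    return loss / len(rgb)

# Correctly paired features give a lower loss than shuffled ones.
aligned = [[1.0, 0.0], [0.0, 1.0]]
shuffled = [[0.0, 1.0], [1.0, 0.0]]
print(infonce_loss(aligned, aligned) < infonce_loss(aligned, shuffled))  # True
```

Matched RGB/event pairs are pulled together while mismatched pairs act as negatives, which is what pushes the mutual-information bound upward.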

AAAI Conference 2026 Conference Paper

Binary-Gaussian: Compact and Progressive Representation for 3D Gaussian Segmentation

  • An Yang
  • Chenyu Liu
  • Jun Du
  • Jianqing Gao
  • Jia Pan
  • Jinshui Hu
  • Baocai Yin
  • Bing Yin

3D Gaussian Splatting (3D-GS) has emerged as an efficient 3D representation and a promising foundation for semantic tasks like segmentation. However, existing 3D-GS-based segmentation methods typically rely on high-dimensional category features, which introduce substantial memory overhead. Moreover, fine-grained segmentation remains challenging due to label space congestion and the lack of stable multi-granularity control mechanisms. To address these limitations, we propose a coarse-to-fine binary encoding scheme for per-Gaussian category representation, which compresses each feature into a single integer via the binary-to-decimal mapping, drastically reducing memory usage. We further design a progressive training strategy that decomposes panoptic segmentation into a series of independent sub-tasks, reducing inter-class conflicts and thereby enhancing fine-grained segmentation capability. Additionally, we fine-tune opacity during segmentation training to address the incompatibility between photometric rendering and semantic segmentation, which often leads to foreground-background confusion. Extensive experiments on multiple benchmarks demonstrate that our method achieves state-of-the-art segmentation performance while significantly reducing memory consumption and accelerating inference.
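The binary-to-decimal mapping the abstract describes, compressing a per-Gaussian binary category code into a single integer, can be sketched in a few lines. This is an illustrative encoding only; the paper's coarse-to-fine code layout is an assumption, not taken from the source:

```python
def encode_category(bits):
    """Pack a coarse-to-fine binary code (list of 0/1, most-significant
    bit first) into a single integer via binary-to-decimal mapping."""
    code = 0
    for b in bits:
        code = (code << 1) | b
    return code

def decode_category(code, n_bits):
    """Recover the binary code from the packed integer."""
    return [(code >> (n_bits - 1 - i)) & 1 for i in range(n_bits)]

# A 16-bit code that would otherwise need a 16-dimensional feature
# vector is stored as one integer per Gaussian.
bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1]
code = encode_category(bits)
print(decode_category(code, 16) == bits)  # True
```

The round trip is lossless, which is what makes the single-integer representation a pure memory optimization.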

AAAI Conference 2026 Conference Paper

CompEvent: Complex-valued Event-RGB Fusion for Low-light Video Enhancement and Deblurring

  • Mingchen Zhong
  • Xin Lu
  • Dong Li
  • Senyan Xu
  • Ruixuan Jiang
  • Xueyang Fu
  • Baocai Yin

Low-light video deblurring poses significant challenges in applications like nighttime surveillance and autonomous driving due to dim lighting and long exposures. While event cameras offer potential solutions with superior low-light sensitivity and high temporal resolution, existing fusion methods typically employ staged strategies, limiting their effectiveness against combined low-light and motion blur degradations. To overcome this, we propose CompEvent, a complex-valued neural network framework enabling holistic full-process fusion of event data and RGB frames for enhanced joint restoration. CompEvent features two core components: 1) a Complex Temporal Alignment GRU, which utilizes complex-valued convolutions and processes video and event streams iteratively via a GRU to achieve temporal alignment and continuous fusion; and 2) a Complex Space-Frequency Learning module, which performs unified complex-valued signal processing in both spatial and frequency domains, facilitating deep fusion through spatial structures and system-level characteristics. By leveraging the holistic representation capability of complex-valued neural networks, CompEvent achieves full-process spatiotemporal fusion, maximizes complementary learning between modalities, and significantly strengthens low-light video deblurring capability. Extensive experiments demonstrate that CompEvent outperforms SOTA methods on this challenging task.

AAAI Conference 2026 Conference Paper

E-MaT: Event-oriented Mamba for Egocentric Point Tracking

  • Han Han
  • Wei Zhai
  • Baocai Yin
  • Yang Cao
  • Bin Li
  • Zheng-Jun Zha

Egocentric point tracking aims to localize points on object surfaces from a first-person perspective and serves as a critical step toward embodied intelligence. Recent methods rely on video input, tracking query points through feature matching across consecutive frames. However, these methods struggle in highly dynamic settings—a common challenge in first-person perspectives, where the head-mounted camera undergoes frequent and abrupt rotations, resulting in high angular velocities, motion blur, and large inter-frame displacements. In contrast, event cameras capture motion at microsecond temporal resolution, naturally avoiding blur and delivering low-latency, high-fidelity cues crucial for egocentric point tracking. Moreover, rapid egocentric motion disrupts local smoothness, breaking the assumption that spatially adjacent regions share similar motion. Event dynamics expose global motion trends, guiding coherent modeling and consistent feature flow. Therefore, this paper proposes a Mamba-based tracking framework that constructs feature modeling paths aligned with the dominant motion trend extracted from events, and modulates feature propagation along these paths based on local motion intensity, enhancing stability by suppressing unreliable signals and emphasizing consistent cues. Additionally, a motion-adaptive suppression module enhances temporal robustness by adaptively suppressing correlation features based on motion intensity variations, mitigating the effects of intensity fluctuations and partial observability. To facilitate research in this domain, a multimodal dataset named DVS-EgoPoints with both events and videos for egocentric point tracking is collected. Experiments on the DVS-EgoPoints dataset and a simulation benchmark demonstrate superior performance over state-of-the-art methods, especially under challenging motion and occlusion conditions.

AAAI Conference 2026 Conference Paper

MARE: Multimodal Analogical Reasoning for Disease Evolution-Aware Radiology Report Generation

  • Qingqing Gao
  • Tengfei Liu
  • Xiaoyan Li
  • Xiaodan Zhang
  • Zhongfan Sun
  • Boyue Wang
  • Baocai Yin
  • Zhaohui Liu

Radiology report generation from longitudinal medical data is critical for assessing disease progression and automating diagnostic workflows. While recent methods incorporate longitudinal information, they primarily rely on multimodal feature fusion, with limited capacity for explicit disease evolution modeling and temporal reasoning. To address this, we propose MARE, an end-to-end framework that formulates longitudinal radiology report generation as a multimodal analogical reasoning task. Inspired by the Abduction–Mapping–Induction paradigm, MARE models latent relational structures underlying disease evolution by aligning lesion-level visual features across time and mapping them to the textual domain for temporally coherent and clinically meaningful report generation. To mitigate the spatial misalignment caused by patient positioning or imaging variation, we introduce an Adaptive Region Alignment (ARA) module for robust temporal correspondence. Additionally, we design Dual Evolution Consistency (DEC) losses to regularize analogical reasoning by enforcing temporal coherence in both visual and textual evolution paths. Extensive experiments on the Longitudinal-MIMIC dataset demonstrate that MARE significantly outperforms state-of-the-art baselines across both natural language generation and clinical effectiveness metrics, highlighting the value of structured analogical reasoning for disease evolution-aware report generation.

AAAI Conference 2025 Conference Paper

HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation

  • Tengfei Liu
  • Jiapu Wang
  • Yongli Hu
  • Mingjie Li
  • Junfei Yi
  • Xiaojun Chang
  • Junbin Gao
  • Baocai Yin

Radiology report generation (RRG) models typically focus on individual exams, often overlooking the integration of historical visual or textual data, which is crucial for patient follow-ups. Traditional methods usually struggle with long sequence dependencies when incorporating historical information, but large language models (LLMs) excel at in-context learning, making them well-suited for analyzing longitudinal medical data. In light of this, we propose a novel Historical-Constrained Large Language Models (HC-LLM) framework for RRG, empowering LLMs with longitudinal report generation capabilities by constraining the consistency and differences between longitudinal images and their corresponding reports. Specifically, our approach extracts both time-shared and time-specific features from longitudinal chest X-rays and diagnostic reports to capture disease progression. Then, we ensure consistent representation by applying intra-modality similarity constraints and aligning various features across modalities with multimodal contrastive and structural constraints. These combined constraints effectively guide the LLMs in generating diagnostic reports that accurately reflect the progression of the disease, achieving state-of-the-art results on the Longitudinal-MIMIC dataset. Notably, our approach performs well even without historical data during testing and can be easily adapted to other multimodal large models, enhancing its versatility.

AAAI Conference 2025 Conference Paper

MaskPrompt: Open-Vocabulary Affordance Segmentation with Object Shape Mask Prompts

  • Dongpan Chen
  • Dehui Kong
  • Jinghua Li
  • Baocai Yin

Affordance refers to the interactable functional properties of an object, and affordance segmentation aims to segment, at the pixel level, the functional parts of objects in a given image, which is crucial for various interactive vision tasks. Existing methods address affordance segmentation using only image features; they can hardly handle interference between adjacent object pixels in complex scenes and cannot generalize to the open world. To tackle these problems, we introduce a novel open-vocabulary affordance segmentation task with a benchmark dataset, and propose an approach based on object shape mask prompts. The mask serves as a prior for visual feature enhancement at different granularities and for fine-grained text prompt embedding. Specifically, we first propose a mask prompt generation module, which generates refined object shape masks as well as text prompts for mask-focused regions. Based on the masks, we propose a mask prompt feature enhancement module: it uses masks to encode instance features and then aggregates them with global features to enhance the visual feature representation. The enhanced visual features are combined with text prompts of different granularities to generate class-agnostic affordance mask proposals. We finally classify these proposals in a proposed affordance prediction module. Quantitative and qualitative comparisons with state-of-the-art methods demonstrate that the proposed method achieves superior performance on the proposed benchmark dataset. Our approach is also competitive on other open-vocabulary part segmentation datasets.

AAAI Conference 2025 Conference Paper

RFL: Simplifying Chemical Structure Recognition with Ring-Free Language

  • Qikai Chang
  • Mingjun Chen
  • Changpeng Pi
  • Pengfei Hu
  • Zhenrong Zhang
  • Jiefeng Ma
  • Jun Du
  • Baocai Yin

The primary objective of Optical Chemical Structure Recognition is to translate chemical structure images into corresponding markup sequences. However, the complex two-dimensional structures of molecules, particularly those with rings and multiple branches, present significant challenges for current end-to-end methods, which must learn one-dimensional markup directly. To overcome this limitation, we propose a novel Ring-Free Language (RFL), which utilizes a divide-and-conquer strategy to describe chemical structures in a hierarchical form. RFL allows complex molecular structures to be decomposed into multiple parts, ensuring both uniqueness and conciseness while enhancing readability. This approach significantly reduces the learning difficulty for recognition models. Leveraging RFL, we propose a universal Molecular Skeleton Decoder (MSD), which comprises a skeleton generation module that progressively predicts the molecular skeleton and individual rings, along with a branch classification module for predicting branch information. Experimental results demonstrate that the proposed RFL and MSD can be applied to various mainstream methods, achieving superior performance compared to state-of-the-art approaches in both printed and handwritten scenarios.

IJCAI Conference 2024 Conference Paper

1DFormer: A Transformer Architecture Learning 1D Landmark Representations for Facial Landmark Tracking

  • Shi Yin
  • Shijie Huang
  • Shangfei Wang
  • Jinshui Hu
  • Tao Guo
  • Bing Yin
  • Baocai Yin
  • Cong Liu

Recently, heatmap regression methods based on 1D landmark representations have shown prominent performance in locating facial landmarks. However, previous methods have not deeply explored the potential of 1D landmark representations for sequential and structural modeling of multiple landmarks in facial landmark tracking. To address this limitation, we propose a Transformer architecture, namely 1DFormer, which learns informative 1D landmark representations by capturing the dynamic and geometric patterns of landmarks via token communications in both the temporal and spatial dimensions. For temporal modeling, we propose a confidence-enhanced multi-head attention mechanism with a recurrent token mixing strategy to adaptively and robustly embed long-term landmark dynamics into the 1D representations; for structural modeling, we design intra-group and inter-group geometric encoding mechanisms that encode component-level as well as global-level facial structure patterns, refining the 1D landmark representations through token communications in the spatial dimension via 1D convolutional layers. Experimental results on the 300VW and TF databases show that 1DFormer successfully models long-range sequential patterns as well as inherent facial structures to learn informative 1D representations of landmark sequences, and achieves state-of-the-art performance on facial landmark tracking. Codes of our model are available in the supplementary materials.

AAAI Conference 2024 Conference Paper

Graph Neural Networks with Soft Association between Topology and Attribute

  • Yachao Yang
  • Yanfeng Sun
  • Shaofan Wang
  • Jipeng Guo
  • Junbin Gao
  • Fujiao Ju
  • Baocai Yin

Graph Neural Networks (GNNs) have shown great performance in learning representations for graph-structured data. However, recent studies have found that the interference between topology and attribute can lead to distorted node representations. Most GNNs are designed under homophily assumptions and thus cannot be applied to graphs with heterophily. This research critically analyzes the propagation principles of various GNNs and the corresponding challenges from an optimization perspective. A novel GNN called Graph Neural Networks with Soft Association between Topology and Attribute (GNN-SATA) is proposed. Different embeddings are utilized to gain insights into attributes and structures while establishing their interconnections through soft association. Further, as integral components of the soft association, a Graph Pruning Module (GPM) and a Graph Augmentation Module (GAM) are developed; these modules dynamically remove or add edges in the adjacency relationships so that the model better fits graphs with homophily or heterophily. Experimental results on homophilic and heterophilic graph datasets convincingly demonstrate that the proposed GNN-SATA captures more accurate adjacency relationships and outperforms state-of-the-art approaches. In particular, on the heterophilic graph dataset Squirrel, GNN-SATA achieves a 2.81% improvement in accuracy while utilizing merely 27.19% of the original number of adjacency relationships. Our code is released at https://github.com/wwwfadecom/GNN-SATA.

NeurIPS Conference 2024 Conference Paper

Large Language Models-guided Dynamic Adaptation for Temporal Knowledge Graph Reasoning

  • Jiapu Wang
  • Kai Sun
  • Linhao Luo
  • Wei Wei
  • Yongli Hu
  • Alan W. Liew
  • Shirui Pan
  • Baocai Yin

Temporal Knowledge Graph Reasoning (TKGR) is the process of utilizing temporal information to capture complex relations within a Temporal Knowledge Graph (TKG) to infer new knowledge. Conventional methods in TKGR typically depend on deep learning algorithms or temporal logical rules. However, deep learning-based TKGRs often lack interpretability, whereas rule-based TKGRs struggle to effectively learn temporal rules that capture temporal patterns. Recently, Large Language Models (LLMs) have demonstrated extensive knowledge and remarkable proficiency in temporal reasoning. Consequently, the employment of LLMs for Temporal Knowledge Graph Reasoning (TKGR) has sparked increasing interest among researchers. Nonetheless, LLMs are known to function as black boxes, making it challenging to comprehend their reasoning process. Additionally, due to the resource-intensive nature of fine-tuning, promptly updating LLMs to integrate evolving knowledge within TKGs for reasoning is impractical. To address these challenges, in this paper, we propose a Large Language Models-guided Dynamic Adaptation (LLM-DA) method for reasoning on TKGs. Specifically, LLM-DA harnesses the capabilities of LLMs to analyze historical data and extract temporal logical rules. These rules unveil temporal patterns and facilitate interpretable reasoning. To account for the evolving nature of TKGs, a dynamic adaptation strategy is proposed to update the LLM-generated rules with the latest events. This ensures that the extracted rules always incorporate the most recent knowledge and better generalize to the predictions on future events. Experimental results show that without the need of fine-tuning, LLM-DA significantly improves the accuracy of reasoning over several common datasets, providing a robust framework for TKGR tasks.
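As a toy illustration of applying an extracted temporal logical rule to TKG quadruples, the sketch below chains two relations under a time-ordering constraint. The rule shape, relation names, and facts are invented for illustration; LLM-DA's actual rule language is richer than this:

```python
def apply_rule(quads, rule):
    """Apply one temporal chain rule over (subj, rel, obj, time) facts.
    rule = (rel1, rel2, head_rel): rel1(X,Z,t1) and rel2(Z,Y,t2) with
    t1 <= t2 imply head_rel(X,Y) at time t2 -- a toy stand-in for the
    LLM-extracted temporal rules described in the abstract."""
    rel1, rel2, head = rule
    preds = set()
    for s1, r1, o1, t1 in quads:
        if r1 != rel1:
            continue
        for s2, r2, o2, t2 in quads:
            if r2 == rel2 and s2 == o1 and t1 <= t2:
                preds.add((s1, head, o2, t2))
    return preds

facts = [
    ("alice", "worksFor", "acme", 1),
    ("acme", "locatedIn", "berlin", 2),
]
print(apply_rule(facts, ("worksFor", "locatedIn", "basedIn")))
# {('alice', 'basedIn', 'berlin', 2)}
```

Because rules like this are explicit, every predicted quadruple can be traced back to the facts that fired it, which is the interpretability advantage the abstract emphasizes.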

EAAI Journal 2024 Journal Article

Lite-UNet: A lightweight and efficient network for cell localization

  • Bo Li
  • Yong Zhang
  • Yunhan Ren
  • Chengyang Zhang
  • Baocai Yin

Cell localization constitutes a fundamental research domain within pathology image analysis, with its core objective being the precise identification of cell spatial coordinates. The task has long involved the challenges of large color variations among cells, uneven distribution, and overlapping borders. Furthermore, in realistic cell localization scenarios, existing state-of-the-art methods suffer from high computational costs and slow inference times, which severely reduce the efficiency of computer-assisted medicine. To tackle the above issues, a lightweight and efficient cell localization model named Lite-UNet is proposed. Specifically, Lite-UNet encompasses three pivotal modules. Firstly, we introduce a gradient aggregation module grounded in difference convolution; this module effectively mitigates the challenge posed by extensive color variations among cells by adeptly leveraging gradient information. Secondly, we propose an efficient plug-and-play graph correlation attention module, which improves feature representation by encoding higher-order feature associations. Finally, we design a lightweight Ghost_CBAM module that alleviates the difficulty of uneven cell distribution while forming the base module of Lite-UNet. Extensive experiments show that Lite-UNet locates cells in images quickly and accurately, further improving the efficiency of computer-assisted medicine.

AAAI Conference 2024 Conference Paper

TAU: Trajectory Data Augmentation with Uncertainty for Next POI Recommendation

  • Zhuang Zhuang
  • Tianxin Wei
  • Lingbo Liu
  • Heng Qi
  • Yanming Shen
  • Baocai Yin

Next Point-of-Interest (POI) recommendation has been proven effective at utilizing sparse, intricate spatial-temporal trajectory data to recommend subsequent POIs to users. While existing methods commonly alleviate the problem of data sparsity by integrating spatial-temporal context information, POI category features, and social relationships, they largely overlook the fact that the trajectory sequences collected in the datasets are often incomplete. This oversight limits the model’s potential to fully leverage historical context. In light of this background, we propose Trajectory Data Augmentation with Uncertainty (TAU) for Next POI Recommendation. TAU is a general graph-based trajectory data augmentation method designed to complete user mobility patterns by marrying uncertainty estimation into the next POI recommendation task. More precisely, TAU taps into the global transition pattern graph to identify sets of intermediate nodes located between every pair of locations, effectively leveraging edge weights as transition probabilities. During trajectory sequence construction, TAU selectively prompts intermediate nodes, chosen based on their likelihood of occurrence as pseudo-labels, to establish comprehensive trajectory sequences. Furthermore, to gauge the certainty and impact of pseudo-labels on the target location, we introduce a novel confidence-aware calibration strategy using evidence deep learning (EDL) for improved performance and reliability. The experimental results clearly indicate that our TAU method achieves consistent performance improvements over existing techniques across two real-world datasets, verifying its effectiveness as the state-of-the-art approach to the task.
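TAU's use of edge weights in the global transition pattern graph as transition probabilities, to pick likely intermediate nodes between a pair of locations, can be illustrated with a small weighted graph. The graph, POI names, and scoring (product of normalized edge probabilities) are illustrative assumptions, not the paper's exact procedure:

```python
def intermediate_candidates(graph, src, dst):
    """Rank candidate intermediate POIs m with edges src->m->dst by the
    product of normalized transition probabilities. `graph` maps
    node -> {successor: visit_count}. A toy stand-in for TAU's use of
    the global transition pattern graph."""
    def prob(u, v):
        total = sum(graph.get(u, {}).values())
        return graph.get(u, {}).get(v, 0) / total if total else 0.0

    cands = [(m, prob(src, m) * prob(m, dst))
             for m in graph.get(src, {})
             if dst in graph.get(m, {})]
    return sorted(cands, key=lambda x: -x[1])

g = {
    "home":   {"cafe": 8, "gym": 2},
    "cafe":   {"office": 9, "park": 1},
    "gym":    {"office": 5},
    "office": {},
}
# Most likely intermediate stop between "home" and "office":
print(intermediate_candidates(g, "home", "office")[0][0])  # cafe
```

The top-ranked candidates would then serve as pseudo-labels to fill gaps in an incomplete trajectory, with their scores feeding the confidence calibration step.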

EAAI Journal 2023 Journal Article

MVMA-GCN: Multi-view multi-layer attention graph convolutional networks

  • Pengyu Zhang
  • Yong Zhang
  • Jingcheng Wang
  • Baocai Yin

The accuracy of graph representation learning is highly dependent on the precise characterization of node relationships. However, representing the complex and diverse networks of the real world using a single type of node or link is challenging, often resulting in incomplete information. Moreover, different types of nodes and links convey rich information, which makes it difficult to design a graph network that can integrate diverse links. This paper introduces a novel multi-view and multi-layer attention model designed to optimize node embeddings for semi-supervised node classification. The proposed model exploits various types of inter-node links and employs the Hilbert–Schmidt independence criterion to maximize the dissimilarity between distinct node relationships. Furthermore, the multi-layer attention mechanism is used to discern the impact of different neighboring nodes and of the relationships between various node relations. The performance of the proposed model, MVMA-GCN, was assessed on numerous real-world multi-view datasets, where it consistently outperformed existing models, demonstrating superior accuracy in semi-supervised classification tasks. We have made our code publicly available to ensure the reproducibility of our results.
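The Hilbert–Schmidt independence criterion mentioned above has a compact empirical form, HSIC(K, L) = trace(KHLH)/(n-1)^2, with Gram matrices K, L and the centering matrix H. The pure-Python sketch below uses linear kernels on toy two-view data; the kernel choice and the toy views are assumptions for illustration:

```python
def hsic(X, Y):
    """Empirical HSIC with linear kernels: trace(K H L H) / (n-1)^2,
    where K = X X^T, L = Y Y^T and H = I - (1/n) 11^T centers the
    kernels. Values near zero indicate (linear) independence."""
    n = len(X)

    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    def gram(Z):
        return [[sum(a * b for a, b in zip(u, v)) for v in Z] for u in Z]

    H = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)]
         for i in range(n)]
    K, L = gram(X), gram(Y)
    M = matmul(matmul(matmul(K, H), L), H)
    return sum(M[i][i] for i in range(n)) / (n - 1) ** 2

# Identical views are maximally dependent; a constant view carries no
# information, so its HSIC with anything is ~0.
A = [[1.0], [2.0], [3.0], [4.0]]
B = [[5.0], [5.0], [5.0], [5.0]]
print(hsic(A, A) > hsic(A, B))  # True
```

In MVMA-GCN the criterion is used in the opposite direction to the usual dependence tests: the model *minimizes* dependence so that distinct node relationships stay dissimilar and complementary.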

ICML Conference 2022 Conference Paper

A New Perspective on the Effects of Spectrum in Graph Neural Networks

  • Mingqi Yang
  • Yanming Shen
  • Rui Li 0086
  • Heng Qi
  • Qiang Zhang 0008
  • Baocai Yin

Many improvements on GNNs can be deemed as operations on the spectrum of the underlying graph matrix, which motivates us to directly study the characteristics of the spectrum and their effects on GNN performance. By generalizing most existing GNN architectures, we show that the correlation issue caused by an unsmooth spectrum becomes the obstacle to leveraging more powerful graph filters as well as to developing deep architectures, which therefore restricts GNNs' performance. Inspired by this, we propose a correlation-free architecture which naturally removes the correlation issue among different channels, making it possible to utilize more sophisticated filters within each channel. The final correlation-free architecture with more powerful filters consistently boosts the performance of learning graph representations. Code is available at https://github.com/qslim/gnn-spectrum.

NeurIPS Conference 2022 Conference Paper

Biologically Inspired Dynamic Thresholds for Spiking Neural Networks

  • Jianchuan Ding
  • Bo Dong
  • Felix Heide
  • Yufei Ding
  • Yunduo Zhou
  • Baocai Yin
  • Xin Yang

The dynamic membrane potential threshold, as one of the essential properties of a biological neuron, is a spontaneous regulation mechanism that maintains neuronal homeostasis, i.e., the constant overall spiking firing rate of a neuron. As such, the neuron firing rate is regulated by a dynamic spiking threshold, which has been extensively studied in biology. Existing work in the machine learning community does not employ bioinspired spiking threshold schemes. This work aims at bridging this gap by introducing a novel bioinspired dynamic energy-temporal threshold (BDETT) scheme for spiking neural networks (SNNs). The proposed BDETT scheme mirrors two bioplausible observations: a dynamic threshold has 1) a positive correlation with the average membrane potential and 2) a negative correlation with the preceding rate of depolarization. We validate the effectiveness of the proposed BDETT on robot obstacle avoidance and continuous control tasks under both normal conditions and various degraded conditions, including noisy observations, weights, and dynamic environments. We find that the BDETT outperforms existing static and heuristic threshold approaches by significant margins in all tested conditions, and we confirm that the proposed bioinspired dynamic threshold scheme offers homeostasis to SNNs in complex real-world tasks.
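The two bioplausible observations, a threshold that rises with the average membrane potential and falls with the preceding rate of depolarization, can be sketched as a simple linear rule. The coefficients and the linear form below are illustrative stand-ins, not the paper's actual BDETT formula:

```python
def dynamic_threshold(potentials, prev_potentials, base=1.0, a=0.5, b=0.5):
    """Toy dynamic spiking threshold mirroring the two observations in
    the abstract: positively correlated with the average membrane
    potential, negatively correlated with the preceding depolarization
    rate. `base`, `a`, `b` are illustrative hyperparameters."""
    n = len(potentials)
    mean_v = sum(potentials) / n
    depol_rate = sum(v - p for v, p in zip(potentials, prev_potentials)) / n
    return base + a * mean_v - b * depol_rate

# Higher average potential raises the threshold (damping activity),
# while rapid depolarization lowers it (boosting responsiveness).
low  = dynamic_threshold([0.2, 0.2], [0.2, 0.2])
high = dynamic_threshold([0.9, 0.9], [0.9, 0.9])
print(high > low)  # True
```

The opposing terms are what give the scheme its homeostatic flavor: sustained high activity is damped, while fast transients still get through.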

EAAI Journal 2022 Journal Article

Multi-view hypergraph neural networks for student academic performance prediction

  • Mengran Li
  • Yong Zhang
  • Xiaoyong Li
  • Lijia Cai
  • Baocai Yin

Academic performance prediction is a fundamental and active topic in educational data mining (EDM). Recently, researchers have proposed a series of effective machine learning (ML) based classification strategies to predict students' academic performance. However, prior work typically models students individually and neglects the associations among students, which can considerably affect the integrity of academic performance-related representations. Meanwhile, students' multi-view behavior contains complex relations among students. Therefore, we propose a Multi-View Hypergraph Neural Network (MVHGNN) for predicting students' academic performance. MVHGNN uses hypergraphs to construct high-order relations among students, and the semantic information implied by multiple behaviors is consolidated through meta-paths. Further, a Cascade Attention Transformer (CAT) module is introduced to learn the weights of different behaviors via a self-attention mechanism. Our method is evaluated on real campus student behavioral datasets, and the experimental results demonstrate that it outperforms the state-of-the-art methods.

IROS Conference 2021 Conference Paper

A Vision-based Irregular Obstacle Avoidance Framework via Deep Reinforcement Learning

  • Lingping Gao
  • Jianchuan Ding
  • Wenxi Liu
  • Haiyin Piao
  • Yuxin Wang 0001
  • Xin Yang 0011
  • Baocai Yin

Deep reinforcement learning has achieved great success in laser-based collision avoidance because the laser senses accurate depth information without too much redundant data, which keeps the algorithm robust when it is migrated from the simulation environment to the real world. However, high-cost laser devices are not only difficult to deploy at scale but also poorly robust to irregular objects, e.g., tables, chairs, and shelves. In this paper, we propose a vision-based collision avoidance framework to solve this challenging problem. Our method estimates depth and incorporates semantic information from RGB data to obtain a new form of data, pseudo-laser data, which combines the advantages of visual and laser information. Compared to traditional laser data, which only contains one-dimensional distance information captured at a certain height, our proposed pseudo-laser data encodes both the depth and the semantic information within the image, making our method more effective for irregular obstacles. Besides, because the estimated depth information is not accurate, we adaptively add noise to the laser data during the training stage to increase the robustness of our model in the real world. Experimental results show that our framework achieves state-of-the-art performance in several unseen virtual and real-world scenarios.
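One simplified reading of the pseudo-laser idea, collapsing a depth map into a 1D scan while using semantic labels to keep only obstacle pixels, can be sketched as follows. The per-column-minimum rule and the label set are illustrative assumptions, not the paper's construction:

```python
def pseudo_laser(depth, labels, obstacle_labels):
    """Collapse a depth map (rows x cols, metres) into a 1D pseudo-laser
    scan: for each image column, take the nearest pixel whose semantic
    label marks an obstacle. Columns with no obstacle report infinity."""
    rows, cols = len(depth), len(depth[0])
    scan = []
    for c in range(cols):
        ds = [depth[r][c] for r in range(rows)
              if labels[r][c] in obstacle_labels]
        scan.append(min(ds) if ds else float("inf"))
    return scan

depth = [
    [2.0, 3.0, 9.0],
    [1.5, 3.5, 9.0],
]
labels = [
    ["chair", "floor", "floor"],
    ["chair", "table", "floor"],
]
print(pseudo_laser(depth, labels, {"chair", "table"}))
# [1.5, 3.5, inf]
```

Unlike a fixed-height laser slice, this scan registers a table top or chair leg anywhere in the column, which is why irregular obstacles become visible to the policy.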

AAAI Conference 2021 Conference Paper

Hierarchical Graph Convolution Network for Traffic Forecasting

  • Kan Guo
  • Yongli Hu
  • Yanfeng Sun
  • Sean Qian
  • Junbin Gao
  • Baocai Yin

Traffic forecasting is attracting considerable interest due to its widespread application in intelligent transportation systems. Given the complex and dynamic traffic data, many methods focus on how to establish a spatial-temporal model to express the non-stationary traffic patterns. Recently, the Graph Convolution Network (GCN) has been introduced to learn spatial features while time neural networks are used to learn temporal features, and these GCN-based methods obtain state-of-the-art performance. However, current GCN-based methods ignore the natural hierarchical structure of traffic systems, which is composed of the micro layer of road networks and the macro layer of region networks; the macro nodes are obtained through a pooling method and may include hot traffic regions such as downtown and the CBD, while the current GCN is only applied to the micro graph of road networks. In this paper, we propose a novel Hierarchical Graph Convolution Network (HGCN) for traffic forecasting that operates on both the micro and macro traffic graphs. The proposed method is evaluated on two complex city traffic speed datasets. Compared to the latest GCN-based methods like Graph WaveNet, the proposed HGCN achieves higher traffic forecasting precision with lower computational cost. The code is available at https://github.com/guokan987/HGCN.git.

IJCAI Conference 2018 Conference Paper

Active Object Reconstruction Using a Guided View Planner

  • Xin Yang
  • Yuanbo Wang
  • Yaru Wang
  • Baocai Yin
  • Qiang Zhang
  • Xiaopeng Wei
  • Hongbo Fu

Inspired by the recent advance of image-based object reconstruction using deep learning, we present an active reconstruction model using a guided view planner. We aim to reconstruct a 3D model using images observed from a planned sequence of informative and discriminative views. But where are such informative and discriminative views around an object? To address this, we propose a unified model for view planning and object reconstruction, which is utilized to learn a guided information acquisition model and to aggregate information from a sequence of images for reconstruction. Experiments show that our model (1) increases reconstruction accuracy with an increasing number of views and (2) generally predicts a more informative sequence of views for object reconstruction than alternative methods.

IJCAI Conference 2018 Conference Paper

Cascaded Low Rank and Sparse Representation on Grassmann Manifolds

  • Boyue Wang
  • Yongli Hu
  • Junbin Gao
  • Yanfeng Sun
  • Baocai Yin

Inspired by the success of low rank representation and sparse subspace clustering, several methods attempt to simultaneously impose low rank and sparse constraints on the affinity matrix to improve performance. However, this amounts to a mere trade-off between the two constraints. In this paper, we propose a novel Cascaded Low Rank and Sparse Representation (CLRSR) method for subspace clustering, which seeks a sparse expression on the previously learned low rank latent representation. To make the proposed method suitable for multi-dimensional or image-set data, we extend CLRSR onto Grassmann manifolds. An effective solution and its convergence analysis are also provided. Excellent experimental results demonstrate that the proposed method is more robust than other state-of-the-art clustering methods on image-set data.
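The cascade structure, sketched in its plain Euclidean form: first a low-rank representation of the data, then a sparse expression sought on that learned representation. The closed-form shape-interaction step and the soft-thresholding parameter are illustrative assumptions; the paper's actual model operates on Grassmann manifolds.

```python
import numpy as np

def low_rank_step(X, r):
    # Closed-form LRR for noiseless data: Z = Vr Vr^T (shape-interaction matrix)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Vr = Vt[:r].T
    return Vr @ Vr.T

def sparse_step(Z, tau):
    # Soft-thresholding: a sparse expression on the learned low-rank representation
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)

rng = np.random.default_rng(1)
basis = rng.standard_normal((5, 2))
X = basis @ rng.standard_normal((2, 8))   # 8 samples drawn from a 2-d subspace
Z = low_rank_step(X, r=2)                 # stage 1: low-rank latent representation
C = sparse_step(Z, tau=0.05)              # stage 2: sparsify the latent representation
W = np.abs(C) + np.abs(C).T               # symmetric affinity for spectral clustering
```

Cascading the two steps, rather than weighting them in one objective, avoids the trade-off the abstract points out: each constraint acts at its own stage.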

AAAI Conference 2018 Conference Paper

Locality Preserving Projection Based on F-norm

  • Xiangjie Hu
  • Yanfeng Sun
  • Junbin Gao
  • Yongli Hu
  • Baocai Yin

Locality preserving projection (LPP) is a well-known method for dimensionality reduction that preserves the neighborhood graph structure of data. Traditional LPP employs the squared F-norm for distance measurement, which can exaggerate distance errors and make the model sensitive to outliers. To address this issue, we propose two novel F-norm-based models, termed F-LPP and F-2DLPP, developed for vector-based and matrix-based data, respectively. In F-LPP and F-2DLPP, the distance of data projected into a low-dimensional space is measured by the F-norm, so both methods are expected to reduce the influence of outliers. To solve the F-norm-based models, we propose an iterative optimization algorithm and provide a convergence analysis. Experimental results on three public databases demonstrate the effectiveness of the proposed methods.
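A common way to optimize an unsquared-norm objective of this kind is iterative reweighting: each pass downweights large projected distances so they are penalized linearly rather than quadratically. The sketch below shows the idea for the vector-based case; the specific reweighting rule, the `eps` smoothing, and the whitening-based generalized eigensolver are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def smallest_gen_eigvecs(A, B, k):
    # Solve A p = lam B p by symmetric whitening with B^{-1/2}
    w, U = np.linalg.eigh(B)
    B_is = U @ np.diag(1.0 / np.sqrt(np.maximum(w, 1e-10))) @ U.T
    _, V = np.linalg.eigh(B_is @ A @ B_is)       # eigh: ascending, smallest first
    return B_is @ V[:, :k]

def f_lpp(X, W, dim, n_iter=10, eps=1e-8):
    """Iteratively reweighted sketch of an F-norm LPP objective.

    X: d x n data, W: n x n symmetric neighborhood weights. Edge ij is
    reweighted by 1 / (2 ||y_i - y_j||), making the effective penalty on
    projected distances linear (F-norm) rather than squared.
    """
    d, _ = X.shape
    P = np.eye(d)[:, :dim]                        # initial projection
    for _ in range(n_iter):
        Y = P.T @ X
        diff = Y[:, :, None] - Y[:, None, :]
        dist = np.sqrt((diff ** 2).sum(axis=0))
        W_eff = W / (2.0 * dist + eps)            # downweight large distances
        D = np.diag(W_eff.sum(axis=1))
        L = D - W_eff                             # graph Laplacian
        P = smallest_gen_eigvecs(X @ L @ X.T, X @ D @ X.T, dim)
    return P

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 12))
W = (np.abs(np.arange(12)[:, None] - np.arange(12)[None, :]) == 1).astype(float)  # chain graph
P = f_lpp(X, W, dim=2)
```

Each reweighted subproblem is an ordinary LPP generalized eigenproblem, which is why the overall scheme admits a convergence analysis.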

IJCAI Conference 2017 Conference Paper

Locality Preserving Projections for Grassmann manifold

  • Boyue Wang
  • Yongli Hu
  • Junbin Gao
  • Yanfeng Sun
  • Haoran Chen
  • Muhammad Ali
  • Baocai Yin

Learning on the Grassmann manifold has become popular in many computer vision tasks, owing to its strong capability to extract discriminative information from image sets and videos. However, such learning algorithms, particularly on high-dimensional Grassmann manifolds, involve significantly high computational cost, which seriously limits their applicability in wider areas. In this research, we propose an unsupervised dimensionality reduction algorithm on the Grassmann manifold based on the Locality Preserving Projections (LPP) criterion. LPP is a commonly used dimensionality reduction algorithm for vector-valued data, aiming to preserve the local structure of data in the dimension-reduced space. Our strategy is to construct a mapping from a high-dimensional Grassmann manifold to a relatively low-dimensional one with more discriminative capability. The proposed method can be optimized as a basic eigenvalue problem. Its performance is assessed on several classification and clustering tasks, and the experimental results show clear advantages over other Grassmann-based algorithms.
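The core LPP ingredient on a manifold is a neighborhood affinity built from manifold distances rather than Euclidean ones. A minimal sketch, assuming the projection metric between subspaces and heat-kernel weights (both illustrative choices, not necessarily the paper's):

```python
import numpy as np

def grassmann_point(A):
    # Represent a data matrix (e.g. an image set) by an orthonormal basis of its span
    Q, _ = np.linalg.qr(A)
    return Q

def projection_distance(X, Y):
    # Projection metric on the Grassmann manifold:
    # d(X, Y) = ||X X^T - Y Y^T||_F / sqrt(2)
    return np.linalg.norm(X @ X.T - Y @ Y.T) / np.sqrt(2.0)

def lpp_affinity(points, t=1.0):
    # Heat-kernel neighborhood weights, the LPP graph, built on manifold distances
    n = len(points)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d2 = projection_distance(points[i], points[j]) ** 2
            W[i, j] = W[j, i] = np.exp(-d2 / t)
    return W

rng = np.random.default_rng(2)
pts = [grassmann_point(rng.standard_normal((10, 3))) for _ in range(5)]
W = lpp_affinity(pts)   # symmetric affinity over 5 Grassmann points
```

With such an affinity in hand, the LPP criterion again reduces to an eigenvalue problem, consistent with the abstract's claim that the method is optimized as a basic eigenvalue problem.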

AAAI Conference 2016 Conference Paper

Product Grassmann Manifold Representation and Its LRR Models

  • Boyue Wang
  • Yongli Hu
  • Junbin Gao
  • Yanfeng Sun
  • Baocai Yin

It is a challenging problem to cluster multi-dimensional and high-dimensional data with complex intrinsic properties and nonlinear manifold structure. The recently proposed subspace clustering method, Low Rank Representation (LRR), shows attractive performance on data clustering, but it generally deals with data in Euclidean spaces. In this paper, we aim to cluster complex high-dimensional data with multiple varying factors. We propose a novel representation, namely the Product Grassmann Manifold (PGM), to represent such data. Additionally, we discuss the geometry metric of the manifold and extend the conventional LRR model from Euclidean space onto PGM, thus constructing a new LRR model. Clustering experiments show that the proposed method obtains superior accuracy compared with clustering methods on manifolds or in conventional Euclidean spaces.
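A natural metric on a product manifold combines factor-wise distances. The sketch below represents a PGM point as a tuple of subspaces (one per varying factor) and, as an illustrative assumption, takes the product distance to be the root of the summed squared factor geodesic distances computed from principal angles:

```python
import numpy as np

def orth(A):
    # Orthonormal basis for the span of A (a point on one factor Grassmannian)
    Q, _ = np.linalg.qr(A)
    return Q

def grassmann_dist(X, Y):
    # Geodesic distance from principal angles: singular values of X^T Y are cos(theta)
    s = np.linalg.svd(X.T @ Y, compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))
    return np.sqrt((theta ** 2).sum())

def product_grassmann_dist(Xs, Ys):
    # A PGM point is a tuple of subspaces, one per varying factor;
    # distances combine factor-wise.
    return np.sqrt(sum(grassmann_dist(X, Y) ** 2 for X, Y in zip(Xs, Ys)))

rng = np.random.default_rng(3)
# two PGM points, each with two factors of different sizes
P1 = (orth(rng.standard_normal((8, 2))), orth(rng.standard_normal((12, 3))))
P2 = (orth(rng.standard_normal((8, 2))), orth(rng.standard_normal((12, 3))))
d = product_grassmann_dist(P1, P2)
```

Such a factor-wise metric is what lets an LRR-style model defined in Euclidean space be lifted onto the product manifold while keeping each varying factor's geometry intact.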