Author name cluster

Hang Yu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

31 papers

2 author rows

EAAI Journal 2026 Journal Article

A few-shot enhancement method for railway foreign object detection using sample generation and transfer learning

Hang Yu
Zhiwei Cao
Yong Qin
Tiantao Xu
Tao Jing
Zhenlin Wei

Details DOI

EAAI Journal 2026 Journal Article

Context-aware and deformation-adaptive small unmanned aerial vehicles detection via parallel attention and multi-scale fusion

Hang Yu
Jialin Bao
Yibo Sun
Suiping Zhou
Zhengfa Yu
Yilu Chen
Ke Yan
Yifan Yang

Details DOI

AAAI Conference 2026 Conference Paper

Decoupled Spatiotemporal Forecasting from Extreme Sparse Observations via Quantized Latent Space

Zhongnan Weng
Yue Hong
Hang Yu
Jiayi Que
Juan Liu
Xiangrong Liu

Predicting spatiotemporal fields governed by partial differential equations (PDEs) from sparse sensor data is a critical and long-standing challenge in science and engineering. Recent deep learning approaches, particularly neural operators, have shown considerable promise in solving PDEs. However, their performance degrades significantly in the demanding regime of extreme sparsity, characterized by spatial sensor coverage of less than 1% and limited temporal observations. To overcome this limitation, we propose a novel framework that decouples the task into two stages: spatial reconstruction and temporal extrapolation. In the first stage, rather than reconstructing the high-dimensional physical field directly, our model learns to reconstruct the complete latent features from sparse observations—features that would otherwise be extracted from a dense field. This process is stabilized by a Vector Quantization (VQ) bottleneck, which discretizes the latent space. In the second stage, a decoder-only Transformer performs temporal extrapolation by autoregressively predicting the future sequence of these discrete latent indices. This design inherently allows the model to generalize to new initial conditions and varying forecast horizons, akin to standard autoregressive models. We validate our framework on three challenging benchmarks, achieving state-of-the-art (SOTA) performance under severe sparsity constraints. Furthermore, we introduce a challenging benchmark dataset based on fire dynamics simulations. On this benchmark, our model successfully forecasts the field's evolution 30 frames into the future from a single timeframe with less than 0.1% spatial observations—a result that pushes well beyond the capabilities of existing methods.

PDF Details DOI

AAAI Conference 2026 Conference Paper

DFRec: Dual Fluctuation Modeling of Multi-level Intent Evolution for Next-Item Recommendation

Nengjun Zhu
Lingdan Sun
Qi Zhang
Jian Cao
Hang Yu

User sequential behaviors are driven by a variety of complex and evolving intents. Capturing the dynamic change of user intents has become critical yet challenging in the next-item recommendation. Existing studies usually model the transition relationships among multiple intents within a session or integrate temporal information to capture the dynamic evolution of user intents. However, they struggle to identify the precise timing and magnitudes of intent changes, leading to ambiguity in providing consistent or violated recommendations and ultimately yielding subpar performance. To this end, we propose a novel framework called Dual Fluctuation Modeling of Multi-level Intent Evolution for Next-Item Recommendation (DFRec) in this paper. DFRec explicitly identifies the user intent changes and further quantifies the magnitude of the changes. Specifically, we assume that a user's intent fluctuates around an inherent intent, with the magnitude of fluctuations indicating the extent of changes in user intents. Thus, we design an Emerging Intent Generation Module that employs a normal distribution with dynamic variance to capture intent fluctuations at each time step. Furthermore, we introduce a dual-layer dynamic variance update mechanism to capture fluctuation characteristics at different temporal levels, enhancing the representation of possible emergent intents. Extensive experiments on three real-world datasets verify DFRec's superiority over state-of-the-art baselines.

PDF Details DOI

AAAI Conference 2026 Conference Paper

Discovering Latent Facts from Context to Construct Richer Open Knowledge Graphs

Jinpeng Li
Hang Yu
Ziqi Ma
Peng Qi

Knowledge graph construction (KGC) aims to extract valuable information from text and organize it into structured knowledge graphs (KGs). Recent methods have leveraged the strong generative capabilities of large language models (LLMs) to improve the generalization and reduce the labor costs. However, constrained by the input length of LLMs, existing methods mainly focus on extracting knowledge within individual texts and lack the capability to discover latent knowledge across texts. To fill this gap, we propose a novel method for open knowledge graph construction, termed KG-DLF. The core idea of this method is to enhance the knowledge graph construction process by discovering new facts that are consistent with the underlying contextual logic. Specifically, we first design a knowledge extractor to extract knowledge from the text. Then, a knowledge normalizer performs schema alignment on the extracted knowledge. Next, we explore a knowledge discoverer based on a clue search strategy, which leverages the logical consistency of context to mine latent facts. Finally, we design a counterfactual-based knowledge corrector, enabling the model to purify knowledge and reduce factual errors. Experimental results show that KG-DLF is capable of extracting comprehensive knowledge in open-world scenarios across three KGC benchmarks.

PDF Details DOI

AAAI Conference 2026 Conference Paper

LAMDAS: LLM as an Implicit Classifier for Domain-specific Data Selection

Jian Wu
Hang Yu
Bingchang Liu
Yang Wenjie
Peng Di
Jianguo Li
Yue Zhang

Adapting large language models (LLMs) to specific domains often faces a critical bottleneck: the scarcity of high-quality, human-curated data. While large volumes of unchecked data are readily available, indiscriminately using them for fine-tuning risks introducing noise and degrading performance. Strategic data selection is thus crucial, requiring a method that is both accurate and efficient. Existing approaches, categorized as similarity-based and direct optimization methods, struggle to simultaneously achieve these goals. In this paper, we introduce LAMDAS (LLM as an implicit classifier for domain-specific Data Selection), a novel approach that leverages the pre-trained LLM itself as an implicit classifier, thereby bypassing explicit feature engineering and computationally intensive optimization process. LAMDAS reframes data selection as a one-class classification problem, identifying candidate data that "belongs" to the target domain defined by a small reference dataset. Extensive experimental results demonstrate that LAMDAS not only exceeds the performance of full-data training using a fraction of the data but also outperforms nine state-of-the-art (SOTA) baselines under various scenarios. Furthermore, LAMDAS achieves the most compelling balance between performance gains and computational efficiency compared to all evaluated baselines.

PDF Details DOI

EAAI Journal 2026 Journal Article

Multiscale wavelet-based spatial–spectral compression network for hyperspectral image

Hang Yu
Mingyang Wan
Tao Chen
Aibin Peng
Xiangfei Shen
Rulong He
Lihui Chen
Haijun Liu

Details DOI

AAAI Conference 2026 Conference Paper

Understanding Interaction as You Need: Intention-Driven Pedestrian Behavior Prediction

Hang Yu
Yansen Yu
Jiayan Qiu

Prediction of pedestrian behavior is crucial for autonomous driving systems and intelligent transportation.Conventional methods predict the behavior based solely on either the pedestrian intention or the distance-related interactions between the pedestrian and its surroundings. However, these methods overlook the associations between intention and interaction for behavior prediction, in which they should be aligned with each other, thus leading to sub-optimal predictions. To solve this problem, we propose to predict the behavior by learning the association between intention and interaction, enabling them to mutually enhance each other during the prediction. Specifically, we first predict the short-term intention of all objects, including the target pedestrian and its surroundings.Then, instead of using the distance-related interactions, we predict the interactions by learning the correlated intentions. Finally, the intention-driven interactions refine the initial intention prediction, thus ensuring the alignment between intention and interaction for behavior prediction. We evaluate our method on two downstream tasks, the pedestrian trajectory prediction and pedestrian intention estimation, and show that it outperforms all the existing methods.

PDF Details DOI

ICLR Conference 2025 Conference Paper

Advancing Out-of-Distribution Detection via Local Neuroplasticity

Alessandro Canevaro
Julian Schmidt
Mohammad Sajad Marvi
Hang Yu
Georg Martius
Julian Jordan

In the domain of machine learning, the assumption that training and test data share the same distribution is often violated in real-world scenarios, requiring effective out-of-distribution (OOD) detection. This paper presents a novel OOD detection method that leverages the unique local neuroplasticity property of Kolmogorov-Arnold Networks (KANs). Unlike traditional multilayer perceptrons, KANs exhibit local plasticity, allowing them to preserve learned information while adapting to new tasks. Our method compares the activation patterns of a trained KAN against its untrained counterpart to detect OOD samples. We validate our approach on benchmarks from image and medical domains, demonstrating superior performance and robustness compared to state-of-the-art techniques. These results underscore the potential of KANs in enhancing the reliability of machine learning systems in diverse environments.

Details

NeurIPS Conference 2025 Conference Paper

Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks

Hongyuan Tao
Ying Zhang
Zhenhao Tang
Hongen Peng
Xukun Zhu
Bingchang Liu
Yingguang Yang
Ziyin Zhang

Recent advances in Large Language Models (LLMs) have shown promise in function-level code generation, yet repository-level software engineering tasks remain challenging. Current solutions predominantly rely on proprietary LLM agents, which introduce unpredictability and limit accessibility, raising concerns about data privacy and model customization. This paper investigates whether open-source LLMs can effectively address repository-level tasks without requiring agent-based approaches. We demonstrate this is possible by enabling LLMs to comprehend functions and files within codebases through their semantic information and structural dependencies. To this end, we introduce Code Graph Models (CGMs), which integrate repository code graph structures into the LLM's attention mechanism and map node attributes to the LLM's input space using a specialized adapter. When combined with an agentless graph RAG framework, our approach achieves a 43. 00% resolution rate on the SWE-bench Lite benchmark using the open-source Qwen2. 5-72B model. This performance ranks first among open weight models, second among methods with open-source systems, and eighth overall, surpassing the previous best open-source model-based method by 12. 33%.

PDF Details

AAAI Conference 2025 Conference Paper

Context-aware Graph Neural Network for Graph-based Fraud Detection with Extremely Limited Labels

Pengbo Li
Hang Yu
Xiangfeng Luo

Graph-based fraud detection is crucial in identifying illegal activities in social networks, finance, and other sectors. Despite recent progress in this area, most of current researches typically require a large amount of annotated data to demonstrate its benefits. In practice, obtaining sufficient high-quality annotated data is challenging, limiting the effectiveness of model training. Therefore, leveraging extremely limited label information is crucial to enhance model performance. We propose a context-aware graph neural network (CGNN) to address this. CGNN performs category semantic decomposition on the contextual neighbor features of the center node to enrich the category semantics. In the neighbor message aggregation stage, the denoising attention mechanism enables the center node to adaptively aggregate heterophilic and homophilic information from neighbors. Particularly for unlabeled data, feature augmentation within the category subspace and consistency regularization driven by entropy minimization ensure that such data can further enhance model performance under explicit semantic guidance. We demonstrate on four real-world datasets that CGNN significantly outperforms other baseline methods with extremely limited labels.

PDF Details DOI

NeurIPS Conference 2025 Conference Paper

Dynamic Shadow Unveils Invisible Semantics for Video Outpainting

Ruilin Li
Hang Yu
Jiayan Qiu

Conventional video outpainting methods primarily focus on maintaining coherent textures and visual consistency across frames. However, they often fail at handling dynamic scenes due to the complex motion of objects or camera movement, leading to temporal incoherence and visible flickering artifacts across frames. This is primarily because they lack instance-aware modeling to accurately separate and track individual object motions throughout the video. In this paper, we propose a novel video outpainting framework that explicitly takes shadow-object pairs into consideration to enhance the temporal and spatial consistency of instances, even when they are temporarily invisible. Specifically, we first track the shadow-object pairs across frames and predict the instances in the scene to unveil the spatial regions of invisible instances. Then, these prediction results are fed to guide the instance-aware optical flow completion to unveil the temporal motion of invisible instances. Next, these spatiotemporal guidances of instances are used to guide the video outpainting process. Finally, a video-aware discriminator is implemented to enhance alignment among dynamic shadows and the extended semantics in the scene. Comprehensive experiments underscore the superiority of our approach, outperforming existing state-of-the-art methods in widely recognized benchmarks.

PDF Details

NeurIPS Conference 2025 Conference Paper

Nonparametric Quantile Regression with ReLU-Activated Recurrent Neural Networks

Hang Yu
Lyumin Wu
Wenxin Zhou
Zhao Ren

This paper investigates nonparametric quantile regression using recurrent neural networks (RNNs) and sparse recurrent neural networks (SRNNs) to approximate the conditional quantile function, which is assumed to follow a compositional hierarchical interaction model. We show that RNN- and SRNN-based estimators with rectified linear unit (ReLU) activation and appropriately designed architectures achieve the optimal nonparametric convergence rate, up to a logarithmic factor, under stationary, exponentially $\boldsymbol{\beta}$-mixing processes. To establish this result, we derive sharp approximation error bounds for functions in the hierarchical interaction model using RNNs and SRNNs, exploiting their close connection to sparse feedforward neural networks (SFNNs). Numerical experiments and an empirical study on the Dow Jones Industrial Average (DJIA) further support our theoretical findings.

PDF Details

AAAI Conference 2025 Conference Paper

Promoting Knowledge Base Question Answering by Directing LLMs to Generate Task-relevant Logical Forms

Jianqi Gao
Jian Cao
Ranran Bu
Nengjun Zhu
Wei Guan
Hang Yu

Knowledge base question answering (KBQA) refers to the system that produces answers to user queries by reasoning with a large-scale structured knowledge base. Advanced works have achieved great success either by generating logical forms (LF) or directly generating answers. Although the former typically yields better performance, these generated LF could be inaccurate, e.g., non-executable. In this regard, large language models (LLMs) have shown exciting potential for accurate generation. However, it is challenging to fine-tune LLMs to generate LF. This is because the context retrieved for prediction typically leads to an excessive number of reasoning paths. In this context, LLMs can generate numerous LF corresponding to these reasoning paths, but a few LF can result in correct answers. Thus, fine-tuning LLMs to generate answer-relevant LF would conflict with the prior knowledge of the LLMs. In this work, we propose a novel learning framework, FM-KBQA, to fine-tune LLMs using multi-task learning for KBQA. Specifically, we propose to fine-tune LLMs using an additional objective: generating the index of reasoning paths that lead to correct answers. This will direct LLMs to pay attention to answer-relevant paths among numerous reasoning paths by completing a simple task where the selected reasoning paths can be supplementary for non-executable LF. Directly generating answers can make LLMs pay attention to the answer-relevant reasoning paths, but it is much more challenging than generating the index of reasoning paths. To verify FM-KBQA's effectiveness, we conduct experiments on mainstream benchmarks, such as WebQuestionsSP (WQSP) and ComplexWebQuestions (CWQ). Extensive evaluations across two public benchmark datasets underscore the superiority of FM-KBQA over current state-of-the-art methods.

PDF Details DOI

EAAI Journal 2025 Journal Article

Spatial–temporal intention representation with multi-agent reinforcement learning for unmanned surface vehicles strategies learning in asset guarding task

Yang Li
Shaorong Xie
Hang Yu
Han Zhang
Zhenyu Zhang
Xiangfeng Luo

Details DOI

NeurIPS Conference 2025 Conference Paper

WeatherPrompt: Multi-modality Representation Learning for All-Weather Drone Visual Geo-Localization

Jiahao Wen
Hang Yu
Zhedong Zheng

Visual geo-localization for drones faces critical degradation under weather perturbations, \eg, rain and fog, where existing methods struggle with two inherent limitations: 1) Heavy reliance on limited weather categories that constrain generalization, and 2) Suboptimal disentanglement of entangled scene-weather features through pseudo weather categories. We present WeatherPrompt, a multi-modality learning paradigm that establishes weather-invariant representations through fusing the image embedding with the text context. Our framework introduces two key contributions: First, a Training-free Weather Reasoning mechanism that employs off-the-shelf large multi-modality models to synthesize multi-weather textual descriptions through human-like reasoning. It improves the scalability to unseen or complex weather, and could reflect different weather strength. Second, to better disentangle the scene and weather features, we propose a multi-modality framework with the dynamic gating mechanism driven by the text embedding to adaptively reweight and fuse visual features across modalities. The framework is further optimized by the cross-modal objectives, including image-text contrastive learning and image-text matching, which maps the same scene with different weather conditions closer in the representation space. Extensive experiments validate that, under diverse weather conditions, our method achieves competitive recall rates compared to state-of-the-art drone geo-localization methods. Notably, it improves Recall@1 by 13. 37\% under night conditions and by 18. 69\% under fog and snow conditions. Our code is available at https: //github. com/Jahawn-Wen/WeatherPrompt.

PDF Details

AAAI Conference 2024 Conference Paper

Barely Supervised Learning for Graph-Based Fraud Detection

Hang Yu
Zhengyang Liu
Xiangfeng Luo

In recent years, graph-based fraud detection methods have garnered increasing attention for their superior ability to tackle the issue of camouflage in fraudulent scenarios. However, these methods often rely on a substantial proportion of samples as the training set, disregarding the reality of scarce annotated samples in real-life scenarios. As a theoretical framework within semi-supervised learning, the principle of consistency regularization posits that unlabeled samples should be classified into the same category as their own perturbations. Inspired by this principle, this study incorporates unlabeled samples as an auxiliary during model training, designing a novel barely supervised learning method to address the challenge of limited annotated samples in fraud detection. Specifically, to tackle the issue of camouflage in fraudulent scenarios, we employ disentangled representation learning based on edge information for a small subset of annotated nodes. This approach partitions node features into three distinct components representing different connected edges, providing a foundation for the subsequent augmentation of unlabeled samples. For the unlabeled nodes used in auxiliary training, we apply both strong and weak augmentation and design regularization losses to enhance the detection performance of the model in the context of extremely limited labeled samples. Across five publicly available datasets, the proposed model showcases its superior detection capability over baseline models.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

DeepITE: Designing Variational Graph Autoencoders for Intervention Target Estimation

Hongyuan Tao
Hang Yu
Jianguo Li

Intervention Target Estimation (ITE) is vital for both understanding and decision-making in complex systems, yet it remains underexplored. Current ITE methods are hampered by their inability to learn from distinct intervention instances collaboratively and to incorporate rich insights from labeled data, which leads to inefficiencies such as the need for re-estimation of intervention targets with minor data changes or alterations in causal graphs. In this paper, we propose DeepITE, an innovative deep learning framework designed around a variational graph autoencoder. DeepITE can concurrently learn from both unlabeled and labeled data with different intervention targets and causal graphs, harnessing correlated information in a self or semi-supervised manner. The model's inference capabilities allow for the immediate identification of intervention targets on unseen samples and novel causal graphs, circumventing the need for retraining. Our extensive testing confirms that DeepITE not only surpasses 13 baseline methods in the Recall@k metric but also demonstrates expeditious inference times, particularly on large graphs. Moreover, incorporating a modest fraction of labeled data (5-10\%) substantially enhances DeepITE's performance, further solidifying its practical applicability. Our source code is available at https: //github. com/alipay/DeepITE.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Focus On What Matters: Separated Models For Visual-Based RL Generalization

Di Zhang
Bowen Lv
Hai Zhang
Feifan Yang
Junqiao Zhao
Hang Yu
Chang Huang
Hongtu Zhou

A primary challenge for visual-based Reinforcement Learning (RL) is to generalize effectively across unseen environments. Although previous studies have explored different auxiliary tasks to enhance generalization, few adopt image reconstruction due to concerns about exacerbating overfitting to task-irrelevant features during training. Perceiving the pre-eminence of image reconstruction in representation learning, we propose SMG (\blue{S}eparated \blue{M}odels for \blue{G}eneralization), a novel approach that exploits image reconstruction for generalization. SMG introduces two model branches to extract task-relevant and task-irrelevant representations separately from visual observations via cooperatively reconstruction. Built upon this architecture, we further emphasize the importance of task-relevant features for generalization. Specifically, SMG incorporates two additional consistency losses to guide the agent's focus toward task-relevant areas across different scenarios, thereby achieving free from overfitting. Extensive experiments in DMC demonstrate the SOTA performance of SMG in generalization, particularly excelling in video-background settings. Evaluations on robotic manipulation tasks further confirm the robustness of SMG in real-world applications. Source code is available at \url{https: //anonymous. 4open. science/r/SMG/}.

PDF Details DOI

TMLR Journal 2024 Journal Article

Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code

Ziyin Zhang
Chaoyu Chen
Bingchang Liu
Cong Liao
Zi Gong
Hang Yu
Jianguo Li
Rui Wang

In this work we systematically review the recent advancements in software engineering with language models, covering 70+ models, 40+ evaluation tasks, 180+ datasets, and 900 related works. Unlike previous works, we integrate software engineering (SE) with natural language processing (NLP) by discussing the perspectives of both sides: SE applies language models for development automation, while NLP adopts SE tasks for language model evaluation. We break down code processing models into general language models represented by the GPT family and specialized models that are specifically pretrained on code, often with tailored objectives. We discuss the relations and differences between these models, and highlight the historical transition of code modeling from statistical models and RNNs to pretrained Transformers and LLMs, which is exactly the same course that had been taken by NLP. We also go beyond programming and review LLMs' application in other software engineering activities including requirement engineering, testing, deployment, and operations in an endeavor to provide a global view of NLP in SE, and identify key challenges and potential future directions in this domain.

PDF Details

NeurIPS Conference 2023 Conference Paper

BasisFormer: Attention-based Time Series Forecasting with Learnable and Interpretable Basis

Zelin Ni
Hang Yu
Shizhan Liu
Jianguo Li
Weiyao Lin

Bases have become an integral part of modern deep learning-based models for time series forecasting due to their ability to act as feature extractors or future references. To be effective, a basis must be tailored to the specific set of time series data and exhibit distinct correlation with each time series within the set. However, current state-of-the-art methods are limited in their ability to satisfy both of these requirements simultaneously. To address this challenge, we propose BasisFormer, an end-to-end time series forecasting architecture that leverages learnable and interpretable bases. This architecture comprises three components: First, we acquire bases through adaptive self-supervised learning, which treats the historical and future sections of the time series as two distinct views and employs contrastive learning. Next, we design a Coef module that calculates the similarity coefficients between the time series and bases in the historical view via bidirectional cross-attention. Finally, we present a Forecast module that selects and consolidates the bases in the future view based on the similarity coefficients, resulting in accurate future predictions. Through extensive experiments on six datasets, we demonstrate that BasisFormer outperforms previous state-of-the-art methods by 11. 04% and 15. 78% respectively for univariate and multivariate forecasting tasks. Code isavailable at: https: //github. com/nzl5116190/Basisformer.

PDF Details

IROS Conference 2023 Conference Paper

From "Thumbs Up" to "10 out of 10": Reconsidering Scalar Feedback in Interactive Reinforcement Learning

Hang Yu
Reuben M. Aronson
Katherine H. Allen
Elaine Schaertl Short

Learning from human feedback is an effective way to improve robotic learning in exploration-heavy tasks. Compared to the wide application of binary human feedback, scalar human feedback has been used less because it is believed to be noisy and unstable. In this paper, we compare scalar and binary feedback, and demonstrate that scalar feedback benefits learning when properly handled. We collected binary or scalar feedback respectively from two groups of crowdworkers on a robot task. We found that when considering how consistently a participant labeled the same data, scalar feedback led to less consistency than binary feedback; however, the difference vanishes if small mismatches are allowed. Additionally, scalar and binary feedback show no significant differences in their correlations with key Reinforcement Learning targets. We then introduce Stabilizing TEacher Assessment DYnamics (STEADY) to improve learning from scalar feedback. Based on the idea that scalar feedback is muti-distributional, STEADY reconstructs underlying positive and negative feedback distributions and re-scales scalar feedback based on feedback statistics. We show that models trained with scalar feedback + STEADY outperform baselines, including binary feedback and raw scalar feedback, in a robot reaching task with non-expert human feedback. Our results show that both binary feedback and scalar feedback are dynamic, and scalar feedback is a promising signal for use in interactive Reinforcement Learning.

Details

NeurIPS Conference 2023 Conference Paper

How to Fine-tune the Model: Unified Model Shift and Model Bias Policy Optimization

Hai Zhang
Hang Yu
Junqiao Zhao
Di Zhang
Xiao Zhang
Hongtu Zhou
Chang Huang
Chen Ye

Designing and deriving effective model-based reinforcement learning (MBRL) algorithms with a performance improvement guarantee is challenging, mainly attributed to the high coupling between model learning and policy optimization. Many prior methods that rely on return discrepancy to guide model learning ignore the impacts of model shift, which can lead to performance deterioration due to excessive model updates. Other methods use performance difference bound to explicitly consider model shift. However, these methods rely on a fixed threshold to constrain model shift, resulting in a heavy dependence on the threshold and a lack of adaptability during the training process. In this paper, we theoretically derive an optimization objective that can unify model shift and model bias and then formulate a fine-tuning process. This process adaptively adjusts the model updates to get a performance improvement guarantee while avoiding model overfitting. Based on these, we develop a straightforward algorithm USB-PO (Unified model Shift and model Bias Policy Optimization). Empirical results show that USB-PO achieves state-of-the-art performance on several challenging benchmark tasks.

PDF Details

EAAI Journal 2023 Journal Article

THFE: A Triple-hierarchy Feature Enhancement method for tiny boat detection

Yinsai Guo
Hang Yu
Liyan Ma
Liang Zeng
Xiangfeng Luo

Details DOI

NeurIPS Conference 2021 Conference Paper

AP-10K: A Benchmark for Animal Pose Estimation in the Wild

Hang Yu
Yufei Xu
Jing Zhang
Wei Zhao
Ziyu Guan
Dacheng Tao

Accurate animal pose estimation is an essential step towards understanding animal behavior, and can potentially benefit many downstream applications, such as wildlife conservation. Previous works only focus on specific animals while ignoring the diversity of animal species, limiting the generalization ability. In this paper, we propose AP-10K, the first large-scale benchmark for general animal pose estimation, to facilitate the research in animal pose estimation. AP-10K consists of 10, 015 images collected and filtered from 23 animal families and 54 species following the taxonomic rank and high-quality keypoint annotations labeled and checked manually. Based on AP-10K, we benchmark representative pose estimation models on the following three tracks: (1) supervised learning for animal pose estimation, (2) cross-domain transfer learning from human pose estimation to animal pose estimation, and (3) intra- and inter-family domain generalization for unseen animals. The experimental results provide sound empirical evidence on the superiority of learning from diverse animals species in terms of both accuracy and generalization ability. It opens new directions for facilitating future research in animal pose estimation. AP-10k is publicly available at https: //github. com/AlexTheBad/AP10K.

PDF Details

IROS Conference 2020 Conference Paper

Autonomous Obstacle Avoidance for UAV based on Fusion of Radar and Monocular Camera

Hang Yu
Fan Zhang 0031
Panfeng Huang
Chen Wang
Yuanhao Li 0002

UAVs face many challenges in autonomous obstacle avoidance in large outdoor scenarios, specifically the long communication distance from ground stations. The computing power of onboard computers is limited, and the unknown obstacles cannot be accurately detected. In this paper, an autonomous obstacle avoidance scheme based on the fusion of millimeter wave radar and monocular camera is proposed. The visual detection is designed to detect unknown obstacles which is more robust than traditional algorithms. Then extended Kalman filter (EKF) data fusion is used to build exact real 3D coordinates of the obstacles. Finally, an efficient path planning algorithm is used to obtain the path to avoid obstacles. Based on the theoretical design, an experimental platform is built to verify the UAV autonomous obstacle avoidance scheme proposed in this paper. The experiment results show the proposed scheme cannot only detect different kinds of unknown obstacles, but can also take up very little computing resources to run on an onboard computer. The outdoor flight experiment shows the feasibility of the proposed scheme.

Details

JBHI Journal 2020 Journal Article

Deep Learning for Smartphone-Based Malaria Parasite Detection in Thick Blood Smears

Feng Yang
Mahdieh Poostchi
Hang Yu
Zhou Zhou
Kamolrat Silamut
Jian Yu
Richard J. Maude
Stefan Jaeger

Objective: This work investigates the possibility of automated malaria parasite detection in thick blood smears with smartphones. Methods: We have developed the first deep learning method that can detect malaria parasites in thick blood smear images and can run on smartphones. Our method consists of two processing steps. First, we apply an intensity-based Iterative Global Minimum Screening (IGMS), which performs a fast screening of a thick smear image to find parasite candidates. Then, a customized Convolutional Neural Network (CNN) classifies each candidate as either parasite or background. Together with this paper, we make a dataset of 1819 thick smear images from 150 patients publicly available to the research community. We used this dataset to train and test our deep learning method, as described in this paper. Results: A patient-level five-fold cross-evaluation demonstrates the effectiveness of the customized CNN model in discriminating between positive (parasitic) and negative image patches in terms of the following performance indicators: accuracy (93. 46% ± 0. 32%), AUC (98. 39% ± 0. 18%), sensitivity (92. 59% ± 1. 27%), specificity (94. 33% ± 1. 25%), precision (94. 25% ± 1. 13%), and negative predictive value (92. 74% ± 1. 09%). High correlation coefficients (>0. 98) between automatically detected parasites and ground truth, on both image level and patient level, demonstrate the practicality of our method. Conclusion: Promising results are obtained for parasite detection in thick blood smears for a smartphone application using deep learning methods. Significance: Automated parasite detection running on smartphones is a promising alternative to manual parasite counting for malaria diagnosis, especially in areas lacking experienced parasitologists.

Details DOI

JMLR Journal 2020 Journal Article

Fast Bayesian Inference of Sparse Networks with Automatic Sparsity Determination

Hang Yu
Songwei Wu
Luyin Xin
Justin Dauwels

Structure learning of Gaussian graphical models typically involves careful tuning of penalty parameters, which balance the tradeoff between data fidelity and graph sparsity. Unfortunately, this tuning is often a “black art” requiring expert experience or brute-force search. It is therefore tempting to develop tuning-free algorithms that can determine the sparsity of the graph adaptively from the observed data in an automatic fashion. In this paper, we propose a novel approach, named BISN (Bayesian inference of Sparse Networks), for automatic Gaussian graphical model selection. Specifically, we regard the off-diagonal entries in the precision matrix as random variables and impose sparse-promoting horseshoe priors on them, resulting in automatic sparsity determination. With the help of stochastic gradients, an efficient variational Bayes algorithm is derived to learn the model. We further propose a decaying recursive stochastic gradient (DRSG) method to reduce the variance of the stochastic gradients and to accelerate the convergence. Our theoretical analysis shows that the time complexity of BISN scales only quadratically with the dimension, whereas the theoretical time complexity of the state-of-the-art methods for automatic graphical model selection is typically a third-order function of the dimension. Furthermore, numerical results show that BISN can achieve comparable or better performance than the state-of-the-art methods in terms of structure recovery, and yet its computational time is several orders of magnitude shorter, especially for large dimensions. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2020. ( edit, beta )

PDF Details

IJCAI Conference 2018 Conference Paper

CSNN: An Augmented Spiking based Framework with Perceptron-Inception

Qi Xu
Yu Qi
Hang Yu
Jiangrong Shen
Huajin Tang
Gang Pan

Spiking Neural Networks (SNNs) represent and transmit information in spikes, which is considered more biologically realistic and computationally powerful than the traditional Artificial Neural Networks. The spiking neurons encode useful temporal information and possess highly anti-noise property. The feature extraction ability of typical SNNs is limited by shallow structures. This paper focuses on improving the feature extraction ability of SNNs in virtue of powerful feature extraction ability of Convolutional Neural Networks (CNNs). CNNs can extract abstract features resorting to the structure of the convolutional feature maps. We propose a CNN-SNN (CSNN) model to combine feature learning ability of CNNs with cognition ability of SNNs. The CSNN model learns the encoded spatial temporal representations of images in an event-driven way. We evaluate the CSNN model on the handwritten digits images dataset MNIST and its variational databases. In the presented experimental results, the proposed CSNN model is evaluated regarding learning capabilities, encoding mechanisms, robustness to noisy stimuli and its classification performance. The results show that CSNN behaves well compared to other cognitive models with significantly fewer neurons and training samples. Our work brings more biological realism into modern image classification models, with the hope that these models can inform how the brain performs this high-level vision task.

PDF Details

IJCAI Conference 2018 Conference Paper

Jointly Learning Network Connections and Link Weights in Spiking Neural Networks

Yu Qi
Jiangrong Shen
Yueming Wang
Huajin Tang
Hang Yu
Zhaohui Wu
Gang Pan

Spiking neural networks (SNNs) are considered to be biologically plausible and power-efficient on neuromorphic hardware. However, unlike the brain mechanisms, most existing SNN algorithms have fixed network topologies and connection relationships. This paper proposes a method to jointly learn network connections and link weights simultaneously. The connection structures are optimized by the spike-timing-dependent plasticity (STDP) rule with timing information, and the link weights are optimized by a supervised algorithm. The connection structures and the weights are learned alternately until a termination condition is satisfied. Experiments are carried out using four benchmark datasets. Our approach outperforms classical learning methods such as STDP, Tempotron, SpikeProp, and a state-of-the-art supervised algorithm. In addition, the learned structures effectively reduce the number of connections by about 24%, thus facilitate the computational efficiency of the network.

PDF Details

YNIMG Journal 2015 Journal Article

Miniaturized optical neuroimaging in unrestrained animals

Hang Yu
Janaka Senarathna
Betty M. Tyler
Nitish V. Thakor
Arvind P. Pathak

Details DOI