Author name cluster

Hao Yu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

37 papers

2 author rows

EAAI Journal 2026 Journal Article

Attention-guided network for infrared unmanned aerial vehicle target detection

Qian Jiang
Hao Yu
Xin Jin
Puming Wang
Shin-Jye Lee
Shaowen Yao
Huan Jiang
Wangming Lan

Infrared unmanned aerial vehicle target detection is of great value in protecting national and personal security due to the inherent strong anti-interference capability of infrared sensors. Therefore, it has become an important area of research in remote sensing and computer vision. Influenced by long shooting distances and complex backgrounds, infrared unmanned aerial vehicle images often contain significant background noise and weak features, which pose significant challenges for infrared unmanned aerial vehicle target detection. In this work, we propose an attention-guided network for infrared unmanned aerial vehicle target detection. We first extract frames from videos in the Anti-unmanned aerial vehicle dataset and correct incorrect labels to obtain the dataset used for model training. Then, we adopt an enhanced asymptotic feature pyramid network for the neck portion of the model, reducing the loss of small target features during network propagation. Next, we introduce efficient spatial coordinate attention to highlight the features of infrared unmanned aerial vehicle targets and enable the network to quickly focus on the regions of interest. Finally, to account for the varying aspect ratios of small targets, we employ the shape intersection over union function as the bounding box loss function to improve the accuracy of target localization. The experimental results show that our network achieves better performance than state-of-the-art one-stage detection frameworks.

AAAI Conference 2026 Conference Paper

Counterfactual Question Generation Uncovering Learner Contradictions

Bo Zhang
Hao Yu
Wenjie Dong
Yvhang Yang
Dezhuang Miao
Fengyi Song
Yanhui Gu
Xiaoming Zhang

Conventional feedback, even when accompanied by brief explanations, rarely uncovers the hidden contradictions that trigger a learner's mistake. We bridge this gap with counterfactual question generation (CFQG): given a learner's answer, generate a follow-up question that deliberately contradicts it, compelling the learner to confront the underlying conflict. CFQG thus transforms assessment from passive scoring into an interactive and contradiction-centered dialogue that supports knowledge repair. To automate CFQG, we propose GapProbe, which probes the knowledge gap between a learner's belief and curated facts through a knowledge graph (KG), then designs counterfactual questions (CFQs) that negate the belief. Identifying contradiction-aware triples and, more importantly, selecting those most likely to confuse the learner are highly challenging in large-scale KGs. GapProbe tackles these challenges with an iterative ProConB cycle coupled with a schema-aware KGMap. By caching one- and multi-hop schema patterns of the KG, KGMap provides a "roadmap" that guides LLMs to jump to deep and contradiction-aware triples, beyond traditional step-wise graph traversal. We present the CFQG benchmark and corresponding metrics for evaluating how generated CFQs trigger, focus, and deepen learner reflection through explicit contradictions. Experiments on multiple datasets and LLMs show that GapProbe boosts LLM reasoning over KGs and generates follow-up questions that consistently promote deeper and more focused learner reflection.

EAAI Journal 2026 Journal Article

Deep dynamic image prior for three-dimensional time-sequence pulmonary electrical impedance tomography

Hao Fang
Hao Yu
Sihao Teng
Tao Zhang
Siyi Yuan
Huaiwu He
Zhe Liu
Yunjie Yang

Unsupervised learning methods, such as Deep Image Prior (DIP), have shown great potential in engineering imaging due to their training-data-free nature and high generalization capability. However, their reliance on numerous network parameter iterations results in high computational costs, limiting their practical application, particularly in complex three-dimensional (3D) or time-sequence tomographic imaging tasks. To overcome these challenges, we propose Deep Dynamic Image Prior (D2IP), a novel framework for three-dimensional time-sequence imaging. D2IP introduces three key strategies to accelerate convergence, enforce temporal coherence, and improve computational efficiency: Unsupervised Parameter Warm-Start (UPWS), Temporal Parameter Propagation (TPP), and a customized lightweight reconstruction backbone, Three-dimensional Fast Residual U-Net (3D-FastResUNet). Experimental results on both simulated and clinical pulmonary datasets demonstrate that D2IP enables fast and accurate 3D time-sequence Electrical Impedance Tomography (tsEIT) reconstruction. Compared to the state-of-the-art Regularized Shallow Image Prior (R-SIP) baseline, D2IP delivers superior image quality (a 24.8% increase in average Mean Structural Similarity Index (MSSIM) and an 8.1% reduction in Relative Error (ERR)) alongside significantly reduced computational time (7.1× faster), demonstrating its promise for artificial intelligence (AI)-driven medical engineering applications, as exemplified by clinical dynamic pulmonary imaging.

EAAI Journal 2026 Journal Article

Learning-based target fencing control for delay-tolerant unmanned aerial vehicle swarm

Hao Yu
Xiu-xia Yang
Yi Zhang
Wen-qiang Yao

This study focuses on the cooperative fencing mission for an unmanned aerial vehicle (UAV) swarm under communication delays, proposing an adaptive self-organized control framework based on a Radial Basis Function-Brain Emotional Learning-Based Intelligent Controller (RBF-BELBIC). Firstly, a fixed-time convergent observer is developed to realize simultaneous estimation of multiple states of the target, achieving precise estimation independent of initial states through dual-channel Hurwitz polynomial configuration. Secondly, a self-organized distributed control scheme integrating a consensus term, a navigation term, and a potential field term is constructed. This strategy enables the UAV swarm to autonomously generate a dynamic fencing convex hull around the target, eliminating the dependency on predefined geometric configurations while guaranteeing collision avoidance. Thirdly, a dual-layer intelligent robust controller driven by the RBF-BELBIC network is designed to tackle the control lag effects caused by communication delays. This architecture establishes a hierarchical structure where the RBF network serves as an upper layer for online gain optimization, and the BELBIC acts as a lower reactive control layer, thereby enabling simultaneous disturbance compensation and dynamic control policy adaptation. Closed-loop stability is analytically established using Lyapunov theory. Simulations verify that the proposed control strategy extends the tolerable delay bound by an order of magnitude over conventional methods (from 100 ms to 1000 ms). Concurrently, it reduces fencing position and velocity errors by 99.36% and 97.45%, respectively, compared to single-layer learning networks under large delays, demonstrating superior robustness in complex environments.

AAAI Conference 2026 Conference Paper

Make Model Transparent: Brain Network Analysis via Causal and Knowledge Graph Learning

Lingyuan Meng
Ke Liang
Hao Yu
Haotian Wang
Miaomiao Li
Xinwang Liu

Brain network analysis technology reveals the organizational mechanism and information processing mode by constructing the structural connection network between brain regions. It has achieved satisfactory results in brain disease prediction tasks, promoting the progress of neuroscience. In recent years, graph transformer has become the most mainstream method for brain analysis with its powerful feature extraction ability and attention mechanism. However, these methods face two challenges, i.e., lack of interpretability, and neglect of semantic associations among brain regions. To solve these problems, we proposed a large language model (LLM)-driven causal knowledge brain network transformer framework, termed BrainCKT, which is plug-and-play, and can adapt to most of the existing mainstream graph transformer-based methods. Specifically, we constructed a brain region causal graph and used its adjacency matrix to guide the learning process of the self-attention mechanism. In addition, we constructed a brain science knowledge graph and encoded it through a pre-trained model to enhance the original brain region features. Finally, we integrated BrainCKT into four mainstream graph transformer baselines for verification. Experimental results on two brain imaging datasets proved the effectiveness of BrainCKT.

AAAI Conference 2026 Conference Paper

TVChain: Leveraging Textual-Visual Prompt Chains for Jailbreaking Large Vision-Language Models

Hao Yu
Ke Liang
Junxian Duan
Jun Wang
Siwei Wang
Chuan Ma
Xinwang Liu

Large Vision-Language Models (LVLMs) enhance the capabilities of Large Language Models by integrating visual inputs, thereby enabling advanced multimodal reasoning across diverse applications. However, these enhanced reasoning capabilities introduce new security risks, particularly to jailbreaking attacks that bypass built-in safety mechanisms to elicit harmful or unauthorized outputs. While recent efforts have explored adversarial and typographic prompts, most existing attacks suffer from three key limitations: reliance on auxiliary models, limited effectiveness in black-box scenarios, and inadequate exploitation of the LVLMs' intrinsic reasoning abilities. In this work, we propose TVChain, a novel black-box jailbreaking framework that explicitly intervenes in both the visual and textual reasoning processes of LVLMs. TVChain decomposes malicious prompts into a sequence of semantically meaningful sub-images that represent relevant objects and behaviors, thereby circumventing direct exposure of illicit content. In parallel, a carefully designed chain-of-thought (CoT) textual prompt is employed to steer the model's reasoning toward reconstructing the intended activity in a covert yet effective manner. We demonstrate that this compositional prompting strategy reduces the likelihood of triggering safety mechanisms while preserving attack efficacy. Extensive evaluations on eleven LVLMs (seven open-source and four commercial) across two benchmark datasets and three state-of-the-art defenses validate the effectiveness and robustness of TVChain.

EAAI Journal 2025 Journal Article

An Insulator defect detection network combining bidirectional feature pyramid network and attention mechanism in unmanned aerial vehicle images

Fu Feng
Xiaoxia Yang
Ronghao Yang
Hao Yu
Fangzhou Liao
Qiqi Shi
Feng Zhu

Insulator defect detection is a crucial aspect of power transmission line inspection. To achieve effective detection of insulator defects, a deep learning-based object detection network is applied to unmanned aerial vehicle images, offering advantages such as high accuracy, high efficiency, and low cost. While the You Only Look Once version 8 network demonstrates superior performance in insulator defect detection compared to other methods, it still struggles to achieve ideal results in scenarios with variable scales and complex backgrounds. To address this issue, an improved network is proposed in this paper, tailored to the characteristics of insulator defect detection using unmanned aerial vehicle images. Firstly, an attention mechanism is introduced into the backbone network to attenuate the influence of background in the target region. Secondly, the bidirectional feature pyramid network structure is incorporated, and cross-layer connections and weighted fusion are implemented during feature extraction and fusion to enable the network to better focus on insulator defect features and suppress the impact of noise. Lastly, the detection head at the lowest layer is replaced with a small target detection head to enhance the network's attention to small targets. Experimental results on the self-made dataset demonstrate that the recall and precision of the network are 89.9% and 96.5%, respectively.

AAAI Conference 2025 Conference Paper

FreeNet: Liberating Depth-Wise Separable Operations for Building Faster Mobile Vision Architectures

Hao Yu
Haoyu Chen
Wei Peng
Xu Cheng
Guoying Zhao

In the pursuit of efficient vision architectures, substantial efforts have been devoted to optimizing operator efficiency. Depth-wise separable operators, such as DWConv, are found cheap in both FLOPs and parameters. As a result, they are increasingly incorporated into efficient backbones, trading for deeper and wider architectures to enhance performance. However, separable operators are not really fast on devices due to the discontinuous memory access requirements. In this paper, we propose FreeNets, a family of simple and efficient backbones that free the separable operation to further accelerate the running speed. We introduce sparse sampling mixers (S2-Mixer) to supersede existing separable token mixers. The S2-Mixer samples multiple segments of partially continuous signals across spatial and channel dimensions for convolutional processing, achieving extremely fast on-device speed. The sparse sampling also enables S2-Mixer to capture long-range pixel relationships from dynamic receptive fields. Furthermore, we introduce a Shift Feed-Forward Network (ShiftFFN) as a faster alternative to existing channel mixers. It utilizes a shift neck architecture that aggregates global information to shift features, enabling faster channel mixing while incorporating global pixel information. Extensive experiments demonstrate that FreeNet offers a superior accuracy-efficiency tradeoff compared to the latest efficient models. On ImageNet-1k, FreeNet-S2 outperforms the StarNet-S4 by 0.4% in top-1 accuracy, while running around 40% faster on desktop GPU and 15% faster on Mobile GPU.

IJCAI Conference 2025 Conference Paper

SandboxSocial: A Sandbox for Social Media Using Multimodal AI Agents

Maximilian Puelma Touzel
Sneheel Sarangi
Gayatri Krishnakumar
Busra Tugce Gurbuz
Austin Welch
Zachary Yang
Andreea Musulan
Hao Yu

The online information ecosystem enables influence campaigns of unprecedented scale and impact. We urgently need empirically grounded approaches to counter the growing threat of malicious campaigns, now amplified by generative AI. But developing defenses in real-world settings is impractical. Social system simulations with agents modelled using Large Language Models (LLMs) are a promising alternative approach and a growing area of research. However, existing simulators lack features needed to capture the complex information-sharing dynamics of platform-based social networks. To bridge this gap, we present SandboxSocial, a new simulator that includes several key innovations, mainly: (1) a virtual social media platform (modelled as Mastodon and mirrored in an actual Mastodon server) that enables a realistic setting in which agents interact; (2) an adapter that uses real-world user data to create more grounded agents and social media content; and (3) multi-modal capabilities that enable our agents to interact using both text and images, just as humans do on social media. We make the simulator more useful to researchers by providing measurement and analysis tools that track simulation dynamics and compute evaluation metrics to compare experimental results.

NeurIPS Conference 2025 Conference Paper

Scalable Cross-View Sample Alignment for Multi-View Clustering with View Structure Similarity

Jun Wang
Zhenglai Li
Chang Tang
Suyuan Liu
Hao Yu
Chuan Tang
Miaomiao Li
Xinwang Liu

Most existing multi-view clustering methods aim to generate a consensus partition across all views, based on the assumption that all views share the same sample arrangement. However, in real-world scenarios, the collected data across different views is often unsynchronized, making it difficult to ensure consistent sample correspondence between views. To address this issue, we propose a scalable sample-alignment-based multi-view clustering method, referred to as SSA-MVC. Specifically, we first employ a cluster-label matching (CLM) algorithm to select the view whose clustering labels best match those of the others as the benchmark view. Then, for each of the remaining views, we construct representations of non-aligned samples by computing their similarities with aligned samples. Based on these representations, we build a similarity graph between the non-aligned samples of each view and those in the benchmark view, which serves as the alignment criterion. This alignment criterion is then integrated into a late-fusion framework to enable clustering without requiring aligned samples. Notably, the learned sample alignment matrix can be used to enhance existing multi-view clustering methods in scenarios where sample correspondence is unavailable. The effectiveness of the proposed SSA-MVC algorithm is validated through extensive experiments conducted on eight real-world multi-view datasets.

NeurIPS Conference 2025 Conference Paper

SCOUT: Teaching Pre-trained Language Models to Enhance Reasoning via Flow Chain-of-Thought

Guanghao Li
Wenhao Jiang
Mingfeng Chen
Yan Li
Hao Yu
Shuting Dong
Tao Ren
Ming Tang

Chain-of-Thought (CoT) prompting improves the reasoning performance of large language models (LLMs) by encouraging step-by-step thinking. However, CoT-based methods depend on intermediate reasoning steps, which limits scalability and generalization. Recent work explores recursive reasoning, where LLMs reuse internal layers across iterations to refine latent representations without explicit CoT supervision. While promising, these approaches often require costly pretraining and lack a principled framework for how reasoning should evolve across iterations. We address this gap by introducing Flow Chain-of-Thought (Flow CoT), a reasoning paradigm that models recursive inference as a progressive trajectory of latent cognitive states. Flow CoT frames each iteration as a distinct cognitive stage, deepening reasoning across iterations without relying on manual supervision. To realize this, we propose SCOUT (Stepwise Cognitive Optimization Using Teachers), a lightweight fine-tuning framework that enables Flow CoT-style reasoning without the need for pretraining. SCOUT uses progressive distillation to align each iteration with a teacher of appropriate capacity, and a cross-attention-based retrospective module that integrates outputs from previous iterations while preserving the model's original computation flow. Experiments across eight reasoning benchmarks show that SCOUT consistently improves both accuracy and explanation quality, achieving up to 1.8% gains under fine-tuning. Qualitative analyses further reveal that SCOUT enables progressively deeper reasoning across iterations, refining both belief formation and explanation granularity. These results not only validate the effectiveness of SCOUT, but also demonstrate the practical viability of Flow CoT as a scalable framework for enhancing reasoning in LLMs.

AAAI Conference 2025 Conference Paper

Treasures in Discarded Weights for LLM Quantization

Hao Yu
Yang Zhou
Bohua Chen
Zelan Yang
Shen Li
Yong Li
Jianxin Wu

In recent years, large language models (LLMs) have developed rapidly and revolutionized natural language processing. However, high storage overhead and computing costs limit LLM deployment in resource-constrained environments. Quantization algorithms can effectively compress LLMs and accelerate inference, but they lead to loss in precision, especially in low-bit scenarios. In this paper, we find that the discarded weight values caused by quantization in fact contain treasures to improve LLMs' accuracy. To excavate those hidden treasures, we construct search spaces around these discarded weights and those weights within the search space can seamlessly be incorporated into the original quantization weights. To determine which weights should be merged, we design a plug-and-play weight compensation framework to capture global information and keep the weights with the highest potential benefits. Our framework can be combined with various LLM quantization algorithms to achieve higher precision without additional inference overhead. We validate the effectiveness of our approach on widely used benchmark datasets for LLMs.

IROS Conference 2025 Conference Paper

Unsupervised Liver Deformation Correction Network Using Optimal Transport for Image-Guided Liver Surgery

Mingyang Liu
Geng Li
Hao Yu
Xinzhe Du
Rui Song 0002
Yibin Li 0001
Max Q.-H. Meng
Zhe Min

In this paper, we propose a novel unsupervised intraoperative liver deformation correction method, called Learning Coherent point drift Network (LCNet), for image-guided liver surgery (IGLS). We first estimate the correspondences between the preoperative and intraoperative point sets in the optimal transport (OT) module by leveraging both original points and extracted features. Afterwards, we compute the point-wise displacement vector by solving the involved matrix equation in the Transformation module, where the point localisation noise is explicitly considered and modeled. Additionally, we present three variants of the proposed approach, i.e., LCNet, LCNet-ED and LCNet-WD, where the better registration performance of LCNet over the other two demonstrates the superiority of the utilised Chamfer loss. We have extensively evaluated LCNet on the MedShapeNet dataset consisting of 615 different liver shapes of real patients, and the 3Dircadb dataset comprising 20 liver models of real patients. Extensive experimental results under different deformation and noise magnitudes demonstrate that LCNet outperforms existing state-of-the-art registration algorithms and holds significant application potential in IGLS. For example, when the overlapping ratio between the preoperative and intraoperative point sets is 25%, the deformation magnitude is 8 mm, the maximum point localization noise magnitude is 2 mm and the rotation angle lies in the range of [−45°, 45°], LCNet achieves a root-mean-square error (RMSE) of 3.21 mm on the MedShapeNet dataset, significantly outperforming Lepard and RoITr, which reach 5.41 mm (p < 0.001) and 4.90 mm (p < 0.001), respectively.

AAAI Conference 2025 Conference Paper

What Is a Good Question? Assessing Question Quality via Meta-Fact Checking

Bo Zhang
Jianghua Zhu
Chaozhuo Li
Hao Yu
Li Kong
Zhan Wang
Dezhuang Miao
Xiaoming Zhang

Knowledge-based questions are typically employed to evaluate LLM's knowledge boundaries; meanwhile, numerous studies focus on question generation as a means to enhance the capabilities of both models and individuals. However, there is a lack of in-depth exploration about what constitutes a good question from the perspective of knowledge cognition. This paper proposes aligning the complete knowledge underlying questions with educational criteria effectively employed in physics courses, thereby developing novel knowledge-intensive metrics of question quality. To this end, we propose Meta-Fact Checking (MFC), which transforms questions into knowledge graph (KG) triples utilizing LLMs through few-shot prompting, thereby quantifying question quality based on the patterns observed within these triples. MFC introduces a novel interaction mechanism for KGs that communicates meta-facts, illustrating the types of knowledge that KGs can offer to the LLM for reasoning questions, rather than relying solely on the original triples. This strategy ensures that MFC remains unaffected by unexplored triples that LLM has not yet encountered within KGs compared to the retrieve-while-reasoning routine. Experiments across multiple datasets and LLMs demonstrate that MFC significantly improves the accuracy and efficiency of both question answering and assessing. This research marks a pioneering effort to automate the evaluation of question quality based on cognitive capabilities.

IROS Conference 2024 Conference Paper

Bidirectional Partial-to-Full Non-Rigid Point Set Registration with Non-Overlapping Filtering

Hao Yu
Mingyang Liu
Rui Song 0002
Yibin Li 0001
Max Q.-H. Meng
Zhe Min

In this paper, we introduce the Bidirectional Non-Overlapping Filtering Network (Bi-NOFNet), which registers the partial intraoperative point set with the full preoperative point set for computer-assisted interventions (CAI). Our contributions are threefold. First, Bi-NOFNet adopts a customised feature extractor to extract distinctive features from both point sets, with which the per-point overlap mask is predicted and the overlapping region is segmented for the preoperative point set. Furthermore, we propose two methods to filter out the non-overlapping regions, at feature-level (i.e., Bi-NOFNet (Feature)) and point-level (i.e., Bi-NOFNet (Point)). For these two methods, we develop a supervised registration strategy where the ground-truth overlap mask and displacement vectors are employed, and weakly-supervised registration strategies where only the ground-truth overlap mask is available. Additionally, to fully utilise the information in both spaces, we propose a bidirectional registration mechanism, which predicts the displacement vectors associated with the intraoperative point set (i.e., the forward way) and those warping the preoperative point set (i.e., the backward way). Experiments have been conducted on the proposed DeformMedShapeNet dataset that contains 615 different liver shapes. Extensive results demonstrate that Bi-NOFNet performs well for partial-to-full registration tasks under various scenarios of noise, overlap ratios and deformation levels, outperforming existing non-rigid registration approaches.

AAAI Conference 2024 Conference Paper

Differentiable Auxiliary Learning for Sketch Re-Identification

Xingyu Liu
Xu Cheng
Haoyu Chen
Hao Yu
Guoying Zhao

Sketch re-identification (Re-ID) seeks to match pedestrians' photos from surveillance videos with corresponding sketches. However, we observe that existing works still have two critical limitations: (i) cross- and intra-modality discrepancies hinder the extraction of modality-shared features, (ii) standard triplet loss fails to constrain latent feature distribution in each modality with inadequate samples. To overcome the above issues, we propose a differentiable auxiliary learning network (DALNet) to explore a robust auxiliary modality for Sketch Re-ID. Specifically, for (i) we construct an auxiliary modality by using a dynamic auxiliary generator (DAG) to bridge the gap between sketch and photo modalities. The auxiliary modality highlights the described person in photos to mitigate background clutter and learns sketch style through style refinement. Moreover, a modality interactive attention module (MIA) is presented to align the features and learn the invariant patterns of two modalities by auxiliary modality. To address (ii), we propose a multi-modality collaborative learning scheme (MMCL) to align the latent distribution of three modalities. An intra-modality circle loss in MMCL brings learned global and modality-shared features of the same identity closer in the case of insufficient samples within each modality. Extensive experiments verify the superior performance of our DALNet over the state-of-the-art methods for Sketch Re-ID, and the generalization in sketch-based image retrieval and sketch-photo face recognition tasks.

NeurIPS Conference 2024 Conference Paper

InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models

Linyi Li
Shijie Geng
Zhenwen Li
Yibo He
Hao Yu
Ziyue Hua
Guanghan Ning
Siwei Wang

Large Language Models for code (code LLMs) have witnessed tremendous progress in recent years. With the rapid development of code LLMs, many popular evaluation benchmarks, such as HumanEval, DS-1000, and MBPP, have emerged to measure the performance of code LLMs with a particular focus on code generation tasks. However, they are insufficient to cover the full range of expected capabilities of code LLMs, which span beyond code generation to answering diverse coding-related questions. To fill this gap, we propose InfiBench, the first large-scale freeform question-answering (QA) benchmark for code to our knowledge, comprising 234 carefully selected high-quality Stack Overflow questions that span 15 programming languages. InfiBench uses four types of model-free automatic metrics to evaluate response correctness, where domain experts carefully concretize the criterion for each question. We conduct a systematic evaluation of over 100 of the latest code LLMs on InfiBench, leading to a series of novel and insightful findings. Our detailed analyses showcase potential directions for further advancement of code LLMs. InfiBench is fully open source at https://infi-coder.github.io/infibench and continuously expanding to foster more scientific and systematic practices for code LLM evaluation.

EAAI Journal 2024 Journal Article

Multi-scale context feature and cross-attention network-enabled system and software-based for pavement crack detection

Xin Wen
Shuo Li
Hao Yu
Yu He

Pavement crack detection continues to be a stubborn problem given the interference of various factors in the actual pavement and the complex topological structure of asphalt pavement. Among all the obstacles, the bottleneck of pavement crack detection lies in the difficulty of segmenting the cracks in the pavement images whose edges are blurred. This paper proposes a multi-scale context feature and cross-attention based on convolutional neural network for accurate and robust pavement crack segmentation. The multi-scale context feature module is built in different deep networks to extract rich crack feature information. Subsequently, in order to effectively promote the seamless integration of features at different levels, we deploy cross-attention modules to each branch. After that, we add deep supervision to each branch to accelerate training. Finally, we integrate the outputs of each branch to obtain the final output diagram. The comparative experiments on various pavement datasets show that the method has better robustness. At the same time, this paper designs a complete system of pavement crack detection (PCD) and develops corresponding engineering application software. The PCD system can record the real-time pavement image data to the edge server, and the client can also monitor the real-time pavement images from the edge server through the HTTP protocol.

YNIMG Journal 2024 Journal Article

Noninvasive microvascular imaging in newborn rats using high-frequency ultrafast ultrasound

Yunlong Zhao
Jiabin Zhang
Hao Yu
Xinlin Hou
Jue Zhang

Ultrasound imaging stands as the predominant modality for neonatal health assessment, with recent advancements in ultrafast Doppler (μDoppler) technology offering significant promise in fields such as neonatal brain imaging. Combining μDoppler with high-frequency ultrasound (HF-μDoppler) presents a potential efficient avenue to enhance in vivo microvascular imaging in small animals, notably newborn rats, a crucial preclinical animal model for neonatal disease and development research. It is necessary to verify the imaging performance of HF-μDoppler in preclinical trials. This study investigates the microvascular imaging capabilities of HF-μDoppler using a 30 MHz high-frequency linear array probe in newborn rats. Results demonstrate the clarity of cerebral microvascular imaging in rats aged 1 to 7 postnatal days, extending to whole-body microvascular imaging, encompassing the central nervous system, including the brain and spinal cord. In conclusion, HF-μDoppler technology emerges as a reliable imaging tool, offering a new perspective for preclinical investigations into neonatal diseases and development.

Details DOI

JBHI Journal 2024 Journal Article

ST-Phys: Unsupervised Spatio-Temporal Contrastive Remote Physiological Measurement

Mingyue Cao
Xu Cheng
Xingyu Liu
Yan Jiang
Hao Yu
Jingang Shi

Remote photoplethysmography (rPPG) is a non-contact method that employs facial videos for measuring physiological parameters. Existing rPPG methods have achieved remarkable performance. However, the success mainly profits from supervised learning over massive labeled data. On the other hand, existing unsupervised rPPG methods fail to fully utilize spatio-temporal features and encounter challenges in low-light or noisy environments. To address these problems, we propose an unsupervised contrastive learning approach, ST-Phys. We incorporate a low-light enhancement module, a temporal dilated module, and a spatial enhanced module to better deal with long-term dependencies under random low-light conditions. In addition, we design a circular margin loss, wherein rPPG signals originating from identical videos are attracted, while those from distinct videos are repelled. Our method is assessed on six openly accessible datasets, including RGB and NIR videos. Extensive experiments reveal the superior performance of our proposed ST-Phys over state-of-the-art unsupervised rPPG methods. Moreover, it offers advantages in parameter reduction and noise robustness.

Details DOI

EAAI Journal 2024 Journal Article

Uncertain remanufacturing reverse logistics network design in industry 5.0: Opportunities and challenges of digitalization

Hao Yu
Xu Sun

Remanufacturing, a crucial step of reverse logistics, focuses on restoring or enhancing the functionality of waste products. The challenge in planning an effective remanufacturing reverse logistics system lies in the uncertainties from various sources. In addition, the evolving industrial landscape in Industry 5.0 necessitates adaptability to technological advancements. This paper proposes an integrated and digitalized architecture for uncertain reverse logistics network design. A fuzzy optimization model is first formulated to identify potential network configurations under varying demand-satisfying and capacity constraints. These solutions are automatically converted and assessed in a dynamic simulation environment with practical operational logic under a set of real-world scenarios. Numerical experiments are performed to validate the method and show the advantages of integrating optimization with dynamic simulation on a digital platform for strategic network planning. The results, built upon previous research, indicate that while initial investments in technology might be substantial, they may lead to long-term reductions in both costs and emissions. Moreover, collaborative decision-making is essential to mitigate potential disruptions and cascading effects. Our research contributes to the development of a novel integrated decision-support architecture and underscores the role of digitalization and Industry 5.0 in future smart and sustainable reverse logistics planning.

Details DOI

ICLR Conference 2024 Conference Paper

Variance-enlarged Poisson Learning for Graph-based Semi-Supervised Learning with Extremely Sparse Labeled Data

Xiong Zhou
Xianming Liu 0005
Hao Yu
Jialiang Wang 0003
Zeke Xie
Junjun Jiang
Xiangyang Ji

Graph-based semi-supervised learning, particularly in the context of extremely sparse labeled data, often suffers from degenerate solutions where label functions tend to be nearly constant across unlabeled data. In this paper, we introduce Variance-enlarged Poisson Learning (VPL), a simple yet powerful framework tailored to alleviate the issues arising from the presence of degenerate solutions. VPL incorporates a variance-enlarged regularization term, which induces a Poisson equation specifically for unlabeled data. This intuitive approach increases the dispersion of labels from their average mean, effectively reducing the likelihood of degenerate solutions characterized by nearly constant label functions. We subsequently introduce two streamlined algorithms, V-Laplace and V-Poisson, each intricately designed to enhance Laplace and Poisson learning, respectively. Furthermore, we broaden the scope of VPL to encompass graph neural networks, introducing Variance-enlarged Graph Poisson Networks (V-GPN) to facilitate improved label propagation. To achieve a deeper understanding of VPL's behavior, we conduct a comprehensive theoretical exploration in both discrete and variational cases. Our findings elucidate that VPL inherently amplifies the importance of connections within the same class while concurrently tempering those between different classes. We support our claims with extensive experiments, demonstrating the effectiveness of VPL and showcasing its superiority over existing methods. The code is available at https://github.com/hitcszx/VPL.

Details

AAAI Conference 2023 Conference Paper

Compressing Transformers: Features Are Low-Rank, but Weights Are Not!

Hao Yu
Jianxin Wu

Transformer and its variants achieve excellent results in various computer vision and natural language processing tasks, but high computational costs and reliance on large training datasets restrict their deployment in resource-constrained settings. Low-rank approximation of model weights has been effective in compressing CNN models, but its application to transformers has been less explored and is less effective. Existing methods require the complete dataset to fine-tune compressed models, which is both time-consuming and data-hungry. This paper reveals that the features (i.e., activations) are low-rank, but model weights are surprisingly not low-rank. Hence, AAFM is proposed, which adaptively determines the compressed model structure and locally compresses each linear layer's output features rather than the model weights. A second stage, GFM, optimizes the entire compressed network holistically. Both AAFM and GFM use only a few training samples without labels, that is, they are few-shot, unsupervised, fast and effective. For example, with only 2K images without labels, 33% of the parameters are removed in DeiT-B with 18.8% relative throughput increase, but only a 0.23% accuracy loss for ImageNet recognition. The proposed methods are successfully applied to the language modeling task in NLP, too. Besides, the few-shot compressed models generalize well in downstream tasks.
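
The contrast between low-rank features and full-rank weights can be illustrated with a small synthetic sketch. This is not the AAFM or GFM procedure from the paper: the data are random, the helper `feature_svd_compress` is a hypothetical name, and the projection is a plain uncentered SVD of the output features, chosen only to show why compressing features can succeed where compressing weights would not.

```python
import numpy as np

def feature_svd_compress(X, W, k):
    """Split a linear layer Y = X @ W into two thinner layers using the
    top-k right singular vectors of the output features (hypothetical helper)."""
    Y = X @ W
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    U_k = Vt[:k].T          # (d_out, k) basis of the dominant feature subspace
    W_down = W @ U_k        # (d_in, k): project straight into the subspace
    W_up = U_k.T            # (k, d_out): map back to the original width
    return W_down, W_up

rng = np.random.default_rng(0)
d, n, r = 64, 512, 8
# Synthetic assumption: inputs lie in an r-dimensional subspace.
X = rng.normal(size=(n, r)) @ rng.normal(size=(r, d))
W = rng.normal(size=(d, d))          # dense Gaussian weights, full rank

W_down, W_up = feature_svd_compress(X, W, k=r)
Y = X @ W
Y_hat = X @ W_down @ W_up
rel_err = np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y)
weight_rank = np.linalg.matrix_rank(W)
params_before = W.size
params_after = W_down.size + W_up.size
```

In this toy setting the weight matrix has full rank, yet the two thin layers reproduce the features almost exactly with a quarter of the parameters (1024 instead of 4096). Real activations are only approximately low-rank, so some error would remain.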

PDF Details DOI

EAAI Journal 2023 Journal Article

Preference-based multi-attribute decision-making method with spherical-Z fuzzy sets for green product design

Zhongwei Huang
Honghao Zhang
Danqi Wang
Hao Yu
Lingyu Wang
Dongtao Yu
Yong Peng

Green product design, i.e., design that harmonizes with the environment, is a crucial component for addressing environmental considerations in the earliest stages of the product life cycle, e.g., new energy vehicles (NEVs), that minimize negative environmental impacts. The designs can encompass material selection, resource use, production requirements, recycling, reuse and the disposal of products. Selecting the optimal design alternative considering multiple attribute indices, e.g., environmental indicators and functional indicators, is a typical multi-attribute decision-making (MADM) problem. This study proposes a hybrid preference-based MADM method with spherical-Z fuzzy numbers (SZFNs) for solving the green product design fuzzy information problem. SZFNs are designed to mine the internal hidden information of traditional Z-numbers, and they combine the characteristics of the reliability constraints of Z-numbers and the advantages of spherical linguistic sets. The operation, aggregation operators and probabilistic measure method of SZFNs are defined in this study. The weight vector of the criteria is obtained by adopting the degree of possibility of spherical-Z (DPSZ). The hybrid MADM method is developed by incorporating the total utility of spherical-Z, which is used to convert the internal meaning of the evaluation information into an additive ratio assessment using gray relation analysis (ARAS-GRA) to obtain the optimal alternative. Finally, a case study, i.e., five green product design schemes for NEVs, is adopted to verify the effectiveness and robustness of this proposed method. A comparative analysis, sensitivity analysis and comprehensive discussion are conducted in this research. The results confirm that this proposed method has an improved performance, and provides some references for designers.

Details DOI

IROS Conference 2022 Conference Paper

A wearable system with harmonic oscillations to assess finger biomechanics

Hao Yu
Aran Sena
Etienne Burdet

This paper presents a wearable device for finger assessment that can identify finger joint impedance parameters through harmonic oscillation perturbations. This device is designed to help assess motor impairments related to hypertonic soft-tissue changes, that can arise from a number of conditions such as stroke. By measuring the ratio of the applied torque and resulting velocities, the impedance values for any bending direction of a metacarpophalangeal (MCP) joint can be estimated. The ability of this device to effectively estimate finger parameters was tested in experiments with six participants. The experimental result was validated through comparison to prior works on finger impedance estimation. The user experience of the presented system was also analysed, indicating that the device design is comfortable and acceptable for participants.

Details

NeurIPS Conference 2021 Conference Paper

CoFiNet: Reliable Coarse-to-fine Correspondences for Robust PointCloud Registration

Hao Yu
Fu Li
Mahdi Saleh
Benjamin Busam
Slobodan Ilic

We study the problem of extracting correspondences between a pair of point clouds for registration. For correspondence retrieval, existing works benefit from matching sparse keypoints detected from dense points but usually struggle to guarantee their repeatability. To address this issue, we present CoFiNet - Coarse-to-Fine Network which extracts hierarchical correspondences from coarse to fine without keypoint detection. On a coarse scale and guided by a weighting scheme, our model first learns to match down-sampled nodes whose vicinity points share more overlap, which significantly shrinks the search space of a consecutive stage. On a finer scale, node proposals are consecutively expanded to patches that consist of groups of points together with associated descriptors. Point correspondences are then refined from the overlap areas of corresponding patches, by a density-adaptive matching module capable of dealing with varying point density. Extensive evaluation of CoFiNet on both indoor and outdoor standard benchmarks shows our superiority over existing methods. Especially on 3DLoMatch where point clouds share less overlap, CoFiNet significantly outperforms state-of-the-art approaches by at least 5% on Registration Recall, with at most two-thirds of their parameters.

PDF Details

TIST Journal 2021 Journal Article

S3-Net: A Fast Scene Understanding Network by Single-Shot Segmentation for Autonomous Driving

Yuan Cheng
Yuchao Yang
Hai-Bao Chen
Ngai Wong
Hao Yu

Real-time segmentation and understanding of driving scenes are crucial in autonomous driving. Traditional pixel-wise approaches extract scene information by segmenting all pixels in a frame, and hence are inefficient and slow. Proposal-wise approaches only learn from the proposed object candidates, but still require multiple steps on the expensive proposal methods. Instead, this work presents a fast single-shot segmentation strategy for video scene understanding. The proposed net, called S3-Net, quickly locates and segments target sub-scenes, and meanwhile extracts attention-aware time-series sub-scene features (ats-features) as inputs to an attention-aware spatio-temporal model (ASM). Utilizing tensorization and quantization techniques, S3-Net is intended to be lightweight for edge computing. Experimental results on CityScapes, UCF11, HMDB51, and MOMENTS datasets demonstrate that the proposed S3-Net achieves an accuracy improvement of 8.1% versus the 3D-CNN based approach on UCF11, a storage reduction of 6.9× and an inference speed of 22.8 FPS on CityScapes with a GTX1080Ti GPU.

Details DOI

JMLR Journal 2020 Journal Article

A Low Complexity Algorithm with O(√T) Regret and O(1) Constraint Violations for Online Convex Optimization with Long Term Constraints

Hao Yu
Michael J. Neely

This paper considers online convex optimization over a complicated constraint set, which typically consists of multiple functional constraints and a set constraint. The conventional online projection algorithm (Zinkevich, 2003) can be difficult to implement due to the potentially high computation complexity of the projection operation. In this paper, we relax the functional constraints by allowing them to be violated at each round but still requiring them to be satisfied in the long term. This type of relaxed online convex optimization (with long term constraints) was first considered in Mahdavi et al. (2012). That prior work proposes an algorithm to achieve $O(\sqrt{T})$ regret and $O(T^{3/4})$ constraint violations for general problems and another algorithm to achieve an $O(T^{2/3})$ bound for both regret and constraint violations when the constraint set can be described by a finite number of linear constraints. A recent extension in Jenatton et al. (2016) can achieve $O(T^{\max\{\theta,1-\theta\}})$ regret and $O(T^{1-\theta/2})$ constraint violations where $\theta\in (0,1)$. The current paper proposes a new simple algorithm that yields improved performance in comparison to prior works. The new algorithm achieves an $O(\sqrt{T})$ regret bound with $O(1)$ constraint violations.
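
For readers unfamiliar with the two quantities being bounded, the following is a sketch of the usual definitions in this setting. The symbols $f_t$, $g_k$, $x_t$ and $\mathcal{X}_0$ are notational assumptions and do not appear in the abstract; the paper's own notation may differ.

```latex
% Assumed notation: f_t is the round-t loss, g_k are the functional constraints,
% X_0 is the simple set constraint, and x_t is the decision chosen at round t.
\[
\mathrm{Regret}(T) = \sum_{t=1}^{T} f_t(x_t)
  - \min_{x \in \mathcal{X}_0,\; g_k(x) \le 0\ \forall k} \sum_{t=1}^{T} f_t(x),
\qquad
\mathrm{Violation}_k(T) = \sum_{t=1}^{T} g_k(x_t).
\]
% Sanity check on the quoted Jenatton et al. (2016) rates at \theta = 1/2:
\[
T^{\max\{\theta,\,1-\theta\}}\Big|_{\theta=1/2} = T^{1/2},
\qquad
T^{1-\theta/2}\Big|_{\theta=1/2} = T^{3/4}.
\]
```

At $\theta = 1/2$ the quoted family of bounds therefore matches the $O(\sqrt{T})$ regret and $O(T^{3/4})$ violation rates attributed above to Mahdavi et al. (2012).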

PDF Details

IJCAI Conference 2020 Conference Paper

A Speech-to-Knowledge-Graph Construction System

Xiaoyi Fu
Jie Zhang
Hao Yu
Jiachen Li
Dong Chen
Jie Yuan
Xindong Wu

This paper presents a HAO-Graph system that generates and visualizes knowledge graphs from a speech in real-time. When a user speaks to the system, HAO-Graph transforms the voice into knowledge graphs with key phrases from the original speech as nodes and edges. Different from language-to-language systems, such as Chinese-to-English and English-to-English, HAO-Graph converts a speech into graphs, and is the first of its kind. The effectiveness of our HAO-Graph system is verified by a two-hour chairman's talk in front of two thousand participants at an annual meeting in the form of a satisfaction survey.

PDF Details DOI

NeurIPS Conference 2019 Conference Paper

A Communication Efficient Stochastic Multi-Block Alternating Direction Method of Multipliers

Hao Yu

The alternating direction method of multipliers (ADMM) has recently received tremendous interest for distributed large scale optimization in machine learning, statistics, multi-agent networks and related applications. In this paper, we propose a new parallel multi-block stochastic ADMM for distributed stochastic optimization, where each node is only required to perform simple stochastic gradient descent updates. The proposed ADMM is fully parallel, can solve problems with arbitrary block structures, and has a convergence rate comparable to or better than existing state-of-the-art ADMM methods for stochastic optimization. Existing stochastic (or deterministic) ADMMs require each node to exchange its updated primal variables across nodes at each iteration and hence cause a significant amount of communication overhead. Existing ADMMs require roughly the same number of inter-node communication rounds as the number of in-node computation rounds. In contrast, the number of communication rounds required by our new ADMM is only the square root of the number of computation rounds.

PDF Details

ICML Conference 2019 Conference Paper

On the Computation and Communication Complexity of Parallel SGD with Dynamic Batch Sizes for Stochastic Non-Convex Optimization

Hao Yu
Rong Jin 0001

For SGD based distributed stochastic optimization, computation complexity, measured by the convergence rate in terms of the number of stochastic gradient calls, and communication complexity, measured by the number of inter-node communication rounds, are the two most important performance metrics. The classical data-parallel implementation of SGD over N workers can achieve linear speedup of its convergence rate but incurs an inter-node communication round at each batch. We study the benefit of using dynamically increasing batch sizes in parallel SGD for stochastic non-convex optimization by characterizing the attained convergence rate and the required number of communication rounds. We show that for stochastic non-convex optimization under the P-L condition, the classical data-parallel SGD with exponentially increasing batch sizes can achieve the fastest known $O(1/(NT))$ convergence with linear speedup using only $\log(T)$ communication rounds. For general stochastic non-convex optimization, we propose a Catalyst-like algorithm to achieve the fastest known $O(1/\sqrt{NT})$ convergence with only $O(\sqrt{NT}\log(\frac{T}{N}))$ communication rounds.
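
The logarithmic round count for exponentially increasing batch sizes follows from simple counting, sketched below. The doubling schedule, the initial batch size of 1, and the function names are illustrative assumptions, not the schedule or constants analyzed in the paper; the sketch only counts synchronization rounds for a fixed per-worker budget of gradient calls.

```python
def rounds_with_growing_batches(total_gradient_calls, initial_batch=1, growth=2):
    """Count communication rounds when each round's batch is `growth` times
    larger than the previous one and one sync happens per batch."""
    rounds, used, batch = 0, 0, initial_batch
    while used < total_gradient_calls:
        used += batch       # spend one batch of stochastic gradient calls
        batch *= growth     # assumed geometric growth of the batch size
        rounds += 1         # one inter-node communication per batch
    return rounds

def rounds_with_fixed_batches(total_gradient_calls, batch=1):
    """Count communication rounds for a constant batch size (ceiling division)."""
    return -(-total_gradient_calls // batch)

growing = rounds_with_growing_batches(1023)   # batches 1, 2, 4, ..., 512
fixed = rounds_with_fixed_batches(1023)
```

With a budget of 1023 gradient calls per worker, doubling batches need 10 rounds while unit batches need 1023, which is the gap between logarithmic and linear communication in T.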

Details

ICML Conference 2019 Conference Paper

On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization

Hao Yu
Rong Jin 0001
Sen Yang

Recent developments on large-scale distributed machine learning applications, e.g., deep neural networks, benefit enormously from the advances in distributed non-convex optimization techniques, e.g., distributed Stochastic Gradient Descent (SGD). A series of recent works study the linear speedup property of distributed SGD variants with reduced communication. The linear speedup property enables us to scale out the computing capability by adding more computing nodes into our system. The reduced communication complexity is desirable since communication overhead is often the performance bottleneck in distributed systems. Recently, momentum methods are more and more widely adopted by practitioners to train machine learning models since they can often converge faster and generalize better. However, it remains unclear whether any distributed momentum SGD possesses the same linear speedup property as distributed SGD and has reduced communication complexity. This paper fills the gap by considering a distributed communication efficient momentum SGD method and proving its linear speedup property.

Details

AAAI Conference 2019 Conference Paper

Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning

Hao Yu
Sen Yang
Shenghuo Zhu

In distributed training of deep neural networks, parallel mini-batch SGD is widely used to speed up the training process by using multiple workers. It uses multiple workers to sample local stochastic gradients in parallel, aggregates all gradients in a single server to obtain the average, and updates each worker’s local model using an SGD update with the averaged gradient. Ideally, parallel mini-batch SGD can achieve a linear speed-up of the training time (with respect to the number of workers) compared with SGD over a single worker. However, such linear scalability in practice is significantly limited by the growing demand for gradient communication as more workers are involved. Model averaging, which periodically averages individual models trained over parallel workers, is another common practice used for distributed training of deep neural networks (Zinkevich et al. 2010; McDonald, Hall, and Mann 2010). Compared with parallel mini-batch SGD, the communication overhead of model averaging is significantly reduced. Impressively, numerous experimental works have verified that model averaging can still achieve a good speed-up of the training time as long as the averaging interval is carefully controlled. However, it remains a mystery in theory why such a simple heuristic works so well. This paper provides a thorough and rigorous theoretical study on why model averaging can work as well as parallel mini-batch SGD with significantly less communication overhead.
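
The model-averaging scheme described above can be sketched on a toy problem. The quadratic objective, Gaussian gradient noise, step size, and the function name `local_sgd_with_averaging` are all illustrative assumptions; the paper's analysis covers non-convex deep learning objectives, which this sketch does not attempt to reproduce.

```python
import numpy as np

def local_sgd_with_averaging(num_workers=4, rounds=50, local_steps=5,
                             lr=0.1, seed=0):
    """Each worker runs `local_steps` SGD steps on its own model copy, then
    all copies are replaced by their average (one communication per round)."""
    rng = np.random.default_rng(seed)
    target = np.array([1.0, -2.0])            # minimizer of the toy objective
    models = np.zeros((num_workers, 2))       # one local model per worker
    communications = 0
    for _ in range(rounds):
        for _ in range(local_steps):
            noise = rng.normal(scale=0.1, size=models.shape)
            grads = (models - target) + noise  # noisy grad of 0.5*||x - target||^2
            models -= lr * grads               # independent local updates
        models[:] = models.mean(axis=0)        # average, then broadcast back
        communications += 1
    return models[0], target, communications

final_model, target, communications = local_sgd_with_averaging()
error = np.linalg.norm(final_model - target)
```

Here 250 SGD steps per worker use only 50 communication rounds, whereas averaging gradients at every step would use 250. The averaging interval `local_steps` is the quantity the abstract says must be carefully controlled.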

PDF Details

IJCAI Conference 2018 Conference Paper

Robust Graph Dimensionality Reduction

Xiaofeng Zhu
Cong Lei
Hao Yu
Yonggang Li
Jiangzhang Gan
Shichao Zhang

In this paper, we propose conducting Robust Graph Dimensionality Reduction (RGDR) by learning a transformation matrix to map original high-dimensional data into their low-dimensional intrinsic space without the influence of outliers. To do this, we propose simultaneously 1) adaptively learning three variables, i.e., a reverse graph embedding of original data, a transformation matrix, and a graph matrix preserving the local similarity of original data in their low-dimensional intrinsic space; and 2) employing robust estimators to avoid outliers involving the processes of optimizing these three matrices. As a result, original data are cleaned by two strategies, i.e., a prediction of original data based on three resulting variables and robust estimators, so that the transformation matrix can be learnt from accurately estimated intrinsic space with the help of the reverse graph embedding and the graph matrix. Moreover, we propose a new optimization algorithm to the resulting objective function as well as theoretically prove the convergence of our optimization algorithm. Experimental results indicated that our proposed method outperformed all the comparison methods in terms of different classification tasks.

PDF Details

NeurIPS Conference 2018 Conference Paper

Solving Non-smooth Constrained Programs with Lower Complexity than $\mathcal{O}(1/\varepsilon)$: A Primal-Dual Homotopy Smoothing Approach

Xiaohan Wei
Hao Yu
Qing Ling
Michael Neely

We propose a new primal-dual homotopy smoothing algorithm for a linearly constrained convex program, where neither the primal nor the dual function has to be smooth or strongly convex. The best known iteration complexity solving such a non-smooth problem is $\mathcal{O}(\varepsilon^{-1})$. In this paper, we show that by leveraging a local error bound condition on the dual function, the proposed algorithm can achieve a better primal convergence time of $\mathcal{O}\left(\varepsilon^{-2/(2+\beta)}\log_2(\varepsilon^{-1})\right)$, where $\beta\in(0, 1]$ is a local error bound parameter. As an example application, we show that the distributed geometric median problem, which can be formulated as a constrained convex program, has its dual function non-smooth but satisfying the aforementioned local error bound condition with $\beta=1/2$, therefore enjoying a convergence time of $\mathcal{O}\left(\varepsilon^{-4/5}\log_2(\varepsilon^{-1})\right)$. This result improves upon the $\mathcal{O}(\varepsilon^{-1})$ convergence time bound achieved by existing distributed optimization algorithms. Simulation experiments also demonstrate the performance of our proposed algorithm.
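
The step from the general bound to the geometric median rate is a direct substitution, worked out here for convenience using only the symbols in the abstract.

```latex
\[
\frac{2}{2+\beta}\,\bigg|_{\beta = 1/2} = \frac{2}{5/2} = \frac{4}{5}
\quad\Longrightarrow\quad
\mathcal{O}\left(\varepsilon^{-2/(2+\beta)}\log_2(\varepsilon^{-1})\right)
= \mathcal{O}\left(\varepsilon^{-4/5}\log_2(\varepsilon^{-1})\right).
\]
% For any \beta in (0, 1], the exponent 2/(2+\beta) lies in [2/3, 1),
% so the polynomial factor is always smaller than \varepsilon^{-1}.
```

The largest admissible value, $\beta = 1$, gives the best exponent of this family, $\varepsilon^{-2/3}$ up to the logarithmic factor.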

PDF Details

AAAI Conference 2018 Conference Paper

SqueezedText: A Real-Time Scene Text Recognition by Binary Convolutional Encoder-Decoder Network

Zichuan Liu
Yixing Li
Fengbo Ren
Wang Ling Goh
Hao Yu

A new approach for real-time scene text recognition is proposed in this paper, combining a novel binary convolutional encoder-decoder network (B-CEDNet) with a bidirectional recurrent neural network (Bi-RNN). The B-CEDNet is engaged as a visual front-end to provide elaborated character detection, and a back-end Bi-RNN performs character-level sequential correction and classification based on learned contextual knowledge. The front-end B-CEDNet can process multiple regions containing characters using a one-off forward operation, and is trained under binary constraints with significant compression. Hence it leads to both remarkable inference run-time speedup as well as memory usage reduction. With the elaborated character detection, the back-end Bi-RNN merely processes a low dimension feature sequence with category and spatial information of extracted characters for sequence correction and classification. By training with over 1,000,000 synthetic scene text images, the B-CEDNet achieves a recall rate of 0.86, precision of 0.88 and F-score of 0.87 on ICDAR-03 and ICDAR-13. With the correction and classification by Bi-RNN, the proposed real-time scene text recognition achieves state-of-the-art accuracy while consuming less than 1 ms of inference run-time. The processing flow is realized on GPU with a small network size of 1.01 MB for B-CEDNet and 3.23 MB for Bi-RNN, which is much faster and smaller than the existing solutions.

PDF Details

NeurIPS Conference 2017 Conference Paper

Online Convex Optimization with Stochastic Constraints

Hao Yu
Michael Neely
Xiaohan Wei

This paper considers online convex optimization (OCO) with stochastic constraints, which generalizes Zinkevich's OCO over a known simple fixed set by introducing multiple stochastic functional constraints that are i.i.d. generated at each round and are disclosed to the decision maker only after the decision is made. This formulation arises naturally when decisions are restricted by stochastic environments or deterministic environments with noisy observations. It also includes many important problems as special cases, such as OCO with long term constraints, stochastic constrained convex optimization, and deterministic constrained convex optimization. To solve this problem, this paper proposes a new algorithm that achieves $O(\sqrt{T})$ expected regret and constraint violations and $O(\sqrt{T}\log(T))$ high probability regret and constraint violations. Experiments on a real-world data center scheduling problem further verify the performance of the new algorithm.

PDF Details