Arrow Research search

Author name cluster

Hao Feng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers
2 author rows

Possible papers

EAAI Journal 2026 Journal Article

A rapid image-based detection method for coalmine dust concentration and mass dispersion via multi-task deep learning

  • Haoran Fu
  • Xiaoyan Gong
  • Long Chen
  • Xinyu Wang
  • Yuxuan Xue
  • Wansu Yong
  • Hao Feng

Rapid and simultaneous detection of multiple coalmine dust parameters is crucial for accurate dust control. Existing methods often suffer from single detection indicators and delayed responses, making it difficult to effectively implement dust prevention and control measures. This study leverages artificial intelligence to propose an image-based detection method for coalmine dust, built upon multi-task deep learning. The method enables real-time detection of total dust concentration, respiratory dust concentration, and mass dispersion. An image preprocessing strategy is designed, and shared features are selected using the maximal information coefficient and variance inflation factor, with a multidimensional feature library constructed. The proposed architecture integrates a shared layer and a Set Transformer layer, both based on multi-head attention, to optimize inter-task representation consistency and enhance adaptability to dynamic variations in particle count. To jointly optimize the network architecture and training performance, an adaptive loss optimization mechanism and an Optuna-based hyperparameter tuning strategy are introduced. On an independent test set, the method is compared with multi-level baselines and evaluated via ablation studies. A dust particle image detection device is developed, and based on a fully mechanized tunneling face of a coalmine in Shaanxi Province, an experimental platform is constructed for application analysis. The results show that, under coal-dust conditions, the method achieves an average response cycle of 5.5620 s, and the maximum average relative error across all outputs is 7.3201%, meeting engineering requirements for real-time performance and detection accuracy. Overall, the method offers robust theoretical and technical support for intelligent dust monitoring.

EAAI Journal 2026 Journal Article

An encoder-decoder model with self-attention mechanism for airport aviation noise estimation

  • Weili Zeng
  • Wentao Guo
  • Hao Feng
  • Yadong Zhou

Estimating aviation noise around airports is the prerequisite and foundation for assessing and controlling noise impacts. However, mainstream models are unable to capture the time-series correlation between noise impact factors. This deficiency directly undermines the estimation accuracy and generalization ability of the models. To this end, this paper proposes an encoder-decoder model with a self-attention mechanism (SAM-EDM) to estimate the aviation noise around airports. The model employs a deep autoencoder to reduce the dimensionality of noise-related factors, which not only captures complex nonlinear relationships among different variables but also eliminates redundancy and anomalies in the input features. On this basis, the encoder incorporates a Bidirectional Long Short-Term Memory (Bi-LSTM) network to learn bidirectional temporal dependencies across different time steps. The decoder incorporates physical prior knowledge and a self-attention mechanism into a Gated Recurrent Unit (GRU) and subsequently employs a fully connected layer to produce the final noise estimation outputs. A case study based on Hefei Xinqiao International Airport in China demonstrates that the SAM-EDM model achieves a coefficient of determination of 0.94 across different aircraft types. The model achieves a mean absolute error of 1.17 dB on the test set and 1.19 dB under unseen scenarios, outperforming traditional physical models, lightweight physics-guided neural networks, and pure deep learning models, demonstrating high estimation accuracy and strong generalization capability.

AAAI Conference 2026 Conference Paper

MEML-GRPO: Heterogeneous Multi-Expert Mutual Learning for RLVR Advancement

  • Weitao Jia
  • Jinghui Lu
  • Haiyang Yu
  • Siqi Wang
  • Guozhi Tang
  • An-Lan Wang
  • Weijie Yin
  • Dingkang Yang

Recent advances demonstrate that reinforcement learning with verifiable rewards (RLVR) significantly enhances the reasoning capabilities of large language models (LLMs). However, standard RLVR faces challenges with reward sparsity, where zero rewards from consistently incorrect candidate answers provide no learning signal, particularly in challenging tasks. To address this, we propose Multi-Expert Mutual Learning GRPO (MEML-GRPO), an innovative framework that utilizes diverse expert prompts as system prompts to generate a broader range of responses, substantially increasing the likelihood of identifying correct solutions. Additionally, we introduce an inter-expert mutual learning mechanism that facilitates knowledge sharing and transfer among experts, further boosting the model’s performance through RLVR. Extensive experiments across multiple reasoning benchmarks show that MEML-GRPO delivers significant improvements, achieving an average performance gain of 4.89% with Qwen and 11.33% with Llama, effectively overcoming the core limitations of traditional RLVR methods.

NeurIPS Conference 2025 Conference Paper

DUO: No Compromise to Accuracy Degradation

  • Jinda Jia
  • Cong Xie
  • Hanlin Lu
  • Fanjiang Ye
  • Hao Feng
  • Daoce Wang
  • Haibin Lin
  • Zhi Zhang

Distributed training often suffers from high communication overhead due to large-scale gradient synchronization. Although gradient compression—particularly at 4-bit or even lower precision—significantly reduces transfer volume, it typically results in sacrifice in precision and degradation of the final model accuracy. In this work, we introduce DUO, a distributed training framework designed to mitigate accuracy degradation incurred by gradient compression without involving additional overhead. DUO achieves this by inserting an additional high-precision gradient synchronization step into a previously computation-only phase, so that its communication is fully hidden by computation. We provide a comprehensive theoretical proof of convergence for DUO and validate its effectiveness through extensive pre-training experiments on GPT models. Our results indicate that DUO effectively restores accuracy when using 4-bit gradient compression, achieving performance comparable to uncompressed training. Remarkably, DUO maintains minimal accuracy degradation even under extreme compression scenarios, including 1-bit gradients or complete omission of the low-precision gradient communication step (0-bit transmission).

ICML Conference 2025 Conference Paper

EPIC: Efficient Position-Independent Caching for Serving Large Language Models

  • Junhao Hu
  • Wenrui Huang
  • Weidong Wang
  • Haoyi Wang
  • Tiancheng Hu
  • Qin Zhang
  • Hao Feng
  • Xusheng Chen

Large Language Models (LLMs) show great capabilities in a wide range of applications, but serving them efficiently becomes increasingly challenging as requests (prompts) become more complex. Context caching improves serving performance by reusing Key-Value (KV) vectors, the intermediate representations of tokens that are repeated across requests. However, existing context caching requires exact prefix matches across requests, limiting reuse cases in settings such as few-shot learning and retrieval-augmented generation, where immutable content (e.g., documents) remains unchanged across requests but is preceded by varying prefixes. Position-Independent Caching (PIC) addresses this issue by enabling modular reuse of the KV vectors regardless of prefixes. We formalize PIC and advance prior work by introducing EPIC, a serving system incorporating our new LegoLink algorithm, which mitigates the inappropriate “attention sink” effect at every document beginning, to maintain accuracy with minimal computation. Experiments show that EPIC achieves up to 8× improvements in Time-To-First-Token (TTFT) and 7× throughput gains over existing systems, with negligible or no accuracy loss.

NeurIPS Conference 2025 Conference Paper

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

  • Ling Fu
  • Zhebin Kuang
  • Jiajun Song
  • Mingxin Huang
  • Biao Yang
  • Yuzhe Li
  • Linghao Zhu
  • Qidi Luo

Scoring the Optical Character Recognition (OCR) capabilities of Large Multimodal Models (LMMs) has witnessed growing interest. Existing benchmarks have highlighted the impressive performance of LMMs in text recognition; however, their abilities in certain challenging tasks, such as text localization, handwritten content extraction, and logical reasoning, remain underexplored. To bridge this gap, we introduce OCRBench v2, a large-scale bilingual text-centric benchmark with currently the most comprehensive set of tasks (4× more tasks than the previous multi-scene benchmark OCRBench), the widest coverage of scenarios (31 diverse scenarios), and thorough evaluation metrics, with 10,000 human-verified question-answering pairs and a high proportion of difficult samples. Moreover, we construct a private test set with 1,500 manually annotated images. The consistent evaluation trends observed across both public and private test sets validate OCRBench v2's reliability. After carefully benchmarking state-of-the-art LMMs, we find that most LMMs score below 50 (out of 100) and suffer from five types of limitations: less frequently encountered text recognition, fine-grained perception, layout perception, complex element parsing, and logical reasoning. The benchmark and evaluation scripts are available at https://github.com/Yuliang-Liu/MultimodalOCR.

TMLR Journal 2025 Journal Article

TRIDE: A Text-assisted Radar-Image weather-aware fusion network for Depth Estimation

  • Huawei Sun
  • Zixu Wang
  • Hao Feng
  • Julius Ott
  • Lorenzo Servadei
  • Robert Wille

Depth estimation, essential for autonomous driving, seeks to interpret the 3D environment surrounding vehicles. The development of radar sensors, known for their cost-efficiency and robustness, has spurred interest in radar-camera fusion-based solutions. However, existing algorithms fuse features from these modalities without accounting for weather conditions, despite radars being known to be more robust than cameras under adverse weather. Additionally, while Vision-Language models have seen rapid advancement, utilizing language descriptions alongside other modalities for depth estimation remains an open challenge. This paper first introduces a text-generation strategy along with feature extraction and fusion techniques that can assist monocular depth estimation pipelines, leading to improved accuracy across different algorithms on the KITTI dataset. Building on this, we propose TRIDE, a radar-camera fusion algorithm that enhances text feature extraction by incorporating radar point information. To address the impact of weather on sensor performance, we introduce a weather-aware fusion block that adaptively adjusts radar weighting based on current weather conditions. Our method, benchmarked on the nuScenes dataset, demonstrates performance gains over the state-of-the-art, achieving a 12.87% improvement in MAE and a 9.08% improvement in RMSE. Code: https://github.com/harborsarah/TRIDE

EAAI Journal 2024 Journal Article

A visual detection algorithm for autonomous driving road environment perception

  • Peichao Cong
  • Hao Feng
  • Shanda Li
  • Tianheng Li
  • Yutao Xu
  • Xin Zhang

Achieving accurate and real-time perception of environmental targets in complex traffic scenes based on visual sensors is a challenging research problem in the field of autonomous driving technology. In methods to date, it is difficult to effectively balance the detection accuracy and speed. To this end, this paper proposes an interactive and lightweight visual detection algorithm – YRDM (Your Region Decision-Making) – based on the concepts of efficient mining and utilisation of target feature information, lightweight network structure, and optimisation of label allocation for highly practical detection of ambient targets in autonomous driving scenarios. First, a two-stage algorithm architecture consisting of four low-parameter subnetworks is constructed with the goal of efficiently mining and utilising target feature information, and the accuracy and effectiveness of the algorithm are balanced through the interaction of information between the subnetworks. Second, in order to further improve the detection speed, lightweight convolution is introduced into the structure of the YRDM network to construct the DSC3 module, which allows lightweight processing of the subnetwork structure. Finally, by converting the label assignment problem into an optimal transport problem, adaptation to the global nature of the samples by YRDM is improved, allowing better detection accuracy. The algorithm is tested with two major public datasets, BDD100K and KITTI, and a large number of experimental results show that the comprehensive performance of YRDM is better than other existing algorithms. In addition, ablation experiments and mobile terminal device deployment experiments further demonstrate the effectiveness and real-time performance of this algorithm.

IROS Conference 2024 Conference Paper

CaFNet: A Confidence-Driven Framework for Radar Camera Depth Estimation

  • Huawei Sun
  • Hao Feng
  • Julius Ott
  • Lorenzo Servadei
  • Robert Wille

Depth estimation is critical in autonomous driving for interpreting 3D scenes accurately. Recently, radar-camera depth estimation has attracted considerable interest due to the robustness and low cost of radar. Thus, this paper introduces a two-stage, end-to-end trainable Confidence-aware Fusion Net (CaFNet) for dense depth estimation, combining RGB imagery with sparse and noisy radar point cloud data. The first stage addresses radar-specific challenges, such as ambiguous elevation and noisy measurements, by predicting a radar confidence map and a preliminary coarse depth map. A novel approach is presented for generating the ground truth for the confidence map, which involves associating each radar point with its corresponding object to identify potential projection surfaces. These maps, together with the initial radar input, are processed by a second encoder. For the final depth estimation, we introduce a confidence-aware gated fusion mechanism to integrate radar and image features effectively, thereby enhancing the reliability of the depth map by filtering out radar noise. Our methodology, evaluated on the nuScenes dataset, demonstrates superior performance, improving upon the current leading model by 3.2% in Mean Absolute Error (MAE) and 2.7% in Root Mean Square Error (RMSE). Code: https://github.com/harborsarah/CaFNet

NeurIPS Conference 2024 Conference Paper

SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training

  • Jinda Jia
  • Cong Xie
  • Hanlin Lu
  • Daoce Wang
  • Hao Feng
  • Chengming Zhang
  • Baixi Sun
  • Haibin Lin

Recent years have witnessed a clear trend towards language models with an ever-increasing number of parameters, as well as growing training overhead and memory usage. Distributed training, particularly through Sharded Data Parallelism (ShardedDP) which partitions optimizer states among workers, has emerged as a crucial technique to mitigate training time and memory usage. Yet, a major challenge in the scalability of ShardedDP is the intensive communication of weights and gradients. While compression techniques can alleviate this issue, they often result in worse accuracy. Driven by this limitation, we propose SDP4Bit (Toward 4Bit Communication Quantization in Sharded Data Parallelism for LLM Training), which effectively reduces the communication of weights and gradients to nearly 4 bits via two novel techniques: quantization on weight differences, and two-level gradient smooth quantization. Furthermore, SDP4Bit presents an algorithm-system co-design with runtime optimization to minimize the computation overhead of compression. In addition to the theoretical guarantees of convergence, we empirically evaluate the accuracy of SDP4Bit on the pre-training of GPT models with up to 6.7 billion parameters, and the results demonstrate a negligible impact on training loss. Furthermore, speed experiments show that SDP4Bit achieves up to 4.08× speedup in end-to-end throughput on a scale of 128 GPUs.

NeurIPS Conference 2024 Conference Paper

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy

  • Weichao Zhao
  • Hao Feng
  • Qi Liu
  • Jingqun Tang
  • Shu Wei
  • Binghong Wu
  • Lei Liao
  • Yongjie Ye

Tables contain factual and quantitative data accompanied by various structures and contents that pose challenges for machine comprehension. Previous methods generally design task-specific architectures and objectives for individual tasks, resulting in modal isolation and intricate workflows. In this paper, we present a novel large vision-language model, TabPedia, equipped with a concept synergy mechanism. In this mechanism, all the involved diverse visual table understanding (VTU) tasks and multi-source visual embeddings are abstracted as concepts. This unified framework allows TabPedia to seamlessly integrate VTU tasks, such as table detection, table structure recognition, table querying, and table question answering, by leveraging the capabilities of large language models (LLMs). Moreover, the concept synergy mechanism enables table perception-related and comprehension-related tasks to work in harmony, as they can effectively leverage the needed clues from the corresponding source perception embeddings. Furthermore, to better evaluate the VTU task in real-world scenarios, we establish a new and comprehensive table VQA benchmark, ComTQA, featuring approximately 9,000 QA pairs. Extensive quantitative and qualitative experiments on both table perception and comprehension tasks, conducted across various public benchmarks, validate the effectiveness of our TabPedia. The superior performance further confirms the feasibility of using LLMs for understanding visual tables when all concepts work in synergy. The benchmark ComTQA has been open-sourced at https://huggingface.co/datasets/ByteDance/ComTQA. The source code and model have also been released at https://github.com/zhaowc-ustc/TabPedia.

EAAI Journal 2024 Journal Article

Transformer-based dual-view X-ray security inspection image analysis

  • Xianglong Meng
  • Hao Feng
  • Yu Ren
  • Haigang Zhang
  • Weidong Zou
  • Xinyu Ouyang

Artificial intelligence technology is rapidly advancing and has been widely applied in the field of intelligent security inspection. Utilizing computer vision technology to detect prohibited items in X-ray images has drawn much attention. Due to the transmission effect of X-rays, single-view security inspection images are prone to object occlusion and poor imaging angles, which seriously affects the performance of object detection models. Dual-view security inspection equipment can simultaneously capture X-ray transmission images of the item under inspection from both horizontal and vertical angles, which can effectively address issues of poor imaging angles and object occlusions that single-view imaging cannot resolve. In this paper, we introduce artificial intelligence technology into dual-view security inspection image analysis and propose a dual-view feature fusion and prohibited item detection model for X-ray security inspection images based on the Vision Transformer framework. The detection model contains two input channels: a main channel and a secondary channel. The main channel detects prohibited items in security inspection images, while the secondary channel is dedicated to providing effective feature information about prohibited items to the main channel. Two feature interaction modules are applied in the proposed model to realize dual-channel information exchange and supplementation from local and global perspectives, respectively. Simulation results based on the public Dualray dataset have demonstrated the state-of-the-art performance of the proposed dual-view X-ray image detection model. Code is available at https://github.com/zhg-SZPT/Trans2Ray.

EAAI Journal 2023 Journal Article

Adaptive sliding mode controller based on fuzzy rules for a typical excavator electro-hydraulic position control system

  • Hao Feng
  • Jinye Jiang
  • Xiaodan Chang
  • Chenbo Yin
  • Donghui Cao
  • Hongfu Yu
  • Chunbiao Li
  • Jiaxue Xie

In order to improve the trajectory tracking accuracy and robustness of a typical heavy electro-hydraulic position system, a fuzzy adaptive sliding mode controller is proposed. A potential-like function is introduced to design a new sliding surface with a non-linear integral term. An adaptive controller is designed to approximate the equivalent controller in sliding mode control. Stability of the controller is demonstrated by the Lyapunov method, and the chattering phenomenon caused by the nonlinear switching term is reduced by using the fuzzy switching method, in which 25 fuzzy rules are given to adjust the sliding mode switching controller. Simulations with sinusoidal, ramp and step signals, and experiments with leveling operation at different speeds are carried out. The proposed controller can follow the reference trajectory quickly and smoothly, and has a certain anti-interference ability. Compared with the traditional sliding mode controller, the proposed controller has a small tracking error, fast response and good tracking performance.

AAAI Conference 2020 Conference Paper

A Practical Approach to Forgetting in Description Logics with Nominals

  • Yizheng Zhao
  • Renate Schmidt
  • Yuejie Wang
  • Xuanming Zhang
  • Hao Feng

This paper investigates the problem of forgetting in description logics with nominals. In particular, we develop a practical method for forgetting concept and role names from ontologies specified in the description logic ALCO, extending the basic ALC with nominals. The method always terminates, and is sound in the sense that the forgetting solution computed by the method has the same logical consequences as the original ontology. The method is so far the only approach to deductive forgetting in description logics with nominals. An evaluation of a prototype implementation shows that the method achieves a significant speed-up and notably better success rates than the LETHE tool, which performs deductive forgetting for ALC-ontologies. Compared to FAME, a semantic forgetting tool for ALCOIH-ontologies, better success rates are attained. From the perspective of ontology engineering this is very useful, as it provides ontology curators with a powerful tool to produce views of ontologies.

AAAI Conference 2019 Conference Paper

Tracking Logical Difference in Large-Scale Ontologies: A Forgetting-Based Approach

  • Yizheng Zhao
  • Ghadah Alghamdi
  • Renate A. Schmidt
  • Hao Feng
  • Giorgos Stoilos
  • Damir Juric
  • Mohammad Khodadadi

This paper explores how the logical difference between two ontologies can be tracked using a forgetting-based or uniform interpolation (UI)-based approach. The idea is that rather than computing all entailments of one ontology not entailed by the other ontology, which would be computationally infeasible, only the strongest entailments not entailed in the other ontology are computed. To overcome drawbacks of existing forgetting/uniform interpolation tools we introduce a new forgetting method designed for the task of computing the logical difference between different versions of large-scale ontologies. The method is sound and terminating, and can compute uniform interpolants for ALC-ontologies as large as SNOMED CT and NCIt. Our evaluation shows that the method can achieve considerably better success rates (>90%) and provides a feasible approach to computing the logical difference in large-scale ontologies, as a case study on different versions of SNOMED CT and NCIt ontologies shows.