Arrow Research search

Author name cluster

Jiabao Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
2 author rows

Possible papers

AILAW Journal 2025 Journal Article

Adversarial training flat-lattice transformer for named entity recognition of Chinese legal texts

  • Jiabao Wang
  • Kaixuan Wang
  • Yang Weng
  • Xin Li

Judgment documents are the legally binding written conclusions made by a court based on the facts of the case and the law. Because they use professional terms and nested combinations of words, the information they contain has not been deeply mined. Named Entity Recognition (NER) is a fundamental task in Natural Language Processing (NLP) and has been widely applied to Chinese text processing for many years. However, professional terms and nested words blur the boundaries between entities, so entities cannot be segmented accurately. In this paper, a new NER model, Adversarial Training Flat-Lattice Transformer (AT-Flat), which combines adversarial training with the Flat-Lattice Transformer (Flat), is proposed to mitigate these problems. In AT-Flat, the Flat combines character and word information to obtain sequence information, and a CRF outputs the final entity predictions. The key component is an adversarial training framework that integrates task-shared word boundary information from the Chinese Word Segmentation (CWS) task into the Chinese NER task. The framework filters out the noise introduced by the CWS task and further improves Chinese NER performance. Experiments on NER for Chinese traffic accident and financial lending judgment documents show that our method outperforms other state-of-the-art methods, which verifies that it can effectively alleviate the poor NER performance caused by professional terms and nested words. In addition, three public Chinese NER datasets were also used to evaluate our method.

NeurIPS Conference 2025 Conference Paper

COME: Adding Scene-Centric Forecasting Control to Occupancy World Model

  • Yining Shi
  • Kun Jiang
  • Qiang Meng
  • Ke Wang
  • Jiabao Wang
  • Wenchao Sun
  • Tuopu Wen
  • MengMeng Yang

World models are critical for autonomous driving to simulate environmental dynamics and generate synthetic data. Existing methods struggle to disentangle ego-vehicle motion (perspective shifts) from scene evolvement (agent interactions), leading to suboptimal predictions. Instead, we propose to separate environmental changes from ego-motion by leveraging the scene-centric coordinate systems. In this paper, we introduce COME: a framework that integrates scene-centric forecasting Control into the Occupancy world ModEl. Specifically, COME first generates ego-irrelevant, spatially consistent future features through a scene-centric prediction branch, which are then converted into scene condition using a tailored ControlNet. These condition features are subsequently injected into the occupancy world model, enabling more accurate and controllable future occupancy predictions. Experimental results on the nuScenes-Occ3D dataset show that COME achieves consistent and significant improvements over state-of-the-art (SOTA) methods across diverse configurations, including different input sources (ground-truth, camera-based, fusion-based occupancy) and prediction horizons (3s and 8s). For example, under the same settings, COME achieves 26.3% better mIoU metric than DOME and 23.7% better mIoU metric than UniScene. These results highlight the efficacy of disentangled representation learning in enhancing spatio-temporal prediction fidelity for world models. Code is available at https://github.com/synsin0/COME.
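
The abstract above reports results in mIoU. As a rough illustration of that metric only, the sketch below computes mean intersection-over-union for flattened voxel label lists. The function name `mean_iou`, the label encoding, and the choice to skip classes absent from both inputs are assumptions for illustration, not the evaluation protocol used in the paper.

```python
def mean_iou(pred, truth, num_classes):
    """Mean IoU over the classes that occur in the prediction or the ground truth.

    pred and truth are equal-length sequences of integer class labels,
    one per voxel (a hypothetical flattened occupancy grid).
    """
    ious = []
    for c in range(num_classes):
        # Voxels where both agree on class c.
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        # Voxels where either one says class c.
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union > 0:  # skip classes absent from both inputs
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


pred = [0, 0, 1, 1, 2, 2]
truth = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, truth, num_classes=3))
```

In this toy input the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5 up to floating-point rounding.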

AAAI Conference 2025 Conference Paper

Hybrid-Driving: An Autonomous Driving Decision Framework Integrating Large Language Models, Knowledge Graphs and Driving Rules

  • Jiabao Wang
  • Zepeng Wu
  • Qian Dong
  • Lingzhong Meng
  • Yunzhi Xue
  • Yukuan Yang

Recent advancements have underscored the exceptional analytical and situational understanding capabilities of Large Language Models (LLMs) in autonomous driving decisions. However, the inherent hallucination issues of LLMs pose significant safety concerns when LLMs are used as standalone decision-making systems. To address these challenges, we propose the Hybrid-Driving framework, which leverages LLMs' situational comprehension and reasoning abilities alongside the specialized driving expertise embedded in knowledge graphs and driving rules, thereby enhancing the safety, robustness, and reliability of autonomous driving decisions. To articulate driving experiences clearly, we introduce the Scenario Evolution Knowledge Graph (SEKG), which integrates scenario prediction and action risk analysis in autonomous driving. By delineating observation areas and defining Time-to-Collision (TTC) levels, we effectively control the number of driving scenario nodes and ensure scenario diversity. Based on the scenario evolution relationships within the SEKG, we predict scenarios and assess associated action risks. Additionally, we implement a rule-filtering mechanism to eliminate unreasonable actions and employ prompt engineering to integrate scenario information, optional actions, and SEKG-based action risk analysis into the LLMs for decision-making. Extensive experiments demonstrate that our approach substantially improves decision success rates compared to using LLMs alone (≥37.5%), and surpasses both the DiLu framework with LLMs and few-shot driving memory (≥7.5%) and other reinforcement learning methods (≥11%). These results validate the effectiveness of the Hybrid-Driving framework in enhancing LLM reliability for autonomous driving and advocate for the broader application of domain-specific knowledge in other fields.
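
The abstract above mentions discretizing Time-to-Collision (TTC) into levels to bound the number of scenario nodes. A minimal sketch of that idea follows. The threshold values, level names, and function names are hypothetical and are not taken from the paper, which does not state them in the abstract.

```python
import math


def time_to_collision(gap_m, ego_speed_mps, lead_speed_mps):
    """TTC in seconds to a lead vehicle; infinite when the gap is not closing."""
    closing_speed = ego_speed_mps - lead_speed_mps
    if closing_speed <= 0:
        return math.inf
    return gap_m / closing_speed


# Hypothetical level boundaries in seconds (illustrative values only).
TTC_LEVELS = [(2.0, "critical"), (5.0, "caution")]


def ttc_level(ttc_s):
    """Map a continuous TTC value to a small set of discrete levels."""
    for upper_bound, name in TTC_LEVELS:
        if ttc_s < upper_bound:
            return name
    return "safe"


ttc = time_to_collision(gap_m=30.0, ego_speed_mps=20.0, lead_speed_mps=10.0)
print(ttc, ttc_level(ttc))
```

Bucketing a continuous quantity this way keeps the set of distinct scenario states finite, which is the property the abstract attributes to its TTC levels.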

ICRA Conference 2025 Conference Paper

LoFSORT: Sample Online and Real-time Tracking in Low Frame Rate Scenarios

  • Jiabao Wang
  • Dong Eui Chang

We propose a novel motion-based tracker specifically designed for tracking multiple people in low frame rate scenarios. While previous studies have predominantly focused on scenarios with high frame rates (exceeding 10 frames per second), tracking in low frame rate conditions is significant for robotic platforms with limited computational resources. Our tracker optimizes the cost function, cascade structure and Kalman filter correction to better adapt to the characteristics of low frame rate environments. First, we enhance the cost function by incorporating stable variables through the introduction of height-based and displacement-based cost terms. Second, we prioritize handling occlusion among individuals during association, which reduces ambiguity in subsequent tracking processes. Third, we utilize the error-compensated detection to correct the Kalman filter, thereby improving tracking accuracy. Experimental results demonstrate that our proposed tracker, LoFSORT, outperforms other motion model-based trackers across various frame rate scenarios. Ablation studies further confirm that each component of our tracker enhances tracking performance in low frame rate scenarios.
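
The abstract above describes adding a height-based term to the association cost. One way such a term could be combined with a standard IoU cost is sketched below. The weight `w_height`, the normalization, and all function names are assumptions for illustration, and the displacement-based term is omitted. This is not the LoFSORT formulation.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def height_cost(a, b):
    """Relative height difference in [0, 1); assumes both boxes have positive height."""
    ha, hb = a[3] - a[1], b[3] - b[1]
    return abs(ha - hb) / max(ha, hb)


def association_cost(track_box, det_box, w_height=0.5):
    """IoU cost plus a weighted height term (w_height is a hypothetical weight)."""
    return (1.0 - iou(track_box, det_box)) + w_height * height_cost(track_box, det_box)


track = (0.0, 0.0, 10.0, 20.0)
far_same_height = (100.0, 0.0, 110.0, 20.0)
far_half_height = (100.0, 0.0, 110.0, 10.0)
print(association_cost(track, far_same_height))  # no overlap, same height
print(association_cost(track, far_half_height))  # no overlap, different height
```

When boxes no longer overlap between frames, as can happen at low frame rates, the IoU term saturates at 1 for every candidate, while a height term still separates them.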

EAAI Journal 2025 Journal Article

Paying more attention to local contrast: Improving infrared small target detection performance via prior knowledge

  • Peichao Wang
  • Jiabao Wang
  • Yao Chen
  • Rui Zhang
  • Yang Li
  • Zhuang Miao

Data-driven methods for InfraRed Small Target Detection (IRSTD) have achieved promising results. However, these methods typically incorporate modules with high computational complexity, which enhance performance at the expense of computational efficiency. Using human expert knowledge to help data-driven methods learn better at lower cost is worth exploring. To effectively guide the model to focus on targets’ spatial features, this paper proposes the Local Contrast Attention Enhanced infrared small target detection Network (LCAE-Net), combining prior knowledge with data-driven deep learning methods. LCAE-Net is a U-shaped neural network model which consists of two developed modules: a Local Contrast Enhancement (LCE) module and a Channel Attention Enhancement (CAE) module. The LCE module takes advantage of prior knowledge, leveraging handcrafted convolution operators to acquire Local Contrast Attention (LCA), which suppresses the background while enhancing the potential target region, thus guiding the neural network to pay more attention to the location information of potential infrared small targets. To effectively use the response information throughout the downsampling process, the CAE module is proposed to fuse information across the channels of the feature maps. Experimental results indicate that our LCAE-Net outperforms comparison methods on the three public datasets, and its detection speed can reach up to 70 Frames Per Second (FPS). Meanwhile, our model has a parameter count of 1.945 Million (M) and 4.862 Giga (G) Floating-Point Operations (FLOPs), which makes it suitable for deployment on edge devices. Our code will be available at https://github.com/boa2004plaust/LCAENet.
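
The abstract above refers to handcrafted convolution operators that highlight local contrast. The sketch below shows one simple operator of that general kind: the center pixel minus the mean of its eight neighbours, clipped at zero. This particular kernel, the border handling, and the function name are assumptions for illustration, not the operators used in LCAE-Net.

```python
def local_contrast(image):
    """Center value minus the mean of its 8 neighbours, clipped at zero.

    image is a 2D list of numbers. Border pixels are left at 0.0 for simplicity.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            neighbours = [
                image[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)
            ]
            # A bright small target stands out from its surroundings;
            # flat background regions give a response of zero.
            out[y][x] = max(0.0, image[y][x] - sum(neighbours) / 8)
    return out


frame = [
    [10, 10, 10],
    [10, 90, 10],
    [10, 10, 10],
]
print(local_contrast(frame)[1][1])
```

A fixed kernel like this has no learned parameters, which is consistent with the abstract's point about using prior knowledge to keep the model small.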

NeurIPS Conference 2024 Conference Paper

OPUS: Occupancy Prediction Using a Sparse Set

  • Jiabao Wang
  • Zhaojiang Liu
  • Qiang Meng
  • Liujiang Yan
  • Ke Wang
  • Jie Yang
  • Wei Liu
  • Qibin Hou

Occupancy prediction, which aims to predict the occupancy status within a voxelized 3D environment, is quickly gaining momentum within the autonomous driving community. Mainstream occupancy prediction works first discretize the 3D environment into voxels, then perform classification on such dense grids. However, inspection of sample data reveals that the vast majority of voxels are unoccupied. Performing classification on these empty voxels is a suboptimal allocation of computational resources, and reducing such empty voxels necessitates complex algorithm designs. To this end, we present a novel perspective on the occupancy prediction task: formulating it as a streamlined set prediction paradigm without the need for explicit space modeling or complex sparsification procedures. Our proposed framework, called OPUS, utilizes a transformer encoder-decoder architecture to simultaneously predict occupied locations and classes using a set of learnable queries. Firstly, we employ the Chamfer distance loss to scale the set-to-set comparison problem to unprecedented magnitudes, making end-to-end training of such a model a reality. Subsequently, semantic classes are adaptively assigned using nearest neighbor search based on the learned locations. In addition, OPUS incorporates a suite of non-trivial strategies to enhance model performance, including coarse-to-fine learning, consistent point sampling, and adaptive re-weighting. Finally, compared with current state-of-the-art methods, our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at near 2x FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
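
The abstract above names two building blocks: a Chamfer distance between point sets and class assignment by nearest neighbor search. A minimal pure-Python sketch of both follows. The symmetric mean-of-nearest-distances form, the Euclidean metric, and the function names are assumptions for illustration. The exact loss used in OPUS is not given in the abstract.

```python
import math


def chamfer_distance(pred, target):
    """Symmetric Chamfer distance between two non-empty point sets."""
    def mean_nearest(src, dst):
        # Average, over src, of the distance to the nearest point in dst.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return mean_nearest(pred, target) + mean_nearest(target, pred)


def assign_labels(pred, target, target_labels):
    """Give each predicted point the class of its nearest target point."""
    labels = []
    for p in pred:
        nearest = min(range(len(target)), key=lambda i: math.dist(p, target[i]))
        labels.append(target_labels[nearest])
    return labels


pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(chamfer_distance(pred, target))
print(assign_labels(pred, target, ["road", "car"]))
```

This brute-force version is quadratic in the number of points and is only meant to show the definition. Unlike a one-to-one matching, Chamfer distance needs no assignment problem to be solved, which is one plausible reading of why it scales to large sets.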