Arrow Research search

Author name cluster

Hanlin Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers (8)

EAAI Journal 2026 Journal Article

Vector attention-based point cloud network for semantic segmentation of sewer sonar data

  • Wenli Liu
  • Yueming Jiang
  • Hanlin Li
  • Lei Yang
  • Hanbin Luo

Sonar technology is unaffected by lighting or water conditions, making it ideal for inspecting water-filled sewers. Nonetheless, significant challenges remain in utilizing sonar point clouds effectively. This research introduces the Vector Attention-based Point Cloud Network (VAPCNet), a deep learning method for semantic segmentation of sewer sonar point clouds. It is based on a U-Net style encoder-decoder architecture and consists of the attention module, the contraction module, and the expansion module. Additionally, to mitigate data imbalance, a weighted focal loss was employed during training. VAPCNet demonstrates excellent performance on a sewer dataset collected by a sonar robot, achieving an overall accuracy of 95.9% and a mean Intersection over Union (mIoU) of 86.4%. It demonstrates robustness to point perturbations and supports a lightweight design by adjusting encoder stages without sacrificing accuracy. These advantages make VAPCNet an innovative solution for employing sonar technology in sewer detection, emphasizing its practical potential.
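The abstract does not give the exact loss formulation; as background, a minimal sketch of a class-weighted focal loss of the kind mentioned above, in plain Python (the per-class `alpha` weights and `gamma` value are assumptions, not values from the paper):

```python
import math

def weighted_focal_loss(probs, target, alpha, gamma=2.0):
    """Per-sample weighted focal loss.

    probs  : predicted class probabilities (softmax output)
    target : index of the true class
    alpha  : per-class weights (larger for rarer classes, to
             counter imbalanced data)
    gamma  : focusing parameter; gamma = 0 recovers weighted
             cross-entropy
    """
    p_t = probs[target]
    # Down-weight easy (high-confidence) samples by (1 - p_t)^gamma
    return -alpha[target] * (1.0 - p_t) ** gamma * math.log(p_t)
```

The `(1 - p_t)^gamma` factor is what distinguishes focal loss from plain cross-entropy: confident predictions contribute little, so training focuses on hard and rare classes.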

NeurIPS Conference 2025 Conference Paper

Collective Bargaining in the Information Economy Can Address AI-Driven Power Concentration

  • Nicholas Vincent
  • Matt Prewitt
  • Hanlin Li

This position paper argues that there is an urgent need to restructure markets for the information that goes into AI systems. Specifically, small-to-medium sized producers of information (such as journalists, news organizations, researchers, and creative professionals) need to be able to appoint representatives who can carry out "collective bargaining" with AI product builders in order to receive reasonable terms and a fair return on the informational value they contribute. Obstacles to this market structure can be removed through technical work that facilitates collective bargaining in the information economy (e.g., explainable data value estimation and federated data management tools) and regulatory/policy interventions (e.g., support for trusted data intermediary organizations that represent guilds or syndicates of information producers). We argue that without collective bargaining in the information economy, AI will exacerbate a large-scale "information market failure" that will lead not only to undesirable concentration of capital, but also to a potential "ecological collapse" in the informational commons. On the other hand, collective bargaining in the information economy can create market conditions necessary for a pro-social AI future. We provide concrete actions that can be taken to support a coalition-based approach to achieve this.

NeurIPS Conference 2025 Conference Paper

Decoupled Entropy Minimization

  • Jing Ma
  • Hanlin Li
  • Xiang Xiang

Entropy Minimization (EM) is beneficial to reducing class overlap, bridging domain gap, and restricting uncertainty for various tasks in machine learning, yet its potential is limited. To study the internal mechanism of EM, we reformulate and decouple the classical EM into two parts with opposite effects: cluster aggregation driving factor (CADF) rewards dominant classes and prompts a peaked output distribution, while gradient mitigation calibrator (GMC) penalizes high-confidence classes based on predicted probabilities. Furthermore, we reveal the limitations of classical EM caused by its coupled formulation: 1) reward collapse impedes the contribution of high-certainty samples in the learning process, and 2) easy-class bias induces misalignment between output distribution and label distribution. To address these issues, we propose Adaptive Decoupled Entropy Minimization (AdaDEM), which normalizes the reward brought from CADF and employs a marginal entropy calibrator (MEC) to replace GMC. AdaDEM outperforms DEM*, an upper-bound variant of classical EM, and achieves superior performance across various imperfectly supervised learning tasks in noisy and dynamic environments.
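For context, classical Entropy Minimization (the objective the paper decouples into CADF and GMC) simply penalizes the Shannon entropy of the model's output distribution; a minimal sketch of that baseline objective, not of the paper's AdaDEM:

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution.

    Classical EM minimizes this quantity on unlabeled data,
    pushing the output distribution toward a single peak and
    thereby reducing class overlap.
    """
    return -sum(p * math.log(p) for p in probs if p > 0)
```

A peaked prediction such as `[0.9, 0.1]` has lower entropy than a uniform one such as `[0.5, 0.5]`, which is exactly the behavior the abstract's "cluster aggregation driving factor" rewards.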

ICML Conference 2025 Conference Paper

PTTA: Purifying Malicious Samples for Test-Time Model Adaptation

  • Jing Ma
  • Hanlin Li
  • Xiang Xiang 0001

Test-Time Adaptation (TTA) enables deep neural networks to adapt to arbitrary distributions during inference. Existing TTA algorithms generally tend to select benign samples that help achieve robust online prediction and stable self-training. Although malicious samples that would undermine the model’s optimization should be filtered out, doing so also wastes test data. To alleviate this issue, we focus on how to make full use of the malicious test samples for TTA by transforming them into benign ones, and propose a plug-and-play method, PTTA. The core of our solution lies in the purification strategy, which retrieves benign samples having opposite effects on the objective function to perform Mixup with malicious samples, based on a saliency indicator for encoding benign and malicious data. This strategy results in effective utilization of the information in malicious samples and an improvement of the models’ online test accuracy. In this way, we can directly apply the purification loss to existing TTA algorithms without the need to carefully adjust the sample selection threshold. Extensive experiments on four types of TTA tasks as well as classification, segmentation, and adversarial defense demonstrate the effectiveness of our method. Code is available at https://github.com/HAIV-Lab/PTTA.
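The retrieval step and saliency indicator are specific to PTTA, but the Mixup operation the purification builds on is standard. A minimal sketch of that ingredient alone (the variable names and the mixing coefficient `lam` are illustrative assumptions):

```python
def mixup(x_benign, x_malicious, lam=0.7):
    """Convex combination of a benign and a malicious test sample.

    lam close to 1 keeps the mixed sample dominated by the benign
    one, diluting the harmful signal while retaining information
    from the malicious sample.
    """
    return [lam * b + (1.0 - lam) * m
            for b, m in zip(x_benign, x_malicious)]
```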

NeurIPS Conference 2024 Conference Paper

A Systematic Review of NeurIPS Dataset Management Practices

  • Yiwei Wu
  • Leah Ajmani
  • Shayne Longpre
  • Hanlin Li

As new machine learning methods demand larger training datasets, researchers and developers face significant challenges in dataset management. Although ethics reviews, documentation, and checklists have been established, it remains uncertain whether consistent dataset management practices exist across the community. This lack of a comprehensive overview hinders our ability to diagnose and address fundamental tensions and ethical issues related to managing large datasets. We present a systematic review of datasets published at the NeurIPS Datasets and Benchmarks track, focusing on four key aspects: provenance, distribution, ethical disclosure, and licensing. Our findings reveal that dataset provenance is often unclear due to ambiguous filtering and curation processes. Additionally, a variety of sites are used for dataset hosting, but only a few offer structured metadata and version control. These inconsistencies underscore the urgent need for standardized data infrastructures for the publication and management of datasets.

NeurIPS Conference 2024 Conference Paper

Consent in Crisis: The Rapid Decline of the AI Data Commons

  • Shayne Longpre
  • Robert Mahari
  • Ariel Lee
  • Campbell Lund
  • Hamidah Oderinwale
  • William Brannon
  • Nayan Saxena
  • Naana Obeng-Marnu

General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data, assembled into corpora such as C4, RefinedWeb, and Dolma. To our knowledge, we conduct the first, large-scale, longitudinal audit of the consent protocols for the web domains underlying AI training corpora. Our audit of 14,000 web domains provides an expansive view of crawlable web data and how codified data use preferences are changing over time. We observe a proliferation of AI-specific clauses to limit use, acute differences in restrictions on AI developers, as well as general inconsistencies between websites' expressed intentions in their Terms of Service and their robots.txt. We diagnose these as symptoms of ineffective web protocols, not designed to cope with the widespread re-purposing of the internet for AI. Our longitudinal analyses show that in a single year (2023-2024) there has been a rapid crescendo of data restrictions from web sources, rendering ~5%+ of all tokens in C4, or 28%+ of the most actively maintained, critical sources in C4, fully restricted from use. For Terms of Service crawling restrictions, a full 45% of C4 is now restricted. If respected or enforced, these restrictions are rapidly biasing the diversity, freshness, and scaling laws for general-purpose AI systems. We hope to illustrate the emerging crises in data consent, for both developers and creators. The foreclosure of much of the open web will impact not only commercial AI, but also non-commercial AI and academic research.
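The robots.txt preferences this audit measures can be checked mechanically with Python's standard `urllib.robotparser`; a minimal sketch (the directives below are a hypothetical example, not drawn from the paper's corpus):

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt with an AI-specific clause of the kind
# the audit observes: one crawler is singled out and fully disallowed.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

rfp = RobotFileParser()
rfp.parse(robots_txt)

ai_allowed = rfp.can_fetch("GPTBot", "https://example.com/article")
other_allowed = rfp.can_fetch("SomeOtherBot", "https://example.com/article")
```

Here `ai_allowed` is `False` while `other_allowed` is `True`: the same page is open to general crawlers but restricted for the named AI crawler, which is the per-agent asymmetry the abstract describes.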

EAAI Journal 2024 Journal Article

Convolutional point transformer for semantic segmentation of sewer sonar point clouds

  • Chen Li
  • Hanlin Li
  • Ke Chen

The application of sonar technology in sewer inspections offers significant potential for improving inspection efficiency. However, the point cloud data obtained via sonar encounters challenges such as excessive noise, irregular spatial distribution, and imbalanced data distribution. This study introduces the Convolutional Point Transformer for Semantic Segmentation (CPTSS) approach, specifically tailored for the precise identification of sewer defects. The architecture of CPTSS features a streamlined encoder-decoder framework, where the encoder module effectively combines the strengths of point transformer and convolutional techniques. This integration optimizes the model's ability to extract both local and global features, capture remote contextual information, and improve overall learning performance. Additionally, an α-balanced focal loss is proposed to address the imbalanced data distribution during training. The CPTSS was validated through field testing. The resulting metrics, including macro precision, macro recall, macro F1 score, and mean Intersection over Union (MIoU), yielded impressive values of 0.9562, 0.9020, 0.9234, and 0.8662, respectively. Furthermore, the CPTSS outperforms state-of-the-art methods including Point Transformer, RandLA-Net, and KPConv in terms of MIoU, and exhibits strong generalization capability across diverse sewer conditions. These findings highlight the CPTSS as a significant advancement in sonar-based sewer inspection methods, with the potential to substantially reduce the time and resources required for accurate inspections.

EAAI Journal 2024 Journal Article

Multisensor data fusion approach for sediment assessment of sewers in operation

  • Chen Li
  • Ke Chen
  • Hanlin Li
  • Hanbin Luo

Urban sewer systems are essential components of urban water infrastructure, but their operations are often affected by sediment. Existing sediment assessment methods generally adopt closed-circuit television (CCTV) or individual types of sensors, but they fail to accurately locate the sediment and quantify the sediment volume. More seriously, these methods become ineffective in an operating sewer. In this study, a sewer sediment assessment approach is developed based on multisensor data fusion (SA-MDF). The raw data are collected by a remotely operated vehicle (ROV) equipped with a rotating sonar device, a gyroscope, an accelerometer, and an odometer. Subsequently, a two-step process of multisensor data fusion is implemented. In the first step, the unscented Kalman filter (UKF) and Rauch-Tung-Striebel (RTS) are applied to fuse the gyroscope, accelerometer, and odometer data, thereby achieving precise localization of the ROV inside the sewer. In the second step, the point cloud data collected by sonar are fused with the sensor data to address the point cloud data offset caused by the sewage flow and robot motion jitter. The laboratory and field experiments demonstrated that the SA-MDF can effectively reduce the ROV location error from 32.37% to 0.7% and the sediment quantification error from 15.25% to 6.38%. As a result, the SA-MDF facilitates accurate sediment assessment of the sewer, which offers valuable support for sewer maintenance decisions.
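The paper fuses its sensors with an unscented Kalman filter plus RTS smoothing; as background only, a minimal 1-D *linear* Kalman filter illustrates the underlying predict/update fusion idea (the noise variances and measurements below are assumptions, and the unscented variant handles the nonlinear motion models this sketch cannot):

```python
def kalman_1d(measurements, x0, p0, q, r):
    """Minimal 1-D linear Kalman filter with a constant-state model.

    measurements : noisy scalar observations of the state
    x0, p0       : initial state estimate and its variance
    q            : process noise variance (model uncertainty)
    r            : measurement noise variance (sensor uncertainty)
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed constant, so only the
        # uncertainty grows by the process noise.
        p = p + q
        # Update: blend prediction and measurement by the Kalman
        # gain, which weights whichever source is less uncertain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates
```

Starting from a poor initial guess, the estimate is pulled toward the level the noisy measurements agree on, with the gain `k` shrinking as confidence accumulates; the UKF generalizes this blending to nonlinear models, and RTS then smooths the whole trajectory backward in time.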