EAAI Journal 2026 Journal Article
An efficient and interpretable clustering-based framework for large-scale maritime traffic pattern recognition
- Shaoqing Guo
- Victor Bolbot
- Liangliang Lu
- Pengfei Chen
- Osiris A. Valdez Banda
Author name cluster
Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
EAAI Journal 2026 Journal Article
AAAI Conference 2026 Conference Paper
Instruction-following is a critical capability of Large Language Models (LLMs). While existing works primarily focus on assessing how well LLMs adhere to user instructions, they often overlook scenarios where instructions contain conflicting constraints—a common occurrence in complex prompts. The behavior of LLMs under such conditions remains under-explored. To bridge this gap, we introduce ConInstruct, a benchmark specifically designed to assess LLMs' ability to detect and resolve conflicts within user instructions. Using this dataset, we evaluate LLMs' conflict detection performance and analyze their conflict resolution behavior. Our experiments reveal two key findings: (1) Most proprietary LLMs exhibit strong conflict detection capabilities, whereas among open-source models, only DeepSeek-R1 demonstrates similarly strong performance. DeepSeek-R1 and Claude-4.5-Sonnet achieve the highest average F1-scores at 91.5% and 87.3%, respectively, ranking first and second overall. (2) Despite their strong conflict detection abilities, LLMs rarely explicitly notify users about the conflicts or request clarification when faced with conflicting constraints. These results underscore a critical shortcoming in current LLMs and highlight an important area for future improvement when designing instruction-following LLMs.
AAAI Conference 2026 Conference Paper
Recent years have witnessed remarkable achievements in perceptual image restoration (IR), creating an urgent demand for accurate image quality assessment (IQA), which is essential for both performance comparison and algorithm optimization. Unfortunately, the existing IQA metrics exhibit inherent weakness for IR task, particularly when distinguishing fine-grained quality differences among restored images. To address this dilemma, we contribute the first-of-its-kind fine-grained image quality assessment dataset for image restoration, termed FGRestore, comprising 18,408 restored images across six common IR tasks. Beyond conventional scalar quality scores, FGRestore was also annotated with 30,886 fine-grained pairwise preferences. Based on FGRestore, a comprehensive benchmark was conducted on the existing IQA metrics, which reveal significant inconsistencies between score-based IQA evaluations and the fine-grained restoration quality. Motivated by these findings, we further propose FGResQ, a new IQA model specifically designed for image restoration, which features both coarse-grained score regression and fine-grained quality ranking. Extensive experiments and comparisons demonstrate that FGResQ significantly outperforms state-of-the-art IQA metrics.
TAAS Journal 2026 Journal Article
Serverless function compositions subject to unpredictable faults are challenging to evaluate for root cause analysis. Even though distributed tracing provides observations at multiple levels of granularity for troubleshooting, excessive code instrumentation increases the tracing overheads in terms of both computation and storage. Therefore, developers face the challenge of where and how to instrument serverless functions to maximize the likelihood of locating faults based on tracing data while minimizing tracing overhead and costs. In this article, we propose a methodology to instrument an application with code-level tracing to infer the location of faults, taking into account constraints in terms of the maximum cost of the instrumentation and testing. We encode the tracing probe placement based on the control flow graph of the application and devise heuristics-based tracing data collection strategies to relate possible probe placements with their ability to locate a fault. Then we train novelty detection models to identify the internal anomalies and present an enhanced global search algorithm that automatically computes a probe placement with optimal fault localization ability versus cost. Experimental results show high performance in locating single and multiple faults with over 90% recall score for up to 15% latency anomalies, with minimal instrumentation overhead.
EAAI Journal 2026 Journal Article
AAAI Conference 2026 Conference Paper
The increasing popularity of long Text-to-Image (T2I) generation has created an urgent need for automatic and interpretable models that can evaluate the image-text alignment in long prompt scenarios. However, the existing T2I alignment benchmarks predominantly focus on short prompt scenarios and only provide MOS or Likert scale annotations. This inherent limitation hinders the development of long T2I evaluators, particularly in terms of the interpretability of alignment. In this study, we contribute LongT2IBench, which comprises 14K long text-image pairs accompanied by graph-structured human annotations. Given the detail-intensive nature of long prompts, we first design a Generate-Refine-Qualify annotation protocol to convert them into textual graph structures that encompass entities, attributes, and relations. Through this transformation, fine-grained alignment annotations are achieved based on these granular elements. Finally, the graph-structed annotations are converted into alignment scores and interpretations to facilitate the design of T2I evaluation models. Based on LongT2IBench, we further propose LongT2IExpert, a LongT2I evaluator that enables multi-modal large language models (MLLMs) to provide both quantitative scores and structured interpretations through an instruction-tuning process with Hierarchical Alignment Chain-of-Thought (CoT). Extensive experiments and comparisons demonstrate the superiority of the proposed LongT2IExpert in alignment evaluation and interpretation.
AAAI Conference 2026 Conference Paper
Livestreaming has become increasingly prevalent in modern visual communication, where automatic camera quality tuning is essential for delivering superior user Quality of Experience (QoE). Such tuning requires accurate blind image quality assessment (BIQA) to guide parameter optimization decisions. Unfortunately, the existing BIQA models typically only predict an overall coarse-grained quality score, which cannot provide fine-grained perceptual guidance for precise camera parameter tuning. To bridge this gap, we first establish FGLive-10K, a comprehensive fine-grained BIQA database containing 10,185 high-resolution images captured under varying camera parameter configurations across diverse livestreaming scenarios. The dataset features 50,925 multi-attribute quality annotations and 19,234 fine-grained pairwise preference annotations. Based on FGLive-10K, we further develop TuningIQA, a fine-grained BIQA metric for livestreaming camera tuning, which integrates human-aware feature extraction and graph-based camera parameter fusion. Extensive experiments and comparisons demonstrate that TuningIQA significantly outperforms state-of-the-art BIQA methods in both score regression and fine-grained quality ranking, achieving superior performance when deployed for livestreaming camera tuning.
EAAI Journal 2025 Journal Article
EAAI Journal 2025 Journal Article
EAAI Journal 2023 Journal Article
AAAI Conference 2021 Conference Paper
Supervised learning under label noise has seen numerous advances recently, while existing theoretical findings and empirical results broadly build up on the class-conditional noise (CCN) assumption that the noise is independent of input features given the true label. In this work, we present a theoretical hypothesis testing and prove that noise in real-world dataset is unlikely to be CCN, which confirms that label noise should depend on the instance and justifies the urgent need to go beyond the CCN assumption. The theoretical results motivate us to study the more general and practical-relevant instancedependent noise (IDN). To stimulate the development of theory and methodology on IDN, we formalize an algorithm to generate controllable IDN and present both theoretical and empirical evidence to show that IDN is semantically meaningful and challenging. As a primary attempt to combat IDN, we present a tiny algorithm termed self-evolution average label (SEAL), which not only stands out under IDN with various noise fractions, but also improves the generalization on realworld noise benchmark Clothing1M. Our code is released1. Notably, our theoretical analysis in Section 2 provides rigorous motivations for studying IDN, which is an important topic that deserves more research attention in future.
AAAI Conference 2021 Conference Paper
Value function is the central notion of Reinforcement Learning (RL). Value estimation, especially with function approximation, can be challenging since it involves the stochasticity of environmental dynamics and reward signals that can be sparse and delayed in some cases. A typical model-free RL algorithm usually estimates the values of a policy by Temporal Difference (TD) or Monte Carlo (MC) algorithms directly from rewards, without explicitly taking dynamics into consideration. In this paper, we propose Value Decomposition with Future Prediction (VDFP), providing an explicit two-step understanding of the value estimation process: 1) first foresee the latent future, 2) and then evaluate it. We analytically decompose the value function into a latent future dynamics part and a policy-independent trajectory return part, inducing a way to model latent dynamics and returns separately in value estimation. Further, we derive a practical deep RL algorithm, consisting of a convolutional model to learn compact trajectory representation from past experiences, a conditional variational auto-encoder to predict the latent future dynamics and a convex return model that evaluates trajectory representation. In experiments, we empirically demonstrate the effectiveness of our approach for both off-policy and on-policy RL in several OpenAI Gym continuous control tasks as well as a few challenging variants with delayed reward.
AAAI Conference 2021 Conference Paper
For multi-class classification under class-conditional label noise, we prove that the accuracy metric itself can be robust. We concretize this finding’s inspiration in two essential aspects: training and validation, with which we address critical issues in learning with noisy labels. For training, we show that maximizing training accuracy on sufficiently many noisy samples yields an approximately optimal classifier. For validation, we prove that a noisy validation set is reliable, addressing the critical demand of model selection in scenarios like hyperparameter-tuning and early stopping. Previously, model selection using noisy validation samples has not been theoretically justified. We verify our theoretical results and additional claims with extensive experiments. We show characterizations of models trained with noisy labels, motivated by our theoretical results, and verify the utility of a noisy validation set by showing the impressive performance of a framework termed noisy best teacher and student (NTS). Our code is released1.