EAAI Journal 2026 Journal Article
Cross-query contextual clues dynamic enhancement for partially relevant video retrieval
- Ou Ye
- Rongkang Wang
- Zhenhua Yu
- Yun Zhang
- Wenchao Zhang
- Liangguo Xiao
Author name cluster
Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
EAAI Journal 2026 Journal Article
EAAI Journal 2026 Journal Article
ICLR Conference 2025 Conference Paper
Knowledge distillation (KD) compresses deep neural networks by transferring task-related knowledge from cumbersome pre-trained teacher models to more compact student models. However, vanilla KD for image super-resolution (SR) networks yields only limited improvements due to the inherent nature of SR tasks, where the outputs of teacher models are noisy approximations of high-quality label images. In this work, we show that the potential of vanilla KD has been underestimated and demonstrate that the ingenious application of data augmentation methods can close the gap between it and more complex, well-designed methods. Unlike conventional training processes, which typically apply image augmentations simultaneously to both low-quality inputs and high-quality labels, we propose AugKD, which utilizes unpaired data augmentations to 1) generate auxiliary distillation samples and 2) impose label-consistency regularization. Comprehensive experiments show that AugKD significantly outperforms existing state-of-the-art KD methods across a range of SR tasks.
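The two uses of unpaired augmentation named in the abstract can be sketched as follows. This is a minimal illustration based only on the abstract: the `teacher`/`student` callables, the flip augmentation, and the L1 loss are illustrative stand-ins, not AugKD's actual implementation.

```python
# Hypothetical sketch of unpaired augmentation for KD in super-resolution.
# Images are lists of rows; horizontal flip is its own inverse.

def hflip(img):
    """Horizontally flip an image given as a list of rows."""
    return [list(reversed(row)) for row in img]

def l1(a, b):
    """Mean absolute difference between two images of equal shape."""
    n = sum(len(row) for row in a)
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

def augkd_losses(student, teacher, lr_img):
    # 1) Auxiliary distillation sample: augment the input only (unpaired),
    #    and distill from the teacher's output on the augmented input.
    aug_lr = hflip(lr_img)
    distill = l1(student(aug_lr), teacher(aug_lr))
    # 2) Label-consistency regularization: the student's prediction on the
    #    augmented input, mapped back by the inverse augmentation, should
    #    match its prediction on the original input.
    consistency = l1(hflip(student(aug_lr)), student(lr_img))
    return distill, consistency

# Toy check with identity "networks" on a 2x2 "image": both terms vanish.
identity = lambda img: img
d, c = augkd_losses(identity, identity, [[1, 2], [3, 4]])
```

The key contrast with paired augmentation is that only the input is transformed; the high-quality label is never flipped alongside it.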
NeurIPS Conference 2025 Conference Paper
Recent advancements in Vision-Language-Action (VLA) models have shown promise for end-to-end autonomous driving by leveraging world knowledge and reasoning capabilities. However, current VLA models often struggle with physically infeasible action outputs, complex model structures, or unnecessarily long reasoning. In this paper, we propose AutoVLA, a novel VLA model that unifies reasoning and action generation within a single autoregressive generation model for end-to-end autonomous driving. AutoVLA performs semantic reasoning and trajectory planning directly from raw visual inputs and language instructions. We tokenize continuous trajectories into discrete, feasible actions, enabling direct integration into the language model. For training, we employ supervised fine-tuning to equip the model with dual thinking modes: fast thinking (trajectory-only) and slow thinking (enhanced with chain-of-thought reasoning). To further enhance planning performance and efficiency, we introduce a reinforcement fine-tuning method based on Group Relative Policy Optimization (GRPO), reducing unnecessary reasoning in straightforward scenarios. Extensive experiments across real-world and simulated datasets and benchmarks, including nuPlan, nuScenes, Waymo, and CARLA, demonstrate the competitive performance of AutoVLA in both open-loop and closed-loop settings. Qualitative results showcase the adaptive reasoning and accurate planning capabilities of AutoVLA in diverse scenarios.
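The abstract's step of tokenizing continuous trajectories into discrete, feasible actions can be sketched as below. The uniform grid over per-waypoint displacements, the bin count, and the range limit are assumptions for illustration, not AutoVLA's actual tokenization scheme.

```python
# Hypothetical trajectory tokenizer: quantize (dx, dy) waypoint
# displacements onto a uniform grid, one vocabulary id per grid cell.

BINS = 21          # tokens per axis
LIMIT = 5.0        # max |displacement| covered by the grid (assumed metres)
STEP = 2 * LIMIT / (BINS - 1)

def tokenize(traj):
    """Map (dx, dy) waypoint displacements to integer token ids."""
    def to_bin(v):
        v = max(-LIMIT, min(LIMIT, v))   # clamp to the feasible range
        return round((v + LIMIT) / STEP)
    return [to_bin(dx) * BINS + to_bin(dy) for dx, dy in traj]

def detokenize(tokens):
    """Inverse map: token ids back to (dx, dy) bin centres."""
    def to_val(b):
        return b * STEP - LIMIT
    return [(to_val(t // BINS), to_val(t % BINS)) for t in tokens]

traj = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.0)]
tokens = tokenize(traj)          # each token is an index into BINS*BINS ids
recovered = detokenize(tokens)   # grid-aligned inputs round-trip exactly
```

Clamping to the grid range is one simple way a discrete action set can enforce physical feasibility: any id decodes to a displacement inside the allowed envelope.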
ICLR Conference 2025 Conference Paper
Post-training quantization (PTQ) has played a pivotal role in compressing large language models (LLMs) at ultra-low costs. Although current PTQ methods have achieved promising results by addressing outliers and employing layer- or block-wise loss optimization techniques, they still suffer from significant performance degradation at ultra-low bit precision. To dissect this issue, we conducted an in-depth analysis of quantization errors specific to LLMs and surprisingly discovered that, unlike traditional sources of quantization errors, the growing number of model parameters, combined with the reduction in quantization bits, intensifies inter-layer and intra-layer dependencies, which severely impact quantization accuracy. This finding highlights a critical challenge in quantizing LLMs. To address this, we propose CBQ, a cross-block reconstruction-based PTQ method for LLMs. CBQ leverages a cross-block dependency to establish long-range dependencies across multiple blocks and integrates an adaptive LoRA-Rounding technique to manage intra-layer dependencies. To further enhance performance, CBQ incorporates a coarse-to-fine pre-processing mechanism for processing weights and activations. Extensive experiments show that CBQ achieves superior low-bit quantization (W4A4, W4A8, W2A16) and outperforms existing state-of-the-art methods across various LLMs and datasets. Notably, CBQ takes only 4.3 hours to perform weight-only 4-bit quantization of a LLAMA1-65B model, achieving a commendable trade-off between performance and efficiency.
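The inter-layer dependency the abstract identifies can be made concrete with a toy example: per-layer quantization error can understate the error of the cascaded layers. The scalar "layers" and the symmetric uniform quantizer below are illustrative assumptions, not CBQ's method.

```python
# Toy illustration of why per-layer error metrics miss inter-layer
# dependency in PTQ: errors compound through a cascade of layers.

def quantize(w, bits=2, max_abs=1.0):
    """Uniform symmetric quantization of a scalar weight."""
    levels = 2 ** (bits - 1) - 1          # e.g. one level per side at 2 bits
    step = max_abs / levels
    return round(w / step) * step

def forward(weights, x):
    """Cascade of scalar multiplications standing in for blocks."""
    for w in weights:
        x = x * w
    return x

weights = [0.6, 0.7]                      # two consecutive "blocks"
qweights = [quantize(w) for w in weights] # both snap to 1.0 at 2 bits

# Worst per-layer weight error vs. error of the full cascade on input 1.0:
per_layer_err = max(abs(w - q) for w, q in zip(weights, qweights))
end_to_end_err = abs(forward(weights, 1.0) - forward(qweights, 1.0))
```

Here the end-to-end error exceeds any single layer's error, which is the kind of compounding a cross-block reconstruction objective can account for while purely layer-wise objectives cannot.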
EAAI Journal 2025 Journal Article
YNIMG Journal 2025 Journal Article
YNIMG Journal 2025 Journal Article
ICLR Conference 2025 Conference Paper
Knowledge distillation (KD) is a promising yet challenging model compression approach that transmits rich learning representations from robust but resource-demanding teacher models to efficient student models. Previous methods for image super-resolution (SR) are often tailored to specific teacher-student architectures, limiting their potential for improvement and hindering broader applications. This work presents a novel KD framework for SR models, the multi-granularity Mixture of Priors Knowledge Distillation (MiPKD), which can be universally applied to a wide range of architectures at both feature and block levels. The teacher's knowledge is effectively integrated with the student's feature via the Feature Prior Mixer, and the reconstructed feature propagates dynamically in the training phase with the Block Prior Mixer. Extensive experiments illustrate the significance of the proposed MiPKD technique.
NeurIPS Conference 2025 Conference Paper
Large language models (LLMs) have shown that generative pretraining can distill vast world knowledge into compact token representations. While LLMs encapsulate extensive world knowledge, they remain limited in modeling the behavioral knowledge contained within user interaction histories. User behavior forms a distinct modality, where each action—defined by multi-dimensional attributes such as time, context, and transaction type—constitutes a behavioral token. Modeling these high-cardinality, sparse, and irregular sequences is challenging, and discriminative models often falter under limited supervision. To bridge this gap, we extend generative pretraining to user behavior, learning transferable representations from unlabeled behavioral data analogous to how LLMs learn from text. We present PANTHER, a hybrid generative–discriminative framework that unifies user behavior pretraining and downstream adaptation, enabling large-scale sequential user representation learning and real-time inference. PANTHER introduces: (1) Structured Tokenization to compress multi-dimensional transaction attributes into an interpretable vocabulary; (2) Sequence Pattern Recognition Module (SPRM) for modeling periodic transaction motifs; (3) a Unified User-Profile Embedding that fuses static demographics with dynamic transaction histories, enabling both personalized predictions and population-level knowledge transfer; and (4) Real-time scalability enabled by offline caching of pre-trained embeddings for millisecond-level inference. Fully deployed and operational online at WeChat Pay, PANTHER delivers a 25.6% boost in next-transaction prediction HitRate@1 and a 38.6% relative improvement in fraud detection recall over baselines. Cross-domain evaluations on public benchmarks (CCT, MBD, MovieLens-1M, Yelp) show strong generalization, achieving up to 21% HitRate@1 gains over transformer baselines, establishing PANTHER as a scalable, high-performance framework for industrial user sequential behavior modeling.
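The "Structured Tokenization" component described above—compressing a transaction's multi-dimensional attributes into one id in a compact vocabulary—can be sketched as a mixed-radix encoding. The attribute names, bucket choices, and vocabulary layout below are assumptions for illustration, not PANTHER's actual tokenizer.

```python
# Hypothetical structured tokenizer: each (time bucket, context, type)
# combination maps to a single behavioral token id, and back.

TIME_BUCKETS = ["night", "morning", "afternoon", "evening"]
CONTEXTS = ["online", "offline"]
TXN_TYPES = ["transfer", "purchase", "refund"]

VOCAB_SIZE = len(TIME_BUCKETS) * len(CONTEXTS) * len(TXN_TYPES)

def tokenize_txn(hour, context, txn_type):
    """Map (hour-of-day, context, type) to a single behavioral token id."""
    t = min(hour // 6, len(TIME_BUCKETS) - 1)     # 6-hour time buckets
    c = CONTEXTS.index(context)
    k = TXN_TYPES.index(txn_type)
    # Mixed-radix encoding: one id per attribute combination.
    return (t * len(CONTEXTS) + c) * len(TXN_TYPES) + k

def detokenize(token):
    """Recover the readable attribute triple from a token id."""
    k = token % len(TXN_TYPES)
    c = (token // len(TXN_TYPES)) % len(CONTEXTS)
    t = token // (len(TXN_TYPES) * len(CONTEXTS))
    return TIME_BUCKETS[t], CONTEXTS[c], TXN_TYPES[k]

tok = tokenize_txn(hour=14, context="online", txn_type="purchase")
```

Because every id decodes back to a readable attribute triple, the vocabulary stays interpretable while the sequence model only ever sees integer tokens.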
IROS Conference 2025 Conference Paper
In this paper, we introduce a novel estimator for vision-aided inertial navigation systems (VINS), the Preconditioned Cholesky-based Square Root Information Filter (PC-SRIF). When solving linear systems, employing Cholesky decomposition offers superior efficiency but can compromise numerical stability. Due to this, existing VINS utilizing (Square Root) Information Filters often opt for QR decomposition on platforms where single precision is preferred, avoiding the numerical challenges associated with Cholesky decomposition. While these issues are often attributed to the ill-conditioned information matrix in VINS, our analysis reveals that this is not an inherent property of VINS but rather a consequence of specific parameterizations. We identify several factors that contribute to an ill-conditioned information matrix and propose a preconditioning technique to mitigate these conditioning issues. Building on this analysis, we present PC-SRIF, which exhibits remarkable stability in performing Cholesky decomposition in single precision when solving linear systems in VINS. Consequently, PC-SRIF achieves superior theoretical efficiency compared to alternative estimators. To validate the efficiency advantages and numerical stability of PC-SRIF based VINS, we have conducted well controlled experiments, which provide empirical evidence in support of our theoretical findings. Remarkably, in our VINS implementation, PC-SRIF’s runtime is 41% faster than QR-based SRIF.
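The preconditioning idea the abstract describes—rescaling an ill-conditioned information matrix before Cholesky factorization—can be sketched with a diagonal (Jacobi) preconditioner, D^(-1/2) A D^(-1/2). The tiny example matrix and the diagonal-spread conditioning proxy are illustrative assumptions based only on the abstract, not PC-SRIF's actual preconditioner.

```python
# Sketch: Jacobi preconditioning of an information matrix with wildly
# mismatched scales (e.g. mixed parameter units), then Cholesky.

def cholesky(a):
    """Plain Cholesky factorization A = L L^T (A symmetric pos. definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = (a[i][i] - s) ** 0.5
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l

def jacobi_precondition(a):
    """Return D^(-1/2) A D^(-1/2), which has a unit diagonal."""
    d = [row[i] ** -0.5 for i, row in enumerate(a)]
    n = len(a)
    return [[d[i] * a[i][j] * d[j] for j in range(n)] for i in range(n)]

A = [[1e8, 3e2], [3e2, 1e-2]]     # diagonal entries differ by 10 decades
P = jacobi_precondition(A)
L = cholesky(P)                    # factorization of the rescaled matrix

# Crude conditioning proxy: spread of the diagonal entries.
spread_before = max(A[i][i] for i in range(2)) / min(A[i][i] for i in range(2))
spread_after = max(P[i][i] for i in range(2)) / min(P[i][i] for i in range(2))
```

Python floats are double precision, so this sketch only shows the rescaling itself; the paper's point is that such rescaling is what makes Cholesky viable in single precision.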
EAAI Journal 2024 Journal Article
EAAI Journal 2024 Journal Article
YNIMG Journal 2024 Journal Article
YNIMG Journal 2023 Journal Article
AILAW Journal 2023 Journal Article
Law article prediction is the task of predicting the relevant laws and regulations involved in a case from its description text, and it has broad application prospects for improving judicial efficiency. Existing research typically considers only a single case, employing neural network methods to extract features for prediction, and thus fails to mine the related and common element information shared across different cases. To solve this problem, we propose a law article prediction method that integrates the characteristics of common elements. It can effectively utilize the co-occurrence information in the training data, fully mine the relevant common elements between cases, and fuse local features. Experiments show that our method performs well.
EAAI Journal 2023 Journal Article
JBHI Journal 2021 Journal Article
Adverse drug-drug interactions (ADDIs) pose a significant threat to public health. Although the detection of ADDIs is experimentally carried out in the early phases of drug design, many potential ADDIs are still discovered clinically by accident, leading to substantial morbidity and mortality. Several computational models have been designed for ADDI prediction. However, they take no account of drug dependency, even though many drugs produce synergistic effects and exhibit high mutual dependency in treatments, which contains underlying information about ADDIs and benefits ADDI prediction. In this paper, we design a dependent network to model drug dependency and propose an attribute-supervised learning model, Probabilistic Dependent Matrix Tri-Factorization (PDMTF), for ADDI prediction. In particular, PDMTF incorporates two drug attributes, molecular structure and side effects, and their correlation to model the adverse interactions among drugs. The dependent network is represented by a dependent matrix, which is first formulated from the row precision matrix of the predicted attribute matrices and then regularized by the molecular structure similarities among drugs. Meanwhile, an efficient alternating algorithm is designed to solve the optimization problem of PDMTF. Experiments demonstrate the superior performance of the proposed model compared with eight baselines and its two variants.
EAAI Journal 2021 Journal Article
EAAI Journal 2017 Journal Article
JBHI Journal 2017 Journal Article
In this paper, we propose a novel concordance coefficient, called the order statistics concordance coefficient (OSCOC), to quantify the association among multichannel biosignals. To uncover its properties, we compare OSCOC with three other similar indexes, i.e., the average Pearson's product moment correlation coefficient (APPMCC), Kendall's concordance coefficient (KCC), and average Kendall's tau (AKT), under a multivariate normal model (MNM), a linear model (LM), and a nonlinear model. To further demonstrate its usefulness, we present an example of atrial arrhythmia analysis based on real-world multichannel cardiac signals. Theoretical derivations as well as numerical results suggest that 1) under MNM and LM, OSCOC performs equally well with APPMCC and outperforms the other two methods, 2) in the nonlinear case, OSCOC even outperforms KCC and AKT, which are well known to be robust under increasing nonlinear transformations, and 3) OSCOC performs the best in the case study of arrhythmia analysis in terms of the volume under the surface.
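OSCOC itself is not specified in the abstract, so no sketch of it is attempted here; the APPMCC baseline it is compared against, however, is standard and can be shown directly: the average of Pearson correlations over all channel pairs.

```python
# APPMCC baseline from the abstract: average pairwise Pearson correlation
# across the channels of a multichannel signal.

def pearson(x, y):
    """Pearson's product moment correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def appmcc(channels):
    """Average Pearson correlation over all distinct channel pairs."""
    m = len(channels)
    pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
    return sum(pearson(channels[i], channels[j]) for i, j in pairs) / len(pairs)

# Three perfectly concordant channels (all linearly related):
sigs = [[1, 2, 3, 4], [2, 4, 6, 8], [0, 1, 2, 3]]
score = appmcc(sigs)
```

As a linear measure, APPMCC saturates at 1 for linearly related channels; the abstract's point is that rank-based measures like KCC and AKT, and OSCOC, degrade more gracefully under nonlinear transformations.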
EAAI Journal 2015 Journal Article
YNIMG Journal 2015 Journal Article
EAAI Journal 2013 Journal Article
EAAI Journal 2008 Journal Article