EAAI Journal 2026 Journal Article
Active detection-based concept drift adaptive approach for coal mill condition monitoring
- Jian Xu
- Yuguang Niu
- Ming Du
- Jun Yao
- Guoxiong Zhu
EAAI Journal 2026 Journal Article
JBHI Journal 2026 Journal Article
Accurate identification of EEG electrodes associated with epilepsy is essential for developing real-time diagnostic applications. This paper introduces the Explainable Convolutional Channel Ranking (ECCR) method for identifying diagnostically relevant EEG channels for Idiopathic Absence Seizure (IAS) detection by analyzing channel-specific feature contributions learned by a convolutional neural network (CNN). Unlike traditional saliency-based approaches that focus only on highly activated regions or pool contributions across seizure types and spatial locations, ECCR retains channel-specific contribution patterns and shows that channels with moderate contribution levels offer the most discriminative and physiologically consistent information. This finding suggests that channels with very high saliency are often affected by noise or subject-specific artifacts, while medium-contribution channels capture more stable seizure-related information dynamics. In 10-fold cross-validation, the ECCR-guided CNN achieved 82.21% accuracy and 92.01% sensitivity, while leave-one-subject-out (LOSO) validation yielded 73.78% accuracy, demonstrating improved subject-independent performance under a leakage-controlled protocol; ECCR consistently selected fronto-central, temporo-parietal, and occipital regions, reducing 29 channels to 7 in the subject-dependent evaluation. A validation using a Random Forest classifier confirmed that ECCR-selected channels provided stronger detection power than those excluded. These findings suggest that ECCR can guide the design of compact, interpretable EEG systems, supporting more reliable deep learning solutions for IAS diagnosis.
EAAI Journal 2026 Journal Article
JBHI Journal 2026 Journal Article
Remote photoplethysmography (rPPG) is a non-contact method that detects blood volume changes in facial tissues from video. The non-invasiveness of rPPG makes it promising for applications in remote health monitoring and telemedicine. However, its real-world application is hindered by a fundamental challenge. Existing models are typically designed for high-resolution, fixed-size inputs, making them ill-suited for the arbitrary-resolution videos commonly encountered in practical scenarios due to dynamic camera-to-subject distances. To address this challenge, we propose ResoPhys, an unsupervised plug-and-play rPPG measurement method designed for facial videos of arbitrary resolution. This method first generates video pairs via random scaling and then employs specialized modules for arbitrary-resolution feature extraction and upsampling to analyze the resulting multi-scale features. The framework is optimized via an unsupervised contrastive learning approach using our proposed multi-resolution contrastive loss. To validate its performance across a spectrum of resolutions, we evaluated ResoPhys on several public datasets. The results demonstrate the superiority of our method over previous unsupervised approaches, exhibiting particular strength in challenging low-resolution scenarios, which underscores its robustness to resolution changes. Crucially, ResoPhys acts as a universal front-end that decouples resolution handling from signal extraction, empowering existing rPPG networks for effective deployment in arbitrary-resolution conditions. The source code is available at: https://github.com/HeZhongTian-xjtu/ResoPhys.
JBHI Journal 2025 Journal Article
Magnetic Resonance Imaging (MRI) is a widely used medical imaging technique, but its resolution is often limited by acquisition time constraints, potentially compromising diagnostic accuracy. Reference-based Image Super-Resolution (RefSR) has shown promising performance in addressing such challenges by leveraging external high-resolution (HR) reference images to enhance the quality of low-resolution (LR) images. The core objective of RefSR is to accurately establish correspondences between the reference HR image and the LR images. In pursuit of this objective, this paper develops a Self-rectified Texture Supplementation network for RefSR (STS-SR) to enhance fine details in MRI images and support the expanding role of autonomous AI in healthcare. Our network comprises a texture-specified self-rectified feature transfer module and a cross-scale texture complementary network. The feature transfer module employs high-frequency filtering to help the network concentrate on fine details. To better exploit the information from both the reference and LR images, our cross-scale texture complementary module incorporates the All-ViT and Swin Transformer layers to achieve feature aggregation at multiple scales, which enables high-quality image enhancement that is critical for autonomous AI systems in healthcare to make accurate decisions. Extensive experiments are performed across various benchmark datasets. The results validate the effectiveness of our method and demonstrate that it achieves state-of-the-art performance compared to existing approaches. This advancement enables autonomous AI systems to utilize high-quality MRI images for more accurate diagnostics and reliable predictions.
EAAI Journal 2025 Journal Article
AAAI Conference 2025 Conference Paper
Federated Learning (FL) offers a decentralized approach to model training, where data remains local and only model parameters are shared between the clients and the central server. Traditional methods, such as Federated Averaging (FedAvg), linearly aggregate these parameters which are usually trained on heterogeneous data distributions, potentially overlooking the complex, high-dimensional nature of the parameter space. This can result in degraded performance of the aggregated model. While personalized FL approaches can mitigate the heterogeneous data issue to some extent, the limitation of linear aggregation remains unresolved. To alleviate this issue, we investigate the generative approach of diffusion model and propose a novel generative parameter aggregation framework for personalized FL, pFedGPA. In this framework, we deploy a diffusion model on the server to integrate the diverse parameter distributions and propose a parameter inversion method to efficiently generate a set of personalized parameters for each client. This inversion method transforms the uploaded parameters into a latent code, which is then aggregated through denoising sampling to produce the final personalized parameters. By encoding the dependence of a client's model parameters on the specific data distribution using the high-capacity diffusion model, pFedGPA can effectively decouple the complexity of the overall distribution of all clients' model parameters from the complexity of each individual client's parameter distribution. Our experimental results consistently demonstrate the superior performance of the proposed method across multiple datasets, surpassing baseline approaches.
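The linear aggregation that the abstract above attributes to FedAvg can be illustrated with a minimal sketch. It assumes each client uploads a flattened parameter vector and that weights are proportional to local sample counts; the function name `fedavg` and the toy values are illustrative and not taken from the paper. It shows only the baseline, not the diffusion-based pFedGPA aggregation.

```python
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Sample-count-weighted average of flattened client parameter vectors."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client parameter vector")
    total = sum(client_sizes)
    dim = len(client_params[0])
    aggregated = [0.0] * dim
    for params, n_samples in zip(client_params, client_sizes):
        weight = n_samples / total  # client's share of all training samples
        for i, value in enumerate(params):
            aggregated[i] += weight * value
    return aggregated


# Toy example (assumed values): the second client holds three times more data.
global_params = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Because every coordinate is averaged independently, the result always lies on the straight line between client vectors, which is the limitation the abstract points to when client data distributions are heterogeneous.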
AAAI Conference 2025 Conference Paper
With the advancement of large-scale language modeling techniques, large multimodal models combining visual encoders with large language models have demonstrated exceptional performance in various visual tasks. Most of the current large multimodal models achieve this by mapping visual features obtained from the visual encoder into a large language model and using them as inputs alongside text for downstream tasks. Therefore, the number of visual tokens directly affects the training and inference speed of the model. There has been significant work on token pruning for visual transformers, but for large multimodal models, only relying on visual information for token pruning or compression may lead to significant loss of important information. On the other hand, the textual input in the form of a question may contain valuable information that can aid in answering the question, providing additional knowledge to the model. To address the potential oversimplification and excessive pruning that can occur with most purely visual token pruning methods, we propose a text information-guided dynamic visual token recovery mechanism that does not require training. This mechanism leverages the similarity between the question text and visual tokens to recover visually meaningful tokens with important text information while merging other less important tokens, to achieve efficient computation for large multimodal models. Experimental results demonstrate that our proposed method achieves comparable performance to the original approach while compressing the visual tokens to an average of 10% of the original quantity.
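The idea of keeping text-relevant visual tokens and merging the rest can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' mechanism: tokens and the question are plain embedding vectors, relevance is cosine similarity, and all discarded tokens are merged into a single mean token. The names `cosine` and `select_and_merge` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def select_and_merge(visual_tokens: Sequence[Sequence[float]],
                     text_embedding: Sequence[float],
                     keep: int) -> List[List[float]]:
    """Keep the `keep` tokens most similar to the text; average the others."""
    scores = [cosine(tok, text_embedding) for tok in visual_tokens]
    order = sorted(range(len(visual_tokens)),
                   key=lambda i: scores[i], reverse=True)
    kept_idx = sorted(order[:keep])  # preserve the original token order
    rest_idx = order[keep:]
    result = [list(visual_tokens[i]) for i in kept_idx]
    if rest_idx:
        dim = len(visual_tokens[0])
        merged = [sum(visual_tokens[i][d] for i in rest_idx) / len(rest_idx)
                  for d in range(dim)]
        result.append(merged)  # one summary token for everything dropped
    return result


# Toy example (assumed 2-D embeddings): the question aligns with the first axis.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.0, 2.0]]
compressed = select_and_merge(tokens, [1.0, 0.0], keep=2)
```

Here four tokens are reduced to three: the two tokens closest to the question embedding plus one merged token, so no training is involved.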
NeurIPS Conference 2025 Conference Paper
The core task of recommender systems is to learn user preferences from historical user-item interactions. With the rapid development of large language models (LLMs), recent research has explored leveraging the reasoning capabilities of LLMs to enhance rating prediction tasks. However, existing distillation-based methods suffer from limitations such as the teacher model's insufficient recommendation capability, costly and static supervision, and superficial transfer of reasoning ability. To address these issues, this paper proposes RecZero, a reinforcement learning (RL)-based recommendation paradigm that abandons the traditional multi-model and multi-stage distillation approach. Instead, RecZero trains a single LLM through pure RL to autonomously develop reasoning capabilities for rating prediction. RecZero consists of two key components: (1) "Think-before-Recommendation" prompt construction, which employs a structured reasoning template to guide the model in step-wise analysis of user interests, item features, and user-item compatibility; and (2) rule-based reward modeling, which adopts group relative policy optimization (GRPO) to compute rewards for reasoning trajectories and optimize the LLM. Additionally, the paper explores a hybrid paradigm, RecOne, which combines supervised fine-tuning with RL, initializing the model with cold-start reasoning samples and further optimizing it with RL. Experimental results demonstrate that RecZero and RecOne significantly outperform existing baseline methods on multiple benchmark datasets, validating the superiority of the RL paradigm in achieving autonomous reasoning-enhanced recommender systems.
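The group-relative part of GRPO mentioned above can be sketched in a few lines: several responses are sampled for the same prompt, each gets a scalar reward, and each reward is normalized against the mean and standard deviation of its own group. The reward below (negative absolute rating error) is a hypothetical stand-in, since the abstract does not specify the rule; using the population standard deviation is also an assumption.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def rating_reward(predicted: float, actual: float) -> float:
    """Hypothetical rule-based reward: closer predictions score higher."""
    return -abs(predicted - actual)


def group_relative_advantages(rewards: Sequence[float],
                              eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the statistics of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy example: three sampled rating predictions for an item whose true rating is 4.
sampled_predictions = [5.0, 4.0, 2.0]
rewards = [rating_reward(p, 4.0) for p in sampled_predictions]
advantages = group_relative_advantages(rewards)
```

The exact prediction receives a positive advantage, the worst one a negative advantage, and no separate value network is needed to form the baseline.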
NeurIPS Conference 2024 Conference Paper
Decision-making in large-scale games is an essential research area in artificial intelligence (AI) with significant real-world impact. However, the limited access to realistic large-scale game environments has hindered research progress in this area. In this paper, we present AuctionNet, a benchmark for bid decision-making in large-scale ad auctions derived from a real-world online advertising platform. AuctionNet is composed of three parts: an ad auction environment, a pre-generated dataset based on the environment, and performance evaluations of several baseline bid decision-making algorithms. More specifically, the environment effectively replicates the integrity and complexity of real-world ad auctions through the interaction of several modules: the ad opportunity generation module employs deep generative networks to bridge the gap between simulated and real-world data while mitigating the risk of sensitive data exposure; the bidding module implements diverse auto-bidding agents trained with different decision-making algorithms; and the auction module is anchored in the classic Generalized Second Price (GSP) auction but also allows for customization of auction mechanisms as needed. To facilitate research and provide insights into the environment, we have also pre-generated a substantial dataset based on the environment. The dataset contains 10 million ad opportunities, 48 diverse auto-bidding agents, and over 500 million auction records. Performance evaluations of baseline algorithms such as linear programming, reinforcement learning, and generative models for bid decision-making are also presented as a part of AuctionNet. AuctionNet has powered the NeurIPS 2024 Auto-Bidding in Large-Scale Auctions competition, providing competition environments for over 1,500 teams. We believe that AuctionNet is applicable not only to research on bid decision-making in ad auctions but also to the general area of decision-making in large-scale games. Code: https://github.com/alimama-tech/AuctionNet.
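For readers unfamiliar with the Generalized Second Price rule that the auction module is anchored in, a minimal sketch follows. It assumes ranking by bid alone, with no quality scores or reserve price, and it is not the AuctionNet implementation; the function name `gsp_auction` and the bids are illustrative.

```python
from typing import Dict, List, Tuple


def gsp_auction(bids: Dict[str, float],
                num_slots: int) -> List[Tuple[str, float]]:
    """Allocate slots by descending bid; each winner pays the next-highest bid."""
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    results = []
    for position in range(min(num_slots, len(ranked))):
        bidder = ranked[position][0]
        # The price is the bid ranked immediately below, or 0.0 if none exists.
        if position + 1 < len(ranked):
            price = ranked[position + 1][1]
        else:
            price = 0.0
        results.append((bidder, price))
    return results


# Toy example (assumed bids): two ad slots, three advertisers.
allocation = gsp_auction({"a": 3.0, "b": 5.0, "c": 1.0}, num_slots=2)
```

With these bids, advertiser "b" takes the first slot at a price of 3.0 and "a" takes the second at 1.0, so no winner pays their own bid.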
ICML Conference 2024 Conference Paper
We present COALA, a vision-centric Federated Learning (FL) platform, and a suite of benchmarks for practical FL scenarios, which we categorize as task, data, and model levels. At the task level, COALA extends support from simple classification to 15 computer vision tasks, including object detection, segmentation, pose estimation, and more. It also facilitates federated multiple-task learning, allowing clients to train on multiple tasks simultaneously. At the data level, COALA goes beyond supervised FL to benchmark both semi-supervised FL and unsupervised FL. It also benchmarks feature distribution shifts other than commonly considered label distribution shifts. In addition to dealing with static data, it supports federated continual learning for continuously changing data in real-world scenarios. At the model level, COALA benchmarks FL with split models and different models in different clients. COALA platform offers three degrees of customization for these practical FL scenarios, including configuration customization, components customization, and workflow customization. We conduct systematic benchmarking experiments for the practical FL scenarios and highlight potential opportunities for further advancements in FL.
AAAI Conference 2024 Conference Paper
Rearrangement operations form the crux of interactions between humans and their environment. The ability to generate natural, fluid sequences of this operation is of essential value in AR/VR and CG. Bridging a gap in the field, our study introduces FAVOR: a novel dataset for Full-body AR-driven Virtual Object Rearrangement that uniquely employs motion capture systems and AR eyeglasses. Comprising 3k diverse motion rearrangement sequences and 7.17 million interaction data frames, this dataset breaks new ground in research data. We also present a pipeline FAVORITE for producing digital human rearrangement motion sequences guided by instructions. Experimental results, both qualitative and quantitative, suggest that this dataset and pipeline deliver high-quality motion sequences. Our dataset, code, and appendix are available at https://kailinli.github.io/FAVOR.
AAAI Conference 2024 Conference Paper
The theory of Bayesian learning incorporates the use of Student-t Processes to model heavy-tailed distributions and datasets with outliers. However, despite Student-t Processes having a similar computational complexity as Gaussian Processes, there has been limited emphasis on the sparse representation of this model. This is mainly due to the increased difficulty in modeling and computation compared to previous sparse Gaussian Processes. Our motivation is to address the need for a sparse representation framework that reduces computational complexity, allowing Student-t Processes to be more flexible for real-world datasets. To achieve this, we leverage the conditional distribution of Student-t Processes to introduce sparse inducing points. Bayesian methods and variational inference are then utilized to derive a well-defined lower bound, facilitating more efficient optimization of our model through stochastic gradient descent. We propose two methods for computing the variational lower bound, one utilizing Monte Carlo sampling and the other employing Jensen's inequality to compute the KL regularization term in the loss function. We propose adopting these approaches as viable alternatives to Gaussian processes when the data might contain outliers or exhibit heavy-tailed behavior, and we provide specific recommendations for their applicability. We evaluate the two proposed approaches on various synthetic and real-world datasets from UCI and Kaggle, demonstrating their effectiveness compared to baseline methods in terms of computational complexity and accuracy, as well as their robustness to outliers.
NeurIPS Conference 2024 Conference Paper
The Multi-modal Large Language Model (MLLM) based Referring Expression Generation (REG) task has gained increasing popularity, which aims to generate an unambiguous text description that applies to exactly one object or region in the image by leveraging foundation models. We empirically found that there exists a potential trade-off between the detailedness and the correctness of the descriptions for the referring objects. On the one hand, generating sentences with more details is usually required in order to provide more precise object descriptions. On the other hand, complicated sentences could easily increase the probability of hallucinations. To address this issue, we propose a training-free framework, named "unleash-then-eliminate", which first elicits the latent information in the intermediate layers, and then adopts a cycle-consistency-based decoding method to alleviate the production of hallucinations. Furthermore, to reduce the computational load of cycle-consistency-based decoding, we devise a Probing-based Importance Estimation method to statistically estimate the importance weights of intermediate layers within a subset. These importance weights are then incorporated into the decoding process over the entire dataset, intervening in the next token prediction from intermediate layers. Extensive experiments conducted on the RefCOCOg and PHD benchmarks show that our proposed framework could outperform existing methods on both semantic and hallucination-related metrics. Code will be made available at https://github.com/Glupayy/unleash-eliminate.
EAAI Journal 2023 Journal Article
IJCAI Conference 2023 Conference Paper
In large-scale e-commerce live-stream recommendation, streamers are classified into different levels based on their popularity and other metrics for marketing. Several top streamers at the head level occupy a considerable amount of exposure, resulting in an unbalanced data distribution. A unified model for all levels without consideration of imbalance issue can be biased towards head streamers and neglect the conflicts between levels. The lack of inter-level streamer correlations and intra-level streamer characteristics modeling imposes obstacles to estimating the user behaviors. To tackle these challenges, we propose a curriculum multi-level learning framework for imbalanced recommendation. We separate model parameters into shared and level-specific ones to explore the generality among all levels and discrepancy for each level respectively. The level-aware gradient descent and a curriculum sampling scheduler are designed to capture the de-biased commonalities from all levels as the shared parameters. During the specific parameters training, the hardness-aware learning rate and an adaptor are proposed to dynamically balance the training process. Finally, shared and specific parameters are combined to be the final model weights and learned in a cooperative training framework. Extensive experiments on a live-stream production dataset demonstrate the superiority of the proposed framework.
IJCAI Conference 2023 Conference Paper
Automated bidding, an emerging intelligent decision-making paradigm powered by machine learning, has become popular in online advertising. Advertisers in automated bidding evaluate the cumulative utilities and have private financial constraints over multiple ad auctions in a long-term period. Based on these distinct features, we consider a new ad auction model for automated bidding: the values of advertisers are public while the financial constraints, such as budget and return on investment (ROI) rate, are private types. We derive the truthfulness conditions with respect to private constraints for this multi-dimensional setting, and demonstrate that any feasible allocation rule can be equivalently reduced to a series of non-decreasing functions on budget. However, the resulting allocation mapped from these non-decreasing functions generally follows an irregular shape, making it difficult to obtain a closed-form expression for the auction objective. To overcome this design difficulty, we propose a family of truthful automated bidding auctions with personalized rank scores, similar to the Generalized Second-Price (GSP) auction. The intuition behind our design is to leverage personalized rank scores as the criteria to allocate items, and compute a critical ROI to transform the constraints on budget to the same dimension as ROI. The experimental results demonstrate that the proposed auction mechanism outperforms the widely used ad auctions, such as first-price auction and second-price auction, in various automated bidding environments.
NeurIPS Conference 2022 Conference Paper
In many web applications, deep learning-based CTR prediction models (deep CTR models for short) are widely adopted. Traditional deep CTR models learn patterns in a static manner, i.e., the network parameters are the same across all the instances. However, such a manner can hardly characterize each of the instances which may have different underlying distributions. It actually limits the representation power of deep CTR models, leading to sub-optimal results. In this paper, we propose an efficient, effective, and universal module, named Adaptive Parameter Generation network (APG), which can dynamically generate parameters for deep CTR models on-the-fly based on different instances. Extensive experimental evaluation results show that APG can be applied to a variety of deep CTR models and significantly improve their performance. Meanwhile, APG can reduce the time cost by 38.7% and memory usage by 96.6% compared to a regular deep CTR model. We have deployed APG in the industrial sponsored search system and achieved 3% CTR gain and 1% RPM gain respectively.
NeurIPS Conference 2022 Conference Paper
High-concurrency asynchronous training upon parameter server (PS) architecture and high-performance synchronous training upon all-reduce (AR) architecture are the most commonly deployed distributed training modes for recommendation models. Although synchronous AR training is designed to have higher training efficiency, asynchronous PS training would be a better choice for training speed when there are stragglers (slow workers) in the shared cluster, especially under limited computing resources. An ideal way to take full advantage of these two training modes is to switch between them upon the cluster status. However, switching training modes often requires tuning hyper-parameters, which is extremely time- and resource-consuming. We find two obstacles to a tuning-free approach: the different distribution of the gradient values and the stale gradients from the stragglers. This paper proposes Global Batch gradients Aggregation (GBA) over PS, which aggregates and applies gradients with the same global batch size as the synchronous training. A token-control process is implemented to assemble the gradients and decay the gradients with severe staleness. We provide the convergence analysis to reveal that GBA has comparable convergence properties with the synchronous training, and demonstrate the robustness of GBA on recommendation models against gradient staleness. Experiments on three industrial-scale recommendation tasks show that GBA is an effective tuning-free approach for switching. Compared to the state-of-the-art derived asynchronous training, GBA achieves up to 0.2% improvement on the AUC metric, which is significant for the recommendation models. Meanwhile, under strained hardware resources, GBA achieves at least a 2.4x speedup compared to synchronous training.
NeurIPS Conference 2022 Conference Paper
Recently, the auto-bidding technique has become an essential tool to increase the revenue of advertisers. Facing the complex and ever-changing bidding environments in the real-world advertising system (RAS), state-of-the-art auto-bidding policies usually leverage reinforcement learning (RL) algorithms to generate real-time bids on behalf of the advertisers. Due to safety concerns, it was believed that the RL training process can only be carried out in an offline virtual advertising system (VAS) that is built based on the historical data generated in the RAS. In this paper, we argue that there exist significant gaps between the VAS and RAS, making the RL training process suffer from the problem of inconsistency between online and offline (IBOO). Firstly, we formally define the IBOO and systematically analyze its causes and influences. Then, to avoid the IBOO, we propose a sustainable online RL (SORL) framework that trains the auto-bidding policy by directly interacting with the RAS, instead of learning in the VAS. Specifically, based on our proof of the Lipschitz smooth property of the Q function, we design a safe and efficient online exploration (SER) policy for continuously collecting data from the RAS. Meanwhile, we derive the theoretical lower bound on the safety degree of the SER policy. We also develop a variance-suppressed conservative Q-learning (V-CQL) method to effectively and stably learn the auto-bidding policy with the collected data. Finally, extensive simulated and real-world experiments validate the superiority of our approach over the state-of-the-art auto-bidding algorithm.
YNICL Journal 2020 Journal Article
IJCAI Conference 2020 Conference Paper
Bipartite b-matching is fundamental in algorithm design, and has been widely applied to diverse applications, such as economic markets, labor markets, etc. These practical problems usually exhibit two distinct features: they are large-scale and dynamic, which requires the matching algorithm to be repeatedly executed at regular intervals. However, existing exact and approximate algorithms usually fail in such settings due to either requiring intolerable running time or too many computational resources. To address this issue, based on a key observation that the matching instances do not vary much, we propose NeuSearcher, which leverages the knowledge learned from previously solved instances to solve new problem instances. Specifically, we design a multichannel graph neural network to predict the threshold of the matched edges, by which the search region could be significantly reduced. We further propose a parallel heuristic search algorithm to iteratively improve the solution quality until convergence. Experiments on both open and industrial datasets demonstrate that NeuSearcher achieves a 2 to 3 times speedup while achieving exactly the same matching solution compared with the state-of-the-art approximation approaches.
TCS Journal 2020 Journal Article
NeurIPS Conference 2019 Conference Paper
Large-scale industrial recommender systems are usually confronted with computational problems due to the enormous corpus size. To retrieve and recommend the most relevant items to users under response time limits, resorting to an efficient index structure is an effective and practical solution. The previous work Tree-based Deep Model (TDM) (Zhu et al., 2018) greatly improves recommendation accuracy using tree index. By indexing items in a tree hierarchy and training a user-node preference prediction model satisfying a max-heap like property in the tree, TDM provides logarithmic computational complexity w.r.t. the corpus size, enabling the use of arbitrary advanced models in candidate retrieval and recommendation. In tree-based recommendation methods, the quality of both the tree index and the user-node preference prediction model determines the recommendation accuracy for the most part. We argue that the learning of tree index and preference model has interdependence. Our purpose, in this paper, is to develop a method to jointly learn the index structure and user preference prediction model. In our proposed joint optimization framework, the learning of index and user preference prediction model are carried out under a unified performance measure. Besides, we come up with a novel hierarchical user preference representation utilizing the tree index hierarchy. Experimental evaluations with two large-scale real-world datasets show that the proposed method improves recommendation accuracy significantly. Online A/B test results at a display advertising platform also demonstrate the effectiveness of the proposed method in production environments.
AAAI Conference 2018 Conference Paper
In this paper, we propose a simple but effective semantic part-based weighting aggregation (PWA) for image retrieval. The proposed PWA utilizes the discriminative filters of deep convolutional layers as part detectors. Moreover, we propose an effective unsupervised strategy to select some part detectors to generate the "probabilistic proposals", which highlight certain discriminative parts of objects and suppress the noise of background. The final global PWA representation could then be acquired by aggregating the regional representations weighted by the selected "probabilistic proposals" corresponding to various semantic content. We conduct comprehensive experiments on four standard datasets and show that our unsupervised PWA outperforms the state-of-the-art unsupervised and supervised aggregation methods.
NeurIPS Conference 2016 Conference Paper
Sensing is the process of deriving signals from the environment that allows artificial systems to interact with the physical world. The Shannon theorem specifies the maximum rate at which information can be acquired. However, this upper bound is hard to achieve in many man-made systems. The biological visual systems, on the other hand, have highly efficient signal representation and processing mechanisms that allow precise sensing. In this work, we argue that redundancy is one of the critical characteristics for such superior performance. We show architectural advantages by utilizing redundant sensing, including correction of mismatch error and significant precision enhancement. For a proof-of-concept demonstration, we have designed a heuristic-based analog-to-digital converter - a zero-dimensional quantizer. Through Monte Carlo simulation with the error probability distribution known a priori, performance approaching the Shannon limit is feasible. In actual measurements without knowing the error distribution, we observe at least 2-bit extra precision. The results may also help explain biological processes including the dominance of binocular vision, the functional roles of the fixational eye movements, and the structural mechanisms allowing hyperacuity.
AAAI Conference 2016 Conference Paper
Real-time bidding has become one of the largest online advertising markets in the world. Today the bid price per ad impression is typically decided by the expected value of how it can lead to a desired action event to the advertiser. However, this industry standard approach to decide the bid price does not consider the actual effect of the ad shown to the user, which should be measured based on the performance lift among users who have been or have not been exposed to a certain treatment of ads. In this paper, we propose a new bidding strategy and prove that if the bid price is decided based on the performance lift rather than absolute performance value, advertisers can actually gain more action events. We describe the modeling methodology to predict the performance lift and demonstrate the actual performance gain through blind A/B test with real ad campaigns. We also show that to move the demand-side platforms to bid based on performance lift, they should be rewarded based on the relative performance lift they contribute.
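The contrast between bidding on absolute performance and bidding on performance lift can be made concrete with a small sketch. It assumes the bid is simply proportional to the relevant probability times the value of an action; the function names and the probabilities are hypothetical and are not taken from the paper's model.

```python
def value_based_bid(p_action_if_shown: float, action_value: float) -> float:
    """Bid on the absolute probability of an action when the ad is shown."""
    return p_action_if_shown * action_value


def lift_based_bid(p_action_if_shown: float,
                   p_action_if_not_shown: float,
                   action_value: float) -> float:
    """Bid on the incremental action probability caused by showing the ad."""
    lift = max(0.0, p_action_if_shown - p_action_if_not_shown)
    return lift * action_value


# Toy example (assumed probabilities): user A would convert almost as often
# without the ad, while user B converts mostly because of it.
value = 100.0
bid_a_value = value_based_bid(0.05, value)
bid_b_value = value_based_bid(0.02, value)
bid_a_lift = lift_based_bid(0.05, 0.045, value)
bid_b_lift = lift_based_bid(0.02, 0.001, value)
```

Under the value-based rule, user A attracts the higher bid, whereas the lift-based rule shifts the budget toward user B, where the ad actually changes behavior.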
YNIMG Journal 2010 Journal Article
IROS Conference 2004 Conference Paper
A large autonomous logistics tracked carrier, which can be deployed in a tropical forest for day and night, has been developed. This paper presents details of the architecture and major system components. Emphasis is made on the conversion of a highly non-linear and mechanically-operated vehicle into a computer-controlled system, the perception mechanism used to identify drivable areas, and algorithms developed for vehicle path planning. A suite of active and passive sensors is employed for visual guidance. The data from different sensors are fused and fed to a path planner that takes into account the vehicle dynamics and generates the heading and speed commands for the vehicle control system. The paper includes results from a series of field trials completed in a tropical forest and conditions that are unique to the region.