Arrow Research search

Author name cluster

Yunming Ye

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers
2 author rows

Possible papers

16

AAAI Conference 2026 Conference Paper

Improved Masked Image Generation with Knowledge-Augmented Token Representations

  • Guotao Liang
  • Baoquan Zhang
  • Zhiyuan Wen
  • Zihao Han
  • Yunming Ye

Masked image generation (MIG) has demonstrated remarkable efficiency and high-fidelity image synthesis by enabling parallel token prediction. Existing methods typically rely solely on the model itself to learn semantic dependencies among visual token sequences. However, directly learning such semantic dependencies from data is challenging because individual tokens lack clear semantic meanings and the sequences are usually long. To address this limitation, we propose a novel Knowledge-Augmented Masked Image Generation framework, named KA-MIG, which introduces explicit knowledge of token-level semantic dependencies (extracted from the training data) as priors to learn richer representations for improving performance. In particular, we explore and identify three types of advantageous token knowledge graphs: two positive graphs and one negative graph (i.e., the co-occurrence graph, the semantic similarity graph, and the position-token incompatibility graph). Based on these three prior knowledge graphs, we design a graph-aware encoder to learn token and position-aware representations. After that, a lightweight fusion mechanism is introduced to integrate these enriched representations into existing MIG methods. Resorting to such prior knowledge, our method effectively enhances the model's ability to capture semantic dependencies, leading to improved generation quality. Experimental results demonstrate that our method improves upon existing MIG methods for class-conditional image generation on ImageNet.
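
The co-occurrence graph mentioned in the abstract can be sketched from token sequences alone; a minimal illustration (function name, window size, and graph representation are assumptions, not the paper's implementation):

```python
import numpy as np

def cooccurrence_graph(sequences, vocab_size, window=2):
    """Illustrative token co-occurrence graph extracted from training
    token sequences (one of the positive knowledge graphs described
    in the abstract; the window size is a made-up choice)."""
    A = np.zeros((vocab_size, vocab_size))
    for seq in sequences:
        for i, u in enumerate(seq):
            for v in seq[i + 1 : i + 1 + window]:  # tokens within the window
                A[u, v] += 1
                A[v, u] += 1                       # keep the graph symmetric
    return A
```

A graph-aware encoder could then propagate token embeddings over the (normalized) adjacency matrix `A`.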

AAAI Conference 2026 Conference Paper

Satellite-Text-Prompted Large Language Model for Photovoltaic Power Forecasting

  • Pengfei Jia
  • Jianghong Ma
  • Baoquan Zhang
  • Kenghong Lin
  • Xinyu Zhang
  • Chuyao Luo
  • Xutao Li
  • Yunming Ye

Photovoltaic (PV) power forecasting is critical for the operation of solar power plants and the coordination of energy within power grids. This work aims to predict future PV power time series by leveraging multimodal data. While recent studies have incorporated numerical modalities such as satellite image sequences and numerical weather prediction (NWP) time series, they often overlook textual modalities—such as the spatio-temporal context of PV plants—and the potential of pretrained large language models (LLMs). In this paper, we build upon existing numerical inputs and further explore the use of spatio-temporal text prompts, generated based on plant coordinates and forecast start time, to enhance the forecasting process. We propose PV-LLM, a satellite-text-prompted framework that integrates a pretrained LLM to improve PV power forecasting. The framework consists of three key components: Text Prompt Construction, Modality-Specific Encoding, and Adaptive Prompt Tuning. First, the Text Prompt Construction module generates spatio-temporal prompts that offer high-level semantic guidance. Next, the Modality-Specific Encoding module encodes each modality according to its unique characteristics, capturing modality-specific patterns while managing varying context lengths. Finally, the Adaptive Prompt Tuning module fine-tunes the LLM to integrate multimodal embeddings, while an adaptive gating mechanism retains its pretrained knowledge. We validate the effectiveness of the proposed framework on a real-world dataset containing multiple PV plants. Experimental results demonstrate that our approach outperforms existing state-of-the-art methods.
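
The Text Prompt Construction step can be pictured as simple templating over plant coordinates and forecast start time; a hedged sketch (the wording and field names are illustrative, not taken from the paper):

```python
from datetime import datetime

def build_prompt(lat, lon, start):
    """Hypothetical spatio-temporal text prompt for a PV plant
    (template wording is illustrative, not the paper's)."""
    return (
        f"PV plant located at latitude {lat:.2f}, longitude {lon:.2f}. "
        f"Forecast starts {start:%Y-%m-%d %H:%M} "
        f"(month {start.month}, hour {start.hour}). "
        "Predict the photovoltaic power output for the coming horizon."
    )

prompt = build_prompt(22.54, 114.06, datetime(2024, 6, 1, 8, 0))
```

Such a prompt would be tokenized and fed to the pretrained LLM alongside the encoded satellite and NWP modalities.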

AAAI Conference 2025 Conference Paper

AsyncDSB: Schedule-Asynchronous Diffusion Schrödinger Bridge for Image Inpainting

  • Zihao Han
  • Baoquan Zhang
  • Lisai Zhang
  • Shanshan Feng
  • Kenghong Lin
  • Guotao Liang
  • Yunming Ye
  • Joeq

Image inpainting is an important image generation task, which aims to restore a corrupted image from its partially visible area. Recently, diffusion Schrödinger bridge methods have effectively tackled this task by modeling the translation between corrupted and target images as a diffusion Schrödinger bridge process along a noising schedule path. Although these methods have shown superior performance, in this paper we find that 1) existing methods suffer from a schedule-restoration mismatching issue, i.e., there is usually a large discrepancy between the theoretical schedule and the practical restoration process, which means the schedule is not fully leveraged for restoring images; and 2) the key reason for this issue is that the restoration processes of individual pixels are actually asynchronous, yet existing methods impose a synchronous noise schedule on them, i.e., all pixels share the same noise schedule. To this end, we propose a schedule-Asynchronous Diffusion Schrödinger Bridge (AsyncDSB) for image inpainting. Our insight is to preferentially schedule pixels with high frequency (i.e., large gradients) and then those with low frequency (i.e., small gradients). Based on this insight, given a corrupted image, we first train a network to predict its gradient map in the corrupted area. Then, we regard the predicted image gradient as a prior and design a simple yet effective pixel-asynchronous noise schedule strategy to enhance the diffusion Schrödinger bridge. Thanks to the asynchronous schedule over pixels, the temporal interdependence of the restoration process between pixels can be fully characterized for high-quality image inpainting. Experiments on real-world datasets show that our AsyncDSB achieves superior performance, especially on FID, with around 3%∼14% improvement over state-of-the-art baseline methods.
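
The pixel-asynchronous idea can be sketched as shifting each pixel's effective diffusion time by its normalized gradient magnitude; a minimal illustration (the `shift` knob and normalization are assumptions, not the paper's schedule):

```python
import numpy as np

def async_times(grad_map, t, shift=0.2):
    """Illustrative pixel-asynchronous schedule: pixels with larger
    predicted gradients (high frequency) get an advanced effective
    time so they are restored first. `shift` is a made-up knob."""
    g_min, g_max = grad_map.min(), grad_map.max()
    g = (grad_map - g_min) / (g_max - g_min + 1e-8)  # normalize to [0, 1]
    return np.clip(t + shift * (g - 0.5), 0.0, 1.0)  # per-pixel time
```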

AAAI Conference 2025 Conference Paper

Integrating Multi-Source Data for Long Sequence Precipitation Forecasting

  • Demin Yu
  • Wenzhi Feng
  • Kenghong Lin
  • Xutao Li
  • Yunming Ye
  • Chuyao Luo
  • Wenchuan Du

Long-sequence precipitation forecasting is critical for both meteorological science and smart city applications. The primary objective of this task is to predict future radar echo sequences, which provide high-resolution, timely references for atmospheric precipitation distribution based on current observations. However, the chaotic nature of precipitation systems poses significant challenges in extending reliable forecast horizons. Most existing methods struggle with accuracy and clarity when extended to long-sequence predictions, such as three-hour forecasts, primarily due to the insufficiency of spatio-temporal information within a single modality over time. In this paper, we propose a cascading forecasting framework that adaptively extracts and integrates multimodal spatio-temporal information to support accurate and realistic long-sequence radar forecasting. Our framework includes a temporal adaptive predictor and a flow-based precipitation distribution adaptor. The predictor utilizes a multi-branch encoder-decoder architecture, which allows it to extract meteorological sequences from multiple sources at varying scales, resulting in an initial global precipitation estimate. The core component is a carefully designed cross-attention module with a temporal adaptive layer to enhance multi-modality alignment. The initial estimate is then refined by the flow-based adaptor, which adjusts the prediction to match the target precipitation distribution, enhancing local details and correcting extreme precipitation patterns. We validated our method using a real multi-source dataset for long-sequence forecasting, and the experimental results demonstrate that our approach outperforms existing state-of-the-art methods.
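
The cross-attention at the heart of the multi-modality alignment module reduces to scaled dot-product attention; a minimal sketch (projections and the temporal adaptive layer are omitted):

```python
import numpy as np

def cross_attention(Q, K, V):
    """Minimal scaled dot-product cross-attention: queries from one
    modality attend over keys/values from another."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)            # softmax over keys
    return w @ V
```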

ICML Conference 2025 Conference Paper

Perceptually Constrained Precipitation Nowcasting Model

  • Wenzhi Feng
  • Xutao Li 0003
  • Zhe Wu 0006
  • Kenghong Lin
  • Demin Yu
  • Yunming Ye
  • Yaowei Wang 0001

Most current precipitation nowcasting methods aim to capture the underlying spatiotemporal dynamics of precipitation systems by minimizing the mean square error (MSE). However, these methods often neglect effective constraints on the data distribution, leading to unsatisfactory prediction accuracy and image quality, especially for long forecast sequences. To address this limitation, we propose a precipitation nowcasting model incorporating perceptual constraints. This model reformulates precipitation nowcasting as a posterior MSE problem under such constraints. Specifically, we first obtain the posterior mean sequences of precipitation forecasts using a precipitation estimator. Subsequently, we construct the transport between distributions using rectified flow. To enhance the focus on distant frames, we design a frame sampling strategy that gradually increases the corresponding weights. We theoretically demonstrate the reliability of our solution, and experimental results on two publicly available radar datasets demonstrate that our model is effective and outperforms current state-of-the-art models.
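
The frame sampling strategy that "gradually increases the corresponding weights" can be sketched as a monotone weight schedule over lead time (the power-law form and `gamma` exponent are assumptions, not the paper's choice):

```python
import numpy as np

def frame_weights(T, gamma=1.5):
    """Illustrative sampling weights that grow with lead time so that
    distant frames are emphasized; `gamma` is a hypothetical exponent."""
    w = np.arange(1, T + 1, dtype=float) ** gamma
    return w / w.sum()  # normalize to a probability distribution

w = frame_weights(10)
```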

AAAI Conference 2024 Conference Paper

iTrendRNN: An Interpretable Trend-Aware RNN for Meteorological Spatiotemporal Prediction

  • Xu Huang
  • Chuyao Luo
  • Bowen Zhang
  • Huiwei Lin
  • Xutao Li
  • Yunming Ye

Accurate prediction of meteorological elements, such as temperature and relative humidity, is important to human livelihood, early warning of extreme weather, and urban governance. Recently, neural network-based methods have shown impressive performance in this field. However, most of them are overcomplicated and impenetrable. In this paper, we propose a straightforward and interpretable differential framework, where the key lies in explicitly estimating the evolutionary trends. Specifically, three types of trends are exploited. (1) The proximity trend simply uses the most recent changes. It works well for approximately linear evolution. (2) The sequential trend explores the global information, aiming to capture the nonlinear dynamics. Here, we develop an attention-based trend unit to help memorize long-term features. (3) The flow trend is motivated by the nature of evolution, i.e., the heat or substance flows from one region to another. Here, we design a flow-aware attention unit. It can reflect the interactions via performing spatial attention over flow maps. Finally, we develop a trend fusion module to adaptively fuse the above three trends. Extensive experiments on two datasets demonstrate the effectiveness of our method.
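
The proximity trend and the adaptive fusion described above can be sketched in a few lines; a hedged illustration (the paper's fusion module is learned, whereas here the fusion weights are plain softmax inputs):

```python
import numpy as np

def proximity_trend(x_prev, x_curr):
    """Proximity trend: extrapolate the most recent change, which
    works well for approximately linear evolution."""
    return x_curr + (x_curr - x_prev)

def fuse_trends(trends, logits):
    """Illustrative adaptive fusion: softmax-weighted sum of the
    three trend estimates."""
    w = np.exp(logits - np.max(logits))
    w /= w.sum()
    return sum(wi * t for wi, t in zip(w, trends))
```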

NeurIPS Conference 2024 Conference Paper

LG-VQ: Language-Guided Codebook Learning

  • Guotao Liang
  • Baoquan Zhang
  • Yaowei Wang
  • Xutao Li
  • Yunming Ye
  • Huaibin Wang
  • Chuyao Luo
  • Kola Ye

Vector quantization (VQ) is a key technique in high-resolution and high-fidelity image synthesis, which aims to learn a codebook to encode an image with a sequence of discrete codes and then generate an image in an auto-regression manner. Although existing methods have shown superior performance, most methods prefer to learn a single-modal codebook (e.g., image), resulting in suboptimal performance when the codebook is applied to multi-modal downstream tasks (e.g., text-to-image, image captioning) due to the existence of modal gaps. In this paper, we propose a novel language-guided codebook learning framework, called LG-VQ, which aims to learn a codebook that can be aligned with the text to improve the performance of multi-modal downstream tasks. Specifically, we first introduce pre-trained text semantics as prior knowledge, then design two novel alignment modules (i.e., the Semantic Alignment Module and the Relationship Alignment Module) to transfer such prior knowledge into codes for achieving codebook-text alignment. In particular, our LG-VQ method is model-agnostic, which can be easily integrated into existing VQ models. Experimental results show that our method achieves superior performance on reconstruction and various multi-modal downstream tasks.
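
A codebook-text alignment objective can be sketched as pulling pooled code embeddings toward pretrained text embeddings; a stand-in illustration (the paper's alignment modules are learned, and this cosine loss is only an assumption):

```python
import numpy as np

def semantic_alignment_loss(codes, text):
    """Illustrative alignment objective: 1 minus the cosine similarity
    between mean-pooled code embeddings and a pretrained text
    embedding. A stand-in for the learned alignment modules."""
    c = codes.mean(axis=0)  # pool code embeddings
    cos = c @ text / (np.linalg.norm(c) * np.linalg.norm(text) + 1e-8)
    return 1.0 - cos
```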

AAAI Conference 2024 Conference Paper

MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning

  • Baoquan Zhang
  • Chuyao Luo
  • Demin Yu
  • Xutao Li
  • Huiwei Lin
  • Yunming Ye
  • Bowen Zhang

Equipping a deep model with the ability of few-shot learning (FSL) is a core challenge for artificial intelligence. Gradient-based meta-learning effectively addresses the challenge by learning how to learn novel tasks. Its key idea is learning a deep model in a bi-level optimization manner, where the outer-loop process learns a shared gradient descent algorithm (called a meta-optimizer), while the inner-loop process leverages it to optimize a task-specific base learner with few examples. Although these methods have shown superior performance on FSL, the outer-loop process requires calculating second-order derivatives along the inner-loop path, which imposes considerable memory burdens and a risk of vanishing gradients, degrading meta-learning performance. Inspired by recent diffusion models, we find that the inner-loop gradient descent process can be viewed as a reverse (i.e., denoising) process of diffusion, where the target of denoising is the weights of the base learner rather than the original data. Based on this fact, we propose to model the gradient descent algorithm as a diffusion model and then present a novel conditional diffusion-based meta-learning method, called MetaDiff, which effectively models the optimization process of base learner weights from Gaussian initialization to target weights in a denoising manner. Thanks to the training efficiency of diffusion models, our MetaDiff does not need to differentiate through the inner-loop path, so the memory burdens and the risk of vanishing gradients can be effectively alleviated, improving FSL. Experimental results show that our MetaDiff outperforms the state-of-the-art gradient-based meta-learning family on FSL tasks.
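
The "denoising weights instead of data" view can be pictured with one standard DDPM-style reverse step applied to a weight vector; a generic sketch (the schedule constants and noise predictor are placeholders, not MetaDiff's parameterization):

```python
import numpy as np

def denoise_weights_step(w_t, eps_pred, alpha_t, alpha_bar_t):
    """One illustrative DDPM-style reverse step applied to base-learner
    weights rather than images, mirroring the abstract's view of
    inner-loop optimization as denoising. Schedule constants are
    generic placeholders."""
    coef = (1 - alpha_t) / np.sqrt(1 - alpha_bar_t)
    return (w_t - coef * eps_pred) / np.sqrt(alpha_t)
```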

EAAI Journal 2023 Journal Article

TFG-Net: Tropical Cyclone Intensity Estimation from a Fine-Grained Perspective with the Graph Convolution Neural Network

  • Guangning Xu
  • Yan Li
  • Chi Ma
  • Xutao Li
  • Yunming Ye
  • Qingquan Lin
  • Zhichao Huang
  • Shidong Chen

Tropical Cyclone Intensity Estimation (TIE) is a fundamental study subject for tropical cyclone development, flood or landslide avoidance, etc. Despite considerable efforts, two main challenges remain unresolved in this critical endeavor. The first challenge is that the TIE task is frequently conducted as a coarse-grained recognition problem rather than a fine-grained one. The second challenge is that the prediction fails to consider general wind speed information. To conquer these two challenges, we offer a novel model, namely Tropical cyclone intensity estimation from a Fine-grained perspective with the Graph convolution neural Network (TFG-Net). It is composed of three key components, viz., the Backbone, the Fine-grained Tropical cyclone Features Extractor (FTFE), and the Wind Scale Transition Rule Generator (WTRG), which aim at extracting general spatial features, subtle spatial features, and general wind speed information, respectively. To validate the proposed method, extensive experiments on a well-known real-world tropical dataset named GridSat were carried out. Following the standard benchmark task setting, in which the model estimates the wind speed from a given satellite image, the proposed TFG-Net reaches 11.12 knots in the RMSE metric, outperforming the traditional method and the state-of-the-art deep learning method by 33.33% and 2.54%, respectively. The code is available on GitHub: https://github.com/xuguangning1218/TI_Estimation and its reproduced results are available on Code Ocean: https://doi.org/10.24433/CO.6606867.v1

IJCAI Conference 2022 Conference Paper

Hyperbolic Knowledge Transfer with Class Hierarchy for Few-Shot Learning

  • Baoquan Zhang
  • Hao Jiang
  • Shanshan Feng
  • Xutao Li
  • Yunming Ye
  • Rui Ye

Few-shot learning (FSL) aims to recognize a novel class with very few instances, which is a challenging task since it suffers from a data scarcity issue. One way to effectively alleviate this issue is to introduce explicit knowledge summarized from past human experience to achieve knowledge transfer for FSL. Based on this idea, in this paper, we introduce the explicit knowledge of class hierarchy (i.e., the hierarchical relations between classes) as FSL priors and propose a novel hyperbolic knowledge transfer framework for FSL, namely HyperKT. Our insight is that, in hyperbolic space, the hierarchical relation between classes can be well preserved by resorting to the exponential growth character of hyperbolic volume, so that better knowledge transfer can be achieved for FSL. Specifically, we first regard the class hierarchy as a tree-like structure. Then, 1) a hyperbolic representation learning module and a hyperbolic prototype inference module are employed to encode/infer each image and class prototype in the hyperbolic space, respectively; and 2) a novel hierarchical classification loss and relation reconstruction loss are carefully designed to learn the class hierarchy. Finally, novel class prediction is performed in a nearest-prototype manner. Extensive experiments on three datasets show our method achieves superior performance over state-of-the-art methods, especially on 1-shot tasks.
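
Nearest-prototype prediction in hyperbolic space needs a hyperbolic distance; the Poincaré ball is the standard model for such embeddings (the abstract does not name the model, so this is an assumption):

```python
import numpy as np

def poincare_distance(u, v, eps=1e-7):
    """Distance in the Poincare ball model of hyperbolic space, a
    common choice for hyperbolic embeddings."""
    uu, vv = np.dot(u, u), np.dot(v, v)
    duv = np.dot(u - v, u - v)
    x = 1.0 + 2.0 * duv / ((1.0 - uu) * (1.0 - vv) + eps)
    return np.arccosh(x)
```

Classification then assigns a query embedding to the class whose hyperbolic prototype is nearest under this distance.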

AAAI Conference 2022 Conference Paper

MetaNODE: Prototype Optimization as a Neural ODE for Few-Shot Learning

  • Baoquan Zhang
  • Xutao Li
  • Shanshan Feng
  • Yunming Ye
  • Rui Ye

Few-Shot Learning (FSL) is a challenging task, i.e., how to recognize novel classes with few examples. Pre-training based methods effectively tackle the problem by pre-training a feature extractor and then predicting novel classes via a cosine nearest-neighbor classifier with mean-based prototypes. Nevertheless, due to the data scarcity, the mean-based prototypes are usually biased. In this paper, we attempt to diminish the prototype bias by regarding it as a prototype optimization problem. To this end, we propose a novel meta-learning based prototype optimization framework to rectify prototypes, i.e., introducing a meta-optimizer to optimize prototypes. Although existing meta-optimizers can also be adapted to our framework, they all overlook a crucial gradient bias issue, i.e., the mean-based gradient estimation is also biased on sparse data. To address the issue, we regard the gradient and its flow as meta-knowledge and then propose a novel Neural Ordinary Differential Equation (ODE)-based meta-optimizer to polish prototypes, called MetaNODE. In this meta-optimizer, we first view the mean-based prototypes as initial prototypes, and then model the process of prototype optimization as continuous-time dynamics specified by a Neural ODE. A gradient flow inference network is carefully designed to learn to estimate the continuous gradient flow for prototype dynamics. Finally, the optimal prototypes can be obtained by solving the Neural ODE. Extensive experiments on miniImageNet, tieredImageNet, and CUB-200-2011 show the effectiveness of our method.
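
Solving the prototype ODE can be sketched with a plain Euler integrator; a minimal illustration (the paper learns the gradient flow with an inference network, whereas here it is any callable):

```python
import numpy as np

def refine_prototype(p0, gradient_flow, t1=1.0, steps=20):
    """Illustrative Euler solve of the prototype dynamics
    dp/dt = f(p, t), starting from the mean-based prototype p0."""
    p, dt = p0.astype(float), t1 / steps
    for k in range(steps):
        p = p + dt * gradient_flow(p, k * dt)  # one Euler step
    return p
```

With the toy flow `f(p, t) = -p`, the Euler solution decays toward `p0 * exp(-t1)`, as expected for this linear ODE.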

IJCAI Conference 2020 Conference Paper

MR-GCN: Multi-Relational Graph Convolutional Networks based on Generalized Tensor Product

  • Zhichao Huang
  • Xutao Li
  • Yunming Ye
  • Michael K. Ng

Graph Convolutional Networks (GCNs) have been extensively studied in recent years. Most existing GCN approaches are designed for homogeneous graphs with a single type of relation. However, heterogeneous graphs with multiple types of relations are also ubiquitous, and there is a lack of methodologies to tackle such graphs. Some previous studies address the issue by performing a conventional GCN on each single relation and then blending the results. However, as the convolutional kernels neglect the correlations across relations, this strategy is sub-optimal. In this paper, we propose the Multi-Relational Graph Convolutional Network (MR-GCN) framework by developing a novel convolution operator on multi-relational graphs. In particular, our multi-dimensional convolution operator extends graph spectral analysis to the eigen-decomposition of a Laplacian tensor. The eigen-decomposition is formulated with a generalized tensor product, which can correspond to any unitary transform rather than being limited to the Fourier transform. We conduct comprehensive experiments on four real-world multi-relational graphs to solve the semi-supervised node classification task, and the results show the superiority of MR-GCN against state-of-the-art competitors.
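
A generalized tensor product under a unitary transform can be sketched as: transform along mode 3, multiply frontal slices, transform back (with the DFT matrix this recovers the familiar t-product; the function name and shapes are illustrative):

```python
import numpy as np

def transform_product(A, B, U):
    """Generalized tensor product of n1 x n2 x n3 and n2 x n4 x n3
    tensors under a unitary transform U along mode 3."""
    Ah = np.einsum('lk,ijk->ijl', U, A)                # mode-3 transform of A
    Bh = np.einsum('lk,ijk->ijl', U, B)                # mode-3 transform of B
    Ch = np.einsum('ijl,jml->iml', Ah, Bh)             # slice-wise matmul
    return np.einsum('kl,ijl->ijk', np.conj(U).T, Ch)  # inverse transform
```

With `U` the identity, the product reduces to independent slice-wise matrix multiplication, which makes the construction easy to check.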

IJCAI Conference 2017 Conference Paper

DeepFM: A Factorization-Machine based Neural Network for CTR Prediction

  • Huifeng Guo
  • Ruiming Tang
  • Yunming Ye
  • Zhenguo Li
  • Xiuqiang He

Learning sophisticated feature interactions behind user behaviors is critical in maximizing CTR for recommender systems. Despite great progress, existing methods seem to have a strong bias towards low- or high-order interactions, or require expert feature engineering. In this paper, we show that it is possible to derive an end-to-end learning model that emphasizes both low- and high-order feature interactions. The proposed model, DeepFM, combines the power of factorization machines for recommendation and deep learning for feature learning in a new neural network architecture. Compared to the latest Wide & Deep model from Google, DeepFM has a shared input to its "wide" and "deep" parts, with no need of feature engineering besides raw features. Comprehensive experiments are conducted to demonstrate the effectiveness and efficiency of DeepFM over the existing models for CTR prediction, on both benchmark data and commercial data.
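
The FM component of DeepFM captures second-order interactions with the standard O(nk) identity rather than an O(n²) pairwise sum; a minimal sketch of that term:

```python
import numpy as np

def fm_second_order(V, x):
    """Second-order factorization-machine term, via the standard
    identity: sum_{i<j} <v_i, v_j> x_i x_j
      = 0.5 * (||sum_i x_i v_i||^2 - sum_i ||x_i v_i||^2),
    where V is an n x k matrix of factor vectors."""
    xv = x[:, None] * V                   # each row is x_i * v_i
    sum_sq = np.square(xv.sum(axis=0))    # square of the sum, per factor
    sq_sum = np.square(xv).sum(axis=0)    # sum of squares, per factor
    return 0.5 * (sum_sq - sq_sum).sum()
```

In DeepFM this term shares the same embedding input `V` with the deep component, which is what removes the need for manual cross features.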

AAAI Conference 2017 Conference Paper

Low-Rank Tensor Completion with Total Variation for Visual Data Inpainting

  • Xutao Li
  • Yunming Ye
  • Xiaofei Xu

With the advance of acquisition techniques, plentiful higher-order tensor data sets are built up in a great variety of fields such as computer vision, neuroscience, remote sensing, and recommender systems. Real-world tensors often contain missing values, which makes tensor completion a prerequisite to utilizing them. Previous studies have shown that imposing a low-rank constraint on tensor completion produces impressive performance. In this paper, we argue that the low-rank constraint, albeit useful, is not effective enough to exploit the local smoothness and piecewise priors of visual data. We propose integrating total variation into low-rank tensor completion (LRTC) to address this drawback. As LRTC can be formulated by both tensor unfolding and tensor decomposition, we develop two corresponding methods, namely LRTC-TV-I and LRTC-TV-II, and their iterative solvers. Extensive experimental results on color image and medical image inpainting tasks show the effectiveness and superiority of the two methods against state-of-the-art competitors. Our codes are available at https://sites.google.com/site/xutaoli2014
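
The total variation regularizer that enforces local smoothness can be sketched as the sum of absolute neighbor differences; one common anisotropic definition (the paper's exact formulation may differ):

```python
import numpy as np

def total_variation(X):
    """Anisotropic total variation of a 2-D array: sum of absolute
    differences between vertically and horizontally adjacent pixels."""
    return (np.abs(np.diff(X, axis=0)).sum()
            + np.abs(np.diff(X, axis=1)).sum())
```

Adding `lambda * total_variation(X)` to a low-rank completion objective penalizes non-smooth reconstructions while leaving piecewise-constant regions cheap.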

IS Journal 2015 Journal Article

Dynamic Business Network Analysis for Correlated Stock Price Movement Prediction

  • Wenping Zhang
  • Chunping Li
  • Yunming Ye
  • Wenjie Li
  • Eric W.T. Ngai

Although much research is devoted to the analysis and prediction of individuals' behavior in social networks, very few studies analyze firms' performance with respect to business networks. Empowered by recent research on the automated mining of business networks, this article illustrates the design of a novel business network-based model called the energy cascading model (ECM) for predicting directional stock price movements of related firms. More specifically, the proposed network-based predictive analytics model considers both influential business relationships and Twitter sentiments to infer a firm's medium- to long-term directional stock price movements. The reported empirical experiments, based on a publicly available financial corpus and social media postings, show that the proposed ECM is effective for predicting directional stock price movements. It outperforms the best baseline model, the Pearson correlation-based prediction model, in upward stock price movement prediction by 11.7 percent in terms of F-measure.

IS Journal 2014 Journal Article

Cotransfer Learning Using Coupled Markov Chains with Restart

  • Qingyao Wu
  • Michael K. Ng
  • Yunming Ye

This article studies cotransfer learning, a machine learning strategy that uses labeled data to enhance the classification of different learning spaces simultaneously. The authors model the problem as a coupled Markov chain with restart. The transition probabilities in the coupled Markov chain can be constructed using the intrarelationships based on the affinity metric among instances in the same space, and the interrelationships based on co-occurrence information among instances from different spaces. The learning algorithm computes ranking of labels to indicate the importance of a set of labels to an instance by propagating the ranking score of labeled instances via the coupled Markov chain with restart. Experimental results on benchmark data (multiclass image-text and English-Spanish-French classification datasets) have shown that the learning algorithm is computationally efficient, and effective in learning across different spaces.
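
The ranking propagation described above has the same fixed point as a random walk with restart; a single-space sketch (the coupled chain in the article stitches such transition matrices across spaces, which is omitted here):

```python
import numpy as np

def rank_with_restart(P, r0, alpha=0.15, iters=100):
    """Illustrative label-ranking propagation as a Markov chain with
    restart: repeatedly walk via the column-stochastic transition
    matrix P and restart to the initial ranking r0 with
    probability alpha (the personalized-PageRank fixed point)."""
    r = r0.copy()
    for _ in range(iters):
        r = (1 - alpha) * P @ r + alpha * r0
    return r
```

Because `P` is column-stochastic, the iterate stays a probability vector, so the final scores can be read directly as label importances.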