EAAI Journal 2025 Journal Article
Causality-inspired surface defect detection by transferring knowledge from natural images
- Fangfang An
- Shaolei Cao
- Shuai Ma
- Dawu Shu
- Bo Han
- Wanxin Li
- Ruigang Liu
Author name cluster
Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
EAAI Journal 2025 Journal Article
IJCAI Conference 2025 Conference Paper
Point cloud completion is a crucial task in 3D computer vision. Multi-modal completion approaches have gained attention among the popular two-stage point cloud completion methods. However, there is a notable lack of research focused on accurately aligning data from different modalities within these methods. Additionally, in other point cloud-based tasks, edge point information often provides unexpected positive contributions. In this paper, we propose a novel point cloud completion method that leverages edge point information for the first time in the completion task, which also addresses the precise alignment of multi-modal data. In particular, we implement a two-step local-to-global module to achieve better alignment of multi-modal data during the preliminary point cloud generation process. Besides, we introduce a new spatial representation structure capable of extracting a fixed number of edge points. Moreover, with the assistance of edge information, we further design an inverse edge-aware upsampler to refine the point cloud. We evaluate our method on three typical datasets, and the results demonstrate that our IE-PMMA outperforms the existing state-of-the-art methods quantitatively and visually.
AAAI Conference 2024 Conference Paper
Panorama synthesis endeavors to craft captivating 360-degree visual landscapes, immersing users in the heart of virtual worlds. Nevertheless, contemporary panoramic synthesis techniques grapple with the challenge of semantically guiding the content generation process. Although recent breakthroughs in visual synthesis have unlocked the potential for semantic control in 2D flat images, a direct application of these methods to panorama synthesis yields distorted content. In this study, we unveil an innovative framework for generating high-resolution panoramas, adeptly addressing the issues of spherical distortion and edge discontinuity through sophisticated spherical modeling. Our pioneering approach empowers users with semantic control, harnessing both image and text inputs, while concurrently streamlining the generation of high-resolution panoramas using parallel decoding. We rigorously evaluate our methodology on a diverse array of indoor and outdoor datasets, establishing its superiority over recent related work, in terms of both quantitative and qualitative performance metrics. Our research elevates the controllability, efficiency, and fidelity of panorama synthesis to new levels.
EAAI Journal 2024 Journal Article
NeurIPS Conference 2024 Conference Paper
In recent years, the integration of vision and language understanding has led to significant advancements in artificial intelligence, particularly through Vision-Language Models (VLMs). However, existing VLMs face challenges in handling real-world applications with complex scenes and multiple objects, as well as aligning their focus with the diverse attention patterns of human users. In this paper, we introduce gaze information, feasibly collected by ubiquitous wearable devices such as MR glasses, as a proxy for human attention to guide VLMs. We propose a novel approach, Voila-A, for gaze alignment to enhance the effectiveness of these models in real-world applications. First, we collect hundreds of minutes of gaze data to demonstrate that we can mimic human gaze modalities using localized narratives. We then design an automatic data annotation pipeline utilizing GPT-4 to generate the VOILA-COCO dataset. Additionally, we introduce a new model VOILA-A that integrate gaze information into VLMs while maintain pretrained knowledge from webscale dataset. We evaluate Voila-A using a hold-out validation set and a newly collected VOILA-GAZE testset, which features real-life scenarios captured with a gaze-tracking device. Our experimental results demonstrate that Voila-A significantly outperforms several baseline models. By aligning model attention with human gaze patterns, Voila-A paves the way for more intuitive, user-centric VLMs and fosters engaging human-AI interaction across a wide range of applications.
IJCAI Conference 2023 Conference Paper
Keeping risk under control is often more crucial than maximizing expected rewards in real-world decision-making situations, such as finance, robotics, autonomous driving, etc. The most natural choice of risk measures is variance, while it penalizes the upside volatility as much as the downside part. Instead, the (downside) semivariance, which captures negative deviation of a random variable under its mean, is more suitable for risk-averse proposes. This paper aims at optimizing the mean-semivariance (MSV) criterion in reinforcement learning w. r. t. steady reward distribution. Since semivariance is time-inconsistent and does not satisfy the standard Bellman equation, the traditional dynamic programming methods are inapplicable to MSV problems directly. To tackle this challenge, we resort to Perturbation Analysis (PA) theory and establish the performance difference formula for MSV. We reveal that the MSV problem can be solved by iteratively solving a sequence of RL problems with a policy-dependent reward function. Further, we propose two on-policy algorithms based on the policy gradient theory and the trust region method. Finally, we conduct diverse experiments from simple bandit problems to continuous control tasks in MuJoCo, which demonstrate the effectiveness of our proposed methods.
TIST Journal 2023 Journal Article
Analyzing the urban trajectory in cities has become an important topic in data mining. How can we model the human mobility consisting of stay and travel states from the raw trajectory data? How can we infer these mobility states from a single user’s trajectory information? How can we further generalize the mobility inference to the real-world trajectory data that span multiple users and are sparsely sampled over time? In this article, based on formal and rigid definitions of the stay/travel mobility, we propose a single trajectory inference algorithm that utilizes a generic long-tailed sparsity pattern in the large-scale trajectory data. The algorithm guarantees a 100% precision in the stay/travel inference with a provable lower bound in the recall metric. Furthermore, we design a transformer-like deep learning architecture on the problem of mobility inference from multiple sparse trajectories. Several adaptations from the standard transformer network structure are introduced, including the singleton design to avoid the negative effect of sparse labels in the decoder side, the customized space-time embedding on features of location records, and the mask apparatus at the output side for loss function correction. Evaluations on three trajectory datasets of 40 million urban users validate the performance guarantees of the proposed inference algorithm and demonstrate the superiority of our deep learning model, in comparison to sequence learning methods in the literature. On extremely sparse trajectories, the deep learning model improves from the single trajectory inference algorithm with more than two times of overall and F1 accuracy. The model also generalizes to large-scale trajectory data from different sources with good scalability.
EAAI Journal 2023 Journal Article
JAIR Journal 2022 Journal Article
Keeping risk under control is often more crucial than maximizing expected reward in real-world decision-making situations, such as finance, robotics, autonomous driving, etc. The most natural choice of risk measures is variance, while it penalizes the upside volatility as much as the downside part. Instead, the (downside) semivariance, which captures the negative deviation of a random variable under its mean, is more suitable for risk-averse proposes. This paper aims at optimizing the mean-semivariance (MSV) criterion in reinforcement learning w.r.t. steady rewards. Since semivariance is time-inconsistent and does not satisfy the standard Bellman equation, the traditional dynamic programming methods are inapplicable to MSV problems directly. To tackle this challenge, we resort to the Perturbation Analysis (PA) theory and establish the performance difference formula for MSV. We reveal that the MSV problem can be solved by iteratively solving a sequence of RL problems with a policy-dependent reward function. Further, we propose two on-policy algorithms based on the policy gradient theory and the trust region method. Finally, we conduct diverse experiments from simple bandit problems to continuous control tasks in MuJoCo, which demonstrate the effectiveness of our proposed methods.
IJCAI Conference 2020 Conference Paper
While Point-of-Interest (POI) recommendation has been a popular topic of study for some time, little progress has been made for understanding why and how people make their decisions for the selection of POIs. To this end, in this paper, we propose a user decision profiling framework, named PROUD, which can identify the key factors in people's decisions on choosing POIs. Specifically, we treat each user decision as a set of factors and provide a method for learning factor embeddings. A unique perspective of our approach is to identify key factors, while preserving decision structures seamlessly, via a novel scalar projection maximization objective. Exactly solving the objective is non-trivial due to a sparsity constraint. To address this, our PROUD adopts a self projection attention and an L2 regularized sparse activation to directly estimate the likelihood of each factor to be a key factor. Finally, extensive experiments on real-world data validate the advantage of PROUD in preserving user decision structures. Also, our case study indicates that the identified key decision factors can help us to provide more interpretable recommendations and analyses.
IJCAI Conference 2019 Conference Paper
Semantically matching two text sequences (usually two sentences) is a fundamental problem in NLP. Most previous methods either encode each of the two sentences into a vector representation (sentence-level embedding) or leverage word-level interaction features between the two sentences. In this study, we propose to take the sentence-level embedding features and the word-level interaction features as two distinct views of a sentence pair, and unify them with a framework of Variational Autoencoders such that the sentence pair is matched in a semi-supervised manner. The proposed model is referred to as Dual-View Variational AutoEncoder (DV-VAE), where the optimization of the variational lower bound can be interpreted as an implicit Co-Training mechanism for two matching models over distinct views. Experiments on SNLI, Quora and a Community Question Answering dataset demonstrate the superiority of our DV-VAE over several strong semi-supervised and supervised text matching models.
AAAI Conference 2019 Conference Paper
In the framework of MDP, although the general reward function takes three arguments—current state, action, and successor state; it is often simplified to a function of two arguments—current state and action. The former is called a transition-based reward function, whereas the latter is called a state-based reward function. When the objective involves the expected total reward only, this simplification works perfectly. However, when the objective is risk-sensitive, this simplification leads to an incorrect value. We propose three successively more general state-augmentation transformations (SATs), which preserve the reward sequences as well as the reward distributions and the optimal policy in risk-sensitive reinforcement learning. In risk-sensitive scenarios, firstly we prove that, for every MDP with a stochastic transition-based reward function, there exists an MDP with a deterministic state-based reward function, such that for any given (randomized) policy for the first MDP, there exists a corresponding policy for the second MDP, such that both Markov reward processes share the same reward sequence. Secondly we illustrate that two situations require the proposed SATs in an inventory control problem. One could be using Q-learning (or other learning methods) on MDPs with transition-based reward functions, and the other could be using methods, which are for the Markov processes with a deterministic state-based reward functions, on the Markov processes with general reward functions. We show the advantage of the SATs by considering Value-at-Risk as an example, which is a risk measure on the reward distribution instead of the measures (such as mean and variance) of the distribution. We illustrate the error in the reward distribution estimation from the reward simplification, and show how the SATs enable a variance formula to work on Markov processes with general reward functions.
AAAI Conference 2019 Conference Paper
Without real bilingual corpus available, unsupervised Neural Machine Translation (NMT) typically requires pseudo parallel data generated with the back-translation method for the model training. However, due to weak supervision, the pseudo data inevitably contain noises and errors that will be accumulated and reinforced in the subsequent training process, leading to bad translation performance. To address this issue, we introduce phrase based Statistic Machine Translation (SMT) models which are robust to noisy data, as posterior regularizations to guide the training of unsupervised NMT models in the iterative back-translation process. Our method starts from SMT models built with pre-trained language models and word-level translation tables inferred from cross-lingual embeddings. Then SMT and NMT models are optimized jointly and boost each other incrementally in a unified EM framework. In this way, (1) the negative effect caused by errors in the iterative back-translation process can be alleviated timely by SMT filtering noises from its phrase tables; meanwhile, (2) NMT can compensate for the deficiency of fluency inherent in SMT. Experiments conducted on en-fr and en-de translation tasks show that our method outperforms the strong baseline and achieves new state-of-the-art unsupervised machine translation performance.
TCS Journal 2014 Journal Article