Arrow Research search

Author name cluster

Siwei Lyu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

42 papers
2 author rows

Possible papers


AAAI Conference 2026 Conference Paper

DICE: Distilling Classifier-Free Guidance into Text Embeddings

  • Zhenyu Zhou
  • Defang Chen
  • Can Wang
  • Chun Chen
  • Siwei Lyu

Text-to-image diffusion models are capable of generating high-quality images, but suboptimal pre-trained text representations often result in these images failing to align closely with the given text prompts. Classifier-free guidance (CFG) is a popular and effective technique for improving text-image alignment in the generative process. However, CFG introduces significant computational overhead. In this paper, we present DIstilling CFG by sharpening text Embeddings (DICE) that replaces CFG in the sampling process with half the computational complexity while maintaining similar generation quality. DICE distills a CFG-based text-to-image diffusion model into a CFG-free version by refining text embeddings to replicate CFG-based directions. In this way, we avoid the computational drawbacks of CFG, enabling high-quality, well-aligned image generation at a fast sampling speed. Furthermore, examining the enhancement pattern, we identify the underlying mechanism of DICE that sharpens specific components of text embeddings to preserve semantic information while enhancing fine-grained details. Extensive experiments on multiple Stable Diffusion v1.5 variants, SDXL, and PixArt-$\alpha$ demonstrate the effectiveness of our method.
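The overhead mentioned in the abstract above comes from the usual CFG formulation, which evaluates the denoising network twice per sampling step (once with the text embedding, once with an empty one) and extrapolates between the two predictions. The following is a minimal sketch of that standard rule only, not of DICE itself; `toy_denoiser`, `CountingDenoiser`, and all parameter names are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cfg_noise_estimate(
    denoiser: Callable[[Vector, Vector], List[float]],
    x: Vector,
    text_emb: Vector,
    null_emb: Vector,
    guidance_scale: float,
) -> List[float]:
    """Standard classifier-free guidance: two network calls per step."""
    eps_cond = denoiser(x, text_emb)    # prediction conditioned on the prompt
    eps_uncond = denoiser(x, null_emb)  # prediction with an empty prompt
    # Extrapolate from the unconditional prediction toward the conditional one.
    return [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


def toy_denoiser(x: Vector, emb: Vector) -> List[float]:
    """Hypothetical stand-in for a real denoising network."""
    return [xi - ei for xi, ei in zip(x, emb)]


class CountingDenoiser:
    """Wraps a denoiser and counts how many times it is evaluated."""

    def __init__(self, fn: Callable[[Vector, Vector], List[float]]) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, x: Vector, emb: Vector) -> List[float]:
        self.calls += 1
        return self.fn(x, emb)


counted = CountingDenoiser(toy_denoiser)
guided = cfg_noise_estimate(counted, [1.0, 2.0], [0.5, 0.5], [0.0, 0.0], 2.0)
```

A CFG-free sampler would call the network once per step with a single embedding, which is where the halved cost described in the abstract comes from.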

TMLR Journal 2025 Journal Article

Conditional Image Synthesis with Diffusion Models: A Survey

  • Zheyuan Zhan
  • Defang Chen
  • Jian-Ping Mei
  • Zhenghe Zhao
  • Jiawei Chen
  • Chun Chen
  • Siwei Lyu
  • Can Wang

Conditional image synthesis based on user-specified requirements is a key component in creating complex visual content. In recent years, diffusion-based generative modeling has become a highly effective way for conditional image synthesis, leading to exponential growth in the literature. However, the complexity of diffusion-based modeling, the wide range of image synthesis tasks, and the diversity of conditioning mechanisms present significant challenges for researchers to keep up with rapid developments and to understand the core concepts on this topic. In this survey, we categorize existing works based on how conditions are integrated into the two fundamental components of diffusion-based modeling, i.e., the denoising network and the sampling process. We specifically highlight the underlying principles, advantages, and potential challenges of various conditioning approaches during the training, re-purposing, and specialization stages to construct a desired denoising network. We also summarize six mainstream conditioning mechanisms in the sampling process. All discussions are centered around popular applications. Finally, we pinpoint several critical yet still unsolved problems and suggest some possible solutions for future research.

NeurIPS Conference 2025 Conference Paper

X2-DFD: A framework for explainable and extendable Deepfake Detection

  • Yize Chen
  • Zhiyuan Yan
  • Guangliang Cheng
  • Kangran Zhao
  • Siwei Lyu
  • Baoyuan Wu

This paper proposes **$\mathcal{X}^2$-DFD**, an **e$\mathcal{X}$plainable** and **e$\mathcal{X}$tendable** framework based on multimodal large-language models (MLLMs) for deepfake detection, consisting of three key stages. The first stage, *Model Feature Assessment*, systematically evaluates the detectability of forgery-related features for the MLLM, generating a prioritized ranking of features based on their intrinsic importance to the model. The second stage, *Explainable Dataset Construction*, consists of two key modules: *Strong Feature Strengthening*, which is designed to enhance the model’s existing detection and explanation capabilities by reinforcing its well-learned features, and *Weak Feature Supplementing*, which addresses gaps by integrating specific feature detectors (e.g., low-level artifact analyzers) to compensate for the MLLM’s limitations. The third stage, *Fine-tuning and Inference*, involves fine-tuning the MLLM on the constructed dataset and deploying it for final detection and explanation. By integrating these three stages, our approach enhances the MLLM's strengths while supplementing its weaknesses, ultimately improving both the detectability and explainability. Extensive experiments and ablations, followed by a comprehensive human study, validate the improved performance of our approach compared to the original MLLMs. More encouragingly, our framework is designed to be plug-and-play, allowing it to seamlessly integrate with more advanced future MLLMs and specific feature detectors, leading to continual improvement and extension to face the challenges of rapidly evolving deepfakes. Code can be found at https://github.com/chenyize111/X2DFD.

ICLR Conference 2024 Conference Paper

Exposing Text-Image Inconsistency Using Diffusion Models

  • Mingzhen Huang
  • Shan Jia
  • Zhou Zhou 0009
  • Yan Ju
  • Jialing Cai
  • Siwei Lyu

In the battle against widespread online misinformation, a growing problem is text-image inconsistency, where images are misleadingly paired with texts with different intent or meaning. Existing classification-based methods for text-image inconsistency can identify contextual inconsistencies but fail to provide explainable justifications for their decisions that humans can understand. Although more nuanced, human evaluation is impractical at scale and susceptible to errors. To address these limitations, this study introduces D-TIIL (Diffusion-based Text-Image Inconsistency Localization), which employs text-to-image diffusion models to localize semantic inconsistencies in text and image pairs. These models, trained on large-scale datasets, act as "omniscient" agents that filter out irrelevant information and incorporate background knowledge to identify inconsistencies. In addition, D-TIIL uses text embeddings and modified image regions to visualize these inconsistencies. To evaluate D-TIIL's efficacy, we introduce a new TIIL dataset containing 14K consistent and inconsistent text-image pairs. Unlike existing datasets, TIIL enables assessment at the level of individual words and image regions and is carefully designed to represent various inconsistencies. D-TIIL offers a scalable and evidence-based approach to identifying and localizing text-image inconsistency, providing a robust framework for future research combating misinformation.

NeurIPS Conference 2024 Conference Paper

First-Order Minimax Bilevel Optimization

  • Yifan Yang
  • Zhaofeng Si
  • Siwei Lyu
  • Kaiyi Ji

Multi-block minimax bilevel optimization has been studied recently due to its great potential in multi-task learning, robust machine learning, and few-shot learning. However, due to the complex three-level optimization structure, existing algorithms often suffer from issues such as high computing costs due to the second-order model derivatives or high memory consumption in storing all blocks' parameters. In this paper, we tackle these challenges by proposing two novel fully first-order algorithms named FOSL and MemCS. FOSL features a fully single-loop structure by updating all three variables simultaneously, and MemCS is a memory-efficient double-loop algorithm with cold-start initialization. We provide a comprehensive convergence analysis for both algorithms under full and partial block participation, and show that their sample complexities match or outperform those of the same type of methods in standard bilevel optimization. We evaluate our methods in two applications: the recently proposed multi-task deep AUC maximization and a novel rank-based robust meta-learning. Our methods consistently improve over existing methods with better performance over various datasets.

ICML Conference 2024 Conference Paper

On the Trajectory Regularity of ODE-based Diffusion Sampling

  • Defang Chen 0001
  • Zhenyu Zhou
  • Can Wang 0001
  • Chunhua Shen
  • Siwei Lyu

Diffusion-based generative models use stochastic differential equations (SDEs) and their equivalent ordinary differential equations (ODEs) to establish a smooth connection between a complex data distribution and a tractable prior distribution. In this paper, we identify several intriguing trajectory properties in the ODE-based sampling process of diffusion models. We characterize an implicit denoising trajectory and discuss its vital role in forming the coupled sampling trajectory with a strong shape regularity, regardless of the generated content. We also describe a dynamic programming-based scheme to make the time schedule in sampling better fit the underlying trajectory structure. This simple strategy requires minimal modification to any given ODE-based numerical solvers and incurs negligible computational cost, while delivering superior performance in image generation, especially in $5\sim 10$ function evaluations.

NeurIPS Conference 2024 Conference Paper

ParallelEdits: Efficient Multi-Aspect Text-Driven Image Editing with Attention Grouping

  • Mingzhen Huang
  • Jialing Cai
  • Shan Jia
  • Vishnu S. Lokhande
  • Siwei Lyu

Text-driven image synthesis has made significant advancements with the development of diffusion models, transforming how visual content is generated from text prompts. Despite these advances, text-driven image editing, a key area in computer graphics, faces unique challenges. A major challenge is making simultaneous edits across multiple objects or attributes. Applying these methods sequentially for multi-attribute edits increases computational demands and efficiency losses. In this paper, we address these challenges with significant contributions. Our main contribution is the development of ParallelEdits, a method that seamlessly manages simultaneous edits across multiple attributes. In contrast to previous approaches, ParallelEdits not only preserves the quality of single attribute edits but also significantly improves the performance of multitasking edits. This is achieved through innovative attention distribution mechanism and multi-branch design that operates across several processing heads. Additionally, we introduce the PIE-Bench++ dataset, an expansion of the original PIE-Bench dataset, to better support evaluating image-editing tasks involving multiple objects and attributes simultaneously. This dataset is a benchmark for evaluating text-driven image editing methods in multifaceted scenarios.

NeurIPS Conference 2024 Conference Paper

Simple and Fast Distillation of Diffusion Models

  • Zhenyu Zhou
  • Defang Chen
  • Can Wang
  • Chun Chen
  • Siwei Lyu

Diffusion-based generative models have demonstrated their powerful performance across various tasks, but this comes at the cost of slow sampling speed. To achieve both efficient and high-quality synthesis, various distillation-based accelerated sampling methods have been developed recently. However, they generally require time-consuming fine-tuning with elaborate designs to achieve satisfactory performance at a specific number of function evaluations (NFE), making them difficult to employ in practice. To address this issue, we propose **S**imple and **F**ast **D**istillation (SFD) of diffusion models, which simplifies the paradigm used in existing methods and shortens their fine-tuning time by up to $1000\times$. We begin with a vanilla distillation-based sampling method and boost its performance to state of the art by identifying and addressing several small yet vital factors affecting the synthesis efficiency and quality. Our method can also achieve sampling with variable NFEs using a single distilled model. Extensive experiments demonstrate that SFD strikes a good balance between sample quality and fine-tuning costs in few-step image generation tasks. For example, SFD achieves 4.53 FID (NFE=2) on CIFAR-10 with only **0.64 hours** of fine-tuning on a single NVIDIA A100 GPU.

IJCAI Conference 2023 Conference Paper

Controlling Neural Style Transfer with Deep Reinforcement Learning

  • Chengming Feng
  • Jing Hu
  • Xin Wang
  • Shu Hu
  • Bin Zhu
  • Xi Wu
  • Hongtu Zhu
  • Siwei Lyu

Controlling the degree of stylization in Neural Style Transfer (NST) is tricky since it usually requires hand-engineering of hyper-parameters. In this paper, we propose the first deep Reinforcement Learning (RL) based architecture that splits one-step style transfer into a step-wise process for the NST task. Our RL-based method tends to preserve more details and structures of the content image in early steps, and synthesize more style patterns in later steps. It is a style-transfer method that users can easily control. Additionally, as our RL-based model performs the stylization progressively, it is lightweight and has lower computational complexity than existing one-step Deep Learning (DL) based models. Experimental results demonstrate the effectiveness and robustness of our method.

NeurIPS Conference 2023 Conference Paper

DeepfakeBench: A Comprehensive Benchmark of Deepfake Detection

  • Zhiyuan Yan
  • Yong Zhang
  • Xinhang Yuan
  • Siwei Lyu
  • Baoyuan Wu

A critical yet frequently overlooked challenge in the field of deepfake detection is the lack of a standardized, unified, comprehensive benchmark. This issue leads to unfair performance comparisons and potentially misleading results. Specifically, there is a lack of uniformity in data processing pipelines, resulting in inconsistent data inputs for detection models. Additionally, there are noticeable differences in experimental settings, and evaluation strategies and metrics lack standardization. To fill this gap, we present the first comprehensive benchmark for deepfake detection, called *DeepfakeBench*, which offers three key contributions: 1) a unified data management system to ensure consistent input across all detectors, 2) an integrated framework for state-of-the-art methods implementation, and 3) standardized evaluation metrics and protocols to promote transparency and reproducibility. Featuring an extensible, modular-based codebase, *DeepfakeBench* contains 15 state-of-the-art detection methods, 9 deepfake datasets, a series of deepfake detection evaluation protocols and analysis tools, as well as comprehensive evaluations. Moreover, we provide new insights based on extensive analysis of these evaluations from various perspectives (e.g., data augmentations, backbones). We hope that our efforts could facilitate future research and foster innovation in this increasingly critical domain. All codes, evaluations, and analyses of our benchmark are publicly available at https://github.com/SCLBD/DeepfakeBench.

ECAI Conference 2023 Conference Paper

GAN-Generated Faces Detection: A Survey and New Perspectives

  • Xin Wang 0045
  • Hui Guo
  • Shu Hu 0001
  • Ming-Ching Chang
  • Siwei Lyu

Generative Adversarial Networks (GAN) have led to the generation of very realistic face images, which have been used in fake social media accounts and other disinformation matters that can generate profound impacts. Therefore, the corresponding GAN-face detection techniques are under active development that can examine and expose such fake faces. In this work, we aim to provide a comprehensive review of recent progress in GAN-face detection. We focus on methods that can detect face images that are generated or synthesized from GAN models. We classify the existing detection works into four categories: (1) deep learning-based, (2) physical-based, (3) physiological-based methods, and (4) evaluation and comparison against human visual performance. For each category, we summarize the key ideas and connect them with method implementations. We also discuss open problems and suggest future research directions.

IROS Conference 2023 Conference Paper

RMBench: Benchmarking Deep Reinforcement Learning for Robotic Manipulator Control

  • Yanfei Xiang
  • Xin Wang 0045
  • Shu Hu 0001
  • Bin Zhu 0008
  • Xiaomeng Huang
  • Xi Wu 0004
  • Siwei Lyu

Reinforcement learning is used to tackle complex tasks with high-dimensional sensory inputs. Over the past decade, a wide range of reinforcement learning algorithms have been developed, with recent progress benefiting from deep learning for raw sensory signal representation. This raises a natural question: how well do these algorithms perform across different robotic manipulation tasks? Benchmarks use objective performance metrics to offer a scientific way to compare algorithms. In this paper, we introduce RMBench, the first benchmark for robotic manipulations with high-dimensional continuous action and state spaces. We implement and evaluate reinforcement learning algorithms that take observed pixels as inputs and report their average performance and learning curves to demonstrate their performance and training stability. Our study concludes that none of the evaluated algorithms can handle all tasks well, with soft Actor-Critic outperforming most algorithms in terms of average reward and stability, and an algorithm combined with data augmentation potentially facilitating learning policies. Our code is publicly available at https://github.com/xiangyanfei212/RMBench-2022.git, including all benchmark tasks and studied algorithms.

UAI Conference 2022 Conference Paper

Differentially private SGDA for minimax problems

  • Zhenhuan Yang
  • Shu Hu 0001
  • Yunwen Lei
  • Kush R. Varshney
  • Siwei Lyu
  • Yiming Ying

Stochastic gradient descent ascent (SGDA) and its variants have been the workhorse for solving minimax problems. However, in contrast to the well-studied stochastic gradient descent (SGD) with differential privacy (DP) constraints, there is little work on understanding the generalization (utility) of SGDA with DP constraints. In this paper, we use the algorithmic stability approach to establish the generalization (utility) of DP-SGDA in different settings. In particular, for the convex-concave setting, we prove that the DP-SGDA can achieve an optimal utility rate in terms of the weak primal-dual population risk in both smooth and non-smooth cases. To our best knowledge, this is the first-ever-known result for DP-SGDA in the non-smooth case. We further provide its utility analysis in the nonconvex-strongly-concave setting which is the first-ever-known result in terms of the primal population risk. The convergence and generalization results for this nonconvex setting are new even in the non-private setting. Finally, numerical experiments are conducted to demonstrate the effectiveness of DP-SGDA for both convex and nonconvex cases.
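As a rough illustration of the descent-ascent update this abstract refers to, the sketch below runs gradient descent on `x` and gradient ascent on `y` for a toy convex-concave function, f(x, y) = x²/2 + xy − y²/2, whose saddle point is (0, 0). The optional Gaussian perturbation of the gradients is an assumption about how a private variant might look; the abstract does not specify the mechanism, and the noise here is not calibrated to any privacy budget. The function name and all parameters are hypothetical.

```python
import random
from typing import Tuple


def noisy_gda(
    x: float,
    y: float,
    step_size: float = 0.1,
    steps: int = 200,
    noise_std: float = 0.0,
    seed: int = 0,
) -> Tuple[float, float]:
    """Gradient descent ascent on f(x, y) = x^2/2 + x*y - y^2/2."""
    rng = random.Random(seed)
    for _ in range(steps):
        grad_x = x + y  # df/dx
        grad_y = x - y  # df/dy
        if noise_std > 0.0:
            # Illustrative Gaussian perturbation; not a calibrated DP mechanism.
            grad_x += rng.gauss(0.0, noise_std)
            grad_y += rng.gauss(0.0, noise_std)
        # Simultaneous update: descend in x, ascend in y.
        x, y = x - step_size * grad_x, y + step_size * grad_y
    return x, y


x_final, y_final = noisy_gda(1.0, 1.0)
```

With `noise_std=0.0` the iterates contract toward the saddle point; raising `noise_std` trades accuracy for noisier updates.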

AAAI Conference 2022 Conference Paper

Stochastic Planner-Actor-Critic for Unsupervised Deformable Image Registration

  • Ziwei Luo
  • Jing Hu
  • Xin Wang
  • Shu Hu
  • Bin Kong
  • Youbing Yin
  • Qi Song
  • Xi Wu

Large deformations of organs, caused by diverse shapes and nonlinear shape changes, pose a significant challenge for medical image registration. Traditional registration methods need to iteratively optimize an objective function via a specific deformation model along with meticulous parameter tuning, but have limited capabilities in registering images with large deformations. While deep learning-based methods can learn the complex mapping from input images to their respective deformation field, they are regression-based and prone to getting stuck at local minima, particularly when large deformations are involved. To this end, we present Stochastic Planner-Actor-Critic (SPAC), a novel reinforcement learning-based framework that performs step-wise registration. The key notion is warping a moving image successively by each time step to finally align to a fixed image. Considering that it is challenging to handle high dimensional continuous action and state spaces in the conventional reinforcement learning (RL) framework, we introduce a new concept ‘Plan’ to the standard Actor-Critic model, which is of low dimension and can facilitate the actor to generate a tractable high dimensional action. The entire framework is based on unsupervised training and operates in an end-to-end manner. We evaluate our method on several 2D and 3D medical image datasets, some of which contain large deformations. Our empirical results highlight that our work achieves consistent, significant gains and outperforms state-of-the-art methods.

JMLR Journal 2022 Journal Article

Sum of Ranked Range Loss for Supervised Learning

  • Shu Hu
  • Yiming Ying
  • Xin Wang
  • Siwei Lyu

In forming learning objectives, one oftentimes needs to aggregate a set of individual values to a single output. Such cases occur in the aggregate loss, which combines individual losses of a learning model over each training sample, and in the individual loss for multi-label learning, which combines prediction scores over all class labels. In this work, we introduce the sum of ranked range (SoRR) as a general approach to form learning objectives. A ranked range is a consecutive sequence of sorted values of a set of real numbers. The minimization of SoRR is solved with the difference of convex algorithm (DCA). We explore two applications in machine learning of the minimization of the SoRR framework, namely the AoRR aggregate loss for binary/multi-class classification at the sample level and the TKML individual loss for multi-label/multi-class classification at the label level. A combination loss of AoRR and TKML is proposed as a new learning objective for improving the robustness of multi-label learning in the face of outliers in sample and labels alike. Our empirical results highlight the effectiveness of the proposed optimization frameworks and demonstrate the applicability of proposed losses using synthetic and real data sets.

AAAI Conference 2022 Conference Paper

Towards To-a-T Spatio-Temporal Focus for Skeleton-Based Action Recognition

  • Lipeng Ke
  • Kuan-Chuan Peng
  • Siwei Lyu

Graph Convolutional Networks (GCNs) have been widely used to model the high-order dynamic dependencies for skeleton-based action recognition. Most existing approaches do not explicitly embed the high-order spatio-temporal importance to joints’ spatial connection topology and intensity, and they do not have direct objectives on their attention module to jointly learn when and where to focus in the action sequence. To address these problems, we propose the To-a-T Spatio-Temporal Focus (STF), a skeleton-based action recognition framework that utilizes the spatio-temporal gradient to focus on relevant spatio-temporal features. We first propose the STF modules with learnable gradient-enforced and instance-dependent adjacency matrices to model the high-order spatio-temporal dynamics. Second, we propose three loss terms defined on the gradient-based spatio-temporal focus to explicitly guide the classifier on when and where to look, distinguish confusing classes, and optimize the stacked STF modules. STF outperforms the state-of-the-art methods on the NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400 datasets in all 15 settings over different views, subjects, setups, and input modalities, and STF also shows better accuracy on scarce data and dataset shifting settings.

IJCAI Conference 2021 Conference Paper

Stochastic Actor-Executor-Critic for Image-to-Image Translation

  • Ziwei Luo
  • Jing Hu
  • Xin Wang
  • Siwei Lyu
  • Bin Kong
  • Youbing Yin
  • Qi Song
  • Xi Wu

Training a model-free deep reinforcement learning model to solve image-to-image translation is difficult since it involves high-dimensional continuous state and action spaces. In this paper, we draw inspiration from the recent success of the maximum entropy reinforcement learning framework designed for challenging continuous control problems to develop stochastic policies over high dimensional continuous spaces including image representation, generation, and control simultaneously. Central to this method is the Stochastic Actor-Executor-Critic (SAEC) which is an off-policy actor-critic model with an additional executor to generate realistic images. Specifically, the actor focuses on the high-level representation and control policy by a stochastic latent action, as well as explicitly directs the executor to generate low-level actions to manipulate the state. Experiments on several image-to-image translation tasks have demonstrated the effectiveness and robustness of the proposed SAEC when facing high-dimensional continuous space problems.

AAAI Conference 2020 Conference Paper

3D Single-Person Concurrent Activity Detection Using Stacked Relation Network

  • Yi Wei
  • Wenbo Li
  • Yanbo Fan
  • Linghan Xu
  • Ming-Ching Chang
  • Siwei Lyu

We aim to detect real-world concurrent activities performed by a single person from a streaming 3D skeleton sequence. Different from most existing works that deal with concurrent activities performed by multiple persons that are seldom correlated, we focus on concurrent activities that are spatiotemporally or causally correlated and performed by a single person. For the sake of generalization, we propose an approach based on a decompositional design to learn a dedicated feature representation for each activity class. To address the scalability issue, we further extend the class-level decompositional design to the postural-primitive level, such that each class-wise representation does not need to be extracted by independent backbones, but through a dedicated weighted aggregation of a shared pool of postural primitives. There are multiple interdependent instances deriving from each decomposition. Thus, we propose Stacked Relation Networks (SRN), with a specialized relation network for each decomposition, so as to enhance the expressiveness of instance-wise representations via the inter-instance relationship modeling. SRN achieves state-of-the-art performance on a public dataset and a newly collected dataset. The relation weights within SRN are interpretable among the activity contexts. The new dataset and code are available at https://github.com/weiyi1991/UA Concurrent/

IROS Conference 2020 Conference Paper

Explainable and Efficient Sequential Correlation Network for 3D Single Person Concurrent Activity Detection

  • Yi Wei 0006
  • Wenbo Li 0001
  • Ming-Ching Chang
  • Hongxia Jin
  • Siwei Lyu

We present the sequential correlation network (SCN) to improve concurrent activity detection. SCN combines a recurrent neural network and a correlation model hierarchically to model the complex correlations and temporal dynamics of concurrent activities. SCN has several advantages that enable effective learning even from a small dataset for real-world deployment. Unlike the majority of approaches assuming that each subject performs one activity at a time, SCN is end-to-end trainable, i.e., it can automatically learn the inclusive or exclusive relations of concurrent activities. SCN is lightweight in design using only a small set of learnable parameters to model the spatio-temporal correlations of activities. This also enhances the explainability of the learned parameters. Furthermore, the learning of SCN can benefit from the initialization using semantically meaningful priors. We evaluate the proposed method against the state-of-the-art method on two benchmark datasets with human skeletal data. SCN achieves comparable performance to the SOTA but with much faster inference speed and less memory usage.

NeurIPS Conference 2020 Conference Paper

Learning by Minimizing the Sum of Ranked Range

  • Shu Hu
  • Yiming Ying
  • Xin Wang
  • Siwei Lyu

In forming learning objectives, one oftentimes needs to aggregate a set of individual values to a single output. Such cases occur in the aggregate loss, which combines individual losses of a learning model over each training sample, and in the individual loss for multi-label learning, which combines prediction scores over all class labels. In this work, we introduce the sum of ranked range (SoRR) as a general approach to form learning objectives. A ranked range is a consecutive sequence of sorted values of a set of real numbers. The minimization of SoRR is solved with the difference of convex algorithm (DCA). We explore two applications in machine learning of the minimization of the SoRR framework, namely the AoRR aggregate loss for binary classification and the TKML individual loss for multi-label/multi-class classification. Our empirical results highlight the effectiveness of the proposed optimization framework and demonstrate the applicability of proposed losses using synthetic and real datasets.
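To make the notion of a ranked range concrete, the sketch below computes the sum of the values ranked from position m+1 to position k in descending order, written as the difference of two top-k sums. That difference form is what makes a difference-of-convex treatment natural, since each top-k sum is convex in the values. The function names and the indexing convention (ranks counted from the largest value) are assumptions made for this illustration.

```python
from typing import Sequence


def sum_top_k(values: Sequence[float], k: int) -> float:
    """Sum of the k largest values (0.0 when k == 0)."""
    return float(sum(sorted(values, reverse=True)[:k]))


def sum_of_ranked_range(values: Sequence[float], m: int, k: int) -> float:
    """Sum of the (m+1)-th through k-th largest values, with 0 <= m < k <= n."""
    if not 0 <= m < k <= len(values):
        raise ValueError("require 0 <= m < k <= len(values)")
    # Difference of two top-k sums.
    return sum_top_k(values, k) - sum_top_k(values, m)


losses = [5.0, 1.0, 4.0, 2.0, 3.0]
# Skips the single largest loss (5.0) and sums the next two (4.0 + 3.0).
ranked_sum = sum_of_ranked_range(losses, m=1, k=3)
```

Setting `m=0` and `k=len(values)` recovers the plain sum, while `m > 0` discards the largest values, which is one way an aggregate loss can ignore likely outliers.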

IJCAI Conference 2019 Conference Paper

Deep Correlated Predictive Subspace Learning for Incomplete Multi-View Semi-Supervised Classification

  • Zhe Xue
  • Junping Du
  • Dawei Du
  • Wenqi Ren
  • Siwei Lyu

Incomplete view information often results in failure cases of the conventional multi-view methods. To address this problem, we propose a Deep Correlated Predictive Subspace Learning (DCPSL) method for incomplete multi-view semi-supervised classification. Specifically, we integrate semi-supervised deep matrix factorization, correlated subspace learning, and multi-view label prediction into a unified framework to jointly learn the deep correlated predictive subspace and multi-view shared and private label predictors. DCPSL is able to learn proper subspace representation that is suitable for class label prediction, which can further improve the performance of classification. Extensive experimental results on various practical datasets demonstrate that the proposed method performs favorably against the state-of-the-art methods.

AAAI Conference 2019 Conference Paper

Learning Non-Uniform Hypergraph for Multi-Object Tracking

  • Longyin Wen
  • Dawei Du
  • Shengkun Li
  • Xiao Bian
  • Siwei Lyu

The majority of Multi-Object Tracking (MOT) algorithms based on the tracking-by-detection scheme do not use higher order dependencies among objects or tracklets, which makes them less effective in handling complex scenarios. In this work, we present a new near-online MOT algorithm based on non-uniform hypergraph, which can model different degrees of dependencies among tracklets in a unified objective. The nodes in the hypergraph correspond to the tracklets and the hyperedges with different degrees encode various kinds of dependencies among them. Specifically, instead of setting the weights of hyperedges with different degrees empirically, they are learned automatically using the structural support vector machine algorithm (SSVM). Several experiments are carried out on various challenging datasets (i.e., PETS09, ParkingLot sequence, SubwayFace, and MOT16 benchmark), to demonstrate that our method achieves favorable performance against the state-of-the-art MOT methods.

AAAI Conference 2019 Conference Paper

Scale Invariant Fully Convolutional Network: Detecting Hands Efficiently

  • Dan Liu
  • Dawei Du
  • Libo Zhang
  • Tiejian Luo
  • Yanjun Wu
  • Feiyue Huang
  • Siwei Lyu

Existing hand detection methods usually follow the pipeline of multiple stages with high computation cost, i.e., feature extraction, region proposal, bounding box regression, and additional layers for rotated region detection. In this paper, we propose a new Scale Invariant Fully Convolutional Network (SIFCN) trained in an end-to-end fashion to detect hands efficiently. Specifically, we merge the feature maps from high to low layers in an iterative way, which handles different scales of hands better with less time overhead compared to simply concatenating them. Moreover, we develop the Complementary Weighted Fusion (CWF) block to make full use of the distinctive features among multiple layers to achieve scale invariance. To deal with rotated hand detection, we present the rotation map to get rid of complex rotation and derotation layers. Besides, we design the multi-scale loss scheme to accelerate the training process significantly by adding supervision to the intermediate layers of the network. Compared with the state-of-the-art methods, our algorithm shows comparable accuracy and runs 4.23 times faster on the VIVA dataset and achieves better average precision on the Oxford hand detection dataset at a speed of 62.5 fps.

UAI Conference 2018 Conference Paper

A Univariate Bound of Area Under ROC

  • Siwei Lyu
  • Yiming Ying

Area under ROC (AUC) is an important metric for binary classification and bipartite ranking problems. However, it is difficult to directly optimize AUC as a learning objective, so most existing algorithms are based on optimizing a surrogate loss to AUC. One significant drawback of these surrogate losses is that they require pairwise comparisons among training data, which leads to slow running time and increasing local storage for online learning. In this work, we describe a new surrogate loss based on a reformulation of AUC risk, which does not require pairwise comparison but rankings of the predictions. We further show that the ranking operation can be avoided, and the learning objective obtained based on this surrogate enjoys linear complexity in time and storage. We perform experiments to demonstrate the effectiveness of the online and batch algorithms for AUC optimization based on the proposed surrogate loss.
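
As a rough illustration of why rankings can stand in for pairwise comparisons, the sketch below computes the empirical AUC in two ways: by comparing every positive-negative pair, and from the rank sum of the positive examples (the standard Mann-Whitney identity). This is not the surrogate loss proposed in the paper; the function names `auc_pairwise` and `auc_from_ranks` are made up for this example.

```python
def auc_pairwise(scores, labels):
    """Empirical AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    Cost grows with (#positives x #negatives).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_from_ranks(scores, labels):
    """Same quantity from the rank sum of the positives (Mann-Whitney U).

    Needs one sort instead of an explicit loop over all pairs.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the block of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank for the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.6]
    labels = [1, 1, 0, 1, 0, 0]
    print(auc_pairwise(scores, labels), auc_from_ranks(scores, labels))
```

Both functions return the same value on any input with at least one positive and one negative example; only the amount of work differs.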

IROS Conference 2018 Conference Paper

Efficient State Estimation with Constrained Rao-Blackwellized Particle Filter

  • Shuai Li 0015
  • Siwei Lyu
  • Jeffrey C. Trinkle

Due to the limitations of robotic sensors, the acquisition of an object's state during a robotic manipulation task can be unreliable and noisy. Combining an accurate model of a multi-body dynamic system with Bayesian filtering methods has been shown to filter out noise from the object's observed states. However, the efficiency of these filtering methods suffers from samples that violate physical constraints, e.g., the no-penetration constraint. In this paper, we propose a Rao-Blackwellized Particle Filter (RBPF) that samples the contact states and updates the object's poses using Kalman filters. This RBPF also enforces the physical constraints on the samples by solving a quadratic programming problem. By comparing our method with methods that do not consider physical constraints, we show that our proposed RBPF is not only able to estimate the object's states, e.g., poses, more accurately but also able to infer unobserved states, e.g., velocities, with higher precision.

ICML Conference 2018 Conference Paper

Stochastic Proximal Algorithms for AUC Maximization

  • Michael Natole
  • Yiming Ying
  • Siwei Lyu

Stochastic optimization algorithms such as SGD update the model sequentially with cheap per-iteration costs, making them amenable to large-scale data analysis. However, most existing studies focus on classification accuracy, and their methods cannot be directly applied to the important problem of maximizing the area under the ROC curve (AUC) in imbalanced classification and bipartite ranking. In this paper, we develop a novel stochastic proximal algorithm for AUC maximization, which is referred to as SPAM. Compared with the previous literature, our algorithm SPAM applies to a non-smooth penalty function, and achieves a convergence rate of O(log t/t) for strongly convex functions while both space and per-iteration costs are of one datum.

NeurIPS Conference 2017 Conference Paper

Learning with Average Top-k Loss

  • Yanbo Fan
  • Siwei Lyu
  • Yiming Ying
  • Baogang Hu

In this work, we introduce the average top-$k$ (AT$_k$) loss as a new ensemble loss for supervised learning. The AT$_k$ loss provides a natural generalization of the two widely used ensemble losses, namely the average loss and the maximum loss. Furthermore, the AT$_k$ loss combines the advantages of them and can alleviate their corresponding drawbacks to better adapt to different data distributions. We show that the AT$_k$ loss affords an intuitive interpretation that reduces the penalty of continuous and convex individual losses on correctly classified data. The AT$_k$ loss can lead to convex optimization problems that can be solved effectively with conventional sub-gradient based methods. We further study the statistical learning theory of MAT$_k$ by establishing its classification calibration and statistical consistency, which provide useful insights on the practical choice of the parameter $k$. We demonstrate the applicability of MAT$_k$ learning combined with different individual loss functions for binary and multi-class classification and regression using synthetic and real datasets.
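
A minimal sketch of the ensemble loss described above, assuming the individual losses have already been computed for each training example: the average top-$k$ loss is the mean of the $k$ largest individual losses, so $k = 1$ recovers the maximum loss and $k = n$ recovers the average loss. The function name `average_top_k` is illustrative only.

```python
def average_top_k(losses, k):
    """Mean of the k largest individual losses.

    k = 1 gives the maximum loss; k = len(losses) gives the average loss.
    """
    if not 1 <= k <= len(losses):
        raise ValueError("k must satisfy 1 <= k <= len(losses)")
    top = sorted(losses, reverse=True)[:k]  # the k largest losses
    return sum(top) / k


if __name__ == "__main__":
    individual_losses = [0.1, 2.0, 0.5, 1.5]
    for k in (1, 2, 4):
        print(k, average_top_k(individual_losses, k))
```

Intermediate values of $k$ interpolate between the two extremes, which is the knob the abstract refers to when it discusses the practical choice of $k$.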

AAAI Conference 2017 Conference Paper

Unsupervised Learning of Multi-Level Descriptors for Person Re-Identification

  • Yang Yang
  • Longyin Wen
  • Siwei Lyu
  • Stan Li

In this paper, we propose a novel coding method named weighted linear coding (WLC) to learn multi-level (e.g., pixel-level, patch-level and image-level) descriptors from raw pixel data in an unsupervised manner. It guarantees the property of saliency with a similarity constraint. The resulting multi-level descriptors have a good balance between robustness and distinctiveness. Based on WLC, all data from the same region can be jointly encoded. Consequently, when we extract the holistic image features, the method is able to preserve spatial consistency. Furthermore, we apply PCA to these features to obtain compact person representations. During the stage of matching persons, we exploit the complementary information residing in the multi-level descriptors via a score-level fusion strategy. Experiments on the challenging person re-identification datasets VIPeR and CUHK 01 demonstrate the effectiveness of our method.

AAAI Conference 2016 Conference Paper

Co-Regularized PLSA for Multi-Modal Learning

  • Xin Wang
  • MingChing Chang
  • Yiming Ying
  • Siwei Lyu

Many learning problems in real-world applications involve rich datasets comprising multiple information modalities. In this work, we study co-regularized PLSA (coPLSA) as an efficient solution to probabilistic topic analysis of multi-modal data. In coPLSA, similarities between topic compositions of a data entity across different data modalities are measured with divergences between discrete probabilities, which are incorporated as a co-regularizer to augment individual PLSA models over each data modality. We derive efficient iterative learning algorithms for coPLSA with symmetric KL, $\ell_2$ and $\ell_1$ divergences as co-regularizers; in each case, the essential optimization problem affords simple numerical solutions that entail only matrix arithmetic operations and the numerical solution of 1D nonlinear equations. We evaluate the performance of the coPLSA algorithms on text/image cross-modal retrieval tasks, on which they show competitive performance with state-of-the-art methods.

AAAI Conference 2016 Conference Paper

Constrained Submodular Minimization for Missing Labels and Class Imbalance in Multi-label Learning

  • Baoyuan Wu
  • Siwei Lyu
  • Bernard Ghanem

In multi-label learning, there are two main challenges: missing labels and class imbalance (CIB). The former assumes that only a partial set of labels is provided for each training instance while other labels are missing. CIB is observed from two perspectives: first, the number of negative labels of each instance is much larger than the number of its positive labels; second, the rates of positive instances (i.e., the number of positive instances divided by the total number of instances) of different classes are significantly different. Both missing labels and CIB lead to significant performance degradation. In this work, we propose a new method to handle these two challenges simultaneously. We formulate the problem as a constrained submodular minimization that is composed of a submodular objective function that encourages label consistency and smoothness, as well as class cardinality bound constraints to handle class imbalance. We further present a convex approximation based on the Lovász extension of submodular functions, leading to a linear program, which can be efficiently solved by the alternating direction method of multipliers (ADMM). Experimental results on several benchmark datasets demonstrate the improved performance of our method over several state-of-the-art methods.

NeurIPS Conference 2016 Conference Paper

Stochastic Online AUC Maximization

  • Yiming Ying
  • Longyin Wen
  • Siwei Lyu

Area under ROC (AUC) is a metric which is widely used for measuring the classification performance for imbalanced data. It is of theoretical and practical interest to develop online learning algorithms that maximize AUC for large-scale data. A specific challenge in developing online AUC maximization algorithms is that the learning objective function is usually defined over a pair of training examples of opposite classes, and existing methods achieve online processing with higher space and time complexity. In this work, we propose a new stochastic online algorithm for AUC maximization. In particular, we show that AUC optimization can be equivalently formulated as a convex-concave saddle point problem. From this saddle representation, a stochastic online algorithm (SOLAM) is proposed which has time and space complexity of one datum. We establish theoretical convergence of SOLAM with high probability and demonstrate its effectiveness and efficiency on standard benchmark datasets.

IROS Conference 2015 Conference Paper

A comparative study of contact models for contact-aware state estimation

  • Shuai Li 0015
  • Siwei Lyu
  • Jeffrey C. Trinkle
  • Wolfram Burgard

We study the contact-aware state estimation (CASE) problem, i.e., the problem of estimating the state of an object while it is being actively manipulated by a robot. Several researchers have developed particle filters for this problem. They estimate the state (pose and velocity) of manipulated objects, some physical properties (such as mass and shape), and contact information (such as gain or loss of contact and transitions between sliding and sticking). However, the effects of various contact and noise models, which can have a huge impact on the estimation results, are obfuscated by implementation details. In this paper, we study the CASE problem arising from a simple pushing task with the goal of shedding light on the fundamental contact modeling choices. Specifically, we evaluate four particle filters based upon four probabilistic state transition models generated from deterministic multibody dynamics models with rigid or compliant contacts, each of which is augmented by one of two different noise models. Comparisons of these state transition models are carried out through the analysis of real and simulated experiments, the results of which provide guidance to filter designers.

ICRA Conference 2015 Conference Paper

State estimation for dynamic systems with intermittent contact

  • Shuai Li 0015
  • Siwei Lyu
  • Jeffrey C. Trinkle

State estimation for dynamic systems, such as estimating object poses and contact states, is essential for robots to perform manipulation tasks. In order to make accurate estimates, the state transition model needs to be physically correct. Complementarity formulations of the dynamics are widely used for describing rigid body physical behaviors in the simulation field, which makes them a good state transition model for the dynamic system state estimation problem. However, the non-smoothness of complementarity models and the high dimensionality of the dynamic system make the estimation problem challenging. In this paper, we propose a particle filtering framework that solves the estimation problem by sampling the discrete contact states using contact graphs and collision detection algorithms, and by estimating the continuous states through a Kalman filter. This method exploits the piecewise continuous property of complementarity problems and reduces the dimension of the sampling space compared with sampling the high-dimensional continuous state space. We demonstrate that this method produces stable and reliable estimates in physical experiments.

ICRA Conference 2013 Conference Paper

A dynamic Bayesian approach to real-time estimation and filtering in grasp acquisition

  • Li Zhang 0130
  • Siwei Lyu
  • Jeffrey C. Trinkle

In this work, we develop a general solution to a broad class of grasping and manipulation problems that we term as C-SLAM for contact simultaneous localization and modeling, where the robots need to accurately track the motions of the contacted bodies and the locations of contacts, while simultaneously estimating important system parameters, such as body dimensions, masses and friction coefficients between contacting surfaces. Our solution framework is based on a dynamic Bayesian inference framework, and hence, we refer to it as Dynamic Bayesian C-SLAM (DBC-SLAM). DBC-SLAM combines an NCP-based dynamic model with the dynamic Bayesian network, and incorporates model parameter estimation as an intrinsic part of the overall inference procedure. We show two preliminary “proof-of-concept” examples that demonstrate the use of DBC-SLAM in robotic contact tasks.

IJCAI Conference 2013 Conference Paper

Deep Feature Learning Using Target Priors with Applications in ECoG Signal Decoding for BCI

  • Zuoguan Wang
  • Siwei Lyu
  • Gerwin Schalk
  • Qiang Ji

Recent years have seen great interest in using deep architectures for feature learning from data. One drawback of the commonly used unsupervised deep feature learning methods is that for supervised or semi-supervised learning tasks, the information in the target variables is not used until the final stage when the classifier or regressor is trained on the learned features. This could lead to over-generalized features that are not competitive on the specific supervised or semi-supervised learning tasks. In this work, we describe a new learning method that combines deep feature learning on mixed labeled and unlabeled data sets. Specifically, we describe a weakly supervised learning method for prior supervised convolutional stacked auto-encoders (PCSA), in which information in the target variables is represented probabilistically using a Gaussian Bernoulli restricted Boltzmann machine (RBM). We apply this method to the decoding problem of an ECoG based Brain Computer Interface (BCI) system. Our experimental results show that PCSA achieves significant improvement in decoding performance on benchmark data sets compared to unsupervised feature learning as well as to the current state-of-the-art algorithms that are based on manually crafted features.

NeurIPS Conference 2013 Conference Paper

On Algorithms for Sparse Multi-factor NMF

  • Siwei Lyu
  • Xin Wang

Nonnegative matrix factorization (NMF) is a popular data analysis method, the objective of which is to decompose a matrix with all nonnegative components into the product of two other nonnegative matrices. In this work, we describe a new simple and efficient algorithm for the multi-factor nonnegative matrix factorization (mfNMF) problem, which generalizes the original NMF problem to more than two factors. Furthermore, we extend the mfNMF algorithm to incorporate a regularizer based on the Dirichlet distribution over normalized columns to encourage sparsity in the obtained factors. Our sparse NMF algorithm affords a closed form and an intuitive interpretation, and is more efficient in comparison with previous works that use fixed-point iterations. We demonstrate the effectiveness and efficiency of our algorithms on both synthetic and real data sets.

NeurIPS Conference 2012 Conference Paper

Learning with Target Prior

  • Zuoguan Wang
  • Siwei Lyu
  • Gerwin Schalk
  • Qiang Ji

In the conventional approaches for supervised parametric learning, relations between data and target variables are provided through training sets consisting of pairs of corresponded data and target variables. In this work, we describe a new learning scheme for parametric learning, in which the target variables $\mathbf{y}$ can be modeled with a prior model $p(\mathbf{y})$ and the relations between data and target variables are estimated through $p(\mathbf{y})$ and a set of uncorresponded data $\mathbf{x}$ in training. We term this method learning with target priors (LTP). Specifically, LTP learning seeks the parameter $\theta$ that maximizes the log likelihood of $f_\theta(\mathbf{x})$ on an uncorresponded training set with regard to $p(\mathbf{y})$. Compared to the conventional (semi)supervised learning approach, LTP can make efficient use of prior knowledge of the target variables in the form of probabilistic distributions, and thus removes or reduces the reliance on training data in learning. Compared to the Bayesian approach, the learned parametric regressor in LTP can be more efficiently implemented and deployed in tasks where running efficiency is critical, such as on-line BCI signal decoding. We demonstrate the effectiveness of the proposed approach on parametric regression tasks for BCI signal decoding and pose estimation from video.

NeurIPS Conference 2011 Conference Paper

Unifying Non-Maximum Likelihood Learning Objectives with Minimum KL Contraction

  • Siwei Lyu

When used to learn high dimensional parametric probabilistic models, the classical maximum likelihood (ML) learning often suffers from computational intractability, which motivates the active developments of non-ML learning methods. Yet, because of their divergent motivations and forms, the objective functions of many non-ML learning methods are seemingly unrelated, and there lacks a unified framework to understand them. In this work, based on an information geometric view of parametric learning, we introduce a general non-ML learning principle termed minimum KL contraction, where we seek optimal parameters that minimize the contraction of the KL divergence between the two distributions after they are transformed with a KL contraction operator. We then show that the objective functions of several important or recently developed non-ML learning methods, including contrastive divergence [12], noise-contrastive estimation [11], partial likelihood [7], non-local contrastive objectives [31], score matching [14], pseudo-likelihood [3], maximum conditional likelihood [17], maximum mutual information [2], maximum marginal likelihood [9], and conditional and marginal composite likelihood [24], can be unified under the minimum KL contraction framework with different choices of the KL contraction operators.

NeurIPS Conference 2010 Conference Paper

Divisive Normalization: Justification and Effectiveness as Efficient Coding Transform

  • Siwei Lyu

Divisive normalization (DN) has been advocated as an effective nonlinear efficient coding transform for natural sensory signals with applications in biology and engineering. In this work, we aim to establish a connection between the DN transform and the statistical properties of natural sensory signals. Our analysis is based on the use of the multivariate t model to capture some important statistical properties of natural sensory signals. The multivariate t model justifies DN as an approximation to the transform that completely eliminates its statistical dependency. Furthermore, using the multivariate t model and measuring statistical dependency with multi-information, we can precisely quantify the statistical dependency that is reduced by the DN transform. We compare this with the actual performance of the DN transform in reducing statistical dependencies of natural sensory signals. Our theoretical analysis and quantitative evaluations confirm DN as an effective efficient coding transform for natural sensory signals. On the other hand, we also observe a previously unreported phenomenon that DN may increase statistical dependencies when the size of pooling is small.
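
The abstract does not spell out the transform, so the following is only a sketch of one commonly used form of divisive normalization, in which each response in a pool is divided by the square root of a constant plus the weighted energy of the pool. The function name `divisive_normalization`, the constant `b`, and the uniform default weights are assumptions made for this example and may differ from the parameterization analyzed in the paper.

```python
import math


def divisive_normalization(x, b=1.0, weights=None):
    """Divide each response by sqrt(b + sum_j w_j * x_j**2) over the pool.

    x       : list of responses forming one pool (assumed layout)
    b       : non-negative constant added to the pooled energy (assumed)
    weights : per-element pooling weights; uniform if omitted (assumed)
    """
    if weights is None:
        weights = [1.0] * len(x)
    energy = b + sum(w * v * v for w, v in zip(weights, x))
    scale = math.sqrt(energy)
    return [v / scale for v in x]


if __name__ == "__main__":
    # A large-amplitude pool and a small-amplitude pool end up on similar scales.
    print(divisive_normalization([30.0, 40.0]))
    print(divisive_normalization([3.0, 4.0]))
```

The length of `x` plays the role of the pooling size mentioned in the abstract.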

UAI Conference 2009 Conference Paper

Interpretation and Generalization of Score Matching

  • Siwei Lyu

Score matching is a recently developed parameter learning method that is particularly effective for complicated high dimensional density models with intractable partition functions. In this paper, we study two issues that have not been completely resolved for score matching. First, we provide a formal link between maximum likelihood and score matching. Our analysis shows that score matching finds model parameters that are more robust to noisy training data. Second, we develop a generalization of score matching. Based on this generalization, we further demonstrate an extension of score matching to models of discrete data.

NeurIPS Conference 2008 Conference Paper

Reducing statistical dependencies in natural signals using radial Gaussianization

  • Siwei Lyu
  • Eero Simoncelli

We consider the problem of efficiently encoding a signal by transforming it to a new representation whose components are statistically independent. A widely studied linear solution, independent components analysis (ICA), exists for the case when the signal is generated as a linear transformation of independent non-Gaussian sources. Here, we examine a complementary case, in which the source is non-Gaussian but elliptically symmetric. In this case, no linear transform suffices to properly decompose the signal into independent components, but we show that a simple nonlinear transformation, which we call radial Gaussianization (RG), is able to remove all dependencies. We then demonstrate this methodology in the context of natural signal statistics. We first show that the joint distributions of bandpass filter responses, for both sound and images, are better described as elliptical than linearly transformed independent sources. Consistent with this, we demonstrate that the reduction in dependency achieved by applying RG to either pairs or blocks of bandpass filter responses is significantly greater than that achieved by PCA or ICA.

NeurIPS Conference 2006 Conference Paper

Statistical Modeling of Images with Fields of Gaussian Scale Mixtures

  • Siwei Lyu
  • Eero Simoncelli

The local statistical properties of photographic images, when represented in a multi-scale basis, have been described using Gaussian scale mixtures (GSMs). Here, we use this local description to construct a global field of Gaussian scale mixtures (FoGSM). Specifically, we model subbands of wavelet coefficients as a product of an exponentiated homogeneous Gaussian Markov random field (hGMRF) and a second independent hGMRF. We show that parameter estimation for FoGSM is feasible, and that samples drawn from an estimated FoGSM model have marginal and joint statistics similar to wavelet coefficients of photographic images. We develop an algorithm for image denoising based on the FoGSM model, and demonstrate substantial improvements over the current state-of-the-art denoising method based on the local GSM model.

Many successful methods in image processing and computer vision rely on statistical models for images, and it is thus of continuing interest to develop improved models, both in terms of their ability to precisely capture image structures, and in terms of their tractability when used in applications. Constructing such a model is difficult, primarily because of the intrinsic high dimensionality of the space of images. Two simplifying assumptions are usually made to reduce model complexity. The first is Markovianity: the density of a pixel conditioned on a small neighborhood is assumed to be independent from the rest of the image. The second assumption is homogeneity: the local density is assumed to be independent of its absolute position within the image. The set of models satisfying both of these assumptions constitutes the class of homogeneous Markov random fields (hMRFs). Over the past two decades, studies of photographic images represented with multi-scale multi-orientation image decompositions (loosely referred to as "wavelets") have revealed striking non-Gaussian regularities and inter- and intra-subband dependencies.
For instance, wavelet coefficients generally have highly kurtotic marginal distributions [1, 2], and their amplitudes exhibit strong correlations with the amplitudes of nearby coefficients [3, 4]. One model that can capture the non-Gaussian marginal behaviors is a product of non-Gaussian scalar variables [5]. A number of authors have developed non-Gaussian MRF models based on this sort of local description [6, 7, 8], among which the recently developed fields of experts model [7] has demonstrated impressive performance in denoising (albeit at an extremely high computational cost in learning model parameters). An alternative model that can capture non-Gaussian local structure is a scale mixture model [9, 10, 11]. An important special case is the Gaussian scale mixture (GSM), which consists of a Gaussian random vector whose amplitude is modulated by a hidden scaling variable. The GSM model provides a particularly good description of local image statistics, and the Gaussian substructure of the model leads to efficient algorithms for parameter estimation and inference. Local GSM-based methods represent the current state-of-the-art in image denoising [12]. The power of GSM models should be substantially improved when extended to describe more than a small neighborhood of wavelet coefficients. To this end, several authors have embedded local Gaussian mixtures into tree-structured MRF models [e.g., 13, 14]. In order to maintain tractability, these models are arranged such that coefficients are grouped in non-overlapping clusters, allowing a graphical probability model with no loops. Despite their global consistency, the artificially imposed cluster boundaries lead to substantial artifacts in applications such as denoising. In this paper, we use a local GSM as a basis for a globally consistent and spatially homogeneous field of Gaussian scale mixtures (FoGSM).
Specifically, the FoGSM is formulated as the product of two mutually independent MRFs: a positive multiplier field obtained by exponentiating a homogeneous Gaussian MRF (hGMRF), and a second hGMRF. We develop a parameter estimation procedure, and show that the model is able to capture important statistical regularities in the marginal and joint wavelet statistics of a photographic image. We apply the FoGSM to image denoising, demonstrating substantial improvement over the previous state-of-the-art results obtained with a local GSM model.

1 Gaussian scale mixtures

A GSM random vector $x$ is formed as the product of a zero-mean Gaussian random vector $u$ and an independent random variable $\sqrt{z}$, as $x \stackrel{d}{=} \sqrt{z}\,u$, where $\stackrel{d}{=}$ denotes equality in distribution. The density of $x$ is determined by the covariance of the Gaussian vector, $\Sigma$, and the density of the multiplier, $p_z(z)$, through the integral

$$p(x) = \int \mathcal{N}_x(0, z\Sigma)\, p_z(z)\, dz \propto \int \frac{1}{\sqrt{|z\Sigma|}} \exp\left(-\frac{x^T \Sigma^{-1} x}{2z}\right) p_z(z)\, dz. \qquad (1)$$

A key property of GSMs is that $z$ determines the scale of the conditional variance of $x$ given $z$, which is a Gaussian variable with zero mean and covariance $z\Sigma$. In addition, the normalized variable $x/\sqrt{z}$ is a zero-mean Gaussian with covariance matrix $\Sigma$. The GSM model has been used to describe the marginal and joint densities of local clusters of wavelet coefficients, both within and across subbands [9], where the embedded Gaussian structure affords simple and efficient computation. This local GSM model has been used for denoising, by independently estimating each coefficient conditioned on its surrounding cluster [12]. This method achieves state-of-the-art performance, despite the fact that treating overlapping clusters as independent does not give rise to a globally consistent statistical model that satisfies all the local constraints.

2 Fields of Gaussian scale mixtures

In this section, we develop fields of Gaussian scale mixtures (FoGSM) as a framework for modeling wavelet coefficients of photographic images.
Analogous to the local GSM model, we use a latent multiplier field to modulate a homogeneous Gaussian MRF (hGMRF). Formally, we define a FoGSM $x$ as the product of two mutually independent MRFs,

$$x \stackrel{d}{=} u \otimes \sqrt{z}, \qquad (2)$$

where $u$ is a zero-mean hGMRF, and $z$ is a field of positive multipliers that control the local coefficient variances. The operator $\otimes$ denotes element-wise multiplication, and the square root operation is applied to each component. Note that $x$ has one-dimensional GSM marginal distributions, while its components have dependencies captured by the MRF structures of $u$ and $z$. Analogous to the local GSM, when conditioned on $z$, $x$ is an inhomogeneous GMRF

$$p(x|z) \propto \frac{\sqrt{|Q_u|}}{\prod_i \sqrt{z_i}} \exp\left(-\frac{1}{2}\, x^T D(\sqrt{z})^{-1} Q_u D(\sqrt{z})^{-1} x\right) = \frac{\sqrt{|Q_u|}}{\prod_i \sqrt{z_i}} \exp\left(-\frac{1}{2}\, (x \oslash \sqrt{z})^T Q_u (x \oslash \sqrt{z})\right), \qquad (3)$$

where $Q_u$ is the inverse covariance matrix of $u$ (also known as the precision matrix), $D(\cdot)$ denotes the operator that forms a diagonal matrix from an input vector, and $\oslash$ denotes element-wise division. Note also that the element-wise division of the two fields, $x \oslash \sqrt{z}$, yields a hGMRF with precision matrix $Q_u$. To complete the FoGSM model, we need to specify the structure of the multiplier field $z$. For tractability, we use another hGMRF as a substrate, and map it into positive values by exponentiation,