Arrow Research search

Author name cluster

Marek Grzes

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
2 author rows

Possible papers (5)

JMLR · 2023 · Journal Article

Be More Active! Understanding the Differences Between Mean and Sampled Representations of Variational Autoencoders

  • Lisa Bonheme
  • Marek Grzes

The ability of Variational Autoencoders to learn disentangled representations has made them appealing for practical applications. However, their mean representations, which are generally used for downstream tasks, have recently been shown to be more correlated than their sampled counterparts, on which disentanglement is usually measured. In this paper, we refine this observation through the lens of selective posterior collapse, which states that only a subset of the learned representations, the active variables, is encoding useful information while the rest (the passive variables) is discarded. We first extend the existing definition to multiple data examples and show that active variables are equally disentangled in mean and sampled representations. Based on this extension and the pre-trained models from disentanglement_lib, we then isolate the passive variables and show that they are responsible for the discrepancies between mean and sampled representations. Specifically, passive variables exhibit high correlation scores with other variables in mean representations while being fully uncorrelated in sampled ones. We thus conclude that despite what their higher correlation might suggest, mean representations are still good candidates for downstream task applications. However, it may be beneficial to remove their passive variables, especially when used with models sensitive to correlated features. © JMLR 2023.
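A passive variable in this sense is a latent dimension whose approximate posterior stays close to the prior for every input, so its average per-dimension KL term is near zero. A minimal sketch of isolating and removing such variables, assuming the encoder outputs `mu` and `logvar` arrays per example (the threshold is an illustrative choice, not a value from the paper):

```python
import numpy as np

def per_dim_kl(mu, logvar):
    """Average per-dimension KL(q(z|x) || N(0, I)) over a batch.
    mu, logvar: arrays of shape (n_examples, n_latents)."""
    var = np.exp(logvar)
    kl = 0.5 * (mu**2 + var - logvar - 1.0)
    return kl.mean(axis=0)

def active_mask(mu, logvar, threshold=0.01):
    """Dimensions whose average KL exceeds `threshold` are treated
    as active; the rest are passive (posterior ~ prior)."""
    return per_dim_kl(mu, logvar) > threshold

def prune_passive(mu, logvar, threshold=0.01):
    """Keep only the active dimensions of the mean representation."""
    mask = active_mask(mu, logvar, threshold)
    return mu[:, mask]
```

Dropping the columns selected as passive before fitting a correlation-sensitive downstream model is the mitigation the abstract suggests.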

AAAI · 2018 · Conference Paper

Improving Language Modelling with Noise Contrastive Estimation

  • Farhana Ferdousi Liza
  • Marek Grzes

Neural language models do not scale well when the vocabulary is large. Noise contrastive estimation (NCE) is a sampling-based method that allows for fast learning with large vocabularies. Although NCE has shown promising performance in neural machine translation, its full potential has not been demonstrated in the language modelling literature. A sufficient investigation of the hyperparameters in the NCE-based neural language models was clearly missing. In this paper, we showed that NCE can be a very successful approach in neural language modelling when the hyperparameters of a neural network are tuned appropriately. We introduced the ‘search-then-converge’ learning rate schedule for NCE and designed a heuristic that specifies how to use this schedule. The impact of the other important hyperparameters, such as the dropout rate and the weight initialisation range, was also demonstrated. Using a popular benchmark, we showed that appropriate tuning of NCE in neural language models outperforms the state-of-the-art single-model methods based on standard dropout and the standard LSTM recurrent neural networks.
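The classic ‘search-then-converge’ family of schedules (due to Darken and Moody) keeps the learning rate roughly constant early in training and then decays it hyperbolically; the abstract's specific heuristic for applying it to NCE is not reproduced here. A minimal sketch of the schedule itself:

```python
def search_then_converge(eta0, tau):
    """'Search-then-converge' learning rate schedule:
    roughly constant near eta0 while t << tau (search phase),
    then decaying like eta0 * tau / t for t >> tau (converge phase)."""
    def schedule(t):
        return eta0 / (1.0 + t / tau)
    return schedule
```

The switch point `tau` controls how long the constant-rate search phase lasts before the decay takes over.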

AAMAS · 2017 · Conference Paper

Reward Shaping in Episodic Reinforcement Learning

  • Marek Grzes

Recent advancements in reinforcement learning confirm that reinforcement learning techniques can solve large scale problems leading to high quality autonomous decision making. It is only a matter of time until we see large scale applications of reinforcement learning in various sectors, such as healthcare and cyber-security, among others. However, reinforcement learning can be time-consuming because the learning algorithms have to determine the long term consequences of their actions using delayed feedback or rewards. Reward shaping is a method of incorporating domain knowledge into reinforcement learning so that the algorithms are guided faster towards more promising solutions. Under an overarching theme of episodic reinforcement learning, this paper shows a unifying analysis of potential-based reward shaping which leads to new theoretical insights into reward shaping in both model-free and model-based algorithms, as well as in multi-agent reinforcement learning.

CCS Concepts: • Theory of computation → Reinforcement learning; Sequential decision making; Multi-agent reinforcement learning • Computing methodologies → Reinforcement learning; Q-learning
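Potential-based reward shaping, as analysed in this paper, adds F(s, s') = γΦ(s') − Φ(s) to the environment reward for a potential function Φ over states; in the discounted setting this form is known to preserve the optimal policy, and the episodic analysis above concerns when that guarantee carries over (e.g. how terminal-state potentials must be handled). A minimal sketch of the construction, not the paper's specific results:

```python
def shaped_reward(reward_fn, potential, gamma):
    """Potential-based reward shaping (Ng, Harada & Russell form):
    adds F(s, s') = gamma * Phi(s') - Phi(s) to the base reward.
    `potential` maps a state to a scalar heuristic value."""
    def shaped(s, a, s_next):
        return reward_fn(s, a, s_next) + gamma * potential(s_next) - potential(s)
    return shaped
```

A potential that grows towards promising states makes transitions in that direction locally more rewarding, guiding exploration without changing which policy is optimal.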

ICAPS · 2011 · Conference Paper

Distributed Control of Situated Assistance in Large Domains with Many Tasks

  • Jesse Hoey
  • Marek Grzes

This paper tackles the problem of building situated prompting and assistance systems for guiding a human with a cognitive disability through a large domain containing multiple tasks. This problem is challenging because the target population has difficulty maintaining goals, recalling necessary steps and recognizing objects and potential actions (affordances), and therefore may not appear to be acting rationally. Prompts or cues from an automated system can be very helpful in this regard, but the domain is inherently partially observable due to sensor noise and uncertain human behaviours, making the task of selecting an appropriate prompt very challenging. Prior work has shown how such automated assistance for a single task can be modeled as a partially observable Markov decision process (POMDP). In this paper, we generalise this to multiple tasks, and show how to build a scalable, distributed and hierarchical controller. We demonstrate the algorithm in a set of simulated domains and show it can perform as well as the full model in many cases, and can give solutions to large problems (over 10^15 states and 10^9 observations) for which the full model fails to find a policy.

AAMAS · 2010 · Conference Paper

PAC-MDP Learning with Knowledge-based Admissible Models

  • Marek Grzes
  • Daniel Kudenko

PAC-MDP algorithms approach the exploration-exploitation problem of reinforcement learning agents in an effective way which guarantees that, with high probability, the algorithm performs near-optimally for all but a polynomial number of steps. The performance of these algorithms can be further improved by incorporating domain knowledge to guide their learning process. In this paper we propose a framework to use partial knowledge about effects of actions in a theoretically well-founded way. Empirical evaluation shows that our proposed method is more efficient than reward shaping, which represents an alternative approach to incorporate background knowledge. Our solution is also very competitive when compared with the Bayesian Exploration Bonus (BEB) algorithm. BEB is not PAC-MDP; however, it can exploit domain knowledge via informative priors. We show how to use the same kind of knowledge in the PAC-MDP framework in a way which preserves all theoretical guarantees of PAC-MDP learning.