Author name cluster

Kevin Duh

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers

1 author row

NeurIPS Conference 2024 Conference Paper

Where does In-context Learning Happen in Large Language Models?

Suzanna Sia
David Mueller
Kevin Duh

Self-supervised large language models have demonstrated the ability to perform various tasks via in-context learning, but little is known about where the model locates the task with respect to prompt instructions and demonstration examples. In this work, we attempt to characterize the region where large language models transition from recognizing the task to performing the task. Through a series of layer-wise context-masking experiments on GPTNeo2.7B, Bloom3B, Starcoder2-7B, Llama3.1-8B, Llama3.1-8B-Instruct, on Machine Translation and Code generation, we demonstrate evidence of a "task recognition" point where the task is encoded into the input representations and attention to context is no longer necessary. Taking advantage of this redundancy results in 45% computational savings when prompting with 5 examples, and task recognition achieved at layer 14 / 32 using an example with Machine Translation. Our findings also have implications for resource and parameter efficient fine-tuning; we observe a correspondence between strong fine-tuning performance of individual LoRA layers and the task recognition layers.


AAAI Conference 2022 Short Paper

Modeling Constraints Can Identify Winning Arguments in Multi-Party Interactions (Student Abstract)

Suzanna Sia
Kokil Jaidka
Niyati Chayya
Kevin Duh

In contexts where debate and deliberation are the norm, participants are regularly presented with new information that conflicts with their original beliefs. When required to update their beliefs (belief alignment), they may choose arguments that align with their worldview (confirmation bias). We test this and competing hypotheses in a constraint-based modeling approach to predict the winning arguments in multi-party interactions in the Reddit ChangeMyView dataset. We impose structural constraints that reflect competing hypotheses on a hierarchical generative Variational Auto-encoder. Our findings suggest that when arguments are further from the initial belief state of the target, they are more likely to succeed.


AAAI Conference 2017 Conference Paper

Robsut Wrod Reocginiton via Semi-Character Recurrent Neural Network

Keisuke Sakaguchi
Kevin Duh
Matt Post
Benjamin Van Durme

Language processing mechanism by humans is generally more robust than computers. The Cmabrigde Uinervtisy (Cambridge University) effect from the psycholinguistics literature has demonstrated such a robust word processing mechanism, where jumbled words (e.g. Cmabrigde / Cambridge) are recognized with little cost. On the other hand, computational models for word recognition (e.g. spelling checkers) perform poorly on data with such noise. Inspired by the findings from the Cmabrigde Uinervtisy effect, we propose a word recognition model based on a semi-character level recurrent neural network (scRNN). In our experiments, we demonstrate that scRNN has significantly more robust performance in word spelling correction (i.e. word recognition) compared to existing spelling checkers and character-based convolutional neural network. Furthermore, we demonstrate that the model is cognitively plausible by replicating a psycholinguistics experiment about human reading difficulty using our model.
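
The jumbled-word examples in the abstract above can be illustrated with a minimal sketch. It assumes that a semi-character view of a word keeps the first and last characters in position and treats the internal characters as an unordered bag. The helper name `semi_char_key` is hypothetical. The sketch covers only this input view, not the recurrent network or the authors' implementation.

```python
from collections import Counter


def semi_char_key(word: str):
    """Hypothetical helper: (first char, bag of internal chars, last char).

    Assumption: internal character order is ignored, so words that differ
    only by shuffling their internal letters map to the same key.
    """
    first = word[:1]
    last = word[-1:] if len(word) > 1 else ""
    # Counter keeps multiplicities; frozenset makes the bag comparable.
    internal = Counter(word[1:-1])
    return first, frozenset(internal.items()), last


# Jumbled / clean pairs taken from the abstract and the paper title.
pairs = [("Cmabrigde", "Cambridge"), ("Uinervtisy", "University"), ("Wrod", "Word")]
for jumbled, clean in pairs:
    print(jumbled, clean, semi_char_key(jumbled) == semi_char_key(clean))
```

Under this assumption each jumbled form shares a key with its clean spelling, while a word whose first or last letter changes does not.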


AAAI Conference 2016 Conference Paper

Non-Linear Similarity Learning for Compositionality

Masashi Tsubaki
Kevin Duh
Masashi Shimbo
Yuji Matsumoto

Many NLP applications rely on the existence of similarity measures over text data. Although word vector space models provide good similarity measures between words, phrasal and sentential similarities derived from composition of individual words remain a difficult problem. In this paper, we propose a new method of non-linear similarity learning for semantic compositionality. In this method, word representations are learned through the similarity learning of sentences in a high-dimensional space with kernel functions. On the task of predicting the semantic relatedness of two sentences (SemEval 2014, Task 1), our method outperforms linear baselines, feature engineering approaches, recursive neural networks, and achieves competitive results with long short-term memory models.
