Arrow Research search

Author name cluster

Boris Katz

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

21 papers
2 author rows

Possible papers (21)

ICLR 2025 Conference Paper

Population Transformer: Learning Population-level Representations of Neural Activity

  • Geeling Chau
  • Christopher Wang
  • Sabera J. Talukder
  • Vighnesh Subramaniam
  • Saraswati Soedarmadji
  • Yisong Yue
  • Boris Katz
  • Andrei Barbu

We present a self-supervised framework that learns population-level codes for arbitrary ensembles of neural recordings at scale. We address key challenges in scaling models with neural time-series data, namely, sparse and variable electrode distribution across subjects and datasets. The Population Transformer (PopT) stacks on top of pretrained temporal embeddings and enhances downstream decoding by enabling learned aggregation of multiple spatially-sparse data channels. The pretrained PopT lowers the amount of data required for downstream decoding experiments, while increasing accuracy, even on held-out subjects and tasks. Compared to end-to-end methods, this approach is computationally lightweight, while achieving similar or better decoding performance. We further show how our framework is generalizable to multiple time-series embeddings and neural data modalities. Beyond decoding, we interpret the pretrained and fine-tuned PopT models to show how they can be used to extract neuroscience insights from large amounts of data. We release our code as well as a pretrained PopT to enable off-the-shelf improvements in multi-channel intracranial data decoding and interpretability. Code is available at https://github.com/czlwang/PopulationTransformer.
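As a rough illustration of the learned aggregation described above, the sketch below stacks a small transformer encoder over per-channel embedding tokens and reads out a learned summary token. It is an assumption-laden sketch, not the released PopT code: the class name, dimensions, and the omission of electrode-position inputs are all simplifications.

```python
import torch
import torch.nn as nn

class ChannelAggregator(nn.Module):
    """Illustrative sketch: aggregate pretrained per-channel temporal embeddings
    with a transformer encoder plus a learned summary token. Not the released
    PopT implementation; names and sizes are assumptions."""

    def __init__(self, embed_dim=256, num_layers=4, num_heads=8):
        super().__init__()
        self.summary_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, channel_embeddings, padding_mask=None):
        # channel_embeddings: (batch, n_channels, embed_dim); n_channels varies
        batch = channel_embeddings.size(0)
        tokens = torch.cat(
            [self.summary_token.expand(batch, -1, -1), channel_embeddings], dim=1)
        if padding_mask is not None:
            pad = torch.zeros(batch, 1, dtype=torch.bool,
                              device=padding_mask.device)
            padding_mask = torch.cat([pad, padding_mask], dim=1)
        out = self.encoder(tokens, src_key_padding_mask=padding_mask)
        return out[:, 0]  # population-level representation used for decoding
```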

NeurIPS 2025 Conference Paper

Training the Untrainable: Introducing Inductive Bias via Representational Alignment

  • Vighnesh Subramaniam
  • David Mayo
  • Colin Conwell
  • Tomaso Poggio
  • Boris Katz
  • Brian Cheung
  • Andrei Barbu

We demonstrate that architectures that are traditionally considered ill-suited for a task can be trained using inductive biases from another architecture. We call a network untrainable when it overfits, underfits, or converges to poor results even when tuning its hyperparameters. For example, fully connected networks overfit on object recognition while deep convolutional networks without residual connections underfit. The traditional answer is to change the architecture to impose some inductive bias, although the nature of that bias is unknown. We introduce guidance, where a guide network steers a target network using a neural distance function. The target minimizes its task loss plus a layerwise representational similarity against the frozen guide. If the guide is trained, this transfers over the architectural prior and knowledge of the guide to the target. If the guide is untrained, this transfers over only part of the architectural prior of the guide. We show that guidance prevents FCN overfitting on ImageNet, narrows the vanilla RNN–Transformer gap, boosts plain CNNs toward ResNet accuracy, and aids Transformers on RNN-favored tasks. We further identify that guidance-driven initialization alone can mitigate FCN overfitting. Our method provides a mathematical tool to investigate priors and architectures, and in the long term, could automate architecture design.
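The objective described in the abstract (task loss plus a layerwise representational-similarity penalty against a frozen guide) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the linear-CKA distance, the uniform layer weighting, and the `alpha` coefficient are assumptions standing in for the paper's neural distance function and weighting scheme.

```python
import torch

def linear_cka(x, y):
    """Linear CKA between activation matrices of shape (batch, features).
    Works even when the target and guide layers have different widths."""
    x = x - x.mean(dim=0, keepdim=True)
    y = y - y.mean(dim=0, keepdim=True)
    cross = (x.t() @ y).pow(2).sum()
    norm = torch.sqrt((x.t() @ x).pow(2).sum() * (y.t() @ y).pow(2).sum())
    return cross / norm

def guidance_loss(task_loss, target_feats, guide_feats, alpha=1.0):
    """Illustrative guidance objective: the usual task loss plus a layerwise
    representational-distance term against a frozen guide network.
    `target_feats` and `guide_feats` are lists of per-layer activations."""
    align = sum(1.0 - linear_cka(t.flatten(1), g.flatten(1).detach())
                for t, g in zip(target_feats, guide_feats))
    return task_loss + alpha * align
```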

NeurIPS 2024 Conference Paper

Brain Treebank: Large-scale intracranial recordings from naturalistic language stimuli

  • Christopher Wang
  • Adam Yaari
  • Aaditya K. Singh
  • Vighnesh Subramaniam
  • Dana Rosenfarb
  • Jan DeWitt
  • Pranav Misra
  • Joseph R. Madsen

We present the Brain Treebank, a large-scale dataset of electrophysiological neural responses, recorded from intracranial probes while 10 subjects watched one or more Hollywood movies. Subjects watched on average 2.6 Hollywood movies, for an average viewing time of 4.3 hours, and a total of 43 hours. The audio track for each movie was transcribed with manual corrections. Word onsets were manually annotated on spectrograms of the audio track for each movie. Each transcript was automatically parsed and manually corrected into the universal dependencies (UD) formalism, assigning a part of speech to every word and a dependency parse to every sentence. In total, subjects heard over 38,000 sentences (223,000 words), while they had on average 168 electrodes implanted. This is the largest dataset of intracranial recordings featuring grounded naturalistic language, one of the largest English UD treebanks in general, and one of only a few UD treebanks aligned to multimodal features. We hope that this dataset serves as a bridge between linguistic concepts, perception, and their neural representations. To that end, we present an analysis of which electrodes are sensitive to language features while also mapping out a rough time course of language processing across these electrodes. The Brain Treebank is available at https://BrainTreebank.dev/

NeurIPS 2024 Conference Paper

BrainBits: How Much of the Brain are Generative Reconstruction Methods Using?

  • David Mayo
  • Christopher Wang
  • Asa Harbin
  • Abdulrahman Alabdulkareem
  • Albert Shaw
  • Boris Katz
  • Andrei Barbu

When evaluating stimulus reconstruction results, it is tempting to assume that higher fidelity text and image generation is due to an improved understanding of the brain or more powerful signal extraction from neural recordings. However, in practice, new reconstruction methods could improve performance for at least three other reasons: learning more about the distribution of stimuli, becoming better at reconstructing text or images in general, or exploiting weaknesses in current image and/or text evaluation metrics. Here we disentangle how much of the reconstruction is due to these other factors vs. productively using the neural recordings. We introduce BrainBits, a method that uses a bottleneck to quantify the amount of signal extracted from neural recordings that is actually necessary to reproduce a method's reconstruction fidelity. We find that it takes surprisingly little information from the brain to produce reconstructions with high fidelity. In these cases, it is clear that the priors of the methods' generative models are so powerful that the outputs they produce extrapolate far beyond the neural signal they decode. Given that reconstructing stimuli can be improved independently by either improving signal extraction from the brain or by building more powerful generative models, improving the latter may fool us into thinking we are improving the former. We propose that methods should report a method-specific random baseline, a reconstruction ceiling, and a curve of performance as a function of bottleneck size, with the ultimate goal of using more of the neural recordings.
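The bottleneck idea can be made concrete with a small sketch: learn a width-k compression of the neural features, feed only that through the rest of the reconstruction pipeline, and sweep k to trace performance as a function of bottleneck size. The linear bottleneck and module names below are assumptions for illustration, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class LinearBottleneck(nn.Module):
    """Illustrative sketch of a width-k bottleneck placed between neural
    recordings and a downstream (frozen) reconstruction model."""

    def __init__(self, in_dim, k):
        super().__init__()
        self.down = nn.Linear(in_dim, k)   # only k numbers pass through
        self.up = nn.Linear(k, in_dim)     # map back to the decoder's input space

    def forward(self, neural_features):
        return self.up(self.down(neural_features))

# Sweeping k = 1, 2, 4, ... and re-measuring reconstruction fidelity traces the
# performance-vs-bottleneck-size curve the authors propose reporting.
```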

ICML 2024 Conference Paper

Revealing Vision-Language Integration in the Brain with Multimodal Networks

  • Vighnesh Subramaniam
  • Colin Conwell
  • Christopher Wang
  • Gabriel Kreiman
  • Boris Katz
  • Ignacio Cases
  • Andrei Barbu

We use (multi)modal deep neural networks (DNNs) to probe for sites of multimodal integration in the human brain by predicting stereoencephalography (SEEG) recordings taken while human subjects watched movies. We operationalize sites of multimodal integration as regions where a multimodal vision-language model predicts recordings better than unimodal language, unimodal vision, or linearly-integrated language-vision models. Our target DNN models span different architectures (e.g., convolutional networks and transformers) and multimodal training techniques (e.g., cross-attention and contrastive learning). As a key enabling step, we first demonstrate that trained vision and language models systematically outperform their randomly initialized counterparts in their ability to predict SEEG signals. We then compare unimodal and multimodal models against one another. Because our target DNN models often have different architectures, number of parameters, and training sets (possibly obscuring those differences attributable to integration), we carry out a controlled comparison of two models (SLIP and SimCLR), which keep all of these attributes the same aside from input modality. Using this approach, we identify a sizable number of neural sites (on average 141 out of 1090 total sites or 12.94%) and brain regions where multimodal integration seems to occur. Additionally, we find that among the variants of multimodal training techniques we assess, CLIP-style training is the best suited for downstream prediction of the neural activity in these sites.
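The operationalization above reduces to a simple per-site comparison: a site is flagged as a candidate multimodal-integration site only if the multimodal model's prediction score beats every unimodal and linearly-integrated baseline. A minimal sketch under that reading; the score arrays and function name are assumptions, not the paper's code.

```python
import numpy as np

def multimodal_integration_sites(r_multi, r_vision, r_language, r_linear):
    """Illustrative: each argument is an array of per-site prediction scores
    (e.g., held-out correlation) for one model family. A site counts as a
    candidate multimodal-integration site when the multimodal model beats
    the vision-only, language-only, and linearly-integrated baselines."""
    r_multi, r_vision, r_language, r_linear = map(
        np.asarray, (r_multi, r_vision, r_language, r_linear))
    best_baseline = np.maximum.reduce([r_vision, r_language, r_linear])
    return np.flatnonzero(r_multi > best_baseline)
```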

ICLR 2023 Conference Paper

BrainBERT: Self-supervised representation learning for intracranial recordings

  • Christopher Wang
  • Vighnesh Subramaniam
  • Adam Uri Yaari
  • Gabriel Kreiman
  • Boris Katz
  • Ignacio Cases
  • Andrei Barbu

We create a reusable Transformer, BrainBERT, for intracranial recordings, bringing modern representation learning approaches to neuroscience. Much like in NLP and speech recognition, this Transformer enables classifying complex concepts, i.e., decoding neural data, with higher accuracy and with much less data by being pretrained in an unsupervised manner on a large corpus of unannotated neural recordings. Our approach generalizes to new subjects with electrodes in new positions and to unrelated tasks, showing that the representations robustly disentangle the neural signal. Just like in NLP, where one can study language by investigating what a language model learns, this approach opens the door to investigating the brain by what a model of the brain learns. As a first step along this path, we demonstrate a new analysis of the intrinsic dimensionality of the computations in different areas of the brain. To construct these representations, we combine a technique for producing super-resolution spectrograms of neural data with an approach designed for generating contextual representations of audio by masking. In the future, far more concepts will be decodable from neural recordings by using representation learning, potentially unlocking the brain like language models unlocked language.

NeurIPS 2023 Conference Paper

How hard are computer vision datasets? Calibrating dataset difficulty to viewing time

  • David Mayo
  • Jesse Cummings
  • Xinyu Lin
  • Dan Gutfreund
  • Boris Katz
  • Andrei Barbu

Humans outperform object recognizers despite the fact that models perform well on current datasets, including those explicitly designed to challenge machines with debiased images or distribution shift. This problem persists, in part, because we have no guidance on the absolute difficulty of an image or dataset, making it hard to objectively assess progress toward human-level performance, to cover the range of human abilities, and to increase the challenge posed by a dataset. We develop a dataset difficulty metric, MVT (Minimum Viewing Time), that addresses these three problems. Subjects view an image that flashes on screen and then classify the object in the image. Images that require only brief flashes to recognize are easy, while those that require seconds of viewing are hard. We compute the ImageNet and ObjectNet image difficulty distribution, which we find significantly undersamples hard images. Nearly 90% of current benchmark performance is derived from images that are easy for humans. Rather than hoping that we will make harder datasets, we can for the first time objectively guide dataset difficulty during development. We can also subset recognition performance as a function of difficulty: model performance drops precipitously while human performance remains stable. Difficulty provides a new lens through which to view model performance, one which uncovers new scaling laws: vision-language models stand out as being the most robust and human-like, while all other techniques scale poorly. We release tools to automatically compute MVT, along with image sets which are tagged by difficulty. Objective image difficulty has practical applications – one can measure how hard a test set is before deploying a real-world system – and scientific applications, such as discovering the neural correlates of image difficulty and enabling new object recognition techniques that eliminate the benchmark-vs-real-world performance gap.
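One operational reading of the MVT definition: for each image, take the shortest presentation time at which viewers reliably name the object. The sketch below assumes trial data of (image, presentation time, correct) and a simple majority threshold; the paper's exact estimator and released tools may differ.

```python
from collections import defaultdict

def minimum_viewing_time(trials, threshold=0.5):
    """Illustrative MVT estimate: the shortest flash duration at which the
    fraction of correct responses for an image reaches `threshold`.
    `trials` is an iterable of (image_id, duration_ms, correct) tuples."""
    by_image = defaultdict(lambda: defaultdict(list))
    for image_id, duration_ms, correct in trials:
        by_image[image_id][duration_ms].append(bool(correct))
    mvt = {}
    for image_id, durations in by_image.items():
        passing = [d for d, outcomes in sorted(durations.items())
                   if sum(outcomes) / len(outcomes) >= threshold]
        mvt[image_id] = passing[0] if passing else None  # None: never recognized
    return mvt
```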

AAAI 2023 Conference Paper

Zero-Shot Linear Combinations of Grounded Social Interactions with Linear Social MDPs

  • Ravi Tejwani
  • Yen-Ling Kuo
  • Tianmin Shu
  • Bennett Stankovits
  • Dan Gutfreund
  • Joshua B. Tenenbaum
  • Boris Katz
  • Andrei Barbu

Humans and animals engage in rich social interactions. It is often theorized that a relatively small number of basic social interactions give rise to the full range of behavior observed. But no computational theory explaining how social interactions combine has been proposed before. We do so here. We take a model, the Social MDP, which is able to express a range of social interactions, and extend it to represent linear combinations of social interactions. Practically, for robotics applications, such models can now express not just that an agent should help another agent, but goal-centric social interactions. Perhaps an agent is helping someone get dressed, but preventing them from falling, and is happy to exchange stories in the meantime. How an agent responds socially should depend on what it thinks the other agent is doing at that point in time. To encode this notion, we take linear combinations of social interactions as defined in Social MDPs and compute the weights on those combinations on the fly, depending on the estimated goals of other agents. This new model, the Linear Social MDP, enables zero-shot reasoning about complex social interactions, provides a mathematical basis for the long-standing intuition that social interactions should compose, and leads to interesting new behaviors that we validate using human observers. Complex social interactions are part of the future of intelligent agents, and having principled mathematical models built on a foundation like MDPs will make it possible to bring social interactions to every robotic application.

ICRA 2022 Conference Paper

Incorporating Rich Social Interactions Into MDPs

  • Ravi Tejwani
  • Yen-Ling Kuo
  • Tianmin Shu
  • Bennett Stankovits
  • Dan Gutfreund
  • Joshua B. Tenenbaum
  • Boris Katz
  • Andrei Barbu

Much of what we do as humans is engage socially with other agents, a skill that robots must also eventually possess. We demonstrate that a rich theory of social interactions originating from microsociology can be formalized by extending a nested MDP where agents reason about arbitrary functions of each other's rewards. This extended Social MDP allows us to encode the five basic interactions that underlie microsociology: cooperation, conflict, coercion, competition, and exchange. The result is a robotic agent capable of executing social interactions in new environments with no interaction-specific training; like humans, it can engage socially in novel ways even without a single example of that social interaction. Moreover, the estimations of these Social MDPs align closely with the judgments of humans when considering which social interaction is taking place in an environment. This method both sheds light on the nature of social interactions, by providing concrete mathematical definitions, and brings rich social interactions into a mathematical framework that has proven to be natural for robotics.

ICRA 2022 Conference Paper

Trajectory Prediction with Linguistic Representations

  • Yen-Ling Kuo
  • Xin Huang 0018
  • Andrei Barbu
  • Stephen G. McGill
  • Boris Katz
  • John J. Leonard
  • Guy Rosman

Language allows humans to build mental models that interpret what is happening around them, resulting in more accurate long-term predictions. We present a novel trajectory prediction model that uses linguistic intermediate representations to forecast trajectories, and is trained using trajectory samples with partially-annotated captions. The model learns the meaning of each of the words without direct per-word supervision. At inference time, it generates a linguistic description of trajectories which captures maneuvers and interactions over an extended time interval. This generated description is used to refine predictions of the trajectories of multiple agents. We train and validate our model on the Argoverse dataset, and demonstrate improved accuracy in trajectory prediction. In addition, our model is more interpretable: it presents part of its reasoning in plain language as captions, which can aid model development and help build confidence in the model before deploying it.

ICLR 2021 Conference Paper

Multi-resolution modeling of a discrete stochastic process identifies causes of cancer

  • Adam Uri Yaari
  • Maxwell Sherman
  • Oliver Clarke Priebe
  • Po-Ru Loh
  • Boris Katz
  • Andrei Barbu
  • Bonnie Berger

Detection of cancer-causing mutations within the vast and mostly unexplored human genome is a major challenge. Doing so requires modeling the background mutation rate, a highly non-stationary stochastic process, across regions of interest varying in size from one to millions of positions. Here, we present the split-Poisson-Gamma (SPG) distribution, an extension of the classical Poisson-Gamma formulation, to model a discrete stochastic process at multiple resolutions. We demonstrate that the probability model has a closed-form posterior, enabling efficient and accurate linear-time prediction over any length scale after the parameters of the model have been inferred a single time. We apply our framework to model mutation rates in tumors and show that model parameters can be accurately inferred from high-dimensional epigenetic data using a convolutional neural network, Gaussian process, and maximum-likelihood estimation. Our method is both more accurate and more efficient than existing models over a large range of length scales. We demonstrate the usefulness of multi-resolution modeling by detecting genomic elements that drive tumor emergence and are of vastly differing sizes.
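For context, the classical Poisson-Gamma model that the abstract extends is conjugate, which is what makes a closed-form posterior possible; the SPG-specific split across resolutions is defined in the paper and is not reproduced here. Below is only the standard single-observation conjugacy in the shape-rate parameterization.

```latex
x \mid \lambda \sim \mathrm{Poisson}(\lambda), \qquad
\lambda \sim \mathrm{Gamma}(\alpha, \beta)
\;\;\Longrightarrow\;\;
\lambda \mid x \sim \mathrm{Gamma}(\alpha + x,\; \beta + 1).
```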

NeurIPS 2021 Conference Paper

Neural Regression, Representational Similarity, Model Zoology & Neural Taskonomy at Scale in Rodent Visual Cortex

  • Colin Conwell
  • David Mayo
  • Andrei Barbu
  • Michael Buice
  • George Alvarez
  • Boris Katz

How well do deep neural networks fare as models of mouse visual cortex? A majority of research to date suggests results far more mixed than those produced in the modeling of primate visual cortex. Here, we perform a large-scale benchmarking of dozens of deep neural network models in mouse visual cortex with both representational similarity analysis and neural regression. Using the Allen Brain Observatory's 2-photon calcium-imaging dataset of activity in over 6,000 reliable rodent visual cortical neurons recorded in response to natural scenes, we replicate previous findings and resolve previous discrepancies, ultimately demonstrating that modern neural networks can in fact be used to explain activity in the mouse visual cortex to a more reasonable degree than previously suggested. Using our benchmark as an atlas, we offer preliminary answers to overarching questions about levels of analysis (e.g., do models that better predict the representations of individual neurons also predict representational similarity across neural populations?); questions about the properties of models that best predict the visual system overall (e.g., is convolution or category-supervision necessary to better predict neural activity?); and questions about the mapping between biological and artificial representations (e.g., does the information processing hierarchy in deep nets match the anatomical hierarchy of mouse visual cortex?). Along the way, we catalogue a number of models (including vision transformers, MLP-Mixers, normalization free networks, Taskonomy encoders and self-supervised models) outside the traditional circuit of convolutional object recognition. Taken together, our results provide a reference point for future ventures in the deep neural network modeling of mouse visual cortex, hinting at novel combinations of mapping method, architecture, and task to more fully characterize the computational motifs of visual representation in a species so central to neuroscience, but with a perceptual physiology and ecology markedly different from the ones we study in primates.

AAAI 2021 Conference Paper

PHASE: PHysically-grounded Abstract Social Events for Machine Social Perception

  • Aviv Netanyahu
  • Tianmin Shu
  • Boris Katz
  • Andrei Barbu
  • Joshua B. Tenenbaum

The ability to perceive and reason about social interactions in the context of physical environments is core to human social intelligence and human-machine cooperation. However, no prior dataset or benchmark has systematically evaluated physically grounded perception of complex social interactions that go beyond short actions, such as high-fiving, or simple group activities, such as gathering. In this work, we create a dataset of physically-grounded abstract social events, PHASE, that resemble a wide range of real-life social interactions by including social concepts such as helping another agent. PHASE consists of 2D animations of pairs of agents moving in a continuous space generated procedurally using a physics engine and a hierarchical planner. Agents have a limited field of view and can interact with multiple objects in an environment that has multiple landmarks and obstacles. Using PHASE, we design a social recognition task and a social prediction task. PHASE is validated with human experiments demonstrating that humans perceive rich interactions in the social events, and that the simulated agents behave similarly to humans. As a baseline model, we introduce a Bayesian inverse planning approach, SIMPLE (SIMulation, Planning and Local Estimation), which outperforms state-of-the-art feedforward neural networks. We hope that PHASE can serve as a difficult new challenge for developing new models that can recognize complex social interactions.

ICRA 2020 Conference Paper

Deep compositional robotic planners that follow natural language commands

  • Yen-Ling Kuo
  • Boris Katz
  • Andrei Barbu

We demonstrate how a sampling-based robotic planner can be augmented to learn to understand a sequence of natural language commands in a continuous configuration space to move and manipulate objects. Our approach combines a deep network structured according to the parse of a complex command that includes objects, verbs, spatial relations, and attributes, with a sampling-based planner, RRT. A recurrent hierarchical deep network controls how the planner explores the environment, determines when a planned path is likely to achieve a goal, and estimates the confidence of each move to trade off exploitation and exploration between the network and the planner. Planners are designed to have near-optimal behavior when information about the task is missing, while networks learn to exploit observations which are available from the environment, making the two naturally complementary. Combining the two enables generalization to new maps, new kinds of obstacles, and more complex sentences that do not occur in the training set. Little data is required to train the model despite it jointly acquiring a CNN that extracts features from the environment as it learns the meanings of words. The model provides a level of interpretability through the use of attention maps allowing users to see its reasoning steps despite being an end-to-end model. This end-to-end model allows robots to learn to follow natural language commands in challenging continuous environments.

IROS 2020 Conference Paper

Encoding formulas as deep networks: Reinforcement learning for zero-shot execution of LTL formulas

  • Yen-Ling Kuo
  • Boris Katz
  • Andrei Barbu

We demonstrate a reinforcement learning agent which uses a compositional recurrent neural network that takes as input an LTL formula and determines satisfying actions. The input LTL formulas have never been seen before, yet the network performs zero-shot generalization to satisfy them. This is a novel form of multi-task learning for RL agents where agents learn from one diverse set of tasks and generalize to a new set of diverse tasks. The formulation of the network enables this capacity to generalize. We demonstrate this ability in two domains. In a symbolic domain, the agent finds a sequence of letters that is accepted. In a Minecraft-like environment, the agent finds a sequence of actions that conform to the formula. While prior work could learn to execute one formula reliably given examples of that formula, we demonstrate how to encode all formulas reliably. This could form the basis of new multitask agents that discover sub-tasks and execute them without any additional training, as well as agents that follow more complex linguistic commands. The structures required for this generalization are specific to LTL formulas, which opens up an interesting theoretical question: what structures are required in neural networks for zero-shot generalization to different logics?

NeurIPS 2019 Conference Paper

ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models

  • Andrei Barbu
  • David Mayo
  • Julian Alverio
  • William Luo
  • Christopher Wang
  • Dan Gutfreund
  • Josh Tenenbaum
  • Boris Katz

We collect a large real-world test set, ObjectNet, for object recognition with controls where object backgrounds, rotations, and imaging viewpoints are random. Most scientific experiments have controls, confounds which are removed from the data, to ensure that subjects cannot perform a task by exploiting trivial correlations in the data. Historically, large machine learning and computer vision datasets have lacked such controls. This has resulted in models that must be fine-tuned for new datasets and perform better on datasets than in real-world applications. When tested on ObjectNet, object detectors show a 40-45% drop in performance, with respect to their performance on other benchmarks, due to the controls for biases. Controls make ObjectNet robust to fine-tuning, showing only small performance increases. We develop a highly automated platform that enables gathering datasets with controls by crowdsourcing image capturing and annotation. ObjectNet is the same size as the ImageNet test set (50,000 images), and by design does not come paired with a training set in order to encourage generalization. The dataset is both easier than ImageNet (objects are largely centered and unoccluded) and harder (due to the controls). Although we focus on object recognition here, data with controls can be gathered at scale using automated tools throughout machine learning to generate datasets that exercise models in new ways, thus providing valuable feedback to researchers. This work opens up new avenues for research in generalizable, robust, and more human-like computer vision and in creating datasets where results are predictive of real-world performance.

IROS 2018 Conference Paper

Deep Sequential Models for Sampling-Based Planning

  • Yen-Ling Kuo
  • Andrei Barbu
  • Boris Katz

We demonstrate how a sequence model and a sampling-based planner can influence each other to produce efficient plans and how such a model can automatically learn to take advantage of observations of the environment. Sampling-based planners such as RRT generally know nothing of their environments even if they have traversed similar spaces many times. A sequence model, such as an HMM or LSTM, guides the search for good paths. The resulting model, called DeRRT*, observes the state of the planner and the local environment to bias the next move and next planner state. The neural-network-based models avoid manual feature engineering by co-training a convolutional network which processes map features and observations from sensors. We incorporate this sequence model in a manner that combines its likelihood with the existing bias for searching large unexplored Voronoi regions. This leads to more efficient trajectories with fewer rejected samples even in difficult domains such as when escaping bug traps. This model can also be used for dimensionality reduction in multi-agent environments with dynamic obstacles. Instead of planning in a high-dimensional space that includes the configurations of the other agents, we plan in a low-dimensional subspace relying on the sequence model to bias samples using the observed behavior of the other agents. The techniques presented here are general, include both graphical models and deep learning approaches, and can be adapted to a range of planners.

IJCAI 2017 Conference Paper

Temporal Grounding Graphs for Language Understanding with Accrued Visual-Linguistic Context

  • Rohan Paul
  • Andrei Barbu
  • Sue Felshin
  • Boris Katz
  • Nicholas Roy

A robot’s ability to understand or ground natural language instructions is fundamentally tied to its knowledge about the surrounding world. We present an approach to grounding natural language utterances in the context of factual information gathered through natural-language interactions and past visual observations. A probabilistic model estimates, from a natural language utterance, the objects, relations, and actions that the utterance refers to, the objectives for future robotic actions it implies, and generates a plan to execute those actions while updating a state representation to include newly acquired knowledge from the visual-linguistic context. Grounding a command necessitates a representation for past observations and interactions; however, maintaining the full context consisting of all possible observed objects, attributes, spatial relations, actions, etc., over time is intractable. Instead, our model, Temporal Grounding Graphs, maintains a learned state representation for a belief over factual groundings, those derived from natural-language interactions, and lazily infers new groundings from visual observations using the context implied by the utterance. This work significantly expands the range of language that a robot can understand by incorporating factual knowledge and observations of its workspace into its inference about the meaning and grounding of natural-language utterances.

AAAI 1999 Conference Paper

HIKE (HPKB Integrated Knowledge Environment) — A Query Interface and Integrated Knowledge Environment for HPKB

  • Barbara H. Starr
  • Vinay K. Chaudhri
  • Boris Katz
  • Benjamin Good
  • Jerome Thomere

This demonstration is based upon the results of a research project sponsored by the Defense Advanced Research Projects Agency (DARPA), called High Performance Knowledge Bases (HPKB). The demonstrated portion of HPKB follows a question-answering paradigm. The integrated architecture developed at Science Applications International Corporation (SAIC), called the HPKB Integrated Knowledge Environment (HIKE), is introduced. Following this, the components involved in the demonstration, which include a natural language understanding system, a first-order theorem prover, and a knowledge server, are briefly described. The demonstration effectively illustrates the use of both a graphical user interface and a natural language interface to query a first-order theorem prover with similar results.

AAAI 1999 Short Paper

Word Sense Disambiguation for Information Retrieval

  • Ozlem Uzuner
  • Boris Katz
  • Deniz Yuret

Despite their increasing importance as data retrieval tools, most Information Retrieval (IR) systems are deficient in precision and recall. Lack of disambiguation power is one reason for the poor performance of these systems. Correctly disambiguating and expanding a query with intended synonyms before retrieval may improve performance. We use the local context of a word to identify its sense. In our case, the local context of a word is the ordered list of words from the closest content word on each side of the target word up to the target word itself, which is expressed as a placeholder.