Author name cluster

Deepak P

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers

1 author row

AAAI Conference 2024 Conference Paper

Towards Fairer Centroids in K-means Clustering

Stanley Simoes
Deepak P
Muiris MacCarthaigh

There has been much recent interest in developing fair clustering algorithms that seek to do justice to the representation of groups defined along sensitive attributes such as race and sex. Within the centroid clustering paradigm, these algorithms are seen to generate clusterings where different groups are disadvantaged within different clusters with respect to their representativity, i.e., distance to centroid. In view of this deficiency, we propose a novel notion of cluster-level centroid fairness that targets the representativity unfairness borne by groups within each cluster, along with a metric to quantify the same. Towards operationalising this notion, we draw on ideas from political philosophy aligned with consideration for the worst-off group to develop Fair-Centroid, a new clustering method that focusses on enhancing the representativity of the worst-off group within each cluster. Our method uses an iterative optimisation paradigm wherein an initial cluster assignment is refined by reassigning objects to clusters such that the worst-off group in each cluster benefits. We compare our notion with a related fairness notion and show through extensive empirical evaluations on real-world datasets that our method significantly enhances cluster-level centroid fairness at low impact on cluster coherence.
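
The abstract treats representativity as distance to the centroid, measured per group within each cluster. The sketch below is only an illustration of that idea, not the paper's metric or the Fair-Centroid method: the function names, the use of mean Euclidean distance, and the worst-minus-best gap are all assumptions made for this example.

```python
from collections import defaultdict
from math import dist


def group_representativity(points, clusters, groups):
    """Mean distance to the cluster centroid for each (cluster, group) pair.

    Assumption: representativity is summarised as a mean Euclidean distance.
    """
    members = defaultdict(list)
    for p, c in zip(points, clusters):
        members[c].append(p)
    # Centroid of each cluster is the coordinate-wise mean of its members.
    centroids = {
        c: tuple(sum(xs) / len(ps) for xs in zip(*ps))
        for c, ps in members.items()
    }
    totals = defaultdict(lambda: [0.0, 0])
    for p, c, g in zip(points, clusters, groups):
        totals[(c, g)][0] += dist(p, centroids[c])
        totals[(c, g)][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def worst_off_gap(rep):
    """For each cluster, return (worst-off group, worst mean minus best mean).

    Assumption: the gap is a hypothetical unfairness summary for illustration.
    """
    by_cluster = defaultdict(dict)
    for (c, g), value in rep.items():
        by_cluster[c][g] = value
    return {
        c: (max(gs, key=gs.get), max(gs.values()) - min(gs.values()))
        for c, gs in by_cluster.items()
    }


points = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
rep = group_representativity(points, [0, 0, 0], ["a", "a", "b"])
gaps = worst_off_gap(rep)
```

Under these assumptions, a reassignment step in the spirit of the abstract would aim to shrink the mean distance of the worst-off group reported for each cluster.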

AAAI Conference 2020 Conference Paper

Distributed Representations for Arithmetic Word Problems

Sowmya S Sundaram
Deepak P
Savitha Sam Abraham

We consider the task of learning distributed representations for arithmetic word problems. We outline the characteristics of the domain of arithmetic word problems that make generic text embedding methods inadequate, necessitating a specialized representation learning method to facilitate the task of retrieval across a wide range of use cases within online learning platforms. Our contribution is two-fold. First, we propose several ‘operators’ that distil knowledge of the domain of arithmetic word problems and schemas into word problem transformations. Second, we propose a novel neural architecture that combines LSTMs with graph convolutional networks to leverage word problems and their operator-transformed versions to learn distributed representations for word problems. While our target is to ensure that the distributed representations are schema-aligned, we do not make use of schema labels in the learning process, thus yielding an unsupervised representation learning method. Through an evaluation on retrieval over a publicly available corpus of word problems, we illustrate that our framework is able to consistently improve upon contemporary generic text embeddings in terms of schema-alignment.

AAAI Conference 2018 Conference Paper

Content and Context: Two-Pronged Bootstrapped Learning for Regex-Formatted Entity Extraction

Stanley Simoes
Deepak P
Munu Sairamesh
Deepak Khemani
Sameep Mehta

Regular expressions are an important building block of rule-based information extraction systems. Regexes can encode rules to recognize instances of simple entities which can then feed into the identification of more complex cross-entity relationships. Manually crafting a regex that recognizes all possible instances of an entity is difficult since an entity can manifest in a variety of different forms. Thus, the problem of automatically generalizing manually crafted seed regexes to improve the recall of IE systems has attracted research attention. In this paper, we propose a bootstrapped approach to improve the recall for extraction of regex-formatted entities, with the only source of supervision being the seed regex. Our approach starts from a manually authored high-precision seed regex for the entity of interest, and uses the matches of the seed regex and the context around these matches to identify more instances of the entity. These are then used to identify a set of diverse, high-recall regexes that are representative of this entity. Through an empirical evaluation over multiple real-world document corpora, we illustrate the effectiveness of our approach.
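
The first half of the bootstrapping loop described above (seed matches, then their context, then new instances) can be sketched roughly as follows. This is a toy illustration and not the authors' algorithm: the function name, the use of the single preceding token as "context", and the `min_context_count` threshold are all assumptions, and the later step of inducing new regexes from the candidates is not shown.

```python
import re
from collections import Counter


def bootstrap_candidates(seed_pattern, documents, min_context_count=2):
    """Find tokens that share a context with seed-regex matches.

    Assumption: "context" is just the lowercased token preceding a match.
    """
    seed = re.compile(seed_pattern)
    contexts = Counter()
    # Pass 1: count the contexts in which the high-precision seed fires.
    for doc in documents:
        tokens = doc.split()
        for prev, tok in zip(tokens, tokens[1:]):
            if seed.fullmatch(tok):
                contexts[prev.lower()] += 1
    trusted = {c for c, n in contexts.items() if n >= min_context_count}
    # Pass 2: collect tokens the seed misses that appear in a trusted context.
    candidates = set()
    for doc in documents:
        tokens = doc.split()
        for prev, tok in zip(tokens, tokens[1:]):
            if prev.lower() in trusted and not seed.fullmatch(tok):
                candidates.add(tok)
    return candidates


docs = [
    "call 555-1234 now",
    "call 555-9876 today",
    "call 5551234 please",
]
found = bootstrap_candidates(r"\d{3}-\d{4}", docs)
```

Here the seed only recognises the hyphenated form, while the shared context word surfaces the unhyphenated variant as a candidate instance that a broader regex would need to cover.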

IJCAI Conference 2017 Conference Paper

LoCaTe: Influence Quantification for Location Promotion in Location-based Social Networks

Ankita Likhyani
Srikanta Bedathur
Deepak P

Location-based social networks (LBSNs) such as Foursquare offer a platform for users to share and be aware of each other’s physical movements. As a result of sharing check-in information with each other, users can be influenced to visit (or check in at) the locations visited by their friends. Quantifying such influences in these LBSNs is useful in various settings such as location promotion, personalized recommendations, and mobility pattern prediction. In this paper, we focus on the problem of location promotion and develop a model to quantify the influence specific to a location between a pair of users. Specifically, we develop a joint model called LoCaTe, consisting of (i) a user mobility model estimated using kernel density estimates; (ii) a model of the semantics of the location using topic models; and (iii) a model of the time gap between check-ins using an exponential distribution. We validate our model on a long-term crawl of Foursquare data collected between Jan 2015 and Feb 2016, as well as on publicly available LBSN datasets. Our experiments demonstrate that LoCaTe significantly outperforms state-of-the-art models for the same task.
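
Two of the components named in the abstract, a kernel density estimate over check-in locations and an exponential model of the time gap between check-ins, can be illustrated in isolation. The sketch below is not LoCaTe itself: the function names, the isotropic Gaussian kernel, the fixed bandwidth, and the maximum-likelihood rate fit are assumptions, and the abstract does not say how the components are combined, so no joint score is shown.

```python
import math


def fit_exponential_rate(gaps):
    """Maximum-likelihood rate of an exponential distribution: n / sum(gaps)."""
    return len(gaps) / sum(gaps)


def exponential_pdf(t, rate):
    """Density of a time gap t under an exponential distribution."""
    return rate * math.exp(-rate * t) if t >= 0 else 0.0


def gaussian_kde_2d(query, checkins, bandwidth):
    """Density at `query` from a 2-D isotropic Gaussian kernel density estimate.

    Assumption: one shared bandwidth for all check-ins and both coordinates.
    """
    qx, qy = query
    h2 = bandwidth ** 2
    norm = 1.0 / (2.0 * math.pi * h2 * len(checkins))
    return norm * sum(
        math.exp(-((qx - x) ** 2 + (qy - y) ** 2) / (2.0 * h2))
        for x, y in checkins
    )


rate = fit_exponential_rate([1.0, 2.0, 3.0])   # hypothetical gaps, e.g. in days
gap_density = exponential_pdf(0.0, rate)
spatial_density = gaussian_kde_2d((0.0, 0.0), [(0.0, 0.0)], bandwidth=1.0)
```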
