Author name cluster

Sameep Mehta

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
1 author row

Possible papers

AAAI Conference 2026 System Paper

DFAgent: From Natural Language Data Interactions to Reusable Agent-Ready Tools

  • Neelamadhav Gantayat
  • Renuka Sindhgatta
  • Sambit Ghosh
  • Sameep Mehta
  • Soujanya Soni

We present DataFoundry Agent (DFAgent), a system that forges reusable, agent-ready tools from interactive data exploration, quality, and remediation tasks. Users engage with data through natural-language prompts for operations that include inspection, transformation, and visualization. These interactions automatically generate executable code snippets that are logged. From these snippets, DFAgent acts as a foundry, synthesizing a governed catalog of enriched tools exposed via the Model Context Protocol (MCP). In this way, user-derived logic for all data operations is transformed into standardized, composable tools without reimplementation. We demonstrate how diverse interactions accumulate into a reusable toolset, highlighting a paradigm that unifies natural language interaction, executable code generation, and tool foundry processes for agentic data systems.

AAAI Conference 2026 System Paper

ToolSmith: A Multi-Agent Framework for Enterprise Tool Creation

  • Purna Chandra Sekhar Vakudavathu
  • Kushal Mukherjee
  • Jayachandu Bandlamudi
  • Renuka Sindhgatta
  • Sameep Mehta

Although LLMs can generate tools for generic domains and tasks, they struggle with enterprise-related domains that involve proprietary APIs and data schemas. We present ToolSmith, a framework for autonomously generating and validating agent-compatible tools. Given an API specification and a Tool Specification Requirement (TSR), ToolSmith produces a tool function and verifies it through a closed-loop process: it creates natural language (NL) tests and executes the tool in a secure agent sandbox for validation. For state-changing tools, ToolSmith confirms outcomes by querying the API with parameters derived from the NL tests. If the tool fails to produce the desired output, ToolSmith generates diagnostic feedback to iteratively regenerate it. By ensuring both functional correctness and agent compatibility, ToolSmith enables reliable automation of enterprise workflows.

AAAI Conference 2025 System Paper

Question-guided Insights Generation for Automated Exploratory Data Analysis

  • Abhijit Manatkar
  • Ashlesha Akella
  • Krishnasuri Narayanam
  • Sameep Mehta

Exploratory Data Analysis (EDA) derives meaningful insights from extensive and complex datasets. This process typically involves a series of analytical operations to identify the patterns within the data. However, the effectiveness of EDA is often limited by the user's domain knowledge and proficiency in data exploration methods. To overcome these challenges, we developed QUIS, a fully automated EDA system that uncovers insights by generating data-related questions and exploring subspaces in the dataset without prior training. QUIS allows users to control key system parameters such as beam width, beam depth, and expansion factor for subspace selection, the interestingness score for filtering valuable insights, and parameters for managing the quality and quantity of generated questions.

IJCAI Conference 2024 Conference Paper

LLM-powered GraphQL Generator for Data Retrieval

  • Balaji Ganesan
  • Sambit Ghosh
  • Nitin Gupta
  • Manish Kesarwani
  • Sameep Mehta
  • Renuka Sindhgatta

GraphQL offers an efficient, powerful, and flexible alternative to REST APIs. However, application developers writing GraphQL clients need both technical and domain-specific expertise to reap its benefits and avoid over-fetching or under-fetching data. Automated GraphQL generation has so far proven to be a hard problem because of complex GraphQL schemas and a lack of benchmark datasets. To address these issues, our work focuses on building an LLM-powered pipeline that accepts user requirements in natural language along with the complex GraphQL schema and automatically produces the GraphQL query needed to retrieve the necessary data. Automated GraphQL generation helps lower entry barriers for application developers, broadening GraphQL adoption.

AAAI Conference 2024 System Paper

LLMGuard: Guarding against Unsafe LLM Behavior

  • Shubh Goyal
  • Medha Hira
  • Shubham Mishra
  • Sukriti Goyal
  • Arnav Goel
  • Niharika Dadu
  • Kirushikesh DB
  • Sameep Mehta

Although the rise of Large Language Models (LLMs) in enterprise settings brings new opportunities and capabilities, it also brings challenges, such as the risk of generating inappropriate, biased, or misleading content that violates regulations and can raise legal concerns. To mitigate this, we present "LLMGuard", a tool that monitors user interactions with an LLM application and flags content against specific behaviours or conversation topics. To do this robustly, LLMGuard employs an ensemble of detectors.
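The "ensemble of detectors" idea can be illustrated with a minimal sketch. The abstract does not say which detectors LLMGuard uses or how their results are combined, so everything below (the two toy detectors, the `DETECTORS` registry, and the `flag` function) is a hypothetical stand-in, not the tool's actual design.

```python
import re
from typing import Callable, Dict, List

# Hypothetical detector type: takes text, returns True when the text should be flagged.
Detector = Callable[[str], bool]


def contains_email(text: str) -> bool:
    """Toy PII detector: flags anything that looks like an email address."""
    return re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", text) is not None


def mentions_banned_topic(text: str) -> bool:
    """Toy topic detector: flags words from a fixed, illustrative keyword list."""
    banned = {"politics", "violence"}
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & banned)


# Assumed registry; a real system would plug in trained classifiers here.
DETECTORS: Dict[str, Detector] = {
    "pii": contains_email,
    "banned_topic": mentions_banned_topic,
}


def flag(text: str) -> List[str]:
    """Run every detector independently and return the names of those that fired."""
    return [name for name, detect in DETECTORS.items() if detect(text)]
```

Keeping each detector independent means one can be added, removed, or retrained without touching the others; how a real deployment would weigh or act on the flags is left open here.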

AAAI Conference 2020 Short Paper

Multidimensional Analysis of Trust in News Articles (Student Abstract)

  • Avneet Kaur
  • Maitree Leekha
  • Utkarsh Chawla
  • Ayush Agarwal
  • Mudit Saxena
  • Nishtha Madaan
  • Kalapriya Kannan
  • Sameep Mehta

The advancements in the field of Information Communication Technology have engendered revolutionary changes in the journalism industry, not only for journalists and media personnel, but also for the people consuming news stories, who today are only a click away from all the updates they need. However, these advances have also exposed the prevailing venality, wearing away the public's trust in news media. How, then, does an individual discern which of the countless news stories about an incident should be trusted? This work introduces a system that presents the user with a multidimensional analysis of trust in news from various media sources, based on the textual content of the articles, an assessment of the journalists’ perspectives, and the temporal diversity of the issues covered by the media houses publishing the news articles. Our experiments on a self-collected dataset confirm that the system aids in a comprehensive analysis of trust.

AAAI Conference 2018 Conference Paper

Content and Context: Two-Pronged Bootstrapped Learning for Regex-Formatted Entity Extraction

  • Stanley Simoes
  • Deepak P
  • Munu Sairamesh
  • Deepak Khemani
  • Sameep Mehta

Regular expressions are an important building block of rule-based information extraction (IE) systems. Regexes can encode rules to recognize instances of simple entities, which can then feed into the identification of more complex cross-entity relationships. Manually crafting a regex that recognizes all possible instances of an entity is difficult since an entity can manifest in a variety of different forms. Thus, the problem of automatically generalizing manually crafted seed regexes to improve the recall of IE systems has attracted research attention. In this paper, we propose a bootstrapped approach to improve the recall for extraction of regex-formatted entities, with the only source of supervision being the seed regex. Our approach starts from a manually authored high-precision seed regex for the entity of interest, and uses the matches of the seed regex and the context around these matches to identify more instances of the entity. These are then used to identify a set of diverse, high-recall regexes that are representative of this entity. Through an empirical evaluation over multiple real-world document corpora, we illustrate the effectiveness of our approach.
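The content-and-context loop described in this abstract can be sketched roughly as follows. This is a deliberately simplified illustration, not the paper's algorithm: it assumes the "context" is just the single word before a match, and it generalizes a new instance by its character shape. The function names and the `min_support` threshold are hypothetical.

```python
import re
from collections import Counter


def bootstrap_candidates(docs, seed_regex, min_support=2):
    """Use seed matches (content) and the word before them (context) to find new instances."""
    seed = re.compile(seed_regex)
    contexts = Counter()
    seed_hits = set()
    # Step 1: collect seed matches and the word immediately preceding each one.
    for doc in docs:
        for m in seed.finditer(doc):
            seed_hits.add(m.group())
            before = doc[:m.start()].split()
            if before:
                contexts[before[-1].lower()] += 1
    # Step 2: trust contexts seen at least min_support times, then collect the
    # tokens that follow a trusted context but were missed by the seed regex.
    trusted = {c for c, n in contexts.items() if n >= min_support}
    new_hits = set()
    for doc in docs:
        tokens = doc.split()
        for prev, tok in zip(tokens, tokens[1:]):
            tok = tok.strip(".,;:")
            if tok and prev.lower() in trusted and tok not in seed_hits:
                new_hits.add(tok)
    return seed_hits, new_hits


def shape_regex(token):
    """Generalize a token into a regex: digits become a digit class, ASCII letters a letter class."""
    parts = []
    for ch in token:
        if ch.isdigit():
            parts.append(r"\d")
        elif ch.isascii() and ch.isalpha():
            parts.append("[A-Za-z]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


docs = ["Call 555-1234 today", "call 555-9876 now", "Call 5559999 please"]
seed_hits, new_hits = bootstrap_candidates(docs, r"\d{3}-\d{4}")
new_regexes = {shape_regex(t) for t in new_hits}
```

Here the seed regex only covers hyphenated numbers, but the shared context word "call" surfaces the unhyphenated form, which is then turned into an additional pattern. A realistic version would need wider context windows and some scoring to keep precision from collapsing.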

AAAI Conference 2018 Short Paper

Semantic Understanding for Contextual In-Video Advertising

  • Rishi Madhok
  • Shashank Mujumdar
  • Nitin Gupta
  • Sameep Mehta

With the increasing consumer base of online video content, it is important for advertisers to understand the video context when targeting video ads to consumers. To improve the consumer experience and the quality of ads, key factors need to be considered, such as (i) ad relevance to video content, (ii) where and how video ads are placed, and (iii) a non-intrusive user experience. We propose a framework that semantically understands the video content to produce ad recommendations that satisfy these criteria.

IJCAI Conference 2015 Conference Paper

Tracking Political Elections on Social Media: Applications and Experience

  • Danish Contractor
  • Bhupesh Chawda
  • Sameep Mehta
  • L Venkata Subramaniam
  • Tanveer Afzal Faruquie

In recent times, social media has become a popular medium for many election campaigns. Not only does it allow candidates to reach out to a large section of the electorate, it is also a potent medium for people to express their opinions on the proposed policies and promises of candidates. Analyzing social media data is challenging as the text can be noisy, sparse, and even multilingual. In addition, the information may not be completely trustworthy, particularly in the presence of propaganda, promotions, and rumors. In this paper we describe our work on analyzing election campaigns using social media data. Using data from the 2012 US presidential elections and the 2013 Philippines General elections, we provide detailed experiments on our methods, which use Granger causality to identify the topics that were most “causal” for public opinion and which, in turn, give an interpretable insight into the “election topics” that were most important. Our system was deployed by the largest media organization in the Philippines during the 2013 General elections; using our work, the media house was able to identify and report news stories much faster than its competitors and reported higher TRP ratings during the election.

IJCAI Conference 2011 Conference Paper

A System for Providing Differentiated QoS in Retail Banking

  • Sameep Mehta
  • Girish Chafle
  • Gyana Parija
  • Vikas Kedia

In today's services-driven economic environment, it is imperative for organizations to provide a better-quality service experience to differentiate and grow their business. Customer satisfaction (C-SAT) is the key driver for retention and growth in Retail Banking. Wait time, the time spent by a customer at the branch before being served, contributes significantly to C-SAT. Due to high footfall, it is impractical to improve the wait time of every customer walking into the branch. Therefore, banks in developing countries are strategically looking to segment their customers and services and offer differentiated, QoS-based service delivery. In this work, we present a system for customer segmentation and scheduling based on the historic value of the customer and the characteristics of the current service request. We describe the system and give a mathematical formulation of the scheduling problem and the associated heuristics. We present results and experience from deploying this solution in multiple branches of a leading bank in India.
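Scheduling on customer value and request characteristics can be pictured as a priority queue. The abstract does not give the paper's formulation or heuristics, so the scoring rule below (expected service minutes minus a weighted customer value) and all names, including `BranchQueue` and `value_weight`, are assumptions made purely for illustration.

```python
import heapq
from itertools import count


class BranchQueue:
    """Toy differentiated-QoS queue: the lowest score is served first."""

    def __init__(self, value_weight=10.0):
        self._heap = []
        self._arrivals = count()  # tie-breaker so equal scores are served in arrival order
        self.value_weight = value_weight

    def arrive(self, name, customer_value, service_minutes):
        # Assumed scoring rule: short requests and high-value customers move up the queue.
        score = service_minutes - self.value_weight * customer_value
        heapq.heappush(self._heap, (score, next(self._arrivals), name))

    def next_customer(self):
        """Return the name of the next customer to serve, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = BranchQueue()
queue.arrive("walk-in", customer_value=0.1, service_minutes=5)
queue.arrive("premium", customer_value=0.9, service_minutes=8)
queue.arrive("quick", customer_value=0.1, service_minutes=2)
served = [queue.next_customer() for _ in range(3)]
```

A static score like this can starve low-value customers during busy periods; a practical scheduler would likely add an aging term that raises priority as wait time grows.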