I-Ta Lee Papers

NeurIPS Conference 2025 Conference Paper

ORBIT - Open Recommendation Benchmark for Reproducible Research with Hidden Tests

Jingyuan He
Jiongnan Liu
Vishan Oberoi
Bolin Wu
Mahima Jagadeesh Patel
Kangrui Mao
Chuning Shi
I-Ta Lee

Recommender systems are among the most impactful AI applications, interacting with billions of users every day, guiding them to relevant products, services, or information tailored to their preferences. However, the research and development of recommender systems are hindered by existing datasets that fail to capture realistic user behaviors and inconsistent evaluation settings that lead to ambiguous conclusions. This paper introduces the Open Recommendation Benchmark for Reproducible Research with Hidden Tests (ORBIT), a unified benchmark for consistent and realistic evaluation of recommendation models. ORBIT offers a standardized evaluation framework of public datasets with reproducible splits and transparent settings for its public leaderboard. Additionally, ORBIT introduces a new webpage recommendation task, ClueWeb-Reco, featuring web browsing sequences from 87 million public, high-quality webpages. ClueWeb-Reco is a synthetic dataset derived from real, user-consented, and privacy-guaranteed browsing data. It aligns with modern recommendation scenarios and is reserved as the hidden test part of our leaderboard to challenge recommendation models' generalization ability. ORBIT measures 12 representative recommendation models on its public benchmark and introduces a prompted LLM baseline on the ClueWeb-Reco hidden test. Our benchmark results reflect general improvements of recommender systems on the public datasets, with variable individual performances. The results on the hidden test reveal the limitations of existing approaches in large-scale webpage recommendation and highlight the potential for improvements with LLM integrations. ORBIT benchmark, leaderboard, and codebase are available at https://www.open-reco-bench.ai.

AAAI Conference 2018 Conference Paper

FEEL: Featured Event Embedding Learning

I-Ta Lee
Dan Goldwasser

Statistical script learning is an effective way to acquire world knowledge which can be used for commonsense reasoning. Statistical script learning induces this knowledge by observing event sequences generated from texts. The learned model thus can predict subsequent events, given earlier events. Recent approaches rely on learning event embeddings which capture script knowledge. In this work, we suggest a general learning model, Featured Event Embedding Learning (FEEL), for injecting event embeddings with fine-grained information. In addition to capturing the dependencies between subsequent events, our model can take into account higher level abstractions of the input event which help the model generalize better and account for the global context in which the event appears. We evaluated our model over three narrative cloze tasks, and showed that our model is competitive with the most recent state-of-the-art. We also show that our resulting embedding can be used as a strong representation for advanced semantic tasks such as discourse parsing and sentence semantic relatedness.
