Arrow Research search

Author name cluster

Dongwon Lee

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
1 author row

Possible papers

10

AAAI Conference 2024 Conference Paper

ALISON: Fast and Effective Stylometric Authorship Obfuscation

  • Eric Xing
  • Saranya Venkatraman
  • Thai Le
  • Dongwon Lee

Authorship Attribution (AA) and Authorship Obfuscation (AO) are two competing tasks of increasing importance in privacy research. Modern AA leverages an author's consistent writing style to match a text to its author using an AA classifier. AO is the corresponding adversarial task, aiming to modify a text in such a way that its semantics are preserved, yet an AA model cannot correctly infer its authorship. To address privacy concerns raised by state-of-the-art (SOTA) AA methods, new AO methods have been proposed but remain largely impractical to use due to their prohibitively slow training and obfuscation speed, often taking hours. To address this challenge, we propose a practical AO method, ALISON, that (1) dramatically reduces training/obfuscation time, demonstrating more than 10x faster obfuscation than SOTA AO methods, (2) achieves better obfuscation success when attacking three transformer-based AA methods on two benchmark datasets, typically performing 15% better than competing methods, (3) does not require direct signals from a target AA classifier during obfuscation, and (4) utilizes unique stylometric features, allowing sound model interpretation for explainable obfuscation. We also demonstrate that ALISON can effectively prevent four SOTA AA methods from accurately determining the authorship of ChatGPT-generated texts, all while minimally changing the original text semantics. To ensure the reproducibility of our findings, our code and data are available at: https://github.com/EricX003/ALISON.

IS Journal 2024 Journal Article

The Longtail Impact of Generative AI on Disinformation: Harmonizing Dichotomous Perspectives

  • Jason S. Lucas
  • Barani Maung Maung
  • Maryam Tabar
  • Keegan McBride
  • Dongwon Lee

Generative AI (GenAI) poses significant risks in creating convincing yet factually ungrounded content, particularly in “longtail” contexts of high-impact events and resource-limited settings. While some argue that current disinformation ecosystems naturally limit GenAI’s impact, we contend that this perspective neglects longtail contexts where disinformation consequences are most profound. This article analyzes the potential impact of GenAI’s disinformation in longtail events and settings, focusing on 1) quantity: its ability to flood information ecosystems during critical events; 2) quality: the challenge of distinguishing authentic content from high-quality GenAI content; 3) personalization: its capacity for precise microtargeting exploiting individual vulnerabilities; and 4) hallucination: the danger of unintentional false information generation, especially in high-stakes situations. We then propose strategies to combat disinformation in these contexts. Our analysis underscores the need for proactive measures to mitigate risks, safeguard social unity, and combat the erosion of trust in the GenAI era, particularly in vulnerable communities and during critical events.

AAAI Conference 2023 Conference Paper

LANCER: A Lifetime-Aware News Recommender System

  • Hong-Kyun Bae
  • Jeewon Ahn
  • Dongwon Lee
  • Sang-Wook Kim

Based on the observation that users reading news tend not to click outdated news, we propose the notion of the 'lifetime' of news, with two hypotheses: (i) news has a shorter lifetime than other types of items such as movies or e-commerce products; (ii) news competes only with other news whose lifetime has not ended and overlaps with its own (i.e., limited competition). Building on these characteristics of news lifetime, we then present a novel approach to news recommendation, the Lifetime-Aware News reCommEndeR System (LANCER), which carefully exploits the lifetime of news during training and recommendation. Using real-world news datasets (e.g., Adressa and MIND), we demonstrate that state-of-the-art news recommendation models can benefit significantly from integrating the notion of lifetime and LANCER, with recommendation accuracy increasing by up to about 40%.
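The two lifetime hypotheses can be illustrated with a minimal sketch. It assumes, purely for illustration, a single fixed lifetime shared by all articles; the `Article` type and the `alive_candidates` and `overlapping_negatives` helpers are hypothetical names, not LANCER's actual code, and LANCER may model lifetime differently. The first helper keeps only articles still alive at recommendation time; the second restricts training negatives to articles whose lifetime window overlaps the clicked article's window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Article:
    news_id: str
    published: datetime


def alive_candidates(articles, now, lifetime):
    """Keep articles already published whose (assumed fixed) lifetime has not ended at `now`."""
    return [a for a in articles if timedelta(0) <= now - a.published <= lifetime]


def overlapping_negatives(clicked, articles, lifetime):
    """Hypothetical negative sampling: only news whose lifetime window
    [published, published + lifetime] overlaps the clicked article's window."""
    return [
        a for a in articles
        if a.news_id != clicked.news_id
        and abs(a.published - clicked.published) <= lifetime
    ]
```

For example, with `lifetime=timedelta(hours=36)`, an article published two days before or after the clicked one is excluded from its negatives, since the two windows never overlap. Using one global lifetime keeps the sketch short; a per-article or learned lifetime would slot into the same filters.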

AAMAS Conference 2022 Conference Paper

CAPS: Comprehensible Abstract Policy Summaries for Explaining Reinforcement Learning Agents

  • Joe McCalmon
  • Thai Le
  • Sarra Alqahtani
  • Dongwon Lee

As reinforcement learning (RL) continues to improve and is increasingly applied in settings alongside humans, the need to explain the learned behaviors of RL agents to end-users becomes more important. Strategies for explaining the reasoning behind an agent’s policy, called policy-level explanations, can lead to important insights about both the task and the agent’s behaviors. Following this line of research, in this work, we propose a novel approach, named CAPS, that summarizes an agent’s policy in the form of a directed graph with natural language descriptions. A decision-tree-based clustering method is used to abstract the state space of the task into fewer, condensed states, which makes the policy graphs more digestible to end-users. This abstraction allows users to control the size of the policy graph to achieve their desired balance between comprehensibility and accuracy. In addition, we develop a heuristic optimization method to find the most explainable graph policy and present it to the users. Finally, we use user-defined predicates to enrich the abstract states with semantic meaning. We test our approach on 5 RL tasks, using both deterministic and stochastic policies, and show that our method is: (1) agnostic to the algorithms used to train the policies, and (2) comparable in accuracy and superior in explanation capabilities to existing baselines. In particular, when provided with our explanation graph, end-users are able to accurately interpret policies of trained RL agents 80% of the time, compared to 10% when provided with the next best baseline. We make our code and datasets available to ensure the reproducibility of our research findings: https://github.com/mccajl/CAPS

IJCAI Conference 2022 Conference Paper

Forecasting the Number of Tenants At-Risk of Formal Eviction: A Machine Learning Approach to Inform Public Policy

  • Maryam Tabar
  • Wooyong Jung
  • Amulya Yadav
  • Owen Wilson Chavez
  • Ashley Flores
  • Dongwon Lee

Eviction of tenants has reached a crisis level in the U.S., and its consequences pose significant challenges to society. To tackle this eviction crisis, policymakers have been allocating financial resources, but allocating them more efficiently requires an accurate forecast, ahead of time, of the number of tenants at risk of eviction. To help enhance existing eviction prevention/diversion programs, in this work we propose a multi-view deep neural network model, named MARTIAN, that forecasts the number of tenants at risk of being formally evicted (at the census tract level) n months into the future. We then evaluate MARTIAN’s predictive performance under various conditions using real-world eviction cases filed across Dallas County, TX. The results of our empirical evaluation show that MARTIAN outperforms an extensive set of baseline models in terms of predictive performance. Additionally, MARTIAN’s superior predictive performance generalizes to unseen census tracts, for which no labeled data is available in the training set. This research has been done in collaboration with the Child Poverty Action Lab (CPAL), a pioneering non-governmental organization (NGO) working to tackle poverty-related issues across Dallas County, TX. The usability of MARTIAN is under review by subject matter experts. We release our codebase at https://github.com/maryam-tabar/MARTIAN.

AAAI Conference 2018 Conference Paper

gOCCF: Graph-Theoretic One-Class Collaborative Filtering Based on Uninteresting Items

  • Yeon-Chang Lee
  • Sang-Wook Kim
  • Dongwon Lee

We investigate how to address the shortcomings of popular One-Class Collaborative Filtering (OCCF) methods in handling challenging “sparse” datasets in the one-class setting (e.g., clicked or bookmarked), and propose a novel graph-theoretic OCCF approach, named gOCCF, that exploits both positive preferences (derived from rated items) and negative preferences (derived from unrated items). After capturing both positive and negative preferences as a bipartite graph, we apply graph shattering theory to determine the right amount of negative preferences to use. We then develop a suite of novel graph-based OCCF methods based on the random walk with restart and belief propagation methods. Through extensive experiments using 3 real-life datasets, we show that gOCCF effectively addresses the sparsity challenge and significantly outperforms all 8 competing methods in accuracy on very sparse datasets, while providing accuracy comparable to the best-performing OCCF methods on less sparse datasets. The datasets and implementations used in the empirical validation are available for access: https://goo.gl/sfiawn.
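Random walk with restart (RWR), one of the building blocks named above, can be sketched on a user-item bipartite graph as follows. This is a generic, minimal power-iteration version and not the gOCCF implementation: it uses only positive (rated) edges and omits the negative preferences and graph-shattering step the abstract describes. The function name `random_walk_with_restart`, the restart probability of 0.15, and the toy graph are illustrative assumptions.

```python
def random_walk_with_restart(adj, seed, restart=0.15, iters=100, tol=1e-10):
    """Generic RWR by power iteration on an undirected graph.

    adj: dict mapping each node to a list of its neighbours.
    Returns the stationary visiting probability of each node for a walker
    that, at every step, jumps back to `seed` with probability `restart`.
    """
    nodes = list(adj)
    score = {n: 0.0 for n in nodes}
    score[seed] = 1.0
    for _ in range(iters):
        new = {n: 0.0 for n in nodes}
        for n in nodes:
            if not adj[n]:
                continue  # isolated node: nothing to propagate
            share = (1.0 - restart) * score[n] / len(adj[n])
            for m in adj[n]:
                new[m] += share
        new[seed] += restart  # restart mass returns to the query user
        diff = sum(abs(new[n] - score[n]) for n in nodes)
        score = new
        if diff < tol:
            break
    return score


# Toy bipartite graph (hypothetical): users u1, u2 and the items they clicked.
graph = {
    "u1": ["i1", "i2"],
    "u2": ["i2", "i3"],
    "i1": ["u1"],
    "i2": ["u1", "u2"],
    "i3": ["u2"],
}
scores = random_walk_with_restart(graph, "u1")
# Unclicked items for u1, ranked by RWR score, are recommendation candidates.
candidates = sorted((i for i in ("i1", "i2", "i3") if i not in graph["u1"]),
                    key=scores.get, reverse=True)
```

Here `candidates` contains only `i3`, which u1 reaches through the shared item `i2`. The restart term keeps the iteration contractive even though bipartite walks are periodic, so a fixed iteration cap is sufficient for small graphs.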

AAAI Conference 2018 Conference Paper

ROAR: Robust Label Ranking for Social Emotion Mining

  • Jason (Jiasheng) Zhang
  • Dongwon Lee

Understanding and predicting the latent emotions of users toward online content, known as social emotion mining, has become increasingly important to social platforms and businesses alike. Despite recent developments, however, very little attention has been paid to the issues of nuance, subjectivity, and bias in social emotions. In this paper, we fill this gap by formulating social emotion mining as a robust label ranking problem, and propose: (1) a robust measure, named G-mean-rank (GMR), which sets a formal criterion consistent with practical intuition; and (2) a simple yet effective label ranking model, named ROAR, that is more robust toward unbalanced datasets (which are common). Through comprehensive empirical validation using 4 real datasets and 16 benchmark semi-synthetic label ranking datasets, and a case study, we demonstrate the superiority of our proposals over 2 popular label ranking measures and 6 competing label ranking algorithms. The datasets and implementations used in the empirical validation are available for access.

AAAI Conference 2010 Conference Paper

What Is an Opinion About? Exploring Political Standpoints Using Opinion Scoring Model

  • Bi Chen
  • Leilei Zhu
  • Daniel Kifer
  • Dongwon Lee

In this paper, we propose a generative model to automatically discover the hidden associations between topic words and opinion words. By applying these discovered associations, we construct opinion scoring models to extract the statements that best express opinionists’ standpoints on certain topics. In our experiments, we apply our model to the political domain. First, we visualize the similarities and dissimilarities between Republican and Democratic senators with respect to various topics. Second, we compare the performance of the opinion scoring models against 14 kinds of methods to find the best ones. We find that sentences extracted by our opinion scoring models can effectively express opinionists’ standpoints.

IJCAI Conference 2009 Conference Paper

  • Hyunyoung Kil
  • Wonhong Nam
  • Dongwon Lee

The Web Service Composition (WSC) problem with respect to behavioral descriptions deals with the automatic synthesis of a coordinator web service, c, that controls a set of web services to reach a goal state. Despite its importance, however, solving the WSC problem in the general case (when c has only partial observations) remains doubly exponential in the number of variables in the web service descriptions, rendering any attempt to compute an exact solution impractical even for problems of modest size. To address this challenge, in this paper we propose two novel approximation-based approaches (signature preserving and subsuming) using abstraction and refinement. We empirically validate that our proposals can solve realistic problems efficiently.