Author name cluster

Sun Kim

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers

2 author rows

ICML Conference 2025 Conference Paper

BounDr.E: Predicting Drug-likeness via Biomedical Knowledge Alignment and EM-like One-Class Boundary Optimization

Dongmin Bang
Inyoung Sung
Yinhua Piao
Sangseon Lee
Sun Kim

The advent of generative AI now enables large-scale de novo design of molecules, but identifying viable drug candidates among them remains an open problem. Existing drug-likeness prediction methods often rely on ambiguous negative sets or purely structural features, limiting their ability to accurately distinguish drugs from non-drugs. In this work, we introduce BounDr.E, which models drug-likeness as a compact space surrounding approved drugs through a dynamic one-class boundary approach. Specifically, we enrich the chemical space through biomedical knowledge alignment, and then iteratively tighten the drug-like boundary by pushing non-drug-like compounds outside via an Expectation-Maximization (EM)-like process. Empirically, BounDr.E achieves a 10% F1-score improvement over the previous state-of-the-art and demonstrates robust cross-dataset performance, including zero-shot toxic compound filtering. Additionally, we showcase its effectiveness through comprehensive case studies in large-scale in silico screening. Our code and constructed benchmark data under various schemes are available at https://github.com/eugenebang/boundr_e.
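The idea of iteratively tightening a one-class boundary around approved drugs can be pictured with a small sketch. This is not the BounDr.E implementation: it assumes precomputed 2-D embeddings, a plain Euclidean center-and-radius boundary, and a hypothetical `quantile` hyperparameter. The "M-step" refits the center and radius on the points currently kept inside, and the "E-step" re-decides which points fall inside the tightened boundary.

```python
import math

def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n, dim = len(points), len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dim))

def fit_boundary(drugs, quantile=0.8, n_iter=10):
    """Iteratively tighten a center/radius boundary around drug embeddings.

    `quantile` is a hypothetical hyperparameter: the radius is set to the
    distance at that quantile among the points currently kept inside.
    """
    inside = list(drugs)
    center, radius = centroid(inside), float("inf")
    for _ in range(n_iter):
        # M-step: refit the center and radius on the current inside set.
        center = centroid(inside)
        dists = sorted(math.dist(p, center) for p in inside)
        radius = dists[min(len(dists) - 1, int(quantile * len(dists)))]
        # E-step: keep only the points within the tightened boundary.
        new_inside = [p for p in drugs if math.dist(p, center) <= radius]
        if new_inside == inside:  # no point changed side: converged
            break
        inside = new_inside
    return center, radius

def is_drug_like(x, center, radius):
    """Classify a compound embedding by whether it falls inside the boundary."""
    return math.dist(x, center) <= radius

# Hypothetical embeddings: a tight cluster of drugs plus one outlier.
drugs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1), (5.0, 5.0)]
center, radius = fit_boundary(drugs)
```

On this toy input the outlier is dropped after the first pass, and the boundary settles on the tight cluster around the origin.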

Details

ICLR Conference 2025 Conference Paper

CheapNet: Cross-attention on Hierarchical representations for Efficient protein-ligand binding Affinity Prediction

Hyukjun Lim
Sun Kim
Sangseon Lee

Accurately predicting protein-ligand binding affinity is a critical challenge in drug discovery, crucial for understanding drug efficacy. While existing models typically rely on atom-level interactions, they often fail to capture the complex, higher-order interactions, resulting in noise and computational inefficiency. Transitioning to modeling these interactions at the cluster level is challenging because it is difficult to determine which atoms form meaningful clusters that drive the protein-ligand interactions. To address this, we propose CheapNet, a novel interaction-based model that integrates atom-level representations with hierarchical cluster-level interactions through a cross-attention mechanism. By employing differentiable pooling of atom-level embeddings, CheapNet efficiently captures essential higher-order molecular representations crucial for accurate binding predictions. Extensive evaluations demonstrate that CheapNet not only achieves state-of-the-art performance across multiple binding affinity prediction tasks but also maintains prediction accuracy with reasonable computational efficiency. The code of CheapNet is available at https://github.com/hyukjunlim/CheapNet.
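A minimal sketch of the two ingredients named above, differentiable pooling of atom embeddings into clusters and cross-attention between the resulting cluster sets, is shown below. It is an illustration under assumptions, not CheapNet's code: the soft assignment uses a single linear map (DiffPool computes it with a GNN), there are no learned query/key/value projections, and all embeddings and weights are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    total = sum(e)
    return [v / total for v in e]

def diff_pool(x, w_assign):
    """Soft-assign atoms to clusters, S = softmax(X W), then pool: S^T X."""
    s = [softmax(r) for r in matmul(x, w_assign)]  # atoms x clusters
    return matmul(transpose(s), x)                   # clusters x features

def cross_attention(queries, keys, values):
    """Scaled dot-product attention of query clusters over key/value clusters."""
    d = len(queries[0])
    scores = [[sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
              for q in queries]
    weights = [softmax(r) for r in scores]
    return matmul(weights, values)

# Hypothetical 2-D atom embeddings for a ligand and a protein pocket.
ligand_atoms = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
protein_atoms = [[0.5, 0.5], [1.0, 1.0], [0.0, 0.3], [0.9, 0.1]]
w_assign = [[2.0, -1.0], [-1.0, 2.0]]  # hypothetical features -> 2 clusters

ligand_clusters = diff_pool(ligand_atoms, w_assign)
protein_clusters = diff_pool(protein_atoms, w_assign)
# Ligand clusters attend over protein clusters.
context = cross_attention(ligand_clusters, protein_clusters, protein_clusters)
```

Because each atom's assignment row sums to one, pooling redistributes atom features across clusters without changing their column totals, which keeps the cluster-level view consistent with the atom-level one.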

Details

ICML Conference 2025 Conference Paper

CombiMOTS: Combinatorial Multi-Objective Tree Search for Dual-Target Molecule Generation

Thibaud Southiratn
Bonil Koo
Yijingxiu Lu
Sun Kim

Dual-target molecule generation, which focuses on discovering compounds capable of interacting with two target proteins, has garnered significant attention due to its potential for improving therapeutic efficiency, safety, and resistance mitigation. Existing approaches face two critical challenges. First, by simplifying the complex dual-target optimization problem to scalarized combinations of individual objectives, they fail to capture important trade-offs between target engagement and molecular properties. Second, they typically do not integrate synthetic planning into the generative process. This highlights a need for more appropriate objective function design and synthesis-aware methodologies tailored to the dual-target molecule generation task. In this work, we propose CombiMOTS, a Pareto Monte Carlo Tree Search (PMCTS) framework that generates dual-target molecules. CombiMOTS is designed to explore a synthesizable fragment space while employing vectorized optimization constraints to encapsulate target affinity and physicochemical properties. Extensive experiments on real-world databases demonstrate that CombiMOTS produces novel dual-target molecules with high docking scores, enhanced diversity, and balanced pharmacological characteristics, showcasing its potential as a powerful tool for dual-target drug discovery. The code and data are accessible at https://github.com/Tibogoss/CombiMOTS.
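The contrast between scalarized objectives and vectorized ones comes down to Pareto dominance: instead of collapsing objectives into one weighted sum, a Pareto search keeps every candidate that no other candidate beats on all objectives at once. The sketch below shows only that dominance check and the resulting front; the tree search, fragment space, and real objectives are not shown, and the molecule names and objective values (two target affinities and a drug-likeness score, all maximized) are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the (name, objectives) pairs not dominated by any other candidate."""
    return [
        (name, obj) for name, obj in candidates
        if not any(dominates(other, obj) for _, other in candidates)
    ]

# Hypothetical (affinity to target 1, affinity to target 2, drug-likeness).
molecules = [
    ("A", (0.9, 0.2, 0.5)),
    ("B", (0.5, 0.6, 0.7)),
    ("C", (0.4, 0.5, 0.6)),  # worse than B on every objective
    ("D", (0.3, 0.9, 0.4)),
]
front = pareto_front(molecules)
```

A, B, and D each represent a different trade-off between the two targets and survive together, whereas any single weighted sum would rank one of them first and hide the others.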

Details

JBHI Journal 2025 Journal Article

Dual Representation Learning for Predicting Drug-Side Effect Frequency Using Protein Target Information

Sungjoon Park
Sangseon Lee
Minwoo Pak
Sun Kim

Knowledge of the unintended effects of drugs is critical for assessing the risk of treatment and for drug repurposing. Although numerous existing studies predict the presence of drug side effects, only four of them predict the frequency of the side effects. Unfortunately, current prediction methods 1) do not utilize drug targets, 2) do not predict well for unseen drugs, and 3) do not use multiple heterogeneous drug features. We propose a novel deep learning-based drug-side effect frequency prediction model. Our model simultaneously uses heterogeneous features, such as target protein information as well as molecular graphs, fingerprints, and chemical similarity, to create drug embeddings. Furthermore, the model represents drugs and side effects in a common vector space, learning dual representation vectors for drugs and side effects, respectively. We also extended the predictive power of our model to compensate for drugs without clear target proteins using the AdaBoost method. We achieved state-of-the-art performance over existing methods in predicting side effect frequencies, especially for unseen drugs. Ablation studies show that our model effectively combines and utilizes heterogeneous features of drugs. Moreover, we observed that, when target information is given, drugs with explicit targets resulted in better predictions than drugs without explicit targets.
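Representing drugs and side effects in a common vector space means a drug-side effect pair can be scored directly from the two vectors. The sketch below assumes the vectors are already computed, combines a drug's heterogeneous feature blocks by simple concatenation, and scores pairs with an inner product; the learned encoders, the AdaBoost component, and the training objective are not shown, and all names and values are hypothetical.

```python
def concat(*blocks):
    """Join heterogeneous feature blocks (e.g. target, fingerprint) into one vector."""
    return [v for block in blocks for v in block]

def frequency_scores(drugs, effects):
    """Score every drug-side effect pair by the inner product of their vectors."""
    return {
        (drug, effect): sum(a * b for a, b in zip(dv, ev))
        for drug, dv in drugs.items()
        for effect, ev in effects.items()
    }

# Hypothetical vectors: a 2-D target block followed by a 1-D structural block.
drugs = {
    "drugA": concat([1.0, 0.0], [0.5]),
    "drugB": concat([0.0, 1.0], [1.0]),
}
effects = {"nausea": [2.0, 0.0, 1.0], "rash": [0.0, 1.0, 0.0]}
scores = frequency_scores(drugs, effects)
```

Since both sides live in the same space, a new drug only needs its own vector to be scored against every known side effect.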

Details DOI

AAAI Conference 2024 Conference Paper

DiSCO: Diffusion Schrödinger Bridge for Molecular Conformer Optimization

Danyeong Lee
Dohoon Lee
Dongmin Bang
Sun Kim

The generation of energetically optimal 3D molecular conformers is crucial in cheminformatics and drug discovery. While deep generative models have been utilized for direct generation in Euclidean space, this approach encounters challenges, including the complexity of navigating a vast search space. Recent generative models that implement simplifications to circumvent these challenges have achieved state-of-the-art results, but this simplified approach unavoidably creates a gap between the generated conformers and the ground-truth conformational landscape. To bridge this gap, we introduce DiSCO: Diffusion Schrödinger Bridge for Molecular Conformer Optimization, a novel diffusion framework that enables direct learning of nonlinear diffusion processes in prior-constrained Euclidean space for the optimization of 3D molecular conformers. Through the incorporation of an SE(3)-equivariant Schrödinger bridge, we establish the roto-translational equivariance of the generated conformers. Our framework is model-agnostic and offers an easily implementable solution for the post hoc optimization of conformers produced by any generation method. Through comprehensive evaluations and analyses, we establish the strengths of our framework, substantiating the application of the Schrödinger bridge for molecular conformer optimization. First, our approach consistently outperforms four baseline approaches, producing conformers with higher diversity and improved quality. Then, we show that the intermediate conformers generated during our diffusion process exhibit valid and chemically meaningful characteristics. We also demonstrate the robustness of our method when starting from conformers of diverse quality, including those unseen during training. Lastly, we show that the precise generation of low-energy conformers via our framework helps enhance downstream prediction of molecular properties. The code is available at https://github.com/Danyeong-Lee/DiSCO.
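Roto-translational equivariance means that rotating and translating a conformer before an update gives the same result as updating first and moving afterwards. The sketch below does not implement a Schrödinger bridge; it only illustrates that property with an EGNN-style coordinate update, which is an assumption and not DiSCO's network, and `phi` is a hypothetical distance weighting.

```python
import math

def update(coords, phi=lambda d2: 0.1 / (1.0 + d2)):
    """EGNN-style step: x_i += mean over j != i of (x_i - x_j) * phi(|x_i - x_j|^2).

    Relative vectors rotate with the input, squared distances are unchanged,
    and translations cancel in x_i - x_j, so the step commutes with rigid motions.
    """
    n = len(coords)
    scale = 1.0 / max(n - 1, 1)
    out = []
    for i, xi in enumerate(coords):
        shift = [0.0, 0.0, 0.0]
        for j, xj in enumerate(coords):
            if i == j:
                continue
            rel = [a - b for a, b in zip(xi, xj)]
            w = phi(sum(r * r for r in rel))
            shift = [s + w * r for s, r in zip(shift, rel)]
        out.append([a + scale * s for a, s in zip(xi, shift)])
    return out

def rigid_motion(points, theta, t):
    """Rotate points by theta about the z-axis, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * x - s * y + t[0], s * x + c * y + t[1], z + t[2]] for x, y, z in points]

conformer = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 1.0]]  # hypothetical coordinates
moved_then_updated = update(rigid_motion(conformer, 0.7, [1.0, -2.0, 3.0]))
updated_then_moved = rigid_motion(update(conformer), 0.7, [1.0, -2.0, 3.0])
```

The two results agree up to floating-point error, which is the equivariance property the abstract requires of every step of the generative process.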

PDF Details DOI

AAAI Conference 2022 Conference Paper

Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification

Yinhua Piao
Sangseon Lee
Dohoon Lee
Sun Kim

Recently, graph neural networks (GNNs) have been widely used for document classification. However, most existing methods are based on static word co-occurrence graphs without sentence-level information, which poses three challenges: (1) word ambiguity, (2) word synonymity, and (3) dynamic contextual dependency. To address these challenges, we propose a novel GNN-based sparse structure learning model for inductive document classification. Specifically, a document-level graph is initially generated as a disjoint union of sentence-level word co-occurrence graphs. Our model collects a set of trainable edges connecting disjoint words between sentences and employs structure learning to sparsely select edges with dynamic contextual dependencies. Graphs with sparse structures can jointly exploit local and global contextual information in documents through GNNs. For inductive learning, the refined document graph is further fed into a general readout function for graph-level classification and optimization in an end-to-end manner. Extensive experiments on several real-world datasets demonstrate that the proposed model outperforms most state-of-the-art methods and reveal the necessity of learning sparse structures for each document.
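The graph construction described above can be sketched as follows: each sentence gets its own co-occurrence graph, the document graph is their disjoint union (so the same word in two sentences is two separate nodes), and candidate cross-sentence edges are then sparsified. Which cross-sentence pairs become candidates and how they are scored are assumptions here: this sketch links copies of the same word and uses fixed, hypothetical scores in place of learned ones; the GNN and readout are omitted.

```python
from itertools import combinations

def sentence_graphs(sentences, window=1):
    """Disjoint union of per-sentence co-occurrence graphs.

    Nodes are (sentence_index, word) pairs; an edge joins two different words
    at most `window` positions apart within the same sentence.
    """
    nodes, edges = set(), set()
    for s, words in enumerate(sentences):
        for i, w in enumerate(words):
            nodes.add((s, w))
            for u in words[i + 1:i + 1 + window]:
                if u != w:
                    edges.add(frozenset({(s, w), (s, u)}))
    return nodes, edges

def candidate_edges(nodes):
    """Cross-sentence candidates: link copies of the same word in different sentences."""
    return {
        frozenset({a, b}) for a, b in combinations(sorted(nodes), 2)
        if a[1] == b[1] and a[0] != b[0]
    }

def sparse_select(candidates, score, k):
    """Keep the k highest-scoring candidate edges (scores would normally be learned)."""
    return set(sorted(candidates, key=score, reverse=True)[:k])

sentences = [["bank", "river"], ["bank", "loan"], ["loan", "rate", "bank"]]
nodes, edges = sentence_graphs(sentences)
cands = candidate_edges(nodes)
scores = {  # hypothetical edge scores
    frozenset({(0, "bank"), (1, "bank")}): 0.9,
    frozenset({(0, "bank"), (2, "bank")}): 0.2,
    frozenset({(1, "bank"), (2, "bank")}): 0.7,
    frozenset({(1, "loan"), (2, "loan")}): 0.4,
}
selected = sparse_select(cands, scores.get, k=2)
```

Keeping only a few cross-sentence edges lets "bank" in a financial context connect to other financial uses while staying disconnected from the river sense, which is the kind of ambiguity the abstract targets.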

PDF Details

IJCAI Conference 2018 Conference Paper

Hybrid Approach of Relation Network and Localized Graph Convolutional Filtering for Breast Cancer Subtype Classification

Sungmin Rhee
Seokjun Seo
Sun Kim

Network biology has been successfully used to help reveal complex mechanisms of disease, especially cancer. On the other hand, network biology requires in-depth knowledge to construct disease-specific networks, yet our current knowledge is very limited even with the recent advances in human cancer biology. Deep learning has shown the ability to address problems like this. However, it has conventionally been applied to grid-like structured data, so the application of deep learning technologies to human disease subtypes is yet to be explored. To overcome this issue, we propose a hybrid model that integrates two key components: 1) a graph convolutional neural network (graph CNN) and 2) a relation network (RN). Experimental results on synthetic data and breast cancer data demonstrate that our proposed method performs better than existing methods.
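The two components can be illustrated separately. The sketch below uses the common first-order normalized graph convolution, ReLU(D^-1/2 (A+I) D^-1/2 H W), as a stand-in for localized graph convolutional filtering, and a relation-network style sum of a pairwise function over all object pairs. Both are assumptions: the abstract does not give the paper's exact filter or how the components are combined, and the gene network and features are hypothetical.

```python
import math

def gcn_layer(adj, h, w):
    """One localized graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n, n_out = len(adj), len(w[0])
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    hw = [[sum(h[i][k] * w[k][c] for k in range(len(w))) for c in range(n_out)]
          for i in range(n)]
    # Each node mixes only its own and its neighbours' features (localized filter).
    return [[max(0.0, sum(norm[i][j] * hw[j][c] for j in range(n))) for c in range(n_out)]
            for i in range(n)]

def relation_sum(objects, g):
    """Relation-network aggregation: sum a pairwise function g over all ordered pairs."""
    return sum(g(a, b) for a in objects for b in objects)

# Hypothetical two-gene network with one edge and scalar expression features.
out = gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]])
```

Here both genes end up with the neighbourhood average 2.0, while the relation sum treats every pair of node outputs as a potential interaction regardless of the graph structure.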

PDF Details

AIIM Journal 2010 Journal Article

Data mining for the study of disease genes and proteins

Sun Kim

Details DOI

AAAI Conference 1994 Conference Paper

ModGen: Theorem Proving by Model Generation

Sun Kim
Hantao Zhang

ModGen (Model Generation) is a complete theorem prover for first-order logic with finite Herbrand domains. ModGen takes first-order formulas as input and generates models of the input formulas. ModGen consists of two major modules: a module for transforming the input formulas into propositional clauses, and a module to find models of the propositional clauses. The first module can be used by other researchers so that SAT problems can be easily represented, stored, and communicated. An important issue in the design of ModGen is to ensure that the transformed propositional clauses are satisfiable if and only if the original formulas are. The second module can easily be replaced by any advanced SAT solver. ModGen is easy to use and very efficient. Many problems that are hard for general resolution theorem provers turn out to be easy for ModGen.
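The two-module design, grounding first-order clauses over a finite domain and then handing the propositional clauses to a SAT solver, can be sketched as follows. The abstract does not describe ModGen's actual transformation or data formats, so this uses naive full instantiation, a hypothetical clause representation (variables start with an uppercase letter), and a brute-force solver standing in for an advanced one; it is exponential in the number of ground atoms and only suitable for toy inputs. For function-free clauses interpreted over the given domain, the ground set is satisfiable exactly when the original clauses have a model over that domain.

```python
from itertools import product

def is_var(term):
    """Variables start with an uppercase letter; constants are lowercase."""
    return term[:1].isupper()

def ground(clauses, domain):
    """Instantiate every clause with all substitutions of its variables by domain elements."""
    grounded = []
    for clause in clauses:
        variables = sorted({t for _, _, args in clause for t in args if is_var(t)})
        for values in product(domain, repeat=len(variables)):
            sub = dict(zip(variables, values))
            grounded.append([(positive, (pred, tuple(sub.get(t, t) for t in args)))
                             for positive, pred, args in clause])
    return grounded

def find_model(ground_clauses):
    """Brute-force SAT: return an assignment satisfying every clause, or None."""
    atoms = sorted({atom for c in ground_clauses for _, atom in c})
    for bits in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, bits))
        if all(any(model[atom] == positive for positive, atom in c) for c in ground_clauses):
            return model
    return None

domain = ["a", "b"]
clauses = [
    [(True, "P", ("a",))],                        # P(a)
    [(False, "P", ("X",)), (True, "Q", ("X",))],  # not P(X) or Q(X)
    [(False, "Q", ("b",))],                       # not Q(b)
]
model = find_model(ground(clauses, domain))
```

Swapping `find_model` for a real SAT solver changes nothing in the grounding step, which mirrors the abstract's point that the second module is replaceable.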

PDF Details