AAAI Conference 2026 Conference Paper
CycleChemist: A Dual-Pronged Machine Learning Framework for Organic Photovoltaic Discovery
- Hou Hei Lam
- Jiangjie Qiu
- Xiuyuan Hu
- Wentao Li
- Fankun Zeng
- Siwei Fu
- Hao Zhang
- Xiaonan Wang
Organic photovoltaic (OPV) materials offer a promising pathway for sustainable energy generation. However, their development is hindered by the difficulty of identifying donor-acceptor pairs with high power conversion efficiency (PCE). Most existing design strategies focus exclusively on either the donor or the acceptor, rather than employing a unified model capable of designing both components. In this work, we introduce a dual-pronged machine learning framework for OPV discovery that integrates predictive modeling and generative molecular design. We present the newly curated Organic Photovoltaic Donor-Acceptor Dataset (OPV²D), the largest of its kind, comprising 2,000 experimentally characterized donor-acceptor pairs. This dataset serves as a comprehensive foundation for model training and evaluation. On the predictive side, we first introduce the Organic Photovoltaic Classifier (OPVC), which predicts the likelihood that a given material exhibits OPV behavior. Complementing this, we develop a hierarchical graph neural network framework that integrates multi-task learning and cross-modal donor-acceptor interaction modeling. This framework includes the Molecular Orbital Energy Estimator (MOE²) for predicting highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) energy levels, and the Photovoltaic Performance Predictor (P³) for estimating PCE. On the generative side, we introduce the Material Generative Pretrained Transformer (MatGPT) to generate synthetically accessible organic semiconductors. Building on this, we propose a reinforcement learning strategy with three-objective policy optimization that guides molecular generation while preserving chemical validity. By bridging molecular representation learning with device performance prediction, our framework advances computational OPV material discovery.
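As an illustration of how a three-objective reward with a validity constraint might be combined into a single training signal, the sketch below uses a weighted sum gated on chemical validity. This is a minimal sketch under stated assumptions, not the method of the paper: the abstract does not name the three objectives, the scalarization scheme, or the weights, so `Scores`, `objective_a`/`objective_b`/`objective_c`, `scalarized_reward`, and the default weights are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    """Hypothetical per-molecule scores, each assumed to be scaled to [0, 1].

    The three fields are placeholders; the abstract does not say which
    quantities the three objectives correspond to.
    """
    objective_a: float
    objective_b: float
    objective_c: float


def scalarized_reward(scores, is_valid, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine three objective scores into one scalar reward.

    Assumption: invalid molecules get zero reward, which is one simple way
    to push a generator toward chemically valid outputs. A weighted sum is
    only one of several possible multi-objective schemes.
    """
    if not is_valid:
        return 0.0
    values = (scores.objective_a, scores.objective_b, scores.objective_c)
    return sum(w * v for w, v in zip(weights, values))


if __name__ == "__main__":
    candidate = Scores(objective_a=0.6, objective_b=0.9, objective_c=0.3)
    print(scalarized_reward(candidate, is_valid=True))
    print(scalarized_reward(candidate, is_valid=False))
```

In a policy-gradient setup, a scalar of this kind would typically be the per-sequence reward for each generated molecule; how the actual framework balances its objectives is not stated in the abstract.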