Learning Plaintext-Ciphertext Cryptographic Problems via ANF-based SAT Instance Representation

Xinhao Zheng; Yang Li; Cunxin Fan; Huaijin Wu; Xinhao Song; Junchi Yan

doi:10.52202/079017-1317

Back to NeurIPS

NeurIPS 2024

Learning Plaintext-Ciphertext Cryptographic Problems via ANF-based SAT Instance Representation

Conference Paper Main Conference Track Artificial Intelligence · Machine Learning

PDF Details DOI

Abstract

Cryptographic problems, operating within binary variable spaces, can be routinely transformed into Boolean Satisfiability (SAT) problems regarding specific cryptographic conditions like plaintext-ciphertext matching. With the fast development of learning for discrete data, this SAT representation also facilitates the utilization of machine-learning approaches with the hope of automatically capturing patterns and strategies inherent in cryptographic structures in a data-driven manner. Existing neural SAT solvers consistently adopt conjunctive normal form (CNF) for instance representation, which in the cryptographic context can lead to scale explosion and a loss of high-level semantics. In particular, extensively used XOR operations in cryptographic problems can incur an exponential number of clauses. In this paper, we propose a graph structure based on Arithmetic Normal Form (ANF) to efficiently handle the XOR operation bottleneck. Additionally, we design an encoding method for AND operations in these ANF-based graphs, demonstrating improved efficiency over alternative general graph forms for SAT. We then propose CryptoANFNet, a graph learning approach that trains a classifier based on a message-passing scheme to predict plaintext-ciphertext satisfiability. Using ANF-based SAT instances, CryptoANFNet demonstrates superior scalability and can naturally capture higher-order operational information. Empirically, CryptoANFNet achieves a 50x speedup over heuristic solvers and outperforms SOTA learning-based SAT solver NeuroSAT, with 96\% vs. 91\% accuracy on small-scale and 72\% vs. 55\% on large-scale datasets from real encryption algorithms. We also introduce a key-solving algorithm that simplifies ANF-based SAT instances from plaintext and ciphertext, enhancing key decryption accuracy from 76. 5\% to 82\% and from 72\% to 75\% for datasets generated from two real encryption algorithms.

Learning Plaintext-Ciphertext Cryptographic Problems via ANF-based SAT Instance Representation

Abstract

Authors

Keywords

Context