Improving the performance of models for one-step retrosynthesis through re-ranking

Retrosynthesis is at the core of organic chemistry. Recently, the rapid growth of artificial intelligence (AI) has spurred a variety of novel machine learning approaches for data-driven synthesis planning. These methods learn complex patterns from reaction databases in order to predict, for a given...

Bibliographic Details
Main Authors: Lin, Min Htoo; Tu, Zhengkai; Coley, Connor W.
Other Authors: School of Physical and Mathematical Sciences
Format: Article
Language: English
Published: 2022
Subjects: Science::Chemistry; Computer-Aided Synthesis Planning; Energy-Based Model
Online Access: https://hdl.handle.net/10356/163080
Institution: Nanyang Technological University
id sg-ntu-dr.10356-163080
record_format dspace
spelling sg-ntu-dr.10356-163080 2023-02-28T20:05:22Z
 Improving the performance of models for one-step retrosynthesis through re-ranking
 Lin, Min Htoo; Tu, Zhengkai; Coley, Connor W.
 School of Physical and Mathematical Sciences
 Science::Chemistry; Computer-Aided Synthesis Planning; Energy-Based Model
 Retrosynthesis is at the core of organic chemistry. Recently, the rapid growth of artificial intelligence (AI) has spurred a variety of novel machine learning approaches for data-driven synthesis planning. These methods learn complex patterns from reaction databases in order to predict, for a given product, sets of reactants that can be used to synthesise that product. However, their performance as measured by the top-N accuracy in matching published reaction precedents still leaves room for improvement. This work aims to enhance these models by learning to re-rank their reactant predictions. Specifically, we design and train an energy-based model to re-rank, for each product, the published reaction as the top suggestion and the remaining reactant predictions as lower-ranked. We show that re-ranking can improve one-step models significantly using the standard USPTO-50k benchmark dataset, such as RetroSim, a similarity-based method, from 35.7 to 51.8% top-1 accuracy and NeuralSym, a deep learning method, from 45.7 to 51.3%, and also that re-ranking the union of two models' suggestions can lead to better performance than either alone. However, the state-of-the-art top-1 accuracy is not improved by this method.
 Nanyang Technological University
 Published version
 This work was supported by the CN Yang Scholars Programme at Nanyang Technological University and the Machine Learning for Pharmaceutical Discovery and Synthesis consortium.
 2022-11-21T02:06:17Z 2022-11-21T02:06:17Z 2022
 Journal Article
 Lin, M. H., Tu, Z. & Coley, C. W. (2022). Improving the performance of models for one-step retrosynthesis through re-ranking. Journal of Cheminformatics, 14(1), 15.
 https://dx.doi.org/10.1186/s13321-022-00594-8
 1758-2946
 https://hdl.handle.net/10356/163080
 10.1186/s13321-022-00594-8
 35292121
 2-s2.0-85127152962
 1 14 15
 en
 Journal of Cheminformatics
 © The Author(s) 2022. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
 application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Science::Chemistry
Computer-Aided Synthesis Planning
Energy-Based Model
description Retrosynthesis is at the core of organic chemistry. Recently, the rapid growth of artificial intelligence (AI) has spurred a variety of novel machine learning approaches for data-driven synthesis planning. These methods learn complex patterns from reaction databases in order to predict, for a given product, sets of reactants that can be used to synthesise that product. However, their performance as measured by the top-N accuracy in matching published reaction precedents still leaves room for improvement. This work aims to enhance these models by learning to re-rank their reactant predictions. Specifically, we design and train an energy-based model to re-rank, for each product, the published reaction as the top suggestion and the remaining reactant predictions as lower-ranked. We show that re-ranking can improve one-step models significantly using the standard USPTO-50k benchmark dataset, such as RetroSim, a similarity-based method, from 35.7 to 51.8% top-1 accuracy and NeuralSym, a deep learning method, from 45.7 to 51.3%, and also that re-ranking the union of two models' suggestions can lead to better performance than either alone. However, the state-of-the-art top-1 accuracy is not improved by this method.
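The re-ranking idea and the top-N accuracy metric named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy candidate labels, and the hand-written energy table are all hypothetical, and the sketch assumes the common convention that a lower energy means a more plausible reactant set. In the paper the energy comes from a trained model.

```python
from typing import Callable, List, Sequence


def merge_candidates(*candidate_lists: Sequence[str]) -> List[str]:
    """Union of several models' suggestions, de-duplicated, first-seen order kept."""
    return list(dict.fromkeys(c for lst in candidate_lists for c in lst))


def rerank(candidates: Sequence[str], energy: Callable[[str], float]) -> List[str]:
    """Sort candidates by ascending energy (assumption: lower = more plausible)."""
    return sorted(candidates, key=energy)


def top_n_accuracy(ranked_lists: Sequence[Sequence[str]],
                   truths: Sequence[str], n: int) -> float:
    """Fraction of products whose published reactants appear in the top n."""
    if not truths:
        raise ValueError("truths must not be empty")
    hits = sum(1 for ranked, truth in zip(ranked_lists, truths) if truth in ranked[:n])
    return hits / len(truths)


# Toy data: hypothetical candidate labels and hand-picked energies,
# standing in for a one-step model's output and a trained energy model.
toy_energy = {"r1": 0.9, "r2": 0.1, "r3": 0.5, "s1": 0.2, "s2": 0.7}
original = [["r1", "r2", "r3"], ["s1", "s2"]]  # one ranked list per product
truths = ["r2", "s1"]                          # published reactants per product

reranked = [rerank(cands, toy_energy.__getitem__) for cands in original]
before = top_n_accuracy(original, truths, n=1)
after = top_n_accuracy(reranked, truths, n=1)
print(f"top-1 before: {before:.2f}, after: {after:.2f}")
```

To re-rank the union of two models' suggestions, the same `rerank` call can be applied to the output of `merge_candidates(list_a, list_b)`.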
author2 School of Physical and Mathematical Sciences
format Article
author Lin, Min Htoo
Tu, Zhengkai
Coley, Connor W.
author_sort Lin, Min Htoo
title Improving the performance of models for one-step retrosynthesis through re-ranking
publishDate 2022
url https://hdl.handle.net/10356/163080