Invalid SMILES are beneficial rather than detrimental to chemical language models
AbstractGenerative machine learning models have attracted intense interest for their ability to sample novel molecules with desired chemical or biological properties. Among these, language models trained on SMILES (Simplified Molecular-Input Line-Entry System) representations have been subject to the most extensive experimental validation and have been widely adopted. However, these models have what is perceived to be a major limitation: some fraction of the SMILES strings that they generate are...
Read more at nature.com