Per- and polyfluoroalkyl substances (PFAS) in PubChem: 7 million and growing
Authors
Emma Schymanski, Jian Zhang, Paul A. Thiessen, Parviel Chirsir, Todor Kondic, Evan E. Bolton
Abstract
Per- and polyfluoroalkyl substances (PFAS) are of high concern, with calls to regulate these as a class. In 2021, the Organisation for Economic Co-operation and Development (OECD) revised the definition of PFAS to include any chemical containing at least one saturated CF2 or CF3 moiety. The consequence is that one of the largest open chemical collections, PubChem, with 115 million compounds, now contains over 7 million PFAS under this revised definition. These numbers are several orders of magnitude higher than previously established PFAS lists (typically thousands of entries) and pose an incredible challenge to researchers and computational workflows alike. This article describes a dynamic, openly accessible effort to navigate and explore the >7 million PFAS and >21 million fluorinated compounds (17 June 2023) in PubChem by establishing the “PFAS and Fluorinated Compounds in PubChem” Classification Browser (or “PubChem PFAS Tree”). A total of 36,500 nodes support browsing of the content according to several categories, including classification, structural properties, regulatory status, or presence in existing PFAS suspect lists. Additional annotation and associated data can be used to create subsets (and thus manageable suspect lists or databases) of interest for a wide range of environmental, regulatory, exposomics and other applications.
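To make the structural criterion concrete, the following is a minimal sketch of a check for "at least one saturated CF2 or CF3 moiety" on a hand-built molecular graph. It is not the method used to build the PubChem PFAS Tree (that code is in the Perl scripts listed under Source code). The function name `has_saturated_cf2_or_cf3`, the atom/bond representation, and the test molecules are illustrative assumptions. The exclusion of carbons bearing H, Cl, Br, or I follows the wording of the OECD 2021 definition rather than the abstract itself, and the exceptions noted by the OECD are not handled.

```python
from collections import defaultdict

# Substituents that disqualify a carbon under the OECD 2021 wording
# ("without any H/Cl/Br/I atom attached to it").
EXCLUDED = {"H", "Cl", "Br", "I"}


def has_saturated_cf2_or_cf3(atoms, bonds):
    """Return True if any carbon is a saturated CF2 or CF3 centre.

    atoms: list of element symbols, hydrogens listed explicitly.
    bonds: list of (atom_index, atom_index, bond_order) tuples.
    Simplified sketch: no aromaticity, charges, or OECD exceptions.
    """
    neighbours = defaultdict(list)
    for a, b, order in bonds:
        neighbours[a].append((b, order))
        neighbours[b].append((a, order))

    for idx, element in enumerate(atoms):
        if element != "C":
            continue
        attached = neighbours[idx]
        # Saturated carbon: exactly four single bonds.
        if len(attached) != 4 or any(order != 1 for _, order in attached):
            continue
        elements = [atoms[n] for n, _ in attached]
        # At least two fluorines and no H/Cl/Br/I on the same carbon.
        if elements.count("F") >= 2 and not EXCLUDED.intersection(elements):
            return True
    return False


# Trifluoroacetic acid, CF3-C(=O)-OH: contains a CF3 group.
tfa_atoms = ["C", "F", "F", "F", "C", "O", "O", "H"]
tfa_bonds = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1),
             (4, 5, 2), (4, 6, 1), (6, 7, 1)]

# Difluoromethane, CH2F2: the fluorinated carbon also carries hydrogens.
dfm_atoms = ["C", "F", "F", "H", "H"]
dfm_bonds = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1)]

# Tetrafluoroethylene, CF2=CF2: the CF2 carbons are not saturated.
tfe_atoms = ["C", "C", "F", "F", "F", "F"]
tfe_bonds = [(0, 1, 2), (0, 2, 1), (0, 3, 1), (1, 4, 1), (1, 5, 1)]

print(has_saturated_cf2_or_cf3(tfa_atoms, tfa_bonds))  # True
print(has_saturated_cf2_or_cf3(dfm_atoms, dfm_bonds))  # False
print(has_saturated_cf2_or_cf3(tfe_atoms, tfe_bonds))  # False
```

In practice a cheminformatics toolkit with substructure matching would replace the hand-built graph. The sketch only shows why such a broad structural rule matches so many compounds: a single qualifying carbon anywhere in the molecule is enough.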
Raw data
The raw data is publicly available on the PubChem FTP site.
Source code
The code to create the PubChem PFAS Tree is available on GitLab (PubChem PFAS Tree PERL Scripts and PubChem PFAS Annotations), along with more detailed documentation about the PubChem PFAS Tree.