PubChemLite plus Collision Cross Section (CCS) values for enhanced interpretation of non-target environmental data
Authors
Anjana Elapavalore, Dylan H. Ross, Valentin Groues, Dagny Aurich, Allison M. Krinsky, Sunghwan Kim, Paul A. Thiessen, Jian Zhang, James N. Dodds, Erin S. Baker, Evan E. Bolton, Libin Xu, Emma Schymanski
Abstract
Finding relevant chemicals in the vast (known) chemical space is a major challenge for environmental and exposomics studies leveraging non-target high resolution mass spectrometry (NT-HRMS) methods. Chemical databases now contain hundreds of millions of chemicals, yet many are not relevant, and many relevant chemicals are missing. This article details an extensive collaborative, open science effort to provide a dynamic collection of chemicals for environmental, metabolomics and exposomics research, along with supporting information about their relevance to assist researchers in the interpretation of candidate hits. The PubChemLite for Exposomics collection is compiled from ten sections of PubChem, enhanced with patent, literature and annotation counts and predicted partition coefficient (logP) values, as well as collision cross section (CCS) values predicted using CCSbase. Monthly versions are archived on Zenodo under a CC-BY license, supporting reproducible research, and a new interface has been developed, including the chemical stripes on patent and literature data, for researchers to browse the collection. This article describes the collaborative efforts to build PubChemLite, shows how it can support researchers in environmental and exposomics studies, and explores known limitations and the potential for future developments. The data and code behind these efforts are openly available and PubChemLite content can be explored at https://pubchemlite.lcsb.uni.lu.
PubChemLite is compiled weekly from openly available files on the PubChem FTP site and is archived monthly on Zenodo (DOI: 10.5281/zenodo.5995885). CCS values are added using the open c3sdb code, and the PubChemLite-CCS files are archived on Zenodo (DOI: 10.5281/zenodo.4081056). Both Zenodo DOIs resolve to the latest version. The code for the PubChemLite build system, inputs, chemical stripes and interface is available on the Environmental Cheminformatics (ECI) GitLab. All resources are available under open licenses; see the individual resources for details.
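As a rough illustration of how an archived file might be used for candidate lookup, the sketch below builds resolver URLs from the two DOIs quoted above and filters rows by exact mass within a ppm tolerance. It is a minimal sketch, not the PubChemLite build or interface code: the column names `Identifier` and `MonoisotopicMass`, the helper functions, and the sample rows are all hypothetical placeholders, so the header of the downloaded file should be checked before adapting it. The comparison is on neutral masses and ignores adducts.

```python
import csv
import io

# DOIs quoted in the text; both resolve to the latest archived version.
PUBCHEMLITE_DOI = "10.5281/zenodo.5995885"
PUBCHEMLITE_CCS_DOI = "10.5281/zenodo.4081056"


def doi_url(doi: str) -> str:
    """Return the standard doi.org resolver URL for a DOI."""
    return f"https://doi.org/{doi}"


def ppm_error(measured: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def match_candidates(rows, measured_mass, tol_ppm=5.0,
                     mass_column="MonoisotopicMass"):
    """Keep rows whose mass lies within tol_ppm of measured_mass.

    mass_column is an assumed (hypothetical) column name.
    """
    return [
        row for row in rows
        if abs(ppm_error(measured_mass, float(row[mass_column]))) <= tol_ppm
    ]


# Tiny in-memory stand-in for a downloaded CSV.
# Placeholder rows for illustration only, not real PubChemLite entries.
SAMPLE_CSV = """Identifier,MonoisotopicMass
A,194.0804
B,194.1055
C,180.0634
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
hits = match_candidates(rows, measured_mass=194.0803)

print(doi_url(PUBCHEMLITE_DOI))
print([h["Identifier"] for h in hits])
```

With a real download, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle, and the ppm tolerance chosen to suit the instrument's mass accuracy.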