Systematic Data Analysis and Diagnostic Machine Learning Reveal Differences between Compounds with Single- and Multitarget Activity

Main Authors: Christian Feldmann, Dimitar Yonchev, Dagmar Stumpfe, Jürgen Bajorath
Format: info dataset Journal
Published: 2020
Subjects:
Online Access: https://zenodo.org/record/4190988
Table of Contents:
  • The deposited files contain balanced data sets of multi-target (MT) and single-target (ST) compounds (CPDs) used for machine learning studies (https://dx.doi.org/10.1021/acs.molpharmaceut.0c00901). The first file (st_mt_data.tsv) contains 15,142 MT- and 15,081 ST-CPDs, and the second (st_dt_data.tsv) contains 1,828 DT- and 1,776 ST-CPDs. For each CPD, a nonstereo_aromatic_SMILES representation, the original ChEMBL_cid, the UniProt (target) IDs, and the CPD category (CPD_CAT; i.e., DT, MT, or ST) are provided. DT stands for 'diverse-target' and denotes a subset of MT-CPDs (as detailed in the publication). In addition, a CPD is tagged “Y” if it remained in the data set after removal of either 50% randomly selected CPDs or 50% CPD nearest neighbors (NN).
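As a quick check of the class balance described above, the category counts of a deposited file could be tallied as in the following minimal sketch. It assumes the files are tab-separated with a header row and that the category column is literally named CPD_CAT; the exact header names are an assumption, not confirmed by the record. The function name count_categories and the three sample rows (including their IDs) are made up for illustration only.

```python
import csv
import io
from collections import Counter


def count_categories(handle, category_column="CPD_CAT"):
    """Count compounds per category (e.g. MT/ST/DT) in a tab-separated file.

    `category_column` is an assumed header name; adjust it to match
    the header actually present in the downloaded file.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row[category_column] for row in reader)


# Hypothetical three-row sample standing in for st_mt_data.tsv.
# The SMILES strings and IDs below are placeholders, not records from the data set.
sample = (
    "nonstereo_aromatic_SMILES\tChEMBL_cid\tCPD_CAT\n"
    "c1ccccc1\tPLACEHOLDER_1\tMT\n"
    "CCO\tPLACEHOLDER_2\tST\n"
    "CCN\tPLACEHOLDER_3\tMT\n"
)

counts = count_categories(io.StringIO(sample))
for category, n in sorted(counts.items()):
    print(category, n)
```

With a downloaded file, the same function could be called as count_categories(open("st_mt_data.tsv", newline="")); if the header names match the assumption, the result should reflect the MT and ST counts stated in the description.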