Systematic Data Analysis and Diagnostic Machine Learning Reveal Differences between Compounds with Single- and Multitarget Activity
Main Authors: | Christian Feldmann, Dimitar Yonchev, Dagmar Stumpfe, Jürgen Bajorath |
---|---|
Format: | info dataset Journal |
Published: | 2020 |
Online Access: | https://zenodo.org/record/4190988 |
Table of Contents:
- The deposited files contain balanced data sets of multi-target (MT) and single-target (ST) compounds (CPDs) used for machine learning studies (https://dx.doi.org/10.1021/acs.molpharmaceut.0c00901). The first file (st_mt_data.tsv) contains 15,142 MT-CPDs and 15,081 ST-CPDs; the second (st_dt_data.tsv) contains 1,828 DT-CPDs and 1,776 ST-CPDs. For each CPD, a nonstereo_aromatic_SMILES representation, the original ChEMBL_cid, the UniProt (target) IDs, and the CPD category (CPD_CAT; i.e., DT, MT, or ST) are provided. DT stands for 'diverse-target' and denotes a subset of MT-CPDs (as detailed in the publication). In addition, a CPD is tagged “Y” if it remained in the data set after removal of a randomly selected 50% of CPDs or of 50% of CPD nearest neighbors (NN), respectively.
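
As a quick sanity check on the file layout described above, the per-category compound counts can be tallied directly from the tab-separated data. The following is a minimal sketch, not the authors' code. It assumes that the header row uses the field names mentioned in the description (`nonstereo_aromatic_SMILES`, `ChEMBL_cid`, `CPD_CAT`); the `UniProt_IDs` header, the comma separator inside it, the helper `count_categories`, and all sample rows are hypothetical placeholders.

```python
import csv
import io
from collections import Counter

# Hypothetical sample mimicking the described layout. The exact header names,
# the "UniProt_IDs" column label, and its comma separator are assumptions;
# the compound and target identifiers are placeholders, not real records.
SAMPLE_TSV = (
    "nonstereo_aromatic_SMILES\tChEMBL_cid\tUniProt_IDs\tCPD_CAT\n"
    "CCO\tCPD_A\tTARGET_1\tST\n"
    "c1ccccc1\tCPD_B\tTARGET_2,TARGET_3\tMT\n"
    "CCN\tCPD_C\tTARGET_4\tST\n"
)


def count_categories(handle, category_column="CPD_CAT"):
    """Count compounds per category in an open tab-separated file handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row[category_column] for row in reader)


# Run on the in-memory sample so the sketch is self-contained.
counts = count_categories(io.StringIO(SAMPLE_TSV))
print(dict(counts))
```

To apply this to a downloaded file, one would pass a real handle instead, for example `open("st_mt_data.tsv", newline="")`. If the header names match the assumptions, the counts should agree with the figures quoted in the description; if they do not, the actual header row should be inspected first.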