MetaClass - A comprehensive classification system for the prediction of the metabolic reaction
Main Authors: | Alessandro Pedretti, Giulio Vistoli, Angelica Mazzolari, Alice Scaccabarozzi |
---|---|
Format: | info software Journal |
Language: | eng |
Published: | 2021 |
Online Access: | https://zenodo.org/record/5128531 |

Contents:
1. Introduction

MetaClass is a comprehensive classification system for predicting the metabolic reactions of a given molecule or of a set of molecules. The prediction is based on a machine-learning algorithm (Random Forest) trained on the metabolic data collected and classified in the MetaQSAR database. The MetaClass package includes two modules: MetaClass builder and MetaClass predictor. The former automatically generates not only the models but also the code that can be used directly in the VEGA ZZ environment for the prediction, while the latter is the product of a MetaClass builder run.

2. Software requirements

For MetaClass predictor:
- VEGA ZZ 3.2.1 or greater (freely available at http://www.vegazz.net).
- MOPAC 2016 (freely available after registration at http://openmopac.net). To select the right version and install it correctly, consult the VEGA ZZ manual (https://www.ddl.unimi.it/manual/pages/gl_activation.htm#MOPAC2007).

For MetaClass builder:
- The same packages required for MetaClass predictor.
- Weka 3.8 or greater (freely available at https://www.cs.waikato.ac.nz/ml/weka/index.html).
- Java (freely available at https://www.java.com, required to run Weka).
- Tree2C (freely available at https://www.ddl.unimi.it/cms/index.php?Software_projects:Tree2C). Some VEGA ZZ releases do not include this tool, so you may need to install it manually: copy Tree2C.exe from the Tree2C\Bin\Mingw32 or Tree2C\Bin\Mingw64 directory of the Tree2C package to the ...\VEGA ZZ\Bin\Win32 or ...\VEGA ZZ\Bin\Win64 directory, respectively, according to the VEGA ZZ version installed on your system (x86 or x64).
- Tcc C compiler (included in the VEGA ZZ package).
- A MetaQSAR database in one of the supported formats (ODBC data source, SQLite or Microsoft Access).

3. Installation

To install MetaClass, unzip the package and run the Install.vbs script.
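The manual Tree2C installation described in the requirements is a fixed mapping between the directories of the Tree2C package and the VEGA ZZ binary directories (Mingw32 to Win32 for x86, Mingw64 to Win64 for x64). The following minimal C sketch spells out that mapping. The function name `tree2c_install_paths` is hypothetical and not part of MetaClass or VEGA ZZ, and both root directories are left to the caller because the manual does not give a fixed installation path.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of MetaClass or VEGA ZZ): builds the source
 * and destination paths for copying Tree2C.exe by hand.
 *   tree2c_root: the Tree2C directory of the unpacked Tree2C package
 *   vega_root:   the VEGA ZZ installation directory (site-specific)
 *   is_x64:      nonzero for VEGA ZZ x64, zero for VEGA ZZ x86
 * Returns 0 on success, -1 on empty roots or if a buffer is too small. */
int tree2c_install_paths(const char *tree2c_root, const char *vega_root,
                         int is_x64, char *src, size_t src_len,
                         char *dst, size_t dst_len)
{
    assert(tree2c_root != NULL && vega_root != NULL);
    if (strlen(tree2c_root) == 0 || strlen(vega_root) == 0)
        return -1;

    /* Mingw32 -> Win32 for x86, Mingw64 -> Win64 for x64. */
    const char *mingw_dir = is_x64 ? "Mingw64" : "Mingw32";
    const char *bin_dir   = is_x64 ? "Win64" : "Win32";

    int n = snprintf(src, src_len, "%s\\Bin\\%s\\Tree2C.exe",
                     tree2c_root, mingw_dir);
    if (n < 0 || (size_t)n >= src_len)
        return -1; /* output was truncated */

    n = snprintf(dst, dst_len, "%s\\Bin\\%s\\Tree2C.exe", vega_root, bin_dir);
    if (n < 0 || (size_t)n >= dst_len)
        return -1; /* output was truncated */

    return 0;
}
```

As the manual states, the x86/x64 choice follows the VEGA ZZ version actually installed, so `is_x64` should reflect that build rather than the Windows edition alone.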
At the end of the installation procedure, the MetaClass builder.c and MetaClass predictor.vll files are available in the ADMET branch of the VEGA ZZ script tree (to show it, select File → Run script in the main menu). The setup aborts if the VEGA ZZ package is not installed.

4. MetaClass predictor

The MetaClass predictor is a compiled C-Script (available for both x64 and x86 architectures) that predicts the metabolic reactions a given molecule undergoes according to the MetaQSAR rules. To run it, start VEGA ZZ, select File → Run script in the main menu, expand the ADMET branch and double-click MetaClass predictor.vll.

If a molecule is present in the current workspace, the prediction is performed for that single molecule and the results are shown in the VEGA ZZ console. If the workspace is empty, a file requester is shown to select an input database (it must be in a format supported by VEGA ZZ: Microsoft Access, Mol2, ODBC data source, SDF, SMILES, SQLite or Zip). In this second case, the prediction is performed for all molecules in the database and the results are saved to a CSV file.

Since the training set used in the learning phase (the substrates classified in the MetaQSAR database) includes only molecules in non-ionized form, with the exception of quaternary ammonium salts, the molecules whose metabolic reactions you want to predict must also be in their neutral form.

For each reaction class (according to the metabolic reaction classification in MetaQSAR), the output table shows the code and the description of the class (Reaction column), whether the molecule is a substrate (Substrate column) and the number of domain violations (Dom. viol. column). This counter indicates how many parameters/attributes fall outside the range of the property space of the training set. If this value is not zero, the prediction might be less accurate.
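The domain-violation counter can be read as a per-attribute range check. The following C sketch assumes that the applicability domain of each attribute is the [min, max] interval observed in the training set and counts the attributes of a query molecule that fall outside it. The function name and array layout are hypothetical and are not taken from the MetaClass predictor source.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch only: counts how many attributes of a query molecule fall outside
 * the [min, max] range observed for that attribute in the training set.
 * The per-attribute range model and the array layout are assumptions made
 * for illustration, not the MetaClass predictor's actual data structures. */
int count_domain_violations(const double *values, const double *train_min,
                            const double *train_max, size_t n_attr)
{
    assert(n_attr == 0 || (values && train_min && train_max));

    int violations = 0;
    for (size_t i = 0; i < n_attr; ++i) {
        /* Values equal to a bound are treated as inside the domain. */
        if (values[i] < train_min[i] || values[i] > train_max[i])
            ++violations;
    }
    return violations;
}
```

Under this assumption, a result of 0 means that every attribute lies within the ranges of the training set, which corresponds to the case in which the manual expects the prediction to be most reliable.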
If the input is a database of molecules, the output is saved to a CSV file whose name you choose through a file requester. The prediction can be aborted at any time by clicking the Abort button in the progress window. The output file includes a column with the name of the molecule for which the prediction is given, and two columns for each reaction class reporting, respectively, the prediction (1 if the molecule is a substrate, 0 if it is not) and the number of domain violations.

5. MetaClass builder

As explained above, MetaClass builder generates the predictive models based on the MetaQSAR dataset as well as the C-Script code required to build the MetaClass predictor. To run it, start VEGA ZZ, select File → Run script in the main menu, expand the ADMET branch and double-click MetaClass builder.c.

The only input required by the script is a database, which must be in a format supported by VEGA ZZ (ODBC data source, SQLite or Microsoft Access). MetaClass builder modifies the database structure by adding three new tables (Prop_Mopac, Prop_Mopac_Atom and Prop_KierHall), which store the descriptors/attributes calculated with MOPAC and with the Kier-Hall approach. Therefore, if you want to keep the original structure, you must work on a copy of the database. Moreover, if the script finds these tables, the descriptors are not recalculated, which speeds up the whole process. To force the descriptor calculation, delete these three tables from the database.

5.1 How MetaClass builder works

MetaClass builder uses a multi-step approach to build the final models, which are compiled as link libraries that can be used directly in the VEGA ZZ environment:
- Checking the required programs/components. If one of them is missing, the script aborts.
- Checking whether the input file/data source (specified through the file requester) is a MetaQSAR-compatible database.
- Extracting the reaction classes for which the predictive models will be developed.
- Calculating the Kier-Hall descriptors, if required, and storing them in the same database.
- Calculating and storing the MOPAC descriptors, if required, with the following keywords: PM7 GEO-OK MMOK 1SCF SUPER THREADS=1. If your molecules have not been optimized with MOPAC, you must remove the 1SCF keyword from the VGS_MOPAC_KEYS definition in the script.
- Extracting from the database the VEGA-based molecular descriptors, which MetaQSAR calculates when the user compiles the database.
- For each reaction class:
  - Building a balanced dataset by selecting all substrates of the reaction class and an equal number of randomly chosen non-substrates.
  - Creating the Weka input file (in ARFF format) with the most significant attributes, previously chosen by the Select attributes tool of Weka (see the ADMET\_MetaClass builder\*.txt files) using the BestFirst search algorithm (direction = Forward; lookupCacheSize = 1; searchTermination = 5) and the WrapperSubsetEval attribute evaluator (classifier = RandomForest with default settings; doNotCheckCapabilities = False; evaluationMeasure = accuracy, RMSE; folds = 5; seed = 1; threshold = 0.01).
  - Running Weka to build the models with the Random Forest algorithm and its default parameters: weka.classifiers.trees.RandomForest -P 100 -print -I 100 -num-slots 1 -K 0 -M 1.0 -V 0.001 -S 1.
  - Checking that the Weka calculation completed without errors and reading the output file to extract the statistical data.
  - Running Tree2C to convert the decision trees generated by Weka into C-Script code.
  - Compiling the C code (for both x64 and x86 versions) into object files with Tcc.
- For all reaction classes:
  - Creating the configuration header file of the main code according to the data collected during the generation of the models.
  - Copying the main code template (main.c) from the ADMET\_MetaClass builder directory to the working directory, then compiling it with Tcc and linking it with the object files created for each reaction class.
- Installing the resulting compiled scripts (.vll and .vl1) into the ADMET scripts directory of VEGA ZZ.
- Cleaning the working directory if the VGS_CLEANUP macro is defined in the script code (by default, this is not performed).

MetaClass builder generates both the x64 and x86 versions of the MetaClass predictor only if both VEGA ZZ x86 and x64 are installed. Usually, only one of the two versions is installed, according to your operating system, but you can override this behaviour during the VEGA ZZ setup by choosing to install the Live CD creator component.

The working directory is the directory containing the MetaQSAR database file, and several intermediate files are saved there. In detail, you can find:
- config.h: the configuration header of the main part of MetaClass predictor (main.c).
- DATABASE_NAME - Model performances.csv: various statistics on the models created by Weka (see below).
- main.c: the main code of MetaClass predictor.
- main_32.o: the object file compiled by Tcc from main.c (x86 version).
- main_64.o: the object file compiled by Tcc from main.c (x64 version).
- MetaClass predictor.vll: the final script compiled and linked by Tcc (x86 version).
- MetaClass predictor.vl1: the final script compiled and linked by Tcc (x64 version).

For each reaction class you can find:
- DATABASE_NAME – REACTION_CODE.arff: the Weka input file in ARFF format.
- DATABASE_NAME – REACTION_CODE.txt: the Weka output file with the trees in text format.
- model_REACTION_CODE.c: the source code of the decision tree translated by Tree2C.
- model_REACTION_CODE.o: the object file of the decision tree compiled by Tcc (x86 version).
- model_REACTION_CODE_32.o: the object file of the decision tree compiled by Tcc (x86 version).
- model_REACTION_CODE_64.o: the object file of the decision tree compiled by Tcc (x64 version).

6. History

Release 1.0 (22/07/2021): first public release.

7. Copyright and disclaimers

All trademarks and software directly or indirectly referred to in this document are copyrighted by their legal owners. The MetaClass builder and the MetaClass predictor are pieces of software that can be freely distributed through the Internet, CD-ROM and other electronic formats. The author of this program accepts no responsibility for hardware/software damages resulting from the use of this package. No warranty is made about the software or its performance.

Use and copying of this software and the preparation of derivative works based on this software are permitted, so long as the following conditions are met:
- The copyright notice and this entire notice are included intact and prominently carried on all copies and supporting documentation.
- No fees or compensation are charged for use, copies, or access to this software. You may charge a nominal distribution fee for the physical act of transferring a copy, but you may not charge for the program itself.
- Any work distributed or published that in whole or in part contains or is a derivative of this software or any part thereof is subject to the terms of this agreement. The aggregation of another unrelated program with this software or its derivative on a volume of storage or distribution medium does not bring the other program under the scope of these terms.

MetaClass builder
MetaClass predictor
are pieces of software developed in 2020-2021 by Alessandro Pedretti.
All rights reserved.

Alessandro Pedretti
Dipartimento di Scienze Farmaceutiche
Università degli Studi di Milano
Via Luigi Mangiagalli, 25
I-20133 Milano - Italy
Tel. +39 02 503 19332
E-Mail: info@vegazz.net
WWW: http://www.vegazz.net