Basic Local Alignment of 96 Fixed-Length 2-Bit Alphanumeric encoded Transcendental cDNA sequences with the SARS-nCoV-2 Genome

Main Author: Praharshit Sharma
Format: Proceedings journal
Language: eng
Published: 2020
Online Access: https://indiabioscience.org/events/international-symposium-on-bioinformatics-and-artificial-intelligence-in-covid-19-era-and-beyond
Contents:
  • Abstract: We generate ninety-six (96) artificial cDNA sequences of arbitrary length (and binary-digit, or binit, encoding precision) by permuting the fixed-length 2-bit encodings of the initial one-to-one (bijective) alphanumeric mapping A (adenine) = 00, C (cytosine) = 01, G (guanine) = 10, T/U (thymine/uracil) = 11. Each of the four base sequences admits 4! = 4 × 3 × 2 × 1 = 24 such mappings, giving 24 × 4 = 96 "query" cDNA sequences in total. The SARS-nCoV-2 genome (RefSeq accession NC_045512.2) is retrieved from NCBI using Entrez Direct on the Ubuntu 20.04 LTS command line. Each of the four base cDNA sequences, from which up to 96 sequences can be derived, is read directly from the base-2 expansion of a well-known constant, without permutation in the first iteration. For 5′-A we use the binary Champernowne constant, 00.11011100101110111... (base 2), which reads directly as the primer ATCTAGTGT...; for 5′-C, the golden ratio φ, 01.100111100011... (base 2), which reads as the primer CGCTTGAT...; for 5′-G, Napier's constant (Euler's number) e, 10.10110111111... (base 2), which reads as GGTCTT...; and for 5′-T, the circle constant π, 11.001001000011... (base 2), which reads as the primer TAGCAAT.... We build a BLAST database from the 29903 bp single-stranded RNA genome (as cDNA) using makeblastdb and, after obtaining the 96 permutations, systematically run BLAST (Basic Local Alignment Search Tool) searches of these cDNA query stretches against it to find which produces the highest pairwise alignment score, evaluating the associated Karlin-Altschul statistical parameters: K, lambda (λ) and the expectation value (E). Further decision analysis is based on the length of the generated sequences, which are classified as restriction-enzyme recognition sites, primers, and potential whole-genome simulators.
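The "direct" reading of each constant as a primer can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's pipeline: it assumes the integer part is written as two bits, the fractional expansion is read in consecutive bit pairs under the initial mapping (A=00, C=01, G=10, T=11), and the digits are computed with exact integer/rational arithmetic (Machin's formula for π, the factorial series for e, an integer square root for φ). The function names scaled_floor and two_bit_sequence are hypothetical.

```python
from fractions import Fraction
from math import factorial, floor, isqrt

# Initial 1-1 mapping stated in the abstract: A=00, C=01, G=10, T/U=11.
INITIAL_MAP = {"00": "A", "01": "C", "10": "G", "11": "T"}


def arctan_inv(x: int, terms: int = 40) -> Fraction:
    """arctan(1/x) from its Taylor series, truncated after `terms` terms."""
    return sum(
        (Fraction((-1) ** k, (2 * k + 1) * x ** (2 * k + 1)) for k in range(terms)),
        Fraction(0),
    )


def scaled_floor(name: str, n: int) -> int:
    """floor(c * 2**n) for one of the four constants (assumed choice of methods)."""
    if name == "champernowne":
        # Fractional bits are the binary numerals 1, 10, 11, 100, ... concatenated.
        bits, k = "", 1
        while len(bits) < n:
            bits += format(k, "b")
            k += 1
        return int(bits[:n], 2)  # integer part is 0
    if name == "phi":
        # phi = (1 + sqrt(5)) / 2, so floor(phi * 2**n) = (2**n + isqrt(5 * 4**n)) // 2.
        return (2**n + isqrt(5 * 4**n)) // 2
    if name == "e":
        # Truncated series is far more precise than the ~20 bits used here.
        e = sum((Fraction(1, factorial(k)) for k in range(40)), Fraction(0))
        return floor(e * 2**n)
    if name == "pi":
        # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239).
        pi = 16 * arctan_inv(5) - 4 * arctan_inv(239)
        return floor(pi * 2**n)
    raise ValueError(f"unknown constant: {name}")


def two_bit_sequence(name: str, n_frac: int = 16) -> str:
    """Write the constant as 2 integer bits + n_frac fractional bits, decode pairwise."""
    # Every constant here is < 4, so n_frac + 2 bits always suffice.
    bits = format(scaled_floor(name, n_frac), f"0{n_frac + 2}b")
    # Read consecutive 2-bit pairs; an odd trailing bit, if any, is dropped.
    return "".join(INITIAL_MAP[bits[i:i + 2]] for i in range(0, len(bits) - 1, 2))


for name in ("champernowne", "phi", "e", "pi"):
    print(name, two_bit_sequence(name))
# champernowne ATCTAGTGT
# phi CGCTGATCT
# e GGTCTTGAC
# pi TAGCAATTT
```

Each extra pair of fractional bits lengthens the decoded sequence by one nucleotide, which is how sequences of arbitrary length can be produced from the same constant.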
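The 4! × 4 = 96 count can be reproduced by enumerating every bijection between the four 2-bit codes and the four nucleotides, then decoding each base bit string under each bijection. In this sketch the base bit strings are the truncated expansions quoted in the abstract; the helper names (all_bijections, permuted_queries, to_fasta) and the "pi_perm01" identifier scheme are hypothetical.

```python
from itertools import permutations

BASES = "ACGT"                   # nucleotide order of the initial mapping
CODES = ("00", "01", "10", "11")  # the four fixed-length 2-bit codes


def all_bijections() -> list:
    """Every 1-1 assignment of the four 2-bit codes to A, C, G, T (4! = 24)."""
    return [dict(zip(CODES, perm)) for perm in permutations(BASES)]


def permuted_queries(base_bits: dict) -> dict:
    """Decode each base bit string under all 24 bijections: 4 * 24 = 96 queries."""
    queries = {}
    for name, bits in base_bits.items():
        for i, mapping in enumerate(all_bijections(), start=1):
            # Consecutive 2-bit pairs; an odd trailing bit is dropped.
            seq = "".join(mapping[bits[j:j + 2]] for j in range(0, len(bits) - 1, 2))
            queries[f"{name}_perm{i:02d}"] = seq
    return queries


def to_fasta(queries: dict) -> str:
    """Render the queries as multi-FASTA text, one record per query."""
    return "".join(f">{qid}\n{seq}\n" for qid, seq in queries.items())


# Truncated expansions from the abstract (integer bits + fractional bits).
base_bits = {
    "champernowne": "00" + "11011100101110111",
    "phi": "01" + "100111100011",
    "e": "10" + "10110111111",
    "pi": "11" + "001001000011",
}
queries = permuted_queries(base_bits)
print(len(queries))          # 96
print(queries["pi_perm01"])  # TAGCAAT (perm01 is the initial mapping)
```

The first permutation yielded by itertools.permutations("ACGT") is the identity, so each "_perm01" query is the unpermuted primer. The FASTA text could be saved to a file and passed to blastn via -query against a nucleotide database built with makeblastdb -dbtype nucl; those options are assumptions about a typical BLAST+ setup.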
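For reference, the standard Karlin-Altschul relation ties the parameters K and λ to the expectation value E of a raw alignment score S, for a query of length m searched against a database of total length n (BLAST uses effective lengths in practice); the bit score S′ is the normalized form of S:

```latex
E = K\,m\,n\,e^{-\lambda S},
\qquad
S' = \frac{\lambda S - \ln K}{\ln 2},
\qquad
E = m\,n\,2^{-S'}
```

A smaller E means the score is less likely to arise by chance alone, and because S′ absorbs K and λ, bit scores from different scoring systems can be compared directly.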