Data from: High evolutionary turnover of satellite families in Caenorhabditis
Main Authors: | Subirana, Juan A., Alba, M. Mar, Messeguer, Xavier |
---|---|
Format: | info dataset Journal |
Published: | 2015 |
Subjects: | |
Online Access: | https://zenodo.org/record/4935882 |
Table of Contents:
- Background: The high density of tandem repeat sequences (satellites) in nematode genomes and the availability of genome sequences from several species in the group offer a unique opportunity to better understand the evolutionary dynamics and the functional role of these sequences. We take advantage of the previously developed SATFIND program to study the satellites in four Caenorhabditis species and investigate these questions.
  Methods: The identification and comparison of satellites are carried out in three steps. First, we find all the satellites present in each species with the SATFIND program. Each satellite is defined by its length, number of repeats, and repeat sequence. Only satellites with at least ten repeats are considered. In the second step, we build satellite families with a newly developed alignment program. Satellite families are defined by a consensus sequence and the number of satellites in the family. Finally, we compare the consensus sequences of satellite families in different species.
  Results: We give a catalog of individual satellites in each species. We have also identified satellite families with related sequences and compared them in different species. We analyze the turnover of satellites: they increased in size through duplications of fragments of 100-300 bases. It appears that in many cases they have undergone an explosive expansion. In C. elegans we have identified a subset of large satellites that have strong affinity for the centromere protein CENP-A. We have also compared our results with those obtained from other species, including one nematode and three mammals.
  Conclusions: Most satellite families found in Caenorhabditis are species-specific, in particular those with long repeats. A subset of these satellites may facilitate the formation of kinetochores in mitosis. Other satellite families in C. elegans are related either to Helitron transposons or to meiotic pairing centers.
- satfind: Satfind has been designed to find satellites that contain patterns (the seed) of length L (<15) that appear at least n times in a DNA subsequence of length N. Once a satellite is found, the search continues until no more seeds are found within N bases, so the satellite may extend beyond the original length N. Each seed defines a possible repeat that ends at the start of the next seed. The satellite is accepted if we are able to select a percentage p of the possible repeats (the selected repeats) with the same length; in this case, these selected repeats become the repeats of the satellite. The first one is chosen as the representative repeat of the satellite.
- malig: DNA multialignment software. Malig has been implemented in C. Just untar malig.tar and run the make utility. Malig has been designed to globally align DNA sequences of similar length using the progressive multialignment method, with pairwise alignments computed by the dynamic programming algorithm of Needleman and Wunsch and an affine gap penalty function. As a progressive multialignment method, malig first computes the similarity score between all pairs of sequences. This computation is made under the following criteria:
  - Normalized score: a multialignment of identical sequences must score one.
  - Reverse sequences: the input file is extended by adding the reverse sequences.
  - Cyclic permutations: to compute the score between two sequences, we choose the cyclic permutation of the second one with the greatest score, and this permutation is applied when the multialignment is built.
  Then the progressive multialignment is started. Recall that in a classic multialignment algorithm this process finishes when all sequences are aligned. In our case the process finishes when the score is smaller than a similarity threshold (an input parameter).
  File: malig.tar
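The seed-based search described for satfind can be illustrated with a minimal Python sketch. This is not the satfind source code: the function name `find_satellite`, the default values for L, n, N and p, and the choice to test a single seed taken at a given start position are all assumptions made for illustration. It collects seed occurrences while the next one starts within N bases, treats the distance between consecutive seeds as a possible repeat length, and accepts the satellite when a fraction p of the possible repeats share one length.

```python
from collections import Counter
from typing import Optional


def find_satellite(dna: str, start: int, L: int = 6, n: int = 10,
                   N: int = 200, p: float = 0.8) -> Optional[dict]:
    """Hypothetical sketch of a seed-based satellite search (not satfind itself)."""
    seed = dna[start:start + L]
    if len(seed) < L:
        return None

    # Extend the chain of seed occurrences while the next seed starts
    # within N bases of the previous one.
    positions = [start]
    while True:
        prev = positions[-1]
        nxt = dna.find(seed, prev + 1, prev + N + L)
        if nxt == -1:
            break
        positions.append(nxt)

    # Require at least n seeds starting within the first N bases.
    if sum(1 for q in positions if q < start + N) < n:
        return None

    # Each seed defines a possible repeat ending at the start of the next seed.
    lengths = [b - a for a, b in zip(positions, positions[1:])]
    if not lengths:
        return None

    # Accept only if a fraction p of the possible repeats share one length.
    common_len, count = Counter(lengths).most_common(1)[0]
    if count / len(lengths) < p:
        return None

    # The first selected repeat is the representative repeat.
    first = positions[lengths.index(common_len)]
    return {
        "start": start,
        "repeat_length": common_len,
        "n_repeats": count,
        "representative": dna[first:first + common_len],
    }


unit = "ACGTTGCAAT"
example = "GGGGG" + unit * 12 + "CCCCC"
print(find_satellite(example, start=5))
```

A full scan would presumably try a seed at every position of the genome and merge overlapping hits; that bookkeeping is omitted here.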
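The pairwise scoring that malig is described as using (global alignment with affine gaps, a normalized score, and the best cyclic permutation of the second sequence) can be sketched as follows. This is an illustrative Python sketch, not the malig C implementation: the function names, the match, mismatch and gap values, and the normalization by the longer sequence length are assumptions chosen so that identical sequences score one. The reverse-sequence step is not shown.

```python
from typing import Tuple

NEG = float("-inf")


def affine_global_score(a: str, b: str, match: float = 1.0,
                        mismatch: float = -1.0, gap_open: float = -2.0,
                        gap_extend: float = -0.5) -> float:
    """Global alignment score with affine gaps (three-matrix dynamic programming).

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


def best_rotation(a: str, b: str, match: float = 1.0) -> Tuple[int, float]:
    """Return (shift, normalized score) for the best cyclic permutation of b.

    Normalization (an assumption): divide by match * max(len(a), len(b)),
    so that two identical sequences score exactly one.
    """
    norm = match * max(len(a), len(b))
    best_shift, best_score = 0, NEG
    for k in range(len(b)):
        rotated = b[k:] + b[:k]
        score = affine_global_score(a, rotated, match=match) / norm
        if score > best_score:
            best_shift, best_score = k, score
    return best_shift, best_score


print(best_rotation("ACGTTGCA", "TGCAACGT"))
```

Trying every rotation multiplies the cost of one pairwise alignment by the length of the second sequence, which is tolerable for short satellite repeats but would be slow for long sequences.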