The integration of SYSTERS, GeneNest and SpliceNest into one framework facilitates exploration of the whole sequence space, covering protein, mRNA and EST sequences as well as genomic DNA. The SYSTERS protein sequence cluster set provides an automatically generated classification of all sequences of the SWISS-PROT and TrEMBL databases, as well as of the predicted protein sequence sets of several completely sequenced organisms, into disjoint protein family and superfamily clusters annotated with sequence information from various other resources. For each cluster an MView (database search or multiple alignment viewer) output is generated, and a majority consensus sequence is calculated from the resulting partial multiple alignment. Together, all consensus sequences form a searchable sequence database. The sequences in every cluster have been multiply aligned and annotated with known domains from the Pfam protein family database. GeneNest is a database and software package for producing and visualizing gene indices from ESTs and mRNAs. Currently, the database comprises gene indices of human (based on UniGene), mouse, A. thaliana, Drosophila and zebrafish. All sequences are preprocessed to detect, annotate and clip regions that contain vector sequence or repeats, or that are of low quality. The subsequent assembly step is done with the Staden package. For all contigs of a cluster, consensus sequences are generated and extracted to build a searchable sequence database. The visualization of a contig provides further information about the sequences, the represented gene and open reading frames, and links to precomputed protein homologies detected in the SYSTERS database.
SpliceNest is a web-based graphical tool for exploring gene structure, based on a mapping of the EST consensus sequences from GeneNest to a complete genome. Assuming that a cluster normally represents a single gene, every contig of a cluster is aligned separately to the corresponding genomic region using a spliced alignment program. The alignments are visualized in a diagram that shows the exon/intron structure, with all exons displayed simultaneously and mapped onto the common genomic sequence, and that automatically highlights candidates for alternative splicing.
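
To illustrate the majority consensus step mentioned above for the SYSTERS clusters, the following is a minimal sketch in Python. It is not the MView implementation: the function name `majority_consensus`, the 50% threshold, the `X` placeholder for columns without a majority, and the rule of dropping columns in which gaps form the majority are all assumptions made for this example.

```python
from collections import Counter


def majority_consensus(aligned, gap="-", unknown="X", min_fraction=0.5):
    """Build a majority consensus from equal-length aligned sequences.

    Assumed rules (illustrative only, not taken from MView):
      * a symbol is accepted if it occupies more than `min_fraction`
        of a column;
      * columns whose majority symbol is the gap are dropped;
      * columns without any majority are reported as `unknown`.
    """
    if not aligned:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    consensus = []
    for column in zip(*aligned):
        # Most frequent symbol in this alignment column, gaps included.
        symbol, count = Counter(column).most_common(1)[0]
        if count / len(column) <= min_fraction:
            consensus.append(unknown)   # no strict majority
        elif symbol != gap:
            consensus.append(symbol)    # majority residue
        # A majority of gaps contributes nothing to the consensus.
    return "".join(consensus)


if __name__ == "__main__":
    alignment = ["MKT-LV", "MKTALV", "MRT-LI"]
    print(majority_consensus(alignment))
```

Collecting one such consensus string per cluster would give the kind of sequence set that can then be indexed for similarity searches, as described for the searchable consensus database.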