1) Genes in eukaryotes are often organized into exons and introns, and the primary transcript must be spliced to produce a mature mRNA with a contiguous open reading frame for translation. This split organization can make gene identification difficult in eukaryotes, particularly in higher eukaryotes with complex gene organization. Many genes and their organization have been predicted from similarity searches between genomic sequence and known protein amino acid sequences, and/or between genomic sequence and the corresponding full-length cDNAs.

cDNAs (complementary DNAs) are reverse transcribed from mRNAs and therefore do not have intron sequences. Because of this, a cDNA can be treated as a DNA copy of the mature mRNA. A comparison of a genomic sequence (with introns) to its corresponding cDNA will reveal where the introns begin and end. GenBank contains genomic sequences and, for many genes, the corresponding cDNA sequences. To find out the structure of a gene (i.e. the arrangement of the exons and introns) we simply need to perform a sequence comparison between the genomic sequence and the cDNA sequence, or the protein sequence translated from it. The Basic Local Alignment Search Tool (BLAST) is a good way to find the matching protein (i.e. the translated cDNA sequence). The resulting alignment will show you where the introns and exons are located.

Use the genomic sequence and the BLAST website for this assignment. Create a schematic diagram that shows the arrangement of the introns and exons in this genomic sequence. Give the protein sequence associated with this segment of the C. elegans genome.

Below is a small portion (1,500 bp) of the C. elegans genome:

ATTTTTAAAAATGTACAAAATCAAACGCCCTACAAATCATGTGTGTGAAGAAGAATAATAACTAACATAT
CTATTTATATTTACCGAATAAATATATATTCATCAATTAACCTGAAGAACAAACGAATTCGGCTACAGGC
GTCGATCAGTCTCGAATCTAGTAACAACAAGAGAGCAATACGAAAACCGGTAAATCAATAGGGGGAAGCG
AAACAGTAGGTACAAATTGGAGGGGAAGCACCAATACATTAGGTGGGGGGTACGACTTGAAAAATGAGCT
GATTTTCGAATAGTTAAAGCGATGATCGTGTCCGAAAAACAGTTCATTTTTCAAGACAACATTGAGACTG
GGAGTACGGGGAAGCTCATTTACGGTGAGAGGAATTGGTGAGATCTTTAGAATATGCTTAAGGAGTTGGG
GTGGCTGGAGAAGTTCCTGTAGCCTCCGTGCCGGGATTCGATGGAGAAGTCGTTGCGGCTGGTCCCTTTT
CCTTCACTGGTGCTGGATCCTTGGCTGGAAGACATATGCGTGGCTTGACAGTCGATGAGGTGCGAGCCGA
CGAGTCCTTGTGAACTTCGTATCTGGAAATATTTTACTTAGATAGCAAATACTAAAATTGTAAAATTACC
TCAAAATCTCAGTATCCGGAATGCTCAATTTCTGCTTCAAAACCTGTCCGATGCGAAGATTGACATCATC
GCGAGTAGCATCACGAGTCCACAAGGAAACCTTGTCACCCTTTTGACGAACATTCACGACAGCTCCGCAG
ATGTAGTCTCCGTACTCGTCGAATTGCTCTCCAACAATAGCCATCAACAGCTCCAACCAGTAGTGATCGA
GCAATTGCGTTCTTCTCTGAAGCTTCTATGATTCATTGAATAAAATATATTTCTCAAAACGTACTTGCTT
ATCGACAACAACCAACCAACGTCCACCTTGAACGTTGTTGACGTCCTCCCACATTGGCTTGATTCCTTCC
TTGAACAAGTAATAATCGGATCCCCAGTTCAATCCTCCGGCAGACTGAATGTGATTGTACAGCGACCAGA
AGTCCTCGACAGTGTCGAAAAGTGAAACCATCTGGAAAAAATCGATAAAAGACGTATTTAAAAATCTTCT
ACCTTCAGACAATCCTCCCATTCCTTGTTACGGTCAGCTTTCAAGTACCAGAGAGCCCAGCGATTCTGGA
GGGGGTGTCTGGTGAGAAGCTCTGGAGGAACTGAAGCATCGGACGCATTCACATCGCCGGAAGCTGACAA
TGCTTTGTTTTCCGCTACGGATGTGCTCATTTAGCTGAAAATAGGTAATATTATATACGATTAGAGCTCG
GAAAACGATAAAATAGAGAAGAGTATGAATTTGGTTCAAATAACTCGGATTTTATAGGAAATTTTGTTTT
ACTGCACATTTTCGGCTAGTTTCCAAGCTTTTTAGATTTTTCAAGTGTAATTGGTAACATCGGGCACAAT
AAATTGATATTAAAGCTTGGAAAACAATAA

In addition to the assignment above, answer the following:

(a) What does the blastx software program do with your nucleotide sequence before searching UniProtKB/SwissProt for matches?

(b) From the blastx output, to what protein does this region of genomic DNA have significant similarity?

(c) How can you tell that the coding regions for the amino acids within the matched protein are not located within a single contiguous region of the genomic DNA? (There is more than one way to tell.)

(d) How many separate regions of the genomic DNA align with the highest scoring match in the output?

(e) What essential feature of the organization of the gene does the above information provide?

(f) Note the numbering of the sequences in the alignments. Do the genomic sequences progress in the same direction as the amino acid sequences in the alignments? In other words, do they have the same orientation (see below):

    1.................................114 = query
    61...............................98 = subject

or the opposite orientation (below):

    1.................................114 = query
    98...............................61 = subject

(g) What does the orientation of the sequences in the alignment relative to each other tell you about the DNA strand used for the query?
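The bookkeeping behind the schematic diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `exon_intron_layout` is hypothetical, the coordinates in the example are invented and are not taken from the sequence above, and the aligned blocks reported by a translated search only approximate the true exon boundaries.

```python
from typing import List, Tuple

# (query_start, query_end) of one aligned block, 1-based and inclusive,
# in the order the two numbers appear in an alignment report.
Block = Tuple[int, int]


def exon_intron_layout(blocks: List[Block]) -> Tuple[str, List[Block], List[Block]]:
    """Return (strand, exons, introns) in ascending genomic coordinates.

    Hypothetical helper: aligned blocks stand in for exons, and the
    unaligned gaps between neighbouring blocks stand in for introns.
    """
    if not blocks:
        raise ValueError("need at least one aligned block")

    # A block whose start is larger than its end runs backwards along the
    # query, i.e. it matches the reverse-complement strand.
    backwards = [start > end for start, end in blocks]
    if any(backwards) and not all(backwards):
        raise ValueError("blocks disagree on strand")
    strand = "minus" if backwards[0] else "plus"

    # Normalise each block to (low, high) and sort along the genomic sequence.
    exons = sorted((min(s, e), max(s, e)) for s, e in blocks)

    # Record an intron only where there is a real gap between two blocks;
    # adjacent or overlapping blocks produce no intron.
    introns = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        if next_start - prev_end > 1:
            introns.append((prev_end + 1, next_start - 1))
    return strand, exons, introns


# Invented coordinates, for illustration only.
strand, exons, introns = exon_intron_layout([(900, 700), (600, 450), (300, 120)])
print(strand)
print(exons)
print(introns)
```

The exon and intron intervals returned this way can be drawn to scale as boxes and connecting lines to produce the schematic.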