Potato ESTs: Documentation

1.0 EST Sequencing

Potato ESTs were sequenced using the MegaBACE 100 or 4000 sequencing machine and ET terminator chemistry.

2.0 EST Processing

- Bases were called using Phred (version 0.020425.c; Ewing and Green, 1998; Ewing et al., 1998).
- Sequences with greater than 3% Ns were removed from further analysis.
- Sequences were compared to each of four reference libraries; any significant matches were removed from further analysis.
- Sequences with fewer than 100 bp or more than 4000 bp were removed from further analysis.
- EST sequences were compared to a reference library of plant repeats; any significant matches were temporarily masked during the assembly process.
- Any low-complexity regions, including poly A/T, were temporarily masked during the assembly process.

3.0 Removal of Late Blight Sequences

Cleaned EST sequences were screened, using BLASTN with an E-value cut-off of < 1.0e-30, against a comprehensive late blight nucleotide database and against a plant nucleotide database made from a subset of the GenBank nucleotide database. ESTs whose hit to a late blight sequence had a lower E-value than their hit to a plant nucleotide sequence were removed from further analysis.

4.0 EST Clustering and Assembly

ESTs were clustered and assembled using the Paracel Transcript Assembly program (version 2.7; Paracel, Pasadena, CA). Following clustering, ESTs were assembled into high-quality contigs if they had 94% similarity, a minimum overlap of 40 bp, and gaps less than 9 bases long. Chimeric sequences were identified during the assembly process and removed to prevent misalignments.

5.0 Functional Annotation

Homology searches of CPGP contigs and singletons were conducted with the Basic Local Alignment Search Tool (BLAST) procedures (Altschul et al., 1990). The unigene set was searched using BLASTX against a local copy of the GenBank protein database (http://www.ncbi.nlm.nih.gov/). A putative function was assigned to a sequence using an E-value cut-off of < 1.0e-10.
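
As a rough illustration of the numeric thresholds given in sections 2.0, 3.0 and 5.0, the following Python sketch applies the N-content and length filters, the late blight decision rule, and the annotation cut-off. It is not the original pipeline. The function names, the use of `None` to mean "no hit", and the handling of an EST that has a significant late blight hit but no significant plant hit are all assumptions made for this example.

```python
from typing import Optional

# Thresholds taken from the text above.
MAX_N_PERCENT = 3                  # sequences with more than 3% Ns are removed
MIN_LENGTH_BP = 100                # sequences shorter than 100 bp are removed
MAX_LENGTH_BP = 4000               # sequences longer than 4000 bp are removed
BLIGHT_EVALUE_CUTOFF = 1.0e-30     # BLASTN screen (section 3.0)
ANNOTATION_EVALUE_CUTOFF = 1.0e-10 # BLASTX annotation (section 5.0)


def passes_quality_filters(seq: str) -> bool:
    """Return True if a sequence survives the length and N-content filters."""
    length = len(seq)
    if length < MIN_LENGTH_BP or length > MAX_LENGTH_BP:
        return False
    n_count = seq.upper().count("N")
    # Integer comparison avoids floating-point issues at exactly 3%.
    return n_count * 100 <= MAX_N_PERCENT * length


def _significant(evalue: Optional[float], cutoff: float) -> bool:
    """A hit is significant if it exists and its E-value is below the cut-off."""
    return evalue is not None and evalue < cutoff


def is_late_blight(blight_evalue: Optional[float],
                   plant_evalue: Optional[float]) -> bool:
    """Return True if the EST should be removed as a late blight sequence.

    Each argument is the best (lowest) E-value against that database,
    or None when there is no hit.
    """
    if not _significant(blight_evalue, BLIGHT_EVALUE_CUTOFF):
        return False
    if not _significant(plant_evalue, BLIGHT_EVALUE_CUTOFF):
        # Assumption: a significant late blight hit with no significant
        # plant hit is treated as late blight. The text does not say this.
        return True
    return blight_evalue < plant_evalue


def has_putative_function(best_blastx_evalue: Optional[float]) -> bool:
    """Return True if the best BLASTX hit is below the annotation cut-off."""
    return _significant(best_blastx_evalue, ANNOTATION_EVALUE_CUTOFF)
```

For example, `passes_quality_filters("ACGT" * 20)` rejects an 80 bp read for length, and `is_late_blight(1e-50, 1e-40)` flags an EST whose late blight hit is stronger than its plant hit.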

6.0 Gene Ontology

A Gene Ontology (http://www.geneontology.org/) term was assigned to the sequences in the unigene set by screening the sequences, using BLASTX, against the Arabidopsis thaliana protein database (http://www.Arabidopsis.org). Sequences were assigned the Gene Ontology term of the annotated Arabidopsis thaliana BLAST hit (E-value cut-off of < 1.0e-10), and then grouped according to classification.

7.0 Terminology

8.0 References

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) "Basic Local Alignment Search Tool". J. Mol. Biol. 215:403-410.

Ewing B, Green P (1998) "Base-Calling of Automated Sequencer Traces Using Phred. II. Error Probabilities". Genome Res. 8:186-194.

Ewing B, Hillier L, Wendl MC, Green P (1998) "Base-Calling of Automated Sequencer Traces Using Phred. I. Accuracy Assessment". Genome Res. 8:175-185.

Ronning CM, Stegalkina SS, Ascenzi RA, Bougri O, Hart AL, Utterbach TR, Vanaken SE, Riedmuller SB, White JA, Cho J, Pertea GM, Lee Y, Karamycheva S, Sultana R, Tsai J, Quackenbush J, Griffiths HM, Restrepo S, Smart CD, Fry WE, Van Der Hoeven R, Tanksley S, Zhang P, Jin H, Yamamoto ML, Baker BJ, Buell CR (2003) "Comparative Analyses of Potato Expressed Sequence Tag Libraries". Plant Physiol. 131(2):419-429.