Tue Sep 18  Intro 1: Computing, statistics, Perl, Mathematica
Tue Sep 25  Intro 2: Biology, comparative genomics, models & evidence, applications
Tue Oct 02  DNA 1: Polymorphisms, populations, statistics, pharmacogenomics, databases
Tue Oct 09  DNA 2: Dynamic programming, Blast, multi-alignment, Hidden Markov Models
Tue Oct 16  RNA 1: 3D-structure, microarrays, library sequencing & quantitation concepts
Tue Oct 23  RNA 2: Clustering by gene or condition, DNA/RNA motifs
Tue Oct 30  Protein 1: 3D structural genomics, homology, dynamics, function & drug design
Tue Nov 06  Protein 2: Mass spectrometry, modifications, quantitation of interactions
Tue Nov 13  Network 1: Metabolic kinetic & flux balance optimization methods
Tue Nov 20  Network 2: Molecular computing, self-assembly, genetic algorithms, neural-nets
Tue Nov 27  Network 3: Cellular, developmental, social, ecological & commercial models
Tue Dec 04  Project presentations
Tue Dec 11  Project presentations
Tue Jan 08  Project presentations
Tue Jan 15  Project presentations

Bio 101: Genomics & Computational Biology

RNA1: Last week's take-home lessons
- Integration with previous topics (HMM for RNA structure)
- Goals of molecular quantitation (maximal fold-changes,
clustering & classification of genes & conditions/cell types, causality)
- Genomics-grade measures of RNA and protein and how we choose (SAGE, oligo-arrays, gene-arrays)
- Sources of random and systematic errors (reproducibility of RNA source(s), biases in labeling, non-polyA RNAs, effects of array geometry, cross-talk)
- Interpretation issues (splicing, 5' & 3' ends, editing, gene families, small RNAs, antisense, apparent absence of RNA)
- Time series data: causality, mRNA decay, time-warping

RNA2: Today's story & goals
- Clustering by gene and/or condition
- Distance and similarity measures
- Clustering & classification
- Applications
- DNA & RNA motif discovery & search

Gene Expression Clustering Decision Tree
- Data: Ratios, Log Ratios, Absolute Measurement
- Data Normalization
  - How to normalize: Variance normalize, Mean center normalize, Median center normalize
  - What to normalize: genes, conditions
- Distance Metric: Euclidean Dist., Manhattan Dist., Sup. Dist., Correlation Coeff.
- Linkage: Single, Complete, Average, Centroid
- Clustering Method
  - Unsupervised | Supervised: SVM, Relevance Networks
  - Hierarchical | Non-hierarchical: Minimal Spanning Tree, K-means, SOM

(Whole genome) RNA quantitation objectives
- RNAs showing maximum change; minimum change detectable/meaningful
- RNA absolute levels (compare protein levels); minimum amount detectable/meaningful
- Classification: drugs & cancers
- Network: direct causality, motifs

Clustering vs. supervised learning
- K-means clustering
- SOM = Self Organizing Maps
- SVD = Singular Value Decomposition
- PCA = Principal Component Analysis
- SVM = Support Vector Machine classification and Relevance networks
  Brown et al. PNAS 97:262; Butte et al. PNAS 97:12182

Cluster analysis of mRNA expression data
- By gene (rat spinal cord development, yeast cell cycle): Wen et al., 1998; Tavazoie et al., 1999; Eisen et al., 1998; Tamayo et al., 1999
- By condition or cell-type or by gene & cell-type (human cancer): Golub et al., 1999; Alon et al., 1999; Perou et al., 1999; Weinstein et al., 1997
- Cheng, ISMB 2000.

Cluster Analysis
[Figure labels: Protein/protein complex; Genes; DNA regulatory elements]

Clustering: hierarchical & non-hierarchical
- Hierarchical: a series of successive fusions of data until a final number of clusters is obtained; e.g. Minimal Spanning Tree: each component of the population starts as its own cluster. Next, the two clusters with the minimum distance between them are fused to form a single cluster. This is repeated until all components are grouped.
- Non-hierarchical: e.g. K-means: K clusters are chosen such that the points are mutually farthest apart. Each component in the population is assigned to one cluster by minimum distance. The centroid's position is recalculated, and the process repeats until all the components are grouped. The criterion minimized is the
within-cluster sum of the variance.

Clusters of Two-Dimensional Data

Key Terms in Cluster Analysis
- Distance measures
- Similarity measures
- Hierarchical and non-hierarchical
- Single/complete/average linkage
- Dendrogram

Distance Measures: Minkowski Metric

Most Common Minkowski Metrics

An Example
[Figure: points x and y, with lengths labeled 4 and 3]

Manhattan distance is called Hamming distance when all features are binary.
[Figure: Gene Expression Levels Under 17 Conditions (1 = High, 0 = Low)]

Similarity Measures: Correlation Coefficient
What kind of x and y give linear CC?

Similarity Measures: Correlation Coefficient
[Figure: three panels of Expression Level vs. Time for Gene A and Gene B]

Hierarchical Clustering Dendrograms
Alon et al. 1999
Clustering tree for the tissue samples: tumors (T) and normal tissue (n).

Hierarchical Clustering Techniques
The distance between two clusters is defined as the distance between
- Single-Link Method / Nearest Neighbor:
their closest members.
- Complete-Link Method / Furthest Neighbor: their furthest members.
- Centroid: their centroids.
- Average: average of all cross-cluster pairs.

Single-Link Method
[Figure: points a, b, c, d with a distance matrix (Euclidean distance); steps (1)-(3) form clusters a,b; then a,b,c; then a,b,c,d]

Complete-Link Method
[Figure: points a, b, c, d with a distance matrix (Euclidean distance); steps (1)-(3) form clusters a,b; then c,d; then a,b,c,d]

Dendrograms
[Figure: Single-Link and Complete-Link dendrograms; axis ticks 0, 2, 4, 6]

Which clustering methods do you suggest for the following two-dimensional data?
Nadler and Smith, Pattern Recognition Engineering, 1993

Gene Expression Clustering Decision Tree
[Figure repeated: Data | Data Normalization | Distance Metric | Linkage | Clustering Method]

Normalized Expression Data
Tavazoie et al. 1999
[Figure: axes Time-point 1, Time-point 2, Time-point 3; points Gene 1, Gene 2]

Normalized Expression Data from microarrays
[Figure: matrix with rows Gene 1 ... Gene N and columns T1, T2, T3]

Representation of expression data
[Figure: d_ij]

Identifying prevalent expression patterns (gene clusters)
[Figure: axes Time-point 1, Time-point 2, Time-point 3; three panels of Normalized Expression vs. Time-point (1, 2, 3)]

Cluster contents
[Figure: genes by MIPS functional category: Glycolysis, Nuclear Organization, Ribosome, Translation, Unknown]

RNA2: Today's story & goals
- Clustering by gene and/or condition
- Distance and similarity measures
- Clustering & classification
- Applications
- DNA & RNA motif discovery & search

Motif-finding algorithms
- oligonucleotide frequencies
- Gibbs sampling (e.g. AlignACE)
- MEME
- ClustalW
- MACAW

Feasibility of a whole-genome motif search?
[Figure: Transcription control sites (7 bases of information); Genome (12 Mb)]
- 7 bases of information (14 bits) give about 1 match every 16000 sites.
- 1500 such matches in a 12 Mb genome (24 × 10^6 sites).
- The distribution of numbers of sites for different motifs is Poisson with mean 1500, which can be approximated as normal with a mean of 1500 and a standard deviation of 40 sites.
- Therefore, 100 sites are needed to achieve a detectable signal above background.

Sequence Search Space Reduction
- Whole-genome mRNA expression data: two-way comparisons between different conditions or mutants, clustering/grouping over many conditions/timepoints.
- Shared phenotype (functional category).
- Conservation among different species.
- Details of the sequence selection: eliminate protein-coding regions, repetitive regions, and any other sequences not likely to contain control sites.

Motif Finding: AlignACE (Aligns nucleic Acid Conserved Elements)
- Modification of Gibbs Motif Sampling (GMS), a routine for motif finding in protein sequences (Lawrence, et al. Science 262:208-214, 1993).
- Advantages of GMS:
  - stochastic sampling
  - variable number of sites per input sequence
  - distributed information content per motif
- AlignACE modifications:
  - considers both strands of DNA simultaneously
  - efficiently returns multiple distinct motifs
  - various other tweaks

AlignACE Example: Input Data Set
5'- TCTCTCTCCACGGCTAATTAGGTGATCATGAAAAAATGAAAAATTCATGAGAAAAGAGTCAGACATCGAAACATACAT
5'- ATGGCAGAATCACTTTAAAACGTGGCCCCACCCGCTGCACCCTGTGCATTTTGTACGTTACTGCGAAATGACTCAACG
5'- CACATCCAACGAATCACCTCACCGTTATCGTGACTCACTTTCTTTCGCATCGCCGAAGTGCCATAAAAAATATTTTTT
5'- TGCGAACAAAAGAGTCATTACAACGAGGAAATAGAAGAAAATGAAAAATTTTCGACAAAATGTATAGTCATTTCTATC
5'- ACAAAGGTACCTTCCTGGCCAATCTCACAGATTTAATATAGTAAATTGTCATGCATATGACTCATCCCGAACATGAAA
5'- ATTGATTGACTCATTTTCCTCTGACTACTACCAGTTCAAAATGTTAGAGAAAAATAGAAAAGCAGAAAAAATAAATAA
5'- GGCGCCACAGTCCGCGTTTGGTTATCCGGCTGACTCATTCTGACTCTTTTTTGGAAAGTGTGGCATGTGCTTCACACA
Genes: HIS7, ARO4, ILV6, THR4, ARO1, HOM2, PRO3
300-600 bp of upstream sequence per gene are searched in Saccharomyces cerevisiae.

AlignACE Example: The Target Motif
[Figure: the seven input sequences]
Aligned sites:
AAAAGAGTCA
AAATGACTCA
AAGTGAGTCA
AAAAGAGTCA
GGATGAGTCA
AAATGAGTCA
GAATGAGTCA
AAAAGAGTCA
MAP score = 20.37 (maximum)

AlignACE Example: Initial Seeding
[Figure: the seven input sequences]
Aligned sites:
TGAAAAATTC
GACATCGAAA
GCACTTCGGC
GAGTCATTAC
GTAAATTGTC
CCACAGTCCG
TGTGAAGCAC
MAP score = -10.0

AlignACE Example: Sampling
[Figure: the seven input sequences; the seven seeded sites, then the same sites plus the candidate TCTCTCTCCA, labeled "Add?"]
How much better is the alignment with this site as opposed to without?

AlignACE Example: Continued Sampling
[Figure: the seven input sequences; the seven seeded sites shown twice, labeled "Add?" and "Remove."; site ATGAAAAAAT]
How much better is the alignment with this site as opposed to without?

AlignACE Example: Continued Sampling
[Figure: the seven input sequences; the seeded sites without TGAAAAATTC, then all seven seeded sites, labeled "Add?"]
How much better is the alignment with this site as opposed to without?

AlignACE Example: Column Sampling
[Figure: the seven input sequences; sites GACATCGAAA, GCACTTCGGC, GAGTCATTAC, GTAAATTGTC, CCACAGTCCG, TGTGAAGCAC, then the same sites with one more column: GACATCGAAAC, GCACTTCGGCG, GAGTCATTACA, GTAAATTGTCA, CCACAGTCCGC, TGTGAAGCACA]
How much better is the alignment with this new column structure?

AlignACE Example: The Best Motif
[Figure: the seven input sequences with the eight aligned sites of the target motif]
MAP score = 20.37

AlignACE Example: Masking (old way)
[Figure: the seven input sequences with one position of each aligned site replaced by X, e.g. AAAAGAGTCA shown as AAAAXAGTCA]
- Take the best motif found after a prescribed number of random seedings.
- Select the strongest position of the motif.
- Mark these sites in the input sequence, and do not allow future motifs to sample those sites.
- Continue sampling.

AlignACE Example: Masking (new way)
- Maintain a list of all distinct motifs found.
- Use CompareACE to compare subsequent motifs to those already found.
- Quickly reject weaker, but similar motifs.

MAP Score
- B, Γ = standard Beta & Gamma functions
- N = number of aligned sites; T = number of total possible sites
- F_jb = number of occurrences of base b at position j (F = sum)
- G_b = background genomic frequency for base b
- β_b = n × G_b for n pseudocounts (β = sum)
- W = width of motif; C = number of columns in motif (W=C)

MAP Score
- N = number of aligned sites
- R = overrepresentation of those sites
- MAP ~ N log R

AlignACE Example: Final Results (alignment of upstream regions from 116 amino acid biosynthetic genes in S. cerevisiae)
[Table: columns MAP score | Motif. MAP score values are fused in the extracted text: 188.38578.116320.620128.1044117.52831.10173.42768.2458619.37955.099389.42922.78973]

Indices used to evaluate motif significance
- Group specificity
- Functional enrichment
- Positional bias
- Palindromicity
- Known motifs (CompareACE)

Searching for additional motif instances in the entire genome sequence
Searches over the entire genome for additional high-scoring instances of the motif are done using the ScanACE program, which uses the Berg & von Hippel weight matrix (1987).
- M = length of binding site motif
- B = base at position l within the motif
- n_lB = number of occurrences of base B at position l in the input alignment
- n_lO = number of occurren
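
Returning to the distance and similarity measures listed earlier (Euclidean, Manhattan, sup distance, correlation coefficient): the slide formulas did not survive extraction, so the following is a minimal sketch using the standard textbook definitions, not the lecture's own notation. The function names are my own, and the example points are an assumption based on the "4, 3" labels in the Minkowski example figure.

```python
import math

def minkowski(x, y, r):
    """Minkowski distance of order r between two equal-length vectors."""
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)

def manhattan(x, y):
    """Order r = 1: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    """Order r = 2: straight-line distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def sup_distance(x, y):
    """Limit r -> infinity: largest single coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))

def pearson(x, y):
    """Pearson (linear) correlation coefficient; undefined for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical points: 4 apart on one axis, 3 apart on the other.
x, y = (0, 0), (4, 3)
print(manhattan(x, y), euclidean(x, y), sup_distance(x, y))  # 7 5.0 4

# Binary expression profiles (1 = high, 0 = low): Manhattan equals Hamming.
gene_a = [1, 0, 1, 1, 0]
gene_b = [1, 1, 1, 0, 0]
print(manhattan(gene_a, gene_b))  # 2
```

The correlation coefficient ignores scale and offset, so two genes whose profiles rise and fall together score near 1 even when their absolute levels differ, which the distance measures would penalize.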
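
The single-link and complete-link rules from the hierarchical clustering slides can be compared on a toy data set. This is a naive sketch, assuming four hypothetical one-dimensional points named a to d (the coordinates are mine, not those of the slide figures) and using Python's `math.dist` for Euclidean distance. Centroid linkage is left out for brevity.

```python
import math
from itertools import combinations

def cluster_distance(c1, c2, coords, linkage):
    """Distance between two clusters of point names under a linkage rule."""
    pairwise = [math.dist(coords[p], coords[q]) for p in c1 for q in c2]
    if linkage == "single":      # closest members
        return min(pairwise)
    if linkage == "complete":    # furthest members
        return max(pairwise)
    if linkage == "average":     # mean over all cross-cluster pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage: {linkage}")

def agglomerate(coords, linkage):
    """Fuse the two closest clusters until one remains; return fusions in order."""
    clusters = [frozenset([name]) for name in coords]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], coords, linkage),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append("".join(sorted(a | b)))
    return merges

# Hypothetical points, chosen so that the two rules disagree on the second fusion.
coords = {"a": (0.0,), "b": (1.0,), "c": (3.0,), "d": (5.5,)}
print(agglomerate(coords, "single"))    # ['ab', 'abc', 'abcd']
print(agglomerate(coords, "complete"))  # ['ab', 'cd', 'abcd']
```

Single-link chains c onto the existing cluster because c is close to b, while complete-link prefers the compact pair c,d. The two fusion orders match the cluster labels that survive from the Single-Link Method and Complete-Link Method figures.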
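
The K-means procedure described under non-hierarchical clustering (choose K mutually distant starting points, assign each point to the nearest centroid, recalculate centroids, repeat) can be sketched as follows. The farthest-first seeding is one possible reading of "mutually farthest apart" and is my assumption, as are the function names and the toy points. The value returned as `wss` is the within-cluster sum of squared distances, which I take to be the criterion the slide calls the within-cluster sum of the variance.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Pick k starting centers, each as far as possible from those already chosen."""
    centers = [points[0]]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(nxt)
    return centers

def kmeans(points, k, max_iter=100):
    """Return (labels, centers, within-cluster sum of squares)."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:   # assignments stable: converged
            break
        labels = new_labels
        # Recalculate each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wss

# Hypothetical two-dimensional data with two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers, wss = kmeans(points, k=2)
print(labels)          # [0, 0, 0, 1, 1, 1]
print(round(wss, 3))   # 2.667
```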
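
The arithmetic behind the "Feasibility of a whole-genome motif search?" slide can be checked directly. This sketch assumes that the 24 × 10^6 sites are the start positions on both strands of a 12 Mb genome, and that each fully specified base contributes 2 bits. The function name is hypothetical. Reading the 100-site threshold as roughly 2.5 standard deviations above background is my interpretation, not something the slide states.

```python
import math

def background_matches(motif_bases, genome_bp, strands=2):
    """Expected chance matches to a fully specified motif, with Poisson sd."""
    total_sites = strands * genome_bp        # start positions on both strands
    sites_per_match = 4 ** motif_bases       # 2 bits per specified base
    expected = total_sites / sites_per_match
    return expected, math.sqrt(expected)     # Poisson: variance equals mean

expected, sd = background_matches(motif_bases=7, genome_bp=12_000_000)
print(4 ** 7)               # 16384, which the slide rounds to 16000
print(round(expected))      # 1465, which the slide rounds to 1500
print(round(sd, 1))         # 38.3, which the slide rounds to 40
print(round(100 / sd, 1))   # 2.6 standard deviations for 100 extra sites
```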
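
One of the AlignACE modifications listed above is that both strands of DNA are considered simultaneously. The sketch below shows only exact-match lookup of a known site on either strand. It is not AlignACE's Gibbs sampler or ScanACE's weight-matrix scoring, and the function names are my own. The two sequences and the two sites are copied from the AlignACE example slides.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def find_sites(sequence, site):
    """Return (position, strand) for each exact occurrence on either strand.

    Positions are 0-based on the given sequence. A palindromic site would be
    reported once per strand.
    """
    hits = []
    for strand, pattern in (("+", site), ("-", reverse_complement(site))):
        start = sequence.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = sequence.find(pattern, start + 1)
    return hits

# First and third upstream sequences from the input data set.
seq1 = "TCTCTCTCCACGGCTAATTAGGTGATCATGAAAAAATGAAAAATTCATGAGAAAAGAGTCAGACATCGAAACATACAT"
seq3 = "CACATCCAACGAATCACCTCACCGTTATCGTGACTCACTTTCTTTCGCATCGCCGAAGTGCCATAAAAAATATTTTTT"

print(find_sites(seq1, "AAAAGAGTCA"))   # found as written, on the "+" strand
print(find_sites(seq3, "AAGTGAGTCA"))   # found only as its reverse complement
```

Several of the aligned sites in the target motif appear in the input sequences only as reverse complements, which is why a single-strand search would miss them.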