Transcript Assembly and Alternative Splicing

Transcript assembly

For internal training use only; do not distribute.
易生信 (EHBIO) training

Why can transcripts be assembled?

Reference-based assembly and de novo assembly

Alternative splicing analysis

Reference-based transcriptome analysis workflow, including alternative splicing

[Figure: workflow diagram]
Inputs: raw sequence data (.fastq files), reference genome (.fa file), gene annotation (.gtf file)
Sequencing: RNA-seq reads (2 x 150 bp)
Read alignment to the genome: STAR/HISAT2
Gene identification: StringTie
Differential expression: DESeq2
Alternative splicing: rMATS
Enrichment analysis: GO/GSEA
Network analysis: WGCNA, Cytoscape
Visualization: ImageGP

StringTie

StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus. Its input can include not only the alignments of raw reads used by other transcript assemblers, but also alignments of longer sequences that have been assembled from those reads. In order to identify differentially expressed genes between experiments, StringTie's output can be processed by specialized software like Ballgown, Cuffdiff or other programs (DESeq2, edgeR, etc.).

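Downloading and installing StringTie

Download (the download URL did not survive in the source):
    wget
Unpack the archive:
    tar -xvzf stringtie-1.3.4d.Linux_x86_64.tar.gz
Enter the directory and print its absolute path:
    cd stringtie-1.3.4d.Linux_x86_64
    pwd
Copy the printed path, open ~/.bash_profile with vim, and add the following line, replacing <copied path> with that path:
    export PATH=<copied path>:$PATH
Save and quit, then reload the profile:
    source ~/.bash_profile
Typing stringtie -h in the terminal should now print the option documentation.

Transcript assembly

    # The input may be in SAM or BAM format; skip this step if it is already sorted by coordinate
    samtools sort -@ 1 -O bam -o sampleA.sorted.bam sampleA.sam
    # The input must be a coordinate-sorted BAM file; for a strand-specific library, add the matching option (--rf or --fr)
    stringtie -G GRCh38.gtf -l sampleA -f 0.15 -p 1 -o sampleA/sampleA.stringtie_first.gtf sampleA.sorted.bam

Merging transcripts across samples

    # --merge mode produces a merged GTF file from a set of GTF files;
    # mergelist.txt holds the path and file name of each sample's assembled GTF file
    stringtie --merge -p 1 -G GRCh38.gtf -l ehbio_trans -o stringtie_merged.gtf mergelist.txt

mergelist.txt:
    /ehbio/RNAseq/assmble/sampleA.stringtie_first.gtf
    /ehbio/RNAseq/assmble/sampleB.stringtie_first.gtf
    /ehbio/RNAseq/assmble/sampleC.stringtie_first.gtf
    /ehbio/RNAseq/assmble/sampleD.stringtie_first.gtf

Transcript quantification

    # -e limits estimation to the transcripts given with -G; -B also writes Ballgown input tables
    stringtie -G stringtie_merged.gtf -l sampleA -f 0.15 -p 1 -e -B -o sampleA/sampleA.stringtie_quant.gtf sampleA.sorted.bam

A read count table can be compiled from the quantification GTF files with the prepDE.py script (the counts can also be computed with htseq-count or featureCounts). Download prepDE.py (the download URL did not survive in the source):
    wget
Run it (the value given to -g, the gene count matrix file, is missing from the source):
    # sample_lst.txt holds each sample name and the path to its quantification GTF
    prepDE.py -i sample_lst.txt -g

gffcompare

The program gffcompare can be used to compare, merge, annotate and estimate accuracy of one or more GFF files (the "query" files), when compared with a reference annotation.

    gffcompare -R -r GRCh38.gtf -o sampleA2GRCh38 -i gtf_list

gtf_list:
    /ehbio/RNAseq/assmble/sampleA.stringtie_first.gtf
    /ehbio/RNAseq/assmble/sampleB.stringtie_first.gtf
    /ehbio/RNAseq/assmble/sampleC.stringtie_first.gtf
    /ehbio/RNAseq/assmble/sampleD.stringtie_first.gtf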
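The list files used in the steps above (mergelist.txt for stringtie --merge, sample_lst.txt for prepDE.py) are plain text and easy to generate. The following is a minimal Python sketch, not part of the original slides. It assumes each sample has its own directory holding `<sample>.stringtie_first.gtf` and `<sample>.stringtie_quant.gtf`, and that prepDE.py accepts one "name path" pair per line separated by whitespace. The function names and the base directory `/ehbio/RNAseq` are illustrative choices.

```python
from pathlib import PurePosixPath


def merge_list_lines(samples, base_dir):
    """Return one first-pass assembly GTF path per sample (for mergelist.txt)."""
    base = PurePosixPath(base_dir)
    return [str(base / s / f"{s}.stringtie_first.gtf") for s in samples]


def sample_list_lines(samples, base_dir):
    """Return '<sample> <quantification GTF path>' per sample (for sample_lst.txt)."""
    base = PurePosixPath(base_dir)
    return [f"{s} {base / s / (s + '.stringtie_quant.gtf')}" for s in samples]


if __name__ == "__main__":
    # Hypothetical sample names and base directory; adjust to the real layout.
    samples = ["sampleA", "sampleB", "sampleC", "sampleD"]
    with open("mergelist.txt", "w") as out:
        out.write("\n".join(merge_list_lines(samples, "/ehbio/RNAseq")) + "\n")
    with open("sample_lst.txt", "w") as out:
        out.write("\n".join(sample_list_lines(samples, "/ehbio/RNAseq")) + "\n")
```

Generating both files from one sample list keeps the sample names consistent between the merge and quantification steps.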

gffcompare output files:
    sampleA2GRCh38.stats: overall summary statistics
    sampleA2GRCh38.combined.gtf / sampleA2GRCh38.annotated.gtf: the query transcripts in GTF form
    sampleA2GRCh38.sampleA.stringtie_first.gtf.refmap: matches between the reference annotation and the assembled transcripts
    sampleA2GRCh38.sampleA.stringtie_first.gtf.tmap: the best-matching reference transcript for each assembled transcript
    sampleA2GRCh38.loci: genomic coordinates of the transcript loci
    sampleA2GRCh38.tracking: tracking transfrags through multiple samples
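A quick way to summarize a comparison is to count how many assembled transcripts fall into each class code in the .tmap file. The sketch below is an illustration, not part of the original slides. It assumes the .tmap table is tab-delimited with a header row that includes a `class_code` column. The example rows are made up and use a reduced set of columns.

```python
import csv
import io
from collections import Counter


def count_class_codes(lines):
    """Tally the class_code column of a .tmap table.

    `lines` is any iterable of text lines (an open file works); the table is
    assumed to be tab-delimited with a header row containing `class_code`.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row["class_code"] for row in reader)


# Made-up rows for illustration only, with a reduced set of columns.
EXAMPLE_TMAP = (
    "ref_gene_id\tref_id\tclass_code\tqry_gene_id\tqry_id\n"
    "GENE1\tTX1\t=\tsampleA.1\tsampleA.1.1\n"
    "GENE1\tTX1\tj\tsampleA.1\tsampleA.1.2\n"
    "-\t-\tu\tsampleA.2\tsampleA.2.1\n"
    "GENE2\tTX7\t=\tsampleA.3\tsampleA.3.1\n"
)

if __name__ == "__main__":
    counts = count_class_codes(io.StringIO(EXAMPLE_TMAP))
    for code, n in counts.most_common():
        print(code, n)
```

For a real file, pass an open handle instead of the example string, for instance `count_class_codes(open(path, newline=""))`.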

Transcript classification codes

If gffcompare was run with the -r option (comparing with a reference annotation), tracking rows will contain a "class code" value showing the relationship between a transfrag and the closest reference transcript (where applicable). If the -r option was not used, the rows will all contain "-" in their class code column. The same codes are also shown as the value of the attribute "class_code" in the output GTF file. The class codes are shown below in decreasing order of their priority.

Types of match between assembled transcripts and reference transcripts:
    exact match
    partially contained in the reference
    contains the reference (reverse containment)
    potential novel transcript
    pre-mRNA fragment
    adjacent
    exon overlap on the same strand
    intron match on the opposite strand
    exon overlap on the opposite strand
    located within an intron
    contains the reference within an intron
    polymerase run-on product
    repeat sequence
    unannotated intergenic region

Alternative splicing analysis

Types of alternative splicing

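    exon skipping (SE)
    alternative 5' splice site (A5SS)
    alternative 3' splice site (A3SS)
    mutually exclusive exons (MXE)
    intron retention (RI)

Identifying sample-specific alternative splicing events

rMATS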

    tar -xzf rMATS.4.0.2.tgz
    cd rMATS.4.0.2/

Pre-requisites (from the user guide, step by step):
    Install Python 2.7.x and corresponding versions of NumPy and SciPy
    Download and install pysam (the version rMATS was tested with is missing from the source)
    Download and install samtools (version 1.2 or later)
    Download and install STAR (version 2.5 or later)

Installing the Python packages and libraries rMATS depends on

    pip install numpy --user
    pip install scipy --timeout=1000 --user   # --timeout can be set when downloads are slow
    pip install pysam --user

Which version to use?

Open a Python console and type in:
    >>> import sys
    >>> print sys.maxunicode
    1114111
This output indicates that your Python is built with --enable-unicode=ucs4, and you should use rMATS-turbo-xxx-UCS4.

    >>> import sys
    >>> print sys.maxunicode
    65535
This output indicates that your Python is built with --enable-unicode=ucs2, and you should use rMATS-turbo-xxx-UCS2.

Required rMATS parameters

    --s1 s1.txt, --s2 s2.txt: text files listing each group's sample FASTQ files, comma-separated
    --b1 b1.txt, --b2 b2.txt: text files listing each group's sample BAM files, comma-separated
    -t readType: paired for paired-end sequencing, otherwise single
    --readLength: length of the sequencing reads
    --gtf: GTF annotation file
    --bi: location of the STAR index, needed when the input files are FASTQ
    --od: output directory

Optional rMATS parameters

    --nthread: number of threads
    --tstat: number of threads for the statistical computation
    Adding these is recommended.

rMATS run scripts

    # Input files are in FASTQ format
    python rMATS-turbo-xxx-UCSx/rmats.py --s1 s1.txt --s2 s2.txt --gtf gtfFile --bi STARindexFile --od out_directory -t paired --nthread 2 --tstat 2 --readLength 101 --tophatAnchor 8 --cstat 0.0001

    # Input files are in BAM format
    python rMATS-turbo-xxx-UCSx/rmats.py --b1 b1.txt --b2 b2.txt --gtf gtfFile --od bam_out_directory -t paired --nthread 2 --tstat 2 --readLength 101 --cstat 0.0001 --libType fr-unstranded

rMATS results

    AS_Event.MATS.JC.txt: evaluates splicing with only reads that span splicing junctions
    AS_Event.MATS.JCEC.txt: evaluates splicing with reads that span splicing junctions and reads on target
    fromGTF.AS_Event.txt: all possible alternative splicing (AS) events derived from GTF and RNA
    JC.raw.input.AS_Event.txt: evaluates splicing with only reads that span splicing junctions
    JCEC.raw.input.AS_Event.txt: evaluates splicing with reads that span splicing junctions and reads on target

[Figures: interpretation of the rMATS output and the meaning of each output column]

rmats2sashimiplot: converting the rMATS output into a sashimi plot

Requirements:
    Python 2.6.x or Python 2.7.x
    samtools; BAM files must be sorted before visualization and indexed (samtools index sample.bam)

Install from source (the download URL did not survive in the source):
    wget
    unzip rmats2sashimiplot-master.zip
    cd rmats2sashimiplot-master
    python setup.py install
or install with pip:
    pip install rmats2sashimiplot --user

Required rmats2sashimiplot parameters

    --s1 and --s2: comma-separated SAM files of the samples in each group
    --b1 and --b2: comma-separated BAM files of the samples in each group
    -t: alternative splicing event type
    -e: the rMATS output file for that event type
    --l1 and --l2: names of the first and second sample groups

[Figure: optional rmats2sashimiplot parameters]

rmats2sashimiplot run script

    rmats2sashimiplot --s1 sampleA.R1.sam,sampleA.R2.sam,sampleA.R3.sam --s2 sampleB.R1.sam,sampleB.R2.sam,sampleB.R3.sam -t SE -e /MATS_output/sampleA_sampleB.SE.MA
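Because the events file given with -e determines which events get plotted, it can help to reduce an rMATS result table to the events of interest first. The following is a minimal Python sketch, not part of the original slides. It assumes the table is tab-delimited with `FDR` and `IncLevelDifference` columns. The cutoffs 0.05 and 0.1 are illustrative, not recommendations from the source, and the example rows are made up.

```python
import csv
import io


def filter_events(lines, max_fdr=0.05, min_abs_diff=0.1):
    """Yield rows whose FDR and |IncLevelDifference| pass the given cutoffs.

    `lines` is any iterable of text lines (an open file works). Rows with
    non-numeric values such as "NA" are skipped.
    """
    for row in csv.DictReader(lines, delimiter="\t"):
        try:
            fdr = float(row["FDR"])
            diff = float(row["IncLevelDifference"])
        except ValueError:
            continue
        if fdr <= max_fdr and abs(diff) >= min_abs_diff:
            yield row


# Made-up rows for illustration only, with a reduced set of columns.
EXAMPLE_EVENTS = (
    "ID\tGeneID\tFDR\tIncLevelDifference\n"
    "1\tg1\t0.01\t0.3\n"
    "2\tg2\t0.5\t0.4\n"
    "3\tg3\t0.001\t-0.02\n"
    "4\tg4\tNA\tNA\n"
)

if __name__ == "__main__":
    for event in filter_events(io.StringIO(EXAMPLE_EVENTS)):
        print(event["ID"], event["GeneID"], event["FDR"], event["IncLevelDifference"])
```

The kept rows can be written back out with `csv.DictWriter` using the same tab delimiter. Whether rmats2sashimiplot accepts the rewritten file unchanged should be checked against the installed version.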
