BMC Biotechnology (BioMed Central)
Software, Open Access

Gene Composer: database software for protein construct design, codon engineering, and gene synthesis

Don Lorimer(1,2), Amy Raymond(1,2,3), John Walchli(3,4), Mark Mixon(3,4), Adrienne Barrow(1,3), Ellen Wallace(1,3), Rena Grice(1,3), Alex Burgin(1) and Lance Stewart*(1,2,3)

Address: (1) deCODE biostructures, Inc., 7869 NE Day Road West, Bainbridge Island, WA, 98110, USA, (2) Seattle Structural Genomics Center for Infectious Disease, Bainbridge Island, WA, 98110, USA, (3) Accelerated Technologies Center for Gene to 3D Structure, Bainbridge Island, WA, 98110, USA and (4) Emerald BioSystems, Inc., 7869 NE Day Road West, Bainbridge Island, WA, 98110, USA

E-mail: Don Lorimer - dlorimerdecode; Amy Raymond - araymonddecode; John Walchli - jwalchlidecode; Mark Mixon - mmixondecode; Adrienne Barrow - abarrowdecode; Ellen Wallace - ewallacedecode; Rena Grice - rgricedecode; Alex Burgin - aburgindecode; Lance Stewart* - lstewartdecode
*Corresponding author

Published: 21 April 2009
Received: 16 October 2008
Accepted: 21 April 2009

BMC Biotechnology 2009, 9:36 doi: 10.1186/1472-6750-9-36
This article is available from: :/ biomedcentral /1472-6750/9/36
2009 Lorimer et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( :/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: To improve efficiency in high throughput protein structure determination, we have developed a database software package, Gene Composer, which facilitates the information-rich design of protein constructs and their codon engineered synthetic gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bio-informatics steps used in modern structure guided protein engineering and synthetic gene engineering.

Results: An interactive Alignment Viewer allows the researcher to simultaneously visualize sequence conservation in the context of known protein secondary structure, ligand contacts, water contacts, crystal
contacts, B-factors, solvent accessible area, residue property type and several other useful property views. The Construct Design Module enables the facile design of novel protein constructs with altered N- and C-termini, internal insertions or deletions, point mutations, and desired affinity tags. The modifications can be combined and permuted into multiple protein constructs, and then virtually cloned in silico into defined expression vectors. The Gene Design Module uses a protein-to-gene algorithm that automates the back-translation of a protein amino acid sequence into a codon engineered nucleic acid gene sequence according to a selected codon usage table with minimal codon usage threshold, defined G:C% content, and desired sequence features achieved through synonymous codon selection that is optimized for the intended expression system. The gene-to-oligo algorithm of the Gene Design Module plans out all of the required overlapping oligonucleotides and mutagenic primers needed to synthesize
the desired gene constructs by PCR, and for physically cloning them into selected vectors by the most popular subcloning strategies.

Conclusion: We present a complete description of Gene Composer functionality, and an efficient PCR-based synthetic gene assembly procedure with mismatch-specific endonuclease error correction in combination with PIPE cloning. In a sister manuscript we present data on how Gene Composer designed genes and protein constructs can result in improved protein production for structural studies.

Background

Large-scale projects in genomic sequencing and protein structure determination are producing enormous quantities of data on the relationships between 2D gene sequence and 3D protein structure. Moreover, such efforts are providing experimental data on success factors at every step in the gene to structure research endeavor. Ideally, this wealth of information should be used in a feedback cycle to facilitate the design and production of genes and protein constructs that are optimized for the successful production of functional protein samples for structural studies. Fundamentally, this goal represents a bioinformatics software challenge. With the goal of improving yield and success rates of heterologous protein production for structural studies, we have developed Gene Composer, a database software package which facilitates the information-rich design of protein constructs and their codon engineered synthetic gene sequences.

The redundancy of the genetic code allows any given protein to be encoded by a very large number of possible synonymous gene sequences. On average, each amino acid can be encoded by approximately three different codons (61 amino acid codons/20 amino acids). For a typical 100 amino acid protein there would be 3^100 (5 × 10^47) different possible coding sequences. The degeneracy of the genetic code therefore allows the pressures of natural selection to simultaneously influence both DNA and RNA sequence features in addition to protein coding function. DNA sequence elements and folded RNA structures are known to play significant roles in gene expression. As such, the overlapping information contained in a gene sequence can be significantly more complex than coding for a linear amino acid sequence. For example, in the tryptophan operon of E. coli, the mRNA can fold into one of two mutually exclusive conformations that are a direct consequence of tryptophan availability [1]. These alternate conformations affect mRNA stability and therefore alter the expression of the encoded proteins. It is also well established that codon preferences between species, and often between gene families within a given species, can vary [2,3]. Therefore, some gene sequences may behave better than others in supporting high-level translation for heterologous protein expression. Being able to tailor synthetic gene sequences by codon engineering to favor optimal heterologous expression is a well established strategy for improving heterologous protein expression for structural biology [4].

Given the overlapping nature of information content in gene sequences (DNA, RNA, and protein level) we endeavored to create a database and software tool called Gene Composer which facilitates the design of protein constructs, guided by 2D and 3D information, while the corresponding nucleic acid sequences are engineered for both codon usage and other desired sequence features. Gene Composer also enables the virtual cloning of the designed gene constructs which, depending on user preferences, can be parsed into data files for online ordering of complete genes or overlapping oligonucleotides that can be used for PCR-based gene assembly in any standard molecular biology lab. Gene Composer operates within the Windows operating system and utilizes a network based SQL server or Access database that is populated by users as they design genes. This arrangement makes it possible for multiple users to go back after time to design new construct variants that improve on existing designs by inclusion of new sequence or structural information from international genome sequencing and structural genomics efforts. In this report we describe how the synthetic gene design modules of Gene Composer facilitate protein construct engineering for structural studies, codon engineering for heterologous protein production, and oligonucleotide planning for PCR-based gene assembly with mismatch endonuclease error correction.

Implementation and results

Gene Composer Software

Gene Composer has a modular design to facilitate the work of protein engineers and structural biologists. It combines, within a single database software product, the ability to carry out comparative sequence alignments (Alignment Viewer) that facilitate interactive protein construct design with virtual cloning (Construct Design Module), followed by codon engineering of novel synthetic gene sequences that are optimized for protein expression in various recombinant systems (Gene Design Module). Gene Composer is written in C++ for Windows operating systems, and runs together with either an Access or SQL database.

Alignment Viewer

A typical gene design cycle is initiated when a user defines a protein target name and a project name, which establish the key database identifiers with which all subsequent Gene Composer workflow is associated. Once these identifiers have been established, the user is presented with a file navigation interface that allows one to import information into the Gene Composer database from multiple sources such as FASTA sequence files from BLAST [5] searches, existing sequence alignments, simple text (.txt) files, and structure files from the Protein Data Bank (PDB, :/ /pdb). From this imported information, Gene Composer uses the popular ClustalW algorithm [6,7] to calculate comparative protein sequence alignments, which are presented in a distilled format within the interactive Alignment Viewer (Figure 1). This Alignment Viewer allows the researcher to simultaneously visualize sequence conservation in the context of known protein secondary structure, ligand contacts, water contacts, crystal contacts, residue property type and several other useful property views that are used to guide interactive decision making for protein construct design.

Importantly, the native amino acid residue numbering scheme is preserved throughout any alignment manipulations. However, since the amino acid sequence numbering scheme of PDB files is often not congruent with the residue numbering of native full length gene sequences, users can select a given sequence and re-define the starting residue number. In this way, users can arrive at a common residue numbering scheme to help ensure accuracy in subsequent construct design. Finally, the information rich alignments can be saved in the Gene Composer database and exported to several other formats including *.aln, *.xml, and *.pdf for facile data sharing between researchers.

Areas of sequence conservation are highlighted within the Alignment Viewer according to a user defined color scheme and a consensus sequence is displayed below the alignment. Protein secondary structure information is extracted from PDB files
and displayed in common graphic annotation underneath their associated linear amino acid sequences. Importantly, the Alignment Viewer presents both the chain sequence for the protein that went into crystallization as well as the experimentally refined model sequence from the PDB coordinate file. This allows the user to easily visualize which amino acid residues had no structural information reported in the PDB file, displayed as blank gaps in the model sequence. Such residues are usually located within highly flexible regions of the protein and do not contribute to X-ray diffraction.

Protein Construct Design and Automated Cloning

The Construct Design Module works in concert with the Alignment Viewer allowing the researcher to interactively define novel protein constructs with altered amino- and carboxy-termini, internal insertions or deletions, point mutations, and added affinity tags. The construct design tools are connected to the Alignment Viewer by a cursor that shows the user exactly where in the sequence alignment the desired changes are being made. For example, the user can set the cursor within the Alignment Viewer at a domain boundary as visualized in the comparative sequence alignment and then truncate the construct at that site. The desired modifications can be virtually combined and permuted in silico to arrive at multiple desired protein constructs (Figure 2). The user can also add a variety of adaptor assemblies at the DNA sequence level to facilitate the virtual and physical cloning of the constructs into
multiple defined expression vectors (Figure 3). Importantly, the Gene Composer virtual in silico cloning utility manages the inserts, vectors, and adaptor assemblies as three independent informatics components that are combined by the user to arrive at final vector clones [11]. After the virtual cloning is completed, the user can inspect the entire vector with its adaptor assemblies and protein construct inserts. In this way, the user can see exactly how open reading frames are constructed and then easily fix any virtual cloning errors before wet lab work is performed. Many expression vectors come with their own N-terminal or C-terminal affinity tags that must be accurately fused in frame with the protein construct. Visual inspection of the virtual clone ensures that the open reading frame formed by the vector/adaptor/insert combination is intact and accurate.

Users can define a threshold contact distance setting (default setting is 3.4 Angstroms) which Gene Composer uses to generate a simple distance matrix between non-bonded, non-hydrogen atom centers in PDB files. The resulting matrix is used to flag residues in the protein model that participate in ligand contacts, water contacts, and/or crystal contacts. Each contact type is annotated within the Alignment Viewer with special visual symbols displayed below the residue of interest (Figure 1). Crystal contacts are indicated when non-hydrogen atoms of a residue are positioned within 4.0 Angstroms of a neighboring molecule related by crystallographic or non-crystallographic symmetry. Gene Composer has a database of all protein crystal space groups required for the crystal contact calculation. The ability to visualize residues involved in crystal contacts helps the user to identify residues that could be mutated to improve crystal growth [8]. Gene Composer also calculates from PDB file information the relative solvent accessible Connolly surface area [9] and thermal B-factors for residues, which are displayed with relative color intensities to provide a visual representation of the surface location and mobility of amino acids. This facilitates the visual identification
of surface residues that are candidates for surface entropy reduction mutagenesis, which is commonly used to aid protein crystallization [10]. The alignments can also be easily modified and annotated with the aid of an interactive cursor that allows the user to insert or delete sequences, residues, or spaces.

In order to automate the physical cloning of designed constructs using either the Tecan Freedom Evo2 liquid handling system :/ tecan or handheld pipettors, the Construct Design Module can automatically plan out all of the required amplimers (primers) and mutagenic oligonucleotides needed to produce the gene constructs of interest by PCR from a defined template. The template sequence can either be a native cDNA sequence or a codon eng

Figure 1
Gene Composer software, Alignment Viewer. Comparative sequence alignments and corresponding structural information are organized by the Alignment Viewer of Gene Composer, shown here with PDB file sequences for human renal dipeptidase (1ITU), and the amino acid sequences of full length human renal dipeptidase plus selected vertebrate homologues (FASTA files read from a BLAST search using the amino acid sequence file of PDB file 1ITU). The sequence alignments are produced by the popular ClustalW algorithm [6,7]. The conserved consensus sequence is also shown. Solvent accessible surface area is represented in purple shading levels (light is low and dark is high) calculated according to Connolly [9]. Thermal B-factors from the PDB file are normalized and represented in red shading levels (light is low and dark is high). Residues that contain atoms within 3.5 Angstroms (or other user defined setting) of an oxygen atom of water molecules are illustrated with a blue dot. Residues that contain atoms within 4.0 Angstroms (or other user defined setting) of a non-hydrogen atom of a small molecule ligand are illustrated with colored circles or dots (excluding blue dots for water contacts). Residues that contain non-hydrogen atoms within 4.0 Angstroms (or other user defined setting) of any non-hydrogen atom in a neighboring protein chain are illustrated with a green diamond when space group symmetry operators are applied to build crystallographic neighbor molecules, or a red diamond when the contact is between non-crystallographic symmetrically related polypeptide chains. Protein secondary structural regions are represented by red helices for alpha helical regions, blue arrows for beta-sheet regions, and green squiggles for turns, as defined in the respective PDB file format.

Figure 2
Gene Composer software, Protein Construct Design. The Construct Design tool of Gene Composer is a separate window positioned below the Alignment Viewer. To start a Construct Design session, the user defines a Base Construct from the Alignment Viewer (human protein sequence with black highlighted box), whose sequence is presented schematically (green bar) with a sliding window (green box) that delimits the amino acid sequence of the portion of the Base Construct within the sliding window (sequence above the green bar). The Alignment Viewer and Construct Design Tool are coordinated to move with each other as the user conducts various construct design operations across the Base Construct. Desired truncation endpoints are set by placing the cursor (red line in alignment) at desired sequence positions and inserting a right or left pointer for N-terminal and C-terminal truncations, respectively. The user may also create mutations within the amino acid sequence either as a single amino acid change (empty square), insertions (filled triangle above the inserted amino acids), deletions (filled, inverted triangle), or a mutation pool defining multiple mutations at a single site (dotted line above the desired changes). Once the desired construct modifications have been defined, the user can combine and permute any desired set of truncations and mutations to generate virtual constructs shown in the lower Construct Design window. Each construct is given a unique Construct ID and Construct Name according to a standardized schema that describes the N- and C-terminal residue numbers and other features.
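
The combine-and-permute step described in the Figure 2 caption can be illustrated with a short Python sketch. This is not Gene Composer's implementation. The function names, the example sequence, and the naming pattern (`start-end_mutations`) are hypothetical, chosen only to show how a set of N-terminal endpoints, C-terminal endpoints, and point-mutation sets expands into a list of named constructs while keeping native residue numbering. Insertions, deletions, mutation pools, and affinity tags are left out for brevity.

```python
from itertools import product


def apply_point_mutations(seq, start, mutations):
    """Apply point mutations given in native residue numbering.

    seq       -- amino acid string of the truncated construct
    start     -- native residue number of seq[0]
    mutations -- iterable of (position, new_residue) tuples
    """
    residues = list(seq)
    for pos, new in mutations:
        residues[pos - start] = new
    return "".join(residues)


def permute_constructs(base, base_start, n_starts, c_ends, mutation_sets):
    """Yield (name, sequence) for every truncation/mutation combination.

    The name pattern is a hypothetical stand-in for the paper's
    standardized Construct Name schema, which is not specified here.
    """
    for n, c, muts in product(n_starts, c_ends, mutation_sets):
        # Keep only mutations that fall inside this truncation.
        kept = [(p, aa) for p, aa in muts if n <= p <= c]
        fragment = base[n - base_start : c - base_start + 1]
        seq = apply_point_mutations(fragment, n, kept)
        label = "_".join(f"{base[p - base_start]}{p}{aa}" for p, aa in kept) or "wt"
        yield f"{n}-{c}_{label}", seq


if __name__ == "__main__":
    # Toy 10-residue base construct numbered from residue 1 (made-up sequence).
    base = "MKTAYIAKQR"
    for name, seq in permute_constructs(
        base, 1, n_starts=[1, 3], c_ends=[8, 10], mutation_sets=[[], [(4, "G")]]
    ):
        print(name, seq)
```

Two N-terminal endpoints, two C-terminal endpoints, and two mutation sets give eight constructs here. Carrying the native residue number through both the slice and the mutation label mirrors the paper's point that a common numbering scheme helps keep construct design accurate.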
