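Knowledge Graph Integration
CCKS 2016 tutorial (presenter affiliation truncated in the source: computer science and novel software technology)

Outline
- Introduction to Semantic Web and knowledge graph
- Part I: ontology matching
- Part II: entity linkage
- Part III: an application to data integration

Semantic Web
- The Semantic Web was an idea from Tim Berners-Lee
- Give formal meanings to Web information
- Web 1.0 (page) → Web 2.0 (social) → Web 3.0 (a web of data)
- The Semantic Web is about:
  - common formats for integration and combination of data drawn from diverse sources
  - languages for recording how the data relates to real-world objects

RDF
- An RDF statement is a triple: subject, predicate, object
- "The world is not made of strings, but is made of things"
- (Figure residue: a layer diagram; only the word "Layer" survives.)

Linked data
- A realization of the Semantic Web
- Linked Data refers to a collection of interrelated … (truncated in source)
- Used for large-scale integration of, and reasoning on, data on the Web
- Linked data principles:
  - Use URIs to name things
  - Use HTTP URIs (so that they can be … [truncated])
  - Provide useful information using open Web standards (e.g., …)
  - Include links to other related … (truncated)

Linked open data (LOD)
- 1,000+ … (figure residue: the LOD cloud, with category labels such as "life" and "social")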
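As a rough illustration of the triple model and the "include links" principle above, the sketch below stores a few statements as (subject, predicate, object) tuples and follows a link from one source to another. The `http://example.org/` URIs and the helpers `match` and `labels_via_links` are hypothetical names chosen for this example only; prefixed names such as `owl:sameAs` are kept as plain strings rather than resolved through an RDF library.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Toy data: two made-up sources describing the same city, joined by one link.
TRIPLES: List[Triple] = [
    ("http://example.org/a/Nanjing", "rdfs:label", "Nanjing"),
    ("http://example.org/a/Nanjing", "owl:sameAs", "http://example.org/b/Nan-ching"),
    ("http://example.org/b/Nan-ching", "rdfs:label", "Nan-ching"),
]


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def labels_via_links(uri: str) -> Set[str]:
    """Collect labels of `uri` and of everything it links to in one hop."""
    linked = {uri} | {o for (_, _, o) in match(s=uri, p="owl:sameAs")}
    return {o for u in linked for (_, _, o) in match(s=u, p="rdfs:label")}


if __name__ == "__main__":
    print(labels_via_links("http://example.org/a/Nanjing"))
```

Because both descriptions are reachable from one URI, a consumer can merge them without knowing the second source in advance, which is the point of publishing links alongside data.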
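Knowledge graph
- The Knowledge Graph is a knowledge base used by Google to enhance its search engine's search results with semantic-search information gathered from a wide variety of sources
- A knowledge graph can also be viewed as a huge graph: nodes represent entities or concepts, and edges are formed by attributes or relations
- In addition to … (truncated in source)

Ontology
- A model of (part of) the real world
- Introduces domain-specific vocabulary
- Specifies the meaning (semantics) of terms, e.g., "Heart is a muscular organ that is part of the circulatory system"
- Formalized using a suitable logic
- Source: I. Horrocks. Ontologies and the semantic web: the story so far.

Large-scale knowledge bases and knowledge graphs: scale (the names of the first three entries are missing in the source)
- (name missing): English version with 4 million entities and 500 million RDF triples; 125 languages
- (name missing): 10 million entities, 120 million RDF triples
- (name missing): 40 million entities, 1 billion RDF triples
- … Knowledge Graph: 600 million entities, 3.5 billion RDF triples
- Others: WolframAlpha (computational knowledge engine), CMU NELL, 知心 (Zhixin), 搜狗知立… (Sogou; name truncated in source)

The technique family of knowledge graphs
- (Figure residue: knowledge system, existing knowledge, knowledge graph.)

Part I: ontology matching

Heterogeneity, a long-standing problem
- Syntactic
- Schema-… (e.g., "Wei Hu" vs. …)
- Terminological (e.g., "notebook" vs. …)
- Data-entity
- Pragmatic

On the Semantic Web
- Data has explicit semantics, rich links, …
- The popularity of ontologies is growing rapidly, and the number of ontologies keeps increasing
- Ontology matching: the process of determining correspondences between ontologies

Definition of ontology matching (the symbols are garbled in the source; the names below are reconstructed)
- Ontology matching finds a triple $\langle O, O', M \rangle$ consisting of a source ontology $O$, a target ontology $O'$, and a set of mapping units $M = \{m_1, m_2, \ldots, m_n\}$
- Each $m_i$ is a basic mapping unit, written as a 4-tuple $m_i = \langle id, e, e', s \rangle$:
  - $id$ is the identifier that uniquely identifies the 4-tuple
  - $e$ and $e'$ are terms in $O$ and $O'$ respectively
  - $s$ is the similarity between $e$ and $e'$ (its range constraint is garbled in the source; conventionally $s \in [0, 1]$)
- Optionally, $r$ denotes the relation between $e$ and $e'$; common relations include equivalence, … (truncated)
- Ontology matching resolves schema-level heterogeneity

State of the art: linguistic features
- Linguistic descriptions of the terms in an ontology:
  - Local name. For a name N in a namespace identified by a URI I, the namespace name is I. For a name N that is not in a namespace, the namespace name has no value. In either case the local name is N.
  - Annotations
  - Others: foaf:name, dc:title
- Survey of how linguistic features are used in existing ontologies and systems:
  - Local names are used heavily, and some … (truncated)
  - Annotations … (truncated)
  - Neighbors are not fully exploited
  - Dictionary lookup is time-consuming
  - (Table residue: a checklist with rows or columns such as class …, machine learning, ranking, S-…; the cell layout is not recoverable.)

Edit distance
- The minimum number of edit operations needed to turn one string into the other
- Edit operations are substitution, insertion and deletion
- In general, the smaller the edit distance, the more similar the two strings

I-Sub
- $sim(s_1, s_2) = comm(s_1, s_2) - diff(s_1, s_2) + winkler(s_1, s_2)$
- $comm$: based on the biggest common substrings between the two strings
- $diff$: based on the length of the unmatched substrings resulting from the initial matching

Virtual documents
- Linguistic description of a term: local name, …, annotations
- Linguistic description of a blank node: the linguistic descriptions of its forward neighbors
- Neighbors of a term: subject neighbors, predicate neighbors, object neighbors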
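- Virtual document of a term: its own description plus the weighted descriptions of its neighbors. The formula is garbled in the source; a reconstruction consistent with the three neighbor types (subject neighbors $SN$, predicate neighbors $PN$, object neighbors $ON$) is
  $VD(e) = Des(e) + \gamma_1 \sum_{e' \in SN(e)} Des(e') + \gamma_2 \sum_{e' \in PN(e)} Des(e') + \gamma_3 \sum_{e' \in ON(e)} Des(e')$
- Vector space model: TF-IDF

String similarity metrics: recommendations (the headings of the two groups are missing in the source)
- First group:
  - Fewer than two words per label: Jaro-Winkler
  - Two or more words per label:
    - Synonyms: Soft Jaccard, with Levenshtein base
    - No synonyms: Soft Jaccard, with Levenshtein base
- Second group:
  - Fewer than two words per label: TF-IDF
  - Two or more words per label:
    - Synonyms: Soft TF-IDF, with Jaro-Winkler base
    - Different languages: Soft TF-IDF, with Jaro-Winkler base
    - Other: Soft TF-IDF, with Jaro-Winkler base

Structural features
- Intuition: terms of two distinct ontologies are similar when their adjacent terms are similar
- Similarity propagation (reconstructed from the garbled formula; the index sets under the sums are an assumption following the usual similarity-flooding formulation over graphs $A$ and $B$):
  $\sigma^{i+1}(x, y) = \sigma^{i}(x, y) + \sum_{(a_u, p, x) \in A,\ (b_u, p, y) \in B} \sigma^{i}(a_u, b_u)\, w((a_u, b_u), (x, y)) + \sum_{(x, p, a_v) \in A,\ (y, p, b_v) \in B} \sigma^{i}(a_v, b_v)\, w((a_v, b_v), (x, y))$

Instance data
- Machine learning
- Joint probability …
- Instance …
- Content …
- Name …
- Meta …
- Relaxation …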
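The two linguistic measures described earlier in Part I, edit distance and the TF-IDF vector space model over virtual documents, can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular matcher: normalizing the edit distance by the longer string's length, the `1 + log(n / df)` IDF variant, and the toy virtual documents are all choices made only for this example.

```python
import math
from collections import Counter
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def tfidf_vectors(docs: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Turn each bag of words (a virtual document) into a TF-IDF vector."""
    n = len(docs)
    df = Counter(w for words in docs.values() for w in set(words))
    vectors: Dict[str, Dict[str, float]] = {}
    for name, words in docs.items():
        tf = Counter(words)
        vectors[name] = {w: (c / len(words)) * (1.0 + math.log(n / df[w]))
                         for w, c in tf.items()}
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical virtual documents: words from a term's own description
# plus words borrowed from its neighbors.
DOCS = {
    "o1:Track": ["track", "music", "recording"],
    "o2:Song": ["song", "music", "recording"],
    "o2:Film": ["film", "movie", "director"],
}
VECTORS = tfidf_vectors(DOCS)

if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(cosine(VECTORS["o1:Track"], VECTORS["o2:Song"]))
    print(cosine(VECTORS["o1:Track"], VECTORS["o2:Film"]))
```

In this toy data the local names "Track" and "Song" have a low edit similarity, yet their virtual documents overlap on neighbor words, which is the situation where a document-based measure can complement a purely string-based one.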
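Search-engine-based similarity
- A distance between two search terms computed from search-engine hit counts (the formula is that of the Normalized Google Distance):
  $NGD(x, y) = \dfrac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log N - \min\{\log f(x), \log f(y)\}}$
- $f(x)$ is the number of hits for the search term $x$
- $f(y)$ is the number of hits for the search term $y$
- $f(x, y)$ is the number of hits for the tuple of search terms $x$ and $y$
- $N$ is the number of web pages indexed by the search engine (the approximate value, a power of 10, is garbled in the source)

Ontology matching tools
- Falcon-… (truncated in source)

New challenges
- There are many (semi-)automatic algorithms and tools
- Most are only applicable to small ontologies
- Many applications require matching BIG ontologies:
  - Medicine and biology: GALEN, FMA, …
  - Agriculture and food: AGROVOC, …
  - Library collections: Brinkman, …
  - Common knowledge: DBpedia, …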
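The search-engine-based distance defined at the start of this part can be evaluated directly once hit counts are available. The sketch below is only a worked instance of that formula: the function name `search_engine_distance` is hypothetical, the hit counts are made up, no search engine is queried, and returning infinity when a count is zero is an assumption of this example.

```python
import math


def search_engine_distance(fx: float, fy: float, fxy: float, n: float) -> float:
    """Distance from hit counts f(x), f(y), f(x, y) and index size N.

    Assumes n is larger than both fx and fy, so the denominator is positive.
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf  # assumed convention: no evidence of co-occurrence
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


if __name__ == "__main__":
    # Terms that always occur together: distance 0.
    print(search_engine_distance(1000, 1000, 1000, 10**6))
    # Terms that rarely occur together: (3 - 1) / (6 - 3) = 2/3 in log10 units.
    print(search_engine_distance(1000, 1000, 10, 10**6))
```

The base of the logarithm cancels between numerator and denominator, so natural logarithms give the same value as base 10.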
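- Scale label from the slide: ≥10K (unit not recoverable)

A divide-and-conquer approach
- 1. ontology partitioning → 2. block matching → 3. term … (truncated in source)
- Running example (figure not recoverable)

New directions: holistic ontology matching
- Increasing amount of data → simultaneously matching multiple ontologies
- Input: a set $\Omega = \{O_1, \ldots, O_n\}$ of ontologies with $n > 2$
- Output: $M = M_{12} \cup M_{13} \cup M_{23} \cup \ldots$
- Guarantees to always find the same result
- A globally optimal solution
- Limitation of pairwise matching: $M$ is only a local solution that depends on the order in which the ontology matching is carried out, e.g., $M_{12} \cup M_{123} \neq M_{13} \cup M_{132} \neq M_{23} \cup M_{231}$

Holistic ontology matching as graph matching
- Extends the maximum-weighted graph matching problem with constraints (cardinality, structural and coherence constraints)
- Three types of nodes: class, object property, data property
- Edges:
  - represent virtual connections between nodes of the same type
  - have weights that represent similarities between the nodes
- Correspondences (1:1) with maximum weight → … (truncated)
- Linear constraints: binary … (truncated)
- (Residue: class decision …, disjoint …)

Part II: entity linkage

Entity linkage
- Semantic Web data have reached a scale of billions of … (truncated)
- Many different entities refer to the same real-world object
- Entities are typically denoted by URIs, from distributed data sources
- e.g., Wei …
- Entity linkage: link different entities that refer to the same object
- Also known as coreference resolution, entity matching, record linkage, …
- The entity matching problem was originally defined in 1959 by Newcombe et al. and was formalized by Fellegi and Sunter ten years later
- Out of 31B RDF statements, fewer than 500M are links across … (truncated)
- (Garbled Chinese notes: identification of entities …; disambiguation of data …; eliminating … between the RDF data that describe these identifiers.)

State of the art
- In LOD, millions of entities have already been linked
- However, potential candidates are still … (truncated)
- Current approaches:
  - Equivalence reasoning: owl:sameAs, inverse functional properties
  - Similarity computation (also studied in the database field): compare the properties and values of entities

Equivalence reasoning
- An RDF triple: $\langle s, p, o \rangle \in (U \cup B) \times U \times (U \cup B \cup L)$
- Same-as relation: $\langle s, \text{owl:sameAs}, o \rangle \rightarrow \langle s, o \rangle \in E$ and $\langle o, s \rangle \in E$
- Inverse functional property (IFP) relation:
  - IFP: a value can only be the value of this property for a single individual
  - e.g., $\langle s_1, \text{foaf:mbox}, o \rangle, \langle s_2, \text{foaf:mbox}, o \rangle \rightarrow \langle s_1, s_2 \rangle \in E$ and $\langle s_2, s_1 \rangle \in E$
- Functional property (FP) relation
- Cardinality relation: owl:cardinality / owl:maxCardinality = 1
- $E$ is the transitive closure $(\cdot)^{+}$ of the union of the four relations above (their individual symbols are garbled in the source); $E$ is an equivalence relation

Similarity computation: string similarity join
- The problem generally takes the form $\{(x, y) \mid sim(x, y) > \tau,\ x \in X,\ y \in Y\}$
- $X$ and $Y$ are two sets of strings, and $\tau$ is the similarity threshold
- Time complexity: $O(n^2 m^2)$ (the symbol names are garbled in the source; both exponents are 2)
- The conventional approach is a "filter and verify" framework
- Filtering stage: apply various filters to shrink the candidate set
- Common methods: All-Pairs, ED-Join, PPJoin, PassJoin

Blocking
- Naïve pairwise matching: $n^2$ pairwise comparisons
- Example: 1,000 business listings from each of 1,000 different cities across the … (truncated)
  - Naïve: 1 trillion comparisons, 11.6 days (if each comparison takes 1 μs)
  - Mentions from different cities are unlikely to be matches
  - Blocking criterion: city (truncated in source; inferred from the example)
  - With blocking: 1 billion comparisons, 16 minutes (if each comparison takes 1 μs)
- Hash-based blocking
- Pairwise-similarity or neighborhood-based blocking
- Simple blocking: inverted index

Machine learning
- A linkage rule …
- Learning linkage rules: genetic programming combined with active learning
  - The learner selects link candidates to be labeled by a user
  - A human expert labels the selected link as correct or incorrect
  - The genetic programming algorithm evolves the population of linkage rules

Limitations of current approaches
- Equivalence reasoning: at present, it probably misses many potential links
- Similarity computation: to improve it, machine learning is used, but building a large-scale training set is time-consuming and labor-intensive

Definition
- Definition 1. Let $U$ be the set of entities in a set $D$ of data sources. Given an entity $u \in U$, the entity linkage for $u$ is to query a set of entities for which a relation $\varepsilon$ holds, where $\varepsilon$ links all the entities in $U$ that refer to the same object as $u$ does, i.e., that are coreferent with $u$.

How to combine? Our solution: query-driven entity linkage
- Use case: search and browsing, where a system knows "what to link" only at query time
- Analyze small portions of a very large dataset to answer on demand
- Our approach: automatically infer semantically coreferent entities based on OWL/SKOS …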
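The equivalence reasoning described in Part II (same-as and inverse functional property relations, closed under transitivity into an equivalence relation) can be sketched with a union-find structure. This is a minimal illustration under assumptions: triples are plain string tuples with prefixed names, the `ex:` entities and mailbox values are invented, `coreferent_classes` is a hypothetical helper, and the functional-property and cardinality relations are left out for brevity.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


class UnionFind:
    """Disjoint sets; merging pairs yields the transitive closure for free."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def coreferent_classes(triples: Iterable[Triple], ifps: Set[str]) -> List[Set[str]]:
    """Group entities linked by owl:sameAs or by sharing an IFP value."""
    uf = UnionFind()
    subjects_by_value: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for s, p, o in triples:
        uf.find(s)  # register every subject, even if it ends up alone
        if p == "owl:sameAs":
            uf.union(s, o)
        elif p in ifps:
            subjects_by_value[(p, o)].append(s)
    # Same IFP and same value imply the subjects are the same individual.
    for subjects in subjects_by_value.values():
        for other in subjects[1:]:
            uf.union(subjects[0], other)
    classes: Dict[str, Set[str]] = defaultdict(set)
    for x in list(uf.parent):
        classes[uf.find(x)].add(x)
    return [c for c in classes.values() if len(c) > 1]


# Invented example data.
EXAMPLE = [
    ("ex:a", "owl:sameAs", "ex:b"),
    ("ex:b", "foaf:mbox", "mailto:w@example.org"),
    ("ex:c", "foaf:mbox", "mailto:w@example.org"),
    ("ex:d", "foaf:mbox", "mailto:other@example.org"),
    ("ex:d", "rdfs:label", "D"),
]

if __name__ == "__main__":
    print(coreferent_classes(EXAMPLE, {"foaf:mbox"}))
```

Here `ex:a` and `ex:c` are never linked directly, but they land in the same class through `ex:b`, which is what taking the closure of the union of the individual relations means.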
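(Figure residue: workflow of the approach. Recoverable labels: output a … of …; step 1, build a … (initialize training …); labeled …; some properties to use together; external …; learn; unresolved ….)

Assumptions
- (1) Coreferent entities share similar property-value pairs
- (2) A few property-value pairs are more important than others for linking entities

Running example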
(Figure residue for the running example. Recoverable values: "Nanjing", "Nan-ching", "32N", "118E", "117W".)

Some definitions
- Discriminability of a property
  - Property …
  - Non-coreferent entity … in terms of coreferent …
- Discriminability of a value
- Discriminability of a property-value pair

Experiments
- (Table residue: dataset statistics with entries >100, >100, RDF, >2, Same-as, IFP, FP, 2; layout not recoverable.)
- Data: Billion Triples Challenge (BTC) …
- Testing …: top-50 in 364 thousand query … (residue: 8, Music/…, 54323)
- Evaluation procedure and …:
  - 30 graduates; 2 judges + 1 arbitrator per link; Fleiss's κ = 0.8 (sufficient …)
  - Precision and relative recall (RR): RR = correct links in one system / total correct unique links in all systems
- Parameters: maximum iteration = …; discriminability threshold = … (values missing in source)
- Linkage …
- Running time on 5,000 samples: avg. 11.3 links in … (truncated)

Ontology Alignment Evaluation …
- ISWC workshop since … (truncated)
- Ontology matching and instance matching

Part III: an application to data integration

Motivation
- Metadata is vital to multimedia content: search, browsing, management, …
- Large-scale LOD are published and …; make use of such a rich source of …
- Existing multimedia metadata models and …:
  - do not provide formal …
  - typically focus on a single media …
  - EXIF …; compatible with MPEG-…
- Different media types co-exist in a multimedia …: a movie may have a theme music and a …
- A unified, well-defined ontology (with its …s to others) is needed to gain …
- Challenge: link and integrate heterogeneous …

A motivating example: Beauty and the Beast
- Low-level metadata: runtime, location
- LOD: LinkedMDB, DBpedia
- Different ontologies (terms), different …
- (Figure residue: linkedmdb:film, linkedmdb:director, linkedmdb:11264, "Beauty and the Beast".)
(Figure residue for the motivating example, labeled "… and entity linkage". Recoverable values: "Beauty and the Beast"; runtime "91 min."; location; "... is a 1991 American animated ...".)
Our approach
- CAMO: enrich multimedia metadata via integrating …
- Select DBpedia as the mediation and match it with …
- Link DBpedia entities with other … and aggregate their …
- Incorporate legacy relational databases
- Moreover, provide a mobile app for browsing and … multimedia content on Android …
- Assess the advantages of integrating LOD into multimedia …

System architecture: client-server
- Server:
  - The DBpedia 3.6 ontology as …
  - Global-as-View solution of …
  - Music: DBpedia, DBTune, …
  - Movie: DBpedia, LinkedMDB, …
- Client:
  - Android-based mobile …
  - Integrates with a multimedia …
  - Search and browse multimedia …
- System …: search, browse and … (truncated in source)
(Figure residue for the system overview. Recoverable labels: instance, mobile, John, relational, entity, ontology, data.)

Matching ontologies with …
- Different LOD sources have different preferences on …: DBpedia, Music ontology, …
- Falcon-AO: an automatic ontology matching …
  - Extend … knowledge to support synonym … (e.g., "track" vs. …)
  - Linguistic matching: V-Doc (TF-IDF) and I-Sub (edit …)
  - Structural matching: GMO (similarity …)

Linking entities with …
- Entity linkage helps merge all descriptions in different sources that refer to the same multimedia …
- (Figure residue: training …; property sets compared per class, e.g., {p1, p3} for c1 vs. {p5, p6} for c3, and {p1, p2} for c1 vs. {p3, p4} for c2; "Instance linkage?")
- Training set:
  - Negative examples: pairs that do not hold the equivalence relation
- Class-based discriminative property …
- Information …
- Online …

Integrating legacy relational …
- There is still a great deal of legacy data stored in …
- Some data in LOD are generated from their relational …
- Element … (e.g., entity table and relationship …)
- Element …
- Instance …
- These steps are similar to ontology matching and entity linkage

Evaluation: two …
- Usability and effectiveness of the mobile …
- Integration accuracy in the …

User … (1): usability and …
- 3 comparative … (only partly recoverable: …, …, Wikipedia Android …)
- 6 testing …
- 50 … (split as 10, 22 and 18)
- System Usability Scale (SUS) and post-task …
- Post-task …: analyze the result according to the typology of the …

Integration …
- Ontology …: 78 …, incl. 18 RDB …
- Entity …: 60 thousand …; 100 samples per …

Lessons learned
- CAMO leverages ontology matching and entity linkage for data integration, and supports users in browsing and searching multimedia content on mobile devices
- Lessons:
  - Ontology matters: trade-off between expressiveness and ease of …
  - Data integration quality: human computation + machine …
  - Mobile app design: conciseness, ranking scheme, user-…
- Future work:
  - Generate complex …s for semantic query …
  - Extend to user-generated …
  - NLP