Knowledge Graph Overview: special-topic training courseware

Knowledge graph architecture
- General knowledge graph architecture [source: Baidu Baike]
- Fudan University knowledge graph architecture
- Early knowledge graph architecture

Architecture discussion
- Early knowledge graph architecture

Knowledge extraction
- Entity and concept extraction
- Entity-concept mapping
- Relation extraction
- Quality assessment

KDD 2014 Tutorial on Constructing and Mining Web-scale Knowledge Graphs, New York, August 24, 2014

A sampler of research problems
- Growth: knowledge graphs are incomplete!
  - Link prediction: add relations
  - Ontology matching: connect graphs
  - Knowledge extraction: extract new entities and relations from web/text
- Validation: knowledge graphs are not always correct!
  - Entity resolution: merge duplicate entities, split wrongly merged ones
  - Error detection: remove false assertions
- Interface: how to make it easier to access knowledge?
  - Semantic parsing: interpret the meaning of queries
  - Question answering: compute answers using the knowledge graph
- Intelligence: can AI emerge from knowledge graphs?
  - Automatic reasoning and planning
  - Generalization and abstraction

Relation extraction: definition and common approaches
- Semantic pattern matching [frequent pattern extraction, density-based clustering, semantic-similarity based]
- Hierarchical topic models [weakly supervised]

Methods and techniques
1. Relation extraction:
   - Supervised models
   - Semi-supervised models
   - Distant supervision
2. Entity resolution:
   - Single entity methods
   - Relational methods
3. Link prediction:
   - Rule-based methods
   - Probabilistic models
   - Factorization methods
   - Embedding models

Not in this tutorial:
- Entity classification
- Group/expert detection
- Ontology alignment
- Object ranking

Relation Extraction
- Extracting semantic relations between sets of [grounded] entities
- Numerous variants:
  - Undefined vs pre-determined set of relations
  - Binary vs n-ary relations, facet discovery
  - Extracting temporal information
  - Supervision: {fully, un, semi, distant}-supervision
  - Cues used: only lexical vs full linguistic features

[Figure: the edge playFor between Kobe Bryant and LA Lakers, with three text mentions that express it: “Kobe Bryant, the franchise player of the Lakers”, “Kobe once again saved his team”, “Kobe Bryant man of the match for Los Angeles”]

Supervised relation extraction
- Sentence-level labels of relation mentions
  - "Apple CEO Steve Jobs said.." => (SteveJobs, CEO, Apple)
  - "Steve Jobs said that Apple will.." => NIL
- Traditional relation extraction datasets
  - ACE 2004
  - MUC-7
  - Biomedical datasets (e.g. BioNLP challenges)
- Learn classifiers from +/- examples
- Typical features: context words + POS, dependency path between entities, named entity tags, token/parse-path/entity distance

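To make the "typical features" concrete, the sketch below builds a small lexical feature dictionary for one sentence and one entity pair. It is only an illustration under stated assumptions: the function name `mention_features`, the span convention (token start, exclusive end) and the feature names are hypothetical, and POS tags and dependency paths are left out because they would need a parser. A classifier would then be trained on such dictionaries from the +/- examples.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start token index, exclusive end token index)


def mention_features(tokens: List[str], e1: Span, e2: Span) -> Dict[str, float]:
    """Lexical features for one sentence and one (e1, e2) entity pair."""
    first, second = sorted([e1, e2])
    between = tokens[first[1]:second[0]]
    # Context words between the two mentions.
    feats = {f"between={w.lower()}": 1.0 for w in between}
    # Token distance between the mentions.
    feats["token_distance"] = float(len(between))
    # Whether e1 appears before e2 in the sentence.
    feats["e1_first"] = 1.0 if e1 <= e2 else 0.0
    # One word of context on each side, when available.
    if first[0] > 0:
        feats[f"left={tokens[first[0] - 1].lower()}"] = 1.0
    if second[1] < len(tokens):
        feats[f"right={tokens[second[1]].lower()}"] = 1.0
    return feats


# Positive example from the slide: (SteveJobs, CEO, Apple)
pos = mention_features("Apple CEO Steve Jobs said".split(), e1=(2, 4), e2=(0, 1))
# Negative example from the slide: NIL
neg = mention_features("Steve Jobs said that Apple will".split(), e1=(0, 2), e2=(4, 5))
```

The positive sentence yields the feature `between=ceo`, while the negative one yields `between=said` and `between=that`, which is the kind of signal a classifier can separate.
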
Semi-supervised relation extraction
- Generic algorithm:
  1. Start with seed triples / golden seed patterns
  2. Extract patterns that match seed triples/patterns
  3. Take the top-k extracted patterns/triples
  4. Add to seed patterns/triples
  5. Go to 2
- Many published approaches in this category:
  - Dual Iterative Pattern Relation Extractor [Brin, 98]
  - Snowball [Agichtein & Gravano, 00]
  - TextRunner [Banko et al., 07] (almost unsupervised)
- Differ in pattern definition and selection

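The generic loop can be sketched in a few lines. This is a toy version under strong assumptions, not any of the published systems: the corpus is hypothetical, every sentence has the shape "SUBJECT pattern OBJECT", and a pattern is simply the text between the two arguments. The names `bootstrap`, `corpus`, `rounds` and `top_k` are illustrative.

```python
import re
from collections import Counter

# Hypothetical toy corpus; each sentence is "SUBJECT <pattern> OBJECT".
corpus = [
    "Larry Page founded Google",
    "Steve Jobs founded Apple",
    "Bill Gates founded Microsoft",
    "Bill Gates , creator of Microsoft",
    "Jeff Bezos , creator of Amazon",
]


def bootstrap(corpus, seed_pairs, rounds=2, top_k=2):
    pairs, patterns = set(seed_pairs), set()
    for _ in range(rounds):
        # Step 2: extract the patterns that connect already-known pairs.
        counts = Counter()
        for sentence in corpus:
            for subj, obj in pairs:
                prefix, suffix = subj + " ", " " + obj
                if (sentence.startswith(prefix) and sentence.endswith(suffix)
                        and len(sentence) > len(prefix) + len(suffix)):
                    counts[sentence[len(prefix):-len(suffix)]] += 1
        # Steps 3-4: keep the top-k patterns and add them to the pattern set.
        patterns |= {p for p, _ in counts.most_common(top_k)}
        # Apply every pattern to harvest new pairs, then loop (step 5).
        for sentence in corpus:
            for p in patterns:
                m = re.fullmatch(r"(.+) " + re.escape(p) + r" (.+)", sentence)
                if m:
                    pairs.add((m.group(1), m.group(2)))
    return pairs, patterns


pairs, patterns = bootstrap(corpus, {("Larry Page", "Google")})
```

Starting from a single seed pair, the first round learns the pattern "founded", which yields (Bill Gates, Microsoft); the second round uses that pair to learn ", creator of" and reaches (Jeff Bezos, Amazon). The same mechanism is what lets wrong patterns propagate, which is why the systems differ in pattern definition and selection.
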
Distantly-supervised relation extraction
- Existing knowledge base + unlabeled text generate examples
  - Locate pairs of related entities in text
  - Hypothesize that the relation is expressed

Example sentences:
- Google CEO Larry Page announced that...
- Steve Jobs has been Apple for a while...
- Pixar lost its co-founder Steve Jobs...
- I went to Paris, France for the summer...

[Figure: knowledge-base graph with the edges CEO (Google, Larry Page), CEO (Apple, Steve Jobs), founderOf (Pixar, Steve Jobs) and capitalOf (France)]

Distant supervision: modeling hypotheses
Typical architecture:
1. Collect many pairs of entities co-occurring in sentences from text corpus
2. If 2 entities participate in a relation, several hypotheses:
   1. All sentences mentioning them express it [Mintz et al., 09]

Example: “Barack Obama is the 44th and current President of the US.” (BO, employedBy, USA)

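A minimal sketch of the labeling step under the "all sentences express it" hypothesis follows. The knowledge base contents and the argument order of each pair are assumptions read off the example figure, matching is naive substring search, and the name `distant_labels` is hypothetical; a real system would rely on entity linking.

```python
# Toy knowledge base: (subject, object) -> relation. Argument order is assumed.
kb = {
    ("Larry Page", "Google"): "CEO",
    ("Steve Jobs", "Apple"): "CEO",
    ("Steve Jobs", "Pixar"): "founderOf",
    ("Paris", "France"): "capitalOf",
}

sentences = [
    "Google CEO Larry Page announced that...",
    "Steve Jobs has been Apple for a while...",
    "Pixar lost its co-founder Steve Jobs...",
    "I went to Paris, France for the summer...",
]


def distant_labels(sentences, kb):
    """Label every sentence that mentions both entities of a KB pair."""
    examples = []
    for sentence in sentences:
        for (subj, obj), rel in kb.items():
            if subj in sentence and obj in sentence:
                examples.append((sentence, subj, rel, obj))
    return examples


examples = distant_labels(sentences, kb)
```

Every sentence receives a label here, including the Paris sentence, which mentions both entities without stating the capitalOf relation. That is the kind of noise the weaker hypotheses try to absorb.
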
Sentence-level features
- Lexical: words in between and around mentions and their parts-of-speech tags (conjunctive form)
- Syntactic: dependency parse path between mentions along with side nodes
- Named Entity Tags: for the mentions
- Conjunctions of the above features
  - Distant supervision is applied to lots of data, so sparsity of conjunctive forms is not an issue

Distant supervision: modeling hypotheses (continued)
Typical architecture:
1. Collect many pairs of entities co-occurring in sentences from text corpus
2. If 2 entities participate in a relation, several hypotheses:
   1. All sentences mentioning them express it [Mintz et al., 09]
   2. At least one sentence mentioning them expresses it [Riedel et al., 10]
   3. At least one sentence mentioning them expresses it, and 2 entities can express multiple relations [Hoffmann et al., 11] [Surdeanu et al., 12]

Examples:
- “Barack Obama is the 44th and current President of the US.” (BO, employedBy, USA)
- “Obama flew back to the US on Wednesday.” (BO, employedBy, USA)
- “Obama was born in the US just as he always said.” (BO, bornIn, USA)

The second sentence mentions both entities without expressing employedBy, which is the case hypotheses 2 and 3 are meant to tolerate; the third shows the same pair taking part in a second relation.

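The difference between hypothesis 1 and hypothesis 2 can be illustrated by grouping sentences into one "bag" per entity pair. This is only a sketch: `toy_score` is a hypothetical stand-in for a trained sentence-level extractor, and picking the best-scoring sentence is a simplification of how at-least-one models are actually trained.

```python
from collections import defaultdict

# Mentions of the pair (BO, USA), taken from the examples above.
mentions = [
    (("BO", "USA"), "Barack Obama is the 44th and current President of the US."),
    (("BO", "USA"), "Obama flew back to the US on Wednesday."),
]


def toy_score(sentence: str) -> float:
    """Hypothetical score for how strongly a sentence expresses employedBy."""
    return 0.9 if "President" in sentence else 0.1


# Group the sentences into one bag per entity pair.
bags = defaultdict(list)
for pair, sentence in mentions:
    bags[pair].append(sentence)

# Hypothesis 1: every sentence in the bag is treated as a positive example.
all_positive = {pair: list(sents) for pair, sents in bags.items()}

# Hypothesis 2: only the best-scoring sentence has to express the relation.
at_least_one = {pair: max(sents, key=toy_score) for pair, sents in bags.items()}
```

Under hypothesis 1 the "flew back" sentence becomes a noisy positive example; under hypothesis 2 only the "President" sentence carries the label for the bag.
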
Distant supervision
- Pros
  - Can scale to the web, as no supervision required
  - Generalizes to text from different domains
  - Generates a lot more supervision in one iteration
- Cons
  - Needs high quality entity-matching
  - Relation-expression hypothesis can be wrong
    - Can be compensated by the extraction model, redundancy, language model
  - Does not generate negative examples
    - Partially tackled by matching unrelated entities

Entity resolution

[Figure: example graph around Kobe Bryant, with the mentions "Kobe Bryant", "Black Mamba" and "Kobe B. Bryant", the nodes Pau Gasol, LA Lakers and Vanessa L. Bryant, the values 1978 and 35, and the edges teammate, bornIn, playFor, playInLeague, age and marriedTo; annotated "Single entity resolution" and "Relational entity resolution"]

Entity resolution / deduplication
- Multiple mentions of the same entity are wrong and confusing.
- DEF: We consider the entity resolution (ER) problem (also known as deduplication, or merge-purge), in which records determined to represent the same real-world entity are successively located and merged.
- DEF: the problem of extracting, matching and resolving entity mentions in structured and unstructured data.
- Methods: single-entity and relational entity resolution (following slides).

Single-entity entity resolution
- Entity resolution without using the relational context of entities
- Many distances/similarities for single-entity entity resolution:
  - Edit distance (Levenshtein, etc.)
  - Set similarity (TF-IDF, etc.)
  - Alignment-based
  - Numeric distance between values
  - Phonetic similarity
  - Equality on a boolean predicate
  - Translation-based
  - Domain-specific

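Two of the listed measures are easy to sketch in plain Python: Levenshtein edit distance and a token-set Jaccard similarity. The entity names come from the example graph; the function names and the whitespace tokenization are illustrative choices, and Jaccard is used here as a simple set similarity in place of TF-IDF weighting.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lower-cased whitespace token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


d_close = levenshtein("Kobe Bryant", "Kobe B. Bryant")  # three inserted characters
s_close = jaccard("Kobe Bryant", "Kobe B. Bryant")
s_alias = jaccard("Kobe Bryant", "Black Mamba")
```

"Kobe Bryant" and "Kobe B. Bryant" come out close under both measures, but the alias "Black Mamba" shares no tokens with either. String similarity alone cannot merge it, which motivates the relational strategies.
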
Relational entity resolution: simple strategies
- Enrich model with relational features: richer context for matching
- Relational features:
  - Value of edge or neighboring attribute
  - Set similarity measures
    - Overlap/Jaccard
    - Average similarity between set members
    - Adamic/Adar: two entities are more similar if they share more items that are overall less frequent
    - SimRank: two entities are similar if they are related to similar objects
    - Katz score: two entities are similar if they are connected by shorter paths

[Figure: example graph with the nodes Kobe Bryant, Black Mamba, Pau Gasol and LA Lakers, the values 1978 and 35, and the edges teammate, bornIn, playFor, playInLeague and age]

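The set-based measures can be sketched over the items each entity is linked to. The toy `items` table is an assumption loosely based on the example graph (the "NBA" node is hypothetical), and an item's frequency is taken to be the number of entities linked to it.

```python
import math
from collections import Counter

# Hypothetical neighbor sets: entity -> items it is linked to.
items = {
    "Kobe Bryant": {"LA Lakers", "Pau Gasol", "NBA"},
    "Black Mamba": {"LA Lakers", "Pau Gasol"},
    "Pau Gasol": {"LA Lakers", "NBA"},
}

# How many entities each item is linked to.
item_freq = Counter(item for linked in items.values() for item in linked)


def jaccard(a: str, b: str) -> float:
    """Shared items divided by all items of the two entities."""
    union = items[a] | items[b]
    return len(items[a] & items[b]) / len(union) if union else 0.0


def adamic_adar(a: str, b: str) -> float:
    """Shared items count more when they are overall less frequent."""
    # A shared item is linked to at least two entities, so log(freq) > 0.
    return sum(1.0 / math.log(item_freq[z]) for z in items[a] & items[b])


j = jaccard("Kobe Bryant", "Black Mamba")
aa = adamic_adar("Kobe Bryant", "Black Mamba")
```

Although the two names share no characters, they share the neighbors LA Lakers and Pau Gasol, so both relational scores are high. In the Adamic/Adar sum, the rarer neighbor (Pau Gasol, linked to two entities) contributes more than LA Lakers (linked to three).
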
Relational entity resolution: advanced strategies
- Dependency graph approaches [Dong et al., 05]
- Relational clustering [Bhattacharya & Getoor, 07]
- Probabilistic Relational Models [Pasula et al., 03]
- Markov Logic Networks [Singla & Domingos, 06]
- Probabilistic Soft Logic [Broecheler & Getoor, 10]

[Figure: the Kobe Bryant example graph, repeated from the previous slide]

LINK PREDICTION

Link prediction
- Add knowledge from existing graph
- No external source
- Reasoning within the graph

1. Rule-based methods
2. Probabilistic models
3. Factorization models
4. Embedding models

[Figure: graph with the nodes Kobe Bryant, Pau Gasol, LA Lakers and NY Knicks and the edges teammate, playFor, playInLeague, teamInLeague and opponent]

First Order Inductive Learner
- FOIL learns function-free Horn clauses:
  - given positive and negative examples of a concept
  - and a set of background-knowledge predicates
  - FOIL inductively generates a logical rule for the concept that covers all + and no -
- Example rule: teammate(x, y) ∧ playFor(y, z) ⇒ playFor(x, z)
- Computationally expensive: huge search space, large and costly Horn clauses
- Must add constraints: high precision but low recall
- Inductive Logic Programming: deterministic and potentially problematic

[Figure: Pau Gasol and Kobe Bryant linked by teammate, with playFor edges to LA Lakers]

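The sketch below does not implement FOIL's rule search. It only applies the example rule teammate(x, y) ∧ playFor(y, z) ⇒ playFor(x, z) to a toy fact set and checks the acceptance condition "covers all + and no -". The facts, the choice of which playFor edge is missing, and the names `apply_rule` and `covers` are assumptions made for illustration.

```python
# Toy background knowledge as (subject, relation, object) triples.
facts = {
    ("Pau Gasol", "teammate", "Kobe Bryant"),
    ("Kobe Bryant", "playFor", "LA Lakers"),
}


def apply_rule(facts):
    """Apply teammate(x, y) AND playFor(y, z) => playFor(x, z)."""
    inferred = set()
    for x, r1, y in facts:
        if r1 != "teammate":
            continue
        for y2, r2, z in facts:
            if r2 == "playFor" and y2 == y:
                inferred.add((x, "playFor", z))
    return inferred


def covers(predicted, positives, negatives):
    """Acceptance test for a rule: all positives covered, no negative covered."""
    return positives <= predicted and not (predicted & negatives)


predicted = apply_rule(facts)
positives = {("Pau Gasol", "playFor", "LA Lakers")}
negatives = {("Pau Gasol", "playFor", "NY Knicks")}
accepted = covers(predicted, positives, negatives)
```

FOIL's cost comes from searching over many candidate clause bodies and running a check like `covers` for each of them; the nested loop above is the price of evaluating just one two-literal body.
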
Path Ranking Algorithm [Lao et al., 11]
- Random walks on the graph are used to sample paths
- Paths are weighted with probability of reaching target from source
- Paths are used as ranking experts in a scoring function

$$S(KB, \mathrm{playFor}, LAL) = \sum_{i \in \mathrm{paths}} w_i^{\mathrm{playFor}} \, h(Pa_i(KB, LAL))$$

Here KB stands for Kobe Bryant and LAL for LA Lakers, $h(Pa_i(KB, LAL))$ is the probability of reaching LAL from KB along path $Pa_i$, and $w_i^{\mathrm{playFor}}$ is the weight of that path for the relation playFor.

[Figure: graph with the nodes Kobe Bryant, Pau Gasol, LA Lakers and NY Knicks and the edges teammate, playFor, playInLeague, teamInLeague and opponent; two example paths with $h(Pa_1(KB, LAL)) = 0.95$ and $h(Pa_2(KB, LAL)) = 0.2$]

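A small sketch of the path features and the weighted sum follows. Several things here are assumptions: the toy edges (including the "NBA" node), the convention of adding an inverse edge labeled with a `^-1` suffix, the exact propagation of walk probabilities instead of sampling, and the hand-set weights, which the real algorithm would learn. The values therefore differ from the 0.95 and 0.2 shown on the slide.

```python
from collections import defaultdict

# Toy graph as (source, relation, target) triples.
edges = [
    ("Kobe Bryant", "teammate", "Pau Gasol"),
    ("Pau Gasol", "playFor", "LA Lakers"),
    ("Kobe Bryant", "playInLeague", "NBA"),
    ("LA Lakers", "teamInLeague", "NBA"),
    ("NY Knicks", "teamInLeague", "NBA"),
]

# Index outgoing edges, adding an inverse edge for every relation.
out = defaultdict(list)
for s, r, t in edges:
    out[(s, r)].append(t)
    out[(t, r + "^-1")].append(s)


def path_prob(source, path, target):
    """Probability that a random walk following `path` from source ends at target."""
    dist = {source: 1.0}
    for rel in path:
        nxt = defaultdict(float)
        for node, p in dist.items():
            successors = out.get((node, rel), [])
            for succ in successors:
                nxt[succ] += p / len(successors)  # uniform choice among edges
        dist = nxt
    return dist.get(target, 0.0)


paths = [("teammate", "playFor"), ("playInLeague", "teamInLeague^-1")]
weights = {paths[0]: 2.0, paths[1]: 0.5}  # hypothetical weights for playFor


def score(source, target):
    """Weighted sum of path probabilities, as in the scoring function above."""
    return sum(weights[p] * path_prob(source, p, target) for p in paths)


h1 = path_prob("Kobe Bryant", paths[0], "LA Lakers")
h2 = path_prob("Kobe Bryant", paths[1], "LA Lakers")
s_lal = score("Kobe Bryant", "LA Lakers")
s_nyk = score("Kobe Bryant", "NY Knicks")
```

The teammate path reaches LA Lakers with certainty, while the league path splits its probability between the two teams in the league, so LA Lakers scores higher than NY Knicks.
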
Link prediction with scoring functions
- A scoring function alone does not grant a decision
- Thresholding: determine a threshold $\theta$
  - (KB, playFor, LAL) is True iff $S(KB, \mathrm{playFor}, LAL) > \theta$
- Ranking:
  - The most likely relation between Kobe Bryant and LA Lakers is: $rel = \arg\max_{r' \in rels} S(KB, r', LAL)$
  - The most likely team for Kobe Bryant is: $obj = \arg\max_{e' \in ents} S(KB, \mathrm{playFor}, e')$
- As prior for extraction models (cf. Knowledge Vault)
- No calibration of scores like probabilities

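Both decision rules can be sketched on top of any scoring function. The score table below is made up for illustration, as are the threshold value and the candidate sets `rels` and `ents`; in practice S would come from a model such as a path-ranking or embedding scorer.

```python
# Hypothetical scores S(subject, relation, object); unseen triples score 0.
scores = {
    ("Kobe Bryant", "playFor", "LA Lakers"): 2.25,
    ("Kobe Bryant", "playFor", "NY Knicks"): 0.25,
    ("Kobe Bryant", "opponent", "LA Lakers"): 0.10,
    ("Kobe Bryant", "opponent", "NY Knicks"): 1.50,
}
rels = ["playFor", "opponent"]
ents = ["LA Lakers", "NY Knicks"]


def S(subj, rel, obj):
    return scores.get((subj, rel, obj), 0.0)


# Thresholding: the triple is accepted iff its score exceeds theta.
theta = 1.0
is_true = S("Kobe Bryant", "playFor", "LA Lakers") > theta

# Ranking: the most likely relation between the two entities.
best_rel = max(rels, key=lambda r: S("Kobe Bryant", r, "LA Lakers"))

# Ranking: the most likely team for Kobe Bryant.
best_obj = max(ents, key=lambda e: S("Kobe Bryant", "playFor", e))
```

Because the scores are not calibrated like probabilities, a single threshold such as `theta` may not transfer between relations, whereas the ranking rules only compare scores within one query.
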