Knowledge Base
Jimmy Lai
Yahoo! Search Engineer
r97922028 [at] ntu.edu.tw
2014/05/18
http://www.slideshare.net/jimmy_lai/build-a-searchable-knowledge-base
Outline
Introduction to Knowledge Base
Knowledge
Applications of Knowledge Base
Personal assistant: Siri, Google Now
Construct a Knowledge Base
1. Find good data sources.
Wikipedia
A collaboratively edited encyclopedia with more than 30M articles across 287 languages.
http://www.theguardian.com/technology/blog/2009/aug/13/wikipedia-edits
DBpedia
http://wiki.dbpedia.org/About
Identifier
Knowledge
Entity
Abstract
Relations
What can Python do for us
Data Wrangling
Process the raw text data
Aggregate the data from different sources
Output data in JSON format
https://github.com/jimmylai/knowledge
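The wrangling steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the repository: the function name merge_records and the sample records are hypothetical, and it assumes each source has already been processed into a dict keyed by entity identifier.

```python
import json


def merge_records(*sources):
    """Merge per-entity field dicts from several sources, keyed by identifier."""
    merged = {}
    for source in sources:
        for entity_id, fields in source.items():
            # Start each entity with its identifier, then add the source's fields.
            merged.setdefault(entity_id, {"id": entity_id}).update(fields)
    return merged


# Hypothetical sample data standing in for two processed sources.
abstracts = {"San_Francisco": {"abstract": "A city in California."}}
labels = {"San_Francisco": {"name": "San Francisco"}}

entities = merge_records(abstracts, labels)

# Output the aggregated entities as a JSON list of documents.
docs = json.dumps(list(entities.values()), ensure_ascii=False, indent=2)
print(docs)
```

Keying every source by the same identifier keeps the aggregation step a plain dictionary update, so adding another source only means passing one more dict.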
Data Preparation
1. Download data from DBpedia
http://downloads.dbpedia.org/current/en/
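Once a dump file is downloaded, each line has to be split into its parts. The sketch below assumes the dump is in an N-Triples-style format (one "<subject> <predicate> object ." statement per line); that format, the function name parse_triple, and the sample line are assumptions for illustration. It does not handle escaped characters inside literals.

```python
import re

# Matches: <subject> <predicate> object .
TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+(.+?)\s*\.\s*$')
# Matches a quoted literal with an optional language tag or datatype.
LITERAL = re.compile(r'^"(.*)"(?:@[\w-]+|\^\^<[^>]+>)?$')


def parse_triple(line):
    """Return (subject, predicate, object) or None for comments and bad lines."""
    if line.startswith("#"):
        return None
    match = TRIPLE.match(line)
    if match is None:
        return None
    subject, predicate, obj = match.groups()
    if obj.startswith("<") and obj.endswith(">"):
        obj = obj[1:-1]  # object is a URI reference
    else:
        literal = LITERAL.match(obj)
        if literal:
            obj = literal.group(1)  # object is a plain or tagged literal
    return subject, predicate, obj


# Hypothetical sample line in the assumed format.
sample = ('<http://dbpedia.org/resource/San_Francisco> '
          '<http://www.w3.org/2000/01/rdf-schema#label> "San Francisco"@en .')
print(parse_triple(sample))
```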
http://localhost:8983/solr/
Search - String Match
To be able to search by entity name
python feed_data.py string_match
config: solr/conf/string_match/schema.xml
<field name="name" type="string" indexed="true" stored="true"
multiValued="false"/>
<field name="abstract" type="string" indexed="false" stored="true"
multiValued="false"/>
Search - String Match
http://localhost:8983/solr/string_match/select?q=name%3A%22San+Francisco%22&wt=json&indent=true
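The query URL above can be assembled in Python rather than written by hand. A minimal sketch, assuming the local Solr instance and the string_match core shown on the slides; the helper name string_match_url is hypothetical, and quotes inside the entity name are not escaped here.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"


def string_match_url(name, core="string_match"):
    """Build a select URL that matches the exact entity name."""
    params = {
        "q": 'name:"%s"' % name,  # phrase query on the name field
        "wt": "json",
        "indent": "true",
    }
    return "%s/%s/select?%s" % (SOLR_BASE, core, urlencode(params))


url = string_match_url("San Francisco")
print(url)
```

Letting urlencode do the percent-encoding avoids mistakes with the colon, quotes, and spaces in the query string.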
Search - Synonym
To be able to search by synonym of entity name
python feed_data.py synonym_string_match
config: solr/conf/synonym_string_match/schema.xml
<field name="name" type="name_text" indexed="true" stored="true" multiValued="false"/>
<fieldType name="name_text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
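To make the effect of the index-time analyzer chain concrete, here is a rough Python emulation of the three steps (tokenize, expand synonyms ignoring case, lowercase). It is only a conceptual sketch, not how Solr implements these filters: the synonym group is hypothetical, only single-word synonyms are handled, and expanded terms are appended in sequence rather than stacked at the same token position.

```python
import re


def analyze(text, synonym_groups):
    """Emulate tokenizing, synonym expansion (case-insensitive), lowercasing."""
    lookup = {}
    for group in synonym_groups:
        lowered = [term.lower() for term in group]
        for term in lowered:
            lookup[term] = lowered  # every term expands to its whole group
    tokens = re.findall(r"\w+", text)
    output = []
    for token in tokens:
        key = token.lower()
        output.extend(lookup.get(key, [key]))
    return output


# Hypothetical synonym group; Solr would read these from synonyms.txt.
groups = [["sf", "frisco"]]
print(analyze("Visit SF", groups))
```

Because every synonym of a name is written into the index, a query for any one of them can find the same entity.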
Synonym handling at query time
Search - Synonym
Search by synonym.
Search - Full Text Search
To be able to search by any text in the entity name or abstract
python feed_data.py full_text_search
config: solr/conf/full_text_search/schema.xml
<copyField source="name" dest="text"/>
<copyField source="abstract" dest="text"/>
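The effect of the two copyField rules can be pictured with a small Python emulation: the values of name and abstract are both copied into a multi-valued text field, which is then the single field to search. This is a sketch only; the function name apply_copy_fields and the sample document are hypothetical, and Solr performs this step internally at index time.

```python
def apply_copy_fields(doc, copy_fields):
    """Return a copy of doc with each source value appended to its dest field."""
    result = dict(doc)
    for source, dest in copy_fields:
        if source in doc:
            # dest collects values from every source that points at it
            result[dest] = result.get(dest, []) + [doc[source]]
    return result


# Hypothetical sample document.
doc = {"name": "San Francisco", "abstract": "A city in California."}
rules = [("name", "text"), ("abstract", "text")]
print(apply_copy_fields(doc, rules))
```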
Search - Full Text Search
Search - Geo Search
To be able to search by distance given a location
python feed_data.py geo_search
config: solr/conf/geo_search/schema.xml
<field name="location" type="location" indexed="true" stored="true"
required="false" multiValued="false" />
Given condition on distance
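A distance condition is typically expressed in Solr as a geofilt filter query on the location field. The sketch below builds such a URL; it assumes the local Solr instance and the geo_search core from the slides, and the helper name geo_search_url and the example coordinates are illustrative choices, not taken from the talk.

```python
from urllib.parse import urlencode, unquote_plus


def geo_search_url(lat, lon, distance_km, core="geo_search"):
    """Build a select URL returning entities within distance_km of a point."""
    params = {
        "q": "*:*",  # match everything, then filter by distance
        "fq": "{!geofilt sfield=location pt=%s,%s d=%s}" % (lat, lon, distance_km),
        "wt": "json",
    }
    return "http://localhost:8983/solr/%s/select?%s" % (core, urlencode(params))


# Example point (approximate coordinates, for illustration only).
url = geo_search_url(37.77, -122.42, 10)
print(unquote_plus(url))  # decoded form, easier to read
```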
Search - Put All Together
Search Strategy
1. Input a query
Review
More Applications
Question answering system:
1. Query analysis: identify the intent (e.g. looking for a specific type of entity)
2. Search in the knowledge base
3. Return the knowledge entity
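The three steps can be sketched as a toy pipeline. Everything here is a simplified assumption for illustration: the knowledge base is a small in-memory list rather than a Solr index, the intent rules are a hypothetical keyword table, and the entity record is sample data.

```python
# Hypothetical in-memory knowledge base standing in for the search index.
KNOWLEDGE_BASE = [
    {"name": "San Francisco", "type": "place", "abstract": "A city in California."},
]

# Hypothetical mapping from question words to the entity type being sought.
INTENT_KEYWORDS = {"who": "person", "where": "place"}


def analyze_query(question):
    """Step 1: return (wanted entity type or None, list of query words)."""
    words = question.lower().rstrip("?").split()
    entity_type = next(
        (INTENT_KEYWORDS[w] for w in words if w in INTENT_KEYWORDS), None)
    return entity_type, words


def search(kb, entity_type, words):
    """Step 2: find the first entity of the wanted type named in the query."""
    for entity in kb:
        if entity_type and entity["type"] != entity_type:
            continue
        if all(token in words for token in entity["name"].lower().split()):
            return entity
    return None


def answer(question, kb=KNOWLEDGE_BASE):
    """Step 3: return the matching knowledge entity, or None."""
    entity_type, words = analyze_query(question)
    return search(kb, entity_type, words)


print(answer("Where is San Francisco?"))
```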
Modern search engines don't just return web page URLs. They give users the direct answer.
More Data Sources and Knowledge Entities
Open Data
Open APIs
My Life in
Build online services for billions of users.