Está en la página 1de 37

to Advance Knowledge for Humanity

Domain Specific Multi-stage Query Language


for Medical Document Repositories

Aastha Madaan
University of Aizu
8/25/2013

VLDB Phd Workshop 2013

to Advance Knowledge for Humanity

Introduction
Specialized Domains
Biomedical, agriculture, medical/healthcare
Require Effective search and query mechanisms
Insufficient Search Engine-like Keywords based search
Medical domain
Complex
Example,
Query general AIDS information
medical search tool, such as PubMed
Result 1000s of documents Different aspects of AIDS
Such as , treatment, drug therapy, transmission, diagnosis, and history

Medical professionals Specific technical articles (particular topic )


General public General information (disease or medicine).

How to retrieve medical information effectively


8/25/2013

VLDB Phd Workshop 2013

to Advance Knowledge for Humanity

Introduction (1)
Medical Information
Knowledge Evolved over 10s of years
Contains Well defined terms and processes

Available on the Web


Patient Specific Information

Knowledge-based Information

Web Documents
Patient-encounter
Recordings
Medical Literature
8/25/2013

VLDB Phd Workshop 2013

to Advance Knowledge for Humanity

Introduction (2)
Complexity Knowledge-based Resources
Heterogeneous End-user groups

Patients, researchers, doctors and other experts

Variation Information Requirements

Patient-treatment, self -diagnosis, general health information

Structure Medical Documents

8/25/2013

Scientific papers, encyclopedias and other literature


Unique, well-defined

VLDB Phd Workshop 2013

to Advance Knowledge for Humanity

Introduction (3): Specialized Documents


Case of medical encyclopedias
Comprehensive medical guide Patients and clinicians
Authoritative source NLM (National Library of Medicine)
Paper based resources Electronic format

Frequently referred Medical domain users


Example, MedlinePlus, WebMD, ADAMS,
Merriam-Webster Medical Dictionary

8/25/2013

VLDB Phd Workshop 2013

to Advance Knowledge for Humanity

Introduction (4): Why Query


External knowledge base Clinicians

Evidence based medicine


During different stages of point-of-care

Assessment plan of treatment


Patient diagnosis

Improve Quality of Care


Authoritative information required

Self Diagnosis
During Early appearance of symptoms
Personal Knowledge Patients and their relatives
8/25/2013

VLDB Phd Workshop 2013

to Advance Knowledge for Humanity

The Underlying Structure


The Hierarchical Structure
Topic of the Document

Miscellaneous/Related
Subtopics

Content

Subtopic 1

Subtopic 2

Subtopic n

Content topic 1

Content topic 2

Content topic n

Content

Content

Content

Content

Content

Content

Flow of Contents Organized as stages of point-of-care


8/25/2013

VLDB Phd Workshop 2013

to Advance Knowledge for Humanity

Introduction (5): The End Users


Variable
a.
b.
c.

Demographical Characteristics
Tasks/Purpose
Computer/Domain Expertise

Practitioners and Researchers

Well-versed
Domain knowledge and terminologies
Require
Precise, complete, accurate and timely results

Healthcare Workers

Specialized
Researchers

Patients and their relatives

NOT Well-versed
Domain knowledge and terminologies
Require
General information
Patients, their relatives

8/25/2013

VLDB Phd Workshop 2013

to Advance Knowledge for Humanity

The Medical Queries


Evidence-based Queries
Intent: Diagnostic
Raised by: Clinicians/Experts
Target resources: Online Medical Repositories (e.g. medical encyclopedia)
Example: Cases where helicobacter bacteria causes peptic ulcer

Hypothesis-directed Queries
Intent: Non-diagnostic
Raised by: Novice users/patients
Target resources: Online Medical Repositories (e.g. medical encyclopedia)
Example: Treatment in case of high fever and dizziness

8/25/2013

VLDB Phd Workshop 2013

to Advance Knowledge for Humanity

The Query Flows


Occurrence Evidence-based and hypothesis-directed queries
Represent Stages of information seeking

Comprise Varying levels of query complexity

2.

1.

8/25/2013

3.

VLDB Phd Workshop 2013

4.

10

to Advance Knowledge for Humanity

The Research Gap


Healthcare Expert

Help Needed
Paper-based resources

Query: Find chances of "Cancer Risk" in patients showing symptom "Sleep Deprivation"
and have been exposed to "Radiation" (but not "Environmental Toxins" and does not have
"Genetic Disorder") .
Results Large in number, irrelevant

Failure Keyword search, domain-specific search tools


Require Precise and easy-to-use database style query methods
Key steps:
1. Schema understandable by users
2. Identify Resources to query
3. Identify Granularity of results

8/25/2013

VLDB Phd Workshop 2013

11

to Advance Knowledge for Humanity

Bridging the Gap


Aim: Effective Online Medical Information
Transform Document Repository User-Level Schema
Enable High-level Query Language

Target Audience Skilled and semi-skilled users


Utilize Query capabilities of a database query language

Assist Domain Experts Using Query language


Facilitate
In-depth Queries

Granular Results

8/25/2013

VLDB Phd Workshop 2013

12

to Advance Knowledge for Humanity

Querying the New Way


Resource
-

Results returned
Lack specificity
Long list of full documents
Trustworthiness of resources unknown

User-level
Schema

High-level Query Query


Method
language

Keyword
Search

Results returned
Specific, granular
Segments of documents query criteria
Trustworthy/Authoritative sources only

Medical
Expert

Medical
Expert

8/25/2013

Resource

Traditional
Method

VLDB Phd Workshop 2013

Proposed
Method

13

to Advance Knowledge for Humanity

Proposed Approach

8/25/2013

VLDB Phd Workshop 2013

14

to Advance Knowledge for Humanity

Key Features
User-Level Schema
Universal , concept-level schema
Attributes
Understandable Domain experts and novice users
Query-able
Granular results

Multi-stage Query Language


Map multi-stage diagnostic process Step-by-step Query Flow
Interactive Querying View Results Add concept
Continuous query refinement
Supported Queries Simple, Medium, Complex, Recursive

8/25/2013

VLDB Phd Workshop 2013

15

to Advance Knowledge for Humanity

Outline
Two-step framework
Offline Process Create User-level schema
Online Process Enable Multi-stage Query Language

Offline Process
Use Web segmentation algorithm, Domain concepts
Result Automatic creation of a User-level schema

Online Process
Enable Multi-stage Query Language
Use User-level schema
Results Granular, segment-level, context-based

8/25/2013

VLDB Phd Workshop 2013

16

to Advance Knowledge for Humanity

The Method
Two-step Framework

8/25/2013

VLDB Phd Workshop 2013

17

to Advance Knowledge for Humanity

Data Model

8/25/2013

VLDB Phd Workshop 2013

18

to Advance Knowledge for Humanity

Data Model
Tree Structured Repository
H1
f1
f2

Example: MedlinePlus Medical Encyclopedia

8/25/2013

VLDB Phd Workshop 2013

19

to Advance Knowledge for Humanity

Data Model (1)

8/25/2013

VLDB Phd Workshop 2013

20

to Advance Knowledge for Humanity

Data Model (2):The Schema


Attributes Diagnostic concepts/terms
Understandable by expert and novice users
Do not change frequently

8/25/2013

VLDB Phd Workshop 2013

21

to Advance Knowledge for Humanity

Data Model (3): A XML Document


Document corresponding to Aarskog Syndrome
Title

Causes

Symptoms

Treatment

Example: MedlinePlus Medical Encyclopedia

8/25/2013

VLDB Phd Workshop 2013

22

to Advance Knowledge for Humanity

Data Model (4): Query Effort


Query: Find if "Oxygen therapy" work for the treatment of "Chronic Respiratory
Failure" and symptoms are "Lethargy" OR "Shortness of breath.

Not Possible

Easy-to-Use

Advanced search: MedlinePlus

Result segment

Proposed Method
SELECT attribute = Disease_name
WHERE
Attribute Disease_name = Chronic
Respiratory Failure
AND
Attribute Treatment = Oxygen therapy
AND
Attribute Symptoms =Lethargy
OR
Attribute Symptoms = Shortness of breath
Context of user-query

8/25/2013

VLDB Phd Workshop 2013

23

to Advance Knowledge for Humanity

Data Model (5): Granular Results


Each result is a segment, combination of
Concept/context in query
Item of concern (content enclosed in a
segment)

8/25/2013

Context

Granular

Queried
Attributes/Segments

Query Results

VLDB Phd Workshop 2013

24

to Advance Knowledge for Humanity

An Example
Query: Find other symptoms where chronic kidney failure is
caused by anemia
Queried segment Symptoms
Segments in Query Causes = anemia and Disease_name
= Chronic kidney failure
Result Segment Symptoms
Context disease_name = chronic kidney failure & causes
= anemia (SYMPTOM - segment)

8/25/2013

VLDB Phd Workshop 2013

25

to Advance Knowledge for Humanity

Next Step Multi-stage Query Language

8/25/2013

VLDB Phd Workshop 2013

26

to Advance Knowledge for Humanity

Proposed Query Language (1)


XQBE Medical document repositories
Create queries Drag and drop interface
Query : Find cases where a person is inflicted with peptic ulcer due to
helicobacter pylon bacteria

Attributes understandable by end


users
1. Case = disease_name
Value = ??
2. Due to = Causes
Value = helicobacter pylon bacteria
3. Inflicted with = Symptoms
Value = peptic ulcer

Query Effort
Minimal learning curve
Computer-expertise not required
8/25/2013

VLDB Phd Workshop 2013

27

to Advance Knowledge for Humanity

Proposed Query Language (2)


Multi-stage Query-by-Concept

Concept Query-able attribute


Topic, sub-topic, medical concept

Query Effort
Dynamic selection of attributes
No computer expertise

An Example: Cases where fever is caused due


to infliction of Pneumonia and Tuberculosis

Query Process

8/25/2013

VLDB Phd Workshop 2013

28

to Advance Knowledge for Humanity

Another Example
Query: Find cases where 3 clinical concepts (cough, no
sore-throat, and had no sterol injection) occur in context of
symptoms occur along with a concept having sub-key (i.e. non
sterol injection at the left side)
Possible XQBE on Specialized Medical Repositories

Possible Multi-stage Query-by-concept Query Language

8/25/2013

VLDB Phd Workshop 2013

29

to Advance Knowledge for Humanity

Evaluation Plan
Data Sets
MedlinePlus document repository
Health topics (900+) , encyclopedia (4000+), drugs (12000+)

Set of Queries
50 test queries (multi-staged)
Using diseases and medication etc.

Quantitative Studies
Evaluation Metrics
Accuracy of segment extraction (schema creation) Precision and Recall
Reduction in search space

Qualitative Studies
Usability Studies
Actual End-users
Query Performance

8/25/2013

VLDB Phd Workshop 2013

30

to Advance Knowledge for Humanity

Initial Achievements
HTML XML as per proposed model
XQuery on XML
Integration with XQBE
Query by concept Enumeration using paper and
pencil

8/25/2013

VLDB Phd Workshop 2013

31

to Advance Knowledge for Humanity

Challenges

Scalability Similarly structured repositories

List Query operations needed


Implementation Above query operations
Query Language User Interface
8/25/2013

VLDB Phd Workshop 2013

32

to Advance Knowledge for Humanity

Related Work
Domain-specific Information Retrieval
Similarity and popularity based models Insufficient for domain experts
Information granulation needs to be considered in huge document repositories
Form-based Query Interfaces
Easy-to-use
Limited access to the database
Complex queries large number of forms
Varying medical concepts large number of fields in forms
Beyond single page web search results
Provide granular results for users search
Return segments from multiple or related web documents as results
High-level Graphical Query Languages
Easy-to-use and understand
Little or no programming effort required by the user
Common languages QBE, XQBE
8/25/2013

VLDB Phd Workshop 2013

33

to Advance Knowledge for Humanity

Summary and Conclusions


Proposed Multi-stage Query Language
1.

Aim Effective online medical resources

2.

Key feature User-Level Schema

3.

Facilitates Granular/Context-based Results

4.

Support Healthcare Experts

5.

Minimize Learning curve for novice users

6.

Reduce Dependency on general-purpose search engines

Provide Web user level activity no or little programming effort


Healthcare experts

8/25/2013

VLDB Phd Workshop 2013

34

to Advance Knowledge for Humanity

References (1)
[1] D. Braga, A. Campi, and S. Ceri. Xqbe (xquery by example): A visual interface to the standard xml query language. ACM
Trans. Database Syst., 30(2):398443, June 2005.
[2] D. Cai, S. Yu, J.-R. Wen, and W.-Y. Ma. Extracting content structure for web pages based on visual representation. In
Proceedings of the 5th APWeb, pages 406417. Springer-Verlag, 2003.
[3] M.-A. Cartright, R. W. White, and E. Horvitz. Intentions and attention in exploratory health search. In Proceedings of the 34th
Intl. ACM SIGIR conference, pages 6574, New York, NY, USA, 2011.ACM.
[4] S. Cohen, Y. Kanza, Y. Kogan, W. Nutt, Y. Sagiv, and A. Serebrenik. Equix-a search and query language for xml. Journal of
the American Society for Information Science and Technology, 53:2002, 2000.
[5] S. M. Freire, E. Sundvall, D. Karlsson, and P. Lambrix. Performance of XML Databases for Epidemiological Queries in
Archetype-Based EHRs. In Proceedings Scandinavian Conference on Health Informatics 2012, volume 70 of Linkping Electronic
Conference Proceedings, pages 5157. Linkping University Electronic Press, 2012.
[6] M. Gschwandtner, M. Kritz, and C. Boyer. Requirements of the health professional research. In Technical Report D8.1.2.
Khresmoi Project, 2011.
[7] A. Hanbury. Medical information retrieval, an instance of domain. In SIGIR'12. ACM, August 2012.
[8] S. Hunt, J. J. Cimino, and D. E. Koziol. A comparison of clinicianss access to online knowledge resources using two types of
information retrieval applications in an academic hospital setting. J Med Libr Assoc, 101(1):2631, 2013.
[9] http://www.who.int/classifications/icd/en/, 2011.
[10] M. Jayapandian and H. V. Jagadish. Automating the design and construction of query forms. ICDE, page 125, 2006.
[11] F. Li and H. V. Jagadish. Usability, databases, and hci. IEEE Data Eng. Bull., 35(3):3745, 2012. [12] http://loinc.org/, 2011.
[13] A. Marian and W. Wang. Flexible querying of personal information. IEEE Data Eng. Bull., 32(2):2027, 2009.
[14] http://www.nlm.nih.gov/bsd/pmresources.html, 2011.
[15] http://www.nlm.nih.gov/medlineplus/, 2009.
[16] http://www.linkedin.com/groups/ Choice-OpenEHR-persistence-layer-144276.S.208531138?qid=208adbca-fc26-4ada-bf027efe5a9e5661&trk=group_most_recent_rich-0-b-ttl&goback=%2Egmr_144276, 2013.

8/25/2013

VLDB Phd Workshop 2013

35

to Advance Knowledge for Humanity

References (2)
[17] http://www.ncbi.nlm.nih.gov/pubmed, 2011.
[18] S. A. Rahman, S. Bhalla, and T. Hashimoto. Query-by-object interface for information requirement elicitation in m-commerce. Int.
J. Hum. Comput. Interaction, 20(2):135160, 2006.
[19] X. Y. Raymond, Y. Lau, D. Song, X. Li, and J. Ma. Toward a semantic granularity model for domain-specific information retrieval.
ACM Trans. On Information Systems., 29(3), July 2011.
[20] S. Sachdeva and S. Bhalla. Implementing high-level query language interfaces for archetype-based electronic health records
database. In COMAD, 2009.
[21] http://www.ihtsdo.org/snomed-ct/, 2011.
[22] R. Varadarajan, V. Hristidis, and T. Li. Beyond single-page web search results. IEEE Transactions on Knowledge and Data
Engineering, 20(3):411424, 2008.
[23] A. Yasir, M. Kumara Swamy, P. Krishna Reddy, and S. Bhalla. Enhanced query-by-object approach for information requirement
elicitation in large databases. In Big Data Analytics, volume 7678 of Lecture Notes in Computer Science, pages 2641. Springer, 2012.

8/25/2013

VLDB Phd Workshop 2013

36

to Advance Knowledge for Humanity

Questions

8/25/2013

VLDB Phd Workshop 2013

37

También podría gustarte