BASICS IN
EPIDEMIOLOGY AND BIOSTATISTICS
Waqar H Kazmi MD MS (Tufts, Boston)
Principal, Professor of Nephrology and Director Research
Karachi Medical and Dental College/Abbasi Shaheed Hospital
Karachi, Pakistan
Farida Habib Khan DCH MPH MCPS FCPS
Professor of Community Medicine
Princess Nora Bint Abdulrahman University
Riyadh, Kingdom of Saudi Arabia
Foreword
Waris Qidwai
The Health Sciences Publisher
New Delhi | London | Philadelphia | Panama
Jaypee Brothers Medical Publishers (P) Ltd.
Headquarters
Jaypee Brothers Medical Publishers (P) Ltd.
4838/24, Ansari Road, Daryaganj
New Delhi 110 002, India
Phone: +91-11-43574357
Fax: +91-11-43574314
E-mail: jaypee@jaypeebrothers.com
Overseas Offices
J.P. Medical Ltd.
83, Victoria Street, London
SW1H 0HW (UK)
Phone: +44-20 3170 8910
Fax: +44(0)20 3008 6180
E-mail: info@jpmedpub.com
Jaypee-Highlights Medical Publishers Inc.
City of Knowledge, Bld. 237, Clayton
Panama City, Panama
Phone: +1 507-301-0496
Fax: +1 507-301-0499
E-mail: cservice@jphmedical.com
Jaypee Medical Inc.
The Bourse
111, South Independence Mall East
Suite 835, Philadelphia, PA 19106, USA
Phone: +1 267-519-9789
E-mail: jpmed.us@gmail.com
Jaypee Brothers Medical Publishers (P) Ltd.
17/1-B, Babar Road, Block-B, Shaymali
Mohammadpur, Dhaka-1207
Bangladesh
Mobile: +08801912003485
E-mail: jaypeedhaka@gmail.com
Website: www.jaypeebrothers.com
Website: www.jaypeedigital.com
© 2015, Jaypee Brothers Medical Publishers
The views and opinions expressed in this book are solely those of the original contributor(s)/author(s)
and do not necessarily represent those of editor(s) of the book.
All rights reserved. No part of this publication may be reproduced, stored or transmitted in any form
or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior
permission in writing of the publishers.
All brand names and product names used in this book are trade names, service marks, trademarks
or registered trademarks of their respective owners. The publisher is not associated with any product
or vendor mentioned in this book.
Medical knowledge and practice change constantly. This book is designed to provide accurate,
authoritative information about the subject matter in question. However, readers are advised to
check the most current information available on procedures included and check information from the
manufacturer of each product to be administered, to verify the recommended dose, formula, method
and duration of administration, adverse effects and contraindications. It is the responsibility of the
practitioner to take all appropriate safety precautions. Neither the publisher nor the author(s)/editor(s)
assume any liability for any injury and/or damage to persons or property arising from or related to use
of material in this book.
This book is sold on the understanding that the publisher is not engaged in providing professional
medical services. If such advice or services are required, the services of a competent medical
professional should be sought.
Every effort has been made where necessary to contact holders of copyright to obtain permission to
reproduce copyright material. If any have been inadvertently overlooked, the publisher will be pleased
to make the necessary arrangements at the first opportunity.
Inquiries for bulk sales may be solicited at: jaypee@jaypeebrothers.com
Basics in Epidemiology and Biostatistics
First Edition: 2015
ISBN: 978-93-5152-631-5
Printed at
Dedicated to
Medical and Dental Students
and
Young Researchers
Foreword
and Biostatistics, written by the highly eminent and respected scholars
Professor Waqar H Kazmi and Professor Farida Habib Khan. Prof Kazmi is
considered an authority on this subject and has the skill to present
challenging concepts in epidemiology and biostatistics in
easy-to-understand language. He obtained his Masters in Epidemiology from
Tufts University, Boston, USA, and also has a strong clinical background as
a Professor of Nephrology. Farida Habib Khan is a Professor of Community
Medicine who has served the College of Physicians and Surgeons as a regular
facilitator of workshops on research methodology and dissertation writing,
and has served two medical journals as an Associate Editor.
The book fills a great need for such books on this important yet neglected
subject. Epidemiology and biostatistics have been neglected in the medical
education curriculum and, therefore, healthcare providers lack expertise in
this important area. The book will go a long way in addressing the need for
an easy-to-understand guide that helps healthcare providers and others to
understand and apply the concepts of epidemiology and biostatistics in
their work. Its simple language and practical approach make it
indispensable for those involved in research work as well as those who
teach epidemiology and biostatistics. It will be useful for undergraduate
and postgraduate students in various disciplines of healthcare as well as
for those practicing medicine.
Besides, the book will be highly useful to healthcare providers, teachers
and researchers.
Waris Qidwai
Chair, Working Party on Research
World Organization of Family Doctors (WONCA)
Former Chair
International Federation of Primary Care
Research Networks (IFPCRN)
Professor and Chairman
Department of Family Medicine
Aga Khan University
Karachi, Pakistan
Preface
postgraduates, researchers, or clinicians, to the study of statistics applied to
medicine. We have drawn on our experience in medicine and statistics to
develop a comprehensive text covering the traditional topics of biostatistics
and epidemiology. Particular emphasis is given to study design and the
interpretation of results of medical research.
We have been giving lectures at various undergraduate and postgraduate
institutes for more than a decade. Students find these lectures worthwhile
for understanding the basic concepts of biostatistics and epidemiology. We
realized that by writing a book, we could reach a large number of students
and faculty members in remote areas who were not otherwise accessible to us.
Thus, we hope that anyone interested in research will find the book
extremely helpful.
We have tried to explain all statistical concepts in simple terms. No special
background knowledge is required to understand the text. An effort has been
made to cover all the fundamental concepts and important terms in the book.
The book contains the following features:
Simple Text
The book is written in a very simple and easy-to-understand manner. The
information given in the book is relevant to the needs of any junior or
early-stage researcher. The information is presented in a schematic
pattern. This is necessary because a learner must understand the
prerequisite information before moving on to the more advanced concepts in
basic epidemiology and biostatistics. Thus, all the information has been
presented in a schematic and synchronized way so that the reader can grasp
it easily.
Pictorial and Tabular Display of Information
Different learners have different learning styles. Some find textual
information easy to understand, while others are more at ease with a
pictorial and tabular display of information. Thus, all relevant text has
also been presented in pictorial and tabular form. We hope that a large
number of readers will grasp the important and useful information by having
a good look at the pictures and tables.
Relevant Examples
We have used multiple clinical and nonclinical examples so that the reader
will understand the basic concepts of epidemiology and biostatistics. Simple
interesting examples have also been used for the purpose.
Waqar H Kazmi
Farida Habib Khan
Acknowledgments
Department, Karachi Medical and Dental College, Karachi, Pakistan, for his
invaluable support and efforts in every stage of writing the book.
We express our gratitude to Mrs Huma Khan, Research Co-ordinator,
Universal Research Group, Pakistan, for her support regarding proofreading of
the book.
We are thankful to Asma Kazmi, Assistant Professor, California Institute of
Fine Arts, Los Angeles, USA, for designing the Cover Page.
Our special thanks to M/s Jaypee Brothers Medical Publishers (P) Ltd, New
Delhi, India, for their active co-operation in publishing this book.
Contents
1. Introduction to Research
   • What is Research?
   • Types of Research
   • Steps to Conduct Research
   • Selection of Research Topic
   • Scale for Rating Research Topics
   • Resources of Literature Search
2. Study Designs
   • Definition
   • Types of Epidemiological Study Designs
   • Descriptive Observational Studies
   • Analytical or Comparative Studies
   • Analytical Observational Studies
   • Registries
   • Interventional/Experimental Studies
   • Blinding
   • Consent Form
   • Intent to Treat Analysis
   • Quasi-experimental Studies
   • Clinical Trials and their Phases
   • Research Questions and Study Types
   • Meta-analysis
3. Sampling Procedure
   • Population
   • Reasons for Sampling
   • Sampling Techniques
4. Variables, Data and its Presentation
   • Variables and their Types
   • Data and its Types
   • Tabulation and Graphical Presentation of Data
5. Biostatistics: Basic
   • Measures of Central Tendency
   • Measures of Variation
15. Synopsis Writing
   • Methodology
   • Plan for Analysis of Results
   • Title/Topic
   • Introduction
16. Dissertation Writing
   • Steps in Writing a Dissertation
   • Title
   • Table of Content
   • Title Page
   • Abstract
   • Introduction
   • Hypothesis
   • Study Objective
   • Subjects/Material and Methods
   • Results
   • Discussion
   • Optional Components
   • References
   • Annexes
   • The Whole Manuscript/Dissertation Should be in Past Tense
   • Sample of Title Page
17. Reference Writing
   • Citing a Journal Article
   • Title of Journal Article
   • Journal’s Title
   • Citing a Book Reference
CHAPTER
1
Introduction to Research
WHAT IS RESEARCH?
Research is a systematic process of collecting, analyzing and then
interpreting data so as to find solutions to a problem or to explain
any event around us (Fig. 1.1).
TYPES OF RESEARCH
Basically, research is of two types: empirical and theoretical
(see Flow chart 1.1 for the classification of research). The empirical
approach is based upon observation and experience, while the theoretical
approach is based upon theory and abstraction. Empirical and theoretical
research complement each other in developing an understanding of a
phenomenon, predicting future events and preventing harmful events,
for the general welfare of the population of interest.
Empirical research is further divided into qualitative and
quantitative research.
Qualitative Research
This type of research is context based. It is an inquiry whose goal is
to understand a social or human problem so as to build up a complex
and holistic picture of the phenomenon of interest. The researcher
interprets the perspectives or information obtained from the subjects.
Quantitative Research
In quantitative research, reality is studied objectively by the
researcher. A theory or hypothesis is tested using numbers, which are
analyzed by statistical methods. This type of research is based
on the deductive form of logic. Ultimately, the researcher develops
generalizations and contributes to theory.
The three types of quantitative research are experimental,
quasi-experimental and survey research.
1. In the experimental type of research, subjects are randomly assigned
to experimental conditions. The results are compared with those of
controls.
2. Quasi-experimental studies are similar to experimental studies,
except that subjects are assigned to experiments without
randomization.
3. Surveys are cross-sectional studies that use questionnaires or
interviews with the intent of estimating the characteristics of a
larger population from a smaller group drawn from that population.
Health science research mostly uses the quantitative research
approach.
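The difference between experimental and quasi-experimental research lies in how subjects reach their groups. The short Python sketch below illustrates random assignment only; it is not a procedure taken from this book. The function name `randomize`, the subject numbers 1 to 20 and the equal split into two groups are assumptions chosen for the example.

```python
import random

def randomize(subject_ids, seed=None):
    """Randomly split subjects into an experimental and a control group.

    Hypothetical helper for illustration; real trials often use block or
    stratified randomization instead of a simple shuffle.
    """
    rng = random.Random(seed)      # seeded generator so the allocation can be reproduced
    shuffled = list(subject_ids)   # copy, so the caller's sequence is left untouched
    rng.shuffle(shuffled)          # every ordering of subjects is equally likely
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}

# Illustrative subjects numbered 1 to 20 (assumed data).
groups = randomize(range(1, 21), seed=42)
print("Experimental:", sorted(groups["experimental"]))
print("Control:     ", sorted(groups["control"]))
```

In a quasi-experimental study this step is absent: the groups are formed by some nonrandom rule, for example by ward or by clinic day, so the groups may differ in ways other than the intervention.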
STEPS TO CONDUCT RESEARCH
Research is a systematic process that starts with the selection of a
research topic and ends with reporting the research findings in local or
international journals or at scientific meetings. Table 1.1 gives details
of the various steps in conducting research and their purposes.
SELECTION OF RESEARCH TOPIC
Main Criteria for Selecting a Research Topic
There are seven criteria for selecting a research topic.
1. Relevance: Consider the prevalence of the problem in which you are
interested; in other words, how big is the problem?
2. Innovation: It is good to look into a new problem but it is not
always possible to work or search for new problems as you may
have limited resources. Thus, you can work on the old problem
but with a different perspective.
3. Feasibility: It means the availability of different resources that you
may need to carry out the research project. It includes manpower,
money, material, machinery, skills and time, etc.
CHAPTER
2
Study Designs
DEFINITION
A study design is a plan to conduct a study which allows the
researcher to translate a conceptual hypothesis into an operational
one. It is the method of data collection with respect to time, exposure
and outcome (Fig. 2.1).
The selection of a study design depends upon the research
objective and hypothesis. The researcher should know and use the
most appropriate study design that matches best with the objective.
Case Report
It is a report of a single case of disease, usually with an unexpected
presentation, which typically describes the findings, clinical course
and prognosis of the case. Writing a case report is like writing a
good clinical history of a patient: it includes the presenting features,
clinical signs, laboratory investigations, and the diagnosis reached after
excluding a list of differential diagnoses. A classical example of a case
report from history is that of a congenital anomaly affecting the limbs
and digits, reported from Germany in late 1959 (the thalidomide tragedy).
The world had never heard of or seen such a unique congenital anomaly
before. These are the types of cases that should be presented as a case
report.
Case Series
When several unusual cases, all with similar conditions, are described
in a published report, this is called a “Case Series”. A case series does
not include a control group. After the first case report of the
thalidomide tragedy, a case series was published in 1961. Thalidomide
was used for nausea and vomiting in pregnancy in that era, so more such
maldeveloped children were soon identified, forming the basis for a
case series.
It was now quite easy to identify thalidomide as the exposure,
because all mothers with the outcome (maldeveloped children)
had used this drug.
Cross-sectional Studies
In a cross-sectional study, the data are collected at one point in time.
The hallmark of such studies is that there is no follow-up. These
studies are also called “Prevalence Studies” as they determine the
burden of disease in a population, e.g. the National Health Survey of
Pakistan on the prevalence of hypertension in Pakistan, or the Pakistan
National Diabetic Survey, which shows the prevalence of diabetes
mellitus in Pakistan.
A survey is a classical example of a cross-sectional study. These
days surveys are also being carried out by people other than the
health professionals, for example, the media.
In a cross-sectional study, data on both the exposure and the
outcome are collected at the same time. Hence, four groups are formed
in this type of study: those exposed who have the outcome, those
exposed who do not have the outcome, those unexposed who have the
outcome, and those unexposed who do not have the outcome
(Flow chart 2.2). The four groups form the cells of a 2 × 2 table, and
the exposure rates calculated in each group are compared.
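The four groups described above can be tabulated with a few lines of code. The Python sketch below is a minimal illustration, not the book's own procedure: the function names `two_by_two` and `prevalence`, and the survey counts, are hypothetical. It counts the four cells and reports the prevalence of the outcome among the exposed, among the unexposed, and overall.

```python
from collections import Counter

def two_by_two(records):
    """Count the four cells of a 2 x 2 table.

    records: iterable of (exposed, has_outcome) pairs of booleans.
    Returns (a, b, c, d) in the usual layout.
    """
    counts = Counter(records)
    a = counts[(True, True)]     # exposed, outcome present
    b = counts[(True, False)]    # exposed, outcome absent
    c = counts[(False, True)]    # unexposed, outcome present
    d = counts[(False, False)]   # unexposed, outcome absent
    return a, b, c, d

def prevalence(a, b, c, d):
    """Prevalence of the outcome by exposure group (assumes no empty group)."""
    return {
        "exposed": a / (a + b),
        "unexposed": c / (c + d),
        "overall": (a + c) / (a + b + c + d),
    }

# Hypothetical survey of 300 people (assumed data, for illustration only).
survey = ([(True, True)] * 30 + [(True, False)] * 70 +
          [(False, True)] * 20 + [(False, False)] * 180)

cells = two_by_two(survey)
print("a, b, c, d =", cells)
for group, value in prevalence(*cells).items():
    print(f"Prevalence ({group}): {value:.1%}")
```

Because exposure and outcome are recorded at the same moment, a difference between the two prevalences shows only that they occur together; it does not show which came first.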
If a cross-sectional study covers the whole population, it is called
a census.
A cross-sectional design is not suitable to study the association
between an exposure and an outcome. While using this design
it is difficult for the researcher to establish whether the exposure
preceded the outcome or not. Ideally, the exposure should always
precede the outcome. For example, if the researcher is studying the
association of uric acid level and hypertension, and on analysis finds
Advantages
• Easy to perform
• Prevalence/frequency of the disease can be calculated
• Inexpensive as compared to analytical studies
• Useful for evaluating diagnostic procedures, e.g. comparing two
diagnostic or treatment modalities, or the usefulness of a new
diagnostic procedure
• Useful for measuring current health status and planning for some
health services
• Takes less time as compared to analytical studies
• The researcher can generate a hypothesis.
Disadvantages
• The data about both the exposure to risk factors and the presence
or absence of disease are collected simultaneously, hence it is
difficult to determine temporal relationship of a presumed cause
and effect.
• Nonresponse bias (in surveys): it is difficult to obtain sufficiently
large response rates, as some people are too busy or reluctant to
participate.
• A hypothesis can be generated, but it is a weak hypothesis that
needs to be tested by conducting a further analytical study.
Advantages
• Multiple exposures for a single outcome can be detected
• Inexpensive as compared to other analytical study designs
• No need for follow-up
• Takes less time as compared to other analytical study designs
• Recommended for problems that have a long incubation
period, such as cancers
• Recommended for studies on rare diseases
• Recommended for investigating a preliminary hypothesis.
Disadvantages
• Recall bias is the main problem as the “cases” will be more likely to
recall the past exposure. Similarly, if the researcher is working on
geriatric patients then recall bias can be problematic both in cases
and controls as the respondents might not have good memory
due to old age. For example, in a study looking at the association
of being a cigarette smoker for ten years and development of lung
cancer, some participants may have difficulty in recalling whether
they have been a cigarette smoker for ten years or not.
• Selection bias is another problem if the cases and controls are not
properly selected. Here are two examples of selection bias in two
studies carried out at two leading tertiary care centers of the world
by two very eminent researchers of the time.
Study 1
In 1929, Raymond Pearl at Johns Hopkins, Baltimore, conducted a
study to test the hypothesis that tuberculosis (TB) protected against
cancer. He selected 816 cases of cancer from 7500 consecutive
autopsies. He also selected 816 controls from others on whom
autopsies had been carried out at Johns Hopkins. Of the 816 cases
(with cancer), 6.6% had TB. Of the 816 controls (without cancer), 16.3%
had TB. From the finding that the prevalence of TB was considerably
higher in the control group, Pearl concluded that TB was protective
against cancer. In fact, at the time of this study, TB was one of the
major reasons for hospitalization at Johns Hopkins Hospital. Pearl
thought that the control group’s rate of TB would represent the level
of TB in the general population; but because of the way he selected
the controls, they came from a pool that was heavily weighted with
Study 2
Coffee-drinking and Cancer of the Pancreas in Women. The cases
(patients with cancer of the pancreas) were white cancer patients
from 11 Boston and Rhode Island hospitals. The controls were
recruited from the gastrointestinal clinics of the same hospitals.
MacMahon found that coffee consumption was greater in cases
than in controls. However, the controls were patients who had reduced
their coffee consumption on their physicians’ advice, so the controls’
level of coffee consumption was not representative of the general
population. When a difference in exposure is observed between
cases and controls, we must ask: “Is the level of exposure observed
in the controls really the expected level in the general population?”
In both studies (1 and 2), the researchers drew erroneous conclusions
about the association between an exposure and an outcome because of
improper selection of controls.
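The figures reported for Study 1 can be summarized with an odds ratio, the usual measure of association in a case-control study. The text above does not itself report an odds ratio, so the Python sketch below is an added illustration, and the function names `odds` and `odds_ratio` are hypothetical. It uses only the two percentages quoted for Pearl's study.

```python
def odds(proportion):
    """Convert a proportion exposed into the odds of exposure."""
    return proportion / (1.0 - proportion)

def odds_ratio(p_exposed_cases, p_exposed_controls):
    """Exposure odds in cases divided by exposure odds in controls."""
    return odds(p_exposed_cases) / odds(p_exposed_controls)

# Study 1: 6.6% of cancer cases and 16.3% of controls had TB.
or_tb = odds_ratio(0.066, 0.163)
print(f"Odds ratio for TB, cases vs controls: {or_tb:.2f}")  # about 0.36
```

An odds ratio well below 1 looks like a protective effect, which is how Pearl read it. The arithmetic is correct; the flaw is in the controls, whose TB rate was far higher than that of the general population.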
Cohort Studies
Cohort means a group of people sharing the same attribute, e.g. all
those who are exposed to the use of tobacco as compared to those
not exposed to the use of tobacco.
In a cohort study design, the two groups are made on the basis of
exposure (i.e. smokers and nonsmokers). These groups are followed
for a specific period of time for the outcome of interest. This study
design is preferred if the researcher aims to determine the incidence
and the risk factors associated with the disease.
There are two types of cohort studies:
1. Prospective Cohort Study or Concurrent Cohort Study
2. Retrospective Cohort Study or Historical Cohort Study
of interest. The subjects are then followed into the future in order
to record the development of an outcome of interest. The follow-up
can be conducted by mail questionnaires, by phone interviews, via
the Internet, or in person with interviews, physical examinations,
and laboratory or imaging tests. For example, if a researcher
investigating the association between cigarette smoking for ten years
or more and lung cancer chooses a prospective cohort design, the
study would start in the year 2013 and end in 2023
(Flow chart 2.4).
The Framingham Heart Study is a good example of a large prospective
cohort study. It is an ongoing study that aims to identify the risk
factors associated with heart disease.
Advantages
• Multiple outcomes to a single exposure can be detected
• Incidence rates are calculated
• It helps in calculating the relative risk and the attributable risk
• Temporal association is best studied in prospective cohort study
• It allows the assessment of dose response relationship
Disadvantages
• Expensive
• Time consuming
• Strict follow-up is required
• Not suitable for diseases that have a long incubation period
• Not suitable for rare diseases
• Attrition (loss to follow-up) due to migration or death of the
respondents.
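The relative risk and attributable risk listed among the advantages above follow directly from the incidence in each group. The Python sketch below is a minimal illustration under stated assumptions: the function name `cohort_measures` and the counts for smokers and nonsmokers are hypothetical, and attributable risk is taken here as the simple difference between the two incidences.

```python
def cohort_measures(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Incidence, relative risk and attributable risk from cohort counts.

    Assumes both groups are non-empty and the unexposed group has at least
    one case, so that the ratio is defined.
    """
    incidence_exposed = exposed_cases / exposed_total
    incidence_unexposed = unexposed_cases / unexposed_total
    return {
        "incidence_exposed": incidence_exposed,
        "incidence_unexposed": incidence_unexposed,
        # How many times more likely the outcome is among the exposed.
        "relative_risk": incidence_exposed / incidence_unexposed,
        # Excess incidence among the exposed (risk difference).
        "attributable_risk": incidence_exposed - incidence_unexposed,
    }

# Hypothetical ten-year follow-up (assumed data, for illustration only):
# 30 lung cancers among 1000 smokers, 5 among 1000 nonsmokers.
result = cohort_measures(30, 1000, 5, 1000)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Both measures need incidence, which is why they come from cohort studies: a cross-sectional study yields prevalence only, and a case-control study fixes the numbers of cases and controls in advance.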
Advantages
• Less expensive
• Less time consuming
• Follow-up data are obtained from records, so the ‘follow-up time’ is
saved
• The other advantages of cohort studies also apply.
Disadvantages
The investigator has no control over the data and must work with
whatever information was recorded. Nothing can be done about missing
data, and sometimes information on a variable of interest is not
available at all.
In a prospective cohort study, the investigators are typically
present from the beginning to the end of the observation period.
However, it is possible to maintain the advantages of the cohort
study without the continuous presence of the investigator, or having
to wait for a long time to collect the necessary data, through the
use of a retrospective cohort study. In other words, although the
investigator was not present when the exposure was first identified,
he reconstructs the exposed and unexposed population from records,
and then proceeds as though he has been present throughout the
study. For example, if the study of ten years of cigarette smoking and
lung cancer were done today (year 2013) using a retrospective cohort
design, the investigator would look into records and identify
the people who were smokers in the year 2003. In this manner, he
selects a cohort that has been exposed to cigarette smoking
for ten years. He would then determine the outcome of lung cancer
today (year 2013). By using the retrospective cohort design, he is
able to complete in a few months a study that would otherwise take
ten years from now.
REGISTRIES
In the developed world, researchers have built registries that collect
data on specific diseases, such as the United States Renal Data System
(USRDS) for patients with end-stage renal disease (ESRD). The USRDS has
data on all patients being dialyzed in any of the 50 states of the US.
Any patient who initiates dialysis is immediately registered in this
database, and subsequently the entire follow-up, including clinical
characteristics, laboratory results and medicines, is recorded
continuously until the
INTERVENTIONAL/EXPERIMENTAL STUDIES
Here intervention or some action is involved such as deliberate
application of a drug in the experimental (study) group and
no intervention in the control group. Later, the outcome of the
experiment is compared in both the groups (Flow chart 2.6).
Thus it differs from the observational analytical study designs
in that here the experiment is directly under the control of the
investigator whereas in the observational analytical studies, the
investigator takes no action, just observes.
There are three key components of an experimental study design:
(1) a pre-post test design, (2) a treatment group and a control group,
and (3) random assignment of study participants.
A pre-post test design requires collecting data on the study
participants’ level of performance before the intervention is given
(pre-), and collecting the same data on similar participants
after the intervention has been given (post-). This design is the best
way to ensure that the intervention had a causal effect.
                                         Surgical        Medical         p-value
                                         therapy group   therapy group
Clinical
Angina (CCS class), no. (%)                                              0.24
  0                                      132 (11.6)      146 (12.9)
  I                                      338 (29.6)      339 (30.0)
  II                                     407 (35.7)      423 (37.4)
  III                                    259 (22.7)      219 (19.4)
  Missing data                           3 (<1)          2 (<1)
Duration of angina, months                                               0.53
  Median                                 5               5
Episodes/week with exertion or at rest
within last month                                                        0.83
  Median                                 3               3
History, no. (%)
  Diabetes                               365 (32.0)      395 (35.0)      0.12
  Hypertension                           755 (66.2)      763 (67.5)      0.53
  Congestive heart failure               56 (4.9)        51 (4.5)        0.59
  Cerebrovascular disease                99 (8.7)        100 (8.8)       0.83
  Myocardial infarction                  435 (38.2)      437 (38.7)      0.80
  Previous PCI*                          173 (15.2)      183 (16.2)      0.49
  Coronary artery bypass graft (CABG)    124 (10.9)      124 (11.0)      0.94
Stress test
  Total patients, no. (%)                968 (84.9)      974 (86.2)      0.84
  Treadmill tests, no. (%)               552 (57.0)      550 (56.5)
  Duration of treadmill test, minutes    6.9 ± 2.6       6.8 ± 2.2       0.43
  Pharmacologic stress, no. (%)          415 (42.9)      425 (43.6)
  Echocardiography, no. (%)              61 (5.4)        52 (4.6)
  Nuclear imaging, no. (%)               683 (70.6)      705 (72.2)      0.59
  Single reversible defects              152 (22.2)      159 (22.6)      0.09
  Multiple reversible defects            441 (66.0)      481 (68.2)      0.09
* PCI is percutaneous coronary intervention
BLINDING
Blinding represents an important, distinct aspect of randomized
controlled trials. The term blinding refers to keeping trial participants,
investigators or assessors (those collecting outcome data) unaware
of an assigned intervention. Blinding is of three types:
Single-blind
Here the participants do not know whether they are assigned to the
study or the control group. It means that they do not know whether
they are getting the new drug which is under investigation or the
old conventional drug. Only the investigator knows who is
getting which drug. This type of trial helps to overcome subject
variation.
Double-blind
Here neither the investigator (doctor) nor the participant (patient)
knows the group allocation and treatment received. However, the
statistician knows it. The drug is coded before it is handed over to
the doctor. This is the type of trial usually used in practice.
Triple-blind
It goes one step further. All the participants, the doctor and the
statistician are unaware (blind) of the group allocation. Only the
principal investigator is aware of the group allocation and the
treatment allocation.
CONSENT FORM
Since these studies involve human subjects, there are always
ethical issues that cannot be overlooked. Approval from the Ethical
Review Board (ERB) is mandatory. Consent forms are always
required and are scrutinized in detail by the ERB.
QUASI-EXPERIMENTAL STUDIES
In a quasi-experimental study, one characteristic of a true
experiment is missing: either randomization or the use of a separate
control group. A quasi-experimental study, however, always
includes the manipulation of an independent variable, which is the
intervention.
One of the most common quasi-experimental designs uses two
(or more) groups, one of which serves as a control group. Both groups
are observed before as well as after the intervention, to test whether
the intervention has made any difference.
Preclinical Phase
The drug is developed and evaluated in cells and animals to see its
potential effect on the human body.
Phase I Trial
These trials are conducted to determine the recommended dose, the side
effects and the manner in which the drug is processed by the body. Only
10–20 healthy volunteers are recruited.
Phase II Trial
These are controlled clinical studies in which the drug or treatment
is given to a larger group of people (100–300) to evaluate its
effectiveness. These trials further evaluate its safety and determine
the common short-term side effects and risks.
Phase IV Trial
This includes post-marketing studies to delineate additional
information including the drug’s risks, benefits, optimal use and
long-term side effects.
Post-marketing Surveillance
This involves observational studies such as case reports, cohort
studies or case-control studies. Its purpose is to assess drug safety
META-ANALYSIS
A meta-analysis is a particular type of systematic review that
focuses on the numerical results. The main aim of meta-analysis
is to combine the results from individual studies to produce, if
appropriate, an estimate of the overall or average effect of interest
(e.g., the relative risk). The direction and magnitude of this average
effect, together with a consideration of the associated confidence
interval and hypothesis test result, can be used to make decisions
about the therapy under investigation and the management of
patients.
Figure 2.2 shows a meta-analysis comparing two
interventions for a certain outcome. Studies A [RR = 0.65 (CI = 0.1
– 0.7); p-value = 0.01] and E [RR = 0.7 (CI = 0.1 – 0.4); p-value = 0.0001]
show that group A is better, while study H [RR = 1.5 (CI = 1.2 – 2.0);
p-value = 0.001] shows that group B is better. The overall effect is
not statistically significant [RR = 0.75 (95% CI = 0.3 – 1.1); p-value = 0.32],
because the confidence interval includes 1.
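The idea of combining individual study results into an average effect can be sketched in a few lines. The text does not name a pooling method, so the sketch below assumes one common choice, fixed-effect inverse-variance weighting of the log relative risks. The function name and the three studies are hypothetical and are not the values of Figure 2.2.

```python
import math
from statistics import NormalDist

def pool_relative_risks(studies, confidence=0.95):
    """Fixed-effect (inverse-variance) pooling of relative risks on the log scale.

    Each study is a tuple (rr, ci_lower, ci_upper), with the CI given at the
    stated confidence level.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    weighted_sum = 0.0
    total_weight = 0.0
    for rr, lower, upper in studies:
        # Standard error of log(RR), recovered from the width of the CI.
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        weight = 1.0 / se ** 2  # more precise studies get more weight
        weighted_sum += weight * math.log(rr)
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (math.exp(pooled_log),
            math.exp(pooled_log - z * pooled_se),
            math.exp(pooled_log + z * pooled_se))

# Hypothetical studies for illustration only (not the data of Figure 2.2).
hypothetical_studies = [(0.65, 0.45, 0.94), (0.80, 0.60, 1.07), (1.10, 0.70, 1.73)]
pooled_rr, pooled_low, pooled_high = pool_relative_risks(hypothetical_studies)
print(f"Pooled RR = {pooled_rr:.2f} (95% CI {pooled_low:.2f} to {pooled_high:.2f})")
```

If the pooled confidence interval includes 1, the overall effect is not statistically significant, whatever the individual studies show.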
problem exists but knowing little about its characteristics or possible causes | tude of the problem? • Who is affected? • How do the affected people behave? • What do they know, believe, and think about the problem and its causes? | or descriptive studies: • Descriptive case studies • Cross-sectional studies
Suspecting that certain factors contribute to the problem | Are certain factors indeed associated with the problem? (e.g. Is lack of preschool education related to low school performance? Is low fiber diet related to carcinoma of the large intestine?) | Analytical (comparative) studies: • Cross-sectional comparative studies • Case control studies • Cohort studies
Having established that certain factors are associated with the problem: desiring to establish the extent to which a particular factor causes or contributes to the problem | • What is the cause of the problem? • Will the removal of a particular factor prevent or reduce the problem? (e.g. stopping smoking, providing safe water) | Cohort studies, experimental or quasi-experimental studies
Having sufficient knowledge about cause(s) to develop and assess an intervention that would prevent, control or solve the problem | • What is the effect of a particular intervention/strategy? (e.g. treating with a particular drug; being exposed to a certain type of health education) • Which of two alternate strategies gives better results? • Which strategy is most cost effective? | Experimental or quasi-experimental studies
Statistical Approach
We decide on the effect of interest and, if the raw data are available,
evaluate it for each study. However, in practice, we may have to
extract these effects from published results. For example, if the
outcome in a clinical trial comparing two treatments is numerical,
the effect may be the difference in treatment means. A zero difference
implies no treatment effect. Similarly, if the outcome is binary (e.g.
died/survived), we consider the risks of the outcome (e.g. death) in
the treatment groups. The effect may be the difference in the risks
or their ratio, the relative risk (RR). If the difference in risks equals
zero or RR = 1, then there is no treatment effect.
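For a binary outcome, the two effect measures can be computed directly from the number of events in each treatment group. The following is a minimal sketch; the function names and the trial counts are hypothetical and chosen only for illustration.

```python
def risk(events, total):
    """Risk = proportion of a group that experiences the outcome."""
    return events / total

def treatment_effects(events_a, total_a, events_b, total_b):
    """Return (risk difference, relative risk) of group A versus group B."""
    risk_a = risk(events_a, total_a)
    risk_b = risk(events_b, total_b)
    return risk_a - risk_b, risk_a / risk_b

# Hypothetical trial: 15 of 100 died on treatment A, 30 of 100 on treatment B.
rd, rr = treatment_effects(15, 100, 30, 100)
print(f"Risk difference = {rd:.2f}, RR = {rr:.2f}")

# Equal risks in both groups: risk difference of zero and RR of 1.
rd_null, rr_null = treatment_effects(20, 100, 20, 100)
print(f"Risk difference = {rd_null:.2f}, RR = {rr_null:.2f}")
```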
CHAPTER
3
Sampling Procedure
POPULATION
A major purpose of research is to infer or generalize findings from
a sample to a target population. Population is the term statisticians
use to describe a large set or collection of items that have something
in common (e.g. all pregnant women, all pregnant women in the third
trimester, all anemic pregnant women in the third trimester, etc.).
In medicine, population generally refers to patients
or other living organisms, but the term can also be used to denote
collections of inanimate objects, such as autopsy reports, X-ray
reports, or birth certificates.
Figure 3.1 shows the relationship among target population, study
population and sample. The target population is the population of ultimate
clinical interest, about which the researcher aims to draw a conclusion.
On account of cost and other practical issues, the entire target
population cannot be studied. The study population is a subset of
the target population that can be studied. Samples are subsets of study
populations investigated in clinical research because often not every
individual in a study population can be measured.
A “sample” is a subset of the population that should share its inherent qualities.
Studies are conducted on samples but inference is made about the
target population. That is why it is important that the sample
be truly representative of the target population. Hence, the selected
elements should be properly approached, recruited into the study and
interviewed. The selection of the sample is critical because, otherwise, the
research findings might not be valid.
It is vital to have a clear understanding of the terms population
and sample; these two terms must not be used interchangeably.
SAMPLING TECHNIQUES
Broadly, there are two types of sampling techniques (Table 3.1):
1. Probability sampling techniques.
2. Nonprobability sampling techniques.
In a probability sampling technique, each participant in a study
population has an equal (or at least a known) chance of being
selected. The method protects the research from selection bias and helps ensure
that the sample is truly representative of the population. Importantly,
it allows a researcher to make meaningful statistical estimates while
analyzing the results of the research. In a nonprobability technique,
the chance of selection is not known, and participants do not have an
equal chance of being selected.
[Sampling frame of 100 units, numbered 001 to 100]
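With a numbered sampling frame of 100 units (001 to 100), a probability sample in which every unit has an equal chance of selection can be drawn as sketched below. The sample size of 10, the seed, and the function name are assumptions made for illustration.

```python
import random

# Sampling frame: every unit in the study population gets a number from 001 to 100.
sampling_frame = [f"{i:03d}" for i in range(1, 101)]

def simple_random_sample(frame, sample_size, seed=None):
    """Draw units without replacement; every unit has the same chance of selection."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return sorted(rng.sample(frame, sample_size))

selected = simple_random_sample(sampling_frame, sample_size=10, seed=2015)
print(selected)
```

Because the draw is made by a random number generator rather than by the researcher, personal judgment cannot influence which units enter the sample.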
Cluster Sampling
In cluster sampling, a subgroup of the population is used as
the sampling unit instead of the individual. It is a probability sampling
technique, employed when the researcher aims to select participants
from a large geographical area, e.g. a country, province, state or city
(Flow chart 3.1). Suppose the city of Karachi consists of 18 towns
and each town consists of 10 union councils. Initially, 5 towns are
Convenience Sampling
Convenience sampling is presumed to be the most commonly used
technique in clinical research. It involves the selection of subjects
that are conveniently accessible to the researcher. Suppose, a
Purposive Sampling
Purposive sampling is also called judgmental sampling. The
technique is criticized for introducing selection bias into the research,
as the researcher recruits participants based on a pre-existing belief
that certain subjects will be more likely to benefit, comply or respond
in a certain way. Thus, the researcher selects study participants with a
‘particular purpose’ in mind.
For example, suppose the researcher wants to test the hypothesis that
Pakistani females have better knowledge of medical research
than American females, and selects Pakistani female medical
students (a group that has a better understanding of medical research
than other women) and American females who came to the market
for shopping. As the two groups are not comparable,
the Pakistani females will evidently display better knowledge of
medical research, which might not be the case in the wider population.
Such deviation from the truth is on account of purposive sampling.
Similarly, in a knowledge survey on the modes
of transmission of HIV, selecting participants who are relatives of
AIDS patients will show excellent knowledge of the
transmission modes of HIV. Evidently the selection of study
participants is biased, as the sample is not truly representative
of the target population.
Snowball Sampling
The snowball sampling method is employed when study participants
are difficult to identify, access or locate. The method is commonly
employed to recruit participants from hard-to-reach groups (e.g. sex
workers, IV drug users, etc.). The sample is built through chain
referrals. Suppose, you are investigating the knowledge about
Quota Sampling
Quota sampling is a nonprobability sampling method that
ensures that a certain number of study participants from different
subgroups constitute the sample, so that all these subgroups are
represented. Suppose you aim to assess the quality of life among
dialysis patients but you think that socioeconomic status has a
strong effect on quality of life in these patients. You therefore decide to
include 25% of respondents from each socioeconomic group (i.e.
upper, middle, lower middle and lower). If the estimated sample
size is 200, each socioeconomic group will include 50 participants.
Thus, the population is first divided into different strata, and then
any nonprobability sampling technique is applied to select
participants within each stratum.
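The quota arithmetic of the dialysis example, and the filling of quotas in order of arrival, can be sketched as follows. The function names and the short list of arriving patients are hypothetical; recruiting in order of arrival is only one of the nonprobability techniques that could be used within each stratum.

```python
def quota_allocation(sample_size, shares):
    """Number of participants to recruit per subgroup, given each subgroup's share."""
    return {group: round(sample_size * share) for group, share in shares.items()}

def fill_quotas(arrivals, quotas):
    """Recruit participants in order of arrival until each subgroup's quota is full."""
    recruited = {group: [] for group in quotas}
    for participant_id, group in arrivals:
        if len(recruited[group]) < quotas[group]:
            recruited[group].append(participant_id)
    return recruited

# Shares from the dialysis example: 25% per socioeconomic group, sample size 200.
shares = {"upper": 0.25, "middle": 0.25, "lower middle": 0.25, "lower": 0.25}
quotas = quota_allocation(200, shares)
print(quotas)

# Hypothetical arrivals with tiny quotas, to show that extra arrivals are skipped.
demo = fill_quotas([(1, "upper"), (2, "upper"), (3, "lower")], {"upper": 1, "lower": 1})
print(demo)
```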
CHAPTER
4
Variables, Data and
its Presentation
Type of Variables
Dependent and Independent Variables
Because health systems research often looks for causal explanations,
it is important to distinguish between dependent and
independent variables.
The variable that is used to describe or measure the problem
under study is called the dependent variable. It represents the
output or effect, or is tested to see if there is an effect. A dependent
variable is also known as a “response variable”, “outcome variable”,
or “output variable”.
The variables that are used to describe or explain differences
in the dependent variable, or to cause changes in the dependent
variable, are called the independent (exposure) variables. They
represent the inputs or causes, or are tested to see if they are the cause.
An independent variable is also known as a “predictor variable”,
“explanatory variable”, or “exposure variable”.
For example, in a study of the relationship between smoking and
lung cancer, suffering from lung cancer (with the values yes or no)
would be the dependent variable and ‘smoking’ (varying from not
Types of Data
Data are classified as either qualitative or quantitative (Flow chart 4.1):
1. Qualitative or categorical data.
2. Quantitative or numerical data.
Flow chart 4.1 Classification of data types
* Mutually exclusive means both events cannot occur at the same time (i.e. tossing a
coin will result in either head or tail).
child is equal with respect to providing one counting unit. There are
no intermediate values between each number.
A continuous variable is one in which there are no gaps in the values
of the variable: there is an unlimited number of possible values
between any two adjacent values on the scale. Thus, if the variable
is height measured in inches, then 4 and 5 inches are two adjacent
whole-number values of the variable, and there can be an infinite number of
intermediate values, such as 4.5 and 4.7 inches. Variables such
as these are known as continuous variables (their values can
occur in fractions or decimals).
Frequency Tables
The most common way of presenting data is to arrange them in
the form of tables. A frequency table gives the frequency with which
(or the number of times) a particular value appears in the data.
The basic principles of tabulation of data are:
1. The information should be presented in a simple and orderly manner.
2. The table should have a title which must be brief and comprehensive.
3. Rows and columns must have their own captions.
4. The titles of the rows must be entered on the left side of the table
while the titles of the columns are on the top row. The rest of
the table, constituting the body, contains the numerical values in
actual numbers, in percentages or in both forms.
5. The class intervals are usually taken at equal intervals.
6. Standard codes or symbols, if used, should be explained in a
footnote.
Tables 4.1 and 4.2 are examples of frequency tables.
Table 4.1: Systolic blood pressure of 100 patients coming to a tertiary care hospital
Systolic blood pressure (mm Hg) | Frequency (n = 100) | Relative frequency | Cumulative relative frequency
Below 100 | 15 | 0.15 | 0.15
100–120 | 25 | 0.25 | 0.40
121–140 | 20 | 0.20 | 0.60
141–160 | 30 | 0.30 | 0.90
Above 160 | 10 | 0.10 | 1.00
Total | 100 | 1.00 |
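The relative and cumulative relative frequencies in Table 4.1 follow directly from the class frequencies. A minimal sketch of the calculation, using the frequencies from the table (the variable names are illustrative):

```python
from itertools import accumulate

# Class intervals and frequencies from Table 4.1.
classes = ["Below 100", "100-120", "121-140", "141-160", "Above 160"]
frequencies = [15, 25, 20, 30, 10]

total = sum(frequencies)
# Relative frequency = class frequency divided by the total number of patients.
relative = [f / total for f in frequencies]
# Cumulative relative frequency = running sum of the relative frequencies.
cumulative = [round(c, 2) for c in accumulate(relative)]

for label, f, r, c in zip(classes, frequencies, relative, cumulative):
    print(f"{label:<10} {f:>3} {r:>5.2f} {c:>5.2f}")
```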
Graphs
Another way to summarize and display data is through the use of
graphs or pictorial representations, so that the data are easier to
interpret. Graphs should be designed so that they convey at a single
glance the general patterns in a set of data.
Types of Graphs
• Bar charts
• Pie charts
• Histograms
• Line graphs
• Scatter plots
Bar Charts
Bar charts are used for binary, nominal and ordinal (categorical) data
and consist of nonadjacent bars. The bars can be vertical or
horizontal.
Example: The marital status of the 200 respondents who
participated in a knowledge, attitude and practice survey regarding
dengue fever was as follows: Single 60 (30%), Married 120 (60%) and
Divorced 20 (10%). The bar graph is shown in Figure 4.1.
Y axis = Percentage of respondents
X axis = Marital status of respondents
Pie Charts
Pie charts can also be used to display binary, nominal and ordinal
(categorical) data. A pie chart consists of a circular region partitioned
into sections, with each section representing a category's share
(percentage) of the whole.
Example: Data regarding knowledge of research ethics were
collected from 150 postgraduate trainees. The survey
showed that 60 (40%) of the respondents were male and 90 (60%)
were female. The data are represented in Figure 4.2.
Histograms
A histogram depicts a frequency distribution for quantitative data; it
comprises a series of adjacent bars (Fig. 4.3).
Histograms are constructed to represent continuous or
quantitative data. Ideally, for many statistical tests, a quantitative
variable should be approximately normally distributed (bell-shaped
curve), which a histogram helps to check.
Line Graphs
A line graph (also called a time series plot) is appropriate for
representing data that vary continuously. It shows the trend of a variable
over time. To construct a time series plot, time is placed on the
horizontal axis and the variable being measured on the vertical axis,
with the points connected by line segments (Fig. 4.4).
Example: The population statistics of the US for the years 1860–1950
are as in Table 4.3:
Scatter Plots
A scatter plot represents the relationship between two continuous
variables.
Example: Suppose a researcher wishes to identify whether studying
for longer hours leads to better scores. A collection of data is given
in Table 4.4.
Based on these data, a scatter plot has been constructed as
shown in Figure 4.5. (Note: When constructing a scatter plot, do not
connect the dots.)
Figure 4.5 Scatter plot of students test scores and hours of study
CHAPTER
5
Biostatistics: Basic
The mean weight would equal (110 + 110 + 140 + 150 + 160)/5 =
670/5 = 134 pounds.
The median value would be 140 pounds, since 140 pounds is the
middle weight.
The most frequent value is 110 pounds (it occurs twice), so the mode of the
data set is 110 pounds.
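The three measures of central tendency for these five weights can be checked with Python's standard `statistics` module. This is only a quick verification sketch; the variable name is illustrative.

```python
from statistics import mean, median, mode

weights = [110, 110, 140, 150, 160]  # weights in pounds, from the example above

print("Mean:", mean(weights))      # sum of the values divided by their number
print("Median:", median(weights))  # middle value of the ordered data
print("Mode:", mode(weights))      # most frequent value
```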
MEASURES OF VARIATION
These are measures that describe the amount of variability or
spread in a set of data. The most common measures of variability are
the range, variance, and standard deviation.
Range is the simplest measure of variability. It is defined as the
difference in value between the highest (maximum) and the lowest
(minimum) observation in the data set. For example, consider the
following women's weights in the data set: 110 pounds, 110 pounds,
140 pounds, 150 pounds, and 160 pounds. The range would be
160 – 110 = 50 lbs.
Variance quantifies the amount of variability or spread about the
mean of the sample.
For instance, the women's weights in the above example were 110,
110, 140, 150 and 160 pounds, and the mean weight was 134 pounds.
Variance (S):
$$S = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
Where $x_i$ = Individual sample observation
$\bar{x}$ = Sample mean
$n$ = Total sample size
$\sum$ = Sum of the squared differences between each individual sample
observation and the sample mean
Example:
$$S = \frac{(110-134)^2 + (110-134)^2 + (140-134)^2 + (150-134)^2 + (160-134)^2}{5 - 1}$$
$$S = \frac{(-24)^2 + (-24)^2 + (6)^2 + (16)^2 + (26)^2}{5 - 1}$$
$$S = \frac{576 + 576 + 36 + 256 + 676}{4} = \frac{2120}{4}$$
$$S = 530$$
Standard deviation is the square root of the variance (S). It
is a measure that describes how much individual
measurements differ, on average, from the mean.
$$SD = \sqrt{S}$$
$$SD = \sqrt{530} = 23.02$$
The same results can easily be obtained with SPSS (a statistical
package).
Below is the SPSS output showing the central tendency and variation
of the above data set.
N (Number of observations) | 5
Mean | 134
Median | 140.00
Mode | 110.00
Standard deviation | 23.02
Variance | 530.00
Range | 50.00
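The measures of variation for the same five weights can also be reproduced outside SPSS. The sketch below uses Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample formula with n – 1 in the denominator; the variable names are illustrative.

```python
from statistics import stdev, variance

weights = [110, 110, 140, 150, 160]  # weights in pounds

data_range = max(weights) - min(weights)  # highest minus lowest observation
sample_variance = variance(weights)       # sum of squared deviations / (n - 1)
sample_sd = stdev(weights)                # square root of the sample variance

print("Range:", data_range)
print("Variance:", sample_variance)
print("Standard deviation:", round(sample_sd, 2))
```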
NORMAL DISTRIBUTION
A normal distribution, such as the distribution shown in
Figures 5.1A and B, is classically a bell-shaped curve. Most of
the values are clustered near the mean and a few values are near the
tails. The normal distribution is symmetrical around the mean. If
the variable is normally distributed, then the mean, median and mode
will be approximately equal.
An important characteristic of a normally distributed variable is
that approximately 95% of the measurements have values
within 2 standard deviations (SD) of the mean (Fig. 5.1B).
When the area of the normal curve is divided into sections by
standard deviations above and below the mean, the area in each
section is a known quantity. For example, 34 percent of all the
values of a normally distributed variable lie between the mean
and one standard deviation above it. It also means that there is
a 0.34 chance that a value drawn at random from the distribution
will lie between these two points. Similarly, 34 percent of all the
values lie between the mean and one standard deviation below it.
Consequently, 68 percent of all the values
of a normally distributed variable lie within one
standard deviation on either side of the mean.
Sections of the curve above and below the mean may be added
together to find the probability of obtaining a value within (plus
or minus) a given number of standard deviations of the mean.
For example, the amount of curve area between one standard
deviation above the mean and one standard deviation below is
0.34 + 0.34 = 0.68, which means that approximately 68 percent
of the values lie in that range. Similarly, about 95 percent of the
values lie within two standard deviations while 99.7 percent of
the values lie within three standard deviations around the mean
(Fig. 5.1B).
Figures 5.1A and B Proportion of cases under portion of the normal curve
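The areas quoted above are rounded values and can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `area_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Area between the mean and one standard deviation above it.
mean_to_plus_one_sd = standard_normal.cdf(1) - standard_normal.cdf(0)
print(f"Mean to +1 SD: {mean_to_plus_one_sd:.4f}")

for k in (1, 2, 3):
    print(f"Within {k} SD: {area_within(k):.4f}")
```

The exact area within two standard deviations is slightly above 95 percent; the 95 percent figure corresponds more precisely to 1.96 standard deviations.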
Example: Suppose, for a study on 300 chronic kidney disease
(CKD) patients, the hemoglobin levels were obtained. The data on
CHAPTER
6
Estimation and
Hypothesis Testing
Estimation refers to the process by which one makes inferences
about a population, based on information obtained from a sample.
POINT ESTIMATE
• A point estimate is a single numerical value used to estimate a
parameter.
• The best point estimate of the population mean (µ) is the sample
mean.
• But how good is a point estimate?
A point estimate alone gives no indication of how close it is to
the population mean. Statisticians therefore prefer another type of
estimate called an interval estimate.
INTERVAL ESTIMATE
• An interval estimate of a parameter is an interval or a range of
values used to estimate the parameter (a confidence interval).
• The confidence level of an interval estimate of a parameter is the
probability that the interval estimate will contain the parameter.
• Two commonly used confidence levels are 95 percent and
99 percent.
• If one desires to be more confident without widening the interval,
the sample size must be larger.
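The link between confidence level, sample size and interval width can be illustrated with a large-sample (z-based) interval for a mean. This is a sketch under stated assumptions: the sample values are hypothetical, the function name is illustrative, and a z-based interval is used for simplicity (for small samples a t-based interval would normally be preferred).

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sd, n, confidence=0.95):
    """Large-sample (z-based) confidence interval for a population mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%, 2.58 for 99%
    margin = z * sd / math.sqrt(n)                  # z times the standard error
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample: 100 patients, mean Hb 11.6 g/dL, SD 2 g/dL.
ci95 = z_confidence_interval(11.6, 2, 100, confidence=0.95)
ci99 = z_confidence_interval(11.6, 2, 100, confidence=0.99)
ci95_small = z_confidence_interval(11.6, 2, 25, confidence=0.95)

print(f"95% CI, n = 100: {ci95[0]:.2f} to {ci95[1]:.2f}")
print(f"99% CI, n = 100: {ci99[0]:.2f} to {ci99[1]:.2f}")
print(f"95% CI, n = 25:  {ci95_small[0]:.2f} to {ci95_small[1]:.2f}")
```

The 99 percent interval is wider than the 95 percent interval for the same data, and the interval narrows as the sample size grows.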
HYPOTHESIS TESTING
What is a Hypothesis?
A hypothesis is a testable theory. Hypothesis testing is the method
of testing whether claims or hypotheses regarding a population are
appearance of an outcome. In between the absolute values of zero
and one there is a whole range of probabilities. The application of
the scale of probability in the concept of hypothesis testing will be
elaborated subsequently (Fig. 6.1).
TEST OF HYPOTHESIS
• Suppose a study is being conducted to answer questions about
differences between two regimens for the management of
diarrhea in children: the sugar-based modern ORS, and the
time-tested indigenous herbal solution made from locally available
herbs.
• One question that could be asked is:
“In the population, is there a difference in overall improvement
(after three days of treatment) between the ORS and the herbal
solution?”
• There could be only two answers to this question:
1. Yes
2. No
Null Hypothesis (H0)
“There is no difference between the 2 regimens in terms of
improvement” (null hypothesis). A null hypothesis is usually a statement
that there is no difference between groups or that one factor is not
dependent on another, and it corresponds to the No answer.
theses are stated in such a way that they are mutually exclusive.
That is, if one is true, the other must be false; and vice versa.
2. Formulate an analysis plan: The analysis plan describes how to
use sample data to reject or retain the null hypothesis. It should
specify the following elements:
• Selection of significance level (α): Often, researchers choose a
significance level equal to 0.01 (1 in 100) or 0.05 (1 in 20). The
significance level is the risk we are willing to take that a sample
which showed a difference was misleading. A five percent
significance level means that we are ready to take a 5 percent
chance of rejecting a null hypothesis that is actually true. The
significance level is set prior to
the actual testing of the null hypothesis. If alpha is set at 0.01,
then the researcher desires to be 99 percent confident before
rejecting the null hypothesis.
• Choosing a test statistic: t-test or z-test for continuous data,
chi-square for proportions, etc. The test statistic is computed
from the sample data and is used to determine whether the
null hypothesis should be rejected or retained. The test statistic
yields a p-value.
3. Analyze sample data: Using sample data, perform the computations
called for in the analysis plan.
• p-value: Indicates the probability or likelihood of obtaining a
result at least as extreme as that observed in a study by chance
alone. A value of p less than or equal to 0.05 indicates that there is at
the most a 5 percent probability of observing an association
as large or larger than that found in the study due to chance
alone, given that there is no association between exposure and
outcome. If the p-value is higher than the set value of alpha,
e.g. p-value > 0.05, then we do not reject the null hypothesis
(Fig. 6.2).
4. Interpret the results: If the sample findings are unlikely, given
the null hypothesis, the researcher rejects the null hypothesis.
Typically, this involves comparing the p-value to the significance
level, and rejecting the null hypothesis when the p-value is less
than the significance level.
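The comparison of a p-value with the significance level can be sketched for a z-test as follows. The function names and the two test statistics are hypothetical and serve only to illustrate the decision rule described in step 4.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability of a z statistic at least this extreme, in either tail, if H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Compare the p-value with the significance level chosen before the analysis."""
    return "reject H0" if p_value < alpha else "do not reject H0"

# Hypothetical test statistics, for illustration only.
for z in (0.8, 2.2):
    p = two_sided_p_value(z)
    print(f"z = {z}: p = {p:.3f} -> {decide(p)}")
```

The same p-value can lead to different decisions at different significance levels, which is why alpha must be fixed before the data are analyzed.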
Figure 6.2 Level of significance (α = 0.05) for hypothesis testing
DECISION ERRORS
Two types of errors can result from a hypothesis test.
1. Type I error: A type I error occurs when the researcher rejects a
null hypothesis when it is true. The probability of committing a
type I error is called the significance level. This probability is also
called alpha, and is often denoted by α. Thus, if the α of a study is
lowered from 0.05 to 0.01, the maximum chance of committing a
type I error is also reduced from 5 to 1 percent.
Suppose a researcher wants to compare the mean ages of males
and females in a class of final year students. The null hypothesis for
this research is that there is no difference in the mean age of males
and females of this class. For some reason (e.g. small
sample size, inappropriate statistical analysis technique, etc.) the
p-value is calculated as 0.01 (less than 0.05, and thus significant at
the 5 percent significance level). As a result, the researcher
rejects a true null hypothesis and thus makes a type I error.
In this example, there was no true difference in the mean ages of
males and females, as they are students in the same class, and the
null hypothesis was true.
Similarly, this type of error can happen in a court of law:
if a judge, while trying a case, sends an innocent person behind
bars, he has committed a type I error. Type I error is considered the more
serious error, which both researcher and judge must avoid,
and for this reason they make every effort not to commit a
type I error (α = 0.05).
2. Type II error: A type II error occurs when the researcher fails to reject
a null hypothesis that is false. The probability of committing
a type II error is called beta, and is often denoted by β. The
probability of not committing a type II error is called the power of
the test. The power is generally kept at 80 percent and is given
by 1 – β. The level of significance and power of a study play a very
crucial role in sample size determination.
Suppose a researcher aims to compare the mean Hb of chronic
kidney disease (CKD) patients with that of the normal population.
The null hypothesis is that there is no difference in the mean Hb of
CKD patients and the normal population. Consider that the mean
hemoglobin of the sample of CKD patients was 7 g/dL, and that
of the normal population was 12 g/dL. If in this study the sample
size is inadequate or because of some inappropriate statistical
[...] the normal population. Obviously, there was a true
difference between the mean hemoglobin of CKD patients and the
normal population and the null hypothesis was false, but because
the sample size was small, the analysis failed to detect a significant
difference. In order to avoid this error, sample size calculation is
carried out at the synopsis stage.
Similarly, this type of error can happen in a court of law: if a
judge, while trying a case, declares a guilty person
innocent and frees him/her, he has committed a type II error. This
can happen where a person who is thought
to be guilty escapes punishment because the court does
not have enough evidence against him. So we can see that in a
court of law having enough evidence is a must to make a decision,
whereas in research the evidence is a large and adequate sample
size. Type II error is less important than type I error, but it should
also be avoided by having an adequate sample size with a
minimum power of 80 percent (Table 6.1).
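The dependence of power on sample size can be made concrete for the simplest case, a two-sided one-sample z-test with a known standard deviation. This is a sketch under stated assumptions: the true difference of 1 g/dL and the SD of 2 g/dL are hypothetical, and the function name is illustrative.

```python
import math
from statistics import NormalDist

def z_test_power(true_difference, sd, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample z-test with known SD."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)        # about 1.96 for alpha = 0.05
    shift = abs(true_difference) * math.sqrt(n) / sd  # difference in standard-error units
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Hypothetical scenario: true difference of 1 g/dL, SD of 2 g/dL.
for n in (10, 20, 32, 50):
    power = z_test_power(true_difference=1, sd=2, n=n)
    print(f"n = {n:>2}: power = {power:.2f}, beta = {1 - power:.2f}")
```

With a small sample, the chance of missing a true difference (beta) is large; in this scenario the conventional 80 percent power is first reached at about 32 participants.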
Table 6.1: α- and β-errors
Truth in the population | Decision: retain the null hypothesis | Decision: reject the null hypothesis
Null hypothesis true | Correct (1 – α) | Type I error (α)
Null hypothesis false | Type II error (β) | Correct (1 – β, power)
Simple Explanation of p-value and
95 Percent Confidence Interval
Hypothesis testing is all about the confidence of the researcher in his results.
Having completed his study he is faced with two questions:
is less than 0.0001. This means that the probability of the researcher
having obtained his results by chance is almost negligible. So we can see
that the researcher relies on the 95 percent confidence interval
and p-value to be sure that his results are valid. The next variable
in this multivariate regression analysis is gender. The risk
of mortality among females is 12 percent higher than among males (RR =
1.12). However, when we look at the p-value and 95 percent confidence
interval for this variable, neither is statistically significant. The
interpretation for the variable gender in this study, considering the
p-value and 95 percent confidence interval, would be that there is no
statistically significant difference in mortality between males and females.
Solving Hypothesis Testing Problems
The six steps for solving hypothesis testing problems are as
follows:
1. State the hypothesis and identify the claim
2. Choose a significance level α
3. Find the critical value(s)
4. Compute the test value
5. Make the decision to reject or not to reject the null hypothesis
6. State the appropriate conclusion.
Example: The population has a mean Hb level (µ) of 12 g/dL, and a
ta
SD of 2. A sample of the population (x) has a mean Hb of 7 g/dL. Is
the Hb level of the sample representative of the population mean?
Solution
Step 1: State the hypothesis and identify the claim:
Null: The mean Hb level of x = µ
Alternate: The mean Hb level of x ≠ µ
Step 2: Choose a significance level α:
α = 0.05
The researcher is willing to accept a less than 5 percent chance of
committing a type I error (rejecting a true null hypothesis by chance).
Interpretation
The critical z-scores for the two standard deviation cut-off point
(two-tailed, α = 0.05) are –1.96 and +1.96. Any z-score less than
–1.96 or greater than +1.96 will fall in the region of rejection. In
the above study, the z-score of –2.5 is smaller than –1.96, so the CKD
sample mean falls within the region of rejection (Fig. 6.4) of the
population mean. Hence, we reject the null hypothesis.
Figure 6.3 Critical regions (the two tails) for rejecting the null
hypothesis (α/2 = 0.025 in each tail)
Figure 6.4 Region of rejection and region of acceptance
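The decision rule above can be checked numerically. The following is a minimal sketch, not the authors' own calculation: it assumes the test value is computed as z = (sample mean – µ)/SD, which reproduces the –2.5 quoted in the text, and it takes the critical value from Python's standard-library `statistics.NormalDist`. The function name `z_test_two_tailed` is hypothetical.

```python
from statistics import NormalDist

def z_test_two_tailed(sample_mean, pop_mean, pop_sd, alpha=0.05):
    """Return (z, critical value, reject?) for a two-tailed z-test."""
    # Assumption: test value = (sample mean - population mean) / SD,
    # which matches the z-score of -2.5 quoted in the example.
    z = (sample_mean - pop_mean) / pop_sd
    # Upper critical value: alpha is split equally between the two tails.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, z_crit, abs(z) > z_crit

z, z_crit, reject = z_test_two_tailed(sample_mean=7, pop_mean=12, pop_sd=2)
print(f"z = {z:.2f}, critical value = ±{z_crit:.2f}, reject null: {reject}")
```

With the figures from the example, the test value lies beyond the lower critical value, which agrees with the decision to reject the null hypothesis.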
Level of Significance (α Level)
The level of significance is the maximum probability of committing
a type I error. Statisticians generally agree on using three arbitrary
significance levels, i.e. 0.10, 0.05 and 0.01. If the significance level
is 0.01, there is a 1 percent probability of committing a mistake
(accepting results that are not true). If the significance level is
0.10, there is a 10 percent probability of committing a mistake, while
a significance level of 0.05 means there is a 5 percent probability of
committing a mistake. If the significance level is set at 1 percent, an
extremely large sample size is required, which may be difficult to
achieve, while if the significance level is set at 10 percent, the
sample size required will be small, but the validity of the results
will become questionable. An increase in sample size makes the findings
more valid, while a decrease in sample size invariably affects the
validity of the results. Thus, it is the investigator's choice and
decision to set the significance level at an appropriate level so that
the findings are valid, at an affordable sample size.
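To make the three conventional levels concrete, the short sketch below lists the two-tailed critical z-values that correspond to them. It is an illustration only: it assumes a two-tailed z-test and uses the standard-library `statistics.NormalDist`; the variable name `critical_values` is hypothetical.

```python
from statistics import NormalDist

# Two-tailed critical z-values for the three conventional significance levels.
critical_values = {
    alpha: NormalDist().inv_cdf(1 - alpha / 2)
    for alpha in (0.10, 0.05, 0.01)
}

for alpha, z_crit in critical_values.items():
    print(f"alpha = {alpha:.2f}: reject the null if |z| > {z_crit:.3f}")
```

A stricter (smaller) significance level pushes the critical value further from zero, so stronger evidence is needed before the null hypothesis is rejected.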
CHAPTER
7
Measures of Disease Frequency
For epidemiological purposes, the occurrence of cases of disease
must be related to the “population at risk” giving rise to the cases.
Several measures of disease frequency are in common use. There are
three general classes of mathematical parameters used to relate the
number of cases of a disease or outcome to the size of the source
population.
RATIO, PROPORTION AND RATE
Ratio
A ratio is obtained by simply dividing one quantity by another without
implying any specific relationship between the numerator and
denominator, such as the gender ratio, i.e. females : males. In a
ratio, the numerator and denominator are mutually exclusive.
For example, the female to male ratio of postgraduate trainees in
Abbasi Shaheed Hospital is:
\[ \text{Gender ratio} = \frac{\text{Number of female trainees in Abbasi Shaheed Hospital}}{\text{Number of male trainees in Abbasi Shaheed Hospital}} = \frac{150}{50} \]
Female : Male = 3:1
Proportion
It is a type of ratio in which those who are included in the numerator
must also be included in the denominator.
Total number of postgraduate
trainees who appeared FCPS part 2
= 150/1500
= 1/10
Rate
A rate is a proportion with specification of time. There is a distinct
relationship between the numerator and denominator with a
measure of time being an intrinsic part of the denominator.
For example, the number of newly diagnosed cases of cervical
cancer per 100,000 women during a given year.
Important Point
It is necessary to be very specific about what constitutes both the
numerator and the denominator. In some circumstances, it is
important to make clear distinction whether the measure represents
the number of events or the number of individuals.
For example, the frequency of myopia among a population of
school children could represent the number of affected eyes in
relation to total eyes (measure represents the event), or the number
of children affected in one or both eyes relative to all students
(measure represents the number of individuals).
Prevalence
Prevalence quantifies the proportion of individuals in a population
who have the disease at a specific instant and provides an estimate
of the probability that an individual will be ill at a point in time.
Prevalence is a proportion, so it has no unit.
Prevalence “P” can be calculated as
\[ \text{Prevalence} = \frac{\text{Number of existing cases (both old and new) of a disease}}{\text{Total population at a given point in time}} \]
Point Prevalence
Point prevalence measures the frequency of the disease of interest in a
defined population at a single point in time.
\[ \text{Point prevalence} = \frac{\text{Number of cases (diseased) in a defined population at one point in time}}{\text{Number of persons in a defined population at the same point in time}} \]
For example: Of 25,000 male residents in Steel Town on 1st March,
2013, 2,500 have diabetes. The prevalence of diabetes among men in
Steel Town on 1st March, 2013 is calculated as:
Prevalence (P) = 2,500/25,000
= 1/10 or 0.1
Prevalence can also be expressed as a percentage (cases per 100)
by multiplying P by 100. Thus, the prevalence (%) of diabetes among
men in Steel Town on 1st March, 2013 is:
= 0.1 × 100
= 10%
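The arithmetic of the Steel Town example can be reproduced in a few lines. This is an illustrative sketch only; the function name `prevalence` is hypothetical and the figures are those given in the example.

```python
def prevalence(cases, population):
    """Proportion of a defined population that has the disease."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population

# Steel Town, 1st March 2013: 2,500 diabetic men among 25,000 male residents.
p = prevalence(cases=2500, population=25000)
print(f"Prevalence = {p:.2f} ({p * 100:.0f}%)")
```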
Period Prevalence
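Period prevalence is the total number of cases (diseased) at any
point during a specified period of time divided by the population at
risk midway through the period.
\[ \text{Period prevalence} = \frac{\text{Number of cases (diseased) at any point during a specified time period}}{\text{Population at risk midway through the time period}} \]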
For example: A study was conducted in Gulberg Town, Karachi from
January 1st 2011 to December 31st 2012 to determine the period
prevalence of hypertension in women older than 45 years during
that time period. The period prevalence based on the data below is
as follows:
• Number of hypertensive women older than 45 years residing in
Gulberg Town as on 1st January 2011 = 2,500
• Number of women older than 45 years residing in Gulberg Town who
developed hypertension from 1st Jan 2011 to 31st Dec 2012 = 500
• Population at risk midway (as on 31st Dec, 2011) = 60,000
\[ \text{Period prevalence} = \frac{2500\ (\text{old cases}) + 500\ (\text{new cases})}{60{,}000\ (\text{midway population})} = \frac{3000}{60{,}000} = \frac{1}{20} = 0.05 \]
Period prevalence when expressed in percentage would be:
= 0.05 × 100
= 5%
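The Gulberg Town calculation can be sketched in the same way. The function name `period_prevalence` is hypothetical; the inputs are the three figures listed in the example.

```python
def period_prevalence(old_cases, new_cases, midperiod_population):
    """Cases present at any time in the period / population at risk at mid-period."""
    if midperiod_population <= 0:
        raise ValueError("midperiod_population must be positive")
    return (old_cases + new_cases) / midperiod_population

# Hypertension in women older than 45 years, Gulberg Town, 2011 to 2012.
pp = period_prevalence(old_cases=2500, new_cases=500, midperiod_population=60000)
print(f"Period prevalence = {pp:.2f} ({pp * 100:.0f}%)")
```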
Factors Influencing Prevalence
Prevalence is a useful measure for quantifying the burden of disease
in a population at a given point in time, and is thus beneficial in
planning health services. However, as it is influenced by a number of
factors (Table 7.1), it is not a useful measure for establishing the
determinants of disease (causality) in a population.
Incidence
Incidence quantifies the number of new events or cases of disease
that develop in a population of individuals at risk during a specified
time interval.
\[ \text{Incidence} = \frac{\text{Number of new cases of a disease}}{\text{Total population at risk}} \]
without cure
Increase in new cases (increase in incidence)       Decrease in new cases (decrease in incidence)
In-migration of cases                               In-migration of healthy people
Out-migration of healthy people                     Out-migration of cases
In-migration of susceptible people                  Increased cure rate of cases
Improved diagnostic facilities (better reporting)   Worsening diagnostic facilities (poor reporting)
Issues in the Calculation of Measures of Incidence
For any measure of disease frequency, precise definition of the
denominator is essential for both accuracy and clarity. This is of
particular concern in the calculation of incidence. The denominator
of a measure of incidence should include only those who are
considered “at risk” of developing the disease, that is, the total
population from which the new cases could arise.
Consequently, those who currently have or have already had the
disease under study, or persons who cannot develop the disease for
reasons such as age, immunization, or prior removal of the involved
organ, should be excluded from the denominator.
SPECIAL TYPES OF INCIDENCE RATES
• Cumulative incidence rate or incidence risk
• Incidence density rate.
Morbidity Rate
It is the incidence rate of nonfatal cases in the total population at risk
during a specified period of time. For example, the morbidity rate
of tuberculosis (TB) in the US in 1982 can be calculated by dividing
the number of nonfatal cases newly reported during that year by the
total US mid-year population.
Mortality Rate
It expresses the incidence of deaths in a particular population during
a period of time. It is calculated by dividing the number of fatalities
during that period by the total population. This can be further
divided into cause specific mortality rate, age specific mortality rate
or sex specific mortality rate, etc.
CHAPTER
8
Measures of Association
Person                      1    2    3    4    5    6    7    8    9    10
Machine owned (in months)   5    10   4    8    2    7    9    6    1    12
Hours exercised             5    2    8    3    8    5    5    7    10   3
If you display these data pairs as points in a scatter plot (Fig. 8.1),
you can see a definite trend. The points appear to form a line
that slants from the upper left to the lower right. As you move along
that line from left to right, the values on the vertical axis (hours of
exercise) decrease, while the values on the horizontal axis (months
owned) increase. Another way to express this is to say that the two
variables are inversely related: the more months the machine was
owned, the less the person tends to exercise. Thus, there seems to be
Person                                    1    2    3    4    5    6    7    8    9    10
Machine owned (in months)                 5    10   4    8    2    7    9    6    1    12
Cardiovascular fitness (score 1 to 12)    4    9    5    7    3    7    8    5    2    11
Figure 8.2 Scatter plot of two continuous variables (Months exercise machine
owned and cardiovascular fitness score) showing positive correlation
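The direction and strength of the two relationships tabulated above can be quantified with a correlation coefficient. The sketch below is illustrative: it assumes Pearson's product-moment coefficient is the measure intended, and the helper name `pearson_r` is hypothetical. The data are copied from the two tables.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

months_owned = [5, 10, 4, 8, 2, 7, 9, 6, 1, 12]
hours_exercised = [5, 2, 8, 3, 8, 5, 5, 7, 10, 3]
fitness_score = [4, 9, 5, 7, 3, 7, 8, 5, 2, 11]

r_exercise = pearson_r(months_owned, hours_exercised)  # negative (inverse) relation
r_fitness = pearson_r(months_owned, fitness_score)     # positive relation
print(f"months vs hours exercised: r = {r_exercise:.2f}")
print(f"months vs fitness score:   r = {r_fitness:.2f}")
```

A coefficient near –1 corresponds to points slanting from upper left to lower right, and a coefficient near +1 to points slanting from lower left to upper right.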
Person                      1    2    3    4    5    6    7    8    9    10
Machine owned (in months)   5    10   4    8    2    7    9    6    1    12
Height (meters)             2    1.3  1.8  1.5  1.9  1.3  1.9  1.4  1.8  1.5
the closeness of the data points to the perfect line. Figure 8.5 shows
a stronger correlation than Figure 8.4.
where
Y’ is the predicted value of the dependent variable Y
a is the intercept (in this case it is 19.932)
b is the slope or the gradient of the regression line (in this case, it
is –0.061)
X is the independent or explanatory variable
Thus, the equation of the regression line will be
Y’ = 19.932 + (–0.061)X
Y’ = 19.932–0.061X
Task: Predict the Beck depression score for maintenance dialysis
patients who have been on dialysis for 16 months and for 17 months.
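The task can be worked by substituting X into the regression equation. The sketch below uses the intercept and slope quoted above and assumes, as the task implies, that X is the number of months on dialysis; the names `INTERCEPT`, `SLOPE` and `predict_depression_score` are hypothetical.

```python
INTERCEPT = 19.932  # a, as quoted in the text
SLOPE = -0.061      # b, as quoted in the text

def predict_depression_score(months_on_dialysis):
    """Predicted Beck depression score Y' = a + bX."""
    return INTERCEPT + SLOPE * months_on_dialysis

for months in (16, 17):
    score = predict_depression_score(months)
    print(f"{months} months on dialysis: Y' = {score:.3f}")
```

Each additional month changes the predicted score by the slope, here a decrease of 0.061 points.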
Relative risk
The relative risk (or risk ratio) is defined as the ratio of the incidence of
disease in the exposed group divided by the corresponding incidence
of disease in the nonexposed group. Relative risk can be calculated in
cohort studies such as the Framingham Heart Study where subjects
\[ \text{Relative risk (RR)} = \frac{a/(a+b)}{c/(c+d)} \]
                     Disease status
Risk factor      CHD present    CHD absent    Total
Nonsmoker        88 (c)         224 (d)       312 (c+d)
Odds Ratio
The odds ratio is defined as the odds of exposure in the group
with disease divided by the odds of exposure in the control group.
As subjects are selected on the basis of disease status in
case-control studies, it is not possible to calculate the rate of
development of disease (the incidence).
Exposed       a    b    a+b
Nonexposed    c    d    c+d
\[ \text{Odds ratio} = \frac{a/c}{b/d} = \frac{ad}{bc} \]
                                          Breast cancer
Exposure                               Yes        No         Total
Exposed (oral contraceptive users)     140 (a)    370 (b)    510
Nonexposed                             40 (c)     234 (d)    274
\[ \text{Odds ratio} = \frac{a/c}{b/d} \]
Odds of exposure in cases = a/c = 140/40 = 3.5
Odds of exposure in controls = b/d = 370/234 = 1.6
OR = 3.5/1.6 = 2.2
Interpretation of OR
Compared to the controls (those who did not have Ca breast), the
odds of being an oral contraceptive user were 2.2 times greater in
those who had Ca breast (cases).
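The odds ratio for this table can also be computed directly from the cross-product ad/bc. The sketch below is illustrative and the function name `odds_ratio` is hypothetical. Note that it does not round the intermediate odds, so it gives about 2.21 rather than exactly 2.2.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table: (a/c) / (b/d), i.e. ad/bc."""
    if b == 0 or c == 0:
        raise ValueError("b and c must be non-zero")
    return (a * d) / (b * c)

# a = exposed cases, b = exposed controls,
# c = nonexposed cases, d = nonexposed controls
or_oc = odds_ratio(a=140, b=370, c=40, d=234)
print(f"Odds of exposure in cases    = {140 / 40:.2f}")
print(f"Odds of exposure in controls = {370 / 234:.2f}")
print(f"Odds ratio                   = {or_oc:.2f}")
```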
CHAPTER
9
Factors Affecting
Study Outcomes
INTRODUCTION
Results of an epidemiological study may reflect the true effect of an
exposure(s) on the development of the outcome under investigation,
but it must always be considered that the results may in fact be due
to an alternative explanation. Such alternative explanations may be on
account of the effects of chance (random error), bias or confounding,
which may produce spurious results, leading the researcher to
believe in the existence of a valid statistical association when one
does not exist or, alternatively, in the absence of an association
when one is truly present.
Observational studies are more susceptible to the effects of chance,
bias and confounding, so appropriate steps must be taken at both
the design and analysis stages so that their effects are minimized.
BIAS
Any systematic error that results in an incorrect estimate of the
association between an exposure and the disease/outcome is
called a bias. It is usually introduced by the researcher due to
nonstandardized measuring techniques.
Types of Bias
More than 50 types of bias are identified in epidemiological studies,
but for simplicity, they are broadly grouped into two categories:
1. Selection bias
2. Information bias
Selection Bias
It occurs when the inclusion of subjects in a study depends in some
way on the outcome of interest. It occurs mainly in case control and
retrospective cohort studies and not in prospective cohort study as
outcome of interest has not yet occurred. Selection bias can occur
due to improper means or source of selection of study subjects.
A classical example of selection bias is a study conducted to assess
the association between oral contraceptives (OC) and
thromboembolism. There was a concern in this study that, as physicians
were already aware of the possible relationship between OC and
thromboembolism, women who used OCs were more likely to be
hospitalized for evaluation of thromboembolism. So any increased
frequency of thromboembolism in oral contraceptive users could be in
part due to the fact that hospitalization and the determination of
the diagnosis were both influenced by a history of OC use.
Another means of selection bias could be due to inappropriate
source of selection, e.g. cases selected from hospitals and controls
from household surveys. In this case it is possible that a number of
demographic and lifestyle variables could be different amongst the
cases and controls leading to noncomparability between groups and
incorrect results with respect to association between exposures and
outcome.
In a clinical trial a selection bias can occur if there is no
randomization. Suppose that the principal investigator is taking
a decision as to which patients are going to be included in the
standard drug group and which patient is going to be included in
the new drug group. If the principal investigator is allowed to do so,
he might include all the healthy patients in the new drug group and
all patients who are sick (and have multiple comorbid conditions) in
the standard drug group. Thus, he can show better outcomes among
the new drug group (who are healthy patients) compared to the
standard treatment group (who are sicker) and present results which
are not true. The process of randomization ensures that selection bias
cannot take place, by ensuring that the principal investigator and
his team members are not even close to where the randomization
process is taking place.
CONTROL OF BIAS
Control of bias is mostly done at the design phase of the study.
Following are some means to ensure the same.
CONFOUNDING
The concept of confounding is a central one in the interpretation
of any epidemiological study. It can be thought of as a mixing of the
effect of the exposure under study on the outcome with that of an
extraneous factor, the “confounder”. To be deemed a confounder, this
external factor or variable must be associated with the exposure and,
independently of the exposure, must be a risk factor for the disease.
Confounding can lead to an over- or underestimation of the true
association between exposure and outcome.
Example 1: In a study conducted to determine the association
between smoking and myocardial infarction (MI), age can be a
confounder as it is associated with both exposure and outcome
independently.
Yes 29 135
No 205 1607 = 1.68
Total 234 1742
EFFECT MODIFIERS
Effect modifiers are variables that bring about a change in the
magnitude of an effect. Unlike confounder, effect modifier does not
CHAPTER
10
Sample Size Estimation
SAMPLE SIZE
The sample size calculation depends on:
• Type of study
• Magnitude of the outcome of interest derived from previous
studies
• Type of statistical analysis required (comparing means or
proportions)
• Level of significance/power.
Figure 10.1 Sample size calculation and formula for single proportion
When the above values are entered into the WHO sample size
calculator, the estimated sample size is calculated (Fig. 10.1).
The estimated sample size is 369. Thus, at least 369 participants
must be recruited in the study to determine the prevalence of CKD
at a confidence level of 95 percent, with a precision of 5 percent.
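For readers without access to the calculator, the figure can be approximated by hand. The sketch below is an illustration under stated assumptions, not the calculator's documented method: it uses the widely taught single-proportion formula n = z²p(1 – p)/d², and it assumes an anticipated prevalence of 0.40, a value that is not stated in the surviving text and was chosen only because it reproduces 369. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_single_proportion(p, precision, confidence=0.95):
    """n = z^2 * p * (1 - p) / d^2, rounded up to a whole participant."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / precision ** 2)

# Assumption: anticipated prevalence of 0.40 (not given in the text).
n_proportion = sample_size_single_proportion(p=0.40, precision=0.05)
print(f"Required sample size: {n_proportion}")
```

Because p(1 – p) is largest at p = 0.5, using 0.5 when no earlier estimate exists gives the most conservative (largest) sample size.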
8.2 g/dL and standard deviation of 4.2 g/dL. How many pregnant
women must be studied if he wants the estimate to fall within
1 g/dL of the true mean with 95 percent confidence?
Values needed to be entered into the WHO Sample Size Calculator:
Confidence interval: 95 percent
Population mean (Average hemoglobin of pregnant women identified
from previous study): 8.2 g/dL
Population standard deviation: 4.2 g/dL
Absolute precision required: 1 g/dL
When the above values are entered into WHO sample size
calculator, the estimated sample size will be calculated (Fig. 10.2).
where ε = d/µ
ε = Relative precision
d = Absolute precision
µ = Population mean
The estimated sample size is 68. Thus, at least 68 participants
must be recruited in the study to estimate the mean hemoglobin
level among pregnant women at a confidence level of 95 percent,
with a precision of 1 g/dL.
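The result can be cross-checked by hand. The sketch below assumes the standard formula for estimating a single mean with absolute precision d, n = z²σ²/d²; whether the WHO calculator implements exactly this expression is an assumption, although it reproduces the 68 quoted above. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_single_mean(sd, absolute_precision, confidence=0.95):
    """n = z^2 * sd^2 / d^2, rounded up to a whole participant."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sd / absolute_precision) ** 2)

# SD of 4.2 g/dL from the previous study, absolute precision of 1 g/dL.
n_mean = sample_size_single_mean(sd=4.2, absolute_precision=1.0)
print(f"Required sample size: {n_mean}")
```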
Figure 10.2 Sample size calculation and formula for single group mean
Figure 10.3 Sample size calculation and formula for two proportions
To achieve a precision of 0.05 for sensitivity, a total sample size of
347 is needed. This is preferable, as it will give a precision of 0.05
or better for both sensitivity and specificity. With this sample size,
the precision for specificity will be 0.027.
CHAPTER
11
Screening
Validity (Accuracy)
The term validity refers to the extent to which the test accurately
measures what it intends to measure. In other words, validity
expresses the
\[ \text{Sensitivity} = \frac{\text{Number of true positives}}{\text{Total with disease}} = \frac{a}{a+c} \]
When expressed in percent:
\[ \text{Sensitivity} = \frac{a}{a+c} \times 100 \]
Specificity is the ability of the test or procedure to identify correctly
all those who do not have the disease, that is, the “true negatives” in
the screened population. Thirty percent specificity means that
30 percent of the nondiseased persons will give a true-negative result,
while 70 percent of the nondiseased persons screened by the test
will be incorrectly classified as “diseased” when they are not. Thus,
specificity is expressed as the proportion of those without disease
correctly identified by a negative screening test result.
\[ \text{Specificity} = \frac{\text{Number of true negatives}}{\text{Total without disease}} = \frac{d}{b+d} \]
When expressed in percent:
\[ \text{Specificity} = \frac{d}{b+d} \times 100 \]
PREDICTIVE VALUES
Predictive value reflects the diagnostic power of the test. The
predictive accuracy depends upon sensitivity, specificity and disease
prevalence.
Positive predictive value describes the probability of having
the disease given a positive screening test result in the screened
population. Thus, it is expressed as the proportion of those with
disease among all screening test positives. The positive predictive
value of mammography, for example, will tell a woman how likely it is
that she has breast cancer after a positive mammogram.
                             HIV
ELISA test    Infected            Non-infected           Total
Positive      60 (a = TP)         50 (b = FP)            a + b = 110 (total test positive)
Negative      20 (c = FN)         70 (d = TN)            c + d = 90 (total test negative)
Total         80 (a + c)          120 (b + d)            a + b + c + d = 200
              (total infected)    (total not infected)   (total screened)
\[ \text{Sensitivity} = \frac{a}{a+c} \times 100 = \frac{60 \times 100}{80} = 75\% \]
i.e. the new test ELISA is 75 percent sensitive in correctly
identifying HIV infection.
\[ \text{Specificity} = \frac{d}{d+b} \times 100 = \frac{70 \times 100}{120} = 58\% \]
i.e. the new test ELISA is 58 percent specific in detecting persons
who are not infected with HIV.
\[ \text{PPV} = \frac{a}{a+b} \times 100 = \frac{60 \times 100}{110} = 55\% \]
i.e. based on ELISA, the new screening technique for HIV, 55 percent
of persons who test positive are actually suffering from HIV.
Negative Predictive Value (NPV)
\[ \text{NPV} = \frac{d}{c+d} \times 100 = \frac{70 \times 100}{90} = 78\% \]
i.e. based on ELISA, the new screening technique for HIV, 78 percent
of persons who test negative are actually free from HIV.
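The four measures for the ELISA example can be derived together from the cells of the 2 × 2 table. This is an illustrative sketch; the function name `screening_measures` is hypothetical, and the cell labels follow the convention used in the formulas above (a = true positives, b = false positives, c = false negatives, d = true negatives).

```python
def screening_measures(a, b, c, d):
    """Return sensitivity, specificity, PPV and NPV (all in percent)."""
    return {
        "sensitivity": 100 * a / (a + c),  # true positives / all diseased
        "specificity": 100 * d / (b + d),  # true negatives / all nondiseased
        "ppv": 100 * a / (a + b),          # true positives / all test positives
        "npv": 100 * d / (c + d),          # true negatives / all test negatives
    }

elisa = screening_measures(a=60, b=50, c=20, d=70)
for name, value in elisa.items():
    print(f"{name}: {value:.1f}%")
```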
Relationship between Sensitivity, Specificity, PPV and NPV
Sensitivity and NPV
Sensitivity and negative predictive value are positively correlated
(an increase in one will increase the other). If the test is more
sensitive, it is less likely that an individual with a negative result
will have the disease, so the negative predictive value is greater.
\[ \text{PPV} = \frac{99}{594} \times 100 = 17\%; \quad \text{NPV} = \frac{9405}{9406} \times 100 = 99.99\% \]
However, with the same sensitivity, specificity and population
size, what will be the effect on the test's positive predictive value
(PPV) if the prevalence changes? See Example 1b.
Example 1b: In a population of 10,000 with a disease prevalence of
5%; Sensitivity = 99%; Specificity = 95% with test A:
Disease prevalence 5%   Disease positive   Disease negative   Total
Test (Positive)         495                475                970
Test (Negative)         5                  9025               9030
Total                   500                9500               10,000
\[ \text{PPV} = \frac{495}{970} \times 100 = 51.03\%; \quad \text{NPV} = \frac{9025}{9030} \times 100 = 99.94\% \]
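The effect of prevalence on predictive values can be explored without building the table by hand. The sketch below is illustrative and the function name `predictive_values` is hypothetical. The 1 percent prevalence used for comparison is inferred from the counts 99/594 and 9405/9406 quoted earlier, not stated explicitly in the surviving text.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) in percent for a given prevalence (all inputs as fractions)."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = 100 * true_pos / (true_pos + false_pos)
    npv = 100 * true_neg / (true_neg + false_neg)
    return ppv, npv

# Same test (sensitivity 99%, specificity 95%) at two prevalence levels.
ppv_1, npv_1 = predictive_values(0.01, 0.99, 0.95)  # assumed 1% prevalence
ppv_5, npv_5 = predictive_values(0.05, 0.99, 0.95)  # Example 1b
print(f"Prevalence 1%: PPV = {ppv_1:.2f}%, NPV = {npv_1:.2f}%")
print(f"Prevalence 5%: PPV = {ppv_5:.2f}%, NPV = {npv_5:.2f}%")
```

With sensitivity and specificity held fixed, the PPV rises sharply as prevalence increases while the NPV falls only slightly.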
CHAPTER
12
Basic Statistical Tests
UNPAIRED SAMPLES
In unpaired samples, there is no relation between subjects in group
1 and subjects in group 2 (two independent groups). Suppose data
are collected on ICT skills comparing medical versus engineering
students. These are two independent groups. Whenever you are
comparing the mean of a continuous variable in two independent groups
(e.g. medical students and engineering students), an independent
sample t-test is applied.
PAIRED SAMPLES
In paired samples, repeated measures (pre-post test) are taken on the
same subject. For example, if you wanted to determine how much a
student learned in a statistics class, you would do a pre (before) and
post (after) test to determine the impact of intervention (statistical
class) on the score.
Whenever a categorical variable (qualitative data) is compared
between two groups, a Chi-square test is used.
Comparing a continuous variable (quantitative data) between
two independent groups is called comparing two means; an independent
sample t-test is applied for this purpose.
When comparing a continuous variable (quantitative data)
between two paired groups (pre-post), a paired t-test is applied.
Flow charts 12.1 and 12.2 give different choices of tests for
qualitative and quantitative data.
Nonparametric Tests
When assumptions of the parametric tests are not satisfied, i.e. the
data are not normally distributed or the data are collected on fewer
than 30 participants, a nonparametric test is applied (Flow chart
12.3). Nonparametric tests are an alternative to parametric tests.
Chi-square is the most frequently used nonparametric test. Other
nonparametric tests are:
The Wilcoxon rank sum test, or Mann-Whitney U test, is the
nonparametric version of the independent sample t-test and can
be used when assumptions of the parametric tests are not satisfied.
Thus, the Mann-Whitney U test is used to compare the medians of two
independent samples when the data are either:
• On an interval scale; or
• On a ranked (ordered) scale.
The test is used to test the hypothesis that two population
distributions do not differ in median (e.g. a null hypothesis comparing
median bicep skinfold thickness of patients with celiac disease and
Crohn’s disease would say that the two medians are equal).
The test is based on the rank of the absolute difference, rather than
the numerical value of the difference (Table 12.1).
The Kruskal-Wallis test is the nonparametric version of ANOVA and is
used when the assumptions of the parametric tests are not satisfied.
WHAT ARE VALIDITY AND RELIABILITY IN RESEARCH FINDINGS?
Validity and reliability are illustrated in Figures 12.1A to D.
Validity means that your scientific observations actually measure
what they intend to measure (your conclusions are true).
Reliability means that someone else using the same method in
the same circumstances should be able to obtain the same findings
(your findings are repeatable).
Reliability (repeatability) refers to the possibility of replicating
(repeating) the observations and is related to the precision of the
instrument used for scientific observations. Validity refers to the
soundness of the observations and to the accuracy of the data
collected by the research method/instrument.
Table 12.1: Wilcoxon signed rank test
Participant ID   Placebo   Drug   Difference (Placebo – Drug)
1                2         1      1
2                5         2      3
3                8         3      5
4                6         4      2
5                9         3      6
6                13        16     –3
7                19        8      11
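The ranking step behind Table 12.1 can be made explicit. The sketch below is illustrative only: it assumes the usual signed rank procedure (rank the absolute differences, give tied values the average of their ranks, then sum the ranks of positive and negative differences separately), and it stops at the rank sums without computing a p-value. All names are hypothetical.

```python
placebo = [2, 5, 8, 6, 9, 13, 19]
drug = [1, 2, 3, 4, 3, 16, 8]

# Paired differences (placebo - drug); zero differences would be dropped.
diffs = [p - d for p, d in zip(placebo, drug)]
nonzero = [d for d in diffs if d != 0]
abs_sorted = sorted(abs(d) for d in nonzero)

def average_rank(value):
    """1-based rank of an absolute difference; ties share the mean rank."""
    positions = [i + 1 for i, v in enumerate(abs_sorted) if v == value]
    return sum(positions) / len(positions)

w_plus = sum(average_rank(abs(d)) for d in nonzero if d > 0)
w_minus = sum(average_rank(abs(d)) for d in nonzero if d < 0)
print(f"Differences: {diffs}")
print(f"Sum of positive ranks = {w_plus}, sum of negative ranks = {w_minus}")
```

The two rank sums always add up to n(n + 1)/2, here 28 for seven non-zero differences, which is a quick check on the ranking.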
Figures 12.1A to D (A) Neither valid nor reliable. The research method does
not measure the research outcome (not valid) and repeated attempts are
unfocused; (B) Reliable but not valid. The research method does not measure
the research outcome (not valid), but repeated attempts get almost the same
(wrong) results; (C) Fairly valid but not very reliable. The research method
measures the research outcomes fairly closely, but repeated attempts have
very scattered results (not reliable); (D) Valid and reliable. The research
method precisely measures the research outcomes, and repeated attempts
produce similar results
CHAPTER
13
Overview of Data
Collection Techniques
Data collection techniques allow us to systematically collect
information about our objects of study (people, objects, phenomena)
and about the settings in which they occur.
DIFFERENT DATA COLLECTION TECHNIQUES
• Using available information
• Observing
• Interviewing (face-to-face)
• Administering written questionnaire
• Focus group discussion
• Projective techniques
• Mapping and scaling.
Using Available Information
Usually, there is a large amount of information/data that has
been collected by some other source but has not been analyzed and
published. For example, information collected from a Primary Health
Care Center can be analyzed for the proportion of different
diseases and the age groups affected by those diseases in an area. The
advantage of using existing information is that it is a very
inexpensive method; however, the data may be incomplete or too
disorganized.
Observing
It is a technique that involves systematically selecting, watching
and recording behavior and characteristics of living beings, objects
or phenomena. Observations can be open (e.g. observing a health
worker during his/her routine activities) or concealed (e.g. mystery
Interviewing
Interviewing involves oral questioning of respondents. Answers to the
questions posed during an interview are either written down or
recorded by a tape recorder, or both techniques could be used.
An unstructured method of asking questions may be used. This method
is frequently used in exploratory studies where the investigator has,
as yet, little understanding of the problem, or if the topic is sensitive.
Questionnaire
A written questionnaire, also known as a self-administered
questionnaire, is a data collection tool in which questions are
presented that are to be answered by the respondents themselves in
written form. A questionnaire comprises a formal, written set of
closed-ended/open-ended questions that are asked of every respondent
in the study. It provides an objective means of collecting information
(data) related to the exposure/outcome of interest as well as on
confounders or effect modifiers.
Types of Questions
• Open-ended questions are those questions that solicit additional
information from the respondent. They are also called infinite
response or unsaturated type questions. By definition, they are
broad and require more than one- or two-word responses. These
types of questions are useful in qualitative research.
• Closed-ended questions are those questions that can be answered
from a finite set of responses, in the simplest case “yes” or “no”.
They are also called dichotomous (when only two options are
offered) or saturated type questions. Closed-ended questions are
used most often in quantitative research.
• Interviewer.
the information you would need in your analysis. Therefore,
before you compose any question, think through your research
questions/objectives and also think how you will conduct your
analysis.
• The format of the questionnaire should be attractive and easy for
respondents to fill in; overcrowding or cluttering of questions
should be avoided. All pages and questions should be clearly numbered.
• The questionnaire should never be too long. In general, questions
should be short and to the point (around 12 words or less).
• Only information relevant to the objective should be solicited; the
proforma/questionnaire should not resemble a history sheet.
• Be careful about responses of ‘neutral’ or ‘no opinion’ versus ‘do
not know’.
• Questions concerning major areas should be grouped together.
• Simple questions about age, birth date, etc. should be put at the
beginning to warm up the respondent.
• Each question should ask for only one piece of information.
• Question wording should ensure that every respondent is
answering the same thing, so avoid ambiguous wording or
wording that means different things to different respondents. Also
avoid terms for which the definition can vary (if it is unavoidable,
provide the respondent with a definition).
• Questions should preferably be closed-ended. Possible answers
to closed-ended questions should be listed vertically, preceded by
boxes, brackets or numbers.
Example: How many different medicines do you take daily (check
one)?
– [ ] None
– [ ] 1–2
– [ ] 3–4
– [ ] 5–6
– [ ] 7 or more
• If more details are required pertaining to a question, then the
filter/skip technique should be used to save time and allow
respondents to avoid irrelevant questions.
Example: Have you ever been told that you have hypertension?
1. Yes
2. No
If yes, proceed to the next question:
How long ago were you told that you have hypertension?
• Always choose an appropriate means of measurement e.g. score/
scales.
Example: Two words that are often used inappropriately are
frequently and regularly. A poorly designed question might read,
“I frequently engage in exercise,” and offer a Likert scale giving
responses from “strongly agree” through to “strongly disagree.”
But “frequently” implies frequency, so a frequency based rating
scale (with options such as at least once a day, twice a week, and
so on) would be more appropriate.
• Sensitive questions should be left for the end.
• Using a previously validated and published questionnaire will
save your time and resources, so if similar research instruments
are available it may be a good idea to review and borrow questions.
• If questions are to be asked in any language besides English,
ensure that they are also written in that language.
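The filter/skip technique described in the guidelines above can be sketched in code. The following is only an illustrative sketch, assuming a small fragment of three questions: the question identifiers (Q1 to Q3), the routing table and the `route` function are hypothetical names introduced for this example and are not part of any questionnaire software.

```python
# Hypothetical three-question fragment using the filter/skip technique.
# Q1 is the filter question; Q2 is asked only when the answer to Q1 is "Yes".
QUESTIONS = {
    "Q1": {
        "text": "Have you ever been told that you have hypertension?",
        "options": {1: "Yes", 2: "No"},
        "next": {1: "Q2", 2: "Q3"},  # answering "No" skips Q2
    },
    "Q2": {
        "text": "How long ago were you told that you have hypertension?",
        "next_default": "Q3",
    },
    "Q3": {
        "text": "How many different medicines do you take daily?",
        "next_default": None,  # end of this fragment
    },
}


def route(answers):
    """Return the ordered list of question ids a respondent is asked."""
    path = []
    current = "Q1"
    while current is not None:
        path.append(current)
        question = QUESTIONS[current]
        answer = answers.get(current)
        # Use answer-specific routing if defined, otherwise the default.
        current = question.get("next", {}).get(answer, question.get("next_default"))
    return path


path_if_yes = route({"Q1": 1})
path_if_no = route({"Q1": 2})
```

A respondent who answers "No" to the filter question is routed past the follow-up question and so avoids an irrelevant item.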
Projective Techniques
When a researcher uses projective techniques, he or she asks an
informant to react to some kind of visual or verbal stimulus.
For example, a hypothetical question, an incomplete sentence or a
case study (a story with a gap) is presented to an informant. The
researcher then asks the informant to complete the sentence in
writing such as;
Mapping and Scaling
It is a valuable technique to display relationships and resources. In
a water supply project, for example, mapping is invaluable. It can be
used to present the placement of wells, distance of the homes from
the wells, other water systems, etc. It gives researchers a good overview
of the physical situation and may help to highlight relationships
hitherto unrecognized.
Scaling is a technique that allows researchers, through their
respondents, to categorize certain variables that they would not be
able to rank themselves. For example, they may ask their informants
to bring certain types of herbal medicine and to arrange
these into piles according to their usefulness. The informants would
then be asked to explain the logic of their ranking.
Mapping and scaling are used as techniques in rapid appraisals
or situation analysis. Rapid appraisal is an approach often
used in health systems research.
CHAPTER 14
Data Analysis Plan
an important review of the appropriateness of the data collection
tools for collecting the data you need. That is why you have to plan for
data analysis before the pretest. When you process and analyze the
data you collect during the pretest you will spot gaps and overlaps
which require changes in the data collection tools before it is too late!
When making a plan for data processing and analysis the following
issues should be considered:
• Sorting data
• Performing quality-control checks
• Data processing
• Data analysis.
Sorting Data
An appropriate system for sorting the data is important for facilitating
subsequent processing and analysis.
If you have different study populations (for example, doctors,
paramedical staff and medical students), you obviously would
number the questionnaires separately.
In a comparative study, it is best to sort the data right after
collection into the two or three groups that you will be comparing
during data analysis. For example, in a study of the use of sedatives
by doctors, users and nonusers would be the two basic categories.
In a study of the reasons why doctors object to being posted in rural
areas, rural and urban doctors would be the basic categories. In a
case-control study, obviously, the cases are to be compared with the
controls. It is useful to number the questionnaires belonging to each
of these categories separately, right after they are sorted.
For example:
Yes (or positive response): code Y or 1
No (or negative response): code N or 2
Do not know: code D or 8
No response/unknown: code U or 9
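As a minimal sketch of how such a coding scheme might be applied at data entry, the example below maps raw answers to the numeric codes listed above. The function name `code_response` and the sample answers are hypothetical, and treating blank or unrecognized answers as code 9 is an assumption made for illustration.

```python
# Numeric codes follow the example scheme above.
RESPONSE_CODES = {"yes": 1, "no": 2, "do not know": 8}
NO_RESPONSE = 9  # no response/unknown


def code_response(raw):
    """Map a raw answer to its numeric code.

    Blank, missing or unrecognized answers are coded 9 (assumption).
    """
    if raw is None:
        return NO_RESPONSE
    return RESPONSE_CODES.get(raw.strip().lower(), NO_RESPONSE)


# Hypothetical raw answers as they might appear on returned questionnaires.
answers = ["Yes", "no", "Do not know", "", None]
codes = [code_response(a) for a in answers]
```

Keeping "do not know" (8) separate from "no response" (9) lets the two be distinguished later in the analysis.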
for creating questionnaires (developed by the Centers for Disease
Control, Atlanta, USA and World Health Organization, Geneva),
• LOTUS 1-2-3, a spreadsheet program (from the Lotus
Development Corporation),
• dBase (version III plus or IV), a data-management program (from
Ashton-Tate), and
• SPSS, which is a quite advanced Statistical Package for Social
Sciences (SPSS Inc.).
If you intend to use a computer, you may ask advice from
an experienced person concerning which program is the most
appropriate for your type of data. Note that Epi Info may be freely
used and copied. All the other programs have copyrights.
Data Analysis: Quantitative Data
Analysis of quantitative data involves the production and
interpretation of frequencies, tables, graphs, etc., that describe the
data.
After deciding on a data entry format, the information on the
data collection instrument will have to be coded (e.g., Male: M or 1,
Female: F or 2). During data entry, the information relating to each
subject in the study is keyed into the computer in the form of the
relevant code (e.g., if the first subject (identified as 001) is a male
(code 1) aged 25, the data could be keyed in as 001125).
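The keyed-in record in this example can be reproduced with a short sketch. The field widths (three digits for the subject number, one for sex, two for age) are inferred from the example 001125 and are an assumption; `encode_record` and `decode_record` are hypothetical helper names.

```python
SEX_CODES = {"male": 1, "female": 2}


def encode_record(subject_id, sex, age):
    """Build a fixed-width record: 3-digit id, 1-digit sex code, 2-digit age."""
    if not 0 <= age <= 99:
        raise ValueError("this sketch assumes a two-digit age field")
    return f"{subject_id:03d}{SEX_CODES[sex.lower()]}{age:02d}"


def decode_record(record):
    """Split a fixed-width record back into its fields."""
    return {
        "subject_id": int(record[0:3]),
        "sex_code": int(record[3]),
        "age": int(record[4:6]),
    }


record = encode_record(1, "male", 25)
fields = decode_record(record)
```

Because every field has a fixed width, the record can be split back into its variables without separators.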
The computer can do all kinds of analysis and the results can be
printed. It is important to decide whether each of the tables, graphs,
and statistical tests that can be produced makes sense and should
be used in your report. That is why we plan the data analysis
beforehand!
• Frequency counts: From the data master sheets, simple tables can
be made with frequency counts for each variable. A frequency
count is an enumeration of how often a certain measurement or a
certain answer to a specific question occurs.
For example:
Smokers 51
Nonsmokers 93
Total 144
If numbers are large enough it is better to calculate the frequency
distribution in percentages (relative frequencies): 51/144 × 100 =
35 percent are smokers and 93/144 × 100 = 65 percent nonsmokers.
This makes it easier to compare groups than when only absolute
numbers are given. In other words, percentages standardize the
data.
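The frequency count and relative frequencies in this example can be reproduced with a few lines of code. This is a sketch only: the list of responses is rebuilt from the counts given above (51 smokers, 93 nonsmokers), and the variable names are illustrative.

```python
from collections import Counter

# Rebuild the 144 responses from the counts in the example above.
responses = ["smoker"] * 51 + ["nonsmoker"] * 93

counts = Counter(responses)        # absolute frequencies
total = sum(counts.values())       # 144 respondents

# Relative frequencies, rounded to whole percentages as in the text.
percentages = {label: round(100 * n / total) for label, n in counts.items()}
```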
• Divide the range into three to five categories. You can either aim
at having a reasonable number in each category (e.g. 0–2 km,
3–4 km, 5–9 km, 10+ km for home-clinic distance) or you can
define the categories in such a way that they are each equal in size
(e.g. 20–29 years, 30–39 years, 40–49 years, etc.).
• Construct a table indicating how data are grouped and count the
number of observations in each group.
• Cross-tabulations: Further analysis of the data usually requires
the combination of information on two or more variables in order
to describe the problem or to arrive at possible explanations for it.
For this purpose it is necessary to design cross-tabulations.
Depending on the objectives and the type of study, two major
kinds of cross-tabulations may be required:
1. Descriptive cross-tabulations that aim at describing the
problem under study.
2. Analytic cross-tabulations in which groups are compared in
order to determine differences, or which focus on exploring
relationships between variables.
A descriptive cross-tabulation (Table 14.1) would, for example,
relate smoking behavior to sex or occupational background:
The males appear to be smoking more (43%) than females (28%).
Table 14.1: Smoking by sex
Sex       Smoking    Not smoking   Total
Males     31 (43%)   41 (57%)      72 (100%)
Females   20 (28%)   52 (72%)      72 (100%)
Total     51 (35%)   93 (65%)      144 (100%)
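The row percentages in Table 14.1 follow directly from the cell counts. The sketch below recomputes them; the nested-dictionary layout and the function name `row_percentages` are assumptions made for illustration.

```python
# Cell counts from Table 14.1 (rows: sex, columns: smoking behavior).
table = {
    "Males": {"Smoking": 31, "Not smoking": 41},
    "Females": {"Smoking": 20, "Not smoking": 52},
}


def row_percentages(crosstab):
    """Express each cell as a rounded percentage of its row total."""
    result = {}
    for row, cells in crosstab.items():
        row_total = sum(cells.values())
        result[row] = {col: round(100 * n / row_total) for col, n in cells.items()}
    return result


pct = row_percentages(table)
```

Percentages are taken within each row because the comparison of interest is between the sexes.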
Table 14.2: Smoking in relation to persistent cough over the past 2 weeks
Smoking behavior   Cough       No cough     Total
Smoking            10 (77%)    41 (31%)     51 (35%)
Not smoking        3 (23%)     90 (69%)     93 (65%)
Total              13 (100%)   131 (100%)   144 (100%)
Step 3: List the answers again, grouping those with the same code
together.
Step 4: Then interpret each category of answers and try to give it a label
that covers the content of all answers. In the case of data on opinions,
for example, there may be only a limited number of possibilities,
which may range from (very) positive, neutral, to (very) negative.
Data on reasons may require different categories depending on
the topic and the purpose of your question. In the exercise below
you will be asked to categorize the reasons why people smoke by
grouping them in such a way that it is easy to find entry points for
health education aimed at reducing smoking.
After some shuffling you usually end up with 5 to 7 categories.
Step 5: Now try a next batch of 20 questionnaires and check if the
labels work. Adjust the categories and labels, if necessary.
Step 6: Make a final list of labels for each category and give each label
a code (keyword, letter or number).
Step 7: Code all your data, including what you have already coded,
and enter these codes in your master sheet or in the computer.
Note again that you may include a category ‘others’, but that it
should be as small as possible, preferably used for less than 5 percent
of the total answers. If you categorize your responses to open-ended
questions in this way you can:
• Analyze the content of each answer given in particular categories,
for example, in order to plan what actions should be taken (e.g.,
for health education). Gaining insight into a problem, or into possible
interventions for a problem, is the most important function of
qualitative data.
• Report the number and percentage of respondents that fall into
each category, so that you gain insight into the relative weight of
different opinions or reasons.
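Once the open-ended answers have been coded, reporting the number and percentage in each category is a simple tally. The sketch below uses made-up category labels and counts purely for illustration; `summarize_codes` is a hypothetical helper, and the 5 percent check mirrors the guidance above on the 'others' category.

```python
from collections import Counter


def summarize_codes(codes):
    """Return {label: (count, percentage)} for a list of coded answers."""
    counts = Counter(codes)
    total = len(codes)
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.most_common()}


# Hypothetical coded answers to "Why do you smoke?" (40 respondents).
coded_answers = (
    ["stress"] * 18 + ["peer pressure"] * 12 + ["habit"] * 9 + ["others"] * 1
)

summary = summarize_codes(coded_answers)

# The 'others' category should stay small, preferably under 5 percent.
others_share = summary.get("others", (0, 0.0))[1]
others_is_small = others_share < 5
```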
Questions that ask for descriptions of procedures, practices, or
beliefs usually do not provide quantifiable answers (though you may
quantify certain aspects of them). The answers rather form part of a
jigsaw puzzle that you have to put together in order to obtain insight
into the problem/topic under study.
CHAPTER 15
Synopsis Writing
METHODOLOGY
• Operational definitions
• Type of study and general design
INTRODUCTION
An introduction is the most important part of the research protocol,
and it should open strongly enough to grab the reader’s attention.
It is here that the researcher lets the reviewer know how the
proposed research differs from what other people have done. In a
research protocol, the onus is on the researcher to tell the reviewer
how important the study is going to be. For instance, if the reviewer
comes from a specialty different from that of the researcher, the
former might not
Third Paragraph
The third paragraph should point out the existing gaps in scientific
knowledge and how the present study will contribute to fill in the
gaps.
A template of a third paragraph: Using the same study as an example,
this is how the researchers made their point. The outcome, mortality,
has been a controversial issue in late referral (LR) vs early referral
(ER) of end stage renal disease (ESRD) patients. The previous studies
on this subject have been single center studies with a sample size of
a few hundred patients only.
“Our study will be carried out on a generalizable United States
population of about 350,000 dialysis patients recruited from all
states of the US. Our study will also use a novel statistical
technique called propensity score analysis (PS analysis). PS analysis
is a proxy for randomization. Thus using a PS analysis will make the
study as good as a randomized controlled clinical trial. Hence, our
study is going to make an effort to settle this controversy regarding
LR vs ER in a robust fashion.”
Fourth Paragraph
The fourth paragraph should give details about the rationale of the
planned study. It must clearly emphasize why this study is
important.
A template of a fourth paragraph: The outcome (i.e. mortality)
associated with late vs early referral has been a controversial subject
and has generated immense debate among researchers. There
is a lack of consensus among researchers whether late referral is
Research Objectives
A research objective is a statement that clearly depicts the goal to be
achieved by a research project. In other words, the objectives of a
research project summarize what a study plans to achieve.
The formulation of objectives will help you to:
• Focus the study (narrowing it down to essentials)
• Avoid the collection of data which are not strictly necessary for
understanding and solving the problem you have identified (to
establish the limits of the study)
• Organize the study in clearly defined parts or phases.
Properly formulated, specific objectives will facilitate the
development of your research methodology and will help to orient
the collection, analysis, interpretation and utilization of data.
Objectives should be stated using “action verbs” that are specific
enough to be measured:
Examples: To determine …, To compare…, To verify…, To calculate…,
To describe…, etc.
Do not use vague nonaction verbs such as:
To appreciate … To understand… To believe
An objective is a statement of what the researcher intends to
determine and should be stated in clear, measurable terms. While
developing a research protocol, a researcher must ensure that the
research objectives match the hypothesis and the data analysis plan.
Moreover, a researcher can have as many objectives as the study can
feasibly achieve.
Given below is an example of specific aims/objectives mentioned
for a study looking at the impact of socioeconomic factors on
Operational Definition
It is the definition of the exposure and outcome variables of interest
in the context of the objective of a particular study, together with
their means of measurement/determination.
Consider that one wishes to do a study on anemia in patients
with chronic kidney disease (CKD). He has to give an operational
definition of anemia in his study. This definition of anemia should
not be a textbook definition of anemia; rather, it should state
what anemia means in this particular study. For example, he
should give an operational definition such as: anemia in this study
is defined as hemoglobin less than 11 g/dL. This cut-off of 11 g/dL
should ideally come from a world recognized body like the WHO or
National Kidney Foundation.
Take another example: a study to compare the effectiveness of
dressing A and dressing B in patients presenting with infected wounds
of the foot. An outcome variable should be easily measurable. From
the objective alone, it is not clear what will be deemed effective
or how effectiveness will be measured. So effectiveness
should be defined in clear measurable terms. “The effectiveness
Sampling Method
Sampling is the process involving the selection of a finite number
of elements from a given population of interest, for purposes of
inquiry. A researcher can use either a probability or nonprobability
sampling technique after considering the cost, resources available
and practicability.
Large-scale descriptive studies almost always use probability
sampling techniques. Intervention studies sometimes use probability
sampling but also frequently use nonprobability sampling.
Qualitative studies almost always use nonprobability samples.
Probability sampling techniques are preferred by researchers
because they maximize the external validity or generalizability of the
results of the study, while nonprobability sampling techniques can
introduce selection bias into the research.
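Simple random sampling is one common probability sampling technique, and it can be sketched in a few lines. The sampling frame of 200 patient identifiers, the sample size of 20 and the function name `simple_random_sample` are all assumptions made for illustration.

```python
import random


def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n distinct elements, each with an equal chance of selection.

    A seed is accepted only so that the draw can be reproduced and audited.
    """
    rng = random.Random(seed)
    return rng.sample(sampling_frame, n)


# Hypothetical sampling frame: 200 numbered patients on a clinic register.
frame = [f"patient_{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(frame, 20, seed=42)
```

Recording the seed alongside the sampling frame allows the selection to be repeated exactly if the sampling procedure is later questioned.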
Duration of Study
It is also important to state clearly the time period during which the
data will be collected. For example, “all participants who attend the
outpatient diabetic clinics of XYZ hospital from 1st January 2012 to
31st December 2013 will be included in the study.”
Software
The sample size calculation was done using the WHO software for
“Sample Size Calculation” by Lemeshow S and Lwanga SK.
Reference Study
The reference study used for this sample size calculation is:
Charité, Virchow Klinikum et al. “Betel quid chewing, oral cancer
and other oral mucosal diseases in Vietnam”. J Oral Pathol Med.
2008 Oct;37(9):511-4. Epub 2008 Jul 8. The values obtained from the
reference study are P1 = 0.30 (30% of the controls in the reference
study were consuming betel quid, a substance similar to ghutka) and
P2 = 0.70 (70% of the cases in the reference study were consuming
betel quid). These two values, 30 percent and 70 percent, were
entered into the WHO sample size software.
Based on the proportions of exposure among cases and controls
in the above study, the calculated sample size is 38 (Fig. 15.1). The
sample size is therefore adequate for valid results, as confirmed by
the calculation using the WHO software for sample size calculation.
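For readers without access to the WHO software, the calculation can be approximated with the standard formula for comparing two proportions. The sketch below is not the WHO program itself: the formula is the usual textbook one with a pooled proportion, and the significance level (two-sided 5%) and power (95%) are assumptions, since the text does not state which settings were used.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.95):
    """Sample size per group for detecting a difference between two proportions.

    alpha is two-sided; alpha and power are assumed values, not taken
    from the reference study.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2  # pooled proportion under the null hypothesis
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# P1 = 0.30 (exposed controls) and P2 = 0.70 (exposed cases), as above.
n_per_group = sample_size_two_proportions(0.30, 0.70)
```

Under these assumed settings the sketch returns 38 per group, which agrees with the figure quoted above; the agreement does not by itself confirm which settings were entered into the WHO software.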
Although the calculated sample size according to the WHO
software is 38 cases and 38 controls.
Ethical Concerns
Ethical concerns are of paramount importance for any research.
The researcher must obtain informed consent in the local
language from all the participants. The purpose of the research,
intervention to be given, potential benefits and harms, voluntary
participation, healthcare cost, etc. must be explained in detail to
all study participants. It is also important to protect the rights of
vulnerable groups (i.e. children, mentally ill people, etc.). If children
are to be included in the study, consent from a guardian is essential.
A translated version of the informed consent form must be attached
as an appendix. It is the duty of the researcher to ensure that
anonymity of the participants will be maintained throughout the
research. Moreover, confidentiality of participants’ responses must
also be maintained during research. The researcher must make
sure that appropriate data protection policies are adopted, so that no
unauthorized person has access to confidential data collected from
study participants. Finally, the researcher must ensure that the
study is conducted in accordance with the guidelines of the
Declaration of Helsinki, and if deemed necessary an approval from the
local ethical review board should be obtained. All these details must be
included in the ethical consideration portion of the methodology.
Data Analysis
Descriptive Analysis
The data analysis usually begins with the descriptive analysis. The
descriptive analysis is the description about the characteristics of the
population/sample being studied. The descriptive analysis is usually
presented in research studies as shown in Table 15.2.
A widely accepted way of describing the descriptive analysis, if the
study is describing one sample/population, is as follows:
A descriptive statistical analysis of continuous and categorical
variables will be performed. Data on continuous variables will be
presented as mean ± SD and data on categorical variables will be
presented as proportions.
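The kind of descriptive summary proposed in this statement can be sketched as follows. The ages and sexes are invented values for six hypothetical participants, and the helper names `describe_continuous` and `describe_categorical` are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical data for six participants.
ages = [45, 52, 38, 61, 49, 57]
sexes = ["M", "F", "F", "M", "F", "F"]


def describe_continuous(values):
    """Return (mean, sample SD), each rounded to one decimal place."""
    return round(mean(values), 1), round(stdev(values), 1)


def describe_categorical(values):
    """Return {category: (count, percentage)} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, round(100 * n / total, 1)) for cat, n in counts.items()}


age_mean, age_sd = describe_continuous(ages)
sex_summary = describe_categorical(sexes)
age_text = f"Age (years): {age_mean} ± {age_sd}"
```

Here `stdev` gives the sample standard deviation (divisor n − 1), which is the usual choice when describing a study sample.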
Please note that there is no p-value column in Table 15.2 as no
comparison is being made.
If the comparison is to be made between two groups, then values
on each variable in both groups must be calculated, with a p-value
indicating any difference (Table 15.3).
Ideally, a statistical analysis should include various types of
analyses like cross-tabulations, linear regression, multivariate
regression analysis, and survival analysis. New researchers are
strongly encouraged to include these types of analysis to add
glamor and colour to the research. Examples of some of the analysis
mentioned above are given here.
Figures 15.3A and B Relationship of hematocrit to renal function: Linear
regression between hematocrit and creatinine
synopsis stage. The researcher so far does not have the data but
has in mind how the associations between these two continuous
variables should look (Figs 15.4 and 15.5). A true association
between the continuous variables hematocrit and GFR, and hematocrit
and creatinine, can be seen in Figures 15.3A and B, which are from a
published study by Kazmi et al.
CHAPTER 16
Dissertation Writing
TITLE
It should highlight the key features of the study.
TABLE OF CONTENTS
Include headings and subheadings with their page numbers.
TITLE PAGE
It includes complete title of the manuscript, the name of the authors
with their highest qualifications, the department or institution to
which they are attached, address for correspondence with telephone
numbers and fax number, if possible.
ABSTRACT
Structured: All original articles should have a structured abstract,
usually of 150 to 250 words, with the headings objective, study
design, settings, subjects, interventions (if applicable), main outcome
measures, results and conclusions.
Keywords: Below the abstract give a few keywords, not more than
ten. These keywords are used in cross-indexing the article
and are usually published with the abstract. Use terms from the
Medical Subject Headings (MeSH) list of Index Medicus, e.g.
glomerulonephritis, paraplegia, infertility. If MeSH terms are not yet
available for recently introduced terms, present terms may be used.
Keywords are included with the structured abstract.
INTRODUCTION
It includes:
• Importance of the subject (what is known).
• Limitation of previous studies/gray areas/controversies (what is
unknown).
• Justification of your study/rationale (based on the above aspects
e.g., gaps in knowledge).
• Any special strength of your study.
HYPOTHESIS
It is an expected relationship between the exposure and the outcome.
STUDY OBJECTIVE
Formulate your objective(s) clearly. Remember Quality Thoughts
Precede Quality Results.
RESULTS
Firstly, the demographic profile is shown (e.g. if the study is done
on human subjects, show the different age groups, common areas of
DISCUSSION
It should emphasize the salient features of present findings.
Comparisons should be made of variations or similarities with
results of previous similar studies both national and international
with references. The detailed data should not be repeated in the
discussion. It must be mentioned whether the hypothesis in the
article was rejected, or could not be rejected. It is important to
remember that in the “discussion section” only discuss points you
have highlighted in the results. The second last paragraph highlights
the limitations of your study. It is a good idea to mention your
limitations before they are pointed out to you by the reviewer. The
conclusions of your study must be based on what you have observed
in your results.
OPTIONAL COMPONENTS
They are added only where applicable. These are as follows:
Acknowledgement—if desired, it should be included after the
discussion and before references.
Letter of undertaking signed by the main author must accompany
all manuscripts.
REFERENCES
Citations in the text should be serially numbered. List
the references in Vancouver style.
ANNEXES
They should be added if they increase the understanding or evaluation
of the study. All annexes should be serially numbered and referred
to at appropriate places in the body of the dissertation.
Dr XYZ
FCPS Student
(2008-2009)
Supervisor:
Dr ABC
Institute
Department
Name of Institution
Official stamp:
CHAPTER 17
Reference Writing
– Put a comma and 1 space between each name. The last author
must have a full-stop after his initial(s).
• Format name(s) of author(s): Surname (1 space) initial(s) (no
punctuation between surname and initials, no spaces between
initials) (full-stop OR if further names comma, 1 space).
– Example: Halpern SD, Ubel PA, Caplan AL. Solid-organ
transplantation in HIV-infected patients. N Engl J Med.
2002;347(4):284-7.
As an option, if a journal carries continuous pagination
throughout a volume (as many medical journals do) the month
and issue number may be omitted.
– Example: Halpern SD, Ubel PA, Caplan AL. Solid-organ
transplantation in HIV-infected patients. N Engl J Med.
2002;347:284-7.
• More than six authors
– Example: Rose ME, Huerbin MB, Melick J, Marion DW, Palmer
AM, Schiding JK, et al. Regulation of interstitial excitatory
amino acid concentrations after cortical contusion injury.
Brain Res. 2002;935(1-2):40-6.
• Organization as author
– Example: Diabetes Prevention Program Research Group.
Hypertension, insulin, and proinsulin in participants with
impaired glucose tolerance. Hypertension. 2002;40(5):679-86.
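The author-formatting rules above are mechanical enough to sketch in code. The function below is an illustration only: `format_authors` is a hypothetical helper, authors are passed as (surname, initials) pairs, and the seventh author in the second example is a placeholder, because the source example truncates the list with "et al."

```python
def format_authors(authors, max_listed=6):
    """Format (surname, initials) pairs in Vancouver style.

    Up to max_listed authors are given in full; if there are more,
    the first max_listed are followed by "et al."
    """
    names = [f"{surname} {initials}" for surname, initials in authors]
    if len(names) > max_listed:
        return ", ".join(names[:max_listed]) + ", et al."
    return ", ".join(names) + "."


three_authors = format_authors([("Halpern", "SD"), ("Ubel", "PA"), ("Caplan", "AL")])

seven_authors = format_authors([
    ("Rose", "ME"), ("Huerbin", "MB"), ("Melick", "J"), ("Marion", "DW"),
    ("Palmer", "AM"), ("Schiding", "JK"),
    ("Placeholder", "X"),  # hypothetical seventh author, not from the source
])
```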
JOURNAL’S TITLE
• Title of journal (abbreviated)
– Abbreviate title according to the style used in Medline. A list of
abbreviations can be found at: http://www.nlm.nih.gov
Volume Number
• If the journal has continuous page numbering through volume,
the month/day and issue information can be omitted.
• Format volume of publication: Volume number (no space) issue
number in brackets (colon, no space) OR volume number (colon,
no space).
– Example: 4(3):
Page Numbers
• Format of page number: Page numbers (full-stop).
– Example: pp. 122-9.
– Example: pp. 1129-57.
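The volume and page formats shown in these examples can also be sketched in code. Both helpers below are hypothetical names introduced for illustration; the page-range rule (drop the leading digits of the last page that repeat those of the first page) is inferred from the examples in this chapter.

```python
def format_volume(volume, issue=None):
    """Return 'volume(issue):' or, when the issue is omitted, 'volume:'."""
    return f"{volume}({issue}):" if issue is not None else f"{volume}:"


def abbreviate_page_range(first, last):
    """Shorten the last page by dropping leading digits shared with the first."""
    f, l = str(first), str(last)
    if f == l:
        return f  # single-page item
    if len(f) == len(l):
        i = 0
        # Always keep at least the final digit of the last page.
        while i < len(f) - 1 and f[i] == l[i]:
            i += 1
        l = l[i:]
    return f"{f}-{l}"


volume_with_issue = format_volume(4, 3)
volume_only = format_volume(347)
pages = [abbreviate_page_range(122, 129), abbreviate_page_range(1129, 1157)]
```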
OTHER AUTHORS
• More than six authors: Give the first six names in full and add “et
al.” The authors are listed in the order in which they appear on the
title page.
• Editor(s): Follow the same methods used with authors but use the
word “editor” or “editors” in full after the name(s). The word editor
or editors must be in lower case. (Do not confuse with “edn” used
for edition).
– Example: Millares M, editor. Applied drug information:
strategies for information management. Vancouver, WA:
Applied Therapeutics, Inc.; 1998.
• Sponsored by institution, corporation or other organization
(including Pamphlet)
– Example: Australian Pharmaceutical Advisory Council.
Integrated best practice model for medication management
in residential aged care facilities. Canberra: Australian
Government Publishing Service; 1997.
Chapter or part of a book to which a number of authors have
contributed.
• Format of book chapter: Author(s)/editor(s) of chapter. Title
of chapter. In: author(s)/editor(s) of book. Title of book. City of
publication (State or country of publication): Publisher; year.
pages of book chapter.
– Example: Porter RJ, Meldrum BS. Antiepileptic drugs. In:
Katzung BG, editor. Basic and clinical pharmacology. Norwalk,
CN: Appleton and Lange; 1995. pp. 361-80.
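As a rough illustration, the book-chapter format above can be assembled from its parts as follows. The function and parameter names are assumptions made for this sketch. The place of publication is passed in already formatted, and the author and editor names are expected in "Surname Initials" form.

```python
def format_book_chapter(chapter_authors, chapter_title, book_editors,
                        book_title, place, publisher, year, pages):
    """Assemble a book-chapter reference in the order given in this chapter."""
    # 'editor' or 'editors' is written in full and in lower case.
    editor_word = "editor" if len(book_editors) == 1 else "editors"
    return (
        f"{', '.join(chapter_authors)}. {chapter_title}. "
        f"In: {', '.join(book_editors)}, {editor_word}. {book_title}. "
        f"{place}: {publisher}; {year}. pp. {pages}."
    )


reference = format_book_chapter(
    chapter_authors=["Porter RJ", "Meldrum BS"],
    chapter_title="Antiepileptic drugs",
    book_editors=["Katzung BG"],
    book_title="Basic and clinical pharmacology",
    place="Norwalk, CN",
    publisher="Appleton and Lange",
    year=1995,
    pages="361-80",
)
print(reference)
```

The call reproduces the Porter and Meldrum example given above.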
DISSERTATION REFERENCE
Example: Borkowski MM. Infant sleep and feeding: a telephone
survey of Hispanic Americans [dissertation]. Mount Pleasant (MI):
Central Michigan University; 2002.
BIBLIOGRAPHY
1. International Committee of Medical Journal Editors. Uniform
requirements for manuscripts submitted to biomedical journals: sample
references [monograph on the Internet]. Bethesda (MD): National
Library of Medicine (US); 2003 [cited 2008 Aug 10]. Available from:
http://www.nlm.nih.gov/bsd/uniform_requirements.html.
2. Uniform requirements for manuscripts submitted to biomedical
journals. International Committee of Medical Journal Editors. CMAJ.
1995;152(9):1459-73.
CHAPTER
18
Guidelines for Consent Writing
considerations, namely:
1. Respect for autonomy, which requires that those who are
capable of deliberation about their personal choices should be
treated with respect for their capacity for self-determination.
2. Protection of persons with impaired or diminished autonomy
(vulnerable groups, e.g. children/minors, subjects with
psychiatric illness, etc.), which requires that those who are
dependent or vulnerable be afforded security against harm or
abuse.
• Beneficence refers to the ethical obligation to maximize benefits
and to minimize harms. This principle gives rise to norms
requiring that the risks of research be reasonable in the light of the
expected benefits, that the research design should be sound, and
that the investigators must be competent to conduct the research
and to safeguard the welfare of the research subjects. Beneficence
further proscribes the deliberate infliction of harm on persons;
this aspect of beneficence is sometimes expressed as a separate
principle, nonmaleficence (do no harm).
• Justice refers to the ethical obligation to treat each person in
accordance with what is morally right and proper, to give each
person what is due to him or her. In the ethics of research involving
human subjects the principle refers primarily to distributive
justice, which requires the equitable distribution of both the
burdens and the benefits of participation in research. Differences
in distribution of burdens and benefits are justifiable only if they
are based on morally relevant distinctions between persons;
one such distinction is vulnerability. “Vulnerability” refers to a
substantial incapacity to protect one’s own interests owing to
such impediments as lack of capability to give informed consent,
lack of alternative means of obtaining medical care or other
expensive necessities, or being a junior or subordinate member
of a hierarchical group. Accordingly, special provision must be
made for the protection of the rights and welfare of vulnerable
persons.
• In case of community studies, community leaders, elders,
local political leaders, religious leaders (in certain cases), and
governmental officials should be taken into confidence, and a
written consent should be obtained.
• When a study is conducted at other locations, such as other hospitals
and clinics, permission from the appropriate authority or physicians
should also be obtained.
• The consent form should be available in English, Urdu, or another
local language if needed. The versions should be equivalent, so that
each is an accurate translation of the others. The language should be
simple enough to be understood by study subjects, including those who
are uneducated or have only primary schooling. Technical terms
should be avoided.
• A properly drafted consent form should contain the following
important points:
– Information sheet. There should be one paragraph or page
giving information about the nature of the study, its purpose
and need, possible benefits of the study, and procedures to be
carried out on the study subjects.
– Possible risks and benefits to the study subjects.
– Availability of alternative treatment in case of therapeutic trials.
– Voluntary participation without any compulsion, moral or
otherwise, and without any financial incentive or coercion.
However, reimbursement for time and travel may/should be
provided to study subjects; it should be commensurate with
the time spent and should not be too high.
– Right to withdraw from the study at any time without affecting
their rights and treatment.
– Confidentiality.
– If any specimen is to be stored, its time of storage and
permission to use it in further research.
– Name and contact number of the investigator in case the study
subject wants further clarification or information about the study.
– Authorization from study subjects with their signature, thumb
impression, signature of witness, etc.
IMPORTANT NOTES
• Studies should not be conducted at the patient’s expense.
• If any new or additional tests are to be done as a requirement of
the study, their cost should be supported by the study.
• If a new treatment is compared with an existing, established one,
or if two treatment modalities are being evaluated and compared,
the cost of treatment or the difference in cost of treatment should be
borne by the study. In addition, any expected or unexpected
complication arising as a result of the new treatment should also be
supported by the study.
• Studies which are unlikely to produce any significant results
because of faulty design are often considered not to be ethical, as
such studies waste time and resources. These should
be avoided unless there is a strong justification.
BIBLIOGRAPHY
1. Agard E, Finkelstein D, Wallach E. Cultural Diversity and Informed
Consent. The Journal of Clinical Ethics. 1998;9(2):173-6.
2. Sugarman J, Popkin B, Fortney J, Rivera R. International Perspectives
on Protecting Human Research Subjects. Crystal City, VA: National
Bioethics Advisory Commission Draft, 2000.
3. World Health Organization and Council for International Organizations
of Medical Sciences (WHO-CIOMS). International Ethical Guidelines
for Biomedical Research Involving Human Subjects. Author, Geneva,
1993.
CHAPTER
19
Consent to Participate
in Research (Sample)
TITLE OR PARAPHRASED TITLE OF THE STUDY
You are asked to participate in a research study conducted by [names
of PI (and faculty sponsor if the PI is a student)], from the
[departmental affiliation] at Michigan Technological University. If
the PI is a student, indicate whether the study is being conducted as
part of an undergraduate project, graduate student project, thesis, or
dissertation. Your participation in this study is entirely voluntary.
Please read the information below and ask questions about anything
you do not understand, before deciding whether or not to participate.
Optional: You have been asked to participate in this study because
[explain succinctly and simply why the prospective subject is eligible
to participate]. If appropriate, state the approximate number of
subjects involved in the study. State whether there are inclusion
or exclusion criteria for participation (e.g. medical conditions that
would include or exclude a person).
PURPOSE OF THE STUDY
Briefly state what the study is designed to examine, assess, or
establish.
PROCEDURES
If you volunteer to participate in this study, you will be asked to do
the following things:
Describe the procedures chronologically using simple language,
short sentences, and short paragraphs. If there are several procedures
or if they are complex, then use of subheadings may help organize
this section and increase readability.
Following Paragraph, if Relevant
Based on experience with this [drug, procedure, device, etc.] in
[animals, patients with similar disorders], researchers believe it may
be of benefit to subjects with your condition or it may be as good
as standard therapy but with fewer side effects. Of course, because
individuals respond differently to therapy, no one can know in
advance if it will be helpful in your particular case. The potential
benefits may include: [describe the anticipated benefits to subjects
resulting from their participation in the research].
If there is no likelihood that participants will benefit directly from
their participation in the research, state this in clear terms. For example:
“You should not expect your condition to improve as a result of
participating in this research” or “This study is not being conducted
to improve your condition or health. You have the right to refuse to
participate in this study.”
Payment for Participation (Optional)
State whether the subject will receive payment. If not, delete
this section. If the subject will receive compensation, describe the type
(e.g. money, extra credit, gift certificate) and amount, when the
compensation is scheduled, and the proration schedule, if any, should
the subject decide to withdraw or be withdrawn by the investigator.
Confidentiality
Any information that is obtained in connection with this study
and that can be identified with you will remain confidential and
will be disclosed only with your permission or as required by law.
Confidentiality will be maintained by means of [describe coding
procedures and plans to safeguard data, including where data will
be kept, who will have access to it, etc.].
If information will be released to any other party for any reason,
then state the person or agency to whom the information will
IDENTIFICATION OF INVESTIGATORS
If you have any questions or concerns about this research, please
contact [identify research personnel: Principal Investigator, Faculty
Sponsor (if student is the PI), Co-Investigator(s), if any]. Include
day phone numbers, addresses, and email addresses for all listed
The Michigan Tech Institutional Review Board has reviewed my
request to conduct this project. If you have any concerns about your
rights in this study, please contact Joanne Polzien of the Michigan
Tech-IRB at 906-487-2902 or email jpolzien@mtu.edu.
I understand the procedures described above. My questions have
been answered to my satisfaction, and I agree to participate in this
study. I have been given a copy of this form.
________________________________________
Printed Name of Subject

________________________________________    ____________________
Signature of Subject                        Date

________________________________________    ____________________
Signature of Witness                        Date
BIBLIOGRAPHY
1. www.uoguelph.ca/research/forms/.../sample%20consent%20form.doc