
World J. Surg. 29, 557–560 (2005)
DOI: 10.1007/s00268-005-7912-z

How to Analyze an Article


John D. Urschel, M.D.
Department of Surgery, Division of Cardiothoracic Surgery, Tufts University School of Medicine, Tufts-New England Medical Center, 750 Washington Street, Boston, Massachusetts, 02111, USA

Published Online: April 21, 2005

Abstract. In clinical research investigators generalize from study samples to populations, and in evidence-based medicine practitioners apply population-level evidence to individual patients. The validity of these processes is assessed through critical appraisal of published articles. Critical appraisal is therefore a core component of evidence-based medicine (EBM). The purpose of critical appraisal is not one of criticizing for criticism's sake. Instead, it is an exercise in assigning a value to an article. A checklist approach to article appraisal is outlined, and common pitfalls of analysis are highlighted. Relevant questions are posed for each section of an article (introduction, methods, results, discussion). The approach is applicable to most clinical surgical research articles, even those of a nonrandomized nature. Issues specific to evidence-based surgical practice, in contrast to evidence-based medicine, are introduced.

Correspondence to: John D. Urschel, M.D., e-mail: jurschel@buffalo.edu

Critical appraisal of a clinical research article is an essential feature of evidence-based medicine (EBM) and EBM surgical practice. Nevertheless, surgical trainees and some students of EBM occasionally lose sight of this fact or misunderstand the purpose of critical appraisal. Trainees wonder if they can leave the critique of research to others and simply read an expert review. This strategy may serve the generalist reasonably well (even this is debatable), but it is not acceptable for a serious practitioner of surgery [1–6]. The "E" in EBM stands for evidence, not expert opinion. The lessons of medical history point to the fallibility of expert opinion, especially when it is not rigorously derived from published evidence.

Whereas some students question the practical usefulness of critical appraisal, others embrace it with excessive enthusiasm. In their vigor to find fault in published papers and to criticize for criticism's sake, they fail to evaluate the value of an article. Determining the value of an article is the essence of critical appraisal [1]. All articles have flaws. The real question is: Given the flaws, how valuable is this article to the practice of EBM?

Research in surgery yields a variety of article types, ranging from simple case reports to randomized controlled trials (RCTs) and meta-analyses of RCTs (Table 1). Between these extremes are the ever-prevalent case-series and nonrandomized comparative studies of varying validity. The specialty of surgery has been criticized for relying excessively on case-series and their related expert opinion for far too long and for being slow to adopt the RCT [2–4]. A detailed exploration of this problem is outside the scope of this article, but the four major reasons for the relative infrequency of RCTs in surgery deserve mention. First, surgeons tend to be seriously attached to their own surgical viewpoint or technique, an attachment that usually exceeds a medical physician's affinity for a particular drug. Second, many surgical questions cannot be addressed in RCTs because the necessary community equipoise, or management uncertainty, does not exist. Third, surgeons are usually skilled in one operative approach to any given problem, but they are rarely equally proficient in two competitive operative approaches; this makes RCTs of different operations difficult. Finally, patients do not mind having the choice of a perioperative antibiotic (or some similar medical intervention) left to chance, but they are understandably reluctant to leave the decision to operate to chance.

Although I do not seek to make excuses for the lack of RCTs in surgery, the realities of the situation should be considered. Therefore, there is still a role, albeit a diminishing one, for a carefully conducted case-series in surgery. Whereas medical practitioners of EBM can often simply dismiss case-series from consideration, surgeons do not currently have this luxury. We still must critically analyze case-series while at the same time encouraging the performance of more sophisticated research studies [6].

The editors of this World Journal of Surgery issue on EBM for surgeons have commissioned several articles on the critical appraisal of articles, highlighting the central role of critical appraisal in EBM. The issue contains articles devoted to the analysis of therapeutic studies, studies of diagnostic tests, prognostic studies, and systematic reviews.
Critical appraisal of these various forms of research publications has much in common. Readers familiar with the Users' Guide publication series [7] and subsequent textbook [8] will be well versed in three basic questions of article appraisal: Are the results valid? What are the results? How can I apply the results to patient care? At least two of the articles in this issue follow this established format. However, there are other published checklists and appraisal approaches that are also useful [1, 9–11]. To maintain some balance in presentation in this issue, and to permit a generic approach to appraisal that is broadly applicable to RCTs and lesser research publications, an approach is outlined that differs from the Users' Guide format popularized at McMaster University (the reader is referred to the papers by Bhandari and colleagues in this issue). I admit some difficulty with this departure from the familiar [12].

World J. Surg. Vol. 29, No. 5, May 2005

Table 1. Hierarchy of clinical surgical research.

Meta-analyses and systematic reviews of multiple randomized controlled trials
Randomized controlled trials
Nonrandomized comparative studies with intent of a fair comparison
    Prospective (concurrent) cohort studies
    Retrospective (historic) cohort studies
    Case-control studies
Nonrandomized comparative studies without consideration of fair comparison
    Case-series attempting to compare two dissimilar groups of patients
    Case-series attempting to compare contemporary patients with historic controls from previous era
Noncomparative observational case-series
Case reports

Article Appraisal Checklists

Checklists for the critical appraisal of surgical articles are outlined in Tables 2 to 5. The checklists are organized into four main categories that correspond to the usual format of a research article: introduction, methods, results, and discussion [1]. Within each category there are two to four basic questions.

Appraisal of the Introduction

The first question to ask when reading an article is: "Why was the study done?" (Table 2). Whereas RCTs are usually motivated by a desire to answer a serious research question, case-series are often an exercise in publishing for the sake of publishing, or even publishing for the purpose of improving an institution's marketing position. The second question to ask about an article's introduction is: "Are the aims clearly stated?" A plausible and focused research goal suggests that the study was well thought out before data were collected. Alternatively, a vague goal or no goal at all usually indicates that data collection and analysis preceded the formulation of a research question. The introduction section of an article provides the reader with an early estimate of the paper's value; a good question does not guarantee good research, but a poor question precludes it.

Table 2. Critical appraisal checklist: introduction.

Question: Why was the study done?
Be wary of: Case-series may be a veiled form of advertising for a profit-seeking organization.

Question: Are the aims clearly stated?
Be wary of: Preliminary unfocused data dredging, with study goals formulated after data analysis (to give appearance of legitimate research question).

Appraisal of Methods

The article's methods section provides information on the internal validity of the study. In the Users' Guide approach, for example, the methodology questions are asked under the heading "Are the results valid?" In other words, if the methodology is not sound, the results will not be valid. This is an important concept.

"Are the numbers of patients sufficient?" is the first question in the methods section (Table 3). In an RCT, for example, the reader should look for explicit sample size justification. Of course, this question comes up again in the discussion section when the possibility of a type II error (finding no evidence of a difference between sample groups when a difference really exists in the population groups) in the study should be considered.

Table 3. Critical appraisal checklist: methods.

Question: Are the numbers of patients sufficient?
Be wary of: Lack of evidence for a treatment effect is not the same as evidence of no effect (study underpowered).

Question: Are the measurements valid and reliable?
Be wary of: Hospital records are not created with research in mind, and measurements found in hospital records are suspect.

Question: Are the outcomes clinically relevant?
Be wary of: Convenient surrogates for important outcomes may not be valid.

Question: Are the statistical approaches sensible?
Be wary of: Unnecessarily complex methods may be designed to deceive. Data dredging leads to spurious associations. Best test seeking behavior overstates significance.

The next question is: "Are the measurements valid and reliable?" A valid measure is one that measures what it is supposed to measure, and a reliable measure is one that gives a similar result when applied on more than one occasion [1]. Published articles often fail to mention shortcomings in this area or minimize their importance. Readers should be especially skeptical of clinical measurements obtained from hospital records. Hospital records are not designed for research, and many measurements that are acceptable for clinical care are not valid as research measurements. In vascular and plastic surgery, for example, Doppler measurements are often used to assess tissue perfusion. This serves a useful clinical purpose, but the measurements, when viewed in a research context, may not be valid or reliable.

"Are the outcomes clinically relevant?" is the next question in the methods section. The really important outcomes are often difficult to measure. Therefore investigators may select outcomes that are easy to assess and then argue that these outcomes are clinically relevant in their own right or are useful surrogates for other outcomes. Serum albumin, for example, is a simple outcome to assess, but it may not be a good surrogate for nutritional status in acutely ill patients.

The final question in the methods section is: "Are the statistical approaches sensible?" Surgeons should have a basic understanding of statistical methods; we need to understand standard approaches to common statistical problems. That basic knowledge allows the reader to evaluate, in a general sense, the suitability of the reported statistical approach [13]. If a study's statistical methods seem unduly complex or depart too far from the norm, the reader might wonder if this represents an intentional attempt at statistical deception. Readers should also be wary of two common forms of disingenuous statistical manipulation: data dredging and best test seeking behavior. With data dredging, the investigator tests for multiple possible associations in the data and hopes to find something significant. Of course, if enough possible associations are examined, something is


Table 4. Critical appraisal checklist: results.

Question: Are the basic data properly described?
Be wary of:

bound to turn up by chance alone. For example, an investigator exploring 20 possible associations may find, by chance, one that seems to meet an arbitrary definition of statistical significance (p = 0.05). Data dredging gives rise to spurious associations. Best test seeking behavior is similar to data dredging, but here the investigator seeks out good tests instead of good associations. In other words, the investigator runs the data with many different statistical tests and then reports the statistical methods that are most pleasing. Whereas data dredging leads to spurious associations, best test seeking behavior overstates the statistical significance of an association. Unfortunately, modern computer software packages facilitate both data dredging and best test seeking behavior.
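The arithmetic behind the data dredging warning can be sketched in a few lines of Python. This is only an illustration, not part of the original checklist: the function name is hypothetical, and the calculation assumes that the tested associations are independent and that none of them is real.

```python
def chance_of_spurious_finding(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_tests` independent tests of a
    true null hypothesis reaches significance at level `alpha`.

    Assumption (not from the article): the tests are independent.
    """
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    # Each test stays "not significant" with probability (1 - alpha).
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    for k in (1, 5, 20, 50):
        p = chance_of_spurious_finding(k)
        print(f"{k:>2} associations tested: {p:.0%} chance of at least one 'significant' result")
```

Under these assumptions, the chance of at least one false-positive association grows quickly with the number of associations examined, which is why a significant result found after unfocused searching deserves skepticism.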

Appraisal of Results

"Are the basic data properly described?" is the first question (Table 4). Basic data include important patient characteristics such as age, sex, weight, socioeconomic status, performance status, and disease stage. They also include basic data on the medical environment, such as size and type of hospital (teaching, community, private, public), referral patterns, specialist or generalist practice, and hospital resources. These basic data may seem mundane, but they are extremely important. A fair comparison of two groups of patients hinges on the similarity of the two groups before intervention. Even the process of randomization in an RCT does not guarantee that the two groups are similar. Randomization prevents the groups from being dissimilar in a systematically biased way, but it does not prevent dissimilarity by chance. Irrespective of publication type, the reader cannot make a judgment on group similarity without the basic data. Basic data also help the reader in another respect. The reader cannot generalize the study findings to his or her surgical practice without considering the study's patient and hospital characteristics. The issue of generalizing research findings is critically important for surgeons (see below).

The question "Do the numbers add up?" may seem too obvious for inclusion in this checklist, but (sadly) it remains an important question for the reader. A quick glance at the tables and graphs, while reading through the text, may show inconsistencies in the numbers. All articles have flaws, and errors do occur, but the real worry for the reader is the extent of the error. If there is obvious sloppiness in the paper, might there be even more sloppiness and error in the underlying study?

The next question, "Are the measures of effect and statistical significance properly presented?" is important. The related Users' Guide questions are "How large was the treatment effect?" and "How precise was its estimate?" With these questions the reader evaluates the magnitude of difference between two patient groups, and its possible explanation by chance alone (statistical significance). The reader should be wary if the authors quietly state a modest absolute difference between groups and then go on to use measures of effect (e.g., relative risk reduction) that express absolute difference as a proportion of the control group's risk [12, 14]. If, for example, a new drug reduces the risk of a perioperative complication from 6% in the control group to 3% in the treatment group, the absolute risk reduction is 3% (number needed to treat is 33, see Dr. Trainer's article) and the relative risk reduction is 50%. A novice reader may be unduly impressed by the 50% relative risk reduction.
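The three measures of effect in the example above can be reproduced with a short Python sketch. The function name is hypothetical, and the second call (0.6% versus 0.3%) is an added illustration that does not come from the article; it is included only to show that the same relative risk reduction can accompany a much smaller absolute benefit.

```python
def effect_measures(control_risk: float, treatment_risk: float):
    """Return (absolute risk reduction, relative risk reduction, NNT).

    Risks are proportions between 0 and 1. Definitions:
      ARR = risk in control group - risk in treatment group
      RRR = ARR / risk in control group
      NNT = 1 / ARR
    """
    arr = control_risk - treatment_risk
    if arr <= 0:
        raise ValueError("treatment does not reduce risk; NNT is undefined")
    rrr = arr / control_risk
    nnt = 1.0 / arr
    return arr, rrr, nnt


if __name__ == "__main__":
    # Example from the text: complication risk falls from 6% to 3%.
    arr, rrr, nnt = effect_measures(0.06, 0.03)
    print(f"ARR = {arr:.1%}, RRR = {rrr:.0%}, NNT = {nnt:.1f}")

    # Hypothetical contrast (not from the article): 0.6% to 0.3%.
    arr, rrr, nnt = effect_measures(0.006, 0.003)
    print(f"ARR = {arr:.1%}, RRR = {rrr:.0%}, NNT = {nnt:.1f}")
```

In both calls the relative risk reduction is the same, while the number needed to treat differs tenfold, which is why the absolute figures should be sought whenever only a relative measure is reported.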

Table 4 (continued).

Be wary of: If basic data are not provided, there is no way of telling if the two groups are similar (fair comparison). Generalization, a key step in EBM, is not possible if we do not have basic patient data.

Question: Do the numbers add up?
Be wary of: Sloppiness, when present, is usually not confined to the easily identifiable errors (iceberg analogy).

Question: Are the measures of effect and statistical significance properly presented?
Be wary of: Relative risk reduction may be impressive, but what is the absolute risk reduction (and NNT)?

Question: What is the main finding, and could it be erroneous?
Be wary of: If the main finding comes from an unplanned subgroup analysis, it may be wrong (data dredging). Bias and confounders may give a spurious result.

Absolute risk reduction = risk in control group minus risk in treatment group. Relative risk reduction = absolute risk reduction divided by risk in control group, expressed as percent. NNT: number needed to treat (1/absolute risk reduction); EBM: evidence-based medicine.

After considering the suitability of the article's measure of effect, the reader should look at the presentation of statistical significance or, stated differently, the precision of the estimate of measure of effect. Confidence intervals are preferred, but traditional p values provide the same information (in a more opaque way). A 95% confidence interval (CI) is typically reported in surgical journals for the same reason that a p value of 0.05 is considered significant: It is an arbitrary but convenient threshold level of significance. A 95% CI defines an interval of values that includes the true value 95% of the time. Unfortunately, statistical significance is still poorly presented in many articles. Readers should be wary of statements or tables that simplistically report "not significant" or, alternatively, "significant, p < 0.05." There is no excuse for this type of imprecision in surgical reporting.

The last question in the results section is "What is the main finding and could it be erroneous?" The reader should be wary of main findings that do not directly follow from the main research question. If, for example, a paper reported a trial of lymphadenectomy versus no lymphadenectomy for malignancy, the anticipated main finding would be one of survival in patients treated by lymphadenectomy. However, it would not be unusual for the article to emphasize a different finding, such as improved survival in just a subgroup of patients undergoing lymphadenectomy. Subgroup analyses may be valid if the analysis was planned a priori (contrast with subgroup analyses after data dredging). Nevertheless, subgroup analyses should be viewed with caution, especially if they form the basis of the article's main finding [15]. The reader should also consider the possibility of a major error in findings due to bias, or the presence of a confounder. Bias, at any point in a study, can systematically (rather than randomly) deviate the results away from the truth. A confounder is an unidentified third variable that is responsible for an apparent, but false, association between two study variables. Good researchers strive to eliminate bias and to understand confounders.
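The point made earlier about precision, that a confidence interval says more than a bare "significant" or "not significant", can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration: the function name, the group sizes and event counts, and the use of the simple normal-approximation (Wald) interval with z = 1.96 for the absolute risk reduction.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def risk_difference_ci(control_events: int, control_n: int,
                       treated_events: int, treated_n: int,
                       z: float = Z_95):
    """Return (ARR, lower, upper) using a normal-approximation interval.

    Assumption: groups are large enough for the approximation to be
    reasonable; small trials call for other interval methods.
    """
    p_c = control_events / control_n
    p_t = treated_events / treated_n
    arr = p_c - p_t
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_c * (1 - p_c) / control_n + p_t * (1 - p_t) / treated_n)
    return arr, arr - z * se, arr + z * se


if __name__ == "__main__":
    # Hypothetical trials with the same 6% versus 3% complication risks.
    for label, counts in (("500 per group", (30, 500, 15, 500)),
                          ("200 per group", (12, 200, 6, 200))):
        arr, lo, hi = risk_difference_ci(*counts)
        print(f"{label}: ARR = {arr:.1%}, 95% CI {lo:.1%} to {hi:.1%}")
```

With these hypothetical counts the point estimate is identical in both trials, but the interval for the smaller trial extends below zero. Reporting the interval shows the reader how much uncertainty surrounds the estimate, and it also makes visible that a "not significant" result from a small study is not evidence of no effect.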

Table 5. Critical appraisal checklist: discussion.

Question: Are the results fairly considered against a background of previously published data?
Be wary of: Results only discussed within the context of supportive published data. Results nicely confirm authors' previously published position.

Question: What are the implications for my practice?
Be wary of: It may not be possible to generalize the study results to a different treatment environment.

Question: Do I possess surgical skills similar to those of the reporting surgeons?
Be wary of: The skill with which an operation is performed may be more important than the specifics of the operation itself.


Appraisal of Discussion

"Are the results fairly considered against a background of previously published data?" (Table 5). The authors should present their results in a balanced way, but this is often not done. Readers should be wary of articles that cite only supportive data. Similarly, readers should ask how the findings fit into a framework of any previous publications by the same authors. Some authors champion the same opinion, in an unwavering way, in publication after publication.

A key issue in EBM relates to the process of generalizing research results to individual patients. The question can be stated as "What are the implications for my practice?" For medical practitioners, these are questions of patient characteristics and health care environment. The reader assesses the basic data in the article (see above) and asks if the patients and health care environment are similar to his or her own. If they are, the article's findings are probably applicable to the physician's practice. However, for surgeons, there is an additional dimension to this process of generalization: individual surgeon skill. "Do I possess similar skills to those of the reporting surgeons?" This is a difficult issue for surgeons to confront [12, 16]. Patients would not be well served if surgeons abandoned operative techniques with which they were successful in an attempt to adopt the latest "best" technique. There must be a cautious transition to new surgical techniques. In some cases, the evidence may even suggest that the surgeon refer specific patients to another center. That is an especially difficult aspect of evidence-based surgery, and one that our medical colleagues have trouble understanding. Few physicians have seen their professional livelihood altered by the arrival of a new prescription medicine, but the same cannot be said for the impact of new procedures on established surgeons. In part, it is the differences between evidence-based medicine and evidence-based surgery that make this World Journal of Surgery issue on evidence-based surgery so timely.

References

1. Crombie IK. The pocket guide to critical appraisal. London: BMJ Books, 2002
2. Horton R. Surgical research or comic opera: questions, but few answers. Lancet 1996;347:984–985
3. Lee JS, Urschel DM, Urschel JD. Is general thoracic surgical practice evidence based? Ann. Thorac. Surg. 2000;70:429–431
4. McLeod RS. Issues in surgical randomized controlled trials. World J. Surg. 1999;23:1210–1214
5. Urschel JD, Urschel DM, Mannella SM, et al. Duration of knowledge in general thoracic surgery. Ann. Thorac. Surg. 2001;71:337–339
6. Law S, Wong J. Use of controlled randomized trials to evaluate new technologies and new operative procedures in surgery. J. Gastrointest. Surg. 1998;2:494–495
7. Oxman AD, Sackett DL, Guyatt GH. Users' guide to the medical literature. I. How to get started. JAMA 1993;270:2093–2095
8. Guyatt G, Rennie D. Users' guide to the medical literature: a manual for evidence-based clinical practice. Chicago: AMA Press, 2002
9. Greenhalgh T. How to read a paper: getting your bearings (deciding what the paper is about). BMJ 1997;315:243–246
10. Greenhalgh T. How to read a paper. London: BMJ Books, 1997
11. Jadad A. Randomised controlled trials. London: BMJ Books, 1998
12. Urschel JD, Goldsmith CH, Tandan VR, et al. Users' guide to evidence-based surgery: how to use an article evaluating surgical interventions. Can. J. Surg. 2001;44:95–100
13. Greenhalgh T. How to read a paper: statistics for the non-statistician. I. Different types of data need different statistical tests. BMJ 1997;315:364–366
14. Antes G, Galandi D, Bouillon B. What is evidence-based medicine? Langenbeck's Arch. Surg. 1999;384:409–416
15. Oxman AD, Guyatt GH. A consumer's guide to subgroup analyses. Ann. Intern. Med. 1992;16:78–84
16. Sauerland S, Lefering R, Neugebauer EAM. The pros and cons of evidence-based surgery. Langenbeck's Arch. Surg. 1999;384:423–431
