Gait & Posture 29 (2009) 360–369

Gait & Posture


The reliability of three-dimensional kinematic gait measurements: A systematic review
Jennifer L. McGinley a,d,*, Richard Baker a,b, Rory Wolfe a,c, Meg E. Morris a,d
a Centre for Clinical Research Excellence in Clinical Gait Analysis and Gait Rehabilitation, Murdoch Childrens Research Institute, Royal Children’s Hospital, Melbourne, Australia
b Hugh Williamson Gait Analysis Service, Royal Children’s Hospital, Melbourne, Australia
c Department of Epidemiology and Preventive Medicine, Monash University, Melbourne, Australia
d School of Physiotherapy, The University of Melbourne, Melbourne, Australia


Article history:
Received 5 March 2008
Received in revised form 5 September 2008
Accepted 5 September 2008

Keywords:
Gait analysis
Reproducibility
Measurement error

Background/Aim: Three-dimensional kinematic measures of gait are routinely used in clinical gait analysis and provide a key outcome measure for gait research and clinical practice. This systematic review identifies and evaluates current evidence for the inter-session and inter-assessor reliability of three-dimensional kinematic gait analysis (3DGA) data.
Method: A targeted search strategy identified reports that fulfilled the search criteria. Full-text reports were tabulated and evaluated for quality using a customised critical appraisal tool.
Results: Fifteen full manuscripts and eight abstracts were included. Studies addressed both within-assessor and between-assessor reliability, with most examining healthy adults. Four full-text reports evaluated reliability in people with gait pathologies. The highest reliability indices occurred in the hip and knee in the sagittal plane, with lowest errors in pelvic rotation and obliquity and hip abduction. Lowest reliability and highest error frequently occurred in the hip and knee transverse plane. Methodological quality varied, with key limitations in sample descriptions and strategies for statistical analysis. Reported reliability indices and error magnitudes varied across gait variables and studies. Most studies providing estimates of data error reported values (S.D. or S.E.) of less than 5°, with the exception of hip and knee rotation.
Conclusion: This review provides evidence that clinically acceptable errors are possible in gait analysis. Variability between studies, however, suggests that they are not always achieved.
© 2008 Elsevier B.V. All rights reserved.


1. Introduction
2. Method
2.1. Study identification and selection
2.2. Data extraction and quality appraisal
3. Results
3.1. Sample selection, composition and description
3.2. Study procedures
3.3. Statistical analysis
3.4. Reliability findings: overview
4. Discussion
4.1. Methodological considerations: participant and assessor samples
4.2. Methodological considerations: study design and procedures
4.3. Methodological considerations: statistical analysis

* Corresponding author at: Gait CCRE, Murdoch Children’s Research Institute, Hugh Williamson Gait Laboratory, Royal Children’s Hospital, Flemington Rd Parkville, Victoria
3052, Australia. Tel.: +61 3 9345 5354; fax: +61 3 9345 5447.
E-mail address: (J.L. McGinley).

0966-6362/$ – see front matter © 2008 Elsevier B.V. All rights reserved.
5. Considerations and recommendations for future research
Acknowledgements
References

1. Introduction

Three-dimensional kinematic gait measurements are used widely in clinical gait analysis services and clinical research. Despite the increasing number of gait laboratories, there is limited cohesive information regarding the reliability of kinematic gait measurements. Two recent reports [1,2] highlighting between-laboratory differences in 3DGA measures have raised concerns from the wider orthopaedic community [3,4]. This paper presents a systematic review and qualitative appraisal of the evidence describing the reliability of three-dimensional kinematic gait data (3DGA).

The reliability and validity of gait measurements should be known in order to be used appropriately [5]. As repeated gait measurements typically show some differences, these can be assumed to contain a proportion of error. This review addresses reliability, which is the extent to which gait measurements are consistent or free from variation. The term ‘error’ in this paper is used within the context of reliability and refers to the variation found across repeated measures. Knowledge and understanding of typical measurement variation is helpful to guide the use and interpretation of data.

Clinical gait analysis typically seeks to discriminate between normal and abnormal gait and to assess change in walking over time [6]. Repeated gait measurements can be used to evaluate the response to therapeutic interventions such as surgery, physiotherapy, medications and orthotics. Variability between ‘before’ and ‘after’ measurements may be due to treatment effects or measurement variation, or a combination of both. Knowledge of the error magnitude can enable clinical teams to minimise the risk of over-interpreting small differences as meaningful [7], and to have greater confidence that the treatment effect exceeds the measurement error. Additionally, the use of measurements with low reliability in clinical research may lead to underestimation of, or failure to detect, significant effect sizes, with too much noise (error) drowning out real effects [8].

The reliability or consistency of 3DGA can be examined in various ways. Typically multiple walking trials are collected within a single session. Variability between these trials can be regarded as ‘intrinsic variation’, and reflects the inherent variation within unimpaired individuals or those with pathology [7]. These intrinsic variations cannot be reduced, yet provide a baseline indication of variation independent of other error sources. Other measurement variation arises from extrinsic factors such as procedural errors [7]. Reliability of data obtained from different testing sessions conducted by the same assessor (inter-session or within-assessor) and by different assessors (inter-assessor) is susceptible to these extrinsic errors. Inconsistent marker placement is generally regarded as a key factor, although other factors such as inconsistent anthropometric measurements, variation in walking speed, data processing or measurement equipment errors may also contribute to data variation [9]. Reliability across sessions is of immediate relevance to clinical gait analysis practices, as observations are routinely and regularly repeated to measure patient performance over time, and different assessors may conduct tests for an individual patient. The aim of this review was to identify and critically evaluate the evidence describing the reliability of lower body kinematic gait data across repeated sessions.

2. Method

2.1. Study identification and selection

The search strategy for this review began with retrieval of published reports indexed on health or biomechanics related electronic databases from MEDLINE (1970 to July 2007), EMBASE (1980 to July 2007), CINAHL (1982 to July 2007), RECAL Bibliographic Database (pre-1990 to July 2007) and Inspec (1970 to July 2007). The search was limited to literature reporting studies of human subjects with abstracts written in English. The search terms were customised to each database and included the following keywords: gait, gait disorders, gait analysis, observer variation, reproducibility of results, and reliability. Bibliographies of identified papers and relevant conference proceedings were hand searched.

The review was conducted to be of primary relevance to gait laboratories collecting typical multi-joint lower body gait kinematic data. The titles and abstracts identified by the initial search strategy were screened by the first named author (JM) to identify potentially eligible reports and retrieve full-text reports. When the title or abstract did not clearly indicate whether an article should be included then the complete article was obtained and reviewed. Full-text reports were then evaluated by two authors (JM and RB) for the following inclusion criteria: (1) reports of the inter-session or inter-assessor reliability of three-dimensional kinematic gait or running measures of human participants; (2) including at least three joints of the lower body (pelvis, hips, knees, ankles); (3) reporting numerical findings from repeated kinematic data capture from more than one measurement occasion (with markers replaced each occasion); (4) full papers or abstracts (not later published as full papers); (5) published with an English abstract.

2.2. Data extraction and quality appraisal

Reports were retained as either full-text reports or published abstracts. A standardised data extraction and appraisal form was constructed to identify and detail key features of each study. Two reviewers (JM and RB) initially independently piloted the form with a small subset of representative studies to confirm the content and to assess the reliability. The extracted study details focused on participant characteristics and recruitment, study procedures and biomechanical models, and the statistical analysis techniques.

The quality of study design and conduct are key elements in evaluating scientific evidence, with contemporary systematic reviews providing study quality appraisals in addition to quantitative reviews. Although a large body of literature exists to provide guidelines for the systematic evaluation of research methodology [10,11], the majority are focussed primarily upon studies of healthcare interventions, in particular randomized controlled trials. As no standardised or established guidelines were located for reviews of reliability, a customised quality appraisal form was developed. The appraisal component was developed to integrate relevant examples of methodological quality criteria from other systematic reviews of reliability [12–14], and gait classification [15]. Relevant quality themes and principles were also adapted from quality criteria proposed for the measurement properties of health status questionnaires [16], and the QUADAS tool used to appraise studies of diagnostic accuracy [17]. Additionally, an initial expert panel was formed to consider and define the data extraction and appraisal criteria for the study. Quality appraisal indicators were developed into a standardised form to ensure a structured approach to evaluation of key quality elements and to ensure equal appraisal of all papers. Appraisal items were not scored as the validity of such scoring systems is currently unproven [18]. The appraisal criteria included themes related to external validity such as sampling methods and description, standardisation and description of procedures, and selection of statistical analysis techniques. Appraisal criteria were not applied to the abstract-only reports because their brevity limited the provision of methodological detail.

The data extraction and appraisal form was used independently by two reviewers (JM and RB) to extract key details from each report and to evaluate the quality of each full-text paper. Any rating disagreements on quality criteria were checked against the original article to ascertain the correct scoring according to a pre-defined procedure, in accordance with established and recommended protocols [15,18].

3. Results

The electronic searches and hand-search of references and selected conference proceedings yielded a total of 510 articles. Following the application of the inclusion/exclusion criteria, 23
studies were identified for inclusion in the systematic review; 15 full papers and 8 abstracts.

Details of the 23 identified studies are provided in Table 1. Within-assessor reliability was reported in 15 of the studies, and between-assessor in 10. Four of the studies were described as test–retest, as either the number of assessors was not stated, or it was uncertain whether the same assessors applied the markers in repeated sessions [19–22]. Gait participant sample sizes ranged from 1 to 50 (median of 10), with approximately 70% of studies examining healthy subjects. Three studies included groups of both healthy and disabled participants [20,22,23]. Thirteen studies included adults, eight included children and five did not report the age of participants. The number of assessors ranged from 1 to 24, and included physiotherapists and technicians. Commercially available biomechanical models were most frequently used in the studies. Measurement sessions commonly numbered two or three and occurred over time intervals ranging from 2 h to 20 weeks.

3.1. Sample selection, composition and description

The quality indicators related to sampling methods and description varied widely across the 15 full-text reports. Key criteria are reported in Table 2. The sampling method for recruitment of gait participants was frequently not reported. In the studies with unimpaired people, it is likely that convenience sampling was employed. In studies with clinical participants, the sampling method was also predominantly convenience samples from local known clinical populations [2,21,23]. Yavuzer et al. [24] sought consecutive patients with stroke who fulfilled the study criteria.

The study inclusion/exclusion criteria for gait participants were stated in around half of the full-text reports and varied greatly in detail. Of the studies including ‘healthy’, ‘unimpaired’ or ‘normal’ subjects, seven specified inclusion/exclusion criteria such as the absence of previous musculo-skeletal, neurological or other conditions that may affect gait. A single study chose to recruit only male healthy participants to minimise any potential influence of gender variation on the findings [25]. Another study selected participants older than 16 in order to minimise potential variation due to adolescent growth and variable walking velocity [9]. Of the three full-text studies including participants with CP, common inclusion criteria included age, type of CP, ability to walk independently, and adequate cognition to cooperate with gait analysis. Required gait ability varied from the ability to walk without an orthosis [2] to the ability to walk continuously for 15 min without walking aids or orthoses [23]. Noonan et al. [2] offered study participation to a wide range of subjects, including subjects from mild to severe disability, both pre- and post-operatively, with and without bracing and with varied distribution of spasticity. This sample seems likely to be representative of a typical gait laboratory population, although it is uncertain whether the patients with prior surgery had stabilised prior to study inclusion. Children with prior therapeutic intervention (therapy/casts/surgery) were excluded in the study by Steinwender et al. [23].

The quality of the descriptions of the gait participants also varied markedly across the full-text reports. Gait participants were fully described in terms of age, gender, health status and anthropometric characteristics in only seven studies. Relevant pathology-specific detail was provided in the descriptions of the CP and stroke participants, although only one of the three studies including subjects with CP reported the Gross Motor Function Classification System (GMFCS) [26] to detail gait ability. Adequate clinical participant descriptors are particularly important as they reflect the heterogeneity of the sample, and allow insight into the generalisability of the findings to other populations. The number of gait participants varied widely across full-text studies from 1 [27] to 40 [23,28], with 10 reports including more than 10 participants. Justification for the sample size of gait participants was not provided in any study.

The sampling method used to recruit assessors or the related inclusion and exclusion criteria were not reported in any of the full-text reports. Descriptions of the assessors were generally poor, with only two studies reporting the desired complete details including the number of assessors, professional background and experience or training [29,30]. Physiotherapists were the most frequently reported assessor group, with six of the reports also describing their assessors as either experienced or highly trained [9,24,28–31]. The number of assessors was frequently small, between one and five.

3.2. Study procedures

The majority of full-text reports appeared to use standardised measurement protocols on repeated occasions. Of the full-text reports, twelve studies reported capturing data at self-selected or normal speed, with one study selecting a fixed speed (of running) [32], and one study not reporting speed [27]. Metronome paced or beep test controlled speed protocols were also reported in abstracts [33,37]. Associated spatio-temporal (ST) gait measures or measures of between-session ST variation were provided in 11 of the 15 reports. Data capture systems were generally adequately described with most full-text reports providing adequate overall descriptions of the biomechanical models used, or providing appropriate reference to available descriptions. Desirable model-specific details regarding within-model options such as Knee Angle Device (KAD) utilisation, post-testing adjustment of thigh rotation angles and specification of hip joint centre location techniques were inconsistently provided.

The duration between measurement sessions varied and ranged from 2 h [24] up to 20 weeks [2], with several studies failing to state their time interval (e.g. [7,34]). Within-study standardisation of the measurement interval did not occur uniformly, with some studies reporting varied time intervals or wide within-study ranges of 6–20 weeks [2].

3.3. Statistical analysis

The coefficient of multiple correlation (CMC) or the coefficient of multiple determination (CMD) was used in eight of the 23 studies. These techniques examine consistency across the entire gait cycle and are expressed as an index of agreement between 0 and 1. Intra-class correlations (ICCs) were reported in six studies. Various forms of ICCs are available for different study designs and different methods of estimation exist with corresponding differences in the underlying assumptions and generalisability [35].

Absolute measures of measurement variation in degrees were also included in numerous reports, including standard deviations (S.D.), standard error (S.E.), range, mean absolute difference and Bland and Altman limits of agreement. The majority of studies used techniques to examine the reliability of the entire kinematic curve, with others choosing to examine selected key kinematic peaks, amplitudes or events [9,24,32,34,36,37]. Many authors also presented within-session reliability data. Schwartz et al. and Murphy et al. [7,38] reported sources of variance related to within-session (inter-trial), inter-session (within-assessor) and inter-therapist (between-assessor).
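
The review does not state the formula behind the CMC, so the following is only a minimal sketch of one commonly cited pooled form, in which the variation of each waveform about the mean curve is compared with the total variation about the grand mean. The function name `cmc` and the example curves are hypothetical and are not taken from any reviewed study; individual studies may have used other variants.

```python
import math


def cmc(waveforms):
    """Pooled coefficient of multiple correlation for K waveforms of T samples.

    Assumed form (not confirmed by the review):
    CMC = sqrt(1 - [SS_about_mean_curve / (T (K - 1))] / [SS_about_grand_mean / (K T - 1)])
    """
    k = len(waveforms)
    t = len(waveforms[0])
    if k < 2 or any(len(w) != t for w in waveforms):
        raise ValueError("need at least two waveforms of equal length")
    # Mean curve across waveforms at each time point.
    mean_curve = [sum(w[i] for w in waveforms) / k for i in range(t)]
    # Grand mean over all waveforms and all time points.
    grand_mean = sum(sum(w) for w in waveforms) / (k * t)
    # Variation of each waveform about the mean curve.
    within = sum((w[i] - mean_curve[i]) ** 2 for w in waveforms for i in range(t))
    # Total variation about the grand mean.
    total = sum((x - grand_mean) ** 2 for w in waveforms for x in w)
    if total == 0:
        return float("nan")  # flat, identical curves: index is undefined
    ratio = (within / (t * (k - 1))) / (total / (k * t - 1))
    # The index is undefined when waveform variation exceeds total variation.
    return math.sqrt(1 - ratio) if ratio <= 1 else float("nan")


# Illustrative curves (degrees): the second session is offset by 2 degrees.
large_range = [[0, 30, 60, 30, 0], [2, 32, 62, 32, 2]]
small_range = [[-4, 0, 4, 0, -4], [-2, 2, 6, 2, -2]]
cmc_large = cmc(large_range)
cmc_small = cmc(small_range)
```

With these illustrative inputs the same 2 degree offset gives a CMC of roughly 0.998 for the large-range curve and roughly 0.905 for the small-range curve, which shows why an index of this kind and an absolute error in degrees need not rank gait variables in the same order.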
Table 1
Characteristics of the identified studies of the reliability of 3DGA data.

| Study | Biomechanical model | Participant characteristics (n, age (years), type, gender) | Assessor characteristics (n, profession) | Type of reliability, session number and interval | Statistical analysis |
|---|---|---|---|---|---|
| Besier et al. [31] | Custom models; Anatomical landmark model, Functional joint model | n = 10, Age: NS, Able-bodied, 6 M; 4 F | n = 5, Discipline: NS | Inter-session (W-Ass), 2 sessions, Inter-assessor, Interval: 4 h | Analysis across GC curve: CMD, Average systematic error |
| Charlton et al. [27] | VCM, OLGA | n = 1, Age: NS, Healthy | n = 3, Discipline: PT | Inter-session (W-Ass), 3 sessions, Inter-assessor, Interval: NS | Analysis across GC curve: S.D. of averaged joint angle |
| Cowman et al. [37] (A) | Bilateral CODA mpx30 | n = 2, Age: 9, 21 years, Normal | n = 2, Discipline: PT | Inter-session (W-Ass), 3 sessions, Inter-assessor | Kinematic key points selected for analysis: Error indices (% of 95% confidence ranges) |
| Eve et al. [40] (A) | PiG | n = 10, Age: 31.4, Healthy | n = 1, Discipline: therapist | Inter-session (W-Ass), 2 sessions, Interval: Within a week | Analysis across GC curve: S.E. |
| Ferber et al. [32] | Unilateral model MOVE3D | n = 20, Age: 21.4 (mean), Healthy, 7 M; 13 F | n = 1, Discipline: NS | Inter-session (W-Ass), 2 sessions, Interval: 1 week | Kinematic key points (stance phase) selected for analysis: ICC, Mean (S.E.M.) |
| Gok et al. [25] | VCM | n = 11, Age: 32 (mean), Healthy, 11 M | n = 1 or 2, Discipline: Physician & technician | Inter-session (W-Ass), 2 sessions, Interval: 3 days | Kinematic key points selected for analysis, ICC, Wilcoxon signed ranks test |
| Gorton et al. [19] (A) | VCM | n = 50, Age: 5–16, Normal | n = 2, Discipline: NS | Inter-session (test-retest), Interval: >1 week | Analysis across GC curve: CMC |
| Gorton et al. [1] (A) | Vicon and Motion Analysis Corporation software | n = 1, Age: NS | n = 24 (in 12 labs), Discipline: clinicians | Inter-assessor, Interval: within a 3 month period | Linear mixed model, S.D. and range |
| Gorton et al. [33] (A) | Vicon and Motion Analysis Corporation software | n = 1, Age: NS | n = 24 (in 12 labs), Discipline: clinicians | Inter-assessor, Interval: within 1 month | Linear mixed model, S.D. and range |
| Growney et al. [34] | 21 marker model similar to ‘Kadaba’ model with ANALYZE | n = 5, Age: NS, Normal, 3 F; 2 M | n = 1, Discipline: NS | Inter-session (W-Ass), 3 sessions, Interval: NS; 3 separate days | Analysis across GC curve: CMC, S.D. |
| Kadaba et al. [28] | Conventional gait model | n = 40, Age: 18–40, Normal | n = 1, Discipline: NS | Inter-session (W-Ass), 3 sessions >1 week | Analysis across GC curve: CMC with mean |
| Leardini et al. [29]# | Custom model: Anatomically based protocol | n = 1, Age: 7, Healthy, F | n = 5, Discipline: PT | Inter-assessor, Interval: NS | Analysis across GC curve: Averaged S.D. |
| Mackey et al. [21] | Cleveland clinic marker set, EvA and Orthotrak (Motion Analysis | n = 10, CP; Age: 6 M, 9 ± 4; 4 F, 12 ± 3 | n = NS, Discipline: NS | Inter-session (test-retest), 2 sessions, Interval: 1 week | Analysis across GC curve: CMC (S.D.), Mean absolute difference (S.D.) |
| Maynard et al. [36] | CODA mpx30 model | For inter-session: n = 10, Age: 39.2 (mean), 5 M; 5 F. For inter-assessor: n = 19, Age: 34.4 (mean), 4 M; 15 F | For inter-session: NS. For inter-assessor: n = 3, Discipline: NS | Inter-session (W-Ass), 3 sessions, Inter-assessor, Interval: within day, & 1 week | Kinematic key points selected for analysis: Bland and Altman LOA, ICC |
| Miller et al. [20] (A) | Modified Helen Hayes | n = 10, 5 = CP, 5 = non-disabled, Age: 5–16 | n = NS, Discipline: NS | Inter-session (test-retest), 5 sessions | Analysis across GC curve: ICC |
| Monaghan et al. [9] | Unilateral CODA | n = 10, Age: 28.5 (mean), 7 F; 3 M, Healthy | n = 1, Discipline: NS | Inter-session (W-Ass), 2 sessions, Interval: 1 week | Kinematic key points selected for analysis, Bland and Altman LOA, ICC |
| Murphy et al. [38] (A) | Conventional biomechanical model | n = 3, Age: 50 (mean, S.D. = 8), Stroke | n = 3, Discipline: Clinician | Inter-session (W-Ass), 2 sessions, Interval: NS | Analysis across GC curve: Multi-level, random-effects linear regression model |
| Noonan et al. [2] | VCM, CCM/Orthotrak | n = 11, Age: 5–17, CP, 6 M; 5 F | n = 4 laboratories, Discipline: NS | Inter-assessor, Interval: 6–20 weeks | Analysis across GC curve, Discordance index, Absolute variability |
| Quigley et al. [22] (A) | NS | n = 10, CP (n = 5), Typically developing (n = 5), Age: 9.6 | n = NS, Discipline: NS | Inter-session (test-retest), 5 sessions; Interval: Each session >2 days apart | Kinematic key points selected for analysis, CV (S.D.) |
| Steinwender et al. [23] | Conventional gait model | n = 40, Healthy (n = 20), CP (n = 20), Age: 7–15 | n = 1, Discipline: NS | Inter-session (W-Ass), 3 sessions, Interval: 3 days within a week | Analysis across GC curve, CMC (S.D.) |
| Schwartz et al. [7] | VCM | n = 2, Age: 40, 36, Healthy | n = 4, Discipline: PT | Inter-session (W-Ass), 3 sessions, Inter-assessor, Interval: NS | Analysis across GC curve, Variance components estimation (S.D.) |
| Tsushima et al. [30] | VCM | n = 6, Age: 34.8 (mean), Unimpaired, 3 M; 3 F | n = 2, Discipline: PT | Inter-session (W-Ass), 2 sessions, Inter-assessor, Interval: within 2 weeks | Analysis across GC curve, CMC (S.D.) |
| Yavuzer et al. [24] | VCM | n = 20, Age: 54.2 (mean), 7 F; 13 M, Stroke | n = 1, Discipline: Technician | Inter-session (W-Ass), 2 sessions, Interval: 2 h | Analysis across GC curve and at selected kinematic key points, CV%, CMC, ICC |

(A), Abstract only; F, female; M, male; VCM, Vicon Clinical Manager; GC, gait cycle; S.D., standard deviation; SEM, standard error of measurement; W-Ass, Within-assessor; ANOVA, analysis of variance; CODA, Cartesian Optoelectric Dynamic Anthropometer; OLGA, optimised lower-limb gait analysis; LOA, limits of agreement; CCM, Cleveland Clinic Model; NS, not stated; PiG, Plug-in-Gait; PT, Physiotherapist; CMC, coefficient of multiple correlation; CMD, coefficient of multiple determination; CV%, coefficient of variation; ICC, intra-class correlation; CI, confidence interval; # data refers to Leardini et al. [29] Study 2 (inter-examiner).
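
The sample-size summary quoted in the Results (range 1 to 50, median of 10) can be re-derived from the participant column of Table 1. The sketch below is only an illustrative check: the dictionary `sample_sizes` is a hand transcription of that column, and it assumes the inter-session sample (n = 10) for Maynard et al. [36], which reports two samples.

```python
from statistics import median

# Gait participant sample sizes (n) transcribed from Table 1, one per study.
# Assumption: Maynard et al. [36] is entered with its inter-session sample.
sample_sizes = {
    "Besier [31]": 10, "Charlton [27]": 1, "Cowman [37]": 2, "Eve [40]": 10,
    "Ferber [32]": 20, "Gok [25]": 11, "Gorton [19]": 50, "Gorton [1]": 1,
    "Gorton [33]": 1, "Growney [34]": 5, "Kadaba [28]": 40, "Leardini [29]": 1,
    "Mackey [21]": 10, "Maynard [36]": 10, "Miller [20]": 10,
    "Monaghan [9]": 10, "Murphy [38]": 3, "Noonan [2]": 11, "Quigley [22]": 10,
    "Steinwender [23]": 40, "Schwartz [7]": 2, "Tsushima [30]": 6,
    "Yavuzer [24]": 20,
}

values = list(sample_sizes.values())
summary = {
    "studies": len(values),
    "smallest": min(values),
    "largest": max(values),
    "median": median(values),
}
print(summary)
```

Entering the inter-assessor sample (n = 19) for Maynard et al. [36] instead leaves the range and the median unchanged.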
Table 2
Methodological quality of the reviewed full-text studies.

| Study | Gait participants: sampling method | Gait participants: inclusion and exclusion criteria | Gait participants: description | Assessor participant description | Protocol standardisation and description | Model description | Data description | Statistical analysis |
|---|---|---|---|---|---|---|---|---|
| Besier et al. [31] | Not stated | Not stated | Partial | Partial | Adequate | Adequate | Adequate | Adequate |
| Charlton et al. [27] | Not stated | Not stated | Inadequate | Partial | Limited | Adequate | Limited | Adequate |
| Ferber et al. [32] | Not stated | Not stated | Adequate | Inadequate | Adequate | Limited | Limited | Adequate |
| Gok et al. [25] | Convenience | Stated | Partial | Partial | Adequate | Adequate | Limited | Limited |
| Growney et al. [34] | Not stated | Not stated | Partial | Inadequate | Limited | Adequate | Adequate | Adequate |
| Kadaba et al. [28] | Not stated | Limited | Partial | Partial | Limited | Adequate | Adequate | Limited |
| Leardini et al. [29] | Not stated | Not stated | Adequate | Adequate | Limited | Adequate | Adequate | Adequate |
| Mackey et al. [21] | Convenience | Stated | Adequate | Inadequate | Limited | Adequate | Adequate | Adequate |
| Maynard et al. [36] | Not stated | Stated | Partial | Inadequate | Adequate | Adequate | Adequate | Adequate |
| Monaghan et al. [9] | Not stated | Stated | Adequate | Partial | Adequate | Adequate | Adequate | Adequate |
| Noonan et al. [2] | Convenience | Stated | Partial | Partial | Limited | Adequate | Adequate | Adequate |
| Schwartz et al. [7] | Not stated | Not stated | Adequate | Partial | Limited | Adequate | Adequate | Adequate |
| Steinwender et al. [23] | Convenience | Stated | Partial | Inadequate | Adequate | Limited | Adequate | Limited |
| Tsushima et al. [30] | Not stated | Stated | Adequate | Adequate | Adequate | Adequate | Adequate | Limited |
| Yavuzer et al. [24] | Case consecutive | Stated | Adequate | Partial | Limited | Adequate | Adequate | Limited |
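
As a small illustration of how the ratings in Table 2 support the statements in Section 3.1, the first two rating columns can be tallied. The list `ratings` below is a hand transcription of those two columns; the variable names are hypothetical and the tally is a reader's check, not part of the original appraisal procedure.

```python
from collections import Counter

# (study, sampling method, inclusion and exclusion criteria), from Table 2.
ratings = [
    ("Besier [31]", "Not stated", "Not stated"),
    ("Charlton [27]", "Not stated", "Not stated"),
    ("Ferber [32]", "Not stated", "Not stated"),
    ("Gok [25]", "Convenience", "Stated"),
    ("Growney [34]", "Not stated", "Not stated"),
    ("Kadaba [28]", "Not stated", "Limited"),
    ("Leardini [29]", "Not stated", "Not stated"),
    ("Mackey [21]", "Convenience", "Stated"),
    ("Maynard [36]", "Not stated", "Stated"),
    ("Monaghan [9]", "Not stated", "Stated"),
    ("Noonan [2]", "Convenience", "Stated"),
    ("Schwartz [7]", "Not stated", "Not stated"),
    ("Steinwender [23]", "Convenience", "Stated"),
    ("Tsushima [30]", "Not stated", "Stated"),
    ("Yavuzer [24]", "Case consecutive", "Stated"),
]

# Count how often each rating appears in each column.
sampling = Counter(method for _, method, _ in ratings)
criteria = Counter(stated for _, _, stated in ratings)
print(sampling)
print(criteria)
```

The tally gives 10 of 15 reports with no stated sampling method and 8 of 15 with stated inclusion and exclusion criteria, consistent with the descriptions "frequently not reported" and "around half".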

3.4. Reliability findings: overview

The diversity in the reported studies precludes a simple synthesis of results. Meta-analysis of the results was not considered to be appropriate given the diversity among a fairly small number of studies, the varied participant ages and pathology, the marked variability in the quality, methods and selected statistical analysis and the heterogeneity of results. Under these circumstances, the review comprised a qualitative analysis of the research available, a ‘best evidence synthesis’ [39].

Limited comparisons are possible across the seven studies reporting within-assessor reliability using the CMC or CMD (see Table 3). Reliability varied widely across the studies and gait variables. Excluding pelvic tilt, very high values were typically reported for the sagittal plane data, with the transverse plane generally showing the lowest reliability (median < .72). The lowest obtained reliability indices (<.6) were reported for pelvic tilt [23,28,30], knee varus [23], and hip, knee and foot (transverse plane) [23,28,31,34].

Evaluation of the reliability indices (either CMC or ICC) across all studies confirms that sagittal plane reliability was typically higher than .8, excluding pelvic tilt. For the coronal plane, most studies reported reliability indices of >.7. The majority of studies reported indices <.7 for the transverse plane (excluding the pelvis).

Results from gait studies reported as either S.D. or S.E. provide the magnitude of error across different gait variables and are reported in Fig. 1. The diversity of study types, participants and analyses limits between-study comparisons; however grouping data in this manner is useful to look broadly at patterns. In general, sagittal plane errors were typically <4°, and coronal plane around 2°. Highest errors were seen in hip and knee rotation, and the lowest errors commonly at the pelvis in the transverse and coronal plane, and hip abduction. The pattern of these findings broadly concurs with other reports in terms of range [1,33] or ‘absolute variability’ [2,29], with reported values of hip rotation ranging from 16° [29] to 34° [33], in contrast to lower estimates of pelvic obliquity of less than 6° [1,2,29,33].

Reports of the distribution and relative error sizes across gait variables did not always coincide with the findings of studies using CMCs to assess reliability. Both statistical methods suggested that hip and knee rotation measures showed most variation, with higher errors reported and generally low CMC values. For some gait variables, however, the error magnitudes did not reflect the reported CMC indices. For example, pelvic rotation was frequently reported with relatively low error (<2°), yet showed only moderate reliability with CMC values ranging from .67 to .89 (median of .72). Similarly, knee flexion showed uniformly high CMC values, yet showed relatively larger error magnitudes, with some studies showing errors in excess of 4°.

Of the six studies reporting both within-assessor and between-assessor error, three found single assessors to be more repeatable than multiple assessors [7,27,30], one found similar repeatability [31] and a single study reported between-assessor reliability to be better than within-assessor [36]. Of the three reports comparing

Table 3
Summary of studies reporting within-assessor reliability of 3DGA, expressed as the coefficient of multiple correlation (CMC).

Plane       Variable          Besier       Besier       Gorton       Growney      Kadaba       Steinwender  Steinwender  Tsushima     Yavuzer      Median
                              et al. [31]  et al. [31]  et al. [19]  et al. [34]  et al. [28]  et al. [23]  et al. [23]  et al. [30]  et al. [24]
                              (AL)a        (FUN)a
                              Healthy      Healthy      Healthy      Healthy      Healthy      Healthy      Children     Healthy      Adults with
                              adults       adults       children     adults       adults       children     with CP      adult        stroke

Sagittal    Pelvic tilt       –            –            .79          .64          .24          .32          .56          .38          .95          .56
            Hip flexion       .97          .98          .99          .96          .98          .96          .96          .99          .89          .96
            Knee flexion      .96          .96          .99          .99          .99          .96          .96          .99          .85          .96
            Ankle d’flexion   .92          .93          .96          .98          .93          .87          .83          .98          .85          .93
Coronal     Pel obliquity     –            –                         .85          .89          .75          .73          .98          –            .85
            Hip abduction     .93          .92          –            .90          .89          .85          .76          .97          –            .89
            Knee varus/val    .80          .82          –            .74          .61          .49          .58          .79          –            .74
Transverse  Pel rotation      –            –            –            .88          .72          .67          .71          .89          –            .72
            Hip rotation      .62          .63          .91          .74          .41          .59          .57          .82          –            .62
            Knee rotation     .83          .87                       .54          .49          .34          .41          .81          –            .54
            Foot progression  –            –            –            .55          .58          .37          .49          .82          –            .55

a Besier et al. CMCs derived from coefficient of multiple determination data, L side.
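
As a hedged illustration of how CMC values such as those in Table 3 can arise, the sketch below computes a CMC for repeated waveforms of a single gait variable. It uses one common formulation (the adjusted form usually attributed to Kadaba et al. [28]); the reviewed studies may have used variants, so this is not presented as any author's exact method. The function names `cmc` and `synthetic_sessions`, the sine-wave test data and the ±2° session offsets are all assumptions made for illustration.

```python
import math


def cmc(waveforms):
    """Coefficient of multiple correlation for N repeated waveforms.

    waveforms: list of N equal-length lists of joint angles (degrees),
    one list per session, each sampled at the same T points of the gait cycle.
    Assumed formulation: sqrt(1 - within-frame variance / total variance).
    """
    n = len(waveforms)
    t = len(waveforms[0])
    if n < 2 or t < 2 or any(len(w) != t for w in waveforms):
        raise ValueError("need at least 2 waveforms of equal length >= 2")

    # Mean across sessions at each time point, and the grand mean.
    frame_means = [sum(w[k] for w in waveforms) / n for k in range(t)]
    grand_mean = sum(frame_means) / t

    # Variance about the per-frame mean (between-session disagreement).
    within = sum(
        (w[k] - frame_means[k]) ** 2 for w in waveforms for k in range(t)
    ) / (t * (n - 1))
    # Variance about the grand mean (dominated by the range of motion).
    total = sum(
        (w[k] - grand_mean) ** 2 for w in waveforms for k in range(t)
    ) / (n * t - 1)

    if total == 0 or within >= total:
        return float("nan")  # CMC is undefined in these cases
    return math.sqrt(1 - within / total)


def synthetic_sessions(amplitude_deg, offsets_deg=(-2.0, 0.0, 2.0), frames=101):
    """Hypothetical data: one sine cycle per session, shifted by a fixed offset."""
    return [
        [amplitude_deg * math.sin(2 * math.pi * k / frames) + off
         for k in range(frames)]
        for off in offsets_deg
    ]


# Same between-session offsets (about 2 degrees), different range of motion.
cmc_large_rom = cmc(synthetic_sessions(30.0))  # about 60 degrees ROM
cmc_small_rom = cmc(synthetic_sessions(2.5))   # about 5 degrees ROM
print(f"large ROM: {cmc_large_rom:.3f}, small ROM: {cmc_small_rom:.3f}")
```

With identical session-to-session offsets, the large-ROM waveform yields a CMC close to 1 while the small-ROM waveform yields a much lower value, which is one way to see why a CMC on its own says little about error in degrees.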

Fig. 1. Summary of gait studies reporting 3DGA reliability as S.D. or S.E.

Study details: Besier [31]; average systematic error between and within-assessors, Charlton [27]a*; SD PIG inter-assessor, Charlton [27]b*; SD OLGA inter-assessor, Eve [40];
SE within-assessor, Gorton [1]; SD inter-assessor, Gorton [33]; SD inter-assessor, Growney [34]; SD within-assessor (right side), Leardini [29]; SD inter-assessor, Maynard
[36]; SD diff. inter-assessor (averaged across events), Monaghan [9]; SD diff. within-assessor (averaged across events), Murphy [38]*; SD within and inter-assessor, Schwartz
[7]; SD inter-assessor. * data estimated from Figure provided.
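
Errors of the kind summarised in Fig. 1 are expressed in degrees. As a minimal sketch (not the method of any reviewed study), the code below estimates a standard error of measurement (SEM) from two sessions per participant as the S.D. of the between-session differences divided by √2, derives a 95% minimal detectable change as 1.96 × √2 × SEM, and labels the SEM using the 2° and 5° bands suggested in the Discussion. The function names, band labels and joint angles are hypothetical, and the SEM formula assumes no systematic bias between sessions.

```python
import math
import statistics


def sem_from_paired_sessions(session1, session2):
    """SEM (degrees) from two sessions: SD of paired differences / sqrt(2).

    Assumes one value per participant per session and no systematic
    shift between sessions.
    """
    if len(session1) != len(session2) or len(session1) < 2:
        raise ValueError("need two equal-length sessions with >= 2 participants")
    diffs = [b - a for a, b in zip(session1, session2)]
    return statistics.stdev(diffs) / math.sqrt(2)


def mdc95(sem):
    """95% minimal detectable change for a difference between two measurements."""
    return 1.96 * math.sqrt(2) * sem


def error_band(error_deg):
    """Label an error using the bands suggested in the Discussion (labels are ours)."""
    if error_deg <= 2.0:
        return "acceptable"   # 2 degrees or less
    if error_deg <= 5.0:
        return "reasonable"   # between 2 and 5 degrees; consider in interpretation
    return "concern"          # in excess of 5 degrees


# Hypothetical peak knee flexion (degrees) for five participants, two sessions.
visit1 = [10, 12, 8, 15, 11]
visit2 = [12, 11, 10, 18, 10]

sem = sem_from_paired_sessions(visit1, visit2)
print(f"SEM = {sem:.2f} deg, MDC95 = {mdc95(sem):.2f} deg, band = {error_band(sem)}")
```

Reporting the SEM and MDC in degrees keeps the result in the same units as the measured variable, which is the form of reporting the review argues is easiest to interpret clinically.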

the repeatability of healthy children and those with CP, no clear findings emerged. The only full-text report described the repeatability of children with CP as lower than healthy children, although the provided data show broadly comparable CMC values with higher values obtained for the CP group pelvic tilt and foot rotation [23]. In the abstract reports from Quigley et al. and Miller et al. [20,22], the normal children appeared to be slightly less consistent than those with CP, although varying across different gait variables.

4. Discussion

The diversity of study participants, methods, biomechanical modelling techniques, statistical analyses and results precludes a simple conclusion about the reliability of 3DGA. Data from studies reporting reliability indices do however suggest that the majority of studies reported moderate to good reliability for sagittal and coronal plane variables, with the exception of pelvic tilt and knee varus/valgus in some reports. Likewise, estimates of error (S.D. or S.E.; Fig. 1) suggest most studies reported error of less than 5° for all gait variables, excluding hip and knee rotation.

Whether 3DGA data is ‘reliable enough’ remains a question that can be answered only in the context of proposed use, with the degree of acceptable measurement variation relating directly to the intended application. It is clearly beyond the scope of this review to specify ‘acceptable limits’ of reliability for 3DGA data. We do however believe that in most common clinical situations error of 2° or less is highly likely to be widely considered acceptable, as such errors are probably too small to require explicit consideration during data interpretation. Errors of between 2° and 5° are also likely to be regarded as reasonable but may require consideration in data interpretation. We suggest that errors in excess of 5° should raise concern and may be large enough to mislead clinical interpretation. Data from the studies reporting error reveals that the majority of studies and gait variables show errors that fall between 2° and 5°. Hip rotation clearly shows the highest error, although it is noteworthy that some studies report lower error of <5° for this variable, suggesting that lower error is currently achievable [7,27,38,40,41]. This compares well with clinical measurements of similar variables. For example, both Fosang et al. [42] and McDowell et al. [43] report variability of between 5° and 10° in clinical assessment of sagittal plane range of movement of the major joints of the lower extremity.

4.1. Methodological considerations: participant and assessor samples

The widespread use of 3DGA as part of clinical services in clinical populations warrants careful consideration of best-quality study methodology. Appropriate sample composition and inclusion/exclusion criteria should ensure that the range of characteristics of interest in a clinical population is most likely to be present in a sample, and that the findings can be generalised. Of the four full-text reports including clinical participants [2,21,23,24], three chose convenience samples. Such samples may be susceptible to sampling bias, such as selective inclusion of the most cooperative or compliant participants. Ideally, subjects participating in a study of a measurement should consist of individuals who would be likely to undergo the test in clinical practice, and reflect a continuum of severity from mild to severe [44]. If the target population is intended to be typical clinical gait analysis service patients then recruitment strategies could consider use of a prospective cohort design with consecutive clinical subjects, such as the case consecutive sampling described by Yavuzer et al. [24]. Such designs are recognized as the best method in studies of diagnostic tests to ensure a representative sample and avoid selection bias [45]. Alternate strategies could include stratified random sampling of typical gait disorder populations.

The prevalence of healthy participants in the majority of the studies is noteworthy and contrasts with the widespread clinical and research application of 3DGA to evaluate gait disorders. Gait analysis services typically include those with gait conditions such as CP [46], spinal cord injury [47,48], spina bifida and acquired brain injury [49]. Furthermore, research studies have used 3DGA to characterise gait disorders and examine intervention efficacy in diseases such as Parkinson’s disease [50], myelomeningocele [51], and CP [52]. Generalisability of the error associated with repeated measures of healthy adults to children or those with gait pathology should be viewed with some caution. Adult gait data is generally found to be less variable than children’s [53], and younger children also more variable than older children [19]. Children with CP were more variable for some kinematic gait variables than healthy children [23]. Measures of gait data reliability are intrinsically related to the variability within the studied group [54], with measurements widely considered to be population-specific [55]. Whether estimates of error can be reasonably generalised across clinical populations should be carefully considered, in the context of the characteristics of the specific pathology, and the associated impairment and gait dysfunction characteristics. Furthermore, although different gait disorders may be associated with variable levels of intrinsic gait repeatability, it is not clear whether the nature of the gait disorder has any direct effect on procedural sources of error such as marker placement. It is likely that such errors may be related to patient-specific factors such as cognition, compliance and cooperation which may or may not be related to the gait disorder.

The potential influence of the assessor characteristics on the reliability of 3DGA data received very limited focus within the studies in this review, with generally poor detailing of assessor recruitment and descriptions. Kinematic 3D gait measurement using landmark-specific models requires specialised staff skills, including accurate and consistent placement of markers, and expert knowledge of the underlying biomechanical model. Training of clinical staff in standardised protocols is widely considered to be important [1,29]. The consistency of the measures may therefore be influenced by assessor experience, expertise, professional background and additional training [56], with experience of the clinical team potentially contributing to random error in gait data [57]. Inclusion criteria or sampling methods for assessors were not reported in any study, and it seems probable that assessors were convenience samples of staff working within the authors’ laboratories. Whether the samples were influenced by any biasing factors, such as recruitment of only the most experienced or ‘best’ assessors is uncertain. If experience or discipline-specific training is a determinant of 3DGA measurement reliability, then it is uncertain whether the results of ‘best’ assessors can be applied to other inexperienced assessors, or those from different professional backgrounds. Similarly, if the findings are from novice assessors, then the error sizes reported may be larger than those typically achieved by experienced assessors with greater expertise.

4.2. Methodological considerations: study design and procedures

Although the majority of studies described the use of standardised protocols, wide variation was apparent in the duration between measurement sessions. Justification of the time interval duration is recognized as a desirable attribute of study quality [16], but was absent in the majority of reports. Selection of an optimal interval in repeated 3DGA measures requires consideration of both practical and theoretical issues. In principle, intervals should be far apart to minimise fatigue or memory bias effects, but short enough to avoid genuine change in the measurements [16,55]. Artificially short intervals within a day are often most feasible to achieve, yet may leave visible signs of marker placement on skin to ‘unblind’ a repeat assessment or subsequent assessor, or increase the possibility that assessors may remember aspects of anthropometric measures or landmark identification. Fatigue may also cause true variations in the gait patterns of clinical subjects when measured repeatedly within a day by multiple assessors. In contrast, longer time periods of months increase the possibility that real change has occurred within the measurement interval, potentially introducing disease progression bias [17]. In clinical populations such as CP, deterioration in gait has been documented over periods of 1–2
years [58,59], and may potentially occur over shorter periods. Further, the exact level of intrinsic stability of able-bodied human gait patterns over hours, days, weeks, months or years has not been well detailed.

Blinding of assessors to prior measurements is typical practice within repeatability studies of measurement tools other than 3DGA (e.g. [60]). Although the potential for assessor bias is less apparent with instrumented measures, it remains a possible factor in some studies which good research design may minimise. It is particularly relevant to within-laboratory or within-assessor studies using biomechanical models that rely on landmark-specific marker placement and anthropometric measures. When measures are repeated over short duration intervals, assessors may recall anthropometric measures and/or bony landmark identification. Bias may also be introduced in data processing, by unblinded selective trial inclusion or post hoc data adjustment. Three authors reported efforts aiming to minimise assessor bias. Tsushima et al. [30] ensured that any traces of marker placement were absent prior to repeated marker placement, and both Maynard et al. [36] and Noonan et al. [2] reported that assessors/study sites were blind to previous measurements. We suggest that future studies reflect upon potential sources of bias within study design and when relevant consider blinding assessors to previous measurements.

Provision of concurrent ST data is a potentially useful additional indicator of the ‘true’ level of between-session gait stability. Kinematic gait patterns are known to vary with changes in walking speed [61]. Significant changes in speed or step size across sessions are therefore more likely to be associated with ‘true’ change in kinematic variability, rather than error related to inconsistent marker placement. Inspection of the reported ST data shows marked across-study differences in ST variation. Healthy adults varied little in mean walking speed (0.03 m/s; 3% Coefficient of Variation (CV)) across four measurement sessions within 2 weeks [30]. Higher variation was evident in the study of children with CP by Noonan et al. [2], which included four visits to separate laboratories over 6–20 weeks, reporting a mean absolute variability of 0.3 m/s, with a maximum absolute variability of 0.6 m/s. These wide variations suggest that the resultant discordance index may include kinematic changes due to differences in walking speed in some individuals across measurement sessions.

The data selected for the evaluation of reliability may potentially influence between-session data variation and differed markedly across studies. Measurement sessions ranged from two to five (Table 1) and trial numbers varied from a single or ‘typical’ or ‘representative’ trial [2,36], up to 10 trials (e.g. [1,9,33]). Some evidence suggests that the number of analysed trials may influence the reliability of gait measurements. Monaghan et al. [9] examined the inter-session reliability of two, four, six, eight and 10 trials, finding that reliability improved with higher trial numbers, subsequently advocating that 10 trials be used in analysis. Similarly, in a study of inter-session reliability of the kinematics of able-bodied running, Diss [62] found higher reliability indices from inclusion of five trials, in contrast to the lower values obtained from a single trial. Wide variations in methodology prevent a detailed examination of the influence of trial number within this review. It is notable, however, that the two studies including only single trials reported generally lower values of reliability [36] and larger data variability [2].

The majority of studies either stated or are presumed to have captured data in barefoot conditions, inferred from the description of skin-mounted foot markers. No study examined gait with an orthosis. It is common for children with CP or adults after stroke to wear lower limb Ankle-Foot Orthoses (AFOs). Typical clinical gait analysis for these people includes measures of gait in both barefoot and AFO conditions, requiring marker replacement for the AFO condition. The data from both gait conditions are commonly used for clinical interpretation and evaluation of AFO prescription and efficacy. Further studies are needed to examine the reliability of 3DGA data from gait conditions including orthoses.

Comparisons of the repeatability of alternate biomechanical models are likely to influence and guide the development and adoption of more reliable models. Adequate model description is therefore necessary to allow ready identification of the models used. Two studies evaluated the repeatability of alternate models with concurrent data capture. Charlton et al. [27] compared the within- and between-assessor repeatability of a new model using optimisation techniques (OLGA) with a conventional gait model (VCM), finding lower error with OLGA. In contrast, Besier et al. [31] found few differences in either within- or between-assessor repeatability of a conventional anatomical landmark model compared to a newer model with functional (motion) calibration. No conclusions can be made from this review as to whether particular models are more repeatable than others, due to the diverse methodology used, the varied statistical analysis and variable study quality.

4.3. Methodological considerations: statistical analysis

A key question in the reliability of 3DGA data is whether the measures are reliable enough for clinical decision-making. Although indices such as the CMC and ICC were commonly reported, it is now well-recognized that, in isolation, correlation indices do not tell us whether the measures are ‘reliable enough’, with even high values potentially hiding measurement errors judged to be of clinical importance [63]. Furthermore, expressing data variability as a coefficient results in units that are difficult to interpret clinically [29]. To be most useful, variability should be expressed in a manner that can be directly related to the measurement itself, in the same measurement units (e.g. degrees) [64]. This is a significant limitation to much of the existing literature, with only around half of the papers reporting error in absolute terms. Interpretation of reliability indices according to reference ranges of arbitrary ‘acceptable or unacceptable’ values also occurred in some of the studies. This is now generally regarded as unreasonable with preference that the adequacy of reliability outcomes should be reported in the context of the intended research or clinical utilisation.

The prevalence of reports using the CMC warrants particular attention, as the calculation method of the CMC is markedly influenced by the joint range of motion (ROM). As noted by previous authors [23,34], joints with large ROM typically record high CMCs, and conversely, joints with low ROM typically show poorer reliability. Furthermore, lower limb joints vary greatly in ROM across patient groups and individual patients, and subjects with gait pathology may show either increased (e.g. pelvic tilt) or reduced (e.g. knee flexion) ROM. Inspection of reported data confirms this pattern, with generally low and variable CMC values achieved for pelvic tilt (for example see [23,28,30]) and only relatively high values of >.85 reported for sagittal knee motion (Table 3). The marked variation in joint range across the lower limb seriously limits the utility of this measure, and caution is advocated in interpreting such results. We recommend that this technique should not be used in isolation in future reliability studies.

It is recommended that future studies reporting reliability of 3DGA data include absolute measures of measurement error such as the S.D., S.E.M. or alternate forms. Consideration should also be given to the investigation and development of minimum levels of detectable change (MDC), or minimal clinically important differences (MCID) [65]. Further evidence may also be sought for the responsiveness of 3DGA measures. Whether the error magnitudes are sufficiently low will be relative to the magnitude of expected intervention effect size and specific population context. Further studies are necessary in typical clinical populations to provide high quality evidence indicating whether 3DGA measures are sufficiently reliable to detect clinically important change.

5. Considerations and recommendations for future research

A number of limitations should be considered when interpreting the findings of this review. All papers were retained for inclusion regardless of study quality, in order to provide a comprehensive overview of available data. Statistical synthesis of the data was not performed. The findings of this review are limited to the published papers identified by the search strategies. Potential publication bias was not assessed and may have resulted in an over-estimation of reliability. Study quality was only reviewed by the criterion tool developed for the study purpose.

Future studies of the reliability of 3DGA require careful consideration of optimal design to enhance the generalisability of the findings. If the intention is to apply the reliability estimates to clinical populations, then careful attention is necessary to recruit and describe samples which are representative of the clinical populations of interest. Assessor recruitment and characterization warrants comparable attention. Protocols should carefully consider what standardised measurement interval is most appropriate and minimise predictable sources of assessor bias. Appropriate statistical strategies should include reliability estimates in units of degrees to enhance interpretation. Future studies should also consider evaluation of the reliability of kinetics and consider study designs that allow evaluation of the responsiveness of 3DGA. Table 4 proposes a list of factors that should be considered when designing or reporting a study of the reliability of 3DGA.

Table 4
Factors to consider when planning or reporting a 3DGA gait reliability study.

Descriptor
Participants (gait)       Eligibility criteria. Recruitment strategy.
Participants (assessors)  Eligibility criteria. Recruitment strategy.
Protocol and model        Description of setting, measurement protocol, data capture systems and biomechanical models (in sufficient detail to allow study to be repeated).
Study design              Single or multiple assessors and/or labs. Number and timing of sessions and trials within session. Standardisation of assessment intervals. Variables to be
Steps to reduce bias      Has blinding of assessors occurred if appropriate?
Sample size               How has sample size been determined?
Statistical methods       Description of statistical measurements. Do these provide outcomes with the same units as the measured variables to ensure clinical applicability of results?

Participants (gait)       Description of participant characteristics.
Participants (assessors)  Description of participant characteristics with specific emphasis on professional background and experience.
Data                      Report of basic temporal data parameters along with more complex gait data. Consider reporting estimates of variance of various sources: i.e. inter-trial, within-assessor, between-assessor etc

As an alternative to research with clinical participants, small studies using low numbers of healthy participants may also be appropriate, to more easily enable between-laboratory comparisons of specific techniques or biomechanical models. Further refinement and adoption of a ‘standard test protocol’ using methods such as those outlined by Schwartz et al. [7] may be useful. Such a protocol could specify an agreed number of trials and sessions, incorporate methods to minimise assessor bias, and adopt a specified time interval such as 1 week. This may provide a useful and more feasible approach to investigating model or technique-specific questions, prior to definitive studies in clinical populations when necessary.

This review concludes that although most errors in gait analysis are probably acceptable, they are generally not small enough to be ignored during clinical data interpretation. A goal of any clinical measurement technology must be to provide measurements that are free from any measurement error that might affect interpretation. There is thus still a need for modifying measurement techniques to reduce levels of error. Many current techniques rely heavily on the skill of assessors in accurately placing markers, and inaccurate marker placement is almost certainly the principal source of error. New techniques are now emerging based on functional calibration techniques which are, in principle, less dependent on the accuracy of marker placement (for example, see [66,67]). It is hoped that these may further reduce measurement error in clinical gait analysis. The definition of what measurement error is acceptable is, of course, dependent on the particular clinical application.

This review provides evidence that clinically acceptable errors are possible in gait analysis. Variability between studies, however, suggests that they are not always achieved and that particular care is required to achieve acceptable results.

This project was funded by a National Health and Medical Research Council Grant (ID 264597) to the Centre for Clinical Research Excellence in Gait Analysis and Gait Rehabilitation, Murdoch Childrens Research Institute, Melbourne, Australia.

Conflict of interest

Author RB has received research support funding from VICON. The other authors state there were no conflicts of interest.

References

[1] Gorton G, Hebert D, Goode B. Assessment of the kinematic variability between 12 Shriners motion analysis laboratories. Gait & Posture 2001;13:247.
[2] Noonan K, Halliday S, Browne R, O’Brien S, Kayes K, Feinberg J. Inter-observer variability of gait analysis in patients with cerebral palsy. Journal of Pediatric Orthopaedics 2003;23:279–87.
[3] Wright JG. Pro: interobserver variability of gait analysis. Journal of Pediatric Orthopaedics 2003;23:288–9.
[4] Gage JR. Con: interobserver variability of gait analysis. Journal of Pediatric Orthopaedics 2003;23:290–1.
[5] Rothstein JM, Echternach JL. Primer on measurement: an introductory guide to measurement issues. Alexandria, VA: American Physical Therapy Association (APTA); 1993.
[6] Baker R. Gait analysis methods in rehabilitation. Journal of NeuroEngineering and Rehabilitation 2006;3:4.
[7] Schwartz MH, Trost JP, Wervey RA. Measurement and management of errors in quantitative gait data. Gait & Posture 2004;20:196–203.
[8] Kallen M. Understanding reliability when using measurement instruments in the VA population. METRIC Newsletter (Measurement Excellence and Training Resources Information Center); 2005 [Fall].
[9] Monaghan K, Delahunt E, Caulfield B. Increasing the number of gait trial recordings maximises intra-rater reliability of the CODA motion analysis system. Gait & Posture 2007;25:303–15.
[10] National Health and Medical Research Council. How to review the evidence: [38] Murphy A, McGinley J, Tirosh O. Reliability of kinematic gait measurements in
systematic identification and review of the scientific literature. Canberra, adult hemiplegic stroke. In: Proceedings of the 12th annual gait and clinical
Australia: Biotext; 1999. movement analysis society; 2007.
[11] Mulrow C, Cook DJ, Davidoff F. Systematic reviews: critical links to the great [39] Deville WL, Buntinx F, Bouter LM, Montori VM, de Vet HCW, van der Windt
chain of evidence. Annals of Internal Medicine 1997;126:389–91. DAWM, et al. Conducting systematic reviews of diagnostic studies: didactic
[12] Hestbaek L, LeBoeuf-Yde C. Are chiropractic tests for the lumbo-pelvic spine guidelines. BMC Medical Research Methodology 2002.
reliable and valid? A systematic critical literature review. Journal of Manip- [40] Eve L, McNee A, Shortland A. Extrinsic and intrinsic variation in kinematic data
ulative and Physiological Therapeutics 2000;23:258–75. from the gait of healthy adult subjects. Gait & Posture 2006;24:S56–7.
[13] Jordan K. Assessment of published reliability studies for cervical spine range- [41] Schwartz MH, Viehweger E, Stout J, Novacheck TF, Gage JR. Comprehensive
of-motion measurement tools. Journal of Manipulative and Physiological treatment of ambulatory children with cerebral palsy: an outcome assess-
Therapeutics 2000;23:180–95. ment. Journal of Pediatric Orthopedics 2004;24:45–53.
[14] van der Wurff P, Hagmeijer RHM, Meyne W. Clinical tests of the sacroiliac [42] Fosang AL, Galea MP, McCoy AT, Reddihough DS, Story I. Measures of muscle
joint. A systematic review. Part 1: reliability. Manual Therapy 2000;5: and joint performance in the lower limb of children with cerebral palsy.
30–6. Developmental Medicine & Child Neurology 2003;45:664–70.
[15] Dobson F, Morris ME, Baker R, Graham HK. Gait classification in children with [43] McDowell B, Hewitt V, Nurse A, Weston T, Baker R. The variability of gonio-
cerebral palsy: a systematic review. Gait & Posture 2007;25:140–52. metric measurements in ambulatory children with spastic cerebral palsy. Gait
[16] Terwee C, Bot S, de Boer M, van der Windt D, Knol D, Dekker J, et al. Quality & Posture 2000;12:114–21.
criteria were proposed for measurement properties of health status ques- [44] Lijmer JG, Willem B, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JHP,
tionnaires. Journal of Clinical Epidemiology 2007;60:34–42. et al. Empirical evidence of design-related bias in studies of diagnostic tests.
[17] Whiting P, Rutjes A, Dinnes J, Reitsma J, Bossuyt P, Kleijnen J. Development and Journal of the American Medical Association 1999;282:1061–6.
validation of methods for assessing the quality of diagnostic accuracy studies. [45] Fritz J, Wainner R. Examining diagnostic tests: an evidence-based perspective.
Health Technology Assessment 2004;8. Physical Therapy 2001;81:1546–64.
[18] Higgins J, Green S, editors. Cochrane handbook for systematic reviews of [46] Gage JR, Koop SE. Clinical gait analysis: application to management of cerebral
interventions 4.2.6 [updated September 2006] The Cochrane Library, vol. issue palsy. In: Allard P, Stokes IAF, Blanchi J-P, editors. Three-dimensional analysis
4. Chichester, UK: John Wiley & Sons, Ltd.; 2006. of human movement. Champaign, IL: Human Kinetics; 1995. p. 349–62.
[19] Gorton G, Stevens C, Masso P, Vannah W. Repeatability of the walking patterns of normal children. Gait & Posture 1997;5:155.
[20] Miller F, Castagno P, Richards J, Lennon N, Quigley E, Njiler T. Reliability of kinematics during clinical gait analysis: a comparison between normal and children with cerebral palsy. Gait & Posture 1996;4:169–70.
[21] Mackey AH, Walt SE, Lobb GA, Stott NS. Reliability of upper and lower limb three-dimensional kinematics in children with hemiplegia. Gait & Posture 2005;22:1–9.
[22] Quigley E, Miller F, Castagno P, Richards J, Lennon N. Variability of gait measurements for typically developing children and children with cerebral palsy. Gait & Posture 1999;10.
[23] Steinwender G, Saraph V, Scheiber S, Zwick EB, Uitz C, Hackl K. Intrasubject repeatability of gait analysis data in normal and spastic children. Clinical Biomechanics 2000;15:134–9.
[24] Yavuzer G, Oken O, Elhan A, Stam HJ. Repeatability of lower limb three-dimensional kinematics in patients with stroke. Gait & Posture 2008;27:31–5.
[25] Gok H, Ergin S, Yavuzer G. Reliability of gait measurement in normal subjects. Journal Rheumatic Medicine and Rehabilitation 2002;13:76–80.
[26] Palisano R, Rosenbaum P, Walter S, Russel LD, Wood E, Galuppi B. Development and reliability of a system to classify gross motor function in children with cerebral palsy. Developmental Medicine and Child Neurology 1997;39:214–23.
[27] Charlton IW, Tate P, Smyth P, Roren L. Repeatability of an optimised lower body model. Gait & Posture 2004;20:213–21.
[28] Kadaba MP, Ramakrishnan HK, Wootten ME, Gainey J, Gorton G, Cochran GV. Repeatability of kinematic, kinetic, and electromyographic data in normal adult gait. Journal of Orthopaedic Research 1989;7:849–60.
[29] Leardini A, Sawacha Z, Paolini G, Ingrosso S, Nativo R, Benedetti MG. A new anatomically based protocol for gait analysis in children. Gait & Posture 2007 Oct;26:560–71.
[30] Tsushima H, Morris ME, McGinley J. Test-retest reliability and inter-tester reliability of kinematic data from a three-dimensional gait analysis system. Journal of the Japanese Physical Therapy Association 2003;6:9–17.
[31] Besier TF, Sturnieks DL, Alderson JA, Lloyd DG. Repeatability of gait data using a functional hip joint centre and a mean helical knee axis. Journal of Biomechanics 2003;36:1159–68.
[32] Ferber R, McClay Davis I, Williams D, Laughton C. A comparison of within- and between-day reliability of discrete 3D lower extremity variables in runners. Journal of Orthopaedic Research 2002;20:1139–45.
[33] Gorton G, Hebert D, Goode B. Assessment of the kinematic variability between twelve Shriners motion analysis laboratories Part 2: short-term follow up. Gait & Posture 2002;16:S65–66.
[34] Growney E, Meglan D, Johnson M, Cahalan T, An K-N. Repeated measures of adult normal walking using a video tracking system. Gait & Posture 1997;6:147–62.
[35] Shrout PE, Fleiss JL. Intra-class correlations: uses in assessing rater reliability. Psychology Bulletin 1979;86:420–8.
[36] Maynard V, Bakheit AMO, Oldham J, Freeman J. Intra-rater and inter-rater reliability of gait measurements with CODA mpx30 motion analysis system. Gait & Posture 2003;17:59–67.
[37] Cowman J, Jenkinson A, O’Connell P, O’Brien T. A model for establishing reliability and quantifying error associated with routine gait analysis. Gait & Posture 1998;8:79.
[47] Patrick J. Case for gait analysis as part of management of incomplete spinal cord injury. Spinal Cord 2003;41:497–582.
[48] Smith PA, Hassani S, Reiners K, Vogel LC, Harris GF. Gait analysis in children and adolescents with spinal cord injuries. Journal of Spinal Cord Medicine 2004;27:S44–9.
[49] Perry J. The use of gait analysis for surgical recommendations in traumatic brain injury. Journal of Head Trauma Rehabilitation 1999;14:116–35.
[50] Morris M, Iansek R, McGinley J, Matyas T, Huxham F. 3-Dimensional gait biomechanics in Parkinson’s disease: evidence for a centrally mediated amplitude regulation disorder. Movement Disorders 2005;20:40–50.
[51] Gutierrez E, Bartonek A, Haglund-Akerling Y, Saraste H. Kinetics of compensatory gait in persons with myelomeningocele. Gait & Posture 2005;21:12–23.
[52] Gage JR, DeLuca PA, Renshaw TS. Gait analysis: principle and applications with emphasis on its use in cerebral palsy. Instructional Course Lectures 1996;45:491–507.
[53] Stolze H, Kuhtz-Buschbeck J, Mondwurf C, Johnk K, Friege L. Retest reliability of spatiotemporal gait parameters in children and adults. Gait & Posture 1998;7:125–30.
[54] Bruton A, Conway JH, Holgate ST. Reliability: what is it, and how is it measured? Physiotherapy 2000;86:94–9.
[55] Portney LG, Watkins MP. Foundations of clinical research. Applications to practice. New Jersey: Prentice Hall Health; 2000.
[56] de Vet HCW, Terwee CB, Bouter LM. Current challenges in clinimetrics. Journal of Clinical Epidemiology 2003;56:1137–41.
[57] Davis R, Davids J, Gorton G, Aiona M, Scarborough N, Oeffinger D, Tylkowski CAB. A minimum standardized gait analysis protocol: development and implementation by the Shriners Motion Analysis Laboratory Network (SMALnet). In: Harris GF, Smith PA (Eds.). Pediatric gait: a new millennium in clinical care and motion analysis technology. IEEE; 2000.
[58] Bell KJ, Ounpuu S, DeLuca PA, Romness MJ. Natural progression of gait in children with cerebral palsy. Journal of Pediatric Orthopaedics 2002;22:677–82.
[59] Gough M, Eve LC, Robinson RO, Shortland AP. Short-term outcome of multilevel surgical intervention in spastic diplegic cerebral palsy compared with the natural history. Developmental Medicine & Child Neurology 2004;46:91–7.
[60] Watkins M, Riddle D, Lamb R, Personius W. Reliability of goniometric measurements and visual estimates of knee range of motion obtained in a clinical setting. Physical Therapy 1991;71:90–6.
[61] van der Linden ML, Kerr AM, Hazlewood ME, Hillman SJM, Robb JE. Kinematic and kinetic gait characteristics of normal children walking at a range of clinically relevant speeds. Journal of Pediatric Orthopaedics 2002;22:800–6.
[62] Diss CE. The reliability of kinetic and kinematic variables used to analyse normal running gait. Gait & Posture 2001;14:98–103.
[63] Luiz RR, Szklo M. More than one statistical strategy to assess agreement of quantitative measurements may usefully be reported (commentary). Journal of Clinical Epidemiology 2005;58:215–6.
[64] Keating J, Matyas T. Unreliable inferences from reliable measurements. Australian Journal of Physiotherapy 1998;44:5–10.
[65] Haley SM, Fragala-Pinkham MA. Interpreting change scores of tests and measures used in physical therapy. Physical Therapy 2006;86:735–43.
[66] Schwartz MH, Rozumalski A. A new method for estimating joint parameters from motion data. Journal of Biomechanics 2005;38:107–16.
[67] Reinbolt JA, Schutte JF, Fregly BJ, Koh BI, Haftka R, George A, Mitchell K. Determination of patient-specific multi-joint kinematic models through two-level optimization. Journal of Biomechanics 2005;38:621–6.