First and foremost, HAVE A SYSTEM, then practice it. A common one
is known by the acronym:
OMRAD
- Objectives
- Methods
- Results
- Analysis
- Discussion/Conclusion
- Objectives
  - Do the authors have a clearly defined objective for their
    study?
- Methods
  - Design
  - Setting
  - Participants (inclusion / exclusion criteria, age range,
    underlying disease)
  - Interventions
  - Outcome measures (primary / secondary)
Plus, depending on the type of paper:
| Therapeutic | Diagnostic | Systematic review / Meta-analysis |
|---|---|---|
| Ethics | Ethics | Data source |
| Sample size | Gold standard | Study selection + inter rater obs |
| Randomization | Comparison with standard (blind & independent) | Clinical question & objectives |
| Allocation | Both tests in all | Methodology of included studies |
| Blinding | Reproducible | Weighting +/- rejection of poor quality studies |
| Follow up | Follow up | Handling of heterogeneity |
| Similar groups | | |
| Equal Rx | | |
- Results
  - Main outcome – estimate / precision
  - Secondary outcomes – estimate / precision
- Discussion & Conclusion
  - Authors’
    - Bottom line
    - Bias justification / limitation
    - External validation (compare to other studies /
      clinical practice)
  - Mine
    - Bottom line
    - Bias justification / limitation
    - Applicability (to my practice)
    - Limitation (considering my set-up)
Plus don’t forget to look at the References.
The FCEM Examination includes a Critical
Appraisal section. See the FCEM
guidance page for advice and information about the Critical
Appraisal section of the exam.
Null Hypothesis
Most research papers of value will have an objective that states
a clear hypothesis (that there is a difference between two or more
groups). The opposite of the hypothesis is the null hypothesis
(a prediction that there is no difference between the groups).
Testing a hypothesis: the P-value
We start out with the assumption that the null hypothesis is true.
We then see what, if any, difference is demonstrated between two
groups (eg treatment group & placebo group), and use statistical
tests to calculate the probability that a difference at least this
large could have arisen by chance if the null hypothesis were true.
This probability is the P-value.
The smaller the P-value, the less likely it is that the
difference arose by chance alone. If this probability is very small,
then we can reject the null hypothesis (that there is no difference).
In other words, chance is an unlikely explanation for the difference.
Whether the difference is due to the intervention also depends on
bias and confounding having been excluded.
The P-value threshold considered significant (eg P<0.05) should be
defined at the beginning of a research project. This level is known
as alpha.
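As an illustration of the steps above, here is a minimal sketch of one common test, a two-proportion z-test, which turns event counts from two groups into a two-sided P-value. The function name and the example counts are hypothetical, and the normal approximation is an assumption that is only reasonable for fairly large groups.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(events_1: int, n_1: int, events_2: int, n_2: int) -> float:
    """Two-sided P-value for a difference between two proportions.

    Assumes the null hypothesis (no difference) is true, so both groups
    share one pooled event rate, and uses a normal approximation.
    Not valid when the pooled rate is 0 or 1 (standard error is zero).
    """
    p_1 = events_1 / n_1
    p_2 = events_2 / n_2
    pooled = (events_1 + events_2) / (n_1 + n_2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / standard_error
    # Probability of a difference at least this large, in either direction.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical trial: 10/100 events on treatment, 20/100 on placebo.
alpha = 0.05  # significance threshold, fixed before the study starts
p_value = two_proportion_p_value(10, 100, 20, 100)
print(f"P = {p_value:.3f}, significant at alpha={alpha}: {p_value < alpha}")
```

The comparison with alpha happens only at the end; the threshold itself is chosen before any data are seen.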
Confidence intervals
Data collected in a research trial provides an estimate of a measurement
that we use to answer the research question. Confidence intervals
tell us how much uncertainty lies around this estimate.
Most confidence intervals are expressed as 95%: if the study were
repeated many times, about 95% of the intervals calculated in this
way would contain the true value. If the confidence interval is
narrow, the estimate is more precise.
Confidence intervals provide information about clinical significance,
whether the result is statistically significant or not, because they
show the range of effect sizes compatible with the data. P-values
do not. Confidence intervals can also be used to estimate the
likelihood of a type II error.
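To make the link between sample size and precision concrete, the sketch below computes a 95% confidence interval for a single proportion using the normal approximation (Wald) method. The function name and the counts are hypothetical; other interval methods exist and behave better for small samples or rates near 0 or 1.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(events: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = events / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * sqrt(p * (1 - p) / n)
    # Clip to the valid range for a proportion.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Same observed rate (20%), hypothetical small and large studies.
small = proportion_ci(20, 100)
large = proportion_ci(200, 1000)
print(f"n=100:  {small[0]:.3f} to {small[1]:.3f}")
print(f"n=1000: {large[0]:.3f} to {large[1]:.3f}")
```

Both studies estimate the same 20% rate, but the larger one gives a narrower interval and therefore a more precise estimate.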
Expressing Magnitude of Effect
| | Outcome: Yes | Outcome: No |
|---|---|---|
| Intervention | a | b |
| Control | c | d |
Relative Risk (or Risk Ratio)
The ratio of the probability of developing an outcome over a specified
time in the intervention group compared to the control group.
RR = EER / CER, where EER (experimental event rate) = a/(a+b) and
CER (control event rate) = c/(c+d).
Relative Risk Reduction
The proportion that an intervention reduces a harmful outcome
in comparison to patients not receiving the intervention. RRR =
[CER-EER] / CER
Absolute Risk Reduction
The difference in rates of an adverse event between study and
control populations. ARR=CER-EER
At a very simplistic level, reporting the RRR can make the treatment
sound more impressive than reporting the ARR. Both measures have
their uses, but the ARR may be more useful for decision-making in
the individual patient, particularly as it is used to calculate the
number needed to treat.
Number Needed to Treat (NNT)
Number of patients who need to be treated over a specified period
of time to achieve one additional good outcome. The inverse of
absolute risk reduction [1/ARR]
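The measures above can be worked through from the 2x2 table cells a, b, c and d. This is a minimal sketch: the class and function names and the example counts are hypothetical, and it assumes the control event rate is not zero.

```python
from dataclasses import dataclass


@dataclass
class TrialTable:
    """2x2 trial table: rows intervention/control, columns outcome yes/no."""
    a: int  # intervention, outcome yes
    b: int  # intervention, outcome no
    c: int  # control, outcome yes
    d: int  # control, outcome no

    @property
    def eer(self) -> float:
        """Experimental event rate."""
        return self.a / (self.a + self.b)

    @property
    def cer(self) -> float:
        """Control event rate (assumed non-zero below)."""
        return self.c / (self.c + self.d)


def effect_measures(table: TrialTable) -> dict[str, float]:
    arr = table.cer - table.eer
    return {
        "RR": table.eer / table.cer,
        "RRR": arr / table.cer,
        "ARR": arr,
        # NNT is undefined when there is no absolute difference.
        "NNT": 1 / arr if arr != 0 else float("inf"),
    }


# Hypothetical trials with the same RRR but very different baseline risk.
common_outcome = effect_measures(TrialTable(a=10, b=90, c=20, d=80))
rare_outcome = effect_measures(TrialTable(a=1, b=999, c=2, d=998))
for name, result in (("common", common_outcome), ("rare", rare_outcome)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

Both hypothetical trials halve the risk (RRR of 0.5), yet the NNT differs a hundredfold, which is why the ARR is often the more useful figure for an individual patient.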
Type I error
When you say there is a difference between the two groups when
actually there isn’t (ie reject the null hypothesis (NH) when it is
true).
This is more likely when the significance threshold (alpha) is set
too leniently (eg P<0.1 rather than P<0.05).
Multiple hypothesis testing, where researchers collect data without
any clear objective and then analyse the data to look for statistically
significant results, is a common cause of type I errors in poorly
planned studies. This can be hard to spot if only the positive results
are reported – critics should look for a logical flow from the objectives
and methods to identify a clear rationale for doing the test in question.
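A short calculation shows why multiple testing inflates the type I error rate. The sketch assumes the tests are independent and that every null hypothesis is true; the function name is hypothetical.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n independent tests.

    Each test has probability (1 - alpha) of correctly staying negative,
    so all n stay negative with probability (1 - alpha) ** n.
    """
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 20):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"{n_tests:>2} tests at alpha=0.05: {rate:.2f}")
```

With 20 unplanned comparisons at alpha = 0.05, the chance of at least one spurious "significant" result is roughly 64% under these assumptions.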
Type II error
There is a difference between the two groups but you fail to spot
it (ie fail to reject the NH when it is false). This is usually
because the study is underpowered (not enough participants).
The probability of a false negative result (defined as beta) depends
heavily on the sample size. All else being equal, the larger the
sample size, the smaller beta will be.
If confidence intervals are wide, estimates are imprecise and a false
negative result is more likely. If the study was powered to detect
only a large difference, when a smaller difference would still have
been clinically worthwhile, a type II error is possible.
| | Alternative hypothesis TRUE | Null hypothesis TRUE |
|---|---|---|
| Research shows significant result | True Positive | False Positive (TYPE I ERROR) |
| Research shows no significant result | False Negative (TYPE II ERROR) | True Negative |
Power
The likelihood of detecting a true difference. It is also the
probability of rejecting the null hypothesis when it is false.
The power of a study is defined as 1-beta. Conventionally, a study
should aim to recruit a sufficient sample size for the power to
be 80 or 90%.
Several factors will influence study power:
- Level at which alpha is set – 0.05 by convention
- Sample size
- Variability of the outcome measure (defined by its standard deviation)
- The minimum clinically significant difference we wish to detect
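The four factors listed above can be combined in a sample size calculation. The sketch below uses a standard normal-approximation formula for comparing the means of two equal-sized groups; it is one illustrative method, not the only one, and the function name and example values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(
    min_difference: float,  # minimum clinically significant difference
    std_dev: float,         # variability of the outcome measure
    alpha: float = 0.05,    # two-sided significance threshold
    power: float = 0.80,    # 1 - beta
) -> int:
    """Approximate participants needed in EACH of two equal groups."""
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)
    z_beta = normal.inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) * std_dev / min_difference) ** 2
    return ceil(n)


# Hypothetical outcome with SD 10, looking for a 5-point difference.
print(sample_size_per_group(5, 10))              # 80% power
print(sample_size_per_group(5, 10, power=0.90))  # 90% power
print(sample_size_per_group(2.5, 10))            # smaller difference sought
```

Halving the difference to be detected roughly quadruples the required sample size, which is why the choice of minimum clinically significant difference deserves scrutiny when appraising a paper.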
Intention to treat analysis
Patients should be analyzed in the group to which they were originally
randomized, regardless of whether they actually received the treatment
they were allocated to. This ensures that the protection from bias
created by randomization and allocation concealment is maintained.
The Hawthorne Effect
When examining changes within an organization, studies that simply
measure outcomes before and after an intervention, and then conclude
that the intervention caused the change in outcome, may be subject to
confounding by the Hawthorne Effect. Based on experiments undertaken
at the Hawthorne works of the Western Electric Company in Chicago,
this describes the observation that people change their behaviour
when they think that you are watching them. Therefore any intervention,
if subsequently monitored, may produce a recordable change in
processes or outcomes, which is lost when monitoring ceases.
General:
Accuracy
- The % of all results that are correct:
(true positives + true negatives) / all results
Hypothesis
Null Hypothesis
- A prediction that there is no difference between two groups
Validity
- Is the finding true (ie can we trust the results, have they
measured what they are supposed to)
Reliability
- Gets same results every time = reproducibility.
Generalisability
- Is the finding applicable elsewhere?
Bias
- Results are affected by systematic error
Bias leads to inaccurate estimates. Accuracy can only
be determined by examining the methods of a study and deciding
if they have led to bias.
Chance
- Results affected by random error
P values tell us how likely this is
Chance leads to imprecise estimates
Confidence intervals give us an indication of the precision of
an estimate
Confounding
- Results have been misinterpreted (ie part of the observed relationship
between two variables is due to action of a third.) A false conclusion
is drawn. Known confounders can be accounted for in the analysis,
unknown confounders cannot.
Efficacy
- Whether a treatment can work under ideal conditions
Effectiveness
- Whether a treatment does work under normal conditions
Presenting results:
Case positive
- An individual with the disease in question, i.e. the gold standard
is positive.
Case negative
- An individual without the disease in question, i.e. the gold
standard is negative.
Test positive
- An individual with a positive result for the diagnostic test
under investigation.
Test negative
- An individual with a negative result for the diagnostic test
under investigation.
True positives
- Diseased individuals who test positive
False positive
- Disease free but test positive
True negative
- Disease free and test negative
False negative
- Diseased individuals who test negative
| | Case Positive | Case Negative |
|---|---|---|
| Test Positive | A | B |
| Test Negative | C | D |
Sensitivity = A/(A+C)
- The proportion of people who have the disease who test positive
for the disease
If a test/sign has a high sensitivity, a negative result can
help rule out the diagnosis (SNout).
Specificity = D/(B+D)
- The proportion of disease free people who test negative for
the disease
If a test/sign has a high specificity, a positive result can
help rule in the diagnosis (SPin)
Sensitivity and specificity are constant when the prevalence
varies
Positive predictive value = A/(A+B)
- The probability that a patient has the condition if test is
positive
PPV increases with increasing prevalence
Negative predictive value = D/(C+D)
- The probability that a patient doesn’t have the condition if
test is negative
NPV decreases with increasing prevalence
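The definitions above can be checked with a short calculation on the 2x2 table cells A, B, C and D. This is a minimal sketch with hypothetical function names and counts; the second function recomputes the predictive values for a different prevalence while holding sensitivity and specificity fixed.

```python
def diagnostic_metrics(A: int, B: int, C: int, D: int) -> dict[str, float]:
    """A = true positives, B = false positives, C = false negatives, D = true negatives."""
    return {
        "sensitivity": A / (A + C),
        "specificity": D / (B + D),
        "ppv": A / (A + B),
        "npv": D / (C + D),
        "prevalence": (A + C) / (A + B + C + D),
    }


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """PPV and NPV for a given prevalence, from fixed sensitivity and specificity."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Hypothetical diagnostic study of 1000 patients.
m = diagnostic_metrics(A=90, B=50, C=10, D=850)
print({key: round(value, 3) for key, value in m.items()})

# Same test applied where the condition is rarer or more common.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(m["sensitivity"], m["specificity"], prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

The loop shows the pattern described above: as prevalence rises, PPV rises and NPV falls, even though the test itself has not changed.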
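Likelihood ratio for a positive test
- How much more likely a positive result is to be found in a
person with, as opposed to without, the condition.
LR+ = sensitivity / (1 - specificity)
Likelihood ratio for a negative test
- How much more likely a negative result is to be found in a
person with, as opposed to without, the condition.
LR- = (1 - sensitivity) / specificity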
| Likelihood ratio | Value of additional information |
|---|---|
| 1 | None at all |
| 0.5 – 2 | Little clinical significance |
| 2 – 5 | Moderately increases likelihood of disease. Useful additional information, but not diagnostic. |
| 0.2 – 0.5 | Moderately decreases likelihood of disease. Useful additional information, but not rule-out. |
| 5 – 10 | Markedly increases likelihood of disease. May be diagnostic if other information is supportive. |
| 0.1 – 0.2 | Markedly decreases likelihood of disease. May rule-out if other information is supportive. |
| > 10 | Diagnostic. If this does not convince you that the patient has the disease then you probably shouldn’t have done the test. |
| < 0.1 | Rules out disease. |
Relative Risk
- The risk of an event (eg death) after the experimental treatment/procedure
as a percentage of the original (standard) risk
Power of a Study
The likelihood of detecting a true difference. Usually 80-90%.
It is also the probability of rejecting the null hypothesis when it
is false.
Prevalence
- The proportion of the population with the condition of interest.
Prevalence = (A+C) / (A+B+C+D)
Type I error
- When you say there is a difference between the two groups when
actually there isn’t ie reject the NH when it is in fact true
Type II error
- There is a difference between the two groups but you fail to
spot it ie wrongly fail to reject the NH.
Books:
- Emergency Medicine Manual - good for definitions, thumbnail
versions of terms, stats etc
- How to read a paper (Greenhalgh) – Comprehensive
& very good, recommended for dipping into certain chapters
but don’t need to read or know or understand it all.
- Pocket guide to critical appraisal (Crombie) – worth buying,
easy reading with small chapters and big headings.
Internet:
- Centre for evidence based medicine
- SIGN
- Biomed Central
- Netting the Evidence
- Bandolier
Papers:
JAMA 1994; 271 carried a series of articles, the Users' Guides
to the Medical Literature, that are OK but a bit wordy and detailed.
Probably worth looking at in the library to see if they suit your
style of revision/learning.
Basic statistics for clinicians, Can Med Assoc J 1995; 152
- A series of four small, well-written papers:
  - Hypothesis testing I
  - Interpreting study results: confidence intervals
  - Assessing the effects of treatment: measures of association
  - Correlation and regression
- Sounds heavy reading but actually isn’t and is very understandable
and logical.