UNIVERSITY OF SOUTH CAROLINA





Glossary of Research and Statistical Terminology


This glossary presents definitions for commonly used research and statistical terms.
There are two ways to access the terms:



  1. You can find them grouped by type in one of the four categories in the drop-down menu below.







  2. If you know the term for which you need a definition, you can find it in the alphabetical listing below:

Alphabetical Listing





Accuracy:
The overall accuracy is the percentage of correctly classified outcomes.



Attributable Risk:
The difference in the rate of a condition between the exposed and unexposed populations.
This difference is attributed to the exposure. (Table 1).

Table 1




In epidemiology, attributable risk is the difference in the rate of a disease/outcome for exposed and unexposed populations. This calculation helps illustrate whether an exposure is related to the particular disease/outcome.

Ex. Cohort Study of Smoking and Coronary Heart Disease (CHD) among Medicaid Recipients

                            Developed CHD    Did Not Develop CHD    Total    Incidence per 1,000 per Year
Smoked Cigarettes           115              3,000                  3,115    36.9
Did Not Smoke Cigarettes    132              5,200                  5,332    24.8


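Incidence among Exposed (Smokers) = 115 / 3,115 × 1,000 = 36.9 per 1,000 per year

Incidence among Unexposed (Non-Smokers) = 132 / 5,332 × 1,000 = 24.8 per 1,000 per year


The incidence in the exposed group, which is attributable to the exposure, is calculated as follows:

Attributable risk = 36.9 - 24.8 = 12.1 per 1,000 per year

The proportion of the total incidence in the exposed group, which is attributable to the exposure, is calculated by:

(36.9 - 24.8) / 36.9 = 0.328 = 32.8%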



Amongst Medicaid recipients, 32.8% of the morbidity from CHD among smokers may be attributable to smoking.
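
Ex. The attributable risk calculation can be sketched in Python as follows. This is a minimal illustration using the counts from the smoking and CHD table; the function name is hypothetical, and the incidences are rounded to one decimal before subtracting (an assumption made so the result matches the rounded figures in the table).

```python
def incidence_per_1000(cases, total):
    """New cases per 1,000 persons, rounded to one decimal as in the table."""
    return round(cases / total * 1000, 1)

# Counts from the smoking and CHD cohort example (smokers are the exposed group).
exposed = incidence_per_1000(115, 3115)      # 36.9
unexposed = incidence_per_1000(132, 5332)    # 24.8

# Attributable risk: difference in incidence between exposed and unexposed.
attributable_risk = exposed - unexposed

# Share of the incidence among the exposed that is attributable to the exposure.
attributable_proportion = attributable_risk / exposed

print(f"{attributable_risk:.1f} per 1,000 per year")   # 12.1 per 1,000 per year
print(f"{attributable_proportion:.1%}")                # 32.8%
```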

 



Case-Control:
In an observational case-control study, the researcher looks through extant data and randomly selects cases and controls. Because the researcher specifically looks for and includes cases and controls, the proportion of cases in the sample is pre-determined. Thus, one cannot estimate a risk ratio. After selection based only on case-control status, each person's exposure is determined.



Cross-sectional:
In a cross-sectional study, data are collected at a single point in time. In a repeated cross-sectional study, data are collected at multiple time periods (usually at regular intervals), but the data are not collected from the same sampling units.



Cumulative incidence:
The number of new cases in a specific time interval divided by the number of persons at risk.

Ex. In 2010, the population of women ages 35-49 who were breast cancer free was 135,000. If 1,000 of those women develop breast cancer over 1 year of observation, the cumulative incidence of breast cancer is 1,000 / 135,000 = 7.41 breast cancer cases per 1,000 women at risk (0.741%).
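
Ex. The same arithmetic as a short Python sketch. The function name is hypothetical; the counts are those of the breast cancer example above.

```python
def cumulative_incidence(new_cases, persons_at_risk):
    """New cases in the interval divided by persons at risk at its start."""
    return new_cases / persons_at_risk

ci = cumulative_incidence(1_000, 135_000)

print(f"{ci * 1000:.2f} per 1,000")  # 7.41 per 1,000
print(f"{ci:.3%}")                   # 0.741%
```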




Ecological Study:
A study for which the unit of analysis is the population rather than the individual. This type of study might compare outcomes, for example, in different countries.




Effectiveness:
RCTs are sometimes designed to investigate whether there is evidence in favor/against a drug/device/intervention when recruiting relatively arbitrary participants in flexible conditions. These trials focus on general practice.




Efficacy:
RCTs are sometimes designed to investigate whether there is evidence in favor/against a drug/device/intervention when recruiting highly selected participants in highly controlled conditions.




Efficiency:
The efficiency of a test is the percentage of the times that the test gives the correct answer compared to the total number of tests (Table 2).

Table 2

                                    Test Result (T)
True Status (D)     Positive (+)           Negative (-)
Disease (+)         a (True Positive)      b (False Negative)
No Disease (-)      c (False Positive)     d (True Negative)
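
Ex. Using the cell labels a, b, c, d of Table 2, efficiency is (a + d) / (a + b + c + d). A minimal Python sketch follows; the function name and the counts are hypothetical and serve only to illustrate the formula.

```python
def efficiency(a, b, c, d):
    """Share of all tests that gave the correct answer.

    a = true positives, b = false negatives,
    c = false positives, d = true negatives (labels as in Table 2).
    """
    return (a + d) / (a + b + c + d)

# Hypothetical counts, not taken from the glossary.
print(efficiency(a=80, b=20, c=10, d=90))  # 0.85
```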






Hazard ratio (HR): The ratio of two hazard rates corresponding to two conditions (e.g., male versus female, or exposed versus unexposed). The hazard rate is the rate of events at time t conditioned on not having the event before time t.




Incidence rate:
The number of new occurrences of a specific outcome in a specific time interval divided by the total person-time at risk during that interval.

Ex. In 2010, the average Medicaid population was 972,000 and there were 3,500 deaths over 1 year of observation, so the incidence rate of death is 3.60 deaths per 1,000 person-years.
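
Ex. A short Python sketch of the calculation. The function name is hypothetical, and person-time is approximated as the average population times the years of observation, which is an assumption of this illustration.

```python
def incidence_rate_per_1000(events, average_population, years):
    """Events per 1,000 person-years, with person-years approximated
    as average population multiplied by years of observation."""
    person_years = average_population * years
    return events / person_years * 1000

rate = incidence_rate_per_1000(events=3_500, average_population=972_000, years=1)
print(f"{rate:.2f} deaths per 1,000 person-years")  # 3.60 deaths per 1,000 person-years
```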



Incidence rate ratio (IRR):
The ratio of two incidence rates which is used for comparison in regression models of count (incidence) outcomes.




Longitudinal:
In a longitudinal study, data for each sampling unit (typically a person in health studies) is collected repeatedly over time. The measures may or may not be at regular time intervals.






Negative Predictive Value (NPV):
The negative predictive value is the probability that non-cases really are non-cases (Table 2).
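
Ex. With the Table 2 cell labels, NPV = d / (b + d): the true negatives divided by everyone who tested negative. A minimal Python sketch with hypothetical counts:

```python
def negative_predictive_value(b, d):
    """b = false negatives, d = true negatives (labels as in Table 2)."""
    return d / (b + d)

# Hypothetical counts, not taken from the glossary.
npv = negative_predictive_value(b=20, d=90)
print(round(npv, 3))  # 0.818
```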





Nested case-control:
A case-control study taken from within a (larger) panel study.




Odds:
The ratio of the probability of success to the probability of failure (Table 1).

Table 1



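Let p = probability of an event
1-p = probability of that event not occurring
Odds = p / (1-p)


The odds of disease among those who have been exposed: P(disease | exposed) / (1 - P(disease | exposed))

The odds of disease among those who were not exposed: P(disease | not exposed) / (1 - P(disease | not exposed))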
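
Ex. A minimal Python sketch of the odds calculation. The function name is hypothetical, and the counts are borrowed from the smoking and CHD cohort example under Attributable Risk purely for illustration.

```python
def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)

# Probability of CHD in each group, from the smoking and CHD cohort example.
p_exposed = 115 / 3115      # smokers
p_unexposed = 132 / 5332    # non-smokers

odds_exposed = odds(p_exposed)       # equals 115 / 3,000
odds_unexposed = odds(p_unexposed)   # equals 132 / 5,200

print(round(odds_exposed, 4), round(odds_unexposed, 4))  # 0.0383 0.0254
```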




Odds Ratio (OR):
The ratio of two odds. The odds of cancer for males versus the odds of cancer for females is the odds ratio of cancer for males versus females. This measure relates the relative ratio of success to failure for one condition versus another. Generally, people are better able to think in terms of relative risk than they are in terms of relative odds (Table 1).

Table 1

The odds ratio compares the odds of disease in exposed versus non-exposed persons:

OR = (odds of disease among exposed) / (odds of disease among non-exposed)

Interpretation of OR:

OR < 1 lower odds of disease for exposed individuals (“exposure is protective or negatively associated with disease”)
OR = 1 no difference in odds of disease
OR > 1 increased odds of disease for exposed individuals (“exposure is positively associated with disease”)
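
Ex. A minimal Python sketch of an odds ratio computed from the four cells of an exposure-by-disease table. The function and parameter names are hypothetical; the counts are borrowed from the smoking and CHD cohort example under Attributable Risk.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds of disease among the exposed divided by odds among the unexposed."""
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    return odds_exposed / odds_unexposed

# Smokers: 115 developed CHD, 3,000 did not. Non-smokers: 132 and 5,200.
or_chd = odds_ratio(115, 3000, 132, 5200)
print(f"{or_chd:.2f}")  # 1.51
```

Here the value is above 1, so in this example smoking is positively associated with CHD.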





Panel study:
A form of a longitudinal study (sometimes called a cohort study) in which groups are followed over time. The groups are formed to differ only on certain key variables.



Positive Predictive Value (PPV):
The positive predictive value (also known as the precision) is the probability that predicted cases really are cases (Table 2).
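
Ex. With the Table 2 cell labels, PPV = a / (a + c): the true positives divided by everyone who tested positive. A minimal Python sketch with hypothetical counts:

```python
def positive_predictive_value(a, c):
    """a = true positives, c = false positives (labels as in Table 2)."""
    return a / (a + c)

# Hypothetical counts, not taken from the glossary.
ppv = positive_predictive_value(a=80, c=10)
print(round(ppv, 3))  # 0.889
```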





Power:
The power of a test (1-β) is the probability that it will reject the null hypothesis when the null hypothesis is false (Table 3).

Table 3

                                    Reality
Decision             H0 is true          H0 is false
Reject H0            Type I (α)          Correct decision
Fail to Reject H0    Correct decision    Type II (β)
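
Ex. To make the relationship between α, β, and power concrete, the sketch below computes the power of one particular test: a one-sided, one-sample z-test with known standard deviation. The choice of test, the function name, and the input values are all assumptions made for illustration; other tests need other formulas.

```python
from statistics import NormalDist

def z_test_power(effect, sd, n, alpha=0.05):
    """Power (1 - beta) of a one-sided z-test of H0: mean = 0 vs. H1: mean > 0.

    effect: true mean under the alternative; sd: known standard deviation;
    n: sample size; alpha: Type I error rate.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)   # rejection cutoff under H0
    shift = effect * n ** 0.5 / sd             # mean of the z statistic under H1
    return 1 - NormalDist().cdf(z_crit - shift)

# Hypothetical inputs.
power = z_test_power(effect=0.5, sd=1.0, n=25)
beta = 1 - power                               # Type II error rate
print(round(power, 2), round(beta, 2))
```

When the true effect is zero, the function returns α, which is the Type I error rate in Table 3.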



Prevalence rate:
The total number of existing cases at a specific time divided by the total population at that time. This measures how common a disease/outcome is at a point in time.

Ex. In 2010, there were 7,500 children (ages birth-18 years) who had paid claims associated with a primary diagnosis of obesity in the Medicaid population in SC. The total population of children ages birth-18 years was 125,000, therefore the prevalence of obesity in children in the Medicaid population in SC is 6%.
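
Ex. The same calculation as a short Python sketch. The function name is hypothetical; the counts are those of the childhood obesity example above.

```python
def prevalence(existing_cases, population):
    """Existing cases divided by the total population at the same point in time."""
    return existing_cases / population

p = prevalence(7_500, 125_000)
print(f"{p:.0%}")  # 6%
```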





Prospective Cohort: In a prospective cohort study, the researcher identifies a cohort of persons based on whether they were exposed (and none of them are already cases). Then, the entire cohort is followed over time where some proportion of the population will become cases.





Randomized controlled trial:
The preferred design for a clinical trial used to examine the efficacy of a drug/intervention/device. In RCTs, subjects are first accepted into the study and then assigned to one of the treatment arms. RCTs are typically broken into several levels of investigation (especially when the focus is on a drug).




Randomized clinical trials:
A randomized clinical trial (RCT) is one in which subjects who satisfy the eligibility criteria are randomly assigned to one of the treatment groups under study. This randomization helps balance known and unknown prognostic factors.



Relative risk (RR):
This is the risk of an event (e.g., a health outcome) among those exposed relative to the risk among those not exposed. For example, one might be interested in the risk of cancer relative to smoking; the RR is the proportion of the exposed sample that develops into cases divided by the proportion of the unexposed sample that develops into cases (Table 1).

Table 1


RR = (incidence among exposed) / (incidence among unexposed)


Interpretation of Relative Risk:

RR < 1 -> Negative association between exposure and disease
Non-exposed group has higher incidence than exposed group
RR = 1 -> No association between exposure and disease
Incidence rates are identical between groups
RR > 1 -> Positive association between exposure and disease
Exposed group has higher incidence than non-exposed group
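
Ex. A minimal Python sketch of the relative risk calculation. The function and parameter names are hypothetical; the counts are borrowed from the smoking and CHD cohort example under Attributable Risk.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Proportion of the exposed who become cases divided by the
    proportion of the unexposed who become cases."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# Smokers: 115 cases out of 3,115. Non-smokers: 132 cases out of 5,332.
rr = relative_risk(115, 3115, 132, 5332)
print(f"{rr:.2f}")  # 1.49
```

The value is above 1, so in this example the exposed group (smokers) has the higher incidence.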




Retrospective Cohort:
In a retrospective study, the researcher identifies a cohort of persons based on whether they were exposed. The researcher looks to see whether those persons subsequently became cases.



Receiver Operator Characteristic (ROC) Curve:
The ROC curve is a graph of the true positive rate (sensitivity) versus the false positive rate (one minus specificity) at various threshold settings. A predictive model that performs no better than chance will have an area under the ROC curve of one half, and a perfect predictive model will have an area under the curve of one.
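
Ex. The sketch below builds the ROC points by sweeping the threshold over every observed score and then estimates the area under the curve with the trapezoidal rule. The function names, scores, and labels are hypothetical; a score at or above the threshold is assumed to be classified as a case.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    labels: 1 = case, 0 = non-case. Thresholds run from highest to lowest score.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing classified as a case
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal-rule area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical predicted scores and true statuses.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

auc = area_under_curve(roc_points(scores, labels))
print(round(auc, 3))  # 0.889
```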





Sensitivity:
The number of true positives divided by the sum of the number of true positives and the number of false negatives. This is the proportion of items that really are positive which are classified as positive (Table 2).
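
Ex. With the Table 2 cell labels, sensitivity = a / (a + b). A minimal Python sketch with hypothetical counts:

```python
def sensitivity(a, b):
    """a = true positives, b = false negatives (labels as in Table 2)."""
    return a / (a + b)

# Hypothetical counts: 100 persons with the disease, 80 of whom test positive.
print(sensitivity(a=80, b=20))  # 0.8
```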





Specificity:
The number of true negatives divided by the sum of the number of true negatives and the number of false positives. This is the proportion of items that really are negative which are classified as negative (Table 2).
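
Ex. With the Table 2 cell labels, specificity = d / (c + d). A minimal Python sketch with hypothetical counts:

```python
def specificity(c, d):
    """c = false positives, d = true negatives (labels as in Table 2)."""
    return d / (c + d)

# Hypothetical counts: 100 persons without the disease, 90 of whom test negative.
print(specificity(c=10, d=90))  # 0.9
```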





Threshold setting:
The threshold setting is the value that classifies a continuous measure into two categories (non-case and case).



Type I Error:
In a statistical hypothesis test, there is a null hypothesis (H0) assumed to be true. Evidence is assessed to determine whether the null hypothesis should be rejected in favor of the alternative hypothesis. A Type I error occurs when we reject the null hypothesis when the null hypothesis is actually true. The probability of a Type I error (α) is also referred to as the significance level (Table 3).






Type II Error:
In a statistical hypothesis test, there is a null hypothesis (H0) assumed to be true. Evidence is assessed to determine whether the null hypothesis should be rejected in favor of the alternative hypothesis. Type II error (β) occurs when we fail to reject the null hypothesis when the null hypothesis is actually false (Table 3).






Youden's J:
A summary of 2x2 tables, this statistic is the sensitivity plus the specificity minus 1. This statistic can be used to choose the optimal threshold setting for classifying predicted values into non-cases and cases.
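
Ex. The sketch below computes Youden's J at every candidate threshold and keeps the threshold with the largest J. The function names, scores, and labels are hypothetical; a score at or above the threshold is assumed to be classified as a case, and ties are resolved in favor of the lowest threshold.

```python
def youden_j(sensitivity, specificity):
    """Youden's J: sensitivity plus specificity minus 1."""
    return sensitivity + specificity - 1

def best_threshold(scores, labels):
    """Return (threshold, J) for the threshold that maximizes Youden's J.

    labels: 1 = case, 0 = non-case.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = youden_j(tp / positives, tn / negatives)
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Hypothetical predicted scores and true statuses.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 1, 0, 1, 0]

threshold, j = best_threshold(scores, labels)
print(threshold, j)  # 0.7 0.75
```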
