A correlation coefficient is a numerical measure of some type of linear correlation, meaning a statistical relationship between two variables.[a] The variables may be two columns of a given data set of observations, often called a sample, or two components of a multivariate random variable with a known distribution.
Several types of correlation coefficient exist, each with its own definition, range of applicability, and characteristics. They all take values in the range from −1 to +1, where ±1 indicates the strongest possible correlation and 0 indicates no correlation.[2] As tools of analysis, correlation coefficients present certain problems, including the propensity of some types to be distorted by outliers and the possibility of being incorrectly used to infer a causal relationship between the variables (for more, see Correlation does not imply causation).[3]
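The sensitivity to outliers mentioned above can be illustrated with a small sketch. It uses Pearson's r (the most common coefficient, defined as covariance divided by the product of standard deviations) on made-up data; the function name `pearson_from_moments` and the data values are assumptions chosen purely for illustration.

```python
from statistics import mean, pstdev

def pearson_from_moments(x, y):
    """Pearson's r as population covariance over the product of population standard deviations."""
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (pstdev(x) * pstdev(y))

# Illustrative data: a perfectly decreasing relationship.
x_clean = [1, 2, 3, 4, 5]
y_clean = [5, 4, 3, 2, 1]

# The same data with one extreme point appended (hypothetical outlier).
x_outlier = x_clean + [20]
y_outlier = y_clean + [20]

r_clean = pearson_from_moments(x_clean, y_clean)
r_outlier = pearson_from_moments(x_outlier, y_outlier)

print(f"without outlier: r = {r_clean:.3f}")
print(f"with outlier:    r = {r_outlier:.3f}")
```

In this example a single added point changes the sign of the coefficient, from a perfect negative correlation to a strongly positive one, even though five of the six observations still follow the decreasing pattern.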
There are several different measures for the degree of correlation in data, depending on the kind of data: principally whether the data is a measurement, ordinal, or categorical.
The Pearson product-moment correlation coefficient, also known as r, R, or Pearson's r, is a measure of the strength and direction of the linear relationship between two variables that is defined as the covariance of the variables divided by the product of their standard deviations.[4] This is the best-known and most commonly used type of correlation coefficient. When the term "correlation coefficient" is used without further qualification, it usually refers to the Pearson product-moment correlation coefficient.
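The definition above translates directly into code. The following is a minimal sketch in plain Python, not a reference implementation; the function name `pearson_r` is an assumption made for illustration. Because the same normalizing factor appears in the covariance and in both standard deviations, it cancels in the ratio, so the sketch works with raw sums of deviations.

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of deviation products; the 1/n (or 1/(n-1)) factors cancel in the ratio.
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov_xy / math.sqrt(var_x * var_y)

# Illustrative data: y is an exact increasing linear function of x.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

A perfectly increasing linear relationship gives a value of +1 (up to floating-point rounding), and a perfectly decreasing one gives −1. The coefficient is undefined when either variable has zero variance, which the sketch reports as an error.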
Intraclass correlation (ICC) is a descriptive statistic that can be used when quantitative measurements are made on units that are organized into groups; it describes how strongly units in the same group resemble each other.
Rank correlation is a measure of the relationship between the rankings of two variables, or two rankings of the same variable. Well-known examples include Spearman's rank correlation coefficient and Kendall's tau.
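One widely used rank correlation, Spearman's rank correlation coefficient, can be computed by replacing each value with its rank and applying the Pearson formula to the ranks. The sketch below assumes that tied values receive the average of the positions they occupy; the function names `average_ranks` and `spearman_rho` are hypothetical and chosen for illustration.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson coefficient computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    if den == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return num / den

# Illustrative data: y = x**3 is monotonic in x but not linear.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```

Because only the ordering of the values matters, any strictly increasing relationship yields a coefficient of +1, even when the relationship is far from linear.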
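The polychoric correlation coefficient measures association between two ordered-categorical variables. It is technically defined as the estimate of the Pearson correlation coefficient one would obtain if the two variables were measured on a continuous scale instead of as ordered categories, and if those two continuous variables followed a bivariate normal distribution.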
When both variables are dichotomous instead of ordered-categorical, the polychoric correlation coefficient is called the tetrachoric correlation coefficient.
The strength of the association between two variables is measured by values such as r or R. Correlation values range from −1 to +1, where ±1 indicates the strongest possible correlation and 0 indicates no correlation between the variables.[5]
r or R (positive) | r or R (negative) | Strength or weakness of association between variables[6] |
---|---|---|
+1.0 to +0.8 | −1.0 to −0.8 | Perfect or very strong association |
+0.8 to +0.6 | −0.8 to −0.6 | Strong association |
+0.6 to +0.4 | −0.6 to −0.4 | Moderate association |
+0.4 to +0.2 | −0.4 to −0.2 | Weak association |
+0.2 to 0.0 | −0.2 to 0.0 | Very weak or no association |
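The table can be applied mechanically, as in the sketch below. The function name `association_strength` is hypothetical. Since adjacent ranges in the table share their boundary values (0.8, 0.6, 0.4, and 0.2), the sketch assumes that a boundary value belongs to the stronger category; the table itself does not specify this.

```python
def association_strength(r):
    """Map a correlation value to the verbal label from the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    magnitude = abs(r)  # the table is symmetric in the sign of r
    # Assumption: a boundary value (e.g. exactly 0.8) goes to the stronger category.
    if magnitude >= 0.8:
        return "Perfect or very strong association"
    if magnitude >= 0.6:
        return "Strong association"
    if magnitude >= 0.4:
        return "Moderate association"
    if magnitude >= 0.2:
        return "Weak association"
    return "Very weak or no association"

for value in (0.85, -0.5, 0.1):
    print(value, "->", association_strength(value))
```

Such labels are only a rough guide: they describe the magnitude of the coefficient and say nothing about statistical significance or about causation.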