Correlations might be the most common statistical test run (especially early in a project). For this part of the workshop we will use the mtcars data set because it has many numerical variables we can assess for correlation. Below we will see the first 10 rows of the data set displayed.
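A minimal sketch of how to load the data and display those rows (assigning mtcars to an object named `data` is an assumption, chosen to match the object name that appears in the output later in this section):

```r
# Assign the built-in mtcars data set to `data`
# (the name used throughout the rest of this section)
data <- mtcars

# Display the first 10 rows of the data set
head(data, 10)
```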
The primary assumption of Pearson's correlation coefficient is that the data are on some kind of interval scale. However, if we wish to generalize, we must also have a large random sample (unlikely) and the individual variables should be roughly normally distributed. Normality is the assumption we will focus on for this part of the workshop.
Normality of Variables
For this part of the workshop, we are going to focus on the mpg and wt variables. We will be looking at normality both statistically as well as graphically.
A Note About Statistical Assumption Testing
Odds are your statistical test is going to fail nearly every time, particularly if you use something like a Shapiro-Wilk test, but we'll look at it anyway.
1. The psych package has a bunch of nifty functions for social science research. One is the describe() function, which gives you a bunch of variable-level summary statistics (e.g., mean, median, se).
2. The shapiro.test() function performs a Shapiro-Wilk test of normality. Keep in mind this particular test is very sensitive to sample size.
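A sketch of the code that produces the output below (the object name `data` is assumed from the `data: data$mpg` and `data: data$wt` lines in the test output):

```r
library(psych)  # provides describe(); shapiro.test() is in base R (stats)

data <- mtcars

describe(data$mpg)      # summary statistics for mpg
shapiro.test(data$mpg)  # Shapiro-Wilk normality test for mpg

describe(data$wt)       # summary statistics for wt
shapiro.test(data$wt)   # Shapiro-Wilk normality test for wt
```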
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 32 20.09 6.03 19.2 19.7 5.41 10.4 33.9 23.5 0.61 -0.37 1.07
Shapiro-Wilk normality test
data: data$mpg
W = 0.94756, p-value = 0.1229
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 32 3.22 0.98 3.33 3.15 0.77 1.51 5.42 3.91 0.42 -0.02 0.17
Shapiro-Wilk normality test
data: data$wt
W = 0.94326, p-value = 0.09265
Running An Actual Correlation
There are multiple packages and methods for calculating a correlation in R, depending on what you want to assess. The best one for psychology is probably the corr.test() function in the psych package, because it allows you to change the type of correlation you wish to compute (e.g., Spearman vs. Pearson), generate confidence intervals, and apply p-value adjustments.
1. Choose one of your variables to be your x variable.
2. Choose the other to be your y variable.
3. You can choose "pairwise" or "complete". For information on what each does, use the following function to access the documentation: ?psych::corr.test
4. You can adjust method to be other options like "spearman".
5. You can also use "bonferroni", among a few others.
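The numbered notes above annotate the arguments of a corr.test() call. A sketch of that call, with the argument values taken from the Call line in the output below (the object name `data` is an assumption):

```r
library(psych)

data <- mtcars

corr_results <- corr.test(
  x = data$mpg,        # 1. your x variable
  y = data$wt,         # 2. your y variable
  use = "pairwise",    # 3. "pairwise" or "complete"
  method = "pearson",  # 4. could instead be "spearman"
  adjust = "holm",     # 5. could instead be "bonferroni", among others
  ci = TRUE            # request confidence intervals
)
```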
Below we will see the output of the correlation results as you might be used to seeing in a program like SPSS.
print(corr_results)
Call:corr.test(x = data$mpg, y = data$wt, use = "pairwise", method = "pearson",
adjust = "holm")
Correlation matrix
[1] -0.87
Sample Size
[1] 32
These are the unadjusted probability values.
The probability values adjusted for multiple tests are in the p.adj object.
[1] 0
To see confidence intervals of the correlations, print with the short=FALSE option
While the above is great, notice we didn't get a confidence interval in the output despite asking for it with ci = TRUE. Sometimes R stores additional computations within the output object (e.g., corr_results) without printing them. To access this output, we can type a $ after the object's name. If there is extra information stored but not shown, we'll get a drop-down box listing the available elements. We want the ci element. Below we will see the output that results from this. We should see the following:
\(r\) = -.87 , \(p\) < .001 with a CI = [-.93,-.74]
print(corr_results$ci)
lower r upper p
NA-NA -0.9338264 -0.8676594 -0.7440872 1.293959e-10