Chapter 21 Correlation

21.1 Intro

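A correlation coefficient estimates the degree to which two variables are linearly associated. Its square is the proportion of variance in either variable that is explained by the other, so the absolute value of the coefficient is the square root of this mutual proportion of explained variance.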
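To make the link between correlation and explained variance concrete, the sketch below uses simulated data. The variables x and y, the seed, and the effect size are arbitrary assumptions chosen only for illustration. It computes Pearson's \(r\) from its definition, compares it with cor(), and checks that \(r^2\) equals the \(R^2\) of a simple linear regression, in either direction.

```r
# Simulated data, only for illustration (the seed and effect size are arbitrary)
set.seed(21)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)

# Pearson's r from its definition: the sum of cross-products of deviations,
# divided by the square root of the product of the sums of squares
# (equivalent to the covariance divided by the product of the SDs)
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
r_builtin <- cor(x, y)

# Proportion of explained variance, regressing y on x and x on y
r_squared_yx <- summary(lm(y ~ x))$r.squared
r_squared_xy <- summary(lm(x ~ y))$r.squared

all.equal(r_manual, r_builtin)       # TRUE
all.equal(r_manual^2, r_squared_yx)  # TRUE
all.equal(r_manual^2, r_squared_xy)  # TRUE
```

Because the same \(r^2\) results regardless of which variable is treated as the predictor, the proportion of explained variance is mutual.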

21.1.1 Example dataset

This example uses the Rosetta Stats example dataset “pp15” (see Chapter 1 for information about the datasets and Chapter 3 for an explanation of how to load datasets).

21.1.2 Variable(s)

From this dataset, this example uses the variables highDose_AttDesirable_long, highDose_AttDesirable_intens, highDose_AttDesirable_intoxicated, highDose_AttDesirable_energy, and highDose_AttDesirable_euphoria; these variables express which effects people prefer when using MDMA (see Chapter 1).

21.2 Input: jamovi

In jamovi, use the ‘Regression’ menu, choose ‘Correlation matrix’, and select the variables you want to include.

Figure 21.1: Opening the “Regression” menu in jamovi

To obtain confidence intervals as well, check the corresponding checkbox.

21.3 Input: R

Many analyses can be done with base R without installing additional packages. The rosetta package accompanies this book and aims to provide output similar to jamovi and SPSS with simple commands.

21.3.1 R: base R

A basic correlation matrix can be produced with cor(). If the dataset contains missing values, pass the argument use="complete.obs"; otherwise, any correlation involving a variable with missing values is itself returned as missing (NA).

cor(
  dat[
    ,
    c(
      'highDose_AttDesirable_long',
      'highDose_AttDesirable_intens',
      'highDose_AttDesirable_intoxicated',
      'highDose_AttDesirable_energy',
      'highDose_AttDesirable_euphoria'
    )
  ],
  use = "complete.obs"
);
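The effect of the use argument can be illustrated with a small, hypothetical data frame (the variables v1, v2, and v3 and their values are made up for this sketch). Besides "complete.obs", which drops every row with a missing value on any of the variables (listwise deletion), cor() also accepts "pairwise.complete.obs", which uses all available rows for each pair of variables separately.

```r
# Small hypothetical data frame with one missing value in v3
example <- data.frame(
  v1 = c(1, 2, 3, 4, 5),
  v2 = c(2, 4, 5, 4, 5),
  v3 = c(1, NA, 2, 3, 5)
)

# Default (use = "everything"): correlations involving v3 are NA
cor(example)

# Listwise deletion: row 2 is dropped for every pair of variables
cor(example, use = "complete.obs")

# Pairwise deletion: each pair uses all rows that are complete for that pair
cor(example, use = "pairwise.complete.obs")
```

With pairwise deletion, different correlations in the same matrix may be based on different subsets of participants.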

To obtain a confidence interval for a correlation, cor.test() can be used. However, this function only tests one pair of variables at a time.

cor.test(
  dat$highDose_AttDesirable_long,
  dat$highDose_AttDesirable_intens
);
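Because cor.test() handles one pair at a time, one way to obtain confidence intervals for all correlations in base R is to loop over every pair of variables. The sketch below defines a hypothetical helper, pairwise_cor_tests(), written only for this illustration (it is not part of base R or rosetta), and runs it on simulated data.

```r
# Hypothetical helper: run cor.test() on every pair of variables and
# collect the estimates, confidence intervals, and (uncorrected) p-values
pairwise_cor_tests <- function(data, vars, conf.level = 0.95) {
  pairs <- combn(vars, 2)  # each column holds one pair of variable names
  results <- lapply(seq_len(ncol(pairs)), function(i) {
    test <- cor.test(data[[pairs[1, i]]], data[[pairs[2, i]]],
                     conf.level = conf.level)
    data.frame(
      var1 = pairs[1, i],
      var2 = pairs[2, i],
      r = unname(test$estimate),
      ci_lower = test$conf.int[1],
      ci_upper = test$conf.int[2],
      p = test$p.value
    )
  })
  do.call(rbind, results)
}

# Simulated stand-in data, only so that this sketch runs on its own
set.seed(1)
sim <- data.frame(v1 = rnorm(50), v2 = rnorm(50), v3 = rnorm(50))
pairwise_cor_tests(sim, c("v1", "v2", "v3"))
```

With the example dataset, the call would be pairwise_cor_tests(dat, vars), where vars is a character vector holding the five variable names listed above. Note that these \(p\)-values are not corrected for multiple testing.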

21.3.2 R: rosetta (ufs)

The rosetta package does not yet include a correlation matrix function, but the ufs package, which is installed together with rosetta, does. Therefore, if you have rosetta installed, you can use the following command.

ufs::associationMatrix(
  dat,
  x = c(
    'highDose_AttDesirable_long',
    'highDose_AttDesirable_intens',
    'highDose_AttDesirable_intoxicated',
    'highDose_AttDesirable_energy',
    'highDose_AttDesirable_euphoria'
  )
);

This function provides the point estimates, the confidence intervals (the confidence level, \(95\%\) by default, can be set with the argument conf.level), and the associated \(p\)-values. The \(p\)-values are corrected for multiple testing, by default using the false discovery rate approach; this can be changed with the correction argument (for example, pass correction="none" to leave the \(p\)-values uncorrected). Sample sizes are printed as well if they differ between correlations, and omitted if they are the same for all correlation coefficients.
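In base R, a comparable correction can be applied to a vector of \(p\)-values with p.adjust(). The sketch below uses a few made-up \(p\)-values for illustration and compares the false discovery rate correction (method = "fdr", an alias for the Benjamini-Hochberg procedure) with no correction. Whether ufs::associationMatrix() computes its corrected \(p\)-values in exactly this way is an assumption here.

```r
# Made-up p-values, only for illustration
p_values <- c(0.001, 0.010, 0.030, 0.040)

# False discovery rate (Benjamini-Hochberg) correction
p.adjust(p_values, method = "fdr")   # 0.004 0.020 0.040 0.040

# No correction: the p-values are returned unchanged
p.adjust(p_values, method = "none")  # 0.001 0.010 0.030 0.040
```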

21.4 Input: SPSS

For SPSS, there are two approaches: using the graphical user interface (GUI) or specifying an analysis script, which SPSS calls “syntax”.

21.4.1 SPSS: GUI

Click the “Analyze” menu, then select the “Correlate” submenu, and then select “Bivariate”. Then specify the variables you’re interested in.

Figure 21.2: Opening the “bivariate” submenu in SPSS

21.4.2 SPSS: Syntax

CORRELATIONS
  /VARIABLES = 
    highDose_AttDesirable_long
    highDose_AttDesirable_intens
    highDose_AttDesirable_intoxicated
    highDose_AttDesirable_energy
    highDose_AttDesirable_euphoria
.

21.5 Output: jamovi

Figure 21.3: The produced correlation matrix in jamovi

21.6 Output: R

21.6.1 R: base

A correlation matrix (note: the variable names have been manually shortened, and the resulting correlations have been rounded to four decimal places, to make this example fit in the book):

         long intens intoxi energy euphor
long   1.0000 0.5724 0.3737 0.3885 0.4663
intens 0.5724 1.0000 0.5843 0.3476 0.3441
intoxi 0.3737 0.5843 1.0000 0.3519 0.1474
energy 0.3885 0.3476 0.3519 1.0000 0.4772
euphor 0.4663 0.3441 0.1474 0.4772 1.0000
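The shortened names in this matrix correspond to the first six characters of each variable name after the common prefix. The sketch below shows one way such a compact matrix could be produced; shorten_cor_matrix() is a hypothetical helper for this illustration, and simulated data stand in for dat so that the block runs on its own.

```r
# The five variable names used in this chapter
vars <- c(
  'highDose_AttDesirable_long',
  'highDose_AttDesirable_intens',
  'highDose_AttDesirable_intoxicated',
  'highDose_AttDesirable_energy',
  'highDose_AttDesirable_euphoria'
)

# Hypothetical helper: drop the common prefix, keep at most six
# characters, and round the correlations
shorten_cor_matrix <- function(data, vars, digits = 4) {
  short_names <- substr(sub("^highDose_AttDesirable_", "", vars), 1, 6)
  m <- cor(data[, vars], use = "complete.obs")
  dimnames(m) <- list(short_names, short_names)
  round(m, digits)
}

# Simulated stand-in for dat, only so that this sketch runs on its own
set.seed(42)
sim <- as.data.frame(matrix(rnorm(50 * 5), ncol = 5,
                            dimnames = list(NULL, vars)))
shorten_cor_matrix(sim, vars)
```

With the real data, the call would be shorten_cor_matrix(dat, vars).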

The output of cor.test(), including the confidence interval:


    Pearson's product-moment correlation

data:  dat$highDose_AttDesirable_long and dat$highDose_AttDesirable_intens
t = 10.068, df = 208, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.4737307 0.6568901
sample estimates:
      cor 
0.5724077 
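The \(t\) statistic and the confidence interval in this output can be reproduced from \(r\) and the sample size. The sketch below assumes the values printed above (\(r = 0.5724077\) and \(208\) degrees of freedom, so \(n = 210\)) and uses the Fisher \(z\)-transformation, which is how cor.test() computes the confidence interval of a Pearson correlation.

```r
# Values taken from the cor.test() output above
r   <- 0.5724077
dof <- 208
n   <- dof + 2  # the t test for a correlation has n - 2 degrees of freedom

# t statistic for testing the null hypothesis that the correlation is 0
t_stat <- r * sqrt(dof) / sqrt(1 - r^2)

# 95% confidence interval via the Fisher z-transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(t_stat, 3)  # 10.068
round(ci, 4)      # 0.4737 0.6569
```

Because the interval is computed on the \(z\) scale and then transformed back, it is not symmetric around \(r\).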

21.6.2 R: rosetta (ufs)

Note: the variable names in the first column have been adjusted to make the table fit and to make the labels consistent with those in Chapter 22.

                          Prefer long effects             Prefer intense effects          Prefer more intoxication        Prefer more energy
Prefer long effects
Prefer intense effects    r=[0.47; 0.66], r=0.57, p<.001
Prefer more intoxication  r=[0.25; 0.48], r=0.37, p<.001  r=[0.49; 0.67], r=0.58, p<.001
Prefer more energy        r=[0.27; 0.5], r=0.39, p<.001   r=[0.22; 0.46], r=0.35, p<.001  r=[0.23; 0.47], r=0.35, p<.001
Prefer more euphoria      r=[0.35; 0.57], r=0.47, p<.001  r=[0.22; 0.46], r=0.34, p<.001  r=[0.01; 0.28], r=0.15, p=.033  r=[0.37; 0.58], r=0.48, p<.001

21.7 Output: SPSS

Figure 21.4: The output produced in SPSS

21.8 Read more

If you would like more background on this topic, you can read more in these sources: