
R Practice 3

1 Variables

Dependent
Attractiveness
Independent
Alcohol consumption (one factor, 3 levels)
  • No alcohol
  • 2 pints
  • 4 pints

2 Dataset Package

WRS2 provides a dataset called goggles. The following code loads the package, loads the goggles dataframe and shows its first rows.

library(WRS2)
data(goggles)
head(goggles)

  gender alcohol attractiveness
1 Female    None             65
2 Female    None             70
3 Female    None             60
4 Female    None             60
5 Female    None             60
6 Female    None             55

summary gives a per-column overview of the data frame: counts per level for factors, and the quartiles and mean for numeric columns.

summary(goggles)
   gender      alcohol   attractiveness 
Female:24   None   :16   Min.   :20.00  
Male  :24   2 Pints:16   1st Qu.:53.75  
            4 Pints:16   Median :60.00  
                         Mean   :58.33  
                         3rd Qu.:66.25  
                         Max.   :85.00

3 psych

The psych package provides the describeBy function, which reports descriptive statistics of one variable for each group defined by another.

First, the following loads the package.

library(psych)

Secondly, it describes attractiveness according to the amount of alcohol drunk. The results are grouped by the three levels of the alcohol factor.

describeBy(goggles$attractiveness, goggles$alcohol)

 Descriptive statistics by group 
group: None
   vars  n  mean   sd median trimmed   mad min max range skew kurtosis   se
X1    1 16 63.75 8.47   62.5   63.57 11.12  50  80    30 0.29    -1.07 2.12
------------------------------------------------------------ 
group: 2 Pints
   vars  n  mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 16 64.69 9.91     65   64.64 7.41  45  85    40 0.08    -0.23 2.48
------------------------------------------------------------ 
group: 4 Pints
   vars  n  mean    sd median trimmed   mad min max range  skew kurtosis   se
X1    1 16 46.56 14.34     50   46.79 14.83  20  70    50 -0.22    -1.21 3.59
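The se column in this output is the standard error of the mean, sd / sqrt(n). As a rough cross-check, the sketch below recomputes it from the printed (rounded) values and then builds a comparable per-group table with base R only. The helper name group_stats is hypothetical, and the built-in PlantGrowth dataset is used as a stand-in so that the block runs without WRS2 or psych.

```r
# Standard error of the mean from the rounded values printed by describeBy.
sd_printed <- c("None" = 8.47, "2 Pints" = 9.91, "4 Pints" = 14.34)
n_per_group <- 16
se_check <- sd_printed / sqrt(n_per_group)
print(round(se_check, 2))

# Hypothetical helper: n, mean, sd and se per group, using base R only.
group_stats <- function(values, groups) {
  n <- tapply(values, groups, length)
  m <- tapply(values, groups, mean)
  s <- tapply(values, groups, sd)
  data.frame(n = as.vector(n), mean = as.vector(m), sd = as.vector(s),
             se = as.vector(s / sqrt(n)), row.names = names(n))
}

# Stand-in data: PlantGrowth ships with R (weight by treatment group).
stats <- group_stats(PlantGrowth$weight, PlantGrowth$group)
print(stats)
```

With goggles loaded, group_stats(goggles$attractiveness, goggles$alcohol) should give values comparable to the n, mean, sd and se columns above.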

4 Plotting

Plotting can be done with the plot function or one of its more specific variants. boxplot creates a box plot that shows the distribution of attractiveness (Y axis) for each alcohol level (X axis).

boxplot(goggles$attractiveness ~ goggles$alcohol)

boxplot.png
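A variant worth considering passes the data frame through the data argument, which keeps the formula short and lets the axis labels be set explicitly. The sketch uses the built-in PlantGrowth dataset as a stand-in (an assumption made so the block runs without WRS2); for this document's data the call would be boxplot(attractiveness ~ alcohol, data = goggles, ...).

```r
# Stand-in data: PlantGrowth ships with R (weight by treatment group).
bp <- boxplot(weight ~ group, data = PlantGrowth,
              xlab = "Treatment group", ylab = "Dried plant weight",
              main = "Weight by group")

# boxplot() invisibly returns the numbers behind the plot:
# bp$stats holds, per group, the lower whisker, lower hinge, median,
# upper hinge and upper whisker; bp$out lists points drawn as outliers.
print(bp$names)
print(bp$stats)
```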

5 Outlier values

The rapportools package provides the rp.outlier function, which searches for all extreme values in a vector. Use help(rp.outlier) for more information about the method used.

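library(rapportools)
# Check each alcohol group separately for extreme attractiveness values.
# Descriptive names avoid masking the base function c().
out_none <- rp.outlier(goggles[goggles$alcohol == "None", "attractiveness"])
print(out_none)
out_2pints <- rp.outlier(goggles[goggles$alcohol == "2 Pints", "attractiveness"])
print(out_2pints)
out_4pints <- rp.outlier(goggles[goggles$alcohol == "4 Pints", "attractiveness"])
print(out_4pints)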

NULL

NULL

NULL
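
As a cross-check that needs no extra package, the boxplot rule can be applied per group: boxplot.stats flags values lying more than 1.5 times the box length beyond the hinges. This is a different criterion from the one rp.outlier uses, so the two may disagree. The data frame toy and its values are made up for illustration.

```r
# Made-up data: group "b" contains one deliberately extreme value (120).
toy <- data.frame(
  group = factor(rep(c("a", "b"), each = 8)),
  score = c(50, 55, 60, 60, 62, 65, 70, 72,
            50, 55, 60, 60, 62, 65, 70, 120)
)

# For each group, return the values outside the boxplot whiskers
# (more than 1.5 * hinge spread beyond the lower or upper hinge).
outliers <- tapply(toy$score, toy$group,
                   function(v) boxplot.stats(v)$out)
print(outliers)
```

For the data in this document, the same call would take goggles$attractiveness and goggles$alcohol as its first two arguments.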

6 Test for Anova Requirements

Before using the Anova, some requirements must be met. The following sections show how to test the data for normality and homoscedasticity (variance homogeneity).

6.1 Normality

The Shapiro-Wilk test checks the normality of the data in each group. A p-value greater than 0.05 means that the distribution of the data is not significantly different from a normal distribution.

by(goggles$attractiveness, goggles$alcohol, shapiro.test)
goggles$alcohol: None

	Shapiro-Wilk normality test

data:  dd[x, ]
W = 0.95498, p-value = 0.5725

------------------------------------------------------------ 
goggles$alcohol: 2 Pints

	Shapiro-Wilk normality test

data:  dd[x, ]
W = 0.94489, p-value = 0.4132

------------------------------------------------------------ 
goggles$alcohol: 4 Pints

	Shapiro-Wilk normality test

data:  dd[x, ]
W = 0.952, p-value = 0.522
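
When only the decision matters, the p-values can be collected into a single named vector instead of reading them from the printed blocks. This is a minimal sketch: the helper name shapiro_by_group is hypothetical, and the built-in PlantGrowth dataset stands in for goggles so the block runs without WRS2.

```r
# Hypothetical helper: Shapiro-Wilk p-value for each group.
shapiro_by_group <- function(values, groups) {
  sapply(split(values, groups), function(v) shapiro.test(v)$p.value)
}

# Stand-in data: PlantGrowth ships with R (weight by treatment group).
p_values <- shapiro_by_group(PlantGrowth$weight, PlantGrowth$group)
print(round(p_values, 4))

# TRUE where normality is not rejected at the 0.05 level.
print(p_values > 0.05)
```

With goggles loaded, shapiro_by_group(goggles$attractiveness, goggles$alcohol) should reproduce the three p-values shown above.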

6.2 Homoscedasticity or Homogeneity of Variances

A set of variables is homoscedastic if all of them have the same finite variance. The following code applies the Bartlett, Levene and Fligner tests.

# Bartlett test (base R)
bartlett.test(goggles$attractiveness, goggles$alcohol)
# Levene test (car package)
library(car)
leveneTest(goggles$attractiveness, goggles$alcohol)
# Fligner-Killeen test (base R)
fligner.test(goggles$attractiveness, goggles$alcohol)
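
All three tests take equal variances as the null hypothesis, so a p-value greater than 0.05 gives no evidence against homoscedasticity. The sketch below collects the p-values of the two base R tests with the formula interface. PlantGrowth is used as a stand-in dataset (an assumption, so that the block runs without WRS2), and the Levene test is left out only to avoid the dependency on car.

```r
# Stand-in data: PlantGrowth ships with R (weight by treatment group).
# The formula interface keeps the response and grouping factor together.
bartlett_p <- bartlett.test(weight ~ group, data = PlantGrowth)$p.value
fligner_p  <- fligner.test(weight ~ group, data = PlantGrowth)$p.value

p_values <- c(bartlett = bartlett_p, fligner = fligner_p)
print(round(p_values, 4))

# TRUE where equal variances are not rejected at the 0.05 level.
print(p_values > 0.05)
```

For this document's data the formula would be attractiveness ~ alcohol with data = goggles.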

7 Anova

Once homoscedasticity and normality of the data are confirmed, the Anova analysis can be applied.

The following code runs the Anova and displays the results.

analysis <- aov(attractiveness ~ alcohol, data=goggles)
summary(analysis)

            Df Sum Sq Mean Sq F value   Pr(>F)    
alcohol      2   3332  1666.1   13.31 2.88e-05 ***
Residuals   45   5634   125.2                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
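
The columns of this table are related by simple arithmetic, which the following sketch reproduces from the printed values: each mean square is a sum of squares divided by its degrees of freedom, F is the ratio of the two mean squares, and Pr(>F) is the upper tail of the F distribution. Because the inputs are the rounded numbers shown above, the results match only approximately.

```r
# Values copied from the summary(analysis) table above.
ss_alcohol  <- 3332   # Sum Sq, alcohol
ss_residual <- 5634   # Sum Sq, Residuals
df_alcohol  <- 2
df_residual <- 45

# Mean squares are sums of squares divided by their degrees of freedom.
ms_alcohol  <- ss_alcohol / df_alcohol
ms_residual <- ss_residual / df_residual

# F is the ratio of the two mean squares; the p-value is the upper tail
# of the F distribution with (2, 45) degrees of freedom.
f_value <- ms_alcohol / ms_residual
p_value <- pf(f_value, df_alcohol, df_residual, lower.tail = FALSE)

print(round(f_value, 2))
print(signif(p_value, 3))
```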

7.1 Which one?

aov does not show which groups differ from each other. TukeyHSD reports, for each pair of factor levels, the difference in means, its confidence interval and an adjusted p-value. An adjusted p-value less than 0.05 means that there is a significant difference between the two levels.

TukeyHSD(analysis)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = attractiveness ~ alcohol, data = goggles)

$alcohol
                    diff        lwr       upr     p adj
2 Pints-None      0.9375  -8.650654 10.525654 0.9695381
4 Pints-None    -17.1875 -26.775654 -7.599346 0.0002283
4 Pints-2 Pints -18.1250 -27.713154 -8.536846 0.0001067
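
The lwr and upr columns can be rebuilt from the Anova table, which helps to see where they come from. The sketch below assumes equal group sizes (16 per group, as in the summary of goggles) and uses the rounded residual mean square printed earlier, so it agrees with the table only approximately.

```r
# Values copied from the outputs above.
ms_residual <- 125.2   # Mean Sq of Residuals from summary(analysis)
df_residual <- 45
n_groups    <- 3       # None, 2 Pints, 4 Pints
n_per_group <- 16

# Half-width of each 95% family-wise interval for a pairwise difference,
# based on the studentized range distribution.
q_crit     <- qtukey(0.95, nmeans = n_groups, df = df_residual)
half_width <- q_crit * sqrt((ms_residual / 2) *
                            (1 / n_per_group + 1 / n_per_group))

# Rebuild the interval for "2 Pints-None" (diff = 0.9375).
diff_2p_none <- 0.9375
interval <- c(lwr = diff_2p_none - half_width,
              upr = diff_2p_none + half_width)
print(round(interval, 2))
```

This interval contains zero, which agrees with the non-significant adjusted p-value for 2 Pints-None. The other two intervals lie entirely below zero, matching their adjusted p-values below 0.05.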

8 License of This Work

This work is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nd/4.0/.

R Practice 3 by Gimenez Christian is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.