Using R for Statistics and Data Analysis
R is a versatile and powerful programming environment for statistics and data analysis, but it is far from user-friendly for the novice. To begin, install R on your computer. Versions for Windows, MacOS X, and Linux are available at http://cran.cnr.berkeley.edu/.
In these guides, R code will be shown as red text, example R output as blue text, and comments as green text, preceded by the R comment symbol #.
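As a small illustration of the comment symbol, here is a sketch of how a commented line of code might look when typed into the R window. The vector x and its values are made up for this example and do not come from the guides; the output is written as a comment here only because color is not available.

```r
# Anything after the # symbol on a line is ignored by R
x <- c(2, 4, 6, 8)   # hypothetical data: a small numeric vector
mean(x)              # arithmetic mean of the values in x
# R prints: [1] 5
```

Comments can sit on their own line or follow code on the same line; in both cases R ignores everything from # to the end of the line.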
This isn’t intended to be an exhaustive guide to R (if such a thing is even possible). The topics are chosen to address the material covered in my class, EART125 – Statistics and Data Analysis in the Geosciences.
Topics:
R Basics: Data and Data Manipulation, Variables and Functions
Data Analysis Functions: apply(), split() and sapply(), by()
Installing and Loading Add-on Packages
Statistics in R
Descriptive Statistics: mean, median, variance, standard deviation
Univariate, Parametric Statistics: t-test, F test, ANOVA, Bartlett test
Tests for Normality: Q-Q plots, Shapiro-Wilk test, Kolmogorov-Smirnov test
Univariate, Non-Parametric Statistics: Wilcoxon rank-sum/Mann-Whitney U test, Kruskal-Wallis test, Levene test/Brown-Forsythe test
Multivariate, Parametric Statistics: Hotelling T² test, MANOVA
Statistics for Categorical/Count Data: Exact binomial test, exact multinomial test, G-test, Fisher’s exact test
Correlation and Regression: Pearson r, Spearman rho, and Kendall tau, linear and logistic regression, autocorrelation and differencing
Useful Functions for EART125 (copy and paste in the R window)
G-test for goodness-of-fit or independence
Levene test/Brown-Forsythe test
Reduced major axis (RMA) and Major axis (MA) regression