Basic Statistics for Applied Linguists with R
Stefan Th. Gries
Professor, University of California, Santa Barbara
This workshop will familiarize participants with the statistical programming language R and show how to use it for (1) importing and processing, (2) describing, (3) visualizing, and (4) analyzing linguistic data; it is aimed at beginners.
As for the first item, we will briefly discuss R's four most important data structures and how spreadsheet data are loaded into R and prepared for subsequent steps. In considering issues of description, we will turn to a variety of basic descriptive statistics used for categorical and numeric data, including frequencies, central tendencies, dispersions, and correlations. Next we will explore the visualization of data in a variety of simple but useful ways, such as dotcharts, boxplots, ecdf plots, and scatterplots. The emphasis will be on creating self-sufficient plots that can draw attention to trends or important data points in a data set. Then, in terms of analyzing linguistic data, we will discuss a variety of analytical scenarios that are frequently encountered in applied linguistics research.
Unlike many introductory courses/workshops, however, this workshop will not deal with these scenarios in terms of simple monofactorial tests (chi-squared tests, t-tests, Pearson's r, etc.). Instead, we will explore all these tests from a regression perspective. This approach may appear more complex than learning simple functions for simple tests. However, it is superior in that it shows how many statistical tests typically taught separately can in fact be viewed as only slightly different instantiations of a more general logic: generalized linear modeling. In addition, this approach better sets the stage for subsequent exploration of multifactorial regression modeling in the participants' own future work.
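As a minimal sketch of this regression perspective, the R code below runs a two-sample t-test and then fits a linear model with a single binary predictor to the same data. The data and the names dur and grp are invented for illustration and are not taken from the workshop materials; the t-test assumes equal variances so that the two analyses correspond exactly.

```r
# Hypothetical toy data: durations (in ms) for two made-up speaker groups
dur <- c(310, 295, 330, 305, 320, 350, 365, 340, 355, 370)
grp <- factor(rep(c("L1", "L2"), each = 5))

# Classical route: two-sample t-test assuming equal variances
tt <- t.test(dur ~ grp, var.equal = TRUE)

# Regression route: linear model with group as a binary predictor
m <- lm(dur ~ grp)
coefs <- summary(m)$coefficients

# The coefficient for grpL2 is the difference between the two group means
coefs["grpL2", "Estimate"]

# Its t-value matches the t-test statistic in absolute size
# (the sign depends on which group is subtracted from which)
abs(coefs["grpL2", "t value"])
abs(unname(tt$statistic))

# The p-values of the two analyses are identical
coefs["grpL2", "Pr(>|t|)"]
tt$p.value
```

The same logic extends to other monofactorial tests: swapping lm() for glm() with a suitable family argument is one way to cover categorical responses, though which models the workshop itself uses is not stated here.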
Stefan Th. Gries is currently (Full) Professor in the Department of Linguistics at the University of California, Santa Barbara. Gries is Honorary Liebig-Professor at the Justus-Liebig-Universität Giessen, Visiting Chair in the ESRC Centre for Corpus Approaches to Social Science, Lancaster University, and was a Visiting Professor at the 2007, 2011, 2013, and 2015 LSA Linguistic Institutes.
Theoretically, he is a cognitively-oriented usage-based linguist (with an interest in Construction Grammar) in the wider sense of seeking explanations in terms of cognitive processes, influenced in particular by R. Harald Baayen, Douglas Biber, Nick C. Ellis, Adele E. Goldberg, and Michael Tomasello.
Methodologically, Gries is a quantitative corpus linguist at the intersection of corpus linguistics, cognitive linguistics, and computational linguistics. He has used a variety of statistical methods to investigate linguistic topics such as morphophonology, syntax, the syntax-lexis interface, and semantics; corpus-linguistic methodology (corpus homogeneity and comparisons, association and dispersion measures, n-gram identification and exploration, and other quantitative methods); and first and second/foreign language acquisition. Occasionally, he has also used experimental methods (acceptability judgments, sentence completion, priming, self-paced reading times, and sorting tasks). His most recent work is particularly concerned with corpus research on second/foreign language learning and native vs. indigenized (South Asian) varieties of English; in particular, he has explored alternation phenomena, which have been widely studied using native speaker data, with an eye to determining to what degree non-native speakers exhibit similar patterns and preferences in their linguistic choices and how best to analyze them corpus-linguistically and statistically. He has also been involved in research on turn-taking in narratives (how patterned are narrators' turns and how much are they correlated with turn share?) and in literary linguistics (do perceptions of climaxness in literary works correlate with tense switches?).