Degrees of difficulty to learn an additional language: The role of typological distinctions and linguistic distances between the languages involved

Job Schepens, Technische Universität Dortmund
Frans van der Slik, Radboud University
Roeland van Hout, Radboud University

Discussant: Scott Jarvis, University of Utah



Transfer plays an important role in second language acquisition (SLA). Humans can quickly perform quite well on new but similar tasks, such as learning an additional language that resembles a previously learned one. Conversely, the difficulty of learning a new language depends on the typological distinctions and the linguistic distance between the languages involved.

New approaches are currently being developed that might present opportunities for a closer understanding of the learning mechanisms underlying transfer in SLA. Big data and its tools often play an important role in these developments and their applications, in the following ways:

  • Language testing institutions can provide access to language proficiency testing scores for learners with diverse language backgrounds and learning trajectories (age and exposure).

  • Educational technology generates similarly large numbers of linguistic constructions from language learners with diverse backgrounds.

  • Typological databases make it easier to quantify linguistic differences across many languages.

  • Computational tools are available for SLA research, such as mixed effects modeling, NLP, and Bayesian modeling.

These innovations have resulted in new theoretical perspectives that quantify the roles of linguistic similarities or distances. Recent research suggests that the linguistic starting points of the learners determine aspects across all domains of language proficiency. Varying types of similarity also seem to have varying impacts on language learnability. This colloquium showcases new research on the different types of similarity and highlights the implications for additional language learning.

Capturing the role of L1 experience in L2 learning
Florian Jaeger, University of Rochester

An adult learner’s native language (L1) has a tremendous influence on the difficulty they experience when acquiring a second or other language (L2). Recent estimates attribute as much as 69% of the explained variance in L2 speaking proficiency to learners’ L1 background (Schepens et al., 2019). However, how best to describe or model the influence of L1 knowledge and experience on L2 learning has remained a challenge. This holds, in particular, for the most fine-grained aspects of L1 knowledge, such as the implicit knowledge about the mapping from phonological categories or words onto acoustic dimensions (e.g., voice onset time, energy formants), that is, the knowledge that allows listeners to recognize the basic building blocks of language.

I will use a related, but simpler, learning problem—native speakers’ adaptation to an unfamiliar foreign accent—to demonstrate how Bayesian inference provides an effective way to model previous experience and its effect on learning. Computational models that implement Bayesian inference (ideal adapters, Kleinschmidt & Jaeger, 2015) allow us to make testable predictions about how a learner’s implicit knowledge (or in Bayesian terminology: beliefs) changes with exposure to unfamiliar input (e.g., from a novel language or an unfamiliar foreign accent), and how these changes are predicted to affect, for example, comprehension (Xie et al., in progress).
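The idea of beliefs changing with exposure can be illustrated with a deliberately minimal sketch. It is not the ideal adapter model itself: it assumes a single acoustic cue (a hypothetical voice onset time in milliseconds), a Gaussian prior over the category mean, and a known noise variance, so that the update has a closed form. The names `Belief` and `update_belief` and all numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Belief:
    """Gaussian belief about a category mean on one acoustic cue."""
    mean: float      # believed category mean (hypothetical VOT in ms)
    variance: float  # uncertainty about that mean


def update_belief(prior: Belief, observations: list[float],
                  noise_variance: float) -> Belief:
    """Conjugate normal-normal update with known noise variance (assumed)."""
    n = len(observations)
    if n == 0:
        return prior
    sample_mean = sum(observations) / n
    prior_precision = 1.0 / prior.variance
    data_precision = n / noise_variance
    posterior_variance = 1.0 / (prior_precision + data_precision)
    # Posterior mean is a precision-weighted average of prior and data.
    posterior_mean = posterior_variance * (
        prior_precision * prior.mean + data_precision * sample_mean
    )
    return Belief(posterior_mean, posterior_variance)


# Hypothetical listener: expects a mean of 60 ms, then hears an accent
# whose tokens cluster around 40 ms.
prior = Belief(mean=60.0, variance=100.0)
after_few = update_belief(prior, [40.0] * 4, noise_variance=100.0)
after_many = update_belief(prior, [40.0] * 40, noise_variance=100.0)
print(after_few.mean, after_many.mean)
```

With more exposure the believed mean moves further from the prior toward the input, and the uncertainty shrinks. A stronger prior (smaller `variance`) would slow that shift, which is one simple way to think about previous experience constraining learning.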

The advantages of explaining learners’ L2 Dutch language variation by means of L1-Ln lexical, morphological, and phonological distance measures
Job Schepens, Freie University; Frans van der Slik, Radboud University; and Roeland van Hout, Radboud University

We studied the impact of three L1-to-additional language (Ln) Dutch distance measures on the speaking test scores of more than 50,000 adult learners of Dutch: lexical distance, morphological distance, and phonological distance. Lexical distance is an absolute measure that expresses branch lengths in a phylogenetic language tree based on expert cognacy judgements of words in Swadesh lists (Gray & Atkinson, 2003). Morphological distance is a relative measure that relates the properties of an L1 to those of the Ln (here, Dutch); it is based on selected morphological features described in WALS, as used by Lupyan and Dale (2010). Phonological distance is also a relative measure, relating the features that are new for an L1 to the features of the Ln (here, Dutch). This measure is based on selected phonological features described in PHOIBLE (Moran, McCloy & Wright, 2014; see Schepens, Jaeger & van Hout, submitted).
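One possible reading of a relative, "new features" measure is sketched below. It assumes that each language is represented as a set of feature labels and that the distance is simply the number of Ln features absent from the L1; the actual feature selection and weighting used in the study are not specified here. The function name and the toy inventories are hypothetical and are not taken from PHOIBLE.

```python
def new_feature_count(l1_features: set[str], ln_features: set[str]) -> int:
    """Count the Ln features that do not occur in the L1 (assumed definition)."""
    return len(ln_features - l1_features)


# Toy inventories with made-up labels, for illustration only.
language_a = {"f1", "f2"}
language_b = {"f1", "f3", "f4"}

a_to_b = new_feature_count(language_a, language_b)  # A speakers learning B
b_to_a = new_feature_count(language_b, language_a)  # B speakers learning A
print(a_to_b, b_to_a)
```

Unlike a branch length in a tree, a measure of this kind is directional: the count for A speakers learning B need not equal the count for B speakers learning A.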

The impact of the three distance measures on the acquisition of Dutch as an additional language was examined in immigrants from 49 mother tongue backgrounds, spoken in 74 countries, 20 of which were Indo-European (IE) and 13 non-Indo-European (non-IE). We found that the combination of lexical, morphological, and phonological distance measures yields a cumulative, unbiased, and fairly complete account of differences in Ln Dutch speaking test scores.

Linguistic typology and learnability in second language
Dora Alexopoulou (in collaboration with Xiaobin Chen and Ianthi Tsimpli), University of Cambridge

In this work we exploit recent results from linguistic typology and large datasets from online language learning to provide a typological framework for the investigation of linguistic distance, based on an empirical investigation of a set of 10 typologically diverse languages. Our approach to measuring linguistic distance is syntactic, complementing recent approaches relying on lexical, morphological and phonological features (e.g. Schepens et al., 2016; Borin & Saxena, 2013). Specifically, to measure linguistic distance between L1 and L2, we adopt the Parametric Comparison Method (PCM) (Longobardi & Guardiano, 2009). Following the Principles and Parameters framework, PCM uses binary parameters to model cross-linguistic variation and measures distance through identities and differences in parameter values. It yields measures refined enough to differentiate between as many as 28 languages and to distinguish successfully between language genealogies.
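A distance computed from identities and differences in binary parameter values can be sketched as follows. This is an illustration under stated assumptions rather than the PCM implementation: parameters are encoded as `True`/`False`, `None` marks a parameter that is not set in a language and is skipped, and the distance is normalized as differences divided by all comparable parameters. The function name and the parameter vectors are hypothetical.

```python
from typing import Optional, Sequence


def parametric_distance(lang_a: Sequence[Optional[bool]],
                        lang_b: Sequence[Optional[bool]]) -> float:
    """Share of comparable binary parameters on which two languages differ."""
    if len(lang_a) != len(lang_b):
        raise ValueError("parameter vectors must have the same length")
    identities = 0
    differences = 0
    for value_a, value_b in zip(lang_a, lang_b):
        # Skip parameters that are not set in at least one language (assumption).
        if value_a is None or value_b is None:
            continue
        if value_a == value_b:
            identities += 1
        else:
            differences += 1
    comparable = identities + differences
    if comparable == 0:
        raise ValueError("no comparable parameters")
    return differences / comparable


# Hypothetical parameter settings for two languages.
lang_x = [True, False, True, None, True]
lang_y = [True, True, True, False, False]
distance_xy = parametric_distance(lang_x, lang_y)
print(distance_xy)
```

Normalizing by the number of comparable parameters keeps pairs of languages with different numbers of unset parameters on the same 0 to 1 scale.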

To obtain a dataset rich enough for the investigation of typological effects across developmental stages with significant learner numbers, we exploit advances in online learning technology. Specifically, we use the EF Cambridge Open Language Database (EFCAMDAT), an open access corpus consisting of L2 writings submitted to Englishtown, the online school of EF Education First, an international school of English as a foreign language. EFCAMDAT stands out for its size, with 1.2 million scripts totalling 71.8 million words. It contains 128 distinct tasks across the proficiency spectrum, drawing on learners from across the globe (170 nationalities).

Our main research question concerns the impact of linguistic distance on the acquisition of L2 features that are absent from the L1. Specifically, we focus on whether there is evidence for typological effects on the acquisition of individual features rather than language-specific effects only. We draw evidence from two phenomena, the acquisition of relative clauses and the acquisition of articles.

Moderating Effects of Aging and Linguistic Dissimilarity on a Test of English Grammaticality Judgement
Frans van der Slik, Radboud University
Roeland van Hout, Radboud University

We analyzed data from nearly 15,000 immersion language learners of English, speaking 64 different first languages (L1s). These data are a crucial subset of the big data set analyzed in Hartshorne, Tenenbaum and Pinker (2018). We used mixed effects regression to investigate age of onset, education, length of residence, gender, and current country of residence as predictors of grammatical competence, measured by a grammaticality judgment test. In addition, we constructed measures of morphological, lexical, and phonological distance between the learners' L1s and English.

We found that the learners’ grammaticality scores declined almost linearly with age of onset (no critical period, contrary to the conclusion of Hartshorne, Tenenbaum and Pinker) and that increasing language dissimilarity amplifies the negative effects of acquiring English as a second language (L2) at later ages. Our conclusion is twofold: we were able not only to identify and quantify distinguishing elements of the previously learned mother tongue, but also to show that they behave as a cultural factor that poses strong constraints on learning an additional language.