AAAL 2024: Invited Colloquium

Leveraging artificial intelligence in second language acquisition research

Convener:

Scott Crossley, Vanderbilt University

Colloquium Abstract

This colloquium gathers experts in the fields of natural language processing, language analytics, and learning engineering to demonstrate how advances in artificial intelligence (AI) can inform second language acquisition (SLA) research. In particular, the colloquium focuses on the application of large language models (LLMs) in language research. LLMs are neural network architectures that use the principle of self-attention to produce large, pre-trained models, which can then be fine-tuned for downstream tasks. These pre-trained models are trained on large corpora using masked language modeling, in which the text is tokenized and some tokens are masked. The task of masked language modeling is to predict each masked token based on all the tokens that come before and after it. After many epochs of training on very large corpora, the parameters of the model come to represent a general knowledge of the language domain on which they were trained. LLMs can then be fine-tuned on specific language tasks using appropriate labeled training data to create more fine-grained language models.
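
To make the masked language modeling objective concrete, the short sketch below (a minimal illustration assuming the Hugging Face transformers library and the roberta-base checkpoint, neither of which the colloquium prescribes) asks a pre-trained model to predict a masked token from its full left and right context:

```python
# Minimal sketch of masked language modeling, assuming the Hugging Face
# transformers library and the roberta-base checkpoint (illustrative
# choices; the colloquium does not prescribe a toolkit).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

# RoBERTa's mask token is <mask>. The model scores candidate fillers
# using all tokens before and after the mask, as described above.
for prediction in fill_mask("The learner <mask> a new word every day."):
    print(f"{prediction['token_str']!r}: {prediction['score']:.3f}")
```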

The colloquium will provide an overview of LLM architecture and theory. Additionally, it will present applications of LLMs trained on English corpora to SLA domains. Specifically, the talks will focus on using LLMs to enhance chatbots for language learning, to improve automatic speech recognition systems for non-native speech, to measure the lexical output of language learners, to generate verbal and visual cues for language learning, and to identify lexico-grammatically ambiguous language features.


Chatbots for second language learning

Zhou Yu, Columbia University

Abstract: It is estimated that over one billion people worldwide are currently learning English as a second language. Dialog systems could support second language conversational skill training by acting as native-speaker conversational partners. This talk will cover three projects geared toward developing an engaging, effective language partner. We will first discuss how to build a coherent chatbot that discusses a specific topic without collecting massive conversational training data, and how to leverage large pre-trained language models to synthesize targeted data for model adaptation. We will then discuss distilling different curricula, such as vocabulary and grammar knowledge, into chatbots through constrained neural decoding methods. Finally, we will describe the design and experimentation of various grammar feedback strategies in human-chatbot conversations. We will provide takeaways concerning user engagement and self-efficacy with respect to learners with different proficiency levels and learning motivations.
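
As one hedged illustration of constrained neural decoding, the sketch below forces a curriculum vocabulary item into a generated reply via constrained beam search; the model, prompt, and target word are illustrative assumptions, not the presenters' actual setup.

```python
# Hedged sketch of constrained neural decoding: forcing a curriculum
# vocabulary item into a chatbot reply with constrained beam search.
# The gpt2 checkpoint, prompt, and target word are assumptions made
# for illustration, not the presenters' actual models or data.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Tell me about your weekend."
target_words = ["museum"]  # curriculum vocabulary the reply must contain

# Encode each target word with a leading space so it matches GPT-2's
# word-initial tokens; force_words_ids makes beam search keep only
# hypotheses that eventually include every listed word.
force_ids = [tokenizer(" " + w).input_ids for w in target_words]
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    force_words_ids=force_ids,
    num_beams=5,
    max_new_tokens=40,
    no_repeat_ngram_size=2,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```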


AI approaches to L2 pronunciation evaluation and intervention

Kevin Hirschi, Northern Arizona University

Okim Kang, Northern Arizona University

Abstract: Automatic speech recognition (ASR) systems have benefited greatly from recent advances in neural networks (NNs) and large language models (LLMs). While computationally expensive, cutting-edge systems utilize far greater amounts of acoustic information while simultaneously requiring smaller training datasets to produce increasingly accurate output. However, their use in second language (L2) contexts has only begun to be explored, and most efforts assess accent reduction rather than pedagogical goals such as comprehensibility and intelligibility. This talk will review the transition of ASR technology from classical probabilistic models to cutting-edge NNs with LLMs, with implications for L2 pronunciation evaluation within the intelligibility framework. It will review recent work in adapting NN-based recognition systems for L2 pronunciation evaluation, highlighting the advantages, disadvantages, and resources necessary for undertaking such endeavors. We will then share recent findings from LLM-based ASR testing using a recent corpus of spontaneous speech (L2USI; Kang et al., forthcoming) and report on phonological features of L2 speech that remain challenging for leading ASR systems. Finally, we will present process designs for leveraging ASR output for pronunciation learning, highlighting its potential for salient feedback that promotes more comprehensible and intelligible speech. Implications will be offered for researchers interested in incorporating ASR into their L2 pronunciation research, including considerations of the suitability of modern systems for specific L1 backgrounds, proficiency levels, and contexts.
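
For researchers who want a starting point, the following minimal sketch transcribes a learner recording with an off-the-shelf NN-based ASR model and scores it against a reference transcript using word error rate; the Whisper checkpoint, file path, and jiwer scoring are illustrative assumptions, not the specific systems compared in the talk.

```python
# Minimal sketch of NN-based ASR evaluation for L2 speech. The Whisper
# checkpoint, audio path, and jiwer scoring are illustrative assumptions,
# not the specific systems compared in the talk.
from transformers import pipeline
from jiwer import wer

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# Transcribe a learner recording (placeholder path), then compare the
# hypothesis against a human reference transcript. Word error rate is
# only a rough proxy for intelligibility-oriented evaluation.
hypothesis = asr("learner_recording.wav")["text"]
reference = "i went to the library yesterday"
print(hypothesis)
print("WER:", wer(reference, hypothesis.lower().strip()))
```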


AI assessments of word predictability and L2 language proficiency
Langdon Holmes, Vanderbilt University
Wesley Morris, Vanderbilt University
Joon Suh Choi, Vanderbilt University

Abstract: Psycholinguistic research has recently underscored the role of prediction in language processing. This body of work demonstrates that processing difficulty is proportional to word surprisal, an information-theoretic construct that measures the unexpectedness of a word (Huettig et al., 2022; Futrell et al., 2020). This study applies this research to the context of second language acquisition, introducing a measure of word predictability that can be calculated using a large language model (LLM). We test the extent to which word predictability can discriminate the writing proficiency of English language learners on the TOEFL independent writing task. A support vector classifier ranked student essays into low, medium, and high proficiency levels and correctly classified 75% of learner essays in a balanced subsample using only two textual features: word count and word predictability. This parsimonious model achieves higher accuracy than previous automated essay scoring systems on the same dataset that used multiple linguistic features as predictors (Vajjala, 2018). Results indicate that more proficient language learners make more predictable word choices, suggesting that predictive competency plays an important role in learner language development.
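
A hedged sketch of how such a predictability feature might be computed follows: mean per-token surprisal under an autoregressive LLM. GPT-2 here is an illustrative stand-in; the study's exact model and feature definition are not specified in this abstract.

```python
# Hedged sketch of a word-predictability feature: mean per-token surprisal
# (negative log probability, in bits) under an autoregressive LLM. GPT-2
# is an illustrative stand-in; the study's exact model and feature
# definition are not specified in the abstract.
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def mean_surprisal(text: str) -> float:
    """Average surprisal per token; lower values mean more predictable text."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Logits at position t predict token t+1, so shift by one.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_log_probs = log_probs.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)
    return (-token_log_probs.mean() / math.log(2)).item()

essay = "In my opinion, students should study abroad because it broadens the mind."
features = [len(essay.split()), mean_surprisal(essay)]  # word count + predictability
print(features)  # these two features would feed the support vector classifier
```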


SmartPhone: Exploring keyword mnemonic with auto-generated verbal and visual cues

Andrew Lan, University of Massachusetts

Abstract: In second language vocabulary learning, existing research has primarily focused on either the learning interface or scheduling personalized retrieval practices to maximize memory retention. However, the learning content (e.g., the information presented on flashcards) has mostly remained constant. The keyword mnemonic is a notable learning strategy that relates new vocabulary to existing knowledge by building an acoustic and imagery link through a keyword that sounds alike. However, producing the verbal and visual cues associated with the keyword to facilitate building these links requires a manual process and is not scalable. This paper details a pipeline for automatically generating verbal and visual cues in one shot via a text generator and a text-to-image generator. First, we propose a large language model (LLM)-based pipeline that automatically generates highly memorable verbal and visual cues for an L1 word in language learning. Second, we design an experiment with language learners to explore whether the approach is effective. In doing so, we implement a web application for the experiment, which is reusable for future experiments. We analyze the results of the experiment by comparing four conditions: automatically generated keyword only, automatically generated keyword with a verbal cue, automatically generated keyword with both verbal and visual cues, and manually generated keyword and verbal cues. The findings indicate that automatically generated verbal cues are not as effective as keywords alone or as visual cues, suggesting that it is important for learners to come up with their own cues. We also found that visual cues are effective when the generated images are meaningful. We discuss limitations of the current work and possibilities for future work.
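
The two-stage pipeline might look roughly like the sketch below, in which a text generator drafts a verbal cue and a text-to-image generator renders a visual cue; the checkpoints and the Spanish "pato"/"pot" keyword pair are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of the one-shot cue-generation pipeline: a text generator
# drafts a verbal cue and a text-to-image generator renders a visual cue.
# The checkpoints and the word/keyword pair ("pato"/"pot") are
# illustrative assumptions, not the authors' exact configuration.
from transformers import pipeline
from diffusers import StableDiffusionPipeline

word, keyword = "pato", "pot"  # Spanish "duck" and a sound-alike English keyword

# Stage 1: verbal cue linking the new word to the keyword.
text_gen = pipeline("text-generation", model="gpt2")
prompt = f"A memorable sentence connecting '{word}' (duck) and '{keyword}':"
verbal_cue = text_gen(prompt, max_new_tokens=25)[0]["generated_text"]

# Stage 2: visual cue depicting the verbal link between keyword and meaning.
image_gen = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
visual_cue = image_gen("a duck sitting in a cooking pot").images[0]
visual_cue.save("mnemonic_cue.png")
```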


Identifying lexico-grammatically ambiguous language features using large language models

Kristopher Kyle, University of Oregon
Hakyung Sung, University of Oregon
Masaki Eguchi, University of Oregon

Abstract: Many features of proficient written and spoken discourse are lexicogrammatically ambiguous. Until recently, researchers interested in such features had to conduct manual and/or semi-automatic searches, which practically limited the amount of data that could be analyzed and precluded the use of these features in automated feedback and scoring systems (Lu, 2021). For example, most research into the relationships between argument structure construction (ASC; Goldberg, 1995) use and language development (e.g., Ellis & Ferreira-Junior, 2009; Goldberg et al., 2004) has been limited with regard to the amount of data examined and/or the number of features investigated. This has also been the case with discourse features such as engagement strategies (e.g., Martin & White, 2005; Wu, 2007). With the advent of large language models such as RoBERTa (Liu et al., 2019), however, which use a) highly featured, contextually aware vector spaces and b) very large reference corpora to model language use, the automated analysis of lexicogrammatically ambiguous features is now possible.
 
In this presentation, we report on two recent projects that have leveraged large language models to conduct automatic analyses of ASC and engagement strategy use. These models achieved accuracy figures that are on par with or exceed those of trained human annotators. Importantly, both projects used a relatively small amount of training data (e.g., as few as 4,000 sentences), suggesting that related pipelines can be created with relatively few resources. We discuss the implications and limitations of these projects and suggest areas for future research. Furthermore, we highlight the importance of making annotated data publicly available to the research community.
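
As a rough illustration of how little infrastructure such a pipeline requires, the sketch below fine-tunes roberta-base on a toy set of ASC-labeled sentences; the sentence-level task framing, label set, and hyperparameters are assumptions for demonstration only, and real projects would use the few thousand annotated sentences mentioned above.

```python
# Hedged sketch of fine-tuning RoBERTa on a small labeled set to identify
# argument structure constructions, framed here as sentence classification.
# The task framing, toy label set, and hyperparameters are assumptions for
# demonstration; the actual projects used a few thousand annotated sentences.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

labels = ["transitive", "ditransitive", "caused-motion"]  # toy ASC labels
data = Dataset.from_dict({
    "text": ["He kicked the ball.",
             "She gave him a book.",
             "They pushed the cart into the barn."],
    "label": [0, 1, 2],
})

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=len(labels))

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length",
                     max_length=64)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="asc-tagger", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=data.map(tokenize, batched=True),
)
trainer.train()
```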

