Construct definition in interpreter testing: Interdisciplinary perspectives (ILTA@AAAL Joint Session)


Interpreting studies is often referred to as an “interdiscipline” (Pöchhacker, 2016) due to the history of interdisciplinary collaborations in the field since the early investigations into the cognitive processes engaged in conference interpreting. Research into the institutional discourses mediated by interpreters working in the courts (Berk-Seligson, 1990; Hale, 2004; Pöllabauer, 2007) and medical systems (Angelelli, 2004; Davidson, 2000; Meyer, 2003), for example, and investigations into the interpreter’s intervention in mediating communication in a range of institutional settings (Wadensjö, 1998; Roy, 2000) have been informed by applied linguistics, pragmatics and interactional sociolinguistics in particular. These lines of enquiry have contributed to more informed notions of how interpreters negotiate meaning in the performance of their role.

However, despite advances in our understanding of interpreting, the discipline still lacks comprehensive and empirically supported constructs of interpreter performance that can serve as the basis for defining constructs for testing. Historically, national testing systems for accreditation and certification have based their tests on hypothetical notions of professional interpreting performances and scored the performance on the basis of the accuracy of the interpretation using a ‘points-off’ system for errors. Combined with a lack of validation research, interpreter testing for the purposes of certification has been beset by a lack of validity and reliability.

This two-hour colloquium presents recent work that centres on the constructs and construct validity applied to interpreter testing. Papers examine the varied demands of testing in specialised areas such as conference, court and medical interpreting, and of testing for national certification programs. Each paper also analyses the competencies tested and attends to the challenges of ensuring that the tests are valid, reliable and administratively feasible.

Questions of validity in simulated medical interactions as the basis for testing medical interpreters

Heiyeon Myung, Macquarie University

The authenticity of test materials is a valued quality of performance tests that is expected to contribute to the construct validity of the test (Lewkowicz, 2000; Messick, 1996) and to its face validity. In attempting to replicate the characteristics of professional practice, interpreter testing has largely shifted to live performance testing. Test tasks involve two role-players who, in the medical domain, usually represent a doctor and a patient to simulate doctor-patient communication. In a Korean context, this would involve a Korean-speaking doctor and an English-speaking patient, for example. Their bilingual conversation is mediated by a medical interpreter. However, as demonstrated here, discourse analysis of data collected from roleplays used to simulate interpreter performance reveals that live roleplays do not accurately represent real-life interpreter-mediated communication. One major difference is that at least one of the participants usually understands and speaks both languages and is therefore not dependent on the interpreter. For this reason, inaccurate interpreting does not lead to communication breakdown as it would if each interlocutor understood only one of the languages. Another problem identified in the data is that the actor playing the patient was inventing an illness for the task and was unable to answer probing questions regarding the precise nature of the symptoms for diagnosis. This resulted in a lack of depth and authenticity in the doctor-patient communication.

This paper outlines the features of ‘authentic’ doctor-patient communication and compares this with the simulated and interpreted communication in order to examine the degree of authenticity achieved in roleplayed interpreter tests. The main aim is not to negate the value of simulations for interpreter testing, but to provide evidence for a re-think of how to prepare a more authentic communicative roleplay for interpreter performance assessment.

Keyword correspondence as a construct for accuracy in interpreting: Designing a certification exam for interpreters in Taiwan

Minhua Liu, Hong Kong Baptist University

This paper describes the process through which a construct for accuracy in interpreting was identified and validated in a Taiwanese certification exam for interpreters. In an attempt to develop a more objective measure to assess quality in interpreting, specifically in the context of large-scale examinations, we explored how well an interpreting output corresponded to the original speech on a set of keywords selected from the original speech. Ten student interpreters whose Mandarin Chinese was either their A language or on a par with their English participated in the study. They each interpreted two English speeches into Mandarin and two Mandarin speeches into English consecutively. For each language direction, one speech was interpreted in short consecutive mode and the other in long consecutive mode. The procedures yielded 40 consecutive interpretations in four conditions. Two native speakers of Chinese selected the keywords in the Chinese speeches and two native speakers of English selected those in the English speeches. The final set of keywords for a specific speech was composed of the ones that were selected by both judges. As the keywords selected for this study were judged as critical to the content of the source speech, the study hypothesized that the more source-speech keywords the interpreters rendered into the target speech, the more faithful the interpretations tended to be. The scores obtained from this measure were compared with rubric-based evaluations of the same interpreting outputs (with inter-rater reliabilities ranging from 0.835 to 0.966) by calculating their correlation coefficients, in an attempt to show how well a more objective measure can predict a more subjective one. Significant and strong correlations (0.738, 0.751, 0.810 and 0.870) were found in the four sets of data, showing that keyword correspondence is strongly connected with accuracy and thus may serve as a reliable accuracy indicator.
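
The scoring logic described above can be sketched roughly as follows. This is only an illustration under several assumptions that the abstract does not confirm: that the keyword score is expressed as the proportion of agreed keywords rendered (rather than a raw count), that a human judge decides whether each keyword was rendered in the target language, and that the correlation is Pearson's r. All function names and all data values are hypothetical and are not taken from the study.

```python
from math import sqrt


def agreed_keywords(judge_a, judge_b):
    """Keep only the keywords that BOTH judges selected for a speech."""
    return sorted(set(judge_a) & set(judge_b))


def keyword_score(rendered_flags):
    """Proportion of agreed keywords judged as rendered in the interpretation.

    rendered_flags: one boolean per agreed keyword (True = rendered).
    Assumption: the judgment itself is made by a human rater, because source
    and target are in different languages and cannot be string-matched.
    """
    return sum(rendered_flags) / len(rendered_flags)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Invented toy data for one condition (NOT results from the study).
final_set = agreed_keywords(["budget", "tariff", "export", "quota"],
                            ["tariff", "export", "quota", "subsidy"])
keyword_scores = [keyword_score(flags) for flags in (
    [True, True, True],     # interpreter 1 rendered all agreed keywords
    [True, False, True],    # interpreter 2 missed one
    [False, False, True],   # interpreter 3 missed two
)]
rubric_scores = [4.5, 3.5, 2.0]  # hypothetical rubric-based ratings
r = pearson_r(keyword_scores, rubric_scores)
```

In this sketch one value of r would be computed per condition (language direction by consecutive mode), which mirrors the four coefficients reported above.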

Rubric design for the assessment of ASL/English interpreted interaction

Betsy Winston, TIEM Center
Rico Peterson, Rochester Institute of Technology
Christine Monikowski, Independent Scholar, National Technical Institute for the Deaf
Robert G. Lee, University of Central Lancashire (UK)
Laurie Swabey, St. Catherine University

The need for valid, reliable measurements of sign language/spoken language interpreter competence has bedeviled interpreter education and interpreting practitioners for many years. In the academic setting, rubrics and checklists, for example, are used to elucidate expectations about proficiency standards. Clarifying the purpose and value of these instruments might help students understand that rubrics need not be confined to their scholastic work, as they are equally valuable for professional interpreters to evaluate continuing growth and development. The Interpreter Assessment Project was established to heighten awareness of the many values of assessment by offering practitioners the means to engage in the authentic formative assessment and self-assessment of their interpreting performance.

To this end, a team of expert interpreting assessment specialists has been established, tasked with creating authentic rubrics that measure sign language/spoken language interpreting proficiency. After conducting a review of existing rubrics, including rubrics used by certifying bodies, educational institutions, employers, and educators, we have developed and piloted rubrics that focus on assessing both the products and the processes involved in effective interpreting interaction and communication.

This paper shares the constructs and structure of a recently developed rubric for assessing American Sign Language/English interpreters’ effectiveness and the results of the rubric pilot. The rubric can be used to assess both simultaneous and consecutive interpretations across a variety of settings (e.g., education, community, health). We describe the domains and sub-domains of practice identified for assessment (Interpretation Production: content, intents & purposes, and communicative cues; Interpreting Process: communication management, situation & environmental management, ergonomic management), outline the descriptors for each domain, and discuss the scoring scales that have been developed. Taken together, the rubric supports a comprehensive assessment of effective interpretations and underlying processes.


Getting the interaction right: Testing the interactional competence of interpreters in a national certification test

Helen Slatyer, Macquarie University
Adolfo Gentile, National Accreditation Authority for Translators and Interpreters (NAATI)
Magdalena Rowan, NAATI
Nora Sautter, NAATI

Our understanding of the interactional dynamics of interpreter-mediated communication is shifting towards a more communicatively engaged interpreter who interacts with the other interlocutors in the conversation to better manage the communication. Interpreters are no longer expected to be passive, neutral and detached (Llewellyn-Jones & Lee, 2013; Mason & Ren, 2012). When it comes to testing interpreters for the purpose of certification, test tasks and scoring procedures need to assess the ability of interpreters to skillfully and implicitly allocate turns, take the floor and initiate repair as required (Levinson, 2016; Sacks, Schegloff & Jefferson, 1974) using gaze, voice, and body language (Davitti & Pasquandrea, 2017; Mason, 2012). Responses to a national survey of the knowledge, skills and attributes of interpreters by the National Accreditation Authority for Translators and Interpreters (NAATI) in Australia identified interactional skills as a characteristic of professional interpreting practice. Building on the results of the survey, NAATI has re-designed the interpreter certification test to include interactional skills. This is a live performance test that aims to simulate real-world interpreting in institutional contexts that are typical of interpreters’ professional practice.

This paper traces the test development process, starting with the interactional sociolinguistic literature on interpreter mediation in dialogic communication and how this research translates into the development of a test construct. We outline the test specifications, test methods and materials (test tasks, rating scales and rating process) and explicitly map the construct definition to these test materials and processes. Finally, we highlight some of the challenges inherent in testing complex interactional skills in a multilingual, multicultural, high-stakes testing environment.