Selection, Implementation and Testing of Language Sample Analysis Measures for the Web-Based Application MultiDis

Hržica, Gordana; Košutar, Sara; Karl, Dario; Kramarić, Matea

Data source: CROSBI

Selection, Implementation and Testing of Language Sample Analysis Measures for the Web-Based Application MultiDis (CROSBI ID 723422)

Conference proceedings contribution | Abstract of a conference presentation | International peer review

Hržica, Gordana; Košutar, Sara; Karl, Dario; Kramarić, Matea. Selection, Implementation and Testing of Language Sample Analysis Measures for the Web-Based Application MultiDis // LLOD Approaches for Language Data Research and Management LLODREAM2022: International Scientific Interdisciplinary Conference / Autorių kolektyvas (ed.). Vilnius: Mykolo Romerio universitetas, 2022. pp. 40-41

Responsibility information

Authors

Hržica, Gordana; Košutar, Sara; Karl, Dario; Kramarić, Matea

Basic information

Language

English

Title

Selection, Implementation and Testing of Language Sample Analysis Measures for the Web-Based Application MultiDis

Abstract

Purpose: MultiDis is a new web-based application for the analysis of spoken and written language samples. It provides information about the language abilities of children and adults and thus facilitates language assessment. The aim of this paper is to present the selection, implementation, and testing of language measures in the MultiDis application. We will present the application, the process of selecting the measures we implemented, the language resources needed to calculate them, and the results of testing. MultiDis is currently being developed for Croatian, but it could be extended to multilingual analysis.

Design/methodology/approach: Language samples can be analyzed along several dimensions, such as productivity, lexical diversity, and syntactic complexity. A set of (semi-)automatic measures was selected to assess language abilities (e.g., number of lemmas, moving-average type-token ratio, mean length of communication unit). The next step was the integration of an open-source Python library for lemmatization, part-of-speech tagging, and syntactic parsing (Stanza; Qi et al., 2020). To test whether these tasks, and the subsequent calculation of language measures, can be performed successfully on spoken language samples, we uploaded 150 short narrative samples produced by children in a storytelling task.

Findings: Lemmatization and part-of-speech tagging are fairly accurate (correct in more than 85% of cases), accurate enough that their errors do not interfere with the calculation of the currently implemented measures of productivity and lexical diversity. Syntactic parsing has proved to be an obstacle, which is currently being resolved.

Research limitations/implications: The MultiDis web application is still under development, although the current version fulfils its main purpose: it allows (semi-)automatic analysis of spoken language.
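To make the measures listed above concrete, here is a minimal, standard-library-only sketch. It is not the MultiDis implementation: it assumes each sample has already been segmented into communication units and lemmatized (for example by a pipeline such as Stanza). The function names, the reading of "number of lemmas" as the number of different lemmas, the 50-token default window for the moving-average type-token ratio, computing that ratio on lemmas, and the toy Croatian sample are all illustrative assumptions.

```python
from typing import List

# Hypothetical helpers illustrating three measures; not the MultiDis code.


def number_of_lemmas(lemmas: List[str]) -> int:
    """Number of different lemmas (lemma types) in a sample (assumed reading)."""
    return len({lemma.lower() for lemma in lemmas})


def mattr(tokens: List[str], window: int = 50) -> float:
    """Moving-average type-token ratio.

    A window of `window` tokens slides one token at a time over the sample;
    the type-token ratio of each window is computed and the ratios are
    averaged. Samples no longer than the window fall back to the plain TTR.
    """
    if not tokens:
        raise ValueError("empty sample")
    if window < 1:
        raise ValueError("window must be positive")
    tokens = [t.lower() for t in tokens]
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


def mlcu(units: List[List[str]]) -> float:
    """Mean length of communication unit, in words."""
    if not units:
        raise ValueError("no communication units")
    return sum(len(unit) for unit in units) / len(units)


# Toy sample, segmented into communication units and lemmatized by hand.
units = [["dječak", "vidi", "psa"], ["pas", "bježi"], ["dječak", "trči", "za", "psom"]]
lemmas = ["dječak", "vidjeti", "pas", "pas", "bježati",
          "dječak", "trčati", "za", "pas"]

word_forms = [w for unit in units for w in unit]
print(len(set(word_forms)))      # 8 different word forms
print(number_of_lemmas(lemmas))  # 6 different lemmas
print(mattr(lemmas, window=4))   # 0.875
print(mlcu(units))               # 3.0
```

Counting lemmas rather than word forms matters for an inflected language like Croatian: the forms psa, pas and psom all map to the lemma pas, which is one reason reliable lemmatization is a precondition for these measures.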
Practical implications: There is increasing awareness of the importance of language sample analysis as a complementary method in language assessment. The time needed for transcription and the linguistic knowledge required for manual analysis are considered the main obstacles to its implementation (Pezold et al., 2020). A tool that calculates language measures automatically, such as the MultiDis application, could therefore make naturalistic language assessment more feasible.

Originality/Value: The value of this study lies in proposing a new application that uses lemmatization and part-of-speech tagging to allow more reliable calculation of measures of productivity, lexical diversity, and syntactic complexity. Selecting appropriate measures for language assessment is challenging because so many are available. Applying language technologies developed for large bodies of written text to spoken language is also challenging. Success in parts of the automated annotation (lemmatization and part-of-speech tagging) allows reliable calculation of measures of productivity and lexical diversity. Future work on syntactic parsing should enable the successful implementation of measures of syntactic complexity.
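Tagging accuracy of the kind reported for lemmatization and part-of-speech tagging is commonly expressed as token-level agreement between automatic and manual annotation. The sketch below shows one way such a figure could be computed; it assumes token-aligned automatic and manually corrected (gold) labels. The function names and the single lemmatization error in the example are invented for illustration and are not taken from the MultiDis evaluation.

```python
from typing import List, Tuple


def accuracy(auto: List[str], gold: List[str]) -> float:
    """Share of tokens whose automatic label equals the manual (gold) label."""
    if len(auto) != len(gold):
        raise ValueError("annotations must be aligned token by token")
    if not gold:
        raise ValueError("empty annotation")
    return sum(a == g for a, g in zip(auto, gold)) / len(gold)


def errors(tokens: List[str], auto: List[str],
           gold: List[str]) -> List[Tuple[str, str, str]]:
    """List (token, automatic label, gold label) for every disagreement."""
    return [(t, a, g) for t, a, g in zip(tokens, auto, gold) if a != g]


tokens = ["dječak", "vidi", "psa", "i", "trči"]
gold_lemmas = ["dječak", "vidjeti", "pas", "i", "trčati"]
# Invented error for illustration: the form "psa" is left unlemmatized.
auto_lemmas = ["dječak", "vidjeti", "psa", "i", "trčati"]

print(accuracy(auto_lemmas, gold_lemmas))        # 0.8
print(errors(tokens, auto_lemmas, gold_lemmas))  # [('psa', 'psa', 'pas')]
```

The same functions apply unchanged to part-of-speech labels. Inspecting the error list, not only the percentage, shows whether the remaining errors fall on tokens that actually change a given measure.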

Keywords

language sample analysis; microstructural measures; lemmatization; part-of-speech tagging; syntactic parsing

Contribution details

Pages

40-41

Year of publication

2022

Publication status

Published

Parent publication details

Title

LLOD Approaches for Language Data Research and Management LLODREAM2022: International Scientific Interdisciplinary Conference

Editors

Autorių kolektyvas (collective of authors)

Publisher

Vilnius: Mykolo Romerio universitetas

ISBN

978-609-488-041-4

Conference details

Conference

LLOD approaches for language data research and management (LLODREAM 2022)

Type of participation

Oral presentation

Conference dates

21-22 September 2022

Conference location

Vilnius, Lithuania

Related records

Related persons

Gordana Hržica (author)

Sara Košutar (author)

Dario Karl (author)

Matea Kramarić (author)

Related institutions

Faculty of Education and Rehabilitation Sciences (Edukacijsko-rehabilitacijski fakultet) (013) (author's institution)

Related projects

Multilevel approach to spoken discourse in language development (Višerazinski pristup govornom diskursu u jezičnom razvoju) (project output)

Fields

Philology, Interdisciplinary Social Sciences, Interdisciplinary Humanities, Speech and Language Pathology

Links

llodapproaches2022.mruni.eu