Selection, Implementation and Testing of Language Sample Analysis Measures for the Web-Based Application MultiDis

Hržica, Gordana; Košutar, Sara; Karl, Dario; Kramarić, Matea

Bibliographic record number: 1216821

Selection, Implementation and Testing of Language Sample Analysis Measures for the Web-Based Application MultiDis // LLOD Approaches for Language Data Research and Management LLODREAM2022: International Scientific Interdisciplinary Conference / Autorių kolektyvas (eds.).
Vilnius: Mykolo Romerio universitetas, 2022, pp. 40-41 (lecture, international peer review, abstract, scientific)

CROSBI ID: 1216821

Title
Selection, Implementation and Testing of Language Sample Analysis Measures for the Web-Based Application MultiDis

Authors
Hržica, Gordana; Košutar, Sara; Karl, Dario; Kramarić, Matea

Type, subtype and category of work
Conference abstracts, abstract, scientific

Source
LLOD Approaches for Language Data Research and Management LLODREAM2022: International Scientific Interdisciplinary Conference / Autorių kolektyvas. Vilnius: Mykolo Romerio universitetas, 2022, pp. 40-41

ISBN
978-609-488-041-4

Conference
LLOD approaches for language data research and management (LLODREAM 2022)

Place and date
Vilnius, Lithuania, 21-22 September 2022

Type of participation
Lecture

Type of review
International peer review

Keywords
Language sample analysis; microstructural measures; lemmatization; part-of-speech tagging; syntactic parsing

Abstract
Purpose: MultiDis is a new web-based application for the analysis of spoken and written language samples. It provides information about the language abilities of children and adults, thus facilitating language assessment. The aim of this paper is to present the selection, implementation, and testing of language measures in the MultiDis application. We will present the application, the process of selecting the measures we implemented, the language resources needed to calculate them, and the results of testing. MultiDis is currently being developed for Croatian, but it could be scaled up for multilingual analysis.
Design/methodology/approach: Language samples can be analyzed along several dimensions, such as productivity, lexical diversity, and syntactic complexity. A set of (semi-)automatic measures was selected to assess language abilities (e.g., number of lemmas, moving-average type-token ratio, mean length of communication unit). The next step was the integration of an open-source Python library for lemmatization, part-of-speech tagging, and syntactic parsing (Stanza; Qi et al., 2020). To test whether these tasks and the subsequent calculation of language measures can be successfully performed on spoken language samples, we uploaded 150 short narrative samples produced by children in a storytelling task.
Findings: Lemmatization and part-of-speech tagging are fairly accurate (correct in more than 85% of cases) and do not interfere with the calculation of the currently implemented measures of productivity and lexical diversity. Syntactic parsing has been an obstacle that is currently being resolved.
Research limitations/implications: The MultiDis web application is still under development, although the current version fulfils its main purpose: it allows for (semi-)automatic spoken language analysis.
Practical implications: There is increasing awareness of the importance of language sample analysis as a complementary method in language assessment. The time needed for transcription and the linguistic knowledge required for manual analysis are considered the main obstacles to its implementation (Pezold et al., 2020). Therefore, the development of a tool for the automatic calculation of language measures, such as the MultiDis application, could make naturalistic language assessment more feasible.
Originality/Value: The value of this study lies in proposing a new application for lemmatization and part-of-speech tagging that allows for more reliable calculation of measures of productivity, lexical diversity, and syntactic complexity. Selecting appropriate measures for language assessment is challenging because many are available. Applying language technologies developed for large bodies of written text to spoken language is also challenging. Success in some parts of automated tagging (lemmatization and part-of-speech tagging) allows for the reliable calculation of measures of productivity and lexical diversity. Future work on syntactic parsing is expected to lead to the successful implementation of measures of syntactic complexity.
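
As an illustration of the kind of measures named in the abstract, the following is a minimal Python sketch of three of them: number of lemmas, moving-average type-token ratio, and mean length of communication unit. It is not the MultiDis implementation. The function names, the default window size of 50, and the toy sample are assumptions made for this example, and the input is assumed to be already lemmatized and segmented into communication units (for instance by a tagger such as Stanza).

```python
from typing import Sequence


def number_of_lemmas(lemmas: Sequence[str]) -> int:
    """Count the distinct lemmas in a sample."""
    return len(set(lemmas))


def moving_average_ttr(tokens: Sequence[str], window: int = 50) -> float:
    """Average the type-token ratio over every sliding window of `window` tokens.

    The default window size is an assumption of this sketch, not a value
    taken from the abstract. Samples no longer than one window fall back
    to the plain type-token ratio.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    n = len(tokens)
    if n == 0:
        return 0.0
    if n <= window:
        return len(set(tokens)) / n
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(n - window + 1)
    ]
    return sum(ratios) / len(ratios)


def mean_length_of_units(units: Sequence[Sequence[str]]) -> float:
    """Mean number of words per communication unit."""
    if not units:
        return 0.0
    return sum(len(unit) for unit in units) / len(units)


if __name__ == "__main__":
    # Hypothetical, hand-lemmatized sample split into two communication units.
    units = [["pas", "trčati"], ["dječak", "vidjeti", "pas"]]
    lemmas = [lemma for unit in units for lemma in unit]
    print(number_of_lemmas(lemmas))
    print(moving_average_ttr(lemmas, window=2))
    print(mean_length_of_units(units))
```

The windowed ratio is less sensitive to sample length than a plain type-token ratio, which matters when short narrative samples of different lengths are compared.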

Original language
English

Scientific fields
Speech and language pathology, Interdisciplinary social sciences, Philology, Interdisciplinary humanities

RELATED ENTITIES

Projects:
UIP-2017-05-6603 - Multilevel approach to spoken discourse in language development (MultiDis) (Hržica, Gordana, HRZZ - 2017-05)

Institutions:
Faculty of Education and Rehabilitation Sciences, Zagreb

Profiles:

Sara Košutar (author)

Gordana Hržica (author)

Matea Kramarić (author)

Dario Karl (author)

Links to the full text:

llodapproaches2022.mruni.eu

CROSBI Croatian Scientific Bibliography
