Selection, Implementation and Testing of Language Sample Analysis Measures for the Web-Based Application MultiDis (CROSBI ID 723422)
Contribution in conference proceedings | Abstract of a conference presentation | International peer review
Authorship
Hržica, Gordana ; Košutar, Sara ; Karl, Dario ; Kramarić, Matea
Language: English
Purpose: MultiDis is a new web-based application for analyzing spoken and written language samples. It provides information about the language abilities of children and adults and thus facilitates language assessment. This paper presents the selection, implementation, and testing of language measures in MultiDis: we describe the application, the process of selecting the implemented measures, the language resources needed to calculate them, and the test results. MultiDis is currently being developed for Croatian, but it could be extended to multilingual analysis.
Design/methodology/approach: Language samples can be analyzed along several dimensions, such as productivity, lexical diversity, and syntactic complexity. We selected a set of (semi-)automatic measures for assessing language abilities (e.g., number of lemmas, moving-average type-token ratio, mean length of communication unit). We then integrated an open-source Python library for lemmatization, part-of-speech tagging, and syntactic parsing (Stanza; Qi et al., 2020). To test whether these tasks, and the subsequent calculation of language measures, can be performed successfully on spoken language samples, we uploaded 150 short narrative samples produced by children in a storytelling task.
Findings: Lemmatization and part-of-speech tagging are fairly accurate (correct in more than 85% of cases), accurate enough not to interfere with the calculation of the currently implemented measures of productivity and lexical diversity. Syntactic parsing has been an obstacle and is currently being resolved.
Research limitations/implications: The MultiDis web application is still under development, although the current version fulfils its main purpose: it allows (semi-)automatic analysis of spoken language.
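The measures named in the Design/methodology/approach section can be illustrated with a short sketch. This is not the MultiDis implementation. It assumes that each sample has already been segmented into communication units (C-units), lemmatized, and tagged (in MultiDis this step uses Stanza), and that punctuation carries the Universal POS tag `PUNCT`. The `Token` structure, the function names, reading "number of lemmas" as the count of distinct lemmas, computing MATTR over lemmas rather than word forms, the default window of 50, and the fallback to plain TTR for short samples are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Token:
    """One token of a lemmatized, tagged transcript (hypothetical structure)."""
    form: str
    lemma: str
    upos: str  # Universal POS tag, e.g. "NOUN", "VERB", "PUNCT"


# A communication unit (C-unit) is modeled as a list of tokens.
CUnit = List[Token]


def words(units: Sequence[CUnit]) -> List[Token]:
    """Flatten all C-units into one token list, dropping punctuation."""
    return [t for unit in units for t in unit if t.upos != "PUNCT"]


def number_of_lemmas(units: Sequence[CUnit]) -> int:
    """Count distinct lemmas (case-insensitive) across the whole sample."""
    return len({t.lemma.lower() for t in words(units)})


def mattr(units: Sequence[CUnit], window: int = 50) -> float:
    """Moving-average type-token ratio, computed over lemmas (assumption).

    A type-token ratio is taken in every window of `window` consecutive
    words and the ratios are averaged. Samples no longer than the window
    fall back to the plain type-token ratio of the whole sample.
    """
    lemmas = [t.lemma.lower() for t in words(units)]
    if not lemmas:
        return 0.0
    if len(lemmas) <= window:
        return len(set(lemmas)) / len(lemmas)
    ratios = [
        len(set(lemmas[i:i + window])) / window
        for i in range(len(lemmas) - window + 1)
    ]
    return sum(ratios) / len(ratios)


def mlcu(units: Sequence[CUnit]) -> float:
    """Mean length of C-unit, in words (punctuation excluded)."""
    if not units:
        return 0.0
    return len(words(units)) / len(units)


# Toy sample with hand-assigned lemmas: "Pas je trčao. Pas je skočio."
SAMPLE: List[CUnit] = [
    [Token("Pas", "pas", "NOUN"), Token("je", "biti", "AUX"),
     Token("trčao", "trčati", "VERB"), Token(".", ".", "PUNCT")],
    [Token("Pas", "pas", "NOUN"), Token("je", "biti", "AUX"),
     Token("skočio", "skočiti", "VERB"), Token(".", ".", "PUNCT")],
]

if __name__ == "__main__":
    print(number_of_lemmas(SAMPLE))          # 4
    print(mlcu(SAMPLE))                      # 3.0
    print(f"{mattr(SAMPLE, window=4):.2f}")  # 0.83
```

In the toy sample the two C-units contain six words and four distinct lemmas, so the number of lemmas is 4 and MLCU is 3.0; MATTR with a window of 4 averages the window ratios 0.75, 0.75, and 1.0. A moving window is used because plain TTR falls as samples get longer, which would make children's narratives of different lengths hard to compare.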
Practical implications: There is growing awareness of the importance of language sample analysis as a complementary method in language assessment. The time needed for transcription and the linguistic knowledge required for manual analysis are considered the main obstacles to its use (Pezold et al., 2020). A tool that calculates language measures automatically, such as MultiDis, could therefore make naturalistic language assessment more feasible.
Originality/Value: The value of this study lies in proposing a new application that uses automated lemmatization and part-of-speech tagging to allow more reliable calculation of measures of productivity, lexical diversity, and syntactic complexity. Selecting appropriate measures for language assessment is challenging because so many are available. Applying language technologies developed for large written corpora to spoken language is also challenging. Success in parts of automated tagging (lemmatization and part-of-speech tagging) allows reliable calculation of measures of productivity and lexical diversity, and further work on syntactic parsing is expected to enable the implementation of measures of syntactic complexity.
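The abstract reports that lemmatization and part-of-speech tagging are correct in more than 85% of cases but does not describe how this was measured. One plausible approach, assumed here, is token-level agreement between the automatic output and a manually corrected version of the same transcript. The function `token_accuracy` and the lemma lists below are hypothetical illustrations, not MultiDis code or data.

```python
from typing import Sequence


def token_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Share of tokens whose automatic label matches the manual (gold) label.

    Both sequences must be aligned token by token, e.g. lemmas or UPOS tags
    for the same tokenized transcript.
    """
    if len(predicted) != len(gold):
        raise ValueError("sequences must be aligned token by token")
    if not gold:
        return 0.0
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches / len(gold)


# Hypothetical example: automatic vs. manually corrected lemmas.
auto_lemmas = ["pas", "biti", "trčao", "pas", "biti", "skočiti"]
gold_lemmas = ["pas", "biti", "trčati", "pas", "biti", "skočiti"]

acc = token_accuracy(auto_lemmas, gold_lemmas)
print(f"lemma accuracy: {acc:.2%}")  # lemma accuracy: 83.33%
```

The same function works for UPOS tags. A real evaluation would also have to handle tokenization differences between the tool's output and the manual transcript; this sketch sidesteps that by assuming pre-aligned sequences.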
Keywords: Language Sample Analysis; Microstructural Measures; Lemmatization; Part-of-speech tagging; Syntactic parsing
Contribution details
Pages: 40-41
Year: 2022
Status: published
Parent publication details
LLOD Approaches for Language Data Research and Management LLODREAM2022: International Scientific Interdisciplinary Conference
Autorių kolektyvas (collective of authors)
Vilnius: Mykolo Romerio universitetas
ISBN 978-609-488-041-4
Conference details
LLOD approaches for language data research and management (LLODREAM 2022)
Oral presentation
21-22 September 2022
Vilnius, Lithuania
Research fields
Philology, Interdisciplinary Social Sciences, Interdisciplinary Humanities, Speech and Language Pathology