Croatian Adult Spoken Language Corpus

Hržica, Gordana; Kuvač Kraljević, Jelena

Bibliographic record number: 816959

Croatian Adult Spoken Language Corpus // 6th International Conference on Foreign Language Teaching and Applied Linguistics (FLTAL)
Sarajevo, Bosnia and Herzegovina, 2016 (lecture, international peer review, abstract, scientific)

CROSBI ID: 816959

Title
Croatian Adult Spoken Language Corpus

Authors
Hržica, Gordana; Kuvač Kraljević, Jelena

Type, subtype and category of work
Conference abstracts, abstract, scientific

Conference
6th International Conference on Foreign Language Teaching and Applied Linguistics (FLTAL)

Place and date
Sarajevo, Bosnia and Herzegovina, 12-14 May 2016

Type of participation
Lecture

Type of review
International peer review

Keywords
spoken language corpus; metalinguistic annotation; participant selection; speech segmentation

Abstract
Spontaneous spoken language corpora provide unique language data that reveal basic components of spoken language: frequencies and distributions of words and structures, item-based variations and contextual patterns (Pusch, 2006). With the growth of research on language impairment (at different ages and in different conditions), the need for specialised corpora has grown. However, no adult spoken language corpus was available for Croatian. Language sampling for spoken language corpora raises many methodological questions, because speech samples vary greatly with factors such as formality, common ground or knowledge, and various socioeconomic factors. Corpus makers tend to control the diversity of their corpora by limiting the number and type of social situations (e.g. Le Corpus de Référence du Français Parlé (CRFP)) or, in the case of large corpora, by carefully annotating metalinguistic factors of speech sampling (e.g. in the British National Corpus (BNC) or the Santa Barbara Corpus of Spoken American English (CSAE)). This paper presents the building of the Croatian adult spoken language corpus with respect to those issues. Development of the corpus started in 2014. Today it consists of more than 300,000 tokens in 83 language samples. Samples were collected from spontaneous conversations of adult speakers of Croatian, with respect to the dialectal and socio-economic diversity of Croatia. They were prepared according to the TalkBank ground rules for contribution in order to become publicly accessible in the relevant subcorpora of TalkBank, the largest collection of spoken language corpora (MacWhinney, 2007).

References:
Pusch, C. D. (2006). Corpora of Spoken Discourse. In: K. Brown (ed.), Encyclopedia of Language & Linguistics. Oxford: Elsevier. 226-230.
MacWhinney, B. (2007). The TalkBank project. In: J. C. Beal, K. P. Corrigan, H. L. Moisl (eds.), Creating and Digitizing Language Corpora: Synchronic Databases, Vol. 1. Houndmills: Palgrave-Macmillan.

Original language
English

RELATED TO THIS WORK

Projects:
HRZZ-UIP-2013-11-2421 - Language Processing in Adult Speakers (Jezična obrada u odraslih govornika, ALP) (Kuvač Kraljević, Jelena, HRZZ - 2013-11)

Institutions:
Edukacijsko-rehabilitacijski fakultet (Faculty of Education and Rehabilitation Sciences), Zagreb

Profiles:

Jelena Kuvač (author)

Gordana Hržica (author)
