Bibliographic record no. 594744

Optimizing Sentence Boundary Detection for Croatian


Šarić, Frane; Šnajder, Jan; Dalbelo Bašić, Bojana
Optimizing Sentence Boundary Detection for Croatian // Lecture Notes in Artificial Intelligence, 7499 (2012), 105-111 doi:10.1007/978-3-642-32790-2_12 (international peer review, article, scientific)


CROSBI ID: 594744

Title
Optimizing Sentence Boundary Detection for Croatian

Authors
Šarić, Frane ; Šnajder, Jan ; Dalbelo Bašić, Bojana

Source
Lecture Notes in Artificial Intelligence (0302-9743) 7499 (2012); 105-111

Type, subtype, and category of work
Journal papers, article, scientific

Keywords
sentence boundary; Croatian language; logistic regression

Abstract
A number of natural language processing tasks depend on segmenting text into sentences. Tools that perform sentence boundary detection achieve excellent performance for some languages. We tried to train a few publicly available language-independent tools to perform sentence boundary detection for Croatian. The initial results show that off-the-shelf methods used for English do not work particularly well for Croatian. After performing error analysis, we propose additional features that help resolve some of the most common boundary detection errors. We use unsupervised methods on a large Croatian corpus to collect likely sentence starters, abbreviations, and honorifics. In addition to some commonly used features, we use these word lists as features for a classifier that is trained on a smaller corpus with manually annotated sentences. The method we propose advances the state-of-the-art accuracy for Croatian sentence boundary detection on news corpora to 99.5%.
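
As a rough illustration of the approach the abstract describes (word lists used as features for a classifier trained on manually segmented text), the following is a minimal sketch and not the authors' implementation. The word lists, the feature set, the toy training sentences, and the function names `candidate_features`, `build_examples`, `train`, and `split_sentences` are all assumptions made for this example; in the paper the lists are collected by unsupervised methods from a large corpus. Logistic regression is assumed as the classifier because the record lists it as a keyword, and it is trained here with plain stochastic gradient descent.

```python
import math

# Hypothetical word lists (illustrative only); the paper collects such lists
# without supervision from a large Croatian corpus.
ABBREVIATIONS = {"dr", "prof", "npr", "tj", "itd"}
HONORIFICS = {"dr", "prof", "g", "mr"}
SENTENCE_STARTERS = {"On", "Ona", "To", "Ali", "Danas"}

# Toy stand-in for a manually annotated corpus: one sentence per string.
TRAIN = [
    "Dr. Horvat je stigao.",
    "On radi u Zagrebu.",
    "Prof. Kovač predaje npr. fiziku.",
    "To je sve.",
    "Danas je došao g. Marić.",
    "Ali nije ostao dugo.",
]


def candidate_features(tokens, i):
    """Features for the candidate boundary after tokens[i] (a token ending in '.')."""
    word = tokens[i].rstrip(".")
    nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
    return [
        1.0,                                            # bias term
        1.0 if word.lower() in ABBREVIATIONS else 0.0,  # known abbreviation
        1.0 if word.lower() in HONORIFICS else 0.0,     # known honorific
        1.0 if nxt in SENTENCE_STARTERS else 0.0,       # next word is a likely starter
        1.0 if nxt[:1].isupper() else 0.0,              # next word is capitalized
        1.0 if len(word) <= 2 else 0.0,                 # very short word before the period
    ]


def build_examples(sentences):
    """Turn segmented sentences into (features, label) pairs for each period."""
    tokens, boundaries = [], set()
    for sentence in sentences:
        tokens.extend(sentence.split())
        boundaries.add(len(tokens) - 1)  # last token of each sentence is a true boundary
    return [
        (candidate_features(tokens, i), 1.0 if i in boundaries else 0.0)
        for i, token in enumerate(tokens)
        if token.endswith(".")
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, features):
    """Probability that the candidate is a sentence boundary."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def train(examples, epochs=500, learning_rate=0.5):
    """Fit logistic regression weights with per-example gradient steps."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for features, label in examples:
            error = label - score(weights, features)
            weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
    return weights


def split_sentences(text, weights, threshold=0.5):
    """Split whitespace-tokenized text at periods the classifier accepts."""
    tokens = text.split()
    sentences, current = [], []
    for i, token in enumerate(tokens):
        current.append(token)
        if token.endswith(".") and score(weights, candidate_features(tokens, i)) >= threshold:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


weights = train(build_examples(TRAIN))
for sentence in split_sentences("Prof. Babić voli npr. šah. On igra često.", weights):
    print(sentence)
```

On this toy data the classifier should learn to keep the periods after "Prof." and "npr." inside a sentence while still splitting before "On". A realistic setup would need a much larger annotated corpus, handling of other terminators such as "?" and "!", and the fuller feature set described in the paper.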

Original language
English

Scientific fields
Computing

Note
The paper was presented at the 15th International Conference on Text, Speech and Dialogue (TSD 2012), held in September 2012 in Brno, Czech Republic.



RELATED TO THIS WORK


Projects:
036-1300646-1986 - Otkrivanje znanja u tekstnim podacima (Knowledge Discovery in Textual Data) (Dalbelo-Bašić, Bojana, MZO) (CroRIS)

Institutions:
Faculty of Electrical Engineering and Computing, Zagreb

Profiles:

Jan Šnajder (author)

Bojana Dalbelo Bašić (author)

Links to the full text:

doi www.springerlink.com

Cite this publication:

Šarić, Frane; Šnajder, Jan; Dalbelo Bašić, Bojana
Optimizing Sentence Boundary Detection for Croatian // Lecture Notes in Artificial Intelligence, 7499 (2012), 105-111 doi:10.1007/978-3-642-32790-2_12 (international peer review, article, scientific)
Šarić, F., Šnajder, J. & Dalbelo Bašić, B. (2012) Optimizing Sentence Boundary Detection for Croatian. Lecture Notes in Artificial Intelligence, 7499, 105-111 doi:10.1007/978-3-642-32790-2_12.
@article{article, author = {\v{S}ari\'{c}, Frane and \v{S}najder, Jan and Dalbelo Ba\v{s}i\'{c}, Bojana}, title = {Optimizing Sentence Boundary Detection for Croatian}, journal = {Lecture Notes in Artificial Intelligence}, volume = {7499}, year = {2012}, pages = {105--111}, issn = {0302-9743}, doi = {10.1007/978-3-642-32790-2\_12}, keywords = {sentence boundary, Croatian language, logistic regression} }

Journal indexed by:


  • Scopus

