Bibliographic record no.: 524245

Unsupervised Topic-Oriented Keyphrase Extraction and its Application to Croatian


Saratlija, Josip; Šnajder, Jan; Dalbelo Bašić, Bojana
Unsupervised Topic-Oriented Keyphrase Extraction and its Application to Croatian // Lecture notes in Artificial Intelligence (Text, Speech and Dialogue, 14th International Conference, TSD 2011, Pilsen, Czech Republic, September 2011), 6836 (2011), 340-347 (international peer review, article, scientific)


CROSBI ID: 524245

Title
Unsupervised Topic-Oriented Keyphrase Extraction and its Application to Croatian

Authors
Saratlija, Josip ; Šnajder, Jan ; Dalbelo Bašić, Bojana

Source
Lecture notes in Artificial Intelligence (Text, Speech and Dialogue, 14th International Conference, TSD 2011, Pilsen, Czech Republic, September 2011) (0302-9743) 6836 (2011); 340-347

Type, subtype and category of work
Journal papers, article, scientific

Keywords
Information extraction; keyphrase extraction; unsupervised learning; k-means; Croatian language

Abstract
Labeling documents with keyphrases is a tedious and expensive task. Most approaches to automatic keyphrase extraction rely on supervised learning and require manually labeled training data. In this paper we propose a fully unsupervised keyphrase extraction method that differs from the usual generic keyphrase extractors in how the keyphrases are formed. Our method begins by building topically related word clusters from which document keywords are selected, and then expands the selected keywords into syntactically valid keyphrases. We evaluate our approach on a Croatian document collection annotated by eight human experts, taking into account the high subjectivity of the keyphrase extraction task. The performance of the proposed method reaches up to F1=44.5, which is below that of the human annotators but comparable to a supervised approach.
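The three stages named in the abstract (cluster topically related words, select keywords from the clusters, expand keywords into phrases) can be illustrated with a minimal sketch. This is not the authors' implementation: the sentence-level co-occurrence vectors, the deterministic k-means initialisation, the "most frequent word per cluster" selection rule, the tiny English stopword list, and the expansion over runs of non-stopwords (standing in for syntactic validity) are all assumptions made for illustration. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Toy English stopword list (assumption; the paper works on Croatian text).
STOPWORDS = {"the", "a", "of", "and", "in", "is", "to", "for", "on", "with"}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def cooccurrence_vectors(sentences, vocab):
    """Represent each word by how often it shares a sentence with each vocab word."""
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0.0] * len(vocab) for w in vocab}
    for sent in sentences:
        words = [w for w in set(sent) if w in index]
        for w in words:
            for v in words:
                vectors[w][index[v]] += 1.0
    return vectors

def kmeans(vectors, k, iterations=10):
    """Plain k-means; centroids start at the first k words (deterministic, an assumption)."""
    words = sorted(vectors)
    centroids = [vectors[w][:] for w in words[:k]]
    assignment = {}
    for _ in range(iterations):
        for w in words:
            assignment[w] = min(range(k), key=lambda c: math.dist(vectors[w], centroids[c]))
        for c in range(k):
            members = [vectors[w] for w in words if assignment[w] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

def expand(keyword, sentences, max_len=3):
    """Grow the keyword into a contiguous run of non-stopwords (stand-in for syntactic patterns)."""
    for sent in sentences:
        if keyword in sent:
            start = end = sent.index(keyword)
            while start > 0 and sent[start - 1] not in STOPWORDS and end - start + 1 < max_len:
                start -= 1
            while end + 1 < len(sent) and sent[end + 1] not in STOPWORDS and end - start + 1 < max_len:
                end += 1
            return " ".join(sent[start:end + 1])
    return keyword

def extract_keyphrases(text, k=2):
    sentences = [tokenize(s) for s in re.split(r"[.!?]", text) if s.strip()]
    freq = Counter(w for s in sentences for w in s if w not in STOPWORDS)
    vocab = sorted(freq)
    k = min(k, len(vocab))
    if k == 0:
        return []
    assignment = kmeans(cooccurrence_vectors(sentences, vocab), k)
    phrases = []
    for c in range(k):
        members = [w for w in vocab if assignment[w] == c]
        if members:
            keyword = max(members, key=lambda w: (freq[w], w))  # most frequent word in cluster
            phrase = expand(keyword, sentences)
            if phrase not in phrases:
                phrases.append(phrase)
    return phrases

def f1(predicted, gold):
    """F1 between a predicted and a gold set of keyphrases."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Calling `extract_keyphrases(document_text, k=5)` returns at most five phrases, one per non-empty cluster, and `f1(set(phrases), gold_set)` scores them against one annotator's keyphrases. How the scores are aggregated over the eight annotators is not stated in the abstract, so the sketch leaves that open.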

Original language
English

Scientific fields
Computing



RELATED RECORDS


Projects:
036-1300646-1986 - Otkrivanje znanja u tekstnim podacima (Knowledge Discovery in Textual Data) (Dalbelo-Bašić, Bojana, MZO) (CroRIS)

Institutions:
Faculty of Electrical Engineering and Computing, Zagreb

Profiles:

Jan Šnajder (author)

Bojana Dalbelo Bašić (author)


Cite this publication:

Saratlija, Josip; Šnajder, Jan; Dalbelo Bašić, Bojana
Unsupervised Topic-Oriented Keyphrase Extraction and its Application to Croatian // Lecture notes in Artificial Intelligence (Text, Speech and Dialogue, 14th International Conference, TSD 2011, Pilsen, Czech Republic, September 2011), 6836 (2011), 340-347 (international peer review, article, scientific)
Saratlija, J., Šnajder, J. & Dalbelo Bašić, B. (2011) Unsupervised Topic-Oriented Keyphrase Extraction and its Application to Croatian. Lecture notes in Artificial Intelligence (Text, Speech and Dialogue, 14th International Conference, TSD 2011, Pilsen, Czech Republic, September 2011), 6836, 340-347.
@article{article,
  author = {Saratlija, Josip and \v{S}najder, Jan and Dalbelo Ba\v{s}i\'{c}, Bojana},
  title = {Unsupervised Topic-Oriented Keyphrase Extraction and its Application to Croatian},
  journal = {Lecture notes in Artificial Intelligence (Text, Speech and Dialogue, 14th International Conference, TSD 2011, Pilsen, Czech Republic, September 2011)},
  volume = {6836},
  year = {2011},
  pages = {340--347},
  issn = {0302-9743},
  keywords = {Information extraction, keyphrase extraction, unsupervised learning, k-means, Croatian language}
}

Journal indexed in:


  • Scopus




