Bibliographic record no. 726449

Searching for Semantically Correct Postal Addresses on the Croatian Web


Ugrina, Ivo; Žigo, Mislav
Searching for Semantically Correct Postal Addresses on the Croatian Web // Proceedings of Central European Conference on Information and Intelligent Systems 2014 / Hunjak, Tihomir ; Lovrenčić, Sandra ; Tomičić, Igor (eds.).
Varaždin: Fakultet organizacije i informatike Sveučilišta u Zagrebu, 2014. pp. 276-283 (lecture, international peer review, full paper (in extenso), scientific)


CROSBI ID: 726449

Title
Searching for Semantically Correct Postal Addresses on the Croatian Web

Authors
Ugrina, Ivo ; Žigo, Mislav

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of Central European Conference on Information and Intelligent Systems 2014 / Hunjak, Tihomir ; Lovrenčić, Sandra ; Tomičić, Igor - Varaždin : Fakultet organizacije i informatike Sveučilišta u Zagrebu, 2014, 276-283

Conference
Central European Conference on Information and Intelligent Systems

Place and date
Varaždin, Croatia, 17-19 September 2014

Type of participation
Lecture

Type of review
International peer review

Keywords
postal addresses ; string similarity ; machine learning ; geographic location ; address extraction

Abstract
This article presents a method for extracting and simultaneously verifying postal addresses within web pages written in a highly inflected language (Croatian). The method combines direct city name extraction, a string similarity measure (Jaro-Winkler) for street names, an algorithm for treating overlapping addresses, and a machine learning classifier (decision trees) to derive Semantically Correct Postal Addresses. A Semantically Correct Postal Address is defined as one that the author of the text meant to write, as opposed to one that appears only through a lucky ordering of words. The presented method performs geoparsing and geocoding jointly. For the initial search of cities and streets, the method relies on a database containing most of the streets and cities in Croatia. The method was evaluated on a data set of 13,000,000 documents (from 35,000 web domains) and found 4,000,000 addresses in 2,750,000 documents. The quality of the classifiers was tested on a hand-annotated set, giving F1 scores greater than 0.9.
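
The abstract names Jaro-Winkler as the string similarity measure for street names. The following is a minimal Python sketch of the standard Jaro-Winkler similarity, included only to illustrate why such a measure tolerates inflected word endings. It is not the authors' implementation: the function names, the lowercasing step, and the example street name are assumptions made for illustration, and the prefix scale 0.1 with a prefix cap of 4 are the commonly used defaults rather than values taken from the paper.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str,
                 prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (Winkler modification)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


# Hypothetical example: a street name in the nominative ("Ilica") compared
# with an inflected form found in running text ("Ilici").
score = jaro_winkler("Ilica".lower(), "Ilici".lower())
```

Because the Winkler modification rewards a shared prefix, forms that differ only in their ending keep a high score, which suits a language where case endings change while the stem stays fixed. Any acceptance threshold applied to the score would have to be tuned on data; the abstract does not state one.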

Original language
English

Scientific fields
Mathematics, Computing, Information and communication sciences



CONNECTIONS OF THE WORK


Institutions:
Prirodoslovno-matematički fakultet, Matematički odjel, Zagreb,
Prirodoslovno-matematički fakultet, Zagreb

Profiles:

Mislav Žigo (author)

Ivo Ugrina (author)

Cite this publication:

Ugrina, Ivo; Žigo, Mislav
Searching for Semantically Correct Postal Addresses on the Croatian Web // Proceedings of Central European Conference on Information and Intelligent Systems 2014 / Hunjak, Tihomir ; Lovrenčić, Sandra ; Tomičić, Igor (eds.).
Varaždin: Fakultet organizacije i informatike Sveučilišta u Zagrebu, 2014. pp. 276-283 (lecture, international peer review, full paper (in extenso), scientific)
Ugrina, I. & Žigo, M. (2014) Searching for Semantically Correct Postal Addresses on the Croatian Web. In: Hunjak, T., Lovrenčić, S. & Tomičić, I. (eds.) Proceedings of Central European Conference on Information and Intelligent Systems 2014.
@article{article, author = {Ugrina, Ivo and \v{Z}igo, Mislav}, year = {2014}, pages = {276-283}, keywords = {postal addresses, string similarity, machine learning, geographic location, address extraction}, title = {Searching for Semantically Correct Postal Addresses on the Croatian Web}, keyword = {postal addresses, string similarity, machine learning, geographic location, address extraction}, publisher = {Fakultet organizacije i informatike Sveu\v{c}ili\v{s}ta u Zagrebu}, publisherplace = {Vara\v{z}din, Hrvatska} }