Data source: CROSBI

Searching for Semantically Correct Postal Addresses on the Croatian Web (CROSBI ID 616627)

Conference contribution in a journal | original scientific paper | international peer review

Ugrina, Ivo ; Žigo, Mislav. Searching for Semantically Correct Postal Addresses on the Croatian Web // Central European conference on information and intelligent systems / Hunjak, Tihomir ; Lovrenčić, Sandra ; Tomičić, Igor (eds.). 2014. pp. 276-283

Authorship details

Ugrina, Ivo ; Žigo, Mislav

English

Searching for Semantically Correct Postal Addresses on the Croatian Web

This article presents a method for extracting and simultaneously verifying postal addresses in web pages written in a highly inflected language (Croatian). The method combines direct city name extraction, a string similarity measure (Jaro-Winkler) for street names, an algorithm for handling overlapping addresses, and a machine learning classifier (decision trees) to derive Semantically Correct Postal Addresses. A Semantically Correct Postal Address is defined as one that the author of the text intended to write, rather than one that arises from a chance ordering of words. The method performs geoparsing and geocoding jointly. For the initial search for cities and streets, it relies on a database containing most of the streets and cities in Croatia. The method was evaluated on a data set of 13,000,000 documents (from 35,000 web domains) and found 4,000,000 addresses in 2,750,000 documents. The quality of the classifiers was tested on a hand-annotated set, giving F1 scores greater than 0.9.
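The abstract names Jaro-Winkler similarity as the measure used to match street names, which matters in Croatian because street names change form with grammatical case. The following is a minimal sketch of that one step only, not the authors' implementation: the function names, the tiny street list, and the 0.9 acceptance threshold are all assumptions made for illustration, and the real method looks streets up in a database and applies further filtering and classification.

```python
from typing import Iterable, Optional, Tuple


def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3.0


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


def best_street_match(candidate: str, streets: Iterable[str],
                      threshold: float = 0.9) -> Optional[Tuple[str, float]]:
    """Return the most similar known street and its score, or None.

    The threshold is a hypothetical value chosen for this sketch.
    """
    best = None
    for street in streets:
        score = jaro_winkler(candidate.lower(), street.lower())
        if score >= threshold and (best is None or score > best[1]):
            best = (street, score)
    return best


# Hypothetical street list; the locative form "Vukovarskoj" should still
# resolve to the nominative database entry "Vukovarska".
known_streets = ["Vukovarska", "Varaždinska", "Ilica"]
match = best_street_match("Vukovarskoj", known_streets)
```

Because the Winkler adjustment rewards a shared prefix, it suits inflected forms where the stem is stable and only the ending changes.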

postal addresses ; string similarity ; machine learning ; geographic location ; address extraction

Contribution details

276-283.

2014.

published

Parent publication details

Central European conference on information and intelligent systems

Hunjak, Tihomir ; Lovrenčić, Sandra ; Tomičić, Igor

Varaždin: Fakultet organizacije i informatike Sveučilišta u Zagrebu

1847-2001

1848-2295

Conference details

Central European Conference on Information and Intelligent Systems

lecture

17.09.2014-19.09.2014

Varaždin, Croatia

Related fields

Information and Communication Sciences, Mathematics, Computing