A Generic Method for Multi Word Extraction from Wikipedia (CROSBI ID 49163)
Book chapter | original scientific paper
Statement of responsibility
Bekavac, Božo ; Tadić, Marko
English
A Generic Method for Multi Word Extraction from Wikipedia
This paper presents a generic method for extracting multiword expressions from Wikipedia. The method exploits the properties of this specific encyclopedic genre in its HTML format and relies on the intention of article authors to link to other articles. The relevant links were processed by applying local regular grammars within the NooJ development environment. We tested the method on the Croatian version of Wikipedia and present the results obtained.
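The core idea of the abstract, treating the text of inter-article links as candidate multiword expressions, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper processes links with local regular grammars in NooJ, whereas this sketch substitutes a plain whitespace token count. The class name `WikiLinkCollector`, the function `multiword_candidates`, and the assumption that article links have the form `/wiki/Title` (with a colon marking non-article namespaces) are all hypothetical choices made for illustration.

```python
from html.parser import HTMLParser


class WikiLinkCollector(HTMLParser):
    """Collects the anchor text of internal article links (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_article_link = False
        self._buffer = []
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Assumption: article links look like /wiki/Title; a colon marks
        # namespace pages (categories, files, ...), which are skipped.
        if href.startswith("/wiki/") and ":" not in href:
            self._in_article_link = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_article_link:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_article_link:
            # Normalize internal whitespace before storing the anchor text.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.anchors.append(text)
            self._in_article_link = False


def multiword_candidates(html):
    """Return unique anchor texts with at least two whitespace-separated tokens."""
    collector = WikiLinkCollector()
    collector.feed(html)
    collector.close()
    candidates = []
    for text in collector.anchors:
        if len(text.split()) >= 2 and text not in candidates:
            candidates.append(text)
    return candidates
```

Calling `multiword_candidates` on the HTML of an article returns the linked phrases of two or more tokens in order of first appearance. A real pipeline would still need linguistic filtering of these candidates, which is the role the abstract assigns to the NooJ grammars.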
multi word expressions, multi word extraction, Croatian, Wikipedia
Chapter details
115-124.
published
Book details
Technologies for the Processing and Retrieval of Semi-Structured Documents: Experience from the CADIAL Project
Tadić, Marko ; Dalbelo Bašić, Bojana ; Moens, Marie-Francine
Zagreb: Hrvatsko društvo za jezične tehnologije
2009.
978-953-55375-1-9