Bibliographic record number: 936281
Unsupervised Acquisition of Comprehensive Multiword Lexicons using Competition in an n-gram Lattice
Unsupervised Acquisition of Comprehensive Multiword Lexicons using Competition in an n-gram Lattice // Transactions of the Association for Computational Linguistics, 5 (2017), 1; 455-470 (international peer review, article, scientific)
CROSBI ID: 936281
Title
Unsupervised Acquisition of Comprehensive Multiword Lexicons using Competition in an n-gram Lattice
Authors
Brooke, Julian ; Šnajder, Jan ; Baldwin, Timothy
Source
Transactions of the Association for Computational Linguistics (2307-387X) 5 (2017), 1; 455-470
Type, subtype and category of work
Journal papers, article, scientific
Keywords
Multiword expressions ; natural language processing ; lexical semantics
Abstract
We present a new model for acquiring comprehensive multiword lexicons from large corpora based on competition among n-gram candidates. In contrast to the standard approach of simple ranking by association measure, in our model n-grams are arranged in a lattice structure based on subsumption and overlap relationships, with nodes inhibiting other nodes in their vicinity when they are selected as a lexical item. We show how the configuration of such a lattice can be optimized tractably, and demonstrate using annotations of sampled n-grams that our method consistently outperforms alternatives by at least 0.05 F-score across several corpora and languages.
Original language
English
Scientific fields
Computing
ASSOCIATED WITH THIS WORK
Projects:
HRZZ-UIP-2014-09-7312 - SenseHive: Dynamic models for the incremental construction of lexical-semantic resources supported by crowdsourcing (SenseHive) (Šnajder, Jan, HRZZ) (CroRIS)
Institutions:
Faculty of Electrical Engineering and Computing, Zagreb
Profiles:
Jan Šnajder (author)