Bibliographic record number: 187763
Word sense disambiguation : Distinguishing between individuals and kinds
Word sense disambiguation : Distinguishing between individuals and kinds, 2004, master's thesis, Computer Laboratory, Cambridge
CROSBI ID: 187763
Title
Word sense disambiguation : Distinguishing between individuals and kinds
Authors
Mikelić, Nives
Type, subtype and category of work
Theses, master's thesis
Faculty
Computer Laboratory
Place
Cambridge
Date
30.07
Year
2004
Pages
60
Supervisor
Copestake, Ann
Keywords
Word sense disambiguation; British National Corpus; kind feature extractor; machine learning
Abstract
This thesis presents a method for semi-automatically acquiring kind properties of mass nouns, as well as of nouns denoting the broader domain of physical objects, from the British National Corpus (BNC). Kind-referring nouns are not easily extracted, because there is no morphological distinction between the kind reading of a noun and its individual reading, and because linguistic theories do not agree on which features are reliable clues for the distinction. To extract kind nouns together with their features from the BNC, a kind feature extractor (KFE) was built. Highly reliable features were then used to semi-automatically tag nouns as either KIND or PORTION in the first phase, and as either KIND or OTHER in the later phase. The project results show that the method is quite successful. Given how rare kind-referring nouns are in the corpus, the features extracted with the KFE seem to be highly reliable. Two different machine learning environments, TiMBL and WEKA, were used to learn these features and to predict the class of previously unseen words. Classifiers trained on these features achieved an accuracy of 89% and showed the ability to uncover some new patterns.
Original language
English
Scientific fields
Information and communication sciences
AFFILIATIONS OF THE WORK