
Bibliographic record no.: 984624

A web corpus and word sketches for Japanese


Srdanović, Irena; Erjavec, Tomaž; Kilgarriff, Adam
A web corpus and word sketches for Japanese // Shizen gengo shori (Journal of Natural Language Processing), 15 (2008), 2; 137-159 (international peer review, article, scientific)


CROSBI ID: 984624

Title
A web corpus and word sketches for Japanese

Authors
Srdanović, Irena ; Erjavec, Tomaž ; Kilgarriff, Adam

Source
Shizen gengo shori (Journal of Natural Language Processing) (1340-7619) 15 (2008), 2; 137-159

Type, subtype and category of work
Journal papers, article, scientific

Keywords
Japanese web corpus ; corpus query tool ; Sketch Engine ; word sketches

Abstract
Of all the major world languages, Japanese is lagging behind in terms of publicly accessible and searchable corpora. In this paper we describe the development of JpWaC (Japanese Web as Corpus), a large corpus of 400 million words of Japanese web text, and its encoding for the Sketch Engine. The Sketch Engine is a web-based corpus query tool that supports fast concordancing, grammatical processing, ‘word sketching’ (one-page summaries of a word's grammatical and collocational behaviour), a distributional thesaurus, and robot use. We describe the steps taken to gather and process the corpus and to establish its validity, in terms of the kinds of language it contains. We then describe the development of a shallow grammar for Japanese to enable word sketching. We believe that the Japanese web corpus as loaded into the Sketch Engine will be a useful resource for a wide number of Japanese researchers, learners, and NLP developers.

Original language
English



RELATED TO THIS WORK


Profiles:

Irena Srdanović (author)


Cite this publication:

Srdanović, Irena; Erjavec, Tomaž; Kilgarriff, Adam
A web corpus and word sketches for Japanese // Shizen gengo shori (Journal of Natural Language Processing), 15 (2008), 2; 137-159 (international peer review, article, scientific)
Srdanović, I., Erjavec, T. & Kilgarriff, A. (2008) A web corpus and word sketches for Japanese. Shizen gengo shori (Journal of Natural Language Processing), 15 (2), 137-159.
@article{article, author = {Srdanovi\'{c}, Irena and Erjavec, Toma\v{z} and Kilgarriff, Adam}, year = {2008}, pages = {137-159}, keywords = {japanese web corpus, corpus query tool, Sketch Engine, word sketches}, journal = {Shizen gengo shori (Journal of Natural Language Processing)}, volume = {15}, number = {2}, issn = {1340-7619}, title = {A web corpus and word sketches for Japanese}, keyword = {japanese web corpus, corpus query tool, Sketch Engine, word sketches} }



