Bibliographic record number: 769887

A preliminary study on similarity-preserving digital book identifiers


Vladimir, Klemo; Šilić, Marin; Romić, Nenad; Delač, Goran; Srbljić, Siniša
A preliminary study on similarity-preserving digital book identifiers // Proceedings of the 9th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities
Beijing, China, 2015 (lecture, international peer review, full paper (in extenso), scientific)


CROSBI ID: 769887

Title
A preliminary study on similarity-preserving digital book identifiers

Authors
Vladimir, Klemo ; Šilić, Marin ; Romić, Nenad ; Delač, Goran ; Srbljić, Siniša

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of the 9th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, 2015

Conference
ACL Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

Place and date
Beijing, China, 30 July 2015

Type of participation
Lecture

Type of review
International peer review

Keywords
locality-sensitive hashing; simhash; digital book; clustering

Abstract
Due to the proliferation of digital publishing, e-book catalogs are abundant but noisy and unstructured. Tools for the digital librarian rely on ISBNs, metadata embedded in digital files (for which there is no accepted standard), and cryptographic hash functions to identify coderivative or near-duplicate content. However, the unreliability of metadata and the sensitivity of cryptographic hashing to even the smallest changes prevent efficient detection of coderivative or similar digital books. The study focuses on books with many versions that differ in the number of OCR errors and contain a number of sentence-length variations. Similar books are identified using small fingerprints that can be easily shared and compared. We created synthetic datasets to evaluate fingerprinting accuracy, reporting standard precision and recall measurements.
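
The contrast the abstract draws between cryptographic hashes and similarity-preserving fingerprints can be illustrated with a minimal simhash sketch. This is not the authors' implementation: the word-trigram shingling, the use of MD5 as the per-feature hash, the 64-bit width, and the function names `shingles`, `simhash`, and `hamming` are all assumptions chosen for illustration.

```python
import hashlib
import re

BITS = 64  # fingerprint width (an assumption, not taken from the paper)


def shingles(text, n=3):
    """Return lowercase word n-grams; punctuation and case are ignored."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def simhash(text):
    """Return a 64-bit simhash fingerprint of `text` as an int."""
    counts = [0] * BITS
    for feature in shingles(text):
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # first 64 bits of the feature hash
        for i in range(BITS):
            # Each feature votes +1 or -1 on every bit position.
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:  # the majority vote decides the bit
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    original = "It was the best of times, it was the worst of times, it was the age of wisdom"
    ocr_copy = "It was the best of tirnes, it was the worst of times, it was the age of wisdom"
    # A cryptographic hash changes completely after a single OCR error.
    print(hashlib.sha256(original.encode()).hexdigest()[:16])
    print(hashlib.sha256(ocr_copy.encode()).hexdigest()[:16])
    # Simhash fingerprints are compared by Hamming distance instead.
    print(hamming(simhash(original), simhash(ocr_copy)))
```

Two texts would then be treated as near-duplicates when the Hamming distance between their fingerprints falls below a chosen threshold. The distances actually observed depend on the shingle size and the feature hash, so any threshold would have to be tuned on labeled data, for example by measuring precision and recall as the abstract describes.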

Original language
English

Scientific fields
Computer science



PAPER AFFILIATIONS


Institutions:
Faculty of Electrical Engineering and Computing, Zagreb

Profiles:

Klemo Vladimir (author)

Siniša Srbljić (author)

Marin Šilić (author)

Goran Delač (author)


Cite this publication:

Vladimir, Klemo; Šilić, Marin; Romić, Nenad; Delač, Goran; Srbljić, Siniša
A preliminary study on similarity-preserving digital book identifiers // Proceedings of the 9th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities
Beijing, China, 2015 (lecture, international peer review, full paper (in extenso), scientific)
Vladimir, K., Šilić, M., Romić, N., Delač, G. & Srbljić, S. (2015) A preliminary study on similarity-preserving digital book identifiers. In: Proceedings of the 9th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities.
@article{article, year = {2015}, keywords = {locality-sensitive hashing, simhash, digital book, clustering}, title = {A preliminary study on similarity-preserving digital book identifiers}, keyword = {locality-sensitive hashing, simhash, digital book, clustering}, publisherplace = {Beijing, China} }