Data source: CROSBI

Computational stylometry and authorship verification of Sanskrit philosophical texts (CROSBI ID 701674)

Conference contribution in proceedings | abstract of a conference presentation | international peer review

Andrijanić, Ivan. Computational stylometry and authorship verification of Sanskrit philosophical texts // 9th International Conference: “New Horizons in Oriental Studies” 8-9.12.2020 Warsaw : Abstracts. Warsaw: Komitet Nauk Orientalistycznych Polskiej Akademii Nauk, 2020. pp. 1-2

Responsibility details

Andrijanić, Ivan

English

Computational stylometry and authorship verification of Sanskrit philosophical texts

Within the broader field of digital humanities, contemporary computational stylometry is a particularly interesting and exciting area. The term ‘stylometry’ was introduced in 1890 by the Polish scholar Wincenty Lutosławski, a pioneer in the field. Broadly, stylometry measures stylistic affinities between texts in order to address questions such as authorship and chronology. With the development of personal computers and their ever-increasing use, stylometry developed rapidly, because it depends on advanced statistics and large text corpora. A number of methods producing encouraging results have been developed recently, culminating in machine-learning classification models. One of the most studied stylometric disciplines is authorship attribution, where features of a text of unknown authorship are compared with the established profiles of known authors in order to find a matching candidate. In Sanskrit studies, however, a different setup may be of greater interest: features of a text of disputed authorship are compared with undisputed texts of the candidate author. This setup is referred to as authorship verification. Indology has many texts of disputed attribution. One of the most striking examples is the famous philosopher Śaṅkara (seventh or eighth century), to whom some 400 works are ascribed according to Belvalkar’s estimate from 1929. The first attempt to address the question of Śaṅkara’s authorship on quantitative stylometric grounds was made by Richard Gussner, who denied Śaṅkara’s authorship of fifteen Vedāntic stotras (1976) and of the Vivekacūḍāmaṇi (1977). Gussner used the chi-square formula to compare word frequencies in these disputed texts and in the undisputed Upadeśasahasrī. This simple statistical operation was carried out on manually collected data. Today, however, a large body of Sanskrit electronic texts is at hand, together with powerful computational tools to handle them.
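The word-frequency comparison described above can be illustrated with a minimal sketch of a Pearson chi-square statistic over a 2 x k table of word counts. This is not a reconstruction of Gussner's actual procedure, which the abstract does not detail: the function name `chi_square_word_frequencies`, the choice of vocabulary, and the sample tokens are all hypothetical and serve only as illustration.

```python
from collections import Counter


def chi_square_word_frequencies(tokens_a, tokens_b, vocabulary):
    """Pearson chi-square statistic for a 2 x k table of word counts.

    Row 1 holds the counts of each vocabulary word in tokens_a,
    row 2 the counts in tokens_b. Larger values mean the two texts
    use the chosen words in more dissimilar proportions.
    """
    counts_a = Counter(tokens_a)
    counts_b = Counter(tokens_b)
    observed = [
        [counts_a[word] for word in vocabulary],
        [counts_b[word] for word in vocabulary],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [a + b for a, b in zip(*observed)]
    grand_total = sum(row_totals)
    if grand_total == 0:
        raise ValueError("none of the vocabulary words occur in either text")

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:  # skip cells with no expected occurrences
                statistic += (obs - expected) ** 2 / expected
    return statistic


# Hypothetical toy data: already segmented tokens of two texts.
disputed = ["ca", "eva", "ca", "tu"]
undisputed = ["ca", "eva", "ca", "tu", "ca", "eva", "ca", "tu"]
print(chi_square_word_frequencies(disputed, undisputed, ["ca", "eva", "tu"]))
```

In practice the statistic would be compared against a chi-square distribution with k - 1 degrees of freedom, and the vocabulary would typically be restricted to frequent function words; both choices are assumptions here, not statements about the study.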
In order to analyze the texts Gussner assessed, this presentation will use the authorship verification method known as the General Imposters (GI) framework, which has already produced quite convincing results. The GI procedure compares features of the disputed text with those of the candidate author and with a set of randomly chosen imposter features from the corpus. First the procedure will be explained, and then the extremely important problem of Sanskrit text preparation will be discussed. Because the GI procedure depends on the measurement of text features, usually words, Sanskrit presents a challenge: sound change (sandhi) affects word boundaries. A second problem is the differing conventions of word splitting, as many Sanskrit electronic texts in Latin IAST transliteration follow the Devanagari convention of joining words. In the first step, undisputed texts will be treated as disputed and the success of the procedure will be assessed. In this manner, the following approaches to Sanskrit text segmentation will be assessed: (a) a manually segmented corpus of rather limited size, (b) a much larger, automatically segmented corpus (prepared by Jacek Bąkowski), (c) random samples of 5,000 words from the large corpus, and (d) n-grams based on Bąkowski’s automatically segmented corpus. The most successful setup will then be used to assess Śaṅkara’s authorship of the disputed stotras and the Vivekacūḍāmaṇi. The last issue to be considered is the relationship between quantitative methods such as the GI procedure and traditional philology. The still significant number of mistakes, and the authors who steadily resist correct verification, always cast some doubt on the results. The answer may be that quantitative approaches should necessarily be combined with traditional philological methods. On the other hand, some spurious results may indicate that a more detailed study of the texts in question is needed.
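The general idea of an imposters-style verification can be sketched as follows. This is a simplified illustration under stated assumptions, not the setup used in the presentation: texts are represented as dictionaries of relative feature frequencies, the distance is plain Manhattan distance, and the function names and parameters (`imposters_score`, `iterations`, `feature_share`, `seed`) are hypothetical.

```python
import random


def manhattan(u, v, features):
    """Manhattan distance between two frequency profiles on given features."""
    return sum(abs(u.get(f, 0.0) - v.get(f, 0.0)) for f in features)


def imposters_score(disputed, candidate, imposters,
                    iterations=100, feature_share=0.5, seed=0):
    """Share of iterations in which the disputed text is closer to the
    candidate author than to every sampled imposter.

    All profiles are dicts mapping a feature (e.g. a word) to its
    relative frequency. Each iteration draws a random feature subset
    and a random subset of imposters (assumed sampling scheme).
    """
    if not imposters:
        raise ValueError("at least one imposter profile is required")
    rng = random.Random(seed)
    features = sorted(
        set(disputed) | set(candidate) | {f for imp in imposters for f in imp}
    )
    n_features = max(1, int(len(features) * feature_share))
    n_imposters = max(1, len(imposters) // 2)

    wins = 0
    for _ in range(iterations):
        sampled_features = rng.sample(features, n_features)
        sampled_imposters = rng.sample(imposters, n_imposters)
        d_candidate = manhattan(disputed, candidate, sampled_features)
        d_nearest_imposter = min(
            manhattan(disputed, imp, sampled_features)
            for imp in sampled_imposters
        )
        if d_candidate < d_nearest_imposter:
            wins += 1
    return wins / iterations


# Hypothetical toy profiles (relative frequencies of three features).
disputed = {"a": 0.5, "b": 0.3, "c": 0.2}
candidate = {"a": 0.5, "b": 0.3, "c": 0.2}
imposters = [{"a": 0.1, "b": 0.1, "c": 0.8}, {"a": 0.9, "b": 0.05, "c": 0.05}]
print(imposters_score(disputed, candidate, imposters))
```

A score near 1 suggests the disputed text is consistently closer to the candidate than to the imposters, and a score near 0 suggests the opposite. Where to place the decision threshold is a separate calibration question, which is presumably what testing on undisputed texts treated as disputed is meant to address.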

stylometry, advaita, authorship, imposters, authorship verification

Contribution details

1-2.

2020.

published

Host publication details

9th International Conference: “New Horizons in Oriental Studies” 8-9.12.2020 Warsaw : Abstracts

Warsaw: Komitet Nauk Orientalistycznych Polskiej Akademii Nauk

Conference details

9th International Conference: New Horizons of Oriental Studies

other

08.12.2020-09.12.2020

Warsaw, Poland

Related research fields

Philology