Computational stylometry and authorship verification of Sanskrit philosophical texts

Andrijanić, Ivan

Data source: CROSBI

Computational stylometry and authorship verification of Sanskrit philosophical texts (CROSBI ID 701674)

Conference contribution in proceedings | abstract of a conference presentation | international peer review

Andrijanić, Ivan. Computational stylometry and authorship verification of Sanskrit philosophical texts // 9th International Conference: “New Horizons in Oriental Studies” 8-9.12.2020 Warsaw : Abstracts. Warsaw: Komitet Nauk Orientalistycznych Polskiej Akademii Nauk, 2020. pp. 1-2

Responsibility details

Authors

Andrijanić, Ivan

Basic data in the original language
Basic data in other languages

Language

English

Title

Computational stylometry and authorship verification of Sanskrit philosophical texts

Abstract

Within the broader field of digital humanities, contemporary computational stylometry is a particularly interesting and exciting area. The term ‘stylometry’ was introduced in 1890 by the Polish scholar Wincenty Lutosławski, a pioneer in the field. Broadly outlined, stylometry measures stylistic affinities between texts in order to address questions such as authorship and chronology. With the spread of personal computers, stylometry developed rapidly, because it depends on advanced statistics and large text corpora. A number of methods producing encouraging results have been developed recently, culminating in machine-learning classification models. One of the most studied stylometric disciplines is authorship attribution, in which features of a text of unknown authorship are compared with the established profiles of known authors in order to find a matching candidate. In Sanskrit studies, however, a different setup might be of greater interest, in which features of a text of disputed authorship are compared with undisputed texts of the candidate author. This setup is referred to as authorship verification. Indology has many texts of disputed attribution. One of the most striking examples is the famous philosopher Śaṅkara (seventh or eighth century), to whom some 400 works are ascribed according to Belvalkar’s estimate from 1929. The first attempt to address the question of Śaṅkara’s authorship on quantitative stylometric grounds was made by Richard Gussner, who denied Śaṅkara’s authorship of fifteen Vedāntic stotras (1976) and of the Vivekacūḍāmaṇi (1977). Gussner used the chi-square formula to compare word frequencies in these disputed texts and in the undisputed Upadeśasāhasrī. This simple statistical operation was carried out on manually collected data. Today, however, a large body of Sanskrit electronic texts is at hand, together with powerful computational tools to handle them.
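The chi-square comparison of word frequencies mentioned above can be illustrated with a minimal sketch. It is not a reconstruction of Gussner's actual procedure: it assumes the standard Pearson chi-square statistic on a 2x2 table (the word versus all other words, in text A versus text B), and the function names `chi_square_2x2` and `compare_word_frequencies` are hypothetical.

```python
from collections import Counter


def chi_square_2x2(count_a, total_a, count_b, total_b):
    """Pearson chi-square statistic for one word across two texts.

    The 2x2 table is: occurrences of the word vs. all other tokens,
    in text A vs. text B. Assumption: no continuity correction.
    """
    a, b = count_a, total_a - count_a
    c, d = count_b, total_b - count_b
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # An empty text, or a word absent from both texts, carries no signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def compare_word_frequencies(tokens_a, tokens_b, words):
    """Return the chi-square statistic for each word in `words`."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    return {
        w: chi_square_2x2(freq_a[w], len(tokens_a), freq_b[w], len(tokens_b))
        for w in words
    }
```

A large value for a given word suggests that its frequency differs between the two texts more than chance alone would explain. The input is assumed to be already segmented into word tokens.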
In order to analyze the texts Gussner had assessed, this presentation will make use of an authorship verification method called the General Imposters (GI) framework, which has already produced quite convincing results. The GI procedure compares features of the disputed text with those of the candidate author and with a set of randomly chosen imposter features from the corpus. First the procedure will be explained, and then the extremely important problem of Sanskrit text preparation will be discussed. Because the GI procedure depends on the measurement of text features, usually words, Sanskrit presents a challenge: sound change (sandhi) affects word boundaries. A second problem is the differing conventions of word splitting, as many Sanskrit electronic texts in Latin IAST transliteration follow the Devanagari convention of joining words. In the first step, undisputed texts will be treated as disputed and the success of the procedure will be assessed. In this manner, four approaches to Sanskrit text segmentation will be assessed: (a) a manually segmented corpus of rather limited size, (b) a much larger, automatically segmented corpus (prepared by Jacek Bąkowski), (c) random samples of 5 000 words from the large corpus, and (d) n-grams based on Bąkowski’s automatically segmented corpus. The most successful setup will then be used to assess Śaṅkara’s authorship of the disputed stotras and the Vivekacūḍāmaṇi. The last issue to be considered is the relationship between quantitative methods such as the GI procedure and traditional philology. The point is that the still significant number of mistakes, and of authors that steadily resist correct verification, always casts some doubt on the results. The answer might be that a quantitative approach should necessarily be combined with traditional philological methods. On the other hand, some spurious results might indicate that a more detailed study of the texts in question should be conducted.
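The core loop of an imposters-style verification can be sketched as follows. This is a simplified illustration, not the setup used in the presentation: it assumes word relative frequencies as features, Manhattan distance, a random half of the features per iteration, and already segmented token lists. The names `gi_score`, `feature_share` and `n_imposters` are hypothetical.

```python
import random
from collections import Counter


def relative_freqs(tokens, vocab):
    """Relative frequency of every vocabulary item in a token list."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocab]


def manhattan(u, v, idx):
    """Manhattan distance restricted to the sampled feature indices."""
    return sum(abs(u[i] - v[i]) for i in idx)


def gi_score(disputed, candidate, imposters, iterations=100,
             feature_share=0.5, n_imposters=None, seed=0):
    """Share of iterations in which the candidate is strictly closer
    to the disputed text than every sampled imposter."""
    rng = random.Random(seed)
    vocab = sorted(set(disputed) | set(candidate)
                   | {t for imp in imposters for t in imp})
    d_vec = relative_freqs(disputed, vocab)
    c_vec = relative_freqs(candidate, vocab)
    i_vecs = [relative_freqs(imp, vocab) for imp in imposters]
    n_feats = max(1, int(len(vocab) * feature_share))
    n_imps = n_imposters or len(i_vecs)
    wins = 0
    for _ in range(iterations):
        # Random feature subset and random imposter subset per iteration.
        idx = rng.sample(range(len(vocab)), n_feats)
        sampled = rng.sample(i_vecs, min(n_imps, len(i_vecs)))
        cand_dist = manhattan(d_vec, c_vec, idx)
        if all(cand_dist < manhattan(d_vec, v, idx) for v in sampled):
            wins += 1
    return wins / iterations
```

The score lies between 0 and 1. Any acceptance threshold would have to be calibrated, for instance by treating undisputed texts as disputed, as the abstract describes for the first step.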

Keywords

stylometry, advaita, authorship, imposters, authorship verification

Note

not recorded

Language

not recorded

Title

not recorded

Abstract

not recorded

Keywords

not recorded

Note

not recorded

Contribution details

Pages

1-2.

Year of publication

2020.

Publication status

published

Host publication details

Title

9th International Conference: “New Horizons in Oriental Studies” 8-9.12.2020 Warsaw : Abstracts

Publisher

Warsaw: Komitet Nauk Orientalistycznych Polskiej Akademii Nauk

Conference details

Conference

9th International Conference: New Horizons of Oriental Studies

Type of participation

other

Conference dates

08.12.2020-09.12.2020

Conference venue

Warsaw, Poland

Related entities

Related persons

Ivan Andrijanić (author)

Related institutions

Filozofski fakultet u Zagrebu (130) (author's institution)

Field

Philology

Links

kno.pan.pl