Bibliographic record no. 1120459
Computational stylometry and authorship verification of Sanskrit philosophical texts
Computational stylometry and authorship verification of Sanskrit philosophical texts // 9th International Conference: “New Horizons in Oriental Studies” 8-9.12.2020 Warsaw : Abstracts
Warsaw: Komitet Nauk Orientalistycznych Polskiej Akademii Nauk, 2020. pp. 1-2 (plenary, international peer review, abstract, scientific)
CROSBI ID: 1120459
Title
Computational stylometry and authorship verification of Sanskrit philosophical texts
Authors
Andrijanić, Ivan
Type, subtype and category of work
Conference abstracts, abstract, scientific
Source
9th International Conference: “New Horizons in Oriental Studies” 8-9.12.2020 Warsaw : Abstracts
Warsaw: Komitet Nauk Orientalistycznych Polskiej Akademii Nauk, 2020, pp. 1-2
Conference
9th International Conference: New Horizons of Oriental Studies
Place and date
Warsaw, Poland, 08.12.2020 - 09.12.2020
Type of participation
Plenary
Type of review
International peer review
Keywords
stylometry, advaita, authorship, imposters, authorship verification
Abstract
Within the broader field of digital humanities, contemporary computational stylometry represents a particularly interesting and exciting area. The term ‘stylometry’ was introduced in 1890 by the Polish scholar Wincenty Lutosławski, a pioneer in the field. Broadly outlined, stylometry is the measurement of stylistic affinities between texts in order to address questions such as authorship and chronology. With the development and ever-increasing use of personal computers, stylometry developed rapidly, since it depends on advanced statistics and large text corpora. A number of methods producing encouraging results have been developed recently, culminating in machine-learning classification models. One of the most studied stylometric disciplines is authorship attribution, in which the features of a text of unknown authorship are compared with the established profiles of known authors in order to find a matching candidate. In Sanskrit studies, however, a different setup may be of greater interest: the features of a text of disputed authorship are compared with undisputed texts of the candidate author. This setup is referred to as authorship verification. Indology has many texts of disputed attribution. One of the most striking examples is the famous philosopher Śaṅkara (seventh or eighth century), to whom some 400 works are ascribed according to Belvalkar’s 1929 estimate. The first attempt to address the question of Śaṅkara’s authorship on quantitative stylometric grounds was made by Richard Gussner, who rejected Śaṅkara’s authorship of fifteen Vedāntic stotras (1976) and of the Vivekacūḍāmaṇi (1977). Gussner used the chi-square test to compare word frequencies in these disputed texts with those in the undisputed Upadeśasāhasrī. This simple statistical operation was performed on manually collected data. Today, however, a large body of Sanskrit electronic texts is available, together with powerful computational tools to handle them.
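The abstract does not state which words or counts Gussner used. As a minimal sketch, assuming a standard Pearson chi-square test on a 2 × k contingency table (two texts, k chosen words), the statistic is the sum over all cells of (O − E)² / E, where O is the observed count and E = row total × column total / grand total. The helper name `chi_square_word_frequencies` is hypothetical, and the texts are assumed to be already segmented into whitespace-separated words.

```python
from collections import Counter


def chi_square_word_frequencies(text_a, text_b, words):
    """Pearson chi-square statistic for a 2 x k table of word counts.

    Rows are the two texts, columns are the chosen words, and each cell is
    how often that word occurs in that text. Texts are assumed to be already
    segmented into whitespace-separated words (sandhi resolved).
    Returns (statistic, degrees_of_freedom).
    """
    counts_a = Counter(text_a.split())
    counts_b = Counter(text_b.split())
    row_a = [counts_a[w] for w in words]
    row_b = [counts_b[w] for w in words]
    total_a, total_b = sum(row_a), sum(row_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each text must contain at least one chosen word")
    grand_total = total_a + total_b
    statistic = 0.0
    used_columns = 0
    for observed_a, observed_b in zip(row_a, row_b):
        column_total = observed_a + observed_b
        if column_total == 0:
            continue  # word absent from both texts: carries no information
        used_columns += 1
        expected_a = total_a * column_total / grand_total
        expected_b = total_b * column_total / grand_total
        statistic += (observed_a - expected_a) ** 2 / expected_a
        statistic += (observed_b - expected_b) ** 2 / expected_b
    # A 2 x k table has (2 - 1) * (k - 1) degrees of freedom.
    return statistic, used_columns - 1
```

On the toy strings `"ca ca na"` and `"na na ca"` with `words=["ca", "na"]`, the function returns a statistic of 2/3 with 1 degree of freedom. If SciPy is installed, `scipy.stats.chi2.sf(statistic, dof)` converts the statistic into a p-value.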
To analyse the texts Gussner assessed, this presentation will use an authorship verification method called the General Imposters (GI) framework, which has already produced quite convincing results. The GI procedure compares the features of the disputed text with those of the candidate author and of a set of randomly chosen imposter texts from the corpus. First, the procedure will be explained; then the crucial problem of Sanskrit text preparation will be discussed. Because the GI procedure depends on measuring text features, usually words, Sanskrit presents a challenge: sandhi, the sound change operating across word junctions, obscures word boundaries. A second problem is the divergence of word-splitting conventions, as many Sanskrit electronic texts in Latin IAST transliteration follow the Devanagari convention of joining words. In the first step, undisputed texts will be treated as disputed and the success of the procedure will be assessed. In this manner, four approaches to Sanskrit text preparation will be assessed: (a) a manually segmented corpus of rather limited size, (b) a much larger, automatically segmented corpus (prepared by Jacek Bąkowski), (c) random samples of 5 000 words from the large corpus, and (d) n-grams based on Bąkowski’s automatically segmented corpus. The most successful setup will then be used to assess Śaṅkara’s authorship of the disputed stotras and the Vivekacūḍāmaṇi. The last issue considered is the relationship between quantitative methods such as the GI procedure and traditional philology. The still significant number of errors, and of authors who consistently resist correct verification, always casts some doubt on the results. The answer may be that quantitative approaches must be combined with traditional philological methods. On the other hand, some spurious results may indicate that the texts in question call for more detailed study.
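The abstract does not give the GI parameters, so the following is only a minimal Python sketch of a General Imposters style verification loop. It assumes texts are already segmented into token lists, uses relative frequencies of the most frequent words as features with a min-max distance, and draws a random subset of the features and of the imposters in each iteration. All function names, the distance measure, and the default ratios and iteration count are illustrative assumptions, not the settings used in the presentation. The `char_ngrams` helper is a hypothetical stand-in for setup (d), since the abstract does not say whether word or character n-grams are meant.

```python
import random
from collections import Counter


def most_frequent_words(texts, n):
    """The n most frequent tokens across all texts, used as the feature set."""
    counts = Counter(token for tokens in texts for token in tokens)
    return [word for word, _ in counts.most_common(n)]


def char_ngrams(text, n=3):
    """Hypothetical alternative features: overlapping character n-grams."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def relative_frequencies(tokens, features):
    """Frequency of each feature divided by the text length in tokens."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[f] / total for f in features]


def minmax_distance(x, y):
    """Min-max distance: 1 - sum(min) / sum(max); 0.0 for identical profiles."""
    numerator = sum(min(a, b) for a, b in zip(x, y))
    denominator = sum(max(a, b) for a, b in zip(x, y))
    return 1.0 - numerator / denominator if denominator else 0.0


def general_imposters(disputed, candidate, imposters, features,
                      iterations=100, feature_ratio=0.5,
                      imposter_ratio=0.5, seed=0):
    """Share of iterations in which the candidate beats every sampled imposter.

    disputed, candidate: token lists; imposters: list of token lists.
    Each iteration samples random subsets of features and of imposters and
    counts a hit when the disputed text is closer to the candidate than to
    all sampled imposters. Scores near 1.0 support common authorship.
    """
    rng = random.Random(seed)
    n_features = max(1, int(len(features) * feature_ratio))
    n_imposters = max(1, int(len(imposters) * imposter_ratio))
    hits = 0
    for _ in range(iterations):
        subset = rng.sample(features, n_features)
        sampled = rng.sample(imposters, n_imposters)
        disputed_vec = relative_frequencies(disputed, subset)
        candidate_dist = minmax_distance(
            disputed_vec, relative_frequencies(candidate, subset))
        imposter_dist = min(
            minmax_distance(disputed_vec, relative_frequencies(imp, subset))
            for imp in sampled)
        if candidate_dist < imposter_dist:
            hits += 1
    return hits / iterations
```

For the validation step described above, an undisputed text can be passed as `disputed` and the concatenated other works of its true author as `candidate`; a score close to 1.0 then counts as a correct verification. A threshold for accepting or rejecting authorship would still have to be calibrated on such known cases before scoring the disputed stotras or the Vivekacūḍāmaṇi.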
Original language
English
Scientific fields
Philology