Hence a network of centres of scientific excellence on a number of key themes such as: In this context, Big Data pose a major challenge, calling for changes not only in research methodologies but also in the working arrangements of research teams (Morabito). The education sector, and schooling in particular, badly needs contributions of this kind, aimed at extracting value from huge masses of data that once lay in paper files and today risk "gathering dust" on servers.
The development of this tool for analysing school life could be extended to operate on data from all Italian schools of every type and level. The software that implements these annotation algorithms is called TagMe.

Analisi dei dati e datamining. Boyd, Danah, Crawford, Kate. Calders, Toon, Pechenizkiy, Mykola. Computer Supported Education, University of Pennsylvania Press. Daniel, Ben Kei, ed.
Current Theory and Practice.
Ferragina, Paolo, Scaiella, Ugo. IEEE Software, 29 1: Exploring Big Historical Data: Greengrass, Mark, Hughes, Lorna M. The Virtual Representation of the Past, Aldershot: Gruppo di lavoro Miur Big Data Miur, Rapporto del gruppo di lavoro costituito con D. Guldi, Jo, Armitage, David University of Illinois Press. Big Data in History. Learning with Big Data: The Future of Education.
Big Data and Analytics: Strategic and Organizational Impacts. The transformative promise of digital humanities. Viteritti, Assunta, Giancola, Orazio. L'indagine PISA e il governo dell'educazione tramite i numeri. The Memory of the World in the Digital Age:

A stroll through time: a brief conclusion

Everything said so far only makes sense if we presuppose the following starting assumption:

Gambetta, La conservazione della memoria digitale, [Rubano], Siav, Guercio, Conservare il digitale.
Principi, metodi e procedure per la conservazione a lungo termine di documenti digitali, Roma-Bari, Laterza, ed. Guercio, Conservazione delle e-mail: Marzano, Conservare il digitale. Metodi, norme, tecnologie, Milano, Editrice Bibliografica, Implementation Strategies Working Group, http: Mellon Foundation, October https: Sustainable Economics for a Digital Planet: Lavoie, co-chairs, La Jolla, February http:

What approach to rethinking?
We will focus in particular on the sections of the administrative-management and descriptive structure created using the METS-SAN standard. The presence of these descriptive elements in each digital resource does not create informational redundancy for users: Bruno, Linking data in digital libraries:

We performed several experiments on a corpus of Spanish texts to test classification by time period and by author's gender, as well as authorship recognition by clustering.
Since we found no relevant literature on stylometry applied to Spanish texts, the main purpose of these experiments was to show that computational stylometry works on Spanish-language texts in the same way as on English texts. The tests performed have shown that, with the necessary adjustments and the correct parameters, the tool can perform classification and clustering reliably. We also learned from our own mistakes, particularly concerning text preparation. A proposed workflow for performing stylometric tests methodically is included.
These experimental results will be briefly showcased during the conference presentation. Our work covers three subfields related to text analysis: These fields either use similar methods or cooperate with one another. Both text mining and computational stylometry use classification and clustering techniques (Jockers and Witten; Oakes), which are also common in data mining, as well as NLP methods (Nieto) and, in some cases, neural networks (Merriam and Matthews; Ramyaa et al.). In fact, most of the hypotheses we want to test can be tested with clustering and classification methods: the approach is first trained on a set of training samples and then verified on separate test samples, as in data mining.
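As a minimal illustration of the train-then-test procedure described above, the sketch below assigns each test sample to the training author with the lowest Burrows's Delta, computed over z-scored relative frequencies of the most frequent words. It is not Stylo's implementation; the toy texts, the tokenizer, the `mfw` cut-off and the function names are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; a deliberately simple tokenizer (assumption).
    return re.findall(r"\w+", text.lower())


def rel_freqs(tokens, vocab):
    # Relative frequency of each vocabulary word within one sample.
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocab]


def delta_classify(train, test, mfw=30):
    """Assign each test sample to the training author with the lowest Delta.

    train: {author: text}, test: {sample_id: text}; returns {sample_id: author}.
    """
    train_tokens = {a: tokenize(t) for a, t in train.items()}
    # The mfw most frequent words of the training corpus form the feature set.
    overall = Counter(tok for toks in train_tokens.values() for tok in toks)
    vocab = [w for w, _ in overall.most_common(mfw)]
    profiles = {a: rel_freqs(toks, vocab) for a, toks in train_tokens.items()}
    n = len(profiles)
    means = [sum(p[i] for p in profiles.values()) / n for i in range(len(vocab))]
    # Population standard deviation per word; fall back to 1.0 when it is zero.
    sds = [math.sqrt(sum((p[i] - means[i]) ** 2 for p in profiles.values()) / n) or 1.0
           for i in range(len(vocab))]

    def zscores(freqs):
        return [(f - m) / s for f, m, s in zip(freqs, means, sds)]

    zprofiles = {a: zscores(p) for a, p in profiles.items()}
    result = {}
    for sid, text in test.items():
        zt = zscores(rel_freqs(tokenize(text), vocab))
        # Delta: mean absolute difference between z-score vectors.
        deltas = {a: sum(abs(x - y) for x, y in zip(zt, zp)) / len(vocab)
                  for a, zp in zprofiles.items()}
        result[sid] = min(deltas, key=deltas.get)
    return result


# Toy training and test texts (invented for illustration).
train = {
    "A": "the cat of the house and the dog of the garden the end of the story",
    "B": "and then but and so but and yet and but and so and then but",
}
test = {
    "x": "the bird of the tree the sound of the wind",
    "y": "and but and so and then but and",
}
print(delta_classify(train, test))  # {'x': 'A', 'y': 'B'}
```

With a single short training text per author the result is purely illustrative; real experiments vary the MFW cut-off and use many longer samples per author.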
Visualization techniques then support a good understanding of the problem and a better analysis of the experimental results. Here the hypothesis to test is whether translations affected by censorship can be separated from uncensored translations. Stylo provides functions for stylometric analysis, a graphical user interface and print-quality diagrams. Stylo is developed and maintained by the Computational Stylistics group.
In other experiments on authorship recognition, we applied the Bootstrap Consensus Tree (BCT) method, using dendrograms for visualization. According to Eder (Eder et al.), the BCT method was originally borrowed from the fields of language evolution and genetics; since then, a number of successful applications of the technique have been reported in the literature (Rybicki and Heydel; van Dalen-Oskam; Stover et al.). For authorship recognition, we also used the cluster analysis feature of the Stylo package, which groups the samples into the branches of a hierarchical tree.
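The consensus idea behind BCT can be sketched as follows: the same samples are clustered several times (for instance with different most-frequent-word settings), the groups (clades) produced by each tree are recorded, and only the clades that recur in at least half of the runs are kept. The sketch below is a simplified, hypothetical illustration rather than Stylo's code: it uses a small hand-written average-linkage clustering over invented distance matrices.

```python
from collections import Counter
from itertools import combinations


def average_linkage_clades(names, dist):
    """Agglomerative clustering with average linkage.

    dist maps frozenset({a, b}) to the distance between samples a and b.
    Returns the clades (frozensets of sample names) formed by successive merges.
    """
    clusters = [frozenset([n]) for n in names]
    clades = set()

    def linkage(c1, c2):
        # Mean distance over all cross-cluster pairs.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters = [c for c in clusters if c != c1 and c != c2] + [c1 | c2]
        clades.add(c1 | c2)
    return clades


def consensus_clades(names, distance_matrices, threshold=0.5):
    """Keep clades that appear in at least `threshold` of the clustering runs."""
    counts = Counter()
    for dist in distance_matrices:
        counts.update(average_linkage_clades(names, dist))
    total_runs = len(distance_matrices)
    return {clade for clade, c in counts.items() if c / total_runs >= threshold}


def matrix(pairs):
    # Helper: {("A1", "A2"): 0.2, ...} -> {frozenset({"A1", "A2"}): 0.2, ...}
    return {frozenset(k): v for k, v in pairs.items()}


# Invented distance matrices for three clustering runs; the third is "noisy".
names = ["A1", "A2", "B1", "B2"]
runs = [
    matrix({("A1", "A2"): 0.2, ("B1", "B2"): 0.3, ("A1", "B1"): 0.9,
            ("A1", "B2"): 0.8, ("A2", "B1"): 0.85, ("A2", "B2"): 0.95}),
    matrix({("A1", "A2"): 0.25, ("B1", "B2"): 0.2, ("A1", "B1"): 0.7,
            ("A1", "B2"): 0.75, ("A2", "B1"): 0.8, ("A2", "B2"): 0.9}),
    matrix({("A1", "A2"): 0.6, ("B1", "B2"): 0.6, ("A1", "B1"): 0.1,
            ("A1", "B2"): 0.7, ("A2", "B1"): 0.7, ("A2", "B2"): 0.2}),
]
for clade in sorted(consensus_clades(names, runs), key=len):
    print(sorted(clade))
```

Here the clades {A1, A2} and {B1, B2} appear in two of the three runs and survive the 0.5 threshold, while the groupings produced only by the noisy third run are discarded.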
The corpus used for the experiments consisted of 83 complete literary works by 36 Spanish writers of both genders, from the 17th to the 20th century, including a few decoy examples to test the consistency of the methods. For instance, we included some Catalan and Portuguese texts to verify that they were not mistaken for old Spanish.

Clustering by time period: Cervantes, Galdós and Clarín.

Conclusions

The tests performed with Stylo have shown that, with the necessary adjustments and the correct parameters, the tool can reliably perform time-period classification, gender analysis and authorship recognition on a Spanish corpus of literary works, just as it does on English texts.
From experience, we learned the obvious: In other words, the texts used for stylometry must be pure, uncontaminated samples of the writing style of the corresponding authors. Stylometric tools proved very useful, but further human analysis and interpretation of the results are essential for drawing meaningful conclusions.
Visualization techniques play a very important role in this. All this points to the need for a methodical practice (see Figure 3: Proposed stylometry analysis workflow). In future work, we intend to explore the use of these tools in computer forensics, to determine the authorship and gender of short messages, as shown in the literature (Brocardo et al.).
Although stylometry is not an exact science and depends heavily on the skill and effort of the researcher in choosing appropriate methods and properly adjusting the test parameters, research on obfuscation corpora has shown that the most robust and accurate methods can be effective even in cases of deceptive obfuscation (Juola and Vescovi).

Bibliographic References

Brocardo, M. A measure of stylistic difference and a guide to likely authorship.
Literary and Linguistic Computing journal, 17 3: All the way through: Literary and Linguistic Computing, 22 1: Stylometry for E-mail Author Identification and Authentication. Becoming a Data Scientist: Curriculum via Metromap. Shakespeare, Computers, and the Mystery of Authorship. Computational stylistics and biblical translation: Grabowski (eds), The Translator and the Computer. R Journal, 8 1: The stylistics and stylometry of collaborative translation: Literary and Linguistic Computing, 28 4: Computational authorship verification method attributes new work to major 2nd century African author.
The case of Elisabeth Wolff and Agatha Deken. Literary and Linguistic Computing, 29 3:

Bibliographic references

Anderson C. Wired Magazine, June Accessed November 7, , https: Philosophical Transactions of the Royal Society A Past, present and future of historical information science. Information, Communication and Society Una lezione di storia. In Archeologia classica e post-classica tra Italia e Mediterraneo. Scritti in ricordo di Maria Pia Rossignani, edited by S. In Medieval or Early Modern.
Current Swedish Archaeology. Roselli Del Turco, A. In La nuova storia, edited by J. Le Goff. Formation Processes of the Archaeological Record. University of Utah Press. The Signal and the Noise. Low Countries Historical Review: In The Fourth Paradigm. Data-Intensive Scientific Discovery, edited by T.

On the contrary, their value arises from what they reveal in aggregate. On the one hand, the constant enhancement of digital applications for producing, storing and manipulating data has shifted the focus onto data-driven and data-led science (Royal Society, 7), even in the Humanities; on the other hand, in recent decades archaeology has embraced digitisation.
Moreover, the low cost and growing power of computing, both software and hardware, make it easy to aggregate huge amounts of data from different sources at high velocity: Even though Big Data started in the world of computer science and are strongly connected to business, they are rapidly emerging in academic research, with scholars from different disciplines recognising the research potential of analysing composite and heterogeneous datasets that dwarf, in size and complexity, those traditionally employed in their fields (Wesson and Cottier; Gattiglia). In recent years, archaeologists have begun to ask whether a Big Data approach can be applied to archaeology, from both a theoretical and a practical point of view (Gattiglia). In the scientific and scholarly world, what constitutes Big Data varies significantly between disciplines, but we can safely say that the shift in the scale of data volume is evident in most disciplines, and that analysing large amounts of data holds the potential to revolutionise research, even in the Humanities, producing hitherto impossible and unimaginable insights (Wesson and Cottier, 1).
For a better understanding of the general concept of Big Data, I adopt the definition proposed by Boyd and Crawford. This kind of approach offers more options for exploring data from different angles or examining certain features more closely, and makes it possible to grasp aspects that cannot be understood with smaller amounts of data. Moreover, Big Data is about predictive modelling, i.
Moreover, a Big Data approach is related to the information content of data. Data are useful because they carry pieces of information. Data become information when they are processed and aggregated with other data; in other words, we gain information from data when we make sense of them (Anichini and Gattiglia). Finally, data are data because they describe a phenomenon in a quantified format that can be tabulated and analysed, not because they are digital.
Digitisation usually refers to the migration of pieces of information into digital formats for transmission, re-use and manipulation. This process has certainly increased exponentially the amount of data that can be processed, but from a more general point of view the act of digitisation, i. Datafication is a new phenomenon brought about by the continuous development of information technology. Datafication promises to go significantly beyond digitisation and to have an even more profound impact on archaeology, challenging the foundations of our established methods of measurement and providing new opportunities.
This is a key issue. To datafy means to transform objects, processes, etc. We can argue that datafication puts more emphasis on the I (information) of IT, disembedding the knowledge associated with physical objects by decoupling them from the data associated with them (Gattiglia). Moreover, a key difference between digitisation and datafication concerns data analytics: In other words, to datafy archaeology would mean producing a flow of data from the data generated by archaeological practice, for instance the locations of finds and sites and the interactions and relations between them.
This is a flow of data that should be available to the archaeological community. The project's work includes the design, development and assessment of a new software platform offering applications, tools and services for digital archaeology. This framework, which will be available as both a mobile application and a desktop version, will support archaeologists in recognising and classifying pottery sherds during excavation and post-excavation analysis.
The system will be designed to provide very easy-to-use interfaces e. Our approach is driven by archaeologists' needs: since we are aware of the discipline's caution about replacing well-established methods, we plan to support this specific Humanities domain by exploiting what is already available in archaeology in terms of good practices and representation paradigms. We therefore plan to deliver efficient computer-supported tools for drawing the profile of each sherd and automatically matching it against the large archives of existing classifications, currently encoded only as drawings and written descriptions in books and publications.
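The matching step can be illustrated with a deliberately simple sketch: a sherd profile is treated as a 2D polyline, resampled to a fixed number of points, normalised for position and scale, and compared with reference profiles by mean point-to-point distance. This is an assumption made for illustration only; the text does not specify the project's matching method, and the type names and coordinates below are hypothetical.

```python
import math


def resample(points, n=32):
    """Resample a polyline to n points evenly spaced along its length."""
    seg = [math.dist(points[i], points[i + 1]) for i in range(len(points) - 1)]
    total = sum(seg)
    out = []
    for k in range(n):
        target = total * k / (n - 1)
        acc = 0.0
        for i, s in enumerate(seg):
            # Stop in the segment that contains the target arc length.
            if acc + s >= target or i == len(seg) - 1:
                t = 0.0 if s == 0 else min(1.0, (target - acc) / s)
                (x0, y0), (x1, y1) = points[i], points[i + 1]
                out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
                break
            acc += s
    return out


def normalise(points):
    """Centre on the centroid and scale to unit root-mean-square radius."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    centred = [(x - cx, y - cy) for x, y in points]
    scale = math.sqrt(sum(x * x + y * y for x, y in centred) / len(centred)) or 1.0
    return [(x / scale, y / scale) for x, y in centred]


def profile_distance(a, b, n=32):
    """Mean distance between corresponding points of two normalised profiles."""
    pa, pb = normalise(resample(a, n)), normalise(resample(b, n))
    return sum(math.dist(p, q) for p, q in zip(pa, pb)) / n


def best_match(sherd, references):
    """Return (type_name, distance) of the closest reference profile."""
    return min(((name, profile_distance(sherd, ref)) for name, ref in references.items()),
               key=lambda item: item[1])


# Hypothetical reference profiles (x = radius, y = height), not real typologies.
references = {
    "bowl_type_A": [(0, 0), (1, 0), (2, 1), (3, 3)],
    "jar_type_B": [(0, 0), (0, 2), (1, 3), (1, 5)],
}
sherd = [(0, 0), (2, 0), (4, 2), (6, 6)]  # bowl_type_A drawn at twice the scale
print(best_match(sherd, references))
```

A production system would also need to handle rotation, mirroring and partial (broken) profiles, which this sketch ignores.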
The system will also support the production of archaeological documentation, including localisation data provided by the mobile device's GPS. The platform will also give access to tools and services that enhance the analysis of archaeological resources, such as open data publication of the pottery classification, or data analysis and visualisation of the spatial distribution of a given pottery type, leading to deeper interpretations of the past.
The integration of cultural heritage information from different sources, together with faster description and cataloguing and improved accessibility, can be exploited to generate new knowledge about archaeological heritage. Data visualisation, for instance, could stimulate new research perspectives, enable new interpretations and understandings of history, and bring archaeological storytelling to new audiences in a novel way.
By enabling wider dissemination of user-generated content, the framework would help develop a culture of sharing cultural resources, research and knowledge. Since we are interested in designing automatic matching and retrieval features, digital description here does not mean merely digitising the paper catalogues; it also involves understanding the meaning of the graphic representation and converting it into a format that encodes both shape (in vector, not raster, format) and semantics.
This process will naturally also require defining a semantically rich digital vector representation of the pottery sherds and of each entire object, able to represent not only the shape of the object but also its subdivision into semantic components e. A lightweight set of metadata the subset considered crucial for the purposes of the project by our users and advisors, e.
On the other hand, the data collected through digitisation will be enriched by data collected by users during the recognition process. This will enable real-time data analysis and visualisation. In fact, all the information encoded in the pottery identity cards, which are natively digital and include data on location, classification, dating and so on, will be shared, visualised and integrated with cultural heritage information from other sources (archaeological repositories, Europeana and so on), in order to have a real impact on the advancement of the discipline and on accessibility for professional and non-professional users.
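To make the idea of a natively digital "identity card" more concrete, the sketch below models one record combining a semantic vector profile (a polyline split into named components, as described for the representation above) with a lightweight metadata subset, and serialises it to JSON for sharing. All field names, component labels and values are hypothetical; the project's actual schema is not given in the text.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProfileComponent:
    # Semantic part of the vessel profile, e.g. "rim", "body", "base" (hypothetical labels).
    label: str
    points: list  # [x, y] vertices in centimetres: vector, not raster


@dataclass
class PotteryIdentityCard:
    # Hypothetical metadata subset; the real project schema may differ.
    sherd_id: str
    classification: str  # matched type in a reference typology
    date_range: tuple  # (earliest, latest) year, negative for BCE
    latitude: float  # e.g. from the mobile device GPS
    longitude: float
    components: list = field(default_factory=list)

    def to_json(self):
        # asdict recursively converts nested dataclasses; tuples become JSON lists.
        return json.dumps(asdict(self), sort_keys=True)


card = PotteryIdentityCard(
    sherd_id="SITE1-0001",
    classification="hypothetical-type-12",
    date_range=(-50, 100),
    latitude=43.7167,
    longitude=10.4000,
    components=[ProfileComponent("rim", [[0.0, 0.0], [0.4, 0.3]]),
                ProfileComponent("body", [[0.4, 0.3], [2.0, 4.5]])],
)
print(card.to_json())
```

Keeping the profile as labelled vector components, rather than an image, is what would allow later queries such as "all sherds whose rim matches type X".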
Real-time comparisons between different archaeological sites and regions will become possible, highlighting differences and commonalities in the economy of the ancient world. Data analysis will be carried out by the MAPPA Lab of the University of Pisa as an exploratory statistical analysis of pottery data, mainly concerning size, density, geolocation and chronology. The main objective of the exploratory analysis is to uncover relationships, in the statistical sense, between the variables considered.
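A first exploratory pass of the kind described above can be sketched with the Python standard library: summarise each numeric variable, flag outliers with the common 1.5 x IQR rule, and compute Pearson correlations between pairs of variables. The variable names and values are invented, and the IQR rule and Pearson coefficient are standard choices assumed here, not methods stated in the text.

```python
import math
import statistics
from itertools import combinations


def summarise(values):
    """Mean, median, quartiles and 1.5 x IQR outliers of one numeric variable."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "outliers": [v for v in values if v < low or v > high],
    }


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def explore(dataset):
    summaries = {name: summarise(vals) for name, vals in dataset.items()}
    correlations = {(a, b): pearson(dataset[a], dataset[b])
                    for a, b in combinations(sorted(dataset), 2)}
    return summaries, correlations


# Invented measurements for seven sherds.
dataset = {
    "rim_diameter_cm": [12, 14, 13, 15, 14, 13, 40],
    "wall_thickness_mm": [5, 6, 5, 7, 6, 5, 14],
}
summaries, correlations = explore(dataset)
print(summaries["rim_diameter_cm"]["outliers"])  # [40]
print(correlations[("rim_diameter_cm", "wall_thickness_mm")])
```

Note that the strong correlation here is driven almost entirely by the single outlying sherd, which is exactly the kind of situation exploratory analysis is meant to expose before modelling.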
Moreover, it will provide a comprehensive description of the available data, pointing out important features of the datasets, such as: Several statistical techniques are useful for exploratory data analysis, each focusing on particular aspects of the description we want to give of the data. It is important to note, however, that statistical techniques are not exploratory as such; rather, they are used to summarise the main characteristics of the data and to identify outliers, trends or patterns, i.
Concerning the analysis of pottery datasets, we will concentrate on the following tools:
- These specific combinations provide, at once, a way to summarise the data and to identify the major sources of variability;
- Spatial statistics, point pattern analysis and Kriging methods will mainly be used to highlight possible patterns in the spatial distribution of the data;
- Different predictive modelling techniques will be implemented, mostly to suggest where to look for more data in order to gain relevant information, or to determine optimal testing strategies.
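Among the point pattern methods mentioned, one of the simplest is the Clark and Evans nearest-neighbour ratio R: the observed mean nearest-neighbour distance divided by the value expected under complete spatial randomness, 0.5 / sqrt(n / A) for n points in a study area of size A. Values well below 1 suggest clustering, values near 1 randomness, and values above 1 a more regular spacing. The sketch ignores edge corrections and significance testing; the find coordinates and area are invented, and the text does not say which point pattern statistics the project will actually use.

```python
import math


def clark_evans_ratio(points, area):
    """Clark-Evans nearest-neighbour ratio (no edge correction).

    points: list of (x, y) find locations; area: size of the study region.
    """
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are needed")
    # Distance from each point to its nearest other point (brute force, O(n^2)).
    nearest = [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
               for i, p in enumerate(points)]
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)  # mean NN distance under randomness
    return observed / expected


# Invented find coordinates in a 100 x 100 study area.
grid = [(5 + 10 * i, 5 + 10 * j) for i in range(10) for j in range(10)]
clustered = [(10, 10), (10.5, 10), (10, 10.5), (90, 90), (90.5, 90), (90, 90.5)]
print(clark_evans_ratio(grid, area=100 * 100))       # about 2.0: regular spacing
print(clark_evans_ratio(clustered, area=100 * 100))  # well below 1: clustering
```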
The results of the data analysis will be made easier to understand and explain by applying data visualisation techniques. Beyond the quantitative data analysis, data visualisation is extremely important in order to: An important issue is communicating visual information about the relationships among different ceramic classes at the same location, between the find location and the production centre, and with pottery found at other locations.
Web-based visualisation tools will be built following the principles of data visualisation pioneered by Bertin (Bertin, 83) and developed, for instance, by Tufte, Few and Munzner. Following these guidelines, we will classify the data into types (categorical, ordinal, interval, ratio) and determine which visual attributes (shape, orientation, colour, texture, size, position, length, area, volume) represent each data type most effectively. The visualisation then follows the basic principle of assigning the most efficient attributes, such as position, length and slope, to the more quantitative data types, and less efficient attributes, such as area, volume or colour, to ordinal or categorical types.
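The assignment principle described above can be sketched as a greedy procedure: rank the visual attributes by efficiency, then give the best remaining attributes to the most quantitative variables first. The ranking below simply follows the order stated in the text (position, length and slope before area, volume and colour), with shape and texture appended as an assumption; it is not the tool's actual logic, and the variable names are hypothetical.

```python
# Visual attributes from most to least efficient, in the order given in the text;
# shape and texture are appended at the end as an assumption.
ATTRIBUTE_RANKING = ["position", "length", "slope", "area", "volume",
                     "colour", "shape", "texture"]

# Lower value = more quantitative data type.
TYPE_PRIORITY = {"ratio": 0, "interval": 1, "ordinal": 2, "categorical": 3}


def assign_attributes(variables):
    """Greedily map variables to visual attributes.

    variables: {name: data_type}; returns {name: attribute}.
    The most quantitative variables receive the most efficient attributes.
    """
    if len(variables) > len(ATTRIBUTE_RANKING):
        raise ValueError("more variables than available attributes")
    # sorted() is stable, so variables of the same type keep their input order.
    ordered = sorted(variables, key=lambda name: TYPE_PRIORITY[variables[name]])
    return dict(zip(ordered, ATTRIBUTE_RANKING))


# Hypothetical pottery variables.
mapping = assign_attributes({
    "ceramic_class": "categorical",
    "sherd_count": "ratio",
    "phase": "ordinal",
    "date_ad": "interval",
})
print(mapping)
```

A single global ranking is a simplification: in Munzner's treatment, for example, colour hue ranks highly for categorical data even though it is poor for quantitative data, so a fuller tool would keep a separate ranking per data type.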
The process of building the visualisation will be interactive, letting users associate the variables with the visual attributes while explaining the principles above. Moreover, the relations within pottery production, trade flows and social interactions will be visualised as graphs, applying the same principles. Such a system, open to researchers, institutions and the general public, would bring a dramatic change to the archaeological discipline as it stands today.
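For the relations mentioned above, a graph can be built directly from find records: each edge links a production centre to a find site and is weighted by the number of sherds. The sketch below computes edge weights and the weighted degree of each node, a typical first quantity to map onto node size in a network view; the records and place names are hypothetical.

```python
from collections import Counter, defaultdict


def trade_flow_graph(finds):
    """Build a weighted graph from (production_centre, find_site) pairs, one per sherd.

    Returns (edge_weights, weighted_degree).
    """
    edges = Counter(finds)  # number of sherds per (centre, site) edge
    degree = defaultdict(int)
    for (source, target), weight in edges.items():
        degree[source] += weight
        degree[target] += weight
    return dict(edges), dict(degree)


# Hypothetical find records.
finds = [("CentreA", "Site1"), ("CentreA", "Site1"), ("CentreA", "Site2"),
         ("CentreB", "Site2"), ("CentreB", "Site3")]
edges, degree = trade_flow_graph(finds)
print(edges)
print(degree)
```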
Its impact would reshape the profile of the professionals involved and generate new markets.
Bahga, Arshdeep and Madisetti, Vijay. Internet of Things: A Hands-On Approach. Esri Press. Boyd, Danah and Crawford, Kate. Spatial Science, quantitative revolutions and the culture of numbers. Archaeology and the Big Data challenge. Visualization Analysis and Design.
Science as an Open Enterprise. Bulletin of the History of Archaeology 24

Studying bibliographies makes it possible to apply distant reading methods to observe similarities and differences between the contributions presented at the conferences in question. Ideas and opinions are legitimised and consolidated, and different research methodologies are compared and integrated on common themes. The technological revolution and the spread of the digital medium were the two fundamental conditions for the realisation of this concept and for the formalisation of the Open Access (OA) movement.
Measuring the resources cited by the authors makes it possible to assess any changes with respect to traditional citation practice. Bibliographic references extracted from articles were examined. In CLiC-it, by contrast, these three macro-categories show smaller variations across the two editions of the conference held so far. Table 2 shows the yearly distribution of the other document categories in both conferences.
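The yearly distributions reported in the tables are percentages of references per document category within each conference edition. A minimal sketch of that computation, rounding to two decimal places as in the note on the tables; the sample records, category names and the edition label "Y1" are invented for illustration.

```python
from collections import Counter, defaultdict


def category_distribution(references):
    """references: iterable of (conference, edition, category) tuples.

    Returns {(conference, edition): {category: percentage}}, rounded to 2 decimals.
    """
    counts = defaultdict(Counter)
    for conference, edition, category in references:
        counts[(conference, edition)][category] += 1
    table = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        table[key] = {cat: round(100 * n / total, 2) for cat, n in counter.items()}
    return table


# Invented sample records; "Y1" stands for one conference edition.
refs = [("AIUCD", "Y1", "book"), ("AIUCD", "Y1", "book"),
        ("AIUCD", "Y1", "journal article"),
        ("CLiC-it", "Y1", "proceedings paper"), ("CLiC-it", "Y1", "proceedings paper"),
        ("CLiC-it", "Y1", "journal article")]
table = category_distribution(refs)
print(table)
```

Rounded percentages may not sum to exactly 100, which is one reason to report the underlying counts alongside them.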
Some differences in the percentages of use can nonetheless be observed. Latin and Spanish, although scarcely

Note 3: two decimal places were used to display some of the results in this table correctly.

Table: cited references by decade of publication (pre-1950, 1950s, 1960s, 1970s, 1980s, 1990s, in press, undated) for AIUCD and CLiC-it.

Items dating from the years before are confined to the period In CLiC-it, references to works in press make up only 0.
The sources cited are manuscripts, letters and texts, and therefore fall almost entirely into the books macro-category. Technology provides the means needed to achieve important and complex goals. In many cases, the associations made the labelling work easier. For this purpose, 24 thematic groupings were developed:
- Computational Linguistics - CL
- Computer Science - CS
- Corpus Annotation - CA
- Digital Archives - DA
- Digital Libraries - DL
- Digital Philology - DP
- Information Retrieval - IR
- Latin - LAT
- Linguistics - LIN
- Machine Learning - ML
- Machine Translation - MT
- Ontology - ONT
- Psycholinguistics - PSY
- Scholarly Editing - SE
- Semantic Web - SW
- Treebanks Parsing - TP

Table 6 presents the distribution of the topics cited at the two conferences. The computer science associations involved in the CLiC-it and AIUCD citations give a measure of the common ground in the creation of platforms and infrastructures for digital archives of various kinds.
Significant works of history and philosophy emerge from the bibliographies, and references to general works such as dictionaries and encyclopaedias are not lacking. In this classification we included all citations concerning digital heritage and digital humanities. Some associations are shared by the two citation corpora. Chart 2 shows the relative frequencies of open access documents in the references of each edition.
In the two comparable editions there is growth, showing that authors at both conferences have increased their citations of open access documents. However, especially in AIUCD, citations of born-digital journals available only in electronic form are beginning to appear. The bibliographies of both conferences also include citations of digital versions of some books or book chapters, but these are still rather few, especially in CLiC-it 0.
These are very often document types such as reports (technical documentation, guidelines, theses…) found in the citations of both conferences. Although AIUCD pays close attention to digital objects, its distinctive philological component means that many of its authors must necessarily cite books (manuscripts and printed texts), whereas CLiC-it's stronger technological component leads most of its authors to cite papers published in conference proceedings.
Dorr, Bryan Gibson, Mark T. Radev, and Yee Fan Tan. Budapest Open Access Initiative. Last accessed 4 January Bethesda Statement on Open Access Publishing. Cassella, Maria, and Oriana Bozzarelli.
Open Access e comunicazione scientifica: What's the Future for Computational Linguistics? Dale, Robert, and Adam Kilgarriff. Bibliometrics and Citation Analysis: Information Services and Use Lee Giles, and Kurt Bollacker. In Atti del 1. Convegno Nazionale sulla Letteratura Grigia, edited by V. International Journal on Grey Literature 6: Metitieri, Fabio, and Riccardo Ridi. In Biblioteche in rete: A Companion to Digital Humanities. What about the Linguistics?
La citazione bibliografica nei percorsi di ricerca. Dalla galassia Gutenberg alla rivoluzione digitale. Communications of the ACM

Its importance is due to two factors: Second, it focuses on topics that are crucial for super-diverse societies (Becci, Burchardt and Giorda), such as gender, spirituality and ethnic minorities. In what follows, we provide a multidisciplinary analysis of these cultural and social aspects, bringing together an NLP approach with perspectives from semiotics and religious studies.
In particular, we describe how these issues are represented in OITNB, comparing the outcome of an automatic analysis of subtitles, reviews and fan discussions with a semiotic interpretation of the series' content and with recent work in the sociology of monastic institutions.

Automated keyword extraction

As a first step towards the automated analysis of OITNB, we created a large corpus containing enough information to explore both the series' content and its reception. In particular, we collected a dataset of , tokens divided into three subcorpora: For each corpus, we extracted the 50 most relevant keywords using KD (Moretti et al.).
KD is a flexible, rule-based tool that supports English and Italian and can be adapted to any domain by setting a list of parameters; it extracts a ranked list of keywords. The ranking algorithm is mainly based on frequency but also takes other information into account, such as keyphrase length i. We chose this tool for several reasons: Second, it is an off-the-shelf tool that does not require training data (hence, no preliminary annotation work).
Third, it is freely available, enabling other researchers to reproduce this work or apply a similar methodology to other sources of information. Finally, since it was developed at FBK, we had complete control over all parameters and could easily fine-tune them for our study. We also made extensive use of the tool's blacklist, which allows users to manually define a set of words or expressions to be excluded during keyword extraction. This was necessary because interjections, colloquial expressions and curse words are extremely frequent in subtitles, and we had to discard them to focus on the actual content of the series and to make the analysis comparable with reviews and discussion forums.
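KD's own rules are not reproduced here. As a rough, hypothetical illustration of the two ideas in this passage (ranking candidate keyphrases mainly by frequency with a bonus for longer phrases, and discarding blacklisted expressions), the sketch below counts word n-grams, drops any containing a blacklisted word or starting or ending with a stopword, and scores them by frequency times a length weight. The scoring formula, the `length_bonus` parameter, the English-only stopword list and the sample text are all assumptions.

```python
import re
from collections import Counter

# Small English stopword list (assumption; KD also supports Italian).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "her", "she", "with"}


def rank_keyphrases(text, blacklist, max_len=3, length_bonus=0.5, top=10):
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Skip candidates that start or end with a stopword.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            # Skip candidates containing a blacklisted word (e.g. interjections).
            if any(tok in blacklist for tok in gram):
                continue
            counts[" ".join(gram)] += 1
    # Hypothetical score: frequency weighted up for longer phrases.
    scored = {p: c * (1 + length_bonus * (len(p.split()) - 1)) for p, c in counts.items()}
    return sorted(scored, key=lambda p: (-scored[p], p))[:top]


# Invented subtitle-like sample text.
subtitles = "Oh my god, the prison guard. Yeah, the prison guard took her baby. Oh, prison is hard."
ranked = rank_keyphrases(subtitles, blacklist={"oh", "yeah", "god"})
print(ranked)
```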
KD's support for both English and Italian was necessary to compare English subtitles with reviews and discussions collected from Italian websites. The partial lists of keywords are reported in Fig.

Top-ranked keywords extracted from each of the three corpora considered

Analysis and corpus comparison

In this section we analyse the KD results and sketch three possible research directions: Finally, we compare the NLP-based approach with a traditional semiotic approach, to understand whether these two theoretical perspectives are complementary.
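Comparing the keyword lists of the three subcorpora, as the analysis that follows does, amounts to set operations over the top-ranked keywords of each corpus: which terms are shared by all corpora and which are specific to one. A minimal sketch; the keyword lists below are invented placeholders, not the study's results.

```python
def compare_keyword_lists(lists):
    """lists: {corpus_name: iterable of keywords}.

    Returns (shared_by_all, unique_per_corpus).
    """
    sets = {name: set(words) for name, words in lists.items()}
    shared = set.intersection(*sets.values())
    unique = {}
    for name, words in sets.items():
        # Union of the keywords of every other corpus.
        others = set().union(*(s for other, s in sets.items() if other != name))
        unique[name] = words - others
    return shared, unique


# Invented keyword lists for illustration only.
keywords = {
    "subtitles": ["mother", "sex", "inmate", "guard"],
    "reviews": ["mother", "sex", "season", "character"],
    "forums": ["sex", "scene", "character", "episode"],
}
shared, unique = compare_keyword_lists(keywords)
print(shared)
print(unique)
```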
Moreover, we can find some references to ethnic barriers in the prison, for example a strong polarization between white and black inmates e. Finally, issues of prison management are highlighted e. The critics focus on the series' social criticism e. Conversely, reviews show a high awareness of gender discrimination, which particularly affects one character of the show e. Other keywords relate to the specific text genre of reviews. Indeed, many of them seem to serve to give the reader contextual information about the show's storyline e.
The third corpus in our collection represents the voice of OITNB fans and their opinion of the series. It shows important differences from the other corpora: Moreover, we can notice an evaluative attitude towards characters e. Again, there is interest in some particular scenes of the show for their aesthetic quality eg: Finally, some keyphrases are stylistic clues about the nature of these texts e.
In other words, discursive structures exist as a kind of fulfilment of semio-narrative ones and represent the part of the text where deep values are translated into concrete elements such as characters, places, colours, figures, etc. At the deep level, by contrast, we find the value structure of the text, and these values are the key to understanding the meaning, the message and the very reason for being of the text itself.
As a matter of fact, the series' characters are portrayed in the middle of this dichotomy: The data extracted using KD would seem to confirm the interpretive hypothesis that the deep level of OITNB is structured by the opposition of sovereignty and submission, i. A first relevant term is mother, which appears as a keyword in both subtitles and reviews and represents a strong constraint for inmates. Indeed, most of them are mothers unable to raise their children.
Another example of the correlation between semiotic and NLP analysis concerns the term sex, which appears in all the corpora. However, in both cases, sex is linked with the relationship between inmates and their own bodies: The expression refers to the final scene of the third season: Once again, we have two complementary reactions: Conversely, critics pay attention to social aspects of the detention problem.
Issues related to religion are rarely the core topic of the narration, and few characters are explicitly religious. In general, religion plays a role only in moments of personal crisis, and religious groups tend to be under-represented, except for Catholics (Engstrom and Valenzano III) or, more generally, Christians (Clarke). Religion is therefore largely a background element, and for this reason it is assumed to be the Christian religion. The extracted keywords seem to confirm the under-representation of religion in TV series: Even though there are obvious limits to the analogy, it is instructive to compare prison settings to monastic settings (Giorda and Hejazi), since both can be interpreted as totalizing places where agents live with very few possibilities of communicating with the external world, organise themselves under strict rules and experience the deep and strong presence of borders.
Furthermore, the dialectic between exterior and interior also involves the psychology of inmates, and again we can use the descriptive tools of female monasticism to interpret it: This strongly hierarchised space is not an abstract construction. The jail is the centre that attracts, towards which the world goes, but also a watershed, a border that separates what and who is an insider from what and who is an outsider (De Certeau).
As with monastic settings, the relationship between everyday life and environment is pivotal. Through the comparison with monastic life, we can analyse the meanings of the inmates' identities, their relationships and their representation. In the first part of our work, we described the creation of the corpora and the automated text processing. In the second part, we presented the analysis of the data, which was in turn divided into three sections: In particular, we analysed religious discourse both at the textual level and at a deep level of meaning, suggesting a parallel between prison life and monastic choice.
Our study suggests that keyphrase extractors are valuable tools for guiding, enriching and validating interpretive hypotheses concerning the semantics of the linguistic level of cultural artifacts. While our focus was on the religious and gender issues figuring in OITNB, the approach can be applied to various other topics and, more generally, to text-based cultural artifacts at large. In the future, we plan to explore how the domain-specific knowledge provided by religious discourse analysis and semiotics may, in turn, be used to fine-tune KD rules and filters.
Bibliographic References
Bazzanella, Carla. John Benjamins Publishing Company.
Multiple Secularities Beyond the West. Religion and Modernity in the Global Age.
Femminismo, critica postcoloniale e semiotica, Milano:
Demaria, Cristina, and Lella Mascio. Migrazioni interculturali e propagazioni extra-testuali. Nicola Dusi and Lucio Spaziante, Roma:
Demaria, Cristina, and Siri Nergaard, eds. Temi e prospettive a confronto, Milano:
Surveiller et punir, Paris:
Giorda, Maria Chiara, and Sara Hejazi.
Essays on the social situation of mental patients and other inmates.
Essays on theory, film and fiction. Indiana University Press.
Extracting Keyphrases from Texts with KD.
Robert Musil and the Soldaten-Zeitung
Robert Musil, one of the most important authors of twentieth-century German literature, fought in the Austrian army on the Italian front. During the First World War, between and , Musil was chief editor of the Tiroler Soldaten-Zeitung in Bozen and later of the Viennese journal Heimat, where he probably authored numerous articles.
We suggest that applying methods of formal authorship attribution helps solve both issues. In , publication was entrusted to the Bozen-based Heeresgruppenkommando Erzherzog Eugen, to which Lieutenant Musil was assigned during the same year. At the beginning of October, Musil became chief editor of the newspaper. Due to the repositioning of the commands and to technical problems, the magazine ceased publication in April. Subsequently, Musil moved to Vienna, where he collaborated with the war journal Heimat from March to October. All 43 issues of the Soldaten-Zeitung published with Musil's collaboration are still extant, while only 17 of the 34 issues of Heimat survive.
Both in the case of the Soldaten-Zeitung and in the case of Heimat, all articles were published anonymously. Musil scholars have never been able to establish with certainty the number of texts written by the author. In Musil studies, between and , at least 40 articles were attributed to him. What is surprising about these attributions, however, is the lack of evidence supporting them.
Subsequent studies, such as the one by Arntzen, refer to Roth without pointing out the gaps in her argumentation. For our purposes, the analysis of the individual texts therefore poses a challenge. First, it will work only on the assumption that at least 8 articles were actually written by Musil, and we lack sufficient proof to sustain this assumption.
Second, it will imply the adoption of a complex combinatory design, a strongly demanding computational task even for the most powerful machines. However, the complexity of this design can be reduced by introducing some careful simplifications. As shown in Figure 1, some texts fall far below the 1, word mean, with a minimum of 47 words. Consequently, these texts may be excluded from the experiment in advance, both because they are less reliably attributable and because they sharply decrease the length of the text combinations.
As demonstrated below, we decided that a reasonable limit can be set at words, thus cutting 9 texts from the experiment. The combinatory design could then be repeated on this simplified corpus of 28 texts, reducing the number of combined texts to 6. However, this design too could be highly demanding in computational terms. To reduce this complexity further, an effective expedient is to add some already attributed articles to the text chunks.
The number of iterations will sharply decrease to 3, However, the biggest issue with these project designs will be the interpretability of the results. The articles published by Musil in various journals between and , available in digital format through the Klagenfurter Ausgabe (Amann et al.). The first one will be tested in this paper: However, methodological research advises against relying on a single approach.
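The cost of these designs can be estimated with the binomial coefficient C(n, k), the number of ways of choosing k texts out of n, assuming that each test chunk is an unordered combination of disputed texts and that "6" above denotes the number of texts per chunk. The sketch first drops texts below a minimum length (the actual threshold is not recoverable from this section, so `min_words` is a hypothetical parameter) and then counts the chunks. The toy word counts and the lighter k = 4 design are invented for illustration.

```python
from math import comb

def filter_short_texts(word_counts, min_words):
    """Keep only the texts whose length reaches the (hypothetical) threshold."""
    return {title: n for title, n in word_counts.items() if n >= min_words}

def design_size(n_texts, per_chunk):
    """Number of distinct chunks when `per_chunk` texts are drawn from `n_texts`."""
    return comb(n_texts, per_chunk)

# Toy word counts (invented): the 47-word text is dropped with a 500-word threshold.
toy = {"article_a": 47, "article_b": 1250, "article_c": 830}
print(sorted(filter_short_texts(toy, min_words=500)))  # ['article_b', 'article_c']

# Simplified corpus described above: 28 texts, 6 per chunk.
print(design_size(28, 6))  # 376740
# Hypothetical lighter design with only 4 disputed texts per chunk.
print(design_size(28, 4))  # 20475
```

Because C(n, k) grows very quickly with k, replacing part of each chunk with already attributed articles (fewer disputed texts per chunk) is what makes the number of iterations manageable.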
As noted by Juola, only the combination of different stylometric methods can provide probabilistically significant results. Consequently, the approach developed here may simply be the first stage of a multilayered authentication chain. The training set was composed of articles published by Musil between and. In order to test combinatory design 2, a number of test sets were composed by manually combining different disputed articles written by Musil.
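This section does not state which stylometric distance underlies the clustering. Burrows's Delta is a widely used choice, so what follows is a minimal sketch of it, offered as an assumption rather than as the authors' documented setup. Relative frequencies of the most frequent words are z-scored across all training texts, and a test chunk is compared with each candidate's mean z-score profile; the candidate with the smallest mean absolute difference is the closest. The candidate labels and the tiny German texts are invented.

```python
import re
from collections import Counter
from statistics import mean, pstdev

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def rel_freqs(text):
    """Relative frequency of each word type in a text."""
    tokens = tokenize(text)
    return {w: c / len(tokens) for w, c in Counter(tokens).items()}

def burrows_delta(train, test_text, n_mfw=100):
    """Return {candidate: Delta}; lower values mean stylistically closer.

    `train` maps each candidate author to a list of training texts.
    """
    pooled = Counter(w for texts in train.values() for t in texts for w in tokenize(t))
    mfw = [w for w, _ in pooled.most_common(n_mfw)]
    profiles = {a: [rel_freqs(t) for t in texts] for a, texts in train.items()}
    all_profiles = [p for ps in profiles.values() for p in ps]
    # Corpus-wide mean and population standard deviation of each word's frequency.
    stats = {}
    for w in mfw:
        values = [p.get(w, 0.0) for p in all_profiles]
        stats[w] = (mean(values), pstdev(values) or 1.0)  # guard against zero spread

    def zscores(profile):
        return [(profile.get(w, 0.0) - stats[w][0]) / stats[w][1] for w in mfw]

    test_z = zscores(rel_freqs(test_text))
    deltas = {}
    for author, ps in profiles.items():
        # Candidate centroid: mean z-score of each word over that candidate's texts.
        centroid = [mean(col) for col in zip(*(zscores(p) for p in ps))]
        deltas[author] = mean(abs(t - c) for t, c in zip(test_z, centroid))
    return deltas

train = {
    "author_a": ["der die das und der die das und ist", "der das und die der das ist und"],
    "author_b": ["ein eine einer oder ein eine oder nicht", "eine ein oder nicht ein oder einer"],
}
deltas = burrows_delta(train, "der die und das ist der und das")
print(min(deltas, key=deltas.get))  # author_a
```

Real chunks would need thousands of words and a few hundred most frequent words for Delta to be stable, which is why chunk length matters so much in the designs discussed here.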
However, their precise identification, or, more properly, the definition of their probability, will be possible only after comparing the results of , iterations, which will provide 80, different results for each text. Similar testing for design 3 shows hardly any text chunk falling outside the Musil cluster. Evidently, the presence of 2, words actually written by Musil in the test set acts as a dominant attractor towards the Musil cluster. However, further investigation shows that improving the training set does yield some statistically significant results.
This holds even when adopting the simplified combinatory design 3. The fact that the threshold length is around 1, words is also a promising result, because it corresponds to the average length of the disputed Musil articles. However, once again, these results cannot be generalized, and more systematic research on this type of artificial test set is advisable. This approach may eventually help tackle the biggest doubt that overshadows the project: Therefore, besides expanding the research to a more comprehensive corpus comprising all issues of the Soldaten-Zeitung and Heimat, our goal is to identify all (or at least most) of the previously mistaken attributions.
In relation to stylometry in general, this research has shown that the limit of 5, words, while necessary in itself for the construction of an effective project design, is by no means an insurmountable boundary, especially when the researcher, instead of looking for positive scores or strong attributions, starts looking for negative scores and structural anomalies.
Bibliographic References
Robert Musil, Klagenfurter Ausgabe.
University of Nebraska.
Authorship attribution, small samples, big problem.
Fontanari, Alessandro, and Massimo Libardi, eds.
University of Oulu.
Koppel, Moshe, and Yaron Winter.
Der Dichter im Dienst des Generals. Robert Musils Propagandaschriften im ersten Weltkrieg.
Distant reading of literary journals has allowed numerous scholars to draw interesting conclusions about the forms, shapes, currents and dynamics of literary life (cf. Bode, Long and So, Goldstone and Underwood). In the present study we combine two approaches, co-citation network analysis and topic modelling, in order to give a nuanced view of the Polish literary-studies bimonthly Teksty Drugie [Second Texts].
The first part is dedicated to the co-citation analysis, based on metadata extracted from texts published between and (28, bibliographical records and 10, unique authors). It enabled the detection of 15 meaningful groups of authors interconnected by the references they use. A subsequent part employs topic modelling to analyse full-text articles (11.3 million words) published between and , which revealed thematic patterns pertinent to the journal; these are first discussed as an interconnected network and subsequently analysed in a diachronic perspective.
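In conventional co-citation analysis, two cited authors are linked whenever they appear together in the references of the same citing text, and the edge weight counts how many texts cite both. The section does not spell out its exact linking rule, so the following is a minimal sketch under that conventional assumption; the text identifiers and author labels are invented.

```python
from collections import Counter
from itertools import combinations

def cocitation_edges(references_by_text):
    """Map each unordered pair of cited authors to the number of texts citing both."""
    edges = Counter()
    for cited in references_by_text.values():
        # A set ensures each author is counted once per citing text.
        for a, b in combinations(sorted(set(cited)), 2):
            edges[(a, b)] += 1
    return edges

refs = {
    "text_1": ["author_a", "author_b", "author_c"],
    "text_2": ["author_b", "author_c", "author_d"],
    "text_3": ["author_a", "author_b"],
}
edges = cocitation_edges(refs)
print(edges[("author_b", "author_c")])  # 2
print(edges[("author_a", "author_b")])  # 2
```

Sorting each text's authors before pairing gives every undirected edge a single canonical key, so the resulting weighted edge list can be passed directly to a community-detection step or exported to a tool such as Gephi.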
It shows tight links between humanities scholarship and questions pertinent to society. Such an approach allows for a multifaceted bird's-eye view of the processes of literary scholarship.
Macroanalysis and Literary Scholarship
The availability of digitised full-text resources, as well as of bibliographical data in standard database formats, has quite recently opened a new chapter in the sociology of literature, especially by revaluing empirical approaches and data-driven scholarship.
This approach gathered momentum as other works exploring the applicability of distant reading emerged. Due to lack of space we will name just a few that have most influenced this paper, dividing them, somewhat arbitrarily, into three research strands. Firstly, the use of bibliographical data for statistical inferences on literary processes. Secondly, the study of author co-occurrences and mutual references. Thirdly, the application of topic modelling to uncover pertinent issues in literary scholarship, e.g. Goldstone and Underwood's analyses of the evolution of American literary scholarship based on PMLA and seven major literary journals. In combining those approaches into a macroanalytical study of Teksty Drugie, we also adopted the rationale introduced by the internet edition marking the 40th anniversary of Signs, a literary journal dedicated to feminist criticism.
Material
Teksty Drugie is a Polish scholarly journal dedicated to literary scholarship.
It focuses on literary theory, criticism and cultural studies, while also publishing articles by authors from neighbouring disciplines (philosophy, sociology, anthropology). The journal publishes monographic issues dedicated to particular topics or approaches within literary and cultural studies. All these features make it a good case for exploring the vicissitudes of Polish literary scholarship.
The bibliographic material consists of 28, references in texts published between and. It should be noted that each reference to a particular article or book appears only once in the metadata, no matter how many times the work was cited by the author. In order to increase the comparability of the data, only the names of the authors (or of the editors, in the case of collected volumes) were used. Overall there were 10, unique authors. More than two thirds of them
The textual corpus consists of the entire collection of papers published in Teksty Drugie, excluding letters, surveys, notes, etc.
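The two preprocessing rules described above, counting each cited work only once per citing text and keeping only author names (editor names for collected volumes), can be sketched as follows. The record layout with `work_id`, `authors` and `editors` fields is a hypothetical structure, not the project's actual metadata schema, and treating "no authors listed" as the mark of a collected volume is likewise an assumption.

```python
def names_for_text(references):
    """Return the sorted set of names credited in one text's references.

    Each reference is a dict with a hypothetical layout:
    {"work_id": str, "authors": list[str], "editors": list[str]}.
    """
    seen_works = set()
    names = set()
    for ref in references:
        if ref["work_id"] in seen_works:
            continue  # a work cited several times counts only once
        seen_works.add(ref["work_id"])
        # Collected volumes (assumed: no authors listed) are credited to their editors.
        names.update(ref["authors"] or ref["editors"])
    return sorted(names)

refs = [
    {"work_id": "w1", "authors": ["Nowak"], "editors": []},
    {"work_id": "w1", "authors": ["Nowak"], "editors": []},
    {"work_id": "w2", "authors": [], "editors": ["Kowalska", "Zielinski"]},
]
print(names_for_text(refs))  # ['Kowalska', 'Nowak', 'Zielinski']
```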
The earlier material was digitised, OCR-ed, and then manually edited in order to exclude running heads, editorial comments, and so forth. Obviously, some textual noise may have remained. The material from onwards was born digital, but even there a small number of textual issues might have occurred. We believe, however, that distant-reading techniques are resistant to small amounts of systematic noise (Eder). Given that Polish is a highly inflected language, lemmatization was necessary for reliable processing of the texts.
The corpus has been lemmatised with LEM 1 (Piasecki, Walkowiak, Maryl, under review).
Dynamic visualisation, supporting the interpretive process, was performed in Gephi (Bastian et al.), with the Louvain algorithm (Blondel et al.) used for community detection. A bimodal network of the relations between topics was also produced in Gephi. Other parameters used in the study included: Subsequent trials showed that the best segmentation of the co-citation network is achieved with 27 groups, of which 12 were too small for analysis.
Those groups were then interpreted. The graph is bidirectional: authors referencing others may also have been referenced.
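The Louvain algorithm searches for the partition that maximises modularity. For an undirected weighted graph this can be written as Q = Σ_c [ L_c / m − (d_c / 2m)^2 ], where m is the total edge weight, L_c the weight of the edges inside community c, and d_c the summed weighted degree of its nodes. Rather than reimplementing Louvain, the sketch below only computes Q for a given partition, i.e. the score by which candidate segmentations (such as the 27-group solution) would be compared. The toy graph of two triangles joined by a single edge is invented.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity Q of a partition of an undirected weighted graph.

    edges: dict (u, v) -> weight, each undirected edge listed once.
    community: dict node -> community label.
    """
    m = sum(edges.values())
    internal = defaultdict(float)  # L_c: weight of edges inside community c
    degree = defaultdict(float)    # d_c: summed weighted degree of nodes in c
    for (u, v), w in edges.items():
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

edges = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
         ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1, ("c", "d"): 1}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
single = {node: 0 for node in "abcdef"}
print(round(modularity(edges, split), 3))  # 0.357
print(modularity(edges, single))           # 0.0
```

Putting every node in one community always yields Q = 0, so positive values indicate that the groups are more densely connected internally than expected by chance.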
Figure 1. Co-citation network of Teksty Drugie with its respective clusters.
Firstly, we analysed and categorised the topics on the basis of their predominant words. The categories are as follows: A thorough exploration of such models requires a topographical visualisation capable of showing the connections between various topics, which often share a key word (cf. Goldstone and Underwood). For instance, almost exactly at the geometric centre of the network we find topics and words pertinent to literary scholarship.
Relationships between topics in Teksty Drugie.
Discussion
In the discussion we elaborate on the details of both models and show how the two methods may complement each other. Topic models help us understand why certain authors could be connected in the co-citation network.
They could also allow us to check whether the dominance of a certain topic stems from the large number of scholars who pursue it or, rather, from the fact that a small group of authors has been publishing more often than others.
Bibliographic References
Journal of Statistical Mechanics: Theory and Experiment.
Recalibrating the Literary Field.
Communications of the ACM, 55(4).
Goldstone, Andrew and Ted Underwood. New Literary History, 45(3).
Digital Methods and Literary History. University of Illinois Press.
Long, Hoyt and Richard So.
Abstract Models for a Literary History.
Paper submitted to DH.
So, Richard and Long, Hoyt. Boundary 2, 40(2).
Given the characteristics of electronic computers, and in particular their computational aspect (Ausiello et al.). Using an interdisciplinary approach that draws on several fields, from textual criticism to theoretical computer science by way of library science, this paper focuses on the specific nature of electronic editions, analysing from a different point of view aspects hitherto considered irrelevant and merely ancillary.
As a consequence, the models developed so far, such as the Lachmann method (Timpanaro) or the Bowers-Greg theory of copy-text (Greg), are fundamentally grounded in formal and nomothetic criteria: consider the majority rule, or the structural distinction between accidentals and substantials. Moving to a pragmatic level, in an essay on textual bibliography Neil Harris quotes a definition by W.
Bibliographic References
Ausiello, G.
Thinking about the digital humanities.
The Beauty of Code, the Code of Beauty.
"Why the Humanities Need Numbers to Survive". Aspen Ideas Festival.
Fondamenti di critica testuale.
The Rationale of Copy-Text. Studies in Bibliography 3(1).
A Critique of Modern Textual Criticism. University of Virginia Press.
Literature after the World Wide Web.
Or do electronic scholarly editions have a mercurial attitude? International seminar of Digital Philology, Edinburgh.
Informatica e critica dei testi.
Manuale di linguistica e filologia romanza, Manuali.
TEI Consortium (ed.). Guidelines for Electronic Text Encoding and Interchange.
La genesi del metodo del Lachmann, Firenze:
Communications of the ACM 49(3).
Imagining how emerging technologies could first be tried on humanities problems and then scaled up to infrastructure for others to use has been one of the defining features of the field. The project lasted 34 years and at its peak involved a staff of 71 people, all housed in a large former textile factory in Gallarate.
For their time they were dealing with big data, really big data.