DANES Newsletter - March 2024

Happy birthday DANES! We are celebrating one year since our first get together in the DANES 2023 conference–join us this month for an extra-happy happy hour, in which we will celebrate and reflect on the DANES activities of the past year! This month’s newsletter has some exciting updates of DANES working groups, including the progress of the Elamite task force and the efforts of the annotation and interoperability group in linking relevant assyriological resources to Wikidata.

Some common threads are the consolidation of egyptological projects and computational research, as well as previously sporadic datasets. Also, we are beginning to see much more inscribed artifacts and monuments recorded in 3D using photogrammetry, and published open access online. There is an expanding pattern of sharing old and new datasets across the fields of the ANE–from fully annotated texts, to GIS archaeological site data, and annotations of scripts from images–and using that data for new computational research. Don’t forget to cite properly!

DANES Working Groups

MEGA-ALP Elamite task force

This month saw the setting up of the first draft of the MEGA annotation guide! We had a first MEGA housekeeping meeting in which we discussed the ontology and annotation standards, and later in the monthly sprint reflected on problems and challenges, as well on the fluctuating nature of digital resources and editions for ancient texts. We currently have 539 annotated entries and counting!

Join us for the next monthly sprint on 27^th of March 15:00-16:00 CET / 16:00-17:00 IST / 09:00-10:00 EST. Noun, verb, and adjective annotation groups are now meeting regularly, please send an email to shygordin@gmail.com or katrien.degraef@ugent.be to join.

The DANES happy-hour

A get-together on the DANES discord channel to discuss news in computational studies of the ANE, following up on the monthly newsletter. Come one come all to celebrate the DANES one year anniversary!

This month’s happy hour will take place on Wednesday 6^th of March at 17:00-18:00 CET / 18:00-19:00 IST / 11:00-12:00 EST

Recent Academic Publications

Classification in Sumerian cuneiform and the implementation of iClassifier (article, Journal of Chinese Writing Systems), by Gebhard J Selz and Bo Zhang

As part of a special issue of the Journal of Chinese Writing Systems, Classifiers in Ancient Scripts, the iClassifier project led by Orly Goldwasser contributed several papers on the ancient classifier systems of ancient Chinese, Egyptian hieroglyphs, and Sumerian. The latter has just appeared, presenting results on computational analysis of semantic classification in Sumerian texts and the application of Gebhard Selz’s consolidated list of 50 cuneiform determinatives, i.e. classifiers. The study extracted more than 5,200 tokens (i.e. signs in Sumerian words) with at least one classifier or more from the electronic Pennsylvania Sumerian Dictionary (now ePSD2). The iClassifier system maps the Sumerian classifiers, creating a dynamic network of words and their variant spellings with links to their different classifiers. Taking also the intensity (i.e. raw frequency) of links each classifier has to different word variants will allow scholars to start uncovering emic taxnomies and patterns of classification in Sumerian, as well as linking them across many other ancient languages with similar systems.

Cuneiform Stroke Recognition and Vectorization in 2D Images (article, Digital Humanities Quarterly), by Adéla Hamplová, Avital Romach, Josef Pavlíček, Arnošt Veselý, Martin Čejka, David Franc, Shai Gordin

High-quality optical character recognition (OCR) requires significant amounts of training data that are not easily available for cuneiform, particularly in the case of 2D images, which are becoming the most ubiquitous digital representation of cuneiform. This study suggests a new approach to an OCR pipeline from 2D images. Instead of identifying signs, the article shows the plausibility of using deep learning to identify cuneiform strokes. Focusing on horizontal strokes as an initial case-study, they show the potential of a method that is period-agnostic and can provide immediate vectorization of the strokes. Such vectorizations can be used in further stages of a pipeline to identify constellations of strokes with specific cuneiform signs, and assist in paleographic analysis. The annotated dataset of horizontal strokes was released with the article.

Linguistic annotation of cuneiform texts using treebanks and deep learning (article, Digital Scholarship in the Humanities), by Matthew Ong and Shai Gordin

The article describes an efficient pipeline for morpho-syntactically annotating an ancient language corpus using machine learning (ML) methods. The authors situate their work in the field of similar ancient language treebank projects, arguing that humanities scholars can leverage current machine-learning tools to produce their own richly annotated corpora. The study further illustrates this pipeline by producing a new Akkadian-language treebank based on two volumes from the online editions of the State Archives of Assyria (SAA) project hosted on ORACC, as well as a spaCy language model named AkkParser trained on that treebank. Both of these are made publicly available for annotating other Akkadian corpora (see also datasets published below).

Long-Term Urban and Population Trends in the Southern Mesopotamian Floodplains (article, Journal of Archaeological Research), by Nicolò Marchetti, Eugenio Bortolini, Jessica Cristina Menghi Sartorio, Valentina Orrù, and Federico Zaina

There has recently been a rising trend in studies of long-term urbanization processes in southern Mesopotamia. This paper presents interim results of the FloodPlains Web GIS project, using close to 5,000 site data covering a period of more than 6,000 years collected from published archaeological surveys and excavations since the 1950s; their data and code is available on GitHub, as well as in an interactive GIS platform. The authors measured modifications over time in a variety of demographic proxies generated through probabilistic approaches. Their results show that different factors, chiefly among them rapid climate changes, contributed to the main urban and demographic cycles in southern Mesopotamia. They also discuss what settlement strategies characterized each of these demographic cycles.

Highlighted Academic Publications

Ancient Egypt, New Technology: The Present and Future of Computer Visualization, Virtual Reality and Other Digital Humanities in Egyptology (book, 2023), edited by Rita Lucarelli, Joshua A. Roberson, and Steve Vinson

A starting point to current and recent digital humanities projects in egyptology, the above volume collects case-studies of digitization projects and computational analysis methods for Egyptian texts, artifacts, and landscapes. Most of the book chapters were originally presented at a conference held in 2019 at Bloomington Indiana, whose purpose was to bring scholars from around the world to discuss their state-of-the-art research in digital egyptology. The contributions cover a wide breadth of new methodologies. They include ethics of digital publications, online interfaces and digitization of collections and textual corpora, or Egyptian-themed computer games as pedagogical tool for schoolchildren. They also investigate new computational research questions with a critical perspective, such as the use of photogrammetry with museum artifacts and archaeological contexts for better analyses, or new possibilities and potential pitfalls of virtual reconstructions of ancient landscapes. Many of the articles include detailed introductions into the creation and execution of the digital aspects of their work.

Special Mention

The up-to-date ORACC sign list (OSL, previously OGSL) has launched with new linked data properties! Linked data ensures interoperability between websites, platforms, and datasets through the semantic web. Congratulation to Steve Tinney and Robin Leroy for producing unique IDs for each sign. One can also download the sign list in several formats, and read more about the continuously updating cuneiform unicode standard. The signs are partially linked to Wikidata, as part of the Factgrid Cuneiform project led by the DANES interoperability and annotation team leaders Adam Anderson and Timo Homburg, together with DANES network co-creator and board member Hubert Mara. To link from Wikidata to the ORACC Sign List, please join in supporting the wikidata property proposal!

Datasets Published

Bibliography of Egyptological databases and datasets, by Alexander Ilin-Tomich and Tobias Konrad, aims to collect digital publications of databases, texts, images, 3D models, etc., which are related to egyptology, and not otherwise represented in conventional egyptological bibliographies. The dataset is available on Zenodo as a CSV or JSON file, as well as in a Zotero file and library.

A Full Morphosyntactic Annotation of the State Archives of Assyria Letter Corpus, by Matthew Ong, available on Zenodo, includes annotations specifying part of speech, lemma, morphological decomposition, and syntactic dependencies of all relevant tokens in the entire letter corpus of the Neo-Assyrian empire available on ORACC (ca. 2600 letters). The annotations were made with the help of a language model with human checking to validate a representative selection of the results. It also includes metadata such as sender, recipient, estimated date of composition, script, and dialect of Akkadian (if determinable).

Tower of Babylon Stele in 3D photogrammetry, by Olof Pedersén, is a 3D representation of the so-called Tower of Babylon stele. The stele was discovered in Babylon in the early 1990s, for 28 years it was housed in the private Schøyen Collection (Oslo), and since December 2023 it is back in the Iraq Museum (Baghdad). It is the only known original side illustration of the Ziggurat in Babylon and the only picture of the king Nebuchadnezzar II so far found in Mesopotamia. Following his photogrammetry of the stele in Oct. 2023, Pedersén also published his analysis of the artefact, its findspot, and implications on its dubious authenticity.

Transliterated Cuneiform Tablets of the Electronic Babylonian Library Platform, by Yunus Cobanoglu, Jussi Laasonen, Fabian Simonjetz, Ilya Khait, Sophie Cohen, Zsombor Földi, Aino Hätinen, Adrian Heinrich, Tonio Mitto, Geraldina Rozzi, Luis Sáenz, and Enrique Jiménez (in Journal of Open Humanities Data), makes available the transliterated dataset of the eBL for further computational study. They publish all cuneiform fragments transliterated by 1^st of September 2023 on Zenodo in JSON format, and a GitHub repository with instructions on how to access the current, full dataset in python code through an API. The full dataset includes more than 25,000 fragments of transliterated cuneiform texts. On the eBL project, see below under Upcoming Events.

Conferences and Call for Papers

Upcoming Events

The electronic Babylonian Library (eBL) is hosting a zoom meeting intended to provide a comprehensive overview of the tools and functionalities of the eBL platform. It will also include a dedicate Q&A section. The eBL’s goal is to accumulate and make accessible a large mass of transliterated cuneiform fragments with an effective search function, to alleviate the difficulties of deciphering broken and damaged cuneiform texts. The first session will take place on March 5^th at 6:00 PM CET, and participation requires registration.

The biblissim-ia-2024 Artificial intelligence for pattern and handwriting recognition is organizing a confernce that will take place on 7^th-8^th of March in Aubervilliers (France) and on zoom. Talks include use of advanced optical character recognition (OCR) and handwriting text recognition (HTR) for varied languages and corpora, such as Hebrew, Greek, Sanskrit, and medieval manuscripts. Some discuss the combination of OCR/HTR models and language models to improve the identification pipeline.

A Syriac Transcribathon will take place between 25^th-28^th of March at Princeton University and online. The Princeton MARBAS initiative and the ERC MiDRASH project are organizing a hybrid transcribathon on Syriac manuscripts. Data produced will be open source and freely shared as far as image rights permit. The aim is to produce a sufficient quantity of transcriptions for manuscript images in order to support the training of handwritten text recognition (HTR) computer models for Syriac. Deadline for registration is 10^th of March.

Call for Papers

The 2024 annual ASOR conference will include the following sessions and workshops on digital and computational topics:

Deadline for papers is the 15^th of March.

Did we miss relevant articles published in the previous month? Did we miss upcoming events in the next month? Would you like to ensure your news will appear in the next newsletter? Please send us an email at digpasts@gmail.com! Corrections to published Newsletters will be sent via the DANES mailing list. Image copyrights can be found by clicking on each image.

On this page