Rebuilding and modernizing a large, 20-year-old database of folk poetry with eXist-db and TEI Publisher

The community meetings are held online using this link, usually on the first Tuesday of each month at 17:00 CET/CEST. No registration is required to attend. We warmly invite you to propose topics for forthcoming meetings by writing to info@e-editiones.org.

Community meetup on May 06, 2025 at 17:00 CET.

Folk poetry is closely tied to the Finnish national awakening of the 19th century. The publication of the Finnish national epic, Elias Lönnrot’s Kalevala, in 1835 awakened a national idealism, inspiring a large number of university students to follow in Lönnrot’s footsteps and collect folk poetry. The majority of the original folk poetry thus collected was published by the Finnish Literature Society in the first half of the 20th century in 33 printed volumes, containing some 89 000 Kalevala-metre texts in total. Some 20 years ago, this series, called Suomen Kansan Vanhat Runot (The Ancient Songs of the Finnish People), or SKVR for short, was made into an SQL-based database.

The SKVR collection is of major importance to folklore studies and other fields of humanities scholarship, as well as to individuals in non-academic fields, such as folk musicians and authors seeking inspiration for their work. However, by 2024 the old database was in dire need of modernization, with improved features to better serve the needs of researchers. We’d successfully used eXist-db and TEI Publisher for some years for smaller digital editions, the largest one up to then containing images of c. 10 000 archival index cards with transcriptions, so they felt like the best tools for our modernization project. One of the major benefits of eXist-db and TEI Publisher, for us, is that as open source software they allow us to control the entire process ourselves, from design to building work to possible changes and improvements based on user feedback.

The new database would contain not only the original c. 89 000 texts but also a new collection of c. 84 000 texts which had not been included in the original printed series for various reasons. In our project we had quite specific requirements for features: e.g. a complicated, three- or four-level genre categorization; multi-selection within facets; enabling users to select which collection to query; fielded queries. Performance was a major concern, especially with regard to the catalogue views with facets. We could split the original SKVR and the new dataset of unsorted poems into two separate catalogue views, but nearly 90 000 TEI documents per dataset is still a lot to load into an HTML view with facets. One way to (further) improve performance would have been to limit the facet listings to show e.g. the first 50 entries initially, but we felt that it wasn’t a solution in this case: with this kind of data, the facets themselves offer a wealth of information on the genres, collectors, collection regions, places and years of the poems – information that is updated in real time when the user makes facet selections. If a user wants to know, say, where and when a specific person collected poetry or which regions yielded epic poems, the facets reveal that information immediately.
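To illustrate why the facet listings carry information in their own right, here is a minimal Python sketch of multi-select faceting. It is not the SKVR implementation (which runs on eXist-db and TEI Publisher); the records, field names and the functions matches and facet_counts are all hypothetical, and the sketch assumes that values selected within one facet are combined with OR while different facets are combined with AND.

```python
from collections import Counter

# Hypothetical, hand-made records; field names and values are illustrative only.
POEMS = [
    {"genre": "epic", "collector": "Collector A", "region": "Region X"},
    {"genre": "epic", "collector": "Collector B", "region": "Region Y"},
    {"genre": "lyric", "collector": "Collector A", "region": "Region X"},
    {"genre": "charm", "collector": "Collector B", "region": "Region X"},
]


def matches(poem, selections):
    """Return True if the poem satisfies every facet that has a selection.

    Assumption: values within one facet are OR-ed, facets are AND-ed.
    Facets with an empty selection are ignored.
    """
    return all(
        poem[field] in values
        for field, values in selections.items()
        if values
    )


def facet_counts(poems, selections, fields):
    """Count the values of each listed field among the matching poems."""
    hits = [poem for poem in poems if matches(poem, selections)]
    return {field: Counter(poem[field] for poem in hits) for field in fields}


# Which regions and collectors are represented among the epic poems?
epic = facet_counts(POEMS, {"genre": {"epic"}}, ["region", "collector"])

# Multi-selection within one facet: epic OR lyric.
epic_or_lyric = facet_counts(
    POEMS, {"genre": {"epic", "lyric"}}, ["region", "collector"]
)
```

In this sketch the counts are recomputed from the matching records after every selection, which is why truncating a facet listing to its first 50 entries would hide part of the answer to a question such as "which regions yielded epic poems".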

The rebuilding and modernization project has been a learning experience in many ways, and a challenging one too, not least because of the expectations of various existing user groups with different needs. It has shown us concretely that eXist-db and TEI Publisher do indeed work well with large datasets. The experience gained during the project will be used, and hopefully improved upon, in forthcoming large-scale publishing projects.

The SKVR database can be accessed at: https://skvr.fi

Printed SKVR-series staples, image by the Finnish Literature Society