GCND Hackathon

GCND Hackathon: Automatic linguistic annotation and speech recognition of dialects

Calling students and researchers in linguistics, digital humanities, social sciences, computer science, natural language processing: we have a rich language data source, and we’d like you to come and work on it!

Join us for a dynamic two-part course exploring dialect syntax through hands-on projects. In the first part of the course, lectures by Veronique Hoste (UGent) and Hugo van Hamme (KU Leuven) will introduce participants to automatic speech recognition (ASR), data annotation, and visualization, with a focus on dialect data.

In the second part, participants will work on collaborative projects using data from the Spoken Corpus of Southern Dutch Dialects (GCND) to enhance ASR or annotation tools, gaining practical experience in dialect research. Potential projects will be proposed by GCND researchers, but participants are also encouraged to propose their own. Projects may involve developing tools for data analysis, addressing questions about language variation, or creating works that incorporate the range of voices and stories in the dataset.

Participants in the hackathon will have access to the GCND:

The parsed corpus of Southern Dutch Dialects (GCND) is a linguistically annotated corpus based on an existing collection of dialect recordings from the 1960s and 1970s, Voices from the past. The corpus, which is still being expanded, currently provides about 500 hours of audio-aligned transcriptions from ca. 550 different locations in two layers, one closer to the dialect and one closer to Standard Dutch. About 50 of those are already part-of-speech tagged, automatically parsed and manually corrected. The corpus is meant to facilitate large-scale research into the syntactic particularities of the southern Dutch dialects.

The course will be taught in English, but basic knowledge of Dutch is required to work with the GCND.

Course Objectives

  • Develop skills in annotating dialect data sets and in creating and adapting tools for analyzing dialect data.
  • Gain proficiency in annotating spoken data, with a focus on Southern Dutch dialects, in automatic speech recognition, and in collaborative project development.
  • Acquire interdisciplinary research experience and foster collaborations.

How to Apply

The course will take place from Monday 7 to Wednesday 9 October 2024 at Ghent University and is free of charge for all participants. Lunches and coffee will be provided. For participants who are not affiliated with a Flemish institution and who are not taking the course as part of a Flemish Doctoral Schools program, several travel grants of up to €500 for travel and accommodation expenses are available. Please indicate on the sign-up sheet if you want to apply for such funding.

To apply, visit the sign-up sheet. Registration is possible until September 1. You can contact maud.westendorp@uit.no or melissa.farasyn@ugent.be with general inquiries.