Access to the internet is a basic component of everyday life and civic engagement, but language remains a barrier to fair and equitable access. As Europe becomes more multicultural, and personal and professional mobility between cultures rapidly increases, access to fundamental resources such as local news and government services is limited by the great diversity of the EU's 37 languages. The internet developed mostly in English, without clear planning for how language issues might create barriers to access and engagement, or for how multilingualism might be supported. In the EU, websites and online services for citizens have developed resources in national and local languages, and often provide a second language (usually English) only when absolutely needed; but the great proliferation of web content, multiple fast-changing content streams, and an expanding range of user interests make this approach untenable. And while advanced natural language research and resources exist for a few dominant languages (English, French, German), many of Europe's smaller language communities, and the news media industry that serves them, lack appropriate tools for multilingual internet development. For the EU to realise a truly equitable, open, multilingual future internet, new tools allowing high-quality transformations (not translations) between languages are urgently needed.

The EMBEDDIA project seeks to address these challenges by leveraging innovations in cross-lingual embeddings coupled with deep neural networks. These allow existing monolingual resources to be used across languages and, thanks to their high speed of operation, support near real-time applications without the need for large computational resources. Over three years, the project's six academic and four industry partners will develop novel solutions, including for under-represented languages, and test them in real-world news and media production contexts. Automated content analysis of news media, covering both news articles and users' comments on them, can provide unparalleled insight into current events, interests and opinions, as well as into how these trend and change over time. The needs are varied: readers want news that matches their personal interests, while journalists need to keep track of what is going on in the world, understand what their readers think of various topics, or automate routine reporting.

Alexandra Garatzogianni, Coordinator of MediaFutures, presented the MediaFutures project at the EMBEDDIA project's Hackashop on News Media Content Analysis and Automated Report Generation, held in conjunction with EACL 2021. The aim of the hackashop was to foster discussion and research on the combination of language technology and news media content. It served as a forum both for (1) discussing scientific advances in the analysis of news stories and their reader comments and in the automated generation of reports, and for (2) experimental work on identifying interesting phenomena in reader comments and reporting on them. The hackashop embraced cross-disciplinary collaborations between computer scientists, media researchers and other social scientists in order to reach richer insights into the needs and opportunities in news media analysis and generation, and welcomed contributions addressing multilingual settings, including low-resource languages.