DiSLiDaS 2022

The COST Action CA18209 NexusLinguarum (https://nexuslinguarum.eu) is glad to announce the Workshop Discourse studies and linguistic data science: Addressing challenges in interoperability, multilinguality and linguistic data processing (DiSLiDaS). Due to Covid-19 restrictions, the workshop will be held in hybrid mode, so speakers and attendees can choose to participate on site or online.

Conference aims and topics

The purpose of the workshop is to gather current research advances in discourse analysis and representation, in the context of multilinguality, from a linguistic and computational perspective. We invite submissions addressing challenges such as interoperability, linguistic linked open data (LLOD), and language processing and analysis. 

The workshop topics include, but are not limited to, the following:

  • Discourse and dialog annotation: Parsing and representation across languages and frameworks
  • Discourse markers and discourse relations (RST, PDTB, SDRT): Identification, prediction and extraction
  • Discovery and interpretation of attitudes in discourse: Appraisal and sentiment
  • Effects of multimodality on discourse interpretation: Intonation, gesture and text
  • Interoperability of multilingual language data: Challenges of rich and distributed data
  • Discourse data and machine learning: Methods and tools

Discourse comprises a wide variety of linguistic phenomena, such as discourse markers, discourse relations and speaker attitude, which have been extensively studied by different communities of practice in linguistics and computer science. This work has produced several theoretical frameworks (for instance, RST, SDRT and PDTB for discourse relations, and appraisal theory for sentiment analysis) as well as technological approaches such as transformer models and embeddings. Nonetheless, open issues remain with regard to interoperability, multilinguality and language processing, in particular: the existence of different annotation schemas, disambiguation, the lack of training data for machine learning, the scarcity of effective methods for detecting and interpreting language phenomena, diverse vocabularies, insufficient multilingual parallel corpora of dialog and non-dialog texts, and the early stage of research on multimodality.

Discourse research is also one of the central research areas of natural language processing (NLP). NLP research focuses on the formalization, identification and discovery of semantic phenomena, dialogue exchange structure and text coherence. Technological approaches in NLP include transformer models, word embeddings, linguistic linked open data, the construction of aligned multilingual corpora, and vocabularies of language phenomena. Computational discourse research builds on the evidence that language consists not only in placing words in the right order, but also in detecting and interpreting meaning and deeper textual relations, and in organizing ideas into a logical textual flow. Linguistic approaches study the phenomena underlying the coherence and cohesion of discourse, as well as the lexical, phrasal, syntactic, semantic and pragmatic means used to express discourse relations; they also represent the roles of these means and build language resources for them.

Despite these advances, many problems related to interoperability, multilinguality and language processing remain unresolved. With the growth of the Semantic Web and Linguistic Linked Data, interoperability is key to reading, interpreting and adopting language resources. The existence of different annotation schemas for encoding discourse relations hinders data exchange and re-use on the one hand, and theoretical consistency in the production of annotated corpora on the other. Ideally, a model is designed to handle all the specificities of a particular dataset while remaining broad enough to be applied to other datasets. Many proposals try to achieve this balance, one of them being ISO 24617. The treatment of multilinguality is also complicated by the insufficiency of multilingual parallel corpora of dialog and non-dialog texts that would allow systematic contrastive studies. As to language processing, the lack of training data for machine learning, coupled with the scarcity of effective methods for detecting and interpreting language phenomena, the coexistence of diverse vocabularies, and the minimal attention paid to the contribution of tone of voice, intonation and gestures to meaning and to the informative value of discourse elements, still makes discourse processing very challenging.

The workshop intends to be a forum of discussion for researchers interested in addressing the aforementioned challenges and in advancing the state of the art in discourse studies and linguistic data science.

Call for extended abstracts

Workshop Discourse studies and linguistic data science: Addressing challenges in interoperability, multilinguality and linguistic data processing – DiSLiDaS

Jerusalem, Jerusalem College of Technology
24 May 2022

Programme committee

Nicholas Asher, CNRS/IRIT, Toulouse, France
Johan Bos, University of Groningen, Groningen, The Netherlands
Paul Buitelaar, NUI Galway, Ireland
Harry Bunt, Tilburg University, The Netherlands
Philip Cimiano, Bielefeld University, Germany
Ludivine Crible, Ghent University, Belgium
Maria Josep Cuenca, Universitat de València, Spain
Vera Demberg, Saarland University, Germany
Jorge Gracia, University of Zaragoza, Spain
Mikel Iruskieta, University of the Basque Country, Spain
John McCrae, NUI Galway, Ireland
Anna Nedoluzhko, Charles University, Czech Republic
Ted Sanders, Utrecht University, The Netherlands
Merel Scholman, Saarland University, Germany
Manfred Stede, University of Potsdam, Germany
Radoslava Trnavac, University of Belgrade, Serbia
Amir Zeldes, Georgetown University, USA

Organization committee

Chaya Liebeskind, Jerusalem College of Technology, Jerusalem (Local organizer)

Purificação Silvano, Faculty of Arts and Humanities of the University of Porto, CLUP, Porto, Portugal
Christian Chiarcos, Applied Computational Linguistics, Goethe-Universität, Frankfurt am Main, Germany
Mariana Damova, Mozaika, Ltd., Sofia, Bulgaria 
Giedre Valunaite Oleskevicienė, Mykolas Romeris University, Institute of Humanities, Vilnius, Lithuania
Dimitar Trajanov, Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, North Macedonia
Ciprian-Octavian Truica, Faculty of Automatic Control and Computers, University Politehnica of Bucharest, Bucharest, Romania
Elena-Simona Apostol, Faculty of Automatic Control and Computers, University Politehnica of Bucharest, Bucharest, Romania
Anna Bączkowska, Institute of English and American Studies, University of Gdansk, Gdansk, Poland

The Scientific Programme will include one invited talk and oral presentations.

Invited Speaker – Bonnie Webber, University of Edinburgh – Talk title: “Discourse Connectives Revisited”

Bonnie Webber received her PhD from Harvard University and then taught at the
University of Pennsylvania in Philadelphia for 20 years before joining the
School of Informatics at the University of Edinburgh, where she is now
professor emeritus.

Known for early research on “cooperative question-answering” and extended
research on discourse anaphora and discourse relations, she has served as
President of the Association for Computational Linguistics (ACL) and Deputy
Chair of the European COST action IS1312, “TextLink: Structuring Discourse
in Multilingual Europe”. Along with Aravind Joshi, Rashmi Prasad, Alan Lee
and Eleni Miltsakaki, she is co-developer of the Penn Discourse TreeBank,
most recently, the PDTB-3.0 (LDC2019T05).

She is a Fellow of the Association for the Advancement of Artificial
Intelligence (AAAI), the Association for Computational Linguistics (ACL) and
the Royal Society of Edinburgh (RSE). In July 2020, she was awarded the ACL
Lifetime Achievement Award. Her current interest is focussed on automating the
recognition and correction of inconsistencies in annotated corpora.

24 May 2022 – Programme

9:00 Opening remarks
9:15-9:45 Multiword expressions as discourse markers in multilingual TED-ELH Parallel Corpus, Giedre Valunaite Oleskeviciene and Chaya Liebeskind
9:45-10:15 Towards Discourse Annotation in CLARIN-PL, Maciej Ogrodniczuk, Sebastian Żurowski and Paulina Rosalska
10:15-10:45 Evaluation of Cross-Lingual Methods for Discourse Markers Detection, Kostadin Mishev, Mariana Damova, Giedre Valunaite Oleskeviciene, Chaya Liebeskind, Dimitar Trajanov, Purificação Silvano and Christian Chiarcos
10:45-11:15 Coffee break
11:15-11:45 Information-providing dialogue acts: taxonomic issues, Darinka Verdonik
12:00-13:00 Invited talk: Discourse connectives revisited, Bonnie Webber
13:00-14:00 Lunch break
14:00-14:30 ISO-DR-core plugs into ISO-dialogue acts for a crosslinguistic taxonomy of discourse markers, Purificação Silvano and Mariana Damova
14:30-15:00 Testing the Continuity Hypothesis: evidence from corpus analysis, Debopam Das and Markus Egg
15:00-15:30 QUDs and discourse relations: Non-at-issue information in texts, Christoph Hesse, Ralf Klabunde, Anton Benz and Maurice Langner
15:30-16:00 Coffee break
16:00-16:30 Discussion
16:30-17:00 Closing remarks

DiSLiDaS Workshop participants at the end of the day

Submission

Authors are invited to submit an extended abstract of up to 4 pages in PDF format, using the LaTeX or MS Word template.

Submissions must be anonymous and should be submitted electronically via EasyChair.

At least one author of each accepted extended abstract is required to register for and present the work at the workshop.

Important dates:

Time Zone: Anywhere on Earth

Extended abstracts due: 20 March 2022

Extended abstract notifications: 20 April 2022

Full papers due: 20 July 2022

Full paper notifications: 15 October 2022

Venue & Travel

See the link below:

Travel information about Israel and the logistics of the Workshop

Contact:

organizers@dislidas.mozajka.co

Website:

http://dislidas.mozajka.co