Code-Mixed Question Answering Challenge

Welcome to the shared task for Code-Mixed Question Answering.

Please register here if you are interested in taking part in the challenge:

https://goo.gl/forms/QGKexLbQs8GdfLkw1

Task Description

Code-Mixing (CM) is the phenomenon of "embedding of linguistic units such as phrases, words and morphemes of one language into an utterance of another language". The lexicon and syntactic formulations of both languages are mixed to form a single coherent sentence. Well-known mixtures include Spanglish, Hinglish, Tenglish, Portunol and Franponais. CM typically arises in multilingual settings where speakers share more than one language. Anglicization is also very common today, with native words written phonetically in English letters. The growing use of CM is further driven by ease and speed of communication: speakers have an easier choice of words and a richer set of expressions to draw on.

Despite this, current QA systems only support interaction in a single language. This severely hampers the ability of a multilingual user to interact naturally with a QA system, especially in scenarios involving technical and scientific terminology. For example, a native Telugu speaker who wants to know the director of the movie 'Heart Attack' is more likely to ask "heart attack cinema ni direct chesindi evaru?" (Translation: who directed the movie heart attack?), where 'heart attack', 'direct' and 'cinema' are all English words. Hence, to increase the reach, impact and effectiveness of QA in multilingual societies, it is imperative to support QA in CM languages.

We are proposing a challenge for an end-to-end Code-Mixed (CM) factoid Question Answering system. The task will be conducted in three widely spoken Indian languages, Hindi, Tamil and Telugu, each mixed with English. The questions are generated from general images and articles from hinglishpedia.com.
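To make the Telugu example in the task description concrete, here is a minimal Python sketch that marks which tokens of the question are English. It is only an illustration, not part of the task data or any official baseline: the names ENGLISH_WORDS, tag_tokens and english_fraction are hypothetical, and the tiny lexicon contains just the English words named in the example. A real system would need a proper word-level language identifier.

```python
# Hypothetical toy lexicon: only the English words named in the example question.
ENGLISH_WORDS = {"heart", "attack", "cinema", "direct"}


def tag_tokens(question):
    """Label each whitespace token 'en' if it is in the toy lexicon, else 'other'."""
    tokens = question.lower().rstrip("?").split()
    return [(tok, "en" if tok in ENGLISH_WORDS else "other") for tok in tokens]


def english_fraction(tagged):
    """Return the share of tokens labelled 'en' (0.0 for an empty question)."""
    if not tagged:
        return 0.0
    return sum(1 for _, lang in tagged if lang == "en") / len(tagged)


example = "heart attack cinema ni direct chesindi evaru?"
tagged = tag_tokens(example)

for token, lang in tagged:
    print(f"{token}\t{lang}")
print(f"English share: {english_fraction(tagged):.2f}")
```

In this sketch, four of the seven tokens are tagged as English, which shows how heavily a single factoid question can mix the two languages.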

Task Timeline

This is the tentative schedule for the shared task.

Training data release: 27 February 2018
Test Batch 1: 13 March 2018
Test Batch 2: 27 March 2018
Test Batch 3: 10 April 2018
Test Batch 4: 24 April 2018
Test Batch 5: 8 May 2018

Paper Submission for a special session at EMNLP/NIPS (tentative) - date and venue to be announced soon.

Organizers

Alan W Black
Khyathi Raghavi Chandu
Manoj Chinnakotla
Eric Nyberg

Please contact us if you are interested in data collection and/or participating in the task.

Alan W Black: awb@cs.cmu.edu
Khyathi Raghavi Chandu: kchandu@cs.cmu.edu