Call For Shared Task Participation: WAT2018 (The 5th Workshop on Asian Translation)

in collaboration with PACLIC32
December 1, 2 or 3, 2018, Hong Kong

Following the success of the previous WAT workshops, WAT2018 will
bring together machine translation researchers and users to try,
evaluate, share and discuss brand-new ideas about machine translation.

WAT2018 does NOT accept research papers. Instead you can submit them
to PACLIC32.

What’s NEW in WAT2018:

* baseline translations are updated to NMT (from PBSMT)
* additional test data for patent tasks
* Myanmar-English translation tasks
* multilingual translation subtask for 10 Indian languages

************************* IMPORTANT NOTICE *************************
Participants of the previous workshop are also required to sign up to


The task is to improve the text translation quality for scientific
papers and patent documents. Participants choose any of the subtasks
in which they would like to participate and translate the test data
using their machine translation systems. The WAT organizers will
evaluate the results submitted using automatic evaluation and human
evaluation. We will also provide a baseline machine translation.

Scientific Paper Tasks: [Asian Scientific Paper Excerpt Corpus (ASPEC)]
  English/Chinese <-> Japanese
Patent Tasks: [Japan Patent Office Patent Corpus 2.0 (JPC2)]
  English/Chinese/Korean <-> Japanese
  Chinese -> Japanese Expression Pattern Task
Newswire Tasks:
  English <-> Japanese [JIJI Corpus]
Indian Language Tasks:
  Hindi <-> English [IIT Bombay (IITB) Corpus]
  10 Indian Languages [NEW!!]
Mixed Domain Tasks: [UCSY and ALT Corpora]
  Myanmar (Burmese) <-> English [NEW!!]
Recipe Tasks: [Cookpad Comparable Corpus]
  Japanese <-> English


* Scientific Paper Tasks:

WAT uses ASPEC as the dataset, including the training, development,
development test and test data. Participants of the scientific paper
tasks must obtain a copy of ASPEC themselves. ASPEC consists of
approximately 3 million Japanese-English parallel sentences from paper
abstracts (ASPEC-JE) and approximately 0.7 million Japanese-Chinese
paper excerpts (ASPEC-JC).

* Patent Tasks:

WAT uses the JPO Patent Corpus 2.0 (JPC2), which was constructed by the
Japan Patent Office (JPO). This corpus consists of 1 million parallel
sentences from patent descriptions in four categories (Chemistry,
Electricity, Machine and Physics) for each language pair
(English-Japanese, Chinese-Japanese and Korean-Japanese). Participants
are required to obtain it from the WAT2018 site for JPC2.

– English/Chinese/Korean <-> Japanese:
These tasks evaluate the performance of a translation model in the
same way as the other translation tasks. Unlike the previous tasks at
WAT2015, WAT2016 and WAT2017, the new test sets for these tasks consist
of (a) patent documents published between 2011 and 2013, which were
used in past years' WAT, and (b) patent documents published between
2016 and 2017, for each language pair. We will also evaluate
performance on section (a) separately, so that results can be compared
with systems submitted to past years' WAT.

– Chinese -> Japanese Expression Pattern Task:
This task evaluates the performance of a translation model for each
predefined category of expression patterns, which corresponds to the
title of invention (TIT), abstract (ABS), scope of claim (CLM) or
description (DES). The test set for this task consists of sentences,
each of which is annotated with its corresponding category of
expression patterns.

* Newswire Tasks (English <-> Japanese):

WAT uses the JIJI Corpus, which was constructed by Jiji Press Ltd. in
collaboration with the National Institute of Information and
Communications Technology (NICT). This corpus consists of 200K
Japanese-English parallel sentences from Jiji Press news in various
categories. Participants of newswire tasks are required to obtain it
from the WAT2017 site for the JIJI Corpus.

* Indian Language Tasks:

TBA (please keep watching our website)

* Myanmar <-> English Tasks:

WAT uses the UCSY Corpus and the ALT Corpus.
The UCSY corpus and a portion of the ALT corpus are used as training data,
which amounts to around 220,000 lines of sentences and phrases.
The development and test data are taken from the ALT corpus.

* Recipe Tasks:

WAT uses the Recipe Corpus, which was constructed by Cookpad Inc. This
corpus consists of 16,282 Japanese-English parallel sentences from
recipes. Participants of recipe tasks are required to obtain it from
the WAT2018 site for the Recipe Corpus.


Automatic evaluation:
We provide an automatic evaluation server. It is free for everyone,
but you need to create an account to submit results for evaluation.
Viewing the list of evaluation results does not require an account.

Eval. result:

Human evaluation:
Both crowdsourcing evaluation and JPO adequacy evaluation will be
carried out for selected subtasks and selected submitted systems (the
details will be announced later). Participants can submit one
translation result for each subtask.


Pushpak Bhattacharyya, Indian Institute of Technology Bombay (IIT), India
Raj Dabre, National Institute of Information and Communications Technology (NICT), Japan
Chenchen Ding, National Institute of Information and Communications Technology (NICT), Japan
Isao Goto, Japan Broadcasting Corporation (NHK), Japan
Jun Harashima, Cookpad Inc., Japan
Shohei Higashiyama, National Institute of Information and Communications Technology (NICT), Japan
Hideo Kazawa, Google, Japan
Anoop Kunchukuttan, Microsoft Research India, India
Sadao Kurohashi, Kyoto University, Japan
Hideya Mino, Japan Broadcasting Corporation (NHK), Japan
Toshiaki Nakazawa, The University of Tokyo, Japan
Graham Neubig, Carnegie Mellon University (CMU), USA
Yusuke Oda, Google, Japan
Win Pa Pa, University of Computer Studies, Yangon (UCSY), Myanmar
Katsuhito Sudoh, Nara Institute of Science and Technology (NAIST), Japan
