[Academic Lecture] Professor Shi Xiaodong Speaks on “An Exploration of Machine Simultaneous Interpreting Technology”

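At 3 p.m. on June 28, 2018, at the invitation of the Xiamen University Institute of Interpreting Studies, Professor Shi Xiaodong, director of the Xiamen University Institute of Artificial Intelligence and deputy head of the Department of Intelligent Science, gave an academic lecture titled “An Exploration of Machine Simultaneous Interpreting Technology” to faculty and students in Room 102 of the Jijin Building (基金楼) at Xiamen University. Every seat was taken. Professor Shi is a core member of the interdisciplinary innovation team that the interpreting team of the College of Foreign Languages had approved for funding this year. The lecture was chaired by Professor Xiao Xiaoyan, who leads the interdisciplinary innovation team.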

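In the lecture, Professor Shi reviewed the history of machine simultaneous interpreting and, taking leading AI translation systems from Microsoft, Google, and Tencent as examples, showed the current level of machine translation in China and abroad. He also introduced four approaches to machine simultaneous interpreting: the three-stage approach, the two-stage approach, the direct translation approach, and the Universal Translator approach, and briefly explained the steps, form, and current technical status of each. After the presentation, Professor Shi gave a live demonstration of the simultaneous interpreting machine his team has developed: two black laptops. They look like ordinary computers, except that each has an additional special chip inside. The system recognizes Chinese speech and converts it into Chinese text, translates that text into English in step, and then outputs English speech, which amounts to simultaneous interpreting from Chinese speech to English speech.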
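The flow of the demonstration described above (Chinese speech to Chinese text, then to English text, then to English speech) can be sketched as a chain of three stages. This is only a minimal illustration of the data flow, not the team's system: the function `cascade_interpret` and the three stand-in stages are hypothetical names, and real stages would wrap actual speech recognition, translation, and speech synthesis models.

```python
from typing import Callable, Iterable, Iterator


def cascade_interpret(
    audio_chunks: Iterable[bytes],
    recognize: Callable[[bytes], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> Iterator[bytes]:
    """Pass each audio chunk through recognition, translation, and synthesis."""
    for chunk in audio_chunks:
        zh_text = recognize(chunk)    # Chinese speech -> Chinese text
        en_text = translate(zh_text)  # Chinese text -> English text
        yield synthesize(en_text)     # English text -> English speech


# Stand-in stages (assumptions) so the sketch runs end to end.
def fake_recognize(chunk: bytes) -> str:
    return chunk.decode("utf-8")      # pretend the audio bytes are the transcript


def fake_translate(text: str) -> str:
    return {"你好": "hello"}.get(text, text)  # tiny lookup table, not a real model


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")       # pretend the text bytes are audio


outputs = list(
    cascade_interpret(["你好".encode("utf-8")], fake_recognize, fake_translate, fake_synthesize)
)
```

Because the stages are chained, an error in an early stage propagates to the later ones, which is one reason approaches with fewer stages are of interest.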

Call for Task Participation: CENTRE@NTCIR-14

#############################################
CALL FOR PARTICIPATION at the
NTCIR-14 CENTRE task
http://www.centre-eval.org/ntcir14/

The NTCIR-14 edition of
CENTRE = CLEF/NTCIR/TREC Reproducibility
#############################################

CENTRE@NTCIR-14 is a new NTCIR task
and part of the bigger CLEF/NTCIR/TREC Reproducibility effort
( http://www.centre-eval.org/ ).
The CLEF, NTCIR, and TREC editions of CENTRE
come in slightly different flavours.

CENTRE@NTCIR-14 offers the following subtasks:

T1 (replicability=same data, different research groups):
Replicate the pair of RMIT runs from the NTCIR-13 WWW task
http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings13/pdf/ntcir/02-NTCIR13-WWW-GallagherL.pdf
which leveraged the sequential/full dependency models from
a Metzler/Croft SIGIR’05 paper:
https://doi.org/10.1145/1076034.1076115
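
For readers unfamiliar with the sequential dependence model, the sketch below builds an Indri-style query that combines single terms, exact adjacent pairs, and adjacent pairs within an unordered window. It is an illustration only, not the RMIT run configuration: the function name `sdm_indri_query` is hypothetical, and the weights 0.85/0.10/0.05 and the window size 8 are commonly quoted defaults that are treated here as assumptions.

```python
from typing import List


def sdm_indri_query(
    terms: List[str],
    w_term: float = 0.85,      # weight of single-term matches (assumed default)
    w_ordered: float = 0.10,   # weight of exact adjacent-pair matches (assumed default)
    w_unordered: float = 0.05, # weight of unordered-window matches (assumed default)
    window: int = 8,           # unordered window size (assumed default)
) -> str:
    """Build a sequential-dependence query string in Indri query syntax."""
    unigrams = "#combine(" + " ".join(terms) + ")"
    if len(terms) < 2:
        # With a single term there are no adjacent pairs to model.
        return unigrams
    pairs = list(zip(terms, terms[1:]))  # adjacent query-term pairs only
    ordered = " ".join(f"#1({a} {b})" for a, b in pairs)
    unordered = " ".join(f"#uw{window}({a} {b})" for a, b in pairs)
    return (
        f"#weight( {w_term} {unigrams} "
        f"{w_ordered} #combine({ordered}) "
        f"{w_unordered} #combine({unordered}) )"
    )


query = sdm_indri_query(["machine", "translation", "evaluation"])
```

The full dependence variant considers dependencies among all query terms rather than only adjacent ones, so its query grows much faster with query length.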

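Congratulations to the Xiamen University Natural Language Processing Lab on Taking First Place in the CWMT 2018 Multilingual Translation Task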

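In the machine translation evaluation of the recently concluded 14th China Workshop on Machine Translation (CWMT 2018), the Xiamen University Natural Language Processing Lab took first place in the “English-Japanese-Chinese multilingual translation” task.
The CWMT machine translation evaluation is the largest and most authoritative evaluation campaign in the machine translation field in China. It is organized by the Chinese Information Processing Society of China, and participants include universities, research institutes, and companies in China and abroad that work on machine translation. In recent years the lab has repeatedly placed well in CWMT evaluations: in CWMT 2015 it took first place in two tasks, Chinese-English and double-blind, and second place in Tibetan-Chinese; in CWMT 2017 it took first place in Tibetan-Chinese and Uyghur-Chinese, second place in Mongolian-Chinese, Chinese-English, and Japanese-Chinese, and third place in English-Chinese. This year, CWMT 2018 added an English-Japanese-Chinese multilingual translation task in the patent domain, intended to evaluate machine translation technology for specialized domains under low-resource conditions. The Xiamen University team used a multilingual neural machine translation model together with a pseudo-parallel corpus method and an iterative training strategy, reaching a BLEU score of 41.24 on this task and winning it by a clear margin of 2.20 points over the second-place system.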
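The CWMT 2018 workshop will be held on October 25-26, 2018, at Wuyi University in Fujian, with Professor Shi Xiaodong, director of the Xiamen University Natural Language Processing Lab, serving as general chair.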
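
The announcement does not describe how the pseudo-parallel corpus and iterative training were implemented. One common way to combine the two is iterative back-translation, sketched below under that assumption. Everything here is hypothetical: the function names, the `train` callback, and the toy word-table "model" exist only to make the data flow runnable and are not the team's system.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]        # (input sentence, output sentence)
Model = Callable[[str], str]  # a trained translator


def iterative_back_translation(
    parallel: List[Pair],
    mono_src: List[str],
    mono_tgt: List[str],
    train: Callable[[List[Pair]], Model],
    rounds: int = 2,
) -> Model:
    """Alternately improve both directions with pseudo-parallel data."""
    reversed_parallel = [(t, s) for s, t in parallel]
    forward = train(parallel)            # source -> target
    backward = train(reversed_parallel)  # target -> source
    for _ in range(rounds):
        # Synthetic source paired with real target text, for the forward model.
        pseudo_fwd = [(backward(t), t) for t in mono_tgt]
        # Synthetic target paired with real source text, for the backward model.
        pseudo_bwd = [(forward(s), s) for s in mono_src]
        forward = train(parallel + pseudo_fwd)
        backward = train(reversed_parallel + pseudo_bwd)
    return forward


def train_word_table(pairs: List[Pair]) -> Model:
    """Toy stand-in for training: learn word mappings from equal-length pairs."""
    table = {}
    for src, tgt in pairs:
        s_words, t_words = src.split(), tgt.split()
        if len(s_words) == len(t_words):
            for a, b in zip(s_words, t_words):
                table.setdefault(a, b)  # keep the first mapping seen
    return lambda sent: " ".join(table.get(w, w) for w in sent.split())


model = iterative_back_translation(
    parallel=[("a b", "x y")],
    mono_src=["b a"],
    mono_tgt=["y x"],
    train=train_word_table,
)
```

In a real system `train` would fit a neural model, and the quality of the pseudo-parallel pairs would improve from round to round as both directions improve.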

1st Call for Papers: NLE Journal Special Issue on NLP for Similar Languages, Varieties and Dialects

1st Call for Papers

Natural Language Engineering Journal – Cambridge University Press
Special Issue on NLP for Similar Languages, Varieties and Dialects
URL: https://sites.google.com/view/nledialects

Guest Editors
Marcos Zampieri (University of Wolverhampton, United Kingdom)
Preslav Nakov (Qatar Computing Research Institute, HBKU, Qatar)

Topics

Recent initiatives in language technology have led to the development of at least minimal language processing toolkits for all EU-official languages, as well as for languages with a large number of speakers worldwide such as Chinese and Arabic. Apart from those official languages, a large number of dialects or closely-related language varieties are in daily use, not only as spoken colloquial languages but also in written media and social networks. Building language resources and tools from scratch is expensive, but the efforts can often be reduced by making use of pre-existing resources and tools for related, resource-richer languages.

Legal Information Extraction/Entailment Competition (COLIEE-2018); Call for Participation

Competition on Legal Information Extraction/Entailment (COLIEE) 2018,
run in association with the Workshop on Juris-informatics (JURISIN) 2018
http://www.ualberta.ca/~miyoung2/COLIEE2018/
http://research.nii.ac.jp/jurisin2018/
November 12-13, 2018, Yokohama, Japan

As an associated event of JURISIN 2018, we are happy to announce the 5th
Competition on Legal Information Extraction and Entailment (COLIEE-2018),
derived from the case law competition and statute law competition.

IWSLT 2018: Call for Participation

Dear colleagues,

I’m glad to announce the two tasks of the 2018 IWSLT Evaluation Campaign:

– Low resource MT from Basque to English

Given the scarcity of available parallel data, the challenge is how to optimally leverage data from
other translation directions, e.g. Basque-Spanish, Spanish-English, etc., knowing that:
“Linguistically, Basque is unrelated to the other languages of Europe and, as a language isolate, to any
other known living language … Basque has adopted a good deal of its vocabulary from the Romance
languages” (from Wikipedia)

– Speech Translation from English to German

This year's challenge is to train neural systems that go directly from source speech to target text.
For this exercise, we provide a parallel corpus of TED Talks of English audio and German text, as well
as a baseline implementation of a traditional speech recognition + machine translation pipeline as a
Docker container.