The TaaS project is motivated by the need for instant access to the most up-to-date terms, for user participation in the acquisition and sharing of multilingual terminological data, and for efficient ways to reuse terminology resources.
The TaaS project will establish a sustainable cloud-based platform that provides the following online core terminology services for key terminology tasks:
- Automatic extraction of monolingual term candidates from user-uploaded documents using state-of-the-art terminology extraction techniques
- Automatic recognition of translation equivalents for the extracted terms in user-defined target language(s) from different public and industry terminology databases
- Automatic acquisition of translation equivalents for terms not found in term banks from parallel/comparable web data using state-of-the-art terminology extraction and bilingual terminology alignment methods (MS2: Prototype bilingual term extraction system/M12)
- Facilities for users to clean up (i.e., revise by editing or deleting) automatically acquired terminology
- Facilities for terminology sharing and reuse: APIs and export tools for sharing the resulting terminological data with major term banks and reusing it in different user applications (MS3: TaaS platform and integrated core services)
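The flow through the services listed above (extract candidates, look them up in a term bank, flag what is missing, export the result) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the tiny stopword list, the frequency-based n-gram filter (a simple stand-in, not the extraction technique used by TaaS), and the one-entry term bank are all hypothetical and do not reflect the actual TaaS API.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real extractor would use linguistic filters.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "is", "to", "for", "with", "by"}


def extract_term_candidates(text, max_len=2, min_freq=2):
    """Return n-grams (up to max_len words) that occur at least min_freq
    times and neither start nor end with a stopword."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return {term: freq for term, freq in counts.items() if freq >= min_freq}


def lookup_equivalents(candidates, term_bank):
    """Split candidates into those found in the term bank and those that
    would still need acquisition from other sources."""
    found = {t: term_bank[t] for t in candidates if t in term_bank}
    missing = sorted(t for t in candidates if t not in term_bank)
    return found, missing


def export_tsv(found):
    """Serialize source/target pairs as tab-separated lines."""
    return "\n".join(f"{src}\t{tgt}" for src, tgt in sorted(found.items()))


# Hypothetical input document and a one-entry English-German term bank.
text = ("The term bank stores terms. A term bank is shared. "
        "Machine translation uses the term bank. Machine translation improves.")
term_bank = {"term bank": "Terminologiedatenbank"}

candidates = extract_term_candidates(text)
found, missing = lookup_equivalents(candidates, term_bank)
print(export_tsv(found))   # pairs ready for sharing or import elsewhere
print(missing)             # candidates still lacking a translation equivalent
```

In this sketch the candidates left in `missing` correspond to the terms for which translation equivalents would have to be acquired from parallel or comparable web data, and a user clean-up step would sit between lookup and export to discard noisy candidates.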
The TaaS platform will demonstrate the efficacy of reusing acquired and user-cleaned terminology resources within the following usage scenarios:
- Simplify the process for language workers to prepare, store and share task-specific multilingual term glossaries
- Provide instant access to term translation equivalents and translation candidates for professional translators through computer-assisted translation (CAT) tools
- Adapt statistical machine translation (SMT) systems to specific domains through dynamic integration with TaaS-provided terminological data
The TaaS project will have an impact in multiple areas:
- Availability of terminology resources is a precondition for humans to communicate clearly and businesses to operate efficiently. In many smaller countries, terminology coverage is lacking and fragmented. TaaS will address these issues by providing an innovative model for creating and using terminology resources through automated acquisition of multilingual terms from the web and user involvement in data clean-up. New term data will be shared and made publicly available through integration with well-established tools and terminology portals such as EuroTermBank, TAUS Data Association, ELRA Catalogue, META-SHARE Repository, and others.
- Reuse of terminology resources will be greatly facilitated with API-level interfaces for integration into CAT tools and language technology systems.
- TaaS will improve the quality of MT systems. Novel approaches for integrating terminology with SMT will be elaborated and evaluated in order to adapt SMT to domain-specific translation.
- New and efficient work patterns developed. Translators, terminologists and language learners will spend less time searching for term candidates and their translations. They will simply edit and/or accept term candidates in the list from the TaaS platform. MT systems will benefit from access to higher-quality language data.
- Competitiveness of SMEs in the localization and translation industry will increase. Surveys indicate that translators and terminologists spend over 30% of their time on terminology creation. TaaS aims to decrease time spent on terminology work by over 50%. A further productivity increase will be achieved by applying domain-adapted MT through TaaS-provided services.
- Translation quality improved. TaaS will increase productivity: seeking term translations in the vast tangle of available terminology collections and repositories is beyond the workflow of a casual translator or terminologist, and inefficient for the professional translator. By consolidating and managing terminology on a continuous basis, more consistent translations can be ensured.
- Crossing the digital divide. Because large volumes of new terms enter usage at an increasing pace, the lack of a repository of multilingual terminological data exacerbates the digital divide between larger and smaller EU languages. Wide accessibility of better and broader terminology will promote information availability, learning, and integration for speakers of smaller languages. Access to organized, available, up-to-date terminology will promote the development and acceptance of appropriate new terms in a broader range of languages.