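Ahn Chan-soo's Slower Sprint (안찬수의 더 느린질주): The Evolution of Translation, from ‘Human Translation’ to ‘Machine + Human Translation’ / Lee Il-jae, Kwangwoon University, Department of English Language and Literature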

Lee Il-jae, Kwangwoon University professor and winner of the ‘5th Korea Translation Award,’ on translation in the age of AI

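Anyone who has used them knows how far today's machine translation programs have come. Only a few years ago they routinely produced gibberish; now they turn out quite decent sentences. All we have to do is rework the sentences they produce in the first pass so that they read more easily. Even in translation, once thought to be an exclusively human domain, machines and humans now work together. Not everyone, however, views the progress of machine translation, and the collaboration between machines and humans that comes with it, with simple satisfaction. Why are they worried? Because they fear humans will be pushed out of this civilization? If so, that is nothing more than another form of anthropocentrism (romanticism). The evolution of machine translation means that the number of things we humans must coexist with has grown. It is rather awkward, admittedly, that these partners in coexistence on equal terms are our own creations (after all, humans take it for granted that they may play master over what they have made).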

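Professor Lee Il-jae (Department of English Language and Literature, Kwangwoon University), who recently received the ‘5th Korea Translation Award’ for contributing to education and scholarship through the use of machine translation programs, likewise focused on ‘collaboration between machine and human.’ Accepting the award, Professor Lee said that “humans are the best translators,” but added that “when a large volume of text must be translated consistently within a limited time, in other words when the translation exceeds human limits, the help of machine translation is needed.” Wanting to hear more of his views on the evolution of translation, this reporter asked him for a manuscript. Saying that “writing in English lets me convey my opinion more precisely,” he sent a manuscript written in English. (Naturally) pressed by a deadline, this reporter had no choice but to translate his piece with a machine translation program: a collaboration with a machine in the literal sense. The article below is the result of a collaboration between the machine and this reporter, who is not a professional translator. Readers are invited to judge its quality for themselves. Professor Lee's English original is available at kyosu.net under the title “Out of Human Translation, but into Machine + Human Translation.”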

Yang Do-woong, reporter, doh0328@kyosu.net

Lee Il-jae, Kwangwoon University, Department of English Language and Literature

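Artificial intelligence (AI) and deep learning, developed in the era of the Fourth Industrial Revolution, have dramatically advanced machine translation programs such as Google's Google Translate, Microsoft's Bing Translator, NAVER's Papago, and Kakao's Kakao i. These programs show remarkable gains in translation accuracy, all the more striking given that they are free, and anyone with an internet connection can use them anytime, anywhere. Among people whose work involves language, however, remarks like these are often exchanged about this change: “Translators will lose their jobs in the near future,” “People no longer need to study foreign languages,” “Still, machine translation can never surpass human translation,” “The quality of machine translation is terrible,” and so on. In this article I would like to comment on these remarks, offer my outlook on how translation will change, and draw public attention to this prediction, unwelcome to some but inevitable.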

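The Industrial Revolution of the 18th century brought the shift from handcraft to machine production. Textile production once done by humans, for example, was taken over by machines. Demand for workers who sewed by hand all but vanished, yet sewing itself continued with machines, on an unimaginably larger scale. The textile industry grew day by day and GDP rose sharply. People became wealthier, and towns urbanized rapidly. Granted, handmade textiles may be made with more care and be of higher quality than machine-made ones, but could we produce all the textiles we need today without machines? In the same vein, now that agricultural machinery is readily available, farmers no longer sow seeds by hand or harvest with a sickle. Should farmers still do this hard labor with their bare hands, as in the old days? Would you? On a vast prairie? Under a blazing sun? Everyone knows that vegetables from a neighbor's kitchen garden are grown with care, and that organic vegetables grown by a local farmer without chemical fertilizers are safe. But could neighbors and local farmers alone produce the vegetables for everyone's dinner table? Had humans not kept improving textile and agricultural machinery, where would we be today?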

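Likewise, the translator, as an independent occupation and a provider of language services, has played a vital role in history. Within a few years, however, the language services provided by human translators seem likely to be confined to very limited areas. Even so, ‘translation’ as work is still something we need, and it is evolving into a huge language-services industry that crosses the boundaries we once knew. Human translation undoubtedly guarantees the best quality, but only when sufficient time and satisfactory pay are assured. Because it depends on these conditions, the quality of human translation varies both within a single translator (with his or her condition) and between translators. When the volume of text is large, it is a formidable task for one translator to render syntax and vocabulary with consistent quality. Likewise, when a text covers a wide range of subjects, it is nearly impossible for several translators to render syntax and vocabulary with consistent quality. Under deadline pressure, human translation struggles to stay consistent and is therefore hard to rely on.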

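Raw output that a machine produces in the first pass, without human help, can be reasonably understood if there is enough deep-learned data for the language pair being translated. The languages of European countries fall into this category, because the data needed to translate between them has accumulated over decades. But when there is not enough data for a language pair, as with Korean to English, raw machine output is hard to understand. Incomprehensibility generally arises at the lexical level rather than the syntactic (sentence) level. In other words, the machine renders syntax intelligibly but fails to render individual words properly; this can be seen at least in Korean-English translation. The troublesome words are those that deep learning has not learned sufficiently: homonyms (장인, 성적, 정부), proper nouns (총각, 이황, 파전), neologisms (대박, 팀장, 문콕), colloquialisms (아싸, 꼰대, 쩔다), and dialect words (면경, 지짐, 호랑). For example, 장인 (丈人 ‘father-in-law,’ 匠人 ‘master craftsman,’ 掌印, and others) can be translated in different ways depending on context, so it risks being rendered as an entirely wrong word.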
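
To make the lexical-ambiguity problem concrete, here is a toy sketch of context-based sense selection for 장인, in the spirit of the simplified Lesk algorithm: each candidate sense carries a few cue words (standing in for dictionary glosses), and the sense whose cues overlap most with the surrounding words wins. The sense inventory and cue words are hypothetical and chosen only for illustration; neural machine translation does not work this way internally, but the sketch shows why a system needs enough context-bearing data to pick the right word.

```python
# Toy simplified-Lesk disambiguation for the Korean homonym 장인.
# ASSUMPTION: the sense inventory and cue words below are hypothetical examples.
SENSES = {
    "father-in-law": {"아내", "처가", "사위", "명절"},
    "artisan": {"도자기", "공방", "기술", "전통"},
}


def choose_sense(context_words, senses=SENSES):
    """Return the sense whose cue words overlap most with the context words.

    Returns None when no cue word appears, i.e. the context gives no evidence,
    which is the situation in which a real system is most likely to guess wrong.
    """
    context = set(context_words)
    best_sense, best_score = None, 0
    for sense, cues in senses.items():
        score = len(cues & context)  # number of shared cue words
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


if __name__ == "__main__":
    print(choose_sense(["아내", "와", "처가", "에", "갔다"]))
    print(choose_sense(["전통", "도자기", "를", "빚는"]))
    print(choose_sense(["어제", "만났다"]))
```

When the context contains no cue word at all, the function returns None rather than guessing, which mirrors the gap the paragraph above describes for words that training data covers poorly.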

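People in a new occupation called ‘post-editor’ edit and revise the machine's raw first-pass output against the source text. According to several studies, this approach (machine translation first, then human revision) cuts translation time by 50% to 70%, and the machine's help also reduces human cognitive fatigue. The post-editor's role has now expanded to pre-editing the source text: breaking long clauses into shorter ones the machine can parse properly, replacing words likely to cause trouble in machine translation with words the machine can recognize more easily, and checking that the source expressions are logical (grammatical). Translating from scratch requires detailed knowledge of syntax and vocabulary, along with broad and refined knowledge of the cultures of both languages. Post-editing machine output saves much of the time and energy humans spend on translation, above all the time and energy spent on syntax (grammar). As a result, translation, once specialist work reserved for professional translators, has come within reach of ordinary undergraduates majoring in languages.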
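
As a rough illustration of the pre-editing step that breaks long clauses into shorter segments, here is a minimal sketch that splits an English source sentence at semicolons and before the coordinating conjunctions “and,” “but,” and “so” when they follow a comma. The splitting rule and the 12-word threshold are assumptions made for illustration; actual pre-editing relies on human judgment, and a production tool would more likely use a syntactic parser.

```python
import re

# ASSUMPTION: a hypothetical splitting rule. Break after a semicolon, or before
# a coordinating conjunction ("and", "but", "so") that follows a comma.
_SPLIT = re.compile(r";\s+|,\s+(?=(?:and|but|so)\s)")


def pre_edit_segments(sentence: str, max_words: int = 12) -> list[str]:
    """Split a long sentence into shorter segments for an MT engine.

    Sentences with at most `max_words` words are returned unchanged.
    """
    if len(sentence.split()) <= max_words:
        return [sentence]
    # Strip whitespace around each piece and drop empty pieces.
    return [part.strip() for part in _SPLIT.split(sentence) if part.strip()]


if __name__ == "__main__":
    long_sentence = (
        "The workbench sends the source text to the engine, and the "
        "post-editor revises the raw output; the glossary keeps the "
        "terminology consistent."
    )
    for segment in pre_edit_segments(long_sentence):
        print(segment)
```

Keeping the conjunction at the start of its segment preserves the logical link between clauses, so the post-editor can still rejoin the translated segments faithfully.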

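This is an era in which ‘rapid and cost-effective’ sharing and transfer of information across languages is essential to the survival of individuals, businesses, and governments. Regrettably, human translation is unlikely to become the mainstream method again as it once was, because machine translation is a viable alternative in areas that human translation cannot handle. To this end, translation technology has produced translation workbenches that actively connect humans and machines. The concept is this: just as we use search engines such as Google, NAVER, or Daum to search the internet, a translation workbench automatically sends the source text to a machine translation engine, for example Google Translate or NAVER Papago, and lets humans revise the sentences that engine produces. Moreover, workbenches not only store every sentence translated so far in a translation memory but also let users register the terminology of a particular field in a glossary. The translation data stored in the translation memory and the glossary is reused and shared with other post-editors on the same workbench, helping their work.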
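
The translation memory and glossary described above can be pictured as two lookup tables: a memory of previously approved source/target segment pairs, searched for exact or near (“fuzzy”) matches, and a glossary of required term renderings. The following is a minimal sketch that assumes a fuzzy-match threshold of 0.75 computed with Python's standard `difflib.SequenceMatcher`; the class and method names are hypothetical and do not describe how CASMACAT, VisualTran, or any other real workbench is built.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy translation memory plus glossary shared by post-editors.

    ASSUMPTION: a hypothetical design for illustration, not a real workbench API.
    """

    def __init__(self, threshold: float = 0.75):
        self.threshold = threshold          # minimum similarity for a fuzzy match
        self.segments: dict[str, str] = {}  # approved source -> target pairs
        self.glossary: dict[str, str] = {}  # source term -> required target term

    def store(self, source: str, target: str) -> None:
        """Save a post-edited segment so other post-editors can reuse it."""
        self.segments[source] = target

    def add_term(self, source_term: str, target_term: str) -> None:
        """Register a field-specific term and its required translation."""
        self.glossary[source_term] = target_term

    def lookup(self, source: str):
        """Return (score, stored source, stored target) for the best match, or None."""
        best = None
        for src, tgt in self.segments.items():
            score = SequenceMatcher(None, source, src).ratio()
            if best is None or score > best[0]:
                best = (score, src, tgt)
        if best is not None and best[0] >= self.threshold:
            return best
        return None

    def missing_terms(self, source: str, target: str) -> list[str]:
        """List glossary terms found in the source whose required rendering is absent from the target."""
        return [s for s, t in self.glossary.items() if s in source and t not in target]


if __name__ == "__main__":
    tm = TranslationMemory()
    tm.store("번역 메모리를 공유한다", "The translation memory is shared")
    tm.add_term("포스트에디터", "post-editor")
    print(tm.lookup("번역 메모리를 공유합니다"))  # near match from a previous job
    print(tm.missing_terms("포스트에디터가 교정한다", "The translator revises it"))
```

The threshold trades recall for precision: a lower value surfaces more earlier translations for the post-editor to adapt, at the cost of more irrelevant suggestions.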

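The CASMACAT workbench, based on Google Translate, is widely used in Europe's language-services industry. The VisualTran workbench, by contrast, based mainly on Google Translate and NAVER Papago, is solely owned and operated by a Korean IT company. VisualTran, for example, lets a single post-editor work through about 10 pages of a Word document in roughly 3 to 4 hours, and CASMACAT is presumably similar. Drawing on its translation memory, it also shows the current post-editor, in different colors, the sentences, words, and terms used by earlier post-editors. In addition, it offers a management program that lets project managers check the work status of every post-editor at any time on a computer or mobile device. The program shows each post-editor's progress as a percentage, along with the combined progress of all post-editors, so project managers can supervise the progress and quality of the post-edited documents in real time and contact a post-editor immediately whenever a question arises.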
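
The manager view that shows each post-editor's completion rate and the overall rate comes down to simple ratios. A minimal sketch, assuming progress is counted in segments (sentences) and that the overall figure is weighted by each post-editor's workload; the `Assignment` record and `progress_report` function are hypothetical names, not part of VisualTran.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    """ASSUMPTION: a hypothetical per-editor record; progress is counted in segments."""
    editor: str
    done: int   # segments already post-edited
    total: int  # segments assigned


def progress_report(assignments):
    """Return ({editor: percent done}, overall percent done across all editors)."""
    per_editor = {
        a.editor: (100.0 * a.done / a.total if a.total else 0.0)
        for a in assignments
    }
    total_done = sum(a.done for a in assignments)
    total_all = sum(a.total for a in assignments)
    # Weighted by workload: overall = all segments done / all segments assigned.
    overall = 100.0 * total_done / total_all if total_all else 0.0
    return per_editor, overall


if __name__ == "__main__":
    team = [Assignment("editor_a", 30, 40), Assignment("editor_b", 10, 60)]
    per_editor, overall = progress_report(team)
    print(per_editor, overall)
```

Weighting by workload matters: with 30 of 40 and 10 of 60 segments done, the overall figure is 40%, whereas a plain average of the two individual percentages would overstate it at about 46%.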

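The new concept of ‘machine translation’ and the new occupation of ‘post-editor’ are still unfamiliar in Korean society. Europe's language-services industry, however, turned to machine translation more than a decade ago. Some companies proudly state that they translate through machine translation and post-editing, yet all of them still offer human translation, either to serve fields that lack sufficient data or to deliver top-quality translation, presumably for a premium fee. The language-services industry is built on machine translation and interpretation, which draw on AI equipped with translation technology and on computers with unimaginable processing power.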

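This language-services industry closely resembles the explosive, supernova-like arrival of mobile phones some years ago, and also the spread of smartphones, which is transforming our lives so drastically that it calls to mind a black hole. The new language-services industry of translation and interpretation will have to embrace a wide range of language experts as well as anyone interested in machine translation and interpretation. Educational institutions and governments need to pay attention to these groundbreaking changes in the language-services industry. As one of their educational goals, they should focus on strengthening students' AI-assisted translation and interpretation skills. Educational institutions should also offer convergence education that helps students apply what they have learned to the language-services industry, and should teach them to consider what their own (human) role is in this era of the language-services industry. Government agencies, too, must recognize that translating public documents consistently, accurately, and quickly is vital to the nation. The time has come for bold decisions to build a nation at the forefront of the translation and interpretation industry.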

[Professor Lee Il-jae's original text]

Out of Human Translation and into the Machine Translation Industry

The use of artificial intelligence (AI) and deep learning in the Fourth Industrial Revolution has led to ground-breaking advances in machine translation programs, such as the global services Google Translate (Google) and Bing Translator (Microsoft) and the local services Papago (NAVER) and Kakao i (Kakao). Translation accuracy has improved remarkably, and yet these online programs are free of charge and accessible to anyone, anytime, wherever there is an internet connection. Chatter about this change has flourished among everyone whose life involves languages: “Translators will go broke in the near future,” “People don’t need to study foreign languages anymore,” “Machine translation can’t surpass human translation,” “Machine-translation quality is horrible,” etc. I’d like to comment on this chatter and, further, to draw public attention to the omen, unwelcome to some but inevitable.

The Industrial Revolution in the 18th century marked the transition from hand production to machines. Human-based textile production was replaced by mechanized textile production. The need for workers who sewed by hand practically disappeared, but sewing per se continued with machines, on an unthinkably large scale. The textile industry expanded, and national GDP rose sharply. People became richer, and towns became more urbanized. Granted, man-made textiles can be made with affection and be of higher quality, but can humans produce all the textiles needed today without machines? In the same vein, now that agricultural machinery is readily available, farmers don’t sow by hand or mow with a sickle. Should they? Would you? On the far-ranging prairie? Under the scorching heat? Granted, greens from a neighbor’s kitchen garden can be grown with affection, and organic vegetables from local farmers can be free of synthetic fertilizers, but can kitchen gardens and local farmers supply all the vegetables on everyday dinner tables? Where would we be if humans had given up improving textile machinery and agricultural machinery?

Likewise, ‘translator’ is an independent occupation and has played an essential role in history as a language-services provider. Such service by human translators may, however, come to an end within a few years. ‘Translation’ as a job, nevertheless, remains intact, further evolving into a gigantic language-services industry that crosses borders. Human translation can undoubtedly guarantee optimal quality, provided that time is sufficient and remuneration is satisfactory. Because it depends on these conditions, the quality of human translation is subject to both inter- and intra-translator variation. It is always a formidable task for a single translator to maintain the expected quality of grammar and vocabulary when given an overwhelming load of translation. It is also nearly impossible for different translators to maintain similar quality of grammar and vocabulary in large-scale translation. Under time pressure, the quality of human translation can only be unreliable and inconsistent.

Machine-translated raw outputs, that is, outputs produced without human involvement, can be legitimately comprehensible when there are sufficient ‘deep-learned’ data for the language pair under translation. The languages of the European Union are such a case, because the data have been accumulated over a few decades. However, when ‘deep-learned’ data for a language pair, for example Korean and English, are insufficient, machine-translated raw outputs can be practically gibberish. Incomprehensibility generally arises at the lexical level, not the syntactic level. This means that machine translation is reasonably successful at translating grammar but not quite so at translating words, as in Korean-English translation. Words not learned sufficiently in deep learning, such as homonyms (장인, 성적, 정부), proper nouns (총각, 이황, 파전), neologisms (대박, 팀장, 문콕), colloquial expressions (아싸, 꼰대, 쩔다), and local words (면경, 지짐, 호랑), may yield wild translations in English.

Humans under the new occupational title of ‘post-editor’ can then post-edit (that is, revise afterwards) the machine-translated raw outputs by comparing them with the Korean text. Studies show that translation time shortens by 50% to 70% and cognitive effort decreases, since the machine takes care of much of the grammar translation. Presently, the duty of the ‘post-editor’ has extended to pre-editing the source text: segmenting clauses into shorter ones for the machine to process, replacing problematic words with ones the machine can recognize more easily, and checking the logical argument of the expressions. Human translation from scratch requires advanced, sophisticated knowledge of the linguistics (grammar and vocabulary) and cultures of the language pair under translation. Post-edited machine translation confines human involvement primarily to vocabulary, even opening the once reserved, prestigious work area of translators to average undergraduates majoring in languages.

In this era, when ‘rapid and cost-effective’ acquisition and distribution of information across languages is ‘vital to the survival’ of individuals, corporations, and governments, it is, regrettably, difficult to turn back to human translation. Machine translation is a workable option, particularly in areas beyond the reach of human translation. To that end, translation technologies have put forth interactive workbenches that link humans to machines. The concept is this: just as we use Google, NAVER, or Daum to search the internet, translation workbenches automatically send the source text to a machine translation engine, for example Google Translate or Papago, and allow humans to revise the raw outputs. Moreover, they can store all the translation data in a translation memory and register field-specific terminology in a glossary. The data in the translation memory and the glossary can be reused and shared within the same workbench network.

CASMACAT, based on Google Translate, is a popular workbench used in the language-services industry in Europe, while VisualTran, based mainly on Google Translate and Papago, is a local workbench solely owned and operated by a Korean IT translation company. VisualTran, for example, allows a post-editor to cover about 10 pages of a Word document in about 3 to 4 hours, and displays, in different colors, previously translated texts and the terminology selected by other post-editors working on the same translation task. Moreover, the management program for project managers displays the activity status of all post-editors on a single computer screen or on any mobile device. It shows the percentage of work done by each post-editor and by all post-editors together. Project managers can monitor the post-edited translation in real time and contact a post-editor immediately with inquiries.

The new concept of ‘machine translation’ and the new occupation of ‘post-editor’ are unfamiliar in Korea. However, the language-services industry in Europe turned to machine translation more than a decade ago. Some companies proudly announce that they carry out translation via machines and post-editors, while all of them still offer human translation for genres lacking sufficient data or for superior quality, presumably at a top fee.

The language-services industry based on machine translation and interpretation, equipped with AI-based translation technologies and unthinkable computing power, closely resembles the sudden, supernova-like emergence of mobile phones some time ago and the rapid, black-hole-like propagation of smartphones in recent times. These new industries of translation and interpretation will have to accommodate language expertise of every sort, and anyone who is interested in machine translation and interpretation. Schools and governments must pay attention to such changes in the language-services industry and promote AI-based interlingual translation and interpretation competence as one of their educational objectives. Schools can also help students link their knowledge to the industry and determine their possible roles in it. Governments must realize the importance of accuracy and consistency in translating public documents. The time has come to make decisive headway toward building a nation with powerful translation and interpretation industries.

Source: Kyosu Shinmun (http://www.kyosu.net) https://goo.gl/ZqubSY

Monday, October 29, 2018
