Hacettepe University Graduate School of Social Sciences Department of Translation and Interpretation AUTOMATIC SPEECH RECOGNITION IN CONSECUTIVE INTERPRETER WORKSTATION: COMPUTER-AIDED INTERPRETING TOOL ‘SIGHT-TERP’ Cihan ÜNLÜ Master’s Thesis Ankara, 2023 AUTOMATIC SPEECH RECOGNITION IN CONSECUTIVE INTERPRETER WORKSTATION: COMPUTER-AIDED INTERPRETING TOOL ‘SIGHT-TERP’ Cihan ÜNLÜ Hacettepe University Graduate School of Social Sciences Department of Translation and Interpretation Master’s Thesis Ankara, 2023 KABUL VE ONAY Cihan ÜNLÜ tarafından hazırlanan “Automatic Speech Recognition in Consecutive Interpreter Workstation: Computer-Aided Interpreting Tool ‘Sight-Terp’” (Otomatik Konuşma Tanıma Sistemlerinin Ardıl Çeviride Kullanılması: Sight-Terp) başlıklı bu çalışma, 15.06.2023 tarihinde yapılan savunma sınavı sonucunda başarılı bulunarak jürimiz tarafından Yüksek Lisans Tezi olarak kabul edilmiştir. Dr. Öğr. Üyesi Alper KUMCU (Başkan) Prof. Dr. Aymil DOĞAN (Danışman) Doç. Dr. Gökçen HASTÜRKOĞLU (Üye) Yukarıdaki imzaların adı geçen öğretim üyelerine ait olduğunu onaylarım. Prof. Dr. Uğur ÖMÜRGÖNÜLŞEN Enstitü Müdürü YAYIMLAMA VE FİKRİ MÜLKİYET HAKLARI BEYANI Enstitü tarafından onaylanan lisansüstü tezimin/raporumun tamamını veya herhangi bir kısmını, basılı (kağıt) ve elektronik formatta arşivleme ve aşağıda verilen koşullarla kullanıma açma iznini Hacettepe Üniversitesine verdiğimi bildiririm. Bu izinle Üniversiteye verilen kullanım hakları dışındaki tüm fikri mülkiyet haklarım bende kalacak, tezimin tamamının ya da bir bölümünün gelecekteki çalışmalarda (makale, kitap, lisans ve patent vb.) kullanım hakları bana ait olacaktır. Tezin kendi orijinal çalışmam olduğunu, başkalarının haklarını ihlal etmediğimi ve tezimin tek yetkili sahibi olduğumu beyan ve taahhüt ederim. 
Tezimde yer alan telif hakkı bulunan ve sahiplerinden yazılı izin alınarak kullanılması zorunlu metinlerin yazılı izin alınarak kullandığımı ve istenildiğinde suretlerini Üniversiteye teslim etmeyi taahhüt ederim. Yükseköğretim Kurulu tarafından yayınlanan “Lisansüstü Tezlerin Elektronik Ortamda Toplanması, Düzenlenmesi ve Erişime Açılmasına İlişkin Yönerge” kapsamında tezim aşağıda belirtilen koşullar haricince YÖK Ulusal Tez Merkezi / H.Ü. Kütüphaneleri Açık Erişim Sisteminde erişime açılır. Enstitü / Fakülte yönetim kurulu kararı ile tezimin erişime açılması mezuniyet tarihimden itibaren 2 yıl ertelenmiştir. (1) Enstitü / Fakülte yönetim kurulunun gerekçeli kararı ile tezimin erişime açılması mezuniyet tarihimden itibaren ... ay ertelenmiştir. (2) Tezimle ilgili gizlilik kararı verilmiştir. (3) 21/06/2023 Cihan ÜNLÜ “Lisansüstü Tezlerin Elektronik Ortamda Toplanması, Düzenlenmesi ve Erişime Açılmasına İlişkin Yönerge” (1) Madde 6. 1. Lisansüstü tezle ilgili patent başvurusu yapılması veya patent alma sürecinin devam etmesi durumunda, tez danışmanının önerisi ve enstitü anabilim dalının uygun görüşü üzerine enstitü veya fakülte yönetim kurulu iki yıl süre ile tezin erişime açılmasının ertelenmesine karar verebilir. (2) Madde 6. 2. Yeni teknik, materyal ve metotların kullanıldığı, henüz makaleye dönüşmemiş veya patent gibi yöntemlerle korunmamış ve internetten paylaşılması durumunda 3. şahıslara veya kurumlara haksız kazanç imkanı oluşturabilecek bilgi ve bulguları içeren tezler hakkında tez danışmanının önerisi ve enstitü anabilim dalının uygun görüşü üzerine enstitü veya fakülte yönetim kurulunun gerekçeli kararı ile altı ayı aşmamak üzere tezin erişime açılması engellenebilir. (3) Madde 7. 1. Ulusal çıkarları veya güvenliği ilgilendiren, emniyet, istihbarat, savunma ve güvenlik, sağlık vb. konulara ilişkin lisansüstü tezlerle ilgili gizlilik kararı, tezin yapıldığı kurum tarafından verilir *. 
Kurum ve kuruluşlarla yapılan işbirliği protokolü çerçevesinde hazırlanan lisansüstü tezlere ilişkin gizlilik kararı ise, ilgili kurum ve kuruluşun önerisi ile enstitü veya fakültenin uygun görüşü üzerine üniversite yönetim kurulu tarafından verilir. Gizlilik kararı verilen tezler Yükseköğretim Kuruluna bildirilir. Madde 7.2. Gizlilik kararı verilen tezler gizlilik süresince enstitü veya fakülte tarafından gizlilik kuralları çerçevesinde muhafaza edilir, gizlilik kararının kaldırılması halinde Tez Otomasyon Sistemine yüklenir * Tez danışmanının önerisi ve enstitü anabilim dalının uygun görüşü üzerine enstitü veya fakülte yönetim kurulu tarafından karar verilir. ETİK BEYAN Bu çalışmadaki bütün bilgi ve belgeleri akademik kurallar çerçevesinde elde ettiğimi, görsel, işitsel ve yazılı tüm bilgi ve sonuçları bilimsel ahlak kurallarına uygun olarak sunduğumu, kullandığım verilerde herhangi bir tahrifat yapmadığımı, yararlandığım kaynaklara bilimsel normlara uygun olarak atıfta bulunduğumu, tezimin kaynak gösterilen durumlar dışında özgün olduğunu, Prof. Dr. Aymil DOĞAN danışmanlığında tarafımdan üretildiğini ve Hacettepe Üniversitesi Sosyal Bilimler Enstitüsü Tez Yazım Yönergesine göre yazıldığını beyan ederim. Cihan ÜNLÜ iv ACKNOWLEDGEMENTS My first sincere thanks go to my advisor Prof. Dr. Aymil Doğan. I have nothing but admiration for her wisdom, attention and efforts for her students. I would also like to thank the participants of this study who generously gave their time and energy to take part in this research. Their willingness to share their experiences and insights has been invaluable and deeply appreciated. I would like to express my deepest gratitude to Assoc. Prof. Dr. Didem Tuna and Asst. Prof. Dr. Javid Aliyev for their invaluable academic guidance, understanding, and sincere help throughout both my undergraduate and graduate education. I would also like to extend my thanks to my friend Ebru Kürkcü for her unwavering moral support. 
I am grateful to Sebahat Gören, Dr. Pınar Uysal Cantürk, Aslı Yolcu and Büşra Ceren Tangül for their kind help in the statistical assessment and evaluation. Lastly, I would like to thank Prof. Didem Tuna and Prof. Alev Bulut for their invaluable expert opinions on the methodology used in this work. I owe a debt of gratitude to my family, colleagues at Istanbul Yeni Yüzyıl University and friends who provided me with much-needed encouragement, motivation, and support during this challenging time. Their faith in me has been a constant source of motivation and inspiration. v ÖZET ÜNLÜ, Cihan. Otomatik Konuşma Tanıma Sistemlerinin Ardıl Çeviride Kullanılması: Sight-Terp, Yüksek Lisans Tezi, Ankara, 2023. Bu deneysel çalışma, bilgisayar destekli sözlü çeviri (BDS) aracı olan "Sight-Terp" kullanımının ardıl çeviri sürecine etkisini araştırmaktadır. Bu çalışmanın yazarı tarafından tasarlanan ve geliştirilen Sight-Terp, dijital not defteri, otomatik konuşma tanıma (OKT), gerçek zamanlı konuşma çevirisi, adlandırılmış varlık tanıma ve vurgulama ve otomatik segmentasyon işlevlerine sahiptir. Çalışma, katılımcıların performanslarını iki koşulda (Sight-Terp'li ve Sight-Terp'siz) test etmek ve performanslarını doğruluk ve akıcılık kriterlerine göre analiz etmek için grup içi tekrarlı ölçümler tasarımı kullanmıştır. İki farklı koşuldaki doğruluk oranları arasındaki farkı analiz etmek için doğruluk değişkeni, anlamsal olarak eşdeğer bir şekilde aktarılan anlam birimlerinin sayısının ortalaması ile ölçülmüştür (Seleskovitch, 1989). Akıcılık ise, her bir performans için yanlış başlangıçlar, dolgulu duraksamaların sıklığı, sessiz duraksamalar, tüm sözcük tekrarları, bozuk sözcükler ve tamamlanmamış tümceler gibi akıcısızlık göstergelerinin toplam sayısı hesaplanarak ölçülmüştür. Ek olarak, katılımcıların araç kullanımına ilişkin algılarını analiz etmek için deney sonrası anket uygulanmıştır. 
Elde edilen bulgular, OKT ile entegre edilmiş BDS aracı Sight-Terp'ten yararlanmanın katılımcıların çevirilerinin doğruluğunda bir artışa yol açtığını göstermektedir. Ancak Sight-Terp kullandıklarında katılımcılarda daha fazla akıcısızlık belirteci meydana gelmiş ve çeviri için harcadıkları süre görece uzamıştır. Kullanıcılar aracı kullanırken herhangi bir zorluk veya yabancılık hissetmeseler de çalışma sonuçları yazılımın faydasını daha da artırabilecek potansiyel iyileştirme ve değişiklik alanlarını da ortaya koymaktadır. Bu çalışma, OKT teknolojisini sözlü çeviri sürecine dahil etmenin faydalarını ve zorluklarını vurgulayarak sözlü çeviri eğitimini ve pratiğini bilgilendirmeyi ve sözlü çevirmenler için BDS araçlarının gelecekteki gelişimi için pratik öneriler sunmayı amaçlamaktadır. Anahtar Sözcükler: bilgisayar destekli sözlü çeviri, otomatik konuşma tanıma, sözlü çeviri teknolojileri, ardıl çeviri, not alma, tablet destekli sözlü çeviri vi ABSTRACT ÜNLÜ, Cihan. Automatic Speech Recognition in Consecutive Interpreter Workstation: Computer-Aided Interpreting Tool ‘Sight-Terp’, Master’s Thesis, Ankara, 2023. This experimental study investigates the effect of using an automatic speech recognition (ASR)-enhanced computer-assisted interpreting (CAI) tool “Sight-Terp” on the performances of a group of participants in consecutive interpreting tasks. Sight-Terp, which was designed and developed by the author of this study, provides a digital note-pad, automatic speech recognition, real-time speech translation, named entity recognition and highlighting, and automatic segmentation of a speech. The study employs a within-subjects repeated measures design to test participants' performances in two conditions (with and without Sight-Terp) and analyses their performances based on the criteria of accuracy and fluency. 
To identify a significant difference between the accuracy ratios in the two conditions, accuracy was measured by the average number of accurately conveyed units of meaning (Seleskovitch, 1989). Fluency, on the other hand, was measured by calculating the total number of occurrences of disfluency markers such as false starts, filled pauses, filler words, whole-word repetitions, broken words, and incomplete phrases for each performance. Additionally, a follow-up qualitative survey was conducted to obtain participants' comparative responses and perceptions of the tool usage. The analysis and quantitative results of the study indicate that leveraging the ASR-integrated CAI tool Sight-Terp led to an enhancement in the accuracy of the participants' interpretations. However, this also resulted in a higher occurrence of disfluencies and elongated durations of interpretations. While the users experienced little difficulty while using the tool, the study outcomes also suggest potential areas of improvement and modifications that could further enhance the utility of the tool. The study aims to inform interpreting education and practice by highlighting the benefits and challenges of incorporating ASR technology in the interpreting process and offers practical suggestions for the future development of CAI tools for interpreters. Keywords: computer-assisted interpreting, automatic speech recognition, interpreting technology, consecutive interpreting, note-taking, tablet interpreting vii TABLE OF CONTENTS KABUL VE ONAY .......................................................................................................... i YAYIMLAMA VE FİKRİ MÜLKİYET HAKLARI BEYANI ................................ ii ETİK BEYAN ................................................................................................................. iii ACKNOWLEDGEMENTS ........................................................................................... 
iv ÖZET ............................................................................................................................... v ABSTRACT .................................................................................................................... vi TABLE OF CONTENTS .............................................................................................. vii LIST OF ABBREVIATIONS ....................................................................................... xi LIST OF TABLES .................................................................................................... xiii LIST OF FIGURES ................................................................................................... xiv LIST OF CHARTS ..................................................................................................... xv INTRODUCTION ........................................................................................................... 1 CHAPTER ONE: SCOPE OF THE STUDY ............................................................... 5 1.1. AIM OF THIS STUDY ..................................................................................... 5 1.2. SIGNIFICANCE OF THIS STUDY ............................................................... 5 1.3. RESEARCH QUESTION(S) ........................................................................... 6 1.4. LIMITATIONS ................................................................................................. 7 1.5. ASSUMPTIONS ................................................................................................ 7 1.6. DEFINITIONS ................................................................................................. 8 CHAPTER TWO: THEORETICAL BACKGROUND .............................................. 9 2.1. INTERPRETING: AN OVERVIEW .............................................................. 9 2.1.1. 
Defining Interpreting ................................................................................. 10 2.1.2. History of Interpreting ............................................................................... 11 viii 2.1.3. Interpreting in Modern Times ................................................................... 13 2.1.4. Modes and Settings of Interpreting ........................................................... 15 2.1.4.1. Consecutive Interpreting .................................................................... 18 2.1.4.2. Simultaneous Interpreting .................................................................. 20 2.1.4.3. Sight Interpreting................................................................................ 21 2.1.4.4. Whispering (Chuchotage) .................................................................. 22 2.1.4.5. Sign Language Interpreting ................................................................ 23 2.2. EFFORT MODELS IN INTERPRETING ................................................... 23 2.2.1. Effort Models in Consecutive Interpreting................................................ 25 2.2.2. Effort Models in Human-Machine Interaction .......................................... 26 2.3. TECHNOLOGY AND INTERPRETING .................................................... 29 2.3.1. The Emergence of Information Technologies in Interpreting ................... 29 2.3.1.1. Categorization of Technologies in Interpreting .................................. 30 2.3.2. Computer-Assisted Interpreting Tools ...................................................... 37 2.3.2.1. InterpretBank ...................................................................................... 41 2.3.2.2. Kudo Interpreter Assist ...................................................................... 44 2.3.2.3. SmarTerp ............................................................................................ 46 2.3.3. 
Speech Technologies and Automatic Speech Recognition ....................... 48 2.3.3.1. ASR Integration into Translation ....................................................... 53 2.3.3.2. ASR Integration into Interpreting ...................................................... 56 2.3.4. Technology and Consecutive Interpreting ................................................ 61 2.3.4.1. Sim-Consec ......................................................................................... 62 2.3.4.2. Tablet Interpreting .............................................................................. 63 2.4. SIGHT-TERP ......................................................................................................... 65 2.4.1. General Features ........................................................................................ 66 2.4.1.1. Automatic Speech Recognition and Speech Translation ................... 67 ix 2.4.1.2. Automatic Text Segmentation ............................................................ 69 2.4.1.3. Named Entity Recognition and Highlighting ..................................... 71 2.4.1.4. Digital Notepad .................................................................................. 73 CHAPTER THREE: METHODOLOGY ................................................................... 76 3.1. DESIGN OF THE STUDY ............................................................................. 76 3.2. DATA COLLECTION INSTRUMENTS ..................................................... 77 3.2.1. Speeches .................................................................................................... 78 3.2.2. Questionnaires ........................................................................................... 81 3.3. PARTICIPANTS ............................................................................................. 82 3.4. PROCEDURE ................................................................................................. 
83 3.4.1. Training ..................................................................................................... 86 3.4.2. Preliminary Test ......................................................................................... 87 3.5. DATA ANALYSIS TECHNIQUES .............................................................. 89 CHAPTER FOUR: FINDINGS AND DISCUSSION ................................................ 91 4.1. FINDINGS AND DISCUSSION RELATED TO THE ACCURACY DIFFERENCES ..................................................................................................... 91 4.2. FINDINGS AND DISCUSSION RELATED TO THE FLUENCY DIFFERENCES ..................................................................................................... 93 4.3. POST-EXPERIMENT QUESTIONNAIRE RESULTS.............................. 96 CONCLUSION AND RECOMMENDATIONS ...................................................... 105 BIBLIOGRAPHY ....................................................................................................... 110 APPENDIX 1. SPEECH MATERIALS .................................................................... 120 APPENDIX 2. TABLE OF ICT TOOLS AND PLATFORMS RELATED TO INTERPRETING TECHNOLOGY .......................................................................... 124 x APPENDIX 3. ETHICS COMMITTEE APPROVAL ........................................... 126 APPENDIX 4. THESIS/DISSERTATION ORIGINALITY REPORT ................. 
127 xi LIST OF ABBREVIATIONS AI : Artificial Intelligence AIIC : International Association of Conference Interpreters API : Application Programming Interface AR : Augmented Reality ARI : The Automated Readability Index ASR : Automatic Speech Recognition CAI : Computer-Assisted Interpreting CI : Consecutive Interpreting EM : Effort Model ER : External Resources ESIT : École Supérieure d’Interprètes et de Traducteurs (School for Interpreters and Translators in Paris, France) ETI : École de Traduction et d'Interprétation (School of Translation and Interpreting in Geneva, Switzerland) EVS : Ear-Voice Span HMI : Human-Machine Interaction ICT : Information and Communications Technology LLM : Large Language Model MFD : Mean Fixation Duration MI : Machine Interpreting MT : Machine Translation NER : Named Entity Recognition NLP : Natural Language Processing PE : Post-Editing RI : Remote Interpreting RSI : Remote Simultaneous Interpreting S2ST : Speech-to-Speech Translation SCI : Sight-Consecutive Interpreting SI : Simultaneous Interpreting SMOG : Simple Measure of Gobbledygook ST : Speech Translation xii TD : Translation Dictation TIS : Translation and Interpreting Studies UI : User Interface VR : Virtual Reality WER : Word Error Rate xiii LIST OF TABLES Table 1: ICT Tools and Platforms Related to Interpreting Technology Table 2: Advantages and Disadvantages of Tablet Interpreting (Goldsmith, 2018, p. 357) Table 3: Readability Index Results and Lexical Density Ratios of Speech Materials Table 4: Detailed Descriptions of Speech Materials (Duration, Length, Units of Meaning) Table 5: Word-Error-Rate Results and Precision of ASR in Named Entity Recognition Table 6: Distribution of Speech Materials per Participant Table 7: Instances of Disfluency Markers per Participant xiv LIST OF FIGURES Figure 1. The conceptual spectrum of interpreting drafted by Pöchhacker Figure 2. Glossary creation and editing in InterpretBank Figure 3. The memory feature of InterpretBank Figure 4. 
The main interface of InterpretBank ASR Figure 5. Glossary management page of Interpreter Assist Figure 6. ASR Feature in KUDO Interpreter Assist Figure 7. The user interface of SmarTerp Figure 8. The workflow of the ASR-CAI integration in the case of InterpretBank Figure 9. The functionalities of Livescribe™ Echo® Smartpen Figure 10. The main layout of Sight-Terp (Tablet View) Figure 11. A segmented text on the interface of Sight-Terp Figure 12. Named entities highlighted in Sight-Terp interface Figure 13. Digital Notepad feature of Sight-Terp Figure 14. The comparable results of the preliminary test: complete renditions of meaning units in % Figure 15. The comparable results of the main test: complete renditions of units of meaning in %. Figure 16. The durations of the performances (in minutes and seconds) Figure 17. The answers to the question “How would you evaluate your experience with the Sight-Terp tool?” Figure 18. The answers to the Likert item “I think the Sight-Terp tool is easy to use.” Figure 19. The answers to the Likert item “Using automatic speech recognition during the consecutive interpreting task negatively affected my performance.” Figure 20. The answers to the Likert item “I think the features in Sight-Terp contributed to my consecutive interpreting performance.” Figure 21. The answers to the question “Do you think the automatic speech recognition function in Sight-Terp is accurate and reliable?” Figure 22. The answers to the question “Which automatically generated output did you use for support during consecutive interpreting?” Figure 23. Answers to the question “Would you use the Sight-Terp tool in your future professional life?” xv LIST OF CHARTS Chart 1. The procedure followed in the study 1 INTRODUCTION The key role of information and communication technologies (ICT) in interpreting is inarguably prominent considering recent tailor-made technological solutions for interpreters. 
Remote interpreting (RI) solutions have changed the way interpreters work and created a digital identity along with its problems and contributions. Machine interpreting (MI), on the other hand, though far from human parity, has the potential to create thought-provoking debates on user perception, multilingualism, and communicative perspective. The advancement of technology has brought about a plethora of tools and solutions to enhance the accuracy and efficiency of interpreters. With the use of computer-assisted interpreting (CAI) tools and natural language processing (NLP) applications, interpreters now have access to a whole new world of linguistic and technical possibilities, which can revolutionize the way they approach their work. A computer-assisted interpreting tool is defined as software ‘specifically designed and developed to assist interpreters in at least one of the different sub-processes of interpreting’ (Fantinuoli, 2018b, p. 12). CAI tools emerged to fulfil a common objective: helping interpreters in a wide range of productivity- and quality-related tasks, from easing cognitive load to conference preparation and terminology organization. As a matter of fact, technological trends in the field of interpreting have shifted with new developments in natural language processing, speech technologies, general artificial intelligence and the changing role of interpreters with the rise of remote simultaneous interpreting (RSI); the so-called technologization process, or ‘technological turn’ (Fantinuoli, 2018b), has changed the way “computer-assisted interpreting” is perceived. Automatic speech recognition technology is a game-changer for the new generation of CAI tools. The quality of ASR systems has been incrementally improved thanks to new advancements in deep learning1, which brought about the question of whether CAI tools and ASR can be integrated. 
ASR-integrated CAI tools have been proposed and designed to alleviate the cognitive strain on interpreters during the interpreting process, while simultaneously augmenting their processing capabilities. (Footnote 1: Deep learning is a subset of artificial intelligence (AI) that focuses on teaching machines to learn and process data in ways that resemble human learning.) The aim in principle is to automate the querying system in real time in simultaneous interpreting and to automatically display a reliable transcript of the source speech within a time frame that fits into interpreters’ ear-voice span (EVS). These new-generation ASR-enhanced CAI tools have recently gained traction thanks to tools (or projects) such as InterpretBank (Fantinuoli, 2016), SmarTerp (Rodríguez et al., 2021), VIP (Corpas-Pastor, 2021) and KUDO Interpreter Assist (Fantinuoli et al., 2022). ASR, with “considerable potential for changing the way interpreting is practiced” (Pöchhacker, 2016, p. 188), has a pivotal role in shaping the concept of human-machine interaction in the context of interpreting. Several empirical studies have questioned possible ASR implementation as an automated querying system (Hansen-Schirra, 2012; Fantinuoli, 2017), investigated the feasibility of ASR-enhanced CAI tools in the context of problem triggers (Ricci, 2020; Van Cauwenberghe, 2020; Defrancq & Fantinuoli, 2021; Rodríguez et al., 2021; Pisani & Fantinuoli, 2021; Montecchio, 2021; Prandi, 2023), used ASR for meeting the preparatory needs of interpreters (Gaber et al., 2020) and implemented ASR for supporting interpreters with the transcription of the source speech (Cheung & Tianyun, 2018; Wang & Wang, 2019). In order to enhance the depth of empirical research on CAI tools, this study deviates from the earlier studies that primarily investigated the use of ASR in simultaneous interpreting, instead focusing on the usage of an ASR and MT-enhanced CAI tool in consecutive mode. 
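The automated querying loop described above, in which each incoming ASR segment is matched against a prepared glossary so that hits can be displayed within the interpreter's ear-voice span, can be pictured in a few lines. The following is a deliberately simplified stdlib sketch: the glossary entries, the function name and the plain substring matching are hypothetical illustrations, not the actual implementation of Sight-Terp, InterpretBank or any other tool mentioned.

```python
# Toy glossary: source term -> target term (hypothetical entries).
GLOSSARY = {
    "greenhouse gas": "sera gazı",
    "supply chain": "tedarik zinciri",
}

def query_segment(segment, glossary=GLOSSARY):
    """Return (source, target) glossary hits found in one ASR transcript segment."""
    text = segment.lower()
    return [(src, tgt) for src, tgt in glossary.items() if src in text]

# Simulated stream of ASR partial results; in a real tool each hit would be
# pushed to the interpreter's screen before the rendition begins.
stream = ["Greenhouse gas emissions rose sharply", "across the global supply chain."]
hits = [hit for seg in stream for hit in query_segment(seg)]
```

A production system would of course use fuzzy matching and lemmatization rather than exact substrings, since ASR output rarely matches glossary entries verbatim.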
The study2 attempts to fill a gap in the available literature on computer-assisted interpreting tools by proposing a prototype of an ASR-enhanced digital application and providing insights into the effectiveness of ASR and technology usage in enhancing interpreter performance in consecutive interpreting (CI), which could help shape the creation of more sophisticated CAI tools that cater to the specific needs of interpreters. The study aims at exploring whether there is a significant difference in the performances of a group of participants in CI tasks when using the ASR-enhanced CAI tool “Sight-Terp” (see section 2.4.), which is developed by the author within the scope of this thesis. (Footnote 2: The scope and the results of the preliminary test of this thesis were presented under the title “Investigating the usage of ASR and speech translation in consecutive interpreter workstation: A pilot study on ASR-enhanced CAI tool prototype ‘Sight-Terp’” at the TC44 Translating and the Computer Conference organized in Luxembourg on 22-25 November 2022.) Sight-Terp3 is a prototype of a CAI tool that initiates continuous speech recognition and provides real-time speech translation, named entity recognition and automatic segmentation of a speech. The named entity recognition (NER) function allows users to easily detect the named entities, such as numerals and proper names, in the automated texts to improve their lookup mechanism. Participants' performances were tested and analyzed for accuracy and fluency using a repeated measures design. Accuracy was measured by calculating the percentage of the accurately rendered “units of meaning” (Seleskovitch, 1989) in each performance. A non-parametric statistical test (Wilcoxon Signed-Rank) was used to compare performances without technological aid and with Sight-Terp. A follow-up qualitative survey was given to the participants to obtain comparative responses and perceptions on the tool usage. 
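The two quantitative steps just described, computing the percentage of accurately rendered units of meaning per performance and then comparing the paired conditions with a Wilcoxon signed-rank test, can be sketched as follows. This is a simplified stdlib illustration under the assumption that units of meaning have already been annotated manually; the function names and sample figures are hypothetical, and in practice a statistics package would also report the p-value.

```python
def accuracy_percentage(rendered_units, total_units):
    """Percentage of accurately rendered units of meaning in one performance."""
    return 100 * rendered_units / total_units

def signed_rank_statistic(before, after):
    """Wilcoxon signed-rank statistic W = min(W+, W-) for paired scores.
    Zero differences are dropped; tied absolute differences share their
    average rank, as in the standard procedure."""
    diffs = [b - a for a, b in zip(before, after) if b != a]
    abs_sorted = sorted(abs(d) for d in diffs)
    def avg_rank(value):
        first = abs_sorted.index(value) + 1                  # lowest rank of tie group
        last = len(abs_sorted) - abs_sorted[::-1].index(value)  # highest rank
        return (first + last) / 2
    w_plus = sum(avg_rank(abs(d)) for d in diffs if d > 0)
    w_minus = sum(avg_rank(abs(d)) for d in diffs if d < 0)
    return min(w_plus, w_minus)
```

The statistic would then be compared against the critical value for the given number of non-zero pairs to decide significance.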
The study has the potential to inform interpreting education and practice by highlighting the benefits and challenges of incorporating ASR technology in the interpreting process. By doing so, the study can also offer important practical suggestions for the future development of CAI tools for interpreters. The first chapter of the study serves as the introduction and outlines the aim, significance, research questions, limitations, assumptions, and research definitions. The second chapter focuses on the background and the literature review of this study, starting with historical and etymological aspects of interpreting (2.1.) and cognitive dimensions of interpreting with a focus on Effort Models by Daniel Gile (2.2.). Chapter two also touches upon technology in interpreting by providing a classification of ICT tools and platforms (2.3.1). Further, in section 2.3.2., CAI tools are defined, with three examples of ASR-enhanced CAI tools available on the market. Section 2.3.3. then explains speech technologies in general, coupled with qualitative and quantitative data from various studies on ASR integration into interpreting and translation. Section 2.3.4 mentions the usage of technological solutions for consecutive interpreting. Finally, the last section of chapter two gives a detailed description of the proposed CAI tool Sight-Terp (2.4.). (Footnote 3: Sight-Terp is publicly available at: https://www.sightterp.net.) Chapter three outlines the methodology of the study, including its design, data collection instruments, participants, and procedure. Chapter four presents the findings and discussions related to the accuracy and fluency differences in interpreting performance as well as comprehensive feedback from the users. Finally, the concluding chapter dwells on the conclusion reached at the end of the study and provides recommendations for future research. 5 CHAPTER ONE SCOPE OF THE STUDY 1.1. 
AIM OF THIS STUDY The purpose of this study is to evaluate the effectiveness of the ASR-enhanced CAI tool Sight-Terp (https://www.sightterp.net), developed within the scope of this thesis, in enhancing the performance of consecutive interpreters by facilitating real-time speech translation, named entity recognition and automatic segmentation. As its primary objective, the research aims to investigate whether the use of Sight-Terp improves the accuracy and fluency of CI. By means of a within-participants repeated measures design, the study seeks to empirically test the performance of a group of interpreters who will use Sight-Terp during the post-test phase. Furthermore, the research attempts to collect qualitative feedback from the participants through a follow-up survey, which will offer insights into their experiences and perspectives on using the tool. The contribution of this study to the field of interpreting will be to provide evidence on the effectiveness of ASR-based CAI tools in improving interpreters' performance by testing whether there is a significant difference in participants' performance. 1.2. SIGNIFICANCE OF THIS STUDY Process-oriented translation and interpreting research in experimental settings has gained traction in recent decades. Within the scope of interpreting research, research trends investigating the impact of technology-enabled interpreting tools on interpreters’ tasks have mostly centred upon simultaneous interpreting. Recognizing the need to expand empirical research on computer-assisted interpreting (CAI) tools, this study diverges from previous investigations that primarily focused on automatic speech recognition (ASR) in simultaneous interpreting. Instead, it examines the utilization of an ASR and machine translation (MT) augmented CAI tool in consecutive interpreting. By proposing a prototype digital application, the study aims to address a gap in the current body of literature pertaining to CAI tools. 
The empirical research conducted in this study can provide valuable insights into the effectiveness of these tools in improving interpreter performance and can inform the development of more advanced CAI tools. This thesis distinguishes itself by employing the English-Turkish language pair, whereas similar studies investigating ASR in interpreting have predominantly focused on high-resource European languages or Chinese. In addition to addressing the research questions posed by the methodology, this study compiles a comprehensive table in the literature review section that highlights the various tools and platforms associated with information and communication technologies in interpreting. By doing so, it aims to provide an extensive overview of the resources that influence, either partially or entirely, the practice of interpreting. Moreover, the methodology employed in this empirical study may raise new questions as to whether new methodological designs are needed for product-oriented CAI tool research to achieve better generalizability, particularly in technology-assisted CI. The results of this study can have practical implications for professional interpreters, interpreter training programmes and speech technology developers, as they can inform the development and integration of more efficient and effective interpreting technology tools, particularly those enhanced with AI and automatic speech recognition. Finally, the results of this study can lead to a better understanding of the potential of human-machine interaction in interpreting and contribute to ongoing efforts to improve the quality of interpreting services.

1.3. RESEARCH QUESTIONS

1.
Does the use of the CAI tool Sight-Terp in consecutive interpreting, which provides both a source transcription and a machine translation output, lead to a significant improvement in interpreting accuracy compared to interpreters' performance without technological aid?

2. Are there significant differences in the number of disfluencies (pauses, hesitations, repetitions, stuttering, false starts) between pre-test performances without CAI support and post-test performances with Sight-Terp support?

3. How do users interact with Sight-Terp? Do its interface design and ergonomic features meet the required standards for efficient and effective interpretation?

1.4. LIMITATIONS

The results obtained from our empirical evaluation must be interpreted in a nuanced manner, as they are subject to certain limitations. One such limitation is that the experiment was conducted with student/novice interpreters. Second, the language pair used in this study is Turkish and English, and the interpreting task requested from the participants is in the direction from English into Turkish. Directionality is another phenomenon that may introduce additional factors and interfere with the accuracy and completeness of the interpreting performance, particularly in technology-mediated interpreting scenarios. Third, the use of pre-recorded speeches may not reflect the challenges and demands of live interpreting, which could limit the generalizability of the results. It is also critical to acknowledge that variables such as specific domains, speech characteristics, and accents, among other factors, are highly relevant and may significantly affect the tool's performance and usability. As a fourth limitation, the ASR system that Sight-Terp relies on is the Microsoft Azure Speech Recognition API.
At the time of writing this thesis, the Microsoft Speech Recognition API is considered one of the best ASR systems compared to other equivalent software. However, this limitation should still be taken into account when evaluating the proposed software's overall performance and effectiveness.

1.5. ASSUMPTIONS

1. All participants are presumed to possess comparable skills and levels of expertise in consecutive interpretation.

2. The participants are assumed to perform to the best of their ability and to be motivated to achieve high levels of accuracy and fluency in their interpretation, regardless of the presence of technological aids.

3. The participants are assumed to be honest and sincere in their self-assessment of their performance and to provide accurate responses in the questionnaires.

4. The reliability indices for the materials used in the research are presumed to be adequate and valid for assessing the performances, and the pre-test and post-test speeches are assumed to have similar levels of difficulty and content familiarity.

5. The laboratory conditions are assumed to affect all subjects in a similar manner, and all subjects are assumed to participate in the tasks with their utmost focus and concentration.

1.6. DEFINITIONS

Automatic Speech Recognition (ASR): ASR is a subfield of natural language processing and artificial intelligence (AI) that focuses on the development of algorithms and models to convert spoken language into written text.

Speech Translation (ST): Speech translation is a machine learning task that employs a variety of techniques and models to translate spoken language from one language into another.

Computer-Assisted Interpreting (CAI) tools: CAI tools refer to a wide range of computer programs that have been developed with the primary purpose of supporting interpreters in one or more of the diverse sub-processes of interpreting.
CAI tools provide human interpreters with real-time support in the form of speech recognition, translation, and other aids to enhance their interpreting performance.

Named Entity Recognition (NER): NER is a natural language processing task (or technique) used to identify and extract important entities such as names, locations, and organizations from a text, providing a more comprehensive understanding of the information being processed.

CHAPTER TWO

THEORETICAL BACKGROUND

This chapter delineates the theoretical background of this thesis and provides a broad literature review of the core concepts linked with the professional, academic and technological aspects of interpreting. In the first section, after a historical and etymological overview of interpreting per se, the modes and settings of interpreting are defined in their respective subsections. The second section outlines the main principles of the cognitive dimension of interpreting, with a particular focus on Daniel Gile's Effort Models, which are closely associated with the cognitive aspects of interpreting. Section three elaborates on information and communication technologies in interpreting and classifies technology-relevant interpreting tools and platforms in a single frame. Further, speech technologies, including ASR integration into interpreting and translation, are briefly described, along with relevant data from qualitative and quantitative studies. Moreover, a subsection is devoted to the use of technology in CI, drawing on the few articles published so far for a better understanding of recent approaches. Finally, the last section introduces the computer-assisted interpreting tool Sight-Terp, on which this thesis is grounded, and provides an elaborate description of its features.

2.1.
INTERPRETING: AN OVERVIEW

Throughout history, interpreting has been required in any cross-linguistic communicative event in which communication must pass across barriers of culture and language. It has been used for centuries to facilitate communication between individuals or groups who speak different languages. The use of interpreters has continued to evolve and expand throughout history. With the rise of globalization, communication between countries has increased, and so has the demand for interpreters. This has caused the interpreting industry to become more professional and standardized, creating professional groups and introducing interpreter training and certification programs. As a result, the ancient human practice of interpreting has undergone many social, cultural and, most importantly, professional phases up until now. This section begins with a brief introduction to the concept of interpreting and its definition, along with its history. It then goes on to explain the ramifications of the practice in terms of different modes and settings.

2.1.1. Defining Interpreting

Briefly defined, interpreting is the act of transferring a message from one language (signed or oral) into another language form. Different conceptual approaches are observable in defining interpreting in a broad manner. In the Routledge Encyclopedia of Translation Studies, interpreting scholar Daniel Gile defines interpreting as "the oral or signed translation of oral or signed discourse, as opposed to oral translation of written texts" (2009, p. 51). Many languages have a corresponding equivalent word for interpreter and interpreting that is distinct from the words used for (written) translation. Etymologically, the first trace comes from the Akkadian word targumannu and its corresponding Aramaic form turgemana, whose semantic component is 'to explain' (Pöchhacker, 2015, p. 198).
The word finds its correspondence as tarjuman/targuman in Arabic, dragoumanos in Middle Greek, dragumannus in Medieval Latin, dragomanno in Italian, drugemen/drogman in French, tercüman in Turkish, and tolmács in Hungarian. The semantic inference of 'explaining' in these words also has a root in the Greek word hermeneus (cf. hermeneutics), referring to the Greek god Hermes, who interpreted the ethereal communiqués of the gods into the language of mortals for the sake of humanity. The English term "interpreting" has its origins in the Latin words interpres and interpretari. These words travelled through Old French and Anglo-French before finally being incorporated into modern English, accommodating diverse dialects and linguistic norms. As a result, the term has taken on different meanings in different contexts, with some restricting it to the act of facilitating communication between multilingual speakers and others embracing a more expansive interpretation that includes any kind of translation, whether written or spoken. Apart from the etymological origin, it is also possible to draw a line of distinction between translation and interpreting in that interpreting is performed 'here and now', and its feature of 'immediacy' distinguishes the word 'interpreting' from other translational activities (Pöchhacker, 2016, p. 10). This denomination allows for the incorporation of other manifestations, such as signed language interpreting, and avoids dichotomies of oral vs written translation by moving away from the common definition of "the oral translation of an oral discourse" (Gile, 1998, p. 40; 2004). Otto Kade characterizes interpreting as a form of translation in which "the source-language text is presented only once and thus cannot be reviewed or replayed, and the target-language text is produced under time pressure, with little chance for correction and revision" (1968, as cited in Pöchhacker, 2016).
This definition clearly articulates the feature of immediacy, as the interpreter has limited potential to access the source text (which can be substituted with "acts of discourse" and/or "utterances") in its "one-time presentation" (p. 10). All in all, every definition features interpreting as an in-the-moment activity that focuses on facilitating oral communication.

2.1.2. History of Interpreting

Throughout history, mediation, reciprocity, connectivity, and interconnectedness have always been at the heart of the engagement of civilizations, countries, tribes and the like. At the basis of this engagement, and of all cultural interactions, were wealth, reputation, invasion, and the struggle for sovereignty. Whether in conflict or not, peace-making has also been a matter of talking and therefore of language. Older than the invention of writing, interpreting has played an inevitable and crucial role in war, peace, trade, and administration, in addition to its undeniable role in peace negotiations, the social interactions of civilizations, and the spread of religions across many periods. Historically, records about interpreting are not abundant, for some presumable reasons, particularly prior to the Middle Ages. First, interpreting might have been considered a daily, common activity. Secondly, those in power who shaped history writing did not consider the interpreter's name worth mentioning, which resulted in a lack of historical documentation (Roland, 1982, p. 4). Another possible reason is the merit of invisibility as an integral ethical principle upheld by interpreters. As such, they were not considered worth recording in official minutes and administrative documents. The earliest known evidence of interpreting comes from historical documents inscribing or mentioning interpreters engaged in the practice, such as the hieroglyph from ancient Egypt depicting a communicative action between parties (Delisle & Woodsworth, 2012, p.
248) or a handful of documentary evidence on the role of interpreters in the Roman Empire (Giambruno, 2008, p. 28). Interpreters escorted conquerors as they marched into foreign lands, assumed important roles in diplomacy and government in Ancient Egypt and in the Ottoman Empire, enjoyed social privileges in many societies (Diriker, 2005, p. 88), and constituted a recognized occupational group in Rome (Hermann, 2002). In ancient times, they mostly consisted of people with multiple ethnic backgrounds, slaves, or prisoners (Roditi, 1982). Correspondingly, the motivation for embarking on an expedition was not limited to religion but also included trade, power and the annexation of new territories. Conquerors selected their interpreters from the conquered lands, taking them back to their native country to teach them their language (Andres, 2012, p. 3). Ottoman interpreters, the dragomans, who were mostly in charge of the embassies and consulates of European states in cities under Ottoman rule, came from the non-Muslim Christian communities of the Fener and Pera districts, who were knowledgeable about Western culture and languages (Hitzel, 1995; Abbasbeyli, 2015). A similar pattern was seen among the ancient Greeks, who were not eager to learn new languages, as they considered their own language superior, and recruited interpreters from among bilingual foreign people whom they called "barbarians" (Wiotte-Franz, 2001). Profession-wise, it is also possible to trace early codes of ethics stipulated for interpreters. Mexican interpreters called "Nahuatlatos" were actively used during the Spanish influx into Central and South America. In this specific historical context, the striking point is that partly comprehensive legislation on interpreters was drafted by the Spanish authorities, enshrining the training, accreditation, and definition of interpreters in a code of ethics (Baigorri-Jalón, 2015, p. 16). Overall, the origins of interpreting hark back to ancient civilizations.
However, it was not until the 20th century that interpreting became a globally recognised profession, influenced by the convergence of significant political, technological, economic and social advances that played a crucial role in its development and growth.

2.1.3. Interpreting in Modern Times

The oldest and, at the same time, one of the most modern professions, interpreting has undergone many transitions on its way to institutionalization and to becoming a full profession as well as an academic discipline. In the past 100 years, interpreting has experienced new transformations and ramifications, with new modes emerging in new settings, mostly driven by economic, political, and social developments. The widespread adoption of multilingualism in international conferences became possible after the emergence of official French-English bilingualism at the League of Nations in the early 20th century. This was a remarkable turning point in that it ensured multilingualism at international conferences and solidified the role of interpreters in facilitating communication between diverse linguistic backgrounds. Before the end of the First World War and the Paris Peace Conference of 1919, the prevalence of French in diplomatic proceedings was such that the demand for interpreters was minimal, as most participants were fluent in the language. On the rare occasion that a delegate was unable to speak French, they were assisted by a personal secretary or interpreter. Nevertheless, given that the need for interpreting was much smaller than in today's era of globalisation, conference interpreting was not considered a profession in its own right at the time. In this period, CI was the mode mostly used for meetings, even though it would double their duration. SI with equipment was not seriously considered until the 1920s. Chronologically, in 1925, Edward Filene, a businessman, philanthropist and entrepreneur, came up with the idea of simultaneous interpreting.
He then appealed to Gordon Finlay, a staff member of the ILO, to conceive of a technique that could provide delegates with a method to listen to speeches via telephone. This system, called 'the Filene-Finlay simultaneous translator' (later renamed the "International Translator System" by IBM in 1945), was operational using the available telephone equipment. It is known that, on June 4, 1927, the first meeting with simultaneous interpretation took place at the International Labour Conference in Geneva (Gaiba, 1998, p. 3; Taylor-Bouladon, 2011). However, there is uncertainty about the exact date and meeting at which SI with equipment was first used. While Western scholars indicate that the ILO was the first venue, Soviet historiography mentions that SI was used for the first time at the VI Congress of the Comintern held in 1928 (Flerov, 2013). Another SI system, invented by Siemens & Halske, was used at the International Conference on Energy in Berlin in 1930 (Gaiba, 1998). Between 1920 and 1940, SI was used in some international conferences across Europe (Taylor-Bouladon, 2011), but CI was still used quite often, especially in parliamentary meetings of the ILO and the League of Nations. The rich and storied history of conference interpreting as a formal and respected profession dates to the successful deployment of SI at the infamous Nuremberg Trials of 1945-1946, which is considered a crucial milestone in the development of the profession. During these trials, interpreters were tasked with interpreting the speeches of Nazi war criminals, defendants, prosecutors and judges in English, French, Russian and German. Colonel Léon Dostert, General Eisenhower's interpreter, was entrusted with organizing the language mediation process of the trials.
John Tusa and Ann Tusa, in their book "The Nuremberg Trial" (1983), describe the event as follows:

"Colonel Dostert, the head of the translation section, had grouped his simultaneous translators into three teams of twelve: one team had to sit in court and work a shift of one and a half hours; another to sit in a separate room, relatively relaxed, but still wearing headphones and following the proceedings closely so as to ensure continuity and standard vocabulary when they took over; the third having a well-earned half-day off. The work was exacting. It needed great linguistic skills and total concentration. For many of those involved the subject matter imposed a further emotional strain. Working conditions were uncomfortable: the translators were cramped in their booths, which were even hotter than the courtroom. They spoke through a lip microphone to try to dampen their sound (the booth was not enclosed at the top) but not even the use of the microphone nor the huge headphones they wore could deaden the noise made by their colleagues. As they worked they had to fight the distractions of other versions and other languages."

The time-saving feasibility of SI and its organized application over the long run of the trials was another sign of its future usability. It was not until the 1950s that simultaneous interpretation was widely implemented at the United Nations in New York. At this time, the interpreters who worked in the English booth for the Security Council gained nationwide acclaim as their interpretations were broadcast over the radio (Taylor-Bouladon, 2011, p. 29). In later years, the system became operational using wired and wireless/infrared systems. The International Association of Conference Interpreters (AIIC) was founded in 1953. This occasion marked a turning point in the history of interpreting as we know it.
This is because AIIC adopted a code of ethics and professional standards to regulate the working conditions of interpreters and to raise the profession's profile on the global stage, which was a great success. From its birth, AIIC also established complex administrative structures that continue to exist to this very day, with a highly centralized professional organization currently operating in Geneva. Today, modern technology has revolutionised the field of interpreting, with interpreters relying on cutting-edge tools such as soundproof booths, wireless headsets and computer-assisted translation software to enhance their work. The industry has also become more nuanced, with specialist interpreters serving specific sectors such as finance, law and healthcare. Moreover, because of the extensive use of simultaneous interpreting that started with the Nuremberg trials, a great need for trained interpreters has arisen, leading to the creation of numerous degree courses around the world. Formal education started with the foundation of the École de traduction et d'interprétation (ETI) in Geneva and, subsequently, the HEC School of Interpreting in Paris, which was later replaced by the Sorbonne School of Interpreting and Translating (ESIT). Over time, many courses and programmes have been established, offering bachelor's, master's and PhD degrees to prospective interpreters and helping to professionalize the field. The world has undergone significant changes since the early days of interpreting, and new modes and settings have emerged due to advances in technology. These changes have transformed the field of interpreting, and the next section will delve into the details of these various modes and settings.

2.1.4.
Modes and Settings of Interpreting

It is possible to draw a conceptual map of interpreting with different settings and constellations, such as inter-social and intra-social settings and the situational constellation of interaction (i.e., conference interpreting vs dialogue interpreting) (Pöchhacker, 2016). Among many classifications, a distinct division can be drawn based on methods and contexts, as the practice of interpreting takes place in a number of different modes and settings, each of which presents unique challenges and opportunities for the interpreter. The literature draws no precise lines when it comes to classifying interpreting based on the settings in which the action takes place and the modes, which denote the temporal relationship between the interpretation and the source message. Researchers tend to use different criteria when explaining the settings and modes of interpreting. According to Diriker (2018), interpreting can generally be classified based on the languages used in the communicational context (spoken language interpreting and sign language interpreting), the form of the interpretation (simultaneous interpreting, consecutive interpreting, whispering, sight interpreting), and the context in which the translation is performed (conference interpreting and community interpreting). Doğan (2022) adopts a particular classification. She initially outlines types of interpreting based on the method of execution, with a particular focus on the consecutive and simultaneous modes, and then delineates another classification based on the settings where spoken and sign language mediation is needed. Accordingly, CI has subtypes such as classic consecutive, liaison interpreting, dialogue interpreting, and over-the-phone interpreting (p. 50), while simultaneous interpreting encompasses subtypes such as TV interpreting, whispering, video-conference interpreting, sight interpreting, conference interpreting and sign language interpreting.
The settings, namely the contexts of interpreting, include community interpreting, court interpreting, police interpreting, disaster interpreting (Disaster Relief Interpreters, ARÇ in short), sports interpreting, healthcare interpreting, and conflict interpreting. Interpreting scholar Franz Pöchhacker (2016) takes another step and creates a broader systematic typological map of interpreting based on language modality (spoken vs signed), working mode (simultaneous, consecutive, sight interpreting, etc.), directionality, technology use, and professional status (professional vs non-professional). The historical prevalence of professional interpreting at international conferences and meetings has led to the belief that conference interpreting is carried out exclusively through consecutive and simultaneous interpreting. These two modes have become synonymous with conference interpreting, and it is often assumed that they are the only methods used in this type of interpreting, being 'misconstrued in a taxonomic sense' (Pöchhacker, 2015). Pöchhacker sets forth the following interpretation of the topic:

Aside from the modality of the language(s) involved, which serves to contrast spoken language with sign language interpreting, the most common distinction is made in terms of the temporal relationship between the interpretation (target text) and the source text, which yields consecutive interpreting and simultaneous interpreting as the two main modes of interpreting. In a looser sense, different 'modes' can also be identified with reference to the directness of the interpreting process (relay interpreting) and the use of technology to deliver the interpretation, as in the case of remote interpreting provided in 'distance mode'. Much more relevant, however, are conceptual distinctions with reference to the settings in which interpreter-mediated social contacts take place.
On the broadest level, inter-social (or inter-national) scenarios, involving diplomats, politicians, scientists, business leaders or other types of representatives of comparable standing, can be viewed as different from intra-social (community-based) ones, in which one of the interacting parties is an individual speaking on his or her own behalf. The latter, subsumed under the broad heading of community interpreting, allow multiple interpreting subdivisions in terms of different institutional contexts, including legal interpreting, healthcare interpreting and educational interpreting, with numerous institution-related subtypes. (Pöchhacker, 2015, p. 199)

Moreover, by combining all of these distinctions, Pöchhacker details different formats of interaction in the scheme shown in Figure 1.

Figure 1. The conceptual spectrum of interpreting drafted by Pöchhacker (2016, p. 17)

In this subsection of the study, the main modes of interpreting, namely simultaneous interpreting, consecutive interpreting, whispering, sight interpreting and sign language interpreting, will be briefly addressed, with their specificities explored in a succinct manner.

2.1.4.1. Consecutive Interpreting

Consecutive interpreting (CI), the main mode and practice used in the experiment phase of this study, involves listening to the speaker's message in one language, with or without the use of electronic equipment, taking notes, and then delivering a full and immediate consecutive interpretation in another language. The interpreter in this mode waits until the speaker has finished a segment of speech before beginning to interpret it. In other words, the speaker and the interpreter take turns speaking.
This mode of interpreting requires the interpreter to possess a distinctive set of skills and abilities, including good memory retention, unwavering attention to detail, and note-taking dexterity, all while possessing a profound grasp of the languages in question. CI can be practised for any duration as long as the original act of discourse continues, since the length of the speech to be interpreted is not predetermined. It involves the interpretation of both short utterances and extended speeches and thus "can be conceived of as a continuum which ranges from the rendition of utterances as short as one word to the handling of entire speeches, or more or less lengthy portions thereof, 'in one go'" (Pöchhacker, 2016). There are factors that affect the CI process, such as the interpreter's working style, memory and situational factors. To cope with longer speeches, the note-taking technique, i.e. taking notes that represent ideas and concepts rather than words, is used; it was first introduced by pioneer conference interpreters in the early 1900s. Note-taking serves as a memory jogger for the interpreter. There are numerous methods and approaches to note-taking for CI, each with its own unique nuances and subtleties, such as mind-mapping, sentence condensation, and jotting down symbols, abbreviations, bullet points and keywords that trigger the memory of the speech content. Whether or not a note-taking technique is used divides CI into two classes: classic consecutive, where note-taking is most commonly used, and short consecutive, where the duration of the speech is less than two or three minutes and does not require the interpreter to take notes.
The term 'consecutive interpreting' emerged in the 1920s to designate what had until then been, so to speak, the standard or default mode of interpreting and to distinguish it from the new method known as 'telephonic' or simultaneous interpreting (Baigorri-Jalón, 2014; Andres, 2015), which later paved the way for the birth of the profession of conference interpreting (see section 2.1.3). Subsequent to the effective deployment of the technique of simultaneous interpreting at the Nuremberg Trials and its later adoption by the United Nations, the use of CI became less widespread. Simultaneous interpretation is commonly utilized for meetings with many languages and a large number of participants, whereas consecutive interpretation is more suited to smaller sessions with technical or confidential content, as well as ministerial negotiations. Additionally, CI is more flexible than simultaneous interpreting in that it allows the interpreter to communicate and clarify with participants, regulate the dialogical discourse, and observe the physical circumstances of the participants and their surroundings (Russell and Takeda, 2015). Simultaneous interpreting is widely regarded as a more advanced form of interpreting and more cognitively challenging than CI. According to Gile (2001a), it is often recommended that students begin their interpreting training with CI, since it serves as a foundation for the more complex task of simultaneous interpreting. Though this argument is still debated (e.g. Seleskovitch & Lederer, 1989; Russell et al., 2010), this approach can be observed in the curricula of schools of translation and interpreting around the world. Before moving on to the simultaneous mode, CI is included as one of the basic practices, along with courses in sight translation and note-taking (Niska, 2005, p. 49).

2.1.4.2.
Simultaneous Interpreting

Simultaneous interpreting (SI) is a type of interpreting that involves interpreting spoken language in real time while the speaker is still speaking. It is typically used in situations where a large group of people need to understand a speaker of another language, such as at international conferences or meetings. Interpreters working in the simultaneous mode are expected to produce a logically coherent output that is consistent with the source. The simultaneity of the act makes simultaneous interpreting a cognitively demanding process requiring a high level of language management. During simultaneous interpreting, the interpreter listens to the speaker through headphones and at the same time speaks into a microphone, allowing the audience to hear the interpretation via headphones or loudspeakers. This dynamic demands an exceptional degree of cognitive dexterity and linguistic prowess: the interpreter must keep pace with the speaker's delivery, interpreting as they listen, a feat requiring extraordinary mental agility. Conference interpreters in this mode usually work in a booth where they can concentrate on the interpretation without distractions. Collaboration is a key aspect of simultaneous interpreting because interpreters rarely work alone. Instead, they work in pairs or even trios, each taking a 20-to-30-minute shift. This tag-team approach allows one interpreter to take a short break while the other does the heavy lifting, interpreting in real time for the audience. In this situation, teamwork is essential, with each interpreter assisting the other as needed, for example with difficult terminology. To be successful, interpreters must have an in-depth knowledge of their working languages and cultures, as well as exceptional short-term memory.
Adequate preparation is also essential, including prior research into the subject(s) of the event, which can cover a wide range of areas such as finance, medicine, law and science. 2.1.4.3. Sight Interpreting Sight interpreting is an interpreting modality that requires the interpreter to render written materials orally in real time. It can be considered a hybrid mode in which a written source text is turned into oral output in another language. This mode has become a critical aspect of various industries such as law, medicine, and professional services, where the immediate verbalization of written documents or letters is imperative for the recipient's comprehension. In simultaneous interpreting, the tempo of the interpreter's delivery is largely dictated by the pace of the speaker's speech, whereas the pace of interpreting from a written text is entirely under the interpreter's control. While the other modes of interpreting depend mostly on auditory input, sight interpreting frees up the interpreter's memory but also poses an added challenge: part of the processing capacity must be allocated to the visual channel. The complexity of the modality has also been scrutinized within the framework of translation process research. Dragsted and Gorm Hansen (2007) showed that, in sight interpreting tasks, interpreters differ from translators in temporal variables and translational approach. Eye-tracking studies also show that the visual presence of the source text entails greater cognitive effort and visual interference, requiring additional resources to cope with the lexical and syntactic complexity of a written text (Shreve et al., 2010). In education, similar to the utilization of CI, sight interpreting has long been used to assess a candidate's aptitude, that is, the ability to swiftly comprehend and articulate the core essence of a given text.
This mode of testing is widely considered a benchmark for determining an individual's competency in the field of interpreting (Russo, 2011). It is also commonly believed that sight interpreting can help students navigate a text in a non-linear manner and identify key information (Čeňková, 2015). Sight interpreting has the potential to elevate the practice of simultaneous interpreting to an even greater level of accuracy and precision. This is because conference speeches are often written beforehand, allowing interpreters not only to listen but also to read the text in front of them. The skills gained from practising simultaneous interpreting with written material can also be extended to scenarios involving mixed media, such as presentations using PowerPoint and, most notably, presentations with real-time subtitles displayed on screens (Setton, 2015). This blend of listening and text-based interpretation results in what Gile (1995) terms "simultaneous interpreting with text". The SI-with-text modality is often considered advantageous; however, it is also highly complex. The written text, dense in information and language, often lacks the fluidity and prosody of spontaneous speech. This raises the issue of balancing the two acts: relying too heavily on the written text, which might result in lagging behind, and relying solely on auditory input, which can be too fast to process. Studies on this modality indicate some benefits as well. Lambert (2004) found that providing text materials to student interpreters affects their simultaneous interpreting performance: the results indicated a substantial improvement when participants were given ten minutes to prepare with the text made available to them.
Likewise, Lamberger-Felber (2001, 2003) examined the impact of simultaneous interpreting with and without text on target-text accuracy and omissions. A remarkable difference was observed in the proportion of correctly translated proper names and numbers when interpreters had access to the written text, in contrast to the figures obtained in its absence: accuracy reached 98% with time to prepare and 92% without. It is important to note that ASR aid in consecutive interpreting, which this study partly aims to investigate with a product-oriented methodology, shows similarities with SI with text, since the text is available as a reference during the execution of the interpreting task. Therefore, ASR-aided consecutive interpreting might be termed 'consecutive interpreting with text' or a 'sight-consecutive' modality. 2.1.4.4. Whispering (Chuchotage) Whispering, also known as chuchotage, is a variant of simultaneous interpreting. Known for its use in intimate, close-quartered situations such as business dealings or guided tours, this mode provides a one-on-one, personal experience for the listener. The interpreter, physically near the listener, whispers the interpretation, ensuring unobtrusive communication without disrupting the pace of the meeting or event. 2.1.4.5. Sign Language Interpreting Sign language interpreting involves the interpretation of verbal communication into sign language and vice versa, providing access and understanding for deaf and hard-of-hearing people. The role of a sign language interpreter requires a remarkable level of fluency in both sign language and spoken language, coupled with a comprehensive understanding of deaf culture and appropriate deaf etiquette, in order to effectively interpret the intended message. 2.2.
EFFORT MODELS IN INTERPRETING The cognitive dimension of interpreting has garnered extensive interest from experts in various disciplines, including neurology, psychology, linguistics, and cognitive science. This rich intellectual landscape has spurred numerous investigations into the fundamental cognitive processes involved in interpreting, including the pivotal aspects of listening and comprehension, production, and delivery. This section delves into how these crucial components of interpreting are explained by the interpreting scholar Daniel Gile's Effort Models (1997/2002). The cornerstone of the interpreting process lies in the crucial stage of listening and analysis. It is here that the interpreter must attentively perceive the source text produced by the speaker and embark on the initial step of speech analysis. This involves delving into the source text to decipher its message and, subsequently, finding its equivalent in the target language. The speech is then produced in a series of processes ranging from the initial formation of the message in the mind to speech planning and implementation. In interpreting studies, the first attempt to model the translational process was put forward by Danica Seleskovitch (1962) and later developed by Lederer (1981). Seleskovitch's contribution to the cognitive analysis of interpreting is widely known, particularly her triangular process model of interpreting. In this model, 'sense' is seen as the culmination of the process, rather than mere linguistic transcoding. More specifically, it is the interpreter's ability to grasp and convey the underlying 'sense' of a message, rather than fixed linguistic correspondences, that is the essential component of interpreting. 'Sense', according to Seleskovitch, is a deliberate cognitive addition to linguistic meaning, with the added characteristic of being non-verbal.
In general, the main idea in the interpretive theory is that 'deverbalised' meaning is more important in translation than linguistic conversion processes. Later, more comprehensive multi-phase models were created, focusing particularly on 'processing difficulties' (Pöchhacker, 2016). In this regard, Daniel Gile's 'Effort Models' (1985, 1997/2002) are based on the idea that, in situations where cognitive decisions are necessary to complete tasks, the issue of multiple-task performance arises, as the combined cognitive demands may surpass the individual's capacity limit for processing. Gile's effort model (EM) posits that there is a finite amount of cognitive 'effort', with three basic processes competing for this resource. These processes, 'listening and analysis' (L), 'production' (P) and 'memory' (M), are essential components of the interpreting process. According to the model, all efforts require processing capacity, and the sum of the three efforts must not exceed the interpreter's processing capacity, suggesting that successful interpreting requires careful management of cognitive resources. Gile introduced this model in 1985, and it remains a fundamental framework for understanding the cognitive demands of interpreting. The constraint can thus be stated as L + P + M ≤ Capacity. In his later work, Gile expanded the model by adding a 'Coordination Effort' (C) (management effort) and modelled simultaneous interpreting as SI = L(istening) + P(roduction) + M(emory) + C(oordination). The following set of formulas (Gile, 1997/2002) explains the relationship between the components, where the suffix R denotes the processing capacity required by an effort and the suffix A the capacity available for it; the overall processing capacity requirement is the sum of the individual requirements (Pöchhacker, 2016, p. 91): TR (total processing capacity requirements) = LR + MR + PR + CR; LA ≥ LR; MA ≥ MR; PA ≥ PR; CA ≥ CR; TA ≥ TR. Gile consequently assumes that the entire available capacity must be equal to or greater than the total requirements.
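Restated compactly in Gile's (1997/2002) notation, with the suffix R for the capacity required by an effort and A for the capacity available to it, the system of constraints reads:

```latex
\begin{aligned}
TR &= LR + MR + PR + CR \\
LA &\ge LR, \quad MA \ge MR, \quad PA \ge PR, \quad CA \ge CR \\
TA &\ge TR
\end{aligned}
```

That is, interpreting proceeds smoothly only if every individual effort, and the process as a whole, stays within the capacity available to it.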
His contribution demonstrates, on the basis of these formulas, that during the interpreting process an interpreter operates within the limits of their own capacity. For the interpreting process to proceed smoothly, the available capacity for each effort must be greater than or equal to the capacity required by the relevant task. If an effort is not performed adequately, errors, omissions and infelicities may occur, such as incomplete comprehension, incorrect target reformulation or incomplete retrieval of information. The EMs in SI, devised by Gile, are underpinned by the Tightrope Hypothesis (Gile, 1999). In essence, this hypothesis posits that interpreters, much like tightrope walkers, operate on the brink of cognitive saturation (Gile, 2009, p. 198). This precarious balancing act is a constant challenge, as they must coordinate various sub-tasks. Gile's analysis reveals that when the interpreter's cognitive capacity reaches its limit, errors, omissions and infelicities (EOIs) occur. These missteps stem from an inability to effectively deal with "problem triggers" (Gile, 1999, p. 157), such as specialized terms, proper names, and numerical data, which demand heightened cognitive resources. Gile has created other models that represent the distinct challenges and efforts associated with various interpreting modalities, such as simultaneous interpreting with text, consecutive interpreting, sign language interpreting, and even remote interpreting. Due to their relevance to this study, I will briefly focus below on the EMs in CI and the EMs for human-machine interaction (HMI). 2.2.1. Effort Models in Consecutive Interpreting Different from SI, in CI (with notes) the model includes additional operations, since different tasks are involved. To be more precise, during the listening phase the listening effort is the same as in SI, but another production effort is executed when the notes are manually produced for memory-jogging.
Additionally, during the listening phase, a short-term memory effort is required to store the information until it is noted (Gile, 2001). During the reformulation phase, three efforts are required: the note-reading effort (deciphering), the long-term memory effort, which entails retrieving information from long-term memory and reconstructing the speech content, and finally the production effort, the operation of generating the target-language speech. Ultimately, for CI with notes, the following model can be drafted (Gile, 2023): Listening and Comprehension phase: L + M + NP + C (NP: note production); Reformulation phase: NR + SR + P + C (NR: note reading; SR: speech reconstruction from memory). Based on this model, it is worth noting that the interpreter is able to dedicate a greater degree of attention to monitoring their output during the speech, compared to the simultaneous interpreting process, where such monitoring may be more difficult to accomplish due to the demands of real-time production. Similarly, since SI involves the simultaneous processing of two languages in working memory (Gile, 2001), interpreters devote some attention to inhibiting the influence of the source language to avoid 'linguistic interference' (p. 2), making it a more challenging task. Conversely, in CI, the effort of inhibiting the source-language influence might be much weaker or even non-existent, since the notes taken are shorter, more summarized and organized. From this point of view, note-taking during comprehension imposes greater cognitive demands, whereas the cognitive pressure during the reformulation phase in CI with notes is relatively lower. However, this balance may shift when technological aids like Sight-Terp are incorporated into the interpreting process: the equilibrium could alter depending on which subtask the technology helps to reduce the cognitive load for. 2.2.2.
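Set beside the basic SI model (SI = L + P + M + C), the two phases of Gile's CI-with-notes model can be written out as:

```latex
\begin{aligned}
\text{CI, listening and comprehension phase:} \quad & L + M + NP + C \\
\text{CI, reformulation phase:} \quad & NR + SR + P + C
\end{aligned}
```

The split into two consecutive phases is what distinguishes CI from SI: each phase carries its own bundle of efforts, rather than all efforts competing at once.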
Effort Models in Human-Machine Interaction Daniel Gile suggests in his keynote speech (2020) that the Effort Models could give rise to new versions if researchers and teachers discover novel functions connected to significant attentional resource requirements in interpreting. A potential situation could arise if interpreters were required to direct significant attention to interaction with additional screens, interfaces, and technological tools. A recent development may serve as an example. During the COVID pandemic, remote interpreting platforms grew in number, and over the last three years there has been an increasing volume of demand for interpreters working remotely. When team members are not in the same location, communication between boothmates must take place through video-conference platforms which, though sharing some essential similarities, have different interfaces and functions. In a similar vein, CAI tools (see section 2.3.2), especially those designed specifically for in-booth scenarios, have certain functionalities that require familiarity and additional cognitive resources. In this respect, Gile (2020; 2023) postulates the following model (for SI), taking into account the changing technology and working environments: SI: R + M + P + HMI + C. Here, 'R' stands for reception, 'which can be both auditive and visual' (Gile, 2020), while HMI stands for human-machine interaction. HMI is a broad concept which might comprise different efforts. In the example of remote interpreting, Gile adds the turn-taking effort (TT). Turn-taking in remote interpreting can be more complex and challenging than in traditional settings, due to factors such as latency, audio quality, and coordination with other participants. More generally, many combined efforts are required to manage and troubleshoot technology-related issues, such as connectivity problems, audio and video settings, and platform-specific features.
In the main study of this thesis, the software Sight-Terp uses ASR to generate a transcript of the speech, from which the interpreter can deliver the interpretation. Since there is no need for note-taking5, the interpreter can allocate their attention to every detail in the speech and focus on formulating the interpretation in their mind, without the extra cognitive pressure of note-taking. Though this means less cognitive pressure during the comprehension phase, the constant visual presence of the auto-generated text may induce more cognitive pressure during the reconstruction phase, requiring the interpreter to constantly reformulate and adjust their interpretation. The additional features of Sight-Terp (named-entity highlighting, automatic segmentation), which are detailed in the following sections (section 2.4), are deployed in order to mitigate the linguistic interference more generally associated with sight interpreting (Agrifoglio, 2004). [5: Sight-Terp, in fact, allows for digital note-taking with a stylus (such as the Apple Pencil). Though this digital note-taking feature is described in the study, note-taking is excluded from the main study and the participants were instructed to use only the ASR function.] Based on Gile's Effort Models, a formula for an effort model specific to sight-consecutive interpreting can be drafted. In sight-consecutive interpreting, the interpreter relies on a text-based reference generated by an ASR system, which reduces the cognitive load associated with listening and memory to some extent. Consequently, the effort model for sight-consecutive interpreting might place more emphasis on the analysis of the text, the production of the target language, and the coordination of these efforts.
In light of these restrictions, mitigations and possible cognitive requirements brought by Sight-Terp, the following model can be drafted to encompass the sight-consecutive (SCI) modality: SCI: Listening and comprehension phase: L + M + NV + C (L: listening; M: memory; NV: note verification; C: coordination). NP is replaced with NV (note verification), denoting the interpreter's effort to monitor the accuracy of the ASR output, make corrections, and adopt strategies and coping mechanisms accordingly. The cognitive demands of using the tool will likely vary depending on the quality of the ASR output. Reformulation phase: BR + SR + P + C (BR: bilingual note reading; SR: speech reconstruction; P: production; C: coordination). In the reformulation phase, BR (bilingual note reading) is included to manage the bilingual display combining the MT output and the auto-generated source transcript, SR (speech reconstruction) to reconstruct the meaning of the source text, P (production) to produce the interpretation, and C (coordination) to manage the use of the tool. Strong C and P efforts are needed in this phase because of the linguistic interference potentially resulting from the bilingual display of the MT output and the auto-generated source transcript together (see 2.4.1.1.). 2.3. TECHNOLOGY AND INTERPRETING As technological advances overhaul the interpreting sphere, they cause a shift in the traditional practices of interpreters. The proliferation of large language models (LLMs), machine translation, speech recognition technologies and other cutting-edge tools has the potential to transform the interpreting process and demands a change in the way interpreters approach their work. The impact of technology on interpreting is multifaceted. On the one hand, technological innovations have streamlined information access and work management for interpreters, leading to an increase in productivity.
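Comparing the proposed SCI model with Gile's CI-with-notes model makes the substitutions explicit: note production (NP) becomes note verification (NV) in the first phase, and note reading (NR) becomes bilingual note reading (BR) in the second, while the remaining efforts carry over:

```latex
\begin{aligned}
\text{CI:}  \quad & L + M + NP + C &\;\longrightarrow\;& \; NR + SR + P + C \\
\text{SCI:} \quad & L + M + NV + C &\;\longrightarrow\;& \; BR + SR + P + C
\end{aligned}
```

The arrow separates the listening-and-comprehension phase from the reformulation phase in each modality.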
On the other hand, the emergence of new technologies has disrupted the demand for interpretation services in the marketplace and overhauled the entire landscape of the industry. The following section explores the proliferation of technology and its impact on interpreting, along with the latest technological developments and concepts, with a particular focus on ASR-enhanced CAI tools. I will delve into speech technologies and their impact on both written translation and interpreting, and examine ASR-enhanced computer-assisted interpreting tools. As technology continues to shape the landscape, I will explore the ways in which it affects consecutive interpreting, highlighting innovative methods and techniques. Finally, I will focus on the proposed tool 'Sight-Terp' and provide an insight into its features and capabilities. 2.3.1. The Emergence of Information Technologies in Interpreting Information and communication technology (ICT) tools have been a driving force in the pursuit of improved quality and productivity in both translation and interpreting over the last two decades. Interpreting has not experienced an impact as significant as the transformative effects ICT has had on translation. Nevertheless, there have been crucial technological advances in the field of interpreting. When discussing the evolution of interpreting in light of the emergence of information technologies, it is worth highlighting some key breakthroughs in the field. One such example, as stated in section 2.1.3, is the advent of simultaneous interpreting. SI stands out as the first game-changing innovation, which took place in the 1920s when IBM developed a hardwired system for instantaneous speech transmission. Gaining popularity at several international conferences, the wired system eventually made its mark in history by becoming an irreplaceable asset during the Nuremberg trials.
Needless to say, this breakthrough changed the way interpretation is facilitated on a daily basis and elevated the social status of interpreters. The second, and most important, breakthrough is the introduction of the World Wide Web, which has revolutionized the way interpreters access and share information, opening up new avenues for research and collaboration. The significance of the internet lies in the crucial need for preparation for interpreting assignments: conference interpreters constantly engage with different "specific terms, semantic background knowledge and context knowledge" in each assignment they take on (Rütten, 2016). The web offers a powerful advantage: the ability to gather information quickly from a multitude of sources. By streamlining the information management process, it has increased the efficiency of interpreters' preparation. Today, the landscape of interpreting technology is vast and varied, characterised by a wide range of technological solutions that have played a significant role in ushering in a 'technological turn' (Fantinuoli, 2018b) in the profession and in creating bespoke and non-bespoke computer-assisted interpreting tools. Recent technologies in today's interpreting sphere can be categorized along the lines of the purposes and functions of such tools. Given that interpreting technology is a vast umbrella term, classification is indeed essential for a thorough understanding. 2.3.1.1. Categorization of Technologies in Interpreting There are several approaches to the classification of ICT tools in interpreting. Fantinuoli (2018a) suggests two classes: setting-oriented technologies and process-oriented technologies. Setting-oriented technologies "primarily influence the external conditions in which interpreting is performed" (2018a, p. 155).
On the other hand, process-oriented technologies include a variety of tools, such as "terminology management systems, knowledge extraction software, and corpus analysis tools" (p. 155), all of which aim to assist interpreters in different sub-processes and various phases of an assignment. In parallel with this approach, Braun (2019) categorizes interpreting technology into three classes. The first category is "technology-mediated interpreting", which encompasses all technologies employed to expand the reach and effectiveness of interpreting services, including remote simultaneous interpreting (RSI) equipment. In broad terms, technologies mediating interpreting entail distance interpreting technologies, which cover "a whole range of technologically different setups" (Ziegler & Gigliobianco, 2018, p. 121). Remote interpreting can be defined as the utilization of various ICT instruments to enable interpreter-mediated communication from a physically removed location. During the COVID-19 pandemic, remote interpreting served as the catalyst for the development of a fresh generation of conference interpreter profiles, a location-independent alternative to traditional conference settings. Moreover, the proliferation of video-conference platforms (e.g., Zoom, Interactio, KUDO, and Interprefy) during the pandemic paved the way for computer-assisted interpreting tools explicitly developed for incorporation in RSI scenarios (see Interpreter Assist in section 2.3.2.2.). The incorporation of cutting-edge augmented reality (AR) innovations, including the deployment of advanced virtual reality goggles, could be the next evolutionary leap in remote interpreting, mitigating "the feeling of isolation" (Ziegler & Gigliobianco, 2018, p. 136) and/or integrating CAI tool interfaces on the virtual reality screen worn by the interpreter6 (Gieshoff, 2022).
The second category is technology-generated interpreting, which refers to machine interpreting (MI) or speech-to-speech translation. MI can be characterized as a technological advancement enabling the conversion of spoken language into another language through computer programming (speech technologies)7. MI involves a multi-step approach that generates an audible version of the translated text by creating synthetic speech in the target language. In cascade systems, the steps are as follows: ASR transcribes oral speech into written text; this is followed by machine translation; and finally, text-to-speech synthesis is used to generate an audible version of the translated text. The third category is "technology-supported interpreting", which entails all technologies that can be used to augment or facilitate interpreters' preparation, performance, and workflow. In this context, technologies supporting interpreting can be considered a wide group of technological applications and hardware used before, during and after the interpreting process, thereby affecting the cognitive processes behind the actual task of interpreting. CAI tools (see 2.3.3.) and other technologies that aim to enhance the performance of the task can be listed under technology-supported interpreting. [6: At the time of writing, a group of three scholars at Zurich University of Applied Sciences is examining whether augmented reality technology can assist interpreters with the additional effort of having to consult terms. In other words, the research focuses on integrating an ASR-enhanced CAI tool interface into an augmented reality display, postulating that instead of switching between different types of visual information and redirecting their visual attention to the CAI output, interpreters can view the output directly on an augmented reality interface worn as a headset.] [7: Section 2.3.3. briefly covers the aforementioned speech technologies.]
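The cascade MI architecture (ASR, then MT, then TTS) described above can be sketched schematically. The three stage functions below are toy stand-ins, not a real speech API: the "ASR" and "TTS" stages merely convert between bytes and text, and the "MT" stage is a word-level glossary lookup, so that only the chaining of the stages is illustrated.

```python
from dataclasses import dataclass

@dataclass
class CascadePipeline:
    """Schematic cascade machine-interpreting pipeline: ASR -> MT -> TTS.

    All three stages are illustrative stand-ins; a real system would call
    actual speech-recognition, machine-translation and speech-synthesis
    engines at each step.
    """
    glossary: dict  # toy word-level EN->TR "machine translation" table

    def asr(self, audio: bytes) -> str:
        # Stand-in ASR: pretend the audio decodes directly to its transcript.
        return audio.decode("utf-8")

    def mt(self, source_text: str) -> str:
        # Stand-in MT: word-by-word glossary lookup, unknown words kept as-is.
        return " ".join(self.glossary.get(w, w) for w in source_text.split())

    def tts(self, target_text: str) -> bytes:
        # Stand-in TTS: "synthesize" the target text back into audio bytes.
        return target_text.encode("utf-8")

    def interpret(self, audio: bytes) -> bytes:
        # The cascade proper: transcribe, then translate, then synthesize.
        return self.tts(self.mt(self.asr(audio)))

pipeline = CascadePipeline(glossary={"good": "iyi", "morning": "sabahlar"})
out = pipeline.interpret(b"good morning")
print(out.decode("utf-8"))  # -> iyi sabahlar
```

A practical consequence of this design, relevant to the error discussion elsewhere in this thesis, is visible even in the sketch: an error introduced at the ASR stage propagates unchecked through MT and TTS, since each stage consumes only the previous stage's output.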
The CAI tools falling under the technology-supported interpreting class are themselves classified into 'generations' depending on their purpose, features and release date, as described in 2.3.3. Drawing inspiration from Ortiz and Cavallo's list of ICT tools for interpreting (2018, p. 17), which categorizes tools by their function, specificity, and update date, I have expanded the list to include new categories such as Speech Bank, Audio and Video Conference platforms, Machine Interpreting and Real-time Speech Translation. In Table 3 below, the tools have been matched according to their specificities, purposes, modalities, and features to provide a comprehensive overview of the range of tools available to interpreters as of January 2023. The categories are training platform, speech bank, glossary management, corpora building, terminology extraction, speech recognition, note-taking, virtual booth service, audio and video conference, machine interpreting, and real-time speech translation. The tools under the categories of 'interpreter training' and/or 'speech bank' comprise various platforms and software that facilitate lexical and terminological searches for both novice and expert interpreters. These tools aim to help interpreters hone their interpreting skills and strengthen their grasp of both their native and foreign languages by allowing them to conduct deliberate practice using speeches and other materials. Glossary management, corpora building and term extraction tools (regardless of their specificity for interpreters) indicate the resources that can aid interpreters during preparation, allowing them to delve deeper into the primary topic they will be interpreting. Additionally, interpreters can develop and reference personalized glossaries throughout the interpretation process, while also familiarizing themselves with the speakers' accents and backgrounds by watching videos and scouring online sources.
The categories of Speech Recognition, Real-time Speech Translation, Note-taking and Virtual Booth Service include tools that are utilized during the interpreting process itself. The tools in this class are ASR-enhanced CAI tools for SI, speech translation solutions for various purposes, and note-taking applications that can be used in interpreting scenarios. Therefore, this class of categories, as well as the categories related to preparation and terminology, can be listed under the division of technology-supported interpreting. Platforms where remote simultaneous interpreting can be carried out8 (corresponding to technology-mediated interpreting) are listed in the category of 'Audio and Video Conference'. Finally, tools under the category of 'Machine Interpreting' (speech-to-speech interpreting) are specified as 'replacement' (corresponding to technology-generated interpreting), referring to full automation of the interpreting process and hence a complete replacement of human interpreters. In this category, devices and tools are added based on their availability on the market. The columns of the table show specificity (whether a tool is designed for interpreters), purpose (main aim of usage), modality (simultaneous and/or consecutive interpreting), and feature (remote interpreting platform, ASR-enhanced or fully ASR-powered, replacement by MI).