Hacettepe University Graduate School of Social Sciences
Department of Translation and Interpreting

EFFORT IN MACHINE TRANSLATION POST-EDITING: THE ROLE OF TRANSLATION EXPERTISE AND INDIVIDUAL COGNITIVE DIFFERENCES

Minel SAYAR ÖZTÜRK

Ph.D. Dissertation

Ankara, 2026

ACCEPTANCE AND APPROVAL

The jury finds that Minel SAYAR ÖZTÜRK successfully passed the defence examination on 13.01.2026 and approves her Ph.D. dissertation titled “Effort in Machine Translation Post-Editing: The Role of Translation Expertise and Individual Cognitive Differences.”

Prof. Dr. Sakibe Nalan BÜYÜKKANTARCIOĞLU (Jury President)
Assoc. Prof. Dr. Alper KUMCU (Main Adviser)
Prof. Dr. Ayşe Şirin OKYAYUZ
Assoc. Prof. Dr. Hilal ERKAZANCI DURMUŞ
Assist. Prof. Dr. Şerife DALBUDAK

I confirm that the signatures above belong to the faculty members listed.

Prof. Dr. Uğur ÖMÜRGÖNÜLŞEN
Graduate School Director

YAYIMLAMA VE FİKRİ MÜLKİYET HAKLARI BEYANI (DECLARATION OF PUBLICATION AND INTELLECTUAL PROPERTY RIGHTS)

I hereby declare that I grant Hacettepe University permission to archive all or any part of my graduate thesis/report approved by the Graduate School, in printed (paper) and electronic formats, and to make it available for use under the conditions given below. All intellectual property rights other than the usage rights granted to the University by this permission will remain with me, and the rights to use all or part of my thesis in future works (articles, books, licences, patents, etc.) will belong to me.

I declare and undertake that the thesis is my own original work, that I have not infringed the rights of others, and that I am the sole authorised owner of my thesis. I undertake that I have used the copyrighted texts in my thesis, whose use requires written permission from their owners, with such written permission, and that I will submit copies of these permissions to the University upon request.

Within the scope of the “Directive on the Electronic Collection, Organisation, and Accessibility of Graduate Theses” (Lisansüstü Tezlerin Elektronik Ortamda Toplanması, Düzenlenmesi ve Erişime Açılmasına İlişkin Yönerge) issued by the Council of Higher Education (YÖK), my thesis will be made accessible in the YÖK National Thesis Centre / Hacettepe University Libraries Open Access System, except under the conditions stated below.

o By decision of the Graduate School / Faculty board, access to my thesis has been postponed for 2 years from my graduation date. (1)
o By a reasoned decision of the Graduate School / Faculty board, access to my thesis has been postponed for ... months from my graduation date. (2)
o A confidentiality decision has been issued regarding my thesis. (3)

……/………/……
Minel SAYAR ÖZTÜRK

1 “Directive on the Electronic Collection, Organisation, and Accessibility of Graduate Theses”

(1) Article 6.1. If a patent application has been filed in connection with the graduate thesis or the patent process is ongoing, the graduate school or faculty board may, upon the proposal of the thesis adviser and with the approval of the graduate school department, decide to postpone access to the thesis for two years.

(2) Article 6.2. For theses that use new techniques, materials, and methods, that have not yet been turned into an article or protected by means such as a patent, and that contain information and findings which could give third parties or institutions an opportunity for unfair gain if shared online, access to the thesis may be blocked for a period not exceeding six months by a reasoned decision of the graduate school or faculty board, upon the proposal of the thesis adviser and with the approval of the graduate school department.

(3) Article 7.1. Confidentiality decisions on graduate theses concerning national interests or security, or matters such as policing, intelligence, defence and security, and health, are made by the institution where the thesis was prepared.* Confidentiality decisions on graduate theses prepared under a cooperation protocol with institutions and organisations are made by the university board, upon the proposal of the relevant institution or organisation and with the approval of the graduate school or faculty. Theses subject to a confidentiality decision are reported to the Council of Higher Education.

Article 7.2. Theses subject to a confidentiality decision are kept by the graduate school or faculty under confidentiality rules for the duration of the confidentiality period and are uploaded to the Thesis Automation System if the confidentiality decision is lifted.

* The decision is made by the graduate school or faculty board upon the proposal of the thesis adviser and with the approval of the graduate school department.

ETİK BEYAN (ETHICAL DECLARATION)

I declare that I obtained all the information and documents in this study in accordance with academic rules; that I have presented all visual, audio, and written information and results in accordance with the rules of scientific ethics; that I have not falsified any of the data I used; that I have cited the sources I drew on in accordance with scientific norms; that my thesis is original except where sources are cited; that it was produced by me under the supervision of Assoc. Prof. Dr. Alper KUMCU; and that it was written in accordance with the Thesis Writing Directive of the Hacettepe University Graduate School of Social Sciences.

Minel SAYAR ÖZTÜRK

DEDICATION

To my son Uraz Nabi,

ACKNOWLEDGEMENTS

First and foremost, I would like to express my deepest gratitude to my supervisor, Assoc. Prof. Dr. Alper Kumcu, whose guidance, patience, and profound knowledge shaped every stage of this study. His door was always open, and his academic insight and human kindness uplifted me whenever I needed support. I am sincerely thankful for his trust in me and for his invaluable contribution to both this dissertation and my growth as a researcher.

I also wish to extend my heartfelt thanks to my committee members, Prof. Dr. Ayşe Şirin Okyayuz and Assist. Prof. Dr. Şerife Dalbudak, for their insightful comments and constructive criticism throughout the research process.
Their guidance, particularly with respect to methodological considerations, consistently helped me feel that I was on the right track. After every committee meeting, I found myself reassured and motivated to move forward, and I owe that confidence to their sustained support.

In addition, I would like to thank the jury president, Prof. Dr. Sakibe Nalan Büyükkantarcıoğlu, and jury member, Assoc. Prof. Dr. Hilal Erkazancı Durmuş, for their careful evaluation and constructive, strengthening feedback. Their perspectives at the final stage of the dissertation not only contributed significantly to the clarity and overall quality of this work, but also offered valuable insights that will continue to inform and shape my future research endeavours.

My sincere appreciation also goes to Prof. Dr. Neslihan Kansu Yetkiner, my professor from my undergraduate years at Izmir University of Economics. She was the foundation upon which I built everything that followed, and her support has continued to this day. I would also like to thank Prof. Dr. Mehmet Şahin, who first introduced me to the fascinating world of machine translation, translation technologies, and post-editing. Thanks to him, I am writing this very dissertation on the topic he sparked my curiosity about years ago.

I would like to express my gratitude to Hacettepe University Translation and Interpreting Postgraduate Research Laboratory, which provided the perfect environment for conducting my experiment. I am also profoundly thankful to the 21 amazing participants who generously devoted their time and effort to this study. Although I cannot name them here, I sincerely hope they know that this research could not have been completed without them. They truly are the best.

My journey was made brighter by dear friends who walked beside me. Ahsen Can, my PhD companion and fellow mother, was my bridge to the department and my source of motivation when things felt overwhelming.
Her support and friendship have been constants that I will always treasure. Başak Pırıl Gökayaz, my friend from MA days, shared with me the same doctoral adventure at different universities. Our heart-to-heart conversations and shared breakdowns, which we can now laugh about, kept me sane during this process. I will miss those chaotic yet beautiful moments more than I imagined.

My deepest love and appreciation go to my family. To my husband, Sami Öztürk, my greatest supporter and my best listener. He is the calm in my storms and the laughter in my hardest days. Thank you for being by my side at all times and for sharing every moment of this long journey with patience, understanding, and love. To my son, Uraz Nabi, to whom I dedicate this dissertation, you are the greatest joy, the sweetest challenge, and the reason behind every step I take. You are my sun and my moon. I love you so much that words will never be enough to describe it. To my mother, Fahriye Genç, thank you for your endless support across the distance and for coming to me whenever I needed help. I am especially grateful for the countless small but meaningful gestures, including the late-night surprises with snacks when I woke up at 2 a.m. to work on this dissertation. To my grandmother, Kadriye Genç, and my uncle, Ferhat Genç, thank you for always putting a smile on my face during our laughter-filled phone calls. Your humour is the best stress-relief therapy one could ever have. And to my parents-in-law, Melahat Öztürk and Raif Öztürk, thank you for taking care of my little one so that I could take care of my PhD. You made the impossible possible.

Finally, I would like to thank myself. For finishing this PhD while raising a child and building a professional life. For making it through before turning thirty. For staying strong, curious, and passionate even when things got tough. So here’s a little pat on my own back: well done, girl, you made it!

ABSTRACT

SAYAR ÖZTÜRK, Minel.
Effort in Machine Translation Post-Editing: The Role of Translation Expertise and Individual Cognitive Differences, Ph.D. Dissertation, Ankara, 2026.

Machine translation post-editing (MTPE) has become an integral component of contemporary translation workflows. While technological advances are often assumed to reduce human effort, research suggests that effort in MTPE is complex and multidimensional. This dissertation investigates the relationship between expertise, effort, accuracy, and individual cognitive differences in MTPE, addressing three research questions: (1) to what extent expertise serves as a valid indicator of cognitive, temporal, and technical/linguistic effort in MTPE; (2) how expertise relates to objective and self-reported effort and post-editing (PE) accuracy; and (3) whether individual cognitive difference scores predict effort during MTPE.

A mixed-methods experimental design was employed with 21 participants grouped as experienced translators, inexperienced translators, and field experts. Participants post-edited three legal texts that had been machine-translated from English into Turkish. Data were collected through keylogging using Translog-II, capturing total task duration, mean pause duration, total pause duration, pause percentage, pauses per word, text production and elimination, and user and production events per minute. PE accuracy scores, post-task questionnaire responses, and retrospective think-aloud protocols were also collected. Additionally, individual cognitive differences were examined using executive function tests administered in the Psychology Experiment Building Language (PEBL), measuring cognitive flexibility, inhibition, and working memory.

The findings demonstrate that experienced translators often invested more temporal, cognitive, and technical/linguistic effort than the other groups, reflecting deliberate monitoring, revision, and quality control strategies. At the same time, expertise emerged as the strongest predictor of PE accuracy.
Cognitive flexibility was found to meaningfully predict how effort was managed during MTPE, while inhibition and working memory showed more limited and selective effects. Consequently, the results indicate that effort in MTPE is shaped by expertise and cognitive profiles rather than being automatically reduced by technology, underscoring the continued centrality of expert human translators in achieving high-quality PE outcomes.

Keywords
Machine translation post-editing (MTPE), cognitive effort, temporal effort, technical and linguistic effort, expertise, cognitive differences, executive function

ÖZET (TURKISH ABSTRACT)

SAYAR ÖZTÜRK, Minel. Makine Çevirisi Sonrası Düzenlemede Çaba: Çeviri Uzmanlığı ile Bireysel Bilişsel Farklılıkların Rolü [Effort in Machine Translation Post-Editing: The Role of Translation Expertise and Individual Cognitive Differences], Ph.D. Dissertation, Ankara, 2026.

Machine translation post-editing has become an integral component of contemporary translation workflows. Although technological advances are assumed to reduce human labour, research shows that effort in machine translation post-editing is complex and multidimensional. This doctoral dissertation examines the relationship between expertise, effort, accuracy, and individual cognitive differences in machine translation post-editing and addresses three research questions: (1) To what extent can expertise be used as a valid indicator of cognitive, temporal, and technical/linguistic effort in machine translation post-editing? (2) What is the relationship between expertise, objective and subjective indicators of effort, and the accuracy of the post-edited text? (3) Do individual cognitive differences predict the effort required in machine translation post-editing?

A mixed-methods experimental design was implemented with 21 participants consisting of experienced translators, inexperienced translators, and field experts. The participants post-edited three legal texts that had been machine-translated from English into Turkish. Data were collected through keylogging using Translog-II, yielding effort indicators such as total task duration, mean pause duration, total pause duration, pause percentage, pauses per word, text production and text elimination, and user events and production events per minute. In addition, post-editing accuracy, post-task questionnaires, and retrospective think-aloud protocols were collected. Furthermore, individual cognitive differences were examined through executive function tests administered in PEBL that measure cognitive flexibility, inhibition, and working memory.

The findings showed that experienced translators expended more temporal, cognitive, and technical/linguistic effort than the other groups in order to apply deliberate monitoring, revision, and quality control strategies. At the same time, expertise was found to be the strongest determinant of post-editing accuracy. While cognitive flexibility was found to meaningfully determine how effort was managed during machine translation post-editing, inhibition and working memory were found to have more limited and selective effects. In conclusion, effort in machine translation post-editing is shaped by expertise and cognitive profiles rather than being automatically reduced by technology, and expert translators retain their central role in obtaining high-quality post-edited outputs.

Keywords
Machine translation post-editing, cognitive effort, temporal effort, technical and linguistic effort, expertise, cognitive differences, executive function

TABLE OF CONTENTS

ACCEPTANCE AND APPROVAL
YAYIMLAMA VE FİKRİ MÜLKİYET HAKLARI BEYANI (DECLARATION OF PUBLICATION AND INTELLECTUAL PROPERTY RIGHTS)
ETİK BEYAN (ETHICAL DECLARATION)
DEDICATION
ACKNOWLEDGEMENTS
ABSTRACT
ÖZET
TABLE OF CONTENTS
ABBREVIATIONS
TABLES INDEX
FIGURES INDEX
PHOTO INDEX
CHAPTER 1: INTRODUCTION
1.1. PROBLEM SITUATION
1.2. AIM OF THE STUDY
1.3. RESEARCH QUESTIONS
1.4. LIMITATIONS
1.5. IMPORTANCE OF THE STUDY
CHAPTER 2: THEORETICAL BACKGROUND
2.1. MACHINE TRANSLATION AND POST-EDITING
2.1.1. A Very Brief History
2.1.2. Neural Machine Translation (NMT)
2.1.3. Machine Translation Post-editing (MTPE)
2.2. EFFORT IN POST-EDITING
2.2.1. Pause as Cognitive Effort
2.2.2. Keystroke Logging as Technical and Linguistic Effort
2.2.3. Duration as Temporal Effort
2.2.4. Individual Variables and Effort in PE
2.2.5. Language Typology and Effort in PE
2.2.6. Ergonomics and Effort in PE
2.3. EXPERTISE IN TRANSLATION
2.3.1. Translation Expertise
2.3.2. Field Expertise
CHAPTER 3: METHODOLOGY
3.1. ETHICAL CONSIDERATIONS
3.2. TEXT TYPE
3.3. SOURCE TEXTS
3.4. PARTICIPANTS
3.5. MACHINE TRANSLATION TOOL
3.6. EXPERIMENTAL PROCEDURE
3.7. PILOT STUDY
3.8. DATA COLLECTION
3.9. DATA ANALYSIS
CHAPTER 4: RESULTS AND DISCUSSION
4.1. TEMPORAL EFFORT ANALYSIS
4.2. COGNITIVE EFFORT ANALYSIS
4.3. TECHNICAL AND LINGUISTIC EFFORT ANALYSIS
4.4. POST-EDITING ACCURACY
4.5. PREDICTORS OF EFFORT MEASURES
4.6. COGNITIVE TESTS
4.7. RETROSPECTIVE THINK-ALOUD PROTOCOLS (rTAPs)
4.8. POST-TASK QUESTIONNAIRE
CONCLUSION
BIBLIOGRAPHY
APPENDIX 1. ORIGINALITY REPORT
APPENDIX 2. ETHICS COMMISSION FORM
APPENDIX 3. VOLUNTARY PARTICIPATION INFORMATION FORM FOR PARTICIPANTS
APPENDIX 4. SOURCE AND TARGET TEXTS USED IN THE EXPERIMENT
APPENDIX 5. EXPERIMENTAL PROCEDURE PAPER
APPENDIX 6. PRE-EXPERIMENT SURVEY
APPENDIX 7. POST-TASK QUESTIONNAIRE
APPENDIX 8. NASA TASK LOAD INDEX
APPENDIX 9. TRANSCRIPTS OF THE RETROSPECTIVE THINK-ALOUD PROTOCOL RECORDINGS
157 xii ABBREVIATIONS AI Artificial intelligence ANN Artificial neural network DNN Deep neural network EBMT Example-based machine translation MT Machine translation MTPE Machine translation post-editing NASA-TLX National Aeronautics and Space Administration Task Load Index NMT Neural machine translation PE Post-editing RBMT Rule-based machine translation RNN Recurrent neural network RTAP Retrospective Think-Aloud Protocol SMT Statistical machine translation ST Source text TAUS Translation Automation User Society TT Target text xiii TABLES INDEX Tablo 2.1 TAUS MTPE guidelines for light and full PE (adapted from (Massardo et al., 2016) ................................................................................................... Error! Bookmark not defined.8 Tablo 3.1 Descriptive statistics of the English STs and their Turkish TTs used in the experiment, including lexical, syntactic, and readability measures ................................................................ 50 Tablo 3.2 Distribution of participants across the pilot study and the main experiment .............. 51 Tablo 3.3 Demographics of the participants ............................................................................... 52 Tablo 3.4 Overview of the pilot study design, including participant groups, post-edited texts, and administered instruments ............................................................................................................ 57 Tablo 4.1 Participant statements regarding the MT outputs encountered during the MTPE tasks .................................................................................................................................................... 96 Tablo 4.2 Participant statements regarding the pauses experienced during the MTPE tasks ..... 
99 Tablo 4.3 Participant statements regarding the positive aspects encountered during the MTPE tasks .......................................................................................................................................... 101 Tablo 4.4 Participant statements regarding the negative aspects encountered during the MTPE tasks .......................................................................................................................................... 103 Tablo 4.5 Additional participant statements regarding the experiment .................................... 106 xiv FIGURES INDEX Figure 2.1 Schematic representation of the NMT operation process, illustrating the encoding, meaning-capturing, and decoding stages with the roles of attention and transformer mechanisms .................................................................................................................................................... 14 Figure 2.2 Neural network architecture illustrating the input layer, multiple hidden layers, and output layer (adapted from (Chatterjee et al., 2019, p. 4). .......................................................... 15 Figure 3.1 Road map of the experimental methodological design .............................................. 47 Figure 3.2 Schematic representation of the experimental procedure for MTPE ......................... 56 Figure 4.1 Violin plots show the distribution of total task duration (in seconds) across participant groups during the MTPE tasks. The box plots within the violins show the whole range (vertical line), the median (horizontal line), the mean (pink square), and the interquartile range (box). Light grey dots represent data points for individual participants (n = 21). Participant groups are represented by colour: experienced translators (red), inexperienced translators (blue), and field experts (green). The bracket indicates a statistically significant difference between groups (p = .043).. 
.......................................................................................................................................... 63 Figure 4.2 Violin plots show the distribution of total pause duration (in seconds) across participant groups during the MTPE tasks. The box plots within the violins show the whole range (vertical line), the median (horizontal line), the mean (pink square), and the interquartile range (box). Light grey dots represent data points for individual participants (n = 21). Participant groups are represented by colour: experienced translators (red), inexperienced translators (blue), and field experts (green). The bracket indicates a statistically significant difference between groups (p = .027). ........................................................................................................................................... 66 Figure 4.3 Violin plots show the distribution of pause duration per word (in seconds) across participant groups during the MTPE tasks. The box plots within the violins show the whole range (vertical line), the median (horizontal line), the mean (pink square), and the interquartile range (box). Light grey dots represent data points for individual participants (n = 21). Participant groups are represented by colour: experienced translators (red), inexperienced translators (blue), and field experts (green). The bracket indicates a statistically significant difference between groups (p = .022). ........................................................................................................................................... 68 Figure 4.4 Violin plots show the distribution of mean pause duration (in seconds) across participant groups during the MTPE tasks, presented with and without an extreme outlier (All xv three tasks of Participant No: 012 in Field Expert Group, and two tasks of Participant No: 003 in Inexperienced Translator Group). 
The left panel displays the distribution including the outlier, while the right panel shows the distribution after exclusion of the outlier. The box plots within the violins show the whole range (vertical line), the median (horizontal line), the mean (pink square), and the interquartile range (box). Light grey dots represent data points for individual participants (n = 21). Participant groups are represented by colour: experienced translators (red), inexperienced translators (blue), and field experts (green). The bracket in the right panel indicates a statistically significant difference between groups (p = .041).. ................................................ 70 Figure 4.5 Violin plots show the distribution of pause percentage (%) across participant groups during the MTPE tasks. The box plots within the violins show the whole range (vertical line), the median (horizontal line), the mean (pink square), and the interquartile range (box). Light grey dots represent data points for individual participants (n = 21). Participant groups are represented by colour: experienced translators (red), inexperienced translators (blue), and field experts (green). ........................................................................................................................................ 72 Figure 4.6 Violin plots show the distribution of text production (number of characters) across participant groups during the MTPE tasks. The box plots within the violins show the whole range (vertical line), the median (horizontal line), the mean (pink square), and the interquartile range (box). Light grey dots represent data points for individual participants (n = 21). Participant groups are represented by colour: experienced translators (red), inexperienced translators (blue), and field experts (green). The brackets indicate statistically significant differences between groups (p = .006 and p = .002). 
...................................................................................................................... 75 Figure 4.7 Violin plots show the distribution of text elimination (number of characters) across participant groups during the MTPE tasks. The box plots within the violins show the whole range (vertical line), the median (horizontal line), the mean (pink square), and the interquartile range (box). Light grey dots represent data points for individual participants (n = 21). Participant groups are represented by colour: experienced translators (red), inexperienced translators (blue), and field experts (green). The bracket indicates a statistically significant difference between groups (p = .027).. .......................................................................................................................................... 76 Figure 4.8 Violin plots show the distribution of production events per minute (count) across participant groups during the MTPE tasks. The box plots within the violins show the whole range (vertical line), the median (horizontal line), the mean (pink square), and the interquartile range (box). Light grey dots represent data points for individual participants (n = 21). Participant groups are represented by colour: experienced translators (red), inexperienced translators (blue), and field xvi experts (green). The bracket indicates a statistically significant difference between groups (p = .005). ........................................................................................................................................... 77 Figure 4.9 Violin plots show the distribution of accuracy scores across participant groups during the MTPE tasks. The box plots within the violins show the whole range (vertical line), the median (horizontal line), the mean (pink square), and the interquartile range (box). Light grey dots represent data points for individual participants (n = 21). 
Participant groups are represented by colour: experienced translators (red), inexperienced translators (blue), and field experts (green). The brackets indicate statistically significant differences between groups (p < .001).
Figure 4.10 Scatter plot illustrating the relationship between physical demand scores and total task duration (in seconds) during the MTPE tasks. Individual data points represent participants’ self-reported physical demand scores measured by the NASA-TLX and their corresponding total task duration. The dashed line represents the fitted linear regression model, indicating the predicted trend between physical demand and total task duration.
Figure 4.11 Scatter plot illustrating the relationship between physical demand scores and pause duration per word (in seconds) during the MTPE tasks. Individual data points represent participants’ self-reported physical demand scores measured by the NASA-TLX and their corresponding pause duration per word values. The dashed line represents the fitted linear regression model, indicating the predicted trend between physical demand and pause duration per word.
Figure 4.12 Scatter plot illustrating the relationship between physical demand scores and total pause duration (in seconds) during the MTPE tasks. Individual data points represent participants’ self-reported physical demand scores measured by the NASA-TLX and their corresponding total pause duration values. The dashed line represents the fitted linear regression model, indicating the predicted trend between physical demand and total pause duration.
Figure 4.13 Scatter plot showing the relationship between cognitive flexibility, measured by BCST accuracy (z-scored), and mean pause duration during the MTPE tasks.
Each dot represents an individual participant (n = 21). Mean pause duration values are log-transformed. The red line represents the fitted linear regression, with the shaded grey area indicating the 95% confidence interval. The reported regression coefficient (B = 0.40) and significance level (p < .001) indicate a significant positive association between BCST accuracy and mean pause duration.
Figure 4.14 Scatter plot showing the relationship between inhibitory control, measured by Stroop accuracy (z-scored), and mean pause duration during the MTPE tasks. Each dot represents an individual participant (n = 21). Mean pause duration values are log-transformed. The red line represents the fitted linear regression, with the shaded grey area indicating the 95% confidence interval. The reported regression coefficient (B = -0.25) and significance level (p < .05) indicate a significant negative association between Stroop accuracy and mean pause duration.
Figure 4.15 Scatter plot showing the relationship between cognitive flexibility, measured by BCST accuracy (z-scored), and total task duration during the MTPE tasks. Each dot represents an individual participant (n = 21). Total task duration values are log-transformed. The blue line represents the fitted linear regression, with the shaded grey area indicating the 95% confidence interval. The reported regression coefficient (B = -0.24) and significance level (p < .05) indicate a significant negative association between BCST accuracy and total task duration.
Figure 4.16 Scatter plot showing the relationship between inhibitory control, measured by Stroop reaction time (z-scored), and total task duration during the MTPE tasks. Each dot represents an individual participant (n = 21). Total task duration values are log-transformed. The blue line represents the fitted linear regression, with the shaded grey area indicating the 95% confidence interval.
The reported regression coefficient (B = -0.29) and significance level (p < .05) indicate a significant negative association between Stroop reaction time and total task duration.
Figure 4.17 Scatter plot showing the relationship between cognitive flexibility, measured by BCST accuracy (z-scored), and user events during the MTPE tasks. Each dot represents an individual participant (n = 21). User event values are log-transformed. The green line represents the fitted linear regression, with the shaded grey area indicating the 95% confidence interval. The reported regression coefficient (B = -0.61) and significance level (p < .001) indicate a significant negative association between BCST accuracy and the number of user events.
Figure 4.18 Scatter plot showing the relationship between inhibitory control, measured by Stroop accuracy (z-scored), and user events during the MTPE tasks. Each dot represents an individual participant (n = 21). User event values are log-transformed. The green line represents the fitted linear regression, with the shaded grey area indicating the 95% confidence interval. The reported regression coefficient (B = 0.38) and significance level (p < .01) indicate a significant positive association between Stroop accuracy and the number of user events.
Figure 4.19 Scatter plot showing the relationship between working memory performance, measured by n-back reaction time (z-scored), and user events during the MTPE tasks. Each dot represents an individual participant (n = 21). User event values are log-transformed. The green line represents the fitted linear regression, with the shaded grey area indicating the 95% confidence interval. The reported regression coefficient (B = -0.26) and significance level (p < .05) indicate a significant negative association between n-back reaction time and the number of user events.
Figure 4.20 Word cloud representing the most frequently referred terms in participants’ comments on MT outputs encountered during the experiment.
Figure 4.21 Word cloud representing the most frequently referred terms in participants’ comments on the pauses experienced during the experiment.
Figure 4.22 Word cloud representing the most frequently referred terms in participants’ comments on the positive aspects encountered during the experiment.
Figure 4.23 Word cloud representing the most frequently referred terms in participants’ comments on the positive aspects encountered during the experiment.
Figure 4.24 Distribution of participant responses to post-task questionnaire statement 1: “I believe that the MT output was complete and error-free before my post-editing.” The pie chart illustrates the percentage distribution of responses across the five-point Likert scale. Colours represent response categories as follows: strongly disagree (blue), disagree (red), neutral (orange), agree (green), and strongly agree (purple). Percentages indicate the proportion of participants selecting each response option.
Figure 4.25 Distribution of participant responses to post-task questionnaire statement 2: “I believe that the MT output would be suitable for delivery to a client if revised by a language professional (e.g., a translator).” The pie chart illustrates the percentage distribution of responses across the five-point Likert scale.
Colours represent response categories as follows: strongly disagree (blue), disagree (red), neutral (orange), agree (green), and strongly agree (purple). Percentages indicate the proportion of participants selecting each response option.
Figure 4.26 Distribution of participant responses to post-task questionnaire statement 3: “I believe that the MT output would be suitable for delivery to a client if revised by a field expert (e.g., a lawyer).” The pie chart illustrates the percentage distribution of responses across the five-point Likert scale. Colours represent response categories as follows: strongly disagree (blue), disagree (red), neutral (orange), agree (green), and strongly agree (purple). Percentages indicate the proportion of participants selecting each response option.
Figure 4.27 Distribution of participant responses to post-task questionnaire statement 4: “I experienced difficulty while post-editing the MT output.” The pie chart illustrates the percentage distribution of responses across the five-point Likert scale. Colours represent response categories as follows: strongly disagree (blue), disagree (red), neutral (orange), agree (green), and strongly agree (purple). Percentages indicate the proportion of participants selecting each response option.
Figure 5.1 Dimension-Based Synthesis of the Multimethod Analysis Adopted in the Present Dissertation.

PHOTO INDEX

Photo 3.1 Screenshot of the Dual n-back Task Interface in the PEBL
Photo 3.2 Screenshot of the Color Stroop Task Interface in the PEBL
Photo 3.3 Screenshot of the Berg Card Sorting Task Interface in the PEBL
Photo 3.4 A Participant performing the MTPE task using Translog-II in the Translation and Interpreting Postgraduate Research Laboratory at Hacettepe University
Photo 3.5 Screenshot of the Translog-II Interface used during the MTPE Task

CHAPTER 1
INTRODUCTION

It was only a couple of decades ago that perhaps one of the greatest shifts in human history emerged with the advent of the internet and mechanisation. This shift not only transformed tangible practices and routines, but also led to a redefinition of intangible phenomena, including the perception of time. This may sound somewhat abstract; however, when one considers everyday life at the end of the 20th century, it becomes easier to imagine the amount of time required to fulfil even relatively simple tasks, such as communicating with someone at a distance. Until very recently, accessing information, for instance, required consulting physical archives, printed encyclopaedias, or specialised professionals. In contrast, contemporary technologies have reduced these high temporal costs to mere seconds. A message can now be sent instantly across continents, information can be retrieved with a single query, and tasks that once demanded sustained effort are increasingly automated. As a result, the concept of time has been re-evaluated and is now oriented towards optimisation, management, and efficiency.

This transformation in the perception of time has inevitably extended to professional domains, including translation. Translation is now embedded within technologically enriched environments that prioritise speed, efficiency, and consistency.
Machine translation (MT) systems, neural models, and post-editing (PE) practices have become integral components of contemporary translation workflows, reshaping how translators interact with texts, tools, and temporal constraints. While these technologies enable increased productivity and reduced time, they also introduce new forms of cognitive, technical, and temporal demands, raising critical questions about effort, quality, and human–machine interaction.

From this perspective, this dissertation aims to investigate the temporal, cognitive, and technical and linguistic effort exerted by participants while performing machine translation post-editing (MTPE) tasks. By collecting both qualitative and quantitative data, the study focuses on effort during MTPE and its relationship with individual differences and task demands.

1.1. PROBLEM SITUATION

This process-oriented study, which focuses on the cognitive effort exerted during MTPE, constitutes interdisciplinary research at the intersection of translation and cognitive psychology. It was in the 1990s that scholars in the field of translation studies, such as Lörscher (1991b) and Hansen (1999), turned their attention to cognitive psychology to conduct research on the translation process. Since then, there has been increasing interest in the matter, especially due to the emergence of tools used for quantitative data collection, including eye-tracking and keylogging software programs (see Hvelplund, 2017; Jakobsen, 2011; O’Brien, 2009).

Meanwhile, as MT tools have become widely used and now provide high-quality outcomes, MTPE has emerged as an increasingly critical task for translation service providers who aim to be up-to-date, efficient, and proactive in the profession. Due to this shift in practice, research in the field has increasingly focused on MTPE (e.g., Herbig et al., 2021; O’Brien, 2022; Omazić & Šoštarić, 2023).
However, although MT technology is widely used in the translation process, few empirical studies in translation studies have compared different participant groups, including both translation professionals and field experts, performing the same MTPE task. This approach, in which the evaluation is based on effort, is important for understanding the effect of expertise on MTPE. Moreover, the English and Turkish language pair investigated in this research remains highly underexplored in the literature, which further adds value to the study. Finally, the inclusion of individual differences in executive functions contributes to the originality of the research.

1.2. AIM OF THE STUDY

The aim of this translation process research is to investigate the effort involved in the MTPE process for legal texts. The study places particular emphasis on how varying levels of translation expertise and domain-specific knowledge influence different dimensions of effort during MTPE. By comparing the PE performance of experienced and inexperienced translators in the field of legal translation, as well as legal domain experts including attorneys and academics, the study seeks to identify the extent to which professional background shapes temporal, cognitive, and technical and linguistic effort. In addition, grounded in both qualitative and quantitative data, the study examines the role of individual differences in cognitive control, with a focus on executive functions such as working memory, inhibition, and cognitive flexibility.
To briefly clarify the gaps addressed by this empirical and interdisciplinary research, the study aims to (1) demonstrate the influence of expertise on the MTPE process by including legal translation practitioners, general translation practitioners, and legal domain experts in the experiment; (2) present findings on the English and Turkish language pair, which differs typologically in terms of word order, namely SVO in English and SOV in Turkish; (3) contribute to the subfield of legal translation, particularly with regard to MTPE for legal texts, as the experimental materials consist exclusively of legal texts; and (4) examine the relationship between multiple dimensions of effort exerted during MTPE and individual differences in executive functions including working memory, inhibition, and cognitive flexibility.

1.3. RESEARCH QUESTIONS

The present study seeks to answer the following research questions:
(1) To what extent can expertise serve as a valid indicator of cognitive, temporal, and technical/linguistic effort in MTPE?
(2) What is the relationship between expertise, objective and self-reported effort, and PE accuracy?
(3) Do individual difference scores meaningfully predict the level of effort required during MTPE?

The first research question examines the phenomenon of expertise in translation studies through an analysis informed by studies on effort. The findings for this research question will clarify whether and how expertise influences cognitive, temporal, and technical/linguistic effort in MTPE by comparing the data collected from experienced translators, inexperienced translators, and field experts. Given the inconclusive and sometimes contradictory findings in the existing literature discussed below, the present study does not advance directional hypotheses.
Instead, it acknowledges that one of the following possible outcomes may emerge for the first research question based on indicators for cognitive, temporal, and technical/linguistic effort:
(1) Effort decreases as expertise increases, suggesting that higher expertise may lead to more efficient processing and quicker decision-making, as experts rely on their experience and knowledge to reduce the mental workload during MTPE tasks.
(2) Effort increases as expertise increases, suggesting that higher expertise may lead to a more deliberate and strategic approach to MTPE, possibly involving deeper processing and more careful attention.
(3) Effort shows no significant change with increasing expertise, suggesting that factors other than expertise, such as task complexity, individual cognitive style, or familiarity with the content, may play a more decisive role in influencing mental workload during MTPE tasks.

The second research question challenges the common assumption that “the more experience professionals gain, the less effort they exert” by examining the relationship between expertise, objective and self-reported effort, and PE accuracy. By analysing both objective and subjective measures of effort in addition to PE accuracy, this research question aims to determine whether lower effort is associated with the production of a high-quality final product. To this end, the quality of the post-edited texts produced by the participants is evaluated in conjunction with their self-reported perceptions of effort. In line with this exploratory approach, the relationship between effort, expertise, and PE accuracy may manifest in different ways, as outlined in the following possible outcomes:
(1) Low effort correlates with high-quality output, suggesting that individuals who expend less effort are able to perform MTPE tasks efficiently without compromising the accuracy or quality of the final product.
(2) Low effort correlates with low-quality output, suggesting that reduced effort may reflect superficial processing, resulting in overlooked errors or insufficient PE.
(3) Low effort does not necessarily correlate with PE quality, suggesting that the relationship between effort and accuracy is not linear.

The third research question investigates the effect of individual differences among the participants of the experiment in terms of cognitive flexibility, inhibition, and working memory capacity on their performance in MTPE. The outcomes of this research question aim to broaden the understanding of the effort phenomenon in MTPE by integrating perspectives from cognitive psychology into translation process research. Similarly, the role of individual cognitive differences in shaping effort during MTPE may yield varying patterns, which are reflected in the possible outcomes presented below:
(1) Higher performance in cognitive tests correlates with lower effort in MTPE, suggesting that individuals with stronger cognitive abilities require less cognitive, temporal, or technical/linguistic effort during PE tasks.
(2) Higher performance in cognitive tests correlates with higher effort in MTPE, suggesting that stronger cognitive abilities may be associated with more deliberate, controlled, or strategic processing, resulting in increased effort investment during MTPE.
(3) Higher performance in cognitive tests does not necessarily correlate with MTPE effort, suggesting that effort during PE is not linearly related to individual cognitive difference scores.

Considering the available literature in the field, as reviewed in the theoretical background chapter below, prior research suggests that effort may decrease as expertise increases, that lower effort may be associated with higher-quality output, and that stronger performance in cognitive tests may relate to reduced effort during MTPE.
At the same time, these patterns have primarily been observed in studies that do not include field experts who lack formal training in translation or PE. The inclusion of field experts in the present study may therefore shift dominant trends and give rise to alternative or unexpected patterns of effort and performance. The interpretation of the analysed data in relation to these possible outcomes is presented in the Results and Discussion chapter of this dissertation.

1.4. LIMITATIONS

As is common in academic research, this study is subject to several limitations that define its scope and boundaries. These limitations include the following:
(1) The text type selected for the experimental design was limited to legal texts. This choice was informed by data collected from translation bureaus.
(2) The MT tool employed in the experiment was DeepL. This selection was based on the results of a pilot study survey regarding tool preference.
(3) The study involved seven participants per group, reflecting the specialised nature of the participant profile required for MTPE research. To ensure sufficient analytical depth, each participant completed three MTPE tasks, yielding a total of sixty-three observations for analysis. This repeated-measures design enabled the examination of within-participant variation while maintaining a manageable and methodologically coherent dataset.
(4) The study focuses on MTPE with visible source text (ST) and target text (TT) from English into Turkish, which may limit the generalisability of the findings to other language pairs.
(5) Participants were not allowed to view any content on the screen other than the experimental interface. This restriction was implemented to reduce heterogeneous sources of pausing (e.g., time spent on web searches). However, it may reduce ecological validity for legal MTPE, where terminology consultation is a routine practice.
Accordingly, the findings should be interpreted as reflecting MTPE performance under restricted-resource conditions, and future research should include a controlled dictionary-allowed condition.
(6) Because each participant completed three tasks, the observations were not statistically independent. Accordingly, analyses must account for data nesting (tasks within participants and texts), and the generalisability of the findings remains constrained by the modest number of participants.

1.5. IMPORTANCE OF THE STUDY

During the research design phase of this study, the primary motivation was to address a critical question frequently posed by legal translation service users in Türkiye: Do we need a translator familiar with legal procedures, or a legal expert who knows English? This dissertation aims to provide a clear answer to this industry dilemma, which has significant implications for professional translation practice.

Beyond its practical contribution, the study also holds theoretical value for both translation studies and cognitive psychology, as it deepens the understanding of how different types of expertise and individual cognitive differences influence effort in the MTPE process. By focusing on effort and participant diversity, the study helps to fill a notable empirical gap in MTPE process research. The inclusion of the English and Turkish language pair, which has been rarely explored in this context, adds linguistic and cultural significance to the study. Finally, the research offers targeted insights into legal translation, a highly complex and commercially active subfield of applied translation studies.

CHAPTER 2
THEORETICAL BACKGROUND

2.1. MACHINE TRANSLATION AND POST-EDITING

Machine translation (MT) refers to the automated generation of target-language text from source-language input using computational models (Kenny, 2022, p. 32).
In contemporary practice, MT is commonly integrated into professional workflows through machine translation post-editing (MTPE), where human language professionals evaluate and revise MT output to meet communicative and domain-specific requirements (O’Brien, 2022, p. 105). Accordingly, the present dissertation treats MT not as a replacement for human translation, but as a technology whose outputs are systematically shaped through human decision-making during PE.

Today, MT is widely used for a variety of purposes. Thanks to smartphones and other digital devices, MT accompanies users in everyday contexts, such as reading descriptions of artworks in museums abroad, communicating with speakers of other languages, understanding the content of legal documents, and producing translations for professional purposes. There is little doubt that the emergence of MT tools has transformed the conventional translation process, shifting it from purely human translation towards machine-assisted human translation. This shift can be interpreted as progress due to its positive impact on the translation process, including reduced turnaround time, the ability to handle larger volumes of work, and a potential decrease in effort compared to translating from scratch. Within this framework, the present chapter provides the theoretical background of the dissertation by reviewing key concepts related to MT, MTPE, and effort.

2.1.1. A Very Brief History

It was less than a century ago that machines were associated with human-specific features, such as thinking and intelligence.
Alan Turing (1950), who was one of the key British scientists during World War II, posed the striking question in his article Computing Machinery and Intelligence: “Can machines think?” This question emerged as an outcome of the developments in machine technology at the time, and undoubtedly, one of the most significant advancements was MT technology, driven by political and security reasons, which will be detailed in the following paragraphs.

The “precursors and pioneers” of MT technology, as referred to by Hutchins (1995), were Georges Artsrouni and Peter Petrovich Troyanskii, who invented machines for automatic translation in the 1930s (pp. 432-433). At the time, these inventions were groundbreaking; however, neither received support due to widespread scepticism about the technology. Moreover, the east side of the Iron Curtain, particularly within the Soviet Union and Eastern Europe, did not turn any initiatives in this field into reality until the 1950s (Hutchins & Lovtskii, 2000, pp. 198–199).

Meanwhile, the other side of the Iron Curtain took note of the developments emerging in the Soviet Union and chose to focus on translation technology to gain a linguistic advantage in the Cold War race. At this point, Warren Weaver of the Rockefeller Foundation published a pivotal memorandum calling for research and initiatives in translation technology. The memorandum begins with the following statements:

There is no need to do more than mention the obvious fact that a multiplicity of language impedes cultural interchange between the peoples of the earth, and is a serious deterrent to international understanding. The present memorandum, assuming the validity and importance of this fact, contains some comments and suggestions bearing on the possibility of contributing at least something to the solution of the world-wide translation problem through the use of electronic computers of great capacity, flexibility, and speed. (Weaver, 1949, p.
1)

From the very first paragraph of the memorandum, it is clear that research on MT technology was regarded as urgent and that its feasibility was firmly believed in. Given the conditions of the time, whichever side of the Iron Curtain possessed this technology would have gained a significant advantage, particularly in national defence, and this was exactly the main motivation for the United States.

As a result of these developments, research on MT technology gained importance and received strong support, leading to the initiation of the Georgetown-IBM Experiment under the guidance of Léon Dostert, a prominent American linguist of French origin. In 1954, this effort resulted in the successful development of a machine capable of translating 49 Russian sentences into English (Şahin, 2019, Chapter 9.2). This achievement not only reinforced faith in and support for MT but also served as a significant victory for the United States in the ongoing race between the sides of the Iron Curtain. In other words, it was both a scientific and political milestone.

These developments led to a rise in both empirical and theoretical studies on MT, although the lack of advanced computer technology made the improvement of language processing tools more effortful. While most European and Soviet research focused on theoretical approaches that contributed to linguistics and led to the emergence of computational linguistics as a subfield, the United States adopted a “trial-and-error” approach to MT, primarily in the English-Russian language pair (Hutchins, 1995, p. 433). In other words, this period inevitably accelerated MT research as its feasibility became increasingly evident; however, the eastern side of the Iron Curtain took cautious steps by prioritising theoretical foundations, while the western side was eager to jump to conclusions to get the edge in the race.
In 1964, the Automatic Language Processing Advisory Committee (ALPAC) was established in the US to evaluate advancements in MT. Its report, published two years later, sharply interrupted the intense interest in the field, since it suggested that fully automatic and accurate MT did not seem feasible (Hutchins, 2007). This report halted efforts primarily in the US and partially in Europe and the USSR; however, this pause did not last long, as research resumed in the mid-1970s, mainly focusing on the English and Russian language pair (Şahin, 2019, Chapter 9.2). Thus, the ALPAC report, which unsettled the belief in MT worldwide, led to a ten-year period of silence, but advancements in technology, particularly in computing, eventually ended this era and reignited interest and research in the field.

The acceleration in the mid-1970s triggered the emergence of commercial and operational MT systems in various language pairs across different countries throughout the 1980s and 1990s (Hutchins, 1995, pp. 436–438). Moreover, with the advent of the internet, these systems moved to the online environment in the mid-1990s (Şahin, 2019, Chapter 9.2), which paved the way for the transformation of the translation process. Today, online MT tools are easily accessible and free to use, and countless individuals benefit from this technology for various purposes.

According to Hutchins (2001), there are three main purposes for MT: (1) communication, i.e., on-the-spot translation to facilitate interaction; (2) assimilation, i.e., translation for gaining information from written documents; and (3) dissemination, i.e., translation for publishing work in another language (pp. 16-17). To exemplify, respectively: a tourist uses MT to order a cup of coffee; a journalist uses MT to scan international newspapers to stay informed about the agenda; and a translator applies an MT tool to translate a newsletter published by a UN agency.
As can be observed, the significance of the translation output varies depending on the purpose and the level of human involvement. For communication, there is no need to correct the MT-generated text since it is used for a simple transfer of information between individuals. Similarly, assimilation does not require strict corrections, as its primary goal is to quickly understand the content of the ST. However, dissemination is generally handled by translators, where MT is not the central means of translation but rather a supporting tool. In this case, the translator uses MT to reduce time and effort, while the final output is still crafted by the human translator. In this dissertation, the purpose of MT use is dissemination, as participants are required to post-edit the MT-generated output to produce a final version suitable for delivery to the customer. However, during the PE process, they are restricted from using any source other than the MT-generated output. It is acknowledged that such an arrangement does not reflect a typical translation workflow, as professional translators usually have access to various computer-assisted translation (CAT) tools and online resources. Nevertheless, the constraints of data collection necessitate limiting participants from utilizing additional sources during the experiment.

2.1.2. Neural Machine Translation (NMT)

MT is subject to continuous development and improvement like other software and hardware technologies, and since the 1950s, a variety of MT systems have emerged.
MT systems can be categorised into four main types: (1) the rule-based machine translation (RBMT) system, which is a knowledge-driven approach based on rules and a lexicon; (2) the example-based machine translation (EBMT) system, which is an example-driven approach based on databases of examples and bilingual corpora; (3) the statistical machine translation (SMT) system, which is a data-driven approach based on probabilistic models; and (4) the neural machine translation (NMT) system, which is a neural network-driven approach based on deep learning (Sharma et al., 2023, pp. 2–4). As the most recent and technologically advanced MT system, NMT has been the dominant approach for improving MT quality since the mid-2010s. As NMT is the system employed in the experiment conducted within the scope of this dissertation, it is the only MT system examined in detail within the theoretical framework. Before NMT, the engineering background of MT systems focused on alignments and basic units of translation rather than sentence-level translation models (Kalchbrenner & Blunsom, 2013). In other words, these systems were designed to perform the translation process at the word or phrase level, without considering the sentence as a whole. Consequently, the output of these systems often exhibited deficiencies in terms of overall sentence meaning, a limitation that the development of NMT aimed to overcome. Unlike rule-based and statistical models, NMT “aims at building a single neural network that can be jointly tuned to maximize the translation performance” (Bahdanau et al., 2015, p. 1). As a result, the separate components employed in earlier systems, such as grammar analysis, word translation, and reordering, are integrated within a single neural architecture that performs these processes jointly.
Moreover, NMT is a corpus-based MT system that relies on large datasets of source-language segments and their corresponding translations. Although it resembles SMT in its use of vast translation memories, NMT differs from it by employing neural networks instead of statistical methods, which marks a significant shift in computational approach (Forcada, 2017, p. 292). More specifically, SMT relies on probabilities, phrase tables, and mathematical formulas, whereas NMT uses neural networks, continuous representations, and learning-based models inspired by the human brain. As a result, NMT adopts a fundamentally different translation model that is capable of generating more coherent translations. One of the most influential characteristics of NMT is its reliance on machine learning techniques, which enable the system to learn translation patterns directly from data rather than through manually encoded rules or exhaustive example lists; accordingly, the term neural refers to the use of artificial neural networks (ANNs), which are computational models loosely inspired by the structure and functioning of the human brain (Şahin, 2019, Chapter 9.3). From this perspective, NMT can be described as an adaptive MT system that improves its performance as it is exposed to increasing amounts of training data. More specifically, NMT systems can generalise from learned representations and infer translation patterns beyond explicitly provided parallel examples. As a result, NMT systems are able to refine their translation performance as they are trained on additional data over time.
NMT operates as follows: (1) the encoding phase, during which source-text elements are assigned neural representations, or embeddings, which are then (2) combined into a sentence-level representation adjusted based on context to capture meaning; and (3) the decoding phase, during which the sentence-level representation is gradually processed to predict each target element step by step. Both encoding and decoding are performed by ANNs that form a single composite system, and, similar to human translators, these networks do not process the entire source sentence at once but instead focus on the most relevant source words and the target words already generated by applying attention (Pérez-Ortiz et al., 2022, p. 143). In essence, NMT operates by first converting source-language words into numerical representations and combining them to form a sentence-level representation of meaning. The system then generates the TT, predicting one target word at a time based on this representation and the words already produced. Throughout this process, attention mechanisms allow the system to focus on the most relevant source-language elements at each step of translation. However, early attention-based NMT architectures relied on recurrent neural networks (RNNs), which are designed for sequence-based tasks (Lipton et al., 2015). In such architectures, source sentences are processed word by word, and information is passed sequentially through the network rather than being handled simultaneously. This reliance introduced long-term dependency problems, as RNNs tend to lose information over long sequences, and this limitation makes the translation of longer sentences less accurate due to memory constraints (Safwan Mahmood Al-Selwi et al., 2023). In addition, the sequential nature of RNN processing leads to slow training and inference, which limits the efficiency of early NMT systems in practical applications (Schaefer et al., 2008).
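The attention step described above can be illustrated with a minimal sketch. It assumes scaled dot-product scoring, which is one common formulation, and uses invented three-dimensional toy vectors; the names `softmax`, `attend`, `source`, and `decoder_state` are hypothetical and do not correspond to the system used in this dissertation. Real NMT systems learn high-dimensional embeddings from data.

```python
import math

def softmax(scores):
    """Turn raw relevance scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract peak for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, source_vectors):
    """One decoding step: weight each source vector by its relevance to the query."""
    dim = len(query)
    # Relevance of each source word = scaled dot product with the decoder state.
    scores = [
        sum(q * k for q, k in zip(query, vec)) / math.sqrt(dim)
        for vec in source_vectors
    ]
    weights = softmax(scores)
    # Context vector = weighted average of the source vectors.
    context = [
        sum(w * vec[i] for w, vec in zip(weights, source_vectors))
        for i in range(dim)
    ]
    return weights, context

# Toy embeddings (invented values, for illustration only).
source = {
    "the": [0.1, 0.0, 0.2],
    "cat": [0.9, 0.1, 0.0],
    "sleeps": [0.0, 0.8, 0.3],
}
decoder_state = [1.0, 0.0, 0.0]  # hypothetical state while predicting one target word

weights, context = attend(decoder_state, list(source.values()))
for word, weight in zip(source, weights):
    print(f"{word}: {weight:.2f}")
```

In this toy setting the largest weight falls on the source word whose vector is most similar to the decoder state, which mirrors the idea that the system focuses on the most relevant source elements at each step rather than on the whole sentence equally.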
To address the limitations of recurrent architectures, Vaswani et al. (2017) proposed the Transformer architecture, which eliminates recurrence and relies entirely on attention mechanisms, enabling both higher-quality translations and significantly faster training. To clarify, the visual representation of the NMT operation process is presented below (see Figure 2.1), illustrating the encoding and decoding phases along with the role of the attention mechanism and the Transformer architecture.

Figure 2.1. Schematic representation of the NMT operation process, illustrating the encoding, meaning-capturing, and decoding stages with the roles of attention and transformer mechanisms

In addition to understanding the overall operation of NMT, it is essential to delve into the internal structure of neural networks, particularly the hidden layers, which play a crucial role in the system’s ability to capture complex patterns in language. Figure 2.2 illustrates the layered architecture of NMT.

Figure 2.2. Neural network architecture illustrating the input layer, multiple hidden layers, and output layer (adapted from Chatterjee et al., 2019, p. 4)

NMT is fundamentally built upon ANNs, which constitute the core computational mechanism underlying contemporary MT systems. ANNs are information-processing systems inspired by biological neural networks and were originally developed as extensions of mathematical models representing aspects of human cognition and neural functioning (Montesinos López et al., 2022, p. 387).
An ANN consists of an input layer, one or more hidden layers, and an output layer, through which information flows forward to generate output probabilities; furthermore, learning in these networks occurs through optimisation techniques such as backpropagation and gradient descent, which allow the system to adjust its internal parameters based on error feedback and to model complex, non-linear relationships without requiring prior assumptions about the distribution of input data (Chatterjee et al., 2019, pp. 3–4). ANNs that contain multiple hidden layers are referred to as deep neural networks (DNNs), and the process of training such multilayered architectures to learn relationships between input and output variables is known as deep learning, which is a subfield of statistical machine learning (Montesinos López et al., 2022, p. 383). From this perspective, deep learning represents a specialised approach within machine learning, which falls under the broader domain of artificial intelligence (AI). NMT directly benefits from deep learning, as its underlying neural architectures rely on DNNs to capture complex linguistic patterns and contextual dependencies across sentences. As seen, NMT employs advanced engineering technology, including machine learning, to implement a complex yet solution-oriented approach to MT. From ST analysis to TT generation, the system focuses on meaning in context rather than solely on word-level definitions, with the aim of producing the most appropriate equivalent in the target language. Although NMT does not generate translations that can be used without human intervention for dissemination purposes, it is still widely utilized by translation professionals to reduce time and effort. Consequently, the use of NMT alters the traditional translation workflow by incorporating NMT tools alongside the necessary PE process, which ensures continued human involvement in MT.

2.1.3.
Machine Translation Post-editing (MTPE)

MTPE is a dynamic process in which the translator actively engages with machine-generated output and continuously revises and refines the translation during its production (Rico Pérez, 2024, p. 32). The necessity of MTPE may vary depending on the intended purpose of the MT output, and in some cases, it may be considered essential, while in others it may be entirely omitted. As discussed earlier, MT is commonly used for the purposes of communication, assimilation, and dissemination. In cases of communication and assimilation, the MT user may not need to correct the output due to the informal or personal nature of these tasks. Even when the text is formal, such as a legal document or a rental agreement, the MT user may choose not to modify the MT output if it provides sufficient information to understand the content of the text. In contrast, when MT is used for dissemination purposes, the MT user typically assumes the role of a translation professional who is responsible for delivering an accurate and high-quality TT to the end user. In such cases, MT output requires PE in order to ensure that the final version meets professional standards, particularly with regard to grammatical, lexical, and syntactic accuracy. Although there is still ongoing debate regarding the use of MT, it can be clearly stated that the translation market has already adopted MT tools as an integral part of computer-assisted translation (CAT) environments. Their use is now widely accepted by key actors in the translation workflow, including translators, project coordinators, employers, and clients. As a result, MTPE has become a critical and increasingly dominant phase of the contemporary translation process.
However, since MTPE remains a human-mediated activity, it reaffirms the central role of human agency in translation and challenges the notion of competition between human translators and machines in achieving translation quality (Rico Pérez, 2024, p. 32). Achieving high-quality output from MT systems has become increasingly feasible with advances in technology; however, high quality in this context does not imply the production of error-free translations. Depending on factors such as language pair, translation direction, MT tool training method, or the ST content, an MT system may generate output that contains a range of errors, including grammatical and stylistic issues, unnecessary additions or omissions, and terminological misuse (O’Brien, 2022, p. 106). To address such errors, the International Organization for Standardization (ISO) (2017) and the Translation Automation User Society (TAUS) (2010) classify PE into two main levels based on the extent of intervention required: (1) light PE, which involves essential corrections aimed at producing an understandable and usable TT with minimal intervention; and (2) full PE, which entails comprehensive revisions intended to achieve a TT that meets professional quality standards and is comparable to a human-generated translation. In other words, quality expectations and end-use requirements are among the key factors influencing the level of intervention applied during the PE process. Accordingly, the choice between light and full PE depends on the purpose of the translation and the degree of accuracy, fluency, and terminological consistency required. On the other hand, O’Brien (2022) highlights the ambiguity of light and full PE definitions and the inconsistency between theoretical classifications and their practical applications, noting that translation commissioners often avoid explicitly acknowledging the use of light PE in professional settings (pp. 107-111).
Similarly, Way (2013) argues that the light and full PE classification is too limited to capture the diverse applications of MT and quality expectations, highlighting that the level of PE depends on the translation’s intended purpose and the longevity of its content (p. 2). Taken together, these perspectives suggest that PE is a flexible, context-dependent activity rather than a fixed set of procedures, and that decisions about the level of intervention depend on communicative goals, professional expectations, and practical constraints. However, it became necessary to establish guidelines that clearly delineate the boundaries between light and full PE. For instance, the TAUS MTPE Guidelines (Massardo et al., 2016), as shown in Table 2.1, classify PE quality as either “good enough” or “similar to human translation.” The table below presents these guidelines together with their corresponding implementation criteria.

Table 2.1. TAUS MTPE guidelines for light and full PE (adapted from Massardo et al., 2016)

Light PE | Full PE
Aim for semantically correct translation. | Aim for grammatically, syntactically and semantically correct translation.
Ensure that no information has been accidentally added or omitted. | Ensure that no information has been accidentally added or omitted.
Edit any offensive, inappropriate or culturally unacceptable content. | Edit any offensive, inappropriate or culturally unacceptable content.
Use as much of the raw MT output as possible. | Use as much of the raw MT output as possible.
Basic rules regarding spelling apply. | Basic rules regarding spelling, punctuation and hyphenation apply.
No need to implement corrections that are of a stylistic nature only. | Ensure that formatting is correct.
No need to restructure sentences solely to improve the natural flow of the text. | Ensure that key terminology is correctly translated and that untranslated terms belong to the client’s list of “Do Not Translate” terms.
As can be seen, both light and full PE prioritise semantic accuracy, completeness of information, and the removal of offensive or culturally inappropriate content. However, full PE involves a substantially more thorough level of revision. While light PE focuses on correcting only those errors that hinder comprehension or usability, full PE requires the PE practitioner to address a wider range of linguistic and formal aspects. In contrast to light PE, full PE ensures grammatical, syntactic, and stylistic accuracy, as well as careful attention to punctuation, hyphenation, formatting, and the consistent use of key terminology. As a result, full PE constitutes a more comprehensive and time-intensive process, with the explicit aim of producing a TT that is comparable in quality to a human-generated translation. According to Arenas (2020), on the other hand, PE guidelines must be tailored to each project by considering factors such as language pair, MT engine type, and output quality, while also providing language-specific examples to clarify permissible edits, addressing key aspects like ST characteristics, expected final quality, common error patterns, segment discarding criteria, customer expectations on stylistic changes, and strategies for handling terminology (pp. 336-337). This perspective underscores that PE cannot be governed by universal rules; instead, it must be adapted to the specific characteristics and requirements of each translation project. In this sense, effective PE guidelines function as practical decision-making tools that help PE practitioners align their interventions with project-specific quality expectations. In conclusion, the level of PE varies based on factors such as error type, the MT tool and its output, customer requirements, and the translator or post-editor’s preferences.
While distinguishing between light and full PE is challenging, general guidelines help define their scope, though task-specific guidelines would serve as a more effective reference for PE practitioners. Within the scope of this dissertation, the primary objective of PE is to produce a TT that is as accurate, complete, and fluent as possible. However, no predefined PE guidelines are provided to participants to avoid constraining their decision-making processes. Moreover, as the participant groups do not include a designated post-editor category, it is acknowledged that some participants may lack formal theoretical training in MTPE.

2.2. EFFORT IN POST-EDITING

Effort is defined as the use of limited attentional resources required to perform non-automatic operations during translation, as described by Gile and Lei (2021). It should not be equated with difficulty or error; rather, effort refers to the mobilisation of cognitive resources during task performance. These authors also emphasise that the relationship between effort and performance is non-linear. In other words, too little effort may lead to careless translation and errors, while too much effort can become counterproductive and fail to produce proportional gains. From this perspective, effort can be understood as a purposeful and regulated component of translation activity, whose effectiveness depends on how well it is adjusted to the demands of the task (PACTE et al., 2000). On the other hand, effort is a notoriously challenging concept to define in a general sense, as it encompasses both objective and subjective dimensions that emerge during task performance. Objective effort refers to the measurable amount of work required to complete a task, whereas subjective effort denotes the individual’s perceived mental exertion while performing it (Steele, 2020, p. 5).
Effort can therefore be understood as a psychological phenomenon that reflects the interaction between task demands and the individual’s response to those demands. As such, it is closely related to, yet distinct from, concepts such as cognitive load, task difficulty, and cognitive cost. In order to ensure conceptual clarity and analytical consistency, it is essential to define and distinguish these related concepts before examining effort in the context of MTPE. In translation process research, cognitive load, cognitive effort, and cognitive cost refer to related but distinct aspects of task performance. Cognitive load denotes the demands imposed by the task itself and by external conditions, such as the complexity of the ST, the translation brief, situational constraints, and environmental factors; cognitive effort, in contrast, refers to the actual response of the task performer, that is, the amount of mental resources actively invested while carrying out the task (Ehrensberger-Dow et al., 2020, p. 221). As Gile and Lei (2021) explain, cognitive load captures the pressure exerted by factors related to the task and the environment (p. 275), whereas cognitive effort reflects the effort the translator effectively expends during task execution. This distinction can be further clarified through an economic analogy proposed by Gieshoff and Heeb (2023), in which cognitive load corresponds to the price tag of a task, while cognitive effort represents how much the individual is willing or able to pay in response to that demand. Accordingly, it can be inferred that high task load does not necessarily result in high effort, as individual differences, strategies, expertise, and contextual factors may mediate the effort ultimately invested. Cognitive cost, by contrast, can be understood as the measurable consequences of this effortful processing.
As Diamond and Shreve (2019) put forward, cognitive cost refers to the observable outcomes that arise when a task places prolonged demands on cognitive resources, and such costs may manifest as slower processing speed, reduced accuracy, a diminished ability to handle multiple streams of information, and an overall decrease in processing efficiency. These effects may occur while performing reading, writing, listening, and speaking activities. Taken together, cognitive load reflects the demands imposed by the task, cognitive effort captures the resources actively invested by the individual in response to those demands, and cognitive cost represents the observable impact of sustained effort on performance. While load and effort describe conditions and responses during task execution, cost becomes visible in performance outcomes and processing efficiency. On this basis, task difficulty cannot be attributed solely to the objective demands associated with task load or to the effort invested by the performer. Rather, task difficulty emerges from the interaction between task demands, the effort mobilised to meet those demands, and the cognitive costs incurred during the task. Consequently, a task can be considered difficult not only when it places high demands on the individual, but also when it requires sustained effort that results in observable cognitive costs, especially in linguistically complex tasks such as MTPE. Among these concepts, this dissertation focuses specifically on the effort invested during MTPE, as effort provides a unifying lens through which task demands, individual responses, and performance outcomes can be examined. In order to further clarify how effort manifests during translation and PE, it is useful to consider theoretical perspectives that link effort to strategic behaviour and underlying cognitive mechanisms.
Effort, particularly cognitive effort, in translation and PE can be examined through the strategies translators employ and the linguistic choices they make during task performance. In their analysis of explicitation as a translation procedure, Lourenço da Silva and Pagano (2017) show that different strategic choices may be associated with different degrees of cognitive engagement. They note that literal translation often functions as a default strategy and typically requires less cognitive effort, particularly when mental resources are limited or when processing relies on automatic routines. Explicitation, by contrast, involves making implicit information in the ST more explicit in the TT and generally requires more controlled and effortful processing. This process entails close monitoring of the ST, inference of unstated meanings, and deliberate decision-making regarding how information should be conveyed in the target language. Importantly, however, the authors caution against a simplistic association between strategy choice and effort, observing that more implicit solutions do not necessarily require greater cognitive effort. This observation reinforces the view that effort is shaped not only by surface-level linguistic outcomes, but by the underlying cognitive processes and strategic regulation involved during task performance. Additional insight into the nature of effort can be gained from cognitive load theory (Sweller, 2011), which conceptualises cognitive effort in terms of the demands placed on working memory during task execution. Working memory refers to the limited-capacity system responsible for temporarily storing and processing information (Baddeley & Hitch, 1974). From this perspective, effort increases when these limited resources are strained by task complexity or by extraneous factors that do not directly contribute to task completion.
In translation and PE contexts, such factors may include the complexity of the ST, redundancy in MT output, interface-related constraints, or the need to manage multiple streams of information simultaneously. Cognitive load theory also offers a theoretical explanation for differences in effort across levels of expertise, as greater domain knowledge reduces the number of interacting elements that must be processed at the same time, thereby lowering working-memory demands. Consequently, lower observable effort among more experienced participants should not be interpreted as reduced task engagement, but rather as more efficient organisation and regulation of cognitive resources. Together, these perspectives underscore that effort is not a fixed property of the task, but a dynamic and regulated response shaped by both strategic choices and cognitive constraints. On the other hand, effort does not emerge in isolation but is shaped by the interaction of several underlying cognitive and motivational factors. Central among these are attention, motivation, and cognitive control, which together determine whether and to what extent an individual engages with a task. As Botvinick and Braver (2015) argue, cognitive control refers to a set of functions that regulate attention, memory, and action selection, while motivation determines whether these control processes are activated and sustained (pp. 84-85). From this perspective, effort can be understood as the result of an internal evaluation process in which individuals weigh the expected benefits of completing a task against the mental costs of exerting control. When motivation is high, individuals are more likely to sustain attention and apply cognitive control, even when tasks are demanding. When motivation is low, however, effort investment may decrease, even if task demands remain the same.
Attention plays a key mediating role in this process by determining which information is selected, prioritised, and maintained during task performance. In cognitively demanding activities such as MTPE, which require continuous monitoring, error detection, and decision-making, effort is therefore shaped by the combined influence of attentional focus, motivational engagement, and the ability to maintain cognitive control over time. Effort has been a central focus of research in mentally demanding professions, such as aviation, including air traffic controllers and pilots (e.g., Holley & Miller, 2022; Miller & Holley, 2017; Pagnotta et al., 2021), and in healthcare fields involving nurses and surgeons (e.g., Dias et al., 2018; Surendran et al., 2024). In recent years, this research area has also gained prominence in translation studies, especially in research exploring translators’ mental processes during tasks such as PE and simultaneous interpreting (e.g., Chen, 2017; Li et al., 2019). Within this context, effort refers to the cognitive and temporal resources mobilised by translators while performing translation tasks (Gile & Lei, 2021, p. 275). Krings (2001), whose nominal classification of the types of effort involved in MTPE has been widely adopted in translation process research, categorises effort into three dimensions: (1) cognitive effort, which refers to the mental processes involved in reading and analysing the ST, evaluating and correcting the output, generating the final TT, applying personal preferences, and engaging in problem-solving strategies; (2) technical effort, which denotes the physical actions required to alter the text, such as typing, deleting, inserting, and scrolling; and (3) temporal effort, which captures the time-based dimension of effort and is operationalised through the duration required to complete the MTPE task. Building on Krings’ framework, Popović et al.
(2014) empirically demonstrate that these dimensions capture different aspects of PE behaviour and that they do not necessarily align across error types or editing operations, as certain corrections may require substantial cognitive processing without extensive technical intervention or prolonged task duration. Importantly, these dimensions do not function independently but rather represent complementary manifestations of effort during MTPE. The present study, therefore, adopts a multidimensional approach to effort and operationalises it through process-oriented indicators, examining pauses as markers of cognitive effort, keystroke activity as an indicator of technical and linguistic effort, and task duration as a measure of temporal effort.

2.2.1. Pause as Cognitive Effort

Pauses refer to temporary halts in activity that occur during task execution. In translation process research, pauses have frequently been used as indicators of cognitive effort, particularly in studies on MTPE. Lacruz et al. (2012) investigate pause behaviour using keystroke logging data from a professional post-editor and introduce the average pause ratio, a measure that captures both pause frequency and duration. Their findings show that cognitively demanding segments are characterised by a higher number of short pauses, whereas less demanding segments tend to contain fewer but longer pauses. This suggests that clusters of short pauses provide more informative signals of increased cognitive effort during MTPE. Similarly, O’Brien (2006) examines pause behaviour using Translog, a program to record reading and writing processes on a computer, and defines pauses through pause ratio measures combined with Choice Network Analysis, which captures translators’ decision-making processes by mapping alternative translation choices.
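As a hedged illustration of how such process indicators might be derived from a keystroke log, the sketch below computes a pause ratio, an average pause ratio, a keystroke count, and the task duration for one segment. The formulas follow a common reading of the measures cited above (pause ratio as total pause time divided by total segment time; average pause ratio as average time per pause divided by average time per word) and are not taken from this dissertation's procedure. The function name `pause_metrics`, the log format, and the 1000 ms pause threshold are assumptions made purely for illustration.

```python
def pause_metrics(timestamps_ms, word_count, pause_threshold_ms=1000):
    """Derive effort indicators for one segment from keystroke timestamps.

    timestamps_ms: sorted keystroke times in milliseconds (hypothetical log format).
    word_count: number of words in the segment.
    pause_threshold_ms: minimum gap counted as a pause (assumed value).
    """
    duration = timestamps_ms[-1] - timestamps_ms[0]            # temporal indicator
    gaps = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    pauses = [g for g in gaps if g >= pause_threshold_ms]      # gaps long enough to count
    total_pause = sum(pauses)

    # Pause ratio: share of the segment time spent pausing.
    pause_ratio = total_pause / duration if duration else 0.0

    # Average pause ratio: average time per pause / average time per word.
    if pauses and word_count and duration:
        avg_pause = total_pause / len(pauses)
        avg_time_per_word = duration / word_count
        average_pause_ratio = avg_pause / avg_time_per_word
    else:
        average_pause_ratio = None  # undefined when the segment has no pauses

    return {
        "duration_ms": duration,
        "keystrokes": len(timestamps_ms),                      # technical indicator
        "pause_count": len(pauses),
        "pause_ratio": pause_ratio,
        "average_pause_ratio": average_pause_ratio,
    }

# Invented log: eight keystrokes while post-editing a four-word segment.
result = pause_metrics([0, 200, 400, 2400, 2600, 5600, 5800, 6000], word_count=4)
print(result)
```

Keeping the threshold as a parameter reflects the fact that pause definitions vary across studies, so the same log can yield different pause counts depending on the chosen cut-off.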
O’Brien’s results indicate that pause ratios alone do not reliably distinguish between sentences that are more or less suitable for MT, and the study highlights considerable individual variation in pause behaviour; for this reason, the author concludes that pause data must be interpreted alongside other process indicators. Extending this line of inquiry, Vieira (2016) conducts a multivariate analysis that combines pause-based measures with eye-tracking data, temporal measures, and subjective ratings of effort. The findings demonstrate that pause measures correlate with other indicators of cognitive effort, but they do not behave uniformly across tasks and conditions, which supports the view that pauses reflect multiple facets of cognitive effort rather than a single, homogeneous construct. Taken together, these studies show that pause behaviour provides valuable insights into cognitive effort during MTPE, while also underscoring that pauses must be analysed in combination with other qualitative and quantitative data. However, although pauses are widely regarded as signs of cognitive effort, O’Brien (2006) emphasizes the possibility of other factors that may cause pauses and notes the limitations of data collection methods in accurately identifying the cause behind each pause (p. 7). This observation suggests that pause behaviour cannot be interpreted as a direct or exclusive reflection of cognitive effort, as pauses may also arise from physical, environmental, or procedural factors unrelated to cognitive processing. In