1 EXPLAINING ARTIFICIAL NEURAL NETWORKS WITH DECISION TREE ENSEMBLES YAPAY SİNİR AĞLARININ KARAR AĞACI TOPLULUKLARI İLE AÇIKLANMASI SAYİT KILIÇ ASSOC. PROF. DR. BURKAY GENÇ Supervisor Submitted to Graduate School of Science and Engineering of Hacettepe University as a Partial Fulfillment to the Requirements for the Award of the Degree of Master of Science in Computer Engineering April 2023 ABSTRACT EXPLAINING ARTIFICIAL NEURAL NETWORKS WITH DECISION TREE ENSEMBLES Sayit Kılıç Master of Science, Computer Engineering Supervisor: Assoc. Prof. Dr. Burkay GENÇ April 2023, 73 pages With the development of efficient algorithms, artificial intelligence (AI) applications have become ubiquitous in almost every aspect of our lives. They have even started to be used in critical areas such as defense industry, economy, and healthcare. However, the use of AI models in these important areas raises concerns about their reliability. Therefore, explaining how these black box models work has become an important goal. This thesis, we propose a simple and fast model to explain the decisions of any black box model. To achieve this, we attempt to explain the basic behavior of the model through a set of semi-random decision trees. Our approach only requires the data used to train the black box model and the model itself to work. Current state-of-the-art explainable AI (XAI) models typically produce local explanations for a black box model’s decision regarding a single observation. On the other hand, models that produce global explanations use complex computations to understand the effect of each feature on the model’s decisions. However, our proposed approach defines separate regions in the model’s general decision space to explain the decision-making process of the model, and requires significantly less computational power than other advanced XAI techniques while producing both local and global explanations. Keywords: Explainable Artificial Intelligence, Interpretability, Ensemble Model, TEXAI i ÖZET YAPAY SİNİR AĞLARININ KARAR AĞACI TOPLULUKLARI İLE AÇIKLANMASI Sayit Kılıç Yüksek Lisans, Bilgisayar Mühendisliği Danışman: Assoc. Prof. Dr. Burkay GENÇ March 2023, 73 sayfa Verimli algoritmaların gelişmesiyle yapay zeka uygulamaları hayatımızın neredeyse her alanında kullanılır hale geldi. Savunma sanayi, ekonomi ve sağlık gibi insan hayatı için çok önemli konularda bile kullanılmaya başlandı. Bu önemli konularda yapay zeka modellerinin kullanımı, bu modellerin güvenilirliği hakkında soru işaretlerine neden olmaktadır. Bu nedenle, bu siyah kutu modellerinin nasıl çalıştığını açıklayabilme önemli bir hedef haline gelmiştir. Bu tezde, herhangi bir kara kutu modelin kararlarını açıklamak için basit ve hızlı bir model öneriyoruz. Bunu yapmak için, modelin temel davranışını yarı rastgele karar ağaçlarının bir kümesi aracılığıyla açıklamaya çalışıyoruz. Yaklaşımımızın çalışması için sadece kara kutu modeli eğitmek için kullanılan verilerine ve modelin kendisine ihtiyaç duymaktayız. Mevcut son teknoloji açıklanabilir yapay zeka modelleri genellikle siyah kutu modellerin tek bir gözlem hakkındaki kararı için yerel açıklamalar üretmektedir. Öte yandan, genel açıklamalar üreten modeller, her özniteliğin modelin kararları üzerindeki etkisini anlamak için karmaşık hesaplamalar kullanır. Ancak, önerdiğimiz yaklaşım, modelin karar verme sürecini açıklamak için modelin genel karar uzayında ayrı ayrı bölgeler tanımlar ve hem yerel hem de genel açıklamalar üretirken diğer ileri XAI tekniklerine göre önemli ölçüde daha az hesaplama gücü gerektirir. Keywords: Açıklanabilir Yapay Zeka, Açıklanabilirlik, Topluluk Modeli, TEXAI ii ACKNOWLEDGEMENTS To our noble martyrs who have filled every page of the glorious history of Türkiye with epic heroism and ensured us to live in peace and security on this sacred land under the crescent and star flag, To my valuable advisor Assoc. Prof. Dr. Burkay GENÇ, who has enlightened my path with his knowledge and experience and taught me how to learn throughout this process, To my mother Hacer KILIÇ, my father Adem KILIÇ and my brother İsmail KILIÇ, who have made all kinds of sacrifices for me to come this far, To the MORALABS family, starting with my valuable employers Ms. Kamer KEMERKAYA and Mr. Kemal ŞEN, who provide every kind of support for me to continue my education alongside my professional life, To Ms. Gözde ORHAN, who has supported me with her knowledge and experience during my master’s degree, To Mr. Ertunç EFE, Mr. Mehmed Derviş BARUTÇU, Ms. Zümrüt UYSAL, and Ms. Elif ÜSKUP, who have guided me and never withheld their assistance during my working life, To my dear friend Mr. Muhammed GÜL, who has shown me that blood ties are not necessary for brotherhood, To my dear friend and the greatest advisor on my life, Atty. Cemre Gözde ATİK, who has always been by my side in difficult times, And to my precious wife Gözde KILIÇ, who has taught me how to love and has been my biggest supporter in all my decisions and my companion on this life journey. I offer my endless respect and gratitude... iii CONTENTS Page ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i ÖZET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii ACKNOWLEDGEMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii CONTENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii ABBREVIATIONS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii 1. INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2. Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.3. Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2. BACKGROUND OVERVIEW .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.1. Artificial Intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.1.1. Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.1.2. Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.1.3. Decision Trees. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.1.4. Decision Tree Ensembles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.2. Explainable Artificial Intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.2.1. Categorization of explanations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 2.2.2. Explainability Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.2.3. Operations To Enable Explainability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.2.4. XAI Visualizing Techniques. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.2.5. Evaluation XAI Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3. RELATED WORK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.1. Local Interpretable Model-Agnostic Explanations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.2. SHAP (SHapley Additive exPlanations) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 4. PROPOSED METHOD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 iv 4.1. TEXAI Model Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 5. EXPERIMENTAL RESULTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 5.1. Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 5.1.1. Abalone Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 5.1.2. Wine Quality Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 5.2. Threshold Tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 5.3. Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 6. CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 v TABLES Page Table 5.1 Abalone dataset rules count. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 Table 5.2 Wine quality dataset rules count . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 Table 5.3 Metric results of TEXAI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 vi FIGURES Page Figure 2.1 Artificial Intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 Figure 2.2 Relations of AI, machine learning, and deep learning. 1 . . . . . . . . . . . . . . . . 8 Figure 2.3 An artificial neural network diagram. 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 Figure 4.1 Basic workflow of TEXAI model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 Figure 4.2 Learning techniques and their explainability.[1] . . . . . . . . . . . . . . . . . . . . . . . . 31 Figure 4.3 TEXAI model work flow schema. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 Figure 4.4 Decision tree example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 Figure 5.1 Abalone dataset target class distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 Figure 5.2 Wine quality dataset target class distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 Figure 5.3 Minimum probability evaluation(Abalone) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 Figure 5.4 Minimum coverage evaluation(Abalone) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 Figure 5.5 ANN prediction plots(Abalone) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 Figure 5.6 TEXAI prediction plots(Abalone) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 Figure 5.7 ANN vs TEXAI predictions(Abalone) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 Figure 5.8 ANN prediction plots(Wine Quality) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 Figure 5.9 TEXAI prediction plots(Wine Quality) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 Figure 5.10 ANN vs TEXAI predictions(Wine Quality) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 Figure 5.11 ANN predictions for observations which fits one rule (Abalone). . . . . . . 48 Figure 5.12 Feature importance (Wine Quality) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 vii ABBREVIATIONS TEXAI : Tree Ensemble for eXplainable Artificial Intelligence ANN : Artificial Neural Networks ML : Machine Learning EM : Ensemble Model AI : Artificial Intelligence XAI : eXplainable Artificial Intelligence LIME : Local Interpretable Model-Agnostic Explainations SHAP : SHapley Additive exPlainations viii 1. INTRODUCTION Artificial intelligence (AI) has become an increasingly essential part of our daily routines, from voice assistants to fraud detection systems in our banks. However, AI is often viewed as a black box, making it difficult for humans to understand the decision-making process behind AI-generated outcomes [2, 3]. The absence of transparency regarding the implementation of AI in crucial domains like healthcare and finance can result in doubt and suspicion towards its use. To address this issue, the domain of explainable artificial intelligence (XAI) has emerged, which aims to develop models and methods that can be understood and interpreted by humans [4]. XAI is a multidisciplinary research field that seeks to develop algorithms and models capable of providing clear and understandable explanations for decision-making processes. With the rapid expansion of AI usage across a range of domains, including healthcare, finance, and security, the demand for accountable and trustworthy AI systems has grown considerably. The significance of XAI stems from the fact that many modern AI systems function as “black boxes” meaning their decision-making processes are opaque to human understanding. This poses a critical challenge in situations where these systems make decisions with real-world consequences, such as medical diagnosis, financial trading, or autonomous vehicles. In these cases, it is essential to have a clear grasp of the reasoning behind an AI system’s outputs, particularly when they have an impact on human welfare. XAI draws on expertise from various fields, including computer science, cognitive science, philosophy, and others, to develop techniques and approaches that enable AI systems to provide transparent and understandable explanations of their process of making a decision. These approaches include methods from machine learning, natural language processing, and visualization, among others, with the aim of creating more transparent and interpretable AI systems. In summary, the ultimate goal of XAI is to improve the transparency, interpretability, and accountability of AI systems, especially in contexts where the consequences of their 1 decisions are significant. By leveraging interdisciplinary expertise and developing techniques and approaches that enable AI systems to provide transparent and interpretable explanations, XAI promotes trust and confidence in AI systems, mitigates unintended consequences or errors, and paves the way for the responsible and ethical deployment of AI in society. 1.1. Motivation Explainable Artificial Intelligence (XAI) not only enhances the credibility of artificial intelligence models, but also offers several advantages that drive our motivation in this thesis. XAI can potentially increase the accuracy of Artificial Intelligence (AI) models in several ways. One way is by allowing humans to better understand and correct errors in the AI model’s decision-making process. XAI models provide human-interpretable explanations of the AI model’s decisions, allowing users to identify and correct any errors or biases in the model’s logic. In addition, XAI models can enable users to fine-tune the AI model’s parameters and features to better fit the specific problem or domain, which can lead to improved accuracy. For example, a study on medical diagnosis found that an XAI model that allowed clinicians to adjust the importance of different features achieved higher accuracy than a traditional AI model [5]. Moreover, XAI can help detect and prevent errors or biases in the training data used to develop the AI model. By providing explanations of the AI model’s decisions, XAI can reveal any underlying biases or inaccuracies in the data and allow users to address them before they affect the model’s performance. The comprehension of an AI model’s decision-making process can provide individuals with valuable insights into the problem domain being addressed. XAI can aid medical professionals and researchers in the medical field by enabling them to comprehend how AI models diagnose diseases or prescribe treatments. Through the provision of natural language explanations of the features or variables that the model employs to make decisions, XAI can assist medical professionals in identifying patterns or relationships that may not be immediately evident from the data. Similarly, in the financial domain, XAI can facilitate analysts in understanding how AI models predict stock prices or recognize fraudulent 2 transactions. By visualizing the connections between various variables and emphasizing the factors that significantly influence the model’s decision-making process, XAI can enable analysts to acquire insights into the market trends or patterns detected by the model. As AI technologies continue to be integrated into important domains such as defence industry, finance, and criminal justice, ethical values such as fairness, transparency, accountability, and privacy are becoming increasingly paramount in AI development. One of the major ethical worries regarding AI systems is that they may entrench or maintain existing biases and discrimination. This can occur if the training data of AI model is biased or if the algorithms used to make decisions are designed with bias. Another important ethical consideration in AI is the potential for these systems to cause harm to individuals or society as a whole. This can occur if AI systems are used inappropriately or if they make decisions that have unintended consequences. The assurance of ethical alignment in the training of AI models is crucial in order to avoid unintentional negative consequences and promote responsible AI development. XAI plays a vital role in achieving this objective by providing developers and users with the ability to comprehend AI model decisions. This transparency assists in identifying potential biases or unfairness in the data or algorithms employed to train the models and implementing corrective actions to align with ethical values. AI systems must be designed to ensure that they do not perpetuate or amplify existing biases or discrimination and must uphold basic human rights and values. XAI can play a critical role in ensuring ethical alignment in AI systems. By providing transparent and interpretable explanations for their decision-making processes, AI systems can be made more accountable and can be designed in a way that is consistent with ethical values. One important example of unethical behavior by an AI model is the case of a hiring algorithm developed by Amazon in 2014. The algorithm was designed to evaluate job candidates’ resumes and assign them a score based on their qualifications. However, it turned out that the system was biased towards female candidates. The system’s bias against female candidates was that they were trained over a ten-year period with resumes sent to Amazon, 3 with the majority of them male candidates. As a result, the algorithm gave lower scores to resumes that contained words associated with women, such as “women’s,” “female,” and “she.” Amazon eventually scrapped the algorithm after the bias was discovered, and this case highlights the potential of AI models to maintain and prolong exist discrimination if not trained with care and ethical considerations [6, 7]. Therefore, the development of AI models featuring explainability and transparency represents a positive stride in building trust and confidence in these technologies and promoting responsible and ethical development. 1.2. Problem Definition Despite significant advances in the research on eXplainable Artificial Intelligence (XAI), contemporary models have certain limitations and drawbacks. LIME (Local Interpretable Model-Agnostic Explanations) [8] and SHAP (SHapley Additive exPlanations) [9] are the most widely well-known models in this area. LIME is an approach for explaining black-box machine learning model predictions at a local level by generating a simplified “local surrogate model” that imitates the behavior of the black box model. To create the surrogate model, LIME perturbs the feature values of the instance and uses the black-box model to predict the labels of these instances, then trains the surrogate model based on these predictions. The surrogate model’s feature weights serve as feature importances, indicating the impact of each feature to the original model’s decision. However, the drawback of LIME is that it may not always provide a faithful explanation of the model’s behavior. LIME approximates the global behavior of the original model by building a local surrogate model, which may not capture all the nuances of the original model’s behavior. The choice of kernel function used in LIME may also impact the resulting explanations, and determining the appropriate kernel function for a given problem may be challenging. SHAP is an algorithmic approach that explains the contribution or importance of each feature for a specific prediction made by an AI system. It uses shapley values to distribute credit or blame fairly among the features. The method assigns an relevance point to each feature 4 based on its impact to the output. However, SHAP can be computationally expensive and time-consuming to calculate for large datasets or complex models because it requires calculating the marginal contribution of each feature to the prediction for every possible combination of features, which is computationally intensive. Another potential issue with both LIME and SHAP is that they may not provide a comprehensive picture of the model’s behavior. Both methods focus on explaining the impact of individual features to the AI system predictions but may not capture complex interactions between features or the overall structure of the model. Furthermore, LIME and SHAP explanations may be specific to a particular input example and may not generalize well to other examples. In the “Tree Ensemble for Explainable Artificial Intelligence(TEXAI)” model we presented, all the structural behaviors of the black box model are uncovered without requiring any additional data beyond the model’s training data. This is achieved by analyzing the model’s behavior on observations instead of computing complex feature importance scores. In this way, we introduce a novel XAI approach that is simple, rapid, and resource-efficient, with minimal computational requirements. 1.3. Organization The organization of the thesis is as follows: • Chapter 1 presents problem definition, our motivation, contributions and the scope of the thesis. • Chapter 2 provides an overview of the overall scope and methodology of the literature. • Chapter 3 gives a comprehensive information about state of the art XAI models. • Chapter 4 introduces our TEXAI model. • Chapter 5 demonstrates the metric results of our TEXAI Model. 5 • Chapter 6 provides a summary of the thesis and outlines potential future directions. 6 2. BACKGROUND OVERVIEW 2.1. Artificial Intelligence The term Artificial Intelligence (AI) pertains to the creation of computer systems that have the ability to carry out functions which would usually necessitate human intelligence, like recognizing visual inputs, comprehending speech, making decisions, and translating languages. According to Russell and Norvig, AI can be defined as “the study of agents that receive percepts from the environment and take actions that affect that environment” [10] as shown in figure 2.1. AI has been around for over sixty years and has made remarkable progress recently due to greater accessibility to data and the advent of more advanced computing systems. Figure 2.1 Artificial Intelligence There are many different AI methods, and they can be classified in different ways based on model type, problem-solving approach, learning type, and other criteria. For the purpose of our discourse, we shall limit our discussion to the types of models that are pertinent to our thesis or those that are prevalent in the field of AI. 7 Figure 2.2 Relations of AI, machine learning, and deep learning. 1 2.1.1. Machine Learning Machine learning (ML) is a branch of artificial intelligence (AI) that enables computer systems to learn and enhance their performance based on experience, without requiring explicit programming [11]. According to Tom Mitchell, a renowned computer scientist, machine learning is the study of algorithms that enable computer programs to automatically improve through experience [11]. Supervised learning, unsupervised learning, and reinforcement learning are the three main types of machine learning. In supervised learning, a dataset with known input and output pairs is used to train the computer, which then learns to predict the output for new inputs. In contrast, unsupervised learning involves the computer being given an unlabeled dataset and tasked with discovering patterns and structures within the data. Reinforcement learning is another approach in which an agent learns how to act within an environment by taking actions and receiving penalties or rewards according to those actions. 1Adapted from “AI vs Machine Learning vs Deep Learning” by Emerj. Available at: https://emerj. com/ai-vs-machine-learning-vs-deep-learning-whats-the-difference/ 8 https://emerj.com/ai-vs-machine-learning-vs-deep-learning-whats-the-difference/ https://emerj.com/ai-vs-machine-learning-vs-deep-learning-whats-the-difference/ Figure 2.3 An artificial neural network diagram. 2 Machine learning is used in various fields such as natural language processing, speech recognition, autonomous vehicles etc. Different types of algorithms are used in machine learning, including decision trees, k-nearest neighbors, support vector machines, neural networks, and deep learning models [12, 13]. 2.1.2. Artificial Neural Networks Artificial Neural Networks (ANNs) are a branch of Artificial Intelligence (AI) that has gained a lot of interest in recent times. ANNs can learn from input data and make decisions according to that learning, by taking inspiration from the organization of the biological neural networks found in the human brain. According to Haykin, an ANN is a “massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use” [14]. Artificial Neural Networks (ANNs) consist of numerous interconnected processing nodes, called neurons, that receive inputs and generate outputs. as shown in figure 2.3. The connections between the neurons are weighted, which allows the network to learn from data by adjusting the weights to improve its performance on a given task. 2Adapted from TIBCO Software Inc., 2021). Available at : https://www.tibco.com/sites/ tibco/files/media_entity/2021-05/neutral-network-diagram.svg 9 https://www.tibco.com/sites/tibco/files/media_entity/2021-05/neutral-network-diagram.svg https://www.tibco.com/sites/tibco/files/media_entity/2021-05/neutral-network-diagram.svg Artificial Neural Networks (ANNs) have been effectively implemented in various fields such as image recognition, natural language processing, and prediction tasks [15]. ANNs are particularly useful in learning intricate patterns and relationships in data, which makes them suitable for tasks that cannot be modeled using conventional methods. This attribute of ANNs is one of their significant advantages. Artificial Neural Networks (ANNs) can be categorized based on various factors such as their architecture, learning algorithm, and activation function. Several common types of ANNs exist, including feedforward propogated networks, convolutional neural networks, among others [16]. Despite their success, ANNs are not without their challenges and limitations, including the need for large amounts of training data, overfitting risk, and the difficulty of interpreting the internal workings of the network [17]. Nonetheless, despite the challenges and limitations, ANNs are still considered a potent approach for addressing various AI problems. As such, their ongoing improvement and advancement are expected to have a significant impact on the development of the field in the future. 2.1.3. Decision Trees Decision Tree is a versatile algorithm that can be used for both classification and regression tasks. It has been applied in a wide range of fields and industries, including healthcare, finance, marketing, engineering, and many more. [18]. The Decision Tree algorithm is a robust and uncomplicated method that builds a tree-like structure by dividing the data recursively according to the feature values that offer the highest information about the target variable or reduce the variance in it. Both classification and regression tasks can be performed by decision trees. In classification task, the goal is to assign a label to a given instance according to its features. In regression, the goal is to predict a numerical value for a given instance based on its features. The construction of the decision tree algorithm involves recursively dividing the data using 10 feature values that enhance the distinction between classes or reduce the variance in the target variable. Decision Trees have several advantages over other machine learning algorithms. First, they are easy to understand and interpret, as they represent a set of decision rules that can be visualized as a tree. This makes them useful for explaining the reasoning behind the model’s predictions and gaining insights into the data. Second, they are capable of handling both categorical and continuous data, as well as NA values and noisy data, without the need for data normalization or transformation. Lastly, they can capture complex interactions between features, such as nonlinear and hierarchical relationships, by constructing decision rules based on multiple features. In conclusion, Decision Tree is a powerful and versatile algorithm in artificial intelligence that can be helpful for various fields like classification, regression, and feature selection. Decision Trees have several advantages such as interpretability, flexibility, and robustness which make them a popular choice for individuals working in the field of artificial intelligence. 2.1.4. Decision Tree Ensembles Ensemble methods refer to the use of multiple machine learning models to improve the overall performance of a predictive task, as compared to using a single model. Decision tree ensembles, are a popular type of ensemble method that use multiple decision trees to create a stronger model. Two primary types of tree ensembles are “bagging” and “boosting”. Bagging, also known as bootstrap aggregating, entails building multiple decision trees using randomly selected subsets of the training data, and then combining their predictions, typically through averaging, to obtain the final prediction. This technique was introduced by Breiman in 1996 [19], and studies have shown that utilizing bagging can improve the accuracy and robustness of decision tree models. [20]. 11 Boosting, on the other hand, involves iteratively constructing decision trees that focus on the data points that were misclassified by previous trees. The most well-known algorithm for boosting decision trees is AdaBoost, which was proposed by Freund and Schapire in 1997 [21]. AdaBoost has been shown to increase the performance of decision tree models on a variety of datasets [22]. Random Forest, Gradient Boosting, and XGBoost are some of the most common tree ensembles. Random Forest is a machine learning technique that involves creating numerous decision trees on different subsets of the training data and consolidating their predictions to obtain a final prediction [23]. The Gradient Boosting algorithm constructs decision trees in a sequential manner, where each subsequent tree aims to rectify the mistakes made by the previous tree [24]. XGBoost is known for its speed and scalability, and uses a regularized gradient boosting algorithm to build a strong model [25]. Ensemble methods using decision trees have emerged as a popular and effective machine learning technique and using in diverse applications, such as speech recognition, bioinformatics, finance, and many others thanks to improve predictive accuracy and reduce overfitting [26]. 2.2. Explainable Artificial Intelligence Explainable artificial intelligence (XAI) is a specialized domain within AI that emphasizes the creation of AI systems that are transparent, easily comprehensible, and capable of being explained to people. XAI aims to provide users with a better understanding of how AI systems make decisions by revealing the internal workings of the AI models. XAI has been defined by the Defense Advanced Research Projects Agency(DARPA) as “the ability to understand, appropriately trust, and effectively use AI systems.” [27, 28]. XAI research employs various techniques, including model interpretation, visualization, and explanation, to make the AI models decision process more transparent and interpretable. The origins of the requirement for XAI can be traced back to the initial stages of AI research. In the 1960s and 1970s, AI researchers were primarily focused on developing rule-based 12 systems, which were designed to mimic human reasoning and decision-making processes. These systems were inherently interpretable, as the rules used to make decisions were explicitly defined and transparent to the user. As the AI field advanced, researchers started to concentrate on more intricate and potent machine learning models, like neural networks. Even though these models were proficient in accomplishing remarkable performance in various tasks, they were frequently opaque and hard to interpret. This lack of transparency led to concerns about the reliability, fairness, and accountability of AI systems. The importance of XAI became even more apparent in the 1990s and 2000s, as AI systems became increasingly embedded in everyday life. For example, AI models were used in credit scoring, healthcare decision-making, and criminal justice systems, raising concerns about bias, discrimination, and lack of accountability. In response to these concerns, researchers began to develop a range of XAI methods, with the aim of making AI systems more transparent and understandable to human users. Moreover, XAI has become increasingly important due to legal obligations that require AI systems to be transparent and interpretable. For example, the General Data Protection Regulation (GDPR) of the European Union enforces that individuals possess the entitlement to receive an explanation of the verdicts taken by AI systems that impact them [29]. Similarly, in the United States, as per the Fair Credit Reporting Act (FCRA), automated systems are mandated to provide customers with explanations regarding credit determinations [30]. These legal obligations highlight the significance of XAI in ensuring that AI systems are accountable, fair, and unbiased. XAI has a significant impact on the development and deployment of AI systems, particularly in sensitive domains. By providing users with insight into the decision-making process of AI systems, XAI can enhance the accountability, transparency, and fairness of AI models. 13 2.2.1. Categorization of explanations Various classification schemes have been proposed for explanations. One commonly used approach involves categorizing explanations according to two factors: the basis of the explanation and the timing of the explanation. The basis of explanation pertains to the classification of explanation that either clarifies a single observation or clarifies the entire architecture of an black box model. The timing of disclosure refers to when the explanation is provided, either during the decision-making process or after the fact. As a combination of these distinctions, we can classify explanations in 4 main categories. Local Post-Hoc explanations refer to a set of methods used to provide an understanding of the rationale behind a model’s prediction for a given input instance. These methods are employed after the model has generated its output, and they help to shed light on the reasoning behind the decision made by the model. The purpose of local post-hoc methods is to pinpoint the input features that held the most sway in the decision-making mechanism of the model. One popular technique for doing this is known as “feature importance analysis,” which involves evaluating the impact of each input feature to the model’s decision. Other methods include perturbation analysis, which involves altering the input features and observing the effect on the model’s output, and LIME (Local Interpretable Model-Agnostic Explanations), which creates simple, interpretable method to approximate the original model’s behavior for the given input[8]. Local post-hoc methods can be helpful in diverse applications where understanding the reasoning behind a model’s prediction is important. For example, in medical diagnosis, local post-hoc explanations can help doctors understand which features of a patient’s medical history were most influential in the diagnosis. Similarly, in credit risk assessment, local post-hoc explanations can help lenders understand which factors led to a given loan application being approved or denied. Global Post-Hoc explanations are a set of methodologies utilized to comprehend the overall operation of a model, regardless of the input instance. In contrast to local post-hoc 14 explanations, which focus on expounding the prediction for a single input instance, global post-hoc techniques examine the model holistically to detect input-independent factors that contribute to the model’s decision-making process. Global post-hoc techniques endeavor to determine the most crucial input features for the entire model instead of individual input cases. This can be achieved by utilizing methods such as permutation feature importance, where each input feature’s values are randomly shuffled, and the impact on the model’s overall performance is assessed. As an example SHAP (Shapley Additive Explanations) method can be presented as an example. Global post-hoc methods are advantageous in applications where comprehending the overall behavior of a model is vital, such as in regulatory compliance or auditability. By examining the model as a whole, global post-hoc methods provide insights into the decision-making process of the model and facilitate identification of potential biases or inaccuracies in the model’s output. Moreover, these techniques are applicable for optimizing the model by identifying the most significant features that should be prioritized in future iterations. Local Self-Explaining explanations refer to the set of methods that aim to explain how the model works by analyzing the intermediate results that occur during the model’s training or prediction phase. These methods focus on providing transparency into the inner workings of the model and can be useful in applications where model interpretability is important. An instance of a local self-explanatory model is the Layer-wise Relevance Propagation (LRP) approach, wherein importance scores are attributed to each input feature based on their contribution to the model’s output [31]. LRP has been shown to provide accurate and reliable explanations for diverse models, including neural networks and deep learning models. Global Self-Explaining explanations pertain to the process of clarifying the intermediate outcomes that arise during the model’s training and operation without necessitating an extra process or a separate model. This type of explanation is commonly utilized in deep learning models, where the model’s intricacy makes it challenging to construe the output based solely on the input features. 15 An instance of a Global Self-Explaining Explanation technique is Integrated Gradients, which computes the average gradient of the model output concerning the input feature, incorporated over a course from a baseline input to the current input[32]. This method has been shown to provide intuitive and reliable explanations of deep neural network models. Integrated Gradients have been extensively studied and have been used in diverse domains, such as vision systems and natural language processing. 2.2.2. Explainability Techniques Explainability techniques in Explainable Artificial Intelligence (XAI) allude to a collection of methodologies employed to offer an understanding of the decisions of AI models. These techniques aim to increase transparency, accountability, and trust in the models by explaining how they arrive at their predictions or decisions. Feature Importance is a method used in machine learning to identify which features or variables in a dataset have the most impact on a model’s output. The knowledge derived from such techniques can be significant in multiple scenarios, such as recognizing the essential factors that govern business performance, comprehending the elements that lead to a medical diagnosis, or refining the design of a product. Various techniques can be employed to compute feature importance, each with its own strengths and limitations. A frequently used technique for evaluating feature importance is based on the notion of information gain. Information gain is an indicator of the reduction in entropy (or uncertainty) that is achieved by segregating a dataset based on a specific feature. Attributes that lead to the most significant decline in entropy are deemed to be the most informative and, thus, the most vital. This method is commonly used in decision tree algorithms such as C4.5 and CART. [33, 34]. Another commonly used approach for determining feature importance is permutation. It entails randomly shuffling the values of an individual attribute and observing the influence on 16 the model output. Attributes that have the most substantial effect on the model’s performance when shuffled are deemed to be the most crucial [23]. Over the years, there has been an increasing interest in utilizing feature importance methodologies to enhance the interpretability and transparency of black box models. By understanding which features are most important to a model’s predictions, stakeholders can gain insight into the underlying drivers of a decision and assess the model’s fairness and bias [9]. While feature importance techniques can be useful, it is important to note that they have limitations and should be used in conjunction with other explainability techniques such as model visualization and local interpretation. Additionally, the choice of feature importance method may depend on the specific characteristics of the dataset and the model being used. Surrogate Model refers to a simplified model that is constructed to imitiate the behavior of a more intricate model. The goal of a surrogate model is to provide a more interpretable representation of the underlying model, allowing stakeholders to gain insight into the factors driving the model’s output. Surrogate models can be created using a diverse techniques, such as decision trees, linear regression, and neural networks. The key is to create a model that is simpler and easier to understand than the original model, while still capturing the essential features of the original model’s behavior. Surrogate models are often utilized when the internal workings of a model are not easily accessible, particularly in opaque models. By creating a surrogate model that imitiates the black box model, stakeholders can gain insight into the factors driving the model’s predictions without needing to understand the intricacies of the original model [35]. Surrogate models can also be used to perform sensitivity analysis, which involves varying the inputs to the model and observing the impact on the model’s predictions. By creating a surrogate model that imitiates the original model, sensitivity analysis can be performed more quickly and efficiently than by directly varying the inputs to the original model [36]. 17 Example Driven explainability technique involves providing interpretable explanations for a black box model’s predictions based on specific examples or instances. The idea is to present an explanation of how the model arrived at a particular prediction by highlighting the most influential features or attributes that contributed to the prediction for that particular instance. Example Driven explanations can be created using a variety of methods, such as LIME [8], counterfactual explanations [37], and prototype explanations [38]. LIME generates an explanation by fitting a local linear model to the neighborhood of the instance of interest. Counterfactual explanations provide a set of alternative instances that would have led to a different prediction. Prototype explanations identify a small set of representative examples that can explain the model’s behavior across a range of instances. Example Driven explanations can be particularly useful in situations where the model’s behavior is non-intuitive or unexpected. By providing explanations that are specific to particular instances, Example Driven techniques can help build trust in the model’s predictions and can provide insight into how the model is processing information. Provenance-Based approach to explainability focuses on providing information about the origin and history of the data and model used in making a prediction. This can include information about the source of the data, the transformation steps of data, and the specific features or variables that were used in the model. By providing this information, users can better understand how the model arrived at its prediction and make more informed decisions about how to use the model. Provenance-Based techniques have been applied in various domains, such as healthcare, finance, and machine learning. In healthcare, the provenance of a model’s output can be helpful for clinicians to understand the reason behind a diagnosis or treatment recommendation. In finance, it can assist regulators and auditors in tracing the data lineage for compliance purposes. In machine learning, it can aid in identifying the data and features that are most influential in a model’s decision-making process. 18 One of the popular methods in Provenance-Based techniques is Provenance Graph, which is a directed acyclic graph (DAG) that represents the lineage of data and computations in a workflow [39]. Provenance Graphs have been used in various domains, such as scientific workflows and database systems, to capture the lineage of data and provide explanations for the results. In recent times, Provenance-Based techniques have garnered significant attention owing to their capacity to offer transparency and interpretability for intricate models. However, there are still challenges in applying these techniques, such as the scalability of capturing and analyzing the provenance of large datasets and models[40]. Declarative Induction approach to explainability focuses on generating human-readable rules or decision trees that summarize the behavior of a black box model [41]. This approach involves training an interpretable model that is as accurate as possible while still being simple and understandable [42]. The resultant model can be utilized to furnish justifications for particular predictions, in addition to obtaining discernments regarding the performance of the opaque model in a broader sense. One example of the Declarative Induction approach is the use of decision trees to summarize the behavior of a black box model. Decision trees can be trained using techniques such as CART or C4.5 to generate a tree-like structure that represents a set of rules for making predictions [43]. These rules can be easily understood by humans and can provide insights into how the black box model is making its predictions. 2.2.3. Operations To Enable Explainability As the need for explanation of artificial intelligence models increases, several techniques have been developed to enable explainability operations. First-Derivative Saliency refers to a method for interpreting machine learning models that involves calculating the first derivative of a model’s output with respect to its input. 19 This approach has gained popularity in recent years as a way to provide insight into the decision-making processes of complex machine learning models. The concept underlying the First Derivative Saliency method is to determine the most relevant input features that affect a model’s output by computing the differentiation of the output base on input. This technique is especially valuable in image and text classification problems, where comprehending the rationale behind a model’s decisions can be challenging. There are several different techniques for calculating First Derivative Saliency, including the popular “gradient-based” methods which includes Gradient-weighted Class Activation Mapping (Grad-CAM) and Guided Backpropagation (GBP). These methods have been shown to be effective in identifying salient features in image data [44, 45]. In addition to image classification, First Derivative Saliency has also been applied to natural language processing assignments like sentiment analysis and named entity recognition [46, 47]. Applications have demonstrated the usefulness of First Derivative Saliency for interpreting complex models in a wide range of domains. Overall, “First Derivative Saliency” is an important tool for interpreting machine learning models and gaining insight into their decision-making processes. As machine learning gains prominence and becomes integral to diverse applications, the ability to understand and explain these models will become even more crucial. Layer-wise Relevance Propagation(LRP) is a method used to interpret the predictions of neural networks by assigning correlation scores to input features. The idea behind LRP is to propagate the correlation scores of the output of a model back to its inputs, in order to identify which input features were important for making the prediction. LRP can be used to understand how a neural network arrives at its decisions, and to identify potential biases or errors in its predictions [31, 48]. The basic idea of LRP is to assign relevance scores to each neuron in the network, which represent the contribution of that neuron to the final prediction. These relevance scores are 20 then propagated backwards through the network using a set of propagation rules, which depend on the architecture and activation functions of the network. One of the advantages of LRP is that it provides a principled approach for interpreting the output of a neural network, without requiring any external knowledge or preconceptions about the problem being solved. LRP has demonstrated its efficacy in explaining the predictions of diverse neural network architectures, spanning convolutional neural networks, deep belief networks, and recurrent neural networks. Its applications span across image classification, natural language processing, and drug discovery, where it has proven to be a useful tool for detecting possible biases and identifying significant features in neural network predictions. Input Perturbations is employed to interpret the predictions made by machine learning models by altering the input data and evaluating its impact on the output. The underlying concept of this technique is to identify the features of input data that are critical for the model’s predictions. This method is beneficial in comprehending the model’s decision-making process, discovering any possible predispositions or inaccuracies, and enhancing the model’s effectiveness. Input perturbations can take many forms, including adding noise to the input data, removing or replacing certain features, or manipulating the input in other ways. The effect of these perturbations on the model’s output can be analyzed to determine which attributes are most critical for model’s predictions. Input perturbations are a powerful and versatile technique as they can be applied to any type of machine learning model, regardless of its structure or training approach. This model-agnostic technique has been successfully utilized to explain the predictions of a broad range of machine learning models, such as decision trees, support vector machines, and neural networks. One common method for analyzing the effect of input perturbations is to use sensitivity analysis, which involves calculating the change in the model’s output for a given change 21 in the input. Another method is to use visualization techniques, such as heatmaps or scatterplots, to identify patterns in the model’s predictions and their relationship to the input data. Input perturbations have found applications across diverse fields, including but not limited to image classification, natural language processing, and healthcare. It has been shown to be effective at identifying relevant features, detecting potential biases in machine learning models, and improving the interpretability and transparency of these models. Attention is a deep learning technique that enables the model to concentrate on specific parts of the input data during processing [49]. This mechanism is inspired by human cognition, where the brain selects information to process based on its importance to the given task. The implementation of attention in deep learning has been proven to enhance the model’s performance in various applications such as image classification, speech recognition, and machine translation [50]. The attention mechanism functions by assigning a significance score to each element in the input data, depending on its significance to the task. These scores are then utilized to compute a weighted summation of the input, which is subsequently propagated through the remainder of the network. The significance scores are obtained through training and can be viewed as an indication of the significance of each input element. Attention offers the advantage of enhancing the interpretability of deep learning models, as it enables us to comprehend the specific regions of the input data that the model is emphasizing during processing. This ability can be valuable in gaining insight into the decision-making process of the model, detecting possible biases or inaccuracies, and refining the model’s performance. LSTM (Long Short-Term Memory) is a variant of recurrent neural networks that excels in handling sequential data. What distinguishes LSTM is its capacity to selectively retain and retrieve information over a period, aided by gating signals that regulate the information 22 flow through the network. These gating signals are acquired through training, and offer an indication of the significance of distinct segments of the input sequence. In an LSTM, the gating signal consists of three gates: the input gate, the forget gate, and the output gate. The input gate regulates the amount of new input to be added to the memory cell, the forget gate determines how much old information should be retained in the memory cell, and the output gate determines the extent to which the current memory cell value should be output to the next layer in the network. The gating signal is computed by taking a linear transformation of the input and the previous hidden state, and then applying a sigmoid function. By doing so, the model is capable of learning a function that is flexible enough to adapt to various input sequences, and can manage the flow of information in the network efficiently. In an LSTM, the gating signal has been shown to be a useful tool for understanding the inner structure of the model, along with for improving the performance of the network on certain tasks [51–53]. By analyzing the gating signals, researchers can gain insights into which parts of the input sequence are most crucial to the task at hand, and can use this information to refine the network architecture and training process. Explainability-aware architecture design is a growing tendency in deep learning involves integrating explanation techniques within the model architecture to improve interpretability. This approach is gaining popularity as it can lead to more reliable and better-performing models in diverse domains, including finance, healthcare, and autonomous driving. The attention mechanism is one such technique that enables the model to selectively focus on relevant parts of the input and produce a more interpretable output. This approach has been effectively utilized in natural language processing (NLP) tasks, such as machine translation [54] and text classification [55]. Another example of an explainability-aware architecture design is the use of graph neural networks (GNNs), which can model the structure and relationships between entities in a 23 graph and provide a more interpretable output. GNNs have been applied in various domains such as social networks [56], drug discovery [57], and recommendation systems [58]. In addition to attention and GNNs, there are other techniques that can be used to design explainability-aware architectures, includes decision trees, and causal models [59]. By incorporating explainability techniques directly into the model architecture, explainability-aware architecture design can provide a more interpretable and transparent model, which can lead to better performance and more reliable predictions. 2.2.4. XAI Visualizing Techniques Another important aspect of XAI is visualizing the decision-making process of these models, which can help humans better understand and trust their outputs. Some popular methods for visualizing the internal workings of machine learning models has revealed. A common technique is to visualize the importance of different features, which can reveal the features that have the greatest influence on a model’s predictions. For instance, permutation-based techniques such as “Permutation Feature Importance (PFI)” and “Partial Dependence Plot (PDP)” can be utilized to analyze the impact of individual attributes on the model’s output [60]. Alternatively, “Integrated Gradients” and “Layer-wise Relevance Propagation (LRP)” can be used to visualize the contribution of each feature to the model’s output by computing the output’s gradient with respect to the input [31]. Another approach to visualizing machine learning models is to use saliency maps, which highlight the parts of an image that the model is focusing on when making its predictions. For example, Grad-CAM (Gradient-weighted Class Activation Mapping) calculates the gradient of the output concerning the feature maps of the last convolutional layer, a heatmap is generated that pinpoints the image regions that are most significant for the model’s prediction. [44]. 24 Raw declarative and natural language processes are another XAI visualization methods that convert model behavior into natural language for human comprehension. Raw declarative models map complex model decisions to simple, understandable rules or logic that a non-expert can easily interpret. Natural language processes, on the other hand, convert the model’s behavior into a more human-readable text, allowing users to understand the reasoning behind a particular prediction. One example of raw declarative methods is the “Decision Trees”, which are commonly used for classification tasks and can be visualized as a hierarchical structure of if-else statements that explain the reasoning behind the model’s predictions [61]. Natural language processes have also been applied in XAI to enhance model interpretability. One example is the work of Lei[62], who proposed a natural language generation method to explain model behavior. The method generates textual explanations for model predictions, which can be used to understand the reasoning behind the model’s decisions. Both raw declarative and natural language methods provide a way to translate complex model decisions into human-readable formats. These methods can help to improve model transparency and foster greater trust in machine learning systems. 2.2.5. Evaluation XAI Methods Evaluation of XAI methods is an essential step in determining the effectiveness and usefulness of these methods. The evaluation can be conducted using various metrics such as, accuracy, precision, recall, and F1 score. In addition to these traditional metrics, there are also several other metrics specific to XAI, such as interpretability and transparency [63]. A common method for evaluating explainable artificial intelligence (XAI) techniques involves employing human participants to assess the quality of the explanations generated by these methods. This approach is particularly relevant when evaluating methods designed to explain complex models, includes ANNs. For instance, Li et al. [64] utilized 25 human participants to evaluate the effectiveness of explanation methods in elucidating the predictions made by neural networks. Another approach to evaluating XAI methods is to use simulation experiments to measure the effectiveness of these methods in identifying errors or biases in the models they are designed to explain. For example, Kaur et al. used simulation experiments to make assesment the effectiveness of different explainable artificial intelligence methods in identifying biases in a predictive model for predicting breast cancer recurrence [65]. Simulation experiments involve creating artificial datasets to evaluate the performance of XAI methods. These datasets are created to mimic real-world scenarios and are designed to test the ability of XAI methods to accurately explain the decisions made by machine learning models. By simulating different scenarios, researchers can evaluate the robustness of XAI methods and determine their effectiveness in different contexts. 26 3. RELATED WORK 3.1. Local Interpretable Model-Agnostic Explanations Local Interpretable Model-Agnostic Explanations(LIME) is a method to explain the predictions of opaque machine learning models by generating understandable and localized approximations of these models. In 2016, Ribeiro et al. introduced LIME, which has since become a commonly used technique for enhancing interpretability in machine learning. [61]. The basic idea behind LIME is to create interpretable model that which imitates the actions of a complex model in a limited area surrounding a specific instance. This approximation can then be used to understand why the black box model made a certain prediction for that instance. To create this local approximation, LIME first selects a set of important features for the instance in question, then generates a dataset of perturbed instances by randomly sampling from the feature distributions. The black box model is then used to predict the output of these perturbed instances, and a linear regression model is trained on the perturbed instances and their corresponding black box model predictions. The coefficients of this linear regression model represent the importance of each feature in the local approximation. LIME has several advantages over other explainability methods. First, it can be used with any black box model, including deep learning models. It makes LIME “Model-Agnostic”. Second, it produces interpretable explanations that are easy to understand and can be used to build trust in machine learning systems. Finally, it can be used to explain both individual predictions and the overall behavior of a model. There have been many applications of LIME in various fields, including healthcare [66], finance [67], and NLP [68]. However, there are also some constraints to LIME, such as the fact that it only approximates the black box model in a small region around the instance, and that the explanations may not be globally consistent. In conclusion, LIME is a powerful and widely used technique for explaining black box machine learning models. Its ability to create simple, interpretable approximations of 27 complex models has made it a popular choice for building trust in machine learning systems. However, as with any explainability method, it is important to understand its limitations and use it appropriately. 3.2. SHAP (SHapley Additive exPlanations) The SHAP technique is an approach for interpreting the results of intricate black box models. It offers a method to evaluate the impact of each input feature on the model’s prediction for a specific instance. The SHAP values demonstrate the alteration in the anticipated model output when a specific feature value is observed, compared to when the feature value is substituted by a baseline value. [9]. The SHAP technique utilizes the concept of Shapley values from cooperative game theory, which provides a way to fairly allocate the total value of a coalition of players to each player based on their individual contributions [69]. The Shapley value is a method for assigning a payoff to each player in a cooperative game. It is based on the idea that the contribution of each player to the total payoff should be proportional to their marginal contribution to any coalition that achieves the payoff. In other words, the Shapley value is a way of fairly dividing the payoff among the players based on their contributions. In machine learning, the coalition of players in the context of SHAP is the set of input features, and the value is the output of the model. The Shapley values represent the marginal contribution of each feature to the difference between the actual prediction and the expected prediction for a given input instance. ϕS j (x) = ∑ T⊆1,...,p\j |T |!(p− |T | − 1)! p! [ fT∪j(x)− fT (x) ] (1) In equation 1, ϕS j (x) represents the SHAP value for feature j in the context of input x. p is the total number of attributes in the dataset, T is a subset of the feature indices excluding feature j. fT (x) represents the predicted output of the machine learning model when only the features in subset T are present in the input x. fT∪j(x) represents the predicted output of 28 the model when both feature j and the features in subset T are present in the input x. The equation calculates the difference in predicted output between the input x with feature j and the features in subset T , and the input x with only the features in subset T . These differences are then weighted and averaged over all possible subsets of features, resulting in a SHAP value for feature j that provides a measure of feature importance and can be used to interpret the predictions of the any black box model. The SHAP method is model-agnostic, meaning it can be used with various machine learning models, such as ANNs, decision trees, and linear models. This approach makes the SHAP method versatile and applicable to a wide range of problems. The method approximates the model locally to calculate the SHAP values, enabling it to handle complex models and large datasets without significant computational overhead [9]. The SHAP method has found applications in various domains, including healthcare [70], finance [71], and natural language processing [72]. It has also been integrated into popular machine learning libraries, such as XGBoost [25] and scikit-learn [73], making it accessible to a wide range of users. To summarize, the SHAP method is a potent approach to interpret the results of complicated machine learning models. It utilizes Shapley values from cooperative game theory to estimate the input feature’s contribution to the model’s prediction for a given instance, regardless of the model’s type. The SHAP method uses a model-agnostic approach to compute the SHAP values through a local approximation of the model, which makes it scalable to large datasets and complex models. Due to its widespread applications and its integration into popular machine learning libraries, the SHAP method is an invaluable tool for both researchers and practitioners. 29 4. PROPOSED METHOD In the context of our thesis, we introduce a novel technique denominated as the “Tree Ensemble for Explainable Artificial Intelligence(TEXAI)” which related to the concept of explainable artificial intelligence. This approach is aligned with the theoretical underpinnings and empirical evidence we have expounded in the preceding sections. Broadly speaking, the TEXAI model we have devised takes as input an artificial neural network (ANN) and it’s training dataset, as depicted in figure 4.1. Subsequently, it generates a comprehensive explication for the decisions made by the ANN. Figure 4.1 Basic workflow of TEXAI model The structure of our TEXAI possesses the capability to elucidate the decision-making process of any black-box model. As such, our TEXAI can be categorized as model-agnostic. It can be classified as a post-hoc approach, given its ability to generate explanations regarding the decision-making of the black-box model subsequent to its training phase. It does not require any supplementary processing during the training of the black box model. And our TEXAI has the capacity to generate two types of explanations: one pertaining to an individual observation and the other concerning the overall structure of the opaque model. This duality of explanations allows us to assert that our model generates both local and global explanations. The research carried out by the Defense Advanced Research Projects Agency(DARPA) has shown that there is a trade-off between the predictive accuracy and the explainability accuracy of a model[1]. The model prediction accuracy is inversely proportional to its 30 explainability. The models that achieve high predictive performance tend to have lower explainability accuracy. The comparative scheme of DARPA, which is illustrated in figure 4.2, highlights this trade-off between predictive performance and explainability. Thus, there is a need to find a balance between accuracy and explainability in order to obtain a model that is both accurate and interpretable. Figure 4.2 Learning techniques and their explainability.[1] The central question we aim to address in this study is whether models that exhibit high predictive accuracy can be transformed into highly explainable models. This question serves as the primary inspiration for our proposed method. In light of this, we have chosen to utilize the decision tree method as it is considered to be one of the most suitable techniques for generating explainable models. However, it is not realistic to expect that a single decision tree can achieve the same level of performance as a neural network. Therefore, we have opted to construct an ensemble that comprises multiple decision trees to simulate the behavior of a neural network. Our methodology involves the utilization of the surrogate model technique, whereby an additional model is developed to emulate the black box model in order to provide an explanation of its decision process. Simultaneously, it can be asserted that the our TEXAI was conceived utilizing the Declarative Induction explanation technique, as it was built through the utilization of self-explanatory decision trees. 4.1. TEXAI Model Development The subsequent sections of our thesis shall provide an elaborate elucidation of the phases involved in the development of our TEXAI. For the purpose to gain preliminary insight, we 31 present the pseudo code of our model in Algorithm and the visual flowchart in figure 4.3. Algorithm TEXAI model development 1: procedure MAIN(Dataset,ANN ) 2: D =DATAAUGMENTATION(Dataset) 3: predictions = ANN.predict(D) 4: D[Target] = predictions 5: TreeForest =CREATEDECISIONTREEFOREST(Dataset) 6: Paths =TREEFOREST.CREATEPATH 7: Paths =PRUNE(Paths) 8: Paths =REMOVEOVERLAPPEDPATHS(Paths) 9: Return Paths 10: end procedure 11: procedure DATAAUGMENTATION(Dataset) 12: length = Dataset.length 13: for each integer i in length do 14: Pickrandom2Observations 15: Dataset.append(mean(twoobservations)) 16: end for 17: Return Dataset 18: end procedure 19: procedure CREATEDECISIONTREEFOREST(Dataset) 20: ColumnCount = Dataset.Columns.Count 21: for each integer i in MaxTreeCount do 22: RandomlySelectedColumns = Dataset.Columns.randomSample(TempColCount) 23: TempDataset = Dataset[RandomlySelectedColumns] 24: Tree = sklearn.DecisionTreeClassifier().F it(TempDataset[RandomlySelectedColumns], T empDataset[Target]) 25: Forest.Add(Tree) 26: end for 27: Return Forest 28: end procedure 29: procedure PRUNE(Paths) 30: for each path p in paths do 31: if CHECK(p) isFalse then 32: Paths.remove(p) 33: end if 34: end for 35: Return Paths 36: end procedure 37: procedure CHECK(Path) 38: if Path.Coverage > MinCoverage & Path.Purity > MinPurity & Path.Proba > MinProba then 39: Return True 40: else 41: Return False 42: end if 43: end procedure 44: procedure REMOVEOVERLAPPEDPATHS(Paths) 45: groups = Paths.GroupBy(Breakdownfeatures, direction) 46: for each group g in groups do 47: tempPaths.add(group.pickMoreInclusive()) 48: end for 49: Return tempPaths 50: end procedure In the initial phase of this study, a rudimentary artificial neural network (ANN) was trained and assessed using the dataset, attaining an accuracy rate of roughly 39%. Though this accuracy rate may appear low or unsatisfactory, it is noteworthy that our objective in this thesis does not entail the presentation of a highly efficacious ANN model. Rather, our aim is to explicate the decision-making mechanisms of the ANN in a manner that can be 32 Figure 4.3 TEXAI model work flow schema. comprehended by human beings, even when the accuracy rate is diminished. Consequently, no attempts were made to enhance the performance of the model. It is commonly understood that there exists a positive correlation between the amount of data and the accuracy of models. Furthermore, as the amount of the data increases, a more comprehensive understanding of the model’s behavior can be attained. Based on these suppositions, we have taken the initiative to augment our input dataset. By doing so, we aim to ensure that the threshold values obtained in the node breakdown of a decision tree are more consistent. For instance, suppose a dataset comprises of two observations, where the x1 value of the first observation is denoted as t1 and its target class is c1, and the x1 value of the second observation is t2 and its target class is c2. In a model trained with this data, the 33 threshold value for attribute x1 would be the average of t1 and t2. However, it is evident that a more precise threshold value could be derived if additional observations between t1 and t2 for the x1 attribute were present in the dataset. As a result, we have augmented our input dataset to incorporate more data using a data augmentation method that generated new observations by averaging two randomly chosen observations, thereby doubling the data volume. n = nf/2∑ i=2 ( nf i ) (2) The proposed TEXAI takes an augmented and preprocessed dataset, and a neural network model as input, and generates a set of rules as output. The algorithm initiates by constructing a set of n decision trees, each independently built on a subset of the original dataset. The subset is created by randomly selecting a subset of features from the original dataset. It is important to note that the value of n indicates the maximum number of sample datasets that can be generated through random feature selection from the original dataset. The equtation for n represented in equation 2 where n is the maximum number of trees to be created and nf is the number of features in dataset. An example of these trees is shown in figure 4.4.Each path of each tree represents a rule. In the information presented about the leaf in the figure 4.4, the array named “classes” shows the target class distribution in that leaf. The class with the maximum value in this array is the prediction of the path that reaches that leaf. And the other informations such as gini impurity, probability, sample count and coverage in the figure 4.4 will be explained subsequent part of the our thesis. Among the n trees, many different rules may make the same decisions as the ANN that requires explanation, thus locally simulating its behavior. The core idea of our approach is to identify the paths resulting in these rules, which provide a local explanation of the ANN model, and then create an ensemble of them to explain the entire ANN model. Note that, each tree was trained using different and random sub-samples of features and observations. Due to the use of a randomized subset of features, different decision trees were created and the consistency of the node thresholds in these decision trees was observed. Not all paths were immediately accepted though. Only decisive paths which satisfy the threshold 34 Figure 4.4 Decision tree example we established based on the following three metrics were used, namely, Gini impurity, probability and coverage metrics. Gini impurity is a measure of the impurity or heterogeneity of a set of data points. It is commonly used in decision tree algorithms for classification tasks. A smaller value of Gini impurity indicates a more pure or homogeneous set of data points, where all data points belong to the same class. Conversely, a larger value of Gini impurity indicates a more impure or heterogeneous set of data points, where the data points belong to multiple classes with roughly equal probabilities. Formally, the Gini impurity of a set S is defined in equation 3 where pj is the proportion of data points in S that belong to class j. The summation is taken over all classes j. gini(S) = 1− ∑ p2j (3) The probability assigned to a leaf node is a computed estimate of the likelihood that a novel data point would be classified as a certain class label, which is determined based on the ratio of observations of that class label present in the leaf node to the overall number of samples in that node. In a formal context, the probability associated with a class label c for a given leaf node L may be mathematically expressed as illustrated in equation 4. Herein, c signifies the class label that is most commonly observed in the leaf node, n(c, L) represents the number of samples in the leaf node L belonging to the class c, and n(L) refers to the total number of samples contained in the leaf node L. 35 p(L) = n(c, L) n(L) (4) In our study, we utilized the concept of coverage as the third criterion for evaluating decision tree paths. This addition was motivated by the issue of overfitting. The aim was to ensure that the rule set generated by the model would exhibit similar levels of accuracy on both training and test datasets. The conventional definition of coverage for a leaf in decision trees is the ratio of the number of observations in the leaf to the total number of observations. However, in our classification approach, the target attribute was based on predictions generated by ANN. Occasionally, the ANN may predict a specific class for a limited number of observations, which may be contained within a single leaf. Using the standard coverage equation to evaluate this path to leaf may result in it being categorized as nondecisive. However, upon further examination, it may be discovered that this leaf covers the entire area associated with the relevant class in hyperspace. To address this issue, we formulated an alternative coverage equation and presented it as equation 5. In this equation, c represents the class with the maximum number of observations in the leaf, n(c, L) denotes the number of observations in leaf node L that belong to class c, and n(c) indicates the total number of observations belonging to class c. This modified equation enables us to obtain a more accurate assessment of the coverage for each leaf in the decision tree, even when dealing with non-uniform class distributions. C(L) = n(c, L) n(c) (5) Each individual path of the decision trees was evaluated based on the three metrics previously mentioned. Those paths that were determined to be non-decisive were subsequently eliminated, resulting in a reduced set of paths. As examples, the following two rules were generated from the abalone dataset. In this context, the symbol P is used to denote the probability associated with a leaf node, while N represents the number of observations classified to that specific leaf node. Additionally, GI pertains to the measure 36 of impurity known as “Gini Impurity”, while the “Classes” array indicates the distribution of observations based on their target class. Rule 1: (Viscera weight>0.44475) then (Height>0.155) then (Length>0.6825) ⇒ “Class11” P:97.0 N:128 GI:0.02 Classes:[0,0,4,124] Rule 2: (Diameter>0.335) then (I≤0.5) then (Shell weight≤0.21975) then (Whole weight>0.761) then (Whole weight≤1.088) ⇒ “Class9” P:97.3 N:150 GI:0.01 Classes:[0,146,4,0] Despite the elimination of non-decisive paths, it is possible that some remaining paths may overlap on the same plane. In such scenarios, the path that covers a larger area on the plane is preserved while the other path is discarded. During this process, rules that are on the same hyperplane and provide the same target class as a prediction are consolidated into groups. Among these groups, the most comprehensive rule is selected, and the rest are eliminated. To provide an illustration, two rules generated by our model on the same hyperspace are presented below. Rule 3 is more comprehensive, as evident from the N value, which denotes the total number of observations covered by that rule. Hence, rule 3 is retained while rule 4 is eliminated. Rule 3: (I≤0.5) then (Length>0.4475) then (Length<0.59) ⇒ “Class9” P:100.0 N:158 GI:0.0 Classes:[0,158,0,0] Rule 4: (I≤0.5) then (Length>0.4925) then (Length≤0.59) ⇒ “Class9” P:100.0 N:79 GI:0.0 Classes:[0,79,0,0] After applying the filtering criteria as described previously, a set of rules that define each target class of the neural network are obtained. These rules are derived from the decision trees’ leaves and are linear in nature, dividing the space into two halves, one of which corresponds to the neural network’s class decision and the other half suggests a different class. The set of these rules describes a region that potentially contains open half-spaces, which closely matches the neural network’s decisions locally. It should be noted that the 37 neural network is a non-linear model and, as a result, the regions described by our set of rules are only piece-wise approximations of the original regions of the neural network. By manipulating the parameters of algorithm , specifically the minimum gini impurity, purity and probability values, we can generate tighter or looser regions. When the regions are tighter, the resulting explanations of the neural network is more accurate, but may fail to account for some observations. Conversely, when the regions are looser, the explanations of the neural network is less accurate, but the number of labeled observations increases. In the next section, we present experimental results obtained from the “Wine Quality” and “Abalone” datasets. 38 5. EXPERIMENTAL RESULTS 5.1. Datasets In the process of constructing our TEXAI model, we employed two distinct datasets, designated as “Abalone” and “Wine Quality”. A substantial portion of each dataset, that is 80%, was allocated for utilization in the development of the TEXAI model. The remaining 20% was reserved as a test phase for evaluation purposes. In order to ensure the accuracy and effectiveness of the TEXAI model development process, it is crucial to utilize pre-processed datasets that were originally used to train the black-box model. 5.1.1. Abalone Dataset Figure 5.1 Abalone dataset target class distribution Primarily, in the process of developing our model TEXAI, we employed the abalone dataset, which can be accessed from [74]. This dataset contains a total of 4,177 observations, comprising of 9 attributes and a target column that indicates the number of rings. These attributes include sex(F,I,M), length, diameter, height, whole weight, shucked weight, viscera weight, and shell weight. 39 The target attribute of the dataset utilized in this study comprises numerical values, rendering it suitable for regression tasks. However, the objective of the thesis is to investigate the decision-making process of classification models. Unfortunately, as illustrated in figure 5.1, there are a large number of classes and the number of observations in each class is quite unbalanced. Due to the scarcity of observations in certain target classes, black box models have a tendency to predict target classes with a high number of observations for all instances. Our TEXAI model, developed with the predictions and inputs of the ANN model, will have no observations for target classes characterized by a low number of instances. Consequently, we eliminated instances belonging to target classes with low observation count from our dataset. Thus, by subsampling the four most populated classes (8, 9, 10, and 11 rings) we obtained a more balanced classification problem. Upon completion of the elimination process, a final count of 2378 observations remained. During the development process of TEXAI, 1902 observations were employed for training purposes while the remainder of the dataset was allocated for testing the model’s performance. The training dataset was augmented to twice its original size using the augmentation technique described in the previous section. 5.1.2. Wine Quality Dataset Figure 5.2 Wine quality dataset target class distribution 40 Additionally to Abalone dataset, during the testing phase, the wine quality dataset, which is also available on [75] was utilized. This dataset contains a total of 178 observations, comprising of 13 attributes and a target column that indicates the quality of wine. These attributes include alcohol, malic acid, ash, alcalinity of ash, magnesium, total phenols, flavanoids, nonflavanoid phenols, proanthocyanins, color intensity, hue, od280/od315 of diluted wines, proline. The target attribute of the dataset comprises three unique classes, and their distribution is depicted in figure 5.2. During the development process of TEXAI, 142 observations were employed for training purposes while the remainder of the dataset was allocated for testing the model’s performance. The training dataset was augmented to twice its original size using the augmentation technique described in the previous section. 5.2. Threshold Tuning In the preceding section, where rule elimination was to be performed, the utilization of probability, coverage, and gini impurity thresholds was discussed. In this section, we present an experimental approach for determining an optimal threshold value. To determine minimum probability threshold, we conducted a systematic investigation in which we gradually increased the probability value by 1 percent increments while maintaining all other parameters constant. We then evaluated the resulting TEXAI model’s performance. The findings of our study are presented in figure 5.3 where the y-axis shows the normalized values about results and the x-axis shows the minimum probability value. Our analysis uncovered a direct relationship between the minimum probability threshold and the TEXAI model’s decision accuracy. Increasing the minimum probability threshold decreases the area covered by the TEXAI in the ANN’s decision space and increases the number of observations for which the TEXAI cannot make a decision. However, this decrease in the decision space also results in a reduction in the number of incorrect predictions made by the TEXAI. Conversely, decreasing the minimum probability threshold increases the number of incorrect predictions made by the TEXAI, but reduces the number of observations for which 41 the TEXAI is uncertain. Therefore, a trade-off exists between the TEXAI being indecisive and making incorrect decisions. Consequently, it is crucial to select an appropriate minimum threshold to achieve an optimal balance. Based on our empirical findings, for Abalone dataset, we have determined that a probability value higher than the 44 percent is required for a path to be decisive. Figure 5.3 Minimum probability evaluation(Abalone) As observed in our minimum probability analysis, a similar trade-off exists between the TEXAI model providing an incorrect estimate and being undecisive. Similar to the assessment of the minimum probability threshold, we employed a systematic methodology. While maintaining constancy of all other variables, we systematically incremented the coverage parameter by 1% increments and assessed the outcomes. Based on our experimental analysis, which is presented in figure 5.4 where the y-axis shows the normalized values about results and the x-axis shows the minimum coverage value and considering the generic nature of our coverage equation 5, we found that a minimum coverage value of 13% represents a reasonable compromise between model error and model indecisiveness. 42 Figure 5.4 Minimum coverage evaluation(Abalone) Both figure 5.3 and figure 5.4, The “No Prediction” displays the normalized quantity of observations that were left unclassified by TEXAI, whereas the “Wrong Prediction” exhibits the normalized quantity of observations that were incorrect classified by TEXAI. Conversely, the “Correct Prediction” line demonstrates the normalized quantity of observations that were accurately classified by TEXAI. The term “purity” pertains to the purity of TEXAI. The purity of TEXAI was computed according to test dataset using equation 6 where the symbol σ(r) represents the purity of the TEXAI, the symbol ε represents the number of rules that predict the same target class as the ANN for that observation, while the symbol lambda denotes the total number of rules that fit the observation. Additionally, the symbol N represents the total number of observations. σ(T ) = ∑ ε λ N (6) Upon analyzing both figures 5.3 and figure 5.4, it becomes apparent that the optimal values for probability and coverage are 44% and 13%, respectively. This conclusion is based on the 43 observation that up to these threshold values, there was a significant and rapid increase in the number of correct predictions and purity, accompanied by a sharp decline in the number of incorrect predictions. However, beyond these threshold values, while the purity and number of correct predictions continued to increase gradually, the number of observations in which the model was indecisive increased rapidly. 5.3. Results As a consequence of the procedures outlined in the preceding segment, a total of 203 rules were derived for the abalone dataset and 150 rules were obtained for the wine quality dataset. Due to the random sampling of datasets from the main dataset for the purpose of constructing decision trees, these figures may vary, even for the identical dataset and model. Nevertheless, given the creation of a substantial number of decision trees, any disparity in these figures is expected to be negligible. Each rule indicates a target class. The set of rules indicating a class defines the area covered by that class in hyperspace. Table 5.1 illustrates the count of rules that pertain to target classes in the “Abalone” dataset, while table 5.2 presents the corresponding rule count for the target classes in the Wine dataset. Target Class Rules Count Class 8 85 Class 9 45 Class 10 23 Class 11 75 Total 228 Table 5.1 Abalone dataset rules count Target Class Rules Count Class 0 111 Class 1 59 Class 2 49 Total 219 Table 5.2 Wine quality dataset rules count As we previously discussed, our starting point was to mimic high-performing models with low interpretability, such as ANN, by transforming them into decision tree ensembles that have high interpretability. Therefore, our evaluation should focus on the extent to which our TEXAI mimics ANN. Broadly speaking, all AI models take an input and produce an output. 44 Consequently, the similarity between two AI methods or models is determined by comparing the outputs they generate for the same inputs. The decision-making process for the TEXAI model that we developed can be described as follows: For a given observation, all rules that match the observation are considered, and each of these rules has one vote which is the target class of them. The target class that receives the highest number of votes is then selected as TEXAI’s estimate for that particular observation. During the evaluation of our results, we employed two distinct approaches. The first approach involved a comparison between the decisions made by the Artificial Neural Network (ANN) and those made by the TEXAI model, using the test datasets. The visual representations provided below illustrate prediction plots made over the “Abalone” dataset, with figure 5.5 representing the predictions of ANN for observations, figure 5.6 representing the predictions of TEXAI for observations, and figure 5.7 displaying the differences between the predictions of ANN and TEXAI. The prediction plots for the wine quality dataset that we have used as secondary dataset are presented in figure 5.8 for ANN predictions, figure 5.9 for TEXAI predictions, and figure 5.10 for the differences between ANN and TEXAI predictions. According to both datasets we used, the results are presented in table 5.3. As can be seen from the results, there is an agreement of almost 94% between the ANN’s and TEXAI’s decisions on the wine dataset, and an agreement of 81% on the abalone dataset. Note that, in the abalone dataset there is also a large number of observations where the TEXAI model couldn’t make a prediction. That is because of the tightness of the TEXAI model constructed. We could easily decrease the number of “no predictions” in favor of the true and false predictions. For each dataset the desired ratio can be obtained by the model builder using the parameters described above. As a secondary approach for evaluation, we conducted individualized assessments for each rule that was created. It was explicitly stated that each of our rules is linked to a target class. 45 Figure 5.5 ANN prediction plots(Abalone) Figure 5.6 TEXAI prediction plots(Abalone) Figure 5.7 ANN vs TEXAI predictions(Abalone) Dataset Test Observation Count Matched with NN Wrong Prediction No Prediction Abalone Dataset 476 386 46 44 Wine Quality Dataset 64 60 4 0 Table 5.3 Metric results of TEXAI 46 Figure 5.8 ANN prediction plots(Wine Quality) Figure 5.9 TEXAI prediction plots(Wine Quality) Figure 5.10 ANN vs TEXAI predictions(Wine Quality) In order to carry out this evaluation, we extracted all the observations that satisfied rule from the test data and subsequently predicted their corresponding target classes using an Artificial Neural Network (ANN) model. The degree of correspondence between the target class of the rule and the predictions made by the ANN were then determined as a percentage. As an example, the observations and ANN predictions that followed a specific rule which obtained from abalone dataset are depicted in figure 5.11. Notably, out of 58 observations that adhered to the specified rule, 55 of them exhibited agreement between the ANN and the rule. This corresponded to a compliance rate of approximately 95%. It is important to consider the possibility that the differences in predictions may also arise due to the “overfitting” issue of ANN. Out of the 203 rules obtained from the Abalone dataset, a total of 153 and, similarly, 47 out of the 219 rules obtained from the wine quality dataset, 162 of them demonstrate a perfect match with the predictions generated by the Artificial Neural Network (ANN) at a 100% level of accuracy. The average compliance rate for all rules generated within the Abalone dataset was estimated to be 97%, while that of the wine quality dataset was 98%. Therefore, we can conclude that each rule is fairly successful in recognizing a specific class. Figure 5.11 ANN predictions for observations which fits one rule (Abalone) Moreover, the TEXAI framework we have established enables us to identify which features hold greater significance in the model’s decision-making process. Hence, TEXAI can also function as a feature importance extraction method. Figure 5.12 shows the number of rules containing each feature for wine quality dataset. For example, in the ”Wine Quality” dataset, where we obtained 219 rules, more than half of the rules contain the color intensity property. Based on this observation, we can infer that the feature of color intensity holds greater significance in the model’s decision-making process compared to other features. 48 Figure 5.12 Feature importance (Wine Quality) 49 6. CONCLUSION The concept of Explainable Artificial Intelligence (XAI) has become a noteworthy area of focus in recent times, mainly because of the surge in the utilization of intricate black box models, such as deep neural networks, in different fields. XAI aims to provide interpretability and transparency to these models, enabling humans to understand