ECT: AN OFFLINE METRIC TO EVALUATE CASE-BASED EXPLANATIONS FOR RECOMMENDATION SYSTEMS
Interpretability. Recommendation systems. Black-box explanations. Offline evaluation metric
Recommender systems suggest items that best fit users’ interests and tastes. They are complex systems, often designed as black-box models. However, it is well known that understanding recommendations is essential for building users’ trust and engagement in these systems, and explanations for recommendations are one way to achieve this goal. An explanation is a piece of information displayed to users that explains why a particular item is recommended. Case-based explanation is one of the most commonly used explanation styles in recommender systems. This approach provides explanations in the form of previously liked items that were used to make the recommendation, such as: “Because you liked A, we recommend B”. The quality of methods that generate explanations can be measured through online and/or offline evaluations. Online evaluations require direct interaction with users, whereas offline evaluations require only historical data. Online evaluation can address more aspects of the explanations and produces more complete results, so it is encouraged, but tests involving users are not always feasible. In many scenarios, offline evaluation is preferred because it does not degrade users’ experience with poor explanations, and it is also easier to implement than its online counterpart. Few studies have addressed offline metrics for measuring the quality of case-based explanation methods, which therefore remains an ill-defined problem. To fill this gap in the literature, we propose a new offline metric, ECT (Ease-of-interpretation, Coverage, Triviality), to evaluate more complex aspects of case-based explanation methods. ECT is built from three sub-metrics: one previously proposed (Coverage) and two new ones (Ease-of-interpretation and Triviality).
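To make the case-based style concrete, the following is a minimal sketch of how an explanation of the form “Because you liked A, we recommend B” could be produced. It is a hypothetical illustration, not the method evaluated in this work and not the ECT metric: it assumes each item is represented by a sparse vector of user ratings and picks, among the user’s liked items, the one with the highest cosine similarity to the recommended item. The names `cosine`, `explain`, and `item_vectors` are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {key: value} dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def explain(recommended, liked_items, item_vectors):
    """Hypothetical case-based explainer.

    Returns "Because you liked A, we recommend B", where A is the liked item
    most similar to the recommended item B, or None if no liked item has a
    positive similarity to B.
    """
    best_item, best_sim = None, 0.0
    for item in liked_items:
        sim = cosine(item_vectors[item], item_vectors[recommended])
        if sim > best_sim:
            best_item, best_sim = item, sim
    if best_item is None:
        return None
    return f"Because you liked {best_item}, we recommend {recommended}"


# Toy data (assumed): each item maps users to the rating they gave it.
item_vectors = {
    "A": {"u1": 5, "u2": 4},
    "B": {"u1": 4, "u2": 5},
    "C": {"u3": 3},
}

# A shares raters with B, C does not, so A is chosen as the explaining case.
print(explain("B", ["A", "C"], item_vectors))  # Because you liked A, we recommend B
```

Cosine similarity over co-rating vectors is only one possible choice; any item-to-item similarity used by the underlying recommender could take its place.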
Experiments were conducted in offline and online scenarios to validate the discriminative power of the proposed ECT metric. The results show that ECT presents a 0.67 correlation with what users consider good explanations for their recommendations.