Dealing with bias and discrimination in learning analytics models
Data-driven decision making, nowadays powered mainly by Machine Learning and (Big) Data, entails a risk of discrimination, as has already been shown in a variety of applications from search engines to targeted advertising and face recognition systems. In this project, we consider bias to be “the inclination or prejudice of a decision made by a data-driven decision-making system which is for or against one person or group, especially in a way considered to be unfair”. Bias and discrimination are old problems, and “it is human nature for members of the dominant majority to be oblivious to the experiences of other groups”¹. However, data-driven decision making may magnify such pre-existing biases and even give rise to new types of bias, leading to worse discrimination effects. The domain of fairness-aware machine learning has recently emerged to address the issues of fairness and discrimination in Machine Learning (Ntoutsi et al., 2019). In particular, methods have been proposed that focus on understanding bias in the data and/or model results, mitigating bias at different phases of the data-analysis pipeline (from training data to learning algorithms and model outputs), and accounting for bias via bias-aware data collection, explanation of model results, etc.
In this project, we will focus on issues of bias and discrimination in the learning analytics domain, which have gained attention recently² as predictive modelling has been deployed in different learning contexts, from traditional classrooms to MOOCs. For example, Gardner et al. (2019) analysed MOOC data and showed that model discrimination is related to the course gender imbalance, the course curricular area, and the individual course itself. We will focus in particular on data from the STEM domain, mainly from the Physics domain.
Particular goals of the project include:
- Understanding sources of bias in the learning analytics setting, in particular sociotechnical sources of bias. As an example, the systems used for data collection are created by humans, and whatever biases exist in their creators may enter such systems and be reflected in the data.
- Understanding how bias is manifested in the data, for example, through sensitive features and their causal influences, as well as through under-/over-representation of certain groups.
- Measuring discrimination in data/model results. Formalizing fairness is a hard task, and this is reflected in the variety of fairness definitions proposed over the years. In the computer science domain alone, there are more than 20 different definitions of fairness (Verma & Rubin, 2018), from statistical parity to equal opportunity, equalized odds and counterfactual fairness, to mention only the most popular ones. We will investigate the utility of these measures for the learning-analytics setting.
- Mitigating bias: We will investigate different approaches to bias elimination, from interventions on the input data (the so-called pre-processing approaches, e.g., Luong et al., 2011), to tweaking of the learning algorithms (the so-called in-processing approaches, e.g., Zafar et al., 2017), to interventions on the model outputs (the so-called post-processing approaches, e.g., Hardt et al., 2016), as well as hybrid approaches that tackle bias across the whole data-analytics pipeline (e.g., Iosifidis & Ntoutsi, 2019). Most of the proposed approaches focus on the fully-supervised learning scenario; depending on the data challenges of the learning-analytics domain, we will also investigate approaches for unsupervised and semi-supervised learning.
- Objective and extensive evaluation of the proposed methods in the learning-analytics context, especially on data from the Physics and, more broadly, the STEM domain. In particular, we are interested not only in one-shot evaluation but also in studying the long-term effects of fairness-aware interventions on individual students and the class.
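To make the fairness measures listed above concrete, the following sketch computes the three most common group-level gaps (statistical parity, equal opportunity, and the false-positive-rate gap that, together with the true-positive-rate gap, constitutes equalized odds) from model predictions. The function names and the toy data are ours, purely for illustration:

```python
def rate(values):
    """Fraction of positive predictions in a list (0.0 if empty)."""
    return sum(values) / len(values) if values else 0.0

def fairness_measures(y_true, y_pred, group):
    """Gaps between the non-protected (group 0) and protected (group 1) group."""
    def preds(g, cond=lambda t: True):
        # Predictions for instances of group g whose true label satisfies cond.
        return [p for t, p, s in zip(y_true, y_pred, group) if s == g and cond(t)]

    # Statistical parity: difference in positive-prediction rates.
    spd = rate(preds(0)) - rate(preds(1))
    # Equal opportunity: difference in true positive rates (among y_true == 1).
    eod = rate(preds(0, lambda t: t == 1)) - rate(preds(1, lambda t: t == 1))
    # FPR gap (among y_true == 0); equalized odds requires both gaps to vanish.
    fpr_gap = rate(preds(0, lambda t: t == 0)) - rate(preds(1, lambda t: t == 0))
    return {"statistical_parity": spd, "equal_opportunity": eod, "fpr_gap": fpr_gap}

# Hypothetical toy data: 8 students, two groups.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group  = [0, 0, 0, 0, 1, 1, 1, 1]
print(fairness_measures(y_true, y_pred, group))
# → {'statistical_parity': 0.5, 'equal_opportunity': 0.5, 'fpr_gap': 0.5}
```

A value of 0 indicates parity between the groups for the respective measure; here all three gaps disfavour the protected group.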
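As an illustration of a pre-processing intervention, a classic approach is reweighing in the spirit of Kamiran & Calders: each training instance receives a weight so that, under the weighted distribution, the protected attribute is statistically independent of the label. This is a minimal sketch on hypothetical data, not the method of any specific reference above:

```python
from collections import Counter

def reweigh(groups, labels):
    """Per-instance weights w(s, y) = P(s) * P(y) / P(s, y):
    up-weights group/label combinations that are under-represented
    relative to independence, down-weights over-represented ones."""
    n = len(labels)
    count_group = Counter(groups)           # counts per protected-attribute value
    count_label = Counter(labels)           # counts per label value
    count_joint = Counter(zip(groups, labels))
    weights = []
    for s, y in zip(groups, labels):
        expected = (count_group[s] / n) * (count_label[y] / n)  # if independent
        observed = count_joint[(s, y)] / n
        weights.append(expected / observed)
    return weights

# Toy data: group 0 mostly gets the positive label, group 1 mostly the negative.
print(reweigh([0, 0, 0, 1, 1, 1], [1, 1, 0, 1, 0, 0]))
# → [0.75, 0.75, 1.5, 1.5, 0.75, 0.75]
```

The resulting weights can then be passed to any learner that accepts instance weights, leaving the learning algorithm itself untouched.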
¹ Fei-Fei Li, Chief Scientist for AI at Google and Professor at Stanford, http://fortune.com/longform/ai-bias-problem/.
² Don’t ignore the ethics of learning analytics, https://wonkhe.com/blogs/dont-ignore-the-ethics-of-learning-analytics/
- Ntoutsi, E., et al. (2019). “Bias in Data-driven AI-systems – An Introductory Survey”. WIREs Data Mining and Knowledge Discovery (accepted 31/12/2019).
- Verma, S., & Rubin, J. (2018). “Fairness definitions explained”. In FairWare@ICSE (pp. 1–7). ACM.
- Luong, B. T., Ruggieri, S., & Turini, F. (2011). “k-NN as an implementation of situation testing for discrimination discovery and prevention”. In KDD (pp. 502–510). ACM.
- Hardt, M., Price, E., & Srebro, N. (2016). “Equality of opportunity in supervised learning”. In NIPS (pp. 3315–3323).
- Zafar, M. B., Valera, I., Gomez-Rodriguez, M., & Gummadi, K. P. (2017). “Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment”. In WWW (pp. 1171–1180). ACM.
- Iosifidis, V., & Ntoutsi, E. (2019). “AdaFair: Cumulative Fairness Adaptive Boosting”. In CIKM. ACM.
- Gardner, J., Brooks, C., & Baker, R. (2019). “Evaluating the Fairness of Predictive Student Models Through Slicing Analysis”. In LAK. ACM. https://dl.acm.org/citation.cfm?id=3303791