What is fairness in an algorithmic world?
On 1 March 2017, the Information Commissioner’s Office (ICO) released its updated paper on big data, artificial intelligence, machine learning and data protection (2017 Paper). The paper explores the data protection implications of these technologies and the debate around how artificial intelligence and big data may affect individuals; through this, the ICO hopes to set the agenda for the development of market practices.
The ICO recognises the potential benefits of big data, artificial intelligence and machine learning, but emphasises that they should not come at the expense of data protection: people should be treated fairly, and decisions about them should be accurate and free from bias. The ICO discusses criticism of the traditional “notice and choice” model of data protection, whereby information as to data and purposes is given and consent obtained before processing commences. That model is much harder to reconcile with the key features of big data analytics: the use of algorithms, the opacity of processing, the use of new types of data (in particular derived rather than submitted data) and the reversed workflow of big data analytics, which collects data first and identifies new purposes for using it later. The 2017 Paper attempts to navigate a path through the tensions between these features and data protection legislation.
The paper is lengthy and cites many academic and industry articles; for the purposes of this short article, we pick up on the key challenges posed to the fairness requirement and consider how businesses can take steps to address some of these concerns.
Fairness is enshrined in the First Principle of the Data Protection Act 1998 (DP Act) and repeated in Article 5(1)(a) of the General Data Protection Regulation (GDPR). The ICO examines fairness with respect to the effects of the processing on the individuals, their expectations as to how their data will be used and the transparency of the processing.
Effects: One of the key advantages of algorithmic processing of large data sets is that it can identify correlations between seemingly unrelated data points. This can generate efficiencies and new insights but, depending on the purpose, the effects can be unfair. The ICO cites instances where individuals have been tarred by association, for example people’s credit limits being lowered based on the poor repayment histories of other people who shopped at the same shops. The ICO considers that the same type of processing may nonetheless be beneficial at a social level in the insurance industry, to divide people into different risk groupings, even if that means some individuals end up with higher insurance premiums – provided this is in fact a more accurate assessment of the risk.
Care should be taken, however, as we have heard of instances that would not appear to satisfy the “fair and lawful processing” requirement. Towards the end of last year, a paper on AI, robotics, privacy and data protection presented at the 38th International Privacy Conference highlighted biases seen in the use of artificial intelligence: for example, a Carnegie Mellon University study found that an ad-targeting algorithm was discriminatory, showing advertisements for higher-paid jobs to men visiting job sites more often than to women.
Transparency: The 2017 Paper identifies the difficulties experienced with the care.data project as an example of a situation where opacity about the processing of personal data led to a lack of public trust. The complexity of big data analytics, and the fact that in many situations it is not apparent that data is being collected (e.g. mobile phone location data) or how it is being processed, all make it difficult for individuals to understand and assess the processing of their data. This is particularly important where it is unclear how the processing of certain data leads to a particular decision being taken about the individual, for example where social media data is used in credit scoring.
In more traditional uses of data, a business would first define its purposes and then collect and process the data. In the big data model, the data is collected first, processed to reveal correlations and then a purpose is identified. As the original data controller may have generated insights in a completely unrelated business, it may seek to sell its data set or provide its insights as a service. In each case, it would need to ensure that it complies with the DP Act (and in the future, the GDPR), including that such processing of personal data is transparent (e.g. by serving fair processing notices).
Expectations: In terms of individuals’ expectations, the ICO takes quite an ambivalent approach, acknowledging that many people simply provide data because it is the price of using Internet services. The 2017 Paper includes statistics from a number of studies demonstrating varying levels of concern for data privacy among respondents, but across several studies the ICO identifies the themes of “a feeling of resignation despite a general lack of trust, combined with a willingness for data to be used for socially useful purposes”. This is not addressed directly in the 2017 Paper, but it perhaps reveals a fundamental challenge for data protection legislation: individuals judge data processing primarily on its outcomes, as these are understandable and capable of assessment by the lay person. Translating a lengthy privacy notice, the relevant security standards and the various potential disclosures into real effects and outcomes for the individual takes a great deal of effort. This raises the question: how do we as a society assess and police fair outcomes?
Algorithmic transparency: In response, the ICO promotes the concepts of algorithmic accountability and transparency to ensure that artificial intelligence and algorithms developed by machine learning systems work as originally intended and do not produce discriminatory, erroneous or unjustified results. One approach is algorithmic auditing, which requires developers first to incorporate processes into the algorithms to enable an audit, and independent third-party auditors then to carry out regular audits. The ICO compares this to a financial audit, which is carried out in confidence to protect proprietary information but then used to provide public assurance.
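To make the idea concrete, below is a minimal sketch of one check an algorithmic audit might run: measuring whether an automated decision process produces favourable outcomes at markedly different rates for different groups. The data, the group labels and the “four-fifths” threshold (borrowed from US employment practice) are our illustrative assumptions, not anything prescribed by the ICO.

```python
from collections import defaultdict

def disparate_impact_ratio(decisions, groups):
    """Return (ratio, per-group rates): the lowest positive-outcome rate
    divided by the highest across groups."""
    totals, positives = defaultdict(int), defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += decision  # decision is 1 (favourable) or 0
    rates = {g: positives[g] / totals[g] for g in totals}
    return min(rates.values()) / max(rates.values()), rates

# Illustrative audit run over hypothetical automated decisions
decisions = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
ratio, rates = disparate_impact_ratio(decisions, groups)
print(rates)                       # {'A': 0.75, 'B': 0.25}
print(f"flagged: {ratio < 0.8}")   # True: below the assumed four-fifths threshold
```

A real audit would of course cover far more (training data provenance, error rates, drift over time), but even a check this simple illustrates why the algorithm must be designed to expose its decisions for inspection in the first place.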
An alternative would be for the algorithm to contain the functionality to report on its own development. Through natural language generation, the algorithm could produce output text explaining why particular input cases were classified in a certain way. This could be useful in debugging and also as part of a product offering, with the ICO citing products that use visualisation methods to enable users to see why recommendations have been made for them and to review them to create more accurate recommendations.
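As a much-simplified illustration of that kind of self-reporting, the sketch below uses string templates (a stand-in for full natural language generation) to explain how an invented linear scoring model classified a particular input. The feature names, weights and cut-off are all assumptions made for the example.

```python
# Hypothetical linear scoring model; weights and features are invented.
WEIGHTS = {"years_at_address": 0.4, "missed_payments": -1.2, "income_band": 0.8}
CUT_OFF = 1.0  # assumed approval threshold

def explain(applicant):
    """Classify an input case and generate a plain-English explanation."""
    contributions = {f: WEIGHTS[f] * applicant[f] for f in WEIGHTS}
    score = sum(contributions.values())
    decision = "approved" if score >= CUT_OFF else "declined"
    # Rank features by how strongly they pushed the score either way
    ranked = sorted(contributions.items(), key=lambda kv: -abs(kv[1]))
    reasons = "; ".join(
        f"{name} {'raised' if value > 0 else 'lowered'} the score by {abs(value):.1f}"
        for name, value in ranked
    )
    return f"Application {decision} (score {score:.1f}): {reasons}."

print(explain({"years_at_address": 5, "missed_payments": 2, "income_band": 1}))
# Application declined (score 0.4): missed_payments lowered the score by 2.4; ...
```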
Similarly, if users are given access to their profiles, they can review and correct them. Such corrections have a dual benefit: they help demonstrate compliance with the GDPR accuracy principle (and aid transparency), and they improve the algorithm with more accurate data and corrective input from a human.
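A hypothetical sketch of how such a correction loop might be wired up follows: the correction is logged (supporting the accuracy principle) and the corrected profile is queued as human-verified training data. The field names and in-memory stores are placeholders, not a reference design.

```python
from datetime import datetime, timezone

audit_log = []      # record of corrections, evidencing the accuracy principle
training_data = []  # human-verified profiles reused to retrain the model

def apply_correction(profile, field, corrected_value):
    """Apply a user's correction, log it and queue the result for retraining."""
    audit_log.append({
        "user_id": profile["user_id"],
        "field": field,
        "before": profile[field],
        "after": corrected_value,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    profile[field] = corrected_value
    training_data.append(dict(profile))  # snapshot as a corrected example
    return profile

profile = {"user_id": "u123", "inferred_interests": ["golf", "sailing"]}
apply_correction(profile, "inferred_interests", ["golf", "cycling"])
print(profile["inferred_interests"], len(audit_log), len(training_data))
```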
The ICO cites the Big Data Ethics Initiative and other public and private sector organisations that have established ethical principles in relation to the processing of personal data, and views such principles as helpful in addressing the key issues of fairness and transparency, particularly if backed up by an ethics board with the power to enforce them. One example is the simple litmus test: “would you want the data of a member of your family to be used in this way?”
Ultimately, the ICO considers that “developing ethical principles and frameworks for big data is a job for data controllers rather than data protection authorities” (paragraph 181). We have seen these sorts of ethical frameworks being deployed in the healthcare sector where there is mass-processing of sensitive health data, but we expect similar principles to be adopted across other industries as the market develops.
The ICO acknowledges that the focus on accountability will not by itself resolve the data protection issues with big data and algorithmic processing, but it believes it will be a key part of future developments. The ICO emphasises that transparency remains important, and recommends a more “layered” approach that provides information at an appropriate level of detail, depending on when the purposes of collection and processing emerge, and that reflects the sophistication of the reader (for example, regulators should be given a greater level of detail).
Implicit in the 2017 Paper is the acknowledgement that the data protection legislative framework, including the GDPR, may not be sufficient to prevent the misuse of personal data through big data and algorithmic processing. Although the technology and techniques are not themselves new, the inversion of the process (processing data first and defining purposes afterwards) and the mantra that “correlation is king” pose challenges to the current regulatory framework and the requirement of fairness. The ICO believes the best way forward is for industry to establish common standards and oversight processes, including ethical principles. However, fairness is inherently a political concept, involving the weighing of competing societal interests. As Lawrence Lessig famously put it, “code is law”, so this may be an area in which government comes to regulate in the future.
Putting the ICO’s guidance into practice
The ICO recommends a number of steps that businesses can take to address concerns with data processing in a big data and artificial intelligence context:
- Anonymisation: in a big data context, truly anonymising data can be difficult, as pieces of data about an individual that are innocuous in isolation may, when viewed together, be sufficient to identify that individual. However, organisations that are able to use pseudonymous or anonymous data wherever possible will be in a far safer position (see the pseudonymisation sketch after this list)
- Privacy notices: taking an innovative approach to privacy notices helps ensure they are actually noticed and read by users, especially if they are provided in a meaningful format at appropriate stages. Such innovations may include increased use of icons, just-in-time notifications and layered privacy notices
- Privacy impact assessments: these should be embedded into big data processing activities to identify privacy risks as projects develop, with input from a range of key stakeholders (including the potential data subjects)
- Privacy by Design: as required by the GDPR, Privacy by Design is a key consideration for big data projects, and data security, data minimisation and data segregation should be considered at the outset of any big data project
- Transparency and accountability: the changes introduced by the GDPR shift the focus onto data governance. Whether this involves a set of big data principles, an ethics board, auditable algorithms or any other measures, organisations will need to establish clear governance processes and oversight over data processing to ensure accountability and champion transparency.
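On the anonymisation point above, a minimal sketch of one common pseudonymisation technique follows: replacing a direct identifier with a keyed hash (HMAC), with the key held separately from the data. The key, fields and coarsening step are illustrative; note that the output is pseudonymous rather than anonymous, because whoever holds the key can re-identify individuals.

```python
import hashlib
import hmac

# Assumed secret key; in practice this would be stored and managed
# separately from the pseudonymised data set.
SECRET_KEY = b"store-me-separately-from-the-data"

def pseudonymise(identifier: str) -> str:
    """Replace a direct identifier with a consistent keyed-hash token."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

record = {"email": "jane@example.com", "postcode": "SW1A 1AA", "spend": 42.50}
safe_record = {
    "user_token": pseudonymise(record["email"]),     # stable token, no raw email
    "postcode_area": record["postcode"].split()[0],  # coarsen a quasi-identifier
    "spend": record["spend"],
}
print(safe_record)
```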
This article first appeared in Privacy Laws and Business.