On Ethics and Algorithms

Photo by Franck V. on Unsplash

An article on the front page of the Observer, Revealed: how drugs giants can access your health records, caught my eye this week. In summary, the article highlights that the Department of Health and Social Care (DHSC) has been selling the medical data of NHS patients to international drugs companies and has “misled” the public that the information contained in the records would be “anonymous”.

The data in question is collated from GP surgeries and hospitals and, according to “senior NHS figures”, can “routinely be linked back to individual patients’ medical records via their GP surgeries.” Apparently there is “clear evidence” that companies have identified individuals whose medical histories are of “particular interest.” The DHSC has replied by saying it only sells information after “thorough measures” have been taken to ensure patient anonymity.

As with many articles like this it is frustrating when some of the more technical aspects are not fully explained. Whilst I understand the importance of keeping their general readership on board and not frightening them too much with the intricacies of statistics or cryptography it would be nice to know a bit more about how these records are being made anonymous.

There is a hint of this in the Observer report when it states that the CPRD (the Clinical Practice Research Datalink) says the data made available for research was “anonymous” but, following the Observer’s story, it changed the wording to say that the data from GPs and hospitals had been “anonymised”. This is a crucial difference. One of the more common methods of ‘anonymisation’ is to obscure or redact some bits of information. So, for example, a record could have patient names removed and ages and postcodes “coarsened”, that is, only the first part of a postcode (e.g. SW1A rather than SW1A 2AA) is included and ages are placed in a range rather than using someone’s actual age (e.g. 60-70 rather than 63).
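To make the idea of “coarsening” concrete, here is a minimal Python sketch. It is only an illustration of the general technique, not a description of how the CPRD or anyone else actually processes records. The field names, the example record and the ten-year age band are all assumptions of mine, and the postcode handling assumes the usual space between the two halves.

```python
def coarsen_postcode(postcode: str) -> str:
    """Keep only the first (outward) part of a postcode, e.g. 'SW1A 2AA' -> 'SW1A'.
    Assumes the two halves are separated by a space."""
    return postcode.strip().upper().split()[0]


def coarsen_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a range, e.g. 63 -> '60-70'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width}"


def anonymise(record: dict) -> dict:
    """Drop the name entirely and coarsen postcode and age.
    The field names here are hypothetical."""
    return {
        "postcode": coarsen_postcode(record["postcode"]),
        "age_range": coarsen_age(record["age"]),
        "diagnosis": record["diagnosis"],
    }


# An invented record, purely for illustration.
record = {"name": "A. Patient", "postcode": "SW1A 2AA", "age": 63, "diagnosis": "asthma"}
print(anonymise(record))
```

Note that the diagnosis is passed through untouched, which is exactly the kind of detail that can later be matched against other data sources.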

The problem with anonymising data records is that they are prone to what is referred to as data re-identification or de-anonymisation. This is the practice of matching anonymous data with publicly available information in order to discover the individual to which the data belongs. One of the more famous examples of this is the competition that Netflix organised encouraging people to improve its recommendation system by offering a $1 million grand prize for a 10% improvement. The Netflix Prize was started in 2006 and awarded in 2009, but a planned sequel was abandoned in 2010 in response to a lawsuit and Federal Trade Commission privacy concerns. Although the dataset released by Netflix to allow competition entrants to test their algorithms had supposedly been anonymised (i.e. by replacing user names with a meaningless ID and not including any gender or zip code information), researchers from the University of Texas were able to find out the real names of some people in the supplied dataset by cross-referencing the Netflix dataset with Internet Movie Database (IMDb) ratings, which people post publicly using their real names.
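The cross-referencing idea can be sketched in a few lines of Python. This is a toy illustration only: the records, names and the `reidentify` function are invented by me, and it uses exact matching on film, rating and date, which is a simplification rather than the method used against the real Netflix data.

```python
from collections import Counter

# Invented "anonymised" ratings: user names replaced by meaningless IDs.
anonymised = [
    {"user_id": 101, "film": "Film A", "rating": 5, "date": "2005-03-01"},
    {"user_id": 101, "film": "Film B", "rating": 2, "date": "2005-03-04"},
    {"user_id": 102, "film": "Film A", "rating": 3, "date": "2005-06-10"},
    {"user_id": 102, "film": "Film C", "rating": 4, "date": "2005-06-11"},
]

# Invented public reviews, posted under a real name.
public = [
    {"name": "Alice Example", "film": "Film A", "rating": 5, "date": "2005-03-01"},
    {"name": "Alice Example", "film": "Film B", "rating": 2, "date": "2005-03-04"},
]


def reidentify(anonymised, public, min_matches=2):
    """Link an anonymous ID to a public name when enough ratings coincide."""
    public_index = {(r["film"], r["rating"], r["date"]): r["name"] for r in public}
    matches = Counter()
    for row in anonymised:
        name = public_index.get((row["film"], row["rating"], row["date"]))
        if name is not None:
            matches[(row["user_id"], name)] += 1
    return {uid: name for (uid, name), n in matches.items() if n >= min_matches}


print(reidentify(anonymised, public))  # {101: 'Alice Example'}
```

Once ID 101 is linked to a name, every other rating stored under that ID is exposed too, including any the person never chose to make public.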

Herein lies the problem with the anonymisation of datasets. As Michael Kearns and Aaron Roth highlight in their recent book The Ethical Algorithm, when an organisation releases anonymised data they can try and make an intelligent guess as to which bits of the dataset to anonymise but it can be difficult (probably impossible) to anticipate what other data sources either already exist or could be made available in the future which could be used to correlate records. This is the reason that the computer scientist Cynthia Dwork has said “anonymised data isn’t” – meaning either it isn’t really anonymous or so much of the dataset has had to be removed that it is no longer data (at least in any useful way).

So what to do? Is it actually possible to release anonymised datasets out into the wild with any degree of confidence that they can never be de-anonymised? Thankfully something called differential privacy, invented by the aforementioned Cynthia Dwork and colleagues, allows us to do just that. Differential privacy is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in that dataset.

To understand how differential privacy works consider this example*. Suppose we want to conduct a poll of all people in London to find out how many have driven after taking non-prescription drugs. One way of doing this is to randomly sample a suitable number of Londoners, asking them if they have ever driven whilst under the influence of drugs. The data collected could be entered into a spreadsheet and various statistics (e.g. number of men, number of women, maybe ages etc.) derived. The problem is that whilst collecting this information lots of compromising personal details may be gathered which, if the data were stolen, could be used against the people who took part.

In order to avoid this problem consider the following alternative. Instead of asking people the question directly, first ask them to flip a coin but not to tell us how it landed. If the coin comes up heads they tell us (honestly) whether they have driven under the influence. If it comes up tails, however, they give us a random answer: they flip the coin again and tell us “yes” if it comes up heads or “no” if it is tails. This polling protocol is a simple randomised algorithm, known as randomised response, which is a form of differential privacy. So how does this work?

If your true answer is no, the randomised response answers no three out of four times. It answers no only one out of four times if your true answer is yes. Diagram courtesy Michael Kearns and Aaron Roth, The Ethical Algorithm 2020

When we ask people if they have driven under the influence using this protocol half the time (i.e. when the coin lands heads up) the protocol tells them to tell the truth. If the protocol tells them to respond with a random answer (i.e. when the coin lands tails up), then half of that time they just happen to randomly tell us the right answer. So they tell us the right answer 1/2 + ((1/2) x (1/2)) or three-quarters of the time. The remaining one quarter of the time they tell us a lie. There is no way of telling true answers from lies. Surely though, this injection of randomisation completely masks the true results and the data is now highly error prone? Actually, it turns out, this is not the case.
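A quick way to convince yourself of the three-quarters figure is to simulate the coin flips. The following Python sketch is my own illustration of the protocol as described here (the function names and the fixed seed are assumptions, and a pseudo-random number generator stands in for the coin).

```python
import random


def randomised_response(truth: bool, rng: random.Random) -> bool:
    """Return the answer a respondent gives under the coin-flip protocol."""
    if rng.random() < 0.5:       # first flip is heads: answer honestly
        return truth
    return rng.random() < 0.5    # first flip is tails: second flip decides yes/no


def fraction_matching_truth(truth: bool, trials: int, seed: int = 0) -> float:
    """Fraction of simulated respondents whose reported answer equals the truth."""
    rng = random.Random(seed)
    same = sum(randomised_response(truth, rng) == truth for _ in range(trials))
    return same / trials


# Both values should come out close to 0.75.
print(fraction_matching_truth(True, 100_000))
print(fraction_matching_truth(False, 100_000))
```

Whatever someone's true answer, what they report matches it about three times in four, and nothing in a single reported answer reveals which case it was.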

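Because we know how this randomisation is introduced we can reverse engineer the answers we get to remove the errors and get an approximation of the right answer. Here’s how. Suppose one-third of people in London have actually driven under the influence of drugs. Of the one-third whose truthful answer is “yes”, three-quarters will answer “yes” using the protocol, that is 1/3 x 3/4 = 1/4. Of the two-thirds whose truthful answer is “no”, one-quarter will report “yes”, that is 2/3 x 1/4 = 1/6. So we expect 1/4 + 1/6 = 5/12 of the population to answer “yes”. More generally, if a fraction p of people have truly driven under the influence, the fraction answering “yes” is (3/4 x p) + (1/4 x (1 - p)) = 1/4 + p/2. Working backwards from an observed 5/12 gives p = 2 x (5/12 - 1/4) = 1/3, the true figure. With a real sample the observed fraction will vary a little by chance, so the result is an estimate rather than an exact value, but it gets more accurate as the sample grows.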
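The arithmetic can be checked exactly with Python's `fractions` module. This is just a sketch of the calculation for the coin-flip protocol described here: the function names are mine, and the formula relies on each person reporting their true answer with probability 3/4.

```python
from fractions import Fraction

# Probability that a respondent reports their true answer under the protocol.
TRUTH_PROB = Fraction(3, 4)


def expected_yes_fraction(true_yes: Fraction) -> Fraction:
    """Fraction of reported 'yes' answers we expect for a given true fraction."""
    return TRUTH_PROB * true_yes + (1 - TRUTH_PROB) * (1 - true_yes)


def estimate_true_fraction(observed_yes: Fraction) -> Fraction:
    """Invert the protocol: observed = 1/4 + true/2, so true = 2 * (observed - 1/4)."""
    return (observed_yes - (1 - TRUTH_PROB)) / (2 * TRUTH_PROB - 1)


observed = expected_yes_fraction(Fraction(1, 3))
print(observed)                          # 5/12
print(estimate_true_fraction(observed))  # 1/3
```

So a reported “yes” rate of 5/12 is what a true rate of 1/3 looks like after the noise is added, and undoing the noise recovers the 1/3 without anyone's individual answer being known.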

So what is the point of doing the survey like this? Simply put it allows the true answer to be hidden behind the protocol. If the data were leaked and an individual from it was identified as being suspected of driving under the influence then they could always argue they were told to say “yes” because of the way the coins fell.

In the real world a number of organisations, including the US Census Bureau, Apple, Google and Privitar (with its Lens product), use differential privacy to limit the disclosure of private information about the individuals whose records are in their databases.

It would be nice to think that the NHS data that is supposedly being used by US drug companies was protected by some form of differential privacy. If it were, and if this could be explained to the public in a reasonable and rational way, then surely we would all benefit from the knowledge that our data is safe and is maybe even being put to good use in protecting and improving our health. After all, wasn’t this meant to be the true benefit of living in a connected society where information is shared for the betterment of all our lives?

*Based on an example from Kearns and Roth in The Ethical Algorithm.

Cummings needs data scientists, economists and physicists (oh, and weirdos)

Dominic Cummings – Image Copyright Business Insider Australia

To answer my (rhetorical) question in this post, I think it’s been pretty much confirmed since the election that Dominic Cummings is, in equal measure, the most influential, disruptive, powerful and dangerous man in British politics right now. He has certainly set the cat amongst the pigeons in this blog post where he has effectively bypassed the civil service recruitment process by advertising for people to join his ever-growing team of SPADs (special advisers). Cummings is looking for data scientists, project managers, policy experts and assorted weirdos to join his team. (Interestingly, today we hear that the self-proclaimed psychic Uri Geller has applied for the job, believing he qualifies because of the super-talented weirdo aspect of the job spec.)

Cummings is famed for his wide-ranging reading tastes and the job spec also cites a number of scientific papers potential applicants “will be considering”. The papers mentioned are broadly in the areas of complex systems and the use of maths and statistics in forecasting, which gives an inkling of the kind of problems Cummings sees as those that need to be ‘fixed’ in the civil service as well as the government at large (including the assertion that “Brexit requires many large changes in policy and in the structure of decision-making”).

Like many of his posts, this particular one tends to ramble and to contradict itself. In one paragraph he’s saying that you “do not need a PhD” but then in the very next one saying you “must have exceptional academic qualifications from one of the world’s best universities with a PhD or MSc in maths or physics.”

Cummings also returns to one of his favourite topics which is that of the failure of projects – mega projects in particular – and presumably those that governments tend to initiate and not complete on time or to budget (or at all). He’s an admirer of some of the huge project successes of yesteryear such as The Manhattan Project (1940s), ICBMs (1950s) and Apollo (1960s) but reckons that since then the Pentagon has “systematically de-programmed itself from more effective approaches to less effective approaches from the mid-1960s, in the name of ‘efficiency’.” Certainly the UK government is no stranger to some spectacular project failures itself both in the past and present (HS2 and Crossrail being two more contemporary examples of not so much failures but certainly massive cost overruns).

However, as John Naughton points out here, “these inspirational projects have some interesting things in common: no ‘politics’, no bureaucratic processes and no legal niceties. Which is exactly how Cummings likes things to be.” Let’s face it, both Crossrail and HS2 would be a doddle if only you could do away with all those pesky planning proposals and environmental impact assessments you have to do and just move people out of the way quickly – sort of how they do things in China maybe?

Cummings believes that now is the time to bring together the right set of people with a sufficient amount of cognitive diversity and work in Downing Street with him and other SPADs to start to address some of the wicked problems of government. One ‘lucky’ person will be his personal assistant, a role which he says will “involve a mix of very interesting work and lots of uninteresting trivia that makes my life easier which you won’t enjoy.” He goes on to say that in this role you “will not have weekday date nights, you will sacrifice many weekends — frankly it will hard having a boy/girlfriend at all. It will be exhausting but interesting and if you cut it you will be involved in things at the age of ~21 that most people never see.” That’s quite some sales pitch for a job!

What this so-called job posting is really about, though, is another of Cummings’ abiding obsessions (which he often discusses in his blog): that the government in general, and the civil service in particular (which he groups together as “SW1”), is basically not fit for purpose because it is scientifically and technologically illiterate as well as being staffed largely with Oxbridge humanities graduates. The posting is also a thinly veiled attempt at pushing the now somewhat outdated ‘move fast and break things’ mantra of Silicon Valley, an approach that does not always play out well in government (Universal Credit, anyone?). I well remember my time working at the DWP (yes, as a consultant) where one of the civil servants with whom I was working said that the only problem with disruption in government IT was that it was likely to lead to riots on the streets if benefit payments were not paid on time. Sadly, Universal Credit has shown us that it’s not so much street riots that are caused but a demonstrable increase in demand for food banks. On average, 12 months after roll-out, food banks see a 52% increase in demand, compared to 13% in areas with Universal Credit for 3 months or less.

Cummings of course would say that the problem is not so much that disruption per se causes problems but rather that the ineffective, stupid and incapable civil servants who plan and deploy such projects are at fault, hence the need for hiring the right ‘assorted weirdos’ who will bring new insights that fusty old civil servants cannot see. Whilst he may well be right that SW1 is lacking in deep technical experts as well as great project managers and ‘unusual’ economists, he needs to realise that government transformation cannot succeed unless it is built on a sound strategy and good underlying architecture. Ideas are just thoughts floating in space until they can be transformed into actions that result in change, and that change has to take into account that the ‘products’ governments deal with are people, not software and hardware widgets.

This problem is far better articulated by Hannah Fry when she says that although maths has, and will continue to have, the capability to transform the world, those who apply equations to human behaviour fall into two groups: “those who think numbers and data ultimately hold the answer to everything, and those who have the humility to realise they don’t.”

Possibly the last words should be left to Barack Obama who cautioned Silicon Valley’s leaders thus:

“The final thing I’ll say is that government will never run the way Silicon Valley runs because, by definition, democracy is messy. This is a big, diverse country with a lot of interests and a lot of disparate points of view. And part of government’s job, by the way, is dealing with problems that nobody else wants to deal with.

So sometimes I talk to CEOs, they come in and they start telling me about leadership, and here’s how we do things. And I say, well, if all I was doing was making a widget or producing an app, and I didn’t have to worry about whether poor people could afford the widget, or I didn’t have to worry about whether the app had some unintended consequences — setting aside my Syria and Yemen portfolio — then I think those suggestions are terrific. That’s not, by the way, to say that there aren’t huge efficiencies and improvements that have to be made.

But the reason I say this is sometimes we get, I think, in the scientific community, the tech community, the entrepreneurial community, the sense of we just have to blow up the system, or create this parallel society and culture because government is inherently wrecked. No, it’s not inherently wrecked; it’s just government has to care for, for example, veterans who come home. That’s not on your balance sheet, that’s on our collective balance sheet, because we have a sacred duty to take care of those veterans. And that’s hard and it’s messy, and we’re building up legacy systems that we can’t just blow up.”

Now that’s a man who shows true humility, something our current leaders (and their SPADs) could do with a little more of, I think.

From Turing to Watson (via Minsky)

This week (Monday 25th) I gave a lecture about IBM’s Watson technology platform to a group of first year students at Warwick Business School. My plan was to write up the transcript of that lecture, with links for references and further study, as a blog post. The following day, when I opened up my computer to start writing the post, I saw that, by a sad coincidence, Marvin Minsky, the American cognitive scientist and co-founder of the Massachusetts Institute of Technology’s AI laboratory, had died only the day before my lecture. Here is that blog post, now updated with some references to Minsky and his pioneering work on machine intelligence.

Marvin Minsky in a lab at MIT in 1968 (c) MIT

First though, let’s start with Alan Turing, sometimes referred to as “the founder of computer science”, who led the team that developed a programmable machine to break the Nazis’ Enigma code, which was used to encrypt messages sent between units on the battlefield during World War 2. The work of Turing and his team was recently brought to life in the film The Imitation Game starring Benedict Cumberbatch as Turing and Keira Knightley as Joan Clarke, the only female member of the code-breaking team.

Alan Turing

Sadly, instead of being hailed a hero, Turing was persecuted for his homosexuality and committed suicide in 1954, having undergone a course of hormonal treatment to reduce his libido rather than serve a term in prison. It seems utterly barbaric and unforgivable that such an action could have been brought against someone who did so much to affect the outcome of WWII. It took nearly 60 years after his death for him to be pardoned: on 24 December 2013 Queen Elizabeth II signed a pardon for Turing, with immediate effect.

In 1949 Turing became Deputy Director of the Computing Laboratory at Manchester University, working on software for one of the earliest computers. During this time he worked in the emerging field of artificial intelligence and proposed an experiment which became known as the Turing test, having observed that: “a computer would deserve to be called intelligent if it could deceive a human into believing that it was human.”

The idea of the test was that a computer could be said to “think” if a human interrogator could not tell it apart, through conversation, from a human being.

Turing’s test was supposedly ‘passed’ in June 2014 when a computer program called Eugene fooled several of its interrogators into believing that it was a 13-year-old boy. There has been much discussion since as to whether this was a valid run of the test, with critics arguing that the so-called “supercomputer” was nothing but a chatbot, a script made to mimic human conversation. In other words, Eugene could in no way be considered intelligent. Certainly not in the sense that Professor Marvin Minsky would have defined intelligence, at any rate.

In the early 1970s Minsky, working with the computer scientist and educator Seymour Papert, began developing the theory he later set out in his 1986 book The Society of Mind, which combined both of their insights from the fields of child psychology and artificial intelligence.

Minsky and Papert believed that there was no real difference between humans and machines. Humans, they maintained, are actually machines of a kind whose brains are made up of many semiautonomous but unintelligent “agents.” Their theory revolutionized thinking about how the brain works and how people learn.

Despite the more widespread availability of apparently intelligent machines, with programs like Apple’s Siri, Minsky maintained that there had been “very little growth in artificial intelligence” in the past decade, saying that current work had been “mostly attempting to improve systems that aren’t very good and haven’t improved much in two decades”.

Minsky also thought that large technology companies should not get involved in the field of AI, saying: “we have to get rid of the big companies and go back to giving support to individuals who have new ideas because attempting to commercialise existing things hasn’t worked very well.”

Whilst much of the early work researching AI certainly came out of organisations like Minsky’s AI lab at MIT, it seems slightly disingenuous to believe that commercialisation of AI, as being carried out by companies like Google, Facebook and IBM, is not going to generate new ideas. The drive for commercialisation (and profit), just like war in Turing’s time, is after all one of the ways, at least in the capitalist world, that innovation is created.

Which brings me nicely to Watson.

IBM Watson is a technology platform that uses natural language processing and machine learning to reveal insights from large amounts of unstructured data. It is named after Thomas J. Watson, the first CEO of IBM, who led the company from 1914 to 1956.

Thomas J. Watson

IBM Watson was originally built to compete on the US television program Jeopardy! On 14th February 2011 IBM entered Watson into a special three-day version of the program where the computer was pitted against two of the show’s all-time champions. Watson won by a significant margin. So what is the significance of a machine winning a game show and why is this a “game changing” event in more than the literal sense of the term?

Today we’re in the midst of an information revolution. Not only is the volume of data and information we’re producing dramatically outpacing our ability to make use of it but the sources and types of data that inform the work we do and the decisions we make are broader and more diverse than ever before. Although businesses are implementing more and more data driven projects using advanced analytics tools they’re still only reaching 12% of the data they have, leaving 88% of it to go to waste. That’s because this 88% of data is “invisible” to computers. It’s the type of data that is encoded in language and unstructured information, in the form of text, that is books, emails, journals, blogs, articles, tweets, as well as images, sound and video. If we are to avoid such a “data waste” we need better ways to make use of that data and generate “new knowledge” around it. We need, in other words, to be able to discover new connections, patterns, and insights in order to draw new conclusions and make decisions with more confidence and speed than ever before.

For several decades we’ve been digitizing the world; building networks to connect the world around us. Today those networks connect not just traditional structured data sources but also unstructured data from social networks and increasingly Internet of Things (IoT) data from sensors and other intelligent devices.

From Data to Knowledge

These additional sources of data mean that we’ve reached an inflection point at which the sheer volume of information generated is so vast that we no longer have the ability to use it productively. The purpose of cognitive systems like IBM Watson is to process the vast amounts of information that are stored in both structured and unstructured formats to help turn it into useful knowledge.

There are three capabilities that differentiate cognitive systems from traditional programmed computing systems.

  • Understanding: Cognitive systems understand like humans do, whether that’s through natural language or the written word; vocal or visual.
  • Reasoning: They can not only understand information but also the underlying ideas and concepts. This reasoning ability can become more advanced over time. It’s the difference between the reasoning strategies we used as children to solve mathematical problems, and then the strategies we developed when we got into advanced math like geometry, algebra and calculus.
  • Learning: They never stop learning. As a technology, this means the system actually gets more valuable with time. They develop “expertise”. Think about what it means to be an expert: it’s not about executing a mathematical model. We don’t consider our doctors to be experts in their fields because they answer every question correctly. We expect them to be able to reason and be transparent about their reasoning, and expose the rationale for why they came to a conclusion.

The idea of cognitive systems like IBM Watson is not to pit man against machine but rather to have both reasoning together. Humans and machines have unique characteristics and we should not be looking for one to supplant the other but for them to complement each other. Working together with systems like IBM Watson, we can achieve the kinds of outcomes that would never have been possible otherwise.

IBM is making the capabilities of Watson available as a set of cognitive building blocks delivered as APIs on its cloud-based, open platform Bluemix. This means you can build cognition into your digital applications, products, and operations, using any one or combination of a number of available APIs. Each API is capable of performing a different task, and in combination, they can be adapted to solve any number of business problems or create deeply engaging experiences.

So what Watson APIs are available? Currently there are around forty which you can find here together with documentation and demos. Four examples of the Watson APIs you will find at this link are:

  • Dialog: Use natural language to automatically respond to user questions.
  • Visual Recognition: Analyses the contents of an image or video and classifies by category.
  • Text to Speech: Synthesize speech audio from an input of plain text.
  • Personality Insights: Understand someone’s personality from what they have written.

It’s never been easier to get started with AI by using these cognitive building blocks. I wonder what Turing would have made of this technology and how soon someone will be able to pin together current and future cognitive building blocks to really pass Turing’s famous test?