Machines like us? – Part I

From The Secret of the Machines, Artist unknown

Our ambitions run high and low – for a creation myth made real, for a monstrous act of self-love. As soon as it was feasible, we had no choice but to follow our desires and hang the consequences.

Ian McEwan, Machines Like Me

I know what you’re thinking – not yet another post on ChatGPT! Haven’t enough words been written (or machine-generated) on this topic in the last few months to make the addition of any more completely unnecessary? What else is there to possibly say?

Well, we’ll see.

First, just in case you have been living in a cave in North Korea for the last year, what is ChatGPT? Let’s ask it…

ChatGPT is an AI language model developed by OpenAI. It is based on the GPT (Generative Pre-trained Transformer) architecture, specifically GPT-3.5. GPT-3.5 is a deep learning model that has been trained on a diverse range of internet text to generate human-like responses to text prompts.

ChatGPT response to the question: “What is ChatGPT”.

In this post, I am not interested in what use cases ChatGPT is or is not good for. I’m not even particularly interested in what jobs ChatGPT is going to replace in the coming years. Let’s face it, if the CEO of IBM, Arvind Krishna, is saying “I could easily see 30 per cent of [non-customer-facing roles] getting replaced by AI and automation over a five-year period” then many people are already going to be worried, so I’m not going to add to those fears.

I see much of what Krishna predicts as inevitable. Unless the world takes note of the recent letter from the tech/AI ‘great and the good’ (which appears to have some fake signatories anyway) then the simple fact is that if you can algorithmically capture a solution to a problem, people who solve those problems for a living are probably going to get replaced by such algorithms.

What interests me is whether deep learning models like ChatGPT really are intelligent and, more significantly, whether they are creative. In this post I’ll consider intelligence and, in a later post, creativity.

It turns out that intelligence is actually quite a hard thing to define. As Max Tegmark, president of the Future of Life Institute, says: “there’s no agreement on what intelligence is even amongst intelligent intelligence researchers!” Helpfully however, in his book Life 3.0, Tegmark puts us out of our misery and defines intelligence as the: “ability to accomplish complex goals”. This, as even Tegmark acknowledges, is a very broad definition. He also claims this definition subsumes the ability to “apply knowledge and skills” as this could be a goal one sets oneself.

Having defined intelligence it then becomes relatively easy, for Tegmark at least, to define artificial intelligence (AI) as: “non-biological intelligence”; general intelligence as the: “ability to accomplish virtually any goal, including learning”; artificial general intelligence (AGI) as the: “ability to accomplish any cognitive task at least as well as humans”; and superintelligence as: “general intelligence far beyond human level”.

So, given the relatively low bar these definitions set, where would we say ChatGPT and its ilk currently sit on the intelligence scale?

I guess this really hinges on whether you think the multitude of tasks that these deep learning models can perform constitutes “accomplishing virtually any goal” or not. For many people (especially, as we’ll see, AI researchers) the answer is probably yes.

But let’s ask ChatGPT if it thinks it’s an AGI?

It is important to note that as an AI language model, I am an instantiation of artificial general intelligence (AGI) and can provide insights based on my training up until September 2021.

Partial ChatGPT response to the question: “Given Max Tegmark’s definition of intelligence where would you say you are on the various intelligence levels he proposes?”.

Personally, and up until a few weeks ago, I would have said ChatGPT was getting a bit above itself to say it was an “instantiation” of an AGI but then I read an interview with Jaron Lanier titled How humanity can defeat AI.

Lanier works for Microsoft and is the author of a number of what you might call anti-social media books including You Are Not A Gadget and Ten Arguments For Deleting Your Social Media Accounts Right Now.

Lanier’s argument in this interview is that we have got AI wrong and we should not be treating it as a new form of intelligence at all. Indeed he has previously stated there is no AI. Instead Lanier reckons we have built a new and “innovative form of social collaboration”. Like the other social collaboration platforms that Lanier has argued we should all leave because they have gone horribly wrong, this new form too could become perilous if we don’t design it well. In Lanier’s view, therefore, the sooner we understand there is no such thing as AI, the sooner we’ll start managing our new technology intelligently and learn how to use it as a collaboration tool.

Whilst all of the above is well intentioned, the really insightful moment for me came when Lanier discussed Alan Turing’s famous test for intelligence. Let me quote directly what Lanier says.

You’ve probably heard of the Turing test, which was one of the original thought-experiments about artificial intelligence. There’s this idea that if a human judge can’t distinguish whether something came from a person or computer, then we should treat the computer as having equal rights. And the problem with that is that it’s also possible that the judge became stupid. There’s no guarantee that it wasn’t the judge who changed rather than the computer. The problem with treating the output of GPT as if it’s an alien intelligence, which many people enjoy doing, is that you can’t tell whether the humans are letting go of their own standards and becoming stupid to make the machine seem smart.

Jaron Lanier, How humanity can defeat AI, UnHerd, May 8th 2023

There is no doubt that we are in great danger of believing whatever bullshit GPTs generate. The past decade or so of social media growth has illustrated just how difficult we humans find it to handle misinformation, and these new and wondrous machines are only going to make that task even harder. This, coupled with the problem that our education system seems to reward the regurgitation of facts rather than developing critical thinking skills, is, as journalist Kenan Malik says, increasingly going to become more of an issue as we try to figure out what is fake and what is true.

Interestingly, around the time Lanier was saying “there is no AI”, the so-called “godfather of AI”, Geoffrey Hinton, was announcing he was leaving Google because he was worried that AI could become “more intelligent than humans and could be exploited by ‘bad actors’”. Clearly, as someone who created the early neural networks that were the predecessors to the large language models GPTs are built on, Hinton could not be described as being “stupid”, so what is going on here? Like others before him who think AI might be exhibiting signs of becoming sentient, maybe Hinton is being deceived by the very monster he has helped create.

So what to do?

Helpfully Max Tegmark, somewhat tongue-in-cheek, has suggested the following rules for developing AI (my comments are in italics):

  • Don’t teach it to code: this facilitates recursive self-improvement – ChatGPT can already code.
  • Don’t connect it to the internet: let it learn only the minimum needed to help us, not how to manipulate us or gain power – ChatGPT was certainly connected to the internet to learn what it already knows.
  • Don’t give it a public API: prevent nefarious actors from using it within their code – OpenAI is releasing a public API.
  • Don’t start an arms race: this incentivizes everyone to prioritize development speed over safety – I think it’s safe to say there is already an AI arms race between the US and China.

Oh dear, it’s not going well is it?

So what should we really do?

I think Lanier is right. Like many technologies that have gone before, AI is seducing us into believing it is something it is not – even, it seems, to its creators. Intelligent it may well be, at least by Max Tegmark’s very broad definition of what intelligence is, but let’s not get ahead of ourselves. Whilst I agree (and definitely fear) AI could be exploited by bad actors it is still, at a fundamental level, little more than a gargantuan mash up machine that is regurgitating the work of the people who have written the text and created the images it spits out. These mash ups may be fooling many of us some of the time (myself included) but we must not be fooled into losing our critical thought processes here.

As Ian McEwan points out, we must be careful we don’t “follow our desires and hang the consequences”.

My Take on Web3 and THAT Letter

Anyone following the current Web3/cryptocurrency/NFT debate will know that last week 26 computer scientists, software engineers and technologists ‘penned’ a letter to various U.S. Congressional leaders warning them of the risks of a “technology that is not built for purpose and will remain forever unsuitable as a foundation for large-scale economic activity”.

The letter urged the recipients to “resist pressure from digital asset industry financiers, lobbyists, and boosters” and to take an approach that ensures “the technology is deployed in genuine service to the needs of ordinary citizens”.

This is quite an explosive claim and one that has, not unexpectedly, drawn the fire (and the ire) of the Web3 diehards. Some of the less inflammatory comments include:

  • Their professional work has nothing to do with cryptocurrencies, blockchain or finance so so I’m not seeing why they’re a signatory.
  • They don’t even have real tech experts they are a joke. Much like… who claims to be a “software engineer” but is spreading an insane amount of disinformation.
  • Many liars like you… making “assumptions” and “guesses” on something you just don’t understand at all.
  • … doesn’t want us all to know what a clown they are.
  • Why don’t you setup a debate and make your points with crypto community.. instead of blatantly spreading half-truths about crypto and their use cases. It’s such a shame that instead of becoming a topic of discussion, you guys want to turn it into us vs them.

Whilst I don’t claim to have the tech credentials of the group who signed this letter, as a former software engineer and software architect with some experience of permissioned blockchains (in a previous life I worked with Hyperledger Fabric) as well as a healthy interest in “responsible tech”, I do feel duty-bound to weigh in here.

First off I absolutely applaud the intent of this letter. I especially agree with “Not all innovation is unqualifiedly good; not everything that we can build should be built”. As a long time advocate (and practitioner) of teaching ethics as part of technology courses I truly believe that all technologists should at least have a basic understanding of value-sensitive design when building new products and services; especially those that have a large software component (and what doesn’t these days).

I also agree with the statement that a blockchain based Web3 is very much a “solution in search of a problem”. To understand why this is the case consider the origins of Bitcoin, still the dominant and arguably most successful use of blockchain to date. Bitcoin was proposed in 2008, at the height of the financial crisis, with the intent of being “a purely peer-to-peer version of electronic cash [that] would allow online payments to be sent directly from one party to another without going through a financial institution”. In other words, the use case for Bitcoin was to do away with banks and other financial institutions. Given the historical context of the time this may have seemed like a ‘good thing’; however, the underlying intent was really to remove the trust that those failing institutions were meant to provide by encoding it in software instead. But is throwing tech at what is basically human and/or systemic failure a good thing, and is it ever going to work out well? As Bruce Schneier (one of the signatories to the letter) says: “What blockchain does is shift some of the trust in people and institutions to trust in technology. You need to trust the cryptography, the protocols, the software, the computers and the network”.
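To make that “trust in technology” point concrete, here is a toy Python sketch of the core idea behind a blockchain ledger: each block commits to the previous one via a hash, so any tampering with history is detectable by anyone who trusts the hash function rather than an institution. This is a deliberately simplified illustration of the principle only – there is no proof-of-work, no peer-to-peer network and no smart contracts here.

import hashlib
import json

def block_hash(block):
    # Hash a block's contents, including the hash of the block before it.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def add_block(chain, data):
    # Append a block that commits to the current tip of the chain.
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"data": data, "prev_hash": prev})

def verify(chain):
    # Anyone can re-check the links without trusting whoever holds the ledger.
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

ledger = []
add_block(ledger, "Alice pays Bob 5")
add_block(ledger, "Bob pays Carol 2")
print(verify(ledger))                       # True
ledger[0]["data"] = "Alice pays Bob 500"    # tamper with history
print(verify(ledger))                       # False - the altered block no longer matches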

Put another way, is building (or trying to build) systems that negate the need for trust in human interactions the correct and ethical thing to do? When we try to use technology to patch up business, regulatory and societal problems then surely our moral compass has become seriously damaged. Yes, it’s a problem but I am not convinced the solution is an append-only, immutable ledger with smart contracts having the final say in what can and cannot be added to the ledger. Maybe the greed and immoral behaviour of the banks is what should really be addressed?

Web3 is sometimes erroneously referred to as the “new Internet” when it is, at best, an iteration of the current Internet’s application layer, adding features such as immutability, decentralisation and smart contracts into the mix. Web3 advocates claim this will lead to a new nirvana that will finally allow content creators to break free from the chains of the Web 2.0 social networks, allowing everyone to be recompensed for their work and their art in tokens or cryptocurrencies. For those less enthusiastic, Web3 is a techno-libertarian’s wet dream which will be no more decentralised, community-driven, secure, and private than anything else that is VC funded.

Whilst it is now de facto the case that Webs 1 and 2 are more or less entirely controlled by a few gatekeeper companies (Google, Facebook, Amazon, Netflix et al) we need to ask who owns or builds the infrastructure that Web3 will depend on, and how these current gatekeepers are somehow suddenly going to disappear. Someone still has to build the servers and the chips that go in them, the routers, firewalls and networks that allow the servers to talk to each other, and write and operate the software that glues all this hardware together. Is all of this just going to disappear in Web3 or become open source? No, like it or not, it is going to continue to be controlled by the same large corporations. In addition we are also going to get the new Web3 corporations that are being formed right now by the likes of Jack Dorsey, Balaji Srinivasan, Peter Thiel and Marc Andreessen, who are all pumping their millions into this utopianist scheme. They are seeking to control and own Web3, just as they came to own its predecessors, and don’t really care who is going to get hurt in the process.

The notion that in Web3, users and builders alike can earn money and make a good living is pure fantasy. Sure, there are a few well publicised cases of people selling artwork as NFTs, but these are either established artists who already have large followings or part of elaborate whitewashing schemes that help the already-rich and well-connected (mostly white male collectors collecting other white men) to profit. NFTs thus serve as little more than a speculative finance instrument that will ultimately crash and burn like all other Ponzi schemes.

We should all be concerned when a small (relatively speaking) group of people are dictating what our society will be like without the majority either understanding or, even worse, caring. A bit like some of the side effects of the web, we’ll only realise when it’s too late (yes, I’m looking at you Facebook/Meta).

But, back to that letter. Although I agree wholeheartedly with its intent, what I doubt is the ability of those to whom it has been sent to actually do anything about the problem. The letter implores the (US) leaders to “take a truly responsible approach to technological innovation” but is this really going to cut the mustard? After all, these same leaders cannot even control guns in their own country, so what chance is there of controlling a technology that I imagine most have little or no understanding of?

Further, even if something could be done in the US what chance for the rest of the world? After all, blockchains are hardly just a US phenomenon. The tech hegemony enjoyed by the US is ending and the likes of China and Russia are equally capable of building blockchains. Whilst I agree that we do indeed need to “act now” to protect ourselves this needs to be at a global, not just a US, level. Responsible technologists all over the world need to be highlighting the negative impacts of permissionless blockchains and not just guiding their leaders in how to deal with them but explaining to everyone else what the potential downsides of such technology could be.

In 1939 the scientist Albert Einstein wrote to President Roosevelt warning him of a different technology issue: nuclear fission, and the fact that Germany might be working on a new weapon that utilised it, the atom bomb. This letter led to the US creating the Manhattan Project, which resulted in it developing its own bomb. As we now know, the final result of that letter was not great, in that it led to the US detonating two such bombs over Hiroshima and Nagasaki in Japan.

In 1939 when Einstein wrote his letter the US had both the power and the money to go it alone in addressing that particular issue. In today’s interconnected world, where America’s power is on the wane, that is no longer possible. What is needed instead is a global initiative whereby new technologies that could fundamentally reshape our world in a negative way are thoroughly vetted and assessed before they are released on its unsuspecting citizens, for it is they, not the VCs who can afford to splash their billions on these high-risk ventures, who will be the ultimate losers.

So what would I actually do? Three things.

  1. Tech leaders around the world should lobby their political representatives on the potential dangers of Web3 if left to market forces and technologists to design and build.
  2. Everyone needs to educate themselves on at least the basics of this technology as well as the benefits and the dangers.
  3. Education institutions at all levels should instigate basic ethics programmes that teach young people the critical thinking skills needed to understand the potential impacts of technology on their lives to help them decide if that is the kind of world they want to grow up in.

Am I being idealistic? Maybe. At least though what this letter, and hopefully others like it, will do is open up the discussion which we all need to have if we want to have some influence on the way this life-changing technology will affect us and our children.

Should we worry about those dancing robots?

Image Copyright Boston Dynamics

The robots in question are the ones built by Boston Dynamics who shared this video over the holiday period.

For those who have not been watching the development of this company’s robots, we get to see the current ‘stars’ of the BD stable, namely: ‘Atlas’ (the humanoid robot), ‘Spot’ (the ‘dog’, who else?) and ‘Handle’ (the one on wheels) all coming together for a nice little Christmassy dance.

(As an aside, if you didn’t quite get what you wanted from Santa this year, you’ll be happy to know you can have your very own ‘Spot’ for a cool $74,500.00 from the Boston Dynamics online shop).

Boston Dynamics is an American engineering and robotics design company founded in 1992 as a spin-off from the Massachusetts Institute of Technology. Boston Dynamics is currently owned by the Hyundai Motor Group (since December, 2020) having previously been owned by Google X and SoftBank Group, the Japanese multinational conglomerate holding company.

Before I get to the point of this post, and attempt to answer the question posed by it, it’s worth knowing that five years ago the US Marine Corps, working with Boston Dynamics who were under contract with DARPA, decided to abandon a project to build a “robotic mule” that would carry heavy equipment for the Marines because the Legged Squad Support System (LS3) was too noisy. I mention this for two reasons: 1) that was five years ago, a long time in robotics/AI/software development terms and 2) that was a development we were actually told about – what about all those other classified military projects that BD may very well be participating in? More on this later.

So back to the central question: should we worry about those dancing robots? My answer is a very emphatic ‘yes’, for three reasons.


Reason Number One: It’s a “visual lie”

The first reason is nicely summed up by James J. Ward, a privacy lawyer, in this article. Ward’s point, which I agree with, is that this is an attempt to convince people that BD’s products are harmless and pose no threat because robots are fun and entertaining. Anyone who’s been watching too much Black Mirror should just chill a little and stop worrying. As Ward says:

“The real issue is that what you’re seeing is a visual lie. The robots are not dancing, even though it looks like they are. And that’s a big problem”.

Ward goes on to explain that when we watch this video and see these robots appearing to experience the music, the rhythmic motion, the human-like gestures, we naturally start to feel the joyfulness and exuberance of the dance with them. The robots become anthropomorphised and we start to feel we should love them because they can dance, just like us. This, however, is dangerous. These robots are not experiencing the music or the interaction with their ‘partners’ in any meaningful way; they have simply been programmed to move in time to a rhythm. As Ward says:

“It looks like human dancing, except it’s an utterly meaningless act, stripped of any social, cultural, historical, or religious context, and carried out as a humblebrag show of technological might.”

The more content like this that we see, the more familiar and normal it seems and the more blurred the line becomes between what it is to be human and what our relationship should be with technology. In other words, we will become as accepting of robots as we are now with our mobile phones and our cars and they will suddenly be integral parts of our life just like those relatively more benign objects are.

But robots are different.

Although we’re probably still some way off from the dystopian amusement park for rich vacationers depicted in the film Westworld, where customers can live out their fantasies through the use of robots that provide anything humans want, we should not dismiss the threat from robots and advanced artificial intelligence (AI) too quickly. Maybe then, videos like the BD one should serve as a reminder that now is the time to start thinking about what sort of relationship we want with this new breed of machine, and to start developing ethical frameworks on how we create and treat things that will look increasingly like us.


Reason Number Two: The robots divert us from the real issue

If the BD video runs the risk of making us more accepting of technology because it fools us into believing those robots are just like us, it also distracts us in a more pernicious way. Read any article or story on the threats of AI and you’ll always see it appearing alongside a picture of a robot, usually one that, Terminator-like, is rampaging around shooting everything and everyone in sight. The BD video however shows that robots are fun and that they’re here to do work for us and entertain us, so let’s not worry about them or, by implication, their ‘intelligence’.

As Max Tegmark points out in his book Life 3.0 however, the great danger of artificial intelligence is not that robots will rise against us and wage out-of-control warfare, Terminator-style; it’s more to do with the nature of artificial intelligence itself. Namely, that an AI whose goals are misaligned with our own needs no body, just an internet connection, to wreak its particular form of havoc on our economy or our very existence. How so?

It’s all to do with the nature of, and how we define, intelligence. It turns out intelligence is actually quite a hard thing to define (and more so to get everyone to agree on a definition). Tegmark uses a relatively broad definition:

intelligence = ability to accomplish complex goals

and it then follows that:

artificial intelligence = non-biological intelligence

Given these definitions then, the real worry is not about machines becoming malevolent but about machines becoming very competent. In other words, what happens if you give a machine a goal to accomplish and it decides to achieve that goal no matter what the consequences?

This was the issue so beautifully highlighted by Stanley Kubrick and Arthur C. Clarke in the film 2001: A Space Odyssey. In that film the onboard computer (HAL) on a spaceship bound for Jupiter ends up killing all of the crew but one when it fears its goal (to reach Jupiter) may be jeopardised. HAL had no human-like manifestation (no arms or legs); it was ‘just’ a computer responsible for every aspect of controlling the spaceship and eminently able to use that power to kill several of the crew members. As far as HAL was concerned it was just achieving its goal – even if it did mean dispensing with the crew!

It seems that hardly a day goes by without there being news of not just our existing machines becoming ever more computerised but with those computers becoming ever more intelligent. For goodness sake, even our toothbrushes are now imbued with AI! The ethical question here then is how much AI is enough and just because you can build intelligence into a machine or device, does that mean you actually should?


Reason Number Three: We may be becoming “techno-chauvinists”

One of the things I always think when I see videos like the BD one is, if that’s what these companies are showing is commercially available, how far advanced are the machines they are building, in secret, with militaries around the world?

Is there a corollary here with spy satellites? Since the end of the Cold War, satellite technology has advanced to such a degree that we are being watched — for good or for bad — almost constantly by military and commercial organisations. Many of the companies doing the work are commercial, with the boundary between military and commercial now very blurred. As Pat Norris, a former NASA engineer who worked on the Apollo 11 mission to the moon and author of Spies in the Sky, says: “the best of the civilian satellites are taking pictures that would only have been available to military people less than 20 years ago”. If that is so then what are the military satellites doing now?

In his book Megatech: Technology in 2050 Daniel Franklin points out that Western liberal democracies often have a cultural advantage, militarily, over those who grew up under a theocracy or authoritarian regime. With a background of greater empowerment in decision making and encouragement to learn from, and not be penalised for, mistakes, Westerners tend to display greater creativity and innovation. Education systems in democracies encourage the type of creative problem-solving that is facilitated by timely intelligence as well as terabytes of data that is neither controlled nor distorted by an illiberal regime.

Imagine then how advanced some of these robots could become, in military use, if they are trained using all of the data available to them from past military conflicts, both successful and not so successful campaigns?

Which brings me to my real concern about all this. If we are training our young scientists and engineers to build ‘platforms’ (which is how Boston Dynamics refers to its robots) that can learn from all of this data, and maybe to begin making decisions which are no longer understood by their creators, then whose responsibility is it when things go wrong?

Not only that, but what happens when the technology that was designed by an engineering team for a relatively benign use, is subverted by people who have more insidious ideas for deploying those ‘platforms’? As Meredith Broussard says in her book Artificial Unintelligence: “Blind optimism about technology and an abundant lack of caution about how new technologies will be used are a hallmark of techno-chauvinism”.


As engineers and scientists who hopefully care about the future of humanity and the planet on which we live, surely it is incumbent on us all to think morally and ethically about the technology we are unleashing? If we don’t, then what Einstein said at the advent of the atomic age rings equally true today:

“It has become appallingly obvious that our technology has exceeded our humanity.”

Albert Einstein

Three types of problem, and how to solve them

Image by Thanasis Papazacharias from Pixabay

We are all problem solvers. Whether it be trying to find our car keys, which we put down somewhere when we came home from work, or trying to solve some of the world’s more gnarly issues like climate change, global pandemics or nuclear arms proliferation.

Human beings have the unique ability not just to individually work out ways to fix things but also to collaborate with others, sometimes over great distances, to address great challenges and seemingly intractable problems. How many of us though, have thought about what we do when we try to solve a problem? Do we have a method for problem solving?

As Albert Einstein once said: “We cannot solve our problems by using the same kind of thinking we used when we created them.” This being the case (and who would argue with Einstein) it would be good to have a bit of a systematic approach to solving problems.

On the Digital Innovators Skills Programme we spend some time looking at types of problem as well as the methods and tools we have at our disposal to address them. Here, I’ll take a look at the technique we use but first, what types of problem are there?

We can think of problems as being one of three types: Simple, Complex and Wicked, as shown in this diagram.

3 Problem Types

Simple problems are ones that have a single cause, are well defined and have a clear and unambiguous solution. Working out a route to travel, e.g. from Birmingham to Land’s End, is an example of a simple problem (as is finding those lost car keys).

Complex problems tend to have multiple causes, are difficult to understand and their solutions can lead to other problems and unintended consequences. Addressing traffic congestion in a busy town is an example of a complex problem.

Wicked problems are problems that seem to be so complex it’s difficult to envision a solution. Climate change is an example of a wicked problem.

Wicked problems are like a tangled mess of thread – it’s difficult to know which to pull first. Rittel and Webber, who formulated the concept of wicked problems, identified them as having the following characteristics:

  1. Difficult to define the problem.
  2. Difficult to know when the problem has been solved.
  3. No clear right or wrong solutions.
  4. Difficult to learn from previous success to solve the problem.
  5. Each problem is unique.
  6. There are too many possible solutions to list and compare.

Problems, of all types, can benefit from a systematic approach to being solved. There are many frameworks that can be used for addressing problems but at Digital Innovators we use the so called 4S Method proposed by Garrette, Phelps and Sibony.

The 4S Method is a problem-solving toolkit that works through four iterative steps: State, Structure, Solve and Sell.

The 4S Method
  1. State the Problem. It might sound obvious, but unless you understand exactly what problem you are trying to solve it’s going to be very difficult to come up with a solution. The first step is therefore to state exactly what the problem is.
  2. Structure the Problem. Having clearly stated the problem, you probably now know just how complex, or even wicked, it is. The next step is to structure the problem by breaking it down into smaller, more manageable parts, each of which can hopefully be solved through analysis.
  3. Solve the Problem. Having broken the problem down, each piece can now be solved separately. The authors of this method suggest three main approaches: hypothesis-driven problem solving, issue-driven problem solving, or the creative path of design thinking.
  4. Sell the Solution. Even if you come up with an amazing and innovative solution to the problem, if you cannot persuade others of its value and feasibility your amazing idea will never get implemented or ever be known about. When selling, always focus on the solution, not the steps you went through to arrive at it.

Like any technique, problem solving can be learned and practiced. Even the world’s greatest problem solvers are not necessarily smarter than you are. It’s just that they have learnt and practised their skills then mastered them through continuous improvement.

If you are interested in delving more deeply into the techniques discussed here, Digital Innovators will coach you in these as well as other valuable, transferable business skills and also give you the chance to practice these skills on real-life projects provided to us by employers. We are currently enrolling students for our next programme which you can register an interest for here.

Happy New Year from Software Architecture Zen.

Tech skills are not the only type of skill you’ll need in 2021

Image by Gerd Altmann from Pixabay

Whilst good technical skills continue to be important, these alone will not be enough to enable you to succeed in the modern, post-pandemic workplace. At Digital Innovators, where I am Design and Technology Director, we believe that skills with a human element are equally, if not more, important if you are to survive in the changed working environment of the 2020s. That’s why, if you attend one of our programmes during 2021, you’ll also learn these and other people-focused, transferable skills.

1. Adaptability

The COVID-19 pandemic has changed the world of work not just in the tech industry but across other sectors as well. Those organisations most able to thrive during the crisis were ones that were able to adapt quickly to new ways of working whether that is full-time office work in a new, socially distanced way, a combination of both office and remote working, or a completely remote environment. People have had to adapt to these ways of working whilst continuing to be productive in their roles. This has meant adopting different work patterns, learning to communicate in new ways and dealing with a changed environment where work, home (and for many school) have all merged into one. Having the ability to adapt to these new challenges is a skill which will be more important than ever as we embrace a post-pandemic world.

Adaptability also applies to learning new skills. Technology has undergone exponential growth in even the last 20 years (there were no smartphones in 2000) and has been adopted in new and transformative ways by nearly all industries. In order to keep up with such a rapidly changing world you need to be continuously learning new skills to stay up-to-date and current with industry trends. 

2. Collaboration and Teamwork

Whilst there are still opportunities for the lone maverick, working away in his or her bedroom or garage, to come up with new and transformative ideas, for most of us, working together in teams and collaborating on ideas and new approaches is the way we work best.

In his book Homo Deus – A Brief History of Tomorrow, Yuval Noah Harari makes the observation: “To the best of our knowledge, only Sapiens can collaborate in very flexible ways with countless numbers of strangers. This concrete capability – rather than an eternal soul or some unique kind of consciousness – explains our mastery over planet Earth.”

On our programme we encourage and require our students to collaborate from the outset. We give them tasks to do (like drawing how to make toast!) early on, then build on these, leading up to a major 8-week project where students work in teams of four or five to define a solution to a challenge set by one of our industry partners. Students tell us this is one of their favourite aspects of the programme as it allows them to work with new people from a diverse range of backgrounds to come up with new and innovative solutions to problems.

3. Communication

Effective communication skills, whether written, spoken or aural, as well as the ability to present ideas well, have always been important. In a world where we are increasingly communicating through a vast array of different channels, we need to adapt our core communication skills to thrive in a virtual as well as an offline environment.

Digital Innovators teach their students how to communicate effectively using a range of techniques including a full-day, deep dive into how to create presentations that tell stories and really enable you to get across your ideas.

4. Creativity

Pablo Picasso famously said “Every child is an artist; the problem is staying an artist when you grow up”.

As Hugh MacLeod, author of Ignore Everybody, And 39 Other Keys to Creativity says: “Everyone is born creative; everyone is given a box of crayons in kindergarten. Then when you hit puberty they take the crayons away and replace them with dry, uninspiring books on algebra, history, etc. Being suddenly hit years later with the ‘creative bug’ is just a wee voice telling you, ‘I’d like my crayons back please.’”

At Digital Innovators we don’t believe that it’s only artists who are creative. We believe that everyone can be creative in their own way, they just need to learn how to let go, be a child again and unlock their inner creativity. That’s why on our skills programme we give you the chance to have your crayons back.

5. Design Thinking

Design thinking is an approach to problem solving that puts users at the centre of the solution. It includes proven practices such as building empathy, ideation, storyboarding and extreme prototyping to create new products, processes and systems that really work for the people that have to live with and use them.

For Digital Innovators, Design Thinking is at the core of what we do. As well as spending a day-and-a-half teaching the various techniques (which our students learn by doing) we use Design Thinking at the beginning of, and throughout, our 8-week projects to ensure the students deliver solutions that are really what our employers want.

6. Ethics

The ethical aspects of the use of digital technology in today’s world are something that seems to be sadly missing from most courses in digital technology. We may well churn out tens of thousands of developers a year, from UK universities alone, but how many of these people ever give anything more than a passing thought to the ethics of the work they end up doing? Is it right, for example, to build systems of mass surveillance and collect data about citizens that most have no clue about? Having some kind of ethical framework within which we operate is more important today than ever before.

That’s why we include a module on Digital Ethics as part of our programme. In it we introduce a number of real-world, as well as hypothetical case studies that challenge students to think about the various ethical aspects of the technology they already use or are likely to encounter in the not too distant future.

7. Negotiation

Negotiation is a combination of persuasion, influencing and confidence as well as being able to empathise with the person you are negotiating with and understanding their perspective. Being able to negotiate, whether it be to get a pay rise, buy a car or sell the product or service your company makes is one of the key skills you will need in your life and career, but one that is rarely taught in school or even at university.

As Katherine Knapke, the Communications & Operations Manager at the American Negotiation Institute says: “Lacking in confidence can have a huge impact on your negotiation outcomes. It can impact your likelihood of getting what you want and getting the best possible outcomes for both parties involved. Those who show a lack of confidence are more likely to give in or cave too quickly during a negotiation, pursue a less-aggressive ask, and miss out on opportunities by not asking in the first place”. 

On the Digital Innovators skills programme you will work with a skilled negotiator from The Negotiation Club to practice and hone your negotiation skills in a fun but safe environment which allows you to learn from your mistakes and improve.

The ethics of contact tracing

After a much publicised “U-turn” the UK government has decided to change the architecture of its coronavirus contact tracing system and to embrace the one based on the interfaces being provided by Apple and Google. The inevitable cries have ensued: of a government that does not know what it is doing, of “we told you it wouldn’t work”, and of valuable time wasted in building a system that would help protect UK citizens. At times like these it’s often difficult to get to the facts and understand where the problems actually lie. Let’s try and unearth some facts and understand the options for the design of a contact tracing app.

Any good approach to designing a system such as contact tracing should, you would hope, start with the requirements. I have no government inside knowledge and it’s not immediately apparent from online searches what the UK government’s exact and actual requirements were. However, as this article highlights, you would expect that a contact tracing system would need to “involve apps, reporting channels, proximity-based communication technology and monitoring through personal items such as ID badges, phones and computers.” You might also expect it to involve cooperation with local health service departments. It is not clear, at least for the UK, whether there was also a requirement to collate data in some centralised repository so that epidemiologists, without knowing the nature of the contact, could build a model of contacts to see which people are serious spreaders and which have tested positive yet are asymptomatic. Whilst it would seem perfectly reasonable to want the system to do that, this is a different use case to the one of contact tracing. One might assume that because the UK government was proposing a centralised database for tracking data this latter use case was also to be handled by the system.

Whilst different countries are going to have different requirements for contact tracing one would hope that for any democratically run country a minimum set of requirements (i.e. privacy, anonymity, transparency and verifiability, no central repository and minimal data collection) would be implemented.

The approach to contact tracing developed by Google and Apple (the two largest providers of mobile phone operating systems) was published in April of this year, with the detail of the design being made available in four technical papers. Included as part of this document set were some frequently asked questions where the details of how the system would work were explained using the familiar Alice and Bob notation. Here is a summary.

  1. Alice and Bob don’t know each other but happen to have a lengthy conversation sitting a few feet apart on a park bench. They both have a contact tracing app installed on their phones which exchange random Bluetooth identifiers with each other. These identifiers change frequently.
  2. Alice continues her day unaware that Bob had recently contracted Covid-19.
  3. Bob feels ill and gets tested for Covid-19. His test results are positive and he enters his result into his phone. With Bob’s consent his phone uploads the last 14 days of keys stored on his phone to a server.
  4. Alice’s phone periodically downloads the Bluetooth beacon keys of everyone who has tested positive for Covid-19 in her immediate vicinity. A match is found with Bob’s randomly generated Bluetooth identifier.
  5. Alice sees a notification on her phone warning her she has recently come into contact with someone who has tested positive with Covid-19. What Alice needs to do next is decided by her public health authority and will be provided in their version of the contact tracing app.

There are a couple of things worth noting about this use case:

  1. Alice and Bob both have to make an explicit choice to turn on the contact tracing app.
  2. Neither Alice or Bob’s names are ever revealed, either between themselves or to the app provider or health authority.
  3. No location data is collected. The system only knows that two identifiers have previously been within range of each other.
  4. Google and Apple say that the Bluetooth identifiers change every 10-20 minutes, to help prevent tracking and that they will disable the exposure notification system on a regional basis when it is no longer needed.
  5. Health authorities or any other third parties do not receive any data from the app.

Another point to note is that initially this solution has been released via application programming interfaces (APIs) that allow customised contact tracing apps from public health authorities to work across Android and iOS devices. Maintaining user privacy seems to have been a key non-functional requirement of the design. The apps are made available from the public health authorities via the respective Apple and Google app stores. A second phase has also been announced whereby the capability will be embedded at the operating system level meaning no app has to be installed but users still have to opt into using the capability. If a user is notified she has been in contact with someone with Covid-19 and has not already downloaded an official public health authority app they will be prompted to do so and advised on next steps. Only public health authorities will have access to this technology and their apps must meet specific criteria around privacy, security, and data control as mandated by Apple and Google.
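To make the matching step more concrete, here is a highly simplified, illustrative Python sketch of the idea described above. It is not the real Google/Apple Exposure Notification cryptography (which uses HKDF- and AES-based key derivation over fixed rolling intervals); the function names and the use of HMAC-SHA256 below are my own simplifications for illustration only.

import os
import hmac
import hashlib

def daily_key():
    # A fresh random key generated on the phone each day; it never leaves the
    # device unless the user tests positive and consents to upload it.
    return os.urandom(16)

def rolling_ids(key, intervals=144):
    # Derive the short-lived identifiers a phone broadcasts over Bluetooth.
    # They look random to an observer but can be re-derived from the daily key.
    return [hmac.new(key, i.to_bytes(2, "big"), hashlib.sha256).digest()[:16]
            for i in range(intervals)]

# Bob's phone generates a key and broadcasts identifiers derived from it.
bob_key = daily_key()
bob_broadcasts = rolling_ids(bob_key)

# Alice's phone records the identifiers it hears nearby (here, one of Bob's).
alice_heard = {bob_broadcasts[42]}

# Bob tests positive and, with his consent, uploads his recent daily keys.
# Alice's phone downloads them, re-derives the identifiers locally and checks
# for an overlap with what it heard - no names or locations are ever exchanged.
exposure = any(rid in alice_heard for rid in rolling_ids(bob_key))
print("Possible exposure detected:", exposure)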

So why would Google and Apple choose to implement their contact tracing system in this way, which would seem to be putting privacy ahead of efficacy? More importantly, why should Google and Apple get to dictate how countries should do contact tracing?

Clearly one major driver for both companies is that of security and privacy. Post-Snowden we know just how easy it has been for government security agencies (i.e. the US National Security Agency and the UK’s Government Communications Headquarters) to get access to supposedly private data. Trust in central government is at an all time low and it is hardly surprising that the corporate world is stepping in to announce that they were the good guys all along and that we can trust them with our data.

Another legitimate reason is that during the coronavirus pandemic we have all had our ability to travel even locally, never mind nationally or globally, severely restricted. Implementing an approach that is supported at the operating system level means it should be easier to make the app compatible with other countries’ counterparts based on the same system, therefore making it safer for people to begin travelling internationally again.

The real problem, at least as far as the UK has been concerned, is that the government has been woefully slow in implementing a rigorous and scalable contact tracing system. It seems as though they may have seen an app-based approach as the silver bullet that would solve all of their problems – no matter how poorly identified these are. Realistically that was never going to happen, even if the system had worked perfectly. The UK is not China and could never impose an app-based contact tracing system on its populace, could it? Lessons from Singapore, where contact tracing has been in place for some time, are that the apps do not perform as required and other more intrusive measures are needed to make them effective.

There will now be the usual blame game between government, the press, and industry, no doubt resulting in the inevitable government enquiry into what went wrong. This will report back after several months, if not years, of deliberation. Blame will be officially apportioned, maybe a few junior minister heads will roll, if they have not already moved on, but meanwhile the trust that people have in their leaders will be chipped away a little more.

More seriously however, will we have ended up, by default, putting more trust into the powerful corporations of Silicon Valley some of whom not only have greater valuations than many countries GDP but are also allegedly practising anti-competitive behaviour?

Update: 21st June 2020

Updated to include link to Apple’s anti-trust case.

Trust Google?

Photo by Daniele Levis Pelusi on Unsplash

Google has just released data on people’s movements, gathered from millions of mobile devices that use its software (e.g. Android, Google Maps etc) leading up to and during the COVID-19 lockdown in various countries. The data has been analysed here to show graphically how people spent their time between six location categories: homes; workplaces; parks; public transport stations; grocery shops and pharmacies; and retail and recreational locations.

The data shows how quickly people reacted to the instructions to lockdown. Here in the UK, for example, we see people reacted late but then strongly, with a rise of about 20-25% staying at home. This delay reflects the fact that lockdown began later, on March 23, in the UK, though some people were already staying home before lockdown began.
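As a quick illustration, here is a small pandas sketch of how you might pull out that ‘staying at home’ trend for the UK yourself. The file name and column names are assumptions based on Google’s published Community Mobility Reports CSV and may have changed since.

import pandas as pd

# Assumed file name for Google's Community Mobility Reports download.
df = pd.read_csv("Global_Mobility_Report.csv", low_memory=False)

# Country-level rows have no sub-region set.
uk = df[(df["country_region"] == "United Kingdom") & (df["sub_region_1"].isnull())]

# Positive values mean more time spent in residential places than the baseline.
at_home = uk.set_index("date")["residential_percent_change_from_baseline"]

# The jump around the March 23 lockdown.
print(at_home[(at_home.index >= "2020-03-16") & (at_home.index <= "2020-04-05")])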

What we see in the data provided by Google is likely to be only the start and, I suspect, a preview of how we may soon have to live. In the book Homo Deus by Yuval Noah Harari, the chapter The Great Decoupling discusses how bioscience and computer science are conspiring to learn more about us than we know about ourselves and, in the process, destroy the “great liberal project” in which we think we have free will and are able to make our own decisions about what we eat, who we marry, who we vote for in elections, what career path we choose and so on.

Harari asks what will happen when Google et al know more about us than we, or anyone else, does. Facebook, for example, already purports to know more about us than our spouse by analysing as few as 300 of our ‘likes’. What if those machines who are watching over us (hopefully with “loving grace” but who knows) can offer us ‘advice’ on who we should vote for based on our previous four years’ comments and ‘likes’ on Facebook, or recommend we should go and see a psychiatrist because of the somewhat erratic comments we have been making in emails to our friends or on Twitter?

The Google we see today, providing us with relatively benign data for us to analyse ourselves, is currently at the level of what Harari says is an ‘oracle’. It has the data and, with the right interpretation, we can use that data to provide us with information to make decisions. Exactly where we are now with coronavirus and this latest dataset.

The next stage is that of Google becoming an ‘agent’. You give Google an aim and it works out the best way to achieve that aim. Say you want to lose two stone by next summer so you have the perfect beach-ready body. Google knows all about your biometric data (they just bought Fitbit, remember) as well as your predisposition for buying crisps and watching too much Netflix, and comes up with a plan that will allow you to lose that weight provided you follow it.

Finally Google becomes ‘sovereign’ and starts making those decisions for you. So maybe it checks your supermarket account and removes those crisps from your shopping list; then, if you continue to ignore its advice, it instructs your insurance company, which bumps up your health insurance premiums.

At this point we ask who is in control. Google, Facebook etc own all that data but that data can be influenced (or hacked) to nudge us to do things we don’t realise. We already know how Cambridge Analytica used Facebook to influence the voting behaviour (we’re looking at you Mr Cummings) in a few swing areas (for Brexit and the last US election). We have no idea how much of that was also being influenced by Russia.

I think humanity is rapidly approaching the point when we really need to be making some hard decisions about how much of our data, and the analysis of that data, we should allow Google, Facebook and Twitter to hold. Should we be starting to think the unthinkable and calling a halt to this ever growing mountain of data each of us willingly gives away for free? But, how do we do that when most of it is being kept and analysed by private companies or worse, by China and Russia?

Pythons and pandas (or why software architects no longer have an excuse not to code)


The coronavirus pandemic has certainly shown just how much the world depends not just on accurate and readily available datasets but also on the ability of scientists and data analysts to make sense of that data. All of us are at the mercy of those experts to interpret this data correctly – our lives could quite literally depend on it.

Thankfully we live in a world where the tools are available to allow anyone, with a bit of effort, to learn how to analyse data themselves and not just rely on the experts to tell us what is happening.

The programming language Python, coupled with the pandas data analysis library and the Bokeh interactive visualisation library, provides a robust and professional set of tools to begin analysing data of all sorts and getting it into the right format.

Data on the coronavirus pandemic is available from lots of sources including the UK’s Office for National Statistics as well as the World Health Organisation. I’ve been using data from DataHub which provides datasets in different formats (CSV, Excel, JSON) across a range of topics including climate change, healthcare, economics and demographics. You can find their coronavirus related datasets here.

I’ve created a set of resources which I’ve been using to learn Python and some of its related libraries which is available on my GitHub page here. You’ll also find the project which I’ve been using to analyse some of the COVID-19 data around the world here.

The snippet of code below shows how to load a CSV file into a pandas DataFrame – a 2-dimensional data structure that can store data of different types in columns, similar to a spreadsheet or SQL table.

import pandas as pd

# Note: path, coviddata and alternatives are module-level settings defined
# elsewhere in the project (the dataset folder, the CSV file name and a
# dictionary of alternative country names respectively).

# Return COVID-19 info for country, province and date.
def covid_info_data(country, province, date):
    df4 = pd.DataFrame()
    if (country != "") and (date != ""):
        try:
            # Read dataset as a pandas DataFrame
            df1 = pd.read_csv(path + coviddata)

            # Check if country has an alternate name for this dataset
            if country in alternatives:
                country = alternatives[country]

            # Get subset of data for specified country/region
            df2 = df1[df1["Country/Region"] == country]

            # Get subset of data for specified date
            df3 = df2[df2["Date"] == date]

            # Get subset of data for specified province. If none specified but there
            # are provinces the current dataframe will contain all with the first one being 
            # country and province as 'NaN'. In that case just select country otherwise select
            # province as well.
            if province == "":
                df4 = df3[df3["Province/State"].isnull()]
            else:
                df4 = df3[df3["Province/State"] == province]
        except FileNotFoundError:
            print("Invalid file or path")
    # Return selected covid data from last subset
    return df4

The first few rows of the DataFrame df1 show the data from the first country (Afghanistan).

         Date Country/Region Province/State   Lat  Long  Confirmed  Recovered  Deaths
0  2020-01-22    Afghanistan            NaN  33.0  65.0        0.0        0.0     0.0
1  2020-01-23    Afghanistan            NaN  33.0  65.0        0.0        0.0     0.0
2  2020-01-24    Afghanistan            NaN  33.0  65.0        0.0        0.0     0.0
3  2020-01-25    Afghanistan            NaN  33.0  65.0        0.0        0.0     0.0
4  2020-01-26    Afghanistan            NaN  33.0  65.0        0.0        0.0     0.0

Three further subsets of data are made; the final one is for a specific country, showing the COVID-19 data for a particular date (the UK on 7th May in this case).

             Date  Country/Region Province/State      Lat   Long  Confirmed  Recovered   Deaths
26428  2020-05-07  United Kingdom            NaN  55.3781 -3.436   206715.0        0.0  30615.0

Once the dataset has been obtained, the information can be printed in a more readable way. Here’s a summary of the information for the UK on 9th May.

Date:  2020-05-09
Country:  United Kingdom
Province: No province
Confirmed:  215,260
Recovered:  0
Deaths:  31,587
Population:  66,460,344
Confirmed/100,000: 323.89
Deaths/100,000: 47.53
Percent Deaths/Confirmed: 14.67
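
A minimal sketch of how such a summary might be printed from the single-row DataFrame returned by covid_info_data (the helper name is hypothetical, and the population figure, and hence the per-100,000 rates, are assumed to come from a separate population dataset that isn’t shown here):

import pandas as pd

# Hypothetical helper: print a readable summary of one row of COVID-19 data.
# 'population' is assumed to come from a separate population dataset.
def print_covid_summary(df, population):
    row = df.iloc[0]
    province = row["Province/State"] if pd.notnull(row["Province/State"]) else "No province"
    print("Date: ", row["Date"])
    print("Country: ", row["Country/Region"])
    print("Province:", province)
    print("Confirmed: ", f"{int(row['Confirmed']):,}")
    print("Recovered: ", f"{int(row['Recovered']):,}")
    print("Deaths: ", f"{int(row['Deaths']):,}")
    print("Population: ", f"{population:,}")
    print("Confirmed/100,000:", round(row["Confirmed"] / population * 100000, 2))
    print("Deaths/100,000:", round(row["Deaths"] / population * 100000, 2))
    print("Percent Deaths/Confirmed:", round(row["Deaths"] / row["Confirmed"] * 100, 2))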

Obviously there are lots of ways of analysing this dataset, as well as of displaying it. Graphs are always a good way of showing information, and Bokeh is a nice, relatively simple-to-use Python library for creating a range of different graphs. Here’s how Bokeh can be used to create a simple line graph of COVID-19 deaths over a period of time.

from datetime import datetime as dt
from bokeh.plotting import figure, output_file, show

# Plot COVID-19 deaths over time for the country contained in the DataFrame.
def graph_covid_rate(df):
    x = []
    y = []
    # The second column of the dataset is Country/Region
    country = df.values[0][1]
    for deaths, date in zip(df['Deaths'], df['Date']):
        y.append(deaths)
        date_obj = dt.strptime(date, "%Y-%m-%d")
        x.append(date_obj)

    # output to static HTML file
    output_file("lines.html")

    # create a new plot with a title and axis labels
    p = figure(title="COVID-19 Deaths for "+country, x_axis_label='Date', y_axis_label='Deaths', x_axis_type='datetime')

    # add a line renderer with legend and line thickness
    p.line(x, y, legend_label="COVID-19 Deaths for "+country, line_width=3, line_color="green")

    # tilt the date labels so they don't overlap
    p.xaxis.major_label_orientation = 3/4

    # show the results
    show(p)

Bokeh creates an HTML file of an interactive graph. Here’s the one the above code creates, again for the UK, for the period 2020-02-01 to 2020-05-09.
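
As a usage sketch (assuming df1 has been read from the same dataset, with the column names shown earlier, and that the variable name is just illustrative), the time series for one country can be selected and plotted like this:

# Select the country-level rows for the UK over the period of interest and plot them.
uk_series = df1[(df1["Country/Region"] == "United Kingdom") & (df1["Province/State"].isnull())]
uk_series = uk_series[(uk_series["Date"] >= "2020-02-01") & (uk_series["Date"] <= "2020-05-09")]
graph_covid_rate(uk_series)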

As a recently retired software architect (who has now started a new career working for Digital Innovators, a company addressing the digital skills gap), coding is still important to me. I believe that “Architects Don’t Code” is an anti-pattern: design and coding are two sides of the same coin, and you cannot design if you cannot code (and you cannot code if you cannot design). These days there really is no excuse not to keep your coding skills up to date, with the vast array of resources available to everyone just a few clicks and Google searches away.

I also see coding not just as a way of keeping my own skills up to date and teaching others vital digital skills but, as this article helpfully points out, as a way of helping solve problems of all kinds. Coding is a skill for life; having at least a rudimentary understanding of it is vitally important for young people entering the workplace, helping them not just to get a job but also to understand more of the world in these incredibly uncertain times.

All Watched Over by Machines of Loving Grace?


The Watching “Eye” of the HAL 9000 Computer from 2001 – A Space Odyssey

I like to think
(it has to be!)
of a cybernetic ecology
where we are free of our labors
and joined back to nature,
returned to our mammal
brothers and sisters,
and all watched over
by machines of loving grace.

The last verse of Richard Brautigan’s 1967 poem, All Watched Over by Machines of Loving Grace, has a particular resonance during these dark and uncertain times caused by the COVID-19 pandemic[1].

The poem, which was also the name of a BBC documentary series by Adam Curtis[2], speaks of a time when we can return to nature and that mammals and computers will live together in “mutually programming harmony” with machines taking care of all our needs.

Things haven’t quite turned out like that have they?

In some kind of warped way maybe our machines are taking care of our needs but are they things we really need taken care of? If by “meeting our needs” we mean machines whose algorithms predict and dictate our shopping choices (Amazon), influence our voting behaviour (Facebook), satisfy our sexual preferences (Tinder, Grindr) or find us cheap rides and accommodation (Uber and Airbnb) then yes, maybe we have reached a mutually programmed harmony. I’m not sure that is exactly what Brautigan had in mind though.

If the “machines of loving grace” part of the poem has not quite happened in the way Brautigan predicted, the “all watched over” part may, however, be about to become only too true.

China, where the current coronavirus, SARS-CoV-2, originated, was already building the world’s largest social credit system, whereby all citizens are given points from which the authorities make deductions for bad behaviour, like traffic violations, and to which they add points for good behaviour, such as donating to charity. The full system is being rolled out during this decade, at which point all citizens will be forced into using it and everything from creditworthiness to political allegiance will be ‘measured’, not just by the system but by your peers as well. If trust is broken in one place, restrictions will be imposed elsewhere, meaning the untrustworthy will have reduced access to everything from jobs to foreign travel to bank loans and the internet.

Now, as a way of tracking people’s freedom of movement as its citizens come out of the coronavirus lockdown, the government has, through the ubiquitous Alipay and WeChat platforms, developed a “health code” service. This assigns users a colour-coded status based on their health and travel history, plus a QR code that can be scanned by the authorities. If you have a green code you are allowed to travel relatively freely; a yellow code indicates that the holder should be in home isolation; and a red code says the user is a confirmed COVID-19 patient and should be in quarantine. In China, which is not exactly known for its liberal attitude toward privacy, this may be acceptable as the price to pay for relative freedom of movement. However, as talk of such apps being rolled out in western liberal democracies starts to become news, their citizens may not be quite as accepting of such uses of private data.

A similar system in South Korea that sends emergency virus text alerts has already produced some embarrassing revelations about infected people’s private lives. These include a text saying “A woman in her 60s has just tested positive. Click on the link for the places she visited before she was hospitalised.” For many people the texts, whilst intended to be helpful, are creating a climate of concern by revealing a little too much personal information, including revelations about extra-marital affairs.

At a country level there are already plentiful supplies of open data that allow apps such as this one to track COVID-19 statistics by country. The fact that we have systems and organisations that publish such data is to be applauded and should be seen as a good thing, providing us all (if we can be bothered to look) with plentiful amounts of data to help us come to our own conclusions and combat the unfortunately equally plentiful supply of fake news about COVID-19 that abounds on social media. However, once such data starts to get more personal, that becomes a different matter.

Dominic Cummings, the Prime Minister’s chief adviser, hosted a meeting at Downing Street on 11 March with technology company leaders to see how they could help develop an app to tackle COVID-19, and on Easter Sunday the UK government confirmed plans for an app that will warn users if they have recently been in close proximity to someone suspected to be infected with the coronavirus. Meanwhile, Apple and Google have announced a system for tracking the spread of the new coronavirus, allowing users to share data through Bluetooth technology.

Four questions immediately arise from this situation:

  1. Should we trust corporations (especially Apple and Google) to be handling location data identifying where we have travelled and who we might have been close to?
  2. Can we trust the government to handle this data sensitively and with due regard to our privacy?
  3. What happens if not enough people use these apps?
  4. Once the pandemic is over can we trust the government and corporations to disable these functions from our phones and our lives?

Let’s take these one at a time.

First, are Google and Apple to be trusted with our private data? Historically, neither exactly has a clean slate when it comes to protecting private data. In 2014 third-party software was used to steal intimate photos of celebrities from Apple’s cloud service iCloud, forcing the company to expand its two-step authentication service. More recently, Hacker News revealed that Apple suffered a possible privacy breach in 2018 due to a bug in its platform that might have exposed iCloud data to other users.

Google’s failed social networking site Google+, which had already suffered a massive data breach in 2018 that exposed the private data of more than 500,000 Google+ users to third-party developers, was shut down earlier than planned in April 2019 following the discovery by Google engineers of another critical security vulnerability.

Despite the breaches of security suffered by these companies, it is probably true to say that they have a deeper understanding of their platforms than most companies and government agencies. Putting something temporary in place during this potentially existential threat to society is probably not a bad thing; what happens once the pandemic is over, however, becomes critical.

Can we trust governments to behave properly with how they handle this data? Again, governments do not have a good track record here. Edward Snowden, in his memoir Permanent Record, reveals the extent of the mass surveillance of US citizens carried out by the National Security Agency from 2010 onwards. If even democratically elected governments do this, what chance for the dictatorial regimes of Russia and China? Even during these unprecedented times we should not be too hasty to give away the freedoms that we enjoy today without knowing the extent to which our data could be compromised. As John Naughton explains here, there are ways of doing non-intrusive tracking of COVID-19, but to do so our smartphones have to be a bit, well, smarter. This is also a good reason why, here in the UK, parliament should be recalled, even in virtual form, to ensure decisions being made in this area are challenged and subject to proper scrutiny.

Next, what happens if not enough people use the apps, either because they don’t trust the government, because not everyone has a smartphone, or because they simply can’t be bothered to install the app and make sure it is active? It is estimated that for this approach to work there must be at least a 60% take-up of the app. Can governments somehow enforce its usage and penalise users in some way if they don’t comply? Maybe they rule that only those with smartphones that have the app installed and active will be allowed freedom of movement to work, socialise and meet other family members. Whilst this might encourage some to install the app, it would also put a huge burden on the police, the authorities and maybe even your employer, as well as shops, bars and restaurants, to ensure people moving around or entering their buildings have the app installed. Also, what about people who don’t have smartphones? Smartphone ownership here in the UK varies massively by age. In 2019, 96% of 25-34 year olds owned smartphones, whereas only 55% of 55-64 year olds owned these devices and only 16% of people over 65 (figures only available for 2015) owned them. How would they be catered for?

Finally, what happens when the pandemic is over and we return to relative normality? Will these emergency measures be rolled back or will the surveillance state have irrevocably crept one step closer? Recent history (think 9/11) does not provide much comfort here. As Edward Snowden says about the US:

“The two decades since 9/11 have been a litany of American destruction by way of American self-destruction, with the promulgation of secret policies, secret laws, secret courts, and secret wars, whose traumatising impact – whose very existence – the US government has repeatedly classified, denied, disclaimed, and distorted.”

Will our governments not claim that there will always be a zoonotic-virus threat, that the war against such viruses, just like the “war on terror”, will therefore be never-ending, and that we must never drop our guard (for which read: we must keep everyone under constant surveillance)?

An open letter published by a group of “responsible technologists” calls upon the NHSX leadership and the Secretary of State for Health and Social Care to ensure that new technologies used in the suppression of coronavirus follow ethical best practice, warning that if corners are cut, the public’s trust in the NHS will be undermined. The writer Yuval Noah Harari, who is quoted in the open letter by the data campaigners, warns that such measures have a nasty habit of becoming permanent. But he also says this: “When people are given a choice between privacy and health, they will usually choose health.”

Once the surveillance genie has been let out of its bottle it will be very difficult to squish it back in again and return to times of relative freedom. If we are not careful, those machines watching over us may not be ones of loving grace but rather ones of mass surveillance and constant monitoring of our movements, making us all a little less free and a little less human.

  1. COVID-19 is the disease caused by the 2019 novel coronavirus; the virus itself has been designated severe acute respiratory syndrome coronavirus 2, or SARS-CoV-2.
  2. No longer available on the BBC iPlayer but can be found here.

On Ethics and Algorithms

Photo by Franck V. on Unsplash

An article on the front page of the Observer, Revealed: how drugs giants can access your health records, caught my eye this week. In summary, the article highlights that the Department of Health and Social Care (DHSC) has been selling the medical data of NHS patients to international drugs companies and has “misled” the public that the information contained in the records would be “anonymous”.

The data in question is collated from GP surgeries and hospitals and, according to “senior NHS figures”, can “routinely be linked back to individual patients’ medical records via their GP surgeries.” Apparently there is “clear evidence” that companies have identified individuals whose medical histories are of “particular interest.” The DHSC has replied by saying it only sells information after “thorough measures” have been taken to ensure patient anonymity.

As with many articles like this, it is frustrating when some of the more technical aspects are not fully explained. Whilst I understand the importance of keeping the general readership on board and not frightening them too much with the intricacies of statistics or cryptography, it would be nice to know a bit more about how these records are being made anonymous.

There is a hint of this in the Observer report when it states that the CPRD (the Clinical Practice Research Datalink) said the data made available for research was “anonymous” but, following the Observer’s story, changed the wording to say that the data from GPs and hospitals had been “anonymised”. This is a crucial difference. One of the more common methods of ‘anonymisation’ is to obscure or redact some bits of information. So, for example, a record could have patient names removed, postcodes “coarsened” so that only the first part is included (e.g. SW1A rather than SW1A 2AA), and ages placed in a range rather than given exactly (e.g. 60-70 rather than 63).
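
Here is a minimal pandas sketch of this kind of coarsening (the toy table, column names and age bands are purely illustrative assumptions, not how the CPRD actually processes records):

import pandas as pd

# Illustrative only: coarsen a toy patient table by dropping names, truncating
# postcodes to their outward code and bucketing ages into ten-year bands.
patients = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "postcode": ["SW1A 2AA", "B15 2TT"],
    "age": [63, 47],
    "diagnosis": ["asthma", "diabetes"],
})

anonymised = patients.drop(columns=["name"])
anonymised["postcode"] = anonymised["postcode"].str.split().str[0]     # e.g. SW1A
anonymised["age"] = pd.cut(anonymised["age"], bins=range(0, 101, 10))  # e.g. (60, 70]
print(anonymised)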

The problem with anonymising data records is that they are prone to what is referred to as data re-identification or de-anonymisation. This is the practice of matching anonymous data with publicly available information in order to discover the individual to which the data belongs. One of the more famous examples of this is the Netflix Prize, a competition Netflix launched in 2006 to encourage people to improve its recommendation system by offering a $1m grand prize for a 10% improvement (with annual $50,000 progress prizes along the way); a planned follow-up competition was abandoned in 2010 in response to a lawsuit and Federal Trade Commission privacy concerns. Although the dataset released by Netflix to allow competition entrants to test their algorithms had supposedly been anonymised (i.e. by replacing user names with a meaningless ID and not including any gender or zip code information), a PhD student from the University of Texas was able to find out the real names of people in the supplied dataset by cross-referencing it with Internet Movie Database (IMDB) ratings, which people post publicly using their real names.

Herein lies the problem with the anonymisation of datasets. As Michael Kearns and Aaron Roth highlight in their recent book The Ethical Algorithm, when an organisation releases anonymised data it can try to make an intelligent guess as to which bits of the dataset to anonymise, but it is difficult (probably impossible) to anticipate what other data sources already exist, or could be made available in the future, that could be used to correlate records. This is the reason the computer scientist Cynthia Dwork has said “anonymised data isn’t” – meaning either it isn’t really anonymous or so much of the dataset has had to be removed that it is no longer data (at least in any useful way).

So what to do? Is it actually possible to release anonymised datasets out into the wild with any degree of confidence that they can never be de-anonymised? Thankfully something called differential privacy, invented by the aforementioned Cynthia Dwork and colleagues, allows us to do just that. Differential privacy is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in that dataset.

To understand how differential privacy works, consider this example*. Suppose we want to conduct a poll of people in London to find out how many have driven after taking non-prescription drugs. One way of doing this is to randomly sample a suitable number of Londoners, asking them if they have ever driven whilst under the influence of drugs. The data collected could be entered into a spreadsheet and various statistics derived: number of men, number of women, maybe ages and so on. The problem is that, whilst collecting this information, lots of compromising personal details may be gathered which, if the data were stolen, could be used against the respondents.

To avoid this problem, consider the following alternative. Instead of asking people the question directly, first ask them to flip a coin but not to tell us how it landed. If the coin comes up heads, they answer honestly whether they have driven under the influence. If it comes up tails, they give a random answer instead: they flip the coin again and say “yes” if it comes up heads or “no” if it comes up tails. This polling protocol is a simple randomised algorithm and a basic form of differential privacy. So how does this work?

If your answer is no, the randomised response answers no two out of three times. It answers no only one out of three times if your answer is yes. Diagram courtesy Michael Kearns and Aaron Roth, The Ethical Algorithm 2020

When we ask people if they have driven under the influence using this protocol, half the time (i.e. when the coin lands heads up) the protocol tells them to tell the truth. When the protocol tells them to respond with a random answer (i.e. when the coin lands tails up), half of that time they just happen to randomly tell us the right answer. So they tell us the right answer 1/2 + (1/2 x 1/2), or three-quarters, of the time; the remaining one-quarter of the time they tell us a lie, and there is no way of telling true answers from lies. Surely, though, this injection of randomisation completely masks the true results and makes the data highly error prone? Actually, it turns out, this is not the case.

Because we know how this randomisation is introduced, we can reverse engineer the answers we get to remove the errors and recover a good approximation of the right answer. Here’s how. Suppose one-third of people in London have actually driven under the influence of drugs. Of the one-third whose truthful answer is “yes”, three-quarters will answer “yes” under the protocol, that is 1/3 x 3/4 = 1/4. Of the two-thirds whose truthful answer is “no”, one-quarter will report “yes”, that is 2/3 x 1/4 = 1/6. So we expect 1/4 + 1/6 = 5/12 of the population to answer “yes”. More generally, the observed “yes” rate is always 1/4 plus half the true rate, so we can reverse the protocol: the true rate is twice (the observed rate minus 1/4), and for an observed rate of 5/12 this gives back 2 x (5/12 – 1/4) = 1/3.
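
Here’s a small simulation of that arithmetic (a sketch of my own, not code from the book): it applies the coin-flip protocol to a synthetic population in which exactly one-third have driven under the influence, then reverses the protocol to recover an estimate of the true rate.

import random

def randomised_response(truth):
    # First flip: heads means answer honestly.
    if random.random() < 0.5:
        return truth
    # Tails means answer with a second, independent coin flip.
    return random.random() < 0.5

# Synthetic population: exactly one-third have truly driven under the influence.
population = [i % 3 == 0 for i in range(300000)]
responses = [randomised_response(t) for t in population]

observed = sum(responses) / len(responses)  # expect roughly 5/12
estimate = 2 * (observed - 0.25)            # reverse the protocol
print(f"observed yes-rate: {observed:.3f}, estimated true rate: {estimate:.3f}")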

So what is the point of doing the survey like this? Simply put, it allows the true answer to be hidden behind the protocol. If the data were leaked and an individual in it was identified as being suspected of driving under the influence, they could always argue they were told to say “yes” because of the way the coins fell.

In the real world a number of organisations, including the US Census Bureau, Apple and Google, as well as commercial products such as Privitar Lens, use differential privacy to limit the disclosure of private information about individuals whose data is held in public databases.

It would be nice to think that the NHS data supposedly being used by US drug companies was protected by some form of differential privacy. If it were, and if this could be explained to the public in a reasonable and rational way, then surely we would all benefit, both in the knowledge that our data is safe and in knowing that it is maybe even being put to good use in protecting and improving our health. After all, wasn’t this meant to be the true benefit of living in a connected society where information is shared for the betterment of all our lives?

*Based on an example from Kearns and Roth in The Ethical Algorithm.