The ethics of contact tracing

After a much publicised “U-turn” the UK government has decided to change the architecture of its coronavirus contact tracing system and to embrace the one based on the interfaces being provided by Apple and Google. The inevitable cries of a government that does not know what it is doing, we told you it wouldn’t work and this means we have wasted valuable time in building a system that would help protect UK citizens have ensued. At times like these it’s often difficult to get to the facts and understand where the problems actually lie. Let’s try and unearth some facts and understand the options for the design of a contact tracing app.

Any good approach to designing a system such as contact tracing should, you would hope, start with the requirements. I have no government inside knowledge and it’s not immediately apparent from online searches what the UK governments exact and actual requirements were. However as this article highlights you would expect that a contact tracing system would need to “involve apps, reporting channels, proximity-based communication technology and monitoring through personal items such as ID badges, phones and computers.” You might also expect it to involve cooperation with local health service departments. Whether or not there is also a requirement to collate data in some centralised repository so that epidemiologists, without knowing the nature of the contact, can build a model of contacts to see if they are serious spreaders or those who have tested positive yet are asymptomatic, at least for the UK, is not clear. Whilst it would seem perfectly reasonable to want the system to do that, this is a different use case to the one of contact tracing. One might assume that because the UK government was proposing a centralised database for tracking data this latter use case was also to be handled by the system.

Whilst different countries are going to have different requirements for contact tracing one would hope that for any democratically run country a minimum set of requirements (i.e. privacy, anonymity, transparency and verifiability, no central repository and minimal data collection) would be implemented.

The approach to contact tracing developed by Google and Apple (the two largest providers of mobile phone operating systems) was published in April of this year with the detail of the design being made available in four technical papers. Included as part of this document set were some frequently asked questions where the details of how the system would work were explained using the eponymous Alice and Bob notation. Here is a summary.

  1. Alice and Bob don’t know each other but happen to have a lengthy conversation sitting a few feet apart on a park bench. They both have a contact tracing app installed on their phones which exchange random Bluetooth identifiers with each other. These identifiers change frequently.
  2. Alice continues her day unaware that Bob had recently contracted Covid-19.
  3. Bob feels ill and gets tested for Covid-19. His test results are positive and he enters his result into his phone. With Bob’s consent his phone uploads the last 14 days of keys stored on his phone to a server.
  4. Alice’s phone periodically downloads the Bluetooth beacon keys of everyone who has tested positive for Covid-19 in her immediate vicinity. A match is found with Bob’s randomly generated Bluetooth identifier.
  5. Alice sees a notification on her phone warning her she has recently come into contact with someone who has tested positive with Covid-19. What Alice needs to do next is decided by her public health authority and will be provided in their version of the contact tracing app.

There are a couple of things worth noting about this use case:

  1. Alice and Bob both have to make an explicit choice to turn on the contact tracing app.
  2. Neither Alice or Bob’s names are ever revealed, either between themselves or to the app provider or health authority.
  3. No location data is collected. The system only knows that two identifiers have previously been within range of each other.
  4. Google and Apple say that the Bluetooth identifiers change every 10-20 minutes, to help prevent tracking and that they will disable the exposure notification system on a regional basis when it is no longer needed.
  5. Health authorities of any other third parties do not receive any data from the app.

Another point to note is that initially this solution has been released via application programming interfaces (APIs) that allow customised contact tracing apps from public health authorities to work across Android and iOS devices. Maintaining user privacy seems to have been a key non-functional requirement of the design. The apps are made available from the public health authorities via the respective Apple and Google app stores. A second phase has also been announced whereby the capability will be embedded at the operating system level meaning no app has to be installed but users still have to opt into using the capability. If a user is notified she has been in contact with someone with Covid-19 and has not already downloaded an official public health authority app they will be prompted to do so and advised on next steps. Only public health authorities will have access to this technology and their apps must meet specific criteria around privacy, security, and data control as mandated by Apple and Google.

So why would Google and Apple choose to implement its contact tracing app in this way which would seem to be putting privacy ahead of efficacy? More importantly why should Google and Apple get to dictate how countries should do contact tracing?

Clearly one major driver from both companies is that of security and privacy. Post-Snowden we know just how easy it has been for government security agencies (i.e. the US National Security Agency and UK’s Government Communications Headquarters) to get access to supposedly private data. Trust in central government is at an all time low and it is hardly surprising that the corporate world is stepping in to announce that they were the good guys all along and you can trust us with your data.

Another legitimate reason is also that during the coronavirus pandemic we have all had our ability to travel even locally, never mind nationally or globally, severely restricted. Implementing an approach that is supported at the operating system level means that it should be easier to make the app compatible with other countries’ counterparts, which are based on the same system therefore making it safer for people to begin travelling internationally again.

The real problem, at least as far as the UK has been concerned, is that the government has been woefully slow in implementing a rigorous and scaleable contact tracing system. It seems as though they may have been looking at an app-based approach to be the silver bullet that would solve all of their problems – no matter how poorly identified these are. Realistically that was never going to happen, even if the system had worked perfectly. The UK is not China and could never impose an app based contact tracing system on its populace, could it? Lessons from Singapore, where contact tracing has been in place for some time, are that the apps do not perform as required and other more intrusive measures are needed to make them effective.

There will now be the usual blame game between government, the press, and industry, no doubt resulting in the inevitable government enquiry into what went wrong. This will report back after several months, if not years, of deliberation. Blame will be officially apportioned, maybe a few junior minister heads will roll, if they have not already moved on, but meanwhile the trust that people have in their leaders will be chipped away a little more.

More seriously however, will we have ended up, by default, putting more trust into the powerful corporations of Silicon Valley some of whom not only have greater valuations than many countries GDP but are also allegedly practising anti-competitive behaviour?

Update: 21st June 2020

Updated to include link to Apple’s anti-trust case.

The real reason Boris Johnson has not (yet) sacked Dominic Cummings

Amidst the current press furore over ‘CummingsGate’ (you can almost hear the orgiastic paroxysms of sheer ecstasy emanating from Guardian HQ 250 miles away at Barnard Castle as the journalists there finally think they have got their man) I think everyone really is missing the point. The real reason Johnson is not sacking Cummings (or at least hasn’t at the time of writing) is because Cummings is his ‘dataist-in-chief’ (let’s call him Johnson’s DiC for short) and having applied his dark arts twice now (the Brexit referendum and the 2019 General Election) Cummings has proven his battle worthiness. It would be like Churchill (Johnson’s hero and role model) blowing up all his Spitfires on the eve of the Battle of Britain. The next battle Johnson is going to need his DiC for being the final push to get us out of the EU on 31st December 2020.

Dominic Cummings is a technocrat. He believes that science, or more precisely data science, can be deployed to understand and help solve almost any problem in government or elsewhere. Earlier this year he upset the governments HR department by posting a job advert, on his personal blog for data scientists, economists and physicists (oh, and weirdos). In this post he says “some people in government are prepared to take risks to change things a lot” and the UK now has “a new government with a significant majority and little need to worry about short-term unpopularity”. He saw these as being “a confluence” implying now was the time to get sh*t done.

So what is dataism, why is Cummings practicing it and what is its likely impact for us going to be moving forward?

The first reference to dataism was by David Brooks, the conservative political commentator, in his 2013 New York Times article The Philosophy of Data. In this article Brooks says:

“We now have the ability to gather huge amounts of data. This ability seems to carry with it certain cultural assumptions — that everything that can be measured should be measured; that data is a transparent and reliable lens that allows us to filter out emotionalism and ideology; that data will help us do remarkable things — like foretell the future”.

David Brooks, The Philosophy of Data

Dataism was then picked up by historian Yuval Noah Harari in his 2016 book Homo Deus. Harari went as far to call dataism a new form of religion which joins together biochemistry and computer science whose algorithms obey the same mathematical laws.

The central tenet of dataism is the idea that the universe gives more value to systems, individuals, and societies that generate the most data to be consumed and processed by algorithms. Harari states that “according to dataism Beethovens Fifth Symphony, a stock-exchange bubble and the flu virus are just three patterns of data flown that can be analysed using the same basic concepts and tools“. That last example is obviously the most relevant to our current situation with SAR-COV-2 or coronavirus still raging around the world and which Cummings, as far as we know, is focused on.

As computer scientist Steven Parton says here:

Dataists believe we should hand over as much information and power to these [big data and machine learning] algorithms as possible, allowing the free flow of data to unlock innovation and progress unlike anything we’ve ever seen before“.

Steven Parton

This, I believe, is Cummings belief also. He has no time for civil servants who are humanities graduates that “chat about Lacan at dinner parties” when they ought to be learning about numbers, probabilities and predictions based on hard data.

Whilst I have some sympathy with the idea of bringing science and data more to the fore in government you have to ask, if Cummings is forging ahead in creating a dataist civil service somewhere in the bowels of Downing Street, why are our COVID-19 deaths the worst, per capita, in the world? This graph shows the data for deaths per 100,000 of population (2018 population data) for the major economies of the world (using this data source.). You’ll see that as of 1st June 2020 the UK is faring the worst of all countries, having just overtaken Spain.

Unfortunately Cummings has now blotted his copybook twice in the eyes of the public and most MPs. Not only did he ignore the governments advice (which he presumably was instrumental in creating) and broke the rules on lockdown he was also found guilty of editing one of his own blog posts sometime between 8 April 2020 and 15 April 2020 to include a paragraph on SARS (which, along with Covid-19, is also caused by a coronavirus) to make out he had been warning about the disease since March of 2019.

Not only is Cummings ignoring the facts derived from the data he is so fond of using he is also doctoring data (i.e. his blog post) to change those facts. In many ways this is just another form of the data manipulation that was being carried out by Cambridge Analytica, the firm that Cummings allegedly used during the Brexit referendum, to bombard peoples Facebook feeds with ‘misleading’ information about the EU.

Cummings is like Gollum in Lord of the Rings. Gollum became corrupted by the power of the “one ring that ruled them all” and turned into a bitter and twisted creature that would do anything to get back “his precious” (the ring). It seems that data corrupts just as much as power. Hardly surprising really because in the dataist’s view of the world data is power.

All in all not a good look for the man that is meant to be changing the face of government and bringing a more data-centric (AKA dataist) approach to lead the country forward post-Brexit. If you cannot trust the man who is leading this initiative how can you trust the data and, more seriously, how can you trust the person who Cummings works for?


Update: 8th June 2020

Since writing this post I’ve read that Belgium is actually the country with the highest per-capita death rate from Covid-19. Here then is an update of my graph which now includes the G7 countries plus China, Spain and Belgium showing that Belgium does indeed have 20 more deaths per capita than the next highest, the UK.

It appears however that Belgium is somewhat unique in how it reports its deaths, being one of the few countries counting deaths in hospitals and care homes and also including deaths in care homes that are suspected, not confirmed, as Covid-19 cases. I suspect that for many countries, the UK included, deaths in care homes is going to end up being one of the great scandals of this crisis. In the UK ministers ordered 15,000 hospital beds to be vacated by 27 March and for patients to be moved into care homes without either adequate testing or adequate amounts of PPE being available.

Trust Google?

Photo by Daniele Levis Pelusi on Unsplash

Google has just released data on people’s movements, gathered from millions of mobile devices that use its software (e.g. Android, Google Maps etc) leading up to and during the COVID-19 lockdown in various countries. The data has been analysed here to show graphically how people spent their time between six location categories: homes; workplaces; parks; public transport stations; grocery shops and pharmacies; and retail and recreational locations.

The data shows how quickly people reacted to the instructions to lockdown. Here in the UK for example we see people reacted late but then strongly, with a rise of about 20-25% staying at home. This delay reflects the fact that lockdown began later, on March 23, in the UK though some people were already staying home before lockdown began.

What we see in the data provided by Google is likely to be only the start and, I suspect, a preview of how we may soon have to live. In the book Homo Deus by Yuval Noah Harari the chapter The Great Decoupling discusses how bioscience and computer science are conspiring to learn more about us than we know about ourselves and in the process destroy the “great liberal project” where we think that we have free-will and are able to make our own decisions about what we eat, who we marry and vote for in elections as well as what career path we choose etc, etc.

Harari asks what will happen when Google et al know more about us than we, or anyone else does? Facebook, for example, already purports to know more about us than our spouse by analysing as few as 300 of our ‘likes’. What if those machines who are watching over us (hopefully with “loving grace” but who knows) can offer us ‘advice’ on who we should vote for based on our previous four years comments and ‘likes’ on Facebook or recommend we should go and see a psychiatrist because of the somewhat erratic comments we have been making in emails to our friends or on Twitter?

The Google we see today, providing us with relatively benign data for us to analyse ourselves, is currently at the level of what Harari says is an ‘oracle’. It has the data and, with the right interpretation, we can use that data to provide us with information to make decisions. Exactly where we are now with coronavirus and this latest dataset.

The next stage is that of Google becoming an ‘agent’. You give Google an aim and it works out the best way to achieve that aim. Say, I want to lose two stone by next summer so I have the perfect beach ready body. Google knows all about my biometric data (they just bought Fitbit remember) as well as your predisposition for buying crisps and watching too much Netflix and comes up with a plan that will allow you to lose that weight provided you follow it.

Finally Google becomes ’sovereign’ and starts making those decisions for you. So maybe it checks your supermarket account and recommends removing those crisps from your shopping list and then, if you continue to ignore its advice it instructs your insurance company who bumps up your health insurance if you don’t.

At this point we ask who is in control. Google, Facebook etc own all that data but that data can be influenced (or hacked) to nudge us to do things we don’t realise. We already know how Cambridge Analytica used Facebook to influence the voting behaviour (we’re looking at you Mr Cummings) in a few swing areas (for Brexit and the last US election). We have no idea how much of that was also being influenced by Russia.

I think humanity is rapidly approaching the point when we really need to be making some hard decisions about how much of our data, and the analysis of that data, we should allow Google, Facebook and Twitter to hold. Should we be starting to think the unthinkable and calling a halt to this ever growing mountain of data each of us willingly gives away for free? But, how do we do that when most of it is being kept and analysed by private companies or worse, by China and Russia?

Pythons and pandas (or why software architects no longer have an excuse not to code)

pythonpanda

The coronavirus pandemic has certainly shown just how much the world depends not just on accurate and readily available datasets but also the ability of scientists and data analysts to make sense of that data. All of us are at the mercy of those experts to interpret this data correctly – our lives could quite literally depend on it.

Thankfully we live in a world where the tools are available to allow anyone, with a bit of effort, to learn how to analyse data themselves and not just rely on the experts to tell us what is happening.

The programming language Python, coupled with the pandas dataset analysis library and Bokeh interactive visualisation library, provide a robust and professional set of tools to begin analysing data of all sorts and get it into the right format.

Data on the coronavirus pandemic is available from lots of sources including the UK’s Office for National Statistics as well as the World Health Organisation. I’ve been using data from DataHub which provides datasets in different formats (CSV, Excel, JSON) across a range of topics including climate change, healthcare, economics and demographics. You can find their coronavirus related datasets here.

I’ve created a set of resources which I’ve been using to learn Python and some of its related libraries which is available on my GitHub page here. You’ll also find the project which I’ve been using to analyse some of the COVID-19 data around the world here.

The snippet of code below shows how to load a CSV file into a panda DataFrame – a 2-dimensional data structure that can store data of different types in columns that is similar to a spreadsheet or SQL table.

# Return COVID-19 info for country, province and date.
def covid_info_data(country, province, date):
    df4 = pd.DataFrame()
    if (country != "") and (date != ""):
        try:
            # Read dataset as a panda dataframe
            df1 = pd.read_csv(path + coviddata)

            # Check if country has an alternate name for this dataset
            if country in alternatives:
                country = alternatives[country]

            # Get subset of data for specified country/region
            df2 = df1[df1["Country/Region"] == country]

            # Get subset of data for specified date
            df3 = df2[df2["Date"] == date]

            # Get subset of data for specified province. If none specified but there
            # are provinces the current dataframe will contain all with the first one being 
            # country and province as 'NaN'. In that case just select country otherwise select
            # province as well.
            if province == "":
                df4 = df3[df3["Province/State"].isnull()]
            else:
                df4 = df3[df3["Province/State"] == province]
        except FileNotFoundError:
            print("Invalid file or path")
    # Return selected covid data from last subset
    return df4

The first ten rows from the DataFrame df1 shows the data from the first country (Afghanistan).

         Date Country/Region Province/State   Lat  Long  Confirmed  Recovered  Deaths
0  2020-01-22    Afghanistan            NaN  33.0  65.0        0.0        0.0     0.0
1  2020-01-23    Afghanistan            NaN  33.0  65.0        0.0        0.0     0.0
2  2020-01-24    Afghanistan            NaN  33.0  65.0        0.0        0.0     0.0
3  2020-01-25    Afghanistan            NaN  33.0  65.0        0.0        0.0     0.0
4  2020-01-26    Afghanistan            NaN  33.0  65.0        0.0        0.0     0.0

Three further subsets of data are made, the final one is for a specific country showing the COVID-19 data for a particular date (the UK on 7th May in this case).

             Date  Country/Region Province/State      Lat   Long  Confirmed  Recovered   Deaths
26428  2020-05-07  United Kingdom            NaN  55.3781 -3.436   206715.0        0.0  30615.0

Once the dataset has been obtained the information can be printed in a more readable way. Here’s a summary of information for the UK on 9th May.

Date:  2020-05-09
Country:  United Kingdom
Province: No province
Confirmed:  215,260
Recovered:  0
Deaths:  31,587
Population:  66,460,344
Confirmed/100,000: 323.89
Deaths/100,000: 47.53
Percent Deaths/Confirmed: 14.67

Obviously there are lots of ways of analysing this dataset as well as how to display it. Graphs are always a good way of showing information and Bokeh is a nice and relatively simple to use Python library for creating a range of different graphs. Here’s how Bokeh can be used to create a simple line graph of COVID-19 deaths over a period of time.

from datetime import datetime as dt
from bokeh.plotting import figure, output_file, show
from bokeh.models import DatetimeTickFormatter

def graph_covid_rate(df):
    x = []
    y = []
    country = df.values[0][1]
    for deaths, date in zip(df['Deaths'], df['Date']):
        y.append(deaths) 
        date_obj = dt.strptime(date, "%Y-%m-%d")
        x.append(date_obj)

    # output to static HTML file
    output_file("lines.html")

    # create a new plot with a title and axis labels
    p = figure(title="COVID-19 Deaths for "+country, x_axis_label='Date', y_axis_label='Deaths', x_axis_type='datetime')

    # add a line renderer with legend and line thickness
    p.line(x, y, legend_label="COVID-19 Deaths for "+country, line_width=3, line_color="green")
    p.xaxis.major_label_orientation = 3/4

    # show the results
    show(p)

Bokeh creates an HTML file of an interactive graph. Here’s the one the above code creates, again for the UK, for the period 2020-02-01 to 2020-05-09.

As a recently retired software architect (who has now started a new career working for Digital Innovators, a company addressing the digital skills gap) coding is still important to me. I’m a believer in the Architect’s Don’t Code anti-pattern believing that design and coding are two sides of the same coin and you cannot design if you cannot code (and you cannot code if you cannot design). These days there really is no excuse not to keep your coding skills up to date with the vast array of resources available to everyone with just a few clicks and Google searches.

I also see coding as not just a way of keeping my own skills up to date and to teach others vital digital skills, but also, as this article helpfully points out, as a way of helping solve problems of all kinds. Coding is a skill for life that is vitally important for young people entering the workplace to at least have a rudimentary understanding of to help them not just get a job but to also understand more of the world in these incredibly uncertain times.

All Watched Over by Machines of Loving Grace?

 

This-HAL-9000-Inspired-AI-Simulation-Kept-Its-Virtual-Astronauts-Alive
The Watching “Eye” of the HAL 9000 Computer from 2001 – A Space Odyssey

I like to think
(it has to be!)
of a cybernetic ecology
where we are free of our labors
and joined back to nature,
returned to our mammal
brothers and sisters,
and all watched over
by machines of loving grace.

The last verse of Richard Brautigan’s 1967 poem, All Watched Over by Machines of Loving Grace, has a particular resonance during these dark and uncertain times caused by the COVID-19 pandemic[1].

The poem, which was also the name of a BBC documentary series by Adam Curtis[2], speaks of a time when we can return to nature and that mammals and computers will live together in “mutually programming harmony” with machines taking care of all our needs.

Things haven’t quite turned out like that have they?

In some kind of warped way maybe our machines are taking care of our needs but are they things we really need taken care of? If by “meeting our needs” we mean machines whose algorithms predict and dictate our shopping choices (Amazon), influence our voting behaviour (Facebook), satisfy our sexual preferences (Tinder, Grindr) or find us cheap rides and accommodation (Uber and Airbnb) then yes, maybe we have reached a mutually programmed harmony. I’m not sure that is exactly what Brautigan had in mind though.

If we think the “machines of loving grace” part of the poem have not quite happened in the way Brautigan predicted it could be that the “all watched over” part is about to become only too true however.

China, where the current coronavirus variant, SARS-CoV-2 originated, was already building the worlds largest social credit system whereby all citizens are given points from which the authorities make deductions for bad behaviour like traffic violations, and add points for good behaviour such as donating to charity. The full system is being rolled out during this decade at which point all citizens will be forced into using the system and everything from credit worthiness to political allegiance will be ‘measured’, not just by the system but by your peers as well. If trust is broken in one place restrictions will be imposed elsewhere meaning the untrustworthy will have reduced access to everything from jobs, to foreign travel, to bank loans and the internet.

Now, as a way of tracking peoples freedom of movement as its citizens come out of the coronavirus lockdown, the government has, through the ubiquitous Alipay and WeChat platforms, developed a “health code” service. This assigns users a colour-coded status based on their health and travel history plus a QR code that can be scanned by authorities. If you have a green code you are allowed to travel relatively freely. A yellow code indicates that the holder should be in home isolation, and a red code says the user is a confirmed COVID-19 patient and should be in quarantine. In China, which is not exactly known for its liberal attitude toward privacy, this may be acceptable as the price to pay for relative freedom of movement however as talk of such apps being rolled out in western liberal democracies start to become news, its citizens may not be quite as accepting of such uses of private data.

A similar system in South Korea that sends emergency virus text alerts has already revealed some embarrassing revelations about infected people’s private lives. These include a text saying “A woman in her 60s has just tested positive. Click on the link for the places she visited before she was hospitalised.” For many people the texts, whilst intended to be helpful, are creating a climate of concern by revealing a little too much personal information including revelations about extra-marital affairs.

At a country level there are already plentiful supplies of open data that allow apps such as this one to track COVID-19 statistics by country. The fact that we have systems and organisations that publish such data is to be applauded and should be seen as a good thing in providing us all (if we can be bothered to look) with plentiful amounts of data to help us come to our own conclusions and combat the unfortunately equally plentiful supply of fake news that abounds on social media about COVID-19. However once such data starts to get more personal that becomes a different matter.

Dominic Cummings, the Prime Ministers chief advisor, hosted a meeting at Downing Street on 11 March with technology company leaders to see how they could help develop an app to tackle COVID-19 and on Easter Sunday the UK government confirmed plans for an app that will warn users if they have recently been in close proximity to someone suspected to be infected with the coronavirus. Meanwhile Apple and Google have announced a system for tracking the spread of the new coronavirus, allowing users to share data through Bluetooth technology.

Four questions immediately arise from this situation?

  1. Should we trust corporations (especially Apple and Google) to be handling location data identifying where we have travelled and who we might have been close to?
  2. Can we trust the government to handle this data sensitively and with due regard to our privacy?
  3. What happens if not enough people use these apps?
  4. Once the pandemic is over can we trust the government and corporations to disable these functions from our phones and our lives?

Let’s take these one at a time.

First, are Google and Apple to be trusted with our private data? Historically neither exactly have a clean slate when it comes to protecting private data. In 2014 third-party software was used to steal intimate photos of celebrities from Apple’s cloud service iCloud, forcing the company to expand it’s two-step authentication service. More recently Hacker News revealed that Apple suffered a possible privacy breach in 2018 due to a bug in its platform that might have exposed iCloud data to other users.

Google’s failed social networking site Google+, which had already suffered a massive data breach in 2018 that exposed the private data of more than 500,000 Google+ users to third-party developers, was shut down earlier than planned in April 2019 following the discovery by Google engineers of another critical security vulnerability.

Despite the breaches of security suffered by these companies it is probably true to say that they have a deeper understanding of their platforms than most companies and government agencies. Putting something temporary in place during this potentially existential threat to society is probably not a bad thing however what happens once the pandemic is over then becomes critical.

Can we trust governments to behave properly with how they handle this data? Again governments do not have a good track records here. Edward Snowden, in his memoir  Permanent Record, reveals the extent of the mass surveillance that was taking place on US citizens by the National Security Agency from 2010 and beyond. If even democratically elected governments do this what chance for the dictatorial regimes of Russia and China? Even during these unprecedented times we should not be too hasty to give away the freedoms that we enjoy today without knowing the extent to which our data could be compromised. As John Naughton explains here there are ways of doing non-intrusive tracking of COVID-19 but to do so our smartphones have to be a bit, well, smarter. This is also a good reason why here in the UK, parliament should be recalled, even in virtual form, to ensure decisions being made in this area are challenged and subject to proper scrutiny.

Next, what happens if not enough people use the apps, either because they don’t trust the government or because not everyone has smartphones or they simply can’t be bothered to install the app and make sure it is active? It is estimated that in order for this to work there must be at least a 60% take up of the app. Can governments somehow enforce its usage and penalise users in someway if they don’t? Maybe they rule that only those who have smartphones with this app installed and active are the ones who will be allowed freedom of movement both to work, socialise and meet with other family members. Whilst this may encourage some to install the app it would alsonput a huge burden on police, the authorities and maybe even your employer as well as shops, bars and restaurants to ensure people moving around or entering their buildings have apps installed.  Also, what about people who don’t have smartphones? Smartphone ownership here in the UK  varies massively by age. In 2019, 96% of 25-34 year olds owned smartphones whereas as only 55% of 55-64 year olds owned these devices and only 16% (figures only available for 2015) of people over 65 owned them. How would they be catered for?

Finally, what happens when the pandemic is over and we return to relative normality? Will these emergency measures be rolled back or will the surveillance state have irrevocably crept one step closer? Recent history (think 9/11) does not provide much comfort here. As Edward Snowden says about the US:

“The two decades since 9/11 have been a litany of American destruction by way of American self-destruction, with the promulgation of secret policies, secret laws, secret courts, and secret wars, whose traumatising impact – whose very existence – the US government has repeatedly classified, denied, disclaimed, and distorted.”

Will our governments not claim there will always be a zoonotic-virus threat and that the war against such viruses, just like the “war on terror” will therefore be never ending and that we must never drop our guard (for which read, we must keep everyone under constant surveillance)?

An open letter published by a group of “responsible technologists” calls upon the NHSX leadership and the Secretary of State for Health and Social Care to ensure new technologies used in the suppression of Coronavirus follow ethical best practice and that if corners are cut, the public’s trust in the NHS will be undermined. The writer Yuval Noah Harari, who is quoted in the open letter by the data campaigners, warns that such measures have a nasty habit of becoming permanent. But he also says this: “When people are given a choice between privacy and health, they will usually choose health.”

Once the surveillance genie has been let out of its bottle it will be very difficult to squish it back in again allowing us to return to times of relative freedom. If we are not careful those machines which are watching over us may not be ones of loving grace but rather ones of mass surveillance and constant monitoring of our movements that make us all a little less free and a little less human.

  1. COVID-19 is the disease caused by the 2019 novel coronavirus or to give it its World Health Organisation designated name severe acute respiratory syndrome coronavirus 2 or SARS-CoV-2.
  2. No longer available on the BBC iPlayer but can be found here.

On Ethics and Algorithms

franck-v-g29arbbvPjo-unsplash
Photo by Franck V. on Unsplash

An article on the front page of the Observer, Revealed: how drugs giants can access your health records, caught my eye this week. In summary the article highlights that the Department of Health and Social Care (DHSC) has been selling the medical data of NHS patients to international drugs companies and have “misled” the public that the information contained in the records would be “anonymous”.

The data in question is collated from GP surgeries and hospitals and, according to “senior NHS figures”, can “routinely be linked back to individual patients’ medical records via their GP surgeries.” Apparently there is “clear evidence” that companies have identified individuals whose medical histories are of “particular interest.” The DHSC have replied by saying it only sells information after “thorough measures” have been taken to ensure patient anonymity.

As with many articles like this it is frustrating when some of the more technical aspects are not fully explained. Whilst I understand the importance of keeping their general readership on board and not frightening them too much with the intricacies of statistics or cryptography it would be nice to know a bit more about how these records are being made anonymous.

There is a hint of this in the Observer report when it states that the CPRD (the Clinical Practice Research Datalink ) says the data made available for research was “anonymous” but, following the Observer’s story, it changed the wording to say that the data from GPs and hospitals had been “anonymised”. This is a crucial difference. One of the more common methods of ‘anonymisation’  is to obscure or redact some bits of information. So, for example, a record could have patient names removed and ages and postcodes “coarsened”, that is only the first part of a postcode (e.g. SW1A rather than SW1A 2AA)  are included and ages are placed in a range rather than using someones actual age (e.g. 60-70 rather than 63).

The problem with anonymising data records is that they are prone to what is referred to as data re-identification or de-anonymisation. This is the practice of matching anonymous data with publicly available information in order to discover the individual to which the data belongs. One of the more famous examples of this is the competition that Netflix organised encouraging people to improve its recommendation system by offering a $50,000 prize for a 1% improvement. The Netflix Prize was started in 2006 but abandoned in 2010 in response to a lawsuit and Federal Trade Commission privacy concerns. Although the dataset released by Netflix to allow competition entrants to test their algorithms had supposedly been anonymised (i.e. by replacing user names with a meaningless ID and not including any gender or zip code information) a PhD student from the University of Texas was able to find out the real names of people in the supplied dataset by cross-referencing the Netflix dataset with Internet Movie Database (IMDB) ratings which people post publicly using their real names.

Herein lies the problem with the anonymisation of datasets. As Michael Kearns and Aaron Roth highlight in their recent book The Ethical Algorithm, when an organisation releases anonymised data they can try and make an intelligent guess as to which bits of the dataset to anonymise but it can be difficult (probably impossible) to anticipate what other data sources either already exist or could be made available in the future which could be used to correlate records. This is the reason that the computer scientist Cynthia Dwork has said “anonymised data isn’t” – meaning either it isn’t really anonymous or so much of the dataset has had to be removed that it is no longer data (at least in any useful way).

So what to do? Is it actually possible to release anonymised datasets out into the wild with any degree of confidence that they can never be de-anonymised? Thankfully something called differential privacy, invented by the aforementioned Cynthia Dwork and colleagues, allows us to do just that. Differential privacy is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in that dataset.

To understand how differential privacy works consider this example*. Suppose we want to conduct a poll of all people in London to find out who have driven after taking non-prescription drugs. One way of doing this is to randomly sample a suitable number of Londoners, asking them if they have ever driven whilst under the influence of drugs. The data collected could be entered into a spreadsheet and various statistics, e.g. number of men, number of women, maybe ages etc derived. The problem is that whilst collecting this information lots of compromising personal details may be collected which, if the data were stolen, could be used against them.

In order to avoid this problem consider the following alternative. Instead of asking people the question directly, first ask them to flip a coin but not to tell us how it landed. If the coin comes up heads they tell us (honestly) if they have driven under the influence. If it comes up tails however they tell us a random answer then flip the coin again and tell us “yes” if it comes up heads or “no” if it is tails. This polling protocol is a simple randomised algorithm which is a form of differential privacy. So how does this work?

differential privacy
If your answer is no, the randomised response answers no two out of three times. It answers no only one out of three times if your answer is yes. Diagram courtesy Michael Kearns and Aaron Roth, The Ethical Algorithm 2020

When we ask people if they have driven under the influence using this protocol half the time (i.e. when the coin lands heads up) the protocol tells them to tell the truth. If the protocol tells them to respond with a random answer (i.e. when the coin lands tails up), then half of that time they just happen to randomly tell us the right answer. So they tell us the right answer 1/2 + ((1/2) x (1/2)) or three-quarters of the time. The remaining one quarter of the time they tell us a lie. There is no way of telling true answers from lies. Surely though, this injection of randomisation completely masks the true results and the data is now highly error prone? Actually, it turns out, this is not the case.

Because we know how this randomisation is introduced we can reverse engineer the answers we get to remove the errors and get an approximation of the right answer. Here’s how. Suppose one-third of people in London have actually driven under the influence of drugs. So of the one-third who have truthfully answered “yes” to the question, three-quarters of those will answer “yes” using the protocol, that is 1/3 x 3/4 = 1/4. Of the two-thirds who have a truthful answer of “no”, one-quarter of those will report “yes”, that is 2/3 x 1/4 = 1/6. So we expect 1/4 + 1/6 = 5/12 ~ 1/3 of the population to answer “yes”.

So what is the point of doing the survey like this? Simply put it allows the true answer to be hidden behind the protocol. If the data were leaked and an individual from it was identified as being suspected of driving under the influence then they could always argue they were told to say “yes” because of the way the coins fell.

In the real world a number of companies including the US census, Apple, Google and Privitar Lens use differential privacy to limit the disclosure of private information about individuals whose information is in public databases.

It would be nice to think that the NHS data that is supposedly being used by US drug companies was protected by some form of differential privacy. If it were, and if this could be explained to the public in a reasonable and rational way, then surely we would all benefit both in the knowledge that our data is safe and is maybe even being put to good use in protecting and improving our health. After all, wasn’t this meant to be the true benefit of living in a connected society where information is shared for the betterment of all our lives?

*Based on an example from Kearns and Roth in The Ethical Algorithm.

Cummings needs data scientists, economists and physicists (oh, and weirdos)

Dominic Cummings
Dominic Cummings – Image Copyright Business Insider Australia

To answer my (rhetorical) question in this post I think it’s been pretty much confirmed since the election that Dominic Cummings is, in equal measures, the most influential, disruptive, powerful and dangerous man in British politics right now. He has certainly set the cat amongst the pigeons in this blog post where he has effectively by-passed the civil service recruitment process by advertising for people to join his ever growing team of SPAD’s (special advisors). Cummings is looking for data scientists, project managers, policy experts and assorted weirdos to join his team. (Interestingly today we hear that the self-proclaimed psychic Uri Geller has applied for the job believing he qualifies because of the super-talented weirdo aspect of the job spec.)

Cummings is famed for his wide reaching reading tastes and the job spec also cites a number of scientific papers potential applicants “will be considering”. The papers mentioned are broadly in the areas of complex systems and the use of maths and statistics in forecasting which give an inkling into the kind of problems Cummings sees as those that need to be ‘fixed’ in the civil service as well as the government at large (including the assertion that “Brexit requires many large changes in policy and in the structure of decision-making”).

Like many of his posts, this particular one tends to ramble and also be contradictory. In one paragraph he’s saying that you “do not need a PhD” but then in the very next one saying you  “must have exceptional academic qualifications from one of the world’s best universities with a PhD or MSc in maths or physics.”

Cummings also returns to one of his favourite topics which is that of the failure of projects – mega projects in particular – and presumably those that governments tend to initiate and not complete on time or to budget (or at all). He’s an admirer of some of the huge project successes of yesteryear such as The Manhattan Project (1940s), ICBMs (1950s) and Apollo (1960s) but reckons that since then the Pentagon has “systematically de-programmed itself from more effective approaches to less effective approaches from the mid-1960s, in the name of ‘efficiency’.” Certainly the UK government is no stranger to some spectacular project failures itself both in the past and present (HS2 and Crossrail being two more contemporary examples of not so much failures but certainly massive cost overruns).

However as John Naughton points out here  “these inspirational projects have some interesting things in common: no ‘politics’, no bureaucratic processes and no legal niceties. Which is exactly how Cummings likes things to be.” Let’s face it both Crossrail and HS2 would be a doddle of only you could do away with all those pesky planning proposals and environmental impact assessments you have to do and just move people out of the way quickly – sort of how they do things in China maybe?

Cummings believes that now is the time to bring together the right set of people with a sufficient amount of cognitive diversity and work in Downing Street with him and other SPADs to start to address some of the wicked problems of government. One ‘lucky’ person will be his personal assistant, a role which he says will “involve a mix of very interesting work and lots of uninteresting trivia that makes my life easier which you won’t enjoy.” He goes on to say that in this role you “will not have weekday date nights, you will sacrifice many weekends — frankly it will hard having a boy/girlfriend at all. It will be exhausting but interesting and if you cut it you will be involved in things at the age of ~21 that most people never see.” That’s quite some sales pitch for a job!

What this so called job posting is really about though is another of Cummings abiding obsessions (which he often discusses in his blog) that the government in general, and civil service in particular (which he groups together as “SW1”), is basically not fit for purpose because it is scientifically and technologically illiterate as well as being staffed largely with Oxbridge humanities graduates. The posting is also a thinly veiled attempt at pushing the now somewhat outdated ‘move fast and break things” mantra of Silicon Valley. An approach that does not always play out well in government (Universal Credit anyone). I well remember my time working at the DWP (yes, as a consultant) where one of the civil servants with whom I was working said that the only problem with disruption in government IT was that it was likely to lead to riots on the streets if benefit payments were not paid on time. Sadly, Universal Credit has shown us that it’s not so much street riots that are caused but a demonstrable increase in demand for food banks. On average, 12 months after roll-out, food banks see a 52% increase in demand, compared to 13% in areas with Universal Credit for 3 months or less.

Cummings of course would say that the problem is not so much that disruption per se causes problems but rather the ineffective, stupid and incapable civil servants who plan and deploy such projects are at fault, hence the need for hiring the right ‘assorted weirdos’ who will bring new insights that fusty old civil servants cannot see. Whilst he may well be right that SW1 is lacking in deep technical experts as well as great project managers and ‘unusual’ economists he needs to realise that government transformation cannot succeed unless it is built on a sound strategy and good underlying architecture. Ideas are just thoughts floating in space until they can be transformed into actions that result in change which takes into account that the ‘products’ that governments deal with are people not software and hardware widgets.

This problem is far better articulated by Hannah Fry when she says that although maths has, and will continue to have, the capability to transform the world those who apply equations to human behaviour fall into two groups: “those who think numbers and data ultimately hold the answer to everything, and those who have the humility to realise they don’t.”

Possibly the last words should be left to Barack Obama who cautioned Silicon Valley’s leaders thus:

“The final thing I’ll say is that government will never run the way Silicon Valley runs because, by definition, democracy is messy. This is a big, diverse country with a lot of interests and a lot of disparate points of view. And part of government’s job, by the way, is dealing with problems that nobody else wants to deal with.

So sometimes I talk to CEOs, they come in and they start telling me about leadership, and here’s how we do things. And I say, well, if all I was doing was making a widget or producing an app, and I didn’t have to worry about whether poor people could afford the widget, or I didn’t have to worry about whether the app had some unintended consequences — setting aside my Syria and Yemen portfolio — then I think those suggestions are terrific. That’s not, by the way, to say that there aren’t huge efficiencies and improvements that have to be made.

But the reason I say this is sometimes we get, I think, in the scientific community, the tech community, the entrepreneurial community, the sense of we just have to blow up the system, or create this parallel society and culture because government is inherently wrecked. No, it’s not inherently wrecked; it’s just government has to care for, for example, veterans who come home. That’s not on your balance sheet, that’s on our collective balance sheet, because we have a sacred duty to take care of those veterans. And that’s hard and it’s messy, and we’re building up legacy systems that we can’t just blow up.”

Now I think that’s a man who shows true humility, something our current leaders (and their SPADs) could do with a little more of I think.

 

The story so far…

 

Photo by Joshua Sortino on Unsplash
Photo by Joshua Sortino on Unsplash

It’s hard to believe that this year is the 30th anniversary of Tim Berners-Lee’s great invention, the World-Wide Web, and that much of the technology that enabled his creation is still less than 60 years old. Here’s a brief history of the Internet and the Web, and how we got to where we are today, in ten significant events.

 

#1: 1963 – Ted Nelson begins developing a model for creating and using linked content he calls hypertext and hypermedia. Hypertext is born.

#2: 1969 – The first message is sent over the ARPANET from computer science Professor Leonard Kleinrock’s laboratory at University of California, Los Angeles to the second network node at Stanford Research Institute. The Internet is born.

#3: 1969 – Charles Goldfarb, leading a small team at IBM, developed the first markup language, called Generalized Markup Language, or GML. Markup languages are born.

#4: 1989 – Tim Berners-Lee whilst working at CERN publishes his paper, Information Management: A Proposal. The World Wide Web (WWW) is born.

#5: 1993Mosaic, a graphical browser aiming to bring multimedia content to non-technical users (images and text on the same page) is invented by Marc Andreessen. The web browser is born.

#6: 1995 – Jeff Bezos launches Amazon “earth’s biggest bookstore” from a garage in Seattle. E-commerce is born.

#7: 1998 – The Google company is officially launched by Larry Page and Sergey Brin to market Google Search. Web search is born.

#8: 2003Facebook (then called FaceMash but changed to The Facebook a year later) is founded by Mark Zuckerberg with his college roommate and fellow Harvard University student Eduardo Saverin. Social media is born.

#9: 2007 – Steve Jobs launches the iPhone at MacWorld Expo in San Francisco. Mobile computing is born.

#10: 2018 – Tim Berners-Lee instigates act II of the web when he announces a new initiative called Solid, to reclaim the Web from corporations and return it to its democratic roots. The web is reborn?

I know there have been countless events that have enabled the development of our modern Information Age and you will no doubt think others should be included in preference to some of my suggestions. Also, I suspect that many people will not have heard of my last choice (unless you are a fairly hardcore computer type). The reason I have added this one is because I think/hope it will start to address what is becoming one of the existential threats of our age, namely how we survive in a world awash with data (our data) that is being mined and used without us knowing, much less understanding, the impact of such usage. Rather than living in an open society in which ideas and data are freely exchanged and used to everyones benefit we instead find ourselves in an age of surveillance capitalism which, according to this source, is defined as being:

…the manifestation of George Orwell’s prophesied Memory Hole combined with the constant surveillance, storage and analysis of our thoughts and actions, with such minute precision, and artificial intelligence algorithmic analysis, that our future thoughts and actions can be predicted, and manipulated, for the concentration of power and wealth of the very few.

In her book The Age of Surveillance Capitalism, Shoshana Zuboff provides a sweeping (and worrying) overview and history of the techniques that the large tech companies are using to spy on us in ways that even George Orwell would have found alarming. Not least because we have voluntarily given up all of this data about ourselves in exchange for what are sometimes the flimsiest of benefits. As Zuboff says:

Thanks to surveillance capitalism the resources for effective life that we seek in the digital realm now come encumbered with a new breed of menace. Under this new regime, the precise moment at which our needs are met is also the precise moment at which our lives are plundered for behavioural data, and all for the sake others gain.

Tim Berners-Lee invented the World-Wide Web then gave it away so that all might benefit. Sadly some have benefited more than others, not just financially but also by knowing more about us than most of us would ever want or wish. I hope for all our sakes the work that Berners-Lee and his small group of supporters is doing make enough progress to reverse the worst excesses of surveillance capitalism before it is too late.

What Are Digital Skills?

Photo by Sabri Tuzcu on Unsplash
Photo by Sabri Tuzcu on Unsplash

There have been many, many reports both globally and within the UK bemoaning the lack of digital skills in todays workforce. The term digital skills is somewhat amorphous however and can mean different things to different people.

To more technical types it can mean the ability to write code, develop new computer hardware or have deep insights into how networks are set up and configured. To less digital savvy people it may just mean the ability to operate digital technology such as tablets and mobile phones or how to find information on the world wide web or even just fill out forms on web sites (e.g. to apply for a bank account).

A recent report from the CBI, Delivering Skills for the New Economy , which comes up with a number of concrete steps on how the UK might address a shortage of digital skills, suggests the following as a way of categorising these skills. Useful if we are to find way in which to address their scarcity.

  • Basic digital skills: Businesses define basic digital skills in similar terms. For most businesses this means computer literacy such as familiarity with Microsoft Office; handling digital information and content; core skills such as communication and problem-solving; and understanding how digital technologies work. This understanding of digital technologies includes understanding how data can be used to glean new insights, how social media provides value for a business or how an algorithm or piece of digitally-enabled machinery works.
    Basic digital skills: Businesses define basic digital skills in similar terms. For most businesses this means computer literacy such as familiarity with Microsoft Office; handling digital information and content; core skills such as communication
    and problem-solving; and understanding how digital technologies work. This understanding of digital technologies includes understanding how data can be used to glean new insights, how social media provides value for a business or how an algorithm or piece of digitally-enabled machinery works.
  • Advanced digital skills: Businesses also broadly agree on the definitions of advanced digital skills. For most businesses, these include software engineering and development (77%), data analytics (77%), IT support and system maintenance (81%) and digital marketing and sales (72%). Businesses have highlighted their increasing need for specific advanced digital skills, including programming, visualisation, machine learning, data analytics, app development, 3D printing expertise, cloud awareness and cybersecurity.
    It is important that a good grounding in the basic (core) skills is given to as many people as possible. The so called digital natives or “Gen Zs” (at least in first world countries) have grown up knowing nothing else but the world wide web, touch screen technology and pervasive social media. Older generations, less so. All need this information if they are to operate effectively in the “New Economy” (or know enough to actively disengage from it if they choose to do so).

The basic skills will also allow for a more critical assessment of what advanced digital skills should be considered if making choices about jobs or if people just need to understand what social media companies they should or should not be using or how artificial intelligence might affect their career prospects.

I would argue that a basic level of advanced digital knowledge is also a requirement so that everyone can play a more active role in this modern economy and understand the implications of technology.

Do Startups Need Enterprise Architectures?

And by implication, do they need Enterprise Architects?

For the last few years I have been meeting with startup and scaleup founders, in and around my local community of Birmingham in the United Kingdom, offering thoughts and advice on how they should be thinking about the software and systems architectures of the applications and platforms they are looking to build. Inevitably, as the discussions proceed and we discuss not just what they need now (the minimum viable architecture if you like) but what they might need in the future, as they hopefully scale and grow, the question arises of how and why they need to worry about architecture now and can’t leave it until later when hopefully they’ll have a bit more investment and be able to hire people to do it “properly”.

To my mind this is a bit of a chicken and egg type question. Do they set the groundwork properly now, in the hope that this will then form the foundation on which they grow or, do they just hack something together “quick and dirty” to get paying customers onto the platform and only then think about architecture and go through the pain of moving onto a more robust platform that will be future proof? Or do they try and have their cake and eat it and somehow do a bit of both? Spending too much time on pointless architecture may mean you never get to launch because you run out of time or money. On the other hand if you don’t have at least the basics of an architecture you might launch but quickly collapse if your system cannot support an unexpected surge in users or maybe a security breach.

These are important questions to consider and ones which need to be addressed early on in the startup cycle, at least to the degree that if an enterprise architecture (EA) is not laid out, the reasons for not doing so are clear. In other words has an architecture decision been made for having an EA or not?

No startup being formed today should consider that IT and business are separate endeavours. Whether you are a new cake shop selling patisseries using locally sourced ingredients or a company launching a new online social media enterprise that hopes to take onFacebook, IT will be fundamental to your business model. Decisions you make when starting out could live with you for a long, long time (and remember just five years is a long time in the IT world).

Interestingly it’s often the business folk that understand the need for doing architecture early on rather than IT people. Possibly this is because business people are looking at costs not just today but over a five year period and want to minimise the amount of IT rework they need to do. IT folk just see it as bringing in new cool toys to play with every year or so (and may not even be around in five years time anyway as they are more likely to be contract staff).

Clearly the amount of EA a startup or scaleup needs has to be in proportion to the size and ambitions of the business. If they are a one or two man band who just needs a minimum viable product to show to investors then maybe a simple solution architecture captured using, for example, the C4 model for software architecture will suffice.

For businesses who are beyond their first seed rounding of funding and are into series A or B investment then I do believe they should be thinking seriously about using part of that investment to build some elements of an EA. Whether you use TOGAF or Zachman or any other framework doesn’t really matter. What does matter is that you capture the fundamental organisation of the system you wish to build to help offset the amount of technical debt you are prepared to take on.

Here are some pointers and guidelines for justifying an EA to your founders, investors and employees.

Reducing Your Technical Debt

Technical debt is what you get when you release not-quite-right code out into the world. Every minute spent on not-quite-right code counts as interest on that debt. The more you can do to ensure “right-first-time” deployments the lower your maintenance costs and the more of your budget you’ll have to spend on innovation rather than maintenance. The lower, in other words, are your technical debt interest repayments. For a scaleup this is crucial. As much of every dollar of your investment funding that can go on marketing or innovating your platform leads to an improved bottom line, more customers and overall making you attractive to potential buyers. This leads to…

Enhancing Your Exit Strategy

Many, if not most, startups have an exit strategy which usually involves being brought out by a larger company making the founders and initial investors rich beyond their wildest dreams enabling them to go on and found other startups (or retire onto a yacht in the Caribbean). Those large companies are going to want to see what they are getting for their money and having a well thought through and documented EA is a step on the way to proving that. The large company does not want to become another HP who got its fingers badly burnt in the infamous Autonomy buyout.

To Infinity and Beyond

Maybe your founder is another Mark Zuckerberg who refuses the advances of other companies and instead decides to go it alone in conquering the world. If successful there is going to be some point at which your user base grows beyond what you ever thought possible. If that’s the case hopefully you architected your system to be extensible as well as manageable to support all those users. Whilst there are no guarantees at least having the semblance of an EA will reduce the chances of having to go back to square one and rebuilding your whole technology base from scratch.

Technology Changes

Hardly an Earth shattering statement but often one which you tend not to consider when starting out. As technologists, we all know the ever increasing pace of change of the technology we deal with and the almost impossible task of keeping up with such change. Whilst there may be no easy ways to deal with the complete left-field, unexpected (who’d have thought blockchain would be a thing just 10 years go) having a well thought through business, data, application and technology architecture which contains loosely coupled components with well understood functionality and relationships at least means you can change or upgrade those components when the technology improves.

K.I.S.S (Keep It Simple Stupid)

It is almost certain that the first version of the system you come up with will be more complicated than it needs to be (which is different from saying the system is complex). Anyone who invests in V1.0 or even V2.0 of a system is almost certainly buying more complexity than they need. That complexity will manifest itself in how easy (or not) it is to maintain or upgrade the solution or how often it fails or under-performs. By following at least part of a well thought through architecture development method (ADM) which you iterate over a few times in the early stages of the development process you should be able to spot complexity, redundant components or ones which should not be built by you at all but which should be procured instead as commercial of the shelf (COTS) products. Bear in mind of course that COTS products are not always what they seem and using EA and an ADM to evaluate such products can save much grief further down the road.

It’s All About Strategy

For a startup, or even a scaleup, getting your strategy right is key. Investors often want to see a three or even five year plan which shows how you will execute on your strategy. Without having a strategy which maps out where you want to go how will you ever be able to build a plan that shows how you will get there?

Henry Mintzberg developed the so called 5P’s of Strategy as being:

  1. Strategy as Planning – large planning exercises, defining the future of the organisation.
  2. Strategy as a Ploy – to act to influence a competitor or market.
  3. Strategy as a Position – to act to take a chosen place in the chosen market.
  4. Strategy as Pattern – the strategy that has evolved over time.
  5. Strategy as a Perspective – basing the strategy on cultural values or similar non-tangible concepts.

For startups, where it is important to understand early on your ploy (who are you going to disrupt), your position (where do you want to place yourself and what differentiates you in the market) and perspective (what are your cultural values that makes you stand out), an EA can help you formulate these. Your business architecture can be developed to show how it supports these P’s and your technology architecture can show how it implements them.

As an aside one of the most effective tools I have used for understanding strategy and mapping and deciding what should be built versus what should be bought are Wardley maps. Not usually part of an EA or an ADM but worthwhile investigating.

So, in summary, whilst having an EA for a startup will in no way guarantee its success I believe not having an EA could inhibit its smooth growth and in the long run cost more in rework and refactoring (i.e. to reduce the amount of technical debt) than the cost of putting together, at least an outline EA, in the first place. How much EA is right for your startup/scaleup is up to you but it’s worthwhile giving some thought to this as part of your initial planning.