Pythons and Pandas (Or Why Software Architects No Longer Have an Excuse Not to Code)


The coronavirus pandemic has certainly shown just how much the world depends not just on accurate and readily available datasets but also on the ability of scientists and data analysts to make sense of that data. All of us are at the mercy of those experts to interpret this data correctly – our lives could quite literally depend on it.

Thankfully we live in a world where the tools are available to allow anyone, with a bit of effort, to learn how to analyse data for themselves rather than just relying on the experts to tell them what is happening.

The Python programming language, coupled with the pandas data analysis library and the Bokeh interactive visualisation library, provides a robust and professional set of tools for analysing data of all sorts and getting it into the right format.

Data on the coronavirus pandemic is available from lots of sources including the UK’s Office for National Statistics as well as the World Health Organisation. I’ve been using data from DataHub which provides datasets in different formats (CSV, Excel, JSON) across a range of topics including climate change, healthcare, economics and demographics. You can find their coronavirus related datasets here.

I’ve created a set of resources which I’ve been using to learn Python and some of its related libraries which is available on my GitHub page here. You’ll also find the project which I’ve been using to analyse some of the COVID-19 data around the world here.

The snippet of code below shows how to load a CSV file into a pandas DataFrame – a two-dimensional data structure, similar to a spreadsheet or SQL table, that can store data of different types in columns – and then filter it down to a particular country, province and date.

import pandas as pd

# path, coviddata and alternatives are assumed to be defined elsewhere in
# the module: the data directory, the CSV file name and a dictionary
# mapping countries to alternative names used in this dataset.

# Return COVID-19 info for country, province and date.
def covid_info_data(country, province, date):
    df4 = pd.DataFrame()
    if (country != "") and (date != ""):
        try:
            # Read dataset as a pandas DataFrame
            df1 = pd.read_csv(path + coviddata)

            # Check if country has an alternate name for this dataset
            if country in alternatives:
                country = alternatives[country]

            # Get subset of data for specified country/region
            df2 = df1[df1["Country/Region"] == country]

            # Get subset of data for specified date
            df3 = df2[df2["Date"] == date]

            # Get subset of data for specified province. If none specified but there
            # are provinces the current dataframe will contain all with the first one being 
            # country and province as 'NaN'. In that case just select country otherwise select
            # province as well.
            if province == "":
                df4 = df3[df3["Province/State"].isnull()]
            else:
                df4 = df3[df3["Province/State"] == province]
        except FileNotFoundError:
            print("Invalid file or path")
    # Return selected covid data from last subset
    return df4

The first few rows of the DataFrame df1 show the data for the first country (Afghanistan).

         Date Country/Region Province/State   Lat  Long  Confirmed  Recovered  Deaths
0  2020-01-22    Afghanistan            NaN  33.0  65.0        0.0        0.0     0.0
1  2020-01-23    Afghanistan            NaN  33.0  65.0        0.0        0.0     0.0
2  2020-01-24    Afghanistan            NaN  33.0  65.0        0.0        0.0     0.0
3  2020-01-25    Afghanistan            NaN  33.0  65.0        0.0        0.0     0.0
4  2020-01-26    Afghanistan            NaN  33.0  65.0        0.0        0.0     0.0

Three further subsets of data are then made; the final one contains the COVID-19 data for a specific country on a particular date (the UK on 7th May in this case).

             Date  Country/Region Province/State      Lat   Long  Confirmed  Recovered   Deaths
26428  2020-05-07  United Kingdom            NaN  55.3781 -3.436   206715.0        0.0  30615.0
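
This single-row subset is what a call to the function defined above returns; as a quick usage sketch:

df_uk = covid_info_data("United Kingdom", "", "2020-05-07")
print(df_uk)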

Once the dataset has been obtained, the information can be printed in a more readable way. Here's a summary of the information for the UK on 9th May.

Date:  2020-05-09
Country:  United Kingdom
Province: No province
Confirmed:  215,260
Recovered:  0
Deaths:  31,587
Population:  66,460,344
Confirmed/100,000: 323.89
Deaths/100,000: 47.53
Percent Deaths/Confirmed: 14.67
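
Here's a minimal sketch of how such a summary might be produced from the DataFrame returned by covid_info_data. Note that the populations dictionary here is hypothetical; in the real project the population figure comes from a separate demographics dataset.

# Hypothetical population lookup; in the real project this comes from a
# separate demographics dataset.
populations = {"United Kingdom": 66460344}

def print_covid_summary(country, province, date):
    df = covid_info_data(country, province, date)
    if df.empty:
        print("No data found")
        return
    row = df.iloc[0]  # the filtered DataFrame holds a single row
    population = populations[country]
    confirmed, deaths = row["Confirmed"], row["Deaths"]
    print("Date: ", row["Date"])
    print("Country: ", row["Country/Region"])
    print("Province:", province if province else "No province")
    print(f"Confirmed:  {confirmed:,.0f}")
    print(f"Recovered:  {row['Recovered']:,.0f}")
    print(f"Deaths:  {deaths:,.0f}")
    print(f"Population:  {population:,}")
    print(f"Confirmed/100,000: {confirmed / population * 100000:.2f}")
    print(f"Deaths/100,000: {deaths / population * 100000:.2f}")
    print(f"Percent Deaths/Confirmed: {deaths / confirmed * 100:.2f}")

print_covid_summary("United Kingdom", "", "2020-05-09")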

Obviously there are lots of ways of analysing this dataset, and lots of ways of displaying the results. Graphs are always a good way of showing information, and Bokeh is a nice, relatively simple-to-use Python library for creating a range of different graphs. Here's how Bokeh can be used to create a simple line graph of COVID-19 deaths over a period of time.

from datetime import datetime as dt
from bokeh.plotting import figure, output_file, show

def graph_covid_rate(df):
    x = []
    y = []
    # Take the country name from the first row of the DataFrame
    country = df["Country/Region"].iloc[0]
    # Build parallel lists of death counts and dates (as datetime objects)
    for deaths, date in zip(df['Deaths'], df['Date']):
        y.append(deaths)
        date_obj = dt.strptime(date, "%Y-%m-%d")
        x.append(date_obj)

    # output to static HTML file
    output_file("lines.html")

    # create a new plot with a title and axis labels
    p = figure(title="COVID-19 Deaths for "+country, x_axis_label='Date', y_axis_label='Deaths', x_axis_type='datetime')

    # add a line renderer with legend and line thickness
    p.line(x, y, legend_label="COVID-19 Deaths for "+country, line_width=3, line_color="green")
    p.xaxis.major_label_orientation = 3/4

    # show the results
    show(p)

Bokeh creates an HTML file of an interactive graph. Here’s the one the above code creates, again for the UK, for the period 2020-02-01 to 2020-05-09.
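
For completeness, here's a sketch of how the pieces can be wired together to produce that graph. The date filter works because the ISO-formatted date strings in the dataset sort lexicographically, so plain string comparison gives the right range.

# Select the UK national rows (Province/State is NaN) for the period of
# interest; df1 is the full DataFrame loaded with pd.read_csv earlier.
uk = df1[(df1["Country/Region"] == "United Kingdom")
         & (df1["Province/State"].isnull())
         & (df1["Date"] >= "2020-02-01")
         & (df1["Date"] <= "2020-05-09")]
graph_covid_rate(uk)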

As a recently retired software architect (who has now started a new career working for Digital Innovators, a company addressing the digital skills gap), coding is still important to me. I regard "Architects Don't Code" as an anti-pattern: design and coding are two sides of the same coin, and you cannot design if you cannot code (and you cannot code if you cannot design). These days there really is no excuse not to keep your coding skills up to date, with a vast array of resources available to everyone within just a few clicks and Google searches.

I also see coding not just as a way of keeping my own skills up to date and of teaching others vital digital skills but also, as this article helpfully points out, as a way of helping solve problems of all kinds. Coding is a skill for life, and having at least a rudimentary understanding of it is vitally important for young people entering the workplace, not just to help them get a job but also to help them understand more of the world in these incredibly uncertain times.

The Fall and Rise of the Full Stack Architect


Almost three years ago to the day I wrote a post on here called Happy 2013 and Welcome to the Fifth Age! The 'ages' of (commercial) computing discussed there were:

  • First Age: The Mainframe Age (1960 – 1975)
  • Second Age: The Mini Computer Age (1975 – 1990)
  • Third Age: The Client-Server Age (1990 – 2000)
  • Fourth Age: The Internet Age (2000 – 2010)
  • Fifth Age: The Mobile Age (2010 – 20??)

One of the things I wrote in that article was this:

“Until a true multi-platform technology such as HTML5 is mature enough, we are in a complex world with lots of new and rapidly changing technologies to get to grips with as well as needing to understand how the new stuff integrates with all the old legacy stuff (again). In other words, a world which we as architects know and love and thrive in.”

So, three years later, are we any closer to having a multi-platform technology? Where does cloud computing fit into all of this and is multi-platform technology making the world get more or less complex for us as architects?

In this post I argue that cloud computing is actually taking us to an age where rather than having to spend our time dealing with the complexities of the different layers of architecture we can be better utilised by focussing on delivering business value in the form of new and innovative services. In other words, rather than us having to specialise as layer architects we can become full-stack architects who create value rather than unwanted or misplaced technology. Let’s explore this further.

The idea of the full stack architect.

Vitruvius, the Roman architect and civil engineer, defined the role of the architect thus:

“The ideal architect should be a [person] of letters, a mathematician, familiar with historical studies, a diligent student of philosophy, acquainted with music, not ignorant of medicine, learned in the responses of juriconsults, familiar with astronomy and astronomical calculations.”

Vitruvius also believed that an architect should focus on three central themes when preparing a design for a building: firmitas (strength), utilitas (functionality), and venustas (beauty).

Vitruvian Man by Leonardo da Vinci

For Vitruvius, then, the architect was a multi-disciplined person, knowledgeable in both the arts and sciences. Architecture was not just about functionality and strength but beauty as well. If such a person actually existed then they had a fairly complete picture of the whole 'stack' of things that needed to be considered when architecting a new structure.

So how does all this relate to IT?

In the first age of computing (roughly 1960 – 1975) life was relatively simple. There was a mainframe computer hidden away in the basement of a company, managed by a dedicated team of operators who guarded their prized possession with great care and controlled who had access to it and when. You were limited in what you could do with these systems not only by cost and availability but also by the fact that their architectures were fixed and the choice of programming languages (COBOL, PL/I and assembler come to mind) to make them do things was also pretty limited. The architect (should such a role have actually existed then) had a fairly simple task, as their options were relatively limited and the number of architectural decisions that needed to be made was correspondingly small. Like Vitruvius' architect, one could see that it would be fairly straightforward to understand the full compute stack upon which business applications needed to run.

Indeed, as the understanding of these computing engines increased you could imagine that the knowledge of the architects and programmers who built systems around these workhorses of the first age reached something of a ‘plateau of productivity’*.


However things were about to get a whole lot more complicated.

The fall of the full stack architect.

As IT moved into its second age and beyond (i.e. with the advent of mini computers, personal computers, client-server, the web and the early days of the internet) the breadth and complexity of the systems that were built increased. This is not just because of the growth in the number of programming languages, compute platforms and technology providers but also because each age has built another layer on the previous one. The computers from a previous age never go away; they just become the legacy that subsequent ages must deal with. Complexity has also increased because of the pervasiveness of computers. In the fifth age the number of people whose lives are now affected by these machines is orders of magnitude greater than it was in the first age.

All of this has led to niches and specialisms that were inconceivable in the early age of computing. As a result, architecting systems also became more complex giving rise to what have been termed ‘layer’ architects whose specialities were application architecture, infrastructure architecture, middleware architecture and so on.


Whole professions have been built around these disciplines leading to more and more specialisation. Inevitably this has led to a number of things:

  1. The need for communications between the disciplines (and for them to understand each other's 'language').
  2. As more knowledge accrues in one discipline, and people specialise in it more, it becomes harder for inter-disciplinary understanding to happen.
  3. Architects became hyper-specialised in their own discipline (layer), leading to a kind of 'peak of inflated expectations'* (at least amongst practitioners of each discipline) as to what they could achieve using the technology they were so well versed in, but something of a 'trough of disillusionment'* for the business (who paid for those systems) when they did not deliver the expected capabilities and came in over cost and behind schedule.


So what of the mobile and cloud age which we now find ourselves in?

The rise of the full stack architect.

As the stack we need to deal with has become more ‘cloudified’ and we have moved from Infrastructure as a Service (IaaS) to Platform as a Service (PaaS) it has become easier to understand the full stack as an architect. We can, to some extent, take for granted the lower, specialised parts of the stack and focus on the applications and data that are the differentiators for a business.


We no longer have to worry about what type of server to use or even what operating system or programming environments have to be selected. Instead we can focus on what the business needs and how that need can be satisfied by technology. With the right tools and the right cloud platforms we can hopefully climb the ‘slope of enlightenment’ and reach a new ‘plateau of productivity’*.


As Neal Ford, Software Architect at Thoughtworks, says in this video:

“Architecture has become much more interesting now because it’s become more encompassing … it’s trying to solve real problems rather than play with abstractions.”


I believe that the fifth age of computing really has the potential to take us to a new plateau of productivity and hopefully allow all of us to be the kind of architects described by this great definition from the author, marketeer and blogger Seth Godin:

“Architects take existing components and assemble them in interesting and important ways.”

What interesting and important things are you going to do in this age of computing?

* Diagrams and terms borrowed from Gartner’s hype cycle.

When a Bridge Falls Down, Is the Architect to Blame?

Here’s a question: when a bridge or building falls down, whose “fault” is it? Is it the architect who designed the bridge or building in the first place, the builders and construction workers who did not build it to spec, the testers for not testing the worst-case scenario, or the people that maintain or operate the building or bridge? How might we use disasters from the world of civil engineering to learn about better ways of architecting software systems? Here are four well-known examples of architectural disasters (resulting in increasing loss of life) from the world of civil engineering:

  1. The Millennium Bridge in London, a steel suspension bridge for pedestrians across the Thames. Construction of the bridge began in 1998, with the opening on 10 June 2000. Two days after the bridge opened the participants in a charity walk felt an unexpected swaying motion. The bridge was closed for almost two years while modifications were made to eliminate the “wobble”, which was caused by a positive feedback phenomenon known as Synchronous Lateral Excitation. The natural sway motion of people walking caused small sideways oscillations, which in turn caused people on the bridge to sway in step, increasing the amplitude of the bridge’s oscillations and continually reinforcing the effect. Engineers added dampers to the bridge to prevent the horizontal and vertical movement. No people or animals were injured in this incident.
  2. The Tacoma Narrows Bridge was opened to traffic on July 1st 1940 and collapsed four months later. At the time of its construction the bridge was the third longest suspension bridge in the world. Even from the time the deck was built, it began to move vertically in windy conditions, which led to it being given the nickname Galloping Gertie. Several measures aimed at stopping the motion were ineffective, and the bridge’s main span finally collapsed under 40 mph wind conditions on November 7, 1940. No people were injured in this incident but a dog was killed.
  3. On 28 January 2006, the roof of one of the buildings at the Katowice International Fair collapsed in Katowice, Poland. There were 700 people in the hall at the time. The collapse was due to the weight of the snow on the roof. A later inquiry found numerous design and construction flaws that contributed to the speed of collapse. 65 people were killed when the roof collapsed.
  4. The twin towers of the World Trade Center (WTC) in downtown Manhattan collapsed on September 11 2001 when al-Qaeda terrorists hijacked two commercial passenger jets and flew them into the skyscrapers. A government report that looked at the collapse declared that the WTC design had been sound and attributed the collapses to “extraordinary factors beyond the control of the builders”. 2,752 people died, including all 157 passengers and crew aboard the two airplanes.

In at least one of these cases (the Katowice International Fair building) various people (including the designers) have been indicted for “directly endangering lives of other people” and face up to 12 years in prison. The building’s operator is also being charged with “gross negligence” for not removing snow from the roof quickly enough.

So what can we learn from these natural and man-made disasters and apply to the world of software architecture? In each of these cases the constructions were based on well-known “patterns” (suspension bridges, trade halls and skyscrapers have all successfully been built before without collapsing). What was different in each of these cases was that the non-functional characteristics were not taken into account. In the case of the bridges, oscillations caused by external factors (people and winds) were not adequately catered for. In the case of the trade hall in Katowice the building’s roof was not engineered to handle the additional weight caused by snow. Finally, in the case of the WTC, the impact of a modern passenger jet, fully laden with fuel, crashing into the building was simply not conceived of (although, interestingly, an “aircraft-impact analysis” involving the impact of a Boeing 707 at 600 mph was actually done, which concluded that although there would be “a horrendous fire” and “a lot of people would be killed,” the building itself would not collapse). Here are some lessons I would draw from these incidents and how we might relate them to the field of software architecture:

  1. Architects need to take into account all non-functional requirements. Obviously this is easier said than done. Who would have thought of such an unexpected event as a passenger jet crashing into a skyscraper? Actually, to their credit, the building’s architects did; what they lacked was the ability to properly model the effect of such impacts on the structures, especially the effects of the fires.
  2. For complex systems, architects should build models covering all aspects of the architecture. Tools appropriate to the task should be deployed and the right “level” of modelling needs to be done. Prototyping as a means of testing new or interesting technical challenges should also be adopted.
  3. Designers should faithfully implement the architectural blueprints and the architect should remain on the project during the design and implementation phases to check their blueprints are implemented as expected.
  4. Testing should be taken into account early and thought given to how the non-functional characteristics can be tested. Real limits should be applied taking into account the worst case (but realistic) scenario.
  5. Operations and maintenance should be involved from an early stage to make sure they are aware of the impact of unexpected events (for example a complete loss of all systems because of an aircraft crashing on the data centre) and have operational procedures in place to address such events.

As a final, and sobering, footnote to the above here’s a quote from a report produced by the British Computer Society and the Royal Academy of Engineering called The Challenges of Complex IT Projects.

Compared with other branches of engineering in their formative years, not enough people (are known to) die from software errors. It is much easier to hide a failed software project than a collapsing bridge or an exploding chemical plant.

The implication of this statement would seem to be that it’s only when software has major, and very public, failures that people will really take note and maybe start to address problems before they occur. There are plenty of learning points (anti-patterns) in other industries that we can draw on, and we should probably do so before we start having major software errors that cause loss of life.

You may be interested in the follow up to the above report which describes some success stories and why they worked (just in case you thought it was all bad).


10 Things I (Should Have) Learned in (IT) Architecture School

Inspired by this book, which I discovered in the Tate Modern book shop this week, I don’t (yet) have 101 things I can claim I should have learned in IT Architecture School, but these would certainly be my 10:

  1. The best architectures are full of patterns. This one comes from Grady Booch. Whilst there is an increasing need to be innovative in the architectures we create, we also need to learn from what has gone before. Basing architectures on well-tried and tested patterns is one way of doing this.
  2. Projects that develop IT systems rarely fail for technical reasons. In this report the reasons for IT project failures are cited, and practically all of them are down to human (communication) failures rather than real technical challenges. Learning point: effective IT architects need soft (people) skills as well as hard (technical) skills. See my thoughts on this here.
  3. The best architecture documentation contains multiple viewpoints. There is no single viewpoint that adequately describes an architecture. Canny architects know this and use viewpoint frameworks to organise and categorise these various viewpoints. Here’s a paper that some IBM colleagues and I wrote a while ago describing one such viewpoint framework. You can also find out much more about this in the book I wrote with Peter Eeles last year.
  4. All architecture is design but not all design is architecture. Also from Grady. This is a tricky one and alludes to the thorny issue of “what is architecture” and “what is design”. The point is that the best practice of design (separation of concerns, design by contract, identification of clear component responsibilities etc.) is also the practice of good architecture; architecture’s focus, however, is on the significant elements that drive the overall shape of the system under development. For more on this see here.
  5. A project without a system context diagram is doomed to fail. Quite simply the system context bounds the system (or systems) under development and says what is in scope and what is out. If you don’t do this early you will spend endless hours later on arguing about this. Draw a system context early, get it agreed and print it out at least A2 size and pin it in highly visible places. See here for more discussion on this.
  6. Complex systems may be complicated but complicated systems are not necessarily complex. For more discussion on this topic see my blog entry here.
  7. Use architectural blueprints for building systems but use architectural drawings for communicating about systems. A blueprint is a formal specification of what is to be. This is best created using a formal modelling language such as UML or ArchiMate. As well as this we also need to be able to communicate our architectures to people who are not, or only partly, IT-literate (often the people who hold the purse strings). Such communications are better done using drawings, created not with formal modelling tools but with drawing tools. It’s worth knowing the difference and when to use each.
  8. Make the process fit the project, not the other way around. I’m all for having a ‘proper’ software delivery life-cycle (SDLC), but the first thing I do when deploying one on a project is customise it to my own purposes. In software development, as in gentlemen’s suits, there is no “one size fits all”. You might think you can pick up a suit at Marks and Spencer that fits perfectly, but you can’t, and you also cannot take an off-the-shelf SDLC that perfectly fits your project. Make sure you customise it so it does fit.
  9. Success causes more problems than failure. This comes from Clay Shirky’s new book Cognitive Surplus. See this link at TED for Clay’s presentation on this topic. You should also check this out to see why organisations learn more from failure than success. The point here is that you can analyse a problem to death and not move forward until you think you have covered every base, but you will always find some problem or another you didn’t expect. Although you might (initially) have to address more problems by not doing too much up-front analysis, in the long run you are probably going to be better off. Shipping early and benefitting from real user experience will inevitably mean you have more problems, but you will learn more from these than from trying to build the ‘perfect’ solution and running the risk of never shipping anything.
  10. Knowing how to present an architecture is as important as knowing how to create one. Although this is last, it’s probably the most important lesson you will learn. Producing good presentations that describe an architecture, that are targeted appropriately at stakeholders, is probably as important as the architecture itself. For more on this see here.

On Being an Artitect

My mum, who just turned 85 this month, mispronounces the word architect. She says “artitect”, where a “t” replaces the “ch”. I’ve tried to put her right on this a few times, but having just finished reading Seth Godin’s book “Linchpin – Are You Indispensable?” I’ve decided that actually she’s probably been pronouncing the word right all along. The key bit she’s got right (and the rest of us haven’t) is the “art” bit. Let me explain why.

The thrust of Seth’s book is that to survive in today’s world of work you have to bring a completely different approach to the way you do that work. In other words, you have to be an artist. You have to create things that others can’t or won’t, because they just do what they are told rather than what they think could be the right creative approach to building something that is radically new. Before I proceed much further with this thread I guess we need to define what we mean by artist in this context. I like this from Seth’s book:

An artist is someone who uses bravery, insight, creativity and boldness to challenge the status quo. And an artist takes it personally.

As to what artists create:

Art isn’t only a painting. Art is anything that is creative, passionate and personal.

I’d also add something like “and changes the world for the better” to that last statement, otherwise I think that some fairly dodgy activities might pass for art as well (or maybe even that is my lizard brain kicking in; see below).

Of course that’s not to say that you shouldn’t learn the basics of your craft, whether you are a surgeon, a programmer or a barista in a coffee shop. Instead you should learn them but then forget them, because after that they will hold you back. Picasso was a great “classical” artist. In other words he knew how to create art that would have looked perfectly respectable in the traditional parts of the world’s art galleries, where all the great masters’ work that follows a literal interpretation of the world is displayed. However once he had mastered that he threw the rule book out completely and started to create art that no one else had dared to make, and changed the art world forever.

So an artitect (rather than an architect) is someone who uses creativity, insight, breadth of vision and passion to create architectures (or even artitectures) that are new and different in some way, that meet the challenges laid down for them, and then some.

Here are the five characteristics that I see a good artitect as having:

  1. Artitects are always creating new “mixes”. Some of the best IT architects I know tell me how they are creating new solutions to problems by pulling together software components and making them work together in interesting and new ways. Probably one of the greatest IT architects of all time – Tim Berners-Lee, who invented the world-wide web – actually used a mix of three technologies and ideas that were already out there: markup languages, the transmission control protocol (TCP) and hypertext. What Tim did was to put them together in quite literally a world-changing way.
  2. Artitects don’t follow the process in the manual, instead they write the manual. If you find yourself climbing the steps that someone else has already carved out then guess what, you’ll end up in the same place as everyone else, not somewhere that’s new and exciting.
  3. Artitects look at problems in a radically different way to everyone else. They try to find a completely different viewpoint that others won’t have seen and to build a solution around that. I liken this to a great photograph that takes a view that others have seen a thousand times before and puts a completely different spin on it either by standing in a different place, using a different type of lens or getting creative in the photo-editing stage.
  4. Artitects are not afraid to make mistakes or to receive ridicule from their peers and colleagues. Instead they positively thrive on it. Today you will probably have tens or even hundreds of ideas for solutions to problems pop into your head and pop straight out again because internally you are rejecting them as not being the “right approach”. What if, instead of allowing your lizard brain (the part of your brain that evolved first and kept you safe on the savanna, where you could easily get eaten by a sabre-toothed tiger) to have its say, you wrote those ideas down and actually tried out a few? Nine out of ten, or 99 out of 100, of them might fail, causing some laughter from your peers, but the one that doesn’t could be great! Maybe even the next world-wide web?
  5. Artitects are always seeking out new ideas and new approaches from unlikely places. They don’t just seek out inspiration from the usual places that their profession demands but go to places and look to meet people in completely different disciplines. For new ideas talk to “proper” artists, real architects or maybe even accountants!!!

Perhaps from now on we should all do a bit less architecture and a bit more artitecture?

Architecture vs. Design

Yes it’s that old knotty problem again! Over at gapingvoid.com Hugh MacLeod is fond of using Venn diagrams to illustrate overlapping concerns so here’s one that I recently used for addressing the eternal architecture versus design debate that was the source of much discussion at a recent Architecture Thinking class I was giving.

As I remember it the discussion went something like this:

  • Student: So what’s the difference between architecture and design? It seems from what you’re saying it’s just a matter of scale?
  • Me: Whilst it’s true to say that architecture addresses the major components of the system, rather than the detail, it’s more than that. The architecture is the bridge between the “what” (that is, the requirements) and the “how” (that is, the design).
  • Student: Yeah, but isn’t that what we usually call “high-level design”?
  • Me: Not really. Grady Booch says: “All architecture is design but not all design is architecture”. (I cheated and looked up this quote afterwards and found that Booch goes on to say “architecture represents the significant design decisions that shape a system, where significant is measured by cost of change”.) In my experience high-level design is just the view that allows the complete system to be represented on one page.
  • Student: I still don’t see what the difference really is.
  • Me: Okay, here’s the real difference for me (at this point I draw the above Venn on a flip-chart). As well as defining the structure of the system, architecture must also embrace the “what” and the “how” of that structure. The “what” in this context is the requirements (functional and non-functional) and so architecture involves reasoning about and resolving these sometimes conflicting requirements. It’s about addressing those architecturally significant requirements (the “what”) that will drive (and constrain) the “how” (the design).

Now, if I was drawing this again (and maybe I’ll do this next time) I would actually draw a third overlapping circle which I’d label the “why”. This is where we’d capture the rationale for why we make the (architectural) decisions we do.

Thanks Hugh, this is a neat way of explaining the way things are!

Software Complexity and the Breakdown of Civilisation

Clay Shirky has written a great entry on his blog called The Collapse of Complex Business Models which has set me thinking about the whole issue of complexity, especially as it applies to complex software systems. The article uses Joseph Tainter’s book The Collapse of Complex Societies as the basis of its premise. In that book Tainter looked at various ancient, sophisticated societies that suddenly collapsed (the Romans and the Maya, for example). Tainter postulated that these societies “hadn’t collapsed despite their cultural sophistication, they’d collapsed because of it”. His theory was that as societies become more organised and efficient they find themselves with a surplus of resources, and managing this surplus makes the society more complex. The spare resources go more into “gilding the lily” than creating what is strictly required. Early on, the value of this complexity is positive and often pays for itself in improved output. Over time, however, the law of diminishing returns reduces this value until it eventually disappears completely, at which point any additional complexity is pure cost. The society has then reached a tipping point, and when some unexpected stress occurs it has become too inflexible to respond. As Tainter says, when “the society fails to respond to reduced circumstances through orderly downsizing, it isn’t because they don’t want to, it’s because they can’t”.

Shirky’s theory is that today, the internet means many businesses are facing similar challenges: adapt to a new way of working or die. In particular the media industry is failing to recognise it has built hugely complicated edifices around the production of media (that’s TV, movies, newspapers and music) that the internet is wiping away. Media execs like Rupert Murdoch are looking for ways of maintaining their status quo by just using the internet as a new delivery channel which allows them to continue with their current, costly and complex, business models. What they don’t realise is that the internet has changed fundamentally the way the media industry works with practically zero production and delivery costs and unless they change their complex and expensive ways the old order will die out.

So what’s this got to do with software systems? Although we might think the systems we are building today are complex, we are about to start building a level of complexity that is an order of magnitude (at least) above what it is today. If we are to address some of the world’s really wicked problems then we need to make systems that are not just the siloed systems we have today but systems-of-systems, interconnected in ways we cannot yet imagine or envisage. Whilst such interconnected systems might enable collaborations that help solve problems, we should remain aware that we are adding new levels of complexity that may be hard to manage and even harder to do without if we ever hit some unexpected “stress” situation. In 1964, science fiction author Arthur C. Clarke wrote a short story called “Dial F for Frankenstein”. In the story the phone network (this was before the internet had even been thought of, although, interestingly, Tim Berners-Lee, the inventor of the web, suggested this was the story that anticipated that technology) had become so large and complex that it was effectively a giant brain which became self-aware. Not only could it not be turned off, it started to think and eventually took over the world! Clarke himself even said: “Dial F for Frankenstein is dated now because you no longer dial of course, and if I did it now it wouldn’t be the world’s telephone system it would be the internet. And that of course is a real possibility. When will the internet suddenly take over?”

We should be very aware therefore, if we are to learn anything from the history of those ancient civilisations, that adding more and more complexity to our systems is not without cost or risk. Although in the short term we may reap rewards, in the longer term we may yet regret some of the actions we are about to take, so let’s make sure we remember to provide an on/off switch for these systems!

When Is It Time To Quit?

Whilst teaching an IT architecture class in Moscow this week I was asked the above question by one of the students. The question was in relation to projects and when you should realise the project is not going to deliver and should therefore be cancelled rather than carrying on regardless. I hate using the phrase “it depends” in answer to a question like this but this was one case when that’s all I could think of! However, having had a bit of time to reflect on this (on a long journey back home) here are a few thoughts that I’ve had since then. This is also related to my previous blog entry.

In some ways working on a software delivery project is a bit like fighting a war (though admittedly you don’t usually risk losing your life). In a war you may not win every battle and may sometimes need to retreat and reconfigure; the important thing, however, is to focus on the strategic battles that will lead you to the overall objective (and win the war). In a software delivery project winning the war is successfully delivering the system or application on time and within budget. The battles are what the architect and project manager must fight every day to address the technical and other issues which arise and must be overcome for fear of letting the project get out of control. So what are the battles which, if not won, are going to cause the project to fail and, more importantly for this topic, will lead to surrender if the cost gets too high? Here are my top five in order of importance:

  1. No, or poor, sponsorship. If there is no clear (and strong) sponsorship from the right level in the enterprise then you may as well pack up and go home immediately. Even if the project does get to the delivery stage, the chances are no one will really want or need the system. The job of the sponsor is to evangelise, convince and cajole a sometimes reluctant workforce of the need for the new system and the change that it might bring about. Weak sponsors will be unable or unwilling to do this.
  2. Not having a watertight business case. If the business case for why the new system is needed is not well thought through, with a clear cost-benefit analysis, then there is a danger that the project could be cancelled at any time. I once worked on a project that was cancelled days before we were due to exit system test because a review of the business case revealed a flaw in the return on investment (ROI) of the new system, so the client decided to pull the plug rather than deliver something that would not make the cost savings expected of it. Painful, but probably the right decision!
  3. Not engaging all stakeholders. Not identifying and engaging (i.e. talking to, convincing, schmoozing, wining and dining or whatever it takes) all the stakeholders is a bit like death by a thousand cuts. Although you will start off well, entropy will gradually set in and things will start to fall apart. Stakeholders who should have been informed early on will learn about the project through the grapevine and will not only not support it but may actively try to kill it. They may see it as a threat or may treat it as sour grapes: “no one bothered to tell me about it so why should I support this?”
  4. Underestimating the complexity of building interfaces to legacy systems. Very few projects have the luxury of a clean sheet of paper when starting out. There is nearly always some old, creaking legacy system that needs to be interfaced with, even if only short-term or temporarily. In my experience one should never underestimate the complexity of these legacy system interfaces. And I don’t just mean technology interfaces. Many times IT departments will see it as a threat to their existence: what starts out as an interface may ultimately lead to a complete replacement of the system that they run and manage so lovingly, and they will therefore do whatever they can to derail the project.
  5. Not having regular and viable releases of the system/application. Gone are the days when a project could go off for many years before delivering anything of value to the key stakeholders (or I hope they have). I cannot emphasise enough the importance of delivering something of value that users can get their hands on and start using, something that will give them real business benefits. This not only shows them the value of the new application but also gives them a belief that IT can actually deliver stuff; it will also get them to “buy in” to the new system and may well have the useful side-effect that they become salespeople for the new system, convincing colleagues of its benefits.

I believe that if one or more of these issues exists on your project you should either take very serious steps to address them or consider looking for another job, because there is a greater than 50% chance everything is going to get very messy and end in tears, or much, much worse.