Complexity is Simple

I was taken with this cartoon and the comments put up by Hugh Macleod last week over at his gapingvoid.com blog so I hope he doesn’t mind me reproducing it here.

Complexity is Simple (c) Hugh Macleod 2014
Complexity is Simple (c) Hugh Macleod 2014

Complex isn’t complicated. Complex is just that, complex.

Think about an airplane taking off and landing reliably day after day. Thousands of little processes happening all in sync. Each is simple. Each adds to the complexity of the whole.

Complicated is the other thing, the thing you don’t want. Complicated is difficult. Complicated is separating your business into silos, and then none of those silos talking to each other.

At companies with a toxic culture, even what should be simple can end up complicated. That’s when you know you’ve really got problems…

I like this because it resonates perfectly well with a blog post I put up almost four years ago now called Complex Systems versus Complicated Systems. where I make the point that “whilst complicated systems may be complex (and exhibit emergent properties) it does not follow that complex systems have to be complicated“. A good architecture avoids complicated systems by building them out of lots of simple components whose interactions can certainly create a complex system but not one that needs to be overly complicated.

Architecting Out Complexity

Complexity kills. Complexity sucks the life out of users, developers and IT. Complexity makes products difficult to plan, build, test and use. Complexity introduces security challenges. Complexity causes administrator frustration.

So said Ray Ozzie ex-Microsoft Chief Software Architect and creator of Lotus Notes.

Complexity, or more precisely, overly complicated systems (not to be confused with complex systems) is one of the key anti-patterns architects must continuously fight against. Complexity is caused not just by adding additional and unwanted functionality (although that certainly does not help) but also by muddled architectural thinking as well as poorly made architectural decisions. Here’s the real problem though, the initial architecture of almost any system, unless it borrows very heavily from other, similar, architectures will rarely be without complexity. There will almost always be refinements that can be made, over time, that remove complexity and make for a cleaner and more streamlined design. Sometimes you may even need to throw away the initial architecture and start again, using what you have learnt from the initial architecture to take out complexity. Frederick Brooks (author of The Mythical Man Month) famously said of software designs “plan to throw one away, you will anyway“.

The other problem with complexity in systems is that it tends to increase over time due to software entropy. As more changes are made, some not envisaged by the architect because change cases were not adequately thought through, a system naturally becomes more complicated and harder to maintain. It almost seems that the lifecycle of a system could be represented by the complexity curve in the diagram below.

Complexity does not just apply to systems, it also applies to whole styles of architecture. Cloud computing would still seem to be fairly early on in the complexity curve in this respect. Cloud computing is almost the ultimate in information hiding. Just put everything in the cloud and get what you want when you want it. If you’re inside the cloud looking out however you need to deal with a whole lot of pain to create that outward facing simplicity. If you’re a cloud architect therefore you need to understand and design for that complexity otherwise over time our clouds will become weighed down with out of date junk that no one wants. This is definitely a topic I’ll be returning to over the course of 2012.

Default Architecture

One of the attributes that many (if not all) complex systems have is the ability to change (customise) them in controlled ways. Indeed this, by definition, is why some systems are complex (that is exhibit  emergent behaviour) because sometimes the users of those systems select options or make choices that enable unexpected behaviour to occur. Giving users such choices clearly has a number of implications on the architecture of a system.

  1. Users have to make a choice, no choice is actually a choice itself as it means they are choosing the default.
  2. Making systems customisable assumes users have the will (and the time) to decide which options they want to change. Many times they don’t so the default becomes very important as it will dictate how the users actually use the system, possibly forever.
  3. The more options that are built into a system the more difficult it becomes to test each potential combination of those options. Indeed there comes a point at which it becomes impossible to test every combination (at least in a reasonable amount of time), hence the importance of beta software (let the users do the testing).

In his blog entry Triumph of the Default Kevin Kelly points out how “the influence of a default is so powerful that one single default can act as a very tiny nudge that can sway extremely large and complex networks”. The oft quoted example of how defaults can influence behaviour is that of organ donation. If you make the donation of organs upon death automatically an “opt out” choice (it happens unless you refuse beforehand) versus “opt in” (it does not happen unless you sign up). A opt out donor system greatly increases the number of organs donated.

For complex systems then, the default architecture of the system becomes very important. The choices the architect makes on behalf of the users of that system will not only dictate how the users will actually use the system but may also influence their behaviour (in both positive and negative ways). Defining the defaults is as important an architectural decision as to what technologies should be used and sufficient time should always be planned in to allow such decisions to be made in a sensible way. The system architecture that results can profoundly affect how that system will be used.

Evolutionary Systems Development Process

As I’m now into my second year I reckon I can become more experimental in my blogs. This entry is therefore completely off the scale and will either not go any further or be the start of something much bigger. Comments suggesting which it is are welcome.If we want to be able to build real complex systems (see previous entry for what these are) what might we do differently to ensure we allow truly emergent properties to appear? Is there a development process that we could adopt that both ensured something was delivered in a timely fashion but that also allowed sufficient ‘flex’ such that emergent behaviour was not prevented from happening? In other words is it possible to allow systems to evolve in a managed way where those properties that we value are allowed to grow and thrive but those that are of no use can be prevented from appearing or stopped early on?

Here are some properties I think any such a development process (an evolutionary systems development process) should have if we are to build truly complex systems

  1. The process needs to be independent of any project. Normally we use processes to derive some kind of work breakdown structure which is used to plan the project that will deliver the system. In other words there is a well defined start and end point with clear steps stating who does what by when in between. However for a complex system there is by definition no stopping point; just a number of evolutions where each introduces new (and possible unpredictable) behaviour.
  2. The process should be driven not so much by what users want but by what users do. Most processes start with a list of requirements, possibly expressed as use cases, which describe scenarios for how the system will be used. The problem with this approach is that use cases only describe what a user can envisage doing with the system and will not capture what they actually end up doing with the system. In other words the system is deemed to be ‘complete’ once all the use cases have been realised. However what if the act of delivering the system itself results in new use cases emerging. Ones that were not planned for? How do they get realised?
  3. The process must be both flexible enough to allow users to experiment and envisage doing new things whilst at the same time being robust enough not to allow negative emergent behaviour that will prevent any system being delivered or lead to the system deteriorating over time.
  4. If new behaviour is to be adopted this must be of overall benefit to the majority and must still meet (a possibly changed) set of business objectives. The process must therefore allow for some kind of voting system where the majorities behaviour is allowed to dominate. The trick is not allow new and innovative behaviour to be crushed early on.
  5. The underlying rules that control how the systems responds to external stimuli must themselves be easily adaptable. The process must therefore treat as peers the rules that govern (allowable) behaviour together with the description of the behaviour itself. Rules set some kind of constraint on what is allowable and not allowable by the system.

Of course not all systems that we build will or should be complex ones. As noted previously safety-critical systems need to behave in entirely predictable ways and users should not be allowed to change the behaviour of these where lives could be put at risk. So what type of systems might benefit from an evolutionary approach to development?

  1. Social networking systems where the users are intimately bound up with the system itself.
  2. Commerce systems where the buying behaviour of users might change how and when certain products are sold and at what price.
  3. Financial systems where money markets may affect what users do and how they respond.
  4. Government systems where responses to new legislation or the behaviour of citizens needs fast responses.
  5. Other, you decide?

My next step is to think about what the elements (process phases, work products etc) of an evolutionary systems development process might be.

Complex Systems versus Complicated Systems

Brian Kernighan,co-author of the C programming language, once said:

Controlling complexity is the essence of computer programming.

We live in a world where the systems that we use or come across in our day to day lives seem to be ever more complicated. The flight software system that controls a modern aircraft like the new Boeing 787 “Dreamliner” are, on the face of it at least, almost unimaginatively complicated, simply because of their sheer size. According to NASA the software that runs the 787 is almost seven million lines of Ada code, triple that of the 777. The F35 Joint Strike Fighter has 19 million lines of C and C++ code! Does all this mean that these systems are also complex however and if not what’s the difference? Inevitably there are a large number of definitions of exactly what a complex system is however they all seem to agree on a few common things:

  1. They are made up of a collection of components.
  2. The components interact with each other.
  3. These interactions can result in emergent behavior.

Emergent behavior refers to the property that a collection of simple components may exhibit when the interactions between them result in new, and sometimes unpredictable, behavior that none of the components exhibit individually. So whilst complicated systems may be complex (and exhibit emergent properties) it does not follow that complex systems have to be complicated. In fact relatively simple systems may exhibit emergent properties.

In his book Emergence Steven Johnson gives several examples of the emergent properties of all sorts of systems from colonies of ants through to software systems. Whilst some software systems clearly thrive on being complex systems where emergent behavior is a positive benefit (I’m thinking Facebook, World of Warcraft and SecondLife for example) we might be a bit more dubious about getting on an aircraft whose flight software system exhibits emergent behavior! My guess is that most of us would prefer it if that software showed entirely predictable behavior.

Here’s the thing however. Should it not be possible to build some of our business systems where emergent behavior could be allowed and for the systems themselves to be self adjusting to take advantage of that behavior? How might we do that and what software delivery lifecycles (SDLC) might we adopt to allow that to happen? The interesting thing about SDLCs of course is that almost by definition they build predictability into the systems that are developed. We want those systems to behave in the way the users want (i.e. we want them to “meet the requirements”), not in unpredictable ways. However the thing about systems like Facebook is that the users are an integral part of that system and they drive its behavior in new and interesting ways. The developers of Facebook are able to observe this and respond to new ways in which their system is being used, adapting it accordingly. Facebook has clearly been architected in a way that allows this to happen (Nice one Zuck). The problem with many of our business systems is that they are too rigid and inflexible and do not allow emergent behavior. Even the promise of Service Oriented Architecture (SOA) where we were supposed to be able to reconfigure our business processes almost at will by combining services together in new and interesting ways has not really delivered on that promise. I think that this is for two reasons:

  1. We have failed to adapt our (SDLC) processes to take into account that property and are instead building the same old rigid systems out of slightly new sets of moveable parts.
  2. We fail to recognise that the best complex systems have people in them as an integral part of their makeup and that it is often the combination of people and technology that drive new ways of using systems and therefore emergent properties.

Building and managing complex systems needs to recognise the fact that the same old processes (SOPs) may no longer work and what new processes we new develop need to better account for people being an integral part of the system as it is used and evolves. The potential for emergent behavior needs not only to be allowed for but to be positively encouraged in certain types of system.

 

Gapology

gap·ol·o·gy n. The systematic study of, and the method used to identify and close, gaps within and between organisational units (individuals, teams, departments) or social structures (gender gaps, race gaps). Becoming an expert in using gapology or studying gapology are the behaviors of a gapologist.

Okay, this is a definition I made up to make a point and create a reason for this post! You’ll find a couple more definitions here. My observation is that many of the problems we face in systems development are due to the presence of gaps of various sorts. Until we fill such gaps we will not be able to build systems that are as effective or efficient as they might be. Here are some of the worst.

  • The Knowing-Doing Gap. An article in Fast Company discusses this book which asks: Why is it that, at the end of so many books and seminars, leaders report being enlightened and wiser, but not much happens in their organizations?
  • The Business-IT Gap. The best known gap in IT. This is the inability IT have in understanding what business people really want (or the inability the business have in saying what it is they want depending on which side of the gap you are sitting).
  • The IT-IT Gap. This is the gap between what IT develops and what operations/maintenance think they are getting and need to run/maintain.
  • The Gender Gap. As I’ve discussed here there is still unfortunately a real problem getting women into IT which I think is detrimental to the systems we build.

Many of these gaps occur because of the lack of effective stakeholder management that takes place when building systems. A report published back in 2004 (The Challenges of Complex IT Projects) by The Royal Academy of Engineering and and The British Computer Society identified the lack of effective stakeholder management as one of the key reasons for project failure. The key learning point I believe is to understand who your stakeholders are, engage with them early and often and make sure you have good communication plans in place that keep them well informed.

Are Frameworks Too Constraining and is Chaos the Natural State?

ZapThink have recently published a number of good articles on complex systems, the death of Enterprise Architecture and the dangers of ‘checklist architecture’. These have resonated nicely with some of my thoughts on the state of (IT) architecture in general and whether we are constraining ourselves unnaturally with the tools, processes and thinking models we have created. Here are the problems that I see we have with our current approach:

  1. Systems are getting ever more complex but we are still relying on the same-old-processes to deal with that.
  2. We have processes which are too large, overblown and themselves too complex which lead to people being driven by the process rather than the business goals for the system(s) under development.
  3. We are creating siloed professionals who are aligning themselves to particular disciplines; for example: Enterprise Architect, Solution Architect, Integration Architect the list is almost endless! This results in sharing of responsibilities but no one person or group retaining the overall vision.
  4. The majority of enterprises faced with addressing these challenges themselves have such large and complex bureaucracies in place that the people working in them inevitably end up becoming ‘inward facing’ and concentrating on their own needs rather than solving the complex business problems. There is of course an interesting dichotomy here which is: do we develop systems that just reinforce the status-quo of the systems we live by?

What we need is a new approach which both encompasses the need to address the complexity of systems we must build but at the same time allows for the change and chaos which is almost the natural state of human affairs and allows new and interesting properties and behaviours to emerge. As I’ve said elsewhere what we need is a new breed of versatilists who, just like the Vitruvian man of Leonardo da Vinci, can bring to bear a whole range of skills and competencies to help address the challenges we face in building todays complex systems. I think what we need is an update of the agile manifesto. This won’t just address the relatively narrow confines of the software delivery processes but will be far more extensive. Here is my first stab at such a manifesto.

  1. Systems of systems rather than single systems.
  2. Business processes that provide automated flexibility whilst at the same time recognising there is a human and more collaborative element to most processes where sometimes new and unexpected behaviours can emerge.
  3. Adaptable and configurable software delivery processes that recognise business requirements are dynamic, unclear, and difficult to communicate rather than a single, monolithic ‘one size fits all’ approach that assumes requirements are stable, well understood, and properly communicated.
  4. People that objectively view experiences and reflect on what they have learnt, look further than their current roles, explore other possibilities and pursue lifelong learning rather than those that focus on the narrow confines of the current project or how to better themselves (and their employer).
  5. Enterprises (whether they be consulting companies, product developers or end users or IT) that recognise the intrinsic value of their people and allow them to grow and develop rather than driving them to meet artificial goals and targets that feed their own rather than their clients/customers needs.

Why Do Complex Systems Projects (Still) Fail?

Depending upon which academic study you read, the failure rate of complex IT projects is reported as being between 50% and 80%! I thought I’d test this against my own experiences and took a look back over my career at the number of complex systems I have worked on and how many could be counted as being successful. Clearly the first thing you need to do here is to define “complex” and also “success” so I’m defining complex as being a system with:

  • Multiple stakeholders involved.
  • Multiple systems interfaces.
  • Challenging or high risk non-functional requirements (including delivery schedule and budget).

and “success” as being:

  • Delivered on time and within budget.
  • Met the stakeholders requirements.
  • Went into production and ran for at least 12 months.

By my count I have worked on 18 projects which meet the first set of criteria and of those I reckon 8 meet the second set of criteria so can be thought of as “successful”. So that’s a slightly under 50% success rate! Not brilliant but within the industry average (which is of course nothing to brag about).
As you might expect there is a wealth of information out there on the reasons why IT projects fail. Top amongst these are:

  • Lack of agreed measures of success.
  • Lack of clear senior management ownership.
  • Lack of effective stakeholder management.
  • Lack of project/risk management skills.
  • Evaluation of proposals driven by price rather  business benefits.
  • Projects not broken into manageable steps.

These typical failings were highlighted in a joint British Computer Society/Royal Academy of Engineering report from 2004 called The Challenges of Complex IT Projects. That was six years ago and I wonder what has changed since then? Anecdotaly I suspect not much. Certainly newspaper headlines about failed government IT projects of late (see, for example The Independent on 9th January 2010: Labour’s Computer Blunders Cost £26bn) would seem to indicate we are still not very good at delivering complex systems.

The interesting thing to observe about the above list of course is that none of these problems are technical in nature, not directly anyway. Instead they are to do with governance and process (or lack thereof) and what you might term “soft” aspects of systems delivery, how we manage and understand what people want. One of the right-brain activities which we IT folk sometimes fail to exercise is “empathy”. Our capacity for logical thought (i.e. left-brain activity) has gone a long way to creating the technological society we live in today. However in a world of ubiquitous information that is available at the touch of a button logic alone will no longer cut the mustard. In order to thrive we need to understand what makes our fellow humans tick and really get beneath their skin and to forge new relationships.

Happily this is not something that is easily outsourced, at least not yet! There is still something we can do as IT professionals therefore in engaging with stakeholders, understanding there wants and needs and trying to deliver systems that meet their requirements that can only be done with direct, personal contact.

IT Architecture and Wicked Problems

A wicked problem is one that, for each attempt to create a solution, changes the understanding of the problem. A wicked problem exhibits one or more of these characteristics:

  1. The problem is not understood until after the formulation of a solution. Despite your best efforts you don’t actually know what the problem really is and developing solutions only shows up more problems.
  2. Wicked problems have no ‘stopping rule. That is to say if there is no clear statement of the problem, there can be no clear solution so you never actually know when you are finished. For wicked problems ‘finished’ usually means you have run out of time, money, patience or all three!
  3. Solutions to wicked problems are not right or wrong. Wicked problems tend to have solutions which are ‘better’, or maybe ‘worse’ or just ‘good enough’. Further, as most wicked problems tend to have lots of stakeholders involved there is not usually any mutual agreement about which of these the solution actually is.
  4. Every wicked problem is essentially novel and unique. Because there are usually so many factors involved in a wicked problem no two problems are ever the same and each solution needs a unique approach.
  5. Every solution to a wicked problem is a ‘one shot operation’. You can only learn about the problem by building a potential solution but each solution is expensive and has consequences.
  6. Wicked problems have no single solution. There may be lots of solutions to a wicked problem any one of which may be right (or wrong). Its often down to personal (or collective) judgment as to which one to follow or adopt.

By reading the list of properties that wicked problems exhibit you can easily see that many (or even most) large and complex software development projects fall into the category of wicked problems. Especially those which involve many stakeholders, multiple systems and difficult social, political, and environment challenges (e.g. such as building a nations social security or health system) or re-engineering an enterprises core systems whilst still trying to run the business.

I think that the really interesting and challenging roles out there for us IT architects are in tackling some of the wicked problems that many large and complex systems engineering projects contain. In my experience it is very often not just the technological aspects of the problem that needs to be addressed but also the social, political and economic aspects as well. I think the real challenges and also key roles that IT architects will take as we move into the next decade are in bringing new and creative approaches to solving these wicked problems. I’m currently preparing a lecture to be given at a UK university next month around this theme.