Non-Functional Requirements and the Cloud

As discussed here, the term non-functional requirements really is a misnomer. Who would, after all, create a system based on requirements that were “not functional”? Non-functional requirements refer to the qualities that a system should have and the constraints under which it must operate. They are sometimes referred to as the “ilities”, because many of them end in “ility”: availability, reliability, maintainability and so on.

Non-functional requirements will, of course, have an impact on the functionality of the system. For example, a system may address every functional requirement specified for it, but if it is unavailable at certain times of the day it is quite useless, even though it may be functionally ‘complete’.

Non-functional requirements are not abstract things to be written down when considering the design of a system and then ignored; they must be engineered into the system’s design just as functional requirements are. They can have a bigger impact on a system’s design than functional requirements, and getting them wrong generally leads to more costly rework. Missing a functional requirement usually means adding it later or doing some rework; getting a non-functional requirement wrong can lead to very costly rework or even a cancelled project, with all the knock-on effects that has on reputation.

From an architect’s point of view, defining how a system will address non-functional requirements mainly (though not exclusively) boils down to how the compute platforms (whether processors, storage or networking) are specified and configured to satisfy the qualities and constraints placed upon the system. As more and more workloads are moved to the cloud, how much control do we as architects have in specifying the non-functional requirements for our systems, and which non-functionals should concern us most?

As ever, the answer to this question is “it depends”. Every situation is different, and in each case some things will matter more than others. If you are a bank or a government department holding sensitive customer data, the security of your provider’s cloud may be uppermost in your mind. If, on the other hand, you are an online retailer who wants customers to be able to shop at any time of the day, then availability may be most important. If you are seeking a cloud platform on which to develop new services and products, then perhaps the ease of use of the development tools is key. The real question, therefore, is not so much which non-functional requirements are important but which ones should be considered in the context of a cloud platform.

Below are some of the key NFRs I would normally expect to be taken into consideration when looking at moving workloads to the cloud. These apply whether the cloud is public, private or a mix of the two, and to any layer of the cloud stack (i.e. Infrastructure, Platform or Software as a Service), but they will have an impact on different users. For example, the availability (or lack of it) of a SaaS service is likely to have more of an impact on business users than on developers or IT operations, whereas the availability of the infrastructure will affect all users.

  • Availability – What percentage of time does the cloud vendor guarantee cloud services will be available (including scheduled maintenance down-times)? Bear in mind that although 99% availability may sound good, it actually equates to more than three and a half days of potential downtime a year, and even 99.9% could mean nearly nine hours (see the short downtime sketch after this list). Also consider the Disaster Recovery aspects of availability and, if more than one physical data centre is used, where those data centres reside. The latter is especially important where data residency is an issue and your data needs to reside on-shore for legal or regulatory reasons.
  • Elasticity (Scalability) – How easy is it to bring compute resources (CPU, memory, network) online or take them offline as workload increases or decreases?
  • Interoperability – If using services from multiple cloud providers, how easy is it to move workloads between them? (Hint: open standards help here.) And what if you want to migrate from one cloud provider to another? (Hint: open standards help here as well.)
  • Security – What security levels and standards are in place? For public or private clouds not hosted in your own data centre, also consider the physical security of the cloud provider’s data centres as well as its networks. Data residency again needs to be considered as part of this.
  • Adaptability – How easy is it to extend, add to or grow services as business needs change? For example, if I want to change my business processes or connect to new back-end or external APIs, how easy would it be to do that?
  • Performance – How well suited is my cloud infrastructure to supporting the workloads that will be deployed onto it, particularly as workloads grow?
  • Usability – This will differ depending on who the client is (i.e. business users, developers/architects or IT operations). In all cases, however, you need to consider the ease of use of the software and how well its interfaces are designed. IT is no longer hidden inside your own company; your systems of engagement are out there for all the world to see, and effective design of those systems is more important than ever before.
  • Maintainability – Mainly an IT operations and developer concern: how easy is it to manage (and develop on) the cloud services?
  • Integration – In a world of hybrid cloud, where some workloads and data need to remain in your own data centre (usually systems of record) whilst others need to be deployed in public or private clouds (usually systems of engagement), how those environments integrate is crucial.
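
To make those availability percentages a little more concrete, here is a minimal Python sketch (my own illustration, not anything taken from a provider’s SLA) that converts an availability figure into a rough yearly downtime budget:

```python
# Rough downtime budgets implied by common availability figures.
# Assumes a 365-day year and continuous operation; real SLAs often
# exclude scheduled maintenance windows, so treat these as upper bounds.

HOURS_PER_YEAR = 365 * 24

def yearly_downtime_hours(availability_percent: float) -> float:
    """Return the potential downtime (hours per year) for a given availability %."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)

if __name__ == "__main__":
    for availability in (99.0, 99.9, 99.99, 99.999):
        hours = yearly_downtime_hours(availability)
        if hours >= 24:
            print(f"{availability}% availability -> {hours / 24:.1f} days per year")
        elif hours >= 1:
            print(f"{availability}% availability -> {hours:.1f} hours per year")
        else:
            print(f"{availability}% availability -> {hours * 60:.0f} minutes per year")
```

Running it shows that 99% availability allows roughly 3.7 days of downtime a year, while even ‘three nines’ (99.9%) still allows nearly nine hours, which is why the precise figure in the SLA matters.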

I mentioned at the beginning of this post that non-functional requirements should be considered in terms of both the qualities you want from your IT system and the constraints you will be operating under. The decision to move to cloud in many ways adds a constraint to what you are doing. You don’t have completely free rein to do whatever you want if you choose an off-premise cloud operated by a vendor; you have to align with the service levels they provide. An added bonus (or complication, depending on how you look at it) is that you can choose from different service levels to match what you want and change these as and when your requirements change. Probably one of the most important things to establish when choosing a cloud provider is that they have the ability to expand with you and don’t lock you in to their cloud architecture too much. This is a topic I’ll be looking at in a future post.

Consideration of non-functional requirements does not go away in the world of cloud. Cloud providers have very different capabilities, and some will be more relevant to you than others. This, coupled with the fact that you may also need to architect for both on-premise and off-premise clouds, actually makes some of the architecture decisions more, not less, difficult. It seems the advent of cloud computing is not about to make us architects redundant just yet.

For a more detailed discussion of non-functional requirements and cloud computing see this article on IBM’s developerWorks site.

The Wicked Problems of Government

The dichotomy of our age is surely that as our machines become more and more intelligent, the problems we need them to solve are becoming ever more difficult and intractable. They are truly wicked problems, nowhere more so than in our offices of power, where the addition of political and social ‘agendas’ would seem to make some of the problems we face even more difficult to address.

[Image: A demonstration against the infamous ‘Poll Tax’]

In their book The Blunders of Our Governments, the authors Anthony King and Ivor Crewe recall some of the most costly mistakes made by British governments over the last three decades. These include policy blunders such as the so-called poll tax introduced by the Thatcher government in 1990, which led to rioting on the streets of many UK cities (above). Like the poll tax, many, in fact most, of the blunders recounted are not IT related; however, the authors do devote a whole chapter (Chapter 13, rather appropriately) to the more egregious examples of successive governments’ IT blunders. These include:

  • The Crown Prosecution Service, 1989 – A computerised system for tracking prosecutions. Meant to be up and running by 1993-94, abandoned in 1997 following a critical report from the National Audit Office (NAO).
  • The Department of Social Security, 1994 – A system to issue pensions and child benefits using swipe cards rather than the traditional books which were subject to fraud and also inefficient. The government cancelled the project in 1999 after repeated delays and disputes between the various stakeholders and following another critical report by the NAO.
  • The Home Office (Immigration and Nationality Directorate), 1996 – An integrated casework system to deal with asylum, refugee and citizenship applications. The system was meant to be live by October 1998 but was cancelled in 1999 at a cost to the UK taxpayer of at least £77 million. The backlog of asylum and citizenship cases the system was meant to address got worse, not better.

Whilst the authors don’t offer any cast-iron solutions to these problems, they do highlight a number of factors the blunders had in common. Many of these were highlighted in a joint Royal Academy of Engineering and British Computer Society report published 10 years ago this month called The Challenges of Complex IT Projects. The major reasons found for why complex IT projects fail included:

  • Lack of agreed measures of success.
  • Lack of clear senior management ownership.
  • Lack of effective stakeholder management.
  • Lack of project/risk management skills.
  • Evaluation of proposals driven by price rather than business benefits.
  • Projects not broken into manageable steps.

In an attempt to address at least some of the issues around the procurement and operation of government IT systems (a problem not restricted to the UK, of course), in particular citizen-facing services delivered over the internet, the coalition government that came to power in May 2010 commissioned a strategic review of its online delivery of public services by the UK Digital Champion, Martha Lane Fox. Her report, published in November 2010, recommended:

  • Provision of a common look and feel for all government departments’ transactional online services to citizens and business.
  • The opening up of government services and content, using application programme interfaces (APIs), to third parties.
  • Putting a new central team in Cabinet Office that is in absolute control of the overall user experience across all digital channels and that commissions all government online information from other departments.
  • Appointing a new CEO for digital in the Cabinet Office with absolute authority over the user experience across all government online services and the power to direct all government online spending.

Another government report, published in July 2011 by the Public Administration Select Committee and entitled Government and IT – “a recipe for rip-offs” – time for a new approach, proposed 33 recommendations on how government could improve its woeful record for delivering IT. These included:

  • Developing a strategy to either replace legacy systems with newer, less costly systems, or open up the intellectual property rights to competitors.
  • Contracts to be broken up to allow for more effective competition and to increase opportunities for SMEs.
  • The Government must stop departments specifying IT solutions and ensure they specify what outcomes they wish to achieve.
  • Having a small group within government with the skills to both procure and manage a contract in partnership with its suppliers.
  • Senior Responsible Owners (SROs) should stay in post to oversee the delivery of the benefits for which they are accountable and which the project was intended to deliver.

At least partly as a result of these reports and their recommendations, the Government Digital Service (GDS) was established in April 2011 under the leadership of Mike Bracken (previously Director of Digital Development at The Guardian newspaper). GDS works in three core areas:

  • Transforming 25 high volume key exemplars from across government into digital services.
  • Building and maintaining the consolidated GOV.UK website, which brings government services together in one place.
  • Changing the way government procures IT services.

For the large corporates that have traditionally provided IT software, hardware and services to government, GDS has had a big impact on how they do business. Not only does most business now have to be transacted through the government’s own CloudStore, but GDS also encourages a strong bias in favour of:

  • Software built on open source technology.
  • Systems that conform to open standards.
  • Using the cloud where it makes sense to do so.
  • Agile based development.
  • Working with small to medium enterprises (SMEs) rather than the large corporates, seen as “an oligarchy that is ripping off the government”.

There can be no doubt that the sorry litany of public sector IT project failures has, rightly or wrongly, caused the pendulum to swing strongly in favour of the above approach when procuring IT. However, some argue that the pendulum has now swung a little too far. Indeed, the UK Labour party has launched its own digital strategy review led by the shadow Cabinet Office minister Chi Onwurah. She talks about a need to be more context-driven rather than transaction-focused, saying that while the GDS focus has been on redesigning 25 “exemplar” transactions, Labour feels this misses the complexity of delivering public services to the individual. Labour is also critical of GDS’s apparent hostility to large IT suppliers, saying it is an “exaggeration” that big IT suppliers are “the bogeymen of IT”. While Labour supports competition and creating opportunities for SMEs, she said that large suppliers “shouldn’t be locked out, but neither should they be locked in”.

The establishment of GDS has certainly provided a wake-up call for the large IT providers. However, and here I agree with the views expressed by Ms Onwurah, context is crucial, and it is far too easy to take an overly simplistic approach to solving government IT issues. A good example is open source software. Open source software is certainly not free, and is often not dramatically cheaper than proprietary software (which is often built using some elements of open source anyway) once support costs are taken into account. The more serious problem with open source is where the support for it comes from. As the recent Heartbleed security issue with OpenSSL has shown, there are dangers in entrusting mission-critical enterprise software to people who are not accountable (or are even unknown).

One aspect of ‘solving’ wicked problems is to bring more of a multi-disciplinary approach to the table. I have blogged before about the importance of a versatilist approach in solving such problems. Like it or not, the world cannot be viewed in high-contrast black and white terms. One of the attributes of a wicked problem is that there is often no right or wrong answer, and addressing one aspect of the problem can introduce other issues. Understanding context and making smart architecture decisions is one part of this. Another is whether the so-called SMAC (social, mobile, analytics and cloud) technologies can bring a radically new approach to the way government makes use of IT. This is something for discussion in future blog posts.

Making Smart Architecture Decisions

Seth Godin, whose blog can be found here, yesterday posted a wonderful entry called The extraordinary software development manager, which contains some great axioms for managing complex software projects, the last of which is something I wish I’d written! Here is what it says:

Programming at scale is more like building a skyscraper than putting together a dinner party. Architecture in the acquisition of infrastructure and tools is one of the highest leverage pieces of work a tech company can do. Smart architecture decisions will save hundreds of thousands of dollars and earn millions. We’ll only make those decisions if we can clearly understand our options.

As I’ve said elsewhere, architecture decisions are among the most crucial social objects to be exchanged between stakeholders on a software development project. For a very thorough description of what architecture decisions are and how they can be captured, see the article Architecture Decisions: Demystifying Architecture by Jeff Tyree and Art Akerman. However, you don’t need to document every decision to such a precise level of detail. At a minimum you need to make sure you have looked at all the options and assumptions and have captured the rationale for the decision that was made. Only then can you move forward reasonably safe in the knowledge that your decisions will stand the test of time and, as a bonus, will be transparent for all to see and understand.
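
By way of illustration, here is a minimal sketch of the sort of record I have in mind, written as a simple Python data structure. The field names and the example content are my own shorthand, not the Tyree and Akerman template itself:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ArchitectureDecision:
    """A minimal architecture decision record: just enough to explain
    later why a decision was made and what else was considered."""
    identifier: str
    subject: str                        # the issue being decided
    options: List[str]                  # alternatives that were evaluated
    assumptions: List[str]              # what was assumed at the time
    decision: str                       # the option chosen
    rationale: str                      # why it was chosen
    implications: List[str] = field(default_factory=list)

# Hypothetical example of use:
decision = ArchitectureDecision(
    identifier="AD-007",
    subject="Where to host the customer-facing web tier",
    options=["On-premise virtual machines", "Public cloud IaaS", "Public cloud PaaS"],
    assumptions=["Traffic will roughly double within two years"],
    decision="Public cloud PaaS",
    rationale="Elasticity and reduced operations effort outweigh the loss of platform control.",
    implications=["Workloads should avoid provider-specific APIs where practical"],
)
print(f"{decision.identifier}: {decision.decision} ({decision.subject})")
```

Even something this lightweight forces you to write down the options, the assumptions and the rationale, which is exactly the part future readers will thank you for.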

Good architecture decisions form part of the system of record for the project and, once it is built, for the system itself. They allow future software archaeologists to delve deeply into the system and understand why it is the way it is (and maybe even help fix it). They also help newcomers understand why things are the way they are and bring them up to speed on the project more quickly. Finally, and possibly most importantly of all, they stop others making different decisions later on, with potentially great financial implications for the project. If you can read and understand the rationale behind why a decision was made, you are less likely to overturn that decision or ignore it altogether.

Oops There Goes Our Reputation

I’m guessing that up until two weeks ago most people, like me, had never heard of a company called Epsilon. Now, unfortunately for them, too many people know of them for all the wrong reasons. If you have signed up to any services from household names such as Marks and Spencer, Hilton, Marriott or McKinsey, you will probably have had several emails in the last two weeks advising you of a security breach which led to “unauthorized entry into Epsilon’s email system”. Because Epsilon is a marketing vendor that manages customer email lists for these and other well-known household brands, chances are your email address has been obtained through this unauthorised entry as well. Now, it just might be pure coincidence, but in the last two weeks I have also received emails from the Chinese government inviting me to a conference on a topic I’ve never heard of, from Kofi Annan, ex-Secretary General of the United Nations, and from a lady in Nigeria asking for my bank account details so that she can deposit $18.4M into the account and leave the country! According to the information on Epsilon’s web site, the information that was obtained was limited to email addresses and/or customer names only. So, should we be worried by this, and what are the implications for the architecture of such systems?

I think we should be worried for at least three reasons:

  1. Whilst the increased spam that seems inevitable after an incident such as this is mildly annoying, a deeper concern is how the criminal elements who now have information on the places I do business on the web could put that information together to learn more about me and possibly construct a more sophisticated phishing attack. Unfortunately it’s not only the good guys who have access to data analytics tools.
  2. Many people probably use a single password to access multiple web sites. The criminals who now have your email address, as well as knowledge of which sites you do business with, only have to crack one of these passwords to potentially gain access to multiple sites, some of which may hold more sensitive information.
  3. Finally, how come information I trusted to well-known (and, by implication, ‘secure’) brands and their web sites has been handed over to a third party without me even knowing about it? Can I trust those companies not to be doing this with more sensitive information, and should I be withdrawing my business from them? This is a serious breach of trust, and I suspect that many of these brands’ own reputations will have been damaged.

So what are the impacts for us as IT architects in a case like this? Here are a few:

  1. As IT architects we make architectural decisions all the time. Some of these are relatively trivial (I’ll assign that function to that component, etc.) whereas others are not. Clearly, decisions about which part of the system to entrust personal information to are not trivial. I always advocate documenting significant architectural decisions in a formal way, capturing all the options you considered as well as the rationale for, and implications of, the decision you made. As our systems get ever more complex and distributed, the implications of particular decisions become harder to quantify. I wonder how many architects consider the implications for a company’s reputation of entrusting even seemingly low-grade personal information to third parties?
  2. It is very likely that incidents such as this will result in increased legislation covering personal information, much as Payment Card Industry (PCI) standards already cover cardholder data. This will demand more architectural rigour, as new standards essentially impose new constraints on how we design our systems.
  3. As we trundle slowly towards a world where more and more of our data is held in the cloud using a so-called multi-tenant deployment model, it is surely only a matter of time before unauthorised access to one of our cloud data stores results in access to many other data sources and a wealth of our personal information. What is needed here is new thinking around patterns of layered security that are tried and tested and, crucially, which can be ‘sold’ to consumers of these new services so they can be reassured that their data is secure (a minimal sketch of one such layered approach follows this list). As Software-as-a-Service (SaaS) takes off and new providers join the market, we will increasingly need to be reassured that they can be trusted with our personal data. After all, if we cannot trust existing, large corporations, how can we be expected to trust new, small startups?
  4. Finally, I suspect that it is only a matter of time before legislation aimed at systems designers themselves makes us as IT architects liable for some of those architectural decisions I mentioned earlier. I imagine there are already several lawyers engaged by the parties whose customers’ email addresses were obtained and whose trust and reputation with those customers may now be compromised. I wonder whether some of those lawyers will be thinking about the design of such systems in the first place and, by implication, the people who designed them?
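
To illustrate the kind of layering I mean in point 3, here is a minimal sketch of per-tenant encryption in a multi-tenant data store. It uses the Fernet primitive from the Python cryptography package; the tenant names are hypothetical, and the in-memory key store stands in for what would really be a hardware security module or a managed key service:

```python
# A minimal sketch of per-tenant encryption in a multi-tenant data store.
# Assumes: pip install cryptography. Keys are held in memory here purely
# for illustration; a real deployment would use an HSM or a managed KMS.
from typing import Dict
from cryptography.fernet import Fernet

class TenantVault:
    """Encrypts each tenant's records with that tenant's own key,
    so one leaked key exposes only one tenant's data."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}

    def register_tenant(self, tenant_id: str) -> None:
        self._keys[tenant_id] = Fernet.generate_key()

    def store(self, tenant_id: str, plaintext: str) -> bytes:
        return Fernet(self._keys[tenant_id]).encrypt(plaintext.encode())

    def retrieve(self, tenant_id: str, token: bytes) -> str:
        return Fernet(self._keys[tenant_id]).decrypt(token).decode()

vault = TenantVault()
vault.register_tenant("retailer-a")
vault.register_tenant("hotel-b")

token = vault.store("retailer-a", "alice@example.com")
print(vault.retrieve("retailer-a", token))  # only retailer-a's key can decrypt this
```

The point of the design is simply that compromising one tenant’s key (or one tenant’s slice of the data store) does not automatically expose every other tenant’s data, which is one of the layers a multi-tenant provider needs to be able to demonstrate to its customers.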

Architecture Anti-Patterns: Pattern #2 – Groundhog Day

Anti-Pattern Name: Groundhog Day
General Form:
Important architectural decisions that were once made get lost, forgotten or are not communicated effectively.
Symptoms and Consequences:

  • People forget or don’t know a decision was made.
  • The same decision is made more than once, possibly differently.
  • New people joining the project don’t understand why a decision was made.

Refactored Solution:

  • Capture important decisions in the “Architectural Decisions” work product.
  • Ensure a process is in place for making and ratifying decisions (maybe a Design Authority responsibility).
  • Ensure decisions get known about by all the right people.

See Pattern #1