Pythons and pandas (or why software architects no longer have an excuse not to code)

pythonpanda

The coronavirus pandemic has certainly shown just how much the world depends not just on accurate and readily available datasets but also the ability of scientists and data analysts to make sense of that data. All of us are at the mercy of those experts to interpret this data correctly – our lives could quite literally depend on it.

Thankfully we live in a world where the tools are available to allow anyone, with a bit of effort, to learn how to analyse data themselves and not just rely on the experts to tell us what is happening.

The programming language Python, coupled with the pandas dataset analysis library and Bokeh interactive visualisation library, provide a robust and professional set of tools to begin analysing data of all sorts and get it into the right format.

Data on the coronavirus pandemic is available from lots of sources including the UK’s Office for National Statistics as well as the World Health Organisation. I’ve been using data from DataHub which provides datasets in different formats (CSV, Excel, JSON) across a range of topics including climate change, healthcare, economics and demographics. You can find their coronavirus related datasets here.

I’ve created a set of resources which I’ve been using to learn Python and some of its related libraries which is available on my GitHub page here. You’ll also find the project which I’ve been using to analyse some of the COVID-19 data around the world here.

The snippet of code below shows how to load a CSV file into a panda DataFrame – a 2-dimensional data structure that can store data of different types in columns that is similar to a spreadsheet or SQL table.

# Return COVID-19 info for country, province and date.
def covid_info_data(country, province, date):
    df4 = pd.DataFrame()
    if (country != "") and (date != ""):
        try:
            # Read dataset as a panda dataframe
            df1 = pd.read_csv(path + coviddata)

            # Check if country has an alternate name for this dataset
            if country in alternatives:
                country = alternatives[country]

            # Get subset of data for specified country/region
            df2 = df1[df1["Country/Region"] == country]

            # Get subset of data for specified date
            df3 = df2[df2["Date"] == date]

            # Get subset of data for specified province. If none specified but there
            # are provinces the current dataframe will contain all with the first one being 
            # country and province as 'NaN'. In that case just select country otherwise select
            # province as well.
            if province == "":
                df4 = df3[df3["Province/State"].isnull()]
            else:
                df4 = df3[df3["Province/State"] == province]
        except FileNotFoundError:
            print("Invalid file or path")
    # Return selected covid data from last subset
    return df4

The first ten rows from the DataFrame df1 shows the data from the first country (Afghanistan).

         Date Country/Region Province/State   Lat  Long  Confirmed  Recovered  Deaths
0  2020-01-22    Afghanistan            NaN  33.0  65.0        0.0        0.0     0.0
1  2020-01-23    Afghanistan            NaN  33.0  65.0        0.0        0.0     0.0
2  2020-01-24    Afghanistan            NaN  33.0  65.0        0.0        0.0     0.0
3  2020-01-25    Afghanistan            NaN  33.0  65.0        0.0        0.0     0.0
4  2020-01-26    Afghanistan            NaN  33.0  65.0        0.0        0.0     0.0

Three further subsets of data are made, the final one is for a specific country showing the COVID-19 data for a particular date (the UK on 7th May in this case).

             Date  Country/Region Province/State      Lat   Long  Confirmed  Recovered   Deaths
26428  2020-05-07  United Kingdom            NaN  55.3781 -3.436   206715.0        0.0  30615.0

Once the dataset has been obtained the information can be printed in a more readable way. Here’s a summary of information for the UK on 9th May.

Date:  2020-05-09
Country:  United Kingdom
Province: No province
Confirmed:  215,260
Recovered:  0
Deaths:  31,587
Population:  66,460,344
Confirmed/100,000: 323.89
Deaths/100,000: 47.53
Percent Deaths/Confirmed: 14.67

Obviously there are lots of ways of analysing this dataset as well as how to display it. Graphs are always a good way of showing information and Bokeh is a nice and relatively simple to use Python library for creating a range of different graphs. Here’s how Bokeh can be used to create a simple line graph of COVID-19 deaths over a period of time.

from datetime import datetime as dt
from bokeh.plotting import figure, output_file, show
from bokeh.models import DatetimeTickFormatter

def graph_covid_rate(df):
    x = []
    y = []
    country = df.values[0][1]
    for deaths, date in zip(df['Deaths'], df['Date']):
        y.append(deaths) 
        date_obj = dt.strptime(date, "%Y-%m-%d")
        x.append(date_obj)

    # output to static HTML file
    output_file("lines.html")

    # create a new plot with a title and axis labels
    p = figure(title="COVID-19 Deaths for "+country, x_axis_label='Date', y_axis_label='Deaths', x_axis_type='datetime')

    # add a line renderer with legend and line thickness
    p.line(x, y, legend_label="COVID-19 Deaths for "+country, line_width=3, line_color="green")
    p.xaxis.major_label_orientation = 3/4

    # show the results
    show(p)

Bokeh creates an HTML file of an interactive graph. Here’s the one the above code creates, again for the UK, for the period 2020-02-01 to 2020-05-09.

As a recently retired software architect (who has now started a new career working for Digital Innovators, a company addressing the digital skills gap) coding is still important to me. I’m a believer in the Architect’s Don’t Code anti-pattern believing that design and coding are two sides of the same coin and you cannot design if you cannot code (and you cannot code if you cannot design). These days there really is no excuse not to keep your coding skills up to date with the vast array of resources available to everyone with just a few clicks and Google searches.

I also see coding as not just a way of keeping my own skills up to date and to teach others vital digital skills, but also, as this article helpfully points out, as a way of helping solve problems of all kinds. Coding is a skill for life that is vitally important for young people entering the workplace to at least have a rudimentary understanding of to help them not just get a job but to also understand more of the world in these incredibly uncertain times.

Parkinson’s Law of Triviality (Applied to Architecture)

Parkinson’s Law of Triviality (as proposed by C. Northcote Parkinson in 1957) is that people give disproportionate weight to trivial issues. Parkinson gives as an example a hypothetical (planning) committe deliberating on the building of a nuclear power plant and a bicycle shed. Parkinson stated that “the time spent on any item of the agenda will be in inverse proportion to the sum involved.” Clearly a nuclear reactor is vastly expensive and complicated and the average person is unlikely to understand it because it is so far outside the realms of their daily experience. A bike shed, on the other hand, everyone understands (or thinks they do, which can be even more dangerous). The result is that people will be reluctant to discuss for long the nuclear reactor either because they assume those working on it understand it or because they are to fearful of making themselves look foolish in front of the experts. The bike shed however can result in endless discussions because everyone involved wants to add their “feature” and show that they have contributed. While discussing the bike shed, debate rages over what the best choice of roofing should be, whether there should be windows, whether it should be fully enclosed with walls and a door etc, etc rather than whether the shed is a good idea or not.Parkinson’s Law of Triviality is something that frequently creeps into architecture discussions. When requirements are being discussed and how best to realize them architecturally it is frequently the case that when the domain under discussion consists of parts that are familiar and ones that are not it is the former that will often receive inordinate amounts of discussion whereas the latter do not (but usually should). A good example to illustrate this is that of e-commerce systems. Because most people have used such systems everyone will have an opinion on the bits they are familiar with (the actual look and feel of the shopping experience for example, do you call it a shopping basket or a trolley, do you allow goods to be placed in the basket before you have logged in or registered and so on). The part of the system that deals with payment however (i.e. that goes off in the background and does credit checks etc) or checking stock level is not seen so people will probably have less of an opinion on it. This part, at least as far as the web site owner is concerned, is the most important however because if that does not work he will lose money. Worse however is if the “payments specialist” says that this is the way payments will work, and no one else feels they have the expertise or authority to challenge her. Doing something because that’s the way we always do it is an example of the Golden Hammer anti-pattern and does not always result in the most innovative or creative new systems.

So how to protect against falling foul of Parkinson’s Law of Triviality?

  1. Good architects need to develop deep domain knowledge and know how to, and be fearless in doing so, challenge expert opinion. Frequently appearing foolish by challenging the blindingly obvious is a small price to pay for occasionally highlighting a problem which everyone else overlooked because they were following conventional wisdom.
  2. Ensure complex problems receive proportionately more discussion than simple ones. In other words cut the discussion of addressing the simple problem sooner to allow time to discuss the complex ones. Better still if in a meeting there are several problems to be discussed prioritise them by complexity and start with the most complex first. Leaving simpler ones till the end means you can rush those through when everyone gets tired or you are out of time.
  3. Recognise that everyone has an equal say and it is often the people who know least about the problem that ask the daft, but hard, questions. This of course would seem to go against number 1. If everyone is an expert doesn’t that mean no one will be naive enough to ask the simple but challenging questions? No. A good team has a diverse mix of skills, knowledge and opinions and should allow everyone to comment.
  4. Leave your ego at the meeting room door and do not use your relative position in the hierarchy of the company to trivialise other peoples ideas or prevent constructive discussion because it is not “going your way”.
  5. Finally capture all decisions in a systematic way (recording at the very least what the decision was, what options were discussed, the rationale for choosing the one you did and any related decisions or implications).

The Tools We Use

Back in 1964 Marshall McLuhan said  “We shape our tools and afterwards our tools shape us”. McLuhan was actually talking about the media when he said this but much of what he said then has a great deal of relevance in today’s mixed up media world too.It occurs to me that McLuhan’s tool quote equally applies to the tools we use, or misuse, as software architects. PowerPoint (or Keynote for that matter) has received pretty bad press over the years as being a tool that inhibits rather than enhances our creativity. Whilst this does not have to be the case, too many people take tools, such as PowerPoint, and use them in ways I’m pretty sure their creators never intended. Here are some common tool (mis)uses I’ve observed over the years (anti-patterns for tools if you like):

  1. Spreadsheets as a databases. Too many people seem to use spreadsheets as a sort of global repository for dumping ideas, data and information in general because it gives them the ability to easily sort and categorise information. Spreadsheets are good at numbers and presenting analytical data but not for capturing textual information.
  2. Presentations as documents. Sometimes what started out as a presentation to illustrate a good idea seems to grow into a more detailed description of that idea and eventually turns into a full-blown specification! The excuse for doing this being “we can use this to present to the client as well as leaving it with them at the end of the project as the design of the system”. Bad idea!
  3. Presentations as a substitute for presenting. The best presenters present “naked”. Minimal presentations (where sometimes minimal = 0) where the presenter is at the fore and his or her slides are illustrating the key ideas is what presenting is or should be about. Did John F Kennedy, Winston Churchill or Martin Luther King rely on PowerPoint to get their big ideas across? I think not!
  4. Word processors as presentations. This is the opposite of number 2. Whilst not so common  people have been known in my experience to ‘present’ their documents on a screen in a meeting. It goes without saying, or should do, that 12pt (or less) text does not come across well on a screen.
  5. Word processors as web sites. Although most word processors have the capability of generating HTML this is not a good reason for using them to build web sites. There are a multitude of free, open and paid for tools that do a far better job of this.
  6. Emails as documents. This is variant (generalisation) of one of my favourite [sic] anti-patterns. e-mails are one of the greatest sources of unstructured data in the world today. There must be, literally, terabytes of data stored using this medium that should otherwise be captured in a more readily consumable and accessible form. e-mails clearly have a place for forming ideas but not for capturing outcomes and persisting those ideas so others can see them and learn from them.

How to Avoid the Teflon Architecture Syndrome

So you’ve sent all of your budding architects on TOGAF (or similar) training, hired in some expensive consultants in sharp suits to tell you how to “do architecture” and bought a pile of expensive software tools (no doubt from multiple vendors) for creating all those architecture models  you’ve been told about but for some reason you’re still not getting any better productivity or building the more reliable systems you were expecting from all this investment. You’re still building siloed systems that don’t inter-work, you’re still misinterpreting stakeholder requests and every system you build seems to be “one-of-a-kind”, you’re not “leveraging reuse” and SOA seems to be last years acronym you never quite got the hang of. So what went wrong? Why isn’t architecture “sticking” and why does it seem you have given yourselves a liberal coating of Teflon.The plain fact of the matter is that all this investment will not make one jot of difference if you don’t have a framework in place that allows the architects in your organisation to work together in a consistent and repeatable fashion. If it is to be effective the architecture organisation needs careful and regular sustenance. So, welcome to my architecture sustenance framework*.

Here’s how it works, taking each of the above containers one at a time:

  • Common Language: Architects (as do any professionals) need to speak the same, common language. This needs to be the foundation for everything they do. Common languages can come from standards bodies (UML and SPEM are from the OMG, IEEE 1471-2000 is the architecture description standard from the IEEE) or may be ones your organisation has developed and is in your own glossary.
  • Community: Communities are where people come together to exchange ideas, share knowledge and create intellectual capital (IC) that can be shared more broadly in the organisation. Setting up an architecture Community of Practice (CoP) where your thought leaders can share ideas is vital to making architecture stick and become pervasive. Beware the Ivory Tower Antipattern however.
  • Tools: If communities are to be effective they need to use tools that allow them to create and share information (enterprise social networking tools, sometimes referred to as Enterprise 2.0). They also need tools that allow them to create and maintain delivery processes and manage intellectual capital of all types.
  • Processes: Anything but a completely trivial system will need some level of system development lifecycle (SDLC) that enables it creation and brings a level of ceremony that the project team should follow. An SDLC brings order to the chaos that would otherwise ensue if people working on the project do not know what role they are meant to perform, what work products they are meant to create or what tasks they are meant to do to create those work products. Ideally processes are defined by a community that understands, not only the rules of system development, but also what will, and what will not, work in the organisation. Processes should be published in a method repository so that everyone can easily access them.
  • Guidance: Guidance is what actually enables people to do their jobs. Whereas a process shows what to create when, guidance shows how to create that content. Guidance can take many forms but some of the most common is examples (how did someone else do it), templates (show me what I need to do so I don’t start with a blank sheet every time) and guidelines (provide me with a step-by-step guide on how to perform my task) and tool mentors (how can I make use of a tool to perform this task and create my work product). Guidance should be published in the same (method) repository as the process so it is easy to jump between what I do as well as how I do it.
  • Projects: A project is the vehicle for actually getting work done. Projects follow a process, produce work products (using guidance) to deliver a system. Projects (i.e. the work products they produce) are also published in a repository though ideally this will separate from the generic method repository. Projects are “instances” of a delivery process which is configured in a particular way for that project. The project repository stores the artefacts from the project which serve as examples for others to use as they see fit.
  • Project and Method Repositories: The place where the SDLC and the output of multiple projects is kept. These should be well publicized as the place people know to go to to find out what to do as well as what others have done on other projects.

All of the above elements really do need to be in place to enable architecture (and architects) to grow and flourish in an organisation. Whilst these alone are not a guarantee of success without them your chances of creating an effective architecture team are greatly reduced.

*This framework was developed with my colleague at IBM, Ian Turton.

Challenging What Architects Do

For most of this year I’ve been fortunate (sic) enough to travel around several countries (and indeed continents) delivering a series of architecture workshops to a client. These workshops aim to teach, through the use of instructor led case studies, best practice approaches to building solution architectures. The workshops take a holistic approach to developing architecture by ensuring both the application as well as infrastructure aspects are considered together and both the functional and non-functional requirements are taken into account.

An enjoyable side-effect of these workshops is I get to meet lots of different people who like to challenge the approach and test my understanding (I sometimes think I’m learning more than the workshop participants). Here’s one such challenge (paraphrased) I received recently: “The problem with architects and the work products they produce (for example models built using the Unified Modeling Language) is that the work products become an end in their own right and we can easily lose track of the fact that actually what we want to build is a system with running code on real hardware”. The clear implications behind this challenge could be described via three architecture anti-patterns:

  • Antipattern: Architecture Overkill. Symptoms: Architects create work that no one understands or cares about. Multiple artefacts with overlapping content get created that no one actually reads. They are too large, too complex and may even be delivered too late.
  • Antipattern: Ivory Tower Architecture. Symptoms: Architects are out of touch with real-world technical aspects. Instead they just sit in their ivory-towers making their pronouncements (which most of the time no one listens to).
  • Antipattern: Work Creation. Symptoms: The role of the architect becomes self-fulfilling. They create work products which only they understand and therefore need more architects to help create these work products.

Here’s my response:

  • Any architectural work product needs to have either an up-stream (e.g. to the business) or down-stream (e.g. to the implementers) justification. Each of these groups of stakeholders needs to be able to recognise how the architecture, as encompassed in the work products, addresses their particular concerns otherwise the work product really is only for its own sake. There are other stakeholders who will have interests or concerns in the architecture but I just focus on these to illustrate my point.
  • The role of the architect needs to be clearly defined and work produced around that role. The role may change as business needs etc evolve but the work created should be in tune with that role and ultimately fulfil the needs of the business. This does not, of course, mean that part of what architects  do is research new technology or try out new ideas. Indeed if they don’t do this the next point will be tricky to fulfil.
  • One of the difficulties with the role of the architect is that it can mean the architect easily becomes out of touch with the latest trends etc in programming. This can be caused by a number of things including: architects are paid too much to do programming, architects are too senior to do programming, architects are too busy to do programming etc. This particular problem is particularly endemic in many services organisations where architects may be charged out at premium rates and where the role of programming is seen as almost too “menial”. What to do? It’s vital that this is not the case. There must always be time allowed in any software architects work day to stay in touch with the stuff that gives him his reason for being.

How Not to Create Artitecture

Following on from my previous post what should we do if we just want to create a SOA (same old architecture) rather than a TOA (totally outrageous artitecture)? Here are five things you should do if you want average rather than exceptional architectures (anti-patterns for ineffective architecture if you like):

  1. Focus on what your employer wants you to do (your job) rather than what your client wants you to do (your work). How much time is the project team spending excessively networking obsessing over recording data that is not project related and of dubious use anyway or attending endless meetings which don’t have a clear agenda or any useful outcome.
  2. Start committees instead of taking action. Like most things in life the best architectures don’t come from committees but come from the focussed efforts of a small team of architects. Sometime that small team can be just one person (consider Tim Berners-Lee’s world-wide web or Ray Ozzie’s Lotus Notes). Committees (we call them Design Authorities in technical circles) may have their place when multiple stakeholders need to be bought together but don’t confuse governance (i.e. controlling the steady-state) with design (i.e. initiating a change of state). Unfortunately there is safety in committees where there is no single person responsible for decisions and no one individual can be blamed when something goes wrong.
  3. Create a culture of blame and negative criticism where everyone has to watch their back. Many people on projects interact with others as though they are better than their peers or want to teach them a lesson. Others assign motivations and plots where there are none. Still others criticise anyone who is doing something differently from the norm. Mistakes happen and are usually the result of a series of unfortunate events rather than deliberate negligence or dishonesty. Learn from mistake and move on.
  4. Stop people from learning. Not allowing or fostering a learning culture is probably one of the gravest crimes that can be committed in the conceptual age. Learning does not just have to come from attending classroom based courses (or the dreaded “online training”) but should come from everything we do. Treat every type of interaction with a person (talking to them, reading their blog or whatever) as an opportunity to learn something new.
  5. Produce overly complex and outlandish work products. The problem with many delivery processes is that they can demand large numbers of work products that, when taken literally, will pull the team down into a never ending spiral of “document production” where every deliverable has to be signed off before progress can be made. Delivery processes and the work products they produce should be highly customised to the projects needs not the needs of stakeholders that demand huge volumes of written material which no one needs let alone reads.

Not having any/all of these is not a cast-iron guarantee you will succeed in building great systems based on innovative architectures but having them will certainly ensure you have architectures not artitectures.

Architecture Anti-Patterns: Pattern #6 – Architecture Thinking Not Doing

Anti-Pattern Name: Architecture Thinking Not Doing
General Form:
An organisation or enterprise embarks on an initiative to retrain their designers and developers to be ‘architects’. They set up education classes, assign new roles to their people with the word ‘architect’ in their job title and may even pay them some more money to match their ‘elevated’ status. However after a period of time (could be 6 months, could be longer) the initiative dies, developers just keep developing, designers just keep designing and no true system architectures materialise.
Symptoms and Consequences:

  • People return from training but make no changes to their behaviour. They revert to whatever it was they were doing previously.
  • The person(s) responsible for the initiative move on and as it was “their baby” the initiative dies.
  • Management think that training is sufficient and that once students have been trained they will “know what to do differently”.
  • People want to put into practice what they have learnt but have no idea where to go for help or guidance.

Refactored Solution:

  • Architecture thinking has to be turned into architecture doing. Follow up training with projects which make immediate use of new skills.
  • Identify early on who the real leaders are and get them to mentor others.
  • Obtain a critical mass of core people who will continue to develop their skills and be advocates of the new approach even once the initial hype has died down.
  • Ensure the initiative to train architects has support from the highest (i.e. CxO) level of the organisation.
  • Get buy-in from the business as well as IT. Convince the business of the need for architecture by showing them time to market and quality will improve and their will be less maintenance costs.
  • If the organisation is immature but keen to learn ‘buy-in’ experience either by hiring experienced people or working with good technical consultancies who have a proven track record (or some mix of both).
  • Follow up training with good quality, written down guidance (including guidelines, examples and templates) which is readily available and easily found (preferably in the companies website).
  • As part of the training get a written-down commitment from each student that describes one thing they will do differently as a result of attending this training. Follow this up and see if they did it. If not find out why not.

See Pattern #5

Architecture Anti-Patterns: Pattern #5 – Death by (Technical) Presentations

Anti-Pattern Name: Death by (Technical) Presentations
General Form:
A presentation on a technical topic contains many, many slides of tightly packed text with overly complex diagrams that are difficult to read close up let alone from any distance away.
Symptoms and Consequences:

  • You’re invited along to a presentation where the architecture will be “presented”.
  • The presentation consists of 167 slides in mainly 12pt font with closely packed words on each page.
  • The presenter often says things like “I’m sorry you can’t see this from the back” or ” I’m going to skip over this slide as it’s not really relevant”.

Refactored Solution:

  • Read this book and visit Garr Reynolds blog for ideas and inspiration
  • Architectures should be created as models in a legitimate modelling tool. It’s okay to cut and paste pictures into a presentation tool such as PowerPoint as long as they can easily be viewed and are meaningful.
  • Make sure presentations are consistent in their use of layout, colour and fonts.
  • Don’t attempt to put all the information in a presentation. Refactor to contain the key points only.
  • Put the detailed information in handouts which are given out after the presentation.

See Pattern #4

Architecture Anti-Patterns: Pattern #4 – Too Many Chiefs

Anti-Pattern Name: Too Many Chiefs
General Form:
A large project or program is top-heavy with architects. Lots of high-level pictures and documents get produced but no real specifications are created.
Symptoms and Consequences:

  • The project has been running for several months and there is no clear plan.
  • Lots of high-level documents not following any obvious templates or serving any useful purpose (e.g. “Functional Specification”) are created.
  • No detailed specifications are in evidence (e.g. Component Models, Operational Models).
  • Overuse of MS Office products rather than “real” architecture tools.

Refactored Solution:

  • Ensure all roles are well defined and have clear deliverables.
  • Ensure all architects understand the process they are following.
  • Ensure there is a clear plan with roles and deliverables assigned.
  • Instigate use of professional tooling

See Pattern #3

Architecture Anti-Patterns: Pattern #3 – Throwing Out the Baby with the Bath Water

Anti-Pattern Name: Throwing Out the Baby with the Bath Water
General Form:
A new idea, technology or process comes along and everyone jumps on the bandwagon forgetting the fundamental principles developed previously.
Symptoms and Consequences:

  • Lots of external and internal “marketware” describing the benefits of the new approach but often with very little substance behind them.
  • People being encouraged to go on education, join working groups etc where the new approach is described.
  • Skills in “legacy” technology become difficult to find as everyone is now trained in the new approach

Refactored Solution:

  • Ensure fundamentals are captured as principles and approaches in existing methods and processes.
  • Reinforce through training and good governance.
  • Take care when/how to adopt the new approach.

See Pattern #2