Improving Wetware

Because technology is never the issue

Automating system tests for Defects

Posted by Pete McBreen 05 Jun 2022 at 21:23

An interesting problem comes up when exploratory testing finds a defect in an application. If the defect cannot be fixed immediately, the fix is going to be assigned to the backlog and eventually scheduled to be worked on by the development team. The challenge now is what to put into the automated test. Should the test fail or should it pass?

  • A failing test is the obvious choice, since the test should specify correct behavior, and when the developers get around to making the necessary changes, the test will just pass without any extra work required by the QA team.
  • Under PyTest there is the option to mark a test as XFail, which reports the count of tests that failed as expected (XFAIL) and the count of tests that were expected to fail but passed (XPASS).
  • A passing test that encodes the current behavior of the code as correct is another option for the test. The passing test is a Change Detector in that the test will fail when the defect is fixed.

Of the three options, a failing test will mean that every time the test suite runs, one or more tests will fail and there is some overhead of deciding which of the failures are expected and which are not expected. The xfail test has a similar problem, but without the benefit of a stack trace, so if the actual behavior changes, but is not fully fixed, then the test may still report as an XFAIL, so there is still some overhead of checking the test suite result.

A passing test is cheap to evaluate. If all tests pass, then nothing has changed and there is nothing to investigate. If however a test fails, then the normal process of investigating the error starts – and if the passing test had a good error explanation referring to the defect, then the investigation should be short. It is then a short process to amend the test case to reflect the new correct behavior, after which the entire test suite should pass. Some exploratory testing is still necessary to make sure that the fix did not introduce any other weird behavior, but the process of getting the test suite passing again should be trivial.
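As a minimal sketch of the Change Detector option, assume a hypothetical format_price function with a known defect (it drops the trailing zero) and a made-up defect ID, DEF-123. Neither comes from a real project. The test passes while the defect exists and fails with a pointer to the defect as soon as the behavior changes.

```python
def format_price(cents: int) -> str:
    """Hypothetical code under test; the known defect renders 1250 as $12.5."""
    return f"${cents / 100}"


def test_price_formatting_change_detector() -> None:
    """Change detector: encodes the current (defective) behavior as expected."""
    actual = format_price(1250)
    assert actual == "$12.5", (
        "Behavior changed for hypothetical defect DEF-123: "
        f"expected the known-bad '$12.5' but got {actual!r}. "
        "If the defect is fixed, update this test to expect '$12.50'."
    )


# pytest would collect the test automatically; the direct call lets the
# sketch run as a plain script as well.
test_price_formatting_change_detector()
```

The assertion message is where the investigation time is saved: whoever sees the failure is told which defect to look up and what the amended expectation should be.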

Automating System Tests

Posted by Pete McBreen 04 Jun 2022 at 23:06

Rails and Phoenix introduced the idea of testing to web development (see Rails Guide and Hexdocs Testing Guide). Their main focus is on Unit Testing, Testing Views and Testing Controllers, and they do not say much about System testing. They both make a nod towards Integration testing, but the focus is more on what developers should know about testing rather than a full system test.

With Unit Tests, the test case setup and teardown is relatively minimal, so often a unit test will have a single assertion – basically a unit test is testing just a single thing. This is appropriate for a Unit Test, since the execution time for the test suite is relatively insensitive to the number of tests when the testing framework can execute a suite of 1,000 unit tests in a few seconds.

For Controller and View tests, both Rails and Phoenix have a nice way of mocking out the webserver interaction so that the behavior of the Controller that responds to the GET/POST/PUT/DELETE requests from the browser can be tested without needing the full stack. Both include the idea of a Test Database for use by the test suite that gets populated by fixtures that work in conjunction with the test cases to provide appropriate data for the tests. Typically the data is set up before each test case and then cleaned up after each test case so that the failure of one test case does not impact the other test cases. Typically these tests run slower than Unit tests, but a reasonable suite of several hundred tests can run in less than 10 seconds.

Both Unit and the Controller/View tests tend to be relatively simple with few assertions. System Testing is different. For a start the setup and teardown time for these tests can be significant, especially if you have a microservice architecture and the test case covers the interactions between multiple services. So system tests have to pay back the larger setup and teardown time by doing more work inside each system test case, which means:

  • the scenario for a system test case has to be more of a Soap Opera
  • there should be many more assertions about the steps along the scenario so that if there is an error, the source of the error can be found quickly
  • the scenarios should follow multiple alternate paths through the system
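The three points above can be sketched with a toy example. FakeShop is a hypothetical in-memory stand-in for a real system (where the setup would be the expensive part), and the scenario, item names and quantities are invented for illustration. One setup is followed by a longer scenario, each step has its own labelled assertions, and the scenario wanders down alternate paths (a rejected order, a cancellation, a repeated cancellation).

```python
class FakeShop:
    """Hypothetical in-memory stand-in for the system under test."""

    def __init__(self) -> None:
        self.stock = {"widget": 5}
        self.orders = {}  # order id -> {"item", "qty", "status"}
        self._next_id = 1

    def place_order(self, item: str, qty: int) -> int:
        if self.stock.get(item, 0) < qty:
            raise ValueError("insufficient stock")
        self.stock[item] -= qty
        order_id = self._next_id
        self._next_id += 1
        self.orders[order_id] = {"item": item, "qty": qty, "status": "placed"}
        return order_id

    def cancel_order(self, order_id: int) -> None:
        order = self.orders[order_id]
        if order["status"] != "placed":
            raise ValueError("order cannot be cancelled")
        order["status"] = "cancelled"
        self.stock[order["item"]] += order["qty"]


def rejected(action, *args) -> bool:
    """Return True if the action raises ValueError, False if it succeeds."""
    try:
        action(*args)
    except ValueError:
        return True
    return False


def test_order_soap_opera() -> None:
    shop = FakeShop()  # stands in for the costly setup, paid only once

    first = shop.place_order("widget", 3)
    assert shop.orders[first]["status"] == "placed", "step 1: order not recorded"
    assert shop.stock["widget"] == 2, "step 1: stock not decremented"

    # Alternate path: ordering more than the remaining stock is refused.
    assert rejected(shop.place_order, "widget", 3), "step 2: over-order accepted"
    assert shop.stock["widget"] == 2, "step 2: rejected order changed stock"

    shop.cancel_order(first)
    assert shop.orders[first]["status"] == "cancelled", "step 3: status not updated"
    assert shop.stock["widget"] == 5, "step 3: stock not restored"

    # Alternate path: cancelling twice must not restore the stock twice.
    assert rejected(shop.cancel_order, first), "step 4: double cancel accepted"
    assert shop.stock["widget"] == 5, "step 4: double cancel changed stock"

    second = shop.place_order("widget", 5)
    assert second != first, "step 5: order id was reused"
    assert shop.stock["widget"] == 0, "step 5: stock not decremented"


test_order_soap_opera()
```

The step labels in the assertion messages are what make a failure in a long scenario quick to locate.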

Just-In-Time, Work-In-Progress and Kanban

Posted by Pete McBreen 05 May 2022 at 04:50

There are several interesting parallels between Scrum style development and Just-In-Time manufacturing systems. The most useful is the idea of Work-In-Progress (WIP).

  • In manufacturing, having a lot of WIP means that all workstations can be kept busy, at the cost of a lot of investment in the partially finished work and a really slow response to changes. With enough WIP any supply chain disruptions have minimal impact, since there are always enough parts somewhere in the manufacturing system.
  • Scrum has the same sort of problem if there are too many epics/stories/tasks in progress, but the WIP is less obvious other than the problem that there is no working software being delivered from the process. It solves the problem of the product owner being slow to document new epics/stories, which is the manufacturing equivalent of supply chain disruption.

Overall low WIP is preferable for a responsive process, but the downside of too low a WIP level is that it will reveal any process issues that exist. In the manufacturing world, the metaphor that was commonly used was that of water flowing in a river. If you slow the volume of water flowing, then the rocks in the river become visible, which should be used as a signal to increase the volume of the WIP until the rocks in the stream could be removed (meaning that the process issues could be fixed).

A specific manufacturing example would be the setup time to start processing a new part. With a large setup time, it makes sense to have a large batch size (and hence a larger WIP) so that the setup time can be amortized over the larger batch. The obvious fix is to do the necessary work on fixtures to reduce the setup time, so that smaller batches are feasible, so that the WIP can be reduced.
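The amortization argument can be made concrete with a little arithmetic. The numbers below (a 60 minute setup cut to 6 minutes, 1 minute of run time per part, batches of 10 and 100) are invented for illustration, as is the function name.

```python
def minutes_per_part(setup_minutes: float, run_minutes: float, batch_size: int) -> float:
    """Setup time is paid once per batch, run time once per part."""
    return setup_minutes / batch_size + run_minutes


for setup_minutes in (60.0, 6.0):
    for batch_size in (10, 100):
        cost = minutes_per_part(setup_minutes, run_minutes=1.0, batch_size=batch_size)
        print(f"setup={setup_minutes:>4} min, batch={batch_size:>3}: {cost:.2f} min per part")
```

With the long setup, the batch of 10 costs about 7 minutes per part against about 1.6 for the batch of 100, so the large batch wins. Once the setup drops to 6 minutes, the batch of 10 comes in at about 1.6 minutes per part, which is what the large batch used to cost, and the WIP can be cut by a factor of ten.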

In the software world, a branch and associated Pull Request (PR) can be viewed as a batch of WIP. A small PR that touches only a few files is fast to review and merge, with a low probability of any merge conflicts. A large PR that represents multiple days of work modifying many files takes much longer to review and merge, with a high probability of a merge conflict that will further slow the process. The normal choice is to choose to do smaller PRs, but that does not work if the review process (AKA the setup time in the manufacturing process) takes a long time.

So the eventual outcome of a slow review process is that developers will choose to have more WIP, since they will choose to make larger changes, since the review takes a long time. They will also work on multiple stories/tasks concurrently so that they can stay busy rather than being stuck idle waiting for their PR to be merged.

Kanban is a mechanism for controlling the amount of WIP.

  • In manufacturing, the available space to store parts on the manufacturing floor is reduced to limit the WIP. Parts can only be produced if there are appropriate places to put the finished components.
  • Software does not have the equivalent concept of intermediate storage of WIP, so the equivalent control is to limit the stories/tasks that can be in progress. This is not quite as physical a control as reducing the size and number of the finished part storage, but with appropriate enforcement it can be an effective strategy.
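A WIP limit on stories can be sketched as a column that refuses new work once it is full. The class name, the limit of two and the story names are all hypothetical; real tools enforce the limit through their own board settings.

```python
class KanbanColumn:
    """Hypothetical board column that enforces a WIP limit."""

    def __init__(self, name: str, wip_limit: int) -> None:
        self.name = name
        self.wip_limit = wip_limit
        self.items = []  # stories currently in progress

    def pull(self, item: str) -> bool:
        """Start work on an item only if the column is below its WIP limit."""
        if len(self.items) >= self.wip_limit:
            return False
        self.items.append(item)
        return True

    def finish(self, item: str) -> None:
        """Finishing an item frees a slot, like emptying a storage location."""
        self.items.remove(item)


in_progress = KanbanColumn("In Progress", wip_limit=2)
started = [in_progress.pull(story) for story in ("story-1", "story-2", "story-3")]
print(started)  # the third pull is refused because the column is full

in_progress.finish("story-1")
print(in_progress.pull("story-3"))  # a slot is free again, so this succeeds
```

The refusal in pull plays the role of the missing storage space on the manufacturing floor: new work can only start when something has been finished.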

Working As Designed and other stories

Posted by Pete McBreen 25 Apr 2022 at 03:59

A misconception that some developers hold about testing and quality assurance is that the role of the tester is to validate that the implementation matches the design and/or the documented requirements. The symptom of this misconception is a reported bug that gets marked as Working As Designed or Meets Requirements, or rejected as an Edge Case that is unlikely to happen in production use.

One of the roles of testers is to act as a User Advocate who protects the interests of the end users. Product owners, analysts and developers sometimes forget that the main goal for software is to improve the life of the users of the software. They can become locked into their own ideas or the existing software architecture and end up creating a sub-optimal user experience.

Architecture at the wrong scale

Posted by Pete McBreen 06 Apr 2022 at 23:03

In a post titled You are not Google, Oz Nova points out that many decisions about software architecture are made without really addressing the scale and context of the system under design relative to the scale and context of the organization that developed the technology. One of the examples in the post is that of DynamoDB, which was originally designed to support write availability at scale so that Amazon did not lose any “put in basket” actions. In service of the write availability, DynamoDB loses many of the useful attributes of relational databases (consistency, referential integrity and easy joins), but few teams that adopt DynamoDB as part of their architecture really need what DynamoDB is optimized for.

In the service of thinking about technology choices, Oz introduces an acronym, UNPHAT:

  • Understand the problem – then you can produce a solution within the problem space
  • eNumerate the candidates – choose at least three so that you are not doing a simple binary choice
  • Find a Paper to read about each candidate – get beyond the marketing and conference presentation
  • Investigate the Historical context – what was it designed to optimize and what could it ignore
  • What Advantages does the technology have and what are the disadvantages – nothing is ever all positives
  • Think about how well the technology fits your problem space – does it work for your context and scale?

In the wake of this, how many problem spaces really need the complexity of Microservices?

Testing emails when your app is not the sender

Posted by Pete McBreen 08 Mar 2022 at 04:24

When your application is the sender of the email, tools like mailtrap are good for capturing the SMTP emails from Development and Staging environments, allowing automated tests to grab the emails and check things like the password reset flows.

Problems arise however when other systems are sending emails that are relevant to your application. A recent example I ran across was similar to this One Time Passcode workflow documented by Microsoft. The basic problem is that a third party is emailing to one of your test accounts and you need to extract some information from that email in order to complete an action.

As a tester, it is normally feasible to get a few test accounts set up by your email admins, and then your test cases can reuse that limited set of email addresses for the automated tests. A better way though is to use a service like Mailinator which allows easy access to a reasonable number of randomly generated usernames, say, on one of the Mailinator domains. On the free tier, the emails show up in a public email box at

The fun starts as soon as you have an account, then you can use an API to get the contents of your private emails to your own mailinator domain. The Message API allows you to fetch the message identifiers, read the email related to that identifier and then delete the message related to that identifier to keep the mailbox clean.

The way this works is by specifying a wildcard catch-all on the email domain that will catch all emails not addressed to known usernames. Normally an email server would send back a message saying mailbox not known, but the catch-all just grabs all those unknown emails and forms the basis of the mailinator system. You could roll your own, but it is much simpler to use Mailinator or one of the competing services.
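The fetch, read and delete cycle described above can be sketched against a hypothetical mailbox interface. The Mailbox protocol and its method names are assumptions made for this example and are not the Mailinator API; a real client would wrap the service's own calls. FakeMailbox is an in-memory stand-in, and the six digit passcode format is also an assumption.

```python
from __future__ import annotations

import re
from typing import Optional, Protocol


class Mailbox(Protocol):
    """Hypothetical interface, not the API of any real email service."""

    def list_message_ids(self, username: str) -> list[str]: ...
    def read_message(self, message_id: str) -> str: ...
    def delete_message(self, message_id: str) -> None: ...


class FakeMailbox:
    """In-memory stand-in so the sketch runs without any network access."""

    def __init__(self, messages: dict[str, tuple[str, str]]) -> None:
        self._messages = dict(messages)  # message id -> (username, body)

    def list_message_ids(self, username: str) -> list[str]:
        return [mid for mid, (user, _) in self._messages.items() if user == username]

    def read_message(self, message_id: str) -> str:
        return self._messages[message_id][1]

    def delete_message(self, message_id: str) -> None:
        del self._messages[message_id]


# Assumption: the third party sends a six digit passcode in the message body.
PASSCODE = re.compile(r"\b(\d{6})\b")


def fetch_passcode(mailbox: Mailbox, username: str) -> Optional[str]:
    """Read each message for the user, delete it, and return the first passcode."""
    for message_id in mailbox.list_message_ids(username):
        body = mailbox.read_message(message_id)
        mailbox.delete_message(message_id)  # keep the mailbox clean for the next test
        match = PASSCODE.search(body)
        if match:
            return match.group(1)
    return None


mailbox = FakeMailbox({
    "m1": ("test-user-1", "Your one time passcode is 123456."),
    "m2": ("test-user-2", "Your one time passcode is 654321."),
})
code = fetch_passcode(mailbox, "test-user-1")
print(code)
```

In a real test the call would normally sit inside a polling loop with a timeout, since the third party email can take a while to arrive.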

Test code is not Production code

Posted by Pete McBreen 06 Mar 2022 at 23:52

And as such needs to be held to different standards. Copy and Paste of test code is not as problematic as it could be in Production code. What matters is that the tests are cheap to write and modify, so speculatively scattering the test case code into multiple modules is at best a premature optimization, and potentially a waste of time.

When using playwright and pytest, a new test should be written inline in a single test_xxx without writing any new helper functions, test utilities or extracting selector strings out to constants. Yes, it can use existing fixtures to handle things like Login/logout and data setup, and existing helper functions to perform necessary actions, but all the new stuff has to be inline in the test and ideally should be less than 30 lines of code.

The resulting commit to the repository should only be for a single file, with the file touched in at most three places:

  • The DocString that describes the tests contained in the file
  • Potentially some new import statements
  • The new test case inside the test_xxx method

The rationale for this is to force the person writing the test to focus just on the test and nothing else. Refactoring to extract out selector strings can happen later, as can extracting reusable bits of test code, either as test utilities or possible fixtures. The initial focus should be on a quick and dirty implementation for the test so that it can run against the application that is being tested in the relevant environments.

Once the test is running and being useful testing the application, then is the time to think about the code quality:

  • If there are now multiple tests that have identical data setup and teardown, then extract that code to a fixture
  • If the same selector is now used in multiple test cases, or multiple places in the same test case, then it can make sense to extract it to a constant and collect the selectors for that UI component into a separate place.
  • If the test case had to hit a web service, it may make sense to extract code to a helper function, but only if there are other existing tests that could make use of the extracted function.

Overall the goal is to have new test cases that read from top to bottom without your team having to investigate what is happening in a myriad of new functions and modules that have been added just to support this new test case.

Applying Quality Assurance to your Tests

Posted by Pete McBreen 27 Feb 2022 at 22:13

Although Brian Marick was not the originator of the concept, I first heard about Soap Opera Tests from Brian. Rather than a test covering a single, simple scenario, instead exaggerate and complicate the scenario to push the system to see where the failures can occur. This gets around the problem that is often seen in Agile projects where the team tries to simplify the problem domain by ignoring what could be considered to be edge-cases and just addressing the simple scenarios.

The lens of a Soap Opera can be useful for reviewing the test suite for an application, going beyond the simplistic code coverage that is often reported from unit tests and component tests within a deployment pipeline:

  • How many tests (outside of unit tests) have a trivial sequence of setup, do action, check result, teardown, (or to use the Agile terms, how many tests are of the form Given, When, Then) rather than a connected sequence of transactions that represents a complex scenario?
  • For parameterized tests, how many are truly distinct tests rather than just equivalent values that exercise the exact same code path?
  • Are the System tests already covered by the Component level tests already implemented by the developers? (Typically the developer written tests may consider some possible failures, but miss others)
  • Do the System tests touch multiple parts of the architecture as part of a test scenario? (This is where a Soap Opera mindset helps, making sure that the test addresses what happens at team and component boundaries.)
  • Do the System tests address the full scope of the system and cover all interacting systems? (A common failing is that of not testing that the data replicated to the associated data lake/swamp/warehouse accurately represents the system data.)

Overall whenever evaluating a test, it is useful to know what risk it is addressing. Ideally any descriptive text included in the automated test case should include information about the motivation for the test, why it is important and the consequences of skipping the test. My take is that System tests should not be just repeating what can already be done by unit and component level tests (e.g. view and controller tests in Phoenix Testing terminology); they have to go beyond those simple scenarios and probe the interfaces between the various components.

Basically all tests have to answer the economic question as to what is the value of this test case?

Dan North and CUPID

Posted by Pete McBreen 17 Feb 2022 at 00:11

CUPID is Dan’s response to the SOLID principles and back story. Rather than another set of principles, Dan instead chose to focus on the properties of the software.

  • Composable – code that works well with others and does not have too many external dependencies that make it harder to use in another context, ideally with Intention Revealing terminology

  • Unix philosophy – related to the composability property, does one thing well and works well with others to build a larger solution

  • Predictable – or as the saying goes, “does what it says on the tin.” Dan calls this a generalization of Testability, it should behave as expected, be deterministic and observable

  • Idiomatic – naturally fits in with the way code is written in the implementation language, so for example in Python, rather than open, write and then close a text file, the natural way to write this is as below, where Python automatically handles the closing of the file

    with open("file.txt", 'w') as textfile:
        textfile.write("hello world!")

  • Domain based – uses words and language in a way that would be familiar to practitioners in that domain.

Read Evaluate Print Loop - REPL for the win

Posted by Pete McBreen 06 Feb 2022 at 04:48

When working with interpreted languages like Ruby, Elixir and Python it is great to use the REPL to discover the capabilities of the various variables that you are dealing with.

Ruby uses irb, Elixir uses iex and to be different Python jumps directly into the interactive prompt using python. In each of these you have the full power of the language to use whatever libraries you have installed by just typing code at the relevant prompt. So at the python prompt you could do the following to see how Playwright interacts with the browser - using code borrowed from an earlier post.

from playwright.sync_api import sync_playwright

playwright = sync_playwright().start()
browser = playwright.chromium.launch()

page = browser.new_page()
page.goto("")  # put the URL of the site to visit here
page.click("#main_navbar :text('blog')")
title = page.title()

The nice thing with each of these REPLs is that they allow you to see the type of the object and the associated attributes and methods, and hence get a better understanding of the library by trying things out and getting immediate success or failure - with an associated error message and stack dump, immediately followed by the REPL prompt for you to try again. Amusingly this even works for overly complex APIs like the Amazon Boto3 python library that you need to interact with the AWS services.
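The same discovery steps work on any object. As a small sketch that needs nothing beyond the standard library, the lines below could be typed at the python prompt one at a time; the path used is just an example value.

```python
from pathlib import PurePosixPath

path = PurePosixPath("/tmp/example.txt")

# What kind of object is this?
print(type(path))

# Which public attributes and methods does it offer?
public = [name for name in dir(path) if not name.startswith("_")]
print(public)

# Try one of them out and get immediate feedback.
print(path.suffix)
```

Calling help(path) at the prompt prints the documentation for the same object, which is usually the next step after dir.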

Sometimes you have to read the comics

Posted by Pete McBreen 04 Feb 2022 at 23:22

Normally I avoid any hint of political comment, but this just hit the sweet spot of asking who is influencing our wetware: how do we decide what to care about and what to argue about.

Pearls Before Swine

A neat Playwright Codegen feature

Posted by Pete McBreen 03 Feb 2022 at 00:25

When using the playwright codegen utility, it provides a nice preview of the available selector when hovering the mouse over any part of the web page. When tried with the Phoenix Liveview default application, it can be started with the command

> playwright codegen http://localhost:4000/

and after navigating to the LiveDashboard, the selector for the refresh speed shows up in the chromium browser

Preview of selectors

It also does a good job of generating some sample code that can then be copied into a pytest test case for future reuse

# Click text=Ports
# with page.expect_navigation(url="http://localhost:4000/dashboard/ports"):
with page.expect_navigation():
    page.click("text=Ports")

# Select 2
page.select_option("select[name=\"refresh\"]", "2")

Note that it will delay the script with expect_navigation until the Ports page is displayed - although it is not waiting for a specific url unlike the commented out part of the code.

A legal framework for automated vehicles

Posted by Pete McBreen 27 Jan 2022 at 23:01

Good news out of the UK: a proposal for a new Automated Vehicle Act that transfers liability from the person in the driving seat to the “Authorised Self-Driving Entity” (ASDE) (aka the Manufacturer). The person in the driving seat becomes the “user-in-charge” (UIC), responsible for the condition of the vehicle, while the ASDE is responsible for “the way the vehicle drives, ranging from dangerous or careless driving, to exceeding the speed limit or running a red light”.

Proposal also covers “no user-in-charge” (NUIC) where any occupants are merely passengers. “Responsibilities for overseeing the journey will be undertaken by an organisation, a licensed NUIC operator.”

A key new part of this idea is the prevention of misleading marketing:

The distinction between driver assistance and self-driving is crucial. Yet many drivers are currently confused about where the boundary lies. This can be dangerous. This problem is aggravated if marketing gives drivers the misleading impression that they do not need to monitor the road while driving - even though the technology is not good enough to be self-driving.

An ASDE is the vehicle manufacturer or software developer who puts an AV forward for authorisation. Our proposals provide some flexibility over the identity of the ASDE: it may be a vehicle manufacturer, or a software developer, or a partnership between the two. However, the ASDE must show that it was closely involved in assessing the safety of the vehicle. It must also be of good repute and have sufficient funds to respond to regulatory action (including organising a recall).

The onus will be on the ASDE to show that the vehicle meets the tests for authorisation. As a minimum, the ASDE would be expected to present evidence of approval, a safety case and an equality impact assessment.

Obviously the devil will be in the details, but this is a massive change in the way that software is covered by the law. Most software falls under the category whereby the developers basically disclaim all responsibility for the operation of the software, but this proposed framework changes that. Even if the vehicle requests a handover to the person in the driving seat, the ASDE remains responsible if the Automated Driving System caused the issue, their example being:

While in self-driving mode, an automated vehicle turns into a one-way street in the wrong direction. The user-in-charge takes over but is unable to avoid a collision. Alternatively, no collision takes place, but in the moment the user-in-charge takes over, they are driving in the wrong direction and may be guilty of an offence on that basis.

Sometimes it’s better to light a flamethrower than curse the darkness

Posted by Pete McBreen 27 Jan 2022 at 03:43

Terry Pratchett’s Discworld series has many delightful sayings and ideas, and it is nice to hear that the “Sam Vimes ‘Boots’ theory of socio-economic unfairness” has inspired some action in Roundworld: a new price index that

will document the disappearance of the budget lines and the insidiously creeping prices of the most basic versions of essential items at the supermarket

For those who have not heard of the ‘Boots’ theory

“The reason that the rich were so rich, Vimes reasoned, was because they managed to spend less money,” wrote Pratchett. “Take boots, for example. He earned thirty-eight dollars a month plus allowances. A really good pair of leather boots cost fifty dollars. But an affordable pair of boots, which were sort of okay for a season or two and then leaked like hell when the cardboard gave out, cost about ten dollars. Those were the kind of boots Vimes always bought, and wore until the soles were so thin that he could tell where he was in Ankh-Morpork on a foggy night by the feel of the cobbles. But the thing was that good boots lasted for years and years. A man who could afford fifty dollars had a pair of boots that’d still be keeping his feet dry in ten years’ time, while a poor man who could only afford cheap boots would have spent a hundred dollars on boots in the same time and would still have wet feet.”

When does Herd Intelligence kick in?

Posted by Pete McBreen 24 Jan 2022 at 05:12

Why is it that collectively we seem to be fawning over tech visionaries, celebrities and politicians who have no connection with reality?

  • A tweet from the end of 2018 “You can summon your Tesla from your phone. Only short distances today, but in a few years summon will work from across the continent”. – Were there any plans for unattended recharging? How was this supposed to work?
  • Non-fungible Tokens (NFTs) – finally we have a use for the blockchain that is not a ponzi scheme – Really?
  • Microservices and the Cloud – Useful at massive scale in some organizations and contexts, seem to be being adopted cargo cult fashion, by everyone, even for small scale applications. Luckily not everyone buys into this, You Don’t need the cloud.
  • Single Page Applications (SPA) – in context, occasionally useful, but for many websites the overall effect is to make the information on the webpage slower to load and harder to bookmark

If the covid pandemic has taught us anything, it is that a lot of people with an audience are absolutely clueless about the topics on which they are pontificating. On Bullshit, a book published all the way back in 2005, could usefully be required reading for our current age.

Year 15 of the five year plan to retire the mainframe

Posted by Pete McBreen 09 Jan 2022 at 00:32

I heard about this project in the early years of the Agile methodologies, another case of planning for the best case and then failing to realize that their reality check bounced. After an overrun of one or two years you would have thought that there would need to be a radical reappraisal of the approach.

Since then I have seen multiple projects which were supposedly agile that seem to have not heard of the first principle, early and continuous delivery of valuable software. One failure mode is to spend a lot of time in the project initiation activities, or gathering and documenting all requirements before starting to deliver software. Another, probably worse failure mode is to develop a framework for delivering the application, with the thought that this will make the eventual delivery of the application faster.

My take is that it is OK to invest in framework development, but not to do it as part of delivering business value in a project. The problem is similar to what used to happen in the early days of OO projects. The team would spend too much time building a framework that in the end turned out not to help the overall project, but added a lot of delivery risk.

If a company has money and people to burn, then it can make sense to either extract a framework from an existing application or speculatively create a framework. But this must be treated as an investment and must not be on the critical path for any real project until it has been proven out.

For normal projects, it is OK to spend one or two iterations at the start to build up some infrastructure and components for use, but after three iterations the project should be delivering real features to the user community. If you manage to go ten iterations without delivering customer value, the project is not agile, even if it is doing some of the agile ceremonies.

Hoping for the Best, Planning for the Worst

Posted by Pete McBreen 08 Jan 2022 at 04:58

Humans are not very good at doing this. As the last two years have proved, lots of people have hoped for the best and then planned for that best case. This has not turned out to be a very good approach.

In the good times, planning on everything turning out reasonably well and running lean with just in time deliveries can result in good return on investment and higher profits. Typically there are enough buffers in the system that small interruptions can be dealt with, so a week or two delay in shipping due to storms does not cause the system to break down.

Bigger interruptions however can cause major problems, but ideally the effect should be localized. Earthquakes obviously have a major local impact, but unless it hits a monopoly provider location, the impact should not be global. Obviously in an era of offshoring to cheaper locations, there has been a lot of concentration of industry, so the vulnerability to regional disruption is worse.

The downside to the optimization however is the lack of slack in the system. This is when planning for the best case causes problems. Hoping that a pandemic is going to fade away quickly is OK, but making plans on that assumption is not sensible based on the history to date. By now policy makers should be thinking and talking about how many more waves could occur, rather than scrambling to contain the current wave. How do we get the number of active cases in the population low enough that we can control the spread in the long term?

One effect that is starting to be seen is the effect on staffing. How organizations cope when 5% of the staff are off at any one time is a relatively solved problem, but when 25% to 35% are off there are no ready made answers.

A sample pikchr

Posted by Pete McBreen 04 Dec 2021 at 04:48

To use pikchr you really need to download it, play with the examples and read the documentation, but an example pikchr source file (input.pikchr) such as shown below is a great exemplar of the Diagrams as Code paradigm.

box color blue "A" fit;
circle color red "C"
arrow <-> "double headed" "arrow" width 200% chop;
circle "Small" fit;
box "long text will expand the box" fit;

Running the command pikchr --svg-only input.pikchr > image.svg gives the image in svg format, ideal for just pasting into a markdown document to display on a web page.

Introducing pikchr

Posted by Pete McBreen 04 Dec 2021 at 04:47

pikchr works well for diagrams that are otherwise awkward to do with the normal visual drawing tools. Brought to you by the maker of fossil and sqlite, as a means of drawing the SQL syntax diagrams. It is useful when you need to draw several similar diagrams to illustrate an idea, while being able to version the changes in a repository so that the diagrams can be regenerated later if changes are needed.
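Because the diagram source is just text, a family of similar diagrams can be generated from a short script and the script kept in the repository. The sketch below is an assumption about how that might look: the function name and stage names are invented, and it only emits box and arrow statements of the kind used in pikchr source files. It does not escape double quotes inside labels.

```python
from __future__ import annotations


def pipeline_diagram(stages: list[str]) -> str:
    """Return pikchr source for a row of labelled boxes joined by arrows."""
    lines = []
    for index, stage in enumerate(stages):
        if index > 0:
            lines.append("arrow;")
        lines.append(f'box "{stage}" fit;')
    return "\n".join(lines)


# Two variants of the same diagram, differing only in the list of stages.
short_version = pipeline_diagram(["Build", "Test"])
long_version = pipeline_diagram(["Build", "Test", "Review", "Deploy"])
print(long_version)
```

Each returned string would be saved to its own .pikchr file and rendered with the pikchr command in the usual way.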

Using pikchr for Diagrams as Code

Posted by Pete McBreen 04 Dec 2021 at 04:43

Rendered pikchr diagram: a box “A”, a circle “C”, a double headed arrow, a circle “Small” and a box containing “long text will expand the box”