Just-In-Time, Work-In-Progress and Kanban

There are several interesting parallels between Scrum style development and Just-In-Time manufacturing systems; the most useful is the idea of Work-In-Progress (WIP).

  • In manufacturing, having a lot of WIP means that all workstations can be kept busy, at the cost of a lot of investment in the partially finished work and a really slow response to changes. With enough WIP, supply chain disruptions have minimal impact, since there are always enough parts somewhere in the manufacturing system.
  • Scrum has the same sort of problem if there are too many epics/stories/tasks in progress, but the WIP is less obvious; the main visible symptom is that no working software is being delivered from the process. A large WIP does, however, solve the problem of the product owner being slow to document new epics/stories, which is the equivalent of a manufacturing supply chain disruption.

Overall, low WIP is preferable for a responsive process, but the downside of too low a WIP level is that it will reveal any process issues that exist. In the manufacturing world, the metaphor commonly used was that of water flowing in a river. If you reduce the volume of water flowing, the rocks in the river become visible, which should be used as a signal to increase the volume of WIP until the rocks in the stream can be removed (meaning that the process issues are fixed).

A specific manufacturing example would be the setup time needed to start processing a new part. With a large setup time, it makes sense to have a large batch size (and hence a larger WIP) so that the setup time can be amortized over the larger batch. The obvious fix is to do the necessary work on fixtures to reduce the setup time, which makes smaller batches feasible and so allows the WIP to be reduced.
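As a rough worked example of the amortization argument, the sketch below computes the average time per part when a single setup is shared across a batch. The 60 minute setup and 2 minutes of processing per part are made-up numbers chosen only for illustration.

```python
def minutes_per_part(setup_minutes: float, run_minutes_per_part: float, batch_size: int) -> float:
    """Average time per part when one setup is amortized over the whole batch."""
    return setup_minutes / batch_size + run_minutes_per_part


# Hypothetical figures: a 60 minute setup, then 2 minutes of processing per part
for batch_size in (1, 10, 100):
    print(batch_size, minutes_per_part(60, 2, batch_size))

# Cutting the setup to 6 minutes lets a batch of 10 match the old batch of 100
print(minutes_per_part(6, 2, 10) == minutes_per_part(60, 2, 100))
```

With the long setup only the large batch looks efficient; shrinking the setup time is what makes the smaller batch, and hence the lower WIP, affordable.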

In the software world, a branch and its associated Pull Request (PR) can be viewed as a batch of WIP. A small PR that touches only a few files is fast to review and merge, with a low probability of merge conflicts. A large PR that represents multiple days of work modifying many files takes much longer to review and merge, with a high probability of a merge conflict that will further slow the process. The natural choice is to make smaller PRs, but that does not work if the review process (the equivalent of the setup time in the manufacturing process) takes a long time.

So the eventual outcome of a slow review process is that developers will choose to have more WIP: because the review takes a long time, they will choose to make larger changes. They will also work on multiple stories/tasks concurrently so that they can stay busy rather than sitting idle waiting for their PR to be merged.

Kanban is a mechanism for controlling the amount of WIP.

  • In manufacturing, the available space to store parts on the manufacturing floor is reduced to limit the WIP. Parts can only be produced if there are appropriate places to put the finished components.
  • Software does not have the equivalent concept of intermediate storage of WIP, so the equivalent control is to limit the stories/tasks that can be in progress. This is not quite as physical a control as reducing the size and number of the finished part storage, but with appropriate enforcement it can be an effective strategy.
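A software Kanban board has no shelves to shrink, so the limit has to be enforced by the tooling or by team agreement. The sketch below is a hypothetical in-memory board (the column names and limits are made up) that refuses to pull new work into a column that is already at its WIP limit.

```python
from typing import Dict, List


class KanbanBoard:
    """Toy board that refuses to start work in a column that is already at its WIP limit."""

    def __init__(self, wip_limits: Dict[str, int]):
        self.wip_limits = wip_limits
        self.columns: Dict[str, List[str]] = {name: [] for name in wip_limits}

    def pull(self, story: str, column: str) -> bool:
        """Move a story into a column if there is room; return False when the limit is hit."""
        if len(self.columns[column]) >= self.wip_limits[column]:
            return False  # no free slot: finish something before starting more
        for stories in self.columns.values():
            if story in stories:
                stories.remove(story)  # a story lives in one column at a time
        self.columns[column].append(story)
        return True

    def finish(self, story: str) -> None:
        """Take a completed story off the board, freeing its slot."""
        for stories in self.columns.values():
            if story in stories:
                stories.remove(story)


board = KanbanBoard({"In Progress": 2, "Review": 1})
board.pull("Story A", "In Progress")  # True
board.pull("Story B", "In Progress")  # True
board.pull("Story C", "In Progress")  # False: the limit of 2 is reached
board.pull("Story A", "Review")       # True, and frees a slot in In Progress
board.pull("Story C", "In Progress")  # True
```

The "appropriate enforcement" from the text is the False return: the only way to start Story C is to move something else along first.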

Working As Designed and other stories

A misconception that some developers hold about testing and quality assurance is that the role of the tester is to validate that the implementation matches the design and/or the documented requirements. A symptom of this misconception is a reported bug being marked as Working As Designed or Meets Requirements, or rejected as an Edge Case that is unlikely to happen in production use.

One of the roles of testers is to act as a User Advocate who protects the interests of the end users. Product owners, analysts and developers sometimes forget that the main goal for software is to improve the life of the users of the software. They can become locked into their own ideas or the existing software architecture and end up creating a sub-optimal user experience.

Architecture at the wrong scale

In a post titled You are not Google, Oz Nova points out that many decisions about software architecture are made without really comparing the scale and context of the system under design with the scale and context of the organization that developed the technology. One of the examples in the post is DynamoDB, which was originally designed to support write availability at scale so that Amazon did not lose any “put in basket” actions. In service of that write availability, DynamoDB gives up many useful attributes of relational databases, such as consistency, referential integrity and easy joins, yet few teams that adopt DynamoDB as part of their architecture really need what DynamoDB is optimized for.

To help with thinking about technology choices, Oz introduces the acronym UNPHAT:

  • Understand the problem – then you can produce a solution within the problem space
  • eNumerate the candidates – choose at least three so that you are not doing a simple binary choice
  • Find a Paper to read about each candidate – get beyond the marketing and conference presentation
  • Investigate the Historical context – what was it designed to optimize and what could it ignore
  • What Advantages does the technology have and what are the disadvantages – nothing is ever all positives
  • Think about how well the technology fits your problem space – does it work for your context and scale?

In light of this, how many problem spaces really need the complexity of Microservices?

Testing emails when your app is not the sender

When your application is the sender of the email, tools like Mailtrap are good for capturing the SMTP emails from Development and Staging environments, allowing automated tests to grab the emails and check things like the password reset flows.

Problems arise, however, when other systems are sending emails that are relevant to your application. A recent example I ran across was similar to this One Time Passcode workflow documented by Microsoft. The basic problem is that a third party is sending an email to one of your test accounts, and you need to extract some information from that email in order to complete an action.
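Once the third party's email has been fetched, the extraction step is ordinary string handling. The sketch below uses only the Python standard library email package; it assumes the passcode is the first six digit number in the plain text body, which will not hold for every sender, and the sample message is invented.

```python
import re
from email import message_from_string
from email.policy import default
from typing import Optional

# Assumption: the passcode is the first standalone run of exactly six digits
PASSCODE_PATTERN = re.compile(r"\b(\d{6})\b")


def extract_passcode(raw_email: str) -> Optional[str]:
    """Return the one time passcode from the plain text body of a raw email, if any."""
    message = message_from_string(raw_email, policy=default)
    # get_body picks the text/plain part, whether or not the message is multipart
    body = message.get_body(preferencelist=("plain",))
    if body is None:
        return None
    match = PASSCODE_PATTERN.search(body.get_content())
    return match.group(1) if match else None


sample = (
    "From: no-reply@example.com\n"
    "To: lmykuhuzgfbgim@mailinator.com\n"
    "Subject: Your sign-in code\n"
    "\n"
    "Your one time passcode is 482913. It expires in 10 minutes.\n"
)
print(extract_passcode(sample))
```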

As a tester, it is normally feasible to get a few test accounts set up by your email admins, and then your test cases can reuse that limited set of email addresses for the automated tests. A better way, though, is to use a service like Mailinator, which allows easy access to a reasonable number of randomly generated usernames, say lmykuhuzgfbgim@mailinator.com, on one of the Mailinator domains. On the free tier, the emails show up in a public inbox at https://www.mailinator.com/v4/public/inboxes.jsp?to=lmykuhuzgfbgim.
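Generating a fresh address per test run keeps tests from tripping over each other's emails. A minimal sketch, assuming a 14 character lowercase name like the one in the example is acceptable, and reusing the public inbox URL format quoted in the text; since free tier inboxes are public, nothing sensitive should be sent to them.

```python
import secrets
import string


def random_inbox_name(length: int = 14) -> str:
    """Make a throwaway username such as lmykuhuzgfbgim from random lowercase letters."""
    return "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))


def public_inbox_url(name: str) -> str:
    """URL of the free tier public inbox, in the format quoted in the text."""
    return f"https://www.mailinator.com/v4/public/inboxes.jsp?to={name}"


name = random_inbox_name()
print(f"{name}@mailinator.com")
print(public_inbox_url(name))
```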

The fun starts as soon as you have an account: you can then use an API to get the contents of the private emails sent to your own Mailinator domain. The Message API allows you to fetch the message identifiers, read the email for a given identifier, and then delete that message to keep the mailbox clean.
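The fetch, read and delete sequence lends itself to a small polling helper. The sketch below deliberately does not call the real Mailinator endpoints: list_message_ids, read_message and delete_message are hypothetical callables that you would implement against the Message API documentation, and the timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable, List, Optional


def wait_for_message(
    list_message_ids: Callable[[], List[str]],
    read_message: Callable[[str], str],
    delete_message: Callable[[str], object],
    timeout: float = 60.0,
    interval: float = 2.0,
) -> Optional[str]:
    """Poll an inbox until a message arrives, read it, delete it and return its body."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        message_ids = list_message_ids()
        if message_ids:
            message_id = message_ids[0]
            body = read_message(message_id)
            delete_message(message_id)  # keep the inbox clean for the next test run
            return body
        time.sleep(interval)  # third party emails can take a while to arrive
    return None  # let the caller fail the test with a clear "no email" message


# Usage with an in-memory stand-in for the inbox
inbox = {"msg-1": "Your one time passcode is 482913."}
print(wait_for_message(lambda: list(inbox), inbox.__getitem__, inbox.pop))
print(inbox)
```

Passing the three operations in as functions keeps the polling logic testable without network access.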

This works by specifying a wildcard catch-all on the email domain that catches all emails not addressed to known usernames. Normally an email server would send back a message saying that the mailbox is unknown, but the catch-all just grabs all of those unknown emails, and this forms the basis of the Mailinator system. You could roll your own, but it is much simpler to use Mailinator or one of the competing services.
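To make the catch-all idea concrete, here is a toy model of the routing decision: mail for known users is delivered normally, and anything else lands in an inbox named after the unknown local part instead of bouncing. It is only an illustration with made-up names, not how Mailinator or any real mail server is implemented.

```python
from collections import defaultdict
from typing import Dict, List, Set


class CatchAllDomain:
    """Toy model of a mail domain with a wildcard catch-all."""

    def __init__(self, known_users: Set[str]):
        self.known_users = known_users
        self.user_mail: Dict[str, List[str]] = defaultdict(list)
        self.catch_all: Dict[str, List[str]] = defaultdict(list)

    def deliver(self, recipient: str, body: str) -> str:
        local_part = recipient.split("@", 1)[0].lower()
        if local_part in self.known_users:
            self.user_mail[local_part].append(body)
            return "delivered"
        # A normal server would reject this with "mailbox unknown";
        # the catch-all keeps it, filed under the name it was sent to
        self.catch_all[local_part].append(body)
        return "caught"


domain = CatchAllDomain({"alice"})
print(domain.deliver("alice@example.com", "Lunch?"))
print(domain.deliver("lmykuhuzgfbgim@example.com", "Your code is 482913"))
print(domain.catch_all["lmykuhuzgfbgim"])
```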

Test code is not Production code

And as such, it needs to be held to different standards. Copy and Paste of test code is not as problematic as it would be in Production code. What matters is that the tests are cheap to write and modify, so speculatively scattering the test case code across multiple modules is at best a premature optimization, and potentially a waste of time.

When using Playwright and pytest, a new test should be written inline in a single test_xxx without writing any new helper functions or test utilities, and without extracting selector strings out to constants. Yes, it can use existing fixtures to handle things like login/logout and data setup, and existing helper functions to perform necessary actions, but all the new stuff has to be inline in the test and ideally should be less than 30 lines of code.

The resulting commit to the repository should only be for a single file, with the file touched in at most three places:

  • The DocString that describes the tests contained in the file
  • Potentially some new import statements
  • The new test case inside the test_xxx method

The rationale for this is to force the person writing the test to focus just on the test and nothing else. Refactoring to extract out selector strings can happen later, as can extracting reusable bits of test code, either as test utilities or possibly as fixtures. The initial focus should be on a quick and dirty implementation of the test so that it can run against the application being tested in the relevant environments.

Once the test is running and being useful testing the application, then is the time to think about the code quality:

  • If there are now multiple tests that have identical data setup and teardown, then extract that code to a fixture
  • If the same selector is now used in multiple test cases, or multiple places in the same test case, then it can make sense to extract it to a constant and collect the selectors for that UI component into a separate place.
  • If the test case had to hit a web service, it may make sense to extract code to a helper function, but only if there are other existing tests that could make use of the extracted function.

Overall, the goal is to have new test cases that read from top to bottom without your team having to investigate what is happening in a myriad of new functions and modules that have been added just to support this new test case.

Applying Quality Assurance to your Tests

Although Brian Marick was not the originator of the concept, I first heard about Soap Opera Tests from Brian. Rather than a test covering a single, simple scenario, instead exaggerate and complicate the scenario to push the system to see where the failures can occur. This gets around the problem that is often seen in Agile projects where the team tries to simplify the problem domain by ignoring what could be considered to be Edge Cases and just addressing the simple scenarios.

The lens of a Soap Opera can be useful when reviewing the test suite for an application, going beyond the simplistic code coverage that is often reported from unit tests and component tests within a deployment pipeline:

  • How many tests (outside of unit tests) have a trivial sequence of setup, do action, check result, teardown, (or to use the Agile terms, how many tests are of the form Given, When, Then) rather than a connected sequence of transactions that represents a complex scenario?
  • For parameterized tests, how many are truly distinct tests rather than just equivalent values that exercise the exact same code path?
  • Are the System tests already covered by the Component level tests implemented by the developers? (Typically the developer written tests may consider some possible failures, but miss others)
  • Do the System tests touch multiple parts of the architecture as part of a test scenario? (This is where a Soap Opera mindset helps, making sure that the test addresses what happens at team and component boundaries.)
  • Do the System tests address the full scope of the system and cover all interacting systems? (A common failing is that of not testing that the data replicated to the associated data lake/swamp/warehouse accurately represents the system data.)

Overall, whenever evaluating a test, it is useful to know what risk it is addressing. Ideally any descriptive text included in the automated test case should include information about the motivation for the test, why it is important and the consequences of skipping the test. My take is that System tests should not be just repeating what can already be done by unit and component level tests (e.g. view and controller tests in Phoenix Testing terminology); they have to go beyond those simple scenarios and probe the interfaces between the various components.

Basically all tests have to answer the economic question as to what is the value of this test case?

Dan North and CUPID

CUPID is Dan’s response to the SOLID principles and back story. Rather than another set of principles, Dan instead chose to focus on the properties of the software.

  • Composable – code that works well with others and does not have too many external dependencies that make it harder to use in another context, ideally with Intention Revealing terminology
  • Unix philosophy – related to the composability property, does one thing well and works well with others to build a larger solution
  • Predictable – or as the saying goes, “does what it says on the tin.” Dan calls this a generalization of Testability, it should behave as expected, be deterministic and observable
  • Idiomatic – naturally fits in with the way code is written in the implementation language. For example, in Python, rather than explicitly opening, writing to and then closing a text file, the natural way to write this is as below, where Python automatically handles closing the file

with open("file.txt", "w") as textfile:
    textfile.write("hello world!")

  • Domain based – uses words and language in a way that would be familiar to practitioners in that domain.
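To see why the with form in the Idiomatic property is preferred, compare it with the explicit open, write and close sequence. The temporary directory is only there so the sketch does not litter the working directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file.txt")

# Non-idiomatic: if write() raised an exception, close() would never be reached
textfile = open(path, "w")
textfile.write("hello world!")
textfile.close()

# Idiomatic: the with statement closes the file on exit, even after an exception
with open(path, "w") as textfile:
    textfile.write("hello world!")

print(textfile.closed)  # True once the with block has finished
```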

Read Evaluate Print Loop - REPL for the win

When working with interpreted languages like Ruby, Elixir and Python it is great to use the REPL to discover the capabilities of the various variables that you are dealing with.

Ruby uses irb, Elixir uses iex and, to be different, Python jumps directly into the interactive prompt when you run python. In each of these you have the full power of the language to use whatever libraries you have installed just by typing code at the relevant prompt. So at the python prompt you could do the following to see how Playwright interacts with the browser, using code borrowed from an earlier post.

from playwright.sync_api import sync_playwright
playwright = sync_playwright().start()
browser = playwright.chromium.launch()

page = browser.new_page()
page.goto("https://www.selenium.dev/")
page.click("#main_navbar :text('blog')")
title = page.title()
print(title)

# When finished exploring, release the browser and the Playwright driver
browser.close()
playwright.stop()

The nice thing with each of these REPLs is that they let you see the type of an object and its associated attributes and methods, and hence get a better understanding of the library by trying things out and getting immediate success or failure. A failure comes with an error message and stack trace, immediately followed by the REPL prompt so you can try again. Amusingly, this even works for overly complex APIs like the Amazon Boto3 Python library that you need to use to interact with the AWS services.
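Much of this exploration comes down to the built-in type(), dir() and help() functions. The small helper below is a hypothetical convenience that filters dir() down to public names; it is shown on a standard library Path object, but at the prompt you would pass the Playwright page or browser object instead.

```python
from pathlib import Path


def public_members(obj) -> list:
    """Return the public attribute and method names of an object, sorted alphabetically."""
    return sorted(name for name in dir(obj) if not name.startswith("_"))


p = Path("file.txt")
print(type(p))  # the concrete class, e.g. PosixPath or WindowsPath
print(public_members(p)[:5])
# help(p.with_suffix) would show the docstring for a single method
```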

A neat Playwright Codegen feature

The playwright codegen utility provides a nice preview of the available selector when you hover the mouse over any part of the web page. To try it with the default Phoenix LiveView application, codegen can be started with the command

> playwright codegen http://localhost:4000/

and after navigating to the LiveDashboard, the selector for the refresh speed shows up in the chromium browser

Preview of selectors

It also does a good job of generating some sample code that can then be copied into a pytest test case for future reuse

# Click text=Ports
# with page.expect_navigation(url="http://localhost:4000/dashboard/ports"):
with page.expect_navigation():
    page.click("text=Ports")

# Select 2
page.select_option("select[name=\"refresh\"]", "2")

Note that expect_navigation delays the script until the navigation triggered by the click, here to the Ports page, has completed, although unlike the commented-out line it does not wait for a specific URL.

Sometimes it's better to light a flamethrower than curse the darkness

Terry Pratchett’s Discworld series has many delightful sayings and ideas, and it is nice to hear that the “Sam Vimes ‘Boots’ theory of socio-economic unfairness” has inspired some action in Roundworld: a new price index, one that

will document the disappearance of the budget lines and the insidiously creeping prices of the most basic versions of essential items at the supermarket

For those who have not heard of the ‘Boots’ theory

“The reason that the rich were so rich, Vimes reasoned, was because they managed to spend less money,” wrote Pratchett. “Take boots, for example. He earned thirty-eight dollars a month plus allowances. A really good pair of leather boots cost fifty dollars. But an affordable pair of boots, which were sort of okay for a season or two and then leaked like hell when the cardboard gave out, cost about ten dollars. Those were the kind of boots Vimes always bought, and wore until the soles were so thin that he could tell where he was in Ankh-Morpork on a foggy night by the feel of the cobbles. But the thing was that good boots lasted for years and years. A man who could afford fifty dollars had a pair of boots that’d still be keeping his feet dry in ten years’ time, while a poor man who could only afford cheap boots would have spent a hundred dollars on boots in the same time and would still have wet feet.”

A legal framework for automated vehicles

Good news out of the UK: a proposal for a new Automated Vehicle Act that transfers liability from the person in the driving seat to the “Authorised Self-Driving Entity” (ASDE) (aka the Manufacturer). The person in the driving seat becomes the “user-in-charge” (UIC), responsible for the condition of the vehicle, while the ASDE is responsible for “the way the vehicle drives, ranging from dangerous or careless driving, to exceeding the speed limit or running a red light”.

The proposal also covers “no user-in-charge” (NUIC), where any occupants are merely passengers. “Responsibilities for overseeing the journey will be undertaken by an organisation, a licensed NUIC operator.”

A key new part of this idea is the prevention of misleading marketing

The distinction between driver assistance and self-driving is crucial. Yet many drivers are currently confused about where the boundary lies. This can be dangerous. This problem is aggravated if marketing gives drivers the misleading impression that they do not need to monitor the road while driving - even though the technology is not good enough to be self-driving.

An ASDE is the vehicle manufacturer or software developer who puts an AV forward for authorisation. Our proposals provide some flexibility over the identity of the ASDE: it may be a vehicle manufacturer, or a software developer, or a partnership between the two. However, the ASDE must show that it was closely involved in assessing the safety of the vehicle. It must also be of good repute and have sufficient funds to respond to regulatory action (including organising a recall).

The onus will be on the ASDE to show that the vehicle meets the tests for authorisation. As a minimum, the ASDE would be expected to present evidence of approval, a safety case and an equality impact assessment.

Obviously the devil will be in the details, but this is a massive change in the way that software is covered by the law. Most software falls under the category whereby the developers basically disclaim all responsibility for the operation of the software, but this proposed framework changes that. Even if the vehicle requests a handover to the person in the driving seat, the ASDE remains responsible if the Automated Driving System caused the issue, their example being

While in self-driving mode, an automated vehicle turns into a one-way street in the wrong direction. The user-in-charge takes over but is unable to avoid a collision. Alternatively, no collision takes place, but in the moment the user-in-charge takes over, they are driving in the wrong direction and may be guilty of an offence on that basis.

When does Herd Intelligence kick in?

Why is it that collectively we seem to be fawning over tech visionaries, celebrities and politicians who have no connection with reality?

  • A tweet from the end of 2018 “You can summon your Tesla from your phone. Only short distances today, but in a few years summon will work from across the continent”. – Were there any plans for unattended recharging? How was this supposed to work?
  • Non-fungible Tokens (NFTs) – finally we have a use for the blockchain that is not a ponzi scheme – Really?
  • Microservices and the Cloud – Useful at massive scale in some organizations and contexts, but seem to be adopted, cargo cult fashion, by everyone, even for small scale applications. Luckily not everyone buys into this: You Don’t need the cloud.
  • Single Page Applications (SPA) – in context, occasionally useful, but for many websites the overall effect is to make the information on the webpage slower to load and harder to bookmark

If the covid pandemic has taught us anything, it is that a lot of people with an audience are absolutely clueless about the topics on which they are pontificating. On Bullshit, a book published all the way back in 2005, could usefully be required reading for our current age.

Year 15 of the five year plan to retire the mainframe

I heard about this project in the early years of the Agile methodologies, another case of planning for the best case and then failing to realize that their reality check bounced. After an overrun of one or two years you would have thought that there would need to be a radical reappraisal of the approach.

Since then I have seen multiple projects which were supposedly agile that seem to have not heard of the first principle, early and continuous delivery of valuable software. One failure mode is to spend a lot of time in the project initiation activities, or gathering and documenting all requirements before starting to deliver software. Another, probably worse failure mode is to develop a framework for delivering the application, with the thought that this will make the eventual delivery of the application faster.

My take is that it is OK to invest in framework development, but not to do it as part of delivering business value in a project. The problem is similar to what used to happen in the early days of OO projects. The team would spend too much time building a framework that in the end turned out not to help the overall project, but added a lot of delivery risk.

If a company has money and people to burn, then it can make sense to either extract a framework from an existing application or speculatively create a framework. But this must be treated as an investment and must not be on the critical path for any real project until it has been proven out.

For normal projects, it is OK to spend one or two iterations at the start to build up some infrastructure and components for use, but after three iterations the project should be delivering real features to the user community. If you manage to go ten iterations without delivering customer value, the project is not agile, even if it is doing some of the agile ceremonies.

Hoping for the Best, Planning for the Worst

Humans are not very good at doing this. As the last two years have proved, lots of people have hoped for the best and then planned for that best case. This has not turned out to be a very good approach.

In the good times, planning on everything turning out reasonably well and running lean with just in time deliveries can result in a good return on investment and higher profits. Typically there are enough buffers in the system that small interruptions can be dealt with, so a week or two of delay in shipping due to storms does not cause the system to break down.

Bigger interruptions, however, can cause major problems, but ideally the effect should be localized. Earthquakes obviously have a major local impact, but unless one hits a monopoly provider's location, the impact should not be global. Obviously in an era of offshoring to cheaper locations, there has been a lot of concentration of industry, so the vulnerability to regional disruption is worse.

The downside to the optimization however is the lack of slack in the system. This is when planning for the best case causes problems. Hoping that a pandemic is going to fade away quickly is OK, but making plans on that assumption is not sensible based on the history to date. By now policy makers should be thinking and talking about how many more waves could occur, rather than scrambling to contain the current wave. How do we get the number of active cases in the population low enough that we can control the spread in the long term?

One effect that is starting to be seen is the effect on staffing. How organizations cope when 5% of the staff are off at any one time is a relatively solved problem, but when 25% to 35% are off there are no ready made answers.

Introducing pikchr for Diagrams as Code

pikchr works well for diagrams that are otherwise awkward to do with the normal visual drawing tools. Brought to you by the maker of Fossil and SQLite, it was created as a means of drawing the SQL syntax diagrams. It is useful when you need to draw several similar diagrams to illustrate an idea, while being able to version the changes in a repository so that the diagrams can be regenerated later if changes are needed.

A sample pikchr

To use pikchr you really need to download it, play with the examples and read the documentation, but an example pikchr source file (input.pikchr) such as the one shown below is a great exemplar of the Diagrams as Code paradigm.

box color blue "A" fit;
circle color red "C"
arrow <-> "double headed" "arrow" width 200% chop;
circle "Small" fit;
box "long text will expand the box" fit;

Running the command pikchr --svg-only input.pikchr > image.svg produces the image in SVG format, ideal for pasting straight into a markdown document to display on a web page.
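Because the diagrams live in text files, regenerating them can be scripted. A minimal Python sketch, assuming the pikchr binary is on the PATH and the .pikchr sources sit together in one directory; it runs the same pikchr --svg-only command from the text for each file and writes its standard output to an .svg file next to the source.

```python
import subprocess
from pathlib import Path
from typing import List


def pikchr_command(source: Path) -> List[str]:
    """The command line from the text: pikchr --svg-only <source>."""
    return ["pikchr", "--svg-only", str(source)]


def render_svg(source: Path) -> Path:
    """Render one .pikchr file to an .svg file alongside it and return the new path."""
    result = subprocess.run(pikchr_command(source), capture_output=True, text=True, check=True)
    target = source.with_suffix(".svg")
    target.write_text(result.stdout)
    return target


def render_all(directory: Path) -> List[Path]:
    """Regenerate every diagram in a directory, e.g. after a batch of edits."""
    return [render_svg(source) for source in sorted(directory.glob("*.pikchr"))]
```

Using check=True means a syntax error in any diagram stops the run instead of silently writing an empty SVG.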

[Rendered image: the diagram generated from input.pikchr]

A different take on autonomous vehicles

From Why We Drive by Matthew B. Crawford …

… the logical necessity of driverless cars becomes clear. It seems likely there will be real-time auctions to determine the route your Google car takes, so you can be offered empowering choices along the way. […] One marketer put it quite frankly: the goal is to intercept people in their daily routines with brand and promotional messages.

The book draws a distinction between the use of a car to get somewhere, and the pure joy of just going out for a drive. The creators of autonomous vehicles do not seem to be drawn from the population that enjoys going out for a drive down a twisty road.

The book also highlights the problem of incomplete features and systems

… We are not just daunted by the obscure logic of such machines, but seem to feel ourselves responsible to them, afraid of being wrong in their presence, and therefore reluctant to challenge them even as the […] GPS directs us into a lake.

Many drivers of such vehicles even go as far as to defend these early attempts at making the technology work, thinking that it is acceptable for companies to test out their systems on public roads.

Systems designed to minimize the role of human intelligence tend to be brittle, as they are not able to anticipate every contingency. When they fail, their failures tend to be systemic, in proportion to the comprehensive reach of their control.

To date we have been lucky in that most times the driver has managed to react in time before the vehicle plows into a stationary object.

Just say no to software certifications

Certifications can make sense in the mechanical world where there is a One Best Way to achieve a desired outcome, or there is a basic level of competency that is awkward to test for. So many mechanical trades have safety certificates that have to be periodically renewed, and most countries have the idea of a driving license that is a permit to drive a specific type of vehicle. After all we do not want an electrician or gas fitter getting creative with the building code.

In software though, as Perl programmers say, There Is More Than One Way To Do It, and we do want developers to get creative (by some reports, parts of Ruby came from Perl). We don’t want to be certifying developers as capable of creating Perl CGI scripts when there is Ruby on Rails available. The same can be said for all of the cloud certifications: a money maker for the providers, but quickly outdated as the cloud providers release new capabilities every few months, and hey presto, you need to take that certification exam again (and obviously pay the fee again).

Interesting take from Tim Bray on Microservices and Integration testing

In a post about testing in 2021, Tim Bray says this explicitly about integration or end-to-end testing …

The problem is that moving from monoliths to microservices, which makes these tests more important, also makes them harder to build. Which is another good reason to stick with a nice simple monolith if you can. No, I’m not kidding.

Which in turn means you have to be sure to budget time, including design and maintenance time, for your integration testing. (Unit testing is just part of the basic coding budget.)