Prioritization and Cycle Time

Many of the agile processes make a nod towards cycle time, which typically measures the time from when developers start working on an item to when that item is released to production. While this is a useful measure for finding out how long developers take, on average, to complete work, it is not the full picture.

Defining Cycle Time as just the time when developers are working makes it seem that the time is of the order of a week or so. Admittedly I have seen scrum teams on a two-week sprint cycle take multiple sprints to complete items, so even on this simplified cycle time measure many teams are slow.

The full cycle time as experienced by the user, however, runs from when an item is first recognized, through being understood, written up into the backlog, queued for prioritization and queued for development, to being developed and released. Although it is hard to get a good handle on the first few stages, I commonly see open JIRA tickets that are 12 months or older, and that is in the unprioritized backlog. From a user's viewpoint this is abysmal: they send in a request and do not hear back for a long time.

The prioritization needs to be done in a way that allows for fast feedback to the users. One way of doing this is to adjust the workflow so that there is a two-stage write-up, and the initial request can then be simplified and routed for evaluation as soon as it is created. This initial prioritization puts each request into one of four buckets.

  • Do Now, team parks whatever it is working on and works on the item. Typically only needed for items that have a major, imminent business impact.
  • Do Next, team will pick up this item next after finishing the current work items.
  • Do Later, item gets queued into the backlog for consideration later.
  • Decline, item gets marked as declined, not a part of the backlog.

An immediate Decline is useful as it allows the user either to make a better case for the item or to know that it is not going to change, so they can accept the current process and/or come up with a workaround.

The Do Later items need to be worked on by the user, business domain experts, analysts and managers to better understand the item, its value and the associated costs, so that the relative importance of the items in the backlog can be assessed. As part of this there has to be an agreed maximum age for items in the backlog to keep the size manageable. Items that get too old need to be Declined, as they are lower value than all the other items that could be worked on.
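
As a minimal sketch, the buckets and the maximum-age rule could be modelled along the following lines; the field names and the age_out_backlog helper are illustrative assumptions rather than a prescribed implementation.

from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Bucket(Enum):
    DO_NOW = "Do Now"        # team parks current work; major, imminent business impact
    DO_NEXT = "Do Next"      # picked up after the current work items are finished
    DO_LATER = "Do Later"    # queued in the backlog for later consideration
    DECLINE = "Decline"      # not part of the backlog


@dataclass
class Request:
    title: str
    created: date
    bucket: Bucket = Bucket.DO_LATER


def age_out_backlog(backlog: list[Request], max_age: timedelta) -> None:
    """Decline any Do Later item that has exceeded the agreed maximum age."""
    today = date.today()
    for item in backlog:
        if item.bucket is Bucket.DO_LATER and today - item.created > max_age:
            item.bucket = Bucket.DECLINE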

The top items in the backlog are effectively in the Do Next category, so if no new high priority items have been put into the Do Next category, then the team can work on the top priority from the Do Later backlog.

The size of the Do Next category has to be managed by the group of users, business domain experts, analysts and managers, so that it only contains items that are truly higher value than the top items in the Do Later backlog. The size of the team doing the work sets the limit on the number of items in the Do Next category. It has to be small so that the initial evaluation can be immediate; a good starting point is five items, or one less than the team size if the team is smaller than five.

More Dan North and CUPID

Finally Dan has got around to creating the Cupid.dev website:

I believe that there are properties or characteristics of software that make it a joy to work with. The more your code has these qualities, the more joyful it is to work with; but everything is a tradeoff so you should always consider your context.

There are likely many of these properties, overlapping and interrelating, and there are many ways to describe them. I have chosen five that underpin much of what I care about in code. There is a diminishing return; five are enough to make a handy acronym, and few enough to remember.

Just for reference, CUPID stands for

  • Composable: plays well with others,
  • Unix philosophy: does one thing well,
  • Predictable: does what you expect,
  • Idiomatic: feels natural,
  • Domain-based: the code models the problem domain in language and structure.

Simon Wardley's Blah Template

An amusing take on the generic corporate strategy templates that companies seem to use, just replace Blah with an appropriate buzzword…

        Our strategy is [Blah]. We will lead
    a [Blah] effort of the market through our use
   of [Blah] and [Blah] to build a [Blah]. By being
   both [Blah] and [Blah], our [Blah] approach will
        drive [Blah] throughout the organisation.
     Synergies between our [Blah] and [Blah] will
          enable us to capture the upside by
       becoming [Blah] in a [Blah] world. These
      transformations combined with [Blah] due to
                our [Blah] will create
          a [Blah] through [Blah] and [Blah].

Resulting in a strategy like

Our strategy is customer focused. We will lead a disruptive effort of the market through our use of innovative social media and big data to build a collaborative cloud based ecosystem. By being both digital first and agile, our open approach will drive efficiency throughout the organisation…

From Simon Wardley’s “Why the fuss about Serverless?”

Choosing what to work on at the start of a project

This was prompted by watching a presentation by Chris Matts, who pointed out that he often sees teams that deliver a Login screen as the first part of a project. A common pattern he observed was teams delivering the easy parts first and only then tackling the harder parts of the project. The unfortunate outcome of this was that the teams backloaded the project risks, which made the eventual delivery date hard to predict.

His suggestion was to undertake the high risk work first, so that as the project moves on the easier stuff moves to the front of the queue and hence the end date becomes more predictable as time progresses. Unstated but part of this is that the team gets better at delivery as the project progresses, so the easier work can be done faster as less learning is needed.

Similar thoughts on the same problem

  • Alistair Cockburn’s Walking Skeleton as presented in Crystal Clear
  • The Executable Architecture idea from the early versions of the Unified Process
  • The Spike Solution idea from eXtreme Programming

Related to this is the idea from Agile of learning from the delivered system. The goal is to learn and respond to that learning as early as possible, so the key idea is to manage the risks by choosing to work on the high risk items (business or technical) first.

This is different from the idea of doing the high value items first, mainly because the value of a feature is a guess by the business, but the risk associated with a feature can be assessed. The team knows whether they have built something like the feature before and knows how stable the technology is. The business owners know whether they have a really clear idea as to how the feature will work, and how the feature will be accepted in the marketplace. All features that score highly on uncertainty need to be scheduled early, so that if the idea does not work, the project can steer towards a different outcome.

Affordances in programming languages

Embedded programming is often done in languages like C because they make it easy to do bit-level operations. As a result, many low-level protocols for talking to hardware use partial-byte data, as in https://www.rfc-editor.org/rfc/rfc791.txt, which draws the internet header as shown below, with several fields not taking a full byte.

 0                   1                   2                   3   
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source Address                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Destination Address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The obvious thing to do in C is to define a struct using bit fields:

struct {
    unsigned int Version : 4;          /* IP version, 4 for IPv4 */
    unsigned int IHL : 4;              /* header length in 32-bit words */
    unsigned int Type_of_Service : 8;
    unsigned int Total_Length : 16;    /* total datagram length in bytes */
} Header;                              /* note: bit-field layout is implementation-defined */

Other languages such as Python lack this construct, so a programmer will often resort to masking and shifting bits to extract the values

>>> Version = (first_byte & 0b11110000) >> 4
>>> IHL = first_byte & 0b00001111

But in this case the affordance of the language leads us astray: yes, it is easy to do the masking and shifting, but it is much better to use a library that makes the intention of the code easier to understand

>>> import bitstring
>>> bits = bitstring.BitStream(header_bytes)
>>> version, ihl, tos, t_len = bits.unpack("uint:4, uint:4, uint:8, uint:16")

Implications of Shift-Left for Quality Assurance

Most development teams that use Continuous Integration and Continuous Delivery pipelines are starting to adopt the Shift-Left approach of making sure that the development team builds appropriate Unit, Component and Integration tests into the pipeline. The obvious implication is that the traditional Quality Assurance team – from a testing standpoint – only has to deal with System testing.

But in the spirit of Shift-Left, the Quality Assurance team should be looking to see how to validate things earlier as well.

  • With appropriate Unit, Component and Integration tests, the traditional refrain that the code is working as designed becomes more accurate, so the QA team can no longer write End-to-End tests and consider the job complete. QA instead has to poke at the corners of the design of the individual subsystems and look at the interactions between those subsystems.
  • The problem of working as designed becomes one of QA validating that the design is indeed correct, basically shifting the validation to before the code is written as a logical extension of the Shift-Left mindset.
  • Similarly, QA has to look at the Requirements and the way that they are elicited and documented to move the validation process further left.

A challenge with this is that the Agilista mindset is that it is only working code that matters. So having validation before the code is written could lead to the suggestion that the QA team is falling back into the old ways and encouraging analysis paralysis.

The resolution of this is to look at the overall workflow from a Kanban perspective and see how long it takes for an idea that comes up in Requirements Elicitation to make it into the Development Queue and hence into the hands of the Users. For most organizations, once the Requirement has hit the front of the backlog queue, the process to deliver is relatively short, typically less than 2 weeks for a simple User Story, but can be of the order of a month or more for an Epic containing multiple User Stories.

So the QA team does have a window of opportunity for validating the Requirements – the gap between Elicitation and Documentation and when the Stories get into the development queue. Validating the Design is harder, since most agile teams tend to do that on the fly…

Automating system tests for Defects

An interesting problem comes up when exploratory testing finds a defect in an application. If the defect cannot be fixed immediately, the fix is going to be assigned to the backlog and eventually scheduled to be worked on by the development team. The challenge now is what to put into the automated test: should the test fail or should it pass?

  • A failing test is the obvious choice, since the test should specify correct behavior, and when the developers get around to making the necessary changes, the test will just pass without any extra work required by the QA team.
  • Under PyTest there is the option to mark a test as XFail, which reports the count of tests that failed as expected (XFAIL) and the count of tests that were expected to fail but passed (XPASS)
  • A passing test that encodes the current behavior of the code as correct is another option for the test. The passing test is a Change Detector in that the test will fail when the defect is fixed.

Of the three options, a failing test means that every time the test suite runs, one or more tests will fail, and there is some overhead in deciding which of the failures are expected and which are not. The xfail test has a similar problem, but without the benefit of a stack trace, so if the actual behavior changes but is not fully fixed, the test may still report as an XFAIL, meaning there is still some overhead of checking the test suite result.

A passing test is cheap to evaluate. If all tests pass, then nothing has changed and there is nothing to investigate. If however a test fails, then the normal process of investigating the error starts – and if the passing test had a good error explanation referring to the defect, then the investigation should be short. It is then a short process to amend the test case to reflect the new correct behavior, and the entire test suite should pass again. Some exploratory testing is still necessary to make sure that the fix did not introduce any other weird behavior, but the process of getting the test suite passing again should be trivial.
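
To make the xfail and change detector options concrete, here is a hedged sketch using pytest; the module under test and the defect identifier are hypothetical.

import pytest

from billing import calculate_discount  # hypothetical module under test


@pytest.mark.xfail(reason="DEF-123: discount is rounded down instead of half-up")
def test_discount_rounds_half_up():
    # Specifies the correct behavior; reported as XFAIL while the defect exists
    # and as XPASS once the fix lands.
    assert calculate_discount(10.05) == 1.01


def test_discount_currently_rounds_down():
    # Change Detector: encodes today's incorrect behavior so the suite stays green;
    # a failure here signals that DEF-123 has been fixed and this test needs updating.
    assert calculate_discount(10.05) == 1.00, "Behavior changed - has DEF-123 been fixed?"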

Automating System Tests

Rails and Phoenix brought the idea of testing into mainstream web development (see the Rails Guide and the Hexdocs Testing Guide), but their main focus is on Unit Testing, Testing Views and Testing Controllers, while not saying much about System testing. They both make a nod towards Integration testing, but the focus is more on what developers should know about testing rather than on full system tests.

With Unit Tests, the test case setup and teardown is relatively minimal, so often a unit test will have a single assertion – basically a unit test is testing just a single thing. This is appropriate for a Unit Test, since the execution time for the test suite is relatively insensitive to the number of tests when the testing framework can execute a suite of 1,000 unit tests in a few seconds.

For Controller and View tests, both Rails and Phoenix have a nice way of mocking out the webserver interaction so that the behavior of the Controller that responds to the GET/POST/PUT/DELETE requests from the browser can be tested without needing the full stack. Both include the idea of a Test Database for use by the test suite that gets populated by fixtures that work in conjunction with the test cases to provide appropriate data for the tests. Typically the data is set up before each test case and then cleaned up after each test case so that the failure of one test case does not impact the other test cases. Typically these tests run slower than Unit tests, but a reasonable suite of several hundred tests can run in less than 10 seconds.

Both Unit and the Controller/View tests tend to be relatively simple with few assertions. System Testing is different. For a start, the setup and teardown time for these tests can be significant, especially if you have a microservice architecture and the test case covers the interactions between multiple services. So system tests have to pay back the larger setup and teardown cost by doing more work inside each test case, which means that:

  • the scenario for a system test case has to be more of a Soap Opera
  • there should be many more assertions about the steps along the scenario so that if there is an error, the source of the error can be found quickly
  • the scenarios should follow multiple alternate paths through the system
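
Pulling those three points together, a minimal sketch of what such a scenario might look like with pytest and Playwright; the application, URLs, selectors and the seeded_customer fixture are all hypothetical placeholders.

def test_customer_orders_cancels_then_reorders(page, seeded_customer):
    # A soap-opera style scenario: one connected sequence of transactions,
    # assertions at every step, and an alternate path (cancel, then re-order).
    page.goto("https://staging.example.test/shop")
    page.fill("#search", "blue widget")
    page.click("#search-button")
    assert page.locator(".result-row").count() > 0, "search returned no products"

    page.click(".result-row >> nth=0 >> text=Add to basket")
    assert "1 item" in page.locator("#basket-summary").inner_text()

    page.click("#checkout")
    page.click("text=Place order")
    order_id = page.locator("#order-confirmation .order-id").inner_text()
    assert order_id, "no order id shown after checkout"

    # Alternate path: cancel the order, then confirm re-ordering still works.
    page.goto(f"https://staging.example.test/orders/{order_id}")
    page.click("text=Cancel order")
    assert "Cancelled" in page.locator(".order-status").inner_text()

    page.click("#reorder-last")
    assert "1 item" in page.locator("#basket-summary").inner_text()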

Just-In-Time, Work-In-Progress and Kanban

There are several interesting parallels between Scrum style development and Just-In-Time manufacturing systems; the most useful is the idea of Work-In-Progress (WIP).

  • In manufacturing, having a lot of WIP means that all workstations can be kept busy, at the cost of a lot of investment in the partially finished work and a really slow response to changes. With enough WIP, any supply chain disruptions have minimal impact, since there are always enough parts somewhere in the manufacturing system.
  • Scrum has the same sort of problem if there are too many epics/stories/tasks in progress, but the WIP is less obvious, other than the problem that no working software is being delivered from the process. The excess WIP does solve the problem of the product owner being slow to document new epics/stories, which is the manufacturing equivalent of a supply chain disruption.

Overall low WIP is preferable for a responsive process, but the downside of too low a WIP level is that it will reveal any process issues that exist. In the manufacturing world, the metaphor commonly used was that of water flowing in a river: if you slow the volume of water flowing, the rocks in the river become visible, which should be used as a signal to increase the volume of WIP until the rocks in the stream can be removed (meaning that the process issues can be fixed).

A specific manufacturing example would be the setup time to start processing a new part. With a large setup time, it makes sense to have a large batch size (and hence a larger WIP) so that the setup time can be amortized over the larger batch. The obvious fix is to do the necessary work on fixtures to reduce the setup time, so that smaller batches are feasible and the WIP can be reduced.

In the software world, a branch and its associated Pull Request (PR) can be viewed as a batch of WIP. A small PR that touches only a few files is fast to review and merge, with a low probability of any merge conflicts. A large PR that represents multiple days of work modifying many files takes much longer to review and merge, with a high probability of a merge conflict that will further slow the process. The normal choice is to do smaller PRs, but that does not work if the review process (AKA the setup time in the manufacturing process) takes a long time.

So the eventual outcome of a slow review process is that developers will carry more WIP: they will make larger changes because each review takes so long, and they will work on multiple stories/tasks concurrently so that they can stay busy rather than being stuck idle waiting for their PR to be merged.

Kanban is a mechanism for controlling the amount of WIP.

  • In manufacturing, the available space to store parts on the manufacturing floor is reduced to limit the WIP. Parts can only be produced if there are appropriate places to put the finished components.
  • Software does not have the equivalent concept of intermediate storage of WIP, so the equivalent control is to limit the stories/tasks that can be in progress. This is not quite as physical a control as reducing the size and number of the finished part storage, but with appropriate enforcement it can be an effective strategy.

Working As Designed and other stories

A misconception that some developers hold about testing and quality assurance is that the role of the tester is to validate that the implementation matches the design and/or the documented requirements. The symptom of this misconception is a reported bug getting marked as Working As Designed or Meets Requirements, or rejected as an Edge Case that is unlikely to happen in production use.

One of the roles of testers is to act as a User Advocate who protects the interests of the end users. Product owners, analysts and developers sometimes forget that the main goal for software is to improve the life of the users of the software. They can become locked into their own ideas or the existing software architecture and end up creating a sub-optimal user experience.

Architecture at the wrong scale

In a post titled You are not Google, Oz Nova points out that many decisions about software architecture are made without really addressing the scale and context of the system under design relative to the scale and context of the organization that developed the technology. One of the examples in the post is DynamoDB, which was originally designed to support write availability at scale so that Amazon did not lose any “put in basket” actions. In service of that write availability, DynamoDB gives up many of the useful attributes of relational databases (consistency, referential integrity and easy joins), but few teams that adopt DynamoDB as part of their architecture really need what it is optimized for.

In the service of thinking about technology choices, Oz introduces an acronym UNPHAT

  • Understand the problem – then you can produce a solution within the problem space
  • eNumerate the candidates – choose at least three so that you are not doing a simple binary choice
  • Find a Paper to read about each candidate – get beyond the marketing and conference presentation
  • Investigate the Historical context – what was it designed to optimize and what could it ignore
  • What Advantages does the technology have and what are the disadvantages – nothing is ever all positives
  • Think about how well the technology fits your problem space – does it work for your context and scale?

In the wake of this, how many problem spaces really need the complexity of Microservices?

Testing emails when your app is not the sender

When your application is the sender of the email, tools like mailtrap are good for capturing the SMTP emails from Development and Staging environments, allowing automated tests to grab the emails and check things like the password reset flows.

Problems arise however when other systems are sending emails that are relevant to your application. A recent example I ran across was similar to this One Time Passcode workflow documented by Microsoft. The basic problem is that a third party is emailing to one of your test accounts and you need to extract some information from that email in order to complete an action.

As a tester, it is normally feasible to get a few test accounts set up by your email admins, and then your test cases can reuse that limited set of email addresses for the automated tests. A better way, though, is to use a service like Mailinator which allows easy access to a reasonable number of randomly generated usernames, say lmykuhuzgfbgim@mailinator.com, on one of the Mailinator domains. On the free tier, the emails show up in a public email box at https://www.mailinator.com/v4/public/inboxes.jsp?to=lmykuhuzgfbgim.

The fun starts as soon as you have an account: you can then use an API to get the contents of the private emails sent to your own Mailinator domain. The Message API allows you to fetch the message identifiers, read the email related to an identifier and then delete that message to keep the mailbox clean.
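
A rough sketch of that fetch/read/delete loop using the requests library is below; the endpoint paths, domain, token handling and response fields are assumptions about the Mailinator Message API, so check the current API documentation before relying on them.

import requests

# All values and endpoint paths below are illustrative assumptions -- consult the
# Mailinator Message API docs for the real ones.
API_TOKEN = "your-mailinator-api-token"
BASE = "https://api.mailinator.com/api/v2/domains/your-private-domain"
HEADERS = {"Authorization": API_TOKEN}

# Fetch the message identifiers waiting in a test inbox.
inbox = requests.get(f"{BASE}/inboxes/lmykuhuzgfbgim", headers=HEADERS).json()

for summary in inbox.get("msgs", []):
    msg_id = summary["id"]

    # Read the full email so the needed information (e.g. a one time passcode)
    # can be extracted from the subject or body.
    message = requests.get(f"{BASE}/messages/{msg_id}", headers=HEADERS).json()
    print(message.get("subject"))

    # Delete the message to keep the mailbox clean for the next test run.
    requests.delete(f"{BASE}/messages/{msg_id}", headers=HEADERS)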

The way this works is by specifying a wildcard catch-all on the email domain that will catch all emails not addressed to known usernames. Normally an email server would send back a message saying the mailbox is not known, but the catch-all just grabs all those unknown emails and forms the basis of the Mailinator system. You could roll your own, but it is much simpler to use Mailinator or one of the competing services.

Test code is not Production code

And as such needs to be held to different standards. Copy and Paste of test code is not as problematic as it could be in Production code. What matters is that the tests are cheap to write and modify, so speculatively scattering the test case code into multiple modules is at best a premature optimization, and potentially a waste of time.

When using Playwright and pytest, a new test should be written inline in a single test_xxx function without writing any new helper functions or test utilities, and without extracting selector strings out to constants. Yes, it can use existing fixtures to handle things like login/logout and data setup, and existing helper functions to perform necessary actions, but all the new stuff has to be inline in the test and ideally should be less than 30 lines of code.
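
As an illustration, a hedged sketch of such an inline test, reusing the selenium.dev navigation from the Playwright snippet elsewhere in these notes; the exact page title check is an assumption.

def test_blog_link_navigates_to_blog(page):
    """The blog entry in the main navbar takes the visitor to the blog index."""
    # Everything new lives inline in this one function; the pytest-playwright
    # page fixture is reused rather than writing new setup code.
    page.goto("https://www.selenium.dev/")
    page.click("#main_navbar :text('blog')")
    assert "Blog" in page.title()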

The resulting commit to the repository should only be for a single file, with the file touched in at most three places

  • The DocString that describes the tests contained in the file
  • Potentially some new import statements
  • The new test case inside the test_xxx method

The rationale for this is to force the person writing the test to focus just on the test and nothing else. Refactoring to extract selector strings can happen later, as can extracting reusable bits of test code, either as test utilities or possibly fixtures. The initial focus should be on a quick and dirty implementation of the test so that it can run against the application being tested in the relevant environments.

Once the test is running and being useful testing the application, then is the time to think about the code quality:

  • If there are now multiple tests that have identical data setup and teardown, then extract that code to a fixture
  • If the same selector is now used in multiple test cases, or multiple places in the same test case, then it can make sense to extract it to a constant and collect the selectors for that UI component into a separate place.
  • If the test case had to hit a web service, it may make sense to extract code to a helper function, but only if there are other existing tests that could make use of the extracted function.
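
For the first of these, a minimal sketch of extracting shared setup and teardown into a pytest fixture; the testdata helpers and the staging URL are hypothetical.

import pytest

from testdata import create_customer, delete_customer  # hypothetical helpers


@pytest.fixture
def seeded_customer():
    # Shared setup that several tests were previously repeating inline.
    customer = create_customer(name="Test Customer", plan="basic")
    yield customer
    # Teardown runs after each test, keeping the test cases independent.
    delete_customer(customer)


def test_customer_sees_their_plan(page, seeded_customer):
    page.goto(f"https://staging.example.test/customers/{seeded_customer.id}")
    assert "basic" in page.locator("#plan-name").inner_text()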

Overall the goal is to have new test cases that read from top to bottom, without your team having to investigate what is happening in a myriad of new functions and modules that have been added just to support this new test case.

Applying Quality Assurance to your Tests

Although Brian Marick was not the originator of the concept, I first heard about Soap Opera Tests from Brian. Rather than a test covering a single, simple scenario, instead exaggerate and complicate the scenario to push the system to see where the failures can occur. This gets around the problem that is often seen in Agile projects where the team tries to simplify the problem domain by ignoring what could be considered to be Edge Cases and just addressing the simple scenarios.

The lens of a Soap Opera can be useful for reviewing the test suite of an application, going beyond the simplistic code coverage that is often reported from unit tests and component tests within a deployment pipeline:

  • How many tests (outside of unit tests) have a trivial sequence of setup, do action, check result, teardown, (or to use the Agile terms, how many tests are of the form Given, When, Then) rather than a connected sequence of transactions that represents a complex scenario?
  • For parameterized tests, how many are truly distinct tests rather than just equivalent values that exercise the exact same code path?
  • Are the System tests just repeating the Component level tests already implemented by the developers? (Typically the developer written tests may consider some possible failures, but miss others.)
  • Do the System tests touch multiple parts of the architecture as part of a test scenario? (This is where a Soap Opera mindset helps, making sure that the test addresses what happens at team and component boundaries.)
  • Do the System tests address the full scope of the system and cover all interacting systems? (A common failing is that of not testing that the data replicated to the associated data lake/swamp/warehouse accurately represents the system data.)

Overall, whenever evaluating a test, it is useful to know what risk it is addressing. Ideally any descriptive text included in the automated test case should include information about the motivation for the test, why it is important and the consequences of skipping the test. My take is that System tests should not just repeat what can already be done by unit and component level tests (e.g. view and controller tests in Phoenix Testing terminology); they have to go beyond those simple scenarios and probe the interfaces between the various components.

Basically all tests have to answer the economic question as to what is the value of this test case?

Dan North and CUPID

CUPID is Dan’s response to the SOLID principles and the back story behind them. Rather than another set of principles, Dan instead chose to focus on the properties of the software.

  • Composable – code that works well with others and does not have too many external dependencies that make it harder to use in another context, ideally with Intention Revealing terminology
  • Unix philosophy – related to the composability property, does one thing well and works well with others to build a larger solution
  • Predictable – or as the saying goes, “does what it says on the tin.” Dan calls this a generalization of Testability: code should behave as expected, be deterministic and observable
  • Idiomatic – naturally fits in with the way code is written in the implementation language, so for example in Python, rather than open, write and then close a text file, the natural way to write this is as below, where Python automatically handles the closing of the file

with open("file.txt", 'w') as textfile:
    textfile.write("hello world!")

  • Domain based – uses words and language in a way that would be familiar to practitioners in that domain.

Read Evaluate Print Loop - REPL for the win

When working with interpreted languages like Ruby, Elixir and Python it is great to use the REPL to discover the capabilities of the various variables that you are dealing with.

Ruby uses irb, Elixir uses iex, and to be different Python jumps directly into the interactive prompt using python. In each of these you have the full power of the language to use whatever libraries you have installed by just typing code at the relevant prompt. So at the python prompt you could do the following to see how Playwright interacts with the browser - using code borrowed from an earlier post.

from playwright.sync_api import sync_playwright
playwright = sync_playwright().start()
browser = playwright.chromium.launch()  # launches a headless Chromium by default

page = browser.new_page()
page.goto("https://www.selenium.dev/")
page.click("#main_navbar :text('blog')")  # Playwright text selector scoped to the navbar
title = page.title()
print(title)

The nice thing with each of these REPLs is that they allow you to see the type of an object and its associated attributes and methods, and hence get a better understanding of the library by trying things out and getting immediate success or failure - with an associated error message and stack trace, immediately followed by the REPL prompt for you to try again. Amusingly this even works for overly complex APIs like the Amazon Boto3 Python library that you need in order to interact with the AWS services.

A neat Playwright Codegen feature

When using the playwright codegen utility, it provides a nice preview of the available selector when hovering the mouse over any part of the web page. When trying it with the Phoenix LiveView default application, it can be started with the command

> playwright codegen http://localhost:4000/

and after navigating to the LiveDashboard, the selector for the refresh speed shows up in the chromium browser

Preview of selectors

It also does a good job of generating some sample code that can then be copied into a pytest test case for future reuse

# Click text=Ports
# with page.expect_navigation(url="http://localhost:4000/dashboard/ports"):
with page.expect_navigation():
    page.click("text=Ports")

# Select 2
page.select_option("select[name=\"refresh\"]", "2")

Note that expect_navigation will delay the script until the Ports page is displayed - although it is not waiting for a specific URL, unlike the commented-out part of the code.

Sometimes it's better to light a flamethrower than curse the darkness

Terry Pratchett’s Discworld series has many delightful sayings and ideas, and it is nice to hear that the “Sam Vimes ‘Boots’ theory of socio-economic unfairness” has inspired some action in Roundworld: a new price index, one that

will document the disappearance of the budget lines and the insidiously creeping prices of the most basic versions of essential items at the supermarket

For those who have not heard of the ‘Boots’ theory

“The reason that the rich were so rich, Vimes reasoned, was because they managed to spend less money,” wrote Pratchett. “Take boots, for example. He earned thirty-eight dollars a month plus allowances. A really good pair of leather boots cost fifty dollars. But an affordable pair of boots, which were sort of okay for a season or two and then leaked like hell when the cardboard gave out, cost about ten dollars. Those were the kind of boots Vimes always bought, and wore until the soles were so thin that he could tell where he was in Ankh-Morpork on a foggy night by the feel of the cobbles. But the thing was that good boots lasted for years and years. A man who could afford fifty dollars had a pair of boots that’d still be keeping his feet dry in ten years’ time, while a poor man who could only afford cheap boots would have spent a hundred dollars on boots in the same time and would still have wet feet.”

A legal framework for automated vehicles

Good news out of the UK: a proposal for a new Automated Vehicle Act that transfers liability from the person in the driving seat to the “Authorised Self-Driving Entity” (ASDE), aka the manufacturer. The person in the driving seat becomes the “user-in-charge” (UIC), responsible for the condition of the vehicle, while the ASDE is responsible for “the way the vehicle drives, ranging from dangerous or careless driving, to exceeding the speed limit or running a red light”.

Proposal also covers “no user-in-charge” (NUIC) where any occupants are merely passengers. “Responsibilities for overseeing the journey will be undertaken by an organisation, a licensed NUIC operator.”

A key new part of this idea is the prevention of misleading marketing

The distinction between driver assistance and self-driving is crucial. Yet many drivers are currently confused about where the boundary lies. This can be dangerous. This problem is aggravated if marketing gives drivers the misleading impression that they do not need to monitor the road while driving - even though the technology is not good enough to be self-driving.

An ASDE is the vehicle manufacturer or software developer who puts an AV forward for authorisation. Our proposals provide some flexibility over the identity of the ASDE: it may be a vehicle manufacturer, or a software developer, or a partnership between the two. However, the ASDE must show that it was closely involved in assessing the safety of the vehicle. It must also be of good repute and have sufficient funds to respond to regulatory action (including organising a recall).

The onus will be on the ASDE to show that the vehicle meets the tests for authorisation. As a minimum, the ASDE would be expected to present evidence of approval, a safety case and an equality impact assessment.

Obviously the devil will be in the details, but this is a massive change in the way that software is covered by the law. Most software falls under the category whereby the developers basically disclaim all responsibility for the operation of the software, but this proposed framework changes that. Even if the vehicle requests a handover to the person in the driving seat, the ASDE remains responsible if the Automated Driving System caused the issue, their example being

While in self-driving mode, an automated vehicle turns into a one-way street in the wrong direction. The user-in-charge takes over but is unable to avoid a collision. Alternatively, no collision takes place, but in the moment the user-in-charge takes over, they are driving in the wrong direction and may be guilty of an offence on that basis.