Good performance measurement focuses on global outcomes rather than local outputs. Software delivery performance can be measured with the following criteria:
– lead time
– deployment frequency
– time to restore service
– change fail rate
Lead time: the time it takes from receiving a request from a customer to that request being satisfied. Lead time is divided into two phases. The first phase, design and validation, is a highly variable process where hypotheses drive the design decisions. The second phase, delivery, is the time it takes for work to be implemented, tested, and deployed to production.
Shorter lead times allow for faster feedback loops and thus more frequent course corrections.
Delivery lead time is the time it takes for code committed to the main branch to be running in production or to arrive in the app store.
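As a minimal sketch of this definition, delivery lead time is the difference between two timestamps: when the change was committed to the main branch and when it started running in production. The function name and the sample timestamps are hypothetical; in practice the values would come from version control and the deployment pipeline.

```python
from datetime import datetime, timedelta


def delivery_lead_time(committed_at: datetime, deployed_at: datetime) -> timedelta:
    """Time from commit on the main branch to the change running in production."""
    if deployed_at < committed_at:
        raise ValueError("deployment cannot precede the commit")
    return deployed_at - committed_at


# Hypothetical timestamps for a single change.
committed = datetime(2024, 3, 4, 9, 30)
deployed = datetime(2024, 3, 4, 10, 15)

lead_time = delivery_lead_time(committed, deployed)
print(lead_time)  # 0:45:00
```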
The full feedback loop looks as follows:
customer request > design and validation > software development > commit to main branch > delivery lead time > deployment to production
Delivery lead times fall into the following categories: < 1 hour, < 1 day, 1 day - 1 week, 1 week - 1 month, 1 month - 6 months, > 6 months.
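The categories can be expressed as a simple lookup over upper bounds. This is only a sketch: the names are hypothetical, a month is approximated as 30 days, and a value that lands exactly on a boundary is assigned to the longer category. The text itself does not specify either choice.

```python
from datetime import timedelta

# Exclusive upper bound of each category, in ascending order.
# Assumption: 1 month = 30 days, 6 months = 180 days.
BUCKETS = [
    (timedelta(hours=1), "< 1 hour"),
    (timedelta(days=1), "< 1 day"),
    (timedelta(weeks=1), "1 day - 1 week"),
    (timedelta(days=30), "1 week - 1 month"),
    (timedelta(days=180), "1 month - 6 months"),
]


def lead_time_category(lead_time: timedelta) -> str:
    """Return the first category whose upper bound exceeds the lead time."""
    for upper_bound, label in BUCKETS:
        if lead_time < upper_bound:
            return label
    return "> 6 months"


print(lead_time_category(timedelta(minutes=45)))  # < 1 hour
print(lead_time_category(timedelta(days=3)))      # 1 day - 1 week
```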
Reducing batch size is one of the foundations of the lean paradigm. Since software development doesn't have material inventory, batch size is measured by proxy as the frequency of deployments. By reducing the batch size we can reduce cycle time and variability in flow, and accelerate the feedback cycle.
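Deployment frequency can be sketched as the number of production deployments divided by the length of the observation period. The function name and the sample data are hypothetical, and reporting the result per day is an assumption; per week or per month works the same way.

```python
from datetime import date


def deployments_per_day(deploy_dates: list[date], period_days: int) -> float:
    """Average number of production deployments per day over the period."""
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    return len(deploy_dates) / period_days


# Hypothetical deployment log for one week: four deployments.
deploys = [date(2024, 3, 4), date(2024, 3, 4), date(2024, 3, 5), date(2024, 3, 7)]

frequency = deployments_per_day(deploys, period_days=7)
print(f"{frequency:.2f} deployments per day")  # 0.57 deployments per day
```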
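Reliability is traditionally measured as time between failures. For software services, where failure is inevitable, stability is better measured as time to restore service: it indicates how fast the system can be restored after a failure.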
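Time to restore service is commonly reported as a mean over incidents. The following is a minimal sketch, assuming each incident is recorded as a (started, restored) pair of timestamps; the function name and the sample incidents are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_restore(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average duration between the start of an incident and service restoration."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum((restored - started for started, restored in incidents), timedelta())
    return total / len(incidents)


# Hypothetical incident log: one 30-minute and one 90-minute outage.
incidents = [
    (datetime(2024, 3, 4, 10, 0), datetime(2024, 3, 4, 10, 30)),
    (datetime(2024, 3, 9, 22, 0), datetime(2024, 3, 9, 23, 30)),
]

mttr = mean_time_to_restore(incidents)
print(mttr)  # 1:00:00
```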
Change fail rate indicates how often changes to the system (e.g. software releases or configuration changes) cause a failure or degradation of the service, or require remediation.
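Expressed as a ratio, this is the number of changes that caused a failure, degraded the service, or required remediation, divided by the total number of changes. A small sketch with hypothetical names and counts:

```python
def change_fail_rate(total_changes: int, failed_changes: int) -> float:
    """Fraction of changes that caused a failure or required remediation."""
    if total_changes <= 0:
        raise ValueError("total_changes must be positive")
    if not 0 <= failed_changes <= total_changes:
        raise ValueError("failed_changes must be between 0 and total_changes")
    return failed_changes / total_changes


# Hypothetical month: 40 changes shipped, 3 of them needed a rollback or hotfix.
rate = change_fail_rate(total_changes=40, failed_changes=3)
print(f"{rate:.1%}")  # 7.5%
```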
The studies presented in the book indicate that there is no tradeoff between speed and stability.
High performers maintain or improve both tempo and stability over the years, while low performers stay at the same level or decline in both velocity and stability.
According to the studies, high performers:
– deploy to production on demand (multiple deploys per day)
– have a lead time for changes below 1h
– have a mean time to recover below 1h
– have a change failure rate below 15%
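The thresholds in this list can be combined into a single check. This is an illustrative sketch only: the `DeliveryMetrics` type and its field names are hypothetical, and "deploys on demand" is reduced to a boolean flag for simplicity.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DeliveryMetrics:
    deploys_on_demand: bool
    lead_time: timedelta
    time_to_restore: timedelta
    change_fail_rate: float  # fraction between 0.0 and 1.0


def is_high_performer(m: DeliveryMetrics) -> bool:
    """Check the four high-performer thresholds listed above."""
    return (
        m.deploys_on_demand
        and m.lead_time < timedelta(hours=1)
        and m.time_to_restore < timedelta(hours=1)
        and m.change_fail_rate < 0.15
    )


# Hypothetical team measurements.
team = DeliveryMetrics(
    deploys_on_demand=True,
    lead_time=timedelta(minutes=45),
    time_to_restore=timedelta(minutes=20),
    change_fail_rate=0.075,
)
print(is_high_performer(team))  # True
```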