Software, The Invisible Design
Why put software and invisible design in the same phrase? Do we mean that lines of code can’t be read or seen? Of course not! What we mean is that the lines of code, although visible, are understandable only to the person who wrote them. For anyone else, the program is obscure. Colleagues in a peer review will need about as much time to critique the code usefully as it took to write it in the first place. The functioning of the software product is thus somewhat invisible.
Invisible is fine as long as it works. But sometimes it stops working for unknown reasons. So we try, here and there, to make the code understandable, and local patches make it work again. Those patches, however, can create interactions with other sections of the program, which then create new problems later on: problems that we await with legitimate anxiety.
So, under time pressure, debugging continues when the product is already with the customer. And apparently that’s fine as long as the customer is captive … as long as the customer is willing to pay for our quality problems. Clients obligingly pay to report the bugs they suffer from (1) and wait patiently for our solutions, so long as there is no competition and the problems are not life-threatening.
Life-threatening risks and aggressive competition are the daily reality of some businesses, such as automotive manufacturing, in which margins are tiny, volumes huge and risks very high. In such conditions, mistakes are simply unacceptable, because they can kill people and bring down the whole company at once.
For decades the automotive industry has used the tools and processes of failure mode avoidance for the design and production of hardware, and it has been successful. For the development team, this mainly means discussing the engineering blueprints and critiquing the design, guided by the FMEA (Failure Mode and Effects Analysis).
Before the FMEA workshop, preparation is done with other tools, such as a study of the variation modes of each key product input, work made possible by a fine analysis of the product’s subsystems and elementary parts or components. We recognize that a hardware product always breaks down because of a variation problem in one of its components. In effect, as long as a component stays within the nominal values foreseen during design, it works. It is when it deviates from them that problems emerge.
So we rack our brains over the blueprints, proposing this or that improvement in order to reduce risk, by:
- reducing the impact of the component on the higher-level system for which it provides functionality, and/or reducing the failure risk of that very component (2);
- anticipating and putting in place controls that reduce or completely remove the negative impact on the customer, should a problem nevertheless emerge.
Finally, we verify the impact of the improvement changes on the rest of the product by updating the other failure mode avoidance tools, primarily the HoQ (House of Quality) chain but also the functional mapping of intended and unintended functions.
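The levers described above (impact, failure risk, controls) correspond to what FMEA practice commonly scores as Severity, Occurrence and Detection. As a minimal sketch, assuming the conventional 1 to 10 rating scales and the classic Risk Priority Number (neither is spelled out in this article), and using a purely hypothetical failure mode with made-up ratings, the effect of an improvement can be compared as follows:

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One FMEA line; all ratings use an assumed 1-10 scale (10 = worst)."""
    name: str
    severity: int    # impact on the higher-level system and the customer
    occurrence: int  # likelihood that the component fails
    detection: int   # 1 = controls almost surely catch it, 10 = they never do

    def rpn(self) -> int:
        """Risk Priority Number: product of the three ratings."""
        return self.severity * self.occurrence * self.detection


# Hypothetical failure mode, rated before and after the improvement actions.
before = FailureMode("connector corrosion", severity=8, occurrence=5, detection=6)
after = FailureMode("connector corrosion", severity=8, occurrence=2, detection=3)

print(before.rpn(), after.rpn())
```

In this illustration the impact is left unchanged while the failure risk and the controls are improved, so the score drops without any claim that the failure would hurt less if it happened.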
The systematic approach is worthwhile if thoroughly applied, using for example Quality Companion from Minitab. Clients appreciate the robustness of the product, from which they can expect the performance they have paid for. Products tolerate customer operating errors because most of these errors have been foreseen and consequently rendered harmless. The supplier benefits from a solid client base thanks to a reputation for reliability.
Nevertheless, in today’s digital age, the automotive industry recognizes, like many others, that its technology is being transformed by the ever greater and relentless integration of electronics. All jobs are changing because they are confronted with a new type of challenge … the invisible design.
The failure mode avoidance tools mentioned above must now take software into account, and the quality engineers originally trained to critique blueprints need to evolve to integrate this new characteristic. After all, how can we imagine that the very quality engineers in charge of verifying product reliability spend most of their time pulling their hair out with the hardware development team to design the best possible product, yet at the same time accept on blind faith the black boxes that are the software objects (3) delivered by the software teams? It’s not reasonable.
So, since these changes are unavoidable, what can be done?
Let’s start with the FMEA, the real key to quality assurance. Is it applicable? We might believe so, but we quickly realize that the tool is not well adapted as it stands. For example, the Occurrence (2) can’t be evaluated by estimating what can scrape, move, break, heat … in the design. The problem is no longer expressed in physical terms. Software is a set of lines of code, and thus a set of logical formulas. Unless the code itself is altered by an all-or-nothing change, for example an electromagnetic disturbance that directly modifies it, it will necessarily always deliver the same result when subjected to the same inputs. No quantitative variation (also called variable or continuous) can originate from the software itself.
So where do the breakdowns come from? Because even if the code cannot suffer any continuous type of variation, breakdowns exist nonetheless; anyone who has worked with a computer knows that.
In fact, breakdowns come from cases that arise outside standard use of the product and were not anticipated during development. The inputs are not what was expected and the program stalls. We call this a qualitative type of variation (also called discrete or attribute).
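A small illustration may help here. The sketch below uses a hypothetical fuel-consumption routine (the function names and the scenario are assumptions made for the example, not taken from any real product). It shows both halves of the argument: identical inputs always give the identical result, yet one unanticipated input case is enough to make the program fail.

```python
from typing import Optional


def fuel_per_100km(litres: float, km: float) -> float:
    """Hypothetical routine: consumption in litres per 100 km."""
    return 100.0 * litres / km


# Same inputs, same result: the code itself introduces no variation.
assert fuel_per_100km(6.0, 100.0) == fuel_per_100km(6.0, 100.0)

# A case outside the anticipated use: the trip has not started, so km == 0.
try:
    fuel_per_100km(0.0, 0.0)
    outcome = "ok"
except ZeroDivisionError:
    outcome = "crash"


def fuel_per_100km_guarded(litres: float, km: float) -> Optional[float]:
    """Same routine with the unforeseen case anticipated and made harmless."""
    if km <= 0:
        return None  # no distance travelled yet: nothing meaningful to report
    return 100.0 * litres / km
```

The failure is all-or-nothing: there is no drift towards the fault, only a category of input that nobody listed during development. The guarded version is one possible way of rendering that case harmless once it has been discovered.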
To discover these cases (and consequently resolve them), we need to make use of TRIZ (4) in the exploratory phase of testing. Thanks to the systematic creativity of the TRIZ process, this exploratory work enables a rigorous and comprehensive capture of unforeseen situations.
Moreover, the software object is assessed with a set of performance and reliability tests, such as ALT or HALT (Accelerated Life Testing / Highly …), covering its intended and unintended functions. Reliability modelling can then provide reliability values for use, among other things, as inputs to the FMEA Occurrence rating.
Finally, we can combine the estimated reliability with that of the other parts of the system to identify the weakest link and its reliability. This reliability forms the basis of the guaranteed product lifetime and, typically, when the product lifetime is longer than that of the competition, it is used as a selling point.
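As a rough sketch of this combination step, assume a series system (the product works only if every part works) with independent failures. Both are modelling assumptions not stated in this article, and the part names and reliability figures below are purely hypothetical:

```python
import math

# Hypothetical reliabilities over the target lifetime (probability of no failure).
reliability = {
    "sensor": 0.995,
    "control software": 0.98,
    "actuator": 0.99,
}

# Weakest link: the part with the lowest estimated reliability.
weakest = min(reliability, key=reliability.get)

# Series assumption: the system works only if every part works, and the
# parts fail independently, so the individual reliabilities multiply.
system_reliability = math.prod(reliability.values())

print(weakest, reliability[weakest], round(system_reliability, 4))
```

Under these assumptions the system can never be more reliable than its weakest part, which is why improving that part first tends to pay off most.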
We have talked in this article only about the failure mode avoidance strategy, but it is also important to know that it works hand in hand with the Agile (or Scrum) (5) method, which enables fast and regular delivery of functional sections of programs. Agile was inspired by Lean Manufacturing to make software development faster and closer to the customer’s needs (see just-in-time delivery). Over the past 15 years Agile has proven its efficiency, and the method is now being redeployed for product development in the manufacturing industry.
The combination of failure mode avoidance and Agile delivers not just reliability but also a tremendous advantage in rapid product development!
By Thierry Mariani – Juggling Paradigms Sàrl
Thierry is the Managing Director of Juggling Paradigms, serving as an operations consultant (Lean Six Sigma Master Black Belt & TRIZ practitioner) who is driven by excellence and focused on performance improvement that realizes targeted benefits: operating profit, revenue and robust, reliable products. With an MSc in Mechanical/Automotive Engineering, he is a process design management expert with 20 years of experience across Canada, Europe, Morocco, Ukraine, Russia & China, working across disciplines (e.g. Finance, Production, Customer Service, Engineering, Human Resources) for major global companies (e.g. Kraft Foods, Coca Cola, KBL, Areva, Lactalis, Continental, Husky, Sanofi, Delphi International, SKF, France Telecom, Parker, Foyer insurance, Bosch, Honeywell, Air Liquide, AMI Semiconductor, DuPont, Sagemcom, Goodyear, Sberbank …).
Some of his specialties include: Continuous Improvement Transformation: Lean Six Sigma DMAIC, TOC (Theory of Constraints); Agile Continuous Innovation: Design for Six Sigma, including Design for Reliability (Failure Modes Avoidance), Robustness (Reduced CTQ Variation Sensitivity), Manufacturability, Assembly & Test (speed with simplicity), among many more.