Introduction

In the era of machine learning, system behaviour is no longer transparent and is only partly debuggable. More and more sensors and components rely on logic derived from machine learning (ML) processes that are trained on enormous amounts of data. These models are widely used in the automotive industry, especially for ADAS and autonomous driving. From a safety point of view, there are many concerns about such systems and their risk evaluation. There is no easy way to determine the precise failure rate and acceptance times of ML-based systems. Perhaps the failure rate is not as critical as a wrong decision made within a unit’s normal operation. For SAE Level 3–5 systems, ML is used for sensing, mapping, fusion and decision-making algorithms. This means that if one of the algorithms makes a wrong recognition or decision, it can lead to undesired behaviour that may cause a dangerous road event. This brief article reviews recent trends in validation processes for highly automated and autonomous vehicles.

Current functional safety standards were written for deterministic systems and are limited to hazard and risk analysis and to component failures that may lead to a system or item malfunction. The ISO 26262 standard does not describe how to prevent a wrong decision of the function itself.

The second edition of the ISO 26262 standard is due this year (2018), and there is considerable debate about ML and cybersecurity (the latter will probably be covered by a separate standard). Before the new edition is released, let us take a closer look at the article “An Analysis of ISO 26262: Using Machine Learning Safely in Automotive Software” [1], published by researchers from the University of Waterloo, who address this matter and propose new processes that may help to control malfunction risk in such systems.

Machine Learning – where is it used in ADAS and AD products?

In recent years, ML and its powerful models, like deep neural networks (DNNs), have been widely used in a number of industries. Today, we have vehicles on the road whose braking, accelerating (adaptive cruise control, automatic emergency braking and intelligent traffic sign recognition) and steering decisions (lane keeping assistance and traffic jam assistance) are based on sensing, the logic of which has been developed via ML. The driver is still responsible and can override the system actuation – laterally or longitudinally.

The sensor fusion and decision algorithms, whether for acceleration, braking or steering, are still deterministic and model based, but this will change in the next generation of assistance systems (SAE Level 3).

ML steps can be divided into four independent categories:

sensing (e.g. image recognition by camera)
mapping (some map definitions are based on crowd-sourced data that position the vehicle relative to detected reference objects, e.g. traffic lights, commercial posts, etc.)
processing (fusion algorithms for combining radar, lidar, map, GPS, camera and V2X data)
decisions (autonomous driving algorithms are trained on real-world driving patterns and a road book)

Machine learning and its main problems for safety

There are several characteristics of ML that can impact safety or safety assessment:

Non-transparency. All types of ML models contain knowledge in an encoded form, but this encoding is harder for humans to interpret in some types than in others. Neural network models are considered non-transparent, and significant research effort has been devoted to making them more transparent. [1] For example, a sensing system can define various road edges even in an image where structure and colour are almost identical, as it will compute the entire scene using ML. To reduce false positives, most OEMs include mapping and positioning to assure high confidence (e.g. Intel and Mobileye’s REM, whose positioning is camera based). This non-transparency is an obstacle to safety assurance and lowers confidence that the model is operating as intended. Another problem of non-transparency arises when one has to explain the root causes of accidents.

Error rate. An ML model typically does not operate perfectly and exhibits an error rate. Thus, the absolute “correctness” of an ML component, even with respect to test data, is seldom achieved, and it must be assumed that it will periodically fail. Furthermore, many detection systems have debatable confidence levels. For example, lane detection systems may not detect washed-out lanes or may make wrong detections with tar lanes (especially in dense city areas). Additionally, even if the estimate of the error rate is accurate, it may not reflect the error rate that the system actually experiences while in operation after a finite set of inputs because the true error rate is based on an infinite set of samples. [1]
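To make the gap between an estimated and a true error rate concrete, the following minimal sketch computes a Wilson score interval for an error rate measured on a finite test set. It is an illustration only, not a method taken from [1]: the function name, the 95% level and the sample figures are assumptions, and the interval is only meaningful if the test inputs are independent samples from the same distribution the system meets in operation, which is exactly what cannot be guaranteed.

```python
import math


def wilson_interval(errors: int, samples: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial error rate (z = 1.96 is roughly 95%)."""
    if samples <= 0 or not 0 <= errors <= samples:
        raise ValueError("need samples > 0 and 0 <= errors <= samples")
    p_hat = errors / samples                      # observed error rate
    denom = 1.0 + z * z / samples
    centre = (p_hat + z * z / (2 * samples)) / denom
    half = z * math.sqrt(
        p_hat * (1 - p_hat) / samples + z * z / (4 * samples ** 2)
    ) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Hypothetical test result: 10 misdetections in 1,000 labelled frames.
    low, high = wilson_interval(errors=10, samples=1000)
    print(f"observed 1.00%, plausible range {low:.2%} to {high:.2%}")
```

With these assumed figures, an observed error rate of 1% is compatible with a true rate anywhere from roughly 0.5% to 1.8%, so the measured value alone says little about rare failures.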

Training based. ML models are created by training them on a set of data (traffic signs, lanes, vehicles and people). These data are labelled using a subset of possible inputs that could be encountered operationally. Therefore, the training set is necessarily incomplete, and there is no guarantee that it is even representative of the space of possible inputs. In addition, learning may overfit a model by capturing details incidental to the training set, rather than general to all inputs. Another factor is that, even if the training set is representative, it may under-represent the safety-critical cases because these are often rarer in the input space. [1]

Instability. DNN models, which are among the more powerful ML models, are typically trained with optimization algorithms that may converge to different local optima; consequently, one may get different results with the same training set. This characteristic makes it difficult to debug models or reuse parts of previous safety assessments.
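The effect can be reproduced on a toy problem. The sketch below is a stand-in, not a real DNN: it runs plain gradient descent on an assumed one-dimensional non-convex function with two local minima, and only the random starting point changes between runs. The function, learning rate and step count are all illustrative assumptions.

```python
import random


def loss(x: float) -> float:
    # Toy non-convex "loss" with two local minima; a stand-in for a DNN loss surface.
    return x**4 - 3 * x**2 + x


def grad(x: float) -> float:
    # Analytical derivative of the toy loss.
    return 4 * x**3 - 6 * x + 1


def descend(x0: float, lr: float = 0.01, steps: int = 5000) -> float:
    """Plain gradient descent from the starting point x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def train(seed: int) -> float:
    """Same 'training data' (the loss), different random initialisation."""
    x0 = random.Random(seed).uniform(-2.0, 2.0)
    return descend(x0)


if __name__ == "__main__":
    for seed in range(5):
        x = train(seed)
        print(f"seed={seed}: converged to x={x:.3f}, loss={loss(x):.3f}")
```

Runs that start left of the central hump settle near x = -1.30, while runs that start right of it settle near x = 1.13 with a noticeably worse loss, although nothing about the "data" has changed.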

Having laid these characteristics out, one can define potential impact areas in the current ISO 26262 standard.

ISO 26262 impacts and recommendations

Researchers from the University of Waterloo in Canada listed the potential impacts of ML on the current standard and made some recommendations for improvement:

HARA. For ML-based functions, it is more difficult to estimate risk and controllability. Some researchers believe that the extended use of assisted driving can create behavioural changes, meaning that the driver’s assumed controllability may be delayed or absent, for example if the vehicle tracks the wrong lane marking and smoothly leaves the desired lane into opposing traffic.

Recommendations for ISO 26262: The definition of hazards should be broadened to include harm potentially caused by complex behavioural interactions between humans and the vehicle that are not due to a system malfunction.
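To see why the controllability estimate matters so much in HARA, the sketch below applies the widely used sum shortcut for the ASIL determination table of ISO 26262-3 (severity S1 to S3, exposure E1 to E4, controllability C1 to C3). The shortcut is a mnemonic and the standard’s own table remains authoritative; the function name and the example hazard rating are assumptions made for illustration.

```python
def asil(severity: int, exposure: int, controllability: int) -> str:
    """ASIL from classes S1-S3, E1-E4 and C1-C3 using the sum shortcut."""
    if (
        severity not in (1, 2, 3)
        or exposure not in (1, 2, 3, 4)
        or controllability not in (1, 2, 3)
    ):
        raise ValueError("expected S in 1..3, E in 1..4, C in 1..3")
    total = severity + exposure + controllability
    # A sum of 10 maps to ASIL D, 9 to C, 8 to B, 7 to A; anything lower is QM.
    return {10: "D", 9: "C", 8: "B", 7: "A"}.get(total, "QM")


if __name__ == "__main__":
    # Hypothetical hazard: unintended lane departure into oncoming traffic,
    # rated S3 and E4 purely for illustration.
    for c in (1, 2, 3):
        print(f"S3/E4/C{c} -> ASIL {asil(3, 4, c)}")
```

For this assumed hazard, moving from "simply controllable" (C1) to "difficult to control" (C3) raises the result from ASIL B to ASIL D, so an over-optimistic assumption about driver attention directly lowers the required integrity level.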

FMEA. ML faults have their own characteristics. Many fault types and failure modes have been catalogued for neural networks, but they have to be very carefully revisited.

Recommendations for ISO 26262: It is recommended to use fault detection tools and techniques that take into account the unique features of neural networks.

Level of ML usage. As mentioned in the introduction, ML is currently widely used for sensing; in future, however, ML could be used to implement an entire software system, including its architecture, using an end-to-end approach. For example, Bojarski et al. [4] trained a DNN to make appropriate steering commands directly from raw sensor data, side-stepping typical AV architectural components such as lane detection, path planning, etc. An end-to-end approach deeply challenges the assumptions underlying ISO 26262. Another challenge with an end-to-end approach is that, in some cases, the size of the training set needs to be exponentially larger than when a programmed architecture is used. [5]

Recommendations for ISO 26262: Although using an end-to-end approach has shown some recent successes with autonomous driving (e.g. Bojarski et al. [4]), researchers recommend that an end-to-end use of ML should not be encouraged by ISO 26262, due to its incompatibility with the assumptions about stable hierarchical architectures of components. [1]

Required software techniques. Part 6 of ISO 26262 deals with product development at the software level and specifies 75 software development techniques. After evaluating all of these techniques for applicability to ML assessments, it turned out that only 30% of the highly recommended techniques (++) could be used without any adaptation in ML software assessments for ASIL C and D. More than 40% were completely inapplicable, while approximately 20% could be used with adaptation to ML.

Recommendations for ISO 26262: Many techniques are specifically biased toward the assumption that code is implemented using an imperative programming language. In order to remove this bias, it is recommended that the requirements be expressed in terms of the intent and maturity of the techniques, rather than their specific details. [1]

What does Intel’s Mobileye have to say?

As one of the leading suppliers of vision sensors, road mapping systems and potentially autonomous driving decision-making algorithms, Mobileye, which was acquired by Intel, has its own safety strategy. In a recent speech at CES 2018, Professor Amnon Shashua (Mobileye’s senior vice president) spoke about safety in its EyeQ generation. Professor Shashua referred to the paper “On a Formal Model of Safe and Scalable Self-Driving Cars” [2], which he co-authored. The paper claims that a model-based approach to safety is required but that the existing “functional safety” and ASIL requirements in the automotive industry are not designed to cope with multi-agent environments.

Sensing and fusion maturity

During his speech, Professor Shashua made a clear distinction between system malfunction (i.e. failure that is recognized) and normal operation (where false-positive detections can happen). One of the factors reducing sensing failure is Mobileye’s REM technology, which delivers on-line crowd-sourced data that stabilize vehicles by ensuring that they follow paths even when there are no lanes or lanes are washed out or misleading. This is definitely a risk-reducing factor, as the vehicle will receive data (reference objects) that improve its self-positioning to within a few centimetres.

On top of that, it was said that the detection capability is quite mature: the system is currently in its fourth generation and has been continuously improved with training data. It is not only image texture that is taken into account; a holistic path prediction model helps the algorithm to define actual objects, lanes and edges.

One can roughly analyse how accurate the entire system is, or what happens if LTE/GSM signal corruption forces the system to work in a limited mode; the latter should definitely be within the scope of FMEA.

Decision-making using a responsibility-sensitive safety (RSS) model

Mobileye’s idea is not to base its decision-making algorithms on the goal of being accident-free, as some accidents may be impossible to avoid (e.g. when a vehicle is hit by another and has no room to escape). Instead, its approach is to avoid blame for causing an accident: a vehicle’s decision-making should compute only those paths and behaviours that cannot make it responsible for an accident. On top of that, the algorithm shall respond to the mistakes of other drivers to secure the highest safety level for its vehicle’s occupants.

Safety goal: Self-driving vehicles shall never be responsible for accidents:

They shall not cause them
They shall properly react to provide the highest safety level when put in a pre-accident situation caused by another driver

To achieve this, an RSS model has been formulated. This model formalizes the “common sense” of the dilemma “who is responsible for an accident?” in four respects:

Keep a safe distance from the car in front of you so that if it brakes abruptly, you will be able to stop in time
Keep a safe distance from cars on your side, and when performing lateral manoeuvres and cutting in to another car’s trajectory, you must leave the other car enough space to respond
You should respect “right-of-way” rules, but “right-of-way” is given, not taken
Be cautious of occluded areas. For example, a child might be occluded behind a parked car [2]
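The first rule above is the one that the RSS paper [2] turns most directly into a formula: the minimum safe longitudinal distance between two cars driving in the same direction. The sketch below implements that formula as a minimal illustration; the function and parameter names are chosen here for readability, and the numeric values in the usage note are assumptions, not values prescribed by Mobileye.

```python
def rss_safe_longitudinal_distance(
    v_rear: float,
    v_front: float,
    response_time: float,
    a_max_accel: float,
    a_min_brake: float,
    a_max_brake: float,
) -> float:
    """Minimum safe gap in metres between a rear and a front car (same direction).

    Worst case assumed by RSS: the front car brakes at a_max_brake, while the
    rear car accelerates at a_max_accel during its response time and only then
    brakes at a_min_brake. Speeds are in m/s, accelerations in m/s^2.
    """
    v_after_response = v_rear + response_time * a_max_accel
    rear_travel = (
        v_rear * response_time
        + 0.5 * a_max_accel * response_time**2
        + v_after_response**2 / (2 * a_min_brake)
    )
    front_travel = v_front**2 / (2 * a_max_brake)
    # A negative difference means any gap is safe, so the result is floored at zero.
    return max(0.0, rear_travel - front_travel)


if __name__ == "__main__":
    # Illustrative values only: both cars at 20 m/s (72 km/h), 1 s response time.
    gap = rss_safe_longitudinal_distance(
        v_rear=20.0, v_front=20.0, response_time=1.0,
        a_max_accel=2.0, a_min_brake=4.0, a_max_brake=8.0,
    )
    print(f"minimum safe gap: {gap:.1f} m")
```

With these assumed parameters the required gap is 56.5 m. If the actual gap is smaller, the rear car is in a dangerous situation and must apply the proper response, which here means braking at no less than a_min_brake.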

The above model is verified with past accident data: one way to validate such a model is to reproduce (in simulation) scenarios of accidents that happened in real life. This is partially what Mobileye did – it took NHTSA crash data from over six million accidents, which were grouped into 37 scenarios that covered 99.4% of the accidents.

The real challenge here is to somehow describe human laws with mathematical formulae and to define very subjective driving patterns (e.g. a dangerous cut-in or unsafely pulling into traffic). Therefore, one has to define entire driving laws mathematically, defining dangerous manoeuvres and proper or safe responses. Based on such definitions, one can create system restrictions that will not allow AVs to initiate dangerous situations and will always follow the proper response.

The only obvious downside of this approach is that AVs may get stuck in very dense traffic, for example when pulling out from a low-priority road.

Summary

Many on-demand transportation companies, like Lyft and Uber, are now developing and testing autonomous driving solutions. They are focused on various technological and human acceptance topics; however, without methods that keep the risk low (below 10^-9 fatalities per hour of operation), considering the social acceptance of the technology is pointless. The above enhancements to safety standards are certainly needed, but will they be effective in lowering the consequences of failures and wrong decisions below one in a billion (10^-9)?
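To give a feeling for how demanding the 10^-9 target is, the following back-of-the-envelope sketch estimates how many fatality-free hours of driving would be needed to support such a claim by testing alone. It assumes a constant-rate (Poisson) failure model and zero observed fatalities; the function name, the 95% confidence level and the fleet size are illustrative assumptions.

```python
import math


def failure_free_hours_needed(target_rate_per_hour: float, confidence: float = 0.95) -> float:
    """Hours with zero fatalities needed to claim rate < target at the given confidence.

    Assumes a constant-rate (Poisson) model: P(no event in T hours) = exp(-rate * T).
    Solving exp(-rate * T) = 1 - confidence for T gives the result.
    """
    if target_rate_per_hour <= 0 or not 0 < confidence < 1:
        raise ValueError("need target_rate_per_hour > 0 and 0 < confidence < 1")
    return -math.log(1.0 - confidence) / target_rate_per_hour


if __name__ == "__main__":
    hours = failure_free_hours_needed(1e-9)
    fleet_size = 10_000  # hypothetical fleet driving around the clock
    years = hours / (fleet_size * 24 * 365)
    print(f"{hours:.2e} failure-free hours, about {years:.0f} years for {fleet_size} vehicles")
```

Under these assumptions about 3 × 10^9 fatality-free hours are required, which is roughly 34 years of continuous driving for a fleet of 10,000 vehicles. This is why purely statistical validation is impractical and why formal models and revised standards are being discussed.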

References

[1] R. Salay, R. Queiroz, K. Czarnecki, “An Analysis of ISO 26262: Using Machine Learning Safely in Automotive Software”, arXiv:1709.02435v1 [cs.AI] 7 Sep 2017.

[2] S. Shalev-Shwartz, S. Shammah, A. Shashua, “On a Formal Model of Safe and Scalable Self-Driving Cars”, arXiv:1708.06374v5 [cs.RO] 15 Mar 2018.

[3] ISO 26262: Road Vehicles – Functional Safety, International Organization for Standardization, 2011, 1st edition.

[4] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang et al., “End to End Learning for Self-Driving Cars,” preprint arXiv:1604.07316, 2016.

[5] S. Shalev-Shwartz and A. Shashua, “On the Sample Complexity of End-to-End Training vs. Semantic Abstraction Training,” arXiv preprint arXiv:1604.06915, 2016.

Glossary

AD – Autonomous Driving

ASIL – Automotive Safety Integrity Level

AV – Autonomous Vehicle

ADAS – Advanced Driver Assistance Systems

FMEA – Failure Mode and Effects Analysis

GPS – Global Positioning System

HARA – Hazard Analysis and Risk Assessment

NHTSA – National Highway Traffic Safety Administration

OEM – Original Equipment Manufacturer

REM – Road Experience Management
