Patrick Saint-Dizier

Detecting Incoherent Requirements in Large Documents

Experiments and User Evaluation


The different protagonists involved in requirement production are a source of mismatches, inconsistencies, incoherence and redundancies. Indeed, stakeholders, technical writers, managers, and then users and manufacturers, all play different roles and have different views on these requirements. These discrepancies are discussed in works on technical writing principles (Schriever, 1989), (Unwalla, 2004), (Weiss, 1991) and (Kuhn, 2014), where a large synthesis is proposed. In an attempt to overcome these problems, most industrial sectors have defined authoring recommendations (e.g. IEEE 830 and variants, ISO 29148), methods (IREB Handbook of Requirements Modelling (Cziharz et al., 2016) and Elicitation (Häußer et al., 2019)) and tools to elaborate, structure and write requirements of various types. These specify how correct requirements should be elaborated. For that purpose, they provide attribute elicitation techniques and discuss the iterative application of requirements processes within life cycles. The result is easier traceability, control and update.

Probably the best way to avoid incoherence in requirements is to precisely follow requirement modelling principles, such as those elaborated most notably in the IREB Handbook of Requirements Modelling (Cziharz et al., 2016). Atomic requirements can be compared to a large number of other requirements related to the same task, allowing potential incoherence or misconceptions to be detected. Although this approach works only for relatively simple, natural language formulations of requirements, it allows engineers to produce and integrate large sets of requirements with a good level of coherence a priori.

Our approach complements these modelling principles. It deals with situations where engineers indeed follow these principles but nevertheless make errors which are more ‘local’, for example in the specification of values. These values may be, for example, numbers, colours, units, but also linguistic markers with a heavy semantic content such as adverbs, prepositions or modals. These errors are not related to modelling misconceptions but rather to lack of attention or distractions, for example due to work overload, domain evolution (e.g. norms, types of equipment) paired with lack of traceability, or insufficient validation steps when several authors are involved. Incoherent requirements turn out to have very costly consequences at production stages: up to 80% of the initial costs. It is therefore crucial to detect them as early as possible.

Incoherence is not easy to define. It has several facets and it is not only a logical problem. It basically consists of two or more requirements, or groups of requirements, which cannot co-exist in the same context without introducing negative consequences, whatever these may be. These requirements are in general not adjacent in a document, otherwise they could easily be fixed. They generally appear in different sections or chapters of a document or even in different documents. A very simple example is:
TCAS (traffic collision avoidance system) alarm message must appear in red on the FMGC (flight management guidance computer) screen,
and later:
TCAS alarm message must appear in purple on the FMGC screen,
where there is a colour specification mismatch. More subtle is the mismatch between the two following requirements where modelling principles should detect the incoherence if they are sufficiently accurate:
The system S must be stopped every 24 hours for maintenance,
and later in the document:
The update of the database via the system S must not be interrupted.

Incoherence between two requirements may be partial: they may include divergences without being completely logically opposed. Next, incoherence may be visible linguistically, where words used induce incoherence, in about 40% of the cases according to our evaluation, or, instead, incoherence may require domain knowledge and inferences to be detected and characterized. We focus in this article on incoherence which can be detected from a linguistic and general semantic analysis: mining them is simpler and probably re-usable over domains. Finally, we consider here only pairs of individual requirements. Identifying incoherence among two or more groups of requirements is more challenging.

1 What is incoherence in requirements? Definition issues and state of the art

1.1 Some definitions and terminological distinctions

Let us consider four main properties that a set of requirements must meet, from the most external to the most conceptual. We have:

Cohesion (maximize uniformity of words and constructions): surface form and readability
Completeness (no omissions in a situation): modelling of all situations
Consistency (control data and trace updates): traceability and update
Coherence (no contradictions at any stage): deep level, no contradictions

Cohesion deals with the characterization of the uniformity of a text, from its typography to semantic aspects such as the choice of words. A requirement document, including its associated structures (summary, definitions, introduction, etc.) must have a uniform organization and presentation to facilitate its understanding and update. It must show very regular syntactic structures. Lexical variation (e.g. use of synonyms) must be constrained. Its semantic and pragmatic content must be controlled to guarantee good uniformity, in particular pre-requisites and implicit elements must be stable to avoid misconceptions.

Completeness means that a set of requirements must fully cover a situation. A simple, but frequent, situation is the development of cases where each case requires for example a different action. Cases must not overlap to guarantee unique solutions and no case must be forgotten. A typical example is the flap level extension before landing, e.g.:

Speed between 250 and 220 kts: flap 5 degrees
Speed between 220 and 200 kts: flap 15 degrees
Speed between 200 and 160 kts: flap 25 degrees
Speed between 160 and 130 kts: flaps full

This case structure is not ambiguous; it satisfies the completeness criterion since the intervals which are given do not overlap and cover the whole spectrum of relevant speeds. The description is also comprehensive since all flap positions have been considered. In requirement authoring, completeness may take many forms.
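A check of this kind is easy to automate. The sketch below is only an illustration under stated assumptions: the function name `check_cases` and the tuple representation of a case are hypothetical, and intervals are read as half-open so that a bound shared by two adjacent cases (such as 220 kts) is not reported as an overlap.

```python
from typing import List, Tuple

Case = Tuple[float, float, str]  # (low speed, high speed, action)

def check_cases(cases: List[Case], low: float, high: float) -> List[str]:
    """Report gaps and overlaps in a case enumeration over [low, high].

    Assumption: each interval is half-open, [lo, hi), so a shared bound
    between two adjacent cases does not count as an overlap.
    """
    problems = []
    cursor = low  # upper end of the range covered so far
    for lo, hi, action in sorted(cases, key=lambda c: c[0]):
        if lo > cursor:
            problems.append(f"gap between {cursor} and {lo}")
        elif lo < cursor:
            problems.append(f"overlap between {lo} and {min(cursor, hi)} ({action})")
        cursor = max(cursor, hi)
    if cursor < high:
        problems.append(f"gap between {cursor} and {high}")
    return problems

# The flap extension cases given above, speeds in kts.
flaps = [
    (220, 250, "flap 5 degrees"),
    (200, 220, "flap 15 degrees"),
    (160, 200, "flap 25 degrees"),
    (130, 160, "flaps full"),
]
print(check_cases(flaps, 130, 250))  # prints [] : no gap, no overlap
```

Removing the second case from the list would make the function report a gap between 200 and 220.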

Inconsistency is a complex notion. It involves requirements which deviate or diverge from a standard, in particular from the semantic or pragmatic points of view. Inconsistency is frequently dynamic; it accounts, for example, for the fact that stakeholders may change their mind when producing requirements. Similarly, equipment may evolve and their update is not fully guaranteed, even with good traceability. Inconsistency is therefore related to human behaviour, whereas coherence is static. Consistency may sometimes be associated with cohesion to characterize how good, and how stable, a text is from the points of view of language and general understanding.

Coherence is a logical notion whose scope extends to a whole set of documents. In epistemology, coherence is part of a theory of truth. A document or a knowledge base is coherent if there is no logical contradiction between all the propositions it contains. A coherent set of requirements does not guarantee that this description is sound, correct and comprehensive. However, we can observe in requirement authoring that coherence is not a Boolean notion: it is possible to observe degrees of incoherence, where lower levels can be allowed.

1.2 Aspects of coherence

Coherence analysis has not been very much investigated in the area of requirement production. Besides modelling principles, authoring norms that limit the linguistic complexity of requirements in terms of structure and lexicon are also a means to avoid incoherence. These considerations are developed in several articles, among which the following, published in the IREB magazine:

Several road-maps on requirement elicitation and writing, e.g. (Wyner, 2010), show the importance of having consistent and complete sets of requirements, and rank it as a priority. On the research side, projects such as those developed by (Kloetzer et al., 2013) aim at finding contradictions between lexico-syntactic patterns in Japanese, including spatial and causal relations. In (de Marneffe et al., 2008), the limits of textual entailment as a method to detect inconsistencies are shown and a corpus is developed from which a typology of contradictions is constructed. The need for very fine granularity in the data is advocated to avoid errors, which is not possible for large sets of requirements.

1.3 Some general purpose research directions to mine incoherent requirements

Incoherence analysis in texts in general, and in requirements in particular, is at a very early stage of development. One of the reasons is that incoherence analysis is not a surface problem: it requires accurate domain knowledge, reasoning and analysis strategies. Artificial Intelligence foundations and techniques can be found in (Marquis et al., 2018), for example.

When dealing with the coherence problem, three main directions can be foreseen:

1.4 Working methodology

Our methodology to analyse incoherence is to:

(1) observe the problem on corpora that contain incoherent requirements,
(2) organize and categorize the errors,
(3) develop a model and organize the linguistic resources which are needed,
(4) implement a prototype,
(5) evaluate the results, the user’s satisfaction, and define future improvements.

These points are described in some detail in the sections below.

2 Preliminaries: constructing a corpus of incoherent requirements

Our analysis of incoherence is based on a corpus of requirements coming from five companies in five different critical industrial sectors: energy, aeronautics, insurance, transportation regulations and telecommunications. To guarantee a certain generality to our results, the main features considered to validate our corpus are:

(1) requirements correspond to various professional activities, and have been authored by technical writers in different industrial sectors, over a relatively long time span (between one and two years),
(2) requirements correspond to different conceptual levels, from abstract rules to technical specifications,
(3) requirements have been validated and are judged to be in a relatively ‘final’ state,
(4) requirements follow various kinds of authoring norms imposed by companies, including predefined patterns (boilerplates),
(5) the documents from which requirements are extracted are well-structured and of different levels of language and conceptual complexity.

A total of 7 documents have been analysed, partly manually, using the metrics advocated below, searching for incoherent requirement pairs. Corpora characteristics can be summarized as follows:

Average length of specification documents: 150 - 200 pages
Number of requirements per document: 3500 to 4000
Total number of requirements: 27500
Total number of pairs of incoherent requirements: 128

When searching for incoherent requirements, we noted that:

Since requirements should a priori follow strict authoring guidelines (no synonyms, limited syntactic forms), dealing with the same precise topic means that two requirements which are potentially incoherent should:

To automatically detect such pairs, we have defined two metrics (Saint-Dizier, 2018):

It is important to note that our system is aimed at generating incoherence warnings: incoherence is probable but it must be confirmed by a human expert. Then, the revision of these requirements, and possibly others as a consequence, is in the hands of requirement authors. The system cannot resolve them. There are also several severity levels which can help writers to organize their revisions.

3 A typology of incoherence based on linguistic criteria

The categorization presented below is defined empirically from our corpus and is based on simple linguistic considerations. The goal of this categorization is to organize and facilitate the definition of templates and associated lexical resources to automatically mine pairs of incoherent requirements in a large diversity of types of requirements. Templates must be sufficiently expressive and generic, but they should not over-recognize incoherent requirements.

The term incompatible, used below between two elements, means that these elements are not equivalent in the domain in which they operate. They are not synonyms or semantically equivalent. Our dissimilarity metric is based on a measure of the incompatibility between two elements.

3.1 Partial or total incompatibilities between expressions

In this category, eight closely related forms of incoherence are included. Categories are based on linguistic factors:

3.2 Incoherent events

In this category fall pairs of requirements that describe incoherent events, the detection of which often requires domain knowledge. In spite of this knowledge limitation, a number of differences appear at a ‘surface’ level and can be detected for the most part on a linguistic basis. A typical example is given in the introduction of this article. Another type of incoherence frequently found is:

The maximum 20 degree flap extension speed is 185 kts. vs.
Extend flaps to 20 degrees and then slow down to 185 kts.

Note that the second example is more a procedure than a requirement.

3.3 Terminological incoherence

In this category, requirements which largely overlap are considered. Their slight differences may be symptomatic of a partial inconsistency or of terminological variations which must be fixed. These cases are relatively frequent and typical of documents produced either by several authors or over a long time span (where for example equipment names, attributes or properties may have changed). Two typical examples are:

The up-link frequency, from earth to space, must be in the s band. vs.
The down-link frequency, from space to earth, must be in the s band.

And:

Those tests aim at checking the rf compatibility of the ground stations used during the mission and the tm / tc equipment on board. vs.
Those tests aim at checking the rf compatibility of the ground stations used during the mission and the on board equipment.

In the first example, the use of the s band in both situations may sound strange and requires a control check by an expert; in the next example, the expression tm / tc must be compared with on board equipment.

As the reader may note, besides simple terminological variants, this class of incoherence hides deeper modelling problems.

3.4 Incoherence in enumerations

This case covers situations where different actions must be carried out depending on specific criteria, as illustrated in section 1. For example, for a given set of temperature intervals, different actions must be carried out. If these intervals overlap, or if some values are missing between the intervals, it is possible to have a form of incoherence.

3.5 Synthesis

The observed frequency of these forms of incoherence can be summarized as follows:

Category | Scope | Occurrence rate
Incompatible values or expressions | Differences between various types of values, or linguistic expressions | 59%
Incoherent events | Events which differ in their content or structure | 15%
Terminological incoherence | Term variations due to updates | 14%
Incoherent enumerations | Situations such as overlap or gaps | 12%

4 Mining Incoherence in texts: the overall processing strategy

The next challenge is the definition and implementation of templates, which encode the forms of incoherence reported above, to mine incoherent pairs of requirements. The templates are implemented in our TextCoop research platform dedicated to discourse analysis that allows an easy declarative specification of discourse templates. They are then integrated into our freeware LELIE research software, aimed at improving requirement authoring from a language point of view (Saint-Dizier, 2014), (Kang et al., 2015).

The global analysis strategy is based on a comprehensive analysis of a requirement document. The complexity in terms of processing time is quite high since requirements must be compared one after the other with all the other requirements. The processing strategy is organized as a loop as follows:

STEP 1: a new requirement Ri is read from the source text; its discourse structure is tagged via TextCoop, including the kernel requirement portion,

STEP 2: Ri is then compared to all the elements Rj already stored in the requirement database. For that purpose:

(1) for all requirements Rj already checked and stored in a database, the similarity and dissimilarity metrics advocated in section 2 are activated to check whether Ri and Rj may potentially be incoherent,
(2) if this is the case, then incoherence patterns are activated. If a pattern matches, a potential incoherence warning is produced and Ri is not added to the database,
(3) Ri and Rj are then stored in the incoherent pair database for checking by an expert,
(4) if no incoherence has been detected, then Ri is added to the requirement database.

STEP 3: go back to STEP 1 to read the next requirement.
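The loop above can be sketched as follows. This is a minimal illustration, not the TextCoop or LELIE implementation: the function `mine_incoherent_pairs` is a hypothetical name, and `toy_metric` and `toy_pattern` are deliberately naive stand-ins (word overlap, single-word difference) for the similarity and dissimilarity metrics and for the incoherence patterns, which are not reproduced here.

```python
from typing import Callable, List, Tuple

def mine_incoherent_pairs(
    requirements: List[str],
    may_be_incoherent: Callable[[str, str], bool],
    matches_pattern: Callable[[str, str], bool],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (requirement database, incoherent pair database)."""
    database: List[str] = []
    incoherent_pairs: List[Tuple[str, str]] = []
    for ri in requirements:                       # STEP 1: read Ri
        flagged = False
        for rj in database:                       # STEP 2: compare Ri with every stored Rj
            if may_be_incoherent(ri, rj) and matches_pattern(ri, rj):
                incoherent_pairs.append((ri, rj)) # (3) keep the pair for an expert
                flagged = True
        if not flagged:
            database.append(ri)                   # (4) no warning: store Ri
    return database, incoherent_pairs

def toy_metric(a: str, b: str) -> bool:
    """Stand-in metric (assumption): high word overlap, but not identical."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return a != b and len(wa & wb) / len(wa | wb) >= 0.6

def toy_pattern(a: str, b: str) -> bool:
    """Stand-in pattern (assumption): exactly one word differs on each side."""
    return len(set(a.lower().split()) ^ set(b.lower().split())) == 2

reqs = [
    "TCAS alarm message must appear in red on the FMGC screen",
    "The system S must be stopped every 24 hours for maintenance",
    "TCAS alarm message must appear in purple on the FMGC screen",
]
database, pairs = mine_incoherent_pairs(reqs, toy_metric, toy_pattern)
print(pairs)  # one warning: the 'purple' requirement paired with the 'red' one
```

The nested loop makes the quadratic cost of the comparison explicit: each new requirement is checked against everything already stored.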

In this article, we will not go into the implementation details of point (2) above. Briefly, the strategy is to extract from two requirements the set of terms which are different and then to check whether they belong to one of the categories given in section 3. For that purpose, dedicated templates have been implemented. To identify expressions which differ, syntactic analysis is carried out based on local grammars which deal with the expressions of incoherence. In addition, structured, lexical semantics resources are used, in particular antonyms. These are in general domain independent and therefore re-usable in almost any context.
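As a rough illustration of this strategy, the sketch below extracts the terms that differ between two requirements and looks them up in a small lexicon of incompatible terms. Everything in it is an assumption made for the example: the lexicon `INCOMPATIBLE` is hand-made, the function names are hypothetical, and a bag-of-words difference replaces the local grammars mentioned above.

```python
import re
from typing import Optional, Set, Tuple

# Hypothetical, hand-made lexicon; a real system would draw on structured
# lexical semantics resources such as antonym lists.
INCOMPATIBLE = {
    frozenset({"red", "purple"}),
    frozenset({"before", "after"}),
    frozenset({"must", "may"}),
}

def tokens(text: str) -> Set[str]:
    """Lower-cased alphanumeric tokens of a requirement."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def differing_terms(r1: str, r2: str) -> Tuple[Set[str], Set[str]]:
    """Terms found only in r1 and terms found only in r2."""
    t1, t2 = tokens(r1), tokens(r2)
    return t1 - t2, t2 - t1

def incompatible_pair(r1: str, r2: str) -> Optional[Tuple[str, str]]:
    """Return the first pair of differing terms listed as incompatible, if any."""
    only1, only2 = differing_terms(r1, r2)
    for a in sorted(only1):
        for b in sorted(only2):
            if frozenset({a, b}) in INCOMPATIBLE:
                return a, b
    return None

r1 = "TCAS alarm message must appear in red on the FMGC screen"
r2 = "TCAS alarm message must appear in purple on the FMGC screen"
print(differing_terms(r1, r2))    # ({'red'}, {'purple'})
print(incompatible_pair(r1, r2))  # ('red', 'purple')
```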

5 System Evaluation and Discussion

The research elements briefly described above have been implemented within our freeware LELIE authoring environment (Saint-Dizier, 2014), (Kang et al., 2015). The system is still at an experimental stage and needs further improvement. The last step is to evaluate the results (a) from a purely technical and linguistic perspective and (b) from the point of view of the requirement authors. This evaluation is indicative: it provides improvement directions, not final results.

5.1 Evaluation of the technical results and limitations

An evaluation of the accuracy of the system has been carried out on a test corpus (a corpus different from the above one, called the development corpus) in which incoherence has been introduced artificially so that tests could be run. A first result is that the system is more powerful than human analysis:

Then, when inspecting the pairs mined by the system, 74% of the pairs (81 pairs) turn out to be really incoherent and require revisions. It seems that this is a relatively acceptable accuracy for requirement authors.

5.2 Improvement directions

The main directions in which we plan to improve incoherence recognition and, at the same time, to limit noise, are characterized by the following situations:

(1) Different forms for a similar content do not necessarily entail incoherence: two requirements may deal with the same point using slightly different means of expression. For example, values and units may be different, or intervals or arithmetical constraints may be used instead of a single value, but these expressions remain globally equivalent, even if they do not follow authoring guidelines very strictly. For example, these two requirements are not equivalent according to the similarity metric: the A320 neo optimal cruise altitude must be FL380 in normal operating conditions vs. the A320 neo optimal cruise altitude must be between FL 360 and FL 380, depending on weather conditions. The second requirement is just more precise.
(2) Use of generic terms instead of specific ones does not entail incoherence: it is also frequently the case that two requirements differ only in that a general purpose term or a business term is used, where one is either more generic or at a different level of linguistic abstraction than the other. Detecting this situation requires an accurate domain ontology.
(3) Presence of negative terms may cause problems to the pattern analysis: negation or negatively-oriented terms (verbs, adjectives), though not recommended in technical writing, may appear in one requirement and not in the other, but these requirements are in fact broadly similar. For example: Acid A must not be thrown in any standard garbage vs. Acid A must be thrown in a dedicated garbage.
(4) Influence of the co-text: requirements dealing with a given activity are often grouped under a title and subtitle in an enumeration or in a chart. Two requirements that belong to two different groups may seem to be incoherent, but if the context, given by the title or enumeration introducing the requirement is different, then these two requirements may deal with different cases and are not a priori incoherent.

In terms of silence, a few requirements have not been detected as incoherent because:

(5) Implicit elements prevent the similarity diagnosis: this is the case in the pair: the upper attachment limit must not exceed 25GB. vs. It must be possible to specify a maximum limit for the storage capacity of an attachment. The expression 'for the storage capacity' is left implicit in the first requirement: as a result these two requirements are presumed to be different by the system since they are not close enough (similarity metrics). Missing information prevents the similarity metric from mining requirements that deal with the same precise topic.
(6) Coherent, but related to opposed external contexts: Similarly to (4) above, but conversely, two requirements may be identical but incoherent if the sections in which they appear have titles which are in some way opposed. The incoherence may then be 'external' to the requirements.

5.3 User satisfaction analysis

Requirement authors are always curious to see what an automatic system can detect in terms of incoherence. They know that detecting incoherence, even with some noise, is a difficult task for humans and they found the results to be useful. Our system can only detect incoherence visible from a linguistic point of view, which we estimate to represent about 40% of the total occurrences of incoherence. Given the relative simplicity of our patterns, our final estimate is that about 25% to 30% of the total occurrences of incoherence are detected. This is not much, but useful given the cost of errors at production stage.
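The figures quoted in this section and in section 5.1 can be related to each other with simple arithmetic. The lines below are a back-of-envelope reading, not a result reported by the evaluation: they assume that the 25% to 30% estimate is the product of the 40% linguistically visible share and the recall of the patterns on that share, and that the 74% accuracy applies to the whole set of mined pairs.

```python
confirmed_pairs = 81        # mined pairs judged really incoherent (section 5.1)
precision_pct = 74          # share of mined pairs that were confirmed
visible_pct = 40            # incoherence visible from a linguistic point of view
detected_pct = (25, 30)     # estimated share of all incoherence that is detected

# Number of pairs the system must have mined for 81 of them to be 74%.
mined_pairs = confirmed_pairs * 100 / precision_pct

# Implied recall of the patterns on the linguistically visible cases.
recall_on_visible = tuple(d / visible_pct for d in detected_pct)

print(round(mined_pairs))   # 109
print(recall_on_visible)    # (0.625, 0.75)
```

Under these assumptions, the patterns would recover roughly two thirds to three quarters of the incoherence that is visible linguistically.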

The main feedback from the authors who played the role of testers is given informally below, where we concentrate on three main issues. These testers have two preoccupations:

The first remark we get is that the incoherence is often not limited to the terms which differ between two requirements, but must be analysed on a larger portion of each requirement. For example, our system underlines:
the maximum 20 degree flap extension speed is 185 kts. vs. Extend flaps to 20 degrees and then slow down to 185 kts.
which are the main differences. However, the linguistic scope of the incoherence is larger than what is underlined. It is difficult to precisely generate an automatic message that explains the misconception in simple terms.

Secondly, testers feel that for revising incoherent requirements they need additional tools so that they can access related requirements: indeed resolving incoherence may entail the rewriting of several related requirements, besides those which are explicitly incoherent.

Finally, some forms of incoherence were felt to be rather limited and no revision was carried out (about 15% of the cases). It would be of much interest to be able to identify the severity of the incoherence so that authors can develop a revision strategy, probably starting with the most crucial ones.

6 Conclusion and Takeaways

The best way to avoid incoherence in requirements is to follow requirement modelling principles whenever, and as much as, possible, such as those elaborated most notably in the IREB Handbook of Requirements Modelling and Elicitation (Cziharz et al., 2016), (Häußer et al., 2019). Developing accurate traceability methods is also crucial. Nevertheless, as shown in this article, requirement authors can make errors which are less conceptual, for example in the specification of values, due to distractions, work overload or domain evolutions that were not taken into account. These values may be, for example, numbers, colours, units, equipment or product names, but also linguistic markers with a heavy semantic content such as adverbs, prepositions or modals. These errors are rather difficult to detect in a majority of situations because they may concern requirements which are in quite remote sections or chapters of a document. These errors occur in about 1-2% of the total number of requirements; this is not very frequent but may nevertheless have significant consequences on the development of a product or a process.

Section 3 of this article proposes a categorization of the errors found in our corpora. Section 4 shows how these can be mined with a relatively good accuracy. An evaluation and the current limits of the system are provided in Section 5.

Before using our system, we recommend that requirement authors carefully proofread their text over short text portions: a few related sections or a few pages, concentrating on discrepancies which may arise at the level of value specifications, enumerations and terminology usage. This careful reading may contribute to detecting unexpected errors. The case of incoherent events is more difficult to detect and may reveal misconceptions.

Detecting the categories of errors presented in Section 3 in requirements which are not adjacent but in remote text sections or in different documents could partly be resolved by using our approach, even if it does not cover all the possible errors. However, our approach can be extended or customized to specific errors or document genres. Besides using our system, implementing a few scripts that search for typical errors of a domain over documents can be really helpful and should produce results quite similar to those given above. The challenge remains the identification of pairs of requirements which may potentially share similarities but contain errors, as well as the development of related linguistic resources. A good system must indeed have a low level of noise: it must accurately and essentially point to ‘real’ errors.
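As an example of such a script, the sketch below groups requirements that become identical once numeric values with a unit are masked, and reports the groups in which the values differ. It is only an illustration: the unit list in `VALUE`, the function name `value_conflicts` and the second sample requirement (a variant invented for the example) are assumptions, and any real use would need a unit list adapted to the domain.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical unit list; extend it with the units of the domain at hand.
VALUE = re.compile(r"\b\d+(?:\.\d+)?\s*(?:kts|degrees?|hours?|GB)\b", re.IGNORECASE)

def value_conflicts(requirements: List[str]) -> Dict[str, List[str]]:
    """Group requirements that are identical once values are masked and
    return the groups in which the masked values differ."""
    groups = defaultdict(set)
    for req in requirements:
        # Normalize each value, e.g. "20 GB" and "20GB" both become "20gb".
        values = [re.sub(r"\s+", "", v.lower()) for v in VALUE.findall(req)]
        if not values:
            continue
        skeleton = VALUE.sub("<VALUE>", req.lower()).strip(" .")
        groups[skeleton].add(" / ".join(values))
    return {s: sorted(v) for s, v in groups.items() if len(v) > 1}

docs = [
    "The upper attachment limit must not exceed 25GB.",
    "The upper attachment limit must not exceed 20 GB.",   # invented variant
    "The maximum 20 degree flap extension speed is 185 kts.",
]
for skeleton, values in value_conflicts(docs).items():
    print(skeleton, "->", values)
```

A script of this kind only catches requirements whose wording is otherwise identical; it produces warnings that still have to be confirmed by an expert.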

Acknowledgements

I wish to thank the two reviewers, Peter Hruschka and Thorsten Weyer, whose contribution greatly helped to improve this text. I also thank Gareth Rogers, who polished the English of this article.

References