The Requirements Engineering Magazine appears quarterly. It is free of charge and provides up-to-date articles reflecting the activities of the RE and BA community.
When writing or revising a set of requirements, or any technical document, it is particularly challenging to ensure that the text reads easily and is unambiguous for every domain actor. Experience shows that even after several rounds of proofreading and validation, most texts still contain a large number of language errors (lexical, grammatical, stylistic, business-related, or with respect to authoring recommendations) and lack overall cohesion and coherence. LELIE [a] has been designed to track these errors and, whenever possible, to suggest corrections. LELIE has a clear impact on technical writers' behavior: it rapidly becomes an essential and user-friendly authoring companion.
LELIE [a] was funded by the French National Research Agency (ANR) from 2008 to 2013. It remains a research framework, but it is now paired with R&D efforts to investigate its relevance to, and customization for, the industrial world. Based on natural language processing and artificial intelligence, the LELIE project aims to detect and analyze potential risks in technical documents, related to health and ecology, but also to a number of social and economic dimensions.
Risks emerge from poorly written texts and from various forms of incoherence. For example, 'Progressively heat the probe X27' relies too much on the operator's knowledge and practice: what temperature should be reached, and in how much time? A wrong interpretation may lead to accidents and damage. Among technical documents, requirements are a central issue since they must comply with a large number of constraints, e.g. readability, absence of ambiguity and implicit data, feasibility, relevance, traceability, and overall cohesion and coherence.
The main aim of LELIE is, given a set of requirements, whatever their domain and type, to analyze their contents and to annotate them in an appropriate way wherever potential errors are identified. Errors range from poor writing quality to incoherence between requirements. Authors are then invited to revise these documents. This requires some domain knowledge, for example an ontology, a terminology, and a lexicon. Requirements are a textual genre dedicated to action: little space should be left for ambiguity and personal interpretation.
LELIE is based on three levels of analysis:
The LELIE project addresses a large number of problems in controlled natural language. In this document we concentrate on the first topic: the detection of inappropriate ways of authoring requirements, which has now reached a good level of maturity. A prototype covering this topic has been developed for French and English. A kernel of this prototype, with a basic user interface, is available for testing at: http://www.irit.fr/~Patrick.Saint-Dizier/. The two other topics have reached a lower level of maturity: they are extremely complex in general, and are being investigated through a case-based approach.
The approach in LELIE is not to make requirement authors write from predefined templates, also called boilerplates, which in practice are rarely followed strictly, but to let authors express themselves freely and then to run, on demand, a posteriori controls.
LELIE develops a hybrid approach that is cooperative with the requirement author based on:
Tools controlling the authoring quality of requirements have been developed in the past using templates or boilerplates meant to guide the technical writer [Arora et al. 2013]. This is most notably the case of the well-known RAT-RQA system (The Reuse Company) and of the RUBRIC system developed at the University of Luxembourg. Let us also cite two major CNL-based university prototypes of much interest for requirement authoring. ACE (Attempto Controlled English) [Fuchs et al. 2008, 2012] performs an in-depth semantic analysis of language; it was initially designed to control software specifications and has more recently been used in the semantic web. PENG (Processable English) [White et al. 2009] is a computer-processable controlled natural language designed for writing unambiguous and precise specifications. These systems make heavy use of syntactic analysis, which is rather costly. LELIE is based on shallow parsing techniques and semantic analysis, which makes it more suitable for requirements, where the language is complex and sometimes ill-formed. A synthesis of CNL-based systems is given in [Kuhn 2013, 2014].
Let us now concentrate on LELIE as an intelligent assistant for requirement authoring. LELIE is a rule-based system that detects errors at different levels: syntactic, lexical, semantic, and discourse. From the analysis carried out by LELIE, it becomes easier to measure the quality of a specification, composed of requirements, in terms of testability, ambiguity, singularity, consistency, completeness, redundancy, and traceability. The error correction rules have been developed and validated in four steps:
| Error type | Impact on quality | Example |
|---|---|---|
| Fuzzy terms | ambiguity, testability | wherever possible, suitably, adequately |
| Complex or ambiguous coordination | singularity | X shall ACTION1 and ACTION2 or ACTION3 |
| Multiple negation markers or double negation | readability | It shall not be possible to do not… |
| Multiple actions in a requirement | validity, testability, traceability | X shall ACTION1 and ACTION2 / X shall ACTION1 and Y shall ACTION2 |
| Complex discourse structures | readability, ambiguity | |
| Pronouns with uncertain reference | ambiguity | their, them, these, it… |
| Incorrect references to other chapters | not feasible | below, above, see… |
| Heterogeneous enumeration | waste of time to understand, ambiguity | |
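To make the flavor of such alerts concrete, here is a minimal sketch of a rule-based detector in the spirit of the table above. The regular expressions, category names, and word lists are our own illustrations, not LELIE's actual rules, which rely on shallow parsing and semantic analysis rather than plain pattern matching:

```python
import re

# Illustrative alert rules inspired by the error table above.
# These patterns are simplifications for demonstration only.
ALERT_RULES = {
    "fuzzy term (ambiguity, testability)": r"\b(wherever possible|suitably|adequately)\b",
    "double negation (readability)": r"\bnot\b[^.]*\bnot\b",
    "uncertain pronoun reference (ambiguity)": r"\b(their|them|these|it)\b",
    "reference to other chapters (not feasible)": r"\b(below|above|see)\b",
}

def detect_alerts(requirement: str) -> list[tuple[str, str]]:
    """Return (category, matched text) pairs for one requirement."""
    alerts = []
    for category, pattern in ALERT_RULES.items():
        for match in re.finditer(pattern, requirement, flags=re.IGNORECASE):
            alerts.append((category, match.group(0)))
    return alerts

if __name__ == "__main__":
    req = "It shall not be impossible to not heat the probe, see below, wherever possible."
    for category, text in detect_alerts(req):
        print(f"{category}: '{text}'")
```

A real deployment would run such rules over each requirement and annotate the matched spans rather than merely listing them.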
The texts of company S3 had been reviewed by experts in technical document production before our analysis; nevertheless, several errors remain. We observe that the distribution of errors depends in particular on the complexity of the texts: those of S2 are clearly more complex than those of S1. Finally, we note an average of 15 errors per page, i.e. approximately one alert every 2 or 3 lines, not counting errors related to business rules. This is obviously a high rate and motivates the use of LELIE.
The alerts produced by the LELIE system have been found useful by most of the requirement writers who tested it. However, they feel that:
In the LELIE project, we develop and test several facets of an error correction memory system that would, after a period of observing requirement writers making corrections from the LELIE alerts, add flexibility and context sensitivity to error detection and correction. General principles of language processing via a contextual memory are developed in [Daelemans 2005].
This memory system is based on the following operations:
The error correction memory is based on a two level organization:
Roughly, after induction (step (1) above), an error correction rule has the following form:
[error pattern] → [correction pattern] – Context.
The “error pattern” describes an incorrect structure; the “correction pattern” is the correction that should preferably be applied; “Context” refers to the conceptual environment of the correction pattern. In LELIE it is realized by memorizing the four closest content words (adjectives, nouns, verbs) occurring before or after the error. The context allows the specification of precise recommendations.
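As a rough illustration of this context mechanism, the following sketch collects the k closest content words around an error position, walking outward in both directions. The tokenization and the `content_words` set (standing in for real part-of-speech tagging of nouns, verbs, and adjectives) are simplifying assumptions, not LELIE's implementation:

```python
def error_context(tokens: list[str], error_index: int,
                  content_words: set[str], k: int = 4) -> list[str]:
    """Collect up to k content words closest to the error position,
    looking alternately before and after it."""
    context: list[str] = []
    left, right = error_index - 1, error_index + 1
    while len(context) < k and (left >= 0 or right < len(tokens)):
        if left >= 0:
            if tokens[left] in content_words:
                context.append(tokens[left])
            left -= 1
        if len(context) < k and right < len(tokens):
            if tokens[right] in content_words:
                context.append(tokens[right])
            right += 1
    return context

# Example: the error is the fuzzy adverb at position 0.
tokens = "progressively heat the probe X37".split()
print(error_context(tokens, 0, {"heat", "probe", "X37"}))
# -> ['heat', 'probe', 'X37']
```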
For example, a correction rule used for fuzzy manner adverbs is:
[progressively VP(durative)] → [progressively VP(durative) in X(time)] – Context.
where X(time) is a variable of type time. VP(durative) indicates an action that takes some time to be realized.
e.g. progressively heat the probe X37 → progressively heat the probe X37 in 10 minutes.
In this example, Context = (probe X37 heat), VP = heat, and X = 10 minutes. X is suggested by a correction recommendation related to the context (heating the X37 probe); the adverb is kept in order to preserve the manner facet, which is not fuzzy, since it is the temporal dimension that is fuzzy. Note that 'heat' is underspecified here: the temperature to reach is not given. This is another type of error detected by LELIE, but not developed in this text.
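The rule above can be sketched in code as follows. The `CorrectionRule` structure, the regex rendering of the patterns, and the idea of passing the recommended duration in from a context lookup are illustrative assumptions, not the actual LELIE machinery:

```python
import re
from dataclasses import dataclass

@dataclass
class CorrectionRule:
    """One induced rule in the [error pattern] -> [correction pattern] - Context
    form, rendered here as regexes for illustration only."""
    error_pattern: str        # regex over the requirement text
    correction_template: str  # with a {duration} slot filled from the context
    context: frozenset        # content words expected around the error

# Memorized correction for 'progressively <durative verb>' in a probe-heating context.
RULE = CorrectionRule(
    error_pattern=r"progressively (heat [\w ]+?X37)",
    correction_template=r"progressively \1 in {duration}",
    context=frozenset({"probe", "X37", "heat"}),
)

def apply_rule(rule: CorrectionRule, text: str, duration: str) -> str:
    """Apply the correction if the context words appear in the text;
    the duration value comes from a context-specific recommendation."""
    if not rule.context & set(re.findall(r"\w+", text)):
        return text  # context words absent: rule does not apply
    template = rule.correction_template.format(duration=duration)
    return re.sub(rule.error_pattern, template, text)

print(apply_rule(RULE, "progressively heat the probe X37", "10 minutes"))
# -> progressively heat the probe X37 in 10 minutes
```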
We noted that correction divergences often arise between technical writers; therefore a strictly automatic learning process is neither fully accurate nor achievable. In LELIE, the approach is to propose to a team of technical writers several possible corrections, via simple generalizations over coherent subsets of corrections, and to let them decide on the best solution through discussion, mediation, or a decision made by an administrator.
Let us now concentrate on a few typical cases related to fuzzy terms and negation, which are frequent errors in requirement authoring. There are several categories of fuzzy lexical items, which involve different correction strategies. They include a number of adverbs (manner, temporal, location, and modal adverbs), adjectives (adapted, appropriate), determiners (some, a few), prepositions (near, around), a few verbs (minimize, increase), and nouns. These categories are not homogeneous in terms of fuzziness: determiners and prepositions from these lists are always fuzzy, whereas, for example, the adverbs may be fuzzy only in certain contexts. The degree of fuzziness also differs considerably from one term to another within a category.
In a small experiment with two technical writers from one of our user companies, considering 120 alerts concerning fuzzy lexical items in different contexts, 36 were judged not to be errors (a rate of 30%). Of the remaining 84 errors, only 62 were corrected; the other 22 were judged problematic and very difficult to correct. Correcting fuzzy lexical items indeed often requires domain expertise.
To conclude this section, let us give a few typical error correction patterns that have been induced; a number of them deal with various forms of implicit quantification:
| Error type | Error pattern | Correction pattern | Example |
|---|---|---|---|
| Fuzzy determiner | [a few Noun] | [less than X Noun] (adds an upper boundary X) | A few minutes → Less than 5 minutes |
| Fuzzy determiner | [most Noun] | [more than X Noun] (adds a lower boundary X) | Most pipes shall ... → More than 8 pipes shall ... |
| Temporal, iterative adverbs | [VP(action) Adverb(iterative)] (VP(action): action verb) | [VP(action) every X(time)] | The steam pressure shall be controlled regularly → The steam pressure shall be controlled every 10 minutes |
| Fuzzy prepositions | [near Noun(location)] | [less than X(distance) from Noun(location)] (X(distance) depends on Context) | Near the gate → Less than 100 m from the gate |
| Negation on usages | [(do) not Verb(use) NP] (NP: any noun; Verb(use): any verb such as 'use') | [Verb(use) hyperonym(NP) other than NP] (hyperonym(NP): a term more generic than NP, given in a domain terminology) | shall not use hydrogen → Shall use a gas other than hydrogen |
| Reverse synchronization | [do not/never VP before VP'] (VP and VP' denote two actions) | [VP only after VP'] or [VP'. Then VP] (actions are reversed in the correction; some persuasion effects may be lost) | never unplug before the machine has been stopped → Stop the machine. Then unplug it |
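As one concrete illustration, the 'negation on usages' pattern from the table can be sketched as follows. The `HYPERONYMS` table and the function name are hypothetical, standing in for a real domain terminology resource:

```python
import re

# Toy domain terminology mapping a term to a more generic term (hyperonym).
# The entries are illustrative; a real deployment would load these from
# the application's terminology resources.
HYPERONYMS = {
    "hydrogen": "gas",
    "ethanol": "solvent",
}

def correct_negated_usage(requirement: str) -> str:
    """Rewrite 'shall not use X' as 'shall use <hyperonym of X> other than X',
    following the 'negation on usages' correction pattern."""
    def rewrite(match: re.Match) -> str:
        noun = match.group(1).lower()
        generic = HYPERONYMS.get(noun)
        if generic is None:
            return match.group(0)  # no hyperonym known: leave unchanged
        return f"shall use a {generic} other than {noun}"
    return re.sub(r"shall not use (\w+)", rewrite, requirement, flags=re.IGNORECASE)

print(correct_negated_usage("The cooling system shall not use hydrogen."))
# -> The cooling system shall use a gas other than hydrogen.
```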
The LELIE prototype is based on the following components:
The kernel of the system is the set of rules that consult lexical entries. The lexical entries of the secondary lexicon must be defined for each domain (e.g. aeronautics, energy, chemistry) and possibly adapted or tuned for each application. This can be done manually, by lexicographers, or via the support of a lexical acquisition platform. The kernel is available for testing.
The LELIE architecture is summarized in Figure 1:
LELIE is a project which aims to investigate the different tools needed by technical writers, and in particular requirement writers, to improve the quality of their texts. We have presented in the previous section the first step in such a project: improving the authoring quality of texts via alerts and correction patterns. It is important to note that, although there are guidelines for writing requirements, large differences in style and form have been observed between authors and companies.
Once requirements are relatively well-written, additional quality controls can be carried out. Let us review here those which seem to be the most crucial from the errors found in large collections of requirements. Most of them are complex and cover several situations. Therefore, we feel a case-based approach is appropriate to analyze and develop them gradually in a sound way. These controls are, in particular:
[a] The name LELIE is not an acronym: it is the name of a character from Molière who makes a lot of errors in his everyday life, hence the name of our project.