Harry Sneed Birgit Demuth
Tracing Change Requests
From Requirements to Code
One of the biggest challenges in software maintenance is the need to trace incoming change requests to the code affected by those requests. This need is especially acute in large systems with a million or more lines of code. Maintenance personnel spend a significant portion of their time looking for the code they should alter. One approach to solving this problem is “Feature Analysis”. Feature Analysis links change requests to an existing requirements document by analyzing the common nouns they use. Once the affected requirement is found, it is traced to the code which implements it by means of a repository containing all the entities and relationships found in the code. This article defines the problem, reviews the existing literature on the subject, and presents a tool which the first author has developed as a step toward solving the problem.
Key Words: Software Maintenance, Change Requests, Requirements Documentation, Feature Analysis, Traceability, Repository Management, Impact Analysis, Requirements Traceability Matrix.
1. The Maintenance Challenge
Studies have shown that the greatest cost in software maintenance next to testing is in finding the code affected by the maintenance requests [Lehm98]. Maintenance personnel without intimate knowledge of the entire system are normally helpless when it comes to tracing a change request to those portions of code which have to be changed. If system documentation exists, it is often obsolete or insufficient. The persons responsible for making changes have little insight concerning how to proceed. They may have a requirements document, but there is no way of getting from the requirements to the pertinent code. There is a wide semantic gap between the natural language requirements and the programming language of the code which cannot be overcome easily. The challenge is to bridge that gap with a minimum of effort in a minimum of time. This article proposes a method to address this problem. In addition, a tool is available from the author that offers potential to evolve an automated solution.
2. The Need to Maintain the Requirements Document
There are many reasons for maintaining a separate requirements document. One important reason is to provide an objective set of statements that can be used as a basis for testing the system [Sned07]. The evolving software has to be tested against something. That something could be another version of the same system, but it takes a long time before a version of the software has matured sufficiently to be used for comparison. Even then, the users cannot readily see whether their requirements are fulfilled. A separate requirements document is needed as a baseline against which to evaluate the evolving code. If there is no requirements document, no one can be sure that the system is meeting its specific requirements. In software maintenance, the requirements document is essential as a link between the natural language change requests and the target code. The requirements document is the only description of the system which is understood by the users and all stakeholders of the system. A question often posed, especially when it comes to a management or legal dispute, is “whether the code fulfills the requirements”. Change requests and error reports are written at the level of the natural language description of the system, the only semantic level with which the users and stakeholders are familiar. It is difficult enough to link changes and corrections to the code via a requirements document, but without that document, the linking depends solely on the persons responsible for the code. They have to associate what they see in the user request with what they know about the code. Without a deep knowledge of the code they are lost. This makes the user organization totally dependent on its maintenance personnel. Over time, these professionals move to different roles, departments, and organizations. If they are not available, the ability to evolve the system being maintained is jeopardized. With a current requirements document, it is at least possible to understand what the system should be doing.
The next step is to link what the system should be doing to what it really does.
3. Feature Analysis
When a system is newly developed, links can be inserted from the code back to the requirements. For example, in the header of each method, i.e. procedure, the developer can insert a reference to the requirement fulfilled by that method (see Sample 1). Since there is an m:n relationship between requirements and methods, the same method may point to several requirements. This would be the optimal solution.
// program name     : GWFSONST.DLL
// Source name      : GWMOPTAU.CXX
// Generating date  : 20.01.1995
// Generating time  : 13.40.58
// description      : GUI Implementation
// &fulfills        : REQU-114_GWFSONST
// ------------------------------------------
But what can be done if the system has already been implemented and there are links neither from the requirements to the code nor from the code to the requirements? There are only the natural language requirements documents on the one hand and the implemented code on the other. The effort required to manually insert links for a large system is prohibitive. The public medical insurance system referred to in this article entails 2.7 million lines of code with 1,094,684 Java statements in 21,548 classes and 45,262 methods. The requirements document encompasses 507 pages with 12,825 German sentences in 229 requirements and 267 use cases. To manually insert a requirement link into every method would cost, at 2 hours per method, a total of 90,524 person-hours (45,262 methods × 2 hours). It is probable that most of the original developers of the legacy system are no longer available. Therefore, a manual approach is unrealistic.
The important question is: “How can one facilitate automatic (electronic) traceability of requirements to code?” One answer is through feature analysis. Feature analysis is a means of identifying specific features in the requirements documentation and tracing them through to the code. The approach evolves from the field of reverse engineering and is well defined in the pertinent literature [Anto02]. To implement it, one must first refine the requirements to a level where they are no longer ambiguous. It must be possible to identify individual functional, non-functional, and data requirements, business rules, use cases and their steps, data objects and their attributes. At this point, we are focused on the functional and data requirements. This is the first step in the feature analysis process. The next step is to extract the specific requirements, rules and use case steps, and represent them in a semi-formal notation for comparison with the implemented source code. Having done that, the next step is to analyze the source code to extract the elementary source entities. One will not find the use cases in the source, but one will find the methods, procedures, objects, and attributes. These can subsequently be paired with the use case steps, i.e. activities or events, the logical conditions and the code operations.
The matching algorithm used here compares substrings with a length of six characters. If a substring of one name matches at least one substring of the other name, the two names are considered to be similar. If a requirement name is found to be similar to a segment of code, the relation between the two should be studied more closely. Similarity is determined by matching the nouns from the requirement text with the operands in the code. If for a given requirement, rule, object, or use case step, an elementary code block, i.e. method, operation or procedure, exists, then that requirement is considered to be traced. Provided the developers take over the data names from the requirements document and only alter them with prefixes and suffixes, it is possible to match code blocks to requirement texts. If, in comparing name strings with each other, enough substrings match, the names are considered to be equivalent. If it is not possible to match up a requirement with a source code segment, that requirement is placed in a log of unmatched requirements. It is left to the human auditor to scan through the code in search of functions which might apply to the unresolved requirements. If a match cannot be found, the remaining requirement entities are categorized as unfulfilled requirements.
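The six-character comparison described above can be sketched in Java as follows. This is only an illustration under stated assumptions, not the code of the tool: the class name `SubstringMatcher`, the method names, and the normalization step (lower-casing and dropping underscores and other non-alphanumeric characters, so that `Treatment_Type` and `TreatmentType` become comparable) are hypothetical and are not taken from the article.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class SubstringMatcher {

    // Substring length named in the text.
    static final int WINDOW = 6;

    // Assumption: case, underscores and punctuation are ignored when comparing names.
    static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
    }

    // All substrings of length WINDOW contained in the normalized name.
    static Set<String> windows(String name) {
        String n = normalize(name);
        Set<String> result = new HashSet<>();
        for (int i = 0; i + WINDOW <= n.length(); i++) {
            result.add(n.substring(i, i + WINDOW));
        }
        return result;
    }

    // Two names are similar if they share at least one six-character substring.
    static boolean similar(String a, String b) {
        Set<String> common = windows(a);
        common.retainAll(windows(b));
        return !common.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(similar("Treatment_Type", "getTreatmentType")); // true
        System.out.println(similar("Patient", "PatientNo"));               // true
        System.out.println(similar("Hospital", "Location"));               // false
    }
}
```

In this sketch a name shorter than six characters has no substrings of the required length and therefore never matches; how such short names should be treated is not stated in the text.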
Thus, there are two phases in associating code with requirements. The first phase is automated. By matching the code data and method names with the nouns in the requirement text, the tool identifies code segments which possibly fulfill that requirement. The comments should also be analyzed. The selected code segments and comment blocks are stored together with the requirements to which they apply. In the second phase the responsible maintenance engineer validates whether the code really applies to that requirement or not. In order to accomplish this, the engineer must understand both the requirement and the programming language. Each and every functional requirement should be assigned to at least one code function. The maintenance engineer is provided with an association table to associate the column of requirements with the column of code functions or methods [Chec01]. It is estimated that this manual matching process requires at least two hours per requirement.
The same technique can be applied to matching the change requests and error reports with the requirements document. Change requests and error reports are formulated in natural language. Accordingly, they can be analyzed and the objects they refer to extracted from the text. The prerequisite is that objects referred to in the change requests are the same as those used in the requirements document. A common ontology is needed. It should be defined when the requirements are originally written and attached to the requirements document to ensure that the users continue to use the same terms when they formulate their error reports and change requests. The objects, i.e. nouns, contained therein are then matched to the objects in the requirements document. If a significant number of nouns in the request match the nouns in a requirement, then the request is assigned to that requirement. This needs to be verified by the responsible requirement engineer.
A simplified model of the terminology used in Feature Analysis is provided in Figure 1.
4. Associating Requirements to Code
There are three steps involved in linking requirements to code. Figure 2 describes the method used.
- Extracting the nouns from the requirements document (Feature Analysis)
- Extracting the functions and data from the code (Code Mining)
- Matching the words with one another (Name Matching)
4.1. Extracting Nouns from the Requirements Document (Feature Analysis)
The first step in the process of comparing requirements to code is to parse the requirements document and recognize the essential elements – the individual requirements and the use cases. These are marked by keywords inserted in the original requirements document. The use cases are intended to fulfill the requirements. Therefore, every functional requirement should be assigned to a use case. One use case can fulfill one to many requirements. Then it is necessary to examine the text of each use case and the requirements it fulfills. All of the nouns are extracted. For instance, in the requirement “The user should be able to select a patient and see the treatment type he is getting”, the nouns are < patient > and < treatment_type >.
&Requ_13: Pairing Patients with Treatment; patient; treatment_type
The user should be able to select a patient and see the treatment type he is getting.
Notice that in English many nouns are compound nouns consisting of two or more words. The selected nouns are then stored in a table together with the requirement in which they appeared [Gueh06] (see Sample 2).
Entity Name | Entity Relationships |
---|---|
Type;Base Entity; | Rel ;Type;Target Entity |
PROD;HVB; | OWNS;SYST;ZPV4 |
SYST;HVB; | OWNS;DOCU;100_ZPV4.0 |
DOCU;100_ZPV4; | OWNS;REQU;&AN03.04.01 |
REQU;&AN03.04.01; | USES;DATA;Patient |
REQU;&AN03.04.01; | USES;DATA;Treatment_Type |
REQU;&AN03.04.01; | USES;DATA;Hospital |
REQU;&AN03.04.01; | USES;DATA;Ailment Type |
REQU;&AN03.04.01; | USES;DATA;Treatment_Facility |
REQU;&AN03.04.01; | USES;DATA;Subject_Area |
REQU;&AN03.04.01; | USES;DATA;Examination_Center |
REQU;&AN03.04.01; | USES;DATA;Treatment |
REQU;&AN03.04.01; | USES;DATA;Subject_Area |
REQU;&AN03.04.01; | USES;DATA;Partner_Agency |
REQU;&AN03.04.01; | USES;DATA;Treatment_Facility |
It is not enough to extract nouns from the requirement statements. Requirements are needs expressed by the user. These needs have to be assigned to events in a business process. Events in business processes are referred to as use cases, a term which is now well established in the requirements engineering field. Use cases fulfill requirements. Any one use case can fulfill one or more requirements. It is up to the requirements analyst to assign requirements to use cases. Use cases have their own features such as triggers, preconditions, post conditions, and exceptions (see Sample 3).
&BAF04.03 – Display List of Subject_Areas
&ShortDescription: This Use Case displays a list of Subject_Areas from which the user can select. It includes a classification of ailment and treatment types.
&ExecutionMode: online.
&Actor: Insurance_Agent.
&Fulfills: AN03.04.01.
&PreConditions
. Subject Area is known to the ZPV4.0 system.
. Treatment Location is identified by the User.
&PostCondition if successful
. List of Subject Area treatment types and locations is displayed.
&PostCondition if unsuccessful
. Error Message is displayed in a new window.
The use cases are analyzed and their nouns extracted to a table of use case operands. In this way, the use cases can be associated with the requirements. It is the use cases which are implemented in the code. Therefore, the use case nouns are those searched for in the code (see Sample 4).
Entity Name | Entity Relationships |
---|---|
DOCU;103_ZPV4; | OWNS;CASE;&BAF04.03 |
CASE;&BAF04.03; | USES;DATA;UseCase |
CASE;&BAF04.03; | USES;DATA;Display |
CASE;&BAF04.03; | USES;DATA;List |
CASE;&BAF04.03; | USES;DATA; Subject Areas |
CASE;&BAF04.03; | USES;DATA;Location Classification |
CASE;&BAF04.03; | HAS ;PRE ;. Location is found. |
CASE;&BAF04.03; | USES;DATA;Treatment Location |
CASE;&BAF04.03; | USES;DATA;User |
CASE;&BAF04.03; | HAS ;POST;successful |
CASE;&BAF04.03; | USES;DATA;List |
CASE;&BAF04.03; | USES;DATA;Treatment Type |
CASE;&BAF04.03; | USES;DATA; Location Classification |
CASE;&BAF04.03; | USES;DATA;Error Occurence |
CASE;&BAF04.03; | USES;DATA;Error Report |
4.2. Extracting Data and Function Names from the Code (Code Mining)
The second step in the process is to mine the code and extract the names of the classes, methods and attributes from the code. In the method getTreatmentType()
public TreatmentType getTreatmentType(PatientNo patientNo) throws DatenbankException {
    return new PatientTreatment("PatientNo", "Treatments", "ErrorCode");
}
the operands are TreatmentType, PatientNo, and PatientTreatment. The function is getTreatmentType. They are stored as an ordered set (a “tuple”):
getTreatmentType; TreatmentType; PatientNo; PatientTreatment; ErrorCode
This is done for all the methods in all classes to create a table of methods and attributes for every class [Dit13] (see Sample 5).
Entity Name | Entity Relationships |
---|---|
SRC ;LocationTypeClass; | OWNS;FUNC;@Component |
SRC ;LocationTypeClass | OWNS;FUNC;RuleforSubjectAreaClassification |
SRC ;LocationTypeClass | OWNS;FUNC; RuleforLocationClassification |
FUNC;RuleforLocationClassification | OWNS;DATA;Location |
SRC ;LocationTypeClass; | OWNS;FUNC;isValidLocation |
SRC ;LocationTypeClass; | OWNS;FUNC; ListofValidLocations |
SRC ;LocationTypeClass; | OWNS;FUNC; ListofValidTreatments |
FUNC;SubjectAreaClass; | OWNS;DATA;Rule;rule |
DATA;Rule; | USES;TYPE;regel |
SRC ;LocationTypeClass; | OWNS;FUNC; ValidTreatment |
FUNC;SubjectAreaClass; | OWNS;DATA;Rule;rule |
DATA;Rule; | USES;TYPE;rule |
4.3. Matching the Words with One Another (Name Matching)
In the third step the three relational tables are compared with one another to find matching tuples. The code tuple
getTreatmentType; ValidTreatment; Location; ValidLocation; LocationClassification
matches with the requirement tuple
AN03.04.01: Patient, Treatment_Type, Hospital, Location, Treatment_Facility
In this sample three nouns used in the requirements document are common with operands in the code, i.e. treatment, treatment_type and location. There need to be only two matches to qualify the requirements as being potentially fulfilled, whereby name fragments or substrings are also counted. The potentially fulfilled requirements are noted. The remaining requirements for which no matching code is found are listed separately for the user to check manually (if that is at all possible in such a mass of code).
5. Associating Maintenance Requests to Requirements
Having linked the requirements to the code, those links are stored and remain valid as long as the requirements remain unchanged. According to requirements engineering theory, the next step should be to verify and validate the links that have been established between requirements and code. That step would mean manually inspecting both the requirements document and the code. This requires a tremendous additional effort which no commercial user is willing to pay for, and certainly not the Austrian social security department. Therefore, the verification step was purposely left out. It was assumed that a partially accurate link is better than no link at all. The next step here was to deal with the change requests and to link them to the requirements, knowing full well that the links from the requirements to the code were not completely accurate. For this purpose, the same technique is applied. Nouns are extracted from the text of the maintenance request and compared with the nouns of the requirements. These nouns are stored in a relational table where they can be searched. The request is assigned to the requirement with the greatest number of noun matches. If the result is inconclusive, the requirements engineer must determine which requirement matches best.
Once the maintenance request has been assigned to a particular requirement, it is possible to link the request to the affected code. The requirements to code links are searched to select those code segments – methods and classes – which come closest to matching the change request. The result is the tuple (Change Request, Requirement, Code Segment). The maintenance engineer receives a list of code segments which are likely to be affected by the incoming request. This is a much better starting point than his memory of what code could be impacted. As shown in Figure 3, nouns of a change request are being matched with nouns of the requirements document, objects of the design model, operands in the code, data entities, and text objects in the test cases.
6. Automated Change Request Tracing with SofTrace
SofTrace, a member of the SoftEval tool family for supporting software maintenance and evolution, is a tool developed by the first author for supporting feature analysis. It is designed to match a change request written in prose text to a requirement, a design model, a source code member, a database table, or a test case. The primary input is the change request – a text file with a German or English text describing what should be changed in the existing system. Change requests are usually formulated in a preformatted form with key words and text boxes (see Sample 6). One set of attributes is mandatory; another set of attributes is optional. The attributes which are mandatory are:
- change request id,
- change request date,
- change request reporter,
- product to which the change request applies,
- system or component to which the change request applies,
- change priority (critical, high, medium, low),
- artifact targeted by the change request (GUI, database, component),
- prose description of the desired change (see Sample 6).
Optional attributes are:
- the expected benefits,
- the maximum justifiable costs,
- the desired completion date.
CR-11 Credit Check | |
UC-03: Customer_Order_Processing | |
BO-02 Customer | |
REQ-22: Credibility Checking | |
The customer credibility should be changed from a numeric to a character data type of the same length. The minimum credit rating should be 4 instead of 3. Customers with credibility less than 4 are not allowed to order articles. |
The change request form should be processed by a natural language parser which recognizes the nouns in the text. The nouns in the prose description are extracted for comparison with the nouns in the current requirements document and the current design model as well as with the variable names in the code and in the test cases. For this purpose, it is important that the change request is structured and completed in such a manner as to be processed easily, i.e. the text describing the change should be in a separate box distinct from the other Change Request (CR) attributes. The name list is produced by a parsing routine which distinguishes nouns from the other syntax elements. It is based on a dictionary of common nouns including compound nouns which are composed of concatenated elementary nouns. In languages like German and Hungarian it is much easier to recognize such nouns since they are joined together in one word. In English, it is more problematic since the member nouns of a compound noun are separated by spaces. They could be either separate nouns or parts of a compound noun. It depends on the context in which they are used. The natural language parser must be able to recognize that context and to join nouns of a compound noun together with an underscore as depicted in the following name list. It contains all the nouns extracted from a change request to add a credibility check to customer order processing (see Sample 7).
0029 10 NOUN < articles > |
0030 21 NOUN < character_data_type > |
0031 13 NOUN < credibility > |
0032 08 NOUN < credit > |
0033 22 NOUN < customer_credibility > |
0034 20 NOUN < customer_exclusion > |
0035 11 NOUN < customers > |
0036 08 NOUN < length > |
0037 16 NOUN < minimum_credit > |
0038 07 NOUN < order > |
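How a dictionary of compound nouns can drive the joining of English words with underscores may be illustrated by the following Java sketch. It is a simplification under stated assumptions: the class `CompoundNounJoiner`, the limit of three words per compound, and the greedy longest-match strategy are hypothetical, and the context analysis that the text calls for is not attempted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class CompoundNounJoiner {

    // Assumed upper bound on the number of words in one compound noun.
    static final int MAX_WORDS = 3;

    // Scans the text left to right and, at each word, tries the longest
    // dictionary entry first, so "customer credibility" wins over "customer".
    static List<String> extractNouns(String text, Set<String> nounDictionary) {
        String[] words = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        List<String> nouns = new ArrayList<>();
        int i = 0;
        while (i < words.length) {
            boolean matched = false;
            for (int len = Math.min(MAX_WORDS, words.length - i); len >= 1; len--) {
                String candidate = String.join("_", Arrays.copyOfRange(words, i, i + len));
                if (nounDictionary.contains(candidate)) {
                    nouns.add(candidate);
                    i += len;
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                i++; // not a known noun, skip this word
            }
        }
        return nouns;
    }

    public static void main(String[] args) {
        // Dictionary entries are written with underscores, as in the name list above.
        Set<String> dictionary = Set.of("customer", "credibility", "customer_credibility",
                "character_data_type", "length");
        String text = "The customer credibility should be changed from a numeric "
                + "to a character data type of the same length.";
        System.out.println(extractNouns(text, dictionary));
    }
}
```

A purely dictionary-driven match such as this one cannot decide whether two adjacent nouns really form a compound in a given sentence, which is why the text asks the parser to take the context into account.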
There are altogether three steps in going from a change request to the code. In the first step the change request is associated with the target requirement. In the second step the requirement is linked to the use cases which implement that requirement. There may be one or more. In the third step the use cases are associated with the code segments – methods or procedures – which implement the selected use cases.
6.1. Step 1 = Associating the Change Request to the Requirement
From the change request, a CSV (comma-separated values) file is generated depicting the entities and relationships extracted from the change request text. The change request as a base entity has a relationship to several other entity types – to the objects, interfaces, processes, rules, and use cases – which are referred to in the prose text. It is important that these entities are recognizable. The CSV file has the format:
Base Entity Type; Base Entity Name; Relationship Type; Target Entity Type; Target Entity Name
The entity types of requirements depicted in the samples used here are SYST = System, DOCU = Document, SECT = Document Section, REQU = Requirement, CASE = Use Case, OBJT = Object, DATA = Data Variable, TEST = Test Case. The relationship types used are USES = uses/processes, EXEC = executes test case. The entity types of code elements depicted in the code tables are SYST = System, SRC = Source Code Member, CLAS = Class, FUNC = Procedure or Method, INTR = Interface, PARM = Parameter. The code relationship types are USES, CALLS = calls/invokes, RECV = receives, SEND = sends, LINK = sends and receives. This CSV table is loaded into a relational database where it serves as a basis of comparison with the other CSV tables with entities and relationships taken from the design documents, test cases and source-code members (see Sample 8).
Type;Base Entity | ;Rel ;Type;Target Entity |
---|---|
PROD;STORE | ;OWNS;SYST;ORDERS |
SYST;ORDERS | ;OWNS;DOCU;CR_11_Credit.txt |
DOCU;CR_11_Credit.txt | ;ISAT;LIB ;CR_11_Credit.txt |
DOCU;CR_11_Credit.txt | ;OWNS;SECT;CR_11_Credit |
SECT;CR_11_Credit | ;USES;DATA;customer_credibility |
SECT;CR_11_Credit | ;USES;DATA;character_data_type |
SECT;CR_11_Credit | ;USES;DATA;length |
SECT;CR_11_Credit | ;USES;DATA;minimum_credit |
SECT;CR_11_Credit | ;USES;DATA;customers |
SECT;CR_11_Credit | ;USES;DATA;credibility |
SECT;CR_11_Credit | ;USES;DATA;order |
SECT;CR_11_Credit | ;USES;DATA;articles |
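To make the five-field record format concrete, the following Java sketch parses such lines into an in-memory table and selects the data names used by one entity. It is only an assumption-laden illustration: the names `RelationTable`, `Relation`, `parse` and `targets` are hypothetical, the lines are given without the column separators of the printed sample, and a plain list stands in for the relational database mentioned in the text.

```java
import java.util.ArrayList;
import java.util.List;

class RelationTable {

    // One row: base entity type and name, relationship type, target entity type and name.
    record Relation(String baseType, String baseName,
                    String relType, String targetType, String targetName) {}

    // Splits one semicolon-separated line into its five fields.
    static Relation parse(String line) {
        String[] f = line.split(";", -1);
        if (f.length != 5) {
            throw new IllegalArgumentException("expected 5 fields: " + line);
        }
        return new Relation(f[0].trim(), f[1].trim(), f[2].trim(), f[3].trim(), f[4].trim());
    }

    // Names of all targets of the given type that the base entity is related to.
    static List<String> targets(List<Relation> table, String baseName,
                                String relType, String targetType) {
        List<String> names = new ArrayList<>();
        for (Relation r : table) {
            if (r.baseName().equals(baseName)
                    && r.relType().equals(relType)
                    && r.targetType().equals(targetType)) {
                names.add(r.targetName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Relation> table = List.of(
                parse("SYST;ORDERS;OWNS;DOCU;CR_11_Credit.txt"),
                parse("DOCU;CR_11_Credit.txt;OWNS;SECT;CR_11_Credit"),
                parse("SECT;CR_11_Credit;USES;DATA;customer_credibility"),
                parse("SECT;CR_11_Credit;USES;DATA;minimum_credit"));
        System.out.println(targets(table, "CR_11_Credit", "USES", "DATA"));
    }
}
```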
The nouns taken from the change request and stored here as data used are then matched against the table of nouns taken from the requirements document to locate which requirement or use case is being targeted (see Sample 9).
DOCU;OrderEntry | ;OWNS;REQU;FUNC-REQ-02 OrderProcessing |
REQU;FUNC-REQ-02 OrderProcessing | ;USES;DATA;customer |
REQU;FUNC-REQ-02 OrderProcessing | ;USES;DATA;customer_number |
REQU;FUNC-REQ-02 OrderProcessing | ;USES;DATA;customer_credit |
REQU;FUNC-REQ-02 OrderProcessing | ;USES;DATA;minimum_credit |
REQU;FUNC-REQ-02 OrderProcessing | ;USES;DATA;credibility |
REQU;FUNC-REQ-02 OrderProcessing | ;EXEC;TEST;ORDERS0005 |
REQU;FUNC-REQ-02 OrderProcessing | ;USES;DATA;order_item |
REQU;FUNC-REQ-02 OrderProcessing | ;EXEC;TEST;ORDERS0006 |
REQU;FUNC-REQ-02 OrderProcessing | ;USES;DATA;article |
REQU;FUNC-REQ-02 OrderProcessing | ;USES;DATA;stock |
REQU;FUNC-REQ-02 OrderProcessing | ;USES;DATA;quantity |
REQU;FUNC-REQ-02 OrderProcessing | ;USES;DATA;order |
REQU;FUNC-REQ-02 OrderProcessing | ;EXEC;TEST;ORDERS0007 |
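The assignment rule stated earlier, namely that a request goes to the requirement with the greatest number of noun matches and that an inconclusive result is left to the requirements engineer, can be sketched in Java as follows. The class `RequestAssigner` and its methods are hypothetical. The sketch counts only exact, case-insensitive noun matches and treats a tie or a zero score as inconclusive, which is an assumption. The noun sets for CR-11 and FUNC-REQ-02 are copied from the two samples above, while the second requirement is invented purely for contrast.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

class RequestAssigner {

    // Lower-cases a noun set so that "Order" and "order" compare equal.
    static Set<String> lower(Set<String> nouns) {
        Set<String> out = new HashSet<>();
        for (String n : nouns) {
            out.add(n.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Number of nouns the request and the requirement have in common.
    static int sharedNouns(Set<String> requestNouns, Set<String> requirementNouns) {
        Set<String> common = lower(requestNouns);
        common.retainAll(lower(requirementNouns));
        return common.size();
    }

    // Requirement with the most shared nouns, or empty if nothing matches
    // or the best score is tied (the decision is then left to a human).
    static Optional<String> assign(Set<String> requestNouns,
                                   Map<String, Set<String>> requirements) {
        String best = null;
        int bestScore = 0;
        boolean tie = false;
        for (Map.Entry<String, Set<String>> e : requirements.entrySet()) {
            int score = sharedNouns(requestNouns, e.getValue());
            if (score > bestScore) {
                best = e.getKey();
                bestScore = score;
                tie = false;
            } else if (score == bestScore && score > 0) {
                tie = true;
            }
        }
        return (best == null || tie) ? Optional.empty() : Optional.of(best);
    }

    public static void main(String[] args) {
        Set<String> cr11 = Set.of("customer_credibility", "character_data_type", "length",
                "minimum_credit", "customers", "credibility", "order", "articles");
        Map<String, Set<String>> requirements = Map.of(
                "FUNC-REQ-02 OrderProcessing", Set.of("customer", "customer_number",
                        "customer_credit", "minimum_credit", "credibility", "order_item",
                        "article", "stock", "quantity", "order"),
                // Hypothetical second requirement, not from the article.
                "FUNC-REQ-99 Hypothetical", Set.of("invoice", "customer", "order"));
        System.out.println(assign(cr11, requirements));
    }
}
```

With these inputs FUNC-REQ-02 shares three nouns with the request (minimum_credit, credibility, order) and the invented requirement shares one, so the request is assigned to FUNC-REQ-02. Replacing the exact comparison with a substring comparison would also pair names such as customers and customer.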
6.2. Step 2 = Associating the Requirements to Use Cases
In the second step, the selected requirements are connected to use cases based on their common data names. Use cases have an m:n relationship to requirements. One use case may fulfill many requirements and one requirement can be fulfilled by many use cases. The other use cases affected by the requirement change must be included (see Sample 10).
DOCU;OrderEntry | ;OWNS;CASE;Customer_Order_Processing |
CASE;Customer_Order_Processing | ;USES;DATA;customer_order |
CASE;Customer_Order_Processing | ;USES;DATA;system |
CASE;Customer_Order_Processing | ;USES;DATA;customer |
CASE;Customer_Order_Processing | ;USES;DATA;customer_database |
CASE;Customer_Order_Processing | ;EXEC;TEST;ORDERS0084 |
CASE;Customer_Order_Processing | ;USES;DATA;credibility |
CASE;Customer_Order_Processing | ;USES;DATA; customer_credit |
CASE;Customer_Order_Processing | ;USES;DATA; minimum_credit |
CASE;Customer_Order_Processing | ;EXEC;TEST;ORDERS0085 |
CASE;Customer_Order_Processing | ;USES;DATA;customer_order |
6.3. Step 3 = Associating the Use Cases to the Code
In the third step the table of selected use cases is compared with the table of operations and operands extracted from the code. Those code segments with the most matches are then presented to the maintenance engineer as candidates for change. The final selection as to what code segments or methods to really change is left to the maintenance engineer (see Sample 11).
CLAS;CustomerOrder | ;OWNS;FUNC;CustomerOrder |
FUNC;CustomerOrder | ;OWNS;INTR;CustomerOrder |
INTR;CustomerOrder | ;RECV;PARM;cusOrder |
PARM;cusOrder | ;USES;TYPE;CustomerOrder |
CLAS;CustomerOrder | ;OWNS;FUNC;getCustomerOrder |
FUNC;getCustomerOrder | ;LINK;PARM;OrderNumber |
PARM;OrderNumber | ;USES;TYPE;int |
CLAS;CustomerOrder | ;OWNS;FUNC;checkCustomerCredit |
FUNC;checkCustomerCredit | ;LINK;PARM;MinimumCredibility |
PARM;MinimumCredibility | ;USES;TYPE;CreditRatings |
FUNC;checkCustomerCredit | ;LINK;PARM;CustomerCredibility |
PARM;CustomerCredibility | ;USES;TYPE;CreditRatings |
CLAS;CustomerOrder | ;OWNS;FUNC;setCustomerOrder |
FUNC;setCustomerOrder | ;LINK;PARM;CustomerOrder |
PARM;CustomerOrder | ;USES;TYPE;CustomerOrder |
The final result is a join of change request, requirement, use case and code segments. The maintenance engineer will receive a list of code segments which are candidates for change. A closer examination of the code will show what code really is affected by the change request. It is here that the impact analysis sets in. Once the directly affected code segments are known an impact analysis will reveal which other code segments may be impacted [RAWo08] (see Sample 12).
Change Request | Requirement | UseCase | CodeSegment |
---|---|---|---|
CR-11 Credit-Check | FUNC-REQ-02 OrderProcessing | Customer_Order_Processing | checkCustomerCredit |
CR-11 Credit-Check | FUNC-REQ-02 OrderProcessing | Billing | CreateInvoice |
CR-11 Credit-Check | FUNC-REQ-02 OrderProcessing | Back_Order_Processing | checkCustomerCredit |
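The join of change request, requirement, use case and code segment can be pictured with a small Java sketch. It assumes that the three preceding steps have each produced a list of links. The names `TraceJoin`, `Link` and `Trace` are hypothetical, and the nested loops stand in for what would be a join over tables in a relational database. The link data reproduce the rows of the sample above.

```java
import java.util.ArrayList;
import java.util.List;

class TraceJoin {

    // A single association produced by one matching step.
    record Link(String from, String to) {}

    // One row of the joined result.
    record Trace(String changeRequest, String requirement, String useCase, String codeSegment) {}

    // Chains request -> requirement -> use case -> code segment.
    static List<Trace> join(List<Link> requestToRequirement,
                            List<Link> requirementToUseCase,
                            List<Link> useCaseToCode) {
        List<Trace> traces = new ArrayList<>();
        for (Link a : requestToRequirement) {
            for (Link b : requirementToUseCase) {
                if (!a.to().equals(b.from())) {
                    continue;
                }
                for (Link c : useCaseToCode) {
                    if (b.to().equals(c.from())) {
                        traces.add(new Trace(a.from(), a.to(), b.to(), c.to()));
                    }
                }
            }
        }
        return traces;
    }

    public static void main(String[] args) {
        List<Link> step1 = List.of(new Link("CR-11 Credit-Check", "FUNC-REQ-02 OrderProcessing"));
        List<Link> step2 = List.of(
                new Link("FUNC-REQ-02 OrderProcessing", "Customer_Order_Processing"),
                new Link("FUNC-REQ-02 OrderProcessing", "Billing"),
                new Link("FUNC-REQ-02 OrderProcessing", "Back_Order_Processing"));
        List<Link> step3 = List.of(
                new Link("Customer_Order_Processing", "checkCustomerCredit"),
                new Link("Billing", "CreateInvoice"),
                new Link("Back_Order_Processing", "checkCustomerCredit"));
        for (Trace t : join(step1, step2, step3)) {
            System.out.println(t);
        }
    }
}
```

Each resulting row names a code segment that is a candidate for change. The subsequent impact analysis starts from these segments.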
7. Requirement Traceability Matrix
The requirement traceability matrix (RTM) has been described by Young [Young01] among others to document the relationships between requirements and other system entities, e.g. use cases, design documents, code segments, and test cases. It is a very useful instrument for visualizing these relationships; however, first the relationships have to be recognized. If one is addressing a large legacy system, it is quite possible that an RTM exists. This would provide an invaluable starting point. Alternatively, the relationships can be recognized by humans searching through the documents, code, and test cases, or they can be recognized by specialized software analyzing the documents, code, and test cases. In the case of existing legacy systems, the costs of human recognition are prohibitive. Another vital concern is that it is unlikely to find persons with knowledge of all the languages used (for example, German, UML, Java, SQL, and Test Script). One would most likely have to find multiple qualified people to accomplish this task. As proposed in this article, it is more cost-effective to build and provide specialized software which can process the languages and automatically recognize relationships between software entities including those from requirements to code.
This article has presented a method – feature analysis – for recognizing relationships between requirements and code. The method can potentially be extended to include UML design documents and test cases. It could also be extended to generate various requirements traceability matrixes. However, this can only be done once the trace links have been recognized. This article has focused on presenting a method and tool to develop those links in existing systems.
Summary
The feature analysis method of statically associating requirements to code presented here has been tested on a complex real-life system and found to be plausible. The tool for matching requirements to code is SofTrace, a member of the SoftEvol tool family for supporting software maintenance and evolution. All members of that tool family run in an MS-Windows environment and are available free of charge from Harry.Sneed@t-online.de. The algorithms for extracting the names are fairly mature. The algorithms for matching the names still have to be refined. The main obstacle to using this approach is the need to use names in the code similar to those used in the requirements. The open question is: how many names must match in a segment of code – procedure or method – in order to qualify that code as matching the requirement in question? More research is necessary concerning feature analysis in order to further strengthen and improve the method. Similar research efforts began in the early 1990s [Wild95]. Even now the tools can be useful in determining the degree of requirements coverage. This is a research area which deserves a lot more attention. Semantic comparison through static analysis is a major step in automating software quality assurance.
End Notes
- Perhaps the best discussion available concerning the topic of traceability is the article “Traceability”, by James D. Palmer, originally published in Software Requirements Engineering, R. H. Thayer and M. Dorfman, Eds., 1997, pp. 364-374. The article has been reprinted and is available as Appendix A in [Young06]. The article is closely related to this article because “good” systems should meet prioritized customer needs, not wants. Envision the reduced level of effort required to deliver a system that is developed utilizing prioritized customer needs rather than wants.
- A significant contribution of [Young01] is the author’s concept of “real requirements” as contrasted with “stated requirements”. Young observes that there is a huge difference between the two, and this difference accounts for many of our requirements-related problems. Historically, clients have not been able to articulate their real customer requirements and needs. Accordingly, an effective requirements process must provide for the time, resources, mechanisms, methods, techniques, tools, and trained requirements engineers familiar with the application domain to define the real customer needs. See Chapter 4, Define the Real Customer Needs, for a thorough discussion of this concept and how to implement it.
Acronyms Used in this Article
Acronym | Explanation |
---|---|
CALLS | One code entity invokes another |
CASE | Application use case |
CLAS | Class in the terminology of an object-oriented language |
CR | Change Request |
CSV | Comma-Separated Values |
DATA | Noun in the requirement document or variable in the code |
DOCU | The system requirements document |
EXEC | Base entity is tested by the target test case |
FUNC | Procedure or method in the code |
INTR | Interface defined in the code |
LINK | Two components communicate with each other |
OBJT | Business object |
PARM | Parameter in an interface |
RECV | Component receives a message |
REQU | Functional or non-functional requirement |
Requ Docu | Requirements Document |
SECT | Section of the requirements document |
SEND | Component sends a message |
SRC | Source code member or file in source library |
SYST | Name of the target application system |
TEST | Logical test case, i.e. test condition |
USES | Denotes the base entity uses the target entity |
Acknowledgements: Acknowledgement is made to the IT Service GmbH (ITSV), the software provider of the Austrian Social Security Agency, for providing the material used in the research presented here, and in particular to Herr Richard Ringbauer, the director of software development; and also to the International Requirements Engineering Board (IREB) for reviewing and advising on drafts of this article.
References and Literature
- [Anto02] Antoniol G., Canfora G., Casazza G., De Lucia A., and Merlo E. “Recovering Traceability Links Between Code and Documentation.” IEEE Transactions on Software Engineering, 28(10) 2002, p. 970.
- [Gueh06] Antoniol G., and Gueheneuc Y.-G. “Feature Identification: An Epidemiological Metaphor.” IEEE Transactions on Software Engineering, 32(9) 2006, p. 627.
- [Chec01] Chechik M., and Gannon J. “Automatic Analysis of Consistency between Requirements and Design.” IEEE Transactions on Software Engineering, 27(7) 2001, p. 651.
- [Dit13] Dit B., Revelle M., Gethers M., and Poshyvanyk D. “Feature Location in Source Code: A Taxonomy and Survey.” Journal of Software: Evolution and Process, 25(1) 2013, p. 53.
- [Lehm98] Lehman M. “Software’s Future: Managing Evolution.” IEEE Software, January 1998, p. 40.
- [RAWo08] Rovegard P., Angelis L., and Wohlin C. “An Empirical Study on Views of Importance of Change Impact Analysis Issues.” IEEE Transactions on Software Engineering, 34(4) 2008, p. 516.
- [Sned07] Sneed H. “Testing Against Natural Language Requirements.” IEEE Proceedings of Seventh International Conference on Quality Software (QSIC 2007), Portland, USA, October 2007, p. 380.
- [Wild95] Wilde N., and Scully M. “Software Reconnaissance: Mapping Program Features to Code.” Journal of Software Maintenance: Research and Practice, 7(1) 1995, p. 49.
- [Young01] Young R. Effective Requirements Practices, Boston MA USA: Addison-Wesley, 2001, p. 113.
- [Young06] Young R. Project Requirements: A Guide to Best Practices, Vienna VA USA: Management Concepts, 2006, pp. 201-222.
Harry M. Sneed has a Master’s Degree in Information Sciences from the University of Maryland, 1969. He has been working in the IT field since 1967, when he started as a FORTRAN programmer for the US Navy Department. He migrated to Germany in 1971 and worked first for the Federal University Administration and then for Siemens in the database area. In 1978 he set up the first commercial software test laboratory in Budapest. There he developed SoftSpec, the first German requirements engineering tool, in 1982. That tool was used to document the requirements in many large German organizations, including BMW, Bertelsmann, Thyssen Steel and the German Railways. It served not only to collect and store the requirements on the mainframe, but also to check their completeness and consistency and to generate a system design.
At the end of the 1980s Sneed moved over to the field of reverse engineering and reengineering and became involved in projects throughout Europe. In 2009 he received the Stevens Award from the IEEE Computer Society for his pioneering achievements in that field. He conducts courses at two technical colleges and two universities. He has published over 400 technical articles and written 23 books on the subjects of software testing, maintenance, migration and measurement. His work in requirements engineering is mainly in connection with reverse engineering, change management and test, three areas in which he still works as a freelance consultant.
Dr.-Ing. Birgit Demuth has over 20 years of experience as a researcher and educator in software engineering. Among other lectures, she teaches requirements engineering. Her interests center on how to apply model-based techniques in the continuous software life cycle. In a large-scale software project course, she coaches students and student tutors to elicit and model (often real) user requirements before they start to implement.