A Look at Zero-Defect Code
The US National Security Agency has released a case study showing how to develop zero-defect code in a cost-effective manner. The researchers of the project conclude that, if adopted widely, the practices advocated in the case study could help make commercial software programs more reliable and less vulnerable. I examined a small part of the case study's code, and was not impressed.
The Tokeneer case study involves an NSA-funded project carried out by the U.K.-based Praxis High Integrity Systems and Spre Inc. The project's materials, such as requirements, security target, specifications, designs, and proofs, and the code are now available online.
According to an article in the Government Computer News, Tokeneer meets or exceeds the Common Criteria Evaluation Assurance Level (EAL) 5. A related paper presented at the 1st IEEE International Symposium of Secure Software Engineering concludes that the case study has shown that software-based security products can be built so that they are reliable, verifiable and cost effective against Common Criteria guidelines, thus raising the bar for both procurers and suppliers.
I'm not the most qualified person to judge the project's requirements analysis, the formal specification, and formal refinement of the specification. However, intrigued by the statements regarding the project's software quality, and based on the legendary reputation of Praxis's work, I decided to download and have a look at the released code. I examined one of the largest source code files (tis_release/code/core/auditlog.adb), and in it I found difficult to maintain, error-prone code, a poor naming choice, inconsistent code formatting, and, even, what I think is a logic error (I sincerely hope somebody proves me wrong on the last one). My judgment criteria are relatively strict, but I think they are commensurate with the extraordinary claims made by the study.
Difficult to Maintain and Error-Prone Code
Here are excerpts of the code used for mapping the number of a log file to its name.
subtype LogFileIndexT is LogFileCountT range 1 .. MaxNumberLogFiles;
subtype FileNameI is Positive range 1 .. 16;
subtype FileNameT is String(FileNameI);
type LogFileNamesT is array (LogFileIndexT) of FileNameT;
LogFileNames : constant LogFileNamesT :=
LogFileNamesT'( 1 => "./Log/File01.log",
2 => "./Log/File02.log",
3 => "./Log/File03.log",
4 => "./Log/File04.log",
5 => "./Log/File05.log",
6 => "./Log/File06.log",
7 => "./Log/File07.log",
8 => "./Log/File08.log",
9 => "./Log/File09.log",
10 => "./Log/File10.log",
11 => "./Log/File11.log",
12 => "./Log/File12.log",
13 => "./Log/File13.log",
14 => "./Log/File14.log",
15 => "./Log/File15.log",
16 => "./Log/File16.log",
17 => "./Log/File17.log"
);
See how more than 20 lines of code are used for what I could express in C or Java with a couple of lines. like the following.
static String logFileName(int n) {
Formatter f = new Formatter();
return f.format("./Log/File%02d.log", n).toString();
}
There are a number of problems with this part of the study's code. First, the code violates the DRY (don't repeat yourself) principle, making it error prone, and difficult to change and maintain. Each array element initialization must be separately inspected by hand to verify that it matches the corresponding array index. The code also violates the single point of truth principle. Each time MaxNumberLogFiles changes, an entry must also be added to or removed from the array initialization. If the log file name or location changes, it must be changed 17 times, and if the string's length changes, one must also adjust the magic number 16 appearing in the declaration of FileNameI, which is the length of the filename used for initializing the array. Finally, the large number of repetitive lines inflates the productivity figure cited in the study (10,000 lines of code in 260 person-days, or about 38 lines of code per day).
A Poor Name Choice
A data structure named and documented as a list is actually what is commonly called a circular or ring buffer.
------------------------------------------------------------------
-- NextListIndex
--
-- Description:
-- Returns the next index, wrapping if necessary.
--
-- Implementation Notes:
-- None.
--
------------------------------------------------------------------
function NextListIndex(Value : LogFileIndexT) return LogFileIndexT
is
Result : LogFileIndexT;
begin
if Value = LogFileIndexT'Last then
Result := LogFileIndexT'First;
else
Result := Value + 1;
end if;
return Result;
end NextListIndex;
The above 21 lines implement what is commonly written inline using a simple modulo division.
Inconsistent Code Formatting
The code's developers, despite the development environment they use, seem to have trouble maintaining a consistent formatting style for spacing the code's elements. As an example, notice the spacing after LogFiles. We have cases where there is a space before the bracket, after the bracket, and no space at all.
LogFilesStatus (Index) := Free;
LogFiles( Index) := TheFile;
FileH := LogFiles(I);
A Logic Error?
A function, SystemFaultOccurred, is documented to return "True exactly when a critical system fault has occurred while attempting to maintain the audit log." This matches the corresponding specification (tis_release/docs/50_2_INFORMED_Design/50_2.pdf p. 43): "The operation SystemFaultOccurred indicates whether or not a critical system fault has occurred whilst writing to the log."
This is implemented by having
a global variable, AuditSystemFault
,
set very conservatively to true whenever something goes wrong in the system.
Here are the cases in the code where AuditSystemFault is set.
(OK is a variable set by various system functions.)
AuditSystemFault := AuditSystemFault or not OK;
AuditSystemFault := AuditSystemFault or not OK;
AuditSystemFault := AuditSystemFault or not OK;
AuditSystemFault := AuditSystemFault or not OK;
AuditSystemFault := True;
AuditSystemFault := True;
AuditSystemFault := True;
AuditSystemFault := True;
AuditSystemFault := not OK;
However, when a log file is deleted the previous value of AuditSystemFault is ANDed instead of ORed with the operation's result, thus clearing it if the deletion is successful, and failing to set it if the deletion fails, but no fault was detected before.
File.Delete(TheFile => TheFile,
Success => OK);
AuditSystemFault := AuditSystemFault and not OK;
The following table illustrates this strange behavior.
AuditSystemFault | OK | not OK | AuditSystemFault and not OK | |
False | False | True | False | |
False | True | False | False | |
True | False | True | True | |
True | True | False | False |
I found these problems in less than an hour, in the second source code file I looked (the first was extremely short). At the very least my findings indicate that formal methods are not a substitute or a guarantee of good programming practices.
Read and post comments