

CrossTalk - The Journal of Defense Software Engineering
May 2005 Issue

C++ Component Model Reengineering By Automatic Transformation
Dr. Robert L. Akers, Semantic Designs
Ira D. Baxter, Semantic Designs
Michael Mehlich, Semantic Designs
Brian Ellis, The Boeing Company
Kenn Luecke, The Boeing Company

Reengineering legacy software to use a modern component model can be accomplished by repeatedly applying a large number of semantically-sensitive program transformations, some of which synthesize new code structures, while others modify legacy code and meld it into the new framework. Using machinery that automates the process conquers the problems of massive scale, soundness, and regularity, and furthermore reduces time to completion by overlapping the project's design and implementation phases. This article describes experiences in automating the reengineering of a collection of avionics components to conform to a Common Object Request Broker Architecture-like component framework, using Design Maintenance System, a program analysis and transformation system designed for building software engineering tools for large systems.

Automated program transformation holds promise for a variety of software life-cycle endeavors, particularly where the size of legacy systems makes code analysis, reengineering, and evolution very difficult and expensive. But constructing transformation tools that handle the full generality of modern languages and that scale to very large applications is itself a painstaking and expensive process. This cost can be managed by developing a common transformation system infrastructure that is reused by an array of derived tools that each address specific tasks, thus leveraging the infrastructure cost over the various tools.

This article describes how the Design Maintenance System (DMS) tool-building infrastructure was employed to construct the Boeing Migration Tool (BMT), a custom, component-modernization application being applied to a large C++ industrial avionics system. The BMT automatically transforms components developed under a 1990s era component style to a more modern Common Object Request Broker Architecture (CORBA)-like component framework, preserving functionality but introducing regular interfaces for inter-component communication.

The authors describe the DMS technology and the BMT application itself to provide insight into how transformation technology can address software analysis and evolution problems where scale, complexity, and custom needs are barriers. The authors also discuss the development experience, present Boeing's evaluation of the results, and provide some insight for software modernization managers and engineers. The intent is not only to convey how this particular modernization effort worked, but also to give a flavor for what can be accomplished with automatic analysis and transformation and offer observations for approaching such an effort.

The DMS Software Reengineering Technology

The DMS technology supports the deep semantic analysis of software and the transformation of software based not only on syntactic structure, but also semantic understanding. It is fundamentally a tool-building technology, engineered to support arbitrary programming languages and designed for scale, so that very large software systems can be handled.

In essence, DMS is a generalization of compiler technology. A domain representing a programming language is created by introducing the language grammar to the environment. From this, DMS automatically generates a lexer that produces token streams, a parser that produces abstract syntax trees (ASTs), and a prettyprinter that reproduces source code from AST representations. Tools can be built upon (possibly multiple) domains by combinations of three techniques: (1) attribute evaluation, where tool-specific evaluation rules are attached to grammar constructs and triggered any time the targeted construct is encountered in an AST; (2) application of tool-specific rewrite rules, which are specified in terms of the domains in question and applied whenever the left hand side of the transform matches an AST and satisfies all other (arbitrary) conditions that are attached to the rule; and (3) custom code written in PARLANSE, the DMS native implementation language [1].
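To make the rewrite-rule idea concrete, here is a toy C++ illustration of our own (not DMS code; DMS states such rules declaratively over real language grammars): a miniature AST plus one rule, analogous to "\e + 0" -> "\e", applied wherever its left-hand side matches.

    #include <iostream>
    #include <memory>
    #include <string>

    // A deliberately tiny AST node; real DMS trees are generated from full grammars.
    struct Node {
        std::string op;                     // "+", "num", or "var"
        std::string text;                   // literal or identifier text
        std::shared_ptr<Node> left, right;
    };
    using Ast = std::shared_ptr<Node>;

    // Rewrite rule analogous to "\e + 0" -> "\e", applied bottom-up over the tree.
    Ast simplify(Ast n) {
        if (!n) return n;
        n->left = simplify(n->left);
        n->right = simplify(n->right);
        if (n->op == "+" && n->right && n->right->op == "num" && n->right->text == "0")
            return n->left;                 // left-hand side matched; replace with \e
        return n;
    }

    int main() {
        Ast zero = std::make_shared<Node>(Node{"num", "0", nullptr, nullptr});
        Ast x    = std::make_shared<Node>(Node{"var", "x", nullptr, nullptr});
        Ast sum  = std::make_shared<Node>(Node{"+", "", x, zero});
        std::cout << simplify(sum)->text << '\n';   // prints "x"
    }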

AST structures are suitable for arbitrary analysis, transformation, and prettyprinting, unlike the ASTs produced by standard compiler front ends for code generation. Attribute evaluation is general purpose, rather than limited to symbol table construction and code generation, as with a table-driven compiler. Arbitrary language translations are possible, rather than a compiler technology's limitation to rewriting a single source code language to a single object code language. Dialect variances (e.g., Visual C++ 6.0) from a core language (e.g., ANSI/ISO C++, ISO/IEC 14882:1998) are supported as well. Among over 70 domains implemented at various levels of completeness are C and C++, Ada, Java, C#, PHP, Verilog, VHDL, UML, JOVIAL, COBOL, FORTRAN, and Visual Basic, along with many of their dialects.

By combining the key elements of the DMS technology with the domain expertise of the requirements specifiers and tool-building engineers, nearly arbitrary software analysis tools can be constructed. Instrumentation tools can insert code in a (possibly multi-lingual) system for purposes like test coverage analysis or execution profiling. Pattern matching and rewriting have been used to identify and eliminate clone or dead code. Cross-platform migrations can be implemented primarily by transformation rules from the source to the target domains (since domains may be freely mixed in rewrite rules), with symbol tables derived for the source program by name and type resolution. Architecture extraction from source code can be done by a combination of data flow analysis (based on attribute evaluation) and pattern matching for key constructs. Code cleansing or optimization can be achieved with different kinds of rewrite rules. The scalable infrastructure of DMS allows these tools to operate on truly massive software systems with millions of lines of code and thousands of source files implemented in multiple source code languages. Figure 1 illustrates the architecture embodying the DMS technology.
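As a hedged sketch of the instrumentation case, the code below shows the kind of probe such a tool might insert for test coverage analysis; the probe function and the surrounding avionics-flavored routine are invented for illustration and are not Boeing or SD code.

    #include <cstdio>

    // Hypothetical probe; a real tool would more likely set a bit in a coverage table.
    static void coverage_probe(const char* unit, int point) {
        std::printf("COVERED %s:%d\n", unit, point);
    }

    // Original routine (invented), with probes inserted by the tool at each branch.
    int clamp_heading(int degrees) {
        coverage_probe("clamp_heading", 1);     // inserted
        if (degrees < 0) {
            coverage_probe("clamp_heading", 2); // inserted
            return 0;
        }
        coverage_probe("clamp_heading", 3);     // inserted
        return degrees % 360;
    }

    int main() {
        return clamp_heading(-10);              // exercises probes 1 and 2
    }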

A more complete overview of DMS is presented in [2], including discussion of how DMS was extensively used to create itself. For example, the DMS lexer generator, prettyprinter generator, and its name and type resolution analyzers for various languages are all tools created with DMS. Various other DMS-based tools are described on the Semantic Designs Inc. (SD) web site [3].


Figure 1: DMS is a multi-lingual, generalized compiler technology that supports static analysis, code enhancement, and cross-platform migration of source code.

The BMT

Boeing's Bold Stroke avionics component software architecture is based on the best practices of the mid-1990s [4]. Component technology has since matured, and the CORBA component model has emerged as a standard. The U.S. Defense Advanced Research Projects Agency (DARPA) Program Composition for Embedded Systems (PCES) program and the Object Management Group are sponsoring development of a CORBA-inspired standard real-time embedded system component model (CCMRT) [5], which offers standardization, improved interoperability, superior encapsulation, and interfaces for ongoing development of distributed, real-time, embedded systems such as Bold Stroke. This standardization also provides a base for tools for design and analysis of such systems, and for easier integration of newly developed technologies such as advanced schedulers and telecommunication bandwidth managers.

Boeing wishes to upgrade its airframe software to a more modern architecture, a proprietary CCMRT variant known as PRiSm. This will allow more regular interoperability, standardization across flight platforms, and opportunities for integrating emerging technologies that require CORBA-like interfaces. Yet since the legacy software is operating in mature flight environments, maintaining functionality is critical. The modernization effort, then, focuses solely on melding legacy components into the modern component framework without altering functionality.

The task of converting components is straightforward and well understood, but a great deal of detail must be managed with rigorous regularity and completeness. Since Bold Stroke is implemented in C++, the complexity of the language and its preprocessor requires careful attention to semantic detail. With thousands of legacy components now fielded, the sheer size of the migration task is an extraordinary barrier to success. Because of the C++ libraries involved, approximately 250,000 lines of C++ source contribute to a typical component, and a sound understanding of the component's name space requires comprehension of all this code.

To deal with the scale, semantic sensitivity, and regularity issues, DARPA, Boeing, and SD decided on an automated approach to component migration using a custom DMS-based tool. DMS, with its C++ front end complete with name and type resolution, its unique C++ preprocessor, its transformation capability, and its scalability, was an appropriate substrate for constructing a migration tool that blended code synthesis with code reorganization. Automating the migration process assures regularity of the transformation across all components and allows the examination of transformation correctness to focus primarily on the general transforms rather than on particular, potentially idiosyncratic examples.

The legacy component structure was essentially flat, with all a component's methods collected in a very few classes (often just one), each defined with .h and .cpp files. One principal piece of the migration involves factoring a component into facets, distinct classes reflecting different areas of concern. Some facets encapsulate various functional aspects and are specific to each component. Others capture protocols for inter-component communication. While communication protocols are common in style among all components, code specifics vary with the components' functional interfaces. In addition to sorting methods into thematic classes and constructing all the appropriate name spaces and wiring, the BMT must also extract idiomatic communication and configuration code from existing methods and isolate it in appropriate new methods and classes. Figure 2 illustrates the high-level restructuring task the tool must perform.


Figure 2: Transforming monolithic Boeing Bold Stroke components by factoring functionality and by isolating and regularizing communication interfaces.
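The following minimal C++ sketch shows, with invented names, the flavor of the restructuring in Figure 2: in the legacy style one class mixes functional logic with communication handling, while the migrated form sorts them into a functional facet and a separate communication class.

    namespace legacy {
        // Monolithic component class: functional logic and event handling intermixed.
        class Navigator {
        public:
            double currentHeading() const { return heading_; }
            void receiveEvent(int id) { if (id == 0) heading_ = 0.0; }
        private:
            double heading_ = 0.0;
        };
    }

    namespace migrated {
        // Functional facet: one area of concern, specific to this component.
        class HeadingFacet {
        public:
            double currentHeading() const { return heading_; }
            void reset() { heading_ = 0.0; }
        private:
            double heading_ = 0.0;
        };

        // Communication protocol isolated from the logic; common in style across components.
        class NavigatorEventSink {
        public:
            explicit NavigatorEventSink(HeadingFacet& facet) : facet_(facet) {}
            void deliver(int id) { if (id == 0) facet_.reset(); }
        private:
            HeadingFacet& facet_;
        };
    }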

Factoring a component into functional facets requires human understanding. Essentially, the legacy interface methods must be sorted into bins corresponding to the facets, and indicative names given to the new facet classes. To provide a clean specification facility for the Boeing engineers using the BMT, SD developed a simple facet specification language. For each component, an engineer names the facets and uniquely identifies which methods (via simple name, qualified name, or signature if necessary) comprise its interface. The bulk of the migration engineer's task is formulating facet specifications for all the components to be migrated, a very easy task for a knowledgeable engineer.

The BMT translates components one at a time. Input consists of the source code, the facet specification for the component being translated, and the facet specifications of all components with which it communicates, plus a few bookkeeping directives. Conversion-related input is succinct.

The facet language itself is defined as a DMS domain, allowing DMS to automatically generate a parser from its grammar. (The BMT therefore is a multi-domain application, employing both Visual C++ and the facet language.) An attribute evaluator over the facet domain traverses the facet specifications' ASTs and assembles a database of facts for use during component transformation.

After processing the facet specifications, the BMT parses and does full name and type resolution on the C++ source code base, including files referenced (via #include) by any of the components in play. The name resolver constructs a symbol table for the entire code base, allowing lookup of identifiers and methods with respect to any lexical scope. Only by internalizing the entire code base in this manner can symbol lookups and the transformations depending on them be guaranteed sound. This is one key point that defeats scripting languages as C++ transformers.
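A small illustration (our own, not Bold Stroke code) of why the whole-program symbol table matters: a purely textual or scripted rewrite of calls named update cannot tell the following cases apart, whereas name and type resolution binds each call to a specific declaration, so only calls that truly target a migrated method get redirected.

    namespace nav     { void update(double /*heading*/) {} }
    namespace weapons { void update(double /*azimuth*/) {} }

    struct Tracker {
        void update(double /*range*/) {}
        void tick(double r) { update(r); }      // resolves to Tracker::update
    };

    void refresh(double h, double a) {
        using namespace nav;
        update(h);                              // resolves to nav::update
        weapons::update(a);                     // resolves to weapons::update
    }

    int main() {
        Tracker t;
        t.tick(1.0);
        refresh(2.0, 3.0);
    }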

Four particular collections of transformations typify what the BMT does to perform the component migration:

  • New classes for facets and their interfaces are generated based on the facet specifications. The BMT generates a base class for each facet, essentially a standard form. A wrapper class is also generated, inheriting from the facet, and containing one method for each method in the functional facet's interface. These wrapper methods simply relay calls to the corresponding method in the component's core classes. Constructing the wrapper methods involves replicating each method's header and utilizing its arguments in the relayed call. Appropriate #include directives must be generated for access to entities incorporated for these purposes, as well as for standard component infrastructure. A nest of constructor patterns expressed in the DMS pattern language pulls the pieces together into a class definition. This is an example of pattern-based code synthesis; a C++ sketch of the wrapper-and-relay pattern appears after this list.
  • After constructing the facets and wrappers, the BMT transforms all the legacy code calls to any of the facets' methods, redirecting original method calls on the core class to instead call the appropriate wrapper method via newly declared pointers. The pointer declarations, their initializations, and their employment in access paths are all inserted using source-to-source transforms, the latter with conditionals to focus their applicability. Figure 3 illustrates a variant of the facet reorganization and adjustment of access paths.
  • Newly generated receptacle classes provide an image of the outgoing interface of a component to the other components whose methods it calls. Since a particular component's connectivity to other components is not known at compile time, the receptacles provide a wiring harness through which dynamic configuration code can connect instances into a flight configuration. Constructing the receptacles involves searching all of a component's classes for outgoing calls and generating code to serve each connection accordingly. This is a combination of synthesis and semantically-directed transformation.
  • Event sinks are classes that represent an entry point through which an event service can deliver its product. Since the code for event processing already exists in the legacy classes (though its location is dispersed and not specified to the BMT), synthesizing event sinks involves having the BMT identify idiomatic legacy code by matching against DMS patterns for those idioms. Code thus identified is moved into the new event sink class, which is synthesized with a framework of constructive patterns. Definitions and #include directives supporting the moved code must also be constructed in the event sink class. Event sink extraction and synthesis thus exemplifies pattern-based recognition of idiomatic forms, together with namespace and code reorganization and simplification via semantically informed transformation.
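The sketch below, with invented names, illustrates the wrapper-and-relay pattern and the redirected access path described in the first two items above; the real BMT output follows Boeing's PRiSm conventions rather than this simplified shape.

    #include <iostream>

    class NavCore {                              // residual legacy core class
    public:
        double currentHeading() const { return heading_; }
    private:
        double heading_ = 42.0;
    };

    class HeadingFacetBase {                     // generated facet base class (standard form)
    public:
        virtual ~HeadingFacetBase() = default;
        virtual double currentHeading() const = 0;
    };

    class HeadingFacetWrapper : public HeadingFacetBase {   // generated wrapper
    public:
        explicit HeadingFacetWrapper(NavCore& core) : core_(core) {}
        double currentHeading() const override {
            return core_.currentHeading();       // relay to the core class
        }
    private:
        NavCore& core_;
    };

    // Legacy call site, before:  double h = core.currentHeading();
    // After the BMT rewrite, the call goes through a newly declared facet pointer:
    double clientHeading(HeadingFacetBase* headingFacet) {
        return headingFacet->currentHeading();   // redirected access path
    }

    int main() {
        NavCore core;
        HeadingFacetWrapper wrapper(core);
        std::cout << clientHeading(&wrapper) << '\n';   // prints 42
    }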


Figure 3: Engineer-specified component facetization, illustrating refactoring of class structure and redirection of references.

Experience

Boeing has extensive expertise in avionics and component engineering, but only a nascent appreciation of transformation technology. The tool builder, SD, understands transformation and the mechanized semantics of C++, but had only cursory prior understanding of CORBA component technology and avionics. Other operational issues in the conversion were the strict proprietary nature of most of the source code, uncertainty about what exotic C++ features might turn up in the large source code base, Boeing's evolving understanding of the details of the target configuration, and the geographical separation between Boeing and SD.

To deal with most of these issues, Boeing chose a particular non-proprietary component and performed a hand conversion, thus providing SD with a concrete image of source and target and a benchmark for progress. The hand conversion forced Boeing to confront the details of the target design early. New requirements emerged in mid-project, modifying the target (on one occasion very significantly), but the flexibility of the technology allowed SD to adjust the tool with manageable rework. Being unburdened by application knowledge, SD was able to focus purely on translation issues, removing from the conversion endeavor the temptation to make application-related adjustments that could add instability. Electronic communication of benchmark results provided a basis for ongoing evaluation, and phone conferences supported development of sufficient bilateral understanding of tool and component technologies, minimizing the need for travel.

SD's lack of access to the full source code base required the tool builders to prepare for worst case scenarios of what C++ features would be encountered by the BMT in the larger source base. This forced development of SD's C++ preprocessing and name resolution infrastructure to handle the cross product of preprocessing conditionals, templates, and macros. These improvements both hardened the tool against unanticipated stress and strengthened the DMS infrastructure for future projects.

Evaluation

Boeing used the BMT to convert two large components handling navigation and launch area region computations to the PRiSm Component Model for use in the successful DARPA PCES Flight Demonstration, which occurred in April 2005.

Tool input was extremely easy to formulate, generally taking around 20 minutes per component. The tool's name resolution capability proved to be a major advantage over the scripting approaches typically used for such conversions, which often introduce unpredictable side effects that the BMT avoids. The generated and modified code was regular in style and free of unexpected errors.

Human engineers did a modest amount of hand-finishing, mostly in situations understood in advance to require human discretion. The tool highlighted these places with comments recommending a particular action and sometimes generated candidate code. Roughly half the time, this candidate code was sufficient; other cases required hand modification. The most difficult part of the conversion was the event sink mechanization. For input components that had their event mechanization spread across several classes, the BMT correctly moved the code fragments into the event sinks, but the human engineer was required to update some code inside the transitioned fragments. This would have been necessary with a hand conversion as well. More extensive engineering of the BMT might have eliminated some or all of this requirement.

Return on Investment

Developing a custom migration tool takes significant effort. Not including what the DMS environment contributed, the BMT required 13,000 lines of new tool code. This development cost must be balanced against the alternative cost of doing the tool's work by hand, so the tradeoff for mechanized migration depends predominantly on the amount of code being ported. For small applications, the effort is not worthwhile, but with even a modest-sized legacy system, the economics quickly turn positive.

One benchmark legacy component conversion gives an idea of the scale involved. The legacy component, typical in size and complexity, contained 9,931 lines of source code. The BMT-converted component contained 9,456 lines, including 2,109 lines of code in newly generated classes and 2,222 modified lines in the residual legacy classes. Scaling these numbers (roughly 4,300 new or modified lines per component), converting a mere 60 components would require revision or creation of over 250,000 lines of code. Porting four such components would produce more new and modified code than went into the BMT itself. With airframes typically containing thousands of components, the economic advantage of mechanized migration is compelling.

The BMT automates only the coding part of the migration. Testing and integration are also significant factors, and some hand polishing of the BMT output was required. Even so, the savings in coding time alone cut roughly in half the total time required to migrate the components used for the DARPA PCES demonstration. The regularity of style in automatically migrated code provides a less quantifiable but nevertheless worthwhile extra value.

The measure of success is not whether a migration tool achieves 100 percent automation, but whether it saves time and money overall. Boeing felt that converting 75 percent of the code automatically would produce significant cost savings, a good rule of thumb for modest-sized projects; anything less puts the benefit in a gray area. The code produced by the BMT was 95 percent to 98 percent finished. This number could have been driven higher, but the additional tool development cost did not justify the dwindling payoff in this pilot project.

Cost-benefit tradeoffs should be considered when scoping the task of a migration tool, even as the project is in progress. In this project, for example, we could have developed elaborate heuristics for consolidating event sink code, but we judged the expense to not be worthwhile for the pilot project. Any project of this kind would face similar issues.

Huge projects would easily justify greater refinement of a migration tool, resulting in less need for hand polishing the results and thus driving coding costs ever lower. Mechanization can mean the difference between feasibility and infeasibility of even a medium-sized project.

Technological Barriers

One technical difficulty is the automatic conversion of semantically informal code comments. Though comments are preserved through the BMT migration, what they say may not be wholly appropriate for the newly modified code. Developing an accurate interpretation of free text discussing legacy code and modifying these legacy comments to reflect code modifications would challenge the state of the art in both natural language and code understanding. So while new documentation can be generated to accurately reflect the semantics of new code, legacy documentation must be viewed as subject to human revision.

Though the DMS C++ preprocessor capability, with its special treatment of conditionals, was up to the task for this migration, extremely heavy use of the C/C++ preprocessor exploiting dialect differences, conditionals, templates, and macros can lead to an explosion of possible semantic interpretations of system code and a resource problem for a migration tool. Preserving all these interpretations, however, is necessary for soundness.

Furthermore, since macro definitions and invocations must be preserved through migration, macros that do not map cleanly to native language constructs (e.g., producing only fragments of a construct, or fragments that partially overlap multiple constructs) are very difficult to maintain. Though these unstructured macro definitions cause no problem for compilers, since they are expanded away before the semantic analysis of any single compilation, preserving them in the abstract representation of a program for all cases is extremely difficult.
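A hypothetical example of the problem: the two macros below each expand to a fragment of a statement block rather than to a whole construct. A compiler never sees the macros, only their expansion, but a source-to-source tool that must emit the macros back into the migrated code has to represent this partial, conditional structure faithfully.

    #include <mutex>

    // Invented macros in the problematic style: each produces only a fragment of a block.
    #ifdef SIM_BUILD
    #  define BEGIN_GUARD  { std::lock_guard<std::mutex> guard(busMutex());
    #  define END_GUARD    }
    #else
    #  define BEGIN_GUARD  {
    #  define END_GUARD    }
    #endif

    std::mutex& busMutex() { static std::mutex m; return m; }

    double state;

    void update(double next) {
        BEGIN_GUARD                 // opens a scope that only END_GUARD closes
        state = next;
        END_GUARD
    }

    int main() { update(1.0); }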

All these factors suggest that for some projects, a cleanup of preprocessor code prior to system migration is in order. For reasons of scale and complexity, this is a separate problem that could be tackled with another automated, customized tool.

Observations

A few over-arching observations apply to this and other mass transformation projects:

  • Mass migrations are best not mingled with changes in business logic, optimization, or other software enhancements. Entangling tasks muddies requirements, induces extra interaction between tool builders and application specialists, and makes evaluation difficult, at the expense of time and money. Related tasks may be considered independently, applying new transformation tools if appropriate.
  • Automating a transformation task helps deal with changing requirements. Modifying a few rewrite rules, constructive patterns, and organizational code is far easier and results in a more consistent product than revising a mass of hand-translated code. Changes implemented in the tool may manifest in all previously migrated code by simply rerunning the modified tool on the original sources. This allows blending the requirements definition timeframe into the implementation timeframe, which can significantly shorten the whole project.
  • Cleanly factoring a migration task between tool builders and application specialists allows proprietary information to remain within the customer's organization while forcing tool builders toward optimal generality. Lack of access to proprietary sources, or more generally lack of full visibility into a customer's project, induces transformation engineers to anticipate problems and confront them in advance by building robust tools. Surprises therefore tend to be less overwhelming.
  • Automated transformation allows the code base to evolve independently during the migration tool development effort. To get a final product, the tool may be rerun on the most recent source code base at the end of development. There is no need for parallel maintenance of both the fielded system and the system being migrated.
  • Using a mature infrastructure makes the construction of transformation-based tools not just economically viable, but advantageous; building them from scratch is infeasible. Language front ends and analyzers, transformation engines, and other components are all very significant pieces of software. The BMT contains approximately 1.5 million lines of source code, but most is infrastructure; only 11,000 lines of code are BMT-specific. Furthermore, off-the-shelf components are inadequate to the task. For example, lex and yacc do not produce ASTs suitable for manipulation. Only a common parsing infrastructure can produce AST structures that allow a rewrite engine and code-generating infrastructure to function over arbitrary domain languages and combinations of languages.
  • Customers can become transformation tool builders. There is a significant learning curve in building transformation-based tools. A customer seeking a single tool can save money by letting transformation specialists build it. But transformation methods are well-suited to a range of software life-cycle tasks, and engineers can be trained to build tools themselves and incorporate the technology into their operation with great benefit and cost savings.

Future Directions

The PRiSm and CORBA component technologies impose computational overhead as service requests are routed through several new layers of component communication protocol. A DMS-based approach to partial evaluation could relieve this overhead. Essentially, the extra layers exist to provide separation of concerns in design and coding and to provide plug-and-play capability at configuration time. With semantic awareness of the component wiring present in the source code, though, a transformation tool could be developed to statically evaluate the various communication indirections, thus sparing that run-time overhead. In this highly performance-sensitive environment, the effort could be well justified.
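A hedged sketch of the idea, with invented names: when the wiring is fixed at configuration time, the generic call through the facet interface (headingViaReceptacle below) could be specialized into a direct, statically bound call (headingDirect), sparing the run-time indirection.

    struct HeadingFacetBase {                    // facet interface seen through a receptacle
        virtual ~HeadingFacetBase() = default;
        virtual double currentHeading() const = 0;
    };

    struct HeadingFacetImpl final : HeadingFacetBase {   // the concrete facet actually wired in
        double currentHeading() const override { return 42.0; }
    };

    // Before: generic call through the component communication layers.
    double headingViaReceptacle(const HeadingFacetBase* receptacle) {
        return receptacle->currentHeading();     // virtual dispatch at run time
    }

    // After partial evaluation with the wiring known: statically bound and inlinable.
    double headingDirect(const HeadingFacetImpl& facet) {
        return facet.currentHeading();
    }

    int main() {
        HeadingFacetImpl facet;
        return static_cast<int>(headingViaReceptacle(&facet) - headingDirect(facet));  // 0
    }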

References

  1. Semantic Designs, Inc. "PARLANSE Reference Manual." Austin, TX: Semantic Designs, Inc., 1998.
  2. Baxter, I. D., C. Pidgeon, and M. Mehlich. "DMS: Program Transformations for Practical Scalable Software Evolution." Proc. of the 26th International Conference on Software Engineering, 2004.
  3. Semantic Designs, Inc. 13 Apr. 2005 http://www.semanticdesigns.com.
  4. Sharp, David C. "Reducing Avionics Software Cost Through Component-Based Product Line Development." Proc. of the 1998 Software Technology Conference, Salt Lake City, UT.
  5. Gidding, V., and B. Beckwith. "Real-Time CORBA Tutorial." OMG Workshop on Distributed Object Computing for Real-Time and Embedded Systems, Arlington, VA, 14-17 Jul. 2003 www.omg.org/news/meetings/workshops/rt_embedded2003.htm.

Acknowledgements

The authors thank the Defense Advanced Research Projects Agency-Program Composition for Embedded Systems program for its funding.


About the Authors
Dr. Robert L. Akers

Robert (Larry) Akers, Ph.D., is the lead developer of the Boeing Migration Tool at Semantic Designs Inc. At Semantic Designs Inc., SciComp Inc., and Computational Logic Inc., he has devoted 25 years to applied research in mathematical modeling of digital systems, software language design and semantics, automated software synthesis, and automated analysis and transformation of software systems. Akers has a master's degree and doctorate in computer science from the University of Texas at Austin.

Semantic Designs Inc.
12636 Research BLVD STE C214
Austin, TX 78759-2200
Phone: (512) 250-1018
Fax: (512) 250-1191
E-mail: lakers@semanticdesigns.com



Ira D. Baxter

Ira D. Baxter founded Semantic Designs Inc. in 1996, winning a $2 million National Institute of Standards and Technology award to develop the PARLANSE language and the basic technology for the Design Maintenance System Software Engineering Toolkit, of which he is the principal architect. Baxter has been designing software systems since 1976 and has extensive experience in operating systems, compilers, and software engineering. He has been co-chair and invited speaker at various conferences on software engineering, maintenance, and reuse.

Semantic Designs Inc.
12636 Research BLVD STE C214
Austin, TX 78759-2200
Phone: (512) 250-1018
Fax: (512) 250-1191
E-mail: idbaxter@semanticdesigns.com



Michael Mehlich

Michael Mehlich is the Lead Research Engineer for the development of the Design Maintenance System Toolkit at Semantic Designs, with major contributions to its core infrastructure and industrial applications. Mehlich is an expert on information theory with a special focus on formal methods and tools for software engineering, which he has pursued over a 10-year career in academia and industry.

Semantic Designs Inc.
12636 Research BLVD STE C214
Austin, TX 78759-2200
Phone: (512) 250-1018
Fax: (512) 250-1191
E-mail: mmehlich@semanticdesigns.com



Brian Ellis

Brian Ellis is a software engineer with the Boeing Company. He is currently supporting Defense Advanced Research Projects Agency programs within Boeing's Phantom Works Division. His specialty is real-time embedded software development for military avionics applications. He has been involved in the development of the F/A-18, AV8B, and Tomahawk avionic systems.

The Boeing Company
6200 J. S. McDonnell BLVD
Berkeley, MO 63134
Phone: (314) 234-2011
Fax: (314) 233-8323
E-mail: brian.j.ellis@boeing.com



Kenn Luecke

Kenn Luecke is a software engineer with the Boeing Company. He is currently developing software on the Network Centric Organization's PCES II project for Boeing's Phantom Works division. He has previously developed software for the AV8B and FA18 Mission Computer Software teams for McDonnell Douglas and the Boeing Company.

The Boeing Company
6200 J. S. McDonnell BLVD
Berkeley, MO 63134
Phone: (314) 232-7178
Fax: (314) 233-8323
E-mail: kenn.r.luecke@boeing.com