Keynote:
DMS (Transformational Software Maintenance by Reuse):
A Production Research System?


Ira D. Baxter
Semantic Designs, Inc.
12636 Research Blvd, #C214
Tel: (512) 250-1018, fax: (512)250-1191
idbaxter@semdesigns.com
http://www.semdesigns.com

Extended Abstract

The Design Maintenance System™ (DMS™) is a research vision and a set of tools for semi-automated construction and maintenance of medium to large-scale software systems. Rather than reusing code, DMS is intended to apply program transformation technology, (re)use construction knowledge from multiple domains, and (re)use the design of the target software system. For the DMS vision to be successful in practice, not only must the related research problems be solved, but the related scale problems must also be solved. A considerable amount of domain knowledge must be acquired, and DMS must be able to handle large codes in multiple legacy languages. To obtain enough time and resources to handle such scale, DMS must be successful in a commercial sense. We walk a difficult line between Research and Product. The DMS vision is sketched, and the technology we have used to build a scalable, near-production foundation will be described. We mention some DMS applications.

The Design Maintenance Vision

Some 80% of Software Engineering lifecycle costs are invested in "software maintenance" [Grady87], and yet there are few decent methods and tools for improving this activity. (These figures also apply to software built using reuse techniques). We believe that a key problem is absence of a tool-manipulable design, containing a specification, design decisions, implementation decisions, and decision rationale. Such a tool will be most effective if has deep semantic understanding of the problem domain as well as the implementation domains.

Transformation systems [Agresti86] [Partsch90] [Batory98] are systems that apply rules to port, optimize, compile or otherwise modify syntactic structures. Problem domain notations (factory process diagrams, constraint languages, UML, algebra, state machines, Java) provide the syntax. The notion of "correctness preserving" transforms is based on the semantics of the domain, and provides one class of "deep semantic understanding". Ideally, the problem domain semantics are written down (in yet another domain notation for specifying semantics) in a way that can be harnessed by analyzers, etc; technically, the transforms can be derived from the semantics.

DMS follows the Draco model [Neighbors84] in reusing (multiple) domain analysis/engineering for code generation in the form of domains, having formal syntax and reusable generative knowledge in the form of transforms. DMS extends this model to include metaprograms to control the choice of applied transforms [Wile83] augmented with performance goals (XCL).

A (transformational) design is: [Baxter90]:

  1. A specification written in some domain notation, stating both functionality and performance goals
  2. An implementation coded in some other (lower level) domain notation (e.g., C++)
  3. A set of functionality-preserving transformation steps mapping the spec to the code
  4. A rationale for each transformation step, which justifies its use in achieving the performance goals, derived from XCL goal/execution traces.

A scheme for writing formal maintenance deltas (functional, performance, technology and others) and for using these deltas to semi-automatically, incrementally revise the design is sketched in [ABPF85, Baxter90, Baxter92, BaxPid97]. The approach revalidates design decisions in the new context, and can discard decisions that no longer apply, making it more sophisticated than simple decision replay [Gold90]. The revised design contains updated implementation code. DMS is the research vehicle to test out this idea in practice. DMS will reuse (domain) synthesis knowledge, and design knowledge of a particular implementation.

Scale

A significant problem to testing this idea is scale, along a number of fronts:

  1. Real languages, such as COBOL/85, are huge, messy, and inconvenient to parse/analyze/prettyprint.
  2. The knowledge required to understand an application comes from many domains (problem, computer science, and target language), almost independent of application size.
  3. Collecting enough knowledge canít be justified for small applications.
  4. The DMS tools themselves are a large application, and require considerable energy to design and implement.
  5. One must reverse-engineer legacy systems to obtain the design information necessary for a DMS-like system.
  6. Big applications require significant computational resources to manage, especially for symbolic computation such as transformation systems.

We have chosen to tackle the scale problem because of the potential payoff.

To handle real languages, we have implemented strong (GLR) parser technology [Wagner97] and special methods for crusty preprocessors. The parsers can handle ambiguous languages, automatically build abstract syntax trees, collect comments and lexical formatting information, and prettyprinters can regenerate source from the AST with high fidelity on a million lines of source. We have already built target domains for C/C++, COBOL/2, Fortran95, Java, Pascal, and to demonstrate the potential for DMS application to hardware design, VHDL and Verilog.

For domain acquisition, the parsers accept source-to-source transforms, and we have built an algebraic specification domain [SPECTRUM93] in which to specify semantic foundations, abstract data types and rewrite rules. We expect to invest in other computer science domains such as IDL and SQL next.

Since DMS is a transformational synthesis system, it can aid in its own construction. We have domains and generators for parsers, attribute evaluators, and prettyprinters. A significant fraction of the DMS code is automatically generated by DMS.

The design of large legacy systems is always lost, and so a transformational design is not available. One must be recovered through unconventional reverse engineering, which involves proposing, in reverse, a plausible transformation sequence that untangles optimizations and abstracts implementations back to domain concepts [BaxMeh97]. The very domain synthesis knowledge required to generate and maintain such applications appears to be the crucial information required to carry this out. The reverse and re-engineering communities have not yet really discovered the conceptual power that a forward synthesis engine offers to the reverse engineer.

Computing power we obtain through PARLANSE, a programming language for expressing the irregular, small-grain parallelism that seems natural to large, irregular data structures. PARLANSE runs on commodity symmetric multiprocessors. All of DMS is coded in PARLANSE, on the grounds that one cannot add parallelism after the fact to a large application. Interestingly, generated attribute evaluators use partial-order parallelism to obtain nearly linear speedups. In the long term, we expect to compile PARLANSE transformationally using DMS.

Practicality

Building research systems that require scale to demonstrate practical value is:

  1. Impractical in universities because large, long-lasting teams are implicitly discouraged, and
  2. Generally unfundable in the commercial sector due to long gestation times and unacceptable perceived risk.

Such research systems must gain seed money from unusual sources. One possible approach, which we have taken, is to build with the long-term intention of being commercially successful. Although it is our goal to be successful in the long term, one need not necessarily deliver early or even any commercial value from the long term vision Rather, the intermediate steps themselves may prove by design to be quite valuable. For DMS, we think this is likely true for the domain-oriented approach and our scale investment. We remark that the long-term DMS vision has been a significant benefit rather than a hazard, as it provides us with considerably generality for the DMS components.

DMS was launched via Department of Commerce Advanced Technology Program funding, allowing us several years of investment to establish the base technology. We have endeavored to build DMS "bottom up" to aid commercial viability, and have consequently concentrated on scale and handling of legacy languages, with emphasis on reengineering support rather than design capture. A downside of this approach is the risk of being sidetracked; we hope the research background of the principals will prevent the latter. Recently, the DMS toolkit is verging on commercial viability for some of its (re-engineering) aspects, such as clone detection (below). Such applications are necessary to continue funding the long-term vision.

The traditional research method of focusing on a single result has the advantage of obtaining an interesting demonstration at modest cost. But it has the significant defect of often producing a toy, which cannot be scaled into practice because the implementation foundation is hurried, or because in fact its small scale allowed it gloss over significant troubles. We think many software engineering demonstrations today suffer from this problem. We hope that DMS has sufficient scale so that the all the troubles will appear, and hope its commercial practicality at some level give us the resources to attempt appropriate fixes.

Early Applications of DMS

DMS has been/is being applied to interesting problems:

  1. Large scale Boolean (domain) expression simplification: A Boolean expression DAG equivalent to some 120,000 terms was simplified to 250 terms by using axioms, extracted from a Boolean algebra specification, as rewriting rules supported by DMSís term rewriting engine.
  2. Generation of factory controller code (domain) from a domain-specific factory process automation language.
  3. Detection of exact and near-miss code clones [BaxterEtAl98] and removal by macro/procedural abstraction, for full C, C++, and COBOL.

Clone detection [BaxterEtAl98] shows singular early promise. It matches early DMS capability in using scale parsing in legacy languages, and the O(N^2) clone-finding process is linearly sped up by careful coding of the detector in PARLANSE. We were surprised to find that large application suites seem to consistently have 10% removable redundancy. Removing clones has the potential of cutting maintenance costs proportionately. A useful side effect of commercial clone detection is the validation on scale of legacy domains, and access to large applications with a friendly client. We also observed that most clones appear to domain abstraction instances, which suggests that clone detection can aid domain engineering.

We are considering application of DMS to VLSI, where the in-vogue design methodology produces several design representations for a single chip (architecture, register transfer level, gate level, simulators, etc.). In this world, the problems are synthesis of lower levels of design from higher levels and validation of one level from another. This matches the DMS multiple domain model quite well. Remarkably, the engineering management of such organizations seems more receptive to the DMS vision than most software organizations.

Conclusion

DMS is a large research tool for the long-term investigation of semiautomatic software maintenance. It provides means to capture and reuse domain knowledge (syntax, semantics, generative knowledge), and eventually means to capture design information and update it from change specifications.

Such a tool faces scale and high engineering costs. We address scale by modularizing knowledge via domains, building robust parsing/prettyprinting technology, aiding entry of domain information with algebraic specifications of rewrites, and a parallel computing foundation. High engineering costs we face by attempting to make the tool commercially viable, to fund its own development.

To date, we have some 20 man-years invested. DMS is showing early practical promise in its clone-detection/removal ability on large-scale legacy COBOL codes. We expect to try scale custom reengineering tasks with DMS in 1999 and move on to design capture and revision.

We thank the Department of Commerce for ATP Award: 70NNANB5H1165.

References

[ABPF86] Guillermo Arango, Ira Baxter, Christopher Pidgeon, Peter Freeman, "TMM: Software Maintenance by Transformation", IEEE Software 3(3), May 1986, pp. 27-39

[Agresti86] William W. Agresti, What are the New Paradigms?, New Paradigms for Software Development, William W. Agresti, IEEE Press, 1986, ISBN 0-8186-0707-6.

[Batory98] D. Batory, "JTS: Tools for Implementing Domain-Specific Languages", Proceedings of 5th International Conference on Software Reuse, 1998, IEEE.

[Baxter90] Ira Baxter, Transformational Maintenance by Reuse of Design Histories, Ph.D. Thesis, ICS Department, University of California at Irvine, November 1990, TR 90-36.

[Baxter92] Ira Baxter, "Design Maintenance Systems", Communications of the ACM 35(4), April 1992.

[BaxMeh97] I. Baxter and M. Mehlich. "Reverse Engineering is Reverse Forward Engineering". 4th Working Conference on Reverse Engineering, 1997, IEEE

[BaxPidg97] I. Baxter and C. Pidgeon. "Software Change Through Design Maintenance". Proceedings of International Conference on Software Maintenance, 1997, IEEE

[BaxterEtAl98] I. Baxter, A. Yahin, A., L. Moura, M. Sant'anna, L. Bier. "Clone Detection Using Abstract Syntax Trees" -Proceedings of International Conference on Software Maintenance '98, 1998, IEEE Press.

[Gold90] A. Goldberg. "Reusing Software Developments". ACM SIGSOFT Symposium on Software Development Environments. Kestrel Institute Technical Report KES.U.90.2, 1990.

[Grady87] R. Grady, Software Metrics: Establishing a Company-Wide Program, Prentice-Hall, 1987.

[Neighbors84] J. Neighbors. "The Draco Approach to Constructing Software from Components". IEEE Transactions on Software Engineering 10(5), 1984.

[Partsch90] Helmut A. Partsch, Specification and Transformation of Programs, Springer-Verlag, 1990.

[SPECTRUM93] M. Broy, C. Facchi, R. Grosu, R. Hettler, H. Hußmann, D. Nazareth, F. Regensburger, O. Slotosch, and K. Stølen. "The Requirements and Design Specification Language Spectrum: An Informal Introduction", Version 1.0. Technical Reports TUM-I-9311 and TUM-I9312, Technische Universität München, 1993

[Wagner97] Tim Wagner and Susan Graham, "Incremental Analysis of Real Programming Languages", Proceedings 1997 SIGPLAN Conference on Programming Language Design and Implementation, June 1997, ACM

[Wile83] D. Wile, Program Developments: Formal Explanations of Implementations, CACM 26(11), Nov. 1983