Editorial Special Issue Artificial Intelligence in Medicine "Architectures for Intelligent Systems Based on Reusable Components"

June 1995

M. A. Musen [1] A. Th. Schreiber [2]

[1] Section on Medical Informatics, Stanford University School of Medicine, Stanford, California, 94305--5479

[2] Social Science Informatics, University of Amsterdam Roetersstraat 15, 1018 WB Amsterdam, The Netherlands

Background

In recent years, there has been considerable pessimism about the practical application of decision-support technology in medicine. On the surface, this pessimism appears paradoxical. Given the dramatic growth of medical information and knowledge, there should be enormous demand for computer systems that aid in information management. Even if such systems are imperfect, they still ought to be used. Many of us, for example, complain bitterly about the deficiencies of our word processors, yet we continue to use them routinely anyway. Why, then, are so few people clamoring for the decision-support software that the medical-informatics community continues to build?

This seeming paradox is at the center of many ongoing debates. Heathfield and Wyatt (1993) suggest that not enough attention is given to practical issues, such as whether users actually benefit from a system. At the same time, the need for proper organizational analysis has been emphasized by workers outside the medical field (Klinker et al., 1993); decision-support systems cannot be designed in isolation, but must meet the work flow requirements of complex organizations. Heathfield and Wyatt argue that there is a need for sound development methodologies for decision-support systems. Indeed, the literature describes countless systems that have been built in an ad hoc fashion without clear theoretical underpinnings. It is our contention that the lack of principled development strategies seriously hampers evaluation and maintenance of our systems, and leads to curtailed system life cycles.

It is not sufficient, however, to argue simply for more principled approaches to software development. The question remains: On what principles should a viable methodology for medical decision-support systems be based? The hypothesis of the research reported in this special issue of Artificial Intelligence in Medicine is that a prime principle underlying systems engineering should be the configuration of systems from reusable---empirically validated---components. A system may be built from scratch, but it should consist of components that have proved their viability in the past.

This philosophy, of course, represents nothing new. The reuse of successful designs and the modular construction of new artifacts are important goals for all engineering disciplines. Component-based software architectures have analogous, important advantages. They provide handles for quality control and for reliability, both of which are sine qua non in the medical setting. Component-based architectures also provide well-defined interfaces to existing information systems, and in many ways are concordant with ongoing standardization efforts in medicine---such as clinical protocols, practice guidelines, and controlled terminologies.

In this editorial, we sketch relevant developments in the knowledge-engineering community in general, discuss the specific research objectives that this approach entails for the medical domain, and relate the research in the articles of this issue to these objectives. Finally, we discuss the open challenges that current research on component-based architectures seeks to address.

Developments in Knowledge Engineering

In the last ten years, a major transformation has occurred in the field of knowledge engineering. Until the late 1980s, the practice of building knowledge-intensive systems was viewed uniformly as "extracting" rules from application experts, and putting those rules into an expert-system shell. Developers built systems rule by rule, attempting to mimic with their rule bases the problem-solving behaviors that application experts seemed to display. This "knowledge mining" view has many weaknesses, as Clancey (1983) showed in his analysis of the MYCIN system. First, it generally is impossible to elicit from professionals in a given application area an adequate set of rules for all but trivial tasks. This elicitation problem has been called the "knowledge acquisition bottleneck," a mournful phrase that has been repeated so often in the literature that it almost has become trite. Second, the resulting knowledge-based systems are generally difficult to maintain: Adding one more rule can change the behavior dramatically.

Newell's (1982) introduction of the "knowledge-level hypothesis" proved to be a turning point. This hypothesis states that it is possible to describe the behavior of a rational agent at a level that is independent of particular symbolic computer representations. This level was termed the "knowledge level". Newell's proposal was taken to heart by several knowledge-engineering research groups who were studying the development of decision-support systems. These researchers continue to advocate approaches in which a knowledge-level model plays a central role. Such a model can be seen as a reification of Newell's abstract hypothesis for practical use in knowledge engineering.

Various frameworks (Chandrasekaran, 1988; Clancey, 1985; Marcus, 1988; Puerta et al., 1992; Steels, 1990; Wielinga et al., 1992) for knowledge-level modelling have been proposed in the literature, each with its own merits. Three principles lie at the heart of all these proposals: role-limiting, knowledge-typing, and reusability.

We discuss each of these in some detail.

Role-limiting

A knowledge-level model imposes a structure on the knowledge of an agent, indicating the role that each element of this knowledge plays in problem solving. By limiting the manner in which knowledge can be used, it is possible to overcome the computational problems associated with rule-based systems and theorem provers, where all knowledge is potentially applicable at each point during the reasoning process. The names used for knowledge roles are required to be meaningful for an external observer, and are therefore potentially useful for explaining the reasoning to a user. Thus, adoption of the role-limiting principle facilitates maintenance of a developed system, by clearly identifying the purpose of each entry in the knowledge base.
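As a rough illustration of the role-limiting idea, the following Python sketch tags each knowledge element with a role and lets each inference step consult only the elements filed under the role it names. The role names ("abstraction", "match"), the toy entries, and the `step` helper are hypothetical; they are chosen for this example and are not taken from any of the frameworks cited above.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class KnowledgeElement:
    role: str                             # the role this element plays in problem solving
    name: str                             # human-readable label, usable in explanations
    applies: Callable[[Set[str]], bool]   # condition tested against the current facts
    conclusion: str                       # fact added when the condition holds


# Hypothetical knowledge base: every entry is tagged with exactly one role.
KB: List[KnowledgeElement] = [
    KnowledgeElement("abstraction", "fever-from-temperature",
                     lambda facts: "temp>38.5" in facts, "fever"),
    KnowledgeElement("match", "infection-from-fever",
                     lambda facts: "fever" in facts, "possible-infection"),
]


def step(role: str, facts: Set[str]) -> Set[str]:
    """Apply only the elements filed under `role`; all others are ignored."""
    new = {k.conclusion for k in KB if k.role == role and k.applies(facts)}
    return facts | new


facts = {"temp>38.5"}
facts = step("abstraction", facts)   # only abstraction knowledge is consulted
facts = step("match", facts)         # only match knowledge is consulted
```

Because each step names the role it uses, the knowledge base is never searched as a whole, and the `name` field of an element can be reported back to a user as part of an explanation.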

Knowledge-typing

Role-limitation is achieved via an elaborate typing of knowledge. Although the various approaches have used different terminology for specific knowledge types, there is some consensus that at least the following distinctions need to be made:

Knowledge types thus provide the vocabulary for writing down knowledge-level models.

Reusability

A prime reason for introducing knowledge-level models is that such abstractions pave the way for reusability of model components. Implementation-dependent, "symbol-level", representations turn out to be impossible to reuse in a new situation. There is just no basis for comparing or even understanding them without an accompanying knowledge-level description (see, for example, the study by Brachman and Smith, 1980). Knowledge-level descriptions, on the other hand, allow a developer to identify generic components of decision-support systems.

For each of the knowledge types listed above, researchers have identified distinct classes of such generic descriptions. For example, Breuker and Van de Velde (1994) distinguish nine task types: design, diagnosis, prediction, planning, scheduling, configuration, assignment, assessment, and modelling. Other researchers have proposed sets of standard methods and ontologies.

Reusability of KBS Components in Medicine

Given the different knowledge types identified in the previous section, we can specialize these types to reflect the objectives of research on reusable components for medical decision-support systems.

As investigators in medical AI work to seek answers to questions such as these, the use of component-based approaches is becoming the standard of practice within the general software-engineering community. The use of object-oriented design and object-oriented programming is now pervasive. The Object Management Group's Common Object Request Broker Architecture (CORBA) gives software objects that are distributed over networks the ability to send messages to each other in standardized ways. Software vendors everywhere are beginning to market sets of reusable objects that programmers can incorporate directly into their applications code.

Many workers in AI have resisted the notion that building knowledge-based systems is actually a form of software engineering. Many knowledge engineers view their craft as something very distinct from computer programming. Nevertheless, the popularization of modern object-oriented software engineering had its origins in the frame-based knowledge-representation systems developed by AI researchers in the late 1970s. It was languages such as KL-ONE and KRL that inspired the development of languages such as Smalltalk and, ultimately, C++. It therefore is particularly ironic that developers of knowledge-based systems now must turn to the traditional software-engineering community to observe the manifest value of modularity of design and of the reuse of software components.

Concomitant with the revolution in object-oriented programming, there has been a revolution, of sorts, in the process of health care. Concern both about the escalating cost of medical care and about wide variation in clinical practices and utilization of services has prompted the widespread development of clinical protocols and practice guidelines. Payors, professional societies, health-care institutions, and physicians themselves are actively involved in defining standardized clinical practices that can impose more uniformity in the way that patients receive care. It is not too far-fetched an analogy to view the practice of medicine as increasingly based on the reuse of standardized "components". It remains an open challenge for policy makers, however, to identify the most appropriate means for clinicians to select protocols and guidelines in particular circumstances, and to apply those components of care in an optimal manner. (These difficulties parallel those faced by software developers who must select and integrate software elements from standard libraries.)

As clinical protocols and guidelines become increasingly ingrained in medical practice, health-care workers will be seeking new ways by which they can benefit from the knowledge of relevant guidelines directly at the point of care. In this regard, the articles in this special issue share a common theme: the desire to bring protocol knowledge to the point of decision making---whether those decisions concern strategies for cross-matching blood products for transfusion (Smith et al., this issue) or therapy regimens for patients who have AIDS (Tu et al., this issue). Knowledge-based systems offer the possibility to communicate protocols and guidelines in a highly detailed, situation-specific fashion. Whereas protocols that are defined using paper documents typically contain rigid clinical algorithms that, for pragmatic reasons, may not be able to incorporate many of the nuances that may affect day-to-day decision making, knowledge-based systems offer an extremely rich medium for expressing medical knowledge and for applying that knowledge in highly targeted ways. The authors who have contributed to this special issue certainly expect their systems to have practical clinical benefits. Whether the methodologies that they describe in this issue will lead to decision-support systems that overcome the pragmatic obstacles identified by Heathfield and Wyatt remains to be seen, however.

Although component-based architectures have the potential to address many of the software-engineering difficulties that previously have hindered the deployment and maintenance of decision-support systems, these approaches offer additional scientific advantages. By making explicit the ontologies that define different areas of medical practice, by clarifying the generic problem-solving methods that may apply to different decision-making tasks, and by demonstrating which components are and are not reusable in solving new application problems, the architectures described in this special issue can clarify the nature of medical knowledge itself. These architectures thus serve as substrates for exploring models of medical knowledge, and for elucidating the structure of medical expertise.

Contents of this issue

This special issue of Artificial Intelligence in Medicine contains three original articles that present distinct, but complementary, approaches to the development of knowledge-based systems from reusable components.

The first paper, by Smith et al., concentrates on the control of problem solving required for sophisticated knowledge-based systems. Smith's group builds on Soar, a "universal weak method" that construes all problem-solving as search through a problem space. Although Soar has been an influential architecture for building intelligent systems within the AI community, Smith contends that, for Soar to be truly usable, it is essential for developers to be able to view problem solving at a higher level of abstraction. Smith represents in Soar several of the generic tasks (problem-solving methods) identified previously by his group and by other workers at Ohio State University, and applies those problem-solving methods to two significant application areas in the domain of clinical pathology.

The second paper, by Van Heijst et al., does not address the control issues that are central in Smith's work, and instead focuses on the problems of developing reusable domain ontologies. Van Heijst and his colleagues, through their research as part of the GAMES-II project within the European Community, point out the difficulties of creating domain ontologies that truly are reusable in new applications, and discuss the ways in which domain ontologies stored in an archival library must be modified when actually incorporated within new knowledge-based systems.

The final paper, by Tu et al., demonstrates how a relatively mature component-based architecture, PROTEGE-II, has been applied to the domain of protocol-based care for patients who have AIDS. Tu and his co-authors from Stanford University show how PROTEGE-II is used to construct a general domain ontology, to modify that ontology for the purposes of the protocol-based--care application, and to construct an appropriate role-limiting problem-solving method for the task of planning protocol-directed therapy. An additional feature of the PROTEGE-II approach concerns the automated generation of a domain-specific knowledge-acquisition tool directly from the corresponding ontology; the domain-specific tool can be used by expert physicians to enter the details of individual AIDS-treatment protocols.
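The general idea of deriving a data-entry tool from an ontology can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not a description of how PROTEGE-II works: the `ONTOLOGY` dictionary, its class and slot names, and the `generate_form` and `validate` helpers are all hypothetical.

```python
from typing import Dict, List

# Hypothetical ontology: class name -> slot name -> declared value type.
ONTOLOGY: Dict[str, Dict[str, str]] = {
    "Protocol": {"name": "string", "duration_weeks": "integer"},
    "DrugStep": {"drug": "string", "dose_mg": "integer"},
}


def generate_form(class_name: str) -> List[str]:
    """Derive one data-entry field per slot of the given ontology class."""
    slots = ONTOLOGY[class_name]
    return [f"{class_name}.{slot} <{value_type}>"
            for slot, value_type in slots.items()]


def validate(class_name: str, entry: Dict[str, object]) -> bool:
    """Accept an entry only if it fills every slot with the declared type."""
    py_types = {"string": str, "integer": int}
    slots = ONTOLOGY[class_name]
    return set(entry) == set(slots) and all(
        isinstance(entry[slot], py_types[value_type])
        for slot, value_type in slots.items()
    )


form = generate_form("DrugStep")
ok = validate("DrugStep", {"drug": "drug-A", "dose_mg": 200})
```

In this sketch, adding a slot to the ontology changes both the generated form and the validation rule without any change to the tool-generation code, which is the kind of benefit that motivates generating acquisition tools from the ontology instead of hand-coding them.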

Although there are important distinctions among them, all three approaches recognize the importance of role-limiting problem-solving methods in structuring knowledge-based systems. The PROTEGE-II methodology emphasizes the reuse of previously defined problem-solving methods. Whereas developers can specialize a method by selecting alternative mechanisms to satisfy subtasks identified by the method, the creators of PROTEGE-II argue for problem-solving methods with relatively rigid, well understood control strategies and well defined data requirements. The Stanford group believes that the rigidity of their problem-solving methods provides important guidance during the knowledge-acquisition process---albeit with the potential problem that no method in the PROTEGE-II library can be configured for a given application task. The Ohio State group, on the other hand, favors flexibility both at runtime and at knowledge-acquisition time, and is willing to allow the weak search strategy of Soar's problem-space computational model (PSCM) to dictate control flow at runtime in order to accommodate unexpected changes in available input data. The GAMES-II project adopts a philosophy close to that of the PROTEGE-II group, although the paper by Van Heijst et al. primarily discusses the use of problem-solving methods to help index domain ontologies stored in libraries.

Similarly, all three approaches must make commitments to the manner in which system builders represent domain ontologies. Whereas domain knowledge in the Ohio State approach is represented solely in terms of Soar PSCM constructs, the PROTEGE-II and GAMES-II methodologies involve the construction of explicit, and separate, domain ontologies. The papers by Tu et al. and Van Heijst et al. discuss the types of ontological commitments that are required in their respective systems. The issues concerning the construction, indexing, and maintenance of the GAMES-II library of domain ontologies presented by Van Heijst et al. apply equally to the requirements of the PROTEGE-II project. At the same time, the use of domain ontologies to generate custom-tailored knowledge-acquisition tools, as described by Tu et al., is an approach that workers in the GAMES-II project currently are attempting to adopt in their own work.

The papers in this special issue consequently demonstrate different, yet interrelated, approaches to the development of component-based architectures. Each approach structures the components in distinct ways, and makes somewhat different assumptions about the manner in which components can be stored in libraries and reused for future applications. Only the Ohio State and Stanford groups have used their methodologies to build knowledge-based systems that are more than research prototypes---indeed, evaluation of the Ohio State systems is a central aspect of the paper by Smith et al. Although this special issue brings descriptions of these three approaches together to facilitate the reader's understanding of current component-based architectures, the nagging question still remains: How can the research community determine the specific elements of each approach that lead to improved software-engineering practices, and to improved system performance? Given the complexity of each methodology, there is a significant credit-assignment problem that researchers in medical AI ultimately must address if we are to continue to build on the work presented here.

Outlook

The use of object-oriented design and the adoption of component-based architectures seem destined to influence knowledge engineering as much as these practices have altered conventional software engineering in recent years. Even more than is the case with traditional computer programs, knowledge-based systems are difficult to build, difficult to maintain, and often fail to meet the needs of their intended users. The ability to reuse previously tested components in the design of new knowledge-based systems has obvious appeal, and the results to date---although almost exclusively anecdotal---are extremely encouraging.

Significant technical challenges remain to be overcome, however. As developers expand their libraries of components, there will be increasing requirements to define precisely the meaning of individual components, and the ways in which components can be assembled into working systems. Currently, system architects deal with libraries of relatively limited numbers of components, and have a good intuitive feeling for the semantics of each element. As the contents of these component libraries expand, it will be difficult for developers to remember intuitively what any particular component might be "good for". Currently, we lack an adequate vocabulary to describe formally the features of problem-solving methods and domain ontologies that might confirm the suitability of those components for modeling new application tasks. Although the taxonomies of ontologies described by Van Heijst et al. represent a step in the right direction, substantially more work is needed to allow automated and semi-automated tools to browse through large component libraries and to retrieve potentially reusable components. In particular, despite a number of proposals that have been made, the formal characterization of the semantics of problem-solving methods---which inherently are procedural in nature---remains an open issue.

The assembly of relevant ontologies and methods from libraries is a central problem in all component-based approaches. Whereas the role of each individual component in an overall system may be well known to the developer, the interactions among components may be difficult to visualize. Builders of component-based systems recognize that high priority must be placed on the creation of interactive development environments that can clarify the relationships among components at appropriate levels of abstraction. Tools that can examine previously successful designs and thereby critique an emerging component-based application also would have significant value.

Beyond the technical challenges, the adoption of component-based approaches ultimately will have significant policy ramifications for the managers of projects to develop knowledge-based systems. Although there may be significant advantages to the engineering of intelligent systems from well defined building blocks, those building blocks must, of course, be constructed in the first place. Rather than concentrating on the resources required to construct one-time applications, managers will have to commit to investing in the engineering of components that may not demonstrate their value until future systems are developed. As is already happening in component-based engineering of conventional software applications, there must be a shift to a "product line" view of the application-development process.

Since the early 1970s, workers in AI have emphasized the value of separating domain knowledge from reusable inference engines. Current component-based architectures extend this modularization a significant step further, with the goal that developers will be able to build intelligent systems from well understood primitives in a "plug and play" fashion. As the articles in this special issue testify, the important research questions no longer concern simply whether we can build useful knowledge-based systems from our reusable ontologies and methods; the real challenges now concern how we can best build repositories of components---and thereby create a new kind of library science for management of those components---while providing not only automated tools that can dispel the complexity of configuring the building blocks, but also principled methodologies that can streamline and clarify the process of knowledge engineering.

Acknowledgements

Schreiber's work is supported by the GAMES-II project. This project is part of the AIM Programme of the Commission of the European Communities as project number A2034.

References

H. A. Heathfield and J. Wyatt. Philosophies for the design and development of clinical decision-support systems. Methods of Information in Medicine, 32:1--8, 1993.

G. Klinker, D. Marques, and J. McDermott. The active glossary: Taking integration seriously. Knowledge Acquisition, 5:173--197, 1993.

W. J. Clancey. The epistemology of a rule based system: A framework for explanation. Artificial Intelligence, 20:215--251, 1983.

A. Newell. The knowledge level. Artificial Intelligence, 18:87--127, 1982.

B. Chandrasekaran. Generic tasks as building blocks for knowledge-based systems: The diagnosis and routine design examples. The Knowledge Engineering Review, 3(3):183--210, 1988.

W. J. Clancey. Heuristic classification. Artificial Intelligence, 27:289--350, 1985.

S. Marcus, editor. Automatic knowledge acquisition for expert systems. Kluwer, Boston, 1988.

A. R. Puerta, J. Egar, S. Tu, and M. A. Musen. A multiple-method shell for the automatic generation of knowledge acquisition tools. Knowledge Acquisition, 4:171--196, 1992.

L. Steels. Components of expertise. AI Magazine, Summer 1990.

B. J. Wielinga, A. Th. Schreiber, and J. A. Breuker. KADS: A modelling approach to knowledge engineering. Knowledge Acquisition, 4(1):5--53, 1992. Special issue `The KADS approach to knowledge engineering'. Reprinted in: Buchanan, B. and Wilkins, D. editors (1992), Readings in Knowledge Acquisition and Learning, San Mateo, California, Morgan Kaufmann, pp. 92--116.

R. J. Brachman and B. C. Smith. Special issue on knowledge representation. SIGART Newsletter, 70:1--138, 1980.

J. A. Breuker and W. Van de Velde, editors. The CommonKADS Library for Expertise Modelling. IOS Press, Amsterdam, The Netherlands, 1994.

M. Ramoni, M. Stefanelli, G. Barosi, and L. Magnani. An epistemological framework for medical knowledge based systems. IEEE Transactions on Systems, Man and Cybernetics, 22:1361--1375, 1992.

M. A. Musen, K. E. Wieckert, E. T. Miller, K. E. Campbell, and L. M. Fagan. Development of a controlled medical terminology: Knowledge acquisition and knowledge representation. Methods of Information in Medicine, 1994.

G. Hripcsak, P. D. Clayton, T. A. Pryor, P. Haug, O. B. Wigertz, and J. Van der Lei. The Arden Syntax for Medical Logic Modules. In: Proceedings of Fourteenth Annual Symposium on Computer Applications in Medical Care, pp. 200--204. Washington, DC: IEEE Computer Society Press, 1990.

M. A. Musen. Dimensions of knowledge sharing and reuse. Computers and Biomedical Research, 25:435--467, 1992.

J. W. Smith, A. Bayazitoglu, T. R. Johnson, K. A. Johnson, and N. K. Arma. One framework, two systems: Task specific architectures in the problem space paradigm applied to antibody identification and biopsy interpretation. Artificial Intelligence in Medicine, this issue.

J. Laird, P. Rosenbloom, and A. Newell. Universal Subgoaling and Chunking: The Automatic Generation and Learning of Goal Hierarchies. Boston: Kluwer Academic Publishers, 1986.

S. W. Tu, H. Eriksson, J. H. Gennari, Y. Shahar, and M. A. Musen. Ontology-based configuration of problem-solving methods and generation of knowledge acquisition tools: The application of PROTEGE-II to protocol-based decision support. Artificial Intelligence in Medicine, this issue.

G. van Heijst, S. Falasconi, A. Abu-Hanna, A. Th. Schreiber, and M. Stefanelli. A case study in ontology library construction. Artificial Intelligence in Medicine, this issue.