The earliest AI reading systems were attempts to have a machine gain a comprehension of some text (usually limited to short texts or single sentences) by performing a straightforward dictionary lookup (e.g., see [#!ai:booth1!#]). Unfortunately, the number of possible ``raw'' meanings which can be generated from even a short sentence via this approach runs into the millions. A slightly more sophisticated system was ELIZA ([#!ai:weizenbaum1!#]). While not a story reading system, ELIZA was a language-using model capable of carrying on realistic conversations with humans, and it managed to convince more than one person that it comprehended them. A closer examination, however, reveals that ELIZA was nothing but a sophisticated pattern matcher and string substituter--really only a little more advanced than brute-force dictionary lookup. This idea will be returned to in the evaluation chapter of this dissertation (Chapter 8), where I explore the types of evaluation which might better demonstrate the actual power of a reading theory.
Until the late 1960s, much of the work on developing reading systems was unprincipled and ad hoc. Woods ([#!acm:woods!#]) changed this with the creation of a language understanding paradigm known as Augmented Transition Networks, or ATNs. Originally embodied in the LUNAR project for NASA, ATNs are fairly robust systems which do a rather good job of parsing sentences. They are generally restricted to single sentences, however, and they typically produce only a syntactic parse of the input. Additional systems can be daisy-chained onto the back-end of an ATN; these systems can take the syntactic parse produced by the ATN, perform some sort of conceptual dictionary look-up, and arrive at an understanding of the sentence.
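To give a concrete sense of the mechanism (though not of Woods' actual implementation), the following sketch, written in Python with names invented purely for illustration, shows a toy transition-network parser: arcs consume words by syntactic category, registers are filled along the way (the ``augmentation''), and the result is a purely syntactic parse which a daisy-chained semantic stage could then interpret.

\begin{verbatim}
# A toy augmented-transition-network parser (illustration only).
# Each arc is (category test, next state, register to fill).
LEXICON = {"the": "DET", "a": "DET",
           "boy": "NOUN", "girl": "NOUN", "store": "NOUN",
           "kissed": "VERB", "walked": "VERB"}

# Network for a simple declarative sentence: DET NOUN VERB DET NOUN.
SENTENCE_NET = {
    "S0": [("DET",  "S1", "subj-det")],
    "S1": [("NOUN", "S2", "subj-head")],
    "S2": [("VERB", "S3", "main-verb")],
    "S3": [("DET",  "S4", "obj-det")],
    "S4": [("NOUN", "POP", "obj-head")],
}

def atn_parse(words, net=SENTENCE_NET, state="S0"):
    """Walk the network, filling registers; return them on success."""
    registers = {}
    for word in words:
        category = LEXICON.get(word)
        for test, next_state, register in net.get(state, []):
            if test == category:
                registers[register] = word
                state = next_state
                break
        else:
            return None                    # no arc accepts this word
    return registers if state == "POP" else None

print(atn_parse("the boy kissed the girl".split()))
\end{verbatim}

A genuine ATN adds recursive subnetworks and arbitrary tests and actions on the arcs; the point here is only that the output is syntactic structure, with meaning left to later stages.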
ATNs were a good beginning, but they were limited by their focus on the sentence level. Their strict reliance on the syntactic nature of language meant that they could not efficiently handle sentences with a clear semantic meaning but ambiguous syntactic interpretations. My sentence processor relies on both syntactic and semantic information in order to be both more robust and more efficient. Using a different technique to reach a similar outcome, Winograd ([#!read:winograd1!#]) built the SHRDLU model. SHRDLU was a system which existed in a simulated micro-world containing only blocks, a tabletop, and a robot arm. One could type English sentences to SHRDLU describing various actions which the arm should perform. If SHRDLU comprehended the input, the proper actions would be taken; if not, it could ask for clarification. To accomplish this comprehension task, SHRDLU made use of ATN technology, coupled with the concept of procedural semantics. This simply means that the meaning of the input sentence, its semantics, was represented as physical tasks which the robot arm could carry out. Given the nature of the domain and the style of interaction which it fostered, this approach led to spectacular results. In fact, many people at this point thought that the ``natural language problem'' had been solved; while SHRDLU was only a blocks-world comprehender, scaling it up beyond that world was seen as nothing more than an engineering problem. It is also significant to note that SHRDLU was the earliest natural language system which could be described as being situated; much of its power came from the awareness it had of the world in which it existed.
Unfortunately, SHRDLU, while sophisticated, was still greatly limited. For example, procedural semantics lose their advantages when the purpose of the language is not to cause a physical action to occur, which is the case for most textual understanding outside the genre of directions. In narratives, for example, there might be elements of the story which describe actions as occurring, but the reader does not comprehend the story by simulating the outcome of the action sequence being performed (or, at least, this is not the only technique which is used), since a great deal of what makes up the reading experience is not about the actions of the characters at all. And, beyond the idea of SHRDLU being situated, there is little in Winograd's work which has a direct influence on my own, except that the weaknesses of SHRDLU, particularly in determining what the meaning of a piece of text should be, led later researchers to address this issue. It is this issue of what the ``output'' of reading should be which eventually led to the tri-representation I make use of in my own work. Before that, however, the idea of how to represent the meaning of a text had a more direct influence on ISAAC by driving aspects of the knowledge representation used in the ISAAC theory (see Chapter 4).
The first proposed solution came from Schank ([#!read:schank3!#]) in the form of conceptual dependency (CD) theory. This theory proposed clustering all possible actions into conceptually similar primitives; WALK, RUN, CRAWL, DRIVE, and FLY, for example, are all instances of a primitive called PTRANS, or Physical TRANSfer of location. The meaning of a piece of text would be whatever CD (or set of CDs) could be interpreted from it. This allowed sentences with extremely different forms (surface representations) to share the same meaning (deep representation). For instance, ``John walked to the store,'' ``To the store John went,'' and ``John was taken to the store'' all share the idea that John was PTRANSed from somewhere to the store. As the next chapter will elaborate, the ISAAC theory makes use of a lowest-level knowledge representation which maps English words onto CD-like knowledge elements. However, while ISAAC uses this CD idea as the inspiration for its internal (lowest-level) representation of the material being read, the theory does not claim this is the meaning of a text. The meaning is a more complex representation which is built using these same internal primitives.
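To make the surface/deep distinction concrete, the sketch below (written in Python, with slot names chosen by me for illustration rather than taken from Schank's notation) maps several verbs of motion onto the single PTRANS primitive, so that differently worded sentences yield the same conceptualization.

\begin{verbatim}
# Illustrative CD-style deep representation: many surface verbs map
# onto one primitive act (PTRANS = Physical TRANSfer of location).
PRIMITIVE_OF = {"walked": "PTRANS", "went": "PTRANS",
                "drove": "PTRANS", "flew": "PTRANS",
                "was taken": "PTRANS", "told": "MTRANS"}

def conceptualize(actor, verb, destination, origin=None):
    """Build a (greatly simplified) CD frame for an action sentence."""
    return {"primitive": PRIMITIVE_OF[verb],
            "actor": actor,
            "object": actor,      # simplification: the mover is the moved
            "from": origin,
            "to": destination}

# Three surface forms, one shared deep representation:
a = conceptualize("John", "walked", "store")
b = conceptualize("John", "went", "store")
c = conceptualize("John", "was taken", "store")
assert a == b == c
\end{verbatim}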
The basic CD theory was weak outside the domain of actions, mainly physical ones. It also proved too cumbersome to build CDs up into the kinds of extensive structures needed to understand texts beyond the sentence level. As a result, two higher-level representations were developed: scripts and plans ([#!scripts:schank-abelson-1977!#]). Scripts grouped CDs into packages representing stereotypical sequences of actions, such as going to a restaurant. SAM ([#!acm:sam!#]) was an early model which ``comprehended'' simple stories via script application. Plans were designed to handle the less stereotypical aspects of life: by understanding a person's goals, one could reason about the potential plans that person might be attempting in order to achieve those goals. PAM ([#!acm:pam!#]) was a reading system which ``comprehended'' plan-based stories. Other AI models were also plan-based, leveraging the enormous amount of AI work in the realm of planning (for early, foundational work in this area, see [#!plan:fikes!#,#!plan:newell!#,#!plan:sacerdoti!#]), as well as much of the work from psychology on plan recognition as discussed previously. Allen ([#!read:allen1!#,#!read:allen2!#]) and Wilensky ([#!acm:pam!#]) were among the first to attempt this; others included Carberry ([#!read:carberry!#]) and Sidner ([#!read:sidner!#]). The ISAAC theory also extends the CD-style representation in order to overcome these limitations. First, the representation I have developed is designed to handle non-physical actions as well as physical ones. In addition, other elements of the world besides actions, such as objects, agents, and states (descriptions), are representable with the same level of power. This will be further described in the next chapter. Chapter 5 will explain how the ISAAC theory extends the idea of plans and goals to represent events at higher levels than single sentences.
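A script, in this view, is little more than an ordered package of such conceptualizations with roles left open. The fragment below (a Python sketch with role names I have invented, far simpler than the Schank and Abelson formulation) shows how matching one scene of a restaurant script licenses expectations about the scenes that should follow.

\begin{verbatim}
# A toy script: an ordered sequence of CD-like event templates with
# open roles (marked by "?").  Matching one scene licenses
# expectations about the scenes which follow it.
RESTAURANT_SCRIPT = [
    {"primitive": "PTRANS", "actor": "?diner", "to": "restaurant"},
    {"primitive": "MTRANS", "actor": "?diner", "object": "order",
     "to": "?waiter"},
    {"primitive": "INGEST", "actor": "?diner", "object": "?meal"},
    {"primitive": "ATRANS", "actor": "?diner", "object": "money",
     "to": "?waiter"},
]

def matches(template, event):
    """An event matches a template if all non-variable slots agree."""
    return all(event.get(slot) == value
               for slot, value in template.items()
               if not str(value).startswith("?"))

def expectations(script, event):
    """Return the scenes expected to follow the first matching scene."""
    for i, scene in enumerate(script):
        if matches(scene, event):
            return script[i + 1:]
    return []

ordered = {"primitive": "MTRANS", "actor": "John",
           "object": "order", "to": "waiter"}
print(expectations(RESTAURANT_SCRIPT, ordered))  # eating, then paying
\end{verbatim}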
Indeed, the ISAAC theory and model make use of a large variety of knowledge types, from low-level elements to more complex entities. This was inspired by the work of Dyer ([#!read:dyer1!#]), whose BORIS model was a more involved system which attempted to overcome the deficiencies of earlier ones by integrating the current theories of its day in order to perform ``in-depth'' reading. As a result, BORIS concentrated on high-level aspects of reading. It was a very knowledge-intensive system, incorporating semantic information, scripts, plans, and so forth. In addition, BORIS added the concept of Thematic Affect Units (TAUs) as a knowledge organization schema. These organized memory around themes, in addition to any organization the memory might already have. Although BORIS used these TAUs in only a limited fashion, their theoretical power is considerable: with the proper TAUs, a reader would have no trouble recognizing the similarities between two apparently diverse stories, such as \emph{Apocalypse Now} and \emph{Heart of Darkness}. BORIS, though, was not a creative reader. It assumed that all the information necessary to comprehend a given story would be in memory prior to the reading experience; the novelty of new stories came about, it was argued, because each story could organize the known concepts in various ways. My work with ISAAC has kept the idea of knowledge-intensive reading but extends the model of reading to account for the handling of novel concepts.
Other research in this same line focused on different aspects of the overall reading process while trying to maintain the same underlying representational ideas. For example, it is possible to use script-like knowledge to allow a model to skim a text rather than read it in-depth, as demonstrated by the FRUMP theory ([#!read:dejong1!#]). FRUMP used a knowledge organization structure known as a sketchy script to guide its skimming of news articles. After skimming a given article, the system produced a summary which could be used to check its accuracy. FRUMP was fairly fast and robust, and was able to handle stories it had not seen before being run, as long as they fit into one of its sketchy scripts. However, the FRUMP approach did not include learning the scripts it needed or refining the ones it possessed. For what it was intended to do, FRUMP must be regarded as a success; however, its static reliance on skimming as its only method of text interaction limited its overall reading ability. Although FRUMP could comprehend stories of which it did not have complete prior knowledge (the limitation of BORIS), it did so only by matching the sketchy scripts it possessed; it was not creatively understanding novel concepts to arrive at a comprehension. Further, the inability to read a text in-depth led to comprehension errors which could have been avoided if the program could have varied its depth of reading. This is the approach taken in ISAAC: skimming is used for some portions of a text, but the model can always return to an in-depth mode of reading.
Following the success of FRUMP, later systems, such as IPP ([#!acm:ipp!#]), tried to vary the reading depth. IPP would skim ``uninteresting words'' in a text, explicitly looking for content words. When a content word was identified, more in-depth processing would begin. The specific meaning of the content word would then feed back to the skimming mechanism by providing hints as to what content words should be looked for next. By this method, skimming became a more dynamic process; while some words were tagged as always being ``skimmable,'' others were skimmed only when the more intensive comprehension process determined them to be unnecessary. In addition to the ability to skim, IPP was also commendable in that it added some level of learning to the reading process. The ISAAC theory bears some resemblance, in this regard, to IPP's methods. The model does incorporate the ability to alter reading depth; this facet of reading is handled by the control supertask. However, this control mechanism is dynamic and driven by the needs of the reasoner. If a reasoner is predicting well based on a shallow reading of the material, this continues; if a prediction fails, then ISAAC switches to a deeper mode of reading, re-reading material if necessary. And, as with the last few models presented, ISAAC extends the theory of reading to include a provision for handling novel concepts, i.e., creative understanding.
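The control regime just described can be pictured as a loop over the text in which depth of processing is a variable rather than a constant. The sketch below is written in Python, with hypothetical predicates standing in for the much richer mechanisms of IPP and of ISAAC's control supertask; it shows skimming that deepens when an interesting word appears or when a prediction fails.

\begin{verbatim}
# Illustrative depth-controlled reading loop.  The skim, read_deeply,
# and predict arguments stand in for far richer processes; a failed
# prediction pushes the reader back into in-depth (re-)reading.
CONTENT_WORDS = {"hijacked", "exploded", "kidnapped"}   # hypothetical

def read(sentences, predict, skim, read_deeply):
    depth, understanding = "skim", []
    for sentence in sentences:
        if any(w in CONTENT_WORDS for w in sentence.split()):
            depth = "deep"
        result = read_deeply(sentence) if depth == "deep" else skim(sentence)
        understanding.append(result)
        if predict(understanding):       # predictions still holding?
            depth = "skim"               # shallow reading suffices
        else:                            # prediction failed:
            understanding[-1] = read_deeply(sentence)   # re-read in depth
            depth = "deep"
    return understanding

story = ["The plane left Boston .", "It was hijacked over Ohio ."]
print(read(story,
           predict=lambda u: True,
           skim=lambda s: ("skim", s),
           read_deeply=lambda s: ("deep", s)))
\end{verbatim}

In ISAAC proper, the decision of how deeply to read is made by the control supertask on the basis of the reasoner's needs rather than a fixed word list; the list above is simply the quickest way to show the shape of the loop.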
Finally, a word needs to be said concerning the type of AI model which has generally been proposed for language comprehension. Many AI models of language comprehension were built on the premise that understanding consists of building a proper representation of the input. Riesbeck and Martin's DMAP ([#!read:riesbeck1!#]) took a different approach, one which has since been used by many other AI models. Rather than viewing comprehension as a Build and Store task, the DMAP model treated it as a Recognize and Record task. The primary goal of a language comprehender was not to build a representation but to recognize which existing episodic memory structures could be applied to the input. As the authors argue, this is not necessarily a good approach to the parsing of sentences like ``The boy kissed the girl,'' since the only type of recognition available is simple semantic meaning. However, with more complex or more in-context language, the recognize-and-record model could prove superior in processing speed and quality of results.
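The contrast between the two styles can be made concrete with a small sketch (Python, with memory structures and index words I have invented; the marker-passing machinery of the actual DMAP model is far richer). Rather than assembling a new representation, the reader checks the incoming words against index patterns attached to existing memory structures and simply records which structures were recognized.

\begin{verbatim}
# Recognize-and-record in miniature: memory structures carry index
# patterns, comprehension means finding which structures the input
# activates, and "understanding" is the record of those recognitions.
MEMORY = {
    "M-ECONOMIC-ARGUMENT": ["interest", "rates", "rise"],
    "M-KISS-EVENT":        ["kissed"],
    "M-RESTAURANT-VISIT":  ["ordered", "waiter"],
}

def recognize(words):
    """Return the memory structures whose index words all appear."""
    return [structure for structure, pattern in MEMORY.items()
            if all(word in words for word in pattern)]

episodic_record = []                 # the "record" half of the task
sentence = "if interest rates rise the economy will slow".split()
episodic_record.extend(recognize(sentence))
print(episodic_record)               # ['M-ECONOMIC-ARGUMENT']
\end{verbatim}

Note that nothing new is built: a sentence whose concepts are not already indexed in memory simply fails to activate anything, which is exactly the limitation with respect to creative understanding taken up next.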
This is the same sort of philosophy underlying the use of scripts or plans, although DMAP carried it to a more refined level. It also fits in well with the view from reading education that meaning is not contained within the text (where a build-and-store model would need it to be); instead, the meaning of a text is contained in the shared knowledge between the author and the reader. Because of DMAP's commitment to treating understanding as recognition, the theory made another significant contribution: the memory that the ``parser'' uses is the same as the total episodic memory the system possesses. Thus, there is no need for separate memory structures; the language understanding process can make use of the full range of memories possessed by the system as a whole. However, it is best to view DMAP as an almost complete antithesis to ISAAC. While ISAAC does rely heavily on memory through the way in which the memory supertask is integrated into the reasoner, as will be seen in Chapter 5, that reliance is in support of the creative understanding of novel concepts. In DMAP, the reliance on memory as the method of understanding meant that only pre-existing concepts (or concepts which were I-Novel) could be understood. Therefore, there was no potential for creative understanding.