Although the ISAAC model appears to handle a range
of example stories, requiring various elements
of understanding for comprehension to be successful,
it is not yet clear at what level of competence
ISAAC is capable of
performing. To determine this, some formal empirical evaluation
must be undertaken.
Traditional
artificial intelligence reading systems were evaluated
in a fashion analogous to how human readers would be
evaluated: the programs would generate summaries
of the stories they read, or they would answer questions
created by the researcher that were designed
to evaluate the level of comprehension achieved.
However, something beyond this level is required to
fully demonstrate my theory. Initially, I believed
it would be possible to appeal to the
reading education literature and ``pull out'' a set
of guidelines to use in developing a motivated
evaluation of the reading capabilities of ISAAC.
Unfortunately, the
reading education literature leaves precise evaluation
issues up to the individual teacher. While specific reading
comprehension tests exist,
these depend on using
a set of passages and questions which are provided
to the instructor;
only general
guidelines are given for how a reading teacher should test
the comprehension level of an arbitrary piece of text.
Since my research uses
science fiction stories
with the specific goal of understanding novel concepts,
these ``pre-packaged''
tests are inadequate.
The problem, therefore, is twofold:
I do not possess the experience
necessary to produce accurate evaluation criteria for
the set of stories, and the literature of the field where I would
expect to find such expertise is also lacking. However,
I did have access to experienced reading educators.
The technique I eventually settled on was a modification
of the classic Turing Test ([#!core:turing-test!#]).
First, I developed ISAAC to a point which I felt
was sufficient for it to read and comprehend the
stories I was using. Then, I froze the
development of the system at that level.
I gave the stories to a group of
reading experts. They provided me with a
set of questions which they felt was sufficient for testing
a person's comprehension of the material.
I then had a group of humans read the stories
and answer the questions. At the same time, I
allowed ISAAC to read the stories and answer the same
questions.
The human evaluators were then given
the answered questions and asked to grade them without knowing
which answers came from human readers and which came from ISAAC.
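As a rough illustration of the blind grading step just described, the following Python sketch shuffles the answer sheets, hides their source behind opaque identifiers, and restores the source only after grading in order to compare average scores. The function names, the sheet identifiers, and the numeric score scale are all hypothetical; they are assumptions made for illustration and are not part of ISAAC or of the evaluation as it was actually carried out.

```python
import random
from statistics import mean


def blind_answer_sheets(sheets, seed=0):
    """Shuffle answer sheets and hide who produced each one.

    `sheets` is a list of (source, answers) pairs, where source is a
    label such as "human" or "ISAAC" (hypothetical labels).
    Returns (blinded, key): `blinded` maps an opaque sheet ID to the
    answers shown to graders; `key` maps the same ID to the hidden source.
    """
    rng = random.Random(seed)
    order = list(sheets)
    rng.shuffle(order)  # position must not reveal the source
    blinded, key = {}, {}
    for i, (source, answers) in enumerate(order):
        sheet_id = f"sheet-{i:03d}"
        blinded[sheet_id] = answers
        key[sheet_id] = source
    return blinded, key


def mean_score_by_source(grades, key):
    """Unblind after grading and average the scores for each source.

    `grades` maps sheet ID to the score a grader assigned.
    """
    by_source = {}
    for sheet_id, score in grades.items():
        by_source.setdefault(key[sheet_id], []).append(score)
    return {source: mean(scores) for source, scores in by_source.items()}
```

In use, the graders would see only the `blinded` mapping, and `key` would be consulted once all grades had been recorded. A single averaged score per source is the simplest possible summary; a real analysis would likely also look at per-question and per-grader variation.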
Examining the final scores provides
evidence of how well ISAAC fares as a reader. By
then analyzing the knowledge and processes it had to work with,
I am able to substantiate my claims concerning
the relationship of creativity, understanding,
and reading. In particular, with an instantiation of the
theory capable of reading and comprehending well, it
is possible to determine
the power of the theory by observing what processes
are implemented and what knowledge is available
to the model.
Kenneth Moorman
11/4/1997