Although I am not willing to draw strong
conclusions simply from the question types presented in
the previous section, it is revealing to examine the
questions which ISAAC failed to answer correctly in order
to determine whether a particular style of question
is problematic for the model.

In most cases, ISAAC's mistakes involved literary concepts which
are beyond the scope of this project. For example,
questions dealing with the way in which an author "built
suspense" went completely unanswered. Similarly, questions
dealing with irony were answered at a more simplistic level
than the evaluators were looking for. In some cases,
ISAAC failed to provide the evaluator with the full
range of responses expected. In still other cases,
ISAAC answered the question by making an inference
which the evaluator felt was unjustified.

Like ISAAC, the human participants often failed to provide
what the evaluators considered a "full" answer. Additionally,
the students often missed the literal comprehension questions.
I hypothesize that this is due to memory issues; students
were not allowed to consult the text as they were answering the
questions. As such, incorrect memory retrieval could hamper
the production of the correct answers. While ISAAC's memory is
inspired by human memory and is not "perfect" in any sense
of the word, it is likely that its memory is better
at retrieving factual information contained in a text
than that of the human participants.
Kenneth Moorman
11/4/1997