To test the ISAAC system on the same questions, several precautions were taken to reduce researcher bias. ISAAC was first developed to the level of competence judged necessary to read and comprehend the stories, and the model was then ``frozen'' at that level of development. This freeze occurred before the questions were solicited from the evaluators, so there was little chance that the researcher would unintentionally bias the outcome by tailoring the computer model to the specific questions being asked. ISAAC then read the stories in the same order in which they had been given to the human participants, and after each story it was asked the corresponding questions. The questions were given to ISAAC in their original English form. ISAAC possesses an external question-answering task, which interprets each question as a mixture of memory requests and the need to build connections between the items retrieved. Because ISAAC currently has no English generation capability, its answers were produced in conceptual form. These conceptual forms were translated into English and written on a questionnaire sheet. A more complete description of how ISAAC was evaluated, including examples of conceptual representations and the resulting English translations, can be found in Appendix C.
The human responses and the ISAAC responses were then given to the four evaluators, with each participant's identity concealed by a simple code number. Each evaluator was responsible for grading the answers to the questions that he or she had developed. The judges were given no instructions about the criteria to use in grading, so the strictness of their standard of performance was left entirely to their discretion. In addition to grading the answers, each evaluator was asked to identify which participant they believed to be the ISAAC system and to explain why.