How computer-based tests are enriching education research

By Marco Paccagnella

Analyst, OECD Directorate for Education and Skills

Test scores are a convenient measure of knowledge and skills, and they are useful in identifying both under-performers and high achievers. But test scores alone (and rankings based on them) might not tell the whole story.

Whether at the individual or country level, test scores do not tell us anything about how and why test-takers achieve a given level of performance – information that is essential to help students improve. In a classroom situation, teachers usually have access to students’ actual responses on a maths test or writing assignment, which they can use – together with their own knowledge about the student – to provide feedback, highlight subtleties and correct misunderstandings. This is not the case for researchers who analyse data from large-scale assessments, such as PISA and the Survey of Adult Skills; but computer-based testing provides researchers with new – and important – insights.

Computer-based testing has significantly increased the amount of information that researchers can collect from respondents, and it promises to improve our understanding and interpretation of test scores. The software used to administer assessments can also record every interaction respondents have with the computer interface, storing them in so-called “log files”. These files give researchers an unprecedented level of detail about how respondents behave during the test.

This kind of information can be used, for example, to gauge test takers’ engagement and perseverance, based on how much time they spend on the assessment. These measures are particularly important for low-stakes assessments like the Survey of Adult Skills or PISA, because the validity of such tests rests on the assumption that participants put in a reasonable amount of effort. In the case of the Survey, though, data from log files show that engagement varies widely across test takers in different countries, as we detail in a new report.

Adults in Norway, Germany, Austria and Finland, for example, spent nearly 50 minutes on the test, on average; and only about 5% of respondents in those countries were identified as disengaged on more than 20% of the items. In Italy and in the Slovak Republic, on the other hand, the average test taker spent only about 40 minutes on the assessment, and the share of disengaged respondents was about 15% and 10%, respectively.

These findings are important for two reasons. First, they convey information about some important non-cognitive traits of the respondents, such as their conscientiousness and their ability to endure fatigue and remain committed to a task. (Such information should be interpreted with caution, however, as we cannot be sure how those who participated in the Survey would behave in a real-world, higher-stakes situation.) Second, they provide context that allows us to better interpret the results. The measured performance of disengaged respondents, for example, is a weaker indicator of their true proficiency, and this should be accounted for when comparing results across countries or groups of respondents. Such results are suggestive rather than conclusive, however, underscoring the need for further research rather than immediate policy action.

Research on log files is still in its infancy, and there are limitations. The content of log files is constrained both by what software developers chose to record (current datasets capture only a limited subset of all respondent interactions) and by the format of the assessment items themselves. The disengagement indicator discussed above is based on the assumption that the time spent on an item is a reasonable approximation of effort, but we cannot observe how respondents actually spent that time, apart from the few actions that required interaction with the computer platform. Many of today’s computer-based assessments are simply digital transpositions of materials that were originally designed for paper. As a result, they are not interactive in nature, which limits the number of interactions respondents have with the computer. This, in turn, means that most of the cognitive processes required to answer the questions cannot be observed or inferred from the information in log files.

Things are likely to change in the future, though. As researchers improve their understanding of technology’s potential, they will begin designing items capable of capturing cognitive strategies, along with myriad other cognitive and non-cognitive facets. Emerging technological developments will also allow researchers to measure even more aspects of test-taking behaviour. Experiments have already been conducted with sophisticated eye-tracking technology, for example, though not yet at a large scale.

Smart measurement is on its way – and exciting new developments are on the horizon.
