Consulting for Test Publishers

Technical Services

Testing has a strong technical side, and not everyone who administers or develops tests has access to the necessary technical expertise. Technical advice is, of course, an integral part of most services we offer; test development and validation, for example, both have a strong technical component. But certain technical tasks can also be carried out as stand-alone services: item analysis, test-data analysis, item pool design, scaling, and standard setting are obvious examples.

We can carry out these tasks, or other technical tasks, for you, either on a task-by-task basis or as part of a regular program of technical support. Please contact us to discuss details.

Item Analysis

Item analysis is used to determine how well each individual item on the test is functioning. It is an integral part of item development and test evaluation. There are two main types of item analysis: classical and IRT (item response theory). Which is appropriate depends mainly on the purpose of the analysis and the number of test-takers available.

If the data is readily available, the analysis itself is usually relatively inexpensive.
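To give a concrete sense of what a classical item analysis computes, here is a minimal Python sketch. It assumes responses have already been scored 0/1 and reports two standard classical statistics per item: difficulty (the proportion of test-takers answering correctly) and discrimination, computed here as the corrected item-total correlation (the item against the total of the remaining items). The function names are illustrative only; an IRT analysis would instead fit a statistical model to the responses and is not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either variable is constant
    return sxy / sqrt(sxx * syy)


def item_analysis(responses):
    """Classical item statistics for a 0/1 scored response matrix.

    responses: one row per test-taker, each row a list of 0/1 item scores.
    Returns one (difficulty, discrimination) tuple per item, where
    difficulty is the proportion correct and discrimination is the
    corrected item-total correlation (item vs. sum of the other items).
    """
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    stats = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        # Remove the item itself from the total so it does not inflate r.
        rest = [t - i for t, i in zip(totals, item)]
        difficulty = sum(item) / len(item)
        stats.append((difficulty, pearson(item, rest)))
    return stats
```

Items with very low or very high difficulty, or with low or negative discrimination, are the usual candidates for review.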

Test Analysis

There are a number of reasons for carrying out statistical analyses of test results. Estimating reliability is perhaps the most common, but test analysis is usually a central part of the validation process, and is often used to provide normative information on how various groups perform on the test. An analysis of the test results should be part of any routine quality control or ongoing process of monitoring test performance.
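As an illustration of reliability estimation, the sketch below computes Cronbach's alpha, one widely used internal-consistency coefficient, from a matrix of item scores. It is a minimal example that assumes complete data (no missing responses); the choice of alpha over other reliability coefficients is ours, for illustration.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a complete matrix of item scores.

    responses: one row per test-taker, one column per item.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Population variances are used throughout; the ratio is the same if
    sample variances are used consistently instead.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```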

There are a number of ways tests can be analyzed. One common way is to look at the relationships among the various sub-sections, to determine whether they are as the theoretical basis of the test predicts. Similarly, it is common to examine the relationship between the test and other assessments, or other variables, again to ascertain whether the pattern of relationships is as it ought to be.

Item Pool Design

Most modern large-scale testing systems do not build fixed test forms; instead they create a pool of calibrated items that can be used to generate a number of different forms, according to the needs of the assessment system. An item pool is essentially a database of test items, but it needs to be designed so that items can easily be drawn from it to create test forms that meet the test specifications. This usually requires data on item performance and a variety of technical analyses.
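To illustrate why pool design matters for form assembly, here is a deliberately simple sketch: each item carries a content area and a calibrated difficulty, and a form is built by taking, for each content area in a blueprint, the required number of items closest to a target difficulty. The `Item` fields, the blueprint format, and the greedy selection rule are all hypothetical, chosen for illustration; operational systems typically use more elaborate methods, such as mixed-integer programming, that balance many constraints at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    content_area: str
    difficulty: float  # hypothetical field: a calibrated difficulty estimate


def assemble_form(pool, blueprint, target_difficulty=0.0):
    """Greedy form assembly (illustrative only).

    blueprint: dict mapping content area -> number of items required.
    For each area, picks the items whose difficulty is closest to
    target_difficulty. Raises ValueError if the pool cannot meet the blueprint.
    """
    form = []
    for area, count in blueprint.items():
        candidates = [it for it in pool if it.content_area == area]
        if len(candidates) < count:
            raise ValueError(
                f"pool has {len(candidates)} items for {area!r}, need {count}"
            )
        candidates.sort(key=lambda it: abs(it.difficulty - target_difficulty))
        form.extend(candidates[:count])
    return form
```

The error branch is where pool design shows up in practice: if the pool lacks enough items of the right kind in some content area, no assembly method can produce a form that meets the specifications.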

Scaling

Tests need to report results in a manner that non-experts can easily understand. Usually a reporting scale is created for that purpose. Developing scales can be very technical, and designing and creating the right scale requires not only knowledge of the test and how the scores will be used, but also considerable technical expertise.
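The simplest kind of reporting scale is a linear transformation of raw scores. The sketch below standardizes raw scores and maps them to a scale with a chosen mean and standard deviation, then rounds and truncates the results to a fixed reporting range. The 500/100 scale and the 200 to 800 range are hypothetical defaults; many operational scales are instead built on IRT ability estimates or on nonlinear transformations.

```python
from statistics import mean, pstdev


def linear_scale(raw_scores, target_mean=500.0, target_sd=100.0, lo=200, hi=800):
    """Map raw scores to a reporting scale by linear transformation.

    Each score is converted to a z-score using the group's mean and
    population SD, rescaled to target_mean/target_sd, rounded to the
    nearest integer, and clipped to the reporting range [lo, hi].
    """
    m = mean(raw_scores)
    s = pstdev(raw_scores)
    if s == 0:
        raise ValueError("raw scores have zero variance")
    scaled = []
    for x in raw_scores:
        z = (x - m) / s
        y = round(target_mean + target_sd * z)
        scaled.append(min(hi, max(lo, y)))
    return scaled
```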

Standard Setting

Test results are often used to provide information about whether the test-taker has attained a level of performance necessary for some purpose. Common examples include attaining a certain level of knowledge, graduating from a study program, or having sufficient English proficiency to function in mainstream classes. This requires determining a passing standard, i.e., a cut score.

Standard setting uses experts to determine the level of attainment that should be considered sufficient. The process is far more difficult than it would seem, and setting standards has proved to be fraught with complex problems. We strongly recommend that standard setting studies be carried out by experts who have experience with such studies.
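Many standard-setting methods exist. As one illustration, in the modified Angoff method each panelist estimates, for every item, the probability that a minimally competent test-taker would answer it correctly; each panelist's recommended raw cut score is the sum of their ratings, and the panel's cut score is commonly taken as the mean across panelists. The sketch below implements only that arithmetic, and the panelist labels it accepts are arbitrary. A real study also involves panelist training, discussion rounds, and review of impact data, none of which is shown.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Raw cut score from modified-Angoff ratings (arithmetic only).

    ratings: dict mapping panelist label -> list of per-item probabilities
    that a minimally competent test-taker answers the item correctly.
    Returns the mean, across panelists, of each panelist's summed ratings.
    """
    lengths = {len(r) for r in ratings.values()}
    if len(lengths) != 1:
        raise ValueError("all panelists must rate the same number of items")
    for r in ratings.values():
        if any(not 0.0 <= p <= 1.0 for p in r):
            raise ValueError("ratings must be probabilities in [0, 1]")
    panelist_cuts = [sum(r) for r in ratings.values()]
    return mean(panelist_cuts)
```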