Calculating the cut score: An introduction to standard setting in Surpass

Standard setting is the final stage of the test development process. It is the process of determining the minimum score – also known as the cut score – required on a test to distinguish candidates who are competent in their field from those who are not.

If you’d like to discover more about standard setting, watch our Surpass Community webinar where the Surpass Product Communications team demonstrated the features in Surpass. As part of this webinar, Amanda Dainis, Lead Psychometrician and CEO of Dainis & Co also discussed best practice for standard-setting activities.

How that cut score is calculated depends on whether candidates are being assessed in comparison to their peers, or assessed on how they perform against the test content.

Norm-referenced tests are used if a certain proportion of test-takers is required to pass the test; candidates are then judged against their peers. As this method only tells you how someone has performed in comparison to their peers, rather than against the content of the test, it is not widely used for high-stakes examinations.

Conversely, a test-taker may pass or fail a criterion-referenced test based on how they perform in relation to the test content, therefore taking the difficulty of items into account. It is this type of test that requires a process of standard setting to determine the appropriate and valid cut score for the test.

It’s a common misconception that a test’s cut score is always 70. In fact, organizations cannot decide a cut score without taking the difficulty of the questions into account. In order for a test to be valid, fair, and legally defensible, organizations must be able to demonstrate that appropriate methodology has been used to determine and validate the cut score.

For criterion-referenced tests, there are different methods for setting cut scores, the most widely known being Angoff and Hofstee, which can be used independently or in conjunction with one another.

Angoff Method

This is the most commonly used method for standard setting. Using this method, Subject Matter Experts (SMEs) give each test question a rating between 0% and 100%. This value indicates the proportion of minimally competent candidates that they believe would get that item correct. For the process to work, all SME raters must understand and apply an established, detailed concept of the minimally competent test-taker.

One of the benefits of this method is that, as items in the bank are rated independently of each other, multiple test forms with the same cut score can be generated if the item bank is robust. Angoff ratings also make it more straightforward to remove or replace an item in a test form.

Angoff rating process:

- SMEs rate each item between 0% and 100% based on its level of difficulty, expressed as the proportion of minimally competent candidates that will answer it correctly. Therefore, harder items will have a lower rating than easier items.
- If rater agreement is low (i.e. there are significant differences in the ratings applied by SMEs), another round of ratings may be required.
- Revised/Modified Angoff rating – Some organizations may also allow a ‘modified Angoff’ rating to be applied. This allows the SME to change their initial rating, either after discussion, or after seeing item performance data (such as from beta-testing), which may make them re-think the initial rating they applied.
- The average of the difficulty ratings for all items within a test form becomes the cut score for that test form.
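
The arithmetic behind these steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item IDs, the ratings, and the 0.15 spread threshold used to flag low rater agreement are all made up for the example, and your program may measure agreement differently.

```python
from statistics import mean, pstdev

# Hypothetical ratings: item ID -> one rating per SME, as proportions (0.0 to 1.0).
ratings = {
    "item_1": [0.80, 0.75, 0.85],
    "item_2": [0.50, 0.55, 0.45],
    "item_3": [0.90, 0.40, 0.65],  # SMEs disagree noticeably on this item
    "item_4": [0.60, 0.65, 0.70],
}

# Assumed threshold: an item whose ratings have a standard deviation above
# this value is flagged for discussion or another round of ratings.
SPREAD_THRESHOLD = 0.15


def angoff_cut_score(ratings):
    """Average the SME ratings per item, then average across the form."""
    return mean(mean(sme_ratings) for sme_ratings in ratings.values())


def items_needing_review(ratings, threshold=SPREAD_THRESHOLD):
    """Return the items where the spread of SME ratings exceeds the threshold."""
    return [
        item
        for item, sme_ratings in ratings.items()
        if pstdev(sme_ratings) > threshold
    ]


cut_score = angoff_cut_score(ratings)
print(f"Cut score for this form: {cut_score:.1%}")
print("Items to re-rate:", items_needing_review(ratings))
```

With these example ratings the form's cut score comes out at 65%, and only item_3 is flagged, because its three ratings are far apart.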

Hofstee rating

The Hofstee rating method requires the SMEs to consider the test form as a whole to estimate pass rates and cut scores. This method requires the SME to estimate four values:

- The minimum acceptable failure rate
- The maximum acceptable failure rate
- The minimum cut score, even if all examinees failed
- The maximum cut score, even if all examinees passed

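These values define a line on a plot of cut score against failure rate, running from the minimum cut score at the maximum failure rate to the maximum cut score at the minimum failure rate. The point where this line intersects the curve of actual test data (the failure rate that each possible cut score would produce) is the cut score for the test form.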
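
As a rough sketch of how that intersection could be found, the Python below checks each whole-number cut score between the minimum and maximum and picks the one where the observed failure rate is closest to the line defined by the four SME values. The candidate scores and the four values are invented for the example, and approximating the crossing point with the nearest whole-number score is an assumption of this sketch rather than a prescribed procedure.

```python
# Hypothetical candidate scores (percent correct) from a test sitting.
scores = [45, 52, 55, 58, 60, 62, 64, 65, 67, 68,
          70, 71, 73, 75, 76, 78, 80, 83, 87, 92]


def hofstee_cut_score(scores, min_cut, max_cut, min_fail, max_fail):
    """Approximate the Hofstee cut score on a whole-number score scale."""

    def observed_fail_rate(cut):
        # Proportion of candidates who would fail at this cut score.
        return sum(score < cut for score in scores) / len(scores)

    def line_fail_rate(cut):
        # Straight line from (min_cut, max_fail) to (max_cut, min_fail).
        fraction = (cut - min_cut) / (max_cut - min_cut)
        return max_fail + fraction * (min_fail - max_fail)

    # The crossing point is approximated by the cut score where the
    # observed failure rate and the line are closest together.
    return min(
        range(min_cut, max_cut + 1),
        key=lambda cut: abs(observed_fail_rate(cut) - line_fail_rate(cut)),
    )


cut = hofstee_cut_score(scores, min_cut=60, max_cut=75,
                        min_fail=0.10, max_fail=0.40)
print("Hofstee cut score:", cut)
```

For this example data the closest match is a cut score of 64, where 30% of candidates would fail and the line sits at 32%. If the observed curve never comes near the line within the acceptable range, the sketch still returns the nearest point, so in practice that case would need reviewing by hand.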

As a standalone rating method, Hofstee can be used for static test forms, but the way the score is calculated means that if any items in the test form change, the entire form must be rated again.

The Hofstee method is more commonly used in conjunction with Angoff rating, as a confirmatory measure once Angoff ratings have been completed.

Standard Setting in Surpass

“Ideally when we do standard setting, we want to do it in person at our headquarters… but that’s not always feasible… so we really needed a tool that could help us facilitate the activity, help us securely present exam content, and also capture real-time item ratings.”
Richard Feinberg, NBME (speaking at the ATP conference)

Now that we’ve looked at what standard setting is, and at some of the commonly used methods, how can Surpass facilitate this process?

Standard setting functionality will soon be introduced within the Tasks area of Surpass, developed in collaboration with a leading Surpass Community organization. This innovative and timely functionality will be available exclusively to the Surpass Community.

The functionality not only provides the perfect arena for standard setting activities to be conducted remotely, but it also protects the integrity of your item bank by limiting SME access to specific items or banks. Additional security measures, such as proctoring or lockdown browsers, can also be implemented based on program requirements.

In the initial implementation of the tool in Surpass, it will be possible to apply the Angoff method as well as modified Angoff ratings to items in a seamless and efficient process. This functionality is set to expand further in future releases of Surpass.

Key features of standard setting in Surpass:

- Create an item list containing a subset of items from the bank to be rated – no need to expose the entire item bank.
- Monitor progress via a user-friendly dashboard – perfect for collaborative remote working.
- Choose whether to apply a standard or forward-only navigation method to items. The forward-only method forces SMEs to consider each item independently of the other items they’ve been assigned, as they cannot navigate away from an item until a rating has been applied.
- Choose whether to ask for a revised rating to also be submitted, with item settings revealed to the SME to help inform their decision.
- Export all ratings to a CSV file for the calculation of test cut scores.
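
Once ratings are in a CSV file, working out the cut score for a given test form is a small scripting job. The sketch below assumes a purely hypothetical layout with item_id, sme and rating columns; this article does not describe the actual layout of the Surpass export, so the column names and sample data would need adapting to the real file.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical export contents; the real CSV layout may differ.
SAMPLE_CSV = """item_id,sme,rating
Q1,sme_a,80
Q1,sme_b,70
Q2,sme_a,50
Q2,sme_b,60
Q3,sme_a,90
Q3,sme_b,80
"""


def cut_score_for_form(csv_file, form_item_ids):
    """Average the Angoff ratings (in percent) for the items on one form."""
    ratings_by_item = defaultdict(list)
    for row in csv.DictReader(csv_file):
        if row["item_id"] in form_item_ids:
            ratings_by_item[row["item_id"]].append(float(row["rating"]))

    # Refuse to report a cut score if any item on the form is unrated.
    missing = set(form_item_ids) - set(ratings_by_item)
    if missing:
        raise ValueError(f"No ratings found for items: {sorted(missing)}")

    # Mean rating per item first, then the mean across the form.
    return mean(mean(item_ratings) for item_ratings in ratings_by_item.values())


form_a = {"Q1", "Q2"}
print("Form A cut score:", cut_score_for_form(io.StringIO(SAMPLE_CSV), form_a))
```

Because each item carries its own rating, swapping an item in or out of a form only means changing the set of item IDs passed in and re-running the calculation.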