8.4 Testing CBA Projects
A downside of the many advantages of computer-based assessments (see section 2.1) is that the more interactive (and innovative) assessment items are, the more sensitive the created content is to errors and potential glitches. Potential issues that require (sometimes intensive) testing can be functional (e.g., a missing NEXT_TASK command, see section 3.12.1) or can affect only the layout and presentation of tasks (see section 6.8.5 for tips).
Moreover, since different software components and multiple steps are usually involved in generating assessment content, finding errors and testing assessment projects can be complex and requires a systematic approach. The most crucial part of implementing CBA projects is developing a clear notion of what needs to be tested, and of when tests need to be repeated after specific changes have been made to a test delivery or to the components used.
Table 8.6: Testing steps for CBA projects

| Testing Steps |
|---|
| Cross-Browser Testing (when required) |
| Component Testing / Scoring Testing |
| Integration Testing / Data Storage Testing |
8.4.1 Testing Cross-Browser Compatibility
The items created with the CBA ItemBuilder are displayed in the browser with the help of a runtime (see TaskPlayer API in section 7.7) and, if necessary, additional software components of the delivery software. In the current version, the CBA ItemBuilder runtime is implemented in JavaScript based on the React framework and is tested with the browsers that are current at the time of release. However, displaying the items always requires additional functionality provided by the browser or browser component in use. Browsers are subject to continuous development and change, and several differences exist between browsers on devices with different operating systems.
Dedicated testing of CBA ItemBuilder content in different browsers is necessary if A) browsers are used that either had low market penetration at the time the CBA ItemBuilder version was released (e.g., browsers on specific devices such as SmartTVs) or did not yet exist at that time (e.g., newer browser versions released after a specific CBA ItemBuilder version/runtime). Cross-browser testing is B) also necessary if any content is included via the ExternalPageFrame interface that was either implemented specifically for an assessment or has not yet been tested in the intended browsers.
Running Preview in Specific Browsers: The assessment content created with the CBA ItemBuilder can be viewed in the browser directly from the authoring tool (see section 1.4). By default, the system browser is used, i.e., the Web browser registered as default on the local computer used to run the CBA ItemBuilder. After a preview has been started, the URL opened in the default browser can also be opened in other browsers if they are installed locally on the computer. In this way, assessment content can be viewed and checked in different browsers.
Using External Tools for Cross-Browser Testing: Since assessment components created with the CBA ItemBuilder are generated as HTML output for integration into test deliveries, tools for testing websites in different browsers can also be used to verify cross-browser compatibility.
Key technologies for automating website testing, such as Selenium or Playwright, and the various solutions for automated website testing on different devices can be used to check cross-browser compatibility. Cross-browser testing is suggested in particular when external components are embedded into CBA ItemBuilder project files using the ExternalPageFrame component (see sections 3.14 and 4.6.3), or if CSS adaptations are used (see section 6.3.3).
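As an illustration, the following sketch uses Playwright to open an item preview in the three browser engines bundled with Playwright and to capture a full-page screenshot of each for visual comparison. The preview URL and the screenshot-based check are assumptions for this example; the actual URL is the one shown when a preview is started from the authoring tool, and the checks would need to be adapted to the item at hand.

```typescript
// Minimal cross-browser smoke test with Playwright.
// PREVIEW_URL is a hypothetical placeholder for the URL of a running preview.
import { chromium, firefox, webkit, Browser } from 'playwright';

const PREVIEW_URL = 'http://localhost:8080/preview'; // placeholder, adjust to the actual preview URL

async function smokeTest(name: string, launch: () => Promise<Browser>): Promise<void> {
  const browser = await launch();
  const page = await browser.newPage();
  await page.goto(PREVIEW_URL, { waitUntil: 'networkidle' });
  // Capture a full-page screenshot per browser engine to compare layout and rendering.
  await page.screenshot({ path: `preview-${name}.png`, fullPage: true });
  await browser.close();
}

(async () => {
  await smokeTest('chromium', () => chromium.launch());
  await smokeTest('firefox', () => firefox.launch());
  await smokeTest('webkit', () => webkit.launch());
})();
```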
8.4.2 Testing Assessments Using Synthetic Data
For the further test steps shown in table 8.6, it has proven helpful to consider the structure of assessment projects. Assessments generally consist of many individual components (items and units, as well as instructional components such as tutorials). The components are administered either in a fixed sequence (see Fixed Form Testing in section 2.7.1), in different booklets or rotations (see section 2.7.2), or in individualized sequences (see Multi-Stage Testing in section 2.7.3 and Computerized-Adaptive Testing in section 2.7.4).
An essential first step is Component Testing to ensure that test cases systematically cover the specific conditions of all components. This means that the individual components (i.e., items or units, instruction pages, tutorials, etc.) are tested separately.
Component Testing / Scoring Testing: Component testing regarding the behavior of items can be combined with scoring testing of individual items. Depending on the complexity of the evidence identification rules used to define the scoring, scoring testing might be trivial (for instance, to make sure that the selection of radio buttons is appropriately captured). However, it can get more complex if, for instance, multiple pages are used and the missing-value coding is included in the scoring definition.
To test the scoring, the individual assessment components must first be identified. For each item, all the different correct and incorrect solutions should be systematically entered, produced, and checked. If several items or ratings are contained within a component, it is recommended to check to what extent they are independent. If the defined scoring rules reveal dependencies, then the scoring check of the individual components should also consider all combinations, as far as this is possible with reasonable total effort.
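A minimal sketch of how such scoring test cases can be collected and checked is shown below. The response structure, the hit names, and the scoreItem() function are hypothetical placeholders; they stand in for whatever mechanism actually returns the scoring result of the item under test (e.g., the hits and misses reported by the runtime after the responses have been entered).

```typescript
// Sketch of systematic scoring test cases for a single item.
type Response = Record<string, string | boolean>;

interface ScoringTestCase {
  description: string;
  responses: Response;     // synthetic input entered into the item
  expectedHits: string[];  // hit names expected from the scoring rules
}

const testCases: ScoringTestCase[] = [
  { description: 'correct option selected',
    responses: { radioButtonA: true }, expectedHits: ['hit.correct'] },
  { description: 'distractor selected',
    responses: { radioButtonB: true }, expectedHits: ['hit.incorrect'] },
  { description: 'no response entered (missing)',
    responses: {}, expectedHits: ['hit.missing'] },
];

// Placeholder: replace with a call into the actual scoring of the item under test.
function scoreItem(responses: Response): string[] {
  return [];
}

for (const tc of testCases) {
  const hits = scoreItem(tc.responses).slice().sort();
  const expected = tc.expectedHits.slice().sort();
  const ok = JSON.stringify(hits) === JSON.stringify(expected);
  console.log(`${ok ? 'PASS' : 'FAIL'}: ${tc.description}`);
}
```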
For organizing the testing of CBA projects, it is recommended to use a version control system and to organize the process using an issue tracker (see section 8.3).
Integration Testing / Data Storage Testing: Testing the data storage (typically of result data) requires the individual assessment components to be integrated into the test deployment software. A systematic approach is based on Click Patterns (also called Mock Cases), i.e., pre-defined responses to all items in a test or booklet. To compare the entered responses (synthetic data) with the results collected by the assessment software, the data post-processing (see section 8.6) should already be in place (end-to-end testing).
For pragmatic reasons, the use of screen recording software (e.g., OBS Studio) to check the scoring may also prove helpful. If the screen is continuously recorded during the scoring check, then possible input errors can be identified more quickly in the event of inconsistencies.
Missing or skipped responses, time constraints, timeouts, and special test events, such as test leader interventions or test terminations at defined points, should be included in the mock cases so that the coding of missing values can be checked later.
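The following sketch illustrates how a mock case (click pattern) could be described and compared against post-processed result data. The structure of both the mock case and the exported results is an assumption for this example; the actual format depends on the delivery software and the data post-processing (see section 8.6).

```typescript
// Sketch of a mock case (click pattern) and an end-to-end check against exported result data.
interface MockCase {
  caseId: string;
  responses: Record<string, string | null>;  // null = deliberately skipped (missing)
  events?: string[];                         // e.g., 'timeout', 'test-leader-intervention'
}

const mockCases: MockCase[] = [
  { caseId: 'MOCK-001',
    responses: { item01: 'A', item02: 'C', item03: null },
    events: ['timeout'] },
];

// In practice, exportedResults would be read from the post-processed result data
// (e.g., a JSON or CSV export of the delivery software); inlined here as a placeholder.
const exportedResults: Record<string, Record<string, string | null>> = {
  'MOCK-001': { item01: 'A', item02: 'C', item03: null },
};

for (const mc of mockCases) {
  const exported = exportedResults[mc.caseId] ?? {};
  for (const [item, expected] of Object.entries(mc.responses)) {
    const actual = exported[item] ?? null;
    if (actual !== expected) {
      console.log(`${mc.caseId}/${item}: expected ${expected}, got ${actual}`);
    }
  }
}
```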
result-text()-operator is used). In addition, the delivery software can use a codebook to translate them into variables (with customized variable names and variable labels) and, for categorical variables, with (newly defined) variable values (e.g., of type integer) and with additional value labels.
Please note that if item selection also depends on random processes in the context of adaptive testing, for instance as part of exposure control, then the algorithms for item selection must be tested as an additional step. Testing adaptive algorithms is done, for instance, in pre-operational simulation studies, provided the algorithms used for operational testing are also accessible for simulation studies.
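As a rough illustration of such a pre-operational simulation, the following sketch estimates item exposure rates for a simple randomesque selection rule (picking randomly among the k most informative items). The item pool, the information values, and the selection rule are illustrative only and do not represent any specific operational algorithm.

```typescript
// Toy simulation of item exposure under randomesque exposure control.
const poolSize = 20;
const testLength = 5;
const k = 3;            // size of the randomesque candidate set
const simulees = 10000;

// Illustrative "information" values; in a real study these would come from
// calibrated item parameters and the current ability estimate.
const information = Array.from({ length: poolSize }, () => Math.random());

const exposure = new Array(poolSize).fill(0);

for (let s = 0; s < simulees; s++) {
  const remaining = information.map((info, idx) => ({ idx, info }));
  for (let pos = 0; pos < testLength; pos++) {
    remaining.sort((a, b) => b.info - a.info);                       // most informative first
    const pick = Math.floor(Math.random() * Math.min(k, remaining.length));
    const chosen = remaining.splice(pick, 1)[0];                     // select and remove item
    exposure[chosen.idx]++;
  }
}

exposure.forEach((count, idx) =>
  console.log(`item ${idx + 1}: exposure rate ${(count / simulees).toFixed(2)}`));
```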
Verification of Log Events: As described in section 2.8, the theory-driven collection of log data is becoming increasingly important. Various process indicators can be extracted from log data, providing information about emotional and cognitive processes during test and item processing. Log data thus provide the basis for possible improvements in the interpretation of test scores and, if their use is planned, should be reviewed before an assessment is conducted. When verifying and checking log data, special attention should be paid to the fact that log events depend on their context (Contextual Dependency of Log Events). Therefore, verifying and checking log events may require reconstructing the context from previous log events.
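The following sketch illustrates this contextual dependency: a click event can only be interpreted once the page visible at that time has been reconstructed from the preceding page-change events. Event names and fields are illustrative and do not correspond to a specific log data format.

```typescript
// Sketch of reconstructing context for log events: the page on which a click occurred
// is not part of the click event itself but is derived from preceding page-change events.
interface LogEvent {
  timestamp: number;
  type: string;                  // e.g., 'pageChanged', 'click' (illustrative names)
  data: Record<string, string>;
}

function attachPageContext(events: LogEvent[]): (LogEvent & { page?: string })[] {
  let currentPage: string | undefined;
  return events
    .slice()
    .sort((a, b) => a.timestamp - b.timestamp)   // replay events in chronological order
    .map((e) => {
      if (e.type === 'pageChanged') currentPage = e.data.newPage;
      return { ...e, page: currentPage };        // annotate each event with the visible page
    });
}

// Example: the click at t=120 is only interpretable once we know that 'page2' was visible.
const log: LogEvent[] = [
  { timestamp: 100, type: 'pageChanged', data: { newPage: 'page1' } },
  { timestamp: 115, type: 'pageChanged', data: { newPage: 'page2' } },
  { timestamp: 120, type: 'click', data: { target: 'radioButtonA' } },
];
console.log(attachPageContext(log));
```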