8.6 Data Processing after Assessments
The data preparation process should already have been tested as part of the Integration Testing (see section 8.4.2). For this purpose, the required routines (e.g., R scripts) should have been created prior to data collection and tested with the help of synthetic data. Testing is complete once it is verified that the central information required as evidence about the test-taker's knowledge, skills, and abilities can be derived from the completed tasks.
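The idea of testing preparation routines against synthetic data can be sketched as follows. This is a minimal illustration in Python (in practice such routines might be R scripts, as noted above); the function names and the toy scoring rule are hypothetical:

```python
# Hypothetical sketch: verify, before any real data are collected, that a
# scoring routine recovers the expected scores from synthetic raw responses.

def score_item(raw_response, correct_answer):
    """Toy scoring rule: 1 if the raw response matches the key, else 0."""
    return 1 if raw_response == correct_answer else 0

# Synthetic raw data with known ground truth.
synthetic_cases = [
    {"response": "B", "key": "B", "expected_score": 1},
    {"response": "A", "key": "B", "expected_score": 0},
]

# Integration-style check: the central information (here, the item score)
# must be derivable from the synthetic raw data.
results = [score_item(c["response"], c["key"]) for c in synthetic_cases]
assert results == [c["expected_score"] for c in synthetic_cases]
print("Synthetic-data check passed")
```

Running such checks before data collection ensures that gaps in the preparation routines are found while the instrument can still be changed.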
Data processing after assessments (data preparation, reporting, and feedback) typically comprises the following steps:

- Data Preparation
- Coding of Open-Ended Responses
- Final Scoring / Cut Scores / Test Reports
- Data Set Generation / Data Dissemination
If no ExternalPageFrame content is used, the required data preparation for result data boils down to combining the data stored in case-wise individual Raw Data Archives into a data set in the desired file format.
Data Preparation: Data preparation can begin during data collection if intermediate data are provided or made available. Typically, the data are generated in smaller units (i.e., sessions) in which a test-taker processes a set of tasks compiled or assigned via pre-load information. The data on a test-taker, as provided by the assessment software, can be understood as a Raw Data Archive. Analogous to the scans of paper test booklets, these Raw Data Archives (for example, combined as a ZIP archive) are the starting point for data preparation. If raw data from computer-based assessments must be archived in accordance with good scientific practice, this can be understood as a requirement for long-term storage of the Raw Data Archives.
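Combining case-wise Raw Data Archives into one data set can be sketched as follows. This is a minimal Python illustration under simplified assumptions: the archive layout (one `results.json` per ZIP) and all file names are hypothetical, and real archives produced by an assessment software will be structured differently:

```python
import csv
import json
import os
import tempfile
import zipfile

def build_dataset(archive_paths, out_csv):
    """Merge one JSON result file per Raw Data Archive (ZIP) into a single
    CSV data set (one row per test-taker). Layout is hypothetical."""
    rows = []
    for path in archive_paths:
        with zipfile.ZipFile(path) as zf:
            with zf.open("results.json") as fh:
                rows.append(json.load(fh))
    with open(out_csv, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=sorted(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)

# Demo: create two synthetic Raw Data Archives, then combine them.
tmp = tempfile.mkdtemp()
paths = []
for pid, score in [("P001", 3), ("P002", 5)]:
    p = os.path.join(tmp, f"{pid}.zip")
    with zipfile.ZipFile(p, "w") as zf:
        zf.writestr("results.json", json.dumps({"person": pid, "score": score}))
    paths.append(p)

n = build_dataset(paths, os.path.join(tmp, "dataset.csv"))
print(n)  # 2 combined cases
```

Note that the archives are only read, never modified, in line with the requirement that Raw Data Archives remain unchanged after data collection.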
A first step that is often required before the collected data can be described as pseudonymized or anonymized is the exchange of the person identifiers (ID change) that were used during data collection. Person identifiers might be used as the file names of the Raw Data Archives and might be included in several places. Since the Raw Data Archives should not be changed after data collection, data processing means extracting the relevant information from the Raw Data Archives and changing the person identifier in the extracted result data and the extracted log data.
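The ID change on extracted data can be sketched as follows (a minimal Python illustration; the identifiers and record fields are hypothetical, and in practice the mapping table would be stored separately and kept confidential):

```python
# Hypothetical ID mapping created for pseudonymization.
id_map = {"P001": "X9f3", "P002": "K2b7"}

def exchange_ids(extracted_records, id_map):
    """Replace the person identifier in extracted result/log data.
    The Raw Data Archives themselves remain unchanged."""
    out = []
    for rec in extracted_records:
        rec = dict(rec)                      # copy; never mutate the source
        rec["person"] = id_map[rec["person"]]
        out.append(rec)
    return out

records = [{"person": "P001", "item": "I1", "score": 1}]
print(exchange_ids(records, id_map))
# [{'person': 'X9f3', 'item': 'I1', 'score': 1}]
```

Applying the exchange only to the extracted copies keeps the original archives intact for long-term storage.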
Approaches known from Open Science and Reproducible Research (Gandrud 2020) should be used (i.e., scripts maintained under version control) to allow re-running the complete data preparation, starting from the Raw Data Archives up to the final data sets. If the data preparation is carried out entirely with scripts (e.g., using R), later adjustments are more straightforward. Possible adjustments include deletion requests for the data of individual test-takers, which might otherwise be cumbersome if, for example, a large number of data sets are created from the collected log data (see section 2.8).
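How a fully scripted pipeline simplifies deletion requests can be sketched as follows (a minimal Python illustration; the book's examples use R, and the function and field names here are hypothetical):

```python
def prepare(raw_cases, deleted_ids=frozenset()):
    """Re-runnable preparation step: a deletion request is handled by
    filtering the affected cases and re-running the whole pipeline from
    the Raw Data Archives, rather than patching published data sets."""
    return [case for case in raw_cases if case["person"] not in deleted_ids]

raw = [{"person": "P001"}, {"person": "P002"}, {"person": "P003"}]
print(len(prepare(raw)))                        # 3 (full data set)
print(len(prepare(raw, deleted_ids={"P002"})))  # 2 (after a deletion request)
```

Because every derived data set is regenerated by the same script, the deletion propagates consistently to all outputs, including those derived from log data.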
Log data can be processed, for instance, with LogFSM (see section 2.8.5).
Coding of Open-Ended Responses: The operators described in chapter 5 for evaluating so-called Open-Ended Responses in the CBA ItemBuilder are currently limited. Open-ended answers (such as text answers) can only be scored automatically to a minimal extent (in the CBA ItemBuilder, only with the help of regular expressions). More modern methods of evaluating open-text responses using natural language processing [NLP; see, for instance, Zehner, Sälzer, and Goldhammer (2016)] might require a two-step procedure. In the first step, training data are collected and not evaluated live during test-taking. Afterward, classifiers are trained based on NLP language models or adapted in the form of fine-tuning. Once such classifiers are available, the answers provided by test-takers can be evaluated automatically and transferred to the data set.
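Regular-expression scoring of a short text answer can be sketched as follows (a minimal Python illustration; the pattern and the expected answer are hypothetical, and scoring expressions in the CBA ItemBuilder are defined within the item itself):

```python
import re

# Hypothetical regex key for an item expecting the answer 'photosynthesis';
# minor spelling variants (space or hyphen) are accepted.
pattern = re.compile(r"^\s*photo[\s-]?synthesis\s*$", re.IGNORECASE)

def score_text(answer):
    """1 if the answer matches the regular-expression key, else 0."""
    return 1 if pattern.match(answer) else 0

print(score_text("Photosynthesis"))   # 1
print(score_text("photo synthesis"))  # 1
print(score_text("respiration"))      # 0
```

The example also illustrates the limitation mentioned above: regular expressions can tolerate surface variation, but they cannot judge the semantic content of a free-text answer.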
A similar procedure applies to graphical answer formats (e.g., when an ExternalPageFrame allows test-takers to create a drawing as their answer). For the creation of training data in preparation for automatic coding, or if answers are to be evaluated exclusively by humans, the open answers must be extracted from the Raw Data Archives for an evaluation process (Human Scoring).
For ExternalPageFrames, the JavaScript/HTML5 content embedded into CBA ItemBuilder items must implement the getState()/setState() interface to collect the state of the ExternalPageFrames on exit and to allow restoring the content for scoring purposes (rating).
Final Scoring: Whether items already score the responses at runtime (scoring) or whether only the raw responses (i.e., the selected items, entered texts, etc.) are collected is decided differently for different assessments. As long as the responses are not needed for filtering or ability estimation (see section 2.7), there is no critical reason why scoring should not be performed as part of post-processing. Only if the created assessment content is shared (see section 8.7.3) is it helpful to define the scoring directly within the CBA ItemBuilder Project Files (i.e., the files to be shared), because this way items are automatically shared together with the appropriate scoring.
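Scoring as a post-processing step can be sketched as follows (a minimal Python illustration; the key table and item names are hypothetical):

```python
# Hypothetical scoring key applied in post-processing: only raw responses
# were collected at runtime; scores are assigned afterwards from a key.
scoring_key = {"item1": {"B"}, "item2": {"C", "D"}}  # accepted options

def score_case(raw_responses, key):
    """Map each raw response to 0/1 using the post-hoc scoring key."""
    return {item: int(resp in key[item]) for item, resp in raw_responses.items()}

print(score_case({"item1": "B", "item2": "A"}, scoring_key))
# {'item1': 1, 'item2': 0}
```

Keeping the key outside the runtime content makes it easy to revise the scoring later and simply re-run the step; defining it inside the Project Files instead is preferable mainly when items are shared.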
Cut Scores and Item Parameters: The scoring, i.e., the mapping of a response (such as a selection) to a score (correct, wrong, partially correct), can be part of the item (implemented, for instance, using the scoring operators described in chapter 5). In contrast, the Item Parameters and potential Cut Scores (i.e., threshold values for estimated latent abilities) are not considered part of the assessment content: these parameters might not be known when a newly developed instrument is used for the first time, or their values might depend on the intended target population.
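Applying cut scores to ability estimates can be sketched as follows (a minimal Python illustration; the threshold values and level labels are hypothetical, since, as noted above, real values depend on the instrument and target population):

```python
import bisect

# Hypothetical cut scores on a latent ability scale and the resulting levels.
cut_scores = [-0.5, 0.5, 1.5]   # thresholds between adjacent levels
levels = ["Level I", "Level II", "Level III", "Level IV"]

def proficiency_level(theta):
    """Map an ability estimate to a proficiency level via the cut scores."""
    return levels[bisect.bisect_right(cut_scores, theta)]

print(proficiency_level(-1.0))  # Level I
print(proficiency_level(0.7))   # Level III
```

Because the thresholds live outside the items, they can be set or revised (e.g., after a standard-setting procedure) without touching the assessment content.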
Test Reports: Different parts of an assessment software might be responsible for feedback, either during the assessment (see section 2.9.1) or after data processing and scoring of open-ended responses (see section 2.9.2). Hence, reports can be generated either online (as part of the assessment software) or offline (as part of the data processing).
Reports can be generated, for instance, with ShinyItemBuilder (see section 7.3.5).
Data Dissemination: The provision and distribution (i.e., dissemination) of data from computer-based assessments, for example via research data centers, can take the typical form of data sets (one row per person, one column per variable) for Result Data and Process Indicators. Since the different assessment platforms and software tools provide log data in different ways, log data can be transformed into one of the data formats described in section 2.8.4 as part of the data processing after an assessment.
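Flattening log events into a disseminable long format (one row per event) can be sketched as follows (a minimal Python illustration; the event fields are hypothetical and the target layout is only one of the possible shapes for log data):

```python
import csv
import io

# Hypothetical raw log events as emitted per session; the target is a flat
# 'long' format (one row per event) suitable for dissemination.
raw_log = [
    {"person": "X9f3", "ts": 12.4, "event": "click", "target": "optionA"},
    {"person": "X9f3", "ts": 13.1, "event": "next",  "target": "page2"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["person", "ts", "event", "target"])
writer.writeheader()
writer.writerows(raw_log)
print(buf.getvalue().strip())
```

A transformation of this kind would typically be the last pipeline step, run after the ID change so that only pseudonymized identifiers reach the disseminated files.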