8 Assessment Cycle and Workflows
Starting from a concrete diagnostic question (Diagnostic Interest), the development and implementation of computer-based assessments take place in a process that can be illustrated, for example, as an Assessment Cycle (see Figure 8.1).
The first part of this workflow of computer-based assessments (Diagnostic Interest and Item Development, see also, e.g., Lane, Raymond, and Haladyna 2015) can be broken down into more detailed parts and steps:
If the intended use of assessment material, for instance, prepared using the CBA ItemBuilder is defined (i.e., if Test Design and Test Assembly are known), the distribution of content to Project Files can be optimized (see section 8.2).
A well-considered distribution of assessment content to individual project files can reduce effort and the risk of errors for the following steps:
- Testing (see section 8.4),
- Test Administration and Data Collection (section 8.5),
- Data Preparation, Reporting and Feedback (see section 8.6), and
- Documentation and Archiving (see section 8.7).
Finally, developed instruments can be shared and made available, for instance, as Open Educational Resources (OER, see section 8.7.4) to be used for further research or in (educational) practice. All different parts of a usual workflow for CBA projects as shown in the Assessment Cycle are described in this chapter.
8.1 Planning of CBA Projects
Assessment projects can face time pressure if necessary planning and preparation steps have not been considered or if timetables and milestone plans underestimated the effort these steps require. Time pressure towards the end of a project is less likely if a systematic approach is followed.
8.1.1 Overall Planning and Preparation
Table 8.1 lists the initial steps that should be taken before implementing specific items or tasks for a concrete computer-based assessment.
| Overall Planning and Preparation |
|---|
| Domain Definition and Claims Statements |
| Content Specification |
| Feature Collection & Requirements |
| Software-Tool Selection |
Domain Definition and Claims Statements: As the first step of planning and preparing an assessment project, the construct domain to be tested needs to be defined and articulated. After naming and defining the domain to be measured, clear statements of the claims to be made about test-takers' knowledge, skills, and abilities (KSAs) are required.
Content Specification: The fundamental arguments that an assessment is intended to support require validity evidence based on the test content for the specific interpretations and uses of test scores. This requires a precise content specification (i.e., test content and test format), including specifications on how the KSAs should be measured, which cognitive processes are required, and which item formats and response types are intended.
Feature Collection & Requirements: A systematic collection and documentation of all requirements that exist regarding item presentation and test delivery is suggested before selecting a particular assessment software. Planning for technology-based assessments (see, e.g., International Test Commission and Association of Test Publishers 2022) also includes considering how the use of technology can directly impact the assessment experience.
If the required functionalities and features of the assessment software and the requirements for test delivery, analyses, and implementation of the computer-based assessment cannot yet be precisely described, creating storyboards and implementing minimal examples (as described in section 8.1.2) can help.
Software-Tool Selection: Selecting software components for the different parts is possible based on the collected requirements. Various tools might be appropriate for implementing the actual items, the assessment delivery, and the data processing during or after the assessments. If different tools are used, their interplay poses another requirement.
Some aspects for the selection of software components are:
- Features of the software: Which item types can be implemented using the assessment software (e.g., specific response formats, support of QTI items, items composed of multiple pages, etc.)? Is response time measurement with an accuracy of milliseconds required, or is a web-based implementation appropriate?
- License to use the software: How can the software be used for different parts of the assessment project, including the actual data collection, archiving of the instruments, and the dissemination of the developed CBA instruments?
- Interoperability and vendor lock-in: How can the assessment content be used if key stakeholders or project partners change?
- Support and Service Level Agreement: Is technical support for implementing the items and conducting the data collections available, or can a specific agreement be concluded?
- Runtime requirements for offline delivery: Is test delivery without internet access possible and which devices and operating systems are supported? How is it ensured that testing can be carried out without interruption in the face of incorrect entries and that it can be continued after system failures?
- Requirements for online assessments: Bandwidth for group testing, hosting requirements, number of concurrent supported test-takers, redundancy and backup strategy, supported browser versions?
- Software integration: If developers are involved, are they required to implement the complete assessment or only parts (e.g., specific content embedded using ExternalPageFrames, see section 3.14, or the integration of CBA ItemBuilder items using the TaskPlayer API, see section 7.7)?
The personal abilities, resources, and skills of those involved also play a considerable role in the success of CBA projects. Assessment projects often require competencies from different areas, which is an argument for interdisciplinary teams.
Programmers and software developers are only needed in this process if specific extensions in the form of ‘ExternalPageFrame’ content are required or existing HTML5/JavaScript content is to be integrated. Psychometricians (e.g., for scaling and IRT-based item selection), and system administrators (e.g., for hosting online assessments on in-house servers), may be needed to complete a team.
8.1.2 From Idea to Implementation
Once the process model for item creation and software selection has been decided upon, the individual items (in cycles, if necessary) are implemented using the steps shown in Table 8.2.
| Item Development |
|---|
| Item Story Boards and Item Writing |
| Minimal Examples and Item Computerization |
| Item Review and Modification |
| Scoring Definition and Scoring Testing |
| Item Tryouts (Cog-Labs / Field Testing) |
| Item Banking |
Story Boards: Storyboards are a first step in the creation of more complex computer-based items; they illustrate in the form of sketches in which sequence information is to be presented and which answer options are to be given. For diagnostically meaningful assessment components, the behavior by which test-takers should provide information about their competence or ability is of particular importance, i.e., which behavior or actions should be used for evidence identification. Given the possibilities of computerized assessments to create interactive, authentic, and complex tasks (cf. Parshall 2002), evidence identification does not have to rely exclusively on work products; it can also refer to features of the test-taking process (i.e., process indicators from the log data included in the measurement model).
Minimal Examples: Based on the initial ideas and storyboards, the functionalities and features required for designing the computer-based items can be derived. Typically, developing complex items to the end is not necessary to check whether a specific implementation is possible. Instead, so-called minimal examples, i.e., executable items that exclusively illustrate a particular functionality, can be used.
Feature-Complete Prototype: Based on the division of content into pages, reused page components, and dialogs, designing a prototype is suggested that is as complete as possible and that at least fully maps navigation within assessment components (i.e., within-unit navigation). This step is not necessary if items are created based on an already developed and tested template.
Production of Audio Files, Images and Videos: For the production of authentic tasks, simulation-based assessment, and the contextualization of assessment content, images, audio, and video files are often required (see section 6.2). These must be created as accurately as possible, with font sizes, colors, etc. that are consistent across tasks, and saved at the required size.
Item Computerization: Combining and merging the visual material of items with potential possibilities for the test-taker’s interactions (i.e., ways to respond to the assessment content) is a creative process that should result in opportunities to collect (valid) evidence about what test-takers know, can do, and understand. In other words, everything should be allowed that helps in making justifiable claims about KSAs.
In order to exploit the potential of computer-based testing for creating tasks that require specific construct-relevant test-taking behavior and that elicit informative evidence, two approaches are possible: A) Collaborative work in interdisciplinary teams (content experts and developers) and an iterative, agile approach for implementing, evaluating, and modifying computer-based tasks. B) Content experts learn and utilize tools to implement computer-based items directly, allowing them to play around with potential implementations and evaluate the impact on task characteristics and the interpretation of work products and test-taking processes.
If the computer-based implementation requires programming, for instance, for content embedded as ExternalPageFrame, then feedback and review rounds are recommended (approach A).
Item Review and Modification: Tasks and computer-based implementations of items are usually not created in one step. Instead, assessment components are typically reviewed after an initial draft and revised in review loops to improve and optimize them step by step.
Item Tryouts (Cog-Labs / Field Testing): After item development (and testing, see section 8.4), initial empirical testing in cognitive labs (so-called coglabs) or small-scale testing (e.g., with only one school class) is often helpful. Use cases for tryouts are to learn more about the comprehension and usability of new tasks or to (roughly) estimate the required processing time and task difficulty. The test deployment software described in chapter 7, for instance the R/Shiny package ShinyItemBuilder (see section 7.3), can be used to implement a tryout. If necessary, either screen-recording software can be used to capture the detailed test-taking process, or the tryout can use the CBA ItemBuilder’s feature of collecting replay-complete log data (see section 2.8.3).92
Item Banking: The steps that individual items must go through describe a process from initial design, revisions, and tryouts to scaling and then the operational use of items in an automated or manual test assembly technique. At each stage, persons with different roles, such as item author, item reviewer, psychometrician, test developer, project manager, and others, can change an item’s status in a pre-defined workflow. Possible workflows include the dissemination or archiving of operational items and the long-term storage of items required for follow-up assessments, subsequent cycles, or linking or bridge studies. Moreover, the role of persons and the defined workflow also determine which actions, such as commenting on an item, moving it to the next stage, or bringing an item back to a previous stage (or even discarding an item draft), are possible. Hence, instead of managing items in files (or CBA ItemBuilder project files) and metadata about items in spreadsheets, Item Banking using, for instance, web-based software is possible.
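To make the idea of such a pre-defined workflow more concrete, the following Python sketch encodes workflow states and role-based transitions. The state names, roles, and permitted transitions are illustrative examples only, not the data model of any particular item banking software.

```python
# Hypothetical item-banking workflow: which status transitions each role may trigger.
# State names, roles, and transitions are illustrative only.
WORKFLOW = {
    "draft": {"item_author": ["in_review"]},
    "in_review": {"item_reviewer": ["revision_needed", "approved"]},
    "revision_needed": {"item_author": ["in_review"], "test_developer": ["discarded"]},
    "approved": {"psychometrician": ["operational"], "test_developer": ["archived"]},
}

def allowed_transitions(status, role):
    """Return the states a given role may move an item to from its current status."""
    return WORKFLOW.get(status, {}).get(role, [])

print(allowed_transitions("in_review", "item_reviewer"))  # ['revision_needed', 'approved']
```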
8.2 Distributing Content to Project Files and Tasks
The CBA ItemBuilder’s flexibility for creating assessment components with multiple pages requires planning how the content should be distributed to either one or multiple Tasks (i.e., entry-points) and Project Files (i.e., ZIP archives).
Assessments created with the CBA ItemBuilder are composed of Tasks. Tasks are stored in Project Files that share their resources (e.g., images, videos, audio files). All test deployment tools described in chapter 7 can be used to administer either a single Task or a linear sequence of Tasks. While this is sufficient for typical data collections using only one booklet or a set of fixed booklets, adaptive testing (including multi-stage tests) can require analyzing responses live during the assessment to select the appropriate subsequent Task (or multiple Tasks, such as stages).
Assessment material will be used as shown in Table 8.3 to collect data with a particular test design (i.e., implementing a particular test assembly strategy), using either manual or automated test assembly (see section 2.7) or booklets. Typically, in a calibration (or field trial) study, the first version of a newly developed test (i.e., a more extensive selection of items implemented, for instance, in CBA ItemBuilder project files) is administered. After investigating item properties (such as item fit, see section 2.5), items are slightly modified, and a selection of items is used to create the test(s) using the test assembly approach of choice.
| Test Design and Assembly |
|---|
| Test Assembly Specification |
| Booklet Definition |
For many use cases, the following five rules are helpful in deciding how to distribute content to Tasks and how to distribute Tasks to Project Files:
1. All content that is always administered together should be in one task. For example, if the items that belong to a shared stimulus form a unit, then each unit should be created as a task.
2. Content that might be separated after a revision or item selection should be put into different tasks. This ensures that the CBA ItemBuilder tasks need as little revision as possible after a field test.
3. Tasks that refer to the same pages or resources should be placed in one CBA ItemBuilder project file. This avoids repeated copying of content (e.g., images, videos, etc.).
4. If information from an item is needed, for instance, for a subsequent filter or jump rule, then the items involved can, in the simplest case, be placed in one task. In this way, the CBA ItemBuilder project files remain as independent as possible from the specific functions of the test delivery software.
5. Generally, the tasks (and the CBA ItemBuilder project files) should be as small as possible. This will save time inspecting and previewing the items and allow different persons to work on different parts of the assessment.
A more detailed, albeit complicated, description of the dependencies is summarized below:
Each CBA ItemBuilder Task will always contain at least one page with at least one item. However, multiple pages (and multiple items) within a single Task are possible. To help optimize the distribution of items to Tasks and of Tasks to Project Files, and to discuss dependencies and potential limitations, the following summarizes what needs to be considered when planning the use of Project Files with multiple Tasks.
Tasks: Tasks are the entry-points the test-deployment software can use (see section 3.6). The primary role of a Task is to define the first page (or the first page and an additional X-Page), shown after an assessment component has been loaded. Only one Task can be used at once. If multiple items should be visible simultaneously, the items need to be on the same page and shown within the identical Task (see section 2.4 for details about test design, item presentation and navigation).
Runtime Commands: Runtime Commands (see section 3.12) can be used to trigger actions from the current Task to the test deployment software, for instance, to request navigation to the next or the previous Task. Test-takers can trigger a Runtime Command when the Runtime Command is attached to components (e.g., Buttons). Runtime Commands can also be triggered either by timers or by any component that can be linked to Events (i.e., Runtime Commands can be triggered as operators in rules defined in the Finite-State Machine, see section 4.4.6).
Pages: Tasks show either single pages (one at a time) or multiple pages simultaneously (either using the X-Page layout, see section 3.4.2, as Page Areas, see section 3.5.4, or using dialog pages, see section 3.15). Each Page can be used in a Task multiple times, and different Tasks within one Project File can share Pages (i.e., different Tasks can use the same Pages). Links (see section 3.11) and Conditional Links (see section 4.3) can be used to navigate between Pages.
Finite-State Machine: The internal logic layer of the CBA ItemBuilder (i.e., one or multiple Finite-State Machine(s), see section 4.4) is defined per CBA ItemBuilder Project File (i.e., all Tasks of a Project File share the identical Finite-State Machine definition(s)). However, the Task Initialization syntax (see section 4.5) can be used to prepare the general Finite-State Machine definition for a particular Task.
Variables: In addition to the finite-state machine definition, variables in CBA ItemBuilder Project Files are also globally defined and valid for all Tasks.
Summary: The use of multiple Tasks within one Project File requires additional considerations to make sure the Tasks can be used independently. Dependencies can arise based on links (see section 3.11) and conditional links, the Finite-State Machine(s) (see section 4.4), and variables (see section 4.2). If information or results within parts of an assessment need to be shared, for instance, using Variables provided by the Finite-State Machine, these assessment components must be implemented within one Task (see, for instance, the examples provided in section 6.4.2).
Assessment content (i.e., items, units or clusters) distributed in booklet designs, for instance, used in balanced (incomplete) block designs, must be distributed into different Tasks to enable the test deployment software to assemble the tests as required. Similarly, for item-level adaptive testing or unit-level adaptive testing, the entities that are selected adaptively from an Item Pool must be implemented in separate Tasks.
Finally, if identical resources are used in different Tasks (for instance, audio and video files), the Tasks should be implemented in one Project File to reduce redundant files that need to be deployed (and maintained).
8.3 IT-Management of CBA Projects
Assessment projects are often not created by one person alone. If a complete computer-based instrument is developed, content experts, psychometricians, and computer experts are involved. In research contexts, practical implementation of assessment components is often supported or even delegated to student assistants. In addition, when complex assessment software such as the CBA ItemBuilder is utilized, user support may be involved. Additional content experts may be asked, for instance, to review the developed items or tasks. And when data collection takes place in the context of more extensive empirical studies, professional survey institutes or data collection agencies are sometimes involved and contribute to the overall success based on their diverse experience. This large number of participants and the many individual decisions regarding the content and the technical implementation quickly make computer-based assessment projects quite complex. For this reason, it is recommended to use project management software whenever possible, as briefly described in the following section.
8.3.1 Use of Project Management Software
First of all, personalized user accounts are needed to use project management software for preparing computer-based assessments in teams or groups. The user accounts might be organized using groups with different roles and permissions. Personal accounts that are not shared between users are prerequisites for efficiently dividing tasks between people and assigning changes to specific users.
To distinguish and structure different phases of creating, testing, piloting, and delivering computer-based assessments, project management tools provide the concept of versions or milestones. The actual work steps are then divided into parts (issues or tasks) and managed in an issue tracker. Each topic or ticket can then be assigned to a user and processed by them individually or assigned to other users for work sharing. Observers can be registered and kept informed about the progress of a task. Typically, issues or tasks can be structured hierarchically and combined into superordinate work packages. Project management software can be used to process tasks using well-defined workflows (e.g., tickets of a particular category can be assigned to one of the states ‘new’, ‘in progress’, ‘review’, ‘feedback’, ‘solved’ and ‘closed’). Each ticket is typically dedicated to one separate topic, and the individual tickets can be assigned to milestones or versions. In that case, the progress within the issue tracker can be automatically used to create a roadmap that shows which work steps still need to be completed before the next milestone is reached. Finally, project management tools also provide assistance for knowledge management, for example, by providing Wiki pages or additional storage for documents or files.
Open-source tools that can be hosted on your own servers are, for instance, Redmine (Lesyuk 2013; Pavic 2016), OpenProject or Gitlab (Hethey 2013; Baarsen 2014). If the assessment content does not require special protection, public cloud solutions such as Github can also be used (Tsitoara 2020).
8.3.2 Use of Version Control Software
First, the answer to the obvious question: Why is version control helpful for developing computer-based assessments? The explanation refers to the nature of computer-based assessments. Computer-based assessments are created using different software tools. A deployment software (see chapter 7) is used, for instance, together with a Web Browser or a browser component, to show the assessment to test-takers or participants. Moreover, the test content is either created with authoring software (such as the CBA ItemBuilder) or is specifically programmed (i.e., implemented with a particular programming language). All components together create the computer-based assessment, typically stored in multiple files. Accordingly, a tool that keeps track of all files and identifies differences between files and folders is useful for preparing computer-based assessments. Techniques that originated in software development have proven useful for managing and creating assessments. Version control software allows one or more users to manage and document the status of a set of files (referred to as a repository) and makes changes traceable at the level of individual files. Moreover, a verbal description of changes to files is documented using so-called commit-messages.
Since item authors and users of the CBA ItemBuilder may not be familiar with the idea and concrete implementation of version management software, we describe two popular systems in more detail below. Moreover, version control software is also a key component in the context of Open Science and Reproducible Research (Christensen, Freese, and Miguel 2019). The development environment for R (RStudio), for example, comes with Git integration (Gandrud 2020).
8.3.2.1 SVN / Subversion
Often, project management software can be configured already to support the creation of repositories for version management. A project can then be assigned one or more repositories, each of which supports a specific version management technique. The use of two such methods will be described in some detail: Subversion (SVN) and Git.
Overview: Subversion (SVN) is less complex than GIT, which is why we describe this method first. Version management generally means that all files and documents belonging to a project are stored in a common repository. In the context of SVN, this repository is always on a server, i.e., usually accessible via a network. Only selected users have access to the repository. If SVN is used in combination with a project management tool, all registered users with the appropriate permission can access the repository. Every file is stored in the central repository in all created versions. Files can be added to the SVN repository using the commit operation. Check-out or update transfers or updates the files from the central repository to a local working copy (see Figure 8.3).
All users of an SVN repository work with local copies and can adjust, modify, delete, and add new files (or folders) in the working copy of the repository. After completing a particular step (e.g., after changing the assessment according to a specific ticket from the issue tracker), the changes are submitted collectively to the repository (i.e., checked-in with the help of a commit command). For that purpose, the individual changes are described in a commit message so that it is traceable which changes were checked-in. Suppose the changes are related to a specific ticket. In that case, the ticket number can be specified in the Commit message, for example, to document the changes in a traceable manner.
Repository URL: A repository URL identifies SVN repositories. If a public domain is used, they are thus globally unique. For example, the repository URL could look like this: https://{example-domain.org}/svn/{project-title}
If the version management is integrated into a project management software, then the repository URL can usually also be retrieved there.
Revision: To manage the different versions, SVN uses the concept of revisions. A repository is always in a concrete revision (starting with zero). Each commit, i.e. all changes to one or more files that are submitted at the same time, increases the revision number by one. The SVN repository stores the complete history of changes, i.e. for each file its content can be exactly determined (and if necessary also restored) to a specific revision. If you know the repository URL of an SVN and a specific revision number, then the content is also uniquely referenced.
Requirements: Many project management tools allow you to view a repository online in the browser and, for example, browse through revisions and, if necessary, follow the links to the issue tracker. However, client software is required to work with an SVN repository locally.
Different software tools for the different operating systems can be used to work with SVN repositories. For Windows, for example, TortoiseSVN is quite widespread, while SmartSVN offers a free edition for macOS.
Check-out: After installing an SVN client (e.g., TortoiseSVN), a working copy can be created locally for an existing repository. Even for a repository that is empty until then (i.e., revision 0), working with an SVN starts with a first check-out. The first check-out will create the working copy locally and connects the local folder to the SVN repository. If the repository is already filled with files and folders (i.e., in a revision greater than zero), then all files are downloaded and cached locally during the check-out. As soon as the check-out is completed (at a particular revision), it will be possible to work, modify, and, if necessary, even execute files in the working copy.
Checking out an SVN repository works the same way if there are already files in the repository.
Commit: The local working copy of an SVN repository can be worked with in the same way as any other directory. After an intermediate (completed) state is reached, the files can be committed to the repository via the Commit command. For this purpose, the SVN client displays the files that have been changed. The selection to be transferred can be made and described with a commit-message. Afterward, the changed files are transferred over the network, and the revision number of the SVN repository increases by one. Newly created files or files that are not yet under version control must be added to the repository with Add.
Save changes (for instance, in the CBA ItemBuilder) before committing files.
Update: As soon as more than two working copies are used (e.g., because several people are involved in the preparation of a computer-based assessment), the current status of a working copy may be out of date. For SVN repositories, this means, in the simplest case, that the current revision of the working copy is smaller than the most recent revision on the server (in the repository). If no files have been changed in the outdated working copy, a simple update can be used to update the working copy.
Check for Modifications: The question of whether files or directories in a local working copy have changed, been added, or deleted can be easily checked with the help of SVN. For this purpose, the function Check for Modifications is available, with the help of which a comparison of the working copy can be displayed with its current revision. This function can also be used to check whether an existing working copy still contains modified files that are not yet under version control.
Conflicts: As long as parallel changes in the SVN repository always affect different files, Commit and Update allow all users of the SVN repository to edit files in parallel and to share them using the repository. However, if two users make parallel changes to one or multiple identical files, so-called conflicts occur. Conflicts can be related to files (file conflicts) or to the directory tree (tree conflicts). Conflicts occur when executing the update command. If user A tries to commit a file that was already changed and committed by a different user B, SVN requests user A to update the working copy before the changes can be committed. Since the SVN repository is agnostic about its content, conflicts need to be resolved by users. With existing conflicts, no commits are possible. A graphical user interface (or the Explorer integration of TortoiseSVN using the context menu) is of great value for resolving conflicts.
Advanced Features: A complete introduction to all features, options, and possibilities of version management with SVN would go beyond this book’s scope. Therefore, only the keywords for selected advanced features will be mentioned and briefly explained in the following:
Merging: SVN attempts to combine changes in the repository with local changes that have not yet been committed when updating. This process is called merging. If this does not succeed, a conflict occurs.
Ignored Files or Folders: Files or directories that should not be part of the repository but are located within the working copy’s directory can be excluded from the SVN. For instance, this function is useful if assessment software that is part of the SVN writes result data into a subdirectory and possible test data should be excluded from the repository.
Diff: For files in text format (i.e., explicit text documents, but also files with program syntax and files in CSV, DAT, INI, XML, JSON, YAML, and similar formats), the difference from a previous version can be easily displayed directly in SVN. The display of differences (diff) is beneficial, especially when the commit-message is not meaningful. SVN client tools (such as TortoiseSVN) can also create diffs for some other file formats. Unfortunately, a simple visualization of differences is often not possible for non-text-based file formats (i.e., images, but also ZIP archives and CBA ItemBuilder project files).
Revert: In order to restore a previous state of the SVN repository, the revert function can be used to restore an earlier revision of the files and folders of the repository.
Version management using SVN can also use tags and branches, and has a concept for locking of files (see for more details, for instance, Mason 2006).
Summary: Version control allows preparing and developing computer-based assessments using multiple files in repositories. The critical benefit of using version control compared to other file-sharing approaches (e.g., cloud storage) is that modifications of files are documented (using commit messages) and that conflicts (i.e., modifications of identical parts of the repository by multiple users) are detected (and infrastructure to handle conflicts is provided). Moreover, the revision number (of SVN repositories) can be used to exactly define the version of all93 files used for deployment of a computer-based assessment (i.e., for a particular data collection).
8.3.2.2 GIT
As a more modern alternative to SVN, the basics of version management with Git are now briefly described.
Overview: Git, unlike SVN, is a distributed version management system. This allows using Git to manage different versions locally before pushing changes over a network to a remote repository. This two-step process of commits adds some complexity compared to SVN. However, it allows performing almost all git operations locally.
Repository URL: For GIT, the remote repository is addressed via a URL. How the URL looks exactly depends on the way of communication with the server. Possible protocols are https and ssh.
Commit Hash: Instead of an incremental revision number (as used by SVN), each commit in git is identified by a SHA-1 hash. To identify a commit, the hash, shortened from 40 characters to 6-8 characters, is often displayed in the git history.
Instead of taking the largest revision number, git uses HEAD as a named pointer to a specific commit, representing the current commit (of a given branch).
Requirements: Various cloud services (e.g., github) and project management tools offer the possibility of creating git repositories and viewing them in the browser. Files can often also be edited or uploaded in the browser and changed directly in the repository via commit.
Git clients for all platforms can be downloaded from https://git-scm.com. Git clients are directly integrated in a number of tools (e.g., RStudio), and there are graphical tools for git (e.g., GitHub Desktop, SourceTree, and many more) as well as a Windows Explorer integration (TortoiseGit).
A full introduction to git is not necessary for organizing assessment projects and is beyond the scope of this chapter (see, e.g., Tsitoara 2020). In the following, only the basic steps necessary to use git without branches to manage files will be described.
Clone: Before working with files in a local working copy of a git repository, a copy of the repository is required. This can be created for empty and already used repositories via the Clone command. Compared to SVN (where checkout was used for this step), git uses clone to download not only the current commit (HEAD) but also all previous versions of files and all changes into the local repository.
Staged Files: Files within the working copy are, analogous to SVN, not automatically part of the repository. For that they have to be added with Add. Git then differentiates between the states for files shown in Figure 8.5.
New files are initially ignored by git, i.e., their contents are untracked. When a file is added, git marks it as staged. A snapshot of the staged files can be created in a git repository via commit. After that, the state of the committed files is unmodified until they are edited or changed. Then they are marked as modified. Before edited files with the status modified can become unmodified files again via Commit, they undergo the status of staged files again. Unmodified files can be removed from the repository (i.e., marked as untracked).
Commit: Tracked files that have been modified or added (i.e., files in the staged status, see Figure 8.5) can be committed. Graphical tools for git often show the files in the staged state or allow staging all modified files with a simple selection.
Analogous to SVN, a commit message is required for each commit, which is then used in the git history to describe the changes. The snapshot of the files from the working copy created with the help of a commit is marked with a hash in git and stored in the local repository.
Push: After committing files, the additional push command is required to transfer the commit, which is made on a local branch of the git repository, to a remote repository (see Figure 8.4).
Fetch: When multiple users push to a git repository, commits made by another user can be retrieved with Fetch. This makes the repository aware of changes, but Fetch does not yet integrate them into the local working copy.
Pull: Only with the command Pull will changes pushed to the repository by other users be downloaded and copied to the local working copy. If conflicts occur, these must be resolved via a merge commit. Git provides the options use ours and use theirs for this.
Additional Features: The git tool, popular in software development, goes far beyond the features and functionality needed to manage (binary) assessment components in the form of CBA ItemBuilder Project Files.
Tagging: Since each commit is only marked with a hash, this is not well suited for naming a specific version (e.g., the tested final version of an assessment). For this purpose, the option of tagging can be used in git, where a concrete state of a repository is named with a (readable) label (e.g., v1.0).

Branches: Git has a sophisticated concept for Branches, i.e., for the division into several areas, in which files with the same name can have a different status. For example, when changes are made by different users at the same time, git automatically creates Branches. Branches can also be used to test and develop changes in a protected section, while the main section remains usable. Often, Branches are used systematically with git, for instance, whenever a new feature is to be developed and tested. A popular strategy for this approach is git flow.
Revert and Reset: Git provides several ways to access previous commits in a repository. Especially when several users work in a common git repository, it is necessary to choose carefully here. In order to track the history of CBA ItemBuilder Project Files and to revert to a previous version if necessary, it is often sufficient to display the git history in the browser if the changes were committed to the central repository via push. Project Files in earlier versions can then be downloaded and reused if changes are to be discarded.
Similar to SVN, git can also ignore individual files or files of a certain type or in a certain directory via the .gitignore file.
Summary: Git is a complex and powerful version control system whose basic features can also be used to manage assessment projects. It is superior to SVN for this task if versions are to be managed locally, even without a network connection to a central repository.
8.3.3 Working with Project Files as ZIP Archives
Extract CBA ItemBuilder Version: Each CBA ItemBuilder file contains a file {Project-Name}.json. In this JSON file, which can be read with a text editor, the supported version of the CBA ItemBuilder (runtimeCompatibilityVersion) is directly in the first line.
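For batch processing, the version information can also be read programmatically. The following Python sketch (file name and helper function are, of course, arbitrary) opens a Project File as a ZIP archive and searches the contained JSON files for the runtimeCompatibilityVersion key:

```python
import json
import zipfile

def get_runtime_version(project_file):
    """Return the runtimeCompatibilityVersion stored in a Project File (ZIP archive)."""
    with zipfile.ZipFile(project_file) as zf:
        for name in zf.namelist():
            if not name.endswith(".json"):
                continue
            try:
                data = json.loads(zf.read(name).decode("utf-8"))
            except (ValueError, UnicodeDecodeError):
                continue  # skip JSON files that cannot be parsed
            if isinstance(data, dict) and "runtimeCompatibilityVersion" in data:
                return data["runtimeCompatibilityVersion"]
    return None

print(get_runtime_version("ExampleProject.zip"))  # file name is hypothetical
```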
Extract Scoring Information: Suppose many CBA ItemBuilder files are to be tested automatically and integrated into delivery software. In that case, it is a good idea to automatically check the transfer of result data (i.e., the scoring). For this purpose, the scoring can be read out automatically from the JSON file included in the CBA ItemBuilder project files.94
Reading Metadata: CBA ItemBuilder Project Files contain metadata for describing the content (see section 6.3.4). Metadata can be found in the file metadata.xml. This XML file, following the Dublin Core specification, can be extracted from the ZIP archives.
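A minimal Python sketch for reading the metadata could look as follows; the Dublin Core element namespace used here is the standard one and is assumed to match the namespace used in metadata.xml:

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard Dublin Core element namespace; assumed to be the one used in metadata.xml.
DC_NS = "{http://purl.org/dc/elements/1.1/}"

def read_metadata(project_file):
    """Extract Dublin Core elements from metadata.xml inside a Project File (ZIP archive)."""
    with zipfile.ZipFile(project_file) as zf:
        root = ET.fromstring(zf.read("metadata.xml"))
    return {el.tag.replace(DC_NS, ""): (el.text or "").strip()
            for el in root.iter() if el.tag.startswith(DC_NS)}

print(read_metadata("ExampleProject.zip"))  # file name is hypothetical
```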
| Folder/File | Description |
|---|---|
| stimulus.json | Metadata and information about the tasks |
| config.json | Runtime definition of the tasks |
| resources/ | Resource files required for the task |
| external-resources/ | External resources required for the task |
Runtime Code: The CBA ItemBuilder Project Files fulfill two functions. They allow modifying, editing, and previewing of items using the CBA ItemBuilder desktop program. In addition, they can be used with existing deployment software (see, for example, section 7.5) or by programmers using the TaskPlayer API (see section 7.7) to execute assessments. Files and directories required at runtime are listed in Table 8.4. Files and directories required for editing with the CBA ItemBuilder are summarized in Table 8.5.
| Folder/File | Description |
|---|---|
| metadata.xml | Metadata defined in the Project File (see section 6.3.4) |
| internal.json | Internal project information used by the CBA ItemBuilder |
| global_1_1.xlf | XML file containing texts for translation in XLIFF format |
| global.cbavaluemap | XML file containing the definition of Value Maps |
| global.cbaitemscore | Scoring definition (with reference to dsl-files in folder scoringResources/) |
| scoringResources/ | Folder with dsl-files containing Scoring Conditions |
| global.cbavariables | XML file containing the definition of Variables |
| global.cbalayoutsettings | XML file containing the layout definition for Tasks |
| conditionFiles/ | Folder with dsl-files containing Conditional Link Conditions |
| global.emfstatemachine | XML file containing the State definitions |
| statemachineFiles/ | Folder with dsl-files containing Finite-State Machine syntax |
| {page}.cbaml/.cbaml_diagram | Page definition edited with the CBA ItemBuilder for each {page} |
| project.properties | Global properties of the Project File |
| .project | (Can be ignored.) |
Take care when modifying files in the directories resources and external-resources and make backup copies in any case, or use version management (see section 8.3.2).
Replace Resource Files: Images, videos, and audio files added to CBA ItemBuilder Project Files via the Resource Browser (see section 3.10.1) are included in the ZIP archives in the sub-directory resources. If the file names (incl. upper and lower case) and the file formats (incl. the file extension) remain identical, resource files can be exchanged, modified, and updated directly in the ZIP archives even without the CBA ItemBuilder. The resolution (pixel width times height) of image and video files should remain the same to guarantee that the resources are appropriately rendered during runtime.95
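Since the ZIP format does not allow replacing entries in place with most standard libraries, a small script that rewrites the archive can help when many resource files have to be exchanged. The following Python sketch assumes the folder layout from Table 8.4 (resources/ at the top level of the archive); the file names in the example call are hypothetical:

```python
import shutil
import zipfile

def replace_resource(project_file, resource_name, new_file):
    """Replace resources/<resource_name> inside a Project File by rewriting the ZIP archive."""
    target = "resources/" + resource_name
    tmp_file = project_file + ".tmp"
    with zipfile.ZipFile(project_file) as src, \
         zipfile.ZipFile(tmp_file, "w", zipfile.ZIP_DEFLATED) as dst:
        for entry in src.infolist():
            if entry.filename == target:
                continue  # drop the old version of the resource
            dst.writestr(entry, src.read(entry.filename))
        # Re-add the resource under the identical archive path (name and extension unchanged).
        dst.write(new_file, target)
    shutil.move(tmp_file, project_file)

replace_resource("ExampleProject.zip", "stimulus.png", "stimulus_revised.png")
```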
Add ExternalPageFrame Resources: If content is inserted into CBA ItemBuilder Project Files as ExternalPageFrame, then the files are included unchanged in the directory external-resources of the ZIP archive. Content can be added to ZIP archives (i.e., CBA ItemBuilder Project Files) using the Embedded HTML Explorer (see section 3.14.2). As long as the Page Address (see Figure 3.152), i.e., the file which is directly included by a component of type ExternalPageFrame, remains identical, the external resources in the directory external-resources can also be updated, added, or inserted directly in the ZIP archive.
Edit Value Maps: Defining complex Value Maps using the editor provided by the CBA ItemBuilder (see section 4.2.4) can be cumbersome. The definition of Value Maps is stored in the file global.cbavaluemap inside the CBA ItemBuilder Project Files. The following XML shows the content of the file global.cbavaluemap used for the example item shown in Figure 4.14 (see section 4.2.4):
<?xml version="1.0" encoding="UTF-8"?>
<valuemap:ValueMapper xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:valuemap="http://valuemap.softcon.de" version="VERSION_01_01">
<valueMaps xsi:type="valuemap:DiscreteValueMap" name="M_Example">
<entries guard="1" text="Red" image="1.fw.png"
audio="red.mp3" video="red.mp4"/>
<entries guard="2" text="Red-Yellow" image="2.fw.png"
audio="red-yellow.mp3" video="original.ogv"/>
<entries guard="3" text="Green" image="3.fw.png"
audio="green.mp3" video="green.mp4"/>
<entries guard="4" text="Yellow" image="4.fw.png"
audio="yellow.mp3" video="yellow.mp4"/>
</valueMaps>
</valuemap:ValueMapper>
As long as the structure of the XML file remains valid and all resources exist (i.e., images, audio, and video files mentioned in the XML attributes are included in the Project File), the file global.cbavaluemap can be extracted from the ZIP archive, edited in a text editor (or XML editor), and copied into the ZIP archive again. To make sure the CBA ItemBuilder Project File remains valid, open the file in the CBA ItemBuilder and preview all Tasks after changing the file global.cbavaluemap.
8.3.4 Use of Continuous Integration/Continuous Delivery
Continuous Integration (CI) and Continuous Delivery (CD) refer to techniques developed in software engineering for automatically updating software environments by enforcing automation in building, testing, and deploying applications. CI/CD is organized in so-called Pipelines, which are code scripts executed based on triggers, such as Push commands in git repositories (see section 8.3.2.2).
Content created as Project Files with the CBA ItemBuilder can be used together with the TaskPlayer API (see section 7.7) in CI/CD-pipelines. The converter described for the integration of CBA ItemBuilder items as PCI components into TAO (see section 7.4) is based on a Github workflow. The pipeline that is shared as a Github template project (see the project fastib2pci) is configured to automatically build the PCI components using CBA ItemBuilder Project Files committed to the corresponding git repository.
8.4 Testing CBA Projects
A downside of the many advantages of computer-based assessments (see section 2.1) is that the more interactive (and innovative) assessment items are, the more sensitive the created content is to errors and potential glitches. Potential issues that require (sometimes) intensive testing can be functional (e.g., a missing NEXT_TASK command, see section 3.12.1) or affect only the layout and presentation of tasks (see section 6.8.5 for tips).
Moreover, since different software components and multiple steps are usually involved in generating assessment content, finding errors and testing assessment projects can be complex, requiring a systematic approach. The most crucial part of implementing CBA assessment projects is understanding and developing a notion of what needs to be tested and when tests may need to be repeated if specific changes have been made to a test delivery or the components used.
| Testing |
|---|
| Cross-Browser Testing (when required) |
| Component Testing / Scoring Testing |
| Integration Testing / Data Storage Testing |
8.4.1 Testing Cross-Browser Compatibility
The items created with the CBA ItemBuilder are displayed in the browser with the help of a runtime (see TaskPlayer API in section 7.7) and, if necessary, other additional software components of the delivery software. In the current version, the CBA ItemBuilder runtime is implemented in JavaScript based on the React framework and tested with the CBA ItemBuilder in the current browsers at the time of release. However, additional functionality provided by the used browser or the used browser component is always necessary to display the items. Browsers are subject to continuous development and change, and several differences exist between browsers on devices with different operating systems.
The dedicated testing of CBA ItemBuilder content in different browsers is necessary if A) browsers are used which had a low penetration at the time of the release of the CBA ItemBuilder version (e.g., browsers on specific devices like SmartTVs) or were still unknown (e.g., newer browser versions after the release of a specific CBA ItemBuilder version/runtime). Cross-browser testing is B) also necessary if any content is included via the ExternalPageFrame interface that was either implemented specifically for an assessment or has not yet been tested in the intended browsers.
Running Preview in a Specific Browser: The assessment content created with the CBA ItemBuilder can be viewed in the browser directly from the authoring tool (see section 1.4). By default, the system browser is used, i.e., the Web browser registered as default on the local computer used to run the CBA ItemBuilder. After a preview has been started, the URL opened in the default browser can also be opened in other browsers if they are installed locally on the computer. In this way, assessment content can be viewed and checked in different browsers.96
Using External Tools for Cross-Browser Testing: Since CBA ItemBuilder created assessment components are generated as HTML output for integration into test deliveries, tools for testing websites in different browsers can also be used to verify cross-browser compatibility.
Key technologies for automating website testing, such as Selenium or Playwright, and the various solutions for automated website testing on different devices can be used to test cross-browser compatibility. Cross-browser testing is suggested in particular when external components are embedded into CBA ItemBuilder project files using the ExternalPageFrame component (see sections 3.14 and 4.6.2), or if CSS adaptations are used (see section 6.3.3).
8.4.2 Testing Assessments Using Synthetic Data
For the further test steps shown in Table 8.6, it has proven helpful to consider the structure of assessment projects. Assessments generally consist of many individual components (items and units, as well as instructional components like tutorials). The components are administered either in a fixed sequence (see Fixed Form Testing in section 2.7.1), in different booklets or rotations (see section 2.7.2), or in individualized sequences (see Multi-Stage Testing in section 2.7.3 and Computerized-Adaptive Testing in section 2.7.4).
An essential first step is Component Testing to ensure that test cases systematically cover the specific conditions of all components. This means that the individual components (i.e., items or units, instruction pages, tutorials, etc.) are tested separately.
Component Testing / Scoring Testing: Component testing regarding the behavior of items can be combined with scoring testing of individual items. Depending on the complexity of the evidence identification rules used to define the scoring, scoring testing might be trivial (for instance, to make sure that the selection of radio buttons is appropriately captured). However, it can get more complex if, for instance, multiple pages are used and the missing-value coding is included in the scoring definition.
To test the scoring, the individual assessment components must first be identified. For each item, all the different correct and incorrect solutions should be systematically entered, produced, and checked. If several items or ratings are contained within a component, it is recommended to check to what extent they are independent. If the defined scoring rules reveal dependencies, then the scoring check of the individual components should also consider all combinations, as far as possible, with reasonable total effort.
For organizing the testing of CBA projects, it is recommended to use a version control system and to organize the process using an issue tracker (see section 8.3).
Integration Testing / Data Storage Testing: Data storage (typically of result data) is tested with the individual assessment components integrated into the test deployment software. A systematic approach is based on Click Patterns (also called Mock Cases), meaning pre-defined responses to all items in a test or booklet. To verify the entered responses (synthetic data) against the results collected by the assessment software, the data post-processing (see section 8.6) should already be in place (end-to-end testing).
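The comparison of mock cases with the collected data can be automated. The following Python sketch assumes that both the expected responses for the click patterns and the post-processed result data are available as CSV files with the (hypothetical) columns case_id, variable, and value:

```python
import csv

def compare_mock_cases(expected_csv, exported_csv):
    """Compare pre-defined mock-case values with values exported by the post-processing."""
    def load(path):
        with open(path, newline="", encoding="utf-8") as f:
            return {(row["case_id"], row["variable"]): row["value"]
                    for row in csv.DictReader(f)}
    expected, exported = load(expected_csv), load(exported_csv)
    mismatches = [(key, value, exported.get(key))
                  for key, value in expected.items() if exported.get(key) != value]
    for key, want, got in mismatches:
        print(f"Mismatch for {key}: expected {want!r}, got {got!r}")
    return not mismatches

# File names and column names are hypothetical.
compare_mock_cases("mock_cases_expected.csv", "results_postprocessed.csv")
```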
For pragmatic reasons, the use of screen recording software (e.g., OBS Studio) to check the scoring may also prove helpful. If the screen is continuously recorded during the scoring check, then possible input errors can be identified more quickly in the event of inconsistencies.
Missing or skipped responses, time constraints, timeouts, and special test events, such as test leader interventions or test terminations at defined points, should be included in the mock cases so that missing coding can be checked later.
Scoring results are typically provided as text values (e.g., if the result-text()-operator is used). In addition, the delivery software can use a codebook to translate them into variables (with customized variable names and variable labels) and, for categorical variables, with (newly defined) variable values (e.g., of type integer) and with additional value labels.
Please note that if item selection is also dependent on random processes in the context of adaptive testing, for instance, as part of exposure control, then the algorithms for item selection must be tested as an additional step. Testing adaptive algorithms is done, for instance, in pre-operational simulation studies, if the algorithms used for operational testing are accessible for simulation studies as well.
Verification of Log Events: As described in section 2.8, the theory-driven collection of log data is increasingly important. Various possible process indicators can be extracted from log data, providing information about emotional and cognitive processes during test and item processing. Log data thus provide the basis for a possible improvement in the interpretation of test scores and, if their use is planned, should be reviewed before an assessment is conducted. When verifying and checking log data, special attention should be paid to the fact that log events depend on their context (Contextual Dependency of Log Events). Therefore, verifying and checking log events may require a reconstruction of the context from previous log events.
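As a simple illustration of this contextual dependency, the following Python sketch annotates each event with the page that was active when the event occurred, reconstructed from preceding page-switch events; the event names and keys are invented for the example:

```python
def annotate_with_page(events):
    """Attach the currently visible page (reconstructed from preceding navigation
    events) to every log event; event names and keys are hypothetical."""
    current_page = None
    annotated = []
    for event in events:
        if event["type"] == "PageSwitch":
            current_page = event["newPage"]
        annotated.append({**event, "page": current_page})
    return annotated

log = [
    {"type": "PageSwitch", "newPage": "page2", "time": 12.3},
    {"type": "ButtonClick", "id": "optionA", "time": 15.8},
]
print(annotate_with_page(log))  # the click can now be interpreted as a click on page2
```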
8.5 Running Assessments
After the preparation and testing of assessment projects, the actual data collection (fieldwork) takes place. The data collection can be released with a specific revision or tagged version status if a software tool is used to version the assessment content (see section 8.3.2). Suppose this revision number or version information is also stored in the survey data. In that case, it can be traced which exact version was used for a test-taker, even in the case of longer data collection phases and possible adjustments during the field time.
| Test Administration / Data Collection |
|---|
| User Management / Authentication |
| Test Deployment |
The steps summarized in Table 8.7 start with User Management / Authentication. If assessments are embedded into other (digital) environments or (longitudinal) designs, information might be linked to the identifiers used for test-taker authentication (e.g., as log-in or token). These so-called pre-load variables need to be handled by the test deployment software (see, for instance, section 7.5). Special focus is also on the identifiers used for persons (user management / authentication), as these identifiers might need to be replaced during data post-processing.
While the assessment is running, typically using one or multiple of the (mixed-mode) delivery modes described in section 7.2.1, intermediate data might be made available to start data processing while the data collection is still ongoing (for instance, by providing the Raw Data Archives of already completed assessments incrementally). If CBA is integrated into a computerized survey with even more components (e.g., questionnaires or interviews), then selected data from the assessment can also be taken over directly for field monitoring (e.g., in the form of a monitoring file, see section 7.5.7).
8.6 Data Processing after Assessments
The data preparation process should already be tested as part of the Integration Testing (see section 8.4.2). For this purpose, the required routines (e.g., R scripts) should already have been created prior to data collection and tested with the help of synthetic data. Testing is complete if it is verified that the central information required for identifying evidence about the test-taker’s knowledge, skills, and abilities can be derived from the completed tasks.
| Data Preparation / Reporting / Feedback |
|---|
| Data Preparation |
| Coding of Open-Ended Responses |
| Final Scoring / Cut Scores / Test Reports |
| Data Set Generation / Data Dissemination |
If no ExternalPageFrame content is used, the required data preparation for result data boils down to combining the data stored in case-wise individual Raw Data Archives into a data set in the desired file format.
Data Preparation: Data preparation can begin during data collection if intermediate data are provided or made available. Typically, the data is generated in smaller units (i.e., sessions) in which a test-taker processes a set of tasks compiled for him or assigned to him via pre-load information. The data on a test taker, as provided by the assessment software, can be understood as a Raw Data Archive. Analogous to the scans of paper test booklets, these Raw Data Archives (for example, combined as a ZIP archive) are the starting point for data preparation. If raw data from computer-based assessments must be archived in terms of good scientific practice, then this can be understood as the requirement for long-term storage of the Raw Data Archives.
A first step often required to describe the collected data as pseudonymized or anonymized is the exchange of the person identifiers (ID change) that were used during data collection. Person identifiers might be used as the file names of the Raw Data Archives and might be included in several places. Since the Raw Data Archives should not be changed after data collection, data processing means extracting the relevant information from the Raw Data Archives and changing the person identifier in the extracted result data and the extracted log data.
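A sketch of this ID change in R (assuming hypothetical file names, a mapping table from field IDs to pseudonyms, and one `results.csv` per Raw Data Archive) might look like this:

```r
# Minimal sketch: extract result data from case-wise Raw Data Archives and
# replace the person identifiers (ID change). File names and the content of
# the archives are hypothetical and must be adapted to the actual project.
id_map <- read.csv("id_mapping.csv")          # columns: field_id, pseudonym

archives <- list.files("raw_data_archives", pattern = "\\.zip$", full.names = TRUE)

extract_case <- function(zip_file) {
  res <- read.csv(unz(zip_file, "results.csv"))        # assumed file in archive
  field_id <- sub("\\.zip$", "", basename(zip_file))   # field ID from file name
  res$person_id <- id_map$pseudonym[match(field_id, id_map$field_id)]
  res
}

# Combine all extracted cases; the Raw Data Archives themselves stay unchanged
result_data <- do.call(rbind, lapply(archives, extract_case))
write.csv(result_data, "result_data_pseudonymized.csv", row.names = FALSE)
```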
Approaches known from Open Science and reproducible research (Gandrud 2020) should be used (i.e., scripts maintained under version control) to allow re-running the complete data preparation from the Raw Data Archives to the final data sets. If the data preparation is carried out entirely with the help of scripts (e.g., using R), later adjustments are more straightforward. Possible adjustments include deletion requests for the data of individual test-takers, which might otherwise be cumbersome if, for example, a large number of data sets is created due to the collected log data (see section 2.8).
Log data can be analyzed, for instance, using the R package LogFSM (see section 2.8.5).
Coding of Open-Ended Responses: The operators described in chapter 5 for evaluating so-called Open-Ended Answers within the CBA ItemBuilder are currently limited. Open-ended answers (such as text answers) can only be scored automatically to a minimal extent (in the CBA ItemBuilder, only with the help of regular expressions). More modern methods of evaluating open-text responses using natural language processing [NLP; see, for instance, Zehner, Sälzer, and Goldhammer (2016)] might require a two-step procedure: Training data are collected in the first step and not evaluated live during test-taking. Afterward, classifiers are trained based on NLP language models or adapted in the form of fine-tuning. Once such classifiers are available, the answers given by test-takers can be automatically evaluated and transferred to the data set.
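To illustrate the limited regular-expression-based scoring mentioned above, a post-processing sketch in R (with a hypothetical item and pattern) could look like this:

```r
# Minimal sketch: regular-expression-based scoring of short text answers in
# post-processing (hypothetical item asking for the capital of France).
responses <- data.frame(
  person_id = c("P1", "P2", "P3"),
  item_text = c("Paris", " paris ", "I think it is Lyon"),
  stringsAsFactors = FALSE
)

# Pattern accepts 'Paris' with optional surrounding whitespace, case-insensitive
pattern <- "^\\s*paris\\s*$"
responses$item_score <- as.integer(grepl(pattern, responses$item_text,
                                         ignore.case = TRUE))
responses
```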
A similar procedure applies to graphical answer formats (e.g., when an ExternalPageFrame allows test-takers to create a drawing as their answer). For the creation of training data in preparation of automatic coding, or if answers are to be evaluated exclusively by human raters, the open answers must be extracted from the Raw Data Archives for an evaluation process (Human Scoring).
For ExternalPageFrames, the JavaScript/HTML5 content embedded into CBA ItemBuilder items must implement the getState()/setState() interface to collect the state of the ExternalPageFrames on exit and to allow the content to be restored for scoring purposes (rating).
Final Scoring: Whether items already score the responses at runtime (scoring) or whether only the raw responses (i.e., selected options, entered texts, etc.) are collected is decided differently for different assessments. As long as the responses are not needed for filtering or ability estimation (see section 2.7), there is no critical reason why scoring should not be performed as part of post-processing. Only if created assessment content is shared (see section 8.7.3) is it helpful to define the scoring directly within the CBA ItemBuilder Project Files (i.e., the files to be shared), because this way, items are automatically shared with the appropriate scoring.
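A post-processing scoring step could, for instance, be sketched in R as a simple lookup of raw responses against a scoring key (hypothetical item and variable names):

```r
# Minimal sketch: score raw responses in post-processing using a scoring key
# (hypothetical items with single-choice raw responses stored as option labels).
raw <- data.frame(
  person_id = c("P1", "P2"),
  item01    = c("B", "C"),
  item02    = c("A", "A"),
  stringsAsFactors = FALSE
)

key <- c(item01 = "B", item02 = "A")          # correct option per item

scored <- raw
for (item in names(key)) {
  scored[[paste0(item, "_score")]] <- as.integer(raw[[item]] == key[[item]])
}
scored
```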
Cut Scores and Item Parameters: Even if the scoring (i.e., the mapping of a selection to a score such as correct, wrong, or partially correct) can be part of the item (implemented, for instance, using the scoring operators described in chapter 5), the Item Parameters and potential Cut Scores (i.e., threshold values for estimated latent abilities) are not considered part of the assessment content. These parameters might either not be known when a newly developed instrument is used for the first time, or their values might depend on the intended target population.
Test Reports: Different parts of an assessment software might be responsible for feedback either during the assessment (see section 2.9.1), or after data processing and scoring of open-ended responses (see section 2.9.2). Hence, reports can be generated either online (as part of the assessment software) or offline (as part of the data processing).
An example of a deployment option that can be used for generating reports is ShinyItemBuilder (see section 7.3.5).
Data Dissemination: The provision and distribution (i.e., dissemination) of data from computer-based assessments, for example in research data centers, can be done for Result Data and Process Indicators in the typical form of data sets (one row per person, one column per variable). Since the different assessment platforms and software tools provide log data in different ways, log data can be transformed into one of the data formats described in section 2.8.4 as part of the data processing after an assessment.
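As an illustration of such a transformation, the following R sketch (with a hypothetical in-memory event structure) converts raw log events into a flat long-format table with one row per event:

```r
# Minimal sketch: transform raw log events (here a hypothetical in-memory
# representation, one list per event) into a flat table with one row per event.
events <- list(
  list(person_id = "P1", timestamp = "2024-05-03T10:15:02", type = "ItemStart",
       item = "Item01"),
  list(person_id = "P1", timestamp = "2024-05-03T10:15:41", type = "ButtonClick",
       item = "Item01"),
  list(person_id = "P1", timestamp = "2024-05-03T10:16:10", type = "ItemEnd",
       item = "Item01")
)

# One row per event: person identifier, timestamp, event type, and context
log_long <- do.call(rbind, lapply(events, as.data.frame))
log_long$timestamp <- as.POSIXct(log_long$timestamp, format = "%Y-%m-%dT%H:%M:%S")
log_long
```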
8.7 Documentation and Archiving of Computer-Based Assessments
The assessment cycle introduced at the beginning of this chapter (see Figure 8.1) contains Documentation & Dissemination as the last component. In general, documentation can be understood with respect to the items (i.e., the instrument) and the data (see Table 8.9).
Documentation |
---|
Item and Instrument Documentation |
Data and Log-Data Documentation |
Archiving and documentation of computer-based assessments can have different objectives. The first central question is whether there is a link to research data that has already been collected. Archiving of the software then often takes place in the context of data archiving, so that questions regarding the interpretation or understanding of the existing data can be answered with reference to the software that was used. In this case, the software should be provided along with the assessment content (i.e., tasks, instruction pages, etc.) as closely as possible to how they were used to collect the research data. However, since the software might come with specific requirements, archiving the computer-based assessment must take these requirements into account so that the software can (hopefully) also be executed in the future.
Archiving of computer-based assessments can also serve the purpose that other researchers or stakeholders can use the developed assessment instruments (sharing). The two goals need not be mutually exclusive, but it should be made clear what the goal of archiving computer-based assessments is.
- Goal to archive assessment content to document an existing data set
- Goal to allow the use of developed content in future data collections
A second key issue concerns the separation of assessment software and assessment content. Such a separation exists, for example, if the software allows exporting the assessment content created with it, as is the case, for instance, with TAO, which allows exporting items in QTI format (Question and Test Interoperability). In the case of QTI, different software components could be used to administer assessments that use the QTI content. A similar separation also applies to the CBA ItemBuilder, which allows the assessment components created with it to be archived independently of the software (i.e., the specific version used to author the CBA ItemBuilder Project Files and the software used for test deployment). Since the CBA ItemBuilder Project Files contain the runtime configuration (see section 8.3.3), they are sufficient for use with deployment software (including TAO, see section 7.4) or the TaskPlayer API (see section 7.7).
- Requirements to run / use the software (operating system / frameworks / browsers)
- Requirements to run / use the content (compatibility of content, e.g., QTI version)
A third question concerns the anticipated, expected, and allowed use of, and possible modifications to, the archived computer-based assessment, for instance, for future data collections with new samples. This third question includes licensing issues regarding the content (i.e., the items and possibly embedded resources such as images), licensing of the software, and the technical aspects required for using (i.e., executing and running) the software securely.
- Right to use the software / the content for specific purposes (e.g., new data collection)
- Right to store the software / the content (for instance, for archiving)
- Right to distribute the software / the content for further use (e.g., for other projects)
- Right to change the software / the content (for instance, to adjust for further needs)
8.7.1 Archiving CBA Software to Document Datasets
If the goal is to archive a digitally-based assessment to interpret existing data, a first idea could be to archive the complete software as used for the data collection. The underlying rationale is similar to paper-based assessments and the practice of archiving the assessment materials (i.e., booklets), for instance, as PDF files. However, acknowledging that the assessment was digitally based, more than static representations of items or screens (e.g., screenshots) might be required, and archiving the assessment as an interactive system might be considered the natural choice.
Documentation of Requirements: Whether archiving the software used in data collection is useful depends, first of all, on whether the requirements needed to run the software can be fulfilled. Accordingly, a prerequisite for investigating the viability of this approach is a documentation of all runtime requirements from a technical perspective. Assessments used in offline deployments (see section 7.2.1) might require a particular operating system, require a minimum screen resolution, and might be tested only for particular pointing devices (e.g., not tested for touch input). Beyond these apparent requirements, dependencies (i.e., specific browser versions, installed frameworks or components, such as Java or .NET), user privileges (i.e., whether admin access is required), and network requirements (e.g., free ports) need to be documented and considered. If assessments were performed with dedicated hardware (i.e., computers that were deployed to the assessment sites), additional settings and configurations (e.g., at the operating system level) might also be necessary in order to be able to reproduce the data collection with the archived software. In particular for mobile deployments using apps, the distribution of the assessment software to the mobile devices needs special attention. For online deployments, both perspectives need to be distinguished: At the client side, supported (i.e., tested, see section 8.4.1) browsers need to be documented, while at the server side, documentation of runtime requirements and the server configuration might be relevant to run the assessment software.
Software Virtualization: Techniques such as Virtual Machines (used, for instance, for desktop virtualization, e.g., VMWare, VirtualBox, or Parallels) and Containers (used, for instance, for server virtualization, e.g., Docker or LXC) might help to make software (in specific environments) available for a more extended period. However, in particular for desktop virtualization, licensing of the operating system needs to be considered.
Intended Use of Archived Assessment Software: The critical question regarding the usefulness of this type of archiving is what researchers can do with assessment software archived in this way. If no further precautions have been taken in the assessment software itself, then items can be replayed and answered in the combinations used in the field (e.g., within a booklet design). This option can be helpful, for example, to learn about the items (i.e., the assessment content) in context, to inspect the behavior of items and the assessment platform, and to investigate how prompts or feedback were displayed. If the archived assessment software also provides access to the generated (raw) data, this approach also allows checking how a particular test-taker or response behavior is stored or represented in the data set.
8.7.2 Dedicated Approaches for Documenting CBA Data
As described in the previous section, archiving the assessment software itself, while an obvious idea, is of limited benefit for documenting data from computer-based assessments unless special provisions are made within the assessment software.
Documentation of Result Data and Process Indicators: In terms of documentation of outcome data (i.e., raw responses and inputs as well as scored responses), data sets with result data of computer-based surveys are standard. Hence, codebook documents can be used to describe the result data (in terms of metadata). Result Data, available as variable values per person, can be supplemented by additional Process Indicators (i.e., information describing the test-taking process), for which a value (including NA) is also expected for each person.
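A minimal codebook skeleton could, for example, be generated directly from the result data set, as sketched below (hypothetical data set and variable names):

```r
# Minimal sketch: generate a codebook skeleton (variable name, type, number of
# missing values) from a result data set as a starting point for documentation.
result_data <- data.frame(
  person_id    = c("P1", "P2", "P3"),
  item01_score = c(1, 0, NA),
  time_item01  = c(45.2, 61.0, 38.7)          # hypothetical process indicator
)

codebook <- data.frame(
  variable  = names(result_data),
  type      = vapply(result_data, function(x) class(x)[1], character(1)),
  n_missing = vapply(result_data, function(x) sum(is.na(x)), integer(1))
)
write.csv(codebook, "codebook_skeleton.csv", row.names = FALSE)
codebook
```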
If knowledge of the specific item content is necessary for interpreting the result data or the process indicators, insight into tasks provided by an archived assessment software may be sufficient. However, some information about the log data generated when interacting with the assessment content can be necessary to document Raw Log Events and Contextualized Log Events (see section 2.8.1 for the terminology).
Documentation of Raw Log Events and Contextualized Log Events: Which interactions are generally stored by a digitally-based assessment can often be documented and described even without the specific assessment content. In the case of the CBA ItemBuilder, the log events provided by the items are described for the different components used to implement the content (see appendix B.7 for a documentation of log events), and additional log events might be defined by the item author (described as Contextualized Log Events). Moreover, the deployment software is expected to add additional log events at the platform level.
The more challenging part of the documentation is to relate the assessment content to the collected log data so that the data can be meaningfully interpreted in the context of test-takers' interactions with the assessment content.
Real Items and Live Access to Log Events: The obvious option to allow researchers to inspect interactive assessments is to give them the computerized items in a form where the events stored in the log data are visible after one has demonstrated a particular behavior or interacted with the item. This can be achieved by different approaches, either by modifying the deployment software (see, for instance, section 7.3.7) or by using the authoring software (see Trace Debug Window in section 1.6.2).
Documenting Instruments Using Mock-Items: Given that assessments are often translated (e.g., in the context of international large-scale assessments), there is another way of documenting interactive items to facilitate the interpretation of log data. For that purpose, we define Mock-Items as items in which the sensitive item content (i.e., everything that should not become public to keep the items secure) is replaced by placeholders. Such a replacement is required for all texts, images, video, and audio files that could provide hints about the item’s content. However, it is assumed that replacing the content is possible without altering or destroying the interactive items’ structure and functioning.
Screen Casts or Annotated Screenshots: Documenting log events can also be done using screen casts (i.e., screen recordings showing a particular behavior and the generated log events), for instance, using released items. Alternatively, annotated screenshots of computer-based instruments can be used, or even specifically created webpages that show how specific interactions are logged (e.g., the PIAAC R1 Log Data Documentation).
8.7.3 Approaches to Archive or Share Assessments for Re-Use
Beyond documenting existing data, an important goal can be to share developed assessment content for use in future data collections.
Sharing Software as is: Although similar to the idea described above (see section 8.7.1), sharing assessment content bundled with assessment software as-is for re-use adds additional challenges. The following aspects require special attention: First, it must be considered that the redistribution of the software is different from the use of the software, so it may be a question of licensing whether the right to redistribute exists for the software and for the included content. A second aspect concerns IT security. For archiving accompanying a data set, the assessment software is used under controlled conditions. However, if sharing assessments for re-use aims to facilitate future data collections with a digitally-based assessment using existing software as-is, it must also be possible to do so safely. For online deliveries, in particular, this requires patching and applying security updates sooner or later, meaning the possibility of maintaining the software.
Sharing of Software with Sources: Many maintenance and customization issues of assessment software can be solved if the runtime components (i.e., compiled or built code) and the source code are archived. In particular, if assessment content and assessment software are not separated, making them available, for example, via a public source code repository (e.g., GitHub.com) may allow other researchers to reuse the developed resources. While the open-source provision of assessment software naturally presupposes the right to disseminate the sources, it also presupposes the human resources (i.e., appropriate IT know-how) to be able to use them.
Sharing of Content (Only): An obvious alternative for sharing created assessment content for further use arises when the Content can be separated from the Software. The option to share created items as Content is, at first, analogous to paper-based assessment: As soon as a PDF or Word document of a test booklet is shared, it can be used to prepare future assessments.
Two examples will be examined in more detail here. If a standard exists (as is the case, for example, with Open Office XML for text documents), then different programs can use documents that follow that standard. The Question & Test Interoperability (QTI) specification can be understood as a similar standard for computer-based assessments. If, for example, items created in TAO are exported in QTI format, then these can be stored and used in later assessments, provided an assessment software can read and process the QTI format. The apparent prerequisite for this model to be applicable is that the assessment content can be implemented as QTI items. As the field of computer-based assessment continues to evolve, the QTI standard is also being expanded and adapted. Hence, it might be necessary to document the exact version of the QTI standard, and only the particular version of the software used to author the QTI items (e.g., a specific TAO version) might interpret the assessment content precisely (i.e., the rendering and behavior of the interactive content might differ across QTI players). Moreover, if the software used for QTI editing adopts a new version, a migration process might be required.
If the standard is not sufficient, sharing the content independently of the software used to create it can still be possible. This is illustrated with the CBA ItemBuilder, which does not follow the QTI standard. However, as long as a deployment software is available that supports the CBA ItemBuilder version used to create the content, the generated content can be used for future data collections.
Migration Strategy: Project Files of recent CBA ItemBuilder versions can be used for assessment projects as long as sufficient browser support is provided and no technical or security-related issues prohibit the use of old versions. If archived Project Files of an older version can no longer be used in current delivery software, older Project Files can be migrated using the CBA ItemBuilder. Migrating an outdated Project File means opening it in a newer CBA ItemBuilder version and then saving it as a Project File in this new version. Doing so will update the generated code or the runtime configuration required to use the Project File with a particular deployment software.
The update of Project Files is possible because the implementation of the CBA ItemBuilder ensures that a newer version can read the content of the previous version and convert it if necessary. Accordingly, it may be necessary to perform the migration in multiple steps (using intermediate versions of the CBA ItemBuilder). The release notes of the CBA ItemBuilder (see Table B.5) provide information on points to be considered regarding backward compatibility.
8.7.4 Assessment Content as Open Educational Resources (OER)
The archiving of assessment content created and implemented digitally for applications in educational science can be understood as a particular form of Open Educational Resources (OER). This is particularly true if the goal is to enable content sharing, where the developed items constitute the shared resource.
Before making extensive adjustments to items, one should consider whether this will change psychometric properties and item parameters that have been empirically determined or verified, for example, with the help of a scaling study (see section 2.5.4).