5 Scoring of Tasks
This chapter describes how scoring can be defined in CBA ItemBuilder Project Files. Scoring is always defined at the task level. Accordingly, before scoring can be implemented in a newly created CBA ItemBuilder Project File, a Task must be configured (see section 3.6 for details on defining tasks). A second prerequisite for defining the scoring of Tasks is to define names for all required components. That means that User Defined Ids (see subsection 3.7.4 for details) are required for the components used to gather responses. Human-readable User Defined Ids are recommended because they make it easier to remember the meaning of particular identifiers when using them, for instance, in the syntax definition of scoring rules. Since the definition of scoring rules is based on the User Defined Ids, systematically defined and easily readable IDs simplify the creation and validation of scoring rules for item authors.
The definition of explicit scoring rules is not mandatory for the use of CBA ItemBuilder items, but it provides the most flexible way to combine evidence into scoring. Alternatively, 1) the so-called component state, that is, the values selected or entered for all input elements, can be automatically stored by the test assembly and deployment software (see chapter 7). Moreover, 2) the components can be linked to FSM variables (applies to version 10.0), and the last value of the FSM variables when a task is exited can be used. Finally, 3) log data can also contain all changes of component values, and it is often possible to infer the final response from the collected log events (see section 1.6, and Kroehne and Goldhammer (2018) for response-completeness of log data).
Motivation: Explicit automatic scoring of items can be necessary at runtime when scoring results that incorporate logical rules are required either for adaptive test assemblies (branched testing, multi-stage testing, or adaptive testing, see section 2.7.4), for the different feedback purposes (see section 2.9), or to monitor the test-taking processes. Scoring of CBA ItemBuilder Tasks is evaluated at task switches (i.e., when the test moves from one task to another). Task switches can either be triggered from within the Task (using Runtime Commands, see section 3.12) or from outside (by the deployment software, for instance, because of a global timeout, see section 7.2.8). Having the scoring definition implemented within the assessment components created with the CBA ItemBuilder can also simplify data post-processing workflows (see section 8.6) and the sharing of items (for instance, as Open Educational Resources, see section 8.7.4). Hence, automatic scoring provides essential advantages over the above-mentioned alternatives: It standardizes the scoring procedures and gives immediate access to the scored results.
Implementing the scoring rules within the CBA ItemBuilder project files comes with a second advantage. The scoring definition becomes independent from the deployment software (see chapter 7) and the approach used for data post-processing (see section 8.4.2). Scoring embedded in CBA ItemBuilder projects can be tested already during item development and will be available after distributing items (as CBA ItemBuilder projects). An essential tool for checking the scoring in CBA ItemBuilder items is the so-called Scoring Debugger, as described already in section 1.5. The Scoring Debugger can be used to inspect the scoring live during Preview.
5.1 Terminology, Concepts and User Interface
The core component for the definition of the automatic scoring is the design of syntax conditions, which can be evaluated based on inputs (i.e., Component State of elements) and operators (i.e., incorporating the states in the internal finite-state machine(s) and the visited pages).
User Defined Ids: The link between components and the syntax is provided by the User Defined Ids. Using the main menu Project > Edit all user defined IDs, all named components of a CBA ItemBuilder Project File can be displayed. As shown in Figure 5.1, for a simple multiple-choice item, various components of type Checkbox can be defined, all of them with a unique UserDefinedId.
Figure 5.2 shows a simple item to illustrate the scoring of a multiple-choice item. The item was created by adding an HTMLTextField (see subsection 3.8.2) for the instruction and four Checkboxes (see subsection 3.9.3). A simple scoring of this item might distinguish the conditions Correct (Jaguar and Panda) and Wrong (Jaguar and Panda are not selected or additional wrong options are selected). If a test-taker never selected any Checkbox at all, this might be called a Missing response (see section 5.3.11 for details).
The CBA ItemBuilder allows scoring conditions to be defined using these UserDefinedIds. The conditions are labeled as Hits (i.e., Hit-conditions) and Misses (i.e., Miss-conditions). They combine the UserDefinedIds of components and, if necessary, additional functional operators (see appendix B.2 for all operators) by means of logical operators. Each condition represents a nominal condition that is to be differentiated when computing the value of a scoring variable (called Class). The item contains one Task (defined in the Task-Editor, see subsection 3.6). The Task-Editor is also used to define the scoring (see Figure 5.3).
Class: Scoring is defined using nominal Hit-conditions. Each condition corresponds to a potential value of a variable. To specify the relationship of values to variables, Hit-conditions (i.e., values) are assigned to Classes (i.e., Classes are equivalent to variables in the final data set), as shown in Figure 5.4. The variable Score can have the three potential values Correct, Wrong and Omitted.
Name: Each hit- and miss-condition requires a unique name, that is defined in the Task-Editor (shown in Figure 5.3). This name represents the nominal value of the variable, if the corresponding condition is met (i.e., if the hit is active).
Condition Syntax: Each Hit-condition is defined by providing a scoring syntax. The example item shown in Figure 5.2 contains three different Hits, and the syntax for the conditions is shown in the editor for Conditions in Figure 5.5. The first condition for the hit Correct combines the User Defined Ids of the four Checkboxes with logical operators (using the CBA ItemBuilder specific bracketing of expressions, see section 4.1.3).
However, the syntax for scoring rules (see section 5.3) also allows using scoring operators, for instance, to incorporate information from the dynamic part of CBA ItemBuilder tasks. Operators are illustrated in Figure 5.5 in the syntax for the hit-condition for Wrong, which evaluates to true if the item’s current state is Answered. Note that the state Answered is implemented using a simple finite-state machine (see section 4.4 for details).
Use first active hit per class (applies to all tasks).
(see right part of Figure 5.5).
Scoring Debug Window: The Scoring Debug Window (already introduced in section 1.5) can be used to explore the scoring of the example item shown in Figure 5.2, as shown in Figure 5.6. The Scoring Debug Window can be requested during item development in the Preview and is also available in the examples embedded in the online version of this book.
Multiple Classes: If components are to be used only in a particular way to form outcome variables, defining scoring constraints may be more onerous than strictly necessary (see section 5.2 for an alternative). The full potential of CBA ItemBuilder scoring unfolds in the use cases when different summaries of answers to variables are to be used. This is illustrated in Figure 5.7 for a simple Likert-style item. Assume that two variables should be created: One variable containing the response (Class: Response) and one indicating agreement or disagreement [Class: Style, as used, for instance, in models to investigate response style; cf. Böckenholt and Meiser (2017) and others].
By introducing the layer of hit definitions (described in detail in section 5.3), the CBA ItemBuilder allows the creation of flexible scoring for responses that can combine multiple components and can result in multiple variables (i.e., classes). Use cases not only include questionnaires (as the example in Figure 5.7) but can also be found for cognitive assessment, e.g., when both the raw response and the automatically scored response (correct vs. incorrect) are to be stored or when dichotomous and polytomous scoring is to be considered. Use cases for explicit scoring with multiple classes also arise if, for example, time measures are included in the scoring of responses.
Result-Text: Hit- and miss-conditions can be used to define evidence in terms of categorical values, which can then be assigned to classes to be used as outcome variables. However, entered text and numbers are also required as result variables. To copy text responses to result variables, the CBA ItemBuilder provides the result_text()-operator. The underlying idea is that each class (i.e., variable) can provide a Result-Text in addition to the name of the active hit. The condition defines which particular value is used as Result-Text. The (first) active hit-/miss-condition of a class defines which text is copied into the Result-Text.
The item shown in Figure 5.8 illustrates the use of the Result-Text. The first class (Var1) is used for question 1: Class Var1 has only one hit-condition with the syntax result_text(input1). This condition is always true, and whatever is entered in the SingleLineInputField with the User-Defined Id input1 is copied to the Result-Text for Var1.
For question 2, the class Var2 is used with two hit-conditions. When a text is entered into the InputField with the User-Defined Id input2 (i.e., the text is not empty, which is checked using the condition matches(input2,"")), the value is copied to the Result-Text using the operator result_text(input2) in the condition Q2_Text. When nothing is entered, the Result-Text is filled with the string Missing (see hit-condition Q2_Missing).
The class Var3 is used for question 3. The class contains the selected option (A or B) in the Result-Text (see hit-conditions Q3_A and Q3_B). If none of A, B or Other is selected, the string Missing is copied to the Result-Text (see hit-condition Q3_Missing). Two hit-conditions deal with the case that Other is selected. If no text is entered into the SingleLineInputField with the User-Defined Id input3, the text Other: Not Specified is copied to the Result-Text (see hit-condition Q3_OtherNotSpecified). If a text is entered, the Result-Text is filled with the string Other: followed by the provided text. This is achieved by using an argument list for the result_text()-operator (see 4.1.5). Note that Var3 will not contain the text entered into input3 if A or B is selected. This issue is addressed by defining Var4, which contains the text entered in input3 even if Other is not selected.
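The order of the hit-conditions described for Var3 can be sketched in Python. This is only an illustration of the decision logic, not CBA ItemBuilder syntax: the function var3_result_text, its parameters and the encoding of the selection are hypothetical, the hit name Q3_OtherText is an assumption (the text does not name that condition), and the single space after "Other:" is assumed as well.

```python
def var3_result_text(selected, other_text):
    """Return (hit_name, result_text) for question 3.

    selected   -- "A", "B", "Other" or None (hypothetical encoding of the choice)
    other_text -- content of the input field input3
    """
    if selected == "A":
        return "Q3_A", "A"
    if selected == "B":
        return "Q3_B", "B"
    if selected == "Other":
        if other_text == "":
            return "Q3_OtherNotSpecified", "Other: Not Specified"
        # Hypothetical hit name: the text does not name this condition.
        return "Q3_OtherText", "Other: " + other_text
    # Neither A, B nor Other is selected.
    return "Q3_Missing", "Missing"
```

Because A and B are checked first, a text typed into input3 is ignored for Var3 whenever A or B is selected, which is the reason given above for introducing Var4.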
5.2 Scoring using FSM Variables
Using items provided by Toplak, West, and Stanovich (2014), the item in Figure 5.9 illustrates scoring using variables. In this example, the finite-state machine updates variable values that are designed to allow immediate feedback (correct response, intuitive incorrect response, and any other wrong response) and to compute the total score for all seven items of the Cognitive Reflection Test.
Checkboxes
) and groups (e.g., RadioButtonGroup
and FrameSelectGroups
) using FSM Variables (instead of so-called Hit-/Mis-conditions).
5.3 Definition of Explicit Scoring Rules
5.3.1 UserDefinedIds as String Literals
An important part of possible scoring rules are input elements, i.e., components for the design of items that have a value (i.e., are either selected or un-selected). To refer to the value of a component in a Hit- or Miss-condition, it is sufficient to include the UserDefinedId of the respective component in the condition syntax. For instance, for a checkbox with the UserDefinedId myCheckbox, the string literal myCheckbox is interpreted as TRUE if the checkbox is selected when the syntax is evaluated. If the checkbox is not selected, the string literal myCheckbox is interpreted as FALSE.
The state of CheckBox-components, the selected/unselected state of RadioButton-components, the toggle state of Buttons in toggling mode and the selected/unselected state of ComboBoxItem in a ComboBox can be used to define Hit- or Miss-conditions by simply referring to the UserDefinedId of the component.
5.3.2 Syntax for Scoring Rules
The item scoring mechanism implemented in CBA ItemBuilder goes beyond simple mapping of Scoring Conditions (i.e., hit- and miss conditions) to component states. This is enabled by providing the possibility to formulate conditions as arbitrary combinations of statements using a so-called Domain Specific Language (DSL, i.e., by using a specific syntax).
To combine the UserDefinedIds of the components into logical expressions, the following logical operators can be used:
- A and B: true if A and B evaluate to true.
- A or B: true if A or B evaluate to true.
- not A: true if A is not true.
Flexible combinations of conditions are possible with the basic operators and, or and not. Use the Scoring Debug Window (Ctrl / Strg + S, see section 1.5) to explore the hit conditions in the item shown in Figure 5.10.
Notice the specific bracketing in the last hit condition shown in Figure 5.10: (((A and B) and C) and D). This condition illustrates that combining multiple Boolean expressions requires brackets so that the statement can be decomposed into pairs: A and B, (A and B) and C, and finally ((A and B) and C) and D.
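For longer conditions, the pairwise bracketing can be generated mechanically. The following Python helper is a sketch for authors who assemble condition strings outside the CBA ItemBuilder; the function name bracket_all is hypothetical and not part of the tool.

```python
from functools import reduce


def bracket_all(operator, operands):
    """Combine operands pairwise, left to right, with full bracketing."""
    if not operands:
        raise ValueError("at least one operand is required")
    # Each step wraps the expression built so far together with the next operand.
    return reduce(lambda left, right: f"({left} {operator} {right})", operands)


print(bracket_all("and", ["A", "B", "C", "D"]))
# prints (((A and B) and C) and D)
```

The generated string has the same shape as the last hit condition discussed above; a single operand is returned unchanged.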
For a number of scoring tasks, simply checking the Boolean value of components is not sufficient. Therefore, the CBA ItemBuilder provides functions in the scoring syntax (so-called operators), which can be used within the scoring syntax to take into account properties of the current task for the formulation of hit and miss conditions.
5.3.3 Sequential Evaluation of Scoring Rules
By default (i.e., when not configured differently), Hit- and Miss-conditions are evaluated independently. If variables are created, i.e., hits are assigned to classes, a central condition must be met: At any time, precisely one hit must be active for each class. This condition follows directly from using hits as (categorical) values for variables. Consequently, hit conditions within a class must always be formulated in such a way that they are mutually exclusive. To support checking this condition, the CBA ItemBuilder’s Preview of tasks provides the Scoring Debug Window, which shows a red exclamation mark once multiple hits are active for a class (see Figure 5.10).
However, a powerful alternative is to activate the sequential evaluation of scoring conditions in the Task-Editor by selecting the checkbox Use first active hit/miss per class (applies to all tasks). If this option is activated, the evaluation is performed sequentially, starting with the first hit condition of a class. Only if the first hit is not true is the second hit evaluated. Accordingly, a last hit (active when no other condition evaluates to true) can be added, for instance, to simplify missing value coding (see the item shown in Figure 5.2 as an example).
Use first active hit/miss per class (applies to all tasks)
is not activated.
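The difference between independent and sequential evaluation can be modeled in a few lines of Python. This is a simplified sketch of the semantics described in this section, not the implementation of the CBA ItemBuilder runtime. The ids optionC and optionD, the simplified condition for Wrong and all function names are assumptions made for illustration.

```python
def active_hits(hits, state):
    """Return the names of all hits whose condition is true for the given state.

    hits  -- ordered list of (name, condition) pairs; condition is a callable
    state -- dict mapping UserDefinedIds to Boolean component values
    """
    return [name for name, condition in hits if condition(state)]


def score_class(hits, state, use_first_active_hit=True):
    """Return the hit that provides the value of one class."""
    active = active_hits(hits, state)
    if use_first_active_hit:
        # Sequential evaluation: the first true condition wins.
        return active[0] if active else None
    if len(active) != 1:
        # Independent evaluation requires mutually exclusive conditions.
        raise ValueError(f"expected exactly one active hit, found {active}")
    return active[0]


# Hypothetical ids for the four checkboxes of a multiple-choice item.
hits = [
    ("Correct", lambda s: s["jaguar"] and s["panda"]
                          and not s["optionC"] and not s["optionD"]),
    ("Wrong", lambda s: any(s.values())),
    ("Omitted", lambda s: True),  # default hit, reached only if nothing else applies
]
```

With sequential evaluation the conditions may overlap, because only the first active hit is used. With independent evaluation the same three conditions would be rejected for a correct response, since all three are true at once.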
5.3.4 Use of Text Responses in Scoring Rules
Text responses can be automatically scored inside the CBA ItemBuilder using keywords or patterns. The provided matches()-operator takes two arguments: the UserDefinedId of the component used to collect the text response (see section 3.9.1) and a regular expression (see section 6.1 for details).
matches(UserDefinedId, RegularExpression)
The matches()-operator can be used with regular expressions (see section 6.1) and with concrete texts. Examples for using the matches()-operator are illustrated in Figure 5.11.
Note that the logical operators and, or, and not can be combined with several matches()-operators and other conditions. Hence, there is no need to formulate overly complex regular expressions since multiple expressions can be combined using multiple matches()-operators.
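The idea of combining several simple patterns instead of writing one complex expression can be tried out in Python. The sketch below assumes that matches() tests the whole text against the pattern, an assumption suggested by the use of matches(input2,"") as an emptiness check earlier in this chapter. The helper takes the text itself rather than a UserDefinedId, and the example answer is made up.

```python
import re


def matches(text, pattern):
    """Assumed semantics: true if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


answer = "The capital is Berlin"  # hypothetical text response

mentions_berlin = matches(answer, r".*[Bb]erlin.*")
mentions_bonn = matches(answer, r".*[Bb]onn.*")
is_empty = matches(answer, "")

# Two simple patterns combined with Boolean operators replace one complex pattern.
hit_correct = mentions_berlin and not mentions_bonn
```

Each pattern stays short and can be checked on its own, while the Boolean combination carries the scoring logic.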
5.3.5 Use of FSM-Variables in Scoring Rules
The value of FSM-Variables can be used within scoring rules (i.e., Hit- and Miss-conditions). This is achieved using the variable_in()-operator:
variable_in(FSMVariable,SetOfValues)
An example of using the variable_in()-operator is provided in Figure 5.12 for the scoring of a Drag-and-Drop response format, implemented using FSM Variables (see section 4.2.6 for the implementation of Drag-and-Drop).
The visited_all_values_of_variable()-operator can be used to check whether a variable has taken one or more concrete values in the course of test-taking (see Figure 5.13 for an example):
visited_all_values_of_variable(FSMVariable,SetOfValues)
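The difference between the two operators is that variable_in() looks at the current value only, whereas visited_all_values_of_variable() looks at the history of values. The following Python model illustrates this distinction. The class and method names are hypothetical, and whether the initial value counts as visited is an assumption of this sketch.

```python
class VariableHistory:
    """Records every value an FSM variable takes (illustrative model only)."""

    def __init__(self, initial):
        # Assumption: the initial value counts as a visited value.
        self.values = [initial]

    def set(self, value):
        self.values.append(value)

    def variable_in(self, *allowed):
        # True if the current (last) value is one of the allowed values.
        return self.values[-1] in allowed

    def visited_all_values(self, *required):
        # True if every required value occurred at some point.
        return set(required) <= set(self.values)


position = VariableHistory(0)
position.set(2)
position.set(1)
```

After these updates, position.variable_in(1, 3) is true because the current value is 1, and position.visited_all_values(1, 2) is true because both values occurred, while position.visited_all_values(1, 3) is false.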
5.3.6 Use of Positions for Free Drag-and-Drop in Scoring Rules
As described in section 4.2.6, the CBA ItemBuilder supports free drag and drop. The panel_position_range()-operator can be used to score the position of drag-and-drop elements (see Figure 5.14 for an example):
panel_position_range(Container, [CheckNonMembers], XStart, XEnd, YStart, YEnd, Center, Component, Component, ...)
The operator evaluates to true if the (X,Y)-positions of all given Components in the given Container are within the range given by XStart, XEnd, YStart and YEnd relative to the container’s (X,Y)-position. If the flag CheckNonMembers is not given or set to true, the operator only evaluates to true if the (X,Y)-positions of all other components in the given Container are outside the given range. The upper left corner of the component is used as the (X,Y)-position of a Component unless the flag Center is provided as true.
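The range check, including the role of CheckNonMembers, can be illustrated with a small Python model. This is a sketch under stated assumptions and not the runtime's implementation: positions are passed as a dictionary that already holds the relevant reference point of each component (so the Center flag is not modeled), the range limits are treated as inclusive, and the function signature is hypothetical.

```python
def panel_position_range(positions, members, x_start, x_end, y_start, y_end,
                         check_non_members=True):
    """positions maps component ids to (x, y) relative to the container."""

    def inside(point):
        x, y = point
        # Assumption: the limits of the range are inclusive.
        return x_start <= x <= x_end and y_start <= y <= y_end

    # All given components must lie within the range.
    if not all(inside(positions[m]) for m in members):
        return False
    if check_non_members:
        # No other component of the container may lie within the range.
        others = [p for cid, p in positions.items() if cid not in members]
        return not any(inside(p) for p in others)
    return True


# Hypothetical positions of three draggable components.
positions = {"card1": (10, 10), "card2": (40, 20), "card3": (200, 200)}
```

With these positions, the check for card1 alone in the range 0 to 50 fails by default because card2 also lies in that range, and succeeds once check_non_members is set to False.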
As an alternative to the position of drag-and-drop elements, the distance between elements can also be scored. The panel_distance_range()-operator returns true if the mutual distances of all given Components in the given Container are within the given range between MinDistance and MaxDistance:
panel_distance_range(Container, [CheckNonMembers], MinDistance, MaxDistance, Center, Component, Component, ...)
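A mutual distance check can be sketched in Python as a test over all pairs of components. The sketch assumes Euclidean distance and inclusive limits, and it leaves out the flags CheckNonMembers and Center; none of these details are confirmed by the text, and the function signature is hypothetical.

```python
from itertools import combinations
from math import dist


def panel_distance_range(positions, members, min_distance, max_distance):
    """True if every pair of the given components is within the distance range."""
    return all(
        min_distance <= dist(positions[a], positions[b]) <= max_distance
        for a, b in combinations(members, 2)
    )


# Hypothetical positions: a-b and b-c are 5 units apart, a-c is 10 units apart.
positions = {"a": (0, 0), "b": (3, 4), "c": (6, 8)}
```

For the range 4 to 6, the pair a and b passes, while the set a, b and c fails because of the distance between a and c.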
5.3.7 Use of Events, States and Interaction in Scoring Rules
The CBA ItemBuilder provides various operators to incorporate events and the number of interactions into scoring conditions.
Number of Events: The number of events that have been raised during the execution of the current task can be used in scoring conditions. The CBA ItemBuilder considers an event to be raised even if it did not trigger a transition, and the count includes events raised by the raise()-operator:
raised_events()
If only the number of specific events should be counted, the raised_nb_events()-operator can be used:
raised_nb_events(SetOfEvents)
An even more advanced version of the raised_nb_events()-operator exists that can be used to count how often one or multiple events were raised while the item was in a particular state:
raised_nb_events_in_state(State, SetOfEvents)
Indicators for Events: In addition to the operators that count the events (of a particular type / within states), operators exist to check if an event was triggered. These operators return a Boolean value (instead of returning the frequencies). The operator raised_all_events(EventA, EventB) returns true if all events listed in the set of events (e.g., EventA and EventB) were raised:
raised_all_events(SetOfEvents)
Again, a more advanced version of the raised_all_events()-operator exists that can be used to check if one or multiple events were raised while the item was in a particular state:
raised_all_events_in_state(State, SetOfEvents)
The item shown in Figure 5.15 illustrates the use of the event-related operators.
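The relationship between the counting operators and the indicator operators can be made concrete with a small Python model of an event log. The class EventLog and its method raise_event are hypothetical; the remaining method names mirror the operators only for readability, and the sketch is not the runtime's implementation.

```python
class EventLog:
    """Illustrative record of (state, event) pairs raised during a task."""

    def __init__(self):
        self.entries = []

    def raise_event(self, state, event):
        # Every raised event is recorded, whether or not it triggers a transition.
        self.entries.append((state, event))

    def raised_events(self):
        return len(self.entries)

    def raised_nb_events(self, *events):
        return sum(1 for _, e in self.entries if e in events)

    def raised_nb_events_in_state(self, state, *events):
        return sum(1 for s, e in self.entries if s == state and e in events)

    def raised_all_events(self, *events):
        return set(events) <= {e for _, e in self.entries}

    def raised_all_events_in_state(self, state, *events):
        return set(events) <= {e for s, e in self.entries if s == state}


log = EventLog()
log.raise_event("Start", "EvA")
log.raise_event("Start", "EvB")
log.raise_event("Answered", "EvA")
```

In this log, EvA was raised twice overall but only once in the state Answered, and both events were raised in the state Start but not in the state Answered.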
Number of State Visits: The visited_nb_states()-operator returns the number of visits for a set of states during the execution of the current task:
visited_nb_states(State, State, ...)
Indicators for States: The is_last_state()-operator returns true if the last state of the finite-state machine is one of the given states in the SetOfStates:
is_last_state(SetOfStates)
While the is_last_state()-operator refers to the last state of the finite-state machine, the visited_all_states()-operator can be used to check if all states listed in the SetOfStates were visited during the execution of the current task:
visited_all_states(SetOfStates)
Number of Interactions: A simple generic operator is provided that counts the number of user-interactions within the current task:
user_interactions()
Note that this operator counts the total number of interactions within the running task. Counting specific interactions in FSM variables is possible using the finite-state machine (see section 4.4).
Elapsed Time: Another generic operator is provided that measures the elapsed time in the current task:
elapsedTime()
The scoring-operator elapsedTime() counts the total time in the current task. Measuring more specific time intervals is possible using finite-state machines (see section 4.4.6 and the example provided in Figure 4.60).
5.3.8 Use of Specific Operators in Scoring Rules
Tree Components: The scoring of response formats created using components of type Tree, TreeView and TreeChildArea (see section 3.9.9) is supported with the following operators:
- The operator current_node() checks whether a RegularExpression matches the node path ID of the current node in a particular Tree:
current_node(Tree, RegularExpression)
- The operator exists_nodes() returns the number of nodes in the given Tree whose node path ID matches at least one of the given RegularExpressions (each node counts once only):
exists_nodes(Tree, RegularExpression, RegularExpression, ...)
- The operator visited_nodes() returns the number of visited nodes in the given Tree whose node path ID matches at least one of the given RegularExpressions (each node counts once only):
visited_nodes(Tree, RegularExpression, RegularExpression, ...)
- The operator matches_nodes() returns the number of nodes in the given Tree whose node path ID matches the NodeIdPattern and whose column values match the specified ColumnPatterns. The first ColumnPattern corresponds to the node name, the second ColumnPattern to the first additional column, etc. (each node counts once only):
matches_nodes(Tree, NodeIdPattern, ColumnPattern, ColumnPattern, ...)
Pages: The operator current_page() is provided to check if a specified Page is currently displayed (or is displayed within the specified PageArea):
current_page(Page)
current_page(Page, PageArea)
For browser-components that support the bookmark function (see section 3.13.2) the following operator can be used to check if a page was bookmarked:
bookmarked(Page)
Spread Sheets: The following operator scores the value (or the computed formula) entered in a spreadsheet table with a given UserDefinedId (see section 3.9.8) as an integer value:
integer_value(UserDefinedId, RoundingMode, Default)
The parameter RoundingMode can take the values up, down, half_up and half_down. If the text content is empty or does not represent a number, the Default value is returned.
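The effect of the four rounding modes and of the Default value can be explored with Python's decimal module. The mapping below is an assumption: it interprets up as rounding away from zero and down as rounding toward zero, which the text does not specify. The function takes the cell text directly instead of a UserDefinedId.

```python
from decimal import Decimal, InvalidOperation
from decimal import ROUND_UP, ROUND_DOWN, ROUND_HALF_UP, ROUND_HALF_DOWN

# Assumed meaning of the four RoundingMode values.
ROUNDING = {
    "up": ROUND_UP,            # away from zero
    "down": ROUND_DOWN,        # toward zero
    "half_up": ROUND_HALF_UP,
    "half_down": ROUND_HALF_DOWN,
}


def integer_value(text, rounding_mode, default):
    """Return the text as a rounded integer, or default if it is not a number."""
    try:
        number = Decimal(text.strip())
    except InvalidOperation:
        # Empty or non-numeric text content.
        return default
    if not number.is_finite():
        return default
    return int(number.quantize(Decimal("1"), rounding=ROUNDING[rounding_mode]))
```

Under these assumptions, the text 2.5 becomes 3 with half_up and 2 with half_down, and an empty cell yields the Default value.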
To score the entered formula (instead of the value), the matches()-operator provides the additional argument Selector. If the value formula is requested, the operator evaluates the formula text of a spreadsheet table cell (instead of the formula value):
matches(Component, RegularExpression, Selector)
Highlighting: The following operators are provided to score the response format of multiple text highlighting (see section 3.8.3):
highlighted(RichText, RichText, ...)
complete(Selection, Selection, ...)
partial(Selection, Selection, ...)
5.3.9 Note on Scoring with PageAreas
PageAreas (see section 4.1.4) can be used to embed existing pages as content when designing pages. The CBA ItemBuilder allows identical content to be re-used multiple times in different PageAreas on a single page, as illustrated in Figure 5.16. It is therefore generally necessary to add the UserDefinedId of the PageArea to all references to components displayed within PageAreas:
{UserDefinedId-of-PageArea}.{UserDefinedId-of_Component}
5.3.10 Scoring Rules and Result Text
As shown in Figure 5.8, the CBA ItemBuilder integrates the handling of numerical and string responses into the Scoring Rules (i.e., Hit- and Miss-conditions) using the result_text()-operator. For each class, the active hit is determined first. If the option Use first active hit/miss per class (applies to all tasks) is activated (see section 5.3.3), this is the first condition within a class that applies. Otherwise, item authors need to make sure that all conditions are mutually exclusive within each class. If the active hit contains a result_text()-operator, numerical or text input is provided as Result-Text.
5.3.11 Missing Value Coding for Tasks with Multiple Pages
The following example shows how to define scoring for single-choice and multiple-choice items, including hits for not reached items and omitted responses. For this purpose, a variable is defined in the finite-state machine that counts how often a page was visited.
Items without a response on a page that was not visited are coded as not reached (NR); missing responses on visited pages are coded as omitted response (OR). For more details, see the item shown in Figure 5.17 and use the Scoring Debug Window (as described in section 1.5).
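The coding rule for missing values can be summarized in a short Python function. This is a sketch of the decision logic only; the function name code_response and its parameters are hypothetical, and the page visit counter stands for the FSM variable mentioned above.

```python
def code_response(page_visits, response):
    """Return the code for one item.

    page_visits -- how often the item's page was visited (FSM counter)
    response    -- the given response, or None if nothing was selected
    """
    if response is not None:
        return response
    if page_visits == 0:
        return "NR"  # not reached: the page was never shown
    return "OR"      # omitted: the page was shown, but no response was given
```

A response always takes precedence; the visit counter only decides which of the two missing codes applies.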
5.4 Automatically Generated Variables
The CBA ItemBuilder runtime will create some selected variables automatically:
- reactionTime: Time (in milliseconds) between the start of the task execution and the first user interaction.
- execTime: Time (in milliseconds) since the start of this task execution.
- nbInteractions: Number of user interactions since start of the current task execution.
Deployment software for CBA ItemBuilder tasks (see chapter 7) can use identical tasks multiple times and can allow tasks to be re-visited. For that purpose, the runtime also computes the cumulative variables:
- reactionTimeTotal: Accumulated time (in milliseconds) between the start of the task execution and the first user interactions in previous executions of the task (excluding the last execution, that can be found in the variable reactionTime).
- execTimeTotal: Accumulated time (in milliseconds) in previous executions of the task (excluding the last execution, that can be found in the variable execTime).
- nbInteractionsTotal: Accumulated number of user interactions in previous executions of the task (excluding the last execution, that can be found in the variable nbInteractions).
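The split between the value of the last execution and the accumulated value of previous executions can be illustrated for execTime as follows. The function summarize_executions is a hypothetical helper for post-processing, not part of the runtime, and it expects at least one execution.

```python
def summarize_executions(exec_times):
    """exec_times -- execution times in ms, one entry per execution of a task."""
    # The last entry is the most recent execution; all others are previous ones.
    *previous, last = exec_times
    return {"execTime": last, "execTimeTotal": sum(previous)}
```

For a task executed three times with 1200, 800 and 500 milliseconds, execTime is 500 and execTimeTotal is 2000, so the total time spent on the task is the sum of both variables.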
5.4.1 Scoring Complete Tasks with Weights
If only dichotomous scoring is required for the complete Task, the CBA ItemBuilder implements a simple approach:
- MinHits: For each Task, it can be defined how many Hit-conditions must be fulfilled for the task to be scored as True.
- Weight: Each Hit- and Miss-condition is assigned a Weight.
- Class: Each Hit- and Miss-condition is assigned to a Class, and each Class contains either Hit- or Miss-conditions.
Each task provides the following results:
- result: Overall result (\(1\) if at least the number of hits defined in the property MinHits is active, \(0\) otherwise).
- nb_Hits: Number of (active) hits.
- Hit_weight: Total weight of hits.
- nb_Misses: Number of (active) misses.
- Miss_weight: Total weight of misses.
- credit_Class: Name of the Class with the highest value. The value is computed as the sum of weight for all active Hits (in classes with Hits) or all active Misses (in classes with Misses).
- credit_weight: Weight of the class with the highest class weight.
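How the task-level results follow from the active conditions can be sketched in Python. The data structure, the function name score_task and the class name Errors in the example are hypothetical, and the handling of ties between classes (the first class with the highest weight wins) is an assumption that the text does not cover.

```python
from collections import defaultdict


def score_task(conditions, min_hits):
    """conditions -- list of dicts with keys kind ("hit"/"miss"), cls, weight, active."""
    active = [c for c in conditions if c["active"]]
    hits = [c for c in active if c["kind"] == "hit"]
    misses = [c for c in active if c["kind"] == "miss"]

    # Sum of weights of the active conditions per class.
    class_weight = defaultdict(int)
    for c in active:
        class_weight[c["cls"]] += c["weight"]
    credit_class = max(class_weight, key=class_weight.get) if class_weight else None

    return {
        "result": 1 if len(hits) >= min_hits else 0,
        "nb_Hits": len(hits),
        "Hit_weight": sum(c["weight"] for c in hits),
        "nb_Misses": len(misses),
        "Miss_weight": sum(c["weight"] for c in misses),
        "credit_Class": credit_class,
        "credit_weight": class_weight[credit_class] if credit_class is not None else None,
    }


conditions = [
    {"kind": "hit", "cls": "Score", "weight": 2, "active": True},
    {"kind": "hit", "cls": "Score", "weight": 1, "active": False},
    {"kind": "hit", "cls": "Style", "weight": 1, "active": True},
    {"kind": "miss", "cls": "Errors", "weight": 3, "active": True},
]
summary = score_task(conditions, min_hits=2)
```

In this example two hits are active, so result is 1 for MinHits of 2, and the class Errors provides credit_Class because its active miss carries the highest total weight.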
5.5 Checklist and Complete Workflow
As a summary, the following list describes the typical workflow that is required for implementing automatic scoring in the CBA ItemBuilder:
- Prepare the implementation of scoring by defining explicit User Defined IDs (see 3.7.4) for all components that should be used for scoring. It is not possible to define scoring using the automatically generated User Defined IDs that start with a $-sign.
- Define a task as an entry point for the CBA ItemBuilder project. Since the scoring definition is done per task, a task must always be defined first (see section 3.6 for details).
- When tasks are defined, define Classes. For each variable that should be included in the result data for a particular Task, define one Class. For all newly created items, activate the option Use first active hit per class (applies to all classes).
- Define Hit-Conditions that evaluate to true if the required conditions are fulfilled (see section 5.3.2). Order the Hit-conditions and add a default condition with the hit-syntax true as the last condition. This will ensure that each class has one active hit (see section 5.3.3). For a usual workflow, Miss-conditions are not needed, and neither are Weights.
- Extract string information using the result_text()-operator, if necessary. While hits serve as values of categorical variables, the Result-Text can be used to capture numerical or text responses.
- Assign hits to classes. Classes fulfill the function of variables in the scoring of CBA ItemBuilder projects. Besides the unique name of the class (variable name), a description of the class can be entered in the Class Comment (variable description).
- Decide how to handle missing responses and implement, if necessary, additional Hit-Conditions for omitted responses and not reached questions (see section 5.3.11).
- Test the scoring implementation using the Scoring Debug Window. If the option Use first active hit per class (applies to all classes) was not activated, make sure that exactly one hit (or miss) is active for each class at any point in time (see 8.4.2).