2.8 Log and Process Data

Computer-based assessments provide the opportunity to collect not only the final work product (i.e., raw responses and scored responses, see section 2.5.1) but also so-called log data that originate from students’ interactions with the computer-based assessment platform (e.g., clicked buttons, selected radio buttons or checkboxes, entered text, mouse moves, etc.) or from internal system changes (e.g., timers). The examination of such log data from cognitive ability testing has gained increased attention (Goldhammer and Zehner 2017), for instance, in educational research, since computer-based assessment was introduced in large-scale assessments (e.g., Greiff et al. 2016).

2.8.1 Basic Terminology

In the context of computer-based assessment, using log data is still a relatively new field of research. Different terms like stream data, log file data, event data, process data, and others are used (and sometimes mixed). In order to illustrate the meaning and use of log data for the theory-based construction of process indicators and to provide guidance concerning potential implementations in the CBA ItemBuilder, a conceptual clarification follows first.

Paradata: Additional information about the test-taking process and the data collection can be understood as part of the so-called paradata, commonly used in social science research (e.g., Kreuter 2013). Kroehne and Goldhammer (2018) summarize categories of access-related, response-related, and process-related paradata.

  • Response-related paradata include all user interactions (pointing-device events like mouse clicks or touch events, keyboard entries from hardware or soft keyboards) together with all state changes of components that have different states (components like checkboxes and radio buttons that can be selected and deselected) and internal states (like timers).

  • Process-related paradata cover, for instance, information related to the navigation within assessment instruments (see section 2.4) as well as interactions not directly related to item responses.

  • Additional access-related paradata can occur when administering computer-based assessments. For example, these paradata can indicate when an assessment was started, continued, or completed, and with what type of device.

Describing possible paradata with a taxonomy cannot cover all possible future applications of the additional information available in technology-based assessments. For a deeper look, it is worth considering the underlying nature and origin of this additional information, the emergence of which can be conceptualized in terms of events that create what is called Log Data.

Log Events: Log Data are generated and stored by assessment platforms as events that inform about how a platform providing assessment material was used and how the provided platform changed (see section 1.6). Without further limitation, we can assume that each event contains the following information:

  • Time stamp: When did something take place?
  • Event name: What has taken place?

The Time Stamp can represent an absolute date and time, or it can represent a relative time difference. The Event Name (or Event type) is only a first nominal distinction of different events. As described in section 1.6, log events in the context of assessments can be expected to contain the following additional information:

  • Person identifier: Which test-taker can it be assigned to?
  • Instrument identifier: Which part of an assessment can it be assigned to?

The assignment to a person is made by a reference (e.g., in the form of an identifier), and this personal reference must be taken into account in the context of using log data as research data (e.g., in the form of an ID exchange, see section 8.6). The reference to a part of the instrument can be established, for example, by an item identifier or a unit identifier or by describing the level at which a log event occurred (e.g., test-level).

  • Event-specific attributes: What additional information describes what happened?

The Event Name describes the various possible log events distinguished by an assessment platform. Each event can provide specific further information, which, together with the Event Name, forms the actual content of the log data. Depending on the event type, the event-specific attributes can be optional or required, and attributes can have a data type (e.g., String, Boolean, or some numeric type). If the information provided by the assessment platform with an event-specific attribute is not in atomic format (i.e., if it is not a single piece of information but a data structure, see Kroehne In Preparation for details), storing log data in rectangular data set files becomes more challenging (see section 2.8.4).
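The minimal structure of a log event described above (timestamp, event name, person and instrument identifiers, event-specific attributes) can be illustrated with a small sketch. The event names and attributes below are hypothetical; concrete platforms such as the CBA ItemBuilder use their own schema:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class LogEvent:
    person_id: str        # which test-taker the event is assigned to
    instrument_id: str    # item/unit identifier or level (e.g., "test-level")
    timestamp: float      # absolute time or relative offset (here: seconds)
    name: str             # event name / event type (nominal distinction)
    attributes: dict[str, Any] = field(default_factory=dict)  # event-specific, optional or required

# Two hypothetical events: one with an atomic attribute, one with a
# nested (non-atomic) attribute that is harder to store in flat tables.
e1 = LogEvent("P001", "item_07", 12.34, "RadioButtonSelected", {"userDefinedId": "optionB"})
e2 = LogEvent("P001", "item_07", 15.02, "Scroll", {"viewport": {"x": 0, "y": 240}})
```

The nested `viewport` attribute in the second event is exactly the non-atomic case that complicates rectangular storage.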

  • Raw Log Events: From a technical perspective, events in digital environments like web browsers are required and used for programming and implementing digital (interactive) content, such as assessment instruments. Accordingly, a basic layer connects at a low level to make those events available and usable for diagnostic purposes. The resulting log events, which are not specific to any concrete task or assessment content, are called Raw Log Events. Raw Log Events have event types that relate the captured information to the event’s origin (e.g., button click, mouse move, etc.). Raw log events are not necessarily schematically identical to the events of the technological environment in which the (interactive) assessment content is implemented (such as, for instance, HTML5/JavaScript for browser-based content). However, raw log events are platform-specific (i.e., different software implementations of identical content can provide different raw log events). Hence, the assessment software defines which raw log events are captured (and how).
How to do this with the CBA ItemBuilder? Assessment components created with the CBA ItemBuilder automatically provide Raw Log Events (see appendix B.7 for a description of all log events). For interpreting the log events, it is crucial to define User Defined Ids (i.e., identifiers for interactive components that link the event data provided as trace logs to the components used in the Page Editor to design the assessment content, see section 3.7.4).
  • Contextualized Log Event: Based on the assessment content, a second kind of log event can be described: Events that inform about an event concerning a particular action or change in a concrete task or a particular item. These events can be called Contextualized Log Events, and instrument developers (i.e., item authors) need to define which particular action or internal change has which particular meaning. The event name (or event type) can encode the semantics of contextualized log events, and contextualized log events fit (as raw log events) into the concept of log events as described above.
How to do this with the CBA ItemBuilder? The definition of specific Contextualized Log Events as part of the implementation of assessment materials with the CBA ItemBuilder is possible (see Operators to Create Trace Messages described in section 4.4.6) and recommended if the derivation of the events from the Raw Log Events is laborious, or if theoretically defined Contextualized Log Events are already specified as part of the instrument construction. HTML5/JavaScript assessment content included in CBA ItemBuilder projects (see section 3.14 for a description of ExternalPageFrames) can provide custom log entries (Raw Log Events or Contextualized Log Events) via the API described in section 4.6.4.

Feature Extraction: Tagging or labeling selected Raw Log Events as Contextualized Log Events can be understood as an example of Feature Extraction (i.e., the derivation of Low-Level Features using the raw log events, see Kroehne and Goldhammer 2018). In this context, Contextualized Log Events are Actions (i.e., Low-Level Features that occur at a point in time but do not have a time duration). More generally, Actions are contextualized information that can be extracted from the log data. So-called States (i.e., Low-Level Features that have a time duration) supplement the possible features that can be extracted from log events. As a rule, log events indicate the beginning and end of a State, while Actions represent specific log events that occur within States.
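The distinction between States (with duration) and Actions (points in time) can be sketched as follows. The event names and the transition mapping are hypothetical; in practice, tools like LogFSM derive States with finite-state machines:

```python
# Hypothetical event stream, sorted by timestamp: some events mark state
# transitions, all remaining events are treated as Actions within a State.
events = [
    (0.0, "PageShown"),        # marks the start of the state "reading"
    (4.2, "AnswerChanged"),    # an Action within the current state
    (9.5, "PageHidden"),       # marks the end of the state
]
transitions = {"PageShown": "reading", "PageHidden": None}

states, actions, current = [], [], None
for t, name in events:
    if name in transitions:
        if current is not None:
            states.append((current[1], current[0], t))   # (state, start, end)
        new_state = transitions[name]
        current = (t, new_state) if new_state else None
    else:
        actions.append((t, name))                        # point in time, no duration

# states  -> one "reading" State with a duration of 9.5 seconds
# actions -> one "AnswerChanged" Action occurring within that State
```

Note how the Action inherits its interpretation from the State it occurs in, in line with the framework described above.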

How to do this with the CBA ItemBuilder? The R package LogFSM described in section 2.8.5 can be used to analyze log data provided by the different deployment software tools (see chapter 7).

Process Indicators: Information about emotional, motivational, and cognitive processes during assessment processing may be contained in log data. Their interpretation in the context of assessments is guided by psychometric concepts such as validity (e.g., Goldhammer et al. 2021) and scientific principles such as reproducibility, replicability, and (independent) replication of empirical research.

Raw log events are platform-specific and therefore not suitable for defining indicators: if an assessment is re-implemented on a different technical platform, it cannot be assumed that identical Raw Log Events will arise. Accordingly, the definition of process indicators that can become the subject of empirical validation is based on low-level features (Actions and States), where Actions also include Contextualized Log Events.

In this context, Process Indicators are aggregates of Low-Level Features (e.g., the number of occurrences of a particular Action or the aggregated time in a particular State), meaning values of person-level variables that can be derived from low-level features, and for whose interpretation theoretical arguments (e.g., in the sense of evidence identification) and empirical support can be collected. Psychometric models (e.g., measurement models) can be used, for instance, to investigate the within-person relationship of process indicators across different items or tasks and their relationship to outcome measures.

How to do this with the CBA ItemBuilder? Low-level features extracted with LogFSM can be used to compute Process Indicators in R.

2.8.2 Item Designs and Interpretation of Log Data

In line with the terminology described in the previous section, Kroehne and Goldhammer (2018) describe a framework for analyzing log data. The core of this framework is the decomposition of the task processing into sections (called States) that can be described theoretically with respect to an assessment framework.

The presentation of an assessment component (i.e., an item or unit, for example) always begins in a designated start state. Raw Log Events collected by an assessment platform can be used to mark the transition from one state to another state. As described above, Raw Log Events can also indicate specific Contextualized Log Events (i.e., Actions with a task-related interpretation). This way, two identical Raw Log Events can be interpreted differently depending on the current state (called Contextual Dependency of Log Events, see Kroehne and Goldhammer in Press).

Decomposition of Test-Taking Processes: The theoretical framework can also be used to describe item designs with respect to the interpretability of log data. For this purpose, it is helpful first to consider what creates States. According to Kroehne and Goldhammer (2018), the meaning of States is constituted by combining the displayed information (i.e., what is presented to test-takers on screen) with the possibilities to interact (i.e., what test-takers can do and how they can interact with the content).

Suppose the presented information changes (e.g., a page change or a modification of the visible area of a scrollable page), or the opportunities for the test-taker to interact with the assessment content change. In that case, it may be helpful to describe the test-taking process using two different States. A log event (or events) can mark the transition between the old and new state (e.g., a page-change event or scroll event). If the interpretation of the two states differs meaningfully, then the interpretation of the involved log event(s) follows from the difference between the two states.

Log events can, for example, represent the selection of a response from predefined response options (i.e., events that can be categorized as answer-change events). Suppose a State comprises the view of an item stem, a question, and the possibility to respond, for instance, by selecting a radio button. In that case, these answer-change events can distinguish two states, BEFORE_RESPONDING and AFTER_FIRST_RESPONSE. While the assessment is ongoing, meaning while the item is on screen and the test-taker still has the opportunity to change the response, the state AFTER_FIRST_RESPONSE cannot be further decomposed into WHILE_RESPONDING (the time between the first and the last answer-change event) and AFTER_RESPONDING, as it is not yet decided whether the test-taker will select an answer only once (meaning only one answer-change event) or will switch to a different answer by clicking on a different radio button. However, the situation is different if log data from concluded assessments are analyzed. Either way, the interpretation of the state BEFORE_RESPONDING rests on the premise that the item design allows assigning this time component to one question. This is possible only with additional assumptions when multiple questions are presented in parallel on one screen.
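For concluded assessments, the decomposition described above can be sketched directly from the timestamps of the answer-change events. This is a minimal illustration with hypothetical timestamps, not an implementation of any particular platform:

```python
def decompose(item_start, item_end, answer_changes):
    """Decompose one concluded item visit into (state, start, end) triples,
    given the sorted timestamps of all answer-change events."""
    if not answer_changes:
        return [("BEFORE_RESPONDING", item_start, item_end)]
    first, last = answer_changes[0], answer_changes[-1]
    states = [("BEFORE_RESPONDING", item_start, first)]
    if first != last:                      # more than one answer-change event
        states.append(("WHILE_RESPONDING", first, last))
    states.append(("AFTER_RESPONDING", last, item_end))
    return states

# Item shown for 60 seconds; answers changed at 12.5 s and 31.0 s.
result = decompose(0.0, 60.0, [12.5, 31.0])
# -> [('BEFORE_RESPONDING', 0.0, 12.5),
#     ('WHILE_RESPONDING', 12.5, 31.0),
#     ('AFTER_RESPONDING', 31.0, 60.0)]
```

With only one answer-change event, the WHILE_RESPONDING state would collapse, mirroring the ambiguity discussed above for ongoing assessments.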

Process Indicators for Item Analysis: Based on the decomposition of test-taking processes into individual sections (i.e., States), which are subdivided by log events, an in-depth item analysis can be performed. An example of using frequency-based aggregates of low-level features (e.g., the number of Actions, meaning events tagged as answer-change events) and low-level features within states (e.g., the number of visits to an item after selecting a final response), as well as time-based aggregates (e.g., the total time after selecting the final response), can be found in Lahza, Smith, and Khosravi (2022).

States with Dedicated Meaning: Theory-driven, interpretable process indicators are also possible if dedicated States can be crafted with a specific interpretation regarding the measured construct. An example of this idea can be found in Hahnel et al. (2019), based on making additional information necessary for the task solution visible only after a test-taker interaction. The additional information about the source of a document is placed on a dialog page, and buttons that create log events need to be clicked to open and close the dialog page.

A similar concept underlies the analysis of navigation behavior in hypertext reading when relevant pages (i.e., States reconstructed based on navigation behavior, on which information essential for the task solution is presented) are distinguished from irrelevant page visits (see e.g. Naumann 2015).

2.8.3 Completeness of Log-Data

Which Raw Log Events are provided by an assessment platform depends on the respective programming, and Contextualized Log Events are each related to concrete item content. Hence, neither form is suitable for describing whether the programming of a computer-based assessment provides all (relevant) log events. Kroehne and Goldhammer (2018), therefore, describe different completeness conditions.

How to do this with the CBA ItemBuilder? The log data collected with the CBA ItemBuilder can be inspected live during a Preview of the assessment using the Trace Debug Window (see section 1.6).

Log Data versus Result Data: The starting point for differentiating completeness conditions of log data is the review of the relation between log data and result data of a (computer-based) assessment. The result data contain, for each item, the raw responses (for instance, the text entered into text fields and the selection of choice elements such as radio buttons and checkboxes) and, if implemented within the assessment platform, the scored item responses (see section 2.5.1). Although result data can already contain missing value codes (see section 2.5.2) when provided by the assessment software, we ignore missing value coding for the following explanation.

Response completeness: Consider a result variable that contains the final selection from, for instance, a group of radio buttons (e.g., A, B, and C). The value of this variable is an identifier for the finally selected radio button or a transformation of this identifier to a numeric value using a simple mapping (e.g., 1=A, 2=B, and 3=C). Log data of the intuitive type answer-change are generated each time the selection of the radio button group changes. If A is selected first, followed by a selection of C, two answer-change log events are expected, one for the first selection (A) and one for the second selection (C). Taking both log events together in the correct order allows us to reconstruct the final selection and, thus, the value of the result variable (C or 3). Hence, if all answer-change events are collected, the result data can be reconstructed from the log data. If this is possible, all answer-change events are logged (whatever technique is used to collect the responses), and the log data are called response complete. To achieve this property, answer-change events need to be ordered by timestamp. However, real-time measures are not needed; it is sufficient that the logging maintains the order of log events.
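The reconstruction described above can be sketched as a small replay of the event stream. The event format is hypothetical; only the ordering of the answer-change events matters:

```python
# Hypothetical answer-change events for one result variable "Q1":
# A is selected first (ts=3), then C (ts=9).
log = [
    {"ts": 9, "name": "answer-change", "variable": "Q1", "value": "C"},
    {"ts": 3, "name": "answer-change", "variable": "Q1", "value": "A"},
]

def reconstruct_result(log):
    """Replay answer-change events in timestamp order and keep the last
    value per variable; relative order suffices, no real time is needed."""
    result = {}
    for e in sorted(log, key=lambda e: e["ts"]):
        if e["name"] == "answer-change":
            result[e["variable"]] = e["value"]
    return result

# The final selection (C) is recoverable from the log data alone,
# which is what response completeness requires.
```

If any answer-change event were missing from the log, the reconstructed value could differ from the stored result variable, and the log data would not be response complete.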

How to do this with the CBA ItemBuilder? Although the collection of log data is well developed for the CBA ItemBuilder, scoring defined within CBA ItemBuilder tasks is only evaluated when test-takers end the task (using a runtime command, see section 3.12). Accordingly, only the raw input can be reconstructed from log events provided by the CBA ItemBuilder runtime in the current version, and result data and log data are stored in parallel. Note, however, that deployment software that uses the CBA ItemBuilder runtime via the TaskPlayer API (see section 7.7) can request scoring data at any time (and multiple times).

Progress-completeness: If the answer-change events can not only be sorted but all answer changes are also logged immediately with sufficient accuracy, then log data can be Progress Complete. To check this property, it must be ensured that the result variables can be determined from the log events at any time (and not only after an assessment component, i.e., an item or a unit, has ended). This property can be easily illustrated with text input. If the changes in a text input field are only logged when the input focus changes, then Progress Completeness is not satisfied, because the values of the result variables can be reconstructed from the answer-change events only at the times when the test-taker leaves the input field. To achieve Progress Completeness, all text changes (e.g., in connection with key-down or key-up events) would have to be logged.
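The text-input example can be made concrete with a simplified sketch (hypothetical key events; deletions and cursor movements are ignored). With key-level logging, the field's value is recoverable at any time t, which blur-based logging cannot provide between focus changes:

```python
# Each tuple is (timestamp, character typed) from hypothetical key events.
key_events = [(1.0, "h"), (1.4, "i"), (2.1, "!")]

def text_at(t, key_events):
    """Reconstruct the text field's content at time t from key-level events
    (simplified: appends only, no deletions or cursor moves)."""
    return "".join(ch for ts, ch in key_events if ts <= t)

# Mid-typing states are recoverable -> consistent with Progress Completeness.
# With blur-based logging only the final "hi!" would be available.
```

In real log data, deletions, selections, and paste events would also have to be logged and replayed for the reconstruction to hold.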

State-completeness: The completeness conditions described so far are agnostic regarding the planned use of log data. This is different for the condition described as State Completeness. Consider a use case in which we want to replicate findings from a specific study that used a particular set of States (or specific Actions or Contextualized Log Events). To verify that this replication will be possible using the assessment software under consideration, State Completeness needs to be checked regarding this differentiation. For that purpose, all transitions between distinguished States need to be identified with available Raw Log Events. Note that the Raw Log Events used for the re-implementation can be different from the original implementation as long as all required transitions (as well as Actions and Contextualized Log Events) can be recognized from the log data with specific Raw Log Events that are provided by the new platform.

Replay-completeness: Verifying log data with respect to State Completeness is especially helpful if a concrete set of States and Actions (or Contextualized Log Events) is known. If one wants to ensure that log data is as complete as possible so that all changes based on user interactions and internal state changes, such as timers, etc., are included in the log data, then Replay Completeness is helpful. Replay Completeness is fulfilled when a screencast (like a video)28 can be recreated from the collected log data. Figure 2.22 provides an example.

FIGURE 2.22: Item illustrating the replay feature (work in progress) (html|ib).

How to do this with the CBA ItemBuilder? The development of the CBA ItemBuilder starting with Version 9.8 aims to achieve Replay Completeness. Note that in the current preview the pointing device (i.e., the mouse pointer) is not included and is, hence, invisible in the replay.

An understanding and objective analysis of the completeness of log data (i.e., what interactions and system state changes can be inferred from the log event data) is also crucial for making valid statements about Idle Times and for interpreting time differences in log data analyzed as time-stamped action sequences (e.g., Ulitzsch, He, and Pohl 2022).

2.8.4 Data Formats for Log Data

Log data collection often requires specific programming and presents additional requirements that must be specified, implemented, and tested (see, for instance, section 8.4 for details about Testing of CBA projects).

Even though the structure of the event data can be derived from the definition of log events described above (see section 2.8.1), concrete assessment software tools do not necessarily store the data in log files in a structured way. Often log data is stored mixed with other data (including paradata and metadata), and functional requirements (regarding presenting assessment content and collecting final responses) might be prioritized in the assessment software over a transparent separation of different data forms.

For various reasons, data from an assessment platform may initially be stored in a preliminary (and proprietary) data format, and additional steps of data post-processing (see section 8.6) may be necessary to extract the Raw Log Event data (or Contextualized Log Events). The data formats for log event data described briefly below must therefore be generated from the preliminary data formats used by particular assessment software. It does not matter whether the assessment software stores the data internally in a relational or document-based database or whether it is based on the generic markup language XML or the JSON serialization commonly used in web contexts.

Flat and Sparse Log Data Table: Starting from a long format with one event per line, log data can be stored in the form of one big Flat and Sparse Log Data Table (FSLDT, Kroehne In Preparation). As long as the minimal conditions described above (see section 2.8.1) are fulfilled (i.e., each log event is assigned to a person and has an event type or name and a timestamp), the corresponding columns in the FSLDT are filled in for each line. Suppose there are many different event types that contain different required or optional event-specific data. In that case, the FSLDT contains many missing values and can become large and messy. Moreover, additional specifications are required for non-atomic event-specific data (i.e., to handle nesting, see Kroehne In Preparation).
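The sparseness of an FSLDT can be illustrated with two hypothetical events whose event-specific attributes do not overlap; columns that do not apply to an event type stay empty:

```python
# Hypothetical long-format events with different event-specific attributes.
events = [
    {"person": "P1", "ts": 1.0, "name": "Click",  "userDefinedId": "btnNext"},
    {"person": "P1", "ts": 2.5, "name": "Scroll", "y": 240},
]

# Common columns first, then the union of all event-specific attribute names.
columns = ["person", "ts", "name"]
for e in events:
    columns += [k for k in e if k not in columns]

# One row per event; attributes not applicable to an event type become None,
# which is what makes the table "sparse".
fsldt = [[e.get(c) for c in columns] for e in events]
```

With dozens of event types the number of columns, and with it the share of empty cells, grows quickly, matching the description above.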

Universal Log Format: Log data can be stored clearly and efficiently using simple relational database concepts. For this purpose, the data is stored in tables per event type. Each of these data tables thus contains fewer missing values, and the semantics and data types of the event-specific attributes (i.e., columns) can be defined and documented (see section 8.7). Missing values only occur for optional event-specific attributes, and additional specifications can be used to handle nested data types in the form of additional tables.

The individual tables per event type can be combined and sorted again based on the time stamps to create an FSLDT. The individual tables can be saved as data sets in the common data set formats (CSV, Stata, SPSS, …) and thus easily managed by research data centers since standard procedures (e.g., for the exchange of identifiers, see section 8.6) can also be applied.
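The round trip described above, splitting events into per-type tables and recombining them by timestamp, can be sketched as follows (hypothetical events; real implementations would write one data set file per table):

```python
# Hypothetical event stream with two event types.
events = [
    {"person": "P1", "ts": 1.0, "name": "Click",  "userDefinedId": "btnNext"},
    {"person": "P1", "ts": 2.5, "name": "Scroll", "y": 240},
    {"person": "P1", "ts": 3.1, "name": "Click",  "userDefinedId": "btnBack"},
]

# Universal Log Format idea: one table per event type, so each table is
# rectangular and its event-specific columns can be typed and documented.
tables = {}
for e in events:
    tables.setdefault(e["name"], []).append(e)

# Recombining all tables and sorting by timestamp recovers the event
# stream, i.e., an FSLDT can be created from the per-type tables.
merged = sorted((e for t in tables.values() for e in t), key=lambda e: e["ts"])
```

The per-type tables contain no missing values for required attributes, while the merged table is sparse again, illustrating the trade-off between the two formats.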

eXtensible Event Stream (XES): Developed to achieve interoperability for archiving event log data, the IEEE 1849-2016 XES Standard is the most attractive format for storing log data in a way that different tools can read. As described in Kroehne (In Preparation), the XES format combines information about the log data (i.e., how the data are stored) with the data itself, making this standard very flexible and valuable for log data from computer-based assessments. However, although the data are stored in XML format, researchers unfamiliar with the XES standard cannot read or verify the data without additional tools.

How to do this with the CBA ItemBuilder? When post-processing data collected with CBA ItemBuilder content with the R package LogFSM (see section 2.8.5), log data are processed and provided as a (compressed) XES file and in the Universal Log Format (as either Stata, SPSS, or CSV tables). When reading and merging all event-specific tables from a ZIP archive containing data in the Universal Log Format, a Flat and Sparse Log Data Table can be created in R.

Learning Analytics: Log data gathered in assessments can also be described and stored using concepts developed in the domain of learning analytics (for instance, experience API statements, xAPI). The test-taker, required as person identifier for log events after data preparation, corresponds to the actor in an xAPI statement. The event name (if necessary, in combination with one or multiple event-specific attributes) can create a verb (e.g., clicked). The object is specified by the instrument identifier and, when necessary, further specified by event-specific attributes. Finally, the context information of an xAPI statement can refer to access-related paradata (e.g., the location where an assessment takes place) or to metadata or linked data (such as the instructor of a course in which an assessment is conducted).
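The mapping just described can be sketched as a single statement. The IRIs, verb, and extension key below are hypothetical placeholders; the xAPI specification defines the exact schema and required properties:

```python
# Sketch: one answer-change log event expressed in the actor/verb/object
# structure of an xAPI statement (hypothetical identifiers and IRIs).
statement = {
    "actor":  {"account": {"name": "P001"}},                        # person identifier
    "verb":   {"id": "http://example.org/verbs/clicked",
               "display": {"en-US": "clicked"}},                    # from the event name
    "object": {"id": "http://example.org/items/item_07/optionB"},   # instrument identifier + attribute
    "timestamp": "2024-05-01T10:15:03Z",                            # log event timestamp
    "context": {"extensions":
                {"http://example.org/ext/location": "lab"}},        # access-related paradata
}
```

Note how every part of the log event definition from section 2.8.1 finds a place in the statement structure.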

Note that other standards exist (such as IMS Caliper; see also Hao, Shu, and von Davier 2015) that might be worth considering.

2.8.5 Software Tools

The development of generic tools for analyzing log data from computer-based assessments is still in its infancy. Often, log data are only analyzed in a study-specific way, for example, by creating specific programs for analysis (cf. PIAAC Log Analyzer, Goldhammer, Hahnel, and Kroehne 2020).

LogFSM: A generic tool for analyzing log data based on algorithmic processing of log events (Raw Log Events or Contextualized Log Events) using finite-state machines is the R package LogFSM, which implements the framework for feature extraction suggested by Kroehne and Goldhammer (2018). Finite-State Machines (FSM, e.g., Alagar and Periyasamy 2011) are used, for instance, in the CBA ItemBuilder to implement dynamic interactive items (see section 4.4, and Rölke 2012; Neubert et al. 2015). Similar principles are also useful for the analysis of log file data (e.g., Kroehne and Goldhammer 2018).

Further R Packages: Additional R packages for analyzing log data include, for instance, LOGAN (Reis Costa and Leoncio 2019) / LOGANTree, ProcData (Tang et al. 2021), and TraMineR (Gabadinho et al. 2011).

References

Alagar, Vangalur S., and K. Periyasamy. 2011. Specification of Software Systems. 2nd ed. Texts in Computer Science. New York: Springer.
Gabadinho, Alexis, Gilbert Ritschard, Nicolas S. Müller, and Matthias Studer. 2011. “Analyzing and Visualizing State Sequences in R with TraMineR.” Journal of Statistical Software 40 (4). https://doi.org/10.18637/jss.v040.i04.
Gobert, Janice D., Michael Sao Pedro, Juelaila Raziuddin, and Ryan S. Baker. 2013. “From Log Files to Assessment Metrics: Measuring Students’ Science Inquiry Skills Using Educational Data Mining.” Journal of the Learning Sciences 22 (4): 521–63. https://doi.org/10.1080/10508406.2013.837391.
Goldhammer, Frank, Caroline Hahnel, and Ulf Kroehne. 2020. “Analysing Log File Data from PIAAC.” In Large-Scale Cognitive Assessment: Analyzing PIAAC Data, edited by Debora B. Maehler and Beatrice Rammstedt. Cham: Springer.
Goldhammer, Frank, Carolin Hahnel, Ulf Kroehne, and Fabian Zehner. 2021. “From Byproduct to Design Factor: On Validating the Interpretation of Process Indicators Based on Log Data.” Large-Scale Assessments in Education 9 (1): 20. https://doi.org/10.1186/s40536-021-00113-5.
Goldhammer, Frank, and Fabian Zehner. 2017. “What to Make Of and How to Interpret Process Data.” Measurement: Interdisciplinary Research and Perspectives 15 (3-4): 128–32. https://doi.org/10.1080/15366367.2017.1411651.
Greiff, Samuel, Christoph Niepel, Ronny Scherer, and Romain Martin. 2016. “Understanding Students’ Performance in a Computer-Based Assessment of Complex Problem Solving: An Analysis of Behavioral Data from Computer-Generated Log Files.” Computers in Human Behavior 61 (August): 36–46. https://doi.org/10.1016/j.chb.2016.02.095.
Hahnel, Carolin, Ulf Kroehne, Frank Goldhammer, Cornelia Schoor, Nina Mahlow, and Cordula Artelt. 2019. “Validating Process Variables of Sourcing in an Assessment of Multiple Document Comprehension.” British Journal of Educational Psychology, April, bjep.12278. https://doi.org/10.1111/bjep.12278.
Hao, Jiangang, Zhan Shu, and Alina von Davier. 2015. “Analyzing Process Data from Game/Scenario-Based Tasks: An Edit Distance Approach.” JEDM-Journal of Educational Data Mining 7 (1): 33–50.
Kreuter, Frauke, ed. 2013. Improving Surveys with Paradata: Analytic Uses of Process Information. Wiley Series in Survey Methodology. Hoboken, New Jersey: Wiley & Sons.
Kroehne, Ulf. In Preparation. “Standardization of Log Data from Computer-Based Assessments.”
Kroehne, Ulf, and Frank Goldhammer. in Press. “Tools for Analyzing Log-File Data.” In.
———. 2018. “How to Conceptualize, Represent, and Analyze Log Data from Technology-Based Assessments? A Generic Framework and an Application to Questionnaire Items.” Behaviormetrika. https://doi.org/10.1007/s41237-018-0063-y.
Lahza, Hatim, Tammy G. Smith, and Hassan Khosravi. 2022. “Beyond Item Analysis: Connecting Student Behaviour and Performance Using e-Assessment Logs.” British Journal of Educational Technology, October, bjet.13270. https://doi.org/10.1111/bjet.13270.
Naumann, Johannes. 2015. “A Model of Online Reading Engagement: Linking Engagement, Navigation, and Performance in Digital Reading.” Computers in Human Behavior 53 (December): 263–77. https://doi.org/10.1016/j.chb.2015.06.051.
Neubert, Jonas C., André Kretzschmar, Sascha Wüstenberg, and Samuel Greiff. 2015. “Extending the Assessment of Complex Problem Solving to Finite State Automata: Embracing Heterogeneity.” European Journal of Psychological Assessment 31 (3): 181–94. https://doi.org/10.1027/1015-5759/a000224.
Reis Costa, Denise, and Waldir Leoncio. 2019. LOGAN: Log File Analysis in International Large-Scale Assessments. Manual.
Rölke, Heiko. 2012. “The ItemBuilder: A Graphical Authoring System for Complex Item Development.” In World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education, 2012:344–53.
Tang, Xueying, Susu Zhang, Zhi Wang, Jingchen Liu, and Zhiliang Ying. 2021. “ProcData: An R Package for Process Data Analysis.” Psychometrika 86 (4): 1058–83. https://doi.org/10.1007/s11336-021-09798-7.
Ulitzsch, Esther, Qiwei He, and Steffi Pohl. 2022. “Using Sequence Mining Techniques for Understanding Incorrect Behavioral Patterns on Interactive Tasks.” Journal of Educational and Behavioral Statistics 47 (1): 3–35. https://doi.org/10.3102/10769986211010467.

  1. Note that this develops the Text Replay suggested by Gobert et al. (2013) further.↩︎