Why are you gesturing? Elicited Gestures and Learning Gains in an Embodied Learning Environment

Alejandro Andrade, Joshua A. Danish, Adam V. Maltese

Indiana University

Systems Dynamics

To understand ecosystem dynamics among competing populations, experts have focused on developing models of disturbances over time (Grotzer, Kamarainen, Tutwiler, Metcalf, & Dede, 2013). In ecosystems, a dynamic disturbance framework includes concepts such as interacting feedback loops among chained populations (Hokayem, Ma, & Jin, 2015). For instance, predation ecosystems show that while the numbers of predator and prey are continuously changing, the two populations constitute an unstable yet resilient equilibrium over time (Folke, 2006; Grotzer et al., 2013). However, understanding dynamic disturbances as an emergent property of predation ecosystems is hard because it requires an understanding of causal effects that take place over time, which manifest as cyclical, non-linear, bidirectional, and delayed effects (see Figure 1; Folke, 2006; Mitchell, 2009).

figure 1

Embodied Learning

To help understand this complex aspect of ecosystems (i.e., disturbances and interactions), the proposed project builds upon theories of embodied cognition. Embodiment theory suggests that, in order to learn a concept meaningfully, this concept should be grounded in body-based experiences (e.g., Abrahamson & Sánchez-García, 2016; Lee, 2014; Nathan et al., 2014; Smith, King, & Hoyte, 2014). Body-based experiences provide support for the understanding of a concept via analogical mapping (Barsalou, 2008; Lakoff & Núñez, 2000),  where simulated mental models of action support learning insofar as the dynamic aspects of the body-based experiences map onto the dynamic aspects of the to-be-learned concept. Although the exact cognitive mechanisms of embodiment are still unclear (Mahon & Caramazza, 2008), educational researchers are making use of sensing technologies, such as the Kinect, the Leap Motion sensor, or automated visual tracking, to design embodied activities in which learners make use of gestures or physical movements to interact with virtual objects.

Eliciting Congruent Gestures

While many studies have focused on spontaneous movements, fewer have examined how movements explicitly solicited by the experimenter or by the design of educational software might influence students’ explanations and learning (Lindgren, 2015; Nathan et al., 2014). We refer to these explicit movements, solicited by the pedagogical design, as elicited movements. The working hypothesis is that expanding one’s movement repertoire spurs the learning of a concept by grounding that concept in elicited body-based experiences (Lindgren, 2015). Furthermore, the training experience would leave a historical trace in students’ cognition (Nathan & Walkington, 2017). During later performance or explanations of the physically learned concept, this embodied impression is expressed in the form of gestures (Hostetter & Alibali, 2008; Nathan & Walkington, 2017). As these gestures are congruent with the dynamic aspects of a concept, but take place at a later time, we refer to them as spontaneous congruent gestures. Moreover, researchers suggest that the production of spontaneous congruent gestures during later performance or explanations correlates with learning gains due to the tight coupling between gesture and conceptual understanding (Kang & Tversky, 2016; Nathan & Walkington, 2017).

The Embodied Simulation of Predation Dynamics (ESPD)

In our instructional design, called the Embodied Simulation of Predation Dynamics (ESPD), the student uses their hands to represent unstable equilibrium between two populations (e.g., foxes and rabbits) as a phase-shift sinusoidal cyclical pattern. The student sees a graphical change and is asked to shadow the movement of the graphs with their hands, and, in this way, has an embodied experience of unstable equilibrium between two populations (see Figure 2). In an environment such as this, how will we know if the embodiment is supporting learning?

figure 2
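The phase-shifted sinusoidal pattern that students shadow with their hands can be sketched numerically. The following is a minimal illustration, not the ESPD software itself: the function name, the eight-step period, and the quarter-cycle shift are all assumptions chosen for readability.

```python
import math

def population_curves(n_steps=8, period=8, phase_shift=0.25):
    """Return (prey, predator) heights for two phase-shifted sine waves.

    phase_shift is a fraction of one cycle; 0.25 (a quarter cycle) is an
    assumed value for illustration, not a parameter taken from the ESPD.
    """
    prey, predator = [], []
    for t in range(n_steps):
        angle = 2 * math.pi * t / period
        prey.append(math.sin(angle))
        # The predator curve follows the same cycle, delayed by phase_shift.
        predator.append(math.sin(angle - 2 * math.pi * phase_shift))
    return prey, predator

prey, predator = population_curves()
for t, (rabbits, foxes) in enumerate(zip(prey, predator)):
    print(f"t={t}  rabbits={rabbits:+.2f}  foxes={foxes:+.2f}")
```

In this sketch the fox curve peaks two steps after the rabbit curve, which is the kind of delayed, cyclical relationship each hand traces in the activity.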

Methods and Data Sources

Fifteen third and fourth graders (F = 8, M = 7; mean age = 9.13, SD = 0.8) were individually interviewed by one of the researchers. In each interview, the student answered a pre-tutorial questionnaire, interacted with the simulation, and then answered a post-tutorial questionnaire. Interviews were videotaped and took thirty minutes on average. The learning space was created by two pedagogical moves: (a) eliciting a bimanual movement, and (b) an inquiry-based approach with prediction and reflection questions between tasks that help the student reflect beyond superficial structural analogies (Nathan & Walkington, 2017). The tutorial protocol included nine tasks, divided into three phases (Briefing, Training, and Demonstration). The pre- and post-tutorial questions were adapted from Hokayem, Ma, and Jin (2015), and students’ answers were scored using the Feedback Loop Reasoning Coding Scheme, also adapted from Hokayem et al. (2015). The coding scheme consists of seven levels (the higher the level, the more in-depth the student’s understanding), detailing the reasoning progression about feedback loops within a predator-prey ecosystem (see Figure 3). A Wilcoxon signed-rank test for repeated measures compared the pre- and post-tutorial medians, and a Mann-Whitney U test compared the median gains of the gesturing and non-gesturing groups. To analyze students’ use of gestures during the interviews, multimodal transcripts (Norris, 2004) were created.

figure 3

Finding 1: After experiencing an elicited movement of predation ecosystem dynamics with the ESPD, learners showed gains in their understanding of unstable equilibrium

Feedback loop reasoning significantly increased from pre to post-tutorial scores (Mdn = 4 and 6, respectively), Z(15) = 2.779, p-value = .008, with a large effect size, r = .718.

Finding 2: Changes in understanding of unstable equilibrium (UE) correlate with spontaneous use of congruent gestures

Students who gestured in their post-tutorial explanation had greater learning gains from pre to post-tutorial (Mdn = 1.5 point increase) than those who did not gesture (Mdn = 0 point increase), and this difference is statistically significant, Z(15) = 1.878, p-value = .033 one-sided alternative hypothesis, with a large effect size, r = 0.485.
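The effect sizes reported for Findings 1 and 2 are consistent with the common conversion r = Z / sqrt(N) for rank-based tests. That this is the formula the authors used is an assumption; the short check below only shows that the conversion agrees with the reported figures.

```python
import math

def effect_size_r(z, n):
    """Effect size r for a rank-based test, computed as Z / sqrt(N)."""
    return z / math.sqrt(n)

# Z statistics and sample size as reported in Findings 1 and 2.
print(round(effect_size_r(2.779, 15), 3))  # Finding 1: 0.718
print(round(effect_size_r(1.878, 15), 3))  # Finding 2: 0.485
```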

Finding 3: Gestures play an important role in students’ explanations of cyclical, non-linear, bidirectional, delayed effects within predation ecosystems (See Figure 4)

figure 4

Discussion and Significance

While this study was only exploratory in nature, the results are promising and suggest there is value in continuing to pursue these questions. First, we noted that students benefited from the instruction with the ESPD, as illustrated by the significant learning gains from pre- to post-tutorial explanations. Second, we found that students started to use gestures compatible with predator-prey dynamics to reorient their explanation of the ecosystem dynamics. Third, students who gestured in their post-tutorial explanations also had the greatest learning gains, compared to those students who did not gesture. These results suggest that using elicited gestures to support how students represent systems dynamics while learning about them may help students attend more productively to the patterns represented within the system. This has important consequences for instruction and assessment because, as noted by others (e.g., Alibali & Nathan, 2012; Goldin-Meadow, 2004; Goldin-Meadow & Alibali, 2013), gestures can be used to recognize the student’s disposition for learning a new concept. We are examining various novel ways in which our simulation can be combined with other simulations, such as the agent-based NetLogo simulation (Wilensky, 1999), to further challenge students into reasoning quantitatively about complex systems.


This work was funded by an Indiana University Proffitt Award (2015), by an NSF Data Consortium Fellowship (2016), and by the IU Center for Research in Learning Technologies (CRLT). We also thank the teachers and students who participated.


Abrahamson, D., & Sánchez-García, R. (2016). Learning Is Moving in New Ways: The Ecological Dynamics of Mathematics Education. Journal of the Learning Sciences, 25(2), 203-239. doi:10.1080/10508406.2016.1143370

Barsalou, L. W. (2008). Grounded cognition. Annual Review of Psychology, 59, 617-645.

Folke, C. (2006). Resilience: The emergence of a perspective for social–ecological systems analyses. Global Environmental Change, 16(3), 253-267.

Grotzer, T. A., Kamarainen, A. M., Tutwiler, M. S., Metcalf, S., & Dede, C. (2013). Learning to Reason about Ecosystems Dynamics over Time: The Challenges of an Event-Based Causal Focus. BioScience, 63(4), 288-296. doi:10.1525/bio.2013.63.4.9

Hokayem, H., Ma, J., & Jin, H. (2015). A learning progression for feedback loop reasoning at lower elementary level. Journal of Biological Education, 49(3), 246-260.

Hostetter, A. B., & Alibali, M. (2008). Visible embodiment: Gestures as simulated action. Psychonomic Bulletin & Review, 15. doi:10.3758/pbr.15.3.495

Kang, S., & Tversky, B. (2016). From hands to minds: Gestures promote understanding. Cognitive Research: Principles and Implications, 1(1), 4. doi:10.1186/s41235-016-0004-9

Lakoff, G., & Núñez, R. E. (2000). Where mathematics comes from: How the embodied mind brings mathematics into being. Basic Books.

Lee, V. (2014). Learning technologies and the body: Integration and implementation in formal and informal learning environments. New York, NY: Routledge.

Lindgren, R. (2015). Getting into the cue: Embracing technology-facilitated body movements as a starting point for learning. In V. Lee (Ed.), Learning technologies and the body: Integration and implementation in formal and informal learning environments (pp. 39-54). New York: Routledge.

Mahon, B. Z., & Caramazza, A. (2008). A critical look at the embodied cognition hypothesis and a new proposal for grounding conceptual content. Journal of Physiology-Paris, 102(1), 59-70.

Mitchell, M. (2009). Complexity: A guided tour. Oxford University Press.

Nathan, M. J., & Walkington, C. (2017). Grounded and embodied mathematical cognition: Promoting mathematical insight and proof using action and language. Cognitive Research: Principles and Implications, 2(1), 9.

Nathan, M. J., Walkington, C., Boncoddo, R., Pier, E., Williams, C. C., & Alibali, M. W. (2014). Actions speak louder with words: The roles of action and pedagogical language for grounding mathematical proof. Learning and Instruction, 33, 182-193. doi:10.1016/j.learninstruc.2014.07.001

Norris, S. (2004). Analyzing multimodal interaction: A methodological framework. Routledge.

Smith, C. P., King, B., & Hoyte, J. (2014). Learning angles through movement: Critical actions for developing understanding in an embodied activity. The Journal of Mathematical Behavior, 36, 95-108.

Exploring a Text-Mining Approach as a Rapid Prototyping Tool for Formative-Assessment Development in Problem-Based Learning Environments

By Alejandro Andrade and Chris Georgen

As online courses generate more data than researchers can analyze manually, many educational researchers are exploring computational tools such as text mining, a family of methods for finding patterns in textual data. Text mining has been used, for instance, to analyze responses to tutor prompts, to facilitate automatic coding of online discussion forums, to analyze collaboration in discussion boards, to assess asynchronous discussion forums, and to identify interaction patterns and sequential organization in chats (for a more complete discussion see Romero & Ventura, 2010; Romero & Ventura, 2013). Our goal is to explore a text-mining approach to analyze collaboration products, in the form of textual answers to problem-based class activities. The long-term goal is to develop a feasible approach for the analysis of individual and collaborative processes and to provide real-time instructor feedback in the rapidly evolving contexts of problem-based learning (PBL) pedagogies.

In an undergraduate circuitry class in the music engineering program at a large Midwestern research-oriented institution, we used a text-mining approach to analyze 18 students’ pre-, during-, and post-collaboration artifacts, called Peer Investigation Groups (PIGs for short). PIGs are sets of multiple-choice (MC) and open-ended questions developed by the class instructors to gauge students’ performance in solving circuitry problems collaboratively and individually. In our approach, we analyze the effect of collaboration on student learning by comparing the students’ textual answers to an expert benchmark. This approach is very similar to that of our previous study (Andrade et al., 2016). We assume that the more closely a student’s answer resembles that of the expert (in this case, the course instructor), the more the student’s discourse is aligned with that of circuitry. By obtaining a measure of how far a student’s answer is from the expert’s, we can analyze student learning gains and the effect of collaboration and, in the future, give feedback to instructors and students alike on whether students are on the right track or making enough progress towards mastering the content.

More concretely, our goal is to find out whether the distance values produced by our text-mining analysis can be used as formative assessment feedback within PBL pedagogies. PBL courses do not display static content throughout the semester but evolve to adapt to different kinds of problems and to what different students bring to the table (Hmelo-Silver, 2004). We believe this text-mining approach can support rapid prototyping of assessments. Problem-based learning activities require a feasible method to gauge students’ understanding at flexible time points along the course, and with our text-mining approach we can avoid elaborating complex knowledge and skill models that would otherwise also require a large and fixed set of questions. Instead, distance values provide an initial, though coarse-grained, window onto student performance while learning is in progress. Finally, we intend to validate these distance values against other measures of student learning, such as final grades or instructor-generated grades of the students’ PIGs.

A Text-Mining Approach

The question was:

“Describe why the answers for the open switch circuit are what they are.”

In order to get a distance value from each student text document to the expert’s benchmark document, we construct a document-term matrix. The rows in this matrix are text documents, such as the students’ and instructor’s answers to the question; columns are terms (relevant words); and entries are term frequencies. Each row is then a vector with the relevant word frequencies for a particular text document. Following Kopainsky, Pirnay‐Dummer, and Alessi (2012), we extracted only nouns and names by using part-of-speech (POS) tags. Below, we illustrate the whole procedure with the instructor’s answer.

An excerpt from the instructor answer is as follows:

“You can have Voltage without current, but not current without voltage. Since there is an open switch we know that current equals zero. To find the voltage drop of each resistor we insert that 0 amps of current into Ohm’s law to get V=0*R. We know that V will always equal zero (no matter what the resistance is). If each resistor drops 0 volts, then that means there is 0 volts difference between the two ends of each resistor, in other words the voltage is the same at both ends of each resistor.”

We used Python’s Natural Language Toolkit (NLTK) package to produce POS tags:

[('You', 'PRP'), ('can', 'MD'), ('have', 'VB'), ('Voltage', 'NNP'), ('without', 'IN'), ('current', 'JJ'), (',', ','), ('but', 'CC'), ('not', 'RB'), ('current', 'JJ'), ('without', 'IN'), ('voltage', 'NN'), ('.', '.'), ('Since', 'IN'), ('there', 'EX'), ('is', 'VBZ'), ('an', 'DT'), ('open', 'JJ'), ('switch', 'NN'), ('we', 'PRP'), ('know', 'VBP'), ('that', 'DT'), ('current', 'JJ'), ('equals', 'NNS'), ('zero', 'NN'), ('.', '.'), ('To', 'TO'), ('find', 'VB'), ('the', 'DT'), ('voltage', 'NN'), ('drop', 'NN'), ('of', 'IN'), ('each', 'DT'), ('resistor', 'NN'), ('we', 'PRP'), ('insert', 'VBP'), ('that', 'IN'), ('0', 'CD'), ('amps', 'NNS'), ('of', 'IN'), ('current', 'JJ'), ('into', 'IN'), ('Ohm', 'NNP'), ("'s", 'POS'), ('law', 'NN'), ('to', 'TO'), ('get', 'VB'), ('V=0*R.', 'NNP'), ('We', 'PRP'), ('know', 'VBP'), ('that', 'DT'), ('V', 'NNP'), ('will', 'MD'), ('always', 'RB'), ('equal', 'VB'), ('zero', 'NN'), ('(', '('), ('no', 'DT'), ('matter', 'NN'), ('what', 'WP'), ('the', 'DT'), ('resistance', 'NN'), ('is', 'VBZ'), (')', ')'), ('.', '.'), ('If', 'IN'), ('each', 'DT'), ('resistor', 'NN'), ('drops', 'VBZ'), ('0', 'CD'), ('volts', 'NNS'), (',', ','), ('then', 'RB'), ('that', 'DT'), ('means', 'VBZ'), ('there', 'EX'), ('is', 'VBZ'), ('0', 'CD'), ('volts', 'NNS'), ('difference', 'NN'), ('between', 'IN'), ('the', 'DT'), ('two', 'CD'), ('ends', 'NNS'), ('of', 'IN'), ('each', 'DT'), ('resistor', 'NN'), (',', ','), ('in', 'IN'), ('other', 'JJ'), ('words', 'NNS'), ('the', 'DT'), ('voltage', 'NN'), ('is', 'VBZ'), ('the', 'DT'), ('same', 'JJ'), ('at', 'IN'), ('both', 'DT'), ('ends', 'NNS'), ('of', 'IN'), ('each', 'DT'), ('resistor', 'NN'), ('.', '.')]

By keeping only those terms with ‘NN’, ‘NNP’, or ‘NNS’ tags, we stripped the document of everything but nouns and names. We kept only names and nouns because they might represent concepts (Kopainsky et al., 2012). We could also have accounted for verbs and adverbs, which represent relationships between those concepts, but we leave this for future work. The next step was to remove capitalization and stem the words so that ‘circuit’ and ‘circuits’ are not counted as different terms:

['voltag', 'voltag', 'switch', 'equal', 'zero', 'voltag', 'drop', 'resistor', 'amp', 'ohm', 'law', 'v=0*r.', 'v', 'zero', 'matter', 'resist', 'resistor', 'volt', 'volt', 'differ', 'end', 'resistor', 'word', 'voltag', 'end', 'resistor']

Note that though ‘resistance’ was stemmed as ‘resist,’ ‘resistor’ was not. This list also contains some non-relevant terms such as ‘word’, ‘end,’ and ‘matter,’ but for the most part it works relatively well.
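The noun-filtering step can be sketched in a few lines of plain Python. This is a minimal illustration that starts from the tagged tuples shown above (first sentence only) rather than calling NLTK; the names `NOUN_TAGS` and `keep_nouns` are hypothetical, and the stemming step is left out because it requires a stemmer such as the one NLTK provides.

```python
# Excerpt of the tagged instructor answer shown above (first sentence only).
tagged = [
    ("You", "PRP"), ("can", "MD"), ("have", "VB"), ("Voltage", "NNP"),
    ("without", "IN"), ("current", "JJ"), (",", ","), ("but", "CC"),
    ("not", "RB"), ("current", "JJ"), ("without", "IN"), ("voltage", "NN"),
    (".", "."),
]

# Tags kept in the text: common nouns, proper nouns (names), plural nouns.
NOUN_TAGS = {"NN", "NNP", "NNS"}

def keep_nouns(tagged_tokens):
    """Keep tokens tagged as nouns or names and lowercase them."""
    return [word.lower() for word, tag in tagged_tokens if tag in NOUN_TAGS]

nouns = keep_nouns(tagged)
print(nouns)  # ['voltage', 'voltage']
```

Note that ‘current’ is tagged as an adjective (JJ) in this output and is therefore dropped, even though it names a circuit concept in this sentence; this is one way tagging errors can affect the resulting term list.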

Then, we converted the list into a term-frequency vector (where entries are term frequencies):

FreqDist({'voltag': 4, 'resistor': 4, 'end': 2, 'zero': 2, 'volt': 2, 'ohm': 1, 'differ': 1, 'law': 1, 'word': 1, 'drop': 1, …})
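The same counts can be reproduced with the standard library. The sketch below uses `collections.Counter` as a stand-in for NLTK’s `FreqDist` and takes the stemmed list shown above as its input.

```python
from collections import Counter

# Stemmed nouns from the instructor answer, copied from the list above.
stems = [
    "voltag", "voltag", "switch", "equal", "zero", "voltag", "drop",
    "resistor", "amp", "ohm", "law", "v=0*r.", "v", "zero", "matter",
    "resist", "resistor", "volt", "volt", "differ", "end", "resistor",
    "word", "voltag", "end", "resistor",
]

# Each distinct stem becomes a key; its value is the number of occurrences.
term_freq = Counter(stems)

print(term_freq["voltag"], term_freq["resistor"], term_freq["end"])  # 4 4 2
```

A term that never occurs in a document simply has a count of zero, which is convenient when several documents are later aligned on a shared set of columns.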

We followed the same procedure with each student document, and by stacking these vectors we produced three document-term matrices, one at each time point (i.e., pre, during, post), with 19 rows (1 expert plus 18 students) and between 51 and 72 columns, depending on the number of terms contained in the documents at each time point. For instance, an excerpt of the document-term matrix for the pre-collaboration PIGs looks like this:


The distance from each student document to the benchmark document was computed using the cosine similarity measure (Leydesdorff, 2005) as implemented in the Latent Semantic Analysis (LSA) R package, which calculates the similarity between all pairs of vectors within a document-term matrix. Because term frequencies are non-negative, the cosine similarity values range over [0, 1], where 0 = documents with no terms in common and 1 = documents with identical term profiles.
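To make the similarity computation concrete, here is a minimal pure-Python sketch of cosine similarity between two term-frequency vectors. It is a stand-in for the R package call, not the code used in the study: the function name is hypothetical, the expert vector keeps only the five most frequent terms listed above, and the student vector is invented for illustration.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors (Counters)."""
    terms = set(a) | set(b)  # shared columns of the document-term matrix
    dot = sum(a[t] * b[t] for t in terms)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for an empty document
    return dot / (norm_a * norm_b)

# Five most frequent expert terms from the frequency vector above.
expert = Counter({"voltag": 4, "resistor": 4, "end": 2, "zero": 2, "volt": 2})
# Hypothetical student answer, invented for this example.
student = Counter({"voltag": 2, "switch": 1, "zero": 1})

similarity = cosine_similarity(expert, student)
print(round(similarity, 3))
```

Because the measure depends on the angle between vectors rather than their length, a short student answer is not penalized merely for being shorter than the expert’s.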

From this point on, things got easier. Taking the average values across the 18 students (see Figure 1), we compared the three time points (pre-, during-, and post-collaboration) and found that the increase in similarity from pre to post is statistically significant, indicating that the post-collaboration answers are more similar to the expert benchmark than the pre-collaboration answers. Thus, we conclude that students aligned their discourse towards more disciplinary ways of writing about circuits, according to the expert’s benchmark, after they had an opportunity to collaborate.

values table

Figure 1. Cosine Values for PIGs at three time points. The increased similarity to the expert benchmark from pre (M=.228, SD = .090) to post (M=.288, SD=.059) PIG is statistically significant, paired t-test t(17) = 2.697, p-value = .0152, with a large effect size, g = 0.785.

However, we still need a few more pieces of information to make sure these distance values are meaningful at all. For instance, is a similarity increase of about 0.06 of any practical significance? What does a low or high distance value look like? Is this increase less than what could be expected for this student population? How do these distance values relate to other measures of students’ performance? We will try to tackle these and other questions in the next iteration of this study.


Andrade, A., Saleh, A., Rehack, A. M., Gomoll, A. S., Danish, J. A., & Hmelo-Silver, C. E. (2016). Exploring a Text-Mining Approach for the Analysis of Computer Collaborative Data from a Design-Based Research Project. Paper presented at the Annual Meeting of the American Educational Research Association AERA 2017, Washington, DC, USA.

Hmelo-Silver, C. E. (2004). Problem-based learning: What and how do students learn? Educational Psychology Review, 16(3), 235-266.

Kopainsky, B., Pirnay‐Dummer, P., & Alessi, S. M. (2012). Automated assessment of learners’ understanding in complex dynamic systems. System Dynamics Review, 28(2), 131-156.

Leydesdorff, L. (2005). Similarity Measures, Author Cocitation Analysis, and Information Theory. JASIST, 56(7), 769-772.

Romero, C., & Ventura, S. (2010). Educational data mining: a review of the state of the art. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 40(6), 601-618.

Romero, C., & Ventura, S. (2013). Data mining in education. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 3(1), 12-27.

Estimating intervention effects in ABAB reversal designs using Bayesian Statistics: A function for Single-Case Designs using R and RJags

By Alejandro Andrade, Li-Ting Chen, and Mathew Hanauer, Indiana University

In this post, we introduce a simple function (ABAB-level function.r) we wrote for the R software to perform a Bayesian statistical analysis within the single-case design (SCD) framework. This function produces Bayesian estimates of level comparisons between the study phases to support claims about the effect of a particular intervention. The function takes advantage of R as open-source software and of the power of Bayesian inference given the small sample sizes of traditional SCDs. Our intention is also to provide a clean output with plenty of visualizations to help SCD practitioners interpret their results without having to decipher complicated numeric tables.

Single-Case Research Design

Single-case design (SCD) is a repeated-measures research approach for the study of an intervention effect, and its importance is increasingly being recognized in education research because of its role in evidence-based practice (Horner et al., 2005). SCD is an experimental methodology that documents a causal relationship between a treatment/independent variable (IV) and a measured/dependent variable (DV). SCD controls for internal validity by comparing the level of the DV before and after the IV has been introduced, and by comparing the effect of the treatment between various participants. Systematic replication in other studies has been recommended in order to enhance the external validity of the analysis (Horner et al., 2005).

In SCD, the unit of analysis is the individual, who serves as their own control. Before the introduction of the treatment, the participant’s performance on the DV is repeatedly measured to establish a baseline, after which the intervention starts and the DV is measured again. This simple design is called AB, where the baseline phase is referred to as A and the intervention phase is referred to as B. In order to ensure that the observed changes in the DV are due to the inclusion of the IV, the treatment is removed and then reintroduced once more in a second AB cycle. This design is called an ABAB (reversal) design. There are several other single-case research designs beyond the AB or ABAB (reversal) designs, such as multiple baseline designs and alternating treatments designs, which try to rule out other confounding variables that might disturb the DV in addition to the intervention. In this brief post, however, we focus on the ABAB (reversal) design.

An ABAB Statistical Model

Various SCD modeling approaches are available in both frequentist and Bayesian frameworks (de Vries & Morey, 2013; Moeyaert, Ferron, Beretvas, & Van den Noortgate, 2014; Rindskopf, 2014; Swaminathan, Rogers, & Horner, 2014). Although most of the cited references apply to multilevel models that account for autocorrelation, the R function we created applies only to a single level: it produces estimates for one participant and disregards serial dependency (a.k.a. autocorrelation). In addition, our function assumes there are no trends in the data (for instance, the DV decreasing within phase B while increasing within phase A). Trends, autocorrelation, and effect sizes will be introduced in future R functions.

The statistical model in the R function (ABAB-level function.r) includes four regression coefficients and has the following form (Moeyaert et al., 2014, Equation 3):

$$y_i = \beta_0 + \beta_1 P_{1i} + \beta_2 P_{2i} + \beta_3 P_{3i} + e_i$$

Here $y_i$ is the DV at time point $i$, $e_i$ is the residual, beta 0 ($\beta_0$) is the mean level during baseline (A1), and beta 1 through beta 3 ($\beta_1$ to $\beta_3$) are the level differences between A1 and B1, between B1 and A2, and between A2 and B2. P1, P2, and P3 are three dummy variables reflecting the phase for each time point in the following form (assuming each phase has three time points):

phase table
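One coding of P1, P2, and P3 that is consistent with the definitions above is a step coding, in which each dummy switches from 0 to 1 at the start of one phase and stays at 1 afterwards. The sketch below builds that coding; it is inferred from the coefficient definitions rather than copied from the function’s source, and the function names and coefficient values are hypothetical.

```python
def step_dummies(phase_lengths=(3, 3, 3, 3)):
    """Return one (P1, P2, P3) row per time point for phases A1, B1, A2, B2.

    P1 is 1 from B1 onward, P2 from A2 onward, and P3 from B2 onward, so each
    coefficient is the change from the immediately preceding phase.
    """
    rows = []
    for phase_index, length in enumerate(phase_lengths):
        row = tuple(1 if phase_index >= k else 0 for k in (1, 2, 3))
        rows.extend([row] * length)
    return rows

def expected_level(betas, row):
    """Model-implied DV level for one time point (no residual term)."""
    b0, b1, b2, b3 = betas
    p1, p2, p3 = row
    return b0 + b1 * p1 + b2 * p2 + b3 * p3

rows = step_dummies()
betas = (8.0, -5.0, 4.0, -4.5)  # hypothetical coefficient values
levels = [expected_level(betas, r) for r in rows]

for i, (r, level) in enumerate(zip(rows, levels), start=1):
    print(i, r, level)
```

With these hypothetical values the implied phase levels are 8 (A1), 3 (B1), 7 (A2), and 2.5 (B2), so each coefficient after the first reads directly as a phase-to-phase change.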

Bayesian Inference

Bayesian inference estimates an updated belief about the values under examination. To update the belief, the analysis takes the prior beliefs and weighs them by the collected evidence. The updated belief, also referred to as the posterior belief, is a distribution of plausible values. In order to test a research question, Bayesian inference compares the posterior distribution to a test value, such as zero, and makes a decision depending on whether or not the range of credible values in the posterior distribution includes the test value. One way to ensure that prior subjective beliefs are not biasing the analysis is to select non-committal priors that present a broad range of a priori plausible values (Kruschke, 2010). Although we don’t elaborate on the topic of priors here, Bayesian inference can take advantage of prior information in a way that is not available in non-Bayesian approaches. For the model under examination, inferences have to be made about the four regression coefficients that represent the phase level differences.
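The decision rule described above can be sketched with a list of posterior draws. In the following illustration the draws are simulated from a normal distribution as a stand-in for MCMC output, and the interval is an equal-tailed 95% interval; both choices are assumptions, and the actual function may summarize the posterior differently (for example, with a highest-density interval).

```python
import random

def credible_interval(samples, mass=0.95):
    """Equal-tailed interval containing roughly `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1 - mass) / 2
    lower = ordered[int(tail * n)]
    upper = ordered[int((1 - tail) * n) - 1]
    return lower, upper

def excludes(interval, test_value=0.0):
    """True when the test value falls outside the interval."""
    lower, upper = interval
    return test_value < lower or test_value > upper

# Hypothetical posterior draws for one level-difference coefficient.
rng = random.Random(0)
draws = [rng.gauss(-5.0, 1.0) for _ in range(10_000)]

interval = credible_interval(draws)
print(interval, excludes(interval))
```

When the interval excludes zero, as it does for these simulated draws, the data support a credible change in level between the two phases being compared.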

The R Function using RJags

We use RJags, which is an interface between R (freely available statistical software) and Jags (which stands for Just Another Gibbs Sampler). Jags takes a user-defined model and returns an MCMC sample of the posterior distribution. The following need to be downloaded:

  1. Jags, available from http://mcmc-jags.sourceforge.net/;
  2. R available at https://www.r-project.org/ (we recommend downloading and installing RStudio available at https://www.rstudio.com/ to run the program);
  3. two R files (‘ABAB-level function.r’ and ‘ABAB-level run file.r’) and the data set in the following zip file ABAB-level function. These files have to be downloaded to the same folder.

To test the function, we use the Student B1 data from Lambert, Cartledge, Heward, and Lo (2006). To run the analysis with your own data, you need to save them in comma-separated value (csv) format and dummy code the phase variable, where 0 = A and 1 = B phases. In fact, you only need to input two variables, the DV and a Phase identifier, to run the function. Often, however, when data have been collected from multiple participants, a participant identifier variable is used (usually in the form 1 = participant 1 … n = participant n).
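If your data start out as phase labels rather than 0/1 codes, the dummy coding and csv export can be sketched as follows. The column names `DV` and `Phase`, the function name, and the data values are hypothetical; check the run file for the column names the function actually expects.

```python
import csv
import io

def to_csv_text(dv_values, phase_labels):
    """Return csv text with the phase dummy coded as 0 = A and 1 = B."""
    codes = {"A": 0, "B": 1}
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["DV", "Phase"])  # hypothetical column names
    for dv, label in zip(dv_values, phase_labels):
        writer.writerow([dv, codes[label]])
    return buffer.getvalue()

# Hypothetical ABAB data with three time points per phase.
dv = [8, 7, 9, 3, 2, 3, 7, 8, 7, 2, 3, 2]
phases = ["A"] * 3 + ["B"] * 3 + ["A"] * 3 + ["B"] * 3

csv_text = to_csv_text(dv, phases)
print(csv_text)
```

Writing the returned text to a file in the same folder as the R files would give a data set in the two-variable layout described above.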

Running the ABAB-Level Function

Open the ‘ABAB-level run file.r’. You will see the following:


Before running the file, you need to modify line 2 (setwd) to include the path to the directory where the two files (‘ABAB-level function.r’ and ‘ABAB-level run file.r’) were downloaded. If you are using your own data, you also need to point R to your data file by modifying line 5 (read.csv) and inserting the appropriate data file name (note that this file has to be in the same folder as the R files). If you only have one participant, you need to delete or comment out line 6 (by putting a # at the beginning of the line). If the data come from several participants and there is a participant identifier variable, you can use line 6 to select the participant for whom you want to run the analysis. This can be done by changing the number after the == to the appropriate identifier value. Finally, highlight lines 1 through 11 and click RUN.

Example using Lambert et al.’s (2006) Data

After running ‘ABAB-level run file.r’, the following output is produced along with the graph in Figure 1.

Results show that the posterior distributions of the regression coefficients do not include zero, thus supporting an updated belief that the changes in the DV values in each phase depend on the introduction or removal of the treatment. For instance, beta 2 in the output, the second of the four coefficients, represents the level change between phases A1 and B1 (when the treatment is first introduced). The mean of the beta 2 posterior distribution, shown in the second histogram from the left at the top of Figure 1, is -4.98, which indicates a reduction of almost 5 points on the DV scale (which ranges from 0 to 10). The 95% interval of credible values for this reduction runs from -7.14 to -2.82, which clearly excludes zero. The other regression coefficient estimates also diverge from zero. Finally, the plot at the bottom of Figure 1 shows the data points for each phase and superimposes the estimates of the phase levels (the MUs).

ABAB Level Only

Figure 1. ABAB-level function output

Final Remarks

The ABAB-level function is R code written for ease of use and to support practitioners of single-case design methodologies. It is freely available and downloadable from this blog post, provided the appropriate reference is included in any academic publication. We are in the process of writing a manuscript elaborating on expansions to the function that will include trend comparison, effect size computation, an account of autocorrelation, and Bayes factors.


de Vries, R. M., & Morey, R. D. (2013). Bayesian hypothesis testing for single-subject designs. Psychological Methods, 18(2), 165. doi:10.1037/a0031037

Horner, R. H., Carr, E. G., Halle, J., McGee, G., Odom, S., & Wolery, M. (2005). The use of single-subject research to identify evidence-based practice in special education. Exceptional children, 71(2), 165-179. doi:10.1177/001440290507100203

Kruschke, J. (2010). Doing Bayesian Data Analysis: A Tutorial Introduction with R. Boston, MA: Academic Press.

Lambert, M. C., Cartledge, G., Heward, W. L., & Lo, Y.-y. (2006). Effects of response cards on disruptive behavior and academic responding during math lessons by fourth-grade urban students. Journal of Positive Behavior Interventions, 8(2), 88-99. doi:10.1177/10983007060080020701

Moeyaert, M., Ferron, J. M., Beretvas, S. N., & Van den Noortgate, W. (2014). From a single-level analysis to a multilevel analysis of single-case experimental designs. Journal of School Psychology, 52(2), 191-211. doi:10.1016/j.jsp.2013.11.003

Rindskopf, D. (2014). Bayesian analysis of data from single case designs. Neuropsychological rehabilitation, 24(3-4), 572-589. doi:10.1080/09602011.2013.866903

Swaminathan, H., Rogers, H. J., & Horner, R. H. (2014). An effect size measure and Bayesian analysis of single-case designs. Journal of School Psychology, 52(2), 213-230. doi:10.1016/j.jsp.2013.12.002


Brief Reflection about Using a Text Mining Approach in a Design-Based Research

After having the opportunity to explore a text mining approach to analyze information in a design-based research project about using video to support pre-service teachers’ ability to notice (Van Es & Sherin, 2002), I have three major ideas to share. First, these data-mining techniques are flexible and powerful tools, yet one should be aware of their limitations. For instance, the stemmed words in a text document are only proxies for participants’ conceptual engagement, and they may be distal rather than proximal evidence of it. The bag-of-words approach, the one used in my analysis, discards word order and context, and so overlooks a great deal of information that might have helped tease apart more nuanced hypotheses. Nonetheless, the approach, however distal, did provide relevant evidence in the context of the present study, for instance, about the relationships between the learning theories and the student analytics, and between the student analytics and the experts’ analytics.
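
To make the limitation concrete, here is a minimal Python sketch of a bag-of-words representation. It is not the pipeline used in my analysis: the stop-word list, the suffix-stripping "stemmer", and the two example sentences are all made up for illustration, and a real analysis would use a proper stemmer.

```python
import re
from collections import Counter

# Tiny, hypothetical stop-word list and suffix list (illustration only).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "that"}
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip one common suffix, keeping at least three characters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(text):
    """Count stemmed, non-stop-word tokens; word order is thrown away."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOPWORDS)


first = bag_of_words("The students noticed the teacher asking questions")
second = bag_of_words("Asking questions, the teacher noticed students")
print(first == second)
```

The two sentences describe different events (who is asking the questions changes), yet they produce identical bags, which is the sense in which stemmed word counts are only a distal proxy for what a participant meant.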

[Figures: word clouds for the sociocultural, behaviorist, constructivist, and IP learning theories; interaction plot]

Second, while the bag-of-words is one text mining approach, it is not the only computerized technique available. Indeed, other more powerful tools can supplement or replace it. For instance, some computerized linguistic analyses provide measures of coherence and cohesion in text documents. One such tool is Coh-Metrix (Graesser, McNamara, Louwerse, & Cai, 2004), a free online tool that provides more than a hundred different indices of text characteristics. Among others, Coh-Metrix provides information about text easability, referential cohesion, content word overlap, connective incidence, passivity, causal verbs and causal particles, intentional particles, and temporal cohesion. With this tool one can, for instance, supplement the findings about differences and similarities between the students’ and experts’ analytics.

Third, I believe that the incorporation of computational techniques into the researcher’s toolkit is bound to gain traction in the learning sciences. In particular, as researchers adopt design-based research methodologies (Cobb, Confrey, Disessa, Lehrer, & Schauble, 2003; W. Sandoval, 2014; W. A. Sandoval & Bell, 2004) that demand a sequence of test-and-refine iterations, they might consider having tools in their belts that allow a swift and reliable understanding of their results. Unlike slower qualitative approaches based on coding schemes, which require inter-rater reliability, content analytic tools such as the bag-of-words are much faster and more consistent. Also, these quantitative tools are useful with a small group of students as well as with larger samples of several hundred or even several thousand participants. This doesn’t mean that traditional coding schemes are not good, or that we should stop caring about them. On the contrary, I believe that both approaches can work in tandem: computational techniques provide a quick and dirty first pass at the data that can inform the research team on how to adapt and refine the design, and then, when resources allow, researchers can go deep into the data and examine the nuances of student learning interactions.


Cobb, P., Confrey, J., Disessa, A., Lehrer, R., & Schauble, L. (2003). Design experiments in educational research. Educational Researcher, 32(1), 9.

Graesser, A. C., McNamara, D. S., Louwerse, M. M., & Cai, Z. (2004). Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, & Computers, 36(2), 193-202.

Sandoval, W. (2014). Conjecture mapping: an approach to systematic educational design research. Journal of the Learning Sciences, 23(1), 18-36.

Sandoval, W. A., & Bell, P. (2004). Design-based research methods for studying learning in context: Introduction. Educational Psychologist, 39(4), 199-201.

Van Es, E. A., & Sherin, M. G. (2002). Learning to Notice: Scaffolding New Teachers’ Interpretations of Classroom Interactions. Journal of Technology and Teacher Education, 10(4), 571-596.

Multimodal Coding of Rapport in Early Elementary Interviews

Rapport is not a personality trait but only exists in the interaction between participants, and is described as the experience when people “click” with each other (Tickle-Degnen & Rosenthal, 1990). These authors describe the nature of rapport in terms of a dynamic structure of three interrelated components: (a) mutual attentiveness, (b) positivity, and (c) coordination. During the early stages of the interaction the first two components—mutual attentiveness and positivity—play a crucial role in the development of rapport (Cassell, Gill, & Tepper, 2007).

Gratch et al. (2006) track some indicative features of rapport by attending to head and face motions, such as nodding, head shaking, and eye gaze, and to vocal features, such as intensity, range, and backchannel feedback. The authors develop an interactional mapping between the speaker and the receiver, using these multimodal features to test the development of mutual attentiveness. For instance, if the speaker lowers or raises pitch, then the receiver nods; if there is a small disruption in the speaker’s speech flow, then the receiver’s posture or gaze shifts; if the speaker shifts posture, gazes away, or nods, then the receiver mimics this feature.
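
The speaker-to-receiver mapping described above can be thought of as a small lookup table. The Python sketch below only restates the three example rules from this paragraph; the event names and the function are hypothetical labels of mine, not identifiers from Gratch et al.'s system.

```python
# Speaker event -> expected receiver response (labels are made up for this sketch).
RESPONSE_RULES = {
    "pitch_lowered": "nod",
    "pitch_raised": "nod",
    "speech_disruption": "posture_or_gaze_shift",
    "posture_shift": "mimic",
    "gaze_away": "mimic",
    "nod": "mimic",
}


def expected_response(speaker_event):
    """Return the receiver behavior the mapping predicts, or None if no rule applies."""
    response = RESPONSE_RULES.get(speaker_event)
    if response == "mimic":
        # Mimicry means the receiver reproduces the speaker's own behavior.
        return speaker_event
    return response


print(expected_response("pitch_raised"))
print(expected_response("gaze_away"))
```

Checking coded video against a table like this is one way to ask how often the receiver's observed behavior matches what mutual attentiveness would predict.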

Because the transcription of multimodal features is difficult, intensive work, recent approaches suggest using specialized transcription software to keep track of a simultaneous display of the relevant non-verbal behaviors. Allwood et al. (2007) suggest tracking ten different behaviors (faces, eyebrow movement, eye gaze, head movement, hands, body posture, etc.). Because all these behaviors are tracked simultaneously, separate annotation tiers are used, one for each behavior.

In order to transcribe the behaviors that Gratch et al. (2006) suggested as important in the study of rapport, we selected the multimodal features relevant to rapport from Allwood et al.’s (2007) list and developed the following coding scheme. We track the moment-by-moment development of voice pitch, assenting head movements, eye gaze, body positioning, facial expressions, and type of verbal interchange with respect to self-disclosure or backchannel (mhmm) (see Table 1). To a certain extent these levels are grounded in the data because, for instance, we wanted to differentiate when the student was looking at the camera but not when the interviewer was looking at the computer or at the notes. Some preliminary results of this hand annotation of multimodal features using ELAN (Wittenburg, Brugman, Russel, Klassmann, & Sloetjes, 2006) can be seen in Figure 1. It is important to note that we coded six features for each participant, that is, there are six tiers for the student and six for the interviewer.

Table 1

Relevant behaviors and their corresponding levels

Behavior    Levels
Voice       lower, neutral, raised, silent
Head        nod/shakes, NA
Gaze        eye contact, up, down, sideways, computer/notes, other
Body        straight, back, forward, sideways, towards, other
Face        laughter/smile, frown, neutral, other
Verbal      praise, self-disclosure, backchannel, other
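
For readers who want to work with such a scheme programmatically, the table can be written down as a small data structure. The Python sketch below is only an illustration: the level lists follow my reading of Table 1, the tier naming (participant plus behavior) and the annotation tuple layout are hypothetical, and none of this is ELAN's own file format or API.

```python
# Behaviors and their allowed levels, as read from Table 1.
CODING_SCHEME = {
    "Voice": {"lower", "neutral", "raised", "silent"},
    "Head": {"nod/shakes", "NA"},
    "Gaze": {"eye contact", "up", "down", "sideways", "computer/notes", "other"},
    "Body": {"straight", "back", "forward", "sideways", "towards", "other"},
    "Face": {"laughter/smile", "frown", "neutral", "other"},
    "Verbal": {"praise", "self-disclosure", "backchannel", "other"},
}
PARTICIPANTS = ("student", "interviewer")


def tier_names():
    """One tier per participant and behavior (hypothetical naming convention)."""
    return [f"{person}:{behavior}" for person in PARTICIPANTS for behavior in CODING_SCHEME]


def is_valid(annotation):
    """Check a (participant, behavior, level, start_sec, end_sec) tuple against the scheme."""
    person, behavior, level, start, end = annotation
    return (
        person in PARTICIPANTS
        and level in CODING_SCHEME.get(behavior, set())
        and 0 <= start < end
    )


print(len(tier_names()))
print(is_valid(("student", "Gaze", "eye contact", 0.0, 1.5)))
```

A check like `is_valid` catches annotations whose level does not belong to the behavior being coded before any analysis is run.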



Allwood, J., Cerrato, L., Jokinen, K., Navarretta, C., & Paggio, P. (2007). The MUMIN coding scheme for the annotation of feedback, turn management and sequencing phenomena. Language Resources and Evaluation, 41(3-4), 273-287.

Cassell, J., Gill, A. J., & Tepper, P. A. (2007). Coordination in conversation and rapport. Paper presented at the Proceedings of the workshop on Embodied Language Processing.

Gratch, J., Okhmatovskaia, A., Lamothe, F., Marsella, S., Morales, M., van der Werf, R. J., & Morency, L.-P. (2006). Virtual rapport. Paper presented at the Intelligent virtual agents.

Tickle-Degnen, L., & Rosenthal, R. (1990). The nature of rapport and its nonverbal correlates. Psychological Inquiry, 1(4), 285-293.

Wittenburg, P., Brugman, H., Russel, A., Klassmann, A., & Sloetjes, H. (2006). Elan: a professional framework for multimodality research. Paper presented at the Proceedings of LREC.


Coding Multimodal Features of Interaction

Coding for multimodal features of talk-in-interaction is not only an important matter but also a very demanding task. By looking at participants’ non-verbal and paralinguistic behaviors, such as body positioning, eye gaze, gestures, and frowning or smiling faces, as well as prosodic features of speech such as pitch, tone, or speed, researchers are able to make sense of the wide range of communicative resources participants use in their moment-to-moment interactions. However, transcribing such rich multimodal interactions and displaying them in a linear fashion is a complicated matter.

In order to capture the full bandwidth of participants’ communicative resources, several transcribing approaches have been proposed. For instance, Goodwin (2000) delineated a possible way to include several multimodal resources by superimposing a sketch-like snapshot of the activity on the verbal transcription (see Figure 1). However, this kind of transcription is crowded and, more importantly, does not support a good temporal description of the non-verbal behaviors.

Figure 1. Goodwin’s (2000) Multimodal Transcription

Traditional approaches to transcription, such as Ochs (1979), propose the use of a two-column format in which the verbal transcription is annotated on the right-hand side and the non-verbal behavior on the left-hand side (see Figure 2). However, not only is this format very rough, but it also fails to provide a clear temporal annotation structure for various kinds of non-verbal behaviors that are to be analyzed simultaneously.

Figure 2. Traditional transcription format (Ochs, 1979)

A more recent approach uses specialized transcription software to keep track of a simultaneous display of several non-verbal behaviors. For instance, in their MUMIN coding scheme, Allwood et al. (2007) list ten different behaviors to keep track of, including faces, eyebrow movement, eye gaze, head movement, the hand that is gesturing, the body posture, etc. (see Figure 3). Because all these behaviors are tracked simultaneously, separate annotation tiers are used, one for each behavior (see Figure 4). One tricky issue with this coding scheme is achieving high interrater reliability. Although the behavioral values appear straightforward (for instance, gaze direction can be up, down, or sideways), exactly where the cut-off point between these values lies in each particular case is often a gray area. For instance, body position seems somewhat easy to spot (whether the person is leaning forward, sitting straight, or sitting sideways), but how far does the person have to lean forward for the movement to be coded as leaning forward: a significant inclination of the body, or a slight forward movement? I don’t think there is a simple, all-encompassing answer, because sometimes slight movements can be meaningful depending on the situation. But for the most part this coding scheme seems versatile and simple to use.
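
One common way to quantify the interrater reliability discussed above is Cohen's kappa, which corrects the raw agreement between two coders for the agreement expected by chance. The Python sketch below is a minimal illustration, not part of the MUMIN scheme; the two coders' body-posture codes are invented for the example.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same segments."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of segments")
    n = len(coder_a)
    # Proportion of segments on which the coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented codes for six video segments (illustration only).
coder_a = ["forward", "forward", "straight", "straight", "sideways", "forward"]
coder_b = ["forward", "straight", "straight", "straight", "sideways", "forward"]
kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 3))
```

Here the coders disagree on one segment, exactly the kind of "was that lean big enough?" case described above, and kappa drops noticeably below 1 even though raw agreement is five out of six.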

Figure 3. Non-verbal Behavior List for Multimodal Annotation (Allwood et al., 2007)

Figure 4. Multimodal Annotation Software (Allwood et al., 2007)

In a similar way, Carletta et al. (2003) created a toolkit for multimodal video data annotation based on XML. The NITE tool is customizable software that supports the creation of human interfaces for the analysis of video data with temporal and structural information. Their main goal is to support the integration of the annotation of both language and observable behaviors in a meaningful way (see Figure 5).

Figure 5. The NITE Architecture to code various modalities of communication from video data (Carletta et al., 2003).

However, although this software is flexible and robust, creating annotations that combine all the multiple modalities of human communication becomes resource intensive. For instance, annotating just one minute of video data can take up to several hours of work. Thus, computational approaches that can offload the heavy weight of annotating observable features in interaction can play an important role in streamlining a more comprehensive and reliable way of transcribing multimodal data.


Allwood, J., Cerrato, L., Jokinen, K., Navarretta, C., & Paggio, P. (2007). The MUMIN coding scheme for the annotation of feedback, turn management and sequencing phenomena. Language Resources and Evaluation, 41(3-4), 273-287.

Carletta, J., Evert, S., Heid, U., Kilgour, J., Robertson, J., & Voormann, H. (2003). The NITE XML toolkit: flexible annotation for multimodal language data. Behavior Research Methods, Instruments, & Computers, 35(3), 353-363.

Goodwin, C. (2000). Action and embodiment within situated human interaction. Journal of Pragmatics, 32(10), 1489-1522.

Ochs, E. (1979). Transcription as theory. Developmental pragmatics, 43-72.

Multimodal Features of Instant Rapport in Elementary Interviews

I illustrate some preliminary features identified in the study of instant rapport in early elementary interviews. Because early elementary students make limited use of talk at their developmental stage, it seems especially relevant in interviews with them to pay attention to their gestures and other nonverbal behaviors (Goldin-Meadow, 2004). Few studies can inform our understanding of rapport and its subsequent effect on the outcomes of interviews with early elementary students.

According to the literature, rapport is not a personality trait but only exists in the interaction between participants. Tickle-Degnen and Rosenthal (1990) argue that “Individuals experience rapport as the result of a combination of qualities that emerge from each individual during interaction. This experience is expressed clearly when people say they ‘clicked’ with each other, or felt the good interaction to be due to ‘chemistry’” (p. 286). These authors describe the nature of rapport in terms of a dynamic structure of three interrelating components: (a) mutual attentiveness, (b) positivity, and (c) coordination.

During the early stages of the interaction, the first two components (mutual attentiveness and positivity) play a crucial role in the development of rapport, whereas coordination plays an important role in later stages (Cassell, Gill, & Tepper, 2007). Instant rapport is understood as the creation of the feeling of being in synchrony (Gratch et al., 2006). I would refer to this concept as the creation of the experience of being in synchrony at the outset of the interview. Focusing on the outset of the interview is important because it has been shown that the opening interactions may alter the eventual outcomes (O’Reilly, Karim, Stafford, & Hutchby, 2014).

Observing Instant Rapport

Various approaches study multimodal features and rapport in both human-human and human-virtual agent interactions (Cassell et al., 2007; Cassell et al., 1994; Gratch et al., 2006; Huang, Morency, & Gratch, 2011; Wang & Gratch, 2009; Zhao, Papangelis, & Cassell, 2014). In Gratch et al.’s (2006) approach, the authors track some indicative features of rapport by attending to head and face motions, such as nodding, head shaking, and eye gaze, and to vocal features, such as intensity, range, and backchannel feedback. The authors develop an interaction mapping between the speaker and the recipient, using these multimodal features to test the development of mutual attentiveness. For instance, if the speaker lowers or raises pitch, then the recipient nods; if there is a small disruption in the speaker’s speech flow, then the recipient’s posture or gaze shifts; if the speaker shifts posture, gazes away, or nods, then the recipient mimics this feature. According to Cassell et al. (2007), one of the key features in building positivity is participants’ self-disclosing utterances, including negative self-disclosure, such as related or unrelated intimate personal information.

Transcription Challenges

As I was trying to bring up an example, I realized how difficult it is to convey multimodality-in-interaction on a piece of paper. Although it seems relatively simple to observe these multimodal features in the video data, or to create a verbatim transcription of the participants’ utterances, a visual representation of several time-dependent variables interacting with each other seems cumbersome. Goodwin (2000) provides an interesting example of bringing several features of interaction together in a visual transcript. I will likely produce something similar, though the use of other visualizations for qualitative time-series may be worth exploring.


Cassell, J., Gill, A. J., & Tepper, P. A. (2007). Coordination in conversation and rapport. Paper presented at the Proceedings of the workshop on Embodied Language Processing.

Cassell, J., Pelachaud, C., Badler, N., Steedman, M., Achorn, B., Becket, T., . . . Stone, M. (1994). Animated conversation: rule-based generation of facial expression, gesture & spoken intonation for multiple conversational agents. Paper presented at the Proceedings of the 21st annual conference on Computer graphics and interactive techniques.

Goldin-Meadow, S. (2004). Gesture’s role in the learning process. Theory into Practice, 43(4), 314-321.

Goodwin, C. (2000). Action and embodiment within situated human interaction. Journal of Pragmatics, 32(10), 1489-1522.

Gratch, J., Okhmatovskaia, A., Lamothe, F., Marsella, S., Morales, M., van der Werf, R. J., & Morency, L.-P. (2006). Virtual rapport. Paper presented at the Intelligent virtual agents.

Huang, L., Morency, L.-P., & Gratch, J. (2011). Virtual Rapport 2.0. Paper presented at the Intelligent virtual agents.

O’Reilly, M., Karim, K., Stafford, V., & Hutchby, I. (2014). Identifying the interactional processes in the first assessments in child mental health. Child and Adolescent Mental Health.

Tickle-Degnen, L., & Rosenthal, R. (1990). The nature of rapport and its nonverbal correlates. Psychological Inquiry, 1(4), 285-293.

Wang, N., & Gratch, J. (2009). Rapport and facial expression. Paper presented at the Affective Computing and Intelligent Interaction and Workshops, 2009. ACII 2009. 3rd International Conference on.

Zhao, R., Papangelis, A., & Cassell, J. (2014). Towards a dyadic computational model of rapport management for human-virtual agent interaction. Paper presented at the Intelligent Virtual Agents.

Goodwin’s Symbiotic Gestures and Ecology of Sign Systems

I find the argument Goodwin (2003) advances against McNeill (1996) really interesting. In a certain way, this discussion takes place at the epistemological level, where Goodwin represents a social constructionist stance and McNeill a social constructivist one. Although this difference is subtle, it creates a very challenging yet interesting theoretical and methodological discrepancy. I will first present McNeill’s ideas about gestures and thought. Then I will present Goodwin’s critique in the form of the idea of symbiotic gestures. Finally, I will revisit a previous study of mine to present some empirical evidence from an experimental setting in which we compared and contrasted the use of gestures and language when a small group tries to solve an abstract reasoning problem with and without the use of tangible materials.

McNeill’s account of gestures

According to McNeill (2005), gesticulations and gestures embody meaning when accompanying speech. At least three types of gestures are discernable: deictic, iconic, and metaphoric. These types of gestures relate to a hierarchy of symbolic levels of meaning, from the indexical (deictic), the representation of picturable aspects of the meaning (iconic), to a more abstract, semiotic image (metaphoric) (McNeill, 1996).

Goodwin sees a core issue with McNeill’s account of gestures. McNeill sees gesture as an embodied manifestation of the same psychological processes as utterances and sentences. However, because these processes seem to be the product of the individual mind, McNeill’s account of gestures does not provide the theoretical and methodological tools for investigating gestures as a distributed phenomenon outside the speaker. This is a fundamental distinction between constructionist stances, which take knowledge as essentially social (Goodwin’s position), and constructivist stances, which take knowledge as an internalization process (McNeill’s position).

Goodwin’s account of symbiotic gestures

Goodwin, on the other hand, recognizes that little attention has been paid to understanding how the environment contributes to the way gestures are organized. To Goodwin, gestures have to be investigated in relation to the physical, semiotic, social, and cultural properties of the environment within which they are embedded. In using the concept of symbiotic gestures, Goodwin states that:

“Symbiotic gestures are built through the conjunction of quite different kinds of entities instantiated in diverse media: first, talk; second, gesture: and third, material and graphic structure in the environment. The actions they are performing cannot be understood by focusing on the gesturing hand in isolation, or even just on the gesture and the talk that accompanies it. Symbiotic gestures might thus constitute one perspicuous site for investigating embodiment as something lodged within both human interaction and a consequential, structured environment.” (Goodwin, 2003, p. 10)

Put differently, if we ignore the context in which gestures are produced (understood here in a very general sense that includes talk, materials, and graphic media), we will never understand human communication in its essence, as a mutual interaction between the body and the environment. The concept of symbiotic gestures helps us understand how many forms of human action are built from the combined effects of various expressive modalities.

A study of how the structure of the environment defines gestures

With the goal of understanding how the structure of the representational format affects the attainment of the joint problem-solving space in collaborative work (Barron, 2000; Roschelle & Teasley, 1995), two small groups of three graduate students were given two abstract reasoning problems called amoeboid patterns (see annexes). The first group was given physical materials (small plastic balls and foam frames), whereas the second group received the materials printed on sheets of paper. The students were told to find the rule that governed the movement of a sequence of circles moving around a grid and then predict the next two states in the sequence. Audiovisual records were captured with two cameras (i.e., one with a whole-group frame and the other with a close-up frame on the materials), and both groups took about the same amount of time (~40 minutes) to find the correct answer.

In order to evaluate the performance, two sources of evidence were compared: a) the participants’ use of gestures, and b) the complexity of their talk. We categorized the gestures according to McNeill’s description of gesture levels: deictic, iconic, and metaphoric. We measured the complexity of talk using the linguistic features (cohesion and coherence) provided by the software Coh-Metrix (Graesser, McNamara, & Kulikowich, 2011; Graesser, McNamara, Louwerse, & Cai, 2004). According to Graesser et al. (2004), cohesion is an objective characteristic of language, that is, explicit features such as words and sentences that guide the interpretation of and the connection between ideas. Coherence, in contrast, is a characteristic of the speaker that relates to the mental representation that is being communicated in a meaningful and organized manner (Graesser et al., 2011; Graesser et al., 2004).

Because the problem-solving representational contexts are different, we hypothesized that the communication in each condition had to go through different modalities, and these differences should be apparent in both gestures and speech. If Goodwin is right, we would be able to observe gestures that do not have an intrinsic meaning unless they are observed in conjunction with the rest of the environment. We also hypothesized that the tangible materials would provide an additional channel through which much of the communication should take place. This distributed hypothesis would support Goodwin’s claims against the individualistic perspective advanced by McNeill.

Some preliminary findings of talk and gesture

An initial analysis revealed that participants in the paper condition seemed more inclined to produce iconic and metaphoric gestures as well as to make use of more complex talk. The following excerpt illustrates the typical talk produced by the participants in the paper condition group.


1 F2: So if the core is rotating then we have to determine the rule [ between these
2 F1:                                                                                                              [That would work? Sorry, keep going
3 F2: The rule that figures out how the appendices go
4 F1: Aha
5 (5 sec pause)
6 F1: Yeah, Okay. So, if these two are the core, and this swings down, and these are all connected, so, this one would come down, this one would go over
7 F2: Aha
8 F1: And this one[ here
9 F2:                        [would go over
10 F1: Yeah, so then we’re here. So now this is the core, so it swings…
11 F2: Oh yeah, that’s just it’s really hard to see cause the grid is so spaced out. But yeah, there is that core of four actually, seems like this four are…
12 F1: Aha

This short excerpt provides a picture of participants working their words to provide a visual image of the rotating object. Nouns such as the “core” and the “appendices” were put in the place of the circles, and verbs describing actions such as rotation, swinging, and connection were frequently used to visualize the sequence. In addition, a quick overview of the video revealed that gestures helped to convey these meanings. For instance, in the picture on the left in Figure 1, a two-hand gesture conveys the idea of a container, and in the picture on the right, the dynamic gesture of a rotating core is displayed.

Figure 1. Metaphoric gestures were predominant in the paper condition

In contrast, participants in the tangible condition seemed to have made extensive use of the physical materials to communicate their ideas about the sequence and to coordinate their problem solving. These interactions were mostly indicative and their utterances frequently overlapped each other. The following excerpt illustrates this type of interaction:


17 M1: I’m just thinking if there is another rotation issue that might be going on here?
18 F4: One, two
19 M1: If we treated them as [ independent
20 F4:                                        [As independent balls…
21 M1: No
22 F4: Let’s just treat them [ independently
23 F5:                                        [One, two, three, four/ (while F4 moves the balls)
24 M1: What rule are they all following there?
25 F4: They’re moving [ like
26 F5:                                [ Like down
27 M1: Alright, if the ball, it has to be an ‘if-then’ rule so like it has to be in relation to
28 F4: The ball must revolve clockwise around this ball, so…
29 M1: It can’t move in diagonal so it goes towards
30 F4: It goes this way, this way, this has to go this way, [ and this has to go this way
31 M1:                                                                                         [ And it has to go that way
32 M1: Ok, let’s move all back into same thing here [ so, this way
33 F4:                                                                                [Now, has to go this way, has to move this way, has to move this way [ and has to go this way
34 F5:                                                  [ And has to go that way

Without looking at the video, it is relatively difficult to follow the conversation and to form a coherent picture of what took place in the activity. There are many demonstrative pronouns, such as this and that, as well as many turns in which participants were counting “one two three four” as they moved the balls, all of which makes it difficult to interpret the interaction without explicit attention to the visual information. In addition, the participants’ hand movements seemed to be mostly action-based rather than gesture-like (see Figure 2).


Figure 2. Action-based hand movements were more predominant in the tangible condition

In order to see how generalizable these findings were across the data, we supported our analysis with the systematic data from the automated computer-based analysis. Results show that participants in the paper condition produced slightly more gestures per segment unit than those in the tangible condition (M=22.5, SD=17.49, versus M=22.50, SD=11.55 respectively). However, this difference was not statistically significant at the 10% level, t(22.52)=0.54, p=.588. Table 1 shows the results from the textual analysis comparing the problem-solving attempts across conditions. The tangible condition has more content word overlap as well as more anaphor (pronoun-noun) overlap between pairs of sentences (see indices 34 and 38). That is, there seems to be less lexical density in the talk in the tangible condition. Conversely, the paper condition showed more syntactic complexity and more noun continuity (see indices 14 and 31). That is, the information conveyed verbally seems to be more coherent in the paper condition.

Table 1. T-tests between Tangible and Paper Conditions in Textual Analysis

Index     Mean Tangibles  Mean Paper  t-value  df      p-value  Lower CI  Upper CI
Index 14  15.811          23.264      -1.931   25.887  0.065    -14.038   -0.868
Index 31  5.101           11.112      -2.171   26      0.039    -10.732   -1.289
Index 34  322.792         312.126     2.146    21.682  0.043    2.124     19.208
Index 38  4.953           4.451       2.032    25.639  0.053    0.08      0.923

Significance level at 10%
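
The fractional degrees of freedom in Table 1 are what one would expect from an unequal-variance (Welch) t-test, although that is my inference here rather than something the output states. As a minimal Python sketch of how such a t-value and its degrees of freedom are obtained from summary statistics, consider the function below. The function name is hypothetical and the numbers in the example call are made up, since group sizes are not reported in this post.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error contributed by each group.
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Made-up summary statistics (illustration only, not the study's data).
t_value, df = welch_t(mean1=1.0, sd1=1.0, n1=10, mean2=0.0, sd2=1.0, n2=10)
print(round(t_value, 3), round(df, 3))
```

When the two groups have equal sizes and equal variances, as in this toy call, the degrees of freedom reduce to n1 + n2 - 2; they become fractional, as in Table 1, once the variances or sizes differ.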

Table 2. Coh-Metrix Table of Indices

Index 14. Text Easability PC Syntactic simplicity, z score. This component reflects the degree to which the sentences in the text contain fewer words and use simpler, familiar syntactic structures, which are less challenging to process. At the opposite end of the continuum are texts that contain sentences with more words and use complex, unfamiliar syntactic structures. The larger the number the more complex, challenging the syntactic structure.
Index 31. Noun overlap, all sentences, binary, mean. This is a measure of global overlap between sentences in terms of nouns. Global overlap represents the average number of sentences in the text that have noun overlap of each sentence with every other sentence.
Index 34. Content word overlap, adjacent sentences, proportional, mean. This measure considers the proportion of explicit content words that overlap between pairs of sentences. For example, if a sentence pair has fewer words and two words overlap, the proportion is greater than if a pair has many words and two words overlap.
Index 38. Anaphor overlap, adjacent sentences. This measure considers the anaphor overlap between pairs of sentences. A pair of sentences has an anaphor overlap if the later sentence contains a pronoun that refers to a pronoun or noun in the earlier sentence. The score for each pair of sentences is binary, i.e., 0 or 1. The measure of the text is the average of the pair scores.

Discussion and conclusion

These findings show how the representational format has an effect on the attainment of the joint problem-solving space. During the communicative attempts, the quality and quantity of information that flows through the different modalities suggest different levels of complexity. Although the average number of gestures is similar in both conditions, the types of gestures differ. In the tangible condition, action-based hand movements were prevalent, whereas iconic and metaphoric gestures were prevalent in the paper condition. Although the average number of words and sentences did not differ across conditions, the quality of the talk did. In the tangible condition, more demonstrative pronouns occurred, which qualified this talk as being of low lexical complexity and cohesion. The talk in the paper-based condition, on the other hand, had more syntactic complexity.

How do these results support Goodwin’s or McNeill’s positions? The comparison is not entirely fair, in the sense that neither author would endorse the use of experimental studies. Still, McNeill did not study gesture as an intricate process that brings body and environment together; his account of gestures rests at a descriptive, individual level. Goodwin, on the other hand, argues for an orchestrated interpretation of gesture in close relation to the other semiotic resources. The fact that both groups correctly solved the problem indicates, at least in principle, that the problem was solved through collaboration regardless of the structure of the representation. However, in the paper condition more of the communication seemed to flow through higher-level gestures and more coherent utterances, which implies that in the tangible condition a good deal of the information had to flow through the movement of the tangible materials. Further, the dialog in the tangible condition was not intelligible without looking at the actions. As Goodwin expressed it: “One thus finds here an utterance in which not only talk but also structure in the environment and gesture linking the two are central to its organization. If these were removed, what was being said would not be expressed, and the action being performed would fail.” (Goodwin, 2003, p. 13)


Barron, B. (2000). Achieving coordination in collaborative problem-solving groups. The Journal of the Learning Sciences, 9(4), 403-436.

Goodwin, C. (2003). The semiotic body in its environment. In Discourses of the body (pp. 19-42).

Graesser, A. C., McNamara, D. S., & Kulikowich, J. M. (2011). Coh-Metrix providing multilevel analyses of text characteristics. Educational Researcher, 40(5), 223-234.

Graesser, A. C., McNamara, D. S., Louwerse, M. M., & Cai, Z. (2004). Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, & Computers, 36(2), 193-202.

McNeill, D. (1996). Hand and mind: What gestures reveal about thought. University of Chicago Press.

McNeill, D. (2005). Gesture and thought. University of Chicago Press.

Roschelle, J., & Teasley, S. D. (1995). The construction of shared knowledge in collaborative problem solving. Paper presented at the Computer supported collaborative learning.

Reflection on Quantification from a Social Constructionist Stance

Does the epistemological stance of social constructionism imply a methodological approach that rejects quantitative analysis?

In opposition to cognitivist theory, which seeks knowledge structures in the mind and independent of context, social constructionism understands mental processes and categories as constituted through social, discursive activities (Jørgensen & Phillips, 2002). In opposition to positivist approaches, which regard objective knowledge as something that exists out there independently, social constructionism rests on poststructuralist premises that place the constitutive and constructive aspects of language at their core, focusing on talk-in-use as the object of study. Qualitative analysis thus seems to be the ‘natural’ method of analysis for social constructionism (Jørgensen & Phillips, 2002; Wooffitt, 2005), and it is typically contrasted with quantitative analysis along the following lines:

  • Research questions about the production of meaning (as created in discourse) instead of predictions (of universal yet individual properties)
  • Focused samples (particular, informed inferences) versus representative samples (inferential, generalizable claims)
  • Data that stem from naturally occurring materials instead of decontextualized, superimposed categories from which to choose (as in a survey)
  • Coding (themes) versus counting (units)
  • Discourse/conversation analysis versus analysis of variance and covariance
  • Reflexivity versus statistical significance
  • Trustworthiness instead of validity (construct, content, face, etc.)

However, is it possible that situational constraints qualify a situation in such a way that the researcher could quantify the utterances or other kinds of interactions? Situational constraints can be thought of as the embeddedness of human activity in the social and material contexts that participate in constructing the meaning of such practices. Schegloff (1993, pp. 114-115) expressed similar ideas more than twenty years ago:

“We need to know what the phenomena are, how they are organized, and how they are related to each other as a precondition for cogently bringing methods of quantitative analysis to bear on them… in some areas of inquiry it may be worth exploring whether or not the conditions for cogent quantification can be met. That is, whether we know enough about the domain of phenomena, and whether the features of its organization or practices are such as to allow us to try this mode of inquiry on it.”

In my view, this quantitative approach does not have to contradict a social constructionist stance. Because it would not be a type of content analysis that disregards context altogether, the quantification would have to preserve the interactional aspects of the unit of analysis. I believe quantitative tools can be built within a conversation/interaction analytical approach that stems from social constructionist epistemic stances. However, such quantification would require the domain or situation to be constrained enough that actions are specifiable and relatively well-defined.



Jørgensen, M. W., & Phillips, L. J. (2002). Discourse analysis as theory and method. Sage.

Schegloff, E. A. (1993). Reflections on quantification in the study of conversation. Research on Language and Social Interaction, 26(1), 99-128.

Wooffitt, R. (2005). Conversation analysis and discourse analysis: A comparative and critical introduction. Sage.

What is a “discourse” and how should we study it? Ideas from Critical Discourse Analysis

Discourse and discursive practices

According to Jørgensen and Phillips (2002, p. 61), “discourse is a form of social practice which both constitutes the social world and is constituted by other social practices.” Thus, we can interpret a discourse as having at least two properties. On the one hand, in the universe of social practices discourse is but one kind of practice; other practices, such as family structure practices, also exist and shape (or constitute) the social world. On the other hand, other social practices, such as the way food is produced (e.g., GMO foods or in vitro reproduction), shape the discourses around such practices. But how does a discourse constitute the social world? Because language (understood in a broad sense) is action oriented, discourse is a form of action that can change the world and, at the same time, is socially and historically situated. For instance, discourse contributes to the constitution of social identities, social relations, and systems of knowledge and meaning.

From a related perspective, according to Parker (1990), “discourses are taken to be systems of meanings which reflect real power relations, and which in turn are a consequence of the material and economic infrastructure of society” (Wooffitt, 2005, p. 146). In this account, discourse is a representation: it stands for the constitutive and constituted meanings attached to the material and economic conditions of society. A very consequential property of these conditions is that they are asymmetric. Because of this asymmetry, discourses constitute and are constituted by power struggles between social identities that have differential access to whatever is produced by social, as well as discursive, practices.

An important concept is that of discursive practices, which I think can be understood as discourses viewed as processes – discourse as a process, not an outcome.

Texts and intertextuality

Discursive practices produce texts, which are then consumed in discursive and social practices in an iterative process. It is within this dynamic of production and consumption that communicative events and discourse genres are to be understood. What are these communicative events and how do they form genres? For Fairclough (1992), that which is produced by a discursive practice is called a “text.” Texts are understood broadly to include instances of language use such as newspaper articles, films, videos, interviews, or political speeches. Because they are produced by discursive practices, texts are concatenated in particular social practices, forming genres; this happens, for instance, in institutional practices such as health, legal, or police practices.

Discourse analysis analyzes texts. In deconstructing texts, the analyst reveals latent oppositional alternatives where dominant positions exercise their power (Wooffitt, 2005). Thus, it is the goal of the analyst to denounce oppressive ideological stances that would otherwise be taken for granted.

Ideology and emancipation

An explicit goal of critical discourse analysis (CDA) studies is an emancipatory agenda of political change (Jørgensen & Phillips, 2002; Wooffitt, 2005). The analyst should take the side of any disenfranchised group and make sure to expose inequalities and oppressions that have been maintained by unequal power relations. Class, race, and gender rights monopolize the attention of CDA researchers. But how can one avoid creating new inequities and disparities in the process? How can one avoid disenfranchising other identities and subjectivities? No guidelines are provided. If identities are constituted in discourse, and created by referring to some hodgepodge of otherness, how can one ever be entirely fair? If identities are always and necessarily fragmented, as Lacan put it, how can any critique do more than just that, critique? Unfortunately, CDA does not provide any solution to this paradox.