Mulce: MUltimodal contextualized Learner Corpus Exchange
The study of online learning requires interaction data from everyone involved in such learning situations: learners, teachers and other participants. This holds whether the aim is to understand this form of situated human learning, to evaluate relevant pedagogical scenarios and settings, or to improve technological environments.
There is no shortage of scientific material (publications and conferences) on online learning, in France or internationally. However, the multidisciplinary research communities working in this area have not yet succeeded in defining their object of study or in agreeing on a relevant methodology. Data are either inaccessible or only partially accessible to researchers who were not involved in the original study. They are also fragmented, and therefore decontextualized from the components of the original teaching setting. Sometimes they are buried, in a proprietary format, within the technological environment. As a result, research findings lack a sound scientific basis, and contradictory conclusions may arise. Comparisons are often attempted between objects that are ill-defined and, in fact, different. The usual processes of scientific enquiry, such as re-analyzing, replicating, verifying, invalidating or extending findings, therefore cannot be carried out.
To address this problem, we propose to create and disseminate a new type of corpus, the contextualized learner corpus, called "Learning and Teaching Corpus" (LETEC). Such corpora include not only the data produced directly by learner activity in online courses, but also their context, i.e. the data characterizing the pedagogical and research settings.
These data are multimodal. Participants may carry out activities in various modes. The new synchronous environments offer spaces in which production and communication take place in interrelated modes. The screen-capture videos produced as part of the research process give a flattened representation of the event, which must be restored to its original multidimensionality before it can be analyzed. The notion of a contextualized learner corpus therefore requires us to reflect on how to transcribe, annotate and analyze multimodal data. These issues also arise in other social science domains, but they need to be reconceptualized for the specific situation of human participants in learning groups, interacting within dedicated technological environments.
Creating contextualized learner corpora is only worthwhile if they can be shared within the research community. This aim implies that: (1) corpora are formatted and structured according to a model (yet to be designed) that is compatible with existing standards for corpora and for learning design specifications; (2) corpora are placed on a server offering cross-platform compatibility and free access; (3) an ethics policy is formulated (since the corpora contain the productions of individuals), together with a charter covering ownership, copyright and conditions of use. Furthermore, turning a contextualized learner corpus into an object of scientific study means specifying a relevant methodology for examining and exploiting it. This is why part of the Mulce project addresses the data processing chain: transcripts, annotations, labels and tags, analyses and the associated tools. The results of this processing provide new levels of description, which must be added to the original corpus.
Once these new corpora, together with their associated tools and services, are made available to the research community, new possibilities open up. For example, teams not involved in the original projects can re-analyze the data, analyses of data extracted from different corpora can be compared, and data processing and analysis tools can even be standardized.