Tuesday 20 September 2011 by Thierry CHANIER

Final deliverable of the Mulce project sent to the ANR "Corpora and tools in Humanities" in March 2011.

Research into online learning, whether aimed at understanding this form of situated human learning, at evaluating relevant pedagogical scenarios and settings, or at improving technological environments, requires that interaction data from all participants in the learning situations be made available. At present, interaction data are often either inaccessible or only partly accessible to researchers not involved in the original projects. Moreover, data are fragmented, and therefore decontextualized with respect to the original teaching / learning settings, and may be buried in a proprietary format within the technological environment. Consequently, research lacks a scientific basis. In the literature, comparisons are often attempted between objects that are ill-defined and which may in fact differ. The processes of scientific enquiry, such as re-analyzing, replicating, verifying, refuting or extending the original findings, are therefore impossible.

To address this shortcoming, we have created and disseminated a new type of corpus. This is a contextualized learner corpus, entitled "LEarning and TEaching Corpus" (LETEC) (Olac-Letec, 2010). Such corpora include not only the data that correspond to the output of learner activity during online courses, but also the data that correspond to the contexts for such output. Sharing LETEC corpora within the research community implies the following three prerequisites: (1) corpora are formatted and structured according to a new model which is compatible with existing standards for corpora and for learning design specifications (Mulce-struct, 2010); (2) corpora are placed on a server offering cross-platform compatibility, according to the Open Language Archives Community (OLAC-Mulce, 2010), and free access; (3) an ethical protocol, as well as copyright licenses, are formulated.

The Mulce databank (2011) includes 6 LETEC corpora and 26 distinguishable corpora (Mulce-contenu, 2011) of 3 types: corpora designed as replication data sets (Gary, 2007), which associate a publication with its specifically formatted data; corpora whose data are structured so that they can be used directly with open access analysis tools; and corpora that include analyses and the associated tools. We developed the information site (, 2011), which presents our methodological approach.

Regular meetings with external research teams offered occasions to start benchmarking analysis tools against our data. We also attended or organized international workshops that brought together researchers working on data and analysis tools for online interactions occurring in learning situations (Eurocall-Mulce, 2010).

