IJTEL: lessons learned in five years by the Mulce project
In order to make replication possible for interaction analysis in
online learning, the French project named Mulce (2007–2010) and its team
worked on requirements for research data to be shareable. We defined a
Learning & Teaching Corpus (LETEC) as a package containing the data collected
from an online course, together with the contextual information and metadata
necessary to make these data visible, shareable and reusable. These human, technical and
ethical requirements are presented in this paper. We briefly present the
structure of a corpus and the repository we developed to share these corpora.
Related work is also described, and we show how conditions evolved
between 2006 and 2011. This leads us to report on how the Mulce project
faced four particular challenges and to suggest acceptable solutions for
computer scientists and researchers in the humanities, both of whom are concerned with data
sharing in the Technology Enhanced Learning community.
Jim Gray, a renowned computer scientist, presented a new paradigm of scientific research, labelled “E-science” or “data-intensive science”. It is characterized as the fourth paradigm in the research cycle after experiment, theory, and simulation.
“[We] have to do better at producing tools to support the whole research cycle […]. Today, the tools for capturing data […] are just dreadful. After you have captured the data, you need to curate it before you can start doing any kind of data analysis, and we lack good tools for both data curation and data analysis. Then comes the publication of the results of your research, and the published literature is just the tip of the data iceberg.” (Gray, 2007: xvii)
Although Gray took his examples from the fields of biology and environmental sciences, we in the TEL community are also directly concerned by this paradigm. Data have to be shared. Tools for organising and analysing these data need to be much improved, as do the links between data and the most visible part of our work, namely publications.
First and foremost, we must be concerned with the extensiveness of data collection and with the description of the context. Studying collaborative online learning, whether to understand this specific type of situated human learning, to evaluate scenarios, or to improve technological environments, requires access to interaction data collected from the various participants in the learning situations. However, the interdisciplinary communities involved in this research have not been able to characterise a shareable scientific object according to a comprehensible methodology. On the one hand, one finds subsets of data that are not contextualised with respect to the pedagogical and technological learning situations. On the other hand, raw data are inextricably tangled in specific software using proprietary formats. A simple collection of students’ online interaction data does not constitute a scientific object, as Kern emphasised in the language learning field:
“Researchers must carefully document the relationships among media choice, language usage, and communicative purpose, but they must also attend to the increasingly blurry line separating linguistic interaction and extra linguistic variables. […] Studies of linguistic interaction will likely need to account for a host of independent variables: the instructor’s role as mediator, facilitator, or teacher; cross-cultural differences in communicative purpose and rhetorical structure; institutional convergence or divergence on defining course goals; and the affective responses of students involved in online language learning projects.” (Kern, Ware and Warschauer, 2004).
In this article, we describe a new scientific object created in 2006 by the Mulce project, namely the LEarning & TEaching Corpus (LETEC). The LETEC structure has been used to organize data compiled from a variety of collaborative online learning and teaching situations into various sorts of corpora. We also present the Mulce repository, where data can be shared in order to facilitate the comparison of analyses.
Collaborative online learning situations involve a number of variables that are difficult to control. These variables make the comparison of scientific results difficult and the replication of a given learning and teaching experience nearly impossible. For example, applying the same learning design to a new cohort of learners does not imply that phenomena observed within the first cohort will occur in the second. Since replication in ecological contexts is impossible to achieve, we worked to make interaction and production traces drawn from Learning Management Systems (LMS) available to the whole research community. This was the Mulce strategy for making these situations comparable and re-analysable.
Since the beginning of the Mulce project, the international situation has changed in various scientific fields with the emergence of mandates for open access to research results, including access to data. Other TEL repositories have appeared, alongside discussions about tools for data analysis and about data structures appropriate for encoding interactions. The question of finding a common framework has been raised. The challenges faced in the Mulce project, whose audience includes TEL researchers, differ from the questions raised in the ITS and CSCL communities. This persuaded us to propose a flexible framework that can improve the quality of TEL research and, at the same time, the credibility of researchers belonging to our heterogeneous community.
After this introduction, the paper is organized into three main sections followed by a short conclusion. Section 2 presents the achievements of the Mulce project. Section 3 gives an overview of the changes and progress made in data sharing since 2006. Section 4 reports the experience of the Mulce project with respect to four important challenges, with suggestions for improving corpus building, deposit and reuse in the TEL community.
Achievements of the Mulce project
Only subtitles hereafter.
- Learning & Teaching Corpus definition: structure of a coherent dataset
- Corpus composition and structure: data longevity, human readability and personal data conformance
- The core component Instantiation: machine readability
- The License component: collecting, distributing and accessing the data
- A repository to share and visualize corpora: open access data
- Linking analyses to the main corpus: notion of distinguished corpus
What is new in this arena since 2006?
See the article for the full text.
Challenges and solutions according to the Mulce experience
Given the impetus from some researchers, the support of institutions and the current development of research data repositories, one might believe that sharing research data has become easy. However, are individual scientists ready to spend time making their data shareable and accessible through data repositories? Nelson (2009), for example, asserts that open archives remain empty despite agreement across all disciplines on the benefits of data sharing. In some disciplines, including physics, mathematics and computer science, communities are populating data repositories. This is also the case for some specific data banks, such as the Protein Data Bank and GenBank, and in fields such as geophysics, biodiversity and ecology, among others.
“But those discipline-specific successes are the exception rather than the rule in science. All too many observations lie isolated and forgotten on personal hard drives and CDs, trapped by technical, legal and cultural barriers — a problem that open-data advocates are only just beginning to solve.” (Nelson, 2009)
In this section, we detail four challenges a data repository may face. For each challenge, we report the experience of the Mulce project and outline some ways of addressing it. As already mentioned, the motivation for sharing research data varies from one discipline to another. LETEC corpora belong to a multidisciplinary field, which concerns not only computer science but also educational science and domain-specific learning sciences, such as applied linguistics when, for example, language learning is at stake. Every discipline has its own methodology, units of analysis and viewpoints concerning the observed and recorded data. These disciplines also have long traditions of well-established content analysis methods in which systematic organization of data and large-scale processing are not the rule. This is why, in the description of the following challenges, we will sometimes distinguish between researchers in the humanities and those in computer science.
Only subtitles hereafter.
- Challenge 1: Exchanging corpora and reusing data
- Challenge 2: Building an exhaustive, well-structured and contextualised corpus: hard work for scientists from different disciplines
- Challenge 3: Making the deposit of a new corpus available to non-Mulce members
- Challenge 4: Making connections with analysis tools
Reffay, C., Betbeder, M.-L., Chanier, T. (2012). "Multimodal Learning and Teaching Corpora Exchange: Lessons learned in 5 years by the Mulce project". Special Issue on dataTEL: Datasets and Data Supported Learning in Technology-Enhanced Learning, International Journal of Technology Enhanced Learning (IJTEL), 4(1/2), pp. 11-30. DOI: 10.1504/IJTEL.2012.048310; http://edutice.archives-ouvertes.fr...