Thursday 4 June 2015 by Thierry CHANIER

Discovering LETEC corpora

If you want to learn more about LETEC we recommend the following route:

  • Go to the website. It brings together all the documentation on the LETEC methodology, including manuals, descriptions of the Mulce repository, publications, workshops, etc. Part of the site is in English.
  • Go to the Mulce repository website, where you can access the corpora. The first page displays criteria for selecting different learning situations, environments and types of interaction. Ignore them at this stage. Click on the search button and the corpora will be listed. Start by skimming through the global corpora (Copéas, Simuligne, Tridem, Archi21, etc.); that is, set aside the distinguished corpora for now and do not start by downloading. The IDs of the global corpora all end with “letec-all”. For example, the Copéas ID is “mce-copeas-letec-all”. Most, if not all, of the global corpora descriptions are in English.
  • Discover one LETEC. Download the Copéas global corpus. After extracting the files from the ZIP archive, open the file named “index.htm”; it opens in your browser. First explore the learning scenario (design). A graph will appear: look for the small interactive buttons next to the nodes to navigate through the sub-graphs down to the activity level. At that level, other small interactive buttons give access to the corresponding pedagogical guidelines, in the format given to the learners. Do the same with the research protocol. You will find the interactions in the instantiation folder. Everything is coded in the XML file called “imsmanifest.xml”. However, if you are not familiar with XML, you may prefer to view the corpus in another way. Choose a distinguished corpus, for example “mce-copeas-T5_contexte-all”. Read its description form online (it is also in the ZIP). Download the corpus. In the archive you will find the video (in general, videos are too large and have to be downloaded separately), the transcription in the TATIANA format, and video tutorials that explain how to use our data with the software. Of course, you will have to download and install the TATIANA software.
  • If you are interested in pedagogical corpora, you can download them from the same place, look at the activities and discover the corresponding materials. If you want to explore learning contexts other than Copéas, for example intercultural courses, have a look at Simuligne, Ecofralin, Tridem or Infral. You will find documents, videos and interactions in several languages (Spanish, German, English, French) corresponding to the different learning communities involved. (NB: all locations are displayed on the map when you enter the repository website.) Do not forget to download large videos from another part of the repository. If you click on the link “Accès à une ressource de grande taille” (“Access to a large resource”), you will have access to more than 200 resources. Your feedback is welcome. The contact address is on the site.
  • Citation and references. When you reuse data produced by other researchers, do not forget to cite them by giving the full reference of their work, as you would for a publication. Bibliographic information is shown at the top of every corpus description, e.g. for the global Copéas LETEC: Chanier, T., Reffay, C., Betbeder, M-L., Ciekanski, M. & Lamy, M-N. (2009). LETEC (Learning and Teaching Corpus) Copéas. Clermont Université. [oai:;].
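
For readers who prefer to inspect “imsmanifest.xml” programmatically rather than through “index.htm”, the following is a minimal Python sketch that lists the resources declared in the manifest of a downloaded archive. It assumes only the general IMS Content Packaging layout (resource elements carrying identifier and href attributes); the function name list_resources and the location of the manifest inside the archive are assumptions made for illustration, not part of the Mulce documentation.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import List, Tuple


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def list_resources(archive) -> List[Tuple[str, str]]:
    """Return (identifier, href) pairs for every <resource> in imsmanifest.xml.

    `archive` is a path to a ZIP file or a file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        # Assumption: the manifest sits at the archive root or in a
        # top-level folder, so we search by file name.
        manifest_path = next(
            name for name in zf.namelist() if name.endswith("imsmanifest.xml")
        )
        root = ET.fromstring(zf.read(manifest_path))
    # Match on local names so the sketch does not depend on the exact
    # namespace URI used by a given manifest.
    return [
        (el.get("identifier", ""), el.get("href", ""))
        for el in root.iter()
        if local_name(el.tag) == "resource"
    ]
```

A call such as list_resources("mce-copeas-letec-all.zip") would then return the declared resources; the archive file name here is a guess based on the corpus ID and may differ from the name of the file you actually download.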

Discovering standards for corpus linguistics

Of course, many competing standards exist on the topics related to this chapter (see footnote on standards). The ones chosen for the Mulce and TEI-CMC projects reflect the multidisciplinarity of the teams involved (various fields of linguistics, applied linguistics, educational research, computer science). In order to minimize the number of references, we have given only their acronyms in the text. You can find their websites by typing the corresponding keywords (OLAC, CLARIN, DARIAH, etc.) into a search engine. However, let us look more closely at some of them.

Concerning open access to data and resource formats:

Concerning the learning design:

  • When describing learning scenarios, research protocols and data packaging, we used different specifications of the international IMS Global Learning Consortium, namely IMS Learning Design (IMS-LD) and IMS Content Packaging (IMS-CP).
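
To give an idea of what a Content Packaging structure looks like in practice, here is a small Python sketch that walks the nested item elements of a manifest and yields their titles with their depth. The sample manifest and its titles are hypothetical and follow only the general IMS-CP layout (organizations, organization, item, title); actual Mulce manifests may be organized differently.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

# Hypothetical manifest fragment, written for illustration only.
SAMPLE = """<manifest xmlns="http://www.imsglobal.org/xsd/imscp_v1p1" identifier="demo">
  <organizations>
    <organization identifier="org1">
      <item identifier="i1"><title>Learning design</title>
        <item identifier="i2"><title>Activity 1</title></item>
      </item>
      <item identifier="i3"><title>Instantiation</title></item>
    </organization>
  </organizations>
</manifest>"""


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def walk_items(element: ET.Element, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, title) for nested <item> elements, in document order."""
    for child in element:
        name = local_name(child.tag)
        if name == "item":
            # The first <title> child, if any, labels the item.
            title = next(
                (c.text or "" for c in child if local_name(c.tag) == "title"), ""
            )
            yield depth, title.strip()
            yield from walk_items(child, depth + 1)
        elif name in ("organizations", "organization"):
            # Containers do not add a nesting level of their own.
            yield from walk_items(child, depth)


if __name__ == "__main__":
    for depth, title in walk_items(ET.fromstring(SAMPLE)):
        print("  " * depth + title)
```

Matching on local element names keeps the sketch independent of the exact namespace URI declared by a particular manifest.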

Concerning interaction and language corpora:

