Discovering LETEC corpora
If you want to learn more about LETEC we recommend the following route:
- Go to the Mulce.org website (http://Mulce.org). It gathers all the documentation on the LETEC methodology, including manuals, descriptions of the Mulce repository, publications, workshops, etc. Part of the site is in English.
- Go to the Mulce repository website (http://repository.Mulce.org), where you can access the corpora. The first page displays criteria for selecting different learning situations, environments and types of interaction. Ignore them at this stage. Click on the search button and the corpora will be listed. Start by skimming through the global corpora (Copéas, Simuligne, Tridem, Archi21, etc.), i.e. do not consider the distinguished corpora yet, and avoid starting with downloads. The IDs of the global corpora all end with “letec-all”; for example, the Copéas ID is “mce-copeas-letec-all”. Most, if not all, of the global corpus descriptions are in English.
- Explore one LETEC. Download the Copéas global corpus. After extracting the files from the ZIP archive, open the file named “index.htm”: it opens in your web browser. Start with the learning scenario (design). A graph will appear; look for the small interactive buttons next to the nodes, which let you navigate through sub-graphs down to the activity level. At that level, other small interactive buttons give access to the corresponding pedagogical guidelines, in the format in which they were given to the learners. Do the same with the research protocol. The interactions are in the instantiation folder. Everything is encoded in the XML file called “imsmanifest.xml”. However, if you are not familiar with XML, you may prefer to view the corpus in another way. Choose a distinguished corpus, for example “mce-copeas-T5_contexte-all”. Read its description form online (it is also included in the ZIP archive). Download the corpus. The archive contains the video (generally, though, videos are too large and have to be downloaded separately), the transcription in the TATIANA format, and video tutorials explaining how to use our data with that software. You will, of course, need to download and install the TATIANA software.
- If you are interested in pedagogical corpora, you can download them from the same place, look at the activities and discover the corresponding materials. If you want to explore learning contexts other than Copéas, for example intercultural courses, have a look at Simuligne, Ecofralin, Tridem or Infral. You will find documents, videos and interactions in several languages (Spanish, German, English, French), corresponding to the different learning communities involved. (NB: all locations are displayed on the map when you enter the repository website.) Do not forget that large videos are downloaded from another part of the repository: the link “Accès à une ressource de grande taille” (“Access to a large resource”) gives access to more than 200 resources. Your feedback is welcome; the contact address is on the site.
- Citation and references. When you reuse data produced by other researchers, do not forget to cite them by giving the full reference of their work, as you would for a publication. Bibliographic information is shown at the top of every corpus description; e.g. for the global Copéas LETEC: Chanier, T., Reffay, C., Betbeder, M-L., Ciekanski, M. & Lamy, M-N. (2009). LETEC (Learning and Teaching Corpus) Copéas. Mulce.org: Clermont Université. [oai: Mulce.org:mce-copeas-letec-all; http://repository.Mulce.org].
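For readers comfortable with a little scripting, the resources declared in “imsmanifest.xml” can also be inspected programmatically rather than through “index.htm”. The Python sketch below assumes only that the manifest follows the IMS-CP layout (`<resource>` elements carrying `identifier`, `type` and `href` attributes); the function names are our own, and the code has not been checked against an actual Mulce archive.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the namespace from a tag: '{uri}resource' -> 'resource'."""
    return tag.rsplit("}", 1)[-1]


def list_resources(manifest_xml):
    """Return (identifier, type, href) for every <resource> in an IMS-CP manifest.

    Namespaces are ignored on purpose, so no assumption is made about which
    IMS-CP or IMS-LD version a given corpus declares.
    """
    root = ET.fromstring(manifest_xml)
    return [
        (elem.get("identifier"), elem.get("type"), elem.get("href"))
        for elem in root.iter()
        if local_name(elem.tag) == "resource"
    ]


def resources_in_archive(zip_path):
    """Read imsmanifest.xml straight from a downloaded corpus ZIP, without extracting it."""
    with zipfile.ZipFile(zip_path) as archive:
        candidates = [
            n for n in archive.namelist()
            if n == "imsmanifest.xml" or n.endswith("/imsmanifest.xml")
        ]
        if not candidates:
            raise FileNotFoundError("no imsmanifest.xml in " + str(zip_path))
        # Prefer the manifest closest to the archive root (fewest path separators).
        name = min(candidates, key=lambda n: n.count("/"))
        return list_resources(archive.read(name))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_manifest.py <downloaded-corpus>.zip  (script name hypothetical)
    for identifier, rtype, href in resources_in_archive(sys.argv[1]):
        print(identifier, rtype, href)
```

Ignoring namespaces keeps the sketch short and version-agnostic, at the cost of not distinguishing elements from different vocabularies that happen to share the local name `resource`.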
Discovering standards for corpus linguistics
Of course, many competing standards exist in the areas covered by this chapter (see the footnote on standards). The ones chosen for the Mulce and TEI-CMC projects reflect the multidisciplinarity of the teams involved (various fields of linguistics, applied linguistics, educational research, computer science). To keep the number of references down, we have only given their acronyms in the text; you can reach their websites by simply searching for the corresponding keywords (OLAC, CLARIN, DARIAH, etc.). However, let us look more closely at some of them.
Concerning open access to data and resource formats:
- OpenData definitions: http://opendefinition.org/okd/ . We recommend using Creative Commons licences for data / corpora: http://opendefinition.org/okd/ . Preferably use a CC-BY or CC0 licence if you want your corpora to be fully shareable and reusable.
- For further reading on the OpenData question in CALL, see the slide presentation “EUROCALL 2013, Survey on CALL in the Digital Humanities: considering CALL journals, research data”: http://Mulce-doc.univ-bpclermont.fr/spip.php?article93
- The CINES site explains, in English, which resource formats are eligible for archiving: https://www.cines.fr/en/long-term-preservation/expertises/formats-expertise/facile/
- Recommendations for the management of research data (here from Jisc, UK, for all disciplines): http://www.jisc.ac.uk/whatwedo/programmes/di_researchmanagement/
Concerning the learning design:
- When describing learning scenarios, research protocols and data packaging, we used different specifications of the international IMS Global Learning Consortium (http://www.imsglobal.org/specifications.html), namely IMS Learning Design (IMS-LD) and IMS Content Packaging (IMS-CP).
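To give an idea of what IMS-CP packaging looks like, here is a minimal Python sketch that writes the skeleton of an `imsmanifest.xml`: a `<manifest>` root, an empty `<organizations>` element, and a `<resources>` list with one `<file>` per resource. It is an illustration under stated assumptions, not the Mulce tooling: the function names and identifiers are hypothetical, only the IMS-CP 1.1 namespace is used, and no IMS-LD learning design or metadata is generated.

```python
import xml.etree.ElementTree as ET

# Namespace of IMS Content Packaging 1.1; a real LETEC manifest may declare
# other versions and further namespaces (IMS-LD, metadata).
IMSCP_NS = "http://www.imsglobal.org/xsd/imscp_v1p1"
ET.register_namespace("", IMSCP_NS)  # serialise IMS-CP as the default namespace


def q(tag):
    """Qualify a tag name with the IMS-CP namespace, in ElementTree's {uri}tag form."""
    return "{%s}%s" % (IMSCP_NS, tag)


def build_manifest(manifest_id, files):
    """Build a minimal IMS-CP manifest with one 'webcontent' resource per file.

    `files` maps (hypothetical) resource identifiers to paths relative to the
    package root. <organizations> is left empty; in IMS-LD packages, the
    learning design is placed inside this element.
    """
    manifest = ET.Element(q("manifest"), identifier=manifest_id)
    ET.SubElement(manifest, q("organizations"))
    resources = ET.SubElement(manifest, q("resources"))
    for res_id, path in files.items():
        resource = ET.SubElement(
            resources, q("resource"), identifier=res_id, type="webcontent", href=path
        )
        ET.SubElement(resource, q("file"), href=path)
    return ET.tostring(manifest, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical package with a single HTML entry page.
    print(build_manifest("example-package", {"res-index": "index.htm"}))
```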
Concerning interaction and language corpora:
- The Text Encoding Initiative (TEI) is the standard used for many different kinds of corpora. A full description is available in several languages and formats (PDF, EPUB, Kindle): http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ . You may prefer to start with a condensed version: http://www.tei-c.org/Guidelines/Customization/Lite/ . If you wish to examine some examples, you will need an XML editor. Although many free ones exist, we recommend Oxygen (with an academic licence). All tutorials and examples are on the same website.
- CoMeRe corpora can be accessed from the ORTOLANG website (http://hdl.handle.net/11403/comere). The project website address is: http://comere.org . Current developments of the Interaction Space model, as well as other activities of the TEI-CMC group can be found here: http://wiki.tei-c.org/index.php/SIG:Computer-Mediated_Communication
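To get a first feel for TEI encoding before opening an XML editor, the following Python sketch reads a toy TEI document and lists its utterances. It assumes a transcription-style encoding in which each turn is a TEI `<u>` (utterance) element with a `who` attribute; the sample document and speaker identifiers are invented, have not been validated against the TEI schema, and CMC corpora such as CoMeRe may rely on other or additional elements.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}  # the TEI P5 namespace

# Invented toy transcript with a minimal TEI header (title, publication, source).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Toy transcript</title></titleStmt>
      <publicationStmt><p>Not published</p></publicationStmt>
      <sourceDesc><p>Invented for illustration</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <u who="#tutor">Hello everyone.</u>
      <u who="#learner1">Hello!</u>
    </body>
  </text>
</TEI>"""


def utterances(tei_xml):
    """Return (speaker, text) pairs for every <u> element, in document order."""
    root = ET.fromstring(tei_xml)
    return [
        (u.get("who"), "".join(u.itertext()).strip())
        for u in root.iterfind(".//tei:u", TEI)
    ]


if __name__ == "__main__":
    for who, text in utterances(SAMPLE):
        print(f"{who}: {text}")
```

Using `itertext()` keeps any text nested in inline elements inside an utterance, but flattens that markup away; a fuller reader would walk the children instead.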