"BACKBONE pedagogic corpus design and development – a ’do it yourself’ approach"

1) A short description of the pedagogical context:

The BACKBONE pedagogic corpora and tools are being developed in the EU Lifelong Learning project "BACKBONE – Corpora for Content and Language Integrated Learning" (2009-10). BACKBONE is in the tradition of small pedagogic ’do it yourself’ corpora; its forerunner projects are ELISA and SACODEYL.
BACKBONE focuses on online spoken interviews representing pedagogically neglected languages and varieties. These include lesser taught languages, regional and socio-cultural varieties of more frequently taught languages, and European manifestations of English as a lingua franca. Pedagogic annotation combines thematic, grammatical, lexical and functional characteristics. In addition, pedagogic enrichment resources are integrated; these consist of corpus-based language learning materials and are supported by the authoring software Telos Language Partner and the e-learning platform Moodle.
The BACKBONE corpora are designed to support content and language integrated learning (CLIL) in secondary, higher and vocational education. Demonstration and piloting courses in Moodle illustrate how BACKBONE online searches and search results can be used in e-learning and blended learning scenarios.

2) A short description of the BACKBONE pedagogic corpora:

Seven BACKBONE corpora are available for online demonstration. They contain video-recorded interviews in Polish and Turkish as lesser taught languages; in English, French, German and Spanish as regional and socio-cultural varieties of more frequently taught languages; and in European manifestations of English as a lingua franca. The transcribed interviews have been pedagogically annotated; in addition, they have been pedagogically enriched with ready-made learning packages and e-learning/blended language learning activity templates.

The BACKBONE search interface offers several online search modes, which have been designed with a pedagogic application in mind: ’Browse’ is used for reading and watching entire interviews; in ’Section search’ thematic and linguistic annotation categories can be flexibly combined to look for corresponding interview passages along with their video and sound files; ’Concordances’, ’Co-occurrences’ and ’Word lists’ provide various lexical searches either for an entire corpus or restricted to certain thematic fields.
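To make the lexical search modes more concrete, the following is a minimal Python sketch of a keyword-in-context concordance and a word list, each optionally restricted to a thematic field. It is not the BACKBONE implementation; the sample sections, the `topic` labels and all function names are hypothetical and serve only as an illustration of the general idea.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: each interview section carries a thematic
# label ("topic") and its transcript. Real BACKBONE data will differ.
SECTIONS = [
    {"topic": "education",
     "text": "I went to school in Dublin and the school was small."},
    {"topic": "work",
     "text": "After school I started to work in a bakery."},
]


def tokenize(text):
    """Lowercase the text and split it into word tokens (crude tokenizer)."""
    return re.findall(r"\w+", text.lower())


def select(sections, topic=None):
    """Return all sections, or only those with the given thematic label."""
    return [s for s in sections if topic is None or s["topic"] == topic]


def concordance(sections, keyword, width=3, topic=None):
    """Return (left context, keyword, right context) for every hit."""
    hits = []
    for section in select(sections, topic):
        tokens = tokenize(section["text"])
        for i, token in enumerate(tokens):
            if token == keyword.lower():
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, token, right))
    return hits


def word_list(sections, topic=None):
    """Return (word, frequency) pairs, most frequent first."""
    counts = Counter()
    for section in select(sections, topic):
        counts.update(tokenize(section["text"]))
    return counts.most_common()


if __name__ == "__main__":
    for left, key, right in concordance(SECTIONS, "school"):
        print(f"{left:>20}  [{key}]  {right}")
    print(word_list(SECTIONS, topic="education")[:3])
```

Passing `topic="education"` restricts both searches to sections with that label, which mirrors the idea of running a lexical search on a thematic field rather than on the entire corpus.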

3) A short description of the BACKBONE search interface:

The BACKBONE suite of pedagogic corpus tools comprises and integrates software components for offline transcription and annotation, virtual resource management, and online search.

The BACKBONE search interface runs online in all browsers that support JavaScript. This makes it very easy to use, as no software needs to be installed on the client side, i.e. on the user’s computer. All that is needed is a web browser and an internet connection. The seven BACKBONE corpora can be accessed free of charge.

On the server side, the search interface runs on standard Java Enterprise technology and can be set up on any standard server in a Java Servlet Container. Like the other BACKBONE tools, the search interface is made available as open source software (GPL). The corpus data are stored in XML files, and the encoding format follows the TEI P5 guidelines. Video and sound files are made available from a Windows Media streaming server; Flash streaming will be available as well.
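As an illustration of what working with TEI-encoded transcripts can look like, here is a small Python sketch that reads speaker turns from a TEI-style XML fragment. The fragment is invented and simplified (it omits the TEI header, for instance): it uses the TEI namespace and the TEI `<u>` (utterance) element with a `who` attribute, but the actual structure of the BACKBONE corpus files is not described here and may differ. The function name `utterances` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, simplified fragment; real BACKBONE files may be
# structured differently and carry additional pedagogic annotation.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="section" n="1">
        <u who="#interviewer">Where did you grow up?</u>
        <u who="#speaker1">I grew up in a small town near Cork.</u>
      </div>
    </body>
  </text>
</TEI>"""


def utterances(xml_string):
    """Return (speaker reference, utterance text) pairs in document order."""
    root = ET.fromstring(xml_string)
    return [
        (u.get("who"), "".join(u.itertext()).strip())
        for u in root.iterfind(".//tei:u", TEI_NS)
    ]


if __name__ == "__main__":
    for who, text in utterances(SAMPLE):
        print(f"{who}: {text}")
```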

My workshop presentation focuses on the online search interface; for more information about the other tools, please see the BACKBONE website:

4) Access to the BACKBONE search interface:

The BACKBONE corpora (as described above) are accessible for online search at

5) Research questions associated with the BACKBONE search interface:

The primary purpose of the BACKBONE corpora and tools is to make authentic spoken discourse – in our case spoken interviews – available for pedagogic exploitation. Research questions thus concern in particular the pedagogic angle. Two broad corpus-pedagogic research areas can be distinguished.

(1) Development and pedagogic evaluation of corpus-based e-learning templates that use BACKBONE-type searches and specify ways of deploying the search results in constellations of e-learning tasks. These templates should be fairly open and neutral with regard to the e-learning tools used. Possible options include the resources and activity functions provided by e-learning platforms such as Moodle, combined with separately available multimedia authoring tools (e.g. Telos Language Partner, Hot Potatoes) or Web 2.0 environments for communication and collaboration (e.g. Wikispaces, Skype).

(2) Deployment of BACKBONE-based e-learning templates as a methodological basis for defining and carrying out language learning studies that focus on issues of learner autonomy and guidance, collaboration and proximal development, or authentication and ’linguistic ownership’ in CLIL.

