The term "linguistique de corpus", a literal translation of corpus linguistics, remains a fuzzy notion in French.

In the English-speaking world, the definition of corpus linguistics is relatively clear. It rests on the following two elements:

It is therefore an empirical approach, one that seeks to identify regularities from observations made on the corpus.

Corpus or corpus?

A corpus in this approach is therefore a structured set of texts, assembled according to explicit linguistic criteria and held in electronic form, with the aim of being as representative as possible of a language, a language variety, or a subset of one. From this perspective, the "corpus" is therefore not a set of examples extracted from a corpus.


Conferences related to corpus linguistics


3es Journées de la Linguistique de Corpus

The 3èmes Journées de la Linguistique de Corpus will take place in Lorient on 11, 12 and 13 September 2003. They are organised by the Centre de Recherche en Littérature, Linguistique et Civilisation (CRELLIC - UBS Lorient) in collaboration with Valoria (UBS Vannes), and are also supported by the LEA and LLCE departments of the Université de Bretagne Sud.

These 3èmes Journées de Linguistique de Corpus aim to promote the development of corpus linguistics in France. They bring together researchers from diverse backgrounds who share an interest in using computers to analyse linguistic phenomena. Expected contributions may address, without limitation:


Corpus Linguistics: The state of the art twenty-five years

The aim for ICAME-25 is to take stock of corpus linguistics after twenty-five years of intense, fruitful activity. Hence papers on virtually all the fields touched on by corpus linguistics are welcome. A special encouragement will go to anyone focussing on the compilation and use of specialised corpora, be they monolingual or multilingual; indeed, over the last few years LSP corpora have gained more and more ground as valuable means of helping scholars identify, describe and discuss the typical features of many specialised fields of knowledge.


Corpus Use and Learning to Translate

CULT 2004 is the follow-up to CULT 2000, organised by the Scuola Superiore di Lingue Moderne per Interpreti e Traduttori of the Università di Bologna in Forlì in November 2000.

The aim of the conference is to bring together practitioners and theorists sharing an interest in the design and use of corpora in translation-related areas, with special reference to translator and interpreter training. Contributions, in the form of papers, demonstrations and posters, are sought on, but not restricted to, the following topics:


6th Teaching and Language Corpora Conference

Following the highly successful conferences in Lancaster (TaLC 1994 and TaLC 1996), Oxford (TaLC 1998), Graz (TaLC 2000) and Bertinoro (TaLC 2002), the sixth international TaLC conference will bring together practitioners and theorists with a common interest in the use of corpus tools for such purposes as:

The conference will be held in Granada (Spain), beginning at 9 a.m. on July 7 and ending at 7 p.m. on July 9.


International Conference on Language Resources and Evaluation

In the Information Society, the pervasive character of Human Language Technologies (HLT) and their relevance to practically all fields of Information Society Technologies (IST) have been widely recognised.

The term language resources (LRs) refers to sets of language data and descriptions in machine-readable form, used in many types of areas/components/systems/applications:

creation and evaluation of natural language, speech and multimodal algorithms and systems,
software localisation and language services,
language-enabled information and communication services,
knowledge management,
e-commerce, e-publishing, e-learning, e-government,
cultural heritage,
linguistic studies,

Examples of LRs are written or spoken corpora and lexica, which may be annotated or not, multimodal resources, grammars, terminology or domain-specific databases and dictionaries, ontologies, multimedia databases, etc. LRs also cover basic software tools for the acquisition, preparation, collection, management, customisation and use of the above-mentioned examples.

The aim of this conference is to provide an overview of the state-of-the-art, discuss problems and opportunities, exchange information regarding LRs, their applications, ongoing and planned activities, industrial uses and needs, requirements coming from the new e-society, both with respect to policy issues and to technological and organisational ones. LREC will also elaborate on evaluation methodologies and tools, explore the different trends and promote initiatives for international collaboration in the areas mentioned above.

Updated November 2003