Cyber-langagerie: 10/11/08

TIME MAGAZINE CORPUS (100 MILLION WORDS, 1923-2006)

Quotations

« This website allows you to quickly and easily search more than 100 million words of text of American English from 1923 to the present, as found in TIME magazine. You can see how words and phrases have increased and decreased in usage and see how words have changed meaning over time. »

Of note

Full-text access

User-friendly interface

Screenshot of the available options:

Choose the type of display

CHART: This option presents "bar charts" that indicate the overall frequency for all matching words or phrases in each section of the corpus. This is probably the best option for comparing between different genres (spoken, magazines, etc), or to compare time blocks since 1990. (Example of chart display)

LIST: Choose this option to see a listing of each individual word or string that matches the query. (More information on types of search strings).

COMPARE WORDS: This allows you to compare the collocates (nearby words) for two different words, such as small / little, or start / begin, which provides insight into the difference in meaning or use of these two words. (More information on word comparisons).
Many other options are available.
An excellent source to consult, given the standing of the magazine, which has always followed current events closely.

Corpus http://corpus.byu.edu/time/

EUROPEAN PARLIAMENT INTERPRETING CORPUS

Quotations from the site

« EPIC is an open, parallel, trilingual (Italian, English and Spanish) corpus of European Parliament speeches and their corresponding interpretations currently being compiled at SITLeC (University of Bologna).

...

In 2004 several European Parliament plenary sessions were recorded off the news channel EbS (Europe by Satellite). By selecting different audio channels, it was possible to record the original speakers and the interpreters working in the various booths (in our case, Italian, English and Spanish). All the material thus obtained is being digitised and edited by using dedicated software in order to create a multimedia archive. At the moment, video and audio files are not available on-line, but information on the content and the structure of the archive can be obtained by clicking on Multimedia Archive in the left hand-side bar.

...

The final step in the compilation of EPIC is the alignment of source texts and target texts in order to create parallel subcorpora (see Aligned Texts). Overall, EPIC is made up of three subcorpora of original texts (Org-It, Org-En and Org-Es) and 6 subcorpora of interpreted texts (indicated as Int followed by the language direction, e.g. En-It for English into Italian) covering all the combinations and directions of the three languages, as well as 6 aligned subcorpora of source and target texts (indicated as Org + Int). »
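As a rough illustration of the structure described in the quotation, the short Python sketch below enumerates the three original, six interpreted and six aligned subcorpora from the three languages. The function name `epic_subcorpora` and the exact label strings (for instance "Int-En-It" and "Org-En + Int-En-It") are assumptions made for this example; the site may write its labels differently.

```python
from itertools import permutations

# The three EPIC languages, abbreviated as in the quotation above.
LANGUAGES = ["It", "En", "Es"]


def epic_subcorpora(languages=LANGUAGES):
    """Enumerate subcorpus labels; the label format is an assumption."""
    # One subcorpus of original speeches per language.
    originals = [f"Org-{lang}" for lang in languages]
    # Every ordered (source, target) pair of distinct languages.
    directions = list(permutations(languages, 2))
    # One interpreted subcorpus per language direction.
    interpreted = [f"Int-{src}-{tgt}" for src, tgt in directions]
    # One aligned subcorpus pairing each source with its interpretation.
    aligned = [f"Org-{src} + Int-{src}-{tgt}" for src, tgt in directions]
    return {"original": originals, "interpreted": interpreted, "aligned": aligned}


if __name__ == "__main__":
    for kind, names in epic_subcorpora().items():
        print(kind, len(names), names)
```

Three languages give 3 × 2 = 6 ordered language pairs, which matches the six interpreted and six aligned subcorpora mentioned in the quotation.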

Related links

http://dev.sslmit.unibo.it/corpora/corpora.php

http://wacky.sslmit.unibo.it/doku.php?id=corpora

Iframe Eureka

Querying translation memories
Notes: The four windows can be queried separately from a single page. LinearB's default languages are English and French, but they can be changed, as can those of the other sites. Right-click the links and choose Open in new Tab or Open in new Window for greater convenience. This page works in IE, Firefox and Google Chrome.

Address

http://pages.globetrotter.net/mverge/eureka/4windows_page.htm
