December 2018
Beginner to intermediate
684 pages
21h 9m
English
spaCy includes trained language models for English, German, Spanish, Portuguese, French, Italian, and Dutch, as well as a multi-language model for NER. Cross-language usage is straightforward since the API does not change.
We will illustrate the Spanish language model using a parallel corpus of TED Talk subtitles (see the GitHub repo for data source references). For this purpose, we instantiate both language models:
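import spacy

# The 'en'/'es' shortcut names work in spaCy 2.x; spaCy 3.x expects full
# package names such as 'en_core_web_sm' and 'es_core_news_sm'.
model = {}
for language in ['en', 'es']:
    model[language] = spacy.load(language)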
We then read a small, corresponding text sample for each language:
from pathlib import Path

text = {}
path = Path('../data/TED')
for language in ['en', 'es']:
    file_name = path / 'TED2013_sample.{}'.format(language)
    text[language] = file_name.read_text()
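The file-reading pattern above can be tried without the TED data. The following is a minimal sketch under stated assumptions: the helper name read_parallel_samples is hypothetical, the two sentences are placeholder text rather than corpus content, and a temporary directory stands in for ../data/TED. It also passes an explicit UTF-8 encoding, since read_text() otherwise falls back to the platform default, which may mangle accented Spanish characters.

```python
from pathlib import Path
import tempfile


def read_parallel_samples(directory, languages, stem="TED2013_sample"):
    """Return {language: text} for files named '<stem>.<language>'.

    Hypothetical helper for illustration; not part of the original code.
    """
    samples = {}
    for language in languages:
        file_name = Path(directory) / "{}.{}".format(stem, language)
        # Explicit UTF-8 avoids depending on the platform's default encoding.
        samples[language] = file_name.read_text(encoding="utf-8")
    return samples


# Placeholder sentences (NOT taken from the TED corpus), written to a
# temporary directory that stands in for '../data/TED'.
placeholder = {"en": "Thank you very much.", "es": "Muchísimas gracias."}

with tempfile.TemporaryDirectory() as tmp:
    for language, content in placeholder.items():
        target = Path(tmp) / "TED2013_sample.{}".format(language)
        target.write_text(content, encoding="utf-8")
    text = read_parallel_samples(tmp, ["en", "es"])

print(text["es"])
```

The resulting text dictionary has the same shape as in the original snippet, so each entry can be passed to the matching entry in model.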
Sentence ...