December 2018
Beginner to intermediate
684 pages
21h 9m
English
We will illustrate sentence detection by calling the nlp object on the first of the articles:
doc = nlp(bbc_articles[0])
type(doc)
spacy.tokens.doc.Doc
spaCy computes sentence boundaries from the syntactic parse tree, so punctuation and capitalization play an important but not decisive role. As a result, boundaries tend to coincide with clause boundaries, even in poorly punctuated text.
We can access parsed sentences using the .sents attribute:
sentences = [s for s in doc.sents]
sentences[:3]
[Voting is under way for the annual Bloggies which recognize the best web blogs - online spaces where people publish their thoughts - of the year. ,
 Nominations were announced on Sunday, but traffic to the official site was so ...
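For readers who want to try sentence iteration without downloading a trained model, the following is a minimal sketch that assumes spaCy v3 is installed. It swaps in the rule-based sentencizer component, which splits on sentence-final punctuation, in place of the parse-based boundaries described above, so its results can differ on poorly punctuated text. The sample text and the names text and sentence_texts are illustrative and not part of the original example.

```python
import spacy

# Assumption: spaCy v3. A blank English pipeline contains only the tokenizer,
# so no trained model needs to be downloaded.
nlp = spacy.blank("en")

# Rule-based sentence splitter that relies on punctuation. This is NOT the
# dependency-parse approach described in the text; it only makes the sketch
# self-contained.
nlp.add_pipe("sentencizer")

# Illustrative sample text (hypothetical, shortened).
text = (
    "Voting is under way for the annual Bloggies. "
    "Nominations were announced on Sunday."
)
doc = nlp(text)

# doc.sents is a generator of Span objects; materialize it to index or slice.
sentences = list(doc.sents)
sentence_texts = [s.text for s in sentences]

for s in sentences:
    # start and end are token offsets into the parent Doc.
    print(s.start, s.end, s.text)
```

With a trained pipeline that includes a parser, the same .sents loop applies unchanged; only the way the boundaries are computed differs.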