This function turns a corpus of texts into a quanteda tokens object of sentences.
Arguments
- corpus
A quanteda corpus object, typically the output of the create_corpus() function or the output of contentmask().
- model
The spaCy model used for sentence segmentation. The default is "en_core_web_sm".
Details
The function first splits each text into paragraphs at newline characters and then uses spaCy to split each paragraph into sentences. It accepts either a plain-text corpus or the output of contentmask(). This step is required to prepare the data for lambdaG().
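The first step described above, splitting each text into paragraphs at newlines, can be illustrated in base R. This is a minimal sketch, not the package's implementation. `split_paragraphs()` is a hypothetical helper introduced only for illustration. It assumes that empty or whitespace-only lines between paragraphs should be dropped. The spaCy sentence-segmentation step is not reproduced here because it requires a Python spaCy installation.

```r
# Hypothetical helper (not part of the package): split each text into
# paragraphs at newline characters, then drop empty/whitespace-only pieces.
split_paragraphs <- function(texts) {
  # strsplit() returns a list with one character vector of pieces per text
  pieces <- strsplit(texts, "\n", fixed = TRUE)
  lapply(pieces, function(p) {
    # trimws() also strips a trailing "\r" left over from Windows line endings
    p <- trimws(p)
    # keep only non-empty paragraphs (removes the blank line between them)
    p[nzchar(p)]
  })
}

texts <- c(doc1 = "First paragraph. It has two sentences.\n\nSecond paragraph.",
           doc2 = "A single paragraph.")
paras <- split_paragraphs(texts)
```

Each paragraph would then be passed to a spaCy model for sentence segmentation. In R, one possible route is `spacyr::spacy_tokenize(x, what = "sentence")` after `spacyr::spacy_initialize(model = "en_core_web_sm")`. This is an assumption about how such a step could be done, not a description of how this function does it internally.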
