This function uses quanteda::kwic() to return a concordance for a search pattern. It takes up to three corpora (the third is optional) and a search pattern, and returns a data frame in which each hit is labelled for authorship.


  token.type = "word",
  window = 5,
  case_insensitive = TRUE


A quanteda corpus object, such as the output of create_corpus().

A quanteda corpus object, such as the output of create_corpus().

A quanteda corpus object, such as the output of create_corpus(). This is optional.

A string giving the search pattern. It can be any sequence of characters, and * can be used as a wildcard.


Either "word" (the default), which searches over word and punctuation-mark tokens, or "character", which searches over sequences of individual characters instead.


The number of context items to be displayed around the keyword (a quanteda::kwic() parameter).


Logical; if TRUE, ignore case (a quanteda::kwic() parameter).
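To illustrate how the pattern, window and case options behave in the underlying quanteda::kwic() call, here is a minimal sketch that calls quanteda directly on two made-up sentences. The toy texts and object names are assumptions for illustration only, and the sketch is not meant to reproduce how concordance() itself prepares its input.

```r
library(quanteda)

# Two made-up documents, used only for illustration
txt <- c(doc1 = "Let me know if you want me to help.",
         doc2 = "He wants to go home.")

# Word-level tokenisation (words and punctuation marks become tokens)
toks <- tokens(corpus(txt), what = "word")

# phrase() marks the pattern as a multi-word sequence;
# under glob matching, "*" stands for any single token here
hits <- kwic(toks,
             pattern = phrase("want * to"),
             window = 5,
             valuetype = "glob",
             case_insensitive = TRUE)

# Convert the kwic object to a plain data frame
hits_df <- as.data.frame(hits)
hits_df
```

Note that quanteda calls the matched column `keyword`, whereas the output shown in the examples on this page uses `node`.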


The function returns a data frame containing the concordances for the search pattern.
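One way to picture the authorship labelling is to run quanteda::kwic() on each corpus separately and add a label column before binding the results. The sketch below does that with a hypothetical helper, `label_hits()`, which is not part of any package; the toy corpora and the labels "Q" and "K" are assumptions chosen for illustration, and the actual implementation of concordance() may differ.

```r
library(quanteda)

# Hypothetical helper (not part of any package): concordance one corpus
# and tag every hit with an authorship label
label_hits <- function(x, pattern, label, window = 5) {
  hits <- as.data.frame(
    kwic(tokens(x), pattern = phrase(pattern), window = window)
  )
  hits$authorship <- rep(label, nrow(hits))
  hits
}

# Made-up corpora standing in for the questioned and known data
q <- corpus(c(q1 = "She wants to leave early today."))
k <- corpus(c(k1 = "He wants to stay.", k2 = "Nobody asked me."))

# Bind the labelled hits into a single data frame
out <- rbind(label_hits(q, "wants to", "Q"),
             label_hits(k, "wants to", "K"))
out[, c("docname", "pre", "keyword", "post", "authorship")]
```

Keeping the label as an ordinary column means the combined data frame can be filtered or tabulated by authorship with standard data frame tools.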


concordance(enron.sample[1], enron.sample[2], enron.sample[3:49], "wants to", token.type = "word")
#>                 docname from  to                 pre     node           post
#> 1 known [Kh Mail_1].txt    5   6             N N N N wants to be N when he V
#> 2 known [Ld Mail_5].txt  160 161          D S D . he wants to   V to V the N
#> 3 known [Lb Mail_1].txt  573 574 our N . anyone that wants to    V us is J .
#>   authorship
#> 1          Q
#> 2  Reference
#> 3  Reference

# using wildcards
concordance(enron.sample[1], enron.sample[2], enron.sample[3:49], "want * to", token.type = "word")
#>                 docname from  to                pre       node
#> 1 known [Kw Mail_2].txt  672 674 let me know if you want me to
#> 2 known [Lc Mail_5].txt  175 177       s N . if you want me to
#> 3 known [Ml Mail_5].txt  242 244  need . you do n't want me to
#>                    post authorship
#> 1       V on other N in  Reference
#> 2        , i can put on  Reference
#> 3 come work for you too  Reference

# searching character sequences with wildcards
concordance(enron.sample[1], enron.sample[2], enron.sample[3:49], "help*", token.type = "character")
#>                    docname from   to   pre  node  post authorship
#> 1    known [Kh Mail_1].txt  703  707 need  help  V it           Q
#> 2    known [Kh Mail_1].txt 2014 2018 want  help  V it           Q
#> 3    known [Kh Mail_3].txt 1797 1801  N ,  helpe d the          K
#> 4    known [Kh Mail_4].txt   52   56  P P  helpe d the  Reference
#> 5  unknown [Kw Mail_3].txt 2756 2760 ding  help  in th  Reference
#> 6    known [Kw Mail_5].txt   31   35 your  help  and N  Reference
#> 7    known [Kw Mail_5].txt 1463 1467 need  help  in do  Reference
#> 8    known [Lc Mail_2].txt 1600 1604 some  help  . why  Reference
#> 9    known [Lc Mail_5].txt 1163 1167 d of  help  and B  Reference
#> 10   known [Ld Mail_2].txt  285  289 ally  help  us ou  Reference
#> 11   known [Lt Mail_1].txt  884  888 r be  helpi ng to  Reference
#> 12   known [Lt Mail_1].txt  919  923 , or  help  V a N  Reference
#> 13   known [Lt Mail_3].txt  910  914 your  help  as a   Reference
#> 14   known [Lt Mail_4].txt 1611 1615 ttle  help  from   Reference
#> 15 unknown [Lk Mail_4].txt 1243 1247 N to  help  V N f  Reference
#> 16 unknown [Lk Mail_4].txt 1272 1276 N to  help  V the  Reference
#> 17   known [Lk Mail_1].txt 1512 1516 ease  help  him w  Reference
#> 18   known [Lk Mail_2].txt  387  391 ight  help  . ple  Reference
#> 19   known [Lk Mail_3].txt  994  998 ease  help  him a  Reference
#> 20 unknown [Lb Mail_3].txt 2279 2283 N to  help  our N  Reference
#> 21 unknown [Lb Mail_3].txt 2405 2409  and  help  the N  Reference
#> 22 unknown [Lb Mail_3].txt 2479 2483 g to  help  out w  Reference
#> 23 unknown [Lb Mail_3].txt 2617 2621  and  help  them   Reference
#> 24   known [Lb Mail_1].txt 1363 1367 d to  help  you i  Reference
#> 25   known [Lb Mail_2].txt 1652 1656  and  help  in V   Reference
#> 26   known [Lb Mail_2].txt 1676 1680  and  helpi ng ea  Reference
#> 27   known [Lb Mail_4].txt 1038 1042 e to  help  P N a  Reference
#> 28   known [Lb Mail_5].txt 1066 1070 this  helps  V ou  Reference
#> 29   known [La Mail_2].txt 2086 2090 ould  help  V the  Reference
#> 30   known [La Mail_2].txt 2494 2498 ould  help  get t  Reference
#> 31   known [La Mail_4].txt 1908 1912 also  help  V N .  Reference
#> 32   known [La Mail_5].txt 2424 2428 will  help  the N  Reference
#> 33   known [Mf Mail_2].txt  805  809 your  help  . thi  Reference
#> 34   known [Mf Mail_2].txt 2097 2101  any  help  you c  Reference
#> 35   known [Mf Mail_2].txt 2458 2462  can  help  with   Reference
#> 36   known [Ml Mail_1].txt  596  600  you  help  and V  Reference
#> 37   known [Ml Mail_1].txt 1223 1227 your  help  and l  Reference
#> 38   known [Ml Mail_1].txt 2492 2496  and  help  save   Reference
#> 39   known [Ml Mail_1].txt 2598 2602 your  help  and V  Reference
#> 40   known [Ml Mail_2].txt   12   16 your  help  and V  Reference
#> 41   known [Ml Mail_4].txt  622  626 your  help  in V   Reference
#> 42   known [Ml Mail_4].txt 1296 1300 ou N  helpe d us   Reference
#> 43   known [Ml Mail_4].txt 1589 1593 your  help  so th  Reference
#> 44   known [Ml Mail_4].txt 1962 1966 your  help  and V  Reference
#> 45   known [Ml Mail_5].txt  475  479 an B  help  you V  Reference