At Fri, 24 Jun 2011 08:38:25 +0100, Peter Alcibiades wrote:
Does anyone know of any packages that do word recurrence intervals, something called the trigram Markov model, etc., to test for same authorship of two passages? Obviously a Linux package would be best, but Windows would be OK. Whatever works.
Are you familiar with R? Quite a few of my colleagues (especially psychologists) use it. It may be worth spending a bit of time learning it if you're interested in this sort of thing. Here are a few pointers to its NLP packages:
http://cran.ma.imperial.ac.uk/web/views/NaturalLanguageProcessing.html
The Python Natural Language Toolkit (NLTK) may also be worth considering: http://www.nltk.org/. The accompanying book is available online at http://www.nltk.org/book.
Techniques used for language detection are often quite similar to those used for authorship attribution, e.g.:
http://misja.posterous.com/language-detection-with-python-nltk
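To give a rough idea of what that kind of approach looks like, here is a minimal sketch in plain Python (standard library only). It is my own illustration, not code from the post above or from NLTK: it builds a character-trigram frequency profile for each passage and compares the two profiles with cosine similarity. The function names `trigram_profile` and `cosine_similarity` are made up for this example, and this is a much cruder measure than a proper trigram Markov model, so treat it as a starting point only.

```python
from collections import Counter
from math import sqrt


def trigram_profile(text):
    """Count overlapping character trigrams in a passage.

    The text is lowercased and runs of whitespace are collapsed to a
    single space first (an assumption of this sketch, not a standard).
    """
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))


def cosine_similarity(p, q):
    """Cosine similarity between two trigram count profiles (0.0 to 1.0)."""
    # Counter returns 0 for trigrams that are missing from q.
    dot = sum(count * q[gram] for gram, count in p.items())
    norm_p = sqrt(sum(c * c for c in p.values()))
    norm_q = sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # at least one passage is too short to have any trigrams
    return dot / (norm_p * norm_q)


if __name__ == "__main__":
    # Placeholder passages; substitute the two texts you want to compare.
    passage_a = "It was the best of times, it was the worst of times."
    passage_b = "It was the age of wisdom, it was the age of foolishness."
    score = cosine_similarity(trigram_profile(passage_a),
                              trigram_profile(passage_b))
    print(f"trigram profile similarity: {score:.3f}")
```

A score near 1 means the two passages use very similar character sequences, and a score near 0 means they share almost none. On its own the number says little about authorship: you would want reasonably long passages and a set of texts by known authors to compare against before reading anything into it.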
Sounds like a fun project.
Best,