Text corpora prepared during the project:
For the authorship attribution problem:
- STENOGRAMOS_INDV contains Lithuanian parliamentary transcripts (download)
- FORUMAS_INDV contains Internet forum texts (download)
- GROŽINĖ_INDV contains fiction texts (download)
- INT_KOMENTARAI_INDV contains Internet comments (download)
- INT_KOMENTARAI_INDV2 contains Internet comments (expanded) (download)
For the author profiling problem:
- AMŽIUS_PROF contains Lithuanian parliamentary transcripts for author profiling by age (download)
- GROŽ_AMŽIUS_PROF contains fiction texts for author profiling by age (download)
- LYTIS_PROF contains Lithuanian parliamentary transcripts for author profiling by gender (download)
- GROŽ_LYTIS_PROF contains fiction texts for author profiling by gender (download)
- POLITINĖS_PAŽIŪROS_PROF contains Lithuanian parliamentary transcripts for author profiling by political views (download)
The meta-information about the corpora (included in the downloads) is currently available only in Lithuanian; if you have any questions, please do not hesitate to contact us.
You are welcome to use the corpora in your own research as well!