As part of the Language Researchers' Toolkit project, our researchers have created programmes to make it easier to analyse CHILDES corpora.
The CHILDES Browser lets you examine various statistics for each corpus, to help you identify which corpus to use.
- Go to the CHILDES Browser.
Bag-of-words Incremental Generation - Sentence Prediction Accuracy (BIGSPA)
This displays figures showing how well different statistics perform at generating sentences from adult and child utterances in typologically different languages.
- Go to BIGSPA
CHILDES corpora are provided in various formats, but it can be difficult to convert them into data frames that can be used in R for various analyses. Childes2CSV is a programme that generates CSV files directly from CHILDES XML files. It can generate word-level or utterance-level corpora, grouped at different levels (e.g. dialects, languages, etc.).
- Go to Childes2CSV
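The general idea of turning transcript XML into an utterance-level CSV table can be sketched as follows. This is only an illustration using the Python standard library, not the Childes2CSV implementation: the sample XML is a simplified, hypothetical transcript (real CHILDES XML is namespaced and much richer), and the names `SAMPLE_XML`, `utterance_rows` and `to_csv` are invented for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Simplified, hypothetical transcript. Real CHILDES XML files are namespaced
# and contain far more annotation than this.
SAMPLE_XML = """<CHAT Corpus="Demo" Lang="eng">
  <u who="CHI" uID="u0"><w>more</w><w>juice</w></u>
  <u who="MOT" uID="u1"><w>you</w><w>want</w><w>more</w><w>juice</w></u>
</CHAT>"""


def utterance_rows(xml_text):
    """Return one dict per utterance: corpus, language, speaker, id, text."""
    root = ET.fromstring(xml_text)
    rows = []
    for u in root.iter("u"):
        # Collect the text of every word element inside this utterance.
        words = [w.text for w in u.iter("w") if w.text]
        rows.append({
            "corpus": root.get("Corpus"),
            "language": root.get("Lang"),
            "speaker": u.get("who"),
            "utterance_id": u.get("uID"),
            "utterance": " ".join(words),
        })
    return rows


def to_csv(rows):
    """Serialise the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(utterance_rows(SAMPLE_XML))
print(csv_text)
```

A word-level table could be produced the same way by emitting one row per word element instead of one row per utterance.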
Another common task is to filter corpora for words or utterances that match particular rules, then recode some of the columns into numeric format, group the data along other columns and collect some statistics.
- Go to FilterCombine
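The filter, recode, group and summarise workflow described above can be sketched in a few lines. This is a minimal illustration in plain Python, not FilterCombine's own code: the rows, the column names (`speaker`, `utterance`, `is_child`, `n_words`) and the function `filter_recode_group` are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical utterance-level rows, as might be read from a CSV file.
ROWS = [
    {"speaker": "CHI", "utterance": "more juice"},
    {"speaker": "CHI", "utterance": "want more"},
    {"speaker": "MOT", "utterance": "you want more juice"},
    {"speaker": "MOT", "utterance": "here is the cup"},
]


def filter_recode_group(rows, target_word):
    """Filter by a word, add numeric columns, then summarise per speaker."""
    # 1. Filter: keep utterances that contain the target word as a token.
    kept = [r for r in rows if target_word in r["utterance"].split()]

    # 2. Recode: add numeric columns derived from the existing ones.
    recoded = [
        {
            **r,
            "is_child": 1 if r["speaker"] == "CHI" else 0,
            "n_words": len(r["utterance"].split()),
        }
        for r in kept
    ]

    # 3. Group: collect utterance lengths by speaker.
    lengths = defaultdict(list)
    for r in recoded:
        lengths[r["speaker"]].append(r["n_words"])

    # 4. Summarise: count and mean utterance length for each group.
    return {
        speaker: {"n": len(values), "mean_length": mean(values)}
        for speaker, values in lengths.items()
    }


summary = filter_recode_group(ROWS, "more")
print(summary)
```

With the sample rows, three utterances contain "more", so the summary reports two child utterances and one adult utterance.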
This allows you to examine 1- to 4-grams in various corpora. It also displays Zipfian log-frequency log-rank curves.
- Go to Ngrams
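As a rough illustration of what such a display is built from, the sketch below counts n-grams in a handful of utterances and computes the points of a log-rank versus log-frequency curve. It is not the Ngrams tool's code: the sample utterances and the function names `ngram_counts` and `log_rank_log_frequency` are hypothetical, and whitespace tokenisation is an assumption.

```python
import math
from collections import Counter

# Hypothetical sample utterances, tokenised on whitespace.
UTTERANCES = ["you want more juice", "more juice", "you want the cup"]


def ngram_counts(utterances, n):
    """Count every n-gram (as a tuple of tokens) within each utterance."""
    counts = Counter()
    for utterance in utterances:
        tokens = utterance.split()
        # Slide a window of length n; n-grams never cross utterance boundaries.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def log_rank_log_frequency(counts):
    """Return (log10 rank, log10 frequency) pairs, most frequent item first."""
    ranked = counts.most_common()
    return [
        (math.log10(rank), math.log10(freq))
        for rank, (_, freq) in enumerate(ranked, start=1)
    ]


bigrams = ngram_counts(UTTERANCES, 2)
curve = log_rank_log_frequency(ngram_counts(UTTERANCES, 1))
print(bigrams.most_common(3))
print(curve)
```

On a large corpus, plotting the second value of each pair against the first gives the roughly straight, downward-sloping line associated with Zipf's law.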
Other toolkit links
Information about other automatic tools is available on the toolkit page.
- Go to Toolkit
To cite these tools, please use this reference:
Chang, F. (2017). The LuCiD language researcher's toolkit [Computer software]. Retrieved from http://www.lucid.ac.uk/resources/for-researchers/toolkit/