As part of the Language Researchers' Toolkit project, our researchers have created programmes to make it easier to analyse CHILDES corpora. 


This browser allows you to examine various statistics to identify which corpus to use.  

Bag-of-words Incremental Generation - Sentence Prediction Accuracy (BIGSPA)

This displays figures showing how well different statistics generate sentences in adult and child utterances across typologically different languages.


CHILDES corpora are provided in various formats, but it can be difficult to convert them into data.frames that can be used in R for analysis. Childes2csv is a programme that generates CSV files directly from CHILDES XML files. It can generate word-level or utterance-level corpora at different levels of grouping (e.g. dialects, languages etc.).
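To give a feel for what an XML-to-CSV conversion involves, here is a minimal sketch in Python. It is not the Childes2csv implementation. The sample markup is a simplified stand-in, assuming utterances appear as `u` elements with `who` and `uID` attributes and words as `w` elements; real CHILDES XML files use an XML namespace and carry far more annotation. The function names and CSV column names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Simplified, made-up sample; real CHILDES XML is namespaced and richer.
SAMPLE_XML = """<CHAT>
  <u who="MOT" uID="u0"><w>want</w><w>more</w><w>juice</w></u>
  <u who="CHI" uID="u1"><w>more</w><w>juice</w></u>
</CHAT>"""

FIELDS = ["utterance_id", "speaker", "utterance"]


def utterance_rows(xml_text):
    """Return one dict per utterance (utterance-level corpus)."""
    root = ET.fromstring(xml_text)
    rows = []
    for u in root.iter("u"):
        # Collect the text of every word element inside this utterance.
        words = [w.text for w in u.iter("w") if w.text]
        rows.append({
            "utterance_id": u.get("uID"),
            "speaker": u.get("who"),
            "utterance": " ".join(words),
        })
    return rows


def to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = utterance_rows(SAMPLE_XML)
csv_text = to_csv(rows)
print(csv_text)
```

A CSV file written this way can be loaded in R with read.csv to obtain a data.frame.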

Filter Combine

Another common task is to filter corpora for words or utterances that match particular rules, then recode some of the columns into numeric format, group the data by other columns and collect summary statistics.
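The filter, recode, group and summarise sequence can be illustrated with a short Python sketch. This is an assumed example, not the Filter Combine tool itself: the rows, the column names, the filtering rule (utterances containing the word "juice") and the statistic (mean utterance length in words) are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical utterance-level rows, e.g. as read from a CSV file.
rows = [
    {"speaker": "CHI", "utterance": "more juice"},
    {"speaker": "MOT", "utterance": "do you want more juice"},
    {"speaker": "CHI", "utterance": "I want more juice"},
    {"speaker": "CHI", "utterance": "look doggie"},
]

# 1. Filter: keep utterances that contain the word "juice".
filtered = [r for r in rows if "juice" in r["utterance"].split()]

# 2. Recode: turn the speaker code into a numeric column (1 = child, 0 = other)
#    and add the utterance length in words.
recoded = [
    {
        "is_child": 1 if r["speaker"] == "CHI" else 0,
        "n_words": len(r["utterance"].split()),
    }
    for r in filtered
]

# 3. Group: collect utterance lengths for each value of is_child.
groups = defaultdict(list)
for r in recoded:
    groups[r["is_child"]].append(r["n_words"])

# 4. Summarise: mean utterance length per group.
mean_length = {key: mean(values) for key, values in groups.items()}
print(mean_length)
```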


This allows you to examine 1- to 4-grams in various corpora. It also displays Zipfian log-frequency log-rank curves.
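For readers unfamiliar with these quantities, the following Python sketch counts 1- to 4-grams and computes the points of a log-frequency log-rank curve. It is only an illustration under stated assumptions: the utterances are made up, n-grams are counted within each utterance (the tool may treat boundaries differently), and the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams in a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def log_rank_log_freq(counter):
    """Return (log rank, log frequency) pairs, most frequent item first."""
    ranked = counter.most_common()
    return [
        (math.log(rank), math.log(freq))
        for rank, (_, freq) in enumerate(ranked, start=1)
    ]


# Made-up utterances standing in for a corpus.
utterances = ["more juice", "want more juice", "more juice please"]

# One counter per n-gram order, from unigrams to 4-grams.
counts = {n: Counter() for n in range(1, 5)}
for utterance in utterances:
    tokens = utterance.split()
    for n in range(1, 5):
        counts[n].update(ngrams(tokens, n))

# Points for the bigram curve; a Zipfian distribution would fall
# roughly on a straight line when these are plotted.
bigram_curve = log_rank_log_freq(counts[2])
print(counts[2].most_common(1), bigram_curve[0])
```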

Other toolkit links

Information about other automatic tools is available on the toolkit page.


To cite these tools, please use this reference.

Chang, F. (2017) The LuCiD language researcher's toolkit [Computer software]. Retrieved from