Databases Download

Each file can be downloaded individually in .txt format below (right-click, then "Save as..."). Zip archives containing all files are also available in .txt, Excel, and DBase formats (see the bottom of the page).

Permission is granted to use these databases for non-commercial purposes.
These databases may not be incorporated into any product which is sold without prior permission from the authors.

Deep linking to this download page or to the database files is discouraged, because updated files and related information are announced only on the web site's entry page. For the same reason, the databases should not be stored on other servers for public use (this does not apply to private use).


- Lexical Databases by Educational Level -

Grade 1 (CP)    Grade 2 (CE1)    Grades 3 to 5 (CE2-CM2)    All Grades (CP-CM2)
manu1           manu2            manu35                     manuAll

Each file includes all lexical entries with a frequency of occurrence of at least 1 at the educational level considered (F frequency from Manulex). The lexical entries are characterized by both level-independent and level-dependent variables. Level-independent variables (e.g., word length, number of syllables) have the same values at every level, because they do not depend on the word corpus specific to each level. Level-dependent variables are computed from the word corpus of the level considered. For example, syllable frequency and grapheme-phoneme consistency depend on the reading lexicon of each level. A full description of the databases is provided in the manual.
(see Field description)
 
- Orthographic and Phonographic Neighborhoods -

                          Grade 1 (CP)    Grade 2 (CE1)    Grades 3 to 5 (CE2-CM2)    All Grades (CP-CM2)
Orthographic Neighbors    no1             no2              no35                       noAll
Phonographic Neighbors    nop1            nop2             nop35                      nopAll

These files list the orthographic neighbors (filenames starting with "no") and the phonographic neighbors (filenames starting with "nop") of each lexical entry in the word corpus. Neighborhoods are determined separately for each level because they depend on the words known at that level. See the manual for details.
(see Field description)
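
To illustrate why neighborhoods differ from one level to the next, here is a minimal sketch. It assumes the classical substitution definition of an orthographic neighbor (same length, exactly one letter changed); the definition actually used for these files is given in the manual and may differ. The function name and the word lists are hypothetical and are not taken from the database files.

```python
def orthographic_neighbors(word, lexicon):
    """Return the words in `lexicon` that differ from `word` by one letter.

    Assumption: a neighbor has the same length and exactly one substituted
    letter. The manual should be consulted for the definition actually used.
    """
    return sorted(
        other
        for other in lexicon
        if len(other) == len(word)
        and sum(a != b for a, b in zip(word, other)) == 1
    )


# Invented word lists standing in for a small and a large reading lexicon.
small_lexicon = {"main", "pain", "bain"}
large_lexicon = small_lexicon | {"gain", "sain", "mais"}

few = orthographic_neighbors("main", small_lexicon)
many = orthographic_neighbors("main", large_lexicon)
print(few)   # ['bain', 'pain']
print(many)  # ['bain', 'gain', 'mais', 'pain', 'sain']
```

The same word gains neighbors as the lexicon grows, which is why a separate file is provided for each level.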

 
- Homographic Heterophones and Heterographic Homophones -

                            Grade 1 (CP)    Grade 2 (CE1)    Grades 3 to 5 (CE2-CM2)    All Grades (CP-CM2)
Homographic Heterophones    hg1             hg2              hg35                       hgAll
Heterographic Homophones    hp1             hp2              hp35                       hpAll

These files list the homographic heterophones (entries with identical spellings but different pronunciations) and the heterographic homophones (entries with identical pronunciations but different spellings) of each lexical entry in the word corpus. They are determined separately for each level because they depend on the words known at that level. See the manual for details.
(see Field description)
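
The two definitions above can be sketched as a grouping operation over (spelling, pronunciation) pairs. This is only an illustration: the function name is hypothetical, and the phonological codes in the sample are invented rather than taken from the phonetic alphabet used in the database.

```python
from collections import defaultdict


def group_variants(entries, key_index):
    """Group (spelling, pronunciation) pairs by one element of the pair.

    key_index=0 groups by spelling and keeps spellings that have several
    pronunciations (homographic heterophones); key_index=1 groups by
    pronunciation and keeps pronunciations that have several spellings
    (heterographic homophones).
    """
    groups = defaultdict(set)
    for entry in entries:
        groups[entry[key_index]].add(entry[1 - key_index])
    return {key: sorted(values) for key, values in groups.items() if len(values) > 1}


# Invented sample; the phonological codes are placeholders.
entries = [
    ("fils", "fis"),
    ("fils", "fil"),
    ("ver", "vER"),
    ("vert", "vER"),
    ("verre", "vER"),
    ("chat", "Sa"),
]

heterophones = group_variants(entries, 0)
homophones = group_variants(entries, 1)
print(heterophones)  # {'fils': ['fil', 'fis']}
print(homophones)    # {'vER': ['ver', 'verre', 'vert']}
```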

 
- Sublexical Tables -

Each file groups the data for all levels: Grade 1 (CP), Grade 2 (CE1), Grades 3 to 5 (CE2-CM2), All Grades (CP-CM2)

Letter (by position) frequency                 letter
Phoneme (by position) frequency                phonem
Bigram (by position) frequency                 bigram
Trigram (by position) frequency                trigram
Biphone (by position) frequency                biphone
Syllable (by position) frequency               syllable
Grapheme-to-Phoneme Frequency/Consistency      GP
Phoneme-to-Grapheme Frequency/Consistency      PG

Letter, phoneme, bigram, trigram, biphone, and syllable frequencies are computed separately for three serial positions: initial, middle, and final. The frequency and consistency of grapheme-phoneme and phoneme-grapheme associations are computed for the same three positions. Each file groups the data for all levels. See the manual for further details.
(see Field description)
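
As a rough illustration of counting by serial position, the sketch below tallies letters as initial, middle, or final. It rests on several assumptions that the manual may contradict: each word counts once (no weighting by word frequency), and a one-letter word is counted as initial only. The function name and the sample words are hypothetical.

```python
from collections import Counter


def letter_position_counts(words):
    """Count each letter separately for initial, middle, and final position.

    Assumptions: every word counts once, and the only letter of a
    one-letter word is treated as initial.
    """
    counts = Counter()
    for word in words:
        last = len(word) - 1
        for i, letter in enumerate(word):
            if i == 0:
                position = "initial"
            elif i == last:
                position = "final"
            else:
                position = "middle"
            counts[(letter, position)] += 1
    return counts


counts = letter_position_counts(["chat", "cha", "a"])
print(counts[("c", "initial")])  # 2
print(counts[("a", "middle")])   # 1
print(counts[("a", "final")])    # 1
```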

 

Click here to download all files at once in .txt format    

All files in .txt format (tab-delimited) can be opened with spreadsheet programs (e.g., Open Office) or with text editors such as ConText (PC), BBEdit_Lite, or TextWrangler (Mac). Note that the '.' and '-' characters are used as separators for word segmentation in several fields of the databases, so do not select these characters as column separators when importing the files into a spreadsheet. For download, the .txt files are packaged in .zip archives. Use an unzip utility to extract them (e.g., 7-zip for PC, Zip Tools for Mac).
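
For readers who prefer a script to a spreadsheet, the following minimal sketch parses tab-delimited text with Python's standard `csv` module, splitting on tabs only so that '.' and '-' inside fields are left intact. The column names and sample rows are invented for illustration (the real ones are listed in the field description), and the assumption that the first line holds column names should be checked against the actual files. The character encoding of the files is not stated on this page, so it is left as a parameter in `read_file`.

```python
import csv
import io


def parse_tab_delimited(text):
    """Parse tab-delimited text whose first line holds the column names."""
    reader = csv.DictReader(
        io.StringIO(text),
        delimiter="\t",          # split on tabs only, never on '.' or '-'
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
    )
    return list(reader)


def read_file(path, encoding):
    """Read one extracted .txt file; the encoding must be supplied by the caller."""
    with open(path, newline="", encoding=encoding) as handle:
        return parse_tab_delimited(handle.read())


# Hypothetical sample with invented column names and values.
sample = "word\tsegmentation\nmaison\tmai.son\nchat\tchat\n"
rows = parse_tab_delimited(sample)
print(rows[0]["segmentation"])  # mai.son
```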

 

Other formats

 

All files in Excel format (.xls)    

All files in DBase format (.dbf)     -see here how to exploit the dbf format-

All files in Access format -forthcoming-