The files are fetched on gtoahpa, gtweb, by this command:

  cat user_log.txt |grep 'False.* sma nob.*201[67]'|\
  cut -f1 > errsma.list

The freq files are made from the list files by this command:

  cat orig/errsme.list |sort|uniq -c|sort -nr|cut -c6-|\
  usme|grep '?'|cut -f1|unob|grep '?'|cut -f1|grep -v '[ Ã%¾¡‘]'|\
  grep '[a-zá]' > freq/errsme.freq

The list files are in orig, and contain the raw entries.
The freq files are in freq, and contain the words recognised by neither
the L1 nor the L2 analyser, without conversion errors.
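
The frequency-list step can be tried out on toy data with the sketch below. It is only an illustration under stated assumptions: analyse_stub is a hypothetical stand-in for the usme and unob analysers (it marks unknown words with "+?" in a second, tab-separated column), the word list is made up, and toy.list and toy.freq are scratch files in the current directory. The count column from uniq -c is stripped with sed rather than cut -c6-, since the width of that column can differ between uniq implementations. The conversion-error filter is left out because the toy data is plain ASCII.

```shell
#!/bin/sh
# Hypothetical stand-in for an analyser such as usme or unob.
# $1 is a space-separated list of "known" words; every input word is
# printed as word<TAB>analysis, with "+?" marking unknown words.
analyse_stub() {
    awk -v known="$1" '
        BEGIN { n = split(known, k, " "); for (i = 1; i <= n; i++) ok[k[i]] = 1 }
        { if ($0 in ok) print $0 "\t" $0 "+N"; else print $0 "\t" $0 "+?" }'
}

# Made-up raw entries, one per line.
printf '%s\n' guolli hus hus xyzzy xyzzy xyzzy qwert > toy.list

# Count, sort by frequency, drop the count, then keep only the words
# that both stand-in analysers mark as unknown.
sort toy.list | uniq -c | sort -nr | sed 's/^ *[0-9]* //' |
    analyse_stub "guolli" | grep '?' | cut -f1 |
    analyse_stub "hus" | grep '?' | cut -f1 |
    grep '[a-z]' > toy.freq

cat toy.freq
```

With this toy input, toy.freq holds xyzzy and qwert, most frequent first; guolli and hus are dropped because one of the two stand-in analysers knows them.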