!!!Useful commands for linguists

!!Words in an analysed text

!How many words, not numerals

{{{
cat | grep '"<' | grep '[a-zA-Z]' | wc -l
}}}

!How many unique lemmas, not numerals

{{{
cat | grep -v '"<' | cut -d '"' -f2 | grep '[a-zA-Z]' | sort -u | wc -l
}}}

!!Compounds in an analysed text

The syntactic analysis is important for getting the correct lemma through disambiguation. Many compounds are lexicalised in our analyser, so we have to analyse the lemmas once more to find the compounds. \\
For all commands: use 'sort -u' instead of 'uniq' to get the figures for unique items.

!How many compound nouns

{{{
cat | grep -v '"<' | cut -d '"' -f2 | grep '[a-záčžA-ZÁČŽ]' | usme | egrep 'Cmp.*N\+' | cut -f1 | uniq
}}}

!How many compounds with a noun in the first part

We don't want Der/NomAct counted as N:

{{{
cat | grep -v '"<' | cut -d '"' -f2 | grep '[a-záčžA-ZÁČŽ]' | usme | egrep 'N\+.*Cmp' | grep -v 'NomAct.*Cmp' | cut -f1 | uniq
}}}

!How many compounds with a verb in the first part

We don't want Der/NomAg counted as V:

{{{
cat | grep -v '"<' | cut -d '"' -f2 | grep '[a-záčžA-ZÁČŽ]' | usme | egrep 'V\+.*Cmp' | grep -v 'NomAg.*Cmp' | cut -f1 | uniq
}}}

!How many compounds with an adjective in the first part

We don't want +N counted as A:

{{{
cat | grep -v '"<' | cut -d '"' -f2 | grep '[a-záčžA-ZÁČŽ]' | usme | sed 's/^$/¢/' | tr "\n" "€" | tr "¢" "\n" | egrep 'A\+[A-Za-z\+]*Cmp' | egrep -v 'N\+[A-Za-z\+]*Cmp' | cut -f1 | tr -d "€" | uniq
}}}

!How many compounds with an adverb in the first part

{{{
cat | grep -v '"<' | cut -d '"' -f2 | grep '[a-záčžA-ZÁČŽ]' | usme | egrep 'Adv\+.*Cmp' | grep -v 'NomAct.*Cmp' | cut -f1 | uniq
}}}
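
The word and lemma counts from the first section can be tried out on a small sample before running them on a real file. The sketch below is only an illustration. The sample text and its tags are invented, and it assumes the analysed text has a wordform line of the form "<wordform>" followed by tab-indented readings that begin with a quoted lemma. The variable names (sample, words, lemmas) are hypothetical as well.

```shell
# Invented sample (an assumption, not real analyser output):
# four wordforms, one of them a numeral, with one reading each.
sample=$(printf '%b\n' \
  '"<viessu>"' \
  '\t"viessu" N Sg Nom' \
  '"<lea>"' \
  '\t"leat" V IV Ind Prs Sg3' \
  '"<2>"' \
  '\t"2" Num Sg Nom' \
  '"<viessu>"' \
  '\t"viessu" N Sg Nom')

# Words, not numerals: keep wordform lines, drop those without any letter.
# tr -d ' ' strips the padding some wc implementations add.
words=$(printf '%s\n' "$sample" | grep '"<' | grep '[a-zA-Z]' | wc -l | tr -d ' ')

# Unique lemmas, not numerals: keep reading lines, take the text between
# the first pair of double quotes, drop numerals, then count unique lemmas.
lemmas=$(printf '%s\n' "$sample" | grep -v '"<' | cut -d '"' -f2 | grep '[a-zA-Z]' | sort -u | wc -l | tr -d ' ')

echo "words: $words"     # words: 3
echo "lemmas: $lemmas"   # lemmas: 2
```

The numeral line is dropped in both counts because it contains no letter, and the repeated lemma is counted once because of 'sort -u'.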