Making statistics:
----------------
Replace the file names with your own.

grep LAJ cada_po_LAJ.txt | sed 's/LAJ/LAJ$/' | cut -d "$" -f2 | sed 's/LA://' | sed 's/ ERROR /$ ERROR /' | cut -d "$" -f1 | tr -s " " | sed 's/ $//' | sort | uniq -c > stat_cada_po

Explanation:
get the lines with LAJ in the file | add $ after LAJ | remove everything before $ | remove LA: | add $ before ERROR | remove everything after $ | squeeze repeated spaces | remove any space at the end of the line | sort alphabetically | collapse duplicate lines and count them

===========================

Get only the wordforms of the sentence
----------------------------------------
grep BEAIVI cada_po_LAJ.txt | sed 's/"</$/g' | sed 's/>"/$<"/g' | cut -d "$" -f2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60 | tr "$" " " | less

Explanation:
get the lines with BEAIVI in the file | add $ in front of the wordform | add $ after the wordform | cut the lines at the delimiter $ and keep fields 2, 4, 6, 8, ... | change $ to a space | page through the result

Statistics
-----------
echo '31.10.2011' > stat_cada_po

cat cada_po_sentences | sed 's/LAJ /$LAJ /g' | tr "$" "\n" | grep LAJ | wc -l >> stat_cada_po

echo 'oversikt' >> stat_cada_po

cat cada_po_sentences | sed 's/LAJ /$LAJ /g' | tr "$" "\n" | grep LAJ |
  sed 's/LAJ/$/g' | sed 's/LA:/$/' | cut -d "$" -f2 | cut -d " " -f2 |
  sort | uniq -c | sort -nr >> stat_cada_po

echo 'verb' >> stat_cada_po

cat cada_po_sentences | sed 's/LAJ /$LAJ /g' | tr "$" "\n" | grep LAJ |
  sed 's/LAJ /$/' | sed 's/LA:/$/' | cut -d "$" -f2 | sed 's/ $//g' |
  sed 's/POSTV/$POSTV/g' | sed 's/PREV/$PREV/g' | sed 's/SPLITV/$SPLITV/g' |
  sed 's/NOVER/$NOVER/g' | sed 's/ VERB:/$ VERB:/g' | sed 's/ ADJ:/$ ADJ:/g' |
  sed 's/ SUBST:/$ SUBST:/g' | cut -d "$" -f2 | tr -d "$" | tr -s " " |
  sed 's/ $//' | sort | uniq -c | sort -nr >> stat_cada_po

echo 'detaljer' >> stat_cada_po

cat cada_po_sentences | sed 's/LAJ /$LAJ /g' | tr "$" "\n" | grep LAJ |
  sed 's/LAJ /$/' | sed 's/LA:/$/' | sed 's/ VERB/$ VERB/g' | cut -d "$" -f2 |
  sed 's/ $//g' | sort | uniq -c >> stat_cada_po
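
Trying the marker trick on sample data
----------------------------------------
All the pipelines above work the same way: sed inserts a $ as a marker, cut splits on it, and sort | uniq -c counts what is left. A minimal sketch for testing this without the real files follows. The function name count_tags and the four input lines are made up for illustration, and the real layout of cada_po_LAJ.txt may well differ.

```shell
#!/bin/sh
# count_tags (hypothetical helper): keep the text between "LAJ" and " ERROR "
# on every line that contains LAJ, then count the distinct results.
count_tags() {
  grep LAJ |
    sed 's/LAJ/LAJ$/' | cut -d '$' -f2 |          # drop everything up to and including LAJ
    sed 's/ ERROR /$ ERROR /' | cut -d '$' -f1 |  # drop everything from " ERROR " onwards
    tr -s ' ' | sed 's/^ //; s/ $//' |            # squeeze spaces, trim both ends
    sort | uniq -c | sort -nr                     # count, most frequent first
}

# Made-up input lines; the last one has no LAJ and is filtered out by grep.
printf '%s\n' \
  'sent1 LAJ PREV VERB ERROR x' \
  'sent2 LAJ POSTV VERB ERROR y' \
  'sent3 LAJ PREV VERB ERROR z' \
  'sent4 other ERROR w' | count_tags
```

With this input the count lists PREV VERB twice and POSTV VERB once. The trick relies on $ not occurring in the data itself, so if the real files can contain a literal $, pick another marker character.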