1 alphabetically
2 by probabilities
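A minimal sketch of the two orderings, assuming each candidate is a (word, translation, probability) triple. The tuples below are hypothetical placeholders, not real output.

```python
# Hypothetical candidate pairs: (source word, candidate translation, probability)
candidates = [
    ("term_c", "trans_c", 0.65),
    ("term_a", "trans_a", 0.42),
    ("term_b", "trans_b", 0.87),
]

# 1: alphabetically by the source word
alphabetical = sorted(candidates, key=lambda c: c[0])

# 2: by probability, highest first
by_probability = sorted(candidates, key=lambda c: c[2], reverse=True)
```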


1 rel. frequency in WP
2 rel. frequency in the actual corpus

only words with a higher relative frequency in fo than in WP
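A sketch of this filter, assuming "fo" is the actual corpus, "WP" is the Wikipedia reference corpus, and both are available as token lists. The function names are hypothetical. Words absent from WP are treated as having relative frequency 0 there, so they pass the filter.

```python
from collections import Counter

def relative_frequencies(tokens):
    """Map each word to count / total number of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def more_frequent_in_corpus(corpus_tokens, wp_tokens):
    """Keep words whose relative frequency in the corpus exceeds that in WP."""
    corpus_rf = relative_frequencies(corpus_tokens)
    wp_rf = relative_frequencies(wp_tokens)
    # Words missing from WP get 0.0 there, so they are always kept.
    return {w for w, rf in corpus_rf.items() if rf > wp_rf.get(w, 0.0)}

corpus = "a b b c".split()
wp = "a a b d".split()
kept = more_frequent_in_corpus(corpus, wp)
```

Here "b" (0.5 vs 0.25) and "c" (0.25 vs 0.0) are kept, while "a" (0.25 vs 0.5) is dropped.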

we are looking for terms

could be but not so frequently

-6.146 = 50 -- the closer to zero, the more frequent
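Why "closer to zero" means "more frequent": if the scores are log relative frequencies, they are always at most 0 and rise toward 0 as a word becomes more common. The sketch below assumes a natural log and a known corpus size; the notes do not state the base or the total behind -6.146, so it does not try to reproduce that value.

```python
import math

def log_rel_freq(count, total):
    """Log (base e, assumed) of count / total: always <= 0, closer to 0 for more frequent words."""
    return math.log(count / total)

rare = log_rel_freq(5, 1000)
common = log_rel_freq(50, 1000)
```

Here `common` is closer to zero than `rare`, and a word making up the whole corpus would score exactly 0.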
confidence is the confidence score for the pair

likelihood that these words are translations of each other

sme = dynamic compounds
first part in nom., gen. or pl.

if it never changes, I can add it back
they are removed to get a smaller vocabulary size

lemma for compound
	ok for sme
	

updated, with all nouns, not the ones with high
also containing absolute frequencies


GIZA++ ??

n
a
v
exclude the rest
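A sketch of the part-of-speech filter, assuming each word comes paired with a single POS tag and that nouns, adjectives and verbs are tagged "N", "A" and "V" (uppercase, as in the analysis below). The function name and tag set are assumptions.

```python
# Assumed tags for noun, adjective, verb
KEEP_POS = {"N", "A", "V"}

def keep_content_words(analyses):
    """Keep (word, pos) pairs whose POS is N, A or V; drop everything else."""
    return [(w, pos) for w, pos in analyses if pos in KEEP_POS]

tagged = [("x", "N"), ("y", "Adv"), ("z", "V"), ("w", "Pr")]
filtered = keep_content_words(tagged)
```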

árvalit+V+TV+Der2+Der/eapmi+N+SgCmp#
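A sketch for splitting an analysis string like the one above into compound parts, assuming "+" separates the lemma from its tags and "#" marks a compound boundary. The helper name is hypothetical. The string above ends at "#", so only the first compound part is present and the split yields one part.

```python
def split_analysis(analysis):
    """Split 'lemma+Tag+...#lemma+Tag+...' into a list of (lemma, tags) per compound part."""
    parts = []
    for part in analysis.split("#"):
        if not part:
            continue  # nothing after a trailing '#' boundary
        lemma, *tags = part.split("+")
        parts.append((lemma, tags))
    return parts

parsed = split_analysis("árvalit+V+TV+Der2+Der/eapmi+N+SgCmp#")
```

This gives `[("árvalit", ["V", "TV", "Der2", "Der/eapmi", "N", "SgCmp"])]`: the verb lemma, its derivation tags, and the resulting noun with what appears to be the singular compound-form tag.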