A short description of a possible way to acquire new lemmata from the existing resources
based on infos found at
http://divvun.no/doc/ling/common.html

For instance, for sma:

1. resource location
    /usr/local/share/corp/gtbound/sma
    /usr/local/share/corp/gtfree/sma

2. tools: ccat
   http://divvun.no/doc/ling/catxml.html

~/gtsvn/gt/script/samiXMLParser>./ccat -h 

Usage: ccat <options> [FileName]
Print the contents of a corpus file in XML format.
The default is to print paragraphs with no type (=text type).
The possible options include:
	-l <lang>	Process elements in language <lang>.
	-a	 Print all text elements.
	-p	 Print plain paragraphs. (default)
	-T	 Print paragraphs with title type.
	-L	 Print paragraphs with list type.
	-t	 Print paragraphs with table type.
	-C	 Print corrected xml-files with corrections.
	-ort	 Print corrected xml-files with ortoghraphical corrections.
	-synt	 Print corrected xml-files with syntactical corrections.
	-lex	 Print corrected xml-files with lexical corrections.
	-typos	 Print corrections with tabs separated output.
	-S	 Print the whole text in a word per line. Errors are tab separated. 
	-r <dir> Recursively process directory dir and subdirs encountered.
	-h	 Print this help message.


  Ex. 1  Task: extract (only) sma text from a single file
     ccat -l sma /usr/local/share/corp/bound/sma/ficti/karijuse.txt.xml

  
  Ex. 2  Task: extract (only) sma text from a single file in one-word-per-line-format
     ccat -l sma -S /usr/local/share/corp/bound/sma/ficti/karijuse.txt.xml


--------------

Possible word acquisition steps:

 1. extract words (one word per line) as described above

 2. filter, preprocess the output based on patterns

 3. compare the new word/lemma list with the list of 
    already existing words/lemmata (for this task, Ciprian will check in 
    a simple xslt-stylesheet)

 4. done!