!!!Meeting setup

* Date: 05.03.2007
* Time: 09.00 Norw. time
* Place: Internet
* Tools: SubEthaEdit, iChat

!!!Agenda

# Opening, agenda review
# Reviewing the task list from last week
# Documentation - divvun.no
# Corpus gathering
# Corpus infrastructure
# Infrastructure
# Linguistics
# Name lexicon infrastructure
# Spellers
# Other issues
# Summary, task lists
# Closing

!!!1. Opening, agenda review, participants

Opened at 09:53.

Present: __Sjur, Steinar, Thomas__

Absent: __Børre, Maaren, Saara, Tomi, Trond__

Agenda accepted as is.

!!!2. Updated task status since last meeting

!! Børre

On Winter Holidays.

!! Maaren
* lexicalise actio compounds
* Manually mark speller test documents for typos


!! Saara
* continue aligning the rest of the parallel files
* prepare files for manual alignment
** done for bible files
* add ABBR, ACR, clitics to closed classes + ADV to paradigm generator
** done
* update lexc2xml with comment field
** in progress
* start improving the corpus interface for Sámi in Oslo.
* Set up (sub)directories for speller test documents
** not done
* Mark-up the added speller test texts, using our existing xml format
** not done - where are they? If there are any, __Børre__ and __Trond__ should
   have added them last week.
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** soon finishing the bug with name lexicon


!! Sjur
* name lexicon:
** refactor the rest of the SD-terms editor code
** implement missing propnouns editing functions
** implement improvements decided upon in Tromsø
*** name lexicon hiatus while finishing the public beta
* hire linguist and programmer
** short e-mail to the linguist candidate
** programmer position stopped
* publish corpus contracts and project infra as open-source on NoDaLi-sta
* fix stuorra-oslolaš lower case {{o}}
* write form to request corpus user account
* document how to apply for access to closed corpus, and details on the corpus
  and its use in general
* get an Intel Mac for __Tomi__
** asked __Børre__ to do it in Tromsø
* write press release for the beta
* get speller test tool from __Polderland__
** asked them - nothing received so far (they promised it by Wednesday last
   week at the latest - hmm)
* Set up ways of adding meta-information to speller test docs
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
* other tasks:
** went to Stockholm to present the Divvun project to an informal group of Sámi
   IT coordinators within each government - positive feedback, and a learning
   experience
** worked a lot on derivations and generation of them, finding and fixing bugs
   in the lexc code.
** added {{ms-speller}} target to the make file, to allow automatic builds of
   the binary speller lexicon files.


!! Steinar
* Beta testing: Align manually (shorter texts)
* Manually mark speller test texts for typos (making them into gold standards)
** started working, added some results to documentation 
* Infrastructure test: add report to {{gt/doc/infra/}}, probably as
  {{infrareport.jspwiki}}
** sent to __Børre__ who will help add to documentation
* Complete the semantic sets in sme-dis.rle
** no work this week
* missing lists
** no work this week
* Look at the actio compound issue when adding from missing lists
** not done
* Align corpus manually
** not done
* [fix bugs!|http://giellatekno.uit.no/bugzilla]


!! Thomas
* refine {{smj}} proper noun lexica, cf. the propernoun-smj-lex.txt 
** not done
* work with compounding
** worked and still working
* Lack of lowering before hyphen: Twol rewrite.
** not done
* fix stuorra-oslolaš lower case {{o}}
** not done
* translate beta release docs to {{sme}} and {{smj}}
** not done
* Add potential speller test texts
** not done
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** all the time


!! Tomi
* add derivations to the PLX generation
** in progress
* make PLX conversion test sample; add conversion testing to the make file
* improve number PLX conversion
* [fix bugs!|http://giellatekno.uit.no/bugzilla]


!! Trond
* Participate in the beta testing setup
* Test the beta versions
* Work on the parallel corpus issues
** Discuss with Anders
** Work on the aligner with (__Børre__)
** fix {{sme}} texts in corpus this month
** find missing {{nob}} parallel texts in corpus, go through Saara's list
* Postpone these tasks to after the beta:
** update the {{smj}} proper noun lexicon, and refine the morphological
   analysis, cf. the propernoun-smj-lex.txt
** Go through the Num bugs
* Improve automatic alignment process
* Align corpus manually
* Include a testbed and results in the cvs (gt/doc/proof/spelling/testing)
* Store the tested texts, for reference
* Add potential speller test texts
* [fix bugs!|http://giellatekno.uit.no/bugzilla]


!!!3. Documentation

The open documentation issues fall into these three categories:
* Beta documentation for testers
* Documentation for the online corpora
* General documentation improvement after Steinar's test (for open-source
  release)


TODO:
* write form to request corpus user account (__Børre, Sjur, Trond__)
* document how to apply for access to closed corpus, and details on the corpus
  and its use in general (__Børre, Sjur, Trond__)
* correct and improve the documentation based on feedback from __Steinar__
  (__Børre__)
* beta documentation (see separate beta section below)


!!!4. Corpus gathering

TODO:
* {{sme}} texts: no new additions, fix corpus errors during this month
  (__Børre, Trond, Saara__)
* missing {{nob}} parallel texts should be added if such holes are found
  (__Børre, Trond__)
* Go through the list of missing or erroneous {{nob}} texts, based upon
  __Saara's__ perfect list (__Børre, Trond__)
* add {{sma}} texts to the corpus repository (__Børre__)


!!!5. Corpus infrastructure

!!Alignment

__TODO__
* go through other directories (nob directories, sd directories), fix
  parallelism information for other documents (2 hours)
  (__Børre__)
* Improve the automatic process:
** Improve the anchor list and realign (__Trond, Børre__)
** Only adding words does not improve alignment, you have to consider the format
   as well. If an anchor word is cut with a star too early, e.g. guo*, the wrong
   word can be selected.
** The documents still have some formatting issues which cause trouble in
   alignment. (In some documents tables are included in the text, in others
   not, etc.)
** Test and improve settings in the aligner
* Align manually (__Trond, Steinar__), especially shorter terminological texts:
  good idea. __Saara__ will look for troublesome texts and prepare them for
  manual alignment.


!!!6. Infrastructure

__TODO:__
* add report to {{gt/doc/infra/}}, probably as {{infrareport.jspwiki}}
  (__Steinar__)
* update and fix our documentation and infrastructure as __Steinar__ finds
  problem areas (__Børre__)


!!!7. Linguistics

!!North Sámi

TODO:
* lexicalise actio compounds. Example: ''vuolggasadji''  vs. ''vuolginsadji''
  (__Maaren__)
* fix stuorra-oslolaš lower case {{o}} (__Sjur, Thomas, Trond__)
** postponed till after the public beta


!!Lule Sámi

TODO:
* refine {{smj}} proper noun lexica, cf. the propernoun-smj-lex.txt
  (__Thomas, Trond__)


!!!8. Name lexicon infrastructure

Decisions made in Tromsø can be found in [this meeting
memo.|/doc/admin/physical_meetings/tromso-2006-08-propnoun.html]

__TODO:__
# fix bugs in lexc2xml; add comments to the log element (__Saara__)
# finish first version of the editing (__Sjur__)
# test editing of the xml files. If ok, then: (__Sjur, Thomas, Trond__)
# make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
  well) (the morphological section should be kept intact, in e.g.
  propernoun-sme-morph.txt) (__Sjur, Saara__)
# convert propernoun-($lang)-lex.txt to a derived file from common xml files
  (__Sjur, Tomi, Saara__)
# implement data synchronisation between [risten.no|http://www.risten.no] and
   the cvs repo, and possibly other servers (ie the G5 as an alternative server
   to the public risten.no - it might be faster and better suited than the
   official one; also local installations could be treated the same way)
# start to use the xml file as source file
# clean terms-sme.xml such that all names have the correct tag for their use
  (e.g. @type=secondary) (__Thomas, Maaren, linguists__)
# merge placenames which are erroneously in different entries: e.g. Helsinki,
  Helsingfors, Helsset (__linguists__)
# publish the name lexicon on risten.no (__Sjur__)
# add missing parallel names for placenames (__linguists__)
# add informative links between first names like Niillas and Nils
  (__linguists__)


!!!9. Spellers

!!OOo speller(s)

TODO after the MS Office Beta is delivered:
* add Aspell/Hunspell data generation to the lexc2xspell (__Tomi__ - after the
  PLX data generation is finished)
* study Hunspell, perhaps also Soikko (__Børre, Sjur, Tomi__)


!!Testing

!Different ways of testing

# Impressionistic, functionality: try the program, try all the functions
# Impressionistic, coverage: try the program on different texts, look for
  false positives
# Systematic (in order of importance):
## Make a corpus of texts, from different genres (can be done before 0.2
   release)
### For each text, detect precision
### For each text, detect recall
### For each text, detect accuracy


Before beta release: precision is important, but have a look at recall as well.


!Definitions

* __tp__ - true positives (correctly recognised misspellings)
* __fp__ - false positives (correct words erroneously marked as misspellings)
* __fn__ - false negatives (misspellings not recognised by the speller)


!Recall and precision

* __precision__ = tp / ( tp + fp )  = true redlines / all redlines
** can we trust that the redlines are actually errors?
** Task: check all hits 
** (test p, are they tp or fp?)
* __recall__ = tp / ( tp + fn)      = true redlines / all errors in doc
** can we trust that all errors are actually found? 
** Task: check every single word 
** (test p, are they tp or fp, test n, are they tn or fn?)
* __accuracy__ = ( tp + tn ) / ( tp + fp + fn + tn ) = overall performance
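The three measures above can be sketched in a few lines of Python. This is only an illustration of the formulas, not the spreadsheet actually used. The function names and the example counts are invented for the sketch.

```python
def precision(tp, fp):
    """Share of redlines that are real misspellings: tp / (tp + fp)."""
    flagged = tp + fp
    return tp / flagged if flagged else 0.0


def recall(tp, fn):
    """Share of real misspellings that got a redline: tp / (tp + fn)."""
    errors = tp + fn
    return tp / errors if errors else 0.0


def accuracy(tp, fp, fn, tn):
    """Share of all words classified correctly: (tp + tn) / (tp + fp + fn + tn)."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


# Invented counts for a 100-word text, for illustration only.
tp, fp, fn, tn = 8, 2, 4, 86
print(precision(tp, fp), recall(tp, fn), accuracy(tp, fp, fn, tn))
```

The zero checks only guard against empty texts or texts without redlines. Returning 0.0 in those cases is a choice made for this sketch.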


!Precision and recall testing

A testbed has been set up (__Trond__), and some texts are marked for errors and
corrections (__Steinar__). Versions alpha, beta 0.1 and beta 0.2 have been
tested.


Types of tests:

# Technical testing
# Testing for linguistic functionality
# Testing for lexical coverage
# Testing for normativity
# Testing the suggestions


The tester should identify these 4 values:

* wds - number of words in the text
* tp - correctly identified errors
* fp - correctly written but marked as errors
* fn - errors not marked as such

The spreadsheet will then calculate precision, recall and accuracy. __Steinar__
has marked some texts like this: each error is followed by a paragraph sign (§)
and the correct form, e.g. {{makred§marked}}. Way of finding precision: Take out
the § entries and evaluate them for tp and fp. Way of finding recall: Remove the
§ entries and count the fn among the remaining words. Then fill in and collect
results.
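A rough sketch of how the § markup could be read automatically and turned into the four values. It assumes whitespace-separated tokens, single-word corrections and no punctuation handling. The speller result is given as a set of flagged word positions, which is a made-up interface and not that of the Polderland tools. Counting every remaining word as tn is also an assumption of the sketch.

```python
def split_marked_text(text):
    """Split whitespace-separated text that uses error§correction markup.

    Returns the words as written in the text and a dict mapping the position
    of each marked error to its (error, correct form) pair.
    """
    words = []
    corrections = {}
    for position, token in enumerate(text.split()):
        if "§" in token:
            error, correct = token.split("§", 1)
            corrections[position] = (error, correct)
            words.append(error)
        else:
            words.append(token)
    return words, corrections


def count_outcomes(words, corrections, flagged):
    """Count wds, tp, fp, fn and tn; `flagged` holds positions the speller redlined."""
    errors = set(corrections)
    flagged = set(flagged)
    wds = len(words)
    tp = len(flagged & errors)
    fp = len(flagged - errors)
    fn = len(errors - flagged)
    tn = wds - tp - fp - fn  # assumption: every other word is a true negative
    return {"wds": wds, "tp": tp, "fp": fp, "fn": fn, "tn": tn}


# Invented example: the speller flags "makred" (position 2) and "the" (position 4).
words, corrections = split_marked_text("Errors are makred§marked in the tekst§text")
counts = count_outcomes(words, corrections, flagged={2, 4})
print(counts)
```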

Testing of suggestion should follow the same lines:

* errs - number of errors in the text
* tp - the intended word is among the suggestions
* fp - the intended word is not among the suggestions
* fn - no suggestions
* tn - (not relevant??)


Ordering of suggestions:
* place in the list of the intended correction
** ordered first
** ordered top-five
** ordered below top-five


"Perceived Quality", ie for all recognised errors/tp:
* number of correct suggestions at top
* number of correct suggestions among top-five
* number of correct suggestions below top-five
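The suggestion categories above could be tallied along these lines. This is a sketch only. The bucket names are invented, and "top-five" is taken to mean ranks 2-5, since rank 1 is counted separately. That reading is an assumption. The test cases are made up.

```python
from collections import Counter


def rank_bucket(intended, suggestions):
    """Classify where the intended correction sits in the suggestion list."""
    if not suggestions:
        return "no suggestions"
    if intended not in suggestions:
        return "not suggested"
    rank = suggestions.index(intended) + 1
    if rank == 1:
        return "first"
    if rank <= 5:
        return "top-five"
    return "below top-five"


# Invented cases: (intended word, suggestions returned by the speller)
cases = [
    ("marked", ["marked", "marred"]),
    ("text", ["test", "tent", "text"]),
    ("error", []),
    ("word", ["ward", "wood"]),
]
summary = Counter(rank_bucket(intended, suggestions) for intended, suggestions in cases)
print(summary)
```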


!Testing on unseen texts

We need to use unknown texts in order to measure the performance of the speller.

!Regression tests

We need to ensure that we do not take steps backwards, ie all known spelling
errors in the corpus should be correctly identified, with a proper suggestion
among the top five. For this purpose we can use the regular corpus with
correction markup.

We also need to regression test the PLX conversion. In principle this is easy -
just send the full word-form list (as generated by {{make wordlist TARGET=sme}})
through the speller. None should be rejected - any word form rejected is in
principle a regression. In practice, this is not that easy, since the word list
is so huge. We have to investigate alternatives for this testing.
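One possible alternative, sketched here under loud assumptions: stream the word list instead of loading it, and optionally check only every n-th form. The {{accepts}} callable stands in for whatever interface the speller offers and is hypothetical, as is the toy data. Sampling only gives an indication, since regressions in the skipped forms go unnoticed.

```python
from itertools import islice


def rejected_forms(word_forms, accepts, step=1):
    """Yield the word forms that `accepts` rejects, checking every `step`-th form."""
    for form in islice(word_forms, 0, None, step):
        if not accepts(form):
            yield form


# Toy stand-ins: a tiny "speller lexicon" and a generated word-form list.
lexicon = {"aaa", "aab", "abb"}
word_forms = ["aaa", "aab", "abc", "abb"]

regressions = list(rejected_forms(word_forms, lambda form: form in lexicon))
sampled = list(rejected_forms(word_forms, lambda form: form in lexicon, step=2))
print(regressions, sampled)
```

Because the function is a generator over any iterable, the word list could also be read line by line from a file, so the full list never has to sit in memory.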

__TODO:__
* add extraction of all known spelling errors in the corpus (not the
  {{prooftest}} corpus), and check that they are properly corrected
  (__Børre, Sjur__)
* test the typos.txt list, and check that all entries are properly corrected
  (__Børre, Sjur__)
* consider how to do a regression __self-test__, ie, how to test the full
  wordlist (__Børre, Sjur__)


!Storing test texts

Test texts should be stored in the corpus catalogue, separated from the ordinary
corpus files. They should be marked as to whether their unknown words have been
added to the lexicon or not (in the former case, they cannot be used for testing
of performance any more, only for regression testing). When the words have been
added, the whole text can be transferred to the regular corpus repository.

__TODO:__
* get an Intel Mac for Tomi (__Sjur__)
** asked __Børre__ to do it in Tromsø
* Include a testbed and results in the cvs {{gt/doc/proof/spelling/testing}}
  (__Trond, Børre__)
** textid - nu_wds - tp - fp - tn - fn - prec - rec - acc - spellid - ref_to_txt
** done
* Store the tested texts, for reference (__Trond, Børre__)
* get speller test tool from __Polderland__ (__Sjur__)
** asked for it - nothing received, will remind them
* Set up (sub)directories (__Saara__)
** top-level dir {{corpus/prooftest/orig/}} and {{prooftest/xml/}}
* Add potential test texts (__Børre, Thomas, Trond, anyone, really__)
* Manually mark them for typos (making them into gold standards)
  (__Steinar, Maaren__)
** {{erorr§error}}
* Format the added texts in appropriate ways - use our existing xml format, with
  correct markup as decided earlier (the only thing that separates these
  documents from regular corpus documents is the directory (tree) in which they
  reside), i.e. regular corpus conversion tools, plus error markup (__Saara__)
** requires changes to {{ccat}} to handle error/correction markup (__Tomi__)
* Set up ways of adding meta-information (source info, used in testing or not,
  added to lexicon or not) (__Sjur, Børre__)
* Set up test record page in {{gt/doc/proof/spelling/testing/}} (__Børre__)
** done
* Conduct tests on new beta versions on the basis of the unspoiled gold standard
  documents (__whoever has time__), and fill in data from testing (the testers: 
  __who?__)
* alternatively: make test scripts that will run the tests automatically,
  collect the numbers, and transform them into test results (__who?__); this
  depends upon the functionality of the Polderland tools.
* include the ones already tested in the {{testing/}} catalogue
* test 0.3 on the same texts
* test each version before beta release


!!The b0.3 / 2007.02.26 version

Known errors:
* clitics do not work with {{W}} class words (uninflected words). Two options:
** generate these with clitics (adds words from 6700 -> 100 000)
   (__Tomi, Saara__)
*** done
** ask Polderland to look at it - __Sjur__ will do that
*** __Tomi__ did it, follow-up e-mail discussions by __Sjur__ and __Tomi__


!!Localisation

We need to translate the info added to our front page (and a separate page)
regarding the beta release. Also the press release needs to be translated.

TODO:
* translate beta release docs to {{sme}} (__Thomas__)
* translate beta release docs to {{smj}} (__Thomas__)


!!Lexicon conversion to the PLX format

We need to test that the conversion is correct and gives expected results in all
cases, especially regarding compounding and derivation. For that we need a small
set of test entries in lexc format, and the corresponding expected output in PLX
format. By comparing the actual output with the expected output we get a measure
of the quality of the conversion.
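The comparison could look roughly like the sketch below. It treats each output line as one opaque entry and makes no assumption about the PLX syntax itself. The entry strings are placeholders, and the score (share of expected entries found in the actual output) is just one possible measure.

```python
def compare_conversion(expected, actual):
    """Compare expected and actual conversion output, one entry per line.

    Returns (score, missing, unexpected), where score is the share of
    expected entries that were actually produced.
    """
    expected_set, actual_set = set(expected), set(actual)
    missing = sorted(expected_set - actual_set)
    unexpected = sorted(actual_set - expected_set)
    if expected_set:
        score = len(expected_set & actual_set) / len(expected_set)
    else:
        score = 1.0
    return score, missing, unexpected


# Placeholder entries, not real PLX lines.
expected = ["entry-a", "entry-b", "entry-c", "entry-d"]
actual = ["entry-a", "entry-b", "entry-x", "entry-d"]

score, missing, unexpected = compare_conversion(expected, actual)
print(score, missing, unexpected)
```

A make target could run such a check and fail the build whenever {{missing}} or {{unexpected}} is non-empty.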

__TODO:__
# add derivations to the PLX generation (__Tomi__)
## working on it
# add prefixes to the PLX (__Børre__)
# middle nouns (__Børre__)
# make conversion test sample; add conversion testing to the make file
  (__Tomi__)
## __Sjur__ added {{ms-speller}} to the Makefile
# improve number conversion (__Børre, Tomi__)


!!Public Beta release

Tentative public beta release: because of the initial linguistic bugs and poor
coverage, it is now moved to Thursday 15.3. - this time with derivations and
numbers included :-)

Internal deadlines:
* A date for when lexical updates should be checked in, in
  order to make it to the beta.
* A plan for how many pre-betas we should compile, and when(?)  
** alpha = Dutch ({{sme}}) + French ({{smj}})
** beta 0.1 = the first Catalan ({{sme}}) + Basque ({{smj}})
** beta 0.2 = the second Catalan
** beta 0.3 =  26. or 27.: compound beta
** beta 0.4 = 2.3.: first derivation beta, also including numbers, prefixes.
** beta 0.5 = 7.3.: final derivation beta, also including middle nouns


Linguistic issues still open:
* derivations (__Tomi__)
* numbers 1-20 (__Børre__)
* prefixes (eahpe, ii-) (__Børre__)
* middle nouns (LEXICON: lexc: Rmiddle, plx: L) (__Børre__)


The middle nouns are:  {{beai, beal, geaš, oahpaheai, oai, miel, vuol}}. They
are also marginally used initially (not found in the corpus):

{{{
 beai+ShCmp:beai  Rreal ; (not used init in our corpus)
 beal+ShCmp:beal  Rreal ; (init with Num -goalmmat, -guđát, -nuppi, lexicalized)
 geaš+ShCmp:geaš  Rreal ; (not used init in our corpus)
 oahpaheai+ShCmp:oahpaheai  Rreal ;  init, but then actually 2-part 
 oai+ShCmp:oai 	  Rreal ;  (not used in corpus init oaivuolli (SUB? yes!)
 vuol+ShCmp:vuol  Rreal ;  (not used in our corpus)
}}}

The __PLX__ format does not allow encoding a stem as middle-only. For the public
beta we will encode them as Left-only (which is really non-right), and evaluate
their effect on the quality of the speller as we progress.


DONE:
* delivered PLX data of {{sme}} and {{smj}} including compounding
* translated Windows installer to {{sme}} and {{smj}}
* installed PLX compiler in G5 at {{/usr/local/bin/mklex*}} (one version for
  {{sme}} and one for {{smj}})
* added resources needed for compiling PLX lexicons to our cvs repo
* tested the beta drop from Polderland - good we did, it is absolutely
  unacceptable (our responsibility - only linguistic errors (poor coverage)
  found so far)


__TODO:__
* write press release (__Sjur__)
** done first draft, see {{xtdoc/sd/.../xdocs/pr/}}
* add info to front page (incl. download links) (__Børre__)
* write separate page with detailed info (incl. download links) (__Børre__)
** __Sjur__ wrote a start
* add compilation of MS Office spellers part of the Makefile (__Tomi__)
** __Sjur__ did it
* install Windows and MS Office; test tools on Windows (__Børre, Thomas__)
** done
* collect a list of PR recipients, forward to Berit Karen Paulsen
  (__Børre, Sjur, Trond__)
* questions for Polderland (__Børre__):
** version info in the speller?
*** see below
** remaking/updating the installer packages with linguistic updates - who?
*** discussed with __Polderland__, they will discuss it internally


!!Version identification of speller lexicons

See the Norwegian spellers for an example, with the trigger string {{tfosgniL}}.

Suggestion:
{{{
nuvviD -> Divvun
nuvviD -> Dávvisámegiella
nuvviD -> Veršuvdna_1.0b1 (based on cvs tag?)
nuvviD -> 12.2.2007  (automatically generated/added)
nuvviD -> Sjur_Nørstebø_Moshagen
nuvviD -> Børre_Gaup
nuvviD -> Thomas_Omma
nuvviD -> Maaren_Palismaa
nuvviD -> Tomi_Pieski
nuvviD -> Trond_Trosterud
nuvviD -> Saara_Huhmarniemi
nuvviD -> Steinar_Nilsen
nuvviD -> Lene_Antonsen
nuvviD -> Linda_Wiechetek
}}}

These correction rules (and their corresponding PLX entries) should be added
automatically to the PLX file and the phonetic file as part of the compilation
process, to include build date and version number.
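A minimal sketch of how the correction rules in the suggestion above could be generated at build time. It only produces the {{nuvviD -> ...}} lines. The corresponding PLX entries and the phonetic file format are left out, since their syntax is not described here. The function name and its parameters are invented for illustration.

```python
from datetime import date

TRIGGER = "nuvviD"


def version_rules(version, build_date, language_name, contributors):
    """Build the 'trigger -> value' correction rules carrying version info."""
    values = [
        "Divvun",
        language_name,
        f"Veršuvdna_{version}",
        f"{build_date.day}.{build_date.month}.{build_date.year}",
    ]
    # Spaces in names are replaced with underscores, as in the suggestion.
    values += [name.replace(" ", "_") for name in contributors]
    return [f"{TRIGGER} -> {value}" for value in values]


rules = version_rules("1.0b1", date(2007, 2, 12), "Dávvisámegiella", ["Børre Gaup"])
print("\n".join(rules))
```

In a real build the version string would presumably come from the cvs tag and the date from {{date.today()}}. Both are passed in explicitly here to keep the sketch testable.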

__TODO:__
* add version info to the generated speller lexicons (__Børre, Sjur, Tomi__)


!!Conversion from LexC to PLX

{{{
Adjectives compile at 60 sec/adjective, i.e. (5000*60) / 3600 = 83 hrs
Nouns compile at 3 sec/noun,            i.e. (23600*3) / 3600 = 19 hrs
}}}

This is so far acceptable for nouns, but on the edge of being unacceptable for
adjectives. These times will multiply many times when we add derivation, meaning
we will need more than a week to convert the major POSes from LexC to PLX then.

We need to investigate why adjectives are so slow, and try to improve on the
conversion speed.
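The arithmetic behind these estimates, as a small sketch. The per-entry times and entry counts are the ones given above. The derivation factor is purely hypothetical and only shows how quickly the total passes a week.

```python
def conversion_hours(entries, seconds_per_entry):
    """Total conversion time in hours for one part of speech."""
    return entries * seconds_per_entry / 3600


adjective_hours = conversion_hours(5000, 60)   # about 83 hours
noun_hours = conversion_hours(23600, 3)        # just under 20 hours
total_days = (adjective_hours + noun_hours) / 24

derivation_factor = 2  # hypothetical multiplier once derivations are added
days_with_derivation = total_days * derivation_factor

print(f"{adjective_hours:.1f} h + {noun_hours:.1f} h = {total_days:.1f} days")
print(f"with factor {derivation_factor}: {days_with_derivation:.1f} days")
```

Even a doubling of the current times would put the conversion of nouns and adjectives alone above seven days.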

__TODO:__
* evaluate the speed of conversion to PLX, and whether we need to try to improve
  it


!!!10. Other

!!Corpus contracts

TODO:
* publish corpus contracts and project infra as open-source on NoDaLi-sta
  (__Sjur__)
** __delayed__ until the public beta is out the door


!!Bug fixing

__57__ open Divvun/Disamb bugs, and __23__ risten.no bugs


!!!11. Next meeting, closing

The next meeting is 12.3.2007, 09:30 Norwegian time.

The meeting was closed at 10:42.

!!!Appendix - task lists for the next week

!! Børre
[iCal|/doc/admin/weekly/2007/Tasks_2007-03-05_Boerre.ics]
* write form to request corpus user account
* document how to apply for access to closed corpus, and details on the corpus
  and its use in general
* update and fix our documentation and infrastructure as __Steinar__ finds
  problem areas
* continue work on script for automatic testing of the spell checker in Word
* fix {{sme}} texts in corpus this month
* find missing {{nob}} parallel texts in corpus
* add prefixes to the PLX conversion
* add middle nouns to the PLX conversion
* improve number PLX conversion
* go through other directories, fix parallelism information for other documents
* add {{sma}} texts to the corpus repository
* add info to front page (incl. download links)
* write separate page with detailed info (incl. download links)
* Improve automatic alignment process
* Store the tested texts, for reference
* Add potential speller test texts
* Set up ways of adding meta-information to speller test docs
* get an Intel Mac for __Tomi__
* collect a list of PR recipients, forward to Berit Karen Paulsen
* add version info to the generated speller lexicons
* run all known spelling errors in the corpus through the speller
* test the typos.txt list, and check that all entries are properly corrected
* consider how to do a regression __self-test__
* [fix bugs!|http://giellatekno.uit.no/bugzilla]


!! Maaren
[iCal|/doc/admin/weekly/2007/Tasks_2007-03-05_Maaren.ics]
* lexicalise actio compounds
* Manually mark speller test documents for typos


!! Saara
[iCal|/doc/admin/weekly/2007/Tasks_2007-03-05_Saara.ics]
* continue aligning the rest of the parallel files
* prepare more files for manual alignment
* update lexc2xml with comment field
* start improving the corpus interface for Sámi in Oslo.
* Set up corpus directories for proofing test documents
* Mark-up the added speller test texts, using our existing xml format
* [fix bugs!|http://giellatekno.uit.no/bugzilla]


!! Sjur
[iCal|/doc/admin/weekly/2007/Tasks_2007-03-05_Sjur.ics]
* name lexicon:
** refactor the rest of the SD-terms editor code
** implement missing propnouns editing functions
** implement improvements decided upon in Tromsø
* hire linguist
* fix stuorra-oslolaš lower case {{o}}
* write form to request corpus user account
* document how to apply for access to closed corpus, and details on the corpus
  and its use in general
* write press release for the beta
* get speller test tool from __Polderland__
* Set up ways of adding meta-information to speller test docs
* collect a list of PR recipients
* add version info to the generated speller lexicons
* run all known spelling errors in the corpus through the speller
* test the typos.txt list, and check that all entries are properly corrected
* consider how to do a regression __self-test__
* [fix bugs!|http://giellatekno.uit.no/bugzilla]


!! Steinar
[iCal|/doc/admin/weekly/2007/Tasks_2007-03-05_Steinar.ics]
* Beta testing: Align manually (shorter texts)
* Manually mark speller test texts for typos (making them into gold standards),
  add the texts to a certain directory 
* Infrastructure test: add report to {{gt/doc/infra/}}, probably as
  {{infrareport.jspwiki}}
* Complete the semantic sets in sme-dis.rle
* missing lists
* Look at the actio compound issue when adding from missing lists
* Align corpus manually
* [fix bugs!|http://giellatekno.uit.no/bugzilla]


!! Thomas
[iCal|/doc/admin/weekly/2007/Tasks_2007-03-05_Thomas.ics]
* refine {{smj}} proper noun lexica, cf. the propernoun-smj-lex.txt 
* work with compounding
* Lack of lowering before hyphen: Twol rewrite.
* fix stuorra-oslolaš lower case {{o}}
* translate beta release docs to {{sme}} and {{smj}}
* Add potential speller test texts
* [fix bugs!|http://giellatekno.uit.no/bugzilla]


!! Tomi
[iCal|/doc/admin/weekly/2007/Tasks_2007-03-05_Tomi.ics]
* add derivations to the PLX generation
* make PLX conversion test sample; add conversion testing to the make file
* improve number PLX conversion
* update {{ccat}} to handle error/correction markup
* add version info to the generated speller lexicons
* [fix bugs!|http://giellatekno.uit.no/bugzilla]


!! Trond
[iCal|/doc/admin/weekly/2007/Tasks_2007-03-05_Trond.ics]
* Participate in the beta testing setup
* Test the beta versions
* Work on the parallel corpus issues
** Discuss with Anders
** Work on the aligner with (__Børre__)
** fix {{sme}} texts in corpus this month
** find missing {{nob}} parallel texts in corpus, go through Saara's list
* Postpone these tasks to after the beta:
** update the {{smj}} proper noun lexicon, and refine the morphological
   analysis, cf. the propernoun-smj-lex.txt
** Go through the Num bugs
* Improve automatic alignment process
* Align corpus manually
* Store the tested texts, for reference
* Add potential speller test texts
* collect a list of PR recipients
* [fix bugs!|http://giellatekno.uit.no/bugzilla]