!!!Meeting setup

* Date: 26.3.2008
* Time: 09.30 Norwegian time
* Place: Internet
* Tools: SubEthaEdit, iChat/Skype

!!!Agenda

Cf. one of the following, depending on context:
* the upper bar of the SEE window (provided you use the JSPWiki syntax mode)
* the TOC in Forrest-rendered output, like HTML and PDF

!!!Opening, agenda review, participants

Opened at 10:16.

Present: __Børre, Lene, Sjur, Thomas, Trond__

Absent: __Maaren, Per-Eric, Tomi__

Agenda accepted as is.

!!!Updated task status since last meeting

!!Børre
* start to reorganise the documentation
* gather {{sma}} texts
* improve forrest stability with i18n, site look
* set up the Leopard Server features for collaborative support
* Hunspell lexicon conversion
* InDesign documentation
* investigate the NSIS installer
* give contract with blank fields to __Per-Eric__
** done
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
* Other:
** Had a meeting with Davvi Girji and Čálliid lágádus. They agreed to add a 
   paragraph to their standard contracts which lets the Sami Parliament use
   the texts they publish.
** Visited __Johan Jernsletten__, __Aage Solbakk__ and __Kari Meløy__ who signed
   contracts. Will also make contracts with __Yngve Engkvist__,
   __Harald Gaski__, __Siri Broch Johansen__ and __Roald E. Kristiansen__.
   All in all this will give us quite a few books to work with.

!!Lene
* Ped project - status:
** waiting for __Tino__ to make some changes to the syntactic tags in the VISL games
** Saara is programming the morph-drill and question/answering-drill - we need
   good (funny?) names for the drills. Does anybody have suggestions?
** Saara is also creating an XML file for the drill lexicon
** I am doing other work while waiting for this. When Saara has finished the first
   versions of the programming, I will continue.
** The next big task after the drills is to continue the work with the dialogues


!!Maaren
* Put the list of possible {{sma}} corpus sources into a document
* update the ''Changes'' document


!!Per-Eric
* keep in contact with Kurt Tore's family about his texts.
** Nothing new
* try to find other authors who have smj texts in digital form, send contracts to them
** Nothing new
* Work on the missing list from the Bible texts.
** Not done
* Keep in contact with Ulf-Stefan Winka, who has many more smj texts to add.
** Have some new texts from Sigga Tuolja Sandström and from Lars-Matto Tuolja;
   I have made missing lists for both
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** Nothing done

!!Saara
* add new XSL/XML headers for proofing test docs
* Set up ways of adding meta-information for proofing correct corpus docs
  (source info, used in testing or not, added to lexicon or not)
* discuss more parallel texts

!!Sjur
* start to reorganise the documentation
* gather {{sma}} texts
* improve forrest stability with i18n, site look
* set up the Leopard Server features for collaborative support
* name db/risten.no
* investigate the NSIS installer
* make a first {{sma}} project plan
* publish corpus contracts and project infra as open-source on NoDaLi-sta
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
* other things:
** hyphenator bug hunting and reporting

!!Thomas
* look at test cases still not behaving properly
** not much done here
* add remaining hyphenation bugs to Bugzilla
** done
* lexicalise ''europarádeministarjuogos''
** done
* try to fix 636
** did not succeed
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** worked some

!!Tomi
* Hunspell lexicon conversion
* document how compounding is controlled in the PLX conversion
* fix double hyphen bugs
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Trond
* Reorganise documentation (with Børre and Sjur)
** Not done
* Gather sma texts (with Børre and Sjur)
** Not done
* Name lexicon project: Test editing xml files (when they are ready for it)
** Not done
* Work on {{sma}} analyser and visl integration
** Not so much here (but on smn, sms, sjd)
* [fix bugs!|http://giellatekno.uit.no/bugzilla].


!!!Pedagogical software online

__UiT/GT__ is developing their own language games & drills, in addition to the
VISL games. See Lene's report above. Links:

* [http://giellatekno.uit.no/oahpa/]
* [http://giellatekno.uit.no/ped/index.html]

Ped-speller? That is, a speller with restricted vocabulary and morphology (rare
words, names and forms are removed, possibly even compounding, i.e. with only
lexicalised compounds). Examples of problem pairs: ''boaris'' vs ''buoris'',
''vieljas'' vs ''vielljas''. With a smaller lexicon, it might be possible to
increase the complexity of the suggestion ("phonetic") rules.
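
As a rough illustration of the suggestion idea, here is a minimal Python sketch (not how the actual speller engine computes suggestions). It uses plain Levenshtein distance; the names {{edit_distance}}, {{suggest}} and {{PED_LEXICON}} are made up for this example, and the tiny lexicon only holds the two problem pairs mentioned above.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def suggest(word, lexicon, max_distance):
    """Return lexicon entries within max_distance of word, nearest first."""
    scored = sorted((edit_distance(word, entry), entry) for entry in lexicon)
    return [entry for dist, entry in scored if 0 < dist <= max_distance]


# Hypothetical restricted lexicon: only the problem pairs from the notes.
PED_LEXICON = {"boaris", "buoris", "vieljas", "vielljas"}
```

In this sketch ''vieljas'' and ''vielljas'' are one edit apart, while ''boaris'' and ''buoris'' are two edits apart, so the second pair is only linked when the allowed distance is raised to two. A small lexicon keeps the number of candidates at that distance manageable.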

South Sámi ped-prog should be discussed with the {{sma}} groups we'll meet
throughout the spring.

Note that Sámediggi has a new deadline in the autumn for pedagogical and
language-strengthening projects.

__TODO:__
* get an easy-to-remember URL (__UiT/IT__) 
* More thorough skin, layout, ... (__External person within the Ped team__,
  __Internal forrest expert__). We will postpone this until later
* Make a pedagogical speller (__Tomi__ when finished with his MA thesis)
** Add a flag !^P^ for forms to be excluded (__Thomas, Lene__)
** Turn off peripheral compounds (numbers, acros, perhaps names)
** Increase edit distance by one for suggestions? Only possible with limited
   compounding
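
The exclusion flag in the TODO list above could be applied with a simple filter when building the ped lexicon. This is only a sketch under assumptions: it assumes the flag is written at the end of the lexicon line it applies to, and the sample entries and continuation class names are hypothetical, not taken from the real lexicon files.

```python
PED_EXCLUDE_FLAG = "!^P^"


def ped_lexicon_lines(lines):
    """Keep only the lexicon lines that do not carry the exclusion flag."""
    return [line for line in lines if PED_EXCLUDE_FLAG not in line]


# Hypothetical sample entries; the continuation class names are made up.
SAMPLE_LINES = [
    "boaris HYPOTHETICAL-ADJ ;",
    "rareword HYPOTHETICAL-N ; !^P^",
    "buoris HYPOTHETICAL-ADJ ;",
]
```

Calling {{ped_lexicon_lines(SAMPLE_LINES)}} drops the flagged entry and keeps the other two in their original order.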


!!!Documentation

__TODO:__
* start to reorganise the documentation (__Børre, Sjur, Trond__)

!!!Corpus gathering

__TODO:__
* follow-up on the {{smj}} texts from __Kurt Tore__ (__Per-Eric__)
* get texts from __Sigga Tuolja Sandström__, possibly through __Olavi Korhonen__
  (contract is ok now) (__Per-Eric__)
* other contacts: Nord-Salten avis, Børge Strandskog, Lena Davidsson (daughter of
  Lars-Matto Tuolja)
* gather {{sma}} texts (__Børre, Sjur, Trond, Joseph__)
* Put the list of possible corpus sources into a document
  {{gt/doc/lang/sma/sma-corpus-plan.jspwiki}} (__Maaren__)
* give contract with blank fields to __Per-Eric__ (__Børre__)

!!!Future plans, directions and ideas

* more speller engines supported (to different degrees)
* more hyphenators supported
* grammar checker
** what the society wants
** it is interesting both for the university and SD
** it is a very good cooperation project
* tailored proofing tools
* machine translation (further work to something useful)
* cooperation with groups teaching Sámi, starting at UiT
* speech
* searching and indexing
* automatic (bilingual) lexicon building, semantics
* more public visibility & delivery
* more open-source technology (sfst, other?)

See also a separate document in {{plan/strat/5year.jspwiki}}.

!!!Infrastructure

To accommodate future enhancements in different directions:
# migrate to svn
# merge gt, kt and st into one, probably after the svn move
# more modularised make / build infra (prepare for smn, sms, sjd, others)
# close certain parts of the code repository (requires svn)
# set up the Leopard Server features for collaborative support:
## permanent chat rooms
## stored (and indexed) chat transcripts of the chat rooms
## iCal server / group calendars
## wiki
# wiki? (is part of Leopard Server) or other web-based documentation
# improve Forrest stability and i18n support
# reorganise the documentation & look
# migrate to XML
# sfst? Both as replacement for xfst and for hunspell/open-source proofing tools
# investigate the NSIS installer, potentially replacing the InstallShield
  package from Polderland
# corpus content moved to Max Planck repositories?


__TODO:__
* add Jabber account in iChat (__all__)
* prepare migration to svn (__Børre, Sjur, Trond__)


!!!Linguistics

!!North Sámi

(nothing new, see proofing bugs below)

!!Lule Sámi

(nothing new, see proofing bugs below)

__TODO:__
* {{sme->smj}} lexicon conversion to build bilingual lexicon resources, and
  increase {{smj}} coverage (__Trond, Svenne__). 
* Add the words when all words are ready.

!!South Sámi

Nothing new.

!!!Name lexicon infrastructure

__TODO:__
# fix i18n bug in risten.no/G5 (so they will work without the proper locale
  request) (__Sjur__)
## it works ok locally, set-up / config needs to be checked on the G5; probably
   easy to fix
### it works the same both locally and on the G5, relates to i18n setup in
    forrest
# fix bugs in lexc2xml; add comments to the log element (__Saara__)
# finish first version of the editing (__Sjur__)
# test editing of the xml files. If ok, then: (__Sjur, Thomas, Trond__)
# generate terms-smX.xml automatically from propernoun-sme-lex.xml (add nob as
  well) (the morphological section should be kept intact, in e.g.
  propernoun-sme-morph.txt) (__Sjur, Saara__)
# convert propernoun-($lang)-lex.txt to a derived file from common xml files
  (__Sjur, Tomi, Saara__)
# implement data synchronisation between [risten.no|http://www.risten.no] and
   the cvs repo, and possibly other servers (i.e. the G5 as an alternative server
   to the public risten.no - it might be faster and better suited than the
   official one; also local installations could be treated the same way)
# start to use the xml file as source file
# clean terms-sme.xml such that all names have the correct tag for their use
  (e.g. @type=secondary) (__Thomas, Maaren, linguists__)
# merge placenames which are erroneously in different entries: e.g. Helsinki,
  Helsingfors, Helsset (__linguists__)
# publish the name lexicon on risten.no (__Sjur__)
# add missing parallel names for placenames (__linguists__)
# add informative links between first names like Niillas and Nils
  (__linguists__)


!!!Proofing tools

!!Hunspell

__TODO:__
# add {{smj}} to the soup, make sure it works roughly as well as {{sme}}
# fix the remaining conversion bugs for {{sme}}
# return to {{smj}}, and fix whatever is left to fix
# integrate the derivations as separate "continuation lexicons"

!!Testing

!Spelling Error Markup

__TODO:__
* Set up ways of adding meta-information (source info, used in testing or not,
  added to lexicon or not) (__Saara__)
* test new and nested error markup (__Sjur__)

!!Speller bugs

Open issues based on test results:

!sme
Version: __Davvisámi, version 1.0.1, 2008-02-17__
* 426 - comp words from Divvun.no - ''guoktedássásaš'' accepted - still open
* 435 - roman number - single letter numbers now recognised
** we should pregenerate all numbers once and for all, and store them in a
   separate lexicon file
* 595 - prefix+name without hyphen (''ovdaLot'' instead of ''ovda-Lot'')
* 600 - __REGRESSION:__ gen+hyph compound ''sámi-dáru''
* 603 - suomabealdi, norggabealdi accepted
* 606 - speller accepts VUOHTA compound
* 607 - acro + hyphen
** ''NRKGA'' (acro + clitic) is accepted without colon - what is correct?
* 611 - double hyphen sugg still accepted
* 613 - short gen. as second compound part
* 619 - numerals and pronouns to NAMÁK and SASJ fails
* 627 - prefix + hyphen does not get accepted
* 629 - ''a'' taking part in compounding without hyphen
* 633 - double hyphens accepted in Word, not by cmdline speller
* 634 - PropGen+hyph+PropGen
* 641 - numeral+noun compounds
* 642 - noun/adj/proper + hyphen + ain
* 644 - cased numeral+numeral compound
* 646 - adverb + hyphen + noun
* 647 - numerals+NOUN
* 648 - unmotivated suggestions with numeral+noun
* 649 - name + adj compound without hyphen
* 654 - speller does not recognize ordinals on -nuppelogát
* 655 - pron + nai
* 658 - Suggestion saame
* 660 - abbr. not recognised

!smj
Version: __Julevsáme, version 1.0.1, 2008-02-14__
* 435 - roman number - single letter numbers now recognised
** we should pregenerate all numbers once and for all, and store them in a
   separate lexicon file
* 595 - prefix+name without hyphen (''tsåhkeLot'' instead of ''tsåhke-Lot'')
* 600 - __REGRESSION:__ gen+hyph compound ''sáme-dáro''
* 607 - acro + hyphen
** ''NRKGA'' (acro + clitic) is accepted without colon - what is correct?
* 616 - Bispadime-me-ráden - still __OPEN__, try to find an acro or abbr ''me''
* 619 - numerals and pronouns to NAMÁK and SASJ fails - still __OPEN__
* 629 - ''a'' taking part in compound - still __OPEN__
* 634 - Prop gen + hyphen + Prop gen - still __OPEN__
* 641 - numeral+noun compounds - still __OPEN__
* 644 - cased numeral+numeral compound
* 647 - numerals+NOUN
* 648 - unmotivated suggestions with numeral+noun
* 649 - name + adj compound without hyphen
* 650 - noun prefix+name compound without hyphen
* 658 - Suggestion saame

__TODO:__
* compile new speller lexicons (__Tomi__)
* document how compounding is controlled in the PLX conversion (__Tomi__)
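
For bug 435 (listed for both sme and smj above), the idea of pregenerating all roman numbers once and storing them in a separate lexicon file could be sketched as follows. This is a minimal Python sketch, not the project's tooling: the upper limit of 3999 (the largest value in standard notation) and the function names are assumptions, and writing the result in the actual lexicon file format is left out.

```python
ROMAN_VALUES = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(number):
    """Convert an integer in the range 1..3999 to a roman numeral string."""
    if not 1 <= number <= 3999:
        raise ValueError("number must be between 1 and 3999")
    parts = []
    for value, symbol in ROMAN_VALUES:
        # Take as many copies of this symbol as fit, then continue with the rest.
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


def roman_numerals(upper=3999):
    """Return all roman numerals from 1 up to and including upper."""
    return [to_roman(n) for n in range(1, upper + 1)]
```

The list returned by {{roman_numerals()}} could then be written out, one entry per line, in whatever format the lexicon file requires.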

!!Hyphenator bugs

Open issues based on test results:

!sme
* 468 - ''Márkomeanu'' -> Polderland - __FIXED__
* 548 - ''duostan'' -> Polderland - __FIXED__
* 549 - missing hyph at word boundary -> Polderland - __FIXED__
* 633 - extra hyphen inserted -> Divvun - __FIXED__

There are still some bugs found in the wordtypes test file. They should be added
to Bugzilla.

__TODO:__
* add remaining hyphenation bugs to Bugzilla (__Thomas__)

!smj
* 549 - missing hyph at word boundary -> Polderland - __FIXED__
* 633 - extra hyphen inserted -> Polderland - __FIXED__
* 636 - hyphen before last char -> Divvun

Possible solution:
{{{
define saveclitic  %# -> 0 || _ k .#. ;
}}}
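
Read as an xfst replace rule, the definition above deletes the symbol {{#}} when it stands directly before a word-final ''k''. For trying out the effect on plain strings, a rough Python emulation could look like this. It assumes that {{#}} is the hyphenation-point symbol in the hyphenator output; the function name and the test strings are made up for illustration.

```python
import re


def save_clitic(form):
    """Delete a '#' that stands immediately before a word-final 'k'."""
    # (?=k\Z) is a lookahead: the 'k' must be the last character of the string.
    return re.sub(r"#(?=k\Z)", "", form)
```

With this, {{save_clitic("ab#k")}} gives {{abk}}, while a {{#}} elsewhere in the string, or before a ''k'' that is not word-final, is left untouched.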

The wordtypes test file does contain another problem, but that one belongs to
Polderland, and is reported.

__TODO:__
* lexicalise ''europarádeministarjuogos'' (__Thomas__)
* try to fix 636 (__Thomas, Trond__)

!!InDesign tools

We're waiting for an update from Polderland.

!!Windows installer

This point is now moved to the section for future plans, and will be tackled as
time permits.

!!Releases

__TODO:__
* update the ''Changes'' document (__Børre__)
* documentation (__Sjur__)
** Norwegian translation received from Davvi Girji

!!!Other

!!Corpus contracts + open source

__TODO:__
* publish corpus contracts and project infra as open-source on NoDaLi-sta
  (__Sjur__)


!!!Next meeting, closing

The next meeting is 31.3.2008.

The meeting was closed at 13:16.

!!!Appendix - task lists for the next five days

!!Børre
[iCal|/doc/admin/weekly/2008/Tasks_2008-03-26_Boerre.ics]
* gather {{sma}} texts
* Hunspell lexicon conversion
* InDesign documentation
* update the ''Changes'' document
* prepare migration to svn (with __Sjur, Trond__)
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Lene
* Ped project
* Add a flag !^P^ for forms to be excluded from ped. speller

!!Maaren
[iCal|/doc/admin/weekly/2008/Tasks_2008-03-26_Maaren.ics]
* Put the list of possible {{sma}} corpus sources into a document


!!Per-Eric
[iCal|/doc/admin/weekly/2008/Tasks_2008-03-26_Per-Eric.ics]
* keep in contact with Kurt Tore's family about his texts.
* try to find other authors who have smj texts in digital form, send contracts to them
* Work on the missing list from Tjaktjalasta, Lars-Matto Tuolja.
* Work on the missing list from texts written by Sigga Tuolja Sandström.
* Keep in contact with Ulf-Stefan Winka, who has many more smj texts to add.
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Saara
[iCal|/doc/admin/weekly/2008/Tasks_2008-03-26_Saara.ics]
* add new XSL/XML headers for proofing test docs
* Set up ways of adding meta-information for proofing correct corpus docs
  (source info, used in testing or not, added to lexicon or not)
* discuss more parallel texts

!!Sjur
[iCal|/doc/admin/weekly/2008/Tasks_2008-03-26_Sjur.ics]
* gather {{sma}} texts
* name db/risten.no
* make an improved {{sma}} project plan
* publish corpus contracts and project infra as open-source on NoDaLi-sta
* prepare migration to svn (with __Børre, Trond__)
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Thomas
[iCal|/doc/admin/weekly/2008/Tasks_2008-03-26_Thomas.ics]
* look at test cases still not behaving properly
* try to fix 636
* Add a flag !^P^ for forms to be excluded from ped. speller
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Tomi
[iCal|/doc/admin/weekly/2008/Tasks_2008-03-26_Tomi.ics]
* Hunspell lexicon conversion
* document how compounding is controlled in the PLX conversion
* fix double hyphen bugs
* compile new speller lexicons
* Make a pedagogical speller (after MA thesis is delivered)
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Trond
[iCal|/doc/admin/weekly/2008/Tasks_2008-03-26_Trond.ics]
* Work on {{sma}} analyser and visl integration
* try to fix 636
* Prepare svn migration (with __Sjur, Børre__)
* [fix bugs!|http://giellatekno.uit.no/bugzilla].