!!!Meeting setup

* Date: 21.4.2008
* Time: 12.30 Norw. time
* Place: Internet
* Tools: SubEthaEdit, iChat/Skype

!!!Agenda

Cf. one of the following, depending on context:
* the upper bar of the SEE window (provided you use the JSPWiki syntax mode)
* the TOC in Forrest-rendered output, like HTML and PDF

!!!Opening, agenda review, participants

Opened at 09:42.

Present: __Børre, Per-Eric, Sjur, Thomas, Tomi, Trond__

Absent: __Jovsset__

Agenda accepted as is.

!!!Updated task status since last meeting

!!Børre

* Hunspell lexicon conversion
** New [test release available|http://divvun.no/static_files/hunspell-sme-2008-04-19.zip]
* InDesign documentation
** nothing done
* prepare migration to svn (with __Sjur, Trond__)
** done a conversion from cvs to svn, have to set up access.
* release hunspell public beta at the end of April (with __Sjur__)
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Lene

* Ped project
* Add a flag !^P^ for forms to be excluded from ped. speller

!!Maaren

* Put the list of possible {{sma}} corpus sources into a document

!!Per-Eric

* try to find other authors who have {{smj}} texts digitally, send them contracts
** Nothing new
* Work with missing list from texts written by Sigga Tuolja Sandström.
** Worked and still working
* Keep in contact with Ulf-Stefan Winka who has many more smj texts to add.
** Nothing new
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** Nothing new

!!Saara

* add new XSL/XML headers for proofing test docs
* Set up ways of adding meta-information for proofing correct corpus docs (source info, used in testing or not, added to lexicon or not)
* discuss more parallel texts

!!Sjur

* gather {{sma}} texts
** nothing new
* name db/risten.no
** lots of work, see details below
* make an improved {{sma}} project plan
** nothing new, going to Røros and Östersund this week
* publish corpus contracts and project infra as open-source on NoDaLi-sta
** nothing new
* prepare migration to svn (with __Børre, Trond__)
** nothing new
* add Jabber account in iChat for __Svenne__
** done
* release hunspell public beta at the end of April (with __Børre__)
** depends on the hunspell development
* update the ''Changes'' document
** nothing new
* discuss {{sma}} project public tender with __Julie__, as well as maintenance and support needs from Polderland
** done
* follow-up on some Polderland-related bugs: 621, 630, 652, 656
** some done, to be discussed during this meeting
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** no fixes last week

!!Thomas

* look at test cases still not behaving properly
** worked a little bit
* Hunspell: add {{smj}} to the soup, make sure it works roughly as well as {{sme}}
** Børre
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** worked a little bit

!!Tomi

* Hunspell lexicon conversion
** helped Børre a bit
* document how compounding is controlled in the PLX conversion
** not done
* fix double hyphen bugs
** not done
* Make a pedagogical speller (after MA thesis is delivered)
** not done
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** documented on some

!!Trond

* Help Jovsset with vislcg3 and sma
** Not done. Jovsset?
* Set up Jabber for Lene, Kimme, Saara
** Not done.
* Prepare svn migration (with __Sjur, Børre__)
** Discussed with Sjur and Børre, got answer from Steinar.
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** At least one bug fixed.

!!!Pedagogical software online

__TODO:__
* get an easy-to-remember URL (__UiT/IT__)
* More thorough skin, layout, ... (__External person within the Ped team__, __Internal Forrest expert__). This will be postponed until later.
* Make a pedagogical speller (__Tomi__ when finished with his MA thesis)
** Turn off peripheral compounds (numbers, acros, perhaps names)
** Increase editing distance by one for suggestions? Only possible with limited compounding

!!!Documentation

__TODO:__
* start to reorganise the documentation (__Børre, Sjur, Trond__)

!!!Corpus gathering

__TODO:__
* follow-up on the {{smj}} texts from __Kurt Tore__ (__Per-Eric__)
* get texts from __Sigga Tuolja Sandstrøm__, possibly through __Ulf-Stefan Winka__ (contract is ok now) (__Per-Eric__)
* other contacts: Nord-Salten avis (__Børge Strandskog__), Lena Davidsson, daughter of Lars-Matto Tuolja
* gather {{sma}} texts (__Børre, Sjur, Trond, Joseph__)
** gathered text should be given to __Børre__
* Put the list of possible corpus sources into a document {{gt/doc/lang/sma/sma-corpus-plan.jspwiki}} (__Maaren__)
* give contract with blank fields to __Per-Eric__ (__Børre__)

!!!Future plans, directions and ideas

See a separate document in {{plan/strat/5year.jspwiki}}.

!!!Infrastructure

To accommodate future enhancements in different directions (in rough order of importance):

# migrate to svn
# merge gt, kt and st into one, probably after the svn move
# more modularised make / build infra (prepare for smn, sms, sjd, others)
# close certain parts of the code repository (requires svn)
# set up the Leopard Server features for collaborative support:
## permanent chat rooms
## stored (and indexed) chat transcripts of the chat rooms
## iCal server / group calendars
## wiki
# wiki? (is part of Leopard Server) or other web-based documentation
# improve Forrest stability and i18n support
# reorganise the documentation content:
## differentiate between target groups
## get better grouping
## decide what to write in Forrest and what in wiki (cf. Apertium for a similar split)
## update/add missing parts
# migrate lexicons to XML, splitting the task
## Name lexica (the Name project)
## Dictionaries (already in XML, task is to integrate them)
## Open POSes (Komi as a test case)
# change the look of the documentation web
# sfst? Both as replacement for xfst and for hunspell/open-source proofing tools
# investigate the NSIS installer, potentially replacing the InstallShield package from Polderland
# corpus content moved to Max Planck repositories?

__TODO:__
* add Jabber account in iChat
** check that all accounts are ready for iChat on the G5 (__Børre__)
** UiT: Lene, Kimme, Saara (__Trond__)
** SD: Svenne (__Sjur__)
*** done
* prepare migration to svn (__Børre, Sjur, Trond__)
** there is a fairly robust Python script available at the Subversion home page to do the conversion job
* try to repair G5 accounts for iCal Server (__Børre__)

!!!Linguistics

!!North Sámi

(nothing new, see proofing bugs below)

!!Lule Sámi

(nothing new, see proofing bugs below)

__TODO:__
* {{sme->smj}} lexicon conversion to build bilingual lexicon resources, and increase {{smj}} coverage (__Trond, Svenne__).
* Add the words when all words are ready.

!!South Sámi

Nothing new since last week.

!!!Name lexicon infrastructure

Much work was done last week on the SD-terms collection and the editing code relating to it. All old legacy code was cleaned up and moved into the proper place, and deletion of entries was added.

Next up: finish the work on the SD-terms collection (editing), then look at the name lexicon and the dictionary search interface.
__TODO:__
# fix i18n bug in risten.no/G5 (so they will work without the proper locale request) (__Sjur__)
# fix bugs in lexc2xml; add comments to the log element (__Saara__)
# finish first version of the editing (__Sjur__)
# test editing of the xml files. If ok, then: (__Sjur, Thomas, Trond__)
# generate terms-smX.xml automatically from propernoun-sme-lex.xml (add nob as well) (the morphological section should be kept intact, in e.g. propernoun-sme-morph.txt) (__Sjur, Saara__)
# convert propernoun-($lang)-lex.txt to a derived file from common xml files (__Sjur, Tomi, Saara__)
# implement data synchronisation between [risten.no|http://www.risten.no] and the cvs repo, and possibly other servers (i.e. the G5 as an alternative server to the public risten.no - it might be faster and better suited than the official one; also local installations could be treated the same way)
# start to use the xml file as source file
# clean terms-sme.xml such that all names have the correct tag for their use (e.g. @type=secondary) (__Thomas, Maaren, linguists__)
# merge placenames which are erroneously in different entries: e.g. Helsinki, Helsingfors, Helsset (__linguists__)
# publish the name lexicon on risten.no (__Sjur__)
# add missing parallel names for placenames (__linguists__)
# add informative links between first names like Niillas and Nils (__linguists__)

!!!Proofing tools

!!Hunspell

Hunspell 1.2.2 released, several improvements and bug fixes. Hunspell 1.1.12 released as part of OOo 2.4.

__Børre__ released a new version today: [http://divvun.no/static_files/hunspell-sme-2008-04-19.zip] (sme only).
* se.dic: 14 MB
* se.aff: 35 385 B

__TODO:__
# review hunspell lexicon branch, merge with trunk if ok (__Børre, Thomas__)
# test latest lexicon (__Sjur__)
# add {{smj}} to the soup, make sure it works roughly as well as {{sme}} (__Børre, Thomas, Per-Eric__)
## added to derivations, needs to be tested
# fix the remaining conversion bugs for {{sme}} (__Børre, Tomi__)
# return to {{smj}}, and fix whatever is left to fix (__Børre, Tomi__)
# release a public beta at the end of April (__Børre, Sjur__)

!!Testing

!Spelling Error Markup

__TODO:__
* Set up ways of adding meta-information (source info, used in testing or not, added to lexicon or not) (__Saara__)
* test new and nested error markup (__Sjur__)

!!Speller bugs

List of bugs returned from Polderland: 621, 630, 652, 656, 676.

Open issues based on test results:

!sme

Version: __Davvisámi, version 1.0.1, 2008-04-01__

* 425 - __REGRESSION:__ other words from Divvun.no - two words rejected
* 426 - comp words from Divvun.no - ''guoktedássásaš'' accepted - still __OPEN__
* 435 - roman numbers - __REGRESSION:__ inflection of single letter numbers rejected (but is ok in {{smj}})
** we should pregenerate all numbers once and for all, and store them in a separate lexicon file
* 452 - __REGRESSION:__ several lexical bugs - ''oažžuin'' + ''ožžuin''
* 595 - prefix+name without hyphen (''ovdaLot'' instead of ''ovda-Lot'') - still __OPEN__
* 600 - gen+hyph compound ''sámi-dáru'' - still __OPEN__
* 603 - suomabealdi accepted - still __OPEN__
* 606 - speller accepts VUOHTA compound - still __OPEN__
* 607 - acro + hyphen - still __OPEN__
** ''NRKGA'' is acro + clitic accepted without colon - what is correct?
* 611 - double hyphen sugg still accepted - still __OPEN__
* 613 - short gen. as second compound part - still __OPEN__
* 619 - numerals and pronouns to NAMÁK and SASJ fails - still __OPEN__
* 627 - prefix + hyphen does not get accepted - still __OPEN__
* 629 - ''a'' taking part in compounding without hyphen - still __OPEN__
* 634 - PropGen+hyph+PropGen - still __OPEN__
* 641 - numeral+noun compounds - still __OPEN__
* 642 - noun/adj/proper + hyphen + ain - still __OPEN__
* 644 - cased numeral+numeral compound - still __OPEN__
* 646 - adverb + hyphen + noun - still __OPEN__
* 647 - numerals+NOUN - still __OPEN__
* 648 - unmotivated suggestions with numeral+noun - still __OPEN__
* 649 - name + adj compound without hyphen - still __OPEN__
* 654 - speller does not recognize ordinals on -nuppelogát - still __OPEN__
* 655 - pron + nai - still __OPEN__
* 658 - Suggestion saame - still __OPEN__

!smj

Version: __Julevsáme, version 1.0.1, 2008-04-01__

* 435 - roman number - single letter numbers now recognised
** we should pregenerate all numbers once and for all, and store them in a separate lexicon file
** please note that ''inflection'' of single letter numerals is __fine__ in {{smj}}, as opposed to {{sme}}
* 595 - prefix+name without hyphen (''tsåhkeLot'' instead of ''tsåhke-Lot'') - still __OPEN__
* 600 - gen+hyph compound ''sáme-dáro'' - still __OPEN__
* 607 - acro + hyphen
** ''NRKGA'' is acro + clitic accepted without colon - what is correct?
* 616 - Bispadime-me-ráden - still __OPEN__, try to find an acro or abbr ''me''
* 619 - numerals and pronouns to NAMÁK and SASJ fails - still __OPEN__
* 629 - ''a'' taking part in compound - still __OPEN__
* 634 - Prop gen + hyphen + Prop gen - still __OPEN__
* 641 - numeral+noun compounds - still __OPEN__
* 644 - cased numeral+numeral compound - still __OPEN__
* 647 - numerals+NOUN - still __OPEN__
* 648 - unmotivated suggestions with numeral+noun - still __OPEN__
* 649 - name + adj compound without hyphen - still __OPEN__
* 650 - noun prefix+name compound without hyphen - still __OPEN__
* 658 - Suggestion saame - still __OPEN__

__TODO:__
* compile new speller lexicons (__Tomi__)
* document how compounding is controlled in the PLX conversion (__Tomi__)

!!Hyphenator bugs

Open issues based on test results:

!sme

Lexicon version: __Davvisámi, version 1.0.1, 2008-04-01__

* 468 - __REGRESSION:__ ''Márkomeanu''
* 547 - __REGRESSION:__ hyphen in front of vowel: ''Lotnolasealáhusas''
* 548 - __REGRESSION:__ mid syllable hyphenation: ''Háliidivččen''
* 549 - __REGRESSION:__ division without hyph: ''Váccedettiin''
* 673 - adj-derivations: ''guovttenuppelotčoarvvagiin'' (the word is not rec.)
* 677 - __NEW:__ Wrongly hyphenated ending -danidja - invalid

!smj

Lexicon version: __Julevsáme, version 1.0.1, 2008-04-01__

* 545 - __REGRESSION:__ bad hyphenation in compounds: ''åhpadusorganisásjåvnån'' (not recognised)
* 546 - __REGRESSION:__ obligatory hyph rules seem to work in facultative manner: ''organisásjåvnån'' (not recognised)
* 547 - __REGRESSION:__ hyphen in front of vowel: ''Jienastimnjuolgadusá'' and ''Orgánajs''

!!InDesign tools

We're waiting for an update from Polderland.
!!Releases

__TODO:__
* update the ''Changes'' document (__Sjur__)
* InDesign documentation (__Sjur__)
** Norwegian translation received from Davvi Girji
* public hunspell beta during first week of May
* public 1.1 update of the Polderland-based tools during May

!!!Other

!!Corpus contracts + open source

Now decided to wait until we have changed from {{cvs}} to {{svn}}.

__TODO:__
* publish corpus contracts and project infra as open-source on NoDaLi-sta (__Sjur__)

!!Travel to Røros and Östersund 22-25 April

Dates ok by:
* Sjur
* Jovsset
* Trond

!!!Next meeting, closing

The next meeting is 28.4.2008. The meeting was closed at 10:27.

!!!Appendix - task lists for the next five days

!!Boerre

[iCal|/doc/admin/weekly/2008/Tasks_2008-04-21_Boerre.ics]

* Hunspell lexicon conversion
* prepare migration to svn (with __Sjur, Trond__)
* release hunspell public beta at the end of April (with __Sjur__)
* review hunspell lexicon branch (with __Thomas__), merge with trunk if ok
* try to repair G5 accounts for iCal Server
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Lene

* Ped project

!!Maaren

* Put the list of possible {{sma}} corpus sources into a document

!!Per-Eric

[iCal|/doc/admin/weekly/2008/Tasks_2008-04-21_Per-Eric.ics]

* try to find other authors who have {{smj}} texts digitally, send them contracts
* Work with missing list from texts written by Sigga Tuolja Sandström.
* Work with missing list same_dutkama_pgr.txt
* Work with missing list sameriekta_tjoahkkagæsos.txt
* Keep in contact with Ulf-Stefan Winka who has many more smj texts to add.
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Saara

* add new XSL/XML headers for proofing test docs
* Set up ways of adding meta-information for proofing correct corpus docs (source info, used in testing or not, added to lexicon or not)
* discuss more parallel texts

!!Sjur

[iCal|/doc/admin/weekly/2008/Tasks_2008-04-21_Sjur.ics]

* gather {{sma}} texts
* name db/risten.no
* make an improved {{sma}} project plan
* publish corpus contracts and project infra as open-source on NoDaLi-sta
* prepare migration to svn (with __Børre, Trond__)
* release hunspell public beta at the end of April (with __Børre__)
* update the ''Changes'' document
* follow-up on some Polderland-related bugs: 621, 630, 652, 656
* test latest hunspell lexicon
* InDesign documentation
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Thomas

[iCal|/doc/admin/weekly/2008/Tasks_2008-04-21_Thomas.ics]

* look at test cases still not behaving properly
* review hunspell lexicon branch with __Børre__
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Tomi

[iCal|/doc/admin/weekly/2008/Tasks_2008-04-21_Tomi.ics]

* Hunspell lexicon conversion
* document how compounding is controlled in the PLX conversion
* fix double hyphen bugs
* Make a pedagogical speller (after MA thesis is delivered)
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Trond

[iCal|/doc/admin/weekly/2008/Tasks_2008-04-21_Trond.ics]

* Help Jovsset with vislcg3 and sma
* Set up Jabber for Lene, Kimme, Saara
* Prepare svn migration (with __Sjur, Børre__)
* [fix bugs!|http://giellatekno.uit.no/bugzilla]