!!!Meeting setup

* Date: 3.3.2008
* Time: 09.30 Norw. time
* Place: Internet
* Tools: SubEthaEdit, iChat/Skype

!!!Agenda

Cf. one of the following, depending on context:

* the upper bar of the SEE window (provided you use the JSPWiki syntax mode)
* the TOC in Forrest-rendered output, like HTML and PDF

!!!Opening, agenda review, participants

Opened at 10:30.

Present: __Børre, Per-Eric, Sjur, Thomas, Tomi__

Absent: __Maaren, Trond__

Agenda accepted as is.

!!!Updated task status since last meeting

!!Børre

* start to reorganise the documentation
** not done
* gather {{sma}} texts
** not done
* improve forrest stability with i18n, site look
** fixed bug with pdf-generation
* set up the Leopard Server features for collaborative support
** not done
* Hunspell lexicon conversion
** nothing this week
* investigate the NSIS installer
** nothing done
* work on Tromsø Sami workshop paper
** prepared and gave the talk together with the others
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Lene

* Ped project
* work on Tromsø Sami workshop paper

!!Maaren

* Put the list of possible {{sma}} corpus sources into a document
* update the ''Changes'' document

!!Per-Eric

* check some unusual and missing words from the last Olavi missing list
** Done
* keep in contact with Kurt Tore's family about his texts
** Nothing
* try to find other authors who have {{smj}} texts in digital form
** Talked to Lena Davidsson, who holds the copyright to Lars-Matto Tuolja's book Tjaktjalasta. It's okay to use the book, and I am going to send her a contract as soon as possible.
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** Nothing done

!!Saara

* add new XSL/XML headers for proofing test docs
* Set up ways of adding meta-information for proofing correct corpus docs (source info, used in testing or not, added to lexicon or not)
* discuss more parallel texts

!!Sjur

* start to reorganise the documentation
* gather {{sma}} texts
* improve forrest stability with i18n, site look
* set up the Leopard Server features for collaborative support
* name db/risten.no
** some work (details only)
* investigate the NSIS installer
* make a first {{sma}} project plan
* publish corpus contracts and project infra as open-source on NoDaLi-sta
* work on Tromsø Sami workshop paper
** done and presented
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Thomas

* look at test cases still not behaving properly
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Tomi

* Hunspell lexicon conversion
* document how compounding is controlled in the PLX conversion
* work on Tromsø Sami workshop paper
* fix double hyphen bugs
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Trond

* Reorganise documentation (with Børre and Sjur)
* Gather {{sma}} texts (with Børre and Sjur)
* Name lexicon project: Test editing xml files (when they are ready for it)
* Work on {{sma}} analyser and visl integration
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!!Pedagogical software online

__TODO:__

* get an easy-to-remember URL (__UiT/IT__)
* More thorough skin, layout, ... (__External person within the Ped team__, __Internal forrest expert__)

This is postponed until later.

!!!Workshop in Tromsø

Nice workshop, very informative; several potential projects were discussed with multiple attendees. The first step will be a network initiative.

!!!Documentation

__TODO:__

* start to reorganise the documentation (__Børre, Sjur, Trond__)

!!!Corpus gathering

__TODO:__

* follow-up on the {{smj}} texts from __Kurt Tore__ (__Per-Eric__)
* get texts from __Sigga Tuolja Sandstrøm__, possibly through __Olavi Korhonen__ (contract is ok now) (__Per-Eric__)
* other contacts: Nord-Salten avis, Børge Strandskog, Lena Davidsson (daughter of Lars-Matto Tuolja)
* gather {{sma}} texts (__Børre, Sjur, Trond, Joseph__)
* Put the list of possible corpus sources into a document {{gt/doc/lang/sma/sma-corpus-plan.jspwiki}} (__Maaren__)
* give contract with blank fields to __Per-Eric__ (__Børre__)

!!!Infrastructure

__TODO:__

* add Jabber account in iChat (__all__)
* improve forrest stability with i18n, site look (__Børre, Sjur, Tomi__)
* set up the Leopard Server features for collaborative support - permanent chat rooms, project calendar(s), wiki? (__Børre, Sjur__)

!!!Linguistics

!!North Sámi

Hyphenation bugs are still there, now properly documented by the improved test bench.

!!Lule Sámi

All missing lists are now added. A few things are missing from the Bible texts.

Hyphenation: same as for {{sme}}.

__TODO:__

* {{sme->smj}} lexicon conversion to build bilingual lexicon resources, and increase {{smj}} coverage (__Trond, Svenne__).
* Add the words when all words are ready.

!!South Sámi

__Tomi__ has added even more verbs. The corrections to ''Verbh'' are now added to the lexicon. __Joseph__ is hunting for {{sma}} texts at SD.

__TODO:__

* check the present sources (__Sjur, Trond__)
** done

!!!Name lexicon infrastructure

__TODO:__

# fix i18n bug in risten.no/G5 (so they will work without the proper locale request) (__Sjur__)
## it works ok locally, set-up / config needs to be checked on the G5; probably easy to fix
# fix display in column 3 (__Sjur__)
## it works in Firefox and other Mozilla-based browsers; not in Safari and other Webkit-based browsers
### fixed
# fix bugs in lexc2xml; add comments to the log element (__Saara__)
# finish first version of the editing (__Sjur__)
# test editing of the xml files. If ok, then: (__Sjur, Thomas, Trond__)
# make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as well) (the morphological section should be kept intact, in e.g. propernoun-sme-morph.txt) (__Sjur, Saara__)
# convert propernoun-($lang)-lex.txt to a derived file from common xml files (__Sjur, Tomi, Saara__)
# implement data synchronisation between [risten.no|http://www.risten.no] and the cvs repo, and possibly other servers (i.e. the G5 as an alternative server to the public risten.no - it might be faster and better suited than the official one; also local installations could be treated the same way)
# start to use the xml file as source file
# clean terms-sme.xml such that all names have the correct tag for their use (e.g. @type=secondary) (__Thomas, Maaren, linguists__)
# merge placenames which are erroneously in different entries: e.g. Helsinki, Helsingfors, Helsset (__linguists__)
# publish the name lexicon on risten.no (__Sjur__)
# add missing parallel names for placenames (__linguists__)
# add informative links between first names like Niillas and Nils (__linguists__)

!!!Proofing tools

!!Hunspell

The %> marker has been fixed; it was a bug in the homebuilt xml-parser. Derivations seem to work, but there are no new lexicons due to technical problems.

__TODO:__

# debug the missing > marker - the problem is on the Java side (__Børre, Tomi__)
# add {{smj}} to the soup, make sure it works roughly as well as {{sme}}
# fix the remaining conversion bugs for {{sme}}
# return to {{smj}}, and fix whatever is left to fix
# integrate the derivations as separate "continuation lexicons"

!!Testing

!Spelling Error Markup

__TODO:__

* Set up ways of adding meta-information (source info, used in testing or not, added to lexicon or not) (__Saara__)
* test new and nested error markup (__Sjur__)

!!Speller bugs

Open issues based on test results:

!sme

Version: __Davvisámi, version 1.0.1, 2008-01-31__

* 425 - roman number - will not be fixed in 1.0 release - __FIXED__
* 426 - comp words from Divvun.no - ''guoktedássásaš'' accepted - still open
* 536 - speller accepts "impossible" compound-forms, ''geažideapmigárvu'' and ''giddesteapmisággi'' accepted - __FIXED__
* 593 - missing words in beta2 - __FIXED__
* 595 - prefix+name without hyphen (''ovdaLot'' instead of ''ovda-Lot'')
* 597 - does not recognize nubbelohki - __FIXED__
* 603 - suomabealdi, norggabealdi accepted
* 606 - speller accepts VUOHTA compound
* 611 - double hyphen sugg still accepted
* 613 - short gen. as second compound part
* 619 - __REGRESSION:__ - numerals and pronouns to NAMÁK and SASJ fails
* 625 - word+footnote - possibly Polderland or MS, or a consequence of allowing spell checking of words including digits
* 627 - prefix + hyphen does not get accepted
* 629 - ''a'' taking part in compounding without hyphen
* 631 - numbers starting with 0 - __FIXED__
* 633 - double hyphens accepted in Word, not by cmdline speller
* 634 - PropGen+hyph+PropGen
* 637 - nai(go) becomes -naj(go) - __FIXED__
* 641 - numeral+noun compounds
* 642 - noun/adj/proper + hyphen + ain

!smj

Version: __Julevsáme, version 1.0.1, 2008-01-31__

* 482 - ''Nuorttalijguovlojn'' accepted again
** testcase changed, test __PASSED__
* 607 - acro + hyphen, ''NRKGA'' accepted - test pair is wrong, should be corr.
* 615 - actio and actor compounds - __FIXED__
* 616 - Bispadime-me-ráden - still __OPEN__
* 618 - dipht. simpl. - __FIXED__
* 619 - __REGRESSION:__ - numerals and pronouns to NAMÁK and SASJ fails
* 629 - ''a'' taking part in compound - still __OPEN__
* 631 - number compounds starting with 0 - __FIXED__
* 634 - Prop gen + hyphen + Prop gen
* 641 - numeral+noun compounds

__TODO:__

* look at test cases still not behaving properly (__Thomas, Tomi__)
* document how compounding is controlled in the PLX conversion (__Tomi__)

!!Hyphenator bugs

Open issues based on test results:

!sme

* 468 - ''Márkomenau'' -> Polderland
* 548 - ''duostan'' -> Polderland
* 549 - missing hyph at word boundary -> Polderland
* 633 - extra hyphen inserted -> Polderland

!smj

* 549 - missing hyph at word boundary -> Polderland
* 633 - extra hyphen inserted -> Polderland
* 636 - hyphen before last char -> Polderland

!!InDesign tools

Near-final tools were released on Friday, Feb. 1, including working user dictionary functionality.

__TODO:__

* fix double hyphen bugs (__Tomi__)

!!Windows installer

__TODO:__

* investigate the NSIS installer (__Børre, Sjur__)

!!Releases

__TODO:__

* update the ''Changes'' document (__Maaren__)
* documentation (__Sjur__)
** Norwegian translation received from Davvi Girji

!!!Other

!!Corpus contracts + open source

__TODO:__

* publish corpus contracts and project infra as open-source on NoDaLi-sta (__Sjur__)

!!Easter vacation

Week 12, 20-24.3.

|| Name || Period
| Børre | 10.-16.3. (winter holidays)
| Sjur | 18.-24.3.
| Jovsset | 17.-24.3.
| Maaren | 19.-24.3.
| Per-Eric | ???

!!!Next meeting, closing

The next meeting is on 10.3.2008, but may be delayed until Wednesday. An e-mail will follow.

The meeting was closed at 11:25.

!!!Appendix - task lists for the next five days

!!Boerre

[iCal|/doc/admin/weekly/2008/Tasks_2008-03-03_Boerre.ics]

* start to reorganise the documentation
* gather {{sma}} texts
* improve forrest stability with i18n, site look
* set up the Leopard Server features for collaborative support
* Hunspell lexicon conversion
* InDesign documentation
* investigate the NSIS installer
* give contract with blank fields to __Per-Eric__
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Lene

* Ped project

!!Maaren

[iCal|/doc/admin/weekly/2008/Tasks_2008-03-03_Maaren.ics]

* Put the list of possible {{sma}} corpus sources into a document
* update the ''Changes'' document

!!Per-Eric

[iCal|/doc/admin/weekly/2008/Tasks_2008-03-03_Per-Eric.ics]

* keep in contact with Kurt Tore's family about his texts
* try to find other authors who have {{smj}} texts in digital form, send contracts to them
* Work on the few things missing from the Bible texts
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Saara

[iCal|/doc/admin/weekly/2008/Tasks_2008-03-03_Saara.ics]

* add new XSL/XML headers for proofing test docs
* Set up ways of adding meta-information for proofing correct corpus docs (source info, used in testing or not, added to lexicon or not)
* discuss more parallel texts

!!Sjur

[iCal|/doc/admin/weekly/2008/Tasks_2008-03-03_Sjur.ics]

* start to reorganise the documentation
* gather {{sma}} texts
* improve forrest stability with i18n, site look
* set up the Leopard Server features for collaborative support
* name db/risten.no
* investigate the NSIS installer
* make a first {{sma}} project plan
* publish corpus contracts and project infra as open-source on NoDaLi-sta
* call Julie Eira about Maaren's share of the Divvun project
* prepare Trondheim presentation
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Thomas

[iCal|/doc/admin/weekly/2008/Tasks_2008-03-03_Thomas.ics]

* look at test cases still not behaving properly
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Tomi

[iCal|/doc/admin/weekly/2008/Tasks_2008-03-03_Tomi.ics]

* Hunspell lexicon conversion
* document how compounding is controlled in the PLX conversion
* fix double hyphen bugs
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Trond

[iCal|/doc/admin/weekly/2008/Tasks_2008-03-03_Trond.ics]

* Reorganise documentation (with Børre and Sjur)
* Gather {{sma}} texts (with Børre and Sjur)
* Name lexicon project: Test editing xml files (when they are ready for it)
* Work on {{sma}} analyser and visl integration
* [fix bugs!|http://giellatekno.uit.no/bugzilla]