!!!Meeting setup

* Date: 25.2.2008
* Time: 09.30 Norw. time
* Place: Internet
* Tools: SubEthaEdit, iChat/Skype

!!!Agenda

Cf. one of the following, depending on context:

* the upper bar of the SEE window (provided you use the JSPWiki syntax mode)
* the TOC in Forrest-rendered output, like HTML and PDF

!!!Opening, agenda review, participants

Opened at 09:43.

Present: __Børre, Per-Eric, Sjur, Thomas, Tomi__

Absent: __Maaren, Trond__

Agenda accepted as is.

!!!Updated task status since last meeting

!!Børre

* start to reorganise the documentation
* gather {{sma}} texts
* improve forrest stability with i18n, site look
** had to disable the pdf link, forrest crashes when it's available. Investigating the reason for the crash.
* set up the Leopard Server features for collaborative support
* Hunspell lexicon conversion
** Derivation is implemented, but not tested, due to technical difficulties building the hunspell lexicons
* investigate the NSIS installer
* work on Tromsø Sami workshop paper
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Lene

* Ped project
* work on Tromsø Sami workshop paper

!!Maaren

* Put the list of possible {{sma}} corpus sources into a document
* update the ''Changes'' document

!!Per-Eric

* check some unusual and missing words from the last Olavi missing list
** Worked on it and still working, ready in a few days
* keep the contact with Kurt Tore's family about his texts.
** Nothing new from them
* try to find other authors who have smj texts digitally
** Contacted some, but nothing which we can start to work with yet. Nord-Salten avis, Matto Tuolja's daughter Lena Davidsson.
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** Fixed smj bug 495

!!Saara

* add new XSL/XML headers for proofing test docs
* Set up ways of adding meta-information for proofing correct corpus docs (source info, used in testing or not, added to lexicon or not)
* discuss more parallel texts

!!Sjur

* start to reorganise the documentation
* gather {{sma}} texts
* improve forrest stability with i18n, site look
* set up the Leopard Server features for collaborative support
* check the present {{sma}} sources
** done
* name db/risten.no
** worked on it
* investigate the NSIS installer
* make a first {{sma}} project plan
** partially done, as part of the Snåsa presentation, not finished
* publish corpus contracts and project infra as open-source on NoDaLi-sta
* work on Tromsø Sami workshop paper
** done, not finished
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Thomas

* look at test cases still not behaving properly
** not much done
* work on Tromsø Sami workshop paper
** not participating
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** not much done

!!Tomi

* Hunspell lexicon conversion
* document how compounding is controlled in the PLX conversion
* work on Tromsø Sami workshop paper
* fix double hyphen bugs
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Trond

* Report the smesmj project
** Done
* Start working on the samdoc talk
** Working, working
* {{sme->smj}} lexicon conversion to build bilingual lexicon resources
** Hmm, does not remember this issue.
* Reorganise documentation (with Børre and Sjur)
** Not done
* Gather sma texts (with Børre and Sjur)
** Not done
* Look at the sma source files (with Sjur)
** Huge progress here. Now verbs included, in principle.
* Name lexicon project: Test editing xml files (when they are ready for it)
* Make a first {{sma}} project plan
* work on Tromsø Sami workshop paper
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!!Pedagogical software online

__TODO:__

* Setting up the user documentation with an external address, and cross-reference via tabs to giellatekno and divvun. (__Børre__)
** [done|http://giellatekno.uit.no/oahpa/]
* get an easy-to-remember URL (__UiT/IT__)
* More thorough skin, layout, ... (__External person within the Ped team__, __Internal forrest expert__)

This will be postponed until later.

!!!Workshop in Tromsø, end of February

List of topics, authors and presenters (*):

* corpinfra -> Saara*, Sjur, Børre
* testinfra -> Saara, Sjur*, Lene
* makeinfra -> Saara*, Sjur
* gramm -> Lene*, Trond
* ped -> Lene*, Trond
* MT -> Trond*
* spell -> Sjur*
* docu -> Børre*
* corpcontent -> Lene, Børre
* new lgs -> Trond*

10 topics in 45 mins = 4.5 mins on each topic.

Dry test presentation on Wednesday afternoon? Sjur is arriving at 11 at the airport, Trond has a meeting at 14. Perhaps after the meeting, to allow some time to write the final bits of the presentation first? At 16 or 17, roughly.

__TODO:__

* Presentation of our work
** Basic tools (__Sjur, Trond__)
** Applications (__Lene, Sjur__)
** Corpus infrastructure (__Børre, Saara, Sjur__)
** Overall infrastructure ("Makefile") (__Sjur, Tomi__)
* Plans for future work (__Sjur, Trond__)
* Relevance for other projects
** Standard written language texts (__Trond__)
** Existing written dialect texts (__Lene, Trond__)
** Existing dialect recordings (__Lene__)
* Turn the text into slides (samdoc08.tex into samdoc08-sem.tex) (__Trond__)

!!!Documentation

__TODO:__

* start to reorganise the documentation (__Børre, Sjur, Trond__)

!!!Corpus gathering

__TODO:__

* follow-up on the {{smj}} texts from __Kurt Tore__ (__Per-Eric__)
* get texts from __Sigga Tuolja Sandstrøm__ (__Per-Eric__)
* gather {{sma}} texts (__Børre, Sjur, Trond__)
* Put the list of possible corpus sources into a document {{gt/doc/lang/sma/sma-corpus-plan.jspwiki}} (__Maaren__)

!!!Infrastructure

__TODO:__

* add Jabber account in iChat (__all__)
* improve forrest stability with
i18n, site look (__Børre, Sjur, Tomi__)
* set up the Leopard Server features for collaborative support - permanent chat rooms, project calendar(s), wiki? (__Børre, Sjur__)

!!!Linguistics

!!North Sámi

Hyphenation bugs still there, now properly documented by the improved test bench.

!!Lule Sámi

Hyphenation: same as for {{sme}}.

__TODO:__

* {{sme->smj}} lexicon conversion to build bilingual lexicon resources, and increase {{smj}} coverage (__Trond, Svenne__).
* Add the words when all words are ready.

!!South Sámi

Done:

* __Tomi__ has added a lot of verbs (1512)
* __Joseph__ is hunting for {{sma}} texts at SD

__TODO:__

* check the present sources (__Sjur, Trond__)

!!!Name lexicon infrastructure

__TODO:__

# fix i18n bug in risten.no/G5 (so they will work without the proper locale request) (__Sjur__)
## it works ok locally, set-up / config needs to be checked on the G5; probably easy to fix
### looked at it
# fix display in column 3 (__Sjur__)
## it works in Firefox and other Mozilla-based browsers; not in Safari and other Webkit-based browsers
### looked at it
# fix bugs in lexc2xml; add comments to the log element (__Saara__)
# finish first version of the editing (__Sjur__)
# test editing of the xml files. If ok, then: (__Sjur, Thomas, Trond__)
# make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as well) (the morphological section should be kept intact, in e.g. propernoun-sme-morph.txt) (__Sjur, Saara__)
# convert propernoun-($lang)-lex.txt to a derived file from common xml files (__Sjur, Tomi, Saara__)
# implement data synchronisation between [risten.no|http://www.risten.no] and the cvs repo, and possibly other servers (ie the G5 as an alternative server to the public risten.no - it might be faster and better suited than the official one; also local installations could be treated the same way)
# start to use the xml file as source file
# clean terms-sme.xml such that all names have the correct tag for their use (e.g.
@type=secondary) (__Thomas, Maaren, linguists__)
# merge placenames which are erroneously in different entries: e.g. Helsinki, Helsingfors, Helsset (__linguists__)
# publish the name lexicon on risten.no (__Sjur__)
# add missing parallel names for placenames (__linguists__)
# add informative links between first names like Niillas and Nils (__linguists__)

!!!Proofing tools

!!Hunspell

The %> marker does not survive into Hunspell to work as a boundary marker, despite being defined as %> for the Hunspell version.

__TODO:__

# debug the missing > marker - the problem is on the Java side (__Børre, Tomi__)
# add {{smj}} to the soup, make sure it works roughly as well as {{sme}}
# fix the remaining conversion bugs for {{sme}}
# return to {{smj}}, and fix whatever is left to fix
# integrate the derivations as separate "continuation lexicons"

!!Testing

!Spelling Error Markup

__TODO:__

* Set up ways of adding meta-information (source info, used in testing or not, added to lexicon or not) (__Saara__)
* test new and nested error markup (__Sjur__)

!!Speller bugs

Open issues based on test results:

!sme

Version: __Davvisámi, version 1.0.1, 2008-01-31__

* 425 - roman number - will not be fixed in 1.0 release - __FIXED__
* 426 - comp words from Divvun.no - ''guoktedássásaš'' accepted - still open
* 536 - speller accepts "impossible" compound-forms, ''geažideapmigárvu'' and ''giddesteapmisággi'' accepted - __FIXED__
* 593 - missing words in beta2 - __FIXED__
* 595 - prefix+name without hyphen (''ovdaLot'' instead of ''ovda-Lot'')
* 597 - does not recognize nubbelohki - __FIXED__
* 603 - suomabealdi, norggabealdi accepted
* 606 - speller accepts VUOHTA compound
* 611 - double hyphen sugg still accepted
* 613 - short gen.
as second compound part
* 619 - __REGRESSION:__ - numerals and pronouns to NAMÁK and SASJ fails
* 625 - word+footnote - possibly Polderland or MS, or a consequence of allowing spell checking of words including digits
* 627 - prefix + hyphen does not get accepted
* 629 - ''a'' taking part in compounding without hyphen
* 631 - numbers starting with 0 - __FIXED__
* 633 - double hyphens accepted in Word, not by cmdline speller
* 634 - PropGen+hyph+PropGen
* 637 - nai(go) becomes -naj(go) - __FIXED__
* 641 - numeral+noun compounds
* 642 - noun/adj/proper + hyphen + ain

!smj

Version: __Julevsáme, version 1.0.1, 2008-01-31__

* 482 - ''Nuorttalijguovlojn'' accepted again
** testcase changed, test __PASSED__
* 607 - acro + hyphen, ''NRKGA'' accepted - test pair is wrong, should be corr.
* 615 - actio and actor compounds - __FIXED__
* 616 - Bispadime-me-ráden - still __OPEN__
* 618 - dipht. simpl. - __FIXED__
* 619 - __REGRESSION:__ - numerals and pronouns to NAMÁK and SASJ fails
* 629 - ''a'' taking part in compound - still __OPEN__
* 631 - number compounds starting with 0 - __FIXED__
* 634 - Prop gen + hyphen + Prop gen
* 641 - numeral+noun compounds

__TODO:__

* look at test cases still not behaving properly (__Thomas, Tomi__)
* document how compounding is controlled in the PLX conversion (__Tomi__)

!!Hyphenator bugs

Open issues based on test results:

!sme

* 468 - ''Márkomenau'' -> Polderland
* 548 - ''duostan'' -> Polderland
* 549 - missing hyph at word boundary -> Polderland
* 633 - extra hyphen inserted -> Polderland

!smj

* 549 - missing hyph at word boundary -> Polderland
* 633 - extra hyphen inserted -> Polderland
* 636 - hyphen before last char -> Polderland

!!InDesign tools

Near-final tools were released on Friday, Feb. 1, including working user dictionary functionality.

__TODO:__

* test twolc hash-mark bug solution (__Tomi, Trond, Sjur__)
** done - it worked fine, and is the only possible solution due to special treatment of this character in {{twolc}}
* fix double hyphen bugs (__Tomi__)
* new lexicons by Tuesday (__Tomi__)
** done
* updated Polderland tools by Wednesday (__Sjur__)
** done, delivered on Thursday
* final changes and bug fixes by Thursday afternoon (__Thomas, Sjur, Tomi__)
** done
* final lexicons by Friday morning (__Tomi__)
** done

!!Windows installer

__TODO:__

* investigate the NSIS installer (__Børre, Sjur__)

!!Releases

__TODO:__

* update the ''Changes'' document (__Maaren__)
* release InDesign tools Jan. 30. (__Børre, Sjur, Thomas, Tomi__)
** compile new lexicons (__Tomi__)
*** done
** test (__all__)
*** partially done
** document (__Sjur__)
*** not really
** package and release (__Sjur__)
*** done

!!!Other

!!Corpus contracts + open source

__TODO:__

* publish corpus contracts and project infra as open-source on NoDaLi-sta (__Sjur__)

!!!Next meeting, closing

The next meeting is 11.2.2008 in Snåsa.

The meeting was closed at 10:40.

!!!Appendix - task lists for the next five days

!!Boerre

[iCal|/doc/admin/weekly/2008/Tasks_2008-02-25_Boerre.ics]

* start to reorganise the documentation
* gather {{sma}} texts
* improve forrest stability with i18n, site look
* set up the Leopard Server features for collaborative support
* Hunspell lexicon conversion
* InDesign documentation
* investigate the NSIS installer
* work on Tromsø Sami workshop paper
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Lene

* Ped project
* work on Tromsø Sami workshop paper

!!Maaren

[iCal|/doc/admin/weekly/2008/Tasks_2008-02-25_Maaren.ics]

* Put the list of possible {{sma}} corpus sources into a document
* update the ''Changes'' document

!!Per-Eric

[iCal|/doc/admin/weekly/2008/Tasks_2008-02-25_Per-Eric.ics]

* check some unusual and missing words from the last Olavi missing list
* keep the contact with Kurt Tore's family about his texts.
* try to find other authors who have smj texts digitally
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Saara

[iCal|/doc/admin/weekly/2008/Tasks_2008-02-25_Saara.ics]

* add new XSL/XML headers for proofing test docs
* Set up ways of adding meta-information for proofing correct corpus docs (source info, used in testing or not, added to lexicon or not)
* discuss more parallel texts

!!Sjur

[iCal|/doc/admin/weekly/2008/Tasks_2008-02-25_Sjur.ics]

* start to reorganise the documentation
* gather {{sma}} texts
* improve forrest stability with i18n, site look
* set up the Leopard Server features for collaborative support
* name db/risten.no
* investigate the NSIS installer
* make a first {{sma}} project plan
* publish corpus contracts and project infra as open-source on NoDaLi-sta
* work on Tromsø Sami workshop paper
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Thomas

[iCal|/doc/admin/weekly/2008/Tasks_2008-02-25_Thomas.ics]

* look at test cases still not behaving properly
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Tomi

[iCal|/doc/admin/weekly/2008/Tasks_2008-02-25_Tomi.ics]

* Hunspell lexicon conversion
* document how compounding is controlled in the PLX conversion
* work on Tromsø Sami workshop paper
* fix double hyphen bugs
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Trond

[iCal|/doc/admin/weekly/2008/Tasks_2008-02-25_Trond.ics]

* Reorganise documentation (with Børre and Sjur)
* Gather sma texts (with Børre and Sjur)
* Name lexicon project: Test editing xml files (when they are ready for it)
* Work on {{sma}} analyser and visl integration
* [fix bugs!|http://giellatekno.uit.no/bugzilla]