!!!Meeting setup

* Date: 28.1.2008
* Time: 09.30 Norw. time
* Place: Internet
* Tools: SubEthaEdit, iChat/Skype

!!!Agenda

Cf. one of the following, depending on context:

* the upper bar of the SEE window (provided you use the JSPWiki syntax mode)
* the TOC in Forrest-rendered output, like HTML and PDF

!!!Opening, agenda review, participants

Opened at 09:58.

Present: __Børre, Lene, Maaren, Per-Eric, Sjur, Thomas, Tomi, Trond__

Absent: __none__

Agenda accepted as is.

!!!Updated task status since last meeting

!!Børre

* start to reorganise the documentation
** not done
* gather {{sma}} texts
** not done
* improve forrest stability with i18n, site look
* set up the Leopard Server features for collaborative support
** not done
* Hunspell lexicon conversion
** not done
* InDesign documentation
** not done
* investigate the NSIS installer
** not done
* release InDesign tools Jan. 30.
** not done
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
* other:
** worked on layouts for giellatekno.uit.no, and the coming oahpa.uit.no sites. This work will also be incorporated into the divvun.no site.

!!Lene

* Pedagogical project, running till 31.12.08: making grammatical games (VISL) and interactive dialogues on the internet, for pupils, students and others: Lene, Trond, Saara
** VISL games and quizzes are almost ready (ready for trying, some adjustments to do) http://beta.visl.sdu.dk
** dialogues: made a simple technical model, beginning to write the dialogues
** held a course for teachers at one school to get feedback
** made user documentation: OAHPA!-portal

!!Maaren

* Put the list of possible {{sma}} corpus sources into a document
** not done
* update the ''Changes'' document
** not done

!!Per-Eric

* check some unusual and missing words from the last Olavi missing list
** working on it
* keep in contact with Kurt Tore's family about his texts, send a new contract
** sent a new contract; Kurt Tore's wife has a contact person, __A Kintel__
* try to visit __S T Sandstrøm__ personally as soon as possible, maybe this week
** sent a new contract; she is now very positive about giving us all her texts, with no need for a personal visit
* try to find other authors who have {{smj}} texts digitally
** nothing done
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** worked some

!!Saara

* add new XSL/XML headers for proofing test docs
* Set up ways of adding meta-information for proofing correct corpus docs (source info, used in testing or not, added to lexicon or not)
* discuss more parallel texts

!!Sjur

* document Windows CD installation work-around
** unless we get feedback saying otherwise, the present documentation should be ok
* start to reorganise the documentation
** not done
* gather {{sma}} texts
** not done
* improve forrest stability with i18n, site look
** not done, but found an i18n issue on the G5/"internal risten.no"
* set up the Leopard Server features for collaborative support
* check the present {{sma}} sources
* name db/risten.no
** identified the issue with non-working browsers - locale mismatch
* improve hyphenation testing
** done, and several issues identified
* investigate the NSIS installer
** no further work done
* get hotel rooms in Snåsa
** not done yet, will do today
* make a first {{sma}} project plan
** not done
* publish corpus contracts and project infra as open-source on NoDaLi-sta
* add verb paradigm generation bug to Bugzilla
** done
* test that hyphenation is identical in InDesign and the command line tool
** done - they seem to be identical, which means we can trust the test results
* release InDesign tools Jan. 30.
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
* other:
** tested the automatic language identification in Word 2007, after user feedback that it works just fine. It works for me as well. We probably need a FAQ to relegate such issues to, where we can say something like "IF this, TRY that"
** hyphenation and speller testing

!!Thomas

* look at test cases still not behaving properly
** worked some
* create hyphenation test data
** done
* release InDesign tools Jan. 30.
** is that so?
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** worked

!!Tomi

* Hunspell lexicon conversion
** not done
* document how compounding is controlled in the PLX conversion
** not done
* release InDesign tools Jan. 30.
** not past this day yet
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** fixed some

!!Trond

* {{sme->smj}} lexicon conversion to build bilingual lexicon resources
* Reorganise documentation (with Børre and Sjur)
** Reorganised ped doc, otherwise not
* Gather sma texts (with Børre and Sjur)
** Not done
* Look at the sma source files (with Sjur)
** Not done
* Name lexicon project: Test editing xml files (when they are ready for it)
** No files yet
* Make a first {{sma}} project plan
** Looked at it myself, not in plenary
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!!Pedagogical software online

We now have user documentation ''almost'' online (the technical one is part of our TechDoc hierarchy). What is missing is a working URL.
__Børre__ has been working on setting up a Forrest-based site for the user front end, with adjusted CSS styling, another layout, etc. This did not quite succeed. The next step is to establish a URL, say {{http://oahpa.no/}} or {{http://oahpa.uit.no/}}, directing to that site. It should be online on Feb. xx; the slick URL and professional layout should be ready by YY.

__TODO:__

* Setting up the user documentation with an external address, and cross-reference via tabs to giellatekno and divvun. (__?__)
* get an easy-to-remember URL (__UiT/IT__)
* More thorough skin, layout, ... (__External person within the Ped team__, __Internal forrest expert__)

This we will postpone until later.

!!!Workshop in Tromsø, end of February

Conference in Tromsø in week 9, February 28-29, on [Sámi documentation and revitalisation|http://uit.no/humfak/9315/]. The two teams should present our work, and our view on the future. There is a start for it at {{plan/art/samdoc08/samdoc08.tex}} and {{plan/art/samdoc08/samdoc08-sem.tex}}. One of the goals for the conference is to make proposals for grant support. First draft ready in Snåsa.

__TODO:__

* Presentation of our work
** Basic tools (__Sjur, Trond, Thomas__)
** Applications (__Lene, Sjur__)
** Corpus infrastructure (__Børre, Saara, Sjur__)
** Overall infrastructure ("Makefile") (__Sjur, Tomi__)
* Plans for future work (__Sjur, Trond__)
* Relevance for other projects
** Standard written language texts (__Trond__)
** Existing written dialect texts (__Lene, Trond__)
** Existing dialect recordings (__Lene__)
* Turn the text into slides (samdoc08.tex into samdoc08-sem.tex) (__Trond__)

!!!Documentation

__TODO:__

* start to reorganise the documentation (__Børre, Sjur, Trond__)

!!!Corpus gathering

__TODO:__

* follow-up on the {{smj}} texts from __Kurt Tore__ (__Per-Eric__)
** the text discussions will go via __Anders Kintel__
* go visit Sigga Tuolja Sandstrøm (__Per-Eric__)
** no need to go there; __Per-Eric__ called her and sent her a contract
** need to talk to a person who has scanned the texts; he will send the texts to __Børre__
* gather {{sma}} texts (__Børre, Sjur, Trond__)
* Put the list of possible corpus sources into a document {{gt/doc/lang/sma/sma-corpus-plan.jspwiki}} (__Maaren__)

!!!Infrastructure

__TODO:__

* add Jabber account in iChat (__all__)
* improve forrest stability with i18n, site look (__Børre, Sjur, Tomi__)
* set up the Leopard Server features for collaborative support - permanent chat rooms, project calendar(s), wiki? (__Børre, Sjur__)

!!!Linguistics

!!North Sámi

Hyphenation bugs still there, now properly documented by the improved test bench.

!!Lule Sámi

Hyphenation: same as for {{sme}}.

__TODO:__

* {{sme->smj}} lexicon conversion to build bilingual lexicon resources, and increase {{smj}} coverage (__Trond, Svenne__).
* Add the words when all words are ready.

!!South Sámi

__TODO:__

* check the present sources (__Sjur, Trond__)

!!!Name lexicon infrastructure

The upcoming dictionaries are:

* kven: fkvnob, nobfkv
* smesmj
* (smjsme)
* smenob

The kven work should be visible.
The {{smj}} should be reported this week; the {{smenob}} is part of the ped work and is interesting for the general audience.

Status quo: The dictionaries are shown online (http://www.divvun.no:8889/index.html?locale=no), but do not give translations. It requires the locale request parameter to work properly for most users.

Short-term goal: Have them work in risten GUI.

Long-term goal: Give them a better GUI and integrate in ped platform and other places.

Decisions made in Tromsø can be found in [this meeting memo|/doc/admin/physical_meetings/tromso-2006-08-propnoun.html].

__TODO:__

# fix i18n bug in risten.no/G5 (so they will work without the proper locale request) (__Sjur__)
# fix display in column 3 (__Sjur__)
# fix bugs in lexc2xml; add comments to the log element (__Saara__)
# finish first version of the editing (__Sjur__)
# test editing of the xml files. If ok, then: (__Sjur, Thomas, Trond__)
# generate terms-smX.xml automatically from propernoun-sme-lex.xml (add nob as well) (the morphological section should be kept intact, in e.g. propernoun-sme-morph.txt) (__Sjur, Saara__)
# convert propernoun-($lang)-lex.txt to a derived file from common xml files (__Sjur, Tomi, Saara__)
# implement data synchronisation between [risten.no|http://www.risten.no] and the cvs repo, and possibly other servers (i.e. the G5 as an alternative server to the public risten.no - it might be faster and better suited than the official one; also local installations could be treated the same way)
# start to use the xml file as source file
# clean terms-sme.xml such that all names have the correct tag for their use (e.g. @type=secondary) (__Thomas, Maaren, linguists__)
# merge placenames which are erroneously in different entries: e.g. Helsinki, Helsingfors, Helsset (__linguists__)
# publish the name lexicon on risten.no (__Sjur__)
# add missing parallel names for placenames (__linguists__)
# add informative links between first names like Niillas and Nils (__linguists__)

!!!Proofing tools

!!Hunspell

The %> marker does not survive into Hunspell to work as a boundary marker, despite being defined as %> for the Hunspell version.

Priority list:

# debug the missing > marker
# add {{smj}} to the soup, make sure it works roughly as well as {{sme}}
# fix the remaining conversion bugs for {{sme}}
# return to {{smj}}, and fix whatever is left to fix
# integrate the derivations as separate "continuation lexicons"

__TODO:__

* Hunspell lexicon conversion (__Tomi, Børre__)
* debug %> problem (__Tomi__)

!!Testing

!Spelling Error Markup

__TODO:__

* Set up ways of adding meta-information (source info, used in testing or not, added to lexicon or not) (__Saara__)
* test new and nested error markup (__Sjur__)

!Automated testing

__TODO:__

* improve hyphenation testing (__Sjur__)
** done
* add verb paradigm generation bug to Bugzilla (__Sjur__)
** done

!!Lexicon conversion to the PLX format

Open issues based on test results:

!sme

Version: __Davvisámi, version 1.0.1, 2008-01-28__

* 425 - Roman numerals - will not be fixed in 1.0 release
* 426 - comp words from Divvun.no - ''guoktedássásaš'' accepted - still open
* 536 - speller accepts "impossible" compound-forms, ''geažideapmigárvu'' and ''giddesteapmisággi'' accepted - __FIXED__
* 593 - missing words in beta2, still missing ''Nuppelohkái'' - not lexicalized
* 595 - prefix+name without hyphen (''ovdaLot'' instead of ''ovda-Lot'')
* 597 - does not recognize nubbelohki - not lexicalized
* 603 - suomabealdi, norggabealdi accepted
* 606 - speller accepts VUOHTA compound
* 611 - double hyphen sugg still accepted
* 613 - short gen. as second compound part
* 625 - word+footnote - possibly Polderland or MS, or a consequence of allowing spell checking of words including digits
* 627 - prefix + hyphen does not get accepted
* 629 - ''a'' taking part in compounding without hyphen
* 631 - numbers starting with 0
* 633 - double hyphens accepted in Word, not by cmdline speller
* 634 - PropGen+hyph+PropGen
* 637 - nai(go) becomes -naj(go)

!smj

Version: __Julevsáme, version 1.0.1, 2008-01-28__

* 482 - ''Nuorttalijguovlojn'' accepted again
** testcase changed, test __PASSED__
* 607 - acro + hyphen, ''NRKGA'' accepted - still __OPEN__
* 615 - actio and actor compounds - __FIXED__
* 616 - Bispadime-me-ráden - still __OPEN__
* 618 - dipht. simpl. - __FIXED__
* 629 - ''a'' taking part in compound - still __OPEN__
* 631 - number compounds starting with 0
* 634 - Prop gen + hyphen + Prop gen

__TODO:__

* look at test cases still not behaving properly (__Thomas, Tomi__)
* document how compounding is controlled in the PLX conversion (__Tomi__)

!!InDesign tools

The speller works in InDesign and InCopy. It lacks user-defined lexicons, but Polderland is trying to fix this bug.

The new Sámi newspaper, ''Ávvir'', is publishing its first edition on February 6th. We should release final InDesign tools, including final spellers, one week ahead of that, Wednesday Jan. 30th.

__TODO:__

* improve hyphenation testing (__Sjur__)
** done
* test that the hyphenation is identical in InDesign and the command line hyphenator (__Sjur__)
** done
* test twolc # bug solution (__Tomi, Trond, Sjur__)
* fix double hyphen bugs (__Tomi__)
* new lexicons by Tuesday (__Tomi__)
* updated Polderland tools by Wednesday (__Sjur__)
* final changes and bug fixes by Thursday afternoon (__Thomas, Sjur, Tomi__)
* final lexicons by Friday morning (__Tomi__)

!!Hyphenators

We need more test data, to test hyphenation of different types of words.
__Thomas__ should make a file {{gt/sme|smj/testing/hyphenation.txt}} of the format:

{{{
compoundword com^pound#word
}}}

It should contain all possible word formation patterns and their correct hyphenation. That is, at least:

* compounds
* derivations
* names
* misspellings
* compounds with acros, numbers, names, etc.

__TODO:__

* create hyphenation test data (__Thomas__)
** done
** also done: used it in testing, bugs discovered

!!Windows installer

__TODO:__

* investigate the NSIS installer (__Børre, Sjur__)

!!Releases

__TODO:__

* update the ''Changes'' document (__Maaren__)
* release InDesign tools Jan. 30. (__Børre, Sjur, Thomas, Tomi__)
** compile new lexicons (__Tomi__)
** test (__all__)
** document (__Sjur__)
** package and release (__Sjur__)

!!!Other

!!South Sámi project startup meeting

* in Snåsa
* 11th - 15th of Feb, kick-off meeting Wednesday 13.
* Participants: SD (incl. Divvun), Nord-Trøndelag fylkeskommune, Snåsa kommune, UiT, "resource persons", south Sámi part of SGL (at least one representative)

We extend the meeting on our part, to have this project's first gathering.

__TODO:__

* get hotel rooms (__Sjur__)
* make a first {{sma}} project plan (__Sjur, Trond__)

!!Corpus contracts + open source

__TODO:__

* publish corpus contracts and project infra as open-source on NoDaLi-sta (__Sjur__)

!!Bug fixing

When fixing bugs, record the version number containing the fix in the Bugzilla bug report, such that for each bug, we know exactly when it should have been fixed, in what file(s) and what version.

__83__ open Divvun/Disamb bugs (__45__ of these 83 are speller-related bugs, __38__ are other bugs), and __23__ risten.no bugs

!!!Next meeting, closing

The next meeting is 4.2.2008, 09:30 Norwegian time.

The meeting was closed at 11:28.

!!!Appendix - task lists for the next five days

!!Boerre

[iCal|/doc/admin/weekly/2008/Tasks_2008-01-28_Boerre.ics]

* start to reorganise the documentation
* gather {{sma}} texts
* improve forrest stability with i18n, site look
* set up the Leopard Server features for collaborative support
* Hunspell lexicon conversion
* InDesign documentation
* investigate the NSIS installer
* release InDesign tools Jan. 30.
* work on Tromsø Sami workshop paper
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Lene

* Ped project
* work on Tromsø Sami workshop paper

!!Maaren

[iCal|/doc/admin/weekly/2008/Tasks_2008-01-28_Maaren.ics]

* Put the list of possible {{sma}} corpus sources into a document
* update the ''Changes'' document

!!Per-Eric

[iCal|/doc/admin/weekly/2008/Tasks_2008-01-28_Per-Eric.ics]

* check some unusual and missing words from the last Olavi missing list
* keep in contact with Kurt Tore's family about his texts.
* try to find other authors who have smj texts digitally
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Saara

[iCal|/doc/admin/weekly/2008/Tasks_2008-01-28_Saara.ics]

* add new XSL/XML headers for proofing test docs
* Set up ways of adding meta-information for proofing correct corpus docs (source info, used in testing or not, added to lexicon or not)
* discuss more parallel texts

!!Sjur

[iCal|/doc/admin/weekly/2008/Tasks_2008-01-28_Sjur.ics]

* start to reorganise the documentation
* gather {{sma}} texts
* improve forrest stability with i18n, site look
* set up the Leopard Server features for collaborative support
* check the present {{sma}} sources
* name db/risten.no
* investigate the NSIS installer
* get hotel rooms in Snåsa
* make a first {{sma}} project plan
* publish corpus contracts and project infra as open-source on NoDaLi-sta
* release InDesign tools Jan. 30.
* work on Tromsø Sami workshop paper
* updated Polderland tools by Wednesday
* final changes and bug fixes by Thursday afternoon
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Thomas

[iCal|/doc/admin/weekly/2008/Tasks_2008-01-28_Thomas.ics]

* look at test cases still not behaving properly
* release InDesign tools Jan. 30.
* work on Tromsø Sami workshop paper
* final changes and bug fixes by Thursday afternoon
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Tomi

[iCal|/doc/admin/weekly/2008/Tasks_2008-01-28_Tomi.ics]

* Hunspell lexicon conversion
* document how compounding is controlled in the PLX conversion
* release InDesign tools Jan. 30.
* work on Tromsø Sami workshop paper
* debug %> problem in Hunspell conversion
* fix double hyphen bugs
* new lexicons by Tuesday
* final changes and bug fixes by Thursday afternoon
* final lexicons by Friday morning
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Trond

[iCal|/doc/admin/weekly/2008/Tasks_2008-01-28_Trond.ics]

* Report the smesmj project
* Start working on the samdoc talk
* {{sme->smj}} lexicon conversion to build bilingual lexicon resources
* Reorganise documentation (with Børre and Sjur)
* Gather sma texts (with Børre and Sjur)
* Look at the sma source files (with Sjur)
* Name lexicon project: Test editing xml files (when they are ready for it)
* Make a first {{sma}} project plan
* work on Tromsø Sami workshop paper
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
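
!!!Appendix - checking the hyphenation test data

The hyphenation test file format described under Hyphenators (a plain word followed by its marked-up form, e.g. {{compoundword com^pound#word}}) could be sanity-checked with a small script before it is fed to the test bench. The following is only a minimal sketch, not the project's actual tooling: it assumes that {{^}} marks an ordinary hyphenation point and {{#}} a compound boundary, and all function names are hypothetical.

```python
def parse_hyphenation_line(line):
    """Split one test line into (plain word, marked-up pattern).

    Assumes exactly two whitespace-separated fields per line.
    """
    word, pattern = line.split()
    return word, pattern


def pattern_matches_word(word, pattern):
    """Check that removing the markers from the pattern gives the plain word."""
    return pattern.replace("^", "").replace("#", "") == word


def hyphenation_points(pattern):
    """Return the character offsets in the plain word where a break is marked.

    Both '^' and '#' are treated as break markers (an assumption).
    """
    points = []
    offset = 0
    for ch in pattern:
        if ch in "^#":
            points.append(offset)
        else:
            offset += 1
    return points
```

A line whose marked-up form does not reduce to its plain word is most likely a typo in the test data itself, and can be reported before any hyphenator is run.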