!!!Meeting setup * Date: 06.06.2005 * Time: 11.00 Norw. time * Place: Wherever we are :-) * Tools: Phone, iChat, SubEthaEdit !!! Agenda # Opening, agenda review # Reviewing the task list from a week ago # Documentation - divvun.no # Corpus gathering # Corpus infrastructure # Linguistics # Term db # Other issues # Summary, task lists # Closing !!!1. Opening, agenda review, participants Opened at 11.00. Agenda accepted as is. Present: Børre, Sjur, Thomas, Trond Away: Tomi (Finland), Maaren (Iceland) Main secretary: Trond !!!2. Reviewing the task list from the last meeting !! Børre: * Disamb documentation within giellatekno.uit.no (with Trond) ** The alpha version of the forrest-based giellatekno external site is up and running, but there are still open issues: *** word-form generation not working *** no link back to Divvun project page * Add corpus texts that are unproblematic from a juridical point of view ** nothing done * Contact people about Lule Sámí texts ** Contacted Kåre Tjihkkom, he said he didn't have any texts himself. He asked me to contact Árran. Talked to Bård Eriksen at Árran, who was interested in contributing with text and also would like to be a tester. * Contact skolelinux and Knut Hofland about webcrawler. ** Had a look at [http://borel.slu.edu/crubadan/], which is what skolelinux has been talking about. According to [http://borel.slu.edu/crubadan/stadas.html] * Finish the setup of divvun.no ** Contact Thor Øivind: *** Contacted him, he was going to update Tomcat (had an old version). * Contact Roy Dragseth about cvs-commit mailing ** Sent an e-mail some time ago. He has not answered, will phone him. * continue Forrest i18n ** Nothing done. ** Please also add an issue in Forrest's own issue tracker (bug db) (Sjur) * continue [http://giellatekno.uit.no], encoding problems ** Solved, using decimal codes :-( (add to Forrest issue tracker) !! Maaren: * continue missing list * find and prepare issues for the language board meeting * translate Helsinki contract into rough Norwegian; send it to Sjur, then lawyer !! Sjur: * Scan and OCR the Helsinki contract; mail it to Maaren ** Done (not OCR) ** Resend to Trond (Maaren is away) * Ask Anne-Britt to update the contract with the University (we are getting 3, not 2, persons there from the fall) ** not done yet * write the LIST option to our test tools, as requested in the discussion group. ** not done * complete the action summary after our half-year evaluation ** not done :-( * fix risten.no bugs ** fixed some ** added risten.no to bugzilla @ giellatekno.uit.no ** updated the version on the server ** discussed bugs in eXist, and feature requests * contact Kimmo Koskenniemi about kwic-snt and UTF-8 bug ** not yet * voice group-chat not working to Sámediggi - should be fixed ** tried, but no info from LÅH yet * Also done: ** Tiger-updates (a.o. Aspell), getting Sámediggi intranet access to work !! Thomas: * Find issues to giellastivra čoahkkin ** Found many * help Maaren prepare those same issues ** done, but should go to giellalávdegoddi instead of giellastivra * Continue with the verbs ** A few hundred left, should finish tomorrow !! Tomi: * Start looking at the smj infrastructure, turn it into utf-8, look at whether updates and improvements done for sme should be done for smj as well, in order to get a uniform infrastructure. * update catxml to remove sections in unwanted language(s) * Translate the texts in the sme/corp/correct/ directory to utf-8. ** Done. * Work on the nonrec transducer: Get the nonrec files out of src/ and into int/, get the "make nonrec" process up and running. ** Done (but headed into limits of the available xfst tools). * Look into shortening of three-part compounds, middle part * decapitalisation of proper nouns when (if?) compounded, and when derived (the oslolaš issue) !! Trond: * Disamb documentation with Børre. ** Alpha version up and running. * Systematize the linguistic issues ahead of the Language Board meeting ** Nothing done. * Add cgi-bin errors to Bugzilla. ** Done. * continue the office disc. with the faculty ** Not done. * Linguistic work. ** Work in continuos progress, worked on vocabulary, worked on disamb input from Pekka. * computational linguistics, with Tomi. ** Not done (see Tomi's task list above). !! All: * The www.divvun.no release. * The Bugzilla bugs, and a quip or two? ** Today 15 open bugs. * continue the discussion on the language policy of divvun.no in the newsgroup ** Trond has replied ** Børre agrees to the policy *** What about news? In all languages? Lule or North Sámi as default language. ** Thomas: likes it. * Open questions about language policy: ** All = English? Yes (the amount is small). Decided. ** How much in 5 languages? ** Size of general public info: Trond: only a few paragraphs - about the project, and the language policy of the site. More specifically, it should include: *** what the divvun project is *** why it is made, by whom, on whose decision *** what we make, (why we do not make sma) *** how we make it *** how, when, and for whom it eventually will be used, and on what terms and price *** an explanation of our lg policy, and a good set of links. ** The user documentation will be written in the target language, in Finnish and Norwegian, and in English. !!!3. Documentation - divvun.no Thor Øivind has to get Tomcat working. See our task list for other issues. !!!4. Corpus gathering * __Lule Sámi__: ** Ådå Testamennta (Svenska bibelsällskapet). They must be contacted. * __North Sámi__: ** Børre will get in touch with the Crúbadan project about the ''se'' material there, and access to it. Cf. [http://borel.slu.edu/crubadan/index.html] * The language consultant in Guovdageaidnu is actively working, gathering public documents from the municipality archives. * Otherwise Børre awaits progress on the __contract work__: ** Sjur will check with Maaren about the translation from Finnish to Norwegian, and forward the translation to Trond if not done by somebody else. !!!5. Corpus infrastructure Lars Nygård has made a searchable web interface as a beta version: [Samisk online korpus|http://omilia.uio.no/CE/sami/]. Cf. [http://omilia.uio.no/CE/CE.pdf] for documentation. !!!6. Linguistics # NRK:as # -eaddjiid vs. –eddjiid # -eabbo/-abbo vs. –eabbu/-abbu # -at/-ut vs. –et/-it # -dettiin vs. –diin/-din # -lit vs. –šit (kondišunal) # Western -l- forms : soappulin vs. soppolin (soppošin vs. soappušin) # kond alternativ form livččon vs. livččen # kárášjoh nieida vs. kárášjohnieida # hálede: {{{ +Du1:e K ; =hálede +Du1:edne K ; =hálededne +Du1:etne K ; =háledetne }}} # háliidit+V+TV+Ind+Prs+Sg3{{{ háliida háliid}}} # makkárat imperativhámit galget leahkit? Ovdamearkkat bárrastávval vearbbain: {{{ +Pl1:Q3t K ; ! bohtot ! West ! +Pl1:Y3t K ; ! boahttot ! East +Pl2:Q2t K ; ! bohtet ! West +Pl2:Y3t K ; ! boahttet ! East Bárahis stávvalat: háliidit+V+TV+Imprt+Prs+Du1 háliideadnu háliideahkku háliidit+V+TV+Imprt+Prs+Pl1 háliideadnot háliideahkkot háliidehkot háliidetnot háliidit+V+TV+Imprt+Prs+Pl2 háliideahkket háliidehket Leahkit: +Du1:eadnu K ; +Du1:eahkku K ; +Pl1:ehkot K ; +Pl:eatnot K ; }}} # Noun/adj shortforms {{{ beakkán+A+Sg+Gen beakkána beakkán beakkán+A+Sg+Acc beakkána beakkán }}} # Jeagohin vs. Jeagoheapmi # rieban vs. rievan # Goallossánit: kulturlávdegoddi / kultuvrralávdegoddi (sosial-, privat-, ...) jf. *kultur, kultuvra (privatriekta) # Loatnasánit To be further discussed within the project, and brought forward to the normative bodies. !!!7. Term db Still more bugfixing, adjustments based on user feedback. !!!8. Other issues Nothing. !!!9. Summary, task list !! All: * The www.divvun.no release. * The Bugzilla bugs, and a quip or two? * continue the discussion on the language policy of divvun.no in the newsgroup, esp. re the open questions: ** How much in 5 languages? ** Size of general public info: Trond: only a few paragraphs - about the project, and the language policy of the site. More specifically, it should include: *** what the divvun project is *** why it is made, by whom, on whose decision *** what we make, (why we do not make sma) *** how we make it *** how, when, and for whom it eventually will be used, and on what terms and price *** an explanation of our lg policy, and a good set of links. ** should news be in all languages? What about RSS feeds from Forrest? !!Børre * Contact Kevin Scannell of an Crúbadán fame, ask if we can use his north sami material. * Contact Roy Dragseth about cvs-mailing * Work on divvun.no * Continue http://giellatekno.uit.no conversion to forrest ... * Contact Svenska bibelsällskapet * Add issue to forrest issue tracker about utf-8 ihtml documents. ** Check for existing bug reports, it might already be in there * Continue forrest i18n research !! Maaren: * continue missing list * find and prepare issues for the language board meeting * translate Helsinki contract into rough Norwegian; send it to Sjur, then lawyer !! Sjur: * Check the translation of the Helsinki contract with Maaren; forward the task to Trond if not yet assigned * Ask Anne-Britt to update the contract with the University (we are getting 3, not 2, persons there from the fall) * write the LIST option to our test tools, as requested in the discussion group. * complete the action summary after our half-year evaluation * fix risten.no bugs * contact Kimmo Koskenniemi about kwic-snt and UTF-8 bug * voice group-chat not working to Sámediggi * add i18n bug to Forrest issue tracker * add i18n menu as feature request to Forrest issue tracker !! Thomas: * Begin to prepare issues to giellalávdegotti čoahkkin * Present issues in dicussion forum * Finish the work on verb valency * Start to look at Lule Sámi !! Tomi: * Start looking at the smj infrastructure, turn it into utf-8, look at whether updates and improvements done for sme should be done for smj as well, in order to get a uniform infrastructure. ** __High__ priority * update catxml to remove sections in unwanted language(s) * Look into shortening of three-part compounds, middle part * decapitalisation of proper nouns when (if?) compounded, and when derived (the oslolaš issue) !!Trond * translate Helsinki contract into rough Norwegian; send it to Sjur, then to lawyer !!!10. Next meeting, closing 13.06.2005 10.00 Closed at 12.45