!!!August 2006 gathering * Place: Tromsø * Dates: 22-24? !!!Participants Børre, Maaren, Sjur, Thomas, Tomi, Lene, Trond, Saara !!!Topics * Presentation, status quo and plans for divvun and disamb * Linguistic issues ** Normativity ** Lexical coverage ** 3-part compounds * Lang-tech issues: ** finalise Xerox hyphenation (Trond + Sjur 2-3h) ** G3 issue for sme * Technical issues ** The munching deadlock ** M4 macros for hyph/spell/dis versions of TWOL ** our own wiki? for (technical) CG documentation, * [Speller plans|tromso-2006-08-lexc2xspell.html] * Polderland work ** Online meeting with them? Yes. [Memo|tromso-2006-08-polderland.html] * [Proper noun status and action plan|tromso-2006-08-propnoun.html] ** action plan: ** technical evaluation: how do we make proper noun editing work in Emacs? ** how much more !!!Commented topics * linguistic issues ** General linguistic discussion (morphology (and syntax?)): *** What is on top of the priority list? *** What are our most serious problems? *** What can we learn from each other? *** How should we work together across the projects in the remaining months? ** Lexical coverage ** 3-part compounds * lang-tech issues: ** finalise Xerox hyphenation (Trond + Sjur 2-3 timar) ** G3 issue for sme * technical issues ** The munching deadlock ** M4 macros for hyph/spell/dis versions of TWOL ** our own wiki? for (technical) CG documentation, (open source) fst technology and Sámi language technology (basically the parts of our current documentation that isn't project-specific. Goal: to invite and engage a larger community (CG, fst/transducer users, i.e., the grammatically-based bottum-up parsing community) to improve and create documenation for the above topics. Potential partners are Helsinki, Oslo, Odense, Stuttgart. * [speller plans|tromso-2006-08-lexc2xspell.html] - infra to generate word form lists for Polderland/Aspell/HunSpell type spellers. We need to setle on plans for this infrastructure now. * Polderland work ** online meeting with them? Yes, [memo available|tromso-2006-08-polderland.html] * [proper noun status and action plan|tromso-2006-08-propnoun.html] ** action plan: we need to systematically go through the most typical/useful tasks (problem now: Sjur and Tomi do not edit lexicons enough to really know what is most important) ** technical evaluation: how do we make proper noun editing work in Emacs, and who? Are the coding (as in XML tags/structure) solutions good? (speed is not an issue, at least not on my new Mac:-), and shouldn't be on the server either) Worst-case scenario: The name project failed, took too long time, etc., and we will have to build separate name lexica in traditional lexc format for each language. In the best case we will get it up and running in a reasonable amount of time. * corpus collection: ** how much more (in terms of effort and invested time)? Børre needs to start looking at testing pretty soon - we should have tools for people to test this fall, and we need a working feedback infra to make sure we get the respons we need * divvun plans ahead !!!Time table || Time || Tuesday || Wednesday || Thursday | 8:30-10:00 | A Presentation (9:00->) | T lexc2Xspell | Machine update + planning |   |   | L Consequences of eval | Aligner (Trond, Saara) |10:00-10:30 | A Reports, plans | Coffee | A Coffee |10:30-12:00 | a Polderland | T lexc2Xspell | - |   | a Polderland | L Consequences of eval | - |12:00-13:00 | A Lunch | A Lunch | A Lunch |13:00-14:00 | A G3/Howto/m4/Wiki | A Name lexicon 1 | (exit Saara, Maaren) |   | 13:45: Coffee | (what shall we store) | - |14:00-14:30 | T Video with PL (1h) | A Coffee | A Coffee |   | L Evaluate |   | - |14:30-16:00 | A *G3/Howto/m4/Wiki | T Name lexicon 2 (how) | - | -  | - | L (3part) | - {{{ ... | preprocess --abbr=bin/abbr.txt | corrtypos.pl | .. ... | preprocess --abbr=bin/abbr.txt --corr=src/typos.txt | lookup ... }}} {{{ A = all a = all - Lene T = Saara, Sjur, Trond, Tomi, Børre L = Lene, Maaren, Thomas }}} !!Tuesday afternoon 1 {{{ A Presentation with Polderland T Video with PL (Saara, Sjur, Trond, Tomi, Børre, Maaren, Thomas) L Evaluate our linguistic analysers (Lene, Maaren, Thomas) How good are the tools? (M, T explaining L what input she can expect from M, T) What does it take to make them better? Do we need tools for measurement Or: Office as usual, working }}} !!Tuesday afternoon 2 * G3: Trond, Sjur, Thomas * Howto: Maaren explains practical linguistic work to Lene <= M&L find out how to cooperate... * m4: Saara, Tomi * Wiki: Børre (todo: write a spec) !!Thursday morning * Plenary machine update (all(?)) <== * Aligner (Trond, Saara) Machine updates: * forrest issues / installations / updates * readline / bash !!Thursday evening * G3 (Sjur, Trond, Thomas) <== * Hyphenation (Sjur, Trond) <== * Bible format discussion * Corpus health care * Sámi chars in pdf ** Solved! Almost:-) - path needs to be generalised, then checked in to CVS. * i18n finalisation: language selection menu (using dispatcher) {{{ rule for id field in common.xml default: you are your own id overriding: your id is different from yourself, and points to another lemma common.xml Kautokeino .. nob, nno, eng ... id=Guovdageaidnu Guovdageaidnu ... sme, sma, smj ... id=Guovdageaidnu sme.xml (id info is inherited, perhaps) Kautokeino Guovdageaidnu | `´ sme.lexc (pure lexc file, generated) Kautokeino contlex-i ; Guovdageaidnu contlex-j ; }}} !!M4 flags: HYPH: * exclude or include the hyphenation marks when making a hyphenator transducer ("#", "-" and "^") SHORTCOMP: * Permitting truncated two- and three-part compounds or not DIPHSIMPL: * Exclude/include twol ruleset to allow diphthong simplification. {{{ norm: oahpaheddjiid also: oahpaheaddjiid and oahpaheddjiid ==> include/exclude the G3-sensitivity of the diphthong simplification rules }}} non-m4-variation (variation in the lexicon): * Circular transducer or not? Today: {{{grep -v '^[CN]^'}}} * Including SUB entries of not? * EAST, WEST, ALLDIALECTS Differenting between eastern and western dialectal forms, tolerating both * SOUTH Tolerating cleary non-standard (like Locative Sg -n)