!!!Divvun/Disamb Seminar in Tromsø

* project meeting
**  date: 23. (after lunch) - 26. (Friday is travelling day) January.
**  place: Tromsø

* Participants
** Monday, Tuesday, Wednesday: Børre, Ilona, Linda, Saara, Sjur, Tomi, Trond  
** Thursday: Børre, Ilona, Linda, Sjur, Tomi, Trond
  
* Practical arrangements:
** Room(s): The meeting is at room 4475, with room 4402 as alternating room. 
  Both rooms are situated in the same floor as our offices.
*** We have one projector, and perhaps a possibility to hire another one.
*** 4475 has internet, but no whiteboard, and 4402 has whitebord, but not internet.
*** TODO: Get hold of a flipover for 4475.
*** TODO: Lunch only for 7 persons
*** TODO: Machine for Ilona at 10.00.
** Lunch will be in Sjampanjekantina, 11.30, Monday to Thursday.
** Coffee breaks are not arranged, we may order them from the canteen, or 
   we may buy it ourselves.
** hotel = Thon Polar Hotel, Grønnegata 45 (Saara, Ilona, Linda, Sjur)

Suggested content that's still not scheduled:

* Programmers
** Work distribution (in bug fixing, documentation, maintenance etc.)
*** Not done
** Parallel corpora and the corpus infrastructure
*** Discussions and clarifications, planned a trip to Oslo, no concrete work
** xml2lexc implementation plan
*** Not done (much done earlier, though)
** file-specific xsl scripts for corpus conversions, the what, how and who issues
   (t: Saara/Tomi; p: Trond, Børre, Sjur)
*** Saara has demoed her work, with eager support of Tomi and Børre, we have thus
    got some insight in what to do and how. Our xsl scripts cannot do much heavy
    work, though, they remain on the metalevel (modifying the header rather than 
    the body).

* Linguists
** Two linguists were missing, so not too much linguistics was discussed, but some.
** Get together and discuss common issues
*** Done
** Approaches to working with the lexicon
*** We are wiser now. (or are we?)
** Tags
*** All too short discussion with the Saami dept, but at least it clarified our own
    thoughts. They promised to look at the tags and their usefulness. The indefinite pronoun
    question remains open.
** Substandard forms
*** Not done.
** The numeral projects
*** Progress made:
*** For the arabic part, num.fst is now integrated into sme.fst. It needs refinement, 
    though.
*** For the linguistic part, we did some fieldwork with an innocent informant (Tomi).
** Twol issues
*** Not touched.
** Unix etc. courses.
*** Kept a 45 min unix course, useful. Still more hints to pick up, though, but ok that 
    the hints are presented in palatable portions. Conclusion: Have a unix session at the 
    next meeting as well, or perhaps unix course sessions online, next week etc.

* All
** how to correctly write Forrest documentation
*** site.xml (what is it, how do we add to it?)
*** site-frag.xml (what's the difference?)
** testing - how-to, improvements
** code style guidelines

!!!Schedule

Aims:
* a mixture of training/discussions and practical work, to avoid monotonicity
* most training in the beginning, as preparations for later work
** The schedule never recovered from the changing participant number (-2, +1, -1, etc.),
   but nevertheless we had a good workmeeting.

!!Monday 11.30-16

The meeting starts with lunch in Sjampanjekantina. 11.30.

Common 12-13.30 (1,5 h)

* Presentation, kick-off   15 min
* Project updates          15 min (2*7 min)
* Project milestones       15 min (2*7 min)
* Cooperation evaluation (practical/daily cooperation)  30 min
** project management
** programmer work
** linguistic work
** anything else?
* break                    15 min

13.30-14.45 (split in two groups)
* Trond, Ilona, Linda : Unix
* Sjur, Tomi, Børre, Saara: XQuery intro

15-16 (all)

Software updates/maintenance:
* XXE + config updates
**  Done
* OS X (all on 10.4.4)
**  Done
* bash / localisation status quo
**  Many different setups did that we never reached a conclusion. Sjur has
    a working setup …
* Xerox tools: We must negotiate a new version with Xerox.
** We have full version for creating the full wordlists, but only on cochise
** the new Mac is faster and has more memory
* forrest? Probably not
* SEE, Unison, CVL, Marratech
** Done
* CodeTek VirtualDesktop setup/config (?)
** Done

!!Tuesday 8.30-11.30
* How to use the xml corpora (Tomi, Børre, Saara => all - 1h)
** Done, Saara and Børre had a good introduction.

(Computational) linguistics 9.40-11.30

* keepinmind
** 3-part compound treatment


* Numeral session

We split in 3 different groups
# Techni group: Børre, Sjur (updates)
# Arabic group: Saara, Linda
# Alphab group: Tomi, Trond, Ilona

* Arabic
** regx: regex for numeric expressions, dates, quantifiers, ... 12. 3. 2003 

* Alphabetic
** ling: what are the linguistic facts?
** comp: tags, lexc treatment
** regx: regex for numeric expressions, dates, quantifiers, ... 12. 3. 2003 
*** Much done, but implementation is still half (arabic) and full (linguistic) open. Saara said she
    would have a look at dates with words (months).

!!Tuesday 12-16

* Programmers: Sjur, Børre, Tomi, Saara, (Irene)
** XML specification
*** Irene <===
    Check whether and how we cover all of the kvensk needs, done
    Discuss the multimedia things, done

* Linguists: Ilona, Linda, Trond.
** Note: We should consider including someone from the Saami department.
*** Done. (Kjell Kemi and Steinar Nilsen)
** Tagsets and POS in Saami (Evaluating and fixing (?) the tagset both for 
   the parser and for the disambiguator)
*** We only presented our problems and our analyser to the Saami dept staff.
** SGL decisions:
*** overview, any consequences for our work?
*** No input from SGL.

!!Wednesday 8.30-11.30


* Programmers: Sjur, Børre, Tomi, Saara
** XML specification
*** Irene <===
    Check whether and how we cover all of the kvensk needs
    Discuss the multimedia things


XQuery/eXist/proper names continued

!!Wednesday 12-16

* Disamb
** Reviewing issues from our first meeting:
** Schedules, internal dependencies

!!Thursday

Sjur and Børre in meeting - all day?
* Done, meeting memo at [usual place|tromso-2006-01-pt.html].

!!Thursday 8.30-11.30

!!Thursday 12-16

!!To be allocated

Groups:
ling: * Ilona, Linda
tech: * Saara, Tomi, Sjur, Børre, Trond
twol: * Sjur, Trond

!Techtopics:

Routines for addition to lexicon, cooperation wrt. new corpus texts coming in
* Maaren, Ilona, Børre, Trond, (Linda, Thomas)
** Ilona and Linda have discussed cooperation here.

eXist/XQuery/XSL/webapp/risten.no/proper names tasks (17,5h/2d2,5h):
* XQuery people: primary: Saara, Tomi, Sjur (=STS); optional: Børre, Trond (=BT)
* XQuery/XSLT intro (1,5h) (STS + BT)
* intro to the risten.no architecture (1,5h) (STS + BT)
** server setup
** webapp organisation:
*** frames
*** search vs edit
** how the search works presently
** how the editing works presently
*** Done.

* single down the simultaneous update bug (3h)
** DONE (puh) and reported to the mailing list (read and enjoy)
* enhance security (1,5h) (https? Yes, but how to? password check) (STSB)
**  Some links 
*** [Howto setup ssl|http://www.webhostgear.com/170.html]
*** [RedHat specific, but
    useful anyway|http://www.tldp.org/HOWTO/SSL-RedHat-HOWTO-4.html]
*** [Online password strength
    meter|http://www.securitystats.com/tools/password.php]. When making passwords we should follow the rules recommended on 
    this page.
*** Above points: Not done.    
* refactor the code to split collection-specific code from general
  (both edit and search) (3h+)
**  Work began
* make a first version of proper noun infra (6h)
** search
** edit
*** Not done
* synchronisation with CVS (1,5h):
** download files
** sorting (identical and fixed sort order)
*** Not done.
* setup access to the windows server (0,5h)
**  Not done
* refactor to stored procedures (everything contained within eXist)?
  (discuss, 0,5h)
** will make it easier to upgrade the webapp (no need to log on to the server
   machine, just use the eXist Java client)
** will make the webapp mostly selfcontained
** might introduce a performance hit - needs to be checked
*** Not done
* XML DTD check:
** review the converted lexicons
** anything missing, or badly organized?
*** Not done


! Summary

We have seen important progress in some but very limited fields, but there are still
open issues.

Evaluated against our expectations the week was too short.

One lesson to learn is that we should have met better prepared, we should have gone 
through the issues in beforehand, so that we could concentrate upon the really hard 
problems.

It was nice to meet face to face, both for the first time, and for the 
long-time-no-see factor.