This file documents the current work on filtering and freezing the
smanob files stemming from the dict directory.
It is meant to ease communication between Cip, Lene and Trond, since the
00_readme.txt file was meant for the steady filtering and reverting of files.
Reverting smaswe to swesma
VIC - Very Important Check before any reverting to swesma:
- check the language flag in all files systematically (cip)
==> done
1. Cip assumes that Ryan, Lene & Co. do not need all attributes of the
sma lemma carried over to the sma translation (ignoring stat="pref" for now).
E.g.
suejies
vs.
suejies
2. Cip's reverting script copies the re-elements into the tg-element of the reverted file.
Trond thinks that re-info is not needed at all in smaoahpa. It is true that there is apparently
no re-element in the nobsma files, but there is some re-information in the l-element.
nobsma>grep '(om ' * | wc -l
42
v_nobsma.xml: reise seg (om hår)
v_nobsma.xml: reke uten lov (om barn)
v_nobsma.xml: slippe lett (om bark)
Question: What to do with that? Shall I add the re automatically to the l-element in brackets?
Trond: Suggestion:
Make two lists and (let the linguists) take a look.
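One possible reading of the two-list suggestion, as a minimal Python sketch. It assumes the lemma strings have already been pulled out of the l-elements and that the restriction is always a single trailing parenthesis, as in the grep hits above. The names split_restriction and make_two_lists are hypothetical and not part of Cip's script.

```python
import re

# Assumption: the re-information is one trailing parenthesis, e.g. "(om hår)".
_RESTRICTION = re.compile(r"^(?P<lemma>.*?)\s*\((?P<re>[^()]*)\)\s*$")


def split_restriction(text):
    """Return (lemma, restriction); restriction is None when there is none."""
    stripped = text.strip()
    match = _RESTRICTION.match(stripped)
    if match is None:
        return stripped, None
    return match.group("lemma"), match.group("re")


def make_two_lists(lemmas):
    """Split lemma strings into those with and those without re-information."""
    with_re, without_re = [], []
    for text in lemmas:
        lemma, restriction = split_restriction(text)
        if restriction is None:
            without_re.append(lemma)
        else:
            with_re.append((lemma, restriction))
    return with_re, without_re
```

Both lists can then be written to plain text files for the linguists to review before anything is changed in the XML.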
3. weird IDs:
n_swesma.xml:
Hi!
I don't remember whether we have talked about this before, but: multiword
should still be multiword after reverting. That is, the pos information
should not play any role in the reverting process. Even if the nob translation
consists of a single word, it should belong to the multiword file. Multiword
refers to the fact that the sma side is multiword.
- Lene
1. This information is in the data anyway, namely with the sma entry
(which now becomes a translation of the nob entry).
Trond: If the mwe info can be controlled from the same swesma file, that is fine.
Lene's point, as I see it, was not that we must have a multiword
__file__, but that the word pairs should remain mwe after reverting as well.
2. After reverting, the nob data also contains duplicates stemming from
different smanob files; these have to be merged to avoid messy data
in the database.
Trond: OK, this is a technical question.
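The merging step could look roughly like the Python sketch below. It assumes each reverted entry has been reduced to a (nob lemma, nob pos, sma translation) tuple and that two entries are duplicates when lemma and pos both match. That key, the tuple layout and the name merge_duplicates are assumptions made for illustration, not the behaviour of the existing script.

```python
def merge_duplicates(entries):
    """Collect all sma translations under one (nob lemma, nob pos) key.

    entries: iterable of (nob_lemma, nob_pos, sma_translation) tuples,
    possibly coming from several reverted smanob files.
    """
    merged = {}
    for nob_lemma, nob_pos, sma_translation in entries:
        translations = merged.setdefault((nob_lemma, nob_pos), [])
        # Keep first-seen order and drop exact repeats.
        if sma_translation not in translations:
            translations.append(sma_translation)
    return merged
```

Keying on lemma plus pos keeps homographs of different word classes apart; whether that is the right notion of "duplicate" for the database is for the linguists to decide.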
3. Technically, it is possible to keep the former multiword entries in
the same file DESPITE the fact that the nob entries do not carry
the pos "multiword": the entries can be traced based on the pos
of the sma "translations" (see item 1 above).
Trond: Exactly. So: separate file or not is a technical question;
the important thing is to keep the semantic group Multiword.
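A small Python sketch of how the Multiword group could be recovered after reverting without a separate file. It assumes the pos of the original sma entry is kept on the sma translation and that its value is the string "multiword". The class RevertedEntry, its field names and the function multiword_group are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RevertedEntry:
    nob_lemma: str        # the new lemma after reverting
    sma_translation: str  # the former sma lemma
    sma_pos: str          # assumed: pos of the original sma entry, kept as-is


def multiword_group(entries, mwe_pos="multiword"):
    """Return the entries whose sma side was tagged as multiword."""
    return [entry for entry in entries if entry.sma_pos == mwe_pos]
```

With this, a nob lemma that is a single word still ends up in the Multiword group as long as its sma translation carries the multiword pos.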