The directory {{pedversions/sjdoahpa/sjd/src}} contains the entries used in the current online version of sjdoahpa.
The russjd and engsjd files will come in {{pedversions/sjdoahpa/sjd/russjd}} and {{pedversions/sjdoahpa/sjd/engsjd}}.
!!!Test
@cip reverted sjdrus-data to russjd and installed the new Leksa (at the moment the restriction elements are ignored)
==> test the new Leksa online in both directions!
!!!TODO
Correct the restrictions in the translations. Instead of this:
{{{
моаӆӆьчхэ
облезать о шкуре
облезать
облезть
to grow bare about a skin
to grow bare
}}}
write it this way:
{{{
моаӆӆьчхэ
облезать
облезть
to grow bare
}}}
!!!TODO
Correct inconsistencies in the verb file: English verbs appear both with and without 'to'.
{{{
v_sjdrus.xml: to sympathize
v_sjdrus.xml: will hesitate for a time
v_sjdrus.xml: to come nearer
v_sjdrus.xml: to become hardened
v_sjdrus.xml: to become callous
v_sjdrus.xml: to harden
v_sjdrus.xml: to become callous
v_sjdrus.xml: to make someone sincere
v_sjdrus.xml: to become sincere
v_sjdrus.xml: to move
v_sjdrus.xml: change a place
v_sjdrus.xml: to pass to a new place
}}}
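The check above could be automated. The following is only a minimal sketch: it assumes the English verb translations have already been extracted as plain strings, and the function name {{split_by_to}} is hypothetical. Reading the translations out of {{v_sjdrus.xml}} is left out because the XML layout is not shown here.

```python
def split_by_to(translations):
    """Partition English verb translations by whether they start with 'to '."""
    with_to, without_to = [], []
    for translation in translations:
        text = translation.strip()
        if text.lower().startswith("to "):
            with_to.append(text)
        else:
            without_to.append(text)
    return with_to, without_to


if __name__ == "__main__":
    # Sample strings taken from the list above.
    sample = [
        "to sympathize",
        "will hesitate for a time",
        "to come nearer",
        "change a place",
        "to pass to a new place",
    ]
    with_to, without_to = split_by_to(sample)
    print("without 'to':", without_to)
```

Whichever convention is chosen, the shorter of the two lists shows which entries would have to be changed.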
!!!TODO
Contrary to Trond's claim, not all sjd lemmata have an eng translation:
{{{
кэ̄ннц
ноготь
то̄лл
огонь
костер
место для костра
Total of
xml>grep -h '_ENG' *.xml | wc -l
49
Here is the list
xml>grep -n '_ENG' *.xml
n_sjdrus.xml:1579: ноготь_ENG
n_sjdrus.xml:1597: огонь_ENG
n_sjdrus.xml:2723: лосиха_ENG
n_sjdrus.xml:2933: важенка_ENG
n_sjdrus.xml:4608: pulp ляшки_ENG
n_sjdrus.xml:5525: оленята_ENG
n_sjdrus.xml:5586: олененок_ENG
n_sjdrus.xml:5724: морозец_ENG
n_sjdrus.xml:5739: морозец_ENG
n_sjdrus.xml:5754: морозец_ENG
n_sjdrus.xml:5812: сиг_ENG
n_sjdrus.xml:5826: кумжа_ENG
n_sjdrus.xml:6020: пинагор_ENG
n_sjdrus.xml:6049: каменки_ENG
n_sjdrus.xml:6050: мальки_ENG
n_sjdrus.xml:6065: сиг big_ENG
n_sjdrus.xml:6066: big сиг_ENG
n_sjdrus.xml:6122: хариус_ENG
n_sjdrus.xml:6136: хариус_ENG
n_sjdrus.xml:6377: smell варенной fishes_ENG
n_sjdrus.xml:6447: бражка_ENG
n_sjdrus.xml:6886: лопанье_ENG
n_sjdrus.xml:6914: лопанье_ENG
n_sjdrus.xml:7243: шамшура_ENG
n_sjdrus.xml:7300: zone part on female ярах_ENG
n_sjdrus.xml:7332: skin дублённая_ENG
n_sjdrus.xml:7361: позументная tape_ENG
n_sjdrus.xml:7390: valve on man's ярах_ENG
n_sjdrus.xml:9687: вежа_ENG
n_sjdrus.xml:10103: pure place in куваксе_ENG
n_sjdrus.xml:11309: круча mountains_ENG
n_sjdrus.xml:11617: озерко_ENG
n_sjdrus.xml:11737: корга_ENG
n_sjdrus.xml:12211: thickets ивника_ENG
n_sjdrus.xml:15740: сонорный a sound_ENG
n_sjdrus.xml:15754: deaf сонорный a sound_ENG
n_sjdrus.xml:15782: deaf сонорный a short nasal sound_ENG
n_sjdrus.xml:15796: deaf сонорный a long nasal sound_ENG
n_sjdrus.xml:15810: deaf сонорный языковый a short sound_ENG
n_sjdrus.xml:15824: deaf сонорный языковый a long sound_ENG
n_sjdrus.xml:16663: cuffs малицы_ENG
n_sjdrus.xml:16729: малица_ENG
pron_sjdrus.xml:124: which-nibud_ENG
v_sjdrus.xml:1525: тошнить_ENG
v_sjdrus.xml:1709: тошнить_ENG
v_sjdrus.xml:4584: small шинковать_ENG
v_sjdrus.xml:4805: tax to give_ENG
v_sjdrus.xml:4822: is subject to the supreme court_ENG
v_sjdrus.xml:4984: will hesitate for a time_ENG
}}}
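To see how the missing translations are distributed over the files, the {{grep -n}} output above can be summarised per file. This is only a sketch: it assumes the grep output has been saved as lines of text, and the function name {{count_by_file}} is hypothetical.

```python
from collections import Counter


def count_by_file(grep_lines):
    """Count '_ENG' hits per file from 'file:line: text' lines of grep -n output."""
    counts = Counter()
    for line in grep_lines:
        filename, _, _ = line.partition(":")
        # Skip anything that is not a grep hit in an xml file.
        if filename.endswith(".xml"):
            counts[filename] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied from the list above.
    sample = [
        "n_sjdrus.xml:1579: ноготь_ENG",
        "n_sjdrus.xml:1597: огонь_ENG",
        "pron_sjdrus.xml:124: which-nibud_ENG",
        "v_sjdrus.xml:1525: тошнить_ENG",
    ]
    for filename, count in sorted(count_by_file(sample).items()):
        print(filename, count)
```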
!!!possible future todo
@Micha: a few observations:
* ё vs. е in Russian (e.g. вдвоём / вдвоем); perhaps we should consistently use ё in the xml, but include е (with spellrelax) for oahpa users?
* the semantics should be checked (do the other Oahpas use predefined sets of values?), e.g. why is э̄ххт тоа̄фант "one thousand" tagged "HUMAN", or why is кутӭ-кутӭ "two each" tagged "HUMAN" and "FOOD"? It could be anything: "cars", "reindeer", "xml databases", etc.
* common (uni)coding issues (perhaps we can apply a script to future incoming data):
** Latin letters in Cyrillic: a --> а, o --> о, etc. (even in Russian text)
** Precomposed vs. combining diaeresis: ё --> ё, ӓ --> ӓ, ӭ --> ӭ
** Precomposed vs. combining macron: ӣ --> ӣ, ӯ --> ӯ
* several multi word lemmata, like э̄ххт чӯдтҍ or югкеналла лыдцант or пя̄лла ӣнсэй оанҍхэсь нюннҍ тӣххт (especially the latter two are definitely not lemmas, but paraphrases)
** there are even entries with multiword expressions both as lemmas and translations, like:
{{{
ко̄ппче соа̄йметҍ
собирать сетки
to collect grids
}}}
Can we use these for a vocabulary trainer?
* English verbs with(out) "to"? (e.g. undress vs. to dress)
* free word order in Russian NP? (e.g. хвост короткий and короткий хвост)
* attr. vs. pred. adjectives (in sjd and rus!)
* translations need to be checked carefully, cf. this example: the basic meaning of this Kildin word is clearly "unfamiliar, unknown". Of course, in some situations this can also be expressed as "new", but as a translation of the lemma >eehk<, "new" is clearly wrong, especially in a vocabulary trainer (in a true dictionary we could give this "new" meaning in an example sentence)
{{{
е̄ххк
незнакомый
новый
unfamiliar
new
}}}
* inflected form as lemma
{{{
углясьт
в уголке
in a corner
}}}
pos="adv" is wrong, because this is an inflected noun (which can of course be used as an adverbial); I understand that such forms should be used for training vocabulary, but we have to find another tag for the pos value here
* this is not a "dim_set", but a "pl_set"!
{{{
па̄лл
мяч
ball
па̄л
мячи
balls
}}}
we could of course use the oahpa for the training of inflectional forms, but is it useful to have plural forms mixed up with diminutives?
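The common (uni)coding issues listed above (Latin letters inside Cyrillic words, combining vs. precomposed diaeresis and macron) could be caught by a script applied to incoming data. The following is a minimal sketch under two assumptions: the homoglyph table is only an illustrative subset, and a Latin letter is replaced only when its word also contains at least one Cyrillic letter, so that English translations stay untouched. The names {{LATIN_TO_CYRILLIC}} and {{clean}} are hypothetical.

```python
import re
import unicodedata

# Illustrative subset of Latin letters that look like Cyrillic ones (an assumption,
# not a complete table). The Cyrillic targets are given as escapes to keep them unambiguous:
# a c e o p x y A C E O P X  ->  а с е о р х у А С Е О Р Х
LATIN_TO_CYRILLIC = str.maketrans(
    "aceopxyACEOPX",
    "\u0430\u0441\u0435\u043e\u0440\u0445\u0443\u0410\u0421\u0415\u041e\u0420\u0425",
)


def has_cyrillic(token):
    """True if the token contains at least one character from the Cyrillic block."""
    return any("\u0400" <= ch <= "\u04ff" for ch in token)


def _fix_token(match):
    token = match.group(0)
    # Only touch words that are already partly Cyrillic.
    return token.translate(LATIN_TO_CYRILLIC) if has_cyrillic(token) else token


def clean(text):
    """Replace Latin look-alikes in Cyrillic words, then compose to NFC."""
    text = re.sub(r"\S+", _fix_token, text)
    # NFC turns base letter + combining mark into the precomposed letter where one exists.
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    print(clean("\u0435\u0308"))   # е + combining diaeresis
    print(clean("\u0438\u0304"))   # и + combining macron
    print(clean("to dress"))       # purely Latin, left unchanged
```

Two limitations of this sketch: a word typed entirely in Latin look-alikes (e.g. a single Latin "a" standing for the Russian conjunction) is not detected, and letters without a precomposed form, such as э̄, keep their combining macron after NFC normalisation.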