Mikas results: I found the following compounds: o v d d a s v á s t á d u s g u o v l l u t # o v d d a s v á s t á d u s g u o v l l u t i n t e r n á h t a s k u v l l a s # i n t e r n á h t a _ s k u v l l a s v e a r r o p o l i t i h k k a # v e a r r o p o l i t i h k k a r u h t a d a n m á r k a n á š š i ¤ u h t a d a n m á r k a n á š š i t e a i g g á d u š š a n v u o i g a t v u o đ a # e a i g g á d u š š a n v u o i g a t v u o đ a g i e l d d a h á l d d a h u s s i i # g i e l d a _ h á l d d a h u s s i i m e a h c c e e a l á h u s d o a i m m a i d b i r a s g á h t t e n d e p a r t e m e a n t a e a m i á l b m o g i i d g u o r a h a l l a n g u o v l l u g u o r a h a l l a n v u o i g a t v u o đ a d u t k a n m a t e r i á l a n l á h k a m e a r r á d u s a i d l á h k a p a r a g r á f a k u l t u r m u i t o s u o d j a l u s g u o v l l u d o a i m m a ---- proably a false positive ## ?? k u l t u r m u i t t u i n , _ c e a l k á _ k u l t u v r a _ m u i t u _ l á g a _ § _ 8 e a l á h u s b a r t a s a d j á i g e a s s e g u o h t u m i i d d a m u r j e n v u o i g a t v u o đ a f á d d á b a r g g u s g u o v l l u h á l d d a h u s d o a i m m a h a g a i n j i e n a s t a n l i h p p o k o n f a l u h t a č á l l i n p r o š e a v t t a i d e https://raw.githubusercontent.com/mikahama/compound-errors/main/data/src-gtfree-bible.text.json?token=AHZY5U2PFR2QMDJ5YJXR7PTAE2IXQ comments: * some Norwegian in the start * Sámi starts here: { "correct": "kristuskránsa", "error": [ "kristuskránsa" ], "pos": [ "?" ] }, { "correct": "¶", "error": [ "¶" ], "pos": [ "CLB\n:\\n\n\n" ] }, { "correct": "stállu", "error": [ "stállu" ], "pos": [ "N" ] }, Whis is there twice N? { "correct": "stálloáhkku", "error": [ "stállu", "áhkku" ], "pos": [ "N", "N" ] }, Not Pos: * Err/Orth, Err/MissingSpace, Ex/A, Ex/V, Err/Lex, Err/Orth-a-á, Err/Orth-nom-gen (267 Err/ matches and more than 1000 Ex/ matches) * also what about punctuation? - PUNCT, CLB?, sometimes nothing * also when there is a quotation mark or a bracket right next to a form, then it should not get PUNCT: ** (Čáhceviežžan -- this should get N ** “ii --- this should get V ** “Ho --- I don't know? { "correct": "heajumus", "error": [ "heajumus" ], "pos": [ "Ex/A" ] }, { "correct": "hálideigga", "error": [ "hálideigga" ], "pos": [ "Err/Orth" ] }, { "correct": "(Čáhceviežžan", "error": [ "(Čáhceviežžan" ], "pos": [ "PUNCT" ] }, { { "correct": "“ii", "error": [ "“ii" ], "pos": [ "PUNCT" ] }, { "correct": "“Ho,", "error": [ "“Ho," ], "pos": [ "PUNCT" ] }, { "correct": "ovttamielas", "error": [ "ovttamielas" ], "pos": [ "Err/MissingSpace" ] }, { "correct": "“", "error": [ "“" ], "pos": [ "PUNCT" ] },