Oc@s?ddlZddlZddlZddlZddlZddlZddlmZddlmZddl Z dd%dYZ de j fdYZ de j fd YZ d d&d YZd d'd YZde j fdYZdd(dYZde j fdYZdefdYZde j fdYZdd)dYZde j fdYZdd*dYZde j fdYZd d+d!YZd"d,d#YZed$kr;x[e e eeeeegD];Ze jZeje j ee j!j"eqWndS(-iN(tetree(tdoctestcomparet ParallelFilecBsVeZdZdZdZdZdZdZdZdZ dZ RS( sY A class that contains the info on a file to be parallellized, name and language cCs ||_||_|jdS(N(tnametparalangtsetLang(tselfRR((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyt__init__'s  cCs|jS(s6 Return the absolute path of the file (R(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytgetName,scCstjj|jS(s0 Return the dirname of the file (tostpathtdirnameR(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyt getDirname2scCstjj|jS(s1 Return the basename of the file (R R tbasenameR(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyt getBasename8scCs/tj|j}|jjd|_dS(s* Set the lang of the file s*{http://www.w3.org/XML/1998/namespace}langN(RtparseRtgetroottattribtlang(Rt origfile1Tree((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyR>scCs|jS(s* Get the lang of the file (R(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytgetLangEscCsftj|j}|j}|jd}x/|D]'}|jd|jkr7|jdSq7WdS(sf Get the basename of the parallel file Input is the lang of the parallel file s.//parallel_texts*{http://www.w3.org/XML/1998/namespace}langtlocationN(RRRRtfindallRR(RRtroott parallelFilestp((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytgetParallelBasenameKs   cCsD|jj|j|j}|jd}tjj||S(s> Infer the absolute path of the parallel file s.xml(R treplaceRRRR R tjoin(RtparallelDirnametparallelBasename((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytgetParallelFilenameWs!( t__name__t __module__t__doc__RRR RRRRR(((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyR"s       tTestParallelFilecBsMeZdZdZdZdZdZdZdZdZ RS(s1 A test class for the ParallelFile class cCs!ttjddd|_dS(NtGTFREEsA/prestable/converted/sme/facta/skuvlahistorja2/aarseth2-s.htm.xmltnob(RR tenvirontpfile(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytsetUpdscCs|j|jjddS(Nsaarseth2-s.htm.xml(t assertEqualR'R(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyt testBasenamegscCs4|j|jjtjjtjdddS(NR$s./prestable/converted/sme/facta/skuvlahistorja2(R)R'R R R RR&(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyt testDirnamejscCs(|j|jjtjdddS(NR$sA/prestable/converted/sme/facta/skuvlahistorja2/aarseth2-s.htm.xml(R)R'RR R&(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestNamemscCs|j|jjddS(Ntsme(R)R'R(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestLangpscCs|j|jjddS(Nsaarseth2-n.htm(R)R'R(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestGetParallelBasenamesscCs(|j|jjtjdddS(NR$sA/prestable/converted/nob/facta/skuvlahistorja2/aarseth2-n.htm.xml(R)R'RR R&(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestGetParallelFilenamevs( R R!R"R(R*R+R,R.R/R0(((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyR#`s      tTestSentenceDividercBsMeZdZdZdZdZdZdZdZdZ RS(s/A test class for the SentenceDivider class cCstdd|_dS(Ns4parallelize_data/finnmarkkulahka_web_lettere.pdf.xmlR-(tSentenceDividertsentenceDivider(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyR(|scCstj|dt}tj|dt}tj}|j||ds|jtjd||dj d}t |ndS(s5 Check if two xml snippets are equal t pretty_printitsutf-8N( RttostringtTrueRtLXMLOutputCheckert check_outputtoutput_differencetdoctesttExampletencodetAssertionError(Rtgottwantt string_gott string_wanttcheckertmessage((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytassertXmlEquals  *cCsL|j|jjd|j|jjd|j|jjjjddS(sACheck that the constructor makes what it is suppposed to iR-t _ElementTreeN(R)R3tsentenceCountertdocLangt inputEtreet __class__R (R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestConstructorscCs<|jj|jj}tjd}|j||dS(NsAparallelize_data/finnmarkkulahka_web_lettere.pdfsme_sent.xml.test(R3tprocessAllParagraphstdocumentRRRE(RR?R@((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestProcessAllParagraphss  cCsd|j_tjd}|jj|}tjd}|j||tjd}|jj|}tjd}|j||tjd}|jj|}tjd}|j||d|j_tjd }|jj|}tjd }|j||tjd }|jj|}tjd }|j||tjd }|jj|}tjd}|j||dS(s0Check the output of processOneParagraph R-s

Jápmá go sámi kultuvra veahážiid mielde go nuorat ovdal guldalit Britney Spears go Áilluhačča? máksá Finnmárkkuopmodat. § 10 Áššit meahcceeatnamiid

s

Jápmá go sámi kultuvra veahážiid mielde go nuorat ovdal guldalit Britney Spears go Áillohačča ?máksá Finnmárkkuopmodat .§ 10 Áššit meahcceeatnamiid

sZ

Stuora oassi Romssa universitehta doaimmain lea juohkit dieđuid sámi, norgga ja riikkaidgaskasaš dutkanbirrasiidda, sámi ja norgga eiseválddiide, ja sámi servodagaide (geahča mielddus A “Romssa universitehta ja guoskevaš institušuvnnaid sámi dutkan ja oahpahus” álggahusa ).

s=

Stuora oassi Romssa universitehta doaimmain lea juohkit dieđuid sámi , norgga ja riikkaidgaskasaš dutkanbirrasiidda , sámi ja norgga eiseválddiide , ja sámi servodagaide ( geahča mielddus A “ Romssa universitehta ja guoskevaš institušuvnnaid sámi dutkan ja oahpahus ” álggahusa ) .

sA

Artikkel i boka Samisk skolehistorie 2 . Davvi Girji 2007.

s]

Artikkel i boka Samisk skolehistorie 2 .Davvi Girji 2007 .

R%si

Bjørn Aarseth med elever på skitur - på 1950-tallet. (Foto: Trygve Madsen)

ss

Bjørn Aarseth med elever på skitur - på 1950-tallet .( Foto : Trygve Madsen )

sa

finne rom for etablering av en fast tilskuddsordning til allerede etablerte språksentra..

so

finne rom for etablering av en fast tilskuddsordning til allerede etablerte språksentra .

s6

elevene skal få! Sametingsrådet mener målet

sR

elevene skal få !Sametingsrådet mener målet

N(R3RHRtXMLtprocessOneParagraphRE(RRR?R@((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestProcessOneParagraphs4  cCsPd|j_tjd}|jj|}tjd}|j||dS(s4Test how SentenceDivider handles quotemarks R%s

Forsøksrådet for skoleverket godkjente det praktiske opplegget for kurset i brev av 18/8 1959 og uttalte da bl. a.: «Selve innholdet i kurset virker gjennomtenkt og underbygd og ser ut til å konsentrere seg om vesentlige emner som vil få stor betydning for elevene i deres yrkesarbeid. Med flyttsame-kunnskapen som bakgrunn er det grunn til å vente seg mye av dette kursopplegget.»

s

Forsøksrådet for skoleverket godkjente det praktiske opplegget for kurset i brev av 18/8 1959 og uttalte da bl. a. : « Selve innholdet i kurset virker gjennomtenkt og underbygd og ser ut til å konsentrere seg om vesentlige emner som vil få stor betydning for elevene i deres yrkesarbeid . Med flyttsame-kunnskapen som bakgrunn er det grunn til å vente seg mye av dette kursopplegget . »

N(R3RHRRORPRE(RRR?R@((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestQuotemarkss  cCsR|jjddddddg}|j|jdd|j|jd dS( Nu Sámerievttiu ovdáneapmiuleaudahkanu vuđđosauFinnmárkkuláhkiitidt0u@Sámerievtti ovdáneapmi lea dahkan vuđđosa Finnmárkkuláhkii(R3t makeSentenceR)Rttext(Rts((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestMakeSentences$( R R!R"R(RERKRNRQRRRX(((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyR1ys    # R2cBsDeZdZdZdZdZdZdZdZRS(s&A class that takes a giellatekno xml document as input. It spits out an xml document that has divided the text inside the p tags into sentences, but otherwise is unchanged. Each sentence is encases in an s tag, and has an id consisting of the outputfilename and a numberic cCs(d|_||_tj||_dS(Ni(RGRHRRRI(Rt inputXmlfileR((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRs  cCshtjd|_tjd}|jj|x0|jjdD]}|j|j|qDWdS(sLGo through all paragraphs in the etree and process them one by one. RMtbodys//pN(RtElementRMtappendRIRRP(RRZt paragraph((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRLs cCsNt|d}tj|j}|j|dtdddt|jdS(s0Write the given tmx to the outfile name twR4tencodingsutf-8txml_declarationN(topenRt ElementTreeRMtwriteR7tclose(Rtoutfiletftet((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyt writeResultscCs&g}|jdkrDtjjtjdd}dd|g}nOtjjtjdd}tjjtjdd}dd|d|g}tj|d tjd tjd tj}|j|j d \}}|j d krt j dIJt j |IJt j |IJt j n|SdS(s}Send the text in preprocessInput into preprocess, return the result. If the process fails, exit the program R%tGTHOMEsst/nob/bin/abbr.txtt preprocesss--abbr=sgt/sme/bin/abbr.txtsgt/sme/bin/corr.txts--corr=tstdintstdouttstderrsutf-8isCould not divide into sentencesN(RHR R RR&t subprocesstPopentPIPEt communicateR=t returncodetsysRmtexit(RtpreprocessInputtpreprocessCommandtabbrFiletcorrFiletsubptoutputterror((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytgetPreprocessOutputs*    c Cstjd}|jd}dj|}|r|j|}g}t}d}x|jdD]} |j| jd| dkr|tkrt }qt}|dks|dkr|j|j |g}qn| dks| dks| d kr`|t kr`|dgkrW|dgkrW|d gkrW|j|j |ng}n| }qdWt |d kr|j|j |qn|S( sRun the content of the origParagraph through preprocess, make sentences Return a new paragraph containing the marked up sentences Rs .//text()R5s sutf-8s«»"t.t?t!i( RR[txpathRR|tFalsetsplitR\tdecodeR7RUtlen( Rt origParagrapht newParagraphtallTextRuRztsentencet insideQuotet previousWordtword((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRPs2    0-  cCsJtjd}t|j|jd/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRU+s ( R R!R"RRLRhR|RPRU(((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyR2s    (t ParallelizecBszeZdZdZdZdZdZdZdZdZ dZ d Z d Z d Z d ZRS( s* A class to parallelize two files Input is the xml file that should be parallellized and the language that it should be parallellized with. The language of the input file is found in the metadata of the input file. The other file is found via the metadata in the input file cCsg|_ttjj||}|jj|t|jdj|jdj}|jj||jr|j ndS(s Set the original file name, the lang of the original file and the language that it should parallellized with. Parse the original file to get the access to metadata iN( t origfilesRR R tabspathR\RRtisTranslatedFromLang2treshuffleFiles(Rt origfile1tlang2ttmpfile((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyR?s ) cCs2|jd}|jd|jd<||jd/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRPs cCs|jS(sE Return the list of (the two) files that are aligned (R(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyt getFilelistXscCs|jdjS(Ni(RR(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytgetLang1^scCs|jdjS(Ni(RR(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytgetLang2ascCs|jdjS(Ni(RR(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyt getOrigfile1dscCs|jdjS(Ni(RR(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyt getOrigfile2gscCskt}tj|j}|j}|jd}|dk rg|jd|jkrgt }qgn|S(sD Find out if the given doc is translated from lang2 s.//translated_froms*{http://www.w3.org/XML/1998/namespace}langN( RRRRRtfindtNoneRRR7(RtresultRRttranslated_from((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRjs   cCs#tjjtjdd}tjjtjdd}tjdd|jd|jdtjd||gd tjd tj}|j \}}|j d krt j d It jId IJt j |IJt j |IJ|j Sd|j|jd}tjjtjd|S(sb Generate an anchor file with lang1 and lang2. Return the path to the anchor file Risgt/common/src/anchor.txtsgt/common/src/anchor-admin.txtsgenerate-anchor-list.pls--lang1=s--lang2s --outdir=R$RlRmisCould not generate s into sentencessanchor-s.txt(R R RR&RnRoRRRpRqRrRsRmR'R(Rtinfile1tinfile2RyRzR{t outFilename((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytgenerateAnchorFilexsR  cCsx|jD]}tjj|j}tjj|rx|j|}t||j}|j |j |q t j |IdIJdSq WdS(s` Call corpus-analyse.pl which reads an xml file and makes it palatable for tca2 s doesn't existii( RR R RRtexiststgetSentFilenameR2RRLRhRsRm(RR'tinfileRetdivider((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytdividePIntoSentencess cCs9|jjdd}tjdd||jdS(sr Compute a name for the corpus-analyze output and tca2 input file Input is a ParallelFile s.xmlR5R$s/tmp/s _sent.xml(RRR R&R(RR't origfilename((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRscCs|j}tjd||j|jd|j|jdgdtjdtj}|j\}}|jdkrtj dI|j|jdIdI|j|jdIdIJtj |IJtj |IJn|jS( s2 Parallelize two files using tca2 stca2.shiiRlRmsCould not parallelizetands into sentences( RRnRoRRRpRqRrRsRm(Rt anchorNameRyRzR{((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytparallelizeFiless SK (R R!R"RRRRRRRRRRRR(((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyR6s           tTestParallelizecBs_eZdZdZdZdZdZdZdZdZ dZ d Z RS( s0 A test class for the Parallelize class cCs!ttjddd|_dS(NR$sA/prestable/converted/sme/facta/skuvlahistorja2/aarseth2-s.htm.xmlR%(RR R&t parallelize(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyR(scCs(|j|jjtjdddS(NR$sA/prestable/converted/nob/facta/skuvlahistorja2/aarseth2-n.htm.xml(R)RRR R&(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyt testOrigPathscCs(|j|jjtjdddS(NR$sA/prestable/converted/sme/facta/skuvlahistorja2/aarseth2-s.htm.xml(R)RRR R&(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestParallelPathscCs|j|jjddS(NR%(R)RR(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyt testLang1scCs|j|jjddS(NR-(R)RR(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyt testLang2scCs8|j|jj|jjdtjdddS(NiR$s/tmp/aarseth2-n.htmnob_sent.xml(R)RRRR R&(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestGetSentFilenamescCs|j|jjddS(Ni(R)RR(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestDividePIntoSentencesscCs|j|jjddS(Ni(R)RR(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestParallizeFilesscCs3|j|jjtjjtjdddS(NR$sanchor-nobsme.txt(R)RRR R RR&(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestGenerateAnchorFiles( R R!R"R(RRRRRRRR(((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRs        tTmxcBs_eZdZdZdZdZdZdZdZdZ dZ d Z RS( s| A class that reads a tmx file, and implements a bare minimum of functionality to be able to compare two tmx'es cCs ||_dS(N(ttmx(RR((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRscCs|jjdjdd S(s9 Get the srclang from the header element s .//headertsrclangi(RRR(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyt getSrcLangscCs|jS(s) Get the tmx xml element (R(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytgetTmxscCsd}y||ddjj}Wntk r8nX|d7}y||ddjj}Wntk runX|d7}|jdS(s9 Extract the two strings of a tu element R5is is sutf-8(RVtstriptAttributeErrorR=(Rttutstring((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyt tuToStrings    cCs>d}y|djj}Wntk r0nX|jdS(s9 Extract the string from the tuv element R5isutf-8(RVRRR=(RttuvR((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyt tuvToStrings  cCs[|jjd|ddidd6}g}x$|D]}|j|j|q7W|S(sm Get all the strings in the tmx in language lang, insert them into the list strings s.//tuv[@xml:lang="s"]t namespacess$http://www.w3.org/XML/1998/namespacetxml(RRR\R(RRtall_tuvtstringsR((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytlangToStringlist s  cCsF|jjd}g}x$|D]}|j|j|q"W|S(sH Extract all string pairs in a tmx to a list of strings s.//tu(RRR\R(Rtall_tuRR((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttmxToStringlists  cCsy,|ddjj}||dd_Wntk r?nXy,|ddjj}||dd_Wntk rnX|S(s' Input is a tu-element ii(RVRR(RRR((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyt prettifySegs"s  cCs|jjd}tjd}x[|D]S}tjd}|j|d|j|d|j|}|j|q+Wtjd}|j|||_dS(sL Reverse the langs in a tmx Return the reverted tmx s.//tuRZRiiRN(RRRR[R\RR(RRRZRRR((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyt reverseLangs4s  ( R R!R"RRRRRRRRR(((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRs      tTestTmxcBsVeZdZdZdZdZdZdZdZdZ dZ RS( s( A test class for the Tmx class cCsttjd|_dS(Ns#parallelize_data/aarseth2-n.htm.tmx(RRRR(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyR(MscCstj|dt}tj|dt}tj}|j||ds|jtjd||dj d}t |ndS(s5 Check if two xml snippets are equal R4iR5sutf-8N( RR6R7RR8R9R:R;R<R=R>(RR?R@RARBRCRD((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyREPs  *cCs|j|jjddS(s$Test the getSrcLang routine R%N(R)RR(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestGetSrcLang]scCs/tjd}|j|jj|ddS(NsdSámegiellaSamisksSámegiella Samisk (RROR)RR(RR((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestTuToStringbscCs/tjd}|j|jj|ddS(Ns0Sámegiellas Sámegiella(RROR)RR(RR((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestTuvToStringgscCstdd}|j}g}g}xE|D]=}|jd}|j|d|j|djq.W|j|jjd||j|jjd|dS(Ns*parallelize_data/aarseth2-n.htm.tmx.as.txttrs iiR%R-(Rat readlinesRR\RR)RR(RRft stringListtnobListtsmeListRtpairList((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestLangToStringListls  cCsBtdd}|j}|j|j|jj|dS(Ns*parallelize_data/aarseth2-n.htm.tmx.as.txtR(RaRRdR)RR(RRftwantList((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestTmxToStringlistzs  cCs>tjd}tjd}|j|jj||dS(Nsubba gubba. ibba gibba.abba gabba. ebba gebba.subba gubba. ibba gibba. abba gabba. ebba gebba. (RRORERR(RtwantXmltgotXml((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestPrettifySegss( R R!R"R(RERRRRRR(((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRIs      t TmxFromTca2cBsheZdZdZdZdZdZdZdZdZ dZ d Z d Z RS( sA A class to make tmx files based on the output from tca2 cCs#||_tj||jdS(s9 Input is a list of ParallelFile objects N(tfilelistRRtsetTmx(RR((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRs cCs_tjd}|j|j||jdj|j|j||jdj|S(sI Make a tmx tu element based on line1 and line2 as input Rii(RR[R\tmakeTuvRR(Rtline1tline2R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytmakeTus&&cCs]tjd}||jd/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRs  ! cCsTtjd}d|jd Make a tmx header based on the lang variable theaderRtsegtypes OmegaT TMXso-tmfsen-USt adminlangRt plaintexttdatatype(RR[R(RRR((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyt makeTmxHeaders     cCs7|jdd}tjd}|jd|}|S(s7 Remove the s tags that tca2 has added sR5s(Rtretcompiletsub(RRtsregex((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRscCsd|jdjd}d|jdjd|jdjd}|jdjj||}|jdjjdd}tjj||S( s2 Compute the name of the tmx file s /converted/it/s/tmx/t2is.xmls.tmx(RRR RRR R R(Rt origPathParttreplacePathPartt outDirnameR((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytgetOutfileNames 0cCsz|j}yQt|d}tj|j}|j|dtdddt|jWntk rudGHnX|S(s< Write a tmx file given a tmx etree element R^R4R_sutf-8R`souch, printTmxFile( RRaRRbRRcR7RdtIOError(RRRfRg((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyt printTmxFiles   c Cstjd}|j|jdj}|j||j|jd}|j|jd}tj|d}x?td||D]+\}}|j ||}|j|qW|S(sE Make tmx file based on the two output files of tca2 RiiRZN( RR[RRRR\treadTca2Outputt SubElementtmapRR( RRRt pfile1_datat pfile2_dataRZRRR((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRs cCs{d}|j|jdd}y)t|d}|j}|jWn*tk rv\}}dj||GHnX|S(sI Read the output of tca2 Input is a ParallelFile R5s.xmls_new.txtRsI/O error({0}): {1}(RRRaRRdRtformat(RR'RVt pfileNameRfterrnotstrerror((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRs cCs9|jjdd}tjdd||jdS(sr Compute a name for the corpus-analyze output and tca2 input file Input is a ParallelFile s.xmlR5R$s/tmp/s _sent.xml(RRR R&R(RR'R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRs( R R!R"RRRRRRRRRR(((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRs      tTestTmxFromTca2cBsVeZdZdZdZdZdZdZdZdZ dZ RS( s0 A test class for the TmxFromTca2 class cCs3ttjddd}t|j|_dS(sK Hand the data from the Parallelize class to the tmx class R$sA/prestable/converted/sme/facta/skuvlahistorja2/aarseth2-s.htm.xmlR%N(RR R&RRR(Rtpara((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyR(scCstj|dt}tj|dt}tj}|j||ds|jtjd||dj d}t |ndS(s5 Check if two xml snippets are equal R4iR5sutf-8N( RR6R7RR8R9R:R;R<R=R>(RR?R@RARBRCRD((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyREs  *cCsDd}d}|jj||}tjd}|j||dS(Ns3ubba gubba. ibba gibba.s3abba gabba. ebba gebba.subba gubba. ibba gibba.abba gabba. ebba gebba.(RRRRORE(RRRtgotTutwantTu((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyt testMakeTu%s cCsDd}d}|jj||}tjd}|j||dS(Ns3ubba gubba. ibba gibba.tsmis<ubba gubba. ibba gibba.(RRRRORE(RRRtgotTuvtwantTuv((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyt testMakeTuv/s cCs;d}|jj|}tjd}|j||dS(NRsd
(RRRRORE(RRRR((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestMakeTmxHeader8scCs,|jjd}d}|j||dS(Ns3ubba gubba. ibba gibba.subba gubba. ibba gibba.(RRR)(RR?R@((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestRemoveSTag@scCs3|j|jjtjjtjdddS(NR$s>prestable/tmx/nob2sme/facta/skuvlahistorja2/aarseth2-n.htm.tmx(R)RRR R RR&(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestGetOutfileNameFscCsHtjd}|jjtj|jj}|j||dS(Ns#parallelize_data/aarseth2-n.htm.tmx(RRRRRRE(RR@R?((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestPrintTmxFileIs ( R R!R"R(RERR R R R R (((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyR s    t TmxComparatorcBs;eZdZdZdZdZdZdZRS(s* A class to compare two tmx-files cCs||_||_dS(N(twantTmxtgotTmx(RRR((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRTs cCst|jjS(sA Return the number of lines in the reference doc (RRR(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytgetLinesInWantedfileYscCs\d}xOtj|jj|jjddD]#}|d dkr1|d7}q1q1W|S(s Given a unified_diff, find out how many lines in the reference doc differs from the doc to be tested. A return value of -1 means that the docs are equal itniit-(tdifflibt unified_diffRRR(Rt numDiffLinesR((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytgetNumberOfDifferingLines`s 1cCsLg}x?tj|jj|jjddD]}|j|q1W|S(s? Return a stringlist containing the diff lines Ri(RRRRRR\(RtdiffR((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyt getDiffAsTextns1cCsVg}xItj|jj||jj|ddD]}|j|dq7W|S(s? Return a stringlist containing the diff lines Ris (RRRRRR\(RRRR((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytgetLangDiffAsTextxs7(R R!R"RRRRR(((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRPs     tTestTmxComparatorcBseZdZdZRS(s2 A test class for the TmxComparator class cCsytttjdttjd}|j|jd|j|jd|jt|jddS(Ns#parallelize_data/aarseth2-n.htm.tmxii i( RRRRR)RRRR(Rtcomp((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestEqualTmxess-(R R!R"R(((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRstTmxTestDataWritercBsVeZdZdZdZdZdZdZdZdZ dZ RS( s5 A class that writes tmx test data to a file cCsZ||_y&tj|}|j|jWn$tk rU|j|jnXdS(N(tfilenameRRtsetParagsTestingElementRRtmakeParagstestingElement(RRttree((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRs   cCs|jS(N(R(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyt getFilenamescCs:tjd}||jd<||jd<||jd<|S(s; Make the element file, set the attributes tfileRtgspairst diffpairs(RR[R(RRR%R&t fileElement((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytmakeFileElements    cCs ||_dS(N(t paragstesting(RR)((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyR scCs tjd}||jd<|S(s= Make the testrun element, set the attribute ttestruntdatetime(RR[R(RR+ttestrunElement((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytmakeTestrunElements cCstjd}|S(s0 Make the paragstesting element R)(RR[(RtparagstestingElement((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyR!scCs|jjd|dS(Ni(R)tinsert(RR*((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytinsertTestrunElementscCsnyQt|jd}tj|j}|j|dtdddt|jWntk ridGHnXdS(s8 Write the paragstesting data to a file R^R4R_sutf-8R`souch, ParagstestingresultsN( RaRRRbR)RcR7RdR(RRfRg((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytwriteParagstestingDatas ( R R!R"RR#R(R R-R!R0R1(((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRs    tTestTmxTestDataWritercBsVeZdZdZdZdZdZdZdZdZ dZ RS( s+ A class to test TmxTestDataWriter cCstd|_dS(Nt testfilename(Rtwriter(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyR(scCstj|dt}tj|dt}tj}|j||ds|jtjd||dj d}t |ndS(s5 Check if two xml snippets are equal R4iR5sutf-8N( RR6R7RR8R9R:R;R<R=R>(RR?R@RARBRCRD((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyREs  *cCs|j|jjddS(NR3(R)R4R#(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestGetFilenamescCs;tjd}|jjddd}|j||dS(Ns/tabct634t84(RROR4R(RE(Rt wantElementt gotElement((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestMakeFileElementscCsZtjd}|jjd}|jjddd}|j||j||dS(Ns[s 20111208-1234R6R7R8(RROR4R-R(R\RE(RR9R:R'((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestMakeTestrunElements  cCsvtjd}|jj}|jjd}|jjddd}|j||j||j||dS(Nszs 20111208-1234R6R7R8(RROR4R!R-R(R\RE(RR9R:R,R'((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestMakeParagstestingElements  cCstjd}|jj}|jj||jjd}|jjddd}|j||j||jjd}|jjddd}|j||jj||j ||dS(Nss 20111208-1234R6R7R8s 20111208-2345( RROR4R!R R-R(R\R0RE(RR9R:R,R'((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestInsertTestrunElements   cCstjd}|jj}|jj||jjd}|jjddd}|j||j||jjtj |jj }|j ||dS(Nszs 20111208-1234R6R7R8( RROR4R!R R-R(R\R1RRRE(RR@R:R,R'R?((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyttestWriteParagstestingData$s   ( R R!R"R(RER5R;R<R=R>R?(((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyR2s     tTmxGoldstandardTestercBsbeZdZd dZdZdZdZdZdZ dZ dZ d Z RS( sL A class to test the alignment pipeline againt the tmx goldstandard cCsMd|_t||_|dkr6|j|_n|j||_dS(sn Set the name where the testresults should be written Find all goldstandard tmx files iN(tnumberOfDiffLinesRttestresultWriterRt dateformattdate(Rttestresult_filenametdateformat_addition((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyR8s   cCs|j|7_dS(sI Increase the total number of difflines in this test run N(RA(Rt diffLines((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytsetNumberOfDiffLinesDscCs|jS(N(RA(R((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytgetNumberOfDiffLinesJscCs=ddl}ddl}|jj|j}|jdS(sQ Get the date and time, 20111209-1234. Used in a testrun element iNs %Y%m%d-%H%M(R+ttimet fromtimestamptstrftime(RR+RJtd((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRCMs  cCs|jj|j}d}xX|jD]J}dG|GdGH|jddkrYd}nd}|j|||q(W|jj||jjdS(NR5ttestings...tnob2smeiR-R%(RBR-RDtfindGoldstandardTmxFilesRt alignFilesR0R1(RR*Rt wantTmxFile((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pytrunTestWs  c Cs|j|}t||}|jdkr|jdkr|j}t|}ttj|}t ||} |j j |dj t | jt | j} |j| j|j| |j| ||dj qndS(s Align files Compare the tmx's of the result of this parallellization and the tmx of the goldstandard file Write the result to a file Write the diffs of these to tmx's to a separate file iN(tcomputeXmlfilenameRRRRRRRRRRBR(RRRRRHR\twriteDiffFiles( RR*RRRtxmlFilet parallelizerRRRt comparatorR'((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRQns   : cCsL|jdd}|jdd}|jdd}|jdd}|S( sI Compute the name of the xmlfile which should be aligned stmx/goldstandard/s converted/ROR%tsme2nobR-s.tmxs.xml(R(RRRRV((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRTs cCsadG|GH|d|jd}tjjtjj|jjd}y"ttjj||d}Wn6tk rdGtjj||GHt j dnX|j d|d |j d |j |j |j d |jd |j |j|j|j d |jd |j |j|j|j d |jdS(s/ Write diffs to a jspwiki file RUt_s.jspwikit tca2testingR^scouldn't write fileis!!!s s!!TMX diff {{{ s }}} !!s diff {{{ s }}} N(RDR R RR RBR#RaRRsRtRct writelinesRRRRRd(RRXRWRR Rf((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRUs" *"   cCstjdtjjtjdddddgdtjdtj}|j\}}|jd krt j d IJt j |IJt j d n|j d }|d SdS(sH Find the goldstandard tmx files, return them as a list RR$sprestable/tmx/goldstandards-names*.tmxs-printRlRmis*Error when searching for goldstandard docsis iN( RnRoR R RR&RpRqRrRsRmRtR(RRyRzR{tfiles((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRPsF  N( R R!R"RRRHRIRCRSRQRTRURP(((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyR@4s    # tTmxFixercBseZdZdZRS(s A class to reverse the langs and change the name of a tmx file if needed Possible errors of a tmx file: * the languages can be in the wrong order * the name is wrong * the file is placed in the wrong lang directory cCsdS(sA Input is the tmx file we should consider to fix N((Rt filetofix((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyRs(R R!R"R(((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyR^st__main__(((((((((#R RRsRnRR;tlxmlRRtunittestRtTestCaseR#R1R2RRRRRRRRRR2R@R^R ttestt TestSuitet testSuitetaddTestt makeSuitetTextTestRunnertrun(((s>/home/boerre/langtech/trunk/gt/script/langTools/parallelize.pyts:       >Sju=D2!AP "