eXist-based speller test results browser - gold standard testing.
Issues with the present solution:
* static, not searchable
* displays everything - no filtering options
* does not allow (historic) comparison of different spellers
* displays the performance on a word level (this makes it easy to fix exactly
those errors, and thus in turn fool the test)
* no logging of which version of the gold standard corpus was used
* no comparison of released versions using the same test data
Spec for a replacement:
* some test results only visible with authorization (login)
* a set of "views" or speller profiles visible to everyone
* some views should be filterable
** display only test results for spelling errors of type X
(selectable from a dropdown menu)
* dynamically created graphs
* test result stats also available as downloadable CSV (for easy creation of
other types of graphs)
* comparison of a set of spellers (independent of language)
* comparison of different versions of the same speller tested on the same data
* only gold standard tests; regression and other tests should move to test
scripts in the new infrastructure
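A minimal Python sketch of the CSV export and the error-type filter from the spec above. The {{Result}} record and its fields, the {{stats_csv}} function and the choice of statistics (errors, detected errors, correct first suggestion, recall) are assumptions for illustration, not the actual format of the test output.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Result:
    # Hypothetical record for one marked error in one gold standard test run.
    speller: str          # speller / language, e.g. "smj"
    version: str          # speller release that was tested
    errtype: str          # error type from the markup, e.g. "vowc"
    detected: bool        # the speller flagged the misspelled word
    first_correct: bool   # the gold standard correction was the top suggestion


def stats_csv(results: Iterable[Result], errtype: Optional[str] = None) -> str:
    """Aggregate results per (speller, version) and return them as CSV text.

    If errtype is given, only errors of that type are counted, which mirrors
    the dropdown filter in the spec.
    """
    # (speller, version) -> [errors, detected, first_correct]
    totals = defaultdict(lambda: [0, 0, 0])
    for r in results:
        if errtype is not None and r.errtype != errtype:
            continue
        row = totals[(r.speller, r.version)]
        row[0] += 1
        row[1] += r.detected        # bool counts as 0 or 1
        row[2] += r.first_correct
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["speller", "version", "errors", "detected",
                     "first_correct", "recall"])
    for (speller, version), (n, det, first) in sorted(totals.items()):
        # recall here = share of marked errors that the speller detected
        writer.writerow([speller, version, n, det, first, f"{det / n:.3f}"])
    return out.getvalue()
```

Grouping by (speller, version) means that comparing releases tested on the same data is a matter of reading consecutive rows, and the same rows could feed the dynamically created graphs.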
Compare with existing graphs.
Some data is not properly represented in the XML files today:
{{{
errorinfo=vowc,i-ï;cons,j-g, file: Sorsamisk-valprogram.correct.doc.xml
}}}
Here, {{errorinfo}} should be converted into a separate attribute, and the same
goes for the file info ({{file: Sorsamisk-valprogram.correct.doc.xml}}).
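A minimal sketch of the conversion step in Python, assuming the packed value always has the form {{errorinfo=<details>, file: <filename>}} as in the example above. The function name {{split_errorinfo}} and the returned keys are hypothetical; the attribute names in the new XML format are still to be decided.

```python
import re

# Assumed shape of the packed value: "errorinfo=<details>, file: <filename>".
# The details may contain commas, so the lazy group stops only where
# ", file:" (with optional whitespace or newline) begins.
PACKED = re.compile(r"^errorinfo=(?P<errorinfo>.*?),?\s*file:\s*(?P<file>\S+)\s*$")


def split_errorinfo(raw: str) -> dict:
    """Split a packed errorinfo string into separate errorinfo and file values."""
    match = PACKED.match(raw.strip())
    if match is None:
        # Leave unexpected values for manual inspection instead of guessing.
        raise ValueError(f"unexpected errorinfo format: {raw!r}")
    return {
        "errorinfo": match.group("errorinfo").strip(),
        "file": match.group("file"),
    }
```

Values that do not fit the pattern, such as {{errtype=mix,pos=prop}}, raise an error rather than being silently mangled, so they can be checked by hand.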
There might be errors in the markup:
{{{
errorinfo=vowcinit;Ae-Aa,
file: Duedtie_faageteekste.correct.txt.xml
errorinfo=conm;n-b;finis,ie-e,
file: Dokumentaarefilme_Ealoenyjsenaejja.correct.doc.xml
errorinfo=dem;i-ï, infl;nom,nomsg-nompl ,
file: Sveerjen_nasjovnegaerjagaetie.correct.txt.xml
errtype=mix,pos=prop
}}}
Create a list of potential markup errors (i.e. entries of the form ;X-Y) and
send it to Maja, so that she can correct the errors in the original files.
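The list could be produced with a simple regular expression. This Python sketch assumes the errorinfo values have already been separated from the file names, and flags every semicolon followed directly by an X-Y pair, as in {{vowcinit;Ae-Aa}}, whereas well-formed entries such as {{vowc,i-ï;cons,j-g}} put a comma between the error category and the pair. It is a heuristic, so the result is a list of candidates to check, not of confirmed errors. The name {{find_suspects}} is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Heuristic: ";" followed directly by X-Y, with no comma-separated
# category in between (e.g. ";Ae-Aa" in "vowcinit;Ae-Aa").
SUSPECT = re.compile(r";([^,;\s]+)-([^,;\s]+)")


def find_suspects(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (file, errorinfo, fragment) for every suspicious ;X-Y match.

    records is an iterable of (file, errorinfo) pairs.
    """
    suspects = []
    for filename, errorinfo in records:
        for match in SUSPECT.finditer(errorinfo):
            suspects.append((filename, errorinfo, match.group(0)))
    return suspects
```

Including the file name in each row lets the corrections be made directly in the original files.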
Work plan:
* convert all XML files to a new version in which errorinfo and file are
separate attributes
* upload the new XML files for smj to an eXist database
* start experimenting with code for the web app
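For the upload step, eXist-db's REST interface accepts an HTTP PUT of a document to a collection path. A minimal Python sketch using only the standard library, assuming a local eXist instance on the default port 8080 with basic authentication; the collection {{/db/speller-tests/smj}} and both function names are hypothetical.

```python
import base64
import urllib.request
from pathlib import Path
from urllib.parse import quote

# Assumed local eXist-db instance; adjust host, port and collection as needed.
EXIST_REST = "http://localhost:8080/exist/rest"
COLLECTION = "/db/speller-tests/smj"  # hypothetical collection name


def build_put_request(name: str, data: bytes, user: str,
                      password: str) -> urllib.request.Request:
    """Build an HTTP PUT that stores one XML document via eXist's REST interface."""
    url = f"{EXIST_REST}{COLLECTION}/{quote(name)}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/xml",
                 "Authorization": f"Basic {token}"},
    )


def upload_all(directory: Path, user: str, password: str) -> None:
    """Upload every converted *.xml file in directory (needs a running server)."""
    for path in sorted(directory.glob("*.xml")):
        request = build_put_request(path.name, path.read_bytes(), user, password)
        with urllib.request.urlopen(request) as response:
            print(path.name, response.status)
```

Keeping request construction separate from sending makes it possible to check the target URLs before anything is written to the database.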