$Rev: 158353 $ Added the missing files for a working grammar checker. Fixed grammar checker build rules to not be dependent upon enabling tokenisers. r158289: Added conversion of the analysis tags from the grammar checker speller into CG format. r158250: One misplaced variable caused the grammar checker speller to be built independent of the configuration. This caused a build failure for everyone. Solves bug #2437. Also added $(srcdir) in front of root.lexc, to ensure that the file reference resolves correctly in local build targets. r158242: Moved the target clean-local to the local Makefile, to make it possible to enhance the clean target with locally generated files. r157960: Corrections to the grammar checker speller build: we now build a working zhfst file that can be used as part of the development cycle. Also additions to silent builds. r157879: Major update to the grammar checker template. It still does not work completely as it should, so hold your horses. Update content: ensured that all files needed are copied to the grammar checker build dir, removed option to name files (=irrelevant bloat), now builds an almost proper zip file, and ensured that tokenisers are built before grammarcheckers. Also made it so that when grammar checkers are enabled, spellers are automatically enabled too, as they will be included as part of the grammar checker pipeline. r157261: Changed the file exists test for the lemma generation testing so that it will work even in cases where multiple source files are used as input. r157204: Made cg3 file compilation more general. r157096: Moved the code to build the apertium relabel script in the apertium directory, so that we can use the actual giella-tagged fst for MT as the tag source. This should fix all issues of missing tags in the relabel script. r157021: GLE requires regex compilation possibilities in src/, no reason why it can't be. 
r156971: Fixed a shortcoming in the build infra uncovered by gle: no explicit support for language-specific build rules that will not end up in lexicon.?fst. r156319: Moved tag extraction to a separate am-include file, so that it can be shared between different dirs. Moved generation of regex for turning tags into CG friendly format from src/filters/ to tools/tokenisers/filters/. r156233: After a couple of bug fixes in giella-core, require the new version. r156188: Initial support for building tokenisers where the morphological analysis tags are given in CG format directly instead of having to be postprocessed by hfst-tokenise before being printed. The idea is to make the hfst-tokenise code more general, and move everything that is particular to one language or setup into the fst instead of being hardcoded in the C++ code. There are some issues that must be resolved, but fst-wise the code works. r156180: Added support for building a regex that transforms all tags from the format "+Adv" to " Adv" (including space). The idea is to make the tags readily consumable by CG. Both prefix and suffix tags are converted. Newest giella-core required. r156162: Part two of renaming the preprocess dir to tokenisers. Now all refs to it are updated. r156153: Renamed the preprocess dir to tokenisers, to better describe the content of it. r155820: Added support for diffing and merging on Linux. As part of that added checking for diff tools in m4/giella-macros.m4, and added more tests against failures. Also added test for cg-mwesplit, and increased the required vislcg3 version to the 1.0 release. r155779: More robust test for the existence of the various vislcg3 files. r155748: Added more robust option checking, and a test for the existence of the specified corpus file. Also added some comments. r155732: Actually open the other diff views. And force-add to svn - we don't want error messages in this context. r155718: Corrected glaring variable copy&paste bug. 
Thanks to Trond for spotting it! r154835: Removed from the default build rules the automatic removal of +Comp tags in adverbs. That is definitely not a behavior we want universally. r154751: Fixed a bug that caused the check_analysis_regressions.sh script to fail if you hadn't put giella-core/scripts/ in your path - which is not automatically done when you just check out giella-core and your language of interest. r154655: Changed command to extract the specified fst name, the old version was not reliable. r153095: Due to wrong AM conditional, it still built a few mobile speller fst's. Now it should be quiet. r153089: Really do disable mobile spellers by default... r153083: Made mobile spellers not build by default, even when enabling spellers. The mobile spellers must now be explicitly enabled. r152757: Removed Ins() around Unknown. This triggered a bug(?) in hfst-tokenise, that caused wordforms not to be output. Speed and memory consumption should not be noticeably affected. r152167: Improved pmatch scripts - unification by reference instead of full fst unification. Reduces file size by ≈2/3, and runtime memory consumption by 50%. r151497: Now that there is a new version of Hfst out, require it. Should resolve issues with compiling the url.lexc file. r150101: Further development of the analysis regression check: added support for diff views of all diff types, and now you can specify which diff view you want to see (and you must specify at least one). You can also override the default corpus, and specify a corpus of your own with the -c/--corpus option. Also corrected the initial description of the script in the help text, and added a diff view comparing the old pipeline using Xerox with the new pipeline using hfst-tokenise. This will help in finding unwanted differences between the two. r150035: Further improvements to the analysis regression check: only do function and dependency analysis if the required cg3 files exist. 
Also clarified the -d option and silenced the Xerox lookup tool. r150021: Improved analysis regression check script: added a short help text, and added an option to ask for a diff between old-style (preprocess+lookup+lookup2cg) and new-style (hfst-tokenise+mwe-disamb+cg-mwesplit) morphological analysis. Intended to be used to find weak (and strong!) spots in the new-style morphological analysis. r150008: Added the first version of a $LANG/devtools/ script that will process a corpus with the available tools, and compare the result against the previous version in the svn repository. The idea is to be able to easily spot regressions in analyses due to changes in the lexicons or CG rules. There are a number of rough edges, but it works. r149897: Only remove generated lemma files if the lemma generation tests succeed. r149609: Only delete generated dic and tex files if one really wants to start anew. Do not delete the version.txt file, only the generated wordlist file. r149598: Add the url parser also to the grammar checker tokeniser. r149543: Make the url.hfst a dependent of the hfst tokenising analyser. Improved the tokeniser based on recent changes in sme. r149455: Removed automatic inclusion of the url parsing fst. The union with the regular fst blew up the total, in some cases more than 10x! The preferred way of adding it is to add it in the last steps of the *.tmp.fst > *.fst processing by loading it onto the stack (and invert it for hfst) before saving the fst stack, and thus creating a transducer file with two fst's. Applying the input to them both will in effect union them, giving the output we want without blowing up the size of the fst file. r149385: Added support for compiling a lexc file for parsing URL's as such, giving them a separate tag. Only added to the descriptive analysers for now. Requires an updated version of giella-shared, due to the new file needed for the new functionality. 
r149344: Corrects an inconsistency in the order of tag changing processing, where generators and analysers got their tags changed in different order, which caused different tags in some cases. Fixes bug #2264. Thanks to Heiki-Jaan Kaalep for the new and corrected code. r149190: Updated Python feedback to correctly state that Python 3.5 is required. r148907: Fixed issue with link generation thanks to Heiki-Jaan Kaalep. r148504: Increased required version of Python3, due to the updated speller test bench. r148389: New version of the speller test bench, now with sortable table columns, and optional timing of the suggestions for every input word (hfst-ospell-office only). Not finished, but working quite well. It is also possible now to specify the number of suggestions returned by hfst-ospell-office. r147813: Increased required version of giella-core due to bug fix in the core. r147789: Increased required version of giella-core due to changes in speller building. r147702: One more attempt at fixing the giella-common package bug. r147651: Added final step in building pattern-based hyphenators: now also prepared for Hunspell-like OOo hyphenation. Requires new version of the giella-core. Also corrected bug in checking the version number of giella-common. r147603: Tex pattern based hyphenation generation works. The output must be checked and tested, and the process may have to be rerun several times to get the desired hyphenation behavior. Removed outcommented build code from the old infra - the new build code is essentially just a reformulation of the old one. r147592: Added support for checking the version of the giella-common package (aka giella-shared/). Added two new regexes to the source file list for shared regexes. Updated the required version of Hfst - it has not been updated in ages. r147576: Further work on the pattern based hyphenators: added tra file template, which is used to 'translate' non-ASCII chars to ascii only for the pattern creation process. 
Initial build steps for the pattern build. r147564: Improved the fst-based hyphenator by removing irrelevant paths from the fst. Started work on the pattern-based hyphenator, based on code from the old infra. r147524: Finished first version of fst-based hyphenator: now includes plain rules as a fall-back solution (including for misspelled words), and Err-tagged forms get a high weight penalty. In general, this seems to give good hyphenation patterns if one picks the first (lowest-weight) one. r147517: First version of lexicon-based and fst-based hyphenation done. Works, but misses capitalised words, and does not give extra weights to Err-tagged word forms. Also no hyphenation of misspelled words yet. Hyphenation builds are off by default. r147509: Added template file for weighting tags when the fst is used as a hyphenator. r147495: Added check for cg-relabel when enabling apertium. Thanks to Flammie for identifying the issue. r147393: Added basic dir structure for building hyphenators. r147218: Replaced gtcore with giella-core. r147022: Added test dir for hyphenators, to store data from the old infra. r147006: Added test dirs for listbased spellcheckers, if we ever get to that. r146815: Fixed logical error in the handling of negated fst specifications in yaml tests (e.g. ~xfst) - the test didn't work, and the yaml file was run when not intended. r146786: Fixed regression introduced in the previous commit: one-sided tests were included when looking for test data, causing a subsequent Python failure when no actual test data was found. Fixed by using a stricter file name pattern. r146741: Added option to specify in a yaml filename that it should only be tested against a specific technology or not, by specifying one of .foma, .hfst or .xfst before the suffix part (before [.gen].yaml), and prefixed with '~' if negated (i.e. .~xfst for NOT running it against Xerox). r146706: Slightly more robust yaml testing code. 
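The yaml filename convention from r146741 can be sketched as follows. This is a minimal illustration only, not the actual test runner code: the helper name should_run and the example file names are hypothetical, and it assumes that a file without any technology marker is run against every technology.

```python
import re

# Hypothetical pattern: an optional '~', one of the three technology names,
# an optional '.gen', and the final '.yaml' suffix (as described in r146741).
_TECH_MARKER = re.compile(r"\.(~?)(foma|hfst|xfst)(?:\.gen)?\.yaml$")


def should_run(filename: str, technology: str) -> bool:
    """Return True if the yaml test file should be run against `technology`."""
    match = _TECH_MARKER.search(filename)
    if match is None:
        # Assumption: no marker means no restriction.
        return True
    negated = match.group(1) == "~"
    named = match.group(2)
    if negated:
        # '.~xfst' means: run against everything except xfst.
        return named != technology
    # '.hfst' means: run against hfst only.
    return named == technology
```

With this sketch, a file named like N-even.~xfst.gen.yaml would be skipped for Xerox but run for hfst and foma, while N-even.hfst.yaml would only be run for hfst.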
r146700: Common starting point for both weighted and unweighted parts. r146325: Added removal of Area tags also for specialised fst's. Fixes Korp issue reported by Ciprian. r145082: Ensure the fastest lookup method is used during hfst yaml generation tests. r144553: Removed the bash hack to add a css processing instruction - it is done by the perl script writing the xml file. r144287: Removed the removal of dialect and variant tags from the grammar checker analyser, the information can be useful when generating suggestions for corrections. r143980: Removed repetition of the frequency weighted fst. The goal was to promote compounds where each part was already seen in the corpus, but it made the speller bigger and slower, and actually decreased suggestion quality slightly. Also added code to do manual priority union, but it is buggy and commented out for now. r143822: Added info about which file to look in to find a suitable frequency corpus cut-off location (=line number). r143635: Renamed the option --enable-hfst-desktop-spellers (added plural 's'), and changed the behavior of it so that when disabled, zhfst files are still built (and only those). r142732: Cleaner build steps for local speller filters - the regex is now copied in and compiled according to the fst-format of the speller as opposed to earlier, where the binary fst was compiled and then transformed. r142638: Move CmpNP processing from general speller processing to each language. r142614: Also moved the CmpNP filtering to the relevant languages. r142542: Forgot one file in the previous commit - now that filter is completely removed from the core and template, and all language-independent processing. r142532: Moved the remove-norm-comp-tags.regex file from the giella-shared directory to the languages actually using it, and consequently removed it from the language-independent build files. r142098: Updated the speller devtools scripts to obey the new name and location of the giella-core directory. 
r142078: Added test for the availability of GNU Make, at version 3.82 or newer. Error if not found, except on OSX/macOS, where the builtin make is GNU Make 3.81 + patches, which corresponds to the required version or newer. r141817: Better support for speller filters using source files from other locations. r141652: Added mwe-dis.cg3, to allow disambiguation of multiword expressions and other tokenisation ambiguity. r141536: We build the tokenising analysers directly off the disamb and grammar checker analysers in src/, assuming that they are identical. This is a reasonable assumption now that the hfst tool kit contains all necessary machinery, and we don't need to pay special attention to the requirements of the tokenisation. r141525: Make --with-backend-format work also for the tokenising analysers. r141189: Wrong variable name :-( - now it is correct. r141182: Corrected makefile dependency for the und.timestamp file. r141056: More robustness added to the test scripts: checking several variables, testing whether the found variables are pointing to existing directories, and giving an error message if no directory is found. r140928: Changed variable name and definition to allow overriding the path to the called script, to make it easy to use a locally modified script instead. r140921: Changed variable name in devtool scripts, to reflect similar changes elsewhere. Part of fixing bug #2219. r139846: Corrected a number of bugs and deficiencies when building spellers when the giella proofing tools libraries must be fetched over the net. Now the spellers build correctly under all intended circumstances given that there is a network connection. r139830: Corrected path for the test for availability of the giella-common resources. r139822: Added support for getting precompiled proofing tools libraries across the net if not found locally. Makes it actually possible to build spellers without checking out the whole of $GIELLA_HOME. 
Now it is also possible to just check out $GIELLA_LIBS if one still wants to build everything locally. r139526: Applied backend format rules to the tools/mt/ap/filters dir. This is not future proof, but does not create problems for sme, and solves a bug in smj. The future problem is that we mix both a specified backend format (for compilation efficiency) with the default/unspecified format fst (for weighting) in the same dir, and we can't automatically say which filters need to be in the specified backend format and which should be in the default format. This needs further consideration. r139499: Completely clean src/transcriptions/, and also clean tools/mt/apertium/filters/. r139442: Do not use PKG_CHECK_MODULES if you don't really have to - it clutters your code and creates unneeded variables = noise. r139241: Corrected placeholder string for two-letter ISO language code. r139232: Changed the path to the css for the xml speller test results in devtools. r139138: Added support for building alternate orthography fst's for dictionary and oahpa, and also morphers for alternative orthographies. Slight simplification of defs. r139116: One small change to support spellers for alternative orthographies built off of the raw fst instead of the standard fst. r139107: Added a possibility to build fst's for alternate orthographies based on the raw fst surface forms, instead of from the default/standard orthography. r139057: Changed all references to $(GIELLA_SHARED)/common into $(GIELLA_SHARED)/all_langs. r139045: Rewrote the code for identifying the location of GIELLA_CORE (former GTCORE). The code should be more robust, and is prepared to check against a pkg-config pc file as well. GTCORE is still used throughout the code, but in parallel to GIELLA_CORE, so that one can easily replace the former with the latter without causing bugs or other problems. 
r139018: Added checking for and setting of GIELLA_TEMPLATES, but only if you have defined GIELLA_MAINTAINER (renamed from GTMAINTAINER). Otherwise it is ignored. r138923: Revert experiment with priority union - it doesn't work as expected when weights are involved. Corrected filenames in the .SECONDARY target. r138908: Added download links to the build feedback for 'make upload' in tools/spellcheckers/fstbased/desktop/hfst/. r138842: Final step to make the GIELLA_SHARED dir be found in all cases: assign the path from pkg-config to the variable. r138835: Removed the separate test for content, instead adding the test to each possible location, moving to the next location if no data is found. r138827: Changed the search order for GIELLA_SHARED data: * using --with-giella-shared=/path/to/giella-shared/data/root/dir * env. variable GIELLA_SHARED * env. variable GIELLA_HOME * env. variable GTHOME * env. variable GTCORE * using pkg-config This way it is always possible to override everything else using the --with option. Added comments. r138817: Added a configure test to check that there is actually data in GIELLA_SHARED. r138781: The giella-shared data dir is now found using several techniques in the following order: * env. variable GIELLA_SHARED * env. variable GIELLA_HOME * env. variable GTHOME * env. variable GTCORE * using --with-giella-shared=/dir/to/giella-shared * using pkg-config If all these fail, configure errors out. Since it among others uses GTHOME, the change should be of no concern to existing users having checked out everything. And since the svn location is still within GTCORE, it will also work for those checking out only the core and a single or a couple of languages. r138673: Second steps in renaming and splitting the gtcore into giella-core, giella-shared and giella-templates: replaced $(GTCORE)/giella-shared with the Automake variable GIELLA_SHARED. 
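The GIELLA_SHARED lookup described in r138827 and r138835 (try each location in a fixed order, and move on when a location holds no data) can be sketched roughly as below. This is an illustration only and not the actual configure code, which is written in m4/shell. The function names are hypothetical, and as a simplifying assumption each environment variable is treated as pointing directly at the data directory; the real configure test may well append sub-directories to GIELLA_HOME, GTHOME and GTCORE.

```python
import os

# Environment variables in the order given in r138827.
ENV_SEARCH_ORDER = ["GIELLA_SHARED", "GIELLA_HOME", "GTHOME", "GTCORE"]


def has_data(path):
    """True if `path` names an existing, non-empty directory."""
    return bool(path) and os.path.isdir(path) and bool(os.listdir(path))


def find_giella_shared(with_option=None, env=None, pkg_config_dir=None):
    """Return the first candidate location that actually contains data.

    Order: the --with-giella-shared value, then the environment variables,
    then the directory reported by pkg-config (passed in by the caller here).
    """
    env = os.environ if env is None else env
    candidates = [with_option]
    candidates += [env.get(name) for name in ENV_SEARCH_ORDER]
    candidates.append(pkg_config_dir)
    for path in candidates:
        if has_data(path):
            return path
    # Mirrors "If all these fail, configure errors out" (r138781).
    raise FileNotFoundError("giella-shared data not found in any location")
```

Putting the --with option first is what makes it possible to override everything else, as the r138827 entry notes.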
r138663: First steps in renaming and splitting the gtcore into giella-core, giella-shared and giella-templates: renamed variables. r137357: Generalised the build instructions for the morphological segmenter, aka the morpher. The morpher output can be used as input to a stemmer. r136448: Fixed a bug in speller builds introduced lately - missing hfst target. r136431: Updated filename reference, and added a pmatch setting that fixes the issue where words next to punctuation like "ja." don't get analysed. r136374: Removed '+' in front of tag patterns to be extracted from the tag list and used as input to regex generation scripts. This was done to accommodate the use of prefix tags, where the '+' is at the end of the tag, not in the beginning. r136363: Added new test to check that the speller accepts all lemmas in the lexicon. r136280: Rewrote the pmatch compilation code to support Kevin's tokenisation hints for MWE-ambiguous entries. Requires Kevin's hfst fork for now. Work in progress. r136207: Small change to support new style, backtracking based tokenisation experiments on space separated compounds in sme. r136015: The next batch of changes to support building hfst fst's with a specified backend fst format: desktop spellers are now supported. The speller fst's will be built using the specified backend format up to the point where corpus and tag weights are added, when the fst format will be changed to the default (openfst-tropical) format. That is, even if you specify (the unweighted) sfst as the backend format, the final speller will still be weighted. r135695: Better variable name and clearer comment about editing distance in spellers. r135618: Changed the build files for the desktop spellers to allow better user control of which files to include in the error model. r135594: Use priority union to avoid duplication of paths and thus make a much smaller (and hence faster) mobile speller fst. 
r135584: Use priority union to avoid duplication of paths and thus make a much smaller (and hence faster) speller fst. r135221: Fixed bug in building Oahpa fst's for alternate orthographies and writing systems. r135102: Fixed a bug in the default build of grammar checker analysers. Blocked all languages without local overrides. r134870: Moved removal of word boundaries out of the default, language-independent processing of the grammar checker analyser - we want to be able to do language-dependent things with word boundaries, e.g. in freely compounding languages. r134780: Added provisions for including xfscript files in the src/morphology/ directory. r134762: Removed unneeded subtraction that just increased the size of the resulting fst a lot (how much of course depends on the grammar in question). r134733: Added initial support for doing more targeted regex replacements on multichar sequences in parallel to the regular editdist operations. The idea is that these replacements can be applied more times (since they are few), and thus allow for more corrections of frequent spelling errors. r134357: Restricted the new spellrelax to only give one tag. The previous version caused out-of-memory issues on a lot of systems. r134222: Added support for alternative orthographies in spellers. Works nicely in LO, but needs more testing. Also updated the clean target. r134145: Added a new spellrelax system that will add an +Err/ tag (or more) to the analysis of words misspelled according to the new spellrelax rules. Can be very costly in terms of size if applied to large lexical fst's, and if many error types are tagged, so initially it is only applied to the transcriptor fst (which are used in Oahpa). Template data is from Plains Cree (crk). r134126: Fixed compilation error: added missing inversion (.i). r133943: Changed the final file format for hfst transcriptors to the hfstol format. r133935: Fixed a bug in speller building with Xerox tools enabled. 
r133822: Added support for filters for the top-level speller dir, in preparations for needs by the Haida spellers. r133578: One more bugfix for tag reordering with language-specific additions. r133568: Fixed bug for tag reordering with language-specific additions. Made building of glossing fst's configurable, and at the same time fixed a build bug for them. r133537: Added initial support for hfst-based tokenisers, built on generalisations of Kevin's work. They are built using the hfst-tool hfst-pmatch2fst, which is the Hfst implementation of the pmatch tool from Xerox. Supports a regular tokeniser, and one targeted at grammar checking. r133451: Corrected errors in the makefile that stopped dictionary fst builds for languages with alternative orthographies. r133384: Build analyser for grammar checker when grammar checkers are enabled. r133325: Generalised hack to force make to go via hfst instead of directly to hfstol. r133254: Added support for specifying backend fst format also for (parts of the) apertium fst's. One step further to speed up compilation by specifying e.g. sfst as the backend format. The implementation is a bit hacky, but will have to do for now. r133243: Added support for building glossing analysers, where the analysis tags are NOT shifted around to canonical positions. The idea is that one keeps tags and morphs together in the lexc code, and that the analyser output thus will reflect the order of the surface morphs. If one wants to build such analysers, one has to specify the final analyser filename in src/Makefile.am. r132774: Corrections to make the Oahpa builds work, and also to properly build with foma. r132767: Corrected an error that made the new option to select fst format (=backend) in hfst non-functional. r132760: Now also Oahpa transducers are ready to be built with a specified backend format when building using hfst. Also cleaned up the code, removed 300+ lines of code, and added support for builds using Foma. 
r132747: A number of corrections to the previous commit for issues missed during first round of testing. Now specifying an alternative backend format works correctly for all standard analysers and generators except for Oahpa-fst's. r132713: Enabled the new option to specify transducer format when compiling with Hfst, to speed up compilation time by using an unweighted format (ie sfst or foma). Default is still openfst-tropical, until further testing is done. r132700: Further preparations for enabling the new option to choose the backend format for fst's, for compilation speed improvements in cases where weight is not used: generalisations and corrections of build instructions. r132669: Bummer: wrong default backend format - only openfst-tropical is stable, the other formats are more or less buggy. r132663: More preparations for new configure option to specify backend format of compiled fst's. r132653: Preparations for new configure option to specify backend format of compiled fst's. Removed some old code. r131846: Further abstractions over parallel patterns, reducing code size. r131125: Remove generated files also in tools/mt/apertium/tagsets/. r131115: Updated required versions for Hfst and VislCG3. A number of bug fixes and new features require these versions for many of our tools. r130847: One pattern rule had for some reason become ambiguous, and caused strange build behavior. Replaced with full filename in one stable case solved the issue. r130730: cut on linux does not like unicode chars as delimiters, use awk instead. r130586: Code cleanup - moved target variables related to running xfst tools to the xfscript include file, and thereby removing duplicate code. r130504: Added an option to enable building tokenisers, off by default. r130224: Do the CG3 tag relabelling in the Giella infra, not in Apertium. r130128: Forgot to rename the Area variable in the previous commit. 
r130117: First iteration of adding support for Area codes (ie countries) based on ISO 3166 codes. Right now does nothing except filtering out the tags, proper support coming in steps. r129977: Better handling of hfst/xfst/foma for the top-level speller dir - invert when needed. r129968: There was still one more automake file with references to the remove-derivation-position-tags.regex filter. Now they are gone. r129939: A typo made reversed compose&intersect seem buggy, whereas in fact it was not. r129915: Small correction to bring the Giella version of reversed comp&intersect closer to what Miikka has: added minimisation to the reversed twolc rules. r129867: Added configure option to reverse the lexicon and the morph-phon rules during composition and intersection. Reduced the time needed for that operation to ≈1/3 of what it used to be in SMS, and RAM consumption went down from 11Gb to max 400Mb! Speed and RAM gains will vary from language to language. r129841: Now the lexical fst is first compiled into a .tmp file, to allow language-specific changes to be applied from .tmp to final file. More support for xfscript compilation. r129823: Added include to xfscfript-include.am, to let xfscripts be used in lexc compilation. r129796: Second part of libdir cleanup: removed the libdir line in the pkgconf file. r129790: libdir -> datadir for zhfst installations using autotools. r129752: Removed some references to remove-derivation-position-tags.regex that were forgotten in commit r129657. r129722: Added analyser-disamb-gt-desc.hfst as a noinst_DATA target, to force make to build it instead of going directly to the *.hfstol file, and thus breaking compilation when local modifications are needed. r129689: Added INVERT_ variables to help in improving compilation of analysers and generators for different fst technologies (Xerox, Hfst, Foma). 
Hfst has the inversed convention for lookup compared to the other two, and by using a variable we can now actually share the same build code irrespective of which one we need to inverse for the final analyser or generator. r129657: Removed the remove-derivation-position-tags filter from language-independent processing, it is language-specific, and will be added to the languages needing it. This also makes it possible to do further local processing dependent on these tags. r129627: Error out if Hfst is requested but not found or too old. r129095: LibreOffice-voikko 5.0 support for spellers with alternating writing systems. r129044: Initial support for building zhfst files for mobile phone keyboards. This version is essentially the same as the desktop one, we'll start from here and adapt as we find better solutions. The zhfst file is compressed using xz for optimal file size (this is presently in violation of the zhfst specification, it must be updated soon). Also changed some of the configure options to error out when requested but without the required software installed - this is better than silently turning the requested feature off. r129031: Cleaned the speller build and configuration code in preparation for adding support for building mobile spellers. r129012: Added a missing SUBDIR, and fixed a speller test script that was not working. r128994: Updating path in test script. r128983: Corrected a miss in the previous commit. r128981: Moving the hfst speller test dir inside test/tools/spellcheckers/fstbased/desktop/. r128977: Preparing to reorganise the speller testing parallel to what has been done in the development dir. r128972: Updated path to desktop speller files. r128934: Corrected a few misses in the previous commit. r128931: Major reorganisation to support building zhfst files for mobile systems (aka keyboard + speller). 
These need very different weighting priorities, another error model, and are thus placed in a separate subdirectory from desktop spellers. r128901: First step in adding support for mobile phone spellers. r128872: Added support for building LO-voikko 5.0 extensions. Python-based interface to LO, and initial support for specifying unknown speller languages by typing in the language code in the language name field. r128639: Commented out xz compression, it isn't supported by libvoikko. r128072: Changed test pair conventions for twolc from !€/!$ to !!€/!!$ to make it follow the conventions in the rest of the infrastructure, and make it possible to include test data in the documentation. r126367: Readded the initial-letter edits in the regex - everything else is there for the initial letter machinery, so leaving it out made the build inconsistent. The default is off, with a large warning for those turning it on. r126340: Added script to run suggestion testing for the hfst-ospell-service (MS Office) speller. Rewrote the speller testing scripts to allow parallel execution. r126115: Make transitivity tags optional also for the Apertium generator. r126072: Push weights even when not minimising the speller acceptor. Minimisation is not always the best strategy. r125928: Removed --Werror from the language-independent automake file. Added a variable to make it possible to add it to the language-specific automake file. r125918: Added configure option to enable symbol alignment during lexc compilation for the lexical transducer. Defaults to off for now, we need to test the effect on various languages before making it default to on. Also added --Werror to lexc to make it break on all warnings when compiling the lexical fst. r125889: Use tar + xz for a 40-50 % reduction in file size for zhfst files. r125801: Allow longer filenames by using tar-pax for make dist. r125756: Added upload target for zhfst files. 
That will be the only method for spell checking in more than one language for now (for regular users). Not ideal, but there is no time for anything else. r125485: Ensure that all required cg3 files are copied over to the apertium dir. Also make sure that included files are copied before including files are processed. r125470: Silent build updates for Apertium. r125444: No morphology backend for now in our infra. Corrected typo. r125405: Added support for the vfst fst format for voikko-based spellers, to be used in mobile apps. r125348: Corrected typo. r125331: Upload xpi and MacVoikko files, beta versions. r124945: Look for saxon in $HOME/lib first. Fixes bug http://giellatekno.uit.no/bugzilla/show_bug.cgi?id=2100. r124920: Add lexicon version to the speller testing output. r124883: Added a new variable HAS_FOMA, which will be set independently of the configuration if foma is available. This can be used to circumvent bugs in Hfst if weights are not needed: if foma is available, print as ATT, read in foma, perform transformations, print as ATT, convert, and continue. r124785: Error out if one tries to build abbr files with generators disabled. r124773: Error out if syntax is enabled and no vislcg3 is found or too old. r124730: Added support for building abbr.txt. Copy of the sme template committed in r111579. Hopefully fixes bug 2030. r124428: Added targets for foma spellers, commented out for now due to build issues. Added more silent build strings. r123664: Added some general tag cleanup before making the speller fst used as input for the analyser and generator that is the last step before building the acceptor. Makes it easier to write yaml tests for the speller fst's. r123037: Added filter to remove tags irrelevant to speller builds. Adjusted required version of GTCORE accordingly. r122780: Corrected a bug with filter compilations for speller filters involving tag conversion to flag diacritics. 
r122724: Make sure analyser-raw-gt-desc.hfst is always built, to ensure we have the necessary prerequisites for all targets. Refactored the initial speller fst build to use common build code for all fst technologies. Makes it easier to test and compare test results when debugging. r122125: Changed the response to missing transducers from FAIL to SKIP to avoid problems with lexc tests for fst's not enabled and thus not available. Instead report the missing fst to the user. r122053: Streamlined descriptive compounding tags to follow a shared tag structure. r121755: Added a comment about the non-functioning of the initial edit setting. Made the compound-restricted fst a tmp file, to allow for additional local processing. r121729: Removed all minimization of the error model except for the final build step. Also removed the initial letter handling for now, it blows up the error model, and slows it down correspondingly, making spellers that have turned this on useless. For now we apply the regular error model on the first letter, that seems to work ok. r121564: Added a very short test script written by Lene to help run a subset of tests frequently needed. r121514: Fixed a problem running bc on the linux servers, which caused the yaml test summaries to be blank. r121418: Added an option to specify how many lines of the frequency corpus to be used in the frequency weighting, to trim the acceptor fst at a point where the weights don't really matter. Removed all occurrences of remove-epsilon, determinisation and minimisation of intermediate speller fst's - this cut the size of the final acceptor in two! r121218: Replaced 'giellatekno' with 'giella' or added Divvun, depending on context. r121210: Renamed m4/giellatekno.m4 to bring it in line with the switch to 'giella' for all things common to GT and Divvun. r121204: The previous commit did not solve the issue - the different jars were checked in the wrong order. Now it should be ok. 
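The Saxon lookup touched by r121204, r121189, r124945 and r118657 comes down to "first hit in a priority-ordered list of locations wins". The following is only a rough Python illustration of that idea, not the actual m4 configure check: the two directories are the ones named in this log, while their relative order beyond "$HOME/lib first", the jar file names and the function name `find_saxon` are assumptions made for the sketch.

```python
from pathlib import Path

# Directories mentioned in this log: $HOME/lib (r124945) and
# /usr/share/java (r118657). Their use in exactly this order is assumed.
SEARCH_DIRS = [Path.home() / "lib", Path("/usr/share/java")]

# Hypothetical jar names; the real check may look for other files.
JAR_NAMES = ["saxon9.jar", "saxon9he.jar", "saxon.jar"]


def find_saxon(search_dirs=SEARCH_DIRS, jar_names=JAR_NAMES):
    """Return the first existing jar, or None.

    Directories take priority over jar names: every name is tried in the
    first directory before the second directory is looked at.
    """
    for directory in search_dirs:
        for name in jar_names:
            candidate = Path(directory) / name
            if candidate.is_file():
                return candidate
    return None
```

With this shape, getting the order wrong (the problem described in r121204) is a matter of reordering one list rather than rewriting the search.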
r121189: Added standard Linux location for Saxon to the paths searched. Fixes bug #2080. r121124: Corrected path for pkgconfig data and one variable name in MT filters. r121097: gtdshared has been renamed to giella-shared, all references now updated. r121060: More robust handling of MWE in speller testing. Now also possible to specify build dir different from source dir. r121050: Added a Makefile.am variable to turn on or off corpus-based (frequency) weighting of suggestions. Default for the time being is off while we work out the best interactions between the different parts of the spellers. Changed one intermediary filename to ensure proper dependency checks and thus rebuilds. r120994: Added support for specifying regexes or list of string pairs for initial and final symbols in the error model. Also added a Makefile variable to control whether to allow edits of the initial letter(s), default is ‘no’. r120628: Guard against -q for lookups that don't support it in test scripts. r120599: Small code cleanup that has been lingering since June. r120420: Made use of a new option for the speller suggestion testing: output an attribute on each test word element containing essential info about the correct suggestion. This will support better styling of the xml file with the test data. Also changed the path to the css from the local filesystem (which will vary from machine to machine) to the svn repository web url. r120182: Added a variable to hold source files to be included in the distro but not compiled as such. r120120: Added first version of a shell script to check the suggestions generated by spellers. Requires the file test/data/typos.txt for data input. r120070: Shortened a filename to make tar happy when building distribution packages. r120030: Fixed an error in distcheck - one test shell script was not included. r120005: Made one step in the speller build behave properly wrt silent builds. 
Removed grammar checker targets, we are far from ready for this, and it breaks 'make distcheck'. r119950: Added a variable to pass a compilation option to hfst-regexp2fst. Used this variable to compile all filter regexes with the option --xerox-composition=ON. This will ensure that all filters where flag diacritics are used as symbols will be compiled correctly for proper use in later compositions. Among other things, this fixes a bug where tags converted to flags to restrict compounding did not work at all. r118834: Replaced sed expression with double cut - the sed did not work on the xserve for whatever reason, and caused the testing to hang. r118809: More robust checking of Saxon, now requires that any jar found is at least v8.0. r118657: Added /usr/share/java/ as a search path for the Saxon jar, this is what is used on the UiT Linux virtual machines, and probably many other Linux systems. r118602: Initial support for building Mozvoikko spellers for our languages. r118333: Adding support for specifying one-sided tests (half tests) in the lexc test data, using an optional .gen or .ana "suffix" after the fst name. Simplified source file processing. r115589: When building with Foma, use the new lexc-align feature. r115439: Added lexicon filtering when pair-testing twolc rules. r115155: Corrected e-mail address, changed the template content of the transcription files from SMA to CRK, and at the same time corrected the direction of the code. Also added a default punctuation lexicon. r115069: Added support for easter eggs specific to alternative writing systems and other variants. Will help in debugging. r114915: Moved specification of default weight and editing distance to the language specific Makefile. r114904: After a lot of experimenting, a moderate set of changes to the speller error models. The biggest change is that the alphabet for the edit distance error model is not taken from the acceptor anymore, but must be explicitly listed in the editdist.*.txt file. 
The suggestion speed is back to normal, but more work is needed re the interaction of the error model and corpus weights. r114860: Prefixed all silent build strings for Hfst tools with H, for easier identification. r114468: Commented out the old target for calculating unit weights (default weight for out-of-corpus word forms), and added a new one, which is basically the highest tropical weight + the ALPHA smoothing value. This is just the first step in further developing the suggestion ordering for the spellers. r114332: Added a simple test to check a minimum suggestion speed for our test word nuvviDspeller. No speller should be released that does not pass this test. Additional and more elaborate tests should be added as well, this is just the very bare minimum in suggestion speed testing. r114310: Corrected typo in twolc compilation for foma (using hfst). r114286: Worked around a bug in hfst-fst2fst by going via att and foma instead. r114281: Initial support for compiling twolc files for foma by way of hfst, intersect and conversion to foma format. r114124: Yaml testing is now working also when building with Foma. r114105: Fixed downcasing of derived short names. Made yaml testing output a bit more readable (hopefully). r113871: More robust xfscript build code for hfst-xfst. Clean hfstol files. r113770: Extended Foma support to alternate writing systems and orthographies. At the same time put to use a new idiom to handle multiple independent target variables / patterns, which will be useful in other contexts as well. The code was generalised using this new idiom, and effectively reduced to half the original code size, with much less duplicate code. r113520: First working version of foma builds. The basic set of analysers and generators are built, but nothing else. A lot of changes to variables and build rules, including generalisations that save quite a few lines of code. r113372: First steps to support building with Foma. 
Lexicon compilation is working, but note that Foma crashes on regexes in lexc. r111916: Slightly more robust pair-testing with hfst. r111898: Corrected rsync options also for the alternate writing system oxt's. r111883: Fixed a build bug for MacVoikko, causing the final target to always be out of date. Regulated verbosity for zhfst targets. The twolc testing scripts now print a message when there is no test data. The Hfst twolc testing script properly detects when there is no test data, and exits with the SKIP (77) value. r111786: Added support for alternate writing systems for spellers. r111627: Finally got all weighting to work as intended, including the no-sugg weights. r111511: Further modularisation and improvements to weighted spellers. With hfst3 revision 4329, using a tab-separated tag reweighting file is working. r111192: Do not remove usage tags when building spellers, speller tags were thrown out. r111179: Added an attempt at normalising the corpus-based weights towards a standard max upper weight, to allow a much higher weight for strings not to be suggested. Also split the processing of adding corpus-based weights and morphology weights into more steps - retaining each intermediate fst - to allow easier debugging of the weight assignments. r110944: Xerox composition of weights and lexical fst. r110791: Moved a script for cleaning weighting corpus to the core. Require new core. r110770: Fixed bugs related to the new support for frequency-weighted spellers: missing checks for required tools. r110758: Stupid copy-paste error turned the positive test into a negative. Now corrected. r110747: Skip Xerox testing if no test data is found. Added comments. r110734: Added pair-test for hfst, improved pair-testing for Xerox' twolc. r110661: Add a huge weight to words tagged with +Use/SpellNoSugg. r110646: Added support for corpus-based (frequency) weighting of the speller fst's. 
Also reorganised where to specify the tag-based weights (and this is subject to change pending a bug fix in hfst-reweight). All languages are given a toy corpus, which can be replaced with a real one. This is finally the core of Tommi's dissertation applied to all languages. r110513: More robust testing for Xerox fst's - will properly report all generation fails. r110464: Corrected tests for nouns and propernouns. Now nouns behave correctly with hfst, and proper nouns have correct tags. r110331: Modernised the generate-noun-lemmas.sh.in script, added similar scripts for adj, proper nouns and verbs. r110301: Check that yaml testing is enabled before running yaml tests in test/tools/. r110208: Require new version of the core, updated comments about Err tags. r109663: Removed CmpNP tags from downcase-derived-proper-strings.xfscript.in. r109310: When doing 'make clean', remove generated html files in the root dir. r109253: Removed multichar definition of superfluous flag diacritics. r109225: Added a new directory named devtools/ to each language, with the idea that it should contain tools useful for development, but not necessarily suitable for automake testing. Initially it contains shell scripts to generate a table of generated word forms for each continuation lexicon. r109112: Removed corpus names from tools/spellcheckers/fstbased/hfst/data/Makefile.am. It caused the build to stop with an error for all languages except FIN. r109098: Make building the abbr.txt configurable (default=no), check for the existence of src/morphology/stems/abbreviations.lexc, and error out if not found. r109092: Forgot to include the new Makefile (r109076) in configure.ac. r109077: Path correction. r109076: Preparations for supporting corpus-based frequency weights, as per TommiP. r109063: Enabled weighting of speller fst's. Adjust weights and tags as needed. r108914: Added support for all languages to generate the abbr.txt file used by $GTCORE/scripts/preprocess. 
At the same time added initial support for compiling pmatch scripts into fst's for hfst-proc2, which is the future alternative to preprocess. r108859: Forgot to remove some debug statements from the yaml test runner. Now cleaned. r108840: Moved MWE tag processing into the core - we want this for many languages. r108818: Added support for a new type of yaml tests: speller acceptance testing. The basic idea is to just give a list of words and word constructions (compounds, derivations, etc) the speller should accept or reject, and let the yaml test bench verify whether this is actually the case. r108755: Several changes to properly support all position-based +CmpN/XX tags: * moved tag path splitting and tag-to-flag conversion into separate regex files in the core. * added support for compiling and using the new regexes * added support for a new type +CmpN/Suff * added the required multichar symbols to the root.lexc files * increased required core version number Fixed a bug in the yaml test bench when both hfst and xfst were enabled, but where only one type is built, e.g. for Apertium. r108692: Added build support for alternate orthographies: default fst's, dicts and oahpa. r108675: Fixed a bug that caused the wrong fst to be picked in certain cases, which caused the test script to fail. r108646: A couple of changes related to testing: * require Python 3.3+ * require new gtcore * update YAML test runner to make SMS testing work as intended also with Xerox r108560: Added support for country/region specific proofing tools in configure.ac. r108404: We do not support anything but the latest/newest Voikko now. r108395: Finalised the basic multiple writing system support, by adding support for Oahpa and dictionary fst's. r108384: Added a configuration flag to enable two-step compose-intersect. 
In most cases this will not make any difference, but for some languages it will correct a bug in compose-intersect that would otherwise create a bad fst, and for other languages it will make the operation much slower without changing the fst. Disabled by default, whether it is useful must be tested in each case / language. Also made the verbosity handling such that when verbosity is on (V=1), some tools are now more verbose, for better help when debugging. r108355: Corrected errors in hfst compilation of alternative writing system fst's. r108322: Added test runners for generation and analysis tests only for the descriptive fst. r108289: Compilation of the default set of fst's with alternate writing systems working. r108187: First step in adding support for alternate writing systems and orthographies: adding variables to configure.ac. Removed the variable LO_min_version, it isn't used. r108134: Split the m4/ax_python_module.m4 file, it contained mostly java autotools stuff. Improved the message to update the gtcore. r107335: Added the make-optional-hyph-tags filter to the generators. Fixes bug #1914. r107310: Make use of the new remove-adv_comp filter. Require new core and newest hfst. r107278: Put to use the make-optional-adv_comp filter. r107272: Don't build xerox fst's within the Apertium dir tree - no need for it. r107222: Require new core because of new filters. Use hfst-optimized-lookup in the yaml testing, should speed up hfst testing quite a lot. r107150: Put the new optional minip filter to use, and increased required gtcore version. r107122: Replaced all instances of sub and lexsub filters with the new, generated error filters. r107115: Added support for extracting error tags and constructing filters for manipulating error strings and tags. Updated required version of gtcore. r107106: Remove variant tags in disamb analyser. r107047: Xerox fst's are irrelevant to Apertium, don't even try to build them. 
r106998: Use the new make-optional-v1-tags filter for apertium generators. r106982: Forgot to include the new regex in the src file listing in the previous commit. r106973: Corrected dictionary generators to require a variant tag except for +v1, which is optional. r106966: Removed 'invert net' from a couple of more instances. r106957: Treat Hfst and Xerox the same during *tmp.Xfst and *.Xfst build - invert both only in the last step when going from tmp to non-tmp fst (invert the analyser for hfst, the generator for xfst). This should remove one more confusing difference between the two. r106951: Check that we have at least Python3.1 when enabling Apertium, error out if not. Also add AM check for hfst-optimized-lookup. r106451: A small, functionally equivalent change: from suffix rule to pattern rule. r106428: Now +CmpN/Pref is correctly supported (earlier it was treated as +CmpN/First). r106402: Corrected fst file reference in test shell script. r106398: Corrected source file reference in test shell script. r106356: Changes to a couple of Makefile.am files to fix issues with 'make dist'. r106346: The last part of the CmpN location restriction flag diacritics added. r106245: Code cleanup: no use for the M4 part - the null alternative did not work. r106226: Finally nailed all combinations of fst compilator and lexicon minimisation - now downcasing of derived proper nouns is working as it should again for both Xerox, Hfst hyperminimised and Hfst normal lexc compilation. r106160: +CmpN/Only supported, first steps in tag splitting taken. r106122: Moved code common to all yaml testrunner shell scripts to an include file in GTCORE to avoid code duplication and reduce the risk for introducing bugs. This requires the newest version of the CORE. Because of the inclusion, I had to rename the test runner to .sh.in, and added autoconf processing of it. Also added a test file for testing the base speller fst (it must be tailored to each language of course). 
r105950: Last change to get hyperminimisation to produce the correct output: made the derived-proper downcase script be processed by autoconf, so that we can require a symbol in a certain context, and at the same time in the end let the symbol be empty if not needed. r105935: Added optional flag diacritic inserted by Hfst hyperminimisation. This resolves the remaining cases of errors after the hfst team fixed a bug in lexc compilation with hyperminimisation turned on. Since it is optional, it does not do any harm when using Xerox or when not using hyperminimisation. r105926: Added xerox variable flag-is-epsilon to the tag reorder regex. This fixes most of the cases of errors after the hyperminimisation bug was fixed in hfst-lexc. The remaining errors must be fixed in the downcase-derived-proper regex. r105715: Added more silent builds for hfst tools. r105673: Added conversion of tags to flag diacritics for position-restricting tags. These are currently used in sma, sme, smj and sje. Together with some additions to the R lexicon, the tags will finally do what they are meant to do for hfst-based spellers. r105616: Added Multichar symbol definitions for flag diacritics controlling compounding based on position tags. Done for most langs, the symbols will be ignored if not used. r105496: New: added example test file for the fstspeller fst file (starting point for foma and hfst spellers). r105492: Fixed: errors in the yaml test runner when the fst has a suffix 'hfst'. r105488: Fixed: directory and fst names in the yaml runner shell script. r105484: Added support for yaml tests for speller fst's. r105438: Added support for Xerox fst's in tools/spellcheckers/fstbased, mainly to help in debugging hfst. Turned out to be very useful. Why can't any of the toolsets work properly? r105424: Improved comments to make the lemma generation script easier to adapt. r105390: Additions to generate the inverted fst's, to enable symmetric yaml testing. 
r105382: Fixed: order of filter application was wrong, causing all Use/-Spell forms to be included in the spellers. r105286: Fixed: error in easter egg building after the previous commit. r105284: Make sure the easter egg is rebuilt every time the fst is rebuilt. r105238: Fixed: The MacVoikko target contained one subtarget that built even when spellers were not enabled, and thus failed because of a missing dependency. r105201: A number of changes to make the MacVoikko.service build cleanly with proper dependency tracking. Also a bit safer cleaning. r105194: Fixed: The MacVoikko target was missing from noinst_DATA, thus it was not built. r105104: Added initial support for building language-specific macosx systemwide spellers. r104185: Added strip function to get rid of extra spaces, resolves bug in abbr.txt build. r103029: Included lexc files in src/morphology/ in the abbr file making. r103027: Expanded the source file base for building the abbr file, more like the old infra. r102952: Only delete (aka 'make clean') generated corpus files used for weighting if such files exist. Removes a very dangerous 'rm -rf .*' command. r102809: Fixed bug in the phonology building that caused extra source files not to be compiled. r102678: Removing Use/LexSub strings from all normative fst's. Fixes bug #1904. r102214: Added support for turning off building of vislcg3/syntactic tools. r101825: Improvements and corrections in the README file. r101818: Changed Hfst configuration: * moved xerox check before hfst check to ... * automatically enable hfst if the Xerox tools are not found * moved minimum version requirement definition to configure.ac * removed hfst-foma requirement, instead checking for all required tools * removed path check for obsolete hfst tools * improved hfst configuration messages * updated the summary text to reflect that hfst is automatically enabled These changes should ease configuration on systems without Xerox. 
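The decision described in r101818 (check for Xerox first, then enable hfst automatically when the Xerox tools are missing, and check for all required hfst tools instead of a single one) can be pictured roughly as below. This is a minimal Python sketch, not the configure code itself; the two tool lists and the name `decide_hfst` are illustrative assumptions, and the real checks may cover a different set of programs.

```python
import shutil

# Illustrative tool lists (assumptions): the actual configure checks
# may require other or additional programs.
XEROX_TOOLS = ["xfst", "lexc", "twolc", "lookup"]
HFST_TOOLS = ["hfst-lexc", "hfst-twolc", "hfst-regexp2fst", "hfst-fst2fst"]


def decide_hfst(hfst_requested, which=shutil.which):
    """Return (enable_hfst, xerox_found, missing_hfst_tools).

    Xerox is checked first; hfst is enabled when explicitly requested
    or when the Xerox tools are not all available.
    """
    xerox_found = all(which(tool) is not None for tool in XEROX_TOOLS)
    enable_hfst = hfst_requested or not xerox_found
    missing = []
    if enable_hfst:
        # Check every required tool rather than one representative.
        missing = [tool for tool in HFST_TOOLS if which(tool) is None]
    return enable_hfst, xerox_found, missing
```

A non-empty `missing` list is the point at which a configure script would stop with an error, in line with the "error out if requested but not found" policy noted elsewhere in this log.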
r101729: Corrected names of compiled twolc files in test/src/phonology/pair-test*.sh.in. We need to use the 'compose' fst because compiled twolc files are not treated the same as other fst's. We can't just skip the new lookup friendly filenames either, because morphophonological rules can be written using xfscript, in which case the lookup renaming (and inversion) is essential. r101575: Corrected references to the new lookup style fst names in the inituppercase test. Fixes broken inituppercase tests. Updated config header in initcap yaml file correspondingly. r101554: Now both general and language-pair specific relabelling using regexes are supported, in addition to using relabel files. The regexes allow context-dependent and multisymbol changes, whereas the relabel files only cover 1:1 mappings of single symbols. The actual change was to add support for regex files in the language-pair independent processing. The tools/mt/apertium/tagsets/README.txt file was more or less completely rewritten to better document the filenames being recognised, and how they should be used. r101434: Retain the regular non-optimised hfst analyser for easy paradigm generation using a regex plus composition. r101405: Fixed a bug in the Apertium build that blocked building of AP-tagged analysers. r101363: Make sure there is always an apertium analyser for 'und' if nothing else. r101193: Do not remove homonymy tags from the apertium fst's. Also simplified the automatic conversion by moving all non-automatic changes to a separate file, run as a sort of tag conversion postprocessing. Updated the tagset/README.txt file to contain info about the manually maintained postprocessing relabel file. Added an initial postprocessing relabel file containing word boundary and homonymy tag changes. r101189: Do not remove homonymy tags from the regular analysers. r101162: Fixed a bug in building Oahpa generators - orig-lang tags were not removed. Clean *.hfstol files in tools/mt/apertium/. 
r101043: Moved Apertium tagset creation and relabeling from src/tagsets/ to tools/mt/apertium/tagsets/. This should fix building of apertium fst's for fin, smn. r100989: Renamed AWK to GAWK in relevant places to get around another AWK test. Now gawk is found properly in all cases. r100985: Improved test for gnu awk. r100878: Require newest core to force people to upgrade to get an important bugfix. r100786: Fixed a bug in the core for generated regexes - a reserved char was not escaped. Required core version bumped. r100719: Hfst 3.8.0 is out, with a number of important bug fixes and improvements, including new options required to make our code build properly. r100565: Several changes to accommodate a downcaseerror variant of the L2 error fst for Oahpa: * added configure.ac option --enable-downcaseerror (independent of the L2 opt) * a number of changes to the build instructions for Oahpa to support the new fst * made the error fst compilation independent of whether an L2 twolc/xfscript file is used - if not, it will just use the ordinary twolc/xfscript file. This way it is possible to * svn-copied regexes from the old to the new infra, including to the core * increased gtcore version number and required version number due to new regexes r100542: Corrected wrong filenames and file references that blocked the oahpa L2 build. r100538: Tagset relabeling didn't work for xfst files, now it does. Also generalised the use of relabel files (for use with hfst-relabel). r100499: Simplified the building of hfst's with alternative tagsets. Silenced regex compilation. r100478: Last part of the lookup & composition cleanup: phonetics and phonology now covered. Now all non-lexical and non-filter files have a suffix .compose.* or .lookup.* depending on their intended use, and they are all properly inverted where needed (i.e. only for Xerox' lookup tool). There might still be source files to clean, but that is a separate step. 
r100468: Corrected a couple of cases where old filenames were still used, and thus broke compilation. Also improved filtering of transcriptors, and constructed transcriptor target names dynamically based on the source files. r100453: Xfscript and lookup cleanup: now we explicitly build files made for lookup and composition marked in the filenames. This is done for hyphenation and for orthography; phonology and phonetics are still to be done. From now on there should be no need to use invert as part of the xfscript code - DON'T DO IT! All targets updated to use the new filenames. Removed inversion from the hyphenation xfscript. r100424: Use explicit pipe mode with hfst-xfst. r100413: Moved Apertium target language specification from configure.ac to tools/mt/apertium/Makefile.am. Changed the target filename construction to better follow the Apertium naming scheme. Fixed a bug introduced about four weeks ago that destroyed the dependency chain (due to a bug/fragility in GNU make). r100356: Cleaned up building of target fst's using the lookup-include.am file. Now all hfst transducers in optimised lookup format have the suffix .hfstol, and optimisation should not be hidden or implicit anymore. All test scripts should be updated as well. Also move all common targets from src/Makefile.am to am-shared/src-dir-include.am and sub-included AM files. This cleans up the src/ dir Makefile.am quite a lot. r100346: Added support for additional local lexc files not part of the lexical fst. 
r100126: Several changes to clean up the mess with the transcriptors: * moved transcriptor final builds from src/ to src/transcriptions/ * renamed transcriptor source files and targets * streamlined transcriptor compilation to use lexc-include and lookup-include * also silenced xfst in lookup-include.am r100035: There were a couple of issues in the previous commit: * vpath directive didn't work reliably * L1 and L2 variables were declared for easy merging, but in a way that AM didn't like * forgot to change the name of the lexical fst in the filter processing r99883: Several fixes to accommodate L2 (language learner) analysers for Oahpa: * removed silent build instructions from twolc-include (they are taken from the silent-build-include instead) * added support for compiling L2 phonology/twolc files when configured to * renamed $(GTLANG)-lexc.?fst to just lexicon.?fst. * added support for the error analyser in src_oahpa-include.am * added configure support for the L2 analyser (off by default) * added support for building the L2 lexical fst using L2 source files * added variables, among other things to support specifying L2 source files in src/morphology/ r99793: Added support for filters written in lexc and xfscript. Renamed variables and added a lexc-include.am file to support general lexc compilation. r99665: Fixed an unfortunate AM syntax error that blocked Automake, and thus all builds. r99587: Another filter build cleanup: all filter regexes in core are now built for all languages. One obsolete filter was removed. r99584: Fixed a problem with MT filter compilation that only revealed itself in sme. r99579: Cleaned the filter build files even more. Now only local / language specific regex source files need to be listed in the local Makefile.am. r99574: Added a new filter to the filter compilation. Used the new filter to build correct fst's for dictionary analysis and generation. 
Increased the version number of the required gtd core version, due to the new and required filter in the core. r99544: Major cleanup of filter and tagset compilation: * moved all non-local data and build instructions into am-shared/ * created dir-specific am-include files * clean use of regex-include.am * removed sme-specific source files from tools/mt/apertium/tagsets/Makefile.am * switched the apertium filter use to use the one built in src/filters/ instead of rebuilding it r99473: analyser-oahpa-gt-desc should be analyser-oahpa-gt-norm. Now renamed. r99462: The listbased speller fst is now generated properly using both Xerox and Hfst. r99451: Fixed a logical error that turned off all hfst spellers. Renamed a variable. r99445: Only build Apertium tagsets in tools/mt/ if Apertium is turned on. r99425: Corrected a syntax error in the src_disamb-include.am file. Moved all fst trimming of general interest from tools-spellcheckers-listbased to tools-spellcheckers. Made the configuration so that list-based spellers will only compile if configured to build Hunspell. Also tried to make the configuration of other spellers such that they are automatically off when spellers are off. r99366: Batch two of the initial letter downcasing fix. r99350: Downcasing of the initial letter of derived proper nouns (Pariisi -> pariisilainen) is now finally working with Hfst. It requires Hfst svn rev. 4000. r99221: The first major step for adding support for generating list-based spellers such as Hunspell and the PLX (Polderland/MS Word) spellers. The conversion is not trivial, since we try to control compounding according to the linguistic specification in the lexicon (using tags). Although PLX is only for three Sámi languages, Hunspell conversion should be useful for all languages in our infrastructure. No real Hunspell or PLX files produced yet, only prerequisite fst's. 
- At the same time fixed a glitch in the version checking of VislCG3 that would turn off support for CG files now that the vislcg3 svn revision number has turned 10 000. r99176: Added support for local overrides of the base speller fst. r99109: Generalised and simplified the code for building oxt's - no more hard-coded filenames. Now the LO-voikko versions supported as well as the platforms are just defined in two variables, and the rest follows from there. The build code also handles cases of unsupported combinations of voikko versions and platforms. Also silenced the build quite a lot in non-verbose mode. r98986: Switched to universal binary build for the LO41 voikko OXT. r98767: Made the hfst optimised lookup file format explicit by using the .hfstol suffix, and by optimising files for lookup in a separate build step, instead of implicitly as before. So far only for tools/mt/apertium/, but more will come. r98696: Made speller minimisation default to yes, specified where to push weights. r98671: Added --encode-weights to determinise and minimise. This fixed the never-ending compilation of Finnish spellers. r98633: The optimisations that worked for Greenlandic didn't work for Finnish, potentially due to Finnish being corpus-weighted and thus posing more challenges to determinisation and minimisation. Because of this the Greenlandic optimisation is now rolled into the configuration option --enable-minimised-spellers. r98616: Added size and speed optimisations to the speller compilation process: remove-epsilons, push-weights, determinise and minimise. Together this made the KAL speller *much* smaller and *much* faster. It is now as fast and small as any other fst-based speller. r98563: Hyperminimisation seems to be stable now, and we offer it as a standard configuration option. Also added autoconf support for the preliminary tool hfst-proc2, to facilitate easier testing of the tokeniser/analyser. 
r98486: Updated the tagset targets to support Xerox fst's, and tagset replacement using regexes instead of the hfst-only relabel tool. Now all languages can get localised analysis and generation tags by adding a regex file and specifying a few targets. r98469: Added build step to explicitly convert hfst transducers to optimised lookup format. Whitespace changes in the silent rule variables. Included the new lookup-include file in src-dir-include.am. r98459: Preparations for better handling of lookup & testing of free-standing lexc and rewrite rule transducers: added build rules to do inversion of fst's intended for lookup. r98454: Added a test dir for the upcoming hfst-based tokeniser. r98323: Corrected some paths to enable VPATH building of spellers. Added support for retaining intermediate files when building using "make --debug". r98165: Added support for building OXT for LO/OOo 3.6-4.0 for Mac. Language support is limited. r98043: Properly clean src/morphology/. r98034: Encapsulated most shell variable names in {} to handle hyphens etc in the variable names (after merge/update substitution of the __UND__ string). r98024: Added a dir src/morphology/generated_files/ containing files generated during the build process. This is done to make a clear separation between files to be edited and files to be ignored. Also added a directory src/morphology/incoming/ to hold incoming lexical resources used to build the lexc or xml source files. Both dirs have a 00README.txt file explaining their use. r97047: Added WANT_OAHPA option for analysers for all languages (until now only generator) r97041: Added oahpa analyser as target (oahpa here meaning L2 transducer) r96645: Fixed a bug in sigma extraction on certain Linux systems. r96604: Better/more generalised handling of tag modifications. r96280: Added removal of lines marked '#RemoveFromApertium' from the apertium cg3 files. 
r96100: Temporarily removed the downcasing of derived proper nouns from the hfst transducers - it causes them to malfunction. r96043: Fixed bug introduced yesterday that broke compilation of certain xfscript files. r95955: Forgot two files in the previous commit. r95954: Added support for testing the fst for initial upper casing of strings. This also includes yaml test support for non-analysing/-generating fst's. r95796: Properly handle downcasing of derived proper nouns as well as optional initial upper case. The optional initial upper-casing doesn't work for derived proper nouns when using Hfst because of an unimplemented feature in hfst-xfst. It is reported to the hfst team. Increased gtd core version number due to new scripts and possible dependencies in the gtd core. r94834: Extracting flag diacritics, to build regexes that can ignore them in certain cases (like optional initial upper case). Requires new version of the gtd core. At the same time split tag extraction in two - the first step extracts the whole sigma set, and from that we can extract tags, flag diacritics, etc. The sigma set extraction was greatly improved, removing a number of small errors due to handling of reserved symbols in Hfst and Xfst. r94398: Added test summary for all yaml tests for a given fst. r94272: With feedback from Brendan I finally got the number of tests passed and failed printed as part of the YAML testing. r94267: Adaption to a new version of the morph-tester.py script by Brendan Molloy. Small adjustments to the yaml test printouts. r94063: Major bug fix to the generate lemma test script. Now it actually checks that the generated lemmas correspond to the listed ones. r94027: Bugfix: no hardcoded language codes. r94017: Now also (language pair independent) morphological generators for Apertium are installed with their correct Apertium file names. 
r94002: Added renaming to Apertium style filenames, changed installation file list to only include files actually used by Apertium. With this change, everything should be in place for a fully automatic integration between the GT-Divvun infrastructure and the Apertium infrastructure through the use of pkg-config files, with one exception: morphological generators. r93938: A rewritten pc file, with proper paths actually reflecting where things are installed, and with a shortened description to better fit the use of it. r93933: We also need to install the pkg-config file... r93927: After a long discussion, the moniker 'giella' was chosen instead of gtdivvun. Changed datadir from $(datadir)/gtdivvun/* to $(datadir)/giella/*. Added a pkg-config file so that all installed resources can be found automatically. r93880: Changed datadir from $(datadir)/hfst/* to $(datadir)/gtdivvun/*, as it is the directory used to install the gtdivvun products, and not only hfst transducers are installed. r93810: Require Automake 1.11.6 to avoid errors caused by older Automake's. r93210: Make semantic tags optional also for dict and oahpa generators. Added support for hfst fst's for dict and oahpa. r93205: Make semantic tags optional for all generators. Fixes bug http://giellatekno.uit.no/bugzilla/show_bug.cgi?id=1854. r93153: Uncommented the cg3-with-apertium-tags targets, increased the gtcore version number. r92913: Started work on adding hyphenators. No substantial changes, just Automake conditionals. r92866: Actually made the options --disable-analysers and --disable-generators do what they should, earlier they had no effect. Also renamed those options. Wrapped the filter targets in mt/tools/apertium/filters/ in apertium conditionals, so that they will only be built if the apertium option is enabled. Added separate configure.ac option to disable the transcriptors (the num2text family). r92827: Make sure all tests are within conditionals - only run them if the fst's have been built. 
r92650: Added conversion of analysis tags from GTDivvun format to Apertium format for the vislcg3 files. The generated vislcg3 files are not valid, and the targets are thus commented out for now. r92404: Added support for tmp files in the apertium target language specific analysers, to allow local processing of those analysers. r92369: Rewrote tag reordering of semantic tags to use a dynamically generated regex, and split tag reordering in three: reordering sub-POS tags, semantic tags, and language specific tags. The two first reordering operations are done on all languages. The reordering is done when building the raw file, to build a fixed tag order that other fst operations can rely on. The raw file build had to be split in two steps because of this. r92339: Added support for target-language specific filtering for the Apertium analysers. r92295: A major update to the Apertium fst building: * corrected broken logic when building the list of tags used by a language * build filter to remove derivation strings dynamically from the list of tags * added a new taglist2remove...strings-regex.sh file to the core * added a new dir filters/ within tools/mt/apertium/ for building apertium specific filters * added facility to modify locally remove...strings.regex files by using an exception file * build the remove-derivation-strings.regex dynamically also for regular fst's r92170: Now building the remove dialect tag removal filter dynamically, in the same way as done for the semantic tags. Requires a new version of the GTD core. r92144: Dialect tags are now removed in the Apertium fst compilation. In addition, tags can now be custom changed and reordered on a language pair basis, see README.txt in tools/mt/apertium/tagsets/. r92118: Corrected several errors in the MT Apertium fst builds: now removing semantic tags and tags for originating language. Silent hfst-invert. r92100: Modified the gttags.txt target to produce output also in cases where no GTD tags are defined. 
Earlier the build would break in this case. r92029: Commented out another debug echo statement. r91993: Commented out a debug echo statement. r91943: Fixed a bug with optional semantic tags: we built the regex, but not the fst's. r91821: Corrected a bug in the lexc yaml testing. Fixed file refs in the dict fst tests. r91682: Moved yaml test scripts for different transducer types up one level, to correspond to the parallel location of the fst files in the build tree. r91672: Generalised the yaml test runner code, to identify the relative paths of the test scripts and the fst's being tested, so that all sorts of fst's can be tested irrespective of where they are built. Added yaml testing for MT/Apertium. r91623: Moved some back-end scripts for yaml testing to the uppermost test directory, to ease sharing of the same code across test subdirectories. r91605: Moved all silencing code to a separate include file (except in a few cases of double includes). Made the yaml testing a bit more verbose when rerunning individual tests (copy-paste testing). r91581: Changed target language specific analysers to be based off of analyser-mt-gt-desc.hfst, instead of the *.tmp.hfst file, to allow local post processing to be applied in the step from *.tmp.hfst to the *.hfst file. r91564: Forgot to remove all the targets and build instructions in the old location. r91559: Finalised moving the Apertium MT build code to the new location. All parts have been generalised, and the set of target languages to go with a specific source language (when analysing) is specified in configure.ac. That is, just list your target languages in configure.ac, and off you go. One feature still missing: target language derivation (and other) string filtering for the source language analyser. Coming soon. r91448: Reorganised MT fst building, moving it to a new dir in tools/. 
This is done to avoid too much stuff in one dir (src/), and to make it easier to extend the MT support without making the build files too large for one dir. r91222: Added a tmp-file step for the raw fst, to allow local/language specific overrides when building the raw transducer. Required for Estonian. r91100: Added support for dialectal fst's in Oahpa. The dialect tags need only be specified in $GTLANG/configure.ac, and all filters and fst's will be constructed automatically. r91028: Made generated regex files build and be retained, as well as deleted when using 'make clean'. r91021: Added missing test runners, and at the same time made the test XFAILS (i.e. expected fails due to immature code). r91015: Greatly improved support for the dictionary fst building: * filtering semantic tags now works properly (removed for all but Prop) * properly silent when using silent builds * added dict-specific yaml test files and corresponding test runners * removed building of redundant dictionary fst's since we now can test generation and analysis independently - we only test the fst's we are actually going to use, in the intended "direction"; this should noticeably speed up compilation, especially when using hfst * all languages now build a dictionary analyser with a mobile phone spell relax Also removed semantic tags from the regular analyser, as discussed earlier. Increased the required GTD core version, as the new version is required to fix the bug mentioned above. r90845: Corrected the test that triggers a FAIL in the twolc negative test script. r90799: Temporarily add the new twolc pair tests to XFAIL-TESTS, to make them pass as expected fails. This will let all tests run, but will have to be reverted either when all broken twolc tests are fixed, or when the full test suite can be run without stopping make. r90726: Rewrote the test for awk to check for a feature found only in GNU awk, and use the one found. 
Will check both awk and gawk, and use whichever supports the feature (gawk on some Linux systems is renamed awk). This should make the build configuration more robust. r90692: Added twolc pair string testing for Xerox. Hfst requires another type of pair strings, and can't easily be tested at present. r90613: Added more tailored silent output, silenced Xerox tools as much as possible (not very much). r90588: Made documentation build process work properly when using VPATH builds, and at the same time silenced the doc build by default. r90437: Replaced grep in Makefile with a shell script for extracting semantic tags. This is done to catch the case where there are no semantic tags to extract. Earlier this caused a failed build, now it is handled properly. Required GTD core version increased because the new script is only found in the latest version of the core. r90421: Build remove-semantic-tags.regex for all languages, since we now use it when building speller files. r90419: Only test spellers if we build spellers. r90415: Moved the new test script processing in configure.ac up a few lines to avoid conflicts during template merge. r90414: Added a test to check zhfst file validity. Not functional yet, because of a bug in hfst-ospell. r90368: Fixed bug http://giellatekno.uit.no/bugzilla/show_bug.cgi?id=1830. The word border mark removal had earlier been moved from a preceding step to the final product compilation step. It was added to the foma speller compilation, but not included in the hfst speller compilation for some reason. Now it is. At the same time removed semantic tags from the speller transducers, to make the analysis/generation string more readable when debugging - they are not used by the speller builds. r89769: Fixed a build bug: it tried to build oxt files also when hfst support was not enabled. Now oxt files will only be built if hfst is on, and spellers have been requested. r89623: Fixed a problem with building zhfst files using VPATH builds. 
r89587: Added convenience upload target to upload the oxt files and make a permanent link to the latest version. r89570: Fixed pattern rule error. r89569: Simplified the building of the hfst lexical transducer, and made it easy to use the -F option to hyperminimise the lexical fst. Using pattern rules instead of fixed filenames. r89561: Added first working build of oxt files. Not yet generalised, but working with the paths we have. Will build Windows and Mac OXT files for all languages. r89338: Whitespace change. r89333: Stop with error if --with-hfst was requested but could not be turned on. Based on patch by Unhammer. r88686: Really fixed syntax errors, and another old error. r88684: Fixed syntax errors. r88682: More comments and clear separation between the different macro sections. Added first components for configuring oxt building - checking whether it is possible to sync the oxt template locally from $GTHOME. Added check that all components of the Xerox tools are installed before enabling Xerox builds. r88677: Added a variable to hold the LibreOffice version number where speller support for a language was initially available. r88669: Made minimisation of speller automata configurable (default=no), since it can be extremely time and resource consuming for some languages, and the size difference for the final fst is not very big; we might lose some speed, though, which needs to be tested. Reorganised the configure.ac code by moving most code to the m4/giellatekno.m4 macro file. What remains in configure.ac is pretty clean and mostly easily understood. r88620: Renamed the last files in am-shared/ to follow the correct naming scheme. r88615: Removed test-src-morph-include.am - it was in reality empty. r88596: Moved word border removal to each fst-based speller, as we need it in the hfst speller production (for word-based weighting) but not in the foma-based speller. Rewrote all (fst-)speller build steps to regexes instead of hfst pipelines. 
r88584: Rewrote the mt build instructions to a regex instead of an hfst pipe. r88578: Renamed doc-include.am to follow the naming scheme. r88569: Renamed hunspell-include and listbased-spellchecker-include. r88562: Deleted unused include file. r88558: Renamed orthography-include.am and hyphenation-include.am to follow the naming scheme. Now all src-dir includes are renamed. r88545: Renamed phonetics-include.am to follow the naming scheme. r88543: Renamed syntax-include.am to follow the naming scheme. r88539: Renamed transcriptions-include.am to follow the naming scheme. r88533: Renamed phonology-include.am to follow the naming scheme. r88531: Renamed disamb-include.am to follow the naming scheme. r88526: Removed 130 lines of code, and made the code much more readable by replacing the long pipe of hfst commands with one regex per target. The actual regex is a mirror copy of the xfst regex already in the file, which means that it is also very easy to maintain functional parity between the two architectures. r88522: Renamed the lexc include file to follow the correct naming convention. r88520: Renamed the main src include file to follow the correct naming convention. r88405: Completed documentation for updating gtcore. r88390: Added support for building disamb-oriented fst's, which include the semantic tags. r87898: Corrected VPATH build of lexc files generated from xml. At the same time silenced the XSL processing a bit, and corrected a minor configure error for VPATH configurations. r87890: It was not a good idea to redefine a variable referencing itself - AM stops. r87889: Enabled compression of zhfst files again, it should now work across all platforms. Made filter regex compilation quieter for xfst, and made the silent mode more informative for filter compilation. r87298: Make the xfscript compilers quiet in silent mode, verbose in verbose mode. 
r87220: When running LexC tests, if no tests were found, the test bench will now report that the whole test was skipped. Earlier it reported a pass. r87212: Tailored silent build output for Vislcg3. r87206: Increased the actual and required version number after a small bugfix in the speller version easter egg, to ensure all generated spellers have proper version info. r87099: Corrected a fatal bug for non-latin spell checkers: the error model contained one letter from the easter egg not found in the acceptor. This symbol mismatch is fatal for hfst-ospell, and caused all non-latin spellers to crash (the latin spellers would all have this symbol ('p') anyway, so no problem was noticed earlier). r86991: Corrected the compilation of xfscript files such that we still have a general build rule for xfscript files, but now with a following inversion when needed. Also added better feedback on the build steps in silent mode. r86874: Updated required and actual version number of gtdcore. The easter egg creation for hfst spellers depends on new files in the core, and also the abbr.txt building does so. Without an updated core e.g. speller builds will fail. r86855: Renamed more am-shared files. r86822: Renamed topdir-include.am to src-include.am to follow the correct naming pattern. r86820: Experimenting with feedback on silent builds (make V=0). Looks good. r86806: Added Autotools support for building the abbr.txt file. This file is _not_ included in the regular make commands, one has to cd into the tools/preprocess/ directory, and run 'make abbr' there. This is on purpose. r86774: Added a new dir tools/preprocess/ to hold resources for the preprocess utility. r86740: Added automatic switch between hfst-foma and hfst-xfst for compiling xfscript files into transducers. hfst-foma is the default, with fallback to hfst-xfst if hfst-foma is not found. There are still issues with hfst-xfst. r86734: Moved xfscript compilation out of phonetics-include and hyphenation include. 
These am-files contained an invert command that, combined with an invert command in the actual xfscripts, created a meaningless double inversion. r86719: Removed unused twolc.am file. Reorganised the code for twolc and xfscript compilation, to avoid duplicate code and prepare for improvements. Added M4 macro check that either hfst-xfst or hfst-foma is included, hfst compilation is turned off if none of them is. r86655: Reduced weights for the easter egg suggestions, to avoid other suggestions coming in between. r86582: Easter egg with version info now working in the hfst speller. r86438: Added initial version file for the hfst-based spellers. r86359: Explicit support for local source files and targets for the syntax. r86355: Added support for building (compiling into binary form) cg3 files for syntactic functions and dependency graphs. Added a template file for syntactic functions. Made the compiled binary files installable through 'make install'. r86122: Added version checking of vislcg3, renamed a couple of variables, and improved configuration feedback a bit. Now we require a vislcg3 new enough to not complain about recent addition of new features. r85561: Changed the file order when building zhfst files - there are still issues caused by the index.xml file being non-first. Now it is always first. r85550: Finally fixed the libvoikko/zhfst spellers. Ready for Windows! r85468: Moved the common src/filters/ inside a common/ dir, to allow for other parallel dirs like smi/ and und-Cyrl/ that target only a subset of the languages. At the same time renamed gtshared/ to gtdshared/. This change requires version 0.2.0 of the gtdcore. r85453: Added the requirement to remove orig_lang-tags (OLang/NOB etc) by adding filters/remove-orig_lang-tags.xfst also to generator fsts, not only to analyser, for the dicts fst-s. r85430: Fixed a bug that hindered the GTD core from finding the version info script in the core (as opposed to installed). 
r85421: Added version checking of the GTD core: if the core is too old, configure will stop and print an error message with instructions on how to proceed. Added an external Autoconf M4 macro for version comparison, and renamed the file of an existing module, to be more consistent and explicit in the filenames. This work is done in preparation for other changes in the GTD core, which will require the core to be updated to not render all languages broken. r85356: Removed the filter "remove-NG-string.regex" from the analyser-dict-gt-norm.xfst target, in order to allow Use/NG entries in dict fsts. r85211: No PCDATA text elements should be on a line of their own, that seems to trip up TinyXML2. r85193: Another whitespace change to make TinyXML2 happy. r85177: Removed a space that tripped up TinyXML2. Tiny typo correction. r85044: Added some default content to the description element, to keep hfst-ospell from segfaulting. r84651: And with some more coffee in my system, remove-semantic-tags-except-prop.xfst is now included in the mobile dict analyser. r84650: Checked in the two dict analysers with different spellrelax, but forgot semantic tags and orig_lang tags. r84648: Two dict analysers, one with mobile spellrelax, and one without. Also removing certain semantic tags and orig_lang tags which prevent POS from being the first tag, and messing with lookups for NDS. r84363: Adding possibility to first look for specific regex creation shell script before falling back to a default shell script. This will allow us to create more complex or tailored regexes for certain tag sets (like the semantic tags), while having a reasonable fallback for other cases. r84350: Keeping intermediate files didn't work, created an error. Now it works. r84334: Fixed a make warning, made generated regex files survive the build. r84212: Further cleanup of semantic tag filtering: no processing of semantic filters in the shared makefiles. 
r84117: Remove semantic tag filtering from the common targets, it is only used by sme and sma. r84090: Added rules to generate regexes automatically from the list of extracted tags. First out is the regex to make semantic tags optional, and another to remove them completely. Also fixed file references in the relabel targets. r84086: Added a rule to generate phonology documentation from xfscript, not only from twolc. r83997: Only build one file of tags, using hfst or xfst depending on the configuration. Extract semantic tags. r83976: Reverted a change to hfst lexc compilation - the -f option doesn't work. r83961: Moved tag extraction from tagsets to filters, as it has a more general use as the basis for dynamic filter construction. Tag extraction now works with both Xerox and Hfst. r83906: Xerox will now stop on lexc syntax errors. Hfst will not until (hfst_)foma is fixed, because foma doesn't stop on syntax errors. But one is better than none. r83807: Removed one harmless but irritating warning. r83736: Commented out weighting of the acceptor fst - it causes a segfault in hfst-ospell. r83655: Added a filter to remove dynamic derivation. r83588: YES! Finally got weighted automata working in the speller. Added missing hfst tools, and sorted all the hfst tools alphabetically. Updated the required hfst to version 3.5.1. r82738: Changed build files to support Hfst 3.5, requires 3.5. r82633: Added LexSub string filter. r82452: Changed voikko compression back to zip - gzip isn't voikko compatible. r82434: FINALLY fixed the automake 1.11 vs 1.13 test incompatibilities. Now we can allow version 1.11, and still get the pretty output we want in newer automakes. r82406: Fixed references to GTCORE in test scripts. Earlier we relied solely on it being set in the environment, now we take it from configure (which can take it from the environment or from a script). r82403: One more gzip option fix. 
r82399: Fixed argument structure of gzip - zipping was broken for hfst and gramcheck. r82316: Consistently use gzip instead of zip, and find gzip outside any conditionals. r82308: Redirected command feedback of the analyser shell script to stderr, to avoid cluttering the analysed text in pipe use. r82266: Restored the Makefile and the shell script, now that the dir is merged. r82261: Had to remove the Makefile as well, adding only the dir in the first go - no text replacement was done inside the Makefile. r82258: Removed the shell script, to take the merge in two steps: first create the dir, then add the shell script file. This makes it possible to rename the file at the same time, whereas if we merge the dir and the file in one go, no renaming will take place. That leaves us with a tedious manual rename process afterwards. r82255: The first lookup shell script added, with supporting infrastructure. r82231: Added option to automatically create a language home dir environment variable. The idea is that by setting this variable, we can reliably find transducers in the working copy dirs of the users. The default is to not do anything (but give a warning). r82207: Changed back the Automake requirement to 1.11 - 1.12 is creating too much trouble. We'll have to see what to do with the test output - the version requirements change must be followed by another change that will substantially degrade test reports on newer automakes. r82203: Made the check for GTCORE functional, looking for both the gt-core.sh script (and using its output if found), and the environment variable $GTCORE. This means that there is no need anymore to set the GTCORE variable as long as one configure, make and make install in the gtcore directory. r82063: Corrected bug/feedback e-mail address to one actually working. r82028: Made LexC compilation break on error, at least for Xerox (Hfst only gives a warning for the same error). 
r82022: Moved the remove-illegal-derivation-strings.regex from all langs to only the three Sámi langs actually using it. Even though potentially useful for more languages, it can hardly be considered a language universal... r81906: More build rules for the grammar checker. Now it will install. r81862: Corrected the --enable-grammarchecker option testing. r81857: Changed the order of the configure macros, to allow for testing for program availability when checking the enable options. r81854: Forgot to add the new Makefile to configure.ac. r81831: Added basic build infrastructure for a CG-based grammar checker. No template source file added yet, as this is still pretty experimental. r81653: Updated the filenames to match what we actually check out. r81633: Copy-paste error introduced scanning of a subdir test that doesn't exist for any language but SME. Now corrected. r81625: Reorganised the phonetic build code to better support parallel phonetic transcription depending on the source language of loan words and foreign names. r81597: Added check for the availability of 'see' when testing, to avoid bad fails on systems without 'see'. r81592: Added config feedback about vislcg3/syntactic parsing status. Added config check for the see tool (SubEthaEdit). r81588: Remove copying of the timestamp file for non-maintainers. It breaks the automatic merge, and requires a revision-explicit merge for each such language. Also added removal of originating language tags - they are only used in TTS. r81579: Added compilation of the remove-orig_lang-tags filter. Sorted the filter targets. r81562: Improved and corrected configure feedback for spellers. r81556: Corrected syntax error in a test. Improved config feedback further. r81551: Now all speller fst's are turned off by default (I missed a few in the previous commit). The configure feedback is slightly improved. r81544: Changed the default setup to only include morphological analysis and generation. 
This is done to reduce the build time during regular development. This means that to build spellers and other specialised fst's, they must now be enabled using ./configure. Cf. bugzilla #1710. r81095: Corrected filter order for the text2X transcriptors. r81078: Completely redid the text2num etc transducers. The previous solution was in the wrong place, and didn't incorporate the actual filtering. Now it does, but whether this is the way it should be needs to be tested. r81065: Another Xerox error correction - we're using LexC, not Xfst. Skipped the result stack - not needed. r81054: Corrected Xerox error. r81052: Forgot one small make step. r81051: Added the inverse transcriptors, to go from text to numerical expressions. r81022: Wrapped phonetic / IPA conversion in a configure option, default is 'no'. Now compiling SME with Xerox should be back to normal speed again. r80337: Added Remove ACR filter. r80322: Added compilation of the filters for the orthographic tags, and added removal of them and the IPA strings in all regular fst's. r80164: Added missing hfst tool hfst-fst2strings to the M4 autoconf macros. r79977: Forgot to rename a variable after copy-paste. r79929: Reorganised the build code for dictionaries, added a dictionary option for configure (disabled by default), and added the new filter for mobile keyboard spellrelax. r79408: Still one more case of optimised lookup format removed. The underlying problem remains, though: that the hfst tools can't take all produced formats as input. Also added gzip compression of the att file transferred to Apertium. r79402: Another bugfix: switched from -f owl to -t in another case, to avoid hfst crash. r79373: Bugfix: hfst-substitute can't take lookup-optimised fst's as input. r79338: Removed a sma-specific filter that had crept in. Added att output fst to the default apertium analyser target. r79327: Added missing check and variable definition for hfst-fst2txt. Several minor changes to the apertium build instructions. 
r79316: Added missing reference to the remove-semantic-tags-except-prop filter. r79315: One small change forgotten in the previous commit: comments and one less target in the default setup (more to be added on a per-language basis). r79308: Moved MT/Apertium code to the und template from sma. Not tested, most likely buggy. r79152: Added remove-variant-string.regex, for removing strings containing +v2, +v3, +v4, +v5, but not removing +v1. r79081: Change echo to printf for cross-platform compatibility. r79057: Improved error handling in testing shell scripts. r78956: Removed -s option from hfst-summarise in tagsets/ and added # -> + to the Apertium relabel script. r78059: Renamed refs to template dir in preparation for support for multiple template dirs. r77898: Commented out examples of error models for string and word pairs - they would in most cases add symbols to the error model not found in the acceptor, and this combination would crash the speller badly. r77567: Cleaned up speller fst building, removing all unnecessary inverts and streamlining the code. Prepared for the introduction of weights, but commented out for now because of bugs or inefficiencies in openfst. Renamed the included hfst speller build file, to follow an emerging naming standard for the include files. r77523: Added support for making variant analysers and generators using the Apertium tag conventions. The generated transducers are still not fully Apertium-compatible but they are a major step forward. r77475: Renamed analyser-raw-gt-desc.hfst to generator-raw-gt-desc.hfst, to make the behavior in hfst-lookup explicit and clear. Still, the "generator" behaves as the Xerox "analyser" in hfst when it comes to composition and filtering. Confusing, I know. r77459: Build the filter to remove CLB strings from speller transducers, and use it. r77449: Added missing hfst tools. Removed commented-out code in the index.xml file. 
r77368: Removed the ocr error model from the zhfst building, it causes libvoikko 3.4 to segfault. r77364: Added an explicit copy operation into the hfst speller dir, to facilitate local modifications of the speller transducer before further processing, by just replacing the copy operation with whatever is needed. r77356: Added string pairs and whole-word corrections to the speller error model. Added support for an ocr error model. Removed obsolete Voikko config file. Corrected bugs in the hfst M4 macros. r77317: Moved the initial spell checker processing to the top spellchecker dir, to serve as the default starting point for all spell checkers. r77273: Added a tagset directory in preparation for generating Apertium transducers automatically. Corrected and expanded a few M4 macros for the hfst tools. r76046: Added support for testing analysers and generators only. For several of our more specialised transducers, this is more practical and useful than always generating both pairs of transducers to test both directions. r75902: Corrected the existing oahpa transducer. Added dummy hfst oahpa target. r75594: Corrected a bug in the hyphenator hfst build: fst's must be inverted in hfst. r75459: Corrected another copy-paste error that broke speller fst's. r75424: Corrected copy-paste error. r75423: Rewrote a number of targets to reflect a split morph boundary removal filter. There are now three filters instead of one, to allow for more flexible fst building for speech processing. r75275: Added gzip compression of foma speller transducer, and proper checks for prerequisites. Foma spellers can now be disabled, they are enabled by default. r75256: Corrected a bug when building foma-based spellers. Changed one fst filename to follow the naming scheme for the new infra. Improved building of the zfst speller file. r74922: Added processing of new filters. r74604: Do not try to build hfst-based tools if hfst building is not enabled. 
r74427: Forgot to include changes to the filters Makefile.am in the previous commit. r74424: Moved some of the fst-speller building one level up, and added support for building foma-based spellers. r74285: Renamed phonetics source and target files to reflect the actual purpose. r74259: Added possibility to build morph segmenter for those langs that have morph boundaries marked in lexicons. r74234: Added a top-level misc/ dir to hold private / non-svn files needed during development of the language. All files are ignored. r74072: Corrected hfst 2ipa fst: the final fst needs to be inverted before being used in lookup. r74012: Corrected the homonymy and variant filters used for generators - those tags should be optional, not completely removed. r73986: We require gawk specifically, not any awk whatsoever. Improved config feedback. r73341: Corrected reference to the built fst's. r73159: Updated the zhfst building to reflect recent changes in Voikko. There is now official support for zhfst speller files, but with a new location and no *.pro file. Also added simple support for local loading of the zhfst file - voikkospell requires that the file is located within a dir named '3'. r72875: Further improvements to the test run output. r72864: More tweaks to make the test output compact and readable. r72854: Made use of the more compact modes of morph-tester.py. For all PASSed test runs, only one line is printed. r72836: Moved Oahpa transducer compilation to a separate (included) file, and added support for compiling dictionary transducers. r72827: We need the last part of the path to properly identify the lexc file tested. r72823: Made the morph-tester test runner (LexC and YAML tests) less verbose. All messages are one-liners, except for FAILs. r72769: More thorough clean in src/morphology/. r72662: Moved the definitions of the transducer variables to the Makefile.am, to make it possible to extend them by local modifications. 
r72560: Forgot to update the src/filter/Makefile.am file. r72534: Split the filter 'remove-dictionary-tags' in two to remove homonymy and variant tags separately. r72518: Added filter to remove NGminip strings, ie paths that should not be used for generating miniparadigms in dictionaries. r72423: Added infrastructure for building fst's for list-based spellers. The actual building is not yet implemented. r71921: Remove doc build dir when cleaning. r71912: Forgot to update the config file. r71905: Reorganised the tools/ dir to fit better with coming development. r71576: Several adjustments to the forrest setup for jspwiki validation. r71421: Added support files to enable forrest validation of jspwiki files. Second part of making sure extracted documentation comments won't break site building. Also added another make step to actually run forrest at the end of building the documentation. Make will now break if there are fatal errors in the jspwiki markup. r71326: Corrected cut&paste error. r71325: Added check for forrest as part of configuring the documentation extraction. Forrest will be used to validate the jspwiki documents during the build, to prevent invalid documents from entering the svn repository and corrupting the web page building. r71071: Upped the required automake version from 1.11 to 1.12, to avoid all hassles with the test harnesses and backwards compatibility. r71052: And then the rest of the tests changed into the most portable format. r71025: Even more portable testing... r71002: Improved portability & correctness of conditional tests in the morphology testing. r70961: Major update to the LexC testing. Now test data directly in the LexC code is supported by the python test script morph-tester.py (it reads the lexc files directly), which solves the bugs with multiple wordforms for the same morphosyntactic inflection. It is also a bit faster than the awk solution. r70633: Added initial documentation extraction of CG3 files. 
Probably more work to be done to get things working as intended. r68844: Finally found out how to get the old test behaviour back. We want the serial tests, because they give direct feedback to the linguists. Automake 1.13 uses parallel testing by default, which logs all test results to files. r68823: Added support for processing twolc files for documentation extraction. r68816: Some files may contain digits in their filename. Extended the filename match pattern for the Links target. r68806: Added support for automatically building a file with links to each individual jspwiki file generated based. r68760: Forgot to add the jspwiki preamble file. Now added. r68649: Added some very basic documentation comments to the template root.lexc. r68639: Forgot to add support for the conditional CAN_DOCC in the previous commit. r68630: Added initial support for extracting documentation from comments in the source code. Only jspwiki supported initially. Also added initial support for extracting test data from source code comments. Only yaml tests in lexc are supported initially. r67871: The final fix to get the XML-to-LexC conversion working on Cygwin. r67860: Concatenate all LexC source files into one file explicitly, instead of letting hfst-lexc do it. This is more robust cross-platform, and makes the file used for transducer compilation easily available for debugging. r67840: Corrected the host detection test for Cygwin. r67828: Made spell-relax a language-specific file by adding it to the und template. r67827: Added support for XSL conversion of XML source files on Cygwin. r67732: Made Voikko support optional instead of required. r67724: Fixed a stupid bash syntax error in the previous commit. r67720: Rewrote LexC and TwolC Xerox rules to make them work on Cygwin: the Windows Xerox tools need a script file as input, the scripts can't be piped in as on *nix systems. Removed the hack in the previous commit. 
The bug can be worked around by avoiding linebreaks in the piped script. r67563: Added hack to work around a very strange bug in LexC transducer saving - the filename is slightly garbled if the save command is passed in from a script generated by a make file (but the same command passed in from a manually typed script works correctly). r67353: More robust Saxon/Java setup: no need to define CLASSPATH. The M4 macros will look for a couple of predefined pathnames, and pick the first saxon9he.jar file they find. More locations should be added as needed. r67309: Require at least HFST 3.4 - it includes all backends, and simplifies dependency handling quite a bit. r67043: Fixed parsing of regexes for hfst, due to a bug in hfst-regex2fst when parsing regexes with comments after the regex is closed. r66959: Refactored the yaml test code, moving duplicate parts to a separate file. Makes for much easier adaption to new transducer types. r66833: First step in making the digit transcriptor transducers work. The transducers are compiled, and are given proper names according to the fst naming conventions, and the Xerox transducers work in the digit-2-string direction. The Hfst transducers do not yet work (segmentation fault due to running out of memory because of an infinite recursion), and the string-2-digit direction is not yet in place. r66815: Made the yaml test scripts obey configuration options, ie only run the hfst tests if hfst is turned on at configuration time. r66329: Automake requirement reduced to 1.11, after getting confirmation that that version is fine for finding Python (the main problem issue that triggered the version requirement). r66316: There's too much trouble with finding the correct Python version when using Automake v.1.10. We thus require 1.12 from now on. r66241: The noun lemma generation test script has been updated to only test the transducer types that have been turned on at configuration time. 
r65468: Grep out comments from regex files in orthography/, as there is a bug in hfst-regexp2fst. r65380: The actual Oahpa configuration was lost! Now finally included and working. r65377: Forgot Oahpa configuration feedback. r65376: Several updates: * slightly improved feedback from the configure script * improved hfst spell checker building * added basic support for building Oahpa transducers, *disabled* by default r65370: Renamed 'dictionary' to `spellerautomaton` in giellatekno.m4. The old variable name and printouts were confusing - 'dictionary' has many meanings, and some very concrete ones in the context of the GT/Divvun work. r65289: Minimise after every compose operation - always. r65064: Bug fixes to the Saxon/Java configuration. r65002: Call saxon checks from configure. In previous commit: Ubuntu version of xml2lexc with autostuff. r64988: Only check hfst version if requested using '--with-hfst', otherwise disable. Likewise, disable xfst if requested and print warning if both are disabled. r64645: Added simple feedback to autogen.sh, valuable when processing many languages. Some reformatting of am-shared/hfst-spellchecker-include.am. r64584: Removed double inversion from the hfst generator - it didn't work. r64568: One more syntax error. r64567: Corrected syntax errors introduced in the previous commit. r64566: Silenced the build using an Automake macro. r64560: Make the silent build rules backwards compatible. r64552: Reapplied the simplification of hfst regex expressions, now with the correct command, thus working. Corrected hfst filter compilation. Took the first steps in silencing the verbose make output. r64550: Reverted the simplification of hfst regex compilation. r64539: Added support for spellrelax. Also renamed the orthography include file to be more generic. Simplified regex build commands now that the bugs in hfst-regexp2fst have been corrected. r63846: hfst-preprocess-for-optimized-lookup-format has been removed from the hfst distribution. 
r63795: Always fail hfst check if hfst-info can't be found. r63667: Add requirement of foma for hfst compilation; remove distinction between WANT_[HX]FST and CAN_[HX]FST. r63602: The local modifications to Makefile.am files must be before the fallback pattern targets, it seems, otherwise the fallback targets are used. r63579: I believe I finally have fixed the yaml testing shell scripts. r63573: One more fix for the yaml testing shell scripts. r63571: Escaping in the yaml test scripts didn't work - removing the single quotes did. Also added an underscore in front of the transducer string in the yaml testing, to keep the test scripts from getting too greedy when we get more transducers and test data. r63557: Small variable correction in the yaml test bench. r63556: Forgot to escape the single quotes used within the backtick expression. r63527: Corrected fail check in noun lemma generation test. r63522: Split the yaml test runner in two, one for norm and one for desc transducers, and updated the autoconf file correspondingly. Updated the lemma generator test to work with the renamed transducer. Made all test runners more robust. r63501: Renamed all existing targets to follow the naming scheme defined at http://divvun.no/doc/infra/infraremake/TransducerNamesInTheNewInfra.html. Also added making of true normative and descriptive analysers and generators, as well as moved all of the hfst speller building to the tools/spellcheckers/hfstspeller/ dir. More explicit separation of local and central code in src/. r63483: Added a simple header to the beginning of the compilation, to make it easier to spot each new language when building all languages in $GTHOME/langs/. r63469: With the recent fixes to regexp parsing in hfst-regexp2fst it was possible to bring the hfst compilation up to par with the Xerox compilation. In principle the Xerox and Hfst transducers should behave exactly the same - any deviation is a candidate bug in either the Xerox or the Hfst tools. 
This update requires hfst 3.3.14 to work properly, the requirement is added to the configure.ac file. r63149: Removed references to newinfra/. r63144: Added warning about missing YAML testing, with short instructions on how to enable them. r62953: The top-level syntax include AM file had not been changed to reflect the rle->cg3 suffix change. r62867: Corrected a bug in the default generate-noun-lemmas.sh test script. Made file references more robust. r62683: Variables=cleaner code. r62663: Updated the yaml test runner to properly report the exit value of the yaml tests, and also to give directions for how to see the details of each test if it failed. r62650: Corrected typo in shell scripts. r62648: Several testing shell script updates: correct exit value when data files are not found, proper use of Autoconf-made variables (will free the test scripts from relying on the user setting up environment variables), and better checks on the availability of test data for the lemma generation test. r62643: Added check for the Xerox lookup tool, which also defines the LOOKUP variable. r62639: Reorganised AC processing of shell scripts to be more future-proof. Added AC variable to the AC-processed shell script to make casual by-lookers aware of the fact that the resulting shell script file is generated by AC. r62635: Moved Autoconf processing of the yaml testing shell script to the top of the list of AC_CONFIG_FILES, to avoid annoying warnings from chmod. r62621: Corrected error in previous commit. r62619: Forgot to update configure.ac. r62617: Refined the yaml test runner: more informative banner, ignore extra analyses (= removes false alarms). r62610: Added basic setup for running YAML tests in the test/src/morphology/ dir. The default setup will run all *.yaml files found in this dir, but this can be modified in the shell (*.sh.in) script. r62596: Enable yaml tests by magic. r62590: Added conditional support for running python-based tests in test/src/morphology. 
r62581: Added checks for Python 3.1+ and py-yaml, and defined CAN_YAML_TEST. The idea is that we will run the python-based tests only if the prerequisites are available to us, and skip them if not. r62276: Added missing entries in configure.ac and src/Makefile.am. r62275: Added support for transcribing transducers, ie transducers that change the input from one orthographical representation to another, e.g. date and time expressions as strings or digits to the opposite form. r62270: Renamed the default error model file, to follow the naming scheme used in the zhfst guidelines. r62265: Don't remove the *.tmp files - that destroys the dependency relationships for (auto)make, which forces a full recompilation of all target fst's, and a lot of extra waiting time. r62192: Add missing src to hfst spellchecker automaton path. r62171: Added missing reference to dialect tag filter. r62151: Updated my simplistic noun generation script to be aware of its new location. r62147: Corrected typo in Makefile list in AC_CONFIG_FILES. r62145: Forgot to add the new Makefiles to the list of AC_CONFIG_FILES. r62141: Reorganised the test dir, in anticipation of a larger set of tools and source types in need of testing. r62122: Added test/data/typos.txt to hold a list of collected typos. The list is used both for testing spellers, and as part of the preprocessor used with the Xerox lookup tool. r62084: Added core filters for making transitivity and semantic tags optional in the default generator. This change fixes most of the generation issues. r62039: Generalised the local/language-specific *fst processing. r62010: Adding support for local/language-specific *fst processing. r61971: Reverting the copy source file step for $GTCORE/gtshare/src/filters/, replacing it with direct compilation. The main reason for copying was to have the source files available for distribution, but that is not required, since the distribution and installations of the languages depend on the presence of GTCORE. 
Instead we risk that people start to add the copied source files to svn. r61965: The GT_FILTER_TARGETS variable needs to be parametrised for HFST and Xerox also for the local modifications. r61961: Several small typos and glitches in building the main xfst's fixed. r61959: Removed an extraneous backslash that broke compilation. r61954: Hopefully this version of cp will work. r61953: Forgot to force the copy, which stopped the compilation. r61949: Added several (tag removal) filters from the old infra, and added compilation of them as well. Applied (composed) them on the generator and the analyser, such that the gen. and anal. should now produce the same output as in the old infra. Corrected the README file. r61776: Remove duplicate methods to enable toolkits; --with-options are to be used with optional path to tools. The automake tree will still have two conditionals on whether CAN and WANT hfst, xfst or somesuch. r61734: Added border removal to the basic analyser and generator, such that they become useful. r61731: Made the first test script more robust: it bails out if no transducer is found, and gives basic feedback to whether it is testing Xerox or Hfst. The test data files are not deleted after the test run, so that they can be easily inspected if needed, even after a successful test run. It also uses the more common filename and morphological tags (=less typing by default). r61723: Added the first test script: it tests whether noun lemmas do generate. The script does contain some language-specific bits, and must thus be adapted to the requirements of each language. r61712: Corrected reference to inituppercase.?fst. r61708: Corrected compilation of hyphenation rules. r61706: Corrected compilation of phonetic/orth2ipa rules. r61685: Added basic structure for hyphenation and conversion to IPA. r61658: Added an empty Hunspell dir to indicate the home of Hunspell building. r61640: Removed (auto)make processing of the now deleted src/spellchecker/ dir. 
Refactored the phonotactics processing, such that xfst script files are on an equal footing with twolc files, and such that it is easy to switch from one to the other. Also added some basic template files for disambiguation and dependency tagging, taken from the faroese source. r61605: Reorganized spell checker build structure, moving it to a new tools/ dir, which will also be home to other applications of the basic linguistic analysers. r61603: Added build support for xml source files. r61520: Added initial support for xml source files. NB! The support isn't fully according to GNU (autotools) standards yet, but will have to do for the moment. r61472: Reverted the ignores - merging wasn't working properly for properties, and caused a lot of noise. Hopefully this commit will cancel that noise. r61464: Added svn:ignore on most dirs, in the hope that they will be copied over to the language dirs. r61458: A lot of cleanup and corrections: * suffix rules in more places (although not in all - that is not possible) * removed automake warning about pattern rules - we need them * checked all *-include.am files for consistency, added missing Xerox and HFST targets where needed, corrected vars to HFST tools, added comments and generally made the files easier to maintain (I hope). r61422: Only pattern rules for uppercasing targets. r61380: Replaced pattern rules with suffix rules in phonology (twolc) processing. First step in switching to suffix rules everywhere, for backwards compatibility, and fewer automake errors. Also added support for compiling phonologies written as xfst script files. r61346: Added Autoconf processing of the Makefile.am files in test/. r61318: More elaborate dir structure in test/, added Makefile.am files everywhere in test/, added test/ to SUBDIRS. r59870: Include filters and indent properly, use gtcore version of editdist.py in spellers. r59759: Corrected syntax error, renamed a couple of errors. 
r59755: Finally added initial uppercase to the actual analysing transducer. NB! It presently works only for Xerox transducers. The code for HFST looks correct to me, but it still doesn't work. Will have to debug this further later. r59728: Corrected file suffix match for xfst script files. r59707: Revert tests. r59705: Test commit. r59643: Added uppercasing compilation including proper processing for HFST. Uppercasing is not yet applied to the output transducers. That is coming in a second commit. Added a variable for hfst-xfst (and tested for its existence). r59115: Corrected dependency for speller transducer. Corrected filter path. r59113: Correct the path of merge instructions. r59079: Added filter compilation and processing of fst's. r58484: Corrected path to speller metadata files, corrected commands for them. r58482: Updated zhfst building according to new location of files. r58478: Replaced direct command calls with variables, completed and corrected a couple of commands, made the transducer commands consistent across applications. r58471: The timestamp file has been renamed, with corresponding changes in the Makefile.am. r58456: Making the echo silent for clean output. r58409: Renamed a couple of folders and files, moved hfst & voikko speller metadata files to a more appropriate place. Updated the top Makefile.am to properly check for changes to this file before building anything else.