!!!Requirements and specifications for synching XML files against a CVS repository

Most, if not all, of the dictionaries and terminology collections in risten.no should be checked in to CVS at regular intervals. This will allow manual editing and correction of the XML source, and at the same time ensure that those changes are automatically transferred back to the public server. It will also allow people to work off-line on the XML, whether using a web browser (which requires a local eXist installation) or a text or XML editor. Synching the risten.no content with CVS also provides a decent version history, as well as a good backup system.

Synching, diffing and merging different versions of an XML file is not as straightforward as with regular text files, as the merge of two versions can potentially create invalid XML when done by tools that are not aware of the XML syntax, and thus of the text structure. At the same time, the XML-aware diff and merge utilities that exist are either very expensive or unfinished (and thus buggy or slow). The strategy adopted here is to use the traditional, line-oriented tools, combined with some principles (and corresponding tools) to ensure that the XML is as easy as possible for such tools to merge.

A very good summary of the things to remember when writing version controllable XML is found in the Oracle document [Writing Version Controllable XML|http://www.oracle.com/webapps/online-help/jdeveloper/10.1.3/state/content/navId.4/navSetId._/vtTopicFile.teamworking%7Cscs%7Cscs_u_writingversioncontrollablexml~html/].
It boils down to the following seven rules:

* Rule 1: Avoid making implicit changes
* Rule 2: Avoid marking files "dirty" when they have not changed
* Rule 3: Don't arbitrarily reorder elements inside an XML file
* Rule 4: Don't reformat XML files if the user can manually edit them
* Rule 5: Separate XML elements and attributes onto separate lines
* Rule 6: Avoid magic values
* Rule 7: Sort elements to avoid conflicts

Of these rules, Rules 1 and 2 do not apply to our case (they are relevant to software authors), and Rule 4 is overruled by a separate rule used here. Thus, in our case, we will enforce the following four rules upon each cvs commit/update/merge:

* Rule 1: Sort elements to avoid conflicts
* Rule 2: Sort attributes within an element to avoid conflicts
* Rule 3: Separate XML elements and attributes onto separate lines
* Rule 4: Reformat the XML in a standardised way

These rules are enforced as follows:

!!!Tools

!!Rule 1: Sort elements to avoid conflicts

By using a fixed sort order, there will be no differences or conflicts caused by different sort orders. Since the sort order does not contribute to the content of a dictionary file (the entries can be sorted in a number of different ways, and the sorting can happen at any time), this does not affect the content of dictionaries or term collections in any destructive way. The sort order should be the Unicode default, i.e. according to character code.

!Tool

The sorting can be done either in XQuery upon export to XML, or as a separate postprocessing step in XSL. The second option is probably the preferred one, both to avoid bogging down eXist (sorting is resource intensive) and to keep the cvs commit/update preflight scripts identical across different editors and environments. It is important to verify that the chosen XSL processor supports the sort order specified above.
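
As an illustration of the fixed sort order only, and not of the XSL step preferred above, the following Python sketch sorts the entry elements of a file by their id attribute in code-point order. The element name {{entry}} and the attribute {{id}} are taken from the sort key /*/entry/@id used in the routine on this page; the function name {{sort_entries}} is hypothetical, and the sketch assumes that all entries are direct children of the root element.

```python
import xml.etree.ElementTree as ET


def sort_entries(xml_text: str) -> str:
    """Return xml_text with the root's <entry> children sorted by @id.

    Python compares strings by Unicode code point, which matches the
    "according to character code" order required above.
    """
    root = ET.fromstring(xml_text)

    # Keep non-entry children (assumed to be header-like material) first,
    # in their original relative order.
    others = [child for child in root if child.tag != "entry"]
    entries = [child for child in root if child.tag == "entry"]

    # Entries without an id sort first (empty string); this is an assumption.
    entries.sort(key=lambda entry: entry.get("id", ""))

    # Replace the root's children with the reordered list.
    root[:] = others + entries
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = '<dict><entry id="b"/><entry id="a"/><entry id="Z"/></dict>'
    print(sort_entries(sample))
```

Note that ElementTree drops comments with its default parser and does not pretty-print, so a sketch like this could only cover the sorting step, not the formatting rules.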
!!Rules 2-4: Sort attributes, separate elements and attributes, reformat

These rules are enforced using XML::Twig, the excellent XML processing module for Perl. Twig has a pretty-printer that can output the parsed XML in a number of different ways, sort attributes in alphabetical order, and separate elements and attributes onto different lines. None of the built-in modes directly supports what we want, but the module was relatively easy to extend with a new mode. A patch will be submitted.

!!!Routines

The XML in the CVS repository is expected to always be valid and correct XML, formatted as described above. Given that, the following flow should give the least problematic synch between local copies and the repository:

# if from eXist: export the XML files
# sort each file on /*/entry/@id
# pretty-print the sorted files
# validate - if not valid, stop
# cvs up - if conflict, stop
# validate again - if not valid, stop
# cvs ci
# if from eXist: reimport the files

In all stop cases, if the update/commit is automatic (e.g. started by cron), send an e-mail with an error report to the responsible person.
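
The stop-on-failure logic of the flow above can be sketched in Python as follows. This is a minimal sketch under several assumptions, not the actual preflight script: all function names are hypothetical, the eXist export and reimport steps are left out because they depend on the local setup, {{prepare}} is a placeholder for the sort and pretty-print steps, and the validation shown only checks well-formedness rather than validating against a DTD or schema. Conflicts are detected from the lines starting with "C " that {{cvs update}} prints for conflicting files.

```python
import subprocess
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple

# A step is a (name, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_sync(steps: List[Step]) -> Optional[str]:
    """Run steps in order; return the name of the first failing step, or None."""
    for name, action in steps:
        if not action():
            return name  # stop: later steps are not run
    return None


def is_well_formed(paths: List[str]) -> bool:
    """Stand-in for validation: checks well-formedness only (assumption)."""
    for path in paths:
        try:
            ET.parse(path)
        except (ET.ParseError, OSError):
            return False
    return True


def cvs_update_clean() -> bool:
    """Run 'cvs update'; fail on a non-zero exit status or a reported conflict."""
    result = subprocess.run(["cvs", "update"], capture_output=True, text=True)
    conflicts = [line for line in result.stdout.splitlines()
                 if line.startswith("C ")]
    return result.returncode == 0 and not conflicts


def cvs_commit(message: str) -> bool:
    """Run 'cvs commit' with the given log message."""
    return subprocess.run(["cvs", "commit", "-m", message]).returncode == 0


def build_steps(paths: List[str],
                prepare: Callable[[List[str]], bool]) -> List[Step]:
    """Assemble steps 2-7 of the flow; 'prepare' sorts and pretty-prints."""
    return [
        ("sort and pretty-print", lambda: prepare(paths)),
        ("validate", lambda: is_well_formed(paths)),
        ("cvs up", cvs_update_clean),
        ("validate again", lambda: is_well_formed(paths)),
        ("cvs ci", lambda: cvs_commit("Automatic synch")),
    ]
```

The name returned by {{run_sync}} identifies the step at which the flow stopped, so an automatic run could include it in the error report that is e-mailed to the responsible person.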