ITS (Internationalization Tag Set)
This section provides an introduction and a quick reference to ITS, the W3C's Internationalization Tag Set as implemented by Okapi's XML Filter. For a complete and official description of ITS, see the W3C ITS specifications.
ITS is a set of elements and attributes you can use in any XML document to specify various internationalization-related aspects of your documents. For example: which elements should be translated, notes for the translators, how elements should affect sentence segmentation, etc.
Each type of ITS feature is called a "data category". Depending on its function, a tool implements one or more data categories. The ITS features currently supported by the Okapi XML Filter are the following:
There are two ways to apply ITS information to your document: global rules and local rules.
<its:rules> elements. Global rules can be stored in standalone files (external global rules) that you associate with a group of documents, or embedded inside the documents themselves (embedded global rules).
There are two ways to associate external rules with a document:
xlink:href attribute in an embedded <its:rules> element. This essentially imports the external rules into the document at processing time. In this case the rules of the external file behave as if they were embedded rules.
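As an illustration of the xlink:href approach, an embedded <its:rules> element might link to a standalone rules file as sketched below. The file name myRules.xml and the <doc> and <code> element names are hypothetical. The xlink namespace declaration and the xlink:type attribute follow the examples in the W3C ITS 1.0 specification rather than anything stated on this page.

```xml
<doc>
  <head>
    <!-- myRules.xml is a hypothetical external rules file -->
    <its:rules xmlns:its="http://www.w3.org/2005/11/its"
               xmlns:xlink="http://www.w3.org/1999/xlink"
               version="1.0"
               xlink:href="myRules.xml" xlink:type="simple">
      <!-- Per ITS 1.0, the linked rules are processed as if they were
           at the top of this element, so this rule is applied after them -->
      <its:translateRule selector="//code" translate="no"/>
    </its:rules>
  </head>
  <body>
    <p>Set the <code>timeout</code> option before starting.</p>
  </body>
</doc>
```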
To some degree you can compare the way ITS rules are declared to CSS styles: global rules in standalone files are the equivalent of a CSS file, embedded global rules are the equivalent of the HTML <style> element, and local rules are the equivalent of the HTML style attribute.
ITS rules are always applied in the following order:
xlink:href). The embedded rules can be anywhere in the document. If there is more than one
<its:rules> element within the same document, they are applied in the order the
A later rule overrides a previous one when both apply to the same parts of the document (in other words: the last rule always wins).
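The "last rule wins" behavior can be sketched with two rules that select overlapping nodes. The <p> and <body> element names here are only illustrative assumptions and are not taken from a particular document format.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <!-- First rule: no <p> element is translatable -->
  <its:translateRule selector="//p" translate="no"/>
  <!-- Later rule: overrides the first one, but only for the <p>
       elements that are inside <body>, since both rules select them -->
  <its:translateRule selector="//body//p" translate="yes"/>
</its:rules>
```

With these rules, a <p> inside <body> ends up translatable, while a <p> elsewhere stays non-translatable because only the first rule selects it.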
Example of external global rules:
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:translateRule selector="//fexp" translate="no"/>
  <its:withinTextRule selector="//fexp|//strong" withinText="yes"/>
</its:rules>
Example of a document with embedded global rules (the <its:rules> element) and local rules (the its:translate attribute):
<book xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
  <head>
    <its:rules>
      <its:translateRule ... />
    </its:rules>
    <title>The Life of a Simple Man</title>
  </head>
  <body>
    <p>Everything started when Zebulon discovered that he had a <fexp>doppelgänger</fexp> who was a serious baseball <fexp its:translate="yes">aficionado</fexp>.</p>
  </body>
</book>
|selector|Absolute XPath expression that indicates which nodes are affected by the rule.|
|translate|"yes" if the selected nodes should be translated, "no" if they should not. The setting applies to the child elements of the selected nodes, but not to their attributes. By default, elements are translatable and attributes are not.|
For example, in the following XML document, the <docID> element is declared as not translatable, and the alt attribute is declared as translatable:
<myDoc>
  <head>
    <docID>ABC123-456-987</docID>
    <title>Title</title>
    <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
      <its:translateRule selector="//docID" translate="no"/>
      <its:translateRule selector="//@alt" translate="yes"/>
    </its:rules>
  </head>
  <body>
    <para>Look at this picture: <img href="e1.png" alt="Elephants in the river"/></para>
  </body>
</myDoc>
The Terminology data category is implemented, but no Okapi tool takes advantage of it yet.
In some documents you may have the same text in different languages. Usually the language information in XML is specified with the xml:lang attribute. The XPath lang() function lets you test the language of a given node. This function is case-insensitive and takes into account the inheritance of xml:lang.
So, for example, the expression
will match the first
<item>
  <var xml:lang='en-US'>
    <seg>Text</seg>
  </var>
  <var xml:lang='fr'>
    <seg>Texte</seg>
  </var>
</item>
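A rule built on lang() might look like the sketch below. The selectors are an illustration written for the sample above and are not necessarily the expression this page originally showed. The sketch assumes the goal is to translate only the English segments.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <!-- Start by marking every <seg> as not translatable -->
  <its:translateRule selector="//seg" translate="no"/>
  <!-- Then re-enable the English ones: lang('en') also matches the
       'en-US' value that the first <seg> inherits from its parent <var> -->
  <its:translateRule selector="//seg[lang('en')]" translate="yes"/>
</its:rules>
```

Because the second rule comes later, it overrides the first one for the <seg> under the en-US <var>, while the <seg> under the fr <var> remains non-translatable.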
ITS handles namespaces through prefixes in its XPath expressions. You can declare the namespace URIs and prefixes anywhere, as long as the rule is within the scope of the declaration.
For example, the following XML document is composed of elements that belong to two namespaces: the main one is "myDocumentNamespace", and there are also elements from the XHTML namespace ("html").
<myDoc xmlns="myDocumentNamespace" xmlns:h="html">
  <head>
    <docID>ABC123-456-987</docID>
  </head>
  <body>
    <para>This text is <h:b>bolded</h:b>, and this is some <h:i>text in italics</h:i>.</para>
  </body>
</myDoc>
To point to the proper nodes and specify that <docID> is not translatable, and that <b> and <i> are inline codes, you simply use XPath expressions with prefixes that are mapped to the corresponding namespace URIs:
<its:rules xmlns:its="http://www.w3.org/2005/11/its" xmlns:m="myDocumentNamespace" xmlns:h="html" version="1.0">
  <its:translateRule selector="//m:docID" translate="no"/>
  <its:withinTextRule selector="//h:*" withinText="yes"/>
</its:rules>