Okapi Shared Help: ITS (Internationalization Tag Set)
UNDER CONSTRUCTION
This section provides an introduction and a quick reference to ITS, the W3C's Internationalization Tag Set as implemented by Okapi's XML Filter. For a complete and official description of ITS, see the W3C ITS specifications.
ITS is a set of elements and attributes you can use in any XML document to specify internationalization-related aspects of that document. For example: which elements should be translated, notes for the translators, how elements affect sentence segmentation, and so on.
Each type of ITS feature is called a "data category". A tool implements one or more data categories, depending on the function it performs. The ITS features currently supported by the Okapi XML Filter are the following:
There are two ways to apply ITS information to your document: global rules and local rules.
<its:rules>
elements.
Global rules can be stored in standalone files (external global rules) that you
associate with a group of documents, or embedded inside the documents
themselves (embedded global rules).
There are two ways to associate external rules with a document:
xlink:href
attribute in an embedded
<its:rules>
element. This effectively imports the external rules
into the document at processing time. In this case the rules of the external
file behave as if they were embedded rules.
Note:
To some degree you can compare the way ITS rules
are declared to CSS styles: Global rules in standalone files are the equivalent
of a CSS file. Embedded global rules are the equivalent of the HTML
<style>
elements. And local rules are the equivalent of the HTML
style
attributes.
ITS rules are always applied in the following order:
xlink:href
). The embedded rules can be anywhere in the
document. If there is more than one <its:rules> element within the same document, they are applied in the order in which they appear.
A later rule overrides a previous one when both apply to the same parts of the document (in other words: the last rule always wins).
Example of external global rules:
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
 <its:translateRule selector="//fexp" translate="no"/>
 <its:withinTextRule selector="//fexp|//strong" withinText="yes"/>
</its:rules>
Example of a document with embedded global rules (the <its:rules> element) and local rules (the its:translate attribute):
<book xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
 <head>
  <its:rules>
   <its:translateRule ... />
  </its:rules>
  <title>The Life of a Simple Man</title>
 </head>
 <body>
  <p>Everything started when Zebulon discovered that he had a <fexp>doppelgänger</fexp> who was a serious baseball <fexp its:translate="yes">aficionado</fexp>.</p>
 </body>
</book>
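The override order can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how the Okapi XML Filter is implemented: it relies on the limited XPath support of the standard library (so `//fexp` is written `.//fexp`), the `resolve_translate` helper and the `GLOBAL_RULES` list are hypothetical names, the first rule in the list exists only to show a later rule winning, and the precedence of local attributes over global rules is taken from the ITS specification.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

DOC = """<book xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
 <body>
  <p>He had a <fexp>doppelgänger</fexp> who was a serious baseball
  <fexp its:translate="yes">aficionado</fexp>.</p>
 </body>
</book>"""

# Global rules as (selector, translate) pairs, in the order they are declared.
# The first rule is hypothetical: it is there only so the second can override it.
GLOBAL_RULES = [
    (".//fexp", "yes"),
    (".//fexp", "no"),
]

def resolve_translate(root, rules):
    """Return {element: "yes" or "no"} for every element under root."""
    explicit = {}
    # Step 1: global rules in order; a later match overwrites an earlier one.
    for selector, value in rules:
        for elem in root.findall(selector):
            explicit[elem] = value
    # Step 2: a local its:translate attribute overrides any global rule.
    for elem in root.iter():
        local = elem.get("{" + ITS_NS + "}translate")
        if local is not None:
            explicit[elem] = local
    # Step 3: elements with no explicit value inherit from their parent;
    # the root starts from the default, which is translatable.
    resolved = {}

    def walk(elem, inherited):
        value = explicit.get(elem, inherited)
        resolved[elem] = value
        for child in elem:
            walk(child, value)

    walk(root, "yes")
    return resolved

root = ET.fromstring(DOC)
flags = resolve_translate(root, GLOBAL_RULES)
fexps = root.findall(".//fexp")
```

With this input, the first `<fexp>` ends up not translatable (the second global rule wins over the first), while the second `<fexp>` stays translatable because its local attribute overrides the global rule.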
<translateRule>
Required attributes:
selector | Absolute XPath expression that indicates which nodes are affected by the rule.
translate | "yes" if the selected nodes should be translated, "no" if they should not be translated. The property applies to the child elements of the selected nodes, but not to their attributes. By default, elements are translatable and attributes are not.
Optional attributes:
None
For example, in the following XML document, the <docID> element is declared as not translatable, and the alt attribute is declared as translatable.
<myDoc>
 <head>
  <docID>ABC123-456-987</docID>
  <title>Title</title>
  <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
   <its:translateRule selector="//docID" translate="no"/>
   <its:translateRule selector="//@alt" translate="yes"/>
  </its:rules>
 </head>
 <body>
  <para>Look at this picture: <img href="e1.png" alt="Elephants in the river"/></para>
 </body>
</myDoc>
TODO
The Terminology data category is implemented, but no Okapi tool takes advantage of it yet.
TODO
TODO
TODO
In some documents you may have the same text in different languages. Usually
the language information in XML is specified with the xml:lang
attribute. The XPath lang()
function lets you test whether the language of a given node matches a language code. This function is case-insensitive and takes into
account the inheritance of xml:lang
.
So, for example, the expression selector="//seg[lang('en-us')]"
will match the first <seg>
in:
<item>
 <var xml:lang='en-US'>
  <seg>Text</seg>
 </var>
 <var xml:lang='fr'>
  <seg>Texte</seg>
 </var>
</item>
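The matching behavior of lang() can be illustrated with a small Python sketch. The standard library's XPath subset does not provide lang(), so the hypothetical helpers `lang_matches` and `select_by_lang` below imitate it by hand: the comparison ignores case, a code such as "en" also matches "en-US", and xml:lang is inherited from the nearest ancestor that declares it. This is an approximation for illustration, not the code used by the Okapi XML Filter.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOC = """<item>
 <var xml:lang='en-US'><seg>Text</seg></var>
 <var xml:lang='fr'><seg>Texte</seg></var>
</item>"""

def lang_matches(declared, wanted):
    """Imitate XPath lang(): ignore case and accept a sub-tag suffix."""
    declared, wanted = declared.lower(), wanted.lower()
    return declared == wanted or declared.startswith(wanted + "-")

def select_by_lang(root, tag, wanted):
    """Collect elements named `tag` whose inherited xml:lang matches `wanted`."""
    found = []

    def walk(elem, inherited):
        # An element without its own xml:lang uses the value of its parent.
        current = elem.get(XML_LANG, inherited)
        if elem.tag == tag and current is not None and lang_matches(current, wanted):
            found.append(elem)
        for child in elem:
            walk(child, current)

    walk(root, None)
    return found

root = ET.fromstring(DOC)
english = [seg.text for seg in select_by_lang(root, "seg", "en-us")]
```

Here `english` contains only the text of the first `<seg>`, which inherits en-US from its parent `<var>`.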
ITS can handle namespaces by using prefixes in its XPath expressions. You can declare the namespace URIs and prefixes anywhere as long as the rule is in their scope.
For example, the following XML document is composed of elements that belong to two namespaces: the main one is "myDocumentNamespace", and there are also elements from the XHTML namespace ("html").
<myDoc xmlns="myDocumentNamespace" xmlns:h="html">
 <head>
  <docID>ABC123-456-987</docID>
 </head>
 <body>
  <para>This text is <h:b>bolded</h:b>, and this is some <h:i>text in italics</h:i>.</para>
 </body>
</myDoc>
In order to point to the proper nodes to specify that <docID> is not translatable, and that <b> and <i> are inline codes, you simply use XPath expressions with prefixes that are mapped to the proper namespaces.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" xmlns:m="myDocumentNamespace" xmlns:h="html" version="1.0">
 <its:translateRule selector="//m:docID" translate="no"/>
 <its:withinTextRule selector="//h:*" withinText="yes"/>
</its:rules>
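The role of the prefix mapping can be checked with a short Python sketch that uses only the standard library. The `PREFIXES` dictionary is a hypothetical stand-in for the xmlns:m and xmlns:h declarations on the rules element, and the namespace wildcard is emulated with a tag-name test rather than an XPath expression. It is an illustration of how prefixed selectors resolve, not a description of the Okapi XML Filter internals.

```python
import xml.etree.ElementTree as ET

DOC = """<myDoc xmlns="myDocumentNamespace" xmlns:h="html">
 <head><docID>ABC123-456-987</docID></head>
 <body><para>This text is <h:b>bolded</h:b>, and this is some
 <h:i>text in italics</h:i>.</para></body>
</myDoc>"""

# Prefix-to-URI map. The prefixes do not have to be the ones used in the
# document; only the namespace URIs have to be the same.
PREFIXES = {"m": "myDocumentNamespace", "h": "html"}

root = ET.fromstring(DOC)

# Equivalent of selector="//m:docID".
doc_ids = root.findall(".//m:docID", PREFIXES)

# Rough equivalent of selector="//h:*": every element in the "html" namespace.
inline = [e for e in root.iter() if e.tag.startswith("{" + PREFIXES["h"] + "}")]

# Without a prefix the selector looks for docID in no namespace and finds nothing.
unprefixed = root.findall(".//docID")
```

The unprefixed lookup returns an empty list even though the document declares a default namespace, which is why the rules need a prefix such as m for the main namespace.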