Okapi Help

UNDER CONSTRUCTION

Overview

This section provides an introduction and a quick reference to ITS, the W3C's Internationalization Tag Set, as implemented by Okapi's XML Filter. For a complete and official description of ITS, see the W3C ITS specifications.

ITS is a set of elements and attributes you can use in any XML document to specify different internationalization-related aspects of your documents, for example: which elements should be translated, notes for the translators, or how elements should affect sentence segmentation.

Each type of ITS feature is called a "data category". A tool implements one or more data categories, depending on the function it performs. The ITS features currently supported by the Okapi XML Filter are the following:

Translate
Localization Note
Terminology
Element Within Text

Types of ITS Rules

There are two ways to apply ITS information to your document: global rules and local rules.

Global rules are ITS rule elements grouped inside <its:rules> elements. Global rules can be stored in standalone files (external global rules) that you associate with a group of documents, or embedded inside the documents themselves (embedded global rules).
Local rules are ITS attributes (and, more rarely, elements) that are specified within the document itself, on the elements they affect.

There are two ways to associate external rules with a document:

By tool-specific means. A given tool can have a mechanism to associate an external standalone ITS file with a set of documents to process. This allows you to apply ITS to documents without modifying anything inside these documents.
Or by using the xlink:href attribute in an embedded <its:rules> element. This essentially imports the external rules into the document at processing time. In this case the rules of the external file behave as if they were embedded rules.
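
As a minimal sketch of the second option, the embedded <its:rules> element below links to an external rules file. The file name myRules.xml and the <myDoc> vocabulary are hypothetical; the xlink prefix must be bound to the XLink namespace.

```xml
<myDoc>
 <head>
  <!-- Embedded rules that import a standalone ITS file.
       "myRules.xml" is a hypothetical file name. -->
  <its:rules xmlns:its="http://www.w3.org/2005/11/its"
   xmlns:xlink="http://www.w3.org/1999/xlink"
   version="1.0"
   xlink:href="myRules.xml"/>
 </head>
 <body>
  <para>Some text.</para>
 </body>
</myDoc>
```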

Note:

To some degree you can compare the way ITS rules are declared to CSS styles: Global rules in standalone files are the equivalent of a CSS file. Embedded global rules are the equivalent of the HTML <style> elements. And local rules are the equivalent of the HTML style attributes.

ITS rules are always applied in the following order:

The global rules in the standalone files (if any are associated with the document).
The global rules embedded in the document (directly or imported using xlink:href). The embedded rules can be anywhere in the document. If there is more than one <its:rules> element within the same document, they are applied in the order in which the <its:rules> elements appear.
The local rules.

A later rule overrides a previous one when both apply to the same parts of the document (in other words: the last rule always wins).
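
To illustrate the override behavior, here is a small sketch with two rules. The <code> element and its type attribute are hypothetical, chosen only for illustration. The first rule makes every <code> element non-translatable; the second, which comes later, makes the ones marked as labels translatable again.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
 version="1.0">
 <!-- Applied first: no <code> element is translatable -->
 <its:translateRule selector="//code" translate="no"/>
 <!-- Applied second: overrides the rule above for the nodes
      that both rules select -->
 <its:translateRule selector="//code[@type='label']" translate="yes"/>
</its:rules>
```

Swapping the two rules would leave every <code> element non-translatable, because the more general rule would then be the last one to apply.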

Example of external global rules:

<its:rules xmlns:its="http://www.w3.org/2005/11/its"
 version="1.0">
 <its:translateRule selector="//fexp" translate="no"/>
 <its:withinTextRule selector="//fexp|//strong" withinText="yes"/>
</its:rules>

Example of a document with embedded global rules (the <its:rules> element) and a local rule (the its:translate attribute):

<book xmlns:its="http://www.w3.org/2005/11/its"
 its:version="1.0">
 <head>
  <its:rules>
   <its:translateRule ... />
  </its:rules>
  <title>The Life of a Simple Man</title>
 </head>
 <body>
  <p>Everything started when Zebulon discovered that he had
a <fexp>doppelgänger</fexp> who was a serious
baseball <fexp its:translate="yes">aficionado</fexp>.</p>
 </body>
</book>

ITS Markup

Translate

`<translateRule>`
	Required Attributes:
		`selector`	Absolute XPath expression that indicates what nodes should be affected by the given rule.
		`translate`	"yes" if the nodes selected should be translated, "no" if they should not be translated. The property applies to the children elements of the selected nodes, but not to the attributes. By default elements are translatable and attributes are not.
	Optional Attributes:
		None

For example, in the following XML document, the <docID> element is declared as not translatable, and the alt attribute is declared as translatable.

<myDoc>
 <head>
  <docID>ABC123-456-987</docID>
  <title>Title</title>
  <its:rules xmlns:its="http://www.w3.org/2005/11/its"
 version="1.0">
   <its:translateRule selector="//docID" translate="no"/>
   <its:translateRule selector="//@alt" translate="yes"/>
  </its:rules>
 </head>
 <body>
  <para>Look at this picture:
<img href="e1.png" alt="Elephants in the river"/></para>
 </body>
</myDoc>

Localization Note

TODO
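
As a rough sketch based on the W3C ITS 1.0 specification (not a description of Okapi-specific behavior), a global localization note is declared with <its:locNoteRule>, and a local one with the its:locNote attribute. The <messages> and <msg> elements and the note texts are hypothetical.

```xml
<messages xmlns:its="http://www.w3.org/2005/11/its"
 its:version="1.0">
 <its:rules version="1.0">
  <!-- Global rule: attach the same note to every <msg> element -->
  <its:locNoteRule selector="//msg" locNoteType="description">
   <its:locNote>Keep the text short: it is shown in a status bar.</its:locNote>
  </its:locNoteRule>
 </its:rules>
 <msg>File saved</msg>
 <!-- Local markup: overrides the global note for this element -->
 <msg its:locNote="%1 is a file name."
  its:locNoteType="alert">Cannot open %1</msg>
</messages>
```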

Terminology

The Terminology data category is implemented, but no Okapi tool takes advantage of it for now.

Element Within Text

TODO
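
A minimal sketch, following the W3C ITS 1.0 specification: <its:withinTextRule> marks elements that are part of the flow of text rather than separate blocks. The <b> and <i> elements are hypothetical.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
 version="1.0">
 <!-- <b> and <i> are inline: they stay inside the surrounding text -->
 <its:withinTextRule selector="//b|//i" withinText="yes"/>
</its:rules>
```

With such a rule, a paragraph like <para>Press <b>Enter</b> to continue.</para> would be expected to be handled as a single unit of text with <b> as an inline code, rather than being split into separate pieces.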

How To...

How to set an attribute as translatable

TODO
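
A minimal sketch, reusing the approach shown in the Translate section: select the attribute with an XPath expression and set translate to "yes". The <img> element and the alt and title attributes are only examples.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
 version="1.0">
 <!-- Every alt attribute, whatever element carries it -->
 <its:translateRule selector="//@alt" translate="yes"/>
 <!-- The title attribute, but only on <img> elements -->
 <its:translateRule selector="//img/@title" translate="yes"/>
</its:rules>
```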

How to set ITS properties depending on context

TODO
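
One possible approach, sketched here under the assumption that selectors accept full XPath 1.0 expressions, is to use paths and predicates so that a rule applies only in a given context. All element and attribute names below are hypothetical.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
 version="1.0">
 <!-- <title> is non-translatable only when it is a child of <head> -->
 <its:translateRule selector="//head/title" translate="no"/>
 <!-- <p> is non-translatable when its type attribute is "code" -->
 <its:translateRule selector="//p[@type='code']" translate="no"/>
 <!-- <term> is non-translatable anywhere inside a <glossary> -->
 <its:translateRule selector="//glossary//term" translate="no"/>
</its:rules>
```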

How to deal with language conditions?

In some documents you may have the same text in different languages. Usually the language information in XML is specified with the xml:lang attribute. There is an XPath lang() function that allows you to test the language of a given node. This function is case-insensitive and takes into account the inheritance of xml:lang.

So, for example, the expression selector="//seg[lang('en-us')]" will match the first <seg> in:

<item>
 <var xml:lang='en-US'>
  <seg>Text</seg>
 </var>
 <var xml:lang='fr'>
  <seg>Texte</seg>
 </var>
</item>
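
Building on this, a sketch of rules that make only the English segments translatable could look like the following. It assumes the <seg> structure shown above and relies on the fact that lang('en') also matches sub-languages such as en-US.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
 version="1.0">
 <!-- First, no <seg> element is translatable -->
 <its:translateRule selector="//seg" translate="no"/>
 <!-- Then, re-enable the segments whose language is English;
      xml:lang is inherited from the parent <var> -->
 <its:translateRule selector="//seg[lang('en')]" translate="yes"/>
</its:rules>
```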

How to work with namespaces?

ITS can handle namespaces by using prefixes in its XPath expressions. You can declare the namespace URI and prefixes anywhere as long as the rule is in its scope.

For example, the following XML document is composed of elements that belong to two namespaces: the main one is "myDocumentNamespace", and there are also elements from the XHTML namespace ("html").

<myDoc xmlns="myDocumentNamespace"
 xmlns:h="html">
 <head>
  <docID>ABC123-456-987</docID>
 </head>
 <body>
  <para>This text is <h:b>bolded</h:b>, and this is
some <h:i>text in italics</h:i>.</para>
 </body>
</myDoc>

To point to the proper nodes, so that <docID> is declared as not translatable and <b> and <i> as inline codes, you simply use XPath expressions with prefixes that are mapped to the proper namespaces.

<its:rules xmlns:its="http://www.w3.org/2005/11/its"
 xmlns:m="myDocumentNamespace"
 xmlns:h="html"
 version="1.0">
 <its:translateRule selector="//m:docID" translate="no"/>
 <its:withinTextRule selector="//h:*" withinText="yes"/>
</its:rules>
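
Because a namespace declaration only needs to be in scope for the rule that uses it, the prefix could also be declared on the rule element itself. The following is a sketch of that variant for the same hypothetical "myDocumentNamespace" URI; the prefix name m is arbitrary and does not have to match any prefix used in the document.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
 version="1.0">
 <!-- The m prefix is declared only for this rule -->
 <its:translateRule xmlns:m="myDocumentNamespace"
  selector="//m:docID" translate="no"/>
</its:rules>
```

Note that an unprefixed name such as //docID would not match here: in XPath 1.0, a name without a prefix always refers to an element in no namespace, even when the document uses a default namespace.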
