Okapi Components
TMX Language Duplicates Splitting Utility
- The utility set identifier for this utility is: oku_set04
- The utility identifier is: tmxsplittingdup
The TMX Language Duplicates Splitting utility lets you split <tuv> elements that are of the same language inside a given <tu> element into separate <tu> elements.
For example, if you have the following TMX file:
<tmx version="1.4">
 <header creationtool="XYZ" creationtoolversion="1.0" datatype="plaintext" segtype="sentence" adminlang="en-US" srclang="en-US" o-tmf="WXYTool">
 </header>
 <body>
  <tu tuid="1">
   <tuv xml:lang="en-US">
    <seg>Efficiency is intelligent laziness.</seg>
   </tuv>
   <tuv xml:lang="fr-FR">
    <seg>L'efficacité c'est la paresse intelligente.</seg>
   </tuv>
   <tuv xml:lang="es-ES">
    <seg>La eficacia es la holgazanería inteligente.</seg>
   </tuv>
   <tuv xml:lang="fr-FR">
    <seg>L'efficacité est la fille de la paresse.</seg>
   </tuv>
  </tu>
  <tu tuid="2" srclang="fr">
   <tuv xml:lang="en">
    <seg>Item text</seg>
   </tuv>
   <tuv xml:lang="fr">
    <seg>Texte de l'article</seg>
   </tuv>
   <tuv xml:lang="en">
    <seg>Text of the article</seg>
   </tuv>
  </tu>
 </body>
</tmx>
The utility will generate the following output:
<tmx version="1.4">
 <header creationtool="XYZ" creationtoolversion="1.0" datatype="plaintext" segtype="sentence" adminlang="en-US" srclang="en-US" o-tmf="WXYTool">
 </header>
 <body>
  <tu tuid="1">
   <tuv xml:lang="en-US">
    <seg>Efficiency is intelligent laziness.</seg>
   </tuv>
   <tuv xml:lang="fr-FR">
    <seg>L'efficacité c'est la paresse intelligente.</seg>
   </tuv>
   <tuv xml:lang="es-ES">
    <seg>La eficacia es la holgazanería inteligente.</seg>
   </tuv>
  </tu>
  <tu tuid="2" srclang="fr">
   <tuv xml:lang="en">
    <seg>Item text</seg>
   </tuv>
   <tuv xml:lang="fr">
    <seg>Texte de l'article</seg>
   </tuv>
  </tu>
  <tu tuid="1">
   <tuv xml:lang="en-US">
    <seg>Efficiency is intelligent laziness.</seg>
   </tuv>
   <tuv xml:lang="fr-FR">
    <seg>L'efficacité est la fille de la paresse.</seg>
   </tuv>
  </tu>
  <tu tuid="2" srclang="fr">
   <tuv xml:lang="fr">
    <seg>Texte de l'article</seg>
   </tuv>
   <tuv xml:lang="en">
    <seg>Text of the article</seg>
   </tuv>
  </tu>
 </body>
</tmx>
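The transformation shown in this example can be approximated with a short Python sketch. It is not the utility's actual implementation: the function name split_language_duplicates is hypothetical, and the rules it applies (exact string comparison of xml:lang values, the first <tuv> in the source language used as the source, new <tu> elements appended at the end of <body>) are assumptions inferred from the sample input and output above.

```python
import copy
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def split_language_duplicates(tmx_text):
    """Return a TMX tree in which no <tu> holds two <tuv> of one language.

    Hypothetical helper; the rules are inferred from the example above.
    """
    root = ET.fromstring(tmx_text)
    header = root.find("header")
    default_src = header.get("srclang") if header is not None else None
    body = root.find("body")
    new_units = []

    for tu in body.findall("tu"):
        tuvs = tu.findall("tuv")
        if len(tuvs) <= 2:
            continue  # assumed: two variants are one source and one target
        # A srclang on the <tu> overrides the one declared in the header.
        src_lang = tu.get("srclang", default_src)
        source = next((v for v in tuvs if v.get(XML_LANG) == src_lang), None)
        seen = set()
        for tuv in tuvs:
            lang = tuv.get(XML_LANG)
            if lang not in seen:
                seen.add(lang)
                continue
            # Duplicate language: move it to a new <tu> with the same
            # attributes (including tuid), preceded by a copy of the source.
            tu.remove(tuv)
            new_tu = ET.Element("tu", dict(tu.attrib))
            if source is not None:
                new_tu.append(copy.deepcopy(source))
            new_tu.append(tuv)
            new_units.append(new_tu)

    # Assumed placement: new units go after all the original ones.
    body.extend(new_units)
    return root
```

Calling ET.tostring(split_language_duplicates(text), encoding="unicode") on the sample input should give a document with the same four <tu> elements as the output above, apart from whitespace.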
Some things to keep in mind when using this utility:
- When a <tu> element contains only two <tuv> elements, there are never duplicates (because the utility assumes one is in the source language, while the other is in a target language).
- [...] <tuv> element has been found.
- The new <tu> elements have the same tuid attribute as the original <tu>.
- The new <tu> elements may be located anywhere in the file (not necessarily near the original <tu>).
- The new <tu> elements contain only the source <tuv> and the duplicated language <tuv> of the original <tu>.

The common parameters are the options specified from the application calling the utility rather than in the options dialog box of the utility itself. For this utility, you need to specify the following common parameters:
Files of the first input list | Needed (the TMX files to process)
Root for the first input list | Not needed
Files of the second input list | Not needed
Root for the second input list | Not needed
Files of the third input list | Not needed
Root for the third input list | Not needed
Input language | Not needed
Output language | Not needed
Input default encoding | Not needed
Output default encoding | Not needed
Location and names for output files | Needed
This utility has no options.