- Overview
- Filter Properties
- Processing Details
- Credits

Overview

The Trados Text Filter is an Okapi component that implements the Okapi Filter Interface for Trados Text translation memory files. See the Trados Web site for more information on the Trados tools.

The following is an example of a very simple Trados Text TM file. The source text is marked in blue bold, and the target text is marked in green bold.

Not implemented yet: The filter currently handles only codes in the internal style. RTF-only codes such as \b, and other RTF objects in a segment, are not supported and are stripped out.

<RTF Preamble>
<FontTable>
{\fonttbl 
{\f1 \fmodern\fprq1 \fcharset0 Courier New;}
{\f2 \fswiss\fprq2 \fcharset0 Arial;}}
<StyleSheet>
{\stylesheet 
{\St \s0 {\StN Normal}}
{\St \cs1 {\StB \v\f1\fs24\sub\cf12 }{\StN tw4winMark}}
{\St \cs2 {\StB \cf4\fs40\f1 }{\StN tw4winError}}
{\St \cs3 {\StB \f1\cf11\lang1024 }{\StN tw4winPopup}}
{\St \cs4 {\StB \f1\cf10\lang1024 }{\StN tw4winJump}}
{\St \cs5 {\StB \f1\cf15\lang1024 }{\StN tw4winExternal}}
{\St \cs6 {\StB \f1\cf6\lang1024 }{\StN tw4winInternal}}
{\St \cs7 {\StB \cf2 }{\StN tw4winTerm}}
{\St \cs8 {\StB \f1\cf13\lang1024 }{\StN DO_NOT_TRANSLATE}}}
</RTF Preamble>
<TrU>
<CrD>18042005, 15:05:27
<CrU>DC
<ChD>18042005, 15:05:27
<ChU>BJ
<UsC>13
<Att L=Component>HD
<Seg L=EN-US>Some text in {\cs6\f1\cf6\lang1024 <b>}bold{\cs6\f1\cf6\lang1024 </b>}.
<Seg L=FR-FR>Du texte en {\cs6\f1\cf6\lang1024 <b>}gras{\cs6\f1\cf6\lang1024 </b>}.
</TrU>
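
As an illustration of the entry layout shown above, here is a minimal Python sketch that splits one <TrU> block into its fields and keeps only the inline code text of internal-style runs. It is not the filter's implementation: the names parse_tru, FIELD and INTERNAL are hypothetical, and the pattern assumes the internal style is written as {\cs6...} exactly as in the example.

```python
import re

# Hypothetical helper, not part of Okapi.
# A field line looks like <Tag>value or <Tag L=Label>value.
FIELD = re.compile(r"^<(\w+)(?: L=([^>]+))?>(.*)$")
# Internal-style run, e.g. {\cs6\f1\cf6\lang1024 <b>}; group 1 is the code text.
INTERNAL = re.compile(r"\{\\cs6[^ }]* ([^}]*)\}")

def parse_tru(block):
    """Return a list of (tag, label, value) tuples for one <TrU> block."""
    fields = []
    for line in block.splitlines():
        m = FIELD.match(line)
        # Skip the <TrU> opener and anything that is not a field (e.g. </TrU>).
        if m is None or m.group(1) == "TrU":
            continue
        tag, label, value = m.groups()
        if tag == "Seg":
            # Drop the RTF style wrapper, keep the inline code itself.
            value = INTERNAL.sub(r"\1", value)
        fields.append((tag, label, value))
    return fields
```

Applied to the entry above, the EN-US segment would come back as the tuple ("Seg", "EN-US", "Some text in <b>bold</b>.").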

Filter Properties

The properties for the Trados Text Filter are the following:

Property This Filter
INPUTFILE Yes
INPUTSTRING No
BILINGUALINPUT Yes
TEXTBASED Yes
OUTPUTFILE Yes

OUTPUTSTRING No
ANCILLARYOUTPUT No
XMLOUTPUT No
RTFOUTPUT No
USEKEY No
ISINDEMOMODE No

Processing Details

Input Encoding

If UTF-8 or UTF-16 has been auto-detected as the encoding of the input file, that encoding is used; otherwise, the encoding specified by the user is used.

Be careful when selecting the input encoding: Trados Text files are bilingual, and the correct encoding of the input file is often the one corresponding to the target/output language.
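
The selection rule can be sketched in a few lines of Python. This is only an illustration: the page does not say how the auto-detection is done, so the sketch assumes it relies on a byte-order mark, and choose_input_encoding is a hypothetical name.

```python
import codecs

def choose_input_encoding(raw, user_encoding):
    """Return the auto-detected Unicode encoding, else the user's choice.

    Assumption: detection is based on a byte-order mark at the start of
    the file; the real filter may use a different method.
    """
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        # Python's "utf-16" codec reads the BOM to pick the byte order.
        return "utf-16"
    return user_encoding
```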

The translation unit entries are in RTF format. RTF provides different ways to write out extended characters:

- as raw characters
- as hexadecimal escapes (\'hh)
- as Unicode escapes (\uN)

The last two forms can be read without problems, but the raw character form requires switching encodings as the file is read, depending on the font. The filter currently does not support this: raw characters are read using the encoding specified by the user. Note that in the Text TM files for Trados 7 the raw characters are in UTF-8, so those files do not have the issue of multiple encodings within the same file.
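
To make the two escaped forms concrete, here is a minimal Python sketch that decodes \'hh hexadecimal escapes and \uN Unicode escapes. It is not the filter's code. It assumes a single-byte code page (cp1252 here), assumes that each \uN is followed by at most one fallback character (the RTF default of \uc1), and does not recombine surrogate pairs. The name decode_rtf_escapes is hypothetical.

```python
import re

# Either a \'hh hexadecimal escape, or a \uN Unicode escape followed by an
# optional fallback (a \'hh escape or one plain character), as with \uc1.
ESCAPE = re.compile(
    r"\\'([0-9a-fA-F]{2})"
    r"|\\u(-?\d+) ?(?:\\'[0-9a-fA-F]{2}|[^\\{}])?"
)

def decode_rtf_escapes(text, codepage="cp1252"):
    """Replace RTF character escapes in text with the characters they stand for."""
    def repl(m):
        if m.group(1) is not None:
            # Hexadecimal form: one byte in the assumed code page.
            return bytes([int(m.group(1), 16)]).decode(codepage)
        # Unicode form: N is a signed 16-bit value, so negatives wrap around.
        n = int(m.group(2))
        return chr(n + 65536 if n < 0 else n)
    return ESCAPE.sub(repl, text)
```

For example, both caf\'e9 and caf\u233\'e9 decode to the same four-letter word ending in U+00E9, because the fallback after \u233 is skipped.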

Output Encoding

The encoding of the output is the one specified by the user, except when UTF-8 or UTF-16 has been auto-detected as the input encoding. In that case, the output encoding used is the same as the input encoding.

When writing out extended characters, the filter uses the latest RTF syntax, which combines the hexadecimal and the Unicode escape mechanisms. This allows faster and safer reading. Each entry starts with a \ucN command to reset the number of hexadecimal characters to skip when reading Unicode values. This is necessary because of a bug in the way all versions of Trados before version 7 read Text TMs: the \ucN value is not reset to its normal default, so Trados ends up reading both escape forms, which results in duplicated extended characters throughout the imported TM.
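
The following Python sketch shows what such output could look like. It is an illustration under assumptions, not the filter's writer: it assumes N is 1 in \ucN, assumes a single-byte code page (cp1252 here) for the hexadecimal fallback, and writes "?" as the fallback when the character has no equivalent in that code page. The name encode_rtf_text is hypothetical.

```python
def encode_rtf_text(text, codepage="cp1252"):
    """Write text as RTF, escaping extended characters as \\uN plus a fallback."""
    # Assumption: reset the skip count to 1 at the start of each entry.
    out = ["\\uc1 "]
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            legacy = ch.encode(codepage, errors="replace")
            # With \uc1 exactly one fallback character may follow each \uN.
            fallback = "\\'%02x" % legacy[0] if len(legacy) == 1 and legacy != b"?" else "?"
            # \uN takes UTF-16 code units written as signed 16-bit numbers.
            data = ch.encode("utf-16-be")
            for i in range(0, len(data), 2):
                n = int.from_bytes(data[i:i + 2], "big")
                if n > 32767:
                    n -= 65536
                out.append("\\u%d%s" % (n, fallback))
    return "".join(out)
```

A reader that honors \uc1 takes the \uN value and skips the fallback; a reader that ignores \uc1, as described above for Trados before version 7, would otherwise read both forms.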

Input and Output Languages

The user-specified source language code is checked against the language of the first <Seg> found, and the user-specified target language code is checked against the language of the second <Seg> found. In each case, a warning is generated if the two are not identical, but the process continues.
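
A minimal Python sketch of this check follows. The function name and the message wording are hypothetical, and the comparison is an exact string match because the text only says "identical"; whether the real filter ignores case is not stated.

```python
def check_segment_languages(seg_languages, source_language, target_language):
    """Return warning messages; an empty list means both codes matched.

    seg_languages holds the L= values of the first and second <Seg>.
    """
    messages = []
    expected = [("source", source_language), ("target", target_language)]
    for (role, wanted), found in zip(expected, seg_languages):
        if wanted != found:
            # Warn only; processing is expected to continue.
            messages.append(
                "%s language is %s but the TM uses %s" % (role, wanted, found)
            )
    return messages
```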

Date and Time

The date and time information in Trados Text TMs is in local time. Ideally, the filter should convert the TM date/time to UTC. However, because the entries cover dates that can fall in both daylight saving and non-daylight saving periods, using a single time difference may result in erroneous data anyway. In addition, as TMs circulate between translators, the entries may be in different local times. Since there is no certainty that a conversion would be accurate, the filter does not make any.

As a result, the filter simply assumes the date and time information is in UTC.
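
In practice this means the value is only reformatted, never shifted. A minimal Python sketch, assuming the TM fields use the day-month-year layout seen in the example entry ("18042005, 15:05:27") and that the result should follow the "yyyyMMddTHHmmssZ" format used for the date properties; to_property_date is a hypothetical name:

```python
from datetime import datetime

def to_property_date(trados_value):
    """Reformat a <CrD>/<ChD>-style value without any time zone shift."""
    # Assumed input layout: ddMMyyyy, HH:mm:ss (as in the example entry).
    parsed = datetime.strptime(trados_value, "%d%m%Y, %H:%M:%S")
    # The trailing Z labels the value as UTC; no offset is applied.
    return parsed.strftime("%Y%m%dT%H%M%SZ")
```

With the example entry, "18042005, 15:05:27" becomes "20050418T150527Z".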

Item Properties

This filter implements support for a few item properties that give you access to all the information set for each translation unit entry. These properties are:

CreationDate This corresponds to the <CrD> field. The property value is in the format "yyyyMMddTHHmmssZ".
CreationUser This corresponds to the <CrU> field.
ChangeDate This corresponds to the <ChD> field. The property value is in the format "yyyyMMddTHHmmssZ".
ChangeUser This corresponds to the <ChU> field.
UsageCount This corresponds to the <UsC> field.
UsageDate This corresponds to the <UsD> field. The property value is in the format "yyyyMMddTHHmmssZ".
@AName Attribute with pick-list, with Name the name of the attribute. This corresponds to the <Att L=Name> fields.
@TName Attribute with text value, with Name the name of the attribute. This corresponds to the <Txt L=Name> fields.

Note that not all of these fields are available for every entry. The IFilterItem method GetProperty() returns null when the property does not exist. You can use the ListProperties() method to get a semicolon-delimited list of all properties in the current filter item.
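
The naming scheme for these properties can be sketched in Python as follows. This is a stand-in that mimics the mapping described above, not the IFilterItem API: build_properties and list_properties are hypothetical names, the input is assumed to be a list of (tag, label, value) tuples, and the date values are passed through without the reformatting the filter applies.

```python
# Field tag in the TM file -> item property name, as listed above.
FIELD_TO_PROPERTY = {
    "CrD": "CreationDate",
    "CrU": "CreationUser",
    "ChD": "ChangeDate",
    "ChU": "ChangeUser",
    "UsC": "UsageCount",
    "UsD": "UsageDate",
}

def build_properties(fields):
    """Map (tag, label, value) tuples to a property dictionary."""
    props = {}
    for tag, label, value in fields:
        if tag == "Att":
            props["@A" + label] = value   # attribute with pick-list
        elif tag == "Txt":
            props["@T" + label] = value   # attribute with text value
        elif tag in FIELD_TO_PROPERTY:
            props[FIELD_TO_PROPERTY[tag]] = value
    return props

def list_properties(props):
    """Return the property names as a semicolon-delimited list."""
    return ";".join(props)
```

Looking up a missing property with props.get("UsageDate") returns None, which plays the role of the null returned by GetProperty().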

TODO: Recognize the restype, resname, and flag attributes.

Credits

Special thanks to Gerrit Sanders and Jean-Christophe Helary for their help with Trados version 7.