- Overview
- Filter Properties
- Processing Details
- Credits

Overview

The Wordfast Filter is an Okapi component that implements the Okapi Filter Interface for Wordfast translation memory files. See the Wordfast Web site for more information on Wordfast.

The filter supports Wordfast TM version 5, as described in the Wordfast manual.

The following is an example of a very simple Wordfast TM file. In each entry, the source text (Item 1, Item 2) follows the source language code EN-US, and the target text (Article 1, Article 2) follows the target language code FR-FR. The marker <tab> represents a tab character.

%20051224~235959<tab>%User ID,YS,YS Yves,<tab>%TU=00000000<tab>%EN-US<tab>%Wordfast TM v 5.0/00<tab>%FR-FR<tab>%---45399785.
20051224~235959<tab>YS<tab>1<tab>EN-US<tab>Item 1<tab>FR-FR<tab>Article 1
20051224~235959<tab>YS<tab>1<tab>EN-US<tab>Item 2<tab>FR-FR<tab>Article 2
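To make the layout of the sample concrete, here is a minimal Python sketch that splits one entry line into its first seven fields. It is an illustration only, not the filter's own code: the names `TmEntry`, `parse_entry`, and `is_header` are hypothetical, and the field order is inferred from the sample above (date, user, usage count, source language, source text, target language, target text).

```python
from typing import NamedTuple


class TmEntry(NamedTuple):
    """First seven fields of a TM entry, in the order seen in the sample."""
    date: str
    user: str
    usage_count: int
    source_lang: str
    source_text: str
    target_lang: str
    target_text: str


def is_header(line: str) -> bool:
    # Assumption based on the sample: the header line starts with '%'.
    return line.startswith("%")


def parse_entry(line: str) -> TmEntry:
    """Split one tab-delimited entry line into its first seven fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 7:
        raise ValueError(f"expected at least 7 fields, got {len(fields)}")
    date, user, count, src_lang, src, trg_lang, trg = fields[:7]
    return TmEntry(date, user, int(count), src_lang, src, trg_lang, trg)


entry = parse_entry("20051224~235959\tYS\t1\tEN-US\tItem 1\tFR-FR\tArticle 1")
print(entry.source_text, "->", entry.target_text)
```

Any fields after the seventh are ignored by this sketch.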

Filter Properties

The properties for the Wordfast Filter are the following:

Property          This Filter
INPUTFILE         Yes
INPUTSTRING       No
BILINGUALINPUT    Yes
TEXTBASED         Yes
OUTPUTFILE        Yes
OUTPUTSTRING      No
ANCILLARYOUTPUT   No
XMLOUTPUT         No
RTFOUTPUT         No
USEKEY            No
ISINDEMOMODE      No

Processing Details

Input Encoding

If UTF-8 or UTF-16 is auto-detected as the encoding of the input file, that encoding is used; otherwise, the encoding specified by the user is used. (Note that Wordfast files are normally not in UTF-8.)

Be careful when selecting the input encoding: Wordfast files are bilingual files, and often the correct encoding of the input file is the one corresponding to the target/output language.
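The selection rule can be sketched as follows. This is a hedged illustration in Python, not the filter's implementation: it assumes that auto-detection means looking for a byte-order mark at the start of the file, which this page does not confirm, and the function name `choose_input_encoding` is hypothetical.

```python
import codecs


def choose_input_encoding(data: bytes, user_encoding: str) -> str:
    """Return the detected encoding if a UTF-8/UTF-16 BOM is present,
    otherwise fall back to the encoding chosen by the user."""
    # Assumption: detection is BOM-based.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # Python codec that also strips the BOM
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"     # Python codec that reads the BOM for byte order
    return user_encoding


sample = "Article 1".encode("windows-1252")
print(choose_input_encoding(sample, "windows-1252"))
```

With no byte-order mark in the data, the user's choice is returned unchanged, which is why picking the right encoding for the target language matters.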

Output Encoding

The encoding of the output is the one specified by the user, except when UTF-8 or UTF-16 has been auto-detected as the input encoding. In that case, the output encoding used is the same as the input encoding.

If one or more characters are not supported in the output encoding, the unsupported characters are escaped into \uHHHH (where HHHH is the hexadecimal Unicode value of the character) and a warning message is displayed when closing the output file.
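As an illustration of the escaping described above, the following Python sketch replaces each character the output encoding cannot represent with a \uHHHH escape and counts how many were replaced, so a caller could issue a warning. The function name `escape_unsupported` is hypothetical. The use of uppercase hexadecimal digits, and the choice to write characters above U+FFFF as two escapes (their UTF-16 code units), are assumptions; this page does not specify either.

```python
def escape_unsupported(text: str, encoding: str) -> tuple[str, int]:
    """Escape characters that `encoding` cannot represent as \\uHHHH.

    Returns the escaped text and the number of characters escaped.
    """
    pieces = []
    escaped = 0
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # Assumption: one escape per UTF-16 code unit, so every
            # escape has exactly four hexadecimal digits.
            units = ch.encode("utf-16-be", "surrogatepass")
            for i in range(0, len(units), 2):
                value = int.from_bytes(units[i:i + 2], "big")
                pieces.append("\\u%04X" % value)
            escaped += 1
        else:
            pieces.append(ch)
    return "".join(pieces), escaped


result, count = escape_unsupported("Prix : 10 €", "iso-8859-1")
if count:
    print(f"Warning: {count} character(s) escaped")
print(result)
```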

TODO: line-break MAC support

Date and Time

The date and time information in Wordfast TMs is in local time. Ideally, the filter would convert the TM date/time to UTC. However, because entries can fall in both daylight-saving and standard-time periods, applying a single time offset may produce erroneous data anyway. In addition, as TMs circulate between translators, entries may be in different local times. Since there is no certainty that a conversion would be accurate, the filter does not perform one.

As a result, the filter simply assumes the date and time information is in UTC.
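In practice this means the timestamp is only reformatted, never shifted. The sketch below, in Python, turns a Wordfast-style stamp such as 20051224~235959 into the "yyyyMMddTHHmmssZ" form used for the ChangeDate property. The function name `wordfast_to_change_date` is hypothetical, and the input pattern is inferred from the sample TM at the top of this page.

```python
from datetime import datetime


def wordfast_to_change_date(stamp: str) -> str:
    """Reformat 'yyyyMMdd~HHmmss' as 'yyyyMMddTHHmmssZ'.

    No time-zone offset is applied: the value is simply treated as UTC.
    """
    parsed = datetime.strptime(stamp, "%Y%m%d~%H%M%S")  # also validates the stamp
    return parsed.strftime("%Y%m%dT%H%M%SZ")


print(wordfast_to_change_date("20051224~235959"))
```

Parsing through `datetime` rather than replacing characters directly has the side benefit of rejecting malformed stamps with a `ValueError`.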

Item Properties

This filter implements support for a few additional properties that give you access to all the information set for each entry. The properties are:

ChangeDate    Corresponds to field 1 of the entry. The property value is in the format "yyyyMMddTHHmmssZ".
ChangeUser    Corresponds to field 2 of the entry.
Flag          1 if the entry is flagged, 0 otherwise.
UsageCount    Corresponds to field 3 of the entry.
@AAttr2       Corresponds to field 8 of the entry. The @A prefix is used for compatibility with other filters. The character @ indicates a user-defined field; the flag A indicates that it is an attribute with a possible pick-list (as opposed to a T flag, which indicates a text field). Note that there is no @AAttr1 attribute: in Wordfast the first attribute is the second field of the entry (UserID), and the filter maps it to the ChangeUser property.
@AAttr3       Corresponds to field 9 of the entry.
@AAttr4       Corresponds to field 10 of the entry.
@AAttr5       Corresponds to field 11 of the entry.

Note that not all of these fields are available for every entry. The IFilterItem method GetProperty() returns null when the property does not exist. You can use the ListProperties() method to get a semicolon-delimited list of all properties in the current filter item.
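The field-to-property mapping and the "missing property" behaviour can be modelled with a short Python sketch. This is not the IFilterItem API: `build_properties`, `get_property`, and `list_properties` are hypothetical stand-ins that mimic the behaviour described above, with Python's `None` playing the role of null. The sketch assumes a property is missing only when the entry has fewer fields than the property needs, and it omits Flag because this page does not say how an entry is flagged.

```python
# Property name -> 1-based field number, as listed above.
PROPERTY_FIELDS = {
    "ChangeDate": 1,
    "ChangeUser": 2,
    "UsageCount": 3,
    "@AAttr2": 8,
    "@AAttr3": 9,
    "@AAttr4": 10,
    "@AAttr5": 11,
}


def build_properties(line: str) -> dict[str, str]:
    """Collect the properties that are present in one entry line."""
    fields = line.rstrip("\r\n").split("\t")
    props = {}
    for name, number in PROPERTY_FIELDS.items():
        if number <= len(fields):
            value = fields[number - 1]
            if name == "ChangeDate":
                # 'yyyyMMdd~HHmmss' -> 'yyyyMMddTHHmmssZ', no offset applied
                value = value.replace("~", "T") + "Z"
            props[name] = value
    return props


def get_property(props: dict[str, str], name: str):
    """Return the property value, or None when it does not exist."""
    return props.get(name)


def list_properties(props: dict[str, str]) -> str:
    """Return the available property names as a semicolon-delimited list."""
    return ";".join(props)


props = build_properties("20051224~235959\tYS\t1\tEN-US\tItem 1\tFR-FR\tArticle 1")
print(list_properties(props))
print(get_property(props, "@AAttr2"))
```

Because the sample entry has only seven fields, the attribute properties are absent from the list and looking one up yields `None`, so callers should check for a missing value before using it.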

Credits

Special thanks to Yves Champollion for providing information on the format of the Wordfast TM.