The Wordfast Filter is an Okapi component that implements the Okapi Filter Interface for Wordfast translation memory files. See the Wordfast Web site for more information on Wordfast.
The filter supports Wordfast TM version 5, as described in the Wordfast manual.
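The following is an example of a very simple Wordfast TM file. The first line is the header, and each following line is one entry: the source text (Item 1, Item 2) follows the source language code, and the target text (Article 1, Article 2) follows the target language code. The marker <tab> represents a tabulation character.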
%20051224~235959<tab>%User ID,YS,YS Yves,<tab>%TU=00000000<tab>%EN-US<tab>%Wordfast TM v 5.0/00<tab>%FR-FR<tab>%---45399785
20051224~235959<tab>YS<tab>1<tab>EN-US<tab>Item 1<tab>FR-FR<tab>Article 1
20051224~235959<tab>YS<tab>1<tab>EN-US<tab>Item 2<tab>FR-FR<tab>Article 2
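To make the layout of the example concrete, here is a minimal Python sketch that splits such a file into header fields and entries. It is not the Okapi implementation: the class and function names are hypothetical, and the positions of the language codes and texts (fields 4 to 7) are inferred from the sample above rather than from a specification.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WordfastEntry:
    """One TM entry (hypothetical type); fields[0] holds field 1, and so on."""
    fields: List[str]

    def get(self, number: int) -> Optional[str]:
        """Return the 1-based field `number`, or None if the entry is shorter."""
        if 1 <= number <= len(self.fields):
            return self.fields[number - 1]
        return None

    # Field positions below are inferred from the sample file (an assumption).
    @property
    def source_lang(self) -> Optional[str]:
        return self.get(4)

    @property
    def source_text(self) -> Optional[str]:
        return self.get(5)

    @property
    def target_lang(self) -> Optional[str]:
        return self.get(6)

    @property
    def target_text(self) -> Optional[str]:
        return self.get(7)


def parse_wordfast_tm(text: str) -> Tuple[List[str], List[WordfastEntry]]:
    """Split TM text into header fields and entries (one entry per line)."""
    header: List[str] = []
    entries: List[WordfastEntry] = []
    for line in text.split("\n"):
        line = line.rstrip("\r")  # handles both LF and CRLF line breaks
        if not line:
            continue  # skip blank lines
        cols = line.split("\t")
        if line.startswith("%"):
            header = cols  # every header field starts with '%'
        else:
            entries.append(WordfastEntry(cols))
    return header, entries


sample = (
    "%20051224~235959\t%User ID,YS,YS Yves,\t%TU=00000000\t%EN-US\t"
    "%Wordfast TM v 5.0/00\t%FR-FR\t%---45399785\n"
    "20051224~235959\tYS\t1\tEN-US\tItem 1\tFR-FR\tArticle 1\n"
    "20051224~235959\tYS\t1\tEN-US\tItem 2\tFR-FR\tArticle 2\n"
)
header, entries = parse_wordfast_tm(sample)
for entry in entries:
    print(entry.source_lang, entry.source_text, "->", entry.target_lang, entry.target_text)
```

Returning None for a missing field, instead of raising, matches the fact that not every entry carries every field.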
The properties for the Wordfast Filter are the following:
If UTF-8 or UTF-16 has been auto-detected as the encoding of the input file, that encoding is used; otherwise, the encoding specified by the user is used. (Note that Wordfast files are normally not in UTF-8.)
Be careful when selecting the input encoding: Wordfast files are bilingual files, and the correct encoding of the input file is often the one corresponding to the target (output) language.
The encoding of the output is the one specified by the user, except when UTF-8 or UTF-16 has been auto-detected as the input encoding. In that case, the output encoding used is the same as the input encoding.
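The input and output encoding rules above can be sketched as follows. This assumes, for illustration only, that auto-detection means looking for a UTF-8 or UTF-16 byte-order mark; the filter's actual detection logic may differ, and `detect_unicode_bom` and `choose_encodings` are hypothetical names.

```python
from __future__ import annotations

import codecs
from typing import Optional, Tuple


def detect_unicode_bom(data: bytes) -> Optional[str]:
    """Return 'utf-8' or 'utf-16' when data starts with that encoding's BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return None


def choose_encodings(data: bytes, user_input: str, user_output: str) -> Tuple[str, str]:
    """Return (input_encoding, output_encoding) following the rules above."""
    detected = detect_unicode_bom(data)
    if detected is not None:
        # Auto-detected Unicode is used for input, and the output reuses it.
        return detected, detected
    # Otherwise both encodings come from the user settings.
    return user_input, user_output


utf16_file = codecs.BOM_UTF16_LE + "Item 1".encode("utf-16-le")
print(choose_encodings(utf16_file, "windows-1252", "windows-1252"))
print(choose_encodings(b"Item 1", "windows-1252", "windows-1250"))
```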
If one or more characters are not supported in the output encoding, the unsupported characters are escaped using the hexadecimal Unicode value of each character, and a warning message is displayed when closing the output file.
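The exact escape notation is not spelled out here, so the following Python sketch only covers the detection side: it finds the characters an output encoding cannot represent and reports their hexadecimal Unicode values, as a warning would. The function names and the warning wording are hypothetical.

```python
from typing import List


def unsupported_chars(text: str, encoding: str) -> List[str]:
    """Distinct characters of `text` that `encoding` cannot encode, in first-seen order."""
    bad: List[str] = []
    checked = set()
    for ch in text:
        if ch in checked:
            continue
        checked.add(ch)
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


def warn_unsupported(text: str, encoding: str) -> List[str]:
    """Build one warning line per unsupported character, using its hex code point."""
    return [
        f"U+{ord(ch):04X} is not supported in {encoding}"
        for ch in unsupported_chars(text, encoding)
    ]


# "Œuvre – café" written with escapes: U+0152 and U+2013 are outside Latin-1.
for message in warn_unsupported("\u0152uvre \u2013 caf\u00e9", "latin-1"):
    print("Warning:", message)
```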
TODO: line-break MAC support
The date and time information in Wordfast TMs is in local time. Ideally, the filter would convert the TM date/time values to UTC. However, because the entries can cover dates in both daylight saving and standard time periods, applying a single time difference may produce erroneous data anyway. In addition, as the TM circulates between translators, its entries may be in different local times. Since there is no certainty that a conversion would be accurate, the filter does not perform any.
As a result, the filter simply assumes the date and time information is in UTC.
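A minimal Python sketch of that behavior: the value is parsed and simply labeled as UTC, with no offset applied. The `YYYYMMDD~HHMMSS` layout is inferred from the sample file above (an assumption), and the function names are hypothetical.

```python
from datetime import datetime, timezone

# Layout inferred from the sample entries, e.g. 20051224~235959 (assumption).
WORDFAST_DATE_FORMAT = "%Y%m%d~%H%M%S"


def parse_wordfast_date(value: str) -> datetime:
    """Parse a Wordfast date/time and label it UTC without any conversion."""
    return datetime.strptime(value, WORDFAST_DATE_FORMAT).replace(tzinfo=timezone.utc)


def format_wordfast_date(moment: datetime) -> str:
    """Write a datetime back in the Wordfast date/time layout."""
    return moment.strftime(WORDFAST_DATE_FORMAT)


stamp = parse_wordfast_date("20051224~235959")
print(stamp.isoformat())
```

Using `replace(tzinfo=...)` rather than `astimezone(...)` is the point here: it attaches the UTC label without shifting the clock time.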
This filter implements support for a few additional properties that give you access to all the information set for each entry. The properties are:
||This corresponds to field 1 of the entry.
The property value is in the format "
||This corresponds to field 2 of the entry.|
||1 if the entry is flagged, 0 otherwise.|
||This corresponds to field 3 of the entry.|
||This corresponds to field 8 of the entry.
The @A prefix is used for compatibility with other filters. The
||This corresponds to field 9 of the entry.|
||This corresponds to field 10 of the entry.|
||This corresponds to field 11 of the entry.|
Note that not all of these fields are necessarily available for every entry. GetProperty() returns null when the property does not exist. You can use the
method to get a semicolon-delimited list of all properties in the current
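The lookup behavior described above (null for a missing property, plus a semicolon-delimited list of the properties present) can be sketched in Python. This is not the Okapi API: `EntryProperties`, `get_property`, `property_names`, and the `fieldN` property names are all hypothetical stand-ins, since the real property names are not listed here; `None` plays the role of null.

```python
from typing import Dict, List, Optional

# Hypothetical property names keyed by field number (the real names differ).
FIELD_PROPERTIES = {1: "field1", 2: "field2", 3: "field3",
                    8: "field8", 9: "field9", 10: "field10", 11: "field11"}


class EntryProperties:
    """Properties of one entry; only fields present in the entry are stored."""

    def __init__(self, fields: List[str]) -> None:
        self._props: Dict[str, str] = {}
        for number, name in FIELD_PROPERTIES.items():
            if number <= len(fields):
                self._props[name] = fields[number - 1]

    def get_property(self, name: str) -> Optional[str]:
        """Return the property value, or None when the entry lacks it."""
        return self._props.get(name)

    def property_names(self) -> str:
        """Semicolon-delimited list of the properties this entry has."""
        return ";".join(self._props)


props = EntryProperties("20051224~235959\tYS\t1\tEN-US\tItem 1\tFR-FR\tArticle 1".split("\t"))
print(props.get_property("field2"), props.get_property("field9"), props.property_names())
```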
Special thanks to Yves Champollion for providing information on the format of the Wordfast TM.