IMPORTANT NOTICE - Apr-30-2009:
This .NET implementation of Okapi is no longer actively developed.
Instead, a NEW JAVA IMPLEMENTATION IS AVAILABLE and is being actively developed.

The material on this Web site is kept for archival purposes. Some applications of the old .NET implementation (e.g. Olifant) will be maintained to some degree until they have a replacement in the Java project.

Okapi Framework

Filter Interface Specification

Revision information: Version 1.1
     Latest revision: http://okapi.sourceforge.net/IFilter.html

- Document Status
- Overview
- Usage
- Defined Values
- Methods

Document Status

This document is a stable specification. Feedback about its content is encouraged. Send your comments to the Okapi Framework administrator, or alternatively post them in the Okapi Tools users group.

DISCLAIMER: Parts of this document are generated automatically from the source code documentation of Okapi's implementation of the specification. Because the generation tool is not yet fully working, some text or links in this document may be incomplete or broken. See the source code documentation of the interface for the complete text.

Overview

The Okapi Filter Interface is an object model that allows any given input source to be parsed so its different localization-related parts are presented in a common manner to a given calling program.

Essentially, the Filter Interface allows you to abstract the intricacies of the original data source you are processing, and develop applications that act upon a single common and standardized type of input.

Other related specifications:

Usage

This interface provides a common way of reading the localizable information of an input. It also offers the option to re-write the input into a new output at the same time.

Just after creating an object that implements IFilter, call the IFilter.Initialize method. You can then query the filter for more information: use IFilter.GetName to retrieve its name, IFilter.GetIdentifier to retrieve its identifier, or IFilter.GetDefaultDatatype to find out what kind of data it processes.

Use the IFilter.LoadSettings method to specify which filter, variant and parameters file to use.

The IFilter.QueryProperty method lets you check whether the filter supports a given feature.
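The start-up sequence described in this section can be sketched as follows. This is only an illustration: IFilterBasics and DemoFilter are hypothetical stand-ins that borrow six method names from this specification, the return type of GetDefaultDatatype is assumed to be a string, and the log argument of Initialize (an Okapi.Library.Base.ILog in the real interface) is replaced by a plain object placeholder.

```csharp
using System;

// Hypothetical, trimmed-down stand-in for IFilter: only the members used below.
public interface IFilterBasics
{
    void Initialize(object log);   // the real method takes an Okapi.Library.Base.ILog
    string GetName();
    string GetIdentifier();
    string GetDefaultDatatype();   // return type assumed
    bool QueryProperty(int property);
    bool LoadSettings(string filterSettings, bool ignoreErrors);
}

// Hypothetical in-memory filter, present only so that the sketch can run.
public sealed class DemoFilter : IFilterBasics
{
    private bool initialized;

    public void Initialize(object log) { initialized = true; }
    public string GetName() { Check(); return "Demo Filter"; }
    public string GetIdentifier() { Check(); return "okf_demo"; }
    public string GetDefaultDatatype() { Check(); return "x-demo"; }
    public bool QueryProperty(int property) { Check(); return property == 0 || property == 3; }

    public bool LoadSettings(string filterSettings, bool ignoreErrors)
    {
        Check();
        return filterSettings.StartsWith("okf_demo", StringComparison.Ordinal);
    }

    private void Check()
    {
        if (!initialized) throw new InvalidOperationException("Call Initialize first.");
    }
}

public static class StartupSketch
{
    // Returns a one-line summary of the filter, or null if the settings cannot be loaded.
    public static string Describe(IFilterBasics filter, string settings)
    {
        filter.Initialize(null);                     // must be the first call
        if (!filter.LoadSettings(settings, false)) return null;
        bool readsFiles = filter.QueryProperty(0);   // 0 = INPUTFILE
        return filter.GetName() + " [" + filter.GetIdentifier() + "], datatype "
            + filter.GetDefaultDatatype() + ", file input: " + readsFiles;
    }
}
```

A call such as StartupSketch.Describe(new DemoFilter(), "okf_demo@myOptions") returns a summary string for the stand-in filter; with a real filter, the names and values come from the filter itself.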

Defined Values

The following defined values are used with the IFilter interface:

Filter Properties

The filter properties are used with the QueryProperty method to know if a filter supports a given property. The properties currently defined are the following:

0 (INPUTFILE)

Constant value = 0. Queries whether the filter supports file input.

1 (INPUTSTRING)

Constant value = 1. Queries whether the filter supports string input.

2 (BILINGUALINPUT)

Constant value = 2. Queries whether the filter input may be bilingual.

3 (TEXTBASED)

Constant value = 3. Queries whether the format supported by the filter is text-based.

4 (OUTPUTFILE)

Constant value = 4. Queries whether the filter supports file output.

5 (OUTPUTSTRING)

Constant value = 5. Queries whether the filter supports string output.

6 (ANCILLARYOUTPUT)

Constant value = 6. Queries whether the filter supports ancillary output(s).

7 (XMLOUTPUT)

Constant value = 7. Queries whether the filter supports XML and HTML output.

8 (RTFOUTPUT)

Constant value = 8. Queries whether the filter supports RTF output.

9 (USEKEY)

Constant value = 9. Queries whether the filter is currently using a key.

10 (ISINDEMOMODE)

Constant value = 10. Queries whether the filter is currently in demonstration mode.
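As a rough illustration of how these property values might be used, the sketch below probes every property and collects the names of those a filter reports as supported. FilterPropertySketch and its members are hypothetical names; only the property names and their constant values come from the list above, and the query delegate stands in for a QueryProperty method assumed to take an integer and return a Boolean.

```csharp
using System;
using System.Collections.Generic;

public static class FilterPropertySketch
{
    // Property names in the order of their constant values (0 to 10), as listed above.
    public static readonly string[] Names =
    {
        "INPUTFILE", "INPUTSTRING", "BILINGUALINPUT", "TEXTBASED", "OUTPUTFILE",
        "OUTPUTSTRING", "ANCILLARYOUTPUT", "XMLOUTPUT", "RTFOUTPUT", "USEKEY", "ISINDEMOMODE"
    };

    // Calls the supplied query once per property value and collects the names
    // of the properties reported as supported.
    public static List<string> Supported(Func<int, bool> queryProperty)
    {
        var result = new List<string>();
        for (int value = 0; value < Names.Length; value++)
        {
            if (queryProperty(value)) result.Add(Names[value]);
        }
        return result;
    }
}
```

With a real filter object, the call might look like FilterPropertySketch.Supported(myFilter.QueryProperty), provided the method signature matches the assumption above.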

Filter Item Types

The filter item types are used to identify what type of item is returned by the filter. The ReadItem method returns such a value, and an object that implements the IFilterItem interface is labeled with such a type as well.

0 (ERROR)

Constant value = 0. Error.

An item of this type indicates that an error has occurred. The processing of the input should be stopped.

1 (USERCANCEL)

Constant value = 1. User cancellation.

An item of this type indicates that the user has cancelled the operation. The processing of the input should be stopped.

2 (ENDINPUT)

Constant value = 2. End of the input.

An item of this type indicates that the end of the input file or of the input string has been reached. The processing of the input should be stopped.

3 (TEXT)

Constant value = 3. Text item.

An item of this type contains text.

Not all text items need to be translated. Use the IFilterItem.IsTranslatable method to see if the text is translatable. Use IFilterItem.IsTranslated to detect whether a text item has a corresponding translation (in the case of bilingual files).

4 (STARTGROUP)

Constant value = 4. Beginning of group.

5 (ENDGROUP)

Constant value = 5. End of group.

6 (BLOCK)

Constant value = 6. Block of data.

An item of this type is a block of data not relevant for localization.

7 (BINARY)

Constant value = 7. Standalone binary data (e.g. bitmap).

An item of this type corresponds to an ancillary file. The text content is the path, and the MIME type must be set.

8 (STARTBINARY)

Constant value = 8. Start of a binary data (e.g. graphic with extracted text).

An item of this type corresponds to an ancillary file with some additional data (possibly TEXT items). The text content is the path, and the MIME type must be set.

9 (ENDBINARY)

Constant value = 9. End of a binary data.

10 (STARTSECONDARYFILTER)

Constant value = 10. Start of a part parsed with a secondary filter.

11 (ENDSECONDARYFILTER)

Constant value = 11. End of a part parsed with a secondary filter.

Any item with a type value of 0, 1, or 2 (so any possible value below 3) should result in stopping the process. A possible way to program the parsing of the input is something like this:

int nRes;
do
{
   nRes = myFilter.ReadItem();
   switch ( nRes )
   {
      case 0: // Fatal error
         myLog.Error("A fatal parsing error has occurred.");
         continue;
      case 1: // User interruption
         myLog.Message("Process interrupted by the user.");
         continue;
      case 2: // Normal end of input
         continue;
   }
}
while ( nRes > 2 );

Methods

The Filter Interface provides the following methods:

IFilter.Initialize

Initializes the filter object.

IFilter.GetInterfaceVersion

Gets the version of the IFilter interface the object implements.

IFilter.GetIdentifier

Gets the identifier of the filter.

IFilter.GetName

Gets the name of the filter.

IFilter.GetCurrentEncoding

Gets the current encoding used to process the input file.

IFilter.GetCurrentLanguage

Gets the current language of the processed input.

IFilter.GetInputLanguage

Gets the language of the input.

IFilter.GetOutputLanguage

Gets the language of the output.

IFilter.GetOutputEncoding

Gets the name of the encoding used for the output.

IFilter.GetItem

Gets the current filter item.

IFilter.GetTranslatedItem

Gets the translation of the current filter item.

IFilter.GetDefaultDatatype

Gets the default datatype identifier.

IFilter.QueryProperty

Queries whether a specified property is supported by the filter.

IFilter.GetVariantCount

Gets the number of variants supported by the filter.

IFilter.GetVariantID

Gets the identifier for a specified variant of the filter.

IFilter.GetVariantDescription

Gets the description of a specified variant of the filter.

IFilter.LoadSettings

Loads the filter settings.

IFilter.SaveSettingsAs

Saves the current settings to a specified path.

IFilter.GetSettingsString

Gets the current settings string for the filter.

IFilter.EditSettings

Edits a specified parameters file for the filter.

IFilter.GetLocalizationDirectives

Gets the current localization directives of the filter.

IFilter.SetLocalizationDirectives

Passes a localization directives context to the filter.

IFilter.OpenInputFile

Opens the file to process.

IFilter.OpenInputString

Sets a given string as the input to process.

IFilter.ResetInput

Resets the input.

IFilter.GetLastItemID

Gets the last ID used for an item.

IFilter.SetLastItemID

Sets the last ID for an item.

IFilter.ReadItem

Reads the next item of the input file or input string.

IFilter.CloseInput

Closes the current input.

IFilter.SetOutputOptions

Sets the options for the output. Call this method before calling IFilter.OpenOutputFile.

IFilter.UseOutputLayer

Sets the layer information for the output.

IFilter.GetOutputLayer

Gets the parameters of the current output layer.

IFilter.OpenOutputFile

Creates the output file. Call IFilter.SetOutputOptions before calling this method.

IFilter.OpenOutputString

Creates an output string.

IFilter.SetAncillaryDirectory

Sets the directory for the ancillary data.

IFilter.GenerateAncillaryData

Generates the ancillary data.

IFilter.WriteItem

Writes the last item read.

IFilter.CloseOutput

Closes the current output.

IFilter.Initialize

Initializes the filter object.

Parameters

Okapi.Library.Base.ILog p_Log: The log object to use.

Remarks

This method must be called first after the creation of the object.

IFilter.GetInterfaceVersion

Gets the version of the IFilter interface the object implements.

Return

The version of the IFilter interface.

IFilter.GetIdentifier

Gets the identifier of the filter.

Return

The string that identifies the filter.

Remarks

Use this method to retrieve the unique identifier of the filter. The value is case-sensitive.

The identifier is used in the names of parameters files, along with other parts. For example, in okp_myfilter@Test.fprm, okp_myfilter is the filter identifier.

IFilter.GetName

Gets the name of the filter.

Return

The name of the filter.

Remarks

Use this method to retrieve the localized name of the filter. For example: Okapi PO Filter.

IFilter.GetCurrentEncoding

Gets the current encoding used to process the input file.

Return

The IANA charset name of the current input encoding.

Remarks

Because some formats provide a mechanism for declaring their own encoding, the filter may override the encoding passed to IFilter.OpenInputFile or IFilter.OpenInputString. After opening the input, use IFilter.GetCurrentEncoding to retrieve the encoding actually used for the processing.

IFilter.GetCurrentLanguage

Gets the current language of the processed input.

Return

The string that identifies the current language.

Remarks

The value is a BCP47 language code (e.g. pt-BR). Note that the value is not case-sensitive.

IFilter.GetInputLanguage

Gets the language of the input.

Return

The string that identifies the input language.

Remarks

The value is a BCP47 language code (e.g. pt-BR). Note that the value is not case-sensitive.

IFilter.GetOutputLanguage

Gets the language of the output.

Return

The string that identifies the output language.

Remarks

The value is a BCP47 language code (e.g. pt-BR). Note that the value is not case-sensitive.

IFilter.GetOutputEncoding

Gets the name of the encoding used for the output.

Return

The name of the encoding. It is an IANA charset name. Returns null if no encoding is defined.

IFilter.GetItem

Gets the current filter item.

Return

The IFilterItem interface to the current item.

Remarks

You must call the IFilter.ReadItem method before using this method. The data available through this object vary considerably depending on the value of IFilterItem.GetItemType.

IFilter.GetTranslatedItem

Gets the translation of the current filter item.

Return

The IFilterItem interface to the translated text of the current item.

Remarks

You must call the IFilter.ReadItem method before using this method. A text item may have an existing translation (in the case of bilingual files). Use the IFilterItem.IsTranslated method on the item returned by IFilter.GetItem to detect whether such a translation is available. If no translation is available, IFilter.GetTranslatedItem returns null.

IFilter.GetDefaultDatatype

Gets the default datatype identifier.

Return

The datatype identifier of the filter.

IFilter.QueryProperty

Queries whether a specified property is supported by the filter.

Parameters

Integer p_nProperty: Property to query. The value must be one of the Filter.FilterProperty values.

Return

True if the specified property is supported, false if it is not supported.

IFilter.GetVariantCount

Gets the number of variants supported by the filter.

Return

Number of variants supported by the filter.

Remarks

Each filter must support at least one variant (the default one).

IFilter.GetVariantID

Gets the identifier for a specified variant of the filter.

Parameters

Integer p_nIndex: Index of the variant. The value must be between 0 and one less than the value returned by IFilter.GetVariantCount.

Return

The string identifying the specified variant.

IFilter.GetVariantDescription

Gets the description of a specified variant of the filter.

Parameters

Integer p_nIndex: Index of the variant. The value must be between 0 and one less than the value returned by IFilter.GetVariantCount.

Return

The short description of the specified variant.
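The three variant methods above are naturally used together to list what a filter offers. The following is a minimal sketch under stated assumptions: VariantSketch is a hypothetical helper, and the three delegates stand in for IFilter.GetVariantCount, IFilter.GetVariantID and IFilter.GetVariantDescription, whose exact return types (integer and strings) are assumed here.

```csharp
using System;
using System.Collections.Generic;

public static class VariantSketch
{
    // Builds one "identifier: description" line per variant, using indexes
    // from 0 to the variant count minus one.
    public static List<string> Describe(
        Func<int> getVariantCount,
        Func<int, string> getVariantId,
        Func<int, string> getVariantDescription)
    {
        var lines = new List<string>();
        int count = getVariantCount();   // at least 1 according to this specification
        for (int index = 0; index < count; index++)
        {
            lines.Add(getVariantId(index) + ": " + getVariantDescription(index));
        }
        return lines;
    }
}
```

With a real filter, the call might be VariantSketch.Describe(myFilter.GetVariantCount, myFilter.GetVariantID, myFilter.GetVariantDescription), if the method signatures match the assumptions above.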

IFilter.LoadSettings

Loads the filter settings.

Parameters

String p_sFilterSettings: Filter settings string.
System.Boolean p_bIgnoreErrors: True if no error should be generated when the file cannot be loaded. False if the method should generate an error and return false when the file cannot be loaded.

Return

True if the settings could be loaded (or set to their default values), false if an error occurred.

Remarks

The filter settings string has the following syntax:

[<folder>]<filterID>[#<variantID>][@[<F>%]<parametersID>]

Where:

Example

Loads the Okapi PO Filter with the parameters file named okf_po@myOptions.fprm located in the Okapi Parameters system folder. An error is generated if the file does not exist.

LoadSettings("okf_po@myOptions", false);

The actual file loaded will be C:\Program Files\Okapi\Shared\Parameters\okf_po@myOptions.fprm, assuming the application is installed with its default parameters file location. The parameter could also have been okf_po@S%myOptions.fprm.
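To make the settings string syntax more concrete, here is a rough sketch of how such a string could be split into its parts. It is not Okapi's parser: the FilterSettings class and its property names are hypothetical, the folder is assumed to end at the last path separator, and the part before the percent sign is treated as an opaque flag because its meaning is not described in this text.

```csharp
using System;

// Hypothetical holder for the parts of [<folder>]<filterID>[#<variantID>][@[<F>%]<parametersID>].
public sealed class FilterSettings
{
    public string Folder { get; private set; } = "";
    public string FilterId { get; private set; } = "";
    public string VariantId { get; private set; } = "";
    public string FolderFlag { get; private set; } = "";
    public string ParametersId { get; private set; } = "";

    public static FilterSettings Parse(string text)
    {
        var settings = new FilterSettings();
        string rest = text;

        // Optional "@[<F>%]<parametersID>" suffix.
        int at = rest.IndexOf('@');
        if (at >= 0)
        {
            string parameters = rest.Substring(at + 1);
            rest = rest.Substring(0, at);
            int percent = parameters.IndexOf('%');
            if (percent >= 0)
            {
                settings.FolderFlag = parameters.Substring(0, percent);
                parameters = parameters.Substring(percent + 1);
            }
            settings.ParametersId = parameters;
        }

        // Optional "#<variantID>" part.
        int hash = rest.IndexOf('#');
        if (hash >= 0)
        {
            settings.VariantId = rest.Substring(hash + 1);
            rest = rest.Substring(0, hash);
        }

        // Optional leading folder: assumed to end at the last path separator.
        int slash = rest.LastIndexOfAny(new[] { '\\', '/' });
        if (slash >= 0)
        {
            settings.Folder = rest.Substring(0, slash + 1);
            rest = rest.Substring(slash + 1);
        }

        settings.FilterId = rest;
        return settings;
    }
}
```

Under these assumptions, parsing "okf_po@S%myOptions" yields the filter identifier okf_po, the flag S and the parameters identifier myOptions. A folder name containing # or @ would confuse this simple approach.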

IFilter.SaveSettingsAs

Saves the current settings to a specified path.

Parameters

String p_sPath: Full path of the file in which to save the settings.
String p_sPrefix: Optional prefix. If not null, it will be placed in front of any settings-related additional file(s) the filter saves along with the parameters file.

Return

True if no error occurred, false otherwise.

IFilter.GetSettingsString

Gets the current settings string for the filter.

Return

The Filter settings string.

IFilter.EditSettings

Edits a specified parameters file for the filter.

Parameters

String p_sFilterSettings: Filter settings string. See IFilter.LoadSettings for the string format.
System.Boolean p_bNew: True if it is a new file, false otherwise.

Return

True if the edit was successful, false if the user cancels the edit or if an error occurs.

IFilter.GetLocalizationDirectives

Gets the current localization directives of the filter.

Return

The current localization directive object of the filter, or null if the filter does not support localization directives.

IFilter.SetLocalizationDirectives

Passes a localization directives context to the filter.

Parameters

Okapi.Library.Filter.ILocalizationDirectives p_LD: Context to pass.

Return

True if the context was passed without error, false otherwise.

Remarks

When the filter is used to process data extracted from inside another dataset (e.g. a script within an HTML file) you may need to pass the context of the outer filter.

IFilter.OpenInputFile

Opens the file to process.

Parameters

String p_sPath: Full path of the input file.
String p_sLanguage: The code of the language to process. The code must be a BCP47 tag.
String p_sEncoding: The name of the encoding of the file. The name must be an IANA charset name. This parameter is used only if the filter cannot automatically detect the encoding of the file.

Return

True if the file is opened successfully, false if an error occurs.

Remarks

Use this method to open the input file to process. You must call IFilter.CloseInput after the file has been processed. To process a string, use the IFilter.OpenInputString method.

Example

See IFilter.ReadItem for an example.

IFilter.OpenInputString

Sets a given string as the input to process.

Parameters

String p_sInput: String to process.
String p_sLanguage: The code of the language to process. The code must be a BCP47 tag.
String p_sEncoding: The name of the encoding of the file. The name must be an IANA charset name. This parameter is used only if the filter cannot automatically detect the encoding of the file.
Integer64 p_lOffsetInFile: Position in an input file where the string starts.

Return

True if the string is set successfully, false if an error occurs.

Remarks

Use this method to set the string to process. Call IFilter.CloseInput after the string has been processed. To process a file, use the IFilter.OpenInputFile method.

The p_lOffsetInFile parameter is to be used if the string is coming from a file, to allow the filter to provide a more accurate location along with any error or warning. Set p_lOffsetInFile to 0 if the input string has no context.

IFilter.ResetInput

Resets the input.

Remarks

Use this method to reset the current input for re-processing, without calling IFilter.OpenInputFile or IFilter.OpenInputString again. You do not need to call this method if you have just opened the input.

IFilter.GetLastItemID

Gets the last ID used for an item.

Return

The last item ID used.

IFilter.SetLastItemID

Sets the last ID for an item.

Parameters

Integer p_nID: Value of the last item ID.

IFilter.ReadItem

Reads the next item of the input file or input string.

Return

The type of the item processed. The value is one of the Filter.FilterItemType values.

Remarks

Use this method to read the next item in the current input. You must call IFilter.OpenInputFile, IFilter.OpenInputString, or IFilter.ResetInput before calling this method for the first time.

After this method has been called, you can use the IFilter.GetItem method to access the different data that have been parsed from the input. The data available depend on the type of item returned (see Filter.FilterItemType for details).

Example

The following example shows how to traverse an input file using the Filter interface:

            myLog.BeginProcess("Test extraction");
            myFilter.OpenInputFile("myFile.xyz", "en-us", "windows-1252");
            int nRes;
            do
            {
               nRes = myFilter.ReadItem();
               switch ( nRes )
               {
                  case FilterItemType.ERROR:
                     myLog.Error("An error has occurred.");
                     continue;
                  case FilterItemType.USERCANCEL:
                     myLog.Message("Operation cancelled.");
                     continue;
                  case FilterItemType.ENDINPUT:
                     continue; // Normal end
                  case FilterItemType.TEXT:
                  case FilterItemType.STARTGROUP:
                  case FilterItemType.ENDGROUP:
                  case FilterItemType.BINARY:
                     // Do something with the item ...
                     break;
               }
            } while ( nRes > FilterItemType.ENDINPUT );
            myFilter.CloseInput();
            myLog.EndProcess(null);

IFilter.CloseInput

Closes the current input.

Remarks

If you are also using the output of the filter, you must call IFilter.CloseInput after IFilter.CloseOutput as some filters may need to have some input information in order to generate the output.

IFilter.SetOutputOptions

Sets the options for the output. Call this method before calling IFilter.OpenOutputFile.

Parameters

String p_sLanguage: The code of the language for the output. The code must be a BCP47 tag.
String p_sEncoding: The name of the encoding of the file. The name must be an IANA charset name. This parameter is used only if the filter cannot automatically detect the encoding of the file.

Return

True if the options are set successfully, false if an error occurs.

Remarks

Use this method to specify the language and encoding of the output.

If they are both the same as for the input, you do not have to call this method.

If you call this method, you must call it before calling IFilter.OpenOutputFile or IFilter.OpenOutputString.

IFilter.UseOutputLayer

Sets the layer information for the output.

Parameters

Integer p_nLayer: Type of layer to use in the output. The value must be one of the Filter.FilterOutputLayer values.
String p_sStartDocument: Codes to place at the start of the layered output.
String p_sEndDocument: Codes to place at the end of the layered output.
String p_sStartCode: Codes to place before runs of external codes.
String p_sEndCode: Codes to place after runs of external codes.
String p_sStartInline: Codes to place before runs of internal codes.
String p_sEndInline: Codes to place after runs of internal codes.
String p_sStartText: [Reserved for future use]
String p_sEndText: [Reserved for future use]

Return

True if the data are set successfully, false if an error occurs.

Remarks

This method should be called before you call IFilter.OpenOutputFile or IFilter.OpenOutputString. You do not need to call this method if the output has no extra layer.

IFilter.GetOutputLayer

Gets the parameters of the current output layer.

Parameters

Integer p_nLayer: Type of layer to use in the output.
String p_sStartDocument: Codes to place at the start of the layered output.
String p_sEndDocument: Codes to place at the end of the layered output.
String p_sStartCode: Codes to place before runs of external codes.
String p_sEndCode: Codes to place after runs of external codes.
String p_sStartInline: Codes to place before runs of internal codes.
String p_sEndInline: Codes to place after runs of internal codes.
String p_sStartText: [Reserved for future use]
String p_sEndText: [Reserved for future use]

IFilter.OpenOutputFile

Creates the output file. Call IFilter.SetOutputOptions before calling this method.

Parameters

String p_sPath: Full path of the file to create.

Return

True if the file was created successfully, false otherwise.

Remarks

Use this method to open an output file. The path of the output file can be the same as the input file.

If you want to use an encoding or a language in the output that is different from the input encoding or language, you must call IFilter.SetOutputOptions before calling this method.

If you use a layered output, you must call IFilter.UseOutputLayer before calling this method (and after calling IFilter.SetOutputOptions).

Note that some filters may generate the actual final output only when IFilter.CloseOutput is called, at the end of the process.

Example

See IFilter.WriteItem for an example.

IFilter.OpenOutputString

Creates an output string.

Remarks

Use this method to open an output string. You can open an output string only when working with an input string.

If you want to use an encoding or a language in the output that is different from the input encoding or language, you must call IFilter.SetOutputOptions before calling this method.

IFilter.SetAncillaryDirectory

Sets the directory for the ancillary data.

Parameters

String p_sInputRoot: Root directory of the input files.
String p_sAncillaryRoot: Root directory for the ancillary output.

IFilter.GenerateAncillaryData

Generates the ancillary data.

Parameters

String p_sId: (Output) [TBD]
String p_sType: (Output) [TBD]
String p_sPath: (Output) [TBD]

Remarks

Use this method to generate the ancillary data for a BINARY item.

IFilter.WriteItem

Writes the last item read.

Remarks

If you call this method, you must call it before the next call to the IFilter.ReadItem or IFilter.CloseInput methods.

Example

            myFilter.OpenInputFile("myFile.xyz", "en-us", "windows-1252");
            myFilter.OpenOutputFile("myFile.out.xyz");
            int nRes;
            do
            {
               nRes = myFilter.ReadItem();
               switch ( nRes )
               {
                  case FilterItemType.ERROR:
                     myLog.Error("An error has occurred.");
                     continue;
                  case FilterItemType.USERCANCEL:
                     myLog.Message("Operation cancelled.");
                     continue;
                  case FilterItemType.ENDINPUT:
                     continue; // Normal end
                  case FilterItemType.TEXT:
                  case FilterItemType.STARTGROUP:
                  case FilterItemType.ENDGROUP:
                  case FilterItemType.BINARY:
                     // Do something with the item ...
                     break;
               }
               myFilter.WriteItem();
            } while ( nRes > FilterItemType.ENDINPUT );
            myFilter.CloseOutput();
            myFilter.CloseInput();

IFilter.CloseOutput

Closes the current output.

Return

The output string if the output was opened as a string, or null if the output was opened as a file.

Remarks

Use this method to close the last output you opened. The method returns null if the output was opened as a file (using IFilter.OpenOutputFile). It returns the actual output string if the output was opened as a string (using IFilter.OpenOutputString). You must call IFilter.CloseOutput before calling IFilter.CloseInput, as some filters may need to have some input information in order to generate the output.
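The closing order and the string round trip can be illustrated with a small sketch. EchoFilter is a hypothetical stand-in that borrows method names from this specification and copies its input to its output unchanged; the void return types of OpenOutputString, WriteItem and CloseInput are assumptions, since this document lists no return value for them. The stand-in refuses to close its output once the input is closed, to mimic a filter that still needs input information at that point.

```csharp
using System;
using System.Text;

// Hypothetical stand-in: the whole input string is returned as a single TEXT item.
public sealed class EchoFilter
{
    private string input;
    private string lastItem;
    private bool consumed;
    private StringBuilder output;

    public bool OpenInputString(string text, string language, string encoding, long offsetInFile)
    {
        input = text;
        consumed = false;
        return true;
    }

    public void OpenOutputString() { output = new StringBuilder(); }

    public int ReadItem()
    {
        if (consumed) return 2;   // ENDINPUT
        consumed = true;
        lastItem = input;
        return 3;                 // TEXT
    }

    public void WriteItem() { output.Append(lastItem); }

    public string CloseOutput()
    {
        if (input == null) throw new InvalidOperationException("Input already closed.");
        string result = output.ToString();
        output = null;
        return result;
    }

    public void CloseInput() { input = null; }
}

public static class RoundTripSketch
{
    public static string Run(string text)
    {
        var filter = new EchoFilter();
        filter.OpenInputString(text, "en-US", "utf-8", 0);
        filter.OpenOutputString();
        while (filter.ReadItem() > 2)            // values 0, 1 and 2 all stop the loop
        {
            filter.WriteItem();                  // written before the next ReadItem
        }
        string result = filter.CloseOutput();    // close the output first
        filter.CloseInput();                     // then close the input
        return result;
    }
}
```

Swapping the last two calls in Run makes the stand-in throw, which is the kind of failure the ordering rule is meant to prevent.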