4 Filters
Before you start using quaneko, you will most likely want to
add support for your favorite file formats (e.g. doc, html, pdf, ...).
quaneko allows you to install and configure individual filters
for file types you want to index. This section describes how to
do this. If you want to index plain text files with the extension "txt"
only, you can skip this section.
Adding Support For Various File Formats
Adding support for a new file format means configuring
a new filter in quaneko. By default, quaneko only supports plain
text files with the extension "txt". If you want to index other
file types, you need to configure filters for them.
If you want to index a file format
that is not listed here, the end of this section gives a generic
description of how to add support for any file format.
The example screenshot in figure 1 shows how to configure
a utility called "gettext.exe" under Windows for parsing Word
documents. The exact settings may vary for your system. In these
settings we assume that you've installed "gettext.exe" in
"C:\Program files\gettext\" and that the file "wordpad.exe" is
located in "C:\Program files\Windows NT\Accessories":
Filter extensions: doc
Parse Command: C:\Program files\gettext\gettext.exe "%f" "%o"
Open Command: C:\Program files\Windows NT\Accessories\wordpad.exe
Once configured as described here, the Word doc filter should be
ready to use.

Figure 1: Configuration of the Word doc filter.
You can configure the other format filters in the same way. The
following sections list some possible filter configuration settings
for Windows and Linux.
Configuration Under Windows
For Windows, a single filter utility is available that handles many
common formats:
doc | Microsoft Word document format
xls | Microsoft Excel spreadsheet format
ppt | Microsoft PowerPoint presentation format
pdf | Adobe portable document format
html, htm | Hypertext Markup Language
txt | Plain text
rtf | Rich Text Format
wpd | Corel WordPerfect® document format
hlp | Microsoft Help format
The utility to convert all these formats into plain text is called
"GetText" and is available from:
http://www.kryltech.com/freestf.htm.
Word
Filter extensions: doc
Parse Command: C:\Program files\gettext\gettext.exe "%f" "%o"
Open Command: C:\Program files\Windows NT\Accessories\wordpad.exe
Excel
Filter extensions: xls
Parse Command: C:\Program Files\gettext\gettext.exe "%f" "%o"
Open Command: C:\Program Files\Microsoft Office\Office10\excel.exe
PowerPoint
Filter extensions: ppt
Parse Command: C:\Program Files\gettext\gettext.exe "%f" "%o"
Open Command: C:\Program Files\Microsoft Office\Office10\powerpnt.exe
Adobe Portable Document Format (PDF)
Filter extensions: pdf
Parse Command: C:\Program Files\gettext\gettext.exe "%f" "%o"
Open Command: C:\Program Files\Adobe\Acrobat 5.0\Reader\AcroRd32.exe
Hypertext Markup Language (HTML)
Filter extensions: html htm
Parse Command: C:\Program Files\gettext\gettext.exe "%f" "%o"
Open Command: C:\Program Files\Internet Explorer\iexplore.exe
Plain Text
Filter extensions: txt
Parse Command: C:\Program Files\gettext\gettext.exe "%f" "%o"
Open Command: C:\Program files\Windows NT\Accessories\wordpad.exe
Rich Text (RTF)
Filter extensions: rtf
Parse Command: C:\Program Files\gettext\gettext.exe "%f" "%o"
Open Command: C:\Program files\Windows NT\Accessories\wordpad.exe
WordPerfect®
Filter extensions: wpd
Parse Command: C:\Program Files\gettext\gettext.exe "%f" "%o"
Open Command: (none)
Help Files
Filter extensions: hlp
Parse Command: C:\Program Files\gettext\gettext.exe "%f" "%o"
Open Command: winhlp32
Configuration Under Linux
There are numerous filters available to convert file formats into
plain text. Some might already come with your favorite distribution;
for others you might have to download the sources and compile them.
Word
Adobe Portable Document Format (PDF)
MP3 Description (ID3 Tags)
Hypertext Markup Language (HTML)
Adding Support for Other File Formats
If you want to index file types that are not mentioned in the
previous sections, you need to configure your own filters for them.
The following steps are required to configure a new filter:
- Download an appropriate converter application. The utility must
be able to produce a plain text file from a file in another
format. It should neither show a GUI nor require any user
interaction.
- Install the application on your system.
- Configure the converter as a filter in quaneko.
After this procedure, the filter is ready to be used with quaneko.
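Before configuring a converter as a filter, it can help to check that it meets the requirements above. The following Python sketch is one possible way to do that and is not part of quaneko: the helper name converter_works is hypothetical, and the converter call is passed as an argument list that you have already filled in. It runs the converter without any input stream, gives up after a timeout (which a tool waiting for a GUI or user interaction would hit), and checks that a non-empty output file was written.

```python
import subprocess
from pathlib import Path
from typing import List


def converter_works(argv: List[str], output_file: str, timeout: float = 30.0) -> bool:
    """Return True if the converter call finishes on its own and writes output.

    argv        -- the complete converter call as an argument list (hypothetical input)
    output_file -- the plain text file the converter is expected to create
    """
    try:
        result = subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,    # the converter must not wait for user input
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,             # a converter that blocks is treated as unsuitable
        )
    except (OSError, subprocess.TimeoutExpired):
        # The program could not be started, or it did not finish in time.
        return False

    out = Path(output_file)
    return result.returncode == 0 and out.is_file() and out.stat().st_size > 0
```

This only confirms that the converter terminates and writes something; whether the result is usable plain text still has to be checked by looking at the output file.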
What Is A Filter?
A filter configuration consists of:
- A list of the file types this filter supports (e.g. "htm html").
- A string that specifies the application call for converting those
types into plain text (e.g. html2text "%f" "%o"). We refer to this
string as the 'Filter Conversion String'.
- Optionally, the name of the application that can be used to open that
document type (e.g. "mozilla").
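The three parts of a filter configuration can be pictured as a small record. The Python sketch below is only an illustration: the names FilterConfig and find_filter are hypothetical and do not describe quaneko's internals, and matching extensions case-insensitively is an assumption.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class FilterConfig:
    extensions: List[str]                 # file types, e.g. ["htm", "html"]
    parse_command: str                    # the Filter Conversion String
    open_command: Optional[str] = None    # optional application to open the file


def find_filter(filters: List[FilterConfig], file_name: str) -> Optional[FilterConfig]:
    """Return the first filter whose extension list matches file_name, else None."""
    # Assumption: extensions are compared without the dot and ignoring case.
    ext = Path(file_name).suffix.lstrip(".").lower()
    for candidate in filters:
        if ext in candidate.extensions:
            return candidate
    return None


# Example values taken from this section.
EXAMPLE_FILTERS = [
    FilterConfig(["htm", "html"], 'html2text "%f" "%o"', "mozilla"),
]
```

With this list, find_filter(EXAMPLE_FILTERS, "index.html") returns the HTML filter, while a file with an unconfigured extension yields None.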
Filter Conversion Strings
The filter command to parse a file and convert it into plain text can be
configured as a string which contains %f for the data file
that is handed from quaneko to the converter and %o for
the output file.
Example: The string
pdf2text "%f" "%o"
is converted at runtime to:
pdf2text "/home/tux/file.pdf" "/home/tux/.qnk_tmp.txt"
If %o is omitted, quaneko assumes that the filter streams the
plain text to standard output (internally it adds ">%o" to the command).
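The substitution described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not quaneko's actual code: the function name expand_filter_command is hypothetical, and the fallback simply appends ">%o" as the text describes.

```python
import re


def expand_filter_command(template: str, data_file: str, output_file: str) -> str:
    """Replace %f and %o in a Filter Conversion String with real file names."""
    if "%o" not in template:
        # Assumed to mirror the documented behavior: redirect standard output.
        template = template + " >%o"
    values = {"%f": data_file, "%o": output_file}
    # Substitute both placeholders in a single pass over the template.
    return re.sub(r"%[fo]", lambda match: values[match.group(0)], template)


print(expand_filter_command('pdf2text "%f" "%o"',
                            "/home/tux/file.pdf", "/home/tux/.qnk_tmp.txt"))
# prints: pdf2text "/home/tux/file.pdf" "/home/tux/.qnk_tmp.txt"
```

Replacing both placeholders in one pass avoids touching a file name that happens to contain the characters %o.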
It is usually advisable to put quotation marks around %f and
%o. Otherwise you will run into problems with file names that
contain spaces.
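The effect of the quotation marks can be seen by splitting a filled-in command the way a POSIX shell would. The Python sketch below uses the standard shlex module for this; the helper name split_filled_command and the file name with a space are made up for illustration, and Windows command-line parsing differs in detail while showing the same basic problem.

```python
import shlex
from typing import List


def split_filled_command(template: str, data_file: str, output_file: str) -> List[str]:
    """Fill in %f and %o, then split the command into arguments (POSIX rules)."""
    filled = template.replace("%f", data_file).replace("%o", output_file)
    return shlex.split(filled)


spaced = "/home/tux/my file.pdf"      # hypothetical file name containing a space
tmp = "/home/tux/.qnk_tmp.txt"

# Without quotes, the data file arrives at the converter as two arguments.
print(split_filled_command("pdf2text %f %o", spaced, tmp))

# With quotes, the data file stays a single argument.
print(split_filled_command('pdf2text "%f" "%o"', spaced, tmp))
```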