		It supports all major office document formats such as:
.doc
.docx
.html
.odp
.ods
.odt
.otp
.ots
.ott
.pdf
.ppt
.pptx
.stc
.sti
.stw
.sxc
.sxi
.sxw
.tif
.txt
.xls
.xlsx
.xml
Several alternative backends are available: abiword, antiword, catdoc, soffice or wvWare. Use the one that best serves your needs and
is available on your platform. For instance, soffice is a very good choice as a document converter.
However, it is rather performance demanding. The simpler tools suffice most of the time but may
extract text of inferior quality.
To index Word documents (.doc) you will need to install one of the following:
antiword
abiword
catdoc
soffice
wvWare
To index .pdf files you need to install poppler-utils.
To index .ppt files you may select one of the following:
catdoc
ppthtml
soffice
To index these file types, you will need to install the following tools from Sourceforge:
docx2txt for .docx
pptx2txt for .pptx
xls2txt for .xls
.xlsx
configure.
You do not need to install anything in the browser to use this extension. The following instructions are for the administrator who installs the extension on the server.
Open configure, and open the "Extensions" section. "Extensions Operation and Maintenance" Tab -> "Install, Update or Remove extensions" Tab. Click the "Search for Extensions" button. Enter part of the extension name or description and press search. Select the desired extension(s) and click install. If an extension is already installed, it will not show up in the search results.
You can also install from the shell by running the extension installer as the web server user (be sure to run as the web server user, not as root!):
    cd /path/to/foswiki
    perl tools/extension_installer <NameOfExtension> install
If you have any problems, or if the extension isn't available in configure, then you can still install manually from the command-line. See https://foswiki.org/Support/ManuallyInstallingExtensions for more help.
configure before you can use the Contrib.
Check that antiword, abiword or wvHtml is in place: type antiword, abiword or wvHtml at the prompt and check that the command exists.
Check that pdftotext is in place: type pdftotext at the prompt and check that the command exists.
Check that ppthtml is in place: type ppthtml at the prompt and check that the command exists.
stringify some files (see below)
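The checks above can also be scripted. The following is a minimal sketch, not part of the extension: the function name check_backends is hypothetical, and the list of commands is simply the one named in this document. It relies only on the POSIX `command -v` builtin to test whether each helper is on the PATH of the current user.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the Contrib): report which of the
# given commands can be found on PATH.
# Prints one line per command: "<name>: found" or "<name>: missing".
check_backends() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            printf '%s: found\n' "$cmd"
        else
            printf '%s: missing\n' "$cmd"
        fi
    done
}

# Helpers named in this document; adjust the list to the backends you chose.
check_backends antiword abiword wvHtml pdftotext ppthtml
```

Run it as the web server user, since that user's PATH is the one that matters when attachments are indexed.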
Some users report problems with stringification: the stringifier script fails or takes too long on attachments. Sometimes this results from installation errors, especially in the installation of the stringification backends.
The stringify script gives you the opportunity to test the stringification in advance.
Usage: stringify file_name
The output shows which stringifier is used and the result of the stringification.
Example:
    stringify /path/to/foswiki/StringifierContrib/test/unit/StringifierContrib/attachement_examples/Simple_example.doc
    Simple example
    Keyword: dummy
    Umlaute: Größer, Überschall, Änderung
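To find attachments that fail or take too long, the same test can be run over several files with a time limit. This is a rough sketch under stated assumptions: the stringify script is assumed to be on the PATH (override it with the STRINGIFY variable), the 30 second limit is an arbitrary illustrative value, the function name try_stringify is hypothetical, and the `timeout` command from GNU coreutils is assumed to be installed.

```shell
#!/bin/sh
# Assumption: the stringify script is on PATH; otherwise set
# STRINGIFY=/path/to/stringify before running.
STRINGIFY=${STRINGIFY:-stringify}
# Assumption: 30 seconds is an arbitrary illustrative limit.
LIMIT=${LIMIT:-30}

# Hypothetical helper: stringify one file under a time limit and report
# whether it succeeded, failed, or was stopped by the limit.
try_stringify() {
    file=$1
    status=0
    timeout "$LIMIT" "$STRINGIFY" "$file" >/dev/null 2>&1 || status=$?
    case $status in
        0)   printf 'ok %s\n' "$file" ;;
        124) printf 'timeout %s\n' "$file" ;;   # exit code used by GNU timeout
        *)   printf 'failed %s (exit %s)\n' "$file" "$status" ;;
    esac
}

# Test every file given on the command line.
for f in "$@"; do
    try_stringify "$f"
done
```

Files reported as failed or timeout are candidates for checking the corresponding backend installation.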
Stringifier plugins are located in lib/Foswiki/Contrib/Stringifier/Plugins.
You can add new stringifier plugins by adding new files there. The minimum requirements are:
Inherit from Foswiki::Contrib::StringifierContrib::Base
Register the handler: __PACKAGE__->register_handler($application, $file_extension);
Implement the method $text = stringForFile($filename)
| Name | Version | Description | 
|---|---|---|
| File::Which | >0 | Required | 
| Module::Pluggable | >0 | Required | 
| Spreadsheet::ParseExcel | >0 | Required for .xls files | 
| Spreadsheet::XLSX | >0 | One of Spreadsheet::ParseXLSX or xlsx2csv is required for .xlsx files | 
| Encode | >0 | Required | 
| Error | >0 | Required | 
| catdoc | >0 | Optional | 
| ppthtml | >0 | Required | 
| pdftotext | >0 | Required for indexing .pdf files. Part of poppler-utils | 
| soffice | >0 | One of antiword, abiword, soffice or wvWare is required for .doc and .docx files | 
| antiword | >0 | One of antiword, abiword, soffice or wvWare is required for .doc files | 
| abiword | >0 | One of antiword, abiword, soffice or wvWare is required for .doc files | 
| wvWare | >0 | One of antiword, abiword, soffice or wvWare is required for .doc files | 
| html2text | >0 | One of html2text or lynx for indexing html files | 
| lynx | >0 | One of html2text or lynx for indexing html files | 
| odt2txt | >0 | Required for indexing OpenDocument and StarOffice documents | 
| xlsx2csv | >0 | One of Spreadsheet::ParseXLSX or xlsx2csv is required for .xlsx files | 
| tesseract | >0 | OCR for tiff files. Optional | 
| 16 Aug 2018: | (5.20) register more mime types to the text stringifier | 
| 09 Jan 2018: | (5.10) added support for tiff documents using tesseract | 
| 18 Sep 2017: | (5.00) make html-to-text converter pluggable | 
| 31 Jan 2017: | (4.40) improved XLSX stringifier | 
| 23 Jan 2017: | (4.30) added stringifier to index XLS using soffice | 
| 18 Oct 2015: | (4.20) removed dependency on File::MMagic; now using extension-based mime detection | 
| 01 Oct 2015: | (4.10) don't default to pass-through for non-supported document types; fixed unit tests | 
| 29 Sep 2015: | (4.00) added unicode support with Foswiki > 2.0 | 
| 22 Jul 2015: | (3.10) added support for stringification of ppt using catdoc as ppthtml isn't available on some distros | 
| 29 Aug 2014: | (3.00) added support for stringification using open/libreoffice | 
| 07 May 2012: | (2.20) added configuration parameter to specify the encoding of the output of each external helper in use | 
| 17 Oct 2011: | (2.10) using wvText instead of wvHtml now; encoding stringified files to the site's charset now; fixed unit tests to use utf8 exclusively | 
| 05 Sep 2011: | (2.00) added OpenDocument serializer; removed dependency left-over on Text::Iconv; added dependency on odt2txt; fixed defaults for wv serializer | 
| 01 Dec 2010: | (1.20) moved core from StringifierContrib to Stringifier not to disturb configure | 
| 12 Nov 2010: | (1.14) Foswiki:Main.PadraigLennon - Foswikitask:Item9311 | 
| 23 Oct 2010: | (1.12) made system fault-tolerant in case of missing dependencies for a given file type; doc cleanup -- Foswiki:Main.WillNorris | 
| 12 Feb 2010: | robust parsing of password protected XLS files | 
| 02 Oct 2009: | extracted from Foswiki:Extensions/KinoSearchContrib (MD) | 
| Author | Foswiki:Main.MarkusHesse, Foswiki:Main.SvenDowideit, Foswiki:Main.MichaelDaum & Foswiki:Main.AndrewJones | 
| Version | 5.20 | 
| Release | 16 Aug 2018 | 
| Description | Helper library to stringify binary document formats | 
| Repository | https://github.com/foswiki/StringifierContrib | 
| Copyright | © 2007, Foswiki:Main.MarkusHesse; © 2009-2018, Foswiki Contributors | 
| License | GPL (GNU General Public License) | 
| Home | Foswiki:Extensions/StringifierContrib | 
| Support | Foswiki:Support/StringifierContrib | 
 Copyright © by the contributing authors. All material on this site is the property of the contributing authors.