Substitute and extract information from content by using regular expressions
  Description 
This plugin lets you substitute and extract information from content by
using regular expressions. It provides three kinds of functions: 
-  FORMATLIST: manipulate a list of items; it is highly configurable in what constitutes a list and how items are extracted from it
-  SUBST, STARTSUBST/STOPSUBST: substitute a pattern in a chunk of text
-  EXTRACT, STARTEXTRACT/STOPEXTRACT: extract a pattern from a text
 
While the START-STOP versions of SUBST and EXTRACT work on inline text,
the normal versions process a source topic before including it into the current one.
  Syntax Rules 
  SUBST 
Syntax: 
%SUBST{topic="..." ...}% 
insert a topic by processing its content.
 
-  topic="...": name of the topic text to be processed
-  rev="...": revision of the topic to be processed (defaults to latest version)
-  text="...": text to be processed (takes precedence over 'topic')
-  pattern="...": pattern to be extracted or substituted
-  format="...": format expression or pattern substitute
-  header="...": header string prepended to output
-  footer="...": footer string appended to output
-  limit="<n>": maximum number of occurrences to extract or substitute, counted from the start of the text (defaults to 100000, that is, all hits)
-  skip="<n>": skip the first n occurrences
-  exclude="...": skip occurrences that match this regular expression
-  include="...": skip occurrences that don't match this regular expression
-  sort="on,off,alpha,num": order of the formatted items (default "off")
-  expand="on,off": toggle expansion of markup before filtering (defaults to on)
  STARTSUBST, STOPSUBST 
Syntax:
%STARTSUBST{...}% 
... 
%STOPSUBST%
Substitutes text given inline. See SUBST.
  EXTRACT 
Syntax: 
%EXTRACT{topic="..."  ...}% 
Extracts text from a topic. See SUBST.
  STARTEXTRACT, STOPEXTRACT 
Syntax:
%STARTEXTRACT{...}% 
... 
%STOPEXTRACT%
Extracts content given inline. See SUBST.
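The difference between the SUBST and EXTRACT families can be illustrated outside Foswiki. The following Python sketch is only an approximation and not the plugin's Perl implementation: the function names `render`, `subst` and `extract` are hypothetical, and it assumes that SUBST leaves non-matching text in place while EXTRACT returns only the formatted hits.

```python
import re


def render(match, fmt):
    """Replace $1, $2, ... in fmt with the groups captured by match."""
    return re.sub(r"\$(\d+)",
                  lambda ref: match.group(int(ref.group(1))) or "",
                  fmt)


def subst(text, pattern, fmt, limit=100000):
    """Assumed SUBST behaviour: rewrite hits, keep the surrounding text."""
    return re.sub(pattern, lambda m: render(m, fmt), text, count=limit)


def extract(text, pattern, fmt, limit=100000):
    """Assumed EXTRACT behaviour: return only the formatted hits."""
    hits = list(re.finditer(pattern, text))[:limit]
    return "".join(render(m, fmt) for m in hits)


if __name__ == "__main__":
    source = "bug 12 and bug 34"
    print(subst(source, r"bug (\d+)", "#$1"))
    print(extract(source, r"bug (\d+)", "#$1;"))
```

With the sample string, `subst` keeps the word "and" between the two rewritten hits, whereas `extract` drops everything that did not match.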
  FORMATLIST 
Syntax: 
%FORMATLIST{"<list>" ...}%
Formats a list of items. The <list> argument is separated into items by using
a split expression; each item is matched against a pattern and then formatted
using a format string, with a separator string placed between items; the result is
prepended with a header and appended with a footer if the list is not empty. 
-  <list>: the list 
-  tokenize="...": regex to tokenize the list before splitting it up; tokens are inserted back again after the split stage has been passed
-  split="...": the split expression (default ",")
-  replace="key1=value1,key2=value2, ...": preprocesses each list item by replacing the given keys with their values
-  pattern="...": pattern applied to each item (default "\s(.*)\s")
-  format="...": the format string for each item (default "$1")
-  header="...": header string
-  footer="...": footer string
-  separator="...": string to be inserted between list items
-  lastseparator="...": string separating the last item from the rest of the list
-  null="...": the format string to render the empty list
-  hideempty="on,off": when set to "on" then empty list items will not be added to the result (empty in the sense of ''); set this to "off" to still add them (default "on")
-  limit="...": max number of items to be taken out of the list (default "-1")
-  skip="...": number of list items to skip, not adding them to the result
-  sort="on,off,alpha,num,nocase": order of the formatted items (default "off")
-  reverse="on,off": reverse the sort order of the list
-  unique="on,off": remove duplicates from the list
-  exclude="...": remove list items that match this regular expression
-  include="...": remove list items that don't match this regular expression
-  selection="...": regular expression that a list item must match to be "selected"; if this matches, the $marker is inserted
-  marker="...": string to be inserted when the selection regex matches; it is inserted at the position of $marker as indicated in format
-  map="key1=value1,key2=value2, ...": establishes a key-value hash available via the $map() variable (see also the replace parameter for a means to preprocess list items automatically)
The pattern string should group matching substrings in the list item, which
you can refer to by using $1, $2, ... in the format string. Any format string
(format, header, footer) may contain the variables $percnt, $nop,
$dollar and $n. The variable $index refers to the position number within
the list being formatted; $hits refers to the total number of matched list
elements; $count expands to the total number of elements in the list;
$marker is set if the selection regular expression matches the
current item. The $map(key) macro returns the value for "key" as specified in
the map argument.
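The stages described above (split, match, format, join, wrap) can be sketched in a few lines of Python. This is a rough model, not the plugin's code: the function name `format_list` is hypothetical, only a subset of the parameters is covered, and the default pattern used here is an assumption chosen to trim surrounding whitespace.

```python
import re


def format_list(text, split=",", pattern=r"\s*(.*?)\s*$", fmt="$1",
                separator=", ", header="", footer="", null=""):
    """Rough emulation of the FORMATLIST stages: split, match, format, join."""
    items = []
    for raw in re.split(split, text):
        match = re.match(pattern, raw)
        if not match:
            continue
        out = fmt
        # Substitute highest group numbers first so "$10" is not hit by "$1".
        for i in range(match.re.groups, 0, -1):
            out = out.replace(f"${i}", match.group(i) or "")
        if out == "":
            continue  # mimics hideempty="on"
        items.append(out)
    if not items:
        return null  # header and footer are only added to non-empty lists
    return header + separator.join(items) + footer


if __name__ == "__main__":
    print(format_list("apple, pear ,plum", fmt="<$1>",
                      separator=" | ", header="[", footer="]"))
```

Variables such as $index, $hits, $count, $marker and $map() are deliberately left out to keep the sketch short.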
  MAKEINDEX 
Syntax: 
%MAKEINDEX{"<list>" ...}%
Formats a list into a multi-column index like in MediaWiki's category topics.
MAKEINDEX inserts capitals as headlines for groups of sorted items. It tries to balance all
columns equally and keeps track of breaks to prevent "schusterkinder", that is, to avoid 
isolated headlines at the bottom of a column.
Parameters: 
-  <list>: the list of items
-  split="...": the split expression to separate the <list> into items (default ",")
-  pattern="...": pattern applied to each item (default "(.*)")
-  cols="...": maximum number of columns to split the list into; defaults to automatic, that is, the number of columns is determined by colwidth and colgap; in general it is better to specify colwidth and colgap rather than hard-coding the number of columns; this lets the viewport of the browser/device decide on the number of columns dynamically based on the available space
-  colwidth="...": maximum width of a column, defaults to 18em
-  colgap="...": size of the gap between columns, defaults to 2em
-  format="...": format of each list item (default "$item")
-  sort="on,off,alpha,num,nocase": sort the list (default "on")
-  unique="on/off": remove duplicates (default "off")
-  exclude="...": pattern to check against items in the list to be excluded
-  include="...": pattern to check against items in the list to be included
-  reverse="on/off": reverse the list (default "off")
-  header="...": format string to prepend to the result
-  footer="...": format string to be appended to the result
-  transliterate="on/off/<mapping>": influences the way sorting and grouping is handled: either a boolean switch to enable/disable decoding Unicode characters into their nearest Latin character (using CPAN:Text::Unidecode), or a custom mapping list "<source1>=<target1>, <source2>=<target2>, ..." to map a source string to a given target string (default "on")
As in FORMATLIST, the format parameter can make use of $1, $2, ... variables
to refer to the groupings defined in the pattern argument (as in
pattern="(.*);(.*);(.*)").
The first matched grouping $1 is used as the $item to sort the list and is optionally transliterated.
In addition, header and footer may contain the $anchors variable, which expands
to a navigation for jumping to the groups within the index.
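The grouping step that MAKEINDEX performs, sorting the items and placing each group under its initial capital, can be sketched as follows. This is an illustration under assumptions, not the plugin's algorithm: the function name `make_index` is hypothetical, sorting is done case-insensitively, and column balancing and transliteration are left out.

```python
from itertools import groupby


def make_index(text, split=","):
    """Sort the items and group them under their upper-cased first letter."""
    items = sorted((item.strip() for item in text.split(split) if item.strip()),
                   key=str.casefold)
    return [(letter, list(group))
            for letter, group in groupby(items, key=lambda s: s[0].upper())]


if __name__ == "__main__":
    for letter, group in make_index("banana, apple, Avocado, cherry"):
        print(letter, group)
```

Each tuple corresponds to one headline and the items listed beneath it.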
  Examples 
One of the uses of this plugin is to extract data from tables, which is useful for creating "database-like" wiki applications where data is stored in Foswiki tables. While it is certainly possible to do that without this plugin, the plugin makes these requests easier to create and maintain. Note, however, that best practice is to store database-like information using DataForms, so that you don't need to parse the format of the data repeatedly to extract its records.
The table:
| Pos | Description | Hours |
| 1 | onsite troubleshooting | 3 |
| 2 | normalizing data to new format | 10 |
| 3 | testing server performance | 5 |
You type:
%EXTRACT{topic="%TOPIC%" expand="off" 
  pattern="^\|\s\s(.*?)\s*\|\s*(.*?)\s*\|\s*(.*?)\s*\|" 
  format="   * it took $3 hours $2$n"
  skip="1"
}%
Expected result (simulated):
 
-  it took 3 hours onsite troubleshooting 
-  it took 10 hours normalizing data to new format 
-  it took 5 hours testing server performance 
Actual result (this site):
-  it took 3 hours onsite troubleshooting
-  it took 10 hours normalizing data to new format
-  it took 5 hours testing server performance
-  it took  hours added rev param to %SUBST and %EXTRACT
-  it took  hours improved sorting of lists, i.e. with numeric values
-  it took  hours rewrite MAKEINDEX from using tables to css3 multicolumn
-  it took  hours don't fallback to unidecode if an explicit mapping is given; don't use Foswiki's internal anchor creator as it does not support unicode
-  it took  hours fixing deprecated unescaped left brace in regexes
-  it took  hours transliterate/normalize unicode strings before sorting them in MAKETEXT
-  it took  hours fixed paging through lists in FORMATLIST
-  it took  hours modernized plugin by using a proper OO-core; fixed processing of tokenize properly; added replace parameter for FORMATLIST; fixed the plugin calling Foswiki::Func::expandCommonVariables() itself unnecessarily
-  it took  hours fixed SUBST macro topic param processing embedded META
-  it took  hours fixed parsing zero values in lists (by Grzegorz Marszalek)
-  it took  hours fixed wrapper for non-official api call to getAnchorName on foswiki-1.1
-  it took  hours ease tokenize; forward compatibility for newer foswikis
-  it took  hours added include counterpart to already existing exclude params; fixed SUBST not to forget about the non-matching tail of a char sequence
-  it took  hours added $anchors to MAKEINDEX (by Dirk Zimoch); added nocase option to FORMATLIST (by Dirk Zimoch); fixed null/empty string match in FORMATLIST
-  it took  hours sorting a list before, not after, formatting it in FORMATLIST
-  it took  hours added MAKEINDEX, added lazy compilation
-  it took  hours using registerTagHandler() as far as possible; enhanced parameters to EXTRACT and SUBST
-  it took  hours fixed SUBST, added skip parameter to FORMATLIST
-  it took  hours fixed limit parameter in FORMATLIST
-  it took  hours added use strict; and fixed revealed errors
-  it took  hours fixed SUBST not to cut off the rest of the text
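To see how the three capture groups and skip="1" interact in the table example, the extraction can be replayed in Python. This is a sketch under assumptions: the helper `extract_rows` is hypothetical, and the regular expression is a simplified stand-in for the one in the macro call; it uses `[ \t]*` instead of `\s*` so that a match cannot run across a line break.

```python
import re

TABLE = """\
| Pos | Description | Hours |
| 1 | onsite troubleshooting | 3 |
| 2 | normalizing data to new format | 10 |
| 3 | testing server performance | 5 |
"""

# One cell: optional padding, lazily captured content, optional padding, pipe.
CELL = r"[ \t]*(.*?)[ \t]*\|"
ROW = re.compile(r"^\|" + CELL * 3, re.MULTILINE)


def extract_rows(text, fmt, skip=0):
    """Format every table row after the first `skip` hits."""
    lines = []
    for match in list(ROW.finditer(text))[skip:]:
        out = fmt
        for i in (3, 2, 1):
            out = out.replace(f"${i}", match.group(i))
        lines.append(out.replace("$n", "\n"))
    return "".join(lines)


if __name__ == "__main__":
    # skip=1 drops the header row, which is the first hit.
    print(extract_rows(TABLE, "   * it took $3 hours $2$n", skip=1), end="")
```

Because the pattern is applied to the whole topic, every other table in the topic is a candidate as well, which is presumably why the actual result above also lists rows from the change history.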
Use CSS tags to format text comments as tabular data (e.g., to allow sorting).
The comments:
-- Michael Daum on 22 Aug 2005
-- Michael Daum on 22 Aug 2005
%EXTRACT{
   topic="%TOPIC%" expand="off"
   pattern=".div class=\"text\">.*?[\r\n]+(.*?)[\r\n]+(?:.*?[\r\n]+)+?-- (.*?) on (.*?)[\r\n]+"
   format="| $3 | $2 | $1 ... |$n" header="|*Date*|*Author*|*Headline*|$n"
}%
Expected result (simulated):
| Date | Author | Headline |
| 22 Aug 2005 | Michael Daum | This is the first comment. ... |
| 22 Aug 2005 | Michael Daum | This is the second comment. ... |
Actual result (this site):
| Date | Author | Headline |
| 22 Aug 2005 | Michael Daum | This is the first comment. ... |
| 22 Aug 2005 | Michael Daum | This is the second comment. ... |
  MAKEINDEX example 1: creating an index from a chunk of text 
Compare with Philosophy articles needing attention.
  MAKEINDEX example 2: creating an index for a search result 
 
  Installation Instructions 
You do not need to install anything in the browser to use this extension. The following instructions are for the administrator who installs the extension on the server.
Open configure, and open the "Extensions" section. "Extensions Operation and Maintenance" Tab -> "Install, Update or Remove extensions" Tab. Click the "Search for Extensions" button.
Enter part of the extension name or description and press search. Select the desired extension(s) and click install. If an extension is already installed, it will not show up in the search results.
You can also install from the shell by running the extension installer as the web server user (not as root):
cd /path/to/foswiki
perl tools/extension_installer <NameOfExtension> install
If you have any problems, or if the extension isn't available in configure, then you can still install manually from the command-line. See https://foswiki.org/Support/ManuallyInstallingExtensions for more help.
  Dependencies 
| Name | Version | Description | 
|---|---|---|
| Text::Unidecode | >=1.27 | Optional | 
  Change History 
| 25 Oct 2018: | added rev param to %SUBST and %EXTRACT |
| 08 Oct 2018: | added colwidth and colgap to %MAKEINDEX; fixed numerical sorting of lists |
| 01 Jun 2018: | improved sorting of lists, i.e. with numeric values |
| 05 Mar 2018: | css fixes for MAKEINDEX |
| 30 Aug 2017: | rewrite MAKEINDEX from using tables to css3 multicolumn |
| 05 Sep 2016: | added $hits to FORMATLIST to distinguish it from $count and $index |
| 29 Apr 2016: | don't fallback to unidecode if an explicit mapping is given; don't use Foswiki's internal anchor creator as it does not support unicode |
| 20 Apr 2016: | added transliterate parameter, including custom mappings; upgraded Text::Unidecode fallback shipped with this plugin |
| 31 Aug 2015: | fixing deprecated unescaped left brace in regexes |
| 17 Jul 2015: | fixed compatibility with Foswiki-2.x |
| 10 Apr 2014: | transliterate/normalize unicode strings before sorting them in MAKETEXT |
| 19 Jun 2012: | added lastseparator (by Foswiki:Main/OliverKrueger); fixed paging when used together with include and exclude parameters |
| 15 May 2012: | fixed paging through lists in FORMATLIST |
| 05 May 2012: | fixed lists not being processed properly before iterating over them in FORMATLIST and MAKEINDEX |
| 19 Apr 2012: | modernized plugin by using a proper OO-core; fixed processing of tokenize properly; added replace parameter for FORMATLIST; fixed the plugin calling Foswiki::Func::expandCommonVariables() itself unnecessarily |
| 10 Jan 2012: | fixed filtering zero; fixed counting list items without formatting them; added hideempty parameter to enable/disable rendering empty list items |
| 29 Sep 2011: | fixed SUBST macro topic param processing embedded META |
| 25 Aug 2011: | fixed perl rookie error initializing defaults |
| 14 Jul 2011: | fixed parsing zero values in lists (by Grzegorz Marszalek) |
| 06 Apr 2011: | fixed SUBST removing everything after the last match |
| 23 Jul 2010: | fixed wrapper for non-official api call to getAnchorName on foswiki-1.1 |
| 07 Jun 2010: | fixed expanding standard escapes ($n, $percent, ...); improved examples in docu |
| 12 Feb 2010: | ease tokenize; forward compatibility for newer foswikis |
| 17 Nov 2009: | added tokenize pattern for FORMATLIST; fixed potential deep recursion in SUBST/EXTRACT |
| 14 Sep 2009: | added include counterpart to already existing exclude params; fixed SUBST not to forget about the non-matching tail of a char sequence |
| 17 Apr 2009: | converted to foswiki, added numerical sorting to MAKETEXT |
| 08 Oct 2008: | added $anchors to MAKEINDEX (by Dirk Zimoch); added nocase option to FORMATLIST (by Dirk Zimoch); fixed null/empty string match in FORMATLIST |
| 20 Aug 2008: | added selection and marker to FORMATLIST, similar in use as VarWEBLIST |
| 03 Jul 2008: | sorting a list before, not after, formatting it in FORMATLIST |
| 08 May 2008: | added 'text' parameter to SUBST and EXTRACT; fixed SUBST as it was pretty useless before |
| 07 Dec 2007: | added MAKEINDEX, added lazy compilation |
| 14 Sep 2007: | added sorting for EXTRACT and SUBST |
| 02 May 2007: | using registerTagHandler() as far as possible; enhanced parameters to EXTRACT and SUBST |
| 05 Feb 2007: | fixed escapes in format strings; added better default value for max number of hits to prevent deep recursions on bad regular expressions |
| 22 Jan 2007: | fixed SUBST, added skip parameter to FORMATLIST |
| 18 Dec 2006: | using registerTagHandler for FORMATLIST |
| 13 Oct 2006: | fixed limit parameter in FORMATLIST |
| 31 Aug 2006: | added NO_PREFS_IN_TOPIC |
| 15 Aug 2006: | added use strict; and fixed revealed errors |
| 14 Feb 2006: | moved in FORMATLIST from the Foswiki:Extensions/NatSkinPlugin; added escape variables to format strings |
| 06 Dec 2005: | fixed SUBST not to cut off the rest of the text |
| 09 Nov 2005: | fixed deep recursion using expand="on" |
| 22 Aug 2005: | Initial version; added expand toggle |