{{ transformations:SanitizeAction.png}}
====== SANITIZE TEXT ======
Category: Transform / Advanced\\

\\ 
=====Description=====
This action removes "invisible" characters that are usually unwanted in text values because they can cause mismatches and incorrect merges:

  * Hidden system characters
  * Tabs
  * Line breaks
  * Leading spaces
  * Trailing spaces
  * Repeating spaces

Non-text values (numbers, symbols, etc.) are not affected by this action.\\


\\ 
=====Use cases=====
  * Apply //Sanitize text// to the text columns used in a merge action, just before performing the merge, so that "hidden" characters don't prevent proper matches.
  * Remove markup tags from XML- or HTML-based files, leaving the plain text for downstream processing.
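
The merge use case can be illustrated outside EasyMorph with a minimal Python sketch. The key values and the helper names ''matching_keys'' and ''clean'' are hypothetical and exist only for this illustration; they are not part of the action itself.

<code python>
# Hypothetical key columns from two tables that are about to be merged.
left_keys = ["ACME", "Globex"]
right_keys = ["ACME ", "Globex\t"]  # trailing space and trailing tab


def matching_keys(left, right):
    """Return the keys from `left` that have an exact match in `right`."""
    right_set = set(right)
    return [key for key in left if key in right_set]


def clean(value):
    """Remove tabs, then trim leading and trailing spaces."""
    return value.replace("\t", "").strip(" ")


# Without cleaning, the invisible characters prevent every match.
unmatched_result = matching_keys(left_keys, right_keys)

# After cleaning the right-hand keys, both values match.
matched_result = matching_keys(left_keys, [clean(k) for k in right_keys])
</code>

The values look identical on screen, which is why this kind of mismatch is easy to miss until the merge returns fewer rows than expected.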

\\ 
=====Action settings=====
^Setting^Description^
|Remove system characters|When checked, ASCII characters 0-31 are removed, except for //tab//, //carriage return// and //line feed// characters.|
|Tabs|Select how tab characters embedded in the text will be handled.  Options: //Do nothing//, //Remove//, //Remove repeating//,\\ and //Replace with spaces//.|
|Line breaks|Select how line breaks embedded in the text will be handled.  Options: //Do nothing//, //Remove//, //Remove repeating//,\\ and //Replace with spaces//.|
|Remove ASCII FE-FF|When checked, the characters with ASCII codes 0xFE (hexadecimal, 254 decimal) and 0xFF (hexadecimal, 255 decimal) will be removed.|
|Trim leading spaces|When checked, whitespace occurring at the start of text will be removed.|
|Trim trailing spaces|When checked, whitespace occurring at the end of text will be removed.|
|Remove repeating spaces|When checked, each run of two or more adjacent spaces will be replaced with a single space.|
|Remove XML/HTML tags|When checked, all XML and HTML markup tags will be removed.|
|Sanitize columns|Select whether to sanitize all columns, or selected columns.  Options: //Sanitize all columns// or //Sanitize only\\ selected columns// (and select which columns to process).|
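
As a rough illustration of the //Remove system characters// and //Tabs// settings, the following Python sketch mimics the behavior described in the table. It is not EasyMorph's implementation. The function names are hypothetical, and the exact semantics of //Remove repeating// (collapse a run of tabs into one tab) and //Replace with spaces// (one space per tab) are assumptions.

<code python>
import re

# Characters 0-31, excluding tab (9), line feed (10) and carriage return (13).
SYSTEM_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def remove_system_characters(text):
    """Drop control characters but keep tabs and line breaks."""
    return SYSTEM_CHARS.sub("", text)


def handle_tabs(text, mode):
    """Apply one of the four tab-handling options to a text value."""
    if mode == "Do nothing":
        return text
    if mode == "Remove":
        return text.replace("\t", "")
    if mode == "Remove repeating":
        # Assumption: a run of tabs collapses into a single tab.
        return re.sub(r"\t{2,}", "\t", text)
    if mode == "Replace with spaces":
        # Assumption: each tab becomes exactly one space.
        return text.replace("\t", " ")
    raise ValueError(f"Unknown mode: {mode}")
</code>

For instance, ''remove_system_characters("a\x00b\tc")'' keeps the tab and returns ''"ab\tc"'', because tabs are handled by their own setting.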

\\ 
=====Remarks=====
The //Remove repeating spaces// option collapses repeating spaces //anywhere// within the text, including runs of leading and trailing spaces.  Every occurrence within a text value is replaced, so a value that contains several runs of repeated spaces is cleaned in full.\\
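
A minimal Python sketch of this behavior, assuming that a run of spaces is collapsed to one space regardless of its position (the function name is hypothetical):

<code python>
import re


def remove_repeating_spaces(text):
    """Replace every run of two or more spaces with a single space."""
    return re.sub(r" {2,}", " ", text)


# Runs at the start, in the middle and at the end are all collapsed.
collapsed = remove_repeating_spaces("  a  b   c  ")
</code>

Under this assumption a single leading or trailing space remains after collapsing; removing it entirely is the job of the //Trim leading spaces// and //Trim trailing spaces// settings.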

\\ 
=====Examples=====

====Example #1====
>Clean out all unneeded text characters.

===Before (source data)===
(raw text shown for clarity)
<code>
Sample Text
"  2 Leading spaces"
"2 Trailing spaces  "
"<b>Bold HTML tags</b>"
"2 spaces here->  and 3 spaces here->   ."
</code>

===After (result table)===
^Sample Text^
|**2 Leading spaces**|
|**2 Trailing spaces**|
|**Bold HTML tags**|
|**2 spaces here-> and 3 spaces here-> .**|

===Action parameters===
>Remove system characters (ASCII 0-31 except TAB, CR, LF)
>Tabs: Remove
>Line breaks: Remove
>Options:  Select all options
>Sanitize all columns
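
The result above can be reproduced approximately with a short Python sketch. This is an illustration under stated assumptions, not the action's actual logic: the ''sanitize'' function, the order of the steps, and the simple tag pattern are all hypothetical.

<code python>
import re

SYSTEM_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")  # 0-31 except TAB, LF, CR
MARKUP_TAG = re.compile(r"<[^>]+>")  # simplistic pattern; real markup can be trickier


def sanitize(text):
    """Apply all options from Example #1 to a single text value."""
    text = SYSTEM_CHARS.sub("", text)                # Remove system characters
    text = text.replace("\t", "")                    # Tabs: Remove
    text = text.replace("\r", "").replace("\n", "")  # Line breaks: Remove
    text = MARKUP_TAG.sub("", text)                  # Remove XML/HTML tags
    text = re.sub(r" {2,}", " ", text)               # Remove repeating spaces
    return text.strip(" ")                           # Trim leading/trailing spaces


source = [
    "  2 Leading spaces",
    "2 Trailing spaces  ",
    "<b>Bold HTML tags</b>",
    "2 spaces here->  and 3 spaces here->   .",
]
result = [sanitize(value) for value in source]
</code>

With the sample values from the //Before// block, ''result'' contains the same four strings as the //After// table.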

\\ 
=====Community examples=====
  * [[https://community.easymorph.com/t//2008/2|“Printed” Text File: Could EasyMorph import this?]] ([[https://community.easymorph.com/uploads/short-url/kIb1qqOJb9WK6D1N1jFZdnHzC46.morph|Project]]; Module: //Parse Group//; Group: //Tab 1//; Table: //Header (3)//; Action position: //5//)
  * [[https://community.easymorph.com/t//2160/1|How to pull data from web APIs with pagination]] ([[https://community.easymorph.com/uploads/short-url/dvCSpcEDXYZ8aB0B2gtnt7qulTF.morph|Project]]; Module: //Main//; Group: //Group 1//; Table: //Query API with pagination//;\\ Action position: //5//)

\\ 