
Regex capture groups


I would like to be able to calculate a new field from a regex capture group applied to another field.

For instance, say I have a field named “HTML” that contains HTML with a variety of tags, and I’d like to pull out the title from the title tag. I’d like a transformation step where I enter something like “<TITLE>(.*)</TITLE>” (yes, the regex could be improved) and it fills a new field “TITLE” with the capture group. So if row one had <TITLE>My Webpage</TITLE>, the TITLE field would be “My Webpage”; if the next row had <TITLE>Another page</TITLE>, the TITLE field would be “Another page”, and so on.
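To make the request concrete, here is a minimal sketch in Python of the behavior described above. It is not how the tool works internally; `extract_title` and the `rows` list are hypothetical names used only for illustration, and the pattern is a slightly tightened variant of the one in the post (non-greedy, case-insensitive).

```python
import re

# Non-greedy variant of the pattern from the post; group 1 is the title text.
TITLE_RE = re.compile(r"<TITLE>(.*?)</TITLE>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the text of capture group 1, or None if there is no title tag."""
    match = TITLE_RE.search(html)
    return match.group(1) if match else None


# Hypothetical rows standing in for a table with an "HTML" field.
rows = [
    {"HTML": "<HTML><TITLE>My Webpage</TITLE></HTML>"},
    {"HTML": "<HTML><TITLE>Another page</TITLE></HTML>"},
]

# Fill the new "TITLE" field from the capture group, row by row.
for row in rows:
    row["TITLE"] = extract_title(row["HTML"])
```

The key point is `match.group(1)`: `group(0)` would return the whole match including the tags, while `group(1)` returns only what the parentheses captured.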


Maybe I’m missing something, but it sounds like the “Regular expression” transformation does exactly this. Is there a case where it doesn’t do what is needed?


How would I pull out just what’s in between the open and close title tag?

I can see how to pull in the ENTIRE match, but I only want the CAPTURE GROUP, and that doesn’t seem to be how the regular expression transformation works: the new field ends up with the whole match, tags included.

Instead of this I just want the title field to have “lkajsdf”, “lakjsdafalskjdf”, etc.


Oh, now I see what you are talking about. Point taken.


This will be added in 3.8 with three new capture modes: Matches only, Groups only, and Matches and groups.
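As a rough illustration of what the three modes could mean, here is a sketch in Python. This is an assumption about the semantics, not the tool's implementation: the `capture` function and the mode strings are hypothetical, and the thread does not say how the tool combines matches and groups in its output.

```python
import re


def capture(pattern, text, mode):
    """Return captured values for the first match, depending on the mode.

    The mode names are hypothetical stand-ins for the three modes named above.
    """
    match = re.search(pattern, text)
    if match is None:
        return []
    if mode == "matches":
        return [match.group(0)]                    # whole match only
    if mode == "groups":
        return list(match.groups())                # capture groups only
    if mode == "matches_and_groups":
        return [match.group(0), *match.groups()]   # whole match, then groups
    raise ValueError(f"unknown mode: {mode}")


pattern = r"<TITLE>(.*)</TITLE>"
text = "<TITLE>lkajsdf</TITLE>"

whole = capture(pattern, text, "matches")
groups = capture(pattern, text, "groups")
both = capture(pattern, text, "matches_and_groups")
```

Under this reading, "Groups only" is the mode that yields just “lkajsdf” for the example from earlier in the thread.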