Capture Data on PDP if URL Provided

Hi. Does EM have the functionality to capture the results of a Product Detail Page if a URL is provided in an excel or csv file? For example, I have the following URL links and would like to capture what is presented on the site.

If you have a list of URLs, you can use iterations to loop across the lists and in each iteration, use the "Download file" action, to download the HTML page returned by each URL.

Note that it will only save the URL page without the images or any other linked content.

Also, check this topic if you need to parse a website and download content.

@dgudkov @JWelch ChatGPT is surprisingly good and quite accurate for these types of tasks. I did a quick test on the URL:

ChatGPT has a new thing called Assistants and Anthropic has one called Tools that can likely be leveraged via an API call. Assuming this is true, you can iterate URL's though the API and get the product details.

While GenAI is new and in the hype cycle, data extraction from unstructured documents seems to be a strong capability. I am diligently working to understand how EasyMorph and GenAI can work together. I think there is some magic there that will surface soon.



@dgudkov @JWelch Thought on this more and I wasn't very smart in my approach. This is how I am thinking now and I think it is quite workable:

1.Create a connector to connect to Google Search API and a connector for ChatGPT
2. Send the product URL to the search API using the Web Request action
3. Collect the results
4. Send those results to the Ask ChatGPT action using JSON inside of the Text option for the Body
5. Include instructions in the prompt about what you want ChatGPT to extract and then include the search results using a parameter.

Assuming this works as planned, you can add iteration to cycle through multiple product pages. I would be interested to hear if it works.

Additional information on the Ask ChatGPT action: :bulb: Experimental action: Ask ChatGPT - General Q&A - EasyMorph Community - Data preparation professionals and enthusiasts

Thank you so much for the additional information. We will try it out over the next week and report back.

cc: @ceverhart

@JWelch @ceverhart

I couldn't resist the fun, so I worked through a concept and proved this out yesterday and it works pretty well. Some takeaways:

  1. When I use our internally developed chatbot running ChatGPT 4o and manually instruct it, the bot returns more details
  2. Using the exact same Google Search API, which means there is likely something in the application code. We are debugging now to see if we can understand what is going on. (We used a chatbot accelerator, so don't know all the code)
  3. @dgudkov I think there could be some interesting use cases here where EM could iterate through a list of URL's, scrape that content and get ChatGPT to summarize. Looking forward, and to a certain extent already here, EM could be the glue between AI agents orchestrating flows automatically based on the inputs.

Still trying to wrap my head around it, but I think there is real opportunity here to leverage EM in collaboration with GenAI.

Thanks for putting up with me. The exercise was a fun diversion!

Disclaimer on the screenshot below: I am far removed from day-to-day development, so the JSON parsing is pretty ugly.