Login to website and download

Is there an (easy) way with EasyMorph to open a website URL, log in to the website, and then download a CSV file from it?
Or should this be handled with external code?

It depends on how authentication is done on the website. If authentication is cookie-based and done using a web form, then it can be handled with EasyMorph relatively easily:

Step 1: Understand the login form

  1. Look at the source code of the login page and find the authentication HTML form. Here is an example of such a form (taken from a random internet search and simplified):
<form action="login.php" method="post">

    <label for="uname"><b>Username</b></label>
    <input type="text" placeholder="Enter Username" name="uname" required>

    <label for="psw"><b>Password</b></label>
    <input type="password" placeholder="Enter Password" name="psw" required>

    <button type="submit">Login</button>

</form>
  2. In the form, find the action URL and the HTTP method. For instance, in the form above the action URL path is login.php and the HTTP method is POST.

  3. Also, find the input names for login and password. In the example above they are uname and psw respectively.
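If the page source is long, the same details can be pulled out with a few lines of code outside EasyMorph. The following is only a rough sketch using Python's standard `html.parser`; the `FormInspector` class is a hypothetical helper, and the HTML is the simplified example form from above, not a real website.

```python
from html.parser import HTMLParser

# Simplified login form from the example above.
LOGIN_PAGE = """
<form action="login.php" method="post">
    <input type="text" placeholder="Enter Username" name="uname" required>
    <input type="password" placeholder="Enter Password" name="psw" required>
    <button type="submit">Login</button>
</form>
"""


class FormInspector(HTMLParser):
    """Collects the action, method and input names of the first form found."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = None
        self.inputs = {}  # input name -> value attribute ("" when absent)
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
            # HTML forms default to GET when no method is given.
            self.method = (attrs.get("method") or "get").upper()
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.inputs[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


inspector = FormInspector()
inspector.feed(LOGIN_PAGE)
print(inspector.action, inspector.method, list(inspector.inputs))
# prints: login.php POST ['uname', 'psw']
```

Hidden inputs (for example a CSRF token) would show up in `inspector.inputs` as well, together with their values.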

Step 2: Configure Web Request action for logging in

  1. Configure a “Web location” connector. The URL should be the website URL, and the authentication set to “None”.

  2. Create a “Web request” action. In the action:

    • Set the HTTP request method. In the form above it’s POST.
    • Set the request path to the login action path. In the form above it’s login.php.
    • Set request body to “Form” and configure two form values for username and password.
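For readers who want to see what such a request amounts to at the HTTP level, here is a minimal Python sketch of the equivalent form POST. It is not how EasyMorph implements the action; `BASE_URL` is a made-up website address, `build_login_request` is a hypothetical helper, and the field names come from the example form. The request is only built, not sent.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://example.com/"  # hypothetical website URL
LOGIN_PATH = "login.php"           # action path from the example form


def build_login_request(username, password):
    """Build (but do not send) the form POST a browser would submit."""
    form_values = {"uname": username, "psw": password}
    body = urllib.parse.urlencode(form_values).encode("utf-8")
    return urllib.request.Request(
        urllib.parse.urljoin(BASE_URL, LOGIN_PATH),
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_login_request("alice", "secret")
print(request.get_method(), request.full_url, request.data)
# prints: POST https://example.com/login.php b'uname=alice&psw=secret'
```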

For the web form above a sample web request is below:

Step 3: Configure Web Request action for downloading a file

  1. Create another “Web request” action. In the action:

    • Set the HTTP request method to GET.
    • Set the request path to the URL of the file to download
    • In the “Response” tab of the action, choose “Save response body to file” and provide a file location.

See screenshot below:

How it works

When you run the authentication action, EasyMorph will submit the webform as if it was submitted from the website. If the request is successful, the website will respond with an HTML page and will set a session cookie.

The download request will send a GET request with the session cookie attached and save the response body (the requested file) into a file.
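The same two-request flow can be sketched in Python to show why both requests have to share one cookie store. This is an illustration under assumptions, not EasyMorph's internals: `make_session` and `login_and_download` are hypothetical helpers, and the in-memory `CookieJar` stands in for the cookie storage that lives for one run.

```python
import shutil
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def make_session():
    """Create an opener with an in-memory cookie jar.

    The jar plays the role of the cookie store for one run: once it is
    discarded, the session cookie is gone and a new login is needed.
    """
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login_and_download(opener, base_url, login_path, form_values, file_path, dest):
    """Log in, then download a file, using the same opener for both requests."""
    body = urllib.parse.urlencode(form_values).encode("utf-8")

    # Passing data makes this a POST; cookies from the response go into the jar.
    with opener.open(urllib.parse.urljoin(base_url, login_path), data=body) as resp:
        resp.read()

    # No data means GET; the opener attaches the stored session cookie.
    with opener.open(urllib.parse.urljoin(base_url, file_path)) as resp:
        with open(dest, "wb") as out:
            shutil.copyfileobj(resp, out)


# Example call (hypothetical URL and paths, not executed here):
# opener, jar = make_session()
# login_and_download(opener, "https://example.com/", "login.php",
#                    {"uname": "alice", "psw": "secret"},
#                    "files/report.csv", "report.csv")
```

Calling `make_session()` again between the two requests would give the download an empty jar, which is the code equivalent of running the two actions in separate runs.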

Important: The cookie is temporarily saved in the “Web location” connector and stays there only until project execution stops. Therefore it is important to run the authentication web request and the download action in the same run. If you run only the authentication request, then stop, then run the download request, the session cookie will be lost in between and the download request will be rejected as unauthorized. If you modify the download action, you need to re-run the workflow starting from the authentication request, in one run.

Hint: right-click the action and select “Discard result” to change its status to uncalculated.

Notes

  • If authentication is done using JavaScript, it may still be possible to emulate in EasyMorph the POST request done with JS. But this is less trivial and requires inspecting in the browser the requests sent from the login page during authentication.
  • The “Save response body to file” doesn’t support multi-part downloads, so a large file may be saved incompletely.
  • You may need to send a web request (GET) to load the login page before the authentication web request, if the website requires a cookie sent with the authentication form. Loading the login page will also be needed if the website uses a CSRF token (see my comment below).
  • Besides downloading a file, in a similar fashion, you can download a web page with needed information and then parse it in EasyMorph. The “Load plain text” action would be helpful in this case.

Thanks for your answer!

This is the code of the form:

Which elements should I take from the form? The <label for and <input name values are not equal here, but they are in your example.
I get a status 500 error while trying.

Look at the “input” elements. In this form there are 2 of them with the following names:

Employee[username]
Employee[password]

The login form also uses a CSRF token. It complicates things a bit, but hopefully not too much. The CSRF token must be submitted with the form, however, it’s generated dynamically when the login page is loaded. Therefore:

Dealing with CSRF tokens

When a login page uses a CSRF token, an additional web request must be executed before the authentication request. The web request should fetch the login page itself. For this create a “Web request” action with the following settings:

  • Request path - the path of the login page
  • Request method - GET
  • Response mode - Return response

When executed, the response body will contain the HTML text of the login page, including the login form and the CSRF token in it. In the screenshot above, the “input” element with the CSRF token is named _csrf and has the value TXu9DYANznHlHFhNTHP9...... (I’m too lazy to type it all). Note that the token isn’t visible on the login web page in the browser, because it’s a hidden field (notice the type “hidden” in the “input” element of the token).

Extract the token value using EasyMorph functions (keepbetween() would work nicely here), and include the token and its value in the authentication request as a form value.
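To make the extraction idea concrete, here is a small Python sketch of "keep the text between two markers". The `keep_between` function is a hypothetical helper written for this example and is not claimed to match EasyMorph's keepbetween() exactly. The HTML fragment and its token value are placeholders, and the sketch assumes the `name` attribute comes directly before `value`, which has to be checked against the real page source.

```python
import html


def keep_between(text, start, end):
    """Return the text between the first `start` and the next `end`, or "" if not found."""
    begin = text.find(start)
    if begin == -1:
        return ""
    begin += len(start)
    stop = text.find(end, begin)
    return text[begin:stop] if stop != -1 else ""


# Hypothetical fragment of a login page; the token value is a placeholder.
login_html = '<input type="hidden" name="_csrf" value="PLACEHOLDER_TOKEN">'

# Attribute values may be HTML-escaped in the page, so unescape the result.
token = html.unescape(keep_between(login_html, 'name="_csrf" value="', '"'))
print(token)
# prints: PLACEHOLDER_TOKEN
```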

Therefore, when a CSRF token is used, the authentication request should contain at least 3 form values:

  • username
  • password
  • CSRF token
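As a rough illustration of what the three form values look like once encoded, the sketch below builds the form body in Python. The field names are the ones from the form discussed in this thread; `build_login_form` is a hypothetical helper and the credentials and token are placeholders.

```python
import urllib.parse


def build_login_form(username, password, csrf_token):
    """Encode the three form values as an application/x-www-form-urlencoded body."""
    form_values = {
        "Employee[username]": username,
        "Employee[password]": password,
        "_csrf": csrf_token,
    }
    return urllib.parse.urlencode(form_values)


body = build_login_form("alice", "secret", "PLACEHOLDER_TOKEN")
print(body)
# prints: Employee%5Busername%5D=alice&Employee%5Bpassword%5D=secret&_csrf=PLACEHOLDER_TOKEN
```

Note that the square brackets in the field names are percent-encoded in the body; the server decodes them back to Employee[username] and Employee[password].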

For the web form in the screenshot above, the “Web request” action should look as below:

image

Some websites may require the CSRF token additionally submitted as a request header. In this case, the authentication request should be inspected in the browser in order to understand the name of the request header that contains the CSRF token.
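Sending the token as a header could look like the sketch below. The header name `X-CSRF-Token` is only an assumption for illustration; the actual name has to be read from the request shown in the browser. `add_csrf_header` is a hypothetical helper, the URL is made up, and the request is built but not sent.

```python
import urllib.request

CSRF_HEADER = "X-CSRF-Token"  # assumed name; check the real one in the browser


def add_csrf_header(request, token, header_name=CSRF_HEADER):
    """Attach the CSRF token to the request as a header and return the request."""
    request.add_header(header_name, token)
    return request


req = add_csrf_header(
    urllib.request.Request("https://example.com/login", data=b"", method="POST"),
    "PLACEHOLDER_TOKEN",
)
# urllib normalizes header capitalization, so compare names case-insensitively.
headers = {name.lower(): value for name, value in req.header_items()}
print(headers["x-csrf-token"])
# prints: PLACEHOLDER_TOKEN
```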

hi Dmitry,
I know it’s a bit late, but finally had some time to look at this again…
Thanks for such an extended answer.
However, I do get a code 500…
Just to be sure: do I need to iterate the CSRF token to another module to be able to pass it further? Will it still be in the same run then?

this is how it looks in the header when I am logged in manually:

this is the error that I get:
HTTP/1.1 500 Internal Server Error
transfer-encoding: chunked
x-backend: web02
access-control-allow-origin: *
access-control-allow-headers: *
access-control-request-methods: POST, GET, OPTIONS
x-balancer: lb01
strict-transport-security: max-age=31536000;
Date: Tue, 13 Apr 2021 20:27:29 GMT
Server: nginx
Content-Type: text/html; charset=UTF-8

An Error occurred while handling another error:
yii\base\InvalidRouteException: Unable to resolve the request "site/error". in /mnt/nfs/web/htdocs/abcd/releases/7763/backoffice/vendor/yiisoft/yii2/base/Module.php:543
Stack trace:
#0 /mnt/nfs/web/htdocs/abcd/releases/7763/backoffice/vendor/yiisoft/yii2/web/ErrorHandler.php(109): yii\base\Module->runAction()
#1 /mnt/nfs/web/htdocs/abcd/releases/7763/backoffice/vendor/yiisoft/yii2/base/ErrorHandler.php(135): yii\web\ErrorHandler->renderException()
#2 [internal function]: yii\base\ErrorHandler->handleException()
#3 {main}
Previous exception:
yii\web\BadRequestHttpException: Unable to verify your data submission. in /mnt/nfs/web/htdocs/abcd/releases/7763/backoffice/vendor/yiisoft/yii2/web/Controller.php:216
Stack trace:
#0 /mnt/nfs/web/htdocs/abcd/releases/7763/backoffice/vendor/yiisoft/yii2/base/Controller.php(179): yii\web\Controller->beforeAction()
#1 /mnt/nfs/web/htdocs/abcd/releases/7763/backoffice/vendor/yiisoft/yii2/base/Module.php(534): yii\base\Controller->runAction()
#2 /mnt/nfs/web/htdocs/abcd/releases/7763/backoffice/vendor/yiisoft/yii2/web/Application.php(104): yii\base\Module->runAction()
#3 /mnt/nfs/web/htdocs/abcd/releases/7763/backoffice/vendor/yiisoft/yii2/base/Application.php(392): yii\web\Application->handleRequest()
#4 /mnt/nfs/web/htdocs/abcd/releases/7763/backoffice/web/index.php(12): yii\base\Application->run()
#5 {main}

Yes, that's correct. Extract the token and then do a 1-cycle iteration to pass the token into another module, which would use it in a web request as a parameter.

Thanks Dmitry!!