Hi there. Does anyone have an example of how to convert a CSV file to Parquet? Thank you.
EasyMorph doesn’t have a built-in way to convert CSV to Apache Parquet. You need to use third-party utilities for such a conversion; they can be run from EasyMorph using the “Run program” action.
Yes, native export to Parquet would be a great addition.
Can you please share in a few words what your current setup that uses Parquet files looks like? How do you generate them currently, and what applications/systems consume the generated Parquet files?
I’d like to understand better the case for Parquet files.
Currently, I use Power BI Desktop and Python to generate a Parquet file and load it to Google Cloud Storage.
Those files are then loaded into BigQuery. The reason we use Parquet instead of CSV is the storage cost.
To be honest, loading a CSV file to Google Cloud Storage is a bigger priority, as I can convert the CSV later using a Google Cloud Function.
I think I will open a separate feature request for loading data to Google Cloud Storage.
@dgudkov Parquet files are commonly used in big data platforms. They enable MPP columnar processing (via partitioning within the file), and compression is also an option using codecs such as Snappy.
These types of files are becoming very popular with cloud tools such as AWS Athena and Google Cloud, and I believe Azure will have a similar capability very soon.
I currently use a third-party API that creates the files, schema, compression, and partitioning… I call that API from EasyMorph.