Parquet File Creation

camprice01 · October 30, 2019, 5:04am

Hi there. Does anyone have an example of how to convert a CSV file to Parquet? Thank you.

dgudkov · October 30, 2019, 8:01am

EasyMorph doesn’t have a built-in way to convert CSV to Apache Parquet. You need to use a third-party utility for the conversion. Such a utility can be run from EasyMorph using the “Run program” action.

mim · November 10, 2019, 7:30am

Yes, native export to Parquet would be a great addition.

dgudkov · November 10, 2019, 5:03pm

@camprice01 and @mim,

Could you please describe in a few words your current setup that uses Parquet files? How do you generate them at the moment, and which applications/systems consume the generated Parquet files?

I’d like to better understand the use case for Parquet files.

Thank you.

mim · November 11, 2019, 2:08am

Currently, I use Power BI Desktop and Python to generate a Parquet file and load it to Google Storage.
Those files are then loaded into BigQuery. The reason we use Parquet instead of CSV is the storage cost.

To be honest, loading a CSV file to Google Storage is a bigger priority, as I can convert the CSV later using a Google Cloud Function.

I think I will open a separate feature request for loading data to Google Storage.

camprice01 · November 11, 2019, 11:34pm

@dgudkov Parquet files are commonly used in big data platforms. They store data by column and are partitioned within the file, which lets MPP (massively parallel processing) engines read them in parallel. Compression is also an option, using codecs such as Snappy.

These types of files are becoming very popular with cloud tools such as AWS Athena, Google Cloud, and I believe Azure will have a similar capability very soon.

I currently use a third-party API that handles file creation, schema, compression, and partitioning… I call that API from EasyMorph.

dgudkov · November 25, 2023, 1:08am