How to read a Parquet file into Pandas DataFrame?

We Are Going To Discuss About How to read a Parquet file into Pandas DataFrame?. So lets Start this Python Article.

How to read a Parquet file into Pandas DataFrame?

How to solve How to read a Parquet file into Pandas DataFrame?

pandas 0.21 introduces new functions for Parquet:
import pandas as pd pd.read_parquet('example_pa.parquet', engine='pyarrow')
or
import pandas as pd pd.read_parquet('example_fp.parquet', engine='fastparquet')
The above link explains:
These engines are very similar and should read/write nearly identical parquet format files. These libraries differ by having different underlying dependencies (fastparquet by using numba, while pyarrow uses a c-library).

How to read a Parquet file into Pandas DataFrame?

pandas 0.21 introduces new functions for Parquet:
import pandas as pd pd.read_parquet('example_pa.parquet', engine='pyarrow')
or
import pandas as pd pd.read_parquet('example_fp.parquet', engine='fastparquet')
The above link explains:
These engines are very similar and should read/write nearly identical parquet format files. These libraries differ by having different underlying dependencies (fastparquet by using numba, while pyarrow uses a c-library).

Solution 1

pandas 0.21 introduces new functions for Parquet:

import pandas as pd
pd.read_parquet('example_pa.parquet', engine='pyarrow')

or

import pandas as pd
pd.read_parquet('example_fp.parquet', engine='fastparquet')

The above link explains:

These engines are very similar and should read/write nearly identical parquet format files. These libraries differ by having different underlying dependencies (fastparquet by using numba, while pyarrow uses a c-library).

Original Author chrisaycock Of This Content

Solution 2

Update: since the time I answered this there has been a lot of work on this look at Apache Arrow for a better read and write of parquet. Also: http://wesmckinney.com/blog/python-parquet-multithreading/

There is a python parquet reader that works relatively well: https://github.com/jcrobak/parquet-python

It will create python objects and then you will have to move them to a Pandas DataFrame so the process will be slower than pd.read_csv for example.

Original Author danielfrg Of This Content

Solution 3

Aside from pandas, Apache pyarrow also provides way to transform parquet to dataframe

The code is simple, just type:

import pyarrow.parquet as pq

df = pq.read_table(source=your_file_path).to_pandas()

For more information, see the document from Apache pyarrow Reading and Writing Single Files

Original Author WY Hsu Of This Content

Solution 4

Parquet

Step 1: Data to play with

df = pd.DataFrame({
    'student': ['personA007', 'personB', 'x', 'personD', 'personE'],
    'marks': [20,10,22,21,22],
})

Step 2: Save as Parquet

df.to_parquet('sample.parquet')

Step 3: Read from Parquet

df = pd.read_parquet('sample.parquet')

Original Author Harish Masand Of This Content

Conclusion

So This is all About This Tutorial. Hope This Tutorial Helped You. Thank You.

Also Read,

ittutorial team

I am an Information Technology Engineer. I have Completed my MCA And I have 4 Year Plus Experience, I am a web developer with knowledge of multiple back-end platforms Like PHP, Node.js, Python and frontend JavaScript frameworks Like Angular, React, and Vue.

Leave a Comment