  1. What are the pros and cons of the Apache Parquet format …

    Apr 24, 2016 · Parquet has gained significant traction outside of the Hadoop ecosystem. For example, the Delta Lake project is being built on Parquet files. Arrow is an important project …

  2. Reading / Fixing a corrupt parquet file - Stack Overflow

    Sep 3, 2024 · Thank you. I got "pyarrow.lib.ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file." when I tried to construct a ParquetFile …

  3. Inspect Parquet from command line - Stack Overflow

    How do I inspect the content of a Parquet file from the command line? The only option I see now is:

        $ hadoop fs -get my-path local-file
        $ parquet-tools head local-file | less

    I would like to avoid …

  4. Querying multiple parquet files in a range using duckdb

    Apr 28, 2024 · I have parquet files arranged in this format: /db/{year}/table{date}.parquet. In each year folder, there are up to 365 files. If I want to query data from a time range, say the week …

  5. How to read partitioned parquet files from S3 using pyarrow in …

    Jul 13, 2017 · By setting dataset=True, awswrangler expects partitioned Parquet files. It will read all the individual Parquet files from your partitions below the S3 key you specify in the path.

  6. Spark parquet partitioning : Large number of files

    Jun 28, 2017 · Spark 2.2+: From Spark 2.2 on, you can also use the new option maxRecordsPerFile to limit the number of records per file if your files are too large. You will …

  7. Methods for writing Parquet files using Python? - Stack Overflow

    Oct 5, 2015 · I'm having trouble finding a library that allows Parquet files to be written using Python. Bonus points if I can use Snappy or a similar compression mechanism in conjunction …

  8. Reading partitioned multi-schema parquet files from S3 using Polars

    Jan 23, 2024 · Unfortunately, scan_parquet doesn't have that option. The pl.concat method does allow a vertical relaxed combination of frames, so you can use that. There are two steps to …

  9. Read multiple parquet files in a folder and write to single csv file ...

    Aug 5, 2018 · I need to read these parquet files in order, starting from file1, and write them to a single CSV file. After writing the contents of file1, the contents of file2 should be appended to the same CSV without …

  10. How to read a Parquet file into Pandas DataFrame?

    How can I read a modestly sized Parquet dataset into an in-memory pandas DataFrame without setting up cluster computing infrastructure such as Hadoop or Spark? This is only a …
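For result 2: pyarrow raises that ArrowInvalid error when the last four bytes of the file are not the Parquet magic number. Per the Parquet format specification, a file begins and ends with the 4-byte magic PAR1, and the trailing magic is preceded by a 4-byte little-endian length of the footer metadata. The sketch below uses only the standard library (the function name check_parquet_framing is hypothetical) and checks just that framing, which can help tell a truncated file from one that is not Parquet at all. It does not decode or repair the footer.

```python
import os
import struct

# Parquet files start and end with this 4-byte magic number.
MAGIC = b"PAR1"


def check_parquet_framing(path: str) -> dict:
    """Check the leading/trailing magic bytes and the footer-length field.

    Assumed layout (Parquet format spec):
    PAR1 | data ... | footer metadata | footer length (4 bytes, little-endian) | PAR1
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(max(size - 8, 0))
        tail = f.read(8)  # footer length field + trailing magic

    report = {
        "size": size,
        "header_magic": head == MAGIC,
        # 12 bytes = leading magic + length field + trailing magic
        "footer_magic": size >= 12 and tail[-4:] == MAGIC,
        "footer_length": None,
        "footer_length_plausible": False,
    }
    if report["footer_magic"]:
        (footer_len,) = struct.unpack("<I", tail[:4])
        report["footer_length"] = footer_len
        # The footer must fit between the leading magic and the length field.
        report["footer_length_plausible"] = 0 < footer_len <= size - 12
    return report
```

A report with header_magic true but footer_magic false often points to a truncated or partially written file; both false suggests the file is not Parquet.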
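For result 4: one way to query a date range with DuckDB is to compute the exact file list and pass it to read_parquet, which accepts a list of paths. The sketch assumes the layout /db/{year}/table{date}.parquet with no spaces and a YYYYMMDD date format in the file names (date_fmt is a hypothetical parameter; adjust it to the real names). Stepping day by day also handles ranges that cross a year boundary.

```python
from datetime import date, timedelta
from pathlib import Path


def files_for_range(root, start: date, end: date, date_fmt="%Y%m%d", only_existing=True):
    """Return root/{year}/table{date}.parquet for each day from start to end inclusive.

    date_fmt is an assumption about how {date} is written in the file names.
    """
    paths = []
    day = start
    while day <= end:
        path = Path(root) / str(day.year) / f"table{day.strftime(date_fmt)}.parquet"
        if not only_existing or path.exists():
            paths.append(path.as_posix())
        day += timedelta(days=1)
    return paths


def read_parquet_sql(paths):
    """Build a DuckDB query that reads exactly the given files."""
    if not paths:
        raise ValueError("no files in the requested range")
    # Quote each path as a SQL string literal (single quotes doubled).
    quoted = ", ".join("'" + p.replace("'", "''") + "'" for p in paths)
    return f"SELECT * FROM read_parquet([{quoted}])"
```

With the duckdb package installed, the query could be run as duckdb.sql(read_parquet_sql(paths)).df(). Listing only the needed files avoids touching files outside the range.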
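For result 5: with dataset=True, readers such as awswrangler treat the key=value folders below the given prefix (Hive-style partitioning) as extra columns. The standard-library sketch below shows only that path-parsing idea, not awswrangler's or pyarrow's actual implementation; the function names are hypothetical, and partition values are kept as strings, whereas real readers may cast them.

```python
from urllib.parse import unquote


def partition_values(key: str, base_prefix: str) -> dict:
    """Extract Hive-style partition columns (key=value folders) from an object key."""
    if not key.startswith(base_prefix):
        raise ValueError(f"{key!r} is not under {base_prefix!r}")
    rel = key[len(base_prefix):].strip("/")
    folders = rel.split("/")[:-1]  # drop the file name itself
    values = {}
    for folder in folders:
        if "=" in folder:
            name, _, value = folder.partition("=")
            values[name] = unquote(value)  # Hive paths URL-encode special characters
    return values


def group_by_partition(keys, base_prefix: str) -> dict:
    """Group .parquet keys by their partition values, skipping marker files."""
    groups = {}
    for key in keys:
        if not key.endswith(".parquet"):
            continue  # e.g. _SUCCESS markers written by Spark jobs
        pv = tuple(sorted(partition_values(key, base_prefix).items()))
        groups.setdefault(pv, []).append(key)
    return groups
```

In pyarrow itself, the corresponding route is to open the prefix with pyarrow.dataset.dataset(path, format="parquet", partitioning="hive").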