
Bug#1107333: RFP: pg-parquet -- Copy to/from Parquet in S3, Azure Blob Storage, Google Cloud Storage, http(s) stores, local files or standard input/output stream from within PostgreSQL



Package: wnpp
Severity: wishlist
X-Debbugs-Cc: debian-rust@lists.debian.org, team+postgresql@tracker.debian.org

* Package name    : pg-parquet
  Version         : 0.4.0
  Upstream Contact: https://github.com/CrunchyData
* URL             : https://github.com/CrunchyData/pg_parquet/?tab=readme-ov-file
* License         : PostgreSQL
  Programming Lang: Rust
  Description     : Copy to/from Parquet in S3, Azure Blob Storage, Google Cloud Storage, http(s) stores, local files or standard input/output stream from within PostgreSQL

pg_parquet is a PostgreSQL extension that lets you read and write
Parquet files located in S3, Azure Blob Storage, Google Cloud Storage,
http(s) endpoints or the local file system from PostgreSQL via COPY
TO/FROM commands. It depends on the Apache Arrow project to read and
write Parquet files and on the pgrx project to extend PostgreSQL's
COPY command.

-- Copy a query result into Parquet in S3
COPY (SELECT * FROM mytable) TO 's3://mybucket/data.parquet' WITH (format 'parquet');

-- Load data from Parquet in S3
COPY mytable FROM 's3://mybucket/data.parquet' WITH (format 'parquet');

----

We're using this on a development database server and are hoping to
offload data from PostgreSQL into Parquet files. We've been compiling
it from source, which causes some trouble during major PostgreSQL
upgrades.
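For reference, building a pgrx-based extension from source typically
looks roughly like the following (a sketch only; the exact PostgreSQL
version flag and any feature flags depend on the target server and on
pg_parquet's own README):

```shell
# Install the pgrx build tool (one-time setup).
cargo install cargo-pgrx

# Point pgrx at the target server's pg_config; the version flag
# (here --pg17) must match the installed PostgreSQL major version,
# so this step has to be redone on every major upgrade.
cargo pgrx init --pg17=$(which pg_config)

# Compile the extension and install it into that server,
# run from a checkout of the pg_parquet source tree.
cargo pgrx install --release
```

Having a Debian package would let the extension follow the
postgresql-NN packages through major upgrades instead.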

Typically, PostgreSQL extensions are maintained by the PostgreSQL
team, but in this case it's a Rust extension, so perhaps the Rust
team could package it?

Not sure.
