Software Engineering

How to read csv file in OSS with PolarDB

머니기어 2023. 12. 27. 13:58
반응형

Introduction

This post explains how to easily read csv file that has been uploaded on OSS as table with using PolarDB (for MySQL or PostgreSQL). This methods can make your PolarDB directly query CSV-formatted data stored on OSS just like AWS Redshift does one on S3.

 

Prerequisites

Supported versions

According to the documentation, if you're using PolarDB for MySQL, the cluster meets one of the following requirements:

  • A cluster of PolarDB for MySQL 8.0.1 whose revision version is 8.0.1.1.25.4 or later.
  • A cluster of PolarDB for MySQL 8.0.2 whose revision version is 8.0.2.2.1 or later.

 

If you're using PolarDB for PostgreSQL, the cluster meets the following requirements. This is not mentioned in PolarDB documentation but in PostgreSQL's :

  • A cluster of PolarDB for PostgreSQL whose version is 12 or later.

Create key pair to access OSS

You must have a key pair to access OSS.

  1. Log on to the RAM console.
  2. In the left-side navigation pane, choose Identities > Users.
  3. Click the username of the RAM user or create a new RAM user whose permission includes OSS access policy.
  4. In the User AccessKeys section, click Create AccessKey.
  5. In the Create AccessKey message, view the AccessKey ID and AccessKey secret.
  6. You can click Download CSV File to download the AccessKey pair or click Copy to copy the AccessKey pair.

SQL Procedure with MySQL

Create a server with OSS information.

CREATE SERVER <server_name> 
FOREIGN DATA WRAPPER oss OPTIONS (
    DATABASE '<my_database_name>', --optional
    EXTRA_SERVER_INFO '{
        "oss_endpoint": "<my_oss_endpoint>",
        "oss_bucket": "<my_oss_bucket>",
        "oss_access_key_id": "<my_oss_access_key_id>",
        "oss_access_key_secret": "<my_oss_access_key_secret>",
        "oss_prefix":"<my_oss_prefix>",
        "oss_sts_token":"<my_oss_sts_token>"
    }' -- prefix and sts token are optional
);

Create a foreign table

CREATE TABLE <table_name> 
(...) 
engine=csv 
connection="<connection_string>"

 

<connection_string> consists of the OSS server name and optional directory or file path.

 

e.g. "server_name/folder_name/file_name" (It does not contain the ".csv" extension.)

 

SQL Procedure with PostgreSQL

Create foreign data wrapper for OSS

CREATE EXTENSION oss_fdw;

Create a server with OSS information

CREATE SERVER <server_name>
FOREIGN DATA WRAPPER "oss_fdw" OPTIONS
(
    HOST '<oss_endpoint>',
    BUCKET '<bucket_name>',
    ID '<key_id>',
    KEY '<key_secret>'
);

Create a foreign table

CREATE FOREIGN TABLE <table_name> 
(...) 
SERVER <server_name> 
OPTIONS (
    FILEPATH '<file_path>', 
    DELIMITER ',' ,
    FORMAT 'csv', 
    ENCODING 'utf8', 
    PARSE_ERRORS '100'
);

 

<file_path> is a file name that contains a path in OSS. e.g. 'folder/filename.csv' or you can use "DIR" and give a path of directory.

반응형