For more information about the Lexio Dropzone in general, please refer to this article first.
When connecting from Snowflake, we recommend using the COPY INTO command. For details, check out the Snowflake reference for COPY INTO.
Parquet (Preferred)
To have Snowflake load data into the Data Lake as a Parquet file, create a worksheet and run the following COPY INTO query:
COPY INTO 's3://prod-talos-stitchingest-v3/INTEGRATION_NAME/TABLE_NAME/data.parquet'
FROM "DATABASE"."SCHEMA"."TABLE_OR_VIEW_NAME"
CREDENTIALS = (
    AWS_KEY_ID = 'AWS_KEY_ID'
    AWS_SECRET_KEY = 'AWS_SECRET_KEY'
)
FILE_FORMAT = (
    TYPE = PARQUET
    TRIM_SPACE = TRUE
    COMPRESSION = SNAPPY
)
OVERWRITE = TRUE
HEADER = TRUE
SINGLE = TRUE
MAX_FILE_SIZE = 4900000000;
Where:
DATABASE, SCHEMA, and TABLE_OR_VIEW_NAME identify the table or view to export (in the customer's Snowflake account)
INTEGRATION_NAME and TABLE_NAME determine where the file lands in S3 (provided by Narrative Science)
AWS_KEY_ID and AWS_SECRET_KEY are the customer's AWS credentials used for the upload (provided by Narrative Science)
Since we're configuring the query to overwrite the data, the filename (data.parquet) isn't important.
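As an optional sanity check, the COPY INTO statement returns a summary row that includes the number of rows unloaded; comparing that value against the row count of the source table or view is a quick way to confirm the full dataset was exported. A minimal sketch, using the same placeholders as above:
-- Optional sanity check: compare this count against the rows_unloaded
-- value reported by the COPY INTO statement above.
SELECT COUNT(*) FROM "DATABASE"."SCHEMA"."TABLE_OR_VIEW_NAME";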
CSV
To have Snowflake load data into the Data Lake as a CSV, create a worksheet and run the following COPY INTO query:
COPY INTO 's3://prod-talos-stitchingest-v3/INTEGRATION_NAME/TABLE_NAME/data.csv.gz'
FROM "DATABASE"."SCHEMA"."TABLE_OR_VIEW_NAME"
CREDENTIALS = (
    AWS_KEY_ID = 'AWS_KEY_ID'
    AWS_SECRET_KEY = 'AWS_SECRET_KEY'
)
FILE_FORMAT = (
    TYPE = CSV
    TRIM_SPACE = TRUE
    COMPRESSION = GZIP
)
OVERWRITE = TRUE
HEADER = TRUE
SINGLE = TRUE
MAX_FILE_SIZE = 4900000000;
The placeholders are the same as in the Parquet query above.
Since we're configuring the query to overwrite the data, the name of the file (data) isn't important, but the extension (.csv.gz) is.
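If any text columns contain the delimiter, quote characters, or line breaks, you may also want to quote field values so the output is unambiguous for downstream parsing. This is an optional tweak to the FILE_FORMAT clause above, shown here only as an example (it isn't a Dropzone requirement):
FILE_FORMAT = (
    TYPE = CSV
    TRIM_SPACE = TRUE
    COMPRESSION = GZIP
    FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)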
Scheduling
To run this query on a recurring, automated basis, create a scheduled task. For example:
CREATE TASK LEXIO_TASK
    WAREHOUSE = WAREHOUSE
    SCHEDULE = '240 MINUTE'
AS
COPY INTO 's3://prod-talos-stitchingest-v3/...