For more information about the Lexio Dropzone in general, please reference this article first.



When connecting from Snowflake, we recommend using the COPY INTO feature. For details, see the Snowflake reference for COPY INTO.


Parquet (Preferred)

To have Snowflake load data into the Data Lake as a Parquet file, create a worksheet and run the following COPY INTO query:

COPY INTO 's3://prod-talos-stitchingest-v3/INTEGRATION_NAME/TABLE_NAME/data.parquet'
FROM "DATABASE"."SCHEMA"."TABLE_OR_VIEW_NAME"
CREDENTIALS = (
    AWS_KEY_ID = 'AWS_KEY_ID'
    AWS_SECRET_KEY = 'AWS_SECRET_KEY'
)
FILE_FORMAT = (
    TYPE = PARQUET
    TRIM_SPACE = TRUE
    COMPRESSION = SNAPPY
)
OVERWRITE = TRUE
HEADER = TRUE
SINGLE = TRUE
MAX_FILE_SIZE = 4900000000;


Where:

  • DATABASE, SCHEMA, and TABLE_OR_VIEW_NAME point to the table or view to load (in the customer's Snowflake account)

  • INTEGRATION_NAME and TABLE_NAME configure where the file lands in S3 (provided by Narrative Science)

  • AWS_KEY_ID and AWS_SECRET_KEY are the customer/user AWS credentials (provided by Narrative Science)

Since we're configuring the query to overwrite the data, the filename (data.parquet) isn't important.
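
For illustration, here is the same query with hypothetical placeholder values filled in. The integration name, table name, and credentials below are made up; use the values provided by Narrative Science and your own database, schema, and table names.

-- Hypothetical example: integration "acme_snowflake", table "orders" (placeholder values only)
COPY INTO 's3://prod-talos-stitchingest-v3/acme_snowflake/orders/data.parquet'
FROM "ACME_DB"."PUBLIC"."ORDERS"             -- customer database, schema, and table
CREDENTIALS = (
    AWS_KEY_ID = 'AKIAXXXXXXXXXXXXXXXX'      -- provided by Narrative Science
    AWS_SECRET_KEY = 'xxxxxxxxxxxxxxxxxxxx'  -- provided by Narrative Science
)
FILE_FORMAT = (
    TYPE = PARQUET
    TRIM_SPACE = TRUE
    COMPRESSION = SNAPPY
)
OVERWRITE = TRUE
HEADER = TRUE
SINGLE = TRUE
MAX_FILE_SIZE = 4900000000;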


CSV

To have Snowflake load data into the Data Lake as a CSV, create a worksheet and run the following COPY INTO query:

COPY INTO 's3://prod-talos-stitchingest-v3/INTEGRATION_NAME/TABLE_NAME/data.csv.gz'
FROM "DATABASE"."SCHEMA"."TABLE_OR_VIEW_NAME"
CREDENTIALS = (
    AWS_KEY_ID = 'AWS_KEY_ID'
    AWS_SECRET_KEY = 'AWS_SECRET_KEY'
)
FILE_FORMAT = (
    TYPE = CSV
    TRIM_SPACE = TRUE
    COMPRESSION = GZIP
)
OVERWRITE = TRUE
HEADER = TRUE
SINGLE = TRUE
MAX_FILE_SIZE = 4900000000;


The variables are the same as in the Parquet query above.

Since we're configuring the query to overwrite the data, the name of the file (data) isn't important, but the extension (.csv.gz) is.
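
Optionally, the inline file format options can be replaced with a named file format so the same settings are reused across queries. A minimal sketch, assuming a file format named LEXIO_CSV (the name is an example, not something Narrative Science requires):

-- Create a reusable file format with the same options as the inline version above
CREATE OR REPLACE FILE FORMAT LEXIO_CSV
    TYPE = CSV
    TRIM_SPACE = TRUE
    COMPRESSION = GZIP;

-- Then reference it in the COPY INTO query instead of the inline options:
-- FILE_FORMAT = ( FORMAT_NAME = 'LEXIO_CSV' )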



Scheduling

To run this query on a recurring, automated basis, create a Snowflake task. For example:


create task LEXIO_TASK
    warehouse = WAREHOUSE
    schedule = '240 MINUTE'
as
    COPY INTO 's3://prod-talos-stitchingest-v3/...
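
For reference, a complete task might look like the following sketch, which wraps the Parquet query from above. The task name, warehouse name, and 240-minute interval are examples; substitute your own values. Note that newly created tasks start in a suspended state and must be resumed before they run on the schedule.

create task LEXIO_TASK
    warehouse = WAREHOUSE
    schedule = '240 MINUTE'
as
    COPY INTO 's3://prod-talos-stitchingest-v3/INTEGRATION_NAME/TABLE_NAME/data.parquet'
    FROM "DATABASE"."SCHEMA"."TABLE_OR_VIEW_NAME"
    CREDENTIALS = (
        AWS_KEY_ID = 'AWS_KEY_ID'
        AWS_SECRET_KEY = 'AWS_SECRET_KEY'
    )
    FILE_FORMAT = (
        TYPE = PARQUET
        TRIM_SPACE = TRUE
        COMPRESSION = SNAPPY
    )
    OVERWRITE = TRUE
    HEADER = TRUE
    SINGLE = TRUE
    MAX_FILE_SIZE = 4900000000;

-- Tasks are created suspended; resume the task to start the schedule.
alter task LEXIO_TASK resume;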