Hi,
Long shot.
I've got a job which pulls from an API. If I write the JSON response to a file using tFileOutputDelimited and upload it to an S3 bucket via tS3Put, I can read the resulting data in Redshift via Spectrum. It works, but I need to compress these files to save space.
If I add an extra step and use tFileArchive on the JSON file before uploading it to S3, Redshift is unable to read the data and throws the error below.
SQL Error [500310] [XX000]: [Amazon](500310) Invalid operation: Spectrum Scan Error
Details: Spectrum Scan Error code: 15001 context: Error while reading next ION/JSON value: IERR_INVALID_TOKEN. In file 'https://s3.eu-west-1.amazonaw..... [Line 1, Pos 1] query: 2178747 location: dory_util.cpp:1081
process: fetchtask_thread [pid=27919]
As mentioned:
- The raw JSON file can be uploaded and read.
- If I use 7-Zip to create a .gz archive from this file and upload it via tS3Put, the data can be read.
The problem only occurs when I introduce tFileArchive into the mix. I've tried both tar.gz and gzip as the archive format in tFileArchive.
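One thing worth noting about the tar.gz attempt: a tar.gz is a gzip stream wrapping a tar container, so even after decompression Spectrum sees tar header bytes rather than a JSON token, which would produce IERR_INVALID_TOKEN regardless of the S3 key. A minimal stdlib sketch (the payload and file name are made up for illustration):

```python
import gzip
import io
import tarfile

payload = b'{"id": 1, "name": "example"}\n'  # stand-in for the API's JSON response

# Plain gzip: decompressing yields the raw JSON bytes Spectrum expects.
plain_gz = gzip.compress(payload)
assert gzip.decompress(plain_gz) == payload

# tar.gz: the gzip stream contains a tar archive, so the decompressed
# bytes start with a tar header (the member file name), not "{".
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    info = tarfile.TarInfo(name="data.json")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
assert gzip.decompress(buf.getvalue())[:1] != b"{"
```

So for Spectrum, plain gzip is the right tFileArchive format; tar.gz would still fail even with a correct key.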
OK, it looks like the issue was actually in the config for tS3Put. In the Key section I wasn't adding ".gz" onto the end, so the object was treated as plain .json and the gzipped contents couldn't be read. Now fixed.
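For anyone hitting the same thing: Spectrum decides whether to decompress an object from the key's suffix, so the key set in tS3Put must end in .gz. A minimal stdlib sketch of the intended local step (file names are hypothetical; the upload itself is done by tS3Put):

```python
import gzip
import pathlib
import tempfile

# Hypothetical local file standing in for the job's JSON output.
workdir = pathlib.Path(tempfile.mkdtemp())
src = workdir / "response.json"
src.write_bytes(b'{"id": 1}\n')

# Gzip it, keeping the .json.gz suffix. The same suffix must appear in
# the tS3Put Key setting, e.g. "landing/response.json.gz", so that
# Spectrum recognises the object as gzip-compressed JSON.
dst = workdir / "response.json.gz"
dst.write_bytes(gzip.compress(src.read_bytes()))

assert dst.suffixes == [".json", ".gz"]
assert gzip.decompress(dst.read_bytes()) == src.read_bytes()
```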
Hello,
Thanks for letting us know that you have resolved this issue yourself.
Best regards
Sabrina