Landsat downloading questions #188

Open
CMCDragonkai opened this issue May 22, 2016 · 3 comments


@CMCDragonkai

CMCDragonkai commented May 22, 2016

According to the docs, AWS S3 offers each band individually, while Google Storage packs all the bands of a scene into one compressed archive.

Previously, downloads from Google Storage always produced a .tar.bz. Now that downloads come from Amazon, only individual bands are returned.

Is there a way to always get a .tar.bz, even when downloading from Amazon?


OK, assuming the API has changed and we can no longer get a .tar.bz from landsat-util, I have another question.

A while ago, I downloaded LC80990832016037LGN01, which was fetched from Google Storage. Decompressing and untarring it gave me a file LC80990832016037LGN01_B1.TIF of 114 MiB.

However, when I now download LC80990832016037LGN01 with landsat-util using AWS S3 as the source, I have to specify --bands="1", and the new LC80990832016037LGN01_B1.TIF is only 57 MiB.
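The scene ID and band file name above follow the pre-Collection Landsat naming scheme LXSPPPRRRYYYYDDDGSIVV (sensor, satellite, WRS path and row, year, day of year, ground station, archive version). A minimal Python sketch for picking such an ID apart; `parse_scene_id` and `band_filename` are hypothetical helpers, not part of landsat-util, and the `_B<n>.TIF` suffix is assumed from the file names quoted in this thread.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta

# Pre-Collection Landsat scene ID: L X S PPP RRR YYYY DDD GSI VV
SCENE_ID_RE = re.compile(
    r"^L(?P<sensor>[A-Z])(?P<satellite>\d)(?P<path>\d{3})(?P<row>\d{3})"
    r"(?P<year>\d{4})(?P<doy>\d{3})(?P<station>[A-Z]{3})(?P<version>\d{2})$"
)


@dataclass(frozen=True)
class SceneId:
    sensor: str      # e.g. "C" = OLI and TIRS combined
    satellite: int   # e.g. 8 = Landsat 8
    path: int        # WRS-2 path
    row: int         # WRS-2 row
    year: int
    doy: int         # day of year, 1-based
    station: str     # ground station identifier, e.g. "LGN"
    version: str     # archive version, e.g. "01"

    def acquisition_date(self) -> date:
        return date(self.year, 1, 1) + timedelta(days=self.doy - 1)


def parse_scene_id(scene_id: str) -> SceneId:
    """Hypothetical helper: split a pre-Collection scene ID into its fields."""
    m = SCENE_ID_RE.match(scene_id)
    if m is None:
        raise ValueError(f"not a pre-Collection Landsat scene ID: {scene_id!r}")
    g = m.groupdict()
    return SceneId(
        sensor=g["sensor"],
        satellite=int(g["satellite"]),
        path=int(g["path"]),
        row=int(g["row"]),
        year=int(g["year"]),
        doy=int(g["doy"]),
        station=g["station"],
        version=g["version"],
    )


def band_filename(scene_id: str, band: int) -> str:
    """Hypothetical helper: band file name in the form seen in this thread."""
    parse_scene_id(scene_id)  # raise early on malformed IDs
    return f"{scene_id}_B{band}.TIF"
```

For LC80990832016037LGN01 this yields WRS path 99, row 83, acquired on day 37 of 2016 (6 February 2016), and `band_filename(..., 1)` gives LC80990832016037LGN01_B1.TIF.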

Is the source from AWS S3 preprocessed or compressed in some way?

Running the `file` tool on both files shows:

```
$ file direct_aws.tif
direct_aws.tif: TIFF image data, little-endian, direntries=17, height=7881, bps=16, compression=deflate, PhotometricIntepretation=BlackIsZero, width=7811
$ file landsat_google.tif
landsat_google.tif: TIFF image data, little-endian
```

This shows that the AWS version is compressed with deflate, and we also get a bit more metadata, which is nice. However, when I then opened the two files in QGIS, the maximum band value of the Google-supplied file was 11997, while the maximum of the supposedly identical file on AWS was only 11502. It seems the two files are not equivalent even after decompression?
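The `compression=deflate` field comes from the TIFF Compression tag (tag 259) in the file's first image directory. A minimal stdlib-only Python sketch that reads that tag directly; `tiff_compression` is a hypothetical helper, it handles classic TIFF only (not BigTIFF), and it reads just the first IFD.

```python
import struct

COMPRESSION_TAG = 259
# A few common TIFF Compression tag values.
COMPRESSION_NAMES = {
    1: "none",
    5: "LZW",
    7: "JPEG",
    8: "deflate",
    32773: "PackBits",
    32946: "deflate (legacy code)",
}


def tiff_compression(data: bytes) -> str:
    """Return the compression scheme named by the first IFD of a classic TIFF."""
    order = data[:2]
    if order == b"II":
        endian = "<"  # little-endian
    elif order == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("unsupported TIFF variant (BigTIFF is not handled)")
    (n_entries,) = struct.unpack_from(endian + "H", data, ifd_offset)
    for i in range(n_entries):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, _type, _count = struct.unpack_from(endian + "HHI", data, entry)
        if tag == COMPRESSION_TAG:
            # A single SHORT value sits in the first 2 bytes of the value field.
            (value,) = struct.unpack_from(endian + "H", data, entry + 8)
            return COMPRESSION_NAMES.get(value, f"unknown ({value})")
    return "none"  # the TIFF default when the tag is absent
```

To check a downloaded band, read it with `open(path, "rb").read()` and pass the bytes in; this loads the whole file into memory, which is simple but not ideal for very large rasters.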


I also noticed that throughput sometimes differs wildly between downloading through landsat-util and downloading directly from AWS S3 (http://landsat-pds.s3.amazonaws.com/), even when downloading the same file at the same time.
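To put numbers on such throughput differences, one option is to time reading the same object through each path. A minimal Python sketch that reads any binary stream to the end and reports MiB/s; `measure_throughput` is a hypothetical helper, and a single timing is noisy, so several runs per path would be needed before comparing.

```python
import time
from typing import BinaryIO, Tuple


def measure_throughput(stream: BinaryIO, chunk_size: int = 1 << 20) -> Tuple[int, float]:
    """Read `stream` to EOF and return (bytes_read, MiB per second)."""
    start = time.perf_counter()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    elapsed = time.perf_counter() - start
    mib_per_s = total / (1024 * 1024) / elapsed if elapsed > 0 else float("inf")
    return total, mib_per_s
```

For a direct S3 download, the response object from `urllib.request.urlopen(url)` can be passed straight in, with `url` being the band's URL under http://landsat-pds.s3.amazonaws.com/ (the exact object path is not given in this thread).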

@CMCDragonkai changed the title from "Landsat Download <scene-id> sometimes downloads an entire .tar.gz and sometimes a _BQA.TIF" to "Landsat downloading questions" on May 22, 2016
@scisco
Contributor

scisco commented May 24, 2016

@CMCDragonkai I think AWS only offers individual bands, so you cannot download a zip file from AWS.
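If one archive per scene is still wanted for convenience, the individual band files could be bundled locally after downloading. A sketch using Python's standard `tarfile` module with bzip2 compression; `bundle_bands` is a hypothetical helper, not a landsat-util feature. Because the AWS TIFFs are already deflate-compressed internally, the resulting archive should not be much smaller than the bands themselves.

```python
import tarfile
from pathlib import Path
from typing import Iterable


def bundle_bands(band_paths: Iterable[Path], archive_path: Path) -> Path:
    """Pack already-downloaded band files into one bzip2-compressed tar archive."""
    with tarfile.open(archive_path, "w:bz2") as tar:
        for band in band_paths:
            # Store each file flat, by name only, without its local directories.
            tar.add(band, arcname=Path(band).name)
    return archive_path
```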

On your second question: you are right, AWS runs some optimization scripts on the files before saving them, which might explain the size difference. I haven't actually run any comparisons. It's really interesting if the files on Google and AWS have different sizes. If so, we should also compare the files with the originals on EarthExplorer.

On your last question, which one has a higher throughput?

@matthewhanson
Contributor

@CMCDragonkai The TIF files on AWS use internal compression, while those from Google (and from EarthExplorer) do not; instead, they are distributed as compressed archives (tar.gz). Because the AWS files are already compressed internally, packing them into a tar.gz would not reduce their size much (in fact, it could even make the resulting tar.gz slightly larger), so getting a tar.gz from AWS wouldn't gain anything.
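The effect of compressing already-compressed data can be illustrated with Python's `zlib` and `gzip` modules. This is only an analogy on synthetic 16-bit data (a gradient plus noise, not real Landsat pixels): `zlib.compress` stands in for the TIFF's internal deflate and `gzip.compress` for wrapping the file in a .tar.gz, ignoring tar's own headers. `synthetic_band` and `compression_sizes` are hypothetical names.

```python
import gzip
import random
import zlib
from array import array
from typing import Dict


def synthetic_band(width: int = 256, height: int = 256, seed: int = 0) -> bytes:
    """Fake 16-bit band: a smooth gradient plus a little noise (NOT real Landsat data)."""
    rng = random.Random(seed)
    samples = array(
        "H",
        (5000 + 4 * x + 2 * y + rng.randint(0, 50) for y in range(height) for x in range(width)),
    )
    return samples.tobytes()


def compression_sizes(raw: bytes) -> Dict[str, int]:
    inner = zlib.compress(raw, 6)  # stands in for the TIFF's internal deflate
    outer = gzip.compress(inner)   # stands in for wrapping that file in a .tar.gz
    return {"raw": len(raw), "inner": len(inner), "outer": len(outer)}
```

Printing `compression_sizes(synthetic_band())` should show the inner size well below the raw size, while the outer size stays about the same as the inner one (or grows by the gzip header and framing).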

It's interesting that the values differ; as @scisco says, it would be worthwhile to compare them to the originals. Everything else should be the same: resolution, raster size, projection. And the internal compression used, deflate, is lossless, so it should not cause any differences either.
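Because deflate is lossless, a compress and decompress round trip returns exactly the samples that went in, so the maximum cannot change. A quick Python check with `zlib` on made-up sample values (11502 and 11997 are simply the two maxima quoted in this thread, not real pixel data); `deflate_roundtrip_max` is a hypothetical helper.

```python
import zlib
from array import array
from typing import Tuple


def deflate_roundtrip_max(samples: array) -> Tuple[int, int, bool]:
    """Deflate 16-bit samples, inflate them again, and compare before/after."""
    compressed = zlib.compress(samples.tobytes(), 9)
    restored = array(samples.typecode)
    restored.frombytes(zlib.decompress(compressed))
    # Lossless: the restored array is identical, so its maximum cannot change.
    return max(samples), max(restored), restored == samples
```

If the two downloads really report different maxima, that points to the underlying pixel values differing, not to the compression itself.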

@CMCDragonkai
Author

CMCDragonkai commented May 25, 2016

I only checked one of the TIFF files, since I only have one archive that came from Google Storage before landsat-util switched to downloading from AWS S3.

I'll check whether some of the other files in the same archive also have different maximum values for their bands, as with the one I reported:

> However afterwards, I checked the 2 files inside QGIS, and the max band of the google supplied file is 11997, while the max band from the supposedly same file on AWS was only 11502. It seems that the 2 files are not equivalent even after decompression?

As for throughput: sometimes landsat-util is slower than downloading directly from AWS S3, sometimes they are equivalent, and sometimes landsat-util is slightly faster.


3 participants