We recommend reading the general introduction “Working with Zarr arrays in R” before reading this vignette.
Reading files in S3 storage works in a very similar fashion to local
disk. This time the path needs to be a URL to the Zarr array. We can
again use zarr_overview() to quickly retrieve the array
metadata.
s3_address <- "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0076A/10501752.zarr/0"
zarr_overview(s3_address)
## Type: Array
## Path: https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0076A/10501752.zarr/0
## Shape: 50 x 494 x 464
## Chunk Shape: 1 x 494 x 464
## No. of Chunks: 50 (50 x 1 x 1)
## Data Type: float64
## Endianness: little
## Compressor: blosc
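The "No. of Chunks" line follows directly from the shape and the chunk shape. Zarr lays chunks out on a regular grid, and a partially filled chunk at the edge of an array still occupies a full grid cell. A minimal base-R sketch of that arithmetic, with the values copied by hand from the output above (the variable names are purely illustrative):

```r
# Shape and chunk shape copied from the zarr_overview() output above
shape <- c(50, 494, 464)
chunk_shape <- c(1, 494, 464)

# Chunks needed along each dimension; ceiling() accounts for partial edge chunks
chunks_per_dim <- ceiling(shape / chunk_shape)
print(chunks_per_dim)        # 50 1 1

# Total number of chunks in the grid
print(prod(chunks_per_dim))  # 50
```

The same calculation for the private bzip2.zarr array shown later (shape 20 x 10, chunk shape 10 x 10) gives a 2 x 1 grid, matching its reported 2 chunks.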
You can also pass an S3 client to the function, which is useful if you need to set credentials or other options for accessing the bucket. See the section on creating an S3 client, later in this vignette, for more details. If no client is supplied, Rarr will try to find credentials and other settings on its own, which is not always successful. The following code is equivalent to the previous code block:
s3_address <- "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0076A/10501752.zarr/0"
s3_client <- paws.storage::s3(
config = list(
credentials = list(anonymous = TRUE),
region = "auto",
endpoint = "https://uk1s3.embassy.ebi.ac.uk"
)
)
zarr_overview(s3_address, s3_client = s3_client)
## Type: Array
## Path: https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0076A/10501752.zarr/0
## Shape: 50 x 494 x 464
## Chunk Shape: 1 x 494 x 464
## No. of Chunks: 50 (50 x 1 x 1)
## Data Type: float64
## Endianness: little
## Compressor: blosc
The output above indicates that the array is stored in 50 chunks,
each containing one slice of the overall data. In the example below we use
the index argument of read_zarr_array() to extract the first and tenth slices
from the array. Reading only 2 of the 50 slices is much faster
than downloading the entire array before accessing the
data.
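To put "much faster" into numbers, here is a rough back-of-the-envelope calculation in base R. It assumes each chunk is one 494 x 464 slice of float64 (8-byte) values, as reported above, and it counts uncompressed bytes only; the bytes actually transferred will be fewer because the chunks are blosc-compressed.

```r
# One chunk = one 494 x 464 slice of float64 values (8 bytes each)
bytes_per_chunk <- 494 * 464 * 8

# Uncompressed size of the two requested slices versus the whole array
two_slices  <- 2 * bytes_per_chunk
whole_array <- 50 * bytes_per_chunk

# Sizes in MiB, and the fraction of the array that is actually needed
round(c(two_slices = two_slices, whole_array = whole_array) / 2^20, 1)
two_slices / whole_array
```

In other words, the two slices amount to about 3.5 MiB of uncompressed data, or 4% of the roughly 87.4 MiB array.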
We then plot our two slices on top of one another using the
image() function.
## read only the first and tenth slices; NULL selects all of the other dimensions
z2 <- read_zarr_array(s3_address, index = list(c(1, 10), NULL, NULL))
## plot the first slice in blue
image(
log2(z2[1, , ]),
col = hsv(h = 0.6, v = 1, s = 1, alpha = 0:100 / 100),
asp = dim(z2)[2] / dim(z2)[3],
axes = FALSE
)
## overlay the tenth slice (the second slice of z2) in green
image(
log2(z2[2, , ]),
col = hsv(h = 0.3, v = 1, s = 1, alpha = 0:100 / 100),
asp = dim(z2)[2] / dim(z2)[3],
axes = FALSE,
add = TRUE
)
Note: if you receive the error message
"Error in stop(aws_error(request$error)) : bad error message"
it is likely that you have some AWS credentials available in your R
session, which are being inappropriately used to access this public
bucket. Please see the section on creating an S3 client for details on how to set
credentials for a specific request.
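A quick way to check whether such credentials are present is to look at the standard AWS environment variables. The snippet below is a minimal base-R sketch: aws_env_set() is a hypothetical helper, not part of Rarr, and it only inspects environment variables, so credentials coming from other sources (for example an ~/.aws/credentials file) will not show up.

```r
# Standard AWS environment variables that paws may pick up
aws_vars <- c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY",
              "AWS_SESSION_TOKEN", "AWS_PROFILE")

# Hypothetical helper: TRUE for each variable that is set to a non-empty value
aws_env_set <- function(vars = aws_vars) {
  vapply(vars, function(v) nzchar(Sys.getenv(v)), logical(1))
}

aws_env_set()

# If any are TRUE and you only need public data in this session, one option
# is to remove them for the current R session:
# Sys.unsetenv(aws_vars)
```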
If you’re accessing data in a private S3 bucket, you can set the
environment variables AWS_ACCESS_KEY_ID and
AWS_SECRET_ACCESS_KEY to store your credentials. For
example, let’s try reading a file in a private S3 bucket before any
credentials have been set:
zarr_overview("https://s3.embl.de/rarr-testing/bzip2.zarr")
## Error:
## ! AccessDenied (HTTP 403). Access Denied.
We can see the “Access Denied” message in our output, indicating that
we don’t have permission to access this resource as an anonymous user.
However, if we use the key pair below, which gives read-only access to
the objects in the rarr-testing bucket, we’re now able to
interrogate the files with functions in Rarr.
Sys.setenv(
"AWS_ACCESS_KEY_ID" = "bYUBYVg1AsEreuDgtg5K",
"AWS_SECRET_ACCESS_KEY" = "r8FrLXc9dseD6V1P3htsu7ZBzP7Gszsd3sM1G4KX"
)
zarr_overview("https://s3.embl.de/rarr-testing/bzip2.zarr")
## Type: Array
## Path: https://s3.embl.de/rarr-testing/bzip2.zarr
## Shape: 20 x 10
## Chunk Shape: 10 x 10
## No. of Chunks: 2 (2 x 1)
## Data Type: int32
## Endianness: little
## Compressor: bz2
Behind the scenes, Rarr makes use of the paws suite of packages (https://paws-r.github.io/) to interact with S3 storage. A comprehensive overview of the multiple ways credentials can be set and used by paws can be found at https://github.com/paws-r/paws/blob/main/docs/credentials.md. If setting environment variables as above doesn’t work or is inappropriate for your use case, please refer to that document for other options.
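Note that Sys.setenv() changes the environment for the rest of the R session, so these credentials will also be offered to every later request, including requests to other buckets. One way to limit their scope, sketched below in base R, is a small wrapper that sets the variables, evaluates a single expression, and then restores whatever was there before. with_aws_credentials() is a hypothetical helper written for this illustration, not part of Rarr or paws, and it assumes the credentials are read from the environment at the time of each request.

```r
# Hypothetical helper: evaluate `expr` with AWS credentials set, then restore
# the previous values so they cannot affect later requests in the session.
with_aws_credentials <- function(access_key_id, secret_access_key, expr) {
  vars <- c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
  # Remember the current values (NA if a variable is not set)
  old <- Sys.getenv(vars, unset = NA, names = TRUE)
  on.exit({
    was_set <- !is.na(old)
    if (any(was_set)) do.call(Sys.setenv, as.list(old[was_set]))
    if (any(!was_set)) Sys.unsetenv(names(old)[!was_set])
  }, add = TRUE)
  Sys.setenv(AWS_ACCESS_KEY_ID = access_key_id,
             AWS_SECRET_ACCESS_KEY = secret_access_key)
  # `expr` is lazily evaluated, so it runs only after the variables are set
  force(expr)
}

# Usage (requires network access and the Rarr package):
# with_aws_credentials("<key id>", "<secret key>",
#   zarr_overview("https://s3.embl.de/rarr-testing/bzip2.zarr"))
```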
Although Rarr will try its best to find appropriate
credentials and settings to access a bucket, it is not always
successful. One such example is when you have AWS credentials set
somewhere and you try to access a public bucket. We can see this
below, where we access the same public bucket used at the start of this
vignette, but the request now fails because we set the
AWS_ACCESS_KEY_ID environment variable in the previous
section.
s3_address <- "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0076A/10501752.zarr/0"
zarr_overview(s3_address)
You might encounter similar problems if you’re trying to access
multiple buckets, each of which requires different credentials. The
solution here is to create an “s3_client” using
paws.storage::s3(), which contains all the required details
for accessing a particular bucket. Doing so will prevent
Rarr from trying to determine things on its own, and
gives you complete control over the settings used to communicate with
the S3 bucket. Here’s an example that will let us access the failing
bucket by creating a client with anonymous credentials.
s3_client <- paws.storage::s3(
config = list(
credentials = list(anonymous = TRUE),
region = "auto",
endpoint = "https://uk1s3.embassy.ebi.ac.uk"
)
)
If you’re accessing a public bucket, the most important step is to
provide a credentials list with
anonymous = TRUE. Doing so ensures that no attempts to find
other credentials are made, and prevents the problems seen above. If
you’re using files on Amazon AWS storage, you’ll need to set the
region to whatever is appropriate for your data,
e.g. "us-east-2", "eu-west-3", etc. For other
S3 providers that don’t have regions, use the value "auto"
as in the example above. Finally, the endpoint argument is
the full URL of the server where your files can be found, including the
https:// prefix. For more information on creating an S3 client, see the
paws.storage documentation.
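The choices described above can be collected into a small helper that builds the config list for paws.storage::s3(). This is a minimal sketch: make_s3_config() is a hypothetical name, the creds structure (access_key_id, secret_access_key) follows the config layout described in the paws documentation, and leaving endpoint as NULL assumes paws can work out the AWS endpoint from the region.

```r
# Hypothetical helper: build the `config` list accepted by paws.storage::s3()
make_s3_config <- function(endpoint = NULL, region = "auto",
                           access_key_id = NULL, secret_access_key = NULL) {
  if (is.null(access_key_id)) {
    # Public bucket: anonymous access, so no other credentials are searched for
    credentials <- list(anonymous = TRUE)
  } else {
    # Private bucket: use exactly this key pair
    credentials <- list(creds = list(
      access_key_id = access_key_id,
      secret_access_key = secret_access_key
    ))
  }
  config <- list(credentials = credentials, region = region)
  # Non-AWS providers need an explicit endpoint URL; for AWS it is left out here
  if (!is.null(endpoint)) config$endpoint <- endpoint
  config
}

# Public, region-less provider (the EBI bucket used in this vignette)
public_cfg <- make_s3_config(endpoint = "https://uk1s3.embassy.ebi.ac.uk")

# Private bucket on Amazon AWS in a specific region (placeholder keys)
aws_cfg <- make_s3_config(region = "eu-west-3",
                          access_key_id = "<your key id>",
                          secret_access_key = "<your secret key>")

# Building the client itself requires the paws.storage package
if (requireNamespace("paws.storage", quietly = TRUE)) {
  public_client <- paws.storage::s3(config = public_cfg)
}
```

Keeping the public and private cases in one function makes it harder to accidentally send a key pair to a public bucket, which is the failure mode shown above.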
We can then pass our s3_client to zarr_overview(), and it
now works successfully.
zarr_overview(s3_address, s3_client = s3_client)
## Type: Array
## Path: https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0076A/10501752.zarr/0
## Shape: 50 x 494 x 464
## Chunk Shape: 1 x 494 x 464
## No. of Chunks: 50 (50 x 1 x 1)
## Data Type: float64
## Endianness: little
## Compressor: blosc
Most functions in Rarr accept an
s3_client argument, which can be used in the same
way.