---
title: "Working with **remote** Zarr arrays in R"
date: "23 June 2026"
author:
  - name: Hugo Gruson
    affiliation:
      - EMBL Heidelberg
    email: hugo.gruson@embl.de
  - name: Mike Smith
    affiliation:
      - EMBL Heidelberg
package: "Rarr 2.1.18"
vignette: >
  %\VignetteIndexEntry{"Working with **remote** Zarr arrays in R"}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
output:
  BiocStyle::html_document
---

<!-- 
# Pre-render with 
knitr::knit("vignettes/_S3.Rmd", output = "vignettes/S3.Rmd")
-->


``` r
library(Rarr)
```

We recommend reading the [general introduction "Working with Zarr arrays in R"](https://huber-group-embl.github.io/Rarr/articles/Rarr.html) before this vignette.

## Reading from S3 storage {#read-s3}

Reading files in S3 storage works in much the same way as reading from local
disk, except that the path must be a URL pointing to the Zarr array.
We can again use `zarr_overview()` to quickly retrieve the array metadata.


``` r
s3_address <- "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0076A/10501752.zarr/0"
zarr_overview(s3_address)
```

```
## Type: Array
## Path: https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0076A/10501752.zarr/0
## Shape: 50 x 494 x 464
## Chunk Shape: 1 x 494 x 464
## No. of Chunks: 50 (50 x 1 x 1)
## Data Type: float64
## Endianness: little
## Compressor: blosc
```

You can also pass an S3 client to the function, which is useful if you need to set credentials or other options for accessing the bucket. See Section \@ref(s3-client) for details.
If no client is supplied, *[Rarr](https://bioconductor.org/packages/3.24/Rarr)* will try to determine credentials and other settings on its own, which is not always successful.
The following is equivalent to the previous code block:


``` r
s3_address <- "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0076A/10501752.zarr/0"
s3_client <- paws.storage::s3(
  config = list(
    credentials = list(anonymous = TRUE),
    region = "auto",
    endpoint = "https://uk1s3.embassy.ebi.ac.uk"
  )
)
zarr_overview(s3_address, s3_client = s3_client)
```

```
## Type: Array
## Path: https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0076A/10501752.zarr/0
## Shape: 50 x 494 x 464
## Chunk Shape: 1 x 494 x 464
## No. of Chunks: 50 (50 x 1 x 1)
## Data Type: float64
## Endianness: little
## Compressor: blosc
```

The output above indicates that the array is stored in 50 chunks, each
containing a single slice of the overall data. In the example below we use the
`index` argument to extract the first and tenth slices from the array. Because
only the 2 chunks holding those slices need to be downloaded, this is much
faster than downloading the entire array before accessing the data.


``` r
z2 <- read_zarr_array(
  s3_address,
  index = list(c(1, 10), NULL, NULL)
)
```
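Zarr stores each chunk as a separate object, so a reader only has to fetch the chunks that overlap the requested index. The sketch below works this out from the shapes reported by `zarr_overview()`. The helper `chunks_touched()` is hypothetical and not part of **Rarr**; it is only meant to illustrate the arithmetic, not how **Rarr** decides what to download.

``` r
## Hypothetical helper (not part of Rarr): count how many chunks a
## selection overlaps, given the array shape and the chunk shape
chunks_touched <- function(index, array_shape, chunk_shape) {
  per_dim <- vapply(seq_along(array_shape), function(i) {
    sel <- index[[i]]
    ## NULL means "everything along this dimension"
    if (is.null(sel)) sel <- seq_len(array_shape[i])
    ## map 1-based positions to 1-based chunk numbers, keep the distinct ones
    length(unique((sel - 1) %/% chunk_shape[i] + 1))
  }, numeric(1))
  ## chunks form a grid, so multiply the counts per dimension
  prod(per_dim)
}

## shapes reported by zarr_overview() above
array_shape <- c(50, 494, 464)
chunk_shape <- c(1, 494, 464)

## first and tenth slices only
chunks_touched(list(c(1, 10), NULL, NULL), array_shape, chunk_shape)
## the whole array, for comparison
chunks_touched(list(NULL, NULL, NULL), array_shape, chunk_shape)
```

With one slice per chunk, the selection above touches 2 chunks, whereas reading everything would touch all 50.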

We then plot our two slices on top of one another using the `image()` function.


``` r
## plot the first slice in blue
image(
  log2(z2[1, , ]),
  col = hsv(h = 0.6, v = 1, s = 1, alpha = 0:100 / 100),
  asp = dim(z2)[2] / dim(z2)[3],
  axes = FALSE
)
## overlay the tenth slice in green
image(
  log2(z2[2, , ]),
  col = hsv(h = 0.3, v = 1, s = 1, alpha = 0:100 / 100),
  asp = dim(z2)[2] / dim(z2)[3],
  axes = FALSE,
  add = TRUE
)
```

![plot of chunk plot-raster](figure/plot-raster-1.png)

**Note:** if you receive the error message
`"Error in stop(aws_error(request$error)) : bad error message"` it is likely you
have some AWS credentials available in your R session, which are being
inappropriately used to access this public bucket. See Section
\@ref(s3-client) for details on how to set credentials for a specific
request.
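If those credentials come from environment variables and you don't need them in the current session, another option is to remove them with base R's `Sys.unsetenv()`. This is a minimal sketch assuming the credentials were set as environment variables; credentials read from other sources, such as an AWS credentials file, are not affected.

``` r
## Remove AWS credentials stored in environment variables for the rest of
## this R session; files such as ~/.aws/credentials are left untouched
Sys.unsetenv(c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN"))
```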

## Using credentials to access S3 buckets {#s3-credentials}

If you're accessing data in a private S3 bucket, you can set the environment
variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` to store your
credentials. For example, let's try reading a file in a private S3 bucket:


``` r
zarr_overview("https://s3.embl.de/rarr-testing/bzip2.zarr")
```

```
## Error:
## ! AccessDenied (HTTP 403). Access Denied.
```

We can see the "Access Denied" message in our output, indicating that we don't
have permission to access this resource as an anonymous user.  However, if we use the key pair
below, which gives read-only access to the objects in the `rarr-testing` bucket,
we're now able to interrogate the files with functions in *Rarr*.


``` r
Sys.setenv(
  "AWS_ACCESS_KEY_ID" = "bYUBYVg1AsEreuDgtg5K",
  "AWS_SECRET_ACCESS_KEY" = "r8FrLXc9dseD6V1P3htsu7ZBzP7Gszsd3sM1G4KX"
)
zarr_overview("https://s3.embl.de/rarr-testing/bzip2.zarr")
```

```
## Type: Array
## Path: https://s3.embl.de/rarr-testing/bzip2.zarr
## Shape: 20 x 10
## Chunk Shape: 10 x 10
## No. of Chunks: 2 (2 x 1)
## Data Type: int32
## Endianness: little
## Compressor: bz2
```

Behind the scenes **Rarr** uses the **paws** suite of packages
(<https://paws-r.github.io/>) to interact with S3 storage. A comprehensive
overview of the ways credentials can be set and used by **paws** can be
found at <https://github.com/paws-r/paws/blob/main/docs/credentials.md>. If
setting environment variables as above doesn't work or is inappropriate for
your use case, please refer to that document for other options.
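Environment variables set with `Sys.setenv()` remain in place for the rest of the session, which is what causes the problem in the next section. One way to limit their reach is to set them only around the call that needs them. The helper `with_aws_credentials()` below is hypothetical (it is not part of **Rarr** or **paws**) and uses only base R; it assumes the credentials are read from the environment while the wrapped call runs.

``` r
## Hypothetical helper (not part of Rarr or paws): set AWS credentials only
## while `expr` is evaluated, then restore the previous state
with_aws_credentials <- function(access_key_id, secret_access_key, expr) {
  vars <- c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
  ## remember the current values (NA for variables that are not set)
  old <- Sys.getenv(vars, unset = NA, names = TRUE)
  on.exit({
    was_set <- !is.na(old)
    ## put back variables that had a value, remove those that did not
    if (any(was_set)) do.call(Sys.setenv, as.list(old[was_set]))
    if (any(!was_set)) Sys.unsetenv(names(old)[!was_set])
  }, add = TRUE)
  Sys.setenv(
    AWS_ACCESS_KEY_ID = access_key_id,
    AWS_SECRET_ACCESS_KEY = secret_access_key
  )
  ## `expr` is a lazily evaluated argument, so it runs only now,
  ## after the credentials have been set
  force(expr)
}
```

For instance, wrapping the `zarr_overview("https://s3.embl.de/rarr-testing/bzip2.zarr")` call from above in `with_aws_credentials()`, with the same key pair, should leave no AWS variables behind once the call returns.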

## Creating an S3 client {#s3-client}

Although **Rarr** will try its best to find appropriate credentials and settings
to access a bucket, it is not always successful. One such example is when you
have AWS credentials set somewhere and you try to access a public bucket. We
can see an example of this below, where we access the same public bucket used in
Section \@ref(read-s3), but it now fails because we set the `AWS_ACCESS_KEY_ID`
environment variable in the previous section.


``` r
s3_address <- "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0076A/10501752.zarr/0"
zarr_overview(s3_address)
```


You might encounter similar problems if you're trying to access multiple buckets,
each of which requires different credentials. The solution here is to create an
S3 client using `paws.storage::s3()`, which contains all the required details
for accessing a particular bucket. Doing so prevents **Rarr** from trying
to determine these settings on its own, and gives you complete control over the
settings used to communicate with the S3 bucket. Here's an example that lets us
access the failing bucket by creating a client with anonymous credentials.


``` r
s3_client <- paws.storage::s3(
  config = list(
    credentials = list(anonymous = TRUE),
    region = "auto",
    endpoint = "https://uk1s3.embassy.ebi.ac.uk"
  )
)
```

If you're accessing a public bucket, the most important step is to provide a
`credentials` list with `anonymous = TRUE`. Doing so ensures that no attempts
to find other credentials are made, and prevents the problems seen above. If
you're using files on Amazon AWS storage you'll need to set the `region` to
whatever is appropriate for your data, e.g. `"us-east-2"`, `"eu-west-3"`, etc.
For other S3 providers that don't have regions, use the value `"auto"` as in the
example above. Finally, the `endpoint` argument is the full URL (including
`https://`) of the server where your files can be found. For more information
on creating an S3 client see the [**paws.storage**
documentation](https://paws-r.github.io/docs/s3/).

We can then pass our `s3_client` to `zarr_overview()`, and it now works
successfully.


``` r
zarr_overview(s3_address, s3_client = s3_client)
```

```
## Type: Array
## Path: https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0076A/10501752.zarr/0
## Shape: 50 x 494 x 464
## Chunk Shape: 1 x 494 x 464
## No. of Chunks: 50 (50 x 1 x 1)
## Data Type: float64
## Endianness: little
## Compressor: blosc
```
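Returning to the case of several buckets that each need different credentials, a sketch of one client per bucket is shown below. It assumes the **paws** `config` layout in which explicit keys go in `credentials$creds`, and reuses the public IDR endpoint and the read-only `rarr-testing` key pair from earlier in this vignette. The names `ebi_client` and `embl_client` are just illustrative.

``` r
## Public IDR bucket at EBI: anonymous access, no credentials looked up
ebi_client <- paws.storage::s3(
  config = list(
    credentials = list(anonymous = TRUE),
    region = "auto",
    endpoint = "https://uk1s3.embassy.ebi.ac.uk"
  )
)

## Private bucket at EMBL: explicit read-only key pair, so no
## environment variables are needed for this client
embl_client <- paws.storage::s3(
  config = list(
    credentials = list(
      creds = list(
        access_key_id = "bYUBYVg1AsEreuDgtg5K",
        secret_access_key = "r8FrLXc9dseD6V1P3htsu7ZBzP7Gszsd3sM1G4KX"
      )
    ),
    region = "auto",
    endpoint = "https://s3.embl.de"
  )
)
```

Each client is then passed to the calls for its own bucket, e.g. `zarr_overview("https://s3.embl.de/rarr-testing/bzip2.zarr", s3_client = embl_client)`, so the two sets of settings never interfere with each other.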

Most functions in **Rarr** accept an `s3_client` argument, which can be used
in the same way.
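For example, the slice-reading call from the start of this vignette can be given the anonymous client explicitly. This sketch assumes network access to the public IDR bucket and reuses the client settings shown above; it reads only the first slice.

``` r
library(Rarr)

## anonymous client for the public IDR bucket, as created above
s3_client <- paws.storage::s3(
  config = list(
    credentials = list(anonymous = TRUE),
    region = "auto",
    endpoint = "https://uk1s3.embassy.ebi.ac.uk"
  )
)
s3_address <- "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0076A/10501752.zarr/0"

## read the first slice only, using the explicit client rather than
## letting Rarr look for credentials itself
z1 <- read_zarr_array(
  s3_address,
  index = list(1, NULL, NULL),
  s3_client = s3_client
)
dim(z1)
```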