Images and ImageCollections

wf.Image and wf.ImageCollection are the main proxy objects you’ll interact with in Workflows. They hold multi-band raster data in a NumPy masked array format, and metadata in a dict format. They behave a bit like NumPy arrays, with support for vectorized operations and broadcasting: img + 1 adds 1 to every pixel in the image; img1 + img2 does pixel-wise addition, matched by bands, imagecollection + img adds img to every Image in the ImageCollection.

An Image represents a single image, whereas an ImageCollection represents a stack or timeseries of images. Both objects are structured the same, and in many cases can be used interchangeably. They both have these fields:

  • ndarray: a MaskedArray of the raster data.

    • For Image, it’s 3D, with shape (bands, y, x).
    • For ImageCollection, it’s 4D, with shape (images, bands, y, x)
  • properties: metadata, such as the date

    • For Image, it’s a Dict with Str keys. All values are Any besides these:

      • "id": Str
      • "date": Datetime
      • "product": Str
      • "crs": Str
      • "geotrans": Tuple[Float, Float, Float, Float, Float, Float]
    • For ImageCollection, it’s a List of the above Dicts, equal in length to the number of Images in the ImageCollection

  • bandinfo: a Dict of metadata about the bands. The keys are band names; the values are Dicts of band metadata, such as data_range.

Since Image and ImageCollection behave so similary, throughout this guide, we’ll use the term “imagery” to mean “an Image or an ImageCollection”.

Loading Imagery

You can load imagery using Image.from_id and ImageCollection.from_id. (It’s also possible to construct an ImageCollection from a list of Images.)

>>> import descarteslabs.workflows as wf
>>> img = wf.Image.from_id("sentinel-1:GRD:meta_2021-01-05_056D1117_S1A")
>>> imagecollection = wf.ImageCollection.from_id("sentinel-1:GRD", start_datetime="2021-01-01", end_datetime="2021-02-01")
>>> import descarteslabs as dl
>>> santa_fe_ctx = dl.scenes.DLTile.from_latlon(35.67, -105.93, resolution=40, tilesize=512, pad=0)
>>> img.inspect(santa_fe_ctx)
ImageResult:
  * ndarray: MaskedArray<shape=(2, 512, 512), dtype=float64>
  * properties: 'acquired', 'area', 'bits_per_pixel', 'bucket', ...
  * bandinfo: 'vv', 'vh'
  * geocontext: 'geometry', 'key', 'resolution', 'tilesize', ...
>>> imagecollection.inspect(santa_fe_ctx)
ImageCollectionResult of length 15:
  * ndarray: MaskedArray<shape=(15, 2, 512, 512), dtype=float64>
  * properties: 15 items
  * bandinfo: 'vv', 'vh'
  * geocontext: 'geometry', 'key', 'resolution', 'tilesize', ...

Arithmetic

Imagery objects support all the operators between both single numbers and other imagery objects:

  • Arithmetic: +, -, *, /, //

    >>> img + 1
    >>> img1 / img2
    
  • Equality: ==, !=, <, >, <=, =>

    >>> img.pick_bands("red") < 0.4
    >>> imagecollection >= img
    
  • Bitwise: ~, &, |, ^, <<, >>

    >>> ~img.pick_bands("cloud-mask")
    >>> (img.pick_bands("class") < 5) | ((ndvi_img > 0.4) & img.properties["date"].is_between("2018-01-01", "2019-01-01"))
    

Image and ImageCollection mostly behave like NumPy arrays with respect to these operators. For example, img + 1 adds 1 to every pixel in the image; img1 + img2 does pixel-wise addition. If you’re familiar with NumPy, all you need to know is that Workflows behaves the exact same way, with one slight difference: the bands of two imagery objects are aligned by name, not by order.

Some common examples of combining imagery:

>>> imagecollection = wf.ImageCollection.from_id("sentinel-2:L1C")
>>> img = imagecollection[0]
>>> img + 1  # adds 1 to every pixel
>>> img + img  # adds corresponding pixels (equivalent to `img * 2`)
>>> img + img.pick_bands("red")  # adds corresponding pixel from the red band to every band
>>> img.pick_bands("red green") + img.pick_bands("red green")  # adds red band to red band, and green band to green band
>>> img.pick_bands("red green") + img.pick_bands("green red")  # same as above---bands are matched up by name
>>> imagecollection + 1  # adds 1 to every pixel in every Image
>>> imagecollection + imagecollection  # adds corresp. pixels in corresp. Images (equivalent to `imagecollection * 2`)
>>> imagecollection + imagecollection.pick_bands("red")  # adds corresp. pixel from the red band to every band, Image by Image
>>> imagecollection + img  # adds `img` to every Image
>>> imagecollection + img.pick_bands("red")  # adds `img`'s red band to every band in every Image

For the full details, Imagery Broadcasting and Alignment Rules.

Bands

Image and ImageCollection objects contain one more more bands. When you do arithmetic operations on them, the operation applies to every band.

When you construct an Image or ImageCollection, the object initially contains every band for that product.

Tip: img.bandinfo.keys().inspect(ctx) is a quick way to find out the names of all the bands in an Image or ImageCollection.

>>> img = wf.Image.from_id("sentinel-2:L1C:2019-04-07_13SDV_33_S2B_v1")
>>> img.bandinfo.keys().inspect(santa_fe_ctx)
['red',
 'nir',
 'green',
 'bright-mask',
 'blue',
 'alpha',
 'red-edge',
 'swir1',
 'red-edge-2',
 'swir2',
 'red-edge-3',
 'red-edge-4',
 'coastal-aerosol',
 'water-vapor',
 'cloud-mask',
 'opaque-cloud-mask',
 'cirrus',
 'cirrus-cloud-mask',
 'derived:bai',
 'derived:evi',
 'derived:nbr',
 'derived:ndvi',
 'derived:ndwi',
 'derived:ndwi1',
 'derived:ndwi2',
 'derived:rsqrt',
 'derived:visual_cloud_mask']

With pick_bands, you can select just the band(s) you need into a new imagery object.

>>> rgb = img.pick_bands("red green blue")
>>> rgb.bandinfo.keys().inspect(santa_fe_ctx)
['red', 'green', 'blue']

Often, it’s helpful to split apart the bands into separate variables, in order to do math between them. unpack_bands makes this easier to type, returning a tuple of single-band imagery instead of one multi-band Image/ImageCollection:

>>> nir, red = img.unpack_bands("nir red")
>>> nir.bandinfo.keys().inspect(santa_fe_ctx)
['nir']
>>> red.bandinfo.keys().inspect(santa_fe_ctx)
['red']
>>> ndvi = (nir - red) / (nir + red)
>>> ndvi.bandinfo.keys().inspect(santa_fe_ctx)
['nir_sub_red_div_nir_add_red']

When you calculate new derived data like this, the band names may not make much sense. You can use rename_bands to create a new object with more sensible band names:

>>> ndvi_renamed = ndvi.rename_bands("ndvi")
>>> ndvi_renamed.bandinfo.keys().inspect(santa_fe_ctx)
['ndvi']

When both Images/ImageCollections in an arithmetic operation have multiple bands, they must have the same band names. Workflows then performs the operation between equivalent bands, as matched by name.

>>> s2 = wf.ImageCollection.from_id("sentinel-2:L1C", start_datetime="2019-01-01", end_datetime="2019-06-01")
>>> l8 = wf.ImageCollection.from_id("landsat:LC08:01:T1:TOAR", start_datetime="2019-01-01", end_datetime="2019-06-01")
>>> s2_comp = s2.mean("images")
>>> l8_comp = l8.mean("images")
>>> s2_rgb = s2_comp.pick_bands("red green blue")
>>> l8_rgb = l8_comp.pick_bands("red green blue")
>>> s2_l8_diff = s2_rgb - l8_rgb

Normally, this is what you want. But in the rare case where you match up bands positionally, you can use rename_bands as a trick to override Workflows’ alignment rules. For example, say we want to look at the difference between subsequent bands in a certain order:

>>> s2 = wf.ImageCollection.from_id("sentinel-2:L1C", start_datetime="2019-01-01", end_datetime="2019-02-01")
>>> spectral_order = s2.pick_bands("swir2 swir1 nir red")
>>> spectral_order_shifted = s2.pick_bands("swir1 nir red green")
>>> delta = spectral_order_shifted - spectral_order
>>> delta.mean(axis=("pixels", "images")).inspect(santa_fe_ctx)
---------------------------------------------------------------------------
BadRequestError                           Traceback (most recent call last)
...
BadRequestError: In 'sub': Rasters must have the same bands. ['swir1', 'nir', 'red', 'green'] != ['swir2', 'swir1', 'nir', 'red']

To work around this, you can rename the bands to match up in the order you want:

>>> matching_names = ["0", "1", "2", "3"]
>>> spectral_order_renamed = spectral_order.rename_bands(*matching_names)
>>> spectral_order2_renamed = spectral_order_shifted.rename_bands(*matching_names)
>>> delta = spectral_order2_renamed - spectral_order_renamed
>>> delta.mean(axis=("pixels", "images")).inspect(santa_fe_ctx)
{'0': 0.023125917875686113,
 '1': 0.2456081922092364,
 '2': -0.020058692454203053,
 '3': -0.02892147951430528}

Pixel values

In Workflows, pixel values are automatically rescaled to their physical_range, since physical units are more intuitive to work with (particularly if you’re referencing remote sensing literature). (Note that this behavior is different from raster and Scenes.) For example, for a Landsat-8 image where values are stored as uint16 with the data_range 0-10000, in Workflows the values would be float64 from 0.0-1.0. Specifically, if a band defines both data_range and physical_range, and the band’s type is not classification, then when the data is first loaded, its values are rescaled, without clipping, from data_range to physical_range. Otherwise, the values are left alone.

Reductions and compositing

Reductions such as mean, max, and median are like their NumPy equivalents: they calculate a statistic along one or more dimensions of imagery, reducing its dimensionality.

The most common use of reductions is flattening an ImageCollection into a single Image. In remote sensing, this is commonly called compositing:.

>>> s2 = wf.ImageCollection.from_id("sentinel-2:L1C", start_datetime="2019-04-07", end_datetime="2019-04-10")
>>> all_s2 = s2.pick_bands("red green blue").inspect(santa_fe_ctx)
>>> dl.scenes.display(*all_s2.ndarray, ncols=2)
https://cdn.descarteslabs.com/docs/1.12.1/public/_images/imagery_figure12_1.png

We can create a median composite of this stack of images:

>>> s2_comp = s2.pick_bands("red green blue").median(axis="images").inspect(santa_fe_ctx)
>>> dl.scenes.display(s2_comp.ndarray)
https://cdn.descarteslabs.com/docs/1.12.1/public/_images/imagery_figure13_1.png

This is computing the median of each pixel across time (for each band). That “flattens” the images axis of the ImageCollection, producing a single Image.

These reductions are available:

You can also use Image.reduction / ImageCollection.reduction to specify the reduction type as a string (like "mean"), which is helpful if you want to pick the reduction operation with a widget, or change it conditionally with ifelse.

Reduction axis

Inspired by NumPy, all reductions take an axis argument: which axis (or axes) to “flatten”. Unlike NumPy, the axes are not given numerically, but by name:

  • "pixels": aggregates the spatial dimensions to a single value (equivalent to axis=(-2, -1) of the ndarray)
  • "bands": aggregates all bands to a single band (equivalent to axis=-3 of the ndarray)
  • "images": only valid for ImageCollection: aggregates all Images to a single Image (equivalent to axis=0 of the ndarray)

With bands and images, the result is still an Image or ImageCollection. When reducing pixels, though, the spatial component is eliminated, so instead the results are returned in a Dict (keyed by band name), List (of the same length as the ImageCollection), or List[Dict].

Refer to the API reference of each of the reduction functions for more information, including what they return for different combinations of the axis argument.

For illustration purposes, here is ImageCollection.mean with all the possible combinations of the axis argument. We’ll use the same ImageCollection for every case:

>>> s1 = wf.ImageCollection.from_id("sentinel-1:GRD", start_datetime="2021-01-01", end_datetime="2021-02-01")
>>> s1.inspect(santa_fe_ctx)
ImageCollectionResult of length 15:
  * ndarray: MaskedArray<shape=(15, 2, 512, 512), dtype=float64>
  * properties: 15 items
  * bandinfo: 'vv', 'vh'
  * geocontext: 'geometry', 'key', 'resolution', 'tilesize', ...

axis="images"

>>> s1.mean(axis="images").inspect(santa_fe_ctx)
ImageResult:
  * ndarray: MaskedArray<shape=(2, 512, 512), dtype=float64>
  * properties: 'cs_code', 'proj4', 'descartes_version', 'sw_version', ...
  * bandinfo: 'vv', 'vh'
  * geocontext: 'geometry', 'key', 'resolution', 'tilesize', ...

axis="bands"

>>> s1.mean(axis="bands").inspect(santa_fe_ctx)
ImageCollectionResult of length 15:
  * ndarray: MaskedArray<shape=(15, 1, 512, 512), dtype=float64>
  * properties: 15 items
  * bandinfo: 'mean'
  * geocontext: 'geometry', 'key', 'resolution', 'tilesize', ...

axis="pixels"

>>> s1.mean(axis="pixels").inspect(santa_fe_ctx)
[{'vv': 0.06507844363941867, 'vh': 0.013471549539005053},
 {'vv': 0.0672581267668532, 'vh': 0.012318438247697678},
 {'vv': 0.08511264914812902, 'vh': 0.015800944029523173},
 {'vv': 0.03462038566820695, 'vh': 0.009320577734452272},
 {'vv': 0.06395339252744914, 'vh': 0.012412520835099021},
 {'vv': 0.03661998548801368, 'vh': 0.006415286723717648},
 {'vv': 0.06575639762130438, 'vh': 0.013508246926700367},
 {'vv': 0.06645751142928526, 'vh': 0.01276219756396224},
 {'vv': 0.08598611798555544, 'vh': 0.01651434521501131},
 {'vv': 0.03692038186786661, 'vh': 0.010285123715727401},
 {'vv': 0.06696140452529671, 'vh': 0.013725120566631954},
 {'vv': 0.039123885552189534, 'vh': 0.00703503718474875},
 {'vv': 0.0678754021139706, 'vh': 0.013611529481177236},
 {'vv': 0.07074211130000033, 'vh': 0.012591674749612845},
 {'vv': 0.09397280188010747, 'vh': 0.016609604723956565}]

axis=("images", "bands")

>>> s1.mean(axis=("images", "bands")).inspect(santa_fe_ctx)
ImageResult:
  * ndarray: MaskedArray<shape=(1, 512, 512), dtype=float64>
  * properties: 'cs_code', 'proj4', 'descartes_version', 'sw_version', ...
  * bandinfo: 'mean'
  * geocontext: 'geometry', 'key', 'resolution', 'tilesize', ...

axis=("images", "pixels")

>>> s1.mean(axis=("images", "pixels")).inspect(santa_fe_ctx)
{'vv': 0.06471048479811546, 'vh': 0.012596908153723297}

axis=("bands", "pixels")

>>> s1.mean(axis=("bands", "pixels")).inspect(santa_fe_ctx)
[0.03927499658921186,
 0.03978828250727544,
 0.0504567965888261,
 0.021970481701329612,
 0.03818295668127408,
 0.021517636105865663,
 0.039632322274002374,
 0.03960985449662375,
 0.05125023160028338,
 0.023602752791797004,
 0.04034326254596433,
 0.023079461368469143,
 0.040743465797573916,
 0.041666893024806584,
 0.055291203302032015]

axis=("images", "bands", "pixels")

>>> s1.mean(axis=("images", "bands", "pixels")).inspect(santa_fe_ctx)
0.038653696475919375

axis=None (equivalent to axis=("images", "bands", "pixels"))

>>> s1.mean(axis=None).inspect(santa_fe_ctx)
0.038653696475919375

And here is Image.mean with all the possible combinations of the axis argument. We’ll use the same Image for every case:

>>> s1_img = s1[0]
>>> s1_img.inspect(santa_fe_ctx)
ImageResult:
  * ndarray: MaskedArray<shape=(2, 512, 512), dtype=float64>
  * properties: 'acquired', 'cs_code', 'pass', 'bits_per_pixel', ...
  * bandinfo: 'vv', 'vh'
  * geocontext: 'geometry', 'key', 'resolution', 'tilesize', ...

axis="bands"

>>> s1_img.mean(axis="bands").inspect(santa_fe_ctx)
ImageResult:
  * ndarray: MaskedArray<shape=(1, 512, 512), dtype=float64>
  * properties: 'acquired', 'cs_code', 'pass', 'bits_per_pixel', ...
  * bandinfo: 'mean'
  * geocontext: 'geometry', 'key', 'resolution', 'tilesize', ...

axis="pixels"

>>> s1_img.mean(axis="pixels").inspect(santa_fe_ctx)
{'vv': 0.06507844363941867, 'vh': 0.013471549539005053}

axis=("bands", "pixels")

>>> s1_img.mean(axis=("bands", "pixels")).inspect(santa_fe_ctx)
0.03927499658921186

axis=None

>>> s1_img.mean(axis=None).inspect(santa_fe_ctx)
0.03927499658921186

ImageCollection-specific reductions

ImageCollection.mosaic flattens the ImageCollection into a single Image by picking each first-unmasked pixel (ordered from last to first). mosaic is often used to make a “most-recent-valid-pixel” composite. Unlike the other reductions, the order of the ImageCollection matters. Additionally, mosaic doesn’t take an axis argument: the axis is always images.

ImageCollection.sortby_composite uses the argmin or argmax of one band select pixels from other bands. For example, you could create an RGB composite, where each pixel is selected from the image where NDVI was highest for that pixel:

>>> kansas_ctx = dl.scenes.DLTile.from_latlon(38.379776, -99.275899, resolution=40, tilesize=512, pad=0)
>>> s2 = wf.ImageCollection.from_id("sentinel-2:L1C", start_datetime="2019-06-01", end_datetime="2019-08-01")
>>>
>>> nir, red = s2.unpack_bands("nir red")
>>> ndvi = (nir - red) / (nir + red)
>>>
>>> rgb = s2.pick_bands("red green blue")
>>> rgb_max_ndvi_comp = rgb.sortby_composite(ndvi, operation="argmax")
>>>
>>> # median and mosaic, for comparison
>>> rgb_median = rgb.median(axis="images")
>>> rgb_mosaic = rgb.mosaic()
>>>
>>> sortby_result, median_result, mosaic_result = wf.inspect(
...     [rgb_max_ndvi_comp.ndarray, rgb_median.ndarray, rgb_mosaic.ndarray],
...     kansas_ctx
... )
>>>
>>> dl.scenes.display(
...     sortby_result, median_result, mosaic_result, title=["argmax(NDVI) sortby composite", "Median composite", "Mosaic"]
... )
https://cdn.descarteslabs.com/docs/1.12.1/public/_images/imagery_figure28_1.png

Masking

Imagery objects use a MaskedArray, which is a proxy object for a NumPy masked array. Along with their pixel values, they have a boolean mask with the same shape as .ndarray. Like NumPy, True means masked (invalid), False means unmasked (valid).

Masks behave exactly the same as NumPy masked arrays. Masks propagate: in img1 + img2, if a pixel is masked in img1 but not img2, it will be masked in the result. Masked values are also ignored in reductions: imagecollection.median(axis="images") takes the median of only the unmasked pixels, for example.

When loading imagery, Workflows automatically masks the nodata value for each band that has one set. If the product contains a band named alpha, it also masks all bands by it. Additionally, if your GeoContext has the geometry field set, all pixels outside of that geometry will be masked.

Updating masks

You can create new imagery with an updated mask using Image.mask and ImageCollection.mask. For example

>>> import descarteslabs.workflows as wf
>>> s2 = wf.ImageCollection.from_id("sentinel-2:L1C")
>>> cloud, cirrus = s2.unpack_bands("cloud-mask cirrus-cloud-mask")
>>> s2_cloudmasked = s2.mask(cloud | cirrus)

By default, mask is additive: whatever was masked before remains masked, in addition to the new mask you pass in. Pass replace=True to change this.

Any pixels that are masked in the new mask are ignored: in img.mask(new_mask), masked pixels in new_mask are treated as False. (Note that this is different from NumPy.)

A common mistake is forgetting that mask returns a new object—it doesn’t modify the object you call it on. For example, this won’t work as expected:

>>> s2.mask(cloud | cirrus)

This does nothing, because you didn’t store the result into a new variable!

Accessing masks

You can access the current mask with getmask. This returns an Image or ImageCollection with one band of boolean data named mask.

Groupby

ImageCollection.groupby is similar to groupby in pandas: given a key function to identify which group a given Image belogs to, it splits the ImageCollection into a dict-like object of groups (ImageCollectionGroupby). You can then apply an operation to each group (such as a mean), and combine the results from all the groups back into one ImageCollection or Dict.

The most common use of groupby is making temporal composites, such as a monthly median. For that reason, groupby can take a dates argument for convenience, where you can give the date components to groupby (year, month, day, etc.):

>>> s2 = wf.ImageCollection.from_id("sentinel-2:L1C", start_datetime="2020-08-01", end_datetime="2020-11-01")
>>> s2_monthly_median = s2.groupby(dates=("year", "month")).median("images")
>>> result = s2_monthly_median.pick_bands("red green blue").inspect(kansas_ctx)
>>> result
ImageCollectionResult of length 3:
  * ndarray: MaskedArray<shape=(3, 3, 512, 512), dtype=float64>
  * properties: 3 items
  * bandinfo: 'red', 'green', 'blue'
  * geocontext: 'geometry', 'key', 'resolution', 'tilesize', ...

groupby adds a field named group to each image in the output:

>>> groups = s2_monthly_median.properties.map(lambda p: p["group"]).inspect(kansas_ctx)
>>> groups
[(2020, 8), (2020, 9), (2020, 10)]
>>> dl.scenes.display(*result.ndarray, title=groups, ncols=2)
https://cdn.descarteslabs.com/docs/1.12.1/public/_images/imagery_figure31_1.png

Instead of dates, you can give a function. This function must take one Image, and return which group it belongs to. Typically, the function would look at the Image’s properties.

For example, you can use this to bin the images into specific time intervals, such as 14-day composites:

>>> twoweek_max = (
...     s2.groupby(
...         lambda img: img.properties["date"] // wf.Timedelta(days=14)
...     ).max("images")
... )

Groupby doesn’t have to produce an ImageCollection. If the operation applied to each group produces something besides an Image or ImageCollection, groupby returns a Dict, where the keys are the groups, and the values are the value for each group. Here, we compute the mean NDVI value for each month:

>>> nir, red = s2.unpack_bands("nir red")
>>> ndvi = (nir - red) / (nir + red)
>>> monthly_max_ndvi = ndvi.groupby(dates="month").mean()
>>> monthly_max_ndvi.inspect(kansas_ctx)
{8: 0.4094230069494817, 9: 0.27414022149032186, 10: 0.23755633545839724}

Joining ImageCollections using Groupby

Workflows doesn’t currently have a built-in join operation. Instead, you can efficiently implement a join yourself using the ImageCollectionGroupby object.

The basic idea is to do a groupby on the two (or more) ImageCollections you want to combine. If you know they’ll always have the same groups, then you can aggregate both of them, and use normal vectorized operations to combine them:

>>> s2 = wf.ImageCollection.from_id("sentinel-2:L1C", start_datetime="2020-08-01", end_datetime="2020-11-01")
>>> s1 = wf.ImageCollection.from_id("sentinel-1:GRD", start_datetime="2020-08-01", end_datetime="2020-11-01")
>>> s2_nir_swir1 = s2.pick_bands("nir swir1")
>>> s1_vv = s1.pick_bands("vv")
>>>
>>> s2_monthly_mean = s2_nir_swir1.groupby(dates="month").mean("images")
>>> s1_monthly_mean = s1_vv.groupby(dates="month").mean("images")
>>> # now `s2_monthly_mean` and `s1_monthly_mean` are both monthly composites, so they should be the same length (3)
>>>
>>> combined_ic = s2_monthly_mean.concat_bands(s1_monthly_mean)
>>> combined_ic.bandinfo.keys().inspect(kansas_ctx)
['nir', 'swir1', 'vv']
>>> combined_ic.properties.map(lambda p: p["group"]).inspect(kansas_ctx)
[8, 9, 10]

But in some cases, you might get an error like ImageCollections must be the same length, 3 != 2. In that case, one of the ImageCollections was missing a group that the other one had. You can use a more advanced technique to address that: map over the groups from one of the ImageCollections, and for each, select the matching group from the other. This works because ImageCollectionGroupby acts like a dict: you can select groups from it with [] syntax:

>>> s1_missing_aug = s1.filter(lambda img: img.properties["date"].month != 8)
>>> # create a groupby object. note that we _don't_ apply a reduction, like `.mean()`
>>> s2_grouped = s2.groupby(dates="month")
>>> combined = s1_missing_aug.groupby(dates="month").map(
...     lambda group, s1_imgs: (
...         s1_imgs.mean("images")
...         .concat_bands(
...             # select the current group from `s2_grouped` (it acts like a dict)
...             s2_grouped[group].mean("images")
...         )
...     )
... )

By mapping over the ImageCollection that’s missing some groups, we know the result will only contain groups that both ImageCollections had.

But even if you mapped over the ImageCollection that wasn’t missing groups—or if both ImageCollections were missing some groups—this would still work, thanks to Workflows’ handling of empty imagery. Let’s consider the example above, but where we pick from s1_missing_aug instead of s2:

>>> s1_missing_aug = s1.filter(lambda img: img.properties["date"].month != 8)
>>> s1_missing_aug_grouped = s1_missing_aug.groupby(dates="month")
>>> combined = s2.groupby(dates="month").map(
...     lambda group, s2_imgs: (
...         s2_imgs.mean("images")
...         .concat_bands(
...             # when `s1_missing_aug_grouped` doesn't contain `group`, it gives an empty ImageCollection
...             s1_missing_aug_grouped[group].mean("images")
...         )
...     )
... )

Unlike a dict, selecting a missing item from an ImageCollectionGroupby doesn’t raise an error—it just returns a empty ImageCollection. (Read more about empty ImageCollections here.) You can use this empty ImageCollection just like any other, but everything you do to it just produces another empty ImageCollection. Then at the end, when ImageCollectionGroupby.map combines all of your groups together, it drops any groups where your function returned empty imagery. So in this case above, the function you passed to map returns an empty Image for month of August, which then gets dropped, so the final ImageCollection combined has 2 Images: one for September, one for October. (In SQL terms, this is an inner join: only keeping keys that both inputs have.)

You can combine this behavior with ImageCollection.replace_empty_with to have a “default value” for missing groups, letting you implement left or right joins, or even more complex logic.

>>> s1_missing_aug = s1.filter(lambda img: img.properties["date"].month != 8)
>>> s1_missing_aug_grouped = s1_missing_aug.groupby(dates="month")
>>> combined_with_s1_default = s2.groupby(dates="month").map(
...     lambda group, s2_imgs: (
...         s2_imgs.mean("images")
...         .concat_bands(
...             s1_missing_aug_grouped[group]
...             # replacing empty values with an ImageCollection of 0s makes this a left join:
...             # there will be an output for every group in `s2`.
...             .replace_empty_with(0, bandinfo=s1.bandinfo)
...             .mean("images")
...         )
...     )
... )

Filter

Note: see Life without for-loops for why filter is necessary.

Use ImageCollection.filter to create a new ImageCollection with images removed based on their metadata. Pass in a function which takes an Image; if the function returns False, the image is removed in the output.

>>> s2 = wf.ImageCollection.from_id("sentinel-2:L1C", start_datetime="2019-03-01", end_datetime="2019-04-01")
>>> rgb = s2.pick_bands("red green blue")
>>> rgb.length().inspect(santa_fe_ctx)
24
>>> low_cloud = rgb.filter(lambda img: img.properties["cloud_fraction"] < 0.1)
>>> low_cloud.length().inspect(santa_fe_ctx)
4
>>> all, low_cloud = wf.inspect([rgb.mean("images").ndarray, low_cloud.mean("images").ndarray], santa_fe_ctx)
>>> dl.scenes.display(all, low_cloud, title=["All images", "cloud_fraction filtered"], ncols=2)
https://cdn.descarteslabs.com/docs/1.12.1/public/_images/imagery_figure37_1.png

Note

Filtering is relatively slow

Especially for large ImageCollections, every filter can slow down your computation. In particular, try to avoid using multiple filters, by instead combining multiple expressions within the same filter operation:

>>> # slow!
>>> ic.filter(lambda img: img.properties.date.year >= 2016).filter(lambda img: img.properties.date.year < 2019)
>>> # faster
>>> ic.filter(lambda img: (img.properties.date.year >= 2016) & (img.properties.date.year < 2019))

Map

Note: see Life without for-loops for why map is necessary.

Use ImageCollection.map instead of a for-loop. map applies a function to every Image in the ImageCollection and returns the results as a new ImageCollection, or a List.

>>> year_offsets = wf.proxify({
...     2018: 0.0,
...     2019: 0.3,
...     2020: 0.9
... })
>>>
>>> s2.map(lambda img: img + year_offsets[img.properties["date"].year])
<descarteslabs.workflows.types.geospatial.imagecollection.ImageCollection at 0x7f92ea307110>

Note

Map is relatively slow

ImageCollection.map is like using a for-loop over a NumPy array: it’s slow, and with creative use of vectorized arithmetic, other vectorized operations on ImageCollections, and the NumPy array interface, can often be avoided.

>>> # slow!
>>> ic.map(lambda img: img + 1)
>>> # fast
>>> ic + 1
>>> # slow!
>>> ic.map(lambda img: img / img.mean())
>>> # faster
>>> means_arr = wf.Array(ic.mean(axis=["pixels", "bands"]))
>>> normd_arr = ic.ndarray / means_arr
>>> normd_arr.to_imagery(ic.properites, ic.bandinfo)

Indexing

You can index an ImageCollection just like a list:

>>> ic[0]
>>> ic[-1]
>>> ic[1:4]

Using an out-of-bounds index will cause an error.

Concatenation

Use wf.concat to concatenate any number of Images and/or ImageCollections into one new ImageCollection. All the items must have the same bands. Any empty Images/ImageCollections will be dropped from the result.

>>> s1 = wf.ImageCollection.from_id("sentinel-1:GRD", start_datetime="2020-08-01", end_datetime="2020-11-01")
>>> first, second = s1[0], s1[1]
>>>
>>> # concatenate two Images
>>> two_length_ic = wf.concat(first, second)
>>>
>>> # concatenate an Image and an ImageCollection
>>> all_but_first = wf.concat(second, s1[2:])
>>>
>>> # concatenate two Images and an ImageCollection
>>> all = wf.concat(first, second, s1[2:])

Empty Imagery

Sometimes, no data exist for the dates or places on Earth where you’re looking. Rather than raising an error, this missing data is represented by an empty Image or ImageCollection, which has no data or metadata. All operations on these “empties” succeed, but propagate the “emptiness” (except for some operations like concatenation, which drop empties). In general, the goal is that missing data, and anything it touches, will simply be ignored in your computations. If you need it to be handled differently, you can replace it with default values.

For example, say we want to make a composite of the ascending and descending orbital passes of Sentinel-1. In some places, Sentinel-1 captures data in either an ascending and descending orbit; in other places, both. So if we filter by both both pass directions, we’ll sometimes get missing data:

>>> col = wf.ImageCollection.from_id("sentinel-1:GRD", start_datetime="2019-04-01", end_datetime="2019-07-01")
>>> a = col.filter(lambda img: img.properties["pass"]=="ASCENDING") # empty ImageCollection (in eastern North America)
>>> d = col.filter(lambda img: img.properties["pass"]=="DESCENDING") # non-empty ImageCollection (in eastern North America)
>>>
>>> # Pick bands and min will succeed in both the empty and non-empty case
>>> a_min = a.pick_bands("vv").min(axis="images") # empty Image
>>> d_min = d.pick_bands("vv").min(axis="images") # non-empty Image
>>>
>>> mins_ic = wf.ImageCollection([a_min, d_min])
>>> # creating an ImageCollection drops any empty inputs,
>>> # so this will only contain `d_min` (`a_min` is an empty Image)
>>>
>>> max_composite = mins_ic.max(axis="images")
>>> # a maximum composite of the non-empty ascending/descending min composites.
>>> # since `a_min` was empty and `mins_ic` only contains `d_min`,
>>> # this is equivalent to `d_min`.
>>>
>>> max_composite.inspect(ctx)
ImageResult:
  * ndarray: MaskedArray<shape=(1, 512, 512), dtype=float64>
...

What if we did not want to ignore empty values, but actually fill them in with something else? Let’s treat empty data explicitly as 0:

>>> a_min_filled = a_min.replace_empty_with(0, mask=False, bandinfo={"vv": {}})
>>> # when `a_min` is empty, replaces it with a new Image with one "vv" band of all 0s.
>>> # if `a_min` wasn't empty, it would return it unchanged.
>>>
>>> mins_ic = wf.ImageCollection([a_min_filled, d_min])
>>> # now `mins_ic` contains both `a_min_filled` (all 0s) and `d_min` (actual data)
>>>
>>> max_composite = mins_ic.max(axis="images")
>>> max_composite.inspect(ctx)
ImageResult:
  * ndarray: MaskedArray<shape=(1, 512, 512), dtype=float64>
...

Bear in mind that this is a contrived example, and is simply meant to demonstrate how empty rasters can be handled. There are many other ways to do something like this (ImageCollection.groupby() would be the simplest). More specific details about how empties are represented, and how they are handled in specific functions can be found below.

Both Image and ImageCollection objects can be empty, meaning they lack an ndarray, properites, and bandinfo. The .ndarray will be None, .properties will be an empty dictionary (Image) or empty list (ImageCollection), and .bandinfo will be an empty dictionary. Empty imagery objects have a .geocontext determined by the GeoContext passed to .compute(). An ImageCollection cannot have some empty and some non-empty images. Every image in a non-empty ImageCollection is non-empty. This is why concat drops empties. In a similar way, you cannot have some empty and some non-empty bands. Every band in a non-empty Image or ImageCollection is non-empty. This is why concat_bands and map_bands return an empty if any of the input bands or mapped bands are empty.

An empty Image cannot be explicitly constructed, and will only result from other operations. An empty ImageCollection can be explicitly constructed by calling the constructor with an empty list/list of empty images, as well as calling .from_id where no imagery matches the given constraints:

# Construct from empty list
>>> empty_col = wf.ImageCollection([])
>>> empty_col.length().inspect(ctx) # ctx is the GeoContext the computation will be performed over
0

# Construct from list of empty images
>>> empty_col = wf.ImageCollection([empty_img1, empty_img2, empty_img3])
>>> empty_col.length().inspect(ctx)
0

# No imagery matching given constraints
>>> empty_col = wf.ImageCollection.from_id("landsat:LC08:01:RT:TOAR", start_datetime="2017-01-01", end_datetime="2017-02-01") # no imagery exists between these start and end datetimes
>>> empty_col.length().inspect(ctx)
0

Empties can be created without explicitly constructing them as shown above. Some operations can result in an empty, when being passed a non-empty.

Image Operations:

  • map_bands: If the mapper function returns an empty Image, map_bands will also return an empty Image.

ImageCollection Operations:

  • filter: If the filter function is not true for any images in the collection, filter will return an empty ImageCollection

    >>> non_empty_col = wf.ImageCollection.from_id("landsat:LC08:01:RT:TOAR", start_datetime="2017-01-01", end_datetime="2017-12-01")
    >>> filtd = non_empty_col.filter(lambda img: img.properties["date"].year == 2018) # none of the images are from 2018
    >>> filtd.length().inspect(ctx)
    0
    
  • map: If the mapper function returns an empty Image, map will return an empty ImageCollection.

  • map_bands: If the mapper function returns an empty ImageCollection, map_bands will return an empty ImageCollection. If the mapper function returns an empty Image, map_bands will return an empty Image.

  • map_window: If the mapper function returns empty imagery, map_window will return an empty ImageCollection.

Empties are aggressively propagated. This means that when there is empty involved in an operation, it will result in a empty, regardless of the emptiness of other arguments:

>>> empty_col = wf.ImageCollection([])
>>> non_empty_col = wf.ImageCollection.from_id("landsat:LC08:01:RT:TOAR", start_datetime="2017-01-01", end_datetime="2017-07-01")
>>> added = non_empty_col + empty_col
>>> added.length().inspect(ctx)
0

Some operations are an exception to this:

  • concat: All non-empty imagery is preserved

    >>> empty_col = wf.ImageCollection([])
    >>> non_empty_col = wf.ImageCollection.from_id("landsat:LC08:01:RT:TOAR", start_datetime="2017-01-01", end_datetime="2017-07-01")
    >>> concat = empty_col.concat(non_empty_col) # non_empty_col.concat(empty_col) will yield the same result
    >>> concat.length().inspect(ctx)
    4
    >>> concat = non_empty_col.concat(empty_col, non_empty_col)
    >>> concat.length().inspect(ctx)
    8
    
  • ImageCollection constructor: All non-empty imagery is preserved

    >>> empty_col = wf.ImageCollection([])
    >>> empty_img = empty_col[0] # slicing into an empty ImageCollection results in an empty Image (all indices are valid)
    >>> non_empty_img = wf.Image.from_id("landsat:LC08:PRE:TOAR:meta_LC80270312016188_v1")
    >>> col = wf.ImageCollection([non_empty_img, empty_img, empty_img])
    >>> col.length().inspect(ctx)
    1
    
  • mask: If the mask is empty, mask is a no-op. If the object being masked is empty, mask is an empty (type determined by broadcasting rules)

    >>> empty_col = wf.ImageCollection([])
    >>> non_empty_col = wf.ImageCollection.from_id("landsat:LC08:01:RT:TOAR", start_datetime="2017-01-01", end_datetime="2017-07-01")
    >>> no_op = non_empty_col.mask(empty_col)
    >>> no_op.length().inspect(ctx)
    4
    >>> op = empty_col.mask(non_empty_col)
    >>> op.length().inspect(ctx)
    0
    

Aggresive propagation of empties may not always be desirable. If you have just done an operation that may result in an empty, but do not want that emptiness to propagate down through the rest of the operations, you can replace it with a non-empty:

>>> empty_col = wf.ImageCollection([])
>>> non_empty_col = wf.ImageCollection.from_id("landsat:LC08:01:RT:TOAR", start_datetime="2017-01-01", end_datetime="2017-07-01")
>>> new_col = empty_col.replace_empty_with(non_empty_col) # replacing with an ImageCollection
>>> new_col.length().inspect(ctx)
4
>>> new_col = empty_col.replace_empty_with(1.1, bandinfo={"red": {}, "green": {}, "blue": {}}) # replacing with a scalar (3 bands per Image)
>>> # new_col will be fully masked as the default behavior of replace_empty_with (specifying mask=False would make it unmasked)
>>> new_col.length().inspect(ctx)
1

See Image.replace_empty_with() and ImageCollection.replace_empty_with() for details on how to replace empty Image and ImageCollection objects. More information about empty handling for specific functions can be found in their documentation.