Images and ImageCollections¶
wf.Image
and wf.ImageCollection
are the main proxy objects you’ll interact with in Workflows. They hold multi-band raster data in a NumPy masked array format, and metadata in a dict format. They behave a bit like NumPy arrays, with support for vectorized operations and broadcasting: img + 1
adds 1 to every pixel in the image; img1 + img2
does pixel-wise addition, matched by bands, imagecollection + img
adds img
to every Image
in the ImageCollection
.
An Image
represents a single image, whereas an ImageCollection
represents a stack or timeseries of images. Both objects are structured the same, and in many cases can be used interchangeably. They both have these fields:
ndarray
: aMaskedArray
of the raster data.- For
Image
, it’s 3D, with shape(bands, y, x)
. - For
ImageCollection
, it’s 4D, with shape(images, bands, y, x)
- For
properties
: metadata, such as thedate
bandinfo
: aDict
of metadata about the bands. The keys are band names; the values are Dicts of band metadata, such asdata_range
.
Since Image
and ImageCollection
behave so similary, throughout this guide, we’ll use the term “imagery” to mean “an Image
or an ImageCollection
”.
Loading Imagery¶
You can load imagery using Image.from_id
and ImageCollection.from_id
. (It’s also possible to construct an ImageCollection
from a list of Images.)
>>> import descarteslabs.workflows as wf
>>> img = wf.Image.from_id("sentinel-1:GRD:meta_2021-01-05_056D1117_S1A")
>>> imagecollection = wf.ImageCollection.from_id("sentinel-1:GRD", start_datetime="2021-01-01", end_datetime="2021-02-01")
>>> import descarteslabs as dl
>>> santa_fe_ctx = dl.scenes.DLTile.from_latlon(35.67, -105.93, resolution=40, tilesize=512, pad=0)
>>> img.inspect(santa_fe_ctx)
ImageResult:
* ndarray: MaskedArray<shape=(2, 512, 512), dtype=float64>
* properties: 'acquired', 'area', 'bits_per_pixel', 'bucket', ...
* bandinfo: 'vv', 'vh'
* geocontext: 'geometry', 'key', 'resolution', 'tilesize', ...
>>> imagecollection.inspect(santa_fe_ctx)
ImageCollectionResult of length 15:
* ndarray: MaskedArray<shape=(15, 2, 512, 512), dtype=float64>
* properties: 15 items
* bandinfo: 'vv', 'vh'
* geocontext: 'geometry', 'key', 'resolution', 'tilesize', ...
Arithmetic¶
Imagery objects support all the operators between both single numbers and other imagery objects:
Arithmetic:
+, -, *, /, //
>>> img + 1 >>> img1 / img2
Equality:
==, !=, <, >, <=, =>
>>> img.pick_bands("red") < 0.4 >>> imagecollection >= img
Bitwise:
~, &, |, ^, <<, >>
>>> ~img.pick_bands("cloud-mask") >>> (img.pick_bands("class") < 5) | ((ndvi_img > 0.4) & img.properties["date"].is_between("2018-01-01", "2019-01-01"))
Image
and ImageCollection
mostly behave like NumPy arrays with respect to these operators. For example, img + 1
adds 1 to every pixel in the image; img1 + img2
does pixel-wise addition. If you’re familiar with NumPy, all you need to know is that Workflows behaves the exact same way, with one slight difference: the bands of two imagery objects are aligned by name, not by order.
Some common examples of combining imagery:
>>> imagecollection = wf.ImageCollection.from_id("sentinel-2:L1C")
>>> img = imagecollection[0]
>>> img + 1 # adds 1 to every pixel
>>> img + img # adds corresponding pixels (equivalent to `img * 2`)
>>> img + img.pick_bands("red") # adds corresponding pixel from the red band to every band
>>> img.pick_bands("red green") + img.pick_bands("red green") # adds red band to red band, and green band to green band
>>> img.pick_bands("red green") + img.pick_bands("green red") # same as above---bands are matched up by name
>>> imagecollection + 1 # adds 1 to every pixel in every Image
>>> imagecollection + imagecollection # adds corresp. pixels in corresp. Images (equivalent to `imagecollection * 2`)
>>> imagecollection + imagecollection.pick_bands("red") # adds corresp. pixel from the red band to every band, Image by Image
>>> imagecollection + img # adds `img` to every Image
>>> imagecollection + img.pick_bands("red") # adds `img`'s red band to every band in every Image
For the full details, Imagery Broadcasting and Alignment Rules.
Bands¶
Image
and ImageCollection
objects contain one more more bands. When you do arithmetic operations on them, the operation applies to every band.
When you construct an Image
or ImageCollection
, the object initially contains every band for that product.
Tip: img.bandinfo.keys().inspect(ctx)
is a quick way to find out the names of all the bands in an Image
or ImageCollection
.
>>> img = wf.Image.from_id("sentinel-2:L1C:2019-04-07_13SDV_33_S2B_v1")
>>> img.bandinfo.keys().inspect(santa_fe_ctx)
['red',
'nir',
'green',
'bright-mask',
'blue',
'alpha',
'red-edge',
'swir1',
'red-edge-2',
'swir2',
'red-edge-3',
'red-edge-4',
'coastal-aerosol',
'water-vapor',
'cloud-mask',
'opaque-cloud-mask',
'cirrus',
'cirrus-cloud-mask',
'derived:bai',
'derived:evi',
'derived:nbr',
'derived:ndvi',
'derived:ndwi',
'derived:ndwi1',
'derived:ndwi2',
'derived:rsqrt',
'derived:visual_cloud_mask']
With pick_bands
, you can select just the band(s) you need into a new imagery object.
>>> rgb = img.pick_bands("red green blue")
>>> rgb.bandinfo.keys().inspect(santa_fe_ctx)
['red', 'green', 'blue']
Often, it’s helpful to split apart the bands into separate variables, in order to do math between them. unpack_bands
makes this easier to type, returning a tuple of single-band imagery instead of one multi-band Image/ImageCollection:
>>> nir, red = img.unpack_bands("nir red")
>>> nir.bandinfo.keys().inspect(santa_fe_ctx)
['nir']
>>> red.bandinfo.keys().inspect(santa_fe_ctx)
['red']
>>> ndvi = (nir - red) / (nir + red)
>>> ndvi.bandinfo.keys().inspect(santa_fe_ctx)
['nir_sub_red_div_nir_add_red']
When you calculate new derived data like this, the band names may not make much sense. You can use rename_bands
to create a new object with more sensible band names:
>>> ndvi_renamed = ndvi.rename_bands("ndvi")
>>> ndvi_renamed.bandinfo.keys().inspect(santa_fe_ctx)
['ndvi']
When both Images/ImageCollections in an arithmetic operation have multiple bands, they must have the same band names. Workflows then performs the operation between equivalent bands, as matched by name.
>>> s2 = wf.ImageCollection.from_id("sentinel-2:L1C", start_datetime="2019-01-01", end_datetime="2019-06-01")
>>> l8 = wf.ImageCollection.from_id("landsat:LC08:01:T1:TOAR", start_datetime="2019-01-01", end_datetime="2019-06-01")
>>> s2_comp = s2.mean("images")
>>> l8_comp = l8.mean("images")
>>> s2_rgb = s2_comp.pick_bands("red green blue")
>>> l8_rgb = l8_comp.pick_bands("red green blue")
>>> s2_l8_diff = s2_rgb - l8_rgb
Normally, this is what you want. But in the rare case where you match up bands positionally, you can use rename_bands
as a trick to override Workflows’ alignment rules. For example, say we want to look at the difference between subsequent bands in a certain order:
>>> s2 = wf.ImageCollection.from_id("sentinel-2:L1C", start_datetime="2019-01-01", end_datetime="2019-02-01")
>>> spectral_order = s2.pick_bands("swir2 swir1 nir red")
>>> spectral_order_shifted = s2.pick_bands("swir1 nir red green")
>>> delta = spectral_order_shifted - spectral_order
>>> delta.mean(axis=("pixels", "images")).inspect(santa_fe_ctx)
---------------------------------------------------------------------------
BadRequestError Traceback (most recent call last)
...
BadRequestError: In 'sub': Rasters must have the same bands. ['swir1', 'nir', 'red', 'green'] != ['swir2', 'swir1', 'nir', 'red']
To work around this, you can rename the bands to match up in the order you want:
>>> matching_names = ["0", "1", "2", "3"]
>>> spectral_order_renamed = spectral_order.rename_bands(*matching_names)
>>> spectral_order2_renamed = spectral_order_shifted.rename_bands(*matching_names)
>>> delta = spectral_order2_renamed - spectral_order_renamed
>>> delta.mean(axis=("pixels", "images")).inspect(santa_fe_ctx)
{'0': 0.023125917875686113,
'1': 0.2456081922092364,
'2': -0.020058692454203053,
'3': -0.02892147951430528}
Pixel values¶
In Workflows, pixel values are automatically rescaled to their physical_range
, since physical units are more intuitive to work with (particularly if you’re referencing remote sensing literature). (Note that this behavior is different from raster and Scenes.) For example, for a Landsat-8 image where values are stored as uint16
with the data_range
0-10000, in Workflows the values would be float64
from 0.0-1.0. Specifically, if a band defines both data_range
and physical_range
, and the band’s type is not classification
, then when the data is first loaded, its values are rescaled, without clipping, from data_range
to physical_range
. Otherwise, the values are left alone.
Reductions and compositing¶
Reductions such as mean
, max
, and median
are like their NumPy equivalents: they calculate a statistic along one or more dimensions of imagery, reducing its dimensionality.
The most common use of reductions is flattening an ImageCollection
into a single Image
. In remote sensing, this is commonly called compositing:.
>>> s2 = wf.ImageCollection.from_id("sentinel-2:L1C", start_datetime="2019-04-07", end_datetime="2019-04-10")
>>> all_s2 = s2.pick_bands("red green blue").inspect(santa_fe_ctx)
>>> dl.scenes.display(*all_s2.ndarray, ncols=2)

We can create a median composite of this stack of images:
>>> s2_comp = s2.pick_bands("red green blue").median(axis="images").inspect(santa_fe_ctx)
>>> dl.scenes.display(s2_comp.ndarray)

This is computing the median of each pixel across time (for each band). That “flattens” the images
axis of the ImageCollection
, producing a single Image
.
These reductions are available:
Image.min
/ImageCollection.min
Image.max
/ImageCollection.max
Image.mean
/ImageCollection.mean
Image.median
/ImageCollection.median
Image.sum
/ImageCollection.sum
Image.std
/ImageCollection.std
Image.count
/ImageCollection.count
You can also use Image.reduction
/ ImageCollection.reduction
to specify the reduction type as a string (like "mean"
), which is helpful if you want to pick the reduction operation with a widget, or change it conditionally with ifelse
.
Reduction axis¶
Inspired by NumPy, all reductions take an axis
argument: which axis (or axes) to “flatten”. Unlike NumPy, the axes are not given numerically, but by name:
"pixels"
: aggregates the spatial dimensions to a single value (equivalent toaxis=(-2, -1)
of thendarray
)"bands"
: aggregates all bands to a single band (equivalent toaxis=-3
of thendarray
)"images"
: only valid forImageCollection
: aggregates all Images to a single Image (equivalent toaxis=0
of thendarray
)
With bands
and images
, the result is still an Image
or ImageCollection
. When reducing pixels
, though, the spatial component is eliminated, so instead the results are returned in a Dict
(keyed by band name), List
(of the same length as the ImageCollection
), or List[Dict]
.
Refer to the API reference of each of the reduction functions for more information, including what they return for different combinations of the axis
argument.
For illustration purposes, here is ImageCollection.mean
with all the possible combinations of the axis
argument. We’ll use the same ImageCollection
for every case:
>>> s1 = wf.ImageCollection.from_id("sentinel-1:GRD", start_datetime="2021-01-01", end_datetime="2021-02-01")
>>> s1.inspect(santa_fe_ctx)
ImageCollectionResult of length 15:
* ndarray: MaskedArray<shape=(15, 2, 512, 512), dtype=float64>
* properties: 15 items
* bandinfo: 'vv', 'vh'
* geocontext: 'geometry', 'key', 'resolution', 'tilesize', ...
axis="images"
>>> s1.mean(axis="images").inspect(santa_fe_ctx)
ImageResult:
* ndarray: MaskedArray<shape=(2, 512, 512), dtype=float64>
* properties: 'cs_code', 'proj4', 'descartes_version', 'sw_version', ...
* bandinfo: 'vv', 'vh'
* geocontext: 'geometry', 'key', 'resolution', 'tilesize', ...
axis="bands"
>>> s1.mean(axis="bands").inspect(santa_fe_ctx)
ImageCollectionResult of length 15:
* ndarray: MaskedArray<shape=(15, 1, 512, 512), dtype=float64>
* properties: 15 items
* bandinfo: 'mean'
* geocontext: 'geometry', 'key', 'resolution', 'tilesize', ...
axis="pixels"
>>> s1.mean(axis="pixels").inspect(santa_fe_ctx)
[{'vv': 0.06507844363941867, 'vh': 0.013471549539005053},
{'vv': 0.0672581267668532, 'vh': 0.012318438247697678},
{'vv': 0.08511264914812902, 'vh': 0.015800944029523173},
{'vv': 0.03462038566820695, 'vh': 0.009320577734452272},
{'vv': 0.06395339252744914, 'vh': 0.012412520835099021},
{'vv': 0.03661998548801368, 'vh': 0.006415286723717648},
{'vv': 0.06575639762130438, 'vh': 0.013508246926700367},
{'vv': 0.06645751142928526, 'vh': 0.01276219756396224},
{'vv': 0.08598611798555544, 'vh': 0.01651434521501131},
{'vv': 0.03692038186786661, 'vh': 0.010285123715727401},
{'vv': 0.06696140452529671, 'vh': 0.013725120566631954},
{'vv': 0.039123885552189534, 'vh': 0.00703503718474875},
{'vv': 0.0678754021139706, 'vh': 0.013611529481177236},
{'vv': 0.07074211130000033, 'vh': 0.012591674749612845},
{'vv': 0.09397280188010747, 'vh': 0.016609604723956565}]
axis=("images", "bands")
>>> s1.mean(axis=("images", "bands")).inspect(santa_fe_ctx)
ImageResult:
* ndarray: MaskedArray<shape=(1, 512, 512), dtype=float64>
* properties: 'cs_code', 'proj4', 'descartes_version', 'sw_version', ...
* bandinfo: 'mean'
* geocontext: 'geometry', 'key', 'resolution', 'tilesize', ...
axis=("images", "pixels")
>>> s1.mean(axis=("images", "pixels")).inspect(santa_fe_ctx)
{'vv': 0.06471048479811546, 'vh': 0.012596908153723297}
axis=("bands", "pixels")
>>> s1.mean(axis=("bands", "pixels")).inspect(santa_fe_ctx)
[0.03927499658921186,
0.03978828250727544,
0.0504567965888261,
0.021970481701329612,
0.03818295668127408,
0.021517636105865663,
0.039632322274002374,
0.03960985449662375,
0.05125023160028338,
0.023602752791797004,
0.04034326254596433,
0.023079461368469143,
0.040743465797573916,
0.041666893024806584,
0.055291203302032015]
axis=("images", "bands", "pixels")
>>> s1.mean(axis=("images", "bands", "pixels")).inspect(santa_fe_ctx)
0.038653696475919375
axis=None
(equivalent to axis=("images", "bands", "pixels")
)
>>> s1.mean(axis=None).inspect(santa_fe_ctx)
0.038653696475919375
And here is Image.mean
with all the possible combinations of the axis
argument. We’ll use the same Image
for every case:
>>> s1_img = s1[0]
>>> s1_img.inspect(santa_fe_ctx)
ImageResult:
* ndarray: MaskedArray<shape=(2, 512, 512), dtype=float64>
* properties: 'acquired', 'cs_code', 'pass', 'bits_per_pixel', ...
* bandinfo: 'vv', 'vh'
* geocontext: 'geometry', 'key', 'resolution', 'tilesize', ...
axis="bands"
>>> s1_img.mean(axis="bands").inspect(santa_fe_ctx)
ImageResult:
* ndarray: MaskedArray<shape=(1, 512, 512), dtype=float64>
* properties: 'acquired', 'cs_code', 'pass', 'bits_per_pixel', ...
* bandinfo: 'mean'
* geocontext: 'geometry', 'key', 'resolution', 'tilesize', ...
axis="pixels"
>>> s1_img.mean(axis="pixels").inspect(santa_fe_ctx)
{'vv': 0.06507844363941867, 'vh': 0.013471549539005053}
axis=("bands", "pixels")
>>> s1_img.mean(axis=("bands", "pixels")).inspect(santa_fe_ctx)
0.03927499658921186
axis=None
>>> s1_img.mean(axis=None).inspect(santa_fe_ctx)
0.03927499658921186
ImageCollection-specific reductions¶
ImageCollection.mosaic
flattens the ImageCollection
into a single Image
by picking each first-unmasked pixel (ordered from last to first). mosaic
is often used to make a “most-recent-valid-pixel” composite. Unlike the other reductions, the order of the ImageCollection
matters. Additionally, mosaic
doesn’t take an axis
argument: the axis is always images
.
ImageCollection.sortby_composite
uses the argmin or argmax of one band select pixels from other bands. For example, you could create an RGB composite, where each pixel is selected from the image where NDVI was highest for that pixel:
>>> kansas_ctx = dl.scenes.DLTile.from_latlon(38.379776, -99.275899, resolution=40, tilesize=512, pad=0)
>>> s2 = wf.ImageCollection.from_id("sentinel-2:L1C", start_datetime="2019-06-01", end_datetime="2019-08-01")
>>>
>>> nir, red = s2.unpack_bands("nir red")
>>> ndvi = (nir - red) / (nir + red)
>>>
>>> rgb = s2.pick_bands("red green blue")
>>> rgb_max_ndvi_comp = rgb.sortby_composite(ndvi, operation="argmax")
>>>
>>> # median and mosaic, for comparison
>>> rgb_median = rgb.median(axis="images")
>>> rgb_mosaic = rgb.mosaic()
>>>
>>> sortby_result, median_result, mosaic_result = wf.inspect(
... [rgb_max_ndvi_comp.ndarray, rgb_median.ndarray, rgb_mosaic.ndarray],
... kansas_ctx
... )
>>>
>>> dl.scenes.display(
... sortby_result, median_result, mosaic_result, title=["argmax(NDVI) sortby composite", "Median composite", "Mosaic"]
... )

Masking¶
Imagery objects use a MaskedArray
, which is a proxy object for a NumPy masked array. Along with their pixel values, they have a boolean mask with the same shape as .ndarray
. Like NumPy, True means masked (invalid), False means unmasked (valid).
Masks behave exactly the same as NumPy masked arrays. Masks propagate: in img1 + img2
, if a pixel is masked in img1
but not img2
, it will be masked in the result. Masked values are also ignored in reductions: imagecollection.median(axis="images")
takes the median of only the unmasked pixels, for example.
When loading imagery, Workflows automatically masks the nodata
value for each band that has one set. If the product contains a band named alpha
, it also masks all bands by it. Additionally, if your GeoContext has the geometry
field set, all pixels outside of that geometry will be masked.
Updating masks¶
You can create new imagery with an updated mask using Image.mask
and ImageCollection.mask
. For example
>>> import descarteslabs.workflows as wf
>>> s2 = wf.ImageCollection.from_id("sentinel-2:L1C")
>>> cloud, cirrus = s2.unpack_bands("cloud-mask cirrus-cloud-mask")
>>> s2_cloudmasked = s2.mask(cloud | cirrus)
By default, mask
is additive: whatever was masked before remains masked, in addition to the new mask you pass in. Pass replace=True
to change this.
Any pixels that are masked in the new mask are ignored: in img.mask(new_mask)
, masked pixels in new_mask
are treated as False. (Note that this is different from NumPy.)
A common mistake is forgetting that mask
returns a new object—it doesn’t modify the object you call it on. For example, this won’t work as expected:
>>> s2.mask(cloud | cirrus)
This does nothing, because you didn’t store the result into a new variable!
Accessing masks¶
You can access the current mask with getmask
. This returns an Image
or ImageCollection
with one band of boolean data named mask
.
Groupby¶
ImageCollection.groupby
is similar to groupby in pandas: given a key function to identify which group a given Image
belogs to, it splits the ImageCollection
into a dict-like object of groups (ImageCollectionGroupby
). You can then apply an operation to each group (such as a mean
), and combine the results from all the groups back into one ImageCollection
or Dict
.
The most common use of groupby
is making temporal composites, such as a monthly median. For that reason, groupby
can take a dates
argument for convenience, where you can give the date components to groupby (year
, month
, day
, etc.):
>>> s2 = wf.ImageCollection.from_id("sentinel-2:L1C", start_datetime="2020-08-01", end_datetime="2020-11-01")
>>> s2_monthly_median = s2.groupby(dates=("year", "month")).median("images")
>>> result = s2_monthly_median.pick_bands("red green blue").inspect(kansas_ctx)
>>> result
ImageCollectionResult of length 3:
* ndarray: MaskedArray<shape=(3, 3, 512, 512), dtype=float64>
* properties: 3 items
* bandinfo: 'red', 'green', 'blue'
* geocontext: 'geometry', 'key', 'resolution', 'tilesize', ...
groupby
adds a field named group
to each image in the output:
>>> groups = s2_monthly_median.properties.map(lambda p: p["group"]).inspect(kansas_ctx)
>>> groups
[(2020, 8), (2020, 9), (2020, 10)]
>>> dl.scenes.display(*result.ndarray, title=groups, ncols=2)

Instead of dates
, you can give a function. This function must take one Image
, and return which group it belongs to. Typically, the function would look at the Image’s properties
.
For example, you can use this to bin the images into specific time intervals, such as 14-day composites:
>>> twoweek_max = (
... s2.groupby(
... lambda img: img.properties["date"] // wf.Timedelta(days=14)
... ).max("images")
... )
Groupby doesn’t have to produce an ImageCollection
. If the operation applied to each group produces something besides an Image
or ImageCollection
, groupby returns a Dict
, where the keys are the groups, and the values are the value for each group. Here, we compute the mean NDVI value for each month:
>>> nir, red = s2.unpack_bands("nir red")
>>> ndvi = (nir - red) / (nir + red)
>>> monthly_max_ndvi = ndvi.groupby(dates="month").mean()
>>> monthly_max_ndvi.inspect(kansas_ctx)
{8: 0.4094230069494817, 9: 0.27414022149032186, 10: 0.23755633545839724}
Joining ImageCollections using Groupby¶
Workflows doesn’t currently have a built-in join operation. Instead, you can efficiently implement a join yourself using the ImageCollectionGroupby
object.
The basic idea is to do a groupby
on the two (or more) ImageCollections you want to combine. If you know they’ll always have the same groups, then you can aggregate both of them, and use normal vectorized operations to combine them:
>>> s2 = wf.ImageCollection.from_id("sentinel-2:L1C", start_datetime="2020-08-01", end_datetime="2020-11-01")
>>> s1 = wf.ImageCollection.from_id("sentinel-1:GRD", start_datetime="2020-08-01", end_datetime="2020-11-01")
>>> s2_nir_swir1 = s2.pick_bands("nir swir1")
>>> s1_vv = s1.pick_bands("vv")
>>>
>>> s2_monthly_mean = s2_nir_swir1.groupby(dates="month").mean("images")
>>> s1_monthly_mean = s1_vv.groupby(dates="month").mean("images")
>>> # now `s2_monthly_mean` and `s1_monthly_mean` are both monthly composites, so they should be the same length (3)
>>>
>>> combined_ic = s2_monthly_mean.concat_bands(s1_monthly_mean)
>>> combined_ic.bandinfo.keys().inspect(kansas_ctx)
['nir', 'swir1', 'vv']
>>> combined_ic.properties.map(lambda p: p["group"]).inspect(kansas_ctx)
[8, 9, 10]
But in some cases, you might get an error like ImageCollections must be the same length, 3 != 2
. In that case, one of the ImageCollections was missing a group that the other one had. You can use a more advanced technique to address that: map
over the groups from one of the ImageCollections, and for each, select the matching group from the other. This works because ImageCollectionGroupby
acts like a dict: you can select groups from it with []
syntax:
>>> s1_missing_aug = s1.filter(lambda img: img.properties["date"].month != 8)
>>> # create a groupby object. note that we _don't_ apply a reduction, like `.mean()`
>>> s2_grouped = s2.groupby(dates="month")
>>> combined = s1_missing_aug.groupby(dates="month").map(
... lambda group, s1_imgs: (
... s1_imgs.mean("images")
... .concat_bands(
... # select the current group from `s2_grouped` (it acts like a dict)
... s2_grouped[group].mean("images")
... )
... )
... )
By mapping over the ImageCollection
that’s missing some groups, we know the result will only contain groups that both ImageCollections had.
But even if you mapped over the ImageCollection
that wasn’t missing groups—or if both ImageCollections were missing some groups—this would still work, thanks to Workflows’ handling of empty imagery. Let’s consider the example above, but where we pick from s1_missing_aug
instead of s2
:
>>> s1_missing_aug = s1.filter(lambda img: img.properties["date"].month != 8)
>>> s1_missing_aug_grouped = s1_missing_aug.groupby(dates="month")
>>> combined = s2.groupby(dates="month").map(
... lambda group, s2_imgs: (
... s2_imgs.mean("images")
... .concat_bands(
... # when `s1_missing_aug_grouped` doesn't contain `group`, it gives an empty ImageCollection
... s1_missing_aug_grouped[group].mean("images")
... )
... )
... )
Unlike a dict, selecting a missing item from an ImageCollectionGroupby
doesn’t raise an error—it just returns a empty ImageCollection. (Read more about empty ImageCollections here.) You can use this empty ImageCollection just like any other, but everything you do to it just produces another empty ImageCollection. Then at the end, when ImageCollectionGroupby.map
combines all of your groups together, it drops any groups where your function returned empty imagery. So in this case above, the function you passed to map
returns an empty Image for month of August, which then gets dropped, so the final ImageCollection
combined
has 2 Images: one for September, one for October. (In SQL terms, this is an inner join: only keeping keys that both inputs have.)
You can combine this behavior with ImageCollection.replace_empty_with
to have a “default value” for missing groups, letting you implement left or right joins, or even more complex logic.
>>> s1_missing_aug = s1.filter(lambda img: img.properties["date"].month != 8)
>>> s1_missing_aug_grouped = s1_missing_aug.groupby(dates="month")
>>> combined_with_s1_default = s2.groupby(dates="month").map(
... lambda group, s2_imgs: (
... s2_imgs.mean("images")
... .concat_bands(
... s1_missing_aug_grouped[group]
... # replacing empty values with an ImageCollection of 0s makes this a left join:
... # there will be an output for every group in `s2`.
... .replace_empty_with(0, bandinfo=s1.bandinfo)
... .mean("images")
... )
... )
... )
Filter¶
Note: see Life without for-loops for why filter
is necessary.
Use ImageCollection.filter
to create a new ImageCollection
with images removed based on their metadata. Pass in a function which takes an Image
; if the function returns False, the image is removed in the output.
>>> s2 = wf.ImageCollection.from_id("sentinel-2:L1C", start_datetime="2019-03-01", end_datetime="2019-04-01")
>>> rgb = s2.pick_bands("red green blue")
>>> rgb.length().inspect(santa_fe_ctx)
24
>>> low_cloud = rgb.filter(lambda img: img.properties["cloud_fraction"] < 0.1)
>>> low_cloud.length().inspect(santa_fe_ctx)
4
>>> all, low_cloud = wf.inspect([rgb.mean("images").ndarray, low_cloud.mean("images").ndarray], santa_fe_ctx)
>>> dl.scenes.display(all, low_cloud, title=["All images", "cloud_fraction filtered"], ncols=2)

Note
Filtering is relatively slow
Especially for large ImageCollections, every filter can slow down your computation. In particular, try to avoid using multiple filters, by instead combining multiple expressions within the same filter
operation:
>>> # slow!
>>> ic.filter(lambda img: img.properties.date.year >= 2016).filter(lambda img: img.properties.date.year < 2019)
>>> # faster
>>> ic.filter(lambda img: (img.properties.date.year >= 2016) & (img.properties.date.year < 2019))
Map¶
Note: see Life without for-loops for why map
is necessary.
Use ImageCollection.map
instead of a for-loop. map
applies a function to every Image
in the ImageCollection
and returns the results as a new ImageCollection
, or a List
.
>>> year_offsets = wf.proxify({
... 2018: 0.0,
... 2019: 0.3,
... 2020: 0.9
... })
>>>
>>> s2.map(lambda img: img + year_offsets[img.properties["date"].year])
<descarteslabs.workflows.types.geospatial.imagecollection.ImageCollection at 0x7f92ea307110>
Note
Map is relatively slow
ImageCollection.map
is like using a for-loop over a NumPy array: it’s slow, and with creative use of vectorized arithmetic, other vectorized operations on ImageCollections, and the NumPy array interface, can often be avoided.
>>> # slow!
>>> ic.map(lambda img: img + 1)
>>> # fast
>>> ic + 1
>>> # slow!
>>> ic.map(lambda img: img / img.mean())
>>> # faster
>>> means_arr = wf.Array(ic.mean(axis=["pixels", "bands"]))
>>> normd_arr = ic.ndarray / means_arr
>>> normd_arr.to_imagery(ic.properites, ic.bandinfo)
Indexing¶
You can index an ImageCollection
just like a list:
>>> ic[0]
>>> ic[-1]
>>> ic[1:4]
Using an out-of-bounds index will cause an error.
Concatenation¶
Use wf.concat
to concatenate any number of Images and/or ImageCollections into one new ImageCollection. All the items must have the same bands. Any empty Images/ImageCollections will be dropped from the result.
>>> s1 = wf.ImageCollection.from_id("sentinel-1:GRD", start_datetime="2020-08-01", end_datetime="2020-11-01")
>>> first, second = s1[0], s1[1]
>>>
>>> # concatenate two Images
>>> two_length_ic = wf.concat(first, second)
>>>
>>> # concatenate an Image and an ImageCollection
>>> all_but_first = wf.concat(second, s1[2:])
>>>
>>> # concatenate two Images and an ImageCollection
>>> all = wf.concat(first, second, s1[2:])
Empty Imagery¶
Sometimes, no data exist for the dates or places on Earth where you’re looking. Rather than raising an error, this missing data is represented by an empty Image
or ImageCollection
, which has no data or metadata. All operations on these “empties” succeed, but propagate the “emptiness” (except for some operations like concatenation, which drop empties). In general, the goal is that missing data, and anything it touches, will simply be ignored in your computations. If you need it to be handled differently, you can replace it with default values.
For example, say we want to make a composite of the ascending and descending orbital passes of Sentinel-1. In some places, Sentinel-1 captures data in either an ascending and descending orbit; in other places, both. So if we filter by both both pass directions, we’ll sometimes get missing data:
>>> col = wf.ImageCollection.from_id("sentinel-1:GRD", start_datetime="2019-04-01", end_datetime="2019-07-01")
>>> a = col.filter(lambda img: img.properties["pass"]=="ASCENDING") # empty ImageCollection (in eastern North America)
>>> d = col.filter(lambda img: img.properties["pass"]=="DESCENDING") # non-empty ImageCollection (in eastern North America)
>>>
>>> # Pick bands and min will succeed in both the empty and non-empty case
>>> a_min = a.pick_bands("vv").min(axis="images") # empty Image
>>> d_min = d.pick_bands("vv").min(axis="images") # non-empty Image
>>>
>>> mins_ic = wf.ImageCollection([a_min, d_min])
>>> # creating an ImageCollection drops any empty inputs,
>>> # so this will only contain `d_min` (`a_min` is an empty Image)
>>>
>>> max_composite = mins_ic.max(axis="images")
>>> # a maximum composite of the non-empty ascending/descending min composites.
>>> # since `a_min` was empty and `mins_ic` only contains `d_min`,
>>> # this is equivalent to `d_min`.
>>>
>>> max_composite.inspect(ctx)
ImageResult:
* ndarray: MaskedArray<shape=(1, 512, 512), dtype=float64>
...
What if we did not want to ignore empty values, but actually fill them in with something else? Let’s treat empty data explicitly as 0:
>>> a_min_filled = a_min.replace_empty_with(0, mask=False, bandinfo={"vv": {}})
>>> # when `a_min` is empty, replaces it with a new Image with one "vv" band of all 0s.
>>> # if `a_min` wasn't empty, it would return it unchanged.
>>>
>>> mins_ic = wf.ImageCollection([a_min_filled, d_min])
>>> # now `mins_ic` contains both `a_min_filled` (all 0s) and `d_min` (actual data)
>>>
>>> max_composite = mins_ic.max(axis="images")
>>> max_composite.inspect(ctx)
ImageResult:
* ndarray: MaskedArray<shape=(1, 512, 512), dtype=float64>
...
Bear in mind that this is a contrived example, and is simply meant to demonstrate how empty rasters can be handled. There are many other ways to do something like this (ImageCollection.groupby()
would be the simplest). More specific details about how empties are represented, and how they are handled in specific functions can be found below.
Both Image
and ImageCollection
objects can be empty, meaning they lack an ndarray, properites, and bandinfo. The .ndarray
will be None, .properties
will be an empty dictionary (Image
) or empty list (ImageCollection
), and .bandinfo
will be an empty dictionary. Empty imagery objects have a .geocontext
determined by the GeoContext
passed to .compute()
. An ImageCollection
cannot have some empty and some non-empty images. Every image in a non-empty ImageCollection
is non-empty. This is why concat
drops empties. In a similar way, you cannot have some empty and some non-empty bands. Every band in a non-empty Image
or ImageCollection
is non-empty. This is why concat_bands
and map_bands
return an empty if any of the input bands or mapped bands are empty.
An empty Image
cannot be explicitly constructed, and will only result from other operations. An empty ImageCollection
can be explicitly constructed by calling the constructor with an empty list/list of empty images, as well as calling .from_id
where no imagery matches the given constraints:
# Construct from empty list
>>> empty_col = wf.ImageCollection([])
>>> empty_col.length().inspect(ctx) # ctx is the GeoContext the computation will be performed over
0
# Construct from list of empty images
>>> empty_col = wf.ImageCollection([empty_img1, empty_img2, empty_img3])
>>> empty_col.length().inspect(ctx)
0
# No imagery matching given constraints
>>> empty_col = wf.ImageCollection.from_id("landsat:LC08:01:RT:TOAR", start_datetime="2017-01-01", end_datetime="2017-02-01") # no imagery exists between these start and end datetimes
>>> empty_col.length().inspect(ctx)
0
Empties can be created without explicitly constructing them as shown above. Some operations can result in an empty, when being passed a non-empty.
Image Operations:
map_bands
: If the mapper function returns an emptyImage
,map_bands
will also return an emptyImage
.
ImageCollection Operations:
filter
: If the filter function is not true for any images in the collection,filter
will return an emptyImageCollection
>>> non_empty_col = wf.ImageCollection.from_id("landsat:LC08:01:RT:TOAR", start_datetime="2017-01-01", end_datetime="2017-12-01") >>> filtd = non_empty_col.filter(lambda img: img.properties["date"].year == 2018) # none of the images are from 2018 >>> filtd.length().inspect(ctx) 0
map
: If the mapper function returns an emptyImage
,map
will return an emptyImageCollection
.map_bands
: If the mapper function returns an emptyImageCollection
,map_bands
will return an emptyImageCollection
. If the mapper function returns an emptyImage
,map_bands
will return an emptyImage
.map_window
: If the mapper function returns empty imagery,map_window
will return an emptyImageCollection
.
Empties are aggressively propagated. This means that when there is empty involved in an operation, it will result in a empty, regardless of the emptiness of other arguments:
>>> empty_col = wf.ImageCollection([])
>>> non_empty_col = wf.ImageCollection.from_id("landsat:LC08:01:RT:TOAR", start_datetime="2017-01-01", end_datetime="2017-07-01")
>>> added = non_empty_col + empty_col
>>> added.length().inspect(ctx)
0
Some operations are an exception to this:
concat
: All non-empty imagery is preserved>>> empty_col = wf.ImageCollection([]) >>> non_empty_col = wf.ImageCollection.from_id("landsat:LC08:01:RT:TOAR", start_datetime="2017-01-01", end_datetime="2017-07-01") >>> concat = empty_col.concat(non_empty_col) # non_empty_col.concat(empty_col) will yield the same result >>> concat.length().inspect(ctx) 4 >>> concat = non_empty_col.concat(empty_col, non_empty_col) >>> concat.length().inspect(ctx) 8
ImageCollection constructor
: All non-empty imagery is preserved>>> empty_col = wf.ImageCollection([]) >>> empty_img = empty_col[0] # slicing into an empty ImageCollection results in an empty Image (all indices are valid) >>> non_empty_img = wf.Image.from_id("landsat:LC08:PRE:TOAR:meta_LC80270312016188_v1") >>> col = wf.ImageCollection([non_empty_img, empty_img, empty_img]) >>> col.length().inspect(ctx) 1
mask
: If the mask is empty,mask
is a no-op. If the object being masked is empty,mask
is an empty (type determined by broadcasting rules)>>> empty_col = wf.ImageCollection([]) >>> non_empty_col = wf.ImageCollection.from_id("landsat:LC08:01:RT:TOAR", start_datetime="2017-01-01", end_datetime="2017-07-01") >>> no_op = non_empty_col.mask(empty_col) >>> no_op.length().inspect(ctx) 4 >>> op = empty_col.mask(non_empty_col) >>> op.length().inspect(ctx) 0
Aggresive propagation of empties may not always be desirable. If you have just done an operation that may result in an empty, but do not want that emptiness to propagate down through the rest of the operations, you can replace it with a non-empty:
>>> empty_col = wf.ImageCollection([])
>>> non_empty_col = wf.ImageCollection.from_id("landsat:LC08:01:RT:TOAR", start_datetime="2017-01-01", end_datetime="2017-07-01")
>>> new_col = empty_col.replace_empty_with(non_empty_col) # replacing with an ImageCollection
>>> new_col.length().inspect(ctx)
4
>>> new_col = empty_col.replace_empty_with(1.1, bandinfo={"red": {}, "green": {}, "blue": {}}) # replacing with a scalar (3 bands per Image)
>>> # new_col will be fully masked as the default behavior of replace_empty_with (specifying mask=False would make it unmasked)
>>> new_col.length().inspect(ctx)
1
See Image.replace_empty_with()
and ImageCollection.replace_empty_with()
for details on how to replace empty Image
and ImageCollection
objects. More information about empty handling for specific functions can be found in their documentation.