Raster

The Raster API has two purposes: retrieving data from the Descartes Labs Catalog, and providing a consistent way of splitting that data into smaller tiles.

Note

The Raster API is currently deprecated and should not be used directly. The functionality is now available as part of the Catalog V2 API and the latter should be used for all new code.

Note

For information about API Quotas and limits see our Quotas & Limits page.

Retrieving data

The ndarray, stack, and raster methods retrieve raster data, returning them either in an NumPy array or, for raster, an image file. These methods accept a list of Image id`s, which can be found by using the :mod:`Catalog ~descarteslabs.catalog client search available imagery. These methods enable you to rescale, resample, reproject, clip, resize, select bands of interest, and convert data type of the imagery in a single call.

These three methods differ in how they return raster data, but all follow the same pattern. Here’s an example of using ndarray to retrieve a visible-light image from the Sentinel-2 constellation as a NumPy array, taken over the booming town of McCarthy, Alaska in the lower-right (population 28).

>>> from descarteslabs.client.services.raster import Raster
>>> from descarteslabs.utils import display
>>> import matplotlib.pyplot as plt
>>>
>>> raster = Raster()
>>> s2_scene_id = "sentinel-2:L1C:2017-08-07_07VCJ_99_S2A_v1"
>>>
>>> s2_arr, s2_meta = raster.ndarray(
...     s2_scene_id,
...     resolution=120,
...     bands=["red", "green", "blue", "alpha"],
...     scales=[[0,8000], [0,8000], [0,8000], [0,1]],
...     data_type="Byte",
...     drop_alpha=True,
... )
>>> display(s2_arr, bands_axis=-1, title=s2_scene_id)
https://cdn.descarteslabs.com/docs/1.12.1/public/_images/raster_figure3_1.png

See in Viewer

Notice that we’ve already chosen a specific scene to retrieve: the ID "sentinel-2:L1C:2017-08-07_07VCJ_99_S2A_v1". Typically, you’d find the scenes you want by searching with the Catalog client, which we’ve just done in advance here. This illustrates the difference between Catalog and Raster: Catalog is for searching for imagery, Raster is for retrieving it.

The three methods generally take the same parameters, but differ in how they return data:

ndarray

ndarray returns images as a NumPy array with 3 dimensions (height, width, and bands), for analysis and manipulation with Python scientific computing tools. These are masked arrays by default which return the data along with a mask that represents pixels with no data.

You can see an example of using ndarray above and below.

>>> # percent of pixels valid
>>> import numpy as np
>>> np.sum(s2_arr.mask == 0) / s2_arr.data.size
1.0

By default masked=True but setting masked=False will return an unmasked NumPy array. Note that in the above call, the drop_alpha parameter was used to remove the alpha band from the result (after using it to mask the result). This is because masking and the alpha band are redundant, and the display function accepts either a four-band image (with the alpha channel) or masked data but not both.

raster

raster will save an image file to disk.

>>> file, meta = raster.raster(
...     "landsat:LC08:PRE:TOAR:meta_LC80330352017056_v1",
...     resolution=150,
...     bands=["red", "green", "blue", "alpha"],
... )
>>> file
landsat:LC08:PRE:TOAR:meta_LC80330352017056_v1.tif

stack

stack returns multiple images as a 4-D NumPy array. This is particularly useful for creating time-stacks.

>>> # Two Landsat-8 scenes over Santa Fe, NM in Feb and Apr 2017
>>> scene_ids = [
...     "landsat:LC08:PRE:TOAR:meta_LC80330352017056_v1",
...     "landsat:LC08:PRE:TOAR:meta_LC80330352017104_v1"
... ]
>>>
>>> # use a DLTile (see bottom of this guide) to specify explicit
>>> # resolution, bounds, & SRS.  required by stack so that each
>>> # ndarray "layer" has the same shape.
>>> dltile_key = "1024:0:60.0:13:-2:64"
>>> stack, raster_info = raster.stack(
...     scene_ids,
...     dltile=dltile_key,
...     bands=["red", "green", "blue", "alpha"],
...     scales=[[0,4000], [0,4000], [0,4000], [0,1]],
...     data_type="Byte",
...     drop_alpha=True
... )
>>>
>>> stack.shape
(2, 1024, 1024, 3)
>>> display(*stack, title=scene_ids, bands_axis=-1)  # easily show multiple images
https://cdn.descarteslabs.com/docs/1.12.1/public/_images/raster_figure8_1.png

See in Viewer

Mosaicking

You can also pass in multiple IDs to these methods, which will mosaic those images together. This is useful when your area of interest spans multiple images.

The mosaic ordering follows the ordering of the list of inputs, bottom to top. Thus, if there is any area of overlap between images, the mosaic image will contain the data from the last image.

As an example, here are two Landsat 8 scenes taken near Rome in early 2018:

https://cdn.descarteslabs.com/docs/1.12.1/public/_images/raster_figure10_1.png

See in Viewer

You can mosaic the two into a single image. Note that where the images overlap, the pixels from the last image in the list simply “cover up” the others. There is no compositing (as you might do with a median, mean, etc.) of the two images.

>>> scene_ids = [
...     "landsat:LC08:01:RT:TOAR:meta_LC08_L1TP_191031_20180130_20180130_01_RT_v1",
...     "landsat:LC08:01:RT:TOAR:meta_LC08_L1TP_190031_20180123_20180123_01_RT_v1"
... ]
>>>
>>> mosaic1, meta1 = raster.ndarray(
...     scene_ids,
...     resolution=150,
...     bands=["red", "green", "blue", "alpha"],
...     drop_alpha=True
... )
>>>
>>> display(mosaic1, bands_axis=-1, size=5)  # use to easily show uint16 images
https://cdn.descarteslabs.com/docs/1.12.1/public/_images/raster_figure12_1.png

Notice how the order changes which image is “on top”:

>>> mosaic2, meta2 = raster.ndarray(
...     list(reversed(scene_ids)),
...     resolution=150,
...     bands=["red", "green", "blue", "alpha"],
...     drop_alpha=True
... )
>>>
>>> display(mosaic2, bands_axis=-1, size=5)
https://cdn.descarteslabs.com/docs/1.12.1/public/_images/raster_figure14_1.png

Note

In most cases, but always when mosiacing, you should include the alpha band as the last band. The alpha band indicates which pixels are valid (1) or invalid (0).

Pixels ouside the valid data area (where alpha would be 0) may not be exactly 0, due to compression artifacts. Without including the alpha band, invalid areas of one scene may mask out valid areas of another.

https://cdn.descarteslabs.com/docs/1.12.1/public/_images/raster_figure16_1.png

Raster Parameters

Raster methods can take many parameters to handle many different use-cases, but it’s not necessary to use (or know) all of them at first.

Bands

Use the bands parameter to choose which bands of the images you want, and in which order. Though photographs typically have exactly three bands—red, green, and blue—remotely-sensed data often has many more bands than that outside the visible spectrum. You can select as many or few bands as you want, in any order. Alpha (mask) bands are handled specially only when they are the last band in the list.

Note

matplotlib can only display images with one or three bands (plus an optional alpha band).

Here are two rasters of the same scene of a wildfire in Oklahoma, one in the visible-light spectrum, one falsely colored with the short-wave infrared, near-infrared 2, and near-infrared 1 bands:

>>> bands_scene_id = "sentinel-2:L1C:2018-04-13_14SME_99_S2B_v1"
>>>
>>> bands_visible, _ = raster.ndarray(
...     bands_scene_id,
...     bands=["red", "green", "blue", "alpha"],  # visible-light bands
...     resolution=150,
...     scales=[[0,4000], [0,4000], [0,4000], [0,1]],
...     data_type="Byte",
...     drop_alpha=True
... )
>>>
>>> bands_ir, _ = raster.ndarray(
...     bands_scene_id,
...     bands=["swir2", "swir1", "nir", "alpha"],  # infrared bands
...     resolution=150,
...     scales=[[0,4000], [0,4000], [0,4000], [0,1]],
...     data_type="Byte",
...     drop_alpha=True
... )
https://cdn.descarteslabs.com/docs/1.12.1/public/_images/raster_figure19_1.png

See in Viewer

Scales

You only should specify scales if you specify a data_type smaller than the product’s native data type.

Scales allow you to map pixel values from their native domain to a range of your choosing. For example, a scale of [0, 4000, 0, 255] would map 4000 to 255, 0 to 0, and 2000 to 127.5. If the range is left out (the second two values), it defaults to [0, 255].

>>> raster_params = {
...     "data_type": "Byte",
...     "bands": ["red", "green", "blue", "alpha"],
...     "resolution": 150,
...     "drop_alpha": True
... }
>>>
>>> full_TOAR_arr, full_TOAR_meta = raster.ndarray(
...     s2_scene_id,
...     scales=[[0,10000], [0,10000], [0,10000], [0,1]],
...     **raster_params
... )
>>>
>>> partial_TOAR_arr, partial_TOAR_meta = raster.ndarray(
...     s2_scene_id,
...     scales=[[0,4000], [0,4000], [0,4000], [0,1]],
...     **raster_params
... )
https://cdn.descarteslabs.com/docs/1.12.1/public/_images/raster_figure22_1.png

See in Viewer

Notice how in the second image, the snowy peaks look blown out, but you can see more detail in the darker valleys. Every pixel above 4000 is rescaled to 255, so detail is lost in areas with high reflectance, and more values between 0-255 are used to express dark areas, giving them more dynamic range.

Scaling is only necessary when requesting data in a smaller data type than its native format—like above, where we requested UInt16 data as Byte. The 65,535 possible values in a UInt16 must be represented with just 255 values in a Byte. Though you could simply map [0..65,535] to [0..255], when you know something about the data, you can be more clever to reduce loss of precision.

In typical images, most pixels will be below around 40% reflectance. So rather than using only 0-101 to represent 0-40%, and 102-255 for 40-100%—values which rarely occur—it’s common to use a scaling like [0, 4000, 0, 255] for each band (as seen in the image above). This allows the values containing most of the useful information to utilize the full output range.

Note

Data range vs physical range

The values in raster data have physical meaning—but for efficiency, are often stored on-disk as different numbers.

Most satellite imagery, for example, measures reflectance: the fraction of the sun’s light reflected by the Earth. (Technically, it measures radiance, which is converted to reflectance, then also possibly corrected for atmospheric absorption.)

It would seem logical to represent these fractions as floating-point numbers between 0 and 1. However, they can also be represented as integers between 0 and 10,000 without loss of precision (due to the number of significant figures to which the instrument is actually accurate).

So storing the data as 16-bit unsigned integers is as precise as 64-bit floats, but uses four times less space (and is faster to operate on for the CPU).

We call the data range the range that values can have in their on-disk format (for example, 0-10000), and the physical range the numbers with physical meaning that the data range represents (in this case, reflectance from 0 to 1).

Again: scales are unnecessary if you just request the data in its native data type. The parameter exists only to allow for the optimization of downsampling to a smaller data type (from UInt16 to Byte), while retaining control over how that downsampling occurs. Unless you need to optimize your memory usage in that way, it’s easier to specify neither dtype nor scales and work with data in its native format. If necessary, you can use display to visualize it (since matplotlib’s imshow can only handle uint8 arrays).

Warning

You should always specify a scaling when requesting data in a different data type than its native format.

Otherwise, you’ll get unexpected results, as the original binary values are simply cast to the new data type.

>>> do_this, _ = raster.ndarray(
...     s2_scene_id,
...     scales=[[0, 10000], [0, 10000], [0, 10000], [0, 1]],
...     data_type="Byte",  # original data is uint16
...     bands=["red", "green", "blue", "alpha"],
...     resolution=200,
...     masked=False
... )
>>>
>>> dont_do_this, _ = raster.ndarray(
...     s2_scene_id,
...     scales=None,   # no! specify a scaling!
...     data_type="Byte",  # original data is uint16
...     bands=["red", "green", "blue", "alpha"],
...     resolution=200,
...     masked=False
... )
https://cdn.descarteslabs.com/docs/1.12.1/public/_images/raster_figure25_1.png https://cdn.descarteslabs.com/docs/1.12.1/public/_images/raster_figure26_1.png

Cutline

A cutline clips raster data to geometry of your choice. Cutlines can be any GeoJSON object, or a WKT string. As a GeoJSON object, coordinates are expected to be in WGS84 lat-lon.

>>> geometry = {
...     "type": "Polygon",
...     "coordinates": [[
...         [-105.718463, 36.995841],
...         [-105.200117, 36.915628],
...         [-105.53038, 36.013014],
...         [-106.058364, 36.296978],
...         [-105.938469, 36.465516],
...         [-106.006634, 36.995343],
...         [-105.718463, 36.995841]
...     ]]
... }
>>>
>>> taos_scene_id = "landsat:LC08:PRE:TOAR:meta_LC80330352017056_v1"
>>>
>>> clipped_arr, c_meta = raster.ndarray(
...     taos_scene_id,
...     cutline=geometry,  # clip to shape
...     bands=["red", "green", "blue", "alpha"],
...     resolution=150,
...     scales=[[0,4000], [0,4000], [0,4000], [0,1]],
...     data_type="Byte",
...     drop_alpha=True
... )
>>>
>>> unclipped_arr, u_meta = raster.ndarray(
...     taos_scene_id,
...     cutline=None,  # no cutline
...     bands=["red", "green", "blue", "alpha"],
...     resolution=150,
...     scales=[[0,4000], [0,4000], [0,4000], [0,1]],
...     data_type="Byte",
...     drop_alpha=True
... )
https://cdn.descarteslabs.com/docs/1.12.1/public/_images/raster_figure29_1.png

Bounds

Bounds clips rasters to a rectangular spatial region defined by [min_x, min_y, max_x, max_y]. If bounds_srs is not set, those coordinates should be in the same spatial reference system as the output.

Bounds define a spatial region more explicitly than a cutline; if you find that different images with the same cutline have slightly different numbers of pixels, try setting the bounds.

SRS

The SRS (Spatial Reference System, also known as Coordinate Reference System) sets the map projection of the output raster. It can be specified as an EPSG code (like “EPSG:4326”), a PROJ.4 definition, or OGC CRS Well-Known Text.

>>> s2_arr_wgs84, m_wgs84 = raster.ndarray(
...     s2_scene_id,
...     srs="EPSG:4326",  # WGS84 (lat-lon)
...     resolution=0.001, # in decimal degrees
...     bands=["red", "green", "blue", "alpha"],
...     scales=[[0,8000], [0,8000], [0,8000], [0,1]],
...     data_type="Byte",
...     drop_alpha=True
... )
>>>
>>> s2_arr_polar, m_polar = raster.ndarray(
...     s2_scene_id,
...     srs="EPSG:3995",  # WGS84 Arctic Polar Stereographic
...     resolution=150,
...     bands=["red", "green", "blue", "alpha"],
...     scales=[[0,8000], [0,8000], [0,8000], [0,1]],
...     data_type="Byte",
...     drop_alpha=True
... )
https://cdn.descarteslabs.com/docs/1.12.1/public/_images/raster_figure32_1.png

Here’s the first image we saw over Alaska, in its (native) UTM projection, versus a WGS84 geographic coordinate system (Plate Carrée, where latitude and longitude are mapped directly to x and y), versus an Arctic Plolar Stereographic projection, if you’re into that.

The affine geotransform matrix for the raster is available in the rasterization metadata dictionary that’s also returned from raster calls if you need it for interacting with other geospatial software, or uploading data back to the Descartes Labs Catalog.

>>> m_wgs84["geoTransform"]
[-144.84256570972653, 0.001, 0.0, 62.22092380677686, 0.0, -0.001]

Resolution

Resolution specifies the distance that the edge of each pixel represents on the ground, in units of the SRS. So in UTM coordinates, for example, it should be given in meters; in lat-lon coordinates, it should be in decimal degrees. Therefore, the smaller the number, the finer resolution.

Using as coarse of a resolution as possible for your analysis will be both faster and cheaper (less data to transfer over the network, and to operate on). In particular, loading an entire scene at full resolution is rarely necessary, and will take a while.

Dimensions

Dimensions specify a width and height, in pixels, within which the output raster should fit.

You can either specify the resolution or the dimensions, but not both. If you use resolution, the output image is resampled to the specified pixel size, which determines its width and height. If you use dimensions, a resolution is chosen that fits the output image into the specified width and height. However, to maintain the aspect ratio, the output dimensions may end up being smaller than you specify.

>>> singapore_s2 = "sentinel-2:L1C:2016-04-16_48NUG_99_S2A_v1"
>>>
>>> arr_smalldims, _ = raster.ndarray(
...     singapore_s2,
...     dimensions=(100, 100),  # width x height
...     bands=["red", "green", "blue", "alpha"],
...     scales=[[0,4000], [0,4000], [0,4000], [0,1]],
...     data_type="Byte",
...     drop_alpha=True
... )
>>>
>>> arr_smalldims.shape
(100, 100, 3)
>>> arr_biggerdims, _ = raster.ndarray(
...     singapore_s2,
...     dimensions=(600, 400),  # width x height, unequal
...     bands=["red", "green", "blue", "alpha"],
...     scales=[[0,4000], [0,4000], [0,4000], [0,1]],
...     data_type="Byte",
...     drop_alpha=True
... )
>>>
>>> # notice the array is still 400 x 400; it just fits *within* 600 x 400
>>> arr_biggerdims.shape
(400, 600, 3)
https://cdn.descarteslabs.com/docs/1.12.1/public/_images/raster_figure36_1.png

Align Pixels

The align_pixels parameter ensures that pixels’ top-left corners in the output raster correspond to non-fractional intervals of resolution in the output SRS.

What does that mean? Consider requesting two rasters in some imaginary coordinate system with units in meters, one requested with the bounds [0, 0, 10, 10], and another with the bounds [0.5, 0.8, 10.5, 10.8]. Since both areas are 10x10 meters, both rasters would be 10x10 pixels at 1-meter resolution. However, pixel (0, 0) in the first raster would refer to the coordinates (0, 0) in our imaginary SRS, whereas pixel (0, 0) in the second would refer to (0.5, 0.8). Therefore, though the pixels would have the same array indicies, they wouldn’t be quite aligned to the same place on Earth, and comparisons between them would be invalid.

Specifying align_pixels=True in this case would internally shift bounds for you, so that the coordinates fell on whole-number intervals of 1 meter. In that case, the bounds of the output would end up being [0, 0, 10, 10].

Therefore, when you’re loading different images covering the same area, and you want to compare pixel-to-pixel between them, set align_pixels=True in most cases.

>>> s1_id = "sentinel-1:GRD:meta_2018-06-16_085D5608_S1A"
>>>
>>> utm_bounds = (
...     418080.5,
...     3379200.25,
...     518080.5,
...     3579200.25
... )
>>>
>>> s1_arr, s1_meta = raster.ndarray(
...     s1_id,
...     align_pixels=False,
...     bounds=utm_bounds,
...     resolution=200,
...     bands=["vv", "vh"]
... )
>>>
>>> s1_arr_tap, s1_meta_tap = raster.ndarray(
...     s1_id,
...     align_pixels=True,
...     bounds=utm_bounds,
...     resolution=200,
...     bands=["vv", "vh"]
... )
>>>
>>> # without align_pixels, the pixels start on a fraction of a meter
>>> s1_meta['cornerCoordinates']['upperLeft'], s1_meta['cornerCoordinates']['lowerRight']
([418080.5, 3579200.25], [518080.5, 3379200.25])
>>> # with it, they are "snapped" to whole meters
>>> s1_meta_tap['cornerCoordinates']['upperLeft'], s1_meta_tap['cornerCoordinates']['lowerRight']
([418000.0, 3579400.0], [518200.0, 3379200.0])
>>> s1_arr.shape
(1000, 500, 2)
>>> # notice that this changes the output dimensions
>>> # to cover the extra fraction of a pixel
>>> s1_arr_tap.shape
(1001, 501, 2)

All Touched

The cutline_all_touched controls how source image pixels are mapped into the output pixels. Specifically it controls the GDAL CUTLINE_ALL_TOUCHED warp option. By default this parameter is False, and a source pixel will only contribute to the destination pixel if the bounds of the destination pixel contain the centroid of the source pixel (the so-called “pixel-as-point” approach). If set to True then a source pixel will contribute to the destination pixel if the bounds of the destination pixel intersect with the bounds of the source pixel.

It is recommended to leave cutline_all_touched at its default value. In general it guards against systematic bias in the results. However, there are circumstances where all_touched=True is warranted. This is the case when the destination resolution is smaller than the source resolution and the bounds of a destination pixel may not contain the centroid of any source pixel, resulting in a masked pixel.

Here’s an example:

>>> scene_id = "ecmwf:era5:v0:ERA5_20210401T20_tile4"
>>>
>>> geometry = {
...     "type": "Polygon",
...     "coordinates": ((
...         (-50.21819862632718, -6.033494420295789),
...         (-50.21973462385856, -6.021475529166609),
...         (-50.09615761215656, -6.021324708843302),
...         (-50.09368786752497, -6.121996683190829),
...         (-50.21768560099267, -6.122384050341935),
...         (-50.21819862632718, -6.033494420295789)
...     ),)
... }
>>>
>>> bounds = (
...     -50.21973462385856,
...     -6.122384050341935,
...     -50.09368786752497,
...     -6.021324708843302
... )
>>>
>>>
>>> arr, meta = raster.ndarray(
...     scene_id,
...     align_pixels=False,
...     outputBounds=bounds,
...     outputBoundsSRS="EPSG:4326",
...     srs="EPSG:3857",
...     resolution=100.,
...     bands=["tp"],
...     cutline=geometry
... )
>>>
>>> arr_at, meta_at = raster.ndarray(
...     scene_id,
...     align_pixels=False,
...     outputBounds=bounds,
...     outputBoundsSRS="EPSG:4326",
...     srs="EPSG:3857",
...     resolution=100.,
...     bands=["tp"],
...     cutline=geometry,
...     cutline_all_touched=True
... )
>>>
>>> # without ``cutline_all_touched`` the result contains no data (all masked)
>>> import numpy as np
>>> arr.mask.sum() == np.prod(arr.shape)
True
>>> # with the ``cutline_all_touched`` the result contains data which comes from outside the cutline
>>> arr_at.mask.sum() == 0
True

Resampler

This determines the algorithm used to interpolate between pixels when scaling up, scaling down, or otherwise warping a raster to a new resolution or SRS.

Nearest-neighbor is the default (fastest but lowest-quality). Bilinear produces visually smoother images when downsampling. Be aware, though, that if using discretely-valued data (such as a raster classification layer where 1 means brick, 3 means wood, etc.), use a resampler like nearest-neighbor or median—an interpolating resampler could produce values that don’t exist in your input data (like 2).

>>> resamplers = [
...     "bilinear",
...     "near",
...     "cubic"
... ]
>>>
>>> arrs = [
...     raster.ndarray(
...         s1_id,
...         resampler=resampler,
...         resolution=1000,
...         bands=["vv", "vh", "vv"]
...     )[0]
...     for resampler in resamplers
... ]
>>>
>>> display(arrs[0], title=resamplers[0], bands_axis=-1)
https://cdn.descarteslabs.com/docs/1.12.1/public/_images/raster_figure42_1.png
>>> display(arrs[1], title=resamplers[1], bands_axis=-1)
https://cdn.descarteslabs.com/docs/1.12.1/public/_images/raster_figure43_1.png
>>> display(arrs[2], title=resamplers[2], bands_axis=-1)
https://cdn.descarteslabs.com/docs/1.12.1/public/_images/raster_figure44_1.png

See in Viewer