# Raster¶

The Raster API has two purposes: retrieving data from the Descartes Labs Catalog, and providing a consistent way of splitting that data into smaller tiles.

## Retrieving data¶

The ndarray, stack, and raster methods retrieve raster data, returning them either in an NumPy array or, for raster, an image file. These methods accept a list of identifiers from the Metadata API which identify the image(s) to retrieve. All of these methods enable you to rescale, resample, reproject, clip, resize, select bands of interest, and convert data type of the imagery in a single call.

These three methods differ in how they return raster data, but all follow the same pattern. Here’s an example of using ndarray to retrieve a visible-light image from the Sentinel-2 constellation as a NumPy array, taken over the booming town of McCarthy, Alaska in the lower-right (population 28).

In [1]: import matplotlib.pyplot as plt

In [2]: import descarteslabs as dl

# isntantiate the raster client
In [3]: raster_client = dl.Raster()

In [4]: s2_scene_id = "sentinel-2:L1C:2017-08-07_07VCJ_99_S2A_v1"

In [5]: s2_arr, s2_meta = raster_client.ndarray(s2_scene_id,
...:                                         resolution=120,
...:                                         bands=["red", "green", "blue", "alpha"],
...:                                         scales=[[0,8000], [0,8000], [0,8000], [0,1]],
...:                                         data_type="Byte")
...:

In [6]: plt.imshow(s2_arr);

In [7]: plt.title(s2_scene_id);

In [8]: plt.show();


See in Viewer

Notice that we’ve already chosen a specific scene to retrieve: the ID "sentinel-2:L1C:2017-08-07_07VCJ_99_S2A_v1". Typically, you’d figure out which scenes you want by searching with the Metadata API, which we’ve just done in advance here. This illustrates the difference between Metadata and Raster: Metadata is for searching for imagery, Raster is for retrieving it.

The three methods generally take the same parameters, but differ in how they return data:

### ndarray¶

ndarray returns images as a NumPy array with 2 or 3 dimensions (one band vs multiple bands), for analysis and manipulation with Python scientific computing tools.

You can see an example of using ndarray above.

### raster¶

raster returns data as an image file, optionally saving it to disk.

It returns a dictionary of information about the raster with a files key, which contains another dictionary mapping a filename to the image file data as a byte array. Passing save=True will save that image file to disk in the current working directory.

In [9]: raster = raster_client.raster("landsat:LC08:PRE:TOAR:meta_LC80330352017056_v1",
...:                               resolution=150,
...:                               bands=["red", "green", "blue", "alpha"],
...:                               save=True)  # save file to disk
...:

In [10]: raster["files"].keys()
Out[10]: [u'landsat:LC08:PRE:TOAR:meta_LC80330352017056_v1_red-green-blue-alpha.tif']

In [11]: # raster["files"]["landsat:LC08:PRE:TOAR:meta_LC80330352017056_v1_red-green-blue-alpha.tif"]
....: # is a byte array of the data as a GeoTIFF
....:

In [12]: import os

In [13]: os.listdir(os.getcwd())  # image saved as GeoTIFF in current directory
Out[13]: ['landsat:LC08:PRE:TOAR:meta_LC80330352017056_v1_red-green-blue-alpha.tif']


### stack¶

stack returns multiple images as a 4-D NumPy array. This is particularly useful for creating time-stacks.

# Two Landsat-8 scenes over Santa Fe, NM in Feb and Apr 2017
In [14]: scene_ids = [
....:   "landsat:LC08:PRE:TOAR:meta_LC80330352017056_v1",
....:   "landsat:LC08:PRE:TOAR:meta_LC80330352017104_v1"
....: ]
....:

# use a DLTile (see bottom of this guide) to specify explicit resolution, bounds, & SRS.
# required by stack so that each ndarray "layer" has the same shape.
In [15]: dltile_key = "1024:0:60.0:13:-2:64"

In [16]: stack, raster_info = raster_client.stack(scene_ids,
....:                                          dltile=dltile_key,
....:                                          bands=["red", "green", "blue", "alpha"],
....:                                          scales=[[0,4000], [0,4000], [0,4000], [0,1]],
....:                                          data_type="Byte")
....:

In [17]: stack.shape
Out[17]: (2, 1024, 1024, 4)

In [18]: dl.scenes.display(*stack, title=scene_ids, size=5, bands_axis=-1)  # easily show multiple images


See in Viewer

### Mosaicking¶

You can also pass in multiple IDs to these methods, which will mosaic those images together. This is useful when your area of interest spans multiple images.

The mosaic ordering follows the ordering of the list of inputs, bottom to top. Thus, if there is any area of overlap between images, the mosaic image will contain the data from the last image.

As an example, here are two Landsat 8 scenes taken near Rome in early 2018:




See in Viewer

You can mosaic the two into a single image. Note that where the images overlap, the pixels from the last image in the list simply “cover up” the others. There is no compositing (as you might do with a median, mean, etc.) of the two images.

In [19]: scene_ids = [
....:   "landsat:LC08:01:RT:TOAR:meta_LC08_L1TP_191031_20180130_20180130_01_RT_v1",
....:   "landsat:LC08:01:RT:TOAR:meta_LC08_L1TP_190031_20180123_20180123_01_RT_v1"
....: ]
....:

In [20]: mosaic1, meta1 = raster_client.ndarray(scene_ids,
....:                                        resolution=150,
....:                                        bands=["red", "green", "blue", "alpha"])
....:

In [21]: dl.scenes.display(mosaic1, bands_axis=-1, size=5)  # use to easily show uint16 images


Notice how the order changes which image is “on top”:

In [22]: mosaic2, meta2 = raster_client.ndarray(list(reversed(scene_ids)),
....:                                        resolution=150,
....:                                        bands=["red", "green", "blue", "alpha"])
....:

In [23]: dl.scenes.display(mosaic2, bands_axis=-1, size=5)


Note

In most cases, but always when mosiacing, you should include the alpha band as the last band. The alpha band indicates which pixels are valid (1) or invalid (0).

Pixels ouside the valid data area (where alpha would be 0) may not be exactly 0, due to compression artifacts. Without including the alpha band, invalid areas of one scene may mask out valid areas of another.




## Raster Parameters¶

Raster methods can take many parameters to handle many different use-cases, but it’s not necessary to use (or know) all of them at first.

### Bands¶

Use the bands parameter to choose which bands of the images you want, and in which order. Though photographs typically have exactly three bands—red, green, and blue—remotely-sensed data often has many more bands than that outside the visible spectrum. You can select as many or few bands as you want, in any order. (However, matplotlib can only display images with one or three bands.) Careful: not specifying bands will return a raster of every band in the product, which can be quite large.

Here are two rasters of the same scene of a wildfire in Oklahoma, one in the visible-light spectrum, one falsely colored with the short-wave infrared, near-infrared 2, and near-infrared 1 bands:

In [24]: bands_scene_id = "sentinel-2:L1C:2018-04-13_14SME_99_S2B_v1"

In [25]: bands_visible, _ = raster_client.ndarray(bands_scene_id,
....:                                          bands=["red", "green", "blue", "alpha"],  # visible-light bands
....:                                          resolution=150,
....:                                          scales=[[0,4000], [0,4000], [0,4000], [0,1]],
....:                                          data_type="Byte")
....:

In [26]: bands_ir, _ = raster_client.ndarray(bands_scene_id,
....:                                     bands=["swir2", "swir1", "nir", "alpha"],  # infrared bands
....:                                     resolution=150,
....:                                     scales=[[0,4000], [0,4000], [0,4000], [0,1]],
....:                                     data_type="Byte")
....:




See in Viewer

### Scales¶

tl;dr: you only should specify scales if you specify a data_type smaller than the product’s native data type.

Scales allow you to map pixel values from their native domain to a range of your choosing. For example, a scale of [0, 4000, 0, 255] would map 4000 to 255, 0 to 0, and 2000 to 127.5. If the range is left out (the second two values), it defaults to [0, 255].

In [27]: raster_params = {
....:    "data_type": "Byte",
....:    "bands": ["red", "green", "blue", "alpha"],
....:    "resolution": 150
....: }
....:

In [28]: full_TOAR_arr, full_TOAR_meta = raster_client.ndarray(
....:      s2_scene_id,
....:      scales=[[0,10000], [0,10000], [0,10000], [0,1]],
....:      **raster_params
....: )
....:

In [29]: partial_TOAR_arr, partial_TOAR_meta = raster_client.ndarray(
....:      s2_scene_id,
....:      scales=[[0,4000], [0,4000], [0,4000], [0,1]],
....:      **raster_params
....: )
....:




See in Viewer

Notice how in the second image, the snowy peaks look blown out, but you can see more detail in the darker valleys. Every pixel above 4000 is rescaled to 255, so detail is lost in areas with high reflectance, and more values between 0-255 are used to express dark areas, giving them more dynamic range.

Scaling is only necessary when requesting data in a smaller data type than its native format—like above, where we requested UInt16 data as Byte. The 65,535 possible values in a UInt16 must be represented with just 255 values in a Byte. Though you could simply map [0..65,535] to [0..255], when you know something about the data, you can be more clever to reduce loss of precision.

In typical images, most pixels will be below around 40% reflectance. So rather than using only 0-101 to represent 0-40%, and 102-255 for 40-100%—values which rarely occur—it’s common to use a scaling like [0, 4000, 0, 255] for each band (as seen in the image above). This allows the values containing most of the useful information to utilize the full output range.

Note

Data range vs physical range

The values in raster data have physical meaning—but for efficiency, are often stored on-disk as different numbers.

Most satellite imagery, for example, measures reflectance: the fraction of the sun’s light reflected by the Earth. (Technically, it measures radiance, which is converted to reflectance, then also possibly corrected for atmospheric absorption.)

It would seem logical to represent these fractions as floating-point numbers between 0 and 1. However, they can also be represented as integers between 0 and 10,000 without loss of precision (due to the number of significant figures to which the instrument is actually accurate).

So storing the data as 16-bit unsigned integers is as precise as 64-bit floats, but uses four times less space (and is faster to operate on for the CPU).

We call the data range the range that values can have in their on-disk format (for example, 0-10000), and the physical range the numbers with physical meaning that the data range represents (in this case, reflectance from 0 to 1).

Again: scales are unnecessary if you just request the data in its native data type. The parameter exists only to allow for the optimization of downsampling to a smaller data type (from UInt16 to Byte), while retaining control over how that downsampling occurs. Unless you need to optimize your memory usage in that way, it’s easier to specify neither dtype nor scales and work with data in its native format. If necessary, you can use dl.scenes.display to visualize it (since matplotlib’s imshow can only handle uint8 arrays).

Warning

You should always specify a scaling when requesting data in a different data type than its native format.

Otherwise, you’ll get unexpected results, as the original binary values are simply cast to the new data type.

In [30]: do_this, _ = raster_client.ndarray(
....:     s2_scene_id,
....:     scales=[[0, 10000], [0, 10000], [0, 10000], [0, 1]],
....:     data_type="Byte",  # original data is uint16
....:     bands=["red", "green", "blue", "alpha"],
....:     resolution=200
....: )
....:

In [31]: dont_do_this, _ = raster_client.ndarray(
....:     s2_scene_id,
....:     scales=None,   # no! specify a scaling!
....:     data_type="Byte",  # original data is uint16
....:     bands=["red", "green", "blue", "alpha"],
....:     resolution=200
....: )
....:




### Cutline¶

A cutline clips raster data to geometry of your choice. Cutlines can be any GeoJSON object, or a WKT string. As a GeoJSON object, coordinates are expected to be in WGS84 lat-lon.

In [32]: geometry = dl.Places().shape("north-america_united-states_new-mexico_northwest_taos")

In [33]: taos_scene_id = "landsat:LC08:PRE:TOAR:meta_LC80330352017056_v1"

In [34]: clipped_arr, c_meta = raster_client.ndarray(taos_scene_id,
....:                                             cutline=geometry,  # clip to shape
....:                                             bands=["red", "green", "blue", "alpha"],
....:                                             resolution=150,
....:                                             scales=[[0,4000], [0,4000], [0,4000], [0,1]],
....:                                             data_type="Byte")
....:

In [35]: unclipped_arr, u_meta = raster_client.ndarray(taos_scene_id,
....:                                               cutline=None,  # no cutline
....:                                               bands=["red", "green", "blue", "alpha"],
....:                                               resolution=150,
....:                                               scales=[[0,4000], [0,4000], [0,4000], [0,1]],
....:                                               data_type="Byte")
....:




### Bounds¶

Bounds clips rasters to a rectangular spatial region defined by [min_x, min_y, max_x, max_y]. If bounds_srs is not set, those coordinates should be in the same spatial reference system as the output.

Bounds define a spatial region more explicitly than a cutline; if you find that different images with the same cutline have slightly different numbers of pixels, try setting the bounds.

### SRS¶

The SRS (Spatial Reference System, also known as Coordinate Reference System) sets the map projection of the output raster. It can be specified as an EPSG code (like “EPSG:4326”), a PROJ.4 definition, or OGC CRS Well-Known Text.

In [36]: s2_arr_wgs84, m_wgs84 = raster_client.ndarray(s2_scene_id,
....:                                               srs="EPSG:4326",  # WGS84 (lat-lon)
....:                                               resolution=0.001, # in decimal degrees
....:                                               bands=["red", "green", "blue", "alpha"],
....:                                               scales=[[0,8000], [0,8000], [0,8000], [0,1]],
....:                                               data_type="Byte")
....:

In [37]: s2_arr_polar, m_polar = raster_client.ndarray(s2_scene_id,
....:                                               srs="EPSG:3995",  # WGS84 Arctic Polar Stereographic
....:                                               resolution=150,
....:                                               bands=["red", "green", "blue", "alpha"],
....:                                               scales=[[0,8000], [0,8000], [0,8000], [0,1]],
....:                                               data_type="Byte")
....:




Here’s the first image we saw over Alaska, in its (native) UTM projection, versus a WGS84 geographic coordinate system (Plate Carrée, where latitude and longitude are mapped directly to x and y), versus an Arctic Plolar Stereographic projection, if you’re into that.

The affine geotransform matrix for the raster is available in the rasterization metadata dictionary that’s also returned from raster calls if you need it for interacting with other geospatial software, or uploading data back to the Descartes Labs Catalog.

In [38]: m_wgs84["geoTransform"]
Out[38]: [-144.8425657097265, 0.001, 0.0, 62.220923806776845, 0.0, -0.001]


### Resolution¶

Resolution specifies the distance that the edge of each pixel represents on the ground, in units of the SRS. So in UTM coordinates, for example, it should be given in meters; in lat-lon coordinates, it should be in decimal degrees. Therefore, the smaller the number, the finer resolution.

Using as coarse of a resolution as possible for your analysis will be both faster and cheaper (less data to transfer over the network, and to operate on). In particular, loading an entire scene at full resolution is rarely necessary, and will take a while.

### Dimensions¶

Dimensions specify a width and height, in pixels, within which the output raster should fit.

You can either specify the resolution or the dimensions, but not both. If you use resolution, the output image is resampled to the specified pixel size, which determines its width and height. If you use dimensions, a resolution is chosen that fits the output image into the specified width and height. However, to maintain the aspect ratio, the output dimensions may end up being smaller than you specify.

In [39]: singapore_s2 = "sentinel-2:L1C:2016-04-16_48NUG_99_S2A_v1"

In [40]: arr_smalldims, _ = raster_client.ndarray(singapore_s2,
....:                                         dimensions=(100, 100),  # width x height
....:                                         bands=["red", "green", "blue", "alpha"],
....:                                         scales=[[0,4000], [0,4000], [0,4000], [0,1]],
....:                                         data_type="Byte")
....:

In [41]: arr_smalldims.shape
Out[41]: (100, 100, 4)

In [42]: arr_biggerdims, _ = raster_client.ndarray(singapore_s2,
....:                                           dimensions=(600, 400),  # width x height, unequal
....:                                           bands=["red", "green", "blue", "alpha"],
....:                                           scales=[[0,4000], [0,4000], [0,4000], [0,1]],
....:                                           data_type="Byte")
....:

# notice the array is still 400 x 400; it just fits *within* 600 x 400
In [43]: arr_biggerdims.shape
Out[43]: (400, 400, 4)




### Align Pixels¶

The align_pixels parameter ensures that pixels’ top-left corners in the output raster correspond to non-fractional intervals of resolution in the output SRS.

What does that mean? Consider requesting two rasters in some imaginary coordinate system with units in meters, one requested with the bounds [0, 0, 10, 10], and another with the bounds [0.5, 0.8, 10.5, 10.8]. Since both areas are 10x10 meters, both rasters would be 10x10 pixels at 1-meter resolution. However, pixel (0, 0) in the first raster would refer to the coordinates (0, 0) in our imaginary SRS, whereas pixel (0, 0) in the second would refer to (0.5, 0.8). Therefore, though the pixels would have the same array indicies, they wouldn’t be quite aligned to the same place on Earth, and comparisons between them would be invalid.

Specifying align_pixels=True in this case would internally shift bounds for you, so that the coordinates fell on whole-number intervals of 1 meter. In that case, the bounds of the output would end up being [0, 0, 10, 10].

Therefore, when you’re loading different images covering the same area, and you want to compare pixel-to-pixel between them, set align_pixels=True in most cases.

In [44]: s1_id = "sentinel-1:GRD:meta_2018-06-16_085A2218_S1A"

In [45]: utm_bounds = (
....:   300000.5,
....:   3000000.25,
....:   400000.5,
....:   3200000.25
....: )
....:

In [46]: s1_arr, s1_meta = raster_client.ndarray(s1_id,
....:                                         align_pixels=False,
....:                                         bounds=utm_bounds,
....:                                         resolution=200,
....:                                         bands=["vv", "vh"])
....:

In [47]: s1_arr_tap, s1_meta_tap = raster_client.ndarray(s1_id,
....:                                                 align_pixels=True,
....:                                                 bounds=utm_bounds,
....:                                                 resolution=200,
....:                                                 bands=["vv", "vh"])
....:

# without align_pixels, the pixels start on a fraction of a meter
In [48]: s1_meta['cornerCoordinates']['upperLeft'], s1_meta['cornerCoordinates']['lowerRight']
Out[48]: ([300000.5, 3200000.25], [400000.5, 3000000.25])

# with it, they are "snapped" to whole meters
In [49]: s1_meta_tap['cornerCoordinates']['upperLeft'], s1_meta_tap['cornerCoordinates']['lowerRight']
Out[49]: ([300000.0, 3200200.0], [400200.0, 3000000.0])

In [50]: s1_arr.shape
Out[50]: (1000, 500, 2)

# notice that this changes the output dimensions
# to cover the extra fraction of a pixel
In [51]: s1_arr_tap.shape
Out[51]: (1001, 501, 2)


### Resampler¶

This determines the algorithm used to interpolate between pixels when scaling up, scaling down, or otherwise warping a raster to a new resolution or SRS.

Nearest-neighbor is the default (fastest but lowest-quality). Bilinear produces visually smoother images when downsampling. Be aware, though, that if using discretely-valued data (such as a raster classification layer where 1 means brick, 3 means wood, etc.), use a resampler like nearest-neighbor or median—an interpolating resampler could produce values that don’t exist in your input data (like 2).

In [52]: resamplers = [
....:   "bilinear",
....:   "near",
....:   "cubic"
....: ]
....:

In [53]: arrs = [
....:   raster_client.ndarray(s1_id,
....:     resampler=resampler,
....:     resolution=1000,
....:     bands=["vv", "vh", "vv"]
....:   )[0]
....:   for resampler in resamplers
....: ]
....:

In [54]: dl.scenes.display(arrs[0], title=resamplers[0], size=8, bands_axis=-1)

In [55]: dl.scenes.display(arrs[1], title=resamplers[1], size=8, bands_axis=-1)

In [56]: dl.scenes.display(arrs[2], title=resamplers[2], size=8, bands_axis=-1)


See in Viewer

## DLTiles¶

A common method for scaling up analysis over large geographic areas is to split those areas up into a grid of small, equally-sized tiles and operate on each tile separately. To make this easier, Descartes Labs offers DLTiles: a way of defining a grid of arbitrary spacing that covers the globe.

A tiling is defined by three parameters: the spatial resolution, tile size, and overlap between tiles (padding). The optimal tiling parameters will depend on your application. After defining a tiling, you can obtain the individual DLTiles intersecting a region of interest by providing a latitude/longitude pair or GeoJSON geometry to one of the DLTile methods. Each DLTile is identified by a unique key; that key encodes the tiling parameters, and which number tile in grid that specific DLTile is.

DLTiles have another benefit: they let you consistently refer to the same spatial region across products that might have different native resolutions and coordinate reference systems. Even if you aren’t working over a large region that needs to be split into smaller tiles, DLTiles are still the recommended way of specifying a region of interest for this reason. Every Raster request for the same DLTile will return an array with the same shape (same number of pixels), covering the same bounds on Earth, at the same resolution, so data from different products can be used together without clipping or reprojecting.

Using the dltile_from_latlon and dltile_from_shape methods, you can generate uniformly sized tile(s) over point or polygon Areas of Interest (respectively). These tile geometries can be passed to the Metadata and Raster methods to search for and access data for each tile.

Spatial resolution The spatial resolution determines how much distance on the ground (in meters) is covered by a single pixel. Higher- or lower-resolution versions of the same tile grid can be easily created by adjusting the spatial resolution and tile dimensions such that they cover the same ground distance. For example, a 1048x1048 pixel tile at 30m resolution will yield the same tile boundaries as a 524x524 pixel tile at 60m. DLTiles use UTM zones as a base projection, thus the spatial resolution is always specified in meters.

Tile size Tile size determines the number of pixels along each side of a tile. The choice of tile size is somewhat arbitrary, but it’s most efficient to choose the largest tile size possible given your application. However, if you have a memory-intensive computation, reducing tile size is often the easiest way to reduce memory usage. Typical tile dimensions that cover a majority of use cases are 1024x1024 to 2048x2048.

Padding Padding sets the number of extra pixels to buffer each side of a tile—therefore also controlling the number of pixels by which two tiles will overlap. This padding is often necessary in computations that involve a stencil or kernel, such as a taking a derivative or doing a convolution, or where edge effects may otherwise be introduced. Padding is added outside of tile_size, so the total number of pixels along each side of a DLtile is tile_size + 2*padding.

Here’s an example of a tiling over the state of New Mexico.

import descarteslabs as dl
>>> new_mexico = dl.Places().shape('north-america_united-states_new-mexico')
>>> tile_size = 2048
>>> resolution = 60

If we were to add a padding, all tiles will overlap. For illustration purposes, we created a new tiling with the same parameters above, but set a very large padding (pad = 100). Overlaying this tiling on the above tiling, we can see the overlap between tiles related to padding as well as the overlap due to the UTM projection.