Catalog

Use the Descartes Labs Catalog to discover existing raster products, search the images contained in them and manage your own products and images.

Note

The Catalog Python object-oriented client provides the functionality previously covered by the more low-level, now deprecated Metadata and Catalog Python clients.

Note

The Catalog Python client is mainly for discovering data and for managing data. For data analysis and rastering use Scenes.

Concepts

The Descartes Labs Catalog is a repository for georeferenced images. Commonly these images are either acquired by Earth observation platforms like a satellite or they are derived from other georeferenced images. The catalog is modeled on the following core concepts, each of which is represented by its own class in the API.

Images

An image (represented by class Image in the API) contains data for a shape on earth, as specified by its georeferencing. An image references one or more files (commonly TIFF or JPEG files) that contain the binary data conforming to the band declaration of its product.

Bands

A band (represented by class Band) is a 2-dimensional slice of raster data in an image. A product must have at least one band and all images in the product must conform to the declared band structure. For example, an optical sensor will commonly have bands that correspond to the red, blue and green visible light spectrum, which you could raster together to create an RGB image.

Products

A product (represented by class Product) is a collection of images that share the same band structure. Images in a product can generally be used jointly in a data analysis, as they are expected to have been uniformly processed with respect to data correction, georegistration and so on. For example, you can composite multiple images from a product to run an algorithm over a large geographic region.

Some products correspond directly to image datasets provided by a platform. See for example the Landsat 8 Collection 1 product. This product contains all images taken by the Landsat 8 satellite, is updated continuously as it takes more images, and is processed to NASA’s Collection 1 specification.

A product may also represent data derived from multiple other products or data sources - some may not even derive from Earth observation data. A raster product can contain any sort of image data as long as it’s georeferenced.

Searching the catalog

All objects support the same search interface. Let’s look at two of the most commonly searched for types of objects: products and images.

Finding products

Filtering and sorting

Product.search() is the entry point for searching products. It returns a query builder that you can use to refine your search and can iterate over to retrieve search results.

Count all products with some data before 2016 using filter():

>>> from descarteslabs.catalog import Product, properties as p
>>> search = Product.search().filter(p.start_datetime < "2016-01-01")
>>> search.count()
56

You can apply multiple filters. To restrict this search to products with data after 2000:

>>> search = search.filter(p.end_datetime > "2000-01-01")
>>> search.count()
26

Of these, get the 3 products with the oldest data, using sort() and limit(). The search is not executed until you start retrieving results by iterating over it:

>>> oldest_search = search.sort("start_datetime").limit(3)
>>> for result in oldest_search:
...     print(result.id)
...
landsat:LT05:PRE:TOAR
dmsp:nightlights
42b24cbb9a71ed9beb967dbad04ea61d7331d5af:global_forest_change_v0

All attributes are documented in the Product API reference, which also spells out which ones can be used to filter or sort.
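The lazy behavior described above, where filter(), sort() and limit() only refine a query and nothing runs until you iterate, can be illustrated with a small stand-in. This is a minimal sketch over in-memory dictionaries; the LazySearch class and its fields are hypothetical and are not the client's implementation.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class LazySearch:
    """Hypothetical stand-in for a catalog search object (not the real client)."""

    records: Tuple[dict, ...]
    predicates: Tuple[Callable[[dict], bool], ...] = ()
    sort_key: Optional[str] = None
    max_results: Optional[int] = None

    def filter(self, predicate):
        # Return a new search; the original search stays untouched.
        return replace(self, predicates=self.predicates + (predicate,))

    def sort(self, key):
        return replace(self, sort_key=key)

    def limit(self, n):
        return replace(self, max_results=n)

    def __iter__(self):
        # Only here is the accumulated query actually evaluated.
        rows = [r for r in self.records if all(p(r) for p in self.predicates)]
        if self.sort_key is not None:
            rows.sort(key=lambda r: r[self.sort_key])
        if self.max_results is not None:
            rows = rows[: self.max_results]
        return iter(rows)


# Made-up records standing in for products.
products = (
    {"id": "a", "start_datetime": "1999-01-01"},
    {"id": "b", "start_datetime": "1984-03-01"},
    {"id": "c", "start_datetime": "2017-06-01"},
)
base = LazySearch(products)
early = base.filter(lambda r: r["start_datetime"] < "2016-01-01")
oldest = early.sort("start_datetime").limit(1)
oldest_ids = [r["id"] for r in oldest]  # evaluation happens during iteration
```

Because each call returns a new object, an intermediate search such as early can be reused for several different refinements.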

Lookup by id and object relationships

If you know a product’s id, look it up directly with Product.get():

>>> landsat8_collection1 = Product.get("landsat:LC08:01:RT:TOAR")
>>> landsat8_collection1

Product: Landsat 8 Real Time Collection 1
  id: landsat:LC08:01:RT:TOAR

Wherever there are relationships between objects, expect methods such as bands() to find related objects. This shows the first four bands of the Landsat 8 product we looked up:

>>> list(landsat8_collection1.bands().limit(4))
[
SpectralBand: coastal-aerosol
  id: landsat:LC08:01:RT:TOAR:coastal-aerosol
  product: landsat:LC08:01:RT:TOAR,
SpectralBand: blue
  id: landsat:LC08:01:RT:TOAR:blue
  product: landsat:LC08:01:RT:TOAR,
SpectralBand: green
  id: landsat:LC08:01:RT:TOAR:green
  product: landsat:LC08:01:RT:TOAR,
SpectralBand: red
  id: landsat:LC08:01:RT:TOAR:red
  product: landsat:LC08:01:RT:TOAR]

bands() returns a search object that can be further refined. This shows all class bands of this Landsat 8 product, sorted by name:

>>> from descarteslabs.catalog import BandType
>>> list(landsat8_collection1.bands().filter(p.type == BandType.CLASS).sort("name"))
[
ClassBand: qa_cirrus
  id: landsat:LC08:01:RT:TOAR:qa_cirrus
  product: landsat:LC08:01:RT:TOAR,
ClassBand: qa_cloud
  id: landsat:LC08:01:RT:TOAR:qa_cloud
  product: landsat:LC08:01:RT:TOAR,
ClassBand: qa_cloud_shadow
  id: landsat:LC08:01:RT:TOAR:qa_cloud_shadow
  product: landsat:LC08:01:RT:TOAR,
ClassBand: qa_saturated
  id: landsat:LC08:01:RT:TOAR:qa_saturated
  product: landsat:LC08:01:RT:TOAR,
ClassBand: qa_snow
  id: landsat:LC08:01:RT:TOAR:qa_snow
  product: landsat:LC08:01:RT:TOAR,
ClassBand: valid-cloudfree
  id: landsat:LC08:01:RT:TOAR:valid-cloudfree
  product: landsat:LC08:01:RT:TOAR]

Finding images

Image filters

Search images by the most common attributes - by product, intersecting with a geometry and by a date range:

>>> from descarteslabs.catalog import Image, properties as p
>>> geometry = {
...     "type": "Polygon",
...     "coordinates": [[
...         [2.915496826171875, 42.044193618165224],
...         [2.838592529296875, 41.92475971933975],
...         [3.043212890625, 41.929868314485795],
...         [2.915496826171875, 42.044193618165224]
...     ]]
... }
...
>>> search = Product.get("landsat:LC08:01:RT:TOAR").images()
>>> search = search.intersects(geometry)
>>> search = search.filter((p.acquired > "2017-01-01") & (p.acquired < "2018-01-01"))
>>> search.count()
14
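The geometry passed to intersects() above is a GeoJSON polygon whose ring starts and ends on the same coordinate. Before sending a hand-written geometry, a quick local sanity check can catch an unclosed ring. This is a minimal sketch with hypothetical helper names (ring_is_closed, polygon_bounds); it does not call the catalog.

```python
def ring_is_closed(ring):
    """A GeoJSON linear ring needs at least 4 positions, first equal to last."""
    return len(ring) >= 4 and list(ring[0]) == list(ring[-1])


def polygon_bounds(geometry):
    """Return (min_lon, min_lat, max_lon, max_lat) of a GeoJSON Polygon."""
    points = [pt for ring in geometry["coordinates"] for pt in ring]
    lons = [pt[0] for pt in points]
    lats = [pt[1] for pt in points]
    return min(lons), min(lats), max(lons), max(lats)


geometry = {
    "type": "Polygon",
    "coordinates": [[
        [2.915496826171875, 42.044193618165224],
        [2.838592529296875, 41.92475971933975],
        [3.043212890625, 41.929868314485795],
        [2.915496826171875, 42.044193618165224],
    ]],
}
all_closed = all(ring_is_closed(ring) for ring in geometry["coordinates"])
bounds = polygon_bounds(geometry)
```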

There are other attributes useful to filter by, documented in the API reference for Image. For example, exclude images with too much cloud cover:

>>> search = search.filter(p.cloud_fraction < 0.2)
>>> search.count()
7

Filtering by cloud_fraction is only reasonable when the product sets this attribute on images. Images that don’t set the attribute are excluded from the filter.
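The effect of a missing attribute on a filter can be pictured with plain dictionaries. This is only an illustration of the semantics described above; the records and the helper cloud_fraction_below are made up, and None stands for "attribute not set".

```python
images = [
    {"id": "img-1", "cloud_fraction": 0.05},
    {"id": "img-2", "cloud_fraction": 0.6},
    {"id": "img-3", "cloud_fraction": None},  # attribute not set on this image
]


def cloud_fraction_below(records, threshold):
    """Keep records whose cloud_fraction is set and below the threshold."""
    return [
        rec
        for rec in records
        if rec["cloud_fraction"] is not None and rec["cloud_fraction"] < threshold
    ]


matching_ids = [rec["id"] for rec in cloud_fraction_below(images, 0.2)]
```

Note that img-3 is dropped even though its actual cloud cover is unknown, which is why this filter is only meaningful for products that populate the attribute.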

The created timestamp is added to all objects in the catalog when they are created and is immutable. Restrict the search to results created before some time in the past, to make sure that the image results are stable:

>>> from datetime import datetime
>>> search = search.filter(p.created < datetime(2019, 1, 1))
>>> search.count()
7

Note that for all timestamps we can use datetime instances or strings that can reasonably be parsed as a timestamp. If a timestamp has no explicit timezone, it’s assumed to be in UTC.
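The UTC rule can be sketched with the standard library. The helper to_utc below is hypothetical and only handles ISO 8601 strings via datetime.fromisoformat; the client's own parser may well accept more formats.

```python
from datetime import datetime, timezone


def to_utc(value):
    """Hypothetical helper: interpret naive timestamps as UTC."""
    if isinstance(value, str):
        value = datetime.fromisoformat(value)
    if value.tzinfo is None:
        # No explicit timezone: assume the value is already in UTC.
        return value.replace(tzinfo=timezone.utc)
    # Explicit timezone: convert to the equivalent UTC instant.
    return value.astimezone(timezone.utc)


naive_string = to_utc("2019-01-01")
offset_string = to_utc("2019-01-01T12:00:00+02:00")
naive_datetime = to_utc(datetime(2019, 1, 1))
```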

Image summaries

Any image query supports a summary via the summary() method, which returns a SummaryResult with aggregate statistics beyond just the number of results:

>>> from descarteslabs.catalog import Image, properties as p
>>> search = Image.search().filter(p.product_id == "landsat:LC08:01:T1:TOAR")
>>> search.summary()

Summary for 478563 images:
 - Total bytes: 55,793,492,234,364
 - Products: landsat:LC08:01:T1:TOAR

These summaries can also be bucketed by time intervals with summary_interval() to create a time series:

>>> search.summary_interval(interval="month", start_datetime="2017-01-01", end_datetime="2017-06-01")
[
Summary for 9872 images:
 - Total bytes: 1,230,379,744,242
 - Interval start: 2017-01-01 00:00:00+00:00,
Summary for 10185 images:
 - Total bytes: 1,288,400,404,886
 - Interval start: 2017-02-01 00:00:00+00:00,
Summary for 12426 images:
 - Total bytes: 1,556,107,514,684
 - Interval start: 2017-03-01 00:00:00+00:00,
Summary for 12492 images:
 - Total bytes: 1,476,030,969,986
 - Interval start: 2017-04-01 00:00:00+00:00,
Summary for 13768 images:
 - Total bytes: 1,571,780,442,608
 - Interval start: 2017-05-01 00:00:00+00:00]
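Conceptually, an interval summary groups images into time buckets and aggregates each bucket. The sketch below reproduces that idea locally for monthly buckets. It assumes that bucketing is done on the acquisition time and that naive datetimes are in UTC; the function summarize_by_month and the sample data are made up.

```python
from collections import defaultdict
from datetime import datetime, timezone


def summarize_by_month(images):
    """images: iterable of (acquired datetime, size in bytes) pairs."""
    buckets = defaultdict(lambda: {"count": 0, "bytes": 0})
    for acquired, size in images:
        # The bucket key is the first instant of the image's month.
        start = datetime(acquired.year, acquired.month, 1, tzinfo=timezone.utc)
        buckets[start]["count"] += 1
        buckets[start]["bytes"] += size
    return [(start, buckets[start]) for start in sorted(buckets)]


sample = [
    (datetime(2017, 1, 5), 100),
    (datetime(2017, 1, 20), 150),
    (datetime(2017, 2, 3), 200),
]
monthly = summarize_by_month(sample)
```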

Managing products

Creating and updating a product

Before uploading images to the catalog, you need to create a product and declare its bands. The only required attributes are a unique id, passed in the constructor, and a name:

>>> from descarteslabs.catalog import Product
>>> product = Product(id="guide-example-product")
>>> product.name = "Example product"
>>> product.save()
>>> product.id
u'descarteslabs:guide-example-product'
>>> product.created
datetime.datetime(2019, 8, 19, 18, 53, 26, 250005, tzinfo=<UTC>)

save() saves the product to the catalog in the cloud. Note that you get to choose an id for your product, but it must be unique within your organization (you get an exception if it’s not). This code example assumes the user is in the “descarteslabs” organization. On save, the id is prefixed with the organization id, so an id that is unique within your organization is also globally unique. If you are not part of an organization, the prefix will be your unique user id.
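The prefixing rule can be sketched as a small pure function. The helper namespaced_id is hypothetical and not part of the client; in particular, leaving an already prefixed id unchanged is an assumption made for this sketch.

```python
def namespaced_id(product_id, org=None, user_id=None):
    """Hypothetical helper mirroring the prefixing rule described above."""
    # Prefer the organization id; fall back to the user id.
    namespace = org if org is not None else user_id
    if namespace is None:
        raise ValueError("either org or user_id is required")
    prefix = namespace + ":"
    # Assumption: an id that already carries the prefix is left unchanged.
    if product_id.startswith(prefix):
        return product_id
    return prefix + product_id


org_scoped = namespaced_id("guide-example-product", org="descarteslabs")
user_scoped = namespaced_id("guide-example-product", user_id="abc123")
```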

Every object has a read-only created attribute with the timestamp from when it was first saved.

There are a few more attributes that you can set (see the Product API reference). You can update the product to define the timespan that it covers. This is as simple as assigning attributes and then saving again:

>>> product.start_datetime = "2012-01-01"
>>> product.end_datetime = "2015-01-01"
>>> product.save()
>>> product.start_datetime
datetime.datetime(2012, 1, 1, 0, 0, tzinfo=<UTC>)
>>> product.modified
datetime.datetime(2019, 8, 19, 18, 53, 27, 114274, tzinfo=<UTC>)

A read-only modified attribute exists on all objects and is updated on every save.

Note that all timestamp attributes are represented as datetime instances in UTC. You may assign strings to timestamp attributes if they can be reasonably parsed as timestamps. Once the object is saved the attributes will appear as parsed datetime instances. If a timestamp has no explicit timezone, it’s assumed to be in UTC.

Creating bands

Before adding any images to a product you should create bands that declare the structure of the data shared among all images in a product.

>>> from descarteslabs.catalog import SpectralBand, DataType, Resolution, ResolutionUnit
>>> band = SpectralBand(name="blue", product=product)
>>> band.data_type = DataType.UINT16
>>> band.data_range = (0, 10000)
>>> band.display_range = (0, 4000)
>>> band.resolution = Resolution(unit=ResolutionUnit.METERS, value=60)
>>> band.band_index = 0
>>> band.save()
>>> band.id
u'descarteslabs:guide-example-product:blue'

A band is uniquely identified by its name and product. The full id of the band is composed of the product id and the name.

The band defines where its data is found in the files attached to images in the product: In this example, band_index = 0 indicates that blue is the first band in the image file, and that first band is expected to be represented by unsigned 16-bit integers (DataType.UINT16).

This band is specifically a SpectralBand, with pixel values representing measurements somewhere in the visible/NIR/SWIR electro-optical wavelength spectrum, so you can also set additional attributes to locate it on the spectrum:

>>> # These values are in nanometers (nm)
>>> band.wavelength_nm_min = 452
>>> band.wavelength_nm_max = 512
>>> band.save()

Bands are created and updated in the same way as products and all other Catalog objects.

Band types

It’s common for many products to have an alpha band, which masks pixels in the image that don’t have valid data:

>>> from descarteslabs.catalog import MaskBand
>>> alpha = MaskBand(name="alpha", product=product)
>>> alpha.is_alpha = True
>>> alpha.data_type = DataType.UINT16
>>> alpha.resolution = band.resolution
>>> alpha.band_index = 1
>>> alpha.save()

Here the “alpha” band is created as a MaskBand, which is by definition a binary band with a data range from 0 to 1, so there is no need to set the data_range and display_range attributes.

Setting is_alpha to True enables special behavior for this band during rastering. If this band appears as the last band in a raster operation (such as SceneCollection.mosaic or SceneCollection.stack in the scenes client), pixels with a value of 0 in this band will be treated as transparent.
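To picture what "transparent" means here, the following sketch mosaics two tiny single-band layers with their alpha masks, using nested lists. It is an illustration only: the mosaic function is made up, and the layer ordering (first layer wins) is an assumption, not the platform's documented behavior.

```python
def mosaic(layers):
    """layers: list of (data, alpha) 2D lists, highest priority first.

    A pixel is taken from the first layer whose alpha value is non-zero;
    pixels that are transparent in every layer stay None.
    """
    rows, cols = len(layers[0][0]), len(layers[0][0][0])
    out = [[None] * cols for _ in range(rows)]
    for data, alpha in layers:
        for r in range(rows):
            for c in range(cols):
                if out[r][c] is None and alpha[r][c] != 0:
                    out[r][c] = data[r][c]
    return out


top = ([[1, 2], [3, 4]], [[1, 0], [0, 1]])
bottom = ([[9, 9], [9, 9]], [[1, 1], [0, 1]])
combined = mosaic([top, bottom])
```

The pixel at row 0, column 1 is masked in the top layer, so the bottom layer shows through; the pixel at row 1, column 0 is masked in both layers and stays empty.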

There are five band types which may have some attributes specific to them. The type of a band does not necessarily affect how it is rastered, it mainly conveys useful information about the data it contains.

Access control

By default only the creator of a product can read and modify it as well as read and modify the images in it. To share access to a product with others you can modify its access control lists (ACLs):

>>> product.readers = ["org:descarteslabs"]
>>> product.writers = ["email:jane.doe@descarteslabs.com", "email:john.daly@gmail.com"]
>>> product.save()

For more details on access control lists, see the Sharing Resources guide.

This gives read access to the whole “descarteslabs” organization. All users in that organization can now find the product. This also gives write access to two specific users identified by email. These two users can now update the product and add new images to it.

New bands and images created in a product inherit the product’s ACLs by default, but the ACLs for existing images are not automatically updated when they change on the product.

You can change the ACLs for all bands and images associated with a given product using update_related_objects_permissions(). This method kicks off an asynchronous task that performs the updates. If the product has more than 10,000 associated images, this might take several minutes to finish running. You can get the current status of the job using get_update_permissions_task_status() or wait for the task to complete using wait_for_completion().

This sets the ACLs for all bands and images in product to those of the product and waits for the update to complete:

>>> status = product.update_related_objects_permissions(readers=product.readers, writers=product.writers)
>>> if status:
...     status.wait_for_completion()

Derived bands

A derived band is the result of a pixel function applied to one or more existing bands of a product. Derived bands become available on a product automatically when canonically named bands it relies on are present in the product. For example, the derived:ndvi band provides the normalized difference vegetation index (NDVI) if a product has bands named red and nir:

>>> from descarteslabs.catalog import DerivedBand
>>>
>>> ndvi = DerivedBand.get("derived:ndvi")
>>> ndvi.description
'Normalized Difference Vegetation Index'
>>> ndvi.bands
['nir', 'red']

The id and name of a derived band always have a derived: prefix to distinguish them clearly from bands declared in a product. The catalog provides a standard set of derived bands - you can’t create your own.
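As a reminder of what a pixel function looks like, the standard NDVI definition is (nir - red) / (nir + red). The sketch below applies it to single pixel values. It illustrates the general formula only; how the catalog's derived:ndvi band scales or encodes its output is not covered here.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel.

    Returns None when both inputs are zero, where the index is undefined.
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


vegetated = ndvi(0.5, 0.1)  # strong near-infrared reflectance
neutral = ndvi(0.3, 0.3)    # equal reflectance in both bands
undefined = ndvi(0, 0)
```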

The bands attribute defines the band names that must be present in a product for this derived band. Find all derived bands available for a product with Product.derived_bands():

>>> landsat8_collection1 = Product.get("landsat:LC08:01:RT:TOAR")
>>> list(landsat8_collection1.derived_bands())
[
DerivedBand: derived:bai
  id: derived:bai,
DerivedBand: derived:evi
  id: derived:evi,
DerivedBand: derived:ndvi
  id: derived:ndvi,
DerivedBand: derived:ndwi
  id: derived:ndwi,
DerivedBand: derived:ndwi1
  id: derived:ndwi1,
DerivedBand: derived:ndwi2
  id: derived:ndwi2,
DerivedBand: derived:rsqrt
  id: derived:rsqrt,
DerivedBand: derived:visual_cloud_mask
  id: derived:visual_cloud_mask]

Deleting bands and products

All objects can be deleted using delete(). For example, delete the previously created alpha band:

>>> alpha.delete()
True

A product can only be deleted if it doesn’t have any bands or images. Because the product we created still has one band this fails:

>>> product.delete()
Traceback (most recent call last):
  File "< chunk 24 named None >", line 1, in <module>
  File "descarteslabs/catalog/catalog_base.py", line 450, in delete
    r = self._client.session.delete(self._url + "/" + self.id)
  File "requests/sessions.py", line 615, in delete
    return self.request('DELETE', url, **kwargs)
  File "descarteslabs/client/services/service/service.py", line 74, in request
    raise ConflictError(resp.text)
ConflictError: {"errors":[{"detail":"One or more related objects exist","status":"409","title":"Related objects exist"}],"jsonapi":{"version":"1.0"}}

There is a convenience method to delete all bands and images in a product. Be careful as this may delete a lot of data and can’t be undone!

>>> status = product.delete_related_objects()

This kicks off a job that deletes bands and images in the background. You can wait for this to complete and then delete the product:

>>> if status:
...     status.wait_for_completion()
...     product.delete()

Finding Products by id

You may have noticed that when creating products, the id you provide isn’t the id that is assigned to the object.

>>> product = Product(id="guide-example-product")
>>> product.name = "Example product"
>>> product.save()
>>> product.id
"descarteslabs:guide-example-product"

The id has a prefix added to ensure uniqueness without requiring you to come up with a globally unique name. The downside of this is you need to remember that prefix when looking up your products later:

# this will return False because the id has a prefix!
>>> Product.exists("guide-example-product")
False

You can use namespace_id() to generate a fully namespaced product id if you know the unprefixed part.

# this returns the id with the prefix added
>>> product_id = Product.namespace_id("guide-example-product")
>>> product_id
"descarteslabs:guide-example-product"

Managing images

Apart from searching and discovering data available to you, the main use case of the catalog is to let you upload new images.

Uploading image files

If your data already exists on disk as an image file, usually a GeoTIFF or JPEG file, you can upload it directly.

In the following examples we will upload data with a single band representing the blue light spectrum. First let’s create a product and band corresponding to that:

>>> # Create a product
>>> from descarteslabs.catalog import Band, DataType, Image, Product, Resolution, ResolutionUnit, SpectralBand
>>> product = Product(id="guide-example-product", name="Example product")
>>> product.save()
>>>
>>> # Create a band
>>> band = SpectralBand(name="blue", product=product)
>>> band.data_type = DataType.UINT16
>>> band.data_range = (0, 10000)
>>> band.display_range = (0, 4000)
>>> band.resolution = Resolution(unit=ResolutionUnit.METERS, value=60)
>>> band.band_index = 0
>>> band.save()

Now Image.upload uploads images to the new product and returns an ImageUpload. Images are uploaded and processed asynchronously, so they are not available in the catalog immediately. With ImageUpload.wait_for_completion we wait until the upload is completely finished.

>>> # Set any attributes that should be set on the uploaded images
>>> image = Image(product=product, name="scene1")
>>> image.acquired = "2012-01-02"
>>> image.cloud_fraction = 0.1
>>>
>>> # Do the upload
>>> image_path = "docs/guides/blue.tif"
>>> upload = image.upload(image_path)
>>> upload.wait_for_completion()
>>> upload.status
u'success'

Attributes that can be derived from the image file, such as the georeferencing, will be assigned to the image during the upload process. But you can set any additional Image attributes such as acquired and cloud_fraction here.

Note that this code makes a number of assumptions:

  • A GeoTIFF exists locally on disk at the path docs/guides/blue.tif from the current directory.

  • The GeoTIFF’s first band matches the blue band we created (for example, it has an unsigned 16-bit integer data type).

  • The GeoTIFF is correctly georeferenced.

Image uploads use Descartes Labs Storage behind the scenes. You can find the uploaded file using the product id as a prefix in the products storage type:

>>> import descarteslabs as dl
>>> storage_client = dl.Storage()
>>> storage_client.list(prefix=product.id, storage_type="products")
[]

Uploading ndarrays

Often, when creating a derived product - for example, by running a classification model on existing data - you’ll have a NumPy array (often referred to as an “ndarray”) in memory instead of a file written to disk. In that case, you can use Image.upload_ndarray. This method behaves like Image.upload, with one key difference: you must provide georeferencing attributes for the ndarray.

Georeferencing attributes are used to map between geospatial coordinates (such as latitude and longitude) and their corresponding pixel coordinates in the array. The required attributes are:

  • An affine geotransform in GDAL format (the geotrans attribute)

  • A coordinate reference system definition, preferably as an EPSG code (the cs_code attribute) or alternatively as a string in PROJ.4 or WKT format (the projection attribute)
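To make the geotransform concrete: a GDAL-style affine geotransform is a six-element sequence that maps a pixel's column and row to the geospatial coordinates of that pixel's upper-left corner. The sketch below applies the standard GDAL formula; the function name pixel_to_geo and the sample values are made up for illustration.

```python
def pixel_to_geo(geotrans, col, row):
    """Map pixel (col, row) to geospatial (x, y) with a GDAL geotransform.

    geotrans = (x_origin, pixel_width, row_rotation,
                y_origin, column_rotation, pixel_height)
    pixel_height is negative for the usual north-up images.
    """
    x = geotrans[0] + col * geotrans[1] + row * geotrans[2]
    y = geotrans[3] + col * geotrans[4] + row * geotrans[5]
    return x, y


# Hypothetical north-up image with 60 m pixels in a projected CRS.
geotrans = (500000.0, 60.0, 0.0, 4650000.0, 0.0, -60.0)
origin = pixel_to_geo(geotrans, 0, 0)
offset = pixel_to_geo(geotrans, 10, 5)
```

Moving 10 columns to the right adds 600 m to x, and moving 5 rows down subtracts 300 m from y.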

If the ndarray you’re uploading was rastered through the platform, this information is easy to get. When rastering you also receive a dictionary of metadata that includes both of these parameters. With Scene.ndarray, you have to set raster_info=True; with Raster.ndarray, it’s always returned.

The following example puts these pieces together. This extracts the blue band from a Landsat 8 scene at a lower resolution and uploads it to our product:

>>> from descarteslabs.catalog import OverviewResampler
>>>
>>> scene, geoctx = dl.scenes.Scene.from_id("landsat:LC08:01:T1:TOAR:meta_LC08_L1TP_163068_20181025_20181025_01_T1_v1")
>>> ndarray, raster_meta = scene.ndarray(
...     "blue",
...     geoctx.assign(resolution=60),
...     # return georeferencing info we need to re-upload
...     raster_info=True
... )
...
>>> image2 = Image(product=product, name="scene2")
>>> image2.acquired = "2012-01-02"
>>> upload2 = image2.upload_ndarray(
...     ndarray,
...     raster_meta=raster_meta,
...     # create overviews for 120m and 240m resolution
...     overviews=[2, 4],
...     overview_resampler=OverviewResampler.AVERAGE,
... )
...
>>> upload2.wait_for_completion()
>>> upload2.status
u'success'

The rastered ndarray here is a three-dimensional array in the shape (band, x, y) - the first axis corresponds to the band number. Image.upload_ndarray expects an array in that shape and will raise a warning if it thinks the shape of the array is wrong. If the given array is two-dimensional, it will assume you’re uploading a single-band image.

This also specifies typically useful values for overviews and overview_resampler. Overviews allow the platform to raster your image faster at non-native resolutions, at the cost of more storage and a longer initial upload processing time to calculate the overviews.

The overviews argument specifies a list of up to 16 different resolution magnification factors to calculate overviews for. For example, overviews=[2,4] calculates two overviews at 2x and 4x the native resolution. The overview_resampler argument specifies the algorithm to use when calculating overviews; see Image.upload_ndarray for which algorithms can be used.
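The relationship between overview factors and resolutions, and the idea behind an averaging resampler, can be sketched in plain Python. Both helpers are hypothetical and work on nested lists; the platform's actual overview generation is not shown here.

```python
def overview_resolutions(native_resolution, factors):
    """Resolution of each overview for the given magnification factors."""
    return [native_resolution * factor for factor in factors]


def average_downsample(band, factor):
    """Shrink a 2D band by averaging each factor x factor block of pixels."""
    rows, cols = len(band) // factor, len(band[0]) // factor
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            block = [
                band[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


resolutions = overview_resolutions(60, [2, 4])
band = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 6, 6],
    [2, 2, 6, 6],
]
half_size = average_downsample(band, 2)
```

With a 60 m native resolution, factors 2 and 4 correspond to the 120 m and 240 m overviews mentioned in the upload example.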

Updating images

The image created in the previous example is now available in the Catalog. We can look it up and update any of its attributes like any other catalog object:

>>> image2 = Image.get(image2.id)
>>> image2.cloud_fraction = 0.2
>>> image2.save()

To update the underlying file data, upload an ndarray again using the same image id.

Tags & extra properties

The image attributes you can set, filter by and sort on are documented on the Image class. If you have other structured metadata to attach to your images, you can use extra_properties:

>>> image2.extra_properties = {
...     "processing_time": 120,
...     "quality": 0.5,
...     "reviewer": "joe@acme.com",
... }
...
>>> image2.save()

extra_properties is a dictionary with string keys and values of any type that can be JSON-serialized (booleans, numbers, strings, lists, dictionaries).
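A quick local check can catch values that would not survive JSON serialization before you call save(). This is a minimal sketch using the standard json module; the helper invalid_extra_properties is made up, and the catalog may apply further validation of its own.

```python
import json
from datetime import datetime


def invalid_extra_properties(props):
    """Return the keys whose key or value would not fit extra_properties."""
    bad = []
    for key, value in props.items():
        if not isinstance(key, str):
            bad.append(key)
            continue
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            bad.append(key)
    return bad


good = {"processing_time": 120, "quality": 0.5, "reviewer": "joe@acme.com"}
problematic = {"reviewed_at": datetime(2019, 8, 19), "ids": {1, 2}, 7: "x"}
good_issues = invalid_extra_properties(good)
bad_issues = invalid_extra_properties(problematic)
```

Values such as datetime instances or sets need to be converted first, for example to an ISO 8601 string or a list.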

Note that you cannot filter or sort images by extra_properties. Use tags if you have a finite set of discrete custom values you’d like to filter by:

>>> image2.tags = ["temporary", "guide"]
>>> image2.save()
>>>
>>> # Find all images in the product tagged "temporary"
>>> search = product.images().filter(p.tags == "temporary")
>>> list(search)
[
Image:
  id: descarteslabs:guide-example-product:scene2
  product: descarteslabs:guide-example-product
  created: Mon Aug 19 18:53:43 2019]

Troubleshooting uploads

The ImageUpload returned from Image.upload and Image.upload_ndarray provides status information on the image upload.

In the following example we upload an invalid file (it’s empty), so we expect the upload to fail.

>>> import tempfile
>>> invalid_image_path = tempfile.mkstemp()[1]
>>> with open(invalid_image_path, "w"): pass
>>>
>>> image3 = Image(product=product, name="scene3", acquired="2012-03-01")
>>> upload3 = image3.upload(invalid_image_path)
>>> upload3.status
u'pending'
>>>
>>> upload3.wait_for_completion()
>>> upload3.status
u'failure'

You can also list any past upload results with Product.image_uploads() and Image.image_uploads(). Note that upload results are currently not stored indefinitely, so you may not have access to the full history of uploads for a product or image.

>>> for upload in product.image_uploads():
...     print(upload.id, upload.status)
...
ZGVzY2FydGVzbGFiczpndWlkZS1leGFtcGxlLXByb2R1Y3Q=:blue.tif success
ZGVzY2FydGVzbGFiczpndWlkZS1leGFtcGxlLXByb2R1Y3Q=:tmpE8E2xf.tif success
ZGVzY2FydGVzbGFiczpndWlkZS1leGFtcGxlLXByb2R1Y3Q=:tmpOT6LF0 failure

Remote images

In addition to hosting rasterable images with file data attached, the catalog also supports images where the underlying raster data is not directly available. These remote images cannot be rastered but can be searched for using the catalog. This is useful for a couple of scenarios:

  • A product of images that have not been consistently processed, optimized or georegistered, which prevents them from being rastered by the platform, for example raw imagery taken in unprocessed form from a sensor. Such a product can serve as the basis for higher-level products that have been processed consistently from the raw imagery.

  • A product of images for which file data exist somewhere outside the platform but has not been uploaded or only partly uploaded into the platform. This gives users the chance to browse the full metadata of images and then make decisions about what file data should be uploaded on demand.

To create a remote image, set storage_state to "remote". The only required attributes for remote images are acquired and geometry to anchor them in time and space. No bands are required for a product holding only remote images.

>>> from descarteslabs.catalog import Product, Image, StorageState
>>> product = Product(id="guide-example-raw", name="Raw product")
>>> product.save()
>>>
>>> geometry = {
...     "type": "Polygon",
...     "coordinates": [[
...         [7.488099932670593, 46.95386728954941],
...         [7.488352060317992, 46.953656742419255],
...         [7.488429844379425, 46.953916722233814],
...         [7.488099932670593, 46.95386728954941]
...     ]]
... }
...
>>> image = Image(product=product, name="raw-image")
>>> image.storage_state = StorageState.REMOTE
>>> image.acquired = "2018-04-12"
>>> image.geometry = geometry
>>> image.save()

If some form of URL referencing the remote image is available, attach it through the files attribute using a File:

>>> from descarteslabs.catalog import File
>>> image.files = [File(href="http://remote.server.com/path/image.tiff")]
>>> image.save()