Catalog

The Catalog API enables you to upload your own raster data to the Descartes Labs Platform. When you upload data via the Catalog, you are able to interact with the data like any other product on the platform. You can search for your data using the Metadata API, retrieve the data with Raster, and explore the data using the Viewer user interface.

There are two common use cases for Catalog:
  • uploading derived products
  • uploading existing data to a new product

In the first case, you create a derived product from data available through the platform, then use the Catalog API to upload the new product back to the platform. For example, let's say you have developed an image classification model. You might use the Metadata API to search for satellite imagery, the Scenes API to retrieve the imagery, and the Tasks API to scale the model across a large area. With Catalog, you can upload your new image classifications to the platform as they're created so that you can visualize and interact with them.

In the second case, you may simply want to upload existing data that you have access to but that isn't hosted by Descartes Labs.

In either case, uploading a product to the Catalog requires either an image file in GeoTIFF or JPEG2000 format, or georeferenced ndarray data. HDF is currently not a supported file format.

Basic Example

In [1]: import descarteslabs as dl

In [2]: import os

In [3]: import uuid

In [4]: import geojson

In [5]: from random import randint

In [6]: from time import sleep

In [7]: from datetime import datetime

In [8]: catalog_client = dl.Catalog()

In [9]: metadata_client = dl.Metadata()

# used to generate unique product IDs
In [10]: new_id = str(uuid.uuid4())

In [11]: new_id
Out[11]: 'fb04a008-d652-43eb-b482-5e4b5ede2636'

Create a product

Before uploading data to the platform, you need to create a product and its bands. You can also supply optional metadata about the product, including number of bands, resolution, and revisit rate, and grant read access to other users or groups. By default, a new product and its bands are available only to you.

When creating a product, you provide a product_id parameter that must be unique among all the products you've created. The platform assigns a full ID by combining your unique user ID with the product ID you provided.

In [12]: product_id = catalog_client.add_product(
   ....:     'example-product' + new_id,            # unique ID that you generate
   ....:     title='Example Product',
   ....:     description='An example product'
   ....: )['data']['id']
   ....: 

In [13]: product_id                                 # unique ID created by platform
Out[13]: u'd4ef22d5a6969cb61147ec8ea3e060cdf33e1a49:example-productfb04a008-d652-43eb-b482-5e4b5ede2636'

Create band(s)

After creating the product, you need to add one or more bands to it using Catalog.add_band. You need to specify a number of band metadata properties, including: the data type, the number of bits used to store band data, the minimum and maximum values in the band, which band from the input file the band's data is mapped to, color maps, optimal scaling for visualization, and any other relevant metadata properties.

In [14]: band_id = catalog_client.add_band(
   ....:     product_id=product_id, # id of the product we just created
   ....:     name='blue',           # unique name to describe what the band encodes
   ....:     srcfile=0,             # if data for a single scene will be in multiple files, which file this band will be in (0-indexed)
   ....:     srcband=1,             # 1-based index indicating which band in the file or ndarray this band is
   ....:     nbits=14,
   ....:     dtype='UInt16',
   ....:     data_range=[0, 10000], # minimum and maximum data values in the band
   ....:     physical_range=[0,1],  # minimum and maximum values the data *represents*, e.g. [0, 10000] might represent [0, 1]
   ....:     type='spectral'
   ....: )['data']['id']
   ....: 

In [15]: band_id
Out[15]: u'd4ef22d5a6969cb61147ec8ea3e060cdf33e1a49:example-productfb04a008-d652-43eb-b482-5e4b5ede2636:blue'

Note

See the note on ‘Data range vs physical range’ here to learn more about these parameters.
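As a rough illustration of how the blue band's parameters relate, assuming the platform maps stored values linearly from data_range onto physical_range, the sketch below converts a few stored values and checks that data_range fits in nbits. The to_physical helper is hypothetical and not part of the Catalog API.

```python
def to_physical(value, data_range, physical_range):
    """Linearly map a stored pixel value from data_range onto physical_range.

    Hypothetical helper: assumes a linear relationship between the two ranges.
    """
    d_min, d_max = data_range
    p_min, p_max = physical_range
    return p_min + (value - d_min) * (p_max - p_min) / (d_max - d_min)


# Parameters used for the 'blue' band above
nbits = 14
data_range = [0, 10000]
physical_range = [0, 1]

# The largest stored value must fit in nbits unsigned bits (2**14 - 1 = 16383)
assert data_range[1] <= 2 ** nbits - 1

print(to_physical(5000, data_range, physical_range))   # 0.5
print(to_physical(10000, data_range, physical_range))  # 1.0
```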

It's also possible to create, edit, and delete both products and bands through the Catalog API.

Uploading Data to the Catalog

When your data already exists on disk as a GeoTIFF or JPEG2000 file, you can use Catalog.upload_image. This method uploads the image metadata and the image data in one call. Images are processed by the platform asynchronously, so no information about the image is returned. To determine when your image has been processed, you can poll Scene.from_id, or use Catalog.iter_upload_results or Catalog.upload_result.

In [16]: image_path = os.path.join(os.getcwd(), 'guides', 'blue.tif')

In [17]: catalog_client.upload_image(image_path, product_id)

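# Poll for the processed image. Its ID is the product ID plus the image key,
# which here the platform derives from the file name ('blue.tif' -> 'blue')
In [18]: processed_image_id = '{}:{}'.format(product_id, 'blue')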

In [19]: while True:
   ....:     try:
   ....:         image, ctx = dl.scenes.Scene.from_id(processed_image_id)
   ....:         break
   ....:     except Exception:
   ....:         sleep(2)
   ....: 

In [20]: image
Out[20]: 
Scene "d4ef22d5a6969cb61147ec8ea3e060cdf33e1a49:example-productfb04a008-d652-43eb-b482-5e4b5ede2636:blue"
  * Product: "d4ef22d5a6969cb61147ec8ea3e060cdf33e1a49:example-productfb04a008-d652-43eb-b482-5e4b5ede2636"
  * CRS: "+proj=utm +zone=43 +datum=WGS84 +units=m +no_defs "
  * Date: Tue Nov 27 17:33:42 2018
  * Bands:
    * blue: UInt16, [0, 10000] -> [0, 1]
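The polling loop above retries forever and silently swallows every exception, so a failed upload would hang the session. A more defensive sketch wraps the lookup in a retry helper with a timeout. wait_for is a hypothetical helper written for this example, not part of the Descartes Labs client.

```python
import time


def wait_for(fetch, timeout=600, interval=2, exceptions=(Exception,)):
    """Call fetch() until it succeeds or `timeout` seconds have elapsed.

    Hypothetical helper. Returns fetch()'s result; once the deadline has
    passed, the most recent exception is re-raised instead of retrying.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return fetch()
        except exceptions:
            if time.monotonic() >= deadline:
                raise  # give up and surface the last error
            time.sleep(interval)


# Usage with the example above (requires the Descartes Labs client):
# image, ctx = wait_for(lambda: dl.scenes.Scene.from_id(processed_image_id))
```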

Note about alpha bands and nodata values

When creating new products, you should set a nodata value or include an alpha band with the product, especially if you want to clip the product by shape. An explicit nodata value or an alpha band isn't strictly required, but without one, meaningful 0 values can be confused with no-data pixels that are represented as 0s.
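A small NumPy sketch of the confusion described above. The 3x3 band and the NODATA value of 65535 are hypothetical; the code only illustrates why an explicit nodata value matters, not how the platform applies it internally.

```python
import numpy as np

# Hypothetical 3x3 band: 0 is a real measurement here, and 65535 is an
# explicit, otherwise-unused nodata value marking pixels outside the footprint.
NODATA = 65535
band = np.array(
    [[0, 120, 130],
     [0, 140, NODATA],
     [NODATA, NODATA, NODATA]],
    dtype=np.uint16,
)

# Masking the declared nodata value keeps the real zeros in the statistics.
masked = np.ma.masked_equal(band, NODATA)
print(masked.count())  # 5 valid pixels
print(masked.mean())   # (0 + 120 + 130 + 0 + 140) / 5 = 78.0

# If the fill pixels had been written as 0 with no nodata value declared,
# they would be indistinguishable from the real zeros.
ambiguous = np.where(band == NODATA, 0, band)
print(np.count_nonzero(ambiguous == 0))  # 6 zeros: 2 real, 4 fill
```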

Working with the data

Once the image has been processed, you can work with it like any other image.

Viewing the image

In [21]: arr = image.ndarray('blue', ctx, mask_alpha=False)

In [22]: dl.scenes.display(arr)
https://cdn.descarteslabs.com/docs/public/_images/catalog-22.jpg

Modifying the bands

It's possible to modify bands, for example to add a colormap for nicer visualization. These changes can take a few minutes to take effect.

In [23]: response = catalog_client.change_band(product_id,
   ....:                                       band_id,
   ....:                                       colormap_name='magma')
   ....: 

If you want to add your own color map, you first need to put it in a form the platform can understand: a list with one entry per pixel value, where the entry at each index is the color for the pixel value equal to that index. In the example below, for an integer data type, each entry is a list of four color channel values, each from 0 to 255:

In [24]: bad_colormap = [[str(randint(0, 255)) for i in range(4)] for i in range(256)]

In [25]: response = catalog_client.change_band(product_id,
   ....:                                       band_id,
   ....:                                       colormap=bad_colormap)
   ....: 
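The random colormap above is only a placeholder. A more useful sketch builds a linear gradient between two colors in the same format (256 entries, each a list of four strings), assuming that is the format change_band expects, as in the example. gradient_colormap is a hypothetical helper.

```python
def gradient_colormap(start, end, n=256):
    """Build an n-entry colormap fading linearly from `start` to `end`.

    Hypothetical helper. `start` and `end` are 4-tuples of ints in 0-255.
    Each entry is a list of four strings, matching bad_colormap above.
    """
    colormap = []
    for i in range(n):
        t = i / (n - 1)  # 0.0 at the first entry, 1.0 at the last
        colormap.append(
            [str(int(round(s + (e - s) * t))) for s, e in zip(start, end)]
        )
    return colormap


# Transparent black to opaque blue
blue_ramp = gradient_colormap((0, 0, 0, 0), (0, 0, 255, 255))
print(len(blue_ramp))  # 256
print(blue_ramp[0])    # ['0', '0', '0', '0']
print(blue_ramp[-1])   # ['0', '0', '255', '255']

# Usage (requires the client and the band created earlier):
# catalog_client.change_band(product_id, band_id, colormap=blue_ramp)
```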

Sharing the product

If you decide that your new product might be useful to others, you can share it, along with its bands and images.

In [26]: response = catalog_client.change_product(product_id,
   ....:                                          read=['some:group'],
   ....:                                          set_global_permissions=True)
   ....: 

Making all the data available to the group can take some time. The changes are made in the background, so they won't block you while they're in progress. If you don't set set_global_permissions=True, others can see the metadata about the product itself, but those permissions aren't applied to the bands or images it contains, so others won't have access to them.

Adding Custom Metadata

You can add custom metadata to an image when you upload it with Catalog.upload_image, or afterward with Catalog.change_image. This metadata is added to the auto-generated image metadata during processing. The acquisition date (acquired) can't be inferred accurately, so you should set this property yourself. If you don't provide it, the platform uses the update time of the supplied file, which probably doesn't reflect the image's real acquisition time.

In [27]: catalog_client.change_image(product_id,
   ....:                             processed_image_id,
   ....:                             acquired=datetime.now().isoformat())
   ....: 
Out[27]: 
{u'data': {u'attributes': {u'acquired': u'2018-11-27T17:33:55.960134+00:00',
   u'bucket': u'storage-d4ef22d5a6969cb61147ec8ea3e060cdf33e1a49-products',
   u'descartes_version': u'dl-platform-ingest',
   u'directory': u'd4ef22d5a6969cb61147ec8ea3e060cdf33e1a49:example-productfb04a008-d652-43eb-b482-5e4b5ede2636/',
   u'file_md5s': [u'1cbec99bdcd2b4d24fc646deb39342f3'],
   u'file_sizes': [1503001],
   u'files': [u'1cbec99bdcd2b4d24fc646deb39342f3.tif'],
   u'geometry': {u'coordinates': [[[73.4682704195585, 50.510649275790676],
      [73.72290871796778, 50.51372144948099],
      [73.72602088693354, 50.39800582255859],
      [73.47200258138672, 50.39494617828006],
      [73.4682704195585, 50.510649275790676]]],
    u'type': u'Polygon'},
   u'geotrans': [391395.0, 15.0, 0.0, 5596530.0, 0.0, -15.0],
   u'identifier': u'd4ef22d5a6969cb61147ec8ea3e060cdf33e1a49:example-productfb04a008-d652-43eb-b482-5e4b5ede2636/blue.tif',
   u'key': u'blue',
   u'owners': [u'user:d4ef22d5a6969cb61147ec8ea3e060cdf33e1a49'],
   u'processed': u'2018-11-27T17:33:51.016043+00:00',
   u'proj4': u'+proj=utm +zone=43 +datum=WGS84 +units=m +no_defs ',
   u'raster_size': [1204, 858],
   u'read': [u'group:some:group'],
   u'writers': []},
  u'id': u'd4ef22d5a6969cb61147ec8ea3e060cdf33e1a49:example-productfb04a008-d652-43eb-b482-5e4b5ede2636:blue',
  u'meta': {u'modified': u'2018-11-27T17:33:56.034899+00:00',
   u'owner_type': u'user'},
  u'relationships': {u'product': {u'data': {u'id': u'd4ef22d5a6969cb61147ec8ea3e060cdf33e1a49:example-productfb04a008-d652-43eb-b482-5e4b5ede2636',
     u'type': u'product'}}},
  u'type': u'image'}}
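The example above uses datetime.now().isoformat(), which is a naive timestamp with no UTC offset, and in any case isn't the real acquisition time. Below is a sketch of a more explicit value using only the standard library; the date is hypothetical and stands in for your image's actual acquisition time.

```python
from datetime import datetime, timezone

# datetime.now() is naive: the resulting string carries no UTC offset,
# so it doesn't say which timezone the time is in.
naive = datetime.now()
print(naive.tzinfo)  # None

# A timezone-aware timestamp states its offset explicitly.
aware_now = datetime.now(timezone.utc).isoformat()

# For real imagery, use the true acquisition time.
# (Hypothetical date, for illustration only.)
acquired = datetime(2018, 10, 25, 5, 30, tzinfo=timezone.utc).isoformat()
print(acquired)  # 2018-10-25T05:30:00+00:00

# Usage (requires the client and the image created earlier):
# catalog_client.change_image(product_id, processed_image_id, acquired=acquired)
```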

The following metadata properties are inferred from the image file, and are overwritten by any values you supply:

  • acquired
  • key
  • proj4
  • geotrans
  • descartes_version
  • geometry
  • raster_size
  • bucket
  • directory
  • processed
  • files
  • file_sizes
  • file_md5s
  • identifier
  • owner
  • product

Cleaning Up

You can delete products, but keep in mind that a product can't be removed while it has any images or bands attached to it. Passing cascade=True removes all of the product's bands and images along with the product.

In [28]: catalog_client.remove_product(product_id, cascade=True)
Out[28]: {u'deletion_task': u'eRNNnh1FQYCrKCke-F93Gw:31643398'}

Uploading Multi-Image Scenes

If the band data for a single scene is stored in separate files, you can upload those files into the same product as separate bands.

First create the bands, as usual. Note that the srcfile parameter value is different for each band, because each band's data comes from a different file.

In [29]: product_id = catalog_client.add_product(
   ....:     'masks' + new_id,
   ....:     title='Cloud and Brightness masks',
   ....:     description='Cloud and Brightness masks'
   ....: )['data']['id']
   ....: 

In [30]: band0_id = catalog_client.add_band(
   ....:     product_id=product_id,
   ....:     name='cloud-mask',
   ....:     jpx_layer=0,
   ....:     srcfile=0,  # first file
   ....:     srcband=1,  # first band of first file
   ....:     nbits=1,
   ....:     dtype='UInt16',
   ....:     nodata=0,
   ....:     data_range=[0, 1],
   ....:     type='mask',
   ....: )['data']['id']
   ....: 

In [31]: band1_id = catalog_client.add_band(
   ....:     product_id=product_id,
   ....:     name='bright-mask',
   ....:     jpx_layer=0,
   ....:     srcfile=1,   # second file
   ....:     srcband=1,   # first band of second file
   ....:     nbits=1,
   ....:     dtype='UInt16',
   ....:     nodata=0,
   ....:     data_range=[0, 1],
   ....:     type='mask',
   ....: )['data']['id']
   ....: 

Now upload the image files, setting multi=True. Be sure to order the files in the list according to the srcfile values of the bands you created. For example, the cloud-mask band has srcfile=0 and its data is in cloud_image_path, and the bright-mask band has srcfile=1 and its data is in bright_image_path, so the list is [cloud_image_path, bright_image_path].

Also note that when using multi=True you must provide a unique value for image_id.
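Because the upload order must match the srcfile values, it can help to build the file list from the band definitions instead of by hand. The band_files mapping and files_in_srcfile_order helper below are hypothetical, written for this example; they only sort paths by srcfile and check that the indices start at 0 with no gaps.

```python
# Hypothetical mapping of band name -> (srcfile, path), mirroring the bands
# created above. Replace the paths with your own files.
band_files = {
    "cloud-mask": (0, "guides/cloud-mask.tif"),
    "bright-mask": (1, "guides/bright-mask.tif"),
}


def files_in_srcfile_order(band_files):
    """Return file paths ordered by srcfile (hypothetical helper).

    Several bands may share one file (same srcfile, different srcband);
    srcfile values must be contiguous and start at 0.
    """
    by_index = {}
    for name, (srcfile, path) in band_files.items():
        if by_index.get(srcfile, path) != path:
            raise ValueError("srcfile {} maps to two different files".format(srcfile))
        by_index[srcfile] = path
    if sorted(by_index) != list(range(len(by_index))):
        raise ValueError("srcfile values must be contiguous and start at 0")
    return [by_index[i] for i in range(len(by_index))]


print(files_in_srcfile_order(band_files))
# ['guides/cloud-mask.tif', 'guides/bright-mask.tif']
```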

In [32]: cloud_image_path = os.path.join(os.getcwd(), 'guides', 'cloud-mask.tif')

In [33]: bright_image_path = os.path.join(os.getcwd(), 'guides', 'bright-mask.tif')

In [34]: image_id = '_'.join(['test_multi_image_scene', str(datetime.now().isoformat())])

In [35]: catalog_client.upload_image(
   ....:     [cloud_image_path, bright_image_path],
   ....:     product_id,
   ....:     multi=True,
   ....:     image_id=image_id,
   ....: )
   ....: 

# cleaning up
In [36]: catalog_client.remove_product(product_id, cascade=True)
Out[36]: {u'deletion_task': u'OOYyIYE8R1ml7TQrsHDsPQ:31562390'}

Uploading an ndarray

Often, when creating a derived product (for example, by running a classification model on existing data), you'll have a NumPy array in memory instead of a GeoTIFF written to disk. In that case, you can use Catalog.upload_ndarray. This method behaves like Catalog.upload_image, with one key difference: you must supply georeferencing information for the ndarray, in the form of a geotransform and a coordinate reference system definition. If the ndarray you're uploading was derived from an ndarray you loaded from the platform, this information is easy to get, because the platform returns it alongside the data.

As when uploading an image, you first have to create a product and bands. When you call Catalog.upload_ndarray, the array is uploaded to the backend and then processed asynchronously, so the new data isn't available immediately.

Getting Georeferencing Information

Uploading an ndarray requires georeferencing information, which is used to map between geospatial coordinates (such as latitude and longitude) and the pixel coordinates they correspond to in the array. This requires an affine geotransform in GDAL format and a coordinate reference system definition in PROJ.4 or OGC Well-Known Text format.
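A GDAL geotransform is six numbers: x origin, pixel width, row rotation, y origin, column rotation, and pixel height (negative for north-up images). As a worked sketch, here is that mapping applied to the geotrans of the uploaded blue image shown in the metadata output earlier. The helper names are hypothetical; the formulas follow GDAL's geotransform convention, and the inverse shown handles only north-up (unrotated) transforms.

```python
def pixel_to_world(geotrans, col, row):
    """Map pixel coordinates (col, row) to CRS coordinates via a GDAL geotransform.

    (col, row) = (0, 0) is the top-left corner of the top-left pixel.
    """
    x0, px_w, rot_x, y0, rot_y, px_h = geotrans
    x = x0 + col * px_w + row * rot_x
    y = y0 + col * rot_y + row * px_h
    return x, y


def world_to_pixel(geotrans, x, y):
    """Inverse mapping, valid only for north-up geotransforms (no rotation)."""
    x0, px_w, rot_x, y0, rot_y, px_h = geotrans
    if rot_x != 0 or rot_y != 0:
        raise ValueError("rotated geotransforms need the full affine inverse")
    return (x - x0) / px_w, (y - y0) / px_h


# geotrans of the uploaded 'blue' image (UTM zone 43, 15 m pixels)
geotrans = [391395.0, 15.0, 0.0, 5596530.0, 0.0, -15.0]

print(pixel_to_world(geotrans, 0, 0))                 # (391395.0, 5596530.0)
print(pixel_to_world(geotrans, 100, 200))             # (392895.0, 5593530.0)
print(world_to_pixel(geotrans, 392895.0, 5593530.0))  # (100.0, 200.0)
```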

When you load an ndarray from the platform, you also receive a dictionary of metadata that includes both of these parameters. With Scene.ndarray, you have to set raster_info=True; with Raster.ndarray, it's always returned.

As long as you didn't change the spatial dimensions of the array, you can reuse the original georeferencing parameters when uploading your derived array: just pass the whole raster_info dictionary to Catalog.upload_ndarray as the raster_meta argument.

Note

When working with the Scenes API Scene.ndarray method, you'll get back an ndarray in the shape (band, y, x), where the bands axis comes first. Catalog.upload_ndarray expects an ndarray where the bands axis comes last, so either set bands_axis=-1 in Scene.ndarray, or use np.moveaxis(arr, 0, -1) to move the bands axis to the end before uploading.

Keep the shape of your ndarray in mind; failing to properly shape the ndarray will cause your product to have the wrong shape, or completely invalid data!
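A NumPy sketch, on a synthetic array, of why np.moveaxis (and not np.reshape) is the right tool here: both can produce the expected shape, but only moving the axis keeps each pixel's band values together.

```python
import numpy as np

# Synthetic 3-band array in the (band, y, x) order returned by Scene.ndarray
arr = np.arange(3 * 4 * 5).reshape(3, 4, 5)

# Move the bands axis to the end: (band, y, x) -> (y, x, band)
arr_yxb = np.moveaxis(arr, 0, -1)
print(arr_yxb.shape)  # (4, 5, 3)

# Pixel values are preserved: band b at (y, x) is now arr_yxb[y, x, b]
assert arr_yxb[2, 3, 1] == arr[1, 2, 3]

# np.reshape gives the same shape but reorders the underlying values,
# scrambling which value belongs to which pixel and band
scrambled = arr.reshape(4, 5, 3)
print(scrambled[2, 3, 1] == arr[1, 2, 3])  # False
```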

Uploading a single band ndarray

When you have a single band of data, be sure to upload a two-dimensional ndarray when using Catalog.upload_ndarray. All the other steps are the same as when using Catalog.upload_image.

When using Catalog.upload_ndarray, you must supply the georeferencing information, either by passing in the metadata returned from the ndarray call as raster_meta, or by explicitly setting geotrans and either proj4 or wkt_srs.
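A minimal NumPy sketch of getting a 2D array out of single-band results, with synthetic zero-filled arrays standing in for real data:

```python
import numpy as np

# Synthetic single-band result in the default (band, y, x) order
arr = np.zeros((1, 4, 5), dtype=np.uint16)

# Select the only band to get a 2D (y, x) array, as arr[0] does below
band_2d = arr[0]
print(band_2d.shape)  # (4, 5)

# With bands_axis=-1 the band is the last axis instead
arr_last = np.zeros((4, 5, 1), dtype=np.uint16)
print(arr_last[..., 0].shape)  # (4, 5)

# np.squeeze also works here, but it drops *every* length-1 axis,
# so it would also collapse a y or x dimension of size 1
assert np.squeeze(arr).shape == band_2d.shape
assert band_2d.ndim == 2
```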

You should also specify values for overviews and overview_resampler. Overviews let the platform handle requests for data at non-native resolutions more efficiently, and speed up how quickly Viewer renders images. overviews specifies a list of up to 16 downsampling factors to calculate overviews for. For example, overviews=[2,4] calculates two overviews whose pixels are 2x and 4x the size of the native pixels. overview_resampler specifies the algorithm to use when calculating overviews; see Catalog.upload_ndarray for which algorithms can be used.
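To make the "average" resampler concrete, the sketch below downsamples a synthetic 4x4 band by factors of 2 and 4, averaging each block of pixels. It only illustrates the idea; the platform's own implementation may treat edges and nodata differently, and average_overview is a hypothetical helper.

```python
import numpy as np


def average_overview(band, factor):
    """Downsample a 2D array by averaging non-overlapping factor x factor blocks.

    Hypothetical helper. Trailing rows/columns that don't fill a whole block
    are dropped.
    """
    rows = band.shape[0] // factor * factor
    cols = band.shape[1] // factor * factor
    trimmed = band[:rows, :cols].astype(np.float64)
    # Element [i, a, j, b] is trimmed[i * factor + a, j * factor + b]
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


native = np.arange(16, dtype=np.float64).reshape(4, 4)  # e.g. 150 m pixels
overview_2 = average_overview(native, 2)  # e.g. 300 m pixels
overview_4 = average_overview(native, 4)  # e.g. 600 m pixels
print(overview_2.tolist())  # [[2.5, 4.5], [10.5, 12.5]]
print(overview_4.tolist())  # [[7.5]]
```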

In [37]: product_id = catalog_client.add_product(
   ....:     'bright-ndarray' + new_id,
   ....:     title='bright mask ndarray upload',
   ....:     description='bright mask ndarray upload'
   ....: )['data']['id']
   ....: 

In [38]: scene, geoctx = dl.scenes.Scene.from_id("landsat:LC08:01:T1:TOAR:meta_LC08_L1TP_163068_20181025_20181025_01_T1_v1")

In [39]: arr, raster_meta = scene.ndarray(
   ....:     "bright-mask",
   ....:     geoctx.assign(resolution=150),
   ....:     raster_info=True  # return georeferencing info we need to re-upload
   ....: )
   ....: 

In [40]: bright_band = catalog_client.add_band(
   ....:     product_id=product_id,
   ....:     name='bright-mask',
   ....:     jpx_layer=0,
   ....:     srcfile=0,
   ....:     srcband=1,
   ....:     default_range=[0, 1],
   ....:     nbits=1,
   ....:     color='Gray',
   ....:     dtype='UInt16',
   ....:     data_range=[0, 1],
   ....:     nodata=None,
   ....:     type='mask',
   ....: )['data']['id']
   ....: 

# note: arr[0] selects the single band, giving a 2D (y, x) ndarray
In [41]: catalog_client.upload_ndarray(
   ....:     arr[0],
   ....:     product_id,
   ....:     "bright",                     # unique ID for the image
   ....:     raster_meta=raster_meta,      # dict from ndarray call containing georeferencing info
   ....:     overviews=[2,4],              # create overviews for 300m and 600m resolution
   ....:     overview_resampler="average", # use "average" algorithm for determining overview pixel values
   ....:     acquired=scene.properties.acquired
   ....: )
   ....: 

# cleaning up
In [42]: catalog_client.remove_product(product_id, cascade=True)
Out[42]: {u'deletion_task': u'1X2fa-sJRbONiqYiAztw9g:31798711'}

Uploading a multi-band ndarray

Uploading an ndarray to a multi-band product requires an ndarray in the shape (y, x, band). In the example below, we ask the Scenes API for an ndarray in this shape, rather than in its default (band, y, x) form. When working with your own data, be sure to shape your ndarray correctly.
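If your own bands exist as separate 2D arrays, np.stack with axis=-1 assembles them into the (y, x, band) shape. The arrays below are synthetic; the stacking order must follow each band's srcband, which is 1-based, so srcband=1 ends up at index 0 of the last axis.

```python
import numpy as np

# Synthetic per-band 2D arrays with the same (y, x) shape
bright_mask = np.zeros((4, 5), dtype=np.uint16)
blue = np.full((4, 5), 1200, dtype=np.uint16)

# Stack along a new last axis in srcband order:
# bright-mask (srcband=1) -> index 0, blue (srcband=2) -> index 1
arr = np.stack([bright_mask, blue], axis=-1)
print(arr.shape)  # (4, 5, 2)
assert (arr[..., 1] == blue).all()
```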

In [43]: product_id = catalog_client.add_product(
   ....:     'blue-bright-ndarray' + new_id,
   ....:     title='blue and bright mask ndarray upload',
   ....:     description='blue and bright mask ndarray upload'
   ....: )['data']['id']
   ....: 

# note bands_axis=-1, which returns the array in (y, x, band) order
In [44]: arr, raster_meta = scene.ndarray(
   ....:     "bright-mask blue",
   ....:     geoctx.assign(resolution=150),
   ....:     raster_info=True,
   ....:     bands_axis=-1
   ....: )
   ....: 

In [45]: bright_band = catalog_client.add_band(
   ....:     product_id=product_id,
   ....:     name='bright-mask',
   ....:     jpx_layer=0,
   ....:     srcfile=0,
   ....:     srcband=1,
   ....:     default_range=[0, 1],
   ....:     nbits=1,
   ....:     color='Gray',
   ....:     dtype='UInt16',
   ....:     data_range=[0, 1],
   ....:     nodata=None,
   ....:     type='mask',
   ....: )['data']['id']
   ....: 

In [46]: blue_band = catalog_client.add_band(
   ....:     product_id=product_id,
   ....:     name='blue',
   ....:     jpx_layer=0,
   ....:     srcfile=0,
   ....:     srcband=2,
   ....:     default_range=[0, 4000],
   ....:     color='Blue',
   ....:     dtype='UInt16',
   ....:     type='spectral',
   ....:     nbits=14,
   ....:     data_range=[0, 10000],
   ....:     nodata=None,
   ....: )['data']['id']
   ....: 

In [47]: catalog_client.upload_ndarray(
   ....:     arr,
   ....:     product_id,
   ....:     "blue-bright",
   ....:     raster_meta=raster_meta,
   ....:     acquired=scene.properties.acquired
   ....: )
   ....: 

# final clean up
In [48]: catalog_client.remove_product(product_id, cascade=True)
Out[48]: {u'deletion_task': u'1X2fa-sJRbONiqYiAztw9g:31799725'}