Catalog¶
Warning
This low-level client has been deprecated in favor of the newer object-oriented Catalog client.
The Catalog API enables you to upload your own raster data to the Descartes Labs Platform.
When you upload data via the Catalog, you are able to interact with the data like any other product on the platform. You can search for your data using the Metadata API, retrieve the data with Raster, and explore the data using the Viewer user interface.
There are two common use cases relevant to Catalog:
- uploading derived products
- uploading existing data to a new product
In the first case, you may want to use the Catalog API to create a derived product from data available through the platform, then upload the new product back to the platform. For example, let’s say you have developed an image classification model. You may use the Metadata API to search for satellite imagery, the Scene API to obtain the imagery, then use the Tasks API to scale the model across a large area. With Catalog, you can upload your new image classifications to the platform as they’re created so that you can visualize and interact with them.
In the other case, you may simply want to upload existing data that you have access to but that is not hosted by Descartes Labs.
In all cases, uploading a product to the Catalog requires either an image file in GeoTIFF or JPEG2000 format, or georeferenced ndarray data. HDF is not currently a supported file format.
Note
For information about API Quotas and limits see our Quotas & Limits page.
Example¶
>>> from descarteslabs.client.services.catalog import Catalog
>>> from descarteslabs.client.services.metadata import Metadata
>>> import os
>>> import uuid
>>> import geojson
>>>
>>> from random import randint
>>> from time import sleep
>>> from datetime import datetime
>>>
>>> catalog_client = Catalog()
>>> metadata_client = Metadata()
>>>
>>> # used to generate unique product IDs
>>> new_id = str(uuid.uuid4())
>>>
>>> new_id
'fb04a008-d652-43eb-b482-5e4b5ede2636'
Create a product¶
Before uploading product data to the platform, you need to create a product and its bands. You can upload optional metadata about the product including number of bands, resolution and revisit rate, as well as grant read access to other users or groups. By default, a new product and its bands are available only to you.
When creating a product you provide a product_id
parameter that
needs to be unique among all the products you’ve created. The platform
will assign a full ID using your unique user ID and the product ID
you provided.
>>> product_id = catalog_client.add_product(
... 'example-product' + new_id, # unique ID that you generate
... title='Example Product',
... description='An example product'
... )['data']['id']
...
>>> product_id # unique ID created by platform
u'd4ef22d5a6969cb61147ec8ea3e060cdf33e1a49:example-productfb04a008-d652-43eb-b482-5e4b5ede2636'
Create band(s)¶
After creating the product, you need to add one or more bands to the product using Catalog.add_band. You need to indicate a number of required band metadata properties, including the data type, the number of bits used to store band data, the minimum and maximum data values in the band, which band of the input file the band’s data is mapped to, color maps, optimal scaling for visualization, and any other relevant metadata properties.
>>> band_id = catalog_client.add_band(
... # id of the product we just created
... product_id=product_id,
... # unique name to describe what the band encodes
... name='blue',
... # if data for a single scene will be in multiple files,
... # which file this band will be in (0-indexed)
... srcfile=0,
... # 1-based index indicating which band in the
... # file or ndarray this band is
... srcband=1,
... nbits=14,
... dtype='UInt16',
... # minimum and maximum data values in the band
... data_range=[0, 10000],
... # minimum and maximum values the data *represents*,
... # e.g. [0, 10000] might represent [0, 1]
... physical_range=[0,1],
... type='spectral'
... )['data']['id']
...
>>> band_id
u'd4ef22d5a6969cb61147ec8ea3e060cdf33e1a49:example-productfb04a008-d652-43eb-b482-5e4b5ede2636:blue'
Note
See the note on ‘Data range vs physical range’ here to learn more about these parameters.
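The relationship between the two ranges can be sketched as a linear rescaling. This is only an illustration of the idea suggested by the comment in the example above (stored values in [0, 10000] standing for [0, 1]); the function name to_physical is hypothetical, and the platform's own conversion may differ.

```python
def to_physical(value, data_range, physical_range):
    """Linearly map a stored value from data_range into physical_range.

    Hypothetical helper for illustration; not part of the Catalog client.
    """
    d_min, d_max = data_range
    p_min, p_max = physical_range
    if d_max == d_min:
        raise ValueError("data_range must span a non-zero interval")
    fraction = (value - d_min) / float(d_max - d_min)
    return p_min + fraction * (p_max - p_min)


# Using the ranges from the 'blue' band above.
print(to_physical(0, [0, 10000], [0, 1]))      # 0.0
print(to_physical(2500, [0, 10000], [0, 1]))   # 0.25
print(to_physical(10000, [0, 10000], [0, 1]))  # 1.0
```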
It’s also possible to create, edit, and delete both products and bands through the Catalog.
Uploading Data to the Catalog¶
When your data already exists on disk in a GeoTIFF or other raster format, you can use Catalog.upload_image. This method allows you to upload the image metadata and the image data in one call. Images are processed by the platform asynchronously, so no information about the image is returned. You can poll Scene.from_id to determine when your image has been processed, or use Catalog.iter_upload_results or Catalog.upload_result. Uploaded images can be accessed directly through Storage using the products storage type.
>>> import descarteslabs.scenes
>>> image_path = os.path.join(os.getcwd(), 'guides', 'blue.tif')
>>> catalog_client.upload_image(image_path, product_id)
>>>
>>> # Poll for processed image
>>> processed_image_id = '{}:{}'.format(product_id, 'blue')
>>>
>>> while True:
...     try:
...         image, ctx = descarteslabs.scenes.Scene.from_id(processed_image_id)
...         break
...     except Exception:
...         sleep(2)
...
>>> image
Scene "d4ef22d5a6969cb61147ec8ea3e060cdf33e1a49:example-productfb04a008-d652-43eb-b482-5e4b5ede2636:blue"
* Product: "d4ef22d5a6969cb61147ec8ea3e060cdf33e1a49:example-productfb04a008-d652-43eb-b482-5e4b5ede2636"
* CRS: "+proj=utm +zone=43 +datum=WGS84 +units=m +no_defs "
* Date: Tue Nov 27 17:33:42 2018
* Bands:
* blue: UInt16, [0, 10000] -> [0, 1]
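The polling loop above retries forever. A bounded variant might look like the following sketch; poll_until and fake_lookup are hypothetical names introduced here, and the fetch callable stands in for whatever lookup you use (for example a lambda wrapping Scene.from_id).

```python
import time


def poll_until(fetch, timeout=60.0, interval=2.0, retry_on=(Exception,)):
    """Call fetch() until it returns without raising, or give up.

    Hypothetical helper, not part of the client.
    fetch    -- zero-argument callable that raises while the data is not ready
    timeout  -- maximum number of seconds to keep trying
    interval -- seconds to wait between attempts
    retry_on -- exception types that mean "not ready yet"
    """
    deadline = time.time() + timeout
    while True:
        try:
            return fetch()
        except retry_on:
            # Re-raise the last error once another wait would pass the deadline.
            if time.time() + interval > deadline:
                raise
            time.sleep(interval)


# Stand-in for a lookup that fails twice before the image is processed.
attempts = []


def fake_lookup():
    attempts.append(1)
    if len(attempts) < 3:
        raise LookupError("not processed yet")
    return "processed"


print(poll_until(fake_lookup, timeout=5.0, interval=0.01))  # processed
```

Narrowing retry_on to the specific "not found" error your lookup raises avoids silently retrying on unrelated failures such as authentication errors.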
The newly uploaded image can be accessed directly through Storage
using the
products
storage type and an optional prefix of the image id.
>>> from descarteslabs.client.services.storage import Storage
>>> storage_client = Storage()
>>> image_prefix = "d4ef22d5a6969cb61147ec8ea3e060cdf33e1a49:example-product"
>>> storage_client.list(prefix=image_prefix, storage_type="products")
['d4ef22d5a6969cb61147ec8ea3e060cdf33e1a49:example-productfb04a008-d652-43eb-b482-5e4b5ede2636:blue']
Note about alpha bands and nodata values¶
When creating new products, you should set a nodata
value
or include an alpha band with the product, especially
if you want to clip the product by shape. Though an explicit
nodata
value or an alpha band isn’t strictly required,
not providing one or the other can lead to confusion between
meaningful 0-values and no-data values that are represented as 0s.
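To see why the ambiguity matters, consider a simple statistic computed over a handful of pixel values. The sketch below is plain Python with made-up values; mean_ignoring_nodata is a hypothetical helper and does not reflect how the platform applies nodata internally.

```python
def mean_ignoring_nodata(pixels, nodata=None):
    """Average the pixel values, skipping any equal to nodata (if given).

    Hypothetical helper for illustration only.
    """
    valid = [p for p in pixels if nodata is None or p != nodata]
    if not valid:
        return None
    return sum(valid) / float(len(valid))


# Made-up values: three real measurements and three unfilled pixels stored as 0.
pixels = [0, 0, 10, 40, 0, 10]

print(mean_ignoring_nodata(pixels))            # 10.0 (zeros counted as data)
print(mean_ignoring_nodata(pixels, nodata=0))  # 20.0 (zeros treated as missing)
```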
Working with the data¶
Once the image has been processed, you can work with it like any other image.
Viewing the image¶
>>> arr = image.ndarray('blue', ctx, mask_alpha=False)
>>> descarteslabs.scenes.display(arr)

Modifying the bands¶
It’s possible to modify the bands, for example to add a colormap for nicer viewing. These changes can take a few minutes to take effect.
>>> response = catalog_client.change_band(
... product_id,
... band_id,
... colormap_name='magma'
... )
...
If you want to add your own color map you need to first put it in a form the platform can understand. The general form is a list where each item is a map to the appropriate colorspace for the pixel value at the same item’s index. An example for integer data types:
>>> bad_colormap = [[str(randint(0, 255)) for i in range(4)] for i in range(256)]
>>>
>>> response = catalog_client.change_band(
... product_id,
... band_id,
... colormap=bad_colormap
... )
...
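A less random colormap can be built the same way. The sketch below blends linearly between two colors and mirrors the layout of the example above (256 entries of four strings each); reading the four values as red, green, blue, and alpha is an assumption, and linear_colormap is a hypothetical helper.

```python
def linear_colormap(start_rgba, end_rgba, size=256):
    """Build a colormap that blends linearly from start_rgba to end_rgba.

    Hypothetical helper. Each entry is a list of four strings, matching the
    layout of the random colormap above; the R, G, B, A order is assumed.
    """
    if size < 2:
        raise ValueError("size must be at least 2")
    colormap = []
    for index in range(size):
        t = index / float(size - 1)
        entry = [
            str(int(round(s + t * (e - s))))
            for s, e in zip(start_rgba, end_rgba)
        ]
        colormap.append(entry)
    return colormap


blue_to_yellow = linear_colormap((0, 0, 255, 255), (255, 255, 0, 255))
print(len(blue_to_yellow))  # 256
print(blue_to_yellow[0])    # ['0', '0', '255', '255']
print(blue_to_yellow[-1])   # ['255', '255', '0', '255']
```

The resulting list could then be passed as the colormap argument in place of bad_colormap.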
Sharing the product¶
If you decide that your new product might be useful for others you can share it, its bands, and its images.
>>> response = catalog_client.change_product(
... product_id,
... read=['some:group'],
... set_global_permissions=True
... )
...
Making all the data available to the group can take some time. The
changes are made in the background, and won’t lock you up while they’re
occurring. If set_global_permissions=True
isn’t set, others can see
the metadata about the product itself, but those permissions won’t
be applied to the bands or images it contains, which isn’t that helpful.
Adding Custom Metadata¶
You can add custom metadata to an image when you upload it with Catalog.upload_image, or afterwards with Catalog.change_image. This metadata is added to the automatically generated image metadata during processing. The acquired date can’t be inferred accurately, so you should set this metadata property yourself. If you don’t provide this value, the platform will use the update time of the supplied file, which probably will not represent the real acquisition time of the image.
>>> catalog_client.change_image(
... product_id,
... processed_image_id,
... acquired=datetime.now().isoformat()
... )
...
{u'data': {u'attributes': {u'acquired': u'2018-11-27T17:33:55.960134+00:00',
u'bucket': [u'storage-d4ef22d5a6969cb61147ec8ea3e060cdf33e1a49-products'],
u'descartes_version': u'dl-platform-ingest',
u'directory': [u'd4ef22d5a6969cb61147ec8ea3e060cdf33e1a49:example-productfb04a008-d652-43eb-b482-5e4b5ede2636/'],
u'file_md5s': [u'1cbec99bdcd2b4d24fc646deb39342f3'],
u'file_sizes': [1503001],
u'files': [u'1cbec99bdcd2b4d24fc646deb39342f3.tif'],
u'geometry': {u'coordinates': [[[73.4682704195585, 50.510649275790676],
[73.72290871796778, 50.51372144948099],
[73.72602088693354, 50.39800582255859],
[73.47200258138672, 50.39494617828006],
[73.4682704195585, 50.510649275790676]]],
u'type': u'Polygon'},
u'geotrans': [391395.0, 15.0, 0.0, 5596530.0, 0.0, -15.0],
u'identifier': u'd4ef22d5a6969cb61147ec8ea3e060cdf33e1a49:example-productfb04a008-d652-43eb-b482-5e4b5ede2636/blue.tif',
u'key': u'blue',
u'owners': [u'user:d4ef22d5a6969cb61147ec8ea3e060cdf33e1a49'],
u'processed': u'2018-11-27T17:33:51.016043+00:00',
u'proj4': u'+proj=utm +zone=43 +datum=WGS84 +units=m +no_defs ',
u'raster_size': [1204, 858],
u'read': [u'group:some:group'],
u'writers': []},
u'id': u'd4ef22d5a6969cb61147ec8ea3e060cdf33e1a49:example-productfb04a008-d652-43eb-b482-5e4b5ede2636:blue',
u'meta': {u'modified': u'2018-11-27T17:33:56.034899+00:00',
u'owner_type': u'user'},
u'relationships': {u'product': {u'data': {u'id': u'd4ef22d5a6969cb61147ec8ea3e060cdf33e1a49:example-productfb04a008-d652-43eb-b482-5e4b5ede2636',
u'type': u'product'}}},
u'type': u'image'}}
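The example above uses datetime.now() only as a placeholder. When the real acquisition time is known, an ISO 8601 string can be built with the standard library, as in this sketch. The dates are made up, acquired_from_datestamp is a hypothetical helper, and treating the time as UTC is an assumption you should check against your data.

```python
from datetime import datetime, timezone

# Made-up acquisition time, e.g. taken from sensor metadata.
acquired = datetime(2018, 10, 25, 6, 45, 0, tzinfo=timezone.utc)
acquired_iso = acquired.isoformat()
print(acquired_iso)  # 2018-10-25T06:45:00+00:00


def acquired_from_datestamp(stamp):
    """Turn a 'YYYYMMDD' string into an ISO 8601 timestamp, assuming UTC.

    Hypothetical helper for file names that embed the acquisition date.
    """
    parsed = datetime.strptime(stamp, "%Y%m%d")
    return parsed.replace(tzinfo=timezone.utc).isoformat()


print(acquired_from_datestamp("20181025"))  # 2018-10-25T00:00:00+00:00
```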
The following metadata properties are inferred from the image file, and are overwritten by any values you supply:
- acquired
- key
- proj4
- geotrans
- descartes_version
- geometry
- raster_size
- bucket
- directory
- processed
- files
- file_sizes
- file_md5s
- identifier
- owner
- product
Cleaning Up¶
You can delete products, but keep in mind that a product can’t be removed if it has any images or bands attached to it. You can use cascade=True to force all the bands and images to be removed when deleting a product.
>>> catalog_client.remove_product(product_id, cascade=True)
{u'deletion_task': u'eRNNnh1FQYCrKCke-F93Gw:31643398'}
Uploading Multi-Image Scenes¶
If you have separate files for band data in the same scene you can upload them into the same product as separate bands.
First create the bands, as usual. Note that the srcfile
parameter values
change for each band.
>>> product_id = catalog_client.add_product(
... 'masks' + new_id,
... title='Cloud and Brightness masks',
... description='Cloud and Brightness masks'
... )['data']['id']
...
>>> band0_id = catalog_client.add_band(
... product_id=product_id,
... name='cloud-mask',
... jpx_layer=0,
... srcfile=0, # first file
... srcband=1, # first band of first file
... nbits=1,
... dtype='UInt16',
... nodata=0,
... data_range=[0, 1],
... type='mask',
... )['data']['id']
...
>>> band1_id = catalog_client.add_band(
... product_id=product_id,
... name='bright-mask',
... jpx_layer=0,
... srcfile=1, # second file
... srcband=1, # first band of second file
... nbits=1,
... dtype='UInt16',
... nodata=0,
... data_range=[0, 1],
... type='mask',
... )['data']['id']
...
Now upload the image files, setting multi=True. Be sure to order the files in the list according to the srcfile parameter values of the created bands. For example, if band0 data is stored in image_path and its srcfile is 0, and band1 data is stored in other_image_path and its srcfile is 1, then the order to list the files is [image_path, other_image_path].
Also note that when using multi=True you must provide a unique value for image_id.
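One way to avoid ordering mistakes is to derive the list from the srcfile values themselves. The following sketch uses a hypothetical helper, order_files_by_srcfile, and placeholder paths; it is not part of the client.

```python
def order_files_by_srcfile(files_by_srcfile):
    """Return file paths ordered by their srcfile index.

    Hypothetical helper. files_by_srcfile maps each band's srcfile value to
    the path holding that band's data; indices must run 0, 1, 2, ... with
    no gaps so that list positions line up with srcfile values.
    """
    indices = sorted(files_by_srcfile)
    if indices != list(range(len(indices))):
        raise ValueError("srcfile indices must be 0..n-1, got %r" % indices)
    return [files_by_srcfile[i] for i in indices]


paths = order_files_by_srcfile({
    1: "guides/bright-mask.tif",  # band created with srcfile=1
    0: "guides/cloud-mask.tif",   # band created with srcfile=0
})
print(paths)  # ['guides/cloud-mask.tif', 'guides/bright-mask.tif']
```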
>>> cloud_image_path = os.path.join(
... os.getcwd(),
... 'guides',
... 'cloud-mask.tif'
... )
...
>>> bright_image_path = os.path.join(
... os.getcwd(),
... 'guides',
... 'bright-mask.tif'
... )
...
>>> image_id = '_'.join([
... 'test_multi_image_scene',
... str(datetime.now().isoformat())
... ])
...
>>> catalog_client.upload_image(
... [cloud_image_path, bright_image_path],
... product_id,
... multi=True,
... image_id=image_id,
... )
...
>>> catalog_client.remove_product(product_id, cascade=True)
{u'deletion_task': u'OOYyIYE8R1ml7TQrsHDsPQ:31562390'}
Uploading an ndarray¶
Often, when creating a derived product (for example, by running a classification model on existing data), you’ll have a NumPy array in memory instead of a GeoTIFF written to disk. In that case, you can use Catalog.upload_ndarray. This method behaves like Catalog.upload_image, with one key difference: you must have georeferencing information for the ndarray, in the form of a geotransform and a coordinate reference system definition. If the ndarray you’re uploading was derived from an ndarray you loaded from the platform, this information is easy to get.
As when uploading an image, you first have to create a product and bands.
When you call Catalog.upload_ndarray, the array is uploaded to the backend, then processed asynchronously, so the new data isn’t available immediately.
Getting Georeferencing Information¶
Uploading an ndarray requires georeferencing information, which is used to map back and forth between geospatial coordinates (such as latitude and longitude) and which pixel coordinates they correspond to in the array. Doing this requires an affine geotransform in GDAL format, and a coordinate reference system definition in PROJ.4 or OGC Well-Known-Text format.
When loading an ndarray from the platform, you also receive a dictionary of metadata that includes both of these parameters. With Scene.ndarray, you have to set raster_info=True; with Raster.ndarray, it’s always returned.
As long as you didn’t change the shape of the array, you can use the original georeferencing parameters when uploading your derived array. Just pass that whole dict into Catalog.upload_ndarray as raster_meta.
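To make the role of the geotransform concrete, the sketch below applies the standard GDAL six-coefficient convention to the geotrans value shown in the image metadata earlier on this page. The helper names are hypothetical, and reading raster_size as [width, height] is an assumption.

```python
def pixel_to_coords(geotrans, col, row):
    """Map a pixel (col, row) to projected (x, y) with a GDAL geotransform."""
    x = geotrans[0] + col * geotrans[1] + row * geotrans[2]
    y = geotrans[3] + col * geotrans[4] + row * geotrans[5]
    return x, y


def coords_to_pixel(geotrans, x, y):
    """Inverse mapping; returns a fractional (col, row)."""
    x0, px_w, rot1, y0, rot2, px_h = geotrans
    det = px_w * px_h - rot1 * rot2
    if det == 0:
        raise ValueError("geotransform is not invertible")
    dx, dy = x - x0, y - y0
    col = (px_h * dx - rot1 * dy) / det
    row = (-rot2 * dx + px_w * dy) / det
    return col, row


# geotrans from the example metadata: 15 m pixels, north-up, UTM zone 43.
GEOTRANS = [391395.0, 15.0, 0.0, 5596530.0, 0.0, -15.0]

print(pixel_to_coords(GEOTRANS, 0, 0))        # (391395.0, 5596530.0)
# Lower-right corner, assuming raster_size [1204, 858] is [width, height].
print(pixel_to_coords(GEOTRANS, 1204, 858))   # (409455.0, 5583660.0)
print(coords_to_pixel(GEOTRANS, 409455.0, 5583660.0))  # (1204.0, 858.0)
```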
Note
When working with the Scenes API Scene.ndarray method, you’ll get back an ndarray in the shape (band, y, x), where the bands axis comes first. Catalog.upload_ndarray expects an ndarray where the bands axis comes last, so you should either set bands_axis=-1 in Scene.ndarray, or use np.moveaxis(arr, 0, -1) before uploading to move the bands axis to the end.
Keep the shape of your ndarray in mind; failing to properly shape the ndarray will cause your product to have the wrong shape, or completely invalid data!
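What moving the bands axis means can be shown on a tiny example. The sketch below uses plain nested lists only to stay dependency-free; with real arrays you would use np.moveaxis(arr, 0, -1) as described in the note. bands_first_to_last is a hypothetical helper.

```python
def bands_first_to_last(arr):
    """Reorder a nested list from (band, y, x) to (y, x, band).

    Hypothetical helper that mimics np.moveaxis(arr, 0, -1) on nested lists.
    """
    n_bands = len(arr)
    n_rows = len(arr[0])
    n_cols = len(arr[0][0])
    return [
        [[arr[b][y][x] for b in range(n_bands)] for x in range(n_cols)]
        for y in range(n_rows)
    ]


# Two bands, two rows, three columns: shape (2, 2, 3).
bands_first = [
    [[1, 2, 3], [4, 5, 6]],
    [[10, 20, 30], [40, 50, 60]],
]
bands_last = bands_first_to_last(bands_first)

# Shape is now (2, 3, 2): each pixel holds its values for both bands.
print(bands_last[0][0])  # [1, 10]
print(bands_last[1][2])  # [6, 60]
```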
Uploading a single band ndarray¶
When you have a single band of data, be sure you upload a
two-dimensional ndarray
when using Catalog.upload_ndarray
.
All the other steps are the same as when using Catalog.upload_image
.
When using Catalog.upload_ndarray
, you must supply the
georeferencing information, either by passing in the metadata
returned from the ndarray
call as raster_meta
, or
explicitly setting geotrans
and proj4
or wkt_srs
.
You should also specify values for overviews and overview_resampler. Overviews allow the platform to handle requests for data at non-native resolutions more efficiently and improve the speed at which Viewer renders images. overviews specifies a list of up to 16 different resolution magnification factors to calculate overviews for. For example, overviews=[2,4] calculates two overviews at 2x and 4x the native resolution. overview_resampler specifies the algorithm to use when calculating overviews; see Catalog.upload_ndarray for which algorithms can be used.
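As a quick check on what a set of factors produces, the sketch below multiplies a native pixel size by each factor and enforces the limit of 16 mentioned above. overview_resolutions is a hypothetical helper, and interpreting each factor as a multiplier on the native pixel size is an assumption consistent with the 150 m example that follows.

```python
def overview_resolutions(native_resolution, overviews):
    """Return the pixel size of each overview level.

    Hypothetical helper. Assumes each factor multiplies the native pixel
    size, and enforces the documented limit of 16 factors.
    """
    if len(overviews) > 16:
        raise ValueError("at most 16 overview factors are allowed")
    if len(set(overviews)) != len(overviews):
        raise ValueError("overview factors must be distinct")
    return [native_resolution * factor for factor in overviews]


# 150 m native data with overviews=[2, 4]
print(overview_resolutions(150, [2, 4]))  # [300, 600]
```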
>>> product_id = catalog_client.add_product(
... 'bright-ndarray' + new_id,
... title='bright mask ndarray upload',
... description='bright mask ndarray upload'
... )['data']['id']
...
>>> scene, geoctx = descarteslabs.scenes.Scene.from_id("landsat:LC08:01:T1:TOAR:meta_LC08_L1TP_163068_20181025_20181025_01_T1_v1")
>>>
>>> arr, raster_meta = scene.ndarray(
... "bright-mask",
... geoctx.assign(resolution=150),
... # return georeferencing info we need to re-upload
... raster_info=True
... )
...
>>> bright_band = catalog_client.add_band(
... product_id=product_id,
... name='bright-mask',
... jpx_layer=0,
... srcfile=0,
... srcband=1,
... default_range=[0, 1],
... nbits=1,
... color='Gray',
... dtype='UInt16',
... data_range=[0, 1],
... nodata=None,
... type='mask',
... )['data']['id']
...
>>> # note using arr[0] for single band, 2d ndarray
>>> catalog_client.upload_ndarray(
...     arr[0],
...     product_id,
...     # unique ID for the image
...     "bright",
...     # dict from ndarray call containing georeferencing info
...     raster_meta=raster_meta,
...     # create overviews for 300m and 600m resolution
...     overviews=[2,4],
...     # use "average" algorithm for determining overview
...     # pixel values
...     overview_resampler="average",
...     acquired=scene.properties.acquired
... )
...
>>> # cleaning up
>>> catalog_client.remove_product(product_id, cascade=True)
{u'deletion_task': u'1X2fa-sJRbONiqYiAztw9g:31798711'}
Uploading multi-band ndarray¶
Uploading an ndarray to a multi-band product requires that you have the ndarray in the shape (y, x, band). In the example below, we’ll request that the Scenes API give us an ndarray in this shape, rather than in its default (band, y, x) form. When working with your own data, be sure to shape your ndarray correctly.
>>> product_id = catalog_client.add_product(
... 'blue-bright-ndarray' + new_id,
... title='blue and bright mask ndarray upload',
... description='blue and bright mask ndarray upload'
... )['data']['id']
...
>>> # note ``bands_axis=-1``
>>> arr, raster_meta = scene.ndarray(
... "bright-mask blue",
... geoctx.assign(resolution=150),
... raster_info=True,
... bands_axis=-1
... )
...
>>> bright_band = catalog_client.add_band(
... product_id=product_id,
... name='bright-mask',
... jpx_layer=0,
... srcfile=0,
... srcband=1,
... default_range=[0, 1],
... nbits=1,
... color='Gray',
... dtype='UInt16',
... data_range=[0, 1],
... nodata=None,
... type='mask',
... )['data']['id']
...
>>> blue_band = catalog_client.add_band(
... product_id=product_id,
... name='blue',
... jpx_layer=0,
... srcfile=0,
... srcband=2,
... default_range=[0, 4000],
... color='Blue',
... dtype='UInt16',
... type='spectral',
... nbits=14,
... data_range=[0, 10000],
... nodata=None,
... )['data']['id']
...
>>> catalog_client.upload_ndarray(
... arr,
... product_id,
... "blue-bright",
... raster_meta=raster_meta,
... acquired=scene.properties.acquired
... )
...
>>> # final clean up
>>> catalog_client.remove_product(product_id, cascade=True)
{u'deletion_task': u'1X2fa-sJRbONiqYiAztw9g:31799725'}
Debugging uploads¶
You can retrieve upload results to find out about the status of image uploads with Catalog.upload_result and Catalog.iter_upload_results. If an upload failed, for example because the file format wasn’t recognized, you can find out more details.
In the following example we upload a file that’s clearly not a valid image (it’s an empty file), so we expect the upload to fail.
>>> import tempfile
>>> from descarteslabs.exceptions import NotFoundError
>>>
>>> product_id = catalog_client.add_product(
... 'debugging' + new_id,
... title='debugging uploads',
... description='debugging uploads'
... )['data']['id']
...
>>> invalid_image_path = tempfile.mkstemp()[1]
>>> with open(invalid_image_path, "w"): pass
...
>>> upload_id = catalog_client.upload_image(
... invalid_image_path,
... product_id
... )
>>>
>>> # Poll for the image upload to fail
>>> while True:
...     try:
...         result = catalog_client.upload_result(
...             product_id,
...             upload_id
...         )
...         if result["data"]["attributes"]["status"] == "FAILURE":
...             break
...     except NotFoundError:
...         pass
...     # wait before asking again, whether the result is missing or pending
...     sleep(2)
...
>>> result["data"]["attributes"]
{u'status': u'FAILURE', u'stacktrace': u'Traceback (most recent call
last):<omitted>', u'created': u'2019-06-12T12:10:16.975530',
u'failure_type': u'exception', u'labels': [
u'ae60fc891312ffadc94ade8063313b0063335a3c:debuggingef0356d7-ed19-439e-9713-67f3a39150d7',
u'google-oauth2|114477062978269698904', u'tmpQkBseI'], u'runtime':
0.652752161026001, u'exception_name': u'IOError'}
Note that for some period after starting an upload the result may not exist yet - we explicitly handle the NotFoundError
that is raised in that period.
Image uploads use the Tasks service underneath, so the upload result looks similar to a task result, e.g., with a timestamp and the runtime in seconds. In case of a failure the stacktrace may include detailed information about what went wrong.
The above looks up a specific upload result by id. You can also list any past upload results, for example in case you haven’t captured an upload id. The below prints all upload results created for a specific product in the last 24 hours.
>>> from datetime import datetime, timedelta
>>> day_ago = datetime.now() - timedelta(days=1)
>>> for result in catalog_client.iter_upload_results(
...     product_id,
...     created=day_ago.isoformat()
... ):
...     print(result)
...
{u'attributes': {u'status': u'FAILURE', u'peak_memory_usage':
111620096, u'exception_name': u'IOError', u'created':
u'2019-06-12T12:10:16.975530', u'failure_type': u'exception',
u'runtime': 0.652752161026001}, u'type': u'upload', u'id':
u'578849947833531'}
Upload results are currently not stored indefinitely, so you may not have access to the full history of uploads for a product.
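When scanning many results, a one-line summary per upload can be easier to read than the raw dictionaries. The sketch below works on an attributes dictionary shaped like the ones printed above; summarize_upload is a hypothetical helper, and only the keys visible in that output are assumed to exist.

```python
def summarize_upload(attributes):
    """Return a one-line summary of an upload result's attributes.

    Hypothetical helper; relies only on keys seen in the example output.
    """
    status = attributes.get("status", "UNKNOWN")
    parts = [status]
    if status == "FAILURE":
        parts.append(attributes.get("exception_name", "unknown error"))
    runtime = attributes.get("runtime")
    if runtime is not None:
        parts.append("%.2fs" % runtime)
    return " | ".join(parts)


# Attributes copied from the failed upload shown above.
example = {
    "status": "FAILURE",
    "exception_name": "IOError",
    "failure_type": "exception",
    "created": "2019-06-12T12:10:16.975530",
    "runtime": 0.652752161026001,
}
print(summarize_upload(example))  # FAILURE | IOError | 0.65s
```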
>>> # final clean up
>>> catalog_client.remove_product(product_id, cascade=True)
{u'deletion_task': u'1X2fa-sJRbONiqYiAztw9g:31799725'}