Tasks

The Tasks API provides scalable compute capabilities to parallelize your computations. It works by pickling your Python code and executing it on nodes hosted by Descartes Labs in our cloud infrastructure. These nodes can access imagery at extremely high throughput which, paired with horizontal scaling, allows you to execute computations over nearly any spatiotemporal scale.

Before you can submit jobs, you need to run a one-time initialization of your user-specific resources. The following code initializes these resources for you and allows you to run asynchronous tasks.

from descarteslabs.client.services.tasks import Tasks
at = Tasks()
at.create_namespace()

Basic Example

# import relevant methods from Tasks API
from descarteslabs.client.services.tasks import Tasks, as_completed

# define function to scale out
def hello(i):
   import geopandas
   print(geopandas)
   return "hello {}".format(i)

# create task group
client = Tasks()
async_func = client.create_function(
    hello,
    name='my-task-hello',
    image="us.gcr.io/dl-ci-cd/images/tasks/public/geospatial/geospatial-public:latest",
)

# submit a task to the task group
task = async_func(5)

# print the task result and logs
print(task.result)
print(task.log)

This example illustrates the basic usage of the Tasks API. Here, we define a function called hello which imports geopandas, prints out information about the package, and returns the string "hello <argument>".

A task group is generated using the create_function method, which specifies the function to scale (hello), gives the task group a name, and specifies a Docker image that defines the environment in which the code will be executed.

In the next line, we call the async function to submit a single task. This submits the task to the task group created by the create_function call, and the reference to the task is stored in the ‘task’ variable. This also triggers instances to spin up on the backend to execute the task. Instance management is handled in the background. Instances are created or destroyed as needed to match the compute resources required by the job.

A few important features of the Tasks API are highlighted by this example:

  • You can pass any JSON-serializable argument to a task, e.g. arguments with type str, dict, list, None, or any numeric data type.
  • You can return a value from a task just like you can from a function executed locally. The return value is accessed via task.result. While arbitrary return values are supported, it is highly recommended that these are JSON-serializable or numpy arrays.
  • You can import nonstandard Python packages within your async function. Any packages that your function uses that are outside of the standard Python library need to be imported within the function itself. The function is the only segment of the code being serialized and sent to the node, so all variables and packages need to be sent along with it.
  • Any packages that your function imports also need to be installed on the Docker image used to define the execution environment. The default Docker image contains a number of common dependencies for geospatial analysis. Currently, you are not able to generate your own image or install additional packages.
  • You can access any logging or debugging information, including print statements executed inside your function, through the logs stored in task.log.
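As a rough illustration of the first point in the list above, the following sketch checks locally whether a value can be encoded as JSON before it is passed to a task. The helper name is_json_serializable is hypothetical and not part of the Tasks API. It uses only the standard json module and assumes that what json.dumps accepts is a reasonable proxy for what the service accepts.

```python
import json


def is_json_serializable(value):
    """Return True if `value` can be encoded with the standard json module.

    Hypothetical helper, not part of the Tasks API.
    """
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        # TypeError: unsupported type (e.g. set, arbitrary object)
        # ValueError: circular reference
        return False
    return True


# str, dict, list, None and numeric values pass the check
print(is_json_serializable({"bands": ["red", "nir"], "scale": 0.5, "mask": None}))
# a set is not JSON-serializable, so it should be converted to a list first
print(is_json_serializable({1, 2, 3}))
```

Running a check like this before submitting can surface an unsupported argument locally, instead of after the task has been sent to a node.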

The next example illustrates the more typical use case of submitting multiple tasks.

Multiple Task Example

from descarteslabs.client.services.tasks import Tasks, as_completed
import numpy as np

# define the function to scale out
def generate_random_image(num_bands):

    # import numpy within the async function
    import numpy as np

    image_shape = (100, 100)
    image = np.random.rand(num_bands, *image_shape)

    # return the array so that it is available as task.result
    return image


# create the task group
client = Tasks()
async_func = client.create_function(
    generate_random_image,
    name='my-task-random-image',
    image="us.gcr.io/dl-ci-cd/images/tasks/public/geospatial/geospatial-public:latest",
)

# submit 20 tasks to the task group
tasks = async_func.map(range(20))

# print the shape of the image array returned by each task

for task in as_completed(tasks):
    if task.is_success:
        print(task.result.shape)
    else:
        print(task.exception)
        print(task.log)

Here, we define a new function that uses numpy to generate a random image; the caller passes the number of bands through the num_bands parameter.

This example highlights a few additional features of the Tasks API:

  • To submit tasks to the task group, we use the map method, which submits one task for each element of the iterable passed to it (here, range(20)). This is typically the most efficient way to submit tasks to a task group, particularly if the number of tasks is large. You can also submit tasks one at a time, e.g. within a for-loop.
  • We use the as_completed method to retrieve the task results for each task as it is completed. Within this loop, we also catch exceptions and print the logs of any failed task.
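The two patterns described above can be sketched as small helpers. The names submit_one_at_a_time and split_results are hypothetical and not part of the Tasks API. They rely only on the behavior shown in the examples in this section: calling the async function returns a task, and each task exposes is_success, result, exception, and log.

```python
def submit_one_at_a_time(async_func, arguments):
    """Submit one task per argument by calling the async function in a for-loop.

    Hypothetical helper; `async_func` is assumed to be the object
    returned by create_function.
    """
    tasks = []
    for argument in arguments:
        tasks.append(async_func(argument))
    return tasks


def split_results(completed_tasks):
    """Separate successful results from failures.

    `completed_tasks` can be any iterable of tasks, for example
    as_completed(tasks) as used in the example above.
    """
    results, failures = [], []
    for task in completed_tasks:
        if task.is_success:
            results.append(task.result)
        else:
            # keep the exception and the log together for later inspection
            failures.append((task.exception, task.log))
    return results, failures
```

With the objects from the example above, this would be used as tasks = submit_one_at_a_time(async_func, range(20)), followed by results, failures = split_results(as_completed(tasks)). Collecting failures in a list instead of printing them makes it easier to resubmit or inspect them once all tasks have finished.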

Choosing your environment

The execution environment for your function in the cloud is defined by the Docker image you pick when creating the function. The images below cover typical use cases. If none of the provided images suits your needs, contact support@descarteslabs.com about customizing an image.

Match your local Python version to the image you choose. Your function will be rejected or might not run successfully if there is a mismatch between your local Python version and the Python version in the image. A differing bug release version (the “x” in Python version “2.7.x”) is fine.
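One way to keep the local interpreter and the image in sync is to pick the image from the major and minor version of the running Python. The sketch below is a hypothetical helper, not part of the Tasks API. The image names are copied from the list of current images in this section, and the mapping would need to be updated whenever that list changes.

```python
import sys

# Image names taken from the "Current images" list in this section,
# keyed by (major, minor) Python version.
IMAGES = {
    (2, 7): "us.gcr.io/dl-ci-cd/images/tasks/public/py2/default:v2018.10.10",
    (3, 4): "us.gcr.io/dl-ci-cd/images/tasks/public/py3.4/default:v2018.10.10",
    (3, 5): "us.gcr.io/dl-ci-cd/images/tasks/public/py3.5/default:v2018.10.10",
    (3, 6): "us.gcr.io/dl-ci-cd/images/tasks/public/py3.6/default:v2018.10.10",
}


def image_for_python(version_info=None):
    """Return the image whose Python major.minor matches `version_info`.

    Hypothetical helper. Defaults to the local interpreter's version.
    The bug release number (the "x" in "2.7.x") is ignored.
    """
    if version_info is None:
        version_info = sys.version_info
    key = (version_info[0], version_info[1])
    if key not in IMAGES:
        raise ValueError(
            "no listed image for Python {}.{}".format(key[0], key[1])
        )
    return IMAGES[key]
```

The returned string could then be passed as the image argument of create_function, as in the examples above. Raising an error for an unlisted version is a design choice here: it fails locally and early rather than submitting a function that may be rejected.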

Current images

Python 2.7.12, Ubuntu 16.04
Image: us.gcr.io/dl-ci-cd/images/tasks/public/py2/default:v2018.10.10
Date: 10/10/2018
Python highlights: GDAL, numpy, pandas, scikit-image, scikit-learn, scipy, Tensorflow, PyTorch
Other libraries and tools: GEOS 3.5.1, proj 4.9.2, FFTW 3.3.4
absl-py==0.5.0
affine==2.2.1
astor==0.7.1
astropy==2.0.5
atomicwrites==1.2.1
attrs==18.2.0
backports.functools-lru-cache==1.5
backports.weakref==1.0.post1
bleach==1.5.0
blosc==1.5.1
cachetools==2.0.1
certifi==2018.8.24
chardet==3.0.4
Click==7.0
click-plugins==1.0.4
cligj==0.5.0
cloudpickle==0.4.0
cryptography==1.2.3
cycler==0.10.0
decorator==4.3.0
descartes==1.1.0
descarteslabs==0.12.0
enum34==1.1.2
Fiona==1.7.11.post1
funcsigs==1.0.2
futures==3.2.0
gast==0.2.0
GDAL==2.2.2
geojson==2.4.0
geopandas==0.3.0
grpcio==1.15.0
h5py==2.7.1
html5lib==0.9999999
idna==2.6
ipaddress==1.0.16
Keras==2.1.5
kiwisolver==1.0.1
Markdown==3.0.1
matplotlib==2.2.3
mock==2.0.0
more-itertools==4.3.0
munch==2.3.2
networkx==2.1
numpy==1.11.0
pandas==0.22.0
pathlib2==2.3.2
pbr==4.2.0
Pillow==5.1.0
pluggy==0.7.1
protobuf==3.6.1
psutil==5.4.5
py==1.6.0
pyasn1==0.1.9
pyOpenSSL==0.15.1
pyparsing==2.2.2
pyproj==1.9.5.1
pytest==3.8.1
python-dateutil==2.7.3
pytz==2018.5
PyWavelets==1.0.1
PyYAML==3.13
rasterio==0.36.0
requests==2.18.4
scandir==1.9.0
scikit-image==0.13.1
scikit-learn==0.19.1
scipy==1.0.1
Shapely==1.6.4.post1
six==1.10.0
snuggs==1.4.2
subprocess32==3.5.2
tensorboard==1.7.0
tensorflow==1.7.0
termcolor==1.1.0
torch==0.4.0
torchvision==0.2.1
urllib3==1.22
Werkzeug==0.14.1
xarray==0.10.3

Python 3.4.8, Ubuntu 16.04
Image: us.gcr.io/dl-ci-cd/images/tasks/public/py3.4/default:v2018.10.10
Date: 10/10/2018
Python highlights: GDAL, numpy, pandas, scikit-image, scikit-learn, scipy, Tensorflow
Other libraries and tools: GEOS 3.5.1, proj 4.9.2, FFTW 3.3.4
absl-py==0.5.0
affine==2.2.1
astor==0.7.1
astropy==2.0.5
atomicwrites==1.2.1
attrs==18.2.0
bleach==1.5.0
blosc==1.5.1
cachetools==2.0.1
certifi==2018.8.24
chardet==3.0.4
Click==7.0
click-plugins==1.0.4
cligj==0.5.0
cloudpickle==0.4.0
cycler==0.10.0
decorator==4.3.0
descartes==1.1.0
descarteslabs==0.12.0
Fiona==1.7.11.post1
gast==0.2.0
GDAL==2.2.2
geojson==2.4.0
geopandas==0.3.0
grpcio==1.15.0
h5py==2.7.1
html5lib==0.9999999
idna==2.6
Keras==2.1.5
kiwisolver==1.0.1
Markdown==3.0.1
matplotlib==2.2.3
more-itertools==4.3.0
munch==2.3.2
networkx==2.1
numpy==1.11.0
pandas==0.22.0
pathlib2==2.3.2
Pillow==5.1.0
pluggy==0.7.1
protobuf==3.6.1
psutil==5.4.5
py==1.6.0
pycurl==7.43.0
pygobject==3.20.0
pyparsing==2.2.2
pyproj==1.9.5.1
pytest==3.8.1
python-apt==1.1.0b1+ubuntu0.16.4.1
python-dateutil==2.7.3
pytz==2018.5
PyWavelets==1.0.1
PyYAML==3.13
rasterio==0.36.0
requests==2.18.4
scandir==1.9.0
scikit-image==0.13.1
scikit-learn==0.19.1
scipy==1.0.1
Shapely==1.6.4.post1
six==1.11.0
snuggs==1.4.2
tensorboard==1.7.0
tensorflow==1.7.0
termcolor==1.1.0
urllib3==1.22
Werkzeug==0.14.1
xarray==0.10.3

Python 3.5.2, Ubuntu 16.04
Image: us.gcr.io/dl-ci-cd/images/tasks/public/py3.5/default:v2018.10.10
Date: 10/10/2018
Python highlights: GDAL, numpy, pandas, scikit-image, scikit-learn, scipy, Tensorflow, PyTorch
Other libraries and tools: GEOS 3.5.1, proj 4.9.2, FFTW 3.3.4
absl-py==0.5.0
affine==2.2.1
astor==0.7.1
astropy==2.0.5
atomicwrites==1.2.1
attrs==18.2.0
bleach==1.5.0
blosc==1.5.1
cachetools==2.0.1
certifi==2018.8.24
chardet==3.0.4
Click==7.0
click-plugins==1.0.4
cligj==0.5.0
cloudpickle==0.4.0
cycler==0.10.0
decorator==4.3.0
descartes==1.1.0
descarteslabs==0.12.0
Fiona==1.7.11.post1
gast==0.2.0
GDAL==2.2.2
geojson==2.4.0
geopandas==0.3.0
grpcio==1.15.0
h5py==2.7.1
html5lib==0.9999999
idna==2.6
Keras==2.1.5
kiwisolver==1.0.1
Markdown==3.0.1
matplotlib==3.0.0
more-itertools==4.3.0
munch==2.3.2
networkx==2.1
numpy==1.11.0
pandas==0.22.0
pathlib2==2.3.2
Pillow==5.1.0
pluggy==0.7.1
protobuf==3.6.1
psutil==5.4.5
py==1.6.0
pycurl==7.43.0
pygobject==3.20.0
pyparsing==2.2.2
pyproj==1.9.5.1
pytest==3.8.1
python-apt==1.1.0b1+ubuntu0.16.4.1
python-dateutil==2.7.3
pytz==2018.5
PyWavelets==1.0.1
PyYAML==3.13
rasterio==0.36.0
requests==2.18.4
scikit-image==0.13.1
scikit-learn==0.19.1
scipy==1.0.1
Shapely==1.6.4.post1
six==1.11.0
snuggs==1.4.2
tensorboard==1.7.0
tensorflow==1.7.0
termcolor==1.1.0
torch==0.4.0
torchvision==0.2.1
urllib3==1.22
Werkzeug==0.14.1
xarray==0.10.3

Python 3.6.5, Ubuntu 16.04
Image: us.gcr.io/dl-ci-cd/images/tasks/public/py3.6/default:v2018.10.10
Date: 10/10/2018
Python highlights: GDAL, numpy, pandas, scikit-image, scikit-learn, scipy, Tensorflow, PyTorch
Other libraries and tools: GEOS 3.5.1, proj 4.9.2, FFTW 3.3.4
absl-py==0.5.0
affine==2.2.1
astor==0.7.1
astropy==2.0.5
atomicwrites==1.2.1
attrs==18.2.0
bleach==1.5.0
blosc==1.5.1
cachetools==2.0.1
certifi==2018.8.24
chardet==3.0.4
Click==7.0
click-plugins==1.0.4
cligj==0.5.0
cloudpickle==0.4.0
cycler==0.10.0
decorator==4.3.0
descartes==1.1.0
descarteslabs==0.12.0
Fiona==1.7.11.post1
gast==0.2.0
GDAL==2.2.2
geojson==2.4.0
geopandas==0.3.0
grpcio==1.15.0
h5py==2.7.1
html5lib==0.9999999
idna==2.6
Keras==2.1.5
kiwisolver==1.0.1
Markdown==3.0.1
matplotlib==3.0.0
more-itertools==4.3.0
munch==2.3.2
networkx==2.1
numpy==1.11.0
pandas==0.22.0
Pillow==5.1.0
pluggy==0.7.1
protobuf==3.6.1
psutil==5.4.5
py==1.6.0
pycurl==7.43.0
pygobject==3.20.0
pyparsing==2.2.2
pyproj==1.9.5.1
pytest==3.8.1
python-apt==1.1.0b1+ubuntu0.16.4.1
python-dateutil==2.7.3
pytz==2018.5
PyWavelets==1.0.1
PyYAML==3.13
rasterio==0.36.0
requests==2.18.4
scikit-image==0.13.1
scikit-learn==0.19.1
scipy==1.0.1
Shapely==1.6.4.post1
six==1.11.0
snuggs==1.4.2
tensorboard==1.7.0
tensorflow==1.7.0
termcolor==1.1.0
torch==0.4.0
torchvision==0.2.1
urllib3==1.22
Werkzeug==0.14.1
xarray==0.10.3