Computing Data¶
There are two ways to get data out of Workflows: compute and inspect. Though they act similarly, they should be used in different circumstances.
tl;dr: use inspect for things that take less than ~45 seconds; otherwise, use compute.
Inspect¶
wf.inspect (or calling .inspect on any Workflows object) is for quick computations. It’s particularly good for operations that don’t touch raster data, like looking at imagecollection.length() or imagecollection.properties.map(lambda p: p["date"]), but it also works well with raster data over small-ish AOIs. It’s designed to have as low latency and overhead as possible: best-case, a call to inspect has about 10ms of overhead (time spent doing other things besides your computation). It uses the same backend that renders tiles for wf.map, and sends the results directly back to your machine without storing them anywhere else. If any transient errors occur while running your computation, they aren’t retried.
Compute¶
wf.compute (or calling .compute on any Workflows object) is for larger, slower computations. It creates a batch Job, which waits in a queue until it’s run by a worker. If the job fails due to a transient error, it automatically re-runs until it succeeds. When complete, the results are written out to a destination, where they’re generally stored for 10 days (or indefinitely, depending on the destination). The tradeoff for knowing that your computation will eventually complete, and that its results will be stored somewhere without your computer needing to wait around to receive them, is more variability in when it will complete. compute can still be quick: best-case, it has about 1s of overhead (though that’s 100x more overhead than inspect). But depending on the load on the system, the queue might be empty, or it could take a few seconds, a few minutes, or even a few hours for your job to start.
If your computation takes 30 seconds, waiting 15 seconds for it to start is a long time; you should use inspect instead. But if your computation takes 10 minutes, waiting 15 seconds for it to start with compute makes little difference, and is worth it for the progress reporting and better reliability.
Here’s a simplified view of the architecture of a .compute call in Workflows:

Workflows Architecture Overview¶
Jobs¶
By default, compute blocks until the Job finishes, showing a progress bar and downloading the results for you at the end, making it act the same way as inspect. However, if you pass block=False to compute, it just returns a Job object. You can use this to run many jobs asynchronously, kicking off hundreds or thousands of jobs to run simultaneously.
Jobs are asynchronous: once started, they run on the backend whether you are waiting for them or not. You can use the Job object to watch their progress, wait for them to finish, retrieve their results, cancel them, and rerun them.
>>> job = result.compute(block=False)
>>> job.id
626e3036857d492fbc11e7fa09b25f16
Calling Job.result waits for the result to be available, then downloads it. In Jupyter notebooks, it also displays a progress bar. (Calling compute is actually just creating a Job and calling result on it for you.)
>>> Job.get("626e3036857d492fbc11e7fa09b25f16").result()
[###############] | Steps: 1/1 | Stage: STAGE_DONE | Status: STATUS_SUCCESS
2
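Because each job runs independently once submitted, a common pattern is to fan out many non-blocking submissions and gather the results afterwards. Below is a minimal local sketch of that pattern only; FakeJob and its thread pool are stand-ins for the real batch backend, and none of these names are part of the Workflows API:

```python
from concurrent.futures import ThreadPoolExecutor

class FakeJob:
    """Local stand-in for a Workflows Job: runs work on a thread pool."""
    _pool = ThreadPoolExecutor(max_workers=4)

    def __init__(self, fn, arg):
        # Like obj.compute(block=False): submit and return immediately.
        self._future = self._pool.submit(fn, arg)

    def result(self):
        # Like Job.result(): block until the work finishes, then return it.
        return self._future.result()

# Kick off many "jobs" without blocking...
jobs = [FakeJob(lambda x: x * x, n) for n in range(5)]
# ...then gather their results whenever you're ready.
results = [job.result() for job in jobs]
print(results)  # [0, 1, 4, 9, 16]
```

With the real API, the shape is the same: collect the Job objects returned by compute(block=False), then call result() on each when you need the data.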
By default, while you’re waiting on a compute call, pressing Ctrl-C (or the “interrupt kernel” button in Jupyter) will cancel the job. You can also use Job.cancel to cancel a running job.
Job.object, Job.geoctx, and Job.arguments let you see what went into the computation, and Job.resubmit lets you easily rerun the job.
Note that the Workflows backend enforces some hard quotas to reduce the likelihood of a single user monopolizing finite computational resources. Specifically, the API that backs the compute call returns a 429 error if the caller has too many outstanding jobs. By default, these rate-limited requests are not automatically retried; if you reach your outstanding-job limit, you will receive a descarteslabs.client.grpc.exceptions.ResourceExhausted error.
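Since these errors aren’t retried for you, one option is to wrap submission in your own retry loop with exponential backoff. A sketch of that idea, where ResourceExhausted is a local stand-in for the real exception class and submit_with_backoff is a hypothetical helper, not part of the Workflows client:

```python
import time

class ResourceExhausted(Exception):
    """Stand-in for descarteslabs.client.grpc.exceptions.ResourceExhausted."""

def submit_with_backoff(submit_job, max_attempts=5, base_delay=1.0):
    """Retry submit_job with exponential backoff on ResourceExhausted."""
    for attempt in range(max_attempts):
        try:
            return submit_job()
        except ResourceExhausted:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            time.sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...

# Example: a submitter that hits the quota twice before succeeding.
calls = {"n": 0}
def flaky_submit():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ResourceExhausted()
    return "job-id"

job_id = submit_with_backoff(flaky_submit, base_delay=0.01)
print(job_id)  # job-id
```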
Workflows is optimized for interactive use. If you are filling up the queue with long-running jobs, the best thing to do is make your compute requests blocking by setting block=True. Another alternative is to reduce the rate at which you make non-blocking compute requests by adding time.sleep calls in your code. Finally, if the requests are lightweight enough to complete within 30 seconds, consider changing your compute requests to inspect requests.
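For example, a throttled submission loop might look like the sketch below, where each task() stands in for a non-blocking result.compute(block=False) call and submit_throttled is an illustrative helper, not part of the Workflows API:

```python
import time

def submit_throttled(tasks, delay=0.5):
    """Submit each task in turn, sleeping between submissions to spread
    the load and stay under the outstanding-job quota."""
    jobs = []
    for task in tasks:
        jobs.append(task())  # stand-in for result.compute(block=False)
        time.sleep(delay)    # pause before the next submission
    return jobs

jobs = submit_throttled([lambda i=i: f"job-{i}" for i in range(3)], delay=0.01)
print(jobs)  # ['job-0', 'job-1', 'job-2']
```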
Formats¶
By specifying format= to compute or inspect, you can output results in different formats, such as GeoTIFF or JSON. See Output Formats for details.
Destinations¶
By specifying destination= to compute, you can control where Job results are stored. By default, they’re stored at a downloadable link and deleted after 10 days. But by passing a catalog.Image as the destination, for example, Workflows will instead upload the data to the Catalog for you. See Output Destinations for details.