There are two ways to get data out of Workflows: inspect and compute. Though they act similarly, they should be used in different circumstances: use inspect for computations that take less than ~45sec; otherwise, use compute.
wf.inspect (or calling .inspect on any Workflows object) is for quick computations. It’s particularly good for operations that don’t touch raster data, like imagecollection.properties.map(lambda p: p["date"]), but it also works well with raster data over small-ish AOIs. It’s designed to have as little latency and overhead as possible: best-case, a call to inspect has about 10ms of overhead (time spent doing anything besides your computation). It uses the same backend that renders tiles for wf.map, and sends the results directly back to your machine without storing them anywhere else. If any transient errors occur while running your computation, they aren’t retried.
wf.compute (or calling .compute on any Workflows object) is for larger, slower computations. It creates a batch Job, which waits in a queue until it’s run by a worker. If the job fails due to a transient error, it’s automatically re-run until it succeeds. When complete, the results are written out to a destination, where they’re generally stored for 10 days (or, for some destinations, indefinitely). The tradeoff for knowing that your computation will eventually complete, and that its results will be stored somewhere without your computer needing to wait around to receive them, is more variability in when it completes.
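The auto-retry semantics described above can be illustrated with a small, pure-Python sketch. This is not the Workflows API; the real retry logic lives in the backend, and TransientError and run_with_retries are hypothetical names:

```python
class TransientError(Exception):
    """Stand-in for a temporary backend failure (e.g. a worker restart)."""
    pass

def run_with_retries(fn, max_attempts=5):
    """Keep re-running fn until it succeeds or attempts are exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up only after exhausting all attempts

# A computation that fails twice before succeeding:
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError("temporary backend hiccup")
    return "done"

print(run_with_retries(flaky))  # done
```

The point is the contract, not the mechanism: a batch Job eventually completes as long as its failures are transient.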
compute can still be quick: best-case, it has about 1s of overhead (though that’s 100x more than inspect). But depending on the load on the system, the queue might be empty, or it could take a few seconds, a few minutes, or even a few hours for your job to start. If your computation takes 30 seconds, waiting 15sec for it to start is a long time, so you should use inspect instead. But if your computation takes 10min, waiting 15sec for it to start with compute doesn’t make much difference, and is worth it for the progress reporting and better reliability.
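The rule of thumb above can be summed up in a tiny, hypothetical helper. The threshold and the function are illustrative only, not part of the Workflows API:

```python
# ~45 sec, per the guidance above: below this, queueing delay would dominate.
INSPECT_CUTOFF_SEC = 45

def pick_execution_mode(estimated_runtime_sec: float) -> str:
    """Return which call to use for a computation of the given length."""
    return "inspect" if estimated_runtime_sec < INSPECT_CUTOFF_SEC else "compute"

print(pick_execution_mode(30))   # inspect
print(pick_execution_mode(600))  # compute
```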
Here’s a simplified view of the architecture of a .compute in Workflows: [architecture diagram]
By default, compute blocks until the Job finishes, showing a progress bar and downloading the results for you at the end, making it act much like inspect. However, if you pass block=False to compute, it just returns a Job object. You can use this to run many jobs asynchronously, kicking off hundreds or thousands of jobs to run simultaneously.
Jobs are asynchronous; once started, they run on the backend whether you are waiting for them or not. You can use the
Job object to watch their progress, wait for them to finish, retrieve their results, cancel them, and rerun them.
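Here’s a pure-Python simulation of that fan-out pattern. FakeJob is a stand-in for the real workflows.Job, not the actual API: jobs start running as soon as they’re submitted, and you collect results whenever you’re ready:

```python
import concurrent.futures
import time

class FakeJob:
    """Stand-in for a Workflows Job: runs in the background once started."""
    def __init__(self, executor, fn, *args):
        self._future = executor.submit(fn, *args)  # starts running immediately

    @property
    def done(self):
        # Analogous to polling a Job's status while it runs on the backend.
        return self._future.done()

    def result(self):
        # Blocks until finished, like Job.result.
        return self._future.result()

def slow_square(x):
    time.sleep(0.01)  # pretend this is a long raster computation
    return x * x

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    # Kick off all the jobs without blocking...
    jobs = [FakeJob(pool, slow_square, x) for x in range(10)]
    # ...then wait for each one and gather its result.
    results = [job.result() for job in jobs]

print(results)
```

The real Job adds progress reporting, cancellation, and resubmission on top of this basic submit-then-collect shape.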
>>> job = result.compute(block=False)
>>> job.id
626e3036857d492fbc11e7fa09b25f16
Job.result waits for the result to be available, then downloads it. In Jupyter notebooks, it also displays a progress bar. (Calling compute is actually just creating a Job and calling result on it for you.)
>>> Job.get("626e3036857d492fbc11e7fa09b25f16").result()
[###############] | Steps: 1/1 | Stage: STAGE_DONE | Status: STATUS_SUCCESS
2
By default, while you’re waiting on a
compute call, pressing Ctrl-C (or the “interrupt kernel” button in Jupyter) will cancel the job. You can also use
Job.cancel to cancel a running job.
Job.arguments lets you see what went into the computation, and Job.resubmit lets you easily rerun the job.
Note that the Workflows backend enforces some hard quotas to reduce the likelihood of a single user monopolizing finite computational resources. Specifically, the API that backs the compute call will return a 429 error if the caller has too many outstanding jobs, and by default these rate-limited requests are not automatically retried. If you reach your outstanding job limit, you will receive a 429 error.
Workflows is optimized for interactive use. If you are filling up the queue with long-running jobs, the best thing to do is make your compute requests blocking by setting block=True. Another alternative is to reduce the rate at which you make non-blocking compute requests by adding time.sleep calls to your code. Finally, if the requests are lightweight enough to complete within 30 seconds, consider changing your compute requests to inspect.

With both compute and inspect, you can output results in different formats, such as GeoTIFF or JSON. See Output Formats for details.
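The pacing advice above can be sketched in pure Python. The quota, the fake_submit function, and the 429-style error below are all simulated stand-ins, not the real Workflows API:

```python
import time

MAX_OUTSTANDING = 3  # stand-in for the backend's outstanding-job quota
outstanding = []

def fake_submit(payload):
    """Simulated non-blocking compute: fails if the quota is exceeded."""
    if len(outstanding) >= MAX_OUTSTANDING:
        raise RuntimeError("429: too many outstanding jobs")
    outstanding.append(payload)
    return payload

submitted = []
for i in range(6):
    while True:
        try:
            submitted.append(fake_submit(i))
            break
        except RuntimeError:
            outstanding.pop(0)  # pretend an old job finished in the meantime
            time.sleep(0.01)    # pace our requests instead of hammering the API

print(submitted)  # [0, 1, 2, 3, 4, 5]
```

Sleeping between retries keeps the request rate low while still eventually submitting every job, which is the behavior the blocking block=True mode gives you for free.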
With compute, you can control where Job results are stored. By default, they’re stored at a downloadable link and deleted after 10 days. But by passing a catalog.Image as the destination, for example, Workflows will instead upload the data to the Catalog for you. See Output Destinations for details.