Concepts

In Workflows, the code you write doesn’t execute on your computer or inside your notebook. Instead, a version of that code gets sent to our servers, which run it and send you back the result (or store it elsewhere). By doing that, we can apply many optimizations that would be difficult or impossible for you to write on your own, and make it easy to see your results on a map. However, it means you’ll have to think about programming differently than you’re used to, because the data you’re working with a) hasn’t been computed yet, and b) doesn’t exist on your computer, so you can’t just print things out.

Proxy Objects

In normal Python, adding two numbers happens right away:

>>> 1 + 1
2

But if we use a Workflows Int, we don’t get 2, we just get another Int:

>>> import descarteslabs.workflows as wf
>>> wf.Int(1) + 1
<descarteslabs.workflows.types.primitives.number.Int at 0x...>

This Int—like everything else in the descarteslabs.workflows client—is a lazy proxy object. Every time you call a function or access an attribute on a Workflows object, it returns a proxy object representing what the result would be, and keeps track of that operation for later.

To actually find out the value of this proxy object, you have to execute it by calling compute or inspect. When you do this, your computer sends all of those tracked operations to the backend, which processes them and sends the final result back to you.

>>> result = wf.Int(1) + 1
>>> result.inspect()
2

This proxy object is actually just storing a dependency graph representing the operation 1 + 1, in a syntax called “graft”:

>>> result.graft
{'1': 1, '2': 1, '3': ['add', '1', '2'], 'returns': '3'}

You don’t ever need to worry about the graft or understand the syntax, but knowing that it’s happening might make the system a little less mysterious.

It’s also important to understand that the backend isn’t running the exact Python code you’ve typed out. Instead, the Python code you write is what actually produces this graft structure, which then gets sent to the backend.
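To make this concrete, here is a minimal pure-Python sketch of how a lazy proxy can record operations into a graft-like dictionary instead of performing them. This is only an illustration of the pattern — the class, method names, and graft layout here are invented for this sketch and are not the actual Workflows implementation:

```python
import itertools

class LazyInt:
    """A toy lazy integer: records operations instead of performing them."""
    _counter = itertools.count(1)

    def __init__(self, graft, key):
        self.graft = graft  # dict of key -> literal value or [op, arg_key, ...]
        self.key = key      # which key in the graft this object represents

    @classmethod
    def literal(cls, value):
        key = str(next(cls._counter))
        return cls({key: value}, key)

    def __add__(self, other):
        # Adding doesn't add anything -- it records an "add" node for later.
        if not isinstance(other, LazyInt):
            other = LazyInt.literal(other)
        key = str(next(self._counter))
        graft = {**self.graft, **other.graft, key: ["add", self.key, other.key]}
        return LazyInt(graft, key)

    def compute(self):
        """Walk the recorded graph and evaluate it (the 'backend' step)."""
        def evaluate(key):
            node = self.graft[key]
            if isinstance(node, list):
                op, *args = node
                if op == "add":
                    return sum(evaluate(a) for a in args)
            return node
        return evaluate(self.key)

result = LazyInt.literal(1) + 1
print(result.graft)      # a graft-like dict, e.g. {'1': 1, '2': 1, '3': ['add', '1', '2']}
print(result.compute())  # 2
```

Nothing is added until `compute` walks the graph — until then, `result` is just a description of the work to be done, which is what makes it possible to ship that description to a server.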

Additionally, all Workflows proxy objects are immutable. Every operation in Workflows returns a new object. For example, if you have an Image called img, img.pick_bands("red") doesn’t change the bands in img—it returns a new object which just contains the one band red. A common mistake is writing code like:

>>> # this doesn't work as you expect!
>>> ic = wf.ImageCollection.from_id("sentinel-2:L1C")
>>> ic.pick_bands("red green blue")
>>> ic.mean(axis="images")
>>> ic.mask(ic.mean(axis="bands") > 0.7)

If you do ic.compute(ctx), you might be surprised that your pick_bands, mean, and mask didn’t happen. But since those operations don’t modify ic in-place, and this code didn’t store their results into variables, they effectively got lost. Indeed, ic hasn’t changed since the original ic = wf.ImageCollection.from_id("sentinel-2:L1C"). Instead, you should write this as:

>>> ic = wf.ImageCollection.from_id("sentinel-2:L1C")
>>> rgb = ic.pick_bands("red green blue")
>>> mean = rgb.mean(axis="images")
>>> masked = mean.mask(ic.mean(axis="bands") > 0.7)
>>> masked.compute(ctx)
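If this pattern feels unfamiliar, note that Python’s own immutable types, such as strings, behave the same way — every method returns a new object rather than modifying the original:

```python
s = "red green blue"
upper = s.upper()  # returns a NEW string; s itself is unchanged

print(s)      # red green blue
print(upper)  # RED GREEN BLUE
```

Just as `s.upper()` on its own line would be a no-op unless you store the result, each Workflows operation must be assigned to a variable (or chained) to be used.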

Result Objects

When the result of compute or inspect gets sent back to you, by default it’s returned as an object you can interact with directly in Python. (See the Output Formats documentation for other formats you can use for results.) For Workflows types with Python equivalents like Int, Dict, or List, there is no special result object—we just use the native Python type:

>>> wf_list = wf.List[wf.Str](["foo", "bar"])
>>> wf_list.inspect()  # sends the wf.List to the backend; returns a normal Python list
['foo', 'bar']

For Workflows types with no direct Python equivalent, such as Image, ImageCollection, GeoContext, FeatureCollection, etc., there are special result objects for holding the results of computations.

To further illustrate the difference between proxy and result objects, we’ll access the properties attribute of an Image proxy object:

>>> img = wf.Image.from_id("landsat:LC08:PRE:TOAR:meta_LC80330352016022_v1")
>>> type(img)
<class 'descarteslabs.workflows.types.geospatial.image.Image'>
>>> img.properties
<descarteslabs.workflows.types.containers.known_dict.KnownDict[{'crs': Str, 'date': Datetime, 'geotrans': Tuple[Float, Float, Float, Float, Float, Float], 'id': Str, 'product': Str}, Str, Any] at 0x...>

Notice we don’t see any actual data there: we haven’t yet called compute on img, so we only get the proxy object representing img.properties. Next we’ll call img.compute to get the actual data in an ImageResult, then look at the properties field on that:

>>> img_result = img.compute(ctx) # ctx is the geocontext used to compute img_result
Job ID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
[######] | Steps: 17/17 | Stage: SAVING | Status: SUCCESS
>>> img_result
ImageResult:
  * ndarray: MaskedArray<shape=(3, 137, 398), dtype=float64>
  * properties: 'acquired', 'bucket', 'cloud_fraction_0', 'confidence_dlsr', ...
  * bandinfo: 'red', 'green', 'blue'
  * geocontext: 'geometry', 'resolution', 'crs', 'bounds', ...
>>> type(img_result)
<class 'descarteslabs.workflows.results.results.ImageResult'>

Now if we look at .properties on our ImageResult, we see the actual data:

>>> img_result.properties
{'acquired': '2016-01-22T17:38:34.696493+00:00',
...
}

In fact, we can now see the data for all of its attributes (not just properties):

>>> img_result.ndarray
masked_array(
data=[[[0.1815, 0.2013, ..., 0.0722, 0.0653],
        [0.1599, 0.1821, ..., 0.0644, 0.0592],
        ...,
        [0.2273, 0.1877, ..., 0.2449, 0.2346],
        [0.2125, 0.1844, ..., 0.259 , 0.2349]]],
mask=False,
fill_value=1e+20)

Note

Don’t call .compute or .inspect on the whole object unless you actually need data from the whole object.

If you only need one part of an object’s data, it’s more efficient to call inspect() directly on that attribute, instead of computing every attribute as we did above. If all we needed was just .properties—not .ndarray, .bandinfo, etc.—this would be much faster:

>>> img.properties.inspect(ctx)
{'acquired': '2016-01-22T17:38:34.696493+00:00',
...
}

The GeoContext

In Workflows, all operations happen within one GeoContext. This is the same as a dl.scenes.GeoContext object: it specifies the bounding box, coordinate reference system, and resolution, and those parameters are used by all operations on raster and vector data.

Unlike with the Scenes API, though, you specify the GeoContext at the very end by passing it into compute or inspect. This means the Workflows code you write is “spatially-invariant”: without changing any code, you can just pass in a different GeoContext to run the same operation somewhere else in the world. (In fact, this is how the interactive map works: each 256x256 square runs your same code within a GeoContext for that square.)

So for example, when you write:

>>> import descarteslabs.workflows as wf
>>> ic = wf.ImageCollection.from_id("sentinel-2:L1C", start_datetime="2019-04-01", end_datetime="2019-07-01")
>>> rgb_composite = ic.pick_bands("red green blue").median(axis="images")

That rgb_composite object isn’t tied to a particular place on Earth. In fact, you can’t compute it without telling it where to run:

>>> rgb_composite.inspect()
descarteslabs.exceptions.BadRequestError: A GeoContext is required for this computation

We’ll create two different GeoContexts and see how computing the same object under them changes the results:

>>> import descarteslabs as dl
>>>
>>> # Seward, AK
>>> seward_ctx = dl.scenes.AOI(
...     bounds=[-149.475, 60.100, -149.325, 60.165],
...     bounds_crs="+proj=longlat +datum=WGS84 +no_defs",
...     crs="+proj=utm +zone=6",
...     resolution=30.0,
... )
>>>
>>> # Santa Fe, NM
>>> santa_fe_ctx = dl.scenes.DLTile.from_latlon(35.6870, -105.93780, resolution=100, tilesize=512, pad=0)

First, notice that the same ImageCollection object ic will contain a different number of images depending on where we look:

>>> ic.length().inspect(seward_ctx)
106
>>> ic.length().inspect(santa_fe_ctx)
146

This is an important point to understand: ic isn’t a fixed, concrete set of images. Since it’s a proxy object, it’s more like a placeholder for whatever images happen to fall within the GeoContext when you compute it. When you chain operations onto that ImageCollection (in this case, pick_bands("red green blue").median(axis="images")), you’re basically constructing a template or a recipe for Workflows to follow, using the GeoContext as input.

Computing the same rgb_composite object with different GeoContexts gives us results in two different locations (with different resolutions and coordinate reference systems as well):

>>> seward_comp = rgb_composite.inspect(seward_ctx)
>>> dl.scenes.display(seward_comp.ndarray)
https://cdn.descarteslabs.com/docs/1.12.1/public/_images/concepts_figure7_1.png
>>> santa_fe_comp = rgb_composite.inspect(santa_fe_ctx)
>>> dl.scenes.display(santa_fe_comp.ndarray)
https://cdn.descarteslabs.com/docs/1.12.1/public/_images/concepts_figure8_1.png

Indeed, we could display rgb_composite on the map and look at it anywhere in the world, just by panning around:

>>> rgb_composite.visualize("Sentinel-2 composite", scales=[[0, 0.4], [0, 0.4], [0, 0.4]])
>>> wf.map

Life without for-loops

Because of its proxy object model, you can’t use typical Python features like for-loops, if statements, or list comprehensions with Workflows objects. All of these would require having the data present on your computer. But in Workflows, the data is only processed on the backend—an Image doesn’t contain any actual data, just instructions. So Python can’t know, for instance, whether img.max() > 0.5 is True or not, or how many items are in an ImageCollection, until you call compute or inspect.

And calling compute or inspect all the time is very inefficient: the whole point of Workflows is to move as much computation as possible to the backend. If you call compute or inspect in the middle of your code, you break Workflows’ ability to optimize your code and run it in parallel. Additionally, visualize won’t work as you expect on code with compute or inspect in the middle of it: from Workflows’ perspective, the computed value is just hardcoded, and won’t re-compute itself as you pan around the map.

With Workflows, you’ll learn to write code differently in order to avoid control-flow statements like if and for:

Instead of                          Use
----------------------------------  ----------------------------------------------------------
if                                  ifelse
for-loops or list comprehensions    .map
Removing items from a list          .filter
and, or                             &, | (remember to use more parentheses!)
len(x)                              x.length()
for i in range(10)                  wf.range(10).map(lambda i: ...)
for x, y in zip(xs, ys)             wf.zip(xs, ys).map(lambda x_y_tuple: ...)
for i, x in enumerate(xs)           wf.zip(wf.range(xs.length()), xs).map(lambda i_x_tuple: ...)
if key in a_dict                    a_dict.contains(key)
for key in a_dict                   a_dict.keys().map(lambda key: ...)
for value in a_dict.values()        a_dict.values().map(lambda value: ...)
for key, value in a_dict.items()    a_dict.items().map(lambda key_value_tuple: ...)
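To see why these method-based patterns work where loops don’t, here is a toy deferred list in plain Python: .map and .filter record functions to run later instead of iterating immediately, so no data is needed at the time you write the pipeline. The class and method names are invented for this sketch and are not the Workflows API:

```python
class LazyList:
    """A toy deferred list: .map/.filter record operations; compute() runs them."""

    def __init__(self, source, ops=()):
        self.source = source
        self.ops = ops  # tuple of ("map" | "filter", function) pairs

    def map(self, fn):
        # Records the function -- nothing is iterated yet.
        return LazyList(self.source, self.ops + (("map", fn),))

    def filter(self, fn):
        return LazyList(self.source, self.ops + (("filter", fn),))

    def compute(self):
        """Apply the recorded operations in order (the 'backend' step)."""
        items = list(self.source)
        for op, fn in self.ops:
            if op == "map":
                items = [fn(x) for x in items]
            else:
                items = [x for x in items if fn(x)]
        return items

# Loop version: doubled = [x * 2 for x in xs if x > 1]
xs = LazyList([1, 2, 3])
doubled = xs.filter(lambda x: x > 1).map(lambda x: x * 2)
print(doubled.compute())  # [4, 6]
```

Because a for-loop or if statement would need the elements right away, but .map and .filter only need the *functions*, the deferred style lets the whole pipeline be described up front and executed elsewhere — the same reason Workflows asks you to express iteration this way.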