Vectors¶
Warning
Vectors has been replaced with Tables. Please use the Tables client and read the Tables guide.
The Vector service lets you store vector geometries (points, polygons, etc.) along with key-value properties, and query that data spatially and/or by properties.
It’s meant for data at the scale of millions to billions of features. If your data can fit in memory, work with it there — the Vector service is not meant for small datasets, and will be far less performant than working locally.
A typical use of the Vector service that produces millions of features is storing the output from Tasks. For example, a computer vision detector might be run in thousands of tasks over many years of data across a continent; the objects it detects could be saved as features for later querying and analysis.
Note
For information about API Quotas and limits see our Quotas & Limits page.
Data Types¶
The Vector service mirrors GeoJSON by offering two types: Feature and FeatureCollection. A Feature is a single geometry and key-value properties; a FeatureCollection holds many Features, with an id and access controls.
The key-value properties of Features are schemaless: Features in the same FeatureCollection do not all have to have the same keys present, or the same types for their values. Features cannot be modified after they have been added, but they can be removed.
Feature¶
A Feature is a single GeoJSON Geometry, a dict of properties, and a unique id:
>>> import descarteslabs as dl
>>> import shapely.geometry
>>>
>>> feature = dl.vectors.Feature(
...     geometry={
...         'type': 'Polygon',
...         'coordinates': [[[-95, 42], [-93, 42], [-93, 40], [-95, 41], [-95, 42]]]
...     },
...     properties={
...         "temperature": 70.13,
...         "size": "large",
...         "tags": None
...     }
... )
>>>
>>> feature
Feature({
'geometry': {
'coordinates': (((-95.0, 42.0), (-93.0, 42.0), (-93.0, 40.0), (-95.0, 41.0), (-95.0, 42.0)),),
'type': 'Polygon'
},
'id': None,
'properties': {
'size': 'large',
'tags': None,
'temperature': 70.13
},
'type': 'Feature'
})
Unlike GeoJSON, the values in properties can only be strings (up to 256 characters), integers, floats, or the value None. Therefore, properties doesn’t support nesting (containing more dictionaries or lists).
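The value rules above can be checked locally before building Features. The following is a minimal sketch, not part of the descarteslabs client: the helper name invalid_properties and its return format are assumptions made for illustration.

```python
# Hypothetical local check; not part of the descarteslabs client.
MAX_STRING_LENGTH = 256  # string limit stated in this guide


def invalid_properties(properties):
    """Return {key: reason} for each value that breaks the rules above."""
    problems = {}
    for key, value in properties.items():
        # Note: bool is a subclass of int in Python, so True/False pass here.
        # Whether the service accepts booleans is not stated in this guide.
        if value is None or isinstance(value, (int, float)):
            continue
        if isinstance(value, str):
            if len(value) > MAX_STRING_LENGTH:
                problems[key] = "string longer than 256 characters"
            continue
        # Lists, dicts, and anything else count as unsupported (no nesting).
        problems[key] = "unsupported type: " + type(value).__name__
    return problems


ok = invalid_properties({"temperature": 70.13, "size": "large", "tags": None})
bad = invalid_properties({"tags": ["a", "b"], "meta": {"k": 1}})
print(ok)   # {}
print(bad)  # {'tags': 'unsupported type: list', 'meta': 'unsupported type: dict'}
```

Nested values usually need to be flattened into separate keys, or serialized to a string that fits the length limit, before they can be stored.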
geometry must be a primitive GeoJSON geometry (Point, MultiPoint, Polygon, MultiPolygon, LineString, MultiLineString, GeometryCollection). Using a Feature or FeatureCollection will raise an error. As a GeoJSON geometry, the coordinates are assumed to be (longitude, latitude) in WGS84 decimal degrees (EPSG:4326), with planar edges.
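If your input data arrives as GeoJSON Feature objects, the geometry has to be pulled out first. A minimal sketch of that step follows; the helper name geometry_of is hypothetical and works on plain dicts only.

```python
# Hypothetical helper; operates on plain GeoJSON dicts, not on client objects.
PRIMITIVE_TYPES = {
    "Point", "MultiPoint", "LineString", "MultiLineString",
    "Polygon", "MultiPolygon", "GeometryCollection",
}


def geometry_of(geojson):
    """Return a primitive geometry dict, unwrapping a GeoJSON Feature if needed."""
    kind = geojson.get("type")
    if kind == "Feature":
        # A GeoJSON Feature wraps its geometry under the "geometry" key.
        geojson = geojson["geometry"]
        kind = geojson.get("type")
    if kind not in PRIMITIVE_TYPES:
        raise ValueError("not a primitive GeoJSON geometry: {!r}".format(kind))
    return geojson


wrapped = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-106.646, 35.938]},  # (lon, lat)
    "properties": {"name": "San Antonio"},
}
print(geometry_of(wrapped)["type"])  # Point
```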
You don’t need to set id yourself, and you shouldn’t: it is assigned when the Feature is added to a FeatureCollection.
The geometry you pass in is converted to a Shapely shape:
>>> feature.geometry
<shapely.geometry.polygon.Polygon at 0x7f3f45e19790>
The properties are stored in a DotDict, which allows you to refer to values by key or by attribute access, making the syntax for getting and setting properties more convenient:
>>> feature.properties['temperature']
70.13
>>> feature.properties.temperature
70.13
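To illustrate the idea behind attribute-style access, here is a toy dict subclass. It is a sketch of the general technique only, not the library’s DotDict implementation; the class name AttrDict is an assumption.

```python
class AttrDict(dict):
    """Toy dict whose keys can also be read and written as attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict methods
        # such as .items() and .keys() keep working as usual.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        # Attribute assignment is stored as a regular dict item.
        self[name] = value


props = AttrDict(temperature=70.13, size="large")
props.size = "small"
print(props["temperature"], props.temperature)  # 70.13 70.13
print(props["size"])                            # small
```

In this sketch, a key that clashes with a dict method name (for example "items") can only be read with bracket syntax.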
FeatureCollection¶
In the Vector service, you create products with an id, description, and access controls to hold a collection of Features. A FeatureCollection represents one of those vector products with some filters applied to it.
A FeatureCollection doesn’t actually contain data. Instead, filter() sets up the filters to be used, then features() returns an iterator over the matching Features; retrieving the first value from the iterator performs the query. Each call to filter() returns a new FeatureCollection instance, allowing you to partially apply and chain filters.
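The pattern described above (an object that records filters and only evaluates them on iteration) can be sketched in plain Python. This is a toy model over an in-memory list, not how the service or client is implemented; the LazyQuery class and its predicate functions are assumptions.

```python
class LazyQuery:
    """Toy stand-in for a filtered collection: holds predicates, not results."""

    def __init__(self, rows, predicates=()):
        self._rows = rows
        self._predicates = tuple(predicates)

    def filter(self, predicate):
        # Return a new object; the original query is left untouched.
        return LazyQuery(self._rows, self._predicates + (predicate,))

    def features(self):
        # Generator: nothing is evaluated until the first item is requested.
        for row in self._rows:
            if all(pred(row) for pred in self._predicates):
                yield row


springs = [
    {"name": "San Antonio", "temp_c": 54},
    {"name": "San Ysidro", "temp_c": 20},
    {"name": "Gila", "temp_c": 66},
]

everything = LazyQuery(springs)
hot = everything.filter(lambda row: row["temp_c"] > 50)
very_hot = hot.filter(lambda row: row["temp_c"] > 60)

print([row["name"] for row in hot.features()])       # ['San Antonio', 'Gila']
print([row["name"] for row in very_hot.features()])  # ['Gila']
```

Because filter() never mutates the object it is called on, hot can be reused as the starting point for several narrower queries.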
Creating FeatureCollections¶
To see existing FeatureCollections that you have access to, use list():
>>> fcs = dl.vectors.FeatureCollection.list()
>>> fcs[:3]
[FeatureCollection({
'description': '',
'id': 'noaa_tornado_reports',
'name': 'noaa_tornado_reports',
'title': 'NOAA Tornado Reports'
}),
FeatureCollection({
'description': '',
'id': '06d1f4694ead46a49f6b32194dfadac',
'name': 'us_congressional_districts_area',
'title': 'Congressional Districts of the USA'
}),
FeatureCollection({
'description': 'This product was created using an example file.',
'id': '6c7945a01f1842d983417223b226673',
'name': 'my_test_product',
'owners': ['user:d4ef22d5a6969cb61147ec8ea3e060cdf33e1a49', 'org:descarteslabs'],
'readers': [],
'title': 'My Test Product',
'writers': []
})]
You can instantiate a FeatureCollection object from an existing Vector product using its ID:
>>> us_cities_fc = dl.vectors.FeatureCollection("d1349cc2d8854d998aa6da92dc2bd24")
>>> us_cities_fc
FeatureCollection({
'description': '',
'id': 'd1349cc2d8854d998aa6da92dc2bd24',
'name': 'us_cities_area',
'title': 'Cities of the USA'
})
To create a new Vector product, use create(). You must supply an id, a human-readable title, and a description. You can also supply optional owners, readers, and writers lists. Ids must be less than 204 characters and may only contain alphanumeric characters, dashes (-), and underscores (_).
>>> import uuid
>>>
>>> unique_id = uuid.uuid4().hex
>>> mountains_of_middle_earth = dl.vectors.FeatureCollection.create(
...     product_id="mome_" + unique_id,
...     title="Mountains of Middle Earth",
...     description="Nice spots to climb around the Shire"
... )
Your user id is prepended to the id in the created FeatureCollection:
>>> mountains_of_middle_earth.id
'd4ef22d5a6969cb61147ec8ea3e060cdf33e1a49:mome_303f39838def4edf840cb86a2fe5eba6'
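The id rules stated in this section can be checked before calling create(). The sketch below is a local convenience only: the function name is hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and the guide does not say whether the length limit counts the prepended user id.

```python
import re

# Letters, digits, dashes, underscores; 1 to 203 characters ("less than 204").
# Assumption: "alphanumeric" means ASCII letters and digits.
_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,203}")


def is_valid_product_id(product_id):
    """Return True if product_id satisfies the rules stated in this guide."""
    return _ID_PATTERN.fullmatch(product_id) is not None


print(is_valid_product_id("mome_303f39838def4edf840cb86a2fe5eba6"))  # True
print(is_valid_product_id("mountains of middle earth"))              # False
```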
Modifying FeatureCollections¶
To modify the metadata of a FeatureCollection (its title, description, access control lists, etc.), use update():
>>> mountains_of_middle_earth.update(
...     description="Mt. Doom is on private land; landowner not climber-friendly",
...     writers=mountains_of_middle_earth.writers + ['org:descarteslabs']
... )
>>>
>>> mountains_of_middle_earth
FeatureCollection({
'description': 'Mt. Doom is on private land; landowner not climber-friendly',
'id': 'd4ef22d5a6969cb61147ec8ea3e060cdf33e1a49:mome_303f39838def4edf840cb86a2fe5eba6',
'name': None,
'owners': ['user:d4ef22d5a6969cb61147ec8ea3e060cdf33e1a49', 'org:descarteslabs'],
'readers': [],
'title': 'Mountains of Middle Earth',
'writers': ['org:descarteslabs']
})
Delete a Vector product that you own using delete():
>>> product_id = mountains_of_middle_earth.id  # avoid shadowing the built-in id()
>>>
>>> before_delete = product_id in [fc.id for fc in dl.vectors.FeatureCollection.list()]
>>> before_delete
True
>>> mountains_of_middle_earth.delete()
>>>
>>> after_delete = product_id in [fc.id for fc in dl.vectors.FeatureCollection.list()]
>>> after_delete
False
Adding Features¶
To add Features to a FeatureCollection, use add() and pass in a Feature instance, or a list of them. The method returns a copy of the Features, with the id property now set.
>>> # first, we need a FeatureCollection to add to
>>> nm_hotsprings = dl.vectors.FeatureCollection.create(
...     product_id="hs_" + unique_id,
...     title="Geothermal Springs in New Mexico",
...     description="Data from 1980 NOAA 'Thermal Springs List for the United States'"
... )
Make some Features:
>>> hotsprings_features = [
...     dl.vectors.Feature(
...         shapely.geometry.Point(-106.646, 35.938),
...         {'name': 'San Antonio', 'temp_c': 54, 'category': 'hot'}
...     ),
...     dl.vectors.Feature(
...         shapely.geometry.Point(-106.827, 35.548),
...         {'name': 'San Ysidro', 'temp_c': 20, 'category': 'warm', 'fun': 'no'}
...     ),
...     dl.vectors.Feature(
...         shapely.geometry.Point(-108.209, 33.199),
...         {'name': 'Gila', 'temp_c': 66, 'category': 'hot'}
...     ),
... ]
>>>
>>> added_features = nm_hotsprings.add(hotsprings_features)
>>> added_features
[Feature({
'geometry': {
'coordinates': (-106.646, 35.938),
'type': 'Point'
},
'id': 'e7a939450e18159e24ff05be3eb13ecc0937b4f152...32b22f45e1b8038e060f546b77_5fa35de47ddd4af1',
'properties': {
'category': 'hot',
'name': 'San Antonio',
'temp_c': 54
},
'type': 'Feature'
}),
Feature({
'geometry': {
'coordinates': (-106.827, 35.548),
'type': 'Point'
},
'id': 'e7a939450e18159e24ff05be3eb13ecc0937b4f152...32b22f45e1b8038e060f546b77_6dbdb48c191d4a88',
'properties': {
'category': 'warm',
'fun': 'no',
'name': 'San Ysidro',
'temp_c': 20
},
'type': 'Feature'
}),
Feature({
'geometry': {
'coordinates': (-108.209, 33.199),
'type': 'Point'
},
'id': 'e7a939450e18159e24ff05be3eb13ecc0937b4f152...32b22f45e1b8038e060f546b77_f988bc9419404209',
'properties': {
'category': 'hot',
'name': 'Gila',
'temp_c': 66
},
'type': 'Feature'
})]
Notice that the returned Features have an ID set.
Features do not need to follow a fixed schema: notice how the Feature for San Ysidro hot springs has {'fun': 'no'} set, whereas the other hot springs do not have a fun property (since obviously they both are). Features can have different properties, or different types of values for properties of the same name, so be prepared for this in your code.
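One way to be prepared for missing keys and mixed value types is to read properties through a small defensive helper. This is a sketch on plain dicts; the helper name numeric_property and the "unknown" value in the sample data are made up for illustration.

```python
def numeric_property(properties, key, default=None):
    """Return properties[key] if it is an int or float, else default."""
    value = properties.get(key, default)
    # bool is excluded explicitly because it is a subclass of int.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value
    return default


rows = [
    {"name": "San Antonio", "temp_c": 54},
    {"name": "San Ysidro", "temp_c": "unknown", "fun": "no"},  # hypothetical string value
    {"name": "Gila"},                                          # key missing entirely
]

temps = [numeric_property(row, "temp_c") for row in rows]
print(temps)  # [54, None, None]
```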
Querying FeatureCollections¶
FeatureCollections can be queried spatially, as well as by their key-value properties, with the filter() method.
Remember that a FeatureCollection represents a Vector product with filters applied to it. That means that each call to filter() returns a new FeatureCollection instance, still referring to the same underlying product, but with more filters applied. This lets you start with one query, and chain more onto it.
Spatial Filtering¶
To add a spatial query to a FeatureCollection, pass a GeoJSON geometry dict, or object with __geo_interface__, to the geometry keyword argument of filter(). Only Features that intersect that geometry will be selected. Any geometry type can be used (though Point doesn’t make a whole lot of sense).
>>> northern_nm_polygon = {
...     "type": "Polygon",
...     "coordinates": [[[-107, 35], [-105, 35], [-105, 37], [-107, 37], [-107, 35]]]
... }
>>>
>>> northern_nm_springs = nm_hotsprings.filter(geometry=northern_nm_polygon)
>>>
>>> southern_nm_polygon = {
...     "type": "Polygon",
...     "coordinates": [[[-109, 32], [-106, 32], [-106, 35], [-109, 35], [-109, 32]]]
... }
>>>
>>> southern_nm_springs = nm_hotsprings.filter(geometry=southern_nm_polygon)
Notice that calling filter() returns a copy of the FeatureCollection, not the Features themselves:
>>> northern_nm_springs
FeatureCollection({
'description': "Data from 1980 NOAA 'Thermal Springs List for the United States'",
'id': 'd4ef22d5a6969cb61147ec8ea3e060cdf33e1a49:hs_303f39838def4edf840cb86a2fe5eba6',
'name': None,
'owners': ['user:d4ef22d5a6969cb61147ec8ea3e060cdf33e1a49', 'org:descarteslabs'],
'readers': [],
'title': 'Geothermal Springs in New Mexico',
'writers': []
})
The two FeatureCollections (northern_nm_springs and southern_nm_springs) refer to the same Vector product, but will return different data when iterating through features():
>>> nnm_features = list(northern_nm_springs.features())
>>> nnm_features
[Feature({
'geometry': {
'coordinates': (-106.646, 35.938),
'type': 'Point'
},
'id': 'e7a939450e18159e24ff05be3eb13ecc0937b4f152...32b22f45e1b8038e060f546b77_5fa35de47ddd4af1',
'properties': {
'category': 'hot',
'name': 'San Antonio',
'temp_c': 54
},
'type': 'Feature'
}),
Feature({
'geometry': {
'coordinates': (-106.827, 35.548),
'type': 'Point'
},
'id': 'e7a939450e18159e24ff05be3eb13ecc0937b4f152...32b22f45e1b8038e060f546b77_6dbdb48c191d4a88',
'properties': {
'category': 'warm',
'fun': 'no',
'name': 'San Ysidro',
'temp_c': 20
},
'type': 'Feature'
})]
>>> snm_features = list(southern_nm_springs.features())
>>> snm_features
[Feature({
'geometry': {
'coordinates': (-108.209, 33.199),
'type': 'Point'
},
'id': 'e7a939450e18159e24ff05be3eb13ecc0937b4f152...32b22f45e1b8038e060f546b77_f988bc9419404209',
'properties': {
'category': 'hot',
'name': 'Gila',
'temp_c': 66
},
'type': 'Feature'
})]
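To see why each spring lands in the result it does, the intersection test for a point and a simple polygon can be reproduced locally with a planar ray-casting check. This is an illustrative sketch only: it is not the service’s algorithm, it ignores polygon holes, and its treatment of points exactly on an edge is arbitrary.

```python
def point_in_ring(lon, lat, ring):
    """Planar ray-casting test; ring is a closed list of (lon, lat) pairs."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # each crossing to the east flips the state
    return inside


northern_ring = [[-107, 35], [-105, 35], [-105, 37], [-107, 37], [-107, 35]]
southern_ring = [[-109, 32], [-106, 32], [-106, 35], [-109, 35], [-109, 32]]

springs = {
    "San Antonio": (-106.646, 35.938),
    "San Ysidro": (-106.827, 35.548),
    "Gila": (-108.209, 33.199),
}

northern = [name for name, (lon, lat) in springs.items() if point_in_ring(lon, lat, northern_ring)]
southern = [name for name, (lon, lat) in springs.items() if point_in_ring(lon, lat, southern_ring)]
print(northern)  # ['San Antonio', 'San Ysidro']
print(southern)  # ['Gila']
```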
Property Filtering¶
To add a properties filter to a FeatureCollection, use the descarteslabs.vectors.properties helper. It lets you use normal Python operators to specify comparisons that you can pass to the properties keyword argument of filter(). For example:
>>> from descarteslabs.vectors import properties as p
>>>
>>> very_hot_hotsprings = nm_hotsprings.filter(
...     properties=(p.category == "hot") & (p.temp_c > 60)
... )
>>>
>>> vh_features = list(very_hot_hotsprings.features())
>>> vh_features
[Feature({
'geometry': {
'coordinates': (-108.209, 33.199),
'type': 'Point'
},
'id': 'e7a939450e18159e24ff05be3eb13ecc0937b4f152...32b22f45e1b8038e060f546b77_f988bc9419404209',
'properties': {
'category': 'hot',
'name': 'Gila',
'temp_c': 66
},
'type': 'Feature'
})]
To refer to a field in your data, just access that attribute by name from descarteslabs.vectors.properties. Then you can use Python binary comparison operators (>, <, >=, <=, ==, !=). like is also supported for pattern-matching in strings; see the like example for more.
To combine these expressions, use & (logical AND), | (logical OR), and parentheses. Using Python and and or will not work as expected:
>>> type(p.a > 1 and p.b == 1) # just returns the `p.b == 1` part
descarteslabs.common.property_filtering.filtering.EqExpression
>>> type((p.a > 1) & (p.b == 1)) # AndExpression as intended
descarteslabs.common.property_filtering.filtering.AndExpression
When filtering, the value of a field that doesn’t exist is considered None. Additionally, if the types of a field’s value and the value it’s compared to are incompatible, the comparison evaluates to False.
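The behaviour described in this section (operators that build expression objects, and versus &, missing fields read as None, incompatible types comparing as False) can be mimicked with a few small classes. This is a toy model evaluated locally, not the library’s property_filtering code; every class name here is an assumption.

```python
import operator


class Expr:
    """Toy expression node; subclasses implement evaluate(properties)."""

    def __and__(self, other):
        return And(self, other)

    def __or__(self, other):
        return Or(self, other)


class Compare(Expr):
    def __init__(self, name, op, value):
        self.name, self.op, self.value = name, op, value

    def evaluate(self, properties):
        actual = properties.get(self.name)  # a missing field is read as None
        try:
            return bool(self.op(actual, self.value))
        except TypeError:
            return False  # incompatible types compare as False


class And(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, properties):
        return self.left.evaluate(properties) and self.right.evaluate(properties)


class Or(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, properties):
        return self.left.evaluate(properties) or self.right.evaluate(properties)


class Field:
    """Comparing a Field builds an expression instead of returning a bool."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return Compare(self.name, operator.eq, value)

    def __gt__(self, value):
        return Compare(self.name, operator.gt, value)


category, temp_c = Field("category"), Field("temp_c")

combined = (category == "hot") & (temp_c > 60)
# `and` cannot be overloaded: the left operand is a truthy object,
# so Python simply returns the right operand and drops the left one.
dropped = (category == "hot") and (temp_c > 60)

print(type(combined).__name__)  # And
print(type(dropped).__name__)   # Compare
print(combined.evaluate({"category": "hot", "temp_c": 66}))    # True
print(combined.evaluate({"category": "hot"}))                  # False (temp_c missing)
print(combined.evaluate({"category": "hot", "temp_c": "66"}))  # False (str vs int)
```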
Retrieving Data¶
The filters you set aren’t actually applied until you iterate through features(). This means you can start with one filtered FeatureCollection and chain other filters onto it:
>>> hot_hotsprings = nm_hotsprings.filter(properties=(p.category == "hot"))
>>> northern_hot_hotsprings = hot_hotsprings.filter(geometry=northern_nm_polygon)
>>> southern_hot_hotsprings = hot_hotsprings.filter(geometry=southern_nm_polygon)
>>>
>>> nnm_features = list(northern_hot_hotsprings.features())
>>> nnm_features
[Feature({
'geometry': {
'coordinates': (-106.646, 35.938),
'type': 'Point'
},
'id': 'e7a939450e18159e24ff05be3eb13ecc0937b4f152...32b22f45e1b8038e060f546b77_5fa35de47ddd4af1',
'properties': {
'category': 'hot',
'name': 'San Antonio',
'temp_c': 54
},
'type': 'Feature'
})]
>>> snm_features = list(southern_hot_hotsprings.features())
>>> snm_features
[Feature({
'geometry': {
'coordinates': (-108.209, 33.199),
'type': 'Point'
},
'id': 'e7a939450e18159e24ff05be3eb13ecc0937b4f152...32b22f45e1b8038e060f546b77_f988bc9419404209',
'properties': {
'category': 'hot',
'name': 'Gila',
'temp_c': 66
},
'type': 'Feature'
})]
The chained filters are logically ANDed together:
>>> southern_notfun_hotsprings = southern_hot_hotsprings.filter(
...     properties=(p.fun == 'no')
... )
>>>
>>> notfun_features = list(southern_notfun_hotsprings.features())
>>> notfun_features
[]
You can’t chain multiple geometries, however—filtering with a new geometry simply replaces the old one.
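The two chaining rules (property filters accumulate with AND, a new geometry replaces the previous one) can be summarized in a small state-merging function. This is a sketch of the rules as stated here, with filters represented as plain strings; the function name and the dict layout are assumptions, not the client’s internals.

```python
def apply_filter(state, geometry=None, properties=None):
    """Return a new filter state; the input state is not modified."""
    new_state = {
        "geometry": state.get("geometry"),
        "properties": list(state.get("properties", [])),
    }
    if geometry is not None:
        new_state["geometry"] = geometry            # replaces any earlier geometry
    if properties is not None:
        new_state["properties"].append(properties)  # ANDed with earlier filters
    return new_state


state = apply_filter({}, properties="category == 'hot'")
state = apply_filter(state, geometry="northern polygon")
state = apply_filter(state, geometry="southern polygon", properties="fun == 'no'")
print(state)
# {'geometry': 'southern polygon', 'properties': ["category == 'hot'", "fun == 'no'"]}
```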
Note
Because Vector products can potentially contain millions or billions of Features, you must specify some filter in order to iterate through features(). Not doing so will raise an error:
>>> list(nm_hotsprings.features())
---------------------------------------------------------------------------
BadRequestError Traceback (most recent call last)
/tmp/ipykernel_7/3620452046.py in <module>
----> 1 list(nm_hotsprings.features())
    ...
BadRequestError: {"errors": [{"status": "400", "detail": "No query given and no limit set: one of geometry, query_expr must be set or a limit given"}]}
Modifying and deleting Features¶
It isn’t possible to modify individual Features in a FeatureCollection. Keep this in mind when adding data that might change to a FeatureCollection. If you want to retain multiple versions of Features, you could use a "version" property on your Features to make it possible to query for different versions of your data.
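With a "version" property in place, the newest record for each logical item can be picked out after retrieval. The sketch below works on plain property dicts; it assumes that a name property identifies the item and that versions are comparable numbers, and the sample values are invented.

```python
def latest_versions(records, key="name", version_key="version"):
    """Keep only the highest-version record for each distinct key value."""
    latest = {}
    for props in records:
        current = latest.get(props[key])
        if current is None or props[version_key] > current[version_key]:
            latest[props[key]] = props
    return list(latest.values())


records = [
    {"name": "Gila", "temp_c": 66, "version": 1},
    {"name": "Gila", "temp_c": 67, "version": 2},   # hypothetical re-measurement
    {"name": "San Antonio", "temp_c": 54, "version": 1},
]

for props in latest_versions(records):
    print(props["name"], props["version"], props["temp_c"])
# Gila 2 67
# San Antonio 1 54
```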
If you need to make changes to features from a Vector product and you don’t want multiple versions of Features, you can create a new Vector product, copy over all the Features from the old one that you want to keep, and make any changes to them in the process of copying.
It is possible to delete Features using a filter. You apply a filter much the same way you do to retrieve Features from a FeatureCollection; however, you cannot use limit() when deleting Features. Once you create your filters, call delete_features() to start removing features from the collection.
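>>> from descarteslabs.vectors import FailedJobError
>>>
>>> hot_hotsprings = nm_hotsprings.filter(properties=(p.category == "hot"))
>>>
>>> # Start the job outside the try block so delete_job is always defined.
>>> delete_job = hot_hotsprings.delete_features()
>>>
>>> try:
...     # Block until the job reports that it is done.
...     delete_job.wait_for_completion()
... except FailedJobError:
...     print(delete_job.state, delete_job.errors)
...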
Because deleting features can take a long time, delete_features() returns a DeleteJob. If you need to ensure the DeleteJob ran successfully, you can call wait_for_completion() to block until the job reports that it is done. You can then check the state of the job, how long it took to complete, and what errors, if any, occurred.
It’s only possible to run a single DeleteJob at a time per Vector product. If you have multiple filters you need to apply, you’ll need to wait for each job to complete before running the next delete filter.
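Running several delete filters one after another might look like the loop below. Only delete_features() and wait_for_completion() come from this guide; the function name and the Fake classes are stand-ins so the sketch can run without the service.

```python
def delete_sequentially(filtered_collections):
    """Start one delete job at a time, waiting for each before the next."""
    finished = []
    for collection in filtered_collections:
        job = collection.delete_features()
        job.wait_for_completion()  # blocks until this job is done
        finished.append(job)
    return finished


# Stand-ins that record the call order (not part of the client).
class FakeJob:
    def __init__(self, log, label):
        self._log, self._label = log, label

    def wait_for_completion(self):
        self._log.append("done " + self._label)


class FakeCollection:
    def __init__(self, log, label):
        self._log, self._label = log, label

    def delete_features(self):
        self._log.append("start " + self._label)
        return FakeJob(self._log, self._label)


log = []
jobs = delete_sequentially([FakeCollection(log, "hot"), FakeCollection(log, "warm")])
print(log)  # ['start hot', 'done hot', 'start warm', 'done warm']
```

With real FeatureCollections, each element of the list would be the result of a filter() call on the same product.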
A note on small polygons¶
Polygons and multi-polygons that have points or edges separated by distances on the order of microns can confuse our optimised spatial filtering. If you’re working with Features that have very small geometries, you may find that your spatial filters return Features outside your area of interest.
If you run into this issue, you can use Shapely’s intersects() method to perform post-filtering:
from shapely.geometry import shape

# fc is assumed to be an existing FeatureCollection.
aoi = shape({
    "type": "Polygon",
    "coordinates": [
        [
            [-85.84030613794451, 30.165376869552276],
            [-85.77959777508137, 30.165376869552276],
            [-85.77959777508137, 30.193893824317527],
            [-85.84030613794451, 30.193893824317527],
            [-85.84030613794451, 30.165376869552276]
        ]
    ]
})

# Let the service pre-filter, then keep only Features that truly intersect the AOI.
features = [f for f in fc.filter(geometry=aoi).features() if aoi.intersects(f.geometry)]