Note
Click here to download the full example code
Tables Example¶
The tables client exposes a highly scalable backend for ingesting data and performing various types of queries, such as geospatial and temporal, over that data. This example creates a new table, ingests some rows, performs some basic queries, deletes some rows, shares the table with others, and then deletes the table.
The API reference for these methods and classes is at descarteslabs.tables
.
from descarteslabs.tables import Tables
t = Tables()
First let’s download a sample dataset in the form of a shape file so that we don’t have to create our own. We will use active fire data from NASA FIRMS (https://firms.modaps.eosdis.nasa.gov/usfs/active_fire) and assume that it was saved to a file called /tmp/modis_fire.zip. Example command to download: wget -O /tmp/modis_fire.zip https://firms.modaps.eosdis.nasa.gov/data/active_fire/modis-c6.1/shapes/zips/MODIS_C6_1_USA_contiguous_and_Hawaii_7d.zip
import requests
url = "https://firms.modaps.eosdis.nasa.gov/data/active_fire/modis-c6.1/shapes/zips/MODIS_C6_1_USA_contiguous_and_Hawaii_7d.zip"
r = requests.get(url)
with open("/tmp/modis_fire.zip", "wb") as f:
f.write(r.content)
Once you have data, the next step is to create a table that supports the features in
this shape file. We need a schema and an SRID, and a quick way to get that is by
using Tables.inspect_dataset
Note
that this method of deriving the schema and srid should not be relied upon without
manually inspecting the output due to the sheer amount of conversions possible. For
example, converting a date
type may not translate properly so it is encumbent on
the user to ensure everything looks correct before creating a table.
schema, srid = t.inspect_dataset("/tmp/modis_fire.zip")
print(schema, srid)
Out:
{'properties': {'LATITUDE': 'float', 'LONGITUDE': 'float', 'BRIGHTNESS': 'float', 'SCAN': 'float', 'TRACK': 'float', 'ACQ_DATE': 'date', 'ACQ_TIME': 'str', 'SATELLITE': 'str', 'CONFIDENCE': 'int', 'VERSION': 'str', 'BRIGHT_T31': 'float', 'FRP': 'float', 'DAYNIGHT': 'str'}, 'geometry': 'Point'} 4326
Now let’s create our table with Tables.create_table
. The table must have a primary key, which is
either provided on creation or created by the service. We’ll create one on our own
and make it auto-increment.
from uuid import uuid4
table_name = f"fires_{uuid4().hex}"
schema["properties"]["ID"] = "auto"
t.create_table(table_name, srid=srid, schema=schema, primary_key="ID")
Now that we’ve created the table, we can start inserting rows into it. We have the
dataset from earlier, so let’s just insert that using Tables.insert_rows
. Like many other asynchronous client calls in
the platform, a job ID is returned that can be used to poll the status of the job.
Tables.wait_until_completion
can be used to halt execution of the program until the ingest job succeeds.
jobid = t.upload_file("/tmp/modis_fire.zip", table_name)
t.wait_until_completion(jobid)
Once the data has been ingested, the ibis library provides an interface for us to lazily evalute queries and retrieve them as Geo/DataFrames.
fires = t.table(table_name)
print(fires.execute())
Out:
ACQ_DATE VERSION ... LATITUDE geom
0 2022-01-13 6.1NRT ... 21.25716 POINT (-77.88353 21.25716)
1 2022-01-13 6.1NRT ... 22.59575 POINT (-79.84987 22.59575)
2 2022-01-13 6.1NRT ... 22.61468 POINT (-83.06532 22.61468)
3 2022-01-13 6.1NRT ... 33.01699 POINT (-79.88882 33.01699)
4 2022-01-13 6.1NRT ... 44.68003 POINT (-109.64005 44.68003)
... ... ... ... ... ...
2728 2022-01-20 6.1NRT ... 39.48658 POINT (-122.02285 39.48658)
2729 2022-01-20 6.1NRT ... 39.48912 POINT (-122.02832 39.48912)
2730 2022-01-20 6.1NRT ... 39.69320 POINT (-122.15646 39.69320)
2731 2022-01-20 6.1NRT ... 39.74248 POINT (-122.07507 39.74248)
2732 2022-01-20 6.1NRT ... 39.74580 POINT (-122.08365 39.74580)
[2733 rows x 15 columns]
There are many different types of queries that can be performed. For example, to select specific columns:
print(fires["ID", "LONGITUDE", "LATITUDE", "ACQ_TIME"].execute())
Out:
ID LONGITUDE LATITUDE ACQ_TIME
0 1 -77.88353 21.25716 0321
1 2 -79.84987 22.59575 0321
2 3 -83.06532 22.61468 0321
3 4 -79.88882 33.01699 0323
4 5 -109.64005 44.68003 0508
... ... ... ... ...
2728 2729 -122.02285 39.48658 2038
2729 2730 -122.02832 39.48912 2038
2730 2731 -122.15646 39.69320 2038
2731 2732 -122.07507 39.74248 2038
2732 2733 -122.08365 39.74580 2038
[2733 rows x 4 columns]
A more in depth example: let’s find the first row with SATELLITE == ‘T’ whose geometry intersects with the highest confidence field. Note that you will need the ibis library to do this intersection, which is installed as a dependency of the tables client
import ibis
confidence = fires["CONFIDENCE", "geom"].execute()
confidence = confidence[confidence["CONFIDENCE"] > 90]
df_geom = fires[
(fires.SATELLITE == "T")
& fires.geom.intersects(ibis.literal(confidence.iloc[0].geom, type="point;4326"))
]
print(df_geom.execute())
Out:
ACQ_DATE VERSION ... LATITUDE geom
0 2022-01-13 6.1NRT ... 46.51318 POINT (-112.31757 46.51318)
[1 rows x 15 columns]
If we want to remove some rows from the table, we need to get a list of the primary
keys associated with those rows. We can use the Tables.delete_rows
method to achieve this. Let’s delete every row
with < 20 confidence.
to_delete = fires[fires.CONFIDENCE < 20].ID.execute()
jobid = t.delete_rows(to_delete.to_list(), table_name)
t.wait_until_completion(jobid)
Sharing is handled by the Discover service. Therefore, we will need the
Discover
client to do this.
from descarteslabs.discover import Discover, Organization
d = Discover()
To share a table you own with someone as a “viewer” (i.e. read-only), use
Discover.discover.table.share
d.table(table_name).share(with_=Organization("descarteslabs"), as_="viewer")
To change access to a table from viewer to editor (i.e. read/write), use
Discover.discover.table.replace_shares
d.table(table_name).replace_shares(
user=Organization("descarteslabs"), from_role="viewer", to_role="editor"
)
Finally, to revoke access to a table use Discover.discover.table.revoke
d.table(table_name).revoke(from_=Organization("descarteslabs"), as_="editor")
Once you are finished with a table, you can delete the entire thing with
Tables.delete_table
t.delete_table(table_name)
Total running time of the script: ( 0 minutes 12.596 seconds)