Make consistent: STAC, PgSTAC, pyPgSTAC (#244)
* Make caps consistent: PgSTAC, pyPgSTAC

* Capitalization consistency: STAC
tylere authored Mar 5, 2024
1 parent f54094e commit 9effb17
Showing 11 changed files with 42 additions and 42 deletions.
2 changes: 1 addition & 1 deletion .devcontainer/devcontainer.json
@@ -1,5 +1,5 @@
{
"name": "PGStac",
"name": "PgSTAC",
"dockerComposeFile": "../docker-compose.yml",
"service": "pgstac",
"workspaceFolder": "/opt/src"
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -1,6 +1,6 @@
# Development - Contributing

PGStac uses a dockerized development environment. However,
PgSTAC uses a dockerized development environment. However,
it still needs a local install of pypgstac to allow an editable
install inside the docker container. This is installed automatically
if you have set up a virtual environment for the project. Otherwise
@@ -58,7 +58,7 @@ This will create a base migration for the new version and will create incrementa
All changes to SQL should only be made in the `/src/pgstac/sql` directory. SQL Files will be run in alphabetical order.

### Adding Tests
PGStac tests can be written using PGTap or basic SQL output comparisons. Additional testing is available using PyTest in the PyPgSTAC module. Tests can be run using the `scripts/test` command.
PgSTAC tests can be written using PGTap or basic SQL output comparisons. Additional testing is available using PyTest in the PyPgSTAC module. Tests can be run using the `scripts/test` command.

PGTap tests can be written using [PGTap](https://pgtap.org/) syntax. Tests should be added to the `/src/pgstac/tests/pgtap` directory. Any new sql files added to this directory must be added to `/src/pgstac/tests/pgtap.sql`.

2 changes: 1 addition & 1 deletion README.md
@@ -28,7 +28,7 @@

PgSTAC provides functionality for STAC Filters, CQL2 search, and utilities to help manage the indexing and partitioning of STAC Collections and Items.

PgSTAC is used in production to scale to hundreds of millions of STAC items. PgSTAC implements core data models and functions to provide a STAC API from a PostgreSQL database. PgSTAC is entirely within the database and does not provide an HTTP-facing API. The [Stac FastAPI](https://github.com/stac-utils/stac-fastapi) PgSTAC backend and [Franklin](https://github.com/azavea/franklin) can be used to expose a PgSTAC catalog. Integrating PgSTAC with any other language with PostgreSQL drivers is also possible.
PgSTAC is used in production to scale to hundreds of millions of STAC items. PgSTAC implements core data models and functions to provide a STAC API from a PostgreSQL database. PgSTAC is entirely within the database and does not provide an HTTP-facing API. The [STAC FastAPI](https://github.com/stac-utils/stac-fastapi) PgSTAC backend and [Franklin](https://github.com/azavea/franklin) can be used to expose a PgSTAC catalog. Integrating PgSTAC with any other language with PostgreSQL drivers is also possible.

PgSTAC Documentation: https://stac-utils.github.io/pgstac/pgstac

26 changes: 13 additions & 13 deletions docs/src/pgstac.md
@@ -1,15 +1,15 @@

PGDatabase Schema and Functions for Storing and Accessing STAC collections and items in PostgreSQL

STAC Client that uses PGStac available in [STAC-FastAPI](https://github.com/stac-utils/stac-fastapi)
STAC Client that uses PgSTAC available in [STAC-FastAPI](https://github.com/stac-utils/stac-fastapi)

PGStac requires **Postgresql>=13** and **PostGIS>=3**. Best performance will be had using PostGIS>=3.1.
PgSTAC requires **Postgresql>=13** and **PostGIS>=3**. Best performance will be had using PostGIS>=3.1.

### PGStac Settings
PGStac installs everything into the pgstac schema in the database. This schema must be in the search_path in the postgresql session while using pgstac.
### PgSTAC Settings
PgSTAC installs everything into the pgstac schema in the database. This schema must be in the search_path in the postgresql session while using pgstac.


#### PGStac Users
#### PgSTAC Users
The pgstac_admin role is the owner of all the objects within pgstac and should be used when running things such as migrations.

The pgstac_ingest role has read/write privileges on all tables and should be used for data ingest or if using the transactions extension with stac-fastapi-pgstac.
@@ -28,7 +28,7 @@ To grant pgstac permissions to a current postgresql user:
GRANT pgstac_read TO <user>;
```

#### PGStac Search Path
#### PgSTAC Search Path
The search_path can be set at the database level or role level or by setting within the current session. The search_path is already set if you are directly using one of the pgstac users. If you are not logging in directly as one of the pgstac users, you will need to set the search_path by adding it to the search_path of the user you are using:
```sql
ALTER ROLE <user> SET SEARCH_PATH TO pgstac, public;
@@ -45,13 +45,13 @@ kwargs={
}
```

#### PGStac Settings Variables
#### PgSTAC Settings Variables
There are additional variables that control the settings used for calculating and displaying context (total row count) for a search, as well as a variable to set the filter language (cql-json or cql-json2).
The context is "off" by default, and the default filter language is set to "cql2-json".

Variables can be set either by passing them in via the connection options using your connection library, setting them in the pgstac_settings table or by setting them on the Role that is used to log in to the database.

Turning "context" on can be **very** expensive on larger databases. Much of what PGStac does is to optimize the search of items sorted by time where only fewer than 10,000 records are returned at a time. It does this by searching for the data in chunks and is able to "short circuit" and return as soon as it has the number of records requested. Calculating the context (the total count for a query) requires a scan of all records that match the query parameters and can take a very long time. Setting "context" to auto will use database statistics to estimate the number of rows much more quickly, but for some queries, the estimate may be quite a bit off.
Turning "context" on can be **very** expensive on larger databases. Much of what PgSTAC does is to optimize the search of items sorted by time where only fewer than 10,000 records are returned at a time. It does this by searching for the data in chunks and is able to "short circuit" and return as soon as it has the number of records requested. Calculating the context (the total count for a query) requires a scan of all records that match the query parameters and can take a very long time. Setting "context" to auto will use database statistics to estimate the number of rows much more quickly, but for some queries, the estimate may be quite a bit off.

Example for updating the pgstac_settings table with a new value:
```sql
@@ -92,19 +92,19 @@ The nohydrate conf item returns an unhydrated item bypassing the CPU intensive s
SELECT search('{"conf":{"nohydrate"=true}}');
```
#### PGStac Partitioning
By default PGStac partitions data by collection (note: this is a change starting with version 0.5.0). Each collection can further be partitioned by either year or month. **Partitioning must be set up prior to loading any data!** Partitioning can be configured by setting the partition_trunc flag on a collection in the database.
#### PgSTAC Partitioning
By default PgSTAC partitions data by collection (note: this is a change starting with version 0.5.0). Each collection can further be partitioned by either year or month. **Partitioning must be set up prior to loading any data!** Partitioning can be configured by setting the partition_trunc flag on a collection in the database.
```sql
UPDATE collections set partition_trunc='month' WHERE id='<collection id>';
```
In general, you should aim to keep each partition less than a few hundred thousand rows. Further partitioning (ie setting everything to 'month' when not needed to keep the partitions below a few hundred thousand rows) can be detrimental.
#### PGStac Indexes / Queryables
#### PgSTAC Indexes / Queryables
By default, PGStac includes indexes on the id, datetime, collection, and geometry. Further indexing can be added for additional properties globally or only on particular collections by modifications to the queryables table.
By default, PgSTAC includes indexes on the id, datetime, collection, and geometry. Further indexing can be added for additional properties globally or only on particular collections by modifications to the queryables table.
The `queryables` table controls the indexes that PGStac will build as well as the metadata that is returned from a [STAC Queryables endpoint](https://github.com/stac-api-extensions/filter#queryables).
The `queryables` table controls the indexes that PgSTAC will build as well as the metadata that is returned from a [STAC Queryables endpoint](https://github.com/stac-api-extensions/filter#queryables).
| Column | Description | Type | Example |
|-----------------------|--------------------------------------------------------------------------|------------|--------------------------------------------------------------------------------------------------------------------|
18 changes: 9 additions & 9 deletions docs/src/pypgstac.md
@@ -2,12 +2,12 @@

PgSTAC includes a Python utility for bulk data loading and managing migrations.

PyPGStac is available on PyPI
pyPgSTAC is available on PyPI
```
pip install pypgstac
```

By default, PyPGStac does not install the `psycopg` dependency. If you want the database driver installed, use:
By default, pyPgSTAC does not install the `psycopg` dependency. If you want the database driver installed, use:

```
pip install pypgstac[psycopg]
@@ -39,7 +39,7 @@ Commands:
version Get version from a pgstac database.
```

PyPGStac will get the database connection settings from the **standard PG environment variables**:
pyPgSTAC will get the database connection settings from the **standard PG environment variables**:

- PGHOST=0.0.0.0
- PGPORT=5432
@@ -50,18 +50,18 @@ PyPGStac will get the database connection settings from the **standard PG enviro
It can also take a DSN database url "postgresql://..." via the **--dsn** flag.

### Migrations
PyPGStac has a utility to help apply migrations to an existing PGStac instance to bring it up to date.
pyPgSTAC has a utility to help apply migrations to an existing PgSTAC instance to bring it up to date.

There are two types of migrations:
- **Base migrations** install PGStac into a database with no current PGStac installation. These migrations follow the file pattern `"pgstac.[version].sql"`
- **Incremental migrations** are used to move PGStac from one version to the next. These migrations follow the file pattern `"pgstac.[version].[fromversion].sql"`
- **Base migrations** install PgSTAC into a database with no current PgSTAC installation. These migrations follow the file pattern `"pgstac.[version].sql"`
- **Incremental migrations** are used to move PgSTAC from one version to the next. These migrations follow the file pattern `"pgstac.[version].[fromversion].sql"`

Migrations are stored in ```pypgstac/pypgstac/migration`s``` and are distributed with the PyPGStac package.
Migrations are stored in ```pypgstac/pypgstac/migration`s``` and are distributed with the pyPgSTAC package.

### Running Migrations
PyPGStac has a utility for checking the version of an existing PGStac database and applying the appropriate migrations in the correct order. It can also be used to setup a database from scratch.
pyPgSTAC has a utility for checking the version of an existing PgSTAC database and applying the appropriate migrations in the correct order. It can also be used to setup a database from scratch.

To create an initial PGStac database or bring an existing one up to date, check you have the pypgstac version installed you want to migrate to and run:
To create an initial PgSTAC database or bring an existing one up to date, check you have the pypgstac version installed you want to migrate to and run:
```
pypgstac migrate
```
2 changes: 1 addition & 1 deletion src/pypgstac/README.md
@@ -1,3 +1,3 @@
# pypgstac

Python tools for working with PGStac
Python tools for working with PgSTAC
2 changes: 1 addition & 1 deletion src/pypgstac/python/pypgstac/__init__.py
@@ -1,4 +1,4 @@
"""PyPGStac Version."""
"""pyPgSTAC Version."""
from pypgstac.version import __version__

__all__ = ["__version__"]
10 changes: 5 additions & 5 deletions src/pypgstac/python/pypgstac/db.py
@@ -1,4 +1,4 @@
"""Base library for database interaction with PgStac."""
"""Base library for database interaction with PgSTAC."""
import atexit
import logging
import time
@@ -53,7 +53,7 @@ class Settings(BaseSettings):


class PgstacDB:
"""Base class for interacting with PgStac Database."""
"""Base class for interacting with PgSTAC Database."""

def __init__(
self,
@@ -260,7 +260,7 @@ def version(self) -> Optional[str]:
if isinstance(version, str):
return version
except psycopg.errors.UndefinedTable:
logger.debug("PGStac is not installed.")
logger.debug("PgSTAC is not installed.")
if self.connection is not None:
self.connection.rollback()
return None
@@ -278,7 +278,7 @@ def pg_version(self) -> str:
version = version.decode()
if isinstance(version, str):
if int(version.split(".")[0]) < 13:
raise Exception("PGStac requires PostgreSQL 13+")
raise Exception("PgSTAC requires PostgreSQL 13+")
return version
else:
if self.connection is not None:
@@ -299,5 +299,5 @@ def func(self, function_name: str, *args: Any) -> Generator:
return self.query(base_query, cleaned_args)

def search(self, query: Union[dict, str, psycopg.types.json.Jsonb] = "{}") -> str:
"""Search PgStac."""
"""Search PgSTAC."""
return dumps(next(self.func("search", query))[0])
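
The `pg_version` hunk above only shows part of the version check. As a minimal standalone sketch of the same idea (the function name `require_postgres_13` is hypothetical and not part of pypgstac), the major version can be read from the server version string and compared against 13:

```python
def require_postgres_13(server_version: str) -> str:
    """Return the version string unchanged, or raise if the major version is below 13.

    Hypothetical helper for illustration; it mirrors the comparison shown in
    the diff, int(version.split(".")[0]) < 13, outside of the PgstacDB class.
    """
    # The major version is everything before the first dot, e.g. "15" in "15.4".
    major = int(server_version.split(".")[0])
    if major < 13:
        raise RuntimeError("PgSTAC requires PostgreSQL 13+")
    return server_version
```

This assumes the string starts with a numeric major version followed by a dot, as in the diff's own parsing.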
2 changes: 1 addition & 1 deletion src/pypgstac/python/pypgstac/migrate.py
@@ -123,7 +123,7 @@ def run_migration(self, toversion: Optional[str] = None) -> str:
toversion = "unreleased"

pg_version = self.db.pg_version
logger.info(f"Migrating PGStac on PostgreSQL Version {pg_version}")
logger.info(f"Migrating PgSTAC on PostgreSQL Version {pg_version}")
oldversion = self.db.version
if oldversion == toversion:
logger.info(f"Target database already at version: {toversion}")
12 changes: 6 additions & 6 deletions src/pypgstac/python/pypgstac/pypgstac.py
@@ -13,7 +13,7 @@


class PgstacCLI:
"""CLI for PgStac."""
"""CLI for PgSTAC."""

def __init__(
self,
@@ -22,7 +22,7 @@ def __init__(
debug: bool = False,
usequeue: bool = False,
):
"""Initialize PgStac CLI."""
"""Initialize PgSTAC CLI."""
if version:
sys.exit(0)

@@ -39,7 +39,7 @@ def initversion(self) -> str:

@property
def version(self) -> Optional[str]:
"""Get PGStac version installed on database."""
"""Get PgSTAC version installed on database."""
return self._db.version

@property
@@ -52,11 +52,11 @@ def pgready(self) -> None:
self._db.wait()

def search(self, query: str) -> str:
"""Search PgStac."""
"""Search PgSTAC."""
return self._db.search(query)

def migrate(self, toversion: Optional[str] = None) -> str:
"""Migrate PgStac Database."""
"""Migrate PgSTAC Database."""
migrator = Migrate(self._db)
return migrator.run_migration(toversion=toversion)

@@ -68,7 +68,7 @@ def load(
dehydrated: Optional[bool] = False,
chunksize: Optional[int] = 10000,
) -> None:
"""Load collections or items into PGStac."""
"""Load collections or items into PgSTAC."""
loader = Loader(db=self._db)
if table == "collections":
loader.load_collections(file, method)
4 changes: 2 additions & 2 deletions src/pypgstac/tests/hydration/test_hydrate_pg.py
@@ -1,4 +1,4 @@
"""Test Hydration in PGStac."""
"""Test Hydration in PgSTAC."""
import os
from contextlib import contextmanager
from typing import Any, Dict, Generator
@@ -12,7 +12,7 @@


class TestHydratePG(THydrate):
"""Test hydration using PGStac."""
"""Test hydration using PgSTAC."""

@contextmanager
def db(self) -> Generator[PgstacDB, None, None]:

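A check along the following lines could help keep the capitalization introduced by this commit from drifting again. It is a hypothetical sketch, not part of the repository: the names `CANONICAL`, `NAME_RE`, and `find_inconsistent` are assumptions, and the rule that all-lowercase forms are left alone (because `pgstac` and `pypgstac` are also package, schema, and CLI identifiers) is a design choice made for this example.

```python
import re

# Canonical prose spellings adopted by this commit (keys are lowercase forms).
CANONICAL = {"pgstac": "PgSTAC", "pypgstac": "pyPgSTAC"}

# Match either project name in any capitalization, as a whole word.
# Identifiers such as PgstacDB or pgstac_admin have no word boundary after
# "pgstac", so they are not matched.
NAME_RE = re.compile(r"\b(py)?pgstac\b", re.IGNORECASE)


def find_inconsistent(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, found, expected) for each mixed-case variant."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in NAME_RE.finditer(line):
            found = match.group(0)
            expected = CANONICAL[found.lower()]
            # All-lowercase forms are treated as identifiers and skipped.
            if found != expected and found != found.lower():
                hits.append((lineno, found, expected))
    return hits
```

Running `find_inconsistent` over the text of a Markdown or Python file would list remaining variants such as `PGStac` or `PyPGStac` together with the spelling this commit standardizes on.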