Earth Index Embeddings

Submission Details

Submitter (Affiliation): Ben Strong (Earth Genome), Hutch (Tom) Ingold (Earth Genome)
Data Provider (Legal Entity): Earth Genome (501(c)(3) Nonprofit)
Homepage: http://earthindex.ai/

Overview

Earth Index is a platform for tile-level geospatial search and classification of satellite imagery. Earth Index works by “pre-indexing” the planet for search using earth observation foundation models.

Use Cases

We've been using embeddings for human-in-the-loop tile-level search (approximate nearest neighbors) and classification (linear or other lightweight models). Some downstream applications include monitoring targets like:

Poultry CAFOs
Artisanal gold mining
Infrastructure development (roads, dams, etc.)
Landfills and waste sites
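
As a rough illustration of the tile-level search described above, the sketch below ranks tiles by cosine similarity to a query embedding. It uses an exact brute-force scan as a stand-in for an approximate nearest neighbor index, and the tile ids and 3-dimensional vectors are made up for the example; they are not taken from the dataset.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_similar(query, tiles, k=3):
    """Return the ids of the k tiles whose embeddings are closest to `query`.

    `tiles` maps a tile id to its embedding vector.
    """
    scored = [(cosine_similarity(query, emb), tile_id) for tile_id, emb in tiles.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tile_id for _, tile_id in scored[:k]]


# Hypothetical toy embeddings, for illustration only.
tiles = {
    "tile-a": [1.0, 0.0, 0.0],
    "tile-b": [0.9, 0.1, 0.0],
    "tile-c": [0.0, 1.0, 0.0],
}
print(top_k_similar([1.0, 0.05, 0.0], tiles, k=2))
```

In a human-in-the-loop workflow, the returned tiles would typically be reviewed and labeled, and those labels could then feed a lightweight classifier.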

Data

URL: [tbd, on source.coop]
Documentation: [tbd]
Projection: EPSG:4326
License: CC-BY

Samples

Validation

GeoParquet validation: gpq validate embeddings.parquet
Validation of the sample emb metadata and the parquet metadata: python3 validate.py

Columns

Field Name	Type	Description
geometry	geometry	Geometry of tile used to generate embeddings
id	string	Unique identifier for tile
embedding	array	The vector embedding

File structure

Each parquet file is derived from one MGRS 100 km x 100 km imagery tile and inherits that tile's identifier as its filename, e.g. 21MNT. That said, relying on a filename for metadata is an easy way to lose context, so we make sure that all relevant metadata from the filename is also embedded in the file itself.
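
For readers unfamiliar with the naming, the sketch below splits a tile identifier such as 21MNT into its MGRS parts: grid zone number, latitude band, and 100 km square letters. The function name `parse_mgrs_tile` is hypothetical and is not part of this repository.

```python
import re
from pathlib import Path

# Grid zone number (1-60), latitude band (C-X, no I or O), then the
# 100 km square id: a column letter (A-Z, no I or O) and a row letter
# (A-V, no I or O).
MGRS_TILE = re.compile(r"^(\d{1,2})([C-HJ-NP-X])([A-HJ-NP-Z][A-HJ-NP-V])$")


def parse_mgrs_tile(filename):
    """Split an MGRS 100 km tile id taken from a filename into its parts."""
    stem = Path(filename).stem.upper()
    match = MGRS_TILE.match(stem)
    if match is None:
        raise ValueError(f"not an MGRS 100 km tile id: {stem!r}")
    zone = int(match.group(1))
    if not 1 <= zone <= 60:
        raise ValueError(f"grid zone out of range: {zone}")
    return {
        "tile": stem,
        "zone": zone,
        "band": match.group(2),
        "square": match.group(3),
    }


print(parse_mgrs_tile("21MNT.parquet"))
```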

Metadata

The embeddings.parquet file is also a valid GeoParquet file, with the expected metadata stored under the geo key. Additionally, we have added an emb key for the embeddings metadata, also in JSON format.
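
A minimal sketch of reading both keys is shown below. It assumes the file-level key/value metadata is exposed as a bytes-to-bytes mapping, which is how pyarrow returns it from `pq.read_schema(path).metadata`; the helper names are hypothetical and this is not the logic in validate.py.

```python
import json


def decode_geo_and_emb(kv_metadata):
    """Decode the JSON documents stored under the `geo` and `emb` keys.

    `kv_metadata` is the Parquet key/value metadata as a bytes -> bytes dict.
    """
    decoded = {}
    for key in (b"geo", b"emb"):
        raw = kv_metadata.get(key)
        if raw is None:
            raise KeyError(f"missing metadata key: {key.decode()}")
        decoded[key.decode()] = json.loads(raw)
    return decoded


def load_file_metadata(path):
    """Read the key/value metadata of a Parquet file and decode it."""
    # Imported lazily so the decoder above works without pyarrow installed.
    import pyarrow.parquet as pq

    return decode_geo_and_emb(pq.read_schema(path).metadata or {})


# Stand-in metadata with made-up contents, for illustration only.
example = {
    b"geo": b'{"version": "1.0.0"}',
    b"emb": b'{"version": "0.0.1"}',
}
print(decode_geo_and_emb(example)["emb"])
```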

The metadata adheres to the JSON schema provided. For interoperability with STAC, this schema references some STAC schema elements, specifically provider, datetime and licensing.

EMB Metadata

Field Name	Type	Description
version	string	fixed at `0.0.1` (required)
model	object<string, object>	see Model metadata
providers	object<string, object>	provider metadata -- see provider
licensing	object<string, object>	licensing metadata -- see licensing
datetime	object<string, object>	datetime metadata -- see datetime
source_datasets	object<string, object>	datasets used to generate embeddings - see Dataset metadata
embedding	object<string, object>	Embeddings metadata

Model metadata

Field Name	Type	Description
id	string	id of model (required)
source	string	URI for model (required)
version	string	version of model
family	string	model family
name	string	Name of model
description	string	human readable description of model
config	string	Configuration of model; includes information needed to generate embeddings (e.g. what layers were extracted)

Dataset metadata

Field Name	Type	Description
id	string	Dataset id (required)
name	string	Name of the dataset used to generate the embeddings
description	string	Human-readable description of the dataset used to generate the embeddings
source	string	URI for dataset that was used by the embeddings model (required)

Embeddings metadata

Field Name	Type	Description
dim	int	Embeddings size
quantization	string	Description of quantization scheme, if any
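
To make the required fields in the tables above concrete, here is a small hand-written check. It is only a sketch: the authoritative definition is schema.json, the example values (ids, URIs, dim) are invented, and the check assumes the model object itself is present.

```python
def check_emb_metadata(emb):
    """Return a list of problems in an emb metadata dict (empty if none)."""
    problems = []
    if emb.get("version") != "0.0.1":
        problems.append("version must be '0.0.1'")
    model = emb.get("model") or {}
    for field in ("id", "source"):
        if not isinstance(model.get(field), str):
            problems.append(f"model.{field} is required")
    return problems


def check_dataset(dataset):
    """Return a list of problems in a single dataset metadata object."""
    return [
        f"dataset.{field} is required"
        for field in ("id", "source")
        if not isinstance(dataset.get(field), str)
    ]


def vector_matches_dim(emb, vector):
    """True if the vector length agrees with embedding.dim (when dim is given)."""
    dim = (emb.get("embedding") or {}).get("dim")
    return dim is None or len(vector) == dim


# Hypothetical metadata, for illustration only.
emb = {
    "version": "0.0.1",
    "model": {"id": "example-model", "source": "https://example.com/model"},
    "embedding": {"dim": 4},
}
print(check_emb_metadata(emb))
print(check_dataset({"id": "example-dataset"}))
print(vector_matches_dim(emb, [0.1, 0.2, 0.3, 0.4]))
```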

Design thoughts / open questions

We also duplicate most of this metadata in STAC and have reused STAC metadata definitions to ease interoperability between parquet and STAC.
We've opted to put the metadata within the emb key, in the same style as GeoParquet. But we're not committed to this and are interested in other opinions.
