onestop

OneStop is a data discovery system being built by CIRES researchers on a grant from the NOAA National Centers for Environmental Information. We welcome contributions from the community!

This project is maintained by cedardevs

OneStop API

Estimated Reading Time: 20 minutes

Registry Overview

Intro

The registry provides a horizontally-scalable API and storage for granule and collection-level metadata backed by Kafka. It publishes metadata updates to Kafka, then uses a Kafka Streams app to aggregate those raw metadata events, merging them with previous events to provide a full picture of the metadata for each granule and collection.

SwaggerHub Generated Documentation

Our OpenAPI documents are available on SwaggerHub. They list the supported endpoints and the parameters each one requires.

Registry OneStop Endpoint

In addition to the OpenAPI documents listed above, the Registry supports the default HTTP methods described below.

NOTE: If you get a 401 Authorization Required response, add the following option to your curl command and fill in a valid username and password.

-u '<username>:<password>'
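For clients that do not use curl, the -u option corresponds to an HTTP Basic Authorization header. The following is a minimal Python sketch, assuming the deployment accepts standard Basic authentication; the function name is illustrative and not part of the Registry.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Return the header curl sends for -u '<username>:<password>' (HTTP Basic)."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholders: substitute valid credentials for your deployment.
headers = basic_auth_header("<username>", "<password>")
```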

The Registry API endpoint, which you append to the base URL of a OneStop deployment:

Where context-path is explicitly set at time of deployment (otherwise localhost:8080)

Metadata Notes

For granule metadata you need to include the relationships field, which contains the collection UUID as OneStop knows it:

{
  "relationships": [
    {
      "type": "COLLECTION",
      "id": "<collection-uuid>"
    }
  ]
}

You can include this field directly in a JSON record. It cannot be included in an XML document, so for XML it must be added via a PATCH HTTP request after the initial metadata upload.
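As a rough illustration of the JSON case, the sketch below attaches the relationships field to a granule record before it is serialized. The helper name and the file name are hypothetical; the UUID is the collection id used in the curl examples in this document.

```python
import json
import uuid


def with_collection_relationship(granule: dict, collection_id: str) -> dict:
    """Return a copy of the granule record that points at its parent collection."""
    # Raises ValueError early if the id is not a well-formed UUID string.
    uuid.UUID(collection_id)
    updated = dict(granule)
    updated["relationships"] = [{"type": "COLLECTION", "id": collection_id}]
    return updated


granule = with_collection_relationship(
    {"fileInformation": {"name": "example-granule.nc"}},  # hypothetical file name
    "73d16fe3-7ccb-4918-b77f-30e343cdd378",
)
body = json.dumps(granule)  # the UUID is serialized as a quoted JSON string
```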

Note: The backslashes in the curl examples below are line continuations. They are used only to break long commands across lines, because the example display does not wrap lines.

JSON Records

When submitting a JSON record the request body can contain any or all of the following content (links direct to the associated Avro schemas describing the accepted content):

  1. FileLocation: A map of URIs to location objects describing where the file is located
  2. FileInformation: Details about the file that this input object is in reference to
  3. Relationships: A record of this object's relationships to other objects in the inventory
  4. Publishing: Information pertaining to whether a file is private and for how long if so
  5. Discovery: Metadata about the file contents that is meant for discoverability/search/access

The complete set of Avro schemas used by OneStop Inventory Manager can be found in the schemas-core module of the Schemas repository. Many fields in the above schemas reference other schemas contained within this repository.

Example Input JSON:

{
  "fileInformation": {
    "name": "The file name",
    "size": The size of the file in bytes,
    "checksums": [
      #A list of checksums for the file
      {
        "algorithm": "checksum algorithm",
        "value": "checksum value"
      }
    ],
    "format": "Optional field to indicate the format of the file",
    "headers": "Optional field to capture a file's headers",
    "optionalAttributes": {
      #A discretionary map of key/value pairs to capture arbitrary attributes
      "EXTRA_ATTRIBUTES": "EXTRA_ATTRIBUTES"
    }
  },
  "fileLocations": {
    "A Uniform Resource Identifier as defined by RFCs 2396 and 3986": {
      "uri": "A Uniform Resource Identifier as defined by RFCs 2396 and 3986",
      "type": "The type of the file location, e.g. an ingest location, access location, etc",
      "deleted": false,
      "restricted": Is access to this location restricted from the public? true/false,
      "asynchronous": Indicates if access to this location is asynchronous, true/false,
      "locality": A string indicating the locality of the data, e.g. a FISMA boundary, an AWS Region, an archive service, etc. or null,
      "lastModified": "Datetime when the location was created or last modified, in milliseconds from the unix epoch",
      "serviceType": "The type of service this location belongs to, e.g. Amazon:AWS:S3",
      "optionalAttributes": {
        #key/value pairs to capture extra attributes 
        "EXTRA_ATTRIBUTES": "EXTRA_ATTRIBUTES"
      }
    }
  },
  "relationships": [
    {
      "type": "Relationship type: only COLLECTION for now",
      "id": "Collection id it belongs to"
    }
  ],
  "discovery": {
    "fileIdentifier": "",
    "parentIdentifier": "",
    "hierarchyLevelName": "",
    "doi": "",
    "status": "onGoing",
    "title": "title, a short description",
    "alternateTitle": "alternate title",
    "description": "description of the metadata",
    "keywords": [],
    "responsibleParties": [],
    "thumbnail": "https://www1.ncdc.noaa.gov/pub/data/metadata/images/C00811_SR_lowRes.png",
    "thumbnailDescription": "Global image of daily AVHRR surface reflectance",
    "creationDate": null,
    "revisionDate": null,
    "publicationDate": "2014-05-21",
    "citeAsStatements":[],
    "crossReferences": [],
    "accessFeeStatement": null,
    "orderingInstructions": null,
    "edition": "Version 4",
    "dsmmAccessibility": 2,
    "dsmmDataIntegrity": 3,
    "services": [ ]
    ...
  }
}
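Before submitting a record, a client can check that its top-level keys match the five sections described above. This is a client-side sanity check only, sketched under the assumption that the Publishing section uses the key "publishing" (it does not appear in the example); whether the Registry itself rejects unknown keys is not stated here.

```python
KNOWN_SECTIONS = {
    "fileInformation",
    "fileLocations",
    "relationships",
    "publishing",  # assumed key name; not shown in the example JSON
    "discovery",
}


def unknown_sections(record: dict) -> list:
    """Return top-level keys that are not one of the five documented sections."""
    return sorted(set(record) - KNOWN_SECTIONS)


record = {
    "fileInformation": {"name": "example-granule.nc", "size": 1024},  # hypothetical values
    "relationships": [
        {"type": "COLLECTION", "id": "73d16fe3-7ccb-4918-b77f-30e343cdd378"}
    ],
    "fileLocation": {},  # deliberate typo: the documented key is "fileLocations"
}
print(unknown_sections(record))  # prints ['fileLocation']
```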

Creating And Replacing a Record

Create a record using a POST, or create or replace records using a PUT, via the endpoint specified above.

Example

Where:

curl -X PUT \
     -H "Content-Type: application/xml" \
     -u '<username>:<password>' \
     https://cedardevs.org/onestop/api/registry/metadata/collection \
     --data-binary @path/to/the/xml-file.xml

A successful request returns a response body with the format:

{
  "id"  : "<idValue>",
  "type": "<typeValue>"
}

An unsuccessful request returns a response body with the format:

{
  "errors": []
}
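The same PUT can be prepared from Python with the standard library. The sketch below only builds the request and interprets the two response shapes shown above; it does not send anything. The URL is copied from the curl example, and the helper names and the placeholder body are illustrative.

```python
import urllib.request

# Endpoint taken from the curl example; adjust for your deployment.
COLLECTION_URL = "https://cedardevs.org/onestop/api/registry/metadata/collection"


def build_put_request(xml_bytes: bytes, auth_headers: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request carrying an XML metadata document."""
    headers = {"Content-Type": "application/xml", **auth_headers}
    return urllib.request.Request(
        COLLECTION_URL, data=xml_bytes, headers=headers, method="PUT"
    )


def describe_result(body: dict) -> str:
    """Summarize a decoded response body in either of the two documented formats."""
    if "errors" in body:
        return f"failed with {len(body['errors'])} error(s)"
    return f"stored {body['type']} {body['id']}"


request = build_put_request(b"<placeholder/>", {})  # placeholder, not valid metadata
```

Passing the request to urllib.request.urlopen would send it; the decoded JSON response could then be handed to describe_result.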

Read a Collection/Granule Record

Retrieve a stored record using GET and HEAD requests via the endpoint specified above. Requests sent to {baseURL} return the original input metadata in the Input format. Requests sent to {baseURL}/parsed return the record in the ParsedRecord format. The returned object is located in the data.attributes key of the returned JSON.

Example

Where:

curl  \
    -u '<username>:<password>' \
    https://cedardevs.org/onestop/api/registry/metadata/collection/73d16fe3-7ccb-4918-b77f-30e343cdd378

Found records will return a response body with the format:

{
  "links" : {
    "input"     : "<inputUrlValue>",
    "parsed"    : "<parsedUrlValue>",
    "self"      : "<selfReferencingUrlValue>"
  },
  "data" : {
    "id"        : "<idValue>",
    "type"      : "<typeValue>",
    "attributes": "<resultObject>"
  }
}

NOTE: The links object will contain either the input or the parsed URL, but not both. The self URL refers to the endpoint at which the request was received, so the input or parsed URL for that same endpoint is the one that is omitted.
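A client might unpack a found-record response along these lines. This is a sketch based only on the response format shown above; the function name and the sample values are hypothetical.

```python
def extract_record(body: dict) -> dict:
    """Pull the id, type, and returned object out of a found-record response."""
    data = body["data"]
    links = body.get("links", {})
    # Only one of "input"/"parsed" is expected alongside "self".
    other = {k: v for k, v in links.items() if k in ("input", "parsed")}
    return {
        "id": data["id"],
        "type": data["type"],
        "attributes": data["attributes"],
        "other_representation": other,
    }


sample = {  # hypothetical response to a request for the input record
    "links": {"parsed": "<parsedUrlValue>", "self": "<selfReferencingUrlValue>"},
    "data": {
        "id": "73d16fe3-7ccb-4918-b77f-30e343cdd378",
        "type": "collection",
        "attributes": {"contentType": "application/xml"},  # made-up attribute
    },
}
record = extract_record(sample)
```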

If the record isn’t found because it doesn’t exist, a response body will be returned with the format:

{
  "links" : {
    "input"   : "<inputUrlValue>",
    "parsed"  : "<parsedUrlValue>",
    "self"    : "<selfReferencingUrlValue>"
  },
  "errors": [
    {
      "status": 404,
      "title" : "NOT_FOUND",
      "detail": "No input exists for <typeValue> with id [<idValue>] from source [<sourceValue>]"
    }
  ]
}

NOTE: If the request is received at the {baseURL}/parsed endpoint, the links object will contain both self and input URLs, under the assumption the parsed record may not yet be available. However, a request received at the {baseURL} endpoint will only contain the self value.

If the record isn’t found but the id references a deleted record, the following response body will be returned:

{
  "links" : {
    "resurrection": "<resurrectionUrlValue>",
    "self"        : "<selfReferencingUrlValue>"
  },
  "errors": [
    {
      "status"    : 404,
      "title"     : "NOT_FOUND",
      "detail"    : "DELETE processed for <typeValue> with id [<idValue>] from source [<sourceValue>]"
    }
  ]
}
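Because both not-found cases return a 404, a client can tell them apart by checking for the resurrection link. A minimal sketch, assuming the two response formats above are the only 404 shapes; the function name and return labels are illustrative.

```python
def classify_not_found(body: dict) -> str:
    """Return 'deleted', 'missing', or 'other' for a decoded error response."""
    statuses = [error.get("status") for error in body.get("errors", [])]
    if 404 not in statuses:
        return "other"
    # A resurrection link is only shown for records that were DELETEd.
    if "resurrection" in body.get("links", {}):
        return "deleted"
    return "missing"


deleted_body = {
    "links": {"resurrection": "<resurrectionUrlValue>", "self": "<selfReferencingUrlValue>"},
    "errors": [{"status": 404, "title": "NOT_FOUND", "detail": "DELETE processed ..."}],
}
print(classify_not_found(deleted_body))  # prints deleted
```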

Updating an Existing Record

If the original input metadata format is JSON, PATCH requests via the endpoint specified above can be used to modify or add subsections to a record. Currently, PATCH requests will fully replace an existing key-value pair or add a new one to the final merged record. JSON lists and objects sent in a PATCH request should therefore be the desired complete element.

A PATCH is required if you upload a granule metadata record without the relationships field. See the Metadata Notes section.

XML PATCH requests are not supported.
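The replacement behavior described above can be modeled as a shallow merge over top-level keys. The sketch below is a client-side illustration of that rule, not the Registry's implementation, and the record contents are made up.

```python
def apply_patch(record: dict, patch: dict) -> dict:
    """Model a PATCH: each patched key fully replaces the existing value."""
    merged = dict(record)
    merged.update(patch)  # no deep merge: lists and objects are swapped wholesale
    return merged


existing = {
    "fileInformation": {"name": "example-granule.nc"},  # hypothetical record
    "discovery": {"title": "Old title", "keywords": ["sst"]},
}
patch = {"discovery": {"title": "New title"}}

merged = apply_patch(existing, patch)
print(merged["discovery"])  # prints {'title': 'New title'}
```

Note that the keywords list is gone after the patch, which is why a PATCH body should carry the complete desired object rather than only the changed field.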

Example

Where:

curl -X PATCH \
     -H "Content-Type: application/json" \
     -H "Accept: application/json" \
     -u '<username>:<password>' \
     https://cedardevs.org/onestop/api/registry/metadata/granule/5690da06-2db2-4291-a879-c7e37662dc81 \
     -d "{ \"relationships\": [{\"type\": \"COLLECTION\", \"id\": \"73d16fe3-7ccb-4918-b77f-30e343cdd378\"}]}"

Successful operations will return a response body with the format:

{
  "id"  : "<idValue>",
  "type": "<typeValue>"
}

Unsuccessful operations will return a response body with the format:

{
  "errors": []
}

Deleting a Record

Removing a record is possible with a DELETE request via the endpoint specified above. This will “tombstone” the record in all downstream topics, which deletes it from any sinks connected to PSI (e.g. OneStop). Since Registry is modeled on the Kappa Architecture paradigm (see our architectural background page for some more info), the event(s) concerning any given record prior to a DELETE are still kept and so it is possible to “undo” a DELETE with a resurrection request. But…

WARNING: Deleting a record via an intentionally empty request body (i.e. "") on a PUT or POST is a non-guaranteed and unclean way to purge a metadata record from downstream sinks that cannot be undone through the Registry API. Don’t do it!
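The reason a DELETE can be undone is easier to see with a toy event log. The following is purely an illustrative model of the idea that earlier events are retained; it is not the Registry's Kafka Streams code, and every name in it is hypothetical.

```python
DELETE = object()  # stand-in for a tombstone marker in the event log


def current_state(events: list):
    """Fold a record's events in order; None means the record is tombstoned."""
    state = None
    for event in events:
        if event is DELETE:
            state = None
        else:
            state = {**(state or {}), **event}
    return state


def without_tombstones(events: list) -> list:
    """Return the retained events with DELETE markers dropped."""
    return [event for event in events if event is not DELETE]


log = [
    {"fileInformation": {"name": "example-granule.nc"}},
    {"relationships": [{"type": "COLLECTION", "id": "<collection-uuid>"}]},
    DELETE,
]
deleted = current_state(log)                       # None: downstream sees a tombstone
restored = current_state(without_tombstones(log))  # rebuilt from the kept events
```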

Example

Where:

curl -X DELETE \
    -u '<username>:<password>' \
    https://cedardevs.org/onestop/api/registry/metadata/granule/5690da06-2db2-4291-a879-c7e37662dc81

Successful operations will return a response body with the format:

{
  "id"  : "<idValue>",
  "type": "<typeValue>"
}

Unsuccessful operations will return a response body with the format:

{
  "errors": []
}

Resurrecting a Deleted Record

A record that has been removed with a DELETE request can be resurrected with a GET request to {baseURL}/resurrection.

Example

Where:

curl \
    -u '<username>:<password>' \
    https://cedardevs.org/onestop/api/registry/metadata/granule/5690da06-2db2-4291-a879-c7e37662dc81/resurrection

Successful operations will return a response body with the format:

{
  "id"  : "<idValue>",
  "type": "<typeValue>"
}

Unsuccessful operations will return a response body with the format:

{
  "errors": []
}

NOTE: This functionality is ONLY available if a record was removed via a DELETE request.
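For completeness, the resurrection request can be prepared in Python as well. This sketch derives the URL from a record URL following the pattern in the curl example and builds, but does not send, the GET request; the helper name is illustrative.

```python
import urllib.request


def resurrection_request(record_url: str) -> urllib.request.Request:
    """Build (but do not send) the GET request that resurrects a deleted record."""
    url = record_url.rstrip("/") + "/resurrection"
    return urllib.request.Request(url, method="GET")


# Record URL copied from the DELETE example; adjust for your deployment.
request = resurrection_request(
    "https://cedardevs.org/onestop/api/registry/metadata/granule/"
    "5690da06-2db2-4291-a879-c7e37662dc81"
)
```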

External Resources
