OneStop

OneStop is a data discovery system being built by CIRES researchers on a grant from the NOAA National Centers for Environmental Information. We welcome contributions from the community!

This project is maintained by cedardevs

System Requirements

The core of the OneStop system is built on Apache Kafka and Elasticsearch. Apache Kafka provides a distributed, durable, ordered streaming event platform, which OneStop leverages to store all of its inputs as well as all derived metadata. OneStop also utilizes the Confluent Schema Registry to store the Avro schemas defining the shapes of its messages and to ensure that those schemas evolve in a backward-compatible way.
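
As a concrete illustration of that input-storage pattern, here is a minimal sketch of streaming a raw metadata document into a Kafka topic with the confluent-kafka Python client. The broker address and the topic name `raw-granule-input` are assumptions for the example, not actual OneStop configuration.

```python
# Minimal sketch: stream a raw metadata document into a Kafka topic.
# Assumes the confluent-kafka package and a broker on localhost:9092;
# the topic name "raw-granule-input" is illustrative, not a real OneStop topic.
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_delivery(err, msg):
    # Invoked once the broker has durably acknowledged (or rejected) the record.
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Stored in {msg.topic()} [partition {msg.partition()}] at offset {msg.offset()}")

# Keyed messages preserve per-key ordering within a partition.
producer.produce(
    "raw-granule-input",
    key="example-record-id",
    value=b"<gmi:MI_Metadata>...</gmi:MI_Metadata>",
    on_delivery=on_delivery,
)
producer.flush()  # Block until all outstanding messages are delivered.
```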

Elasticsearch is a full-text search and analytics engine, which OneStop leverages to index, search, and analyze metadata quickly and in near real time.
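
For illustration, here is a minimal sketch of that index-then-search flow using the official elasticsearch Python client; the node address, index name, and document shape are assumptions for the example.

```python
# Minimal sketch: index a metadata document, then run a full-text query.
# Assumes the elasticsearch package (8.x) and a node on localhost:9200;
# the index name "metadata-search" and document fields are illustrative.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Index a document; refresh="wait_for" makes it visible to the search below.
# With the default refresh interval it would be searchable within about 1s.
es.index(
    index="metadata-search",
    id="example-collection",
    document={"title": "Sea Surface Temperature", "description": "Daily global SST grids"},
    refresh="wait_for",
)

# Full-text match query against the title field.
results = es.search(
    index="metadata-search",
    query={"match": {"title": "temperature"}},
)
for hit in results["hits"]["hits"]:
    print(hit["_id"], hit["_score"])
```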

For more information about the system architecture and the way it utilizes these tools, see the architecture section below.

Each piece of required infrastructure is described in more depth below. For Kafka and its associated components, we recommend the open source versions published with the Confluent Platform, which are free and actively developed and maintained by Confluent.

Document Structure

This guide covers each required infrastructure component in turn: Zookeeper, Kafka, and the Schema Registry.

Zookeeper

Zookeeper is a distributed coordination service, built around a consistent hierarchical key-value store, and is a hard requirement of Kafka, which uses it to track cluster state such as broker membership.
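
As a quick way to verify that requirement is satisfied, the sketch below checks connectivity to Zookeeper with the kazoo Python client and lists the broker ids Kafka registers there; the host and port are the Zookeeper defaults and are assumptions for the example.

```python
# Minimal sketch: liveness check for the Zookeeper ensemble Kafka depends on.
# Assumes the kazoo package; localhost:2181 is the Zookeeper default.
from kazoo.client import KazooClient

zk = KazooClient(hosts="localhost:2181")
zk.start(timeout=5)  # Raises if no ensemble member responds in time.

# Kafka registers its brokers under /brokers/ids, so a populated list here
# confirms both Zookeeper and at least one broker are healthy.
print("Registered broker ids:", zk.get_children("/brokers/ids"))
zk.stop()
```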

Requirements

For more detailed information, see the Confluent deployment guide.

Kafka

Kafka is a distributed streaming platform. It provides both the messaging layer and the primary storage for the OneStop system. All inputs and outputs are streamed in and out of the system via Kafka, and some data, such as raw inputs, is preserved indefinitely in Kafka topics, a pattern Kafka's configurable log retention is designed to support.
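
To make the indefinite-retention point concrete, here is a minimal sketch of creating a topic whose records never expire, using the confluent-kafka admin client; the broker address, topic name, and partition/replication counts are assumptions for the example.

```python
# Minimal sketch: create a topic with records retained indefinitely, matching
# the pattern of using Kafka as primary storage for raw inputs.
# Assumes the confluent-kafka package; the topic name is illustrative.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

topic = NewTopic(
    "raw-granule-input",
    num_partitions=3,
    replication_factor=1,  # Use >= 3 in production for durability.
    config={"retention.ms": "-1", "retention.bytes": "-1"},  # Never expire records.
)

# create_topics is asynchronous; wait on the returned futures.
for name, future in admin.create_topics([topic]).items():
    try:
        future.result()
        print(f"Created topic {name}")
    except Exception as e:
        print(f"Failed to create {name}: {e}")
```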

Requirements

For much more information, see the Confluent deployment guide.

Schema Registry

The system utilizes Avro schemas to define the shapes of the metadata entities that flow through it. The Schema Registry is a central location where all of these schemas are stored; data producers publish the schemas of the data they produce so that consumers can retrieve those schemas and read the data. The registry also manages the evolution of schemas over time, for example by enforcing backward compatibility.
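
For illustration, the sketch below registers an Avro schema and pins its subject to backward compatibility through the Schema Registry's REST API; the registry URL, subject name, and schema are assumptions for the example.

```python
# Minimal sketch: register an Avro schema, then require backward compatibility
# for the subject via the Schema Registry REST API.
# Assumes the requests package and a registry on localhost:8081;
# the subject and schema are illustrative.
import json
import requests

REGISTRY = "http://localhost:8081"
SUBJECT = "raw-granule-input-value"

schema = {
    "type": "record",
    "name": "GranuleInput",
    "fields": [{"name": "title", "type": "string"}],
}

# Publish the schema; the registry returns a globally unique schema id.
resp = requests.post(
    f"{REGISTRY}/subjects/{SUBJECT}/versions",
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    data=json.dumps({"schema": json.dumps(schema)}),
)
resp.raise_for_status()
print("Registered schema id:", resp.json()["id"])

# Require that future versions of this subject remain backward compatible:
# new schemas must still be able to read data written with the old ones.
resp = requests.put(
    f"{REGISTRY}/config/{SUBJECT}",
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    data=json.dumps({"compatibility": "BACKWARD"}),
)
resp.raise_for_status()
```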

Requirements

For more detailed information see the Confluent deployment guide.

