How to think about opscotch
There are good examples of the kinds of problems opscotch solves in the Overview.
As you get to know opscotch and its abilities, you'll likely find that it can be used for more than what we present here. We know that it can be used to do a lot of different things, but for the sake of understanding, we'll only discuss the core use case: collecting data to answer an observability question.
Using the opscotch platform, you take a problem, break it down into steps, codify and test it, then run it continuously to answer your questions.
The question (or problem statement) will be something that you want to know, and the answer should be something that can be measured, something like:
- How many services do not meet minimum compliance?
- How many users don't use Multi Factor Authentication?
- What is the minimum amount of disk available across our fleet?
- Are the data retention policies actually working on that system?
- Can I go home? i.e. Does my timesheet have enough hours for the week?
- Has someone booked my holiday home? What is the occupancy rate?
The important concept is: the raw data is available somewhere but the feature or function you need is not provided by the service that hosts the data. opscotch will pull the data together and do something with it.
The moving parts
The agent and the agent bootstrap configuration are installed into the target network.
The workflow configuration is authored and deployed to the opscotch configuration service.
The agent loads and executes the new configuration.
The agent
The agent is installed as a single, dependency-free binary on a host or container that has network access to the services to be monitored. It communicates regularly (outbound only) with an opscotch configuration service, which issues configuration updates and receives metrics sent from the agent.
The bootstrap configuration
The bootstrap configuration is an agent-local configuration which creates a container for workflow configurations. It is also used to define the following:
- a private key used to decrypt the workflow configuration
- information to identify the agent
- information on how to load the workflow configuration
- information on how to handle startup logs and errors
- information on hosts this agent can communicate with
- secrets such as credentials etc.
The bootstrap is intended to be "owned" and updated by the customer and cannot be changed or accessed by the agent workflows, so the customer has final control over what is accessible to the workflows. For example, workflows do not define the hosts they connect to; instead they refer to a host definition in the bootstrap, and the workflows have no access to the host URL or credentials. Additionally, the customer can define an allowed pattern of URLs and HTTP methods that the workflows can execute, somewhat like an internal firewall.
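To make the "internal firewall" idea concrete, here is a minimal sketch in Python. It is not the opscotch bootstrap schema: `AllowRule`, `is_allowed` and the example URL are hypothetical names chosen for illustration, and the sketch assumes that an allowed pattern is a regular expression matched against the full URL.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowRule:
    """Hypothetical allow-list entry: one HTTP method plus a URL pattern."""
    method: str
    url_pattern: str  # assumption: a regular expression for the full URL


def is_allowed(method: str, url: str, rules: list[AllowRule]) -> bool:
    """Return True only if at least one rule permits this method and URL."""
    return any(
        rule.method == method.upper()
        and re.fullmatch(rule.url_pattern, url) is not None
        for rule in rules
    )


# The customer allows read-only access to one API and nothing else.
rules = [AllowRule("GET", r"https://projects\.example\.com/api/.*")]

read_ok = is_allowed("GET", "https://projects.example.com/api/projects", rules)
delete_ok = is_allowed("DELETE", "https://projects.example.com/api/projects/1", rules)
other_host_ok = is_allowed("GET", "https://other.example.com/api/projects", rules)
```

Under these assumptions, anything that is not explicitly allowed is refused: the `GET` to the listed API passes, while the `DELETE` and the call to a different host do not.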
The workflow configuration
The workflow configuration is where all the observability logic resides. It contains one or more workflows, each of which contains one or more workflow steps. Workflows are authored, tested, signed and encrypted before being uploaded to an opscotch configuration service, where they will be fetched by the agent.
Workflows are made up of one or more execution steps that form a chain, or more technically a directed graph of steps. Each step performs a task using four templated functions (executed in order if supplied):
- url generation
- payload generation
- authentication
- results processing
Using this template, it is possible to work with any combination of http data sources and authentication methods, process and publish data.
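The four-function template can be pictured with a short Python sketch. This is an illustration of the idea only, not the opscotch step API: `Request`, `Step`, `run_step` and the `send` callback (standing in for the agent, which performs the actual HTTP call) are all hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Request:
    """Hypothetical request object that the first three functions fill in."""
    url: str = ""
    payload: Optional[str] = None
    headers: dict[str, str] = field(default_factory=dict)


@dataclass
class Step:
    """One step; every function is optional, mirroring 'if supplied'."""
    url_generator: Optional[Callable[[Request], None]] = None
    payload_generator: Optional[Callable[[Request], None]] = None
    authenticator: Optional[Callable[[Request], None]] = None
    results_processor: Optional[Callable[[str], Any]] = None


def run_step(step: Step, send: Callable[[Request], str]) -> Any:
    """Run the supplied functions in order; `send` stands in for the agent."""
    request = Request()
    for prepare in (step.url_generator, step.payload_generator, step.authenticator):
        if prepare is not None:
            prepare(request)
    body = send(request)
    if step.results_processor is None:
        return body
    return step.results_processor(body)


def set_url(request: Request) -> None:
    request.url = "/api/projects"  # a path only, never a full URL


def add_auth(request: Request) -> None:
    request.headers["X-Example-Auth"] = "placeholder"  # illustrative header


def count_projects(body: str) -> int:
    return len(json.loads(body)["projects"])


sent: list[Request] = []


def fake_send(request: Request) -> str:
    """Stand-in for the HTTP call: records the request, returns canned JSON."""
    sent.append(request)
    return '{"projects": ["p1", "p2", "p3"]}'


step = Step(url_generator=set_url, authenticator=add_auth,
            results_processor=count_projects)
result = run_step(step, fake_send)
```

In this sketch no payload function is supplied, so that stage is simply skipped, and the step itself never performs the HTTP call.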
Workflows are designed to be generic: they contain no customer-specific information and no full URLs (paths are fine). They should work without modification against multiple instances of the same service; the properties of each specific instance are supplied in the bootstrap and merged inside the agent.
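As a rough picture of that merge, the following Python sketch keeps the workflow's path separate from per-instance host definitions. The host names, the dictionary layout and the `resolve` helper are assumptions made for illustration and do not reflect the real bootstrap format.

```python
from urllib.parse import urljoin

# Hypothetical bootstrap host definitions, owned by the customer.
bootstrap_hosts = {
    "projects-eu": "https://eu.projects.example.com/",
    "projects-us": "https://us.projects.example.com/",
}

# The generic workflow only knows a host reference and a path.
workflow_path = "api/projects"


def resolve(host_ref: str, path: str, hosts: dict[str, str]) -> str:
    """Merge a workflow path with a host definition (done by the agent)."""
    return urljoin(hosts[host_ref], path)


urls = [resolve(ref, workflow_path, bootstrap_hosts) for ref in bootstrap_hosts]
```

The same path yields one URL per instance, which is how a single workflow could run unchanged against several deployments of the same service.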
As workflows could be authored and published externally to the customer, but are loaded into the agent on the customer network, the agent is designed to consider them "untrusted".
However, there are several safeguards in place to reduce the risk. Workflows pass through two layers of cryptography: one to prove the workflow was produced by opscotch software, and another to prove that it was produced specifically for the customer. You can't deliver the wrong configuration to the wrong agent, and likewise you can't tamper with an authorized configuration.
Another safeguard is that each step execution runs in an isolated context in the agent. A step can only access a very restricted set of data and can do very little: it can work with data that the agent passes to it, and pass data back to the agent, but it cannot make HTTP calls, access files, etc. Steps can only "request" the agent to do something through a highly controlled interaction that the agent allows. Likewise, multiple workflows are isolated from each other and cannot interact with each other's steps.
The opscotch configuration service
The opscotch configuration service is a place to store configuration that the agent has access to. It might be hosted locally to the customer or they might choose to use a partner hosted solution.
An example
Let's run through a quick example.
To be successful with opscotch you need to:
- know the problem statement AND how to solve it using data within reach of the agent.
- break down the problem solving process into discrete steps.
- codify the discrete steps into opscotch workflows, with tests to prove they work.
- deploy and let opscotch do the rest.
1. The problem statement
Your manager comes to you:
"Projects keep going over budget! Our SaaS project software doesn’t tell us what we need to know, when we need to know it."
You gulp because you know how painful this will be:
“I’ll need you to login and make a list. Have the report on my desk in the morning!”
And you know how repetitive it will be:
“Every morning!”
No problems... you know opscotch can do this.
2. Break it down
Let’s break it down…
- There are many projects to track
- We’ll need to check each project
- We'll produce a metric for each project showing whether it's within budget or not
- We’ll do this daily
Technically speaking, we know how to do this:
- For the service we're using there is an API for listing projects.
- We’ll then use another API to fetch each project.
- We'll analyse the response for the project and calculate whether it's within budget or not.
- We'll send the answer (in budget or not) as a metric to an analysis platform.
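The breakdown above can be sketched in a few lines of Python. This is not an opscotch workflow: the two API calls are replaced by stubbed data, and the field names (`budget`, `spent`), the metric key and the helper functions are hypothetical.

```python
from typing import Any

# Stubbed responses standing in for the project service's two APIs.
PROJECT_LIST = [{"id": "p1"}, {"id": "p2"}]
PROJECT_DETAILS = {
    "p1": {"id": "p1", "budget": 1000.0, "spent": 800.0},
    "p2": {"id": "p2", "budget": 500.0, "spent": 650.0},
}


def list_projects() -> list[dict[str, Any]]:
    """Step 1: list all projects (stub for the listing API)."""
    return PROJECT_LIST


def fetch_project(project_id: str) -> dict[str, Any]:
    """Step 2: fetch one project (stub for the per-project API)."""
    return PROJECT_DETAILS[project_id]


def within_budget(project: dict[str, Any]) -> bool:
    """Step 3: decide whether spending is within the budget."""
    return project["spent"] <= project["budget"]


def daily_run() -> list[dict[str, Any]]:
    """Step 4: produce one metric per project (1 = within budget, 0 = over)."""
    metrics = []
    for summary in list_projects():
        project = fetch_project(summary["id"])
        metrics.append({
            "key": "project.within_budget",
            "value": 1 if within_budget(project) else 0,
            "dimensions": {"project": project["id"]},
        })
    return metrics


metrics = daily_run()
```

With the stubbed data, `p1` is reported as within budget and `p2` as over budget. Running this once a day would replace the manual morning report.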
3. Codify and test
We'll gloss over the detail right now, but using the method we defined above, we'd codify the process using the opscotch workflow framework.
The opscotch workflow framework is designed to solve these kinds of problems, offering these attributes:
- reliability (templated)
- repeatability (configured)
- rigorosity (tested)
- reportability (monitoring)
4. Deploy and let opscotch do the rest
Again, we'll gloss over the details here, but once you publish the configuration, opscotch will load and run it with no human intervention (other than the initial setup).
The opscotch Workflow Framework
The opscotch Workflow Framework provides a schema for defining tasks to execute in order to achieve the goal you've set.
The opscotch Workflow Framework has the following characteristics:
Reliability - the template
The opscotch Workflow Framework prescribes a simple template for performing a unit of work, known as a workflow step. Steps are chained together into a flow, which in turn can solve all manner of complex tasks.
The opscotch workflow step is designed around the HTTP request life-cycle:
- prepare the HTTP payload and url
- execute the HTTP request safely in a controlled, delegated environment
- process the HTTP response
Repeatability - the configuration
opscotch is designed to be repeatedly deployed in terms of:
- being able to repeatedly deploy a workflow to operate against multiple similar systems: you have 10 instances of the same service and you run the same workflow against each service individually
- being able to repeatedly deploy a workflow remotely, without restarts or service redeploys - the workflow automatically reloads on change.
Rigorosity - testing
opscotch workflows are designed to be unit tested:
- when authoring: create a test with controlled data.
- when making changes (including to shared libraries): unit-tested workflows assert that the functionality remains unbroken.
- when upgrading the agent: workflows can be tested before deploying.
The testing process uses an actual agent without any special testing mode - as long as the input test data is representative, you know your workflows will work.
Reportability - monitoring
opscotch reports its operation data for monitoring:
- audit data: contains important information such as calls to new hosts, new metrics, etc.
- operational logs: to watch for and diagnose problems
- operational metrics: to watch for and diagnose problems
Metrics
opscotch was initially designed around producing metrics, and metric output remains a core capability.
opscotch metrics are designed to contain all the information about the metric, including the expected anomaly detection behavior. This allows downstream services to take more informed actions on the data.
Metrics have the following fields: timestamp, key, value and dimensionMap, which is effectively any metadata you choose.
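As an illustration of that shape, here is a minimal Python sketch of a metric record. Only the four field names come from the description above; the types are assumptions (an epoch-milliseconds timestamp, a numeric value, string-to-string metadata), and the JSON encoding is shown purely as an example, not as the wire format opscotch uses.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Metric:
    """Sketch of a metric with the four fields named in the text."""
    timestamp: int   # assumption: milliseconds since the Unix epoch
    key: str
    value: float     # assumption: numeric value
    dimensionMap: dict[str, str] = field(default_factory=dict)  # free-form metadata


metric = Metric(
    timestamp=1_700_000_000_000,
    key="project.within_budget",
    value=1.0,
    dimensionMap={"project": "p1", "team": "platform"},
)

# Example encoding only; not necessarily how opscotch serializes metrics.
encoded = json.dumps(asdict(metric))
```

The `dimensionMap` entries here (`project`, `team`) are made-up examples of the kind of metadata a downstream service could use to group or filter metrics.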