Version: 2.3.x

The problem opscotch solves

In the work we all do, there are problems we want to solve, but it seems the data needed to support them just isn't available. Sometimes, when we look closer or ask experts, there is data that might be useful, but it's hard to get to, in a form that is hard to work with, or both!

opscotch is a platform for developing, deploying and running workflows that collect awkward, hard-to-reach data and make it easy to work with.

Overview

opscotch augments traditional monitoring strategies with the ability to query across multiple systems, and to provide continuous industry best practice advice.

Traditional monitoring strategies tend to employ various solutions to obtain monitoring coverage.

In an ideal scenario, the solutions cover different aspects or practicalities of monitoring, in theory providing a broad view of the monitored system. In reality, it is difficult to obtain a unified view, or "a single pane of glass". While the various monitoring solutions observe the same monitored system, each solution's data is disconnected from the others: siloed and difficult to integrate.

With various solutions monitoring your services comes the responsibility to ensure that those solutions themselves are running well. Monitoring tools tend to collect large amounts of data and place the onus on the consumer to identify and interpret the data that matters. This often results in ineffective monitoring strategies and lost opportunities for improvement.

opscotch offers a way to programmatically unify data across multiple systems (monitoring or application) and, on top of that unified view, to continuously provide industry best practice advice.

The kinds of problems opscotch is solving

Here is an idea of the kinds of problems opscotch is already solving for customers:

Create time-series data from data that has no time component (cross-sectional data).

Instead of logging into that really important website (when you remember) to look at that really important number, one that changes over time but has no available history, have opscotch collect the number and record the date and time. You get to observe the number AND keep its history, without having to log in, remember where to go, and navigate to the page. A sketch of the idea follows the examples below.

Concrete generic examples:

  • Create weather history: Retrieve the current temperature from your weather service, and the current time to create a history of the temperature.
  • Create auction bid history: Retrieve the current bid in an online auction, add the current time to create a history of bids.
  • Monitor website availability for reporting: Access a website to see if it's up, record the current time, and create a history of the uptime.
  • Monitor the number of security audit violations: Retrieve the currently known number of audit violations with the current time, and create a history.
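
To make the idea concrete, here is a minimal sketch in plain Python (not opscotch workflow syntax); the weather endpoint URL and JSON field names are illustrative assumptions:

```python
import json
import urllib.request
from datetime import datetime, timezone

def observe_current_temperature() -> dict:
    # Fetch the current value, which by itself has no time component.
    # (Hypothetical endpoint, for illustration only.)
    with urllib.request.urlopen("https://weather.example.com/current") as resp:
        current = json.load(resp)
    # Attach the observation time: a point value becomes a time-series row.
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "temperature_c": current["temperature_c"],
    }

# Appending each observation yields the history the source never kept.
with open("temperature_history.jsonl", "a") as f:
    f.write(json.dumps(observe_current_temperature()) + "\n")
```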

Bring data from disparate, unrelated systems to a single place of analysis

Instead of logging into many services, exporting CSVs, and spending hours formatting and analysing them in Excel, have opscotch collect them into a single place of analysis. A sketch of the idea follows the examples below.

Concrete generic examples:

  • Consolidate website analytics: Bring all your website analytics data from multiple vendors into a single place, to combine for enriched analysis.
  • Report on what servers do not have log collection: Combine your log file data with your host inventory to identify hosts without log collection.
  • Discover new service connections: Combine your APM data with your CMDB to discover unknown relationships.
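
As a sketch of what "a single place of analysis" can mean, assuming two invented vendor payload shapes, the collection step can normalise both into one dataset:

```python
from collections import defaultdict

# Two vendors report the same kind of data in different shapes (both invented).
vendor_a = {"pages": [{"path": "/home", "views": 1200}, {"path": "/buy", "views": 310}]}
vendor_b = [("/home", 980), ("/about", 150)]

# Normalise both sources into one row shape.
rows = [{"source": "vendor_a", "path": p["path"], "views": p["views"]}
        for p in vendor_a["pages"]]
rows += [{"source": "vendor_b", "path": path, "views": views}
         for path, views in vendor_b]

# One place of analysis: total views per path across vendors.
totals: dict[str, int] = defaultdict(int)
for row in rows:
    totals[row["path"]] += row["views"]
print(dict(totals))  # {'/home': 2180, '/buy': 310, '/about': 150}
```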

Perform transformations, reductions and calculations at the point of collection

Rather than collecting vast amounts of raw data into an expensive "unified data platform", only to perform a simple calculation, perform the calculation at collection time and send just the facts. Raw data collected that way will in all likelihood never be used again, is expensive to store, and requires more resources and licensing the more data is ingested.

Concrete generic examples:

  • Answer the question at collection time: Traditional analytic operations are done on the analytic platform and require all of the underlying data to have been collected first. Often that data is collected in an overzealous, vendor-driven, unconsidered way: highly detailed, high-frequency data from multiple sources, perhaps even multiple times. It is then stored for a long period, incurring additional costs and resources for search and so on. If the questions are known, i.e. you already compute something like "number of servers with high CPU", opscotch can fetch the data, filter the servers with high CPU, then aggregate that as a single number (perhaps with a note stating which servers). You still know how many (and which) servers have high CPU, without storing raw data that may never be used anyway (see the sketch below).
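
A minimal sketch of that filter-and-aggregate step, with an invented metrics payload and a 90% threshold chosen for illustration:

```python
HIGH_CPU_THRESHOLD = 90.0  # illustrative threshold

def summarise_high_cpu(samples: list[dict]) -> dict:
    # Keep only the answer, not the raw samples.
    hot = [s["host"] for s in samples if s["cpu_percent"] > HIGH_CPU_THRESHOLD]
    return {"high_cpu_count": len(hot), "high_cpu_hosts": hot}

samples = [
    {"host": "web-01", "cpu_percent": 97.2},
    {"host": "web-02", "cpu_percent": 41.0},
    {"host": "db-01", "cpu_percent": 93.5},
]
# Only this summary is forwarded; the raw samples are never stored centrally.
print(summarise_high_cpu(samples))
# {'high_cpu_count': 2, 'high_cpu_hosts': ['web-01', 'db-01']}
```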

Observe the change of data in a meaningful way

Where only an absolute number such as a total or a sum is available, that number often has no context. In these cases, the difference between successive observations adds meaningful context.

Concrete generic examples:

  • Make a 'counter' more meaningful: For data points like "total number of website visitors" that increase over time but carry no history or context for the increase, opscotch can track the difference between observations. Which is more meaningful: that there have been 1,693,958 visitors to your website in total, or that there were 500 more visitors in the last hour? A sketch of the delta technique follows.
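
A minimal sketch of the technique; persisting the previous value in a local file is an illustrative choice, not how opscotch stores state:

```python
import os

STATE_FILE = "visitors.last"  # illustrative state store

def visitor_delta(current_total: int) -> int | None:
    # Read the previous observation, if any.
    previous = None
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            previous = int(f.read())
    # Persist the current observation for next time.
    with open(STATE_FILE, "w") as f:
        f.write(str(current_total))
    # The first observation has nothing to diff against.
    return None if previous is None else current_total - previous

print(visitor_delta(1_693_958))  # None on the first observation
print(visitor_delta(1_694_458))  # 500: the meaningful change
```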

Perform complex workflows (that almost all other agents cannot do)

Most agents can only handle the simplest collection use cases. opscotch is different: you can pull data from multiple sources, iterate over list items, perform calculations, combine and reduce data, and more.

Concrete generic examples:

  • Log into a data service with customer-controlled credentials: To avoid sharing the data service credentials directly with opscotch, complex flows can be used for advanced authentication methods such as OAuth2, token exchange, and key vaults. For example, the data service credentials may be stored in a customer key vault that is itself authenticated with OAuth2. The flow: using the "client application" (opscotch) credentials, request an OAuth2 access token; use the access token to request a credential from the key vault; use that credential to authenticate to the data service. The data service credentials are never exposed, and the customer can update them without making any changes to opscotch. A sketch of this chain follows the list.
  • Prioritise device vulnerability checks for users with poor security protections: Log into the security audit platform and retrieve a list of all users. Check each user's security settings. For each user with inadequate security protections, list all their devices. For each device, check for vulnerabilities. For each vulnerability, raise a service request to remediate.
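
Here is the key vault chain from the first example sketched in plain Python with the requests library; every URL, field name and credential below is an illustrative assumption:

```python
import requests

# Step 1: exchange opscotch's own client credentials for an OAuth2 access token.
token = requests.post(
    "https://auth.example.com/oauth2/token",
    data={"grant_type": "client_credentials",
          "client_id": "opscotch-agent",
          "client_secret": "..."},  # placeholder; never hard-code secrets
).json()["access_token"]

# Step 2: use the token to read the data service credential from the vault.
secret = requests.get(
    "https://vault.example.com/secrets/data-service-password",
    headers={"Authorization": f"Bearer {token}"},
).json()["value"]

# Step 3: authenticate to the data service with the retrieved credential.
# The data service credentials were never shared with opscotch directly.
report = requests.get(
    "https://data.example.com/api/report",
    auth=("svc-account", secret),
).json()
```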

Repeatedly perform data collections on multiple sources with consistency and ease of update

Write a workflow once, run it against multiple targets, and update it over the wire. A sketch of the pattern follows the examples below.

Concrete generic examples:

  • Check user security over multiple corporate accounts: Write a workflow that checks user security settings in an Azure account. Use that one workflow with different accounts. Any updates made to the workflow will be reflected for all accounts.
  • Check service logs in all product environments: Write a workflow that checks service logs in an environment. Use that one workflow in your development, test and production environments.
  • Check all instances of a platform for compliance: Write a workflow that validates that your data platform is in compliance with company policies. Run that workflow over all instances of that platform.
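
A minimal sketch of that pattern, with a stubbed workflow body and invented target fields:

```python
# Per-target connection details vary; the workflow body does not.
TARGETS = [
    {"name": "dev",  "url": "https://dev.example.com",  "token": "..."},
    {"name": "test", "url": "https://test.example.com", "token": "..."},
    {"name": "prod", "url": "https://prod.example.com", "token": "..."},
]

def check_user_security(base_url: str, token: str) -> dict:
    # One workflow body: updating it here updates it for every target.
    # (Stubbed; a real check would call the service at base_url.)
    return {"mfa_enforced": True}

for target in TARGETS:
    result = check_user_security(target["url"], target["token"])
    print(target["name"], result)
```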

What is opscotch, the agent

opscotch is a network-connected, multi-step logic engine that can reach further than traditional collection tools. By using a dynamic step-by-step logic technique, opscotch excels at collecting data where static collectors traditionally fall short, for example when multiple steps are required, such as:

  • flexible authentication (logging in, token generation, fetching from a key vault) before performing an action
  • performing an action on each item in a list
  • querying multiple systems
  • performing calculations and aggregations
  • generating metrics derived from multiple sources that are traditionally difficult or impossible to express.

The flexibility of the Agent allows non-traditional data points to be identified and observed, such as metrics that matter to product owners or business representatives. The Agent can not only query your applications for data; it can also take advantage of traditional monitoring and logging systems by using them as a source of data from which to generate metrics.

Continuous industry best practice advice

A practical application of the Agent is the continuous observation of monitoring services such as Splunk, Elasticsearch, AppDynamics or Cribl, not only for key operational metrics, but also for industry-advised best practices.

Often best practices are simple cases like "this feature should not be enabled in production". Along with the simple cases, the Agent is perfectly suited to dynamic cases with criteria that change over time, such as "this property should not be more than 10 times some other changing value" or "the ratio of this value to that value should always be positive". A sketch of such a rule follows.
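
For illustration, with invented property names and values:

```python
def check_ratio_rule(buffer_size: int, reference_value: int) -> dict:
    # Rule: buffer_size should not exceed 10x a value that changes over time.
    limit = 10 * reference_value
    return {"rule": "buffer_size <= 10 * reference_value",
            "ok": buffer_size <= limit,
            "buffer_size": buffer_size,
            "limit": limit}

# Re-evaluated on every collection, so the threshold tracks the moving value.
print(check_ratio_rule(buffer_size=250_000, reference_value=20_000))
# {'rule': 'buffer_size <= 10 * reference_value', 'ok': False, ...}
```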

Sometimes best practices involve multiple systems, like "APM agent logs should be in the enterprise logging system". The Agent can query the APM system and cross-reference it with the enterprise logging system to produce a metric reporting the number of APM agents missing from the enterprise logging system, as in the sketch below.
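
Both sets of agent names below are invented stand-ins for what the two systems would return:

```python
# Agents the APM platform knows about vs. agents whose logs reach the
# enterprise logging system.
apm_agents = {"checkout-svc", "cart-svc", "auth-svc", "search-svc"}
agents_with_logs = {"checkout-svc", "auth-svc"}

missing = sorted(apm_agents - agents_with_logs)
metric = {"apm_agents_missing_from_logging": len(missing), "agents": missing}
print(metric)
# {'apm_agents_missing_from_logging': 2, 'agents': ['cart-svc', 'search-svc']}
```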

Minimal data dependency, security and privacy

Security and privacy are built into the system by default. The system is designed to:

  • produce and consume the smallest units of data possible
  • intentionally prevent the transmission of text out of the client network
  • be transparent and auditable in the data that is transmitted
  • provide for hashing, redacting and protection of data (sketched below)
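
A minimal illustration of that last point, assuming invented field names; sensitive values are hashed before anything leaves the client network:

```python
import hashlib

SENSITIVE_FIELDS = {"username", "email"}  # illustrative policy

def protect(record: dict) -> dict:
    out = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS:
            # One-way hash: correlatable across observations, not reversible.
            out[key] = hashlib.sha256(str(value).encode()).hexdigest()
        else:
            out[key] = value
    return out

print(protect({"username": "jsmith", "email": "j@example.com", "cpu": 97.2}))
```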