Introduction to Patch

Patch is a data platform that provides predictable pricing and high performance for traditionally expensive workloads. It connects to data warehouses like Snowflake and BigQuery and replicates selected tables into a high-performance compute engine. Patch then exposes query interfaces: SQL, generated GraphQL APIs, and importable client libraries called data packages.

Because the platform is designed for performance and fixed costs, software engineering teams use Patch to power:

  • Personalized end-user experiences, such as promotions based on sentiment analysis or billing usage
  • Customer-facing analytics and embedded BI
  • Backend data enrichment & decision services

Data teams use Patch to:

  • Deliver low-latency APIs over data warehouses without writing any code
  • Run batch jobs at predictable costs

Tell me more: what is Patch exactly?

Patch is a data platform with a built-in query acceleration layer that connects to your existing databases, starting with data warehouses.

You select tables to form a dataset, and Patch instantly exposes query interfaces. The tables are replicated into a high-performance cache that supports a wide variety of OLAP, OLTP, and search queries with ultra-low latency. You can query the data with SQL, generated GraphQL APIs, or dataset-specific client libraries called data packages that you import into your code just like a normal dependency.
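
For example, a generated GraphQL API is served over HTTP, so you can call it like any other GraphQL endpoint. The sketch below is illustrative only: the endpoint URL, query name, and field names are assumptions, not Patch's actual schema.

const GRAPHQL_ENDPOINT = "https://example.com/graphql"; // hypothetical endpoint URL

const QUERY = `
  query {
    factsAppEngagement(limit: 3) {
      appTitle
      foregroundduration
    }
  }
`;

async function main() {
  // POST the query as a standard GraphQL-over-HTTP request
  const res = await fetch(GRAPHQL_ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query: QUERY }),
  });
  const { data, errors } = await res.json();
  if (errors) throw new Error(JSON.stringify(errors));
  console.log(data);
}

main().catch(console.error);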

Data packages include:

  1. Declarative query interfaces exposed as GraphQL and generated client libraries in Python, TypeScript, or C#
  2. Performance acceleration for highly reliable & low-latency apps
  3. Change management through a familiar package versioning workflow
  4. Metadata, notably a version, maintainer, and description of intended usage and constraints
  5. Embedded access policies for federated governance

The query interfaces and optional acceleration layer are designed to support both OLTP- and OLAP-style queries with ultra-low latency. For example, here is a query written with a TypeScript data package generated from a Snowflake dataset:

import { FactsAppEngagement as FactsAppEngagementSnow } from 'snowflake-demo-package';

// Get avg time in app and user counts,
// broken down by app and day of week
async function main() {
  let { appTitle, foregroundduration, panelistid, starttimestamp } =
    FactsAppEngagementSnow.fields;

  let query = FactsAppEngagementSnow.select(
    appTitle.as("App_Name"),
    foregroundduration.avg().as("Avg_Time_in_App"),
    panelistid.countDistinct().as("User_Count"),
    starttimestamp.day.as("Day_of_week")
  );

  // Print the compiled query, then run it against the accelerated dataset
  query.compile().then((data) => console.log("Compiled query: ", data));
  query.execute().then((data) => console.log(data));
}

main().catch(console.error);

The response will look like this:

[
  {
    App_Name: 'Tik Tok',
    Avg_Time_in_App: 42826,
    User_Count: 2394618,
    Day_of_week: 6
  },
  {
    App_Name: 'WhatsApp Messenger',
    Avg_Time_in_App: 33489.466667,
    User_Count: 2394975,
    Day_of_week: 6
  },
  {
    App_Name: 'Cash App',
    Avg_Time_in_App: 2007,
    User_Count: 2445785,
    Day_of_week: 7
  },
  {...}
]

Data packages often replace or obviate:

  1. Pipelines into operational & online analytics stores
  2. Search or analytics databases
  3. Caches and/or read-replicas
  4. API & SDK development

Features & use cases

  • Leverage data from Snowflake, BigQuery, or Databricks in customer-facing applications
  • Embed analytics in your production applications
  • Query & enrich data from data warehouses like a data microservice
  • Build apps & services with generated, type-safe query interfaces derived from a dataset schema
  • Query data immediately without waiting on pipelines or direct database access
  • Perform time-series bucketing, aggregations, grouping, filtering, and sorting without writing complicated SQL queries
  • Run analytical queries over large datasets with low latency and without hitting the underlying warehouses
  • Use data package versions to safely update your schemas without impacting downstream consumers
  • Look up single-row records with single-digit-millisecond response times (see the sketch after this list)
  • Turn your dbt models into data packages in minutes
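
For instance, a point lookup might look like the sketch below. The filter and limit helpers here are assumed method names based on the feature list above, not confirmed parts of the data package API.

import { FactsAppEngagement as Facts } from 'snowflake-demo-package';

// Fetch a single panelist's engagement row by key.
// `.filter(...)`, `.eq(...)`, and `.limit(...)` are hypothetical
// helper names; the real generated client may spell these differently.
async function lookupPanelist(id: string) {
  const { panelistid, appTitle, foregroundduration } = Facts.fields;

  const query = Facts.select(appTitle, foregroundduration)
    .filter(panelistid.eq(id))
    .limit(1);

  return query.execute();
}

lookupPanelist('2394618').then((row) => console.log(row)).catch(console.error);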

Why Patch?

Analytics, data science, and AI are fueling massive growth of valuable datasets stored in data warehouses, yet those datasets remain under-utilized. Patch makes it easy to use them to directly power customer experiences, driving revenue expansion, product adoption, and return on data stack investment.

There are a variety of existing approaches, each with tradeoffs.

Trap 1: Query the data warehouse directly

  • Expensive - compute-based pricing penalizes a high volume of queries
  • Slow - production apps require sub-50ms response times
  • Low concurrency - warehouses have low concurrency limits

Trap 2: Copy data to a separate database

  • Downtime risk - a web of databases, pipelines, and streams is brittle
  • Overhead - challenging to manage and operate
  • Expertise - requires in-house knowledge of a new system

Patch is designed as a replacement for both: instead of building pipelines or standing up specialized databases for each new use case, you publish and query a data package.

How does Patch work?

Data producers define a data package by selecting tables from a data source. Then, a GraphQL API and client library are generated from the tables' schemas.

The generated package can be published to registries like npm and PyPI, so consumers can install it using a familiar npm or pip workflow. They can also safely upgrade as the schema or other properties of the data package are updated.
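
As a sketch of that consumer workflow (the version numbers are illustrative):

// Install the generated package like any other dependency, e.g.:
//   npm install snowflake-demo-package@0.2.1   (TypeScript)
//   pip install snowflake-demo-package==0.2.1  (Python)
//
// Upgrades then follow the registry's normal versioning workflow,
// so schema changes arrive as new, reviewable releases.
import { FactsAppEngagement } from 'snowflake-demo-package';

console.log(Object.keys(FactsAppEngagement.fields)); // explore the generated schema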

The package is imported as a library dependency in a code project. The client lets users write queries with type safety, plus helpers for common date operations, aggregates, filters, and lookups. Each query is routed through an agent process, which translates it into a dialect appropriate for the underlying source.
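
You can observe this translation step with the compile() call from the example above; that it returns the source-dialect query is an assumption here based on this paragraph.

import { FactsAppEngagement as Facts } from 'snowflake-demo-package';

async function showCompiled() {
  const { appTitle, foregroundduration } = Facts.fields;
  const query = Facts.select(
    appTitle.as("App_Name"),
    foregroundduration.avg().as("Avg_Time_in_App")
  );

  // compile() builds the query without executing it, so you can
  // inspect what the agent will send to the source warehouse.
  console.log("Compiled query:", await query.compile());
}

showCompiled().catch(console.error);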

Learn more

To stay up to date with dpm, Patch's data package manager, be sure to follow @patch_data and @dpminstall on Twitter/X!

If you have questions about anything related to dpm, you're welcome to ask on GitHub Discussions.