Introduction to Kaspr
Welcome to Kaspr – a Kubernetes-native stream processing framework that dramatically simplifies building real-time, event-driven applications on Apache Kafka.
What is Kaspr?
Kaspr is a declarative stream processing platform that allows you to build sophisticated data processing pipelines using simple YAML configurations instead of complex stream processing code. Built as Kubernetes Custom Resource Definitions (CRDs), Kaspr provides enterprise-ready features like scaling, authentication, persistent storage, and monitoring out of the box.
Why Choose Kaspr?
- Declarative & Simple - Define your entire stream processing application in YAML – no need to write complex Kafka Streams or custom applications from scratch.
- Python-Powered Processing - Write your business logic in Python using familiar libraries, while Kaspr handles the distributed systems complexity.
- Production-Ready - Built for Kubernetes with enterprise features: RBAC, resource limits, persistent storage, TLS authentication, and horizontal scaling.
- Stateful Processing - Maintain tables, perform joins, and handle aggregations across distributed partitions with automatic state management.
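The stateful-processing idea — a keyed store that an agent updates as events arrive — can be sketched in plain Python. This is an illustrative sketch, not Kaspr's actual API; a KasprTable plays a similar role per partition, with persistence and recovery handled for you.

```python
from collections import defaultdict

# Minimal sketch of a keyed aggregation table (not Kaspr's API):
# each key accumulates a running sum and count so we can serve a
# running average, the way a stateful agent would update a table.
class RunningAverageTable:
    def __init__(self) -> None:
        self._sums: defaultdict[str, float] = defaultdict(float)
        self._counts: defaultdict[str, int] = defaultdict(int)

    def update(self, key: str, value: float) -> None:
        self._sums[key] += value
        self._counts[key] += 1

    def get(self, key: str) -> float:
        return self._sums[key] / self._counts[key]

table = RunningAverageTable()
for key, price in [("sku-1", 10.0), ("sku-1", 20.0), ("sku-2", 5.0)]:
    table.update(key, price)
print(table.get("sku-1"))  # 15.0
```

Because Kafka routes all events with the same key to the same partition, each processor only ever sees — and only needs state for — its own subset of keys.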
Core Components
Kaspr applications are built using three main resource types:
KasprApp
The main application definition that specifies infrastructure requirements, Kafka connection details, and deployment configuration.
```yaml
apiVersion: kaspr.io/v1alpha1
kind: KasprApp
metadata:
  name: my-stream-processor
spec:
  replicas: 3
  bootstrapServers: kafka-cluster:9092
  authentication:
    type: scram-sha-512
    username: my-user
  resources:
    requests:
      cpu: 0.2
      memory: 512Mi
```

KasprAgent
The stream processing units that consume from Kafka topics, transform data using Python logic, and produce results.
```yaml
apiVersion: kaspr.io/v1alpha1
kind: KasprAgent
metadata:
  name: price-calculator
  labels:
    kaspr.io/app: my-stream-processor
spec:
  description: "Calculates median prices from item data"
  input:
    topic:
      name: raw-prices
  output:
    topics:
      - name: calculated-prices
  processors:
    pipeline:
      - calculate-median
    operations:
      - name: calculate-median
        map:
          entrypoint: calculate_price
          python: |
            def calculate_price(value):
                prices = [item['price'] for item in value['items']]
                median = sorted(prices)[len(prices)//2]
                return {
                    "product_id": value["product_id"],
                    "median_price": median
                }
```

KasprTable
Persistent key-value stores for stateful processing, enabling joins, aggregations, and lookups.
```yaml
apiVersion: kaspr.io/v1alpha1
kind: KasprTable
metadata:
  name: product-prices
  labels:
    kaspr.io/app: my-stream-processor
spec:
  keySerializer: json
  valueSerializer: json
  partitions: 16
```

Common Use Cases
Based on real-world implementations, Kaspr excels at:
- Real-time Data Joins - Combine data from multiple Kafka topics based on common keys, like joining product information with pricing data.
- Stream Aggregations - Calculate rolling averages, medians, counts, and other metrics over time windows or event groups.
- Business Rules Processing - Apply complex business logic to streaming data using a rules engine that can be updated dynamically.
- Data Filtering & Transformation - Clean, enrich, and transform streaming data before forwarding to downstream systems.
- REST API Integration - Expose web interfaces for interacting with your stream processing applications using KasprWebView.
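Aggregations like the median in the KasprAgent example above can be developed and unit-tested as plain Python before being embedded in YAML. Here is that logic as a standalone sketch:

```python
# The agent's embedded logic, runnable locally without Kafka.
# Note: for an even number of items this picks the upper of the two
# middle values rather than averaging them, matching the YAML example.
def calculate_price(value):
    prices = sorted(item["price"] for item in value["items"])
    median = prices[len(prices) // 2]
    return {"product_id": value["product_id"], "median_price": median}

event = {
    "product_id": "sku-123",
    "items": [{"price": 10.0}, {"price": 30.0}, {"price": 20.0}],
}
print(calculate_price(event))  # {'product_id': 'sku-123', 'median_price': 20.0}
```

Keeping the entrypoint a pure function of the event value makes this kind of local testing straightforward.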
Architecture Benefits
- Event-Driven Microservices - Build loosely-coupled services that communicate via Kafka events, enabling better scalability and fault tolerance.
- Horizontal Scaling - Kaspr automatically distributes processing across multiple pods and handles partition assignment and rebalancing.
- Fault Tolerance - Built-in error handling, dead letter topics, and automatic recovery ensure your applications stay resilient.
- Observability - Native Kubernetes integration means your stream processors work with existing monitoring, logging, and alerting infrastructure.
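The co-location property behind horizontal scaling — equal keys always map to the same partition, so per-key state stays with the pod that owns that partition — can be sketched as follows. This is a hedged illustration: Kafka's real default partitioner uses murmur2 hashing, not SHA-256, but the guarantee shown is the same.

```python
import hashlib

# Illustrative keyed partition assignment (not Kafka's exact algorithm):
# hash the key deterministically, then take it modulo the partition count.
def partition_for(key: str, num_partitions: int) -> int:
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# The same key lands on the same partition on every call...
assert partition_for("sku-123", 16) == partition_for("sku-123", 16)
# ...and every result falls within the partition range.
assert all(0 <= partition_for(f"key-{i}", 16) < 16 for i in range(100))
```

This determinism is what lets Kaspr rebalance partitions across pods without losing track of which state belongs where.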
Who Should Use Kaspr?
- Data Engineers - Building real-time data pipelines without the overhead of managing complex stream processing frameworks.
- Backend Developers - Creating event-driven microservices that need to process and react to streaming data.
- Platform Teams - Providing a standardized way for teams to build and deploy stream processing applications at scale.
- Python Developers - Leveraging existing Python skills and libraries for stream processing without learning new DSLs.
What’s Next?
Ready to get started? Here’s your path forward:
- Install Kaspr - Set up the Kaspr operator in your Kubernetes cluster
- Learn Concepts - Create a simple stream processing application
- Explore Examples - See real-world use cases and patterns
Ready to simplify your stream processing? Kaspr transforms complex distributed systems challenges into simple YAML configurations, letting you focus on business logic instead of infrastructure complexity.