System Overview

Learn what the bizsupply platform can do and how to use it effectively.

What is bizsupply?

bizsupply is a cloud-native document processing platform that helps you:

Ingest documents from various sources (Gmail, Outlook, IMAP, custom APIs)
Classify documents automatically using AI (invoices, receipts, contracts, etc.)
Extract structured data using customizable schemas (ontologies)
Aggregate related documents and build relationships

The platform provides a flexible, plugin-based architecture where you can create custom document processing workflows.

Key Capabilities

Multi-tenant Processing - Complete data isolation between users and organizations
Dynamic Plugin System - Create custom plugins for any document processing task
Ontology-driven Extraction - Define exactly what data to extract with configurable schemas
External Source Integration - Connect Gmail, Outlook, IMAP, or custom APIs
LLM-powered AI - Leverage Google Gemini and OpenAI for intelligent processing
Secure Credentials - OAuth tokens and API keys stored securely
RESTful API - Integrate with any application via standard HTTP APIs
Real-time Updates - Server-Sent Events (SSE) for live job status
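
To make the real-time updates item concrete, here is a minimal sketch of reading a Server-Sent Events stream in Python. It follows the standard SSE wire format (`event:` and `data:` lines, with a blank line ending each event). The event name `status` and the JSON payload shape are assumptions for illustration, not the documented bizsupply event format.

```python
import json


def parse_sse(stream: str) -> list[dict]:
    """Parse SSE text into events; handles only the `event:` and `data:` fields."""
    events, event, data = [], "message", []
    for line in stream.splitlines():
        if line == "":  # a blank line dispatches the event collected so far
            if data:
                events.append({"event": event, "data": "\n".join(data)})
            event, data = "message", []
        elif line.startswith("data:"):
            # The format allows one optional space after the colon.
            data.append(line[5:].removeprefix(" "))
        elif line.startswith("event:"):
            event = line[6:].strip()
        # Lines starting with ":" are comments (keep-alives) and are ignored.
    return events


# Hypothetical job-status stream; the payload fields are assumed.
sample = (
    'event: status\ndata: {"status": "running"}\n\n'
    ': keep-alive\n\n'
    'event: status\ndata: {"status": "completed"}\n\n'
)
events = parse_sse(sample)
latest = json.loads(events[-1]["data"])["status"]
```

In practice you would read the stream incrementally from an open HTTP response rather than from a complete string; the parsing rules stay the same.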

How It Works

Document Processing Flow

1. Ingest documents from your connected sources
   ↓
2. Documents are stored and parsed
   ↓
3. Classification plugins analyze and label documents
   ↓
4. Extraction plugins pull structured data based on your ontology
   ↓
5. Aggregation plugins link related documents
   ↓
6. Access your processed documents via API
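
The six steps above can be sketched as a plain Python function chain. This is a toy model of the flow only: the `Doc` class, the keyword rule, and the function names are hypothetical and do not use the bizsupply-sdk API, and real classification and extraction are done by plugins rather than string matching.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a stored, parsed document."""
    text: str
    labels: list[str] = field(default_factory=list)
    data: dict[str, str] = field(default_factory=dict)
    related: list[int] = field(default_factory=list)


def classify(doc: Doc) -> None:
    # Step 3: a toy keyword rule standing in for a classification plugin.
    doc.labels.append("invoice" if "invoice" in doc.text.lower() else "other")


def extract(doc: Doc) -> None:
    # Step 4: pull one illustrative field from documents labeled as invoices.
    if "invoice" in doc.labels:
        for token in doc.text.split():
            if token.startswith("$"):
                doc.data["total"] = token


def aggregate(docs: list[Doc]) -> None:
    # Step 5: link each document to the others that share its labels.
    for i, a in enumerate(docs):
        a.related = [j for j, b in enumerate(docs) if j != i and b.labels == a.labels]


def run_flow(texts: list[str]) -> list[Doc]:
    docs = [Doc(text=t) for t in texts]  # steps 1-2: ingest, store, parse
    for doc in docs:
        classify(doc)
        extract(doc)
    aggregate(docs)
    return docs  # step 6: processed documents, ready to be served


docs = run_flow(["Invoice 17 total $120.00", "Invoice 18 total $80.50", "Meeting notes"])
```

Note that each stage only reads what earlier stages produced: extraction depends on labels, and aggregation depends on labeled documents.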

The Plugin System

Plugins are built using the bizsupply-sdk package (pip install bizsupply-sdk), which provides base classes, data models, and a CLI for scaffolding and validation.

bizsupply uses four types of plugins, each with a specific role:

Plugin Type	Purpose	Example
Source	Ingest documents from external sources	Gmail inbox fetcher
Classification	Categorize documents with labels	Invoice detector
Extraction	Extract structured data fields	Invoice data extractor
Aggregation	Link related documents together	Invoice-to-contract linker

Plugins execute in order: Source → Classification → Extraction → Aggregation
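
As an illustration of this ordering, the sketch below sorts a set of configured plugins by type. The `PluginType` enum and `execution_order` helper are hypothetical names written for this example; they are not part of bizsupply-sdk. The plugin names are taken from the table above.

```python
from enum import IntEnum


class PluginType(IntEnum):
    """The four plugin types, numbered in execution order."""
    SOURCE = 1
    CLASSIFICATION = 2
    EXTRACTION = 3
    AGGREGATION = 4


def execution_order(plugins: list[tuple[str, PluginType]]) -> list[str]:
    # sorted() is stable, so plugins of the same type keep their configured order.
    return [name for name, kind in sorted(plugins, key=lambda p: p[1])]


configured = [
    ("invoice-to-contract-linker", PluginType.AGGREGATION),
    ("invoice-detector", PluginType.CLASSIFICATION),
    ("gmail-inbox-fetcher", PluginType.SOURCE),
    ("invoice-data-extractor", PluginType.EXTRACTION),
]
ordered = execution_order(configured)
```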

Core Concepts

Documents

A document is any file you process through bizsupply. Each document has:

Metadata - Source information, dates, sender, etc.
Labels - Classification tags (e.g., "invoice", "receipt")
Data - Extracted structured fields (e.g., total, date, vendor)
Files - Original file and parsed text content
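
One way to picture these four parts is as a small data class. The shape below is illustrative only: the class and its field names are assumptions based on the list above, not the actual SDK data model.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Illustrative shape only; field names are assumptions, not the SDK model."""
    metadata: dict[str, str] = field(default_factory=dict)  # source, dates, sender
    labels: list[str] = field(default_factory=list)         # classification tags
    data: dict[str, object] = field(default_factory=dict)   # extracted fields
    files: dict[str, str] = field(default_factory=dict)     # original file, parsed text


# Example values are made up for illustration.
doc = Document(
    metadata={"source": "gmail", "sender": "billing@example.com"},
    labels=["invoice"],
    data={"total": 120.0, "vendor": "Acme"},
    files={"original": "invoice-17.pdf", "parsed_text": "Invoice 17 total 120.00"},
)
```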

Pipelines

A pipeline is a configured workflow that combines:

One or more plugins to execute
One or more ontologies for extraction schemas
Optional source filters that restrict processing to specific sources
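
A pipeline configuration along these lines could be modeled as follows. `PipelineConfig` and its attributes are hypothetical names, and treating an empty filter list as "all sources" is an assumption made for this sketch rather than documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineConfig:
    plugins: list[str]
    ontologies: list[str]
    # Assumption: an empty filter list means every source is processed.
    source_filters: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The list above calls for one or more plugins and ontologies.
        if not self.plugins:
            raise ValueError("a pipeline needs at least one plugin")
        if not self.ontologies:
            raise ValueError("a pipeline needs at least one ontology")

    def accepts(self, source: str) -> bool:
        return not self.source_filters or source in self.source_filters


cfg = PipelineConfig(
    plugins=["invoice-detector", "invoice-data-extractor"],
    ontologies=["invoices"],
    source_filters=["gmail"],
)
```

Here `cfg.accepts("gmail")` is true and `cfg.accepts("outlook")` is false, while a configuration with no filters accepts any source.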

Ontologies

An ontology defines your extraction schema:

Taxonomy - Hierarchical labels for classification
Fields - What data to extract (name, type, required)
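
To show how a taxonomy and field list fit together, here is a sketch of an ontology as a plain Python dictionary, plus a check of extracted data against it. The dictionary layout and the `validate` helper are assumptions for illustration; the actual ontology format may differ.

```python
# Hypothetical ontology: a label hierarchy plus field definitions.
ontology = {
    "taxonomy": {"financial": ["invoice", "receipt"], "legal": ["contract"]},
    "fields": [
        {"name": "total", "type": float, "required": True},
        {"name": "vendor", "type": str, "required": True},
        {"name": "due_date", "type": str, "required": False},
    ],
}


def validate(data: dict, fields: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the data fits the schema."""
    problems = []
    for spec in fields:
        name = spec["name"]
        if name not in data:
            if spec["required"]:
                problems.append(f"missing required field: {name}")
        elif not isinstance(data[name], spec["type"]):
            problems.append(f"wrong type for field: {name}")
    return problems


ok = validate({"total": 120.0, "vendor": "Acme"}, ontology["fields"])
bad = validate({"total": "120"}, ontology["fields"])
```

With these inputs, `ok` is empty because both required fields are present with the right types, while `bad` reports a type problem for `total` and a missing `vendor`.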

Jobs

When you execute a pipeline, a job is created to track:

Execution status (pending, running, completed, failed)
Documents processed
Results and any errors
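
The four statuses can be modeled as a small state machine. The status values come from the list above, but the allowed transitions are an assumption for this sketch (a job starts pending, runs, then ends as completed or failed), and the helper names are hypothetical.

```python
from enum import Enum


class JobStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transitions; completed and failed are treated as terminal states.
ALLOWED = {
    JobStatus.PENDING: {JobStatus.RUNNING},
    JobStatus.RUNNING: {JobStatus.COMPLETED, JobStatus.FAILED},
    JobStatus.COMPLETED: set(),
    JobStatus.FAILED: set(),
}


def advance(current: JobStatus, new: JobStatus) -> JobStatus:
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {new.value}")
    return new


def is_terminal(status: JobStatus) -> bool:
    # A status with no outgoing transitions means the job has finished.
    return not ALLOWED[status]


status = advance(JobStatus.PENDING, JobStatus.RUNNING)
status = advance(status, JobStatus.COMPLETED)
```

A client polling or streaming job status can stop once `is_terminal(status)` is true.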

Authentication

All API requests require a JWT Bearer token:

Authorization: Bearer <your_jwt_token>

Tokens are obtained through OAuth2 login with Google or Microsoft.
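
As a minimal sketch, the header can be attached with Python's standard library as shown below. The URL and token are placeholders, not real bizsupply endpoints or credentials, and `authorized_request` is a helper name chosen for this example.

```python
import urllib.request


def authorized_request(url: str, token: str) -> urllib.request.Request:
    # Attach the JWT as a Bearer token in the Authorization header.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Placeholder URL and token; substitute your real API base URL and JWT.
req = authorized_request("https://api.example.com/documents", "your_jwt_token")
auth_header = req.get_header("Authorization")
```

The request is only built here, not sent. Passing `req` to `urllib.request.urlopen` would perform the call.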

Architecture Principles

Multi-Tenancy

Every resource is scoped to your user and tenant:

All data queries automatically filter by your context
File storage paths include your identifiers
You can only access your own documents and plugins
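
The scoping rules above can be illustrated with a short sketch. Everything here is hypothetical: the `Context` class, the path layout, and the record fields are assumptions chosen to show the idea, not the platform's actual storage scheme.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Context:
    tenant_id: str
    user_id: str


def storage_path(ctx: Context, filename: str) -> str:
    # Hypothetical layout: both identifiers are part of every storage path.
    path = PurePosixPath("tenants") / ctx.tenant_id / "users" / ctx.user_id / filename
    return str(path)


def visible(records: list[dict], ctx: Context) -> list[dict]:
    # Every query is filtered by the caller's context before rows are returned.
    return [
        r for r in records
        if r["tenant_id"] == ctx.tenant_id and r["user_id"] == ctx.user_id
    ]


ctx = Context(tenant_id="t1", user_id="u1")
records = [
    {"id": 1, "tenant_id": "t1", "user_id": "u1"},
    {"id": 2, "tenant_id": "t1", "user_id": "u2"},
    {"id": 3, "tenant_id": "t2", "user_id": "u1"},
]
mine = visible(records, ctx)
path = storage_path(ctx, "invoice-17.pdf")
```

Applying the filter in one shared helper, rather than in each query, is what keeps a forgotten `WHERE` clause from leaking another tenant's data.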

Plugin Isolation

Plugins run in isolated environments:

Each plugin has its own dependencies
Plugin crashes don't affect other operations
Plugins receive only the data they need

Scalability

The platform scales automatically:

Handles large document volumes
Processes multiple jobs concurrently
Scales up during peak usage
Background tasks (auto-sync, data processing) run as Cloud Run worker pools for cost-efficient scaling

Next Steps

Install the SDK → pip install bizsupply-sdk
Learn core concepts → Key Concepts
Build a plugin → Create a Plugin
Process documents → Process Documents
Define extraction schemas → Create an Ontology