System Overview

Learn about the bizsupply platform capabilities and how to use it effectively.


What is bizsupply?

bizsupply is a cloud-native document processing platform that helps you:

  • Ingest documents from various sources (Gmail, Outlook, IMAP, custom APIs)
  • Classify documents automatically using AI (invoices, receipts, contracts, etc.)
  • Extract structured data using customizable schemas (ontologies)
  • Aggregate related documents by linking them together (for example, an invoice to its contract)

The platform provides a flexible, plugin-based architecture where you can create custom document processing workflows.


Key Capabilities

  • Multi-tenant Processing - Complete data isolation between users and organizations
  • Dynamic Plugin System - Create custom plugins for any document processing task
  • Ontology-driven Extraction - Define exactly what data to extract with configurable schemas
  • External Source Integration - Connect Gmail, Outlook, IMAP, or custom APIs
  • LLM-powered AI - Leverage Google Gemini and OpenAI for intelligent processing
  • Secure Credentials - OAuth tokens and API keys stored securely
  • RESTful API - Integrate with any application via standard HTTP APIs
  • Real-time Updates - Server-Sent Events (SSE) for live job status
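Server-Sent Events is a standard text protocol. Each event is a group of `field: value` lines, and a blank line ends the event. The sketch below parses such a stream into events, which is roughly what a client does when following live job status. The event name `job_status` and the JSON payload fields (`job_id`, `status`) are assumptions for illustration, not documented bizsupply formats.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Group SSE lines into events; a blank line ends each event.

    Only the `event` and `data` fields are handled; `id`, `retry`,
    and comment lines (starting with ':') are ignored for brevity.
    """
    event_type = "message"  # SSE default when no `event:` field is sent
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the accumulated event, if it has data.
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines)}
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips one leading space
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)


# Hypothetical job-status stream; event name and payload are assumptions.
stream = [
    "event: job_status\n",
    'data: {"job_id": "job-123", "status": "running"}\n',
    "\n",
    ": keep-alive\n",
    "event: job_status\n",
    'data: {"job_id": "job-123", "status": "completed"}\n',
    "\n",
]

for ev in parse_sse(stream):
    payload = json.loads(ev["data"])
    print(ev["event"], payload["status"])
```

In practice the lines would come from a streaming HTTP response rather than a list; the parsing logic stays the same.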

How It Works

Document Processing Flow

1. Ingest documents from your connected sources
   ↓
2. Documents are stored and parsed
   ↓
3. Classification plugins analyze and label documents
   ↓
4. Extraction plugins pull structured data based on your ontology
   ↓
5. Aggregation plugins link related documents
   ↓
6. Access your processed documents via API

The Plugin System

Plugins are built using the bizsupply-sdk package (pip install bizsupply-sdk), which provides base classes, data models, and a CLI for scaffolding and validation.

bizsupply uses four types of plugins, each with a specific role:

Plugin Type      Purpose                                   Example
Source           Ingest documents from external sources    Gmail inbox fetcher
Classification   Categorize documents with labels          Invoice detector
Extraction       Extract structured data fields            Invoice data extractor
Aggregation      Link related documents together           Invoice-to-contract linker

Plugins execute in order: Source → Classification → Extraction → Aggregation
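The fixed execution order can be pictured as a chain of stages, each receiving the documents produced by the stage before it. The classes below are a hypothetical illustration, not the base classes shipped in `bizsupply-sdk`; the SDK defines the real names and signatures.

```python
from abc import ABC, abstractmethod


class Plugin(ABC):
    """Hypothetical plugin interface (not the bizsupply-sdk API)."""

    stage: int  # position in the fixed execution order

    @abstractmethod
    def run(self, docs: list[dict]) -> list[dict]: ...


class SourcePlugin(Plugin):
    stage = 0

    def run(self, docs):
        # Stand-in for fetching messages from a Gmail inbox.
        return docs + [{"id": "doc-1", "text": "Invoice #42 total 99.00"}]


class ClassificationPlugin(Plugin):
    stage = 1

    def run(self, docs):
        for d in docs:
            d["labels"] = ["invoice"] if "Invoice" in d["text"] else []
        return docs


class ExtractionPlugin(Plugin):
    stage = 2

    def run(self, docs):
        for d in docs:
            if "invoice" in d["labels"]:
                d["data"] = {"total": float(d["text"].split("total ")[1])}
        return docs


class AggregationPlugin(Plugin):
    stage = 3

    def run(self, docs):
        # Stand-in for linking invoices to contracts.
        for d in docs:
            d["related"] = []
        return docs


def run_in_order(plugins: list[Plugin]) -> list[dict]:
    # Sorting by stage enforces Source -> Classification -> Extraction
    # -> Aggregation regardless of the order plugins were configured in.
    docs: list[dict] = []
    for plugin in sorted(plugins, key=lambda p: p.stage):
        docs = plugin.run(docs)
    return docs


result = run_in_order([ExtractionPlugin(), AggregationPlugin(),
                       SourcePlugin(), ClassificationPlugin()])
print(result[0]["labels"], result[0]["data"])
```

Because each stage only reads what earlier stages produced (extraction needs labels, aggregation needs extracted data), reordering them would break the chain.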


Core Concepts

Documents

A document is any file you process through bizsupply. Each document has:

  • Metadata - Source information, dates, sender, etc.
  • Labels - Classification tags (e.g., "invoice", "receipt")
  • Data - Extracted structured fields (e.g., total, date, vendor)
  • Files - Original file and parsed text content
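As a rough mental model, a document bundles those four parts. The dataclass below is a hypothetical shape for illustration; the actual data models come from `bizsupply-sdk` and the API, and their field names may differ.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    """Hypothetical document shape, not the bizsupply data model."""

    id: str
    metadata: dict[str, Any] = field(default_factory=dict)  # source, dates, sender
    labels: list[str] = field(default_factory=list)        # classification tags
    data: dict[str, Any] = field(default_factory=dict)      # extracted fields
    files: dict[str, str] = field(default_factory=dict)     # original file + parsed text

    def has_label(self, label: str) -> bool:
        return label in self.labels


doc = Document(
    id="doc-1",
    metadata={"source": "gmail", "sender": "billing@example.com"},
    labels=["invoice"],
    data={"total": 99.0, "vendor": "Acme"},
    files={"original": "invoice.pdf", "text": "Invoice #42 total 99.00"},
)
print(doc.has_label("invoice"), doc.data["vendor"])
```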

Pipelines

A pipeline is a configured workflow that combines:

  • One or more plugins to execute
  • One or more ontologies for extraction schemas
  • Optional source filters to process specific sources
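A pipeline definition can be sketched as a small configuration object that enforces the list above: at least one plugin, at least one ontology, and optional source filters. The field names and validation logic are assumptions for illustration, not the bizsupply API schema.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineConfig:
    """Hypothetical pipeline definition (field names are assumptions)."""

    name: str
    plugin_ids: list[str]
    ontology_ids: list[str]
    source_filters: list[str] = field(default_factory=list)  # empty = all sources

    def __post_init__(self) -> None:
        if not self.plugin_ids:
            raise ValueError("a pipeline needs at least one plugin")
        if not self.ontology_ids:
            raise ValueError("a pipeline needs at least one ontology")

    def accepts_source(self, source: str) -> bool:
        # With no filters configured, every connected source is processed.
        return not self.source_filters or source in self.source_filters


pipeline = PipelineConfig(
    name="invoices",
    plugin_ids=["invoice-detector", "invoice-extractor"],
    ontology_ids=["invoice-v1"],
    source_filters=["gmail"],
)
print(pipeline.accepts_source("gmail"), pipeline.accepts_source("imap"))
```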

Ontologies

An ontology defines your extraction schema:

  • Taxonomy - Hierarchical labels for classification
  • Fields - The data to extract, each defined by a name, a type, and whether it is required
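For example, an invoice ontology might pair a small label hierarchy with a few typed fields. The structure and the `parent_of` and `validate` helpers below are hypothetical, meant only to show how taxonomy and field definitions relate to extracted data; the real ontology format is defined by bizsupply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldSpec:
    name: str
    field_type: type
    required: bool = False


# Hypothetical taxonomy: each parent label maps to its child labels.
TAXONOMY = {
    "financial": ["invoice", "receipt"],
    "legal": ["contract"],
}

INVOICE_FIELDS = [
    FieldSpec("vendor", str, required=True),
    FieldSpec("total", float, required=True),
    FieldSpec("due_date", str),  # optional field
]


def parent_of(label: str) -> Optional[str]:
    """Walk one level up the taxonomy; None for top-level or unknown labels."""
    for parent, children in TAXONOMY.items():
        if label in children:
            return parent
    return None


def validate(data: dict, fields: list[FieldSpec]) -> list[str]:
    """Return a list of problems; an empty list means the data fits the schema."""
    problems = []
    for spec in fields:
        if spec.name not in data:
            if spec.required:
                problems.append(f"missing required field: {spec.name}")
        elif not isinstance(data[spec.name], spec.field_type):
            problems.append(f"{spec.name}: expected {spec.field_type.__name__}")
    return problems


print(parent_of("invoice"))
print(validate({"vendor": "Acme", "total": "99"}, INVOICE_FIELDS))
```

Here the extracted `total` arrives as the string `"99"`, so validation flags it, while the missing optional `due_date` is not reported.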

Jobs

When you execute a pipeline, a job is created to track:

  • Execution status (pending, running, completed, failed)
  • Documents processed
  • Results and any errors
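The four statuses suggest a simple lifecycle in which a job only moves forward and never leaves a terminal state. The allowed transitions below are an assumption inferred from the status names, not a documented bizsupply state machine.

```python
from enum import Enum


class JobStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transitions: pending -> running -> completed | failed.
# A job may also fail before it starts running.
ALLOWED = {
    JobStatus.PENDING: {JobStatus.RUNNING, JobStatus.FAILED},
    JobStatus.RUNNING: {JobStatus.COMPLETED, JobStatus.FAILED},
    JobStatus.COMPLETED: set(),
    JobStatus.FAILED: set(),
}


def is_terminal(status: JobStatus) -> bool:
    return not ALLOWED[status]


def advance(current: JobStatus, new: JobStatus) -> JobStatus:
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {new.value}")
    return new


status = JobStatus("pending")  # e.g. a status string read from an API response
status = advance(status, JobStatus.RUNNING)
status = advance(status, JobStatus.COMPLETED)
print(status.value, is_terminal(status))
```

A client polling or streaming job updates can stop once `is_terminal` returns `True`.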

Authentication

All API requests require a JWT Bearer token:

Authorization: Bearer <your_jwt_token>

Tokens are obtained through OAuth2 login with Google or Microsoft.
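In Python, the header can be attached with the standard library alone. The base URL and the `/jobs` path below are placeholders, not documented bizsupply endpoints; the token is whatever the OAuth2 login flow returned.

```python
import urllib.request

API_BASE = "https://api.example.com"  # placeholder, not the real bizsupply host


def build_request(path: str, token: str) -> urllib.request.Request:
    """Prepare a GET request carrying the JWT as a Bearer token."""
    return urllib.request.Request(
        f"{API_BASE}{path}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


req = build_request("/jobs", token="<your_jwt_token>")
print(req.get_header("Authorization"))
# To send it: urllib.request.urlopen(req)
```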


Architecture Principles

Multi-Tenancy

Every resource is scoped to your user and tenant:

  • All data queries automatically filter by your user and tenant context
  • File storage paths include your identifiers
  • You can only access your own documents and plugins
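The scoping rules can be pictured as two helpers that every data access goes through: one builds storage paths from tenant and user identifiers, and one filters records by the caller's context. The path layout and record fields are hypothetical; bizsupply's internal storage scheme is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantContext:
    tenant_id: str
    user_id: str


def storage_path(ctx: TenantContext, document_id: str, filename: str) -> str:
    # Hypothetical layout: identifiers come first, so each tenant's and
    # user's files live under their own directory.
    return f"tenants/{ctx.tenant_id}/users/{ctx.user_id}/documents/{document_id}/{filename}"


def scoped(records: list[dict], ctx: TenantContext) -> list[dict]:
    # Every query applies the caller's context; there is no unscoped variant.
    return [
        r for r in records
        if r["tenant_id"] == ctx.tenant_id and r["user_id"] == ctx.user_id
    ]


alice = TenantContext("acme", "alice")
records = [
    {"id": "d1", "tenant_id": "acme", "user_id": "alice"},
    {"id": "d2", "tenant_id": "acme", "user_id": "bob"},
    {"id": "d3", "tenant_id": "globex", "user_id": "alice"},
]
print([r["id"] for r in scoped(records, alice)])
print(storage_path(alice, "d1", "invoice.pdf"))
```

Note that `d3` is excluded even though the user ID matches: both identifiers must agree.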

Plugin Isolation

Plugins run in isolated environments:

  • Each plugin has its own dependencies
  • Plugin crashes don't affect other operations
  • Plugins receive only the data they need
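One common way to get this kind of isolation is to run each plugin in its own process, hand it only its input as JSON, and treat a crash as a per-plugin failure. The sketch below uses `subprocess` to illustrate the idea; it is not how bizsupply actually sandboxes plugins, and a real setup would also give each plugin its own environment for dependencies rather than reusing the current interpreter.

```python
import json
import subprocess
import sys

# Stand-in plugins (hypothetical): one reads a JSON document, one crashes.
GOOD_PLUGIN = "import json,sys; d=json.load(sys.stdin); print(json.dumps({'labels': ['invoice']}))"
CRASHING_PLUGIN = "raise RuntimeError('plugin bug')"


def run_isolated(code: str, payload: dict, timeout: float = 10.0) -> dict:
    """Run plugin code in a separate Python process.

    The child sees only `payload` on stdin; a crash or timeout is reported
    as an error result instead of propagating to the caller.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=json.dumps(payload),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "error": "timeout"}
    if proc.returncode != 0:
        lines = proc.stderr.strip().splitlines()
        error = lines[-1] if lines else f"exit code {proc.returncode}"
        return {"ok": False, "error": error}
    return {"ok": True, "result": json.loads(proc.stdout)}


doc = {"id": "doc-1", "text": "Invoice #42"}  # only the fields the plugin needs
print(run_isolated(GOOD_PLUGIN, doc))
print(run_isolated(CRASHING_PLUGIN, doc))
```

The crashing plugin produces an error result while the caller keeps running, which mirrors the "crashes don't affect other operations" property.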

Scalability

The platform scales automatically to:

  • Handle large document volumes
  • Process multiple jobs concurrently
  • Add capacity during peak usage

Background tasks (auto-sync, data processing) run as Cloud Run worker pools for cost-efficient scaling.

Next Steps