Key Concepts
Understanding the core concepts will help you work effectively with the bizsupply API.
Core Entities
Document
A document is any file processed through bizsupply.
Key Properties:
| Property | Description | Example |
|---|---|---|
| `id` | Unique identifier | `01HQZX3K4M2N5P7Q8R9S0T1U2V` |
| `metadata` | Source information | `{"source": "gmail", "sender": "[email protected]"}` |
| `labels` | Classification tags | `["invoice", "utility"]` |
| `data` | Extracted structured data | `{"total": 1500.00, "vendor": "Acme"}` |
Lifecycle:
- Created by a Source plugin (ingested from Gmail, etc.)
- Labeled by a Classification plugin (tagged as "invoice", etc.)
- Data extracted by an Extraction plugin (total, date, vendor, etc.)
- Related to other documents by an Aggregation plugin
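To make the properties and lifecycle concrete, here is a minimal sketch of a document moving through the first three stages. The `Document` dataclass is a hypothetical stand-in written for this example, not a class from `bizsupply-sdk`; only the four property names and the sample values come from the table above.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    """Hypothetical stand-in for a bizsupply document (not the SDK's class)."""
    id: str
    metadata: dict[str, Any] = field(default_factory=dict)
    labels: list[str] = field(default_factory=list)
    data: dict[str, Any] = field(default_factory=dict)


# 1. Source stage: the document is created with its source metadata.
doc = Document(id="01HQZX3K4M2N5P7Q8R9S0T1U2V", metadata={"source": "gmail"})

# 2. Classification stage: labels are attached.
doc.labels.extend(["invoice", "utility"])

# 3. Extraction stage: structured data is filled in.
doc.data.update({"total": 1500.00, "vendor": "Acme"})

print(doc.labels)
print(doc.data)
```

Each stage only adds to the document, which is why the stages can be described as a fixed sequence.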
Plugin
Plugins are pieces of Python code that process documents. They are the core extensibility mechanism.
Four Plugin Types:
| Type | Base Class | Method | Return Type |
|---|---|---|---|
| Source | `SourcePlugin` | `fetch()` | `AsyncIterator[DocumentInput]` |
| Classification | `ClassificationPlugin` | `classify()` | `str \| None` |
| Extraction | `ExtractionPlugin` | `extract()` | `ExtractionResult` |
| Aggregation | `BaseBenchmark` | `score()`, `compute()`, `compare()` | `float \| None`, `float`, `bool` |
Plugin Components:
- Code (`plugin.py`): Python class with your processing logic and metadata defined as class attributes. Built using the `bizsupply-sdk` package (`pip install bizsupply-sdk`).
Execution Order:
Plugins always execute in this order: Source → Classification → Extraction → Aggregation
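The fixed ordering can be illustrated with a small sketch that sorts a mixed list of plugins by stage. The `PluginType` enum, the `in_execution_order` helper, and the plugin names are all hypothetical and exist only for this example; how bizsupply enforces the order internally is not described here.

```python
from enum import IntEnum


class PluginType(IntEnum):
    """The four plugin types, numbered in the documented execution order."""
    SOURCE = 1
    CLASSIFICATION = 2
    EXTRACTION = 3
    AGGREGATION = 4


def in_execution_order(plugins: list[tuple[str, PluginType]]) -> list[str]:
    """Return plugin names sorted by stage; ties keep their input order."""
    return [name for name, _ in sorted(plugins, key=lambda p: p[1])]


# Hypothetical plugin names, deliberately listed out of order.
plugins = [
    ("Invoice Extractor", PluginType.EXTRACTION),
    ("Gmail Source", PluginType.SOURCE),
    ("Invoice Classifier", PluginType.CLASSIFICATION),
]
ordered = in_execution_order(plugins)
print(ordered)
```

Because Python's `sorted` is stable, two plugins of the same type keep the relative order in which they were listed.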
Ontology
An ontology defines what to classify and extract from documents.
Two Parts:
- Taxonomy - Hierarchical labels for classification
- Fields - Data fields to extract for each label
Example:
name: "Invoice Ontology"
description: "Schema for invoice processing"
taxonomy:
  label: "invoice"
  fields:
    - name: "invoice_total"
      type: "number"
      required: true
    - name: "invoice_date"
      type: "date"
      required: true
    - name: "vendor_name"
      type: "string"
      required: true
  children:
    - label: "utility_invoice"
      fields:
        - name: "utility_type"
          type: "string"
          required: true
Usage:
- Classification plugins use the taxonomy to apply appropriate labels
- Extraction plugins use the fields to know what data to extract
- Multiple ontologies can be combined in a single pipeline
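As a rough illustration of how the taxonomy and fields fit together, the sketch below mirrors the example ontology as a Python dict, looks up a label in the tree, and reports which required fields are missing from some extracted data. The nesting of `children` under `taxonomy` is assumed from the example, and `find_node` and `missing_required` are hypothetical helpers, not SDK functions. Whether child labels inherit their parent's fields is not stated, so this sketch checks only the fields declared on the matching node.

```python
from typing import Any, Optional

# The example ontology as a Python dict (nesting is an assumption).
ontology: dict[str, Any] = {
    "name": "Invoice Ontology",
    "taxonomy": {
        "label": "invoice",
        "fields": [
            {"name": "invoice_total", "type": "number", "required": True},
            {"name": "invoice_date", "type": "date", "required": True},
            {"name": "vendor_name", "type": "string", "required": True},
        ],
        "children": [
            {
                "label": "utility_invoice",
                "fields": [
                    {"name": "utility_type", "type": "string", "required": True},
                ],
            },
        ],
    },
}


def find_node(node: dict[str, Any], label: str) -> Optional[dict[str, Any]]:
    """Depth-first search for the taxonomy node carrying `label`."""
    if node["label"] == label:
        return node
    for child in node.get("children", []):
        found = find_node(child, label)
        if found is not None:
            return found
    return None


def missing_required(label: str, data: dict[str, Any]) -> list[str]:
    """Names of required fields for `label` that are absent from `data`."""
    node = find_node(ontology["taxonomy"], label)
    if node is None:
        raise KeyError(f"unknown label: {label}")
    return [
        f["name"]
        for f in node.get("fields", [])
        if f["required"] and f["name"] not in data
    ]


missing = missing_required("invoice", {"total": 1500.00, "vendor_name": "Acme"})
print(missing)
```

The sample data uses `total` rather than `invoice_total`, so the check flags `invoice_total` along with the absent `invoice_date`.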
Pipeline
A pipeline is a configured workflow combining plugins and ontologies.
Components:
| Component | Description |
|---|---|
| `plugin_ids` | Ordered list of plugins to execute |
| `ontology_catalogs_ids` | Ontologies to use for extraction |
| `source_ids` | (Optional) Specific sources to process |
Example:
Pipeline: "Gmail Invoice Processing"
├─ Plugins:
│ 1. Gmail Source (ingest emails with attachments)
│ 2. Invoice Classifier (detect and label invoices)
│ 3. Invoice Extractor (extract total, date, vendor)
└─ Ontologies:
• Invoice Ontology
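The same pipeline could be written down as a plain Python dict and sanity-checked before it is submitted. This is only a sketch: the ID strings and the `name` key are made up for illustration, `validate_pipeline` is a hypothetical helper, and the rule that the two required lists must be non-empty is an assumption rather than documented API behavior.

```python
from typing import Any

# Hypothetical IDs; real ones come from your bizsupply account.
pipeline: dict[str, Any] = {
    "name": "Gmail Invoice Processing",  # assumed key, not in the table above
    "plugin_ids": ["gmail-source", "invoice-classifier", "invoice-extractor"],
    "ontology_catalogs_ids": ["invoice-ontology"],
    # "source_ids" is optional and omitted here.
}


def validate_pipeline(config: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the shape looks fine."""
    problems = []
    for key in ("plugin_ids", "ontology_catalogs_ids"):
        value = config.get(key)
        if not isinstance(value, list) or not value:
            problems.append(f"{key} must be a non-empty list")
    source_ids = config.get("source_ids")
    if source_ids is not None and not isinstance(source_ids, list):
        problems.append("source_ids must be a list when present")
    return problems


problems = validate_pipeline(pipeline)
print(problems)
```

Note that `plugin_ids` is an ordered list, so the order of its entries is part of the configuration.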
Job
When you execute a pipeline, a job tracks the processing.
Job States:
| State | Description |
|---|---|
| `pending` | Job created, waiting to start |
| `running` | Currently processing documents |
| `completed` | Finished successfully |
| `failed` | Encountered an error |
Job Information:
- Documents processed count
- Current plugin being executed
- Start/end timestamps
- Error details (if failed)
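Since `completed` and `failed` are the two states in which a job has stopped, a client will typically poll until it sees one of them. The sketch below shows that loop under stated assumptions: `wait_for_job` is a hypothetical helper, and `get_state` stands in for whatever call returns the job's current state (no real bizsupply endpoint is used here).

```python
import time
from typing import Callable

# States after which a job no longer changes, per the table above.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(
    get_state: Callable[[], str],
    interval: float = 1.0,
    max_polls: int = 60,
) -> str:
    """Poll `get_state` until a terminal state is seen or polls run out."""
    state = get_state()
    for _ in range(max_polls - 1):
        if state in TERMINAL_STATES:
            break
        time.sleep(interval)
        state = get_state()
    return state


# Stand-in for a real status lookup: replays a fixed sequence of states.
states = iter(["pending", "running", "running", "completed"])
final_state = wait_for_job(lambda: next(states), interval=0.0)
print(final_state)
```

If the poll budget is exhausted, the function returns the last state it saw, so the caller should still check whether the result is terminal.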
Credential
Credentials connect bizsupply to external services.
Supported Types:
OAuth2 (Gmail, Outlook):
{
  "client_id": "your-client-id",
  "client_secret": "your-client-secret",
  "refresh_token": "your-refresh-token"
}
IMAP (Email servers):
{
  "host": "imap.gmail.com",
  "port": 993,
  "username": "[email protected]",
  "password": "your-app-password",
  "use_ssl": true
}
API Key (Custom APIs):
{
  "api_key": "your-api-key",
  "api_url": "https://api.example.com"
}
Credentials are stored securely and never exposed in API responses.
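Before submitting a credential, it can help to check that the payload has the keys shown for its type. The sketch below does that with a lookup table built from the three examples above. The type identifiers (`oauth2`, `imap`, `api_key`) and the `missing_keys` helper are hypothetical, and treating every key in the examples as required is an assumption.

```python
from typing import Any

# Keys per credential type, copied from the examples; the type names
# used as dictionary keys here are assumptions made for this sketch.
REQUIRED_KEYS: dict[str, set[str]] = {
    "oauth2": {"client_id", "client_secret", "refresh_token"},
    "imap": {"host", "port", "username", "password", "use_ssl"},
    "api_key": {"api_key", "api_url"},
}


def missing_keys(kind: str, payload: dict[str, Any]) -> list[str]:
    """Return the sorted names of expected keys absent from `payload`."""
    if kind not in REQUIRED_KEYS:
        raise ValueError(f"unknown credential type: {kind}")
    return sorted(REQUIRED_KEYS[kind] - set(payload))


incomplete = missing_keys("api_key", {"api_key": "your-api-key"})
print(incomplete)
```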
Relationships
User
├─ owns Documents
├─ owns Plugins
├─ owns Ontologies
├─ owns Pipelines
└─ has Credentials
Pipeline
├─ references Plugins (in execution order)
├─ references Ontologies
└─ creates Jobs when executed
Job
├─ executes a Pipeline
├─ processes Documents
└─ tracks status and results
Document
├─ has labels (from Classification)
├─ has data (from Extraction)
└─ can relate to other Documents
Data Isolation
Every resource belongs to a specific user:
- You can only access your own documents, plugins, and pipelines
- All API operations are automatically scoped to your user context
- Complete data isolation between users
Next Steps
- Install the SDK → `pip install bizsupply-sdk`
- Build a plugin → Create a Plugin
- Define extraction schemas → Create an Ontology
- Process documents → Process Documents