Key Concepts

Understanding the core concepts will help you work effectively with the bizsupply API.


Core Entities

Document

A document is any file processed through bizsupply.

Key Properties:

Property   Description                Example
id         Unique identifier          01HQZX3K4M2N5P7Q8R9S0T1U2V
metadata   Source information         {"source": "gmail", "sender": "[email protected]"}
labels     Classification tags        ["invoice", "utility"]
data       Extracted structured data  {"total": 1500.00, "vendor": "Acme"}

Lifecycle:

  1. Created by a Source plugin (ingested from Gmail, etc.)
  2. Labeled by a Classification plugin (tagged as "invoice", etc.)
  3. Data extracted by an Extraction plugin (total, date, vendor, etc.)
  4. Related to other documents by an Aggregation plugin
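The first three lifecycle stages can be pictured as a record that each stage fills in. This is only a minimal sketch: the `Document` dataclass and the `ingest`, `label`, and `extract` functions are hypothetical stand-ins written for illustration, not types or calls from the bizsupply SDK.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Hypothetical stand-in mirroring the key properties listed above."""
    id: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    labels: List[str] = field(default_factory=list)
    data: Dict[str, Any] = field(default_factory=dict)


def ingest(doc_id: str, source: str) -> Document:
    # Stage 1: a Source plugin creates the document.
    return Document(id=doc_id, metadata={"source": source})


def label(doc: Document, *labels: str) -> Document:
    # Stage 2: a Classification plugin attaches labels.
    doc.labels.extend(labels)
    return doc


def extract(doc: Document, **fields: Any) -> Document:
    # Stage 3: an Extraction plugin fills in structured data.
    doc.data.update(fields)
    return doc


doc = ingest("01HQZX3K4M2N5P7Q8R9S0T1U2V", source="gmail")
doc = label(doc, "invoice", "utility")
doc = extract(doc, total=1500.00, vendor="Acme")
```

After these calls, `doc` carries the same shape as the example values in the properties table: an `id`, source `metadata`, two `labels`, and extracted `data`.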

Plugin

A plugin is Python code that processes documents. Plugins are the core extensibility mechanism.

Four Plugin Types:

Type            Base Class            Method                         Return Type
Source          SourcePlugin          fetch()                        AsyncIterator[DocumentInput]
Classification  ClassificationPlugin  classify()                     str | None
Extraction      ExtractionPlugin      extract()                      ExtractionResult
Aggregation     BaseBenchmark         score(), compute(), compare()  float | None, float, bool

Plugin Components:

  • Code (plugin.py): Python class with your processing logic and metadata defined as class attributes. Built using the bizsupply-sdk package (pip install bizsupply-sdk).
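To make the shape of a plugin concrete, here is a rough sketch of a classification plugin. It does not import bizsupply-sdk: the `ClassificationPlugin` base class below is a local stand-in so the example runs on its own. The argument passed to `classify()` and the metadata attribute names (`name`, `version`) are assumptions, and the real SDK may differ.

```python
from abc import ABC, abstractmethod
from typing import Optional


class ClassificationPlugin(ABC):
    """Local stand-in for the SDK base class; the real signature may differ."""

    @abstractmethod
    def classify(self, text: str) -> Optional[str]:
        """Return a label for the document, or None if no label applies."""


class InvoiceClassifier(ClassificationPlugin):
    # Metadata defined as class attributes; these attribute names are assumed.
    name = "invoice-classifier"
    version = "0.1.0"

    def classify(self, text: str) -> Optional[str]:
        # Naive keyword rule, used here only to show the return contract.
        return "invoice" if "invoice" in text.lower() else None


classifier = InvoiceClassifier()
matched = classifier.classify("Invoice #42 from Acme")
unmatched = classifier.classify("Meeting notes")
```

The point of the sketch is the return contract from the plugin types listed above: a classification plugin yields a label string or `None`.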

Execution Order:

Plugins always execute in this order: Source → Classification → Extraction → Aggregation
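The fixed stage order can be illustrated by sorting a mixed list of plugins by type. This is a sketch for intuition only; the `(name, type)` tuples and the lowercase type strings are assumptions made for the example, and the platform applies this ordering itself.

```python
# Stage order as documented: Source, Classification, Extraction, Aggregation.
STAGE_ORDER = ["source", "classification", "extraction", "aggregation"]


def in_execution_order(plugins):
    """Sort (name, type) pairs by stage.

    sorted() is stable, so plugins of the same type keep their input order.
    """
    return sorted(plugins, key=lambda p: STAGE_ORDER.index(p[1]))


plugins = [
    ("Invoice Extractor", "extraction"),
    ("Gmail Source", "source"),
    ("Invoice Classifier", "classification"),
]
ordered = in_execution_order(plugins)
ordered_names = [name for name, _ in ordered]
```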


Ontology

An ontology defines what to classify and extract from documents.

Two Parts:

  1. Taxonomy - Hierarchical labels for classification
  2. Fields - Data fields to extract for each label

Example:

name: "Invoice Ontology"
description: "Schema for invoice processing"
taxonomy:
  label: "invoice"
  fields:
    - name: "invoice_total"
      type: "number"
      required: true
    - name: "invoice_date"
      type: "date"
      required: true
    - name: "vendor_name"
      type: "string"
      required: true
  children:
    - label: "utility_invoice"
      fields:
        - name: "utility_type"
          type: "string"
          required: true

Usage:

  • Classification plugins use the taxonomy to apply appropriate labels
  • Extraction plugins use the fields to know what data to extract
  • Multiple ontologies can be combined in a single pipeline
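One way to see how the taxonomy and fields fit together is to walk the tree and list the fields declared on each label. The sketch below mirrors the example ontology as a plain Python dict; `fields_by_label` is a hypothetical helper, not part of the SDK.

```python
# The example ontology, written as a Python dict instead of YAML.
ontology = {
    "name": "Invoice Ontology",
    "description": "Schema for invoice processing",
    "taxonomy": {
        "label": "invoice",
        "fields": [
            {"name": "invoice_total", "type": "number", "required": True},
            {"name": "invoice_date", "type": "date", "required": True},
            {"name": "vendor_name", "type": "string", "required": True},
        ],
        "children": [
            {
                "label": "utility_invoice",
                "fields": [
                    {"name": "utility_type", "type": "string", "required": True},
                ],
            }
        ],
    },
}


def fields_by_label(node):
    """Recursively map each taxonomy label to the field names declared on it."""
    result = {node["label"]: [f["name"] for f in node.get("fields", [])]}
    for child in node.get("children", []):
        result.update(fields_by_label(child))
    return result


index = fields_by_label(ontology["taxonomy"])
```

The helper reports only the fields declared directly on each label. Whether a child label such as `utility_invoice` also inherits its parent's fields is not stated here, so the sketch makes no assumption about it.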

Pipeline

A pipeline is a configured workflow combining plugins and ontologies.

Components:

Component              Description
plugin_ids             Ordered list of plugins to execute
ontology_catalogs_ids  Ontologies to use for extraction
source_ids             (Optional) Specific sources to process

Example:

Pipeline: "Gmail Invoice Processing"
  ├─ Plugins:
  │   1. Gmail Source (ingest emails with attachments)
  │   2. Invoice Classifier (detect and label invoices)
  │   3. Invoice Extractor (extract total, date, vendor)
  └─ Ontologies:
      • Invoice Ontology
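As a data structure, the pipeline above might look like the following. This is a sketch under stated assumptions: the three `*_ids` keys come from the components listed earlier, while the `name` key, the ID strings, the `build_pipeline` helper, and the rule that at least one plugin is required are all hypothetical.

```python
def build_pipeline(name, plugin_ids, ontology_catalogs_ids, source_ids=None):
    """Assemble a pipeline definition as a plain dict (hypothetical helper)."""
    if not plugin_ids:
        # Assumed rule for this sketch: an empty pipeline is not useful.
        raise ValueError("a pipeline needs at least one plugin")
    config = {
        "name": name,  # assumed key, not listed among the components
        "plugin_ids": list(plugin_ids),  # kept in execution order
        "ontology_catalogs_ids": list(ontology_catalogs_ids),
    }
    if source_ids is not None:
        # source_ids is optional, so it is included only when given.
        config["source_ids"] = list(source_ids)
    return config


pipeline = build_pipeline(
    name="Gmail Invoice Processing",
    plugin_ids=["gmail-source", "invoice-classifier", "invoice-extractor"],
    ontology_catalogs_ids=["invoice-ontology"],
)
```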

Job

When you execute a pipeline, a job tracks the processing.

Job States:

State      Description
pending    Job created, waiting to start
running    Currently processing documents
completed  Finished successfully
failed     Encountered an error

Job Information:

  • Documents processed count
  • Current plugin being executed
  • Start/end timestamps
  • Error details (if failed)
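A client typically waits for a job by polling until it reaches `completed` or `failed`. The sketch below models the four states and a polling loop. `wait_for_job` and its `get_state` callable are hypothetical; in practice `get_state` would wrap whatever API call returns the job's current state, which is simulated here with a fixed sequence.

```python
import time
from enum import Enum


class JobState(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


# States after which a job no longer changes.
TERMINAL_STATES = {JobState.COMPLETED, JobState.FAILED}


def wait_for_job(get_state, interval=0.0, max_polls=100):
    """Poll get_state() until the job reaches a terminal state."""
    for _ in range(max_polls):
        state = JobState(get_state())
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval)
    raise TimeoutError("job did not reach a terminal state")


# Simulated status sequence standing in for real API responses.
simulated = iter(["pending", "running", "running", "completed"])
final_state = wait_for_job(lambda: next(simulated))
```

Bounding the loop with `max_polls` keeps a stuck job from blocking the caller indefinitely; a real client would also choose a non-zero `interval`.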

Credential

Credentials connect bizsupply to external services.

Supported Types:

OAuth2 (Gmail, Outlook):

{
  "client_id": "your-client-id",
  "client_secret": "your-client-secret",
  "refresh_token": "your-refresh-token"
}

IMAP (Email servers):

{
  "host": "imap.gmail.com",
  "port": 993,
  "username": "[email protected]",
  "password": "your-app-password",
  "use_ssl": true
}

API Key (Custom APIs):

{
  "api_key": "your-api-key",
  "api_url": "https://api.example.com"
}

Credentials are stored securely and never exposed in API responses.
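Before submitting a credential, it can help to check that the payload has the keys shown in the examples above. The following is a hedged sketch: the type identifiers (`oauth2`, `imap`, `api_key`) and the assumption that every key shown is required are mine, not confirmed by the API.

```python
# Keys taken from the example payloads above; treating all of them as
# required is an assumption of this sketch.
EXPECTED_KEYS = {
    "oauth2": {"client_id", "client_secret", "refresh_token"},
    "imap": {"host", "port", "username", "password", "use_ssl"},
    "api_key": {"api_key", "api_url"},
}


def missing_keys(kind, payload):
    """Return the expected keys absent from payload, sorted alphabetically."""
    return sorted(EXPECTED_KEYS[kind] - set(payload))


complete = missing_keys(
    "api_key",
    {"api_key": "your-api-key", "api_url": "https://api.example.com"},
)
incomplete = missing_keys("imap", {"host": "imap.gmail.com", "port": 993})
```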


Relationships

User
  ├─ owns Documents
  ├─ owns Plugins
  ├─ owns Ontologies
  ├─ owns Pipelines
  └─ has Credentials

Pipeline
  ├─ references Plugins (in execution order)
  ├─ references Ontologies
  └─ creates Jobs when executed

Job
  ├─ executes a Pipeline
  ├─ processes Documents
  └─ tracks status and results

Document
  ├─ has labels (from Classification)
  ├─ has data (from Extraction)
  └─ can relate to other Documents

Data Isolation

Every resource belongs to a specific user:

  • You can only access your own documents, plugins, and pipelines
  • All API operations are automatically scoped to your user context
  • Complete data isolation between users
