Plugin Service API Reference

Complete reference for all service methods, helper functions, and data models available through the bizsupply-sdk.


Install the SDK

Everything below requires the SDK, so install it first:

pip install bizsupply-sdk

How It Works

The bizsupply Engine handles most operations automatically. Plugins and benchmarks focus on business logic:

| What the Engine Does | What You Do |
|---|---|
| Fetches document file data | Call prompt_llm() for AI processing |
| Resolves ontology fields from labels | Call get_prompt() to load prompt templates |
| Persists classifications and extracted data | Format fields with format_fields_for_prompt() |
| Manages source credentials and state | Log with self.logger |
| Pre-fetches aggregations for benchmarks | Return results (Engine persists them) |
| Resolves slot names to semantic field names | Access fields via document.get("field_name") |

Service Methods Quick Reference

| Method | Type | Available To | Returns |
|---|---|---|---|
| await self.prompt_llm(...) | Async | All plugins | dict \| list \| None |
| await self.get_prompt(prompt_id) | Async | All plugins | str |
| self.format_fields_for_prompt(fields) | Sync | Extraction plugins | str |
| self.logger | Property | All plugins + benchmarks | Logger |

prompt_llm()

result = await self.prompt_llm(
    prompt,
    file_data=None,
    mime_type=None,
    schema=None,
)

Sends a prompt to an LLM and returns a parsed JSON response. The platform handles provider selection, caching, and other optimizations automatically.

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | str | Yes | The prompt text to send to the LLM |
| file_data | bytes \| None | No | File bytes to include for multimodal/vision processing. Pass the file_data parameter from your plugin method. |
| mime_type | str \| None | No | MIME type of file_data (e.g., "application/pdf", "image/png", "text/plain"). Pass the mime_type parameter from your plugin method. |
| schema | type[BaseModel] \| dict \| None | No | Pydantic model class or dict schema for structured output. The LLM response will conform to this schema. |

Returns: dict | list | None

  • dict - Parsed JSON object from LLM response
  • list - Parsed JSON array from LLM response
  • None - If the LLM call fails, the response is empty, or JSON parsing fails

Raises: RuntimeError if services are not initialized.

Example - Basic prompt:

result = await self.prompt_llm(
    prompt="Is this an invoice? Return JSON: {\"is_invoice\": true/false}"
)
if result:
    is_invoice = result.get("is_invoice")

Example - With file attachment (multimodal):

result = await self.prompt_llm(
    prompt="Extract invoice data from this document.",
    file_data=file_data,
    mime_type=mime_type,
)

Example - With Pydantic schema for structured output:

from pydantic import BaseModel

class InvoiceData(BaseModel):
    vendor_name: str
    total: float
    date: str

result = await self.prompt_llm(
    prompt="Extract vendor, total, and date from this invoice.",
    file_data=file_data,
    mime_type=mime_type,
    schema=InvoiceData,
)
# result is a dict matching the schema: {"vendor_name": "...", "total": ..., "date": "..."}

get_prompt()

template = await self.get_prompt(prompt_id)

Retrieves a stored prompt template by ID. Prompt templates are created via the REST API and referenced by their UUID.

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt_id | str | Yes | UUID of the prompt template |

Returns: str - The prompt template content as a string.

Raises: RuntimeError if services are not initialized; Exception if the prompt is not found.

Example:

prompt_id = configs.get("classification_prompt_id")
template = await self.get_prompt(prompt_id)

prompt = template.format(
    labels=available_labels,
    document_content="[See attached document file]",
)

result = await self.prompt_llm(prompt=prompt, file_data=file_data, mime_type=mime_type)

format_fields_for_prompt()

fields_json = self.format_fields_for_prompt(fields)

Formats a list of OntologyField objects as a JSON string for inclusion in LLM prompts. This is a sync method (no await needed).

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| fields | list[OntologyField] | Yes | List of OntologyField objects to format |

Returns: str - Pretty-printed JSON string with field definitions.

Output format:

[
  {"name": "invoice_total", "type": "number", "description": "Total amount due"},
  {"name": "vendor_name", "type": "string", "description": "Name of the vendor"}
]

Example:

async def extract(self, document, file_data, mime_type, fields, configs):
    fields_json = self.format_fields_for_prompt(fields)

    result = await self.prompt_llm(
        prompt=f"Extract these fields:\n{fields_json}",
        file_data=file_data,
        mime_type=mime_type,
    )

    return ExtractionResult(data=result or {})

self.logger

Logger instance automatically available in all plugins and benchmarks. Creates a hierarchical logger name: bizsupply_sdk.plugins.{type}.{class_name}.

Methods:

| Method | Use For |
|---|---|
| self.logger.debug(msg) | Detailed debugging information |
| self.logger.info(msg) | Progress updates, key decisions |
| self.logger.warning(msg) | Non-critical issues that may need attention |
| self.logger.error(msg) | Errors (continues execution) |

Important: Use self.logger, not print() or logging.getLogger().
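Because the logger name is hierarchical, standard logging configuration applied to the bizsupply_sdk prefix cascades down to every plugin logger. A minimal standalone sketch of the naming scheme (the class name MyExtractor is hypothetical):

```python
import logging

# Mirrors the naming scheme bizsupply_sdk.plugins.{type}.{class_name};
# "MyExtractor" is a hypothetical plugin class name.
logger = logging.getLogger("bizsupply_sdk.plugins.extraction.MyExtractor")

# Setting a level on the parent prefix cascades to all plugin loggers.
logging.getLogger("bizsupply_sdk").setLevel(logging.INFO)

hierarchy = logger.name.split(".")      # four levels, root prefix first
effective = logger.getEffectiveLevel()  # inherited from the parent logger
```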


What the Engine Handles

The Engine manages all persistence and lifecycle operations automatically.

For Classification Plugins

| Operation | How It Works |
|---|---|
| Fetch document content | Engine pre-fetches file_data and mime_type once, passes to every classify() call |
| Traverse ontology | Engine walks the tree, calling classify() at each level with the correct available_labels |
| Persist labels | Engine builds the full classification path from your responses and saves it |
| Handle suggestions | When you return None or a label not in the ontology, Engine triggers the suggestion workflow |
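From the plugin's side, this division of labor means classify() only has to pick one label per level. A self-contained sketch: the SDK base class is replaced by a stub, prompt_llm is faked, and the exact classify() signature is an assumption mirroring the extract() signature shown later.

```python
import asyncio

class StubClassifier:
    """Stand-in for a ClassificationPlugin; prompt_llm is faked so the
    sketch runs without the SDK or an LLM provider."""

    async def prompt_llm(self, prompt, file_data=None, mime_type=None, schema=None):
        return {"label": "contract"}  # a real plugin gets this from the LLM

    async def classify(self, document, file_data, mime_type, available_labels, configs):
        result = await self.prompt_llm(
            prompt=f"Pick exactly one label from {available_labels}."
        )
        label = result.get("label") if result else None
        # A label from available_labels is persisted by the Engine; None
        # (or an unknown label) triggers the suggestion workflow instead.
        return label if label in available_labels else None

label = asyncio.run(
    StubClassifier().classify({}, b"%PDF", "application/pdf", ["contract", "invoice"], {})
)
```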

For Extraction Plugins

| Operation | How It Works |
|---|---|
| Skip unclassified docs | Engine skips documents without labels |
| Resolve fields | Engine finds ontology fields matching document.labels and injects them as fields |
| Fetch document content | Engine pre-fetches file_data and mime_type |
| Persist extracted data | Engine saves your ExtractionResult.data to the database |

For Source Plugins

| Operation | How It Works |
|---|---|
| Fetch credentials | Engine loads credentials from secure storage and injects as DynamicCredential |
| Load state | Engine loads and deserializes your typed state model from the database |
| Create documents | For each DocumentInput you yield, Engine creates the document |
| Save state | Engine auto-saves your state after each document (crash-resilient) |

For Benchmarks

| Operation | How It Works |
|---|---|
| Query documents | Engine queries documents matching target_labels |
| Resolve fields | Engine maps physical slot names (number_1, string_2) to semantic names (price_per_kwh) |
| Build ExtendedDocument | Engine constructs ExtendedDocument with resolved fields |
| Pre-fetch aggregations | Engine links related documents using your MATCH_RULES and attaches as .aggregations |
| Run scoring loop | Engine calls score() for each document, filters None results |
| Persist scores | Engine builds score records and saves to database |
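The scoring loop can be approximated locally with plain dicts standing in for ExtendedDocument (field names like price_per_kwh are illustrative):

```python
# Plain dicts stand in for ExtendedDocument; score() and compute()
# follow the shapes described above.
def score(document):
    price = document.get("price_per_kwh")
    return None if price is None else float(price)

def compute(scores):
    return min(scores)  # e.g., "best price" as the benchmark result

documents = [
    {"price_per_kwh": 0.18},
    {"price_per_kwh": None},   # score() yields None; the Engine filters these
    {"price_per_kwh": 0.15},
]

scores = [s for s in (score(d) for d in documents) if s is not None]
result = compute(scores)  # 0.15
```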

Data Models

Document

The primary model passed to classification and extraction plugins.

from bizsupply_sdk import Document

Attributes:

| Attribute | Type | Description |
|---|---|---|
| document_id | str | Unique identifier for the document (UUID) |
| original_filename | str | Original filename with extension (e.g., "invoice.pdf", "contract.docx") |
| labels | list[str] \| None | Hierarchical classification labels (e.g., ["contract", "energy", "residential"]). None or empty list if not yet classified. |
| data | dict[str, Any] \| None | Extracted field values (e.g., {"invoice_total": 1500.00}). Empty dict if no data extracted yet. |
| metadata | dict[str, Any] \| None | Custom metadata attached to the document (e.g., source info, external IDs). |
| created_at | datetime \| None | Timestamp when the document was created in the platform |
| updated_at | datetime \| None | Timestamp when the document was last modified |

Usage in plugins:

# In classify() or extract()
self.logger.info(f"Processing {document.document_id}: {document.original_filename}")
self.logger.info(f"Labels: {document.labels}")
self.logger.info(f"Existing data: {document.data}")
self.logger.info(f"Metadata: {document.metadata}")

ExtractionResult

Return type for extraction plugins' extract() method.

from bizsupply_sdk import ExtractionResult

Attributes:

| Attribute | Type | Required | Description |
|---|---|---|---|
| data | dict[str, Any] | Yes | Dictionary of field name -> extracted value pairs. Keys should match OntologyField.name values from the fields parameter. |
| llm_fields | list[str] \| None | No | List of field names that were generated by the LLM (for provenance tracking). Default: None. |
| document_type | str \| None | No | Document type identifier for silver tier mapping. If not provided, inferred from the document's labels. |

Example:

return ExtractionResult(
    data={
        "invoice_total": 1500.00,
        "vendor_name": "ACME Corp",
        "invoice_date": "2025-01-15",
    },
    llm_fields=["invoice_total", "vendor_name", "invoice_date"],
    document_type="invoice",
)

DocumentInput

Yielded by source plugins from fetch(). Each DocumentInput becomes a new document in the platform.

from bizsupply_sdk import DocumentInput

Attributes:

| Attribute | Type | Required | Description |
|---|---|---|---|
| file_data | bytes | Yes | Raw file content as bytes. This is the actual file (PDF, image, text, etc.). |
| filename | str | Yes | Original filename including extension (e.g., "invoice_123.pdf"). The extension is used for MIME type detection. |
| mime_type | str \| None | No | MIME type of the file (e.g., "application/pdf"). Auto-detected from content if not provided. |
| metadata | dict[str, Any] \| None | No | Optional metadata to attach to the document (e.g., {"source": "gmail", "sender": "[email protected]"}). |

Example:

yield DocumentInput(
    file_data=pdf_bytes,
    filename="invoice_123.pdf",
    mime_type="application/pdf",
    metadata={
        "source": "gmail",
        "sender": "[email protected]",
        "subject": "Invoice #123",
        "received_at": "2025-01-15T10:30:00Z",
    },
)

BaseSourceState

Base class for source plugin state models. Subclass to define your sync state.

from bizsupply_sdk import BaseSourceState

BaseSourceState is an empty Pydantic BaseModel that serves as a marker base class. Define your own fields:

class MySourceState(BaseSourceState):
    cursor: str | None = None
    last_sync_time: str | None = None
    page: int = 0
    processed_ids: list[str] = []

How state works:

  1. Engine loads your state from the database (deserialized as your typed model)
  2. Engine passes it to fetch() as the state parameter
  3. You mutate fields directly (state.cursor = "abc")
  4. Engine auto-saves state after each yielded DocumentInput
  5. If plugin crashes, Engine resumes from the last saved state

Example:

async def fetch(self, credentials, state, configs):
    self.logger.info(f"Resuming from cursor: {state.cursor}")

    for item in items:
        state.cursor = item.id      # update state BEFORE yielding
        state.page += 1
        yield DocumentInput(...)    # Engine saves state after this yield

DynamicCredential

Credential object injected into source plugins by the Engine. Provides multiple access patterns for credential fields.

from bizsupply_sdk import DynamicCredential

All credential values are stored as strings. Convert to other types in your plugin as needed.

Methods:

| Method | Signature | Description |
|---|---|---|
| Attribute access | credentials.api_key | Get field value. Raises AttributeError if not found. |
| .get() | credentials.get(name, default=None) | Get field value or default. Returns None if not found and no default. |
| .validate_required_fields() | credentials.validate_required_fields(["api_key", "api_url"]) | Raises ValueError if any listed fields are missing from credentials. |
| .has_field() | credentials.has_field("api_key") | Returns True if field exists and has a truthy (non-empty) value. |
| in operator | "api_key" in credentials | Returns True if field name exists (even if value is empty). |
| .keys() | credentials.keys() | Returns list[str] of all credential field names. |
| .to_dict() | credentials.to_dict() | Returns dict[str, Any] of all fields. Caution: may expose secrets in logs. |

Example:

async def fetch(self, credentials, state, configs):
    # Validate required fields early (raises ValueError if missing)
    credentials.validate_required_fields(["api_key", "api_url"])

    # Attribute access (raises AttributeError if missing)
    api_key = credentials.api_key
    api_url = credentials.api_url

    # .get() with default (returns default if missing)
    timeout = int(credentials.get("timeout_seconds", "30"))

    # Check if optional field exists and is non-empty
    if credentials.has_field("refresh_token"):
        # Handle OAuth refresh
        ...

    # List all available fields
    self.logger.info(f"Available credential fields: {credentials.keys()}")

OntologyField

Field definition injected into extraction plugins. Each field describes one piece of data to extract.

from bizsupply_sdk import OntologyField

Attributes:

| Attribute | Type | Description |
|---|---|---|
| name | str | Field name, used as the key in ExtractionResult.data (e.g., "invoice_total", "vendor_name") |
| dtype | str | Data type: "string", "int", "float", "date", or other custom types |
| description | str \| None | Human-readable description of what this field contains (e.g., "Total monetary value of the invoice") |

Example:

# In extract(), fields are injected by the Engine
for field in fields:
    self.logger.info(f"Field: {field.name} ({field.dtype}): {field.description}")

# Use format_fields_for_prompt() to convert to JSON for prompts
fields_json = self.format_fields_for_prompt(fields)

OntologyNode

A node in the ontology classification tree. Contains a label, optional fields, and optional child nodes.

from bizsupply_sdk import OntologyNode

Attributes:

| Attribute | Type | Description |
|---|---|---|
| label | str | The classification label for this node (e.g., "contract", "energy") |
| fields | list[OntologyField] | Fields to extract for documents at this classification level. Default: []. |
| children | list[OntologyNode] | Child nodes (sub-classifications). Default: []. |

Methods:

| Method | Returns | Description |
|---|---|---|
| get_all_fields() | list[OntologyField] | All fields in this subtree (this node + all descendants) |
| find_node_by_path(path) | OntologyNode \| None | Find a node by following a list of labels (e.g., ["contract", "energy"]) |
| get_leaf_labels() | list[list[str]] | All paths from this node to leaf nodes |
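The traversal semantics can be illustrated with a minimal local stand-in. This is not the SDK class; in particular, whether find_node_by_path expects the node's own label as the first path element is an assumption here.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Minimal local stand-in mirroring OntologyNode's traversal methods."""
    label: str
    children: list = field(default_factory=list)

    def find_node_by_path(self, path):
        # Assumes the path starts with this node's own label.
        if not path or path[0] != self.label:
            return None
        node = self
        for label in path[1:]:
            node = next((c for c in node.children if c.label == label), None)
            if node is None:
                return None
        return node

    def get_leaf_labels(self):
        if not self.children:
            return [[self.label]]
        return [[self.label] + tail
                for child in self.children
                for tail in child.get_leaf_labels()]

root = Node("contract", children=[Node("energy"), Node("telecom")])
paths = root.get_leaf_labels()                         # all root-to-leaf paths
node = root.find_node_by_path(["contract", "energy"])  # the "energy" node
```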

OntologyManifest

Complete ontology model with taxonomy tree and navigation methods.

from bizsupply_sdk import OntologyManifest

Attributes:

| Attribute | Type | Description |
|---|---|---|
| ontology_id | str \| None | Unique identifier for the ontology |
| name | str | Human-readable name (e.g., "Invoice Processing Ontology") |
| description | str \| None | Description of the ontology's purpose |
| taxonomy | OntologyNode | Root node of the classification tree |
| created_at | datetime \| None | Creation timestamp |
| updated_at | datetime \| None | Last update timestamp |

Methods:

| Method | Returns | Description |
|---|---|---|
| get_all_label_paths() | list[list[str]] | All possible classification paths from root to leaf |
| find_fields_for_path(path) | list[OntologyField] | All fields that apply to a classification path (inherits from parent nodes) |
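The "inherits from parent nodes" behavior can be sketched with plain dicts standing in for the taxonomy (field names are illustrative; the real method returns OntologyField objects):

```python
# Fields declared at "contract" also apply to documents classified
# ["contract", "energy"]: each node along the path contributes its fields.
taxonomy = {
    "label": "contract",
    "fields": ["client_tax_id"],
    "children": [
        {"label": "energy", "fields": ["price_per_kwh"], "children": []},
    ],
}

def find_fields_for_path(node, path):
    if not path or node["label"] != path[0]:
        return []
    fields = list(node["fields"])
    if len(path) > 1:
        for child in node["children"]:
            fields += find_fields_for_path(child, path[1:])
    return fields

fields = find_fields_for_path(taxonomy, ["contract", "energy"])  # parent + child fields
```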

ExtendedDocument

Read-only document wrapper used in benchmarks. The Engine resolves physical slot names (number_1, string_2) to semantic field names (price_per_kwh, client_tax_id) before constructing this object. Benchmarks never see raw slot names.

from bizsupply_sdk import ExtendedDocument

Properties and Methods:

| Method / Property | Returns | Description |
|---|---|---|
| .get(field_name) | Any \| None | Get a field value by semantic name (e.g., doc.get("price_per_kwh")). Returns None if not found. |
| .aggregations | list[ExtendedDocument] | Related documents linked by MATCH_RULES (e.g., invoices linked to a contract). Each aggregation is also an ExtendedDocument. |
| .raw | dict[str, Any] | Escape hatch: access the underlying dict directly. Use when you need fields not covered by .get(). |

Example in a benchmark:

def score(self, document):
    # Access fields by semantic name
    contract_value = document.get("contract_value")

    # Access related documents (linked by MATCH_RULES)
    for invoice in document.aggregations:
        price = invoice.get("price_per_kwh")
        supplier = invoice.get("supplier_name")

    # Escape hatch for raw dict access
    raw_data = document.raw
    doc_id = raw_data.get("document_id")

Note: ExtendedDocument is NOT used in plugins. Plugins receive Document.


ScoredDocument

Pairs an ExtendedDocument with its calculated score. Used in the benchmark compute() method.

from bizsupply_sdk import ScoredDocument

Attributes:

| Attribute | Type | Description |
|---|---|---|
| document | ExtendedDocument | The scored document (with all fields and aggregations) |
| score | float | The float score calculated by score() |

Example in compute():

def compute(self, results):
    # results is list[ScoredDocument]
    # Simple: return the best (lowest) score
    return min(r.score for r in results)

Alternative: weight each score by document volume:

def compute(self, results):
    total_weight = sum(r.document.get("volume") or 1 for r in results)
    return sum(
        r.score * (r.document.get("volume") or 1) for r in results
    ) / total_weight

MatchRule

Defines how to link related document types for benchmark aggregations. Declared as MATCH_RULES class attribute on benchmarks.

from bizsupply_sdk import MatchRule

Attributes:

| Attribute | Type | Required | Description |
|---|---|---|---|
| name | str | Yes | Unique identifier for this rule (e.g., "contract_invoice_match") |
| left_group | list[str] | Yes | Labels for source documents (e.g., ["contract", "energy"]) |
| right_group | list[str] | Yes | Labels for target documents to link (e.g., ["invoice", "energy"]) |
| conditions | list[MatchCondition] | Yes | ALL conditions must match (AND logic) |
| description | str \| None | No | Human-readable description of what this rule links |

Example:

from bizsupply_sdk import MatchRule, MatchCondition

MATCH_RULES = [
    MatchRule(
        name="contract_invoice_match",
        left_group=["contract", "energy"],
        right_group=["invoice", "energy"],
        description="Link energy contracts to their invoices",
        conditions=[
            MatchCondition(
                left_field="client_tax_id",
                right_field="client_tax_id",
                match_type="==",
            ),
            MatchCondition(
                left_field="cpe_point_of_delivery",
                right_field="cpe_point_of_delivery",
                match_type="==",
            ),
        ],
    ),
]

MatchCondition

A single matching condition between two document fields. Used inside MatchRule.conditions.

from bizsupply_sdk import MatchCondition

Attributes:

| Attribute | Type | Required | Description |
|---|---|---|---|
| left_field | str | Yes | Semantic field name in the left (source) document |
| right_field | str | Yes | Semantic field name in the right (target) document |
| match_type | str | Yes | Comparison operator (see table below) |

Supported match_type operators:

| Operator | Description | Example |
|---|---|---|
| == | Equal | client_tax_id == client_tax_id |
| != | Not equal | status != "closed" |
| < | Less than | start_date < end_date |
| <= | Less than or equal | emission_date <= end_date |
| > | Greater than | amount > threshold |
| >= | Greater than or equal | end_date >= emission_date |
| contains | String contains | description contains keyword |
| starts_with | String starts with | code starts_with "ENE" |

Methods:

| Method | Returns | Description |
|---|---|---|
| evaluate(left_value, right_value) | bool | Evaluate this condition against two values. Returns False if either value is None. Useful for local testing. |
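The operator semantics, including the documented None short-circuit, can be reproduced with a small local stand-in (this is not the SDK implementation):

```python
import operator

# Local stand-in for MatchCondition.evaluate(), mirroring the operator
# table above; "contains" checks that the left value contains the right.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "contains": lambda left, right: str(right) in str(left),
    "starts_with": lambda left, right: str(left).startswith(str(right)),
}

def evaluate(match_type, left_value, right_value):
    if left_value is None or right_value is None:
        return False  # documented behavior: None never matches
    return OPS[match_type](left_value, right_value)

same_client = evaluate("==", "PT123456789", "PT123456789")    # True
energy_code = evaluate("starts_with", "ENE-2025-001", "ENE")  # True
missing = evaluate("<", None, "2025-01-01")                   # False
```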

Complete Import Reference

# Plugin base classes
from bizsupply_sdk import ClassificationPlugin, ExtractionPlugin, SourcePlugin

# Benchmark base class
from bizsupply_sdk import BaseBenchmark

# Core models (plugins)
from bizsupply_sdk import Document, ExtractionResult, DocumentInput
from bizsupply_sdk import BaseSourceState, DynamicCredential
from bizsupply_sdk import OntologyField, OntologyNode, OntologyManifest

# Benchmark models
from bizsupply_sdk import ExtendedDocument, ScoredDocument
from bizsupply_sdk import MatchRule, MatchCondition

# Protocol (for type hints)
from bizsupply_sdk import PluginServicesProtocol

Next Steps