Plugin Service API Reference

Complete reference for all service methods, helper functions, and data models available through the bizsupply-sdk.


Install the SDK

Everything below requires the SDK, so install it first:

pip install bizsupply-sdk

How It Works

The bizsupply Engine handles most operations automatically. Plugins and benchmarks focus on business logic:

| What the Engine Does | What You Do |
|---|---|
| Fetches document file data | Call prompt_llm() for AI processing |
| Resolves ontology fields from labels | Call get_prompt() to load prompt templates |
| Persists classifications and extracted data | Format fields with format_fields_for_prompt() |
| Manages source credentials and state | Log with self.logger |
| Pre-fetches aggregations for benchmarks | Return results (Engine persists them) |
| Resolves slot names to semantic field names | Access fields via document.get("field_name") |

Service Methods Quick Reference

| Method | Type | Available To | Returns |
|---|---|---|---|
| await self.prompt_llm(...) | Async | All plugins | dict \| list \| None |
| await self.get_prompt(prompt_id) | Async | All plugins | str |
| self.format_fields_for_prompt(fields) | Sync | Extraction plugins | str |
| self.logger | Property | All plugins + benchmarks | Logger |

prompt_llm()

result = await self.prompt_llm(
    prompt,
    file_data=None,
    mime_type=None,
    schema=None,
)

Sends a prompt to an LLM and returns a parsed JSON response. The platform handles provider selection, caching, and other optimizations automatically.

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | str | Yes | The prompt text to send to the LLM |
| file_data | bytes \| None | No | File bytes to include for multimodal/vision processing. Pass the file_data parameter from your plugin method. |
| mime_type | str \| None | No | MIME type of file_data (e.g., "application/pdf", "image/png", "text/plain"). Pass the mime_type parameter from your plugin method. |
| schema | type[BaseModel] \| dict \| None | No | Pydantic model class or dict schema for structured output. The LLM response will conform to this schema. |

Returns: dict | list | None

  • dict - Parsed JSON object from LLM response
  • list - Parsed JSON array from LLM response
  • None - If the LLM call fails, the response is empty, or JSON parsing fails

Raises: RuntimeError if services are not initialized.

Example - Basic prompt:

result = await self.prompt_llm(
    prompt="Is this an invoice? Return JSON: {\"is_invoice\": true/false}"
)
if result:
    is_invoice = result.get("is_invoice")

Example - With file attachment (multimodal):

result = await self.prompt_llm(
    prompt="Extract invoice data from this document.",
    file_data=file_data,
    mime_type=mime_type,
)

Example - With Pydantic schema for structured output:

from pydantic import BaseModel

class InvoiceData(BaseModel):
    vendor_name: str
    total: float
    date: str

result = await self.prompt_llm(
    prompt="Extract vendor, total, and date from this invoice.",
    file_data=file_data,
    mime_type=mime_type,
    schema=InvoiceData,
)
# result is a dict matching the schema: {"vendor_name": "...", "total": ..., "date": "..."}

get_prompt()

template = await self.get_prompt(prompt_id)

Retrieves a stored prompt template by ID. Prompt templates are created via the REST API and referenced by their UUID.

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt_id | str | Yes | UUID of the prompt template |

Returns: str - The prompt template content as a string.

Raises: RuntimeError if services are not initialized; Exception if the prompt is not found.

Example:

prompt_id = configs.get("classification_prompt_id")
template = await self.get_prompt(prompt_id)

prompt = template.format(
    labels=available_labels,
    document_content="[See attached document file]",
)

result = await self.prompt_llm(prompt=prompt, file_data=file_data, mime_type=mime_type)

format_fields_for_prompt()

fields_json = self.format_fields_for_prompt(fields)

Formats a list of OntologyField objects as a JSON string for inclusion in LLM prompts. This is a sync method (no await needed).

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| fields | list[OntologyField] | Yes | List of OntologyField objects to format |

Returns: str - Pretty-printed JSON string with field definitions.

Output format:

[
  {"name": "invoice_total", "type": "number", "description": "Total amount due"},
  {"name": "vendor_name", "type": "string", "description": "Name of the vendor"}
]

Example:

async def extract(self, document, file_data, mime_type, fields, configs):
    fields_json = self.format_fields_for_prompt(fields)

    result = await self.prompt_llm(
        prompt=f"Extract these fields:\n{fields_json}",
        file_data=file_data,
        mime_type=mime_type,
    )

    return ExtractionResult(data=result or {})

self.logger

Logger instance automatically available in all plugins and benchmarks. Creates a hierarchical logger name: bizsupply_sdk.plugins.{type}.{class_name}.

Methods:

| Method | Use For |
|---|---|
| self.logger.debug(msg) | Detailed debugging information |
| self.logger.info(msg) | Progress updates, key decisions |
| self.logger.warning(msg) | Non-critical issues that may need attention |
| self.logger.error(msg) | Errors (continues execution) |

Important: Use self.logger, not print() or logging.getLogger().
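Because the logger name is hierarchical, standard logging configuration applied to the bizsupply_sdk prefix cascades down to every plugin logger. A minimal standalone sketch of the naming scheme (the class name MyExtractor is hypothetical):

```python
import logging

# Mirrors the naming scheme bizsupply_sdk.plugins.{type}.{class_name};
# "MyExtractor" is a hypothetical plugin class name.
logger = logging.getLogger("bizsupply_sdk.plugins.extraction.MyExtractor")

# Setting a level on the parent prefix cascades to all plugin loggers.
logging.getLogger("bizsupply_sdk").setLevel(logging.INFO)

hierarchy = logger.name.split(".")      # four levels, root prefix first
effective = logger.getEffectiveLevel()  # inherited from the parent logger
```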


What the Engine Handles

The Engine manages all persistence and lifecycle operations automatically.

For Classification Plugins

| Operation | How It Works |
|---|---|
| Fetch document content | Engine pre-fetches file_data and mime_type once, passes to every classify() call |
| Traverse ontology | Engine walks the tree, calling classify() at each level with the correct available_labels |
| Persist labels | Engine builds the full classification path from your responses and saves it |
| Handle suggestions | When you return None or a label not in the ontology, Engine triggers the suggestion workflow |
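From the plugin's side, this division of labor means classify() only has to pick one label per level. A self-contained sketch: the SDK base class is replaced by a stub, prompt_llm is faked, and the exact classify() signature is an assumption mirroring the extract() signature shown later.

```python
import asyncio

class StubClassifier:
    """Stand-in for a ClassificationPlugin; prompt_llm is faked so the
    sketch runs without the SDK or an LLM provider."""

    async def prompt_llm(self, prompt, file_data=None, mime_type=None, schema=None):
        return {"label": "contract"}  # a real plugin gets this from the LLM

    async def classify(self, document, file_data, mime_type, available_labels, configs):
        result = await self.prompt_llm(
            prompt=f"Pick exactly one label from {available_labels}."
        )
        label = result.get("label") if result else None
        # A label from available_labels is persisted by the Engine; None
        # (or an unknown label) triggers the suggestion workflow instead.
        return label if label in available_labels else None

label = asyncio.run(
    StubClassifier().classify({}, b"%PDF", "application/pdf", ["contract", "invoice"], {})
)
```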

For Extraction Plugins

| Operation | How It Works |
|---|---|
| Skip unclassified docs | Engine skips documents without labels |
| Resolve fields | Engine finds ontology fields matching document.labels and injects them as fields |
| Fetch document content | Engine pre-fetches file_data and mime_type |
| Persist extracted data | Engine saves your ExtractionResult.data to the database |

For Source Plugins

| Operation | How It Works |
|---|---|
| Fetch credentials | Engine loads credentials from secure storage and injects as DynamicCredential |
| Load state | Engine loads and deserializes your typed state model from the database |
| Create documents | For each DocumentInput you yield, Engine creates the document |
| Save state | Engine auto-saves your state after each document (crash-resilient) |

For Benchmarks

| Operation | How It Works |
|---|---|
| Query documents | Engine queries documents matching target_labels |
| Resolve fields | Engine maps physical slot names (number_1, string_2) to semantic names (price_per_kwh) |
| Build ExtendedDocument | Engine constructs ExtendedDocument with resolved fields |
| Pre-fetch aggregations | Engine links related documents using your MATCH_RULES and attaches as .aggregations |
| Run scoring loop | Engine calls score() for each document, filters None results |
| Persist scores | Engine builds score records and saves to database |
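The scoring loop can be approximated locally with plain dicts standing in for ExtendedDocument (field names like price_per_kwh are illustrative):

```python
# Plain dicts stand in for ExtendedDocument; score() and compute()
# follow the shapes described above.
def score(document):
    price = document.get("price_per_kwh")
    return None if price is None else float(price)

def compute(scores):
    return min(scores)  # e.g., "best price" as the benchmark result

documents = [
    {"price_per_kwh": 0.18},
    {"price_per_kwh": None},   # score() yields None; the Engine filters these
    {"price_per_kwh": 0.15},
]

scores = [s for s in (score(d) for d in documents) if s is not None]
result = compute(scores)  # 0.15
```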

Data Models

Document

The primary model passed to classification and extraction plugins.

from bizsupply_sdk import Document

Attributes:

| Attribute | Type | Description |
|---|---|---|
| document_id | str | Unique identifier for the document (UUID) |
| original_filename | str | Original filename with extension (e.g., "invoice.pdf", "contract.docx") |
| labels | list[str] \| None | Hierarchical classification labels (e.g., ["contract", "energy", "residential"]). None or empty list if not yet classified. |
| data | dict[str, Any] \| None | Extracted field values (e.g., {"invoice_total": 1500.00}). Empty dict if no data extracted yet. |
| metadata | dict[str, Any] \| None | Custom metadata attached to the document (e.g., source info, external IDs). |
| created_at | datetime \| None | Timestamp when the document was created in the platform |
| updated_at | datetime \| None | Timestamp when the document was last modified |

Usage in plugins:

# In classify() or extract()
self.logger.info(f"Processing {document.document_id}: {document.original_filename}")
self.logger.info(f"Labels: {document.labels}")
self.logger.info(f"Existing data: {document.data}")
self.logger.info(f"Metadata: {document.metadata}")

ExtractionResult

Return type for extraction plugins' extract() method.

from bizsupply_sdk import ExtractionResult

Attributes:

| Attribute | Type | Required | Description |
|---|---|---|---|
| data | dict[str, Any] | Yes | Dictionary of field name -> extracted value pairs. Keys should match OntologyField.name values from the fields parameter. |
| llm_fields | list[str] \| None | No | List of field names that were generated by the LLM (for provenance tracking). Default: None. |
| document_type | str \| None | No | Document type identifier for silver tier mapping. If not provided, inferred from the document's labels. |

Example:

return ExtractionResult(
    data={
        "invoice_total": 1500.00,
        "vendor_name": "ACME Corp",
        "invoice_date": "2025-01-15",
    },
    llm_fields=["invoice_total", "vendor_name", "invoice_date"],
    document_type="invoice",
)

DocumentInput

Yielded by source plugins from fetch(). Each DocumentInput becomes a new document in the platform.

from bizsupply_sdk import DocumentInput

Attributes:

| Attribute | Type | Required | Description |
|---|---|---|---|
| file_data | bytes | Yes | Raw file content as bytes. This is the actual file (PDF, image, text, etc.). |
| filename | str | Yes | Original filename including extension (e.g., "invoice_123.pdf"). The extension is used for MIME type detection. |
| mime_type | str \| None | No | MIME type of the file (e.g., "application/pdf"). Auto-detected from content if not provided. |
| metadata | dict[str, Any] \| None | No | Optional metadata to attach to the document (e.g., {"source": "gmail", "sender": "[email protected]"}). |

Example:

yield DocumentInput(
    file_data=pdf_bytes,
    filename="invoice_123.pdf",
    mime_type="application/pdf",
    metadata={
        "source": "gmail",
        "sender": "[email protected]",
        "subject": "Invoice #123",
        "received_at": "2025-01-15T10:30:00Z",
    },
)

BaseSourceState

Base class for source plugin state models. Subclass to define your sync state.

from bizsupply_sdk import BaseSourceState

BaseSourceState is an empty Pydantic BaseModel that serves as a marker base class. Define your own fields:

class MySourceState(BaseSourceState):
    cursor: str | None = None
    last_sync_time: str | None = None
    page: int = 0
    processed_ids: list[str] = []

How state works:

  1. Engine loads your state from the database (deserialized as your typed model)
  2. Engine passes it to fetch() as the state parameter
  3. You mutate fields directly (state.cursor = "abc")
  4. Engine auto-saves state after each yielded DocumentInput
  5. If plugin crashes, Engine resumes from the last saved state

Example:

async def fetch(self, credentials, state, configs):
    self.logger.info(f"Resuming from cursor: {state.cursor}")

    for item in items:
        state.cursor = item.id      # update state BEFORE yielding
        state.page += 1
        yield DocumentInput(...)    # Engine saves state after this yield

DynamicCredential

Credential object injected into source plugins by the Engine. Provides multiple access patterns for credential fields.

from bizsupply_sdk import DynamicCredential

All credential values are stored as strings. Convert to other types in your plugin as needed.

Methods:

| Method | Signature | Description |
|---|---|---|
| Attribute access | credentials.api_key | Get field value. Raises AttributeError if not found. |
| .get() | credentials.get(name, default=None) | Get field value or default. Returns None if not found and no default. |
| .validate_required_fields() | credentials.validate_required_fields(["api_key", "api_url"]) | Raises ValueError if any listed fields are missing from credentials. |
| .has_field() | credentials.has_field("api_key") | Returns True if field exists and has a truthy (non-empty) value. |
| in operator | "api_key" in credentials | Returns True if field name exists (even if value is empty). |
| .keys() | credentials.keys() | Returns list[str] of all credential field names. |
| .to_dict() | credentials.to_dict() | Returns dict[str, Any] of all fields. Caution: may expose secrets in logs. |

Example:

async def fetch(self, credentials, state, configs):
    # Validate required fields early (raises ValueError if missing)
    credentials.validate_required_fields(["api_key", "api_url"])

    # Attribute access (raises AttributeError if missing)
    api_key = credentials.api_key
    api_url = credentials.api_url

    # .get() with default (returns default if missing)
    timeout = int(credentials.get("timeout_seconds", "30"))

    # Check if optional field exists and is non-empty
    if credentials.has_field("refresh_token"):
        # Handle OAuth refresh
        ...

    # List all available fields
    self.logger.info(f"Available credential fields: {credentials.keys()}")

OntologyField

Field definition injected into extraction plugins. Each field describes one piece of data to extract.

from bizsupply_sdk import OntologyField

Attributes:

| Attribute | Type | Description |
|---|---|---|
| name | str | Field name, used as the key in ExtractionResult.data (e.g., "invoice_total", "vendor_name") |
| dtype | str | Data type: "string", "int", "float", "date", or other custom types |
| description | str \| None | Human-readable description of what this field contains (e.g., "Total monetary value of the invoice") |

Example:

# In extract(), fields are injected by the Engine
for field in fields:
    self.logger.info(f"Field: {field.name} ({field.dtype}): {field.description}")

# Use format_fields_for_prompt() to convert to JSON for prompts
fields_json = self.format_fields_for_prompt(fields)

OntologyNode

A node in the ontology classification tree. Contains a label, optional fields, and optional child nodes.

from bizsupply_sdk import OntologyNode

Attributes:

| Attribute | Type | Description |
|---|---|---|
| label | str | The classification label for this node (e.g., "contract", "energy") |
| fields | list[OntologyField] | Fields to extract for documents at this classification level. Default: []. |
| children | list[OntologyNode] | Child nodes (sub-classifications). Default: []. |

Methods:

| Method | Returns | Description |
|---|---|---|
| get_all_fields() | list[OntologyField] | All fields in this subtree (this node + all descendants) |
| find_node_by_path(path) | OntologyNode \| None | Find a node by following a list of labels (e.g., ["contract", "energy"]) |
| get_leaf_labels() | list[list[str]] | All paths from this node to leaf nodes |
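The traversal semantics can be illustrated with a minimal local stand-in. This is not the SDK class; in particular, whether find_node_by_path expects the node's own label as the first path element is an assumption here.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Minimal local stand-in mirroring OntologyNode's traversal methods."""
    label: str
    children: list = field(default_factory=list)

    def find_node_by_path(self, path):
        # Assumes the path starts with this node's own label.
        if not path or path[0] != self.label:
            return None
        node = self
        for label in path[1:]:
            node = next((c for c in node.children if c.label == label), None)
            if node is None:
                return None
        return node

    def get_leaf_labels(self):
        if not self.children:
            return [[self.label]]
        return [[self.label] + tail
                for child in self.children
                for tail in child.get_leaf_labels()]

root = Node("contract", children=[Node("energy"), Node("telecom")])
paths = root.get_leaf_labels()                         # all root-to-leaf paths
node = root.find_node_by_path(["contract", "energy"])  # the "energy" node
```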

OntologyManifest

Complete ontology model with taxonomy tree and navigation methods.

from bizsupply_sdk import OntologyManifest

Attributes:

| Attribute | Type | Description |
|---|---|---|
| ontology_id | str \| None | Unique identifier for the ontology |
| name | str | Human-readable name (e.g., "Invoice Processing Ontology") |
| description | str \| None | Description of the ontology's purpose |
| taxonomy | OntologyNode | Root node of the classification tree |
| created_at | datetime \| None | Creation timestamp |
| updated_at | datetime \| None | Last update timestamp |

Methods:

| Method | Returns | Description |
|---|---|---|
| get_all_label_paths() | list[list[str]] | All possible classification paths from root to leaf |
| find_fields_for_path(path) | list[OntologyField] | All fields that apply to a classification path (inherits from parent nodes) |
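The "inherits from parent nodes" behavior can be sketched with plain dicts standing in for the taxonomy (field names are illustrative; the real method returns OntologyField objects):

```python
# Fields declared at "contract" also apply to documents classified
# ["contract", "energy"]: each node along the path contributes its fields.
taxonomy = {
    "label": "contract",
    "fields": ["client_tax_id"],
    "children": [
        {"label": "energy", "fields": ["price_per_kwh"], "children": []},
    ],
}

def find_fields_for_path(node, path):
    if not path or node["label"] != path[0]:
        return []
    fields = list(node["fields"])
    if len(path) > 1:
        for child in node["children"]:
            fields += find_fields_for_path(child, path[1:])
    return fields

fields = find_fields_for_path(taxonomy, ["contract", "energy"])  # parent + child fields
```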

ExtendedDocument

Read-only document wrapper used in benchmarks. The Engine resolves physical slot names (number_1, string_2) to semantic field names (price_per_kwh, client_tax_id) before constructing this object. Benchmarks never see raw slot names.

from bizsupply_sdk import ExtendedDocument

Properties and Methods:

| Method / Property | Returns | Description |
|---|---|---|
| .get(field_name) | Any \| None | Get a field value by semantic name (e.g., doc.get("price_per_kwh")). Returns None if not found. |
| .aggregations | list[ExtendedDocument] | Related documents linked by MATCH_RULES (e.g., invoices linked to a contract). Each aggregation is also an ExtendedDocument. |
| .raw | dict[str, Any] | Escape hatch: access the underlying dict directly. Use when you need fields not covered by .get(). |

Example in a benchmark:

def score(self, document):
    # Access fields by semantic name
    contract_value = document.get("contract_value")

    # Access related documents (linked by MATCH_RULES)
    for invoice in document.aggregations:
        price = invoice.get("price_per_kwh")
        supplier = invoice.get("supplier_name")

    # Escape hatch for raw dict access
    raw_data = document.raw
    doc_id = raw_data.get("document_id")

Note: ExtendedDocument is NOT used in plugins. Plugins receive Document.


ScoredDocument

Pairs an ExtendedDocument with its calculated score. Used in the benchmark compute() method.

from bizsupply_sdk import ScoredDocument

Attributes:

| Attribute | Type | Description |
|---|---|---|
| document | ExtendedDocument | The scored document (with all fields and aggregations) |
| score | float | The float score calculated by score() |

Example in compute():

def compute(self, results):
    # results is list[ScoredDocument]
    # Simple: return the best (lowest) score
    return min(r.score for r in results)

Alternative: weight each score by document volume:

def compute(self, results):
    total_weight = sum(r.document.get("volume") or 1 for r in results)
    return sum(
        r.score * (r.document.get("volume") or 1) for r in results
    ) / total_weight

MatchRule

Defines how to link related document types for benchmark aggregations. Declared as MATCH_RULES class attribute on benchmarks.

from bizsupply_sdk import MatchRule

Attributes:

| Attribute | Type | Required | Description |
|---|---|---|---|
| name | str | Yes | Unique identifier for this rule (e.g., "contract_invoice_match") |
| left_group | list[str] | Yes | Labels for source documents (e.g., ["contract", "energy"]) |
| right_group | list[str] | Yes | Labels for target documents to link (e.g., ["invoice", "energy"]) |
| conditions | list[MatchCondition] | Yes | ALL conditions must match (AND logic) |
| description | str \| None | No | Human-readable description of what this rule links |

Example:

from bizsupply_sdk import MatchRule, MatchCondition

MATCH_RULES = [
    MatchRule(
        name="contract_invoice_match",
        left_group=["contract", "energy"],
        right_group=["invoice", "energy"],
        description="Link energy contracts to their invoices",
        conditions=[
            MatchCondition(
                left_field="client_tax_id",
                right_field="client_tax_id",
                match_type="==",
            ),
            MatchCondition(
                left_field="cpe_point_of_delivery",
                right_field="cpe_point_of_delivery",
                match_type="==",
            ),
        ],
    ),
]

MatchCondition

A single matching condition between two document fields. Used inside MatchRule.conditions.

from bizsupply_sdk import MatchCondition

Attributes:

| Attribute | Type | Required | Description |
|---|---|---|---|
| left_field | str | Yes | Semantic field name in the left (source) document |
| right_field | str | Yes | Semantic field name in the right (target) document |
| match_type | str | Yes | Comparison operator (see table below) |

Supported match_type operators:

| Operator | Description | Example |
|---|---|---|
| == | Equal | client_tax_id == client_tax_id |
| != | Not equal | status != "closed" |
| < | Less than | start_date < end_date |
| <= | Less than or equal | emission_date <= end_date |
| > | Greater than | amount > threshold |
| >= | Greater than or equal | end_date >= emission_date |
| contains | String contains | description contains keyword |
| starts_with | String starts with | code starts_with "ENE" |

Methods:

| Method | Returns | Description |
|---|---|---|
| evaluate(left_value, right_value) | bool | Evaluate this condition against two values. Returns False if either value is None. Useful for local testing. |
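The operator semantics, including the documented None short-circuit, can be reproduced with a small local stand-in (this is not the SDK implementation):

```python
import operator

# Local stand-in for MatchCondition.evaluate(), mirroring the operator
# table above; "contains" checks that the left value contains the right.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "contains": lambda left, right: str(right) in str(left),
    "starts_with": lambda left, right: str(left).startswith(str(right)),
}

def evaluate(match_type, left_value, right_value):
    if left_value is None or right_value is None:
        return False  # documented behavior: None never matches
    return OPS[match_type](left_value, right_value)

same_client = evaluate("==", "PT123456789", "PT123456789")    # True
energy_code = evaluate("starts_with", "ENE-2025-001", "ENE")  # True
missing = evaluate("<", None, "2025-01-01")                   # False
```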

Complete Import Reference

# Plugin base classes
from bizsupply_sdk import ClassificationPlugin, ExtractionPlugin, SourcePlugin

# Benchmark base class
from bizsupply_sdk import BaseBenchmark

# Core models (plugins)
from bizsupply_sdk import Document, ExtractionResult, DocumentInput
from bizsupply_sdk import BaseSourceState, DynamicCredential
from bizsupply_sdk import OntologyField, OntologyNode, OntologyManifest

# Benchmark models
from bizsupply_sdk import ExtendedDocument, ScoredDocument
from bizsupply_sdk import MatchRule, MatchCondition

# Protocol (for type hints)
from bizsupply_sdk import PluginServicesProtocol

Next Steps