Plugin Service API Reference
Complete reference for all service methods, helper functions, and data models available through the bizsupply-sdk.
Install the SDK
Everything below requires the SDK, so install it first:

pip install bizsupply-sdk

How It Works
The bizsupply Engine handles most operations automatically. Plugins and benchmarks focus on business logic:
| What the Engine Does | What You Do |
|---|---|
| Fetches document file data | Call prompt_llm() for AI processing |
| Resolves ontology fields from labels | Call get_prompt() to load prompt templates |
| Persists classifications and extracted data | Format fields with format_fields_for_prompt() |
| Manages source credentials and state | Log with self.logger |
| Pre-fetches aggregations for benchmarks | Return results (Engine persists them) |
| Resolves slot names to semantic field names | Access fields via document.get("field_name") |
Service Methods Quick Reference
| Method | Type | Available To | Returns |
|---|---|---|---|
| await self.prompt_llm(...) | Async | All plugins | dict \| list \| None |
| await self.get_prompt(prompt_id) | Async | All plugins | str |
| self.format_fields_for_prompt(fields) | Sync | Extraction plugins | str |
| self.logger | Property | All plugins + benchmarks | Logger |
prompt_llm()
result = await self.prompt_llm(
    prompt,
    file_data=None,
    mime_type=None,
    schema=None,
)

Sends a prompt to an LLM and returns a parsed JSON response. The platform handles provider selection, caching, and other optimizations automatically.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | str | Yes | The prompt text to send to the LLM |
| file_data | bytes \| None | No | File bytes to include for multimodal/vision processing. Pass the file_data parameter from your plugin method. |
| mime_type | str \| None | No | MIME type of file_data (e.g., "application/pdf", "image/png", "text/plain"). Pass the mime_type parameter from your plugin method. |
| schema | type[BaseModel] \| dict \| None | No | Pydantic model class or dict schema for structured output. The LLM response will conform to this schema. |
Returns: dict | list | None
- dict - Parsed JSON object from the LLM response
- list - Parsed JSON array from the LLM response
- None - If the LLM fails, returns an empty response, or JSON parsing fails
Raises: RuntimeError if services are not initialized.
Example - Basic prompt:
result = await self.prompt_llm(
    prompt="Is this an invoice? Return JSON: {\"is_invoice\": true/false}"
)
if result:
    is_invoice = result.get("is_invoice")

Example - With file attachment (multimodal):
result = await self.prompt_llm(
    prompt="Extract invoice data from this document.",
    file_data=file_data,
    mime_type=mime_type,
)

Example - With Pydantic schema for structured output:
from pydantic import BaseModel

class InvoiceData(BaseModel):
    vendor_name: str
    total: float
    date: str

result = await self.prompt_llm(
    prompt="Extract vendor, total, and date from this invoice.",
    file_data=file_data,
    mime_type=mime_type,
    schema=InvoiceData,
)
# result is a dict matching the schema: {"vendor_name": "...", "total": ..., "date": "..."}

get_prompt()
template = await self.get_prompt(prompt_id)

Retrieves a stored prompt template by ID. Prompt templates are created via the REST API and referenced by their UUID.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt_id | str | Yes | UUID of the prompt template |
Returns: str - The prompt template content as a string.
Raises: RuntimeError if services are not initialized; Exception if the prompt is not found.
Example:
prompt_id = configs.get("classification_prompt_id")
template = await self.get_prompt(prompt_id)
prompt = template.format(
    labels=available_labels,
    document_content="[See attached document file]",
)
result = await self.prompt_llm(prompt=prompt, file_data=file_data, mime_type=mime_type)

format_fields_for_prompt()
fields_json = self.format_fields_for_prompt(fields)

Formats a list of OntologyField objects as a JSON string for inclusion in LLM prompts. This is a sync method (no await needed).
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
fields | list[OntologyField] | Yes | List of OntologyField objects to format |
Returns: str - Pretty-printed JSON string with field definitions.
Output format:
[
  {"name": "invoice_total", "type": "number", "description": "Total amount due"},
  {"name": "vendor_name", "type": "string", "description": "Name of the vendor"}
]

Example:
async def extract(self, document, file_data, mime_type, fields, configs):
    fields_json = self.format_fields_for_prompt(fields)
    result = await self.prompt_llm(
        prompt=f"Extract these fields:\n{fields_json}",
        file_data=file_data,
        mime_type=mime_type,
    )
    return ExtractionResult(data=result or {})

self.logger
Logger instance automatically available in all plugins and benchmarks. Creates a hierarchical logger name: bizsupply_sdk.plugins.{type}.{class_name}.
Methods:
| Method | Use For |
|---|---|
self.logger.debug(msg) | Detailed debugging information |
self.logger.info(msg) | Progress updates, key decisions |
self.logger.warning(msg) | Non-critical issues that may need attention |
self.logger.error(msg) | Errors (continues execution) |
Important: Use self.logger, not print() or logging.getLogger().
What the Engine Handles
The Engine manages all persistence and lifecycle operations automatically.
For Classification Plugins
| Operation | How It Works |
|---|---|
| Fetch document content | Engine pre-fetches file_data and mime_type once, passes to every classify() call |
| Traverse ontology | Engine walks the tree, calling classify() at each level with the correct available_labels |
| Persist labels | Engine builds the full classification path from your responses and saves it |
| Handle suggestions | When you return None or a label not in the ontology, Engine triggers the suggestion workflow |
For Extraction Plugins
| Operation | How It Works |
|---|---|
| Skip unclassified docs | Engine skips documents without labels |
| Resolve fields | Engine finds ontology fields matching document.labels and injects them as fields |
| Fetch document content | Engine pre-fetches file_data and mime_type |
| Persist extracted data | Engine saves your ExtractionResult.data to the database |
For Source Plugins
| Operation | How It Works |
|---|---|
| Fetch credentials | Engine loads credentials from secure storage and injects as DynamicCredential |
| Load state | Engine loads and deserializes your typed state model from the database |
| Create documents | For each DocumentInput you yield, Engine creates the document |
| Save state | Engine auto-saves your state after each document (crash-resilient) |
For Benchmarks
| Operation | How It Works |
|---|---|
| Query documents | Engine queries documents matching target_labels |
| Resolve fields | Engine maps physical slot names (number_1, string_2) to semantic names (price_per_kwh) |
| Build ExtendedDocument | Engine constructs ExtendedDocument with resolved fields |
| Pre-fetch aggregations | Engine links related documents using your MATCH_RULES and attaches as .aggregations |
| Run scoring loop | Engine calls score() for each document, filters None results |
| Persist scores | Engine builds score records and saves to database |
Data Models
Document
The primary model passed to classification and extraction plugins.
from bizsupply_sdk import Document

Attributes:
| Attribute | Type | Description |
|---|---|---|
| document_id | str | Unique identifier for the document (UUID) |
| original_filename | str | Original filename with extension (e.g., "invoice.pdf", "contract.docx") |
| labels | list[str] \| None | Hierarchical classification labels (e.g., ["contract", "energy", "residential"]). None or empty list if not yet classified. |
| data | dict[str, Any] \| None | Extracted field values (e.g., {"invoice_total": 1500.00}). Empty dict if no data extracted yet. |
| metadata | dict[str, Any] \| None | Custom metadata attached to the document (e.g., source info, external IDs). |
| created_at | datetime \| None | Timestamp when the document was created in the platform |
| updated_at | datetime \| None | Timestamp when the document was last modified |
Usage in plugins:
# In classify() or extract()
self.logger.info(f"Processing {document.document_id}: {document.original_filename}")
self.logger.info(f"Labels: {document.labels}")
self.logger.info(f"Existing data: {document.data}")
self.logger.info(f"Metadata: {document.metadata}")

ExtractionResult
Return type for extraction plugins' extract() method.
from bizsupply_sdk import ExtractionResult

Attributes:
| Attribute | Type | Required | Description |
|---|---|---|---|
| data | dict[str, Any] | Yes | Dictionary of field name -> extracted value pairs. Keys should match OntologyField.name values from the fields parameter. |
| llm_fields | list[str] \| None | No | List of field names that were generated by the LLM (for provenance tracking). Default: None. |
| document_type | str \| None | No | Document type identifier for silver tier mapping. If not provided, inferred from the document's labels. |
Example:
return ExtractionResult(
    data={
        "invoice_total": 1500.00,
        "vendor_name": "ACME Corp",
        "invoice_date": "2025-01-15",
    },
    llm_fields=["invoice_total", "vendor_name", "invoice_date"],
    document_type="invoice",
)

DocumentInput
Yielded by source plugins from fetch(). Each DocumentInput becomes a new document in the platform.
from bizsupply_sdk import DocumentInput

Attributes:
| Attribute | Type | Required | Description |
|---|---|---|---|
| file_data | bytes | Yes | Raw file content as bytes. This is the actual file (PDF, image, text, etc.). |
| filename | str | Yes | Original filename including extension (e.g., "invoice_123.pdf"). The extension is used for MIME type detection. |
| mime_type | str \| None | No | MIME type of the file (e.g., "application/pdf"). Auto-detected from content if not provided. |
| metadata | dict[str, Any] \| None | No | Optional metadata to attach to the document (e.g., {"source": "gmail", "sender": "[email protected]"}). |
Example:
yield DocumentInput(
    file_data=pdf_bytes,
    filename="invoice_123.pdf",
    mime_type="application/pdf",
    metadata={
        "source": "gmail",
        "sender": "[email protected]",
        "subject": "Invoice #123",
        "received_at": "2025-01-15T10:30:00Z",
    },
)

BaseSourceState
Base class for source plugin state models. Subclass to define your sync state.
from bizsupply_sdk import BaseSourceState

BaseSourceState is an empty Pydantic BaseModel that serves as a marker base class. Define your own fields:
class MySourceState(BaseSourceState):
    cursor: str | None = None
    last_sync_time: str | None = None
    page: int = 0
    processed_ids: list[str] = []

How state works:
- Engine loads your state from the database (deserialized as your typed model)
- Engine passes it to fetch() as the state parameter
- You mutate fields directly (state.cursor = "abc")
- Engine auto-saves state after each yielded DocumentInput
- If the plugin crashes, Engine resumes from the last saved state
Example:
async def fetch(self, credentials, state, configs):
    self.logger.info(f"Resuming from cursor: {state.cursor}")
    for item in items:
        yield DocumentInput(...)
        state.cursor = item.id  # Engine saves after this yield
        state.page += 1

DynamicCredential
Credential object injected into source plugins by the Engine. Provides multiple access patterns for credential fields.
from bizsupply_sdk import DynamicCredential

All credential values are stored as strings. Convert to other types in your plugin as needed.
Methods:
| Method | Signature | Description |
|---|---|---|
| Attribute access | credentials.api_key | Get field value. Raises AttributeError if not found. |
| .get() | credentials.get(name, default=None) | Get field value or default. Returns None if not found and no default. |
| .validate_required_fields() | credentials.validate_required_fields(["api_key", "api_url"]) | Raises ValueError if any listed fields are missing from credentials. |
| .has_field() | credentials.has_field("api_key") | Returns True if field exists and has a truthy (non-empty) value. |
| in operator | "api_key" in credentials | Returns True if field name exists (even if value is empty). |
| .keys() | credentials.keys() | Returns list[str] of all credential field names. |
| .to_dict() | credentials.to_dict() | Returns dict[str, Any] of all fields. Caution: may expose secrets in logs. |
Example:
async def fetch(self, credentials, state, configs):
    # Validate required fields early (raises ValueError if missing)
    credentials.validate_required_fields(["api_key", "api_url"])

    # Attribute access (raises AttributeError if missing)
    api_key = credentials.api_key
    api_url = credentials.api_url

    # .get() with default (returns default if missing)
    timeout = int(credentials.get("timeout_seconds", "30"))

    # Check if optional field exists and is non-empty
    if credentials.has_field("refresh_token"):
        # Handle OAuth refresh
        ...

    # List all available fields
    self.logger.info(f"Available credential fields: {credentials.keys()}")

OntologyField
Field definition injected into extraction plugins. Each field describes one piece of data to extract.
from bizsupply_sdk import OntologyField

Attributes:
| Attribute | Type | Description |
|---|---|---|
| name | str | Field name, used as the key in ExtractionResult.data (e.g., "invoice_total", "vendor_name") |
| dtype | str | Data type: "string", "int", "float", "date", or other custom types |
| description | str \| None | Human-readable description of what this field contains (e.g., "Total monetary value of the invoice") |
Example:
# In extract(), fields are injected by the Engine
for field in fields:
    self.logger.info(f"Field: {field.name} ({field.dtype}): {field.description}")

# Use format_fields_for_prompt() to convert to JSON for prompts
fields_json = self.format_fields_for_prompt(fields)

OntologyNode
A node in the ontology classification tree. Contains a label, optional fields, and optional child nodes.
from bizsupply_sdk import OntologyNode

Attributes:
| Attribute | Type | Description |
|---|---|---|
label | str | The classification label for this node (e.g., "contract", "energy") |
fields | list[OntologyField] | Fields to extract for documents at this classification level. Default: []. |
children | list[OntologyNode] | Child nodes (sub-classifications). Default: []. |
Methods:
| Method | Returns | Description |
|---|---|---|
| get_all_fields() | list[OntologyField] | All fields in this subtree (this node + all descendants) |
| find_node_by_path(path) | OntologyNode \| None | Find a node by following a list of labels (e.g., ["contract", "energy"]) |
| get_leaf_labels() | list[list[str]] | All paths from this node to leaf nodes |
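The semantics of these traversal methods can be illustrated with a minimal local stand-in. This is not the SDK implementation (import OntologyNode from bizsupply_sdk in real code), and the path convention assumed here — that find_node_by_path includes the starting node's own label — is an assumption based on the example path above:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Minimal stand-in mirroring the documented OntologyNode API."""
    label: str
    fields: list[str] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)

    def get_all_fields(self) -> list[str]:
        # This node's fields plus those of every descendant.
        collected = list(self.fields)
        for child in self.children:
            collected.extend(child.get_all_fields())
        return collected

    def find_node_by_path(self, path: list[str]) -> "Node | None":
        # Follow a list of labels, starting with this node's own label.
        if not path or path[0] != self.label:
            return None
        node = self
        for label in path[1:]:
            node = next((c for c in node.children if c.label == label), None)
            if node is None:
                return None
        return node

    def get_leaf_labels(self) -> list[list[str]]:
        # All label paths from this node down to leaf nodes.
        if not self.children:
            return [[self.label]]
        return [[self.label] + p for c in self.children for p in c.get_leaf_labels()]

tree = Node("contract", fields=["client_tax_id"], children=[
    Node("energy", fields=["price_per_kwh"]),
    Node("telecom"),
])
print(tree.get_leaf_labels())   # [['contract', 'energy'], ['contract', 'telecom']]
print(tree.get_all_fields())    # ['client_tax_id', 'price_per_kwh']
```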
OntologyManifest
Complete ontology model with taxonomy tree and navigation methods.
from bizsupply_sdk import OntologyManifest

Attributes:
| Attribute | Type | Description |
|---|---|---|
| ontology_id | str \| None | Unique identifier for the ontology |
| name | str | Human-readable name (e.g., "Invoice Processing Ontology") |
| description | str \| None | Description of the ontology's purpose |
| taxonomy | OntologyNode | Root node of the classification tree |
| created_at | datetime \| None | Creation timestamp |
| updated_at | datetime \| None | Last update timestamp |
Methods:
| Method | Returns | Description |
|---|---|---|
get_all_label_paths() | list[list[str]] | All possible classification paths from root to leaf |
find_fields_for_path(path) | list[OntologyField] | All fields that apply to a classification path (inherits from parent nodes) |
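The inheritance rule behind find_fields_for_path — a classification path picks up fields from every ancestor node, not just the leaf — can be sketched with plain dicts. This is an illustrative stand-in, not the SDK's actual implementation:

```python
# Hypothetical taxonomy fragment: "contract" defines client_tax_id,
# its "energy" child adds price_per_kwh.
taxonomy = {
    "label": "contract",
    "fields": ["client_tax_id"],
    "children": [
        {"label": "energy", "fields": ["price_per_kwh"], "children": []},
    ],
}

def find_fields_for_path(node, path):
    # Union of fields from every node along the path, root to leaf,
    # so deeper labels inherit their ancestors' fields.
    if not path or path[0] != node["label"]:
        return []
    collected = list(node["fields"])
    for label in path[1:]:
        node = next((c for c in node["children"] if c["label"] == label), None)
        if node is None:
            return []
        collected.extend(node["fields"])
    return collected

print(find_fields_for_path(taxonomy, ["contract", "energy"]))
# ['client_tax_id', 'price_per_kwh']
```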
ExtendedDocument
Read-only document wrapper used in benchmarks. The Engine resolves physical slot names (number_1, string_2) to semantic field names (price_per_kwh, client_tax_id) before constructing this object. Benchmarks never see raw slot names.
from bizsupply_sdk import ExtendedDocument

Properties and Methods:
| Method / Property | Returns | Description |
|---|---|---|
| .get(field_name) | Any \| None | Get a field value by semantic name (e.g., doc.get("price_per_kwh")). Returns None if not found. |
| .aggregations | list[ExtendedDocument] | Related documents linked by MATCH_RULES (e.g., invoices linked to a contract). Each aggregation is also an ExtendedDocument. |
| .raw | dict[str, Any] | Escape hatch: access the underlying dict directly. Use when you need fields not covered by .get(). |
Example in a benchmark:
def score(self, document):
    # Access fields by semantic name
    contract_value = document.get("contract_value")

    # Access related documents (linked by MATCH_RULES)
    for invoice in document.aggregations:
        price = invoice.get("price_per_kwh")
        supplier = invoice.get("supplier_name")

    # Escape hatch for raw dict access
    raw_data = document.raw
    doc_id = raw_data.get("document_id")

Note: ExtendedDocument is NOT used in plugins. Plugins receive Document.
ScoredDocument
Pairs an ExtendedDocument with its calculated score. Used in the benchmark compute() method.
from bizsupply_sdk import ScoredDocument

Attributes:
| Attribute | Type | Description |
|---|---|---|
document | ExtendedDocument | The scored document (with all fields and aggregations) |
score | float | The float score calculated by score() |
Example in compute():
def compute(self, results):
    # results is list[ScoredDocument]
    # Simple: best (lowest) score
    return min(r.score for r in results)

# Alternative - weighted by volume:
def compute(self, results):
    total_weight = sum(r.document.get("volume") or 1 for r in results)
    return sum(
        r.score * (r.document.get("volume") or 1) for r in results
    ) / total_weight

MatchRule
Defines how to link related document types for benchmark aggregations. Declared as MATCH_RULES class attribute on benchmarks.
from bizsupply_sdk import MatchRule

Attributes:
| Attribute | Type | Required | Description |
|---|---|---|---|
| name | str | Yes | Unique identifier for this rule (e.g., "contract_invoice_match") |
| left_group | list[str] | Yes | Labels for source documents (e.g., ["contract", "energy"]) |
| right_group | list[str] | Yes | Labels for target documents to link (e.g., ["invoice", "energy"]) |
| conditions | list[MatchCondition] | Yes | ALL conditions must match (AND logic) |
| description | str \| None | No | Human-readable description of what this rule links |
Example:
from bizsupply_sdk import MatchRule, MatchCondition

MATCH_RULES = [
    MatchRule(
        name="contract_invoice_match",
        left_group=["contract", "energy"],
        right_group=["invoice", "energy"],
        description="Link energy contracts to their invoices",
        conditions=[
            MatchCondition(
                left_field="client_tax_id",
                right_field="client_tax_id",
                match_type="==",
            ),
            MatchCondition(
                left_field="cpe_point_of_delivery",
                right_field="cpe_point_of_delivery",
                match_type="==",
            ),
        ],
    ),
]

MatchCondition
A single matching condition between two document fields. Used inside MatchRule.conditions.
from bizsupply_sdk import MatchCondition

Attributes:
| Attribute | Type | Required | Description |
|---|---|---|---|
left_field | str | Yes | Semantic field name in the left (source) document |
right_field | str | Yes | Semantic field name in the right (target) document |
match_type | str | Yes | Comparison operator (see table below) |
Supported match_type operators:
| Operator | Description | Example |
|---|---|---|
== | Equal | client_tax_id == client_tax_id |
!= | Not equal | status != "closed" |
< | Less than | start_date < end_date |
<= | Less than or equal | emission_date <= end_date |
> | Greater than | amount > threshold |
>= | Greater than or equal | end_date >= emission_date |
contains | String contains | description contains keyword |
starts_with | String starts with | code starts_with "ENE" |
Methods:
| Method | Returns | Description |
|---|---|---|
evaluate(left_value, right_value) | bool | Evaluate this condition against two values. Returns False if either value is None. Useful for local testing. |
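The documented operator semantics, including the None short-circuit, can be reproduced with a small local stand-in for exercising match logic offline. This is illustrative only — in real code call MatchCondition.evaluate() from the SDK:

```python
# Minimal stand-in for the documented evaluate() semantics:
# either side being None short-circuits to False.
OPS = {
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    "<": lambda a, b: a < b,
    "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b,
    ">=": lambda a, b: a >= b,
    "contains": lambda a, b: b in a,
    "starts_with": lambda a, b: a.startswith(b),
}

def evaluate(match_type, left_value, right_value):
    # Mirror the documented behaviour: None on either side is never a match.
    if left_value is None or right_value is None:
        return False
    return OPS[match_type](left_value, right_value)

print(evaluate("==", "PT123456789", "PT123456789"))   # True
print(evaluate("starts_with", "ENE-2025-001", "ENE"))  # True
print(evaluate("<=", None, "2025-01-31"))              # False (None short-circuits)
```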
Complete Import Reference
# Plugin base classes
from bizsupply_sdk import ClassificationPlugin, ExtractionPlugin, SourcePlugin
# Benchmark base class
from bizsupply_sdk import BaseBenchmark
# Core models (plugins)
from bizsupply_sdk import Document, ExtractionResult, DocumentInput
from bizsupply_sdk import BaseSourceState, DynamicCredential
from bizsupply_sdk import OntologyField, OntologyNode, OntologyManifest
# Benchmark models
from bizsupply_sdk import ExtendedDocument, ScoredDocument
from bizsupply_sdk import MatchRule, MatchCondition
# Protocol (for type hints)
from bizsupply_sdk import PluginServicesProtocol

Next Steps
- Plugin Interface - Required contract for all plugins and benchmarks
- Create a Plugin - Step-by-step plugin development guide