Create a Classification Plugin
Goal: Build a plugin that categorizes documents with labels (invoice, receipt, contract, etc.)
Use When: You need to automatically categorize documents based on their content.
What Classification Plugins Do
Classification plugins analyze document content and assign labels from your ontology taxonomy. For example:
- Categorize emails as "invoice", "receipt", "statement"
- Identify document types: "contract", "amendment", "proposal"
- Multi-level classification: "contract" > "energy" > "residential"
Key Method: Return a label string from classify() - the Engine handles persistence and ontology traversal.
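To make the contract concrete, here is a minimal sketch of a classify() implementation. The stub base class stands in for bizsupply_sdk.ClassificationPlugin so the snippet runs standalone; a real plugin inherits from the SDK class and would normally consult an LLM rather than a hard-coded rule.

```python
import asyncio

# Stand-in for bizsupply_sdk.ClassificationPlugin, for illustration only.
class StubClassificationPlugin:
    pass

class KeywordClassifierPlugin(StubClassificationPlugin):
    async def classify(self, document, file_data, mime_type,
                       available_labels, current_path, configs):
        # Toy rule: pick "invoice" when it is offered, otherwise defer.
        if "invoice" in available_labels:
            return "invoice"  # Engine persists this and descends one level
        return None           # Engine triggers the suggestion workflow

plugin = KeywordClassifierPlugin()
label = asyncio.run(plugin.classify(
    document=None, file_data=b"", mime_type="application/pdf",
    available_labels=["invoice", "receipt"], current_path=[], configs={},
))
print(label)  # -> invoice
```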
Prerequisites
Before starting:
- Install the SDK: `pip install bizsupply-sdk`
- Read the Plugin Interface Specification
- Create a Prompt for classification instructions
- Optionally: Create an Ontology with your label taxonomy
Step 1: Create the Plugin Code
Create invoice_classifier.py:
```python
from bizsupply_sdk import ClassificationPlugin


class InvoiceClassifierPlugin(ClassificationPlugin):
    """
    Classifies documents as invoices, receipts, or other.

    The Engine calls classify() once per level of the ontology tree.
    Just pick from available_labels - the Engine handles traversal!
    """

    # Optional: Define configurable parameters as class attributes
    configurable_parameters = [
        {
            "parameter_name": "classification_prompt_id",
            "parameter_type": "str",
            "default_value": None,
            "description": "Prompt ID for classification (REQUIRED)",
        },
    ]

    async def classify(
        self,
        document,
        file_data,
        mime_type,
        available_labels,
        current_path,
        configs,
    ):
        """
        Classify at a single hierarchy level.

        Args:
            document: The document being classified
            file_data: Raw file bytes (Gemini reads PDFs directly)
            mime_type: File MIME type
            available_labels: Ontology labels at this level
            current_path: Labels already selected (e.g., ["Energy", "Contract"])
            configs: Runtime configuration (prompt IDs, etc.)

        Returns:
            - Label from available_labels: Engine continues to the next level
            - Label NOT in available_labels: tracked as llm_suggested
            - None: Engine triggers the suggestion workflow
        """
        self.logger.info(
            f"Classifying {document.document_id} at level {len(current_path)}: "
            f"path={current_path}, options={available_labels}"
        )

        # Build the prompt with context
        path_str = " > ".join(current_path) if current_path else "Root"
        options_str = ", ".join(available_labels)
        prompt = f"""You are a document classification expert.

Current classification path: {path_str}
Available categories: {options_str}

Analyze the document and select the most appropriate category from the list above.
Return JSON: {{"category": "selected_category"}}
If none of the categories fit, return: {{"category": null}}

[See attached document file]"""

        # Call the LLM (MUST use await)
        result = await self.prompt_llm(
            prompt=prompt,
            file_data=file_data,
            mime_type=mime_type,
        )
        if not result:
            self.logger.error(f"Empty LLM response for {document.document_id}")
            return None

        # Return the selected label
        selected_label = result.get("category")
        if not selected_label or selected_label == "null":
            self.logger.info(f"LLM returned no category for {document.document_id}")
            return None  # Engine will trigger the suggestion workflow

        self.logger.info(f"Selected '{selected_label}' for {document.document_id}")
        return selected_label
```
Step 2: Create a Classification Prompt
Before registering, create the prompt your plugin will use:
```http
POST /prompts
Content-Type: application/json
Authorization: Bearer <token>

{
  "name": "Invoice Classification Prompt",
  "prompt": "Analyze this document and classify it.\n\nDocument:\n{document_content}\n\nRespond with JSON:\n{\n  \"document_type\": \"invoice\" | \"receipt\" | \"statement\" | \"other\",\n  \"confidence\": 0.0-1.0\n}"
}
```
Save the returned prompt_id - you'll need it when configuring pipelines.
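If you script prompt creation instead of calling the API by hand, the request body can be built in Python. This is a hedged sketch: the full endpoint URL and response shape are assumptions based on the registration example in Step 3, so adjust them for your deployment.

```python
import json

payload = {
    "name": "Invoice Classification Prompt",
    "prompt": (
        "Analyze this document and classify it.\n\n"
        "Document:\n{document_content}\n\n"
        "Respond with JSON:\n"
        '{\n  "document_type": "invoice" | "receipt" | "statement" | "other",\n'
        '  "confidence": 0.0-1.0\n}'
    ),
}

# Assumed endpoint, mirroring the plugin-registration URL in Step 3:
# resp = requests.post(
#     "https://api.bizsupply.com/api/v1/prompts",
#     headers={"Authorization": "Bearer YOUR_TOKEN"},
#     json=payload,
# )
# prompt_id = resp.json()["prompt_id"]  # save for pipeline configuration

print(sorted(payload))  # -> ['name', 'prompt']
```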
Step 3: Validate and Register
Validate your plugin before registering:
```bash
bizsupply validate invoice_classifier.py
```
Then register:
```bash
curl -X POST "https://api.bizsupply.com/api/v1/plugins" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "name=Invoice Classifier" \
  -F "description=Classifies documents as invoices, receipts, or other" \
  -F "code_file=@invoice_classifier.py"
```
The plugin type and configurable parameters are automatically extracted from your code.
Key Methods
| Method | Purpose |
|---|---|
| `await self.prompt_llm(prompt, file_data, mime_type)` | Call the LLM for classification |
| `await self.get_prompt(prompt_id)` | Load a prompt template |
| `self.logger` | Plugin-specific logger |
The Engine handles document content fetching, ontology traversal, and classification persistence automatically.
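For example, a plugin can resolve its configured prompt at runtime via get_prompt. The stub base class below simulates the SDK so the snippet runs standalone, and the returned template shape ({"prompt": ...}) is an assumption; check the actual return value of get_prompt in your SDK version.

```python
import asyncio

class StubBase:
    # Stand-in for the real SDK method, which loads the stored template.
    async def get_prompt(self, prompt_id):
        return {"prompt": "Classify this document. Options: {options}"}

class PromptBackedClassifier(StubBase):
    async def classify(self, document, file_data, mime_type,
                       available_labels, current_path, configs):
        # Resolve the prompt configured via classification_prompt_id.
        template = await self.get_prompt(configs["classification_prompt_id"])
        prompt = template["prompt"].format(options=", ".join(available_labels))
        # ...pass `prompt` plus file_data/mime_type to self.prompt_llm here...
        return available_labels[0]  # placeholder decision for this sketch

label = asyncio.run(PromptBackedClassifier().classify(
    None, b"", "application/pdf", ["invoice", "receipt"], [],
    {"classification_prompt_id": "prompt-123"},
))
print(label)  # -> invoice
```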
Hierarchical Classification
The Engine handles multi-level classification automatically. Your classify() method is called once per level of the ontology tree:
```
Level 0: available_labels=["invoice", "contract", "receipt"], current_path=[]
         -> Plugin returns "contract"
Level 1: available_labels=["energy", "service", "lease"], current_path=["contract"]
         -> Plugin returns "energy"
Level 2: available_labels=["residential", "commercial"], current_path=["contract", "energy"]
         -> Plugin returns "residential"

Final classification: ["contract", "energy", "residential"]
```
You just pick from available_labels at each level. The Engine builds the full path and persists it.
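The traversal loop below is our own conceptual sketch of this behavior, not the Engine's source: classify() is invoked once per ontology level until a leaf is reached or the plugin returns None. The toy ontology and PathPicker plugin are illustrative stand-ins.

```python
import asyncio

# Toy ontology tree: keys at each level are the available labels.
ONTOLOGY = {
    "contract": {
        "energy": {"residential": {}, "commercial": {}},
        "service": {},
        "lease": {},
    },
    "invoice": {},
    "receipt": {},
}

class PathPicker:
    """Toy plugin that follows a fixed target path through the tree."""
    TARGET = ["contract", "energy", "residential"]

    async def classify(self, document, file_data, mime_type,
                       available_labels, current_path, configs):
        want = self.TARGET[len(current_path)]
        return want if want in available_labels else None

async def traverse(plugin):
    # Sketch of the Engine loop: one classify() call per level.
    path, level = [], ONTOLOGY
    while level:  # stop at a leaf (no child labels)
        label = await plugin.classify(None, b"", "application/pdf",
                                      list(level), path, {})
        if label is None or label not in level:
            break  # suggestion workflow / llm_suggested handling goes here
        path.append(label)
        level = level[label]
    return path

path = asyncio.run(traverse(PathPicker()))
print(path)  # -> ['contract', 'energy', 'residential']
```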
Common Mistakes
Using Old execute() Method
```python
# WRONG - old v1.0 API
async def execute(self, context: PluginContext):
    for doc in context.documents:
        await self.add_document_classification(doc, labels=["invoice"])
    return context.documents

# CORRECT - return a label from classify()
async def classify(self, document, file_data, mime_type, available_labels, current_path, configs):
    return "invoice"  # Engine handles persistence
```
Returning a List Instead of a String
```python
# WRONG - classify() returns str | None, not list
return ["invoice", "utility"]

# CORRECT - return a single label
return "invoice"
```
Missing SDK Import
```python
# WRONG - no import
class MyPlugin(ClassificationPlugin):  # NameError!
    ...

# CORRECT
from bizsupply_sdk import ClassificationPlugin

class MyPlugin(ClassificationPlugin):
    ...
```
Forgetting to Await
```python
# WRONG - returns a coroutine, not the result
result = self.prompt_llm(prompt="...")

# CORRECT
result = await self.prompt_llm(prompt="...")
```
Next Steps
- Use Plugins - Execute your plugin in a pipeline
- Create an Extraction Plugin - Extract data from classified documents
- Plugin Service API - All available service methods