Create a Classification Plugin

Goal: Build a plugin that categorizes documents with labels (invoice, receipt, contract, etc.)

Use When: You need to automatically categorize documents based on their content.


What Classification Plugins Do

Classification plugins analyze document content and assign labels from your ontology's taxonomy. For example:

  • Categorize emails as "invoice", "receipt", "statement"
  • Identify document types: "contract", "amendment", "proposal"
  • Multi-level classification: "contract" > "energy" > "residential"

Key Method: classify() returns a single label string; the Engine handles persistence and ontology traversal.


Prerequisites

Before starting:


Step 1: Create the Plugin Code

Create invoice_classifier.py:

from bizsupply_sdk import ClassificationPlugin


class InvoiceClassifierPlugin(ClassificationPlugin):
    """
    Classifies documents as invoices, receipts, or other.

    The Engine calls classify() once per level of the ontology tree.
    Just pick from available_labels - Engine handles traversal!
    """

    # Optional: Define configurable parameters as class attributes
    configurable_parameters = [
        {
            "parameter_name": "classification_prompt_id",
            "parameter_type": "str",
            "default_value": None,
            "description": "Prompt ID for classification (REQUIRED)",
        },
    ]

    async def classify(
        self,
        document,
        file_data,
        mime_type,
        available_labels,
        current_path,
        configs,
    ):
        """
        Classify at a single hierarchy level.

        Args:
            document: The document being classified
            file_data: Raw file bytes (Gemini reads PDFs directly)
            mime_type: File MIME type
            available_labels: Ontology labels at this level
            current_path: Labels already selected (e.g., ["Energy", "Contract"])
            configs: Runtime configuration (prompt IDs, etc.)

        Returns:
            - Label from available_labels: Engine continues to next level
            - Label NOT in available_labels: tracked as llm_suggested
            - None: Engine triggers suggestion workflow
        """
        self.logger.info(
            f"Classifying {document.document_id} at level {len(current_path)}: "
            f"path={current_path}, options={available_labels}"
        )

        # Build the prompt inline with context
        # (this example does not read classification_prompt_id from configs)
        path_str = " > ".join(current_path) if current_path else "Root"
        options_str = ", ".join(available_labels)

        prompt = f"""You are a document classification expert.

Current classification path: {path_str}
Available categories: {options_str}

Analyze the document and select the most appropriate category from the list above.
Return JSON: {{"category": "selected_category"}}
If none of the categories fit, return: {{"category": null}}

[See attached document file]"""

        # Call LLM (MUST use await)
        result = await self.prompt_llm(
            prompt=prompt,
            file_data=file_data,
            mime_type=mime_type,
        )

        if not result:
            self.logger.error(f"Empty LLM response for {document.document_id}")
            return None

        # Return the selected label
        selected_label = result.get("category")

        if not selected_label or selected_label == "null":
            self.logger.info(f"LLM returned no category for {document.document_id}")
            return None  # Engine will trigger suggestion workflow

        self.logger.info(f"Selected '{selected_label}' for {document.document_id}")
        return selected_label
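The return contract above only lets the Engine continue to the next level when the returned string matches an entry in available_labels exactly. The following is a minimal sketch of a helper that normalizes the LLM's answer before it is returned. It assumes prompt_llm yields a dict parsed from the JSON response (as the result.get call above suggests); pick_label is a hypothetical name and is not part of bizsupply_sdk.

```python
from typing import Optional, Sequence


def pick_label(result: Optional[dict], available_labels: Sequence[str]) -> Optional[str]:
    """Map an LLM response onto the exact spelling used in available_labels.

    Hypothetical helper for illustration; not part of bizsupply_sdk.
    """
    # Assumption: the LLM response has already been parsed into a dict
    if not isinstance(result, dict):
        return None

    raw = result.get("category")
    if not isinstance(raw, str):
        return None  # covers a JSON null or a missing key

    cleaned = raw.strip()
    if not cleaned or cleaned.lower() == "null":
        return None

    # Case-insensitive lookup so "Invoice" still matches "invoice"
    by_lower = {label.lower(): label for label in available_labels}

    # Labels outside the list are passed through unchanged; per the guide,
    # the Engine tracks those as llm_suggested
    return by_lower.get(cleaned.lower(), cleaned)
```

Inside classify(), the last lines could then become `return pick_label(result, available_labels)`, which keeps the None and off-list behavior described in the docstring while tolerating differences in case and whitespace.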

Step 2: Create a Classification Prompt

Before registering, create the prompt your plugin will use:

POST /prompts
Content-Type: application/json
Authorization: Bearer <token>

{
  "name": "Invoice Classification Prompt",
  "prompt": "Analyze this document and classify it.\n\nDocument:\n{document_content}\n\nRespond with JSON:\n{\n  \"document_type\": \"invoice\" | \"receipt\" | \"statement\" | \"other\",\n  \"confidence\": 0.0-1.0\n}"
}

Save the returned prompt_id - you'll need it when configuring pipelines.
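Hand-escaping the newlines and quotes in the request body is easy to get wrong. As a sketch, the same body can be built in Python and serialized with json.dumps, which handles the escaping. The field names "name" and "prompt" come from the request above; how the body is sent (HTTP client, base URL, token) is left out here and depends on your setup.

```python
import json

# Plain string, not an f-string, so {document_content} and the JSON braces
# are kept literally for the platform to fill in later
prompt_text = (
    "Analyze this document and classify it.\n\n"
    "Document:\n{document_content}\n\n"
    "Respond with JSON:\n"
    '{\n  "document_type": "invoice" | "receipt" | "statement" | "other",\n'
    '  "confidence": 0.0-1.0\n}'
)

payload = {"name": "Invoice Classification Prompt", "prompt": prompt_text}

# json.dumps escapes newlines and double quotes inside the prompt
body = json.dumps(payload, indent=2)
print(body)
```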


Step 3: Validate and Register

Validate your plugin before registering:

bizsupply validate invoice_classifier.py

Then register:

curl -X POST "https://api.bizsupply.com/api/v1/plugins" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "name=Invoice Classifier" \
  -F "description=Classifies documents as invoices, receipts, or other" \
  -F "code_file=@invoice_classifier.py"

The plugin type and configurable parameters are automatically extracted from your code.


Key Methods

Method                                                 Purpose
await self.prompt_llm(prompt, file_data, mime_type)    Call LLM for classification
await self.get_prompt(prompt_id)                       Load prompt template
self.logger                                            Plugin-specific logger

The Engine handles document content fetching, ontology traversal, and classification persistence automatically.


Hierarchical Classification

The Engine handles multi-level classification automatically. Your classify() method is called once per level of the ontology tree:

Level 0: available_labels=["invoice", "contract", "receipt"], current_path=[]
  -> Plugin returns "contract"

Level 1: available_labels=["energy", "service", "lease"], current_path=["contract"]
  -> Plugin returns "energy"

Level 2: available_labels=["residential", "commercial"], current_path=["contract", "energy"]
  -> Plugin returns "residential"

Final classification: ["contract", "energy", "residential"]

You just pick from available_labels at each level. The Engine builds the full path and persists it.
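To make the per-level calls concrete, here is a small simulation of the walk shown above. It is a sketch under stated assumptions, not the Engine's implementation: the nested-dict ONTOLOGY, the traverse function, and the ScriptedClassifier stand-in are all hypothetical, and the real Engine's data model and stopping rules are not described in this guide.

```python
import asyncio

# Hypothetical nested ontology; the real Engine's data model is not shown here
ONTOLOGY = {
    "invoice": {},
    "receipt": {},
    "contract": {
        "energy": {"residential": {}, "commercial": {}},
        "service": {},
        "lease": {},
    },
}


class ScriptedClassifier:
    """Stand-in plugin that answers from a fixed script instead of an LLM."""

    def __init__(self, answers):
        self.answers = answers  # maps level number -> label to return

    async def classify(self, document, file_data, mime_type,
                       available_labels, current_path, configs):
        return self.answers.get(len(current_path))


async def traverse(plugin, ontology):
    """Call classify() once per level until a leaf or an unknown label is hit."""
    path, node = [], ontology
    while node:
        label = await plugin.classify(
            None, b"", "application/pdf", sorted(node), list(path), {}
        )
        if label not in node:
            break  # None or an off-list label ends the walk in this sketch
        path.append(label)
        node = node[label]
    return path


script = {0: "contract", 1: "energy", 2: "residential"}
final_path = asyncio.run(traverse(ScriptedClassifier(script), ONTOLOGY))
print(final_path)
```

The plugin only ever sees one level at a time: the labels at that level and the path chosen so far. Everything else (descending, stopping, assembling the final path) lives in the caller.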


Common Mistakes

Using the Old execute() Method

# WRONG - old v1.0 API
async def execute(self, context: PluginContext):
    for doc in context.documents:
        await self.add_document_classification(doc, labels=["invoice"])
    return context.documents

# CORRECT - return a label from classify()
async def classify(self, document, file_data, mime_type, available_labels, current_path, configs):
    return "invoice"  # Engine handles persistence

Returning a List Instead of a String

# WRONG - classify() returns str | None, not list
return ["invoice", "utility"]

# CORRECT - return a single label
return "invoice"

Missing SDK Import

# WRONG - no import
class MyPlugin(ClassificationPlugin):  # NameError!
    ...

# CORRECT
from bizsupply_sdk import ClassificationPlugin

class MyPlugin(ClassificationPlugin):
    ...

Forgetting to Await

# WRONG - returns coroutine, not result
result = self.prompt_llm(prompt="...")

# CORRECT
result = await self.prompt_llm(prompt="...")
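The effect of the missing await can be reproduced without the SDK. In this sketch, fake_prompt_llm is a hypothetical stand-in for self.prompt_llm; it shows that the un-awaited call produces a coroutine object with no .get() method, so a later result.get("category") would fail.

```python
import asyncio


async def fake_prompt_llm(prompt):
    """Hypothetical stand-in for self.prompt_llm; returns a canned response."""
    return {"category": "invoice"}


async def main():
    pending = fake_prompt_llm("...")             # no await: a coroutine object
    is_coroutine = asyncio.iscoroutine(pending)
    has_get = hasattr(pending, "get")            # coroutines have no .get()
    result = await pending                        # awaiting yields the dict
    return is_coroutine, has_get, result


is_coroutine, has_get, result = asyncio.run(main())
print(is_coroutine, has_get, result)
```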

Next Steps