How do I test it before connecting my real ERP?

Run the file in the 'Put it together' section against a sample invoice PDF. The four stubs (the purchase-order total, the approved-vendor set, the posted-set, and create_bill) stand in for your ERP, ledger, and vendor master, so you can watch the full extract, match, decide, and post flow end to end and confirm the logic before wiring in any real system. Then swap one stub at a time for the real API and test again.

Which framework should I use: ADK, the Claude SDK, or LangGraph?

The right framework depends on your team and your stack, and the agent logic is the same in all three. The code tabs on every step let you read the identical pipeline in Google ADK, the Anthropic SDK, and LangGraph and pick the one you already know. ADK and LangGraph are model-agnostic, so you can run Gemini, Claude, or another model under either; the Claude tab uses the Anthropic SDK directly.

Which model should I use, and what does it cost?

Use a fast, cheap model such as Gemini 2.5 Flash for the bulk of invoices, and reserve a stronger model (Gemini 2.5 Pro or 3-series, or Claude Opus) for genuinely hard layouts in the exception loop. The running cost is a small per-invoice model charge plus hosting, which is tiny next to the labour it removes, and you control it by sending the cheap model first and only escalating when a check fails.
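The cheap-first routing can be sketched in a few lines. This is a minimal illustration, assuming you supply two extractor callables and a check predicate; the names extract_with_escalation, cheap, strong, and checks_pass are hypothetical and not part of any SDK, and the two fake extractors stand in for real model calls.

```python
from typing import Callable

def extract_with_escalation(pdf: bytes,
                            cheap: Callable[[bytes], dict],
                            strong: Callable[[bytes], dict],
                            checks_pass: Callable[[dict], bool]) -> tuple[dict, str]:
    """Try the cheap model first; call the strong model only if a check fails."""
    inv = cheap(pdf)
    if checks_pass(inv):
        return inv, "cheap"
    return strong(pdf), "strong"

# Stand-ins for real model calls, so the routing can be exercised offline.
def fake_cheap(pdf: bytes) -> dict:
    return {"total": 100.0, "line_items": [{"amount": 90.0}]}    # lines do not add up

def fake_strong(pdf: bytes) -> dict:
    return {"total": 100.0, "line_items": [{"amount": 100.0}]}

def lines_add_up(inv: dict) -> bool:
    return abs(sum(li["amount"] for li in inv["line_items"]) - inv["total"]) < 1

invoice, tier = extract_with_escalation(b"%PDF-", fake_cheap, fake_strong, lines_add_up)
print(tier)
```

Because the strong model is only called on a failed check, its cost scales with your exception rate rather than your invoice volume.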

What is an agent framework actually doing for me?

It saves you from hand-rolling the loop. You define each step as a unit with a model, an instruction, and a few tools, where a tool is a typed Python function. The framework wires the steps together (a sequence, plus a loop for retries), passes data between them through shared state, runs the whole thing, and gives you the events to log. That is why most of the build is your schema, your matching rules, and your ERP functions, while the framework handles the orchestration.
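To make that concrete, here is roughly what the hand-rolled version looks like. It is a plain-Python sketch, not any framework's API: run_sequence and run_loop are hypothetical names, and the lambdas are toy steps standing in for extract and match.

```python
from typing import Callable

Step = Callable[[dict], dict]   # a step reads the shared state and returns its updates

def run_sequence(steps: list[tuple[str, Step]], state: dict, events: list) -> dict:
    """Run steps in order, merging each step's updates into the shared state."""
    for name, step in steps:
        updates = step(state)
        state = {**state, **updates}
        events.append((name, sorted(updates)))      # what a framework would emit to log
    return state

def run_loop(step: Step, done: Callable[[dict], bool], state: dict,
             events: list, max_iterations: int = 2) -> dict:
    """Repeat one step until done(state) is true or the iteration budget runs out."""
    for attempt in range(1, max_iterations + 1):
        state = {**state, **step(state)}
        events.append(("retry", attempt))
        if done(state):
            break
    return state

# Toy steps standing in for extract and match.
events: list = []
state = run_sequence(
    [("extract", lambda s: {"total": 100.0}),
     ("match", lambda s: {"ok": s["total"] == s["po_total"]})],
    {"po_total": 120.0}, events)
state = run_loop(lambda s: {"total": 120.0, "ok": True}, lambda s: s["ok"], state, events)
print(state["ok"], events)
```

A framework adds the parts this sketch leaves out: model calls, tool schemas, session storage, and structured events.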

What accuracy can I expect, and how does scan quality affect it?

Extraction is highly accurate on clean digital PDFs and drops on poor scans and photos, which is why the exception loop re-extracts disputed fields with a stronger model and a stricter prompt before involving a human. Accuracy also improves over time because each human correction is stored against the vendor and fed back into the agent's memory. Treat any single accuracy percentage you see in vendor marketing as best-case for clean inputs.

Do I need three-way matching for every invoice, or is two-way enough?

Two-way (invoice against PO) is enough for services and where you do not track goods receipts. Three-way (invoice, PO, and goods receipt) is the standard for physical goods because it confirms you are paying for what actually arrived. The matching is deterministic Python exposed to the agent as a tool, and you set which applies per category, with a tolerance for small differences.

How does it detect and prevent duplicate payments?

Every file is hashed on intake, and before posting the guard checks vendor plus invoice number plus that hash against the ledger. If any match, it flags a duplicate and holds it. This catches both the same PDF sent twice and the same invoice resubmitted in a different format, and it runs before the posting tool is ever called.
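The two lookups can be sketched with in-memory sets. DuplicateGuard is a hypothetical stand-in for your ledger queries, and the invoice-number normalisation is an assumption you should tune to how your vendors number invoices.

```python
import hashlib

class DuplicateGuard:
    """In-memory stand-in for the ledger lookups; a real one would query your database."""

    def __init__(self) -> None:
        self.seen_hashes: set[str] = set()
        self.seen_keys: set[tuple[str, str]] = set()

    @staticmethod
    def _key(vendor: str, invoice_no: str) -> tuple[str, str]:
        # Normalise so "INV-001" and "inv 001" count as the same invoice (an assumption).
        norm = "".join(ch for ch in invoice_no.upper() if ch.isalnum())
        return vendor.strip().lower(), norm

    def is_duplicate(self, pdf: bytes, vendor: str, invoice_no: str) -> bool:
        """True if the same bytes, or the same vendor and invoice number, were seen before."""
        digest = hashlib.sha256(pdf).hexdigest()
        return digest in self.seen_hashes or self._key(vendor, invoice_no) in self.seen_keys

    def record(self, pdf: bytes, vendor: str, invoice_no: str) -> None:
        self.seen_hashes.add(hashlib.sha256(pdf).hexdigest())
        self.seen_keys.add(self._key(vendor, invoice_no))

guard = DuplicateGuard()
guard.record(b"original pdf bytes", "Acme Corp", "INV-001")
print(guard.is_duplicate(b"rescanned copy", "acme corp", "inv 001"))   # True
```

The hash catches byte-identical resends; the vendor and invoice-number key catches the same invoice arriving as a different file.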

How do you prevent fraud such as changed bank details or fake invoices?

The guard compares the bank details on each invoice against the vendor master and holds anything that changed for explicit human confirmation, since bank-detail swaps are a common fraud. Combined with vendor validation, approval thresholds, and an immutable audit log, it closes the usual gaps.

Can this run on Vertex AI or with data residency in India?

Yes. The google-genai code targets either the Gemini API or Vertex AI by setting Client(vertexai=True, project=..., location=...), and on Vertex you can pin the region to an Indian Google Cloud location for data residency. ADK and LangGraph both deploy to Cloud Run or similar, and for fully air-gapped requirements you use the Document AI fallback on-premise.

How does it integrate with QuickBooks, Xero, NetSuite, SAP, or Tally?

Through each tool's API, which all of these expose. The posting step is a typed function built against your specific system that sends fields in the format it expects, with the PDF and audit trail attached. Because it is an ordinary tool, swapping the ERP changes one function while the rest of the pipeline stays the same. Where a system has no API, the agent emits a validated import file.
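For the no-API case, the import file can be as simple as a validated CSV. This is a sketch under assumptions: the column names in REQUIRED are placeholders, so replace them with whatever your accounting system's import template actually expects.

```python
import csv
import io

REQUIRED = ("vendor", "invoice_no", "invoice_date", "total")   # hypothetical columns

def to_import_csv(invoices: list[dict]) -> str:
    """Check every invoice has the required fields, then render rows in a fixed order."""
    for inv in invoices:
        missing = [f for f in REQUIRED if inv.get(f) in (None, "")]
        if missing:
            raise ValueError(f"{inv.get('invoice_no', '?')}: missing {missing}")
    buf = io.StringIO()
    # extrasaction="ignore" drops fields the import template does not have.
    writer = csv.DictWriter(buf, fieldnames=REQUIRED, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(invoices)
    return buf.getvalue()

print(to_import_csv([{"vendor": "Acme Corp", "invoice_no": "INV-001",
                      "invoice_date": "2025-01-15", "total": 124500.0,
                      "currency": "INR"}]))
```

Validating before writing means a bad row stops the whole file rather than producing a partial import.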

How is this different from traditional OCR or RPA, and is OCR ever still needed?

OCR only turns an image into text, and you still have to make sense of it. RPA replays fixed clicks and breaks when a screen or layout changes. This agent reads any layout natively, validates against your records, decides under your rules, and escalates what it cannot resolve. You usually do not need a separate OCR step, since the model reads the PDF directly, but a dedicated engine such as Google Document AI is worth keeping as a fallback for air-gapped data you cannot send to a hosted model, very large batch scans where a per-page engine is cheaper, or badly degraded images where preprocessing helps.

Should I build a custom agent or buy an off-the-shelf AP tool?

Buy if your AP is standard and you want speed with less engineering. Build, or have it built, when you need your own rules, tight ERP integration, and control of the data and IP. See the comparison table above; the right answer depends on volume and how custom your process is, and a framework keeps the build far smaller.

Agent Playbook, build tutorial

How to Build an AI Agent for Invoice Processing

A walkthrough of how to build an accounts payable (AP) agent, with the core decision logic in runnable code for each stage. It watches your inbox, reads every invoice directly from the PDF with no separate optical character recognition (OCR) step, matches it to the purchase order and goods receipt, applies your rules, and posts it to your accounting system, escalating to a human only on exceptions. Every code step is shown in three frameworks, Google ADK, the Claude SDK, and LangGraph, so you can build it on the stack your team already knows. The code here is the agent's core logic; standing it up in production also means adding the setup, integrations, and hardening around it.

Why manual and rule-based AP breaks

Accounts payable has more moving parts than it looks. Every vendor formats invoices differently, totals and taxes need checking, each one must be matched to a purchase order and what actually arrived, approvals get chased over email, and only then is it keyed into the accounting system. The result is slow and error-prone. Industry benchmarks put manual processing at roughly $15 to $40 per invoice (Ardent Partners), while top-quartile automated teams run around $10 or below (APQC), and manual AP teams capture only 20 to 30 percent of available early-payment discounts (Ardent Partners).

Template and robotic process automation (RPA) tools help until a vendor changes their layout, then they break. An agent reads any layout instead, checks the numbers against your records, decides under rules you set, and asks a human only when something genuinely does not add up.

Agent vs fixed pipeline: when each is the honest choice

An agent follows an observe, decide, act loop: it reads the invoice, checks it against your data, decides what to do under your policy, and takes the action (post, or escalate). A fixed pipeline runs the same steps every time with no branching. All three frameworks here can do both: a straight sequence for the happy path, and a loop or branch where the document needs judgement.

Which you need depends on the work. If your invoices are uniform and your rules are simple, the straight sequence is cheaper and easier to trust. The branching agent earns its complexity when invoices vary a lot, exceptions are common, and the right next step genuinely depends on what the document says. Most real AP sits in that second case, which is why a deterministic matching core handles the numbers and the model is used only where judgement helps.

The architecture

Every invoice flows through one pipeline: ingested, classified, extracted, validated, matched, then either auto-approved and posted, or sent into the exception loop and on to a human. Corrections feed back into the agent's memory, and guardrails wrap the whole thing. The shape is identical whether you build it in ADK, the Claude SDK, or LangGraph.

Pick a framework: ADK, Claude SDK, or LangGraph

You do not build the agent loop from scratch. A framework gives you the moving parts, and you supply the logic: a step is a model plus an instruction plus a few typed tools, and a tool is just a Python function. The code tabs on every build step below show the identical pipeline three ways, so you can choose by what your team already runs.

Google ADK composes steps with a SequentialAgent and a LoopAgent and runs them with a Runner. The Claude SDK drives a tool-use loop directly against the Anthropic API. LangGraph builds an explicit graph of nodes and edges with shared state. ADK and LangGraph are model-agnostic, so you can run Gemini or Claude under either; the deterministic steps, the schema, and the ERP tools are the same code in all three.

Before you build: prerequisites

Inputs

Digital and scanned PDFs, images, multi-page invoices, and credit and debit notes. The model handles these directly; keep Document AI ready only for the messiest air-gapped cases.

Systems to connect

A model API key (Gemini, Claude, or both), Gmail API access with Pub/Sub, your accounting or ERP (Tally, Zoho, QuickBooks, NetSuite, SAP), and the PO, goods-receipt, and vendor-master sources.

Governance

Approval rules and limits, an audit log, data retention and PII handling, and separation of duties between who approves and who pays.

Setup: install, keys, and Gmail intake

Install the packages and set your Gemini key. That alone is enough to run the agent against a sample invoice PDF on your disk, which is what the runnable file further down does. The Gmail setup is only needed once you want invoices to arrive by email in production.

Shell (install and key)

# Python 3.10 or newer
python -m venv venv && source venv/bin/activate

# The runnable path below uses Google ADK + Gemini:
pip install google-adk google-genai pydantic \
            google-api-python-client google-auth-oauthlib

# The comparison tabs also use:
pip install anthropic langgraph langchain-google-genai

# Your Gemini key from aistudio.google.com/apikey:
export GEMINI_API_KEY="your-key-here"

For production intake, authorise Gmail once and register a watch so new accounts-payable mail is pushed to your handler. This writes a token.json that the ingest step reuses on every run.

Python (one-time Gmail auth)

from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow

# Read messages and manage the watch.
SCOPES = ["https://www.googleapis.com/auth/gmail.modify"]

# One-time: in Google Cloud, create a project, enable the Gmail API and Pub/Sub,
# create an OAuth "Desktop app" credential and download it as credentials.json.
# Then run this once; it opens a browser and writes token.json, which the ingest
# step reuses on every run.
creds = InstalledAppFlow.from_client_secrets_file(
    "credentials.json", SCOPES).run_local_server(port=0)
with open("token.json", "w") as f:
    f.write(creds.to_json())

# Tell Gmail to push new AP mail to a Pub/Sub topic. A watch expires after 7 days,
# so renew it on a schedule (Google recommends daily). The topic's push
# subscription points at your webhook, which calls on_ap_push().
gmail = build("gmail", "v1", credentials=creds)
gmail.users().watch(userId="me", body={
    "labelIds": ["Label_AP"],
    "topicName": "projects/<your-project>/topics/invoices"}).execute()

The stack, and why each piece

Layer	Pick	Why
Intake trigger	Gmail API (google-api-python-client) with a Pub/Sub watch, or a Drive / portal upload	Fire on every new invoice email in real time. A Pub/Sub watch pushes the event to you; a poll every few minutes is the simpler fallback.
Model	Gemini 2.5 Flash (or Pro) via google-genai, with Claude or any model as a drop-in	Reads the PDF or photo directly and returns typed JSON, so there is no separate OCR step for the common case.
Agent framework	Google ADK, the Claude SDK, or LangGraph	Each defines the steps, tools, retries, and state. The code tabs below show the same agent built in all three so you can pick the one your team knows.
Matching + state	Your ERP (enterprise resource planning) API (Tally, Zoho, SAP) + Postgres	Pulls purchase orders (POs) and goods receipts to compare against, and stores runs, decisions, and corrections for the audit trail.
OCR fallback	Google Document AI, only when needed	For air-gapped data you cannot send to a hosted model, very large batch scans where a dedicated engine is cheaper per page, or badly degraded images.
Review UI	A web app, a sheet, or Slack approval buttons	Where a human clears exceptions. Start with Slack buttons; graduate to a console as volume grows.

Build it, step by step

The model and orchestration steps show the same logic in Google ADK, the Claude SDK, and LangGraph; use the tabs on those blocks to switch. The deterministic steps are plain Python, identical in every framework, so they appear once with a single label. These are the building blocks, and they call your own systems (the ERP, the ledger, the vendor master) by name; the complete file you can run today, with those placeholders stubbed out, is in "Put it together" below.

1. Ingest reliably

Watch the accounts-payable inbox with the Gmail API. You authorise once with OAuth, then register a watch so Gmail pushes a Pub/Sub notification on every new message under your AP label. On each push you list the new messages, pull the PDF attachment, and drop it on a queue so the handler returns fast. Capture a content hash up front: it is what later stops the same invoice being processed twice. Wire on_ap_push to an HTTPS route (Flask or FastAPI) and point a Pub/Sub push subscription at it; jobs is your task queue (Celery, RQ, or similar). To just try the agent without any of this, skip straight to the runnable file in 'Put it together' below, which reads a PDF from disk. This step is the same whichever agent framework you pick.

Python (Gmail API)

from googleapiclient.discovery import build
from google.oauth2.credentials import Credentials
import base64, hashlib

# SCOPES is defined in the Setup section; 'jobs' is your task queue (Celery, RQ).
gmail = build("gmail", "v1",
              credentials=Credentials.from_authorized_user_file("token.json", SCOPES))

# Gmail pushes a Pub/Sub event on new AP mail; this runs per push.
def on_ap_push(_event):
    msgs = gmail.users().messages().list(
        userId="me", labelIds=["Label_AP"], q="has:attachment").execute()
    for m in msgs.get("messages", []):
        full = gmail.users().messages().get(userId="me", id=m["id"]).execute()
        for part in full["payload"].get("parts", []):
            if part.get("filename", "").lower().endswith(".pdf"):
                blob = gmail.users().messages().attachments().get(
                    userId="me", messageId=m["id"], id=part["body"]["attachmentId"]).execute()
                pdf = base64.urlsafe_b64decode(blob["data"])
                jobs.enqueue("process_invoice",
                             {"pdf": pdf, "hash": hashlib.sha256(pdf).hexdigest()})
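Two details of intake are worth a small sketch. A Pub/Sub push delivers the Gmail notification as base64 JSON inside message.data, and PDF attachments can sit inside nested multipart containers rather than at the top level of the payload. The helper names decode_gmail_push and iter_pdf_parts are hypothetical; the dictionary shapes follow the Pub/Sub push envelope and the Gmail message payload.

```python
import base64
import json

def decode_gmail_push(envelope: dict) -> dict:
    """Decode the notification carried in a Pub/Sub push body's message.data field."""
    data = envelope["message"]["data"]
    data += "=" * (-len(data) % 4)                   # restore padding if it was stripped
    return json.loads(base64.urlsafe_b64decode(data))

def iter_pdf_parts(payload: dict):
    """Yield every PDF attachment part, descending into nested multipart containers."""
    for part in payload.get("parts", []):
        is_pdf = part.get("filename", "").lower().endswith(".pdf")
        if is_pdf and "attachmentId" in part.get("body", {}):
            yield part
        yield from iter_pdf_parts(part)              # multipart parts carry their own parts

# Offline example: one PDF nested inside a multipart container, plus a logo to skip.
payload = {"parts": [
    {"mimeType": "multipart/mixed", "parts": [
        {"filename": "Invoice.PDF", "body": {"attachmentId": "a1"}}]},
    {"filename": "logo.png", "body": {"attachmentId": "a2"}}]}
print([p["body"]["attachmentId"] for p in iter_pdf_parts(payload)])   # ['a1']
```

In the handler, you would loop over iter_pdf_parts(full["payload"]) in place of the top-level parts list.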

2. Classify the document

Not everything in an AP inbox is an invoice. Before extraction, a cheap classification pass sorts invoices from credit notes, statements, reminders, and spam, and routes non-invoices out. Send the file straight to a fast model and ask for one label; the model reads the PDF directly, so there is no OCR step. The tabs show the identical call in Gemini, Claude, and LangChain.

Python (Gemini)

from google import genai
from google.genai import types

client = genai.Client()   # GEMINI_API_KEY; or Client(vertexai=True, ...) for Vertex AI

def classify(pdf: bytes) -> str:
    resp = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=[types.Part.from_bytes(data=pdf, mime_type="application/pdf"),
                  "Label this as: invoice, credit_note, statement, reminder, other."])
    return resp.text.strip().lower()

Python (Claude)

import base64
from anthropic import Anthropic

client = Anthropic()   # ANTHROPIC_API_KEY

def classify(pdf: bytes) -> str:
    msg = client.messages.create(
        model="claude-opus-4-8", max_tokens=16,
        messages=[{"role": "user", "content": [
            {"type": "document", "source": {"type": "base64",
             "media_type": "application/pdf",
             "data": base64.standard_b64encode(pdf).decode()}},
            {"type": "text",
             "text": "Label: invoice, credit_note, statement, reminder, other."}]}])
    return msg.content[0].text.strip().lower()

Python (LangChain)

import base64
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")  # or ChatAnthropic(...)

def classify(pdf: bytes) -> str:
    msg = HumanMessage(content=[
        {"type": "text", "text": "Label: invoice, credit_note, statement, reminder, other."},
        {"type": "media", "mime_type": "application/pdf",
         "data": base64.b64encode(pdf).decode()}])
    return llm.invoke([msg]).content.strip().lower()

3. Extract with native file input, into a strict schema

There is no OCR step here. The model is multimodal: you hand it the PDF as bytes and it returns the fields, forced into a typed Pydantic schema and validated by the SDK. The schema is defined once (in the Gemini tab) and reused; the Claude and LangGraph tabs show the same extraction through their own SDK. Forcing structured output is what makes the result reliable downstream, and it survives the vendor format drift that breaks template tools. When the agent calls extract as a tool it receives a file path rather than raw bytes, since a model cannot emit bytes, so the runnable file defines extract(pdf_path); the call is otherwise identical.

Python (Gemini)

from pydantic import BaseModel, Field
from datetime import date

class LineItem(BaseModel):
    description: str
    quantity: float
    unit_price: float
    amount: float

class Invoice(BaseModel):
    vendor: str
    invoice_no: str
    invoice_date: date
    po_number: str | None = Field(None, description="PO number if present")
    currency: str = "INR"
    tax: float = 0
    total: float
    line_items: list[LineItem]

def extract(pdf: bytes) -> Invoice:
    resp = client.models.generate_content(
        model="gemini-2.5-flash",            # 2.5-pro / 3-pro for hard layouts
        contents=[types.Part.from_bytes(data=pdf, mime_type="application/pdf"),
                  "Extract this invoice into the schema."],
        config=types.GenerateContentConfig(
            response_mime_type="application/json", response_schema=Invoice))
    return resp.parsed                       # a validated Invoice

Python (Claude)

import base64
from anthropic import Anthropic

client = Anthropic()
# Invoice is the same Pydantic schema as the Gemini tab.

def extract(pdf: bytes) -> Invoice:
    msg = client.messages.parse(
        model="claude-opus-4-8", max_tokens=4096,
        messages=[{"role": "user", "content": [
            {"type": "document", "source": {"type": "base64",
             "media_type": "application/pdf",
             "data": base64.standard_b64encode(pdf).decode()}},
            {"type": "text", "text": "Extract this invoice into the schema."}]}],
        output_config={"format": Invoice})   # validated, typed output
    return msg.parsed_output                 # a validated Invoice

Python (LangGraph)

import base64
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage

# Invoice is the same Pydantic schema as the Gemini tab.
extractor = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash").with_structured_output(Invoice)

def extract(pdf: bytes) -> Invoice:
    return extractor.invoke([HumanMessage(content=[
        {"type": "text", "text": "Extract this invoice into the schema."},
        {"type": "media", "mime_type": "application/pdf",
         "data": base64.b64encode(pdf).decode()}])])

4. Validate and three-way match

Now check what you read against your own records, in plain deterministic Python so it is testable and auditable rather than a model's opinion. Re-add the line items and compare to the stated total, look up the PO and goods receipt through your ERP API and compare within a tolerance you set, verify the vendor, and confirm the invoice number has not already been paid. This is the same function in every framework, exposed to the agent as a tool.

Python (any framework)

def three_way_match(inv: dict, po: dict, grn: dict, tolerance: float = 100) -> dict:
    """Compare an extracted invoice against its PO and goods receipt; return the checks."""
    line_total = sum(li["amount"] for li in inv["line_items"])
    return {
        "math":     abs(line_total - inv["total"]) < 1,
        "po_total": abs(inv["total"] - po["total"]) <= tolerance,
        "receipt":  all(grn["qty"].get(li["description"], 0) >= li["quantity"]
                        for li in inv["line_items"]),
        "vendor":   vendor_master.is_approved(inv["vendor"]),
        "not_dup":  not ledger.exists(inv["vendor"], inv["invoice_no"]),
    }

5. Apply the decision policy

Turn business rules into explicit conditions you own. Clean, in-policy invoices auto-approve; everything else routes to a human with the exact failing check named. The model never decides on its own here, it applies your policy, and every decision is logged with the rule that fired. Keep the thresholds in config so finance can change them without a deploy.

Python (any framework)

def decide(inv: dict, checks: dict) -> tuple[str, list]:
    """Auto-approve clean, in-policy invoices; otherwise return the failing checks."""
    failed = [name for name, ok in checks.items() if not ok]
    if not failed and inv["total"] < policy.auto_approve_limit:
        return "auto_approve", []
    # A clean invoice over the limit still needs a named reason for the reviewer.
    return "exception", failed or ["over_auto_approve_limit"]   # e.g. ["po_total"] -> PO mismatch
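Keeping thresholds in config could look like the sketch below. It assumes a small JSON document with auto_approve_limit and tolerance keys, both hypothetical names, and passes the policy in explicitly rather than reading a global, which makes the function easy to test.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    auto_approve_limit: float
    tolerance: float

def load_policy(text: str) -> Policy:
    """Parse the finance-owned config; a real system might read it from a file or table."""
    raw = json.loads(text)
    return Policy(auto_approve_limit=float(raw["auto_approve_limit"]),
                  tolerance=float(raw["tolerance"]))

def decide_with_policy(inv: dict, checks: dict, policy: Policy) -> tuple[str, list]:
    """Same rule as decide, with the limit taken from the loaded policy."""
    failed = [name for name, ok in checks.items() if not ok]
    if not failed and inv["total"] < policy.auto_approve_limit:
        return "auto_approve", []
    return "exception", failed or ["over_auto_approve_limit"]

policy = load_policy('{"auto_approve_limit": 50000, "tolerance": 100}')
print(decide_with_policy({"total": 1000.0}, {"math": True, "po_total": True}, policy))
print(decide_with_policy({"total": 60000.0}, {"math": True, "po_total": True}, policy))
```

Changing the limit is then an edit to the config, not to the code.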

6. Handle exceptions with a human in the loop

Exceptions are a first-class feature, because they are where money is saved or lost. Each framework expresses the retry-then-escalate loop differently: ADK has a LoopAgent that exits when a tool escalates, Claude runs a plain retry budget, and LangGraph uses a conditional edge that routes back until the checks pass or the budget runs out. Whatever cannot be resolved lands on a review surface with the invoice, the fields, and the failing check, and every human correction is stored against the vendor.

Python (Google ADK)

from google.adk.agents import LoopAgent, LlmAgent
from google.adk.tools import ToolContext

def finish_or_escalate(checks: dict, tool_context: ToolContext) -> dict:
    """Exit the loop once every check passes."""
    if all(checks.values()):
        tool_context.actions.escalate = True
    return {"resolved": all(checks.values())}

refiner = LlmAgent(name="refiner", model="gemini-2.5-pro",
    instruction="Re-extract the disputed fields strictly, re-run three_way_match, "
                "then call finish_or_escalate with the new checks.",
    tools=[extract, three_way_match, finish_or_escalate])

exception_loop = LoopAgent(name="exception_loop", sub_agents=[refiner], max_iterations=2)

Python (Claude SDK)

def resolve_exception(pdf: bytes, po, grn, budget: int = 2):
    """Re-extract and re-check up to `budget` times (budget >= 1), then hand off to a human."""
    for _ in range(budget):
        inv = extract(pdf)                     # use a stricter prompt or stronger model here
        checks = three_way_match(inv.model_dump(), po, grn)
        if all(checks.values()):
            return inv, checks                 # clean -> done
    review_queue.add(inv, reason=[k for k, ok in checks.items() if not ok])
    return inv, checks                         # out of retries -> human

Python (LangGraph)

from langgraph.graph import StateGraph, START, END

def refine(state: dict) -> dict:               # re-extract + re-check
    inv = extract(state["pdf"])
    return {"invoice": inv, "tries": state["tries"] + 1,
            "checks": three_way_match(inv.model_dump(), state["po"], state["grn"])}

def route(state: dict) -> str:                 # loop until clean or out of tries
    return END if all(state["checks"].values()) or state["tries"] >= 2 else "refine"

g = StateGraph(dict)
g.add_node("refine", refine)
g.add_edge(START, "refine")
g.add_conditional_edges("refine", route)
exception_loop = g.compile()
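Storing corrections against the vendor can start as simply as the sketch below. The in-memory store and the names record_correction and hints_for are assumptions for illustration; in production this would sit in your database, and the hint text would be appended to the extraction prompt for that vendor.

```python
from collections import defaultdict

corrections: dict[str, list[dict]] = defaultdict(list)   # vendor -> list of fixes

def record_correction(vendor: str, field: str, extracted: str, corrected: str) -> None:
    """Save what the model read and what the reviewer changed it to."""
    corrections[vendor].append(
        {"field": field, "extracted": extracted, "corrected": corrected})

def hints_for(vendor: str, limit: int = 5) -> str:
    """Turn the most recent corrections into prompt text for the next extraction."""
    recent = corrections.get(vendor, [])[-limit:]
    return "\n".join(
        f"For {vendor}, '{c['field']}' was read as {c['extracted']!r} "
        f"but should be {c['corrected']!r}." for c in recent)

record_correction("Acme Corp", "invoice_no", "INV-0O1", "INV-001")
print(hints_for("Acme Corp"))
```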

7. Post to the ERP via a typed tool

On approval, write the bill through your accounting tool's API and attach the source PDF and the decision log. An action is just a typed function the agent can call: ADK auto-wraps a plain function, LangChain uses an @tool decorator, and Claude takes a JSON tool schema with your handler. Swapping the ERP means changing one function while the rest of the pipeline stays the same. Where a tool has no API, emit a validated import file instead of re-keying.

Python (Google ADK)

def post_bill_to_erp(invoice: dict, attachment_id: str) -> dict:
    """Create a bill in the ERP after approval; returns the new bill id."""
    bill = erp.create_bill(vendor=invoice["vendor"], total=invoice["total"],
                           line_items=invoice["line_items"], attachment=attachment_id)
    ledger.mark_posted(invoice["vendor"], invoice["invoice_no"], bill["id"])
    return {"bill_id": bill["id"]}

# ADK reads the signature and wraps the function as a tool automatically:
poster = LlmAgent(name="poster", model="gemini-2.5-flash",
                  instruction="Post the approved invoice with post_bill_to_erp.",
                  tools=[post_bill_to_erp])

Python (Claude SDK)

post_bill = {   # JSON tool schema the model can call
    "name": "post_bill_to_erp",
    "description": "Create a bill in the ERP after approval; returns the bill id.",
    "input_schema": {"type": "object", "required": ["invoice", "attachment_id"],
        "properties": {"invoice": {"type": "object"},
                       "attachment_id": {"type": "string"}}}}

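def run_post(invoice: dict, attachment_id: str) -> dict:   # your handler
    bill = erp.create_bill(vendor=invoice["vendor"], total=invoice["total"],
                           line_items=invoice["line_items"], attachment=attachment_id)
    # Record the posting so the duplicate check can see it, as in the ADK version.
    ledger.mark_posted(invoice["vendor"], invoice["invoice_no"], bill["id"])
    return {"bill_id": bill["id"]}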

Python (LangGraph)

from langchain_core.tools import tool

@tool
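def post_bill_to_erp(invoice: dict, attachment_id: str) -> dict:
    """Create a bill in the ERP after approval; returns the new bill id."""
    bill = erp.create_bill(vendor=invoice["vendor"], total=invoice["total"],
                           line_items=invoice["line_items"], attachment=attachment_id)
    # Record the posting so the duplicate check can see it, as in the ADK version.
    ledger.mark_posted(invoice["vendor"], invoice["invoice_no"], bill["id"])
    return {"bill_id": bill["id"]}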

llm_with_tools = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash").bind_tools([post_bill_to_erp])

8. Assemble the pipeline, add guardrails, and run it

Now wire the steps into one runnable agent. ADK composes them with a SequentialAgent and a Runner, Claude drives a tool-use loop until there is nothing left to call, and LangGraph builds an explicit StateGraph of nodes and edges. Around all three sit the same guardrails: block duplicates on vendor plus invoice number plus content hash before anything posts, hold any invoice where the vendor bank details changed, keep an immutable audit log, and alert when accuracy slips.

Python (Google ADK)

from google.adk.agents import LlmAgent
from google.adk.runners import InMemoryRunner

# One agent with the tools is the simplest version that runs (see the full
# runnable file below). Split into a SequentialAgent plus a LoopAgent for
# retries as your volume grows.
ap_agent = LlmAgent(name="ap_agent", model="gemini-2.5-flash",
    instruction="Extract, three-way match, decide, then post; escalate exceptions.",
    tools=[extract, three_way_match, decide, post_bill_to_erp])

runner = InMemoryRunner(agent=ap_agent, app_name="ap")

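def guard(inv: dict) -> str | None:
    # inv is the extracted invoice plus two fields the Invoice schema above does not
    # carry: "hash" (the content hash captured at intake) and "bank_account" (add it
    # to the schema if you want this check).
    if ledger.exists(inv["vendor"], inv["invoice_no"]) or ledger.seen(inv["hash"]):
        return "duplicate"                     # never pay the same bill twice
    if vendor_master.bank_changed(inv["vendor"], inv["bank_account"]):
        return "bank_detail_change"            # classic fraud vector -> hold
    return None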

Python (Claude SDK)

def run_ap(messages: list) -> "Message":
    """Drive the tool-use loop until the model has no more tools to call."""
    while True:
        resp = client.messages.create(model="claude-opus-4-8", max_tokens=4096,
                                       tools=[post_bill], messages=messages)
        if resp.stop_reason != "tool_use":
            return resp
        messages.append({"role": "assistant", "content": resp.content})
        for blk in resp.content:
            if blk.type == "tool_use":
                out = run_post(**blk.input)     # execute the tool
                messages.append({"role": "user", "content": [
                    {"type": "tool_result", "tool_use_id": blk.id, "content": str(out)}]})

Python (LangGraph)

from langgraph.graph import StateGraph, START, END

g = StateGraph(dict)
g.add_node("extract", lambda s: {"invoice": extract(s["pdf"])})
g.add_node("match",   lambda s: {"checks": three_way_match(
                          s["invoice"].model_dump(), s["po"], s["grn"])})
g.add_node("decide",  lambda s: {"decision": decide(s["invoice"].model_dump(), s["checks"])})
g.add_node("post",    lambda s: {"bill": post_bill_to_erp.invoke(
                          {"invoice": s["invoice"].model_dump(), "attachment_id": s["att"]})})
g.add_edge(START, "extract"); g.add_edge("extract", "match"); g.add_edge("match", "decide")
g.add_conditional_edges("decide",
    lambda s: "post" if s["decision"][0] == "auto_approve" else END)
g.add_edge("post", END)
ap_pipeline = g.compile()
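One way to approximate the immutable audit log in application code is a hash chain, where each entry commits to the one before it. This is a sketch of that technique, not a claim about how any particular ERP or framework logs; AuditLog is a hypothetical class, and a chain like this is tamper-evident rather than tamper-proof, so production systems usually pair it with append-only storage.

```python
import hashlib
import json

class AuditLog:
    """Append-only log where every entry includes the hash of the previous entry."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: list[dict] = []

    @staticmethod
    def _digest(prev: str, event: dict) -> str:
        body = json.dumps(event, sort_keys=True)     # stable serialisation
        return hashlib.sha256((prev + body).encode()).hexdigest()

    def append(self, event: dict) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {"event": event, "prev": prev, "hash": self._digest(prev, event)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = self.GENESIS
        for e in self.entries:
            if e["prev"] != prev or e["hash"] != self._digest(prev, e["event"]):
                return False
            prev = e["hash"]
        return True

log = AuditLog()
log.append({"invoice_no": "INV-001", "decision": "auto_approve", "rule": "all_checks_passed"})
log.append({"invoice_no": "INV-002", "decision": "exception", "rule": "po_total"})
print(log.verify())   # True
```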

Put it together: run it end to end

Here is the whole thing as one runnable file. It points the agent at a single invoice PDF on disk, extracts it, matches it against in-memory stubs (invoice against PO only, since there is no goods-receipt stub), decides, and posts, then prints the result. Replace the four stubs (the purchase-order total, the approved-vendor set, the posted-set, and create_bill) with your own systems. Save it as main.py, put a PDF next to it, and run python main.py sample_invoice.pdf.

Python (full runnable file, ADK)

# main.py  --  run:  python main.py sample_invoice.pdf
# A runnable reference: reads ONE invoice PDF, extracts it, matches it against
# in-memory stubs, decides, and "posts". Replace the four stubs with your systems.
import sys, asyncio
from datetime import date
from pydantic import BaseModel
from google import genai
from google.genai import types
from google.adk.agents import LlmAgent
from google.adk.runners import InMemoryRunner

client = genai.Client()                       # reads GEMINI_API_KEY

# ---- schema (same as the Extract step) ----
class LineItem(BaseModel):
    description: str; quantity: float; unit_price: float; amount: float

class Invoice(BaseModel):
    vendor: str; invoice_no: str; invoice_date: date
    po_number: str | None = None; currency: str = "INR"
    tax: float = 0; total: float; line_items: list[LineItem]

# ---- stubs: replace these with your ERP, ledger, and vendor master ----
PO = {"total": 124500.0}                       # purchase-order total for this vendor
APPROVED_VENDORS = {"Acme Corp"}
POSTED = set()                                 # (vendor, invoice_no) already booked
def create_bill(inv): return {"id": "BILL-1001"}

# ---- tools (the agent calls these) ----
def extract(pdf_path: str) -> dict:
    """Read an invoice PDF at pdf_path and return its fields as JSON."""
    with open(pdf_path, "rb") as f:
        pdf = f.read()
    inv = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=[types.Part.from_bytes(data=pdf, mime_type="application/pdf"),
                  "Extract this invoice into the schema."],
        config=types.GenerateContentConfig(
            response_mime_type="application/json", response_schema=Invoice)).parsed
    return inv.model_dump(mode="json")         # JSON-safe: invoice_date becomes an ISO string

def decide_and_post(invoice: dict) -> dict:
    """Three-way match the invoice, then post it or flag an exception."""
    lines = sum(li["amount"] for li in invoice["line_items"])
    checks = {
        "math":     abs(lines - invoice["total"]) < 1,
        "po_total": abs(invoice["total"] - PO["total"]) <= 100,
        "vendor":   invoice["vendor"] in APPROVED_VENDORS,
        "not_dup":  (invoice["vendor"], invoice["invoice_no"]) not in POSTED,
    }
    failed = [k for k, ok in checks.items() if not ok]
    if failed:
        return {"status": "exception", "failed": failed}
    bill = create_bill(invoice)
    POSTED.add((invoice["vendor"], invoice["invoice_no"]))
    return {"status": "posted", "bill_id": bill["id"]}

# ---- the agent: one model, two tools ----
agent = LlmAgent(
    name="ap_agent", model="gemini-2.5-flash",
    instruction=("Call extract on the file path the user gives, then call "
                 "decide_and_post with the extracted invoice, then report the result."),
    tools=[extract, decide_and_post])

async def main(pdf_path: str):
    runner = InMemoryRunner(agent=agent, app_name="ap")
    session = await runner.session_service.create_session(app_name="ap", user_id="demo")
    message = types.Content(role="user",
        parts=[types.Part(text=f"Process the invoice at {pdf_path}")])
    async for event in runner.run_async(
            user_id="demo", session_id=session.id, new_message=message):
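        # the final event can arrive without text parts, so guard before indexing
        if event.is_final_response() and event.content and event.content.parts:
            print(event.content.parts[0].text)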

if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: python main.py <invoice.pdf>")
    asyncio.run(main(sys.argv[1]))

You should see the agent's final message reporting the decide_and_post result: {'status': 'posted', 'bill_id': 'BILL-1001'} when the invoice matches the purchase order, or {'status': 'exception', 'failed': [...]} naming the checks that did not pass. The model writes that message, so the exact wording can vary between runs. The per-step tabs above are the same logic split into stages; one agent with two tools, as here, is the simplest version that runs. Split it back into a SequentialAgent with a LoopAgent for retries as your volume grows.
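
The matching rules can also be exercised without a model call or a PDF. The sketch below restates the four checks from decide_and_post as a standalone function and feeds it hand-built invoice dicts. The function name run_checks, the sample values, and passing the stubs in as parameters are assumptions for illustration; they are not part of main.py.

```python
# Offline check of the matching rules: no model call, no PDF.
# run_checks and the sample data below are illustrative assumptions.

def run_checks(invoice: dict, po_total: float,
               approved_vendors: set, posted: set) -> list[str]:
    """Return the names of the checks that failed (empty list means clean)."""
    lines = sum(li["amount"] for li in invoice["line_items"])
    checks = {
        "math":     abs(lines - invoice["total"]) < 1,
        "po_total": abs(invoice["total"] - po_total) <= 100,
        "vendor":   invoice["vendor"] in approved_vendors,
        "not_dup":  (invoice["vendor"], invoice["invoice_no"]) not in posted,
    }
    return [name for name, ok in checks.items() if not ok]

# A clean invoice that matches the stubbed purchase order.
good = {
    "vendor": "Acme Corp", "invoice_no": "INV-001", "total": 124500.0,
    "line_items": [{"description": "Widgets", "quantity": 100,
                    "unit_price": 1245.0, "amount": 124500.0}],
}
# Same lines, but an unapproved vendor and a total that no longer adds up.
bad = {**good, "vendor": "Unknown Ltd", "total": 130000.0}

clean = run_checks(good, 124500.0, {"Acme Corp"}, set())
failed = run_checks(bad, 124500.0, {"Acme Corp"}, set())
duplicate = run_checks(good, 124500.0, {"Acme Corp"}, {("Acme Corp", "INV-001")})
print(clean, failed, duplicate)
```

Running cases like these first makes it easier to tell a rule problem from an extraction problem once real PDFs are involved.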

The review interface

The human sees only exceptions. The console shows the invoice, the extracted fields, the match result, and one decision to make.

Build, buy, or no-code

Approach	Best for	Effort	Cost	Control
Custom agent (build in-house)	You need control, your own rules, and tight ERP fit	High	Build cost + model usage	Full
Document-AI platform (Nanonets, Rossum)	Standard AP, fast start, less engineering	Low	Per-page or seat subscription	Medium
No-code (n8n, Make)	Low volume, a quick pilot, simple rules	Low	Cheap, but brittle at scale	Low

Security, compliance, and fraud

AP touches money, so these controls matter as much as the extraction. Run the model through Vertex AI or your provider's enterprise tier, with the region pinned for data residency where GDPR or your SOC 2 controls require it, and keep the Document AI fallback on-premises for anything that cannot leave your network. Enforce separation of duties so the system that approves is not the one that pays, and keep an immutable audit log of every action.
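
One way to make an audit log tamper-evident is to chain entries by hash, so that editing any past entry breaks every hash after it. The sketch below is a minimal in-memory illustration of that idea. The AuditLog class and its field names are assumptions, not part of the pipeline above, and a production log would also need write-once storage or database-level controls.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(body: dict) -> str:
    """Stable SHA-256 over the entry body (keys sorted for determinism)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

class AuditLog:
    """Append-only log where each entry commits to the one before it."""

    def __init__(self):
        self._entries: list[dict] = []

    def append(self, actor: str, action: str, detail: dict) -> dict:
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        body = {"ts": datetime.now(timezone.utc).isoformat(),
                "actor": actor, "action": action,
                "detail": detail, "prev": prev}
        entry = {**body, "hash": _digest(body)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check that the chain links line up."""
        prev = GENESIS
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True

demo_log = AuditLog()
demo_log.append("ap_agent", "extract", {"invoice_no": "INV-001"})
demo_log.append("ap_agent", "post", {"bill_id": "BILL-1001"})
print(demo_log.verify())
```

The chain only detects tampering; it does not prevent it. Periodically anchoring the latest hash somewhere the AP system cannot write to is what lets finance defend a posting later.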

Two fraud vectors matter most. Duplicate payments: blocked by the vendor, invoice-number, and content-hash check in the guard before posting. Changed bank details: any account that differs from the vendor master is held for human confirmation, since redirecting payments to a new account is the classic invoice fraud.
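
Both checks can be sketched as one guard that runs before posting. This is an illustration under stated assumptions, not the guard from the earlier step: the FraudGuard class, the bank_account argument (the Invoice schema above has no such field), and hashing the raw PDF bytes as the content hash are all hypothetical choices.

```python
import hashlib

class FraudGuard:
    """Holds an invoice that looks like a duplicate or a bank-detail change."""

    def __init__(self, vendor_bank: dict[str, str]):
        self.vendor_bank = vendor_bank                  # vendor -> account on file
        self.seen_keys: set[tuple[str, str]] = set()    # (vendor, invoice_no)
        self.seen_hashes: set[str] = set()              # SHA-256 of posted PDFs

    def check(self, vendor: str, invoice_no: str,
              bank_account: str, pdf_bytes: bytes) -> list[str]:
        """Return the reasons to hold the invoice (empty list means clear)."""
        holds = []
        digest = hashlib.sha256(pdf_bytes).hexdigest()
        if (vendor, invoice_no) in self.seen_keys or digest in self.seen_hashes:
            holds.append("duplicate")
        if self.vendor_bank.get(vendor) != bank_account:
            holds.append("bank_change")                 # needs human confirmation
        return holds

    def record_posted(self, vendor: str, invoice_no: str, pdf_bytes: bytes) -> None:
        """Remember a posted invoice so a resend is caught next time."""
        self.seen_keys.add((vendor, invoice_no))
        self.seen_hashes.add(hashlib.sha256(pdf_bytes).hexdigest())

guard = FraudGuard({"Acme Corp": "IN00-1234-5678"})
pdf = b"%PDF-1.7 sample bytes"

first = guard.check("Acme Corp", "INV-001", "IN00-1234-5678", pdf)
guard.record_posted("Acme Corp", "INV-001", pdf)
resend = guard.check("Acme Corp", "INV-001", "IN00-1234-5678", pdf)
new_account = guard.check("Acme Corp", "INV-002", "IN99-0000-0000", b"other bytes")
print(first, resend, new_account)
```

A byte-level hash only catches exact resends of the same file. A rescanned copy of the same invoice would need the vendor and invoice-number key, or a hash over normalized extracted fields, to be caught.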

The numbers

Benchmarks vary by source and should be treated as ranges, not promises. These are from neutral industry bodies, not vendor marketing.

Metric	Manual	Automated / agent
Cost per invoice	~$15 to $40 (Ardent Partners)	Top-quartile teams run ~$10 or below (APQC)
Invoices per person / year	~4,200 (IOFM average)	~6,900 at top performers (IOFM)
Early-payment discounts captured	20 to 30% (Ardent Partners)	Higher, because nothing is paid late by accident
Touchless rate	Low; most invoices are keyed by hand	Climbs as the auto-approve threshold proves out

Sources: Ardent Partners AP (accounts payable) Metrics That Matter; APQC AP benchmarking; IOFM (Institute of Finance and Management). Figures are industry ranges.

What makes it fail

Trusting a low-confidence extraction instead of routing it into the exception loop.
No immutable audit trail, so finance cannot defend a posting.
Template parsing that breaks when a vendor changes layout.
No dedup or bank-change guard, so it pays twice or pays a fraudster.
A brittle ERP integration that silently drops postings under load.

A safe rollout

Pilot on one high-volume vendor format, with a human approving every invoice.
Turn on auto-approve for clean, in-policy invoices once accuracy proves out; keep exceptions manual.
Widen format by format, raise the auto-approve limit, and add monitoring and alerts.
Scale to all vendors and entities on one pipeline, each with its own rules.
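
The stages above can be expressed as a small policy object that decides where each invoice goes, so moving to the next stage is a configuration change rather than a code change. This is a sketch under assumptions: RolloutPolicy, route, the three route names, and the sample limit are hypothetical and not part of the pipeline above.

```python
from dataclasses import dataclass, field

@dataclass
class RolloutPolicy:
    auto_approve: bool = False              # pilot stage: a human approves everything
    auto_approve_limit: float = 0.0         # largest total posted without review
    enabled_vendors: set = field(default_factory=set)  # formats onboarded so far

def route(policy: RolloutPolicy, vendor: str, total: float,
          failed_checks: list[str]) -> str:
    """Return 'manual', 'review', or 'auto_post' for one invoice."""
    if vendor not in policy.enabled_vendors:
        return "manual"        # not onboarded yet: stays in the existing process
    if failed_checks:
        return "review"        # exceptions always go to a human
    if policy.auto_approve and total <= policy.auto_approve_limit:
        return "auto_post"     # clean and within the limit
    return "review"            # clean, but above the limit or still in pilot

pilot = RolloutPolicy(enabled_vendors={"Acme Corp"})
stage2 = RolloutPolicy(auto_approve=True, auto_approve_limit=50000.0,
                       enabled_vendors={"Acme Corp"})

print(route(pilot, "Acme Corp", 12000.0, []))
print(route(stage2, "Acme Corp", 12000.0, []))
print(route(stage2, "Acme Corp", 124500.0, []))
print(route(stage2, "Other Ltd", 12000.0, []))
```

Widening the rollout then means adding vendors to enabled_vendors and raising auto_approve_limit, with each change logged like any other action.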

Want this built and running in your stack?

Galific designs, builds, and runs agents like this on ADK, the Claude SDK, or LangGraph, integrated with your ERP and your rules. Or explore the ready-made versions in our agent suite.
