ptrnsai

Web Browsing

Intermediate · 🔧 Tool Use Patterns · Anthropic / Industry practice

Intent

Give agents the ability to navigate websites, extract information, fill forms, and interact with web applications.

Problem

Much of the world's information and services live on the web, but typical LLM tool integrations only reach systems that expose an API, and many systems do not. To use those systems, the agent needs to interact with the web as a human would (clicking, reading, navigating) to access information and complete tasks.

Solution

Connect the agent to a headless browser, driven by an automation library such as Playwright or Puppeteer, that it can control through a small set of actions: navigate to URL, click element, type text, extract content, take screenshot. After each action the agent observes the page (via HTML, screenshots, or the accessibility tree) and decides what action to take next. This gives agents access to web-based systems in general, not just those with APIs.
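As a rough illustration of the action set described above, the sketch below maps each action onto a page object. The `Action` schema, the `execute` function, and the `RecordingPage` stand-in are hypothetical names introduced here; the method names it calls (`goto`, `click`, `fill`, `inner_text`, `screenshot`) mirror Playwright's synchronous `Page` API, so a real Playwright page could be passed in place of the stand-in.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Action:
    """One browser command chosen by the agent (hypothetical schema)."""
    name: str  # "navigate", "click", "type", "extract", or "screenshot"
    args: dict[str, Any] = field(default_factory=dict)


def execute(page: Any, action: Action) -> Any:
    """Map an Action onto Playwright-style sync Page methods."""
    if action.name == "navigate":
        return page.goto(action.args["url"])
    if action.name == "click":
        return page.click(action.args["selector"])
    if action.name == "type":
        return page.fill(action.args["selector"], action.args["text"])
    if action.name == "extract":
        return page.inner_text(action.args.get("selector", "body"))
    if action.name == "screenshot":
        return page.screenshot()
    raise ValueError(f"unknown action: {action.name}")


class RecordingPage:
    """Stand-in for a real page so the sketch runs without a browser."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    def goto(self, url: str) -> None:
        self.calls.append(("goto", url))

    def click(self, selector: str) -> None:
        self.calls.append(("click", selector))

    def fill(self, selector: str, text: str) -> None:
        self.calls.append(("fill", selector, text))

    def inner_text(self, selector: str) -> str:
        self.calls.append(("inner_text", selector))
        return "page text"

    def screenshot(self) -> bytes:
        self.calls.append(("screenshot",))
        return b""


page = RecordingPage()
execute(page, Action("navigate", {"url": "https://flights.example.com"}))
execute(page, Action("type", {"selector": "#from", "text": "New York"}))
print(page.calls)
```

Keeping the dispatcher separate from the browser object makes it easy to test the agent's decisions without launching a browser, and unknown action names fail loudly instead of being silently ignored.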

Diagram

Agent: navigate("https://flights.example.com")
  → [Browser renders page] → [Screenshot/HTML returned]
Agent: type("#from", "New York")
Agent: type("#to", "London")
Agent: click("#search")
  → [Results page] → [Extract flight data]
Agent: "The cheapest flight is $450 on March 15"
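If the model emits actions as text in the call-like form shown in the diagram, they have to be parsed before execution. The sketch below is one possible approach, assuming that exact textual form; `parse_action` and the `ALLOWED` set are hypothetical names. It uses Python's `ast` module to read the call without evaluating anything, and rejects any action name that is not on the allow-list.

```python
import ast

ALLOWED = {"navigate", "type", "click"}  # the action names used in the diagram


def parse_action(text: str) -> tuple[str, list[str]]:
    """Parse 'type("#from", "New York")' into ("type", ["#from", "New York"])."""
    try:
        node = ast.parse(text.strip(), mode="eval").body
    except SyntaxError as exc:
        raise ValueError(f"not a valid action: {text!r}") from exc
    if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
        raise ValueError(f"not a function-call action: {text!r}")
    if node.func.id not in ALLOWED:
        raise ValueError(f"unknown action: {node.func.id}")
    if node.keywords:
        raise ValueError("keyword arguments are not supported")
    args: list[str] = []
    for arg in node.args:
        # Only literal strings are accepted; nothing is ever evaluated.
        if not (isinstance(arg, ast.Constant) and isinstance(arg.value, str)):
            raise ValueError("arguments must be string literals")
        args.append(arg.value)
    return node.func.id, args


print(parse_action('type("#from", "New York")'))
# ('type', ['#from', 'New York'])
```

In practice many agent frameworks receive actions as structured tool calls (JSON) rather than free text, in which case this parsing step is replaced by schema validation.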

When to Use

  • Tasks requiring interaction with websites that don't have APIs
  • Web research that goes beyond simple search
  • Automating web-based workflows (booking, form filling, data extraction)
  • Testing web applications

When NOT to Use

  • When an API is available (APIs are faster and more reliable)
  • Simple information retrieval (use web search instead)
  • When security policies prohibit browser automation

Pros & Cons

Pros

  • Access to any web-based system or information
  • Can perform actions (not just read) on websites
  • Works with systems that have no API
  • Visual understanding via screenshots

Cons

  • Slow compared to API calls
  • Fragile: selectors and flows break when websites change their layout
  • Complex to implement reliably
  • Security and privacy considerations

Implementation Steps

  1. Set up a headless browser environment (Playwright recommended)
  2. Define browser tools: navigate, click, type, scroll, screenshot, extract
  3. Choose observation format: simplified HTML, accessibility tree, or screenshots
  4. Implement action parsing and execution
  5. Add error recovery for common web issues (loading, popups, CAPTCHAs)
  6. Set timeouts and limit the number of navigation steps
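The steps above come together in a control loop. The following is a minimal sketch, not a definitive implementation: `run_agent`, `observe`, `decide`, and `act` are hypothetical names, with the three callables standing in for the observation format, the model call, and the browser tools respectively. It shows a hard cap on steps and a bounded retry when an action fails.

```python
from typing import Callable, Optional

Observation = str
Command = tuple[str, list[str]]  # e.g. ("click", ["#search"])


def run_agent(
    observe: Callable[[], Observation],
    decide: Callable[[Observation, list[str]], Optional[Command]],
    act: Callable[[Command], None],
    max_steps: int = 10,
    max_retries: int = 2,
) -> list[str]:
    """Observe, decide, act until decide() returns None or the step budget runs out."""
    history: list[str] = []
    for _ in range(max_steps):
        command = decide(observe(), history)
        if command is None:  # the agent reports that it is done
            history.append("done")
            return history
        for _attempt in range(max_retries + 1):
            try:
                act(command)
                history.append(f"ok {command[0]}")
                break
            except Exception as exc:  # e.g. a timeout or a missing element
                history.append(f"error {command[0]}: {exc}")
        # If every retry failed, decide() sees the errors in history next turn.
    history.append("step limit reached")
    return history


# Scripted stand-ins so the loop runs without a model or a browser.
script = iter([("navigate", ["https://flights.example.com"]), ("click", ["#search"]), None])
log = run_agent(
    observe=lambda: "<page>",
    decide=lambda obs, hist: next(script),
    act=lambda command: None,
)
print(log)  # ['ok navigate', 'ok click', 'done']
```

With a real Playwright page, per-action timeouts can be set once with `page.set_default_timeout(...)` (milliseconds), so that a stalled page surfaces as an exception the retry logic can handle.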

Real-World Example

Competitive Price Monitoring

The agent navigates to competitor websites, searches for specific products, extracts pricing data, and handles pagination and differing page layouts. Results are compiled into a comparison spreadsheet, and the job runs on a schedule to track price changes over time.
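The extraction and compilation part of this workflow could look roughly like the sketch below. Everything in it is an assumption made for illustration: the page texts are invented, prices are assumed to be US-dollar amounts written with a `$` sign, and `extract_prices` and `to_csv` are hypothetical helper names. A real monitor would get the text from the browser and would likely need site-specific selectors.

```python
import csv
import io
import re
from decimal import Decimal

# Matches "$450", "$1,299.00", "$ 19.99" (assumed USD formatting).
PRICE_RE = re.compile(r"\$\s?(\d[\d,]*(?:\.\d{2})?)")


def extract_prices(text: str) -> list[Decimal]:
    """Return every dollar amount found in the page text, in order."""
    return [Decimal(m.replace(",", "")) for m in PRICE_RE.findall(text)]


def to_csv(rows: list[tuple[str, str, Decimal]]) -> str:
    """Compile (competitor, product, price) rows into CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["competitor", "product", "price_usd"])
    for competitor, product, price in rows:
        writer.writerow([competitor, product, str(price)])
    return buf.getvalue()


# Invented page texts standing in for what the browser would return.
pages = {
    "shop-a.example": "Widget Pro now $1,299.00 (was $1,499.00)",
    "shop-b.example": "Widget Pro: $1249.50, free shipping",
}
# Assumption: the lowest amount on the page is the current price.
rows = [(site, "Widget Pro", min(extract_prices(text))) for site, text in pages.items()]
print(to_csv(rows))
```

Using `Decimal` rather than `float` avoids rounding surprises when prices are later compared or differenced across runs.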

Python: Browser Automation with Playwright
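import asyncio

from openai import AsyncOpenAI
from playwright.async_api import async_playwright

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

MAX_CHARS = 3000  # cap on how much page text is sent to the model


async def browse_and_extract(url: str, goal: str) -> str:
    """Load a page in headless Chromium and ask the model to answer `goal` from its text."""
    async with async_playwright() as p:
        browser = await p.chromium.launch()  # headless by default
        try:
            page = await browser.new_page()
            await page.goto(url, timeout=30_000)  # milliseconds
            title = await page.title()
            content = await page.inner_text("body")
        finally:
            await browser.close()  # close the browser even if navigation fails

    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Extract information from web page content."},
            {"role": "user", "content": f"Page: {title}\n\n{content[:MAX_CHARS]}\n\nGoal: {goal}"},
        ],
    )
    # message.content can be None, so fall back to an empty string.
    return response.choices[0].message.content or ""


if __name__ == "__main__":
    print(asyncio.run(browse_and_extract("https://example.com", "Summarize this page")))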
