Web Browsing

Intermediate | 🔧 Tool Use Patterns | Anthropic / Industry practice

Intent

Give agents the ability to navigate websites, extract information, fill forms, and interact with web applications.

Problem

Much of the world's information, and many of its services, live on the web. A typical LLM tool integration, however, reaches external systems only through APIs, and many systems don't offer one. To use those systems, the agent has to interact with the web as a human would: reading pages, clicking, typing, and navigating.

Solution

Connect the agent to a headless browser (e.g. Playwright or Puppeteer) that it controls through a small set of actions: navigate to a URL, click an element, type text, extract content, take a screenshot. After each action, the agent observes the page (as HTML, a screenshot, or the accessibility tree) and decides what to do next. This gives the agent access to any web-based system, not just those with APIs.
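The action set above can be sketched as a small dispatcher that turns one model-issued action into a call on a browser page. This is a minimal sketch under stated assumptions. It assumes the model emits one JSON action per turn, such as {"action": "click", "selector": "#search"}, and that JSON shape is hypothetical, as are the names execute_action, PageLike, and FakePage. The page methods it calls (goto, click, fill, inner_text) do exist on Playwright's async Page, so a real page could be passed in place of the offline stand-in.

```python
import asyncio
import json
from typing import Any, Protocol


class PageLike(Protocol):
    """The subset of Playwright's async Page API this sketch relies on."""

    async def goto(self, url: str) -> Any: ...
    async def click(self, selector: str) -> None: ...
    async def fill(self, selector: str, value: str) -> None: ...
    async def inner_text(self, selector: str) -> str: ...


async def execute_action(page: PageLike, action_json: str) -> str:
    """Run one model-issued action; the returned string is the next observation.

    Assumed (hypothetical) action format: {"action": <name>, ...arguments}.
    """
    try:
        action = json.loads(action_json)
        name = action["action"]
        if name == "navigate":
            await page.goto(action["url"])
            return f"navigated to {action['url']}"
        if name == "click":
            await page.click(action["selector"])
            return f"clicked {action['selector']}"
        if name == "type":
            # fill() replaces the field's value in one step
            await page.fill(action["selector"], action["text"])
            return f"typed into {action['selector']}"
        if name == "extract":
            return await page.inner_text(action.get("selector", "body"))
        return f"error: unknown action {name!r}"
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        # Malformed actions are reported back so the model can correct itself
        return f"error: bad action ({exc!r})"


class FakePage:
    """Offline stand-in that records calls instead of driving a real browser."""

    def __init__(self) -> None:
        self.log: list[str] = []

    async def goto(self, url: str) -> None:
        self.log.append(f"goto {url}")

    async def click(self, selector: str) -> None:
        self.log.append(f"click {selector}")

    async def fill(self, selector: str, value: str) -> None:
        self.log.append(f"fill {selector}={value}")

    async def inner_text(self, selector: str) -> str:
        return f"<text of {selector}>"


if __name__ == "__main__":
    demo_page = FakePage()
    print(asyncio.run(execute_action(demo_page, '{"action": "type", "selector": "#from", "text": "New York"}')))
```

The design choice here is that errors become observations rather than exceptions. This lets the model see that a selector or argument was wrong and try something else on its next turn.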

Diagram

Agent: navigate("https://flights.example.com")
  → [Browser renders page] → [Screenshot/HTML returned]
Agent: type("#from", "New York")
Agent: type("#to", "London")
Agent: click("#search")
  → [Results page] → [Extract flight data]
Agent: "The cheapest flight is $450 on March 15"
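In the trace above, the agent targets elements like #from and #search based on the HTML returned after each action. Raw page markup is long and mostly irrelevant, so it is usually compressed before it reaches the model. The following is a rough sketch using only Python's standard html.parser. It assumes that listing interactive elements that carry an id or name attribute is enough for the agent to choose its targets. The names simplify_html and InteractiveElements, and the one-line-per-element output format, are hypothetical. Many production agents use the accessibility tree or screenshots instead.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveElements(HTMLParser):
    """Collects one line per interactive element that has a stable selector."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag not in INTERACTIVE_TAGS:
            return
        a = dict(attrs)
        # Assumes ids are CSS-safe; otherwise fall back to the name attribute
        if a.get("id"):
            selector = f"#{a['id']}"
        elif a.get("name"):
            selector = f'{tag}[name="{a["name"]}"]'
        else:
            return  # no id or name: skipped in this sketch
        hint = a.get("placeholder") or a.get("aria-label") or a.get("type") or ""
        # Hypothetical compact format: "<tag> <selector> <hint>"
        self.lines.append(f"{tag} {selector} {hint}".rstrip())


def simplify_html(html: str) -> str:
    """Reduce raw HTML to a compact list of elements the agent can act on."""
    parser = InteractiveElements()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


if __name__ == "__main__":
    sample = '<input id="from" placeholder="From"><input id="to" placeholder="To"><button id="search">Search</button>'
    print(simplify_html(sample))
```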

When to Use

Tasks requiring interaction with websites that don't have APIs
Web research that goes beyond simple search
Automating web-based workflows (booking, form filling, data extraction)
Testing web applications

When NOT to Use

When an API is available (APIs are faster and more reliable)
Simple information retrieval (use web search instead)
When security policies prohibit browser automation

Pros & Cons

Pros

Access to any web-based system or information
Can perform actions (not just read) on websites
Works with systems that have no API
Visual understanding via screenshots

Cons

Slow compared to API calls
Fragile — websites change their layout
Complex to implement reliably
Security and privacy risks (handling credentials, acting on untrusted page content)

Implementation Steps

1. Set up a headless browser environment (Playwright recommended)
2. Define browser tools: navigate, click, type, scroll, screenshot, extract
3. Choose an observation format: simplified HTML, accessibility tree, or screenshots
4. Implement action parsing and execution
5. Add error recovery for common web issues (slow page loads, pop-ups, CAPTCHAs)
6. Set timeouts and limit the number of navigation steps
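Steps 4 to 6 come together in the agent's control loop. The following is a minimal sketch. It assumes the model call and the browser action are supplied as async callables, here called decide and act. Both names are hypothetical, and act could be a dispatcher over a Playwright page. The defaults are illustrative values, not recommendations from the source: 15 steps, a 30-second per-action timeout, and two retries with exponential backoff.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

# Hypothetical callables:
# decide: observation -> next action (e.g. JSON), or None when the goal is met
Decide = Callable[[str], Awaitable[Optional[str]]]
# act: action -> new observation (e.g. a dispatcher over a browser page)
Act = Callable[[str], Awaitable[str]]


async def run_agent(
    decide: Decide,
    act: Act,
    first_observation: str,
    max_steps: int = 15,
    step_timeout: float = 30.0,
    retries: int = 2,
    backoff: float = 0.5,
) -> List[str]:
    """Observe-act loop with a step budget, per-action timeout, and retries."""
    observation = first_observation
    history: List[str] = []
    for _ in range(max_steps):
        action = await decide(observation)
        if action is None:
            return history  # the model reports the goal as done
        for attempt in range(retries + 1):
            try:
                observation = await asyncio.wait_for(act(action), timeout=step_timeout)
                break
            except Exception as exc:  # timeouts, missing selectors, navigation errors
                # If every attempt fails, the last error text becomes the
                # observation, so the model can choose a different action
                observation = f"error: {exc!r}"
                if attempt < retries:
                    await asyncio.sleep(backoff * 2 ** attempt)
        history.append(action)
    raise RuntimeError(f"step budget of {max_steps} exhausted")
```

Raising an error when the budget runs out, instead of returning silently, makes an agent that is stuck in a loop visible to whatever scheduled or invoked it.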

Real-World Example

Competitive Price Monitoring

Agent navigates to competitor websites, searches for specific products, extracts pricing data, handles pagination and different page layouts. Results are compiled into a comparison spreadsheet. Runs on a schedule to track price changes over time.
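One narrow piece of this pipeline can be sketched with the standard library alone: normalizing scraped price strings and compiling them into the comparison spreadsheet. The sketch assumes prices appear as US-dollar strings and that each scheduled run yields (competitor, product, raw price text) tuples. The function names, the regular expression, and the CSV columns are all hypothetical. Navigation and pagination would be handled by browser actions like those described in the Solution section.

```python
import csv
import io
import re
from datetime import date
from typing import List, Optional, Tuple

# Assumes US-dollar prices like "$1,299.00" or "$45"; other currencies need their own pattern
PRICE_RE = re.compile(r"\$\s?(\d+(?:,\d{3})*(?:\.\d{2})?)")


def parse_price(text: str) -> Optional[float]:
    """Return the first dollar amount in scraped text, or None if there is none."""
    match = PRICE_RE.search(text)
    return float(match.group(1).replace(",", "")) if match else None


def to_comparison_csv(rows: List[Tuple[str, str, str]], run_date: date) -> str:
    """Build CSV text from (competitor, product, raw price text) tuples of one run."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["date", "competitor", "product", "price_usd"])
    for competitor, product, raw in rows:
        price = parse_price(raw)
        # Leave the cell empty rather than guessing when no price was found
        writer.writerow([run_date.isoformat(), competitor, product, "" if price is None else f"{price:.2f}"])
    return buf.getvalue()
```

Recording the run date in every row lets successive scheduled runs be appended to the same file, which is what tracking prices over time requires.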

Python: Browser Automation with Playwright

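import asyncio

from openai import AsyncOpenAI
from playwright.async_api import async_playwright

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment


async def browse_and_extract(url: str, goal: str) -> str:
    # Render the page in headless Chromium and capture its visible text
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            page = await browser.new_page()
            # Stop waiting once the DOM is parsed; fail after 30 s instead of hanging
            await page.goto(url, wait_until="domcontentloaded", timeout=30_000)
            title = await page.title()
            content = await page.inner_text("body")
        finally:
            # Close the browser even if navigation or extraction fails
            await browser.close()

    # Ask the model for what the goal needs; the page text is truncated
    # to 3,000 characters to keep the prompt small
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Extract information from web page content."},
            {"role": "user", "content": f"Page: {title}\n\n{content[:3000]}\n\nGoal: {goal}"},
        ],
    )
    # message.content can be None, so fall back to an empty string
    return response.choices[0].message.content or ""


if __name__ == "__main__":
    print(asyncio.run(browse_and_extract("https://example.com", "Summarize this page")))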

References

Computer Use — Anthropic