webclaw

Go SDK

The Go SDK provides a typed client for every webclaw endpoint. Zero dependencies beyond the standard library, context.Context on every method, and functional options for configuration.

Installation

```sh
go get github.com/0xMassi/webclaw-go
```

Note: Requires Go 1.21 or later. No external dependencies.

Configuration

Create a client with your API key and optional functional options.

Basic
```go
import webclaw "github.com/0xMassi/webclaw-go"

client := webclaw.NewClient("wc_your_api_key")
```

Options

| Function | Default | Description |
| --- | --- | --- |
| `WithBaseURL(url)` | `https://api.webclaw.io` | Override for self-hosted instances. |
| `WithTimeout(d)` | `30s` | HTTP client timeout. |
| `WithHTTPClient(hc)` | default `http.Client` | Replace the underlying HTTP client entirely. |
All options
```go
client := webclaw.NewClient(
    "wc_your_api_key",
    webclaw.WithBaseURL("https://api.webclaw.io"),
    webclaw.WithTimeout(60 * time.Second),
    webclaw.WithHTTPClient(&http.Client{
        Transport: customTransport,
    }),
)
```

Scrape

```go
result, err := client.Scrape(ctx, webclaw.ScrapeRequest{
    URL:              "https://example.com",
    Formats:          []webclaw.Format{webclaw.FormatMarkdown, webclaw.FormatLLM},
    IncludeSelectors: []string{"article", ".content"},
    ExcludeSelectors: []string{"nav", "footer"},
    OnlyMainContent:  true,
    NoCache:          true,
})
if err != nil {
    log.Fatal(err)
}

fmt.Println(result.URL)
fmt.Println(result.Markdown)
fmt.Println(result.LLM)
fmt.Println(result.Cache.Status)  // "hit" | "miss" | "bypass"
```

Crawl

Start an async crawl, then poll or wait for completion.

```go
job, err := client.Crawl(ctx, webclaw.CrawlRequest{
    URL:        "https://example.com",
    MaxDepth:   3,
    MaxPages:   100,
    UseSitemap: true,
})
if err != nil {
    log.Fatal(err)
}

// Block until complete (polls every 2s, 5min timeout)
status, err := client.WaitForCrawl(ctx, job.ID, 2*time.Second, 5*time.Minute)
if err != nil {
    log.Fatal(err)
}

fmt.Println(status.Status)    // "completed" | "failed"
fmt.Println(status.Total)     // pages discovered
fmt.Println(status.Completed) // pages crawled

for _, page := range status.Pages {
    fmt.Printf("%s: %d bytes\n", page.URL, len(page.Markdown))
}
```
Tip: WaitForCrawl respects the context deadline; pass a context with a timeout for upper-bound control.

Map

```go
result, err := client.Map(ctx, webclaw.MapRequest{
    URL: "https://example.com",
})
if err != nil {
    log.Fatal(err)
}

fmt.Println(result.Count)
for _, u := range result.URLs {
    fmt.Println(u)
}
```

Batch

```go
result, err := client.Batch(ctx, webclaw.BatchRequest{
    URLs:        []string{"https://a.com", "https://b.com", "https://c.com"},
    Formats:     []webclaw.Format{webclaw.FormatMarkdown},
    Concurrency: 5,
})
if err != nil {
    log.Fatal(err)
}

for _, item := range result.Results {
    if item.Error != "" {
        fmt.Printf("FAIL %s: %s\n", item.URL, item.Error)
    } else {
        fmt.Printf("OK   %s: %d chars\n", item.URL, len(item.Markdown))
    }
}
```

Extract

LLM-powered structured extraction. Pass a JSON schema or a plain-text prompt.

Schema-based
```go
result, err := client.Extract(ctx, webclaw.ExtractRequest{
    URL:    "https://example.com/pricing",
    Schema: json.RawMessage(`{
        "type": "object",
        "properties": {
            "plans": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "price": {"type": "string"}
                    }
                }
            }
        }
    }`),
})
if err != nil {
    log.Fatal(err)
}

fmt.Println(string(result.Data))
```
Prompt-based
```go
result, err := client.Extract(ctx, webclaw.ExtractRequest{
    URL:    "https://example.com",
    Prompt: "Extract all pricing tiers with names and monthly prices",
})
if err != nil {
    log.Fatal(err)
}

fmt.Println(string(result.Data))
```

Summarize

```go
result, err := client.Summarize(ctx, webclaw.SummarizeRequest{
    URL:          "https://example.com",
    MaxSentences: 3,
})
if err != nil {
    log.Fatal(err)
}

fmt.Println(result.Summary)
```

Brand

```go
result, err := client.Brand(ctx, webclaw.BrandRequest{
    URL: "https://example.com",
})
if err != nil {
    log.Fatal(err)
}

fmt.Println(string(result.Data))
```

Error handling

API errors are returned as *webclaw.APIError with a status code and message. Use errors.As to unwrap.

| Field | Type | Description |
| --- | --- | --- |
| `StatusCode` | `int` | HTTP status code (401, 404, 429, etc.) |
| `Message` | `string` | Human-readable error message |
```go
result, err := client.Scrape(ctx, req)
if err != nil {
    var apiErr *webclaw.APIError
    if errors.As(err, &apiErr) {
        switch apiErr.StatusCode {
        case 401:
            fmt.Println("Check your API key")
        case 429:
            fmt.Println("Rate limited, slow down")
        default:
            fmt.Printf("API error %d: %s\n", apiErr.StatusCode, apiErr.Message)
        }
    } else {
        // Network error, timeout, etc.
        fmt.Println("Request failed:", err)
    }
}
```

Types and constants

The SDK exports typed constants for formats and crawl statuses.

```go
// Output formats
webclaw.FormatMarkdown  // "markdown"
webclaw.FormatText      // "text"
webclaw.FormatLLM       // "llm"
webclaw.FormatJSON      // "json"

// Crawl statuses
webclaw.CrawlStatusRunning    // "running"
webclaw.CrawlStatusCompleted  // "completed"
webclaw.CrawlStatusFailed     // "failed"

// Cache statuses
webclaw.CacheHit     // "hit"
webclaw.CacheMiss    // "miss"
webclaw.CacheBypass  // "bypass"
```

Source

github.com/0xMassi/webclaw-go