AI-Powered Content Audit for 30,000+ Pages

Faced with thousands of pages of unmanaged content, an enterprise brand teamed up with Workshop Digital to deploy an AI-driven audit, unlocking actionable insights and a cleaner, more purposeful site architecture.

The Challenge

Our enterprise client needed to organize over 30,000 pages across ten decentralized websites. Years of unstructured publishing left internal and public content mixed together, making it nearly impossible to determine what should stay public without a massive manual review.

The Outcome

Using Screaming Frog’s AI integration, we automatically categorized each page by audience intent. The result: a clear, data-backed roadmap for content migration, hundreds of hours saved, and a cleaner, audience-focused website.

The Story

An enterprise client managing ten separate websites and tens of thousands of pages reached out to Workshop Digital. Their challenge: a fragmented, decentralized content environment where multiple departments published pages (many meant for internal use) that were accessible from the public site. Their goal: ahead of a major web overhaul and migration, clearly identify which pages belong on an external-facing (public) website and which belong on an intranet for internal audiences.

Because of the volume (30,000+ pages), manual auditing was impractical. The client needed an automated, scalable solution that could reason about page intent and audience, not just crawl URLs. This set the stage for a partnership leveraging Screaming Frog’s AI integration to categorize content at scale.

The Challenge

The client’s ten websites contained thousands of pages, including PDFs, news posts, faculty/staff resources, and internal announcements—many of which were publicly accessible but not meant for the general audience.

Manually reviewing each page would have required hundreds of hours, and traditional crawling alone couldn’t deliver audience-intent classification.

The client needed a way to automate the analysis: for each URL, decide whether the page is meant for the “public” audience, the “internal” audience, or a combination of both, and flag it for migration or removal accordingly.

With migration on the horizon, the risk of moving too little or too much content (and thereby confusing users or leaving internal pages exposed) was high. A repeatable, transparent process was required.

The Approach

Workshop Digital designed a workflow using the Screaming Frog SEO Spider tool with its AI (LLM) integration to create an AI-powered content audit at scale.

Steps included:

  • Crawl configuration: enabled “Store HTML” in the crawl settings so full page HTML could be extracted for each URL.

  • API integration: connected Screaming Frog to an LLM (in this case, OpenAI) via API access, so the model could return an output for each crawled page.

  • Prompt design and testing: crafted a prompt with instructions such as:
    “You are an expert in web strategy and audience analysis. Based on the HTML content, categorize this page as INTERNAL or EXTERNAL. Provide a short explanation for your decision.”
    Iteratively refined the prompt (for example: "Ignore universal navigation, login buttons, or links common to all pages") to reduce false positives and improve accuracy.

  • Batch testing: ran small batches (10-20 pages) first, reviewed the output, and refined the prompts, logic, and choice of LLM until the classifications aligned with expectations.

  • Full crawl execution: once confident, crawled the full 30,000+ pages; for each URL, the output was appended in Screaming Frog as two fields:
    • AI Category (INTERNAL / EXTERNAL / COMBINATION)

    • AI Reasoning (a short explanation of why the page received that classification)

  • Export & analysis: filtered and exported the dataset for migration planning, enabling the client to report on questions such as “what percentage of pages need to move to the intranet” vs. “what pages stay public”.
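
To make the prompt and output steps above more concrete, here is a minimal Python sketch of how the quoted prompt text could be assembled and how a model reply could be split into the AI Category and AI Reasoning fields. It is an illustration only, not the workflow's actual implementation: in the project, the prompt was configured inside Screaming Frog. The function names, the "CATEGORY: explanation" reply format, and the UNPARSED fallback label are assumptions made for this example.

```python
import re

# Prompt text quoted in the steps above.
BASE_PROMPT = (
    "You are an expert in web strategy and audience analysis. "
    "Based on the HTML content, categorize this page as INTERNAL or EXTERNAL. "
    "Provide a short explanation for your decision."
)
# Refinement quoted in the steps above.
REFINEMENT = "Ignore universal navigation, login buttons, or links common to all pages."
# ASSUMPTION: this output-format instruction is not from the case study.
FORMAT_HINT = "Answer in the form 'CATEGORY: explanation'."

# Labels taken from the AI Category field described above.
LABEL_PATTERN = re.compile(
    r"\s*(INTERNAL|EXTERNAL|COMBINATION)\b[\s:.\-]*(.*)",
    re.IGNORECASE | re.DOTALL,
)


def build_prompt() -> str:
    """Join the base prompt, the refinement, and the (assumed) format hint."""
    return " ".join([BASE_PROMPT, REFINEMENT, FORMAT_HINT])


def parse_response(text: str) -> tuple[str, str]:
    """Split a model reply into (category, reasoning).

    Replies that do not start with a known label are returned as
    ("UNPARSED", original text) so they can be reviewed by hand.
    """
    match = LABEL_PATTERN.match(text)
    if match is None:
        return ("UNPARSED", text.strip())
    return (match.group(1).upper(), match.group(2).strip())
```

Routing unrecognized replies to a separate label, rather than guessing a category, fits the small-batch review step: those rows are the ones worth reading before refining the prompt again.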

30,000+

Pages Audited and Classified

100+

Hours Saved

The Results

The execution of this AI-powered audit produced clear, actionable results for the client.

Outcomes:

  • The client obtained a reasoning-backed classification for every one of the 30,000+ pages: no more guessing or manual review.

  • They were provided with a data-driven roadmap for content migration: which pages to transfer to the intranet, which to keep public, and which to archive.

  • According to our team:
    • A framework for a cleaner, audience-focused public website emerged.

    • Content governance was streamlined by the new classification process.

    • Time savings were measured in hundreds of hours, replacing weeks of manual review.

Section of a spreadsheet detailing department, parent URL, and percentage of audience.
Pie chart depicting the Count of Audience percentages.
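
A breakdown like the one in the spreadsheet and pie chart described above could be computed from the exported rows roughly as follows. This is a sketch under stated assumptions: the column names "Department" and "AI Category" are hypothetical stand-ins for whatever the export actually contains, and the sample rows are made up for illustration, not project data.

```python
from collections import Counter, defaultdict


def audience_share(rows, category_key="AI Category"):
    """Return each category's share of the rows as a percentage (one decimal)."""
    counts = Counter(row[category_key] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: round(100 * n / total, 1) for cat, n in counts.items()}


def share_by_department(rows, department_key="Department",
                        category_key="AI Category"):
    """Group rows by department, then compute the audience share for each."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[department_key]].append(row)
    return {dept: audience_share(items, category_key)
            for dept, items in grouped.items()}


# Illustrative rows only; the column names and values are assumptions.
SAMPLE_ROWS = [
    {"Department": "HR", "AI Category": "INTERNAL"},
    {"Department": "HR", "AI Category": "INTERNAL"},
    {"Department": "HR", "AI Category": "EXTERNAL"},
    {"Department": "News", "AI Category": "EXTERNAL"},
]

overall = audience_share(SAMPLE_ROWS)
per_department = share_by_department(SAMPLE_ROWS)
```

In practice the rows would come from the exported file (for example via `csv.DictReader`), and the per-department shares could feed directly into a chart or a migration planning sheet.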

Client Impact:

  • Significant reduction in labor/time cost for auditing content at scale.

  • Reduced risk in migration by relying on consistent, explainable classifications rather than subjective manual review.

  • Improved governance and future auditability (process can be rerun).

The project demonstrated that AI integrations in a tool like Screaming Frog can transform formerly manual audits into fast, repeatable workflows.

Facing a massive content migration or needing to audit your website at scale? Let us help you deploy an AI-powered audit process. Contact our team today to see how we can apply this methodology to your website.