Using ChatGPT to Solve Real-World SEO Problems: How to Perform an Internal Linking Analysis at Scale

Aug 29, 2023   |   9 min read

In the ever-evolving realm of SEO, staying ahead of the curve is paramount. Website owners and digital marketers continually seek innovative tools and strategies to enhance their websites' visibility and rank higher in search results.

In the old days (pre-2023), when existing tools and conventional approaches weren’t cutting it, we all ended up scratching our heads and burning a ton of time coming up with and testing new ideas. Nowadays, we’re scratching our heads trying to figure out how to use OpenAI's ChatGPT to speed up deliverables, generate more insightful ideas, and get up-and-to-the-right results faster.

It’s true. The emergence of ChatGPT has enabled us to dip our toes into an untapped pool of innovative solutions. Here is one such example.

Problem: How do we perform an internal linking analysis at scale?

Whether you employ an SEO analyst, are one, or play one on TV, you’re likely no stranger to the importance of internal linking. It improves UX, supports keyword optimization, increases content visibility, and clearly communicates topical relevance, page importance, and the relationship between pages to search engines. Internal linking is a must - without it, your SEO game could be dead in the water.

Internal linking is easy if you’ve got a nascent blog or a few service or product pages. But what if you’ve been producing content for decades or have thousands of pages in your sitemap?

Hopefully, your plan is more productive than scratching your head and less time-consuming than manually combing through a spreadsheet.

Even after you’ve identified pages that should be linked to one another, how will you go about identifying internal linking opportunities on each page?

We need a solution that does the heavy lifting for us, and it must do it fast.

How our internal linking project got started

SEOs who work at an agency have an unfair advantage over SEOs who work in-house or do their own independent consulting: not only do we develop a skillset across a range of industries and organizations of all sizes, but more importantly, we work within a greater SEO team.

That means we’re not only learning how to solve our own problems and developing solutions for clients, but we also have the pleasure of learning from the problems and solutions of our colleagues.

Twice a month, our SEO department meets to discuss anything from the latest core updates to conferences and training opportunities to how to get your stubborn Google Business Profile listing unsuspended. We also devote time to things we’re stuck on in a section called “What Can the SEO Team Help You With?”

A colleague recently asked,

“Has anyone done a large internal linking analysis before?”

It was so quiet you could hear a SERP impression. Then we started asking questions:

  • Could we build one? How would it work?

  • What would the desired output look like?

  • What kind of input do we need to get there?

  • ChatGPT writes code, right?

Thus began our intimate relationship with ChatGPT.

We prompted, corrected, revised, clarified, confirmed, repeated, retreated, and advanced. ChatGPT has the wisdom of the internet, but sometimes it feels like we’re trying to coax that wisdom out of a toddler. So while all the insights and suggestions kept us from scratching our heads, we couldn’t avoid banging our collective heads against the wall a few times.

With the help of caffeine and some foundational Python knowledge, we built an internal linking analysis tool capable of analyzing thousands of web pages alongside thousands of keywords to produce hundreds of actionable internal linking opportunities in the time it takes you to go pick up your next coffee order.

Solution: Building an internal linking analysis tool for SEO

Designed to comprehensively analyze a website's internal linking structure, the internal linking tool was developed through a series of collaborative sessions with ChatGPT.

This cooperative process allowed us to communicate our needs and objectives, and the AI, armed with its extensive training, provided the technical expertise necessary for the coding framework.

Over multiple iterations and constant feedback, the script evolved into an SEO tool capable of crawling thousands of URLs, identifying potential keywords, and generating a detailed output of potential internal linking opportunities. This tool stands as a testament to the potential of combining human creativity and AI proficiency.

Here’s how we did it.

Step 1: User input (Keyword Research Export, Homepage, Sitemap)

The script begins by reading an input Excel file provided by a user, which contains a list of keywords and corresponding URLs (among other data). We used a recent Semrush export. The file also includes additional data like keyword rankings, search volume, and keyword difficulty, which are used later. Users must also submit the website’s homepage and sitemap.
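Here is a minimal sketch of what this input step might look like, assuming pandas (with openpyxl) is available for reading Excel files. The column names (`Keyword`, `URL`, `Position`, `Search Volume`, `Keyword Difficulty`) and both function names are our assumptions for illustration; match them to whatever your export actually contains.

```python
# Assumed column names; adjust these to match your own keyword export.
REQUIRED_COLUMNS = ["Keyword", "URL", "Position", "Search Volume", "Keyword Difficulty"]


def clean_keyword_rows(rows):
    """Keep rows that have both a keyword and a target URL, with tidy values."""
    cleaned = []
    for row in rows:
        keyword = str(row.get("Keyword") or "").strip().lower()
        url = str(row.get("URL") or "").strip()
        if not keyword or not url:
            continue  # a keyword without a target URL cannot become a link
        tidy = {column: row.get(column) for column in REQUIRED_COLUMNS}
        tidy["Keyword"] = keyword
        tidy["URL"] = url
        cleaned.append(tidy)
    return cleaned


def load_keyword_export(path):
    """Read the Excel export into a list of row dicts (hypothetical helper)."""
    import pandas as pd  # imported here so the cleaning step has no hard dependency

    frame = pd.read_excel(path).dropna(subset=["Keyword", "URL"])
    return clean_keyword_rows(frame.to_dict(orient="records"))
```

Lowercasing keywords up front is a design choice that keeps later matching simple; the ranking, volume, and difficulty values are carried along untouched so they can be copied into the output.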

Step 2: Crawl the sitemap

To start, we knew we needed to take inventory of all of a site’s most important pages. From an organic visibility perspective, if they’re in the sitemap, they’re important. The script retrieves the sitemap of a given website and builds a list of all the URLs on the site. It uses the requests library to send a GET request to the sitemap URL, and the xml.etree.ElementTree library to parse the XML sitemap and extract the URLs.
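The sketch below shows one way this step could be written with the two libraries named above. The function names are ours, and the namespace constant assumes a sitemap that follows the standard sitemaps.org schema.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps (an assumption about the target site).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_data):
    """Return every <loc> value found in a sitemap XML document (str or bytes)."""
    root = ET.fromstring(xml_data)
    urls = []
    for loc in root.iter(SITEMAP_NS + "loc"):
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


def fetch_sitemap_urls(sitemap_url):
    """Download a sitemap with a GET request and extract its URLs."""
    import requests  # third-party; imported here so parsing works without it

    response = requests.get(sitemap_url, timeout=30)
    response.raise_for_status()
    return parse_sitemap(response.content)
```

Note that this sketch does not follow sitemap index files: if the sitemap lists other sitemaps rather than pages, each of those would need to be fetched and parsed in turn.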

Step 3: Skip PDFs and other user-defined URLs

The tool can skip specific URLs during processing. By default, it bypasses any URL that ends with .pdf, because PDF files can’t be edited quickly or easily for SEO and linking purposes. Users can manually add any other URLs they want to skip.
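A filter like this can be sketched in a few lines. The function names and the `skip_urls` parameter are hypothetical; we also check the URL path rather than the raw string, which is a small extension so that a PDF link with a query string is still caught.

```python
from urllib.parse import urlparse


def should_skip(url, skip_urls=()):
    """Return True for PDFs and for any URL the user listed explicitly."""
    normalized_skips = {u.rstrip("/") for u in skip_urls}
    path = urlparse(url).path.lower()
    return path.endswith(".pdf") or url.rstrip("/") in normalized_skips


def filter_urls(urls, skip_urls=()):
    """Keep only the URLs that should be analyzed."""
    return [url for url in urls if not should_skip(url, skip_urls)]
```

For example, `filter_urls(sitemap_urls, skip_urls=["https://example.com/contact/"])` would drop the contact page along with every PDF.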

Step 4: Analyze page content using an NLP library

For our use case, we don’t need to look for internal linking opportunities anywhere beyond the <p> (paragraph) text. For each paragraph of text on the webpage, the script checks if any of the keywords from the input file, or their synonyms, as determined by the NLTK WordNet corpus, are present. If a keyword is found, the script then checks if that keyword is linked to its corresponding target URL.
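To make the paragraph check concrete, here is a rough sketch. It uses Python's built-in html.parser to collect paragraph text and links, which is our assumption for illustration; the article does not say which HTML parser the script uses. The `wordnet_synonyms` helper needs NLTK installed and the WordNet corpus downloaded, and all names here are hypothetical.

```python
import re
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text and link targets of each <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # list of (text, [hrefs]) tuples
        self._text = None     # None while the parser is outside a <p>
        self._links = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.flush()  # copes with a previous <p> that was never closed
            self._text, self._links = [], []
        elif tag == "a" and self._text is not None:
            href = dict(attrs).get("href")
            if href:
                self._links.append(href)

    def handle_endtag(self, tag):
        if tag == "p":
            self.flush()

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def flush(self):
        if self._text is not None:
            text = " ".join("".join(self._text).split())
            if text:
                self.paragraphs.append((text, self._links))
        self._text, self._links = None, []


def extract_paragraphs(html):
    """Return (text, hrefs) for every paragraph in an HTML document."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    collector.flush()
    return collector.paragraphs


def find_terms(text, keyword, synonyms=()):
    """Return the keyword and/or synonyms that appear as whole phrases in text."""
    found = []
    for term in (keyword, *synonyms):
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


def wordnet_synonyms(word):
    """Look up synonyms in NLTK's WordNet corpus (requires nltk.download('wordnet'))."""
    from nltk.corpus import wordnet

    names = set()
    for synset in wordnet.synsets(word.replace(" ", "_")):
        for lemma in synset.lemmas():
            names.add(lemma.name().replace("_", " ").lower())
    names.discard(word.lower())
    return sorted(names)
```

Matching on whole phrases rather than raw substrings avoids flagging a short keyword that merely appears inside a longer word.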

Step 5: Record internal linking opportunities

If a keyword is found in a paragraph, but that paragraph does not contain a link to the keyword's target URL, this is considered an "opportunity". The script records the keyword, the text of the paragraph, the URL being analyzed (the "source URL"), the target URL, the link presence (False), and the other data from the input file.
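The recording logic might look something like the sketch below. It assumes paragraphs arrive as (text, list of hrefs) pairs and keyword rows as dicts with `Keyword`, `URL`, `Position`, `Search Volume`, and `Keyword Difficulty` keys; those shapes and the function names are our assumptions, not details of the original script.

```python
import re
from urllib.parse import urljoin


def normalize(url, base):
    """Resolve relative links against the page URL; ignore fragments and trailing slashes."""
    absolute = urljoin(base, url).split("#", 1)[0]
    return absolute.rstrip("/")


def find_opportunities(source_url, paragraphs, keyword_rows):
    """Record each paragraph that mentions a keyword but lacks a link to its target URL."""
    opportunities = []
    for text, links in paragraphs:
        linked = {normalize(href, source_url) for href in links}
        for row in keyword_rows:
            pattern = r"(?<!\w)" + re.escape(row["Keyword"]) + r"(?!\w)"
            if not re.search(pattern, text, flags=re.IGNORECASE):
                continue  # keyword not mentioned in this paragraph
            if normalize(row["URL"], source_url) in linked:
                continue  # paragraph already links to the target URL
            opportunities.append({
                "Keyword": row["Keyword"],
                "Text": text,
                "Source URL": source_url,
                "Target URL": row["URL"],
                "Link Presence": False,
                "Keyword Ranking": row.get("Position"),
                "Search Volume": row.get("Search Volume"),
                "Keyword Difficulty": row.get("Keyword Difficulty"),
            })
    return opportunities
```

Normalizing both sides before comparing means a relative link such as /services/seo-audit/ still counts as a link to the full target URL.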

Step 6: Build an output file containing opportunities and insights

The script outputs an Excel file containing the recorded opportunities. Each row represents an opportunity to add an internal link from a source URL to a target URL using a specific keyword.

The output file contains nine columns in the following format:

Keyword: The keyword found in the source URL. This keyword comes directly from the input file.

Text: The paragraph of text on the source URL where the keyword was found.

Source URL: The URL where the keyword was found.

Target URL: The URL to which the keyword should link. This URL comes directly from the input file.

Link Presence: A boolean value indicating whether or not a link to the target URL is already present in the paragraph where the keyword was found.

Keyword Ranking: The current search engine ranking of the keyword. This data comes directly from the input file to help users prioritize striking distance keywords.

Search Volume: The estimated monthly number of searches for the keyword. This data comes directly from the input file.

Keyword Difficulty: A measure of how challenging it would be to achieve high search rankings with the keyword. This data comes directly from the input file.

Opportunity Count: The number of times the keyword is found in the source URL, representing potential link opportunities. This is calculated by the script during the analysis of each URL.
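As a sketch of how the nine columns above could be assembled and written out, the snippet below counts opportunities and exports them with pandas. Counting recorded opportunities per keyword and source URL is our approximation of Opportunity Count, and the function names are hypothetical.

```python
from collections import Counter

# Column order taken from the list above.
COLUMNS = [
    "Keyword", "Text", "Source URL", "Target URL", "Link Presence",
    "Keyword Ranking", "Search Volume", "Keyword Difficulty", "Opportunity Count",
]


def add_opportunity_counts(opportunities):
    """Attach the number of recorded opportunities per (keyword, source URL) pair."""
    counts = Counter((o["Keyword"], o["Source URL"]) for o in opportunities)
    return [
        {**o, "Opportunity Count": counts[(o["Keyword"], o["Source URL"])]}
        for o in opportunities
    ]


def write_report(opportunities, path):
    """Write the opportunities to an Excel file (requires pandas and openpyxl)."""
    import pandas as pd  # imported here so the counting step has no hard dependency

    rows = add_opportunity_counts(opportunities)
    pd.DataFrame(rows, columns=COLUMNS).to_excel(path, index=False)
```

Sorting the resulting sheet by Keyword Ranking or Search Volume is an easy way to surface striking distance keywords first.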

ChatGPT, automation, and the future of SEO

ChatGPT has officially lowered the barrier to entering the world of technical innovation. It’s helping armchair coders delve into full-blown development, equipping them with what they need to build solutions to enterprise-level problems.

Working on this project has shown us that we’ve just skimmed the surface of what we can achieve by applying AI to digital marketing strategies, and it has instilled within us the confidence to continue pursuing bigger, better projects that provide scalable solutions for our clients.

We’re focused on embracing AI to increase the speed and accuracy of deliverables, improve analyst efficiency, and get results faster. As we enter this new realm of possibilities, we welcome the opportunity to discuss the marketing challenges your team faces and how AI + smart thinking can help resolve them.

Michael Adelizzi