Automation to Extract the Title and Description of All URLs on a Domain

CaptureKit Team
title and description extractor, website metadata automation, domain data scraper, metadata extraction workflow, CaptureKit automation

If you work in SEO, or you repeatedly need to look into the metadata for a domain, this automation can come in handy during the initial audit of a website.

Using CaptureKit’s Page Data Extraction API, you can get all the URLs listed in a site’s sitemap, along with their meta titles and descriptions.

In this blog I have built a simple n8n automation that you can follow along with to build your own metadata extractor.

Or, at the very end, I will give you a blueprint that you can use as is in your n8n account.

What Tools You Need

  1. CaptureKit Access Key (sign up to get 100 free credits if you haven’t already)
  2. n8n (14-day free trial)
  3. Google Sheets

Let’s start building it!

Getting CaptureKit’s Access Keys

The Page Data API gives us all the URLs from the sitemap, and the same API also gives us the meta details.

You can look at the sample output of this API below:

{
    "success": true,
    "data": {
        "metadata": {
            "title": "CaptureKit - Turn any website into a screenshot with our powerful Screenshot API",
            "description": "CaptureKit is a powerful API for capturing screenshots, extracting HTML, gathering links, and summarizing content—all with a simple request.",
            "favicon": "https://capturekit.dev/favicon.ico",
            "ogImage": "https://capturekit-assets.s3.amazonaws.com/capturekit-og+(1).png"
        },
        "links": {
            "internal": [
                "https://capturekit.dev/",
                "https://capturekit.dev/dashboard",
                "https://capturekit.dev/pricing",
                "https://capturekit.dev/blog"
            ],
            "external": [
                "https://docs.capturekit.dev",
                "https://zapier.com/apps/capturekit-website-screenshots-p/integrations",
                "https://www.nextupkit.com"
            ],
            "social": [
                "https://github.com/CaptureKit-Web-Scraping-API",
                "https://x.com/capturekit"
            ]
        },
        "html": "Hello, world!",
        "markdown": "CaptureKit - Turn any website into a screenshot with our powerful Screenshot API...",
        "sitemap": {
            "source": "https://capturekit.dev/sitemap.xml",
            "totalLinks": 3,
            "links": [
                "https://www.capturekit.dev/",
                "https://www.capturekit.dev/page-content",
                "https://www.capturekit.dev/ai"
            ]
        }
    }
}

You have to pass it a seed URL as input and, of course, the access key.

As you can see from the output of this API, we get the metadata as well as all the links from the sitemap.
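To make the shape of that response concrete, here is a minimal Python sketch that pulls the title, description, and sitemap links out of a response shaped like the sample output. The helper name `extract_page_data` is hypothetical, the embedded JSON is a shortened copy of the sample, and the sketch assumes the field names match the sample exactly.

```python
import json

# Shortened copy of the sample response (description trimmed for brevity).
SAMPLE_RESPONSE = """
{
  "success": true,
  "data": {
    "metadata": {
      "title": "CaptureKit - Turn any website into a screenshot with our powerful Screenshot API",
      "description": "CaptureKit is a powerful API for capturing screenshots."
    },
    "sitemap": {
      "source": "https://capturekit.dev/sitemap.xml",
      "totalLinks": 3,
      "links": [
        "https://www.capturekit.dev/",
        "https://www.capturekit.dev/page-content",
        "https://www.capturekit.dev/ai"
      ]
    }
  }
}
"""


def extract_page_data(response: dict) -> dict:
    """Hypothetical helper: pick out the fields this workflow needs."""
    data = response.get("data", {})
    metadata = data.get("metadata", {})
    sitemap = data.get("sitemap", {})
    return {
        "title": metadata.get("title", ""),
        "description": metadata.get("description", ""),
        "sitemap_links": sitemap.get("links", []),
    }


page = extract_page_data(json.loads(SAMPLE_RESPONSE))
print(page["title"])
print(len(page["sitemap_links"]), "sitemap links found")
```

Missing keys fall back to empty values, so a page without a sitemap or a description still yields a usable record.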

So this API will be used twice in our workflow: once to get all the links, and then again for each of those links to collect its metadata.
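The two passes can be sketched in Python as follows. This is only an illustration of the logic, not the n8n workflow itself: `collect_metadata`, `fetch_page_data`, and `fake_fetch` are hypothetical names, and the fake fetcher stands in for the real API so the sketch runs without network access or an access key.

```python
from typing import Callable


def collect_metadata(seed_url: str, fetch_page_data: Callable[[str], dict]) -> list[dict]:
    """Two-pass flow: one call for the sitemap links, then one call per link.

    `fetch_page_data` is a hypothetical callable that takes a URL and returns
    the parsed JSON response of the Page Data API for that URL.
    """
    # Pass 1: the seed URL gives us the sitemap links.
    seed_response = fetch_page_data(seed_url)
    links = seed_response.get("data", {}).get("sitemap", {}).get("links", [])

    # Pass 2: each link goes through the same API to collect its metadata.
    rows = []
    for link in links:
        page_response = fetch_page_data(link)
        metadata = page_response.get("data", {}).get("metadata", {})
        rows.append({
            "url": link,
            "title": metadata.get("title", ""),
            "description": metadata.get("description", ""),
        })
    return rows


def fake_fetch(url: str) -> dict:
    """Stand-in for the real API call, returning a response of the same shape."""
    return {
        "success": True,
        "data": {
            "metadata": {"title": f"Title of {url}", "description": f"About {url}"},
            "sitemap": {"links": ["https://example.com/", "https://example.com/pricing"]},
        },
    }


rows = collect_metadata("https://example.com", fake_fetch)
for row in rows:
    print(row["url"], "|", row["title"])
```

Passing the fetcher in as an argument keeps the flow testable; in a real script you would swap `fake_fetch` for a function that calls the API with your access key.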

Read the documentation here, if you want to know how this API works.

Once you have signed up for CaptureKit, you will find your access key in your dashboard.

[Image: Access Key in your dashboard]

Building our database in Google Sheets

In Google Sheets, we will now create a database to collect the metadata of all the URLs. The same spreadsheet also holds the seed URL that feeds our workflow.

[Image: Building our database in Google Sheets]

The other tab, “Title & Description”, is for the output data.

[Image: Title & Description]

We have named the document “Find Title & Description From a Domain”.

Let’s start building our workflow

We will automate our metadata extraction in n8n. Here we will connect all the other tools and combine them to get the desired result.

If this is the first time you are hearing about n8n, it is a no-code automation tool. You can use any similar tool, be it Make, Zapier, or another; the logic of the flow stays the same as described here.

For the sake of this tutorial we will use n8n, which offers a 14-day free trial.

Create a new workflow in n8n. The first node we will use is the Schedule Trigger.

[Image: Create a new workflow in n8n]
The next logical operation should get us the seed URL, which we do using the Google Sheets node with the Get Rows operation.

[Image: Get Seed URL]

Let’s execute this step to check that this node and the previous one are working fine.

The node works fine, and we are getting the seed URL from sheet-1. ⬇️

[Image: Seed url from the sheet-1]

Next, we will pass this URL to our API to extract the sitemap URLs.

Our next node is an HTTP node, configured as shown here. ⬇️

[Image: Pass this URL to our API]
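Outside n8n, the same request comes down to a GET call with the target URL and the access key as query parameters. The Python sketch below only builds that request URL. The endpoint `https://api.example.com/page-data` is a placeholder, and the parameter names `url` and `access_key` are assumptions, so check the CaptureKit documentation for the actual values before using it.

```python
from urllib.parse import urlencode

# Placeholder values: replace them with the endpoint and parameter names
# from the CaptureKit documentation and the key from your dashboard.
API_ENDPOINT = "https://api.example.com/page-data"  # hypothetical endpoint
ACCESS_KEY = "YOUR_ACCESS_KEY"


def build_request_url(target_url: str) -> str:
    """Build the GET URL for one page, with both inputs as query parameters."""
    # urlencode percent-encodes the target URL so it survives as a parameter value.
    query = urlencode({"url": target_url, "access_key": ACCESS_KEY})
    return f"{API_ENDPOINT}?{query}"


request_url = build_request_url("https://capturekit.dev/")
print(request_url)
```

Encoding the target URL matters because it contains characters such as `:` and `/` that would otherwise be read as part of the request URL itself.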
Let’s test this node as well and look at the output we get.

[Image: Test Node and see the output]
All the URLs come back in a single array, so we have to separate them. For that we use a split node and map the URLs field in it.

[Image: Split node and map the URLs here]
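Conceptually, the split step turns one item holding an array into many items holding one URL each. A rough Python equivalent is sketched below; `split_out` is a hypothetical name, the `url` field name is an assumption, and the `json` wrapper mirrors the way n8n wraps each item's data.

```python
def split_out(response: dict) -> list[dict]:
    """Turn the single sitemap array into one item per URL."""
    links = response.get("data", {}).get("sitemap", {}).get("links", [])
    # One item per link, each wrapped in a "json" key like an n8n item.
    return [{"json": {"url": link}} for link in links]


response = {
    "data": {
        "sitemap": {
            "links": [
                "https://www.capturekit.dev/",
                "https://www.capturekit.dev/page-content",
                "https://www.capturekit.dev/ai",
            ]
        }
    }
}

items = split_out(response)
for item in items:
    print(item)
```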
Now we have to collect the meta details for each URL. The next node, Loop Over Items, keeps the URLs in a loop and passes them one by one to CaptureKit’s API again.

[Image: Loop Over Items]

Let’s pass each URL to an HTTP node (CaptureKit’s API).

[Image: Pass this to HTTP node]
This gives us the meta details, which we append to our Google Sheet. The loop then repeats until every URL has been processed.

[Image: Give us the meta details]
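The append step maps each API response to one row of the output tab. Here is a minimal Python sketch of that mapping, with an in-memory CSV standing in for the Google Sheet. The column headers `URL`, `Title`, and `Description` and the helper name `append_row` are assumptions, not taken from the actual sheet.

```python
import csv
import io

FIELDNAMES = ["URL", "Title", "Description"]  # assumed column headers


def append_row(writer: csv.DictWriter, url: str, response: dict) -> None:
    """Map one API response to one sheet row and append it."""
    metadata = response.get("data", {}).get("metadata", {})
    writer.writerow({
        "URL": url,
        "Title": metadata.get("title", ""),
        "Description": metadata.get("description", ""),
    })


# An in-memory CSV stands in for the "Title & Description" tab.
sheet = io.StringIO()
writer = csv.DictWriter(sheet, fieldnames=FIELDNAMES)
writer.writeheader()

sample = {"data": {"metadata": {"title": "Pricing", "description": "Plans and credits"}}}
append_row(writer, "https://example.com/pricing", sample)
print(sheet.getvalue())
```

Pages with no description still produce a row with an empty cell, which makes missing metadata easy to spot during an audit.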
When the whole process is done, we mark the task as done by putting a “Yes” in sheet-1 here. ⬇️

[Image: Task done by putting yes.]

And that’s it. You can put in any seed URL and run this automation.

Here’s the blueprint for this automation, which you can download and use as is in your canvas.

Conclusion

You are not limited to extracting metadata with this API. You can also extract the raw HTML or convert page data into Markdown, which is handy if you want to feed the data to LLMs.

Again, the documentation and playground can help you understand how the API works.

If you have a problem integrating this API into your workflow, you can reach out to us on chat!

Additional Resources

  1. How To Find All The URLs on a Domain (5 Methods)
  2. How To Scale Screenshots with Google Sheets (Using App Script)
  3. How To Extract HTML of a Webpage using Puppeteer
  4. How To Convert HTML to Markdown
