Automation to Extract the Title and Description of All URLs on a Domain
If you are in SEO, or you repeatedly need to look into the meta data for a domain, this automation can come in handy for the initial audit of a website.
Using CaptureKit’s Page Data Extraction API, you can get all the indexed URLs along with their meta titles and descriptions.
In this blog, I have built a simple n8n automation that you can follow along with to build your own meta data extractor.
Or, at the very end, I will give you a blueprint that you can use as is in your n8n account.
What Tools You Need
- CaptureKit Access Key (Sign up if you haven’t to get 100 free credits)
- n8n (14-day free trial)
- Google Sheets
Let’s start building it!
Getting CaptureKit’s Access Keys
The Page Data API will give us all the URLs from the sitemap, and the same API gives us the meta details.
You can look at the sample output of this API below:
{
  "success": true,
  "data": {
    "metadata": {
      "title": "CaptureKit - Turn any website into a screenshot with our powerful Screenshot API",
      "description": "CaptureKit is a powerful API for capturing screenshots, extracting HTML, gathering links, and summarizing content—all with a simple request.",
      "favicon": "https://capturekit.dev/favicon.ico",
      "ogImage": "https://capturekit-assets.s3.amazonaws.com/capturekit-og+(1).png"
    },
    "links": {
      "internal": [
        "https://capturekit.dev/",
        "https://capturekit.dev/dashboard",
        "https://capturekit.dev/pricing",
        "https://capturekit.dev/blog"
      ],
      "external": [
        "https://docs.capturekit.dev",
        "https://zapier.com/apps/capturekit-website-screenshots-p/integrations",
        "https://www.nextupkit.com"
      ],
      "social": [
        "https://github.com/CaptureKit-Web-Scraping-API",
        "https://x.com/capturekit"
      ]
    },
    "html": "Hello, world!",
    "markdown": "CaptureKit - Turn any website into a screenshot with our powerful Screenshot API...",
    "sitemap": {
      "source": "https://capturekit.dev/sitemap.xml",
      "totalLinks": 3,
      "links": [
        "https://www.capturekit.dev/",
        "https://www.capturekit.dev/page-content",
        "https://www.capturekit.dev/ai"
      ]
    }
  }
}
You have to pass a Seed URL to it as input and, of course, the access key.
As you can see from the output of this API, we get the meta data and all the links from the sitemap.
So this API will be used twice in our workflow: once to get all the links, and then again for each of those links to collect its meta data.
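To make the two uses concrete, here is a minimal Python sketch that reads an abridged copy of the sample response. Only the JSON keys come from the sample output; the helper names `sitemap_links` and `meta_row` are made up for illustration and are not part of CaptureKit or n8n.

```python
import json

# Abridged copy of the sample response (description shortened, some keys omitted).
SAMPLE_RESPONSE = """
{
  "success": true,
  "data": {
    "metadata": {
      "title": "CaptureKit - Turn any website into a screenshot with our powerful Screenshot API",
      "description": "CaptureKit is a powerful API for capturing screenshots."
    },
    "sitemap": {
      "source": "https://capturekit.dev/sitemap.xml",
      "totalLinks": 3,
      "links": [
        "https://www.capturekit.dev/",
        "https://www.capturekit.dev/page-content",
        "https://www.capturekit.dev/ai"
      ]
    }
  }
}
"""


def sitemap_links(response: dict) -> list:
    """First use of the API: pull every sitemap URL out of a response."""
    return response.get("data", {}).get("sitemap", {}).get("links", [])


def meta_row(url: str, response: dict) -> dict:
    """Second use of the API: keep only the title and description for one URL."""
    metadata = response.get("data", {}).get("metadata", {})
    return {
        "url": url,
        "title": metadata.get("title", ""),
        "description": metadata.get("description", ""),
    }


response = json.loads(SAMPLE_RESPONSE)
links = sitemap_links(response)
row = meta_row(links[0], response)
print(links)
print(row["title"])
```

In the real workflow the second call would be made once per link, each with its own response; the sample is reused here only to keep the sketch self-contained.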
Read the documentation here, if you want to know how this API works.
Once you have successfully signed up for CaptureKit, you will find your Access Key in your dashboard.

Building our database in Google Sheets
In Google Sheets, we will now create a database where we will collect the meta data of all the URLs. We will also provide the Seed URL from here to feed into our workflow.

The other tab “Title & Description” is for the output data.

We have named our document “Find Title & Description From a Domain”.
Let’s start building our workflow
We will automate our meta data extraction in n8n. Here we will connect all of the other tools and get the desired result.
If this is the first time you are hearing about n8n, it is a no-code automation tool. You can use any other tool instead, be it Make or Zapier; the logic of the flow stays the same as described here.
For this tutorial we will use n8n, which offers a 14-day free trial.
Create a new workflow in n8n. The first node we will use is the Schedule Trigger.

The next logical step is to fetch the Seed URL, which we do using the Google Sheets node with the Get Rows operation.

Let’s execute this step to check that this node and the previous one are working fine.
The node works fine; we are getting the Seed URL from sheet-1. ⬇️

Now logically, we will pass this URL to our API to extract sitemap URLs.
Our next node will be an HTTP node, and this will be its configuration. ⬇️
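If you want to try the same request outside n8n, it can be sketched in Python as below. This post does not spell out the endpoint or the parameter names, so `API_ENDPOINT`, `url`, and `access_key` are placeholders I made up for illustration; take the real values from the CaptureKit documentation.

```python
from urllib.parse import urlencode

# ASSUMPTION: placeholder endpoint, not the real CaptureKit endpoint.
# Replace it with the one listed in the CaptureKit documentation.
API_ENDPOINT = "https://api.example.com/page-data"


def build_request_url(seed_url: str, access_key: str, endpoint: str = API_ENDPOINT) -> str:
    """Build the GET URL that an HTTP node would call for one page.

    ASSUMPTION: the query parameter names `url` and `access_key` are
    illustrative; the real names may differ.
    """
    query = urlencode({"url": seed_url, "access_key": access_key})
    return f"{endpoint}?{query}"


print(build_request_url("https://capturekit.dev/", "YOUR_ACCESS_KEY"))
```

The Seed URL is percent-encoded by `urlencode`, which matters because it contains characters like `:` and `/` that would otherwise be misread as part of the request URL.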

Let’s also test this node and see the output we get.

All the URLs are in a single array, so we have to separate them out. For this we will use the split node and map the URLs field in it.
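In plain Python, this split step amounts to turning one item that holds an array into one item per URL. The sketch below only illustrates the idea and is not how n8n implements it; the function name `split_out` and the simplified input shape are assumptions.

```python
def split_out(item: dict, field: str) -> list:
    """Turn one item holding a list under `field` into one item per element."""
    return [{field: value} for value in item.get(field, [])]


# Simplified stand-in for the HTTP node output: all sitemap URLs in one array.
http_output = {
    "links": [
        "https://www.capturekit.dev/",
        "https://www.capturekit.dev/page-content",
        "https://www.capturekit.dev/ai",
    ]
}

items = split_out(http_output, "links")
for item in items:
    print(item)
```

Each resulting item carries exactly one URL, which is what lets the following nodes process the pages one at a time.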

Now we have to collect the meta details for each URL, so the next node puts the URLs in a loop and passes them one by one to CaptureKit’s API again.

Let’s pass this to the HTTP node (CaptureKit’s API).

This will give us the meta details, which we will append to our Google Sheet. The loop then repeats this process until every URL has been passed through.
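The loop can be sketched in Python as below. To keep the example runnable without network access or credentials, the API call is replaced by a stub called `fake_fetch`. The function names, the row layout, and the idea that a failed call returns `"success": false` are my assumptions for illustration, not the actual n8n nodes or documented API behavior.

```python
from typing import Callable


def collect_meta_rows(urls: list, fetch: Callable[[str], dict]) -> list:
    """Call `fetch` once per URL and build one sheet row per page."""
    rows = []
    for url in urls:
        response = fetch(url)
        if not response.get("success"):
            # ASSUMPTION: a failed call reports success=false; skip that page
            # instead of stopping the whole loop.
            continue
        metadata = response.get("data", {}).get("metadata", {})
        rows.append([url, metadata.get("title", ""), metadata.get("description", "")])
    return rows


def fake_fetch(url: str) -> dict:
    """Stub standing in for the CaptureKit request (hypothetical helper)."""
    if url.endswith("/broken"):
        return {"success": False}
    return {
        "success": True,
        "data": {"metadata": {"title": f"Title of {url}", "description": f"About {url}"}},
    }


urls = [
    "https://www.capturekit.dev/",
    "https://www.capturekit.dev/ai",
    "https://example.com/broken",
]
for row in collect_meta_rows(urls, fake_fetch):
    print(row)
```

Each row has the shape URL, title, description, which mirrors the columns you would append to the “Title & Description” tab.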

When the whole process is done, we mark this task done by putting in a “Yes” in sheet-1 here ⬇️

And that’s it! You can put in any Seed URL and run this automation.
Here’s the blueprint for this automation, which you can download and use as is in your canvas.
Conclusion
You are not limited to extracting meta data with this API. You can also extract raw HTML or convert page data into markdown format (handy in case you want to feed the data to LLMs).
Again, the documentation and playground can help you understand how the API works.
In case you have a problem integrating this API into your workflow, you can reach out to us on chat!