Step-by-Step Guide to Creating a Custom Robots.txt for Experience Cloud Sites

Big Idea or Enduring Question:

  • What is a robots.txt file, and how can you create a custom robots.txt for your Experience Cloud site?

Objectives:

After reading this blog, you’ll be able to:

  1. Understand the role of robots.txt in SEO.
  2. Create and customize a robots.txt file for Experience Cloud sites.
  3. Control search engine crawlers effectively.
  4. Improve SEO and indexing for your portal.
  5. And much more!

👉 Previously, I’ve shared several posts on effectively implementing key branding and SEO features for Experience Cloud sites. Why not check them out while you’re here?

  1. Step-by-Step Guide to Enabling User Self-Deactivation in Experience Cloud Sites
  2. Step-by-Step Guide to Write SEO-Friendly Titles & Descriptions for Experience Cloud Pages
  3. Step-by-Step Guide to Generate a Sitemap in Salesforce Experience Cloud

Pre-requisites

Before you start reading this article, make sure to read these blogs to gain a basic understanding of SEO, sitemaps, meta tags, and related concepts.

  1. Step-by-Step Guide to Generate a Sitemap in Salesforce Experience Cloud
  2. Step-by-Step Guide to Hiding Experience Cloud Public Pages from Search Engine Indexing

Business Use case

Olivia Bennett, a Junior Developer at Gurukul on Cloud (GoC), is part of a team working on building an Experience Cloud site for the company’s help portal. The portal is branded with the URL: https://help.gurukuloncloud.com/ 

She has already:

  1. Branded the portal URL as described in this post.
  2. Configured Google Analytics™ for Experience Cloud Sites.
  3. Learned and implemented Meta Titles and Meta Descriptions to improve SEO.

Additionally, Olivia has a requirement to block crawlers from indexing the following media files:

  1. Images (.jpg, .png, .gif)
  2. Audio files (.mp3, .wav)
  3. Videos (.mp4, .avi)
  4. PDF documents (.pdf)

These files should be blocked from search engines to prevent unnecessary indexing and reduce server load because:

  • They have no valuable SEO content.
  • Search engines should not send traffic to them.
  • Indexing them could create bloat, negatively affecting SEO rankings.

Understanding the Basics: Crawling vs. Indexing in SEO

Crawling and indexing are two crucial steps in how search engines process web pages.

  1. Crawling is when search engine bots (like Googlebot) scan websites, following links to discover new and updated content. It helps search engines understand a site’s structure and gather information. However, just because a page is crawled doesn’t mean it will appear in search results.
  2. Indexing happens after crawling, where search engines analyze and store the content in their database. Only indexed pages can rank and appear in search results. Factors like meta tags, page quality, and duplicate content determine whether a page gets indexed. For effective SEO, ensuring pages are both crawled and indexed is essential.

Crawling vs. Indexing: Key Differences

| Feature | Crawling | Indexing |
| --- | --- | --- |
| Definition | Process of discovering and scanning web pages. | Process of storing and organizing pages in the search engine's database. |
| Purpose | Helps search engines find and understand web pages. | Makes web pages searchable in Google or other search engines. |
| Controlled by | Robots.txt, sitemaps, internal links. | Meta tags (noindex), content quality, canonical tags. |
| Outcome | Search engines recognize the page but may not show it in search results. | Indexed pages can appear in search engine results. |
| Can it be blocked? | Yes, via robots.txt or nofollow attributes. | Yes, using a noindex meta tag, canonical tags, or poor content quality. |

Remember: If a page is not crawled, it can’t be indexed. If it’s crawled but not indexed, it won’t rank.
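
To make the distinction concrete, here is a minimal, hypothetical sketch: the robots.txt rule below blocks crawling of one directory, while indexing of an individual crawlable page is controlled separately with a noindex meta tag in that page's HTML head.

User-agent: *
# Crawling control: bots will not fetch anything under this (hypothetical) path
Disallow: /internal-search/

# Indexing control is handled per page, e.g. by placing
# <meta name="robots" content="noindex"> in the HTML head of a page
# that bots may crawl but should not index.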

What is a Robots.txt File?

A robots.txt file is like a set of simple rules for search engines. It tells them which pages or parts of your website they can visit and which ones they should ignore. This way, only the best pages show up when people search for your website.

While major search engines like Google, Bing, and Yahoo recognize and respect robots.txt directives, it’s important to note that this file is not a foolproof method for preventing web pages from appearing in search results.

For your Experience Cloud site, the robots.txt file is automatically generated and located at the root of your domain. Each domain has a single robots.txt file: sites that share a domain share the same file, while sites on different domains (e.g., *.force.com, *.my.site.com, custom domains) have separate ones. The file manages bot access with two directives: Allow (a directory or page, relative to the root domain, that the specified user agent may crawl) and Disallow (a directory or page, relative to the root domain, that you don’t want the user agent to crawl). Only relative URLs are valid in these rules.

Doordash Help Portal Robots.txt Generated by Salesforce
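
For illustration, here is a minimal sketch of the two directives (the /docs/ paths are hypothetical); note that rules use paths relative to the root domain, so a full URL such as https://help.gurukuloncloud.com/docs/ would not be valid here:

User-agent: *          # Applies to all crawlers
Disallow: /docs/       # Don't crawl anything under /docs/
Allow: /docs/public/   # ...except this subfolder, which may be crawled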

How Does a Robots.txt File Work?

The robots.txt file tells search engines where your sitemap is located, while the sitemap tells them which pages to crawl and index. When search engine bots visit a website, they systematically explore pages by following links. Before crawling any content, they check for a robots.txt file.

The structure of a robots.txt file is simple. It consists of:

  1. User-agent: Specifies the search engine bot (e.g., Googlebot, Bingbot).
  2. Directives: Rules defining which URLs the bot should crawl or avoid.

For example, a basic robots.txt file might look like this:

User-agent: *
Disallow: /admin
Disallow: /private
Allow: /public
Sitemap: https://example.com/sitemap.xml

By using robots.txt, website owners can control crawler behavior, protect sensitive areas, and optimize how their site appears in search results.

How to Locate a Robots.txt File

To find a website’s robots.txt file, simply enter the site’s homepage URL in your browser, followed by /robots.txt.

Example: https://help.gurukuloncloud.com/robots.txt

Why Is Robots.txt Important?

Most websites don’t need a robots.txt file because Google typically finds and indexes the important pages while ignoring duplicates or unimportant content. However, there are three key reasons to use one:

  1. Block Non-Public Pages: Prevent indexing of staging sites, login pages, or internal search results to keep them out of search results.
  2. Maximize Crawl Budget: Blocking unimportant pages lets search engines focus on your essential content, making the best use of your crawl budget.
  3. Control Resource Indexing: For resources like PDFs and images, robots.txt works better than meta tags to prevent unwanted indexing.
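
As a rough sketch, a robots.txt covering all three reasons might look like this (all paths are hypothetical examples, not part of Olivia's site):

User-agent: *

# 1. Keep non-public pages out of search results
Disallow: /staging/
Disallow: /internal-search

# 2. Preserve crawl budget by skipping low-value sections
Disallow: /tag/

# 3. Keep resource files such as PDFs out of the index
Disallow: /*.pdf$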

Robots.txt vs. Meta Tags: When to Use Each?

Both robots.txt and meta tags are used to control search engine indexing, but they serve different purposes.

| Use Case | robots.txt | Meta Tag |
| --- | --- | --- |
| Block entire directories | Yes | No |
| Prevent search engines from crawling a page | Yes | No |
| Ensure a page is NOT indexed | No | Yes |
| Prevent a specific page from appearing in search results | No | Yes |
| Control how bots follow links on a page | No | Yes |

For maximum control, use robots.txt and meta tags together. For example, block an entire directory in robots.txt, but also add a noindex meta tag to ensure a specific page within it is never indexed.
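
As a hedged illustration (the directory and page names are hypothetical), the two mechanisms combine like this:

User-agent: *
Disallow: /internal-docs/
# In addition, a page such as /internal-docs/legacy-faq would carry
# <meta name="robots" content="noindex"> in its HTML head so that it is
# kept out of the index even if a crawler reaches it by another route.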

How Many Robots.txt Files Does Experience Cloud Generate for Multiple Sites?

In Salesforce Experience Cloud, the number of robots.txt files generated depends on your site structure.

Sites with Different Custom Domains

If you have multiple Experience Cloud sites, each using a unique custom domain, Salesforce automatically generates a separate robots.txt file for each site. This allows you to define individual indexing rules tailored to each site’s needs.

Example:

  1. Help Portal: https://help.gurukuloncloud.com/robots.txt
  2. Partner Portal: https://partners.gurukuloncloud.com/robots.txt
  3. Customer Portal: https://customers.gurukuloncloud.com/robots.txt

Since each robots.txt file is domain-specific, you have full control over which pages search engines can crawl and index for each individual site.

Sites with the Same Domain (Subpath-Based)

If you have multiple Experience Cloud sites under the same domain but using different subpaths, they share a single robots.txt file. This means that the robots.txt rules apply collectively to all subpath-based sites, requiring careful configuration to ensure proper indexing control.

Example:

  1. Help Site: https://gurukuloncloud.com/help
  2. Partner Portal: https://gurukuloncloud.com/partners
  3. Customer Portal: https://gurukuloncloud.com/customers

Since these sites share the same domain, they do not have separate robots.txt files. Instead, a single shared robots.txt file applies to all subpath-based sites and is accessible at: https://gurukuloncloud.com/robots.txt

To prevent unintentional blocking of important pages, it’s essential to carefully define crawling rules, ensuring that each subpath is appropriately managed within the shared robots.txt file. This helps maintain optimal search engine visibility while protecting sensitive or restricted content.
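
As a sketch of what such a shared file might look like (the subpaths mirror the hypothetical GoC examples above, and the exact sitemap locations depend on your site configuration), each site's rules and sitemap sit side by side in the single robots.txt served at the domain root:

User-agent: *

# Rules for the Help site
Disallow: /help/internal-search

# Rules for the Partner portal
Disallow: /partners/staging/

# Rules for the Customer portal
Disallow: /customers/private/

# Sitemaps for all subpath-based sites
Sitemap: https://gurukuloncloud.com/help/s/sitemap.xml
Sitemap: https://gurukuloncloud.com/partners/s/sitemap.xml
Sitemap: https://gurukuloncloud.com/customers/s/sitemap.xml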

Automation Champion Approach (I-do):

One more thing: another important reason to create a custom robots.txt file is when you have a custom sitemap. In Salesforce Experience Cloud sites, the default sitemap is automatically generated, but if you have a custom site structure, restricted content, or dynamic pages, you may need to define your own sitemap.
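
For instance, a custom robots.txt can point crawlers to a custom sitemap instead of the default one (the sitemap file name below is hypothetical):

User-agent: *
Allow: /
Sitemap: https://help.gurukuloncloud.com/s/custom-sitemap.xml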

Now let’s come back to Olivia’s requirement to block crawlers from indexing the following media files:

  1. Images (.jpg, .png, .gif)
  2. Audio files (.mp3, .wav)
  3. Videos (.mp4, .avi)
  4. PDF documents (.pdf)

The answer is to create a custom robots.txt file.

Step 1: Create a Custom robots.txt File Using a Visualforce Page

  1. Click Setup.
  2. In the Quick Find box, type Visualforce Pages.
  3. Select Visualforce Pages, then click the New button.
  4. Select Available for Lightning Experience, Experience Builder sites, and the mobile app.
  5. Copy the following code into your Visualforce page, updating the Disallow rules and the Sitemap URL to match your own site.
    
    
    <apex:page contentType="text/plain">
    # Default robots.txt for GurukulOnCloud sites
    
    User-agent: *  # Applies to all robots
    
    Allow: /  # Allow all
    
    # Block specific file types
    Disallow: /*.jpg$
    Disallow: /*.png$
    Disallow: /*.gif$
    Disallow: /*.mp3$
    Disallow: /*.wav$
    Disallow: /*.mp4$
    Disallow: /*.avi$
    Disallow: /*.pdf$
    
    # Sitemap location
    Sitemap: https://help.gurukuloncloud.com/s/sitemap.xml
    </apex:page>

  6. Save the changes.

Step 2: Update Robots.txt on Experience Workspaces

In this step, you’ll update the robots.txt file in Experience Workspaces by linking it to the custom Visualforce page. This ensures search engines follow your defined rules while indexing your Experience Cloud site.

  1. Open Experience Workspaces.
  2. Navigate to Administration | Pages, then click Go to Force.com.
  3. On the Site Details page, click Edit.
  4. In the Site Robots.txt field, select the name of the Visualforce page that you created in step 1.
  5. Save the changes. 

If you have multiple Experience Cloud sites under the same subdomain, you typically need to update the robots.txt file for each site individually within Experience Workspaces. Make sure that the robots.txt file includes sitemaps for all Experience Cloud sites that share the same subdomain.
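
For example, assuming two sites share the same subdomain under different subpaths (the paths below are hypothetical), the body of the Visualforce robots.txt page could list both sitemaps:

User-agent: *
Allow: /

# Sitemaps for every site sharing this subdomain (hypothetical subpaths)
Sitemap: https://help.gurukuloncloud.com/support/s/sitemap.xml
Sitemap: https://help.gurukuloncloud.com/training/s/sitemap.xml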

Proof of Concept

To validate the new robots.txt, open this URL: https://help.gurukuloncloud.com/robots.txt. You should see the new robots.txt file we just implemented.

Formative Assessment:

I want to hear from you!

What is one thing you learned from this post? How do you envision applying this new knowledge in the real world? Feel free to share in the comments below.
