Step-by-Step Guide to Generate a Sitemap in Salesforce Experience Cloud

Last Updated on February 7, 2025 by Rakesh Gupta

Big Idea or Enduring Question:

  • How do you generate a sitemap for your Salesforce Experience Cloud site?

Objectives:

After reading this blog, you’ll be able to:

  1. Understand what a sitemap is, including its standards and formats.
  2. Understand what a robots.txt is, including its standards and formats.
  3. Automatically generate a sitemap using Salesforce’s built-in tools.
  4. Understand when and why you might need to generate a sitemap manually.
  5. Explore additional best practices for maintaining and optimizing your sitemap.
  6. And much more!

Business Use Case

Olivia Bennett, a Junior Developer at Gurukul on Cloud (GoC), is part of a team working on building an Experience Cloud site for the company’s help portal. She branded the portal URL https://help.gurukuloncloud.com/ as described in this post and configured Google Analytics™ for Experience Cloud Sites. 

Now, she has been tasked with generating a sitemap for the help portal, a critical step to ensure effective search engine indexing.

What is a Sitemap?

A sitemap provides a high-level, bird’s-eye view of your website. It plays a crucial role in SEO by ensuring search engines can efficiently crawl and index your site, helping to unlock its full ranking potential. By using a sitemap, you can eliminate common indexing issues such as duplicate content and orphan pages.

At its core, a sitemap is a structured file that lists all the important URLs of your website. Its primary purpose is to help search engines understand your site’s structure and locate specific pages more easily. 
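In practice, that structured file is almost always XML following the sitemaps.org protocol. Here is a minimal sketch of what one entry looks like (the article URL below is illustrative, not a real page on the help portal):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page; <loc> is required, <lastmod> is optional -->
  <url>
    <loc>https://help.gurukuloncloud.com/s/article/reset-password</loc>
    <lastmod>2025-02-01</lastmod>
  </url>
</urlset>
```

Search engines read the `<loc>` values to discover pages and use `<lastmod>` to decide which ones to recrawl.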

Why Do You Need a Sitemap?

As websites grow in size and complexity, navigating them becomes increasingly challenging for both users and search engines. This is where sitemaps come in – they act as a guide, ensuring that important pages are easily discoverable and properly indexed.

Here’s why sitemaps are essential:

  1. Better Discoverability: An XML sitemap helps search engines find key pages on your website, which is especially useful for large sites with thousands of pages. Without a sitemap, some pages might go unnoticed due to crawl budget limitations.
  2. Faster Indexing: For new websites, submitting an XML sitemap can speed up the ranking process by helping search engines find and index pages more quickly. If you frequently update content, a sitemap ensures that search engines detect those changes sooner.
  3. Help Search Bots Locate Orphan Pages: Google’s crawlers typically find pages by following internal links. However, orphan pages (those without any inbound links) can be difficult for search engines to reach. By including them in your sitemap, you ensure Google can easily locate and index these otherwise hidden pages.

    Image Source: Seobility – License: CC BY-SA 4.0
  4. Help Google Manage Duplicate Pages: Websites, especially eCommerce platforms, often have similar pages, like product variants. A sitemap with canonical tags helps Google identify the primary page, preventing duplicate content issues.
  5. Enhanced User Experience: An HTML sitemap serves as a structured directory, allowing visitors to quickly locate the information they need. It provides a clear overview of your most important pages in one centralized place.

Since different types of sitemaps offer unique advantages, let’s explore them in more detail.

The Two Types of Sitemaps

Sitemaps come in two main forms, each serving a distinct purpose:

  1. XML (Extensible Markup Language) Sitemaps: These are structured files formatted specifically for search engines. They act as a roadmap, guiding crawlers to important pages and helping improve indexing efficiency.
    Linkedin Business’ XML sitemap
    An XML sitemap contains valuable details that search engine crawlers rely on, including:
      1. A complete list of all published URLs on your website.
      2. The last modified date for each URL, helping search engines track updates.
      3. Hreflang tags to indicate language and regional variations of your pages, ensuring proper indexing for international audiences.
  2. HTML Sitemaps: Unlike XML sitemaps, HTML sitemaps are designed for users. They appear as regular web pages, listing key sections of a website to make navigation easier, especially for visitors looking for specific content. However, HTML sitemaps are not as common anymore and should not be used as a replacement for well-structured site navigation elements such as menus, footer links, breadcrumbs, and categories.

    Apple’s sitemap

Both types of sitemaps work together to enhance website accessibility—ensuring search engines can properly index pages while also improving user experience.

How to Find a Sitemap

Finding a website’s XML sitemap is simple and can usually be done manually. In most cases, it follows a standard URL structure: https://yourdomain.com/sitemap.xml

If the website is built on platforms like WordPress with SEO plugins (such as Yoast), the sitemap might be structured as an index file instead, commonly found at: https://yourdomain.com/sitemap_index.xml

If you can’t find it using these URLs, you can also check the robots.txt file by visiting https://yourdomain.com/robots.txt, where many sites specify the exact location of their sitemap.
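The robots.txt lookup described above is easy to automate: a `Sitemap:` directive is simply a line whose key is “Sitemap” and whose value is a URL. Here is a minimal Python sketch; the robots.txt content is a made-up example, not fetched from a real site:

```python
# Sketch: extract Sitemap entries from the body of a robots.txt file.
# The content below is illustrative, not copied from a live site.
robots_txt = """\
User-agent: *
Disallow: /s/search
Sitemap: https://help.gurukuloncloud.com/s/sitemap.xml
"""

def find_sitemaps(robots_body: str) -> list[str]:
    """Return all URLs declared via 'Sitemap:' directives."""
    sitemaps = []
    for line in robots_body.splitlines():
        # Split on the first colon only, so URLs (which contain ':') stay intact.
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap":
            sitemaps.append(value.strip())
    return sitemaps

print(find_sitemaps(robots_txt))
```

In a real script you would first download the robots.txt body (for example with `urllib.request`) before passing it to this function.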

Do You Need a Sitemap?

While Google is quite effective at discovering web pages on its own, a sitemap can enhance your SEO and indexing efficiency, especially for certain types of websites. According to Google, a sitemap is particularly useful if:

  1. Your Website is Large (500+ pages): The more pages you have, the higher the chance that search engines might miss new or updated content.
  2. Your Internal Linking is Weak: If your site has orphan pages (pages without internal links pointing to them), a sitemap helps search engines find and index them.
  3. Your Site is New or Has Few Backlinks: Since Google primarily discovers pages through links, a sitemap ensures that important content gets indexed even if your site lacks external backlinks.
  4. You Use Rich Media: If your site includes images, videos, or news content, a sitemap helps search engines understand and properly display them in search results.

For most websites, having a sitemap is a simple yet effective way to improve discoverability and indexing.

What is a Robots.txt File?

A robots.txt file is like a set of simple rules for search engines. It tells them which pages or parts of your website they can visit and which ones they should ignore. This way, only the best pages show up when people search for your website.

While major search engines like Google, Bing, and Yahoo recognize and respect robots.txt directives, it’s important to note that this file is not a foolproof method for preventing web pages from appearing in search results.

For your Experience Cloud site, the robots.txt file is automatically generated and located at the root of your domain. Each domain has a unique robots.txt file: sites sharing a domain share the same file, while sites on different domains (e.g., *.force.com, *.my.site.com, custom domains) have separate ones. The file uses Allow (a directory or page, relative to the root domain, that the named user agent may crawl) and Disallow (a directory or page, relative to the root domain, that you don’t want the user agent to crawl) directives to manage bot access; only relative URLs are valid.

Doordash Help Portal Robots.txt Generated by Salesforce

To check the robots.txt for https://help.gurukuloncloud.com/, navigate to https://help.gurukuloncloud.com/robots.txt.

Each rule blocks or allows access for all or a specific crawler to a specified file path on the domain or subdomain where the robots.txt file is hosted. Unless you specify otherwise in your robots.txt file, all files are implicitly allowed for crawling.
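You can verify how these Allow/Disallow rules behave using Python’s standard-library robots.txt parser. The rules below are an assumption, written to resemble a typical Experience Cloud file rather than copied from one:

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt rules; the exact paths are assumptions,
# not taken from a real Salesforce-generated file.
rules = [
    "User-agent: *",
    "Disallow: /s/search",
    "Allow: /s/",
]

rp = RobotFileParser()
rp.parse(rules)  # parse the rules directly instead of fetching a URL

# Paths not matched by any Disallow rule are crawlable.
print(rp.can_fetch("*", "/s/article/reset-password"))
print(rp.can_fetch("*", "/s/search"))
```

This mirrors the behavior described above: anything not explicitly disallowed is implicitly allowed for crawling.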

The robots.txt file tells search engines where your sitemap is located, while the sitemap tells them which pages to crawl and index. Together, they guide crawlers on which parts of your website to visit and which to skip, ensuring that only the best pages appear in search results.

Why Is Robots.txt Important?

Most websites don’t need a robots.txt file because Google typically finds and indexes the important pages while ignoring duplicates or unimportant content. However, there are three key reasons to use one:

  1. Block Non-Public Pages: Prevent indexing of staging sites, login pages, or internal search results to keep them out of search results.
  2. Maximize Crawl Budget: By blocking unimportant pages, search engines focus on your essential content, ensuring the best use of your crawl budget.
  3. Control Resource Indexing: For resources like PDFs and images, robots.txt works better than meta tags to prevent unwanted indexing.

Using a robots.txt file helps ensure that search engines index only the content that matters most to your site.

Key Differences: Robots.txt vs Sitemap

In the world of SEO, both robots.txt files and sitemaps are essential tools. They help search engines interact with your website, but each serves a different purpose. Understanding their roles can help you optimize your site’s performance and ensure that the right pages are indexed.

| Aspect | Robots.txt | Sitemap |
| --- | --- | --- |
| Purpose | Tells search engines which pages or folders to ignore. | Lists all the important pages to help search engines index them. |
| File Type | Plain text file. | XML file. |
| Location | Placed in the root directory (e.g., example.com/robots.txt). | Usually placed in the root directory or submitted directly to search engines. |
| Function | Controls crawler access to certain areas of your website. | Helps search engines find and index your content effectively. |
| SEO Impact | Can prevent certain pages from being indexed. | Enhances the visibility of valuable pages in search results. |

Both robots.txt files and sitemaps play vital roles in guiding search engines through your website. While robots.txt files manage crawler access, sitemaps ensure that all the important pages are found and indexed. Using both effectively can help boost your site’s SEO and improve your search engine rankings.

Steps to Generate a Sitemap For Experience Cloud Sites

If your Salesforce Experience Cloud site is designed for guest users (unauthenticated visitors), optimizing it for search engines is essential to enhance discoverability through organic search. A critical step in this process is submitting your site to search engines like Google, which requires a sitemap.

The sitemap.xml file for your Experience Cloud site includes a list of public pages, along with objects and fields that have read access in the guest user profile. When launching a new Experience Builder site or making significant updates to an existing one, you can manually generate a sitemap once every 24 hours, in addition to the weekly automatic sitemap refresh.

  1. In Experience Builder, navigate to Settings | SEO.
  2. Click on the Generate Sitemap button to manually generate a sitemap.
  3. The status bar displays In Progress while the sitemap is being generated. Once the process is complete, the status updates to Complete.

Once your sitemap is generated, submit it to Google Search Console to ensure it is indexed quickly and properly.

Things To Remember

  1. The full sitemap is automatically refreshed every Sunday.
  2. A partial sitemap that includes newly added pages is generated automatically every 24 hours.
  3. If the account of the user who most recently published the site is deactivated, the automatic refresh will stop.
  4. Always ensure that a current, active user manages site publishing to maintain automatic updates.
  5. For optimal performance, generate a manual sitemap during off-peak hours to minimize impact on site performance during high-traffic periods.
  6. Use a third-party tool (like Screaming Frog or Google Search Console) to scan your sitemap for broken links.
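Point 6 above can also be approximated with a short script: parse the sitemap, collect its URLs, and feed them to a link checker. This is a minimal sketch; the sitemap content is a made-up example, and the actual HTTP checking step is left as a comment:

```python
import xml.etree.ElementTree as ET

# Illustrative sitemap content; a real script would download it from
# the site's /s/sitemap.xml URL first.
sitemap_xml = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://help.gurukuloncloud.com/s/</loc></url>
  <url><loc>https://help.gurukuloncloud.com/s/topic/billing</loc></url>
</urlset>"""

# The sitemaps.org namespace must be declared to find the elements.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.findall("sm:url/sm:loc", NS)]

urls = sitemap_urls(sitemap_xml)
print(urls)
# A full checker would then request each URL (e.g., with urllib.request)
# and flag any response that is not HTTP 200.
```

Dedicated tools like Screaming Frog do the same thing at scale, with retries and reporting built in.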

Proof of Concept

To verify whether a sitemap has been generated for your portal, simply open your portal in a browser and append sitemap.xml to the site’s base path.

For example, in our case, the sitemap can be accessed at: https://help.gurukuloncloud.com/s/sitemap.xml

Formative Assessment:

I want to hear from you!

What is one thing you learned from this post? How do you envision applying this new knowledge in the real world? Feel free to share in the comments below.
