In this video, we are going to discuss a small but quite powerful file within your website, known as the robots.txt file. Now, it’s an important file when it comes to technical SEO, and we’re going to explore what the file does, how it works, and the implications it has on your SEO. Coming up.
Hey there, guys. Darren Taylor of thebigmarketer.co.uk here, and my job is to teach you all about search engine marketing. If that’s up your street, you should consider subscribing to my channel. Today, we are talking about the robots.txt file, which is a small file held on most websites that instructs Google and other crawlers how to handle the URLs and sections of your website.
First of all, what is the robots.txt file and where can you find it? Well, if you go to pretty much any website out there (you can try this now if you like), go to the base URL, add a forward slash, and then type in robots.txt, you’ll be taken to a plain text page showing a few lines of directives. I’m going to break down what those lines mean and the implications they have on SEO.
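As a quick illustration of where the file lives, here is a minimal Python sketch that works out the robots.txt address for any page on a site. The helper name `robots_url` and the example.com address are made up for this example.

```python
from urllib.parse import urljoin


def robots_url(page_url):
    """Return the robots.txt address for the site that page_url belongs to."""
    # A leading slash makes urljoin resolve against the site root,
    # dropping whatever path and query string the page had.
    return urljoin(page_url, "/robots.txt")


print(robots_url("https://www.example.com/blog/post?id=7"))
```

Whatever page you start from, the result points at the same file at the root of the site.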
First of all, a robots.txt file is simply a file that instructs web crawlers, like Google’s crawler, which crawls your content in order to index your website, what to do when they hit certain areas of your site. Now, the options come under two categories: “Allow”, which lets the crawler go to that area of the website and find the content, or “Disallow”, where you don’t want the crawler to access specific pages and areas of your content.
By using these two options of Allow and Disallow, you can instruct crawlers like Googlebot, or any other crawlers across the Web, to access or not access specific areas of your website.
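To make Allow and Disallow a bit more concrete, here is a rough sketch using Python's built-in `urllib.robotparser`. The rules, the `/account/` paths, and the helper name `is_allowed` are all hypothetical. One caveat: Python's parser applies the first rule that matches, while Google picks the most specific one, so the Allow line is listed first here and both readings agree.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block /account/ but keep its help pages open.
RULES = """\
User-agent: *
Allow: /account/help/
Disallow: /account/
"""


def is_allowed(rules_text, user_agent, url):
    """Check one URL against robots.txt rules given as plain text."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


for path in ("/account/settings", "/account/help/faq", "/blog/"):
    url = "https://www.example.com" + path
    print(path, is_allowed(RULES, "Googlebot", url))
```

Anything not mentioned in the rules, like `/blog/` here, is treated as allowed by default.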
Technically speaking, there are usually two parameters here. The first one is the user agent. Now, the user agent is the name of the crawler. If you just want to address Google, it will be Googlebot. There is Bingbot as well. There are a number of different crawlers out there for different search engines and different platforms, so you have to name the user agent that you want to access or not access areas of your site.
Now, if you just put a star here, this character means everything; it means anyone and everyone. You don’t need to list every crawler you can think of in your robots.txt file. Simply putting a star in there indicates that the rules that follow apply to all crawlers.
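Here is a small sketch of how a named user agent and the star sit side by side, again using Python's `urllib.robotparser`. The rules are invented for illustration: Googlebot gets its own group, and every other crawler falls through to the star group, which blocks the whole site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for Googlebot, one for everyone else.
RULES = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def can_crawl(rules_text, user_agent, url):
    """Return True if the named crawler may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://www.example.com/blog/"
print("Googlebot:", can_crawl(RULES, "Googlebot", page))
print("Bingbot:", can_crawl(RULES, "Bingbot", page))
```

A crawler uses the group that names it if there is one, and only falls back to the star group when nothing more specific applies.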
The next part is to add your Disallow line. Here you define the URL or the section of your website you don’t want crawlers to crawl. After the forward slash of your website, what are the pages, subfolders, or sections of your site you don’t want to be crawled? By defining this in the second line of your robots.txt file, you can tell crawlers not to access those areas of your website.
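Putting the two parts together, the file is just a user agent line followed by one Disallow line per section. A minimal sketch that builds that text is below; the function name `build_rules` and the `/admin/` and `/checkout/` sections are placeholders, not paths from any real site.

```python
def build_rules(user_agent, disallowed_paths):
    """Build robots.txt text: one User-agent line, then one Disallow per path."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"


# The star addresses every crawler; each path starts after the site's base URL.
print(build_rules("*", ["/admin/", "/checkout/"]))
```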
Why would you do this? Well, some areas might pose a security risk. You wouldn’t necessarily want Google to crawl very sensitive data. You might have a platform in the background of your website, maybe a software as a service product that holds a lot of secure details or information, or maybe the area just doesn’t provide value to users of Google. Maybe you don’t want that content to be indexed because it could be harmful to your ranking.
There are a number of different reasons you might want to do this, and by using the robots.txt file you can instruct all the crawlers out there to crawl or not crawl different areas of your website.
Most robots.txt files are pretty simple, and the majority you see will have just a couple of lines in them covering a couple of areas of the website, but some are a lot more complex. Some include time delays as well. Some people don’t want crawlers to crawl their website too quickly, so they put a crawl delay in, which asks the bots to wait a set number of seconds between requests. You can see that here, where you define the time period you want crawlers to wait.
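A time delay is normally written as a `Crawl-delay` line. The sketch below reads one back with Python's `urllib.robotparser`; the 10 second value and the helper name `crawl_delay_for` are just assumptions for the example, and whether a given crawler honors the delay is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules asking every crawler to pause between requests.
RULES = """\
User-agent: *
Crawl-delay: 10
Disallow: /admin/
"""


def crawl_delay_for(rules_text, user_agent):
    """Return the crawl delay that applies to a crawler, or None if unset."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.crawl_delay(user_agent)


print(crawl_delay_for(RULES, "Bingbot"))
```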
One more thing to remember: if you have an XML sitemap, and you should, you should also include a directive pointing to it within your robots.txt file, to tell the crawler where to find your sitemap so it has a good understanding of your content. You can see here how to define your sitemap location in your robots.txt file.
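The sitemap directive is a single `Sitemap:` line holding the full URL. Here is a minimal sketch that pulls it back out, assuming Python 3.8 or later (that is when `site_maps()` was added to `urllib.robotparser`). The sitemap address and the helper name `sitemap_urls` are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules with a sitemap line; the URL must be absolute.
RULES = """\
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
"""


def sitemap_urls(rules_text):
    """Return every sitemap URL listed in the robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(sitemap_urls(RULES))
```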
Your robots.txt file should always be uploaded to your root directory. When you go to your browser, type your URL in, and then put forward slash and then robots.txt, all in lower case, then that should provide access to your robots.txt file. Now, you can obviously go ahead and test your robots.txt file as well, to make sure you’re not blocking off good areas of your website that you want Google or other bots to crawl.
This can be done in Google Search Console, and I’ve explained this in other videos across my channel, but it’s really important that you test this file, because otherwise you might block every crawler from your entire website, and end up deindexed, without even realizing. It’s more of a technical element, so it’s important to get it right and make sure that you have the file optimized in the right way.
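Alongside the Search Console check, you can run a rough test of your own before uploading. The sketch below is not the Search Console tool, just a local check with Python's `urllib.robotparser`: it takes a list of pages you definitely want crawled and reports any the rules would block. The function name `blocked_urls` and the example.com pages are made up.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(rules_text, user_agent, must_crawl):
    """Return the URLs from must_crawl that the rules would block."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return [url for url in must_crawl if not parser.can_fetch(user_agent, url)]


# A single stray slash is enough to block the whole site.
RISKY_RULES = "User-agent: *\nDisallow: /\n"
IMPORTANT = [
    "https://www.example.com/",
    "https://www.example.com/products/",
]

print(blocked_urls(RISKY_RULES, "Googlebot", IMPORTANT))
```

An empty list back means none of your important pages are being shut out by the rules you tested.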
Now, with bigger and more complex websites, you might want to include something called pattern matching. I won’t go into detail on this because, generally speaking, a lot of websites don’t need it, but what you can do is instruct Google, or Bing, or any other bots out there crawling your website, via the robots.txt file, to crawl or skip pages based on a set of rules.
Now, again, I won’t discuss this in this video, but it’s something to be aware of and something you can learn more about. I’ve left a link in the description below where you can learn more about how to target the crawl of your website using pattern-matching rules in your robots.txt file, which work with wildcard characters rather than full regular expressions.
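For a rough feel of what pattern matching looks like, the major search engines support two special characters in rules: a star for any run of characters, and a dollar sign to mark the end of the URL. The sketch below is a simplified, hand-rolled matcher written for this example, not any crawler's actual implementation, and the rules and paths are invented.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path rule into a compiled regular expression."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape the literal pieces and let each star match any run of characters.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def rule_matches(rule, path):
    """Return True if the rule applies to the path, matching from the start."""
    return rule_to_regex(rule).match(path) is not None


print(rule_matches("/*.pdf$", "/files/report.pdf"))
print(rule_matches("/*.pdf$", "/files/report.pdf?download=1"))
print(rule_matches("/search*", "/search?q=shoes"))
```

So a rule like `/*.pdf$` covers every URL ending in .pdf anywhere on the site, without you listing each file.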
If you don’t have a robots.txt file, don’t worry, they’re really easy to create, and as long as you have access to the root directory of your website, you can upload one. All you need to do is open up Notepad, write your rules, save the file as robots.txt, and upload it to your root directory. That’s it. You’ll have it in place, and you’ll be able to instruct crawlers and bots what to do when they get to your website.
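If you would rather script it than use Notepad, the same file can be produced in a few lines of Python. This is only a sketch: `write_robots_txt` is a made-up helper, the rules are placeholders, and it writes to a temporary folder on your own machine, so the upload to your root directory is still a separate step.

```python
import tempfile
from pathlib import Path


def write_robots_txt(folder, rules_text):
    """Save the rules as a plain text file named robots.txt in folder."""
    target = Path(folder) / "robots.txt"
    target.write_text(rules_text, encoding="utf-8")
    return target


# Demo in a throwaway folder; swap in the folder you upload from.
with tempfile.TemporaryDirectory() as folder:
    saved = write_robots_txt(folder, "User-agent: *\nDisallow: /admin/\n")
    print(saved.name)
    print(saved.read_text(encoding="utf-8"))
```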
Thanks so much for watching this video. If you liked it, please leave a like below. Let me know in the comments how you’re getting on optimizing your robots.txt file or whether you think these technical areas of SEO are more difficult to understand. More important than that, don’t forget to subscribe. Check out the other content on my channel, and I’ll see you guys on my next video.