Nayem IT — https://www.nayemit.com/2021/11/robots-txt-generator-online.html
Robots txt Generator Online
What is robots.txt?
Robots.txt is a plain text file that contains instructions for web crawlers. It specifies which parts of a website crawlers are permitted to visit. Rather than listing the pages that may be crawled, the file names the locations that must not be. With this simple text file you can quickly exclude an entire domain, complete directories, one or more subdirectories, or individual files from search engine crawling. Note, however, that the file does not protect against unauthorized access.
Robots.txt is stored in the domain's root directory, which makes it the first document a crawler opens when it visits your site. The file does more than regulate crawling: you can also reference your sitemap in it, giving search engine crawlers an overview of all of your domain's current URLs.
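As a minimal sketch, a robots.txt file at the domain root might look like this (the domain, path, and sitemap URL below are illustrative, not taken from any real site):

```
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
```

This tells every crawler to stay out of the /admin/ directory and points it to the sitemap for an overview of the site's URLs.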
How robots.txt works
The Robots Exclusion Protocol (REP), also known as the Robots Exclusion Standard, was published in 1994. Under this protocol, search engine crawlers (user-agents) first look for the robots.txt file in your site's root folder and read the instructions it contains; only then do they begin crawling and indexing your website. Because robots read the robots.txt file and its instructions case-sensitively, the file must be located directly in the root folder of your domain and its name must be written in lowercase letters. Unfortunately, not all search engine robots adhere to these guidelines. The most popular search engines, including Google, Bing, and Yahoo, do: their crawlers strictly follow the instructions in the REP and robots.txt files.
Which instructions are used in robots.txt?
Your robots.txt file must be saved as a UTF-8 or ASCII text file in the root directory of your website, and there can be only one file with this name. It comprises one or more logically organized rule sets. The rules (instructions) are processed from top to bottom, and upper and lower case letters are distinguished.
The following terms are used in a robots.txt file:
- user-agent: names the crawler the rules apply to (crawler names can be found in the Robots Database).
- disallow: prevents crawling of certain files, directories, PDFs, or web pages.
- allow: overrides disallow and permits crawling of the listed files, web pages, and directories.
- sitemap (optional): points to the location of the sitemap.
- *: a wildcard standing for any number of characters.
- $: marks the end of a URL.
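The directives above can be combined into one file; the following sketch uses a hypothetical domain and paths to show each of them in use:

```
# Rules for all crawlers
User-agent: *
Disallow: /tmp/
Disallow: /*.pdf$
Allow: /tmp/overview.html

# Rules for Google's crawler only
User-agent: Googlebot
Disallow: /no-google/

Sitemap: https://www.example.com/sitemap.xml
```

Here `/*.pdf$` blocks every URL that ends in .pdf (the `*` matches any characters, the `$` anchors the match to the end of the URL), while the allow rule re-opens a single page inside the otherwise disallowed /tmp/ directory.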