Web Scraping

Learning Goal: I’m working on a web scraping presentation and need guidance to help me learn.
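1. Make sure there is an index file in every navigable folder. The file name index.html is the conventional default name for a site’s home page. Most web servers are configured to serve a file with this name (or a close variant such as index.htm) when a visitor requests a folder rather than a specific file, so naming your home page index.html lets visitors reach it by typing only your site’s address. The default file name can be changed in the server configuration, but unless you control that configuration, treat this as the one page of your website where the naming is not optional. All other pages can and should be given unique names that identify them.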
When a browser requests a folder on a website, the web server looks for the index file (named “index.html” or “index.htm”) and sends it back. If the server can’t find an index file, it will, depending on its configuration, either return an error or display a list of all the files in that folder. At best, such a listing is a bit ugly and, at worst, it is a security risk! So, make sure that you put an index file in every folder that a user can navigate to on your website.
2. Use all lowercase names when naming your files. File names on Unix systems are case-sensitive (and many web servers are Unix-based), so a link to “About.html” will not find a file named “about.html”. Using all lowercase letters when naming your documents avoids this kind of broken link. It’s a good habit to write HTML tags in lowercase letters as well!
3. Reserve separate folders for image, script, and CSS files. It’s a good idea to keep all of your images in a single folder (named “images”, for example), and to do the same for any script and CSS files. If you have only a few image files, you can keep them all in one folder. As your website gets bigger, and you are working with more and more images, you’ll want to make sub-folders within your image folder, splitting them up by category.
4. Categorize your web pages into folders. If you have a small website (ten pages or fewer), you may be able to get away with putting all of your HTML files in the main (root) folder. But if your website is larger than that, you’ll want to split up your files into separate folders. Organize them in a way that makes sense to you by grouping similar pages together in their own folders. For example, all of your product pages can go into one folder, your artwork pages can go into another, and so on.
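
Tips 1 and 2 can be checked automatically before a site is uploaded. The following is only a minimal sketch in Python, not an established tool: the function name audit_site and the sample folder layout are hypothetical, and it assumes the site lives in a local folder and that the server uses “index.html” or “index.htm” as its default file names.

```python
import os
import tempfile
from pathlib import Path

# Assumed default index names; a real server may be configured differently.
INDEX_NAMES = {"index.html", "index.htm"}


def audit_site(root):
    """Return (folders_missing_index, names_with_uppercase) for the tree under root.

    Paths are reported relative to root, using forward slashes.
    """
    root = Path(root)
    missing_index = []
    uppercase_names = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit sub-folders in a predictable order
        rel_dir = Path(dirpath).relative_to(root)
        # Tip 1: every navigable folder should contain an index file.
        if not INDEX_NAMES.intersection(filenames):
            missing_index.append(rel_dir.as_posix())
        # Tip 2: file and folder names should be all lowercase.
        for name in sorted(dirnames + filenames):
            if name != name.lower():
                uppercase_names.append((rel_dir / name).as_posix())
    return missing_index, uppercase_names


if __name__ == "__main__":
    # Build a small throwaway site (hypothetical layout) and audit it.
    with tempfile.TemporaryDirectory() as tmp:
        site = Path(tmp)
        (site / "images").mkdir()
        (site / "products").mkdir()
        (site / "index.html").write_text("<h1>Home</h1>")
        (site / "About.html").write_text("<h1>About</h1>")
        (site / "products" / "index.html").write_text("<h1>Products</h1>")
        (site / "images" / "logo.png").write_bytes(b"")

        missing, uppercase = audit_site(site)
        print("Folders without an index file:", missing)
        print("Names with uppercase letters:", uppercase)
```

In this sample layout the “images” folder is reported because it has no index file, and “About.html” is reported because of its capital letter. Whether an asset folder such as “images” really needs its own index file depends on whether your server lets visitors browse to it.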