This is the fourth of five posts on the topic of conducting your own site reviews. In the previous posts, we discussed why you’d want to perform a site review (Part 1), took an initial look at page-level issues (Part 2), and then discussed site-wide issues (Part 3) that can affect site performance for users and search engine ranking. In this post, we continue our look at site-wide issues that should also be examined in a site review.
Using HTTP redirects
When you remove, rename, or relocate a previously published webpage on your web server, do you implement an HTTP redirect on your server to help users find the content they are looking for? If not, you should. Otherwise, any inbound links from external sites directed toward those old pages will no longer work. Users navigating to your site through those links will likely move on, and any search engine ranking based on that old page will be lost as well.
Many webmasters mistakenly use 302 redirects by default. These are strictly intended for temporary moves (such as sending users to an interim page when an inventory item is sold out); permanent moves call for 301 redirects. The difference between 302s and 301s matters to search engines because they apply that information to the pages in their indexes. When a 302 is used, the ranking status of the original page is retained by the old URL in the index, whereas a 301 transfers the ranking status of the old page to the new URL. So when a page is permanently removed, renamed, or relocated and a 302 is used, the new version fails to inherit the rank status of the previous page. To optimize your redirects, always use 301s for permanent changes to your site.
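To make the distinction concrete, here is a minimal Python sketch that picks the status code for a moved page. The paths, the MOVES table, and the resolve helper are all hypothetical names made up for illustration; on a real site you would normally configure redirects in your web server rather than write them by hand.

```python
from http import HTTPStatus

# Hypothetical table of moved pages: old path -> (new path, is the move permanent?)
MOVES = {
    "/old-catalog.htm": ("/catalog.htm", True),              # renamed for good
    "/widgets/blue.htm": ("/widgets/sold-out.htm", False),   # interim page while sold out
}

def resolve(path):
    """Return (status code, Location header value or None) for a requested path."""
    if path not in MOVES:
        # No redirect needed (this sketch assumes the page exists).
        return HTTPStatus.OK, None
    target, permanent = MOVES[path]
    # 301 for permanent moves, 302 only for temporary ones.
    status = HTTPStatus.MOVED_PERMANENTLY if permanent else HTTPStatus.FOUND
    return status, target
```

The point of keeping the "permanent" flag explicit is that the 301-versus-302 decision is made once per move instead of being left to a server default.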
Site review task: Review the way HTTP redirects are used on your site. If any permanent redirects are being used, be sure they are configured as 301s.
Empowering canonicalization
While we briefly touched on this topic in Part 3 in our discussion on linking, let’s finally dig into a review of how you are linking to your internal pages and whether you’re being consistent in your methodology. The issue is URL canonicalization, aka determining the standard URL for a given page (especially the site’s home page). This is important because each variation of a URL used to refer to your site is individually tracked and ranked by search engines. When you allow multiple URLs to be used as valid inbound links from external sites to the same page (and/or are inconsistent in how you form URLs in your intra-site linking), the URL variations become duplicate content that the search engine has to repeatedly crawl and then potentially index. To avoid diluting the ranking value for a page among many URL forms, consolidate all of the alternative forms into the primary URL for the page. This process is called canonicalization.
For example, many sites allow you to reach their home page by either including or omitting the subdomain prefix “www.”, including or omitting the home page’s filename, and more. The following URLs, all of which are considered separate locations to search engines, usually lead to the same page:
The possible permutations of these variations can be quite numerous, with each one earning its own rank value in the search engine index. But you can resolve this rank redistribution riddle. While you can’t control what URL form other webmasters use in their outbound links to your site, you can use 301 (permanent) redirects to aggregate all possible alternative URLs to reach (and thus funnel that inbound “credit” to) the one primary URL for your site’s home page.
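As a rough sketch of that aggregation, the Python below maps a few URL variations onto one primary form and reports when a 301 would be needed. Everything specific here is an assumption for illustration: the example.com host names, the choice of the “www.” form and the http scheme as primary, and the list of default file names.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration only: the primary host, its aliases,
# and the default document names are hypothetical.
CANONICAL_SCHEME = "http"
CANONICAL_HOST = "www.example.com"
HOST_ALIASES = {"example.com", "www.example.com"}
DEFAULT_DOCUMENTS = {"index.htm", "index.html", "default.aspx"}

def canonical_url(url):
    """Map a URL variation onto the primary form, or None for another site."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in HOST_ALIASES:
        return None
    directory, _, file_name = parts.path.rpartition("/")
    if file_name.lower() in DEFAULT_DOCUMENTS:
        # Drop the default file name so "/index.html" and "/" become one URL.
        path = directory + "/"
    else:
        path = parts.path or "/"
    return urlunsplit((CANONICAL_SCHEME, CANONICAL_HOST, path, parts.query, ""))

def redirect_target(url):
    """Return the URL to 301-redirect to, or None if no redirect is needed."""
    target = canonical_url(url)
    return target if target and target != url else None
```

A request for http://example.com/index.html would be redirected to http://www.example.com/, while a request that already uses the primary form is served as-is.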
Run this test: Start your browser and open up three session tabs for each of your favorite search engines (don’t forget Bing!). In the first tab’s search text box, type the following:
SITE:<YourDomain>.com
specifying just your root domain name and its associated top-level domain (TLD), but omitting the “www.” subdomain prefix or any other subdomain name. This query searches for all pages in the index from the entire domain (including all subdomains). Next, in the second tab for the same search engine, type:
SITE:www.<YourDomain>.com
which includes the “www.” subdomain. This query specifically looks for indexed pages associated with the “www.” subdomain. Then in the third tab for the same search engine, type this:
SITE:<YourDomain>.com -SITE:www.<YourDomain>.com
The last search reruns the first query while using the second query as an exclusion filter. This filter removes all indexed results that include the “www.” subdomain in the URL. If you do get results in the last test (which can include URLs in subdomains other than “www.”), compare the search results in detail to see if both the “www.” and non-“www.” variations of your site’s URLs are in the index (as well as other URL variations as listed above). If so, you’re allowing the hard-earned rank for your pages to be diluted between multiple URLs. You need to canonicalize your site to consolidate the URL variations in the search engine so your pages can earn the highest possible rank for the canonical URL form.
While you’re at it, take a look at what URLs you are using in the cross-links between the pages of your own site. Do they consistently use the same singular URL form to each page? I recommend using absolute rather than relative links, which are optimal for continuing to build up that aggregated canonical credit for your site’s pages. For more information on canonicalization, see the blog posts Making links work for you (SEM 101) and Optimizing your very large site for search – Part 1.
Site review task: Check to see if your site allows multiple URL forms to access the same page. If so, determine which URL form you want to be primary and then implement 301 redirects for all possible variations to that primary URL. Also, check your intra-site links to be sure the URLs to other pages on your site are consistently formatted the same way.
Don’t we all want validation (of our coding)?
Is your HTML source code valid? Just because it displays more or less correctly, are you sure it’s solid? (Some browsers are much more tolerant of HTML coding errors than others are, so you may not actually see the problems. However, search bots are typically not as forgiving as those tolerant browsers, which is why this issue is important.) Errors in your page source code can have a detrimental effect on your page rank if the search engine doesn’t understand and thus can’t effectively crawl your code. For example, if you didn’t properly format the <head> tag code discussed in Part 2, all of the work you put into enhancing its content for keyword usage could be for naught. Test your HTML source code with the validation tool in your webpage development environment or at a public resource such as the W3C Markup Validation Service page. You might be surprised at the number of problems found, but the details provided usually make them easy to resolve.
While you’re at it, does your site comply with Section 508, the U.S. federal accessibility requirements for users with disabilities? While non-compliance may not directly affect your page ranking, improved compliance may increase the number of visitors to and participants in the activities on your site. Hey, it can’t hurt!
Site review task: Run a source code validator on all of your web page files and make corrections. Check your site for Section 508 compliance.
How about content validation?
Lastly, have you ever run a spell check on your content? Misspelling a keyword will never help you improve your ranking for that term. Is your content logically organized? Do you have content split up into logical segments? If not, maybe you need to reorganize the content of your pages. Pages that started out small but have grown over time into long pages are ripe candidates for this consideration in your site review. For more information on content architecture, check out our blog post Architecting content for SEO (SEM 101).
And horrors that be: do you have broken links? While it can be a drag to have to constantly recheck the validity of your site’s outbound links when you have a ton of links, it does matter. And there’s no excuse for broken links between pages of your site! Submit your pages to the W3C Link Checker for validation. If you’ve already registered to use the Bing Webmaster Center tools, log on and use the Crawl Issues tool (with the File Not Found (404) issue type) to get a report on all of the broken links on your site (up to 1,000 line items in a downloadable report).
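If you would rather script a quick check yourself, the Python sketch below shows the general idea: collect the links on a page, then flag the ones that come back as 404. It is not how the W3C Link Checker or the Crawl Issues tool works; the class and function names are hypothetical, and the get_status callable (which you would back with a real HTTP request) is left as a parameter so the sketch stays self-contained.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect the href of every anchor tag in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

def broken_links(page_html, base_url, get_status):
    """Return the links whose status, as reported by get_status(url), is 404."""
    collector = LinkCollector(base_url)
    collector.feed(page_html)
    return [url for url in collector.links if get_status(url) == 404]
```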
Site review task: Check the content on each page for spelling errors, overly long pages that should be split up, and the validity of the links.
Custom 404 error pages
What do visitors see when they type in an incorrect URL or follow a broken, inbound link to your site? If the URL points to a non-existent page on your site (when no redirect is in place), your web server will return an HTTP error code 404, which presents a generic File Not Found screen in your browser. It’s almost guaranteed that a user will abandon any further attempt to use your site, which is a lost opportunity for you.
Instead of losing that potential conversion, implement a custom 404 error page for your site. As long as visitors get to your domain’s web server, you can create a better experience for them with a custom “File not found” error page that offers basic information about your site’s content, is wrapped in your site’s page design theme template, and provides your primary site navigation scheme so they can continue their search. For more information on developing a custom 404 error page, see the blog article Fixing 404 File Not Found frustrations (SEM 101). Note that you can implement both HTTP redirects for known page moves and a custom 404 page for all other broken links when no redirect is in place.
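Here is a bare-bones Python sketch of what such a page might return. The NAVIGATION entries and the not_found_response helper are hypothetical, and a real page would use your site’s full template. One detail worth noting, which is my own addition rather than something covered above: the response keeps the 404 status code, so the friendly page is not mistaken for real content.

```python
from html import escape

# Hypothetical primary navigation: link text -> URL.
NAVIGATION = {
    "Home": "http://www.example.com/",
    "Site map": "http://www.example.com/sitemap.htm",
}

def not_found_response(requested_path):
    """Return (status, HTML body) for a path with no page and no redirect."""
    nav = "".join(
        f'<li><a href="{escape(url)}">{escape(text)}</a></li>'
        for text, url in NAVIGATION.items()
    )
    body = (
        "<h1>File not found</h1>"
        # Escape the requested path so it is safe to echo back to the visitor.
        f"<p>We could not find {escape(requested_path)} on this site.</p>"
        f"<ul>{nav}</ul>"
    )
    return 404, body
```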
Site review task: If you have not set up a custom 404 error page for your site, do so.
Page-level web spam techniques
Are you using page-level web spam techniques in an effort to deceive the bot into giving you a higher-than-deserved ranking? The search bots, as they crawl the Web on a daily basis, see every conceivable web spam technique. The indexing algorithms are constantly updated to reflect this exposure to attempted deception, and when web spam is encountered, penalties usually ensue for the offending site. SEO is hard work. Providing legitimate ranking is important to search engines because it’s important to our customers. Any detected efforts to cheat the system are definitely frowned upon, and for webmasters who use these techniques, they ultimately produce the exact opposite of the original goal: poor ranking (if not outright expulsion from the index). For more information on page-level web spam, see the blog post The pernicious perfidy of page-level web spam (SEM 101).
Site review task: Review your site for any use of web spam techniques and, if found, remove them from your site. If your site has already been penalized for such usage, after you clean up the offending issues and republish your site, you can then request reconsideration of the penalty by following the instructions listed at the end of the blog article The liability of loathsome, link-level web spam (SEM 101).
Got malware?
One of the best ways to sink your site’s value to users is to allow it to be infected with, or to link out to, malware. Now most webmasters are certain that their sites are clean (and likely, most of those are). But malware infections aren’t always intentional on the part of the webmaster. Web servers can be hacked and malware stealthily deployed (typically the “drive-by” form of malware), despite the best intentions of a site’s webmaster. And malware infections not only affect the host site, but also the sites that have outbound links to it. Of course, it’s the visitors to the infected site who pay the price (although the infected site faces repercussions, too: when visitors realize they picked up malware at that site, they will likely never return there to do business, and will likely spread their warnings about the site far and wide).
When a site is infected with malware, search engines that scan for malware will detect it and warn their users about the threat in the search engine results page (SERP) listing. This warning extends to all pages that have outbound links to infected sites (so while your site may technically be clean, an infected link on your page may still generate a malware warning in the SERP). Given that webmasters typically have no control over the security of the sites they link to, this accessory-to-the-crime affiliation with malware-infected sites is unfortunate, but it’s appropriately done to protect search engine users from infection.
To see if your site has been identified as infected with malware by the Bing crawler, log on to your account in Webmaster Center tools, click the Crawl Issues tool, and then select the Malware Infected issue type. If your site does contain malware, you’ll see a list of which pages are infected. To see if you are linking out to other sites that are identified as malware infected, in Webmaster Center, click Outbound Links, and then Show only outbound links to malware. From either tool, if you get positive results, you can download a CSV file with the name of the affected pages and other information to help you clean up the mess.
We published a detailed series of blog posts on malware and website security issues. For more information on identifying a malware infection and its implications, see The merciless malignancy of malware Part 1 (SEM 101). For likely website malware attack vectors and information on how to clean up infections, see The merciless malignancy of malware Part 2 (SEM 101). For my Top 10 list of recommended security strategies for avoiding malware infections (split into two parts!), see The merciless malignancy of malware Part 3 (SEM 101) and The merciless malignancy of malware Part 4 (SEM 101).
Site review task: Check your site for possible malware infections and clean up any detected problems. Implement recommended security measures to reduce the likelihood of future malware attacks.
If you have any questions, comments, or suggestions, feel free to post them in our SEM forum. Coming up next: In the last post of this series, we’ll look at some architectural issues that can help your site with the search bot. Until then…
– Rick DeJarnette, Bing Webmaster Center
Let’s continue our run-down of issues to consider in a site review. In Part 1 of this series, we looked at the whats and whys for doing a site review, and covered baselining pre-optimized performance and gathering tools. Part 2 covered important but often overlooked on-page issues that, if not properly addressed, can prove detrimental to a site’s performance, both for search engine ranking as well as for usability and discoverability for users. In this post, let’s examine site review issues that are more site-wide in scope.
What is the meaning of this (file name)?
Look at your URLs – what do they say about the page’s content? Do you use human-friendly page file names or globally unique identifier (GUID)-based gibberish? While bots may have no particular grievance with irrelevantly named pages in URLs, you may be missing an opportunity. Using one or two of your targeted keywords in a file name can help associate those words to your page in the eyes of the search bot. Of course, if you simply crunch those keywords all together, that might not be as helpful as you want. CamelCasing words in file names is of no value to SEO, and the ability to parse individual keywords out of a concatenated string of letters may be beyond the reliable ability of a bot. So how do you assist with the parsing process so that you get full value of any keywords used in a file name (and thus in a URL)?
Using underscores in a file name is not necessarily a good idea. Old style programming code used underscores as a concatenating device, so while an underscored name may be parsable to the human eye, that may not be the case for the search engine. Besides, since it’s common practice to format hyperlinks in text with underlines, underscore characters in link text may mistakenly appear to be spaces. And forget about using space characters. But there is a reasonable solution you can use.
Adding hyphens between words in a file name, as long as this technique is used in moderation, is perfectly acceptable. Moderation is suggested because of the inclination of some folks to push on boundaries well into the realm of web spam. But when used in moderation, hyphens parse individual words just fine for both human readers and search bots. You might even try this for your next domain name (but shorter is usually better here, so using any more than one or two hyphens should indicate that the webmaster should rethink the proposed domain name in a shorter version).
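For example, a small Python helper along these lines could turn a page title into a hyphenated file name. The function name, the .htm extension, and the five-word cap are my own assumptions; the cap is just one way to enforce the moderation recommended above.

```python
import re

def hyphenated_file_name(title, extension=".htm", max_words=5):
    """Build a lowercase, hyphen-separated file name from a page title."""
    # Keep only runs of letters and digits; punctuation and spaces become breaks.
    words = re.findall(r"[a-z0-9]+", title.lower())
    # Cap the word count so the name stays short and doesn't look like spam.
    return "-".join(words[:max_words]) + extension
```

With this sketch, the title “Collecting Ancient Roman Coins” becomes collecting-ancient-roman-coins.htm.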
Site review task: Review the file-naming scheme used for page files to see if there are opportunities to use your targeted keywords. To consider the use of keyword phrases, look into employing hyphens to parse the text into discrete keywords.
Absolutely ban relative links
Are your intra-site links relative or absolute? Either form technically works just fine in the eyes of a search engine, but the format of your links can make a big difference to you for other reasons. Content security alone may be reason enough for some folks to go with absolute links.
The difference between the two forms is whether or not the reference to the web server of origin is always maintained within the link URL. Relative links, as used in the sample href attribute of the anchor tag — “/sales/today.htm” — assume the server’s domain name and only provide the portion of a link’s path beyond the website’s root (if even that much of a path), whereas absolute links provide the full URL in the link.
The use of relative links can become a problem when page content is used out of the context of the original webpage (thus from a different URL). When used out of their original context, relative links will be broken and the source content they refer to will not be available. This can be a problem when users cut and paste content from your site into another document, such as an email or, sadly, screen scrape it into their own website. If you use absolute links, the links will never be broken. Interestingly, it’s pretty common that folks who are too lazy to create their own content (meaning those who instead screen scrape – aka steal – that content from others) will also be too lazy to check the inline links in that content. It’s a bit of poetic justice to at least get inbound link credit from those plagiaristic sites back to your own work! (And in one case recently, our team used the absolute links to Site A embedded in content found in Site B as evidence for Site A’s claim that Site B had stolen their original content. As a result, the offending Site B was penalized for improperly duplicating the content!).
To be sure your links always refer back to your website and your linked content is always available (and thus not context-dependent), use the entire URL as the link path — an absolute link. Just remember this one caveat: if you move a content page to a different directory on your site, you’ll need to update all of your hard-coded URL links going to that page (of course, the outbound links from that moved page will remain valid!).
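The context dependence is easy to demonstrate with Python’s standard urljoin function, which resolves a link the same way a browser does. The page URLs below are hypothetical; the relative path is the sample one used earlier.

```python
from urllib.parse import urljoin

def resolved_link(page_url, href):
    """Resolve a link as a browser would when the content lives at page_url."""
    return urljoin(page_url, href)

# Hypothetical locations: your own page, and a copy of it on someone else's site.
original_page = "http://www.example.com/sales/index.htm"
copied_page = "http://scraper.example.net/copied/page.htm"

relative = "/sales/today.htm"
absolute = "http://www.example.com/sales/today.htm"

on_original = resolved_link(original_page, relative)   # points at your site
on_copy = resolved_link(copied_page, relative)         # now points at the other site
absolute_on_copy = resolved_link(copied_page, absolute)  # still points at your site
```

The relative link silently changes its target when the content moves, while the absolute link resolves to your page no matter where the content ends up.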
Absolute links can also be better for helping establish the preferred URL for your site (known as canonicalization). Some webmasters use multiple domain names with fully populated, identical content pages, which can lead to duplicate content confusion and ranking dilution. Always using absolute links to the primary source URL across all of those domains will contribute to canonicalizing the content. In addition, using relative links when you have multiple URLs pointing to the same content, such as with HTTP: and HTTPS:, can also lead to duplicate content confusion for the search engine (we’ll get deeper into canonicalization issues later on in this series).
Site review task: Review your intra-site links, including your site navigation scheme, to be sure they are formatted as absolute links.
Getting from here to there within your site
How do users navigate between the pages on your site? Do you provide a clearly understandable intra-site navigation scheme? Does your site navigation rely on scripted processes or linked images (neither of which the bot can see)? Is every page on your site linked to at least one other page? Do you provide an HTML sitemap page linking to every page of your site? These are some of the questions you need to ask about your site.
Remember back in Part 2 of this series when you were asked to look at your site with a text-only browser (either configuring your browser to disable images, script, and technologies like Silverlight and Flash, or using a tool like SEO-Browser) to see your site the way that search bots do? Repeating this exercise here will reveal whether your current intra-site navigation scheme works to the benefit or the detriment of search crawlers (and thus to indexing potential). Can you navigate to other pages on your site in this view? If your site uses images as links, do they include keyword-rich alt text attributes? If your site’s navigation design accounts for down-level users, that’s also very useful to search bots. Otherwise, unless your other pages receive deep, inbound links from external websites, they may never be discovered by the bots. And if they can’t be seen, they can’t be crawled and indexed.
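One piece of that review, finding image links with no alt text, is easy to script. The Python sketch below uses the standard html.parser module; the class and function names are hypothetical, and it only checks that alt text exists, not whether it is keyword-rich.

```python
from html.parser import HTMLParser

class ImageLinkAudit(HTMLParser):
    """Record images used as links that have no (or empty) alt text."""

    def __init__(self):
        super().__init__()
        self.open_anchors = 0
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.open_anchors += 1
        elif tag == "img" and self.open_anchors:
            attributes = dict(attrs)
            # A missing, valueless, or blank alt attribute all count as missing.
            if not (attributes.get("alt") or "").strip():
                self.missing_alt.append(attributes.get("src"))

    def handle_endtag(self, tag):
        if tag == "a" and self.open_anchors:
            self.open_anchors -= 1

def image_links_missing_alt(page_html):
    """Return the src of each linked image that lacks alt text."""
    audit = ImageLinkAudit()
    audit.feed(page_html)
    return audit.missing_alt
```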
Site review task: Review your intra-site navigation scheme for down-level visibility, comprehensiveness, and keyword usage.
Link to yourself (when appropriate)
Do you have written content in the body text of your pages that references the content found on the other pages of your site? You should. That way, you can find logical ways to reference that content and then link to it inline within your pages. Optimally those inline body text links will use the relevant, targeted keywords you established in Part 2 (see how I did that? You should do the same).
One thing to note, however, when creating links to content that is of no value to searchers, such as sign-in pages, shopping carts, “print this page” links, and the like: add the rel="nofollow" attribute to the anchor (<a>) tag of such links to prevent the crawling and possible indexing of valueless content. Make every link count, and exclude those that don’t.
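If your pages are generated by code, this rule can be applied automatically. The following Python sketch builds an anchor tag and adds the attribute for a hypothetical list of path prefixes; the prefixes and the anchor_tag helper are assumptions for illustration, so substitute whatever sections of your own site hold content you don’t need indexed.

```python
from html import escape
from urllib.parse import urlsplit

# Hypothetical path prefixes for pages of no value to searchers.
NO_VALUE_PREFIXES = ("/signin", "/cart", "/print")

def anchor_tag(href, text):
    """Build an anchor tag, adding rel="nofollow" for valueless targets."""
    path = urlsplit(href).path
    rel = ' rel="nofollow"' if path.startswith(NO_VALUE_PREFIXES) else ""
    return f'<a href="{escape(href)}"{rel}>{escape(text)}</a>'
```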
Site review task: Review your content pages for relevant opportunities to link inline to other pages on your site. Use the rel="nofollow" anchor tag attribute for links to content you don’t need indexed.
Let relevance drive your outbound linking strategy
Links matter. Links to other pages represent a de-facto endorsement of the page to which you are linking. However, the relevance of the pages you link to matters as well. If the theme of your site is about collecting ancient coins, would it make sense for you to link out to dozens of sites that promote herbal body enhancements and instant college diplomas? (No.) It’s not likely this’ll make sense to your users, and thus it doesn’t make sense to search engines, either (this holds true for inbound links coming from those irrelevant sites, too). Outbound links are good things to have, but they should be relevant to the subject of the pages you have them on. If hundreds or even thousands of non-relevant links appear on a website, this can look like link-level web spam, and that’s not a good thing. For more information on link-level web spam, see the blog post The liability of loathsome, link-level web spam (SEM 101).
Remember: if the outbound link is valuable and relevant to your users, it’ll be considered valuable to search engines.
Site review task: Check your external, outbound links for relevance.
Gimme some inbound links (but banish link spam)
We previously discussed that outbound links on your site represent your endorsement of the sites you link to, so they are often considered to carry value with search engines. Inbound links from other sites to yours carry value in the same way. And, like outbound links, the relevance of the links does matter. Some webmasters try to quickly (aka fraudulently) build search engine ranking cachet by buying inbound link value from spammy link farms.
Paid link farms aren’t helpful here, and in fact, usually prove to be detrimental to ranking if the link farm is determined to be a purveyor of web spam. To truly create value for your site, you need to do the hard work of creating valuable content for your target audience, followed by evangelizing your site to that targeted community of users with whom you wish to connect. Ask for links back to your site from webmasters running legitimate and respected sites on the same subject matter. That way you build up more and better legitimate (aka organic) inbound links. These will prove to be the most valuable links you can get.
For more information on earning valuable inbound links, see the blog post Link building for smart webmasters (no dummies here) (SEM 101).
Another way to get more inbound links is to consider developing a Pay-Per-Click search advertising campaign. This can earn your site instant visibility in the search engine results pages (SERPs) when your organic ranking is not yet up to snuff. To be clear, buying PPC ads will have no effect on your site’s organic ranking, but PPC ads do show up on Page 1 of the SERP, and if your ultimate goal is to increase conversions, well-designed PPC ads can contribute mightily to that effort. For more information about starting a PPC ad campaign in Bing, check out Microsoft Advertising.
Site review task: Check your inbound links to see if they come from paid link farm services. If you have inbound links from a non-relevant link farm or link exchange, which are typically seen by search bots as deceptive attempts to artificially elevate the rank of a website, disengage from that service and check to see if any penalties have been implemented against your site. For more information on penalties, see the blog post Getting out of the penalty box.
If you have any questions, comments, or suggestions, feel free to post them in our SEM forum. Coming up next: We’ll continue our look at some site-wide issues that interfere with optimal ranking. Until next time…
– Rick DeJarnette, Bing Webmaster Center
The Bing Webmaster Center team has been very busy lately, working on very cool stuff that we can’t wait to share with you (patience, Grasshopper – all will be revealed in time). But the blog waits for no one (well, that’s the intent, anyway). From time to time, we gather up enough interesting tidbits of Q&A that we want to share with all of our blog readers. Now it’s that time again. So let’s get to it.
Q: I’m not able to gain access to Webmaster Center with the authentication code used in a <meta> tag. Can you help?
A: The Webmaster Center online Help topic Authenticate your website recommends using a <meta> tag formed as follows:
<meta name="msvalidate.01" content="0123456789ABCDEF0123456789ABCDEF" />
However, some users attempt to combine the authentication codes for multiple sites in a single <meta> tag. If you must use the <meta> tag method of authentication (as opposed to the XML file authentication method as described in the Help topic), we recommend placing your Bing Webmaster Center authentication code last so that it is not followed by a space. In addition, Webmaster Center does look for the proper XHTML-based closing of the <meta> tag (a space followed by "/>"), so be sure to use this closing in your code.
This issue is discussed further in the Webmaster Center forum topic Site Verification Error for Bing Webmasters Tools.
Q: Why do I have to register as a user in the Webmaster Center blog just to post a comment?
A: We were getting a few non-registered visitors who were posting way too much spam in the blog comments. We needed to block that junk from being posted, so we implemented a new rule that requires folks to register before they can leave comments. Since we can control spam from registered user accounts, we felt this was the best course for minimizing the disruption of irrelevant comments. We hope this is not a hardship on anyone!
Q: I’ve posted two random blog comments requesting inclusion of my site into Bing News Service. Why haven’t you added my site?
A: Let’s redirect those requests to the right place. To request that Bing add your news site to our list of news sources, we ask that you send the request via email to the Bing News Service team. Please be sure to identify yourself, your URL, what types of news you provide, your audience, and any other determining factors such as awards won, etc.
Q: I have a very complicated or specific question to ask about my site and the Bing index. Can you answer it here?
A: Blog comments are best used for furthering the conversation about the associated blog article. Specialized service requests or specific questions about Bing products and services requiring detailed, individualized answers are always better left in the Bing Webmaster Center forums as a starting place. We have a forums administrator on staff who, along with the regular VIP contributors there, can offer helpful advice and insight to your questions. There are some amazing folks participating over there!
Q: How do I get my company listed in the Bing local listings?
A: Use the Bing Local Listing Center form. You may need to sign in to your Webmaster Center account or create a new sign-in account to access this form.
Q: How can I ensure that my local business contact information (address and phone number) from my website gets into the Bing index?
A: One common problem we see with this is that some sites rely solely upon an image containing text to convey this information. This is not good practice for SEO. If you want to be sure MSNBot (or any other search engine bot) can read such information, please add it to your website as text (the image is OK as long as the text version also exists)!
Q: Your recent posts on web spam have brought up a question: how do I report web spam that I find in search engine results pages to Bing?
A: To report web spam sites, we recommend that you go to the Bing Support web form to file the complaint. In the Problem list, select Content Removal Request. In the resulting list box, select Other. In the comments text box, include specific and detailed information in your report. Complete the rest of the form and then click Submit.
A member of the Bing web spam team will review the report and investigate the matter. If the report is accurate, appropriate action will be taken. Note that if the report is malicious and false, no action will be taken against the accused site.
Q: My website offers tax-related services. As a result, I use the word “tax” numerous times in my content. Could Bing consider my site to be web spam due to the appearance of keyword stuffing? When do I cross the line from acceptable to web spam?
A: The key here always comes back to how the content appears to the human reader. Is it logical? Is it readable? Does it make sense? In this particular case, the repeated use of the word “tax” in content regarding tax services offered is reasonably expected and thus is fine. In fact, including a solid set of explanatory content that defines these keyword phrases only strengthens the case for reasonably repeating this word. If the use of this repeated word makes contextual sense to the reader and is not a clumsy attempt to stuff the word in where it’s not necessary or helpful, and you have a good amount of supporting content to accompany it, you’ll be fine. Our crawler sees this usage and understands it is legitimate. Just write your content for the reader’s comprehension and the crawler will not penalize you for keyword stuffing.
The important thing to remember is that true web spam often involves multiple issue violations. As such, it typically takes more than one violation to trigger web spam consequences – having a slightly above average number of keywords won’t automatically torpedo your site. Just as you need to do several things well to improve your ranking (build good content, build valuable inbound links, target several keywords, etc.), you need to do several things wrong to really hurt your ranking. That said, if it’s obvious that you are trying to abuse the system, even with just one egregious issue, then penalties will ensue.
Lastly, we don’t define any borderline between acceptable content and web spam. If you think what you’ve done might be considered web spam because you know you’re trying to game the system, then take a different approach to optimizing your pages. I’ll repeat my mantra: write content for the human reader, not the crawler. Develop good, unique content that is readable, understandable, and valuable. If you do this without involving any black-hat, SEO-style trickery in an effort to artificially boost your ranking, then you’ll never have to worry about this being an issue.
Q: Regarding backlinks in forum comments and link-level web spam, is it only a problem when the page linked to is not relevant to the conversation in the forum, or is this a problem for all backlinks?
A: It always comes down to whether the effort is intended to legitimately benefit the human reader or benefit the owner of the link. If the link in a blog comment is relevant to the content in both the blog article and the blog comment and, as an extension of that content, is of value and interest to the reader, then it is not a problem. In fact, this is a fine idea (whether or not the rel="nofollow" attribute is automatically applied by the blog to user-generated links). However, if the link in the blog comment is not relevant to either the blog article or the blog comment’s content, is not of relevant, legitimate interest to the reader, and instead is only beneficial to the link owner, then that is web spam. It’s pretty straightforward.
Also consider how the blog comment link is formed: is it a single link inline with the comment’s content, or is it a bazooka blast consisting of multiple, irrelevant links following a short, generic message that could be applicable to anything (or nothing)? If your goal is to tell the reader about some information relevant to the post and that info is found within good content on your site, that’s great. Add those links! Even if rel="nofollow" is employed by the blog in all UGC-based links, the potential for driving live traffic to your site is good, and if the content there is worthwhile, that will improve public awareness of that content and ultimately be a good link building strategy. But if the comment is merely an excuse for blatant advertising links, it is web spam. Note the difference in intent. If you do right by the reader, you’ll be fine.
If you have any questions, comments, or suggestions, feel free to post them in our SEM forum. Later…
– Rick DeJarnette, Bing Webmaster Center
When I was a kid in high school, I used to go to the public library and do initial research in the Encyclopedia Britannica (yes, the bound book editions. I also remember black & white television with vacuum tubes and rotary telephones! Sheesh, I’m getting old!). I would pick up the index volume that contained the keyword I wanted to look up to identify which of the main volumes had the content I sought.
But imagine this: when I opened up the referenced main volume to the page specified, I always found the content I wanted. I never once went to the content page referenced in the index and found a page full of advertisements, come-ons for dubious physical enhancement pharmaceuticals, or any irrelevant, unwanted garbage like that. That’s how Internet search is supposed to work, too.
Search engine as master index
However, unlike the Encyclopedia Britannica, which maintained sole control over the information it published (thus making its index a really good bet for finding the content you want), the fast and loose world of the Internet is open to all comers, for better or for worse. The upside of that openness is that information of all types, from highly important to trivial (and in all ranges of value, from well-researched reports to skewed opinions to deceptive trash), can be found, but you must know where to look. This is where a search engine’s role as master indexer comes into play. Services like Bing use their own resources to scan the Web for content and organize their findings into a useful index of the available content for users.
But since no one entity has control over the content placed on the Web, the useful and informative website is joined by the unscrupulous huckster, who spends a huge amount of effort to deceive the search engine index in order to bring unsuspecting web searchers to their irrelevant website. This deception is the core of web spam.
Bing and other search engine service providers work diligently to detect web spam-tainted results and keep them out of our search engine results pages (SERPs). It’s a tough battle, and it requires a great deal of work to keep our SERPs useful and legitimate for search customers.
We’ve already discussed the basic definition of web spam and one of the two major implementations, page-level web spam, in previous blog articles. We’ll wrap up this web spam series with a discussion of the other major type, link-level web spam. And finally, we’ll discuss what a webmaster can do to restore their website’s listing (removing penalties) with Bing once the detected web spam has been removed.
Definition of link-level web spam
Link-level web spam uses web link deceptions in an attempt to artificially inflate the page rank of a specific page or site. Savvy webmasters know that earning high quality, relevant inbound links from authoritative sites can have a very positive influence on the search engine’s rank of the linked site – we recently published a blog post on this subject titled, Link building for smart webmasters (no dummies here). This is good search engine optimization (SEO). Some less savvy and/or more unscrupulous folks believe they can simply substitute high quantity for the “high quality, relevant” part of the equation, swap “authoritative sites” for junk or irrelevant sites, and achieve the same goal. Sadly for them, this is not the case.
The intent of the link-level web spammer is to create huge numbers of inbound links (typically from unrelated, low quality sites) to attain illegitimate page rank for a site to fool web searchers into visiting their sites. Luckily, Bing and the other search engines can assess the quality and authority of a particular website.
Sites employing link-level techniques also often employ page-level web spam techniques to make their sites appear to be relevant to a commonly searched keyword when they are not. The use of link-level web spam techniques will cause a search engine to examine your site more deeply, and if it’s determined to be using web spam techniques, your site could be penalized.
As we stated earlier with page-level web spam, some of these techniques can have valid uses at their core, but the intention behind their use is the distinguishing factor. When we detect deceptive intent as we crawl the Web, we identify those pages as web spam and penalize them as appropriate, ranging from neutralization (which levels the playing field for other sites offering content on the same subject) to expulsion from the index. As you can imagine, for an online-based business, these are serious consequences, so it pays to know what NOT to do when you optimize your site for search (or hire a consultant to do the same).
Post web spam
Definition: Post web spam consists of user-generated content (UGC)-based outbound links posted on other websites, such as in guest book pages, forums, blog comments, message boards, and referrer logs.
Problem: The destination links in post web spam are usually unrelated, topic-wise, to the page containing the UGC outbound link. Often these posts include multiple links. In sites that rely on post web spam for inbound links, it is not unusual for a sizable percentage of all of their inbound links to be from post web spam.
What we look for: Several techniques for implementing post web spam are commonly used, including:
What post web spammers don’t often realize is that many UGC-oriented pages automatically append the attribute rel="nofollow" to any links created in UGC content. As such, no inbound link credit is derived when search engines crawl and index these pages.
From a webmaster point of view, however, we encourage active, regular cleaning up (or better yet, preventing) of UGC-based web spam content. If there is too much junk or web spam content on a page, it could reflect poorly on the overall quality of your page, even if you are applying rel="nofollow" to the links. For that matter, it is very important for any site that allows UGC to actively monitor its security. Hosting malware can also get a site penalized, and you don’t want that! For more information on malware, see our blog article series called The merciless malignancy of malware, Part 1, Part 2, Part 3, and Part 4.
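For webmasters who want to keep an eye on their own UGC, here is a minimal Python sketch of one way to audit a single comment. It counts the links in an HTML fragment and lists any that lack rel="nofollow". The names LinkAuditor and audit_comment, and the max_links cutoff of 2, are illustrative assumptions, not anything Bing prescribes.

```python
from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Collects every <a href> in a fragment and notes whether it is nofollow."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (href, has_nofollow) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens, e.g. rel="nofollow ugc"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def audit_comment(html_fragment, max_links=2):
    """Return a small report for one user comment.

    max_links is an arbitrary, hypothetical cutoff for flagging a comment
    for human review; tune it to your own site.
    """
    parser = LinkAuditor()
    parser.feed(html_fragment)
    parser.close()
    followed = [href for href, nofollow in parser.links if not nofollow]
    return {
        "link_count": len(parser.links),
        "followed_links": followed,
        "needs_review": len(parser.links) > max_links,
    }
```

A report with a non-empty followed_links list suggests the comment template is not applying rel="nofollow" consistently, and needs_review only flags the comment for a person to look at. Whether the links are actually relevant to the discussion still takes human judgment.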
Link farming
Definition: A link farm is a large collection of websites that exist for the sole purpose of providing massive numbers of links to targeted websites, ostensibly to improve the appearance of their organic, online popularity.
Problem: Link farming is often employed to promote one website using many other websites or it can be a commercial enterprise in which the link farm sells its unscrupulous (and worthless) outbound linking services to less-SEO-savvy webmasters.
What we look for: Link farming is often implemented using the following techniques:
When link farms are identified, those sites are penalized, which negates the value of the links they contain. In addition, the pages they link to are more likely to be heavily scrutinized for other forms of web spam.
See our earlier blog articles for more information on what makes a good link versus a bad link.
Link exchanges
Definition: Unlike link farms that target a few selected sites, link exchanges are organized groups of websites who participate in providing reciprocal inbound and outbound links ostensibly to benefit all websites in the exchange.
Problem: Web spam-oriented link exchanges typically involve unrelated web sites reciprocally exchanging links en masse for the purposes of rank inflation. As such, they offer no value to human visitors, and thus they are candidates for being considered web spam.
While earning inbound links is a part of legitimate SEO activities, as we’ve stated before, Bing values quality links over quantity of links. Inbound links from sites unrelated to the theme of your site, typical with most link exchanges, will be of little to no value to you for improving your page rank.
What we look for: Link exchanges usually include the following activities:
Note that reciprocal linking is not an automatic red flag. Some websites within a particular niche will link to others when it provides a relevant value to their customers. For example, think of a bed and breakfast that links out to local wineries and a winery that links out to local bed and breakfasts – these are interrelated activities within a region that are naturally relevant for site visitors.
But as usual, too much of a good thing can be bad. And when there is no relevance between linked sites, the value of link exchanges can quickly degrade to the level of web spam (especially when the number of unrelated links is deemed excessive).
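If you want to review your own reciprocal links, a small script can at least list them for you. The Python sketch below assumes you have already gathered, by whatever means, a mapping of each site to the sites it links out to. The function name and the example hostnames are hypothetical. The code only finds the reciprocal pairs and cannot judge relevance.

```python
def reciprocal_pairs(outbound):
    """Return sorted (site_a, site_b) pairs that link to each other.

    outbound maps a site name to the set of sites it links out to.
    """
    pairs = set()
    for site, targets in outbound.items():
        for target in targets:
            # A pair is reciprocal when the target also links back to the site.
            if target != site and site in outbound.get(target, set()):
                pairs.add(tuple(sorted((site, target))))
    return sorted(pairs)


# Hypothetical data: a bed and breakfast and a winery that link to each other.
links = {
    "bnb.example": {"winery.example", "news.example"},
    "winery.example": {"bnb.example"},
    "news.example": set(),
}
print(reciprocal_pairs(links))
```

Each pair the function returns is simply a candidate to look at: ask whether the link serves your visitors, as in the bed and breakfast and winery example, or exists only for rank inflation.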
Penalties and restitution
Mistakes happen. An entrepreneurial do-it-yourselfer optimizes a website based on bad (spammy) advice from the Web. A Mom-and-Pop-shop website owner naively hires an unscrupulous website consultant. Heck, it’s even possible that a search engine might mistakenly label an innocent site as web spam. So what do you do?
If you made a mistake on your site and your rank has been neutralized, the solution is easy. Web spam neutralization is handled automatically with Bing. If you are using web spam techniques on your website and you want to remove the site’s web spam neutralization penalty, eliminate the web spam violations and then republish your website. Once the Bing crawler, MSNBot, recrawls your site, if the web spam violations have been removed, the neutralization will be automatically resolved in the index.
But what if your site has been purged from the Bing index? That requires some manual intervention.
Request reconsideration for your site
If you search for your site in the Bing index using the advanced search keyword phrase site:www.myURL.com (using your URL, of course!) and nothing turns up, your site is not in the index. If this is a sudden change and you know you’ve used some unscrupulous web spam techniques, you’ll need help to get back into the index.
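As a convenience, the site: check described above can be turned into a URL you can open in a browser. This Python sketch simply URL-encodes the query. The function name is hypothetical, and the https://www.bing.com/search?q= address format is an assumption about the public search page rather than something stated in this article.

```python
from urllib.parse import urlencode


def site_query_url(hostname):
    """Build a Bing search URL for the query site:<hostname>."""
    # urlencode percent-encodes the colon in "site:" for use in a query string.
    query = urlencode({"q": f"site:{hostname}"})
    return f"https://www.bing.com/search?{query}"


print(site_query_url("www.example.com"))
```

Swap in your own hostname for www.example.com. The script does not fetch or interpret results; you still read the results page yourself to see whether your pages appear.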
First of all, fix all of the web spam violations on your site. Not just one or two, but all of them. Then, once you’ve republished a corrected version of your website, contact Bing support to request reconsideration of your website’s penalty. Here’s how:
A member of the Bing support team will quickly review your request and schedule your site to be recrawled. If the crawler determines that all of the violations have indeed been resolved, then your site is eligible to be added back into the index. But be patient – this process doesn’t happen overnight (which is why it’s a wise idea to avoid such web spam penalties in the first place).
For more information on Bing penalties and restitution, see the blog article Getting out of the penalty box.
If you have any questions, comments, or suggestions, feel free to post them in our Ranking Feedback and Discussion forum. Later…
– Rick DeJarnette, Bing Webmaster Center
P.S. It was suggested to me that I list the other articles in this web spam series for those who might be interested in reading the entire set, so here goes (in order of publication):
Enjoy!
Rick
In the exciting world of today’s Internet, where the world’s information is literally at your fingertips, where you can endlessly communicate, shop, research, and be entertained, spam is a big downer. The unwanted email spam that fills our inboxes also consumes huge portions of the available bandwidth of our routers and trunk lines. But email is not the only spam game in town.
Web spam is the bane (well, one of the banes) of the search engine and web searcher communities. Search engines want to provide search users with a great experience, helping them find what they want as quickly and as easily as possible. Search users want to use search engines to get the right information they seek as quickly as possible. And webmasters want search users to find their websites, but also to get those search user visitors to become conversions instead of bounces.
Web spam, those unwanted garbage pages that use overtly deceptive search engine optimization (SEO) techniques and contain no valuable content, is a frustration to search engines and search users alike, and ultimately works against the best interests of conversion-seeking webmasters (severely annoying a potential customer is rarely a great sales technique!).
In the previous article that defined web spam and discussed how it is different from junk content, we mentioned that there are two types of web spam. In this article, we’re going to delve into the details of the first type: page-level web spam.
Definition of page-level web spam
Page-level web spam uses on-page SEO trickery (not to be confused with link-level web spam, which we’ll discuss in an upcoming article). Webmasters and optimizers for these sites do this because they believe they can fool the search engines into giving their webpages a higher-than-deserved ranking based on their content relevancy, often for subject areas that are completely unrelated to the site’s actual content. This is done in an effort to deceive searchers into visiting their spammy sites for a multitude of reasons, none of which usually benefit the end user.
The use of the following questionable SEO techniques will cause Bing to examine your site more deeply for page-level web spam. If your site is determined to be using web spam techniques, your site could be penalized as a result.
Note that Bing recognizes that the core concepts behind many of these techniques can have valid uses. No one is saying that their use always and automatically denotes web spam. The issue of intent behind their use is the distinguishing factor for determining whether or not web spam is present and any site penalties are needed. Please understand that, from a search engine perspective, the web spam effort consistently provides very little to no value whatsoever to end users. The entire effort is directed to fraudulently affect search engine rankings. As Martha Stewart might say, that’s not a good thing.
Keyword URL and link stuffing
Definition: This is the use of heavily repeated keywords and phrases with the goal of attaining a more favorable ranking for those words in a search engine index.
Problem: Keywords can be repeated to excess, so much so that they render any text in which they appear unintelligible from a natural language point of view. Those excessive repetitions can also be added in places that are not seen by the end user (meaning outside of displayed page text). Some web spam pages even use repeated keywords that are unrelated to the theme of the page. If any of these conditions are detected, these techniques will draw the attention of Bing as likely web spam.
What we look for: The purveyors of web spam use a variety of methods for keyword stuffing, including:
Note that stuffing the keywords <meta> tag alone is not a reason to be judged as web spam. But <meta> tag stuffing could be an indicator that other web spam techniques may be employed and could draw a search engine to take a closer look at such a site.
It is important that webmasters not overreact to this information. A small amount of relevant keyword repetition is common and is not considered web spam as long as it is used naturally within the page content language and the page provides useful, relevant content. The key message is always the same: develop your pages for human readers, not for search engine bots, for the best results. For more information on creating and using keywords wisely, see the blog articles The key to picking the right keywords and Put your keywords where the emphasis is.
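If you are curious how often a word really appears in your copy, a rough count is easy to produce. The Python sketch below reports the most frequent words in a block of text along with each word’s share of the total. It is only a self-check under simple assumptions (lowercased, split on letters, digits, and apostrophes). This article gives no numeric threshold for keyword stuffing, and nothing here reflects how Bing actually evaluates a page.

```python
import re
from collections import Counter


def keyword_density(text, top_n=5):
    """Return (word, count, share_of_all_words) for the most frequent words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    total = len(words)
    if total == 0:
        return []
    counts = Counter(words)
    return [(word, count, count / total) for word, count in counts.most_common(top_n)]


sample = "Tax help for small firms: tax planning, tax filing, and tax advice."
for word, count, share in keyword_density(sample, top_n=3):
    print(f"{word}: {count} ({share:.0%})")
```

A high share for one word is not a verdict. As the tax services example earlier shows, the real test is whether the repetition reads naturally to a human visitor.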
Misspelling and computer generated words
Definition: Pages populated with many various spellings of targeted keywords, especially those unrelated to the theme of the page or the site, can indicate that the keyword lists are computer generated.
Problem: Aggressive inclusion of large numbers of misspelled or rare word lists and phrases can be considered web spam when used to excess. The relevance of those words to the theme of the page or the site is the key distinguishing factor here.
What we look for: The Bing team commonly sees the following techniques on web spam sites:
Redirecting and cloaking
Definition: When a web client visits a website, certain traits can be used to identify the user and redirect them to a different page. These include, but are not limited to, redirects based on the referral code, the user agent (bot or human), and IP address.
Problem: Redirecting can be a legitimate technique in some cases such as if a web client is limited in what it can display on a mobile device web browser, or when a web server uses the client’s IP address to determine the language in which to present the content (aka geo-targeting). However, problems arise when sites filter their content based on whether the user agent belongs to an end user web browser versus a search engine bot. This type of filtering can run the gamut between showing the bot a keyword-stuffed page to an entirely different set of content, all of which is an attempt to deceive. When used with this intent, this is web spam.
What the webmasters who implement these techniques don’t understand is that search engines can detect this attempted deception. We do see when the content presented depends on the user agent, and we can tell when the differences between the content variations go beyond the kind of legitimate differences found between mobile and desktop versions of a page.
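A webmaster who inherits a site, or who hired a consultant, may want to check whether their own pages vary by user agent. The following Python sketch fetches one URL twice with different User-Agent headers and compares the two responses. Both user-agent strings, the function names, and the 0.9 threshold are hypothetical placeholders. This is a rough self-audit idea and not a description of how any search engine detects cloaking.

```python
import difflib
import urllib.request

# Hypothetical user-agent strings; substitute the ones you care about.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
BOT_UA = "ExampleSearchBot/1.0"


def fetch(url, user_agent, timeout=10):
    """Download a page while presenting the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def similarity(body_a, body_b):
    """Return a 0.0 to 1.0 similarity ratio between two page bodies."""
    return difflib.SequenceMatcher(None, body_a, body_b).ratio()


def looks_user_agent_dependent(url, threshold=0.9):
    """True when the two fetched bodies differ by more than the chosen threshold."""
    browser_body = fetch(url, BROWSER_UA)
    bot_body = fetch(url, BOT_UA)
    return similarity(browser_body, bot_body) < threshold
```

Pages with rotating ads, timestamps, or other dynamic content will differ between any two requests, so treat a low similarity score as a prompt to look at the two responses side by side rather than as proof of cloaking.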
What we look for: Some webmasters design their websites to use the following deceptive techniques when the detected user agent is a search engine bot:
The problem for webmasters practicing these techniques is that their technical deceptions are not very effective. Search engines use a number of techniques to uncover such fraudulent practices as redirect and cloaking web spam. When they are revealed, the websites of the perpetrators are penalized, sometimes severely. Well-meaning webmasters or online business owners who hire unscrupulous consultants or carelessly take black hat SEO advice from indiscriminate sources on the Web are setting themselves up for trouble. Reviewing the issues identified in this article, as well as the official webmaster guidelines for Bing, Yahoo, and Google, will go a long way toward keeping a website on the right track for search.
In the next article on web spam, we’ll discuss link-level web spam in detail. We’ll also include some information on what to do if your site was pegged as web spam and after the problems have been resolved, how to request reinstatement into the Bing index as a normal website. Stay tuned!
If you have any questions, comments, or suggestions, feel free to post them in our Ranking Feedback and Discussion forum. Until next time…
– Rick DeJarnette, Bing Webmaster Center