Archive for the ‘Security and malware’ Topic


Is your site ranking rank? Do a site review – Part 4 (SEM 101)

June 15, 2010 by Rick DeJarnette

This is the fourth of five posts on conducting your own site review. In the previous posts, we discussed why you’d want to perform a site review (Part 1), took an initial look at page-level issues (Part 2), and then discussed site-wide issues (Part 3) that can affect both your site’s performance for users and its search engine ranking. In this post, we continue our look at site-wide issues that should also be examined in a site review.

Using HTTP redirects

When you remove, rename, or relocate a previously published webpage on your web server, do you implement an HTTP redirect on your server to help users find the content they are looking for? If not, you should. Otherwise, any inbound links from external sites directed toward those old pages will no longer work. Users navigating to your site through those links will likely move on, and any search engine ranking based on that old page will be lost as well.

Many webmasters mistakenly use 302 redirects by default, which are strictly intended for temporary moves (such as to an interim page for a sold-out inventory item), when they should be using permanent, 301 redirects. The distinction between 302s and 301s matters to search engines because it determines how ranking is applied to the pages in their index. When a 302 is used, the original URL retains its ranking status in the index, whereas a 301 transfers the ranking status of the old page to the new URL. So if you use a 302 for a page that has been permanently removed, renamed, or relocated, the new version fails to inherit the rank status the previous page had earned. To optimize your redirects, always use 301s for permanent changes to your site.

Site review task: Review the way HTTP redirects are used on your site. If any permanent redirects are being used, be sure they are configured as 301s.
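If you want to spot-check your redirects outside the browser, a small script can request each old URL without following the redirect and report the raw status code. The sketch below is a minimal example using Python’s standard http.client module; the sample URLs, and the idea that you keep your own list of pages you know have moved permanently, are assumptions for illustration only.

```python
from __future__ import annotations

import http.client
from urllib.parse import urlsplit


def fetch_status(url: str, timeout: float = 10.0) -> tuple[int, str | None]:
    """Send a HEAD request without following redirects; return (status, Location header)."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def classify(status: int) -> str:
    """Label the two redirect codes discussed above; everything else is 'other'."""
    if status == 301:
        return "permanent"
    if status == 302:
        return "temporary"
    return "other"


def misconfigured(statuses: dict[str, int], moved_permanently: set[str]) -> list[str]:
    """URLs you know were moved for good but that still answer with a 302."""
    return sorted(u for u in moved_permanently if classify(statuses.get(u, 0)) == "temporary")
```

For example, you might build a dictionary of statuses by calling fetch_status(url)[0] for each old URL, then pass it to misconfigured(); any URL it returns is a candidate for switching to a 301.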

Empowering canonicalization

While we briefly touched on this topic in Part 3 in our discussion on linking, let’s now dig into how you link to your internal pages and whether you’re consistent in your methodology. The issue is URL canonicalization: determining the standard URL for a given page (especially the site’s home page). This matters because search engines individually track and rank each variation of a URL used to refer to your site. When you allow multiple URLs to serve as valid inbound links from external sites to the same page (and/or are inconsistent in how you form URLs in your intra-site linking), the URL variations become duplicate content that the search engine has to repeatedly crawl and then potentially index. To avoid diluting the ranking value for a page among many URL forms, consolidate all of the alternative forms into the primary URL for the page. This process is called canonicalization.

For example, many sites allow you to reach their home page by either including or omitting the subdomain prefix “www.”, including or omitting the home page’s filename, and more. The following URLs, all of which are considered separate locations to search engines, usually lead to the same page:

  • mysite.com
  • www.mysite.com
  • www.mysite.com/
  • www.mysite.com:8080
  • www.mysite.com/default.htm
  • www.mysite.com/en/us/
  • www.<ExternalHostProvider>.com/~mysite

The possible permutations of these variations can be quite numerous, with each one earning its own rank value in the search engine index. But you can resolve this rank redistribution riddle. While you can’t control what URL form other webmasters use in their outbound links to your site, you can use 301 (permanent) redirects to aggregate all possible alternative URLs to reach (and thus funnel that inbound “credit” to) the one primary URL for your site’s home page.
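Here is a minimal sketch of the redirect decision a server could make for the home page variations listed above. It assumes, purely for illustration, that http://www.mysite.com/ is the chosen primary form and that default.htm and index.html are the default document names; the www.mysite.com/en/us/ and hosting-provider forms are site-specific and would need their own rules. How you wire the result into your web server (sending a 301 with a Location header) depends on your platform.

```python
from __future__ import annotations

# Hypothetical choices for this sketch: adjust to your own primary URL form.
CANONICAL_SCHEME = "http"
CANONICAL_HOST = "www.mysite.com"
SITE_HOSTS = {"mysite.com", "www.mysite.com"}      # hosts that serve this site
DEFAULT_DOCUMENTS = {"default.htm", "index.html"}  # assumed default file names


def canonical_location(scheme: str, host: str, path: str) -> str | None:
    """Return the URL to 301-redirect to, or None if the request is already canonical."""
    bare_host = host.lower().split(":", 1)[0]  # drop a port such as :8080
    if bare_host not in SITE_HOSTS:
        return None  # not one of our hosts; leave it alone
    canonical_path = path or "/"
    # Collapse /default.htm (and /dir/default.htm) onto the directory URL.
    head, _, last = canonical_path.rpartition("/")
    if last.lower() in DEFAULT_DOCUMENTS:
        canonical_path = head + "/"
    target = f"{CANONICAL_SCHEME}://{CANONICAL_HOST}{canonical_path}"
    requested = f"{scheme.lower()}://{host.lower()}{path or '/'}"
    return None if requested == target else target
```

Every non-canonical variant gets the same single target, which is what lets the inbound credit from all of them accumulate on one URL.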

Run this test: Start your browser and open up three session tabs for each of your favorite search engines (don’t forget Bing!). In the first tab’s search text box, type the following:

SITE:<YourDomain>.com

specifying just your root domain name and its associated top-level domain (TLD), but omitting the “www.” subdomain prefix or any other subdomain name. This query searches for all pages in the index from the entire domain (including all subdomains). Next, in the second tab for the same search engine, type:

SITE:www.<YourDomain>.com

which includes the “www.” subdomain. This query specifically looks for indexed pages associated with the “www.” subdomain. Then in the third tab for the same search engine, type this:

SITE:<YourDomain>.com -SITE:www.<YourDomain>.com

The last search reruns the first query while using the second query as an exclusion filter, which removes all indexed results that include the “www.” subdomain in the URL. If you do get results in the last test (which can include URLs in subdomains other than “www.”), compare the search results in detail to see whether both the “www.” and non-“www.” variations of your site’s URLs are in the index (as well as the other URL variations listed above). If so, you’re allowing the hard-earned rank for your pages to be diluted among multiple URLs. You need to canonicalize your site to consolidate the URL variations in the search engine so your pages can earn the highest possible rank for the canonical URL form.

While you’re at it, take a look at the URLs you use in the cross-links between the pages of your own site. Do they consistently use the same single URL form for each page? I recommend using absolute rather than relative links, because absolute links are better for continuing to build up that aggregated canonical credit for your site’s pages. For more information on canonicalization, see the blog posts Making links work for you (SEM 101) and Optimizing your very large site for search – Part 1.

Site review task: Check to see if your site allows multiple URL forms to access the same page. If so, determine which URL form you want to be primary and then implement 301 redirects for all possible variations to that primary URL. Also, check your intra-site links to be sure the URLs to other pages on your site are consistently formatted the same way.
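To check intra-site link consistency on a page, you could parse its anchor tags and flag both relative links and absolute links that use a non-primary form of your host name. This is a minimal sketch using Python’s built-in html.parser; the host names are hypothetical placeholders, and a real audit would run it over every page on the site.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urlsplit

CANONICAL_NETLOC = "www.mysite.com"            # hypothetical primary host
SITE_HOSTS = {"mysite.com", "www.mysite.com"}  # hypothetical hosts for this site


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def audit_links(html: str) -> dict[str, list[str]]:
    collector = LinkCollector()
    collector.feed(html)
    report: dict[str, list[str]] = {"relative": [], "non_canonical": []}
    for href in collector.hrefs:
        parts = urlsplit(href)
        if href.startswith("#") or parts.scheme not in ("", "http", "https"):
            continue  # in-page anchors, mailto:, javascript:, etc.
        if not parts.netloc:
            report["relative"].append(href)
        elif parts.hostname in SITE_HOSTS and parts.netloc.lower() != CANONICAL_NETLOC:
            report["non_canonical"].append(href)
    return report
```

Links to other domains are ignored here; only links to your own site are judged against the primary form.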

Don’t we all want validation (of our coding)?

Is your HTML source code valid? It may display more or less correctly, but are you sure it’s solid? (Some browsers are much more tolerant of HTML coding errors than others, so you may not actually see the problems. However, search bots are typically not as forgiving as those tolerant browsers, which is why this issue is important.) Errors in your page source code can hurt your page rank if the search engine can’t understand, and thus can’t effectively crawl, your code. For example, if you didn’t properly format the <head> tag code discussed in Part 2, all of the work you put into enhancing its content for keyword usage could be for naught. Test your HTML source code with the validation tool in your webpage development environment or at a public resource such as the W3C Markup Validation Service page. You might be surprised at the number of problems found, but the details provided usually make them easy to resolve.
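A validator such as the W3C service is the right tool, but as a quick first pass you could check that every non-void element is closed, in order. The sketch below is a crude, XHTML-style check built on Python’s html.parser and is not a substitute for a real validator: it reports optional end tags (such as an unclosed <li>) as errors, and it doesn’t check attributes or nesting rules.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}


class BalanceChecker(HTMLParser):
    """Track open elements and record closing tags that don't match."""

    def __init__(self) -> None:
        super().__init__()
        self.open_tags: list[tuple[str, int]] = []  # (tag, line number)
        self.problems: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        pass  # self-closed form such as <link ... />: nothing left open

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.open_tags and self.open_tags[-1][0] == tag:
            self.open_tags.pop()
        else:
            self.problems.append(f"line {self.getpos()[0]}: unexpected </{tag}>")


def check_balance(html: str) -> list[str]:
    checker = BalanceChecker()
    checker.feed(html)
    checker.close()
    unclosed = [f"line {line}: <{tag}> never closed" for tag, line in checker.open_tags]
    return checker.problems + unclosed
```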

While you’re at it, does your site comply with Section 508, the U.S. federal accessibility regulations for users with disabilities? While non-compliance may not directly affect your page ranking, improved compliance may increase the number of visitors to, and participants in the activities on, your site. Hey, it can’t hurt!

Site review task: Run a source code validator on all of your web page files and make corrections. Check your site for Section 508 compliance.

How about content validation?

Lastly, have you ever run a spell check on your content? Misspelling a keyword will never help you improve your ranking for that term. Is your content logically organized and split into logical segments? If not, maybe you need to reorganize the content of your pages. Pages that started out small but have grown over time into long pages are ripe candidates for this consideration in your site review. For more information on content architecture, check out our blog post Architecting content for SEO (SEM 101).
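A general spell checker may not know your niche vocabulary, so one narrow supplement is to flag words that look almost, but not exactly, like your targeted keywords. This minimal sketch uses Python’s difflib.get_close_matches; the 0.8 similarity cutoff and the sample keywords are arbitrary assumptions to tune for your own content.

```python
from __future__ import annotations

import difflib
import re


def find_keyword_misspellings(text: str, keywords: list[str], cutoff: float = 0.8) -> dict[str, str]:
    """Map each word that closely resembles, but doesn't equal, a target keyword."""
    targets = [k.lower() for k in keywords]
    suspects: dict[str, str] = {}
    for word in set(re.findall(r"[a-z']+", text.lower())):
        if word in targets:
            continue  # correctly spelled keyword
        match = difflib.get_close_matches(word, targets, n=1, cutoff=cutoff)
        if match:
            suspects[word] = match[0]
    return suspects
```

A lower cutoff catches more typos but also more false alarms between unrelated words.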

And horrors that be: do you have broken links? While it can be a drag to have to constantly recheck the validity of your site’s outbound links when you have a ton of links, it does matter. And there’s no excuse for broken links between pages of your site! Submit your pages to the W3C Link Checker for validation. If you’ve already registered to use the Bing Webmaster Center tools, log on and use the Crawl Issues tool (with the File Not Found (404) issue type) to get a report on all of the broken links on your site (up to 1,000 line items in a downloadable report).
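If you want to script a first pass over a list of link URLs yourself, the sketch below reports any URL that returns an HTTP error status or can’t be reached at all. It uses Python’s urllib with HEAD requests, an assumption that can misfire: some servers reject HEAD, so a 405 result may deserve a retry with GET. The status lookup is passed in as a function so the logic can be tested without network access.

```python
from __future__ import annotations

import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for url (urllib follows redirects)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 404, 410, 500, ...


def find_broken_links(
    urls: Iterable[str], get_status: Callable[[str], int] = http_status
) -> list[tuple[str, int]]:
    """List (url, status) pairs for links that fail; status 0 means unreachable."""
    broken = []
    for url in urls:
        try:
            status = get_status(url)
        except OSError:  # DNS failure, refused connection, timeout
            status = 0
        if status == 0 or status >= 400:
            broken.append((url, status))
    return broken
```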

Site review task: Check the content on each page for spelling errors, overly long pages that should be split up, and the validity of the links.

Custom 404 error pages

What do visitors see when they type in an incorrect URL or follow a broken inbound link to your site? If the URL points to a non-existent page on your site (and no redirect is in place), your web server returns an HTTP 404 error code, which presents a generic File Not Found screen in the browser. It’s almost guaranteed that the user will abandon any further attempt to use your site, which is a lost opportunity for you.

Instead of losing that potential conversion, implement a custom 404 error page for your site. As long as visitors reach your domain’s web server, you can give them a better experience: a custom “File not found” error page that offers basic information about your site’s content, is wrapped in your site’s page design template, and provides your primary site navigation so they can continue their search. For more information on developing a custom 404 error page, see the blog article Fixing 404 File Not Found frustrations (SEM 101). Note that you can implement both HTTP redirects for known page moves and a custom 404 page for all other broken links where no redirect is in place.
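The sketch below shows the idea with Python’s built-in http.server: unknown paths get a page that carries the site’s navigation and a helpful message. The page contents and navigation links are hypothetical. One design choice worth keeping in any implementation: the custom page is still sent with a 404 status code, so search engines don’t mistake it for real content.

```python
from __future__ import annotations

from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical site content and navigation for this sketch.
PAGES = {"/": "<h1>Home</h1>", "/coins/": "<h1>Ancient coins</h1>"}
NAV = ('<nav><a href="http://www.mysite.com/">Home</a> | '
       '<a href="http://www.mysite.com/coins/">Coins</a></nav>')


def render(title: str, body: str) -> str:
    """Wrap page content in the (minimal) site template, including the navigation."""
    return (f"<!DOCTYPE html><html><head><title>{escape(title)}</title></head>"
            f"<body>{NAV}{body}</body></html>")


def respond(path: str) -> tuple[int, str]:
    """Return (status, html) for a request path."""
    if path in PAGES:
        return 200, render("My site", PAGES[path])
    # Custom "File not found" page: site template and navigation, but still a 404 status.
    message = f"<p>Sorry, we couldn't find {escape(path)}.</p><p>Try the links above.</p>"
    return 404, render("Page not found", message)


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, html = respond(urlsplit(self.path).path)
        data = html.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


# To try it locally: HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```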

Site review task: If you have not set up a custom 404 error page for your site, do so.

Page-level web spam techniques

Are you using page-level web spam techniques in an effort to deceive the bot into giving you a higher-than-deserved ranking? The search bots, as they crawl the Web on a daily basis, see every conceivable web spam technique. The indexing algorithms are constantly updated to reflect this exposure to attempted deception, and when web spam is encountered, penalties usually ensue for the offending site. SEO is hard work. Providing legitimate ranking is important to search engines because it’s important to our customers. Any detected efforts to cheat the system are definitely frowned upon, and for webmasters who use these techniques, they ultimately produce the exact opposite of the original goal: poor ranking (if not outright expulsion from the index). For more information on page-level web spam, see the blog post The pernicious perfidy of page-level web spam (SEM 101).

Site review task: Review your site for any use of web spam techniques and, if found, remove them from your site. If your site has already been penalized for such usage, after you clean up the offending issues and republish your site, you can then request reconsideration of the penalty by following the instructions listed at the end of the blog article The liability of loathsome, link-level web spam (SEM 101).

Got malware?

One of the best ways to sink your site’s value to users is to allow it to become infected with malware or to link out to malware. Now, most webmasters are certain that their sites are clean (and most of them likely are). But a site can become infected without the webmaster doing anything wrong. Web servers can be hacked and malware stealthily deployed (typically the “drive-by” form of malware), despite the best intentions of a site’s webmaster. And malware infections affect not only the host site but also the sites that link out to it. Of course, it’s the visitors to the infected site who pay the price (and ultimately so does the infected site: when visitors realize they picked up malware there, they will likely never return to do business and will likely spread warnings about the site far and wide).

When a site is infected with malware, search engines that scan for malware will detect the infection and warn their users about the threat in the search engine results page (SERP) listing. This warning also applies to pages that have outbound links to infected sites, so while your site may technically be clean, an infected link on your page may still generate a malware warning in the SERP. Given that webmasters typically have no control over the security of the sites they link to, this accessory-to-the-crime affiliation with malware-infected sites is unfortunate, but it’s appropriately done to protect search engine users from infection.

To see if your site has been identified as infected with malware by the Bing crawler, log on to your account in Webmaster Center tools, click the Crawl Issues tool, and then select the Malware Infected issue type. If your site does contain malware, you’ll see a list of which pages are infected. To see if you are linking out to other sites that are identified as malware infected, in Webmaster Center, click Outbound Links, and then Show only outbound links to malware. From either tool, if you get positive results, you can download a CSV file with the names of the affected pages and other information to help you clean up the mess.
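Once you have a list of flagged hosts (for example, taken from the downloaded report), you could cross-check it against your own pages’ outbound links. This is a minimal sketch: the input shapes (a mapping from each page to its outbound URLs, and a set of flagged host names) are assumptions for illustration, not the report’s actual format.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def is_flagged(host: str, flagged: set[str]) -> bool:
    """True if host is a flagged domain or one of its subdomains."""
    return any(host == bad or host.endswith("." + bad) for bad in flagged)


def pages_linking_to_flagged(
    outbound: dict[str, list[str]], flagged_hosts: set[str]
) -> dict[str, list[str]]:
    """Map each of your pages to the outbound links that point at flagged hosts."""
    flagged = {h.lower() for h in flagged_hosts}
    hits: dict[str, list[str]] = {}
    for page, links in outbound.items():
        bad = sorted({link for link in links if is_flagged(urlsplit(link).hostname or "", flagged)})
        if bad:
            hits[page] = bad
    return hits
```

Matching on a leading dot keeps a flagged evil.example from accidentally matching an unrelated notevil.example.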

We published a detailed series of blog posts on malware and website security issues. For more information on identifying a malware infection and its implications, see The merciless malignancy of malware Part 1 (SEM 101). For likely website malware attack vectors and information on how to clean up infections, see The merciless malignancy of malware Part 2 (SEM 101). For my Top 10 list of recommended security strategies for avoiding malware infections (split into two parts!), see The merciless malignancy of malware Part 3 (SEM 101) and The merciless malignancy of malware Part 4 (SEM 101).

Site review task: Check your site for possible malware infections and clean up any detected problems. Implement recommended security measures to reduce the likelihood of future malware attacks.

If you have any questions, comments, or suggestions, feel free to post them in our SEM forum. Coming up next: In the last post of this series, we’ll look at some architectural issues that can help your site with the search bot. Until then…

– Rick DeJarnette, Bing Webmaster Center

Is your site ranking rank? Do a site review – Part 3 (SEM 101)

June 4, 2010 by Rick DeJarnette

Let’s continue our run-down of issues to consider in a site review. In Part 1 of this series, we looked at the whats and whys for doing a site review, and covered baselining pre-optimized performance and gathering tools. Part 2 covered important but often overlooked on-page issues that, if not properly addressed, can prove detrimental to a site’s performance, both for search engine ranking as well as for usability and discoverability for users. In this post, let’s examine site review issues that are more site-wide in scope.

What is the meaning of this (file name)?

Look at your URLs: what do they say about the page’s content? Do you use human-friendly page file names or globally unique identifier (GUID)-based gibberish? While bots may have no particular grievance with irrelevantly named pages in URLs, you may be missing an opportunity. Using one or two of your targeted keywords in a file name can help associate those words with your page in the eyes of the search bot. Of course, if you simply crunch those keywords together, that might not be as helpful as you want. CamelCasing words in file names is of no value to SEO, and a bot may not be able to reliably parse individual keywords out of a concatenated string of letters. So how do you assist with the parsing process so that you get the full value of any keywords used in a file name (and thus in a URL)?

Using underscores in a file name is not necessarily a good idea. Old-style programming code used underscores as a concatenating device, so while an underscored name may be parsable to the human eye, that may not be the case for the search engine. Besides, since hyperlinks in text are commonly formatted with underlines, underscore characters in link text may be mistaken for spaces. And forget about using space characters. But there is a reasonable solution you can use.

Adding hyphens between words in a file name, as long as this technique is used in moderation, is perfectly acceptable. Moderation is suggested because some folks are inclined to push the boundaries well into the realm of web spam. But when used in moderation, hyphens separate individual words just fine for both human readers and search bots. You might even try this for your next domain name (but shorter is usually better there, so if you find yourself using more than one or two hyphens, rethink the proposed domain name in a shorter version).
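As an illustration of hyphen-separated file names, the function below turns a page title into a lowercase, hyphenated slug and caps the number of words to keep things moderate. It is only a sketch: the five-word limit is an arbitrary assumption, and you would append your own file extension (such as .htm).

```python
import re
import unicodedata


def keyword_slug(title: str, max_words: int = 5) -> str:
    """Make a hyphenated, lowercase file name stem from a page title."""
    # Fold accented letters to plain ASCII (é -> e) and drop anything unrepresentable.
    text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    text = text.lower().replace("'", "")  # "collector's" -> "collectors"
    words = re.findall(r"[a-z0-9]+", text)
    return "-".join(words[:max_words])
```

For example, keyword_slug("Ancient Roman Coins: Collector's Guide") yields ancient-roman-coins-collectors-guide.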

Site review task: Review the file-naming scheme used for page files to see if there are opportunities to use your targeted keywords. If you use keyword phrases, separate the words with hyphens so they parse as discrete keywords.

Absolutely ban relative links

Are your intra-site links relative or absolute? Either form technically works just fine in the eyes of a search engine, but the format of your links can make a big difference to you for other reasons. Content security alone may be reason enough for some folks to go with absolute links.

The difference between the two forms is whether or not the reference to the web server of origin is always maintained within the link URL. Relative links, as used in the sample href attribute of the anchor tag — “/sales/today.htm” — assume the server’s domain name and only provide the portion of a link’s path beyond the website’s root (if even that much of a path), whereas absolute links provide the full URL in the link.

The use of relative links can become a problem when page content is used out of the context of the original webpage (and thus from a different URL). Out of their original context, relative links break, and the content they refer to is no longer available. This can be a problem when users cut and paste content from your site into another document, such as an email, or, sadly, screen scrape it into their own website. If you use absolute links, the links will keep pointing to your site wherever the content ends up. Interestingly, it’s pretty common that folks who are too lazy to create their own content (meaning those who instead screen scrape, aka steal, that content from others) are also too lazy to check the inline links in that content. It’s a bit of poetic justice to at least get inbound link credit from those plagiaristic sites back to your own work! (And in one recent case, our team used the absolute links to Site A embedded in content found on Site B as evidence for Site A’s claim that Site B had stolen its original content. As a result, the offending Site B was penalized for improperly duplicating the content!)
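You can see the difference with Python’s urllib.parse.urljoin, which resolves a link the way a browser does, against the URL of the page the link appears on. The page URLs below are hypothetical: the same relative href points to your server on your page but to the copier’s server once the content is pasted elsewhere, while the absolute href always points home.

```python
from __future__ import annotations

from urllib.parse import urljoin

ORIGINAL_PAGE = "http://www.mysite.com/news/index.htm"  # hypothetical original page
COPIED_PAGE = "http://scraper.example/stolen.htm"        # hypothetical page that copied it

RELATIVE_HREF = "/sales/today.htm"
ABSOLUTE_HREF = "http://www.mysite.com/sales/today.htm"


def resolved_targets(page_url: str) -> tuple[str, str]:
    """Where the relative and absolute hrefs lead when they appear on page_url."""
    return urljoin(page_url, RELATIVE_HREF), urljoin(page_url, ABSOLUTE_HREF)


for page in (ORIGINAL_PAGE, COPIED_PAGE):
    print(page, "->", resolved_targets(page))
```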

To be sure your links always refer back to your website and your linked content is always available (and thus not context-dependent), use the entire URL as the link path — an absolute link. Just remember this one caveat: if you move a content page to a different directory on your site, you’ll need to update all of your hard-coded URL links going to that page (of course, the outbound links from that moved page will remain valid!).

Absolute links can also help establish the preferred URL for your site (known as canonicalization). Some webmasters use multiple domain names with fully populated, identical content pages, which can lead to duplicate content confusion and ranking dilution. Always using absolute links to the primary source URL across all of those domains will contribute to canonicalizing the content. In addition, using relative links when you have multiple URLs pointing to the same content, such as with HTTP: and HTTPS:, can also lead to duplicate content confusion for the search engine (we’ll get deeper into canonicalization issues later in this series).

Site review task: Review your intra-site links, including your site navigation scheme, to be sure they are formatted as absolute links.

Getting from here to there within your site

How do users navigate between the pages on your site? Do you provide a clearly understandable intra-site navigation scheme? Does your site navigation rely on scripted processes or linked images (neither of which the bot can see)? Is every page on your site linked to at least one other page? Do you provide an HTML sitemap page linking to every page of your site? These are some of the questions you need to ask about your site.

Remember back in Part 2 of this series when you were asked to look at your site with a text-only browser (either by configuring your browser to disable images, script, and technologies like Silverlight and Flash, or by using a tool like SEO-Browser) to see your site the way that search bots do? Repeating this exercise here will reveal whether your current intra-site navigation scheme works to the benefit or the detriment of search crawlers (and thus to indexing potential). Can you navigate to other pages on your site in this view? If your site uses images as links, do they include keyword-rich alt text attributes? If your site’s navigation design accounts for down-level users, that’s also very useful to search bots. Otherwise, unless your other pages receive deep, inbound links from external websites, they may never be discovered by the bots. And if they can’t be seen, they can’t be crawled and indexed.
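The question “is every page reachable?” is a graph search. In this minimal sketch, the link graph is assumed to contain only plain HTML links that a text-only browser would see (not script-driven menus), and the set of all pages is assumed to come from your file system or publishing tool. Any page the breadth-first search can’t reach from the home page is one a bot may never discover through your navigation.

```python
from __future__ import annotations

from collections import deque


def unreachable_pages(links: dict[str, list[str]], all_pages: set[str], start: str = "/") -> list[str]:
    """Breadth-first search from start; return pages never reached via text links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(all_pages - seen)
```

Note that a page can link out to the rest of the site and still be unreachable itself, as "/coins/greek.htm" would be if nothing linked to it.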

Site review task: Review your intra-site navigation scheme for down-level visibility, comprehensiveness, and keyword usage.

Link to yourself (when appropriate)

Do you have written content in the body text of your pages that references the content found on other pages of your site? You should. Look for logical ways to reference that content and then link to it inline within your pages. Optimally, those inline body text links will use the relevant, targeted keywords you established in Part 2 (see how I did that? You should do the same).

One caveat, however: some links point to content that is of no value to searchers, such as sign-in pages, shopping carts, “print this page” links, and the like. Add the rel="nofollow" attribute to the anchor (<a>) tag of such links to prevent the crawling and possible indexing of valueless content. Make every link count, and exclude those that don’t.
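If your pages are generated from templates, one way to apply this consistently is to decide on rel="nofollow" in the code that builds anchor tags. The path prefixes below are hypothetical examples of sign-in, cart, and print pages; note that simple prefix matching is crude (a prefix of /print would also match /prints/).

```python
from html import escape
from urllib.parse import urlsplit

# Hypothetical path prefixes for pages with no value to searchers.
NO_SEARCH_VALUE = ("/signin", "/cart", "/print")


def anchor(href: str, text: str) -> str:
    """Build an <a> tag, adding rel="nofollow" for links to valueless pages."""
    path = urlsplit(href).path.lower()
    rel = ' rel="nofollow"' if path.startswith(NO_SEARCH_VALUE) else ""
    return f'<a href="{escape(href)}"{rel}>{escape(text)}</a>'
```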

Site review task: Review your content pages for relevant opportunities to link inline to other pages on your site. Use the rel="nofollow" anchor tag attribute for links to content you don’t need indexed.

Let relevance drive your outbound linking strategy

Links matter. Links to other pages represent a de facto endorsement of the pages you link to. However, the relevance of the pages you link to matters as well. If the theme of your site is about collecting ancient coins, would it make sense for you to link out to dozens of sites that promote herbal body enhancements and instant college diplomas? (No.) It’s not likely this’ll make sense to your users, and thus it doesn’t make sense to search engines, either (this holds true for inbound links coming from those irrelevant sites, too). Outbound links are good things to have, but they should be relevant to the subject of the pages you have them on. If hundreds or even thousands of non-relevant links appear on a website, this can look like link-level web spam, and that’s not a good thing. For more information on link-level web spam, see the blog post The liability of loathsome, link-level web spam (SEM 101).

Remember: if the outbound link is valuable and relevant to your users, it’ll be considered valuable to search engines.

Site review task: Check your external, outbound links for relevance.

Gimme some inbound links (but banish link spam)

We previously discussed that outbound links on your site represent your endorsement of the sites you link to, so they are often considered to carry value with search engines. Inbound links from other sites to yours carry value in the same way. And, like outbound links, the relevance of the links does matter. Some webmasters try to quickly (aka fraudulently) build search engine ranking cachet by buying inbound link value from spammy link farms.

Paid link farms aren’t helpful here, and in fact, usually prove to be detrimental to ranking if the link farm is determined to be a purveyor of web spam. To truly create value for your site, you need to do the hard work of creating valuable content for your target audience, followed by evangelizing your site to that targeted community of users with whom you wish to connect. Ask for links back to your site from webmasters running legitimate and respected sites on the same subject matter. That way you build up more and better legitimate (aka organic) inbound links. These will prove to be the most valuable links you can get.

For more information on earning valuable inbound links, see the blog post Link building for smart webmasters (no dummies here) (SEM 101).

Another way to bring more visitors to your site is to consider developing a pay-per-click (PPC) search advertising campaign. This can earn your site instant visibility in the search engine results pages (SERPs) when your organic ranking is not yet up to snuff. To be clear, buying PPC ads will have no effect on your site’s organic ranking, but PPC ads do show up on Page 1 of the SERP, and if your ultimate goal is to increase conversions, well-designed PPC ads can contribute mightily to that effort. For more information about starting a PPC ad campaign in Bing, check out Microsoft Advertising.

Site review task: Check your inbound links to see if they come from paid link farm services. If you have inbound links from a non-relevant link farm or link exchange, which are typically seen by search bots as deceptive attempts to artificially elevate the rank of a website, disengage from that service and check to see if any penalties have been implemented against your site. For more information on penalties, see the blog post Getting out of the penalty box.

If you have any questions, comments, or suggestions, feel free to post them in our SEM forum. Coming up next: We’ll continue our look at some site-wide issues that interfere with optimal ranking. Until next time…

– Rick DeJarnette, Bing Webmaster Center

Webmaster Center blog comments Q&A, Round 3

March 11, 2010 by Rick DeJarnette

The Bing Webmaster Center team has been very busy lately, working on very cool stuff that we can’t wait to share with you (patience, Grasshopper – all will be revealed in time). But the blog waits for no one (well, that’s the intent, anyway). From time to time, we gather up enough interesting tidbits of Q&A that we want to share with all of our blog readers. Now it’s that time again. So let’s get to it.

Q: I’m not able to gain access to Webmaster Center with the authentication code used in a <meta> tag. Can you help?

A: The Webmaster Center online Help topic Authenticate your website recommends using a <meta> tag formed as follows:

<meta name="msvalidate.01" content="0123456789ABCDEF0123456789ABCDEF" />

However, some users combine the authentication codes for multiple sites, separated by spaces, in the content attribute of a single <meta> tag. If you must use the <meta> tag method of authentication (as opposed to the XML file authentication method described in the Help topic), we recommend placing your Bing Webmaster Center authentication code last so that it is not followed by a space. In addition, Webmaster Center does look for the proper XHTML-style closing of the <meta> tag (a space followed by />), so be sure to use this closing in your code.

This issue is discussed further in the Webmaster Center forum topic Site Verification Error for Bing Webmasters Tools.
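To confirm what is actually in your served page, you could parse it and inspect the msvalidate.01 tag. This sketch uses Python’s html.parser and mirrors the advice above (the code present, and placed last with nothing after it); it can’t reproduce Webmaster Center’s own verification logic, html.parser accepts the tag with or without the " />" closing (so check that by eye or emit the tag with build_meta_tag), and the sample code value is the placeholder from the Help topic example.

```python
from __future__ import annotations

from html import escape
from html.parser import HTMLParser


def build_meta_tag(code: str) -> str:
    """Emit the tag in the recommended form, with the XHTML-style " />" closing."""
    return f'<meta name="msvalidate.01" content="{escape(code)}" />'


class MetaFinder(HTMLParser):
    """Collect the content attribute of every msvalidate.01 <meta> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.contents: list[str] = []

    def handle_starttag(self, tag, attrs):
        values = dict(attrs)
        if tag == "meta" and (values.get("name") or "").lower() == "msvalidate.01":
            self.contents.append(values.get("content") or "")


def bing_meta_status(html: str, code: str) -> str:
    finder = MetaFinder()
    finder.feed(html)
    for content in finder.contents:
        tokens = content.split()
        if code in tokens:
            # Recommended: the Bing code comes last, with no trailing space.
            if tokens[-1] == code and content.endswith(code):
                return "ok"
            return "code followed by other text"
    return "missing"
```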

Q: Why do I have to register as a user in the Webmaster Center blog just to post a comment?

A: We were getting a few non-registered visitors who were posting way too much spam in the blog comments. We needed to block that junk from being posted, so we implemented a new rule that requires folks to register before they can leave comments. Since we can control spam from registered user accounts, we felt this was the best course for minimizing the disruption of irrelevant comments. We hope this is not a hardship on anyone!

Q: I’ve posted two random blog comments requesting inclusion of my site in the Bing News Service. Why haven’t you added my site?

A: Let’s redirect those requests to the right place. To request that Bing add your news site to our list of news sources, we ask that you send the request via email to the Bing News Service team. Please be sure to identify yourself, your URL, what types of news you provide, your audience, and any other determining factors such as awards won, etc.

Q: I have a very complicated or specific question to ask about my site and the Bing index. Can you answer it here?

A: Blog comments are best used for furthering the conversation about the associated blog article. Specialized service requests or specific questions about Bing products and services requiring detailed, individualized answers are always better left in the Bing Webmaster Center forums as a starting place. We have a forums administrator on staff who, along with the regular VIP contributors there, can offer helpful advice and insight to your questions. There are some amazing folks participating over there!

Q: How do I get my company listed in the Bing local listings?

A: Use the Bing Local Listing Center form. You may need to sign in to your Webmaster Center account or create a new sign-in account to access this form.

Q: How can I ensure that my local business contact information (address and phone number) from my website gets into the Bing index?

A: One common problem we see with this is that some sites rely solely on an image containing text to convey this information. This is not good practice for SEO. If you want to be sure MSNBot (or any other search engine bot) can read such information, please add it to your website as text (the image is OK as long as the text version also exists)!

Q: Your recent posts on web spam have brought up a question: how do I report web spam that I find in search engine results pages to Bing?

A: To report web spam sites, we recommend that you go to the Bing Support web form to file the complaint. In the Problem list, select Content Removal Request. In the resulting list box, select Other. In the comments text box, include specific and detailed information in your report. Complete the rest of the form and then click Submit.

A member of the Bing web spam team will review the report and investigate the matter. If the report is accurate, appropriate action will be taken. Note that if the report is malicious and false, no action will be taken against the accused site.

Q: My website offers tax-related services. As a result, I use the word “tax” numerous times in my content. Could Bing consider my site to be web spam due to the appearance of keyword stuffing? When do I cross the line from acceptable to web spam?

A: The key here always comes back to how the content appears to the human reader. Is it logical? Is it readable? Does it make sense? In this particular case, the repeated use of the word “tax” in content regarding tax services offered is reasonably expected and thus is fine. In fact, including a solid set of explanatory content that defines these keyword phrases only strengthens the case for reasonably repeating this word. If the use of this repeated word makes contextual sense to the reader and is not a clumsy attempt to stuff the word in where it’s not necessary or helpful, and you have a good amount of supporting content to accompany it, you’ll be fine. Our crawler sees this usage and understands it is legitimate. Just write your content for the reader’s comprehension and the crawler will not penalize you for keyword stuffing.

The important thing to remember is that true web spam often involves multiple issue violations. As such, it typically takes more than one violation to trigger web spam consequences – having a slightly above average number of keywords won’t automatically torpedo your site. Just as you need to do several things well to improve your ranking (build good content, build valuable inbound links, target several keywords, etc.), you need to do several things wrong to really hurt your ranking. That said, if it’s obvious that you are trying to abuse the system, even with just one egregious issue, then penalties will ensue.

Lastly, we don’t define a precise borderline between acceptable optimization and web spam. If you think what you’ve done might be considered web spam because you know you’re trying to game the system, then take a different approach to optimizing your pages. I’ll repeat my mantra: write content for the human reader, not the crawler. Develop good, unique content that is readable, understandable, and valuable. If you do this without involving any black-hat, SEO-style trickery in an effort to artificially boost your ranking, then you’ll never have to worry about this being an issue.

Q: Regarding backlinks in forum comments and link-level web spam, is it only a problem when the page linked to is not relevant to the conversation in the forum, or is this a problem for all backlinks?

A: It always comes down to whether the effort is intended to legitimately benefit the human reader or benefit the owner of the link. If the link in a blog comment is relevant to the content in both the blog article and the blog comment and, as an extension to that content, is of value and interest to the reader, then it is not a problem. In fact, this is a fine idea (whether or not the rel="nofollow" attribute is automatically applied by the blog to user-generated links). However, if the link in the blog comment is not relevant to either the blog article or the blog comment’s content, is not of relevant, legitimate interest to the reader, and instead is only beneficial to the link owner, then that is web spam. It’s pretty straightforward.

Also consider how the blog comment link is formed: is it a single link inline with the comment’s content, or a bazooka blast of multiple, irrelevant links following a short, generic message that could apply to anything (or nothing)? If your goal is to tell the reader about some information relevant to the post and that info is found within good content on your site, that’s great. Add those links! Even if the blog applies rel="nofollow" to all UGC-based links, the potential for driving live traffic to your site is good, and if the content there is worthwhile, that will improve public awareness of that content and ultimately be a good link building strategy. But if the comment is merely an excuse for blatant advertising links, it is web spam. Note the difference in intent. If you do right by the reader, you’ll be fine.

If you have any questions, comments, or suggestions, feel free to post them in our SEM forum. Later…

– Rick DeJarnette, Bing Webmaster Center

The liability of loathsome, link-level web spam (SEM 101)

-February 18, 2010 byRick DeJarnette

When I was a kid in high school, I used to go to the public library and do initial research in the Encyclopedia Britannica (yes, the bound book editions. I also remember black & white television with vacuum tubes and rotary telephones! Sheesh, I’m getting old!). I would pick up the index volume that contained the keyword I wanted to look up to identify which of the main volumes had the content I sought.

But imagine this: when I opened up the referenced main volume to the page specified, I always found the content I wanted. I never once went to the content page referenced in the index and found a page full of advertisements, come-ons for dubious physical enhancement pharmaceuticals, or any irrelevant, unwanted garbage like that. That’s how Internet search is supposed to work, too.

Search engine as master index

However, unlike the Encyclopedia Britannica, which maintained sole control over the information it published (thus making its index a really good bet for finding the content you want), the fast and loose world of the Internet is open to all comers, for better or for worse. The upside of that openness is that information of all types, from highly important to trivial (and in all ranges of value, from well-researched reports to skewed opinions to deceptive trash), can be found, but you must know where to look. This is where a search engine’s role as master indexer comes into play. Services like Bing use their own resources to scan the Web for content and organize their findings into a useful index of the available content for users.

But since no one entity has control over the content placed on the Web, the useful and informative website is joined by the unscrupulous huckster, who spends a huge amount of effort to deceive the search engine index in order to bring unsuspecting web searchers to their irrelevant website. This deception is the core of web spam.

Bing and other search engine service providers work diligently to detect and eliminate web spam-tainted results from getting into our search engine results pages (SERPs). It’s a tough battle, and it requires a great deal of work to keep our SERPs useful and legitimate for search customers.

We’ve already discussed the basic definition of web spam and one of the two major implementations, page-level web spam, in previous blog articles. We’ll wrap up this web spam series with a discussion of the other major type, link-level web spam. And finally, we’ll discuss what a webmaster can do to restore their website’s listing (removing penalties) with Bing once the detected web spam has been removed.

Definition of link-level web spam

Link-level web spam uses web link deceptions in an attempt to artificially inflate the page rank of a specific page or site. Savvy webmasters know that earning high quality, relevant inbound links from authoritative sites can have a very positive influence on the search engine’s rank of the linked site – we recently published a blog post on this subject titled, Link building for smart webmasters (no dummies here). This is good search engine optimization (SEO). Some less savvy and/or more unscrupulous folks believe they can simply substitute high quantity for the “high quality, relevant” part of the equation, swap junk or irrelevant sites in for “authoritative sites,” and achieve the same goal. Sadly for them, this is not the case.

The intent of the link-level web spammer is to create huge numbers of inbound links (typically from unrelated, low quality sites) to attain illegitimate page rank for a site to fool web searchers into visiting their sites. Luckily, Bing and the other search engines can assess the quality and authority of a particular website.

Sites employing link-level techniques also often employ page-level web spam techniques to make their sites appear to be relevant to a commonly searched keyword when they are not. The use of link-level web spam techniques will cause a search engine to examine your site more deeply, and if it’s determined to be using web spam techniques, your site could be penalized.

As we stated earlier with page-level web spam, some of these techniques can have valid uses at their core, but the intention behind their use is the distinguishing factor. When we detect deceptive intent as we crawl the Web, we identify those pages as web spam and penalize them as appropriate, ranging from neutralization (which levels the playing field for other sites offering content on the same subject) to expulsion from the index. As you can imagine, for an online-based business, these are serious consequences, so it pays to know what NOT to do when you optimize your site for search (or hire a consultant to do the same).

Post web spam

Definition: Post web spam consists of outbound links placed in user-generated content (UGC) on other websites, such as guest book pages, forums, blog comments, message boards, and referrer logs.

Problem: The destination links in post web spam are usually unrelated, topic-wise, to the page containing the UGC outbound link. Often these posts include multiple links. In sites that rely on post web spam for inbound links, it is not unusual for a sizable percentage of all of their inbound links to be from post web spam.

What we look for: Several techniques for implementing post web spam are commonly used, including:

  • Add backlinks to all UGC content. When users go onto websites that allow UGC to be created, those who use post web spam include backlink URLs to their sites, even if they don’t have anything to do with the comment or, more significantly, the theme of the UGC-sponsoring site.
  • Automation. Spammers often use automated techniques to repeatedly submit the same UGC post containing short, generic text and a clickable URL to their sites in every UGC-sponsoring page possible.
  • Keyword stuffing. Post web spam text is often keyword stuffed. Check out our page-level web spam article titled The pernicious perfidy of page-level web spam for more information on this.
  • Massive repetition. Lots of non-relevant, poor quality, inbound links come from such pages as online guest books, forums, and blog comments.
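As a rough illustration of the “short, generic text plus a clickable URL” pattern described above, here is a minimal Python sketch a site owner might use to flag incoming comments for manual moderation. The thresholds (MAX_LINKS, MIN_WORDS_PER_LINK) and the function name are hypothetical, chosen only for illustration; this is not how Bing detects post web spam.

```python
import re

# Hypothetical moderation thresholds; tune them for your own site.
MAX_LINKS = 2            # more links than this in one comment is suspicious
MIN_WORDS_PER_LINK = 15  # fewer words of real text per link is suspicious

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")
WORD_RE = re.compile(r"[A-Za-z']+")


def looks_like_post_spam(comment: str) -> bool:
    """Flag comments that are mostly links wrapped in little real text."""
    links = URL_RE.findall(comment)
    if not links:
        return False
    if len(links) > MAX_LINKS:
        return True
    # Remove the URLs and any HTML tags, then count the remaining words.
    text = TAG_RE.sub(" ", URL_RE.sub(" ", comment))
    words = WORD_RE.findall(text)
    return len(words) / len(links) < MIN_WORDS_PER_LINK
```

A flagged comment is only a candidate for review: a single relevant link inside a thoughtful comment passes, while “Nice post!” followed by a URL does not.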

What post web spammers often don’t realize is that many UGC-oriented pages automatically append the attribute rel="nofollow" to any links created in UGC content. As such, no inbound link credit is derived when search engines crawl and index these pages.
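If your own site accepts UGC, the sketch below shows one way to force rel="nofollow" onto every link in a comment before publishing it, using Python’s standard html.parser module. The class and function names are hypothetical. It only rewrites <a> tags; it is not an HTML sanitizer, it silently drops constructs such as HTML comments, and it should be paired with proper sanitization before use in production.

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Re-serialize an HTML fragment, forcing rel="nofollow" on every <a> tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _render(self, tag, attrs):
        if tag == "a":
            # Discard any rel value supplied by the commenter, then add nofollow.
            attrs = [(k, v) for k, v in attrs if k != "rel"] + [("rel", "nofollow")]
        parts = [tag]
        for name, value in attrs:
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        return " ".join(parts)

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{self._render(tag, attrs)}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{self._render(tag, attrs)} />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives unescaped, so escape it again on the way out.
        self.out.append(escape(data, quote=False))


def add_nofollow(fragment: str) -> str:
    parser = NofollowRewriter()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

Replacing rather than appending to an existing rel value keeps a commenter from smuggling in their own attribute.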

From a webmaster point of view, however, we encourage actively and regularly cleaning up (or better yet, preventing) UGC-based web spam content. If there is too much junk or web spam content on a page, it could reflect poorly on the overall quality of your page, even if you are applying rel="nofollow" to its URLs. For that matter, it is very important for any site that allows UGC to actively monitor its security. Hosting malware can also get a site penalized, and you don’t want that! For more information on malware, see our blog article series called The merciless malignancy of malware, Part 1, Part 2, Part 3, and Part 4.

Link farming

Definition: A link farm is a large collection of websites that exist for the sole purpose of providing massive numbers of links to targeted websites, ostensibly to improve the appearance of their organic, online popularity.

Problem: Link farming is often employed to promote one website using many other websites or it can be a commercial enterprise in which the link farm sells its unscrupulous (and worthless) outbound linking services to less-SEO-savvy webmasters.

What we look for: Link farming is often implemented using the following techniques:

  • Large, sudden surge of new inbound links. When dozens or hundreds of inbound links suddenly appear for a new or a previously small website, such a big change can indicate link farm web spam activity. The relevance of the outbound linking sites will be a key factor in whether or not such a sudden change warrants further investigation.
  • Consistent similarities between outbound linking sites. If a large number of the inbound links for a site come from sites that are very similar in design, structure, and other key characteristics, this can lead to deeper scrutiny of a website for web spam.
  • Poor linking standards. A link farm will often have a large number of unrelated links on the page, or will have related links to many sites that employ other spam methods. The pages themselves are designed to maximize the number of links on them, favoring outbound links rather than original content on the page.
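The “sudden surge” signal in the first bullet can be approximated on your own inbound-link data. This hypothetical Python sketch assumes you have a list of newly discovered inbound links per day (for example, exported from your webmaster tools) and flags days that jump far above the trailing average. The window, factor, and min_jump values are illustrative guesses, not Bing’s thresholds.

```python
from statistics import mean, pstdev


def surge_days(daily_new_links, window=7, factor=3.0, min_jump=20):
    """Return indexes of days whose new inbound links spike above the recent norm.

    daily_new_links: list of counts of newly discovered inbound links per day.
    window, factor, min_jump: hypothetical tuning knobs, not Bing's values.
    """
    flagged = []
    for i in range(window, len(daily_new_links)):
        history = daily_new_links[i - window:i]
        baseline = mean(history)
        spread = max(pstdev(history), 1.0)  # avoid a zero spread on flat data
        today = daily_new_links[i]
        if today - baseline >= min_jump and today > baseline + factor * spread:
            flagged.append(i)
    return flagged
```

A flagged day is only a reason to look at where the links came from; as noted above, the relevance of the linking sites is what decides whether a surge is a problem.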

When link farms are identified, those sites are penalized, which negates the value of the links they contain. In addition, the pages they link to are more likely to be heavily scrutinized for other forms of web spam.

See our earlier blog articles for more information on what makes a good link versus a bad link.

Link exchanges

Definition: Unlike link farms that target a few selected sites, link exchanges are organized groups of websites that participate in providing reciprocal inbound and outbound links, ostensibly to benefit all websites in the exchange.

Problem: Web spam-oriented link exchanges typically involve unrelated web sites reciprocally exchanging links en masse for the purposes of rank inflation. As such, they offer no value to human visitors, and thus they are candidates for being considered web spam.

While earning inbound links is a part of legitimate SEO activities, as we’ve stated before, Bing values quality links over quantity of links. Inbound links from sites unrelated to the theme of your site, typical with most link exchanges, will be of little to no value to you for improving your page rank.

What we look for: Link exchanges usually include the following activities:

  • Starts out as email spam. Link exchanges often start out as spam emails sent from webmasters of unrelated sites asking other webmasters if they would like to improve their ranking by exchanging links.
  • Excessive links. Link exchanges (reciprocal links) between unrelated sites, especially when done to excess, can be indicators of web spam, and a participating website might be more heavily scrutinized for other web spam problems.

Note that reciprocal linking is not an automatic red flag. Some websites within a particular niche will link to others when doing so provides relevant value to their customers. For example, think of a bed and breakfast that links out to local wineries and a winery that links out to local bed and breakfasts – these are related businesses within a region, so the links are naturally relevant to site visitors.

But as usual, too much of a good thing can be bad. And when there is no relevance between linked sites, the value of link exchanges can quickly degrade to the level of web spam (especially when the number of unrelated links is deemed excessive).
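To audit your own link relationships, a minimal Python sketch like the following finds reciprocal pairs and keeps only those with no topic in common. The outlinks map (site to the sites it links to) and the topics map (site to topic labels) are hypothetical inputs you would build yourself. In the bed and breakfast and winery example, the pair shares a topic and is not flagged.

```python
def reciprocal_pairs(outlinks):
    """Return sorted (site_a, site_b) pairs in which each site links to the other.

    outlinks: dict mapping a site to the set of sites it links out to.
    """
    pairs = set()
    for site, targets in outlinks.items():
        for target in targets:
            if target != site and site in outlinks.get(target, set()):
                pairs.add(tuple(sorted((site, target))))
    return sorted(pairs)


def unrelated_reciprocal_pairs(outlinks, topics):
    """Keep only reciprocal pairs whose sites share no topic label.

    topics: dict mapping a site to a set of hypothetical topic labels.
    """
    return [
        (a, b)
        for a, b in reciprocal_pairs(outlinks)
        if not topics.get(a, set()) & topics.get(b, set())
    ]
```

Usage: with bnb.example and winery.example linking to each other under a shared "wine-country" label, and pills.example and casino.example linking to each other with unrelated labels, only the second pair is returned as unrelated.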

Penalties and restitution

Mistakes happen. An entrepreneurial do-it-yourselfer optimizes a website based on bad (spammy) advice from the Web. A Mom-and-Pop-shop website owner naively hires an unscrupulous website consultant. Heck, it’s even possible that a search engine might mistakenly label an innocent site as web spam. So what do you do?

If you made a mistake on your site and your rank has been neutralized, the solution is easy. Web spam neutralization is handled automatically with Bing. If you are using web spam techniques on your website and you want to remove the site’s web spam neutralization penalty, eliminate the web spam violations and then republish your website. Once the Bing crawler, MSNBot, recrawls your site, if the web spam violations have been removed, the neutralization will be automatically resolved in the index.

But what if your site has been purged from the Bing index? That requires some manual intervention.

Request reconsideration for your site

If you search for your site in the Bing index using the advanced search keyword phrase site:www.myURL.com (using your URL, of course!) and nothing turns up, your site is not in the index. If this is a sudden change and you know you’ve used some unscrupulous web spam techniques, you’ll need help to get back into the index.
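If you run this check often, a tiny helper can build the search URL for you. This Python sketch assumes Bing’s results page accepts the query in a q parameter at https://www.bing.com/search, which is how its public results URLs are usually formed; the function name is hypothetical. urlencode escapes the colon in the site: operator.

```python
from urllib.parse import urlencode


def bing_site_query_url(domain: str) -> str:
    """Build a Bing results URL for a site: query (URL format is an assumption)."""
    return "https://www.bing.com/search?" + urlencode({"q": f"site:{domain}"})
```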

First of all, fix all of the web spam violations on your site. Not just one or two, but all of them. Then, once you’ve republished a corrected version of your website, contact Bing support to request reconsideration of your website’s penalty. Here’s how:

  1. Go to Bing E-mail Support and fill out the form completely.
  2. Select Content Inclusion Request from the drop-down list. A new drop-down will appear underneath.
  3. From the new drop-down list, select Reinclusion request.
  4. In the next text box, write a clear and detailed explanation of what you have done to resolve the problem. (You can prepare this in advance, and then copy and paste the text into the form.)
  5. Type the security code from the presented image into the text box below.
  6. Once the form is completed, click Submit.

A member of the Bing support team will quickly review your request and schedule your site to be recrawled. If the crawler determines that all of the violations have indeed been resolved, then your site is eligible to be added back into the index. But be patient – this process doesn’t happen overnight (which is why it’s a wise idea to avoid such web spam penalties in the first place).

For more information on Bing penalties and restitution, see the blog article Getting out of the penalty box.

If you have any questions, comments, or suggestions, feel free to post them in our Ranking Feedback and Discussion forum. Later…

– Rick DeJarnette, Bing Webmaster Center


P.S. It was suggested to me that I list the other articles in this web spam series for those who might be interested in reading the entire set, so here goes (in order of publication):

Enjoy!

Rick

The pernicious perfidy of page-level web spam (SEM 101)

-February 11, 2010 byRick DeJarnette

In the exciting world of today’s Internet, where the world’s information is literally at your fingertips, where you can endlessly communicate, shop, research, and be entertained, spam is a big downer. The unwanted email spam that fills our inboxes also consumes huge portions of the available bandwidth of our routers and trunk lines. But email is not the only spam game in town.

Web spam is the bane (well, one of the banes) of the search engine and web searcher communities. Search engines want to provide search users with a great experience, helping them find what they want as quickly and as easily as possible. Search users want to use search engines to get the right information they seek as quickly as possible. And webmasters want search users to find their websites, but also to get those search user visitors to become conversions instead of bounces.

Web spam, those unwanted garbage pages that use overtly deceptive search engine optimization (SEO) techniques and contain no valuable content, is a frustration to search engines and search users alike, and ultimately works against the best interests of conversion-seeking webmasters (severely annoying a potential customer is rarely a great sales technique!).

In the previous article that defined web spam and discussed how it is different from junk content, we mentioned that there are two types of web spam. In this article, we’re going to delve into the details of the first type: page-level web spam.

Definition of page-level web spam

Page-level web spam uses on-page SEO trickery (not to be confused with link-level web spam, which we’ll discuss in an upcoming article). Webmasters and optimizers for these sites do this because they believe they can fool the search engines into giving their webpages a higher-than-deserved ranking based on their content relevancy, oftentimes for subject areas that are completely unrelated to the site’s actual content. This is done in an effort to deceive searchers into visiting their spammy sites for a multitude of reasons, none of which usually benefit the end user.

The use of the following questionable SEO techniques will cause Bing to examine your site more deeply for page-level web spam. If your site is determined to be using web spam techniques, your site could be penalized as a result.

Note that Bing recognizes that the core concepts behind many of these techniques can have valid uses. No one is saying that their use always and automatically denotes web spam. The issue of intent behind their use is the distinguishing factor for determining whether or not web spam is present and any site penalties are needed. Please understand that, from a search engine perspective, the web spam effort consistently provides very little to no value whatsoever to end users. The entire effort is directed to fraudulently affect search engine rankings. As Martha Stewart might say, that’s not a good thing.

Keyword, URL, and link stuffing

Definition: This is the use of heavily repeated keywords and phrases with the goal of attaining a more favorable ranking for those words in a search engine index.

Problem: Keywords can be repeated to excess, so much so that they render any text in which they appear unintelligible from a natural language point of view. Those excessive repetitions can also be added in places that are not seen by the end user (meaning outside of displayed page text). Some web spam pages even use repeated keywords that are unrelated to the theme of the page. If any of these conditions are detected, these techniques will draw the attention of Bing as likely web spam.

What we look for: The purveyors of web spam use a variety of methods for keyword stuffing, including:

  • Excessive repetitions of keywords. The number of repetitions relative to the amount of content on the page is a key indicator of web spam. The practice of repetitive keyword stuffing is often relative to the amount of content in a page. For example, a very long page of text dedicated to a single topic may naturally repeat its primary theme keyword several times, but a page with less content using the same number of repetitions of the same word may be indicative of keyword stuffing.
  • Stuffing words unrelated to the page or site theme. Stuffing the page with words that are known to be heavily searched on the Web when they are irrelevant to the theme of a site can be an indicator of web spam. Relevance is an important factor for evaluating whether keywords are indicators of web spam.
  • Stuffing on-page text. Littering the text of a page with repeated keywords that render the text meaningless and unreadable to humans is a clear problem. When such content on the page is not useful to people, the content is often suspect as web spam.
  • Stuffing in less visible areas of the page. Placing repeated keywords in less visible areas of a page, such as at the bottom of the page, in links, in Alt text, and in the title tag, can be indicative of web spam.
  • Hiding stuffed keywords in the code of a page. Putting keywords in the code of a page that the search engine crawler (aka a bot) will see, but configuring the page so that a web browser will not show them to a human reader, can be highly suspicious. Such methods as formatting text fonts the same color as the background, using extremely small fonts, and hiding stuffed keywords using tag attributes such as style="display: none" and class="hide" (both of which can prevent the tagged contents from being shown to the user) will draw the attention of a search engine for closer scrutiny.
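A site owner can scan their own templates for the attribute-based hiding described in the last bullet. The Python sketch below (hypothetical class and function names, standard library html.parser only) collects text inside elements whose inline style contains display: none or visibility: hidden, or whose class list includes hide. It does not detect same-color-as-background text or tiny fonts, which would require evaluating CSS, and it assumes reasonably well-formed markup.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
HIDDEN_STYLE_RE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.IGNORECASE)


class HiddenTextFinder(HTMLParser):
    """Collect text that sits inside elements hidden via inline style or class."""

    def __init__(self):
        super().__init__()
        self.hidden_stack = []  # one bool per open element: is it hidden?
        self.hidden_text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr = dict(attrs)
        style = attr.get("style") or ""
        classes = (attr.get("class") or "").split()
        self.hidden_stack.append(bool(HIDDEN_STYLE_RE.search(style)) or "hide" in classes)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.hidden_stack:
            self.hidden_stack.pop()

    def handle_data(self, data):
        # Text is hidden if any enclosing element is hidden.
        if any(self.hidden_stack) and data.strip():
            self.hidden_text.append(data.strip())


def find_hidden_text(html: str):
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_text
```

Hidden text is not automatically a problem (menus and tabs hide content legitimately); what matters is whether the hidden text is keyword stuffing aimed at the crawler.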

Note that stuffing the keywords <meta> tag alone is not a reason to be judged as web spam. But <meta> tag stuffing could be an indicator that other web spam techniques may be employed and could draw a search engine to take a closer look at such a site.

It is important that webmasters not overreact to this information. A small amount of relevant keyword repetition is common and is not considered web spam as long as it is used naturally within the page content and the page provides useful, relevant content. The key message is always the same: develop your pages for human readers, not for search engine bots, for the best results. For more information on creating and using keywords wisely, see the blog articles The key to picking the right keywords and Put your keywords where the emphasis is.
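One crude self-check is keyword density: occurrences of a word divided by the total word count of the page text. The Python sketch below is an illustration only; the function name is hypothetical, it handles single-word keywords, and neither this post nor Bing defines a density cutoff, so treat the number as a prompt to reread the page rather than a pass/fail score.

```python
import re
from collections import Counter


def keyword_density(text: str, keyword: str) -> float:
    """Fraction of the words in text that equal keyword (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return Counter(words)[keyword.lower()] / len(words)
```

"Tax tax tax tax cheap tax" scores 5/6, but so does little else; a natural 13-word sentence that mentions "tax" twice still scores about 0.15 simply because it is short, which is why the amount of content on the page matters when judging repetition.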

Misspelling and computer generated words

Definition: Pages populated with many various spellings of targeted keywords, especially those unrelated to the theme of the page or the site, can indicate that the keyword lists are computer generated.

Problem: Aggressive inclusion of large numbers of misspelled or rare word lists and phrases can be considered web spam when used to excess. The relevance of those words to the theme of the page or the site is the key distinguishing factor here.

What we look for: The Bing team commonly sees the following techniques on web spam sites:

  • Excessive use of misspelled keywords. Huge lists containing all possible iterations of a misspelled word can be so excessive that the page will be worthy of closer inspection for web spam.
  • Large numbers of misspelled words unrelated to the theme of the site. Long lists of word spelling variations whose core definitions are unrelated to the theme of the page or the site can indicate the site is web spam.
  • Common misspellings of popular site URLs in domain names. Sites whose domain names are common misspellings of popular URLs, like other computer-generated content, are usually considered web spam.
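To see how many near-misspellings of a keyword a page contains, a sketch using Python’s difflib.SequenceMatcher can list the distinct words that are similar to, but not the same as, the target word. The 0.8 similarity threshold and the function name are assumptions for illustration.

```python
import re
from difflib import SequenceMatcher


def misspelling_variants(text: str, keyword: str, threshold: float = 0.8):
    """Distinct words in text that closely resemble, but do not equal, keyword."""
    keyword = keyword.lower()
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(
        w for w in words
        if w != keyword and SequenceMatcher(None, w, keyword).ratio() >= threshold
    )
```

A long list of variants for a word unrelated to the page’s theme is the pattern the bullets above describe; one or two variants on a relevant page are unremarkable.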

Redirecting and cloaking

Definition: When a web client visits a website, certain traits can be used to identify the user and redirect them to a different page. These include, but are not limited to, redirects based on the referral code, the user agent (bot or human), and IP address.

Problem: Redirecting can be a legitimate technique in some cases, such as when a web client is limited in what it can display on a mobile device web browser, or when a web server uses the client’s IP address to determine the language in which to present the content (aka geo-targeting). However, problems arise when sites filter their content based on whether the user agent belongs to an end user web browser versus a search engine bot. This type of filtering can run the gamut from showing the bot a keyword-stuffed page to showing it an entirely different set of content, all of which is an attempt to deceive. When used with this intent, this is web spam.

What the webmasters who implement these techniques don’t understand is that search engines can detect this attempted deception. We can see when the content presented is based on the user agent, and when the differences between the content variations are not of the legitimate kind found between mobile and desktop browser versions of a page.
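Webmasters who want to confirm that their own server is not accidentally doing this can fetch a page twice, once with a browser user agent and once with a crawler user agent, and compare the responses. In this Python sketch, the user-agent strings, the word-set (Jaccard) similarity measure, the 0.9 threshold, and the function names are all assumptions. Pages with rotating ads, timestamps, or session tokens will differ somewhat even without cloaking, and this check cannot see IP-based or referrer-based behavior.

```python
import re
from urllib.request import Request, urlopen

# Assumed user-agent strings for illustration; verify the current values.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
BOT_UA = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
WORD_RE = re.compile(r"\w+")


def fetch(url: str, user_agent: str) -> str:
    """Download a page while presenting the given User-Agent header."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two documents' word sets (1.0 = same words)."""
    wa, wb = set(WORD_RE.findall(a.lower())), set(WORD_RE.findall(b.lower()))
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def differs_by_user_agent(bot_html: str, browser_html: str, threshold: float = 0.9) -> bool:
    """True when the bot and browser versions are less similar than threshold."""
    return similarity(bot_html, browser_html) < threshold


def check_own_page(url: str) -> bool:
    """Fetch url as a crawler and as a browser, then compare the two responses."""
    return differs_by_user_agent(fetch(url, BOT_UA), fetch(url, BROWSER_UA))
```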

What we look for: Some webmasters design their websites to use the following deceptive techniques when the detected user agent is a search engine bot:

  • Script-based redirects. The use of JavaScript or <meta> tag refreshes to automatically change which page is displayed are often suspicious in nature and will get more scrutiny from Bing. This is because some sites use JavaScript to redirect all visiting user agents to a new page, and that page may contain web spam. However, since search engine bots don’t execute JavaScript natively, they won’t execute the redirect and thus are supposed to index the contents of the original page (although the search engines bots can still detect this behavior).
  • Referral redirects. Some websites consider the referrer when they show a page. When the referrer is a SERP and the target website shows a different page than the one shown when the user directly navigates to the URL, this behavior is considered web spam.
  • Redirect search engine bot to a target page. Some sites detect the user agent specified and send search engine bots to alternate, text-based pages modified with other web spam techniques such as keyword stuffing (but the site provides its normal web content pages to end user web browser user agents). When redirects are filtered on search engine user agents for the purpose of deceiving them, this is a web spam version of cloaking. Bots can detect when they are redirected to special pages. So when this is encountered, it is usually indicative of web spam and will be investigated further.
  • Redirect end users to a target page. Sometimes webmasters use cloaking to work the opposite way from the technique described immediately above. They may serve highly optimized content pages on Topic A to search engine bot user agents, but when a web browser visits the site, the page displays content on a completely different subject (typically an illicit one, such as a page promoting porn, casino or online gambling, illicit pharmaceuticals, and the like). The effort here is to rank well for a commonly searched topic of interest in a search engine results page (SERP). Then, when searchers find that link in their SERPs and click it, they are unwittingly redirected to the web spam page.
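To inventory the client-side redirects described in the first bullet on your own pages, the following Python sketch uses regular expressions (hypothetical, deliberately simple patterns) to find <meta> refresh tags and common JavaScript location changes. Regex matching will miss obfuscated scripts and can flag harmless code, so treat each hit as an item to review by hand.

```python
import re

# Deliberately simple, hypothetical patterns; review every hit by hand.
META_REFRESH_RE = re.compile(
    r"<meta[^>]+http-equiv\s*=\s*[\"']?refresh[\"']?[^>]*>", re.IGNORECASE
)
JS_REDIRECT_RE = re.compile(
    # Assignments such as window.location.href = "..." (but not == comparisons),
    # or calls such as location.replace(...) / location.assign(...).
    r"(?:window\.|document\.)?location(?:\.href)?\s*=(?!=)"
    r"|location\.(?:replace|assign)\s*\(",
    re.IGNORECASE,
)


def find_client_redirects(html: str):
    """Return the matched snippets for meta refreshes and JS location changes."""
    hits = [m.group(0) for m in META_REFRESH_RE.finditer(html)]
    hits += [m.group(0) for m in JS_REDIRECT_RE.finditer(html)]
    return hits
```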

The problem for webmasters practicing these techniques is that their technical deceptions are not very effective. Search engines use a number of techniques to uncover such fraudulent practices as redirect and cloaking web spam. When they are revealed, the websites of the perpetrators are penalized, sometimes severely. Well-meaning webmasters or online business owners who hire unscrupulous consultants or carelessly take black hat SEO advice from indiscriminate sources on the Web are setting themselves up for trouble. Reviewing the issues identified in this article as well as the official webmaster guidelines for Bing, Yahoo, and Google, will go a long way to keeping a website on the right track for search.

In the next article on web spam, we’ll discuss link-level web spam in detail. We’ll also include some information on what to do if your site was pegged as web spam and after the problems have been resolved, how to request reinstatement into the Bing index as a normal website. Stay tuned!

If you have any questions, comments, or suggestions, feel free to post them in our Ranking Feedback and Discussion forum. Until next time…

– Rick DeJarnette, Bing Webmaster Center
