Wednesday 18 December 2013

How Do Search Engines Gather Information?

Search engines gather information by crawling web sites. They crawl from page to page,
visiting sites they already know and following the links that they find. Whilst crawling,
the robots, or spiders, gather information from the source code of each site and send
that information back for indexing. Spiders were designed to read HTML and markup
related to it, such as XHTML, or the HTML produced by PHP pages. They find it difficult
to read pages built in Flash and some other popular web technologies. Spiders cannot
directly read JavaScript or images; they can, however, read the alt text that may be
provided with GIF, JPEG or PNG images.
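
As a rough sketch of that crawl-and-index loop, the Python below (standard library only,
starting from the hypothetical URL http://example.com) fetches a page, records the alt text
of any images, and queues the links it finds for later visits. It is an illustration of the
general idea, not a description of how any real spider is built.

    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class SpiderParser(HTMLParser):
        """Collects the links to follow and the alt text of images, much as a spider would."""

        def __init__(self, base_url):
            super().__init__()
            self.base_url = base_url
            self.links = []
            self.alt_texts = []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "a" and attrs.get("href"):
                self.links.append(urljoin(self.base_url, attrs["href"]))
            elif tag == "img" and attrs.get("alt"):
                # The spider cannot "see" the image itself, but it can read the alt text.
                self.alt_texts.append(attrs["alt"])

    def crawl(start_url, max_pages=10):
        """Breadth-first crawl: visit known pages, index what is found, follow the links."""
        queue, seen, index = deque([start_url]), {start_url}, {}
        while queue and len(index) < max_pages:
            url = queue.popleft()
            try:
                html = urlopen(url, timeout=5).read().decode("utf-8", errors="replace")
            except OSError:
                continue  # unreachable pages are simply skipped
            parser = SpiderParser(url)
            parser.feed(html)
            index[url] = {"alt_texts": parser.alt_texts}  # "sent back for indexing"
            for link in parser.links:
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
        return index

    if __name__ == "__main__":
        print(crawl("http://example.com"))  # hypothetical starting point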

The Four Phases of an SEO Project

In addition to definitive information about the workings of search engines, there is much
speculation, myth and rumour. There are many spurious ideas in circulation and applying
them may do more harm than good. In this section, I will try to stick to tried and trusted
conventions.

How to Optimise Your Site

Introduction
This section describes the key processes undertaken to obtain a higher organic ranking
with the major search engines.

How search engines work is part of their proprietary knowledge. The exact workings of
their algorithms are closely guarded commercial secrets. However, guidance on how these
algorithms (or algos) work can be found or deduced from various sources. Some general
guidance is available free, directly from the search engines’ own web sites. Some
guidance can be found by examining the various Google and related patents. Some general
guidance can be found in authoritative articles on SEO forum sites. However, real-world
applications of this knowledge can only be found through experimentation and trial and
error.

There are some general rules, and applying them will provide a route to improved search
engine visibility. The guidance in this section could be broadly applied to the three main
engines – Google, Yahoo and MSN. However, given Google’s dominance, much of the advice
is derived from my interpretation of the Google “Hilltop” patent of 2001. The patent is
believed by SEOs to have been the basis of the so-called Google “Florida” update of
November 2003.

Comment Spam

Related to link spamming is comment spam. Comment spam is where a spammer visits a
publicly accessible site and deposits a comment with an anchor text link back to a
designated site. Forums and blogs are typical targets. This activity became identified as a
major problem in January 2005, when Google took steps to prevent it on the blogs of
Blogger.com. The reason was that spammers working for so-called PPC (Pills, Porn and
Casino) web sites were trawling legitimate blogs and posting uninvited comment
advertisements with their web site’s anchor text. Blogs were vulnerable because they
typically possess a comment section that can be accessed without passwords or even
registration.
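
That step was the introduction of the rel="nofollow" link attribute, which tells the
engines not to pass ranking credit through a link. Purely as an illustration, and not as
Blogger's actual code, the Python sketch below shows how a blog platform might force
nofollow onto every link in a submitted comment before publishing it.

    from html import escape
    from html.parser import HTMLParser

    class NofollowFilter(HTMLParser):
        """Re-emits comment HTML, forcing rel="nofollow" onto every anchor tag.
        A sketch only: self-closing tags and malformed markup are not handled."""

        def __init__(self):
            super().__init__(convert_charrefs=False)
            self.out = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                attrs = [(k, v) for k, v in attrs if k != "rel"] + [("rel", "nofollow")]
            rendered = "".join(f' {k}="{escape(v or "")}"' for k, v in attrs)
            self.out.append(f"<{tag}{rendered}>")

        def handle_endtag(self, tag):
            self.out.append(f"</{tag}>")

        def handle_data(self, data):
            self.out.append(data)

        def handle_entityref(self, name):
            self.out.append(f"&{name};")

        def handle_charref(self, name):
            self.out.append(f"&#{name};")

    def add_nofollow(comment_html):
        f = NofollowFilter()
        f.feed(comment_html)
        return "".join(f.out)

    print(add_nofollow('Nice post! <a href="http://casino.example">cheap pills</a>'))
    # Nice post! <a href="http://casino.example" rel="nofollow">cheap pills</a>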

Link Spamming

In many respects, due to the increasing influence of links, it was inevitable that link
spamming would become an issue. Spamming of links has been a growing problem as
many people have realised the importance that Google, in particular, places on links. It
raised its head as a significant issue in April 2005, when a new Google release appeared
to ban one of the leading SEO firms from its rankings. Few people outside Google and the
SEO firm concerned are entirely sure why this happened, but the industry consensus is
that Google are cracking down on web sites and organisations that accumulate vast
numbers of irrelevant links with the sole intention of climbing the rankings.

Tiny Text

Tiny text is a technique of using very small text that is barely visible to the human eye.
The engines can still read this text, but they will also treat it as spam.
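
Exactly how the engines identify tiny text is not published. As a crude, purely
illustrative heuristic, the Python sketch below flags inline styles whose font size falls
below an arbitrary readable threshold.

    import re

    # Matches inline declarations such as style="font-size:1px" or "font-size:0.1em".
    TINY = re.compile(r"font-size\s*:\s*([\d.]+)\s*(px|pt|em)", re.IGNORECASE)

    def looks_like_tiny_text(html):
        """The 4px/4pt and 0.4em thresholds are arbitrary values chosen for this sketch."""
        for value, unit in TINY.findall(html):
            size = float(value)
            if (unit.lower() in ("px", "pt") and size < 4) or (unit.lower() == "em" and size < 0.4):
                return True
        return False

    print(looks_like_tiny_text('<span style="font-size:1px">cheap flights cheap flights</span>'))  # True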

Hidden Text

The technique here is to fill or “stuff” a page with keywords invisible to the naked eye.
This is done by using the same colour for text as for the background page. This technique
is sometimes referred to as WOW, short for white on white.
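
The engines do not say how they detect this either, but a simple check is to compare an
element's declared text colour with its declared background. The Python sketch below is an
assumption of mine, not any engine's actual method, and only handles inline styles with
identically written colour values.

    import re

    STYLE = re.compile(r'style="([^"]*)"', re.IGNORECASE)

    def _declared(style, prop):
        # Pulls the declared value (hex code or named colour) for a CSS property.
        match = re.search(prop + r"\s*:\s*(#[0-9a-fA-F]{3,6}|\w+)", style)
        return match.group(1).lower() if match else None

    def looks_like_wow(html):
        """Flags elements whose text colour matches their own background colour."""
        for style in STYLE.findall(html):
            colour = _declared(style, r"(?<!background-)color")
            background = _declared(style, r"background(?:-color)?")
            if colour and background and colour == background:
                return True
        return False

    print(looks_like_wow('<div style="color:#ffffff; background-color:#ffffff">hidden keywords</div>'))  # True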

Mirror Sites

Mirror sites use an alternative URL to the target site but contain identical content. With
automated page production, there may be hundreds of different URLs all with the same
content. This technique is sometimes referred to as domain duplication.
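
Identical content is straightforward to fingerprint. Offered only as an illustration, the
Python sketch below hashes each page body so that mirror URLs serving the same content
collapse into a single group; the domain names are hypothetical.

    import hashlib
    from collections import defaultdict

    def group_duplicates(pages):
        """pages maps URL -> page body; returns groups of URLs serving identical content."""
        groups = defaultdict(list)
        for url, body in pages.items():
            fingerprint = hashlib.sha256(body.encode("utf-8")).hexdigest()
            groups[fingerprint].append(url)
        return [urls for urls in groups.values() if len(urls) > 1]

    # Two hypothetical mirror domains serving the same page, plus one unrelated page.
    pages = {
        "http://widgets-a.example/": "<html>Buy widgets here</html>",
        "http://widgets-b.example/": "<html>Buy widgets here</html>",
        "http://other.example/about": "<html>About us</html>",
    }
    print(group_duplicates(pages))
    # [['http://widgets-a.example/', 'http://widgets-b.example/']]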

Throwaway Sites

Throwaway sites are almost always doorway sites. They are web sites built by spammers
to provide a short-term and artificial boost to traffic. Once their traffic objectives are
achieved they are often switched off or left to decay – hence throwaway. Throwaway
sites are stuffed with links and keywords to attract and then re-direct traffic to a target
web site. Typically, the spammers retain ownership of the throwaway domain. The
spammers’ clients initially receive large amounts of traffic. But once the throwaway site
is switched off – or thrown away – the traffic comes to an abrupt halt and the client’s
business suffers. The clients are then effectively blackmailed into spending vast sums to
retain traffic. The target web site receives no long-term ranking benefits.

Doorway Sites

A doorway site is a site that acts as a referring page for another site. The doorway page
is highly optimised, containing hidden links and keywords that the ordinary web user never
sees. The doorway site then climbs the search engine rankings but redirects all of its
traffic to the target site, which may itself be poorly optimised.
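
Doorway pages commonly forward visitors with a meta refresh tag or a snippet of
JavaScript. As an illustrative assumption rather than any engine's real detection code, the
Python sketch below looks for a meta refresh and reports where the page sends its visitors.

    import re

    META_REFRESH = re.compile(
        r'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*content=["\']?\s*(\d+)\s*;\s*url=([^"\'>]+)',
        re.IGNORECASE,
    )

    def find_redirect(html):
        """Returns (delay_in_seconds, target_url) if the page forwards visitors, else None."""
        match = META_REFRESH.search(html)
        if match:
            return int(match.group(1)), match.group(2).strip()
        return None

    doorway = '<html><head><meta http-equiv="refresh" content="0; url=http://target-site.example/"></head></html>'
    print(find_redirect(doorway))  # (0, 'http://target-site.example/')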