Cloaking

MATT CUTTS: Hi, everybody.

It's Matt Cutts.

And we're back to talk a little bit

about cloaking today.

A lot of people have questions about cloaking.

What exactly is it?

How does Google define it?

Why is it high risk behavior?

All those sorts of things.

And there's a lot of help documentation.

We've done a lot of blog posts.

But I wanted to sort of do the definitive cloaking video, and

answer some of those questions, and give people a

few rules of thumb to make sure that you're not

in a high risk area.

So first off, what is cloaking?

Cloaking is essentially showing different content to

users than to Googlebot.

So imagine that you have a web server right here.

And a user comes and asks for a page.

So here's your user.

You give him some sort of page.

Everybody's happy.

And now, let's have Googlebot come and ask

for a page as well.

And you give Googlebot a page.

Now in the vast majority of situations, the same content

goes to Googlebot and to users.

Everybody's happy.

Cloaking is when you show different content to users

than to Googlebot.

And it's definitely high risk.

That's a violation of our quality guidelines.

If you do a search for quality guidelines on Google, you'll

find a list of all the stuff--

a lot of auxiliary documentation about how to

find out whether you're in a high risk area.

But let's just talk through this a little bit.

Why do we consider cloaking bad, or why does Google not

like cloaking?

Well, the answer sort of goes back to the ancient days of search

engines, when you'd see a lot of people do really deceptive

or misleading things with cloaking.

So for example, when Googlebot came, the web server that was

cloaking might return a page all about cartoons--

Disney cartoons, whatever.

But when a user came and visited the page, the web

server might return something like porn.

And so if you do a search for Disney cartoons on Google,

you'd get a page that looked like it would be about

cartoons, you'd click on it, and then you'd get porn.

That's a hugely bad experience.

People complain about it.

It's an awful experience for users.

So we say that all types of cloaking are against our

quality guidelines.

So there's no such thing as white hat cloaking.

Certainly, when somebody's doing something especially

deceptive or misleading, that's when we care the most.

That's when the web spam team really gets involved.

But any type of cloaking is against our guidelines.

OK.

So what are some rules of thumb to sort of save you the

trouble or help you stay out of a high risk area?

One way to think about cloaking is to almost take the

page, like you Wget it or you cURL it.

You somehow fetch it, and you take a hash of that page.

So take all the different content and boil it down to

one number.

And then you pretend to be Googlebot, with a Googlebot

user agent.

We even have a Fetch as Googlebot feature in Google

Webmaster Tools.

So you fetch a page as Googlebot, and you hash that

page as well.

And if those numbers are different, then that could be

a little bit tricky.

That could be something where you might be

in a high risk area.

Now pages can be dynamic.

You might have things like timestamps, the ads might

change, so it's not a hard and fast rule.
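
To make that rule of thumb concrete, here is a minimal sketch of the hash comparison in Python, using only the standard library; the URL and the user agent strings below are placeholders, and a mismatch is a signal to look closer, not proof of cloaking.

    import hashlib
    import urllib.request

    URL = "https://www.example.com/"  # placeholder page to check

    def fetch_hash(url, user_agent):
        # Fetch the page with a given User-Agent and boil it down to one number.
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(request) as response:
            return hashlib.sha256(response.read()).hexdigest()

    browser_hash = fetch_hash(URL, "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
    googlebot_hash = fetch_hash(
        URL, "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")

    # Dynamic pieces (timestamps, rotating ads) can make the hashes differ,
    # so a mismatch means "take a closer look", not "this is cloaking".
    print("same content" if browser_hash == googlebot_hash else "content differs")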

Another simple heuristic to keep in mind is if you were to

look through the code of your web server, would you find

something that deliberately checks for a user agent of

Googlebot specifically or Googlebot's IP address

specifically?

Because if you're doing something very different, or

special, or unusual for Googlebot--

either its user agent or its IP address--

that has the potential to show different content

to Googlebot than to users.

And that's the stuff that's high risk.

So keep those kinds of things in mind.
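
For concreteness, here is a hypothetical sketch of the risky pattern to look for in your own server code; the helper functions and the IP prefix are made up for the example:

    def page_for_search_engines():
        return "<html>keyword-stuffed page</html>"

    def page_for_users():
        return "<html>the page everyone else gets</html>"

    def handle_request(user_agent, remote_ip):
        # Special-casing Googlebot by user agent or by IP prefix is exactly
        # the kind of check that puts a site in a high risk area.
        if "Googlebot" in user_agent or remote_ip.startswith("66.249."):
            return page_for_search_engines()
        return page_for_users()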

Now one question we get from a lot of people who are white

hat, and don't want to be involved in cloaking in any

way, and want to make sure that they steer clear of high

risk areas, is what about geolocation and mobile user

agents-- so phones and that sort of thing.

And the good news-- the executive sort of summary-- is

that you don't really need to worry about that.

But let's talk through exactly why geolocation and handling

mobile phones is not cloaking.

OK.

So until now, we've had one user.

Now let's go ahead and say this user

is coming from France.

And let's have a completely different user, and let's say

maybe they're coming from the United Kingdom.

In an ideal world, if you have your content available on a

.fr domain, or .uk domain, or in different languages,

because you've gone through the work to translate them,

it's really, really helpful if someone coming from a French

IP address gets their content in French.

They're going to be much happier about that.

So what geolocation does is whenever a request comes in to

the web server, you look at the IP address and you say,

ah, this is a French IP address.

I'm going to send them the French language version or

send them to .fr version of my domain.

If someone comes in and their browser language is English,

or their IP address is something from America or

Canada, something like that, then you say, aha, English is

probably the best match, unless they're coming from the

French part of Canada, of course.

So what that is doing is you're making the decision

based on the IP address.

As long as you're not making up some specific country that

Googlebot belongs to--

Googlandia or something like that--

then you're not doing something special or different

for Googlebot.

At least currently-- when we're making this video--

Googlebot crawls from the United States.

And so you would treat Googlebot just like a visitor

from the United States.

You'd serve up content in English.

And we typically recommend that you treat Googlebot just

like a regular desktop browser-- so Internet Explorer

7 or whatever a very common desktop browser is for your

particular site.

So geolocation--

that is, looking at the IP address and reacting to that--

is totally fine, as long as you're not reacting

specifically to the IP address of just Googlebot, just that

very narrow range.

Instead, you're looking at OK, what's the best user

experience overall depending on the IP address?
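
A minimal sketch of geolocation done this way, assuming a hypothetical ip_to_country() lookup, might look like the following; the point is that there is no branch for Googlebot, which simply falls into the same bucket as any other visitor from the United States:

    COUNTRY_TO_LANGUAGE = {"FR": "fr", "GB": "en", "US": "en", "CA": "en"}

    def ip_to_country(remote_ip):
        # Stand-in for a real GeoIP lookup; returns an ISO country code.
        return "US"

    def pick_language(remote_ip):
        country = ip_to_country(remote_ip)
        # Every visitor is handled by country. Googlebot, crawling from the
        # United States, gets English just like any other US visitor.
        return COUNTRY_TO_LANGUAGE.get(country, "en")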

In the same way, if someone now comes in--

and let's say that they're coming in

from a mobile phone--

so they're accessing it via an iPhone or an Android phone.

And you can figure out OK, that is a completely different

user agent.

It's got completely different capabilities.

It's totally fine to respond to that user agent and give

them a more squeezed version of the website or something

that fits better on a smaller screen.

Again, the difference is if you're treating Googlebot like

a desktop user-- so that user agent doesn't have anything

special or different that you're doing--

then you should be in perfectly fine shape.

So you're looking at the capabilities of the mobile

phone, you're returning an appropriately customized page,

but you're not trying to do anything deceptive or

misleading.

You're not treating Googlebot really differently, based on

its user agent.

And you should be fine there.
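
Here is a rough sketch of that idea, with made-up helper names: mobile user agents get a page laid out for a small screen, and Googlebot's desktop user agent goes down exactly the same path as any other desktop browser.

    MOBILE_TOKENS = ("iPhone", "Android", "Mobile")

    def render_desktop():
        return "<html>full desktop layout</html>"

    def render_mobile():
        return "<html>layout squeezed for a small screen</html>"

    def serve_page(user_agent):
        if any(token in user_agent for token in MOBILE_TOKENS):
            return render_mobile()
        # Googlebot isn't mentioned anywhere: its user agent goes down the
        # same path as any regular desktop browser.
        return render_desktop()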

So the one last thing I want to mention-- and this is a

little bit of a power user kind of thing-- is some people

are like, OK, I won't make the distinction based on the exact

user agent string or the exact IP address range that

Googlebot comes from, but maybe I'll,

say, check for cookies.

And if somebody doesn't respond to cookies or if they

don't treat JavaScript the same way, then I'll carve that out

and treat it differently.

And the litmus test there is are you basically using that

as an excuse to try to find a way to treat Googlebot

differently or try to find some way to segment Googlebot

and make it do a completely different thing?
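
To see why that litmus test matters, here is a hypothetical sketch of the pattern being described: nothing in the code mentions Googlebot by name, but clients that don't keep cookies, which most crawlers don't, get carved out and shown something different.

    def serve_page(client_sends_cookies_back):
        # No explicit Googlebot check, but branching on cookie support as a
        # stand-in for "is this a crawler?" still segments Googlebot, so the
        # effect is the same as cloaking.
        if not client_sends_cookies_back:
            return "<html>page tuned for crawlers</html>"
        return "<html>page real users see</html>"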

So again the instinct behind cloaking is are you treating

users the same way as you're treating Googlebot?

We want to score and return roughly the same page that the

user is going to see.

So we want the end user experience when they click on

a Google result to be the same as if they'd just come to the

page themselves.

So that's why you shouldn't treat Googlebot differently.

That's why cloaking is a bad experience, why it violates

our quality guidelines.

And that's why we do pay attention to it.

There's no such thing as white hat cloaking.

We really do want to make sure that the page the user sees is

the same page that Googlebot saw.

OK, so I hope that kind of helps.

I hope that explains a little bit about cloaking, some

simple rules of thumb.

And again, if you get nothing else from this video,

basically ask yourself, do I have special code that looks

specifically for the user agent Googlebot or the exact IP

address of Googlebot and treats it differently somehow?

If you treat it just like everybody else-- so you send

content based on geolocation, you look at

the user agent for phones--

that sort of thing is fine.

It's when you're looking for Googlebot specifically, and

you're doing something different, that's where you

start to get into a high risk area.

We've got more documentation on our website.

So we'll probably have links to that, if you look at the

metadata for this video.

But I hope that explains a little bit about why we feel

the way we do about cloaking, why we take it seriously, and

how we look at the overall effect in trying to decide

whether something is cloaking.

The end user effect is what we're ultimately looking at.

And so regardless of what your code is, if something is

served up that's radically different to Googlebot than to

users, that's something that we're probably going to be

concerned about.

Hope that helps.
