15 April 2012

How A Search Engine Might Classify Web Pages As Sensitive

Some web pages feature unsuitable content, but what counts as unsuitable is obviously a matter of interpretation. This creates the central problem of search-engine classification: subjective standards cannot easily be applied by objective systems. Computers are not human – at least not yet – and there is no sense in pretending otherwise. Unsuitable content should not be served to unsuspecting web users, however, which is why Microsoft and Google may be developing technologies designed to solve the problem.

What is Unsuitable?
Unsuitable may refer to the quality of a website’s content, or it may describe sensitive categories of content such as gambling and pornography. SEO experts more often use the term to describe advertisements that appear incongruous on their host websites.
The point is best illustrated by example. Steven Levy’s book about Google, In The Plex, provides several cautionary tales for website administrators and SEO specialists. In the book, Mr Levy highlights various indiscretions arising from automated Google AdSense campaigns. One of the worst was an advertisement for plastic bags on a web page reporting the news of a murder. What made the advertisement unsuitable? The murder victim’s dismembered body parts had been stored in plastic bags before being discovered. In this sense, unsuitable is synonymous with gruesomely insensitive.
There are countless other ways in which automated advertising campaigns might be considered unsuitable or insensitive, but the important point to note is that Google has yet to adequately address the problem. According to Mr Levy, the problem may never be resolved because it is simply too hard.
Developments
A good UK SEO company appreciates the importance of matching ads to content in ways that avoid offending potential customers. The extent to which unsuitable content in this context might offend web users is a matter of speculation, but no respectable UK SEO company would risk alienating potential customers by pairing the insensitive with the unsuitable.
Beyond human-managed campaigns, web users are at the mercy of search-engine algorithms that process data and classify queries to produce results and serve ads. The absence of a human touch is limiting, not least in the sense that SERPs (search engine results pages) might be filled with unsuitable content, or web pages might be crammed with insensitive, inappropriate ads.
Microsoft has attempted to tackle the problem with its ‘Sensitive webpage content detection’ patent, which was granted last year after being filed in 2007. The aim of Microsoft’s tool is to identify “sensitivity categories” by analysing one or more pages of content on a website.
The system uses “multi-class classifiers” – essentially algorithmic filters – to establish the “sensitivity level” of a website. Microsoft provides a detailed breakdown of categories considered sensitive or non-sensitive, with topics involving war, criminal activities, sex or terrorism among those that would receive a relatively high sensitivity level. The application also assesses the frequency of keywords (sex, porn, naked, etc.) within a page of content.
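To make the keyword-frequency idea concrete, here is a minimal Python sketch of how a multi-class sensitivity scorer might work. The category lists, threshold and function names are illustrative assumptions on my part – Microsoft’s patent does not disclose its actual lexicons or weighting – so treat this as a sketch of the technique rather than the patented system.

    import re
    from collections import Counter

    # Hypothetical keyword lists per sensitivity category. The patent does
    # not publish its lexicons, so these sets are placeholders only.
    CATEGORY_KEYWORDS = {
        "war": {"war", "battle", "invasion", "troops"},
        "crime": {"murder", "theft", "assault", "fraud"},
        "adult": {"sex", "porn", "naked"},
        "terrorism": {"terrorism", "bomb", "hostage"},
    }

    def sensitivity_scores(page_text):
        # Normalise keyword hits by the total word count of the page,
        # so long pages are not penalised simply for being long.
        words = re.findall(r"[a-z]+", page_text.lower())
        counts = Counter(words)
        total = max(len(words), 1)
        return {
            category: sum(counts[kw] for kw in keywords) / total
            for category, keywords in CATEGORY_KEYWORDS.items()
        }

    def classify(page_text, threshold=0.01):
        # Tag the page with every category whose normalised keyword
        # frequency clears the threshold; an empty list means
        # the page is treated as non-sensitive.
        scores = sensitivity_scores(page_text)
        return [cat for cat, score in scores.items() if score >= threshold]

    sample = "Police report a murder; the theft suspect fled the scene."
    print(classify(sample))  # ['crime'] -- two crime keywords in ten words

A production classifier would use far richer features than raw keyword counts, but the basic shape – a score per category, then a threshold – is the same.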
Google has yet to produce a system that effectively analyses the sensitivity of website content, but it has shown a willingness to censor content deemed unsuitable for users living in countries such as China and India. The extent to which sensitivity testing may one day overlap with broad-scale censorship is a cause for concern for many internet users.

Author: Deepak
Deepak Rana is the CEO and founder of Technofers. He is a young blogger from India.