Find Duplicate Content on Your WordPress Blog and Fix It

Setting up a personal blog or business portal on WordPress is one of the most common and widely accepted ways to publish your own content on the World Wide Web, but a lot goes on behind the scenes to make it outshine other sites and draw in traffic. You are probably aware of SEO-focused writing and other ways of pulling in traffic. Avoiding repetition and duplicate content penalties is just as important if your pages are to earn proper authority with search engines.


The Problem

In WordPress terms, repetition means several different URLs that lead to the same page of a blog. WordPress generally creates these automatically. The different ways of reaching a web server, such as the host name with and without the www prefix, can cause the same problem.

All of the following URLs can return the same content:

  • http://www.domain.com/new-blog-post
  • http://domain.com/new-blog-post
  • http://www.domain.com/category/new-category/new-blog-post
  • http://www.domain.com/tag/new-tag/new-blog-post
  • http://domain.com/category/new-category/new-blog-post
  • http://domain.com/tag/new-tag/new-blog-post

What is the drawback of multiple addresses pointing to the same page?

The simple answer is that search engines get confused when multiple URLs lead to the same page content. They may conclude that the different URLs copied the content from one original page, and over time the pages fall in the search results because of the duplicate entries. This is popularly known as the duplicate content penalty.

The Way Out

The first step in reducing duplication is to find the duplicates and the results that search engines are penalizing for being repeated. An efficient way to do this is to search for www.yoursitename.com on Google (the site: operator, as in site:yoursitename.com, restricts results to your domain), read through the omitted results, and note which URLs belonging to your blog have been indexed. Once this is done, you are ready to tackle the main problem. Duplicate URLs can come from individual elements of the blog or from the blog as a whole.
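Once you have noted the indexed URLs, a small script can help sort them into groups that point at the same post. The following is only a rough sketch: the function names canonical_key and find_duplicate_urls are made up for this example, and it assumes that post slugs are unique and that the last path segment is always the slug, which holds for the URL patterns listed earlier but not for every permalink structure.

```php
<?php
// Hypothetical helper: reduce a URL to one comparable key so that variants
// (www vs. non-www, trailing slash, category/tag prefixes) compare as equal.
function canonical_key(string $url): string
{
    $parts = parse_url(trim($url));
    $host  = strtolower($parts['host'] ?? '');
    // Treat www.domain.com and domain.com as the same host.
    if (strpos($host, 'www.') === 0) {
        $host = substr($host, 4);
    }
    // Keep only the last path segment (assumed to be the post slug).
    $segments = array_values(array_filter(explode('/', $parts['path'] ?? ''), 'strlen'));
    $slug     = $segments ? end($segments) : '';
    return $host . '/' . $slug;
}

// Group URLs by key and return only the groups that contain more than one URL.
function find_duplicate_urls(array $urls): array
{
    $groups = [];
    foreach ($urls as $url) {
        $groups[canonical_key($url)][] = $url;
    }
    return array_filter($groups, function (array $group): bool {
        return count($group) > 1;
    });
}

// Example input: URLs noted from the search results (placeholder domain).
$indexed = [
    'http://www.domain.com/new-blog-post',
    'http://domain.com/new-blog-post/',
    'http://domain.com/tag/new-tag/new-blog-post',
    'http://domain.com/another-post',
];
print_r(find_duplicate_urls($indexed));
```

Each group in the result is a set of addresses that should end up behind a single URL once the fixes described in the following sections are in place.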


Tags & Categories

Once tags and categories have been assigned to posts, most themes show the same post in full on the listing pages for each of its tags and categories. To prevent this, you can set WordPress to show only an excerpt of the post on tag and category pages and link to the main post when the reader wants the full text. This reduces the chances of duplicate pages.
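One way to do this is in the theme's archive template. The fragment below is a minimal sketch, assuming your theme uses the standard WordPress Loop in a file such as archive.php; your theme's markup will differ, so treat the heading tags as placeholders.

```php
<?php
// Sketch for a theme archive template; assumes the standard Loop.
if ( have_posts() ) {
    while ( have_posts() ) {
        the_post();
        // Title linked to the single post, which holds the only full copy.
        the_title( '<h2><a href="' . esc_url( get_permalink() ) . '">', '</a></h2>' );
        if ( is_category() || is_tag() ) {
            the_excerpt(); // summary only on category and tag listings
        } else {
            the_content(); // full text elsewhere
        }
    }
}
```

The conditional tags is_category() and is_tag() are what keep the full text off the listing pages, while the linked title still takes the reader to the main post.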

Author & Archives

Author and archive links are among the most user-friendly additions to a blog, but the pages they create carry the same content as the main post and are indexed by search engines as well. The way out is to set the archives to display an excerpt of the post rather than the whole post.

Attachment URL issue

The links created for a post's attachments, such as pictures or videos, add extra URLs of their own, and the same media may also be uploaded elsewhere on the internet. The best way out is to display the media on your page rather than link it to its source. Pointing the attachment's URL to the parent post also helps you fight some canonicalization issues.
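Pointing attachment URLs at the parent post can be done with a redirect. The snippet below is a sketch for a theme's functions.php, not a definitive implementation: the myblog_ prefix is a made-up name, and sending unattached media to the home page is an assumption you may want to change.

```php
<?php
// Sketch for functions.php: send attachment pages to the post they belong to.
function myblog_redirect_attachment_pages() {
    if ( ! is_attachment() ) {
        return; // only act on attachment pages
    }
    $parent_id = wp_get_post_parent_id( get_queried_object_id() );
    // Unattached media has no parent post; the home page fallback is an assumption.
    $target = $parent_id ? get_permalink( $parent_id ) : home_url( '/' );
    wp_safe_redirect( $target, 301 ); // permanent redirect
    exit;
}
add_action( 'template_redirect', 'myblog_redirect_attachment_pages' );
```

Because the redirect is permanent, search engines should over time drop the attachment URL in favor of the parent post.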

Canonical Issues

  • http://domain.com/new-blog-post
  • http://domain.com/new-blog-post/
  • http://www.domain.com/new-blog-post
  • http://www.domain.com/new-blog-post/

All of the above links take you to the very same page. Configuring your site so that each page has a single canonical URL can reduce or remove this source of duplicate pages.


To enforce a canonical page URL, you can use rules like the following:

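RewriteEngine On
# Send any request for a host other than www.mysite.com to the www host (R=301 makes it permanent)
RewriteCond %{HTTP_HOST} !^www\.mysite\.com$ [NC]
RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L]
RewriteBase /
# Standard WordPress routing: pass anything that is not a real file or directory to index.php
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]

Insert the above rules into your .htaccess file to enable the 301 redirects, replacing mysite.com with your own domain. Note that a bare R flag issues a temporary 302 redirect, so the status code must be written out as R=301.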

Content stealing by third party

When content from your page is copied and hosted under another domain name, the repetition issue becomes a problem again. In this case you cannot fix the problem yourself and must approach the search engine. Google, for example, lets you file DMCA complaints to have duplicate or copied content removed from its index.

Adding noindex, follow Meta tags

This tells the search engines which pages of your blog their spiders should index, so that they avoid coming across repeated data. A simple way to do it is to edit the header template (header.php) in the WP theme editor and insert the following code within the head section.

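<?php
// Index the first page of the blog, single posts, pages, and category archives;
// mark everything else (tags, dates, authors, paged listings) as noindex.
if ( ( is_home() && $paged < 2 ) || is_single() || is_page() || is_category() ) {
    echo '<meta name="robots" content="index,follow" />';
} else {
    echo '<meta name="robots" content="noindex,follow" />';
}
?>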

Adding Unique Meta description

Using third-party WP plugins, you can easily pull an excerpt from each post and use it as that post's Meta description. This gives every page a distinct description, which helps spiders tell the main post apart from the tag and category pages that repeat its content. To get the most out of the web and draw in the maximum traffic, you need to eliminate duplicate links so that search engines recognize your pages and the traffic keeps coming.
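For readers curious about what such a Meta description plugin does internally, here is a rough sketch. The function name myblog_build_description and the 155-character limit are assumptions made for this example, not values taken from any particular plugin, and the length check counts bytes, so it is only approximate for non-ASCII text.

```php
<?php
// Hypothetical helper: turn excerpt text into a short, tag-free description.
function myblog_build_description(string $excerpt, int $max_length = 155): string
{
    // Strip HTML tags and collapse runs of whitespace into single spaces.
    $text = trim(preg_replace('/\s+/', ' ', strip_tags($excerpt)));
    if (strlen($text) <= $max_length) {
        return $text;
    }
    // Cut at the last space before the limit so that no word is split.
    $cut        = substr($text, 0, $max_length);
    $last_space = strrpos($cut, ' ');
    if ($last_space !== false) {
        $cut = substr($cut, 0, $last_space);
    }
    return $cut;
}

// Inside WordPress only: print the tag in the head of single posts and pages.
if (function_exists('add_action')) {
    add_action('wp_head', function () {
        if (is_singular()) {
            $description = myblog_build_description(get_the_excerpt(get_queried_object_id()));
            echo '<meta name="description" content="' . esc_attr($description) . '" />' . "\n";
        }
    });
}
```

The helper is kept separate from the WordPress hook so that it can be tried on its own, for example myblog_build_description('one two three', 9) returns 'one two'.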

About the author

Nitin Agarwal

A blogger, tech evangelist, YouTube creator, books lover, traveler, thinker, and believer of minimalist lifestyle.