Find Duplicate Content on Your WordPress Blog and Fix It
Setting up a personal blog or business portal on WordPress is one of the most common and widely accepted ways to get your own content onto the World Wide Web, but a lot goes on behind the scenes to make it outshine other sites and drive in traffic. You are probably aware of SEO-specific write-ups and other ways of pulling in traffic; avoiding repetition and duplicate content penalties is just as important if your pages are to earn proper authority with search engines.
The Problem
In WordPress terms, repetition means several different URLs that lead to the same page of a blog. The system generally creates these URLs automatically, and the different ways of reaching the web server (with or without www, for example) add to the problem.
For example, all of the following URLs can serve the same content:
- http://www.domain.com/new-blog-post
- http://domain.com/new-blog-post
- http://www.domain.com/category/new-category/new-blog-post
- http://www.domain.com/tag/new-tag/new-blog-post
- http://domain.com/category/new-category/new-blog-post
- http://domain.com/tag/new-tag/new-blog-post
What is the drawback of multiple addresses pointing to the same page?
The simple answer is that search engines get confused when multiple URLs lead to the same content. They may treat the extra URLs as copies of one original page, and over time the pages can slip down in search results because of the duplicate entries, something popularly known as the duplicate content penalty.
The Way Out
The first step in reducing duplication is to find the duplicates, that is, the results search engines are penalizing for being repeated. An efficient way to do this is to search for www.yoursitename.com on Google (the site: operator, as in site:www.yoursitename.com, limits the results to your domain), repeat the search with the omitted results included, and note which URLs from your blog have been indexed. Once this is done, you are ready to tackle the main problem. Duplicate URLs can come from individual elements of the blog or from the whole blog itself.
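Once you have noted the indexed URLs, grouping them by their final path segment makes the duplicates easier to spot. The following is a minimal sketch in plain PHP; the function name group_urls_by_slug and the sample URLs are illustrative assumptions, not part of WordPress.

```php
<?php
/**
 * Group URLs by their last path segment (the post slug) and keep
 * only the slugs that are reachable through more than one URL.
 * Hypothetical helper for illustration; not a WordPress function.
 */
function group_urls_by_slug( array $urls ): array {
    $groups = [];
    foreach ( $urls as $url ) {
        $path = parse_url( $url, PHP_URL_PATH );
        if ( ! is_string( $path ) ) {
            continue; // Skip URLs that have no path at all.
        }
        $slug = basename( rtrim( $path, '/' ) );
        if ( $slug === '' ) {
            continue; // Skip the site root.
        }
        $groups[ $slug ][] = $url;
    }
    // A slug with a single URL is not a duplicate.
    return array_filter( $groups, fn( $list ) => count( $list ) > 1 );
}

// Sample data (assumed): URLs noted from the search results.
$indexed = [
    'http://www.domain.com/new-blog-post',
    'http://domain.com/new-blog-post',
    'http://www.domain.com/tag/new-tag/new-blog-post',
    'http://www.domain.com/about',
];

foreach ( group_urls_by_slug( $indexed ) as $slug => $list ) {
    echo $slug . ': ' . count( $list ) . " URLs\n";
}
```

With the sample list, only new-blog-post is reported, because it is the only slug reached through more than one address.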
Tags & Categories
Author & Archives
Author and archive links are among the most user-friendly additions to a blog, but the pages they create carry the same content as the main posts and are indexed by search engines all the same. The way out is to set up the archives to display an excerpt of each post rather than the whole post.
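One way to do this in a theme is to print the full text only on single views and an excerpt everywhere else. The sketch below assumes a theme whose templates call a hypothetical render_post_body() inside the Loop; body_template_tag() is likewise an illustrative name, while is_singular(), the_content() and the_excerpt() are standard WordPress template tags.

```php
<?php
/**
 * Decide which WordPress template tag should print a post's body.
 * Hypothetical helper: single views get the full text, while listing
 * pages (author, date, tag and category archives) get an excerpt.
 */
function body_template_tag( bool $is_single_view ): string {
    return $is_single_view ? 'the_content' : 'the_excerpt';
}

/**
 * Call this inside the Loop of a theme template (for example an
 * assumed archive.php) instead of calling the_content() directly.
 * It only works inside WordPress, where the template tags exist.
 */
function render_post_body(): void {
    $tag = body_template_tag( is_singular() );
    $tag(); // Variable function call: runs the_content() or the_excerpt().
}
```

Keeping the decision in a small function of its own means it can be checked without loading WordPress.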
Attachment URL issue
Each picture or video attached to a post gets its own attachment URL, which adds yet more links with little unique content, and the same media may also be uploaded elsewhere on the internet. The best way out is to display the media in the post itself rather than linking it to its root source. Moreover, pointing the attachment's URL to the parent post helps you fight some canonicalization issues.
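Pointing attachment URLs at the parent post can be sketched as a redirect hooked into template_redirect. This is a sketch under assumptions, not a definitive implementation: attachment_redirect_target() is a hypothetical helper, the code is assumed to live in the theme's functions.php, and attachments without a parent post are left alone.

```php
<?php
/**
 * Work out where an attachment page should redirect to.
 * Hypothetical helper: returns the parent post's URL, or null when the
 * attachment has no parent (ID 0) and no redirect should happen.
 */
function attachment_redirect_target( int $parent_id, callable $permalink_of ): ?string {
    if ( $parent_id <= 0 ) {
        return null;
    }
    $url = $permalink_of( $parent_id );
    return ( is_string( $url ) && $url !== '' ) ? $url : null;
}

// Register the hook only when running inside WordPress.
if ( function_exists( 'add_action' ) ) {
    add_action( 'template_redirect', function () {
        if ( ! is_attachment() ) {
            return;
        }
        $target = attachment_redirect_target(
            (int) wp_get_post_parent_id( get_queried_object_id() ),
            'get_permalink'
        );
        if ( $target !== null ) {
            wp_safe_redirect( $target, 301 ); // Permanent redirect to the parent post.
            exit;
        }
    } );
}
```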
Canonical Issues
- http://domain.com/new-blog-post
- http://domain.com/new-blog-post/
- http://www.domain.com/new-blog-post
- http://www.domain.com/new-blog-post/
All of the above links take you to the very same page. Settling on one canonical URL for each page, and redirecting the other variants to it, reduces or removes this source of duplicate pages.
To enforce a canonical page URL, you can use the rules below:
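# Send any host other than www.mysite.com to the www host with a permanent (301) redirect
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.mysite\.com$ [NC]
RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L]
# Hand every request that is not an existing file or directory to WordPress
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]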
Insert the above rules into your .htaccess file so that requests for the other host names are answered with 301 redirects.
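To see which address the variants listed earlier should collapse to, the normalization can be sketched in plain PHP. The function name canonical_url is hypothetical, and the choice of the www host and of paths without a trailing slash is an assumption; use whichever form your permalink settings produce.

```php
<?php
/**
 * Map any variant of a URL to one canonical form: the given host and
 * a path without a trailing slash (the site root stays "/").
 * Hypothetical helper for illustration; the host is an assumption.
 */
function canonical_url( string $url, string $host = 'www.mysite.com' ): string {
    $parts = parse_url( $url );
    $path  = isset( $parts['path'] ) ? rtrim( $parts['path'], '/' ) : '';
    if ( $path === '' ) {
        $path = '/';
    }
    $query = isset( $parts['query'] ) ? '?' . $parts['query'] : '';
    return 'http://' . $host . $path . $query;
}

// The four variants of one post, using the mysite.com host from the rules.
$variants = [
    'http://mysite.com/new-blog-post',
    'http://mysite.com/new-blog-post/',
    'http://www.mysite.com/new-blog-post',
    'http://www.mysite.com/new-blog-post/',
];

foreach ( $variants as $variant ) {
    echo canonical_url( $variant ) . "\n";
}
```

All four variants print as http://www.mysite.com/new-blog-post, which is the single address a redirect or canonical setting should lead to.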
Content stealing by third party
Once content from your page is copied and hosted under another domain name, the repetition issue becomes a problem again. In this case you cannot fix the problem yourself and must approach the search engine. Google, for example, lets you file DMCA complaints to have duplicate or copied content removed from its index.
Adding noindex, follow Meta tags
This means telling the search engines which pages of your blog their spiders should index, so that they do not come across repeated data. A simple way to do this is to edit the theme's header code in the WP theme editor and insert the following code within the head.
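<?php
// Index the first page of the blog home, single posts, pages and category archives; keep everything else out of the index
if ( ( is_home() && ( $paged < 2 ) ) || is_single() || is_page() || is_category() ) {
    echo '<meta name="robots" content="index,follow" />';
} else {
    echo '<meta name="robots" content="noindex,follow" />';
}
?>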
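If you would rather not edit the header template, the same rule can be attached to the wp_head action, for instance from the theme's functions.php. This is only a sketch: robots_meta_tag() is a hypothetical helper, and is_paged() stands in for the $paged check used above.

```php
<?php
/**
 * Build the robots meta tag for a page.
 * Hypothetical helper: $indexable says whether the page should be indexed.
 */
function robots_meta_tag( bool $indexable ): string {
    $content = $indexable ? 'index,follow' : 'noindex,follow';
    return '<meta name="robots" content="' . $content . '" />' . "\n";
}

// Register the hook only when running inside WordPress.
if ( function_exists( 'add_action' ) ) {
    add_action( 'wp_head', function () {
        // is_paged() is true on page 2 and later of a listing.
        $first_home_page = is_home() && ! is_paged();
        echo robots_meta_tag(
            $first_home_page || is_single() || is_page() || is_category()
        );
    } );
}
```

Use either this hook or the template edit, not both, so that each page carries only one robots tag.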
Adding Unique Meta description
Using third-party WP plugins, you can pull an excerpt from each post and use it as that post's Meta description. This helps spiders tell a post apart from the duplicate data found in the main post or on tag and category pages.
To get the most out of the web and draw in the maximum traffic, then, you must get rid of duplicate links so that search engines recognize your pages and the traffic keeps coming.