How to Get Related Posts by Similar Tags in SQL

Here's some SQL code to find Related Blog Posts by common Tags using SQL.
by Updated August 25, 2012
If you've ever wanted to get Related Posts based on similar Tags from other Posts using SQL, the following SQL code should help with creating your own SQL query.  In this example, the Related Posts are articles, but they could be Blog posts or any other data that uses Tags.  The SQL query below gets the count of Posts that have the most amount of Tags in common and orders these Related Posts highest.  So in essence, the Related Posts returned should be more highly related based on the number of tags that the Posts have in common.   SEE BELOW for the SQL Table Structure of the Tags.

Basic code to query the number of Common Tags an Article has with other Related Articles (Ordered by Common Tag):
SELECT tpe1.ArticleId, tpe2.ArticleId, COUNT(*) as CommonTags
FROM dbo.article_TagPostEntry tpe1 
INNER join dbo.article_TagPostEntry tpe2 ON tpe1.TagId = tpe2.TagId
WHERE  tpe1.ArticleId <> tpe2.ArticleId
GROUP BY tpe1.ArticleId, tpe2.ArticleId
ORDER BY CommonTags DESC;


Here's the SQL code to query Related Article Posts based on similar Tags:
DECLARE @ArticleId INT
SET @ArticleId = 3 -- TEST NUMBER

-- Get the Related Articles based on similiar Tags
			SELECT TOP 5 a.Title, a.Description FROM dbo.article_Articles a 
			WHERE a.ArticleId IN (
				SELECT TOP 8 tpe2.ArticleId
				FROM dbo.article_TagPostEntry tpe1 
				INNER join dbo.article_TagPostEntry tpe2 ON tpe1.TagId = tpe2.TagId
				WHERE  tpe1.ArticleId != tpe2.ArticleId AND tpe1.ArticleId = @ArticleId
				GROUP BY tpe2.ArticleId 
				ORDER BY NEWID())
			



Here's another way to query Related Article Posts based on similar Tags (THIS METHOD WILL PROBABLY be SLOWER, but you could modify the code to query TAGS from seperate tables that may not be related by IDs):
DECLARE @ArticleId INT
SET @ArticleId = 3 -- TEST NUMBER

-- Get the Related Articles based on similiar Tags
			SELECT TOP 5 a.Title, a.Description FROM dbo.article_Articles a 
			WHERE a.ArticleId IN (
				SELECT TOP 5 ArticleId
				FROM dbo.article_TagPostEntry 
				WHERE ArticleId <> @ArticleId AND TagId IN (SELECT tpe.TagId FROM dbo.article_TagPostEntry tpe WHERE tpe.ArticleId = @ArticleId) 
				GROUP BY  ArticleId 
				ORDER BY COUNT(TagId) DESC)
				ORDER BY NEWID()

Alternatively, to get a more randomized listing of related Posts, replace ORDER BY COUNT(TagID) DESC wtih:
ORDER BY NEWID(), COUNT(TagId) DESC


----------------------

Keep in mind the Table Structure for Tags uses 2 tables:

Table #1 is the "article_TagPostEntry" table and looks like this:
TABLE [dbo].[article_TagPostEntry](
	[PostTagId] [bigint] IDENTITY(1,1) NOT NULL,
	[ArticleId] [int] NOT NULL,
	[TagId] [bigint] NOT NULL,
	[DateCreated] [datetime] NOT NULL,

Table #2 is the "article_TagNames" table and is where the Names of the Tags are actually kept.  TagId is used to relate the two Tag tables together.  The Related Posts SQL code in the example above doesn't need to access this table.
 
TABLE [dbo].[article_TagNames](
	[TagId] [bigint] IDENTITY(1,1) NOT NULL,
	[TagName] [nvarchar](50) NOT NULL,
	[TagPathName] [nvarchar](100) NOT NULL,
	[TagCount] [int] NOT NULL,
	[DateCreated] [date] NOT NULL,

NOTE: The main Articles table gets related to the "article_TagPostEntry" table using ArticleId.


JUST Keep in mind that these queries will put a serious strain on your database if you run them dynamically everytime someone visits a page.  You'd probably be better off inserting the Related Posts that this query outputs into a Table, and retrieving the data by ID's.   Otherwise, you may find you get a lot of  SQL Timeout errors if your database server is not powerful enough to handle the load on a high traffic site.  You don't want to deal with the following error:
 "Timeout expired. The timeout period elapsed prior to completion of the
operation or the server is not responding"
 


2
0

Add your comment

by Anonymous - Already have an account? Login now!
Your Name:

Comment:
Enter the text you see in the image below
What do you see?
Can't read the image? View a new one.
Your comment will appear after being approved.

Related Posts


UPDATE 12-16-2011: If you are using ASP.NET and your website / database is live, and you want to take the database offline, you're better off putting an App_Offline.htm file in your website main directory to first take the website down for maintance (I...  more »

Here's a step by step process to enable a remote connection to your SQL Server 2008 Express database that is configured on Windows Server 2008 using an assigned port #. 1) First make sure the SQL Server Authentication is enabled and the User name that...  more »

In SQL Server 2008 Management Studio, you may want to change the default settings to allow you to edit more than the 200 rows at a time, or select more than 1000 rows, which are the default settings for SQL Server. To modify "Edit Top 200 Rows" or "Select...  more »

When using SQL Server 2008, you may receive a Save (Not Permitted Dialog) box pop-up when trying to save changes to an existing table. The dailog box reads: "Saving changes is not permitted. The changes you have made require the following tables to be...  more »

Here's how you can enable SQL Server Authentication in a SQL Server 2008 database, and then add a User login and password account for connecting to the database. Open SQL Server Management Studio In the Object Explorer sidebar, right-click on the top SQL...  more »

So here's a quick tip on how to fix the following SQL Server "SqlDateTime overflow" error. System.Data.SqlTypes.SqlTypeException SqlDateTime overflow. Must be between 1/1/1753 12:00:00 AM and 12/31/9999 11:59:59 PM. The reason that you are probably...  more »

I just came across a way to test a data providers connection string (like a SQL Server database) with the help of a plain text file using Notepad. To investigate and test out if your connection string works, your going to want to create a UDL file. To do...  more »

Here's a quick SQL tip on how to get records beginning with numbers only. Use the query string LIKE '[0-9]%' For example: SELECT u.UserId, u.UserName FROM dbo.aspnet_Users u WHERE u.UserName LIKE '[0-9]%' ORDER BY u.UserName Keep in mind, If you are using...  more »