How to Code a Search Helper Class to Clean Stop Words with C#

Remove stopwords in text with C#.
by Updated May 25, 2021

Today I decided to implement a StopWords filter in C# that filters certain words out of a search engine query. I wanted something to remove common words like "a", "I", "to", "the", and "how" from search queries, since in most cases these words don't help return more accurate results; they mostly just match more unnecessary results.

Keep in mind, there's no end-all, be-all list of stop words for every case, because ultimately you have to decide for yourself what's best for the application (and its users) when choosing which stop words to include and which to exclude. However, below are a couple of resources where you can find common stop word lists that you may want to use to create your own StopWords list:

I ultimately narrowed my StopWords list down to some of the more common words that I felt wouldn't interfere too much with a searcher's intent:

"a", "about", "actually", "after", "also", "am", "an", "and", "any", "are", "as", "at", "be", "because", "but", "by", "could", "do", "each", "either", "en", "for", "from", "has", "have", "how", "i", "if", "in", "is", "it", "its", "just", "of", "or", "so", "some", "such", "that", "the", "their", "these", "thing", "this", "to", "too", "very", "was", "we", "well", "what", "when", "where", "who", "will", "with", "you", "your"
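Since every application needs its own list, one option is to pass the stop words in rather than hard-coding them. The sketch below is a hypothetical variation, not the SearchHelper class from this post: the `StopWordFilter` name, its members, and the whitespace-only splitting are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variation (not the post's SearchHelper): the caller supplies the
// stop word list, so each application can choose its own words.
public class StopWordFilter
{
    // Whitespace characters used to split a query into words.
    private static readonly char[] Whitespace = { ' ', '\t', '\r', '\n' };

    private readonly HashSet<string> stopWords;

    public StopWordFilter(IEnumerable<string> words)
    {
        // Case-insensitive comparer, so "The" and "the" count as the same word.
        stopWords = new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);
    }

    public bool IsStopWord(string word)
    {
        return stopWords.Contains(word);
    }

    // Returns the query's non-stop words in their original order and casing,
    // joined by single spaces.
    public string Clean(string query)
    {
        if (string.IsNullOrWhiteSpace(query))
            return string.Empty;

        var kept = new List<string>();
        foreach (string word in query.Split(Whitespace, StringSplitOptions.RemoveEmptyEntries))
        {
            if (!IsStopWord(word))
                kept.Add(word);
        }
        return string.Join(" ", kept);
    }
}
```

For example, `new StopWordFilter(File.ReadAllLines("stopwords.txt"))` would load one word per line from a text file (the file name is hypothetical), letting you adjust the list without recompiling.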

Once I figured out my StopWords list, I created a SearchHelper class in C# to clean the words in a search query before sending them to the database to return search results. Below is the SearchHelper.cs C# class (download available: see the attached .cs file below):

using System;
using System.Collections.Generic;
using System.Text;

public class SearchHelper
{
    // Stop words to filter out of search queries. A HashSet is built once and
    // gives fast, case-insensitive lookups.
    private static readonly HashSet<string> stopWords = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    {
        "a", "about", "actually", "after", "also", "am", "an", "and", "any", "are", "as", "at", "be", "because", "but", "by",
        "could", "do", "each", "either", "en", "for", "from", "has", "have", "how",
        "i", "if", "in", "is", "it", "its", "just", "of", "or", "so", "some", "such", "that",
        "the", "their", "these", "thing", "this", "to", "too", "very", "was", "we", "well", "what", "when", "where",
        "who", "will", "with", "you", "your"
    };

    // Characters that separate one word from the next.
    private static readonly char[] wordSeparators = { ' ', '\t', '\n', '\r', ',', ';', '.', '!', '-', '"', '\'' };

    /// <summary>
    /// Removes stop words from the specified search string.
    /// </summary>
    /// <returns>The remaining words in lowercase, separated by single spaces.</returns>
    public static string CleanSearchedWords(string searchedWords)
    {
        if (string.IsNullOrWhiteSpace(searchedWords))
            return string.Empty;

        // Remove special characters (most of these are regex metacharacters).
        searchedWords = searchedWords
            .Replace("\\", string.Empty)
            .Replace("|", string.Empty)
            .Replace("(", string.Empty)
            .Replace(")", string.Empty)
            .Replace("[", string.Empty)
            .Replace("]", string.Empty)
            .Replace("*", string.Empty)
            .Replace("?", string.Empty)
            .Replace("}", string.Empty)
            .Replace("{", string.Empty)
            .Replace("^", string.Empty)
            .Replace("+", string.Empty);

        // Transform the search string into an array of words.
        string[] words = searchedWords.Split(wordSeparators, StringSplitOptions.RemoveEmptyEntries);

        StringBuilder sb = new StringBuilder();
        foreach (string rawWord in words)
        {
            string word = rawWord.Trim().ToLowerInvariant();

            // Skip one-character words and anything on the stop word list.
            if (word.Length > 1 && !stopWords.Contains(word))
            {
                // Put a single space between words, with no trailing space.
                if (sb.Length > 0)
                    sb.Append(' ');
                sb.Append(word);
            }
        }

        return sb.ToString();
    }
}
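If you'd rather not maintain a list of characters to strip, a different technique is to split on any run of characters that are not letters or digits. The sketch below swaps in that regex-based approach; it is not the method used in SearchHelper above, and the `RegexSearchCleaner` name and its abbreviated stop word list are placeholders for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Alternative sketch (a swapped-in technique, not the post's approach): split on
// runs of non-letter, non-digit characters instead of Replace plus a separator list.
public static class RegexSearchCleaner
{
    // Abbreviated stop word list for illustration; substitute your own full list.
    private static readonly HashSet<string> StopWords = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    {
        "a", "an", "and", "how", "i", "in", "is", "of", "the", "to"
    };

    // Matches one or more characters that are neither Unicode letters nor decimal digits.
    private static readonly Regex NonWordRun = new Regex(@"[^\p{L}\p{Nd}]+");

    public static string Clean(string query)
    {
        if (string.IsNullOrWhiteSpace(query))
            return string.Empty;

        var kept = new List<string>();
        foreach (string token in NonWordRun.Split(query.ToLowerInvariant()))
        {
            // Regex.Split can return empty strings at either end; the length check
            // drops those along with one-character words.
            if (token.Length > 1 && !StopWords.Contains(token))
                kept.Add(token);
        }
        return string.Join(" ", kept);
    }
}
```

One trade-off to weigh: because characters like '#' now act as separators, a query such as "C# tips" comes back as just "tips", whereas SearchHelper keeps "c#". Likewise "foo(bar)" becomes two words here but "foobar" in SearchHelper.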

That's it! Now, in your search results page code, you can call SearchHelper.CleanSearchedWords(searchWordsHere) to clean the searched words string. It's pretty simple, but it works well for filtering common words out of a search query.
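One case worth handling on the results page is a query made up entirely of stop words, such as "how to", which cleans down to an empty string. A minimal sketch, assuming you want to fall back to the original words in that case; `SearchTerms.ForQuery` is a hypothetical helper, and the cleaning function is passed in so it can be tested on its own.

```csharp
using System;

// Hypothetical helper for a search results page: use the cleaned query, but fall
// back to the trimmed original if cleaning removed every word.
public static class SearchTerms
{
    public static string ForQuery(string rawQuery, Func<string, string> clean)
    {
        if (string.IsNullOrWhiteSpace(rawQuery))
            return string.Empty;

        string cleaned = clean(rawQuery);

        // A query of only stop words (e.g. "how to") cleans down to nothing; searching
        // for the original words is often more useful than searching for "".
        return string.IsNullOrWhiteSpace(cleaned) ? rawQuery.Trim() : cleaned.Trim();
    }
}
```

You could call it as `SearchTerms.ForQuery(query, SearchHelper.CleanSearchedWords)`, since the static method converts to a `Func<string, string>`.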

 


FILES: SearchHelper.cs - Clean Stop Words in C#
