How to Find Word Frequencies using Regex

In this article, you will see how to find the frequencies of different words within a C# string using regular expressions. You will first see how to count the number of words in a string, and then how to find the frequency of occurrence of each word within the input string. So, let’s begin without further ado.

Counting All Words

To count all the words in a string, you can use the static Matches() method of the Regex class from the System.Text.RegularExpressions namespace.

The Matches() method accepts the input string and a regular expression as the first and second parameters, respectively. The regular expression that matches all the words in a string is “\w+”. Keep in mind that \w matches letters, digits, and the underscore, so a number such as 32 is also counted as a word.
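To see exactly what “\w+” picks up, the sketch below compares it with a letters-only pattern, “\p{L}+”, which is one possible choice if numbers should not count as words. The class name WordPatternSketch, the helper FindAll, and the sample input are illustrative assumptions, not part of the original scripts.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

namespace RegexCodes
{
    // Hypothetical helper class, used only for this illustration.
    public static class WordPatternSketch
    {
        // Returns the text of every match of the pattern in the input, in order.
        public static string[] FindAll(string input, string pattern)
        {
            return Regex.Matches(input, pattern)
                        .Cast<Match>()
                        .Select(m => m.Value)
                        .ToArray();
        }

        public static void Main()
        {
            string input = "user_1 is (32) years old";

            // \w+ treats digits and underscores as word characters.
            Console.WriteLine(string.Join(", ", FindAll(input, @"\w+")));

            // \p{L}+ matches runs of Unicode letters only.
            Console.WriteLine(string.Join(", ", FindAll(input, @"\p{L}+")));
        }
    }
}
```

With this input, the first pattern should return “user_1” and “32” as words, while the second splits “user_1” at the underscore and skips the number.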

The Matches() method returns a MatchCollection, a collection of Match objects where each object contains information about one of the matched words. To find the total number of words, you can use the Count property of the collection.

Furthermore, you can use the Value property of each Match object to get the text of the matched word. Similarly, the Index property gives the zero-based position in the input string at which the word starts.

The following script prints the total count of all the words in the input string along with the text and index for each word.

using System;
using System.Text.RegularExpressions;

namespace RegexCodes
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = "Your name is [John] and your age is (32). You are {married}";

            string regex = @"\w+";

            var result = Regex.Matches(input, regex);

            if (result.Count > 0)
            {
                Console.WriteLine("Total Number of Words found: " + result.Count);

                // Print the text and the zero-based start index of every match.
                foreach (Match m in result)
                {
                    Console.WriteLine("Word '" + m.Value + "' found at index: " + m.Index);
                }
            }

            Console.ReadLine();
        }
    }
}

 

Output:

[Image: Regular Expression How to Find Word Frequency]

 

Finding the Frequency of Each Word

You can also find the frequency of occurrence of each word. To do so, you can again use the Matches() method, which matches all the words in the input string. The regular expression remains the same, i.e., “\w+”.

Next, you can create a C# Dictionary collection with keys of type string and values of type int. The keys of this dictionary store the word texts, while the values store the number of occurrences of each word.

After that, you can iterate through the collection of Match objects returned by the Matches() method. If the word doesn’t already exist in the dictionary, add the word text as a key with a value of 1. Otherwise, increment the corresponding value by 1. The script converts each word to lowercase first, so that “Two” and “two” are counted as the same word.

The following script returns frequencies of occurrences of all the words in an input string.

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace RegexCodes
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = "Two bananas and two apples. Three bananas, and two mangoes.";

            string regex = @"\w+";

            var result = Regex.Matches(input, regex);

            Dictionary<string, int> words = new Dictionary<string, int>();

            if (result.Count > 0)
            {
               
                Console.WriteLine("Total Number of Words found: " + result.Count);

                foreach (Match m in result)
                {
                    // Normalize case so that "Two" and "two" are counted together.
                    string key = m.Value.ToLower();

                    if (!words.ContainsKey(key))
                        words.Add(key, 1);
                    else
                        words[key]++;
                }

                foreach (var item in words)
                {
                    Console.WriteLine(item.Key + " " + item.Value);
                }

            }

            Console.ReadLine();
        }
    }
}

 

Output:

[Image: Regular Expression - Find Word Frequency]
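As a variation on the script above, the dictionary can be made case-insensitive with StringComparer.OrdinalIgnoreCase, which removes the need to call ToLower(), and the results can be sorted by frequency before printing. This is only a sketch: the class name SortedFrequencySketch, the helper CountWords, and the chosen sort order are assumptions rather than part of the original script.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

namespace RegexCodes
{
    // Hypothetical helper class, used only for this illustration.
    public static class SortedFrequencySketch
    {
        // Counts each word, ignoring case. Keys keep the casing of the first occurrence.
        public static Dictionary<string, int> CountWords(string input)
        {
            var words = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

            foreach (Match m in Regex.Matches(input, @"\w+"))
            {
                // count is 0 when the word has not been seen yet.
                words.TryGetValue(m.Value, out int count);
                words[m.Value] = count + 1;
            }

            return words;
        }

        public static void Main()
        {
            string input = "Two bananas and two apples. Three bananas, and two mangoes.";

            // Highest frequency first; ties are broken alphabetically.
            foreach (var item in CountWords(input)
                         .OrderByDescending(p => p.Value)
                         .ThenBy(p => p.Key, StringComparer.OrdinalIgnoreCase))
            {
                Console.WriteLine(item.Key + " " + item.Value);
            }
        }
    }
}
```

Sorting explicitly is also the safer choice when the order of the output matters, since Dictionary does not guarantee any particular enumeration order.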

 

Finding Frequency of Specific Words

The regular expression “\w+” matches every word in the input string. You can also find the frequencies of specific words within a string by narrowing the regular expression.

For instance, in the script below, the regular expression is “\w*s\b”, which matches all the words that end with an “s”: \w* matches the beginning of the word, s matches its final letter, and \b asserts that a word boundary follows.

The rest of the process is similar to what you saw in the previous section. You can create a dictionary where keys correspond to word texts, while values correspond to frequencies of occurrences for words.

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace RegexCodes
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = "Two bananas and two apples. Three bananas, and two mangoes.";

            string regex = @"\w*s\b";

            var result = Regex.Matches(input, regex);

            Dictionary<string, int> words = new Dictionary<string, int>();

            if (result.Count > 0)
            {
               
                Console.WriteLine("Total Number of Words found: " + result.Count);

                foreach (Match m in result)
                {
                    // Normalize case so that "Two" and "two" are counted together.
                    string key = m.Value.ToLower();

                    if (!words.ContainsKey(key))
                        words.Add(key, 1);
                    else
                        words[key]++;
                }

                foreach (var item in words)
                {
                    Console.WriteLine(item.Key + " " + item.Value);
                }

            }

            Console.ReadLine();
        }
    }
}

 

Output:

[Image: Reg Ex How to Find Word Frequency]
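If the goal is to count a fixed list of words rather than words that share a shape, one option is to build an alternation such as “\b(?:two|bananas)\b” from that list. The sketch below takes this approach. The class name SpecificWordsSketch and the helper CountSpecific are hypothetical, and Regex.Escape() is used so that any special characters in the target words are treated literally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

namespace RegexCodes
{
    // Hypothetical helper class, used only for this illustration.
    public static class SpecificWordsSketch
    {
        // Counts only the given words, ignoring case.
        // Target words that never occur are reported with a count of 0.
        public static Dictionary<string, int> CountSpecific(string input, params string[] targets)
        {
            var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
            foreach (string t in targets)
                counts[t] = 0;

            // Nothing to search for, so return the empty result.
            if (targets.Length == 0)
                return counts;

            // Build e.g. \b(?:two|bananas)\b, escaping any regex metacharacters.
            string pattern = @"\b(?:" + string.Join("|", targets.Select(t => Regex.Escape(t))) + @")\b";

            foreach (Match m in Regex.Matches(input, pattern, RegexOptions.IgnoreCase))
                counts[m.Value]++;

            return counts;
        }

        public static void Main()
        {
            string input = "Two bananas and two apples. Three bananas, and two mangoes.";

            foreach (var item in CountSpecific(input, "two", "bananas", "kiwi"))
                Console.WriteLine(item.Key + " " + item.Value);
        }
    }
}
```

The \b anchors on both sides keep a target such as “two” from matching inside a longer word.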

 

Finding Frequencies using Split Method

In addition to using the Matches() method, you can also use the Regex.Split() method to count the frequencies of all the words in a string. The Split() method splits a string wherever the given regular expression matches and returns the resulting substrings as an array.

To count the frequencies of all the words using the Split() approach, you can use the regular expression “\s+”, which splits the string on runs of whitespace. Because punctuation would otherwise stay attached to the words, the script first removes it by replacing every match of “[^\w\s]+” with an empty string.

Next, using the returned array of strings, you can build a dictionary that contains the words and their corresponding frequencies, as you saw in the previous sections.

Here is an example of how to use the Split() method approach to finding word frequencies within an input string.

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace RegexCodes
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = "Two bananas and two apples. Three bananas, and two mangoes.";

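            // Matches punctuation: anything that is neither a word character nor whitespace.
            string regex_clean = @"[^\w\s]+";
            // Matches runs of whitespace, used as the delimiter.
            string regex = @"\s+";

            // Remove punctuation so that "apples." and "apples" count as the same word.
            string clean = Regex.Replace(input, regex_clean, "");

            var result = Regex.Split(clean, regex);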

            Dictionary<string, int> words = new Dictionary<string, int>();

            if (result.Length > 0)
            {

                Console.WriteLine("Total Number of Words found: " + result.Length);

                foreach (string s in result)
                {
                    // Normalize case so that "Two" and "two" are counted together.
                    string key = s.ToLower();

                    if (!words.ContainsKey(key))
                        words.Add(key, 1);
                    else
                        words[key]++;
                }

                foreach (var item in words)
                {
                    Console.WriteLine(item.Key + " " + item.Value);
                }

            }

            Console.ReadLine();
        }
    }
}

 

Output:

[Image: Regular Expression - Find Word Frequency]
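One caveat with the Split() approach: if the cleaned string starts or ends with whitespace, Regex.Split() returns an empty string at that position, and it would be counted as a word. A minimal sketch of one way to guard against this is to filter out empty entries before counting. The class name SplitSketch, the helper SplitWords, and the sample input are assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

namespace RegexCodes
{
    // Hypothetical helper class, used only for this illustration.
    public static class SplitSketch
    {
        // Strips punctuation, splits on whitespace, and drops empty entries.
        public static string[] SplitWords(string input)
        {
            string clean = Regex.Replace(input, @"[^\w\s]+", "");

            return Regex.Split(clean, @"\s+")
                        .Where(s => s.Length > 0)
                        .ToArray();
        }

        public static void Main()
        {
            // Leading and trailing spaces would otherwise produce empty entries.
            string input = "  Two bananas, and two apples.  ";

            string[] words = SplitWords(input);

            Console.WriteLine("Total Number of Words found: " + words.Length);
            foreach (string w in words)
                Console.WriteLine(w);
        }
    }
}
```

Note also that the two approaches treat apostrophes differently: with Matches() and “\w+”, a word such as “don’t” is split into “don” and “t”, whereas the Split() script removes the apostrophe and keeps “dont” as a single word.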


© , Regexsonline.com — All Rights Reserved - Terms of Use - Privacy Policy