
How to Find Total Amount Using a Regular Expression in C#

Text documents such as Word, PDF, or plain text files may contain sale or purchase receipts and invoices. Various tools are available that can read the text from Word, PDF, or other documents. Once the text has been read, the next step is to extract important information from it, such as the total amount of all the items, the invoice number, and the currency symbols. Regular expressions can be used to extract such information.

In this article, you will see how to calculate the total amount from text using regular expressions in C#.

Calculating Total Amount using Numbers Only

Suppose you have the following invoice as a text file. It lists the prices of various fruits, and you want to calculate the total price of all of them.

[Image: Find Invoice Total Amount]

The first step is to read the text file. You can then use a regular expression to pull every number out of the text, convert the numbers to integers, and add them up. One way to get the numbers is the Regex.Split() method with the pattern “\D+”. This pattern matches runs of non-digit characters, so splitting on it returns a string array that holds the digit runs in between (plus empty strings where the text begins or ends with non-digits, which have to be removed before parsing). Because a decimal point is also a non-digit, this approach only works for whole-number prices. The following script reads a fictional receipt from a text file on a local drive and then displays the sum of all the numbers in the file.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace RegexCodes
{
    class Program
    {
        static void Main(string[] args)
        {
            string textFile = File.ReadAllText(@"E:\Datasets\invoice.txt", Encoding.UTF8);
            Console.WriteLine(textFile);

            Console.WriteLine("==========");

            string[] numbers = Regex.Split(textFile, @"\D+");
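            // Drop the empty entries that Split() produces at the edges.
            // Where() keeps repeated prices; Except() would collapse them.
            numbers = numbers.Where(s => s.Length > 0).ToArray();
            int total_amount = Array.ConvertAll(numbers, s => int.Parse(s)).Sum();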

            Console.WriteLine("Total amount: "+ total_amount);
        }
    }
}

Here is the output of the above script:

Item: Price

Apple: 10
Orange: 20
Banana: 12
Peach: 13


==========
Total amount: 55
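
One detail worth checking in this approach is how the empty strings are filtered out: a set-based operator such as Except also collapses repeated values, so two items with the same price would be counted once. The following is a minimal sketch that tries the splitting step on an in-memory string instead of a file. The class name SplitSketch, the helper SumIntegers, and the sample text are hypothetical and only serve as illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitSketch
{
    // Hypothetical helper: sums every run of digits found in the text.
    public static int SumIntegers(string text)
    {
        // \D+ matches runs of non-digit characters, so splitting on it
        // leaves the digit runs, plus empty strings at the edges.
        string[] parts = Regex.Split(text, @"\D+");

        // Where() removes the empty entries but keeps repeated prices.
        return parts.Where(p => p.Length > 0)
                    .Select(p => int.Parse(p))
                    .Sum();
    }

    public static void Main()
    {
        // Two items share the price 10; both must be counted.
        string sample = "Apple: 10\nOrange: 20\nPear: 10\n";
        Console.WriteLine(SumIntegers(sample)); // prints 40
    }
}
```

Keeping the logic in a small method that takes a string makes it easy to check against sample receipts before pointing it at real files.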

Calculating Total Amount using Currency Symbols

Invoices often contain sale details with currency symbols. For instance, in the following receipt, the price of each fruit is given with a dollar sign “$”. The receipt also contains an invoice number.

[Image: Find Total Amount on Invoices]

For receipts like the one above, the total is the sum of only those numbers that carry a dollar sign. The invoice number should be ignored when calculating the total amount.

To do so, you can use the Regex.Matches() method, which returns every number in the text, with its dollar sign where one is present. From the returned matches, you can then discard the numbers without a dollar sign, i.e. the invoice number. The regular expression that matches all the numbers, including those prefixed with a dollar sign, is “\$?[0-9]+(\.[0-9]+)?”. The backslashes matter: an unescaped “$” is an anchor rather than a literal dollar sign, and an unescaped “.” matches any character.

Here is a script that adds all the numbers that contain a dollar sign.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace RegexCodes
{
   class Program
   {
      static void Main(string[] args)
      {
        string textFile = File.ReadAllText(@"E:\Datasets\invoice.txt", Encoding.UTF8);
        Console.WriteLine(textFile);

        Console.WriteLine("==============");

        var numbers = Regex.Matches(textFile, @"\$?[0-9]+(\.[0-9]+)?");

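        // decimal avoids binary floating-point rounding in money sums.
        List<decimal> num_list = new List<decimal>();
        foreach (Match result in numbers)
        {
          // Keep only the matches that carry a dollar sign;
          // this skips the invoice number.
          if (result.Value.StartsWith("$"))
          {
            // The invariant culture makes "." the decimal separator
            // regardless of the machine's regional settings.
            decimal num = decimal.Parse(result.Value.Replace("$", ""),
                System.Globalization.CultureInfo.InvariantCulture);

            num_list.Add(num);
          }
        }

        decimal total_amount = num_list.Sum();
        Console.WriteLine("Total amount: $" + total_amount);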
      }
   }
}

Output:

Item: Price

Apple: $10.75
Orange: $20.50
Banana: $12.50
Peach: $13.50

Invoice #: 4885

==============
Total amount: $57.25
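
The filtering step can also be folded into the pattern itself by making the dollar sign mandatory and capturing only the digits that follow it. The following is a minimal sketch of that variant, not the script above. The class name CurrencySketch, the helper SumDollarAmounts, and the choice of decimal with the invariant culture are assumptions made for illustration.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class CurrencySketch
{
    // Hypothetical helper: sums only the amounts prefixed with "$".
    public static decimal SumDollarAmounts(string text)
    {
        decimal total = 0m;

        // \$ requires a literal dollar sign; group 1 captures the number.
        foreach (Match m in Regex.Matches(text, @"\$([0-9]+(?:\.[0-9]+)?)"))
        {
            // Invariant culture: "." is always the decimal separator.
            total += decimal.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
        }

        return total;
    }

    public static void Main()
    {
        string sample =
            "Apple: $10.75\nOrange: $20.50\nBanana: $12.50\nPeach: $13.50\n\nInvoice #: 4885\n";

        // The invoice number has no dollar sign, so it is never matched.
        Console.WriteLine(SumDollarAmounts(sample).ToString(CultureInfo.InvariantCulture)); // prints 57.25
    }
}
```

Because numbers without a dollar sign never match, no separate StartsWith check is needed in this variant.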

 


© Regexsonline.com. All Rights Reserved.