How to Find Total Amount Using a Regular Expression in C#
Text documents such as Word, PDF, or plain-text files may contain sales or purchase receipts and invoices. Various tools can read the text from such documents. Once the text has been read, the next step is to extract important information from it, such as the total amount of all the items, the invoice number, and the currency symbols. Regular expressions can be used to extract such information.
In this article, you will see how to calculate the total amount from text using regular expressions in C#.
Calculating Total Amount using Numbers Only
Suppose you have an invoice in the form of a text file that lists the prices of various fruits, and you want to calculate the total price of all the fruits.
The first step is to read the text file. You can then use a regular expression to extract all the numbers from the text. Finally, the numbers are converted into integers and summed. To extract the numbers, you can use the Regex.Split() method with the pattern “\D+”, which matches runs of non-digit characters: splitting the text on those runs leaves a string array of the digit runs, plus empty strings at the start or end when the text begins or ends with a non-digit. The following script reads a text file containing a fictional receipt from a local drive, removes the empty strings, and displays the sum of all the numbers in the file.
    using System;
    using System.IO;
    using System.Linq;
    using System.Text;
    using System.Text.RegularExpressions;

    namespace RegexCodes
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Read the whole invoice into a single string.
                string textFile = File.ReadAllText(@"E:\Datasets\invoice.txt", Encoding.UTF8);
                Console.WriteLine(textFile);
                Console.WriteLine("==========");

                // Split on runs of non-digit characters; what remains are the digit runs.
                string[] numbers = Regex.Split(textFile, @"\D+");

                // Drop the empty strings Split leaves at the start and end.
                // Where() keeps repeated prices, so two items costing 10 both count.
                numbers = numbers.Where(s => s.Length > 0).ToArray();

                // Convert each digit run to an integer and add them up.
                int totalAmount = numbers.Sum(s => int.Parse(s));
                Console.WriteLine("Total amount: " + totalAmount);
            }
        }
    }
Here is the output of the above script:
    Item: Price
    Apple: 10
    Orange: 20
    Banana: 12
    Peach: 13
    ==========
    Total amount: 55
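Splitting on “\D+” is one way to get at the numbers, but it produces empty strings that have to be filtered out afterwards. As a minimal sketch of an alternative, assuming the prices are whole numbers, Regex.Matches() with the pattern “[0-9]+” returns the digit runs directly. The class name InvoiceTotals and the method name SumIntegers are hypothetical, chosen only for this illustration.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original script.
public static class InvoiceTotals
{
    // Sums every run of ASCII digits found in the text.
    // Assumes whole-number prices: "10.75" would be read as 10 and 75.
    public static int SumIntegers(string text)
    {
        return Regex.Matches(text, @"[0-9]+")
                    .Cast<Match>()                  // MatchCollection -> IEnumerable<Match>
                    .Sum(m => int.Parse(m.Value));  // repeated prices are all counted
    }
}
```

Calling InvoiceTotals.SumIntegers() on the receipt text shown above returns 55. Like the Split-based script, this sums every number in the file, so it only suits receipts that contain no numbers other than prices.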
Calculating Total Amount using Currency Symbols
Invoices often contain sale details with currency symbols. For instance, in the receipt used in this section, each fruit price is preceded by a dollar sign “$”. The receipt also contains an invoice number.
For receipts like this one, the total is the sum of only those numbers that carry a dollar sign. The invoice number should be ignored when calculating the total amount.
To do so, you can use the Regex.Matches() method, which returns all the numbers from the text, including the dollar sign where one is present. From the returned matches, you can then discard the numbers without a dollar sign, i.e. the invoice number. The regular expression that matches all the numbers, with or without a dollar sign, is “\$?[0-9]+(\.[0-9]+)?”.
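The backslashes in this pattern matter: an unescaped “$” is an end-of-input anchor rather than a literal dollar sign, and an unescaped “.” matches any character. The sketch below breaks the pattern down and returns the raw matches so you can inspect them before filtering. The class name AmountFinder and the method name FindNumbers are hypothetical names used only for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original script.
public static class AmountFinder
{
    // \$?          an optional literal dollar sign
    // [0-9]+       one or more digits
    // (\.[0-9]+)?  an optional decimal part: a literal dot followed by digits
    private const string Pattern = @"\$?[0-9]+(\.[0-9]+)?";

    // Returns every number in the text, keeping the dollar sign when present.
    public static List<string> FindNumbers(string text)
    {
        return Regex.Matches(text, Pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }
}
```

For the input "Apple: $10.75 Invoice #: 4885", FindNumbers() returns two strings, "$10.75" and "4885". Only the first starts with a dollar sign, which is the property used to separate prices from the invoice number.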
Here is a script that adds all the numbers that contain a dollar sign.
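    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.IO;
    using System.Linq;
    using System.Text;
    using System.Text.RegularExpressions;

    namespace RegexCodes
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Read the whole invoice into a single string.
                string textFile = File.ReadAllText(@"E:\Datasets\invoice.txt", Encoding.UTF8);
                Console.WriteLine(textFile);
                Console.WriteLine("==============");

                // Match every number, with or without a leading dollar sign.
                var numbers = Regex.Matches(textFile, @"\$?[0-9]+(\.[0-9]+)?");
                List<float> numList = new List<float>();

                foreach (Match result in numbers)
                {
                    // Keep only the amounts that carry a dollar sign; this skips the invoice number.
                    if (result.Value.StartsWith("$"))
                    {
                        // Parse with the invariant culture so "." is always the decimal separator.
                        float num = float.Parse(result.Value.Replace("$", ""), CultureInfo.InvariantCulture);
                        numList.Add(num);
                    }
                }

                float totalAmount = numList.Sum();
                Console.WriteLine("Total amount: $" + totalAmount.ToString(CultureInfo.InvariantCulture));
            }
        }
    }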
Output:
    Item: Price
    Apple: $10.75
    Orange: $20.50
    Banana: $12.50
    Peach: $13.50
    Invoice #: 4885
    ==============
    Total amount: $57.25
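The script above matches every number and then filters on the dollar sign. A possible variation, sketched here under the assumption that every price is written as “$” followed immediately by digits, is to make the dollar sign mandatory in the pattern and capture only the numeric part in a group. The sketch also sums with decimal instead of float, since float cannot represent most decimal fractions exactly and rounding errors can accumulate on longer receipts. The class name DollarTotals and the method name SumDollarAmounts are hypothetical.

```csharp
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original script.
public static class DollarTotals
{
    // \$ requires a literal dollar sign; group 1 captures the number after it.
    // (?: ... ) groups the optional decimal part without capturing it.
    private const string Pattern = @"\$([0-9]+(?:\.[0-9]+)?)";

    // Sums only the amounts that are preceded by a dollar sign.
    public static decimal SumDollarAmounts(string text)
    {
        return Regex.Matches(text, Pattern)
                    .Cast<Match>()
                    // The invariant culture always treats "." as the decimal separator.
                    .Sum(m => decimal.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture));
    }
}
```

For the receipt shown above, SumDollarAmounts() returns 57.25, and the invoice number 4885 is never matched because it has no dollar sign in front of it.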