How to Find Invoice Numbers Using Regular Expressions in C#

Sales and purchase invoices contain important information about the products being sold or purchased. For instance, an invoice lists the price of each item, the total price of the order, VAT numbers, website URLs, etc. It is difficult to keep track of all orders manually. Digitizing invoices by storing them in a database or a flat file makes it possible to automate finding and processing them.

If you have text invoices and need to store them in a database, you first need to read the information from each invoice. This article explains how to read the invoice number from a text invoice.

Finding Invoice Numbers Containing Digits Only

In this section, you will be reading the invoice number from the following sample invoice.

[Image: sample invoice containing the line “Invoice #: 588944694”]

The invoice number in the above invoice is 588944694, which consists of digits only. The regex pattern needed to read an invoice number depends heavily on the invoice text. For instance, in the above invoice, the invoice number is presented as “Invoice #: 588944694”. To read it, you can use a pattern that matches the string “Invoice #: ” followed by digits: Invoice #: \s*\d*. This pattern tells the regex engine to match the string “Invoice #: ”, followed by any amount of whitespace (\s*) and then any number of digits (\d*). The pattern is passed to the constructor of the Regex class, and the Match() method of the resulting object returns the first match, as shown in the following script:

using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

namespace RegexCodes
{
    class Program
    {
        static void Main(string[] args)
        {
            // Read the whole invoice into a single string.
            string textFile = File.ReadAllText(@"E:\Datasets\invoice.txt", Encoding.UTF8);

            Console.WriteLine("====================");

            // "Invoice #: " followed by optional whitespace and any number of digits.
            var myRegex = new Regex(@"(Invoice #: \s*\d*)", RegexOptions.IgnoreCase);

            // Match() returns the first match; ToString() yields the matched text.
            string result = myRegex.Match(textFile).ToString();

            Console.WriteLine(result);
        }
    }
}

The output of the script above is as follows.

[Image: console output showing the matched invoice number]

You can see that the invoice number is successfully read.
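The same idea can be tried without a file on disk. The following is a minimal sketch, not the article's script: the class name InvoiceNumberSketch, the helper TryGetNumericInvoiceNumber, and the sample text are assumptions made for illustration. It uses a capture group so that only the digits are returned, and \d+ instead of \d* so that a label with no digits after it counts as no match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InvoiceNumberSketch
{
    // Hypothetical helper (not from the article): tries to read the digits
    // that follow the label "Invoice #:".
    public static bool TryGetNumericInvoiceNumber(string text, out string number)
    {
        // \s* allows any amount of whitespace after the colon;
        // (\d+) captures one or more digits as group 1.
        Match match = Regex.Match(text, @"Invoice #:\s*(\d+)", RegexOptions.IgnoreCase);

        number = match.Success ? match.Groups[1].Value : string.Empty;
        return match.Success;
    }

    public static void Main()
    {
        // Assumed sample text standing in for the contents of invoice.txt.
        string sample = "ACME Ltd.\nInvoice #: 588944694\nTotal: 120.00";

        if (TryGetNumericInvoiceNumber(sample, out string number))
        {
            Console.WriteLine(number);
        }
        else
        {
            Console.WriteLine("No invoice number found.");
        }
    }
}
```

Checking Match.Success before using the result avoids silently printing an empty string when the invoice text does not contain the expected label.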

Finding Invoice Numbers Containing Digits and Letters

Invoice numbers often contain letters in addition to digits, as shown in the following example. Here the invoice number is XY588944694.

[Image: sample invoice containing the invoice number XY588944694]

To read such invoice numbers, you will need to update your regex pattern to Invoice #: \s*\w*. Here the “\w*” part tells the regex engine to match letters as well as digits (and the underscore) in the final output. Look at the following script for an example:

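using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

namespace RegexCodes
{
    class Program
    {
        static void Main(string[] args)
        {
            // Read the whole invoice into a single string.
            string textFile = File.ReadAllText(@"E:\Datasets\invoice.txt", Encoding.UTF8);

            Console.WriteLine("====================");

            // "Invoice #: " followed by optional whitespace and any word characters.
            var myRegex = new Regex(@"(Invoice #: \s*\w*)", RegexOptions.IgnoreCase);

            Match match = myRegex.Match(textFile);
            if (!match.Success)
            {
                // Without this check, indexing the split result below would throw.
                Console.WriteLine("No invoice number found.");
                return;
            }

            string result = match.ToString();

            // Keep only the part after the colon. The char overload of Split
            // is available on both .NET Framework and .NET Core.
            string invoice_num = result.Split(':')[1];
            Console.WriteLine(invoice_num.Trim());
        }
    }
}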

In the script above, once the text “Invoice #: XY588944694” has been matched, the string is split on the colon and only the second part, i.e., the invoice number, is printed to the console. Here is the output.

[Image: console output showing the extracted invoice number]
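Splitting on the colon works, but a named capture group can return the invoice number directly. The following is a minimal sketch under stated assumptions: the class AlphanumericInvoiceSketch, the helper TryGetInvoiceNumber, the group name "number", and the sample strings are all hypothetical and chosen for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlphanumericInvoiceSketch
{
    // (?<number>\w+) is a named group; \w matches letters, digits, and the underscore.
    private static readonly Regex InvoicePattern =
        new Regex(@"Invoice #:\s*(?<number>\w+)", RegexOptions.IgnoreCase);

    // Hypothetical helper (not from the article): returns true and the captured
    // invoice number when the label is present, false otherwise.
    public static bool TryGetInvoiceNumber(string text, out string number)
    {
        Match match = InvoicePattern.Match(text);
        number = match.Success ? match.Groups["number"].Value : string.Empty;
        return match.Success;
    }

    public static void Main()
    {
        // Assumed sample lines; the last one has no invoice label at all.
        string[] samples =
        {
            "Invoice #: XY588944694",
            "invoice #:588944694",
            "Order ref: 42"
        };

        foreach (string sample in samples)
        {
            string message = TryGetInvoiceNumber(sample, out string number)
                ? number
                : "No invoice number found.";
            Console.WriteLine(message);
        }
    }
}
```

Because the group captures only the characters after the label, no Split() or Trim() call is needed. If underscores should not be accepted in an invoice number, a character class such as [A-Za-z0-9]+ could be used in place of \w+.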


© Regexsonline.com. All Rights Reserved.