How to Find a Due Date using Regular Expressions in C#
Extracting due dates from documents can be an important task. For instance, a company might want to find the invoices that have not been paid by their due date. Finding due dates manually in thousands of documents is cumbersome, and automatic extraction saves time and human resources. In this article, you will see how to extract the due date from a text document using regular expressions in C#.
As an example, you will see how to find the due date in a plain-text invoice using regular expressions in C#. For the sake of experimentation, the invoice is saved as a text file named “invoice.txt”.
How to Find a Due Date without a Specific Format
In this section, you will see how to find a due date that is not in any specific format. The trick is simply to extract whatever text follows the label “Due Date:”. The regular expression that does this is @"(Due Date: .*)". Because the dot (.) matches any character except a newline, .* captures everything up to the end of the line that contains the label.
Look at the following script. Here, you first read all the text from the “invoice.txt” file using the File.ReadAllText() method.
Next, you create an object of the Regex class, passing your regular expression (and RegexOptions.IgnoreCase) to its constructor. To find the due date, you call the Match() method on that object and pass it the file text. Match() returns a Match object for the first occurrence of the pattern; its Value property, which is also what ToString() returns, holds the matched string that contains the due date. If nothing matches, Value is an empty string.
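```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

namespace RegexCodes
{
    class Program
    {
        static void Main(string[] args)
        {
            // Read the whole invoice into a single string.
            string textFile = File.ReadAllText(@"D:\Datasets\invoice.txt", Encoding.UTF8);
            Console.WriteLine("====================");

            // "Due Date: " followed by the rest of the line ('.' does not match '\n').
            var myRegex = new Regex(@"(Due Date: .*)", RegexOptions.IgnoreCase);
            Match match = myRegex.Match(textFile);

            // Print the matched text, or a message if the label was not found.
            Console.WriteLine(match.Success ? match.Value : "No due date found.");

            // Keep the console window open until Enter is pressed.
            Console.ReadLine();
        }
    }
}
```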
When you run the script, it prints the line that starts with “Due Date:”, which shows that the due date has been extracted successfully.
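The script above prints the whole line, label included. If you only want the date value itself, a named capture group can isolate it. The following is a minimal sketch, not the article's original code: the class and method names (DueDateExtractor, TryExtractDueDate) are hypothetical, and the invoice text is a made-up in-memory string rather than the contents of invoice.txt. It uses [ \t]* instead of \s* after the label so that the match cannot run onto the next line, and it trims the trailing \r that a Windows line ending leaves in the captured text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DueDateExtractor
{
    // The named group "date" captures the rest of the line after the label.
    // [ \t]* skips spaces/tabs only, so an empty label cannot capture the next line.
    private static readonly Regex DueDatePattern =
        new Regex(@"Due Date:[ \t]*(?<date>.*)", RegexOptions.IgnoreCase);

    // Returns true and the trimmed value when a non-empty due date follows the label.
    public static bool TryExtractDueDate(string text, out string dueDate)
    {
        Match match = DueDatePattern.Match(text);
        // Trim() removes the '\r' left over from Windows (CRLF) line endings.
        dueDate = match.Success ? match.Groups["date"].Value.Trim() : string.Empty;
        return dueDate.Length > 0;
    }

    public static void Main()
    {
        // Hypothetical invoice text used only for illustration.
        string invoice = "Invoice No: 1042\r\nDue Date: 15-08-2023\r\nTotal: 250.00";

        if (TryExtractDueDate(invoice, out string dueDate))
        {
            Console.WriteLine(dueDate); // prints 15-08-2023
        }
        else
        {
            Console.WriteLine("No due date found.");
        }
    }
}
```

Because Match() returns only the first occurrence, a document containing several due dates would need Regex.Matches() instead.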
The problem with the above script is that it matches whatever follows “Due Date:”, even if that text is not a date at all. To make sure that you extract only text in a specific date format, use a regular expression that spells out the format, as shown in the next section.
How to Find a Due Date with a Specific Format
In this section, you will extract the string that follows the label “Due Date:” only when it has the format dd-dd-dddd: one or two digits, a dash, two digits, another dash, and then four digits. The leading [0-9]? makes the first digit optional, so a date such as 5-08-2023 is also accepted. The regular expression for this is: (Due Date: [0-9]?[0-9]-[0-9]{2}-[0-9]{4}).
Execute the following script to see this in action:
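```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

namespace RegexCodes
{
    class Program
    {
        static void Main(string[] args)
        {
            // Read the whole invoice into a single string.
            string textFile = File.ReadAllText(@"D:\Datasets\invoice.txt", Encoding.UTF8);
            Console.WriteLine("====================");

            // One or two digits, a dash, two digits, a dash, then four digits.
            var myRegex = new Regex(@"(Due Date: [0-9]?[0-9]-[0-9]{2}-[0-9]{4})", RegexOptions.IgnoreCase);
            Match match = myRegex.Match(textFile);

            // Print the matched text, or a message if no date in this format was found.
            Console.WriteLine(match.Success ? match.Value : "No due date found.");

            // Keep the console window open until Enter is pressed.
            Console.ReadLine();
        }
    }
}
```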
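To see exactly what this pattern accepts, you can run it against a few test lines. The sketch below reuses the pattern from the script; the class name DueDateFormatDemo and the sample lines are hypothetical, made up for illustration rather than taken from the invoice.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DueDateFormatDemo
{
    // Same pattern as the script: one or two digits, a dash, exactly two digits,
    // a dash, then exactly four digits.
    public static readonly Regex Pattern =
        new Regex(@"(Due Date: [0-9]?[0-9]-[0-9]{2}-[0-9]{4})", RegexOptions.IgnoreCase);

    public static void Main()
    {
        // Hypothetical sample lines chosen to show what is accepted and rejected.
        string[] samples =
        {
            "Due Date: 15-08-2023",   // accepted: two-digit first number
            "Due Date: 5-08-2023",    // accepted: the first digit is optional
            "Due Date: 15-8-2023",    // rejected: the middle number needs two digits
            "Due Date: Upon receipt", // rejected: not a date at all
        };

        foreach (string sample in samples)
        {
            Match match = Pattern.Match(sample);
            Console.WriteLine($"{sample,-25} -> {(match.Success ? "match" : "no match")}");
        }
    }
}
```

Keep in mind that the pattern checks only the shape of the date, not whether it is a real calendar date: 99-99-9999 would also match. If you need that guarantee, you could parse the captured text afterwards, for example with DateTime.TryParseExact.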
You can also match several date formats with a single regular expression. For example, the regular expression (Due Date: [0-9]?[0-9](-|/)[0-9]{2}(-|/)[0-9]{4}) matches dates in both the dd-dd-dddd and dd/dd/dddd formats. Here, the dash (-) and the forward slash (/) are placed inside parentheses and separated by the alternation operator (|), so either character is accepted at each separator position. You can add other delimiters, such as a period or a backslash, in the same way; note that both are special characters in regular expressions and must be escaped as \. and \\.
Update your text invoice so that the due date uses the dd/dd/dddd format.
Run the following script:
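```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

namespace RegexCodes
{
    class Program
    {
        static void Main(string[] args)
        {
            // Read the whole invoice into a single string.
            string textFile = File.ReadAllText(@"D:\Datasets\invoice.txt", Encoding.UTF8);
            Console.WriteLine("====================");

            // (-|/) accepts either a dash or a forward slash at each separator position.
            var myRegex = new Regex(@"(Due Date: [0-9]?[0-9](-|/)[0-9]{2}(-|/)[0-9]{4})", RegexOptions.IgnoreCase);
            Match match = myRegex.Match(textFile);

            // Print the matched text, or a message if no date in either format was found.
            Console.WriteLine(match.Success ? match.Value : "No due date found.");

            // Keep the console window open until Enter is pressed.
            Console.ReadLine();
        }
    }
}
```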
You will see that the due date in dd/dd/dddd format has been extracted.
If you change the date back to the dd-dd-dddd format, the same script extracts it as well.
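One side effect of (-|/) is that the two separators are matched independently, so a mixed date such as 15-08/2023 is also accepted. If you want the second separator to repeat the first, a backreference can enforce that. The sketch below is a variation on the article's pattern, not its original script: it uses the character class [-/.\\] for the allowed separators (inside a character class the period is already literal, while the backslash still has to be escaped as \\), captures the first separator in a named group sep, and requires the same character again with \k<sep>. The class name DueDateSeparators and the method FindDueDate are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DueDateSeparators
{
    // [-/.\\] accepts a dash, slash, period or backslash as the first separator.
    // \k<sep> is a named backreference: the second separator must be the same character.
    public static readonly Regex Pattern = new Regex(
        @"Due Date: (?<date>[0-9]?[0-9](?<sep>[-/.\\])[0-9]{2}\k<sep>[0-9]{4})",
        RegexOptions.IgnoreCase);

    // Returns the date part of the first match, or an empty string if there is none.
    public static string FindDueDate(string text)
    {
        Match match = Pattern.Match(text);
        return match.Success ? match.Groups["date"].Value : string.Empty;
    }

    public static void Main()
    {
        // Hypothetical inputs for illustration only.
        Console.WriteLine(FindDueDate("Due Date: 15/08/2023")); // prints 15/08/2023
        Console.WriteLine(FindDueDate("Due Date: 15.08.2023")); // prints 15.08.2023
        Console.WriteLine(FindDueDate("Due Date: 15-08/2023")); // prints an empty line: mixed separators
    }
}
```

Whether mixed separators should be rejected depends on how consistently your invoices are written; the backreference simply makes that choice explicit.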