How to Find Date Using Regular Expression in C#
Automatic extraction of text from documents has applications ranging from machine learning and data science to pattern recognition and natural language processing. Reading text documents, extracting the desired information, and then storing it manually is laborious. With advances in programming languages and computing hardware, text extraction tasks can be automated.
In this article, you will see how you can read a text document in C# and then find dates in the document using C# regular expressions. You will see how you can read dates that are not in any specific format, as well as dates in a specific format.
Finding Dates without a Specific Format
A text document can contain dates in a variety of formats. The simplest way to find a date in a text document is to look for a label after which a date is likely to appear. For instance, in the sample text invoice used in this article, the date follows the text “Date : ”.
Using regex, you can find the text that occurs after the label “Date : ”. A pattern that does so is @"Date : ([\w-]+)". It matches the literal string “Date : ” followed by one or more word characters (letters, digits, or underscores) or dashes (-). The dash is included because the day, month, and year in the text invoice are separated by dashes, and the parentheses form a capture group around the date itself. The pattern is passed to the Regex constructor, and the Match() method of the resulting Regex object searches the text, as shown in the following script. At the beginning of the script, the File.ReadAllText() method reads the text document and returns its contents as a C# string.
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

namespace RegexCodes
{
    class Program
    {
        static void Main(string[] args)
        {
            // Read the whole invoice into a single string.
            string textFile = File.ReadAllText(@"E:\Datasets\invoice.txt", Encoding.UTF8);
            Console.WriteLine("====================");

            // Match the label "Date : " followed by word characters or dashes.
            var myRegex = new Regex(@"Date : ([\w-]+)", RegexOptions.IgnoreCase);

            // ToString() returns the full matched text (empty if nothing matched).
            string result = myRegex.Match(textFile).ToString();
            Console.WriteLine(result);
        }
    }
}
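The above script prints the first match in full, that is, the label “Date : ” together with the date that follows it. To get only the date, read the first capture group with myRegex.Match(textFile).Groups[1].Value.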
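Because Match().ToString() returns the entire match, the printed string still contains the “Date : ” label. The following is a minimal sketch of reading only the date through the capture group. The names LabeledDateFinder, ExtractLabeledDate, and LabeledDateDemo, as well as the sample invoice text, are assumptions made for illustration and are not part of the original script.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LabeledDateFinder
{
    // Same pattern as in the article: the parentheses form capture group 1.
    private static readonly Regex DateAfterLabel =
        new Regex(@"Date : ([\w-]+)", RegexOptions.IgnoreCase);

    // Returns only the text captured after "Date : ",
    // or an empty string when the label is not found.
    public static string ExtractLabeledDate(string text)
    {
        Match match = DateAfterLabel.Match(text);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }
}

public static class LabeledDateDemo
{
    public static void Main()
    {
        // Hypothetical invoice text, used only for illustration.
        string sample = "Invoice No : 1001\nDate : 12-05-2021\nTotal : 250";
        Console.WriteLine(LabeledDateFinder.ExtractLabeledDate(sample)); // prints 12-05-2021
    }
}
```

Checking match.Success before reading the group makes the "no date found" case explicit instead of silently printing an empty line.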
Finding Dates with a Specific Format
You can also find dates that are in a specific format. For instance, the date in the sample invoice used in this article is in the format dd-mm-yyyy.
In this case, the regex needs to find the pattern XX-XX-XXXX, where each X is a digit. A pattern that finds a date in this format is @"\d{2}-\d{2}-\d{4}": two digits, a dash, two digits, a dash, and four digits. Note that the pattern checks only the shape of the text, not whether the digits form a valid calendar date. Look at the script below. Here the File.ReadAllText() method first reads all the text from the invoice.txt document. Next, the Match() method of the Regex object built from the pattern returns the first piece of text in the specified format.
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

namespace RegexCodes
{
    class Program
    {
        static void Main(string[] args)
        {
            // Read the whole invoice into a single string.
            string textFile = File.ReadAllText(@"E:\Datasets\invoice.txt", Encoding.UTF8);
            Console.WriteLine("====================");

            // Two digits, dash, two digits, dash, four digits (dd-mm-yyyy).
            var myRegex = new Regex(@"\d{2}-\d{2}-\d{4}", RegexOptions.IgnoreCase);

            // Match() returns only the first occurrence in the text.
            string result = myRegex.Match(textFile).ToString();
            Console.WriteLine(result);
        }
    }
}
The script prints the first date in the format dd-mm-yyyy that appears in the document. You can try other formats yourself and see if you can get the desired results.
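The pattern \d{2}-\d{2}-\d{4} accepts any digits in that shape, so a string such as 31-06-2021 would match even though it is not a real date, and Match() stops at the first occurrence. One possible refinement, sketched below, collects every candidate with Matches() and keeps only those that DateTime.TryParseExact accepts as dd-MM-yyyy. The names FormattedDateFinder, FindDates, and FormattedDateDemo, the added \b word boundaries, and the sample text are assumptions for illustration, not part of the original script.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class FormattedDateFinder
{
    // \b keeps the pattern from matching inside a longer run of digits.
    private static readonly Regex DatePattern =
        new Regex(@"\b\d{2}-\d{2}-\d{4}\b");

    // Returns every dd-MM-yyyy candidate that is also a valid calendar date.
    public static List<DateTime> FindDates(string text)
    {
        var dates = new List<DateTime>();
        foreach (Match match in DatePattern.Matches(text))
        {
            // TryParseExact rejects impossible dates such as 31-06-2021.
            if (DateTime.TryParseExact(match.Value, "dd-MM-yyyy",
                    CultureInfo.InvariantCulture, DateTimeStyles.None,
                    out DateTime parsed))
            {
                dates.Add(parsed);
            }
        }
        return dates;
    }
}

public static class FormattedDateDemo
{
    public static void Main()
    {
        // Hypothetical text, used only for illustration.
        string sample = "Issued 12-05-2021, due 31-06-2021, paid 01-07-2021";
        foreach (DateTime date in FormattedDateFinder.FindDates(sample))
        {
            Console.WriteLine(date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
        }
        // prints 2021-05-12 and 2021-07-01; 31-06-2021 is skipped
    }
}
```

To try another layout, such as yyyy-MM-dd, both the regex and the format string passed to TryParseExact would need to change together.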