
How to Find Date Using Regular Expression in C#

Automatic extraction of the desired text from documents has applications ranging from machine learning and data science to pattern recognition and natural language processing. Reading text documents, extracting the desired information, and then storing it manually is a laborious task. With advances in programming languages and computing hardware, such text extraction tasks can be automated.

In this article, you will see how to read a text document in C# and then find dates in it using C# regular expressions. You will see how to find dates that are not in any specific format, as well as dates written in a specific format.

Table of Contents

  1. Finding Dates without a Specific Format
  2. Finding Dates with a Specific Format

Finding Dates without a Specific Format

A text document can contain dates in a variety of formats. The simplest way to find a date in a text document is to look for the text after which a date is most likely to appear. For instance, in the following text invoice, the date appears after the label “Date : ”.

[Image: sample text invoice]

Using regex, you can find the text that occurs after the label “Date : ”. The pattern Date : ([\w-]+) does this: it matches the literal string “Date : ” followed by one or more word characters (letters, digits, or underscores) or dashes (-). The parentheses form a capture group, so the date itself can be read separately from the label. The dash is included in the pattern because, in the text invoice, the day, month, and year are separated by dashes, and the RegexOptions.IgnoreCase option lets the label match regardless of letter case. The pattern is passed to the Regex constructor, and the Match() method of the resulting object searches the text, as shown in the following script. At the beginning of the script, the File.ReadAllText() method reads the text document and returns its contents as a C# string.

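using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

namespace RegexCodes
{
    class Program
    {
        static void Main(string[] args)
        {
            // Read the whole invoice into a single string
            string textFile = File.ReadAllText(@"E:\Datasets\invoice.txt", Encoding.UTF8);

            Console.WriteLine("====================");

            // "Date : " followed by one or more word characters or dashes;
            // the parentheses capture the date part as group 1
            var myRegex = new Regex(@"Date : ([\w-]+)", RegexOptions.IgnoreCase);

            Match match = myRegex.Match(textFile);

            // Print only the captured date, or a message if the label was not found
            Console.WriteLine(match.Success ? match.Groups[1].Value : "No date found.");
        }
    }
}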

The script above prints the text that follows “Date : ”, as shown in the output below:

Output:

[Image: console output of the script]
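
To try the pattern without the invoice file, the sketch below runs it against a short in-memory string. The sample text, the LabeledDateFinder class, and the ExtractLabeledDate method are hypothetical names chosen for illustration. The sketch shows the difference between the whole match (match.Value, which still contains the label) and the captured group (match.Groups[1].Value, which holds only the date).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, used here only for illustration
public static class LabeledDateFinder
{
    // Same pattern as in the article: the label, then a captured run of
    // word characters or dashes
    private static readonly Regex DateLabel =
        new Regex(@"Date : ([\w-]+)", RegexOptions.IgnoreCase);

    // Returns the text captured after "Date : ", or null if the label is absent
    public static string ExtractLabeledDate(string text)
    {
        Match match = DateLabel.Match(text);
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Hypothetical invoice text
        string invoice = "Invoice No : 1042\nDate : 15-03-2021\nTotal : 250.00";

        Match match = DateLabel.Match(invoice);
        Console.WriteLine(match.Value);            // Date : 15-03-2021
        Console.WriteLine(match.Groups[1].Value);  // 15-03-2021
    }
}
```

Because \w also matches letters and underscores, this pattern captures whatever word follows the label, so a line such as “Date : pending” would return “pending”. A digit-based pattern, like the one in the next section, is stricter.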

Finding Dates with a Specific Format

You can also find dates that are written in a specific format. For instance, in the following invoice, the date is in the format dd-mm-yyyy.

[Image: sample invoice with a date in dd-mm-yyyy format]

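In this case, the regex needs to find the pattern XX-XX-XXXX, where each X is a digit: two digits for the day, two for the month, and four for the year, separated by dashes. The regex pattern for this format is \d{2}-\d{2}-\d{4}, written in C# as the verbatim string @"\d{2}-\d{2}-\d{4}". Note that the pattern checks only the shape of the text, not whether the numbers form a valid date. In the script below, the File.ReadAllText() method first reads all the text from the invoice.txt document. Next, a Regex object is created from the pattern, and its Match() method returns the first date in the specified format.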

using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

namespace RegexCodes
{
    class Program
    {
        static void Main(string[] args)
        {
            // Read the whole invoice into a single string
            string textFile = File.ReadAllText(@"E:\Datasets\invoice.txt", Encoding.UTF8);

            Console.WriteLine("====================");

            // Two digits, a dash, two digits, a dash, four digits (dd-mm-yyyy)
            var myRegex = new Regex(@"\d{2}-\d{2}-\d{4}");

            // Match() finds the first occurrence; Value is empty if nothing matched
            string result = myRegex.Match(textFile).Value;

            Console.WriteLine(result);
        }
    }
}

In the output below, you can see that the date in the format dd-mm-yyyy is printed. To find dates in other formats, change the pattern accordingly; for example, \d{4}-\d{2}-\d{2} matches dates written as yyyy-mm-dd.

Output:

[Image: console output showing the date in dd-mm-yyyy format]
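
The script above stops at the first match, and \d{2}-\d{2}-\d{4} only checks the shape of the text, so a string such as 99-99-9999 would also match. The sketch below, which assumes the dd-mm-yyyy layout from the invoice, uses Regex.Matches() to collect every candidate and DateTime.TryParseExact() to keep only real calendar dates. The DateScanner class, the FindDates method, and the sample text are hypothetical. The \b word boundaries are an addition to the article's pattern; they keep the regex from matching a fragment of a longer digit run, such as 23-45-6789 inside 123-45-67890.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper class, used here only for illustration
public static class DateScanner
{
    // dd-mm-yyyy shape; \b keeps the match from starting or ending
    // inside a longer run of digits
    private static readonly Regex DatePattern = new Regex(@"\b\d{2}-\d{2}-\d{4}\b");

    // Returns every substring that has the dd-mm-yyyy shape AND is a real calendar date
    public static List<DateTime> FindDates(string text)
    {
        var dates = new List<DateTime>();
        foreach (Match match in DatePattern.Matches(text))
        {
            // In .NET format strings, MM is the month (mm would be minutes)
            if (DateTime.TryParseExact(match.Value, "dd-MM-yyyy",
                    CultureInfo.InvariantCulture, DateTimeStyles.None, out DateTime date))
            {
                dates.Add(date);
            }
        }
        return dates;
    }

    public static void Main()
    {
        // Hypothetical text: a valid date, an impossible date, and another valid date
        string invoice = "Issued 15-03-2021, ref 99-99-9999, due 14-04-2021";

        foreach (DateTime date in FindDates(invoice))
        {
            Console.WriteLine(date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
        }
    }
}
```

To look for a different layout, change the regex and the format string together; for example, \b\d{4}-\d{2}-\d{2}\b with "yyyy-MM-dd" for dates written as yyyy-mm-dd.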


© , Regexsonline.com — All Rights Reserved - Terms of Use - Privacy Policy