Module 2: Regex Engine Basics (Part 2)
Knowing how the regex engine works enables you to craft better regular expressions more easily. It helps you quickly understand why a particular regex does not do what you initially expected, which saves a lot of guesswork and head-scratching when you need to write more complex regexes.
Let's start with our previous demo to see how the regex engine works. The engine always tries to match as early in the input as possible. It processes the regex one token at a time in left-to-right order, and it reads the input string one character at a time, also left to right. Once a character is matched, it is said to be consumed from the input, and the engine moves on to the next input character and tries to match it against the next token in the regex.
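To make the left-to-right walk concrete, here is a minimal sketch in Python of how an engine might search for a purely literal pattern. The function name `find_literal` and the sample text are made up for illustration, and real engines are far more elaborate than this.

```python
import re


def find_literal(pattern: str, text: str) -> int:
    """Return the index of the leftmost match of a literal pattern, or -1."""
    # Try every starting position in the input, from left to right.
    for start in range(len(text) - len(pattern) + 1):
        pos = start
        # Walk through the pattern one token (here: one character) at a time.
        for token in pattern:
            if text[pos] != token:
                break          # mismatch: give up on this starting position
            pos += 1           # the input character is consumed
        else:
            return start       # every token matched
    return -1


text = "cat scatter cat"
by_hand = find_literal("scat", text)
by_engine = re.search(r"scat", text).start()
```

Both `by_hand` and `by_engine` point at the same position, the first place in the input where the whole pattern fits.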
Now, to make the dot in the title optional, I have added a question mark quantifier. As soon as the engine encounters a quantifier, it surges forward, matching as much as possible. If a later part of the regex then fails to match the input, the engine steps back one character at a time until it finds a position from which it can match again. It continues to do so until it has either found a complete match or exhausted all possible options without finding one.
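Here is a small Python sketch of both ideas. The pattern `Mr\.?` and the sample names are assumptions, since the exact regex from the demo is not shown in this text. The second pattern is only there to make the stepping back visible.

```python
import re

# Assumed pattern: the literal title "Mr" followed by an optional dot.
title = re.compile(r"Mr\.?")

with_dot = title.search("Mr. Smith").group()    # "?" takes the dot when it is there
without_dot = title.search("Mr Smith").group()  # and matches nothing when it is not

# ".*" first consumes the whole input, then gives characters back
# one at a time until the final "d" in the pattern can match.
greedy = re.search(r".*d", "abcdxyz").group()
```

In the last search the engine has to give back "xyz" and then "d" before the literal `d` in the pattern can match, so the result ends at the "d".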
This stepping back is called backtracking. Either way, the engine always knows its current position within the regex. If the regex specifies an alternative and one search path fails, the engine backtracks to try the next alternative. In regex, this alternation construct is, in simple terms, an either-or operation.
Here we have specified alternatives for the title. If the Mr. title is not found in the name, the engine backtracks to try the next alternative, which is why the engine also stores the backtracking position. There is no match for the Miss title either, so the engine moves ahead and tries the next alternative. Before we go any further, let's look at the types of regex engine. Please note that there is no standard definition of what a regular expression engine is.
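The alternation walk-through can be tried out in Python. The pattern below is an assumption (the demo's actual regex and input names are not shown here), with three title alternatives followed by an optional dot, a space, and a surname.

```python
import re

# Assumed pattern: one of three titles, an optional dot, a space, then a name.
titles = re.compile(r"(Mr|Miss|Mrs)\.? (\w+)")

results = []
for name in ["Mr. Smith", "Miss Jones", "Mrs Brown"]:
    m = titles.search(name)
    # group(1) is the alternative that finally succeeded.
    results.append((m.group(1), m.group(2)))
```

For "Mrs Brown" the engine first succeeds with the alternative `Mr`, then fails because the next input character is "s" rather than a space. It goes back to the stored position, tries `Miss`, fails again, and finally succeeds with `Mrs`.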
There are two types of engine: the text-directed engine and the regex-directed engine. A text-directed engine attempts all paths of the regex before moving to the next character of the input, so it does not backtrack. In a regex-directed engine, paths are attempted in left-to-right order, as we have already seen, and if the engine fails to match, it backtracks to attempt an alternative path.
Most modern engines, such as PCRE, are regex-directed because this is the only way to implement some useful features, such as lazy quantifiers and atomic grouping. There is one very big difference between the two types: a regex-directed engine stops at the first match it encounters, while a POSIX-based or text-directed engine tries to find the longest match.
This does not mean that a text-directed engine always returns the longest possible match in the whole input, just that it makes the first match as long as possible, even if a shorter part of the string already gave a match.
Let's run a demo to understand this concept. This is another good online regex testing tool that supports different engine flavors. I am on the PCRE tab now. I have entered the input string here, and I want to match the word byte or BYTESCOUT, so we need to write a regex for that. As you can see, the word byte gets matched and the engine stops.
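You can reproduce this behavior with Python's `re` module, which is also a regex-directed, backtracking engine. The input string "bytescout" and the exact pattern are assumptions here, since they are only visible on screen in the demo.

```python
import re

text = "bytescout"

# The first alternative that leads to a match wins, even though the
# second alternative would have matched more of the input.
short_first = re.search(r"byte|bytescout", text).group()

# Swapping the order of the alternatives changes the result.
long_first = re.search(r"bytescout|byte", text).group()
```

With `byte` listed first the match stops after four characters. With `bytescout` listed first the whole word is matched, which shows that in a regex-directed engine the order of alternatives matters.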
As I said earlier, a regex-directed engine stops at the first match it encounters. Now let's see what happens in the POSIX tab. When I click on this button, this time we get a different result.
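Python has no POSIX mode, so the following is only a rough emulation of the leftmost-longest rule, not a real text-directed engine. The helper `leftmost_longest` is hypothetical. It takes the alternatives as a list, tries all of them at each starting position, and keeps the longest one at the first position where anything matches. It only behaves like POSIX for simple alternatives such as the literal words used here.

```python
import re
from typing import List, Optional


def leftmost_longest(alternatives: List[str], text: str) -> Optional[str]:
    """Emulate leftmost-longest matching for a top-level alternation."""
    compiled = [re.compile(alt) for alt in alternatives]
    # Scan starting positions from left to right.
    for start in range(len(text) + 1):
        candidates = []
        for pattern in compiled:
            m = pattern.match(text, start)   # anchored at this position
            if m:
                candidates.append(m.group())
        if candidates:
            # Leftmost position found: return the longest alternative here.
            return max(candidates, key=len)
    return None


whole_word = leftmost_longest(["byte", "bytescout"], "bytescout")
first_only = leftmost_longest(["byte", "bytescout"], "a byte of bytescout")
```

In the second call the result is the shorter word, because only `byte` matches at the leftmost position. The longer word later in the input is never considered, so the rule is "longest at the first match", not "longest in the whole string".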
This POSIX or text-directed engine returns the longest possible match at that position, even if a shorter part of the string already gave a match.
When regular expressions were first made available in computing, they supported only a very limited syntax. But as time went by, people wanted to match more complex patterns, so they began adding more advanced features and syntax. Different groups built their own regex libraries or engines, each with its own syntax variations.
The reality today is that, according to Wikipedia, more than 25 different regex engines are in wide use, and each has its own regex dialect.
Now, one last note before we go ahead: because there is such a variety of regex engines, keep your preferred environment in mind when you select a tool for testing your regular expressions.
As I said earlier, throughout this course we will use either the BYTESCOUT multi-tool or other online tools to give you a basic understanding of the fundamentals. Now that you know all that, let's dive deeper for a more detailed view of the basics, syntax, and elements of regular expressions.