Module 2: Regex Engine Basics (Part 2)

Knowing how the regex engine works enables you to craft better regular expressions more easily. It helps you quickly understand why a particular regex does not do what you initially expected, which will save you a lot of guesswork and head-scratching when you need to write more complex regexes.

Let's start with our previous demo to see how the regex engine works. The engine always tries to match as early in the input as possible. It processes the regex one token at a time, from left to right, and it likewise reads the input string one character at a time, from left to right. Once a character is matched, it is said to be consumed from the input, and the engine moves on to the next input character and tries to match it against the next token in the regex.
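The left-to-right behaviour described above can be checked with a small sketch in Python, whose built-in `re` module is a regex-directed engine. The sample text and pattern are made up for illustration and are not the ones from the demo.

```python
import re

# Illustrative input (an assumption, not the demo's text).
text = "a cat and a catfish"

# re.search scans left to right and reports the earliest match it can find.
first = re.search(r"cat", text)
print(first.start(), first.group())

# finditer resumes after each match, so consumed characters are not reused.
starts = [m.start() for m in re.finditer(r"cat", text)]
print(starts)
```

The first match is the standalone word, because it begins earlier in the input than the "cat" inside "catfish".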

Now, to make the dot in the title optional, I have added a question mark quantifier. As soon as the engine encounters a greedy quantifier, it surges forward, matching as much as possible. If the rest of the regex then fails to match a character in the input, the engine steps back one character at a time until it finds a position from which it can match again. It continues to do so until it has either found a complete match or exhausted all possible options without finding one.
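As a rough illustration of the optional dot and of stepping back, here is a minimal Python sketch. The patterns and the name "Smith" are assumptions chosen for the example, not the exact regex used in the demo.

```python
import re

# Hypothetical pattern: the dot after "Mr" is optional because of "?".
title = re.compile(r"Mr\.? Smith")
with_dot = title.search("Mr. Smith")
without_dot = title.search("Mr Smith")
print(with_dot.group(), "|", without_dot.group())

# Stepping back: the optional dot first takes the "." in the input, so the
# final literal dot finds nothing left to match. The engine then steps back,
# lets the optional dot match nothing, and gives the "." to the final dot.
backtracked = re.fullmatch(r"Mr\.?\.", "Mr.")
print(backtracked.group())
```

The second pattern is deliberately contrived: it only succeeds because the engine gives back the character that the quantifier took first.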

This stepping back is called backtracking. Either way, the engine always knows its current position within the regex. If the regex specifies an alternation and one search path fails, the engine backtracks to try the next alternative. In simple terms, an alternation construct is an either-or operation.

Here we have specified alternatives for the title in the name. If the Mr. title is not found, the engine backtracks to try the next alternative, which is why the engine also stores the backtracking position. There is no match for the Miss title either, so the engine moves on and tries the next alternative. Before we go any further, let's look at the types of regex engine. Please note that there is no standard definition of what a regular expression engine is.
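The alternation walkthrough above can be sketched in Python. The pattern below is a guess at the shape of the demo's regex: the third alternative (`Mrs`) and the sample name are assumptions added for illustration.

```python
import re

# Hypothetical pattern: three alternative titles, tried in the order written.
name = re.compile(r"(Mr|Miss|Mrs)\.? (\w+)")

# For "Mrs. Jones" the engine proceeds roughly like this:
#   1. "Mr" matches, but the next input character is "s", so neither the
#      optional dot nor the space can continue the match -> backtrack.
#   2. "Miss" fails on its second letter -> move to the next alternative.
#   3. "Mrs" matches, then the dot, the space and the surname follow.
m = name.match("Mrs. Jones")
title, surname = m.group(1), m.group(2)
print(title, surname)
```

Writing the alternatives in a different order changes how much backtracking is needed, but not the result in this case.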

There are two types of engine: text-directed and regex-directed. A text-directed engine attempts all paths of the regex before moving to the next character of input, so it never backtracks. A regex-directed engine attempts paths in left-to-right order, as we have already seen, and backtracks to try an alternative path when a match fails.

Most modern engines, such as PCRE, are regex-directed, because this is the only way to implement some useful features such as lazy quantifiers and atomic grouping. There is one very important difference between the two types: a regex-directed engine stops at the first match it finds, while a POSIX-based or text-directed engine tries to find the longest match.

This does not mean that a text-directed engine always returns the longest match anywhere in the string, only that it makes the leftmost match as long as possible, even if a shorter part of the string at that position already gave a match.
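To make the difference concrete, here is a toy sketch in Python. Python's `re` module is regex-directed, so it stops at the first alternative that works. The `leftmost_longest` function is a hypothetical helper written for this illustration: it handles only an alternation of literal strings and is not how any real POSIX engine is implemented.

```python
import re

def leftmost_longest(alternatives, text):
    """Toy text-directed matcher for an alternation of literal strings.

    From each start position it walks the text one character at a time,
    keeps every alternative that still agrees with the input, and remembers
    the longest alternative that completed. It never backtracks.
    """
    for start in range(len(text)):
        alive = list(alternatives)  # alternatives still consistent with the input
        best = None
        offset = 0
        while alive:
            # An alternative fully consumed at this offset is a complete match.
            for alt in alive:
                if len(alt) == offset:
                    best = alt  # later completions are always longer
            pos = start + offset
            # Keep only alternatives whose next character matches the input.
            alive = [alt for alt in alive
                     if len(alt) > offset
                     and pos < len(text)
                     and alt[offset] == text[pos]]
            offset += 1
        if best is not None:
            return best  # leftmost position wins, longest match at that position
    return None

regex_directed = re.search(r"byte|bytescout", "bytescout").group()
text_directed = leftmost_longest(["byte", "bytescout"], "bytescout")
print(regex_directed, text_directed)
```

With a regex-directed engine, listing the longer alternative first (`bytescout|byte`) is the usual way to get the longer match.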

Let's run a demo to understand this concept. This is another good online regex testing tool, with several flavors of engine. I am on the PCRE tab now. I have given the input string here, and I want to match the word byte or BYTESCOUT, so we need to write a regex for that. As you can see, the word byte gets matched and the engine stops.

As I said earlier, a regex-directed engine stops at the first match it finds. Now let's see what happens in the POSIX tab. When I click this button, we get a different result this time.

The POSIX engine, being text-directed, returns the longest match at that position, even though a shorter part of the string already gave a match.

When regular expressions were first made available in computing, they supported only a very limited syntax.

But as time went on, people wanted to match more complex patterns. They began extending regular expressions with more advanced features and syntax, and so they built their own regex libraries or engines, each with its own syntax variation.

The reality today is that, according to Wikipedia, there are more than 25 different regex engines in wide use, and each has its own regex dialect.

Now, one last note before we go ahead: because there is such a variety of regex engines, you should keep your preferred environment in mind when you select a tool for testing your regular expressions.

As I said earlier, throughout this course we will use either the BYTESCOUT multi-tool or other online tools to give you a basic understanding of the fundamentals. Now that you know all that, let's dive deeper into the basics, syntax, and elements of regular expressions.
