Module 2: Anchors and Boundaries in Detail
In the world of regex, anchors and boundaries are among the most important concepts to understand. They are simple, but you cannot ignore them. Anchors are a way to assert where we expect boundaries in the string we are examining. On their own, boundary assertions may not seem very important, but combined with other regex syntax they can be very powerful. These metacharacters are a way to tell the regex engine: "Hey! Please start the match here, but do not cross this limit."
A boundary can be the start or end of a string, the start or end of a line, or a word boundary. But remember one thing: boundary assertions do not match characters. They only check for the boundary position itself. So far we have been writing regular expressions that match pieces anywhere in the text, even when only part of a word or name matches. Sometimes this is not desirable.
Do you remember the HL7 file name demo from the previous video? One of the last file names starts with more than three characters, but the regex still matches it. To avoid this, I need to make my expression more specific so that it only matches names that start with this particular pattern. That is exactly when you want to use anchors and boundaries.
Anchors have a special meaning in regular expressions. They don't match any characters. Instead, they match a position before, after, or between characters. There are two types of anchors: the start-of-line anchor, which is the caret sign (^), and the end-of-line anchor, which is the dollar sign ($).
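To see that an anchor checks a position without consuming any characters, here is a minimal sketch using Python's standard re module. The sample word is made up for illustration. The match spans reported for ^ and $ are empty, and substituting text at an anchor inserts it without removing anything.

```python
import re

word = "hello"

# An anchor matches a position, so its match span has zero width.
start = re.search(r"^", word)
end = re.search(r"$", word)
print(start.span())  # (0, 0)
print(end.span())    # (5, 5)

# Replacing an anchor inserts text at that position; no characters are consumed.
print(re.sub(r"^", ">> ", word))  # >> hello
print(re.sub(r"$", " <<", word))  # hello <<
```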
The caret anchor matches at the beginning of the string or at the beginning of a line, so you can use it to find only the words that appear at the start of a line. Place this anchor before the text you want to match in the regular expression. With the dollar sign anchor, you can search for text at the end of the string or at the end of a line. With this anchor, the rest of the pattern must match immediately before the end of the string or before a line feed character. In most regex engines, ^ and $ only match at line breaks when multiline mode is enabled. Otherwise they refer to the whole string.
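Here is a small Python sketch of both anchors, using the standard re module and a made-up three-line string. In Python, the re.MULTILINE flag turns on multiline mode, so ^ and $ also match around each line feed. Without it, they only match at the start and end of the whole string.

```python
import re

text = "apple pie\nbanana split\napple tart"

# Without MULTILINE, ^ matches only at the very start of the string.
first_only = re.findall(r"^apple", text)
# With MULTILINE, ^ also matches right after every line feed.
every_line = re.findall(r"^apple", text, re.MULTILINE)

# With MULTILINE, $ matches just before each line feed and at the end of the string.
line_endings = re.findall(r"\w+$", text, re.MULTILINE)
# Without MULTILINE, $ matches only at the end of this string.
string_end = re.findall(r"\w+$", text)

print(first_only)    # ['apple']
print(every_line)    # ['apple', 'apple']
print(line_endings)  # ['pie', 'split', 'tart']
print(string_end)    # ['tart']
```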
Let's revisit the demo from our previous video where we matched HL7 file names, and see how anchor metacharacters help us match a position before or after the characters. In that demo, from the previous module, we wrote a regex to match HL7 files whose names start with three characters, followed by an underscore and then six digits. Now notice the last two file names. The regex still matches them, even though one starts with six characters and the other ends with a different extension.
Now, if I add the caret sign at the start of the pattern, do you see the difference? The file name that starts with extra characters is now removed from the matching list.
In the same way, if I add the dollar sign at the end of the pattern, the last file name is also removed from the matching list. This is how we tell the regex: "Please start matching the input here, and end the match here." Now, what about word boundaries?
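The video runs this demo in a regex tool. Here is a rough Python equivalent using the standard re module. The file names, the pattern, and the ".hl7" extension are all hypothetical stand-ins, because the exact values from the video are not shown. Each file name is checked separately with re.search, so ^ and $ refer to the start and end of that name.

```python
import re

# Hypothetical file names; the exact list from the video is not shown.
file_names = [
    "ADT_230101.hl7",
    "ORU_230102.hl7",
    "ABCADT_230103.hl7",   # six letters before the underscore
    "ADT_230104.hl7.bak",  # extra extension at the end
]

# Assumed pattern: three letters, an underscore, six digits, then ".hl7".
unanchored = r"[A-Za-z]{3}_\d{6}\.hl7"
start_anchored = r"^[A-Za-z]{3}_\d{6}\.hl7"
fully_anchored = r"^[A-Za-z]{3}_\d{6}\.hl7$"

def matching(pattern):
    """Return the file names in which re.search finds the pattern."""
    return [name for name in file_names if re.search(pattern, name)]

print(matching(unanchored))      # all four names
print(matching(start_anchored))  # drops "ABCADT_230103.hl7"
print(matching(fully_anchored))  # also drops "ADT_230104.hl7.bak"
```

Without anchors, re.search is happy to find the pattern anywhere inside a longer name. The caret rejects extra leading characters, and the dollar sign rejects anything trailing after ".hl7".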
Let us understand that metacharacter concept. As we just saw, we were able to match exactly the files we expected by using the start-of-line and end-of-line anchors. But what if the text I wanted to match was not at the start or the end of the line? In that case, these anchors would not help at all, and that is where word boundaries come in.
A word boundary is another metacharacter in regular expressions that matches by position rather than by character. There are two metacharacters here, \b (backslash, lowercase b) and \B (backslash, capital B), which match word boundaries and non-word boundaries respectively. A word boundary is a position in a string between a word character (a letter, digit, or underscore) and a non-word character. A word boundary can also be at the beginning of a string, if the first character is a word character, or at the end of a string, if the last character is a word character.
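To make these positions visible, here is a minimal sketch using Python's standard re module. It lists every index where \b matches in two made-up strings. The first string ends with a word character and the second ends with a period.

```python
import re

def word_boundaries(text):
    """Return the index of every position where a word boundary matches."""
    return [m.start() for m in re.finditer(r"\b", text)]

# Boundaries fall before and after "hi" and before and after "there";
# index 8 is the end of the string, a boundary because "e" is a word character.
print(word_boundaries("hi there"))   # [0, 2, 3, 8]

# With a trailing period, index 8 is still a boundary ("e" next to "."),
# but the end of the string (index 9) is not, because "." is a non-word character.
print(word_boundaries("hi there."))  # [0, 2, 3, 8]

# Marking each boundary shows that \b consumes no characters.
print(re.sub(r"\b", "|", "hi there."))  # |hi| |there|.
```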
Non-word characters include a period (dot), a space, a colon, or any other punctuation mark. A non-word boundary is exactly the opposite: a position in a string between two adjacent characters of the same type, either two word characters or two non-word characters.
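Here is a companion sketch for \B, again using Python's standard re module with made-up inputs. It lists where a non-word boundary matches, then uses \B on both sides of a word to find it only when it is embedded inside a longer word. In Python, the start and end of the string count as non-word positions, so the end of "hi there." (right after the period) is a non-word boundary.

```python
import re

def non_word_boundaries(text):
    """Return the index of every position where a non-word boundary matches."""
    return [m.start() for m in re.finditer(r"\B", text)]

# \B matches between two word characters (inside "hi" and "there") and at
# the end of the string, where the "." and the string end are both non-word.
print(non_word_boundaries("hi there."))  # [1, 4, 5, 6, 7, 9]

# \B on both sides finds "cat" only when it sits in the middle of a longer word.
words = ["cat", "concatenate", "bobcat"]
print([w for w in words if re.search(r"\Bcat\B", w)])  # ['concatenate']
```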
Now let's look at an example of word boundaries. Here I have defined a statement in the input text, and we want to find a specific word in it. For example, I want to find the word "out". If I write just that word in the regex, it matches in three words instead of only the one I want, because the letters "out" also appear inside other words. Now let's add word boundaries by placing the \b metacharacter at the start and at the end of the pattern.
Notice that the result changes: now it matches only the exact word we are looking for. This is how we can define boundaries by using the \b metacharacter.
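The statement used in the video is not shown, so here is a hypothetical one that reproduces the same effect in Python's standard re module. "out" appears on its own and also inside "outline" and "layout". Wrapping the pattern in \b keeps only the standalone word.

```python
import re

# Hypothetical statement; the exact text used in the video is not shown.
statement = "Go out and check the outline of the layout."

# Without boundaries, "out" also matches inside "outline" and "layout".
plain = re.findall(r"out", statement)
containing_words = re.findall(r"\w*out\w*", statement)

# With \b at the start and the end, only the standalone word matches.
bounded = re.findall(r"\bout\b", statement)

print(plain)             # ['out', 'out', 'out']
print(containing_words)  # ['out', 'outline', 'layout']
print(bounded)           # ['out']
```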