How to Find Website Links Using Regular Expressions in C#
Text documents can contain many kinds of information, such as letters, numbers, images, special characters, and website links. A common task is to extract all the website links that appear in a document. In this article, you will see how to use regular expressions in C# to find website links in text documents. Note that this approach finds only explicitly written website links (those starting with http://, https://, or www.), not links embedded in the text. So let’s begin.
Finding a Single Website Link
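To find a single website link, use the regular expression “\b(?:https?://|www\.)\S+\b”. Its parts are explained below:
- \b: Matches a word boundary, so the link must start at the beginning of a word, for example after a space.
- (?:https?://|www\.): A non-capturing group that matches http://, https://, or www. (the s is optional, and the dot is escaped so that it matches a literal period).
- \S+: Matches one or more non-whitespace characters.
- \b: Matches a word boundary, so the match ends on a letter, digit, or underscore, and trailing punctuation such as a period is left out.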
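The same pattern can also be written in a commented, multi-line form, which makes the role of each piece easier to see. The following is a minimal sketch, assuming the RegexOptions.IgnorePatternWhitespace option; the class name AnnotatedPattern and the sample text are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnnotatedPattern
{
    // Same pattern as above, spread over several lines. With
    // RegexOptions.IgnorePatternWhitespace, unescaped whitespace in the
    // pattern is ignored and '#' starts a comment that runs to the end of the line.
    public static readonly Regex LinkRegex = new Regex(@"
        \b                   # word boundary: the link starts at the edge of a word
        (?:https?://|www\.)  # non-capturing group: http://, https:// or www.
        \S+                  # one or more non-whitespace characters
        \b                   # word boundary: the match ends on a word character
        ", RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

    public static void Main()
    {
        // Hypothetical sample text, not taken from the article.
        string text = "Docs are at https://example.com/docs, see also www.example.org.";
        foreach (Match m in LinkRegex.Matches(text))
        {
            Console.WriteLine(m.Value);
        }
    }
}
```

Because the pattern ends with \b, the comma and the final period in the sample text should not be part of the matched links.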
To return a single result, call the Match() method of the Regex object on the text. The following script returns the first website link found in the text.

    using System;
    using System.Text.RegularExpressions;

    namespace RegexCodes
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Sample text containing one website link
                string textFile = "Hello, search the items on www.google.com ";
                Console.WriteLine("====================");

                // Match http://, https:// or www. followed by non-whitespace characters
                var myRegex = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.IgnoreCase);

                // Match() returns only the first match in the text
                string result = myRegex.Match(textFile).ToString();
                Console.WriteLine(result);
            }
        }
    }
Here is the output of the above script. You can see that the website link has successfully been retrieved.
Output
    ====================
    www.google.com
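When the text contains no link, Match() still returns a Match object, and its ToString() method yields an empty string. The following is a minimal sketch of checking the Success property before using the result; the class name FirstLinkExample and the helper TryFindFirstLink are hypothetical names chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstLinkExample
{
    private static readonly Regex LinkRegex =
        new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.IgnoreCase);

    // Hypothetical helper: reports whether a link was found and returns
    // the first one through the out parameter.
    public static bool TryFindFirstLink(string text, out string link)
    {
        Match match = LinkRegex.Match(text);
        link = match.Value;      // empty string when nothing matched
        return match.Success;    // false when the text contains no link
    }

    public static void Main()
    {
        string[] samples =
        {
            "Hello, search the items on www.google.com ",
            "No links in this sentence."
        };

        foreach (string sample in samples)
        {
            string link;
            if (TryFindFirstLink(sample, out link))
                Console.WriteLine("Found: " + link);
            else
                Console.WriteLine("No link found.");
        }
    }
}
```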
Finding Multiple Links
A text document can contain multiple website links starting with https, http, or www. To fetch all of them, we can again use the “\b(?:https?://|www\.)\S+\b” regular expression. This time, however, we call the “Matches()” method instead of “Match()”.
In the following script, the sample text contains three website links: www.google.com, https://bing.com, and http://yahoo.com. The “Matches()” method returns all the links. Its result is iterated with a foreach loop, and the value of each match is printed to the console.

    using System;
    using System.Text.RegularExpressions;

    namespace RegexCodes
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Sample text containing three website links
                string textFile = "Hello, search the items on www.google.com and https://bing.com. If you dont find any answer, you can search it on http://yahoo.com as well";
                Console.WriteLine("====================");

                var myRegex = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.IgnoreCase);

                // Matches() returns every match in the text, not just the first
                var results = myRegex.Matches(textFile);
                foreach (Match result in results)
                {
                    Console.WriteLine(result.ToString());
                }
            }
        }
    }
The output of the above script is as follows. You can see that all three website links in the text have successfully been retrieved. The period after https://bing.com is not included because the pattern ends with \b.
Output
    ====================
    www.google.com
    https://bing.com
    http://yahoo.com
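In practice, you may want the links in a collection rather than printed directly, and the same link can appear more than once in a document. The following is a minimal sketch under those assumptions; the class name LinkCollector, the helper FindDistinctLinks, and the sample text are hypothetical, and treating links that differ only in letter case as duplicates is a choice made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LinkCollector
{
    private static readonly Regex LinkRegex =
        new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns every link found in the text, each one once.
    public static List<string> FindDistinctLinks(string text)
    {
        return LinkRegex.Matches(text)
            .Cast<Match>()                               // MatchCollection -> IEnumerable<Match>
            .Select(m => m.Value)                        // keep only the matched text
            .Distinct(StringComparer.OrdinalIgnoreCase)  // drop repeats, ignoring case
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample text in which www.google.com appears twice.
        string text = "Try www.google.com first, then https://bing.com. " +
                      "If that fails, go back to www.google.com.";

        foreach (string link in FindDistinctLinks(text))
        {
            Console.WriteLine(link);
        }
    }
}
```

The Cast<Match>() call is there because MatchCollection does not implement the generic IEnumerable<Match> interface on every .NET version.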