I searched the community patterns on https://regexr.com/ and found the following:
/[-a-zA-Z0-9@:%+.~#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%+.~#?&//=]*)?/gi
If you test your string against it, you will see that it works. However, trusting nobody very much, I went to verify it at https://regex101.com/, and it told me that the forward slashes needed escaping with backslashes, so I got the following:
/[-a-zA-Z0-9@:%+.~#?&\/\/=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%+.~#?&\/\/=]*)?/gi
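Because the other snippet here is Python, this is a minimal sketch of the regexr pattern used from Python's `re` module. It is my own transcription, not something regexr generates: Python has no `/.../` delimiters, so the slashes need no escaping; the JavaScript `i` flag becomes `re.IGNORECASE`; and the dot before the TLD is written `\.` so it matches only a literal dot. The name `URL_PATTERN` is just an illustrative choice.

```python
import re

# The regexr pattern transcribed into a Python raw string (illustrative name).
# No delimiters, so "/" is not escaped; the JS "i" flag becomes IGNORECASE.
URL_PATTERN = re.compile(
    r'[-a-zA-Z0-9@:%+.~#?&/=]{2,256}\.[a-z]{2,4}\b(/[-a-zA-Z0-9@:%+.~#?&/=]*)?',
    re.IGNORECASE,
)

text = 'Go to this link: https://www.google.com'

# The JS "g" flag has no direct equivalent; finditer() walks every match.
# group(0) is the whole matched URL.
urls = [m.group(0) for m in URL_PATTERN.finditer(text)]
print(urls)
```

Note the use of `finditer()` with `group(0)`: the pattern contains one capture group (the optional path), so `findall()` would return only that group, which is an empty string for this input.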
Writing a good regex for URLs is quite complicated, so I normally use this one as a base (it is not mine, but it is released under the MIT license):
Adapted for your purpose, it would look something like this:
```python
import re

regexURL = re.compile(
    # URL must start at the beginning of the text or after whitespace;
    # the lookbehind checks this without consuming the whitespace
    '(?<!\\S)(' +
    # protocol identifier (optional)
    # short syntax // still required
    '(?:(?:(?:https?|ftp):)?//)' +
    # user:pass BasicAuth (optional)
    '(?:\\S+(?::\\S*)?@)?' +
    '(?:' +
    # IP address exclusion
    # private & local networks
    '(?!(?:10|127)(?:\\.\\d{1,3}){3})' +
    '(?!(?:169\\.254|192\\.168)(?:\\.\\d{1,3}){2})' +
    '(?!172\\.(?:1[6-9]|2\\d|3[0-1])(?:\\.\\d{1,3}){2})' +
    # IP address dotted notation octets
    # excludes loopback network 0.0.0.0
    # excludes reserved space >= 224.0.0.0
    # excludes network & broadcast addresses
    # (first & last IP address of each class)
    '(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])' +
    '(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}' +
    '(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))' +
    '|' +
    # host & domain names, may end with dot
    # can be replaced by a shortest alternative
    # (?![-_])(?:[-\\w\\u00a1-\\uffff]{0,63}[^-_]\\.)+
    '(?:' +
    '(?:' +
    '[a-z0-9\\u00a1-\\uffff]' +
    '[a-z0-9\\u00a1-\\uffff_-]{0,62}' +
    ')?' +
    '[a-z0-9\\u00a1-\\uffff]\\.' +
    ')+' +
    # TLD identifier name, may end with dot
    '(?:[a-z\\u00a1-\\uffff]{2,}\\.?)' +
    ')' +
    # port number (optional)
    '(?::\\d{2,5})?' +
    # resource path (optional)
    '(?:[/?#]\\S*)?' +
    # URL must end at the end of the text or before whitespace
    # (lookahead, so the whitespace stays available for the next URL)
    ')(?!\\S)',
    re.IGNORECASE
)

text = 'Go to this link: https://www.google.com'

# gets the first URL only; group(1) is the captured URL itself
urlMatch = regexURL.search(text)
if urlMatch:
    url = urlMatch.group(1)
    print(url)

# gets all URLs (findall returns the contents of the capture group)
urlList = regexURL.findall(text)
for url in urlList:
    print(url)
```
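Wrapping a URL pattern as `(?:^|\s)(...)(?:\s|$)` has two side effects: the whole match includes the surrounding whitespace, so the URL itself is only in group 1, and because the trailing whitespace is consumed, `findall()` can miss a URL that follows another one after a single space. This sketch demonstrates both using a deliberately simplified, hypothetical inner pattern (`https?://\S+`) standing in for the full regex, and compares it with the lookaround boundaries `(?<!\S)` and `(?!\S)`, which check for whitespace without consuming it.

```python
import re

# Hypothetical, simplified stand-in for the full URL pattern.
INNER = r'https?://\S+'

# Boundaries that consume the surrounding whitespace.
consuming = re.compile(r'(?:^|\s)(' + INNER + r')(?:\s|$)')
# Boundaries checked with lookarounds, which consume nothing.
lookaround = re.compile(r'(?<!\S)(' + INNER + r')(?!\S)')

text = 'Links: https://a.example https://b.example'

m = consuming.search(text)
print(repr(m.group()))   # whole match, including both spaces
print(repr(m.group(1)))  # captured URL only

print(consuming.findall(text))   # second URL missed: its leading space was consumed
print(lookaround.findall(text))  # both URLs found
```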
Credit for regex:
Regular Expression for URL validation
Author: Diego Perini
Created: 2010/12/05
Updated: 2018/09/12
License: MIT
Copyright (c) 2010-2018 Diego Perini (http://www.iport.it)
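To see what the IP-address branch of Diego Perini's regex does on its own, here is a sketch that pulls out just the three private-network lookaheads and the octet rules (copied from the pattern above and written as raw strings) and checks a few addresses with `fullmatch()`. The name `IP_PART` and the sample addresses are my own, for illustration only; in the full regex these pieces sit inside a larger pattern, so this isolates one branch rather than reproducing the regex's complete behavior.

```python
import re

# Illustrative name: only the IP-address branch of the regex above.
IP_PART = re.compile(
    # negative lookaheads: reject 10.x.x.x, 127.x.x.x, 169.254.x.x,
    # 192.168.x.x and 172.16.x.x through 172.31.x.x
    r'(?!(?:10|127)(?:\.\d{1,3}){3})'
    r'(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})'
    r'(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})'
    # first octet 1-223: rules out 0.x.x.x and everything from 224.0.0.0 up
    r'(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])'
    # two middle octets (0-255)
    r'(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}'
    # last octet 1-254: rules out .0 and .255
    r'(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))'
)

samples = ['8.8.8.8', '93.184.216.34', '10.0.0.1', '192.168.1.10',
           '172.20.0.5', '224.0.0.1', '93.184.216.0']

# fullmatch() requires the whole string to be an acceptable address
results = {addr: IP_PART.fullmatch(addr) is not None for addr in samples}
for addr, ok in results.items():
    print(addr, ok)
```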