PHP Version:
/\<a.*?href=('|\")(.*?)(?:(?<!\\\)\\1|\w+(?=\=)|.(?=\s))[^\>]*?>(.*?)(?:\<\/)(?=[a]).*?(?=\>)\>/i
Actual Regex:
/\<a.*?href=('|\")(.*?)(?:(?<!\\)\1|\w+(?=\=)|.(?=\s))[^\>]*?>(.*?)(?:\<\/)(?=[a]).*?(?=\>)\>/i
That regex was designed for deVolf's new RSS Import feature. It takes an <a> link and extracts the href URL and the text inside the tag. It allows for empty links as well as links without hrefs. The regex returns the following matches:
- match 1 is whether single or double quotes were used; this is required later on in the regex and is not useful after the regex is run
- match 2 contains the href link
- match 3 contains the text between the <a></a>
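To make the three matches concrete, here is a minimal sketch that runs the PHP version of the regex over a small snippet with preg_match_all. The sample HTML and the $links array are made up for illustration and are not part of the RSS Import feature.

```php
<?php
// Pattern copied from the "PHP Version" above (double-quoted PHP string).
$pattern = "/\<a.*?href=('|\")(.*?)(?:(?<!\\\)\\1|\w+(?=\=)|.(?=\s))[^\>]*?>(.*?)(?:\<\/)(?=[a]).*?(?=\>)\>/i";

// Hypothetical feed snippet, invented for this example.
$html = '<p>See <a href="http://example.com/feed">the feed</a> and '
      . "<a href='/about'>About us</a>.</p>";

// PREG_SET_ORDER returns one array per link: [0] full match, [1] quote, [2] href, [3] text.
$links = array();
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
    $links[] = array('quote' => $m[1], 'href' => $m[2], 'text' => $m[3]);
}

foreach ($links as $link) {
    echo $link['href'], ' => ', $link['text'], "\n";
}
```

For this input the sketch should find two links, the first with the href http://example.com/feed and the text "the feed", the second with the href /about and the text "About us".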
Things to consider:
- The regex matches anything after <a> until it hits </a>
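- After the opening quote of the href, the URL capture stops at the first of: a closing quote that matches the opening one (and is not escaped with a backslash), a character followed by whitespace, or a word followed by = (taken to be another HTML property). In the last two cases match 2 can be cut short, so I recommend checking the captured URL, for example for a stray quote or space at the end, before working with it.
- It will NOT match links that contain a newline anywhere. If you want it to, add an s after the i at the end.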
- It works with PHP 5.3. I have not tested other versions.
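The points above can be sketched as follows. The sample links and the helper hrefLooksComplete() are hypothetical, and the helper is only one possible way to check the captured URL, not something the regex itself provides.

```php
<?php
// Pattern copied from the post, without modifiers so "i" or "is" can be appended.
$base = "/\<a.*?href=('|\")(.*?)(?:(?<!\\\)\\1|\w+(?=\=)|.(?=\s))[^\>]*?>(.*?)(?:\<\/)(?=[a]).*?(?=\>)\>/";
$pattern      = $base . 'i';   // as posted
$patternMulti = $base . 'is';  // with "s" so that . also matches newlines

// Hypothetical helper: true when the captured URL is immediately followed by
// its closing quote inside the matched tag, i.e. it was not cut short.
function hrefLooksComplete($m)
{
    return stripos($m[0], 'href=' . $m[1] . $m[2] . $m[1]) !== false;
}

// Made-up link whose text is split across two lines.
$multiline  = "<a href=\"/news\">Latest\nnews</a>";
$plainCount = preg_match($pattern, $multiline, $plain);       // no match without "s"
$multiCount = preg_match($patternMulti, $multiline, $multi);  // matches with "s"

// Made-up link with a query string: "id" followed by "=" looks like a new property.
$query = '<a href="page.php?id=5">Item</a>';
preg_match($pattern, $query, $q);

var_dump($plainCount, $multiCount, $q[2], hrefLooksComplete($q), hrefLooksComplete($multi));
```

In this sketch the query-string link comes back with page.php? in match 2, which is the kind of cut-short URL the check is meant to catch.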
Thanks,
James Hartig
2 comments:
Hi, do you happen to be the same James Hartig who created this: http://userscripts.org/scripts/show/79598 ?
If yes, please be so kind as to update your script, because your script is very useful :)
I'm so sorry to be going so far and search for your blog >.<
After looking at at least 20 other pages for a reg exp to remove links that have javascript in them, I finally found yours. Thank you very much!
Joe Mas