PHP: Removing './' and '../' from absolute URLs with regular expressions
2023-12-15
Preface
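As you have probably experienced, when writing a web crawler you often need to post-process the hyperlinks it collects and convert them all to absolute URLs. This article uses regular expressions to do that processing. Without further ado, let's look at the details.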
A crawler will typically run into links like the following:
<!-- empty link -->
<a href=""></a>
<!-- whitespace -->
<a href=" "> </a>
<!-- a tag with other attributes -->
<a href="index.html" alt="hyperlink"> index.html </a>
<a href="/" target="_blank"> / target="_blank" </a>
<a target="_blank" href="/" alt="hyperlink"> target="_blank" / alt="hyperlink" </a>
<a target="_blank" title="hyperlink" href="/" alt="hyperlink"> target="_blank" title="hyperlink" / alt="hyperlink" </a>
<!-- root directory -->
<a href="/"> / </a>
<a href="a"> a </a>
<!-- with query parameters -->
<a href="/index.html?id=1"> /index.html?id=1 </a>
<a href="?id=2"> ?id=2 </a>
<!-- // (protocol-relative) -->
<a href="//index.html"> //index.html </a>
<a href="//www.mafutian.net"> //www.mafutian.net </a>
<!-- same-site link -->
<a href="http://www.hole_1.com/index.html"> http://www.hole_1.com/index.html </a>
<!-- external links -->
<a href="http://www.mafutian.net"> http://www.mafutian.net </a>
<a href="http://www.numberer.net"> http://www.numberer.net </a>
<!-- links to images and text files -->
<a href="1.jpg"> 1.jpg </a>
<a href="1.jpeg"> 1.jpeg </a>
<a href="1.gif"> 1.gif </a>
<a href="1.png"> 1.png </a>
<a href="1.txt"> 1.txt </a>
<!-- ordinary links -->
<a href="index.html"> index.html </a>
<a href="index.html"> index.html </a>
<a href="./index.html"> ./index.html </a>
<a href="../index.html"> ../index.html </a>
<a href=".../"> .../ </a>
<a href="..."> ... </a>
<!-- not links, but containing a colon -->
<a href="javascript:void(0)"> javascript:void(0) </a>
<a href="a:b"> a:b </a>
<a href="/a#a:b"> /a#a:b </a>
<a href="mailto:'mafutian@126.com'"> mailto:'mafutian@126.com' </a>
<a href="/tencent://message/?uin=335134463"> /tencent://message/?uin=335134463 </a>
<!-- relative paths -->
<a href="."> . </a>
<a href=".."> .. </a>
<a href="../"> ../ </a>
<a href="/a/b/.."> /a/b/.. </a>
<a href="/a"> /a </a>
<a href="./b"> ./b </a>
<a href="./././././././././b"> ./././././././././b </a> <!-- really just ./b -->
<a href="../c"> ../c </a>
<a href="../../d"> ../../d </a>
<a href="../a/../b/c/../d"> ../a/../b/c/../d </a>
<a href="./../e"> ./../e </a>
<a href="http://www.hole_1.org/./../e"> http://www.hole_1.org/./../e </a>
<a href="./.././f"> ./.././f </a>
<a href="http://www.hole_1.org/../a/.../../b/c/../d/.."> http://www.hole_1.org/../a/.../../b/c/../d/.. </a>
<!-- with port numbers -->
<a href=":8081/index.html"> :8081/index.html </a>
<a href=":80/index.html"> :80/index.html </a>
<a href="http://www.mafutian.net:8081/index.html"> http://www.mafutian.net:8081/index.html </a>
<a href="http://www.mafutian.net:8082/index.html"> http://www.mafutian.net:8082/index.html </a>
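Not every value in this list should be resolved. A crawler typically sorts hrefs first: empty ones are skipped, schemes other than http/https (javascript:, mailto:, a:b) are dropped, protocol-relative ones inherit the page's scheme, and the rest are resolved against the page URL. The sketch below shows that triage with a hypothetical helper `classify_href()`; the category names and the "only crawl http/https" rule are assumptions, not part of the original article (PHP 7.0 or later assumed).

```php
<?php
// Hypothetical helper (not from the original article): sort a raw href
// into a category before deciding whether to resolve it.
function classify_href(string $href): string
{
    $href = trim($href);
    if ($href === '') {
        return 'empty';               // href="" or href=" "
    }
    if (strncmp($href, '//', 2) === 0) {
        return 'protocol-relative';   // e.g. //www.mafutian.net
    }
    if (preg_match('/^([a-z][a-z0-9+.-]*):/i', $href, $m)) {
        $scheme = strtolower($m[1]);
        // Assumption: only http(s) targets are crawled; javascript:, mailto:, a:b are skipped
        return ($scheme === 'http' || $scheme === 'https') ? 'absolute' : 'non-http';
    }
    // ./b, ../c, /a, ?id=2, :8081/index.html, /tencent://... all land here
    return 'relative';
}

foreach (['./b', '//www.mafutian.net', 'javascript:void(0)', ' '] as $href) {
    printf("%-22s => %s\n", var_export($href, true), classify_href($href));
}
```

Note that `:8081/index.html` and `/tencent://message/...` do not start with a scheme, so under this rule they are treated as ordinary relative paths.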
The first step is to turn every link into an absolute URL, which can leave results of this shape:
http:// ... / ../ ../
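The article does not show code for this first step. As a rough illustration only, here is a minimal sketch of it using a hypothetical helper `join_href()` that takes the URL of the page being crawled and a raw href. It assumes a well-formed base URL and PHP 7.0 or later, ignores `<base>` tags and user-info in the base URL, and deliberately leaves './' and '../' in place, producing exactly the kind of URL the next step cleans up.

```php
<?php
// Hypothetical helper (not from the original article): resolve a raw href
// against the page URL WITHOUT removing './' or '../' segments.
function join_href(string $base, string $href): string
{
    $href = trim($href);
    if ($href === '') {
        return $base;                                   // empty href points to the page itself
    }
    if (preg_match('/^[a-z][a-z0-9+.-]*:/i', $href)) {
        return $href;                                   // already has a scheme
    }
    $parts  = parse_url($base);                         // assumes $base is well-formed
    $scheme = $parts['scheme'] ?? 'http';
    if (strncmp($href, '//', 2) === 0) {
        return $scheme . ':' . $href;                   // protocol-relative: reuse page scheme
    }
    $origin = $scheme . '://' . ($parts['host'] ?? '')
            . (isset($parts['port']) ? ':' . $parts['port'] : '');
    $path   = $parts['path'] ?? '/';
    if ($href[0] === '/') {
        return $origin . $href;                         // root-relative
    }
    if ($href[0] === '?') {
        return $origin . $path . $href;                 // query only: keep the page path
    }
    if ($href[0] === '#') {
        $query = isset($parts['query']) ? '?' . $parts['query'] : '';
        return $origin . $path . $query . $href;        // fragment only
    }
    // Everything else is relative to the directory of the current page
    return $origin . substr($path, 0, strrpos($path, '/') + 1) . $href;
}

echo join_href('http://www.mytest.org/t/1805/index.html', '../a/../b/c/../d'), "\n";
```

With the hypothetical page URL `http://www.mytest.org/t/1805/index.html`, the href `../a/../b/c/../d` becomes `http://www.mytest.org/t/1805/../a/../b/c/../d`; collapsing those dot segments is what the article's `url_to_absolute()` does next.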
The second step removes the './', '../' and '/..' segments from the absolute URL. Here is the implementation:
function url_to_absolute($relative)
{
    // Remove every './' whose '.' is not preceded by another '.' (so '../' is left alone)
    $absolute = preg_replace('/(?<!\.)\.\//', '', $relative);

    // Repeatedly collapse '/abc/../' into '/'.
    // (?<!\/) forbids a match starting at a '/' that follows another '/',
    // so the host right after 'http://' is never removed as a segment.
    do {
        $absolute = preg_replace('/(?<!\/)\/([^\/]+?)\/\.\.\//', '/', $absolute);
        $count = preg_match_all('/(?<!\/)\/([^\/]+?)\/\.\.\//', $absolute);
    } while ($count >= 1);

    // Collapse a trailing '/abc/..' into '/'
    $absolute = preg_replace('/(?<!\/)\/([^\/]+?)\/\.\.$/', '/', $absolute);
    // Drop a trailing '/..' that could not be collapsed
    $absolute = preg_replace('/\/\.\.$/', '', $absolute);

    // Drop any '../' still left (e.g. one that tries to climb above the host)
    $absolute = preg_replace('/(?<!\.)\.\.\//', '', $absolute);

    return $absolute;
}

$relative = 'http://www.mytest.org/../a/.../../b/c/../d/..';
var_dump(url_to_absolute($relative));
// Output: string(26) "http://www.mytest.org/a/b/"
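The regex version handles the example above, but because its patterns match characters rather than whole path segments, a few inputs come out wrong: the first rule turns `http://h/v1./x` into `http://h/v1x`, and the last rule also strips a '../' inside a query string, so `http://h/a/../b?x=../y` becomes `http://h/b?x=y`. For comparison, here is a sketch of a different technique (not the author's): split the path on '/' and use a stack, which is one way to implement the dot-segment removal described in RFC 3986, section 5.2.4. The name `normalize_url_path()` is hypothetical; the sketch assumes PHP 7.1 or later and an input that already starts with `scheme://host` (anything else is returned unchanged).

```php
<?php
// Hypothetical alternative (not the article's method): collapse '.' and '..'
// path segments with a stack instead of repeated regex replacement.
function normalize_url_path(string $url): string
{
    // Split into "scheme://authority", the path, and "?query#fragment"
    if (!preg_match('~^([a-z][a-z0-9+.-]*://[^/?#]*)([^?#]*)(.*)$~i', $url, $m)) {
        return $url;                      // not an absolute URL: leave it alone
    }
    [, $prefix, $path, $rest] = $m;
    if ($path === '') {
        return $url;                      // nothing to normalize
    }

    $segments = explode('/', $path);      // '/a/./b' -> ['', 'a', '.', 'b']
    $last     = count($segments) - 1;
    $stack    = [];
    for ($i = 1; $i <= $last; $i++) {     // index 0 is the empty string before the leading '/'
        $seg = $segments[$i];
        if ($seg === '.' || $seg === '..') {
            if ($seg === '..') {
                array_pop($stack);        // go up one level; above the root is a no-op
            }
            if ($i === $last) {
                $stack[] = '';            // a trailing '.' or '..' leaves a trailing '/'
            }
            continue;
        }
        $stack[] = $seg;                  // ordinary segment, including '...'
    }
    // The query and fragment ($rest) are appended untouched
    return $prefix . '/' . implode('/', $stack) . $rest;
}

var_dump(normalize_url_path('http://www.mytest.org/../a/.../../b/c/../d/..'));
// string(26) "http://www.mytest.org/a/b/"
```

Working segment by segment means '...' and names like 'v1.' are never mistaken for dot segments, and the query string is never touched; the trade-off is a little more code than a handful of `preg_replace()` calls.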
Summary
That's all for this article. I hope it helps with your study or work; if you have questions, feel free to leave a comment.