C# - Regex to extract favicon URL from a webpage
Please help me find the favicon URL in the sample HTML below using a regular expression. It should also check for the file extension ".ico". I am developing a personal bookmarking site and I want to save the favicons of the links I bookmark. I have already written the C# code to convert the icon to GIF and save it, but I have very limited knowledge of regex, so I am unable to select this tag because the ending tags are different on different sites. Examples of ending tags are "/>" and "/link>".
My programming language is C#.
<meta name="description" content="create 360 degree rotation product presentation online 3dbin. 360 product pics, object rotationg presentation can created website @ 3dbin.com web service." />
<meta name="robots" content="index, follow" />
<meta name="verify-v1" content="x42ckcsdiernwyvbsdbdlxn0x9aghmzz312zpwwtmf4=" />
<link rel="shortcut icon" href="http://3dbin.com/favicon.ico" type="image/x-icon" />
<link rel="stylesheet" type="text/css" href="http://3dbin.com/css/1261391049/style.min.css" />
<!--[if lt ie 8]> <script src="http://3dbin.com/js/1261039165/ie8.js" type="text/javascript"></script> <![endif]-->
Solution: one more way to do this is to download HtmlAgilityPack and add a reference to its DLL. Thanks for helping me. I love this site :)
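// Requires a reference to HtmlAgilityPack, plus: using System; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(readContent);
// SelectNodes returns null when nothing matches, so guard before iterating.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//link[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        HtmlAttribute att = link.Attributes["href"];
        // Compare case-insensitively so ".ICO" is accepted as well.
        if (att.Value.EndsWith(".ico", StringComparison.OrdinalIgnoreCase))
        {
            faviconUrl = att.Value;
        }
    }
}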
This should match the whole link tag that contains href="http://3dbin.com/favicon.ico":
<link .*? href="http://3dbin\.com/favicon\.ico" [^>]* />
Correction based on a comment:
I see you already have C# solutions, excellent! In case you are still wondering whether it can be done with regular expressions, the following expression does what you want. Group 1 of each match will hold the URL.
<link .*? href="(.*?\.ico)"
A simple C# snippet that makes use of it:
// Requires: using System; using System.Text.RegularExpressions;
// The sample includes a link item of the form <link ... href="...ico" > ... </link>,
// just to make sure we pick it up properly.
string htmlText =
    "<meta name=\"description\" content=\"create 360 degree rotation product presentation online 3dbin. 360 product pics, object rotationg presentation can created website @ 3dbin.com web service.\" />"
    + "<meta name=\"robots\" content=\"index, follow\" />"
    + "<meta name=\"verify-v1\" content=\"x42ckcsdiernwyvbsdbdlxn0x9aghmzz312zpwwtmf4=\" />"
    + "<link rel=\"shortcut icon\" href=\"http://3dbin.com/favicon.ico\" type=\"image/x-icon\" />"
    + "<link rel=\"shortcut icon\" href=\"http://anotherurl/someicofile.ico\" type=\"image/x-icon\">just make sure works different link ending</link>"
    + "<link rel=\"stylesheet\" type=\"text/css\" href=\"http://3dbin.com/css/1261391049/style.min.css\" />"
    + "<!--[if lt ie 8]> <script src=\"http://3dbin.com/js/1261039165/ie8.js\" type=\"text/javascript\"></script><![endif]-->";
// The dot before "ico" is escaped so that it only matches a literal '.'.
foreach (Match match in Regex.Matches(htmlText, "<link .*? href=\"(.*?\\.ico)\""))
{
    string url = match.Groups[1].Value; // group 1 holds the href value
    Console.WriteLine(url);
}
Which prints the following to the console:
http://3dbin.com/favicon.ico
http://anotherurl/someicofile.ico
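The expression above relies on a space before href and on double quotes, and its lazy ".*?" can run past the end of a link tag that has no ".ico" href. A slightly more defensive variant could first isolate each link tag and then look for the href inside it. The following is only a sketch under assumptions: the class name FaviconRegexSketch and the method FindFaviconUrls are made up for illustration, and it assumes no attribute value contains a '>' character. For arbitrary real-world HTML, a parser such as HtmlAgilityPack is likely the safer choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answers.
public static class FaviconRegexSketch
{
    // Matches a single <link ...> opening tag; [^>]* cannot run past the tag's closing '>',
    // so it works for both "<link ... />" and "<link ...> ... </link>".
    private static readonly Regex LinkTag =
        new Regex(@"<link\b[^>]*>", RegexOptions.IgnoreCase);

    // Captures a single- or double-quoted href value that ends in ".ico".
    private static readonly Regex IcoHref =
        new Regex(@"\bhref\s*=\s*[""']([^""']*\.ico)[""']", RegexOptions.IgnoreCase);

    public static List<string> FindFaviconUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match tag in LinkTag.Matches(html))
        {
            // Search only inside this one tag, so a stylesheet link
            // can never borrow an ".ico" from a later element.
            Match href = IcoHref.Match(tag.Value);
            if (href.Success)
            {
                urls.Add(href.Groups[1].Value);
            }
        }
        return urls;
    }
}
```

Calling FaviconRegexSketch.FindFaviconUrls(htmlText) on the sample string from the snippet above should return the same two URLs. The two-step approach is a design choice to keep each pattern short; attribute order inside the tag no longer matters.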