C# - Regex to extract favicon URL from a webpage


Please help me find the favicon URL in the sample HTML below using a regular expression. It should also check for the file extension ".ico". I am developing a personal bookmarking site and want to save the favicons of the links I bookmark. I have already written C# code to convert the icon to GIF and save it, but I have very limited knowledge of regex, so I am unable to select this tag because the ending tags are different on different sites. Examples of ending tags are "/>" and "/link>".

My programming language is C#.

<meta name="description" content="create 360 degree rotation product presentation online 3dbin. 360 product pics, object rotationg presentation can created website @ 3dbin.com web service." />
<meta name="robots" content="index, follow" />
<meta name="verify-v1" content="x42ckcsdiernwyvbsdbdlxn0x9aghmzz312zpwwtmf4=" />
<link rel="shortcut icon" href="http://3dbin.com/favicon.ico" type="image/x-icon" />
<link rel="stylesheet" type="text/css" href="http://3dbin.com/css/1261391049/style.min.css" />
<!--[if lt ie 8]>
    <script src="http://3dbin.com/js/1261039165/ie8.js" type="text/javascript"></script>
<![endif]-->

Solution: one more way to do this is to download HtmlAgilityPack and add a reference to its DLL. Thanks for helping me. I love this site :)

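    // Requires a reference to HtmlAgilityPack (using HtmlAgilityPack;).
    // readContent holds the page HTML; faviconUrl receives the result.
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(readContent);

    if (doc.DocumentNode != null)
    {
        // SelectNodes returns null when no node matches the XPath.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes(@"//link[@href]");
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                HtmlAttribute att = link.Attributes["href"];
                if (att.Value.EndsWith(".ico"))
                {
                    faviconUrl = att.Value;
                }
            }
        }
    }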

This should match the whole link tag that contains href=http://3dbin.com/favicon.ico:

 <link .*? href="http://3dbin\.com/favicon\.ico" [^>]* /> 
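If the favicon URL is known in advance but differs from page to page, the same whole-tag idea could be sketched with the URL escaped at run time. This is only a minimal sketch and not part of the original answer: LinkTagMatcher and FindLinkTag are hypothetical names, and the pattern is loosened slightly so that the tag may end in either ">" or "/>".

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not from the original answer.
public static class LinkTagMatcher
{
    // Returns the whole <link ...> tag whose href equals iconUrl,
    // or an empty string when no such tag is found.
    public static string FindLinkTag(string html, string iconUrl)
    {
        // Regex.Escape turns "." into "\." so the URL is matched literally.
        string pattern = "<link\\s[^>]*?href=\"" + Regex.Escape(iconUrl) + "\"[^>]*>";
        Match match = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        return match.Success ? match.Value : string.Empty;
    }
}
```

Calling LinkTagMatcher.FindLinkTag(html, "http://3dbin.com/favicon.ico") on the sample from the question should return the full shortcut icon tag. Because the pattern relies on [^>]*, it assumes no attribute value contains a ">" character.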

Correction based on your comment:

I see you already have a C# solution, excellent! But in case you are still wondering whether it can be done with regular expressions, the following expression does what you want. Group 1 of each match holds the URL.

    <link .*? href="(.*?\.ico)"

A simple C# snippet that makes use of it:

    // Needs: using System; using System.Text.RegularExpressions;
    // The sample includes a link item in the form <link ... href="...ico"> ... </link>
    // just to make sure it is picked up properly.
    string htmlText =
        "<meta name=\"description\" content=\"create 360 degree rotation product presentation online 3dbin. 360 product pics, object rotationg presentation can created website @ 3dbin.com web service.\" />" +
        "<meta name=\"robots\" content=\"index, follow\" />" +
        "<meta name=\"verify-v1\" content=\"x42ckcsdiernwyvbsdbdlxn0x9aghmzz312zpwwtmf4=\" />" +
        "<link rel=\"shortcut icon\" href=\"http://3dbin.com/favicon.ico\" type=\"image/x-icon\" />" +
        "<link rel=\"shortcut icon\" href=\"http://anotherurl/someicofile.ico\" type=\"image/x-icon\">just make sure works different link ending</link>" +
        "<link rel=\"stylesheet\" type=\"text/css\" href=\"http://3dbin.com/css/1261391049/style.min.css\" />" +
        "<!--[if lt ie 8]>    <script src=\"http://3dbin.com/js/1261039165/ie8.js\" type=\"text/javascript\"></script><![endif]-->";

    // The dot before "ico" is escaped so that it matches a literal ".".
    foreach (Match match in Regex.Matches(htmlText, "<link .*? href=\"(.*?\\.ico)\""))
    {
        string url = match.Groups[1].Value;

        Console.WriteLine(url);
    }

Which prints the following to the console:

    http://3dbin.com/favicon.ico
    http://anotherurl/someicofile.ico
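The expression above expects double quotes and a space directly before href. A slightly more tolerant variant could be sketched in two steps: first isolate each opening link tag, whether it ends in ">" or "/>", then look for an href ending in ".ico" inside that tag. This is a minimal sketch under stated assumptions, not the answerer's method: FaviconRegex and FindIconUrls are hypothetical names, and it assumes no attribute value contains a ">" character.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answers.
public static class FaviconRegex
{
    // Step 1: any opening <link ...> tag, whether it ends with ">" or "/>".
    private static readonly Regex LinkTag =
        new Regex(@"<link\b[^>]*>", RegexOptions.IgnoreCase);

    // Step 2: an href attribute (single or double quoted) whose value ends in ".ico".
    private static readonly Regex IcoHref =
        new Regex(@"\bhref\s*=\s*[""']([^""']*\.ico)[""']", RegexOptions.IgnoreCase);

    // Returns every .ico URL found in a <link> tag, in document order.
    public static List<string> FindIconUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match tag in LinkTag.Matches(html))
        {
            Match href = IcoHref.Match(tag.Value);
            if (href.Success)
            {
                urls.Add(href.Groups[1].Value);
            }
        }
        return urls;
    }
}
```

Restricting the second match to the text of a single tag keeps a lazy ".*?" from running past a stylesheet link into a later tag. For anything beyond simple pages, an HTML parser such as HtmlAgilityPack, as used in the accepted solution, is likely the safer choice.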
