
Using regex in Classic ASP to get content of specific elements




So I am loading some remote content and need to use regex to isolate the content of some tags.

 set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")
 xmlhttp.open "GET", url, false
 xmlhttp.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
 xmlhttp.setRequestHeader "Accept-Language", "en-us"
 ' Trap runtime errors so err.number can be checked below
 on error resume next
 xmlhttp.send "x=hello"
 status = xmlhttp.status
 if err.number <> 0 or status <> 200 then
     if status = 404 then
         Response.Write "[EFERROR]Page does not exist (404)."
     elseif status = 401 then
         Response.Write "[EFERROR]Access denied (401)."
     elseif status >= 500 and status < 600 then
         Response.Write "[EFERROR]500 Internal Server Error on remote site."
     else
         Response.Write "[EFERROR]Server is down or does not exist."
     end if
 else
     data = xmlhttp.responseText
 end if
 on error goto 0

I basically need to get the content of <title>Here is the title</title>, plus the meta description, keywords and some selected Open Graph metadata.

And finally I need to get the content of the first <h1>Heading</h1> and <p>Paragraph</p>.

How can I parse the html data to get these things? Should I use regex?

regex asp-classic serverxmlhttp
asked May 28 '12 at 13:53 by Chris Dowdeswell

Comments:

Have you considered using an XML parser instead? - Daniel A. White, May 28 '12 at 13:55

Could I just specify the returned content as XML then and use node selection? Could you elaborate on how that might work? Thanks @DanielA.White - Chris Dowdeswell, May 28 '12 at 14:02


3 Answers

You may be able to use the .responseXML property to retrieve the content you want without using regex. Because you are looking for data inside <title>, <h1> and <p> tags, the document returned is probably HTML. If that HTML also happens to be well-formed XML (for example XHTML), it may already have been parsed automatically and be accessible through .responseXML as soon as you get the response.

So you could try this:

Dim objData
Set objData = xmlhttp.responseXML.selectSingleNode("//*[local-name() = 'title']")

If objData Is Nothing Then
    Response.Write "# no result #<br />"
Else
    Response.Write "title: " & objData.Text & "<br />"
End If

Note though, that this XPath expression may not be the most efficient way to query an XML document (in case you want to process large amounts of data).
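
If .responseXML comes back empty, one thing that might be worth trying is loading the response text into an MSXML document yourself, so that a parse failure can be reported. The following is only a sketch under assumptions: sampleXhtml is made-up illustration data standing in for xmlhttp.responseText, the MSXML 6.0 ProgID is assumed to be installed on the server, and the variable names are hypothetical. Real-world HTML is often not well-formed XML, in which case loadXML returns False.

```vbscript
' Sketch: parse a well-formed snippet with MSXML and read the first <h1> and <p>.
' sampleXhtml is made-up illustration data, not content from the question.
Dim xmlDoc, node, sampleXhtml, h1Text, pText, parseProblem
sampleXhtml = "<html><head><title>Here is the title</title></head>" & _
              "<body><h1>Heading</h1><p>Paragraph</p></body></html>"

' Assumption: MSXML 6.0 is available on the server
Set xmlDoc = CreateObject("MSXML2.DOMDocument.6.0")
xmlDoc.async = False

h1Text = ""
pText = ""
parseProblem = ""

If xmlDoc.loadXML(sampleXhtml) Then
    ' selectSingleNode returns Nothing when no element matches
    Set node = xmlDoc.selectSingleNode("//*[local-name() = 'h1']")
    If Not node Is Nothing Then h1Text = node.text
    Set node = xmlDoc.selectSingleNode("//*[local-name() = 'p']")
    If Not node Is Nothing Then pText = node.text
Else
    ' parseError.reason describes why the markup is not well-formed XML
    parseProblem = xmlDoc.parseError.reason
End If
```

In an ASP page the three result variables could then be written out with Response.Write, ideally passed through Server.HTMLEncode first.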

answered Dec 12 '12 at 10:22 by Sander_P


Use the Mid function combined with the InStr function. I built a function that finds the position of each tag with InStr and then uses Mid to extract the text wrapped between them:

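 Function GetInnerData(Data, TagOpen, TagClose)
   Dim OpenPos, ClosePos, StartPos
   ' Case-insensitive search (compare mode 1) for the opening tag
   OpenPos = InStr(1, Data, TagOpen, 1)
   If OpenPos = 0 Then Exit Function
   ' Search for the closing tag only after the opening tag
   StartPos = OpenPos + Len(TagOpen)
   ClosePos = InStr(StartPos, Data, TagClose, 1)
   If ClosePos > 0 Then GetInnerData = Trim(Mid(Data, StartPos, ClosePos - StartPos))
 End Function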

When you run this function like this, it returns My Title:

<%=GetInnerData("any text <title>My Title</title> any text","<title>","</title>")%>

And in your case, you would do it like this:

 TitleData = GetInnerData(data,"<title>","</title>")

This will get the content in your <title> tag. Or:

 H1Data = GetInnerData(data,"<h1>","</h1>")

This will get the content in your <h1> tag.

The InStr function returns the position of the first occurrence it finds in the data, so this function returns the content of the first matching tag, which is exactly what you need.
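
One limitation of matching a literal opening tag such as "<h1>" is that it will not match a tag that carries attributes, for example <h1 class="big">. The following sketch shows one possible way around that, still using only InStr and Mid. GetTagText and sampleHtml are hypothetical names introduced here for illustration, not part of the function above.

```vbscript
' Sketch: GetTagText is a hypothetical variant that tolerates attributes
' in the opening tag by searching for "<tag" and then the next ">".
Function GetTagText(Html, TagName)
    Dim openPos, contentPos, closePos
    GetTagText = ""
    ' Find the start of the opening tag, ignoring case
    openPos = InStr(1, Html, "<" & TagName, 1)
    If openPos = 0 Then Exit Function
    ' Skip past any attributes to the end of the opening tag
    contentPos = InStr(openPos, Html, ">", 1)
    If contentPos = 0 Then Exit Function
    contentPos = contentPos + 1
    ' Find the closing tag that follows the content
    closePos = InStr(contentPos, Html, "</" & TagName & ">", 1)
    If closePos > 0 Then GetTagText = Trim(Mid(Html, contentPos, closePos - contentPos))
End Function

' Made-up sample data for illustration
Dim sampleHtml, headingText
sampleHtml = "<body><h1 class=""big""> Heading </h1><p>Paragraph</p></body>"
headingText = GetTagText(sampleHtml, "h1")
```

Note that a prefix search like "<p" would also match tags such as <pre>, so this approach is only a rough fit for short tag names.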

edited May 28 '12 at 18:36, answered May 28 '12 at 18:03 by Control Freak


I actually used this solution in the end, as it also solves the problem of tags having class names in the markup.

Function GetFirstMatch(PatternToMatch, StringToSearch)
    Dim regEx, CurrentMatch, CurrentMatches

    Set regEx = New RegExp
    regEx.Pattern = PatternToMatch
    regEx.IgnoreCase = True
    regEx.Global = True
    regEx.MultiLine = True
    Set CurrentMatches = regEx.Execute(StringToSearch)

    GetFirstMatch = ""
    If CurrentMatches.Count >= 1 Then
        Set CurrentMatch = CurrentMatches(0)
        If CurrentMatch.SubMatches.Count >= 1 Then
            GetFirstMatch = CurrentMatch.SubMatches(0)
        End If
    End If
    Set regEx = Nothing
End Function

    title = clean_str(GetFirstMatch("<title[^>]*>([^<]+)</title>",data))
    firstpara = clean_str(GetFirstMatch("<p[^>]*>([^<]+)</p>",data))
    firsth1 = clean_str(GetFirstMatch("<h1[^>]*>([^<]+)</h1>",data))
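
The question also mentions the meta description, keywords and Open Graph data, which the patterns above do not cover. A similar regex approach could probably be applied to them. The sketch below is only an illustration under assumptions: GetMetaContent and sampleHtml are hypothetical names, and the pattern assumes the name (or property) attribute appears before content and that both values are double-quoted. Pages that order or quote the attributes differently would need a different pattern.

```vbscript
' Sketch: GetMetaContent is a hypothetical helper for <meta> tags.
' Assumes name/property comes before content and both use double quotes.
Function GetMetaContent(AttrName, AttrValue, Html)
    Dim regEx, matches
    GetMetaContent = ""
    Set regEx = New RegExp
    regEx.IgnoreCase = True
    regEx.Global = False
    ' [^>]* keeps the match inside a single tag
    regEx.Pattern = "<meta[^>]*" & AttrName & "\s*=\s*""" & AttrValue & _
                    """[^>]*content\s*=\s*""([^""]*)"""
    Set matches = regEx.Execute(Html)
    If matches.Count >= 1 Then GetMetaContent = matches(0).SubMatches(0)
    Set regEx = Nothing
End Function

' Made-up sample data for illustration
Dim sampleHtml, description, ogTitle
sampleHtml = "<meta name=""description"" content=""A short summary"">" & _
             "<meta property=""og:title"" content=""My Page"">"
description = GetMetaContent("name", "description", sampleHtml)
ogTitle = GetMetaContent("property", "og:title", sampleHtml)
```

AttrValue is inserted into the pattern unescaped, so values containing regex metacharacters would need escaping first.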

answered Jun 6 '12 at 19:32 by Chris Dowdeswell








