Basic4ppc - Windows Mobile Development  


Get all links from an html file using regex


#1
10-12-2007, 03:01 PM
MM2forever
Junior Member
 
Join Date: Jun 2007
Location: Germany
Posts: 45

Hi guys,
I am trying to get links from an html file. My code (important parts of it) looks like this:
regex.new1("href=(.*?)[\s>]")

match.value=regex.match(htmtemp)
Do While match.success=True
list.add(SubString(htmtemp,match.index,match.length))
match.value=match.nextmatch
Loop

I'm not getting any results. What's wrong? Is it my regular expression itself?

Thank you for your help
Christian
[MM2forever]
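
One thing that can be checked independently of Basic4ppc is what this pattern actually matches. The sketch below uses Python's re module as a stand-in (an assumption: the pattern uses only syntax common to both engines, and the sample HTML is made up for illustration). It suggests that the whole match includes the href= prefix and the terminating character, so a SubString over match.index and match.length would return more than the link, while capture group 1 holds only the attribute value (still wrapped in quotes when the attribute is quoted).

```python
import re

# Hypothetical sample input, not taken from the thread.
html = '<a href="page1.html">One</a> <a href=page2.html>Two</a>'

# The pattern from the question, unchanged.
pattern = re.compile(r'href=(.*?)[\s>]')

# Whole match: what index/length would select.
whole = [m.group(0) for m in pattern.finditer(html)]

# Capture group 1: only the text between "href=" and the terminator.
links = [m.group(1) for m in pattern.finditer(html)]

print(whole)
print(links)
```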
__________________
Regards, Christian

>> My PPC Hardware <<


There are 10 kinds of people: Those who understand binary, and those who don't.
#2
10-12-2007, 04:25 PM
Erel
Administrator
 
Join Date: Apr 2007
Posts: 2,954

The pattern is taken from this site: http://sastools.com/b2/post/79393902
You should add a Regex object and a Match object.
Code:
Sub Globals
    'Declare the global variables here.

End Sub

Sub App_Start
    Form1.Show
    If OpenDialog1.Show = cCancel Then AppClose
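    'q holds double-quote characters to splice into the pattern.
    q = Chr(34) & Chr(34)
    'Match "href" in any letter case, optional whitespace, then "=".
    r = "(?:[hH][rR][eE][fF]\s*=)"
    'Skip any leading whitespace, quotes or opening bracket.
    r = r & "(?:[\s"&q&"(']*)"
    'Reject anchors, mailto, location, javascript, css and "this." links.
    r = r & "(?!#|[Mm]ailto|[lL]ocation.|[jJ]avascript|.*css|.*this\.)"
    'Capture the link lazily, up to whitespace, ">", ")" or a quote.
    r = r & "(.*?)(?:[\s>)"&q&"'])"
    Regex.New2(r,true,true)
    'Read the whole file chosen in the dialog into s.
    FileOpen(c1,OpenDialog1.File,cRead)
    s = FileReadToEnd(c1)
    FileClose(c1)
    Match.New1
    Match.Value = regex.Match(s)
    Do While Match.Success
        'Group 1 is the captured link without "href=" or the terminator.
        lstLinks.Add(Match.GetGroup(1))
        Match.Value = Match.NextMatch
    Loop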
End Sub
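
To see what the assembled pattern does without a device at hand, here is a minimal sketch of the same steps in Python's re module. Several things are assumptions: that the two True arguments to Regex.New2 mean ignore-case and multiline, that Python's engine treats this pattern the same way Basic4ppc's does, and the sample HTML, which is made up. The names raw_captures and extract_links are hypothetical helpers, not part of any library.

```python
import re

Q = '"'  # one double quote is enough inside a character class

# Same pattern pieces as in the Basic4ppc code above.
PATTERN = (
    r"(?:[hH][rR][eE][fF]\s*=)"
    + r"(?:[\s" + Q + r"(']*)"
    + r"(?!#|[Mm]ailto|[lL]ocation.|[jJ]avascript|.*css|.*this\.)"
    + r"(.*?)(?:[\s>)" + Q + r"'])"
)

# Assumption: these flags correspond to Regex.New2(r, true, true).
LINK_RE = re.compile(PATTERN, re.IGNORECASE | re.MULTILINE)


def raw_captures(html):
    """Return group 1 of every match, including empty captures."""
    return [m.group(1) for m in LINK_RE.finditer(html)]


def extract_links(html):
    """Return only the non-empty captures."""
    return [link for link in raw_captures(html) if link]


# Hypothetical sample input.
SAMPLE = (
    '<a href="index.html">Home</a>\n'
    "<a HREF = 'docs/page.htm'>Docs</a>\n"
    '<a href="#top">Top</a>\n'
    '<a href="mailto:me@example.com">Mail</a>\n'
    '<link href="style.css" rel="stylesheet">\n'
)

print(raw_captures(SAMPLE))
print(extract_links(SAMPLE))
```

In Python's engine the quoted #top and mailto: links come back as empty captures instead of being skipped. When the lookahead fails after the opening quote has been consumed, the engine backtracks, leaves the quote unconsumed, and matches an empty link that ends at that quote. Dropping empty strings, as extract_links does, handles that. I have not checked whether Basic4ppc behaves the same way.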
#3
10-13-2007, 07:56 AM
MM2forever
Junior Member
 
Join Date: Jun 2007
Location: Germany
Posts: 45

Thank you, the regex works great. I took out the "bracket exception" (or whatever I should call it), though, because it gave me trouble with links like "gnfgn (1)".
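
The effect of the brackets can be sketched in Python's re module, assuming the "bracket exception" means the ( and ) inside the two character classes. The file name file(1).zip is a made-up example without a space, and the lookahead is written in lower case here because the ignore-case flag covers the rest. With ")" in the terminator class the capture stops at the first closing bracket. With the brackets removed the full link comes through.

```python
import re

PREFIX = r"(?:href\s*=)"
GUARD = r"(?!#|mailto|location.|javascript|.*css|.*this\.)"

# Variant with "(" in the opening class and ")" in the terminator class.
with_brackets = re.compile(
    PREFIX + r"""(?:[\s"(']*)""" + GUARD + r"""(.*?)(?:[\s>)"'])""",
    re.IGNORECASE,
)

# Variant with both brackets removed from the character classes.
without_brackets = re.compile(
    PREFIX + r"""(?:[\s"']*)""" + GUARD + r"""(.*?)(?:[\s>"'])""",
    re.IGNORECASE,
)

# Hypothetical link containing brackets.
SAMPLE = '<a href="file(1).zip">download</a>'

truncated = with_brackets.search(SAMPLE).group(1)
full = without_brackets.search(SAMPLE).group(1)

print(truncated)
print(full)
```

Both variants still end the capture at whitespace, so a link containing a literal space would be cut at that space either way.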