At the moment I can't connect to the site that serves the HTML page, but this is what happens: I use a procedure similar to the following one to download the HTML source for later parsing:
Code:
Dim Reader As TextReader
Reader.Initialize(HttpUtils.GetInputStream(posturl))  'TextReader decodes the stream as UTF-8 by default
ChannelList.Initialize  'ChannelList is declared elsewhere As List
Dim line As String
line = Reader.ReadLine
Do While line <> Null  'ReadLine returns Null at the end of the stream, so no inner check is needed
	ChannelList.Add(line)
	line = Reader.ReadLine
Loop
Reader.Close
For i = 0 To ChannelList.Size - 1
	channelCode = ChannelList.Get(i)
	'various code for parsing channelCode
Next
TextReader uses UTF-8 encoding by default, which should be OK. I know that I can use Reader.Initialize2 to specify the encoding, but I doubt it would help: I don't know in advance which encoding the page uses, and entities such as &gt; are plain ASCII text anyway, so no choice of encoding would turn them into the characters they stand for.
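For reference, a minimal sketch of the Initialize2 variant, meant as a drop-in for the Initialize line in the code above. "ISO-8859-1" is only an example value, not the page's known charset, and even with the right charset an entity like &gt; would still arrive as the four characters &, g, t, ;.

```b4x
Dim Reader As TextReader
'The second argument names the charset used to decode the stream.
'"ISO-8859-1" is an assumed example; the page's real encoding is unknown.
Reader.Initialize2(HttpUtils.GetInputStream(posturl), "ISO-8859-1")
```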
For instance, with the code above I get:
Code:
-------&gt;&gt; Generalisti &lt;&lt;-------
which should be:
Code:
------->> Generalisti <<-------
Of course, the parsed strings may also contain other HTML entities, which is why I thought a routine to decode HTML entities would be useful.
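A minimal sketch of such a routine in B4A, using only core String, Map, StringBuilder and Bit functions. The Sub name DecodeHtmlEntities is hypothetical. It recognises just a handful of named entities (lt, gt, amp, quot, apos, nbsp) plus numeric references such as &#62; and &#x3E;, it does not handle code points above U+FFFF, and anything it does not recognise is left unchanged.

```b4x
'Hypothetical helper: decodes a few named HTML entities and numeric references.
'Unrecognised or malformed entities are copied through unchanged.
Sub DecodeHtmlEntities(s As String) As String
	Dim named As Map
	named.Initialize
	named.Put("lt", "<")
	named.Put("gt", ">")
	named.Put("amp", "&")
	named.Put("quot", QUOTE)
	named.Put("apos", "'")
	named.Put("nbsp", Chr(160) & "")
	Dim sb As StringBuilder
	sb.Initialize
	Dim i As Int = 0
	Do While i < s.Length
		Dim amp As Int = s.IndexOf2("&", i)
		If amp = -1 Then
			sb.Append(s.SubString(i))  'no more '&': copy the rest
			Exit
		End If
		sb.Append(s.SubString2(i, amp))  'plain text before the '&'
		Dim decoded As String = ""
		Dim semi As Int = s.IndexOf2(";", amp)
		'Only treat it as an entity if a ';' follows closely
		If semi > amp + 1 And semi - amp <= 10 Then
			Dim name As String = s.SubString2(amp + 1, semi)
			If name.StartsWith("#") Then
				Try
					Dim code As Int
					If name.StartsWith("#x") Or name.StartsWith("#X") Then
						code = Bit.ParseInt(name.SubString(2), 16)  'hexadecimal &#xHH;
					Else
						code = Bit.ParseInt(name.SubString(1), 10)  'decimal &#NNN;
					End If
					If code > 0 And code < 65536 Then decoded = Chr(code)
				Catch
					decoded = ""  'malformed number such as &#xZZ;
				End Try
			Else If named.ContainsKey(name) Then
				decoded = named.Get(name)
			End If
		End If
		If decoded = "" Then
			sb.Append("&")  'not a recognised entity: keep the '&' and rescan after it
			i = amp + 1
		Else
			sb.Append(decoded)
			i = semi + 1
		End If
	Loop
	Return sb.ToString
End Sub
```

In the parsing loop it would be called as channelCode = DecodeHtmlEntities(ChannelList.Get(i)); for example, DecodeHtmlEntities("-------&gt;&gt; Generalisti &lt;&lt;-------") returns "------->> Generalisti <<-------". Decoding in a single left-to-right pass means "&amp;lt;" correctly becomes "&lt;" rather than being decoded twice.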