@agraham:
Quote:
|
Chr() is UTF-16 based and only knows about wide characters. By the time characters are inside B4PPC they are UTF-16 characters.
|
Maybe internally, but the Chr() help talks about ASCII and a value range of 0 to 255:
Quote:
|
Returns the ASCII character represented by the given number.
Syntax: Chr (Integer)
Integer ranges from 0 to 255.
|
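One way the two statements might fit together (an assumption, not documented Chr() behaviour): Unicode's first 256 code points are defined to be identical to ISO-8859-1 (Latin-1), so a UTF-16 based Chr() limited to 0..255 would effectively return Latin-1 characters, of which only 0..127 are ASCII. The sketch below uses Python's chr() as a stand-in to check that coincidence.

```python
# Python's chr() stands in for basic4ppc's Chr() here (assumption).
# Unicode code points 0..255 are defined to match ISO-8859-1 (Latin-1).
def latin1_matches_unicode() -> bool:
    for n in range(256):
        # Decode the single byte n as Latin-1 and compare with code point n.
        if bytes([n]).decode("latin-1") != chr(n):
            return False
    return True

print(latin1_matches_unicode())  # True
# Values 128..255 are not ASCII but Latin-1 letters such as the umlauts.
print(chr(0xFC), chr(0xE4), chr(0xDF))  # ü ä ß
```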
Quote:
|
What about MIME/quoted-printable encoding?
They will be treated as any other character stream.
|
What does this mean? :-) If the compiler and/or the OS deal with UTF internally, a conversion is needed whenever a character stream comes in with an encoding other than UTF.
Quote:
|
Http should be UTF-8 based which is why the character coding works correctly.
|
That's not true. An HTTP stream can use UTF-8, but this is not obligatory. The encoding in use is determined by server and client and can be read from an HTTP header (the charset parameter of Content-Type).
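Reading the charset from that header can be sketched as follows. This is a minimal Python illustration, not basic4ppc code; the function name `charset_from_content_type` is hypothetical, and the ISO-8859-1 fallback is an assumption based on the old HTTP/1.1 default for text/* types (RFC 2616), which real servers do not always follow.

```python
from email.message import Message

def charset_from_content_type(content_type: str, default: str = "iso-8859-1") -> str:
    """Return the charset parameter of a Content-Type header value.

    Hypothetical helper; the ISO-8859-1 default is an assumption
    taken from the RFC 2616 default for text/* media types.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset parameter lowercased,
    # or the given default if the header has no charset parameter.
    return msg.get_content_charset(default)

print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/plain"))                # iso-8859-1
```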
Quote:
|
Your details on the POP3 stream seem contradictory. From what you say it sounds like the POP3 stream is the same as the Http stream and so is also UTF-8, which would mean that umlauts are encoded as two bytes.
|
I'm afraid that is a wrong assumption...
Both streams use the same encoding, which is ISO-8859-1 and NOT UTF-8. So, whatever transport is used, a conversion has to take place.
Because I cannot guess what encoding a text is in, I need some hint, and this is usually contained in a header.
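The difference between the two assumptions is easy to see at byte level. In this Python sketch (the function name `encoding_demo` and the sample word are illustrative), ISO-8859-1 stores each umlaut or ß in one byte while UTF-8 needs two, and decoding ISO-8859-1 bytes as if they were UTF-8 turns those characters into U+FFFD replacement characters.

```python
def encoding_demo(text: str = "Grüße") -> dict:
    latin1 = text.encode("iso-8859-1")  # ü and ß take one byte each
    utf8 = text.encode("utf-8")         # ü and ß take two bytes each
    return {
        "latin1_len": len(latin1),
        "utf8_len": len(utf8),
        # Decoding with the correct charset restores the original text.
        "correct": latin1.decode("iso-8859-1"),
        # Wrongly assuming UTF-8 yields U+FFFD replacement characters.
        "wrong": latin1.decode("utf-8", errors="replace"),
    }

result = encoding_demo()
print(result["latin1_len"], result["utf8_len"])  # 5 7
print(result["correct"])                         # Grüße
print(ascii(result["wrong"]))                    # 'Gr\ufffd\ufffde'
```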
Quote:
|
But you also say that bit.New2(1252) works which implies that the stream is actually single byte characters coded to code page 1252.
|
That's absolutely true.
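Why code page 1252 works for an ISO-8859-1 stream can be illustrated in Python (a sketch of the two code pages, not of bit.New2 itself): for German umlauts and ß both code pages use the same byte values. They differ in the range 0x80 to 0x9F, where Windows-1252 places printable characters such as the euro sign and ISO-8859-1 has C1 control characters.

```python
data = "Grüße".encode("iso-8859-1")

# Umlauts and ß have identical byte values in both code pages,
# so decoding the ISO-8859-1 bytes as Windows-1252 gives the same text.
as_1252 = data.decode("cp1252")
as_latin1 = data.decode("iso-8859-1")
print(as_1252 == as_latin1 == "Grüße")  # True

# The code pages differ in 0x80-0x9F: 0x80 is the euro sign in cp1252
# but a C1 control character in ISO-8859-1.
euro_1252 = b"\x80".decode("cp1252")
c1_latin1 = b"\x80".decode("iso-8859-1")
print(euro_1252)        # €
print(repr(c1_latin1))  # '\x80'
```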
So, for me, the conclusions are as follows:
- the programmer does not need to care about character encodings as long as everything is kept in UTF
- strings in basic4ppc are in UTF
- if a (foreign) character stream from outside enters a basic4ppc variable and the stream is not in UTF, a conversion needs to take place
- the conversion can only be done properly if the stream's code page is known and a conversion function supporting that code page is available
- if no code page is specified, basic4ppc seems to interpret the non-UTF stream as ASCII (this is why I could read most of the text, but the umlauts were replaced with squares); ASCII is identical to the lower 7 bits of every ISO-8859 charset
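The ASCII fallback described in the last point can be mimicked in Python. This only illustrates the effect and does not claim to reproduce basic4ppc's exact replacement behaviour: bytes above 0x7F cannot be decoded as ASCII and are replaced by U+FFFD (how that placeholder is drawn, e.g. as a square, depends on the font), while the 7-bit range decodes identically in ASCII and in the ISO-8859 parts checked here.

```python
raw = "Grüße".encode("iso-8859-1")

# Bytes above 0x7F are not ASCII and are replaced by U+FFFD.
as_ascii = raw.decode("ascii", errors="replace")
print(ascii(as_ascii))  # 'Gr\ufffd\ufffde'

# Bytes 0x00-0x7F decode identically in ASCII and these ISO-8859 parts,
# which is why the plain letters survive.
same_lower_half = all(
    bytes([b]).decode("ascii") == bytes([b]).decode(cs)
    for cs in ("iso-8859-1", "iso-8859-2", "iso-8859-15")
    for b in range(0x80)
)
print(same_lower_half)  # True
```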
For an HTTP stream the code page can be determined easily by interpreting the Content-Type header, which contains the charset in use. The (ISO) charset name still has to be mapped to a code page, though.
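A possible mapping from charset names to Windows code page identifiers is sketched below in Python. The table `CODE_PAGES` and the function `code_page_for` are hypothetical and cover only a few charsets; the numbers are the standard Windows code page identifiers (28591 for ISO-8859-1, 28592 for ISO-8859-2, 28605 for ISO-8859-15, 1252 for Windows-1252, 65001 for UTF-8). That bit.New2 accepts these same identifiers is an assumption, suggested only by 1252 working. `codecs.lookup` is used to normalise aliases such as "latin-1" or "windows-1252".

```python
import codecs
from typing import Optional

# Hypothetical table: canonical Python codec name -> Windows code page id.
CODE_PAGES = {
    "iso8859-1": 28591,
    "iso8859-2": 28592,
    "iso8859-15": 28605,
    "cp1252": 1252,
    "utf-8": 65001,
}

def code_page_for(charset: str) -> Optional[int]:
    """Map a charset name (e.g. from Content-Type) to a code page id."""
    try:
        # Normalise aliases: "ISO-8859-1" and "latin-1" both become "iso8859-1".
        canonical = codecs.lookup(charset).name
    except LookupError:
        return None  # unknown charset name
    return CODE_PAGES.get(canonical)  # None if not in the table

print(code_page_for("ISO-8859-1"))    # 28591
print(code_page_for("windows-1252"))  # 1252
```

Unknown or unmapped charsets return None, so the caller can decide on a fallback instead of silently decoding as ASCII.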
cheers
TWELVE