Hello,

I've got a question regarding different / localized character encodings. In the app I'm working on I receive a certain text from an internet server. I use two different transport methods to get this text: one is HTTP and the other is POP3 (mail). The text is independent of the transport method I choose and always uses the same character encoding. The text is then displayed in a textbox control. It contains non-US-ASCII characters (German umlauts, for example).

For the HTTP method I use the code page style webresponse object:
Code:
Response.New2(1252)
which works fine and as expected (Response.New1 leaves me with unprintable characters). The text is in the ISO 8859-1 encoding, which is an 8-bit extension of 7-bit ASCII (see ISO/IEC 8859 on Wikipedia). Code page 1252, used in the response, is pretty much the same as ISO 8859-1 except for the range 0x80 to 0x9F, where ISO 8859-1 has control characters, which does not matter in this case (see Windows-1252 on Wikipedia).

If I receive the same text from the POP3/mail server, I end up with unprintable characters (squares). Using a network sniffer I can see that the text is encoded in exactly the same way as when it is received through HTTP. So let's say the text contains a German "ö", which has the hex code F6 in ISO 8859-1. Due to the lack of any code page handling in B4P (except a few instructions such as the mentioned webresponse.new2), my plan was just to substitute the ISO codes for umlauts using a
Quote:
but this does not work, probably because Chr() does not know about code page 1252 or ISO 8859-1.

Questions regarding that matter:
- How does Basic4PPC handle different code pages?
- Does it handle them at all, or does it rely completely on UTF-8 encoding (Chr() does not appear to be able to cope with UTF-8) or on ASCII encoding?
- What about MIME/quoted-printable encoding?
- How can I solve the problem outlined above? Manual character conversion is relatively complex and time-consuming.

Kind regards
TWELVE
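The byte-level behaviour described in this post can be reproduced outside Basic4ppc. The following is a minimal Python sketch, not Basic4ppc code; the sample word "schön" and all variable names are made up for illustration. It shows that the byte F6 is "ö" under both ISO 8859-1 and code page 1252, that the same bytes are rejected or garbled by an ASCII or UTF-8 decoder, and that a quoted-printable "=F6" yields the same byte once unwrapped.

```python
import quopri

# "schön" as it would travel on the wire in ISO 8859-1 (illustrative sample).
raw = b"sch\xf6n"

# Both single-byte encodings map 0xF6 to "ö".
latin1_text = raw.decode("iso-8859-1")
cp1252_text = raw.decode("cp1252")

# The same bytes are not valid UTF-8, so a strict decoder refuses them.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# A lenient ASCII decoder substitutes a replacement character instead,
# which is typically what ends up rendered as a square in a text box.
lenient = raw.decode("ascii", errors="replace")

# Quoted-printable only wraps the byte as "=F6"; after unwrapping,
# the result still has to be decoded with the right code page.
qp_bytes = quopri.decodestring(b"sch=F6n")
qp_text = qp_bytes.decode("cp1252")
```

The point of the sketch is that the transport (HTTP, POP3, quoted-printable) does not change the bytes; only the decoder chosen on the receiving side decides whether "ö" or a placeholder appears.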
Meanwhile I found a solution for my particular problem:
Since I use the network library to communicate with the POP3 server and a bitwise object to convert between strings and binary bytes, the following works in the same way as the solution I use for the HTTP response:
Code:
bit.New2(1252)
cheers
TWELVE
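As an illustration of what a code-page-aware string/byte converter does, here is a small Python sketch. It is not the Basic4ppc Bitwise library; the helper names `bytes_to_string` and `string_to_bytes` are hypothetical, and the mapping from a numeric code page to a codec name (`cp1252`) is an assumption that holds for Python's built-in Windows code page codecs.

```python
def bytes_to_string(data: bytes, codepage: int = 1252) -> str:
    """Decode raw bytes using a Windows code page number (hypothetical helper)."""
    return data.decode(f"cp{codepage}")


def string_to_bytes(text: str, codepage: int = 1252) -> bytes:
    """Encode a string to raw bytes using a Windows code page number."""
    return text.encode(f"cp{codepage}")


# Round trip of a German word: ö is 0xF6 and ß is 0xDF in code page 1252.
wire = string_to_bytes("Größe")
back = bytes_to_string(wire)

# Where the two encodings differ: 0x80 is the euro sign in code page 1252
# but a C1 control character in ISO 8859-1.
euro_cp1252 = b"\x80".decode("cp1252")
ctrl_latin1 = b"\x80".decode("iso-8859-1")
```

For text that only uses letters and umlauts, decoding ISO 8859-1 data with code page 1252 gives the same result, which matches what is reported above.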
Hi TWELVE,
JamesC had a similar problem with German characters coded in a single byte (see "What happened to the ß?"). Erel's solution was the same, using the binary file and bin.New2(c,Code Page number).

Hi Erel,
The link in the help file for the code page numbers doesn't work anymore; it says "Content not found".

Best regards.
__________________
Klaus
Switzerland
Last edited by klaus : 05-28-2008 at 10:38 AM.
Code page link: Code Page Identifiers
It was updated in version 6.30.
@agraham:
Maybe internal, but the Chr() help talks about ASCII and a value range of 0 to 255.
Both streams are in the same encoding, which is ISO 8859-1 and NOT UTF-8. So it's clear that no matter which transport is used, a conversion has to take place. Because I cannot guess which encoding a text is in, I need some hint, and this is usually contained in a header.

So for me the conclusions from this are as follows:
- The programmer does not need to care about character encodings as long as everything is kept in UTF.
- Strings in basic4ppc are in UTF.
- If a (foreign) character/stream from outside enters a basic4ppc variable, a conversion needs to take place if the stream is not in UTF.
- The conversion can only be done properly if the stream's code page is known and a conversion function supporting a code page is available.
- If no code page is specified, basic4ppc seems to interpret the non-UTF stream as ASCII (this is why I could read most of the text, but the umlauts were replaced with squares), which corresponds to the lower 7 bits of any ISO 8859 charset.

For an HTTP stream this can be achieved easily by interpreting the Content-Type header, which contains the charset used. The (ISO) charset number still needs to be converted to a code page, though.

cheers
TWELVE
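The last step, reading the charset from a Content-Type header and turning it into a code page number, could look roughly like the Python sketch below. It is an illustration outside Basic4ppc: the function names are hypothetical, the lookup table lists only a few Windows code page identifiers, and falling back to ISO 8859-1 when no charset is given is an assumption, not something the thread states.

```python
from email.message import Message

# A few charset names and their Windows code page identifiers.
CHARSET_TO_CODEPAGE = {
    "us-ascii": 20127,
    "iso-8859-1": 28591,
    "iso-8859-15": 28605,
    "windows-1252": 1252,
    "utf-8": 65001,
}


def charset_from_content_type(header_value: str, default: str = "iso-8859-1") -> str:
    """Extract the charset parameter (lower-cased) from a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = header_value
    charset = msg.get_content_charset()
    # Assumption: fall back to ISO 8859-1 when the header names no charset.
    return charset if charset else default


def codepage_for(header_value: str):
    """Return the code page number for a Content-Type value, or None if unknown."""
    return CHARSET_TO_CODEPAGE.get(charset_from_content_type(header_value))


cp_latin1 = codepage_for("text/plain; charset=ISO-8859-1")
cp_utf8 = codepage_for('text/html; charset="utf-8"')
cp_missing = codepage_for("text/html")
```

Note that ISO 8859-1 has its own identifier (28591); using 1252 instead, as done earlier in the thread, gives the same result for printable Western European text.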
basic4ppc doesn't interpret anything; it receives UTF-16 from a stream. How the encoding is treated depends on the stream. How are you getting this ASCII default? I assume you are using a BinaryFile object as the stream, which if opened by New1 gives you the choice of ASCII or UTF-8, or if opened by New2 requires a code page to be specified. I see no default behaviour.

EDIT: I'm wrong again about Webby stuff and the WebResponse handling things. I just saw your "Response.New2(1252)" in the first post. I suppose you need to New the WebRequest with the required code page and use the same code page for Newing the WebResponse!
Last edited by agraham : 05-29-2008 at 03:08 PM.