String types in Delphi 2007 vs. Delphi 2009+: differences and migration

CodeGear Delphi 2007

This is not a divine revelation, just a personal reference note.

In Delphi 2007, AnsiString was the default string type used for general-purpose string manipulation.
This means that variables declared simply as string were compiled as AnsiString. This changed in Delphi 2009, where string became an alias for UnicodeString.

AnsiString in Delphi 2007:
  • Character size: Char was an 8-bit (1-byte) AnsiChar; in Delphi 2009+ it became a 16-bit WideChar (a UTF-16 code unit) by default.
  • Encoding: How the bytes are interpreted depends on the operating system's active ANSI code page (e.g., Windows-1252 for Western European locales, or code page 936 for Simplified Chinese).
  • Length: AnsiString is dynamically allocated and limited only by available memory, unlike the older ShortString, which is limited to 255 characters.
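
The length difference between the two types can be seen in a small console program. This is only a sketch; the program and variable names are made up for illustration, and it assumes a desktop (Windows) compiler where AnsiString and ShortString are available.

```delphi
program ShortVsAnsi;

{$APPTYPE CONSOLE}

var
  LongText: AnsiString;
  ShortText: ShortString;
begin
  // AnsiString grows as needed; 1000 characters is no problem.
  LongText := StringOfChar(AnsiChar('x'), 1000);

  // ShortString holds at most 255 characters, so the copy is truncated.
  ShortText := LongText;

  Writeln('Length(LongText)  = ', Length(LongText));   // 1000
  Writeln('Length(ShortText) = ', Length(ShortText));  // 255
end.
```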

The primary distinction in Delphi 2009 and later is the shift to Unicode:
  • Char is now WideChar, a 16-bit UTF-16 code unit. Characters outside the Basic Multilingual Plane take two Chars (a surrogate pair).
  • string is an alias for UnicodeString (UTF-16, 2 bytes per code unit).
  • AnsiString still exists but is used primarily for backward compatibility or for interfacing with non-Unicode systems and APIs.
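
A quick way to check these sizes on a given compiler is to print them. The sketch below assumes Delphi 2009 or later on Windows (StringElementSize and StringCodePage do not exist in Delphi 2007); the program and variable names are illustrative only.

```delphi
program CharSizes;

{$APPTYPE CONSOLE}

var
  U: string;       // UnicodeString in Delphi 2009+
  A: AnsiString;
begin
  U := 'Delphi';
  A := 'Delphi';

  // SizeOf(Char) is 2 in Delphi 2009+ (it was 1 in Delphi 2007).
  Writeln('SizeOf(Char)     = ', SizeOf(Char));
  Writeln('SizeOf(AnsiChar) = ', SizeOf(AnsiChar));

  // Length counts elements (UTF-16 code units or bytes), not bytes,
  // so the byte size has to be computed explicitly.
  Writeln('Length(U) = ', Length(U), ', bytes = ', Length(U) * SizeOf(Char));
  Writeln('Length(A) = ', Length(A), ', bytes = ', Length(A) * SizeOf(AnsiChar));

  // The same facts are available at run time from the string itself.
  Writeln('StringElementSize(U) = ', StringElementSize(U));
  Writeln('StringCodePage(U)    = ', StringCodePage(U));
end.
```

Code that computes buffer sizes as Length(S) is the usual place where the 1-byte assumption from Delphi 2007 breaks.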
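
When moving from Delphi 2007 to Delphi 2009 (or later):
  • AnsiString to UnicodeString: string changes meaning, so assignments between ANSI and Unicode strings now trigger implicit conversions. Converting Unicode to ANSI loses any character the target code page cannot represent, and ANSI bytes interpreted under the wrong code page come out garbled.
  • Explicit casts: Use AnsiString(myUnicodeString), PAnsiChar(AnsiString(myUnicodeString)) and similar where needed, and be aware of character mapping (e.g., byte #128 is the Euro sign in Windows-1252, but the Euro sign in Unicode is U+20AC, not U+0080). Casting a UnicodeString directly to PAnsiChar only reinterprets the pointer and does not convert the data.
  • Migration required: Code using Char and PChar needs updating, especially where it assigns between ANSI and Unicode types or assumes one byte per character.
  • If you need a 1-byte buffer, use RawByteString instead of string or Char. RawByteString is an AnsiString with no code page of its own (declared as type AnsiString($FFFF)).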
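
The conversions in the list above can be sketched as follows. This assumes Delphi 2009 or later on Windows. Win1252String and LegacyApi are hypothetical names introduced only for this example; pinning the code page to 1252 keeps the result independent of the machine's locale.

```delphi
program AnsiUnicodeCasts;

{$APPTYPE CONSOLE}

type
  // Hypothetical alias: an AnsiString pinned to Windows-1252.
  Win1252String = type AnsiString(1252);

// Hypothetical stand-in for an ANSI-only routine.
procedure LegacyApi(P: PAnsiChar);
begin
  Writeln(P);
end;

var
  A: Win1252String;
  U: string;
  R: RawByteString;
begin
  // One byte with value 128: the Euro sign in Windows-1252.
  SetLength(A, 1);
  A[1] := AnsiChar($80);

  // ANSI -> Unicode converts through the string's code page.
  U := string(A);
  Writeln(Ord(U[1]));   // 8364 = U+20AC, not 128

  // Unicode -> ANSI maps the character back to a single byte.
  A := Win1252String(U);
  Writeln(Ord(A[1]));   // 128

  // A character with no Windows-1252 equivalent is lost on conversion
  // (it typically comes back as '?').
  U := WideChar($4E2D);
  A := Win1252String(U);
  Writeln(Ord(A[1]));

  // RawByteString takes the bytes as they are and keeps their code page.
  A := Win1252String('Delphi');
  R := A;
  Writeln(StringCodePage(R));   // 1252

  // Convert first, then take the pointer; PAnsiChar(U) would not convert.
  U := 'Delphi';
  LegacyApi(PAnsiChar(AnsiString(U)));
end.
```

The explicit casts also silence the compiler's implicit-conversion warnings, which makes the remaining warnings a useful to-do list during migration.
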
Delphi and Unicode, Marco Cantù, December 2008
