Sprezzatura :: Making Databases Happen

DLL Prototyping - Strings and Things Part 1

By Captain C | Friday, 26 June 2009 10:23

In our last post about DLL Prototyping we looked at using Namespaces to avoid collisions with other programs in the system. Over the next few articles we're going to take a look at Windows API calls that take strings as arguments along with the considerations you need to take into account when you use them.

Unicode and ANSI functions

One thing you'll find when working with the Windows API is that nearly every function that accepts string arguments has two versions: one that takes ANSI strings (typically one byte per character) and another that takes Unicode strings (UTF-16, two bytes per code unit). By convention each of these functions is named slightly differently when exported from its parent DLL: the ANSI version is suffixed with an "A" and the Unicode version is suffixed with a "W" (for "wide char", which is a 2-byte character type).
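To make the size difference concrete, here is a minimal sketch in portable C that works out how big a buffer must be for the same text in each flavour. The names wide_unit, ansi_buffer_bytes and wide_buffer_bytes are made up for this illustration (real Windows code would use WCHAR from the Windows headers), and the wide calculation assumes plain ASCII text where every character fits in one 16-bit unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the Windows WCHAR type: one 16-bit code unit. */
typedef uint16_t wide_unit;

/* Bytes needed for an ANSI buffer holding `s`, including the
   1-byte terminating zero. */
size_t ansi_buffer_bytes(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;
}

/* Bytes needed for a wide buffer holding the same text, including the
   2-byte terminating zero. Assumes each character fits in a single
   16-bit unit, which holds for plain ASCII. */
size_t wide_buffer_bytes(const char *s)
{
    assert(s != NULL);
    return (strlen(s) + 1) * sizeof(wide_unit);
}
```

For the seven-character title "Notepad" this gives 8 bytes for the ANSI buffer and 16 bytes for the wide one.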

For example, if you want to use the GetWindowText() function it's actually exported from User32.dll as two functions:

GetWindowTextW()

GetWindowTextA()

There is no exported function named GetWindowText!

The documentation is written to use the plain function name because the Windows header files define that name as a macro, so a C/C++ compiler automatically substitutes the correct A or W version when it sees the plain version. When prototyping the function for use with OpenInsight (OI) this is something you have to take care of yourself!

To carry on with the GetWindowText example here's how Microsoft documents the function:

 
 int GetWindowText( HWND   hWnd,
                    LPTSTR lpString,
                    int    nMaxCount
                  );
   

However, if you were writing a C/C++ program here's what the compiler would actually use for a Unicode program:

 
 int GetWindowTextW( HWND   hWnd,
                     LPWSTR lpString,
                     int    nMaxCount
                   );
   

and here's what it would use for an ANSI program:

 
 int GetWindowTextA( HWND  hWnd,
                     LPSTR lpString,
                     int   nMaxCount
                   );
   

These last two are the definitions you would have to use when prototyping the function in OpenInsight - NOT the first definition as given in the documentation.

TEXT strings (LPTSTR and LPCTSTR)

As well as the difference in the function name, notice how the type of the string argument "lpString" has also changed. In the Unicode version the argument has been translated from LPTSTR to LPWSTR, while in the ANSI version it has been translated to LPSTR.

This type of string argument (LPTSTR) is called a "TEXT string" and functions that support Unicode and ANSI versions always want them as arguments. However, in reality there is no TEXT string type - what actually happens is that the C/C++ compiler resolves the TEXT string to either a Unicode string (LPWSTR) or an ANSI string (LPSTR) in the same way that it works out which version of the function ("W" or "A") to use at compile time.
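The compile-time substitution can be sketched in a few lines of portable C. This is only an illustration of the mechanism, not a copy of the Windows headers: every name prefixed with demo_ is hypothetical, and DEMO_UNICODE stands in for the switch that a real Windows build uses to choose between the two flavours.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for LPSTR and LPWSTR. */
typedef char     *demo_LPSTR;
typedef uint16_t *demo_LPWSTR;

/* Only the suffixed functions really exist. Each returns the number of
   code units before the terminating zero. */
size_t demo_text_length_A(demo_LPSTR s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0) n++;
    return n;
}

size_t demo_text_length_W(demo_LPWSTR s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0) n++;
    return n;
}

/* The plain names are aliases chosen at compile time. DEMO_UNICODE is a
   hypothetical switch used only in this sketch. */
#ifdef DEMO_UNICODE
typedef demo_LPWSTR demo_LPTSTR;
#define demo_text_length demo_text_length_W
#else
typedef demo_LPSTR demo_LPTSTR;
#define demo_text_length demo_text_length_A
#endif
```

Code written against demo_LPTSTR and demo_text_length compiles to the "A" pair unless DEMO_UNICODE is defined, in which case it compiles to the "W" pair. Nothing outside the C compiler sees the plain names, which is why a prototype written elsewhere has to name the suffixed function and the concrete string type directly.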

So, when dealing with Windows API functions that support this dual string interface you must:

- Prototype the function with the "W" or "A" suffix
- Prototype the function arguments using the correct string type

This is all very well but how do you know when you have to watch out for the W/A suffix? Well, there are two easy ways to tell:

Read the documentation! Microsoft documentation has been improving steadily over the years - these days it usually mentions the A and W versions explicitly, along with the DLL they are exported from (usually at the bottom of the actual function documentation).

Look at the arguments passed. If you see any arguments that look like this:
- LPTSTR (pointer to a TEXT string)
- LPCTSTR (pointer to a constant TEXT string)
then you can be pretty sure you're dealing with a function that has an ANSI and a Unicode version.

(Incidentally, the "constant" version of the argument means that the function promises not to alter the contents of the string that you pass it - it actually makes no difference to the way you prototype the function in OI.)

Unicode or ANSI?

Having a dual string interface harks back to the days of the Windows 9x operating systems, which were basically ANSI systems with a thin layer of Unicode bolted on top. Modern NT-based systems (Win2K, XP, Vista/Win2008, Windows 7, etc.) all use Unicode internally, and the "A" functions simply convert the ANSI strings they are passed to Unicode and invoke the "W" function instead, thus incurring extra overhead. In fact, you may notice that many of the newer API functions that were not present on Win9x systems do not have a dual interface and are exclusively Unicode, so there's no suffix/TEXT string translation to worry about.
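The extra work done by an "A" function can be pictured with a small portable C sketch. It is not how Windows itself is written: demo_wchar, demo_length_W and demo_length_A are hypothetical names, and the widening step simply zero-extends each byte, which is only correct for 7-bit ASCII (a real system converts using the active code page).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for WCHAR: one 16-bit code unit. */
typedef uint16_t demo_wchar;

/* The "real" implementation only understands wide strings: it counts
   the 16-bit units before the terminating zero. */
size_t demo_length_W(const demo_wchar *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0) n++;
    return n;
}

/* The "A" entry point: copy the input into a temporary wide buffer,
   forward to the "W" function, then free the buffer. The allocation
   and the copy are the overhead a direct "W" call avoids. */
size_t demo_length_A(const char *s)
{
    size_t i, n, result;
    demo_wchar *tmp;

    assert(s != NULL);
    n = strlen(s);
    tmp = (demo_wchar *)malloc((n + 1) * sizeof *tmp);
    if (tmp == NULL)
        return 0; /* sketch only: allocation failure is reported as 0 */

    /* Zero-extend each byte, including the terminator (ASCII only). */
    for (i = 0; i <= n; i++)
        tmp[i] = (unsigned char)s[i];

    result = demo_length_W(tmp);
    free(tmp);
    return result;
}
```

Both entry points give the same answer for the same text, but the "A" route pays for an allocation and a copy on every call.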

As OpenInsight no longer supports Win9x it makes sense to exclusively use the "W" versions where you can, thus avoiding the overhead of the internal "A" to "W" translation.

What's next?

That's the heavy theory lesson over now, so you should be aware of the "A" and "W" style Windows API functions and the fact that they take different types of strings. In Part 2 of this series, we'll be taking a look at actually passing strings to these functions and the different ways this can be done.

Labels: DLL, DLL Prototyping, OpenInsight, Windows API
