Wednesday, 10 March 2010

UTF8 - To Malloc or not to Malloc - that is another question

In our recent post on using memory pre-allocation when building large strings, commenter M@ pointed out, quite correctly, that using the normal [] operators in UTF8 mode results in a severe performance hit. UTF8 characters vary in width, so the character position of the insertion point has to be calculated afresh on each iteration.

One suggested workaround was to switch temporarily to ANSI mode for the [] operation and then switch back afterwards. This is a valid solution, and one we've used ourselves before, but it does create a possible failure point: if your system hits a fatal debug condition before you switch back, you might be stuck in ANSI mode without knowing it, which could result in subsequent data corruption.

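A safer alternative is to use the PutBinaryValue function that we documented here. It ignores any string encoding and does a straightforward binary copy to the specified offset. Because that offset is a byte position rather than a character position, nothing has to be scanned to find the insertion point.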

Here's the pre-allocation sample program from the previous post, updated to use the binary functions:


Subroutine ZZ_SpeedTest( Void )

   Declare Function TimeGetTime

   startTime    = TimeGetTime()

   * Work in bytes throughout, since PutBinaryValue ignores the string encoding
   stringLength = GetByteSize( @Upper.Case : @Fm )
   totalLength  = stringLength * 99999

   * Pre-allocate the whole result in one go
   newArray     = Space(totalLength)
   arrayPtr     = 1

   For loopPtr = 1 To 99999
      * Copy the chunk straight into the buffer at the current offset
      PutBinaryValue( newArray, arrayPtr, CHAR, @Upper.Case : @Fm )
      arrayPtr += stringLength
   Next

   endTime   = TimeGetTime()
   totalTime = endTime - startTime

   Call Msg(@Window, "Total time was " : totalTime)

Return


This option took 95 milliseconds in UTF8 mode in our testing, pretty much on a par with the [] operator in ANSI mode. (As an aside, the [] operator in UTF8 mode took... well, we don't know actually. We gave up after 10 minutes of waiting for it to finish!)

We also tested the concatenation (:=) option in UTF8 mode. This slowed the program down by half: better than the [] operators, but still not great.
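
For comparison, a concatenation variant of the test might look something like the sketch below. It is an illustration only, not the exact code that was timed: the routine name ZZ_SpeedTest_Concat is made up for the example, and the body simply swaps the pre-allocated buffer and the PutBinaryValue call for the := operator.


Subroutine ZZ_SpeedTest_Concat( Void )

   * Hypothetical variant for illustration only - the routine name is an assumption

   Declare Function TimeGetTime

   startTime = TimeGetTime()
   newArray  = ""

   * No pre-allocation: append each chunk and let the string grow as it goes
   For loopPtr = 1 To 99999
      newArray := @Upper.Case : @Fm
   Next

   endTime   = TimeGetTime()
   totalTime = endTime - startTime

   Call Msg(@Window, "Total time was " : totalTime)

Return


There is no offset to keep track of in this version, because := always appends to the end of the string.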
