Sprezzatura :: Making Databases Happen

To Malloc or not to Malloc - that is the question

By Sprezz | Tuesday, 9 March 2010 23:21 | 4 Comments

As part of one of our conference presentations we're showing log file analysis in OpenInsight. If you've ever tried any of this you'll know that the files can get pretty big pretty quickly, and trawling through them extracting substrings and building new arrays can be a time-consuming process.

For one particular application we had to build a field mark delimited array from the data we were retrieving from file and - conscious of the penalty incurred by using <-1> operators - we were merrily using := String : @Fm syntax. Despite this attempt at efficiency the operation was still woefully slow - not so much "put the kettle on" slow as "cook a gourmet dinner from scratch, eat it and wash up" slow.

So we had to find a way to speed up the operation. Fortunately, as we've posted on the Revelation forum before, there is an easy way to do this - preallocating memory. Let's compare the rival techniques and see how we get on.

The <-1> Operator

Subroutine ZZ_SpeedTest( Void )

   Declare Function TimeGetTime

   startTime = TimeGetTime()
   newArray  = ""

   For loopPtr = 1 To 99999
      newArray<-1> = @Upper.Case
   Next

   endTime   = TimeGetTime()
   totalTime = endTime - startTime

   Call Msg(@Window, "Total time was " : totalTime)

Return

This option took 4207 milliseconds.

The := Operator

Subroutine ZZ_SpeedTest( Void )

   Declare Function TimeGetTime

   startTime = TimeGetTime()
   newArray  = ""

   For loopPtr = 1 To 99999
      newArray := @Upper.Case : @Fm
   Next

   newArray[-1,1] = ""
   endTime   = TimeGetTime()
   totalTime = endTime - startTime

   Call Msg(@Window, "Total time was " : totalTime)

Return

This option took 4164 milliseconds.

The Preallocation Option

Subroutine ZZ_SpeedTest( Void )

   Declare Function TimeGetTime

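   startTime    = TimeGetTime()

   * Work out the size of one entry (the string plus its @Fm delimiter)
   * and allocate the whole result in a single operation
   stringLength = Len(@Upper.Case : @Fm)
   totalLength  = stringLength * 99999
   newArray     = Space(totalLength)
   arrayPtr     = 1

   For loopPtr = 1 To 99999
      * Overwrite the next slot in place rather than growing the string
      newArray[arrayPtr, stringLength] = @Upper.Case : @Fm
      arrayPtr += stringLength
   Next

   * Note: newArray still ends with a trailing @Fm at this point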

   endTime   = TimeGetTime()
   totalTime = endTime - startTime

   Call Msg(@Window, "Total time was " : totalTime)

Return

This option took 81 milliseconds - roughly 50 times quicker than either of the other two - and the bigger the string you are creating, the more impressive the improvement in speed.

The reason for this is simple - each time we concatenate to a string the engine has to grab more memory, using an operation known as a malloc (memory allocation). These operations are resource-intensive, as memory has to be juggled around to make room for the longer string, and resource-intensive operations are, by their very nature, slow. By preallocating the space needed we do all of our mallocing in one fell swoop and can concentrate on the task at hand.
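
The same idea can be stretched to items whose lengths differ. What follows is only a minimal sketch under stated assumptions, not something we have timed: ZZ_PreallocSketch, itemCount and maxLength are hypothetical names, and maxLength is an assumed upper bound on the length of any one item including its trailing @Fm. The buffer is over-allocated once, filled using the [] operators, and then cut back to the part that was actually written.

Subroutine ZZ_PreallocSketch( Void )

   * Hypothetical values - adjust to suit the data being processed
   itemCount = 99999
   maxLength = 16

   * One allocation, big enough for the worst case
   newArray  = Space(itemCount * maxLength)
   arrayPtr  = 1

   For loopPtr = 1 To itemCount
      * Items vary in length, so measure each one before writing it
      item       = "Item " : loopPtr : @Fm
      itemLength = Len(item)
      newArray[arrayPtr, itemLength] = item
      arrayPtr  += itemLength
   Next

   * Keep only what was written, dropping the final @Fm and the unused padding
   newArray = newArray[1, arrayPtr - 2]

   Call Msg(@Window, "Final length was " : Len(newArray))

Return

Over-allocating costs some memory up front, so the bound is best kept reasonably tight. As the comments on this post point out, the [] operators are expensive in UTF8 mode, so this sketch assumes UTF8 mode is off.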

Labels: Basic+, Performance

4 Comments:

I'd add a warning - do not attempt the string preallocation method while in UTF8 mode! I tried the example above in UTF8 mode and had to stop Oengine after waiting 10 minutes.

The string [pos,len] operators in UTF8 mode are veeeery expensive. The further into the string 'pos' is, the longer it will take to process (so the example loop above gets slower and slower). Essentially this is because in UTF8 mode the byte position of character 'pos' has to be calculated from the start of the string each time, to cater for multi-byte characters of different lengths - (correct me if I'm wrong ;)

However, for an example like the one above you don't need to be in UTF8 mode for it to work - even if the strings contain multi-byte characters (as does @upper.case in UTF8 mode). So you can temporarily ensure UTF8 mode is off before such a loop with:

   utf8mode = isUTF8()
   Call SetUTF8( 0 )

and return to the previous state afterwards with:

   Call SetUTF8( utf8mode )

Cheers, M@

By Matt Crozier, At 10 March 2010 at 06:58

Hi M@,

Glad you pointed that out - you are indeed correct :) UTF8 and the [] operators really don't play nicely together.

That said there's been some new functionality implemented in the engine and compiler for the upcoming OI 9.2. The [] operators have been extended so that you can pass in a BYTE position instead of a CHARACTER position, which is something we'll be looking at in a new post later this week. It really pulls the speed of string scanning and parsing in UTF8 mode back into the real world!

(As an aside you can also use the PutBinaryValue function we mentioned in a recent post instead of the [] operators if you wish when replacing parts of a string)

By Captain C, At 10 March 2010 at 07:29

A post on using PutBinaryValue can be found here:

http://www.sprezzatura.com/blog/2010/03/utf8-to-malloc-or-not-to-malloc-that-is.html

By Captain C, At 10 March 2010 at 10:09

Aye Aye, Cap'n!

Good point about aborting while in ANSI mode, and how the Binary functions are safer - I might revisit some of our code! The OI 9.2 string handling enhancements sound great.

Cheers, M@

By Matt Crozier, At 10 March 2010 at 20:06
