Those of you with long memories may well remember the heyday of Advanced Revelation, when it seemed that every week somebody would come out with a new BFS – or “Bond” as we then called it. Starting with Icicle Software and their Lotus 1-2-3 bond, followed by AS/400 bonds, dBase bonds, SQL Server bonds and Universe bonds – the list grew and grew. These days we seem not to hear quite so much about them, so with the forthcoming re-introduction of the SQL bond in 9.2 we thought it might be time to revisit the subject.
So what is a BFS? The clue is – as ever – in the title: Base Filing System. A Base Filing System is the lowest level of file system available in a Revelation environment. Ultimately it is an abstraction of all the operations required to manipulate a particular file format, encapsulated into one conceptual black-box program. We say “conceptual” because there is nothing stopping a BFS being made up of more than one program – a classic example of this is the Linear Hash BFS, which uses both RTP57 and RTP57A to implement Linear Hash.

Consider the problem faced by Revelation back in the day. They wanted to create a tool set that could work against ANY back end without requiring reprogramming. Such a tool set would be a boon for developers and users alike. So they came up with the concept of a BFS. They created a list of primitive codes that a filing system would need to support, and then codified this in the BFS standard. Basically they said “if you respond in the following manner to the following requests, we really don’t care what it is you’re talking to on the back end”.

So how does it work? In the first instance we need to provide OpenInsight with some information about the data we're trying to access. If you think about it, all that OI needs to be told is the name of a data row that defines both the location we're interested in and the name of the BFS to use. And this is exactly what a volume pointer is. Now admittedly it might need some additional information, such as where to keep shadow dictionaries, but the location and the BFS are the main constituents. Attaching a volume could be mapped as follows :-

So let's describe a simple application of this by reference to an imaginary BFS called DBASE.BFS - a program that has been created to read and write dBASE files. We can obviously store dBASE files anywhere on the disk, so we need to tell OI their exact location.
Let's assume that we're going to keep them in DATAVOL - yes we can mix and match different file types at the same location because we're using different BFSs. So to "dry run" the above diagram we firstly create a volume pointer called "DBASE_INFO" that tells OI that the BFS is called DBASE.BFS and that the location is called DATAVOL...
So now if you looked at SYSTABLES you would find rows for each DBF table detailing its location and the fact that it uses DBASE.BFS. For the sake of argument let us assume that one such row was called MEMBERS.DBF. Now when we want to open the file for any reason, RTP9 (the routine responsible for opening files) would be called with the parameter MEMBERS.DBF. It would read the row from SYSTABLES and establish that the BFS used for that file was DBASE.BFS. It would then call DBASE.BFS, passing it the location and the name of the file to open - so DATAVOL and MEMBERS.DBF. DBASE.BFS would simply validate that the file existed and, if it did, return a file handle, which could have any structure that DBASE.BFS wanted as long as it followed the syntax BFS_NAME : @Vm : HandleIdentifyingFile. In its simplest form this could be DBASE.BFS : @Vm : MEMBERS.DBF. This resulting handle would be put into the variable that the file was opened to and would become our file handle.

So now we have a file handle. When we come to read from or write to the file handle, OI is completely agnostic as to what the BFS actually is. It doesn't care if we are reading or writing Linear Hash, DBASE or SQL. The engine, in its role as opcode interpreter, simply comes across the opcode for, say, a read and says "Oooh, a read - so the next opcode will identify the file handle, the next opcode will identify the row id and the next opcode will identify the variable to place the resultant read in". It then looks at the file handle and effectively says:

FirstBitOfFileHandle = FileHandle<0, 1>
Call @FirstBitOfFileHandle(Read, FileHandle, RowId, Record, Unused, Flag)

which you'll recognise if you've ever written an MFS.
It is then down to DBASE.BFS to implement the read as it sees fit. As long as it sets the Flag variable to indicate a successful or unsuccessful operation (and, in the event of it being unsuccessful, ideally sets @File.Error to indicate the nature of the failure) OI will be perfectly happy.

So in conclusion - BFSs abstract the logical layer from the physical layer, and OI tools work with the logical layer rather than the physical layer. If the BFS is well written then all of OI's data manipulation tools (Window, Popups, Rlist, Basic+ etc.) will just assume that the underlying filing system is effectively Linear Hash.
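To make the dispatch mechanism concrete, here is a heavily simplified skeleton of what our imaginary DBASE.BFS might look like. This is purely an illustration: the equate values are placeholders (a real BFS would use the standard filing system equates), the internal subroutines are invented, and the argument list mirrors the simplified call shown above rather than the full documented BFS interface.

```
Subroutine DBASE.BFS( Code, FileHandle, RowId, Record, Unused, Flag )
   * // Hypothetical BFS skeleton - for illustration only.
   * // The engine despatches every file operation to us with a
   * // numeric code; we branch on that code and set Flag to
   * // indicate success (1) or failure (0).

   Equ READ_RECORD$  To 1 ; * // placeholder - not the real value
   Equ WRITE_RECORD$ To 2 ; * // placeholder - not the real value

   Flag = 0 ; * // assume failure until proven otherwise

   Begin Case
      Case Code = READ_RECORD$
         GoSub ReadDbfRecord   ; * // fetch RowId from the .DBF file
      Case Code = WRITE_RECORD$
         GoSub WriteDbfRecord  ; * // serialise Record back out
      Case 1
         Null ; * // codes we don't implement
   End Case

Return

ReadDbfRecord:
   * // ...parse the .DBF file, populate Record, set Flag = 1...
Return

WriteDbfRecord:
   * // ...convert Record to .DBF format, write it, set Flag = 1...
Return
```

The point of the sketch is simply that everything format-specific lives behind the Code despatch: OI never needs to know what happens inside the subroutines.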
Recent changes to the "[]" operators in OpenInsight 9.2 have resulted in substantial performance improvements to UTF8 mode string handling. This post highlights another such enhancement introduced in 9.2 to help bring UTF8 mode applications up to the standard of their ANSI counterparts.
Consider the Loop/Remove construct below:

/*
   Example showing standard loop/remove construct used
   to parse dynamic arrays at high speed
*/

mark = 1
pos  = 1 ; * // This is the CHARACTER position
Loop
   Remove nextVal From dynArray At pos Setting mark

   // Process nextVal...

While mark
Repeat

This is a common way to efficiently parse dynamic arrays in Basic+, but just like the normal "[]" operators it suffers from a severe performance degradation in UTF8 mode due to the need to find the byte offset of a character when given the position. To alleviate this Revelation have introduced the BRemove statement - this operates in exactly the same fashion as the normal Remove statement, but the index variable used in BRemove refers to a byte offset rather than a character position. Here is the same example rewritten to use BRemove:

/*
   Example showing UTF8-friendly loop/remove construct used
   to parse dynamic arrays at high speed
*/

mark = 1
pos  = 1 ; * // This is the BYTE offset
Loop
   BRemove nextVal From dynArray At pos Setting mark

   // Process nextVal...

While mark
Repeat

As you can see it's a simple change and one worth making - using BRemove in your UTF8 applications will ensure that your dynamic array parsing remains fast and efficient.

Labels: Basic+, Performance, Unicode
So now we’ve covered how the index transactions get put into the bang table all that is left is to discuss how they move from the bang table as transactions into the bang table as balanced index nodes.
In the first article we explained that transactions were introduced to allow slow hardware to distribute the transaction processing. In addition to this, the engineers at Cosmos had to come up with a way of allowing individual workstations to use spare processing power (when the PC was left unused) to move the transactions from the bang file into the index itself, in a way that was easily interruptible if the user wanted to take control of their PC again. This being the case, they opted not to move transactions straight from the bang file into the indexes, as this could be an intensive operation.

Before pressing on with an explanation of this, let’s briefly review what the transactions actually contain. At this stage we’re not going to explain the precise structure of the index transaction rows, just the concepts behind them. As Part II explained, transaction records are made up of the changes to the indexed columns: specifically the row id of the row that has changed, the column that has changed, and the old and new values of the indexed column.

If time were no constraint, each transaction row could be picked up and all of the indexes referenced therein updated before returning control to the user. However, it is unlikely that a user would be prepared to wait this long, so the process has been subdivided into two tasks: first, move the generic transactions into indexed-column-specific transactions; second, move the column-specific transactions into the index, removing the old value if appropriate and inserting the new. This is achieved using three routines :-

REV_BGND_UPDATE

This is the routine that runs when the system is idle. It works through the indexed files in the system – seemingly using the system variable @INDEX.TIME, which has three fields: field one contains an @Vm delimited list of indexed tables, field two contains the table number to start on, and field three contains the pointer to the indexed column to work on. It is responsible for calling F.DISTRIBUTOR and F.INDEXER as required.

F.DISTRIBUTOR

This is the routine that moves the generic transactions (0, 1, 2, et al) into column-specific transactions (e.g. NAME, NAME*1, NAME*2 et al).

F.INDEXER

This is the routine that takes the column-specific transactions (be they BTree, Relational or Computational) and updates the appropriate index row.

Labels: indexing
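To picture how the work is divided, here is a purely conceptual sketch in pseudo-Basic+. None of these names, arguments or loop structures are taken from the real Revelation source - they are our own invention to illustrate the two-phase flow described above.

```
* // Conceptual sketch of the idle-time index update cycle.
* // Signatures and variables are illustrative guesses, NOT
* // the actual REV_BGND_UPDATE implementation.

Loop While systemIsIdle

   * // Phase 1 - fan the generic transaction rows (0, 1, 2...)
   * // out into column-specific rows (NAME, NAME*1, NAME*2...)
   Call F.DISTRIBUTOR( bangTable )

   * // Phase 2 - fold each column-specific transaction into the
   * // balanced index nodes, deleting the old value where
   * // appropriate and inserting the new one
   Call F.INDEXER( bangTable, indexedColumn )

Repeat
```

Because each phase works in small, discrete steps against the bang table, the cycle can be abandoned between steps the moment the user touches the keyboard, which is exactly the interruptibility the designers were after.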
As was pointed out in a recent post, the performance of the "[]" string operators in UTF8 mode is pretty poor. In fact it's downright painful - if you've not seen the effects before, go and create yourself a UTF8 application and then try compiling a program. The speed drop you see is due to the system pre-compiler (REV_COMPILER_EXPAND) making heavy use of the "[]" operators during the compilation process, in a manner similar to this:
/*
   This is the usual way of implementing fast string parsing in Basic+.
   We scan along the string looking for a delimiter, and remember
   where we found it via the Col2() function. For the next iteration
   we increment that position and scan from that point.
*/
src = xlate( "SYSPROCS", "MSG", "", "X" )

pos = 1
eof = len( src )

loop
   token = src[pos," "]
   pos   = col2() + 1

   * // Do stuff...

while ( pos < eof )
repeat

The problem is caused by the need to find the correct starting character position before any string processing can take place. Because UTF8 is a multi-byte encoding scheme, it is necessary to scan from the beginning of the string to find the byte offset of the specified character, as it's possible for a character to be encoded in more than one byte. As you can appreciate, parsing a large string over many iterations wastes a lot of time looking for the character at the specified position - we could get much better performance if we had some way to access the actual byte offset and pass that to the "[]" operators instead.

Well, the good news is that with the upcoming release of OpenInsight 9.2 Revelation have addressed this problem by extending the "[]" operators and adding two new functions to allow access to the byte offset: BCol1() and BCol2().

BCol1() and BCol2()

The usual way to access the position of delimiters found with the "[]" operators or the Field() function is to use the Col1() and Col2() functions, which return the character position. The new BCol1() and BCol2() functions work in a similar fashion but return the byte offset instead, so you know how many bytes from the beginning of the string a particular character was found.

The extended "[]" operators

Although BCol1() and BCol2() allow access to the byte offsets, they can't be used with a normal "[]" operator because it expects the character index as the first argument, not the byte offset.
The extended "[]" operators take an extra argument (a simple "1" as a flag) to indicate that the first argument is a byte offset, and can be used like so:

/*
   This example shows a UTF8-friendly way of parsing a string using
   byte offsets with the extended "[]" operators.
*/
src = xlate( "SYSPROCS", "MSG", "", "X" )

pos       = 1
delim     = " "
delimSize = GetByteSize( delim )
eof       = GetByteSize( src )

loop
   token = src[pos,delim,1]    ; * // Extended - note the last "1" argument
   pos   = BCol2() + delimSize ; * // Get the byte offset and increment by
                               ; * // the delimiter _byte_ size

   * // Do stuff...

while ( pos < eof )
repeat

(Note also that we check the byte size of the delimiter we are using - although we *know* that a space is 1 byte in both ANSI and UTF8 modes, it's good practice to check this at runtime in case you ever end up using a delimiter that is multi-byte encoded instead.)

Both Field() and the normal "[]" operators update the BCol1 and BCol2 offsets, as well as the normal Col1 and Col2 positions. The extended "[]" operators only update the BCol1 and BCol2 offsets, for obvious reasons.

[EDIT: 20 March 2010] To maintain naming consistency with other UTF8-related enhancements the Col1B and Col2B functions have been renamed to BCol1 and BCol2 - this has been changed in the post above.

Labels: Basic+, Performance, Unicode
In our recent post on using memory pre-allocation when building large strings, commenter M@ pointed out quite correctly that using the normal "[]" operators while in UTF8 mode results in a severe performance hit, due to the necessity of calculating the character position of the insertion point during each iteration.
A workaround that was suggested was to temporarily switch to ANSI mode for the "[]" operation and then switch back afterwards. This is a valid solution, and one we've used ourselves before, but it does create a possible failure point: if your system hits a fatal debug condition before you switch back you might unknowingly be stuck in ANSI mode, which could result in subsequent data corruption.

A safer alternative is to use the PutBinaryValue function that we documented here - this ignores any string encoding and does a straightforward binary copy to the specified offset. Here's the preallocation sample program from the previous post updated with the binary functions:

Subroutine ZZ_SpeedTest( Void )

   Declare Function TimeGetTime

   startTime = TimeGetTime()

   stringLength = GetByteSize( @Upper.Case : @Fm )
   totalLength  = stringLength * 99999

   newArray = Space( totalLength )
   arrayPtr = 1

   For loopPtr = 1 To 99999
      PutBinaryValue( newArray, arrayPtr, CHAR, @Upper.Case : @Fm )
      arrayPtr += stringLength
   Next

   endTime   = TimeGetTime()
   totalTime = endTime - startTime

   Call Msg( @Window, "Total time was " : totalTime )

Return

This option took 95 milliseconds in UTF8 mode in our testing - pretty much on a par with the "[]" operator in ANSI mode. (As an aside, the "[]" operator in UTF8 mode took... well, we don't know actually - we gave up after 10 minutes of waiting for it to finish!)

We also tested the concatenation (:=) option in UTF8 mode - this slowed the program down by half - better than the "[]" operators but still not great.

Labels: Basic+, Performance, Unicode
As part of one of our conference presentations we're showing log file analysis in OpenInsight. If you've ever tried any of this you'll know that the files can get pretty big pretty quickly and trawling through them extracting substrings and building new arrays can be a time consuming process.
For one particular application we had to build a field mark delimited array from the data we were retrieving from file and - conscious of the penalty incurred by using <-1> operators - we were merrily using := String : @Fm syntax. Despite this attempt at efficiency the operation was still woefully slow - not so much "put the kettle on" slow as "cook a gourmet dinner from scratch, eat it and wash up" slow. So we had to find a way to speed up the operation. Fortunately, as we've posted on the Revelation forum before, there is an easy way to do this - preallocating memory. Let's compare the rival techniques and see how we get on.

The <-1> Operator

Subroutine ZZ_SpeedTest( Void )

   Declare Function TimeGetTime

   startTime = TimeGetTime()

   newArray = ""
   For loopPtr = 1 To 99999
      newArray<-1> = @Upper.Case
   Next

   endTime   = TimeGetTime()
   totalTime = endTime - startTime

   Call Msg( @Window, "Total time was " : totalTime )

Return

This option took 4207 milliseconds.

The := Operator

Subroutine ZZ_SpeedTest( Void )

   Declare Function TimeGetTime

   startTime = TimeGetTime()

   newArray = ""
   For loopPtr = 1 To 99999
      newArray := @Upper.Case : @Fm
   Next
   newArray[-1,1] = ""

   endTime   = TimeGetTime()
   totalTime = endTime - startTime

   Call Msg( @Window, "Total time was " : totalTime )

Return

This option took 4164 milliseconds.

The Preallocation Option

Subroutine ZZ_SpeedTest( Void )

   Declare Function TimeGetTime

   startTime = TimeGetTime()

   stringLength = Len( @Upper.Case : @Fm )
   totalLength  = stringLength * 99999

   newArray = Space( totalLength )
   arrayPtr = 1

   For loopPtr = 1 To 99999
      newArray[arrayPtr, stringLength] = @Upper.Case : @Fm
      arrayPtr += stringLength
   Next

   endTime   = TimeGetTime()
   totalTime = endTime - startTime

   Call Msg( @Window, "Total time was " : totalTime )

Return

This option took 81 milliseconds - some 50 times quicker - and the bigger the string you are creating, the more impressive the improvement in speed.
The reason for this is simple - when we concatenate to a string the engine has to grab more memory using a process called a malloc - memory allocation. These operations are resource intensive as they need to juggle memory around to make room for the new string. Resource intensive operations are, by their very nature, slow. By preallocating the space needed we do all of our mallocing in one fell swoop and can concentrate on the task at hand. Labels: Basic+, Performance
Way back in the days of OpenInsight 7.2.1 the EditTable was modified so that the speed of setting large amounts of data via the LIST or ARRAY properties was significantly increased.
What was not changed was the speed of getting data via LIST or ARRAY, and this can have an impact on the setting speed if you're not careful, because each Set_Property operation performs an implicit Get_Property, regardless of whether or not you actually want the original data. For example, if you already have 10000 lines of data in your EditTable and you want to use the LIST or ARRAY property to set new data, you may still see a speed drop as the system uses the slow Get_Property to retrieve the original data before the update.

To overcome this problem you can tell the EditTable to clear the existing data before you use Set_Property; that way the implicit Get_Property has nothing to process. This can be done with SendMessage() and the DTM_RESETDATA message like so:

equ DTM_RESETDATA$ to 1025

hwndEdt = get_Property( @window : ".TABLE_1", "HANDLE" )
call sendMessage( hwndEdt, DTM_RESETDATA$, 0, 0 )

call set_Property( @window : ".TABLE_1", "LIST", lotsOfStuff )

Labels: EditTable, EditTable Cookbook, OpenInsight
One of the most important points to bear in mind when using the Basic+ string handling functions is that all normal string operations are character-based - not byte-based. This has major implications if you wish to manipulate your data in a byte-oriented fashion when in UTF8 mode, because UTF8 is a multi-byte encoding scheme; i.e. it doesn't always follow that one byte represents one character as is the case in ANSI mode.
To overcome this issue Revelation introduced several new Basic+ functions way back in OpenInsight 7.0 that explicitly allow binary manipulation regardless of the string-handling mode you are currently in (note that these functions are intrinsic to the Basic+ language and do not need to be declared before use). These functions are:

GetByteSize
GetBinaryValue
PutBinaryValue
CreateBinaryData
The intention of this blog post is to document these functions and to make you aware of them so that you can develop your applications correctly should you wish to work in UTF8 mode. (Also note that most of these functions expect you to specify a variable type when using them. This type should be chosen from one of the standard "C" types understood by the Basic+ compiler and listed at the end of this post.)

GetByteSize

Returns the number of bytes occupied by the specified variable. This is in contrast to the Len() function, which returns the number of characters.

sizeInBytes = GetByteSize( varData )
E.g.

rec     = Xlate( "SYSOBJ", "$WRITE_ROW", "", "X" )
recSize = GetByteSize( rec )

GetBinaryValue

This function extracts a binary value from a variable at a specified offset. You must specify the type of data to extract, and if you are extracting a type with a variable length, such as a string of bytes, you must also pass the number of bytes you wish to copy.

binVal = GetBinaryValue( varData, byteOffset, varType [,noOfBytes] )
E.g.

rec = Xlate( "SYSOBJ", "$WRITE_ROW", "", "X" )

// Get the first byte of the record as a number
firstByte = GetBinaryValue( rec, 1, BYTE )

// Get the next 10 bytes as a binary string
someBytes = GetBinaryValue( rec, 2, BINARY, 10 )

PutBinaryValue

This subroutine modifies a variable by replacing binary data at a specified byte offset. You must specify the type of data you wish to insert as well as the data itself.

PutBinaryValue( binData, byteOffset, varType, varData )
E.g.

* // Example showing how to access and update
* // a Windows API structure using
* // the binary operators.
* //
* // A RECT structure consists of four LONG types
* // (32-bit signed integers, each 4 bytes long)
* //
* //    typedef struct tagRECT {
* //       LONG left;
* //       LONG top;
* //       LONG right;
* //       LONG bottom;
* //    } RECT;

* // We're going to use the GetWindowRect API function
* // to get some RECT coordinates
hwnd = Get_Property( @window, "HANDLE" )
rect = blank_Struct( "RECT" )
rect = GetWindowRect( hwnd, rect )

* // Increment the top member by 10
top = GetBinaryValue( rect, 5, LONG )
top += 10
PutBinaryValue( rect, 5, LONG, top )

CreateBinaryData

This function creates and returns a "blank" binary variable of the specified type.

binVal = CreateBinaryData( varType, varData )
E.g.

* // Create a binary integer with an initial value of 100
a    = "100"
intA = CreateBinaryData( INT, a )

Basic+ "C" types

The following is a list of variable types that may be used with the Basic+ binary manipulation functions described above.
[EDIT: 05 March 2010] Due to a recently discovered compiler bug (since fixed) the following "C" types will NOT work with the binary manipulation functions prior to OpenInsight 9.2.0:
Probably the biggest impact this will have is processing BINARY types, but you can work around this by using the CHAR type instead as they both perform exactly the same operation. Labels: OpenInsight, Unicode, UTF8
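For example, taking the BINARY extraction from the earlier GetBinaryValue example, the equivalent CHAR call should return exactly the same run of bytes - this is simply the workaround described above applied to that example:

```
rec = Xlate( "SYSOBJ", "$WRITE_ROW", "", "X" )

* // Affected by the pre-9.2.0 compiler bug:
* //   someBytes = GetBinaryValue( rec, 2, BINARY, 10 )

* // Workaround - CHAR performs exactly the same operation:
someBytes = GetBinaryValue( rec, 2, CHAR, 10 )
```

Once you are on 9.2.0 or later the BINARY form can be used again; the two remain interchangeable.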
Getting notification of mouse messages has always been something of a problem with OpenInsight EditTable controls. With other controls this process is quite easy - you simply qualify the WINMSG event with the relevant mouse message number and you can easily respond to it - but with an EditTable this process fails.
This is mainly due to the architecture of the EditTable itself - it's not really a single control but two: the visible control that you interact with (known as the "DataTbl" control) and a very thin parent wrapper around it (called the "Editable" control). When you use an EditTable control in your application OpenInsight creates the wrapper control, which in turn creates the visible "DataTbl" control. When you interact with an EditTable in Basic+ you are interacting directly with the wrapper - it simply passes on your request to the "DataTbl" as appropriate.

If you qualify the WINMSG event on an EditTable you are qualifying against the wrapper - you are NOT qualifying against the visible "DataTbl" control! When the user interacts with the EditTable control they communicate with the visible "DataTbl", so it is this part that receives the mouse messages. These messages are interpreted and the relevant notifications passed up to the wrapper and then on to OpenInsight - they are not passed on directly, so you will never be able to use a WINMSG event with the usual mouse messages.

However, since OpenInsight 9.1 it has been possible to trap mouse messages by another means. In this version, when the "DataTbl" receives a mouse message, it also sends a notification message to the wrapper that you can pick up with a WINMSG event. The mouse message itself is simply offset by the value 3124 (WM_USER + 2100).
For example, to detect a "Right Mouse Button Down" message you would qualify against WM_RBUTTONDOWN + 3124 like so:

$insert logical

* // From the Windows SDK headers:
equ WM_USER$          to 0x0400
equ WM_LBUTTONDOWN$   to 0x0201
equ WM_LBUTTONUP$     to 0x0202
equ WM_LBUTTONDBLCLK$ to 0x0203
equ WM_RBUTTONDOWN$   to 0x0204
equ WM_RBUTTONUP$     to 0x0205
equ WM_RBUTTONDBLCLK$ to 0x0206
equ WM_MBUTTONDOWN$   to 0x0207
equ WM_MBUTTONUP$     to 0x0208
equ WM_MBUTTONDBLCLK$ to 0x0209

* // Offset value
equ ETM_MOUSEMSGOFFSET$ to (WM_USER$ + 2100) ; * // 3124

* // EditTable mouse message notifications
equ ETM_LBUTTONDOWN$   to (ETM_MOUSEMSGOFFSET$ + WM_LBUTTONDOWN$)
equ ETM_LBUTTONUP$     to (ETM_MOUSEMSGOFFSET$ + WM_LBUTTONUP$)
equ ETM_LBUTTONDBLCLK$ to (ETM_MOUSEMSGOFFSET$ + WM_LBUTTONDBLCLK$)
equ ETM_MBUTTONDOWN$   to (ETM_MOUSEMSGOFFSET$ + WM_MBUTTONDOWN$)
equ ETM_MBUTTONUP$     to (ETM_MOUSEMSGOFFSET$ + WM_MBUTTONUP$)
equ ETM_MBUTTONDBLCLK$ to (ETM_MOUSEMSGOFFSET$ + WM_MBUTTONDBLCLK$)
equ ETM_RBUTTONDOWN$   to (ETM_MOUSEMSGOFFSET$ + WM_RBUTTONDOWN$)
equ ETM_RBUTTONUP$     to (ETM_MOUSEMSGOFFSET$ + WM_RBUTTONUP$)
equ ETM_RBUTTONDBLCLK$ to (ETM_MOUSEMSGOFFSET$ + WM_RBUTTONDBLCLK$)

* // To trap a right button down message:
call Send_Message( @window : ".TABLE_1", "QUALIFY_EVENT", ETM_RBUTTONDOWN$, TRUE$ )

Labels: EditTable, EditTable Cookbook, OpenInsight