kbmMW Binary Parser #1

The next version of kbmMW Enterprise Edition will include a binary parser driven by a definition file (for the moment, fixed-record only).

What does it do? Well, it parses well-formed binary (or textual) streams to extract telegrams and their contents. Functionality-wise it can be compared to a regular expression, just for bit- and byte-level information, with simple scripting and calculation capabilities on top.

A telegram is, in this sense, a fixed-length sequence of bytes, which may contain bit fields, byte fields or ASCII string data.

The definition file defines what the telegrams look like, what subparts they consist of, and what to do when a matching part has been found.

The outcome is typically a match along with a number of keys/values, or a failure to match with anything. The actual naming and use of the keys and their values is up to the developer to decide.

By default, a definition file is written in YAML and consists of 3 main sections:

VALUES
TELEGRAMS
TAGS

The VALUES section can contain a list of predefined values to be available before any attempt to match anything. This can for example be used for defining “constants” which your application understands or default values.

The TELEGRAMS section contains an array of the telegram masks to look for, and the TAGS section contains an optional number of named sub parts referenced from either the TELEGRAMS section or from within the TAGS section.

It may seem a bit vague right now, but it probably makes more sense when I show a sample in a moment.

The TELEGRAMS section and the TAGS section both contain masks and optional expressions to be executed when a mask matches. They also both define whether the mask is a bytes mask, a bits mask or a string mask.

Masks that have been defined as bytes masks always operate on the byte level. Similarly, masks that have been defined as bits masks always operate on the bit level (currently a maximum of 8 bits per mask).

String masks are similar to bytes masks, except that they compare ASCII strings.

Let us look at a sample definition file. As YAML uses indentation to determine whether something belongs to the current definition or starts a new one, it is very important that the indentation is correct. YAML also recognizes lines starting with - as an entry in an array, unless the dash seems to be part of another property. In fact YAML is pretty complex in what it understands, but it reads more easily for the human eye, which is why I chose it as the default definition file format.

The sample definition file is for a standard scale format called Toledo, devised by a company called Mettler Toledo many years ago.

YAML-wise, the document actually contains an object with one single property named TOLEDO, which has 3 properties: VALUES, TELEGRAMS and TAGS.
The VALUES property has a number of properties with values. The TELEGRAMS object has one property, bytes, holding an array of objects, each having a mask and an optional expr property.

The TAGS property is an object which has a number of properties (SWA, SWB, SWC, DP etc.), each of which is an object containing a property named bytes, bits or string. That property is in turn an object containing either a single mask and optional expr property, or an array of such.

It may take a while to get used to reading and writing YAML documents, but perseverance makes experts.

Lines starting with # are comments.

# This is a sample file showing how to parse Toledo telegrams
# using kbmMW Binary Parser

TOLEDO:
    VALUES:
        # Unit constants
        C_UNIT_GRAM:                 2000
        C_UNIT_UK_POUND:             2001
        C_UNIT_KILOGRAM:             2002
        C_UNIT_METRIC_TON:           2003
        C_UNIT_OUNCE:                2004
        C_UNIT_TROY_OUNCE:           2005
        C_UNIT_PENNY_WEIGHT:         2006
        C_UNIT_UK_TON:               2007
        C_UNIT_CUSTOM:               2008

        # Status constants
        C_STATUS_OK:                 1000
        C_STATUS_DATA_ERROR:         1001
        C_STATUS_SCALE_ERROR:        1002
        C_STATUS_SCALE_OVERLOAD:     1003
        C_STATUS_IN_MOTION:          1004
        C_STATUS_TARE_ERROR:         1005
        C_STATUS_TRANSMISSION_ERROR: 1006
        C_STATUS_INVALID_COMMAND:    1007
        C_STATUS_INVALID_PARAMETER:  1008

        # Tare constants
        C_TARE_PRESET:               3000
        C_TARE_AUTO:                 3001
        C_TARE_NONE:                 3002

        # Default values
        STATUS:                    @C_STATUS_OK
        TARE:                      0
        GROSS:                     0
        NET:                       0
        INCREMENT_SIZE:            1
        IS_POWER_NOT_ZEROED:       false
        IS_SETTLED:                false
        IS_OVERLOAD:               false
        IS_NEGATIVE:               false
        IS_CHECKSUM_OK:            false
        WEIGHT_FACTOR:             1
        TARE_FACTOR:               1
        TARE_CODE:                 @C_TARE_NONE
        TERMINAL_NO:               0
        WEIGHT_UNIT:               @C_UNIT_KILOGRAM
        TARE_UNIT:                 @C_UNIT_KILOGRAM

    TELEGRAMS:
         bytes:
              - mask: [ 0x2, @SWA, @SWB, @SWC, 6*@W, 6*@T, 0xD, @CHK ]
                expr: - "WEIGHT_UNIT=IF(IS_UNIT_UK_POUND=1,C_UNIT_UK_POUND,IF(IS_UNIT_KILOGRAM,C_UNIT_KILOGRAM,WEIGHT_UNIT))"
                      - "TARE_UNIT=WEIGHT_UNIT"
                      - "STATUS=IF(IS_CHECKSUM_OK=1,IF(IS_OVERLOAD,C_STATUS_SCALE_OVERLOAD,C_STATUS_OK),C_STATUS_DATA_ERROR)"
                      - "WEIGHT=WEIGHT*WEIGHT_EXPANSION*IF(WEIGHT_FACTOR<1,WEIGHT_FACTOR,1)"
                      - "TARE=TARE*TARE_EXPANSION*IF(TARE_FACTOR<1,TARE_FACTOR,1)"
                      - "GROSS=IF(IS_NETTO=0,WEIGHT,0)"
                      - "NET=IF(IS_NETTO=1,WEIGHT,0)"

    TAGS:

        SWA:
         bits:
                # bit offset 0
                mask: [ 0, 0, 1, 2*@IS, 3*@DP ]

        DP:
         bits:
        # bit offset 0, 3 bits
              - mask: [ 0, 0, 0 ]
                expr: [ WEIGHT_FACTOR=100, TARE_FACTOR=100 ]
              - mask: [ 0, 0, 1 ]
                expr: [ WEIGHT_FACTOR=10, TARE_FACTOR=10 ]
              - mask: [ 0, 1, 0 ]
                expr: [ WEIGHT_FACTOR=1, TARE_FACTOR=1 ]
              - mask: [ 0, 1, 1 ]
                expr: [ WEIGHT_FACTOR=0.1, TARE_FACTOR=0.1 ]
              - mask: [ 1, 0, 0 ]
                expr: [ WEIGHT_FACTOR=0.01, TARE_FACTOR=0.01 ]
              - mask: [ 1, 0, 1 ]
                expr: [ WEIGHT_FACTOR=0.001, TARE_FACTOR=0.001 ]
              - mask: [ 1, 1, 0 ]
                expr: [ WEIGHT_FACTOR=0.0001, TARE_FACTOR=0.0001 ]
              - mask: [ 1, 1, 1 ]
                expr: [ WEIGHT_FACTOR=0.00001, TARE_FACTOR=0.00001 ]

        IS:
         bits:
              # bit offset 3, 2 bits
              - mask: [ 0, 1 ]
                expr: INCREMENT_SIZE=1
              - mask: [ 1, 0 ]
                expr: INCREMENT_SIZE=2
              - mask: [ 1, 1 ]
                expr: INCREMENT_SIZE=5

        CHK:
         bytes:
                expr: "IS_CHECKSUM_OK=IF(CHK2COMP7(0,17)=VALUE,1,0)"

        SWB:
         bits:
                mask: [ 0, IS_POWER_NOT_ZEROED, 1, IS_UNIT_UK_POUND/IS_UNIT_KILOGRAM, !IS_SETTLED, IS_OVERLOAD, IS_NEGATIVE, IS_NETTO ]

        SWC:
         bits:
                mask: [ 0, IS_HANDTARE, 1, @EW, IS_PRINTREQUEST, 3*@WF ]

        WF:
         bits:
              - mask: [ 0, 0, 0 ]

              - mask: [ 0, 0, 1 ]
                expr: [WEIGHT_UNIT=C_UNIT_GRAM, TARE_UNIT=C_UNIT_GRAM ]
              - mask: [ 0, 1, 0 ]
                expr: [WEIGHT_UNIT=C_UNIT_METRIC_TON, TARE_UNIT=C_UNIT_METRIC_TON ]
              - mask: [ 0, 1, 1 ]
                expr: [WEIGHT_UNIT=C_UNIT_OUNCE, TARE_UNIT=C_UNIT_OUNCE ]
              - mask: [ 1, 0, 0 ]
                expr: [WEIGHT_UNIT=C_UNIT_TROY_OUNCE, TARE_UNIT=C_UNIT_TROY_OUNCE ]
              - mask: [ 1, 0, 1 ]
                expr: [WEIGHT_UNIT=C_UNIT_PENNY_WEIGHT, TARE_UNIT=C_UNIT_PENNY_WEIGHT ]
              - mask: [ 1, 1, 0 ]
                expr: [WEIGHT_UNIT=C_UNIT_UK_TON, TARE_UNIT=C_UNIT_UK_TON ]
              - mask: [ 1, 1, 1 ]
                expr: [WEIGHT_UNIT=C_UNIT_CUSTOM, TARE_UNIT=C_UNIT_CUSTOM ]

        EW:
         bits:
              - mask: 0
                expr: [ WEIGHT_EXPANSION=1, TARE_EXPANSION=1 ]
              - mask: 1
                expr: [ WEIGHT_EXPANSION=10, TARE_EXPANSION=10 ]

        W:
         string:
                expr: WEIGHT=VALUE

        T:
         string:
                expr: TARE=VALUE

When the kbmMW Binary Parser is given this definition file, it compiles it into a parse tree, which can efficiently parse whatever you throw at it as a file or a stream.

We can see that one telegram mask has been defined in the TELEGRAMS/bytes array. It contains a mask that consists of 8 parts. Each part is, unless a * is included, exactly 1 byte wide.

The first part is 0x2, which simply means that the data must start with the hexadecimal value 2, which is STX (start of text) in the ASCII character set.

The second part is @SWA, which means that there must be one byte, which will be parsed by the tag called SWA.

The @SWB and @SWC parts also match one byte each, and each of them must be parsed by the tag of that name.

Then we have the 6*@W part. That means that there are 6 bytes which must be parsed by the W tag.

You get the picture?
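To make the byte layout concrete, here is a rough sketch in plain Python (not kbmMW code) that cuts an 18-byte frame into the 8 parts of the telegram mask. The names STX and CR are just labels I have given the fixed bytes 0x2 and 0xD, split_frame is a hypothetical helper, and the sample frame is made up rather than captured from a real scale.

```python
# Illustration only: slice a frame according to the telegram mask
# [ 0x2, @SWA, @SWB, @SWC, 6*@W, 6*@T, 0xD, @CHK ].
# (part label, width in bytes); STX and CR are labels chosen for this sketch.
LAYOUT = [("STX", 1), ("SWA", 1), ("SWB", 1), ("SWC", 1),
          ("W", 6), ("T", 6), ("CR", 1), ("CHK", 1)]


def split_frame(frame: bytes) -> dict:
    """Return the raw bytes of each mask part, or raise ValueError on a mismatch."""
    expected = sum(width for _, width in LAYOUT)
    if len(frame) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(frame)}")
    parts, pos = {}, 0
    for name, width in LAYOUT:
        parts[name] = frame[pos:pos + width]
        pos += width
    # The two literal parts of the mask must match exactly.
    if parts["STX"] != b"\x02" or parts["CR"] != b"\x0d":
        raise ValueError("fixed bytes 0x2 / 0xD not found")
    return parts


# A made-up frame: STX, three status bytes, 6 weight chars, 6 tare chars, CR, checksum.
sample = (b"\x02" + bytes([0b00101010, 0b00100000, 0b00100000])
          + b"  1234" + b"    00" + b"\x0d" + b"\x00")
parts = split_frame(sample)
print(parts["W"], parts["T"])
```

The real parser does much more than slicing (each part is handed to its tag for further matching), but the widths add up the same way: 1+1+1+1+6+6+1+1 = 18 bytes.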

Let’s look at the SWA tag. It is defined as a bits tag. Hence it only parses bits and at most 8 of them. It has a mask defined as 0, 0, 1, 2*@IS, 3*@DP

That means that the most significant bit should be 0, the next one should also be 0, the next should be 1, then come 2 bits which should be parsed by the IS tag, and finally the 3 least significant bits should be parsed by the DP tag.
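In ordinary bit arithmetic, that mask corresponds to something like the following plain Python sketch. It is only an illustration of the bit positions; split_swa is a hypothetical helper and says nothing about how kbmMW implements the match internally.

```python
def split_swa(swa: int) -> tuple:
    """Check the fixed bits 0, 0, 1 and return (IS bits, DP bits) of an SWA byte."""
    if swa >> 5 != 0b001:
        raise ValueError("SWA fixed bits do not match 0, 0, 1")
    is_bits = (swa >> 3) & 0b11   # the 2 bits handed to the IS tag
    dp_bits = swa & 0b111         # the 3 least significant bits, handed to the DP tag
    return is_bits, dp_bits


print(split_swa(0b00101010))
```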

Looking at the DP tag, you will see that it is also a bits type tag, which makes sense since we are parsing a subset of bits from the SWA tag.

A number of possible DP bit masks are defined, which, when matched, result in one or more expressions being executed.

So let’s say that the 3 bits match 1 0 1. Then the expressions WEIGHT_FACTOR=0.001 and TARE_FACTOR=0.001 are both executed, essentially setting some values we can use later on, or explore from our program. Notice the []? In YAML that is called an inline array, where each element is separated by a comma. That is why I say that two expressions are executed in this case, when the match is successful.
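Conceptually, the DP tag behaves like a lookup table from a bit pattern to a list of assignments. A minimal Python sketch of that idea follows; the table values are copied from the definition file, while DP_FACTORS and apply_dp are names invented for this illustration.

```python
# Bit pattern (most significant bit first) -> factor, as listed under the DP tag.
DP_FACTORS = {
    (0, 0, 0): 100,   (0, 0, 1): 10,     (0, 1, 0): 1,      (0, 1, 1): 0.1,
    (1, 0, 0): 0.01,  (1, 0, 1): 0.001,  (1, 1, 0): 0.0001, (1, 1, 1): 0.00001,
}


def apply_dp(bits, values: dict) -> bool:
    """Run the expressions of the matching DP mask; return False if no mask matches."""
    factor = DP_FACTORS.get(tuple(bits))
    if factor is None:
        return False
    # Both expressions of the inline array are executed on a match.
    values["WEIGHT_FACTOR"] = factor
    values["TARE_FACTOR"] = factor
    return True


values = {}
matched = apply_dp((1, 0, 1), values)
print(matched, values)
```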

The IS tag follows a similar procedure to the DP tag.

The SWB tag is an interesting one. It is used for parsing the 3rd byte of the data. It is also a bits type mask, and contains 8 parts, one for each bit in the matched byte.
The most significant bit should be 0. Whatever the next bit is, is set in the value IS_POWER_NOT_ZEROED, which can then be used in other expressions or by the developer later on. Then a 1 bit must be available.

Next comes a bit, which if set, sets IS_UNIT_UK_POUND to 1 and IS_UNIT_KILOGRAM to 0, else it sets IS_UNIT_KILOGRAM to 1 and IS_UNIT_UK_POUND to 0.

The next bit is stored negated in the value IS_SETTLED. So if the bit was 1, then IS_SETTLED is set to 0 and vice versa.

The 3 remaining bits set the IS_OVERLOAD, IS_NEGATIVE and IS_NETTO values.
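Spelled out as plain Python, the SWB mask amounts to roughly the following. This is a sketch of the semantics described above, not of kbmMW's implementation, and decode_swb is a hypothetical helper.

```python
def decode_swb(swb: int) -> dict:
    """Decode an SWB byte the way the SWB bits mask describes it."""
    # bits[0] is the most significant bit, bits[7] the least significant one.
    bits = [(swb >> shift) & 1 for shift in range(7, -1, -1)]
    if bits[0] != 0 or bits[2] != 1:
        raise ValueError("SWB fixed bits do not match")
    return {
        "IS_POWER_NOT_ZEROED": bits[1],
        "IS_UNIT_UK_POUND": bits[3],       # A/B form: a set bit selects the first name
        "IS_UNIT_KILOGRAM": 1 - bits[3],   # ... and a cleared bit selects the second
        "IS_SETTLED": 1 - bits[4],         # the ! prefix stores the negated bit
        "IS_OVERLOAD": bits[5],
        "IS_NEGATIVE": bits[6],
        "IS_NETTO": bits[7],
    }


flags = decode_swb(0b00101001)
print(flags)
```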

Simple stuff, right? 🙂

Now let us look at the W tag. It’s defined to be a string type tag, which means that any masks we write must be written as strings, and any value matched is seen as a string (a collection of bytes). As the W tag does not have any masks defined, the tag is always considered a match, and any optional expression on that tag is run. In this case we just set the value WEIGHT equal to the complete matched value.

That introduces the magic word VALUE. It is a special variable which always contains the latest match, regardless of whether it is a bits or string match. In this case, it is how we get the matched weight characters into WEIGHT.

When all matches have been successful, we have a matching telegram, and only then will the matching telegram's expressions be run. Internally kbmMW Binary Parser uses the kbmMemTable SQL and expression parser, and as such can do all the things that the expression parser can do, including calling functions etc.

What is still missing is the code to run the parser.
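To show what the telegram-level expressions for WEIGHT, GROSS and NET work out to, here is a small Python sketch that mirrors them by hand. It assumes that the string stored by the W tag is converted to a number before the multiplication; how the kbmMemTable expression parser actually coerces the value is not covered here. The function name run_weight_exprs and the input values are made up for the illustration.

```python
def run_weight_exprs(values: dict) -> dict:
    """Mirror the WEIGHT, GROSS and NET expressions of the telegram by hand."""
    v = dict(values)  # work on a copy, leave the caller's values untouched
    # WEIGHT=WEIGHT*WEIGHT_EXPANSION*IF(WEIGHT_FACTOR<1,WEIGHT_FACTOR,1)
    factor = v["WEIGHT_FACTOR"] if v["WEIGHT_FACTOR"] < 1 else 1
    v["WEIGHT"] = float(v["WEIGHT"]) * v["WEIGHT_EXPANSION"] * factor  # assumed coercion
    # GROSS=IF(IS_NETTO=0,WEIGHT,0)
    v["GROSS"] = v["WEIGHT"] if v["IS_NETTO"] == 0 else 0
    # NET=IF(IS_NETTO=1,WEIGHT,0)
    v["NET"] = v["WEIGHT"] if v["IS_NETTO"] == 1 else 0
    return v


# Made-up values as the tags might have left them after a successful match.
result = run_weight_exprs({
    "WEIGHT": "  1234", "WEIGHT_EXPANSION": 1, "WEIGHT_FACTOR": 0.01, "IS_NETTO": 0,
})
print(result["GROSS"], result["NET"])
```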

     rd:=TkbmMWBPFileReader.Create(eDefFile.Text);
     try
        rd.OnSkipping:=procedure(var AByteCount:integer)
                       begin
                          Log.Info('Skipping '+inttostr(AByteCount));
                       end;

        // If you want to see the parsed values on a positive match.
        rd.OnMatch:=procedure(AValues:IkbmMWBPValues; var ABreak:boolean; const ASize:integer)
                    var
                       a:TArray<string>;
                       i:integer;
                       row:integer;
                    begin
                         grid.DefaultRowHeight:=25;
                         grid.RowCount:=AValues.Count+1;
                         row:=1;
                         a:=AValues.Names;
                         TArray.Sort<string>(a);
                         for i:=0 to High(a) do
                         begin
                              grid.Cells[0,row]:=a[i];
                              grid.Cells[1,row]:=VarToStr(AValues.Value[a[i]]);
                              inc(row);
                         end;
                         if AValues.Value['IS_OVERLOAD'] then
                            Log.Info('Overload')
//                         else if AValues.Value['IS_SETTLED'] then
//                              Log.Info('Unsettled gross:'+VarToStr(AValues.Value['GROSS']))
                         else
                             Log.Info('Gross:'+VarToStr(AValues.Value['GROSS'])+' Settled:'+vartostr(AValues.Value['IS_SETTLED']));
//                         ABreak:=true; // Only return first match.
                    end;

        rd.Run(eDataFile.Text);
        Log.Info('Found '+inttostr(rd.MatchCount)+' matches. Skipped '+inttostr(rd.SkippedBytes)+' bytes');
     finally
        rd.Free;
     end;

We take advantage of the fact that a file reader is available, which makes it easy to parse large files. But one could just as easily create other types of readers, descending from TkbmMWBPCustomReader.

Each time the parser is not able to parse something successfully, it will attempt to skip past it, until either a match is made, or all data has been processed. The OnSkipping event is called on those occasions.

When a match is made, the OnMatch event is called. The developer can choose what to do with the values and if the parsing should continue when the event is done.

The file reader's constructor accepts one argument, the name of the definition file. What is being read is the file whose name is given in the Run statement.

Run will continue until either all data has been read or the process is interrupted, by either setting AByteCount to zero in OnSkipping or setting ABreak to true in OnMatch.

After the execution ends, you can check how many bytes were skipped and how many telegrams were read in total.
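The skip-and-match loop described above can be pictured with the following Python sketch. It is a deliberately naive stand-in: it only checks the two fixed bytes of the Toledo frame and skips forward byte by byte, whereas the real parser works from its compiled parse tree. The callbacks return their answers instead of using var parameters, and scan and looks_like_frame are hypothetical names.

```python
FRAME_LEN = 18  # total width of the Toledo telegram mask


def looks_like_frame(chunk: bytes) -> bool:
    """Very rough match: right length, 0x2 first, 0xD just before the checksum."""
    return len(chunk) == FRAME_LEN and chunk[0] == 0x02 and chunk[16] == 0x0D


def scan(data: bytes, on_match, on_skipping) -> tuple:
    """Return (match count, skipped byte count), stopping early if a callback asks to."""
    pos = matches = skipped = 0
    while pos < len(data):
        chunk = data[pos:pos + FRAME_LEN]
        if looks_like_frame(chunk):
            matches += 1
            pos += FRAME_LEN
            if on_match(chunk):            # True plays the role of ABreak := true
                break
        else:
            run = 1                        # count bytes up to the next possible match
            while (pos + run < len(data)
                   and not looks_like_frame(data[pos + run:pos + run + FRAME_LEN])):
                run += 1
            run = on_skipping(run)         # returning 0 plays the role of AByteCount := 0
            if run == 0:
                break
            skipped += run
            pos += run
    return matches, skipped


# Made-up data: garbage, a frame, more garbage, another frame.
frame = b"\x02" + b"A" * 15 + b"\x0d" + b"B"
data = b"xx" + frame + b"yyy" + frame
totals = scan(data, on_match=lambda chunk: False, on_skipping=lambda count: count)
print(totals)
```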

The parser is likely to evolve as new requirements appear, and I encourage users of it to play an active role in extending it so we all can benefit from a very versatile binary parser.

By kimbomadsen
