kbmMW Pro, Enterprise and Community Edition also contain a 4th way to parse XML, namely SAX parsing.
It is, in fact, used automatically by the 3 previous ways shown in "Did you know? #1 3 ways to parse XML".
A SAX parser is extremely fast at parsing XML and can handle huge XML files. However, it is more cumbersome to use, since it does not automatically build an XML DOM for you, which means you need to write code that targets the specific XML file.
But I think it could be of interest to some to see how you can derive your own specific parser from the kbmMW SAX XML parser.
We are again starting out with some XML to be parsed:
const
  XML: string =
    '<?xml version="1.0"?>'+
    '<DefaultBody>'+
    '<Default>'+
    ' <Defaultcode>XML_BOLIMPORT_MAP</Defaultcode>'+
    ' <Omschrijving>Folder waar de xml bestanden staan, die gemaakt zijn door bolmate vanuit de inkoop</Omschrijving>'+
    ' <Waarde>Z:\tmp</Waarde>'+
    '</Default>'+
    '<Default>'+
    ' <Defaultcode>XML_DESTUSED_MAP</Defaultcode>'+
    ' <Omschrijving>XMLBestanden die verplaatst worden als ze klaar zijn</Omschrijving>'+
    ' <Waarde>Z:\tmp\xmlexport</Waarde>'+
    '</Default>'+
    '</DefaultBody>';
And we still want it parsed into a list of TDef objects:
type
  TDef = class
  public
    Defaultcode:string;
    Omschrijving:string;
    Waarde:string;
  end;

  TDefs = class(TObjectList<TDef>);
Method 4 – SAX based parsing
First we need to make a SAX parser that matches the XML file we want to parse. It can be coded in different ways, with more or less sloppy syntax checking and error handling, since it is up to you to determine when there is a logic/syntax error in the XML data.
Since a SAX parser is basically a tokenizer, all you get from it are tokens. How they relate to each other is entirely up to you to figure out.
Typically it makes sense to have one or more state machines within your derived SAX parser to keep track of the structure, so you know where each token you get is supposed to be handled.
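The idea of tracking structure with a state machine can be sketched without kbmMW at all. The following stand-alone console program is only a minimal illustration, not the kbmMW tokenizer: TTokKind, TTok, TState and the hard-coded Toks array are hypothetical names invented for this sketch.

```pascal
program StateSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical token kinds for this sketch; these are NOT the kbmMW token types.
  TTokKind = (tkOpen, tkClose, tkText);

  TTok = record
    Kind: TTokKind;
    Text: string;
  end;

  // One state per nesting level: outside an item, inside an item, inside a value tag.
  TState = (stRoot, stItem, stValue);

const
  // A pre-tokenized stand-in for <Default><Waarde>Z:\tmp</Waarde></Default>.
  Toks: array[0..4] of TTok = (
    (Kind: tkOpen;  Text: 'Default'),
    (Kind: tkOpen;  Text: 'Waarde'),
    (Kind: tkText;  Text: 'Z:\tmp'),
    (Kind: tkClose; Text: 'Waarde'),
    (Kind: tkClose; Text: 'Default'));

var
  State: TState;
  i: Integer;
begin
  State := stRoot;
  for i := Low(Toks) to High(Toks) do
    case Toks[i].Kind of
      tkOpen:
        // An opening tag moves one level deeper; nesting inside a value is an error.
        if State = stValue then
          raise Exception.Create('Unexpected tag inside value: ' + Toks[i].Text)
        else
          Inc(State);
      tkText:
        // Text only carries meaning while we are inside a value tag.
        if State = stValue then
          Writeln('Value: ', Toks[i].Text);
      tkClose:
        // A closing tag moves one level back up; closing at the root is an error.
        if State = stRoot then
          raise Exception.Create('Unbalanced closing tag: ' + Toks[i].Text)
        else
          Dec(State);
    end;
end.
```

Run as written, the program prints `Value: Z:\tmp`. The same pattern, with the state deciding what each token means, is what a parser derived from the kbmMW SAX parser has to implement for its own XML layout.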
The following is a sample SAX parser that both performs a rudimentary syntax check and parses XML files structured like the above XML example:
interface

uses
  SysUtils,             // Exception
  Classes,              // TStringStream
  Generics.Collections, // TObjectList<T>, used by TDefs
  kbmMWXML;
...
type
  TSAXState = (ssNone,ssDefaultBody,ssDefault,ssSymbol);

  TSAXParser = class(TkbmMWCustomSAXXMLParser)
  private
    FState:TSAXState;
    FDefs:TDefs;
  public
    constructor Create(const AString:string; const ADefs:TDefs);
    procedure Parse; override;
  end;

implementation

constructor TSAXParser.Create(const AString:string; const ADefs:TDefs);
var
  ss:TStringStream;
begin
  ss:=TStringStream.Create(AString);
  try
    inherited Create;
    SetStream(ss);
  finally
    ss.Free;
  end;
  FDefs:=ADefs;
  Parse;
end;

procedure TSAXParser.Parse;

  procedure Error(const AText:string; const AToken:string);
  begin
    raise Exception.Create('Error parsing XML - '+AText+' - '+AToken);
  end;

var
  def:TDef;
  symbol:string;
begin
  FState:=ssNone;
  def:=nil;
  while true do
  begin
    NextToken(FState=ssSymbol);
    case TokenType of
      mwxml_tEnd:
        begin
          break;
        end;
      mwxml_tSymbol:
        begin
          // Remember the text value until the matching closing tag arrives.
          symbol:=TokenString;
          // ssSymbol directly follows ssDefault in TSAXState, so this steps back to ssDefault.
          dec(FState);
          continue;
        end;
      mwxml_tLineEnd:
        begin
          continue;
        end;
      mwxml_tXMLTag:
        begin
          if IsClosingTag then
          begin
            if (FState=ssNone) and (TokenName='xml') then
              continue
            else if (FState=ssDefault) then
            begin
              if TokenName='Default' then
              begin
                // End of one <Default> record: hand the object over to the list.
                if def<>nil then
                begin
                  if FDefs<>nil then
                    FDefs.Add(def)
                  else
                    def.Free;
                end;
                def:=nil;
                FState:=ssDefaultBody;
              end
              else if TokenName='Defaultcode' then
                def.Defaultcode:=symbol
              else if TokenName='Omschrijving' then
                def.Omschrijving:=symbol
              else if TokenName='Waarde' then
                def.Waarde:=symbol
              else
                Error('Invalid closing tag',TokenName);
            end;
          end
          else
          begin
            if (FState=ssNone) and (TokenName='xml') then
              continue
            else if (FState=ssNone) and (TokenName='DefaultBody') then
              FState:=ssDefaultBody
            else if (FState=ssDefaultBody) and (TokenName='Default') then
            begin
              // Start of a new <Default> record.
              FState:=ssDefault;
              def:=TDef.Create;
            end
            else if (FState=ssDefault) then
            begin
              // A value tag was opened: the next token is expected to be its text.
              if TokenName='Defaultcode' then
                FState:=ssSymbol
              else if TokenName='Omschrijving' then
                FState:=ssSymbol
              else if TokenName='Waarde' then
                FState:=ssSymbol
              else
                Error('Unknown value',TokenName);
            end
            else
              Error('Invalid structure',TokenName);
          end;
        end;
      mwxml_tXMLComment:
        begin
          continue;
        end;
      mwxml_tXMLCDATA:
        begin
          continue;
        end;
    end;
  end;
end;

As you can see, all the grunt work happens within the Parse method. The code basically asks for the next token, figures out what to do with it, and updates a simple state machine to indicate how far down the example XML tree the parser is. It also checks whether the token is a tag or a symbol and, for a tag, whether it is an opening or a closing tag. A tag is what defines the name of each node in the XML file, e.g. <tag></tag>. The latter tag is a closing tag.
Similarly, a tag like <tag/>, which combines the opening and the closing tag in one and usually indicates a null value, can be checked for with IsEndTag. You can also test whether the value of the tag is null using the property IsNilTag. The above example has no tags of those types, so we do not check for them.
If a token is a tag, it can also have attributes, e.g. <tag attr1="1"></tag>. Those attributes are available within the parser in the Attribs property, or via the IndexOfAttrib or AttribValue methods.
A tag's namespace can be accessed via the NameSpace property, and if a datatype is defined on the tag, it can be accessed via the DataType property.
The SAX parser also detects declaration type tags, e.g. <?xml version="1.0"?>, which you can test for using IsDeclarationTag, and markup type tags, e.g. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">, which can be checked for with IsMarkupDeclarationTag.
Back to basics… the actual call that makes the parsing happen:
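As a rough sketch of how those two checks might be used, the fragment below could sit at the top of the mwxml_tXMLTag branch of the Parse method shown earlier, in place of comparing TokenName with 'xml'. It is a fragment, not a complete program, and it assumes that IsDeclarationTag and IsMarkupDeclarationTag are Boolean members that are valid right after NextToken has returned a tag token. Please verify that against the kbmMWXML source before relying on it.

```pascal
      mwxml_tXMLTag:
        begin
          // Assumption: both members are Booleans describing the current tag token.
          // Declarations such as <?xml ...?> and <!DOCTYPE ...> carry no data for
          // this parser, so they are skipped regardless of the current state.
          if IsDeclarationTag or IsMarkupDeclarationTag then
            continue;

          // ... the existing IsClosingTag / opening tag handling follows here ...
        end;
```

Compared to matching on the tag name, this would also let a DOCTYPE line pass instead of ending in the 'Invalid structure' error.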
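function ParseXML4(const AString:string):TDefs;
begin
  Result:=TDefs.Create();
  try
    TSAXParser.Create(AString,Result).Free;
  except
    // Parse raises on invalid XML; do not leak the half-filled list.
    Result.Free;
    raise;
  end;
end;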
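A short usage sketch, assuming the XML constant, TDef, TDefs and ParseXML4 from this post are in scope in a console application. DumpDefs is a hypothetical helper name used only for this illustration.

```pascal
// Hypothetical helper: parses the sample XML and lists what was found.
procedure DumpDefs;
var
  defs: TDefs;
  def: TDef;
begin
  defs := ParseXML4(XML);
  try
    // One line per <Default> node, in document order.
    for def in defs do
      Writeln(def.Defaultcode, ' = ', def.Waarde);
  finally
    // TObjectList<T> owns its items by default, so this also frees every TDef.
    defs.Free;
  end;
end;
```

The caller owns the returned list, so it is wrapped in try/finally here.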