kbmMW Pro, Enterprise and Community Edition also contain a 4th way to parse XML, namely SAX parsing.
It is, in fact, used automatically by the 3 previous ways shown in "Did you know? #1: 3 ways to parse XML".
A SAX parser is extremely fast and can handle huge XML files. However, it is more cumbersome to use: it does not automatically build an XML DOM for you, so you need to write code that targets the specific XML file.
But I think it could be of interest to some to see how you can derive your own specific parser from the kbmMW SAX XML parser.
We are again starting out with some XML to be parsed:
const XML: string =
  '<?xml version="1.0"?>'+
  '<DefaultBody>'+
  '<Default>'+
  ' <Defaultcode>XML_BOLIMPORT_MAP</Defaultcode>'+
  ' <Omschrijving>Folder waar de xml bestanden staan, die gemaakt zijn door bolmate vanuit de inkoop</Omschrijving>'+
  ' <Waarde>Z:\tmp</Waarde>'+
  '</Default>'+
  '<Default>'+
  ' <Defaultcode>XML_DESTUSED_MAP</Defaultcode>'+
  ' <Omschrijving>XMLBestanden die verplaatst worden als ze klaar zijn</Omschrijving>'+
  ' <Waarde>Z:\tmp\xmlexport</Waarde>'+
  '</Default>'+
  '</DefaultBody>';
And we still want it parsed into a list of TDef objects:
type
  TDef = class
  public
     Defaultcode:string;
     Omschrijving:string;
     Waarde:string;
  end;

  TDefs = class(TObjectList<TDef>);
Method 4 – SAX based parsing
First we need to make a SAX parser that matches the XML file we want to parse. It can be coded in different ways, with more or less strict syntax and error handling, since it is up to you to determine when there is a logic or syntax error in the XML data.
Since a SAX parser is basically a tokenizer, all you get from it are tokens. How they relate to each other is entirely up to you to figure out.
Typically it makes sense to have one or more state machines within your derived SAX parser to keep track of the structure, so you know where each token you get is supposed to be handled.
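To make the state machine idea concrete before looking at the kbmMW version, here is a minimal sketch that does not use kbmMW at all. The token type TToken, the token kinds, the states and the function CountItems are all made up for this illustration and are not part of kbmMW. The sketch walks a hand-built token list and uses the current state to decide whether each token is legal at that point.

```pascal
program StateMachineSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical token kinds, for illustration only (not kbmMW's token types).
  TTokenKind = (tkOpen, tkClose, tkText);

  TToken = record
    Kind: TTokenKind;
    Value: string; // tag name or text content
  end;

  // How deep in the expected document structure we currently are.
  TState = (stNone, stBody, stItem, stValue);

function Tok(AKind: TTokenKind; const AValue: string): TToken;
begin
  Result.Kind := AKind;
  Result.Value := AValue;
end;

// Counts completed items and raises on tokens that do not fit the state.
function CountItems(const ATokens: array of TToken): Integer;
var
  state: TState;
  i: Integer;
begin
  Result := 0;
  state := stNone;
  for i := Low(ATokens) to High(ATokens) do
    case ATokens[i].Kind of
      tkOpen:
        case state of
          stNone: state := stBody;
          stBody: state := stItem;
          stItem: state := stValue;
        else
          raise Exception.Create('Unexpected opening tag: ' + ATokens[i].Value);
        end;
      tkText:
        if state <> stValue then
          raise Exception.Create('Unexpected text: ' + ATokens[i].Value);
      tkClose:
        case state of
          stValue: state := stItem;
          stItem:
            begin
              Inc(Result); // one item fully read
              state := stBody;
            end;
          stBody: state := stNone;
        else
          raise Exception.Create('Unexpected closing tag: ' + ATokens[i].Value);
        end;
    end;
end;

var
  tokens: array[0..6] of TToken;
begin
  tokens[0] := Tok(tkOpen, 'DefaultBody');
  tokens[1] := Tok(tkOpen, 'Default');
  tokens[2] := Tok(tkOpen, 'Waarde');
  tokens[3] := Tok(tkText, 'Z:\tmp');
  tokens[4] := Tok(tkClose, 'Waarde');
  tokens[5] := Tok(tkClose, 'Default');
  tokens[6] := Tok(tkClose, 'DefaultBody');
  WriteLn(CountItems(tokens)); // one complete item in this token list
end.
```

The real parser follows the same pattern, except that the tokens come from the kbmMW tokenizer and the values are stored in TDef objects instead of being counted.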
The following is a sample SAX parser that both performs a rudimentary syntax check and parses XML files structured like the above XML example:
interface

uses
  System.SysUtils,  // Exception
  System.Classes,   // TStringStream
  kbmMWXML;

...

type
  TSAXState = (ssNone,ssDefaultBody,ssDefault,ssSymbol);

  TSAXParser = class(TkbmMWCustomSAXXMLParser)
  private
     FState:TSAXState;
     FDefs:TDefs;
  public
     constructor Create(const AString:string; const ADefs:TDefs);
     procedure Parse; override;
  end;

implementation

constructor TSAXParser.Create(const AString:string; const ADefs:TDefs);
var
   ss:TStringStream;
begin
     ss:=TStringStream.Create(AString);
     try
        inherited Create;
        SetStream(ss);
        FDefs:=ADefs;
        // Parse before the stream is released, in case the parser
        // reads from it while tokenizing.
        Parse;
     finally
        ss.Free;
     end;
end;

procedure TSAXParser.Parse;

   procedure Error(const AText:string; const AToken:string);
   begin
        raise Exception.Create('Error parsing XML - '+AText+' - '+AToken);
   end;

var
   def:TDef;
   symbol:string;
begin
     FState:=ssNone;
     def:=nil;
     while true do
     begin
          NextToken(FState=ssSymbol);
          case TokenType of
             mwxml_tEnd:
                begin
                     break;
                end;

             mwxml_tSymbol:
                begin
                     // Remember the value and step back from ssSymbol to ssDefault.
                     symbol:=TokenString;
                     dec(FState);
                     continue;
                end;

             mwxml_tLineEnd:
                begin
                     continue;
                end;

             mwxml_tXMLTag:
                begin
                     if IsClosingTag then
                     begin
                          if (FState=ssNone) and (TokenName='xml') then
                             continue
                          else if (FState=ssDefault) then
                          begin
                               if TokenName='Default' then
                               begin
                                    // One <Default> node is complete. Hand it over to the list.
                                    if def<>nil then
                                    begin
                                         if FDefs<>nil then
                                            FDefs.Add(def)
                                         else
                                            def.Free;
                                    end;
                                    def:=nil;
                                    FState:=ssDefaultBody;
                               end
                               else if TokenName='Defaultcode' then
                                  def.Defaultcode:=symbol
                               else if TokenName='Omschrijving' then
                                  def.Omschrijving:=symbol
                               else if TokenName='Waarde' then
                                  def.Waarde:=symbol
                               else
                                  Error('Invalid closing tag',TokenName);
                          end;
                     end
                     else
                     begin
                          if (FState=ssNone) and (TokenName='xml') then
                             continue
                          else if (FState=ssNone) and (TokenName='DefaultBody') then
                             FState:=ssDefaultBody
                          else if (FState=ssDefaultBody) and (TokenName='Default') then
                          begin
                               FState:=ssDefault;
                               def:=TDef.Create;
                          end
                          else if (FState=ssDefault) then
                          begin
                               // All three value tags are followed by a symbol token.
                               if TokenName='Defaultcode' then
                                  FState:=ssSymbol
                               else if TokenName='Omschrijving' then
                                  FState:=ssSymbol
                               else if TokenName='Waarde' then
                                  FState:=ssSymbol
                               else
                                  Error('Unknown value',TokenName);
                          end
                          else
                             Error('Invalid structure',TokenName);
                     end;
                end;

             mwxml_tXMLComment:
                begin
                     continue;
                end;

             mwxml_tXMLCDATA:
                begin
                     continue;
                end;
          end;
     end;
end;
As you can see, all the grunt work happens within the Parse method. The code basically asks for the next token, figures out what to do with it, and updates a simple state machine to indicate how far down the example XML tree the parser is. It also checks whether the token is a tag or a symbol, and whether a tag is an opening tag or a closing tag. A tag is what defines the name of each node in the XML file, e.g. <tag></tag>. The latter of the two is a closing tag.
Similarly, a tag like <tag/>, which combines the opening and the closing tag in one and usually indicates a null value, can be checked for with IsEndTag. You can also test whether the value of the tag is null using the property IsNilTag. In the above example we do not have any of those types of tags, so we do not check for them.
If a token is a tag, it can also have attributes, e.g. <tag attr1=1></tag>. Those attributes are available within the parser in the Attribs property, or via the IndexOfAttrib or AttribValue methods.
A tag's namespace can be accessed via the NameSpace property, and if a datatype is defined on the tag, it can be accessed via the DataType property.
The SAX parser also detects declaration type tags, e.g. <?xml version="1.0"?>, which you can test for using IsDeclarationTag, and markup type tags, e.g. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">, which can be checked for with IsMarkupDeclarationTag.
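The tag types mentioned above can be told apart by how the tag text starts and ends. The following is only a rough sketch of that distinction, working on raw tag text. TTagKind and ClassifyTag are names made up for this illustration and are not part of kbmMW, which does this for you and exposes the result through IsClosingTag, IsEndTag, IsDeclarationTag and IsMarkupDeclarationTag.

```pascal
program TagKindSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical classification, for illustration only.
  TTagKind = (tgOpening, tgClosing, tgSelfClosing,
              tgDeclaration, tgMarkupDeclaration);

const
  KindNames: array[TTagKind] of string =
    ('opening', 'closing', 'self-closing',
     'declaration', 'markup declaration');

// ATag is the raw tag text including the angle brackets, e.g. '<tag/>'.
// Comments and CDATA sections also start with '<!'. This sketch does not
// separate them out; it assumes they were filtered away beforehand.
function ClassifyTag(const ATag: string): TTagKind;
begin
  if Copy(ATag, 1, 2) = '<?' then
    Result := tgDeclaration
  else if Copy(ATag, 1, 2) = '<!' then
    Result := tgMarkupDeclaration
  else if Copy(ATag, 1, 2) = '</' then
    Result := tgClosing
  else if Copy(ATag, Length(ATag) - 1, 2) = '/>' then
    Result := tgSelfClosing
  else
    Result := tgOpening;
end;

begin
  WriteLn(KindNames[ClassifyTag('<tag>')]);
  WriteLn(KindNames[ClassifyTag('</tag>')]);
  WriteLn(KindNames[ClassifyTag('<tag/>')]);
  WriteLn(KindNames[ClassifyTag('<?xml version="1.0"?>')]);
  WriteLn(KindNames[ClassifyTag('<!DOCTYPE html>')]);
end.
```

In a derived kbmMW parser you would not look at the raw text like this. You would test the corresponding properties on the current token instead.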
Back to basics… the actual call to make the parsing happen:
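function ParseXML4(const AString:string):TDefs;
begin
     Result:=TDefs.Create();
     try
        // The constructor runs the parse, so the parser can be freed right away.
        TSAXParser.Create(AString,Result).Free;
     except
        // Do not leak the list if the XML turns out to be invalid.
        Result.Free;
        raise;
     end;
end;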