This demo is designed to show you how to use locator information. Locator information consists of the LineNumber, ColumnNumber and Ids (the public and system identifiers) of the document being parsed. For example, you may want to know where in the original document a specific element begins or ends. The locator information can help you to do this.
As the document starts, the parser fires an OnSetDocumentLocator event. In the event handler you can copy a pointer to the locator and check its properties as the document is parsed. The locator information is updated throughout the parse.
In this demo, we will skip all of the basic project setup information. If you need to know more about the project setup, please see the "Choosing a Parser Demo" or the "Simple Parsing Demo".
For this demo we will need to add a few components to the form. First we will need a ComboBox to choose our parser. Second we'll add a Listbox (Listbox1) for our document. Next we will need a Listbox (Listbox2) to show our element and error messages. Next we will need a button to start our parsing. We also want to add an OpenDialog so that we can search for the file. Finally we will need to add a TSAXContentHandler and a TSAXErrorHandler. The SAX components are installed to the SAX tab in the Component Palette by default.
Now that you have added all of the components let's set up some events.
Our FormCreate event will work exactly as it did in the "Choosing a Parser Demo".
The event handlers for the TSAXErrorHandler will work similarly to the ones in our "Simple Parsing Demo". The only difference is that we are going to record the line number in the Objects property of each Listbox item:
// In the errors we can use the locator info or the info that is sent
// as part of the ISAXParseError. The line and column numbers will generally
// be the same, but may be slightly different depending on the error
ListBox2.Items.AddObject('[Error] ' + Error.getMessage + ' Line ' +
  IntToStr(Error.getLineNumber) + ' Column ' +
  IntToStr(Error.getColumnNumber), Pointer(Error.getLineNumber));
We typecast the LineNumber, an Integer (4 bytes), as a Pointer (also 4 bytes on a 32-bit target) and must remember to typecast it back later. The Objects property is a great place to store small amounts of information (e.g. four bytes or less) so that you don't need to create a new object for each item.
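The typecast round trip can be tried outside the demo form. The following is a minimal console sketch, not part of the demo itself: `LineToTag` and `TagToLine` are hypothetical helper names, and the code assumes a compiler that defines `NativeInt` (recent Delphi or Free Pascal) so that the cast also compiles where pointers are wider than four bytes.

```pascal
program ObjectsSlotSketch;

{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes, SysUtils;

// Hypothetical helper: pack a line number into the Objects slot.
// NativeInt is pointer-sized, so the cast is valid on 32-bit and 64-bit.
function LineToTag(Line: Integer): TObject;
begin
  Result := TObject(NativeInt(Line));
end;

// Hypothetical helper: unpack the line number again.
function TagToLine(Tag: TObject): Integer;
begin
  Result := Integer(NativeInt(Tag));
end;

var
  Messages: TStringList;
begin
  Messages := TStringList.Create;
  try
    // No real object is created; the "object" is just the number 12.
    Messages.AddObject('[Element] book Line 12 Column 7', LineToTag(12));
    WriteLn(TagToLine(Messages.Objects[0]));
  finally
    // The list does not own the stored values, so nothing else is freed.
    Messages.Free;
  end;
end.
```

Because the stored value is not a real object, it must never be freed or dereferenced; it is only ever cast back to a number.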
Within the OnClick event for our Button we have also done things a little differently. First we load the document into our first Listbox. Then, instead of calling the parse method and sending the file name as we have in our other demos, we create an IStreamInputSource and read the data out of Listbox1:
procedure TForm1.Button1Click(Sender: TObject);
var
  Stream : TMemoryStream;
  Input : IInputSource;
  XMLReader : IXMLReader;
  XMLBufReader: IBufferedXMLReader;
  Vendor: TSAXVendor;
  ContentHandler: IContentHandler;
begin
  ListBox1.Items.Clear;
  Listbox2.Items.Clear;
  CurrLocator:= nil;
  ContentHandler:= SAXContentHandler1;
  if (OpenDialog1.Execute) then
  begin
    // Load the text into the list box
    ListBox1.Items.LoadFromFile(OpenDialog1.FileName);
    // Creating a memory stream to store the document might be bad because it
    // requires that we load the entire document into memory. If you have a
    // document that is 1 gigabyte large you may run out of memory. Of course, this
    // demo assumes you can load the whole document into the list box to begin
    // with, so it is unlikely you will be using a multi megabyte file.
    Stream:= TMemoryStream.Create;
    ListBox1.Items.SaveToStream(Stream);
    // We must reset the stream!
    Stream.Seek(0, 0);
    // Now we can create a StreamInputSource. We don't need to free the
    // Stream we are passing to it.
    Input:= TStreamInputSource.Create(Stream) as IStreamInputSource;
    // Get the Default SAX Vendor and XML Reader
    Vendor:= GetSAXVendor(ComboBox1.Items[ComboBox1.ItemIndex]);
    if Vendor is TBufferedSAXVendor then
    begin
      XMLBufReader:= TBufferedSAXVendor(Vendor).BufferedXMLReader;
      XMLBufReader.setContentHandler(Adapt(ContentHandler, XMLBufReader));
      XMLBufReader.setErrorHandler(SAXErrorHandler1);
      // This time we send the InputSource we created
      XMLBufReader.parse(Input);
      XMLBufReader:= nil;
    end
    else
    begin
      XMLReader:= Vendor.XMLReader;
      XMLReader.setContentHandler(ContentHandler);
      XMLReader.setErrorHandler(SAXErrorHandler1);
      // This time we send the InputSource we created
      XMLReader.parse(Input);
      XMLReader:= nil;
    end;
    CurrLocator:= nil;
  end;
end;
Also notice that we set CurrLocator to nil. CurrLocator is our temporary variable for the parser locator information. We declared this variable in the private section of our form:
private
  CurrLocator : ILocator;
end;
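The "We must reset the stream!" comment in the button handler deserves a closer look: after SaveToStream the stream position sits at the end of the data, so a reader that starts from the current position finds nothing left to read. The sketch below illustrates this with standard RTL classes only; `CountAfterRoundTrip` is a hypothetical helper, and a TStringList stands in for the parser as the reader.

```pascal
program StreamRewindSketch;

{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes, SysUtils;

// Hypothetical helper: writes Lines to a memory stream, optionally rewinds
// it, reads it back and returns how many lines the reader saw.
function CountAfterRoundTrip(const Lines: array of string;
  Rewind: Boolean): Integer;
var
  Src, Dst: TStringList;
  Stream: TMemoryStream;
  I: Integer;
begin
  Src := nil;
  Dst := nil;
  Stream := nil;
  try
    Src := TStringList.Create;
    Dst := TStringList.Create;
    Stream := TMemoryStream.Create;
    for I := Low(Lines) to High(Lines) do
      Src.Add(Lines[I]);
    Src.SaveToStream(Stream);   // position is now at the end of the data
    if Rewind then
      Stream.Position := 0;     // same effect as Stream.Seek(0, 0)
    Dst.LoadFromStream(Stream); // reads from the current position onward
    Result := Dst.Count;
  finally
    Stream.Free;
    Dst.Free;
    Src.Free;
  end;
end;

begin
  WriteLn('With rewind:    ', CountAfterRoundTrip(['<a>', '</a>'], True));
  WriteLn('Without rewind: ', CountAfterRoundTrip(['<a>', '</a>'], False));
end.
```

With the rewind the reader gets both lines back; without it the reader starts at the end of the stream. Unlike the demo, this sketch frees its own stream, because nothing else takes ownership of it.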
Now that we have created the parsing code we need to add the OnSetDocumentLocator and OnStartElement event handlers. Let's look at the OnSetDocumentLocator handler first:
procedure TForm1.SAXContentHandler1SetDocumentLocator(Sender: TObject;
  const Locator: ILocator);
begin
  // Save the locator, this happens before any other content events
  CurrLocator:= Locator;
end;
This is a very simple event handler. It is fired before any of our other content events. It allows us to assign our temporary locator variable before the document begins parsing. It is important to note that not all parsers report their locator information accurately, although most are close. This is an area of poor conformance. For example, SAX requires parsers to report the element start as the first character after the ">". Some parsers do not adhere to this rule. For this reason it is important to check your parser's accuracy before relying on the information.
Next we can look at our OnStartElement handler:
procedure TForm1.SAXContentHandler1StartElement(Sender: TObject;
  const NamespaceURI, LocalName, QName: WideString;
  const Atts: IAttributes);
begin
  // We will add an "[Element]" item to the list. We will store the
  // LineNumber in the objects property for easy access later
  ListBox2.Items.AddObject('[Element] ' + QName + ' Line ' +
    IntToStr(CurrLocator.LineNumber) + ' Column ' +
    IntToStr(CurrLocator.ColumnNumber), Pointer(CurrLocator.LineNumber));
end;
This event works very similarly to our various error handler events. Instead of obtaining the line and column number from the error object, however, we use our temporary locator variable.
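The pattern used by the two handlers, saving the locator once and reading it on every later event, can be shown in isolation. The sketch below is a toy simulation and does not use the SAX for Pascal units: `TToyLocator`, `TToyHandler` and `RunToyParse` are hypothetical names, and the "parser" is just a few lines of code that move the locator forward before raising each event.

```pascal
program LocatorSketch;

{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  // Hypothetical stand-in for the parser's locator.
  TToyLocator = class
  public
    LineNumber: Integer;
  end;

  // Hypothetical stand-in for the form with its two event handlers.
  TToyHandler = class
  private
    FLocator: TToyLocator;  // saved once, read on every event
  public
    Seen: string;
    procedure SetDocumentLocator(ALocator: TToyLocator);
    procedure StartElement(const Name: string);
  end;

procedure TToyHandler.SetDocumentLocator(ALocator: TToyLocator);
begin
  FLocator := ALocator;     // keep the reference, not a copy of the values
end;

procedure TToyHandler.StartElement(const Name: string);
begin
  // The saved reference reflects the parser's current position.
  Seen := Seen + Name + '@' + IntToStr(FLocator.LineNumber) + ' ';
end;

// Plays the role of the parser: sets the locator once, then updates the
// position before each event.
function RunToyParse: string;
var
  Locator: TToyLocator;
  Handler: TToyHandler;
begin
  Locator := TToyLocator.Create;
  Handler := TToyHandler.Create;
  try
    Handler.SetDocumentLocator(Locator);
    Locator.LineNumber := 1;
    Handler.StartElement('catalog');
    Locator.LineNumber := 2;
    Handler.StartElement('book');
    Result := Handler.Seen;
  finally
    Handler.Free;
    Locator.Free;
  end;
end;

begin
  WriteLn(RunToyParse);
end.
```

The handler never asks the parser for a new locator. It reads the same object each time and sees line 1 for the first element and line 2 for the second, which is how CurrLocator behaves in the demo.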
Finally, we will add a double-click event to our second listbox (ListBox2). This allows the user to double-click a message in the listbox and see the relevant position in the document.
procedure TForm1.ListBox2DblClick(Sender: TObject);
begin
  // Check if anything is selected
  if (ListBox2.ItemIndex = -1) then
    Exit;
  // Set the item index to the recorded position. Notice, we need to subtract
  // 1, because the line numbers reported by the document are "1" based and
  // the TListBox is "0" based
  ListBox1.ItemIndex:= Integer(ListBox2.Items.Objects[ListBox2.ItemIndex])-1;
end;
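The handler above trusts the recorded line number. SAX allows a locator or parse error to report -1 when no line number is available, and a parser may report a line past the end of what the list box holds, so a guarded conversion may be worth considering. The following is a minimal sketch; `LineToItemIndex` is a hypothetical helper and is not part of the demo.

```pascal
program LineIndexSketch;

{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical helper: converts a 1-based line number into a 0-based item
// index. Returns -1 (no selection) when the line is unknown or out of range.
function LineToItemIndex(Line, ItemCount: Integer): Integer;
begin
  if (Line >= 1) and (Line <= ItemCount) then
    Result := Line - 1
  else
    Result := -1;
end;

begin
  WriteLn(LineToItemIndex(1, 10));   // first line maps to index 0
  WriteLn(LineToItemIndex(10, 10));  // last line maps to index 9
  WriteLn(LineToItemIndex(-1, 10));  // unknown line: no selection
  WriteLn(LineToItemIndex(11, 10));  // past the end: no selection
end.
```

In the double-click handler this could be used as `ListBox1.ItemIndex:= LineToItemIndex(Integer(ListBox2.Items.Objects[ListBox2.ItemIndex]), ListBox1.Items.Count);`, which clears the selection instead of assigning an out-of-range index.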