In this tutorial I will show how to use Expat, an XML parser library, to parse XML data in your C/C++ applications on the Linux platform. I assume that you have installed Linux Mint on your PC, have basic knowledge of programming on Linux, and have already learned the basics of the XML data format, as covered in the previous tutorial. The current version of Linux Mint comes with the Expat XML parser pre-installed.

For the discussion in this tutorial I will use the following sample XML. Save the XML in a file called Catalog.xml.

The following program reads the Catalog.xml file and prints the name and value of each tag. Save the program in a file called ExpatParser.cpp and compile it with the command $> g++ ExpatParser.cpp -lexpat -o parser. Note the -lexpat option in the compile command, which links the Expat library. Run the program with the command $> ./parser and see the output for yourself.

This tutorial is not a complete guide to the Expat library. Instead, it gives an introduction so that readers, after completing it, will be able to explore the other facilities of Expat on their own. The names of the Expat API functions are largely self-explanatory; each API function used in the sample program is explained below.

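XML_Parser is the data type that acts as a handle to the parser. In the first line of the main function, a handle to the parser is created using the XML_ParserCreate() function. The argument to this function is the name of the character encoding. If NULL is passed, the encoding is taken from the document's XML declaration and defaults to UTF-8; otherwise the given encoding overrides the one declared in the document. Expat has built-in support for the following values: US-ASCII, UTF-8, UTF-16 and ISO-8859-1.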

The function XML_SetElementHandler() takes three arguments: the handle to the parser, the function to be called when the parser encounters the opening of a new XML tag, and the function to be called when the parser encounters the closing of that tag.

The function XML_SetCharacterDataHandler() takes the handle to the parser as its first argument and, as its second argument, the function to call for the values (the text content) inside the XML tags.

Now that the purpose of these Expat API functions has been described, let's look at the details of the local functions start, value and end. These three functions are callback functions, which the Expat library uses to inform the program about the progress of parsing.

The first argument of all three functions is, as its name implies, the user data: data that the programmer wants to pass to the callback functions. We will see how to pass a value for user data later.

The second argument to the start() function is the name of the tag that is currently being parsed, and the third argument is an array of attribute names and values, stored as alternating name and value strings and terminated by a NULL pointer. The arguments to the value() function are the user data, the value of the XML tag, and the length of that value; the value is not null-terminated, so the length must always be used. The arguments to the end() function are the user data and the name of the tag whose parsing is complete.

So far we have only set up the parser environment and have not yet started parsing any XML. The actual parsing is done in the call to XML_Parse(), which takes four arguments: the handle to the parser, the char array that contains the XML, the length of the XML in that array, and a flag that specifies whether this is the last part of the XML data.

The return value of XML_Parse() is checked against 0 (XML_STATUS_ERROR), which indicates failure; a non-zero return value means success. If the return value is 0, the error code and the corresponding error message are retrieved using the XML_GetErrorCode() and XML_ErrorString() functions respectively.

Finally, after the parsing is complete, XML_ParserFree() is called. It takes the handle to the parser as its only argument and releases all the resources associated with the parser.