Register Login

Converting XML to JSON using Python

Updated Jun 11, 2021

XML is a well-known markup language that provides data in an organized and easy-to-understand manner. The team designed this markup language to store data. It is case-sensitive and offers developers to establish markup elements and produce customized markup language.

XML does not have predefined tags. They are lightweight but time-consuming to write as compare to JSON. In this particular tutorial, you will learn how to convert an XML structure to JSON.

What JSON does?

JSON (JavaScript Object Notation), is an open-standard file format used for data interchange. It can store human-readable text. We can access and use it, transmit data objects as attribute-value pairs & arrays. Developers can use JSON in place of XML as JSON is in trend because of its heavy use, lightweight, easy-readable structure, and ease to design. Both JSON & XML uses the same concept to transfer data from client to server and vice versa. However, both have different ways to serve for the same cause.

Various methods to Convert XML to JSON

1. Using xmltodict Module:

Xmltodict is a popular Python module that can convert your XML structure to JSON structure. It makes working in XML Ease so that you feel like you are working with JSON. It is not a predefined module, and hence you need to install it using the pip install command.

Program:

import xmltodict
import json
xml_data = """
    <EthicalHacker>
    <ThreatAnalyst>
        <ID> TA01 </ID>
        <Name> Karlos Ray </Name>
        <Rating> 4.6 </Rating>
        <Dept> Intelligence Analyst Dept </Dept>
        <Available> Yes </Available>
    </ThreatAnalyst>
    <ThreatAnalyst>
        <ID> TA102 </ID>
        <Name> Sue </Name>
        <Rating> 4.0 </Rating>
        <Dept>
             <D1> Network Security </D1>
             <D2> Cloud systems </D2>
             <D3> Malware analysis </D3>
        </Dept>
        <Available> False </Available>
    </ThreatAnalyst>
</EthicalHacker>
"""
data = xmltodict.parse(xml_data)
# using json.dumps to convert dictionary to JSON
json_data = json.dumps(data, indent = 3)
print(json_data)

Explanation:

Here, you have to first import the xmltodict as well as the json module. Next, create an XML structure and put all of it inside a string (triple quoted). Create another object that will assign the result of xmltodict.parse() mothod. This method takes the XML string as a parameter. Next use json.dumps() to structure your converted or parsed data. Provide the indentation value as 3. Finally print that JSON data to see the JSON structure.

2. Using Xmljson Module:

Xmljson is another library that its contributors do not maintain actively. But it acts as an alternative to xmltodict and untangle. This library can help developers parse XML to JSON but using specific XML to JSON conventions. It allows converting XML to various Python objects (especially dictionary structures) like in JSON, tree, etc., and vice-versa.

Since it is not a predefined module, you have to install the xmljson module first using the pip install command.

pip install xmljson

 

Program:

from xmljson import badgerfish as bf

from json import dumps

from xml.etree.ElementTree import fromstring

from xmljson import parker

print(dumps(bf.data(fromstring('<EmployeeID> E101 </EmployeeID>')) ))

print(dumps(bf.data(fromstring('<EmployeeID id="A001"> Karlos <sal> 64,000 </sal> </EmployeeID>'))))

print(dumps(bf.data(fromstring('<EmployeeID id="A002"> Deeza <sal> 47,500 </sal> </EmployeeID>'))))

print(dumps(parker.data(fromstring('<x> <a> 101 </a> <b> 203 </b> </x>'), preserve_root = True)))

Explanation:

Here, you have to first import the badgerfish from the xmljson module. Here we alias it with bf. Also, import the json and xml.etree.ElementTree, for helping the conversion happen. Next, inside the print() function, you have to dump the baggerfish data taking the data using fromstring(). These are nested functions call one inside the other. Pass an XML format as a string in the fromstring(). Perform this 'n' number of times for multiple XML lines.

3. Using Reverse Expression (re) Module:

Regular Expression is a powerful feature most modern programming language provides. A RegEx, or Regular Expression, makes a sequence of characters forming a search pattern. Developers and programmers use regular expressions to validate whether a string contains a specific pattern or not. Python has a built-in module called the re module that allows developers to program regular expressions. The re module uses Unicode strings (str) as well as 8-bit strings (bytes) to perform its pattern-based search. We will use this re module to convert our XML code to the JSON structure.

Program:

import json

import re

def getdict(fileArg):

    res = re.findall("<(?P<var>\S*)(?P<attr>[^/>]*)(?:(?:>(?P<val>.*?)</(?P=var)>)|(?:/>))", fileArg)

    if len(res) >= 1:

        attreg="(?P<avr>\S+?)(?:(?:=(?P<quote>['\"])(?P<avl>.*?)(?P=quote))|(?:=(?P<avl1>.*?)(?:\s|$))|(?P<avl2>[\s]+)|$)"

        if len(res) > 1:

            return [{i[0]:[{"@attributes" : [{j[0]:(j[2] or j[3] or j[4])} for j in re.findall(attreg, i[1].strip())]}, {"$values":getdict(i[2])}]} for i in res]

        else:

            return {res[0]:[{"@attributes" : [{j[0]:(j[2] or j[3] or j[4])} for j in re.findall(attreg, res[1].strip())]}, {"$values":getdict(res[2])}]}

    else:

        return fileArg



with open("C:\\Users\\GauravKarlos\\PycharmProjects\\XmlJson\\data.xml", "r") as fil:

    print(json.dumps(getdict(fil.read())))

Explanation:

First, we have to import two modules, json and re. Now, create a user-defined function getdict() with a parameter fileArg. Now use the findall() method of the re module to match the xml pattern having the marker on both sides and some string or variable in between. The return statement is converting those XML structures to JSON structures if the result is greater than 1. Lastly, open a separate XML file which you want to take as input and open it in the Read mode. Then, use the json.dumps() and pass the regular expression function getDict() and pass the file as an argument to it.

4. Using lxml2json Module:

lxml2json is a package of Python that helps in converting an XML structure to its JSON equivalent & vice versa. It allows various options to convert the data to the desired format. For implementing this, you have to import the package before using it in your program.

Program:

from lxml2json import convert

from pprint import pprint as pp

xml_data = """

    <EthicalHacker>

    <ThreatAnalyst>

        <ID> TA01 </ID>

        <Name> Karlos Ray </Name>

        <Rating> 4.6 </Rating>

        <Dept> Intelligence Analyst Dept </Dept>

        <Available> Yes </Available>

    </ThreatAnalyst>

    <ThreatAnalyst>

        <ID> TA102 </ID>

        <Name> Sue </Name>

        <Rating> 4.0 </Rating>

        <Dept>

             <D1> Network Security </D1>

             <D2> Cloud systems </D2>

             <D3> Malware analysis </D3>

        </Dept>

        <Available> False </Available>

    </ThreatAnalyst>

</EthicalHacker>

"""

d = convert(xml_data)

pp(d)

Explanation:

Here, you have to first import the lxml2json module. Also import the pprint module. Next, create an XML structure and put all of it inside a string (triple quoted). Use connvert() function to convert your xml_data string to JSON structure. Finally, print that returned object (d) to see the JSON structure.

Conclusion:

Among these, the regular expression is the fastest because all of its modules are implicitly existing. Xmltodict is the most popular of them all but is slightly slower but has more functionality. It is recommended not to use the xmljson module as it requires some other modules to install and work in conjunction making your program slower. lxml2json is another package you can use as an alternative to convert XML to the JSON structure.


×