Skip to content Skip to sidebar Skip to footer

In Hive, How To Read Through Null / Empty Tags Present Within An Xml Using Explode(xpath(..)) Function?

In below Hive-query, I need to read the null / empty 'string' tags as well, from the XML content. Only the non-null 'string' tags are getting considered within the XPATH() list now

Solution 1:

The problem here is that XPATH returns NodeList and if it contains empty node, it is not included in the list.

Concatenation with some string (in XPATH): concat(/Value/string/text()," ") does not work here:

Caused by: javax.xml.xpath.XPathExpressionException: com.sun.org.apache.xpath.internal.XPathException: Can not convert #STRING to a NodeList!

at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:195)

Easy solution is to replace <string></string> and <string/> with <string>NULL</string> and then you can convert 'NULL' string to null.

Demo:

with your_data as (
select  '<ParentArray><ParentFieldArray><Name>ABCD</Name><Value><string>111</string><string></string><string>222</string></Value></ParentFieldArray><ParentFieldArray><Name>EFGH</Name><Value><string/><string>444</string><string></string><string>555</string></Value></ParentFieldArray></ParentArray>' as xmlinfo
)

select name, case when value='NULL' then null else value end value
  from (select regexp_replace(xmlinfo,'<string></string>|<string/>','<string>NULL</string>') xmlinfo 
          from your_data d
       ) d
       lateral view outer explode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as  Name
       lateral view outer explode(XPATH(xmlinfo, concat('ParentArray/ParentFieldArray[Name="', pf.Name, '"]/Value/string/text()'))) vl as value

Result:

name    value
ABCD    111
ABCD    
ABCD    222
EFGH    
EFGH    444
EFGH    
EFGH    555

Post a Comment for "In Hive, How To Read Through Null / Empty Tags Present Within An Xml Using Explode(xpath(..)) Function?"