In Hive, How To Read Through Null / Empty Tags Present Within An Xml Using Explode(xpath(..)) Function?
In below Hive-query, I need to read the null / empty 'string' tags as well, from the XML content. Only the non-null 'string' tags are getting considered within the XPATH() list now
Solution 1:
The problem here is that XPATH
returns NodeList and if it contains empty node, it is not included in the list.
Concatenation with some string (in XPATH): concat(/Value/string/text()," ")
does not work here:
Caused by: javax.xml.xpath.XPathExpressionException: com.sun.org.apache.xpath.internal.XPathException: Can not convert #STRING to a NodeList!
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:195)
Easy solution is to replace <string></string>
and <string/>
with <string>NULL</string>
and then you can convert 'NULL' string to null.
Demo:
with your_data as (
select '<ParentArray><ParentFieldArray><Name>ABCD</Name><Value><string>111</string><string></string><string>222</string></Value></ParentFieldArray><ParentFieldArray><Name>EFGH</Name><Value><string/><string>444</string><string></string><string>555</string></Value></ParentFieldArray></ParentArray>' as xmlinfo
)
select name, case when value='NULL' then null else value end value
from (select regexp_replace(xmlinfo,'<string></string>|<string/>','<string>NULL</string>') xmlinfo
from your_data d
) d
lateral view outer explode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as Name
lateral view outer explode(XPATH(xmlinfo, concat('ParentArray/ParentFieldArray[Name="', pf.Name, '"]/Value/string/text()'))) vl as value
Result:
name value
ABCD 111
ABCD
ABCD 222
EFGH
EFGH 444
EFGH
EFGH 555
Post a Comment for "In Hive, How To Read Through Null / Empty Tags Present Within An Xml Using Explode(xpath(..)) Function?"