caracolina

Life after Work

XMLParse and UTF-8 encoding

I know this is supposed to be an AFTER work blog, but I don’t really know where else to keep track of this stuff, and it might come in handy again. Besides, I might end up changing the “life after work” tagline at some point anyway.

So, here I am trying to xmlParse an incoming data feed with location and geocode information. However, the feed sometimes contains foreign (non-ASCII) characters, and those don’t come through correctly. Now, as far as I know, xmlParse is supposed to handle UTF-8 by default, and the XML data clearly is UTF-8 encoded, so what is going on?

At work I couldn’t come up with anything on Google, but I just tried again from home and, bingo! Instead of xmlParse(cfhttp.filecontent) it needed to be xmlParse(cfhttp.filecontent.toString("UTF-8")). Apparently, cfhttp.filecontent is usually a string, but in some cases it is an object of the Java class java.io.ByteArrayOutputStream. The reason is that CFHTTP looks at the MIME type of the response, and if it doesn’t recognize it, it returns the raw bytes as a ByteArrayOutputStream instead of a string. Calling toString("UTF-8") on that object decodes those bytes explicitly as UTF-8 rather than relying on a default character set.

Therefore, if you don’t know which of the two you’re getting, try this:

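<!--- CFHTTP hands back a ByteArrayOutputStream when it doesn't recognize the MIME type --->
<cfif cfhttp.filecontent.getClass().getName() IS "java.io.ByteArrayOutputStream">
     <!--- Raw bytes: decode them explicitly as UTF-8 before parsing --->
     <cfset xmlResult = xmlParse(cfhttp.filecontent.toString("UTF-8"))>
<cfelse>
     <!--- Already a string: parse it directly --->
     <cfset xmlResult = xmlParse(cfhttp.filecontent)>
</cfif>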
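Since the object in question is a plain java.io.ByteArrayOutputStream, the fix really comes down to one Java method call, toString(String charsetName). Here is a minimal Java sketch of why naming the charset matters. The class and method names (Utf8Decode, utf8Buffer, decode) and the sample "São Paulo" fragment are made up for illustration, and ISO-8859-1 is just an example of a wrong charset, not a claim about what ColdFusion uses by default.

```java
import java.io.ByteArrayOutputStream;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class Utf8Decode {

    // Hypothetical helper: puts UTF-8 bytes into a ByteArrayOutputStream,
    // roughly what CFHTTP returns when it doesn't recognize the MIME type.
    public static ByteArrayOutputStream utf8Buffer(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        buffer.write(bytes, 0, bytes.length);
        return buffer;
    }

    // Same Java method that filecontent.toString("UTF-8") calls from CFML:
    // decode the buffered bytes with an explicitly named charset.
    public static String decode(ByteArrayOutputStream buffer, String charsetName) {
        try {
            return buffer.toString(charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical feed fragment; the escape is the "a with tilde" in Sao Paulo
        ByteArrayOutputStream buffer = utf8Buffer("<location>S\u00e3o Paulo</location>");

        // Right charset: the tilde survives intact
        String good = decode(buffer, "UTF-8");
        // Wrong charset: each of the two UTF-8 bytes becomes its own character
        String bad = decode(buffer, "ISO-8859-1");

        System.out.println(good.equals("<location>S\u00e3o Paulo</location>"));
        System.out.println(bad.equals("<location>S\u00c3\u00a3o Paulo</location>"));
    }
}
```

Both lines print true: decoding as UTF-8 gives back the original text, while the ISO-8859-1 decode turns the single "ã" into the two characters "Ã£", which is the kind of garbling you get when the bytes are read with the wrong charset.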

Here is my source, way better explained. Thanks, Roger!
