Invalid XML being generated?

Posted: Fri Nov 06, 2009 11:13 pm
by dooferlad
Hi,

I am getting the following in my XMLTV data file, which a parser doesn't like:

<programme start="20091110170000 +0000" stop="20091110180000 +0000" channel="UK_Digi_88">
    <title>Sen in Action</title>
    <sub-title>10 Nov 09</sub-title>
    <desc>Primary - Tackling Challenging Behaviour 1; &2; Research and Development in SEN - Movement; Secondary - Working with Pupils with Down&apos;s Syndrome: an hour featuring a range of SEN strategies.</desc>
    <category>Education</category>
</programme>

I think the problem is the &2;, which I'm guessing isn't a valid escape sequence (XML entity names can't start with a digit). That sequence of characters is in the original program description, so the grabber just needs to escape it.
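A minimal sketch of the escaping the grabber could do, written in Python since a Python cleaner comes up later in the thread. It assumes the only defect is bare `&` characters: any `&` that does not begin one of XML's five predefined entities or a numeric character reference is rewritten as `&amp;`. The function name `escape_stray_ampersands` is hypothetical, not part of any grabber.

```python
import re

# An '&' is left alone only if it starts one of XML's five predefined
# entity references or a decimal/hex character reference.
_VALID_REF = r"(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);"
_STRAY_AMP = re.compile(r"&(?!" + _VALID_REF + r")")


def escape_stray_ampersands(text: str) -> str:
    """Replace every '&' that is not a valid XML reference with '&amp;'."""
    return _STRAY_AMP.sub("&amp;", text)


desc = "Tackling Challenging Behaviour 1; &2; Down&apos;s Syndrome"
print(escape_stray_ampersands(desc))
# Tackling Challenging Behaviour 1; &amp;2; Down&apos;s Syndrome
```

Escaping keeps the original characters visible to the viewer ("&2;" is displayed as typed). Note that it also turns undefined names such as `&nbsp;` into literal text rather than a space.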

Re: Invalid XML being generated?

Posted: Wed Nov 18, 2009 10:51 am
by routerunner
Hi,

I'm having a very similar issue with the XMLTV importer I'm using, which doesn't like the "&nbsp" escape sequence that appeared a few days ago in the DigiGuide data. This sequence is not valid in an XML file (it's an HTML entity that XML doesn't define), and I think the grabber should just skip it.

For now, I manually remove the unwanted escape sequence from the EPG every day :?
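Instead of deleting `&nbsp;` by hand, a post-processing step could rewrite HTML-only named entities as numeric character references, which XML does accept. This is a minimal Python sketch using the standard `html.entities.name2codepoint` table; the function name `html_entities_to_xml` and the `drop_unknown` parameter are hypothetical.

```python
import re
from html.entities import name2codepoint

# Entity names XML predefines itself; these must be left untouched.
_XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
# A named reference: '&', a name starting with a letter, then ';'.
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_entities_to_xml(text: str, drop_unknown: bool = True) -> str:
    """Rewrite HTML-only named entities (e.g. &nbsp;) as numeric references.

    Predefined XML entities are kept as-is. Names HTML doesn't know either
    are removed when drop_unknown is True, otherwise left alone.
    """
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in _XML_PREDEFINED:
            return match.group(0)
        if name in name2codepoint:
            return "&#%d;" % name2codepoint[name]
        return "" if drop_unknown else match.group(0)

    return _NAMED_REF.sub(replace, text)


print(html_entities_to_xml("News&nbsp;at Ten &amp; Weather"))
# News&#160;at Ten &amp; Weather
```

If you'd rather skip `&nbsp;` entirely, as suggested above, special-case that name to return `" "` or `""`.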

Regards

Re: Invalid XML being generated?

Posted: Wed Nov 18, 2009 5:17 pm
by routerunner
Hi,

I found a solution (actually two) to our problem:

1) In the grabber tab, enable the XMLTV Importer alongside UK_digi, with UK_digi at the higher priority. Point the XMLTV Importer's input file at the same file UK_digi writes as its output. After UK_digi has done its job, the XMLTV Importer re-imports your output file and cleans up any "unsupported" markup automagically.

2) I did a bit of Lua study today and managed to create a new postprocessor script that does the same cleanup after UK_digi runs. It works better because the first solution is not bulletproof: in some cases the XMLTV Importer didn't import the whole block of data, so you tend to lose the last programs in the file. I tried to attach the script here, but the forum doesn't allow .lua attachments.
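Since the re-import in solution 1 can silently drop the last programs, a quick sanity check after either approach may be worthwhile. A minimal Python sketch, assuming standard XMLTV layout with `<programme>` elements under the root `<tv>` element; all function names are hypothetical. The raw count uses a regex because the uncleaned file may not parse at all.

```python
import re
import xml.etree.ElementTree as ET

# Matches opening <programme ...> tags (not </programme>) in raw bytes,
# so it still works on a file that is not well-formed.
_PROGRAMME_TAG = re.compile(rb"<programme[\s>/]")


def count_raw(data: bytes) -> int:
    """Count <programme> tags with a regex; tolerant of broken XML."""
    return len(_PROGRAMME_TAG.findall(data))


def count_parsed(data: bytes) -> int:
    """Parse the XMLTV data and count <programme> elements.

    Raises xml.etree.ElementTree.ParseError if the data is still not
    well-formed, i.e. the cleanup did not work.
    """
    root = ET.fromstring(data)
    return len(root.findall(".//programme"))


def programmes_lost(raw: bytes, cleaned: bytes) -> int:
    """How many programmes went missing between the raw and cleaned data."""
    return count_raw(raw) - count_parsed(cleaned)
```

Read both files in binary mode (`open(path, "rb").read()`) so the parser honours the file's own encoding declaration; a non-zero result suggests the data was truncated.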

Regards
Edo

Re: Invalid XML being generated?

Posted: Wed Nov 18, 2009 5:50 pm
by dooferlad
I don't know Lua and didn't have much time, so I wrote a Python script that throws away any invalid XML escape sequences. Now I call the grabber through a batch script that runs the XML cleaner script after the XMLTV file is generated. It's a bit messy, but it seems to be effective so far. Clearly a Lua version would be best! Since it isn't a real fix for the problem, and having to run a Python script is a bit of a nasty hack, I didn't want to post it here, but since it is up on the SageTV forums...

http://forums.sagetv.com/forums/showthr ... tcount=823

I really should have given it a license. If anyone wants to hack on it, I can put it under a Creative Commons Non-Commercial Share-Alike license.
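The actual script is only available via the link above; what follows is an independent, minimal Python sketch of the same post-processing idea, not the posted code. It keeps valid references, throws away invalid ones such as `&2;` or `&nbsp;`, and escapes any lone `&`. It assumes a UTF-8 file with no CDATA sections; `clean_references`, `clean_file`, and `clean_xmltv.py` are hypothetical names.

```python
import os
import re
import sys

# A complete, valid XML reference: a predefined entity or a character reference.
_VALID_REF = re.compile(r"&(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);")
# Any '&', optionally followed by something shaped like a reference (&2;, &nbsp;).
_AMP = re.compile(r"&([#A-Za-z0-9]*;)?")


def clean_references(text: str) -> str:
    """Keep valid references, drop invalid ones, and escape lone '&'."""
    def fix(match: re.Match) -> str:
        if _VALID_REF.fullmatch(match.group(0)):
            return match.group(0)  # e.g. &amp; or &#160;
        if match.group(1) is not None:
            return ""  # invalid reference such as &2; or &nbsp;
        return "&amp;"  # bare ampersand, as in "Tom & Jerry"

    return _AMP.sub(fix, text)


def clean_file(path: str) -> None:
    """Rewrite an XMLTV file in place (assumes it is UTF-8 encoded)."""
    with open(path, encoding="utf-8") as f:
        cleaned = clean_references(f.read())
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(cleaned)
    os.replace(tmp_path, path)  # swap the cleaned copy in over the original


if __name__ == "__main__":
    # Usage: python clean_xmltv.py file1.xml [file2.xml ...]
    for xml_path in sys.argv[1:]:
        clean_file(xml_path)
```

From the batch file, run it once the grabber has finished, e.g. `python clean_xmltv.py output.xml`. Writing to a temporary file and then calling `os.replace` means a crash mid-write doesn't leave a half-written guide behind.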

Re: Invalid XML being generated?

Posted: Wed Nov 18, 2009 7:50 pm
by routerunner
Hi,

Have a go at solution #1; it avoids the Python script.

Edo