I avoid XML like the plague. I am not a programmer, so configuration files and software that use XML are anathema to me. And where I have to use it, like in Openbox’s rc.xml and menu.xml files, I look for just about any way out of it.
xidel describes itself as a tool that will “download and extract data from HTML/XML pages.” The home page supplies quite a few examples of that.
Yes, xidel can retrieve web pages, and yes, xidel can extract the data that’s embedded in them, so you don’t have to pick through it to find what you need.
But it can also sift through configuration files and pull out, for example, the programs executed in an Openbox menu.xml file.
For someone like me, who considers XML to be cruel and unusual punishment, that is a very nifty trick. The next time I need to switch window managers and want to convert the list of keybindings I know, xidel will be there to expedite it.
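For anyone curious what that menu.xml trick boils down to, here is a rough sketch in Python's standard library rather than xidel itself. The sample menu, the `executed_programs` name, and the assumption that commands live in `<execute>` tags are all mine, for illustration only; a real menu.xml may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down Openbox-style menu used only for this sketch.
SAMPLE_MENU = """
<openbox_menu xmlns="http://openbox.org/3.4/menu">
  <menu id="root-menu" label="Openbox 3">
    <item label="Terminal">
      <action name="Execute"><execute>xterm</execute></action>
    </item>
    <item label="Browser">
      <action name="Execute"><execute>firefox</execute></action>
    </item>
    <item label="Exit">
      <action name="Exit"/>
    </item>
  </menu>
</openbox_menu>
"""


def executed_programs(xml_text):
    """Return the text of every <execute> element, in document order."""
    root = ET.fromstring(xml_text.strip())
    programs = []
    for elem in root.iter():
        # ElementTree stores tags as "{namespace}name"; keep only the name part
        # so the sketch works whether or not the file declares a namespace.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "execute" and elem.text:
            programs.append(elem.text.strip())
    return programs


for program in executed_programs(SAMPLE_MENU):
    print(program)
```

The point is only to show the shape of the job: walk the tree, keep the tags you care about, and ignore the rest. xidel does the same sort of filtering from the command line, without any of the scaffolding.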
At this point you might ask, “What’s the benefit of this over an HTML stripper, perhaps like dehtml?”
Mostly in its flexibility, I would answer. dehtml yanks the core text out of an HTML page, but xidel allows you to filter or search through a file, and control the output.
I’m definitely no expert, but it only took me about 20 minutes with a few examples to get xidel working how I wanted. If you need to wrangle XML pages on a regular basis (and I feel bad for you if you do), I’m sure you can get xidel to work on your project in a matter of minutes.
Spend a little time with the parser documentation, and you’ll see how you can send extracted data into variables, loop through documents for specific tags, and otherwise make your life sooo much easier.
I like it when a program makes my life easier. 😀
How does one avoid the plague? 😉
Check also html-xml-tools and the command line tools of ltxml2. Lxprintf is love.
Do you have any links for those? I just skimmed quickly for them and I think I found a home page for ltxml, but didn’t see anything for html-xml-tools. Thanks. 😉
You can find ltxml2 here: http://www.ltg.ed.ac.uk/software/ltxml2
And here you'll find a README along with tarballs of html-xml-tools, a toolchain I still call html-xml-tools although everywhere else it's called html-xml-utils: http://www.w3.org/Tools/HTML-XML-utils/
Check the man1 directory; the man pages are well done, imho, so you'll easily grasp which tool does what. There was also a page from the author of that software, but I forgot where the hell it is. It is in Debian and AUR, too.
Thanks! I’ll add these to my list for the next swing through. Cheers!