XML Parsing: Stripping comments from an XML file

Saturday, December 18, 2010

In this post, we will see how to strip comments from an XML file in a UNIX shell using the sed command. I am writing this because I was recently developing an XML parser, and the problem was that content got parsed whether or not it was commented out. The best way to avoid that is to strip comments before parsing the XML. After some research, I found the following sed command, which does the trick.

cat <XML_file> | sed 's/<!--.*-->//' | sed '/<!--/,/-->/d'

The code has two piped sed commands. The first sed command:

sed 's/<!--.*-->//' - Matches every comment that starts and ends on the same line and replaces it with an empty string. I can't use sed's delete command (d) here because it removes the whole line, so any uncommented XML content sharing a line with a comment would be deleted too. Substitution prevents that. The only disadvantage of this approach is that a line containing nothing but a comment is left behind as an empty line rather than removed. Also note that .* is greedy: if one line holds two comments, everything from the first <!-- to the last --> is removed, including any content between them.
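To make the difference between substituting and deleting concrete, here is a small sketch run against a single made-up line. The variable names (line, kept, lost) are only for illustration and are not part of the original command.

```shell
# A line holding both live XML and a trailing comment (made-up sample).
line='<tag1>tag1name</tag1> <!-- I am tag 1 -->'

# Substitution removes only the comment text; the tag survives,
# along with the space that preceded the comment.
kept=$(printf '%s\n' "$line" | sed 's/<!--.*-->//')
printf '[%s]\n' "$kept"      # prints [<tag1>tag1name</tag1> ]

# The delete command drops the entire matching line, tag included.
lost=$(printf '%s\n' "$line" | sed '/<!--.*-->/d')
printf '[%s]\n' "$lost"      # prints []
```

The square brackets are there only to make the trailing space and the empty result visible.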

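The second sed command, sed '/<!--/,/-->/d', uses sed's address-range syntax: it matches a start pattern and an end pattern and deletes every line from the one where the start pattern matches through the one where the end pattern matches. Because whole lines are deleted, any XML content on the same line as the opening <!-- or the closing --> of a multi-line comment is removed as well.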
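As a rough end-to-end sketch, the two commands can be wrapped in a shell function and run against a small file. The function name strip_xml_comments, the temporary file, and the sample content are all assumptions made up for this illustration.

```shell
# Hypothetical helper: strip comments from the file(s) named as arguments,
# or from standard input when no file is given.
strip_xml_comments() {
    sed 's/<!--.*-->//' "$@" | sed '/<!--/,/-->/d'
}

# Made-up sample input, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<!-- Sample XML to show XML comment stripping -->
<tag1>tag1name</tag1> <!-- I am tag 1 -->
<!-- Commenting out tag 2 to tag 3
<tag2>tag2name</tag2>
<tag3>tag3name</tag3>
-->
<tag4>tag4name</tag4>
EOF

# Run both passes and show what is left.
result=$(strip_xml_comments "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

With this input, the first line is left as an empty line, the tag1 line keeps its tag (plus a trailing space), the four lines of the multi-line comment disappear, and the tag4 line passes through untouched.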

Simple, isn't it?


<!-- Sample XML to show XML comment stripping -->

<tag1>tag1name</tag1> <!-- I am tag 1 -->

<!-- Commenting out tag 2 to tag 3

<!-- This is tag 4 -->

$ cat sample.xml | sed 's/<!--.*-->//' | sed '/<!--/,/-->/d'





Copyright © 2016 Prasanna Seshadri, www.prasannatech.net, All Rights Reserved.
No part of the content or this site may be reproduced without prior written permission of the author.