A quick reference to HTML/URL escape/unescape methods in Python

Sunday, July 1, 2012

This post briefly covers some of the escape/unescape methods available in Python for HTML special characters, along with ways to quote/unquote characters using the urllib methods. Though straightforward, these are common tasks when generating or parsing HTML/XML documents and processing URLs.
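
The two kinds of escaping solve different problems and are often needed together. As a minimal sketch (written for Python 3, where `html.escape` and `urllib.parse.quote` are the standard-library calls), the hypothetical helper `build_link` below URL-quotes a query value and HTML-escapes the visible label. The helper name and the example.com URL are assumptions made for illustration only.

```python
from html import escape
from urllib.parse import quote


def build_link(base_url, query, label):
    """Return an <a> element as a string (hypothetical helper).

    The query value is URL-quoted so it is safe inside the URL;
    the href and the label are HTML-escaped so they are safe in markup.
    """
    # safe="" also quotes "/" which quote() leaves alone by default
    href = base_url + "?q=" + quote(query, safe="")
    return '<a href="%s">%s</a>' % (escape(href, quote=True), escape(label))


print(build_link("http://example.com/search", "a+b & c", "a+b & c <results>"))
```

The query ends up percent-encoded inside the href, while the label keeps its text but has `&`, `<` and `>` replaced by entities.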


"""
   Escape/unescape URL encode/unquote methods in python
   Shows the use of different methods available for these tasks
   
   Note: For a detailed description of the differences between
   these methods, refer to the respective Python docs/wiki. The purpose
   of this code is to serve as a quick reference for these common,
   repetitive tasks
   
   Author: S.Prasanna
"""

import xml.sax.saxutils
import HTMLParser
import urllib2

# Escape <, > and &
char_list = ["<", ">", "&"]
html_char_list = ["&lt;", "&gt;", "&amp;"]

# Escape/unescape through xml.sax.saxutils escape/unescape methods
for item in char_list:
    print "escape(%s) = %s" % (item, xml.sax.saxutils.escape(item))

for item in html_char_list:
    print "unescape(%s) = %s" % (item, xml.sax.saxutils.unescape(item))

# Unescape through HTMLParser object's method
htmlparser_obj = HTMLParser.HTMLParser()
for item in html_char_list:
    print "unescape(%s) = %s" % (item, htmlparser_obj.unescape(item))

# URL encodings (urllib2 exposes urllib's quote/unquote functions)
url_char_list = [";", " ", "+", ","]

for item in url_char_list:
    print "quote(%s) = %s, unquote(%s) = %s" %\
          (item, urllib2.quote(item), urllib2.quote(item), urllib2.unquote(urllib2.quote(item)))

Sample Output:

>>>
escape(<) = &lt;
escape(>) = &gt;
escape(&) = &amp;
unescape(&lt;) = <
unescape(&gt;) = >
unescape(&amp;) = &
unescape(&lt;) = <
unescape(&gt;) = >
unescape(&amp;) = &
quote(;) = %3B, unquote(%3B) = ;
quote( ) = %20, unquote(%20) =
quote(+) = %2B, unquote(%2B) = +
quote(,) = %2C, unquote(%2C) = ,
>>>
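
The listing above targets Python 2. For readers on Python 3, here is a rough equivalent as a sketch: `print` becomes a function, `html.unescape` stands in for the `HTMLParser` object's `unescape` method (which was removed in Python 3.9), and `quote`/`unquote` live in `urllib.parse`. The variable `quoted` is introduced here only for readability.

```python
import html
import urllib.parse
import xml.sax.saxutils

char_list = ["<", ">", "&"]
html_char_list = ["&lt;", "&gt;", "&amp;"]

# Escape through xml.sax.saxutils (same module as in Python 2)
for item in char_list:
    print("escape(%s) = %s" % (item, xml.sax.saxutils.escape(item)))

# Unescape through html.unescape instead of HTMLParser().unescape
for item in html_char_list:
    print("unescape(%s) = %s" % (item, html.unescape(item)))

# URL quoting/unquoting through urllib.parse
url_char_list = [";", " ", "+", ","]
for item in url_char_list:
    quoted = urllib.parse.quote(item)
    print("quote(%s) = %s, unquote(%s) = %s"
          % (item, quoted, quoted, urllib.parse.unquote(quoted)))
```

For these inputs the quoted and unquoted values should match the sample output shown above.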


Copyright © 2016 Prasanna Seshadri, www.prasannatech.net, All Rights Reserved.
No part of the content or this site may be reproduced without prior written permission of the author.