Sunday, January 13, 2013

HTML Encoding in ASP.NET

There are certain characters that have a special meaning in HTML. For example, the angle brackets (< >) are always used to create tags. This can cause problems if you actually want to use these characters as part of the content of your web page. For example, imagine you want to display this text on a web page:

Enter a word <here>

If you try to write this information to a page or place it inside a control, you end up with this instead:

Enter a word

The problem is that the browser has tried to interpret the <here> as an HTML tag. A similar problem occurs if you actually use valid HTML tags. For example, consider this text:

To bold text use the <b> tag.

Not only will the text <b> not appear, but the browser will interpret it as an instruction to make the text that follows bold. To circumvent this automatic behavior, you need to convert potentially problematic values to their HTML equivalents. For example, < becomes &lt; in your final HTML page, which the browser displays as the < character.

Some special characters that need to be encoded:

Result    Description            Encoded Entity
          Nonbreaking space      &nbsp;
<         Less-than symbol       &lt;
>         Greater-than symbol    &gt;
&         Ampersand              &amp;
"         Quotation mark         &quot;
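
As a rough sketch, the substitutions in the table can be done by hand with plain string replacement. ManualHtmlEncoder and its Encode() method are hypothetical names made up for this illustration, not part of ASP.NET, and the sketch covers only the five characters listed above.

```csharp
using System;

public static class ManualHtmlEncoder
{
    // Hypothetical helper (not an ASP.NET API): replaces the characters
    // from the table with their HTML entities.
    public static string Encode(string text)
    {
        // The ampersand is replaced first. Otherwise the "&" inside the
        // entities produced by the later replacements would be encoded again.
        return text
            .Replace("&", "&amp;")
            .Replace("<", "&lt;")
            .Replace(">", "&gt;")
            .Replace("\"", "&quot;")
            .Replace("\u00A0", "&nbsp;"); // U+00A0 is the nonbreaking space
    }

    public static void Main()
    {
        Console.WriteLine(Encode("Enter a word <here>"));
        // prints: Enter a word &lt;here&gt;
        Console.WriteLine(Encode("To bold text use the <b> tag."));
        // prints: To bold text use the &lt;b&gt; tag.
    }
}
```

The order of the replacements matters: if & were handled last, an already encoded &lt; would turn into &amp;lt;.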

You can perform this transformation on your own, or you can circumvent the problem by using the InnerText property of an HTML server control. When you set the contents of a control using InnerText, any special characters are automatically converted into their HTML equivalents. However, this won’t help if you want to fill a tag with a mix of embedded HTML tags and encoded characters. It also won’t be of any use for controls that don’t provide an InnerText property, such as the Label web control. In these cases, you can use the HttpServerUtility.HtmlEncode() method to replace the special characters. (Remember, an HttpServerUtility object is provided through the Page.Server property.)

Here’s an example:

// Will output as "Enter a word &lt;here&gt;" in the HTML file, but the
// browser will display it as "Enter a word <here>".
ctrl.InnerHtml = Server.HtmlEncode("Enter a word <here>");

Or consider this example, which mingles real HTML tags with text that needs to be encoded:

ctrl.InnerHtml = "To <b>bold</b> text use the ";
ctrl.InnerHtml += Server.HtmlEncode("<b>") + " tag.";
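
To see the markup these two assignments produce without running a page, here is a small console sketch. It assumes System.Net.WebUtility.HtmlEncode() (available from .NET Framework 4.0) as a stand-in for Server.HtmlEncode(), since the Server property exists only inside a web request. MixedMarkupDemo and BuildMarkup() are hypothetical names used for illustration.

```csharp
using System;
using System.Net;

public static class MixedMarkupDemo
{
    // Builds the same string that the two InnerHtml assignments produce.
    // Only the literal "<b>" meant for display is encoded; the real
    // <b>...</b> tags are left untouched so the browser still applies them.
    public static string BuildMarkup()
    {
        string markup = "To <b>bold</b> text use the ";
        markup += WebUtility.HtmlEncode("<b>") + " tag.";
        return markup;
    }

    public static void Main()
    {
        Console.WriteLine(BuildMarkup());
        // prints: To <b>bold</b> text use the &lt;b&gt; tag.
    }
}
```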

Figure shows the results of successfully and incorrectly encoding special HTML characters. 

The HtmlEncode() method is particularly useful if you’re retrieving values from a database and you aren’t sure whether the text is valid HTML. You can use the HtmlDecode() method to revert the text to its normal form if you need to perform additional operations or comparisons with it in your code.

Along with the HtmlEncode() and HtmlDecode() methods, the HttpServerUtility class also includes UrlEncode() and UrlDecode() methods. Much as HtmlEncode() allows you to convert text to valid HTML with no special characters, UrlEncode() allows you to convert text into a form that can be used in a URL. This technique is particularly useful if you want to pass information from one page to another by tacking it onto the end of the URL as a query string.
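
The round trip and the URL case can be sketched in a console program. This assumes the System.Net.WebUtility methods (UrlEncode() and UrlDecode() require .NET Framework 4.5 or later) in place of the Server methods, which are available only inside a page. EncodingRoundTrip, BuildUrl(), the page name results.aspx, and the parameter name query are all hypothetical.

```csharp
using System;
using System.Net;

public static class EncodingRoundTrip
{
    // Hypothetical helper: appends one URL-encoded value to a page
    // address as a query-string parameter.
    public static string BuildUrl(string page, string name, string value)
    {
        return page + "?" + name + "=" + WebUtility.UrlEncode(value);
    }

    public static void Main()
    {
        // HTML round trip: decoding restores the original text, so it
        // can be compared or processed in code.
        string original = "Enter a word <here>";
        string encoded = WebUtility.HtmlEncode(original);
        string decoded = WebUtility.HtmlDecode(encoded);
        Console.WriteLine(encoded);
        Console.WriteLine(decoded == original);

        // URL round trip: '#', '&', and spaces would otherwise change
        // how the URL is parsed.
        string search = "C# & ASP.NET";
        string url = BuildUrl("results.aspx", "query", search);
        Console.WriteLine(url);

        string rawValue = url.Substring(url.IndexOf('=') + 1);
        Console.WriteLine(WebUtility.UrlDecode(rawValue) == search);
    }
}
```

On the receiving page, ASP.NET decodes query-string values for you when you read them through Request.QueryString, so an explicit UrlDecode() call is mainly needed when you parse a raw URL yourself, as the sketch does.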
