URL escaping

From WhyNotWiki


RFC 1738

"...Only alphanumerics [0-9a-zA-Z], the special characters "$-_.+!*'()," [not including the quotes - ed], and reserved characters used for their reserved purposes may be used unencoded within a URL."

URL Encoding :: What characters need to be encoded and why (http://www.blooberry.com/indexdot/html/topics/urlencoding.htm). Retrieved on 2007-03-12 18:34.
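As a rough illustration of the RFC 1738 rule quoted above, the following Python sketch checks whether a single character belongs to the "always allowed" set (alphanumerics plus the listed special characters). The names `RFC1738_SPECIAL` and `may_appear_unencoded` are made up for this example. The sketch deliberately ignores the third case in the quote, reserved characters used for their reserved purposes, because that depends on where in the URL the character appears.

```python
import string

# The special characters named in the RFC 1738 quote above.
RFC1738_SPECIAL = "$-_.+!*'(),"

# Alphanumerics plus the special characters (hypothetical helper set).
ALWAYS_ALLOWED = set(string.ascii_letters + string.digits + RFC1738_SPECIAL)


def may_appear_unencoded(ch: str) -> bool:
    """Return True if ch is an alphanumeric or one of the special characters.

    Reserved characters used in their reserved role are NOT handled here.
    """
    return ch in ALWAYS_ALLOWED


for ch in ["a", "7", "!", " ", "#", "%"]:
    print(repr(ch), may_appear_unencoded(ch))
```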

"Reserved characters"

Why: URLs use some characters for special purposes in defining their syntax. When these characters are not used in their special role inside a URL, they need to be encoded.

  • Dollar ("$")
  • Ampersand ("&")
  • Plus ("+")
  • Comma (",")
  • Forward slash/Virgule ("/")
  • Colon (":")
  • Semi-colon (";")
  • Equals ("=")
  • Question mark ("?")
  • 'At' symbol ("@")
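To make the distinction concrete, here is a small Python sketch using the standard library's `urllib.parse.quote`. The query value and the `example.com` URL are invented for illustration. The "?" and "=" that structure the URL keep their reserved role and stay literal, while the same characters inside the data value are encoded.

```python
from urllib.parse import quote

# Hypothetical query value that contains reserved characters as plain data.
value = "fish&chips=tasty?"

# safe="" asks quote() to encode every reserved character, including "/".
encoded_value = quote(value, safe="")

# The "?" and "=" typed here act in their reserved role, so they stay as-is.
url = "http://example.com/search?q=" + encoded_value

print(encoded_value)
print(url)
```

Passing `safe=""` matters because `quote` leaves "/" unencoded by default, which suits path segments but not a data value.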

"Unsafe characters"

Why: Some characters present the possibility of being misunderstood within URLs for various reasons. These characters should also always be encoded.

  • Space — Significant sequences of spaces may be lost in some uses (especially multiple spaces)
  • Quotation marks, 'Less Than' symbol ("<"), 'Greater Than' symbol (">") — These characters are often used to delimit URLs in plain text.
  • 'Pound' character ("#") — This is used in URLs to indicate where a fragment identifier (bookmarks/anchors in HTML) begins.
  • Percent character ("%") — This is used to URL encode/escape other characters, so it should itself also be encoded.
  • Misc. characters:
    • Left Curly Brace ("{")
    • Right Curly Brace ("}")
    • Vertical Bar/Pipe ("|")
    • Backslash ("\")
    • Caret ("^")
    • Tilde ("~")
    • Left Square Bracket ("[")
    • Right Square Bracket ("]")
    • Grave Accent ("`")
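The encoded forms of the characters listed above can be inspected with a short Python sketch. It relies on `urllib.parse.quote` from the standard library, and the name `UNSAFE_CHARS` is an assumption made for this example. The tilde is left out of the string on purpose, for the reason given after the code.

```python
from urllib.parse import quote

# Characters from the "unsafe" list above; "~" is omitted (see the note below).
UNSAFE_CHARS = ' "<>#%{}|\\^[]`'

# Map each character to its percent-encoded form.
encoded = {ch: quote(ch, safe="") for ch in UNSAFE_CHARS}

for ch, enc in encoded.items():
    print(repr(ch), "->", enc)
```

Recent Python versions (3.7 and later) follow RFC 3986, which treats "~" as unreserved, so `quote` leaves it unencoded even though the older RFC 1738 list above counts it as unsafe.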

CJ Advanced Links Tutorial and URL Encoding (http://www.cumbrowski.com/CarstenC/affiliatemarketing_datafeeds_CJ_Advanced_Links_Tutorial.asp)


Replace each character that is not a letter or digit (a-z, A-Z, 0-9) with "%" followed by the two-digit hexadecimal value of its ASCII code.

An ASCII table lists every character together with its code.

Note: ASCII is a 7-bit code covering the values 0-127. Byte values 128-255 are not ASCII; their meaning depends on the character encoding in use (for example, a Windows code page), so they should not appear unencoded in a URL.
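The replacement rule above can be sketched by hand in a few lines of Python. The function name `percent_encode` is hypothetical, and the handling of non-ASCII characters (encode as UTF-8, then escape each byte) is an assumption of this sketch, not something the tutorial specifies. In practice the standard library's `urllib.parse.quote` would normally be used instead.

```python
import string

# Letters and digits are the only characters this rule leaves untouched.
KEEP = set(string.ascii_letters + string.digits)


def percent_encode(text: str) -> str:
    """Replace every character that is not a-z, A-Z, or 0-9 with %XX escapes."""
    out = []
    for ch in text:
        if ch in KEEP:
            out.append(ch)
        else:
            # Assumption: non-ASCII characters are escaped as UTF-8 bytes.
            for byte in ch.encode("utf-8"):
                out.append("%{:02X}".format(byte))
    return "".join(out)


print(percent_encode("a b/c?"))
print(percent_encode("100%"))
```

Note that this strict rule also escapes characters such as "-", "_", and ".", which RFC 1738 allows unencoded.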

Common characters to replace

Character   Encoded   Name
:           %3A       colon
/           %2F       forward slash
\           %5C       backslash
.           %2E       dot
SP          %20       space or blank
?           %3F       question mark
=           %3D       equals sign
-           %2D       dash or hyphen
_           %5F       underscore
+           %2B       plus sign
%           %25       percent
&           %26       ampersand ("and" symbol)
(           %28       opening parenthesis
)           %29       closing parenthesis
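As a quick cross-check of a few of the character/code pairs above, the following Python sketch confirms that the two hex digits match each character's ASCII code and that `urllib.parse.unquote` reverses the escaping. The `PAIRS` dictionary and the `example.com` string are illustrative only.

```python
from urllib.parse import unquote

# A few pairs copied from the list above: literal character -> escape.
PAIRS = {":": "%3A", "/": "%2F", "\\": "%5C", " ": "%20", "&": "%26", "(": "%28"}

for char, escape in PAIRS.items():
    # The two hex digits after "%" are the character's ASCII code.
    assert escape == "%{:02X}".format(ord(char))
    # unquote() turns the escape back into the literal character.
    assert unquote(escape) == char

# Decoding a whole escaped string (hypothetical URL).
decoded = unquote("http%3A%2F%2Fexample.com%2Fa%20b")
print(decoded)
```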


Aliases: URL encoding
