In computing and typesetting, a soft hyphen (Unicode U+00AD SOFT HYPHEN; HTML: &#173; or &shy;) is a type of hyphen used to mark a place in text where a hyphenated break is allowed, without forcing a line break in an inconvenient place if the text is re-flowed.
Additional semantics associated with the soft hyphen vary. According to the Unicode standard, a soft hyphen is not displayed if the line is not broken at that point.[1] HTML4 describes it as a "hyphenation hint," though it suggests that that interpretation is not universal:[2]
- In HTML, there are two types of hyphens: the plain hyphen and the soft hyphen. The plain hyphen should be interpreted by a user agent as just another character. The soft hyphen tells the user agent where a line break can occur. Those browsers that interpret soft hyphens must observe the following semantics: If a line is broken at a soft hyphen, a hyphen character must be displayed at the end of the first line. If a line is not broken at a soft hyphen, the user agent must not display a hyphen character. For operations such as searching and sorting, the soft hyphen should always be ignored.
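The last rule in the quotation, that searching and sorting should ignore the soft hyphen, can be illustrated with a minimal Python sketch. The helper names `strip_soft_hyphens` and `contains_ignoring_shy` are hypothetical, chosen for this example only, and the sample words are made up.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN


def strip_soft_hyphens(text: str) -> str:
    """Return text with every soft hyphen removed."""
    return text.replace(SOFT_HYPHEN, "")


def contains_ignoring_shy(haystack: str, needle: str) -> bool:
    """Substring search that treats soft hyphens as if they were absent."""
    return strip_soft_hyphens(needle) in strip_soft_hyphens(haystack)


# "co<SHY>op" should sort as "coop", i.e. between "cone" and "cop".
words = ["co\u00adop", "cop", "cone"]
naive_sorted = sorted(words)                        # compares raw code points
sorted_words = sorted(words, key=strip_soft_hyphens)  # ignores soft hyphens
```

Sorting on the raw strings places the word containing U+00AD last, because its code point (173) is higher than that of any ASCII letter; sorting on the stripped key gives the order a reader would expect.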
ISO 8859-1 specifies that the soft hyphen is always visible. EBCDIC has a SHY character, with "SHY" an abbreviation for "syllable hyphen,"[1][3] which is defined by IBM to mean a "hyphen used to divide a word at the end of a line [that] may be removed when a program adjusts lines."[4]
In most parts of ISO-8859 the soft hyphen is at position 0xAD (hexadecimal), and since the first 256 positions in Unicode are taken from ISO-8859-1, it has the Unicode code point U+00AD. HTML 3.2 introduced a character entity for the soft hyphen, "&shy;". In TeX and LaTeX the soft hyphen is represented by the command \-.[5]
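The relationship between the ISO-8859-1 byte, the Unicode code point, and the HTML entity can be checked with Python's standard library. This is only a small sketch; the variable names are illustrative.

```python
import html
import unicodedata

# Byte 0xAD decoded as ISO-8859-1 yields the Unicode code point U+00AD.
shy = bytes([0xAD]).decode("iso-8859-1")

code_point = f"U+{ord(shy):04X}"
name = unicodedata.name(shy)          # official Unicode character name
category = unicodedata.category(shy)  # "Cf" = format character

# Both the named and the numeric HTML references resolve to the same character.
from_named = html.unescape("&shy;")
from_numeric = html.unescape("&#173;")
```

Here `code_point` is `"U+00AD"`, `name` is `"SOFT HYPHEN"`, and both `from_named` and `from_numeric` equal `shy`.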
To show the effect of a soft hyphen, the following “wocka”s have been separated with soft hyphens:
Pac-Man goes "wockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawockawocka."
On browsers supporting soft hyphens, resizing the window will hyphenate the above text only as “wocka-”s (and never with, for example, “wock-”). On browsers not supporting soft hyphens, the above might appear as one very long line, or as several lines but without hyphens at the end of each broken line.
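The behaviour described above can be approximated outside a browser. The following Python sketch breaks a single long word only at soft hyphens and shows a visible hyphen at the end of each broken line. It is a simplified illustration, not how any particular browser lays out text; the function name `wrap_at_soft_hyphens` and the width of 12 characters are assumptions made for the example.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN


def wrap_at_soft_hyphens(word: str, width: int) -> list[str]:
    """Break one long word only at soft hyphens.

    A line that is broken ends in a visible "-"; soft hyphens that are
    not used as break points are dropped from the output.
    """
    lines = []
    current = ""
    for chunk in word.split(SOFT_HYPHEN):
        # The +1 reserves room for the visible hyphen shown at a break.
        if current and len(current) + len(chunk) + 1 > width:
            lines.append(current + "-")
            current = chunk
        else:
            current += chunk
    if current:
        lines.append(current)
    return lines


# Five "wocka"s joined by soft hyphens, as in the demonstration text.
word = SOFT_HYPHEN.join(["wocka"] * 5)
narrow = wrap_at_soft_hyphens(word, 12)
wide = wrap_at_soft_hyphens(word, 100)
```

With a width of 12, `narrow` is `["wockawocka-", "wockawocka-", "wocka"]`: every break falls after a complete “wocka”. With a width of 100 the word fits on one line and no hyphen is shown.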
Compare soft hyphen's semantics and HTML implementation with the zero-width space.
Accessibility issues
Soft hyphens are known to cause some text-to-speech systems to mispronounce words.[citation needed]
Security issues
Soft hyphens have been used to obscure malicious domains or URLs in email spam.[6][7]
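As a hedged illustration of why this obfuscation can work, the sketch below compares a naive substring check against a blocklist with one that removes soft hyphens first. The blocklist, the domain `bad.example`, and the function names are all hypothetical; real spam filters are considerably more involved.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN

# Hypothetical blocklist for illustration only.
BLOCKLIST = {"bad.example"}


def naive_is_blocked(url: str) -> bool:
    """Match blocklisted domains against the raw URL text."""
    return any(domain in url for domain in BLOCKLIST)


def is_blocked(url: str) -> bool:
    """Remove soft hyphens before matching, so hidden ones cannot split a domain."""
    cleaned = url.replace(SOFT_HYPHEN, "")
    return any(domain in cleaned for domain in BLOCKLIST)


# The soft hyphen is invisible when rendered, but it splits the string "bad.example".
obfuscated = "http://ba\u00add.example/login"
```

For `obfuscated`, `naive_is_blocked` returns `False` while `is_blocked` returns `True`.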
References
- Jukka Korpela (revision as of January 2011). "Soft hyphen (SHY) – a hard problem?". Tampere University of Technology. http://www.cs.tut.fi/~jkorpela/shy.html. Retrieved 2011-04-08.
- "9.3.3 Hyphenation". HTML 4.01 Specification. World Wide Web Consortium. 24 December 1999. http://www.w3.org/TR/html401/struct/text.html#h-9.3.3. Retrieved 2011-04-08.
- "Extended Binary-Coded Decimal Interchange Code - S/390". comsci.us. http://www.comsci.us/datacom/ebcdic3.html. Retrieved 2011-04-08.
- "Glossary". IBM. http://publib.boulder.ibm.com/infocenter/iseries/v5r4/topic/rzaat/rzaats.htm#x2047006. Retrieved 2011-04-08.
- "Commonly Confused Characters". Greg Baker, Simon Fraser University. http://www.cs.sfu.ca/~ggbaker/reference/characters/#dash. Retrieved 2011-07-12.
- "Spammers Using Soft Hyphen To Hide Malicious URLs". Slashdot. October 7, 2010. http://it.slashdot.org/story/10/10/07/2127241/Spammers-Using-Soft-Hyphen-To-Hide-Malicious-URLs. Retrieved 2011-04-08.
- "Soft Hyphen – A New URL Obfuscation Technique". Symantec. http://www.symantec.com/connect/blogs/soft-hyphen-new-url-obfuscation-technique. Retrieved 2011-04-08.