<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:og="http://ogp.me/ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:schema="http://schema.org/" xmlns:sioc="http://rdfs.org/sioc/ns#" xmlns:sioct="http://rdfs.org/sioc/types#" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" version="2.0" xml:base="https://www.linuxjournal.com/tag/encoding">
  <channel>
    <title>Encoding</title>
    <link>https://www.linuxjournal.com/tag/encoding</link>
    <description/>
    <language>en</language>
    
    <item>
  <title>Unicode</title>
  <link>https://www.linuxjournal.com/content/unicode</link>
  <description>  &lt;div data-history-node-id="1148134" class="layout layout--onecol"&gt;
    &lt;div class="layout__region layout__region--content"&gt;
      
            &lt;div class="field field--name-field-node-image field--type-image field--label-hidden field--item"&gt;  &lt;img src="https://www.linuxjournal.com/sites/default/files/nodeimage/story/200px-Unicode_logo.png" width="200" height="200" alt="" typeof="foaf:Image" class="img-responsive" /&gt;&lt;/div&gt;
      
            &lt;div class="field field--name-node-author field--type-ds field--label-hidden field--item"&gt;by &lt;a title="View user profile." href="https://www.linuxjournal.com/users/reuven-lerner" lang="" about="https://www.linuxjournal.com/users/reuven-lerner" typeof="schema:Person" property="schema:name" datatype="" xml:lang=""&gt;Reuven Lerner&lt;/a&gt;&lt;/div&gt;
      
            &lt;div class="field field--name-body field--type-text-with-summary field--label-hidden field--item"&gt;&lt;p&gt;
Let's give credit where credit's due: Unicode is a brilliant invention
that makes life easier for millions—even billions—of people on
our planet. At the same time, dealing with Unicode, as well as the
various encoding systems that preceded it, can be an incredibly
painful and frustrating experience. I've been dealing with some
Unicode-related frustrations of my own in recent days, so I thought
this might be a good time to revisit a topic that every modern
software developer, and especially every Web developer, should
understand.
&lt;/p&gt;

&lt;p&gt;
In case you don't know what Unicode is, or how it affects you,
consider this: in C and in older versions of languages like Python
and Ruby, a string is nothing more than a bunch of bytes. There's no
rhyme or reason to it; you can read whatever data you want into a
string, and the language will be fine with it. For example, if I fire
up IPython (running Python 2.7 here), I can read a JPEG image into
a string:

&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
# Python 2.7: read() returns a str, which is just a sequence of bytes
s = open('Downloads/test.jpg', 'rb').read()
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;
Most of the time, you use strings not to hold JPEG images, but rather
to hold text. If your text is all in English, you're in luck,
because all the characters used by the English language are defined
in ASCII, a standard that defines 128 different characters, each with
a unique number. Thus, character 65 is uppercase A, and the space
character is number 32. ASCII is great, and it works just fine—until
you want to start using languages other than English.
&lt;/p&gt;
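
&lt;p&gt;
The character-to-number mapping described above can be checked
directly. This is a minimal sketch that assumes Python 3 (not the
Python 2.7 session shown earlier); the helper name ascii_codes is
made up for illustration.
&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
def ascii_codes(text):
    """Return the ASCII number of each character in text.

    Raises UnicodeEncodeError if text contains a non-ASCII character.
    """
    return list(text.encode("ascii"))


print(ord("A"))             # 65
print(ord(" "))             # 32
print(ascii_codes("Hi A"))  # [72, 105, 32, 65]
```
&lt;/code&gt;&lt;/pre&gt;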

&lt;p&gt;
The problem is that most languages require characters that are not
used in English and that aren't defined in ASCII. This means that if
you want to write words in French, let alone in Arabic or Chinese,
ASCII gives you no way to represent many of the characters you need.
&lt;/p&gt;
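
&lt;p&gt;
To make the limitation concrete, here is a small sketch, again
assuming Python 3, that tests whether a piece of text can be written
in ASCII at all. The function name fits_in_ascii is hypothetical.
&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
def fits_in_ascii(text):
    """Return True if every character in text has an ASCII number."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


print(fits_in_ascii("cafe"))       # True
print(fits_in_ascii("caf\u00e9"))  # False: e with an acute accent is not in ASCII
```
&lt;/code&gt;&lt;/pre&gt;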

&lt;p&gt;
A solution for alphabetic languages was a set of ISO standards (ISO
8859-*), which took advantage of the fact that ASCII uses only 7 bits,
whereas data is handled in 8-bit bytes. Using
all 8 bits doubles the number of available characters, from
128 to 256, which is more than enough for languages with a defined
alphabet. Thus, Western European languages were covered by
ISO-8859-1, Hebrew by ISO-8859-8 and so forth. Moreover, these ISO
standards were meant to make it possible to mix the "foreign" language
with English, so you could have a document with English and
French or English and Arabic. ASCII characters retained their
original values, and the non-ASCII characters were defined in the
upper 128.
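&lt;/p&gt;

&lt;p&gt;
A short sketch of how the upper 128 values work, assuming Python 3
and its built-in iso-8859-1 and iso-8859-8 codecs; the helper name
decode_with is made up for illustration. The byte 0x41 is ASCII and
means the same thing in both standards, while 0xE0 lives in the upper
half and does not.
&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import unicodedata


def decode_with(raw, charset):
    """Decode raw bytes with the named charset and name each character."""
    return [unicodedata.name(ch) for ch in raw.decode(charset)]


raw = bytes([0x41, 0xE0])  # one ASCII byte, one byte from the upper 128

print(decode_with(raw, "iso-8859-1"))
# ['LATIN CAPITAL LETTER A', 'LATIN SMALL LETTER A WITH GRAVE']
print(decode_with(raw, "iso-8859-8"))
# ['LATIN CAPITAL LETTER A', 'HEBREW LETTER ALEF']
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;
The same byte comes out as a different character under each standard,
so a program has to know which ISO 8859 variant a document was written
in before it can display the text correctly.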
&lt;/p&gt;&lt;/div&gt;
      
            &lt;div class="field field--name-node-link field--type-ds field--label-hidden field--item"&gt;  &lt;a href="https://www.linuxjournal.com/content/unicode" hreflang="und"&gt;Go to Full Article&lt;/a&gt;
&lt;/div&gt;
      
    &lt;/div&gt;
  &lt;/div&gt;

</description>
  <pubDate>Mon, 16 Sep 2013 17:19:45 +0000</pubDate>
    <dc:creator>Reuven Lerner</dc:creator>
    <guid isPermaLink="false">1148134 at https://www.linuxjournal.com</guid>
    </item>

  </channel>
</rss>
