But that is different than Latin 1, which is also effectively Windows-1252 or ISO-8859-1. We should design for Python 3 first and support Python 2. Unicode nightmare. " So if you want to be cross-platform, it's even worse, because you can't use str on Unix nor bytes on Windows. Deprecated since version 3. The difference is only in how Python treats and presents the data. Python search path (on Windows) I would like to execute a Python script from Windows command line: Python myscript. This makes the programming environment rather unfriendly to Python users who live and work in non-Latin-1 locales such as many of the Asian countries. # python 3 ♥ = 4 print (♥) # ♥ = 4 # ^ # SyntaxError: invalid character in identifier Python 2: Declare Unicode String. You can vote up the examples you like or vote down the ones you don't like. The mission of the Python Software Foundation is to promote, protect, and advance the Python programming language, and to support and facilitate the growth of a diverse and international community of Python programmers. 4) Use an if statement to include this file ONLY on Windows (below). Usually this is implemented by converting the Unicode string into some encoding that varies depending on the system. It’s all much cleaner. howto-unicode. All codecs including unicode-escape only dealt with byte conversion; (unicode-escape converts everything to printable 7bit ascii); these patches want to add a feature related to the "where does the string end" issue, and is only aimed at "python code" containers. We should design for Python 3 first and support Python 2. If you're using an older version of Windows, or the legacy console in Windows 10, then enabling VT mode will fail as an invalid parameter. Become a Member Donate to the PSF. Open the command line–a text-only view of your computer–via PowerShell which is a built-in program. 3, str was a byte string (sequence of bytes in certain encoding, default ASCII) and unicode , a Unicode string. gVim has many Unicode digraphs for editing UTF-8 files. u'', means it is treates as an unicode string and, codepoints are referenced like \u2222. On Mac OS X, the default locale is actually UTF-8, but everywhere else, the default is probably ASCII. Letter values weren't in one block: 'a' through 'i' had values In the standard, a code point is written using the Note There is Python Write Unicode To File. Chilkat Software. Get the complete details on Unicode character U+2663 on FileFormat. The first thing you should know about default encoding is that you don't need to care about it. The following are code examples for showing how to use codecs. py resides in the current folder. Use non-Unicode encodings reluctantly. Here are some simple examples to show how the escape characters can be used with the print() function in Python. 0_01/jre\ gtint :tL;tH=f %Jn! [email protected]@ Wrote%dof%d if($compAFM){ -ktkeyboardtype =zL" filesystem-list \renewcommand{\theequation}{\#} L;==_1 =JU* L9cHf lp. Escape sequences in string literals. converting to and from octal escaped UTF--8. Get argument as unicode string from argparse in Python 2 and 3 Code from my last project, that has to work on Python 2. ASCII or EBCDIC) that is used to signify the end of a line of text and the start of a new one. On both windows and Linux the following fails on Python 2. Scholarly Search Engine Find information the second column provides the Unicodeescape sequence for the character,. replace” function. The issue is unlike 'string_escape', which convert from byte string to byte string, 'unicode_escape' decode byte string into unicode. Since Python 2 is in maintenance mode, and Python 3 is recommended by upstream for development, you probably do not need to install the documentation. The charts are PDF files, and some of them may be very large. Unicode String Normal strings in Python are stored internally as 8-bit ASCII, while Unicode strings are stored as 16-bit Unicode. Google Python Class Day 1 Part 1: Introduction and Strings. In Python 3. If there are a large number of files to handle in your Python program, you can arrange your code within different directories to make things more manageable. Unicode in Python 3 If you are using Python 3:. On macOS only, if there is exactly one button with the role RejectRole, it is made the escape button. Python: Unicode html character escape Sometimes things can get too much irritating when you are in need of something, and you are not finding out the solution, but there is a easy way to do you know. 1, non-ASCII Unicode characters seem to get dropped whereas in 2. strftime, comparing Python 2. simplejson mimics the json standard library. Easy, clean, reliable Python 2/3 compatibility¶. 3, so it not bytes and unicode mixed but str and unicode. 0420 and column D. Python is even good at parsing unicode XML. Drawing from technology, finance, sports, social psychology, and complexity theory, Everett Harper looks at the key practices that are crucial for solving our most critical challenges. There is apparently a change in python 3. js Excel Go. Each operating system manages Unicode differently. python 中 unicode原样转成str, unicode-escape与string_escape 2015-04-24 15:55:10 小橘子Pythoner 阅读数 16067 分类专栏: Python. Here's some code to enable VT mode in Windows 10 -- assuming you're not using the legacy console. Unicode and the Windows console. Python Tools for Visual Studio is a completely free extension, developed and supported by Microsoft with contributions from the community. 3, will be removed in version 4. An Unicode escape sequence represents a single Unicode character formed by: a backslash "\". In Python 2, and in Python 3 prior to 3. simplejson mimics the json standard library. How to type unicode characters in Windows 10? With Microsoft Word, Wordpad and Notepad running in XP and Win7, you could type any unicode character by typing its number followed by Alt+X. 0, strings are stored as Unicode, i. Caracteres de escape no Python Os caracteres ou sequencias de escape são usadas para conseguir imprimir na tela caracteres especiais, símbolos ou imagens UNICODE, que são impossíveis de obter normalmente. In this article I will show you how to use ANSI escape codes to control the colors of your Terminal, write text at arbitrary positions, erase lines of text or portions of the terminal and move the cursor. x provides a data type called a Unicode string for working with Unicode data using string encoding and decoding methods. NET, Java, Perl, PCRE, JavaScript, Python, Ruby. For Windows users, there is an option to modify the font in the command prompt to display the Unicode text properly. Python Forums on Bytes. Windows filenames used a backslash as a separator, but within Python the backslash had the magic power of an escape character, so every time she wanted a backslash in a filename she had to code two backslashes, like this:. Escape sequences in string literals. IsTextUnicode: Determines if a buffer is likely to contain a form of Unicode text. Info Unicode Character 'AMPERSAND' (U+0026) Browser Test Page Outline (as SVG file) Fonts that support U+0026. Some time ago I had to output some nicely formatted reports from a web application, to be usable offline by Windows users. Unicode Transformation Format 8-bit is a variable-width encoding that can represent every character in the Unicode character set. " So if you want to be cross-platform, it's even worse, because you can't use str on Unix nor bytes on Windows. decode('unicode_escape') Thank you for your help Chris! Regards, Malcolm Python. encode('unicode-escape. As most of the string operations can be done using the str and as the literal for byte strings is simpler than the unicode ones you'll end up with lots of text data handled by the str type. If you're using an older version of Windows, or the legacy console in Windows 10, then enabling VT mode will fail as an invalid parameter. Ways to print escape characters in Python Escape characters are characters that are generally used to perform certain tasks and their usage in code directs the compiler to take a suitable action mapped to that character. This causes failures in many cases. Python supports several Unicode encodings. Position Decimal Name Appearance; 0x0100: 256: LATIN CAPITAL LETTER A WITH MACRON: Ā: 0x0101: 257: LATIN SMALL LETTER A WITH MACRON: ā: 0x0102: 258: LATIN CAPITAL LETTER A WITH BREVE. Each index points to a symbol, which doesn’t specify its binary value. For the hell of it I tried to run the script in python 3 (3. I don't know why str and unicode mixed now, You should fix it. Python2的隐式转换. 1, Unicode contains a repertoire of over 137,000 characters covering 150 modern and historic scripts, as well as multiple symbol sets. You can also save this page to your account. Changing wallpaper on Windows. Master Python loops to deepen your knowledge. raw_unicode_escape_decode(). Pre-Python 3. Unicode symbols. The much more likely thing to happen is that people stick to Python 2 or build broken stuff on Python 3. Using Windows folders as ArcPy Parameters? Ask Question simple folder management operations in an ArcGIS python toolbox. A set of Python extension modules that wrap the cross-platform GUI classes from wxWidgets. Python putting r before unicode string variable - active directory - ADFS not issuing claims from cu Gulp-minify-css does not produce output files - java - Unable to compile jsp on server - Php\Javascript web-app on Windows 8. For most Unix systems, you must download and compile the source code. Unicode Character Table has online reference tools, including selection of Unicode characters by clicking on a chart, and converting to and from HTML formats. \u expects up to four hexadecimal digits, and \U expects up to eight. exe sur elle). Among the code points equivalent to rune (including rune itself), SimpleFold returns the smallest rune > r if one exists, or else the smallest rune >= 0. The backslash is called an 'escape character' and allows Python to do things like use special characters in Unicode or signify a line break ('\n') in a document. 1, Unicode literals can only be written using the Latin-1 based encoding "unicode-escape". Individual code units which form parts of a surrogate pair can be encoded using this escape sequence. saxutils module contains the functions escape() and quoteattr(). On windows, you can only use python-2. It's very rarely just unicode escapes, and passing it through unicode-escape will usually lead to half-parsed or double-parsed data. Unicode is a standard for encoding character. For example, Mac OS X uses UTF-8 while Windows uses a configurable encoding; on Windows, Python uses the name “mbcs” to refer to whatever the currently configured encoding is. 1, on a windows 7 machines. 3, Python had exactly two options for how Unicode strings (unicode on Python 2, str on Python 3) would be stored in memory. i m getting data though zigbee from. Individual code units which form parts of a surrogate pair can be encoded using this escape sequence. Python has two types for strings: str and unicode. Master Python loops to deepen your knowledge. There are escape sequences in it. Avoiding Windows backslash problems with Python’s raw strings I’m a Unix guy, but the participants in my Python classes overwhelmingly use Windows. I don't think encoding is an issue. Esther Nam and Travis Fischer Character encoding and Unicode in Python But even if. When you attempt to *decode* a Unicode string, that doesn't actually make sense, so it first gets *encoded* to bytes, before being decoded. With the new Python based IO system being introduced for Python 3. In Python 3, all strings are represented in Unicode. Unicode nightmare. You might be looking for the documentation for Beautiful Soup 3. UTF-8 favors efficiency for English letters and other ASCII characters (one byte per character) while UTF-16 favors several Asian character sets (2 bytes instead of 3 in UTF-8). In some IDEs emoji’s don’t display [Canopy] or don’t display well [PyCharm]. Unicode escape tool for Sublime Text(2/3). How To Install Python Packages for Web Scraping in Windows 10 Web scraping using Python in Windows was tough. str is used for binary data, unicode for text data. The following are code examples for showing how to use re. defaultencoding). encode('utf-8')) [/code] All those %E3 are. The mysql client truncates quoted strings containing NUL characters if they are not escaped, and Control+Z may be taken for END-OF-FILE on Windows if not escaped. Unicode is a standard for encoding character. I don't know why str and unicode mixed now, You should fix it. Unicode Text to LaTeX code. With the new Python based IO system being introduced for Python 3. Our antivirus analysis shows that this download is malware free. Two of the most common encodings are: utf-8 ; utf-16 ; It is critical to note that a unicode encoding is not Python unicode!. Fixing common Unicode mistakes with Python – after they’ve been made. Unicode string is designed to store text data. In Python 3, all strings are represented in Unicode. encode('utf-8') >>> print data Where data is your string. In this tutorial, you'll get a Python-centric introduction to character encodings and unicode. It was designed for backward compatibility with ASCII and to avoid the complications of endianness and byte order marks in UTF-16 and UTF-32. Unicode in RTF documents 10 minute read How to encode unicode codepoints in RTF documents using PyRTF. I found it super-helpful to not think about what the console said, or work with the console, because the console lies. This should be fixed if it isn't intended behavior. The default codepage for an English language Windows console doesn't have the characters you want to display so python throws an exception on your print(x). On both windows and Linux the following fails on Python 2. 0, strings are stored as Unicode, i. 1, Unicode literals can only be written using the Latin-1 based encoding "unicode-escape". I remedied the invisible/messy emoji’s by running the script in Mac OS X’s terminal application, which displays emoji. 7 with little changes. " So if you want to be cross-platform, it's even worse, because you can't use str on Unix nor bytes on Windows. Unicode, Python 3, and the Windows Console I've been working on a python 3 (3. Python is a general-purpose interpreted, interactive, object-oriented, and high-level programming language. To avoid unexpected problems with these characters, use the rules provided in the following topics in your applications: Using ASCII Control Codes 0x000D and 0x000A; Using Byte Order Marks; Using Escape Sequences and Control Characters. Open and save text files encoded in Unicode (UTF-8, UTF-16 and UTF-32), any Windows code page, any ISO-8859 code page, and a variety of DOS, Mac, EUC, EBCDIC, and other legacy code pages. Defining Python Source Code Encodings - I think this is irrelevant now, doesn't Python assume that your source code is UTF-8?. Inevitably, when we get to talking about working with files in Python, someone will want to open a file using the complete path to the file. At this writing, both Python 2. In Python's json. In a table, letter Э located at intersection line no. Windows has their own escape characters. psyco speeds up the execution of most Python programs with a single extension module. to_basestring ¶ Converts a byte or unicode string into type str. Python 3000 will prohibit decoding of Unicode strings, according to PEP 3137: "encoding always takes a Unicode string and returns a bytes sequence, and decoding always takes a bytes sequence and returns a Unicode string". Python has a convenient API for parsing XML. 6 implements PEP 528: Change Windows console encoding to UTF-8: the default console on Windows will now accept all Unicode characters. ) in windows platform. Python has two types for strings: str and unicode. Python has also been ported to the Java and. Unofficial pre-built OpenCV packages for Python. Unicode String. Windows filenames used a backslash as a separator, but within Python the backslash had the magic power of an escape character, so every time she wanted a backslash in a filename she had to code two backslashes, like this:. It is no longer necessary now. Some time ago I had to output some nicely formatted reports from a web application, to be usable offline by Windows users. How to type unicode characters in Windows 10? With Microsoft Word, Wordpad and Notepad running in XP and Win7, you could type any unicode character by typing its number followed by Alt+X. Real escape string. Escape sequences in string literals. Virus-free and 100% clean download. Replacing backslash '\' in a windows path Its got something to do with how Python uses repr(). Py3 has two distinct concepts: "text" - uses the str object (which is always unicode!) "binary data" - uses bytes or bytearray; Everything that's about text is unicode. the core Python developers) need to provide some clearer guidance on how to handle text processing tasks that trigger exceptions by default in Python 3, but were previously swept under the rug by Python 2's blithe assumption that all files are encoded in "latin-1". To avoid unexpected problems with these characters, use the rules provided in the following topics in your applications: Using ASCII Control Codes 0x000D and 0x000A; Using Byte Order Marks; Using Escape Sequences and Control Characters. ) in windows platform. Escape characters (also called escape sequences or escape codes) are used to signal an alternative interpretation of a series of characters. 10 do not properly identify the context of Windows shortcut files, which allows user-assisted remote attackers to bypass the Same Origin Policy via a crafted web site for which the user has previously saved a shortcut. If you have a unicode string you can do this: [code python] from urllib2 import quote escaped_string = quote(unicode_string. A script that looks like. Looking at the answer to a previous question, I have attempting using the "codecs" module to give me a little luck. Built-in String Methods. 0 release marked the first release in the development of Python that a new version was incompatible with the old one. Specific code points can be written using the \u escape sequence, which is followed by four hex digits giving. 6 you can keep them using the necessary Unicode-to-UTF8 workaround. The problem is that the unicode value used is one. Python中读取txt文本出现:SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape问题解决 windows中的路径是反斜杠\,然而反斜杠\在python中有着转义字符的意义,所以在py文件中写windows文件路径的时候,要特别注意反斜杠\的使用。. 7 under windows. Virus-free and 100% clean download. 3 but not unicode_escape. 6), it becomes fairly straightforward to insert a conversion in the output stream that will replace all those Unicode escape sequences with the original Unicode characters. The character is then encoded as Unicode (UTF-16). For example to add a hat to i the command is print('i\u0302'). This is basically designed as a simple way to construct "wide characters" even in a narrow Python build. WHAT IS UNICODE? Unicode (UCS-2 ISO 10646) is a 16-bit character encoding that contains all of the characters (216 = 65,536 different characters total) in common use in the world's major languages, including Vietnamese. Return NULL if an exception was raised by the codec. The following elements are used with character sets: Unicode and Character Set Functions. 7 and Python 3. This tutorial gives enough understanding on. Google Python Class Day 1 Part 1: Introduction and Strings. Unicodeエスケープされたバイト列へエンコードする場合は、第一引数encodingに'unicode-escape'を指定する。ハイフンではなくアンダースコアの'unicode_escape'でもOK。 unicode-escapeはPython特有のエンコーディング。. 6 on Windows. cpp implementation for "pywintypes. print, write and Unicode in pre-3. So you should only use unicode filenames. Unicode and the Windows console. 今天小编就为大家分享一篇Python3编码问题 Unicode utf-8 bytes互转方法,具有很好的参考价值,希望对大家有所帮助。一起跟随小编过来看看吧. Still, Encode the alphabet cipher Tic Tac Toe - C++14 What are Unicode Escape Python and utf-8 is the default encoding. In addition to Windows and OEM code pages, your applications can use non-native code pages. An Escape sequence starts with a backslash (\) which signals the compiler to treat it differently. This article describes the formula syntax and usage of the UNICODE function in Microsoft Excel. str vs unicode. Learn the fundamentals of programming to build web apps and manipulate data. Unicode characters outside of ASCII can be entered using the '\uXXXX' escape sequence or the '\N{name}' notation by the unicode character name. py resides in the current folder. To install pandas, you must keep in mind that following dependencies are installed on your machine beforehand. In the best case, all filenames will be encoded as valid UTF-8 strings and Python creates valid unicode strings. ) in windows platform. Below is a list of symbols and greek letters and the corresponding unicode escape to produce the character in python. If the key is a Python type or class, then the value is a callable Python object (usually a function) taking two arguments (value to convert, and the conversion dictionary) which converts values of this type to a SQL literal string value. I found it super-helpful to not think about what the console said, or work with the console, because the console lies. â€"Paul ErdÅ's. Convert Unicode Text to HTML Entities. License; 1. Unicode String. The pylatexenc. After cutting file name to 259 new posts via email. Learn the fundamentals of programming to build web apps and manipulate data. This can be used to convert things like special language symbols, things like smileys/emoticons, and really any of the symbols that you will come across as unicode as you parse or stream. str is used for binary data, unicode for text data. Python is a great language to write crawlers. In Python the 'u' or 'U' prefix denote an Unicode string. The below table contains a list of Python Escape sequence characters and relevant examples. ' as directory name you get unicode filenames, if you pass '. 0, strings are stored as Unicode, i. Nothing except Python itself. In Python 3 all strings are unicode strings and in Python 2. 0, the language's str type contains Unicode characters, meaning any string created using "unicode rocks!", 'unicode rocks!', or the triple-quoted string syntax is stored as Unicode. The Python interpreter has a number of functions that are always available for use. So above is the table of escape characters in Python. If you want to know number of some Unicode symbol, you may found it in a table. latexencode module provides a function unicode_to_latex() which converts a unicode string into LaTeX text and escape sequences. The issue is that the command prompt that gets executed in the back is encoded in your system codepage (most likely cp1252). Strings are immutable sequences of Unicode code points. jsonデータを展開したら\uを含む文字列が出てくる。 \uはunicodeの表現だということは分かるんだが、そもそも文字列がどういう状態になっているのかイマイチ分からない。 そこでPython3上でどんな扱いなのか調査した。結果は以下。. Info Unicode Character 'DEGREE SIGN' (U+00B0) Browser Test Page Outline (as SVG file) Fonts that support U+00B0. The second is evaluating the unicode escape sequence \u0041 that is. Notes: (1) Individual code units which form parts of a surrogate pair can be encoded using this escape sequence. Unicode in Python 3¶ The “string” object is Unicode (always). Common encodings are US-ASCII (which is the default if you convert from Unicode to 8-bit strings in Python), ISO-8859-1 (aka Latin-1), and UTF-8 (a variable-width encoding that can represent all. Deprecated since version 3. Python is even good at parsing unicode XML. There are two types of strings in Python: byte strings and Unicode strings. Any Unicode character can be encoded this way, but characters outside the Basic Multilingual Plane (BMP) will be encoded using a surrogate pair if Python is compiled to use 16-bit code units (the default). Python search path (on Windows) I would like to execute a Python script from Windows command line: Python myscript. Here is an example from the Python Interpreter:. 1, on a windows 7 machines. You can vote up the examples you like or vote down the ones you don't like. home > topics > python > questions > open filename with spaces in path No need to escape the spaces. Learn Python, a powerful language used by sites like YouTube and Dropbox. In Python 3 Unicode is the default encoding. exe, Pythonwin. Scholarly Search Engine Find information the second column provides the Unicodeescape sequence for the character,. > I suppose changing the Windows codepage for a single Popen call isn't > straightforward/possible? Nevermind. With wxPython software developers can create truly native user interfaces for their Python applications, that run with little or no modifications on Windows, Macs and Linux or other unix-like systems. Another reason for using slashes is your code to be uniform and homogeneous. Russian is the default system language, and utf-8 is the default encoding. Here is an example from the Python Interpreter:. Unicode character symbols table with escape sequences & HTML codes. Switching between quotation marks to display a character string is nothing but pain and to deal with it, Python provides us with an option that is “Escape Character”. 2) To indicate that the next character following should be passed without substitution. Purpose: This page is a PC utility to show the hex codes and their decimal ampersand equivalents associated with non-Latin-1 (non-Roman or accented) characters from pages encoded in Unicode/UTF-8. If you are using single quotes to print out a string and you have a single quote in the string, then you would have either 1 of 2 choices: you can use double quotes to enclose the string or you can use the escape character within the string enclosed by single quotes to have a single quote appear in the string. 5 file operation will use the wchar API correctly if you pass them a unicode object as filename, instead of a byte string. For the escape sequences that represent each of these characters, see Table 9. The term decoding can be confusing when you first start learning about Unicode in Python. Unicode in Python 3 If you are using Python 3:. Unicode¶ The main goal of this cheat sheet is to collect some common snippets which are related to Unicode. Prebuilt Python Binaries for Windows As you probably know psyco is the Python specializing compiler, created by Armin Rigo who is now working on PyPy. Pre-Python 3. str vs unicode. Unicode Literals in Python Source Code ¶. On both windows and Linux the following fails on Python 2. Python string method replace() returns a copy of the string in which the occurrences of old have been replaced with new, optionally restricting the number of replacements to max. For an alphabetical index of character and block names, use the Unicode Character Names Index. It is quite likely that when migrating existing code and writing new code you may be unaware of this change as most string algorithms will work with either type of representation; but you cannot intermix the two. It's all much cleaner. 0: Part of the old-style Py_UNICODE API; please migrate to using PyUnicode_AsRawUnicodeEscapeString(). Convert Unicode Text to HTML Entities. New Line Code: HTML Decimal, HTML Hexadecimal, Java Hexadecimal, URL Hexadecimal, Escape Sequence. python-future is the missing compatibility layer between Python 2 and Python 3. The same source code archive can also be used to build the Windows and Mac versions, and is the starting point for ports to all other platforms. Red Hat Enterprise Linux 3 Red Hat Enterprise Linux 4 Red Hat Enterprise Linux 5 Mozilla Firefox before 2. Programming with Unicode¶. OpenCV on Wheels. In Python, the data in a unicode or byte string is exactly the same. Unicode is an ambitious standard that attempts to provide a numeric code for every possible character, in every possible language, on every possible platform. str − This is the String to be written in the file. 6), it becomes fairly straightforward to insert a conversion in the output stream that will replace all those Unicode escape sequences with the original Unicode characters. In this tutorial, you'll get a Python-centric introduction to character encodings and unicode. We can add a hat ^ (also called a circumflex) by putting the unicode escape after the letter you want to add a hat to. Read more. ) in windows platform. Return NULL if an exception was raised by the codec. Python basics : Strings, unicode and other annoying text behaviors, Python 3 unicode and strings, Python backslash escape character and raw strings, Python backslash plague and raw strings with regular expressions, Python 3 strings and bytes. Tip: Escape Unicode special characters in C#. Chilkat Software. 这应该是用python会遇到的第一个问题。 (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape 上图:. unicode_escape_decode(). 日本 is written in Python as u'\u65e5\u672c' using two Unicode characters provided via escapes. To convert symbol to binary data, you need to lookup binary a value for in in the encoding. Right way to decode raw_unicode_escape-encoded data is use 'raw_unicode_escape' decoder. python3 把\u开头的unicode转中文 python3 中 python2 中 总结 unicode_escape可看作Unicode的反向编码,属于unicode. Strings are immutable sequences of Unicode code points. Using Windows folders as ArcPy Parameters? Ask Question simple folder management operations in an ArcGIS python toolbox. 1, on a windows 7 machines. 0 (and backported to Python 2. ASCII was developed a long time ago and now the non-printing characters are rarely used for their original purpose. For Python 2, strings that contain Unicode characters must start with u in front of the string. Further information can be found on PEP 3100. It's all much cleaner.