
ASCII encoding problem when clicking on a tag in Greek: the URL is broken because it contains <<< >>> characters

asked 2012-03-31 14:07:13 -0500

alexandros.z

updated 2012-04-05 16:59:55 -0500

Hello, this is the stack trace I get when I click on a tag that is in Greek:

Traceback (most recent call last):
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\core\servers\", line 283, in run
    self.result = application(self.environ, self.start_response)
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\contrib\staticfiles\", line 68, in __call__
    return self.application(environ, start_response)
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\core\handlers\", line 272, in __call__
    response = self.get_response(request)
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\core\handlers\", line 146, in get_response
    response = debug.technical_404_response(request, e)
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\views\", line 294, in technical_404_response
    'reason': smart_str(exception, errors='replace'),
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\utils\", line 123, in smart_str
    errors) for arg in s])
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\utils\", line 124, in smart_str
    return unicode(s).encode(encoding, errors)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 24-30: ordinal not in range(128)

Is this a bug? Any advice?

One more thing I noticed that may help: this does not happen on every page. For example, the same tag works on the /tags page, but not on the home page.
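The last line of the traceback means that a string containing non-ASCII (here, Greek) characters was encoded with the ASCII codec. A minimal sketch that reproduces that error in isolation, written for Python 3 (the question uses Python 2.7); the tag `ελληνικά` and the message text are hypothetical examples, not taken from the actual 404 page:

```python
# Python 3 sketch: reproduce "'ascii' codec can't encode characters" in isolation.
# The tag and the message text below are hypothetical examples.

tag = "ελληνικά"
message = "No tag matches /tags/" + tag  # hypothetical text containing a Greek tag


def to_ascii(text: str, errors: str = "strict") -> bytes:
    """Encode text with the ASCII codec; any non-ASCII character is an error."""
    return text.encode("ascii", errors)


error_text = ""
error_start = -1
try:
    to_ascii(message)
except UnicodeEncodeError as exc:
    # exc.start/exc.end are the indices of the run of Greek characters.
    error_text = str(exc)
    error_start = exc.start

# errors="replace" substitutes "?" for each character ASCII cannot represent.
replaced = to_ascii(message, errors="replace")

print(error_text)
print(replaced)
```

The "position 24-30" in the original message plays the same role as `exc.start`/`exc.end` here: it points at the run of characters outside the ASCII range.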


2 Answers


answered 2012-04-10 16:12:41 -0500

zaf

updated 2012-04-10 16:15:03 -0500

Yes, adding the re.UNICODE flag, as suggested in the comments, is enough. Check this solution for a detailed answer.
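A minimal sketch of why the flag matters, assuming the tag pattern relies on `\w`. In Python 2, `\w` matches only `[a-zA-Z0-9_]` unless `re.UNICODE` (or `re.LOCALE`) is given; Python 3 makes Unicode matching the default for str patterns, so `re.ASCII` is used below to imitate the old behaviour. `TAG_PATTERN` is a hypothetical stand-in for `const.TAG_REGEX_BARE`, whose real value may differ. In the Python 2 code from the question, the corresponding change is passing `re.UNICODE` as the second argument to `re.compile`.

```python
import re

# Hypothetical stand-in for const.TAG_REGEX_BARE; askbot's real pattern may differ.
TAG_PATTERN = r"<<<([\w.#+-]+)>>>"
placeholder = "<<<ελληνικά>>>"  # hypothetical Greek tag placeholder

# re.ASCII restricts \w to [a-zA-Z0-9_], mirroring Python 2's default \w.
ascii_regex = re.compile(TAG_PATTERN, re.ASCII)

# re.UNICODE lets \w match letters of any script, Greek included.
# (In Python 3 this is already the default for str patterns.)
unicode_regex = re.compile(TAG_PATTERN, re.UNICODE)

ascii_match = ascii_regex.search(placeholder)      # no match: placeholder survives
unicode_match = unicode_regex.search(placeholder)  # matches the whole placeholder

print("ASCII-only \\w matches:", ascii_match is not None)  # False
print("Unicode \\w matches:", unicode_match is not None)   # True
```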



Yes indeed, this also worked for me. In case anyone wants to read more about re.UNICODE, here is the official documentation.

alexandros.z (2012-04-10 16:21:03 -0500)

answered 2012-04-05 17:03:42 -0500

alexandros.z

updated 2012-04-06 02:11:14 -0500

I am still investigating, but I think the problem appears because a regular expression in is not valid. Please have a look:

import re

from askbot import const


def get_summary_html(self, search_state):
    html = self.get_cached_summary_html()
    if not html:
        html = self.update_summary_html()

    # use `<<<` and `>>>` because they cannot be confused with user input
    # - if user accidentally types <<<tag-name>>> into question title or body,
    # then in html it'll become escaped like this: &lt;&lt;&lt;tag-name&gt;&gt;&gt;
    regex = re.compile(r'<<<(%s)>>>' % const.TAG_REGEX_BARE)

    while True:
        match = regex.search(html)
        if not match:
            break
        seq = match.group(0)  # e.g "<<<my-tag>>>"
        tag = match.group(1)  # e.g "my-tag"
        full_url = search_state.add_tag(tag).full_url()
        html = html.replace(seq, full_url)

    return html

Since my tags are in Greek rather than English, my URL may contain characters the regex does not expect. The regex may therefore fail to match, so these <<< >>> characters are never removed.
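To see how an unmatched placeholder turns into the broken link from the title, the loop from `get_summary_html` can be run standalone. This is a Python 3 sketch under stated assumptions: `TAG_REGEX_BARE` and `build_tag_url` are hypothetical stand-ins for `const.TAG_REGEX_BARE` and `search_state.add_tag(tag).full_url()`, the sample summary HTML is made up, and passing `re.ASCII` imitates Python 2's default `\w`.

```python
import html
import re

# Hypothetical stand-in for askbot's const.TAG_REGEX_BARE.
TAG_REGEX_BARE = r"[\w.#+-]+"


def build_tag_url(tag: str) -> str:
    """Hypothetical stand-in for search_state.add_tag(tag).full_url()."""
    return "/questions/tags:" + tag + "/"


def expand_tag_placeholders(summary_html: str, flags: int = 0) -> str:
    """Replace each <<<tag>>> placeholder with a tag URL, like get_summary_html."""
    regex = re.compile(r"<<<(%s)>>>" % TAG_REGEX_BARE, flags)
    while True:
        match = regex.search(summary_html)
        if not match:
            break  # no placeholder left that this regex can recognise
        seq = match.group(0)  # e.g. "<<<my-tag>>>"
        tag = match.group(1)  # e.g. "my-tag"
        summary_html = summary_html.replace(seq, build_tag_url(tag))
    return summary_html


summary = '<a href="<<<python>>>">python</a> <a href="<<<ελληνικά>>>">ελληνικά</a>'

# ASCII-only \w (Python 2 default): the Greek placeholder is never matched,
# so the rendered href is literally "<<<ελληνικά>>>".
broken = expand_tag_placeholders(summary, re.ASCII)

# Unicode-aware \w: both placeholders become URLs.
fixed = expand_tag_placeholders(summary, re.UNICODE)

# Text typed by a user is HTML-escaped first, so it never looks like a placeholder.
user_text = html.escape("<<<python>>>")  # "&lt;&lt;&lt;python&gt;&gt;&gt;"
untouched = expand_tag_placeholders(user_text, re.UNICODE)
```

Clicking a link whose href is still the raw placeholder requests a path that does not resolve, which would be consistent with the 404 handler (`technical_404_response`) appearing in the traceback.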



Could you please give examples where this regex would fail?

Evgeny (2012-04-06 09:53:49 -0500)

I am sorry, I should rephrase, because I am afraid my statement was not clear enough. I do not mean that there is a problem in the regex. I am saying that the fact that my URL is broken (because it contains these <<< >>> characters) may be related to this piece of code. I need to investigate more. Any tips are more than welcome.

alexandros.z (2012-04-06 16:59:18 -0500)

Makes sense. We'd like to have examples that break the URLs; then we can fix the regex. Maybe you could try adding the re.UNICODE flag to the regex. Does that help?

Evgeny (2012-04-06 17:12:19 -0500)

Yep, this helps! I will let you know.

alexandros.z (2012-04-06 17:47:05 -0500)

I had a similar issue, and for me adding the re.UNICODE flag was not enough: I found that my Python used different locale settings in locale., sys. and os.environ. To fix the locale issue I had to: 1) export ru_RU.UTF-8 through an environment variable; 2) convert some template files to UTF-8 with BOM (without that step I got "UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)").

Yes, I have no idea what I'm doing.

Olloff (2012-04-08 00:02:59 -0500)
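The decode error quoted in the locale comment can be checked the same way. A minimal Python 3 sketch (the thread itself uses Python 2.7): it collects the encoding settings the interpreter reports and shows why UTF-8 bytes starting with 0xd0 (a Cyrillic letter) fail under the ASCII codec. `decode_template` is a hypothetical helper; it uses the standard `utf-8-sig` codec, which decodes UTF-8 and drops a leading BOM if one is present.

```python
import codecs
import locale
import os
import sys


def report_encodings() -> dict:
    """Collect the encoding-related settings the interpreter currently sees."""
    return {
        "locale.getpreferredencoding": locale.getpreferredencoding(False),
        "sys.getdefaultencoding": sys.getdefaultencoding(),
        "sys.getfilesystemencoding": sys.getfilesystemencoding(),
        "LANG": os.environ.get("LANG"),
    }


def decode_template(raw: bytes) -> str:
    """Hypothetical helper: decode template bytes as UTF-8, dropping a BOM if present."""
    return raw.decode("utf-8-sig")


raw = "привет".encode("utf-8")  # Cyrillic text; the first byte is 0xd0
with_bom = codecs.BOM_UTF8 + raw  # the same text saved "UTF-8 with BOM"

# Decoding UTF-8 bytes with the ASCII codec fails on the first non-ASCII byte.
ascii_error = ""
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

for name, value in report_encodings().items():
    print(name, "=", value)
print(ascii_error)
```

If the values printed for `locale`, `sys` and `LANG` disagree, that matches the mismatch described in the comment.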

Asked: 2012-03-31 14:07:13 -0500

Seen: 755 times

Last updated: Apr 10 '12