
ASCII encoding problem when clicking on a tag in Greek: the URL is broken because it contains <<< >>> characters

Hello, this is the stack trace I get when I click on a tag that is in Greek:

Traceback (most recent call last):
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\core\servers\basehttp.py", line 283, in run
    self.result = application(self.environ, self.start_response)
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\contrib\staticfiles\handlers.py", line 68, in __call__
    return self.application(environ, start_response)
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\core\handlers\wsgi.py", line 272, in __call__
    response = self.get_response(request)
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\core\handlers\base.py", line 146, in get_response
    response = debug.technical_404_response(request, e)
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\views\debug.py", line 294, in technical_404_response
    'reason': smart_str(exception, errors='replace'),
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\utils\encoding.py", line 123, in smart_str
    errors) for arg in s])
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\utils\encoding.py", line 124, in smart_str
    return unicode(s).encode(encoding, errors)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 24-30: ordinal not in range(128)

Is this a bug? Any advice?

One more thing I have noticed that may help: this does not happen on every page. For example, the same tag works on the /tags page, but not on the home page.

alexandros.z
asked 2012-03-31 14:07:13 -0600, updated 2012-04-05 16:59:55 -0600


2 Answers


Yes, adding the re.UNICODE flag, as suggested in the comments, works fine. Check this solution for a detailed answer.
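A minimal sketch of the suggested fix, written as standalone Python 3 rather than the actual Askbot code. `TAG_REGEX_BARE` is a hypothetical stand-in for `const.TAG_REGEX_BARE`, whose real value is not shown in this thread; it is assumed to be built on `\w`. In Python 2, `\w` matches only `[a-zA-Z0-9_]` unless `re.UNICODE` is given, so `re.ASCII` is used below to emulate that old default for comparison.

```python
import re

# Hypothetical stand-in for const.TAG_REGEX_BARE (assumed to be \w-based).
TAG_REGEX_BARE = r'[\w+.#-]+'

# Emulates Python 2's default: \w matches ASCII letters, digits and _ only.
ascii_regex = re.compile(r'<<<(%s)>>>' % TAG_REGEX_BARE, re.ASCII)

# The suggested fix: with re.UNICODE, \w also matches Greek letters.
unicode_regex = re.compile(r'<<<(%s)>>>' % TAG_REGEX_BARE, re.UNICODE)

html = '<<<ελληνικά>>>'
print(ascii_regex.search(html))                # None: Greek letters are not ASCII \w
print(unicode_regex.search(html) is not None)  # True: the Greek tag is matched
```

In Python 3, `str` patterns are Unicode-aware by default, so the explicit flag is redundant there; it matters for Python 2 code such as the Django 1.3 / Python 2.7 setup in the traceback.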

zaf
answered 2012-04-10 16:12:41 -0600, updated 2012-04-10 16:15:03 -0600

Comments

Yes indeed, this also worked for me. In case anyone wants to read more about re.UNICODE, here is the official documentation: http://docs.python.org/library/re.html#re.UNICODE

alexandros.z (2012-04-10 16:21:03 -0600)

I'm still investigating, but I think the problem appears because a regular expression in question.py is not valid. Please have a look:

def get_summary_html(self, search_state):
    html = self.get_cached_summary_html()
    if not html:
        html = self.update_summary_html()

    # use `<<<` and `>>>` because they cannot be confused with user input
    # - if user accidentally types <<<tag-name>>> into question title or body,
    # then in html it'll become escaped like this: &lt;&lt;&lt;tag-name&gt;&gt;&gt;
    regex = re.compile(r'<<<(%s)>>>' % const.TAG_REGEX_BARE)

    while True:
        match = regex.search(html)
        if not match:
            break
        seq = match.group(0)  # e.g. "<<<my-tag>>>"
        tag = match.group(1)  # e.g. "my-tag"
        full_url = search_state.add_tag(tag).full_url()
        html = html.replace(seq, full_url)

    return html

Since my tags are not in English but in Greek, my URL may already contain unusual characters. The regex may therefore get confused and fail to remove these <<< >>> characters.
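To see how a Greek tag could leave `<<< >>>` in the URL, here is a hypothetical, self-contained reproduction of the substitution loop from `get_summary_html`. `TAG_REGEX_BARE`, `fake_tag_url` and the sample HTML are all made up for illustration; `fake_tag_url` stands in for `search_state.add_tag(tag).full_url()`. The loop only ends when no remaining placeholder matches, and any placeholder the regex cannot match (here, one with Greek letters under an ASCII-only `\w`) is silently left in the HTML.

```python
import re

# Hypothetical stand-in for const.TAG_REGEX_BARE (real value not shown here).
TAG_REGEX_BARE = r'[\w+.#-]+'


def fake_tag_url(tag):
    """Hypothetical replacement for search_state.add_tag(tag).full_url()."""
    return '/questions/tags:%s/' % tag


def expand_tags(html, flags=0):
    """Replace every <<<tag>>> placeholder, mirroring get_summary_html's loop."""
    regex = re.compile(r'<<<(%s)>>>' % TAG_REGEX_BARE, flags)
    while True:
        match = regex.search(html)
        if not match:
            # Nothing left that the regex can match; unmatched
            # <<<...>>> placeholders remain in the HTML as-is.
            break
        html = html.replace(match.group(0), fake_tag_url(match.group(1)))
    return html


summary = ('<a href="<<<python>>>">python</a> '
           '<a href="<<<ελληνικά>>>">ελληνικά</a>')

result_ascii = expand_tags(summary, re.ASCII)      # Python 2 style \w
result_unicode = expand_tags(summary, re.UNICODE)  # Unicode-aware \w
print('<<<' in result_ascii)    # True: the Greek placeholder was left behind
print('<<<' in result_unicode)  # False: every placeholder was expanded
```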

alexandros.z
answered 2012-04-05 17:03:42 -0600, updated 2012-04-06 02:11:14 -0600

Comments

Could you please give samples where this regex would fail?

Evgeny (2012-04-06 09:53:49 -0600)

Sorry, I should rephrase, because I'm afraid my statement was not clear enough. I don't mean that there is a problem in the regex. I am saying that the fact that my URL is broken (because it contains these <<< >>> characters) may be related to this piece of code. I need to investigate more. Any tips are more than welcome.

alexandros.z (2012-04-06 16:59:18 -0600)

Makes sense. We'd like examples that break the URLs so we can fix the regex. Maybe you could try adding the re.UNICODE flag to the regex; does that help?

Evgeny (2012-04-06 17:12:19 -0600)

Yep, this helps! I will let you know.

alexandros.z (2012-04-06 17:47:05 -0600)

I had a similar issue, and for me adding the re.UNICODE flag was not enough. I found that my Python used different locale settings in locale.*, sys.* and os.environ. To fix the locale issue I had to: 1) export ru_RU.UTF-8 through an environment variable; 2) convert some template files to UTF-8 with BOM (without that step I got "UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)").
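A small diagnostic sketch for comparing the settings mentioned in the comment above. It only reads values through standard-library calls (`locale.getpreferredencoding`, `sys.getdefaultencoding`, `sys.getfilesystemencoding`, `os.environ`); the `encoding_report` helper is hypothetical, and which variables actually need changing depends on the system. On Python 2, `sys.getdefaultencoding()` normally returns `'ascii'`, which is the codec named in both error messages in this thread.

```python
import locale
import os
import sys


def encoding_report():
    """Hypothetical helper: collect encoding settings that can disagree."""
    return {
        # Encoding implied by the current locale (LANG / LC_* environment).
        'locale.getpreferredencoding': locale.getpreferredencoding(False),
        # Interpreter-wide default for implicit str/bytes conversions.
        'sys.getdefaultencoding': sys.getdefaultencoding(),
        # Encoding used for file names.
        'sys.getfilesystemencoding': sys.getfilesystemencoding(),
        # Raw environment variables that drive the locale (None if unset).
        'LANG': os.environ.get('LANG'),
        'LC_ALL': os.environ.get('LC_ALL'),
    }


for name, value in encoding_report().items():
    print('%-28s %r' % (name, value))
```

If these values disagree (for example a UTF-8 locale but an `'ascii'` default encoding), implicit conversions of non-ASCII text are a likely place for `UnicodeDecodeError`/`UnicodeEncodeError` to appear.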

Yes, I have no idea what I'm doing.

Olloff (2012-04-08 00:02:59 -0600)