
ASCII encoding problem when clicking on a tag that is in Greek: URL is broken because it contains <<< >>> characters

asked 2012-03-31 14:07:13 -0600

alexandros.z

updated 2012-04-05 16:59:55 -0600

Hello, this is the stack trace I get when I click on a tag that is in Greek:

Traceback (most recent call last):
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\core\servers\basehttp.py", line 283, in run
    self.result = application(self.environ, self.start_response)
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\contrib\staticfiles\handlers.py", line 68, in __call__
    return self.application(environ, start_response)
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\core\handlers\wsgi.py", line 272, in __call__
    response = self.get_response(request)
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\core\handlers\base.py", line 146, in get_response
    response = debug.technical_404_response(request, e)
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\views\debug.py", line 294, in technical_404_response
    'reason': smart_str(exception, errors='replace'),
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\utils\encoding.py", line 123, in smart_str
    errors) for arg in s])
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\utils\encoding.py", line 124, in smart_str
    return unicode(s).encode(encoding, errors)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 24-30: ordinal not in range(128)

Is this a bug? Any advice?

One more thing I have noticed that may help: this does not happen on all pages. For example, the same tag works on the /tags page but not on the home page.
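For readers unfamiliar with this class of error, here is a minimal sketch of what the last line of the traceback means. It does not reproduce the Django code path; it only shows that Greek text cannot be encoded with the ASCII codec, while UTF-8 handles it. The names GREEK_TAG and encode_or_error are hypothetical and exist only for this illustration.

```python
# A Greek word written with escapes so the file itself stays ASCII.
# GREEK_TAG is a hypothetical tag name, not one taken from the thread.
GREEK_TAG = u'\u03b5\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac'


def encode_or_error(text, encoding):
    """Return (encoded_bytes, None) on success or (None, error) on failure."""
    try:
        return text.encode(encoding), None
    except UnicodeEncodeError as exc:
        return None, exc


# ASCII has no code points for Greek letters, so this attempt fails.
ascii_bytes, ascii_error = encode_or_error(GREEK_TAG, 'ascii')

# UTF-8 can represent every Unicode character, so this attempt succeeds.
utf8_bytes, utf8_error = encode_or_error(GREEK_TAG, 'utf-8')

print(ascii_error)
print(repr(utf8_bytes))
```

The error printed for the ASCII attempt has the same form as the one in the traceback, which suggests that somewhere a Unicode string containing the Greek tag is being encoded with the default ASCII codec.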


2 Answers


answered 2012-04-10 16:12:41 -0600

zaf

updated 2012-04-10 16:15:03 -0600

Yes, adding re.UNICODE, as suggested in the comments, works just fine. Check this solution for a detailed answer.


Comments

Yes indeed, this also worked for me. In case anyone wants to read more about re.UNICODE, here is the official documentation: http://docs.python.org/library/re.html#re.UNICODE

alexandros.z ( 2012-04-10 16:21:03 -0600 )

answered 2012-04-05 17:03:42 -0600

alexandros.z

updated 2012-04-06 02:11:14 -0600

Still investigating, but I think the problem appears because a regular expression in question.py is not valid. Please have a look:

def get_summary_html(self, search_state):
    html = self.get_cached_summary_html()
    if not html:
        html = self.update_summary_html()

    # use `<<<` and `>>>` because they cannot be confused with user input
    # - if user accidentialy types <<<tag-name>>> into question title or body,
    # then in html it'll become escaped like this: &lt;&lt;&lt;tag-name&gt;&gt;&gt;
    regex = re.compile(r'<<<(%s)>>>' % const.TAG_REGEX_BARE)

    while True:
        match = regex.search(html)
        if not match:
            break
        seq = match.group(0)  # e.g "<<<my-tag>>>"
        tag = match.group(1)  # e.g "my-tag"
        full_url = search_state.add_tag(tag).full_url()
        html = html.replace(seq, full_url)

    return html

Since my tags are in Greek rather than English, there is a chance that my URL already contains other unusual symbols. The regex may therefore get confused and fail to remove these <<< >>> characters.
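A minimal sketch of how this could happen, assuming the tag pattern relies on \w (the actual value of const.TAG_REGEX_BARE is not shown in this thread). TAG_PATTERN, replace_markers and the /tags/ URL are hypothetical stand-ins. On Python 2, \w matches only ASCII characters unless re.UNICODE is given; on Python 3, str patterns are Unicode-aware by default, so the sketch passes re.ASCII there to reproduce the Python 2 behaviour.

```python
import re

# Hypothetical stand-in for const.TAG_REGEX_BARE (assumed to rely on \w).
TAG_PATTERN = r'[\w+.#-]+'

# Python 2 treats \w as ASCII-only when no flag is given (flags=0);
# Python 3 needs re.ASCII to behave the same way.
ASCII_ONLY = getattr(re, 'ASCII', 0)

# A Greek word written with escapes; a hypothetical tag name.
GREEK_TAG = u'\u03b5\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac'


def replace_markers(html, flags):
    """Swap every <<<tag>>> marker for a (hypothetical) tag URL."""
    regex = re.compile(u'<<<(%s)>>>' % TAG_PATTERN, flags)
    return regex.sub(lambda m: u'/tags/%s/' % m.group(1), html)


html = u'<a href="<<<%s>>>">' % GREEK_TAG

# Without Unicode-aware matching the Greek letters are not \w characters,
# so the pattern never matches and the markers stay in the output.
without_flag = replace_markers(html, ASCII_ONLY)

# With re.UNICODE the Greek letters count as \w and the marker is replaced.
with_flag = replace_markers(html, re.UNICODE)
```

Under these assumptions, the first call leaves `<<<` and `>>>` in the href, which would match the broken URL described in the question, and the second call replaces the marker as intended.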


Comments

Could you please give samples where this regex would fail?

Evgeny ( 2012-04-06 09:53:49 -0600 )

I am sorry, I should rephrase, because I am afraid my statement was not clear enough. I don't mean that there is a problem in the regex. I am saying that my broken URL (it contains the characters <<<>>>) may be related to this piece of code. I need to investigate more. Any tips are more than welcome.

alexandros.z ( 2012-04-06 16:59:18 -0600 )

Makes sense. We'd like to have examples that break the URLs; then we can fix the regex. Maybe you can try adding the re.UNICODE flag to the regex. Will that help?

Evgeny ( 2012-04-06 17:12:19 -0600 )

Yep, this helps! I will let you know.

alexandros.z ( 2012-04-06 17:47:05 -0600 )

I have a similar issue, and for me adding the re.UNICODE flag was not enough: I found that my Python uses different locale settings in locale., sys. and os.environ. To fix the locale issue I had to: 1) export ru_RU.UTF-8 through an environment variable; 2) convert some template files to UTF-8 with BOM (without that step I got "UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)").

Yes, I have no idea what I'm doing.

Olloff ( 2012-04-08 00:02:59 -0600 )
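For anyone who wants to check whether their own setup shows a similar mismatch, here is a rough sketch that prints the encoding-related settings Python sees. The comment above does not say which attributes were compared, so the particular calls chosen here are an assumption; encoding_report is a hypothetical helper name.

```python
import locale
import os
import sys


def encoding_report():
    """Collect encoding-related settings visible to this Python process."""
    return {
        'locale.getpreferredencoding()': locale.getpreferredencoding(),
        'sys.getdefaultencoding()': sys.getdefaultencoding(),
        'sys.getfilesystemencoding()': sys.getfilesystemencoding(),
        # Environment variables may be unset, in which case None is reported.
        'os.environ LANG': os.environ.get('LANG'),
        'os.environ LC_ALL': os.environ.get('LC_ALL'),
    }


for name, value in sorted(encoding_report().items()):
    print('%-32s %r' % (name, value))
```

If the reported values disagree (for example an ASCII default encoding alongside a UTF-8 locale), that may point to the kind of mismatch described in the comment above.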
