2

ASCII encoding problem when clicking on a tag that is in Greek: the URL is broken because it contains <<< >>> characters
 

Hello, this is the stack trace I get when I click on a tag that is in Greek:

Traceback (most recent call last):
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\core\servers\basehttp.py", line 283, in run
    self.result = application(self.environ, self.start_response)
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\contrib\staticfiles\handlers.py", line 68, in __call__
    return self.application(environ, start_response)
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\core\handlers\wsgi.py", line 272, in __call__
    response = self.get_response(request)
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\core\handlers\base.py", line 146, in get_response
    response = debug.technical_404_response(request, e)
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\views\debug.py", line 294, in technical_404_response
    'reason': smart_str(exception, errors='replace'),
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\utils\encoding.py", line 123, in smart_str
    errors) for arg in s])
  File "c:\work\python\lib\site-packages\django-1.3.1-py2.7.egg\django\utils\encoding.py", line 124, in smart_str
    return unicode(s).encode(encoding, errors)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 24-30: ordinal not in range(128)

Is this a bug? Any advice?

One more thing I have noticed that may help: this does not happen on all pages. For example, the same tag seems to work on the /tags page, but it does not work on the home page.
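For readers unfamiliar with the last line of the traceback, here is a minimal sketch of how the 'ascii' codec fails on non-ASCII text. It is written for Python 3 and does not reproduce the Django code path; the sample string is purely illustrative.

```python
# Illustrative text only: four ASCII characters followed by a Greek word.
text = 'tag ελληνικά'

error = None
try:
    # The ascii codec only covers code points 0-127, so Greek letters fail.
    text.encode('ascii')
except UnicodeEncodeError as exc:
    error = exc
    print(exc)

# errors='replace' substitutes '?' for every character that cannot be encoded.
replaced = text.encode('ascii', errors='replace')
print(replaced)

# UTF-8 can represent the whole string without loss.
utf8 = text.encode('utf-8')
print(utf8.decode('utf-8') == text)
```

The exception object records which characters could not be encoded, which is what the "position 24-30" part of the message above refers to.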

alexandros.z
asked 13 years ago, updated 13 years ago

2 Answers

2

Still investigating, but I think the problem appears because a regular expression in question.py is not valid. Please have a look:

def get_summary_html(self, search_state):
    html = self.get_cached_summary_html()
    if not html:
        html = self.update_summary_html()

    # use `<<<` and `>>>` because they cannot be confused with user input
    # - if user accidentialy types <<<tag-name>>> into question title or body,
    # then in html it'll become escaped like this: &lt;&lt;&lt;tag-name&gt;&gt;&gt;
    regex = re.compile(r'<<<(%s)>>>' % const.TAG_REGEX_BARE)

    while True:
        match = regex.search(html)
        if not match:
            break
        seq = match.group(0)  # e.g "<<<my-tag>>>"
        tag = match.group(1)  # e.g "my-tag"
        full_url = search_state.add_tag(tag).full_url()
        html = html.replace(seq, full_url)

    return html

Since my tags are not in English but in Greek, there is a chance that my URL already contains unusual symbols. Thus, the regex may get confused and fail to remove these <<< >>> characters.
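One way to test this suspicion in isolation is to check whether the marker pattern matches a Greek tag at all. The sketch below makes two assumptions: `TAG_REGEX_BARE` is a guessed stand-in (the real value of `const.TAG_REGEX_BARE` is not shown in this thread), and it runs on Python 3, where `re.ASCII` is used to emulate how Python 2.7 treats `\w` when no flag is given.

```python
import re

# Assumed stand-in for const.TAG_REGEX_BARE; the real value is not shown here.
TAG_REGEX_BARE = r'[\w+.#-]+'

# re.ASCII limits \w to [a-zA-Z0-9_], which is Python 2.7's behaviour
# when neither re.UNICODE nor re.LOCALE is passed.
marker = re.compile(r'<<<(%s)>>>' % TAG_REGEX_BARE, re.ASCII)

# Hypothetical cached summary snippets with unfilled tag markers.
english_html = '<a href="<<<my-tag>>>">my-tag</a>'
greek_html = '<a href="<<<ετικέτα>>>">ετικέτα</a>'

english_match = marker.search(english_html)
greek_match = marker.search(greek_html)

print(english_match.group(1) if english_match else None)
print(greek_match)
```

If the pattern finds no match for the Greek tag, the loop in `get_summary_html` exits immediately and the `<<<...>>>` marker stays in the HTML, which would explain the broken URL.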

alexandros.z
answered 13 years ago, updated 13 years ago

Comments

Could you please give samples where this regex would fail?

Evgeny (13 years ago)

I am sorry, I should rephrase because I am afraid my statement was not clear enough. I don't mean that there is a problem in the regex itself. I am saying that my broken URL (broken because it contains these <<< >>> characters) may be related to this piece of code. I need to investigate more. Any tips are more than welcome.

alexandros.z (13 years ago)

Makes sense. We'd like to have examples that break the URLs; then we can fix the regex. Maybe you can try adding the re.UNICODE flag to the regex. Will that help?

Evgeny (13 years ago)

Yep, this helps! I will let you know.

alexandros.z (13 years ago)

I have a similar issue, and for me adding the re.UNICODE flag was not enough. I found that my Python uses different locale settings in locale., sys. and os.environ. To fix the locale issue I had to: 1) export ru_RU.UTF-8 through an environment variable; 2) convert some template files to UTF-8 with BOM (without that step I got "UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)").

Yes, I have no idea what I'm doing.
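A quick way to see whether these settings disagree is to print them side by side. This is only a sketch: the comment does not say which functions were compared, so the calls below (all standard library) and the helper name `encoding_report` are assumptions.

```python
import locale
import os
import sys


def encoding_report():
    """Collect encoding-related settings that may disagree with each other."""
    return {
        'locale.getpreferredencoding': locale.getpreferredencoding(False),
        'sys.getdefaultencoding': sys.getdefaultencoding(),
        'sys.getfilesystemencoding': sys.getfilesystemencoding(),
        # Environment variables return None when they are not set.
        'LANG': os.environ.get('LANG'),
        'LC_ALL': os.environ.get('LC_ALL'),
    }


report = encoding_report()
for name, value in sorted(report.items()):
    print('%s = %r' % (name, value))
```

If the values differ, for example an unset LANG next to a UTF-8 filesystem encoding, that may point to the kind of mismatch described above.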

Olloff (13 years ago)
2

Yes, re.UNICODE as suggested in the comments would be just fine. Check this solution for a detailed answer.
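A minimal sketch of the marker replacement with the flag applied, under stated assumptions: `TAG_REGEX_BARE` is a guessed stand-in for `const.TAG_REGEX_BARE`, `example_url` is a hypothetical replacement for `search_state.add_tag(tag).full_url()`, and `regex.sub` is used in place of the original search-and-replace loop. On Python 2.7 the `re.UNICODE` flag is what makes `\w` match Greek letters in unicode strings; on Python 3 it is already the default for `str` patterns, so `re.ASCII` is used below only to show the old behaviour.

```python
import re

# Assumed stand-in for const.TAG_REGEX_BARE; the real value is not shown here.
TAG_REGEX_BARE = r'[\w+.#-]+'


def fill_tag_urls(html, build_url, flags=re.UNICODE):
    """Replace every <<<tag>>> marker in html with build_url(tag)."""
    regex = re.compile(r'<<<(%s)>>>' % TAG_REGEX_BARE, flags)
    return regex.sub(lambda match: build_url(match.group(1)), html)


def example_url(tag):
    # Hypothetical URL scheme, for illustration only.
    return '/questions/tags:%s/' % tag


html = '<a href="<<<ετικέτα>>>">ετικέτα</a>'

fixed = fill_tag_urls(html, example_url)
still_broken = fill_tag_urls(html, example_url, flags=re.ASCII)

print(fixed)
print(still_broken)
```

With Unicode-aware matching the marker is replaced by a URL; with ASCII-only matching it is left in place.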

zaf
answered 13 years ago, updated 13 years ago

Comments

Yes indeed, this also worked for me. In case anyone wants to read more about re.UNICODE, here is the official documentation: http://docs.python.org/library/re.html#re.UNICODE

alexandros.z (13 years ago)