Recurrent 500 error

server
memory
deployment

Since I upgraded from 0.6.35 to 0.6.56, I have been getting a recurring 500 error. The cause seems to be my fastcgi process crashing or becoming unresponsive: the server log shows that no response was received from the fastcgi socket.

My other Django sites on the same server are running fine, so the problem appears to be specific either to Askbot or to the current Django version (Askbot is installed using pip, and I stick to the Django version that it auto-installs as a dependency).

I have been unable to reproduce the error short of leaving it running for a while.

It does seem that one of my Askbot processes always has memory usage that rises to about 26 MB (RSS) after serving a few pages. All my other Django processes are in the 11-15 MB range. I am not sure whether it's connected.

[Edited to fix incorrect title. Although the error mentioned was in the log, it did not recur when the 500 error did.]

13.2k

Evgeny

updated 2011-01-09 21:52:17 -0500

319

graeme

asked 2011-01-09 12:46:28 -0500


Comments

Hi Graeme, is there anything in the file "log/askbot.log"? Also, if "DEBUG=True" is set in settings.py, the error traceback will show on the screen, though debug mode consumes more memory. Does this happen when you try to log in? If there is a 500 error, it's likely to be a bug in the code. I will look into the memory footprint too. Is this 500 error returned by Python?

Evgeny (2011-01-09 13:20:16 -0500)

Please feel free to contact me directly; I can give more hands-on help. In this case it will probably be best to revert to the earlier version and upgrade when the issue is resolved.

Evgeny (2011-01-09 13:32:13 -0500)

Nothing in askbot.log. The access log does not show that it follows any particular action. I have tested logging in, logging out, and asking questions. I have now set DEBUG = True on the production process (because I have not reproduced the error on the Django test server) and will either post the traceback here or send it to you (assuming I get it!). Thanks for the prompt reply.

graeme (2011-01-09 13:53:23 -0500)

The fastcgi process appears to crash, as I had one fewer process than expected following the last 500. It is possible that it is being terminated for excess memory use or similar. It does not appear from the lighttpd logs that the error is coming from Python: lighttpd cannot get a response from the fastcgi process.

graeme (2011-01-09 13:54:45 -0500)

I am checking with the host about the memory limits and what information they have.

graeme (2011-01-09 21:30:42 -0500)


1 Answer


I went through the code evolution between the versions and found that the memory usage increased when I added settings for the badge awards and the code to award badges. The increase was about 5-10%.

What is the per-process memory limit on your host? Also, is your server 32-bit or 64-bit? Is it possible to increase the memory limit? Maybe your hosting provider could help with the diagnosis?

One thing that you might try independently: configure the forum (and the other Django sites that you are running on the same host) to use memcached. That might reduce the memory load on your server.
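A minimal sketch of what the memcached suggestion might look like in settings.py. It assumes a memcached daemon is already listening on 127.0.0.1:11211 (an assumed address, not something stated in this thread) and that a Python memcached client library is installed. Django releases up to 1.2 take a single `CACHE_BACKEND` URI; from Django 1.3 on, the `CACHES` dictionary replaces it, so check which form the installed version expects.

```python
# settings.py (fragment)
# Assumption: memcached runs on the same machine on its default port, 11211.
# This is the single-URI form used by Django 1.2 and earlier.
CACHE_BACKEND = 'memcached://127.0.0.1:11211/'
```

Whether this lowers the per-process footprint enough to matter would have to be measured on the host.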

I will have a special look into reducing the memory requirement for the application.

Edit: I updated the code, and memory usage dropped by about 7% in version 0.6.57 (by the way, the site version is printed in the footer), along with some other fixes. There have been some changes to the CSS and templates, so you may want to give it a test run first and adjust the CSS if needed.

13.2k

Evgeny

updated 2011-01-09 22:25:14 -0500, answered 2011-01-09 14:29:02 -0500


Comments

I have a 100 MB total memory limit. I doubt a 5-10% increase will cause a problem. I will contact my host and ask if they can see a problem. It's 64-bit Linux.

graeme (2011-01-09 21:21:06 -0500)

Well, you'll need to add up the numbers to see whether you are really hitting the memory limit. I have managed to reduce the memory usage to almost the previous level and will publish the code soon. I've retagged the post to reflect that there is actually no Python exception in this case.

Evgeny (2011-01-09 21:29:30 -0500)
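The adding-up can be sketched in a few lines of Python. The per-process figures below are hypothetical placeholders in the ranges mentioned in the question (an Askbot process near 26 MB, other Django processes between 11 and 15 MB). The process names and the number of processes are assumptions, and real values should be read from `ps` or from `/proc/<pid>/status` on the server. Summing RSS counts shared pages more than once, so the total is an upper estimate, and how the host measures its 100 MB limit is not stated in this thread.

```python
import re

LIMIT_MB = 100  # total memory limit quoted in this thread


def rss_mb_from_status(status_text):
    """Extract VmRSS (reported by Linux in kB) from /proc/<pid>/status text."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1)) / 1024.0


def total_rss_mb(per_process_mb):
    """Sum per-process RSS figures given in MB."""
    return sum(per_process_mb)


# Hypothetical process list: names and values are illustrative only.
processes_mb = {
    "askbot-fcgi": 26.0,
    "other-site-a": 11.0,
    "other-site-b": 15.0,
}

total = total_rss_mb(processes_mb.values())
print("total RSS: %.1f MB, headroom: %.1f MB" % (total, LIMIT_MB - total))
```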

I have set up a cron job to restart the process every 20 minutes until we have a proper fix. It has never failed in less than an hour or so, so this should make it acceptably reliable for the moment.

graeme (2011-01-09 22:51:51 -0500)

OK, I will try to reduce memory usage further, but it may take significant effort, because memory is eaten in many small installments. Also, the memory requirement will most likely grow over time as the code grows. Have you tried uwsgi (http://projects.unbit.it/uwsgi/) along with nginx? It may save on memory, and uwsgi is fast. There is also a tool for detecting memory leaks called Dozer (http://pypi.python.org/pypi/Dozer); it shows graphs of the memory usage per module.

Evgeny (2011-01-09 23:31:54 -0500)

By the way, if you are using cron to restart the process, you might also add a couple of HTTP hits with wget (maybe one on the front page and one on a question page) to force the process to compile the code. Otherwise users will experience slow response times, especially if the rate of visits is not very high.

Evgeny (2011-01-09 23:37:12 -0500)
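The warm-up hits could also be made from a small standalone script run by the same cron job right after the restart. This is a sketch only: the two URLs are hypothetical placeholders for the front page and a question page, and it uses `urllib.request` from the Python 3 standard library rather than wget.

```python
import urllib.error
import urllib.request

# Hypothetical URLs: replace with the real front page and a real question page.
WARMUP_URLS = [
    "http://example.com/",
    "http://example.com/questions/1/",
]


def warm_up(urls, fetch=None, timeout=30):
    """Request each URL once; return {url: HTTP status code or error string}."""
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.getcode()
    results = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except (urllib.error.URLError, OSError) as exc:
            # A failed hit is recorded rather than raised, so one bad URL
            # does not stop the remaining pages from being warmed up.
            results[url] = "failed: %s" % exc
    return results
```

Calling `warm_up(WARMUP_URLS)` once after each restart and logging the returned dictionary would also leave a record of whether the freshly started process answered at all.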
