Slow And Steady Beats Fast And Furious

By Serdar Yegulalp on 2021-08-31 21:00:00

For some years now, I've been running my blog using a software project I wrote myself in Python, which I call M2 for short. (Actually, I should call it M4 ... ehh, it's a long story.) I've rewritten M2 completely once, mainly to apply everything I've learned about Python and programming since I started. For all the guff Python gets about being a slow language, it's typically more than fast enough for any job that isn't specifically about computation. For jobs that are about computation, there are libraries one can reach for to pave over that gap. In my case, when M2 has been slow, it hasn't been because of Python; it's been because I run it on shared hosting with significant constraints on CPU and I/O.

The single slowest part of M2 is not in the program itself but in one of the templates I wrote for my site. It's the one that lists every single post I've ever made in chronological order, both for a given category and for the site as a whole. Because templates are processed one at a time, the entire site's publishing operation slows to a crawl when it hits one such template. And a big part of the slowness is, again, I/O: a database query that coughs up thousands of records and churns through them one at a time. On my own system, it runs considerably faster, because it's not competing for resources with a dozen other users. But there's still room for improvement.

I could do a few things. For one, I could use this worst-case scenario as a test case for optimizing how M2 builds pages -- for instance, by flagging certain pages as long-running, and having them processed in the background. But in a way, I already do this. When you publish a new post, all the pages associated with the post are pushed into a queue, which is then worked on in the background, independent of any other operations you're doing. If I set up a post to appear on a schedule, M2 handles all that by way of a cron job, so it's not like I'm stuck hanging around waiting for these things to finish before I can do other work anyway.
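
For the curious, here is a minimal sketch of what a publish queue with a background worker can look like in plain Python. This is not M2's actual code: `build_page`, `worker`, and `publish` are hypothetical names, and the "build" step is a stand-in that only returns a string.

```python
import queue
import threading


def build_page(page_id):
    """Hypothetical stand-in for rendering one page from its template."""
    return f"built:{page_id}"


def worker(jobs, results):
    """Pull page IDs off the queue until a None sentinel arrives."""
    while True:
        page_id = jobs.get()
        if page_id is None:
            jobs.task_done()
            break
        results.append(build_page(page_id))
        jobs.task_done()


def publish(page_ids):
    """Queue every page touched by a post and build them off the main thread."""
    jobs = queue.Queue()
    results = []
    thread = threading.Thread(target=worker, args=(jobs, results), daemon=True)
    thread.start()
    for page_id in page_ids:
        jobs.put(page_id)  # returns immediately; the caller is not blocked
    jobs.put(None)  # sentinel: tells the worker there is nothing more to do
    # A real application would carry on with other work here.
    # This sketch waits only so that it can return the results.
    jobs.join()
    thread.join()
    return results
```

With a single worker the pages are built in the order they were queued, so the slow ones hold up only the queue, not whatever else you're doing.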

Option #2 is just to forget about those all-in-one templates, which offer few advantages to the reader anyway, and stick with the yearly indexes. At some point I may do this, and just have the base index for each category be, say, the top 100 entries in that category, with a link at the end to the year index of the last entry.
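
As a rough sketch of that idea, assuming posts are available as simple `(title, date)` pairs: take the newest 100, then point to the yearly index of the last entry shown. The function name `category_index`, the `/YYYY/` link format, and the choice to omit the link when nothing is left over are all assumptions made for illustration.

```python
from datetime import date


def category_index(posts, limit=100):
    """Return the newest `limit` posts plus a link to the year index
    of the last post shown (or None if every post already fits)."""
    newest_first = sorted(posts, key=lambda post: post[1], reverse=True)
    shown = newest_first[:limit]
    more_link = None
    if len(newest_first) > limit:
        last_shown_date = shown[-1][1]
        more_link = f"/{last_shown_date.year}/"  # hypothetical URL scheme
    return shown, more_link
```

The cost of building this page stays flat no matter how many posts pile up in the category, since only the first 100 are ever rendered.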

Option #3, the most useful, would be to create a new kind of template directive that allows a large query to be broken across multiple pages. This would also be the most difficult to implement as I have no native mechanism for doing this abstractly with a query. It would have to work on any kind of query to be truly useful. But if implemented right, it would be a hugely powerful way to handle this issue. The right thing is often difficult because it's the right thing.
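
One way to approach the "any kind of query" requirement, sketched here against SQLite, is to wrap the original query in a subquery and apply `LIMIT` and `OFFSET` to the wrapper. The `paginate` helper is hypothetical and assumes the inner query has its own stable `ORDER BY`; other databases may need a different wrapper, and deep offsets can get slow on big tables.

```python
import sqlite3


def paginate(conn, sql, params=(), page_size=50):
    """Yield successive pages of rows from an arbitrary SELECT statement."""
    wrapped = f"SELECT * FROM ({sql}) LIMIT ? OFFSET ?"
    offset = 0
    while True:
        # The inner query's parameters come first, then LIMIT and OFFSET.
        rows = conn.execute(wrapped, (*params, page_size, offset)).fetchall()
        if not rows:
            break
        yield rows
        offset += page_size
```

A template directive built on something like this could emit one output page per yielded chunk:

```python
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, category TEXT)")
conn.executemany(
    "INSERT INTO posts (id, category) VALUES (?, ?)",
    [(i, "blog") for i in range(1, 6)],
)
query = "SELECT id FROM posts WHERE category = ? ORDER BY id"
for page_number, rows in enumerate(paginate(conn, query, ("blog",), page_size=2), 1):
    print(page_number, rows)
```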

For now, though, the full index build in the background is not the worst option. It happens out of the main flow of the program, so it doesn't interfere with UI operations. I also have the build process set up so that it happens after most everything else has finished processing, so you're not waiting on it before completing other things that are more important. Where this becomes an issue, it's one that has been mitigated well enough that it only gives me aesthetic heebie-jeebies, not functional ones.
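
The ordering itself can be very simple. As a sketch, assuming each queued page carries a hypothetical `long_running` flag, a stable sort on that flag pushes the expensive pages to the back without disturbing the order of everything else:

```python
def build_order(pages):
    """Given (page_id, long_running) pairs, return page IDs with
    long-running pages last and all other ordering preserved."""
    # False sorts before True, and sorted() is stable.
    ordered = sorted(pages, key=lambda page: page[1])
    return [page_id for page_id, _ in ordered]
```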

Tags: Python blogging software