Pylons is next on my list of frameworks to look at, because I like the scalability characteristics of WSGI middleware servers.
The appeal is this: you might have 50% of your requests using sessions, 25% using authentication, 10% using some web service, 10% using some other (cached) web service, and the rest using any number of other things that could involve caching, computation, instantiation, database and/or network connections, and so on. You can serve that mix without the mod_perl solution of loading everything into every server process, the PHP way of loading everything from scratch (or from one big cache) on every request, or the Java solution of really big app servers with lots of threads.
Instead of having 100 high-memory and/or high-CPU worker processes (or threads) tying up resources so that each can handle everything (except static content or the database), or moving everything onto the network with web services, you split the work up so that you have just as many processes as you need for each service. That saves 90 percent of the memory for each of your web service handlers, and 50 percent for your session handlers.
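As a rough illustration of that arithmetic, here is a minimal Python sketch. The feature names and the helper functions are hypothetical, and it assumes a feature's memory cost scales with the number of processes that load it, which is a simplification.

```python
import math

# Request mix from the post, as whole percentages (feature names are made up).
REQUEST_MIX = {
    "sessions": 50,
    "authentication": 25,
    "web_service_a": 10,
    "web_service_b_cached": 10,
}

def processes_per_feature(total_workers, mix_percent):
    """Split a worker pool in proportion to each feature's share of requests."""
    return {
        name: math.ceil(total_workers * pct / 100)
        for name, pct in mix_percent.items()
    }

def saving_vs_monolith(total_workers, dedicated):
    """Fraction of feature instances avoided compared with loading the
    feature into every one of total_workers processes."""
    return 1 - dedicated / total_workers

sizes = processes_per_feature(100, REQUEST_MIX)
for name, count in sizes.items():
    saved = saving_vs_monolith(100, count)
    print(f"{name}: {count} processes, {saved:.0%} fewer instances")
```

With 100 workers this gives 10 dedicated processes for each web service instead of 100, which is where the 90 percent figure comes from.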
However, I’ve heard that in practice only a few things need to be offloaded, and that WSGI doesn’t do much better than separating static content, app server, database, and cache, which is probably good enough for most sites. So the overhead of a common gateway interface may not be worth it.
Yet I do like the idea of “feature-specific” caches or servers. Having a pool of memcached instances is really nice, but having separate pools for session caches, page caches, web service caches, business object caches, etc. seems like it would be more resource efficient, and possibly more secure.
Coming from PHP, the idea of offloading things like PDF and graphic processing from the webserver (or appserver) also appeals.
You say ‘and not using the mod_perl solution of loading everything into every server process’. But the same thing can happen with Pylons. The thing you are seeking isn’t really related to the particular Python web application framework you use, but to the underlying hosting mechanism.
Thus, host Pylons on mod_python or mod_wsgi (embedded mode) and you will still have a Pylons application instance in every Apache child process.
Thus you are confusing things. Even if you use Paste server with Pylons in a single process behind mod_proxy, you are still loading everything into the same process. The only difference is that by running it in a separate process, there is only one instance, unlike when it is hosted embedded in the Apache child processes. Of course, this isn’t quite true if you tell Paste server itself to create multiple processes, as then you will have multiple instances again. The same thing applies when you use FastCGI solutions.
One could use mod_proxy and map different URLs through to different Paste server instances running Pylons. That is, map URLs based on the features required, so that all handlers of a specific type run in one process. With Pylons, though, I am not sure whether that would save much, as I am not sure whether it preloads a lot based on configuration. That is, I am not sure how much it uses lazy loading. So, even if you map all URLs needing sessions to a specific process, the other processes, which never handle those URLs, may still have loaded the code, depending on how you configure Pylons, even if they don’t carry all the run time overhead of that feature.
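The lazy-loading question can be sketched in plain WSGI. This is not Pylons code and says nothing about how Pylons itself loads things. It is a minimal illustration, using only the standard library, of a process that builds a feature handler the first time a matching URL arrives. The names `make_feature_app`, `LazyPrefixDispatcher`, `call`, and the URL prefixes are all hypothetical.

```python
from wsgiref.util import setup_testing_defaults

LOADED = []  # records which features this process actually instantiated

def make_feature_app(name):
    """Hypothetical factory standing in for an expensive feature."""
    def factory():
        LOADED.append(name)  # the costly import/setup would happen here
        def app(environ, start_response):
            start_response("200 OK", [("Content-Type", "text/plain")])
            return [f"handled by {name}".encode()]
        return app
    return factory

class LazyPrefixDispatcher:
    """WSGI app that routes by URL prefix and builds handlers on first use."""

    def __init__(self, factories):
        self.factories = factories  # prefix -> factory callable
        self.apps = {}              # prefix -> instantiated WSGI app

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix, factory in self.factories.items():
            if path.startswith(prefix):
                if prefix not in self.apps:
                    self.apps[prefix] = factory()
                return self.apps[prefix](environ, start_response)
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]

dispatcher = LazyPrefixDispatcher({
    "/cart": make_feature_app("sessions"),
    "/reports": make_feature_app("reports"),
})

def call(app, path):
    """Drive a WSGI app once without starting a server."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    status = []
    body = b"".join(app(environ, lambda s, headers: status.append(s)))
    return status[0], body

print(call(dispatcher, "/cart/view"))
print(LOADED)  # only the feature whose URLs reached this process
```

If a front end only ever forwards `/cart` URLs to this process, the reports feature is never built here. An application that sets up every feature at startup would not get that saving, whatever the URL mapping.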
An alternative to mod_proxy is to use mod_wsgi daemon mode, where multiple daemon processes are configured and different URL subsets are again mapped to different processes. It is a similar idea, but perhaps easier to set up and manage than something based on mod_proxy and separate Paste server processes.
Maybe I don’t properly understand what you are after, but at the moment I am not so sure Pylons is going to automatically give you what you want. As I say, this has more to do with how the hosting mechanism is configured to distribute URLs across the processes handling those requests than with the web framework itself.
Yeah, I didn’t mean using mod_python or mod_wsgi specifically. An implementation like that would be similar to mod_perl architecturally unless you proxied various WSGI services (but the same could be done for mod_perl apps, minus the WSGI standard protocol, which I think was the original etoys solution back in 1998).
I’d probably not use the Paste or CherryPy servers except behind Apache anyway. I really like the up-front processing you can do with Apache: rewrites, gzip, static files, or custom handlers. I was reading something recently about the problems of serving static content through WSGI and thought that was really missing the point. On the other hand, Routes is much more appealing than mod_rewrite plus auto_prepend type solutions, so I can feel the pain.
Many people have moved beyond Apache, but I guess I haven’t made that step yet. WSGI seems to be a step beyond FCGI, but I’d still be afraid of implementations. Probably the best way to do it is inside Apache, but then you’re back to where you started. Sorry if I’m rambling.
I guess what I really want is application pool daemons. Memcached is the architectural model I keep coming back to, but I don’t want a cache, I want application pools, that can handle caching, persistence, and access. I want my front server to handle routing and manage authentication.
Sessions would be only a specific case of an application pool. So you might have a pool of shopping carts, which is a specific case of a session. You might also have a pool of PDF report renderers called by report generators which could for instance give users purchase history.
The whole point is that you could then tune your applications based on fine-grained usage. Say you have a pool of 10 front-end servers handling static content and basic requests, and then an adjustable number (set either manually or via min/max pooling) of specific app servers. You might find that 5 session servers (some of which hold shopping carts) are adequate to meet demand, along with 2 report services that each hold 1 or more persistent db connections and 1 PDF rendering server that takes the heavy CPU and memory work off the physical server where all these other individual app servers might live.
When I say “server” I usually mean a single thread or process that can handle 1 blocking request at a time. The idea is to keep the architecture simple.
Maybe that’s too much complexity for the return. Probably. But not having 20 db connections open all the time (10 for shopping carts and 10 to another db for reports) and 10 PDF libs loaded in memory, but instead having 7 db connections (5 for carts, 2 for orders) and 1 PDF library seems like substantial savings, and additional clarity (and hopefully security) in the code. Of course I’m pulling numbers out of thin air, but my gut says they’re close enough approximations to make the case.
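To make the comparison concrete, here is a small Python sketch that tallies both layouts using the numbers above. The `Pool` class and the pool names are hypothetical, and it assumes one db connection per cart or report process and one PDF library per process that renders PDFs.

```python
from dataclasses import dataclass

@dataclass
class Pool:
    name: str
    processes: int
    db_connections_each: int = 0
    pdf_libs_each: int = 0

def totals(pools):
    """Return (open db connections, loaded PDF libraries) across all pools."""
    db = sum(p.processes * p.db_connections_each for p in pools)
    pdf = sum(p.processes * p.pdf_libs_each for p in pools)
    return db, pdf

# Every front-end process holds a cart connection, a report connection,
# and the PDF library.
monolith = [Pool("all-in-one", 10, db_connections_each=2, pdf_libs_each=1)]

# Front ends only route; each feature lives in its own small pool.
split = [
    Pool("front-end", 10),
    Pool("carts", 5, db_connections_each=1),
    Pool("reports", 2, db_connections_each=1),
    Pool("pdf-renderer", 1, pdf_libs_each=1),
]

print(totals(monolith))  # (20, 10)
print(totals(split))     # (7, 1)
```

The split layout runs more processes in total (18 rather than 10), so the saving is in what each process has to hold open, not in the process count.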