Devoured By Lions

the eternal struggle to tame complexity

Initial Survey of the Ruby Landscape

The last week or so I have been trying to learn a bit about Ruby and Ruby on Rails, and to evaluate both in light of an existing project for which I have so far been using Java technologies: in particular, rich web frameworks (~ GWT/ITMill) and ORM (JPA + Derby). Here’s what I have surmised so far:

* Ruby is a great, dynamic, and fun (!) language. However, the state of release management and the fracturing into incompatible implementations and platforms are a little concerning.
* Ruby on Rails does a few important things that make it really fast to develop with: 1) for a certain class of application, it makes all the right major decisions for you so you have less mental noise to deal with - the stack is fully integrated 2) provides scripts to automate development 3) the philosophy is to exploit the utility of the Ruby language for all aspects of the framework, so again, there is less mental baggage to deal with (no XML schemas to remember, no extra config file syntaxes). It actually seems quite a bit like Django.
* RoR is plainly targeted towards mass public web sites. Scaling RoR typically ends up involving a bunch of different technologies (somewhat diluting the uniformity effect above). So far I haven’t found support for developing, for example, portable downloadable applications in RoR. But…
* One of the best things about Ruby is JRuby. Apparently it is the leading implementation now, which is great, because it connects Ruby to the wealth of existing Java-based technologies. This also potentially provides the one-click download/run that is possible with Java. There are tools that allow packaging up (j)RoR applications as Java applications. So JRuby + RoR + Derby + Warbler == instantly portable RoR bundle.

I look forward to trying RoR and packaging the result as a Derby-based Java application to see if the claimed productivity benefits (a developer from Competitious claims, in all seriousness, “100x”) can translate over to the Java platform.

In addition, I’d like to try out Heroku. It would be cool to develop both for JRoR, and Heroku. Unfortunately I couldn’t install Heroku via JRuby, as it appears to require some native integration (so I guess one just needs to use native Ruby when working with Heroku).

On a related note, Mats has said that re-using an existing VM does not save as much work as one would think, but still, it appears to me there are great benefits to doing that (in addition to JRuby there is IronRuby on CLR). Granted the JVM does not (yet) have good dynamic language support, but once you get to a single VM there are so many more possibilities and a more efficient focus of resources (e.g. on optimizing or porting the VM). Really, how many client-side database drivers do folks need to be rewriting over and over again? Let’s face it, most programming languages are just a slightly different permutation of features which push the same bits around.

Drupal and Gzip Compression

In the process of trying to get the Drupal Image-module-based gallery working I discovered my PHP GD did not support jpeg images. I upgraded to PHP 5.2.9 in the process, enabling jpeg support as well as gzip/zlib. For the two existing Drupal sites I had up, and for which I had enabled caching and compression under the Performance settings, all of a sudden Firefox started reporting a content encoding error and would not render content. I suspected the gzip support I had just compiled in, and at first I thought this might be due to my reverse proxy configuration (by the way, Drupal 6.10 installation worked through the proxy just fine; I didn’t even realize this until afterwards). However, I eventually determined (after reverting and reading the Performance settings details more closely) that Drupal applies the compression to the cached content. After I cleared the cache, the sites started working again. Some testing with wget revealed what was clearly happening: Drupal was claiming the cached content was gzipped when it really wasn’t, because at the time it was cached, gzip support wasn’t available (even though the option was enabled in the Drupal settings).
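The wget check described above can be approximated in a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes the raw (undecoded) response headers and body have already been fetched by whatever means. It compares the Content-Encoding header against the two-byte magic number that every gzip stream starts with.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_gzipped(body):
    """Return True if the raw body starts with the gzip magic number."""
    return body[:2] == GZIP_MAGIC


def encoding_mismatch(headers, body):
    """Return True if the headers claim gzip but the body is not gzip data.

    `headers` is a plain dict of response headers; `body` is the raw,
    undecoded response body as bytes.
    """
    claimed = ""
    for name, value in headers.items():
        if name.lower() == "content-encoding":
            claimed = value.strip().lower()
    return claimed == "gzip" and not looks_gzipped(body)


if __name__ == "__main__":
    stale_page = b"<html>cached before zlib was compiled in</html>"
    good_page = gzip.compress(b"<html>properly compressed</html>")
    headers = {"Content-Encoding": "gzip"}
    print(encoding_mismatch(headers, stale_page))  # plain HTML labelled as gzip
    print(encoding_mismatch(headers, good_page))   # genuinely gzipped
```

A mismatch on a page served from the cache would point at the stale-cache situation described above rather than at the proxy.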

So galleries are working now. For some reason I had to set Thickbox to use the ‘Original’ image and not the ‘Preview’ which is the default. I don’t know what the difference is - ‘Original’ sounds like what one would want anyway.

Drupal Won’t Install Behind a Reverse Proxy

After having tried repeatedly to install Drupal 6.9 against Apache + mod_php + postgresql (assuming that I should get with the program and run it on its native platform instead of Quercus/Tomcat), then falling back to 6.8, then losing all hope when even 6.2 still hung on “Initializing.” despite all my efforts (and countless rebuilds of PHP), I finally decided I should probably give it one last try directly served by Apache as opposed to in a reverse proxy configuration.

And the install worked off the bat.

So if you are struggling to install Drupal and are about to give up in disgust, make sure you are not in a reverse proxy configuration. Why this doesn’t actually work I have no idea. I don’t see reverse proxy options in the administrative settings, so I’m not sure how/whether I should configure them even though I see them in the config file.

Quercus and Drupal 6.9 Regression

A regression must have been introduced in Drupal 6.9 because I went to give it a spin tonight on Tomcat 6 (just placing the Quercus libs in Tomcat WEB-INF/lib and configuring the database connection in the META-INF/context.xml), and the install failed with a warning about UTF/UNICODE encoding, weird IO errors, and printing “SHOW SERVER_VERSION”, finally warning that the version of PostgreSQL was not supported (even though it is 8+). There appear to have been some changes in how the PostgreSQL version is detected, but even after commenting that check right out of the PostgreSQL database support file, the install fails.

I downloaded a fresh Drupal 6.2 and that did install (although it also gives a customary error at the end of installation…but it does create and initialize the tables in the database).

So I guess I’ll chalk it up to a difference between the versions. Hopefully it will be fixed (I may have to do more research) because Drupal 6.2 displays a red error indicating that it needs to be upgraded because of a security risk.

Update: it looks like someone else has run into this

Apache Configuration

One frustrating thing about Apache is how unamenable its configuration system is to parameterization, generalization, and reuse. To start with, even in the simplest Apache configuration one typically wants to parameterize the configuration file that Apache uses, and the stock apachectl has meagre support for parameterization (although it does source the envvars file; this mechanism is still hard to use to support multiple separate configurations). But beyond the wrapper script (which I think most people just copy, modify or rewrite entirely anyway), there is no consistent variable/parameter expansion support in the configuration system. Sure, specific modules and directives support some form of parameterization, but they are usually bespoke, awkward and incompatible with each other (the prime culprit being mod_rewrite, which is its own ad hoc language). Here is a short list of some of the inconsistent parameterization capabilities:

* “environment” variables
* Defines
* mod_rewrite RewriteRule and RewriteCond backreferences
* Parameter expansion in mod_vhost_alias

I am currently trying to design and implement an architecture that consists of a frontend Apache instance that proxies to backend user Apache instances (replacing my current mess, which is a single Apache instance where I tried in vain to apply the various inconsistent parameterization approaches above to allow multiple users to share it; ultimately I decided true isolation is really the best and simplest approach). This is almost complete; however, my last stumbling block is how to compose the backend user Apache config from a large stock configuration plus a minimal (would be unnecessary if the main config could be properly parameterized!) user config that specifies User/Group and Listen port.

First I thought I could have the master config include every user config, but wrap each user config with a define that ensures only the proper config is used. E.g.:

httpd -f backend.conf -DUSER_FOO

master.conf:
    Include users/*.inc

each user config under users/:
    <IfDefine USER_FOO>
    ...
    </IfDefine>

This actually works, although I realized one drawback is that each user’s config must be readable by all other users. Apache will not start up if one of the includes is unreadable. This also renders the conditional configuration moot.

My second attempt was to have the main Apache config include a generic user config at a fixed location:

Include user.inc

A setup script ensures that the correct user config is linked in the location that their Apache instance will ultimately find. Read access to each user’s config can be restricted. However, this requires yet another manual (well, it’s automated, but it’s still superfluous) configuration step.
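For illustration, the linking step of such a setup script might look roughly like the following. This is a sketch, not the actual script: the function name and the directory layout are hypothetical, and it assumes the relative Include path is resolved against each instance’s ServerRoot, so the link is placed there.

```python
import os


def link_user_config(server_root, user_config, link_name="user.inc"):
    """Point <server_root>/<link_name> at the given user's config file.

    An existing symlink is replaced, so the script can be re-run safely.
    If a regular file is already in the way, os.symlink raises an error
    instead of silently overwriting it.
    """
    link_path = os.path.join(server_root, link_name)
    if os.path.islink(link_path):
        os.remove(link_path)
    os.symlink(os.path.abspath(user_config), link_path)
    return link_path
```

Because only the link lives in the server root, the real user config can stay in a directory readable by that user alone.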

My last attempt was to use Apache’s -f flag to specify two configuration files on the command line.

httpd -f backend.conf -f user_foo.conf

This theoretically offers the benefit that each user’s Apache server root (each user must have their own ServerRoot because, yet again, Apache configuration is not parameterizable enough to use a shared root) can be identical. This superficially appears to work…however I discovered that it doesn’t really work because the two config files are not actually composed in the way I would expect (that is, both parsed under the same context such that the second and subsequent config files observe configuration set by preceding files). In reality what happens is that each file is parsed independently and instead of previously parsed configuration being observed, the built-in Apache defaults are used. That means, for example, if you set the DocumentRoot in one config file, but omit it in the next…it doesn’t actually get used. The omission in the second file causes the default to override the previously parsed value!

So ultimately I’m going to have to revert to strategy #2.

If the Apache config just supported a consistent parameter expansion syntax throughout (e.g. like a pre-processor, so that support doesn’t have to be manually implemented in every single directive) I suspect that most of the configuration could be boiled down to a single shared config file. (Yes I realize that pre-processor parameter expansion is not alone sufficient since things need to be parameterized at runtime as well; but that is not the type of parameterization needed for the above scheme). Other Apache products (Ant, Maven, Tomcat, etc.; all Java based…) seem to support this type of parameterization.
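Lacking that, the pre-processor idea can be approximated outside Apache. The sketch below uses Python’s string.Template to expand one shared template into a per-user config before Apache ever reads it. The template contents, the function name and the example paths are all hypothetical; a real stock configuration would be far larger.

```python
from string import Template

# Hypothetical shared template. The $-placeholders are expanded by this
# script, so Apache only ever sees the rendered result.
BACKEND_TEMPLATE = Template("""\
ServerRoot ${server_root}
User ${user}
Group ${group}
Listen ${port}
DocumentRoot ${server_root}/htdocs
""")


def render_backend_config(user, group, port, server_root):
    """Expand the shared template for one user.

    substitute() raises KeyError if a placeholder is left without a value,
    which catches typos in the template early.
    """
    return BACKEND_TEMPLATE.substitute(
        user=user, group=group, port=port, server_root=server_root
    )


if __name__ == "__main__":
    print(render_backend_config("foo", "foo", 8001, "/srv/apache/foo"))
```

The rendered text would be written to a file and passed to httpd with a single -f, which sidesteps the multiple-config-file problem entirely at the cost of a generation step.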

I remember seeing a Google Summer of Code proposal for redesigning the Apache config so maybe there is some hope (although I can no longer find any reference to it).

Python 2.4 and Cookies

Every time I start liking Python…there has to be a downer.


451         raise CookieError("Attempt to set a reserved key: %s" % key)
452     if "" != translate(key, idmap, LegalChars):
453         raise CookieError("Illegal key value: %s" % key)
454
455     # It's a good key, so save it.
global CookieError = , key = 'ABC:123.123.123.123:XYZ'
CookieError: Illegal key value: ABC:123.123.123.123:XYZ
args = ('Illegal key value: ABC:123.123.123.123:XYZ',)


Really? Apparently Python’s (at least the cruddy version I’m stuck with in RHEL 5) Cookie implementation is rather strict in the headers it will accept and will just blow up if there are characters that are not in the RFC.

http://markmail.org/message/bsabuonsslb4ybiz

While I generally would agree that an implementation should conform to the spec and indicate when the input is non-standard, I think just blowing up is rather harsh. There have to be garbage cookies floating around out there on the web. Unfortunately this one is related to the single sign on system in front of my app. So there are two fails here.

So I guess I’m left to figure out a way to kill this cookie before Python sees it…most likely I’ll have to implement this as a mysterious hack in the config of the Apache instance that frontends it. Unless I can sneak it into some WSGI layer. But I’ll probably just hack it because my life and patience are finite.
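For the record, the WSGI variant could be sketched along these lines: a small middleware that rewrites HTTP_COOKIE before the wrapped application (and thus the Cookie parser) ever sees it. The class and function names are made up, the set of allowed name characters is an assumption (a conservative token set that excludes ':'), and the code is written for a current Python rather than the 2.4 on RHEL 5. It also splits naively on ';', so a quoted cookie value containing a semicolon would be mangled.

```python
import string

# Characters allowed in a cookie name here. This is an assumption: a
# conservative set of token characters that notably excludes ':'.
SAFE_NAME_CHARS = set(string.ascii_letters + string.digits + "!#$%&'*+-.^_`|~")


def clean_cookie_header(header):
    """Drop every name=value pair whose name has characters outside the safe set."""
    kept = []
    for pair in header.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        name = pair.split("=", 1)[0].strip()
        if name and set(name) <= SAFE_NAME_CHARS:
            kept.append(pair)
    return "; ".join(kept)


class CookieFilter(object):
    """WSGI middleware that sanitizes HTTP_COOKIE before the wrapped app runs."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if "HTTP_COOKIE" in environ:
            environ["HTTP_COOKIE"] = clean_cookie_header(environ["HTTP_COOKIE"])
        return self.app(environ, start_response)
```

Wrapping is a one-liner, e.g. application = CookieFilter(application), wherever the WSGI entry point is defined. Filtering by character set rather than by exact name matters here because the offending cookie name embeds an IP address and so is not constant.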