(Apologies, I still don't have another blog. :-))
Just a brief follow-up to my earlier article.
A disclaimer I should have added to my last article: most of my pseudo benchmarks are very subjective and also way too basic. For example, our server setup is pretty comprehensive, but we would have to take everything into account in order to provide a real benchmark. And when I write everything, I mean CPU (cores), RAM, motherboard, HDD and so on. Maybe even the throughput of the network card -- if it's different.
Having said this, here are a couple of follow-ups.
1) require/include(_once) and __autoload, or "Why is __autoload() 'better'?"
A lot of people asked me if __autoload() wasn't slower than a straight include, and of course they are correct.
They are correct because invoking __autoload() is extra overhead before anything else happens, and inside __autoload() there's just an include anyway. When coding a simple application, the developer should be on top of the code and all its dependencies, but sometimes they (or we) are not. ;-)
To illustrate the point above, look at the following example:
Faster:
<?php
include 'Foobar.php';
$foobar = new Foobar();
echo $foobar->hello('World');
?>
Slower:
<?php
function __autoload($class) {
    include "{$class}.php";
}
$foobar = new Foobar();
echo $foobar->hello('World');
?>
Even faster (though not really maintainable):
<?php
class Foobar {
    public function __construct() {}
    public function hello($var) { return $var; }
}
$foobar = new Foobar();
echo $foobar->hello('World');
?>
Alas, it's not always that easy. Applications are often bigger than a single page, and we trade manually keeping track of our dependencies for calls to require/include_once or, better, __autoload(). :-) And because the Zend Framework is full of require_once calls, which (as I explained) I stripped, I chose the less expensive route in this case: __autoload().
The bottom line is that it's all about convenience.
One reason for not being on top of things is that maintaining dependencies inside a framework such as Zend Framework is a nightmare. This is not about its 6 or so MB (mini release). It's also not about 20 MB -- we can all agree on the fact that disk space is cheap!
It's just that even though not all the files are loaded when we use only certain components of the framework, it does not make it easier for the developer to figure out which classes (and essentially files) are used. Not in a very straightforward way at least.
We could use Xdebug to figure out what is loaded and where, but that requires us to walk through everything our application does and to revisit the process whenever we add features. Overall it adds to whatever we are working on already, taking away all the RAD benefits and the reasons why we use a framework to begin with.
An alternative would be to break up the framework into components (as Zend advertises it) and to offer a PEAR-style installation. While this may not be convenient for unzip-and-go type of people, I'd offer it as an alternative installation method for the more knowledgeable user.
2) Zend_Loader ERRATA
The loader in my last blog entry looked like this:
function __autoload($className) {
include_once str_replace('_', '/', $className) . '.php';
}
But since __autoload() is only invoked when the class is not yet present, it makes no sense to include_once, so let's use this instead (thanks for the comments and pointing that out):
function __autoload($className) {
include str_replace('_', '/', $className) . '.php';
}
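For readers on newer PHP versions: __autoload() was deprecated in PHP 7.2 and removed in PHP 8.0, so the same idea would be registered through spl_autoload_register() today. The following is only a sketch under that assumption; classNameToPath() and the $loaderCalls counter are names I made up for illustration, and the class names are hypothetical. It also shows that the loader is only consulted when the class is not yet defined, which is why a plain include is enough.

```php
<?php
// Hypothetical helper: map a PEAR-style class name to a relative file path.
function classNameToPath($className)
{
    return str_replace('_', '/', $className) . '.php';
}

// Counts how often the loader is consulted (for illustration only).
$loaderCalls = 0;

spl_autoload_register(function ($className) use (&$loaderCalls) {
    $loaderCalls++;
    // Look the file up on the include_path; skip silently if it is not there.
    $file = stream_resolve_include_path(classNameToPath($className));
    if ($file !== false) {
        include $file;
    }
});

class_exists('Hypothetical_Missing_Class'); // not defined yet: loader is called
class_exists('stdClass');                   // already defined: loader is skipped
```

After running this, $loaderCalls is 1: only the lookup of the undefined class reached the loader.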
3) Caching database results
In my first article, I talked about how you have to avoid queries and cache whatever is possible. People rave about caches, but all the awesomeness aside, caching also has major drawbacks which are almost always left out or overlooked.
I'm sure many people know Cache_Lite, and have seen one or two examples - it's super nice and simple.
In a nutshell:
- set a TTL
- pull from the cache if it exists
- populate the cache if it doesn't
- Cache_Lite also takes care of deleting the cache once the TTL has run out
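The steps above can be sketched roughly as follows. Note that this is not the Cache_Lite API, just a minimal file-based stand-in to show the pattern; cachedFetch(), the key name and the fake "query" are all assumptions of mine.

```php
<?php
// Hypothetical helper (NOT Cache_Lite): return the cached value for $key,
// or call $producer and store its result if the entry is missing or expired.
function cachedFetch($cacheDir, $key, $ttl, $producer)
{
    $file = $cacheDir . '/' . md5($key) . '.cache';

    // Pull from the cache if the entry exists and is younger than the TTL.
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return unserialize(file_get_contents($file));
    }

    // Otherwise populate the cache. LOCK_EX guards against concurrent writers.
    $value = call_user_func($producer);
    file_put_contents($file, serialize($value), LOCK_EX);

    return $value;
}

$cacheDir = sys_get_temp_dir();
$key      = 'demo_' . uniqid();  // unique key so every run starts cold
$queries  = 0;
$loadUser = function () use (&$queries) {
    $queries++;                   // stands in for an expensive database query
    return array('name' => 'World');
};

$first  = cachedFetch($cacheDir, $key, 60, $loadUser); // miss: runs the "query"
$second = cachedFetch($cacheDir, $key, 60, $loadUser); // hit: served from disk
```

The second call never touches the "database", which is exactly the point, and also exactly the problem discussed next.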
What none of those examples talk about is the issue we run into when a user adds something and the cache is not invalidated by the application. For instance, if I decided to cache our users' data today, it would double or triple our support volume in an instant. Just because our customers don't know or care about the resources consumed by a database query. What they care about is their data, and they want to see it now.
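One common way around this is to have the application drop the cached entry whenever it writes the underlying data, instead of waiting for the TTL. A minimal sketch, assuming an in-memory stand-in for both the database and the cache (the UserStore class and its methods are hypothetical, not part of any library):

```php
<?php
// Hypothetical in-memory stand-in for a database table plus its cache.
class UserStore
{
    private $rows  = array();
    private $cache = array();
    public  $reads = 0;        // counts simulated database reads

    public function find($id)
    {
        if (!array_key_exists($id, $this->cache)) {
            $this->reads++;    // cache miss: go to the "database"
            $this->cache[$id] = isset($this->rows[$id]) ? $this->rows[$id] : null;
        }
        return $this->cache[$id];
    }

    public function save($id, array $data)
    {
        $this->rows[$id] = $data;
        // Invalidate on write, so the next read sees the new data
        // instead of a stale copy.
        unset($this->cache[$id]);
    }
}

$store = new UserStore();
$store->save(1, array('email' => 'old@example.org'));
$before = $store->find(1);
$store->save(1, array('email' => 'new@example.org'));
$after = $store->find(1);
```

Without the unset() in save(), $after would still hold the old address until the entry expired, which is the kind of thing that ends up in the support queue.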
In this case the web2.0 is not too helpful either. Applications are online, available at all times, instant, very responsive and the data is live. ;-)
So far so good -- no! No?
Well, we need to keep in mind that a file-based cache might not be exactly suitable for a high-traffic application anyway ("How slow is the disk?"). So we may need to set up a RAM disk (if we have plenty of RAM) and put the cache on there, or look for a maybe more robust solution such as memcached.
To back up my claim and in an effort to normalize my blog post, head over to Tillate (a pretty awesome name, if I may add). They just posted a blog entry going into detail on the obstacles you run into when you cache content and why you should consider outsourcing some of your caching to the client side.
Last but not least -- two things.
- To reiterate my disclaimer, make sure to actually put a meter on things so you can measure an improvement and not guess ("I think it's faster."), because that is the worst. For an example of what I mean, please check out Sander van de Graaf's blog.
- We need to keep in mind that the added complexity needs to be taken into account -- during development (for example replicating the setup for development and staging) and also for maintenance (more services, more sorrow).
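As for the first point, "putting a meter on things" can start as small as the sketch below. The measure() helper is a hypothetical name of mine, and the two candidates are arbitrary; the numbers it prints are only meaningful relative to each other on the same machine, which ties back to the disclaimer at the top.

```php
<?php
// Hypothetical helper: run $callback $iterations times and return the
// elapsed wall-clock time in seconds.
function measure($callback, $iterations = 1000)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func($callback);
    }
    return microtime(true) - $start;
}

// Two ways of turning a class name into a path; measure instead of guessing.
$withStrReplace = measure(function () {
    str_replace('_', '/', 'Zend_Db_Table');
});
$withStrtr = measure(function () {
    strtr('Zend_Db_Table', '_', '/');
});

printf("str_replace: %.6fs, strtr: %.6fs\n", $withStrReplace, $withStrtr);
```

For anything beyond a micro-comparison like this, a profiler or a load test against the real setup is the better meter.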
4) Zend_Db
I'm not trying to pick a fight with Zend_Db, but I can't really avoid talking about it. Aside from general shortcomings in the implementation, my number one issue currently is that the Mysqli driver prepares all queries that are sent to the MySQL server.
Leaving aside all the pseudo security (I believe each value is quoted anyway before prepare is used), prepared statements are slow and also suck because MySQL's query cache currently cannot deal with them. Even though Bill Karwin pointed out that MySQL 5.1.x will eventually solve this problem, we are still stuck in the here and now.
The quick fix would be to shortcut all queries and inject them directly into (I think) exec(), which would avoid the internal prepared statement. But this leads to getting rid of it altogether, which means we throw overboard all the convenience offered through insert(), update(), delete(), fetch*() and so on.
I have a patch that basically adds a fake Zend_Db_Adapter_Mysqli_Statement class (did I get the name right?), which would mock mysqli_stmt, but this is not a solution that would go into the framework. I'm attempting to find a more general solution, but don't hold your breath.
5) Zend Framework
Zend Framework is both amazing and frightening at the same time. With all the feedback I should add that it's semi-open-source (open source backed by a company), so you can always report bugs, (sign a CLA to) supply patches and maybe even get access to commit code to the official repository.
With (of course) no offense meant, there are a few things that are severely lacking at the moment:
- Good coding guidelines vs. overengineering - I wonder who's gonna optimize all the weirdness for 2.0 that is apparently required by CS/design. While engineering in general is a good thing (TM), various people have already pointed out that some of the components (or helpers, validators, etc.) tend to be overengineered - for example, when you load a new class (and essentially file) to apply strtolower() on a variable.
- Code review - I feel like sometimes components are promoted to trunk before enough people have looked at them. For a great example of peer code review, let me pimp PEAR again, because while we are sort of aggressive about our coding style (and also tend to drive people crazy or at least provoke flamewars), all the feedback you get during the proposal phase is worth so much. You don't learn all that in school, open source gives it away for free.
- Even though the Zend Framework is advertised as a glue framework (you just use what you need), many components have hardcoded dependencies - an example is my favorite, Zend_Loader. Especially in the case of Zend_Loader there are two things I'd like to see -- one is allowing the user to override the loader used in any component, two is replacing all the require_once calls with the loader, since by default it tends to be the less expensive operation anyway.
- Maintainers - some components seem to be rather unmaintained and it seems there is little/no communication that is visible for everyone on the outside. Not very open-sourcy. :-P
Also, various people who contributed to the framework initially, have left Zend's payroll and their work is sort of orphaned (so it looks).
There are just not enough people who have time to actively contribute to core components currently. And in a way, despite a company backing the code, the Zend Framework suffers from the same problems any open source project has.
Last but not least -- I don't hate the Zend Framework, or frameworks in general. ;-)
(Unlike some other people, whose blogs I like to read and find very entertaining.)
(Sorry, the original announcement was German-only. But if you are in the area (Berlin, Germany) this week (2008/11/05), feel free to drop by. We can always arrange talks in English or translate. :-) Attending is free!)
In November, the Berlin PHP Usergroup invites you to a talk about CouchDB.
The key facts:
Topic: CouchDB (Jan Lehnardt)
When: 8:30 pm, November 5, 2008
Where: Z-Bar, Bergstr. 2, Berlin-Mitte (Google Maps)
Cost: free
RSVP: Facebook, Qype, mailing list