RSS feed size and Planet
For fun I just set up my own Planet. Reading both Planet Debian and Planet Linux Australia is a bit of a drag due to the overlap, so I wrote a little Perl script to extract the feeds from both those sources and generate a Planet configuration. My Planet is publicly available in case anyone is interested. Also, you will notice that I have a formatting problem on my Planet; if anyone has advice on Planet templates (I'm using the Debian package) then please let me know.
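For anyone curious what the script does, here is a rough sketch of the idea. The original is Perl; this is Python purely for illustration, and the OPML URLs and the INI-style output are assumptions about where each source Planet publishes its feed list and what the Planet package expects.

```python
#!/usr/bin/env python
# Rough sketch of the feed-merging idea.  The OPML locations and the
# INI-style output format are assumptions, not the actual script.
import urllib.request
import xml.etree.ElementTree as ET

SOURCES = [
    "https://planet.debian.org/opml.xml",      # assumed location
    "https://planet.linux.org.au/opml.xml",    # assumed location
]

def feeds_from_opml(url):
    """Yield (title, feed URL) pairs from an OPML subscription list."""
    with urllib.request.urlopen(url) as resp:
        tree = ET.parse(resp)
    for outline in tree.iter("outline"):
        xml_url = outline.get("xmlUrl")
        if xml_url:
            yield outline.get("title") or outline.get("text") or xml_url, xml_url

seen = set()
for source in SOURCES:
    for title, xml_url in feeds_from_opml(source):
        if xml_url in seen:        # drop the overlap between the two Planets
            continue
        seen.add(xml_url)
        # Emit a section in the INI-style config that Planet reads.
        print(f"[{xml_url}]")
        print(f"name = {title}")
        print()
```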
While playing with Planet I noticed that my blog has one of the largest file sizes of all the blogs from Planet Debian and Planet Linux Australia. That would be partly due to writing blog entries of moderate size and trying to maintain an average of one post per day, but I imagine that it would also be partly due to the Blogger configuration.
I changed my main blog page from showing 14 days of posts to 7 days (which took the index.html size from 80K down to 40K). But strangely I can't seem to change the number of days that are kept in the RSS feed file, which remains at about 80K.
It seems to me that the feed mechanism is badly designed in this regard. A more efficient mechanism would be to send a small XML file that describes the individual blog entries, which could then be downloaded IFF they meet the criteria desired by the syndication software (and IFF they have not already been downloaded). If the web server and the syndication program are correctly configured then the requests could be chained on the same TCP connection, so there would be no loss of performance compared to the way things currently work.
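To make the proposal concrete, here is a rough sketch of what a client for such a scheme might look like. The index format (entry IDs, dates and per-entry URLs) is entirely invented, and requests.Session is used because it keeps the TCP connection open across the chained requests.

```python
# Hypothetical client for the proposed scheme: fetch a small index that
# describes the available entries, then download only the entries that are
# new and wanted, reusing one TCP connection for all of the requests.
# The index format and URLs are invented for illustration.
import datetime
import xml.etree.ElementTree as ET
import requests

BLOG = "https://example.org/blog"          # hypothetical blog
already_have = {"post-100", "post-101"}    # entry IDs fetched on earlier runs
cutoff = datetime.date.today() - datetime.timedelta(days=7)

with requests.Session() as session:        # keep-alive: one TCP connection
    index = ET.fromstring(session.get(f"{BLOG}/index.xml").text)
    for entry in index.iter("entry"):
        entry_id = entry.get("id")
        published = datetime.date.fromisoformat(entry.get("published"))
        if entry_id in already_have or published < cutoff:
            continue                       # skip entries we already have or don't want
        body = session.get(f"{BLOG}/entries/{entry_id}.xml").text
        already_have.add(entry_id)
        # ... hand `body` over to the aggregator ...
```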
Also, as many (most?) RSS and Atom feed files are generated from a database it might be difficult to preserve the creation time, and thus I expect that most web caching is also broken. I haven't verified this though.
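For what it's worth, the usual fix on the server side is for the feed generator to derive a Last-Modified value (or an ETag) from the newest entry in the database and answer conditional requests with 304 Not Modified. A minimal sketch of that approach, with an invented schema and WSGI just for brevity:

```python
# Sketch of how a database-backed feed can still support caching: derive
# Last-Modified from the newest entry and honour If-Modified-Since.
# The schema, table name and feed builder are invented for illustration.
import sqlite3
from wsgiref.handlers import format_date_time

def build_feed(db):
    # Placeholder: real code would render the entries as Atom/RSS XML.
    return "<feed xmlns='http://www.w3.org/2005/Atom'></feed>"

def feed_app(environ, start_response):
    db = sqlite3.connect("blog.db")
    # `updated` is assumed to be stored as a Unix timestamp.
    (newest,) = db.execute("SELECT MAX(updated) FROM entries").fetchone()
    last_modified = format_date_time(newest)       # RFC 1123 date string

    # If the client already has this version, send 304 and no body at all.
    if environ.get("HTTP_IF_MODIFIED_SINCE") == last_modified:
        start_response("304 Not Modified", [("Last-Modified", last_modified)])
        return [b""]

    body = build_feed(db).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/atom+xml"),
                              ("Last-Modified", last_modified)])
    return [body]
```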
Also it would be handy if there were a mechanism for syndication programs to request notification of changes, and for a blog server to push content to a syndication server. I have Planet running every 6 hours, and some of the blogs I read are updated once per week, so apparently my Planet server does 28 downloads of the entire XML file (four a day for seven days) for every change. This might not sound so bad, but there are planets which run every hour and would therefore do 168 downloads per week.
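One possible shape for the notification idea is a simple ping endpoint on the Planet side: when a blog publishes, it POSTs its feed URL and the aggregator fetches that one feed immediately instead of polling on a fixed schedule. Everything below (the endpoint, the parameter name, the port) is hypothetical, just to illustrate the idea:

```python
# Hypothetical "ping" receiver for an aggregator: a blog POSTs its feed URL
# when it publishes something new, and the aggregator fetches just that feed
# right away instead of waiting for the next scheduled poll.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

def schedule_fetch(url):
    print("would fetch", url)                  # placeholder for the real fetcher

class PingHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        params = parse_qs(self.rfile.read(length).decode("utf-8"))
        feed_url = params.get("url", [""])[0]
        if feed_url:
            schedule_fetch(feed_url)           # hand off to the aggregator
            self.send_response(202)            # accepted for processing
        else:
            self.send_response(400)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), PingHandler).serve_forever()
```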
Please let me know if you have any ideas of how to alleviate these problems, or if there are already solutions of which I am not aware.
1 comment:
Feedparser (which Planet uses) does support etag and last-modified headers, which should stop it fetching the entire feed each time, provided the server is correctly configured (see the sketch below).
The Atom Publishing Protocol provides a way of giving links to other entries, but I'm not really sure if it's suitable for this case.
mod_speedyfeed is a way of supplying only new entries using RFC 3229 (delta encoding for HTTP), but it requires both client and server support.
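For illustration of the first point, conditional fetching with feedparser looks roughly like this; the etag and modified values would normally be stored between runs, and the feed URL is just an example:

```python
# Conditional fetching with feedparser: pass back the ETag and Last-Modified
# values from the previous run, and a well-behaved server answers 304 with
# no feed body when nothing has changed.
import feedparser

url = "https://example.org/rss.xml"            # example feed URL
previous = feedparser.parse(url)               # first run: full download
etag, modified = previous.get("etag"), previous.get("modified")

later = feedparser.parse(url, etag=etag, modified=modified)
if later.get("status") == 304:
    print("Feed unchanged, nothing downloaded beyond the headers")
else:
    print(len(later.entries), "entries fetched")
```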