For quite a while, RSS2.com has been slow. Ever since it hit, oh, around 1.4 million items in its database (now up over 2 million) the queries that were responsible for building the home page view had gotten slower and slower.
The old design
The problem was one of normalization, and the desire to not have data going into two places. The design originally worked this way:
- Look up the latest posts. Find all the latest ones that have a distinct
feed_id
. This was running against thefeed_data
table, which now has over 2 million rows. - Next, look up the feed information for each
feed_id
found. Originally, I couldn’t figure out a way to do this all in one query. So this was another 10 individual SQL queries, looking up information about the feed needed to display on the home page (like the name of the feed, URL, etc.) This query did a join of thefeed_data
with the feed_info tables. This is what took progressively longer and longer, the more rows in thefeed_data
table. - Next, combine the results of finding the 10 most recent posts from the 10 most recently updated feeds into an array using PHP, then pass those results from the Data Access Object (DAO) to the presentation layer for output on the home page.
This ended up being really, really slow on the homepage load (like over 2 minutes). This was obviously too slow. I implemented caching at a number of layers to mitigate the problem, which made for a very speedy second page load, but the first one was still way too slow.
After consulting with my friend Ben, he agreed that it was an inherently expensive query to run. We agreed that it needed to be re-architected to be able to significantly improve performance.
The new design
The new design required creating a new table dedicated to the home page posts where, when a RSS feed is read, it inserts the most recent new item into the new table.
The new process works like this:
- RSS feed is read. Any new items are first inserted into the main
feed_data
table, and thefeed_info
table is updated as well with a last checked timestamp. - Next, the new table is updated, using a MySQL
REPLACE
command, with the combined feed information as well as item information - The complicated 11 queries that used to be run are now replaced by a dead-simple single query that will only ever have to search through the total number of feeds in the system (currently 753 rows), instead of the over 2 million rows before.
These new changes have made using RSS2.com much more usable again, and it is way faster. The homepage loads in under a 1/2 second now, just like it should.