Monday, April 10, 2006

Database Maintenance Strategy for Fifth World Scan

I completed my fourth scan of the world and then ran a more current update on the caches between home and northern California and Nevada for my upcoming trip. In preparing to start my fifth scan of the world, I thought I'd try to improve my pocket query strategy.

In the fourth scan, I emphasized getting information about archived caches, which I had largely ignored in the third scan. To do that, I had to have queries covering all dates from January 1, 2000 to as close to the current date as possible for each of the 25 partitions of the world listed in the "Pocket Query World Scan" entry. At the current size of the database, that took just over 3 months. I have over 250,000 caches with over 4 million logs.

I wanted to catch up on identifying the archived caches and manually download the final gpx file for each of them. However, that meant that I didn't keep current on new caches: I picked up new caches only in the placed-date ranges at the end of each partition. I was three months behind on the oldest partition and a month and a half behind on the median partition. At any point in time, the only partition on which I was nearly current was the one I had just finished.

There are five classes of caches that need to be considered: new caches, caches that have been received in a pocket query and have not yet been archived, caches that have been archived since the last time they were received in a pocket query, older archived caches, and caches that were archived before they could be received in any pocket query. For this discussion, I choose to ignore caches that were once active but have returned to the not-yet-approved state. A few of these appear in my database as not archived, yet they have a very stale 'Last GPX date'. I get an error message when I try to access them for an individual gpx file update.

In order to make sure that my database has a record of as many caches as possible, I need to run pocket queries that capture them as they are approved. New caches that have not yet been approved are not available to pocket queries and only become visible when they are approved. Usually I have to wait about a week after caches are placed before I can count on most of them being approved. At the current time roughly 350 new caches are placed every day and reliably show up in pocket queries within a week. If I ran a pocket query every day for all of the caches placed one week earlier, I would have to ask for just one placed date, because a query that matches more than 500 caches returns only 500 of them, chosen at random. If I ran it every other day and asked for two placed dates, I would expect 700 new caches and miss about 200 of them.
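The arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration: the function name is made up, and the 350-per-day and 500-per-query figures are the rough numbers quoted above.

```python
QUERY_CAP = 500        # most caches a single pocket query returns (figure from this post)
CACHES_PER_DAY = 350   # rough number of new caches placed per day (figure from this post)

def expected_and_missed(placed_days, caches_per_day=CACHES_PER_DAY, cap=QUERY_CAP):
    """Return (expected, missed) for one query covering `placed_days` placed dates."""
    expected = placed_days * caches_per_day
    missed = max(0, expected - cap)  # anything beyond the cap is not returned
    return expected, missed

print(expected_and_missed(1))  # one placed date per query
print(expected_and_missed(2))  # two placed dates per query
```

With one placed date the query stays under the cap. With two, the expected 700 caches exceed it by 200.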

By partitioning the world into three equal-sized zones, I would expect just over 100 caches in each zone each day. By experimentation, I determined that I could select multiple states/provinces to cover all the caches in three groups of roughly equal size. The 'None' value covers caches that are in countries that are not included elsewhere in the state list. The groups are the states that begin with A-M, N, and O-Z. The group with all of the N's includes the value 'None'.
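As a rough sketch, the grouping rule is just a test on the first letter of the state or province name. The function name and the sample names are mine, picked only for illustration; the real selection is made by hand in the pocket query form.

```python
def zone_for(state):
    """Assign a state/province name to one of the three zones by its first letter."""
    first = state.strip()[0].upper()
    if first < "N":
        return "A-M"
    if first == "N":
        return "N"   # the special value 'None' also lands here
    return "O-Z"

for name in ["California", "Nevada", "None", "Oregon"]:
    print(name, "->", zone_for(name))
```

Because 'None' starts with N, it needs no special case to fall in the middle group.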

That would give me the flexibility to ask for 3 or 4 days' worth of caches for a group and have a query nearer the 500 maximum. I create the query, run a preview to test how many caches will be returned, and mark it to run once and then be deleted. I select the States/Provinces for the group and adjust the range of 'placed during' days to get as close to 500 caches as possible. If more than 500 caches were placed and approved in a zone for a single day, I'd have to temporarily split the zone. I'll add a comment to this article after I've started to let you know how well that strategy is working. This strategy will use about one of my five daily queries for the first class of caches, leaving four remaining queries.
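The adjust-and-preview step amounts to adding placed dates until the next one would push the total past 500. Here is a minimal sketch of that idea. The daily counts are invented for the example, and in practice the numbers come from the preview page rather than from a list.

```python
def days_that_fit(daily_counts, cap=500):
    """Return how many leading days of `daily_counts` fit within `cap` caches."""
    total = 0
    for n, count in enumerate(daily_counts):
        if total + count > cap:
            return n   # adding this day would exceed the cap
        total += count
    return len(daily_counts)

# Made-up daily counts for one zone, oldest placed date first.
print(days_that_fit([117, 120, 110, 125, 130]))
# A single day over the cap fits zero days, which signals that the zone must be split.
print(days_that_fit([600]))
```

At a little over 100 caches per zone per day, about four days fit in one query. Three zones at one query every four days works out to roughly three quarters of a query per day, in line with the estimate of about one daily query.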

The second type of pocket query will be like those I used on the fourth world scan with its 25 partitions, only run at a slower rate to leave pocket queries for other uses. This allows me to update caches that have not been archived since the last time they were received in a pocket query, and to infer which caches have been archived so that I can download the final individual gpx files for them. I keep the archived caches in my database with a 'Last GPX Date' older than that of similar active caches. I don't need to worry about updating caches that I know have been archived because if any were ever reactivated, I would receive them the next time that their cache range is scanned. About a hundred caches are newly archived every day. The scanning process will also catch any caches that are missed in the first strategy for new caches. I can run these queries at a slower pace as long as the scan covers caches faster than new ones are added. I plan to use some of the extra pocket queries to prepare for actual cache outings.
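The inference about archived caches can be sketched as a date comparison: after a partition has been rescanned, any cache in that partition that is still marked active but whose 'Last GPX Date' is older than the scan was presumably archived in the meantime. The record layout, the function name, and the waypoint codes below are all hypothetical, and the sketch assumes every record passed in belongs to the partition that was just scanned.

```python
from datetime import date

def likely_archived(db, scan_date):
    """Return codes of active caches that the scan on `scan_date` did not refresh.

    `db` maps a waypoint code to a dict with 'last_gpx' (date) and 'archived' (bool).
    """
    return sorted(
        code for code, rec in db.items()
        if not rec["archived"] and rec["last_gpx"] < scan_date
    )

# Hypothetical records for one partition.
db = {
    "GCAAAA": {"last_gpx": date(2006, 1, 5), "archived": False},   # stale, so suspect
    "GCBBBB": {"last_gpx": date(2006, 4, 9), "archived": False},   # refreshed by the scan
    "GCCCCC": {"last_gpx": date(2005, 11, 2), "archived": True},   # already known archived
}
print(likely_archived(db, date(2006, 4, 9)))
```

Each code in the result is a candidate for downloading the final individual gpx file by hand.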

Caches that were archived before they could be received in any pocket query present a separate problem. They will never appear in a pocket query. To get information about these caches I do a manual scan of all the possible waypoint codes that aren't already in my database. This is a laborious process and I've only done it for a few of the earliest placed caches.

Until next time, this is Nudecacher.