Search against a large, rapidly changing data set?
2 Nov 2009 - 5:16pm
I'm going to guess I'm delving into sufficiently esoteric areas that
nobody will have an answer, but we are smarter than me, so here goes:
I'm trying to improve one of our key search interfaces. The use cases
involve people making searches against a large (hundreds of thousands
of records) data set. To make matters more complicated, the data set
changes very rapidly, to the point where any set of search results we
can return may well be inaccurate or incomplete by the time it's displayed.
Right now we allow unbounded searches, but truncate the result set at
an arbitrary size. Result sets are timestamped so that people know
the data were accurate as of the timestamp. My intuition and informal
user research tell me that people don't really want these large data
sets. They want more focused results.
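For concreteness, here is a minimal sketch of the behaviour described above: an unbounded search whose result set is cut off at an arbitrary size and stamped with the time it was taken. Everything named here (`ResultSet`, `search`, `MAX_RESULTS`, the cutoff of 500, records as plain dicts) is a hypothetical stand-in for illustration, not our actual system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Iterable, List

MAX_RESULTS = 500  # hypothetical cutoff; the real limit is arbitrary


@dataclass
class ResultSet:
    records: List[dict]
    as_of: datetime   # the data were accurate as of this moment
    truncated: bool   # True if more matches existed than we returned


def search(records: Iterable[dict],
           predicate: Callable[[dict], bool],
           limit: int = MAX_RESULTS) -> ResultSet:
    """Run an unbounded search, keep at most `limit` matches, timestamp it."""
    as_of = datetime.now(timezone.utc)
    matches: List[dict] = []
    truncated = False
    for record in records:
        if predicate(record):
            if len(matches) == limit:
                # One more match exists beyond the cutoff: stop and flag it.
                truncated = True
                break
            matches.append(record)
    return ResultSet(matches, as_of, truncated)
```

The `truncated` flag is an assumption on my part: it is one way to tell people the set is incomplete, alongside the timestamp that tells them it may already be out of date.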
The typical interaction patterns I know for accomplishing this are
search-within-search and faceted search. However, both patterns are
confounded by the rapid pace of data change:
- Search-within-search would either produce increasingly inaccurate
results, if each refinement ran against the outdated first result set,
or confuse people, if we re-ran the search against the updated data:
the second result wouldn't be a true subset of the first result, but
rather an updated subset.
- Faceted search interfaces typically give people a size for their
too-large query, and then give actual results once the query
parameters have been narrowed down to the point where the result set
size is "reasonable" (for whatever definition of reasonable fits the
problem domain). In my domain, the rapid change in the data confounds
this process because the sizeof() queries are only accurate at the
time they're performed. We might tell a person that he'd get back 100
records based on the data now, but by the time the query has run he
might get back 1,000 records. Or 10,000. So it's not inherently clear
to me that faceted search would help either.
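Both failure modes can be shown in a few lines. This is only a sketch under stated assumptions: `LiveStore` is a hypothetical in-memory stand-in for the changing data set, and `refine_snapshot` / `refine_live` are names I made up for the two ways of doing search-within-search.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]
Predicate = Callable[[Record], bool]


class LiveStore:
    """Hypothetical stand-in for the rapidly changing data set."""

    def __init__(self, records: Iterable[Record]) -> None:
        self._records = {r["id"]: dict(r) for r in records}

    def upsert(self, record: Record) -> None:
        # Simulates the data changing underneath a search session.
        self._records[record["id"]] = dict(record)

    def query(self, predicate: Predicate) -> List[Record]:
        return [dict(r) for r in self._records.values() if predicate(r)]

    def count(self, predicate: Predicate) -> int:
        # The "size" a faceted interface would show before fetching results.
        return sum(1 for r in self._records.values() if predicate(r))


def refine_snapshot(previous: List[Record], predicate: Predicate) -> List[Record]:
    """Search within the earlier result set: a true subset, but possibly stale."""
    return [r for r in previous if predicate(r)]


def refine_live(store: LiveStore, first: Predicate, second: Predicate) -> List[Record]:
    """Re-run both criteria against current data: fresh, but not a subset."""
    return store.query(lambda r: first(r) and second(r))
```

If the store changes between the first search and the refinement, `refine_snapshot` misses records that now match, while `refine_live` can return records that were never in the first result. Likewise, a number from `count` taken before the change can differ from the length of what `query` returns afterwards.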
Has anyone tried anything like this, or does anyone have thoughts or
insights to share about this problem?
(*) There's a different problem here of people wanting to monitor the
changes, rather than perform static searches, but that's not what this
song is about.