Taming AuditTrail Proliferation
I've spent a bit of time with AuditTrail over the past day, since I first discovered it, and I've been quite pleased with it. However, my app makes a large number of changes, and I was beginning to experience a bit of database bloat because of the growing number of audits.
After a day of usage, one of my models had about 180 revisions, and while each revision itself is small, it was pretty clear that I wasn't going to be able to ignore the situation without causing myself some serious headaches in the relatively near future (of course, being able to only record diffs is a nice advantage for something like django-rcsfield, which would be able to get by with much less space).
Fortunately, depending on how you use revisions, there is a fairly simple solution to this dilemma: throw the excess revisions away. I didn't want to perform extra database lookups every time a new revision was created, so I decided that adding an extension to manage.py would be an adequate solution (which I could periodically activate with a cron job).
So I set up the skeleton for a management command:
    cd my_app
    mkdir management
    cd management
    touch __init__.py
    mkdir commands
    cd commands
    touch __init__.py
    emacs clean_audit_trails.py
At first I intended to go with a very specific set of rules for picking the revisions to keep:
- All revisions in the past hour,
- The first revision older than one hour,
- The first revision older than one day,
- The first revision older than one week,
- The first revision older than one month, and so on...
But then I started actually writing that code, and my enthusiasm for that approach swiftly dwindled. Instead I decided I could accomplish roughly what I wanted much more concisely by using a simple backoff to determine the cutoffs for dates.
Depending on the type of backoff you use, you can control the spacing of the revisions you save.
    >>> def mult_backoff(x):
    ...     return x * 10
    ...
    >>> [mult_backoff(x) for x in xrange(0, 10)]
    [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
    >>> def exp_backoff(x):
    ...     return x * x
    ...
    >>> [exp_backoff(x) for x in xrange(0, 10)]
    [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
You could also do an additive backoff, etc. For my needs the multiplicative backoff worked well. Starting from 60 seconds and multiplying by ten, it follows this pattern: 1 minute, 10 minutes, 100 minutes, about 17 hours, about 7 days, about 10 weeks, and so on.
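As a quick check of that pattern, here is a minimal sketch that generates the cutoffs and prints them as durations. The `cutoffs` helper and its parameters are names I made up for illustration; they are not part of AuditTrail or of the command.

```python
import datetime


def cutoffs(start=60, factor=10, count=6):
    """Yield `count` cutoff ages in seconds, multiplying by `factor` each step."""
    value = start
    for _ in range(count):
        yield value
        value *= factor


# Print each cutoff in seconds next to its human-readable duration.
for seconds in cutoffs():
    print("%8d s = %s" % (seconds, datetime.timedelta(seconds=seconds)))
```

Printed as timedeltas, the six steps are 0:01:00, 0:10:00, 1:40:00, 16:40:00, 6 days 22:40:00 and 69 days 10:40:00. Changing `factor` or swapping the multiplication for an addition is an easy way to try other spacings.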
Here is the implementation of the clean_audit_trails management command:
    from django.core.management.base import NoArgsCommand
    from my_app.models import MyModel
    import datetime

    class Command(NoArgsCommand):
        help = 'Removes excessive AuditTrail history for MyModel.'
        args = ''

        def handle_noargs(self, **options):
            print("Removing unwanted audit trails...")
            # if you let the backoff grow too large,
            # it'll turn into a long int and datetime.timedelta
            # cannot be instantiated with a long int
            max_age = 60000
            objects = MyModel.objects.select_related().all()
            remove = 0
            now = datetime.datetime.now()
            for obj in objects:
                backoff = 60
                cutoff = datetime.timedelta(seconds=backoff)
                # assumes history.all() yields the newest revision first
                for trail in obj.history.all():
                    diff = now - trail._audit_timestamp
                    if backoff > max_age or diff < cutoff:
                        # too recent for the current cutoff, or past max_age
                        trail.delete()
                        remove = remove + 1
                    else:
                        # keep this revision and move on to the next cutoff
                        backoff = backoff * 10
                        cutoff = datetime.timedelta(seconds=backoff)
            print("Removed %d audit trails." % remove)
Note that the code is assuming a model that looks like this:
    from django.db import models
    import audit

    class MyModel(models.Model):
        title = models.CharField(max_length=200)
        text = models.TextField()
        history = audit.AuditTrail()
Using it is the same as any other management command:
python manage.py clean_audit_trails
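If you want to see what the command will keep before pointing it at real data, the pruning rule can be tried without a database. The sketch below is my own restatement of the loop in handle_noargs using plain integers; `split_trails` and the five-minute sample history are assumptions for illustration, and it assumes the revisions arrive newest first.

```python
def split_trails(ages, start=60, factor=10, max_age=60000):
    """Split revision ages (seconds, newest first) into (kept, deleted).

    Mirrors the command's loop: a revision is deleted while it is younger
    than the current cutoff, the first one at or past the cutoff is kept,
    and the cutoff then grows by `factor`. Once the cutoff exceeds
    `max_age`, every remaining revision is deleted.
    """
    kept, deleted = [], []
    backoff = start
    for age in ages:
        if backoff > max_age or age < backoff:
            deleted.append(age)
        else:
            kept.append(age)
            backoff *= factor
    return kept, deleted


# Hypothetical history: one revision every 5 minutes for a day, newest first.
ages = list(range(0, 24 * 60 * 60, 300))
kept, deleted = split_trails(ages)
print("kept ages (s):", kept)
print("deleted:", len(deleted))
```

With this sample history, four revisions survive (ages 300, 600, 6000 and 60000 seconds) and the other 284 are removed, including the one younger than a minute and everything older than the last kept revision. If that is more aggressive than you want, raising `max_age` is the first knob to try.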
With a little meta-magic you could probably put together a versatile tool based on this that isn't hardcoded to clean a specific model, and that uses a backoff method specified in the project's settings.py.
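One possible starting point for that, as a rough sketch rather than a finished tool: look up the backoff function from a dotted path stored in settings, and fall back to the rule from this post when nothing is configured. The setting name `AUDIT_TRAIL_BACKOFF`, the helper names, and the `SimpleNamespace` stand-ins for `django.conf.settings` are all hypothetical, chosen so the example runs without a Django project.

```python
import importlib
from types import SimpleNamespace


def default_backoff(step):
    """Cutoff in seconds for a step: 60, 600, 6000, ... (the rule from this post)."""
    return 60 * 10 ** step


def import_by_path(dotted_path):
    """Resolve 'package.module.attr' to the attribute it names."""
    module_path, _, attr = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_path), attr)


def get_backoff(settings):
    """Return the backoff callable named in settings, or the default."""
    # AUDIT_TRAIL_BACKOFF is a hypothetical setting name, not a Django one.
    path = getattr(settings, "AUDIT_TRAIL_BACKOFF", None)
    return import_by_path(path) if path else default_backoff


# Stand-ins for django.conf.settings so the sketch runs on its own.
default_settings = SimpleNamespace()
# math.factorial is used only because it ships with Python; a real project
# would point at its own function instead.
custom_settings = SimpleNamespace(AUDIT_TRAIL_BACKOFF="math.factorial")

print([get_backoff(default_settings)(n) for n in range(4)])
print([get_backoff(custom_settings)(n) for n in range(6)])
```

In a real project, Django's own `import_string` helper from `django.utils.module_loading` could take the place of `import_by_path`, and the same dotted-path trick could name the models to clean.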