Saturday 1 March 2014

Huge archive log generation with a PostgreSQL cluster?

Recently I found huge archive log generation on one of the servers (for my client): around 3,500 files per day, which needs 3500 * 16 MB ≈ 54 GB of space every day, and disk space on the server was running low. The first thing that came to mind was the checkpoint parameters. I had a look at them and they were at their defaults, so I changed them as below:

checkpoint_segments = 180
checkpoint_timeout = 30min
checkpoint_completion_target = 0.9 

I monitored for a couple of days and found that archive log generation had decreased by about 50% (to around 1,700 files per day).
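One way to confirm the new settings are taking effect is the pg_stat_bgwriter view: checkpoints_req counts checkpoints forced by filling checkpoint_segments, while checkpoints_timed counts those triggered by checkpoint_timeout. After the change, checkpoints_req should barely grow. A minimal check (both counters are cumulative since the last statistics reset):

-- forced checkpoints (req) vs. scheduled checkpoints (timed)
SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;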

With the above parameter settings, archive log generation should be minimal. If it is not, here are some general suggestions to look at:

-- Make sure that you don't UPDATE a row to the same values it already has. The row will be WAL-logged even if the old and new values match. One way to handle this is to make sure that any UPDATE statement which might set a column to its existing value includes in its WHERE clause a test that the old value IS DISTINCT FROM the new value. Alternatively, you can create a trigger on problem tables using the Postgres built-in function suppress_redundant_updates_trigger (see http://www.postgresql.org/docs/9.3/static/functions-trigger.html); a sketch follows this list.

-- Use one UPDATE statement instead of several where possible. If updates to different columns, or from multiple processing steps, can be combined, this reduces the number of row versions created and hence the WAL logged (second sketch below).

-- Consider using UNLOGGED or TEMPORARY tables where possible. For example, if data is inserted into the database and updated right away, perform the initial steps in an UNLOGGED or TEMPORARY table and INSERT into the main table once the data has been prepared (third sketch below).

-- Identify and eliminate unused indexes. Index changes are WAL-logged so that indexes can be recovered after a crash. The storage space needed for indexes can exceed the storage space needed for the data itself, with a corresponding increase in generated WAL (the last sketch below shows one way to find candidates).
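For the first suggestion, here is a sketch against a hypothetical table t (the table, column and trigger names are made up for illustration). The WHERE clause skips rows that already hold the target value, and the trigger silently drops any redundant UPDATE that slips through anyway:

-- skip the UPDATE entirely when the value would not change
UPDATE t
   SET status = 'done'
 WHERE id = 42
   AND status IS DISTINCT FROM 'done';

-- safety net: discard UPDATEs that change nothing in the row
CREATE TRIGGER t_skip_redundant
BEFORE UPDATE ON t
FOR EACH ROW
EXECUTE PROCEDURE suppress_redundant_updates_trigger();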
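For the second suggestion, combining two passes over the same rows into one halves the row versions created, and with them the WAL (same hypothetical table t):

-- two passes: two new row versions per matched row
UPDATE t SET status = 'done' WHERE id = 42;
UPDATE t SET finished_at = now() WHERE id = 42;

-- one pass: one new row version per matched row
UPDATE t SET status = 'done', finished_at = now() WHERE id = 42;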
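For the UNLOGGED/TEMPORARY suggestion, a sketch of a staging pattern, again with made-up names and a hypothetical input file. Keep in mind that UNLOGGED tables are not crash-safe (they are truncated on recovery), so they should only hold data you can reload:

-- stage and massage the data without WAL logging
CREATE UNLOGGED TABLE t_stage (LIKE t INCLUDING DEFAULTS);
COPY t_stage FROM '/tmp/load.csv' WITH (FORMAT csv);
UPDATE t_stage SET status = 'new' WHERE status IS NULL;

-- only this final INSERT into the main table is WAL-logged
INSERT INTO t SELECT * FROM t_stage;
DROP TABLE t_stage;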
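And for unused indexes, the statistics views can point at candidates. idx_scan counts index scans since the last statistics reset, so make sure the stats cover a representative period of your workload before dropping anything, and remember that unique and primary key indexes may show zero scans yet still enforce constraints:

-- indexes never scanned since the last statistics reset
SELECT schemaname, relname AS table_name, indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
  FROM pg_stat_user_indexes
 WHERE idx_scan = 0
 ORDER BY pg_relation_size(indexrelid) DESC;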

3 comments:

  1. If you're worried about spurious UPDATE statements which don't really change anything in the row, you can consider creating a trigger on problem tables using the Postgres built-in function suppress_redundant_updates_trigger (see [1]).

    Of course the best option would be to stop the application from sending such UPDATEs, but if that's not an option, or as a stop-gap measure until the application developers can fix the behaviour, this trigger can save on WAL generation, and hence archive generation.

    [1]: http://www.postgresql.org/docs/9.3/static/functions-trigger.html

    Best regards,

    1. Great, thank you Gurjeet, I was not aware of this built-in function. Good one. Will update the post as well... ;-)

  2. Hi,

    I'm currently using PostgreSQL 9.1 on my production server. Recently, I found a very large amount of archive log generation on the server: around 80 files per hour for just a DELETE statement of approximately 8,000 rows, and the archive log size per day is around 42 GB. We are facing a serious disk space problem. Kindly help me to sort out this issue, ASAP. Thanks in advance.

    Regards,
    Pavan Teja,
    DBA,
    Postgres Certified Associate,
    9841380956
