Log in

No account? Create an account

Postgresql 8.3 Features I'm looking forward to

PG 8.3 is coming along soon. (although I read from Bruce M that there's likely to be RC2 coming out).

In any case, I looked through the pgwiki and there looks like only 2 features which I'm looking forward to.

  • HOT
  • Create table like including indexes (although right now, this is being automated via a stored procedure/function)

The other thing which is nice, but not absolutely necessary is the multiple Autovacuum worker feature. My concern is largely on the few very large tables which I used to have. (I've since sliced it down to partitions by date ranges to keep it manageable. I initially just wanted to see how _much_ data it can cope with before my system** starts to bog down. BTW, It turned out to be approx 200 million rows, and Now I know)

Of late, the nightly vacuum has been taking a long time and this is in part, a fault of mine due to a design issue. I won't go so much into this, but know  that I need to relook into my current ETL implementation and where the data goes into the Db.

As of right now, I'm pulling data from a MSSQL server into PG to be made as a data-mart. My current process involves pulling from MSSQL into a table in PG. Unlike the usual method of making a partition, namely a master table w/o holding any data or insert directly into  the partition, I chose to insert into  the master table, and then, 1 week later (I started with 1 month then 2 weeks and ended up with 1 week's worth of current data in the master table) I start to offload data from the master table into the partition.

Master Table (1 wk data)

I was looking through my system's load and found that it's always on IO wait. Performing a vacuum on the large table after the data offload into the partition took quite a while due to

  1. The table is large
  2. The indexes are sometimes even larger than the table size
  3. The number of indexes in that table
  4. My usage of a concatenated prikey named as unique_id to simplify the loading process which ended up being a bad decision because I needed to create the same prikey (non-concatenated as an index) anyway to improve join performance. Hence, in some sense, i have double the amount to vacuum through. Bad. Bad. (David Fetter warned me of this but I chose to shoot myself in the foot anyway.)
So, I figured that by reducing the amount of data in that particular table, I could well reduce the amount of time being spent in vacuuming that particular table. (Note that I don't know how true is this hypothesis of mine, but I'm giving it a shot anyhow.)

Note: I'm looking forward to 8.4, which I don't really know when, but I'm hoping that by then, (on disk) bitmap indexes will be made available and my (multiple) indexes can be made to be smaller and more efficient. (up to 8 index on a table)

** : The system in question is a celeron 1.7G/768MB RAM and 2x500GB Raid 1 w/ ~250GB DB size
Tags: ,



concatenated prikey

Can you clarify what you meant by "concatenated prikey"? I could certainly benifit from what you've learned from this experience. I just want to be sure I understand exactly what you are referring to.


Re: concatenated prikey

Prikey for the table is (A,date_time,B,C)
concatenated prikey = (A || date_time(converted to since epoch)||B||C)

then create another index on A,date_time, B, C

Re: concatenated prikey

Prikey for the table is (A,date_time,B,C)
concatenated prikey = (A || date_time(converted to since epoch)||B||C)

then create another index on A,date_time, B, C

April 2009

Powered by LiveJournal.com