<?xml version='1.0' encoding='utf-8' ?>
<!--  If you are running a bot please visit this policy page outlining rules you must respect. http://www.livejournal.com/bots/  -->
<rss version='2.0' xmlns:lj='http://www.livejournal.org/rss/lj/1.0/' xmlns:media='http://search.yahoo.com/mrss/' xmlns:atom10='http://www.w3.org/2005/Atom'>
<channel>
  <title>LotSo&apos;s OSS World</title>
  <link>http://lotso.livejournal.com/</link>
  <description>LotSo&apos;s OSS World - LiveJournal.com</description>
  <lastBuildDate>Sun, 05 Apr 2009 12:22:01 GMT</lastBuildDate>
  <generator>LiveJournal / LiveJournal.com</generator>
  <lj:journal>lotso</lj:journal>
  <lj:journalid>4034558</lj:journalid>
  <lj:journaltype>personal</lj:journaltype>
  <image>
    <url>http://l-userpic.livejournal.com/21859311/4034558</url>
    <title>LotSo&apos;s OSS World</title>
    <link>http://lotso.livejournal.com/</link>
    <width>100</width>
    <height>75</height>
  </image>

<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/107348.html</guid>
  <pubDate>Sun, 05 Apr 2009 12:22:01 GMT</pubDate>
  <title>Postgresql 8.4 -&amp;gt; Where are On Disk Bitmap Indexes?</title>
  <link>http://lotso.livejournal.com/107348.html</link>
  <description>PostgreSQL 8.4 is nearly out. There are quite a few things which look interesting to me. However, the one thing I&apos;m still missing, and am not able to find the status of, is what happened to the On-Disk Bitmap Indexes which were supposed to come out with the 8.4 release.&lt;br /&gt;&lt;br /&gt;Would anyone from the PostgreSQL team be privy to that info? I can&apos;t really seem to find it on Google.&lt;br /&gt;&lt;br /&gt;Thanks.</description>
  <comments>http://lotso.livejournal.com/107348.html</comments>
  <category>linux</category>
  <category>sql</category>
  <lj:mood>aggravated</lj:mood>
  <lj:security>public</lj:security>
  <lj:reply-count>9</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/106296.html</guid>
  <pubDate>Mon, 23 Jun 2008 01:32:15 GMT</pubDate>
  <title>Automatic Raid Array Rebuilding</title>
  <link>http://lotso.livejournal.com/106296.html</link>
  <description>Hi guys, long time no post. My last post was in March and it&apos;s already June.&lt;br /&gt;&lt;br /&gt;Been busy as usual. However, I&apos;ve not been dabbling as much as I &quot;should&quot;, as I&apos;ve been busy with other NON-FOSS related stuff. (psst: I&apos;m now heavily into photography. Went to shoot some Japan GT queens!! Kawaaiii)&lt;br /&gt;&lt;br /&gt;Anyway, since this is a (nearly) purely FOSS-based blog, I&apos;m gonna talk about my automatic RAID rebuilding script.&lt;br /&gt;&lt;br /&gt;You see, what happens is this: my postgresql box (celeron, 2x500GB in RAID 1) has a tendency to keep dying once in a while for X reasons. (Until now, I have been unable to locate the reason why it&apos;s dying so often.) I&apos;ve tried a write-all/read-all pass using dd, but thus far have not seen any errors being thrown. So, it&apos;s been a manual routine of...&lt;br /&gt;&lt;br /&gt;go to work. see the email : Your raid has Died!&lt;br /&gt;log onto the box, do the rebuild.&lt;br /&gt;&lt;br /&gt;After a while, this just became tiring and I decided to fsck it and make it automatic.&lt;br /&gt;&lt;br /&gt;Here&apos;s the script&lt;br /&gt;&lt;br /&gt;#!/bin/bash&lt;br /&gt;&lt;br /&gt;# Pick the device that mdadm reports as faulty in /dev/md0&lt;br /&gt;FAIL_DRV=$(mdadm --detail /dev/md0 | grep faulty | awk &apos;{print $6}&apos;)&lt;br /&gt;&lt;br /&gt;if [ -n &quot;$FAIL_DRV&quot; ]&lt;br /&gt;then&lt;br /&gt;&amp;nbsp; echo &quot;Detected degraded array : $FAIL_DRV&quot;&lt;br /&gt;&amp;nbsp; echo &quot;Starting automated array rebuild process&quot;&lt;br /&gt;&amp;nbsp; # Fail, remove and re-add the same device to trigger a rebuild&lt;br /&gt;&amp;nbsp; mdadm /dev/md0 --fail &quot;$FAIL_DRV&quot; --remove &quot;$FAIL_DRV&quot; --add &quot;$FAIL_DRV&quot;&lt;br /&gt;else&lt;br /&gt;&amp;nbsp; echo &quot;Nothing to do&quot;&lt;br /&gt;fi&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Simple eh.. &lt;br /&gt;&lt;br /&gt;So, now I don&apos;t have to come to work to see it all wonky because it&apos;ll automatically rebuild itself.&lt;br /&gt;&lt;br /&gt;Some of you may ask, how come I don&apos;t just replace the drive? 
Because I can&apos;t find any replacement drive which has a PATA connection and a 500GB capacity! The largest I can find are 160GB.&lt;br /&gt;&lt;br /&gt;Bummer</description>
  <comments>http://lotso.livejournal.com/106296.html</comments>
  <category>linux</category>
  <category>sql</category>
  <lj:security>public</lj:security>
  <lj:reply-count>2</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/105424.html</guid>
  <pubDate>Tue, 05 Feb 2008 01:48:54 GMT</pubDate>
  <title>Location..Location..Location</title>
  <link>http://lotso.livejournal.com/105424.html</link>
  <description>I’m in San Jose. Still pondering if I can make it to the Local PUG (postgresql user group) meeting to be held on Feb 12 since I’m here.&lt;br /&gt;&lt;br /&gt;Will get the chance to meet David Fetter and team.&lt;br /&gt;&lt;br /&gt;I’ll see what happens.&lt;br /&gt;&lt;br /&gt;PS : I freaking hate it here this time of year. It’s cold and so are my fingers! I need to constantly rub my hands together</description>
  <comments>http://lotso.livejournal.com/105424.html</comments>
  <category>linux</category>
  <category>sql</category>
  <lj:security>public</lj:security>
  <lj:reply-count>6</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/104993.html</guid>
  <pubDate>Sun, 27 Jan 2008 09:51:27 GMT</pubDate>
  <title>You say Lemon, I say Lemonade (A story)</title>
  <link>http://lotso.livejournal.com/104993.html</link>
  <description>The past few weeks were not all that great: in addition to facing extra challenges at my primary day job, I also had to deal with the pet project I run there to help smooth out my day job’s activities.&lt;br /&gt;&lt;br /&gt;Some of you may know that my pet project involves pulling gobs of data into a PG instance to make my own version of a company datamart. I’m not talking about small gobs of data, but something in the range of 200+GB. (It was more, but in one of the efforts to control/tune the server, I deleted close to 2-3 months’ worth of data.)&lt;br /&gt;&lt;br /&gt;200+GB may not seem like much to you guys who get to play with some real iron or some “real” server hardware. All I had was a Celeron 1.7G w/ 768MB of RAM and some gobs of IDE 7200 RPM drives. In short, all I had was lemons and I needed to make the best of it!&lt;br /&gt;&lt;br /&gt;Actually, all was working fine and dandy up until I decided to make a slave server using Slony-I + PGpool. While that was a good decision, the hardware involved was the same if not worse (512MB RAM only). When I started to implement that, I was faced with 2 issues.&lt;br /&gt;&lt;br /&gt;1. Replication would lag behind by up to a day or so, because waiting for the next sync (the dreaded fetch 100 from log) was taking too long.&lt;br /&gt;2. My nightly vacuum job went from an average of 4+ hours to like 27+ hours.&lt;br /&gt;&lt;br /&gt;So, in an effort to get things under control, I went down a few paths and hit more than my share of stumbling blocks. One of the things I tried was to reduce the amount of “current” data in a particular table from 1 month -&amp;gt; 2 weeks -&amp;gt; 1 week (moving the rest into a so-called archive table, but still in the same tablespace). This didn’t go well: I initially tried to move the data in 3-hour chunks, which failed, then in 1-hour chunks, and finally in 15-minute chunks. 
&lt;br /&gt;&lt;br /&gt;But in the end, it was all really futile because what I was essentially doing was just generating more and more IO activity (and that’s not a good thing). In addition to that, I also had to deal with vacuuming the tables due to PG’s MVCC and that was also not fun.&lt;br /&gt;&lt;br /&gt;So, in the end, I broke my 3x500GB RAID 1 mirror (1 spare disk) and used the spare as the Slony-I log partition. Initially, that wasn’t all I did: I also moved the 2 main problematic tables from the main RAID 1 tablespace into that 1-disk tablespace. That was also a mistake and it didn’t help at all. IO activity was still high and I wasn’t able to fix my vacuuming process either.&lt;br /&gt;&lt;br /&gt;Time for another plan.&lt;br /&gt;&lt;br /&gt;This time around, what I did was move the 2 big tables back into the RAID 1 tablespace and leave the Slony logs on the single disk. In addition to that, I also made a few alterations to the manner in which I pull data from the main MSSQL database and the way it is inserted into PG.&lt;br /&gt;&lt;br /&gt;I’m now utilising partitioning and some additional pgagent rules to automatically switch to a new table every 7 days, and in doing so I also had to change a few other items to get things to work smoothly. I did this last Friday and, based on the emailed logs, I think I’ve made a good decision: right now everything seems peachy, with the vacuum back to ~4 hours and no lag in the Slony replication.&lt;br /&gt;&lt;br /&gt;I still have another thing to do, which is to alter the script I use to pull from the main DB, as I’m being kicked (requested) to pull from an alternate DB which has a slightly different architecture.&lt;br /&gt;&lt;br /&gt;A 2-disk RAID 1 is definitely MUCH better than a single-disk tablespace. With the amount of read/write activity that I have, a single disk is just not doable.&lt;br /&gt;&lt;br /&gt;So, that’s how I made lemonade with my lemons. (hmm.. 
does this sound right?)</description>
  <comments>http://lotso.livejournal.com/104993.html</comments>
  <category>linux</category>
  <category>sql</category>
  <lj:security>public</lj:security>
  <lj:reply-count>2</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/104934.html</guid>
  <pubDate>Sat, 12 Jan 2008 12:39:40 GMT</pubDate>
  <title>Postgresql 8.3 Features I&apos;m looking forward to</title>
  <link>http://lotso.livejournal.com/104934.html</link>
  <description>PG 8.3 is coming along soon. (although I read from Bruce M that there&apos;s likely to be RC2 coming out).&lt;br /&gt;&lt;br /&gt;In any case, I looked through the &lt;a href=&quot;http://developer.postgresql.org/index.php/WhatsNew83&quot; rel=&quot;nofollow&quot;&gt;pgwiki&lt;/a&gt; and there looks like only 2 features which I&apos;m looking forward to.&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;HOT&lt;/li&gt;&lt;li&gt;Create table like including indexes (although right now, this is being automated via a stored procedure/function)&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;The other thing which is nice, but not absolutely necessary is the multiple Autovacuum worker feature. My concern is largely on the few very large tables which I used to have. (I&apos;ve since sliced it down to partitions by date ranges to keep it manageable. I initially just wanted to see how _much_ data it can cope with before my system** starts to bog down. BTW, It turned out to be approx 200 million rows, and Now I know)&lt;br /&gt;&lt;br /&gt;Of late, the nightly vacuum has been taking a long time and this is in part, a fault of mine due to a design issue. I won&apos;t go so much into this, but know&amp;nbsp; that I need to relook into my current ETL implementation and where the data goes into the Db.&lt;br /&gt;&lt;br /&gt;As of right now, I&apos;m pulling data from a MSSQL server into PG to be made as a data-mart. My current process involves pulling from MSSQL into a table in PG. 
Unlike the usual method of making a partition, namely a master table w/o holding any data or insert directly into&amp;nbsp; the partition, I chose to insert into&amp;nbsp; the master table, and then, 1 week later (I started with 1 month then 2 weeks and ended up with 1 week&apos;s worth of current data in the master table) I start to offload data from the master table into the partition.&lt;br /&gt;&lt;br /&gt;Master Table (1 wk data)&lt;br /&gt;-&amp;gt;partition_200710&lt;br /&gt;-&amp;gt;partition_200711&lt;br /&gt;-&amp;gt;partition_200712&lt;br /&gt;&lt;br /&gt;I was looking through my system&apos;s load and found that it&apos;s always on IO wait. Performing a vacuum on the large table after the data offload into the partition took quite a while due to &lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;The table is large&lt;/li&gt;&lt;li&gt;The indexes are sometimes even larger than the table size&lt;/li&gt;&lt;li&gt;The number of indexes in that table&lt;/li&gt;&lt;li&gt;My usage of a concatenated prikey named as unique_id to simplify the loading process which ended up being a bad decision because I needed to create the same prikey (non-concatenated as an index) anyway to improve join performance. Hence, in some sense, i have double the amount to vacuum through. Bad. Bad. (David Fetter warned me of this but I chose to shoot myself in the foot anyway.)&lt;/li&gt;&lt;/ol&gt;So, I figured that by reducing the amount of data in that particular table, I could well reduce the amount of time being spent in vacuuming that particular table. (Note that I don&apos;t know how true is this hypothesis of mine, but I&apos;m giving it a shot anyhow.)&lt;br /&gt;&lt;br /&gt;Note: I&apos;m looking forward to 8.4, which I don&apos;t really know when, but I&apos;m hoping that by then, (on disk) bitmap indexes will be made available and my (multiple) indexes can be made to be smaller and more efficient. 
(up to 8 indexes on a table)&lt;br /&gt;&lt;br /&gt;** : The system in question is a celeron 1.7G/768MB RAM and 2x500GB Raid 1 w/ ~250GB DB size</description>
  <comments>http://lotso.livejournal.com/104934.html</comments>
  <category>linux</category>
  <category>sql</category>
  <lj:security>public</lj:security>
  <lj:reply-count>3</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/104440.html</guid>
  <pubDate>Sun, 06 Jan 2008 06:24:57 GMT</pubDate>
  <title>SQL - pgpool-II (Step 2)</title>
  <link>http://lotso.livejournal.com/104440.html</link>
  <description>So, this is step 2 to getting replication + load balancing to work for postgresql.&lt;br /&gt;&lt;br /&gt;I&apos;ve already detailed the 1st step, getting Slony to work, in a previous post. (That was on a development machine/vmware image. When I tried it on the production/slave server, I was faced with some issues which I might elaborate on in another post. It all boils down to me shooting myself in the foot. What to do, it was a time when I wasn&apos;t connected to the internet and thus had no googling privileges.)&lt;br /&gt;&lt;br /&gt;So, here is my experience with pgpool, and it&apos;s also a little bit like shooting myself in the foot (again!)&lt;br /&gt;&lt;br /&gt;First off, I started out using the _wrong_ version of pgpool. The newest version of pgpool-II (note that -&amp;gt; pgpool-II and not pgpool-I) is 2.0.1 and the newest version of pgpool found on the yum mirrors (I&apos;m using centos4/5) was 2.01 (well, the numbers match, don&apos;t they?) The only difference was that the one available on the yum mirrors was pgpool-I and not pgpool-II. On top of that, documentation on pgpool was sparse. (I googled everywhere, read all the relevant and NON-relevant mailing lists and found nothing much to go by.)&lt;br /&gt;&lt;br /&gt;It was not until I signed up to the pgpool mailing list (which was very low volume, by the way) and interacted with one of the Japanese developers that I found out I was in fact using the OLD pgpool, pgpool-I, which unfortunately had the same version number as pgpool-II!&lt;br /&gt;(I even downloaded the tarball from pgfoundry [and I _did_ download the _correct_ tarball] and searched through the source to figure out what was happening.)&lt;br /&gt;&lt;br /&gt;Because of that little (big!) mistake, I was tearing my hair out for the past 3+ weeks. (Well, I didn&apos;t play with it every day, what with my day job and such....) 
However, I did get pgpool-I to work properly with a little tweaking and I could get load balancing to work, albeit not as advertised: I couldn&apos;t get it to work without it also functioning as replication. (Of sorts anyway, which was the reason I couldn&apos;t deploy it, as I was using Slony.)&lt;br /&gt;&lt;br /&gt;So, after I found out my mistake last Friday, I started to google for an RPM of pgpool-II (newest version 2.0.1) but was unable to locate one anywhere. The latest RPM I could find was version 1.3, which was _too_ old in a sense. (It&apos;s always better to have the latest stable version.) So, I had to engineer a way to get an RPM from the tarball. Luckily, the tarball from pgfoundry also contained the pgpool.spec file, which was packaged by Devrim. Unfortunately for me, the spec file was a little old in that it referred to the 2.0 beta1 version. It wasn&apos;t too much of an issue, as all it needed was a little hack here and a little hack there. (I was getting a bad owner/group permission error, which I narrowed down to the .spec file not having valid users/groups.)&lt;br /&gt;&lt;br /&gt;After that was done, a rpmbuild -ba pgpool.spec got me an RPM.&lt;br /&gt;&lt;br /&gt;After that, I just installed it, configured pgpool.conf and got it up and running as advertised, with replication mode off, master/slave mode on and load balancing mode on.&lt;br /&gt;&lt;br /&gt;Cool.. I&apos;m rolling this to production on Monday.&lt;br /&gt;&lt;br /&gt;So, this means I&apos;ll have 1x Master (1.7G celeron/768MB ram, 500G Raid1 with ~200GB of data) and 1x Slave (1.7G celeron/512MB ram, 3x160GB raid0). I still have another box sitting under my desk which has even poorer specs than the above, but I think it&apos;ll work out just fine.&lt;br /&gt;&lt;br /&gt;Cool..Ultra Cool Even!!&lt;br /&gt;&lt;br /&gt;If anyone wants the RPM or the modified spec file, do drop me a line and I&apos;ll post it to you or something.</description>
  <comments>http://lotso.livejournal.com/104440.html</comments>
  <category>linux</category>
  <category>sql</category>
  <lj:mood>cheerful</lj:mood>
  <lj:security>public</lj:security>
  <lj:reply-count>29</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/104119.html</guid>
  <pubDate>Sun, 23 Dec 2007 17:18:23 GMT</pubDate>
  <title>SQL - Slony-I (step 1)</title>
  <link>http://lotso.livejournal.com/104119.html</link>
  <description>Been playing around with some level of replication for Postgresql. As with all FOSS-based software, there are lots of choices to choose from and that, in itself, though a blessing, is also a curse. There are just too many choices! (Both FOSS and non-FOSS per se)&lt;br /&gt;&lt;br /&gt;1. Sequoia&lt;br /&gt;2. PgCluster&lt;br /&gt;3. CyberCluster&lt;br /&gt;4. Slony-I&lt;br /&gt;5. PgPool&lt;br /&gt;6. Skytools (this is from Skype)&lt;br /&gt;&lt;br /&gt;and I believe the list goes on. In any case, my requirements are just 2, I think. (for now anyway)&lt;br /&gt;&lt;br /&gt;i. Replicate only a subset of the tables. (not the entire db)&lt;br /&gt;(AFAIK, pgcluster, while easier to configure, is also an entire-DB replication solution, which is not what I wanted)&lt;br /&gt;&lt;br /&gt;ii. Connection load balancing to a few read-only slaves (for select queries only)&lt;br /&gt;&lt;br /&gt;Hence, after wading through the overflowing amount of information on which option to choose, I finally arrived at using Slony-I and pgpool and, of the two, I’ve (more or less) already completed the configuration of Slony-I.&lt;br /&gt;&lt;br /&gt;For Slony-I, I made sure that I understood how to do it “old-style”, which is by using the cli, before I moved on to doing the rest of the configuration using pgadmin, which is way easier.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;&lt;u&gt;Slony-I&lt;/u&gt;&lt;/i&gt;&lt;/b&gt; &lt;br /&gt;There are a few caveats when using Slony-I and I’ll list my experiences from playing with it on both gentoo and centos 4 (this is running in a VM)&lt;br /&gt;&lt;br /&gt;1st off, version 1.2.12 is out on the Slony website but gentoo is still at 1.2.10. 
The easiest thing to do about this is just to hack the ebuild, change the version from 1.2.10--&amp;gt; 1.2.12 (gentoo bug #143600) and move it to /usr/local/portage.&lt;br /&gt;&lt;br /&gt;So, in that sense, building on gentoo was relatively straightforward and a less than 10 min job (excluding compilation)&lt;br /&gt;&lt;br /&gt;But on centos, it’s another matter since there’s no default rpm supplied. Only a src rpm was supplied, and I’m not too familiar with those. (I switched to using gentoo nearly 4/5 years ago as I hated fedora’s upgrade cycle and centos was “supposed” to be server-grade.)&lt;br /&gt;&lt;br /&gt;In any case, most of the caveats are when dealing with centos. For one, since this is a src.rpm, you have to compile it 1st.&lt;br /&gt;&lt;br /&gt;Hence, you need these additional packages :&lt;br /&gt;&lt;br /&gt;1. bison&lt;br /&gt;2. flex&lt;br /&gt;3. gcc (and all its dependencies)&lt;br /&gt;4. rpm-build&lt;br /&gt;5. postgresql-devel&lt;br /&gt;6. docbook-style-dsssl&lt;br /&gt;7. netpbm-progs (and netpbm dependency)&lt;br /&gt;8. (there might be more as I didn’t document it)&lt;br /&gt;&lt;br /&gt;Once you start compiling it, you’ll run into 1 error which is caused by the NAMELEN of the docs. (This is marked as bug &lt;a href=&quot;https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=159382&quot; rel=&quot;nofollow&quot;&gt;#159382&lt;/a&gt;.) The solution is to either upgrade to centos 5 (supposed to be fixed by this release. Keyword = supposed) or to hack it. (I chose to hack it)&lt;br /&gt;&lt;br /&gt;Depending on where your docbook files are, you can do this&lt;br /&gt;&lt;em&gt;&lt;code&gt;&lt;pre&gt;
cd /usr/share/sgml &amp;&amp; perl -pi.bak -e &apos;s/(NAMELEN\s+)44/${1}256/&apos; $(find . -type f | xargs grep &apos;NAMELEN.*44&apos; | sed -e &apos;s/:.*//&apos;)
&lt;/em&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;So, after that is resolved, (which took 1-2 hours w/ scouring net etc.) Then move on to the experimenting stage. I used articles from these few locations :&lt;br /&gt;&lt;br /&gt;&lt;a href=&quot;http://slony.info/documentation/&quot; rel=&quot;nofollow&quot;&gt;slony-i official docs&lt;/a&gt;&lt;br /&gt;&lt;a href=&quot;http://odyssi.blogspot.com/2007/10/postgresql-replication-with-slony-i.html&quot; rel=&quot;nofollow&quot;&gt;WhoAmI’s Blog&lt;/a&gt;&lt;br /&gt;&lt;a href=&quot;http://www.onlamp.com/pub/a/onlamp/2004/12/16/slony_install.html?page=2&quot; rel=&quot;nofollow&quot;&gt;OnLamp Article from 2005&lt;/a&gt;&lt;br /&gt;&lt;a href=&quot;http://www.pgadmin.org/archives/pgadmin-support/2007-09/msg00101.php&quot; rel=&quot;nofollow&quot;&gt;Pgadmin Archives&lt;/a&gt;&lt;br /&gt;&lt;a href=&quot;http://www.pgadmin.org/docs/1.8/slony-overview.html&quot; rel=&quot;nofollow&quot;&gt;Pgadmin Docs&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Anyway, a few more caveats with the configuration is.&lt;br /&gt;&lt;br /&gt;1. Ensure you use a .pgpass file for the passwords (chmod go-rwx ~/.pgpass)&lt;br /&gt;&lt;em&gt;&lt;pre&gt;
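# pgpass line format: hostname:port:database:username:password (* is a wildcard, used here for the database field)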
192.168.10.100:5432:*:postgres:pguserpassword
192.168.10.20:5432:*:postgres:pguserpassword
&lt;/em&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;2. Ensure that you use sane configs for your pg_hba.conf file. (Use trust/ident authentication 1st, just to rule authentication out as the cause if things aren’t working.)&lt;br /&gt;&lt;br /&gt;3. Ensure that the connection string used for slon/slonik also includes “user=postgres”.&lt;br /&gt;(Notice that this &lt;a href=&quot;http://odyssi.blogspot.com/2007/10/postgresql-replication-with-slony-i.html&quot; rel=&quot;nofollow&quot;&gt;guide&lt;/a&gt; doesn’t specify the user to connect as in the slonik shell script. This caused me some headache, as I was getting both a password error as well as some “cannot connect admin node xxx” issues.)&lt;br /&gt;&lt;br /&gt;4. Create the replication either directly using shell scripts or using pgadmin3. (I followed both the examples from the pgadmin docs as well as the mail I found on the pgadmin mailing list - links provided above, with the exception that I didn’t make it 2 way as in slave&amp;lt;--&amp;gt;master but only master--&amp;gt;slave and slave--&amp;gt;master.)&lt;br /&gt;&lt;br /&gt;5. Starting the slon process is as simple as this (I used a config file instead)&lt;br /&gt;$cat &amp;gt; slon_master.conf&lt;br /&gt;cluster_name = &apos;pgcluster&apos;&lt;br /&gt;conn_info = &apos;dbname=testcluster host=192.168.10.20 user=postgres&apos;&lt;br /&gt;^C&lt;br /&gt;&lt;br /&gt;$slon -d4 -f slon_master.conf&lt;br /&gt;&lt;br /&gt;(-d4 to give lots of debug output)&lt;br /&gt;&lt;br /&gt;&lt;u&gt;&lt;b&gt;on the Master DB&lt;/b&gt;&lt;/u&gt;&lt;br /&gt;&lt;em&gt;&lt;pre&gt;
2007-12-24 01:04:12 MYT DEBUG2 syncThread: new sl_action_seq 1 - SYNC 217
2007-12-24 01:04:16 MYT DEBUG2 localListenThread: Received event 10,217 SYNC
2007-12-24 01:04:17 MYT DEBUG2 remoteListenThread_1: queue event 1,195 SYNC
2007-12-24 01:04:17 MYT DEBUG2 remoteListenThread_1: UNLISTEN
2007-12-24 01:04:22 MYT DEBUG2 syncThread: new sl_action_seq 1 - SYNC 218
&lt;/em&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;u&gt;&lt;b&gt;on the Slave DB&lt;/b&gt;&lt;/u&gt;&lt;br /&gt;&lt;em&gt;&lt;pre&gt;
2007-12-23 22:05:02 MYT DEBUG2 remoteWorkerThread_10: SYNC 227 processing
2007-12-23 22:05:02 MYT DEBUG2 remoteWorkerThread_10: no sets need syncing for this event
2007-12-23 22:05:04 MYT DEBUG2 remoteListenThread_10: queue event 10,228 SYNC
2007-12-23 22:05:04 MYT DEBUG2 remoteWorkerThread_10: Received event 10,228 SYNC
2007-12-23 22:05:04 MYT DEBUG3 calc sync size - last time: 1 last length: 2005 ideal: 29 proposed size: 3
&lt;/em&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;6. BTW, there’s no need to do a database dump and restore of the tables you want replicated. It’s just as good to create the schema w/o any data and start the slon processes. I learned that all my effort to dump and restore the replicated tables just went down the drain, as Slony-I will simply truncate the table (this was a command I caught a glimpse of when slon started) and restart from scratch. (I really wonder if this is intended behaviour. What happens when the slon processes go down? It seems quite fragile, so I’ll have to look into that.)&lt;br /&gt;&lt;br /&gt;Next up is to look at pgpool. That’ll be another fun(?) thing to look at??&lt;br /&gt;&lt;br /&gt;BTW, I’m looking to do the replication to another (low end celeron) box and perhaps just do a raid0 out of 3 drives for greater performance(?) and then use pgpool to load balance to the raid0 box.&lt;br /&gt;&lt;br /&gt;Build performance and redundancy through multiple unreliable boxes eh? The google philosophy. &lt;br /&gt;I’ve got a few low end boxes lying around in the office which can be put to use I suspect.</description>
  <comments>http://lotso.livejournal.com/104119.html</comments>
  <category>linux</category>
  <category>sql</category>
  <lj:security>public</lj:security>
  <lj:reply-count>3</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/103530.html</guid>
  <pubDate>Sun, 18 Nov 2007 13:10:27 GMT</pubDate>
  <title>SQL - pgadmin can&apos;t do table inheritance</title>
  <link>http://lotso.livejournal.com/103530.html</link>
  <description>So I found out that in some sense, pgadmin is like shooting oneself in the foot. (if you’re not in the know)&lt;br /&gt;&lt;br /&gt;Let’s see how many times I’ve shot myself in the foot.&lt;br /&gt;&lt;br /&gt;1. There was no option for moving indexes to a separate tablespace from within pgadmin (1.8)&lt;br /&gt;--&amp;gt; This can be done using psql -&amp;gt; alter index xxx set tablespace fastspace&lt;br /&gt;&lt;br /&gt;2. There’s no option for making a table to become an inherited table AFTER table creation. (Note that there is an option in the GUI for adding/removing inherit tables, but in mine, it’s greyed out.)&lt;br /&gt;--&amp;gt; Via psql -&amp;gt; alter table footable inherit footable_parent&lt;br /&gt;&lt;br /&gt;hmm.. only 2 times.. (at least that’s how many I can remember right now)</description>
  <comments>http://lotso.livejournal.com/103530.html</comments>
  <category>sql</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/103389.html</guid>
  <pubDate>Sat, 17 Nov 2007 05:09:14 GMT</pubDate>
  <title>SQL - Perl DBI - Updating the rows counts</title>
  <link>http://lotso.livejournal.com/103389.html</link>
  <description>I’m syncing my PG database with the main MSSQL DB at a specified interval and I was wondering how many records were being deleted/inserted every hour, so that I can get a feel for how much latency there is between the main SQL server and my data mart.&lt;br /&gt;&lt;br /&gt;My solution to pull data from MSSQL and insert into PG is based on Perl-DBI. Initially, I was wondering how I could get the row count (rows affected by a query) to be inserted into a log table. AFAIK, there isn’t a @@rowcount (MSSQL feature) in PG, but there is GET DIAGNOSTICS variable = ROW_COUNT in pl/pgsql (which I use, by the way, when I write pl/pgsql). Since this script uses Perl-DBI rather than pl/pgsql, that feature wasn’t available. Hence the problem.&lt;br /&gt;&lt;br /&gt;I researched it a bit and found out that Perl-DBI provides the number of rows affected by NON-SELECT statements.&lt;br /&gt;&lt;br /&gt;Hence, I set out to incorporate that into my script. (Turns out that it wasn’t all that difficult.)&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;em&gt;&lt;code&gt;
  # We have 4 basic kinds of SQL statement to execute:
  # DELETE / INSERT / UPDATE / TRUNCATE
  my $query0 = &quot;TRUNCATE TABLE $table_name_loading&quot;;
  my $query1 = &quot;DELETE FROM $table_name
                WHERE $unique_id in
                (SELECT $unique_id from $table_name_loading)&quot;;
  my $query2 = &quot;INSERT INTO $table_name SELECT * FROM $table_name_loading&quot;;
  my $query3 = &quot;UPDATE log_sync SET last_sync=?,
                record_update_date_time=current_timestamp
                WHERE table_name=?
                AND db_name = ?&quot;;
  my $query4 = &quot;INSERT INTO log_update(job_name, table_name, from_date, to_date, rows_deleted, rows_inserted)
                VALUES (&apos;mssql_2_pg&apos;, ?, ?, ?, ?, ?)&quot;;
  my $query5 = &quot;TRUNCATE TABLE $table_name_loading&quot;;
#DBI-&amp;gt;trace(1);

  # The queries/SQL are wrapped into an EVAL because we
  # expect that these queries MAY fail due to duplicate Primary Keys
  # Note that these are all running as 1 transaction. If anyone failed,
  # we will call the errorhandler, rollback the changes, send email
  # and quit
  eval {
    print &quot;Executing CLEANUP\n&quot; if ($verbose);
    $sth_pg = $dbh_pg-&amp;gt;prepare($query0) or die &quot;prepare failed $DBI::errstr&quot;;
    $sth_pg-&amp;gt;execute();

    print &quot;Executing DELETE\n&quot; if ($verbose);
    #$sth_pg = $dbh_pg-&amp;gt;prepare($query1) or die &quot;prepare failed $DBI::errstr&quot;;
    #$sth_pg-&amp;gt;execute();
&lt;i&gt;&lt;b&gt;    my $del_rows = $dbh_pg-&amp;gt;do($query1) or die &quot;delete failed $DBI::errstr&quot;;
    if ($del_rows != 0)
    {
      print &quot;Number of rows deleted: &quot; . $del_rows . &quot;\n&quot;;
    } else {
      $del_rows = 0;   # normalise DBI&apos;s 0E0 to a plain 0
    }
&lt;/b&gt;&lt;/i&gt;
    print &quot;Executing INSERT\n&quot; if ($verbose);
    #$sth_pg = $dbh_pg-&amp;gt;prepare($query2) or die &quot;prepare failed $DBI::errstr&quot;;
    #$sth_pg-&amp;gt;execute();
    my $ins_rows = $dbh_pg-&amp;gt;do($query2) or die &quot;insert failed $DBI::errstr&quot;;
    if ($ins_rows != 0)
    {
      print &quot;Number of rows inserted: &quot; . $ins_rows . &quot;\n&quot;;
    } else {
      $ins_rows = 0;   # normalise DBI&apos;s 0E0 to a plain 0
    }

    print &quot;Executing UPDATE\n&quot; if ($verbose);
    $sth_pg = $dbh_pg-&amp;gt;prepare($query3) or die &quot;prepare failed $DBI::errstr&quot;;
    $sth_pg-&amp;gt;execute($to_datetime, $table_name, $mssql_default_db);

    print &quot;Executing INSERT INTO LOG\n&quot; if ($verbose);
    $sth_pg = $dbh_pg-&amp;gt;prepare($query4) or die &quot;prepare failed $DBI::errstr&quot;;
    $sth_pg-&amp;gt;execute($table_name, $from_datetime, $to_datetime, $del_rows, $ins_rows);

    print &quot;Executing TRUNCATE\n&quot; if ($verbose);
    $sth_pg = $dbh_pg-&amp;gt;prepare($query5) or die &quot;prepare failed $DBI::errstr&quot;;
    $sth_pg-&amp;gt;execute();
  };

  errorhandler(“dbh_pg”);
#  $dbh_pg-&amp;gt;rollback;
# If we got this far, then we commit the transaction
$dbh_pg-&amp;gt;commit;
}
&lt;/pre&gt;&lt;/em&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;The portion of the code highlighted above was needed because, in the log_update table, I’ve defined rows_inserted and rows_deleted as integers, and Perl DBI, in its wisdom, returns &lt;b&gt;0E0&lt;/b&gt; when 0 (zero) rows are affected by the query. (0E0 is DBI’s “zero but true” value: it is numerically zero but still evaluates as true, so that “do(...) or die” does not die on a successful statement that touched no rows.) Hence those lines were added to ensure that 0 is written instead of 0E0.</description>
  <comments>http://lotso.livejournal.com/103389.html</comments>
  <category>linux</category>
  <category>sql</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/103128.html</guid>
  <pubDate>Sat, 03 Nov 2007 08:47:53 GMT</pubDate>
  <title>SQL - Coolness Factor X</title>
  <link>http://lotso.livejournal.com/103128.html</link>
  <description>This is really cool and I can already see some &lt;s&gt;nasty &lt;/s&gt; nifty stuff to do with this nugget.&lt;br /&gt;&lt;br /&gt;My objective is to see if I can do some kind of DB link from within &lt;a href=&quot;http://www.postgresql.org&quot; rel=&quot;nofollow&quot;&gt; postgresql &lt;/a&gt;  to a Microsoft(tm) SQL Server instance.&lt;br /&gt;&lt;br /&gt;There are a few ways to implement this, namely&lt;br /&gt;&lt;br /&gt;1. &lt;a href=&quot;http://pgfoundry.org/projects/dbi-link/&quot; rel=&quot;nofollow&quot;&gt;dbi-link&lt;/a&gt;&lt;br /&gt;2. &lt;a href=&quot;http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/dblink/&quot; rel=&quot;nofollow&quot;&gt;dblink&lt;/a&gt;&lt;br /&gt;3. &lt;a href=&quot;http://pgfoundry.org/projects/dblink-tds/&quot; rel=&quot;nofollow&quot;&gt;dblink-tds&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;In this post, I will talk about option 3, which is basically a way to access an MSSQL or Sybase DB instance from within PG.&lt;br /&gt;&lt;br /&gt;The install process is quite simple (at least on Gentoo; I’ve yet to determine how to go about installing it on CentOS, which is the deployment/target server).&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;em&gt;&lt;pre&gt;
make
make install

       query
        ------------
        dblink_tds(text, text, text, text) RETURNS setof record
                - returns a set of results from remote query (can be any kind of SQL query);
                - arguments are:
                - 1 text: SQL command string
                - 2 text: Name of the server entry in freetds.conf
                - 3 text: Username used to connect to MS SQL
                - 4 text: Password used to connect to MS SQL

        dblink_tds(text, text, text, text, int) RETURNS setof record
                - returns a set of results from remote query (can be any kind of SQL query);
                - arguments are:
                - 1 text: SQL command string
                - 2 text: Name of the server entry in freetds.conf
                - 3 text: Username used to connect to MS SQL
                - 4 text: Password used to connect to MS SQL
                - 5 int: Port number used to connect to MS SQL (default is 1433)

        dblink_tds(text, text, text, text, int, text) RETURNS setof record
                - returns a set of results from remote query (can be any kind of SQL query);
                - arguments are:
                - 1 text: SQL command string
                - 2 text: Name of the server entry in freetds.conf
                - 3 text: Username used to connect to MS SQL
                - 4 text: Password used to connect to MS SQL
                - 5 int: Port number used to connect to MS SQL
                - 6 text: Complete path to freetds.conf file (default is /etc/freetds/freetds.conf)
&lt;/code&gt;&lt;/em&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;The only drawback of doing a source install rather than using an ebuild is that the default install location is /usr/local/pgsql, but my postgresql library location is /usr/lib/pgsql. Thus, I made a symbolic link as a hack.&lt;br /&gt;&lt;br /&gt;Anyway, post installation, I did a couple of tests and I’m happy with the results.&lt;br /&gt;&lt;br /&gt;One example usage is&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;em&gt;&lt;pre&gt;
SELECT 
a.id, a.famid, a.dcm, b.supplier_name 
FROM d_part a
INNER JOIN (
SELECT * 
FROM dblink_tds($$select famid, dcm, supplier_name 
    FROM database.dbo.supplier_lookup b$$,$$NeuroXP$$,$$sa$$,$$11111$$) AS 
(famid text,dcm text, supplier_name text)
) b
on a.famid = b.famid
and a.dcm = b.dcm
WHERE batchid = &apos;2002&apos;;
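
-- A sketch, not part of the original post: wrapping the remote call in a view
-- keeps the column definition list in one place. The view name
-- supplier_lookup_remote is hypothetical; the dblink_tds() arguments are the
-- ones used in the query above. Note that the MSSQL password ends up stored
-- in the view definition.
CREATE VIEW supplier_lookup_remote AS
SELECT *
FROM dblink_tds($$select famid, dcm, supplier_name
    FROM database.dbo.supplier_lookup$$,$$NeuroXP$$,$$sa$$,$$11111$$)
AS t(famid text, dcm text, supplier_name text);

-- The join then needs no AS (...) clause of its own:
SELECT a.id, a.famid, a.dcm, b.supplier_name
FROM d_part a
INNER JOIN supplier_lookup_remote b
ON a.famid = b.famid AND a.dcm = b.dcm
WHERE batchid = &apos;2002&apos;;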

&lt;/code&gt;&lt;/em&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Take note that you have to include the AS (...) column definition list, else it’ll spew out errors relating to the wrong record type. (The function returns setof record, so PostgreSQL has to be told the column names and types.)&lt;br /&gt;&lt;br /&gt;My freetds version is 0.64 and this is what I’ve placed in freetds.conf as the definition of the connection&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;em&gt;&lt;pre&gt;
[NeuroXP]
        host = 172.16.124.128
        port = 1433
        tds version = 8.0

&lt;/code&gt;&lt;/em&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Thanks to davidfetter in #postgresql again</description>
  <comments>http://lotso.livejournal.com/103128.html</comments>
  <category>linux</category>
  <category>sql</category>
  <lj:security>public</lj:security>
  <lj:reply-count>1</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/102867.html</guid>
  <pubDate>Sat, 27 Oct 2007 06:26:54 GMT</pubDate>
  <title>SQL - Error!! Take Evasive Action (and not the other way around)</title>
  <link>http://lotso.livejournal.com/102867.html</link>
  <description>The more I look into the previous d_refresh(tblname text) function, the more I see that the &lt;a href=&quot;http://lotso.livejournal.com/102351.html&quot; rel=&quot;nofollow&quot;&gt;checks for duplicates&lt;/a&gt; are taking up a whole chunk of server/CPU/disk time. (Most important is disk IO time: even though the time spent there is just seconds, it is seconds too long to check for duplicates IF there aren’t any to begin with!)&lt;br /&gt;&lt;br /&gt;So, I set out to make the function better. &lt;br /&gt;&lt;br /&gt;In this case, I chose to NOT follow the adage, fix it &lt;i&gt;&lt;u&gt;&lt;b&gt;before&lt;/b&gt;&lt;/u&gt;&lt;/i&gt; it breaks. This time around, I chose to &lt;b&gt;“Fix it IF and ONLY if it breaks”&lt;/b&gt;. This will be a needed relief for the server.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;So, I broke down the previous function and included an EXCEPTION check for unique_violation. (I was looking through the &lt;a href=&quot;http://www.postgresql.org/docs/8.2/interactive/errcodes-appendix.html&quot; rel=&quot;nofollow&quot;&gt;Postgres Docs&lt;/a&gt; for error codes for duplicate primary key issues but couldn’t find any; at that time, I didn’t know that unique_violation == duplicate primary key until I saw this from &lt;a href=&quot;http://www.varlena.com/GeneralBits/106.php&quot; rel=&quot;nofollow&quot;&gt;Varlena.com&lt;/a&gt;)&lt;br /&gt;&lt;br /&gt;Hence, currently the function is broken down into 2 different segments. &lt;br /&gt;&lt;br /&gt;1. Normal Insertion, (delete/insert)&lt;br /&gt;2. Delete duplicate Primary key. (this is different from #1 in ways which I can’t explain properly in writing w/o much effort. So, just trust me, it’s different.) &lt;br /&gt;&lt;br /&gt;&lt;u&gt;Function 1.&lt;/u&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;em&gt;&lt;code&gt;
CREATE OR REPLACE FUNCTION d_refresh(job_tblname text)
  RETURNS void AS
$BODY$

DECLARE
last_r timestamp;
r_interval interval;
del_qry text;
ins_qry text;
del_stime timestamp;
del_etime timestamp;
ins_stime timestamp;
ins_etime timestamp;
max_time timestamp;
del_rows integer; 
ins_rows integer; 

tblname text;
del_job_tblname text := job_tblname || &apos;_delete&apos;;


BEGIN
  -- Look the job up by its job name, the same way d_refresh_delete does
  SELECT table_name, last_refreshed, refresh_interval, sql_delete, sql_insert 
  INTO tblname, last_r, r_interval, del_qry, ins_qry
  FROM d_log 
  WHERE job_table_name = job_tblname;

select last_sync 
into max_time
from sync_log where table_name = tblname;

IF (last_r+r_interval) &amp;lt; max_time THEN

ins_qry := replace(ins_qry,&apos;fromdate&apos;,quote_literal(last_r));
ins_qry := replace(ins_qry,&apos;todate&apos;,quote_literal(last_r+r_interval));

del_qry := replace(del_qry,&apos;fromdate&apos;,quote_literal(last_r));
del_qry := replace(del_qry,&apos;todate&apos;,quote_literal(last_r+r_interval));

   del_stime := timeofday();
    execute del_qry;
    del_etime := timeofday();

    GET DIAGNOSTICS del_rows = ROW_COUNT;
    

    ins_stime := timeofday();
        BEGIN
	  execute ins_qry;
     	  EXCEPTION WHEN UNIQUE_VIOLATION THEN
	    -- d_refresh_delete() is a function, not a query string, so call it with PERFORM
	    PERFORM d_refresh_delete(del_job_tblname);
	    execute ins_qry;
	END;
    ins_etime := timeofday();
 
    GET DIAGNOSTICS ins_rows = ROW_COUNT;

  UPDATE d_log 
  SET last_refreshed = last_r + r_interval,
    record_update_date_time =  now(),
    delete_time = del_etime - del_stime,
    insert_time = ins_etime - ins_stime,
    rows_deleted = del_rows,
    rows_inserted = ins_rows
    WHERE job_table_name = job_tblname;
END IF;

RETURN;
END;
$BODY$
  LANGUAGE &apos;plpgsql&apos; VOLATILE;
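
-- A minimal standalone sketch of the "fix it only if it breaks" pattern used
-- above, not taken from the original post. demo_t and demo_insert are
-- hypothetical names. When the INSERT raises unique_violation, the work done
-- inside the block is rolled back and the handler runs instead.
CREATE TABLE demo_t (id integer PRIMARY KEY);

CREATE OR REPLACE FUNCTION demo_insert(new_id integer) RETURNS text AS $$
BEGIN
    INSERT INTO demo_t (id) VALUES (new_id);
    RETURN &apos;inserted&apos;;
EXCEPTION WHEN unique_violation THEN
    -- Only reached when the key already exists: clean up, then retry once.
    DELETE FROM demo_t WHERE id = new_id;
    INSERT INTO demo_t (id) VALUES (new_id);
    RETURN &apos;replaced&apos;;
END;
$$ LANGUAGE plpgsql;

SELECT demo_insert(1);  -- no duplicate yet: returns inserted
SELECT demo_insert(1);  -- duplicate key: handler runs, returns replaced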
&lt;/pre&gt;&lt;/em&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;u&gt;Function 2&lt;/u&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;em&gt;&lt;code&gt;
CREATE OR REPLACE FUNCTION d_refresh_delete(job_tblname text)
  RETURNS void AS
$BODY$

DECLARE
last_r timestamp;
r_interval interval;
del_qry text;
ins_qry text;
del_stime timestamp;
del_etime timestamp;
ins_stime timestamp;
ins_etime timestamp;
max_time timestamp;
del_rows integer; 
ins_rows integer; 

tblname text;
base_tblname text := replace(job_tblname,&apos;_delete&apos;,&apos;&apos;);

BEGIN
   
   SELECT table_name, sql_delete, sql_insert 
  INTO tblname, del_qry, ins_qry
  FROM d_log 
  WHERE job_table_name = job_tblname;

  SELECT last_refreshed, refresh_interval
  INTO last_r, r_interval
  FROM d_log 
  WHERE job_table_name = base_tblname;


    ins_qry := replace(ins_qry,&apos;fromdate&apos;,quote_literal(last_r));
    ins_qry := replace(ins_qry,&apos;todate&apos;,quote_literal(last_r+r_interval));

    del_qry := replace(del_qry,&apos;fromdate&apos;,quote_literal(last_r));
    del_qry := replace(del_qry,&apos;todate&apos;,quote_literal(last_r+r_interval));

    del_stime := timeofday();
    execute del_qry;
    del_etime := timeofday();
    GET DIAGNOSTICS del_rows = ROW_COUNT;

    ins_stime := timeofday();
    execute ins_qry;
    ins_etime := timeofday();
    GET DIAGNOSTICS ins_rows = ROW_COUNT;

    UPDATE d_log 
    SET last_refreshed = last_r + r_interval,
    record_update_date_time =  now(),
    delete_time = del_etime - del_stime,
    insert_time = ins_etime - ins_stime,
    rows_deleted = del_rows,
    rows_inserted = ins_rows
    WHERE job_table_name = job_tblname;

RETURN;
END;
$BODY$
  LANGUAGE &apos;plpgsql&apos; VOLATILE;
&lt;/pre&gt;&lt;/em&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;So.. when I call d_refresh(‘table_name’)&lt;br /&gt;and a duplicate primary key condition occurs, it will call d_refresh_delete(‘table_name_delete’), i.e. with “_delete” appended to the table name.&lt;br /&gt;&lt;br /&gt;The d_refresh_delete function first strips the “_delete” from the input table name and uses that to obtain last_refreshed and refresh_interval from the base table’s entry, so that it works on the same date/time interval in which the error occurred. It then moves the duplicates into a separate duplicates table for further review.&lt;br /&gt;&lt;br /&gt;Once finished, it returns control to the originating function, which retries the insert and continues with the normal flow.&lt;br /&gt;&lt;br /&gt;I’m pretty happy with this as it greatly reduces the amount of disk IO and server time. Don’t do useless stuff, right?&lt;br /&gt;&lt;br /&gt;Plenty Cool.</description>
  <comments>http://lotso.livejournal.com/102867.html</comments>
  <category>linux</category>
  <category>sql</category>
  <lj:security>public</lj:security>
  <lj:reply-count>1</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/102440.html</guid>
  <pubDate>Sat, 20 Oct 2007 15:14:59 GMT</pubDate>
  <title>SQL - My plpgSQL Function.. checking for maxtime</title>
  <link>http://lotso.livejournal.com/102440.html</link>
  <description>This is just an update on the &lt;a href=&quot;http://lotso.livejournal.com/102114.html&quot; rel=&quot;nofollow&quot;&gt;function d_refresh(tblname text) &lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I wanted to ensure that the function will not run when the last sync time of the DB to the master DB is less than the last_refreshed time + refresh_interval time. So, I used an IF to wrap it up.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;em&gt;&lt;code&gt;&lt;pre&gt;
CREATE OR REPLACE FUNCTION d_refresh(tblname text)
  RETURNS void AS
$BODY$

DECLARE
last_r timestamp;
r_interval interval;
del_qry text;
ins_qry text;
del_stime timestamp;
del_etime timestamp;
ins_stime timestamp;
ins_etime timestamp;
max_time timestamp;

BEGIN
  SELECT last_refreshed, refresh_interval, sql_delete, sql_insert 
  INTO last_r, r_interval, del_qry, ins_qry
  FROM d_log 
  WHERE table_name = tblname;

select last_sync 
into max_time
from sync_log where table_name = tblname;

IF (last_r+r_interval) &amp;lt; max_time THEN

ins_qry := replace(ins_qry,&apos;fromdate&apos;,quote_literal(last_r));
ins_qry := replace(ins_qry,&apos;todate&apos;,quote_literal(last_r+r_interval));

del_qry := replace(del_qry,&apos;fromdate&apos;,quote_literal(last_r));
del_qry := replace(del_qry,&apos;todate&apos;,quote_literal(last_r+r_interval));

  del_stime := timeofday();
  execute del_qry;
  del_etime := timeofday();

  ins_stime := timeofday();
  execute ins_qry;
  ins_etime := timeofday();


  UPDATE d_log 
  SET last_refreshed = last_r + r_interval,
  record_update_date_time =  now(),
  delete_time = del_etime - del_stime,
  insert_time = ins_etime - ins_stime
  WHERE table_name = tblname;
END IF;

RETURN;
END;
$BODY$
  LANGUAGE &apos;plpgsql&apos; VOLATILE;
&lt;/em&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Cool...</description>
  <comments>http://lotso.livejournal.com/102440.html</comments>
  <category>linux</category>
  <category>sql</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/102351.html</guid>
  <pubDate>Sat, 20 Oct 2007 14:42:04 GMT</pubDate>
  <title>SQL - Optimising Delete</title>
  <link>http://lotso.livejournal.com/102351.html</link>
  <description>I admit, my SQL-fu still has to be honed.&lt;br /&gt;&lt;br /&gt;My &lt;s&gt;previous&lt;/s&gt; problem was that the profiling in the previous entry showed that I was spending too much time on the delete portion of the query as opposed to the insert.&lt;br /&gt;&lt;br /&gt;(What I was basically trying to do was just a simple DELETE then INSERT.)&lt;br /&gt;&lt;br /&gt;&lt;em&gt;&lt;code&gt;&lt;pre&gt;
     table_name      |   delete_time   |   insert_time
---------------------+-----------------+-----------------
z                  | 00:01:14.424943 | 00:00:02.622862
&lt;/em&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;the delete query was this..&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;em&gt;&lt;code&gt;&lt;pre&gt;
select * from d_trz where exists(
select 1
from z
where z.record_update_date_time &amp;gt;= &apos;2007-08-08 18:00:00&apos;
and z.record_update_date_time   &amp;lt;  &apos;2007-08-08 18:01:00&apos;
and d_trz.id= z.id 
and d_trz.hid = z.hid 
and d_trz.start_date_time = z.start_date_time 
and d_trz.type = z.type 
and d_trz.phase_id = z.phase_id
)

the explain was giving me this:

Seq Scan on d_trz  (cost=0.00..414862.31 rows=21114 width=1611)
  Filter: (subplan)
  SubPlan
    -&amp;gt;  Index Scan using idx_trz_uptime on z  (cost=0.00..9.71 rows=1 width=0)
          Index Cond: ((record_update_date_time &amp;gt;= &apos;2007-08-08 18:00:00&apos;::timestamp without time zone) AND (record_update_date_time &amp;lt; &apos;2007-08-08 18:01:00&apos;::timestamp without time zone))
          Filter: ((($0)::text = (id)::text) AND ($1 = hid) AND ($2 = start_date_time) AND ($3 = type) AND ($4 = phase_id))
&lt;/em&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Performance really sucked as the number of rows in that table increased.&lt;br /&gt;&lt;br /&gt;So, I went googling, but turned up nothing which would help me optimise the query. So.. Off I went to IRC and asked the question.&lt;br /&gt;...&lt;br /&gt;...&lt;br /&gt;...&lt;br /&gt;&amp;lt;later&amp;gt;&lt;br /&gt;&lt;br /&gt;I got an answer (again) from &lt;a href=&quot;http://a-kretschmer.de/&quot; rel=&quot;nofollow&quot;&gt;akretschmer&lt;/a&gt; to try the query in an alternate way..&lt;br /&gt;&lt;br /&gt;&lt;em&gt;&lt;code&gt;&lt;pre&gt;

select * from d_trz where 
(id, hid, start_date_time, type, phase_id) in 
(select id, hid, start_date_time, type, phase_id from z
where z.record_update_date_time &amp;gt;= &apos;2007-08-08 18:00:00&apos;
and z.record_update_date_time   &amp;lt;  &apos;2007-08-08 18:01:00&apos;) 

Nested Loop  (cost=9.71..18.05 rows=1 width=1611) (actual time=66.683..70.852 rows=82 loops=1)
  -&amp;gt;  HashAggregate  (cost=9.71..9.72 rows=1 width=30) (actual time=66.634..67.047 rows=254 loops=1)
        -&amp;gt;  Index Scan using idx_trz_uptime on z  (cost=0.00..9.70 rows=1 width=30) (actual time=0.107..9.729 rows=5170 loops=1)
              Index Cond: ((record_update_date_time &amp;gt;= &apos;2007-08-08 18:00:00&apos;::timestamp without time zone) AND (record_update_date_time &amp;lt; &apos;2007-08-08 18:01:00&apos;::timestamp without time zone))
  -&amp;gt;  Index Scan using d_trz_pkey on d_trz  (cost=0.00..8.30 rows=1 width=1611) (actual time=0.009..0.010 rows=0 loops=254)
        Index Cond: (((d_trz.id)::text = (z.id)::text) AND (d_trz.hid = z.hid) AND (d_trz.start_date_time = z.start_date_time) AND (d_trz.type = z.type) AND (d_trz.phase_id = z.phase_id))
Total runtime: 71.182 ms
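
-- A sketch, not shown in the original post: the same rewrite applied to the
-- actual DELETE (the post only shows the SELECT form that was used for
-- EXPLAIN). Table and column names are the ones from the queries above.
DELETE FROM d_trz
WHERE (id, hid, start_date_time, type, phase_id) IN (
    SELECT id, hid, start_date_time, type, phase_id
    FROM z
    WHERE z.record_update_date_time &amp;gt;= &apos;2007-08-08 18:00:00&apos;
      AND z.record_update_date_time &amp;lt;  &apos;2007-08-08 18:01:00&apos;
);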
&lt;/em&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Many thanks  again to &lt;a href=&quot;http://a-kretschmer.de/&quot; rel=&quot;nofollow&quot;&gt;akretschmer&lt;/a&gt;</description>
  <comments>http://lotso.livejournal.com/102351.html</comments>
  <category>linux</category>
  <category>sql</category>
  <lj:security>public</lj:security>
  <lj:reply-count>1</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/102114.html</guid>
  <pubDate>Sat, 20 Oct 2007 09:25:55 GMT</pubDate>
  <title>SQL - Profiling a SQL function</title>
  <link>http://lotso.livejournal.com/102114.html</link>
  <description>It’s not really rocket science, but I wanted to know what was taking so long on my function. I’m not sure if it’s the time taken to delete (because the explain plan keeps doing a sequential scan as opposed to an index scan; then again the table is still very small and it’s now only 12K rows)&lt;br /&gt;&lt;br /&gt;So.. I first proceeded to put in some “timers” in the function. However, (since I didn’t know better) I used current_timestamp and now().&lt;br /&gt;&lt;br /&gt;The function looks something like this :-&lt;br /&gt;&lt;br /&gt;&lt;em&gt;&lt;code&gt;&lt;pre&gt;
CREATE OR REPLACE FUNCTION d_refresh(tblname text)
  RETURNS void AS
$BODY$

DECLARE
last_r timestamp;
r_interval interval;
del_qry text;
ins_qry text;
del_stime timestamp;
del_etime timestamp;
ins_stime timestamp;
ins_etime timestamp;

BEGIN
  SELECT last_refreshed, refresh_interval, sql_delete, sql_insert 
  INTO last_r, r_interval, del_qry, ins_qry
  FROM d_log 
  WHERE table_name = tblname;

ins_qry := replace(ins_qry,&apos;fromdate&apos;,quote_literal(last_r));
ins_qry := replace(ins_qry,&apos;todate&apos;,quote_literal(last_r+r_interval));

del_qry := replace(del_qry,&apos;fromdate&apos;,quote_literal(last_r));
del_qry := replace(del_qry,&apos;todate&apos;,quote_literal(last_r+r_interval));

  del_stime := now();
  execute del_qry;
  del_etime := now();

  ins_stime := now();
  execute ins_qry;
  ins_etime := now();


  UPDATE d_log 
  SET last_refreshed = last_r + r_interval,
  record_update_date_time =  now(),
  delete_time = del_etime - del_stime,
  insert_time = ins_etime - ins_stime
  WHERE table_name = tblname;

RETURN;
END;
$BODY$
  LANGUAGE &apos;plpgsql&apos; VOLATILE;
&lt;/em&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;But I then found out that the time returned was actually 0. Now, that most certainly can’t be right since the query (insert/delete) takes approx between 5 to 19 secs.&lt;br /&gt;&lt;br /&gt;I pinged some guys on IRC #postgres and half an hour later, I found out my problem. Note that I used now() above. So, according to &lt;a href=&quot;http://a-kretschmer.de/&quot; rel=&quot;nofollow&quot;&gt;akretschmer&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
[akretschmer] now() and current_timestamp returns the start-timestamp of the current transaction
[akretschmer] timeofday() returns the exact time
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;so.. with that, the above becomes &lt;br /&gt;&lt;br /&gt;&lt;em&gt;&lt;code&gt;&lt;pre&gt;
CREATE OR REPLACE FUNCTION d_refresh(tblname text)
  RETURNS void AS
$BODY$

DECLARE
last_r timestamp;
r_interval interval;
del_qry text;
ins_qry text;
del_stime timestamp;
del_etime timestamp;
ins_stime timestamp;
ins_etime timestamp;

BEGIN
  SELECT last_refreshed, refresh_interval, sql_delete, sql_insert 
  INTO last_r, r_interval, del_qry, ins_qry
  FROM d_log 
  WHERE table_name = tblname;

ins_qry := replace(ins_qry,&apos;fromdate&apos;,quote_literal(last_r));
ins_qry := replace(ins_qry,&apos;todate&apos;,quote_literal(last_r+r_interval));

del_qry := replace(del_qry,&apos;fromdate&apos;,quote_literal(last_r));
del_qry := replace(del_qry,&apos;todate&apos;,quote_literal(last_r+r_interval));

  del_stime := timeofday();
  execute del_qry;
  del_etime := timeofday();

  ins_stime := timeofday();
  execute ins_qry;
  ins_etime := timeofday();


  UPDATE d_log 
  SET last_refreshed = last_r + r_interval,
  record_update_date_time =  now(),
  delete_time = del_etime - del_stime,
  insert_time = ins_etime - ins_stime
  WHERE table_name = tblname;

RETURN;
END;
$BODY$
  LANGUAGE &apos;plpgsql&apos; VOLATILE;
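
-- A minimal sketch, not from the original post: on PostgreSQL 8.2 and later,
-- clock_timestamp() also returns the actual wall-clock time, and it is already
-- a timestamp with time zone, so no conversion from text is involved (as there
-- is with timeofday()). Run inside one transaction, now() stays fixed while
-- clock_timestamp() advances:
BEGIN;
SELECT now() AS txn_start, clock_timestamp() AS wall_clock;
SELECT pg_sleep(1);
SELECT now() AS txn_start, clock_timestamp() AS wall_clock;
COMMIT;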
&lt;/em&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;and I get &lt;br /&gt;&lt;code&gt;&lt;pre&gt;&lt;em&gt;

MyDB=&amp;gt; select table_name, delete_time, insert_time from denorm_log where 
table_name = &apos;z&apos;;
     table_name      |   delete_time   |   insert_time
-----------------------+-----------------+-----------------
z                         | 00:00:02.105404 | 00:00:08.312243
&lt;/em&gt;&lt;/code&gt;&lt;/pre&gt;</description>
  <comments>http://lotso.livejournal.com/102114.html</comments>
  <category>linux</category>
  <category>sql</category>
  <lj:security>public</lj:security>
  <lj:reply-count>3</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/101633.html</guid>
  <pubDate>Sun, 07 Oct 2007 13:20:40 GMT</pubDate>
  <title>string_to_array and getting a function to return table column_names</title>
  <link>http://lotso.livejournal.com/101633.html</link>
  <description>It’s true, I’m slow when it comes to PG in terms of writing plpgsql functions. I blame it on a lack of ample documentation. Seriously, when it comes to PG’s plpgsql documentation, it’s really sparse. I know there’s the official documentation and all, but to me, it’s not nearly as useful because it lacks examples and that sucks in a truly big way.&lt;br /&gt;&lt;br /&gt;The only book I can find on PG is (what I have) a 1st edition of PostgreSQL by Korry Douglas and Susan Douglas. And even that has perhaps a little over 30 pages’ worth of material on plpgsql.&lt;br /&gt;&lt;br /&gt;In the Microsoft world, the MySQL world and the Oracle world, there are tons of books on their stored procedures. I myself have a few of those in my cupboard.&lt;br /&gt;&lt;br /&gt;Anyway, I digress. The point of this post is to show how to get all the column names of a particular table returned as a single string. I can then use that string in perl-dbi functions that pull from mssql-&amp;gt;pgsql automatically, so that I don’t have to rewrite the query every time columns are added to the tables. This will make maintenance easier.&lt;br /&gt;&lt;br /&gt;There are 2 ways of doing this. I went with the more complicated way 1st (mainly because I didn’t know there was a simpler way) and I was banging my head against the wall for the better part of the day. (oh.. I did take an afternoon nap for 3 hours when my head hurt enough.)&lt;br /&gt;&lt;br /&gt;Here’s the 1st one.&lt;br /&gt;&lt;pre&gt;&lt;em&gt;&lt;code&gt;

CREATE OR REPLACE FUNCTION select_columns(tablename text) RETURNS text as $$
DECLARE
   sql_str text;
   qry text;
BEGIN
	for sql_str in 
        select attname from pg_class 
	join pg_attribute 
	on pg_class.oid = pg_attribute.attrelid
	join pg_namespace
	on pg_namespace.oid = pg_class.relnamespace
	where relname = tablename
	and nspname = &apos;myschema&apos;
	and attnum &amp;gt; 0
	LOOP
	qry := coalesce(qry || &apos;,&apos;,&apos;&apos;) || sql_str;
	END LOOP;

   RETURN qry;
END;
$$
LANGUAGE plpgsql;
&lt;/pre&gt;&lt;/em&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;the &lt;br /&gt;&lt;pre&gt;&lt;em&gt;&lt;code&gt;
qry := coalesce(qry || &apos;,&apos;,&apos;&apos;) || sql_str;
&lt;/pre&gt;&lt;/em&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;is cool because it automatically strips out the comma from (before) the 1st column_name. (because in the original declaration, the qry is NULL)&lt;br /&gt;&lt;br /&gt;This trick wouldn’t work if we had originally declared qry as &lt;br /&gt;&lt;pre&gt;&lt;em&gt;&lt;code&gt;
qry text:=&apos;&apos;;
&lt;/pre&gt;&lt;/em&gt;&lt;/code&gt;&lt;br /&gt;which makes qry not NULL&lt;br /&gt;&lt;br /&gt;2nd method is definitely simpler and easier to be used and understood.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;em&gt;&lt;code&gt;
SELECT array_to_string(array(SELECT attname FROM pg_class
JOIN pg_attribute
ON pg_class.oid = pg_attribute.attrelid
JOIN pg_namespace
ON pg_namespace.oid = pg_class.relnamespace
WHERE relname = tablename
AND nspname = &apos;myschema&apos;
AND attnum &amp;gt; 0), &apos;,&apos;);
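
-- A sketch, not from the original post. Two caveats with the catalog query
-- above: pg_attribute also lists dropped columns (filter them out with
-- NOT attisdropped), and without an ORDER BY the column order is not
-- guaranteed. &apos;mytable&apos; is a placeholder for the table name.
SELECT array_to_string(array(SELECT attname::text FROM pg_class
JOIN pg_attribute
ON pg_class.oid = pg_attribute.attrelid
JOIN pg_namespace
ON pg_namespace.oid = pg_class.relnamespace
WHERE relname = &apos;mytable&apos;
AND nspname = &apos;myschema&apos;
AND attnum &amp;gt; 0
AND NOT attisdropped
ORDER BY attnum), &apos;,&apos;);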
&lt;/pre&gt;&lt;/em&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;I’ll be using the 2nd incarnation as my choice of query string.&lt;br /&gt;&lt;br /&gt;Thanks to depesz in #Postgresql IRC for the pointers.</description>
  <comments>http://lotso.livejournal.com/101633.html</comments>
  <category>linux</category>
  <category>sql</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/101134.html</guid>
  <pubDate>Sun, 23 Sep 2007 04:14:18 GMT</pubDate>
  <title>The Mantra of Successful BI</title>
  <link>http://lotso.livejournal.com/101134.html</link>
  <description>These days, my focus is geared towards learning and making SQL and in the past, I’ve tried to replicate from mssql to mysql and I wasn’t too successful in doing that due to various reasons.&lt;br /&gt;&lt;br /&gt;However, now I’m learning and playing more and more with postgresql and I’m getting more and more impressed with it as a Database. While the setup differs between the company’s mssql server and mssql in terms of the number of columns (I’m using PG more like a datamart as opposed to a data warehouse), I’m also limited to lesser hardware w/ only a 2G celeron w/ 1G ram to be deployed on.&lt;br /&gt;&lt;br /&gt;Anyway, that’s not the point of this post. The point of this post is to present to the larger community what makes a good BI (Business Intelligence) tool. BI is a hot topic these days, what with everyone digging and mining through terabytes of raw or summarised data in search of the golden nugget. (I just read in the papers today that the US Dept of Homeland Security is collecting and mining data on US residents to determine suspicious behaviour.)&lt;br /&gt;&lt;br /&gt;So, what makes a good BI tool? Point and Click? Drag and Drop? (yet another) New interface?&lt;br /&gt;&lt;br /&gt;Let me tell you a story about what makes a good BI tool and something which (nearly) everyone can use w/o much re-learning. It’s a ubiquitous tool and though I’m largely a power user of it, I’m not recommending its use if you can utilise some other FOSS based versions. There’s nothing much to learn due to its ubiquitous nature and its pervasiveness.&lt;br /&gt;&lt;br /&gt;You get your drag and drop and you get your point and click as well. Its interface is user friendly and it provides a familiar surrounding to your users.&lt;br /&gt;&lt;br /&gt;The tool, which I use and which I wrote macros for, is called “EXCEL” (tm). Yep.. 
I (or rather we) use Microsoft(tm) Excel(tm) as a front end to the DB to obtain and to slice &amp; dice the data and it’s working well.&lt;br /&gt;&lt;br /&gt;Why Excel? I know there are lots of technical and non-technical reasons not to use Excel, but that’s beside the point. When you’re given lemons, you have to use them to make lemonade, right? So, that was the situation provided to me and I grabbed it by its horns and made it work. &lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;u&gt;Adapt or die.&lt;/u&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I’ve been reading &lt;a href=&quot;http://andyonenterprisesoftware.com/2007/07/the-price-of-failure/&quot; rel=&quot;nofollow&quot;&gt;Andy&lt;/a&gt; and his opinion in “The Price Of Failure” on the comment by Madan Sheina on the failure of BI projects.&lt;br /&gt;&lt;br /&gt;I especially like, and I quote, Andy’s point #3.&lt;br /&gt;&lt;quote&gt;&lt;em&gt;&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;3. “Just one more new user interface” is not what the customer wants to hear. “Most are familiar with Excel and are not willing to change their business experience” was one quote from a customer in the article. Spot on! Why should a customer whose main job is, after all, not IT but something in the business, have to learn a different tool just to get access to data that he or she needs? Some tool vendors have done a good job of integrating with Excel, and yet are often in denial about this since they view their proprietary interface as a key competitive weapon against other vendors. Customers don’t care about this; they just want to get at the data they need to do their job on an easy and timely way. Hence a BI project should, if at all possible, look at ways of allowing users to getting data into their familiar Excel rather than foisting new interfaces on them. 
A few analyst types will be prepared to learn a new tool, but this is only a small subset of the audience for a BI project, likely 10% or less.&lt;br /&gt;&lt;br /&gt;&lt;/quote&gt;&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Like it or not, spreadsheets are here to stay.&lt;br /&gt;Spreadsheets in any form, gnumeric, openoffice calc, excel, koffice anything which can connect to a DB and retrieve data is your greatest BI tool.&lt;br /&gt;&lt;br /&gt;&lt;a href=&quot;http://searchcrm.techtarget.com/columnItem/0,294698,sid11_gci1081869,00.html&quot; rel=&quot;nofollow&quot;&gt;Rick Sherman, in a 2005 article wrote&lt;/a&gt; :&lt;br /&gt;&lt;quote&gt;&lt;em&gt;&lt;br /&gt;...&lt;br /&gt;For many years BI vendors have been building front-end tools to try to replace spreadsheets for querying, reporting and analyzing data results. But despite the fact that tens of thousands of BI tool licenses have been sold, spreadsheets are still the most pervasive and dominant tool.&lt;br /&gt;...&lt;br /&gt;&lt;/quote&gt;&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;Forrester even went to publish a whitepaper entitled “Ouch! Get Ready - Spreadsheets are Here to Stay for Business Intelligence” which can be downloaded &lt;a href=&quot;http://web1.forrester.com/forr/reg/campaignlogin.jsp?lr=/Marketing/Campaign2/1,6538,909,00.html&amp;amp;RegistrationID=1-BI38PD&amp;amp;regmode=marketingtrial&amp;amp;iCampaignID=909&quot; rel=&quot;nofollow&quot;&gt;here for free&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;quote&gt;&lt;em&gt;&lt;br /&gt;...&lt;br /&gt;“Spreadsheets — the most widely used business intelligence (BI) tool — are a permanent fixture in enterprises because no other analytical application outperforms them in flexibility, ease of use, and ubiquity. Spreadsheets’ role in BI is no longer limited to simple import/export mechanisms; they now play an integral role in all layers of the BI stack. Yet the lack of controls and security and integrity issues create tremendous challenges. 
To minimize risks while gaining the inherent BI value of spreadsheets, information and knowledge management professionals must discriminate between the different ways spreadsheets are used. Then, they must help users apply advanced spreadsheet tools and techniques to their daily jobs, while also implementing a tightly controlled (or closely monitored) environment for critical production processes that rely on spreadsheet data. In turn, vendors should take advantage of this market opportunity by introducing tools that will bridge the gap between spreadsheet management and spreadsheet usage in the BI process.”&lt;br /&gt;...&lt;br /&gt;&lt;/em&gt;&lt;/quote&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The paper from Information Builders titled “Worst Practices in BI”, which can be downloaded from &lt;a href=&quot;http://www.b-eye-network.com/files/2007%20Information%20Builders%20Worst%20Practices%20in%20BI%20WP.pdf&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;, has this anecdote by Ralph Kimball:&lt;br /&gt;&lt;br /&gt;&lt;quote&gt;&lt;em&gt;&lt;br /&gt; “The majority of the user base likely will access the data via pre-built parameter-driven analytic applications. Approximately 90 to 95 percent of the potential users will be served by these canned applications that are essentially finished templates that do not require users to construct relational queries directly.”&lt;br /&gt;&lt;/em&gt;&lt;/quote&gt;&lt;br /&gt;&lt;br /&gt;BI tools need to satisfy the needs of the main business users, and they need to provide this data in a timely fashion to gain the most from the “time to market” and to maintain competitiveness. &lt;br /&gt;&lt;br /&gt;However, usage of spreadsheets is also a bane because of “operator error”, as I will put it. One way to ensure that data is not calculated wrongly is to ensure that users/operators need not apply formulas and other manual calculations themselves and just use the data as is. 
Excel or any other spreadsheet only provides the means (UI) for getting the data from the data mart/warehouse for pivot tables.&lt;br /&gt;&lt;br /&gt;One thing which I’ve yet to work out is how to do what WebFOCUS does (you’ll have to refer to figures 2 &amp; 3 of the report to understand), which is to also put the Excel calculations into the spreadsheet when the data is exported.&lt;br /&gt;&lt;br /&gt;That is a cool feature.&lt;br /&gt;&lt;br /&gt;So.. my friends, what BI tools are being used extensively in your environment? And are your users using them properly? I know that I didn’t use the company’s new BI tool (actually, there was none since they pulled the plug on Business Objects) and asked users to render SQL themselves.</description>
  <comments>http://lotso.livejournal.com/101134.html</comments>
  <category>gentoo</category>
  <category>sql</category>
  <category>rants</category>
  <lj:security>public</lj:security>
  <lj:reply-count>6</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/100875.html</guid>
  <pubDate>Tue, 04 Sep 2007 14:09:53 GMT</pubDate>
  <title>The (Partial) Ultimate Stress Test</title>
  <link>http://lotso.livejournal.com/100875.html</link>
  <description>So.. I had a request to provide some production level data. Normally, when these sorts of requests to look at production data come in, we either balk or cringe at the very thought of having to sift through tons of data. But the worst of the prospects isn’t actually the amount of data returned or having to look at it. It’s more about having to &lt;b&gt;wait&lt;/b&gt; for it to be returned!!&lt;br /&gt;&lt;br /&gt;In my world(tm), pulling production data is a nightmare and the IS/IT folks will scream your head off if you try to pull 1 week’s worth of data from the SQL Server instance.&lt;br /&gt;&lt;br /&gt;We’ve been forewarned to only pull, at a maximum, a couple of hours of data per pull. This means that to get even a rough semblance of what the data looks like over a period of 1 week, we have to go through multiple pulls or jump through hoops to get to that data.&lt;br /&gt;&lt;br /&gt;Not today... NO... Not today.. Today seems to be my lucky day. As some of you may already know, I’ve been playing around with Perl DBI and PostgreSQL specifically. I wrote a Perl script to pull that data in little chunks and transform/load it into my very own PostgreSQL instance on my (I wouldn’t call it trusty, but.. it’s what I’ve got) laptop which, keep this in mind, runs on a meagre 5400rpm 2.5in drive sitting on a 1.4GHz (single core) Centrino w/ 1.5GB of RAM, vs the SQL Server box which has like &amp;gt; 3 GB of RAM and 8 cores of processing power w/ 10K RPM SATA drives.&lt;br /&gt;&lt;br /&gt;Anyway, I am impressed with PostgreSQL’s ability to churn out 48,000 rows worth of datapoints for a period of 2 weeks in ~4 mins, compared with SQL Server 
(we sometimes leave it running overnight and when we get into the office in the morning, it’ll &lt;b&gt;still&lt;/b&gt; be running!).&lt;br /&gt;&lt;br /&gt;Total Runtime : 245 044.492 milliseconds = 4.08407487 minutes&lt;br /&gt;&lt;br /&gt;Note : This is not “toy” data I’m playing around with. This is the entire ETL’ed data pulled from the SQL Server and loaded into my laptop for development purposes. That’s between 2 million and 8 million rows of &lt;i&gt;&lt;b&gt;real&lt;/b&gt;&lt;/i&gt; data.&lt;br /&gt;&lt;br /&gt;Take THAT!!&lt;br /&gt;&lt;br /&gt;So, yes. I am proud and I’m feeling quite &lt;b&gt;smug&lt;/b&gt;!!</description>
  <comments>http://lotso.livejournal.com/100875.html</comments>
  <category>sql</category>
  <lj:mood>accomplished</lj:mood>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/100390.html</guid>
  <pubDate>Sun, 02 Sep 2007 08:59:59 GMT</pubDate>
  <title>I AM Impressed!! Seriously I am!!</title>
  <link>http://lotso.livejournal.com/100390.html</link>
  <description>Seriously man, this ROCKS!&lt;br /&gt;&lt;br /&gt;There are 3300 unique rows in one table (A), out of 1.5 million, whose data I want to look up in another table. This other table (B) has ~7.5 million rows, and I want to search it for only these 3300 rows and return the result, denormalised using a bunch of CASE statements.&lt;br /&gt;&lt;br /&gt;The results came out in like 44 secs!! BEAT that man.. SQL Server takes a “much” longer time than this. How long? I don’t know.. I don’t have the same data in a SQL Server instance on my laptop. (On my laptop - 5400rpm drive!!) For one, it will definitely be longer than 1 min.</description>
  <comments>http://lotso.livejournal.com/100390.html</comments>
  <category>linux</category>
  <category>sql</category>
  <lj:security>public</lj:security>
  <lj:reply-count>2</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/100134.html</guid>
  <pubDate>Sun, 02 Sep 2007 08:51:31 GMT</pubDate>
  <title>Aliasing Log10 to point to Log</title>
  <link>http://lotso.livejournal.com/100134.html</link>
  <description>Thanks to dennisb/mastermind in IRC #postgres, I was able to create a function which effectively calls LOG when I ask for Log10.&lt;br /&gt;&lt;br /&gt;I needed this function, once again, to maintain compatibility between SQL Server and PostgreSQL.&lt;br /&gt;&lt;br /&gt;In PostgreSQL there’s function overloading, so you need to be careful, else your log and log10 will return different values.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
=&amp;gt;select log(980000000), log10(980000000);
       log       |       log10
-----------------+--------------------
 8.9912260756925 | 8.9912260756924949
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;so the way to do it is to also make one for each data_type. (I made 2, one for numeric and another for float8, also known as double precision)&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
CREATE OR REPLACE FUNCTION xmms.log10(&quot;numeric&quot;)
  RETURNS &quot;numeric&quot; AS
&apos;SELECT log($1);&apos;
  LANGUAGE &apos;sql&apos; IMMUTABLE STRICT;
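
-- A sketch of the second overload mentioned above, for float8 (double precision).
-- This is an assumed reconstruction, not the original code from the post: it
-- mirrors the numeric version and relies on the built-in log(double precision),
-- so each input type gets a log10 that returns that same type.
CREATE OR REPLACE FUNCTION xmms.log10(double precision)
  RETURNS double precision AS
&apos;SELECT log($1);&apos;
  LANGUAGE sql IMMUTABLE STRICT;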
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;IMMUTABLE declares that the function always returns the same result for the same arguments, which lets the planner optimise calls to it.  See &lt;a href=&apos;http://www.postgresql.org/docs/current/static/xfunc-volatility.html&apos; rel=&apos;nofollow&apos;&gt;http://www.postgresql.org/docs/current/static/xfunc-volatility.html&lt;/a&gt; for more info. STRICT is also used so that a null input simply returns null without running the function body.&lt;br /&gt;&lt;br /&gt;STRICT is an alias for “returns null on null input”&lt;br /&gt;&lt;br /&gt;So.. Cool.. works!!</description>
  <comments>http://lotso.livejournal.com/100134.html</comments>
  <category>sql</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/99920.html</guid>
  <pubDate>Sun, 02 Sep 2007 07:41:24 GMT</pubDate>
  <title>Why one uses Index, the other doesnt??</title>
  <link>http://lotso.livejournal.com/99920.html</link>
  <description>&lt;pre&gt;
hmxmms=&amp;gt; explain select a.number, a.code, c, value_1, value_2, case when value_2 in (&apos;0&apos;,&apos;2&apos;,&apos;4&apos;,&apos;6&apos;) then &apos;Up&apos; else &apos;Dn&apos; end as UpDn, zn_2, round(cast(log(zn_2) as numeric),2) as ZnERR from zone_b b inner join a on a.number = b.number where b.serial_number = &apos;95656497&apos; and zn_2 != 0 limit 1000;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..10560.63 rows=1000 width=44)
   -&amp;gt;  Nested Loop  (cost=0.00..11870.15 rows=1124 width=44)
         -&amp;gt;  Index Scan using idx_zone_b on zone_b b  (cost=0.00..2764.41 rows=1124 width=36)
               Index Cond: ((number)::text = &apos;95656497&apos;::text)
               Filter: (zn_2 &amp;lt;&amp;gt; 0::double precision)
         -&amp;gt;  Index Scan using a_pkey on a (cost=0.00..8.07 rows=1 width=24)
               Index Cond: (a.number = (b.number)::bpchar)
(7 rows)

Time: 3.279 ms
hmxmms=&amp;gt; explain select a.number, a.code, c, value_1, value_2, case when value_2 in (&apos;0&apos;,&apos;2&apos;,&apos;4&apos;,&apos;6&apos;) then &apos;Up&apos; else &apos;Dn&apos; end as UpDn, zn_2, round(cast(log(zn_2) as numeric),2) as ZnERR from zone_b b inner join a on a.number = b.number where a.serial_number = &apos;95656497&apos; and zn_2 != 0 limit 1000;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..211653.58 rows=1000 width=44)
   -&amp;gt;  Nested Loop  (cost=0.00..237898.63 rows=1124 width=44)
         -&amp;gt;  Index Scan using a_pkey on a  (cost=0.00..8.49 rows=1 width=24)
               Index Cond: (number = &apos;95656497&apos;::bpchar)
         -&amp;gt;  Seq Scan on zone_b b  (cost=0.00..237862.04 rows=1124 width=36)
               Filter: ((zn_2 &amp;lt;&amp;gt; 0::double precision) AND (&apos;95656497&apos;::bpchar = (_number)::bpchar))
(6 rows)
Time: 3.070 ms
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Why?? Why does the 1st one use an index scan when the 2nd one doesn’t?&lt;br /&gt;&lt;br /&gt;The only difference is a.number vs b.number..&lt;br /&gt;&lt;br /&gt;Update : Hmm.. seems like I missed a very important distinction. The number column’s data_type was different in the 2 tables (hence the ::bpchar casts in the plans), which made the planner behave differently.&lt;br /&gt;&lt;br /&gt;I’ve now converted it such that both are now varchar and everything is peachy!!</description>
  <comments>http://lotso.livejournal.com/99920.html</comments>
  <category>sql</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/99686.html</guid>
  <pubDate>Sun, 02 Sep 2007 07:35:28 GMT</pubDate>
  <title>use(less) info of the day #1</title>
  <link>http://lotso.livejournal.com/99686.html</link>
  <description>Did you know that in Postgres, the ROUND() function only comes in 2 forms? The one-argument form takes numeric or double precision, but the two-argument form, ROUND(value, places), only takes a numeric value and an integer number of places, so a double precision value has to be cast to numeric first.&lt;br /&gt;&lt;br /&gt;&lt;a href=&apos;http://www.postgresql.org/docs/8.1/static/typeconv-func.html&apos; rel=&apos;nofollow&apos;&gt;http://www.postgresql.org/docs/8.1/static/typeconv-func.html&lt;/a&gt;</description>
  <comments>http://lotso.livejournal.com/99686.html</comments>
  <category>sql</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/99549.html</guid>
  <pubDate>Sun, 02 Sep 2007 03:01:31 GMT</pubDate>
  <title>Porting UDF from SQL Server into PostgreSQL</title>
  <link>http://lotso.livejournal.com/99549.html</link>
  <description>It turned out even better than I had hoped. It didn’t take me too long to convert a simple function from SQL Server to PostgreSQL.&lt;br /&gt;&lt;br /&gt;This function basically returns a substring from a given number.&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;pre&gt;
create or replace function dbo.FmtMyNumber(Number varchar(30)) returns varchar(4) as
$$
DECLARE
  RtnNumber varchar(4);
  FmtNumber varchar(100);
BEGIN
  FmtNumber := LTRIM(RTRIM(UPPER(Number)));

  -- Number of Form AA600WP-00LPX0
  IF FmtNumber ~ &apos;A[A-Z,0-9][0-9][0-9][A-Z][A-Z]-[0-9][0-9][A-Z][A-Z,0-9][A-Z][A-Z,0-9]&apos; THEN
    RtnNumber := substring(FmtNumber,3,2);
    return RtnNumber;
  -- Number of Form AA6000WP-00LPX0
  ELSIF FmtNumber ~ &apos;A[A-Z,0-9][0-9][0-9][0-9][A-Z][A-Z]-[0-9][0-9][A-Z][A-Z,0-9][A-Z][A-Z,0-9]&apos; THEN
    RtnNumber := substring(FmtNumber,3,3);
    return RtnNumber;
  ELSE
    return FmtNumber;
  END IF;

END;
$$ language plpgsql;
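
-- A sketch, not from the original post: the same logic written with a single
-- RETURN at the end, in the style of a SQL Server UDF. The function name
-- FmtMyNumberSingleExit is hypothetical; the patterns are copied from above.
-- The result variable is declared wide enough to hold the fall-through value.
create or replace function dbo.FmtMyNumberSingleExit(Number varchar(30)) returns varchar(4) as
$$
DECLARE
  FmtNumber varchar(100) := LTRIM(RTRIM(UPPER(Number)));
  RtnNumber varchar(100) := FmtNumber;  -- default: the cleaned-up input
BEGIN
  IF FmtNumber ~ &apos;A[A-Z,0-9][0-9][0-9][A-Z][A-Z]-[0-9][0-9][A-Z][A-Z,0-9][A-Z][A-Z,0-9]&apos; THEN
    RtnNumber := substring(FmtNumber,3,2);
  ELSIF FmtNumber ~ &apos;A[A-Z,0-9][0-9][0-9][0-9][A-Z][A-Z]-[0-9][0-9][A-Z][A-Z,0-9][A-Z][A-Z,0-9]&apos; THEN
    RtnNumber := substring(FmtNumber,3,3);
  END IF;

  -- Single exit point: both styles are valid in PL/pgSQL.
  RETURN RtnNumber;
END;
$$ language plpgsql;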
&lt;/pre&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Question to folks : in the above there is 1 RETURN per IF branch, whereas in SQL Server the RETURN only comes at the very end of the function, after all the processing. Is there a similar method in PostgreSQL? Or is the above correct??&lt;br /&gt;&lt;br /&gt;Thanks</description>
  <comments>http://lotso.livejournal.com/99549.html</comments>
  <category>sql</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/98309.html</guid>
  <pubDate>Fri, 31 Aug 2007 18:10:32 GMT</pubDate>
  <title>SQL - Good Progress</title>
  <link>http://lotso.livejournal.com/98309.html</link>
  <description>Been playing around with Perl DBI and PostgreSQL of late. Been hacking together a perl script to pull data from a MS SQL Server and getting it imported into PostgreSQL.&lt;br /&gt;&lt;br /&gt;Progress is good.&lt;br /&gt;&lt;br /&gt;check list&lt;br /&gt;1. pull from SQL Server&lt;br /&gt;2. format into CSV suitable for loading into PG&lt;br /&gt;3. loading into PG&lt;br /&gt;4. Error checking and DIEing on error (firing off an email alert - VERY important)&lt;br /&gt;5. getting the sync dates/time logged&lt;br /&gt;6. getting the last sync date/time from server and using that as a base date/time if process was restarted&lt;br /&gt;&lt;br /&gt;the process is nearly complete. What’s left to do :&lt;br /&gt;&lt;br /&gt;1. Port over a UDF from MS SQL Server&lt;br /&gt;2. determine if I want to also put in a sync_interval into PG &lt;br /&gt;3. get some HD space..&lt;br /&gt;4. create a script to de-normalise the data&lt;br /&gt;5. create a script to make it easier to add new columns (to script/DB)&lt;br /&gt;6. edit the Excel front end to include pulling from PG</description>
  <comments>http://lotso.livejournal.com/98309.html</comments>
  <category>sql</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/98295.html</guid>
  <pubDate>Sun, 26 Aug 2007 08:57:05 GMT</pubDate>
  <title>moving schemas</title>
  <link>http://lotso.livejournal.com/98295.html</link>
  <description>Problem statement 1 :&lt;br /&gt;How do I create a new schema and move all existing tables to the new schema?&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;&lt;br /&gt;=&amp;gt;create schema new_schema;&lt;br /&gt;&lt;br /&gt;=&amp;gt;alter table public.table_in_public set schema new_schema;&lt;br /&gt;&lt;br /&gt;&lt;a href=&apos;http://www.postgresql.org/docs/8.2/static/sql-altertable.html&apos; rel=&apos;nofollow&apos;&gt;http://www.postgresql.org/docs/8.2/static/sql-altertable.html&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;=&amp;gt;show search_path;&lt;br /&gt;  search_path&lt;br /&gt;----------------&lt;br /&gt; &quot;$user&quot;,public&lt;br /&gt;&lt;br /&gt; =&amp;gt;\d&lt;br /&gt;No relations found&lt;br /&gt;=&amp;gt; set search_path to new_schema,public;&lt;br /&gt;SET&lt;br /&gt;&lt;br /&gt;=&amp;gt;\d&lt;br /&gt; Schema           | Name | Type  |  Owner&lt;br /&gt;--------------------+-------+-------+----------&lt;br /&gt; new_schema   | foo    | table | operator&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Repeat the ALTER TABLE for each table to move all existing tables out of the public schema into the new schema.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Problem statement 2:&lt;br /&gt;After moving all existing tables out of the public schema, the default search path is still $user,public, which makes all the tables from the old public schema invisible.&lt;br /&gt;&lt;br /&gt;There are ways to set the new search path in each session (e.g. above) but this is not wanted...&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;ALTER ROLE &quot;operator&quot; SET search_path=new_schema, public;&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;which solves the problem.</description>
  <comments>http://lotso.livejournal.com/98295.html</comments>
  <category>sql</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://lotso.livejournal.com/97913.html</guid>
  <pubDate>Sun, 26 Aug 2007 07:41:18 GMT</pubDate>
  <title>DB Size</title>
  <link>http://lotso.livejournal.com/97913.html</link>
  <description>Just playing around with my new DB (postgres) on my laptop.&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;select pg_database_size(&apos;main_main&apos;);&lt;br /&gt; pg_database_size&lt;br /&gt;------------------&lt;br /&gt;       4031489758&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Say again? It’s such a drag that 1 MB can mean either 1024 × 1024 bytes or 1000 × 1000 bytes. (Did you know that there was once a class action suit filed against some of the &lt;a href=&quot;http://www.xbitlabs.com/news/storage/display/20060629225834.html&quot; rel=&quot;nofollow&quot;&gt;biggest Hard Drive producers&lt;/a&gt;?) pg_size_pretty uses the 1024-based unit: 4031489758 / 1048576 is about 3845.&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;select pg_size_pretty(pg_database_size(&apos;XMMS&apos;));&lt;br /&gt; pg_size_pretty&lt;br /&gt;----------------&lt;br /&gt; 3845 MB&lt;br /&gt;&lt;/code&gt;</description>
  <comments>http://lotso.livejournal.com/97913.html</comments>
  <category>sql</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
</channel>
</rss>
