PostgreSQL 9.0 new features, with examples

Published: 2010-06-30 09:06


PostgreSQL 9.0 release notes


This document showcases many of the latest developments in PostgreSQL 9.0, compared to the last major release, PostgreSQL 8.4. There are more than 200 improvements in this release. While all the important advances are explained and demonstrated, some of the more esoteric changes are not covered; these are unlikely to matter for most users.

Hot Standby and Streaming Replication are the two new features that mark Version 9.0 as a landmark in PostgreSQL's development.

While there are several third party replication solutions available for PostgreSQL that meet a range of specific needs, this new release brings a simple, sturdy and integrated version that will probably be used as a default in most High Availability installations using PostgreSQL.

Changes are arranged in four chapters:


  • The two new features you can't ignore
  • Other new features
  • Potential issues when upgrading existing systems
  • Other improvements







The two features you can't ignore


The significance of these two new features is the motivation for allocating a full version number to this release – 9.0 (not 8.5).


Hot Standby


This feature allows users to create a 'standby' database, that is, a database replaying the primary's binary log while remaining available for read-only queries. It is substantially similar to the standby-database features of top proprietary databases, such as Oracle's Data Guard.

Implementation of this feature took over two years and is quite complex. While read-only queries are running, the standby database has to be able to replay the binary modifications coming from the production database, decide if these modifications are conflicting with the read-only queries and which action should be taken as a consequence: pause the replay or kill some read-only queries and move forward. Hot Standby also adds some data in WAL logs for the standby database and a conflict resolution mechanism.

To set up Hot Standby, you just have to configure the production database as follows:

postgresql.conf, Primary:


 wal_level = 'hot_standby' # Adds the required data in the WAL logs
# vacuum_defer_cleanup_age # You may want to set this up, but it could be complicated (see documentation)

Then, create a standby database the same way you created standby databases in previous versions: run pg_start_backup() on the primary, copy the data files to the standby server, then run pg_stop_backup() on the primary.
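For reference, here is a minimal sketch of that backup sequence (the copy command and paths are assumptions):

 -- on the primary
 SELECT pg_start_backup('standby_setup');
 -- copy the primary's data directory to the standby server while the backup is in progress, e.g.:
 --   rsync -a $PGDATA/ standby:/path/to/standby/data/
 SELECT pg_stop_backup();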

Then, you just have to copy logs and data files on the secondary server and add this in its postgresql.conf:

postgresql.conf, Secondary:


 hot_standby=on
max_standby_delay=30s # -1 = always wait, 0 = never wait, otherwise wait up to this long

and configure log replay on the secondary, for instance with a program such as pg_standby (this goes in the restore_command parameter of recovery.conf, where there are new options too; see the next feature).
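For the file-shipping case, a minimal recovery.conf on the secondary could look like this (a sketch; the archive path is an assumption):

 # recovery.conf, Secondary (file shipping)
 standby_mode = 'on'
 restore_command = 'cp /path/to/archive/%f %p'   # pg_standby can be used here instead of cp (see its documentation)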

max_standby_delay determines the behaviour of the standby database when conflicts between replay and read-only queries occur. In this situation, the standby database will wait at most until it's lagging max_standby_delay behind the production database before killing the conflicting read-only queries.

Of course, you're strongly encouraged to read the documentation before putting this feature in place… if only to understand the consequences of tuning max_standby_delay and vacuum_defer_cleanup_age, which are not easy to grasp.


Streaming Replication


Complementing Hot Standby, Streaming Replication is the second half of the "great leap forward" for PostgreSQL. This time, the goal is improving the archiving mechanism to make it as continuous as possible and to not rely on file shipping. Standby databases can now connect to the master and be sent, whenever they want, whatever they are missing from the Write Ahead Log, not in terms of complete files ('WAL segments'), but in terms of records in the WAL (you can think of them as fragments of these files).

Streaming Replication is an asynchronous mechanism; the standby database lags behind the master. But unlike other replication methods, this lag is very short, and can be as little as a single transaction, depending on network speed, database activity, and Hot Standby settings. Also, the load on the master for each slave is minimal, allowing a single master to support dozens of slaves.

Production and standby databases are identical at the binary level (well, almost, we won't get into details, but don't worry if your datafiles don't have the same checksum).

For Streaming Replication, wal_level should be 'archive' (or 'hot_standby') so that continuous archiving is possible.

postgresql.conf, Primary:


 max_wal_senders = x # Maximum 'wal_senders', processes responsible for managing a connection with a standby server
wal_keep_segments # How many WAL segments (files) should be kept on the primary in any case, so you won't have to copy them manually to the standby if it falls too far behind

On the secondary:

recovery.conf, Secondary:


 standby_mode = on
primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass' # connection string to reach the primary database

postgresql.conf, Secondary:


 wal_level # same value as on the primary (you'll need this after a failover, to build a new standby)
hot_standby=on/off # Do you want to use Hot Standby at the same time?

pg_hba.conf file:

There must be an entry here for the replication connections. The pseudo-database is 'replication', and the designated user must be a superuser. Be careful not to give broad access to this account: a lot of privileged data can be extracted from WAL records.

pg_hba.conf, Primary:


 host    replication     foo             192.168.1.100/32        md5

As with Hot Standby, this feature is rich and complex. It's advised to read the documentation, and to perform failover and switchover tests once everything is in place.
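For instance, a quick way to check that replication is progressing is to compare WAL positions with these 9.0 functions (run on the primary and on the standby respectively):

 -- on the primary
 SELECT pg_current_xlog_location();
 -- on the standby
 SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();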

One thing should be stressed about these two features: you can use them together. This means you can have a near-realtime standby database, and run read-only queries on it, such as reporting queries. You can also use them independently; a standby database can be Hot Standby with file shipping only, and a Streaming Replication database can stream without accepting queries.


New features


Exclusion constraints


Exclusion constraints are very similar to unique constraints. They can be seen as unique constraints using operators other than '=': a unique constraint forbids two records having equal values on a set of columns, while an exclusion constraint forbids two records for which a chosen operator (an overlap operator, for instance) returns true.

To illustrate this, we will use the example provided by this feature's author, using the 'period' data type from the temporal module, which he also developed. This data type stores time ranges, such as 'the time range from 10:15 to 11:15'.

First, we need to retrieve the temporal module here: http://pgfoundry.org/projects/temporal/, then compile and install it like a contrib module (run the provided SQL script).


 CREATE TABLE reservation
 (
   room      TEXT,
   professor TEXT,
   during    PERIOD
 );

 ALTER TABLE reservation ADD CONSTRAINT test_exclude
   EXCLUDE USING gist (room WITH =, during WITH &&);

With this, we declare that a new record must be rejected (that's the exclusion) if an existing record satisfies both conditions: same room (the = operator) and overlapping time range (the && operator).


 marc=# INSERT INTO reservation (professor,room,during) VALUES ( 'mark', 'tech room', period('2010-06-16 09:00:00', '2010-06-16 10:00:00'));
INSERT 0 1
marc=# INSERT INTO reservation (professor,room,during) VALUES ( 'john', 'chemistry room', period('2010-06-16 09:00:00', '2010-06-16 11:00:00'));
INSERT 0 1
marc=# INSERT INTO reservation (professor,room,during) VALUES ( 'mark', 'chemistry room', period('2010-06-16 10:00:00', '2010-06-16 11:00:00'));
ERROR: conflicting key value violates exclusion constraint "test_exclude"
DETAIL: Key (room, during)=(chemistry room, [2010-06-16 10:00:00+02, 2010-06-16 11:00:00+02)) conflicts with existing key (room, during)=(chemistry room, [2010-06-16 09:00:00+02, 2010-06-16 11:00:00+02)).

The insert is forbidden, as the chemistry room is already reserved from 9 to 11.

Exclusion constraints may also be used with arrays, geographic data, or other non-scalar data in order to implement advanced scientific and calendaring applications. No other database system has this feature.


Column triggers


Column triggers fire only when a specific column is explicitly UPDATED. They allow you to avoid adding lots of conditional logic and value comparisons in your trigger code.

Example:


 CREATE TRIGGER foo BEFORE UPDATE OF a ON t1 FOR EACH ROW EXECUTE PROCEDURE my_trigger();

This trigger fires only when column 'a' of table t1 is explicitly targeted by the UPDATE statement.

Column triggers are not executed if columns are set to DEFAULT.
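For completeness, here is a minimal sketch of a trigger function such as the my_trigger() assumed above (the body is purely illustrative):

 CREATE FUNCTION my_trigger() RETURNS trigger AS $$
 BEGIN
   RAISE NOTICE 'column a changed from % to %', OLD.a, NEW.a;
   RETURN NEW;
 END;
 $$ LANGUAGE plpgsql;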


WHEN Triggers


Completing PostgreSQL's effort to limit IF ... THEN code in triggers, conditional triggers define simple conditions under which the trigger will be executed. This can dramatically decrease the number of trigger executions and reduce CPU load on the database server.

For example, this trigger would check that an account was correctly balanced only when the balance changes:


 CREATE TRIGGER check_update
BEFORE UPDATE ON accounts
FOR EACH ROW
WHEN (OLD.balance IS DISTINCT FROM NEW.balance)
EXECUTE PROCEDURE check_account_update();

And this trigger will only log a row update when the row actually changes. It's very helpful with framework or ORM applications, which may attempt to save unchanged rows:


 CREATE TRIGGER log_update
AFTER UPDATE ON accounts
FOR EACH ROW
WHEN (OLD.* IS DISTINCT FROM NEW.*)
EXECUTE PROCEDURE log_account_update();

You could go even further than this and decide not to save the row at all if it hasn't changed:


 CREATE TRIGGER skip_unchanged
BEFORE UPDATE ON accounts
FOR EACH ROW
WHEN (OLD.* IS NOT DISTINCT FROM NEW.*)
EXECUTE PROCEDURE no_op();  -- no_op() simply returns NULL, which cancels the update for that row

DEFERRABLE UNIQUE CONSTRAINTS


Unique constraints can now be declared DEFERRABLE, which will also be very useful. Here is an example, using a primary key instead of a simple unique constraint:


 marc=# CREATE TABLE test (a int primary key);
marc=# INSERT INTO test values (1), (2);
marc=# UPDATE test set a = a+1;
ERROR: duplicate key value violates unique constraint "test_pkey"
DETAIL: Key (a)=(2) already exists.

That's expected, but a pity nevertheless: at the end of the transaction, my data would have been consistent as far as this constraint is concerned. Even worse, if the table had happened to be physically sorted in descending order, the query would have worked! With 8.4 there was no easy way out; we had to find a trick to update the records in the right order.

We can now do this:


 marc=# CREATE TABLE test (a int primary key deferrable);
marc=# INSERT INTO test values (2),(1);
marc=# UPDATE test set a = a+1;
ERROR: duplicate key value violates unique constraint "test_pkey"
DETAIL: Key (a)=(2) already exists.

Oops, it doesn't work.

As a matter of fact, this is on purpose. A little reminder about deferrable/deferred constraints: a DEFERRABLE constraint CAN be checked at the end of the transaction, but you still have to ask PostgreSQL to defer it.

One can, for the current session, require that all the constraints be deferred:


 marc=# SET CONSTRAINTS ALL DEFERRED;
SET CONSTRAINTS
marc=# UPDATE test set a = a+1;
UPDATE 2

If one doesn't want to perform a SET CONSTRAINTS each time, the constraint can also be declared as INITIALLY DEFERRED:


 CREATE TABLE test (a int PRIMARY KEY DEFERRABLE INITIALLY DEFERRED);

Another reminder is necessary here: DEFERRED constraints are slower than IMMEDIATE constraints. Furthermore, the list of records to be checked at the end of the transaction has to be stored somewhere, so be careful not to do this for millions of records at once. This is one of the reasons why DEFERRABLE constraints aren't INITIALLY DEFERRED by default.


Anonymous Functions


This new feature is for creating run-once functions. Effectively, this allows you to run stored-procedure code on the command line or dynamically, as you can on SQL Server and Oracle. Unlike those, however, PostgreSQL lets you write an anonymous function in any installed procedural language, out of the more than a dozen that PostgreSQL supports.

This feature will be very useful for schema upgrade scripts, for instance. Here is a slightly different version of the 'GRANT SELECT ON ALL TABLES' statement that will be seen later in this document, giving SELECT rights on a bunch of tables depending on the table owner, and ignoring the two system schemas:


 DO language plpgsql $$
 DECLARE
   vr record;
 BEGIN
   FOR vr IN SELECT tablename
             FROM pg_tables
             WHERE tableowner = 'marc'
               AND schemaname NOT IN ('pg_catalog','information_schema')
   LOOP
     EXECUTE 'GRANT SELECT ON ' || vr.tablename || ' TO toto';
   END LOOP;
 END
 $$;

Up to 8.4, this would have required creating a function (with CREATE FUNCTION), running it, then removing it (with DROP FUNCTION), all of which required the privileges to do so. 9.0 makes this kind of procedure much simpler.

Anonymous functions are also called "anonymous code blocks" in the software industry.


Named parameters


Combined with the Default Parameters introduced in version 8.4, named parameters allow for dynamic calling of functions with variable numbers of arguments, much as they would be inside a programming language. Named parameters are familiar to users of SQL Server or Sybase, but PostgreSQL does one better by supporting both named parameter calls and function overloading.

The chosen syntax to name parameters is the following:


 CREATE FUNCTION test (a int, b text) RETURNS text AS $$
DECLARE
value text;
BEGIN
value := 'a is ' || a::text || ' and b is ' || b;
RETURN value;
END;
$$ LANGUAGE plpgsql;

Until now, we wrote:


 SELECT test(1,'foo');
test
-------------------------
a is 1 and b is foo
(1 row)

Now this explicit syntax can be used:


 SELECT test( b:='foo', a:=1);
test
-------------------------
a is 1 and b is foo
(1 row)

Named parameters should eliminate the need to write many overloaded "wrapper" functions. Note that this adds a backwards-compatibility issue: you are no longer able to rename function parameters with CREATE OR REPLACE FUNCTION, but must now drop and recreate the function.


GRANT/REVOKE IN SCHEMA


One annoying limitation in PostgreSQL has been the lack of global GRANT/REVOKE capabilities. With 9.0 it's now possible to set privileges on all tables, sequences and functions within a schema, without having to write a script or a stored procedure:


 GRANT SELECT ON ALL TABLES IN SCHEMA public TO toto;

And reverting this:


 REVOKE SELECT ON ALL TABLES IN SCHEMA public FROM toto;

See the GRANT documentation page for further details.

Note that the above only works for existing objects. However, it's now also possible to define default permissions for new objects:


ALTER DEFAULT PRIVILEGES


This feature also makes permission management more efficient: it defines the privileges that will be applied to objects created in the future.


 ALTER DEFAULT PRIVILEGES FOR ROLE marc GRANT SELECT ON TABLES TO PUBLIC;
CREATE TABLE test_priv (a int);
\z test_priv
Access privileges
Schema | Name | Type | Access privileges | Column access privileges
--------+------------+-------+-------------------+--------------------------
public | test_priv | table | =r/marc +|
| | | marc=arwdDxt/marc |

This new information is stored in the pg_default_acl system catalog.


PL/pgSQL changes which may cause regressions


There are two changes in PL/pgSQL which may break code that works in 8.4 or earlier, meaning PL/pgSQL functions should be audited before migrating to 9.0 to prevent possible runtime errors.


Removal of column/variable name ambiguity


In 8.4 and earlier, PL/pgSQL variables take precedence over a table or view column with the same name. While this behaviour is consistent, it is a potential source of coding errors. 9.0 will throw a runtime error if this situation occurs:


 marc=# DO LANGUAGE plpgsql
$$
DECLARE
a int;
BEGIN
SELECT a FROM test;
END
$$
 ;
ERROR: column reference "a" is ambiguous
LINE 1: select a from test
DETAIL: It could refer to either a PL/pgSQL variable or a table column.
QUERY: select a from test
CONTEXT: PL/pgSQL function "inline_code_block" line 4 at SQL statement

This behaviour can be altered globally in postgresql.conf, or on a per function basis by inserting one of these three options in the function declaration:


 #variable_conflict error        (default)
#variable_conflict use_variable (variable name takes precedence - pre-9.0 behaviour)
#variable_conflict use_column   (column name takes precedence)

The manual contains more details.
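For instance, here is a sketch of the per-function directive (assuming a table test with an integer column a, as in the example above; the function name is illustrative):

 CREATE FUNCTION first_a() RETURNS int AS $$
   #variable_conflict use_column
 DECLARE
   a int;
 BEGIN
   SELECT a INTO a FROM test LIMIT 1;  -- 'a' in the SELECT list now refers to the column
   RETURN a;
 END
 $$ LANGUAGE plpgsql;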


Reserved words


From 9.0, use of unquoted reserved words as PL/PgSQL variable names is no longer permitted:


 marc=# DO LANGUAGE plpgsql
$$
DECLARE
table int;
BEGIN
table :=table+1;
END
$$
 ;
ERROR: syntax error at or near "table"
LINE 6: table :=table+1;

The correct syntax is:


 marc=# DO LANGUAGE plpgsql
$$
DECLARE
"table" int;
BEGIN
"table" :="table"+1;
END
$$
 ;
DO

Best practice is of course to avoid reserved words completely.


Improvements


The query planner got a lot of improvements in this version. So we'll start by talking about these:


Join Removal


 marc=# CREATE TABLE t1 (a int);
CREATE TABLE
marc=# CREATE TABLE t2 (b int);
CREATE TABLE
marc=# CREATE TABLE t3 (c int);
CREATE TABLE

We load a little data with generate_series…
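Something like this, for instance (a sketch; the row count of 2400 is inferred from the plans below):

 INSERT INTO t1 SELECT generate_series(1,2400);
 INSERT INTO t2 SELECT generate_series(1,2400);
 INSERT INTO t3 SELECT generate_series(1,2400);
 ANALYZE;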


 marc=# EXPLAIN SELECT t1.a,t2.b from t1 join t2 on (t1.a=t2.b) left join t3 on (t1.a=t3.c);
QUERY PLAN
------------------------------------------------------------------------------
 Merge Right Join  (cost=506.24..6146.24 rows=345600 width=8)
   Merge Cond: (t3.c = t1.a)
   ->  Sort  (cost=168.75..174.75 rows=2400 width=4)
         Sort Key: t3.c
         ->  Seq Scan on t3  (cost=0.00..34.00 rows=2400 width=4)
   ->  Materialize  (cost=337.49..853.49 rows=28800 width=8)
         ->  Merge Join  (cost=337.49..781.49 rows=28800 width=8)
               Merge Cond: (t1.a = t2.b)
               ->  Sort  (cost=168.75..174.75 rows=2400 width=4)
                     Sort Key: t1.a
                     ->  Seq Scan on t1  (cost=0.00..34.00 rows=2400 width=4)
               ->  Sort  (cost=168.75..174.75 rows=2400 width=4)
                     Sort Key: t2.b
                     ->  Seq Scan on t2  (cost=0.00..34.00 rows=2400 width=4)

For now, everything is normal, and we'd get the same behavior in 8.4. But let's imagine that there is a UNIQUE constraint on t3's 'c' column. In that case, the join on t3 serves no purpose, theoretically speaking: the number of rows returned won't change, and neither will their content, because the column is UNIQUE, the join is a LEFT JOIN, and no column of t3 is retrieved. If the column weren't UNIQUE, the join could produce more rows. If it weren't a LEFT JOIN, the join could eliminate some rows.

With 9.0:


 marc=# ALTER TABLE t3 ADD UNIQUE (c);
NOTICE: ALTER TABLE / ADD UNIQUE will create implicit index "t3_c_key" for table "t3"
ALTER TABLE
marc=# EXPLAIN SELECT t1.a,t2.b from t1 join t2 on (t1.a=t2.b) left join t3 on (t1.a=t3.c);
QUERY PLAN
------------------------------------------------------------------
 Merge Join  (cost=337.49..781.49 rows=28800 width=8)
   Merge Cond: (t1.a = t2.b)
   ->  Sort  (cost=168.75..174.75 rows=2400 width=4)
         Sort Key: t1.a
         ->  Seq Scan on t1  (cost=0.00..34.00 rows=2400 width=4)
   ->  Sort  (cost=168.75..174.75 rows=2400 width=4)
         Sort Key: t2.b
         ->  Seq Scan on t2  (cost=0.00..34.00 rows=2400 width=4)
(8 rows)

This enhancement will sometimes be very nice, among other things when using an ORM (Object-Relational Mapper). These tools have a tendency to write unnecessary joins. In this simple case, the estimated cost has been divided by about 10.

This could also help a lot for programs using a lot of joins and nested views.

This feature is another reason to declare the constraints in the database: without these constraints, there is no way for the engine to be sure these rewrites can be done.


IS NOT NULL can now use indexes


For this demonstration, we will compare 8.4 and 9.0 on a table that contains mostly NULLs.
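A setup along these lines reproduces the situation (a sketch; the exact proportions are assumptions consistent with the row counts in the plans):

 CREATE TABLE test (a int);
 INSERT INTO test SELECT NULL FROM generate_series(1,1000000);  -- mostly NULLs
 INSERT INTO test SELECT generate_series(1,1000);               -- a few non-null values
 CREATE INDEX idxa ON test (a);
 ANALYZE test;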

With 8.4:


 marc=# EXPLAIN ANALYZE SELECT max(a) from test;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.03..0.04 rows=1 width=0) (actual time=281.320..281.321 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..0.03 rows=1 width=4) (actual time=281.311..281.313 rows=1 loops=1)
           ->  Index Scan Backward using idxa on test  (cost=0.00..29447.36 rows=1001000 width=4) (actual time=281.307..281.307 rows=1 loops=1)
                 Filter: (a IS NOT NULL)
 Total runtime: 281.360 ms
(6 rows)

With 9.0:


 marc=# EXPLAIN ANALYZE SELECT max(a) from test;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.08..0.09 rows=1 width=0) (actual time=0.100..0.102 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..0.08 rows=1 width=4) (actual time=0.092..0.093 rows=1 loops=1)
           ->  Index Scan Backward using idxa on test  (cost=0.00..84148.06 rows=1001164 width=4) (actual time=0.089..0.089 rows=1 loops=1)
                 Index Cond: (a IS NOT NULL)
 Total runtime: 0.139 ms
(6 rows)

The difference is that 9.0 only scans the not-null keys in the index, while 8.4 has to check in the table (a Filter step, where 9.0 uses an index condition). In this precise use case, the gain is really big.


Use of index to get better statistics on the fly


Before explaining this new feature, let's talk about histograms. PostgreSQL, like some other databases, uses a statistical optimizer. This means that when planning a query it has (or should have) an approximately correct idea of how many records each step of the query will return. In order to do this, it uses statistics, such as the approximate number of records in a table, its size, the most common values, and histograms. PostgreSQL uses these to estimate the number of records returned by a WHERE clause on a column, depending on the value or range asked for in this WHERE clause.

In some cases, these histograms rapidly get out of date and become a problem for certain SQL queries. For instance, consider a log table into which timestamped records are constantly inserted, and from which we usually want to retrieve the records of the last 5 minutes.

In this specific case, it was impossible to get correct statistics before 9.0. Now, when the planner detects that a query asks for a range scan on values larger than the largest value in the histogram (or smaller than the smallest), that is, beyond the largest value seen during the last statistics calculation, and the column has an index, it fetches the max (or min) value of that column using the index BEFORE actually executing the query, in order to get more realistic statistics. Since PostgreSQL uses an index for this, there obviously has to be one.

Here is an example. The 'a' column of the test table has already been filled with a lot of timestamps, all in the past. Its statistics are up to date.
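A sketch of such a setup (table and index names match the plans below; the size is an assumption):

 CREATE TABLE test (a timestamp);
 INSERT INTO test SELECT now() - (i || ' seconds')::interval FROM generate_series(1,1000000) i;
 CREATE INDEX idxtsta ON test (a);
 ANALYZE test;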

It's 13:37, and I haven't inserted anything after 13:37 yet.


 marc=# EXPLAIN ANALYZE select * from test where a > '2010-06-03 13:37:00';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Index Scan using idxtsta on test (cost=0.00..8.30 rows=1 width=8) (actual time=0.007..0.007 rows=0 loops=1)
Index Cond: (a > '2010-06-03 13:37:00'::timestamp without time zone)
Total runtime: 0.027 ms
(3 rows)

Everything's normal. The upper boundary of the histogram is '2010-06-03 13:36:16.830007' (this information comes from pg_stats). There is no way of guessing how many records are larger than 13:37, and with 8.4, PostgreSQL would have continued estimating '1' until the next analyze.


 marc=# DO LANGUAGE plpgsql
$$
DECLARE
i int;
BEGIN
FOR i IN 1..10000 LOOP
INSERT INTO test VALUES (clock_timestamp());
END LOOP;
END
$$
 ;
DO

(I must say I really like 'DO').
We just inserted 10000 records with a date larger than 13:37.


 marc=# EXPLAIN ANALYZE SELECT * FROM test WHERE a > '2010-06-03 13:37:00';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Index Scan using idxtsta on test (cost=0.00..43.98 rows=1125 width=8) (actual time=0.012..13.590 rows=10000 loops=1)
Index Cond: (a > '2010-06-03 13:37:00'::timestamp without time zone)
Total runtime: 23.567 ms
(3 rows)

The estimated row count isn't 0 or 1 anymore. The statistics haven't been updated, though:


 marc=# SELECT last_autoanalyze FROM pg_stat_user_tables WHERE relname = 'test';
last_autoanalyze
-------------------------------
2010-06-03 13:36:21.553477+02
(1 row)

We still have an error of one order of magnitude (a factor of 10) in the estimate. But it's not that bad: without this enhancement, it would be four orders of magnitude (a factor of 10,000). Anyway, a much smaller error makes it more likely we'll get a good plan for this kind of query.


Per tablespace seq_page_cost/random_page_cost


 marc=# ALTER TABLESPACE pg_default SET ( random_page_cost = 10, seq_page_cost=5);
ALTER TABLESPACE

We just changed random_page_cost and seq_page_cost for all the objects contained in pg_default. What for?

The use case is when different tablespaces have different performance: for instance, you have some critical data on a SSD drive, or historical data on an older disk array, slower than the brand new array you use for active data. This makes it possible to tell PostgreSQL that all your tablespaces don't always behave the same way, from a performance point of view. This is only useful, of course, for quite big databases.
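A hypothetical sketch of the SSD use case (the tablespace name, path and cost values are assumptions):

 CREATE TABLESPACE ssd LOCATION '/mnt/ssd/pgdata';
 ALTER TABLESPACE ssd SET (random_page_cost = 1.5, seq_page_cost = 1);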


Force distinct statistics on a column


This makes it possible to force the number of distinct values for a column. It mustn't be used lightly, but only when ANALYZE on this column can't get a reasonable estimate.

Here's how to do this:


 marc=# ALTER TABLE test ALTER COLUMN a SET (n_distinct = 2);
ALTER TABLE

ANALYZE has to be run again for this to be taken into account:


 marc=# ANALYZE test;
ANALYZE

Let's try now:


 marc=# EXPLAIN SELECT distinct * from test;
QUERY PLAN
------------------------------------------------------------------
HashAggregate (cost=6263.00..6263.02 rows=2 width=8)
-> Seq Scan on test (cost=0.00..5338.00 rows=370000 width=8)
(2 rows)

This is an example of what SHOULDN'T be done: there really ARE 370,000 distinct values in my table. Now my execution plans may be very bad.

If n_distinct is positive, it's the number of distinct values.

If it's negative (between 0 and -1), it's a multiplier applied to the estimated number of records in the table: for instance, -0.2 means one distinct value for every 5 records of the table.

0 brings the behavior back to normal (ANALYZE estimates distinct by itself).
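For instance, to revert the change made above:

 ALTER TABLE test ALTER COLUMN a SET (n_distinct = 0);
 ANALYZE test;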

Don't change this parameter unless you are completely sure you have correctly diagnosed your problem. Otherwise, expect performance to degrade.



Many other aspects of the database have been improved. Here are a few amongst them:


Better VACUUM FULL


Until now, VACUUM FULL was very slow. This statement can recover free space from a table to reduce its size, mostly when VACUUM itself hasn't been run frequently enough.

It was slow because of the way it operated: records were read and moved one by one from their source block to a block closer to the beginning of the table. Once the end of the table was emptied, that empty part was removed.

This strategy was very inefficient: moving records one by one creates a lot of random IO. Moreover, during this reorganization, indexes had to be maintained, making everything even more costly, and fragmenting indexes. It was therefore advised to reindex a table just after a VACUUM FULL.

The VACUUM FULL statement, as of version 9.0, creates a new table from the current one, copying all the records sequentially. Once all records are copied, indexes are recreated and the old table is destroyed and replaced.

This has the advantage of being much faster. VACUUM FULL still needs an AccessExclusiveLock while running though. The only drawback of this method compared to the old one, is that VACUUM FULL can use as much as two times the size of the table on disk, as it is creating a new version of it.

Let's now compare the runtimes of the two methods. In both cases, we prepare the test data as follows (for 8.4 and 9.0)


 marc=# CREATE TABLE test (a int);
CREATE TABLE
marc=# CREATE INDEX idxtsta on test (a);
CREATE INDEX
marc=# INSERT INTO test SELECT generate_series(1,1000000);
INSERT 0 1000000
marc=# DELETE FROM test where a%3=0; -- making holes everywhere
DELETE 333333
marc=# VACUUM test;
VACUUM

With 8.4:


 marc=# \timing
Timing is on.
marc=# VACUUM FULL test;
VACUUM
Time: 6306,603 ms
marc=# REINDEX TABLE test;
REINDEX
Time: 1799,998 ms

So around 8 seconds.
With 9.0:


 marc=# \timing
Timing is on.
marc=# VACUUM FULL test;
VACUUM
Time: 2563,467 ms

That still doesn't mean that VACUUM FULL is a good idea in production. If you need it, it's probably because your VACUUM policy isn't appropriate.


64 bits for Windows.


There is now a 64-bit version of PostgreSQL for Windows. There is no evidence yet that this solves the problem seen with the 32-bit Windows version, where performance degrades when shared_buffers is set above roughly 500MB. But there are many other reasons you may want a 64-bit version, such as a larger work_mem.


PL/pgSQL by default


You won't have to add PL/pgSQL in databases, as it will be installed by default.


Many improvements on PL languages.


Many languages have been vastly improved, PL/Perl for instance. Read the release notes if you want more details; there are too many to list here.


ALIAS keyword


ALIAS can now be used. As its name suggests, it can be used to alias variable names to other names.

The syntax is new_name ALIAS FOR old_name. This is put in the DECLARE section of PL/pgSQL code.

It has two main use cases:


  • to give names to PL function parameters:

 myparam ALIAS FOR $1

  • to rename potentially conflicting variables. In a trigger for instance:

 new_value ALIAS FOR new

(without this, we might have conflicted with the NEW variable in the trigger function).
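Here is a minimal sketch putting this to use in a trigger function (the names are illustrative):

 CREATE FUNCTION log_new_row() RETURNS trigger AS $$
 DECLARE
   new_value ALIAS FOR new;
 BEGIN
   RAISE NOTICE 'new row: %', new_value;
   RETURN new_value;
 END;
 $$ LANGUAGE plpgsql;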

Message passing in NOTIFY/pg_notify


Messages can now be passed using NOTIFY. Here is how:


  • Subscribe in session 1 to the 'instant_messenging' queue.

Session 1:

 marc=# LISTEN instant_messenging;
LISTEN

  • Send a notification through 'instant_messenging', from another session

Session 2:

 marc=# NOTIFY instant_messenging, 'You just received a message';
NOTIFY

  • Check the content of the queue in the first session

Session 1:

 marc=# LISTEN instant_messenging;
LISTEN
Asynchronous notification "instant_messenging" with payload "You just received a message" received from server process with PID 5943.

So we can now associate messages (payloads) with notifications, making NOTIFY even more useful.

Let's also mention the new pg_notify function. With it, the second session's code can also be:


 SELECT pg_notify('instant_messenging','You just received a message');

This can simplify some code, in the case of a program managing a lot of different queues.


get_bit and set_bit for bit strings


These functions manipulate individual bits of a bit() value independently. Here is a very simple example.


 marc=# SELECT set_bit('1111'::bit(4),2,0);
set_bit
---------
1101
(1 row)




 marc=# SELECT get_bit('1101'::bit(4),2);
get_bit
---------
0
(1 row)

application_name in pg_stat_activity


In a monitoring session:


 marc=# SELECT * from pg_stat_activity where procpid= 5991;
datid | datname | procpid | usesysid | usename | application_name | client_addr | client_port | backend_start | xact_start | query_start | waiting | current_query
------+---------+---------+----------+---------+------------------+-------------+-------------+-------------------------------+------------+-------------+---------+----------------
16384 | marc | 5991 | 10 | marc | psql | | -1 | 2010-05-16 13:48:10.154113+02 | | | f | <IDLE>
(1 row)

In the '5991' session:


 marc=# SET application_name TO 'my_app';
SET

Back to the monitoring session:


 marc=# SELECT * from pg_stat_activity where procpid= 5991;
datid | datname | procpid | usesysid | usename | application_name | client_addr | client_port | backend_start | xact_start | query_start | waiting | current_query
------+---------+---------+----------+---------+------------------+-------------+-------------+-------------------------------+------------+-------------------------------+---------+----------------
16384 | marc | 5991 | 10 | marc | my_app | | -1 | 2010-05-16 13:48:10.154113+02 | | 2010-05-16 13:49:13.107413+02 | f | <IDLE>
(1 row)

It's your job to set this up correctly in your programs or your sessions. Your DBA will thank you: at last it's easy to know who runs what on the database.
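application_name can also be set directly in the libpq connection string, for instance (a sketch):

 psql "host=localhost dbname=marc application_name=my_app"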


Per database+role configuration


In addition to setting configuration variables per database or per role, one can now set them for a specific role in a specific database:


 marc=# ALTER ROLE marc IN database marc set log_statement to 'all';
ALTER ROLE

To see which variables are set for which role and database, there is a new psql command:


 marc=# \drds
List of settings
role | database | settings
-----+----------+-----------------
marc | marc | log_statement=all
(1 row)

There was a catalog change to store this:


 Table "pg_catalog.pg_db_role_setting"
Column | Type | Modifier
------------+--------+----------
setdatabase | oid | not null
setrole | oid | not null
setconfig | text |

Log all changed parameters on a postgresql.conf reload


Here is an example, the log_line_prefix parameter has been changed:


 LOG:  received SIGHUP, reloading configuration files
<%> LOG:  parameter "log_line_prefix" changed to "<%u%%%d> "

New frame options for window functions


If you don't know window functions yet, you'd better learn about them. You can start here: http://www.depesz.com/index.php/2009/01/21/waiting-for-84-window-functions. They make writing certain kinds of queries much easier.

New options have been added for declaring frames of windowing functions. Let's use this table (not having a better example…)


 marc=# SELECT * FROM salary ;
entity | name | salary | start_date
-----------+-----------+---------+---------------
R&D | marc | 700.00 | 2010-02-15
Accounting | jack | 800.00 | 2010-05-01
R&D | maria | 700.00 | 2009-01-01
R&D | kevin | 500.00 | 2009-05-01
R&D | john | 1000.00 | 2008-07-01
R&D | tom | 1100.00 | 2005-01-01
Accounting | millicent | 850.00 | 2006-01-01

Here is a window function example, without declaring the frame:


 marc=# SELECT entity, name, salary, start_date, avg(salary) OVER (PARTITION BY entity ORDER BY start_date) FROM salary;
entity | name | salary | start_date | avg
-----------+-----------+---------+---------------+-----------------------
Accounting | millicent | 850.00 | 2006-01-01 | 850.0000000000000000
Accounting | jack | 800.00 | 2010-05-01 | 825.0000000000000000
R&D | tom | 1100.00 | 2005-01-01 | 1100.0000000000000000
R&D | john | 1000.00 | 2008-07-01 | 1050.0000000000000000
R&D | maria | 700.00 | 2009-01-01 | 933.3333333333333333
R&D | kevin | 500.00 | 2009-05-01 | 825.0000000000000000
R&D | marc | 700.00 | 2010-02-15 | 800.0000000000000000

The frame is the group of records over which the window function is run. Of course, if the frame isn't explicitly declared, there is a default one.

Here is the same query, with an explicit frame:


 marc=# SELECT entity, name, salary, start_date, avg(salary) OVER (PARTITION BY entity ORDER BY start_date    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM salary;
entity | name | salary | start_date | avg
-----------+-----------+---------+---------------+-----------------------
Accounting | millicent | 850.00 | 2006-01-01 | 850.0000000000000000
Accounting | jack | 800.00 | 2010-05-01 | 825.0000000000000000
R&D | tom | 1100.00 | 2005-01-01 | 1100.0000000000000000
R&D | john | 1000.00 | 2008-07-01 | 1050.0000000000000000
R&D | maria | 700.00 | 2009-01-01 | 933.3333333333333333
R&D | kevin | 500.00 | 2009-05-01 | 825.0000000000000000
R&D | marc | 700.00 | 2010-02-15 | 800.0000000000000000

In this example, the frame is a 'range' frame between the start of the partition (the group of rows sharing the same entity) and the current row (not exactly the current row, but let's put that aside for now; read the documentation if you want to learn more). As one can see, the average (avg) is evaluated over the rows between the frame's first row and the current row.

First new feature: as of 9.0, the frame can be declared to be between the current row and the end of the partition:


 marc=# SELECT entity, name, salary, start_date, avg(salary)  OVER (PARTITION BY entity ORDER BY start_date    RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM salary;
entity | name | salary | start_date | avg
-----------+-----------+---------+---------------+----------------------
Accounting | millicent | 850.00 | 2006-01-01 | 825.0000000000000000
Accounting | jack | 800.00 | 2010-05-01 | 800.0000000000000000
R&D | tom | 1100.00 | 2005-01-01 | 800.0000000000000000
R&D | john | 1000.00 | 2008-07-01 | 725.0000000000000000
R&D | maria | 700.00 | 2009-01-01 | 633.3333333333333333
R&D | kevin | 500.00 | 2009-05-01 | 600.0000000000000000
R&D | marc | 700.00 | 2010-02-15 | 700.0000000000000000

Second new feature: frames can be declared as 'x previous records to y next records'. There is little point with this example, but let's do it anyway:


 marc=# SELECT entity, name, salary, start_date, avg(salary)  OVER (PARTITION BY entity ORDER BY start_date    ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM salary;
entity | name | salary | start_date | avg
-----------+-----------+---------+---------------+-----------------------
Accounting | millicent | 850.00 | 2006-01-01 | 825.0000000000000000
Accounting | jack | 800.00 | 2010-05-01 | 825.0000000000000000
R&D | tom | 1100.00 | 2005-01-01 | 1050.0000000000000000
R&D | john | 1000.00 | 2008-07-01 | 933.3333333333333333
R&D | maria | 700.00 | 2009-01-01 | 733.3333333333333333
R&D | kevin | 500.00 | 2009-05-01 | 633.3333333333333333
R&D | marc | 700.00 | 2010-02-15 | 600.0000000000000000

The frame is still limited to the partition (see tom's record, for instance: jack's record isn't used for its average).

If one wanted the same query, with a moving average on three rows, not reset on each partition switch (still no practical use):


 marc=# SELECT entity, name, salary, start_date, avg(salary) OVER (ORDER BY entity, start_date    ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM salary;
entity | name | salary | start_date | avg
-----------+-----------+---------+---------------+----------------------
Accounting | millicent | 850.00 | 2006-01-01 | 825.0000000000000000
Accounting | jack | 800.00 | 2010-05-01 | 916.6666666666666667
R&D | tom | 1100.00 | 2005-01-01 | 966.6666666666666667
R&D | john | 1000.00 | 2008-07-01 | 933.3333333333333333
R&D | maria | 700.00 | 2009-01-01 | 733.3333333333333333
R&D | kevin | 500.00 | 2009-05-01 | 633.3333333333333333
R&D | marc | 700.00 | 2010-02-15 | 600.0000000000000000

In short, a powerful tool to be mastered, even if I couldn't provide a really practical example.


Sort in aggregates


This feature is a subtle one: the result of an aggregate function may depend on the order it receives the data.

Of course, we're not talking about count or avg, but about aggregates such as array_agg and string_agg…

This is nice, as this will showcase string_agg, which is another 9.0 feature, killing two birds with one stone.

Let's start again with our salary table. We want the list of employees, concatenated as a single value, grouped by entity. It's going into a spreadsheet…


 marc=# SELECT entity,string_agg(name,', ') FROM salary GROUP BY entity;
entity | string_agg
-----------+-------------------------------
Accounting | jack, millicent
R&D | marc, maria, kevin, john, tom

That's already nice. But I want them sorted in alphabetical order, because I don't know how to write a macro in my spreadsheet to sort this data.


 marc=# SELECT entity,string_agg(name,', ' ORDER BY name) FROM salary GROUP BY entity;
entity | string_agg
-----------+-------------------------------
Accounting | jack, millicent
R&D | john, kevin, marc, maria, tom

To use this new feature, the sort clause must be inserted inside the aggregate function, without a comma to separate it from the parameters.


Better unique constraints error messages


With 8.4:


 marc=# INSERT INTO test VALUES (1);
ERROR: duplicate key value violates unique constraint "test_a_key"

With 9.0:


 marc=# INSERT INTO test VALUES (1);
ERROR: duplicate key value violates unique constraint "test_a_key"
DETAIL: Key (a)=(1) already exists.

This will make diagnosing constraint violation errors much easier.


Explain buffers, hashing statistics, xml, json, yaml, new optional explain syntax


Here is EXPLAIN ANALYZE as we all know it:


 marc=# EXPLAIN ANALYZE SELECT a, sum(c) FROM pere JOIN fils ON (pere.a = fils.b) WHERE b BETWEEN 1000 AND 300000 GROUP BY a;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=905.48..905.86 rows=31 width=8) (actual time=0.444..0.453 rows=6 loops=1)
   ->  Nested Loop  (cost=10.70..905.32 rows=31 width=8) (actual time=0.104..0.423 rows=6 loops=1)
         ->  Bitmap Heap Scan on fils  (cost=10.70..295.78 rows=31 width=8) (actual time=0.040..0.154 rows=30 loops=1)
               Recheck Cond: ((b >= 1000) AND (b <= 300000))
               ->  Bitmap Index Scan on fils_pkey  (cost=0.00..10.69 rows=31 width=0) (actual time=0.023..0.023 rows=30 loops=1)
                     Index Cond: ((b >= 1000) AND (b <= 300000))
         ->  Index Scan using pere_pkey on pere  (cost=0.00..19.65 rows=1 width=4) (actual time=0.005..0.005 rows=0 loops=30)
               Index Cond: (pere.a = fils.b)
 Total runtime: 0.560 ms
(9 rows)

To get access to the newly available information, use the new syntax:


 EXPLAIN [ ( { ANALYZE boolean | VERBOSE boolean | COSTS boolean | BUFFERS boolean | FORMAT { TEXT | XML | JSON | YAML } } [, ...] ) ] instruction

For instance:


 marc=# EXPLAIN (ANALYZE true, VERBOSE true, BUFFERS true) SELECT a, sum(c) FROM pere JOIN fils ON (pere.a = fils.b) WHERE b BETWEEN 1000 AND 300000 GROUP BY a;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=905.48..905.86 rows=31 width=8) (actual time=1.326..1.336 rows=6 loops=1)
   Output: pere.a, sum(fils.c)
   Buffers: shared hit=58 read=40
   ->  Nested Loop  (cost=10.70..905.32 rows=31 width=8) (actual time=0.278..1.288 rows=6 loops=1)
         Output: pere.a, fils.c
         Buffers: shared hit=58 read=40
         ->  Bitmap Heap Scan on public.fils  (cost=10.70..295.78 rows=31 width=8) (actual time=0.073..0.737 rows=30 loops=1)
               Output: fils.b, fils.c
               Recheck Cond: ((fils.b >= 1000) AND (fils.b <= 300000))
               Buffers: shared hit=4 read=28
               ->  Bitmap Index Scan on fils_pkey  (cost=0.00..10.69 rows=31 width=0) (actual time=0.030..0.030 rows=30 loops=1)
                     Index Cond: ((fils.b >= 1000) AND (fils.b <= 300000))
                     Buffers: shared hit=3
         ->  Index Scan using pere_pkey on public.pere  (cost=0.00..19.65 rows=1 width=4) (actual time=0.013..0.014 rows=0 loops=30)
               Output: pere.a
               Index Cond: (pere.a = fils.b)
               Buffers: shared hit=54 read=12
 Total runtime: 1.526 ms
(18 rows)

VERBOSE displays the 'Output' lines (it already existed in 8.4).

BUFFERS displays data about buffers (the input/output operations performed by the query): 'hit' is the number of blocks found directly in shared_buffers, 'read' is the number of blocks requested from the operating system. Here, there was very little data in shared_buffers.

One can also request a format other than plain text. For a human user this isn't very useful, but for people developing GUIs on top of EXPLAIN it simplifies development: they can get rid of a custom 'explain' parser (and its potential bugs) and use a more standard one, such as an XML parser.

Costs display can also be deactivated with COSTS false.
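For instance, a minimal sketch combining both options, reusing the tables from the example above:

 EXPLAIN (COSTS false, FORMAT YAML) SELECT a, sum(c) FROM pere JOIN fils ON (pere.a = fils.b) GROUP BY a;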


Unaccent filtering dictionary


Filtering dictionaries can now be set up. This is about Full Text Search dictionaries.

The purpose of these dictionaries is to apply a first filter on words before lexemizing them. The module presented here is the first one to use this mechanism. Filtering can consist of removing words or modifying them.

Unaccent doesn't remove words; it removes accents (all diacritical marks, in fact), replacing accented characters with unaccented ones (many people, at least in French, don't type them). Unaccent is a contrib module.

Installing it, like all contrib modules, is as easy as:


 psql mydb < contribs_path/unaccent.sql.

We'll now follow unaccent's documentation, the example being the filtering of French words.

Let's create a new 'fr' dictionary (keeping standard 'french' dictionary clean):


 marc=# CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french );
CREATE TEXT SEARCH CONFIGURATION

The next statement alters the 'fr' configuration so that words and similar token types go through unaccent and then french_stem, instead of french_stem only.


 marc=# ALTER TEXT SEARCH CONFIGURATION fr
        ALTER MAPPING FOR hword, hword_part, word
        WITH unaccent, french_stem;
ALTER TEXT SEARCH CONFIGURATION

marc=# SELECT to_tsvector('fr','Hôtels de la Mer');
    to_tsvector
-------------------
 'hotel':1 'mer':4
(1 row)

marc=# SELECT to_tsvector('fr','Hôtel de la Mer') @@ to_tsquery('fr','Hotels');
 ?column?
----------
 t
(1 row)

It's now easy, without changing a single line of code in the client application, and while keeping accented characters in the database, to look up words without taking accents into account.


vacuumdb --analyze-only


As the option name indicates, one can now use vacuumdb to run only ANALYZE. This may be useful in cron jobs, for instance.
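For instance, a typical cron-style invocation (a sketch):

 vacuumdb --analyze-only --all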


Hstore contrib enhancements


This already powerful contrib module has become even more powerful:


  • Keys and values size limit has been removed.
  • GROUP BY and DISTINCT can now be used.
  • New operators and functions have been added.

An example would take too long; this module has a lot of features. Go read the documentation!
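Still, a tiny sketch just to show that hstore values can now be compared, and hence used with DISTINCT or GROUP BY (this assumes the contrib module is installed):

 SELECT 'a=>1, b=>2'::hstore -> 'b';                                        -- fetch the value for key 'b'
 SELECT DISTINCT h FROM (VALUES ('a=>1'::hstore), ('a=>1'::hstore)) AS v(h);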


Statement logged by auto_explain


The auto_explain contrib module will now print the statement along with its plan, which makes it much easier to use.
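For reference, a minimal way to try it out in a superuser session (the threshold value is an assumption):

 LOAD 'auto_explain';
 SET auto_explain.log_min_duration = 0;   -- log the plan (and now the statement) of every query
 SELECT count(*) FROM pg_class;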


Buffers accounting for pg_stat_statements


This already very useful contrib module now provides data about buffers. pg_stat_statements, as a reminder, collects statistics on the queries run on the database. Until now, it stored the query's code, number of executions, accumulated runtime, accumulated returned records. It now collects buffer operations too.


 marc=# SELECT * from pg_stat_statements order by total_time desc limit 2;
-[ RECORD 1 ]-------+---------------------
userid | 10
dbid | 16485
query | SELECT * from table1 ;
calls | 2
total_time | 0.491229
rows | 420000
shared_blks_hit | 61
shared_blks_read | 2251
shared_blks_written | 0
local_blks_hit | 0
local_blks_read | 0
local_blks_written | 0
temp_blks_read | 0
temp_blks_written | 0
-[ RECORD 2 ]-------+---------------------
userid | 10
dbid | 16485
query | SELECT * from table2;
calls | 2
total_time | 0.141445
rows | 200000
shared_blks_hit | 443
shared_blks_read | 443
shared_blks_written | 0
local_blks_hit | 0
local_blks_read | 0
local_blks_written | 0
temp_blks_read | 0
temp_blks_written | 0

When this contrib is installed, one can now answer these questions:


  • Which query has the biggest accumulated runtime?
  • Which query generates the most I/O operations? (We still can't know whether the data was found in the operating system's cache.)
  • Which query mostly uses the cache (and hence won't get faster if the cache is enlarged)?
  • Which query modifies the most blocks?
  • Which query does sorting?

'temp' and 'local' are the buffer operations on temporary tables and other structures (sorts, hashes) local to a database backend.


passwordcheck


This contrib module can check passwords, and prevent the worst of them from getting in. After having it installed and set up as described in the documentation, here is the result:


 marc=# ALTER USER marc password 'marc12';
<marc%marc> ERROR: password is too short
<marc%marc> STATEMENT: ALTER USER marc password 'marc12';
ERROR: password is too short
marc=# ALTER USER marc password 'marc123456';
<marc%marc> ERROR: password must not contain user name
<marc%marc> STATEMENT: ALTER USER marc password 'marc123456';
ERROR: password must not contain user name

This module has limitations, mostly because PostgreSQL accepts passwords that are already encrypted by the client, which makes thorough verification impossible. Nevertheless, it's a step in the right direction.

Moreover, its code is well documented and can easily be adapted to specific needs (one can enable cracklib very easily, for instance).
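For reference, the setup mentioned above essentially boils down to preloading the module in postgresql.conf (a sketch; a server restart is required):

 shared_preload_libraries = 'passwordcheck'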









Original: http://qtchina.tk/?q=node/491
