In this article by Simon Riggs, Gabriele Bartolini, Hannu Krosing, Gianni Ciolli, the authors of the book PostgreSQL Administration Cookbook - Third Edition, you will learn the following recipes:

Reading The Fine Manual (RTFM)
Planning a new database
Changing parameters in your programs
Finding the current configuration settings
Which parameters are at nondefault settings?
Updating the parameter file
Setting parameters for particular groups of users
The basic server configuration checklist
Adding an external module to PostgreSQL
Using an installed module
Managing installed extensions

(For more resources related to this topic, see here.)

I get asked many questions about parameter settings in PostgreSQL. Everybody's busy and most people want a 5-minute tour of how things work. That's exactly what a Cookbook does, so we'll do our best.

Some people believe that there are some magical parameter settings that will improve their performance, spending hours combing the pages of books to glean insights. Others feel comfortable because they have found some website somewhere that "explains everything", and they "know" they have their database configured OK.

For the most part, the settings are easy to understand. Finding the best setting can be difficult, and the optimal setting may change over time in some cases. This article is mostly about knowing how, when, and where to change parameter settings.

Reading The Fine Manual (RTFM)

RTFM is often used rudely to mean "don't bother me, I'm busy", or it is used as a stronger form of abuse. The strange thing is that asking you to read a manual is most often very good advice. Don't flame the advisors back; take the advice! The most important point to remember is that you should refer to a manual whose release version matches that of the server on which you are operating.

The PostgreSQL manual is very well-written and comprehensive in its coverage of specific topics. However, one of its main failings is that the "documents" aren't organized in a way that helps somebody who is trying to learn PostgreSQL. They are organized from the perspective of people checking specific technical points so that they can decide whether their difficulty is a user error or not. It sometimes answers "What?" but seldom "Why?" or "How?"

I've helped write sections of the PostgreSQL documents, so I'm not embarrassed to steer you towards reading them. There are, nonetheless, many things to read here that are useful.

How to do it…

The main documents for each PostgreSQL release are available at
http://www.postgresql.org/docs/manuals/.

The most frequently accessed parts of the documents are as follows:

SQL command reference, as well as client and server tools reference: http://www.postgresql.org/docs/current/interactive/reference.html
Configuration: http://www.postgresql.org/docs/current/interactive/runtime-config.html
Functions: http://www.postgresql.org/docs/current/interactive/functions.html

You can also grab yourself a PDF version of the manual, which can allow easier searching in some cases. Don't print it! The documents are more than 2000 pages of A4-sized sheets.

How it works…

The PostgreSQL documents are written in SGML, which is similar to, but not the same as, XML. These files are then processed to generate HTML files, PDF, and so on. This ensures that all the formats have exactly the same content. Then, you can choose the format you prefer, and you can even compile it in other formats such as EPUB, INFO, and so on.

Moreover, the PostgreSQL manual is actually a subset of the PostgreSQL source code, so it evolves together with the software. It is written by the same people who make PostgreSQL. Even more reasons to read it!

There's more…

More information is also available at http://wiki.postgresql.org.

Many distributions offer packages that install static versions of the HTML documentation. For example, on Debian and Ubuntu, the docs for the most recent stable PostgreSQL version are named postgresql-9.6-docs (unsurprisingly).

Planning a new database

Planning a new database can be a daunting task. It's easy to get overwhelmed by it, so here, we present some planning ideas. It's also easy to charge headlong at the task as well, thinking that whatever you know is all you'll ever need to consider.

Getting ready

You are ready. Don't wait to be told what to do. If you haven't been told what the requirements are, then write down what you think they are, clearly labeling them as "assumptions" rather than "requirements"—we mustn't confuse the two things.

Iterate until you get some agreement, and then build a prototype.

How to do it…

Write a document that covers the following items:

Database design—plan your database design
Calculate the initial database sizing
Transaction analysis—how will we access the database?
Look at the most frequent access paths
What are the requirements for response times?
Hardware configuration
Initial performance thoughts—will all of the data fit into RAM?
Choose the operating system and filesystem type
How do we partition the disk?
Localization plan
Decide server encoding, locale, and time zone
Access and security plan
Identify client systems and specify required drivers
Create roles according to a plan for access control
Specify pg_hba.conf
Maintenance plan—who will keep it working? How?
Availability plan—consider the availability requirements
checkpoint_timeout
Plan your backup mechanism and test it
High-availability plan
Decide which form of replication you'll need, if any

How it works…

One of the most important reasons for planning your database ahead of time is that retrofitting some things is difficult. This is especially true of server encoding and locale, which can cause much downtime and exertion if we need to change them later. Security is also much more difficult to set up after the system is live.

There's more…

Planning always helps. You may know what you're doing, but others may not. Tell everybody what you're going to do before you do it to avoid wasting time. If you're not sure yet, then build a prototype to help you decide. Approach the administration framework as if it were a development task. Make a list of things you don't know yet, and work through them one by one.

This is deliberately a very short recipe. Everybody has their own way of doing things, and it's very important not to be too prescriptive about how to do things. If you already have a plan, great! If you don't, think about what you need to do, make a checklist, and then do it.

Changing parameters in your programs

PostgreSQL allows you to set some parameter settings for each session or transaction.

How to do it…

You can change the value of a setting during your session, like this:

SET work_mem = '16MB';

This value will then be used for every future transaction. You can also change it only for the duration of the "current transaction":

SET LOCAL work_mem = '16MB';

The setting will last until you issue this command:

RESET work_mem;

Alternatively, you can issue the following command:

RESET ALL;

SETand RESET commands are SQL commands that can be issued from any interface. They apply only to PostgreSQL server parameters, but this does not mean that they affect the entire server. In fact, the parameters you can change with SET and RESET apply only to the current session. Also, note that there may be other parameters, such as JDBC driver parameters, that cannot be set in this way.

How it works…

Suppose you change the value of a setting during your session, for example, by issuing this command:

SET work_mem = '16MB';

Then, the following will show up in the pg_settings catalog view:

postgres=# SELECT name, setting, reset_val, source
                      FROM pg_settings WHERE source = 'session';
name   | setting | reset_val | source  
----------+---------+-----------+---------
 work_mem | 16384   | 1024      | session

Until you issue this command:

RESET work_mem;

After issuing it, the setting returns to reset_val and the source returns to default:

name   | setting | reset_val | source  
---------+---------+-----------+---------
work_mem | 1024    | 1024      | default

There's more…

You can change the value of a setting during your transaction as well, like this:

SET LOCAL work_mem = '16MB';

Then, this will show up in the pg_settings catalog view:

postgres=# SELECT name, setting, reset_val, source 
                      FROM pg_settings WHERE source = 'session';
name   | setting | reset_val | source  
----------+---------+-----------+---------
 work_mem |  1024   | 1024      | session

Huh? What happened to your parameter setting? The SET LOCAL command takes effect only for the transaction in which it was executed, which was just the SET LOCAL command in our case. We need to execute it inside a transaction block to be able to see the setting take hold, as follows:

BEGIN;
SET LOCAL work_mem = '16MB';

Here is what shows up in the pg_settings catalog view:

postgres=# SELECT name, setting, reset_val, source
                      FROM pg_settings WHERE source = 'session';
name   | setting | reset_val | source  
----------+---------+-----------+---------
 work_mem | 16384   | 1024      | session

You should also note that the value of source is session rather than transaction, as you might have been expecting.

Finding the current configuration settings

At some point, it will occur to you to ask, "What are the current configuration settings?"

Most settings can be changed in more than one way, and some ways do not affect all users or all sessions, so it is quite possible to get confused.

How to do it…

Your first thought is probably to look in postgresql.conf, which is the configuration file, described in detail in the Updating the parameter file recipe. That works, but only as long as there is only one parameter file. If there are two, then maybe you're reading the wrong file! (How do you know?) So, the cautious and accurate way is not to trust a text file, but to trust the server itself.

Moreover, you learned in the previous recipe, Changing parameters in your programs, that each parameter has a scope that determines when it can be set. Some parameters can be set through postgresql.conf, but others can be changed afterwards. So, the current value of configuration settings may have been subsequently changed.

We can use the SHOW command like this:

postgres=# SHOW work_mem;

Its output is as follows:

work_mem 
----------
1MB 
(1 row)

However, remember that it reports the current setting at the time it is run, and that can be changed in many places.

Another way of finding the current settings is to access a PostgreSQL catalog view named pg_settings:

postgres=# x
Expanded display is on.
postgres=# SELECT * FROM pg_settings WHERE name = 'work_mem';
[ RECORD 1 ] --------------------------------------------------------
name       | work_mem 
setting    | 1024 
unit       | kB 
category   | Resource Usage / Memory 
short_desc | Sets the maximum memory to be used for query workspaces.
extra_desc | This much memory can be used by each internal sort operation and hash table before switching to temporary disk files.
context    | user 
vartype    | integer 
source     | default 
min_val    | 64 
max_val    | 2147483647 
enumvals   | 
boot_val   | 1024 
reset_val  | 1024 
sourcefile | 
sourceline |

Thus, you can use the SHOW command to retrieve the value for a setting, or you can access the full details via the catalog table.

There's more…

The actual location of each configuration file can be asked directly to the PostgreSQL server, as shown in this example:

postgres=# SHOW config_file;

This returns the following output:

               config_file
------------------------------------------
 /etc/postgresql/9.4/main/postgresql.conf
(1 row)

The other configuration files can be located by querying similar variables, hba_file and ident_file.

How it works…

Each parameter setting is cached within each session so that we can get fast access to the parameter settings. This allows us to access the parameter settings with ease.

Remember that the values displayed are not necessarily settings for the server as a whole. Many of those parameters will be specific to the current session. That's different from what you experience with many other database software, and is also very useful.

Which parameters are at nondefault settings?

Often, we need to check which parameters have been changed or whether our changes have correctly taken effect.

In the previous two recipes, we have seen that parameters can be changed in several ways, and with different scope. You learned how to inspect the value of one parameter or get the full list of parameters.

In this recipe, we will show you how to use SQL capabilities to list only those parameters whose value in the current session differs from the system-wide default value.

This list is valuable for several reasons. First, it includes only a few of the 200-plus available parameters, so it is more immediate. Also, it is difficult to remember all our past actions, especially in the middle of a long or complicated session.

Version 9.4 introduces the ALTER SYSTEM syntax, which we will describe in the next recipe, Updating the parameter file. From the viewpoint of this recipe, its behavior is quite different from all other setting-related commands; you run it from within your session and it changes the default value, but not the value in your session.

How to do it…

We write a SQL query that lists all parameter values, excluding those whose current value is either the default or set from a configuration file:

postgres=# SELECT name, source, setting
                        FROM pg_settings 
                        WHERE source != 'default'
                        AND source != 'override'
                        ORDER by 2, 1;

The output is as follows:

name            |        source        |      setting
----------------------------+----------------------+-----------------
 application_name           | client               | psql
 client_encoding            | client               | UTF8
 DateStyle                  | configuration file   | ISO, DMY
 default_text_search_config | configuration file   | pg_catalog.english
 dynamic_shared_memory_type | configuration file   | posix
 lc_messages                | configuration file   | en_GB.UTF-8
 lc_monetary                | configuration file   | en_GB.UTF-8
 lc_numeric                 | configuration file   | en_GB.UTF-8
 lc_time                    | configuration file   | en_GB.UTF-8
 log_timezone               | configuration file   | Europe/Rome
 max_connections            | configuration file   | 100
port                       | configuration file   | 5460
 shared_buffers             | configuration file   | 16384
 TimeZone                   | configuration file   | Europe/Rome
 max_stack_depth            | environment variable | 2048

How it works…

You can see from pg_settings which parameters have nondefault values and what the source of the current value is.

The SHOW command doesn't tell you whether a parameter is set at a nondefault value. It just tells you the value, which isn't of much help if you're trying to understand what is set and why.

If the source is a configuration file, then the sourcefile and sourceline columns are also set. These can be useful in understanding where the configuration came from.

There's more…

The setting column of pg_settings shows the current value, but you can also look at boot_val and reset_val. The boot_val parameter shows the value assigned when the PostgreSQL database cluster was initialized (initdb), while reset_val shows the value that the parameter will return to if you issue the RESET command.

The max_stack_depth parameter is an exception because pg_settings says it is set by the environment variable, though it is actually set by ulimit -s on Linux and Unix systems. The max_stack_depth parameter just needs to be set directly on Windows.

The time zone settings are also picked up from the OS environment, so you shouldn't need to set those directly. In older releases, pg_settings showed them as command-line settings. From version 9.1 onwards, they are written to postgresql.conf when the data directory is initialized, so they show up as configuration files.

Updating the parameter file

The parameter file is the main location for defining parameter values for the PostgreSQL server. All the parameters can be set in the parameter file, which is known as postgresql.conf.

There are also two other parameter files: pg_hba.conf and pg_ident.conf. Both of these relate to connections and security.

Getting ready

First, locate postgresql.conf, as described earlier.

How to do it…

Some of the parameters take effect only when the server is first started. A typical example might be shared_buffers, which defines the size of the shared memory cache.

Many of the parameters can be changed while the server is still running. After changing the required parameters, we issue a reload operation to the server, forcing PostgreSQL to reread the postgresql.conf file (and all other configuration files). There is a number of ways to do that, depending on your distribution and OS. The most common is to issue the following command:

pg_ctl reload

with the same OS user that runs the PostgreSQL server process. This assumes the default data directory; otherwise you have to specify the correct data directory with the -D option.

As noted earlier, Debian and Ubuntu have a different multiversion architecture, so you should issue the following command instead:

pg_ctlcluster 9.6 main reload

On modern distributions you can also use systemd, as follows:

sudo systemctl reload [email protected]

Some other parameters require a restart of the server for changes to take effect, for instance, max_connections, listen_addresses, and so on. The syntax is very similar to a reload operation, as shown here:

pg_ctl restart

For Debian and Ubuntu, use this command:

pg_ctlcluster 9.6 main restart

and with systemd:

sudo systemctl restart [email protected]

The postgresql.conf file is a normal text file that can be simply edited. Most of the parameters are listed in the file, so you can just search for them and then insert the desired value in the right place.

How it works…

If you set the same parameter twice in different parts of the file, the last setting is what applies. This can cause lots of confusion if you add settings to the bottom of the file, so you are advised against doing that.

The best practice is to either leave the file as it is and edit the values, or to start with a blank file and include only the values that you wish to change. I personally prefer a file with only the nondefault values. That makes it easier to see what's happening.

Whichever method you use, you are strongly advised to keep all the previous versions of your .conf files. You can do this by copying, or you can use a version control system such as Git or SVN.

There's more…

The postgresql.conf file also supports an include directive. This allows the postgresql.conf file to reference other files, which can then reference other files, and so on. That may help you organize your parameter settings better, if you don't make it too complicated.

If you are working with PostgreSQL version 9.4 or later, you can change the values stored in the parameter files directly from your session, with syntax such as the following:

ALTER SYSTEM SET shared_buffers = '1GB';

This command will not actually edit postgresql.conf. Instead, it writes the new setting to another file named postgresql.auto.conf. The effect is equivalent, albeit in a safer way. The original configuration is never written, so it cannot be damaged in the event of a crash. If you mess up with too many ALTER SYSTEM commands, you can always delete postgresql.auto.conf manually and reload the configuration, or restart PostgreSQL, depending on what parameters you had changed.

Setting parameters for particular groups of users

PostgreSQL supports a variety of ways of defining parameter settings for various user groups. This is very convenient, especially to manage user groups that have different requirements.

How to do it…

For all users in the saas database, use the following commands:

ALTER DATABASE saas
SET configuration_parameter = value1;

For a user named simon connected to any database, use this:

ALTER ROLE Simon
SET configuration_parameter = value2;

Alternatively, you can set a parameter for a user only when connected to a specific database, as follows:

ALTER ROLE Simon
IN DATABASE saas
SET configuration_parameter = value3;

The user won't know that these have been executed specifically for them. These are default settings, and in most cases, they can be overridden if the user requires nondefault values.

How it works…

You can set parameters for each of the following:

Database
User (which is named role by PostgreSQL)
Database/user combination

Each of the parameter defaults is overridden by the one below it.

In the preceding three SQL statements:

If user hannu connects to the saas database, then value1 will apply
If user simon connects to a database other than saas, then value2 will apply
If user simon connects to the saas database, then value3 will apply

PostgreSQL implements this in exactly the same way as if the user had manually issued the equivalent SET statements immediately after connecting.

The basic server configuration checklist

PostgreSQL arrives configured for use on a shared system, though many people want to run dedicated database systems. The PostgreSQL project wishes to ensure that PostgreSQL will play nicely with other server software, and will not assume that it has access to the full server resources. If you, as the system administrator, know that there is no other important server software running on this system, then you can crank up the values much higher.

Getting ready

Before we start, we need to know two sets of information:

We need to know the size of the physical RAM that will be dedicated to PostgreSQL
We need to know something about the types of applications for which we will use PostgreSQL

How to do it…

If your database is larger than 32 MB, then you'll probably benefit from increasing shared_buffers. You can increase this to much larger values, but remember that running out of memory induces many problems.

For instance, PostgreSQL is able to store information to the disk when the available memory is too small, and it employs sophisticated algorithms to treat each case differently and to place each piece of data either in the disk or in the memory, depending on each use case.

On the other hand, overstating the amount of available memory confuses such abilities and results in suboptimal behavior. For instance, if the memory is swapped to disk, then PostgreSQL will inefficiently treat all data as if it were the RAM. Another unfortunate circumstance is when the Linux Out-Of-Memory (OOM) killer terminates one of the various processes spawned by the PostgreSQL server. So, it's better to be conservative. It is good practice to set a low value in your postgresql.conf and increment slowly to ensure that you get the benefits from each change.

If you increase shared_buffers and you're running on a non-Windows server, you will almost certainly need to increase the value of the SHMMAX OS parameter (and on some platforms, other parameters as well).

On Linux, Mac OS, and FreeBSD, you will need to either edit the /etc/sysctl.conf file or use sysctl -w with the following values:

For Linux, use kernel.shmmax=value
For Mac OS, use kern.sysv.shmmax=value
For FreeBSD, use kern.ipc.shmmax=value

There's more…

For more information, you can refer to http://www.postgresql.org/docs/9.6 /static/kernel-resources.html#SYSVIPC.

For example, on Linux, add the following line to /etc/sysctl.conf:

kernel.shmmax=value

Don't worry about setting effective_cache_size. It is much less important a parameter than you might think. There is no need for too much fuss selecting the value.

If you're doing heavy write activity, then you may want to set wal_buffers to a much higher value than the default. In fact wal_buffers is set automatically from the value of shared_buffers, following a rule that fits most cases; however, it is always possible to specify an explicit value that overrides the computation for the very few cases where the rule is not good enough.

If you're doing heavy write activity and/or large data loads, you may want to set checkpoint_segments higher than the default to avoid wasting I/O in excessively frequent checkpoints.

If your database has many large queries, you may wish to set work_mem to a value higher than the default. However, remember that such a limit applies separately to each node in the query plan, so there is a real risk of overallocating memory, with all the problems discussed earlier.

Ensure that autovacuum is turned on, unless you have a very good reason to turn it off—most people don't.

Leave the settings as they are for now. Don't fuss too much about getting the settings right. You can change most of them later, so you can take an iterative approach to improving things.

And remember, don't touch the fsync parameter. It's keeping you safe.

Adding an external module to PostgreSQL

Another strength of PostgreSQL is its extensibility. Extensibility was one of the original design goals, going back to the late 1980s. Now, in PostgreSQL 9.6, there are many additional modules that plug into the core PostgreSQL server.

There are many kinds of additional module offerings, such as the following:

Additional functions
Additional data types
Additional operators
Additional indexes

Getting ready

First, you'll need to select an appropriate module to install.

The walk towards a complete, automated package management system for PostgreSQL is not over yet, so you need to look in more than one place for the available modules, such as the following:

Contrib: The PostgreSQL "core" includes many functions. There is also an official section for add-in modules, known as "contrib" modules. They are always available for your database server, but are not automatically enabled in every database because not all users might need them. On PostgreSQL version 9.6, we will have more than 40 such modules. These are documented at http://www.postgresql.org/docs/9.6/static/contrib.html.
PGXN: This is the PostgreSQL Extension Network, a central distribution system dedicated to sharing PostgreSQL extensions. The website started in 2010, as a repository dedicated to the sharing of extension files.
Separate projects: These are large external projects, such as PostGIS, offering extensive and complex PostgreSQL modules. For more information, take a look at http://www.postgis.org/.

How to do it…

There are several ways to make additional modules available for your database server, as follows:

Using a software installer
Installing from PGXN
Installing from a manually downloaded package
Installing from source code

Often, a particular module will be available in more than one way, and users are free to choose their favorite, exactly like PostgreSQL itself, which can be downloaded and installed through many different procedures.

Installing modules using a software installer

Certain modules are available exactly like any other software packages that you may want to install in your server. All main Linux distributions provide packages for the most popular modules, such as PostGIS, SkyTools, procedural languages other than those distributed with core, and so on.

In some cases, modules can be added during installation if you're using a standalone installer application, for example, the OneClick installer, or tools such as rpm, apt-get, and YaST on Linux distributions. The same procedure can also be followed after the PostgreSQL installation, when the need for a certain module arrives. We will actually describe this case, which is way more common.

For example, let's say that you need to manage a collection of Debian package files, and that one of your tasks is to be able to pick the latest version of one of them. You start by building a database that records all the package files. Clearly, you need to store the version number of each package. However, Debian version numbers are much more complex than what we usually call "numbers". For instance, on my Debian laptop, I currently have version 9.2.18-1.pgdg80+1 of the PostgreSQL client package. Despite being complicated, that string follows a clearly defined specification, which includes many bits of information, including how to compare two versions to establish which of them is older.

Since this recipe discussed extending PostgreSQL with custom data types and operators, you might have already guessed that I will now consider a custom data type for Debian version numbers that is capable of tasks such as understanding the Debian version number format, sorting version numbers, choosing the latest version number in a given group, and so on. It turns out that somebody else already did all the work of creating the required PostgreSQL data type, endowed with all the useful accessories: comparison operators, input/output functions, support for indexes, and maximum/minimum aggregates. All of this has been packaged as a PostgreSQL extension, as well as a Debian package (not a big surprise), so it is just a matter of installing the postgresql-9.2-debversion package with a Debian tool such as apt-get, aptitude, or synaptic. On my laptop, that boils down to the command line:

apt-get install postgresql-9.2-debversion

This will download the required package and unpack all the files in the right locations, making them available to my PostgreSQL server.

Installing modules from PGXN

The PostgreSQL Extension Network, PGXN for short, is a website (http://pgxn.org) launched in late 2010 with the purpose of providing "a central distribution system for open source PostgreSQL extension libraries". Anybody can register and upload their own module, packaged as an extension archive. The website allows browsing available extensions and their versions, either via a search interface or from a directory of package and user names.

The simple way is to use a command-line utility, called pgxnclient. It can be easily installed in most systems; see the PGXN website on how to do so. Its purpose is to interact with PGXN and take care of administrative tasks, such as browsing available extensions, downloading the package, compiling the source code, installing files in the proper place, and removing installed package files. Alternatively, you can download the extension files from the website and place them in the right place by following the installation instructions.

PGXN is different from official repositories because it serves another purpose. Official repositories usually contain only seasoned extensions because they accept new software only after a certain amount of evaluation and testing. On the other hand, anybody can ask for a PGXN account and upload their own extensions, so there is no filter except requiring that the extension has an open source license and a few files that any extension must have.

Installing modules from a manually downloaded package

You might have to install a module that is correctly packaged for your system but is not available from the official package archives. For instance, it could be the case that the module has not been accepted in the official repository yet, or you could have repackaged a bespoke version of that module with some custom tweaks, which are so specific that they will never become official. Whatever the case, you will have to follow the installation procedure for standalone packages specific to your system.

Here is an example with the Oracle compatibility module, described at http://postgres.cz/wiki/Oracle_functionality_(en):

First, we get the package, say for PostgreSQL 8.4 on a 64-bit architecture, from http://pgfoundry.org/frs/download.php/2414/orafce-3.0.1-1.pg84.rhel5.x86_64.rpm.

Then, we install the package in the standard way:

rpm -ivh orafce-3.0.1-1.pg84.rhel5.x86_64.rpm
If all the dependencies are met, we are done.

I mentioned dependencies because that's one more potential problem in installing packages that are not officially part of the installed distribution—you can no longer assume that all software version numbers have been tested, all requirements are available, and there are no conflicts. If you get error messages that indicate problems in these areas, you may have to solve them yourself, by manually installing missing packages and/or uninstalling conflicting packages.

Installing modules from source code

In many cases, useful modules may not have full packaging. In these cases, you may need to install the module manually. This isn't very hard and it's a useful exercise that helps you understand what happens.

Each module will have different installation requirements. There are generally two aspects of installing a module. They are as follows:

Building the libraries (only for modules that have libraries)
Installing the module files in the appropriate locations

You need to follow the instructions for the specific module in order to build the libraries, if any are required. Installation will then be straightforward, and usually there will be a suitably prepared configuration file for the make utility so that you just need to type the following command:

make install

Each file will be copied to the right directory. Remember that you normally need to be a system superuser in order to install files on system directories.

Once a library file is in the directory expected by the PostgreSQL server, it will be loaded automatically as soon as requested by a function. Modules such as auto_explain do not provide any additional user-defined function, so they won't be auto-loaded; that needs to be done manually by a superuser with a LOAD statement.

How it works…

PostgreSQL can dynamically load libraries in the following ways:

Using the explicit LOAD command in a session
Using the shared_preload_libraries parameter in postgresql.conf at server start
At session start, using the local_preload_libraries parameter for a specific user, as set using ALTER ROLE

PostgreSQL functions and objects can reference code in these libraries, allowing extensions to be bound tightly to the running server process. The tight binding makes this method suitable for use even in very high-performance applications, and there's no significant difference between additionally supplied features and native features.

Using an installed module

In this recipe, we will explain how to enable an installed module so that it can be used in a particular database. The additional types, functions, and so on will exist only in those databases where we have carried out this step.

Although most modules require this procedure, there are actually a couple of notable exceptions. For instance, the auto_explain module mentioned earlier, which is shipped together with PostgreSQL, does not create any function, type or operator. To use it, you must load its object file using the LOAD command. From that moment, all statements longer than a configurable threshold will be logged together with their execution plan. In the rest of this recipe, we will cover all the other modules. They do not require a LOAD statement because PostgreSQL can automatically load the relevant libraries when they are required.

As mentioned in the previous recipe, Adding an external module to PostgreSQL, specially packaged modules are called extensions in PostgreSQL. They can be managed with dedicated SQL commands.

Getting ready

Suppose you have chosen to install a certain module among those available for your system (see the previous recipe, Adding an external module to PostgreSQL); all you need to know is the extension name.

How to do it…

Each extension has a unique name, so it is just a matter of issuing the following command:

CREATE EXTENSION myextname;

This will automatically create all the required objects inside the current database.

For security reasons, you need to do so as a database superuser. For instance, if you want to install the dblink extension, type this:

CREATE EXTENSION dblink;

How it works…

When you issue a CREATE EXTENSION command, the database server looks for a file named EXTNAME.control in the SHAREDIR/extension directory. That file tells PostgreSQL some properties of the extension, including a description, some installation information, and the default version number of the extension (which is unrelated to the PostgreSQL version number). Then, a creation script is executed in a single transaction, so if it fails, the database is unchanged. The database server also notes in a catalog table the extension name and all the objects that belong to it.

Managing installed extensions

In the last two recipes, we showed you how to install external modules in PostgreSQL to augment its capabilities.

In this recipe, we will show you some more capabilities offered by the extension infrastructure.

How to do it…

First, we list all available extensions:

postgres=# x on
Expanded display is on.
postgres=# SELECT *
postgres-# FROM pg_available_extensions
postgres-# ORDER BY name;
-[ RECORD 1 ]-----+--------------------------------------------------
name              | adminpack
default_version   | 1.0
installed_version | 
comment           | administrative functions for PostgreSQL
-[ RECORD 2 ]-----+--------------------------------------------------
name              | autoinc
default_version   | 1.0
installed_version | 
comment           | functions for autoincrementing fields
(...)

In particular, if the dblink extension is installed, then we see a record like this:

-[ RECORD 10 ]----+--------------------------------------------------
name              | dblink
default_version   | 1.0
installed_version | 1.0
comment           | connect to other PostgreSQL databases from within a database

Now, we can list all the objects in the dblink extension, as follows:

postgres=# x off
Expanded display is off.
postgres=# dx+ dblink
                      Objects in extension "dblink"
                           Object Description
---------------------------------------------------------------------
function dblink_build_sql_delete(text,int2vector,integer,text[])
function dblink_build_sql_insert(text,int2vector,integer,text[],text[])
function dblink_build_sql_update(text,int2vector,integer,text[],text[])
function dblink_cancel_query(text)
function dblink_close(text)
function dblink_close(text,boolean)
function dblink_close(text,text)
(...)

Objects created as parts of extensions are not special in any way, except that you can't drop them individually. This is done to protect you from mistakes:

postgres=# DROP FUNCTION dblink_close(text);
ERROR:  cannot drop function dblink_close(text) because extension dblink requires it
HINT:  You can drop extension dblink instead.

Extensions might have dependencies too. The cube and earthdistance contrib extensions provide a good example, since the latter depends on the former:

postgres=# CREATE EXTENSION earthdistance;
ERROR:  required extension "cube" is not installed
postgres=# CREATE EXTENSION cube;
CREATE EXTENSION
postgres=# CREATE EXTENSION earthdistance;
CREATE EXTENSION
As you can reasonably expect, dependencies are taken into account when dropping objects, just like for other objects:
postgres=# DROP EXTENSION cube;
ERROR:  cannot drop extension cube because other objects depend on it
DETAIL:  extension earthdistance depends on extension cube
HINT:  Use DROP ... CASCADE to drop the dependent objects too.
postgres=# DROP EXTENSION cube CASCADE;
NOTICE:  drop cascades to extension earthdistance
DROP EXTENSION

How it works…

The pg_available_extensions system view shows one row for each extension control file in the SHAREDIR/extension directory (see the Using an installed module recipe). The pg_extension catalog table records only the extensions that have actually been created.

The psql command-line utility provides the dx meta-command to examine e"x"tensions. It supports an optional plus sign (+) to control verbosity and an optional pattern for the extension name to restrict its range. Consider the following command:

dx+ db*

This will list all extensions whose name starts with db, together with all their objects.

The CREATE EXTENSION command creates all objects belonging to a given extension, and then records the dependency of each object on the extension in pg_depend. That's how PostgreSQL can ensure that you cannot drop one such object without dropping its extension.

The extension control file admits an optional line, requires, that names one or more extensions on which the current one depends. The implementation of dependencies is still quite simple. For instance, there is no way to specify a dependency on a specific version number of other extensions, and there is no command that installs one extension and all its prerequisites.

As a general PostgreSQL rule, the CASCADE keyword tells the DROP command to delete all the objects that depend on cube, the earthdistance extension in this example.

There's more…

Another system view, pg_available_extension_versions, shows all the versions available for each extension. It can be valuable when there are multiple versions of the same extension available at the same time, for example, when making preparations for an extension upgrade.

When a more recent version of an already installed extension becomes available to the database server, for instance because of a distribution upgrade that installs updated package files, the superuser can perform an upgrade by issuing the following command:

ALTER EXTENSION myext UPDATE TO '1.1';

This assumes that the author of the extension taught it how to perform the upgrade.

Starting from version 9.6, the CASCADE option is accepted also by the CREATE EXTENSION syntax, with the meaning of “issue CREATE EXTENSION recursively to cover all dependencies”. So, instead of creating extension cube before creating extension earthdistance, you could have just issued the following command:

postgres=# CREATE EXTENSION earthdistance CASCADE;
NOTICE:  installing required extension "cube"
CREATE EXTENSION

Remember that CREATE EXTENSION … CASCADE will only work if all the extensions it tries to install have already been placed in the appropriate location.

Summary

In this article, we studies about RTFM, how to plan new database, changing parameters in the program, changing configurations, updating parameter files, how to use a module which is already installed.