Packt+ | Advance your knowledge in tech

You're reading from Learn PostgreSQL Build and manage high-performance database solutions using PostgreSQL 12 and 13

Product type Paperback

Published in Oct 2020

Publisher Packt

ISBN-13 9781838985288

Length 650 pages

Edition 1st Edition

Languages

SQL

Tools

PostgreSQL

Concepts

Databases

Authors (2):

Ferrari

Enrico Pirozzi

View More author details

Exploring PostgreSQL terminology

A PostgreSQL instance is called a cluster because a single instance can serve and handle multiple databases. Every database is an isolated space where users and applications can store data.

A database is accessed by allowed users, but users connected to a database cannot cross the database boundaries and interact with data contained in another database, unless they explicitly connect to the latter database too.

A database can be organized into namespaces, called schemas. A schema is a mnemonic name that the user can assign to organize database objects, such as tables, into a more structured collection. Schemas cannot be nested, so they represent a flat namespace.

Database objects are represented by everything the user can create and manage within the database—for instance, tables, functions, triggers, and data types. Every object belongs to one and only one schema that, if not specified, is the default public schema.

Users are defined at a cluster-wide level, which means they are not tied to a particular database in the cluster. A user can connect with and manage any database in the cluster they have been allowed to.

PostgreSQL splits users into two main categories:

Normal users: These users are the ones who can connect to and handle databases and objects depending on their privilege set.
Superusers: These users can do anything with any database object.

PostgreSQL allows the configuration of as many superusers as you need, and every superuser has the very same permissions: they can do everything with every database and object and, most notably, can also control the life cycle of the cluster (for instance, they can terminate normal user connections, reload the configuration, stop the whole cluster, and so on).

PostgreSQL internal data, such as users, databases, namespaces, configuration, and database runtime status, is provided by means of catalogs: special tables that present information in a SQL-interactive way. Many catalogs are trimmed depending on the user who is inspecting them, with the exception that superusers usually see the whole set of available information.

PostgreSQL stores the user data (for example, tables) and its internal status on the local filesystem. This is an important point to keep in mind: PostgreSQL relies on the underlying filesystem to implement persistence, and therefore tuning the filesystem is an important task in order to make PostgreSQL perform well. In particular, PostgreSQL stores all of its content (user data and internal status) in a single filesystem directory known as PGDATA. The PGDATA directory represents what the cluster is serving as databases, so it is possible for you to have a single installation of PostgreSQL and make it switch to different PGDATA directories to deliver different content. In effect, this is a possible way to implement quick upgrades between major versions. As you will see in the next sections, the PGDATA directory needs to be initialized before it can be used by PostgreSQL; the initialization is the creation of the directory structure within PGDATA itself and is, of course, a one-time operation.

The detailed content of PGDATA will be explained later in the next chapter, but for now, it will suffice for you to remember that the PGDATA directory is where PostgreSQL expects to find data and configuration files. In particular, the PGDATA directory is made by at least the write-ahead logs (WALs) and the data storage. Without either of those two parts, the cluster is unable to guarantee data consistency and, in some critical circumstances, even start.

WALs are a technology that many database systems use, and even some transaction filesystems (such as ZFS, ReiserFS, UFS with Soft Updates, and so on) provide. The idea is that, before applying any change to a chunk of data, an intent log will be made persistent. In this case, if the cluster crashes, it can always rely on the already-written intent log to understand what operations have been completed and what must be recovered (more details on this in later chapters). Please note that with the term "crash," we refer to any possible disaster that can hit your cluster, including a software bug, but more likely the lack of electrical power, hard disk failures, and so on. PostgreSQL does commit to providing to you the best data consistency it can, and therefore, it makes a great effort to ensure that the intent log (WAL) is as secure as possible.

Internally, PostgreSQL keeps track of the tables structures, indexes, functions, and all the stuff needed to manage the cluster in dedicated storage named the catalog. The PostgreSQL catalog is fundamental for the life cycle of the cluster and reflects pretty much every action the database does on the user's structures and data. PostgreSQL provides access to the catalog from database superusers by means of an SQL interface, which means the catalog is totally explorable and, to some extent, manipulable, via SQL statements.

The SQL standard defines a so-called information schema, a collection of tables common to all standard database implementations, including PostgreSQL, that the DBA can use to inspect the internal status of the database itself. For instance, the information schema defines a table that collects information about all the user-defined tables so that it is possible to query the information schema to see whether a specific table exists or not.

The PostgreSQL catalog is what some call an "information schema on steroids": the catalog is much more accurate and PostgreSQL-specific that the general information schema, and the DBA can extract a lot more information about the PostgreSQL status from the catalog. Of course, PostgreSQL does support the information schema, but throughout the whole book, you will see references to the catalogs because they provide much more detailed information.

When the cluster is started, PostgreSQL launches a single process called the postmaster. The aim of the postmaster is to wait for incoming client connections, often made over a TCP/IP connection, and fork another process named the backend process, which in turn is in charge of serving one and only one connection.

This means that every time a new connection against the cluster is opened, the cluster reacts by launching a new backend process to serve it until the connection ends and the process is, consequently, destroyed. The postmaster usually starts also some utility processes that are responsible to keep PostgreSQL in good shape while it is running; these processes will be discussed later in this process.

To summarize, PostgreSQL provides you with executables that can be installed wherever you want on your system and can serve a single cluster. The cluster, in turn, serves data out of a single PGDATA directory that contains, among other stuff, the user data, the cluster internal status, and the WALs. Every time a client connects to the server, the postmaster process forks a new backend process that is the minion in charge of serving the connection.

This is a quick recap of the main terms used within PostgreSQL:

Cluster: Cluster refers to the whole PostgreSQL service.
Postmaster: This is the first process the cluster executes, and this process is responsible for keeping track of the activities of the whole cluster. The postmaster forks itself into a backend process every time a new connection is established.
Database: The database is an isolated data container to which users (or applications) can connect to. A cluster can handle multiple databases. A database can be made by different objects, including schemas (namespaces), tables, triggers, and other objects you will see as the book progresses.
PGDATA: PGDATA is the name of the directory that, on persistent storage, is fully dedicated to PostgreSQL and its data. PostgreSQL stores the data within such a directory.
WALs: WALs contains the intent log of database changes, used to recover data from a critical crash.

Now that we've discussed the basic terminology related to PostgreSQL, it is time to get it installed on your machine.