Tech News - Databases

233 Articles

Tom Swartz: Tuning Your Postgres Database for High Write Loads from Planet PostgreSQL

Matthew Emerick
14 Oct 2020
1 min read
As a database grows and scales up from a proof of concept to a full-fledged production instance, there are always a variety of growing pains that database administrators and systems administrators will run into. Very often, the engineers on the Crunchy Data support team help enterprise projects which start out as small proof-of-concept systems and are then promoted to large-scale production use. As these systems receive increased traffic load beyond their original proof-of-concept sizes, one issue may show up in the Postgres logs as the following:

LOG: checkpoints are occurring too frequently (9 seconds apart)
HINT: Consider increasing the configuration parameter "max_wal_size".
LOG: checkpoints are occurring too frequently (2 seconds apart)
HINT: Consider increasing the configuration parameter "max_wal_size".
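The excerpt stops at the hint, but the change it points toward lives in postgresql.conf. A hedged sketch with illustrative values only, not a recommendation; the right numbers depend on write volume, available disk, and recovery-time goals:

# postgresql.conf -- illustrative values only
max_wal_size = 16GB                 # default is 1GB; a larger WAL budget means fewer forced checkpoints
checkpoint_timeout = 15min          # let time, rather than WAL volume, trigger most checkpoints
checkpoint_completion_target = 0.9  # spread checkpoint I/O across the interval

After a change along these lines, checkpoints should be driven by checkpoint_timeout rather than by running out of WAL space, and the log warnings above should subside.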


Bruce Momjian: Thirty Years of Continuous PostgreSQL Development from Planet PostgreSQL

Matthew Emerick
14 Oct 2020
1 min read
I did an interview with EDB recently, and a blog post based on that interview was published yesterday. It covers the Postgres 13 feature set and the effects of open source on the software development process.


Hans-Juergen Schoenig: pg_squeeze: Optimizing PostgreSQL storage from Planet PostgreSQL

Matthew Emerick
14 Oct 2020
8 min read
Is your database growing at a rapid rate? Does your database system slow down all the time? And maybe you have trouble understanding why this happens? Maybe it is time to take a look at pg_squeeze and fix your database once and for all. pg_squeeze has been designed to shrink your database tables without downtime. No more need for VACUUM FULL – pg_squeeze has it all.

The first question any PostgreSQL person will ask is: why not use VACUUM or VACUUM FULL? There are various reasons. A normal VACUUM does not really shrink the table on disk; it will look for free space, but it won't return this space to the operating system. VACUUM FULL does return space to the operating system, but it needs a table lock. If your table is small, this usually does not matter. However, what if your table is many TBs in size? You cannot simply lock up a large table for hours just to shrink it after table bloat has ruined performance. pg_squeeze can shrink large tables using only a small, short lock. However, there is more. With minimal locking, pg_squeeze can shrink tables, move tables and indexes from one tablespace to another, index-organize ("cluster") a table, and change the on-disk FILLFACTOR. After this basic introduction it is time to take a look at how pg_squeeze can be installed and configured.

PostgreSQL: Installing pg_squeeze

pg_squeeze can be downloaded for free from our GitHub repository. However, binary packages are available for most Linux distributions. If you happen to run Solaris, AIX, FreeBSD or some other less widespread operating system, just get in touch with us. We are eager to help. After you have compiled pg_squeeze or installed the binaries, some changes have to be made to postgresql.conf:

wal_level = logical
max_replication_slots = 10  # minimum 1
shared_preload_libraries = 'pg_squeeze'

The most important thing is to set wal_level to logical. Internally, pg_squeeze works as follows: it creates a new datafile (snapshot) and then applies changes made to the table while this snapshot is copied over. This is done using logical decoding. Of course, logical decoding needs replication slots. Finally, the library has to be loaded when PostgreSQL is started. That is basically it – pg_squeeze is ready for action.

Understanding table bloat in PostgreSQL

Before we dive deeper into pg_squeeze it is important to understand table bloat in general. Let us take a look at the following example:

test=# CREATE TABLE t_test (id int);
CREATE TABLE
test=# INSERT INTO t_test SELECT * FROM generate_series(1, 2000000);
INSERT 0 2000000
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 69 MB
(1 row)

Once we have imported 2 million rows, the size of the table is 69 MB. What happens if we update these rows and simply add one?

test=# UPDATE t_test SET id = id + 1;
UPDATE 2000000
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 138 MB
(1 row)

The size of the table doubles. Remember, UPDATE has to duplicate the row, which of course eats up some space. The most important observation, however, is this: if you run VACUUM, the size of the table on disk is still 138 MB – storage IS NOT returned to the operating system. VACUUM can shrink tables in some rare instances, but in reality the table is basically never going to return space to the filesystem, which is a major issue.
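Before reaching for pg_squeeze on a production system, it can help to confirm that a table really has accumulated a lot of dead rows. A hedged sketch using the standard statistics view pg_stat_user_tables; the 10-row limit and the percentage cut are purely illustrative:

-- Rough bloat indicator: tables with many dead tuples relative to live ones
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / greatest(n_live_tup + n_dead_tup, 1), 1) AS dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;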
Table bloat is one of the most frequent reasons for bad performance, so it is important to either prevent it or make sure the table is allowed to shrink again.

PostgreSQL: Shrinking tables again

If you want to use pg_squeeze you have to make sure that a table has a primary key. It is NOT enough to have unique indexes – it really has to be a primary key. The reason is that we use replica identities internally, so we basically suffer from the same restrictions as other tools using logical decoding. Let us add a primary key and squeeze the table:

test=# ALTER TABLE t_test ADD PRIMARY KEY (id);
ALTER TABLE
test=# SELECT squeeze.squeeze_table('public', 't_test', null, null, null);
 squeeze_table
---------------

(1 row)

Calling pg_squeeze manually is one way to handle a table. It is the preferred method if you want to shrink a table once. As you can see, the table is smaller than before:

test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 69 MB
(1 row)

The beauty is that minimal locking was needed to do that.

Scheduling table reorganization

pg_squeeze has a built-in job scheduler which can operate in many ways. It can tell the system to squeeze a table within a certain timeframe or trigger a process in case some thresholds have been reached. Internally, pg_squeeze uses configuration tables to control its behavior. Here is how it works:

test=# \d squeeze.tables
                                   Table "squeeze.tables"
      Column      |       Type       | Collation | Nullable |                  Default
------------------+------------------+-----------+----------+--------------------------------------------
 id               | integer          |           | not null | nextval('squeeze.tables_id_seq'::regclass)
 tabschema        | name             |           | not null |
 tabname          | name             |           | not null |
 clustering_index | name             |           |          |
 rel_tablespace   | name             |           |          |
 ind_tablespaces  | name[]           |           |          |
 free_space_extra | integer          |           | not null | 50
 min_size         | real             |           | not null | 8
 vacuum_max_age   | interval         |           | not null | '01:00:00'::interval
 max_retry        | integer          |           | not null | 0
 skip_analyze     | boolean          |           | not null | false
 schedule         | squeeze.schedule |           | not null |
Indexes:
    "tables_pkey" PRIMARY KEY, btree (id)
    "tables_tabschema_tabname_key" UNIQUE CONSTRAINT, btree (tabschema, tabname)
Check constraints:
    "tables_free_space_extra_check" CHECK (free_space_extra >= 0 AND free_space_extra < 100)
    "tables_min_size_check" CHECK (min_size > 0.0::double precision)
Referenced by:
    TABLE "squeeze.tables_internal" CONSTRAINT "tables_internal_table_id_fkey" FOREIGN KEY (table_id) REFERENCES squeeze.tables(id) ON DELETE CASCADE
    TABLE "squeeze.tasks" CONSTRAINT "tasks_table_id_fkey" FOREIGN KEY (table_id) REFERENCES squeeze.tables(id) ON DELETE CASCADE
Triggers:
    tables_internal_trig AFTER INSERT ON squeeze.tables FOR EACH ROW EXECUTE FUNCTION squeeze.tables_internal_trig_func()

The last column here is worth mentioning: it is a custom data type capable of holding cron-style scheduling information.
The custom data type looks as follows:

test=# \d squeeze.schedule
        Composite type "squeeze.schedule"
    Column     |       Type       | Collation | Nullable | Default
---------------+------------------+-----------+----------+---------
 minutes       | squeeze.minute[] |           |          |
 hours         | squeeze.hour[]   |           |          |
 days_of_month | squeeze.dom[]    |           |          |
 months        | squeeze.month[]  |           |          |
 days_of_week  | squeeze.dow[]    |           |          |

If you want to make sure that pg_squeeze takes care of a table, simply insert the configuration into the table:

test=# INSERT INTO squeeze.tables (tabschema, tabname, schedule)
       VALUES ('public', 't_test', ('{30}', '{22}', NULL, NULL, '{3, 5}'));
INSERT 0 1

In this case public.t_test will be squeezed at 22:30 in the evening every 3rd and 5th day of the week. The main question is: when is that? In our setup days 0 and 7 are Sundays, so 3 and 5 mean Wednesday and Friday at 22:30. Let us check what the configuration looks like:

test=# \x
Expanded display is on.
test=# SELECT *, (schedule).* FROM squeeze.tables;
-[ RECORD 1 ]----+----------------------
id               | 1
tabschema        | public
tabname          | t_test
clustering_index |
rel_tablespace   |
ind_tablespaces  |
free_space_extra | 50
min_size         | 8
vacuum_max_age   | 01:00:00
max_retry        | 0
skip_analyze     | f
schedule         | ({30},{22},,,"{3,5}")
minutes          | {30}
hours            | {22}
days_of_month    |
months           |
days_of_week     | {3,5}

Once this configuration is in place, pg_squeeze will automatically take care of things. Everything is controlled by configuration tables, so you can easily control and monitor the inner workings of pg_squeeze.

Handling errors

If pg_squeeze decides to take care of a table, it can happen that the reorg process actually fails. Why is that the case? One might drop a table and recreate it, the structure might change, or pg_squeeze might not be able to get the brief lock at the end. Of course it is also possible that the tablespace you want to move a table to does not have enough space. There are many issues which can lead to errors, so one has to track those reorg processes. The way to do that is to inspect squeeze.errors:

test=# SELECT * FROM squeeze.errors;
 id | occurred | tabschema | tabname | sql_state | err_msg | err_detail
----+----------+-----------+---------+-----------+---------+------------
(0 rows)

This log table contains all the relevant information needed to track things fast and easily.

Finally …

pg_squeeze is not the only open source tool we have published for PostgreSQL. If you are looking for a cutting-edge scheduler, we recommend taking a look at what pg_timetable has to offer. The post pg_squeeze: Optimizing PostgreSQL storage appeared first on Cybertec.


rob sullivan: Using pg_repack in AWS RDS from Planet PostgreSQL

Matthew Emerick
13 Oct 2020
4 min read
As your database keeps growing, there is a good chance you're going to have to address database bloat. While Postgres 13 has launched with some exciting features, including built-in methods to rebuild indexes concurrently, many people still end up having to use pg_repack to do an online rebuild of the tables to remove the bloat. Customers on AWS RDS struggle to figure out how to do this. Ready to learn how?

Since you have no server to access the local binaries, and because AWS RDS provides no binaries for the versions they are using, you're going to have to build your own. This isn't as hard as one might think, because the official Postgres repos have an installer (i.e. sudo apt install postgresql-10-pg_repack). If you don't use the repos, the project itself is open source with directions: http://reorg.github.io/pg_repack/

While you were getting up to speed above, I was spinning up a Postgres 10.9 db on RDS. I started it yesterday so that it would be ready by the time you got to this part of the post. Let's create some data:

-- let's create the table
CREATE TABLE burritos (
  id SERIAL UNIQUE NOT NULL PRIMARY KEY,
  title VARCHAR(10) NOT NULL,
  toppings TEXT NOT NULL,
  thoughts TEXT,
  code VARCHAR(4) NOT NULL,
  UNIQUE (title, toppings)
);
-- disable auto vacuum
ALTER TABLE burritos SET (autovacuum_enabled = false, toast.autovacuum_enabled = false);
-- orders up
INSERT INTO burritos (title, toppings, thoughts, code)
SELECT left(md5(i::text), 10), md5(random()::text), md5(random()::text), left(md5(random()::text), 4)
FROM GENERATE_SERIES(1, 1000000) s(i);
UPDATE burritos SET toppings = md5(random()::text) WHERE id < 250;
UPDATE burritos SET toppings = md5(random()::text) WHERE id BETWEEN 250 AND 500;
UPDATE burritos SET code = left(md5(random()::text), 4) WHERE id BETWEEN 2050 AND 5000;
UPDATE burritos SET thoughts = md5(random()::text) WHERE id BETWEEN 10000 AND 20000;
UPDATE burritos SET thoughts = md5(random()::text) WHERE id BETWEEN 800000 AND 900000;
UPDATE burritos SET toppings = md5(random()::text) WHERE id BETWEEN 600000 AND 700000;

(If you are curious how Magistrate presents bloat, the original post includes a screenshot.) Much like a human that has had that much interaction with burritos... our database has quite a bit of bloat. Assuming we already have the pg_repack binaries in place, either through compilation or installing the package on the OS, we now need to enable the extension. We've put together a handy reference for installing extensions to get you going. pg_repack has a lot of options.
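For reference, the enablement step mentioned above is normally a single statement run against the target database (a hedged sketch; on RDS the extension has to be in the engine's supported list for your Postgres version):

-- Run once in the database that holds the bloated tables
CREATE EXTENSION pg_repack;

With the extension present on the server side and matching client binaries built locally, the repack run below can talk to the database.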
Feel free to check those options out, but I'm going to start packing:

/usr/local/bin/pg_repack -U greataccounthere -h bloatsy.csbv99zxhbsh.us-east-2.rds.amazonaws.com -d important -t burritos -j 4
NOTICE: Setting up workers.conns
ERROR: pg_repack failed with error: You must be a superuser to use pg_repack

This might feel like game over because of the implementation of superuser on RDS, but the trick is to take a leap of faith and add another flag (-k) that skips the superuser check:

/usr/local/bin/pg_repack-1.4.3/pg_repack -U greataccounthere -h bloatsy.csbv99zxhbsh.us-east-2.rds.amazonaws.com -k -d important -t burritos -j 4
NOTICE: Setting up workers.conns
INFO: repacking table "public.burritos"
LOG: Initial worker 0 to build index: CREATE UNIQUE INDEX index_16449 ON repack.table_16442 USING btree (id) TABLESPACE pg_default
LOG: Initial worker 1 to build index: CREATE UNIQUE INDEX index_16451 ON repack.table_16442 USING btree (title, toppings) TABLESPACE pg_default
LOG: Command finished in worker 0: CREATE UNIQUE INDEX index_16449 ON repack.table_16442 USING btree (id) TABLESPACE pg_default
LOG: Command finished in worker 1: CREATE UNIQUE INDEX index_16451 ON repack.table_16442 USING btree (title, toppings) TABLESPACE pg_default

It works! The table is feeling fresh and tidy and your application has a little more pep in its step. When using Magistrate, our platform matrix also knows when you have pg_repack installed and gives you the commands to run for tables it detects with a high bloat percentage.
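If you want to confirm the effect yourself rather than rely on the tooling, a hedged before-and-after check using only built-in functions looks like this:

-- Compare heap size and total size (including indexes and TOAST) before and after the repack
SELECT pg_size_pretty(pg_relation_size('burritos'))       AS table_size,
       pg_size_pretty(pg_total_relation_size('burritos')) AS total_size;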


Jonathan Katz: PostgreSQL Monitoring for App Developers: Alerts & Troubleshooting from Planet PostgreSQL

Matthew Emerick
12 Oct 2020
3 min read
We've seen an example of how to set up PostgreSQL monitoring in Kubernetes. We've looked at two sets of statistics to keep track of in your PostgreSQL cluster: your vitals (CPU/memory/disk/network) and your DBA fundamentals. While staring at these charts should help you to anticipate, diagnose, and respond to issues with your Postgres cluster, the odds are that you are not staring at your monitor 24 hours a day. This is where alerts come in: a properly set up alerting system will let you know if you are on the verge of a major issue so you can head it off at the pass (and alerts should also let you know when there is a major issue).

Dealing with operational production issues was a departure from my application developer roots, but I looked at it as an opportunity to learn a new set of troubleshooting skills. It also offered an opportunity to improve communication skills: I would often convey to the team and customers what transpired during a downtime or performance degradation situation (VSSE: be transparent!). Some of what I observed I used to help us improve the application, while other parts helped me to better understand how PostgreSQL works. But I digress: let's drill into alerts on your Postgres database.

Note that just because an alert or alarm is going off, it does not mean you need to immediately react: for example, a transient network degradation issue may cause a replica to lag further behind a primary for a bit too long but will clear up when the degradation passes. That said, you typically want to investigate the alert to understand what is causing it. Additionally, it's important to understand what actions you want to take to solve the problem. For example, a common mistake during an "out-of-disk" error is to delete the PostgreSQL WAL logs with a rm command; doing so can lead to a very bad day (and is also an advertisement for ensuring you have backups).

As mentioned in the post on setting up PostgreSQL monitoring in Kubernetes, the Postgres Operator uses pgMonitor for metric collection and visualization via open source projects like Prometheus and Grafana. pgMonitor uses the open source Alertmanager for configuring and sending alerts, and that is what the PostgreSQL Operator uses. Using the above, let's dive into some of the items that you should be alerting on, and I will describe how my experience as an app developer translated into troubleshooting strategies.
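As one concrete illustration of the kind of check that sits behind a replica-lag alert, here is a hedged query run on the primary; pg_stat_replication and pg_wal_lsn_diff() are standard in Postgres 10 and later, but the thresholds and the wiring into Prometheus/Alertmanager are deliberately left out:

-- On the primary: approximate how far each replica is behind, in bytes
SELECT application_name,
       client_addr,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;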


MongoDB withdraws controversial Server Side Public License from the Open Source Initiative's approval process

Richard Gall
12 Mar 2019
4 min read
MongoDB's Server Side Public License was controversial when it was first announced back in October. But the team were, back then, confident that the new license met the Open Source Initiative's approval criteria. However, things seem to have changed. The news that Red Hat was dropping MongoDB over the SSPL in January was a critical blow and appears to have dented MongoDB's ambitions. Last Friday, co-founder and CTO Eliot Horowitz announced that MongoDB had withdrawn its submission to the Open Source Initiative. Horowitz wrote on the OSI approval mailing list that "the community consensus required to support OSI approval does not currently appear to exist regarding the copyleft provision of SSPL." Put simply, the debate around MongoDB's SSPL appears to have led its leadership to reconsider its approach.

Update: this article was amended 19.03.2019 to clarify that the Server Side Public License only requires commercial users (i.e. X-as-a-Service products) to open source their modified code. Any other users can still modify and use MongoDB code for free.

What's the purpose of MongoDB's Server Side Public License?

The Server Side Public License was developed by MongoDB as a means of protecting the project from "large cloud vendors" who want to "capture all of the value but contribute nothing back to the community." Essentially, the license included a key modification to section 13 of the standard GPL (General Public License) that governs most open source software available today. You can read the SSPL in full here, but this is the crucial sentence:

"If you make the functionality of the Program or a modified version available to third parties as a service, you must make the Service Source Code available via network download to everyone at no charge, under the terms of this License."

This would mean that users are free to review, modify, and distribute the software or redistribute modifications to the software. It's only if a user modifies or uses the source code as part of an as-a-service offering that the full service must be open sourced. So essentially, anyone is free to modify MongoDB. It's only when you offer MongoDB as a commercial service that the conditions of the SSPL state that you must open source the entire service.

What issues do people have with the Server Side Public License?

The logic behind the SSPL seems sound, and probably makes a lot of sense in the context of an open source landscape that's almost being bled dry. But it presents a challenge to the very concept of open source software, where the idea that software should be free to use and modify - and, indeed, to profit from - is absolutely central. Moreover, even if it makes sense as a way of defending open source projects from the power of multinational tech conglomerates, it could be argued that the consequences of the license could harm smaller tech companies. As one user on Hacker News explained back in October:

"Let [sic] say you are a young startup building a cool SaaS solution. E.g. A data analytics solution. If you make heavy use of MongoDB it is very possible that down the line the good folks at MongoDB come calling since 'the value of your SaaS derives primarily from MongoDB...' So at that point you have two options - buy a license from MongoDB or open source your work (which they can conveniently leverage at no cost)."

The Hacker News thread is very insightful on the reasons why the license has been so controversial. Another Hacker News user, for example, described the license as "either idiotic or malevolent."
Read next: We need to encourage the meta-conversation around open source, says Nadia Eghbal [Interview]

What next for the Server Side Public License?

The license might have been defeated, but Horowitz and MongoDB are still optimistic that they can find a solution. "We are big believers in the importance of open source and we intend to continue to work with these parties to either refine the SSPL or develop an alternative license that addresses this issue in a way that will be accepted by the broader FOSS community," he said. Whatever happens next, it's clear that there are some significant challenges for the open source world that will require imagination and maybe even some risk-taking to properly solve.

Protocol flaw in MySQL client allows MySQL server to request any local file from MySQL client

Melisha Dsouza
21 Jan 2019
2 min read
Last week, Willem de Groot, a digital forensics consultant, discovered a protocol flaw in MySQL which he alleges is the main reason behind e-commerce and government sites getting hacked via the Adminer database tool. He stated that Adminer can be "lured to disclose arbitrary files", which attackers can then misuse to fetch passwords for popular apps such as Magento and WordPress, thus gaining control of a site's database. Because of this flaw, the MySQL client allows a MySQL server to request any local file by default. He further states that an example of such a malicious MySQL server can be found on GitHub and was "likely used to exfiltrate passwords from these hacked sites". A Reddit user also pointed out that the flaw could be further exploited to steal SSH keys and crypto wallets. The only mitigating factor is that the server has to know the full path of the file on the client to exploit this flaw. Unlike Adminer, several clients and libraries, including Golang, Python, and PHP-PDO, have built-in protection against this "feature" or disable it by default. Surprisingly, this behaviour is part of the MySQL documentation (the original article quotes the relevant passage as a screenshot: Source: MySQL Documentation).

You can head over to Willem de Groot's blog for more insights on this news. Alternatively, head over to his Twitter thread for a more in-depth discussion on the topic.

How to optimize MySQL 8 servers and clients
6 reasons to choose MySQL 8 for designing database solutions
12 most common MySQL errors you should be aware of
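Coming back to the flaw itself: the article does not name the mechanism, but the file-transfer capability it describes is generally associated with the client's LOCAL INFILE support, which can be switched off. A hedged sketch of that mitigation (the host name is illustrative, and the assumption that this covers the attack path described above is mine, not the article's):

-- Server side: refuse LOCAL INFILE transfers entirely
SET GLOBAL local_infile = OFF;

# Client side: start the mysql CLI with the capability disabled
mysql --local-infile=0 -h db.example.com -u app_user -p

# Or persist it in the client's my.cnf
[mysql]
local-infile=0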


MySQL 8.0 is generally available with added features

Amey Varangaonkar
23 Apr 2018
2 min read
The long-awaited general release of MySQL 8.0 was finally announced last week. MySQL, as we all know, is the world's most popular open source database. Despite the growing adoption of NoSQL databases, MySQL continues to be widely used across the industry. The new features introduced in version 8.0 bring significant upgrades in performance and security, as well as in database development and administration.

What's new in MySQL 8.0 server?

Let's take a quick overview of all the new features and additions introduced in MySQL 8.0, and how they may affect developers and DBAs:

SQL window functions, including major enhancements to table expressions, indexes, regular expressions, and more
New JSON functions and performance enhancements for working with JSON values
GIS support, which means MySQL 8.0 is now capable of handling and working with geographic data efficiently; spatial data types, indexes, and functions have been introduced
Better reliability, with DDL statements becoming atomic and crash-safe
New enhancements to InnoDB, meaning the metadata is now stored more securely and can be worked with in a better manner
Significant enhancements to the performance schema, configuration variables, and error logging
New security enhancements, with improvements to OpenSSL, SQL roles, and changes to authentication and privileges
Performance improvements, with InnoDB now able to handle read/write workloads better and optimize resources more effectively

There are a lot more enhancements to the MySQL database, such as replication, the MySQL Shell, and the different DevAPI-based connectors. To know more about the newly added features in MySQL 8.0 in detail, you can check out the official blog page. Download the in-demand 8.0 release to try the new features of MySQL! Additionally, to upgrade your existing MySQL installation from the previous version, you can also check out the official MySQL 8.0 documentation.

Read more:
Top 10 MySQL 8 performance benchmarking aspects to know
12 most common MySQL errors you should be aware of
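To make the window-function item in the list above concrete, here is a hedged example; the employees table and its columns are hypothetical, but the RANK() ... OVER syntax is the 8.0 feature being described:

-- Hypothetical table: rank employees by salary within each department.
-- Window functions such as RANK() OVER (...) are new in MySQL 8.0 and not available in 5.7.
SELECT department,
       name,
       salary,
       RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employees;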