PostgreSQL Weekly News - May 9, 2021

Posted on 2021-05-10 by PWN
PWN

PostgreSQL Weekly News - May 9, 2021

Person of the week: https://postgresql.life/post/dave_page/

PostgreSQL Product News

pgBackRest 2.33, a backup and restore system for PostgreSQL, released. https://pgbackrest.org/release.html#2.33

pg_probackup 2.4.15, a utility to manage backup and recovery of PostgreSQL database clusters, released. https://github.com/postgrespro/pg_probackup/releases/tag/2.4.15

PostgreSQL Jobs for May

https://archives.postgresql.org/pgsql-jobs/2021-05/

PostgreSQL in the News

Planet PostgreSQL: https://planet.postgresql.org/

PostgreSQL Weekly News is brought to you this week by David Fetter

Submit news and announcements by Sunday at 3:00pm PST8PDT to david@fetter.org.

Applied Patches

Alexander Korotkov pushed:

Amit Kapila pushed:

Peter Eisentraut pushed:

Tom Lane pushed:

  • Fix performance issue in new regex match-all detection code. Commit 824bf7190 introduced a new search of the NFAs generated by regex compilation. I failed to think hard about the performance characteristics of that search, with the predictable outcome that it's bad: weird regexes can trigger exponential search time. Worse, there's no check-for-interrupt in that code, so you can't even cancel the query if this happens. Fix by introducing memo-ization of the search results, so that any one NFA state need be examined in detail just once. This potentially uses a lot of memory, but we can bound the memory usage by putting a limit on the number of states for which we'll try to prove match-all-ness. That is sane because we already have a limit (DUPINF) on the maximum finite string length that a matchall regex can match; and patterns that involve much more than DUPINF states would probably exceed that limit anyway. Also, rearrange the logic so that we check the basic is-the-graph- all-RAINBOW-arcs property before we start the recursive search to determine path lengths. This will ensure that we fall out quickly whenever the NFA couldn't possibly be matchall. Also stick in a check-for-interrupt, just in case these measures don't completely eliminate the risk of slowness. Discussion: https://postgr.es/m/3483895.1619898362@sss.pgh.pa.us https://git.postgresql.org/pg/commitdiff/f68970e33f4dc48094c24c78c452ad730ae9ae12

  • Disable cache clobber to avoid breaking postgres_fdw termination test. Commit 93f414614 improved a pre-existing test case so that it would show whether or not termination of the "remote" worker process happened. This soon exposed that, when debug_invalidate_system_caches_always (nee CLOBBER_CACHE_ALWAYS) is enabled, no such termination occurs. That's because cache invalidation forces postgres_fdw connections to be dropped at end of transaction, so that there's no worker to terminate. There's a race condition as to whether the worker will manage to get out of the BackendStatusArray before we look, but at least on buildfarm member hyrax, it's failed twice in two attempts. Rather than re-lobotomizing the test, let's fix this by transiently disabling debug_invalidate_system_caches_always. (Hooray for that being just a GUC nowadays, rather than a compile-time option.) If this proves not to be enough to make the test stable, we can do the other thing instead. Discussion: https://postgr.es/m/3854538.1620081771@sss.pgh.pa.us https://git.postgresql.org/pg/commitdiff/1273a15bf91fa322915e32d3b6dc6ec916397268

  • Doc: improve and centralize the documentation for OID alias types. Previously, a lot of information about type regclass existed only in the discussion of the sequence functions. Maybe that made sense in the beginning, because I think originally those were the only functions taking regclass. But it doesn't make sense anymore. Move that material to the "Object Identifier Types" section in datatype.sgml, generalize it to talk about the other reg* types as well, and add more examples. Per bug #16991 from Federico Caselli. Discussion: https://postgr.es/m/16991-bcaeaafa17e0a723@postgresql.org https://git.postgresql.org/pg/commitdiff/f33a178a34809a2bae7a5f4c00984d87771f4204

  • Doc: update RELEASE_CHANGES checklist. Update checklist to reflect current practice: The platform-specific FAQ files are long gone. We've never routinely updated the libbind code we borrowed, either, and there seems no reason to start now. Explain current practice of running pgindent twice per cycle. Discussion: https://postgr.es/m/4038398.1620238684@sss.pgh.pa.us https://git.postgresql.org/pg/commitdiff/e8ce68b0b9ae2757c6153a88bf869904d2d5ac0b

  • Doc: trivial wording adjustment. Improve self-referential foreign key example, per suggestion from David Johnston. Discussion: https://postgr.es/m/CAKFQuwZTke7+HUn4YUGqu2+gAPi4Cy18TXMrg_Z5nADkxfPNMw@mail.gmail.com https://git.postgresql.org/pg/commitdiff/c38cadc0907a8d071b043b2b32b83efa09db38ea

  • Doc: copy-editing for debug_invalidate_system_caches_always description. I came to fix "useful only useful", but the more I looked at the text the more things I thought could be improved. https://git.postgresql.org/pg/commitdiff/f9b809e7fbe36cd3fe1ce33edb277288a31da386

  • Sync guc.c and postgresql.conf.sample with the SGML docs. It seems that various people have moved GUCs around in the config.sgml listing without bothering to make the code agree. Ensure that the config_group codes assigned to GUCs match where they are listed in config.sgml. Likewise ensure that postgresql.conf.sample lists GUCs in the same sub-section and same ordering as they appear in config.sgml. (I've got some doubts about some of these choices, but for the purposes of this patch, we'll treat config.sgml as gospel.) Notably, this requires adding a WAL_RECOVERY config_group value, because 1d257577e didn't. As long as we're renumbering that enum anyway, let's take out the values corresponding to major groups that are divided into sub-groups. No GUC should be assigned to the major group itself, so those values just create a temptation to do the wrong thing, while adding work for translators. In passing, adjust the short_desc strings for PRESET_OPTIONS GUCs to uniformly use the phrasing "Shows XYZ.", removing the impression some of these strings left that you can set the value. While some of these errors are old, no back-patch, as changing the contents of the pg_settings view in stable branches seems more likely to be seen as a compatibility break than anything helpful. Bharath Rupireddy, Justin Pryzby, Tom Lane Discussion: https://postgr.es/m/16997-ff16127f6e0d1390@postgresql.org Discussion: https://postgr.es/m/20210413123139.GE6091@telsasoft.com https://git.postgresql.org/pg/commitdiff/a55a98477b690dedb9b4368d7e5710c8e7fa534e

Robert Haas pushed:

Bruce Momjian pushed:

Álvaro Herrera pushed:

Andres Freund pushed:

Peter Geoghegan pushed:

Thomas Munro pushed:

  • Revert per-index collation version tracking feature. Design problems were discovered in the handling of composite types and record types that would cause some relevant versions not to be recorded. Misgivings were also expressed about the use of the pg_depend catalog for this purpose. We're out of time for this release so we'll revert and try again. Commits reverted: 1bf946bd: Doc: Document known problem with Windows collation versions. cf002008: Remove no-longer-relevant test case. ef387bed: Fix bogus collation-version-recording logic. 0fb0a050: Hide internal error for pg_collation_actual_version(<bad OID>). ff942057: Suppress "warning: variable 'collcollate' set but not used". d50e3b1f: Fix assertion in collation version lookup. f24b1569: Rethink extraction of collation dependencies. 257836a7: Track collation versions for indexes. cd6f479e: Add pg_depend.refobjversion. 7d1297df: Remove pg_collation.collversion. Discussion: https://postgr.es/m/CA%2BhUKGLhj5t1fcjqAu8iD9B3ixJtsTNqyCCD4V0aTO9kAKAjjA%40mail.gmail.com https://git.postgresql.org/pg/commitdiff/ec48314708262d8ea6cdcb83f803fc83dd89e721

  • Doc: Update notes about libc collation versions. The per-index collation version tracking feature was reverted, but we still have the ability to ask Windows (352f6f2d) and FreeBSD (ca051d8b) for collation versions to store in pg_collation.collversion. So, from the reverted patch, take a few words of documentation about libc on all three supported OSes to replace the pre-existing note that mentioned only glibc. Discussion: https://postgr.es/m/CA%2BhUKGLhj5t1fcjqAu8iD9B3ixJtsTNqyCCD4V0aTO9kAKAjjA%40mail.gmail.com https://git.postgresql.org/pg/commitdiff/b65431ca5e12a475ba7cf68afb63edb070c2ce08

Tomáš Vondra pushed:

Andrew Dunstan pushed:

Michaël Paquier pushed:

David Rowley pushed:

  • Move memory accounting Asserts for Result Cache code. In 9eacee2e6, I included some code to verify the cache's memory tracking is correct by counting up the number of entries and the memory they use each time we evict something from the cache. Those values are then compared to the expected values using Assert. The problem is that this requires looping over the entire cache hash table each time we evict an entry from the cache. That can be pretty expensive, as noted by Pavel Stehule. Here we move this memory accounting checking code so that we only verify it on cassert builds once when shutting down the Result Cache node. Aside from the performance increase, this has two distinct advantages: 1) We do the memory checks at the last possible moment before destroying the cache. This means we'll now catch accounting problems that might sneak in after a cache eviction. 2) We now do the memory Assert checks when there were no cache evictions. This increases the coverage. One small disadvantage is that we'll now miss any memory tracking issues that somehow managed to resolve themselves by the end of execution. However, it seems to me that such a memory tracking problem would be quite unlikely, and likely somewhat less harmful if one were to exist. In passing, adjust the loop over the hash table to use the standard simplehash.h method of iteration. Reported-by: Pavel Stehule Discussion: https://postgr.es/m/CAFj8pRAzgoSkdEiqrKbT=7yG9FA5fjUAP3jmJywuDqYq6Ki5ug@mail.gmail.com https://git.postgresql.org/pg/commitdiff/92c4c269d24d016c19858a21347ff25a7de1f486

Pending Patches

Aleksander Alekseev sent in another revision of a patch to allow CustomScan nodes to signal projection support.

Melanie Plageman sent in another revision of a patch to avoid an immediate fsync for just-built indexes.

Dilip Kumar sent in two revisions of a patch to add TOAST compression method options.

Vigneshwaran C sent in four more revisions of a patch to identify missing publications from publisher during CREATE/ALTER SUBSCRIPTION.

Bertrand Drouvot sent in a patch to intended to fix a bug that manifested as pg_upgrade can result in early wraparound on databases with high transaction load by exactly copying the oldestXid limit from the source database.

Bertrand Drouvot sent in a patch to intended to fix a bug that manifested as orphaned dependencies by adding a new catalog snapshot is DirtyCatalogSnapshot controlled by a new UseDirtyCatalogSnapshot variable, and using it to check for in-flight dependencies and get the objects' description to generate the error messages.

Thomas Munro sent in another revision of a patch to add a new TAP test under src/test/recovery to run the standard regression tests with a streaming replica replaying the WAL. This provides a basic workout for WAL decoding and redo code.

Mark Dilger sent in two revisions of a patch to extend amcheck to check TOAST size and compression.

Jeff Davis sent in a patch to bound ItemPointers in table AMs in a way that other parts of the system expect.

Dilip Kumar and Kyotaro HORIGUCHI traded patches to intended to fix a bug that manifested as a race condition in recovery.

Craig Ringer sent in another revision of a patch to make Valgrind runs simpler with make USE_VALGRIND=1.

David Fetter sent in a patch to clarify some column descriptions in pg_constraint.

Vigneshwaran C sent in two more revisions of a patch to print the backtrace of a specified postgres process.

Craig Ringer sent in a patch to emit dummy statements for probes.d probes when disabled.

Jeff Davis sent in another revision of a patch to implement ALTER TABLE ... SET ACCESS METHOD.

Amit Langote sent in a patch to clarify the documentation of ALTER TABLE ... DETACH PARTITION CONCURRENTLY.

Etsuro Fujita sent in a patch to fix EXPLAIN ANALYZE for async capable nodes.

Etsuro Fujita sent in a patch to describe the way asynchronous execution works for FDWs.

Nitin Jadhav sent in a patch to implement list partitioning for multiple columns.

Yugo Nagata sent in another revision of a patch to implement incremental view maintenance.

Konstantin Knizhnik sent in another revision of a patch to remove unneeded self-joins.

Andrey Borodin sent in another revision of a patch to speed up GiST index builds.

Etsuro Fujita sent in a patch to fix an infelicity between inherited UPDATE/DELETE and async execution by modifying postgresPlanDirectModify() so that it clears the async-capable flag if it is set.

James Coleman sent in a patch to parallelize correlated subqueries that execute within each worker.

Jeff Janes sent in a patch to add configure cache checkpoints before fatal checks.

Hou Zhijie sent in another revision of a patch to fix EXPLAIN's information about FDW batch inserts.

Erik Rijkers and Alexander Korotkov traded patches to improve the example in JSON path queries to highlight what's actually happening.

Bruce Momjian sent in another revision of a patch to fix a misbehavior in interval arithmetic.

Hou Zhijie sent in a patch to fix a bug where there was an inaccurate error message when fdw batch_size was set to 0.

David Rowley sent in another revision of a patch to speed up NOT IN() with a set of Consts.

Andrew Dunstan sent in another revision of a patch to implement the JSON_TABLE part of SQL/JSON.

Andrew Dunstan sent in another revision of a patch to implement the functions from SQL/JSON.

Tom Lane sent in a patch to make a binary-coercible C function in order to reduce opr_sanity test's runtime under CLOBBER_CACHE_ALWAYS.

David Rowley sent in a patch to only validate resultcache memory on plan shutdown.

Vigneshwaran C sent in a patch to correct the data type for the logical replication message formats.