blog
Top Ten Reasons to Migrate from Oracle to PostgreSQL
Oracle Relational Database Management System (RDBMS) has been widely used by large organizations and is considered by far to be the most advanced database technology available in the market. It’s typically the most often compared RDBMS with other database products serving as the standard “de-facto” of what a product should offer. It is ranked by db-engines.com as the #1 RDBMS available in the market today.
PostgreSQL is ranked as the #4 RDBMS, but that doesn’t mean there aren’t any advantages to migrating to PostgreSQL. PostgreSQL has been around since 1989 it open-sourced in 1996. PostgreSQL won DBMS of the year on two consecutive years from 2017 and 2018. That just indicates there’s no signs of stopping from attracting large number of users and big organizations.
One of the reasons why PostgreSQL has attracted a lot of attention is because people are looking for an alternative to Oracle so they can cut off the organizations high costs and escape vendor lock-in.
Moving from a working and productive Oracle Database can be a daunting task. Concerns such as the company’s TCO (Total Cost of Ownership) is one of the reasons why companies drag their decision whether or not to ditch Oracle.
In this blog we will take a look at some of the main reasons why companies are choosing to leave Oracle and migrate to PostgreSQL.
Reason One: It’s a True Open Source Project
PostgreSQL is open-source and is released under the PostgreSQL License, a liberal Open Source license, similar to the BSD or MIT licenses. Acquiring the product and support requires no fee.
If you want to leverage the database software, it means that you can get all the available features of PostgreSQL database for free. PostgreSQL has been more than 30 years old of maturity in the database world and has been touch based as open-source since 1996. It has enjoyed decades developers working to create extensions. That, in itself, makes developers, institutions, and organizations choose PostgreSQL for enterprise applications; powering leading business and mobile applications.
Once again, organizations are waking up to the realization that open source database solutions like Postgres offer greater capacity, flexibility, and support that isn’t entirely dependent on any one company or developer. Postgres, like Linux before it, has been (and continues to be) engineered by dedicated users solving day-to-day business problems who choose to return their solutions to the community. Unlike a large developer like Oracle, which may have different motives of developing products that are profitable or support a narrow but lucrative market, the Postgres community is committed to developing the best possible tools for everyday relational database users.
PostgreSQL often carries out those tasks without adding too much complexity. Its design is focused strictly on handling the database without having to waste resources like managing additional IT environments through added features. It’s one of the things that consumers of this open-source software like when migrating from Oracle to PostgreSQL. Spending hours to study complex technology about how an Oracle database functions, or how to optimize and tune up might end up with its expensive support. This lures institutions or organizations to find an alternative that can be less headache on the cost and brings profit and productivity. Please check out our previous blog about how capable does PostgreSQL to match SQL syntax presence with Oracle’s syntax.
Reason Two: No License and a Large Community
For users of the Oracle RDBMS platform, it’s difficult to find any type of community support that is free or without a hefty fee. Institutions, organizations, and developers often end up finding an alternative information online that can offer answers or solutions to their problems for free.
When using Oracle, it’s difficult to decide on a specific product or whether to go with Product Support because (typically) a lot of money is involved. You might try a specific product to test it, end up buying it, just to realize it can’t help you out. With PostgreSQL, the community is free and full of experts who have extensive experience that are happy to help you out with your current problems.
You can subscribe to the mailing list right here at https://lists.postgresql.org/ to start reaching out with the community. Newbies or prodigies of PostgreSQL touch based here to communicate, showcase, and share their solutions, technology, bugs, new findings or even share their emerging software. You may even ask help from IRC chat using irc.freenode.net and joining to #postgresql channel. You can also reach out to the community through Slack by joining with https://postgres-slack.herokuapp.com/ or https://postgresteam.slack.com/. There’s a lot of options to take and lots of Open Source organizations that can offer you questions
For more details and information about where to start, go check out here https://www.postgresql.org/community/.
If you want to go and checkout for Professional Services in PostgreSQL, there’s tons of options to choose from. Even checking their website at https://www.postgresql.org/support/professional_support/northamerica/, you can find a large list of companies there and some of these are at a cheap price. Even here at Severalnines, we do offer also Support for Postgres, which is part of the ClusterControl license or a DBA Consultancy.
Reason Three: Wide Support for SQL Conformance
PostgreSQL has always been keen to adapt and conform to SQL as a de facto standard for its language. The formal name of the SQL standard is ISO/IEC 9075 “Database Language SQL”. Any successive revised versions of the standard releases replaces the previous one, so claims of conformance to earlier versions have no official merit.
Unlike Oracle, some keyword or operators are still present that does not conform the ANSI-standard SQL (Structured Query Language). Example, the OUTER JOIN (+) operator can attribute confusions with other DBA’s that have not touched or with the least familiarity to Oracle. PostgreSQL follows the ANSI-SQL standard for JOIN syntax and that leaves an advantage to jump easily and simply with other open-source RDBMS database such as MySQL/Percona/MariaDB databases.
Another syntax that is very common with Oracle is on using hierarchical queries. Oracle uses the non-standard START WITH..CONNECT BY syntax, while in SQL:1999, hierarchical queries are implemented by way of recursive common table expressions. For example, the queries below differs its syntax in accordance to hierarchical queries:
Oracle
SELECT
restaurant_name,
city_name
FROM
restaurants rs
START WITH rs.city_name = 'TOKYO'
CONNECT BY PRIOR rs.restaurant_name = rs.city_name;
PostgreSQL
WITH RECURSIVE tmp AS (SELECT restaurant_name, city_name
FROM restaurants
WHERE city_name = 'TOKYO'
UNION
SELECT m.restaurant_name, m.city_name
FROM restaurants m
JOIN tmp ON tmp.restaurant_name = m.city_name)
SELECT restaurant_name, city_name FROM tmp;
PostgreSQL has a very similar approach as the other top open-source RDBMS like MySQL/MariaDB.
According to the PostgreSQL manual, PostgreSQL development aims for conformance with the latest official version of the standard where such conformance does not contradict traditional features or common sense. Many of the features required by the SQL standard are supported, though sometimes with slightly differing syntax or function. This is, in fact, what is great with PostgreSQL as it’s also supported and collaborated by the different organizations, whether it’s small or large. The beauty stays on its SQL language conformity to what has the standard push through.
PostgreSQL development aims for conformance with the latest official version of the standard where such conformance does not contradict traditional features or common sense. Many of the features required by the SQL standard are supported, though sometimes with slightly differing syntax or function. Further moves towards conformance can be expected over time.
Reason Four: Query Parallelism
To be fair, PostgreSQL’s Query Parallelism is not as rich when compared to Oracle’s parallel execution for SQL statements. Amongst the features that Oracle’s parallelism are statement queuing with hints, ability to set the degree of parallelism (DOP), set a parallel degree policy, or adaptive parallelism.
PostgreSQL has a simple degree of parallelism based on the plans supported, but that does not define that Oracle edges over the open source PostgreSQL.
PostgreSQL’s parallelism has been constantly improving and continuously enhanced by the community. When PostgreSQL 10 was released, it added more appeal to the public especially the improvements on parallelism support for merge join, bitmap heap scan, index scan and index-only scan, gather merge, etc. Improvements also adds statistics to pg_stat_activity.
In PostgreSQL versions < 10, parallelism is disabled by default which you need to set the variable max_parallel_workers_per_gather.
postgres=# timing
Timing is on.
postgres=# explain analyze select * from imdb.movies where birthyear >= 1980 and birthyear <=2005;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Seq Scan on movies (cost=0.00..215677.28 rows=41630 width=68) (actual time=0.013..522.520 rows=84473 loops=1)
Filter: ((birthyear >= 1980) AND (birthyear <= 2005))
Rows Removed by Filter: 8241546
Planning time: 0.039 ms
Execution time: 525.195 ms
(5 rows)
Time: 525.582 ms
postgres=# o /dev/null
postgres=# select * from imdb.movies where birthyear >= 1980 and birthyear <=2005;
Time: 596.947 ms
Query plan reveals that it's the actual time can go around 522.5 ms then the real query execution time goes around 596.95 ms. Whereas enabling parallelism,
postgres=# set max_parallel_workers_per_gather=2;
Time: 0.247 ms
postgres=# explain analyze select * from imdb.movies where birthyear >= 1980 and birthyear <=2005;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..147987.62 rows=41630 width=68) (actual time=0.172..339.258 rows=84473 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on movies (cost=0.00..142824.62 rows=17346 width=68) (actual time=0.029..264.980 rows=28158 loops=3)
Filter: ((birthyear >= 1980) AND (birthyear <= 2005))
Rows Removed by Filter: 2747182
Planning time: 0.096 ms
Execution time: 342.735 ms
(8 rows)
Time: 343.142 ms
postgres=# o /dev/null
postgres=# select * from imdb.movies where birthyear >= 1980 and birthyear <=2005;
Time: 346.020 ms
The query plan determines that the query needs to use parallelism and then it does use a Gather node. It's actual time estimates to 339ms with 2 works and estimates to 264ms before it has been aggregated by the query plan. Now, the real execution time of the query took 346ms, which is very near to the estimated actual time from the query plan.
This just illustrates how fast and beneficial it is with PostgreSQL. Although PostgreSQL has its own limits when parallelism can occur or when query plan determine it's faster than to use parallelism, it does not make its feature a huge difference than Oracle. PostgreSQL's parallelism is flexible and can be enabled or utilized correctly as long as your query matches the sequence required for query parallelism.
Reason Five: Advanced JSON Support and is Always Improving
JSON support in PostgreSQL is always on par compared to the other open source RDBMS. Take a look at this external blog from LiveJournal where PostgreSQL's JSON support reveals to be always more advanced when compared to the other RDBMS. PostgreSQL has a large number of JSON functions and features.
The JSON data-type was introduced in PostgreSQL-9.2. Since then, it has a lot of significant enhancements and amongst the major addition came-up in PostgreSQL-9.4 with the addition of JSONB data-type. PostgreSQL offers two data types for storing JSON data: json and jsonb. With jsonb, it is an advanced version of JSON data-type which stores the JSON data in binary format. This is the major enhancement which made a big difference to the way JSON data was searched and processed in PostgreSQL.
Oracle has extensive support of JSON as well. In contrast, PostgreSQL has extensive support as well as functions that can be used for data retrieval, data formatting, or conditional operations that affects the output of the data or even the data stored in the database. Data stored with jsonb data type has a greater advantage with the ability to use GIN (Generalized Inverted Index) which can be used to efficiently search for keys or key/value pairs occurring within a large number of jsonb documents.
PostgreSQL has additional extensions that are helpful to implement TRANSFORM FOR TYPE for the jsonb type to its supported procedure languages. These extensions are jsonb_plperl and jsonb_plperlu for PL/Perl. Whereas for PL/Python, these are jsonb_plpythonu, jsonb_plpython2u, and jsonb_plpython3u. For example, using jsonb values to map Perl arrays, you can use jsonb_plperl or jsonb_plperlu extensions.
ArangoDB had posted a benchmark comparing PostgreSQL's JSON performance along with other JSON-support databases. Although it's an old blog, still it showcases how PostgreSQL's JSON performs compared to other databases where JSON is it's core feature in their database kernel. This just makes PostgreSQL has its own advantage even with its side feature.
Reason Six: DBaaS Support By Major Cloud Vendors
PostgreSQL has been supported widely as a DBaaS. These services are coming from Amazon, Microsoft's with its Azure Database for PostgreSQL, and Google's Cloud SQL for PostgreSQL.
In comparison Oracle, is only available on Amazon RDS for Oracle. The services offered by the major players start at an affordable price and are very flexible to setup in accordance to your needs. This helps institutions and organizations to setup accordingly and relieve from its large cost tied up on the Oracle platform.
Reason Seven: Better Handling of Massive Amounts of Data
PostgreSQL RDBMS are not designed to handle analytical and data warehousing workloads. PostgreSQL is a row-oriented database, but it has the capability to store large amount of data. PostgreSQL has the following limits for dealing with data store:
Limit |
Value |
Maximum Database Size |
Unlimited |
Maximum Table Size |
32 TB |
Maximum Row Size |
1.6 TB |
Maximum Field Size |
1 GB |
Maximum Rows per Table |
Unlimited |
Maximum Columns per Table |
250-1600 depending on column types |
Maximum Indexes per Table |
Unlimited |
If you want to use the core features of PostgreSQL, you may store large amount of data using jsonb. For example, a large amount of documents (PDF, Word, Spreadsheets) and store this using jsonb data type. For geolocation applications and systems, you can use PostGIS.
Reason Eight: Scalability, High-Availability, Redundancy/Geo-Redundancy, and Fault-Tolerant Solutions on the Cheap
Oracle offers similar, but powerful, solutions such as Oracle Grid, Oracle Real Application Clusters (RAC), Oracle Clusterware, and Oracle Data Guard to name a few. These technologies can add to your increasing costs and are unpredictably expensive to deploy and make stable. It's hard to ditch these solutions. Training and skills must be enhanced and develop the people involved on the deployment and implementation process.
PostgreSQL has massive support and that has a lot of options to choose from. PostgreSQL includes streaming and logical replication built-in to the core package of the software. You may also able to setup a synchronous replication for PostgreSQL to have more high-availability cluster, while making a stand by node process your read queries. For high availability, we suggest you read our blog Top PG Clustering High Availability (HA) Solutions for PostgreSQL and that covers a lot of great tools and technology to choose from.
There are enterprise features as well that offers high-availability, monitoring, and backup solutions. ClusterControl is one of this technology and offers at an affordable price compared to Oracle Solutions.
Reason Nine: Support for Several Procedural Languages: PL/pgSQL, PL/Tcl, PL/Perl, and PL/Python.
Since version 9.4, PostgreSQL has a great feature where you can define a new procedural language in accordance to your choice. Although not all variety of programming languages are supported, but it has a number of languages that are supported. Currently, with base distribution, it includes PL/pgSQL, PL/Tcl, PL/Perl, and PL/Python. The external languages are:
Name |
Language |
Website |
PL/Java |
Java |
|
PL/Lua |
Lua |
|
PL/R |
R |
|
PL/sh |
Unix shell |
|
PL/v8 |
JavaScript |
The great thing about this is that, unlike Oracle, developers that have jump off newly to PostgreSQL can quickly provide business logic to their application systems without further taking time to learn about PL/SQL. PostgreSQL makes the environment for developers easier and efficient. This nature of PostgreSQL contributes to the reason why developers loves PostgreSQL and starts to shift away on enterprise platform solutions to the open source environment.
Reason Ten: Flexible Indexes for Large and Textual Data (GIN, GiST, SP-GiST, and BRIN)
PostgreSQL has a huge advantage when it comes to the support of indexes which are beneficial to handling large data. Oracle has a lot of index types that are beneficial for handling large data sets as well, especially for full text indexing. But for PostgreSQL, these types of indexes are made to be flexible according to your purpose. For example, these types of indexes are applicable for large data:
GIN - (Generalized Inverted Indexes)
This type of index is applicable for jsonb, hstore, range, and arrays data type columns. It is useful when you have data types that contain multiple values in a single column. According to the PostgreSQL docs, “GIN is designed for handling cases where the items to be indexed are composite values, and the queries to be handled by the index need to search for element values that appear within the composite items. For example, the items could be documents, and the queries could be searches for documents containing specific words.”
GiST - (Generalized Search Tree)
A height-balanced search tree that consists of node pages. The nodes consist of index rows. Each row of a leaf node (leaf row), in general, contains some predicate (boolean expression) and a reference to a table row (TID). GiST indexes are best if you use this for geometrical data type like, you want to see if two polygons contained some point. In one case a specific point may be contained within box, while another point only exists within one polygon. The most common datatypes where you want to leverage GiST indexes are geometry types and text when dealing with full-text search
In choosing which index type to use, GiST or GIN, consider these performance differences:
- GIN index lookups are about three times faster than GiST
- GIN indexes take about three times longer to build than GiST
- GIN indexes are moderately slower to update than GiST indexes, but about 10 times slower if fast-update support was disabled
- GIN indexes are two-to-three times larger than GiST indexes
As a rule of thumb, GIN indexes are best for static data because lookups are faster. For dynamic data, GiST indexes are faster to update.
SP-GiST - (Space Partitioned GiST)
For larger datasets with natural but uneven clustering. This type of index leverage space partitioning trees. SP-GiST indexes are most useful when your data has a natural clustering element to it, and is also not an equally balanced tree. A great example of this is phone numbers, for example in the US, they use the following format:
- 3 digits for area code
- 3 digits for prefix (historically related to a phone carrier’s switch)
- 4 digits for line number
This means that you have some natural clustering around the first set of 3 digits, around the second set of 3 digits, then numbers may fan out in a more even distribution. But, with phone numbers some area codes have a much higher saturation than others. The result may be that the tree is very unbalanced. Because of that natural clustering up front and the unequal distribution of data–data like phone numbers could make a good case for SP-GiST.
BRIN - (Block Range Index)
For really large datasets that line up sequentially. A block range is a group of pages adjacent to each other, where summary information about all those pages is stored in Index. Block range indexes can focus on some similar use cases to SP-GiST in that they’re best when there is some natural ordering to the data, and the data tends to be very large. Have a billion record table especially if it’s time series data? BRIN may be able to help. If you’re querying against a large set of data that is naturally grouped together such as data for several zip codes (which then roll up to some city) BRIN helps to ensure that similar zip codes are located near each other on disk.
When you have very large datasets that are ordered such as dates or zip codes BRIN indexes allow you to skip or exclude a lot of the unnecessary data very quickly. BRIN additionally are maintained as smaller indexes relative to the overall data size making them a big win for when you have a large dataset.
Conclusion
PostgreSQL has some major advantages when competing against Oracle's enterprise platform and business solutions. It's definitely easy to hail PostgreSQL as your go-to choice of open source RDBMS as it is nearly powerful as Oracle.
Oracle is hard to beat (and that is a hard truth to accept) and it's also not easy to ditch the tech-giant’s enterprise platform. When systems provide you power and productive results, that could be a dilemma.
Sometimes though there are situations where a decision has to be made as continued over-investing in your platform cost can outweigh the cost of your other business layers and priorities which can affect progress.
PostgreSQL and its underlying platform solutions can be of choice to help you cut down the cost, relieve your budgetary problems; all with moderate to small changes.