Don’t Let the Redis HyperLogLog Make You Hyper!20 April 2017
Guide to Redshift Remodeling- Friends and Foes4 May 2017
There is a number of OLAP solutions available in the market, including the popular ones like Vertica, Oracle BI, SAP Business Warehouse, which have pre-existed for a long time. While many of these started as on-premise solutions, have tried to re-establish them in today’s competitive market by providing SaaS variance.
But, the world is heading towards cloud and OLAP solutions on cloud work well, as they are built with the mindset of exploiting cloud potential of robustness, scalability, and availability. Hence, we picked the most popular data warehouses provided by the leading cloud platforms and ran an experiment to do a performance comparison!
Below three OLAP DBs were chosen for comparison:
- IBM DashDB
- Amazon Redshift
- Microsoft SQL Data Warehouse
All these are built on columnar and MPP architecture, to quicken up the aggregations and utilize both cores and servers for query parallelization.
A managed cloud database based on the DB2 and Netezza engine. There are 3 editions: IBM dashDB for Transactions (general purpose, transactional or web workloads), dashDB for Analytics (data warehouse), and dashDB Local (a Docker container image allowing deployment on private clouds).
Provides integration with different kinds of data sources like Oracle, on-prem, analytics tools like R, SPSS and BI tools like SAS, Cognos, Tableau.
Azure SQL Data Warehouse is a cloud-based data warehouse-as-a-service hosted on Microsoft’s Azure platform. It has a massively parallel processing (MPP) shared nothing architecture capable of distributing query computation over a set of compute nodes running Azure SQL Database and uses Azure Storage Blobs as the underlying data storage.
Azure SQL Data Warehouse decouples compute and storage enabling compute power, to be adjusted independently of storage based on workload requirements at any given time.
SQL data warehouse integrates with Azure machine learning, Azure data factory, stream analytics and PowerBI. This also provides integration with other visualization and reporting tools like Tableau, Pentaho etc.
Microsoft SQL Data Warehouse
Amazon Redshift, a hosted data warehouse product, forms part of the larger cloud-computing platform Amazon Web Services. It is built on top of technology from the massive parallel processing (MPP) data-warehouse company- ParAccel and based on PostgreSQL.
Redshift differs from Amazon’s other hosted database offering, Amazon RDS, in its ability to handle analytics workloads. To be able to handle large-scale data sets and database migrations, Amazon makes use of massively parallel processing.
It provides seamless integration with many AWS services like S3, Elastic Search as a Service, Kinesis, Quicksight, Mobile analytics etc. while also supports lots of external BI tool like Jaspersoft, Tableau, Pentaho etc.
To find best among these, we ran the experiment to see how these perform. Here are some stats on the experiment setup:
Data: 900 GB structured data,
Query Iterations: 6
Machine Configuration: 32 core with 256 RAM
Same kinds of machines were procured on all the platforms to standardize the results.
Once loaded, further steps were taken to tune the OLAP platforms as much as possible by:
- Distributing data across nodes on the filter or query.
- Sort data by the selection of query.
3 different queries were executed to test the performance of multiple metrics like data volume, multiple filters joins.
Query 1 – Aggregation over complete data set
Query 2 –Multiple filters and grouping
Query 3 –Join between tables
6 different iterations were done for same query and the best performing result is picked. This is how the three OLAP performed:
|Redshift||DashDB||SQL Data Warehouse|
|Query 1||65 sec||247 sec||311 sec|
|Query 2||15 sec||14 sec||12 sec|
|Query 3||33 sec||262 sec||130 sec|
P.S. Redshift consistently outperforms other OLAP on all the fronts. Be it joins or aggregations on whole data set or aggregations with filter. If tuned correctly and the right set of distribution and sort keys are used, Redshift will prove to be much faster than any other on-cloud OLAP solution.
We, at BluePi, have deep-rooted expertise in working with on-cloud OLAP solutions, ‘Redshift’ is a primary domain. He has implemented the same on a very large scale for numerous happy customers, performing complex cloud migrations, giving niche solutions.
If you have any query or are looking forward to reach out to an expert, need technical consultation or a cutting-edge solution for any complex technical bottleneck, we are listening! Reach out to us at firstname.lastname@example.org or contact us here!
This blog has been written by Hariprasad and Saubhagya.
Hariprasad Narasimhaiah, Sr. Manager Technology
Hari is an experienced Systems Architect with expertise in Systems Migration, UNIX/LINUX Infrastructure, Data Center Migration and Consolidation, Solutions Engineering and Implementation, Virtualization and Disaster Recovery. I have successfully lead direct reports and cross functional teams, to manage project initiation to deployment and support, assess and analyze current infrastructure and provide improvements and solutions.
Saubhagya Banerji, System Analyst
Saubhagya Banerji works as a System Analyst at BluePi Consulting Pvt. Ltd. His areas of interest include Spring Cloud, Android, Strongloop and AWS. When he is not working on advances in tech field, he likes to play Tennis, enjoys spending time on his PS4 and doing leisure activities. For anything and everything geek, he is right here to help you get answers and accelerate your tech-quench.