
MyRocks Storage Engine in MariaDB is Now Release Candidate

Planet MySQL - Fri, 03/09/2018 - 08:19
By Sergey Petrunya

The MyRocks storage engine was introduced in MariaDB Server 10.2 as an alpha plugin – the maturity of plugins is separate from the database. It became a beta plugin earlier this year, and with the release of MariaDB Server 10.3.5 (RC) last week, it is now a release candidate plugin.

So, what is MyRocks? It is a storage engine like InnoDB, but optimized for disk space and write efficiency. It uses log-structured merge-tree (LSM tree) technology to achieve higher compression and better write performance. MyRocks is developed by Facebook, where it is used in production.

We make MyRocks available to MariaDB users via binaries and packages for recent versions of Ubuntu/Debian (deb), Red Hat/CentOS (rpm), generic Linux (tarballs) and Microsoft Windows. Developers can continue to use features like common table expressions (CTEs) and window functions. Administrators can continue to enable and configure parallel replication.

While it’s easy to get started with MyRocks using MariaDB, it isn’t enabled by default: you have to run a couple of commands first. The process is documented in our knowledge base (with links to further MyRocks documentation).
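As a minimal sketch (assuming the MyRocks plugin package is installed; the table definition is hypothetical), enabling the engine and creating a table with it looks like this:

-- load the MyRocks plugin (one-time, as a privileged user)
INSTALL SONAME 'ha_rocksdb';

-- create a table stored in MyRocks
CREATE TABLE events (
  id BIGINT NOT NULL PRIMARY KEY,
  payload VARCHAR(255)
) ENGINE=RocksDB;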



Sneak Peek at Proxytop Utility

Planet MySQL - Fri, 03/09/2018 - 07:18

In this blog post, I’ll be looking at Proxytop, a new tool for managing MySQL topologies that use ProxySQL. Proxytop is a self-contained, real-time monitoring tool for ProxySQL. As some of you already know, ProxySQL is a popular open source, high-performance, protocol-aware proxy server for MySQL and its forks (Percona Server and MariaDB).

My lab uses MySQL and ProxySQL on Docker containers provided by Nick Vyzas. This lab also uses Alexey Kopytov’s Sysbench utility to perform benchmarking against ProxySQL.


Installation of Proxytop is pretty straightforward:

## You may first need to install system Python and MySQL dev packages
## e.g. "sudo apt install python-dev libmysqlclient-dev"
pip install MySQL-python npyscreen
wget -P /usr/bin https://raw.githubusercontent.com/sysown/proxytop/master/proxytop
## the downloaded script may also need the executable bit,
## e.g. "chmod +x /usr/bin/proxytop"

At this stage, we have everything we need to demonstrate Proxytop. The lab we have set up provides a bunch of bash scripts to generate load for repeated runs. I’m using the following script under the bin directory:

[root@localhost docker-mysql-proxysql]# ./bin/docker-benchmark.bash
[Fri Feb 16 10:19:58 BRST 2018] Dropping 'sysbench' schema if present and preparing test dataset:
mysql: [Warning] Using a password on the command line interface can be insecure.
[Fri Feb 16 10:19:58 BRST 2018] Running Sysbench Benchmarks against ProxySQL:
sysbench 1.0.12 (using bundled LuaJIT 2.1.0-beta2)

The benchmark is fully customizable, as its parameters can be tuned within the script.


Now let’s take a look at the Proxytop utility. It has a menu-driven style similar to Innotop. Once you are in the tool, use [tab] to toggle between screens. Various shortcuts are also available to do things like change the sort order (‘s’), filter on specific criteria (‘l’) or change the refresh interval for the view you are on (‘+’ / ‘-’).

Currently, it supports viewing the following aspects of a ProxySQL instance:

  • ConnPool – “ProxySQL Connection Pool” statistics
  • QueryRules – “ProxySQL Query Rules” statistics and definitions
  • GloStat – “ProxySQL Global Status” statistics
  • ProcList – “ProxySQL Processlist” for all incoming DML / DQL
  • ComCount – “ProxySQL Command Counter” statistics

We’ll go through each of these screens in detail.

ConnPool Screen:

This screen basically shows the Connection Pool, specifically the following (a raw SQL equivalent is sketched after the list):

  • MySQL hostname and port
  • Assigned ProxySQL hostgroup
  • Connection statistics: Used / Free / OK / Error
  • MySQL Server state in ProxySQL (ONLINE / OFFLINE / etc.)
  • MySQL Server latency
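If you want the raw numbers behind this screen, the same data is exposed by ProxySQL’s admin interface (by default on port 6032; the stock admin/admin credentials below are an assumption and vary per setup, and column names can differ slightly between ProxySQL versions). A sketch:

-- connect first, e.g.: mysql -u admin -padmin -h 127.0.0.1 -P 6032
SELECT hostgroup, srv_host, srv_port, status,
       ConnUsed, ConnFree, ConnOK, ConnERR, Latency_us
FROM stats.stats_mysql_connection_pool;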

Query Rules Screen:

This screen shows the query rules and their hit counts; it can be sorted either by rule_id or by hits (ascending or descending) by cycling through the ordering options with “s”.

It also allows you to view the actual definition of each rule by selecting and entering a rule. In the popup window, you will find a list of the relevant and defined columns for the query rule. For example:

If you have a lot of query rules defined, you can filter on a specific rule by pressing the letter “l”.
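Under the hood, the rule definitions live in ProxySQL’s mysql_query_rules table, so the same details shown in the popup can be pulled over the admin interface; a sketch (rule_id 1 is just an example):

-- inspect one rule's definition via the admin interface
SELECT rule_id, active, match_digest, destination_hostgroup, apply
FROM mysql_query_rules
WHERE rule_id = 1;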

Global Statistics Screen: This screen shows Global Statistics from ProxySQL, divided into four sections (a query sketch follows the list).

  • Connection Information
  • Prepared Statement Information
  • Command Information
  • Query Cache information
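These figures come from ProxySQL’s stats_mysql_global table; a sketch of pulling the connection-related counters directly over the admin interface:

-- e.g. Client_Connections_connected, Client_Connections_created, ...
SELECT Variable_Name, Variable_Value
FROM stats.stats_mysql_global
WHERE Variable_Name LIKE 'Client_Connections%';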

Proclist Screen: In this screen, we’re able to see active running queries, with a minimum refresh interval of five seconds. In this way you can monitor long-running queries in flight for troubleshooting.

ComCount Screen: This screen shows all command types executed, with the total time and count for each type, and also provides a drill-down to view the number of queries executed within specific latency ranges. This way, the type of workload can be easily identified, both during testing and in production.

You can drill down on each command type by using the arrow keys and hitting enter.
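Behind this screen is ProxySQL’s stats_mysql_commands_counters table, whose cnt_* columns hold the per-latency-range counts; a sketch of listing the busiest command types directly:

-- top command types by execution count, via the admin interface
SELECT Command, Total_cnt, Total_Time_us
FROM stats.stats_mysql_commands_counters
ORDER BY Total_cnt DESC
LIMIT 10;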

We all know the power of command-line utilities such as proxysql-admin. The proxysql-admin utility is designed for configuration and ad-hoc monitoring of ProxySQL, as explained in this blog post. Proxytop, by contrast, is menu-driven and repeats its queries at an interval. You can easily monitor and administer ProxySQL from the command line, but running the same commands over and over to watch a system for a period of time is tedious. This tool helps with that situation.


Automated Dependency Injection using Containers - Andrew Embler

Planet PHP - Fri, 03/09/2018 - 04:10

Containers can build your PHP objects for you, saving you time and trouble.

On RDBMS, NoSQL and NewSQL databases. Interview with John Ryan

Planet MySQL - Fri, 03/09/2018 - 03:05

“The single most important lesson I’ve learned is to keep it simple. I find designers sometimes deliver over-complex, generic solutions that could (in theory) do anything, but in reality are remarkably difficult to operate, and often misunderstood.”–John Ryan

I have interviewed John Ryan, Data Warehouse Solution Architect (Director) at UBS.


Q1. You are an experienced Data Warehouse architect, designer and developer. What are the main lessons you have learned in your career?

John Ryan: The single most important lesson I’ve learned is to keep it simple. I find designers sometimes deliver over-complex, generic solutions that could (in theory) do anything, but in reality are remarkably difficult to operate, and often misunderstood. I believe this stems from a lack of understanding of the requirement – the second most important lesson.

Everyone from the senior stakeholders to architects, designers and developers needs to fully understand the goal. Not the solution, but the “problem we’re trying to solve”. End users never ask for what they need (the requirement); instead, they describe a potential solution. IT professionals are by nature delivery focused, and get frustrated when it appears “the user changed their mind”. I find the user seldom changes their mind. In reality, the requirement was never fully understood.

To summarise: focus on the problem, not the solution. Then (once it’s understood), suggest a dozen solutions and pick the best one. But keep it simple.

Q2. How has the database industry changed in the last 20 years?

John Ryan: On the surface, not a lot. As recently as 2016, Gartner estimated that Oracle, Microsoft and IBM accounted for 80% of the commercial database market, but that hides an underlying trend that’s disrupting this $50 billion industry.

Around the year 2000, the primary options were Oracle, DB2 or SQL Server, with data warehouse appliances from Teradata and Netezza. Fast forward to today, and the database engine rankings include over 300 databases, of which 50% are open source, spanning over 11 categories including Graph, Wide Column, Key-Value and Document stores, each suited to a different use case.

While relational databases are still popular, 4 of the top 10 most popular solutions are non-relational (classed as NoSQL), including MongoDB, Redis and Cassandra. Cross-reference this against a survey of the most highly sought skills, and we find MongoDB, Redis and Cassandra again in the top ranking, with nearly 40% of respondents seeking MongoDB skills compared to just 12% seeking Oracle expertise.

Likewise, open source databases make up 60% of the top-ten ranking, with the open source database MySQL in second place behind Oracle, and Gartner states that “By 2018, more than 70% of new in-house applications will be developed on an [Open Source] DBMS”.

The move towards cloud computing and database-as-a-service is causing further disruption in the data warehouse space with cloud and hybrid challengers including Vertica, Amazon Redshift and Snowflake.

In conclusion, the commercial relational vendors currently dominate the market in revenue terms. However, there has been remarkable growth in open source alternatives, and huge demand for solutions that handle high-velocity unstructured and semi-structured data. These use cases, including social media and the Internet of Things, are ill-suited to the legacy structured databases provided by Oracle, DB2 and SQL Server, and this void has been largely filled by open source NoSQL and NewSQL databases.

Q3. RDBMS vs. NoSQL vs. NewSQL: How do you compare Database Technologies?

John Ryan: The traditional RDBMS solutions from Oracle, IBM and Microsoft implement the relational model on a 1970s hardware architecture, and typically provide a good general purpose database platform which can be applied to most OLTP and Data Warehouse use cases.

However, as Dr. Michael Stonebraker indicated in this 2007 paper, The End of an Architectural Era (It’s Time for a Complete Rewrite), these are no longer fit for purpose, as both the hardware technology and the processing demands have moved on. In particular, there is a need for real-time (millisecond) performance, greater scalability to handle web-scale applications, and the ability to handle unstructured and semi-structured data.

Whereas the legacy RDBMS is a general purpose (will do anything) database, the NoSQL and NewSQL solutions are dedicated to a single problem, for example, short lived OLTP operations.

The Key-Value NoSQL databases were developed to handle the massive transaction volumes and low latency needed for web commerce at Amazon and LinkedIn. Others (e.g. MongoDB) were developed to handle semi-structured data, while still others (e.g. Neo4j) were built to efficiently model the data networks of the type found at Facebook or LinkedIn.

The common thread with NoSQL databases is that they tend to use an API interface rather than industry-standard SQL, although increasingly that’s changing. They do, however, entirely reject the relational model and ACID compliance. They typically don’t support cross-table join operations, and are entirely focused on low latency, trading consistency for scalability.

The so-called NewSQL databases, which include VoltDB, MemSQL and CockroachDB, are a return to the relational model, but re-architected for modern hardware and web-scale use cases. Like NoSQL solutions, they tend to run on a shared-nothing architecture and scale to millions of transactions per second, but they also have the full transaction support and ACID compliance that are critical for financial operations.

Q4. What are the typical trade-offs between performance and consistency when using NoSQL and NewSQL databases to support high-velocity OLTP and real-time analytics?

John Ryan: The shared-nothing architecture is built for horizontal scalability, and when combined with data replication it can provide high availability and good performance. If one node in the cluster fails, the system continues, as the data is available on other nodes. The NoSQL database is built upon this architecture, and to maximize throughput, ACID compliance is relaxed in favor of Eventual Consistency. In some cases (e.g. Cassandra), tunable consistency is supported, allowing the developer to trade off performance against consistency and durability.

For example, after a write operation, the change cannot be considered durable (the D in ACID) until it is replicated to at least one, ideally two, other nodes, but this would increase latency and reduce performance. It’s possible, however, to relax this constraint and return immediately, with the risk that the change may be lost if the node crashes before the data is successfully replicated. This becomes even more of a potential issue if a node is temporarily disconnected from the network but is allowed to continue accepting transactions until the connection is restored. In practice, consistency will eventually be achieved when the connection is reestablished – hence the term Eventual Consistency.
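To make tunable consistency concrete, here is a minimal sketch in Cassandra’s CQL (the keyspace and table are hypothetical, and CONSISTENCY is a cqlsh shell command rather than CQL proper):

-- replication factor 3: each row is stored on three nodes
CREATE KEYSPACE shop
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

CREATE TABLE shop.orders (id int PRIMARY KEY, total decimal);

-- require a majority of replicas to acknowledge: stronger consistency, higher latency
CONSISTENCY QUORUM;
INSERT INTO shop.orders (id, total) VALUES (1, 99.50);

-- accept a single replica's acknowledgement: lower latency, eventual consistency
CONSISTENCY ONE;
INSERT INTO shop.orders (id, total) VALUES (2, 12.00);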

A NewSQL database, on the other hand, accepts no such compromise, and some databases (e.g. VoltDB) even support full serializability, executing transactions as if they were executed serially. Impressively, they manage this feat at a rate of millions of transactions per second, potentially on commodity hardware.

Q5. One of the main challenges for real-time systems architects is the potentially massive throughput required, which could exceed a million transactions per second. How do you handle such a challenge?

John Ryan: The short answer is – with care! The longer answer is described in my article, Big Data – Velocity. I’d break the problem into three components: Data Ingestion, Transformation and Storage.

Data ingestion requires message-based middleware (e.g. Apache Kafka) with a range of adapters and interfaces, the ability to smooth out the potentially massive spikes in velocity, and the ability to stream data to multiple targets.

Transformation typically requires an in-memory data streaming solution to restructure and transform data in near-real time. Options include Spark Streaming, Storm or Flink.

Storage and Analytics is sometimes handled by a NoSQL database, but for application simplicity (avoiding the need to implement transactions or handle eventual-consistency problems in the application), I’d recommend a NewSQL database: all the low latency and high throughput of the NoSQL solutions, but with the flexibility and ease of a full relational database, and full SQL support.

In conclusion, the solution needs to abandon the traditional batch oriented solution in favour of an always-on streaming solution with all processing in memory.

Q6. Michael Stonebraker introduced the concept that “one size no longer fits all”. Has this concept come true in the database market?

John Ryan: In the paper One Size Fits All – An Idea Whose Time Has Come and Gone, Dr. Michael Stonebraker argued that the legacy RDBMS dominance was at an end, and that it would be replaced by specialized database technology including stream processing, OLTP and Data Warehouse solutions.

Certainly, disruption in the Data Warehouse database market has been accelerated by the move towards the cloud, and as this Gigaom report illustrates, there are at least nine major players in the market, with new specialized tools including Google BigQuery, Amazon Redshift and Snowflake, and the column store (in memory or on secondary storage) dominates.

Finally, the explosion of specialized NoSQL and NewSQL databases, each with its own specialty, including Key-Value, Document Store, Text Search and Graph databases, lends credence to the statement that “one size no longer fits all”.

I do think however, we’re still in a transformation stage, and the shake-out is not yet complete. I think a lot of large corporations (especially Financial Services) are wary of change, but it’s already happening.

I think the quote from Stewart Brand is appropriate: “Once a new technology rolls over you, if you’re not part of the steamroller, you’re part of the road”.

Q7. Eventual consistency vs ACID. A pragmatic approach or a step too far?

John Ryan: As with so many decisions in IT, it depends. Eventual Consistency was built into the Amazon Dynamo database as a pragmatic decision, because it’s difficult to maintain high throughput and availability at scale. Amazon accepted the relatively minor risk of inconsistencies because of the huge benefits including scalability and availability. In many other web scale applications (eg. Social media), the implications of temporary inconsistency are not important, and it’s therefore an acceptable approach.

Having said that, it does place a huge additional burden on the developer to code for relatively rare and unexpected conditions, and I’d question why anyone would settle for a database which supports only eventual consistency when full ACID compliance and transaction handling are available.

Q8. What is your take on the Lambda Architecture?

John Ryan: The Lambda Architecture is an approach to handling massive data volumes and providing real-time results by running two parallel processing streams: a Batch Processing stream and a real-time Speed Processing stream.

As a Data Warehouse and ETL designer, I’m aware of the complexity involved in data transformation, and I was immediately concerned about an approach which involved duplicating the logic, probably in two different technologies.

I’d also question the sense of repeatedly executing batch processing over massive data volumes when processing is moving to the cloud, on a pay-for-what-you-use basis.

I’ve since found this article on Lambda by Jay Kreps, of LinkedIn and more recently Confluent (the company behind Apache Kafka), in which he describes the challenges from a position of real (and quite painful) experience.

The article recommends an alternative approach, a single Speed Processing stream with slight changes to allow data to be reprocessed (the primary advantage of the Lambda architecture). This solution, named the Kappa Architecture, is based upon the ability of Kafka to retain and re-play data, and it seems an entirely sensible and more logical approach.

Qx Anything else you wish to add?

John Ryan: Thank you for this opportunity. “May you live in interesting times” may be a Chinese curse, but I think it’s a fascinating time to be working in the database industry.


John Ryan, Data Warehouse Solution Architect (Director), UBS.

John has over 30 years of experience in the IT industry, and has maintained a keen interest in database technology since his university days in the 1980s, when he invented a B-Tree index, only to find Oracle (and several others) had got there already.

He’s worked as a Data Warehouse Architect and Designer for the past 20 years in a range of industries including Mobile Communications, Energy and Financial Services. He’s regularly found writing articles on Big Data and database architecture, and you can find him on LinkedIn.





Related Posts

– On the InterSystems IRIS Data Platform, ODBMS Industry Watch, 2018-02-09

– Facing the Challenges of Real-Time Analytics. Interview with David Flower, ODBMS Industry Watch, 2017-12-19

– On the future of Data Warehousing. Interview with Jacque Istok and Mike Waas, ODBMS Industry Watch, 2017-11-09

– On Vertica and the new combined Micro Focus company. Interview with Colin Mahony, ODBMS Industry Watch, 2017-10-25

– On Open Source Databases. Interview with Peter Zaitsev, ODBMS Industry Watch, 2017-09-06

Follow us on Twitter: @odbmsorg




MySQL 8.0 : more about new authentication plugin and replication

Planet MySQL - Fri, 03/09/2018 - 01:47

MySQL 8.0’s new default authentication plugin means more secure connections for users, but it also affects replication… and you need to be aware of it!

Yesterday, Giuseppe – the datacharmer – Maxia submitted a bug related to a strange behavior of MySQL’s replication.

He observed that after creating a new user for replication (using the new authentication method, caching_sha2_password), when he created the slave as usual, replication was not starting… Slave_IO_Running was constantly in “Connecting” and the Last_IO_Error was:

Last_IO_Error: error connecting to master 'repl2@' - retry-time: 60 retries: 51

So, like almost everybody would do, Giuseppe tried the credentials he had created using the mysql client… and it worked! He could connect to the master, so the credentials were correct! But the most surprising thing is that replication restarted automatically at the next retry.

He didn’t change anything, and all of a sudden it worked. I’m sure he tried this scenario several times, with always the same result, before submitting the bug.

But this is not a bug… In fact, with the new default authentication plugin you need to have the public key of the master for RSA key pair-based password exchange.

With Asynchronous Replication, you need to add one of the two options below to your CHANGE MASTER statement (a sketch follows the list):

  •  MASTER_PUBLIC_KEY_PATH, which indicates the path name of a file containing a slave-side copy of the required public key.
  •  GET_MASTER_PUBLIC_KEY, which, when set to 1, tells the slave to request the public key from the master.
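For example, a minimal sketch using the second option (host, user and password are placeholders):

CHANGE MASTER TO
  MASTER_HOST='master1',
  MASTER_USER='repl2',
  MASTER_PASSWORD='secret',
  GET_MASTER_PUBLIC_KEY=1;
START SLAVE;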

For Group Replication, there are also two options:

  • group_replication_recovery_public_key_path
  • group_replication_recovery_get_public_key

In Giuseppe’s case, why did the connection from the client help?

This is because caching_sha2_password creates an in-memory cache after the first successful authentication through a secure channel for a given user account. This cache allows fast authentication, based on an SHA-256 scramble, for subsequent connections.


But if you set up replication like that, without using the appropriate replication options listed above, your solution is only temporary: if the cache gets cleared (by FLUSH PRIVILEGES, or a restart of the server), the connection will fail again.

Also, please note that this is not the safest way: defining the path and letting the DBA distribute the public key is of course more secure, and the best option remains to use SSL.


Indeed, sometimes safer rhymes with a little more difficulty, or at least with some more operations. This means that if you are using the new authentication plugin (and I recommend you do so), you will have to manage your keys, or at least set up replication to get the public key from the master.

All this is already in the manual; you can find the details in the pages covering caching_sha2_password and replication security.


Loading Third-Party JavaScript

Echo JS - Fri, 03/09/2018 - 00:33

JavaScript Named Parameters

Echo JS - Fri, 03/09/2018 - 00:33

Bootstrap a React App with Parcel

Echo JS - Fri, 03/09/2018 - 00:33

Palindromes in JavaScript

Echo JS - Fri, 03/09/2018 - 00:33

Is Houdini ready yet‽

Echo JS - Fri, 03/09/2018 - 00:33