
Migrating to Amazon Aurora: Optimize for Binary Log Replication

Planet MySQL - Fri, 11/16/2018 - 08:25

In this Checklist for Success series, we will discuss reducing unknowns when hosting in the cloud using, and migrating to, Amazon Aurora. These tips might also apply to other database as a service (DBaaS) offerings.

In our previous article, we discussed the importance of continuous query performance analysis, especially in Amazon Aurora where there is less diagnostic visibility compared to running on EC2 or on-premises. Aside from uptime, though, we need a lot more from our data, and we definitely cannot keep it isolated inside Aurora.

Next on our checklist is the fact that, at one point or another, we will need to use asynchronous replication. Amazon Aurora has an excellent reputation for absorbing intense write workloads, and in many cases where you need an asynchronous replica, that replica can have trouble catching up.

Different Clusters for Different Workloads

Critical workloads and datasets cannot rely on a single copy of their data. With Amazon Aurora, predictable performance means avoiding mixing workloads within your production cluster. While read-heavy workloads might fit easily into read replicas, reporting or analytics workloads might not be a good fit to execute on your main cluster, where read-what-you-write profiles are normally found. You can delegate these workloads to a separate asynchronous replica, either as a separate Amazon Aurora cluster or as an instance running on EC2 (as an example). This is also true if, say, your analytics or reporting workload generates a significant amount of disk IOPS.

Amazon Aurora bills I/O operations per million. You might save some money running disk-heavy analytics operations on a replica running on an i3 instance with local NVMe storage, for example. Similarly, running an async replica on an EC2 instance or on-premises allows you to take your own independent backups, or simply adds an extra level of redundancy.
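If you plan to hang an external binary log replica (EC2 or on-premises) off an Aurora cluster, the binlog has to be enabled in the cluster parameter group and retained long enough for the replica to consume it. Here is a minimal sketch of the preliminary steps, run through the standard mysql client against the cluster endpoint; the endpoint, credentials and retention value below are placeholders, not from the original post:

# Assumes binlog_format (ideally ROW, as discussed below) is already set in the DB cluster parameter group.
mysql -h my-cluster.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com -u admin -p -e "
  CALL mysql.rds_set_configuration('binlog retention hours', 144);  -- keep binlogs long enough for the replica
  CALL mysql.rds_show_configuration;                                -- verify the retention setting
  SHOW MASTER STATUS;                                               -- current binlog file/position for the replica
"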

Multi-Threaded Replication

It is a known fact that MySQL asynchronous replication performance is subject to some limitations. The biggest one is that, by default, it is single-threaded. MySQL 5.6 introduced multi-threaded replication at one database per thread. This did not help the majority of use cases, as workloads vary per database and therefore create an imbalance. With MySQL 5.7 (Aurora 2.0), there have been additional improvements, such as an alternative algorithm for parallelizing thread execution that depends on how the acting primary server writes binary log entries.

With that said, certain multi-threaded replication variables (such as transaction_write_set_extraction) require that the binlog format is set to ROW. This might sound counter-intuitive because the ROW binlog format can actually increase the replication workload. While the ROW format reduces the ambiguity from potentially non-deterministic statements that could cause a replica to drift and become inconsistent, critical operations (schema changes) and optimizations (MTS) require that you use the ROW binlog format.

It should be apparent by now that the only reasonable path forward to improving asynchronous replication is the multi-threaded approach. Along with that comes the need for the ROW binlog format. Any design effort should take this into account if async replication lag is considered a risk. For the basics, configuration options like slave_compressed_protocol and binlog_row_image can reduce network churn. In the deep end, reducing dataset hotspots, ensuring tables have PRIMARY KEYs and embracing multi-threaded optimization can also go a long way.
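As a reference, here is a minimal sketch of how these knobs look on a self-managed MySQL 5.7 source and replica. The values are illustrative only, and on Aurora the equivalent settings are applied through the cluster/instance parameter groups rather than SET GLOBAL:

# On the source (binlog producer):
mysql -e "SET GLOBAL binlog_format = 'ROW';"
mysql -e "SET GLOBAL binlog_row_image = 'MINIMAL';"                   # smaller row events, less network churn
mysql -e "SET GLOBAL transaction_write_set_extraction = 'XXHASH64';"  # enables write-set based parallelization

# On the asynchronous replica:
mysql -e "STOP SLAVE;"
mysql -e "SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';"
mysql -e "SET GLOBAL slave_parallel_workers = 8;"                     # illustrative; size to your workload
mysql -e "SET GLOBAL slave_compressed_protocol = ON;"
mysql -e "START SLAVE;"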

Additional Benefits

While running certain read-heavy queries on an async replica, or ensuring you have access to the physical datafiles, are common use cases, being able to switch to another location (region) or simply to another cluster might also be necessary in some instances.

  • Adding or dropping a column/index on a large table can be achieved with either pt-online-schema-change or gh-ost. But for cases where time is a constraint, applying schema changes on an asynchronous cluster and switching to that sometimes pays for itself.
  • Configuration changes or upgrades that require a cluster restart can take seconds or even minutes. Wouldn’t it be nice if you already had a cluster with all these changes ready to take over at a moment’s notice? Or even fail back to the original if there was an unforeseen issue?

Stay “tuned” for part three.

Meanwhile, we’d like to hear your success stories in Amazon Aurora in the comments below!

Don’t forget to come by and see us at AWS re:Invent, November 26-30, 2018 in booth 1605! Percona CEO Peter Zaitsev will deliver a keynote on MySQL High Availability & Disaster Recovery, Tuesday, November 27 at 1:45 PM – 2:45 PM in the Bellagio Hotel, Level 1, Gauguin 2

Categories: Web Technologies

An Overview of Render Props in React

CSS-Tricks - Fri, 11/16/2018 - 07:05


Using render props in React is a technique for efficiently re-using code. According to the React documentation, "a component with a render prop takes a function that returns a React element and calls it instead of implementing its own render logic." To understand what that means, let’s take a look at the render props pattern and then apply it to a couple of light examples.

The render props pattern

In working with render props, you pass a render function to a component that, in turn, returns a React element. This render function is defined by another component, and the receiving component shares what is passed through the render function.

This is what this looks like:

class BaseComponent extends Component {
  render() {
    return <Fragment>{this.props.render()}</Fragment>;
  }
}

Imagine, if you will, that our App is a gift box where App itself is the bow on top. If the box is the component we are creating and we open it, we’ll expose the props, states, functions and methods needed to make the component work once it’s called by render().

The render function of a component normally has all the JSX and such that form the DOM for that component. Instead, this component has a render function, this.props.render(), that will display a component that gets passed in via props.

Example: Creating a counter

See the Pen React Render Props by Kingsley Silas Chijioke (@kinsomicrote) on CodePen.

Let’s make a simple counter example that increases and decreases a value depending on the button that is clicked.

First, we start by creating a component that will be used to wrap the initial state, methods and rendering. Creatively, we’ll call this Wrapper:

class Wrapper extends Component {
  state = {
    count: 0
  };

  // Increase count
  increment = () => {
    const { count } = this.state;
    return this.setState({ count: count + 1 });
  };

  // Decrease count
  decrement = () => {
    const { count } = this.state;
    return this.setState({ count: count - 1 });
  };

  render() {
    const { count } = this.state;
    return (
      <div>
        {this.props.render({
          increment: this.increment,
          decrement: this.decrement,
          count: count
        })}
      </div>
    );
  }
}

In the Wrapper component, we specify the methods and state that get exposed to the wrapped component. For this example, we need the increment and decrement methods. We have our default count set to 0. The logic is to either increment or decrement count depending on the method that is triggered, starting with a zero value.

If you take a look at the return statement of the render() method, you'll see that we are making use of this.props.render(). It is through this function that we pass methods and state from the Wrapper component to the component being wrapped.

To use it for our App component, the component will look like this:

class App extends React.Component {
  render() {
    return (
      <Wrapper
        render={({ increment, decrement, count }) => (
          <div>
            <div>
              <h3>Render Props Counter</h3>
            </div>
            <div>
              <p>{count}</p>
              <button onClick={() => increment()}>Increment</button>
              <button onClick={() => decrement()}>Decrement</button>
            </div>
          </div>
        )}
      />
    );
  }
}

Example: Creating a data list

The gain lies in the reusable power of render props, so let's create a component that can be used to handle a list of data obtained from an API.

See the Pen React Render Props 2 by Kingsley Silas Chijioke (@kinsomicrote) on CodePen.

What do we want from the wrapper component this time? We want to pass the source link for the data we want to render to it, then make a GET request to obtain the data. When the data is obtained we then set it as the new state of the component and render it for display.

class Wrapper extends React.Component {
  state = {
    isLoading: true,
    error: null,
    list: []
  };

  fetchData() {
    axios.get(this.props.link)
      .then((response) => {
        this.setState({ list: response.data, isLoading: false });
      })
      .catch(error => this.setState({ error, isLoading: false }));
  }

  componentDidMount() {
    this.setState({ isLoading: true }, this.fetchData);
  }

  render() {
    return this.props.render(this.state);
  }
}

The data link will be passed as props to the Wrapper component. When we get the response from the server, we update list using what is returned from the server. The request is made to the server after the component mounts.

Here is how the Wrapper gets used:

class App extends React.Component {
  render() {
    return (
      <Wrapper
        link="https://jsonplaceholder.typicode.com/users"
        render={({ list, isLoading, error }) => (
          <div>
            <h2>Random Users</h2>
            {error ? <p>{error.message}</p> : null}
            {isLoading ? (
              <h2>Loading...</h2>
            ) : (
              <ul>{list.map(user => <li key={user.id}>{user.name}</li>)}</ul>
            )}
          </div>
        )}
      />
    );
  }
}

You can see that we pass the link as a prop, then we use ES6 destructuring to get the state of the Wrapper component, which is then rendered. The first time the component loads, we display loading text, which is replaced by the list of items once we get a response and data from the server.

The App component here is a class component, yet it does not manage any state of its own. So we can transform it into a stateless functional component.

const App = () => {
  return (
    <Wrapper
      link="https://jsonplaceholder.typicode.com/users"
      render={({ list, isLoading, error }) => (
        <div>
          <h2>Random Users</h2>
          {error ? <p>{error.message}</p> : null}
          {isLoading ? (
            <h2>Loading...</h2>
          ) : (
            <ul>{list.map(user => <li key={user.id}>{user.name}</li>)}</ul>
          )}
        </div>
      )}
    />
  );
}

That's a wrap!

People often compare render props with higher-order components. If you want to go down that path, I suggest you check out this post as well as this insightful talk on the topic by Michael Jackson.

The post An Overview of Render Props in React appeared first on CSS-Tricks.

Categories: Web Technologies

PHP on RHEL-8 - Remi Collet

Planet PHP - Fri, 11/16/2018 - 00:16

RHEL-8 Beta has been announced and is available for download for anyone who wants to try it.

This is an opportunity to look at PHP installation and how modules work.

1. Installation

The ISO image is available to everyone; see the README file.

Don't forget to enable the Beta repositories.

# dnf repolist
repo id                                 repo name                                                        status
rhel-8-for-x86_64-appstream-beta-rpms   Red Hat Enterprise Linux 8 for x86_64 - AppStream Beta (RPMs)    4594
rhel-8-for-x86_64-baseos-beta-rpms      Red Hat Enterprise Linux 8 for x86_64 - BaseOS Beta (RPMs)       1686

2. Installation of PHP

PHP is not part of BaseOS, which is the minimal base of the operating system, but is available in AppStream, i.e. as a module.

# dnf module list
Red Hat Enterprise Linux 8 for x86_64 - AppStream Beta (RPMs)
php   7.1        devel, minimal, default [d]   PHP scripting language
php   7.2 [d]    devel, minimal, default [d]   PHP scripting language

You can see that both versions 7.1 and 7.2 (the default) are available.

Installation of version 7.1

# dnf module install php:7.1
Dependencies resolved.
==========================================================================================================
 Package            Arch      Version                        Repository                              Size
==========================================================================================================
Installing group/module packages:
 php-cli            x86_64    7.1.20-2.el8+1700+11d526eb     rhel-8-for-x86_64-appstream-beta-rpms   2.9 M
 php-common         x86_64    7.1.20-2.el8+1700+11d526eb     rhel-8-for-x86_64-appstream-beta-rpms   624 k
 php-fpm            x86_64    7.1.20-2.el8+1700+11d526eb     rhel-8-for-x86_64-appstream-beta-rpms   1.5 M
 php-json           x86_64    7.1.20-2.el8+1700+11d526eb     rhel-8-for-x86_64-appstream-beta-rpms    70 k
 php-mbstring       x86_64    7.1.20-2.el8+1700+11d526eb     rhel-8-for-x86_64-appstream-beta-rpms   547 k
 php-xml            x86_64    7.1.20-2.el8+1700+11d526eb     rhel-8-for-x86_64-appstream-beta-rpms   187 k
Installing dependencies:
 httpd-filesystem   noarch    2.4.35-6.el8+2089+57a79027     rhel-8-for-x86_64-appstream-beta-rpms    32 k
 nginx-filesystem   noarch    1:1.14.0-3.el8+1631+ba902cf0   rhel-8-for-x86_64-appstream-beta-rpms    23 k
Installing module profiles:
 php/default
Enabling module streams:
 httpd              2.4
 nginx              1.14
 php                7.1

Transaction Summary
==========================================================================================================
Install  8 Packages

Total download size: 5.9 M
Installed size: 20 M
Is this ok [y/N]: y

Result:

# php -v
PHP 7.1.20 (cli) (built: Jul 19 2018 06:17:27) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.1.0, Copyright (c) 1998-2018 Zend Technologies

And you can easily switch to 7.2:

# dnf module install php:7.2
Dependencies resolved.
==========================================================================================================
 Package         Arch      Version                       Repository                              Size
==========================================================================================================
Upgrading:
 php-cli         x86_64    7.2.11-1.el8+2002+9409c40c    rhel-8-for-x86_64-appstream-beta-rpms   3.1 M
 php-common      x86_64    7.2.11-1.el8+2002+9409c40c    rhel-8-for-x86_64-appstream-beta-rpms   653 k
 php-fpm         x86_64    7.2.11-1.el8+2002+9409c40c    rhel-8-for-x86_64-appstream-beta-rpms   1.6 M
 php-json        x86_64    7.2.11-1.el8+2002+9409c40c    rhel-8-for-x86_64-appstream-beta-rpms    73 k
 php-mbstring    x86_64    7.2.11-1.el8+2002+9409c40c    rhel-8-for-x86_64-appstream-beta-rpms   580 k
 php-xml         x86_64    7.2.11-1.el8+2002+9409c40c    rhel-8-for-x86_64-appstream-beta-rpms   188 k
Switching module streams:
 php             7.1 -> 7.2

Transaction Summary
==========================================================================================================
Upgrade  6 Packages

Total download size: 6.2 M
Is this ok [y/N]: y

Result:

# php -v
PHP 7.2.11 (cli) (built: Oct 9 2018 15:09:36) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.2.0, Copyright (c) 1998-2018 Zend Technologies

3. Web usage

3.1 with Apache HTTP Server

Truncated by Planet PHP, read more at the original (another 1504 bytes)

Categories: Web Technologies

How Not to do MySQL High Availability: Geographic Node Distribution with Galera-Based Replication Misuse

Planet MySQL - Thu, 11/15/2018 - 16:46

Let’s talk about MySQL high availability (HA) and synchronous replication once more.

It’s part of a longer series on some high availability reference architecture solutions over geographically distributed areas.

Part 1: Reference Architecture(s) for High Availability Solutions in Geographic Distributed Scenarios: Why Should I Care?

Part 2: MySQL High Availability On-Premises: A Geographically Distributed Scenario

The Problem

A question I often get from customers is: How do I achieve high availability if I need to spread my data across different, distant locations? Can I use Percona XtraDB Cluster?

Percona XtraDB Cluster (PXC), MariaDB Cluster or MySQL-Galera are very stable and well-known solutions to improve MySQL high availability using an approach based on a multi-master, data-centric, synchronous data replication model. This means that each data node composing the cluster MUST see the same data, at a given moment in time.

Information/transactions must be stored and visible synchronously on all the nodes at a given time. This is defined as a tightly coupled database cluster. This level of consistency comes with a price, which is that nodes must physically reside close to each other and cannot be geographically diverse.

This is by design (in all synchronous replication mechanisms), and it has had to be clarified over and over throughout the years. Despite that, we still see installations that span geographic locations, including AWS Regions.

We still see some solutions breaking the golden rule of proximity, and trying to break the rules of physics as well. The problem/mistake is not different for solutions based on-premises or in the cloud (for whatever cloud provider).

Recently I had to design a couple of customer solutions based on remote geographic locations. In both cases, the customer was misled by an incorrect understanding of how the synchronous solution works, and by a lack of understanding of the network layer. I decided I needed to cover this topic again, as I have done previously in Galera geographic replication and Effective way to check network connection in a geographically distributed environment.

What Happens When I Put Things on the Network?

Well, let’s start with the basics.

While light travels at 300 million meters per second, the propagation of the electric fields or electric signaling is slower than that.

The real speed depends on the medium used to transmit it, but it can be said that it normally spans from 0% to 99% of light speed, depending on the transmission medium.

This means that in optimal conditions the signal travels at approximately 299.79km per millisecond, in good/mid conditions at about half that, roughly 149.9km per millisecond, and in bad conditions it could be 3km per millisecond or less.

To help you understand, the distance between Rome (Italy) and Mountain View (California) is about 10,062Km. At light-speed it will take 33.54ms. In good conditions (90% of light-speed) the signal will take 37.26ms to reach Mountain View, and in less optimal conditions it can easily double to 74.53 ms.

Keep in mind this is the electric field propagation speed: optimal conditions with no interruption, re-routing and so on. Reality will bring all kinds of interruptions, repeaters and routing.

All the physics above works as a baseline. On top of this, each human construct adds functionalities, flexibility and (unfortunately) overhead – leading to longer times and slower speeds.

The final speed will be different from the simple propagation of the electric fields. It will include the transmission time of complex signaling using the ICMP protocol, or even higher delays with the use of a very complex protocol like TCP/IP, which includes handshaking, packet rerouting, re-sending and so on. On top of that, when sending things over the internet, we need to realize that it is very improbable we will be the only user sending data over that physical channel. As such, whatever we have "on the road" will need to face bandwidth limitation, traffic congestion and so on.

I have described the difference between the protocols (ICMP vs. TCP/IP) here, clarifying how the TCP/IP scenario is very different from using a simpler protocol like ICMP, or from the theoretical approach.

What it all means is that we cannot trust the theoretical performance. We must move to a more empirical approach. But we must understand the right empirical approach or we will be misled.

An Example

I recently worked on a case where a customer had two data centers (DC) at a distance of approximately 400Km, connected with “fiber channel”. Server1 and Server2 were hosted in the same DC, while Server3 was in the secondary DC.

Their ping to Server3, with the default packet size, was ~3ms. Not bad at all, right?

We decided to perform some serious tests, running multiple sets of tests with netperf for many days collecting data. We also used the data to perform additional fine tuning on the TCP/IP layer AND at the network provider.

The results produced a common (for me) scenario (not so common for them):

 

The red line is the first set of tests BEFORE we optimized. The yellow line is the results after we optimized.

The graph above reports the average number of transactions/sec we could run against different dataset dimensions, changing the destination server. The full round trip was calculated.

It is interesting to note that while the absolute numbers were better in the second (yellow) tests, this was true only for a limited dataset dimension. The larger the dataset, the higher the impact. This makes sense if you understand how the TCP/IP stack works (the article I mentioned above explains it).

But what surprised them were the numbers. Keeping aside the extreme cases and focusing instead on the intermediate case, we saw that shifting from a 48K dataset dimension to 512K hugely dropped the performance. The number of executed transactions dropped from 2,299 to 219 on Server2 (same DC) and from 1,472 to 167 on Server3 (different DC).

Also note that Server3 managed ~35% fewer transactions than Server2 from the start, given the latency. Latency moved from a more than decent 2.61ms to 27.39ms for Server2, and from 4.27ms to 37.25ms for Server3.

37ms latency is not very high. If that had been the top limit, it would have worked.

But it was not.

In the presence of the optimized channel, with fiber and so on, when the tests were hitting heavy traffic the congestion was enough to compromise the data transmitted, hitting latencies of >200ms for Server3. Note that those were spikes, but if you are in the presence of a tightly coupled database cluster, those events can become failures in applying the data and can create a lot of instability.

Let me briefly recap the situation for Server3:

We had two datacenters.

  • The connection between the two was with fiber
  • Distance Km ~400, but now we MUST consider the distance to go and come back. This because in case of real communication, we have not only the send, but also the receive packages.
  • Theoretical time at light-speed =2.66ms (2 ways)
  • Ping = 3.10ms (signal traveling at ~80% of the light speed) as if the signal had traveled ~930Km (full roundtrip 800 Km)
  • TCP/IP best at 48K = 4.27ms (~62% light speed) as if the signal had traveled ~1,281km
  • TCP/IP best at 512K =37.25ms (~2.6% light speed) as if the signal had traveled ~11,175km

Given the above, we have from ~20%-~40% to ~97% loss from the theoretical transmission rate. Keep in mind that when moving from a simple signal to a more heavy and concurrent transmission, we also have to deal with the bandwidth limitation. This adds additional cost. All in only 400Km distance.

This is not all. Within the 400km we were also dealing with data congestion, and in some cases the tests failed to provide the level of accuracy we required due to transmission failures and too many packet retries.

For comparison, consider Server2, which is in the same DC as Server1 (a small script to reproduce these conversions follows the list below). Let's see:

  • Ping = 0.027ms that is as if the signal had traveled ~11km light-speed
  • TCP/IP best at 48K = 2.61ms as if traveled for ~783km
  • TCP/IP best at 512K =27.39ms as if traveled for ~8,217km
  • We had performance loss, but the congestion issue and accuracy failures did not happen.
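For anyone who wants to reproduce the "as if the signal had traveled" figures above, here is a minimal helper in plain shell and awk. It uses the article's own RTT values as input, and the 299.79km/ms constant is simply the speed of light in vacuum:

# Convert a measured RTT (ms) into the apparent distance covered at light speed.
rtt_to_distance() {
  awk -v rtt="$1" 'BEGIN { printf "RTT %6.2f ms -> apparent distance ~%6.0f km\n", rtt, rtt * 299.79 }'
}

rtt_to_distance 3.10    # ping to Server3        -> roughly 930 km
rtt_to_distance 4.27    # 48K TCP_RR to Server3  -> roughly 1,280 km
rtt_to_distance 37.25   # 512K TCP_RR to Server3 -> roughly 11,170 km
rtt_to_distance 0.027   # ping to Server2 (same DC)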

You might say, “But this is just a single case, Marco, you cannot generalize from this behavior!”

You would be right IF that were true (but it is not).

The fact is, I have done this level of checks many times and in many different environments. On-premises or using the cloud. Actually, in the cloud (AWS), I had even more instability. The behavior stays the same. Please test it yourself (it is not difficult to use netperf). Just do the right tests with RTT and multiple requests (note at the end of the article).

Anyhow, what I know from the tests is that when working INSIDE a DC, even with some significant overhead due to the TCP/IP stack (and maybe wrong cabling), I do not encounter the same congestion or bandwidth limits I have when dealing with an external DC.

This allows me to have more predictable behavior and tune the cluster accordingly. This is tuning that I cannot do to cover the transmission to Server3, because of unpredictable packet behavior and spikes. >200ms is too high and can cause delivery failures.

If we apply the given knowledge to the virtually synchronous replication we have with Galera (Percona XtraDB Cluster), we can identify that we are hitting the problems well explained in Jay's article Is Synchronous Replication Right for Your App?. There, he explains Callaghan's Law: [In a Galera cluster] a given row can't be modified more than once per RTT.

On top of that, when talking about geographically dispersed solutions, TCP/IP magnifies the effect at the writeset transmission/latency level. This causes nodes NOT residing on the same physically contiguous network to delay all the certification-commit phases by some amount of time X.

When X is predictable, it may correspond to anywhere between 80% and 3% of light speed for the given distance. But you can't predict the transmission time of a set of data split into several datagrams and then sent over the internet using TCP/IP. So we cannot use that range of X as a trustworthy measure.

The effect is unpredictable delay, and this is read as a network issue by Galera. The node can be evicted from the cluster, which is exactly what happens, and what we experience, when dealing with some "BAD" unspecified network issue. This means that whenever we need to use a solution based on a tightly coupled database cluster (like PXC), we cannot locate our nodes at a distance whose RTT exceeds the shortest commit period we need to honor.

If our application must apply the data within a maximum of 200ms in one of its functions, and our RTT between locations ranges from a minimum of 2ms to a maximum of 250ms, we cannot use this solution, period. To be clear, locating a node in another geographic location, and as such using the internet to transmit/receive data, is by default a NO GO given the unpredictability of that network link.
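To make Callaghan's Law concrete, here is a rough back-of-the-envelope calculation in plain shell and awk. The RTT values are illustrative, not measurements from the tests above:

# A given row can be modified at most once per RTT, so the ceiling for a single hot row is 1000/RTT_ms per second.
for rtt_ms in 2 37 250; do
  awk -v rtt="$rtt_ms" 'BEGIN { printf "RTT %5.1f ms -> at most ~%6.1f modifications/sec on a single hot row\n", rtt, 1000 / rtt }'
done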

I doubt that nowadays we have many applications that can wait an unpredictable period to commit their data. The only case when having a node geographically distributed is acceptable is if you accept commits happening in undefined periods of time and with possible failures.

What Is the Right Thing To Do?

The right solution is easier than the wrong one, and there are already tools in place to make it work efficiently. Say you need to define your HA solution between the East and West Coast, or between Paris and Frankfurt. First of all, identify the real capacity of your network in each DC. Then build a tightly coupled database cluster in location A and another tightly coupled database cluster in the other location B. Then link them using ASYNCHRONOUS replication.

Finally, use a tool like Replication Manager for Percona XtraDB Cluster to automatically manage asynchronous replication failover between nodes. On top of all of that use a tool like ProxySQL to manage the application requests.

The full architecture is described here.

Conclusions

The myth of using ANY solution based on tightly coupled database cluster on distributed geographic locations is just that: a myth. It is conceptually wrong and practically dangerous. It MIGHT work when you set it up, it MIGHT work when you test it, it MIGHT even work for some time in production.

By definition, it will break, and it will break when it is least convenient. It will break in an unpredictable moment, but because of a predictable reason. You did the wrong thing by following a myth.

Whenever you need to distribute your data over different geographic locations, and you cannot rely on a single physical channel (fiber) to connect the two locations, use asynchronous replication, period!

References

https://github.com/y-trudeau/Mysql-tools/tree/master/PXC

http://www.tusacentral.net/joomla/index.php/mysql-blogs/164-effective-way-to-check-the-network-connection-when-in-need-of-a-geographic-distribution-replication-.html

https://www.percona.com/blog/2013/05/14/is-synchronous-replication-right-for-your-app/

Sample test

#!/bin/bash
test_log=/tmp/results_$(date +'%Y-%m-%d_%H_%M_%S').txt
exec 9>>"$test_log"
exec 2>&9
exec 1>&9
echo "$(date +'%Y-%m-%d_%H_%M_%S')" >&9

for ip in 11 12 13; do
  echo " ==== Processing server 10.0.0.$ip === "
  for size in 1 48 512 1024 4096; do
    echo " --- PING ---"
    ping -M do -c 5 10.0.0.$ip -s $size
    echo " ---- Record Size $size ---- "
    netperf -H 10.0.0.$ip -4 -p 3307 -I 95,10 -i 3,3 -j -a 4096 -A 4096 -P 1 -v 2 -l 20 -t TCP_RR -- -b 5 -r ${size}K,48K -s 1M -S 1M
    echo " ---- ================= ---- "
  done
  echo " ==== ----------------- === "
done

 

Categories: Web Technologies

MySQL High Availability On-Premises: A Geographically Distributed Scenario

Planet MySQL - Thu, 11/15/2018 - 16:46


In this article, we’ll look at an example of an on-premises, geographically distributed MySQL high availability solution. It’s part of a longer series on some high availability reference architecture solutions over geographically distributed areas.

Part 1: Reference Architecture(s) for High Availability Solutions in Geographic Distributed Scenarios: Why Should I Care?

Percona Consulting's main aim is to identify simple solutions to complex problems. We try to focus on identifying the right tool, a more efficient solution, and what can be done to make our customers' lives easier. We believe in doing the work once, doing it well, and having more time afterward for other aspects of life.

In our journey, we often receive requests for help – some simple, some complicated.  

Scenario

The company “ACME Inc.” is moving its whole business from a monolithic application to a distributed application, split into services. Each different service deals with the requests independently from each other. Some services follow the tightly-bounded transactional model, and others work/answer asynchronously. Each service can access the data storage layer independently.

In this context, ACME Inc. identified the need to distribute the application services over wide geographic regions, focusing on each region achieving scale independently.

The identified regions are:

  • North America
  • Europe
  • China

ACME Inc. is also aware of the fact that different legislation acts on each region. As such, each region requires independent information handling about sales policies, sales campaigns, customers, orders, billing and localized catalogs, but will share the global catalog and some historical aggregated data. While most of the application services will work feeding and reading local distributed caches, the basic data related to the catalog, sales and billing is based on an RDBMS.

Historical data is instead migrated to a "Big Data" platform, and aggregated data is elaborated and pushed to a DWH solution at HQ. The application components are developed using multiple programming languages, depending on the service.

The RDBMS identified by ACME Inc., in collaboration with the local authorities, was MySQL-oriented. There were several alternative solutions, like:

  • PostgreSQL
  • Oracle DB
  • MS SQL server

We excluded closed-source RDBMSs given that some countries imposed a specific audit plugin. This plugin was only available for the mentioned platforms. The cost of parallel development and subsequent maintenance in case of RDBMS diversification was too high. As such all the regions must use the same major RDBMS component.

We excluded PostgreSQL given that, compared to it, MySQL adoption and utilization cases were higher, and MySQL had a well-defined code producer. Finally, the Business Continuity team of ACME Inc. had defined an ITSC (Information Technology Service Continuity) plan that defined the RPO (Recovery Point Objective), the RTO (Recovery Time Objective) and system redundancy.

That's it. To fulfill the ITSCP, each region must have the critical system redundantly replicated in the same region, but not in close proximity.

Talking About the Components

This is a not-so-uncommon scenario, and it also presents a lot of complexity if you try to address it with one solution. But let’s analyze it and see how we can simplify the approach while still meeting the needs and requirements of ACME Inc.

When using MySQL-based solutions, the answer to “what should we use?” is use what best fits your business needs. The “nines” availability reference table for the MySQL world (most RDBMSs) can be summarized below:

Availability (downtime/year)    Solution
90.000% (36 days)               MySQL Replication
99.900% (8 hours)               Linux Heartbeat with DRBD (obsolete DRBD)
99.900% (8 hours)               RHCS with Shared Storage (Active/Passive)
99.990% (52 minutes)            MHA/Orchestrator with at least three nodes
99.990% (52 minutes)            DRBD and Replication (obsolete DRBD)
99.995% (26 minutes)            Multi-Master (Galera replication), 3 node minimum
99.999% (5 minutes)             MySQL Cluster

An expert will tell you that it doesn't always make sense to go for the most "nines" in the list. This is because each solution comes with a tradeoff: the more high availability (HA) you get, the higher the complexity of the solution and of managing it.

For instance, the approach used in MySQL Cluster (NDB) makes this solution not suitable for generic utilization. It requires proper analysis of the application needs, data utilization and archiving before being selected. It also requires in-depth knowledge to properly manage the cluster, as it is more complex than other similar solutions.

This indirectly makes a solution based on MySQL+Galera replication the practical choice with the highest HA level, since it stays close to default, generalized MySQL utilization.

This is why MySQL+Galera replication has become, over the last six years, the most used solution for platforms looking for very high HA without the need to diverge from the standard MySQL/InnoDB approach. You can read more about Galera replication: http://galeracluster.com/products/

Read more about Percona XtraDB Cluster.

There are several distributions implementing Galera replication, including Percona XtraDB Cluster, MariaDB Cluster and Codership's Galera Cluster for MySQL.*

*Note that MariaDB Cluster/Server and all related solutions coming from MariaDB have significantly diverged from the MySQL mainstream. This often means that once migrated to MariaDB; your database will not be compatible with other MySQL solutions. In short, you are locked-in to MariaDB. It is recommended that you carefully evaluate the move to MariaDB before making that move.

Choosing the Components RDBMS

Our advice is to use Percona XtraDB Cluster (PXC), because at the moment it is one of the most flexible, reliable and compatible solutions. PXC is composed of three main components: Percona Server for MySQL, the Galera replication library (wsrep provider) and Percona XtraBackup.

The cluster is normally composed of three nodes or more. Each node can be used as a Master, but the preferred and recommended way is to use one node as a Writer and the others as Readers.

Application-wise, accessing the right node can be challenging since this means you need to be aware of which node is the writer, which is the reader, and be able to shift from one to the other if necessary.

Proxy

To simplify this process, it helps to have an additional component that works as a “proxy” connecting the application layer to the desired node(s). The most popular solutions are:

  • HAProxy
  • ProxySQL

There are several important differences between the two. But in summary, ProxySQL is a Layer 7 proxy and is MySQL protocol aware. So, while HAProxy just passes the connection through as a forward proxy (Layer 4), ProxySQL is aware of what is going through it and acts as a reverse proxy.

With ProxySQL it is possible to decide, based on several parameters, where to send traffic (read/write split and more), what must be blocked, or whether to rewrite an incoming SQL command. A lot of information is available on the ProxySQL website https://github.com/sysown/proxysql/wiki and on the Percona Database Performance Blog.
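As an illustration of that read/write split, here is a minimal sketch of the classic ProxySQL query rules, entered through its admin interface. The hostgroup numbers, credentials and the 6032 admin port are the usual defaults used in examples, not values from this architecture; adjust them to your own setup:

# Send SELECT ... FOR UPDATE to the writer hostgroup (10) and plain SELECTs to the reader hostgroup (20):
mysql -u admin -padmin -h 127.0.0.1 -P 6032 -e "
  INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
         VALUES (10, 1, '^SELECT.*FOR UPDATE', 10, 1);
  INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
         VALUES (20, 1, '^SELECT', 20, 1);
  LOAD MYSQL QUERY RULES TO RUNTIME;
  SAVE MYSQL QUERY RULES TO DISK;
"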

Backup/Restore

No RDBMS platform is safe without a well-tested procedure for backup and recovery. The Percona XtraDB Cluster package distribution comes with Percona XtraBackup as the default method for node provisioning. A good backup and restore (B/R) policy starts from ACME's ITSCP: have full and incremental backups that perfectly cover the RPO, and a good recovery procedure to keep the recovery time inside the RTO whenever possible.

There are several tools that allow you to plan and execute backup/restore procedures, some coming from vendors and others that are open source and community-oriented. To remain fully open source and community-oriented, we in consulting normally suggest using: https://github.com/dotmanila/pyxbackup.

Pyxbackup is a wrapper around XtraBackup that helps simplify the B/R operations, including the preparation of a full and incremental set. This helps significantly reduce the recovery time.  
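For reference, a sketch of the underlying XtraBackup commands for a full plus incremental set looks like the following. This is plain xtrabackup syntax, not pyxbackup's own interface; the paths are placeholders, and you would add --user/--password or a defaults file as required:

# Full base backup, then an incremental that only contains changes since the full:
xtrabackup --backup --target-dir=/backups/full
xtrabackup --backup --target-dir=/backups/inc1 --incremental-basedir=/backups/full

# To restore: prepare the full base with --apply-log-only, then merge the incremental into it.
xtrabackup --prepare --apply-log-only --target-dir=/backups/full
xtrabackup --prepare --target-dir=/backups/full --incremental-dir=/backups/inc1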

Disaster Recovery

Another very important aspect of the ITSC Plan is the capacity of the system to survive major disasters. The disaster recovery (DR) solution must be able to act as the main production environment. Therefore, it must be designed and scaled with the same resources as the main production site. It must be geographically separated, normally by hundreds of kilometers or more. It must be completely independent of the main site. It must be as much as possible in sync with the main production site.

While the first three “musts” are easy to understand, the fourth one is often the object of misunderstanding.

The concept of being as much in sync with the production site as possible creates confusion when designing HA solutions with Galera replication involved. The most common misunderstanding is the misuse of the Galera replication layer: mainly, the conceptual confusion between a tightly coupled database cluster and a loosely coupled database cluster.

Any solution based on Galera replication is a tightly coupled database cluster, because the whole idea is to be data-centric, synchronously distributed and consistent. The price is that this solution cannot be geographically distributed.

Solutions like standard MySQL replication are instead loosely coupled database clusters, and they are designed to be asynchronous. Given that, the nodes they connect are completely independent in processing/applying transactions, and the solution fits perfectly into ANY geographically distributed replication scenario. The price is that data on the receiving end might not be up to date with the source at that specific instant.

The point is that for the DR site the ONLY valid solution is the asynchronous link (loosely coupled database cluster), because by design and requirement the two sites must be separated by a significant number of kilometers. For a better understanding of why synchronous replication cannot work in a geographically distributed scenario, see "Misuse of Geographic Node distribution with Galera-based replication".

In our scenario, the use of Percona XtraDB Cluster helps to create a more robust asynchronous solution. This is because each tightly coupled database cluster, whether source or destination, will be seen by the other tightly coupled database cluster as a single entity.

What it means is that we can shift from one node to another inside the two clusters, still confident we will have the same data available and the same asynchronous stream passing from one source to the other.
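A minimal sketch of that asynchronous link, issued on one node of the receiving cluster, could look like the following. The hostname, user and password are placeholders, and GTID-based replication is assumed so that the stream can follow a failover inside the source cluster:

mysql -e "
  CHANGE MASTER TO
    MASTER_HOST = 'pxc-dc1-node1',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = '********',
    MASTER_AUTO_POSITION = 1;
  START SLAVE;
"
# Quick health check of the asynchronous channel:
mysql -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'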

To ensure this procedure is fully automated, we add to our architecture the last block: Replication Manager for Percona XtraDB Cluster (https://github.com/y-trudeau/Mysql-tools/tree/master/PXC). RMfP is another open source tool that simplifies and automates failover inside each PXC cluster, such that our asynchronous solution doesn't suffer if the node currently acting as Master fails.

How to Link the Components

Summarizing all the different components of our solution:

  • Application stack
    • Load balancer
    • Application nodes by service
    • Distributed caching
    • Data access service
  • Database stack
    • Data proxy (ProxySQL)
    • RDBMS (Percona XtraDB Cluster)
    • Backup/Restore
      • Xtrabackup
      • Pyxbackup
      • Custom scripts
    • DR
      • Replication Manager for Percona XtraDB Cluster
  • Monitoring
    • PMM (not covered here see <link> for detailed information)

 

In the solution above, we have two locations separated by several kilometers. On top of them, the load balancer(s)/DNS resolution redirects the incoming traffic to the active site. Each site hosts a full application stack, and applications connect to local ProxySQL.

ProxySQL has read/write split enabled to optimize platform utilization, and is configured to shift writes from one PXC node to another in case of node failure. Asynchronous replication connects the two locations and transmits data from master to slave.

Note that with this solution, it is possible to have multiple geographically distributed sites.

Backups are taken at each site independently and recovery test is performed. RMfP oversees and modifies the replication channels in the case of a node failure.

Finally, Percona Monitoring and Management (PMM) is in place to perform in-depth monitoring of the whole database platform.

Conclusions

We always look for the most efficient, manageable, user-friendly combination of products, because we believe in providing and supporting the community with simple but efficient solutions. What we have presented here is the most robust and stable high availability solution in the MySQL space (except for MySQL NDB that we have excluded). 

It is conceptualized to provide maximum service continuity, with limited bonding between the platforms/sites. It also is a well-tested solution, that has been adopted and adapted in many different scenarios where performance and real HA are a must. I have preferred to keep this digression at a high level, given the details of the implementation have already been discussed elsewhere (see reference section for more reading).

Still, Percona XtraDB Cluster (as any other solution implementing Galera replication) might not fit the final use. Given that, it is important to understand where it does and doesn’t fit. This article is a good summary with examples: Is Synchronous Replication right for your app?.

Check out the next article on How Not to do MySQL High Availability.

References

https://www.percona.com/blog/2016/06/07/choosing-mysql-high-availability-solutions/

https://dev.mysql.com/doc/mysql-ha-scalability/en/ha-overview.html

https://www.percona.com/blog/2014/11/17/typical-misconceptions-on-galera-for-mysql/

http://galeracluster.com/documentation-webpages/limitations.html

http://tusacentral.net/joomla/index.php/mysql-blogs/170-geographic-replication-and-quorum-calculation-in-mysqlgalera.html

http://tusacentral.net/joomla/index.php/mysql-blogs/167-geographic-replication-with-mysql-and-galera.html

http://tusacentral.net/joomla/index.php/mysql-blogs/164-effective-way-to-check-the-network-connection-when-in-need-of-a-geographic-distribution-replication-.html

http://tusacentral.net/joomla/index.php/mysql-blogs/183-proxysql-percona-cluster-galera-integration.html

https://github.com/sysown/proxysql/wiki

 

Categories: Web Technologies

Reference Architecture(s) for High Availability Solutions in Geographic Distributed Scenarios: Why Should I Care?

Planet MySQL - Thu, 11/15/2018 - 16:44


In this series of blog posts, I’m going to look at some high availability reference architecture solutions over geographically distributed areas.

The Problem

Nowadays, when businesses plan a new service or application, it is very common for them to worry about ensuring a very high level of availability. 

It doesn’t matter if we are talking about an online shop, online banking or the internal services of a large organization. We know users are going to expect access to services 24x7x365. They also expect to access data consistently and instantaneously. If we fail to meet their expectations, then they move to another provider and we lose money. Simple as that.

The other important aspect of providing online services and applications is that the amount of data produced, analyzed and stored is growing every day. We’ve moved from the few gigabytes of yesterday to terabytes today. Who knows what number of petabytes we need tomorrow?

What was once covered with a single LAMP stack can today require dozens of Ls, As, different letters instead of P (like J, R, Py, G) and M. Our beloved MySQL that used to be "enough" to cover our needs 12 years ago no longer fits all the needs of many modern applications.

It is very common to have an application using different types of "storage" at different levels and in different aspects of its activities. We can use a key-value store to cache in-flight operations, and a relational, fully ACID database for the "valuable" core data (the kind of data that must be consistent and durable). Large data gets stored in an eventually consistent column store mechanism, and long-term data in some "big data" approach.

On top of all this are reporting mechanisms that collect elements of each data store to provide the required, comprehensive data picture. The situation is very diversified and complex, and the number of possible variables is high. The ways we can combine them are so vast that nowadays developers have no limits, and often come up with creative solutions.

This is where we as architects can help: we can clarify how each tool can be used for the right job. We at Percona have the strong belief that we must seek simplicity in the complexity, embracing the KISS approach. This starts with the initial identification of the right tool for the job.

Let's start by looking at a few good practices:

  • It is not a good idea to use key-value storage if you need to define the relationship between entities and rules between them.
  • Avoid using an eventually consistent storage when you have to save monetary information about customer payments.
  • It’s not a best practice to use a relational database to store HTML caching, page-tracking information, or game info in real time.

Use the right tool for the right job. Some tools scale writes better and keep an eventually consistent approach. Some others are designed to store an unbelievable amount of data, but cannot handle relations. As a result, they might take a long time when processing a typical OLTP request – if they can at all. Each tool has a different design and goal, each one scales differently, and each one has its way of handling and improving availability.

It is a crucial part of the architectural phase of your project not to mix the cards. Keep things clean and build the right architecture for each component, then combine them in a way that harmonizes in the final result. We should optimize each block, solving a complex issue with simple answers.

How far are we from the old LAMP single stack? Ages. It is like turning your head and looking at our ancestors building the first tents. Tents are still a valid solution if you want to go camping. But only for fun, not for everyday life.

There is too often confusion around what a relational database should do and how it should do it. A relational database should not replace every other component of the wide architecture, and vice versa. They must coexist and work together with other options. Each one should maximize its characteristics and minimize its limitations.

In this series, we will focus on RDBMSs, and we will present a few possible reference architectures for the relational database layer. I will illustrate solutions that improve service availability, keeping a focus on each tool's design and on the relational data approach with respect to the ACID paradigm.

This means employing the simple rules of:

  • Atomicity -> All operations, part of the same transaction, are concluded successfully or not applied at all.
  • Consistency -> Any data written must be valid/validated against the defined rules and combination thereof.
  • Isolation -> Guarantees that all transactions will occur in isolation. No transaction affects any other transaction.
  • Durability -> Durability means that, once a transaction is committed, it will remain in the system even if there’s a system crash immediately following the transaction. Transaction changes must be stored permanently.

We will discuss the solution involving the most common open source RDBMSs, covering on-premises and in the cloud:

  • MySQL
  • PostgreSQL
  • MongoDB

The scenario will be common to all solutions, but the way we implement each solution will answer different needs. The first example is MySQL high availability on-premises: MySQL High Availability On-Premises.

Categories: Web Technologies

CSS Animations and Transitions in Email

CSS-Tricks - Thu, 11/15/2018 - 14:55

We don't generally think of CSS animations or transitions inside of email, or really any movement at all outside of an awkward occasional GIF. But there is really no reason you can't use them inside HTML emails, particularly if you do it in a progressive enhancement-friendly way. Like, you could style a link with a hover state and a shaking animation, but if the animation (or even the hover) doesn't work, it's still a functional link. Heck, you can use CSS grid in email, believe it or not.

Jason Rodriguez just wrote Understanding CSS Animations in Email: Transitions and Keyframe Animations, which covers some of the possibilities. The list of email clients that support CSS transitions and keyframe animations includes Apple Mail, Outlook, and AOL Mail, among others.

Other things to look at:

The post CSS Animations and Transitions in Email appeared first on CSS-Tricks.

Categories: Web Technologies

Scaling CSS: Two Sides of a Spectrum

CSS-Tricks - Thu, 11/15/2018 - 07:16

The subject of scaling CSS came up a lot in a recent ShopTalk Show with Ben Frain. Ben has put a lot of thought into the subject, even writing a complete book on it, Enduring CSS, which is centered around a whole ECSS methodology.

He talked about how there are essentially two solutions for styling at scale:

  1. Total isolation
  2. Total abstraction

Total isolation is some version of writing styles scoped to some boundary that you've set up (like a component) in which those styles don't leak in or out.

Total abstraction is some version of writing styles that are global, yet so generic and re-usable, that they have no unintended side effects.

Total isolation might come from <style scoped> in a .vue file, CSS modules in which CSS class selectors and HTML class attributes are dynamically generated gibberish, or a CSS-in-JS project, like glamorous. Even strictly-followed naming conventions like BEM can be a form of total isolation.

Total abstraction might come from a project, like Tachyons, that gives you a fixed set of class names to use for styling (Tailwind is like a configurable version of that), or a programmatic tool (like Atomizer) that turns specially named HTML class attributes into a stylesheet with exactly what it needs.

It's the middle ground that has problems. It's using a naming methodology, but not holding strictly to it. It's using some styles in components, but also having a global stylesheet that does random other things. Or, it's having lots of developers contributing to a styling system that has no strict rules and mixes global and scoped styles. Any stylesheet that grows and grows and grows. Fighting it by removing some unused styles isn't a real solution (and here's why).

Note that the web is a big place and not all projects need a scaling solution. A huge codebase with hundreds of developers that needs to be maintained for decades absolutely does. My personal site does not. I've had my fair share of styling problems, but I've never been so crippled by them that I've needed to implement something as strict as Atomic CSS (et al.) to get work done. Nor at any job I've had so far. I see the benefits though.

Imagine the scale of Twitter.com over a decade! Nicolas has a great thread where he compares Twitter's PWA against Twitter's legacy desktop website.

The legacy site's CSS is what happens when hundreds of people directly write CSS over many years. Specificity wars, redundancy, a house of cards that can't be fixed. The result is extremely inefficient and error-prone styling that punishes users and developers alike.

The post Scaling CSS: Two Sides of a Spectrum appeared first on CSS-Tricks.

Categories: Web Technologies

Why monday.com is the Universal Team Management Tool for Your Team

CSS-Tricks - Thu, 11/15/2018 - 06:52

This platform is perfect for teams sized at 2-to-200 — and gives every employee the same level of transparency.

Every project management tool seeks to do the same instrumental thing: keep teams connected, on task and on deadline to get major initiatives done. But the market is getting pretty crowded, and for good reason — no platform seems to have gotten the right feel for what people need to see, and how that information should be displayed so that it’s both actionable/relevant and contextualized.

That's why monday.com is worth a shot. The platform is based on a simple but powerful idea: that as humans, we like to feel like we're contributing to part of a greater good/effort — an idea that sometimes gets lost in the shuffle as we focus on the details of getting stuff done. So projects are put onto a task board (think of it like a digital whiteboard), where everyone can have the same level of visibility into anyone else who's contributing a set of tasks. That transparency breaks down the silos between teams that cause communication errors and costly project mistakes — and it's a beautiful, simple way to connect people to the processes that drive forward big business initiatives.

Whether you’re part of a tech-forward team or not, monday.com is a welcome relief to cumbersome Excel files, messy (physical) whiteboards, or meetings that waste time when actual work could be completed. The scalable, intuitive structure can effectively work for a team of two, or an international team of 2,000+ — and a beautiful, color-coded board lays out tasks you can cleanly see and tag for various stages of completion. That way, employees can see exactly what needs to be done (and who needs to do it), while managers can optimize their time re-allocating resources as necessary to optimize processes. It’s a win-win.

monday.com also allows teams to communicate within the platform, cutting down on the amount of laborious sifting through various email threads to figure out a workflow. Messages can be sent inside of tasks — so all the communication is contextualized before meeting resolution or seeking it. The platform also supports uploads, so documents and videos can be added to facilitate more collaboration, and integration with other productivity apps. So if your team is already using tools like Slack, Google Calendar, Dropbox, Microsoft Excel, Trello, and Jira, there’s specific, clean shortcuts to integrate the information from those platforms into monday.com. And even beyond team communication and management, you can use monday.com for client-facing exchanges, so all your messages are consolidated into a single place.

The platform recently raised $50M in funding, and received nods from the likes of Forbes, Entrepreneur, Business Insider, and more for its ability to empower international teams to do better work together. Best of all, unlike other team management software, which can be pricey and time-intensive to scope, test and run, you can try monday.com today — for free.

What can this app do?
  • Creating and managing a project’s milestones
  • Creating and assigning tasks
  • Attaching files to any project's table
  • Using mobile applications to manage projects on the go
  • Communicating with your team members
  • Updating team using the news feed
  • Keeping clients in the loop
  • Organizing the organization into teams
  • Creating detailed project charts and reports
  • Tracking the time your team members spend on tasks
  • Managing a project's financials
  • Website as well as a desktop app for Mac and Windows

monday.com aims to make every user feel empowered and part of something bigger than their own individual tasks and, as a result, to boost collective productivity and transparency.

The post Why monday.com is the Universal Team Management Tool for Your Team appeared first on CSS-Tricks.

Categories: Web Technologies

Importing Data from MongoDB to MySQL using Python

MySQL Server Blog - Thu, 11/15/2018 - 05:16

MySQL Shell 8.0.13 (GA) introduced a new feature to allow you to easily import JSON documents into MySQL. The basics of this new feature were described in a previous blog post. In this blog we will provide more details about this feature, focusing on a practical use case of interest to many: how to import JSON data from MongoDB to MySQL.…
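The full post is truncated here, but the general shape of such a workflow is sketched below with hypothetical database and collection names. This is not the post's own code, and the util.import_json() option names should be double-checked against the MySQL Shell 8.0.13 documentation:

# Dump a MongoDB collection as line-delimited JSON, then load it with MySQL Shell in Python mode over X Protocol.
mongoexport --db shop --collection products --out products.json
mysqlsh root@localhost:33060/shop --py -e "util.import_json('products.json', {'collection': 'products'})"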

Categories: Web Technologies

