
How to Install Seafile with Nginx on Ubuntu 18.04 LTS

Planet MySQL - Fri, 06/15/2018 - 08:20
Seafile is an open source file-hosting and cloud storage system similar to Dropbox, but you can install and run it on your own server. In this tutorial, I will show you step-by-step how to install and configure a Seafile server with Nginx web server and the MySQL database.
Categories: Web Technologies

MySQL Performance : IP port -vs- UNIX socket impact in 8.0 GA

Planet MySQL - Fri, 06/15/2018 - 07:05
Generally, when I'm analyzing MySQL performance on Linux with "localhost" test workloads, I configure client connections to use the IP port (loopback) to connect to MySQL Server rather than the UNIX socket -- this at least keeps the IP stack in the game, so if something goes odd on the IP side we become aware of it ahead of time. Indeed, this has already helped several times to discover such problems even without any network link between client and server (like this one, etc.). However, in the past we also observed a pretty significant difference in QPS results when the IP port was used compared to the UNIX socket (communication via UNIX socket was running nearly 15% faster). Over time, with newer OL kernel releases, this gap became smaller and smaller. But in all such cases it's always hard to say whether the gap shrank due to OS kernel / IP stack improvements, or simply because MySQL Server was hitting new scalability bottlenecks ;-))
Anyway, I've continued to use the IP port in all my tests until now. But recently the same discussion about IP port -vs- UNIX socket impact came up again in other investigations, and I'd like to share the results I've obtained on different HW servers with the MySQL 8.0 GA release.
The test workload is the most "aggressive" one for client/server ping-pong exchange -- Sysbench point-selects (the same test scenario that already reported 2.1M QPS on MySQL 8.0) -- but this time on 3 different servers:
  • 4CPU Sockets (4S) 96cores-HT Broadwell
  • 2CPU Sockets (2S) 48cores-HT Skylake
  • 2CPU Sockets (2S) 44cores-HT Broadwell

and see how different the results will be when exactly the same test is running via IP port / or UNIX socket.
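To make the two connection paths concrete, here is a minimal client-side sketch (assuming mysql-connector-python; the credentials and socket path are placeholders rather than the benchmark's actual settings) -- sysbench selects between the same two paths with options like --mysql-host/--mysql-port versus --mysql-socket:

import mysql.connector

# "IP port" (loopback): the connection still traverses the IP stack
tcp = mysql.connector.connect(host="127.0.0.1", port=3306,
                              user="sbtest", password="sbtest")

# UNIX socket: the IP stack is bypassed completely
sock = mysql.connector.connect(unix_socket="/var/run/mysqld/mysqld.sock",
                               user="sbtest", password="sbtest")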
Broadwell 2S 44cores-HT -- IP port -vs- UNIX socket (result graphs in the original post); comments:
  • wow, up to 18% difference!
  • and you can clearly see MySQL 8.0 passing 1.2M QPS with the UNIX socket, while staying under 1.1M QPS with the IP port.



Categories: Web Technologies

MongoDB versus MySQL Document Store command comparisons I

Planet MySQL - Thu, 06/14/2018 - 14:36
Both MongoDB and the MySQL Document Store are JSON document stores. The syntax differences between the two products are very interesting. This post will be a comparison of how commands differ between these two products, and it may evolve into a 'cheat sheet' if there is demand.

I found an excellent Mongo tutorial Getting Started With MongoDB that I use as a framework to explore these two JSON document stores.
The Data
I am using the primer-dataset.json file that MongoDB has been using for years in their documentation, classes, and examples. (MySQL has similarly created the world_x data set, based on the world database it has used for years in documentation, classes, and examples.) The primer data set is a collection of JSON documents describing restaurants around Manhattan.

For the Mongo examples the schema name is test and the collection is named restaurants, while the MySQL counterpart schema is named nyeats and the collection is also named restaurants. I kept the collection names the same between the two products and hope that the difference in schema names causes no problems. Please see my previous entry if you seek details on loading this data into the MySQL Document Store.

Starting the Shells
The first step in comparing how the two work is to access the data through their respective shells:
  • The MySQL mysqlsh shell connected to the nyeats schema
  • The MongoDB mongo shell connected to the test schema
I have windows with both shells ready to go, and now it is time to start the comparison.

All The Records in a Collection

Both use db as a global variable to point to the current schema. Simply typing db at the command prompt will report back the current active schema for both.

But what if you want to see all the records in the collection restaurants? With both you can issue db.restaurants.find(), but where MySQL returns all the documents in the collection, Mongo has a pager that requires you to type 'it' to continue.

Find Documents by Cuisine
So let's pick restaurants by their cuisine, and since Red Beans and Rice is one of my favorites we will use Cajun as the cuisine of choice. The argument to the find() function is a JSON object in Mongo and an equation for MySQL.

MySQL:  db.restaurants.find("cuisine = 'Cajun'")
Mongo:   db.restaurants.find( { "cuisine" : "Cajun" })

The output is shown below under 'Output From Cajun Cuisine', as it takes up a lot of real estate on a computer screen. The big difference, for those who do not want to page down, is that MySQL pretty prints the output while Mongo does not. The pretty print is much easier on my old eyes.

Restaurants By Zipcode
How about we look for restaurants in one Zipcode (or postal code for those outside the USA)? By the way, a Zipcode can cover a lot of territory.

Mongo takes a JSON object as the search parameter while MySQL wants an equation. Note that we are using a second-tier key, 'address.zipcode', to reach the desired information.

MySQL:  db.restaurants.find("address.zipcode = '10075'")
MongoDB:  db.restaurants.find( { "address.zipcode": "10075" })

When $gt Is Not Greater Than >!!!

I wanted to tinker with the above by changing the equal sign to a greater than. It is easy and intuitive to change the equal sign in the MySQL argument to any other relational operator like <, >, or >=. I am still working on getting Mongo's $gt to work (not intuitive or easy).
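For reference -- and purely as a hedged sketch, not the author's tested commands -- the general shape of the two range queries is shown below. Mongo wraps the bound in a nested object with the $gt operator, and since the zipcodes are stored as strings the comparison is lexicographic.

MySQL:  db.restaurants.find("address.zipcode > '10075'")
MongoDB:  db.restaurants.find( { "address.zipcode": { "$gt": "10075" } } )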

Logical OR

So far there has not been a whole lot of difference between the two. But now we start to see differences. The $or operator for Mongo wants a JSON array of conditions, each wrapped in its own JSON object. MySQL looks more like traditional SQL.

MongoDB: db.restaurants.find( 
     { $or : [ { "cuisine": "Cajun"}, { "address.zipcode": "10075" } ] } ) 
MySQL:  db.restaurants.find(
     "cuisine = 'Cajun' OR address.zipcode = '10075'")

To me the MySQL argument looks more like every other programming language I am used to.  

Sorting on Two Keys
Let's sort the restaurants by borough and zipcode, both ascending. Mongo is looking for JSON objects with the key name and sort order (1 for ascending, -1 for descending!) while MySQL defaults to ascending on the keys provided.

MongoDB: db.restaurants.find().sort( { "borough" : 1, "address.zipcode" : 1 })
MySQL:     db.restaurants.find().sort("borough","address.zipcode")

End of Part I
I am going to spend some time diving deeper into the differences between the two, and especially Mongo's confusing (at least to me) greater-than expression.
Output From Cajun Cuisine
MySQL:
JS > db.restaurants.find("cuisine = 'Cajun'")
[
    {
        "_id": "00005b2176ae00000000000010ec",
        "address": {
            "building": "1072",
            "coord": [
                -74.0683798,
                40.6168076
            ],
            "street": "Bay Street",
            "zipcode": "10305"
        },
        "borough": "Staten Island",
        "cuisine": "Cajun",
        "grades": [
            {
                "date": {
                    "$date": 1408579200000
                },
                "grade": "A",
                "score": 13
            },
            {
                "date": {
                    "$date": 1391644800000
                },
                "grade": "A",
                "score": 9
            },
            {
                "date": {
                    "$date": 1375142400000
                },
                "grade": "A",
                "score": 12
            },
            {
                "date": {
                    "$date": 1338336000000
                },
                "grade": "A",
                "score": 8
            }
        ],
        "name": "Bayou",
        "restaurant_id": "40974392"
    },
    {
        "_id": "00005b2176ae000000000000128a",
        "address": {
            "building": "9015",
            "coord": [
                -73.8706606,
                40.7342757
            ],
            "street": "Queens Boulevard",
            "zipcode": "11373"
        },
        "borough": "Queens",
        "cuisine": "Cajun",
        "grades": [
            {
                "date": {
                    "$date": 1420848000000
                },
                "grade": "A",
                "score": 11
            },
            {
                "date": {
                    "$date": 1400457600000
                },
                "grade": "A",
                "score": 7
            },
            {
                "date": {
                    "$date": 1384473600000
                },
                "grade": "A",
                "score": 12
            },
            {
                "date": {
                    "$date": 1370390400000
                },
                "grade": "B",
                "score": 16
            },
            {
                "date": {
                    "$date": 1338249600000
                },
                "grade": "A",
                "score": 7
            }
        ],
        "name": "Big Easy Cajun",
        "restaurant_id": "41017839"
    },
    {
        "_id": "00005b2176ae0000000000002146",
        "address": {
            "building": "90-40",
            "coord": [
                -73.7997187,
                40.7042655
            ],
            "street": "160 Street",
            "zipcode": "11432"
        },
        "borough": "Queens",
        "cuisine": "Cajun",
        "grades": [
            {
                "date": {
                    "$date": 1416873600000
                },
                "grade": "A",
                "score": 9
            },
            {
                "date": {
                    "$date": 1384732800000
                },
                "grade": "A",
                "score": 10
            },
            {
                "date": {
                    "$date": 1366070400000
                },
                "grade": "B",
                "score": 16
            },
            {
                "date": {
                    "$date": 1345507200000
                },
                "grade": "B",
                "score": 18
            }
        ],
        "name": "G & L Cajun Grill",
        "restaurant_id": "41336510"
    },
    {
        "_id": "00005b2176ae0000000000002ce7",
        "address": {
            "building": "2655",
            "coord": [
                -74.1660553,
                40.5823983
            ],
            "street": "Richmond Avenue",
            "zipcode": "10314"
        },
        "borough": "Staten Island",
        "cuisine": "Cajun",
        "grades": [
            {
                "date": {
                    "$date": 1412035200000
                },
                "grade": "A",
                "score": 10
            },
            {
                "date": {
                    "$date": 1392768000000
                },
                "grade": "B",
                "score": 18
            },
            {
                "date": {
                    "$date": 1371772800000
                },
                "grade": "B",
                "score": 16
            },
            {
                "date": {
                    "$date": 1335916800000
                },
                "grade": "A",
                "score": 11
            },
            {
                "date": {
                    "$date": 1322611200000
                },
                "grade": "A",
                "score": 11
            }
        ],
        "name": "Cajun Cafe & Grill",
        "restaurant_id": "41485811"
    },
    {
        "_id": "00005b2176ae000000000000352d",
        "address": {
            "building": "509",
            "coord": [
                -73.964513,
                40.693846
            ],
            "street": "Myrtle Avenue",
            "zipcode": "11205"
        },
        "borough": "Brooklyn",
        "cuisine": "Cajun",
        "grades": [
            {
                "date": {
                    "$date": 1417651200000
                },
                "grade": "A",
                "score": 13
            },
            {
                "date": {
                    "$date": 1386028800000
                },
                "grade": "A",
                "score": 9
            },
            {
                "date": {
                    "$date": 1370390400000
                },
                "grade": "A",
                "score": 4
            },
            {
                "date": {
                    "$date": 1355529600000
                },
                "grade": "A",
                "score": 13
            }
        ],
        "name": "Soco Restaurant",
        "restaurant_id": "41585575"
    },
    {
        "_id": "00005b2176ae0000000000003579",
        "address": {
            "building": "36-18",
            "coord": [
                -73.916912,
                40.764514
            ],
            "street": "30 Avenue",
            "zipcode": "11103"
        },
        "borough": "Queens",
        "cuisine": "Cajun",
        "grades": [
            {
                "date": {
                    "$date": 1418256000000
                },
                "grade": "A",
                "score": 10
            },
            {
                "date": {
                    "$date": 1394668800000
                },
                "grade": "A",
                "score": 0
            },
            {
                "date": {
                    "$date": 1375488000000
                },
                "grade": "B",
                "score": 17
            },
            {
                "date": {
                    "$date": 1358467200000
                },
                "grade": "A",
                "score": 10
            },
            {
                "date": {
                    "$date": 1341446400000
                },
                "grade": "A",
                "score": 12
            },
            {
                "date": {
                    "$date": 1324080000000
                },
                "grade": "A",
                "score": 10
            }
        ],
        "name": "Sugar Freak",
        "restaurant_id": "41589054"
    },
    {
        "_id": "00005b2176ae0000000000004172",
        "address": {
            "building": "1433",
            "coord": [
                -73.9535815,
                40.6741202
            ],
            "street": "Bedford Avenue",
            "zipcode": "11216"
        },
        "borough": "Brooklyn",
        "cuisine": "Cajun",
        "grades": [
            {
                "date": {
                    "$date": 1397001600000
                },
                "grade": "A",
                "score": 8
            },
            {
                "date": {
                    "$date": 1365033600000
                },
                "grade": "A",
                "score": 10
            }
        ],
        "name": "Catfish",
        "restaurant_id": "41685267"
    }
]
7 documents in set (0.0488 sec)



Mongo:

db.restaurants.find( { "cuisine" : "Cajun" })
{ "_id" : ObjectId("5b2293b5f46382c40db834ce"), "address" : { "building" : "1072", "coord" : [ -74.0683798, 40.6168076 ], "street" : "Bay Street", "zipcode" : "10305" }, "borough" : "Staten Island", "cuisine" : "Cajun", "grades" : [ { "date" : ISODate("2014-08-21T00:00:00Z"), "grade" : "A", "score" : 13 }, { "date" : ISODate("2014-02-06T00:00:00Z"), "grade" : "A", "score" : 9 }, { "date" : ISODate("2013-07-30T00:00:00Z"), "grade" : "A", "score" : 12 }, { "date" : ISODate("2012-05-30T00:00:00Z"), "grade" : "A", "score" : 8 } ], "name" : "Bayou", "restaurant_id" : "40974392" }
{ "_id" : ObjectId("5b2293b5f46382c40db8366b"), "address" : { "building" : "9015", "coord" : [ -73.8706606, 40.7342757 ], "street" : "Queens Boulevard", "zipcode" : "11373" }, "borough" : "Queens", "cuisine" : "Cajun", "grades" : [ { "date" : ISODate("2015-01-10T00:00:00Z"), "grade" : "A", "score" : 11 }, { "date" : ISODate("2014-05-19T00:00:00Z"), "grade" : "A", "score" : 7 }, { "date" : ISODate("2013-11-15T00:00:00Z"), "grade" : "A", "score" : 12 }, { "date" : ISODate("2013-06-05T00:00:00Z"), "grade" : "B", "score" : 16 }, { "date" : ISODate("2012-05-29T00:00:00Z"), "grade" : "A", "score" : 7 } ], "name" : "Big Easy Cajun", "restaurant_id" : "41017839" }
{ "_id" : ObjectId("5b2293b5f46382c40db84528"), "address" : { "building" : "90-40", "coord" : [ -73.7997187, 40.7042655 ], "street" : "160 Street", "zipcode" : "11432" }, "borough" : "Queens", "cuisine" : "Cajun", "grades" : [ { "date" : ISODate("2014-11-25T00:00:00Z"), "grade" : "A", "score" : 9 }, { "date" : ISODate("2013-11-18T00:00:00Z"), "grade" : "A", "score" : 10 }, { "date" : ISODate("2013-04-16T00:00:00Z"), "grade" : "B", "score" : 16 }, { "date" : ISODate("2012-08-21T00:00:00Z"), "grade" : "B", "score" : 18 } ], "name" : "G & L Cajun Grill", "restaurant_id" : "41336510" }
{ "_id" : ObjectId("5b2293b5f46382c40db850c6"), "address" : { "building" : "2655", "coord" : [ -74.1660553, 40.5823983 ], "street" : "Richmond Avenue", "zipcode" : "10314" }, "borough" : "Staten Island", "cuisine" : "Cajun", "grades" : [ { "date" : ISODate("2014-09-30T00:00:00Z"), "grade" : "A", "score" : 10 }, { "date" : ISODate("2014-02-19T00:00:00Z"), "grade" : "B", "score" : 18 }, { "date" : ISODate("2013-06-21T00:00:00Z"), "grade" : "B", "score" : 16 }, { "date" : ISODate("2012-05-02T00:00:00Z"), "grade" : "A", "score" : 11 }, { "date" : ISODate("2011-11-30T00:00:00Z"), "grade" : "A", "score" : 11 } ], "name" : "Cajun Cafe & Grill", "restaurant_id" : "41485811" }
{ "_id" : ObjectId("5b2293b5f46382c40db8590d"), "address" : { "building" : "509", "coord" : [ -73.964513, 40.693846 ], "street" : "Myrtle Avenue", "zipcode" : "11205" }, "borough" : "Brooklyn", "cuisine" : "Cajun", "grades" : [ { "date" : ISODate("2014-12-04T00:00:00Z"), "grade" : "A", "score" : 13 }, { "date" : ISODate("2013-12-03T00:00:00Z"), "grade" : "A", "score" : 9 }, { "date" : ISODate("2013-06-05T00:00:00Z"), "grade" : "A", "score" : 4 }, { "date" : ISODate("2012-12-15T00:00:00Z"), "grade" : "A", "score" : 13 } ], "name" : "Soco Restaurant", "restaurant_id" : "41585575" }
{ "_id" : ObjectId("5b2293b5f46382c40db8596b"), "address" : { "building" : "36-18", "coord" : [ -73.916912, 40.764514 ], "street" : "30 Avenue", "zipcode" : "11103" }, "borough" : "Queens", "cuisine" : "Cajun", "grades" : [ { "date" : ISODate("2014-12-11T00:00:00Z"), "grade" : "A", "score" : 10 }, { "date" : ISODate("2014-03-13T00:00:00Z"), "grade" : "A", "score" : 0 }, { "date" : ISODate("2013-08-03T00:00:00Z"), "grade" : "B", "score" : 17 }, { "date" : ISODate("2013-01-18T00:00:00Z"), "grade" : "A", "score" : 10 }, { "date" : ISODate("2012-07-05T00:00:00Z"), "grade" : "A", "score" : 12 }, { "date" : ISODate("2011-12-17T00:00:00Z"), "grade" : "A", "score" : 10 } ], "name" : "Sugar Freak", "restaurant_id" : "41589054" }
{ "_id" : ObjectId("5b2293b6f46382c40db86551"), "address" : { "building" : "1433", "coord" : [ -73.9535815, 40.6741202 ], "street" : "Bedford Avenue", "zipcode" : "11216" }, "borough" : "Brooklyn", "cuisine" : "Cajun", "grades" : [ { "date" : ISODate("2014-04-09T00:00:00Z"), "grade" : "A", "score" : 8 }, { "date" : ISODate("2013-04-04T00:00:00Z"), "grade" : "A", "score" : 10 } ], "name" : "Catfish", "restaurant_id" : "41685267" }


Categories: Web Technologies

Metacat: Making Big Data Discoverable and Meaningful at Netflix

Planet MySQL - Thu, 06/14/2018 - 09:41

by Ajoy Majumdar, Zhen Li

Most large companies have numerous data sources with different data formats and large data volumes. These data stores are accessed and analyzed by many people throughout the enterprise. At Netflix, our data warehouse consists of a large number of data sets stored in Amazon S3 (via Hive), Druid, Elasticsearch, Redshift, Snowflake and MySQL. Our platform supports Spark, Presto, Pig, and Hive for consuming, processing and producing data sets. Given the diverse set of data sources, and to make sure our data platform can interoperate across these data sets as one “single” data warehouse, we built Metacat. In this blog, we will discuss our motivations in building Metacat, a metadata service to make data easy to discover, process and manage.

Objectives

The core architecture of the big data platform at Netflix involves three key services. These are the execution service (Genie), the metadata service, and the event service. These ideas are not unique to Netflix, but rather a reflection of the architecture that we felt would be necessary to build a system not only for the present, but for the future scale of our data infrastructure.

Many years back, when we started building the platform, we adopted Pig as our ETL language and Hive as our ad-hoc querying language. Since Pig did not natively have a metadata system, it seemed ideal for us to build one that could interoperate between both.

Thus Metacat was born, a system that acts as a federated metadata access layer for all the data stores we support -- a centralized service that our various compute engines can use to access the different data sets. In general, Metacat serves three main objectives:

  • Federated views of metadata systems
  • Unified API for metadata about datasets
  • Arbitrary business and user metadata storage of datasets

It is worth noting that other companies that have large and distributed data sets also have similar challenges. Apache Atlas, Twitter’s Data Abstraction Layer and Linkedin’s WhereHows (Data Discovery at Linkedin), to name a few, are built to tackle similar problems, but in the context of the respective architectural choices of the companies.

Metacat

Metacat is a federated service providing a unified REST/Thrift interface to access metadata of various data stores. The respective metadata stores are still the source of truth for schema metadata, so Metacat does not materialize it in its storage. It only directly stores the business and user-defined metadata about the datasets. It also publishes all of the information about the datasets to Elasticsearch for full-text search and discovery.

At a higher level, Metacat features can be categorized as follows:

  • Data abstraction and interoperability
  • Business and user-defined metadata storage
  • Data discovery
  • Data change auditing and notifications
  • Hive metastore optimizations
Data Abstraction and Interoperability

Multiple query engines like Pig, Spark, Presto and Hive are used at Netflix to process and consume data. By introducing a common abstraction layer, datasets can be accessed interchangeably by different engines. For example, a Pig script reading data from Hive will see the table's Hive column types mapped to Pig types. For data movement from one data store to another, Metacat makes the process easy by helping to create the new table in the destination data store using the destination data store's data types. Metacat has a defined list of supported canonical data types and has mappings from these types to each respective data store type. For example, our data movement tool uses this feature for moving data from Hive to Redshift or Snowflake.

The Metacat Thrift service supports the Hive Thrift interface for easy integration with Spark and Presto. This lets us funnel all metadata changes through one system, which in turn allows us to publish notifications about those changes for data-driven ETL. When new data arrives, Metacat can notify dependent jobs to start.

Business and User-defined Metadata

Metacat stores additional business and user-defined metadata about datasets in its storage. We currently use business metadata to store connection information (for RDS data sources, for example), configuration information, metrics (Hive/S3 partitions and tables), and table TTL (time-to-live), among other use cases. User-defined metadata, as the name suggests, is free-form metadata that can be set by users for their own usage.

Business metadata can also be broadly categorized into logical and physical metadata. Business metadata about a logical construct such as a table is considered as logical metadata. We use metadata for data categorization and for standardizing our ETL processing. Table owners can provide audit information about a table in the business metadata. They can also provide column default values and validation rules to be used for writes into the table.

Metadata about the actual data stored in the table or partition is considered as physical metadata. Our ETL processing stores metrics about the data at job completion, which is later used for validation. The same metrics can be used for analyzing the cost + space of the data. Given two tables can point to the same location (like in Hive), it is important to have the distinction of logical vs physical metadata because two tables can have the same physical metadata but have different logical metadata.

Data Discovery

As consumers of the data, we should be able to easily browse through and discover the various data sets. Metacat publishes schema metadata and business/user-defined metadata to Elasticsearch that helps in full-text search for information in the data warehouse. This also enables auto-suggest and auto-complete of SQL in our Big Data Portal SQL editor. Organizing datasets as catalogs helps the consumer browse through the information. Tags are used to categorize data based on organizations and subject areas. We also use tags to identify tables for data lifecycle management.

Data Change Notification and Auditing

Metacat, being a central gateway to the data stores, captures any metadata changes and data updates. We have also built a push notification system around table and partition changes. Currently, we are using this mechanism to publish events to our own data pipeline (Keystone) for analytics to better understand our data usage and trending. We also publish to Amazon SNS. We are evolving our data platform architecture to be an event-driven architecture. Publishing events to SNS allows other systems in our data platform to “react” to these metadata or data changes accordingly. For example, when a table is dropped, our S3 warehouse janitor services can subscribe to this event and clean up the data on S3 appropriately.

Hive Metastore Optimizations

The Hive metastore, backed by an RDS instance, does not perform well under high load. We have noticed a lot of issues around writing and reading partitions using the metastore APIs. Given this, we no longer use these APIs. We have made improvements in our Hive connector that talks directly to the backing RDS for reading and writing partitions. Previously, Hive metastore calls to add a few thousand partitions usually timed out; with our implementation, this is no longer a problem.

Next Steps

We have come a long way on building Metacat, but we are far from done. Here are some additional features that we still need to work on to enhance our data warehouse experience.

  • Schema and metadata versioning to provide the history of a table. For example, it is useful to track the metadata changes for a specific column or be able to view table size trends over time. Being able to ask what the metadata looked like at a point in the past is important for auditing, debugging, and also useful for reprocessing and roll-back use cases.
  • Provide contextual information about tables for data lineage. For example, metadata like table access frequency can be aggregated in Metacat and published to a data lineage service for use in ranking the criticality of tables.
  • Add support for data stores like Elasticsearch and Kafka.
  • Pluggable metadata validation. Since business and user-defined metadata is free form, to maintain integrity of the metadata, we need validations in place. Metacat should have a pluggable architecture to incorporate validation strategies that can be executed before storing the metadata.

As we continue to develop features to support our use cases going forward, we’re always open to feedback and contributions from the community. You can reach out to us via Github or message us on our Google Group. We hope to share more of what our teams are working on later this year!

And if you’re interested in working on big data challenges like this, we are always looking for great additions to our team. You can see all of our open data platform roles here.

Metacat: Making Big Data Discoverable and Meaningful at Netflix was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Categories: Web Technologies

What is the Top Cause of Application Downtime Today?

Planet MySQL - Thu, 06/14/2018 - 02:43

I frequently talk to our customer base about what keeps them up at night. While there is a large variance of answers, they tend to fall into one of two categories. The first is the conditioned fear of some monster lurking behind the scenes that could pounce at any time. The second, of course, is the actual monster of downtime on a critical system. Ask most tech folks and they will tell you outages seem to only happen late at night or early in the morning. And that they do keep them up.

Entire companies and product lines have been built around providing those in the IT world with some ability to sleep at night. Modern enterprises have spent millions to mitigate the risk and prevent their businesses from having a really bad day because of an outage. Cloud providers are attuned to the downtime dilemma and spend lots of time, money, and effort to build in redundancy and make “High Availability” (HA) as easy as possible. The frequency of “hardware” or server issues continues to dwindle.

Where does the downtime issue start?

In my discussions, most companies I have talked to say their number one cause of outages and customer interruptions is ultimately related to the deployment of new or upgraded code. Often I hear that the operations team has little or no involvement with an application until it’s put into production. It is a bit ironic that this is also the area where companies tend to drastically under-invest. They opt instead to invest in ways to “Scale Out or Up”, or perhaps in how to survive asteroids hitting two out of three of their data centers.

Failing over broken or slow code from one server to another does not fix it. Adding more servers to distribute the load can mitigate a problem, but can also escalate the cost dramatically. In most cases, the solutions they apply don’t address the primary cause of the problems.

While there are some fantastic tools out there that can help with getting better visibility into code level issues — such as New Relic, AppDynamics and others — the real problem is that these often end up being used to diagnose issues after they have appeared in production. Most companies carry out some amount of testing before releasing code, but typically it is a fraction of what they should be doing. Working for a company that specializes in open source databases, we get a lot of calls on issues that have prevented companies’ end users from using critical applications. Many of these problems are fixable before they cost a loss of revenue and reputation.

I think it’s time technology companies start to rethink our QA, Testing, and Pre-Deployment requirements. How much time, effort, and money can we save if we catch these “monsters” before they make it into production?

Not to mention how much better our operations team will sleep . . .

The post What is the Top Cause of Application Downtime Today? appeared first on Percona Database Performance Blog.

Categories: Web Technologies

Next week in Barcelona

Planet MySQL - Thu, 06/14/2018 - 01:16

Next week I will be speaking at DataOps in Barcelona about the MySQL 8.0 Document Store. If you don’t know it yet, I really invite you to join this talk; you will be very surprised by all that MySQL can do in the NoSQL world!

There will also be a lot of other MySQL-related sessions by many good speakers from the MySQL Community.

As I will be in Barcelona, the Barcelona MySQL Meetup invited me to give a session about MySQL InnoDB Cluster and Group Replication, and I will also share the stage with my friend and colleague Abel Flórez. Don’t hesitate to join us and register for this event!

Ola, see you soon in Barcelona!

Categories: Web Technologies

More Porting Data from MongoDB to the MySQL Document Store

Planet MySQL - Wed, 06/13/2018 - 14:56
Last time we looked at moving a JSON data set from MongoDB to the MySQL Document Store.  Let's move another and then see how to investigate this data.  We will use the primer-dataset.json file that contains data on restaurants around New York City.

Loading Data
The loading of the JSON data set was covered last time, but here is the gist. The first step is to fire up the MySQL Shell and log in to the server.
We need a new schema for this data, and the example creates one named nyeats. Within that new schema a collection is created with the name restaurants.


Then we switch to Python mode to load the data: a simple program reads the data from a file line by line and loads each document into the collection.
What Types of Restaurants?
We can quickly look at all the restaurants, but it may be easier to start out looking at the types of cuisine. And it would be nice to see the numbers of each type.
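The query itself appears as a screenshot in the original post; a hedged reconstruction that matches the result keys shown below (the exact field expressions are an assumption) groups on the cuisine key and counts the documents in each group:

db.restaurants.find().fields(["$.cuisine", "count('*')"]).groupBy(["$.cuisine"]).sort(["count('*')"])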

The result set:

[
    {
        "$.cuisine": "Polynesian",
        "count('*')": 1
    },
    {
        "$.cuisine": "Café/Coffee/Tea",
        "count('*')": 2
    },
    {
        "$.cuisine": "Cajun",
        "count('*')": 7
    },
...
    {
        "$.cuisine": "Latin (Cuban, Dominican, Puerto Rican, South & Central American)",
        "count('*')": 850
    },
    {
        "$.cuisine": "Other",
        "count('*')": 1011
    }
]
85 documents in set (0.7823 sec)

The big surprise for me was the 1,011 other restaurants after seeing a rather inclusive list of cuisine styles.

Feel Like Red Beans and Rice?
So let's narrow our search down and look for some Cajun food. But since we are health conscious, we will also want to check the health department ratings on these restaurants.
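The query and its output appear as screenshots in the original; a hedged sketch of such a query (assuming the first entry in the grades array is the most recent inspection) is:

db.restaurants.find("cuisine = 'Cajun'").fields(["name", "grades[0].grade"])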
And we can see the names of the restaurants along with their latest health department grades.

Next time we will dig deeper into our NYC restaurants

Categories: Web Technologies

Log Buffer #547: A Carnival of the Vanities for DBAs

Planet MySQL - Wed, 06/13/2018 - 07:57

This Log Buffer edition covers Oracle, SQL Server and MySQL.

Cloud:

What DBAs need to know about Cloud Spanner, Part 1: Keys and indexes

Introducing sole-tenant nodes for Google Compute Engine — when sharing isn’t an option

A serverless solution for invoking AWS Lambda at a sub-minute frequency

Amazon Aurora MySQL DBA Handbook – connection management

Why you should bet on Azure for your infrastructure needs, today and in the future

Oracle:

The ability to make grants on objects in the database such as tables, views, procedures or others such as SELECT, DELETE, EXECUTE and more is the cornerstone of giving other users or schemas granular access to objects.

While clients tend to tell developers to skip wireframing and prototyping, seasoned veterans tell newbies that they can skip wireframing and proceed with prototyping.

Last week in Stream Processing & Analytics – 6.6.2018

Facebook, Google and Custom Authentication in the same Oracle APEX 18.1 app

Quick install of Prometheus, node_exporter and Grafana

MySQL:

Benchmarking the Read Backup feature in the NDB storage engine

MySQL Cluster 7.6 and the thread pool

MySQL on Docker: Running a MariaDB Galera Cluster without Container Orchestration Tools – Part 1

MySQL Streaming Xtrabackup to Slave Recovery

A friendly comparison of InnoDB and MyRocks Performance

PostgreSQL:

By Jeremy Schneider

Hello from California!

Part of my team is here in Palo Alto and I’m visiting for a few days this week. You know, for all the remote work I’ve done over the years, I still really value this in-person, face-to-face time. These little trips from Seattle to other locations where my teammates physically sit are important to me.

This is also part of the reason I enjoy user groups and conferences so much. They’re opportunities to meet with other PostgreSQL users in real life. In fact – at this very moment – one of the most important PostgreSQL conferences in the world is happening: PgCon! Having attended a few other conferences over the past year, I’m holding down the fort in the office this week in order to send a bunch of other teammates… but you can be sure I’m keeping an eye on Twitter. :)

In the meantime, let’s get busy with the latest updates from the postgresql virtual world. First of all, I think the biggest headline is that (just in time for pgcon) we have the first official beta version of PostgreSQL 11! The release announcement headlines with major partitioning enhancements, more parallelism, a feature to speed up SQL execution by compiling certain operations on-demand into native machine code (JIT/Just-In-Time compilation), and numerous SQL enhancements. You can also read the first draft of the release notes. This is the time to start testing and give feedback to the development community!

Closely related to this, there’s one other really big headline that I’m excited about: the new AWS RDS Preview Environment. You can now try out the new pg11 features ahead of time with a single click! In part, because the development community is so awesome, the first database available in the RDS Preview Environment is PostgreSQL. And the official PostgreSQL 11 beta release is _already_ available on RDS!! Personally, I’m hoping that this benefits the community by getting more people to test and give feedback on new features being built for PostgreSQL 11. I hope it will make a great database even better.

Moving on from headlines, let’s get to the real stuff – the meaty technical articles. :)

First up, who likes testing and benchmarking? One of my favorite activities, truth be told! So I can’t quite convey just how excited I am about the preview release of Kevin Closson’s pgio testing kit. For those unfamiliar, Kevin has spent years refining his approach for testing storage through database I/O paths. Much work was done in the past with Oracle databases, and he calls his method SLOB. I’m excited to start using this kit for exploring the limits of storage through PostgreSQL I/O paths too.

Right after Kevin published that post, Franck Pachot followed up with a short article using pgio to look at the impact of the ext4 “sync” filesystem option (made relevant by the recently disclosed flaws in how PostgreSQL has been interacting with Linux’s implementation of fsync).

In addition to Kevin’s release of PGIO, I also saw three other generally fun technical articles. First, Kaarel Moppel from Cybertec published an article showing much lower-than-expected impact of pl/pgsql triggers on a simple pgbench execution. Admittedly, I want to poke around at this myself, having seen a few situations where the impact seemed higher. Great article – and it certainly got some circulation on Twitter.

Next, Sebastian Insausti has published an article explaining PostgreSQL streaming replication. What I appreciate the most about this article is how Sebastian walks through the history of how streaming replication was developed. That context is so important and helpful!

Finally, the requisite Vacuum post. :) This month we’ve got a nice technical article from Sourabh Ghorpade on the Gojek engineering team. Great high-level introduction to vacuuming in general, and a good story about how their team narrowly averted an “xid wraparound” crisis.

Categories: Web Technologies

Node restart improvements in MySQL Cluster 7.6

Planet MySQL - Wed, 06/13/2018 - 06:28
A simple test to see how much faster restarts are in 7.6 compared to 7.5 is to
load a set of DBT2 warehouses and next perform a node restart of one of the
data nodes.

In the test we load 600 warehouses, which means about 60 GByte of data inserted
into NDB in each data node.

A node restart of 7.5.10 using this data set takes 19 minutes and 11 seconds.
In 7.6.6 this restart takes 6 minutes and 49 seconds.

The difference can vary, especially since the restart time in 7.5 has a fairly high
variance. The restart time in 7.6.6 has much smaller variance.

We will look into the different restart phases and see how those are affected.

The first phase is to allocate the memory and ensure that the memory is allocated
to our process by touching the memory. This phase roughly allocates 2.7 GByte
per second. In this case it takes 26 seconds in both 7.5 and in 7.6.

After that we have some preparatory steps; these are normally very quick and in
this case take about 1-2 seconds. These steps involve preparing the metadata to use
in the restart.

The next step is the restore phase where we recover data from a local checkpoint.
Given the introduction of Partial LCP in 7.6 this phase actually takes longer time
since there is some duplication in the restore where one row is restored multiple
times. This phase will go faster through setting RecoveryWork to a smaller
number, but this will instead increase work on checkpointing during normal
operation.
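As a hedged illustration only (the value shown is a placeholder, not a recommendation), RecoveryWork is a data node parameter set in config.ini:

[ndbd default]
# A smaller RecoveryWork means less duplicated data to restore at restart,
# at the price of more checkpoint work during normal operation
RecoveryWork=40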

In this test 7.5 took 1 minute and 51 seconds to restore and 7.6.6 used 2 minutes
and 15 seconds.

The next phase is the REDO log execution phase. Since partial LCP means that
LCPs are shorter, this phase is shortened. This particular benchmark actually
doesn't shorten as much as could be the case since there is heavy insert activity
right before the restart. But it still decreased from 51 seconds in 7.5 to 35 seconds
in 7.6.

After this we execute the UNDO log; this particular benchmark has no disk data,
so both 7.5 and 7.6 take no time in this phase. But this phase has been improved
by 5 times with 4 LDM threads, and this benchmark uses 8 LDM threads, so the
phase would be dramatically faster in 7.6 if disk data were used.

The next phase is the rebuilding of the ordered indexes. This phase executes the
same code as in 7.5, but we have changed default configuration to ensure that
the rebuild use fully parallelised rebuild. This means that we have 16 CPUs
working on the rebuild instead of 8 CPUs as in 7.5. This gives a rebuild time
of 1 minute and 17 seconds in 7.6.6 compared to 2 minutes and 4 seconds. The
reason it isn't twice as fast is that we make use of hyperthreading to speed things up.

The copy fragment phase is more or less empty in both 7.5 and 7.6 since we
didn't perform any updates during the restart. We don't expect any major
differences in this phase between 7.5 and 7.6.

Next we come to biggest gain in restart times in 7.6. This is the phase where
we wait for the local checkpoint to complete. In 7.5 we have to wait between
1 and 2 checkpoint times. In 7.5 in this benchmark the checkpoint takes about
11 minutes although we have increased the disk write speed compared to the
default. In this execution 7.5 took 13 minutes and 48 seconds in this phase.

In 7.6 we execute a checkpoint that is local to the starting node. This takes
2 minutes and 4 seconds. Finally we participate in a checkpoint in which all
nodes take part. This is now extremely fast, taking only 4 seconds since
no activity is ongoing. So the total time for this phase is 2 minutes and 8 seconds.

This wait-for-checkpoint phase is fairly constant in time in 7.6; in 7.5 it grows
with a growing database size and depends on the disk write speed we have
configured. Thus the gain in restart time in 7.6 is a bit variable, but this
experiment has been done with fairly conservative numbers.

The final restart phase is handing over the responsibility of event handling
between connected MySQL servers, this phase takes 5-6 seconds in both
7.5 and 7.6. It can take longer if some MySQL Server is down during the
restart.

As can be seen, we have significantly brought down the restart times compared to 7.5
for a commonly used database size; with larger sizes the difference is much bigger.
Categories: Web Technologies

ChatOps - Managing MySQL, MongoDB & PostgreSQL from Slack

Planet MySQL - Wed, 06/13/2018 - 06:04
What is ChatOps?

Nowadays, we make use of multiple communication channels to manage or receive information from our systems, such as email, chat and other applications. If we could centralize this in one or just a few applications, and even better, integrate it with the tools we currently use in our organization, we would be able to automate processes, improve our work dynamics and communication, and have a clearer picture of the current state of our system. In many companies, Slack or other collaboration tools are becoming the centre and the heart of the development and ops teams.

What is ChatBot?

A chatbot is a program that simulates a conversation, receiving input from the user and returning answers based on its programming.

Some products developed with this technology allow us to perform administrative tasks, or keep the team up to date on the current status of the systems.

This allows us, among other things, to integrate the communication tools we use daily with our systems.

CCBot - ClusterControl

CCBot is a chatbot that uses the ClusterControl APIs to manage and monitor your database clusters. You will be able to deploy new clusters or replication setups, keep your team up to date on the status of the databases as well as the status of any administrative jobs (e.g., backups or rolling upgrades). You can also restart failed nodes, add new ones, promote a slave to master, add load balancers, and so on. CCBot supports most of the major chat services like Slack, Flowdock and Hipchat.

CCBot is integrated with the s9s command line, so you have several commands to use with this tool.

ClusterControl Notifications via Slack

Note that you can use Slack to handle alarms and notifications from ClusterControl. Why? A chat room is a good place to discuss incidents. Seeing an actual alarm in a Slack channel makes it easy to discuss it with the team, because all team members actually know what is being discussed and can chime in.

The main difference between CCBot and the integration of notifications via Slack is that, with CCBot, the user initiates the communication via a specific command, generating a response from the system. For notifications, ClusterControl generates an event, for example, a message about a node failure. This event is then sent to the tool that we have integrated for our notifications, for example, Slack.

You can review this post on how to configure ClusterControl in order to send notifications to Slack.

After this, we can see ClusterControl notifications in our Slack:

ClusterControl Slack Integration

CCBot Installation

To install CCBot, once we have installed ClusterControl, we must execute the following script:

$ /var/www/html/clustercontrol/app/tools/install-ccbot.sh

We select which adapter we want to use; in this blog, we will select Slack.

-- Supported Hubot Adapters --
1. slack
2. hipchat
3. flowdock
Select the hubot adapter to install [1-3]: 1

It will then ask us for some information, such as an email, a description, the name we will give to our bot, the port, the API token and the channel to which we want to add it.

? Owner (User <user@example.com>)
? Description (A simple helpful robot for your Company)
Enter your bot's name (ccbot):
Enter hubot's http events listening port (8081):
Enter your slack API token:
Enter your slack message room (general):

To obtain the API token, we must go to our Slack -> Apps (on the left side of our Slack window), look for Hubot, and select Install.

CCBot Hubot

We enter the Username, which must match our bot name.

In the next window, we can see the API token to use.

CCBot API Token
Enter your slack API token: xoxb-111111111111-XXXXXXXXXXXXXXXXXXXXXXXX
CCBot installation completed!

Finally, to be able to use all the s9s command line functions with CCBot, we must create a user from ClusterControl:

$ s9s user --create --cmon-user=cmon --group=admins --controller="https://localhost:9501" --generate-key cmon

For further information about how to manage users, please check the official documentation.

We can now use our CCBot from Slack.

Here we have some examples of commands:

$ s9s --help
CCBot Help

With this command we can see the help for the s9s CLI.

$ s9s cluster --list --long
CCBot Cluster List

With this command we can see a list of our clusters.

$ s9s cluster --cluster-id=17 --stat
CCBot Cluster Stat

With this command we can see the stats of one cluster, in this case cluster id 17.

$ s9s node --list --long
CCBot Node List

With this command we can see a list of our nodes.

$ s9s job --list
CCBot Job List

With this command we can see a list of our jobs.

$ s9s backup --create --backup-method=mysqldump --cluster-id=16 --nodes=192.168.100.34:3306 --backup-directory=/backup
CCBot Backup

With this command we can create a backup with mysqldump, in the node 192.168.100.34. The backup will be saved in the /backup directory.


Now let's see some more complex examples:

$ s9s cluster --create --cluster-type=mysqlreplication --nodes="mysql1;mysql2" --vendor="percona" --provider-version="5.7" --template="my.cnf.repl57" --db-admin="root" --db-admin-passwd="root123" --os-user="root" --cluster-name="MySQL1"
CCBot Create Replication

With this command we can create a MySQL Master-Slave Replication with Percona for MySQL 5.7 version.

CCBot Check Replication Created

And we can check this new cluster.

In ClusterControl Topology View, we can check our current topology with one master and one slave node.

Topology View Replication 1
$ s9s cluster --add-node --nodes=mysql3 --cluster-id=24
CCBot Add Node

With this command we can add a new slave in our current cluster.

Topology View Replication 2

And we can check our new topology in ClusterControl Topology View.

$ s9s cluster --add-node --cluster-id=24 --nodes="proxysql://proxysql"
CCBot Add ProxySQL

With this command we can add a new ProxySQL node named "proxysql" in our current cluster.

Topology View Replication 3

And we can check our new topology in ClusterControl Topology View.

You can check the list of available commands in the documentation.
If we try to use CCBot from a Slack channel, we must add "@ccbot_name" at the beginning of our command:

@ccbot s9s backup --create --backup-method=xtrabackupfull --cluster-id=1 --nodes=10.0.0.5:3306 --backup-directory=/storage/backups

CCBot makes it easier for teams to manage their clusters in a collaborative way. It is fully integrated with the tools they use on a daily basis.

Note

If we get the following error when trying to run the CCBot installer in our ClusterControl:

-bash: yo: command not found

We must update the version of the nodejs package.

Conclusion

As we said previously, there are several ChatBot alternatives for different purposes, and we can even create our own ChatBot. But while this technology facilitates our tasks and has the advantages we mentioned at the beginning of this blog, not all that glitters is gold.

There is a very important detail to keep in mind - security. We must be very careful when using these tools, and take all the necessary precautions to control what we allow them to do, in what way, at what moment, for whom, and from where.

Tags:  chatops chatbot ccbot slack MySQL MongoDB PostgreSQL
Categories: Web Technologies

Porting Data From MongoDB to MySQL Document Store in TWO Easy Steps

Planet MySQL - Tue, 06/12/2018 - 15:29
Porting data from MongoDB to the MySQL Document Store is very easy.  The example I will use is a data set from the good folks at Mongo named zips.json, which contains a list of US Postal Codes and can be found at http://media.mongodb.org/zips.json for your downloading pleasure.

I copied the file into the Downloads directory on my Ubuntu laptop and then fired up the new MySQL Shell.  After login, I created a new schema creatively named zips with session.createSchema('zips').  We then set the db object to this new schema with the command \use zips.

Creating a new schema named 'zips' and then informing the system that I wish to use this new schema as the db object
Now it is time to populate the schema with a collection for holding documents. The collection is named zip and is created with db.createCollection('zip'). The next step is to read the zips.json file into the collection using Python.

We need to create a new collection named zip in the schema we just created and then switch to Python mode to read in the data line by line and store it as a document in the zip collection.

You might want to go back and read the wonderful presentation and scripts by Giuseppe Maxia on loading MongoDB data into MySQL at https://github.com/datacharmer/mysql-document-store as he originated this very useful bit of code.
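As a rough idea of what such a loader looks like, here is a minimal sketch in the Shell's Python mode -- modeled on Giuseppe's approach rather than copied from it, and with the file path as a placeholder:

# (in mysqlsh, switch to Python mode first with \py)
import json
col = db.get_collection('zip')                     # the zips schema is the active db
with open('/home/user/Downloads/zips.json') as f:  # one JSON document per line
    for line in f:
        col.add(json.loads(line)).execute()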

One thing that helps is that this data set has an _id field that the MySQL Document Store will grab and use as the InnoDB primary key.  I will have to brush up on my Python to extract another value from other data sets to use for the _id field.


And we can now perform NoSQL searches on the data.

A NoSQL search of the data in the zip collection
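The screenshot above shows the NoSQL lookup; reconstructed as a hedged sketch, it is essentially:

db.zip.find("_id = '01010'")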
Or SQL. First we have to change to SQL mode with \sql and then search for the same record with SELECT * FROM zip WHERE _id='01010';

A SQL search of the data in the zip table


If you have questions about porting data from MongoDB into the MySQL Document Store or the Document Store in general, please drop me a line.




Categories: Web Technologies

Shinguz: Select Hello World FromDual with MariaDB PL/SQL

Planet MySQL - Tue, 06/12/2018 - 14:36

MariaDB 10.3 was released GA a few weeks ago. One of the features which interests me most is the MariaDB Oracle PL/SQL compatibility mode.

So it's time to try it out now...

Enabling Oracle PL/SQL in MariaDB

Oracle PL/SQL syntax is quite different from the old MySQL/MariaDB SQL/PSM syntax, so the old MariaDB parser would throw errors without modification. The modified MariaDB PL/SQL parser is activated by changing the sql_mode as follows:

mariadb> SET SESSION sql_mode=ORACLE;

or you can make this setting persistent in your my.cnf MariaDB configuration file:

[mysqld]
sql_mode = ORACLE

To verify if the sql_mode is already set you can use the following statement:

mariadb> pager grep --color -i oracle
PAGER set to 'grep --color -i oracle'
mariadb> SELECT @@sql_mode;
| PIPES_AS_CONCAT,ANSI_QUOTES,IGNORE_SPACE,ORACLE,NO_KEY_OPTIONS,NO_TABLE_OPTIONS,NO_FIELD_OPTIONS,NO_AUTO_CREATE_USER,SIMULTANEOUS_ASSIGNMENT |
mariadb> nopager
Nomen est omen

First of all I tried the functionality of the most basic and fundamental table in Oracle, the DUAL table:

mariadb> SELECT * FROM dual;
ERROR 1096 (HY000): No tables used

Sad. :-( But this query on the dual table seems to work:

mariadb> SELECT 'Hello World!' FROM dual;
+--------------+
| Hello World! |
+--------------+
| Hello World! |
+--------------+

The second result looks much better. The first query should work as well but does not. We opened a bug at MariaDB without much hope that this bug will be fixed soon...

To get more info on why MariaDB behaves like this, I investigated a bit more:

mariadb> SELECT table_schema, table_name FROM information_schema.tables WHERE table_name = 'dual';
Empty set (0.001 sec)

Hmmm. It seems not to be implemented as a real table... But normal usage of this table seems to work:

mariadb> SELECT CURRENT_TIMESTAMP() FROM dual;
+---------------------+
| current_timestamp() |
+---------------------+
| 2018-06-07 15:32:11 |
+---------------------+

If your code relies heavily on the dual table, you can create it yourself. It is defined as follows:

"The DUAL table has one column, DUMMY, defined to be VARCHAR2(1), and contains one row with a value X."

If you want to create the dual table yourself here is the statement:

mariadb> CREATE TABLE `DUAL` (DUMMY VARCHAR2(1)); mariadb> INSERT INTO `DUAL` (DUMMY) VALUES ('X');
Anonymous PL/SQL block in MariaDB

To try out some PL/SQL features or to run a sequence of PL/SQL commands you can use anonymous blocks. Unfortunately the MySQL SQL/PSM-style delimiter still seems to be necessary.

It is recommended to use the DELIMITER /, then most of the Oracle examples will work straight out of the box...

DELIMITER / BEGIN SELECT 'Hello world from MariaDB anonymous PL/SQL block!'; END; / DELIMITER ; +--------------------------------------------------+ | Hello world from MariaDB anonymous PL/SQL block! | +--------------------------------------------------+ | Hello world from MariaDB anonymous PL/SQL block! | +--------------------------------------------------+
A simple PL/SQL style MariaDB Procedure

DELIMITER / CREATE OR REPLACE PROCEDURE hello AS BEGIN DECLARE vString VARCHAR2(255) := NULL; BEGIN SELECT 'Hello world from MariaDB PL/SQL Procedure!' INTO vString FROM dual; SELECT vString; END; END hello; / BEGIN hello(); END; / DELIMITER ;
A simple PL/SQL style MariaDB Function

DELIMITER / CREATE OR REPLACE FUNCTION hello RETURN VARCHAR2 DETERMINISTIC AS BEGIN DECLARE vString VARCHAR2(255) := NULL; BEGIN SELECT 'Hello world from MariaDB PL/SQL Function!' INTO vString FROM dual; RETURN vString; END; END hello; / DECLARE vString VARCHAR(255) := NULL; BEGIN vString := hello(); SELECT vString; END; / DELIMITER ;
A PL/SQL package in MariaDB

Up to here there is nothing really new, just slightly different. But now let us try a PL/SQL package in MariaDB:

DELIMITER / CREATE OR REPLACE PACKAGE hello AS -- must be declared as public! PROCEDURE helloWorldProcedure(pString VARCHAR2); FUNCTION helloWorldFunction(pString VARCHAR2) RETURN VARCHAR2; END hello; / CREATE OR REPLACE PACKAGE BODY hello AS vString VARCHAR2(255) := NULL; -- was declared public in PACKAGE PROCEDURE helloWorldProcedure(pString VARCHAR2) AS BEGIN SELECT 'Hello world from MariaDB Package Procedure in ' || pString || '!' INTO vString FROM dual; SELECT vString; END; -- was declared public in PACKAGE FUNCTION helloWorldFunction(pString VARCHAR2) RETURN VARCHAR2 AS BEGIN SELECT 'Hello world from MariaDB Package Function in ' || pString || '!' INTO vString FROM dual; return vString; END; BEGIN SELECT 'Package initialiser, called only once per connection!'; END hello; / DECLARE vString VARCHAR2(255) := NULL; -- CONSTANT seems to be not supported yet by MariaDB -- cString CONSTANT VARCHAR2(255) := 'anonymous block'; cString VARCHAR2(255) := 'anonymous block'; BEGIN CALL hello.helloWorldProcedure(cString); SELECT hello.helloWorldFunction(cString) INTO vString; SELECT vString; END; / DELIMITER ;
DBMS_OUTPUT package for MariaDB

An Oracle database contains over 200 PL/SQL packages. One of the most commonly used ones is the DBMS_OUTPUT package. In this package we can find the Procedure PUT_LINE.

This package/function has not been implemented by MariaDB yet, so we have to do it ourselves:

DELIMITER / CREATE OR REPLACE PACKAGE DBMS_OUTPUT AS PROCEDURE PUT_LINE(pString IN VARCHAR2); END DBMS_OUTPUT; / CREATE OR REPLACE PACKAGE BODY DBMS_OUTPUT AS PROCEDURE PUT_LINE(pString IN VARCHAR2) AS BEGIN SELECT pString; END; END DBMS_OUTPUT; / BEGIN DBMS_OUTPUT.PUT_LINE('Hello world from MariaDB DBMS_OUTPUT.PUT_LINE!'); END; / DELIMITER ;

The other Functions and Procedures have to be implemented later over time...

Now we can try to do all examples from Oracle sources!

Taxonomy upgrade extras:  mariadb pl/sql package procedure function Oracle
Categories: Web Technologies

MariaDB Audit Plugin

Planet MySQL - Tue, 06/12/2018 - 10:31

MariaDB DBAs are responsible for auditing database infrastructure operations so they can troubleshoot performance and operational issues proactively. The MariaDB Audit Plugin can audit the database operations of both MariaDB and MySQL. It is provided as a dynamic library: server_audit.so (server_audit.dll for Windows). The plugin must be located in the plugin directory, the directory containing all plugin libraries for MariaDB.

MariaDB [(none)]> select @@plugin_dir; +--------------------------+ | @@plugin_dir | +--------------------------+ | /usr/lib64/mysql/plugin/ | +--------------------------+ 1 row in set (0.000 sec)

There are two ways you can install MariaDB Audit Plugin:

Run the INSTALL SONAME statement while logged into MariaDB. You need to use an administrative account that has the INSERT privilege on the mysql.plugin table.

MariaDB [(none)]> INSTALL SONAME 'server_audit'; Query OK, 0 rows affected (0.012 sec) MariaDB [(none)]>

Load Plugin at Start-Up 

The plugin can also be loaded at server start-up by setting the plugin_load option in my.cnf (my.ini on Windows):

[mysqld] # # include all files from the config directory # !includedir /etc/my.cnf.d plugin_load=server_audit=server_audit.so

System variables to configure MariaDB Audit Plugin

The MariaDB Audit Plugin is highly configurable. The system variables available for it are listed below:

MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE '%server_audit%'; +-------------------------------+-----------------------+ | Variable_name | Value | +-------------------------------+-----------------------+ | server_audit_events | | | server_audit_excl_users | | | server_audit_file_path | server_audit.log | | server_audit_file_rotate_now | OFF | | server_audit_file_rotate_size | 1000000 | | server_audit_file_rotations | 9 | | server_audit_incl_users | | | server_audit_logging | OFF | | server_audit_mode | 0 | | server_audit_output_type | file | | server_audit_query_log_limit | 1024 | | server_audit_syslog_facility | LOG_USER | | server_audit_syslog_ident | mysql-server_auditing | | server_audit_syslog_info | | | server_audit_syslog_priority | LOG_INFO | +-------------------------------+-----------------------+ 15 rows in set (0.001 sec)

Configure the server_audit_events system variable to choose which event types to audit:

MariaDB [(none)]> SET GLOBAL server_audit_events = 'CONNECT,QUERY,TABLE'; Query OK, 0 rows affected (0.008 sec)
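
You can also narrow the audit scope with the user filter variables shown in the list above. A minimal sketch; the account names here (app_user, monitor) are placeholders, not accounts defined earlier:

MariaDB [(none)]> SET GLOBAL server_audit_incl_users = 'app_user';
MariaDB [(none)]> SET GLOBAL server_audit_excl_users = 'monitor';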

Enable MariaDB Audit Plugin

MariaDB [(none)]> set global server_audit_logging=on; Query OK, 0 rows affected (0.007 sec)

By default the MariaDB Audit Plugin writes the audit log to the file server_audit.log in the data directory, /var/lib/mysql/ in this case.
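
Log file size and rotation can be tuned with the server_audit_file_* variables listed earlier. A minimal sketch with illustrative values only; pick limits that suit your retention needs:

MariaDB [(none)]> SET GLOBAL server_audit_file_rotate_size = 5000000;
MariaDB [(none)]> SET GLOBAL server_audit_file_rotations = 20;
MariaDB [(none)]> SET GLOBAL server_audit_file_rotate_now = ON;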

Testing MariaDB Audit Plugin 

MariaDB [employees]> update employees -> set last_name='Gupta' -> where emp_no= 499999; Query OK, 1 row affected (0.010 sec) Rows matched: 1 Changed: 1 Warnings: 0

[root@localhost mysql]# tail -f server_audit.log 20180612 20:32:07,localhost.localdomain,root,localhost,16,433,QUERY,,'SHOW GLOBAL VARIABLES LIKE \'%server_audit%\'',0 20180612 20:32:26,localhost.localdomain,root,localhost,16,434,QUERY,,'update employees set last_name=\'Gupta\' where emp_no= 499999',1046 20180612 20:32:37,localhost.localdomain,root,localhost,16,435,QUERY,,'SELECT DATABASE()',0 20180612 20:32:37,localhost.localdomain,root,localhost,16,437,QUERY,employees,'show databases',0 20180612 20:32:37,localhost.localdomain,root,localhost,16,438,QUERY,employees,'show tables',0 20180612 20:32:41,localhost.localdomain,root,localhost,16,447,WRITE,employees,employees, 20180612 20:32:41,localhost.localdomain,root,localhost,16,447,READ,employees,dept_emp, 20180612 20:32:41,localhost.localdomain,root,localhost,16,447,READ,employees,dept_manager, 20180612 20:32:41,localhost.localdomain,root,localhost,16,447,QUERY,employees,'update employees set last_name=\'Gupta\' where emp_no= 499999',0

How can we block UNINSTALL PLUGIN?

The UNINSTALL SONAME (or UNINSTALL PLUGIN) statement can be used to remove the plugin, but you can disable this by adding the following line to my.cnf after the plugin has been loaded once:

[mysqld] # # include all files from the config directory # !includedir /etc/my.cnf.d plugin_load=server_audit=server_audit.so server_audit=FORCE_PLUS_PERMANENT

 

The post MariaDB Audit Plugin appeared first on MySQL Consulting, Support and Remote DBA Services By MinervaDB.

Categories: Web Technologies

Build a Health Tracking App with React, GraphQL, and User Authentication

Planet MySQL - Tue, 06/12/2018 - 10:27

I think you’ll like the story I’m about to tell you. I’m going to show you how to build a GraphQL API with Vesper framework, TypeORM, and MySQL. These are Node frameworks, and I’ll use TypeScript for the language. For the client, I’ll use React, reactstrap, and Apollo Client to talk to the API. Once you have this environment working, and you add secure user authentication, I believe you’ll love the experience!

Why focus on secure authentication? Well, aside from the fact that I work for Okta, I think we can all agree that pretty much every application depends upon a secure identity management system. For most developers who are building React apps, there’s a decision to be made between rolling your own authentication/authorization or plugging in a service like Okta. Before I dive into building a React app, I want to tell you a bit about Okta, and why I think it’s an excellent solution for all JavaScript developers.

What is Okta?

In short, we make identity management a lot easier, more secure, and more scalable than what you’re used to. Okta is a cloud service that allows developers to create, edit, and securely store user accounts and user account data, and connect them with one or multiple applications. Our API enables you to:

Are you sold? Register for a forever-free developer account, and when you’re done, come on back so we can learn more about building secure apps in React!

Why a Health Tracking App?

In late September through mid-October 2014, I'd done a 21-Day Sugar Detox during which I stopped eating sugar, started exercising regularly, and stopped drinking alcohol. I'd had high blood pressure for over ten years and was on blood-pressure medication at the time. During the first week of the detox, I ran out of blood-pressure medication. Since a new prescription required a doctor visit, I decided I'd wait until after the detox to get it. After three weeks, not only did I lose 15 pounds, but my blood pressure was at normal levels!

Before I started the detox, I came up with a 21-point system to see how healthy I was each week. Its rules were simple: you can earn up to three points per day for the following reasons:

  1. If you eat healthy, you get a point. Otherwise, zero.
  2. If you exercise, you get a point.
  3. If you don't drink alcohol, you get a point.

I was surprised to find I got eight points the first week I used this system. During the detox, I got 16 points the first week, 20 the second, and 21 the third. Before the detox, I thought eating healthy meant eating anything except fast food. After the detox, I realized that eating healthy for me meant eating no sugar. I'm also a big lover of craft beer, so I modified the alcohol rule to allow two healthier alcohol drinks (like a greyhound or red wine) per day.

My goal is to earn 15 points per week. I find that if I get more, I'll likely lose weight and have good blood pressure. If I get fewer than 15, I risk getting sick. I've been tracking my health like this since September 2014. I've lost weight, and my blood pressure has returned to and maintained normal levels. I haven't had good blood pressure since my early 20s, so this has been a life changer for me.

I built 21-Points Health to track my health. I figured it'd be fun to recreate a small slice of that app, just tracking daily points.

Building an API with TypeORM, GraphQL, and Vesper

TypeORM is a nifty ORM (object-relational mapper) framework that can run in most JavaScript platforms, including Node, a browser, Cordova, React Native, and Electron. It’s heavily influenced by Hibernate, Doctrine, and Entity Framework. Install TypeORM globally to begin creating your API.

npm i -g typeorm@0.2.7

Create a directory to hold the React client and GraphQL API.

mkdir health-tracker cd health-tracker

Create a new project with MySQL using the following command:

typeorm init --name graphql-api --database mysql

Edit graphql-api/ormconfig.json to customize the username, password, and database.

{ ... "username": "health", "password": "pointstest", "database": "healthpoints", ... }

TIP: To see the queries being executed against MySQL, change the "logging" value in this file to be "all". Many other logging options are available too.

Install MySQL

Install MySQL if you don’t already have it installed. On Ubuntu, you can use sudo apt-get install mysql-server. On macOS, you can use Homebrew and brew install mysql. For Windows, you can use the MySQL Installer.

Once you’ve got MySQL installed and configured with a root password, login and create a healthpoints database.

mysql -u root -p create database healthpoints; use healthpoints; grant all privileges on *.* to 'health'@'localhost' identified by 'points';

Navigate to your graphql-api project in a terminal window, install the project’s dependencies, then start it to ensure you can connect to MySQL.

cd graphql-api npm i npm start

You should see the following output:

Inserting a new user into the database... Saved a new user with id: 1 Loading users from the database... Loaded users: [ User { id: 1, firstName: 'Timber', lastName: 'Saw', age: 25 } ] Here you can setup and run express/koa/any other framework.

Install Vesper to Integrate TypeORM and GraphQL

Vesper is a Node framework that integrates TypeORM and GraphQL. To install it, use good ol' npm.

npm i vesper@0.1.9

Now it's time to create some GraphQL models (that define what your data looks like) and some controllers (that explain how to interact with your data).

Create graphql-api/src/schema/model/Points.graphql:

type Points { id: Int date: Date exercise: Int diet: Int alcohol: Int notes: String user: User }

Create graphql-api/src/schema/model/User.graphql:

type User { id: String firstName: String lastName: String points: [Points] }

Next, create a graphql-api/src/schema/controller/PointsController.graphql with queries and mutations:

type Query { points: [Points] pointsGet(id: Int): Points users: [User] } type Mutation { pointsSave(id: Int, date: Date, exercise: Int, diet: Int, alcohol: Int, notes: String): Points pointsDelete(id: Int): Boolean }

Now that your data has GraphQL metadata create entities that will be managed by TypeORM. Change src/entity/User.ts to have the following code that allows points to be associated with a user.

import { Column, Entity, OneToMany, PrimaryColumn } from 'typeorm'; import { Points } from './Points'; @Entity() export class User { @PrimaryColumn() id: string; @Column() firstName: string; @Column() lastName: string; @OneToMany(() => Points, points => points.user) points: Points[]; }

In the same src/entity directory, create a Points.ts class with the following code.

import { Entity, PrimaryGeneratedColumn, Column, ManyToOne } from 'typeorm'; import { User } from './User'; @Entity() export class Points { @PrimaryGeneratedColumn() id: number; @Column({ type: 'timestamp', default: () => 'CURRENT_TIMESTAMP'}) date: Date; @Column() exercise: number; @Column() diet: number; @Column() alcohol: number; @Column() notes: string; @ManyToOne(() => User, user => user.points, { cascade: ["insert"] }) user: User|null; }

Note the cascade: ["insert"] option on the @ManyToOne annotation above. This option will automatically insert a user if it's present on the entity. Create src/controller/PointsController.ts to handle converting the data from your GraphQL queries and mutations.

import { Controller, Mutation, Query } from 'vesper'; import { EntityManager } from 'typeorm'; import { Points } from '../entity/Points'; @Controller() export class PointsController { constructor(private entityManager: EntityManager) { } // serves "points: [Points]" requests @Query() points() { return this.entityManager.find(Points); } // serves "pointsGet(id: Int): Points" requests @Query() pointsGet({id}) { return this.entityManager.findOne(Points, id); } // serves "pointsSave(id: Int, date: Date, exercise: Int, diet: Int, alcohol: Int, notes: String): Points" requests @Mutation() pointsSave(args) { const points = this.entityManager.create(Points, args); return this.entityManager.save(Points, points); } // serves "pointsDelete(id: Int): Boolean" requests @Mutation() async pointsDelete({id}) { await this.entityManager.remove(Points, {id: id}); return true; } }

Change src/index.ts to use Vesper's bootstrap() to configure everything.

import { bootstrap } from 'vesper'; import { PointsController } from './controller/PointsController'; import { Points } from './entity/Points'; import { User } from './entity/User'; bootstrap({ port: 4000, controllers: [ PointsController ], entities: [ Points, User ], schemas: [ __dirname + '/schema/**/*.graphql' ], cors: true }).then(() => { console.log('Your app is up and running on http://localhost:4000. ' + 'You can use playground in development mode on http://localhost:4000/playground'); }).catch(error => { console.error(error.stack ? error.stack : error); });

This code tells Vesper to register controllers, entities, GraphQL schemas, to run on port 4000, and to enable CORS (cross-origin resource sharing).

Start your API using npm start and navigate to http://localhost:4000/playground. In the left pane, enter the following mutation and press the play button. You might try typing the code below so you can experience the code completion that GraphQL provides you.

mutation { pointsSave(exercise:1, diet:1, alcohol:1, notes:"Hello World") { id date exercise diet alcohol notes } }

Your result should look similar to mine.

You can click the "SCHEMA" tab on the right to see the available queries and mutations. Pretty slick, eh?!

You can use the following points query to verify that data is in your database.

query { points {id date exercise diet notes} }

Fix Dates

You might notice that the date returned from pointsSave and the points query is in a format that might be difficult for a JavaScript client to understand. You can fix that by installing graphql-iso-date.

npm i graphql-iso-date@3.5.0

Then, add an import in src/index.ts and configure custom resolvers for the various date types. This example only uses Date, but it's helpful to know the other options.

import { GraphQLDate, GraphQLDateTime, GraphQLTime } from 'graphql-iso-date'; bootstrap({ ... // https://github.com/vesper-framework/vesper/issues/4 customResolvers: { Date: GraphQLDate, Time: GraphQLTime, DateTime: GraphQLDateTime }, ... });

Now running the points query will return a more client-friendly result.

{ "data": { "points": [ { "id": 1, "date": "2018-06-04", "exercise": 1, "diet": 1, "notes": "Hello World" } ] } }

You've written an API with GraphQL and TypeScript in about 20 minutes. How cool is that?! There's still work to do though. In the next sections, you'll create a React client for this API and add authentication with OIDC. Adding authentication will give you the ability to get the user's information and associate a user with their points.

Get Started with React

One of the quickest ways to get started with React is to use Create React App. Install the latest release using the command below.

npm i -g create-react-app@1.1.4

Navigate to the directory where you created your GraphQL API and create a React client.

cd health-tracker create-react-app react-client

Install the dependencies you'll need to integrate Apollo Client with React, as well as Bootstrap and reactstrap.

npm i apollo-boost@0.1.7 react-apollo@2.1.4 graphql-tag@2.9.2 graphql@0.13.2

Configure Apollo Client for Your API

Open react-client/src/App.js and import ApolloClient from apollo-boost and add the endpoint to your GraphQL API.

import ApolloClient from 'apollo-boost'; const client = new ApolloClient({ uri: "http://localhost:4000/graphql" });

That's it! With only three lines of code, your app is ready to start fetching data. You can prove it by importing the gql function from graphql-tag. This will parse your query string and turn it into a query document.

import gql from 'graphql-tag'; class App extends Component { componentDidMount() { client.query({ query: gql` { points { id date exercise diet alcohol notes } } ` }) .then(result => console.log(result)); } ... }

Make sure to open your browser's developer tools so you can see the data after making this change. You could modify the console.log() to use this.setState({points: results.data.points}), but then you'd have to initialize the default state in the constructor. But there's an easier way: you can use ApolloProvider and Query components from react-apollo!

Below is a modified version of react-client/src/App.js that uses these components.

import React, { Component } from 'react'; import logo from './logo.svg'; import './App.css'; import ApolloClient from 'apollo-boost'; import gql from 'graphql-tag'; import { ApolloProvider, Query } from 'react-apollo'; const client = new ApolloClient({ uri: "http://localhost:4000/graphql" }); class App extends Component { render() { return ( <ApolloProvider client={client}> <div className="App"> <header className="App-header"> <img src={logo} className="App-logo" alt="logo" /> <h1 className="App-title">Welcome to React</h1> </header> <p className="App-intro"> To get started, edit <code>src/App.js</code> and save to reload. </p> <Query query={gql` { points {id date exercise diet alcohol notes} } `}> {({loading, error, data}) => { if (loading) return <p>Loading...</p>; if (error) return <p>Error: {error}</p>; return data.points.map(p => { return <div key={p.id}> <p>Date: {p.date}</p> <p>Points: {p.exercise + p.diet + p.alcohol}</p> <p>Notes: {p.notes}</p> </div> }) }} </Query> </div> </ApolloProvider> ); } } export default App;

You've built a GraphQL API and a React UI that talks to it - excellent work! However, there's still more to do. In the next sections, I'll show you how to add authentication to React, verify JWTs with Vesper, and add CRUD functionality to the UI. CRUD functionality already exists in the API thanks to the mutations you wrote earlier.

Add Authentication for React with OpenID Connect

You'll need to configure React to use Okta for authentication. You'll need to create an OIDC app in Okta for that.

Log in to your Okta Developer account (or sign up if you don’t have an account) and navigate to Applications > Add Application. Click Single-Page App, click Next, and give the app a name you’ll remember. Change all instances of localhost:8080 to localhost:3000 and click Done. Your settings should be similar to the screenshot below.

Okta's React SDK allows you to integrate OIDC into a React application. To install, run the following commands:

npm i @okta/okta-react@1.0.2 react-router-dom@4.2.2

Okta's React SDK depends on react-router, hence the reason for installing react-router-dom. Configuring routing in react-client/src/App.js is a common practice, so replace its code with the JavaScript below that sets up authentication with Okta.

import React, { Component } from 'react'; import { BrowserRouter as Router, Route } from 'react-router-dom'; import { ImplicitCallback, SecureRoute, Security } from '@okta/okta-react'; import Home from './Home'; import Login from './Login'; import Points from './Points'; function onAuthRequired({history}) { history.push('/login'); } class App extends Component { render() { return ( <Router> <Security issuer='https://{yourOktaDomain}.com/oauth2/default' client_id='{yourClientId}' redirect_uri={window.location.origin + '/implicit/callback'} onAuthRequired={onAuthRequired}> <Route path='/' exact={true} component={Home}/> <SecureRoute path='/points' component={Points}/> <Route path='/login' render={() => <Login baseUrl='https://{yourOktaDomain}.com'/>}/> <Route path='/implicit/callback' component={ImplicitCallback}/> </Security> </Router> ); } } export default App;

Make sure to replace {yourOktaDomain} and {yourClientId} in the code above. Your Okta domain should be something like dev-12345.oktapreview. Make sure you don't end up with two .com values in the URL!

The code in App.js references three components that don't exist yet: Home, Login, and Points. Create src/Home.js with the following code. This component renders the default route, provides a Login button, and links to your points and logout after you've logged in.

import React, { Component } from 'react'; import { withAuth } from '@okta/okta-react'; import { Button, Container } from 'reactstrap'; import AppNavbar from './AppNavbar'; import { Link } from 'react-router-dom'; export default withAuth(class Home extends Component { constructor(props) { super(props); this.state = {authenticated: null, userinfo: null, isOpen: false}; this.checkAuthentication = this.checkAuthentication.bind(this); this.checkAuthentication(); this.login = this.login.bind(this); this.logout = this.logout.bind(this); } async checkAuthentication() { const authenticated = await this.props.auth.isAuthenticated(); if (authenticated !== this.state.authenticated) { if (authenticated && !this.state.userinfo) { const userinfo = await this.props.auth.getUser(); this.setState({authenticated, userinfo}); } else { this.setState({authenticated}); } } } async componentDidMount() { this.checkAuthentication(); } async componentDidUpdate() { this.checkAuthentication(); } async login() { this.props.auth.login('/'); } async logout() { this.props.auth.logout('/'); this.setState({authenticated: null, userinfo: null}); } render() { if (this.state.authenticated === null) return null; const button = this.state.authenticated ? <div> <Button color="link"><Link to="/points">Manage Points</Link></Button><br/> <Button color="link" onClick={this.logout}>Logout</Button> </div>: <Button color="primary" onClick={this.login}>Login</Button>; const message = this.state.userinfo ? <p>Hello, {this.state.userinfo.given_name}!</p> : <p>Please log in to manage your points.</p>; return ( <div> <AppNavbar/> <Container fluid> {message} {button} </Container> </div> ); } });

This component uses <Container/> and <Button/> from reactstrap. Install reactstrap, so everything compiles. It depends on Bootstrap, so include it too.

npm i reactstrap@6.1.0 bootstrap@4.1.1

Add Bootstrap's CSS file as an import in src/index.js.

import 'bootstrap/dist/css/bootstrap.min.css';

You might notice there's a <AppNavbar/> in the Home component's render() method. Create src/AppNavbar.js so you can use a common header between components.

import React, { Component } from 'react'; import { Collapse, Nav, Navbar, NavbarBrand, NavbarToggler, NavItem, NavLink } from 'reactstrap'; import { Link } from 'react-router-dom'; export default class AppNavbar extends Component { constructor(props) { super(props); this.state = {isOpen: false}; this.toggle = this.toggle.bind(this); } toggle() { this.setState({ isOpen: !this.state.isOpen }); } render() { return <Navbar color="success" dark expand="md"> <NavbarBrand tag={Link} to="/">Home</NavbarBrand> <NavbarToggler onClick={this.toggle}/> <Collapse isOpen={this.state.isOpen} navbar> <Nav className="ml-auto" navbar> <NavItem> <NavLink href="https://twitter.com/oktadev">@oktadev</NavLink> </NavItem> <NavItem> <NavLink href="https://github.com/oktadeveloper/okta-react-graphql-example/">GitHub</NavLink> </NavItem> </Nav> </Collapse> </Navbar>; } }

In this example, I'm going to embed Okta's Sign-In Widget. Another option is to redirect to Okta and use a hosted login page. Install the Sign-In Widget using npm.

npm i @okta/okta-signin-widget@2.9.0

Create src/Login.js and add the following code to it.

import React, { Component } from 'react'; import { Redirect } from 'react-router-dom'; import OktaSignInWidget from './OktaSignInWidget'; import { withAuth } from '@okta/okta-react'; export default withAuth(class Login extends Component { constructor(props) { super(props); this.onSuccess = this.onSuccess.bind(this); this.onError = this.onError.bind(this); this.state = { authenticated: null }; this.checkAuthentication(); } async checkAuthentication() { const authenticated = await this.props.auth.isAuthenticated(); if (authenticated !== this.state.authenticated) { this.setState({authenticated}); } } componentDidUpdate() { this.checkAuthentication(); } onSuccess(res) { return this.props.auth.redirect({ sessionToken: res.session.token }); } onError(err) { console.log('error logging in', err); } render() { if (this.state.authenticated === null) return null; return this.state.authenticated ? <Redirect to={{pathname: '/'}}/> : <OktaSignInWidget baseUrl={this.props.baseUrl} onSuccess={this.onSuccess} onError={this.onError}/>; } });

The Login component has a reference to OktaSignInWidget. Create src/OktaSignInWidget.js:

import React, {Component} from 'react'; import ReactDOM from 'react-dom'; import OktaSignIn from '@okta/okta-signin-widget'; import '@okta/okta-signin-widget/dist/css/okta-sign-in.min.css'; import '@okta/okta-signin-widget/dist/css/okta-theme.css'; import './App.css'; export default class OktaSignInWidget extends Component { componentDidMount() { const el = ReactDOM.findDOMNode(this); this.widget = new OktaSignIn({ baseUrl: this.props.baseUrl }); this.widget.renderEl({el}, this.props.onSuccess, this.props.onError); } componentWillUnmount() { this.widget.remove(); } render() { return <div/>; } };

Create src/Points.js to render the list of points from your API.

import React, { Component } from 'react'; import { ApolloClient } from 'apollo-client'; import { createHttpLink } from 'apollo-link-http'; import { setContext } from 'apollo-link-context'; import { InMemoryCache } from 'apollo-cache-inmemory'; import gql from 'graphql-tag'; import { withAuth } from '@okta/okta-react'; import AppNavbar from './AppNavbar'; import { Alert, Button, Container, Table } from 'reactstrap'; import PointsModal from './PointsModal'; export const httpLink = createHttpLink({ uri: 'http://localhost:4000/graphql' }); export default withAuth(class Points extends Component { client; constructor(props) { super(props); this.state = {points: [], error: null}; this.refresh = this.refresh.bind(this); this.remove = this.remove.bind(this); } refresh(item) { let existing = this.state.points.filter(p => p.id === item.id); let points = [...this.state.points]; if (existing.length === 0) { points.push(item); this.setState({points}); } else { this.state.points.forEach((p, idx) => { if (p.id === item.id) { points[idx] = item; this.setState({points}); } }) } } remove(item, index) { const deletePoints = gql`mutation pointsDelete($id: Int) { pointsDelete(id: $id) }`; this.client.mutate({ mutation: deletePoints, variables: {id: item.id} }).then(result => { if (result.data.pointsDelete) { let updatedPoints = [...this.state.points].filter(i => i.id !== item.id); this.setState({points: updatedPoints}); } }); } componentDidMount() { const authLink = setContext(async (_, {headers}) => { const token = await this.props.auth.getAccessToken(); const user = await this.props.auth.getUser(); // return the headers to the context so httpLink can read them return { headers: { ...headers, authorization: token ? `Bearer ${token}` : '', 'x-forwarded-user': user ? JSON.stringify(user) : '' } } }); this.client = new ApolloClient({ link: authLink.concat(httpLink), cache: new InMemoryCache(), connectToDevTools: true }); this.client.query({ query: gql` { points { id, user { id, lastName } date, alcohol, exercise, diet, notes } }` }).then(result => { this.setState({points: result.data.points}); }).catch(error => { this.setState({error: <Alert color="danger">Failure to communicate with API.</Alert>}); }); } render() { const {points, error} = this.state; const pointsList = points.map(p => { const total = p.exercise + p.diet + p.alcohol; return <tr key={p.id}> <td style={{whiteSpace: 'nowrap'}}><PointsModal item={p} callback={this.refresh}/></td> <td className={total <= 1 ? 'text-danger' : 'text-success'}>{total}</td> <td>{p.notes}</td> <td><Button size="sm" color="danger" onClick={() => this.remove(p)}>Delete</Button></td> </tr> }); return ( <div> <AppNavbar/> <Container fluid> {error} <h3>Your Points</h3> <Table> <thead> <tr> <th width="10%">Date</th> <th width="10%">Points</th> <th>Notes</th> <th width="10%">Actions</th> </tr> </thead> <tbody> {pointsList} </tbody> </Table> <PointsModal callback={this.refresh}/> </Container> </div> ); } })

This code starts with refresh() and remove() methods, which I'll get to in a moment. The important part happens in componentDidMount(), where the access token is added in an Authorization header, and the user's information is stuffed in an x-forwarded-user header. An ApolloClient is created with this information, a cache is added, and the connectToDevTools flag is turned on. This can be useful for debugging with Apollo Client Developer Tools.

componentDidMount() { const authLink = setContext(async (_, {headers}) => { const token = await this.props.auth.getAccessToken(); // return the headers to the context so httpLink can read them return { headers: { ...headers, authorization: token ? `Bearer ${token}` : '', 'x-forwarded-user': user ? JSON.stringify(user) : '' } } }); this.client = new ApolloClient({ link: authLink.concat(httpLink), cache: new InMemoryCache(), connectToDevTools: true }); // this.client.query(...); }

Authentication with Apollo Client requires a few new dependencies. Install these now.

npm i apollo-link-context@1.0.8 apollo-link-http@1.5.4

In the JSX of the page, there is a delete button that calls the remove() method in Points. There's also a PointsModal component. This is referenced for each item, as well as at the bottom. You'll notice both of these reference the refresh() method, which updates the list.

<PointsModal item={p} callback={this.refresh}/> <PointsModal callback={this.refresh}/>

This component renders a link to edit a component, or an Add button when no item is set.

Create src/PointsModal.js and add the following code to it.

import React, { Component } from 'react'; import { Button, Form, FormGroup, Input, Label, Modal, ModalBody, ModalFooter, ModalHeader } from 'reactstrap'; import { withAuth } from '@okta/okta-react'; import { httpLink } from './Points'; import { ApolloClient } from 'apollo-client'; import { setContext } from 'apollo-link-context'; import { InMemoryCache } from 'apollo-cache-inmemory'; import gql from 'graphql-tag'; import { Link } from 'react-router-dom'; export default withAuth(class PointsModal extends Component { client; emptyItem = { date: (new Date()).toISOString().split('T')[0], exercise: 1, diet: 1, alcohol: 1, notes: '' }; constructor(props) { super(props); this.state = { modal: false, item: this.emptyItem }; this.toggle = this.toggle.bind(this); this.handleChange = this.handleChange.bind(this); this.handleSubmit = this.handleSubmit.bind(this); } componentDidMount() { if (this.props.item) { this.setState({item: this.props.item}) } const authLink = setContext(async (_, {headers}) => { const token = await this.props.auth.getAccessToken(); const user = await this.props.auth.getUser(); // return the headers to the context so httpLink can read them return { headers: { ...headers, authorization: token ? `Bearer ${token}` : '', 'x-forwarded-user': JSON.stringify(user) } } }); this.client = new ApolloClient({ link: authLink.concat(httpLink), cache: new InMemoryCache() }); } toggle() { if (this.state.modal && !this.state.item.id) { this.setState({item: this.emptyItem}); } this.setState({modal: !this.state.modal}); } render() { const {item} = this.state; const opener = item.id ? <Link onClick={this.toggle} to="#">{this.props.item.date}</Link> : <Button color="primary" onClick={this.toggle}>Add Points</Button>; return ( <div> {opener} <Modal isOpen={this.state.modal} toggle={this.toggle}> <ModalHeader toggle={this.toggle}>{(item.id ? 'Edit' : 'Add')} Points</ModalHeader> <ModalBody> <Form onSubmit={this.handleSubmit}> <FormGroup> <Label for="date">Date</Label> <Input type="date" name="date" id="date" value={item.date} onChange={this.handleChange}/> </FormGroup> <FormGroup check> <Label check> <Input type="checkbox" name="exercise" id="exercise" checked={item.exercise} onChange={this.handleChange}/>{' '} Did you exercise? </Label> </FormGroup> <FormGroup check> <Label check> <Input type="checkbox" name="diet" id="diet" checked={item.diet} onChange={this.handleChange}/>{' '} Did you eat well? </Label> </FormGroup> <FormGroup check> <Label check> <Input type="checkbox" name="alcohol" id="alcohol" checked={item.alcohol} onChange={this.handleChange}/>{' '} Did you drink responsibly? </Label> </FormGroup> <FormGroup> <Label for="notes">Notes</Label> <Input type="textarea" name="notes" id="notes" value={item.notes} onChange={this.handleChange}/> </FormGroup> </Form> </ModalBody> <ModalFooter> <Button color="primary" onClick={this.handleSubmit}>Save</Button>{' '} <Button color="secondary" onClick={this.toggle}>Cancel</Button> </ModalFooter> </Modal> </div> ) }; handleChange(event) { const target = event.target; const value = target.type === 'checkbox' ? (target.checked ? 
1 : 0) : target.value; const name = target.name; let item = {...this.state.item}; item[name] = value; this.setState({item}); } handleSubmit(event) { event.preventDefault(); const {item} = this.state; const updatePoints = gql` mutation pointsSave($id: Int, $date: Date, $exercise: Int, $diet: Int, $alcohol: Int, $notes: String) { pointsSave(id: $id, date: $date, exercise: $exercise, diet: $diet, alcohol: $alcohol, notes: $notes) { id date } }`; this.client.mutate({ mutation: updatePoints, variables: { id: item.id, date: item.date, exercise: item.exercise, diet: item.diet, alcohol: item.alcohol, notes: item.notes } }).then(result => { let newItem = {...item}; newItem.id = result.data.pointsSave.id; this.props.callback(newItem); this.toggle(); }); } });

Make sure your GraphQL backend is started, then start the React frontend with npm start. The text squishes up against the top navbar, so add some padding by adding a rule in src/index.css.

.container-fluid { padding-top: 10px; }

You should see the Home component and a button to log in.

Click Login and you'll be prompted to enter your Okta credentials.

And then you'll be logged in!

Click Manage Points to see the points list.

It's cool to see everything working, isn't it?! :D

Your React frontend is secured, but your API is still wide open. Let's fix that.

Get User Information from JWTs

Navigate to your graphql-api project in a terminal window and install Okta's JWT Verifier.

npm i @okta/jwt-verifier@0.0.12

Create graphql-api/src/CurrentUser.ts to hold the current user's information.

export class CurrentUser { constructor(public id: string, public firstName: string, public lastName: string) {} }

Import OktaJwtVerifier and CurrentUser in graphql-api/src/index.ts and configure the JWT verifier to use your OIDC app's settings.

import * as OktaJwtVerifier from '@okta/jwt-verifier'; import { CurrentUser } from './CurrentUser'; const oktaJwtVerifier = new OktaJwtVerifier({ clientId: '{yourClientId}', issuer: 'https://{yourOktaDomain}.com/oauth2/default' });

In the bootstrap configuration, define setupContainer to require an authorization header and set the current user from the x-forwarded-user header.

bootstrap({ … cors: true, setupContainer: async (container, action) => { const request = action.request; // require every request to have an authorization header if (!request.headers.authorization) { throw Error('Authorization header is required!'); } let parts = request.headers.authorization.trim().split(' '); let accessToken = parts.pop(); await oktaJwtVerifier.verifyAccessToken(accessToken) .then(async jwt => { const user = JSON.parse(request.headers['x-forwarded-user'].toString()); const currentUser = new CurrentUser(jwt.claims.uid, user.given_name, user.family_name); container.set(CurrentUser, currentUser); }) .catch(error => { throw Error('JWT Validation failed!'); }) } ... });

Modify graphql-api/src/controller/PointsController.ts to inject the CurrentUser as a dependency. While you're in there, adjust the points() method to filter by user ID and modify pointsSave() to set the user when saving.

import { Controller, Mutation, Query } from 'vesper'; import { EntityManager } from 'typeorm'; import { Points } from '../entity/Points'; import { User } from '../entity/User'; import { CurrentUser } from '../CurrentUser'; @Controller() export class PointsController { constructor(private entityManager: EntityManager, private currentUser: CurrentUser) { } // serves "points: [Points]" requests @Query() points() { return this.entityManager.getRepository(Points).createQueryBuilder("points") .innerJoin("points.user", "user", "user.id = :id", { id: this.currentUser.id }) .getMany(); } // serves "pointsGet(id: Int): Points" requests @Query() pointsGet({id}) { return this.entityManager.findOne(Points, id); } // serves "pointsSave(id: Int, date: Date, exercise: Int, diet: Int, alcohol: Int, notes: String): Points" requests @Mutation() pointsSave(args) { // add current user to points saved if (this.currentUser) { const user = new User(); user.id = this.currentUser.id; user.firstName = this.currentUser.firstName; user.lastName = this.currentUser.lastName; args.user = user; } const points = this.entityManager.create(Points, args); return this.entityManager.save(Points, points); } // serves "pointsDelete(id: Int): Boolean" requests @Mutation() async pointsDelete({id}) { await this.entityManager.remove(Points, {id: id}); return true; } }

Restart the API, and you should be off to the races!

Source Code

You can find the source code for this article at https://github.com/oktadeveloper/okta-react-graphql-example.

Learn More About React, Node, and User Authentication

This article showed you how to build a secure React app with GraphQL, TypeORM, and Node/Vesper. I hope you enjoyed the experience!

At Okta, we care about making authentication with React and Node easy to implement. We have several blog posts on the topic, and documentation too! I encourage you to check out the following links:

I hope you have an excellent experience building apps with React and GraphQL. If you have any questions, please hit me up on Twitter or my whole kick-ass team on @oktadev. Our DMs are wide open! :)

Categories: Web Technologies

MySQL 8.0: Optimizing Small Partial Update of LOB in InnoDB

Planet MySQL - Tue, 06/12/2018 - 08:39

In this article I will explain the partial update optimizations for small changes to large objects (LOBs) in InnoDB. Small here qualifies the size of the modification and not the size of the LOB. For some background information about the partial update feature, kindly go through our previous posts on this (here, here and here).…

Categories: Web Technologies

Webinar Weds 6/13: Performance Analysis and Troubleshooting Methodologies for Databases

Planet MySQL - Tue, 06/12/2018 - 04:48

Please join Percona’s CEO, Peter Zaitsev as he presents Performance Analysis and Troubleshooting Methodologies for Databases on Wednesday, June 13th, 2018 at 11:00 AM PDT (UTC-7) / 2:00 PM EDT (UTC-4).

Register Now

 

Have you heard about the USE Method (Utilization – Saturation – Errors)? RED (Rate – Errors – Duration), or Golden Signals (Latency – Traffic – Errors – Saturations)?

In this presentation, we will talk briefly about these different-but-similar “focuses”. We’ll discuss how we can apply them to data infrastructure performance analysis, troubleshooting, and monitoring.

We will use MySQL as an example, but most of this talk applies to other database technologies too.

Register for the webinar

About Peter Zaitsev, CEO

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With over 140 professionals in 30 plus countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of internet giants, large enterprises and many exciting startups. Percona was named to the Inc. 5000 in 2013, 2014, 2015 and 2016.

Peter was an early employee at MySQL AB, eventually leading the company’s High Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. Peter is a co-author of High Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Database Performance Blog. He has also been tapped as a contributor to Fortune and DZone, and his recent ebook Practical MySQL Performance Optimization Volume 1 is one of percona.com’s most popular downloads. Peter lives in North Carolina with his wife and two children. In his spare time, Peter enjoys travel and spending time outdoors.

The post Webinar Weds 6/13: Performance Analysis and Troubleshooting Methodologies for Databases appeared first on Percona Database Performance Blog.

Categories: Web Technologies

Setting up Basic Master-Slave Replication in MySQL 8

Planet MySQL - Tue, 06/12/2018 - 02:37

Since April 19th, when MySQL 8.0 became Generally Available (GA), the MySQL community has been abuzz with excitement over all of the new features and improvements. Many of the new features were improvements to performance or monitoring, while others were specifically related to replication. We reviewed Replication Performance Enhancements in MySQL 8 recently. Today’s blog will describe how to set up a basic master-slave configuration with MySQL, using two servers on a single machine.

Replication Defined

MySQL replication is a process in which data from one MySQL database server (the master) is copied automatically to one or more MySQL database servers (the slaves). In the case of multiple slaves, these are usually referred to as a slave cluster. Replication should not be confused with backup operations. Whereas the aim of backups is to protect the data and/or data structure, the role of replication is typically to spread read and write workloads across multiple servers for the purposes of scalability. In some cases, replication is also employed for failover, or for analyzing data on the slave in order to lessen the master’s workload.

Master-slave replication is almost always a one-way replication, from master to slave, whereby the master database is employed for write operations, while read operations may be performed on a single slave database or spread across several slaves. This is because the ratio of read requests is almost always higher than writes for most applications. If your single read database server (the slave) is starting to find it difficult to keep up, you can always add additional read servers to your cluster. This is known as scaling your database servers horizontally. In terms of applications, this configuration requires that you have at least two data sources defined, one for write operations and the other for read operations.

A more rare scenario is where your database server(s) perform more writes than reads. That may require another topology called sharding, which involves splitting up your data across different tables or even across different physical databases. Sharding is most beneficial when you have a dataset that A) outgrows storage limits or B) contributes to unreasonable performance within a single database.

The master and slave(s) can reside on the same physical server or on different ones. Placing both on the same machine offers some performance advantages by omitting the shuttling of the data over the network or internet connection.

The Master -> Slave replication process can be either synchronous or asynchronous. The difference lies in the timing: if the changes are made to the Master and Slave at the same time, the process is said to be synchronous; if changes are queued up and written later, the process is considered asynchronous.
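
Stock MySQL replication is asynchronous. If you want the master to wait until at least one slave has acknowledged receiving each transaction before the commit returns to the client, the semi-synchronous plugins can be loaded. A minimal sketch, assuming the plugin libraries shipped with MySQL on Linux:

-- on the master
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;

-- on the slave
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;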

There are a number of ways of propagating changes to the slave databases. In Trigger-Based replication, triggers on columns in the Master system propagate write changes to the Slave system.

 

 

 

 

 

Fig 1 – Trigger-based replication | From Wikimedia Commons, the free media repository

 

 

Another replication mechanism called Row-based Replication was introduced in MySQL 5.1. It’s a type of binary logging that provides the best combination of data integrity and performance for many users.
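
The format used in the binary log is controlled by the binlog_format variable (STATEMENT, ROW, or MIXED; ROW is the default in MySQL 5.7 and 8.0). A quick way to inspect or change it on the master (note that a change only affects new sessions):

SHOW VARIABLES LIKE 'binlog_format';
SET GLOBAL binlog_format = 'ROW';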

In Log-based replication, the transaction log files are copied to another instance and replayed there. The log files are intended to reproduce a working version of the database in the event of a failure, so they represent an ideal mechanism for replicating data. This type of replication always works asynchronously.

 

 

 

Fig 2 – Log-based replication | From Wikimedia Commons, the free media repository

 

 

 

Setting up the MySQL Slave Server

While setting up a single MySQL instance is fairly simple, adding a second MySQL server to the mix entails some extra effort. There are then a few extra steps to configure the master-slave replication. These differ based on your Operating System so I will describe the process for Linux and Windows in turn.

Recall that master-slave replication is a one-way mechanism from master to slave, whereby the master database is used for the write operations, while read operations are handled by the slave database(s). What this means is that we’ll need to have two MySQL instances running on the same box, at the same time. As we’ll see in a moment, we have to take steps to ensure a smooth coexistence between the two instances. This tutorial assumes that you’ve already got a MySQL instance running. We’ll make that database instance the master; hence, it will handle write operations. Next, we’ll create a 2nd database instance to act as the slave, for handling read operations.

Instructions for Linux

Due to the many flavors of Linux, note that the exact details of your installation may differ slightly. This tutorial uses Ubuntu Linux as the host operating system, thus the provided commands are for that operating system. If you want to setup your MySQL master-slave replication on a different flavor of Linux, you may need to make adjustments for its specific commands. Having said that, the general principles of setting up the MySQL master-slave replication on the same machine are basically the same for all Linux variations.

Installing mysqld_multi on Linux Systems

In order to manage two MySQL instances on the same machine efficiently on Linux, you’ll need to use mysqld_multi. It’s designed to manage several mysqld processes that listen for connections on different Unix socket files and TCP/IP ports. It can start or stop servers, or report their current status.

Note that, for some Linux platforms, MySQL installation from RPM or Debian packages includes systemd support for managing MySQL server startup and shutdown. On these platforms, mysqld_multi is not installed because it is unnecessary.

The first step in setting up mysqld_multi is to create two separate [mysqld] groups in the existing my.cnf file. On Ubuntu, the default location of the my.cnf file is /etc/mysql/.

Open my.cnf file in your favorite text editor and rename the existing [mysqld] group to [mysqld1]. This renamed group will be used for the configuration of the master MySQL instance. Each instance must have its own unique server-id, so add the following line in the [mysqld1] group:

server-id = 1

To create a separate [mysqld] group for the second MySQL instance, copy the [mysqld1] group with all current settings, and paste it directly below in the same my.cnf file. Now, rename the copied group to [mysqld2], and make the following changes in the configuration for the slave:

server-id        = 2

port             = 3307

socket           = /var/run/mysqld/mysqld_slave.sock

pid-file         = /var/run/mysqld/mysqld_slave.pid

datadir          = /var/lib/mysql_slave

log_error        = /var/log/mysql_slave/error_slave.log

relay-log       = /var/log/mysql_slave/relay-bin

relay-log-index = /var/log/mysql_slave/relay-bin.index

master-info-file = /var/log/mysql_slave/master.info

relay-log-info-file = /var/log/mysql_slave/relay-log.info

read_only        = 1

Some things to notice:

  • To setup the second MySQL instance as a slave, we set its server-id to 2, so that it is different to the master’s server-id.
  • Since both instances will run on the same machine, we set the port for the second instance to 3307; the first instance will continue to use the default port of 3306.
  • In order to enable this second instance to use the same MySQL binaries, we need to set different values for the socket, pid-file, datadir and log_error.
  • We also need to enable relay-log in order to use the second instance as a slave by setting the relay-log, relay-log-index and relay-log-info-file parameters, as well as master-info-file.
  • Finally, the read_only parameter is set to 1 in order to make the slave instance read-only. Keep in mind that prior to MySQL 5.7.8, this option doesn’t completely prevent changes on the slave. Even when read_only is set to 1, updates will still be permitted from users who have the SUPER privilege. More recent versions of MySQL have the super_read_only parameter to prevent SUPER users from making changes; see the example just after this list.
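
A minimal sketch of making a running slave read-only for everyone, including SUPER users (super_read_only requires MySQL 5.7.8 or later):

SET GLOBAL read_only = 1;
SET GLOBAL super_read_only = 1;
SHOW VARIABLES LIKE '%read_only%';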

In addition to the [mysqld1] and [mysqld2] groups, we also need to add a new group [mysqld_multi] to the my.cnf file:

[mysqld_multi]

mysqld = /usr/bin/mysqld_safe

mysqladmin = /usr/bin/mysqladmin

user    = multi_admin

password   = multipass

Once we install the second MySQL instance, and we start up both, we will grant appropriate privileges to the multi_admin user in order to be able to shut down both MySQL instances.
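
A minimal sketch of that grant, to be run on each instance once it is up. The account name and password must match the [mysqld_multi] group above, and the SHUTDOWN privilege is what mysqladmin needs in order to stop a server:

CREATE USER 'multi_admin'@'localhost' IDENTIFIED BY 'multipass';
GRANT SHUTDOWN ON *.* TO 'multi_admin'@'localhost';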

Create New Folders

In the previous step we prepped the my.cnf configuration file for the second MySQL instance. In that configuration file two new folders are referenced. The following Linux commands will create those folders with the appropriate privileges:

mkdir -p /var/lib/mysql_slave

chmod --reference /var/lib/mysql /var/lib/mysql_slave

chown --reference /var/lib/mysql /var/lib/mysql_slave

mkdir -p /var/log/mysql_slave

chmod --reference /var/log/mysql /var/log/mysql_slave

chown --reference /var/log/mysql /var/log/mysql_slave

AppArmor security settings on Ubuntu

In some Linux environments, including Ubuntu, AppArmor security settings are needed in order to run the second MySQL instance. AppArmor, which stands for “Application Armor”, is a Linux kernel security module that allows the system administrator to restrict programs’ capabilities with per-program profiles. Profiles can allow capabilities like network access, raw socket access, and the permission to read, write, or execute files on matching paths. To properly set-up AppArmor, add the following lines to the /etc/apparmor.d/usr.sbin.mysqld file using your favorite text editor:

/var/lib/mysql_slave/ r,

/var/lib/mysql_slave/** rwk,

/var/log/mysql_slave/ r,

/var/log/mysql_slave/* rw,

/var/run/mysqld/mysqld_slave.pid rw,

/var/run/mysqld/mysqld_slave.sock w,

/run/mysqld/mysqld_slave.pid rw,

/run/mysqld/mysqld_slave.sock w,

After you save the file, you’ll need to reboot the machine for the changes to take effect.

Installing the Second MySQL Instance

There are a few different approaches that you can follow in the installation of the second MySQL instance. We’ll be sharing the MySQL binaries, while employing separate data files for each installation.

Since we have already prepared the configuration file, necessary folders and security changes in the previous steps, the final installation step of the second MySQL instance is to initialize the MySQL data directory. Here’s the command to do that:

mysql_install_db --user=mysql --datadir=/var/lib/mysql_slave

Once MySQL data directory is initialized, you can start both MySQL instances using the mysqld_multi service:

mysqld_multi start

Set the root password for the second MySQL instance by using mysqladmin with the appropriate host and port. Both the host and port are required, because if they are not specified, mysqladmin will connect to the first MySQL instance by default:

mysqladmin --host=127.0.0.1 --port=3307 -u root password rootpwd

In the example above I set the password to 'rootpwd', but using a more secure password is recommended.
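
Once both instances are running, a quick sanity check is to connect to each one and look at a few server variables. On the slave instance (port 3307) you would expect to see port 3307, server_id 2 and read_only 1:

SELECT @@port, @@server_id, @@read_only;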

Windows Instructions

Again, the first step in setting up replication involves editing the my.cnf file. In this case, we’ll provide two local configuration files named “master.cnf” and “slave.cnf” that will be used when starting up the MySQL servers.

At a minimum you’ll want to add two options to the [mysqld] section of the master.cnf file:

  1. log-bin: in this example we choose master-bin.log.
  2. server-id: in this example we choose 1. The server cannot act as a replication master unless binary logging is enabled. The server_id variable must be a positive integer value between 1 and 2^32.

The master.cnf file:

[mysqld]

server-id=1

log-bin=master-bin.log

datadir=/home/robg/mysql/master/data

innodb_flush_log_at_trx_commit=1

sync_binlog=1

Note: For the greatest possible durability and consistency in a replication setup using InnoDB with transactions, you should also specify the innodb_flush_log_at_trx_commit=1, sync_binlog=1 options.

You’ll need to add the server-id option to the [mysqld] section of the slave’s slave.cnf file as well. It must be different from the ID of the master.

The slave.cnf file:

[mysqld]

server-id=2

relay-log-index=slave-relay-bin.index

relay-log=slave-relay-bin

datadir=/home/robg/mysql/slave/data

Next, start the MySQL servers from the command line or using the service manager if you’re running your instances as a service. Here are some example commands:

[admin@master ~]$ mysqld --defaults-file=/home/robg/mysql/master/master.cnf &

[admin@slave ~]$ mysqld --defaults-file=/home/robg/mysql/slave/slave.cnf &

Create the Replication User Account

We’ll need to create an account on the master server that the slave server can use to connect. This account must be given the REPLICATION SLAVE privilege (note that in MySQL 8, the password is set with CREATE USER rather than in the GRANT statement):

[admin@master ~]$ mysql -u root --prompt='master> '

master> CREATE USER repl_user@192.168.0.34 IDENTIFIED BY 'admin';

master> GRANT REPLICATION SLAVE ON *.* TO repl_user@192.168.0.34;

Initialize Replication

We are now ready to initialize replication on the slave using the CHANGE MASTER command:

slave> CHANGE MASTER TO MASTER_HOST='192.168.0.31',

-> MASTER_USER='repl_user',

-> MASTER_PASSWORD='*********',

-> MASTER_LOG_FILE='',

-> MASTER_LOG_POS=4;

Here’s an explanation of each of the above parameters:

  1. MASTER_HOST: the IP or hostname of the master server, in this example 192.168.0.31
  2. MASTER_USER: this is the user we granted the REPLICATION SLAVE privilege to above, in this example, "repl_user"
  3. MASTER_PASSWORD: this is the password we assigned to "repl_user" above
  4. MASTER_LOG_FILE: is an empty string, although it would contain a binary log file name if there were existing writes to be picked up from the master
  5. MASTER_LOG_POS: is 4 (again, this value would likely be different if there were existing writes to be picked up from the master); see the sketch below for how to obtain the current coordinates
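If the master already contains data or is receiving writes, you can read the current coordinates on the master and plug them into CHANGE MASTER TO; a minimal sketch:

master> SHOW MASTER STATUS;

Use the File column value for MASTER_LOG_FILE and the Position column value for MASTER_LOG_POS (with the master-bin base name configured above, File will look something like master-bin.000001).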

Lastly, issue the following command to start replication on the slave:

slave> start slave;
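Before moving on, it is worth confirming that both replication threads came up cleanly. In the output of the statement below, Slave_IO_Running and Slave_SQL_Running should both show Yes and Last_IO_Error should be empty:

slave> SHOW SLAVE STATUS\G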

Configuration Option Tips

The MySQL 8.0 Reference Manual contains lots of useful information on running multiple MySQL instances on the same machine.

To summarize some of the key points:

  1. Parameters may be set on the command line, in configuration files, or by setting environment variables (though configuration files are recommended for replication). To see the values used by a given instance, you can connect to it and execute a SHOW VARIABLES statement (see the example after this list).
  2. The primary resource managed by a MySQL instance is the data directory. Each instance should use a different data directory, the location of which is specified using the --datadir=dir_name option.
  3. In addition to using different data directories, several other options must have different values for each server instance:
    1. --port=port_num: --port controls the port number for TCP/IP connections. Alternatively, if the host has multiple network addresses, you can use --bind-address to cause each server to listen to a different address.
    2. --socket={file_name|pipe_name}: --socket controls the Unix socket file path on Unix or the named pipe name on Windows. On Windows, it is necessary to specify distinct pipe names only for those servers configured to permit named-pipe connections.
    3. --shared-memory-base-name=name: This option is used only on Windows. It designates the shared-memory name used by a Windows server to permit clients to connect using shared memory. It is necessary to specify distinct shared-memory names only for those servers configured to permit shared-memory connections.
    4. --pid-file=file_name: This option indicates the path name of the file in which the server writes its process ID.
    5. If you use the following log file options, their values must differ for each server:
      1. --general_log_file=file_name
      2. --log-bin[=file_name]
      3. --slow_query_log_file=file_name
      4. --log-error[=file_name]
    6. --tmpdir=dir_name: To achieve better performance, you can specify the --tmpdir option differently for each server, to spread the load between several physical disks.
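For example, to verify which data directory, port and socket each instance actually picked up (a sketch assuming the default port 3306 for the first instance and 3307 for the second, as used earlier):

mysql --host=127.0.0.1 --port=3306 -u root -p -e "SHOW VARIABLES WHERE Variable_name IN ('datadir','port','socket');"

mysql --host=127.0.0.1 --port=3307 -u root -p -e "SHOW VARIABLES WHERE Variable_name IN ('datadir','port','socket');"
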
Testing the Replication Process

You should at least perform a basic check to ensure that replication is indeed working. Here’s a simple example that inserts some data into the “test” table on the master server and then verifies that the new rows are replicated on the slave server:
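The test table is not created by the replication setup itself, so make sure it exists on the master first. A purely illustrative definition (the database and table names are just examples) could be:

master> CREATE DATABASE IF NOT EXISTS test;

master> USE test;

master> CREATE TABLE test (id INT PRIMARY KEY);

With the table in place (and the same default database selected on the slave), the check itself looks like this: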

master> insert into test values (1),(2),(3),(4);

slave> select * from test;

+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
|  4 |
+----+

Conclusion

In today’s blog, we learned how to set up a basic master-slave configuration with MySQL, using two servers on a single machine. In the next blog, we’ll explore how to set up slaves on different physical servers.

The post Setting up Basic Master-Slave Replication in MySQL 8 appeared first on Monyog Blog.

Categories: Web Technologies

MySQL 8.0 Group Replication Limitations

Planet MySQL - Tue, 06/12/2018 - 02:08

We build highly available and fault-tolerant MySQL database infrastructure operations for some of the largest internet properties on the planet. Our consulting team spends several hours daily researching MySQL documentation and MySQL blogs to understand the best possible ways to build optimal, scalable, highly available and reliable database infrastructure operations for planet-scale web properties. The most common approach to building a fault-tolerant system is to make every component in the ecosystem redundant; put simply, any component should be able to be removed and the system should continue to operate as expected. MySQL replication is a proven method for building redundant database infrastructure operations, but operationally these systems are highly complex, requiring maintenance and administration of several servers instead of just one, and you need senior DBAs to manage such systems.

MySQL Group Replication can operate in both single-primary mode with automatic primary election, where only one server accepts updates at a time, and multi-primary mode, where all servers can accept updates, even if they are issued concurrently. The built-in group membership service keeps the view of the group consistent and available to all servers at any given point in time. Servers can leave and join the group and the view is updated accordingly. If servers leave the group unexpectedly, the failure detection mechanism detects this and notifies the group that the view has changed. All this happens automatically!

For any transaction to commit, the majority of group members have to agree on the order of that transaction in the global sequence of transactions. The decision to commit or abort a transaction is taken by each server individually, but all servers make the same decision. In the event of a network partition, when the members of the group are unable to reach agreement on a transaction, the system halts until the issue is resolved, which guarantees split-brain protection.

All these fault tolerance mechanisms are powered by the Group Communication System (GCS) protocols, a group membership service, and a completely safe, ordered message delivery system. Group Replication is powered by the Paxos algorithm (https://en.wikipedia.org/wiki/Paxos_(computer_science)), which acts as the group communication engine.

So far we have only covered the capabilities of Group Replication; the bullets below list its limitations:

  • Set --binlog-checksum=NONE – Group Replication cannot benefit from --binlog-checksum due to a design limitation of replication event checksums (see the configuration sketch after this list).
  • Gap Locks not supported – The information about gap locks is not available outside InnoDB, so the certification process cannot take gap locks into account.
  • Table Locks and Named Locks not supported – The certification process does not take table locks or named locks into account.
  • Concurrent DDL versus DML Operations – Concurrent data definition statements and data manipulation statements executing against the same object but on different servers are not supported when using multi-primary mode. During execution of Data Definition Language (DDL) statements on an object, executing concurrent Data Manipulation Language (DML) on the same object but on a different server instance carries the risk of conflicting DDL executing on different instances not being detected.
  • Very Large Transactions – Individual transactions that result in GTID contents large enough that they cannot be copied between group members over the network within a 5-second window can cause failures in the group communication. To avoid this issue, try to limit the size of your transactions as much as possible. For example, split up files used with LOAD DATA INFILE into smaller chunks.
  • Multi-primary Mode Deadlock – When a group is operating in multi-primary mode, SELECT ... FOR UPDATE statements can result in a deadlock. This is because the lock is not shared across the members of the group, therefore the expectation for such a statement might not be met.
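As a reference for the first bullet, a minimal my.cnf sketch for one group member might look like the lines below. The group name, addresses and server_id are placeholders, and the group_replication settings shown are only the commonly used ones, not an exhaustive list:

[mysqld]
server_id=1
gtid_mode=ON
enforce_gtid_consistency=ON
binlog_checksum=NONE
plugin_load_add='group_replication.so'
group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
group_replication_start_on_boot=OFF
group_replication_local_address="192.168.0.31:33061"
group_replication_group_seeds="192.168.0.31:33061,192.168.0.32:33061,192.168.0.33:33061"
group_replication_single_primary_mode=ON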

 

The post MySQL 8.0 Group Replication Limitations appeared first on MySQL Consulting, Support and Remote DBA Services By MinervaDB.

Categories: Web Technologies

How to Benchmark Performance of MySQL & MariaDB using SysBench

Planet MySQL - Tue, 06/12/2018 - 01:08

What is SysBench? If you work with MySQL on a regular basis, then you most probably have heard of it. SysBench has been in the MySQL ecosystem for a long time. It was originally written by Peter Zaitsev, back in 2004. Its purpose was to provide a tool to run synthetic benchmarks of MySQL and the hardware it runs on. It was designed to run CPU, memory and I/O tests. It also had an option to execute an OLTP workload on a MySQL database. OLTP stands for online transaction processing, a typical workload for online applications like e-commerce, order entry or financial transaction systems.

In this blog post, we will focus on the SQL benchmark feature but keep in mind that hardware benchmarks can also be very useful in identifying issues on database servers. For example, the I/O benchmark was intended to simulate the InnoDB I/O workload, while the CPU tests involve simulating a highly concurrent, multi-threaded environment along with tests for mutex contention - something which also resembles a database type of workload.

SysBench history and architecture

As mentioned, SysBench was originally created in 2004 by Peter Zaitsev. Soon after, Alexey Kopytov took over its development. It reached version 0.4.12 and then development halted. After a long break Alexey started to work on SysBench again in 2016. Soon version 0.5 was released with the OLTP benchmark rewritten to use LUA-based scripts. Then, in 2017, SysBench 1.0 was released. This was like day and night compared to the old 0.4.12 version. First and foremost, instead of hardcoded scripts, we now have the ability to customize benchmarks using LUA. For instance, Percona created a TPCC-like benchmark which can be executed using SysBench. Let’s take a quick look at the current SysBench architecture.

SysBench is a C binary which uses LUA scripts to execute benchmarks. Those scripts have to:

  1. Handle input from command line parameters
  2. Define all of the modes which the benchmark is supposed to use (prepare, run, cleanup)
  3. Prepare all of the data
  4. Define how the benchmark will be executed (what queries will look like etc)

Scripts can utilize multiple connections to the database, they can also process results should you want to create complex benchmarks where queries depend on the result set of previous queries. With SysBench 1.0 it is possible to create latency histograms. It is also possible for the LUA scripts to catch and handle errors through error hooks. There’s support for parallelization in the LUA scripts, multiple queries can be executed in parallel, making, for example, provisioning much faster. Last but not least, multiple output formats are now supported. Before SysBench generated only human-readable output. Now it is possible to generate it as CSV or JSON, making it much easier to do post-processing and generate graphs using, for example, gnuplot or feed the data into Prometheus, Graphite or similar datastore.

Why SysBench?

The main reason why SysBench became popular is the fact that it is simple to use. Someone without prior knowledge can start to use it within minutes. It also provides, by default, benchmarks which cover most of the common cases - OLTP workloads, read-only or read-write, primary key lookups and primary key updates - all of which covered the areas that caused most of the issues for MySQL, up to MySQL 8.0. This was also a reason why SysBench was so popular in different benchmarks and comparisons published on the Internet. Those posts helped to promote this tool and made it into the go-to synthetic benchmark for MySQL.

Another good thing about SysBench is that, since version 0.5 and the incorporation of LUA, anyone can prepare any kind of benchmark. We already mentioned the TPCC-like benchmark, but anyone can craft something which will resemble their production workload. We are not saying it is simple - it will most likely be a time-consuming process - but having this ability is beneficial if you need to prepare a custom benchmark.

Being a synthetic benchmark, SysBench is not a tool which you can use to tune the configuration of your MySQL servers (unless you prepared LUA scripts with a custom workload or your workload happens to be very similar to the benchmark workloads that SysBench comes with). What it is great for is comparing the performance of different hardware. You can easily compare the performance of, let’s say, the different types of nodes offered by your cloud provider and the maximum QPS (queries per second) they offer. Knowing that metric and knowing what you pay for a given node, you can then calculate an even more important metric - QP$ (queries per dollar). This will allow you to identify what node type to use when building a cost-efficient environment. Of course, SysBench can also be used for initial tuning and assessing the feasibility of a given design. Let’s say we build a Galera cluster spanning the globe - North America, EU, Asia. How many inserts per second can such a setup handle? What would be the commit latency? Does it even make sense to do a proof of concept, or is network latency perhaps high enough that even a simple workload does not work as you would expect it to?
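As a hedged illustration of the QP$ idea (the numbers are made up): if a node costs $0.50 per hour and sustains 20,000 QPS in your benchmark, it delivers roughly 20,000 x 3600 / 0.50 = 144 million queries per dollar, so a node costing $1.00 per hour would have to sustain more than 40,000 QPS to be the better deal.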

What about stress-testing? Not everyone has moved to the cloud, there are still companies preferring to build their own infrastructure. Every new server acquired should go through a warm-up period during which you will stress it to pinpoint potential hardware defects. In this case SysBench can also help. Either by executing OLTP workload which overloads the server, or you can also use dedicated benchmarks for CPU, disk and memory.

As you can see, there are many cases in which even a simple, synthetic benchmark can be very useful. In the next paragraph we will look at what we can do with SysBench.

What SysBench can do for you? What tests you can run?

As mentioned at the beginning, we will focus on OLTP benchmarks and just as a reminder we’ll repeat that SysBench can also be used to perform I/O, CPU and memory tests. Let’s take a look at the benchmarks that SysBench 1.0 comes with (we removed some helper LUA files and non-database LUA scripts from this list).

-rwxr-xr-x 1 root root 1.5K May 30 07:46 bulk_insert.lua
-rwxr-xr-x 1 root root 1.3K May 30 07:46 oltp_delete.lua
-rwxr-xr-x 1 root root 2.4K May 30 07:46 oltp_insert.lua
-rwxr-xr-x 1 root root 1.3K May 30 07:46 oltp_point_select.lua
-rwxr-xr-x 1 root root 1.7K May 30 07:46 oltp_read_only.lua
-rwxr-xr-x 1 root root 1.8K May 30 07:46 oltp_read_write.lua
-rwxr-xr-x 1 root root 1.1K May 30 07:46 oltp_update_index.lua
-rwxr-xr-x 1 root root 1.2K May 30 07:46 oltp_update_non_index.lua
-rwxr-xr-x 1 root root 1.5K May 30 07:46 oltp_write_only.lua
-rwxr-xr-x 1 root root 1.9K May 30 07:46 select_random_points.lua
-rwxr-xr-x 1 root root 2.1K May 30 07:46 select_random_ranges.lua

Let’s go through them one by one.

First, bulk_insert.lua. This test can be used to benchmark the ability of MySQL to perform multi-row inserts. This can be quite useful when checking, for example, the performance of replication or a Galera cluster. In the first case, it can help you answer the question: “how fast can I insert before replication lag kicks in?”. In the latter case, it will tell you how fast data can be inserted into a Galera cluster given the current network latency.

All oltp_* scripts share a common table structure. The first two of them (oltp_delete.lua and oltp_insert.lua) execute single DELETE and INSERT statements. Again, this could be a test for either replication or a Galera cluster - push it to the limits and see what amount of inserting or purging it can handle. We also have other benchmarks focused on particular functionality - oltp_point_select, oltp_update_index and oltp_update_non_index. These will execute a subset of queries - primary key-based selects, index-based updates and non-index-based updates. If you want to test some of these functionalities, the tests are there. We also have more complex benchmarks which are based on OLTP workloads: oltp_read_only, oltp_read_write and oltp_write_only. You can run either a read-only workload, which will consist of different types of SELECT queries, you can run only writes (a mix of DELETE, INSERT and UPDATE) or you can run a mix of those two. Finally, using select_random_points and select_random_ranges you can run some random SELECTs, either using random points in an IN() list or random ranges using BETWEEN.

How you can configure a benchmark?

What is also important, benchmarks are configurable - you can run different workload patterns using the same benchmark. Let’s take a look at the two most common benchmarks to execute. We’ll have a deep dive into OLTP read_only and OLTP read_write benchmarks. First of all, SysBench has some general configuration options. We will discuss here only the most important ones, you can check all of them by running:

sysbench --help

Let’s take a look at them.

--threads=N number of threads to use [1]

You can define what kind of concurrency you’d like SysBench to generate. MySQL, as every software, has some scalability limitations and its performance will peak at some level of concurrency. This setting helps to simulate different concurrencies for a given workload and check if it already has passed the sweet spot.
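A common pattern (a sketch that reuses the read-only command shown later in this post) is to sweep the thread count and watch where throughput stops scaling:

for t in 1 2 4 8 16 32 64 128; do
  sysbench /root/sysbench/src/lua/oltp_read_only.lua --threads=$t --time=60 --mysql-host=10.0.0.126 --mysql-user=sbtest --mysql-password=pass --tables=10 --table-size=1000000 run
done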

--events=N limit for total number of events [0]
--time=N limit for total execution time in seconds [10]

Those two settings govern how long SysBench should keep running. It can either execute some number of queries or it can keep running for a predefined time.

--warmup-time=N execute events for this many seconds with statistics disabled before the actual benchmark run with statistics enabled [0]

This is self-explanatory. SysBench generates statistical results from the tests and those results may be affected if MySQL is in a cold state. Warmup helps to identify the “regular” throughput by executing the benchmark for a predefined time, allowing the cache, buffer pools etc. to warm up.

--rate=N average transactions rate. 0 for unlimited rate [0]

By default SysBench will attempt to execute queries as fast as possible. To simulate slower traffic this option may be used. You can define here how many transactions should be executed per second.

--report-interval=N periodically report intermediate statistics with a specified interval in seconds. 0 disables intermediate reports [0]

By default SysBench generates a report after it completed its run and no progress is reported while the benchmark is running. Using this option you can make SysBench more verbose while the benchmark still runs.

--rand-type=STRING random numbers distribution {uniform, gaussian, special, pareto, zipfian} to use by default [special]

SysBench gives you the ability to generate different types of data distribution. All of them may have their own purposes. The default option, ‘special’, defines several (it is configurable) hot-spots in the data, something which is quite common in web applications. You can also use other distributions if your data behaves in a different way. By making a different choice here you can also change the way your database is stressed. For example, uniform distribution, where all of the rows have the same likelihood of being accessed, is a much more memory-intensive operation. It will use more of the buffer pool to store all of the data and it will be much more disk-intensive if your data set won’t fit in memory. On the other hand, special distribution with a couple of hot-spots will put less stress on the disk as hot rows are more likely to be kept in the buffer pool and access to rows stored on disk is much less likely. For some of the data distribution types, SysBench gives you more tweaks. You can find this info in the ‘sysbench --help’ output.

--db-ps-mode=STRING prepared statements usage mode {auto, disable} [auto]

Using this setting you can decide if SysBench should use prepared statements (as long as they are available in the given datastore - for MySQL it means PS will be enabled by default) or not. This may make a difference while working with proxies like ProxySQL or MaxScale - they should treat prepared statements in a special way and all of them should be routed to one host making it impossible to test scalability of the proxy.

In addition to the general configuration options, each of the tests may have its own configuration. You can check what is possible by running:

root@vagrant:~# sysbench ./sysbench/src/lua/oltp_read_write.lua help
sysbench 1.1.0-2e6b7d5 (using bundled LuaJIT 2.1.0-beta3)

oltp_read_only.lua options:
  --distinct_ranges=N           Number of SELECT DISTINCT queries per transaction [1]
  --sum_ranges=N                Number of SELECT SUM() queries per transaction [1]
  --skip_trx[=on|off]           Don't start explicit transactions and execute all queries in the AUTOCOMMIT mode [off]
  --secondary[=on|off]          Use a secondary index in place of the PRIMARY KEY [off]
  --create_secondary[=on|off]   Create a secondary index in addition to the PRIMARY KEY [on]
  --index_updates=N             Number of UPDATE index queries per transaction [1]
  --range_size=N                Range size for range SELECT queries [100]
  --auto_inc[=on|off]           Use AUTO_INCREMENT column as Primary Key (for MySQL), or its alternatives in other DBMS. When disabled, use client-generated IDs [on]
  --delete_inserts=N            Number of DELETE/INSERT combinations per transaction [1]
  --tables=N                    Number of tables [1]
  --mysql_storage_engine=STRING Storage engine, if MySQL is used [innodb]
  --non_index_updates=N         Number of UPDATE non-index queries per transaction [1]
  --table_size=N                Number of rows per table [10000]
  --pgsql_variant=STRING        Use this PostgreSQL variant when running with the PostgreSQL driver. The only currently supported variant is 'redshift'. When enabled, create_secondary is automatically disabled, and delete_inserts is set to 0
  --simple_ranges=N             Number of simple range SELECT queries per transaction [1]
  --order_ranges=N              Number of SELECT ORDER BY queries per transaction [1]
  --range_selects[=on|off]      Enable/disable all range SELECT queries [on]
  --point_selects=N             Number of point SELECT queries per transaction [10]

Again, we will discuss the most important options from here. First of all, you have a control of how exactly a transaction will look like. Generally speaking, it consists of different types of queries - INSERT, DELETE, different type of SELECT (point lookup, range, aggregation) and UPDATE (indexed, non-indexed). Using variables like:

  --distinct_ranges=N      Number of SELECT DISTINCT queries per transaction [1]
  --sum_ranges=N           Number of SELECT SUM() queries per transaction [1]
  --index_updates=N        Number of UPDATE index queries per transaction [1]
  --delete_inserts=N       Number of DELETE/INSERT combinations per transaction [1]
  --non_index_updates=N    Number of UPDATE non-index queries per transaction [1]
  --simple_ranges=N        Number of simple range SELECT queries per transaction [1]
  --order_ranges=N         Number of SELECT ORDER BY queries per transaction [1]
  --point_selects=N        Number of point SELECT queries per transaction [10]
  --range_selects[=on|off] Enable/disable all range SELECT queries [on]

You can define what a transaction should look like. As you can see by looking at the default values, the majority of queries are SELECTs - mainly point selects but also different types of range SELECTs (you can disable all of them by setting range_selects to off). You can tweak the workload towards a more write-heavy one by increasing the number of updates or INSERT/DELETE queries. It is also possible to tweak settings related to secondary indexes and auto increment, but also the data set size (the number of tables and how many rows each of them should hold). This lets you customize your workload quite nicely.

--skip_trx[=on|off] Don't start explicit transactions and execute all queries in the AUTOCOMMIT mode [off]

This is another setting, quite important when working with proxies. By default, SysBench will attempt to execute queries in explicit transaction. This way the dataset will stay consistent and not affected: SysBench will, for example, execute INSERT and DELETE on the same row, making sure the data set will not grow (impacting your ability to reproduce results). However, proxies will treat explicit transactions differently - all queries executed within a transaction should be executed on the same host, thus removing the ability to scale the workload. Please keep in mind that disabling transactions will result in data set diverging from the initial point. It may also trigger some issues like duplicate key errors or such. To be able to disable transactions you may also want to look into:

--mysql-ignore-errors=[LIST,...] list of errors to ignore, or "all" [1213,1020,1205]

This setting allows you to specify error codes from MySQL which SysBench should ignore (and not kill the connection). For example, to ignore errors like: error 1062 (Duplicate entry '6' for key 'PRIMARY') you should pass this error code: --mysql-ignore-errors=1062

Just as important, each benchmark provides a way to provision a data set for the tests, run them, and then clean up afterwards. This is done using the ‘prepare’, ‘run’ and ‘cleanup’ commands. We will show how this is done in the next section.

Examples

In this section we’ll go through some examples of what SysBench can be used for. As mentioned earlier, we’ll focus on the two most popular benchmarks - OLTP read only and OLTP read/write. Sometimes it may make sense to use other benchmarks, but at least we’ll be able to show you how those two can be customized.

Primary Key lookups

First of all, we have to decide which benchmark we will run, read-only or read-write. Technically speaking it does not make a difference as we can remove writes from R/W benchmark. Let’s focus on the read-only one.

As a first step, we have to prepare a data set. We need to decide how big it should be. For this particular benchmark, using default settings (so, secondary indexes are created), 1 million rows will result in ~240 MB of data. Ten tables of 1,000,000 rows each equals ~2.4GB:

root@vagrant:~# du -sh /var/lib/mysql/sbtest/
2.4G    /var/lib/mysql/sbtest/
root@vagrant:~# ls -alh /var/lib/mysql/sbtest/
total 2.4G
drwxr-x--- 2 mysql mysql 4.0K Jun  1 12:12 .
drwxr-xr-x 6 mysql mysql 4.0K Jun  1 12:10 ..
-rw-r----- 1 mysql mysql   65 Jun  1 12:08 db.opt
-rw-r----- 1 mysql mysql 8.5K Jun  1 12:12 sbtest10.frm
-rw-r----- 1 mysql mysql 240M Jun  1 12:12 sbtest10.ibd
-rw-r----- 1 mysql mysql 8.5K Jun  1 12:10 sbtest1.frm
-rw-r----- 1 mysql mysql 240M Jun  1 12:10 sbtest1.ibd
-rw-r----- 1 mysql mysql 8.5K Jun  1 12:10 sbtest2.frm
-rw-r----- 1 mysql mysql 240M Jun  1 12:10 sbtest2.ibd
-rw-r----- 1 mysql mysql 8.5K Jun  1 12:10 sbtest3.frm
-rw-r----- 1 mysql mysql 240M Jun  1 12:10 sbtest3.ibd
-rw-r----- 1 mysql mysql 8.5K Jun  1 12:10 sbtest4.frm
-rw-r----- 1 mysql mysql 240M Jun  1 12:10 sbtest4.ibd
-rw-r----- 1 mysql mysql 8.5K Jun  1 12:11 sbtest5.frm
-rw-r----- 1 mysql mysql 240M Jun  1 12:11 sbtest5.ibd
-rw-r----- 1 mysql mysql 8.5K Jun  1 12:11 sbtest6.frm
-rw-r----- 1 mysql mysql 240M Jun  1 12:11 sbtest6.ibd
-rw-r----- 1 mysql mysql 8.5K Jun  1 12:11 sbtest7.frm
-rw-r----- 1 mysql mysql 240M Jun  1 12:11 sbtest7.ibd
-rw-r----- 1 mysql mysql 8.5K Jun  1 12:11 sbtest8.frm
-rw-r----- 1 mysql mysql 240M Jun  1 12:11 sbtest8.ibd
-rw-r----- 1 mysql mysql 8.5K Jun  1 12:12 sbtest9.frm
-rw-r----- 1 mysql mysql 240M Jun  1 12:12 sbtest9.ibd

This should give you an idea of how many tables you want and how big they should be. Let’s say we want to test an in-memory workload, so we want to create tables which will fit into the InnoDB buffer pool. On the other hand, we also want to make sure there are enough tables not to become a bottleneck (or that the number of tables matches what you would expect in your production setup). Let’s prepare our dataset. Please keep in mind that, by default, SysBench looks for the ‘sbtest’ schema, which has to exist before you prepare the data set. You may have to create it manually.
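A minimal sketch of creating that schema and a matching user (the sbtest/pass credentials simply mirror the ones used in the commands below):

mysql> CREATE DATABASE sbtest;

mysql> CREATE USER 'sbtest'@'%' IDENTIFIED BY 'pass';

mysql> GRANT ALL PRIVILEGES ON sbtest.* TO 'sbtest'@'%';

With the schema in place, the data set can be prepared: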

root@vagrant:~# sysbench /root/sysbench/src/lua/oltp_read_only.lua --threads=4 --mysql-host=10.0.0.126 --mysql-user=sbtest --mysql-password=pass --mysql-port=3306 --tables=10 --table-size=1000000 prepare
sysbench 1.1.0-2e6b7d5 (using bundled LuaJIT 2.1.0-beta3)

Initializing worker threads...

Creating table 'sbtest2'...
Creating table 'sbtest3'...
Creating table 'sbtest4'...
Creating table 'sbtest1'...
Inserting 1000000 records into 'sbtest2'
Inserting 1000000 records into 'sbtest4'
Inserting 1000000 records into 'sbtest3'
Inserting 1000000 records into 'sbtest1'
Creating a secondary index on 'sbtest2'...
Creating a secondary index on 'sbtest3'...
Creating a secondary index on 'sbtest1'...
Creating a secondary index on 'sbtest4'...
Creating table 'sbtest6'...
Inserting 1000000 records into 'sbtest6'
Creating table 'sbtest7'...
Inserting 1000000 records into 'sbtest7'
Creating table 'sbtest5'...
Inserting 1000000 records into 'sbtest5'
Creating table 'sbtest8'...
Inserting 1000000 records into 'sbtest8'
Creating a secondary index on 'sbtest6'...
Creating a secondary index on 'sbtest7'...
Creating a secondary index on 'sbtest5'...
Creating a secondary index on 'sbtest8'...
Creating table 'sbtest10'...
Inserting 1000000 records into 'sbtest10'
Creating table 'sbtest9'...
Inserting 1000000 records into 'sbtest9'
Creating a secondary index on 'sbtest10'...
Creating a secondary index on 'sbtest9'...

Once we have our data, let’s prepare a command to run the test. We want to test Primary Key lookups therefore we will disable all other types of SELECT. We will also disable prepared statements as we want to test regular queries. We will test low concurrency, let’s say 16 threads. Our command may look like below:

sysbench /root/sysbench/src/lua/oltp_read_only.lua --threads=16 --events=0 --time=300 --mysql-host=10.0.0.126 --mysql-user=sbtest --mysql-password=pass --mysql-port=3306 --tables=10 --table-size=1000000 --range_selects=off --db-ps-mode=disable --report-interval=1 run

What did we do here? We set the number of threads to 16. We decided that we want our benchmark to run for 300 seconds, without a limit on the number of executed queries. We defined connectivity to the database, the number of tables and their size. We also disabled all range SELECTs as well as prepared statements. Finally, we set the report interval to one second. This is what a sample of the output may look like:

[ 297s ] thds: 16 tps: 97.21 qps: 1127.43 (r/w/o: 935.01/0.00/192.41) lat (ms,95%): 253.35 err/s: 0.00 reconn/s: 0.00
[ 298s ] thds: 16 tps: 195.32 qps: 2378.77 (r/w/o: 1985.13/0.00/393.64) lat (ms,95%): 189.93 err/s: 0.00 reconn/s: 0.00
[ 299s ] thds: 16 tps: 178.02 qps: 2115.22 (r/w/o: 1762.18/0.00/353.04) lat (ms,95%): 155.80 err/s: 0.00 reconn/s: 0.00
[ 300s ] thds: 16 tps: 217.82 qps: 2640.92 (r/w/o: 2202.27/0.00/438.65) lat (ms,95%): 125.52 err/s: 0.00 reconn/s: 0.00

Every second we see a snapshot of the workload stats. This is quite useful to track and plot - the final report will give you averages only. Intermediate results make it possible to track the performance on a second-by-second basis. The final report may look like below:

SQL statistics:
    queries performed:
        read:                            614660
        write:                           0
        other:                           122932
        total:                           737592
    transactions:                        61466  (204.84 per sec.)
    queries:                             737592 (2458.08 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

Throughput:
    events/s (eps):                      204.8403
    time elapsed:                        300.0679s
    total number of events:              61466

Latency (ms):
    min:                                 24.91
    avg:                                 78.10
    max:                                 331.91
    95th percentile:                     137.35
    sum:                                 4800234.60

Threads fairness:
    events (avg/stddev):                 3841.6250/20.87
    execution time (avg/stddev):         300.0147/0.02

Here you will find information about the executed queries and other (BEGIN/COMMIT) statements. You’ll learn how many transactions were executed, how many errors happened, and what the throughput and total elapsed time were. You can also check latency metrics and the query distribution across threads.

If we were interested in latency distribution, we could also pass ‘--histogram’ argument to SysBench. This results in an additional output like below:

Latency histogram (values are in milliseconds)
       value  ------------- distribution ------------- count
      29.194 |****** 1
      30.815 |****** 1
      31.945 |*********** 2
      33.718 |****** 1
      34.954 |*********** 2
      35.589 |****** 1
      37.565 |*********************** 4
      38.247 |****** 1
      38.942 |****** 1
      39.650 |*********** 2
      40.370 |*********** 2
      41.104 |***************** 3
      41.851 |***************************** 5
      42.611 |***************** 3
      43.385 |***************** 3
      44.173 |*********** 2
      44.976 |**************************************** 7
      45.793 |*********************** 4
      46.625 |*********** 2
      47.472 |***************************** 5
      48.335 |**************************************** 7
      49.213 |*********** 2
      50.107 |********************************** 6
      51.018 |*********************** 4
      51.945 |**************************************** 7
      52.889 |***************** 3
      53.850 |***************** 3
      54.828 |*********************** 4
      55.824 |*********** 2
      57.871 |*********** 2
      58.923 |*********** 2
      59.993 |****** 1
      61.083 |****** 1
      63.323 |*********** 2
      66.838 |****** 1
      71.830 |****** 1

Once we are good with our results, we can clean up the data:

sysbench /root/sysbench/src/lua/oltp_read_only.lua --threads=16 --events=0 --time=300 --mysql-host=10.0.0.126 --mysql-user=sbtest --mysql-password=pass --mysql-port=3306 --tables=10 --table-size=1000000 --range_selects=off --db-ps-mode=disable --report-interval=1 cleanup

Write-heavy traffic

Let’s imagine here that we want to execute a write-heavy (but not write-only) workload and, for example, test the I/O subsystem’s performance. First of all, we have to decide how big the dataset should be. We’ll assume ~48GB of data (20 tables, 10 000 000 rows each). We need to prepare it. This time we will use the read-write benchmark.

root@vagrant:~# sysbench /root/sysbench/src/lua/oltp_read_write.lua --threads=4 --mysql-host=10.0.0.126 --mysql-user=sbtest --mysql-password=pass --mysql-port=3306 --tables=20 --table-size=10000000 prepare

Once this is done, we can tweak the defaults to force more writes into the query mix:

root@vagrant:~# sysbench /root/sysbench/src/lua/oltp_read_write.lua --threads=16 --events=0 --time=300 --mysql-host=10.0.0.126 --mysql-user=sbtest --mysql-password=pass --mysql-port=3306 --tables=20 --delete_inserts=10 --index_updates=10 --non_index_updates=10 --table-size=10000000 --db-ps-mode=disable --report-interval=1 run

As you can see from the intermediate results, transactions are now on the write-heavy side:

[ 5s ] thds: 16 tps: 16.99 qps: 946.31 (r/w/o: 231.83/680.50/33.98) lat (ms,95%): 1258.08 err/s: 0.00 reconn/s: 0.00
[ 6s ] thds: 16 tps: 17.01 qps: 955.81 (r/w/o: 223.19/698.59/34.03) lat (ms,95%): 1032.01 err/s: 0.00 reconn/s: 0.00
[ 7s ] thds: 16 tps: 12.00 qps: 698.91 (r/w/o: 191.97/482.93/24.00) lat (ms,95%): 1235.62 err/s: 0.00 reconn/s: 0.00
[ 8s ] thds: 16 tps: 14.01 qps: 683.43 (r/w/o: 195.12/460.29/28.02) lat (ms,95%): 1533.66 err/s: 0.00 reconn/s: 0.00

Understanding the results

As we showed above, SysBench is a great tool which can help to pinpoint some of the performance issues of MySQL or MariaDB. It can also be used for initial tuning of your database configuration. Of course, you have to keep in mind that, to get the best out of your benchmarks, you have to understand why results look like they do. This would require insights into the MySQL internal metrics using monitoring tools, for instance, ClusterControl. This is quite important to remember - if you don’t understand why the performance was like it was, you may draw incorrect conclusions out of the benchmarks. There is always a bottleneck, and SysBench can help raise the performance issues, which you then have to identify.

Tags:  MySQL MariaDB sysbench performance benchmark
Categories: Web Technologies

Benchmark of new cloud feature in MySQL Cluster 7.6

Planet MySQL - Mon, 06/11/2018 - 23:49

In previous blogs we have shown how MySQL Cluster can use the Read Backup
feature to improve performance when the MySQL Server and the NDB data
node are colocated.

There are two scenarios in a cloud setup where additional measures are
needed to ensure localized read accesses even when using the Read Backup
feature.

The first scenario is when data nodes and MySQL Servers are not colocated.
In this case by default we have no notion of closeness between nodes in
the cluster.

The second case is when we have multiple node groups and using colocated
data nodes and MySQL Server. In this case we have a notion of closeness
to the data in the node group we are colocated with, but not to other
node groups.

In a cloud setup the closeness is dependent on whether two nodes are in
the same availability domain (availability zone in Amazon/Google) or not.
In your own network other scenarios could exist.

In MySQL Cluster 7.6 we added a new feature where it is possible
to configure nodes to be contained in a certain location domain.
Nodes that are close to each other should be configured to be part of
the same location domain. Nodes belonging to different location domains
are always considered to be further away than nodes in the same
location domain.
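As a sketch of what this looks like in the cluster configuration file, the relevant parameter is LocationDomainId; the node IDs, host names and domain IDs below are placeholders only:

[ndbd]
NodeId=1
HostName=10.0.1.10
LocationDomainId=1

[ndbd]
NodeId=2
HostName=10.0.2.10
LocationDomainId=2

[mysqld]
NodeId=51
HostName=10.0.1.20
LocationDomainId=1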

We will use this knowledge to always use a transaction coordinator placed
in the same location domain and if possible we will always read from a
replica placed in the same location domain as the transaction coordinator.

We use this feature to direct reads to a replica that is contained
in the same availability domain.

This provides a much better throughput for read queries in MySQL Cluster
when the data nodes and MySQL servers span multiple availability domains.

In the figure below we see the setup: each sysbench application is working
against one MySQL Server, and both of these are located in the same availability
domain. The MySQL Server works against a set of 3 replicas in the NDB data
nodes. Each of those 3 replicas resides in a different availability domain.

The graph above shows the difference between using location domain ids in
this setup and not using them. Some measurements are missing simply because
there wasn't enough time to complete that particular benchmark, but the
measurements that are there still show the possible improvements, and the
improvement is above 40%.

The bare metal servers used for the data nodes were DenseIO2 machines, and
the MySQL Server used a bare metal server without any attached disks; not
even block storage is needed for the MySQL Server instances. The MySQL
Servers in an NDB setup are more or less stateless, since all the required
state is available in the NDB data nodes. Thus it is quite OK to start up a
MySQL Server from scratch every time. The exception is when the MySQL Server
is used for replicating to another cluster; in this case the binlog state is
required to be persistent on the MySQL Server.
Categories: Web Technologies
