MySQL questions

From: Aaron Welch
I want to provide a MySQL high-availability solution for our web app. I am
looking to do MySQL replication with a master and two slaves, sending all
writes to the master while reading from the slaves. I am limited to only
three hosts, so each box will run a MySQL instance, with the two slaves also
running the web front ends. Do any of you have suggestions or caveats for
this arrangement?


Aaron Welch
"Enabling people to do great things with their own ideas."

Chief Mechanic @ HyvMynd Technologies

=============================================================== From: Billy ------------------------------------------------------ I did this in another dev/admin life. The thing I remember most about this is: auto-generated identity fields are calculated by each individual host. At least it was circa 2003. I used a binary log and took great pains to ensure everything got the replication updates. However, one day, something went wrong. I can't remember what. It was something like hdd space or a rebooted box or something. Maybe I screwed up the log numbers myself. Anyways, whatever it was left out a few rows in a generated table. That's how I learned inserts were processed per node, not by the master. A table on one slave node didn't match the others. The auto-generated id field was off by a few. This was bad. I had to rsync the tables later that night during a maint window. Moral: keep backups, and have a standby procedure to rebuild the slaves. --b

=============================================================== From: Dave Brockman ------------------------------------------------------ Auto-increment fields are writable-node specific; you need a distinct and unique offset per writable node. This leaves gaps in the auto-increment fields, so don't use them for "visible" data like order numbers, etc., as they will jump (i.e. 11, 23, 32, 46, 53, 68). Something's not right with that scenario. Only the master should be updatable, or you aren't really master/slaving, but multi-master replicating. I know MySQL doesn't call it that, but if you update more than one node, you have more than one master. None of this should apply if Aaron is truly using the slave nodes for reading and one master node for writing. Regards, dtb
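Dave's unique-offset advice maps onto two MySQL server variables. A minimal my.cnf sketch for a hypothetical pair of writable nodes (the values here are illustrative, not from the thread):

```ini
# Node A my.cnf -- hands out IDs 1, 3, 5, ...
[mysqld]
auto_increment_increment = 2   # total number of writable nodes
auto_increment_offset    = 1   # this node's position

# Node B my.cnf -- hands out IDs 2, 4, 6, ...
# [mysqld]
# auto_increment_increment = 2
# auto_increment_offset    = 2
```

With the increment set to the node count and a distinct offset per node, no two writable nodes can ever generate the same auto-increment value, at the cost of exactly the visible gaps Dave warns about.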

=============================================================== From: Aaron Welch ------------------------------------------------------ We are trying to cover for a master-write failure or a slave-read failure without dumping the whole app stack. -AW

=============================================================== From: Billy ------------------------------------------------------ This sounds vaguely like the natural keys, generated keys, sequences debate. Jumping is fine as long as it's consistent across all nodes (more on that later). Yes, it's called "trying to use MySQL as a real database". I honestly haven't used MySQL since 2005, but I was using its binary log per the instructions on clustering (master/slave). I looked in those logs and I saw a log of sql commands, almost verbatim to what was sent to the master. When I tcpdump'd the raw sockets between nodes, those contents were what I saw going across the wire. I can only infer that the slaves "replay" the sql commands from the log when they receive (or ask for) them. That's not true replication. It will work in a pinch. As far as I could tell, there was nothing to prevent you (or a bad program) from going to town writing on the slave nodes. In a nicer world, there would be protocol/config constraints to prevent anything but the master log from updating persistent tables on the slaves. Question is: does MySQL have those constraints now? --b

=============================================================== From: Dave Brockman ------------------------------------------------------ If you don't assign each writable node its own unique offset, you have a chance of collisions. Replicated nodes, current auto-inc field is 23. I insert a record on server A, it gets ID 24; I insert a record on server B prior to replication, it assigns ID 24. Guess what happens when replication occurs? Is that not how a SQL transaction log is replayed? I haven't sniffed it since MS-SQL 2000, but it used to do the exact same thing: replay the exact SQL commands that were sent to the master. That is exactly what it does if you are restoring from a transaction log. I don't know what format it's stored in, but the inserts happen, transaction by transaction... MS-SQL replication by default does the same thing, and if you don't have an Enterprise license, you actually have to wait for the replicated server to acknowledge the transaction before you can continue. Expensive Enterprise licensing lets you fire it and continue before the remote server acknowledges the write. MySQL uses user/host/password for security. You would need to ensure your replication account has a damn good password, and is only allowed from the other node, not anywhere else. Very similar to how you protect your MySQL "root" account. Or apparently there are more correct ways to do this, including specific replication-only permissions, as well as configuration options to address this very issue:
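The replication-only permissions Dave mentions at the end do exist as a dedicated privilege. A hedged sketch (the account name, host address, and password are placeholders, not from the thread):

```sql
-- Account usable only from the slave's address, with nothing
-- beyond the privilege needed to stream the binary log.
CREATE USER 'repl'@'192.168.1.12' IDENTIFIED BY 'a-strong-password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'192.168.1.12';
```

Because the grant carries only REPLICATION SLAVE, the account cannot SELECT or modify table data directly, and the host restriction keeps it unusable from anywhere but that one node.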

=============================================================== From: Billy ------------------------------------------------------ Same thing if you insert a row with a unique key constraint. The issue in this scenario is as you pointed out earlier and below: why are rows being inserted into one server outside of this master/slave architecture? That couldn't be good in any scenario, regardless of whether you have replication set up or not. Having tables that are supposed to be the same, and having something else insert rows that aren't guaranteed to be on the other server, is a recipe for disaster. I think auto-inc fields would be the least of your worries here. The main thing is, there is a contract. This contract says (master/slave): whatever you do on A will be duplicated on B. If it's master/master, then the reverse is true as well. If there are loopholes in the contract, well, then the design is bad. If auto-inc fields aren't supported in a cluster, then the contract better not allow someone to create a table with them! They also attempt to transmit other interesting data, such as row hashes, conflict resolution (if required), and other data to ensure that the command it's about to play will have the same result on the destination as it did on the source. I couldn't find the exact verbiage for MS-SQL, but Oracle talked plenty about it. It makes sense, unless you're just pushing to "subscribers" and hoping they deal with the data in a sane way. That's kind of a Deist notion of database replication: the great creator sends the update and then forgets about it, hoping for the best. I would hope my database treated the data a little bit more... monotheistically :) I was under the impression that if you updated the mysql users table on the master, it would have just replicated it to the slave! I'm glad you did the homework! Unfortunately, I wrote the last two emails from my iPhone over 3G. --b

=============================================================== From: Dave Brockman ------------------------------------------------------ And just in case I'm not translating well, I'm asking more questions than I'm making statements. I *know* you have much more db experience than I do. I think maybe I'm more confused about your crash experience; I don't understand why it was so horrible. I thought MySQL had a consistency-check type of option that could compare and repair inconsistent data on a slave. I know you can drop the slave config and database, re-add, re-create and restart, and get a good sync... While it would be a wonderful, great world if all the tools and applications I used warned me that option A for parameter B will not work because I turned on option Z, it has been my experience that this just does not happen. I agree with your logic, but I make my living because someone has to be smarter than the computer and the programs residing on its hard drive. If (pick your OS to flame) actually held your hand and told you exactly how to do everything you want to do -- the Right Way (tm) -- us geeks would have to find a new way to make a living, I suspect... What I meant was, the query you sent from your web app to the local database server does not return an answer until all the remote servers acknowledge the transaction in addition to the local one. I didn't mean the replication itself acted in such a manner. Nice option to a feature: require quadrupling your license expenditures. Don't replicate the mysql database, or don't replicate the user table? I admit when typing that I was thinking your replication user was being used to update data on the slave, but I see your point. Specifying the read-only databases is probably the best solution to that issue as well.
I know, the damn iPhone auto-correct just refuses to let you type meaningful conversation, doesn't it? :) Regards, dtb

=============================================================== From: "Alex Smith (K4RNT)" ------------------------------------------------------ I hate to toot my own horn, but have you tried an Amazon EC2 instance, or using their database offering? I do work for them, but I don't get any type of kickback for advocating their products; I just really like their offerings. :)

=============================================================== From: Aaron Welch ------------------------------------------------------ We are eventually going to do something other than MySQL, and we went with Rackspace Cloud due to costs. We were running on EC2 before jumping to Rackspace, but our instances became incredibly slow after the EC2 crash debacle. So based on all that, we will not be relying on Amazon for anything critical any time soon. I am looking to build an OpenStack private cloud for our dev and backup environments though. -AW On May 24, 2011, at 9:20 PM, "Alex Smith (K4RNT)" wrote:

=============================================================== From: Billy ------------------------------------------------------ I forget who mentioned this (I think it was ebwolf), but this notion of "To The Cloud!" to solve everything needs to stop. It's a tool. Nothing more. Nothing less. It's a great tool. It has hard edges, and it can cut you. Use it to open the wrong thing, and it will bend and break. Use it to protect your business, and someone will use it against you. It's in Amazon's (and Rackspace's) best interest to market it as the "End all. Be all" killer app. Wouldn't you? Such is the IT life, however. IT should be referred to as a "practice", much like doctors and lawyers. It's not perfect, and we make a lot of it up as we go along because things change so damn fast. I bet Amazon tested and double-tested everything. It was a perfect storm; they got hit with their own Katrina and the levees fell. As IT folks, we should know better. We're the quarterbacks in this game, so we should be calling audibles even if our coaches are screaming at us to do a Hail Mary. Ok, I'll stop. I couldn't think of any more metaphors to pass in... (ok, I lied) --b

=============================================================== From: Dave Brockman ------------------------------------------------------ I'll go even farther than that... Wrong hand touched the wrong button; human error, if the press is to be believed. It's the same with everything else in this industry: *nothing* is infallible. Nothing is 100%. Hell, 99.99% is hard enough; how many *years* before Amazon can talk about five nines again? Regards, dtb

=============================================================== From: Aaron Welch ------------------------------------------------------ /begin defend the cloud decision We chose to go to the cloud because we are a startup on a tight budget that provides line-of-business applications to businesses. Any type of service interruption to our customers would be catastrophic to our business right now. By pushing our app into the cloud we guard against things that would be trivial to other types of businesses, such as a temporary power or network outage. The other points are: easier to scale at speed with minimal cost and headache; easier and safer application upgrades using snapshots before major code rolls; no hardware maintenance; and try-before-you-buy advanced features without the extensive proof-of-concept setup required to do it in a colo with dedicated hardware. This decision was not taken lightly or without a lot of consideration for what-ifs. We will eventually outgrow our current cloud environment due to memory constraints and simple monthly costs. Building in the cloud out of the gate allows us to get up and running fast on a redundant and distributed system that would cost us 10x as much to do with dedicated hardware. As we move into architectures that we control using OpenStack and Cassandra, things will dramatically change from an operational standpoint. This is my short answer to a week-long discussion we had in house. What finally tipped the scale for me was working on a "hardware solution" for our dev environment for 5 hours, only to find that each of the servers had one or more problems that prevented it from working in the cluster. It took all of an hour to get the same solution up and running on the cloud at Rackspace, and it is costing us $50 a month (this is a small deployment for POC). Just my experience, YMMV. -AW

=============================================================== From: Billy ------------------------------------------------------ Things worked GREAT as long as every log file (entry) was retrieved by the slave like it was supposed to be. However, there were a few constraints that restricted what could be done. (A) Log files take up space. (B) Space isn't unlimited. I'm stretching my brain here, trying to remember what happened, and I'm coming up blank. I do know that I had a process that cleared out the "picked up and processed" logs nightly, using a nifty sql command on the master. What I don't remember is what caused one of the slaves to get unsynced. From what I remember, the table that had the issue wasn't so much a game stopper if it missed one (or a few) row(s), and I was fully prepared to update those rows at a later time. What I wasn't expecting was that the master didn't tell the slave the auto-inc field it derived. If the master can derive the id field, then surely it should send that down the wire? Nope! That was the big surprise, which caused fatal errors on down the road (obviously). It probably was in the documentation at the time, but I'm fairly certain I researched that. Maybe I didn't. Who can say at this point? [**** IMPORTANT FOR AARON ****] My original moral to the story is sound, however: have a backup and a way to sync the slaves manually. [**** ****] Today, you could pause the database updates, create a snapshot volume, and rsync the database snapshot to the other server. Snapshots under LVM in RedHat were kind of new then, and it probably wasn't running under LVM in production anyways. I think I could still do a hot backup. Still, if you have the option of stopping the server, copying the files over and restarting, you're going to FEEL a lot better about it. Sometimes you don't have the option, so you need to rely on scary "hot backup" scripts and "pause the database" commands with snapshots. True.
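Billy's pause-snapshot-rsync procedure can be quiesced from the SQL side. A sketch of the commands involved (the snapshot itself happens outside MySQL; this is an outline under assumptions, not a tested runbook):

```sql
-- Block writes and flush table data to disk; the lock is held
-- for as long as this session stays open.
FLUSH TABLES WITH READ LOCK;

-- Record the binary log file and position so a rebuilt slave
-- knows where to resume replication from.
SHOW MASTER STATUS;

-- (take the LVM/filesystem snapshot here, from a shell)

-- Release the lock and let writes resume.
UNLOCK TABLES;
```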
If everything was just point and click, none of us would be in IT. I'd probably be a lawyer, scientist or Marine, depending upon which cute tail I had chased or not chased back in the day... Still, we shouldn't be afraid to use a highlighter and say, "This is bad software/hardware design, and puppies will cry if we use it." I got that part. That portion was my thought from the previous paragraph, and it must have been so great that it wrapped around your quoted text and manifested itself in attack formation, attempting to swarm it on all sides. Or maybe I just trimmed poorly. I don't believe selective table replication was an option then. Hell, I can't remember. It doesn't matter at this point. We're so off topic, I could write the lyrics to an 80's hair band and most people would just delete this message and miss it. No, this is a learned skill from being in a pointy-haired corporate cubicle environment for way too long. --b

=============================================================== From: Billy ------------------------------------------------------ I didn't mean to come across as suggesting that you didn't do your homework. I know you better than that. Yet the old adage "you get what you pay for" is poignant here. I think there is a LOT of value in EC2 and Cloud Storage. I use it, too! Yet it's really cheap, and if I had lots of money riding on it, I know there's not a lot of insurance there. It's a risk assessment correlated with the perceived value. Once that goes below 0, then I'm sure you'll be spending the money on something else that meets your needs! In a way, you're displacing your risks from god, nature or man and placing them in the sphere of RS. The risks are the same; you're just trusting that they'll be better prepared than you! I live in a place that gets hurricanes and will probably flood in the next 20 years. My infrastructure is not immune to fires. There are some data things I cannot replace. I have them replicated in the cloud and at home -- probably other places, too. This fits my needs, as I'm fairly confident Amazon's S3 infrastructure is better prepared for hdd failure, hurricanes, tornadoes and floods than I am. Another good one -- how you want your solution (can only tick two): [ ] Cheap [ ] Quickly [ ] High Quality --b

=============================================================== From: Billy ------------------------------------------------------ Wow. I'm not sure I'd go that far! It's a tool, and it fits some specific problems. People just need to be aware, and treat it for what it is. I'd like to see more mixed architectures of cloud and conventional applications. And by "see more", I mean more stories in the media about them; to me, that's a much more realistic point of view. --b

=============================================================== From: Aaron Welch ------------------------------------------------------ Option 4 is particular to SaaS models like us: Highly Available. You can get 2.5 options out of the list, but cheap is time-constrained based on growth, and quickly goes out the door after your POC is proven. We will go "cheap" cloud until we outgrow it, and our quality will (hopefully) increase with experience and market acceptance. -AW

=============================================================== From: Aaron Welch ------------------------------------------------------ We will be in a "split hybrid" architecture by the end of the year. DBs will run on dedicated hardware, and our app servers (read: LAMP boxen) will be deployed from AMIs in our private cloud. We are looking forward to the day when our budget (both time and money) supports us deploying Apache Cassandra. We plan on doing a lot of analytics on our data, and using DataStax Brisk is really appealing from a deployment and support standpoint. I hope that our little company grows by leaps and bounds towards the end of the year so that we can use some of the cool architecture and technology that we are currently reviewing. Once we are out of stealth mode I will share more about what we do and why. That way I can have the tin-foil-hat people review our public posts before I make major PR mistakes. -AW

=============================================================== From: Dave Brockman ------------------------------------------------------ Yeah, that's what I thought the point was; the article was actually a response to the "abandon" article. It has some valid uses, it has some deficiencies; we should be aware of these and make informed decisions based on that knowledge. Exactly! Unless SaaS is your business model, you're probably not moving everything to the cloud. I haven't heard anything about integrations like this! Regards, dtb

=============================================================== From: Dave Brockman ------------------------------------------------------ Hehe, you should make us some pretty graphs showing the inverse relationships among Cheap/Fast/Reliable. I want a bottle of good Jack if you figure out a way to make money out of that idea, though ;) Regards, dtb

=============================================================== From: Dave Brockman ------------------------------------------------------ Didn't mean to start the whole Cloud vs. no-Cloud holy war. I actually think your particular business model is the perfect venture for the cloud, and you know why; you've stated the reasons. The one totally awesome, super effing cool thing about the cloud is the ability to dynamically scale up on demand, on a pay-as-you-play model. Granted, the top end of that costs a lot to play, but it seems like you've got that covered too, per your next post :) Regards, dtb

=============================================================== From: Aaron Welch ------------------------------------------------------ You know... I think that will be my topic for DevChat. "Deploying SaaS Apps Into the Cloud", aka how not to crash and burn with your VCs' money. -AW

=============================================================== From: Aaron Welch ------------------------------------------------------ I am building a spreadsheet and collateral docs about the breaking point for cloud vs. dedicated deployments for memory-intensive applications. We need this for what we do, but it would prolly be a good thing to share back into the world. The one thing the cloud really has going for it is the ease of deploying into diverse networks. With a dedicated stack solution you are silo'd into a single physical location with, hopefully, many redundant connections to the net. A place like this does not exist locally in Chattanooga, so it would be really expensive for us to deploy this model out of the gate in Atlanta, for example. A mixed hybrid solution using cloud web frontends and dedicated data storage (both structured and unstructured) is a second transitional step. Our final architecture is going to be a public/private cloud, with our hardware mixed with Rackspace and Amazon, with a dedicated and diverse data storage ring using DataStax Brisk or Apache Cassandra. Keep in mind that we currently make products for 3 separate industries, so each has different business processes that generate different data models that will be a nightmare to support and replicate using a traditional DB. This is especially true when rolling out new features that require us to change existing data while maintaining system performance across ALL of the application stacks and in ALL of the replication nodes. Erg... sorry to be info-dumping on the list. I do appreciate the feedback, though. -AW

=============================================================== From: Mike Harrison ------------------------------------------------------ Dave: Correct... and honestly, for "trash data" it's a perfectly valid way to cache a lot of data to another server, but I'm not so sure it's the best. Worth a try. On the "slave/copy/clone" systems' database I would make the local web application a read-only login. You might even have a local database on the slaves that, via a different application/script, updates the master. In reality, if you really need that kind of scale for database stuff, the "master" should be a cluster, and then also set up some read-only clones. Just guessing at the table structures/usage, it might be easier/better to update the trash-data tables on the clones and clients via a script that runs every minute (or at random times of approximately a minute): connect to master, look for new data since 5 minutes ago. If no data, nada. If new data, update. Sleep. Repeat.
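Mike's poll-the-master loop boils down to one small query run on a schedule. A sketch, assuming the trash-data table carries an indexed timestamp column (the table and column names here are hypothetical):

```sql
-- Fetch only rows changed in the last 5 minutes; an empty
-- result means the script just sleeps and tries again.
SELECT *
FROM   trash_data
WHERE  updated_at >= NOW() - INTERVAL 5 MINUTE;
```

Overlapping the window (5 minutes of data for a roughly 1-minute schedule, as Mike suggests) trades a few duplicate rows for never missing one, so the update step on the clone should be idempotent.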

=============================================================== From: Mike Harrison ------------------------------------------------------ I think with some proper application configuration and slave server configs it is possible to do that fairly well.

=============================================================== From: Mike Harrison ------------------------------------------------------ I love mysqldump. I hear there may be better ways, and I need to explore them, but so far, "mysqldump -c databasename >file.sql" rocks. Via cron. As often as I can.
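Mike's via-cron habit is a one-line crontab entry; adding a date to the filename keeps runs from clobbering each other (the path and schedule here are illustrative):

```
# crontab entry: nightly at 02:30
30 2 * * * mysqldump -c databasename > /backups/databasename-$(date +\%F).sql
```

Note the escaped `\%`: cron treats a bare `%` in the command field as a newline, a classic crontab gotcha.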

=============================================================== From: Kenneth Ratliff ------------------------------------------------------ Which doesn't change how replication actually works. Replication doesn't just copy the data from one place to another. It copies the QUERY. So the master may initially take the update or insert, but when it replicates to the slaves, the slaves will actually run that query as if the update had taken place on that node. So if you run an ALTER TABLE on the master, and you leave logging turned on on the slaves, when that command replicates to the slaves, it will fail, and replication will stop. Or, let's say you've got a big database on a master, with a 7 gig table that's crashed. It's not crashed on the slaves, but it is on the master. So you run REPAIR TABLE on the master. Well, guess what: that repair command is going to replicate to the slaves as well, who will also run repairs on this very large table. This is a good way to bring your application's processing to a grinding halt (I learned this from very painful experience, and was subsequently enlightened as to the existence of the REPAIR LOCAL TABLE command).
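Kenneth's hard-won lesson is a one-keyword fix. A sketch (the table name is made up):

```sql
-- Written to the binary log: every slave will replay the
-- repair and grind through the whole table too.
REPAIR TABLE big_table;

-- Repairs only the server it runs on; nothing hits the binlog.
REPAIR LOCAL TABLE big_table;
-- LOCAL is an alias for the longer spelling:
REPAIR NO_WRITE_TO_BINLOG TABLE big_table;
```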

=============================================================== From: Kenneth Ratliff ------------------------------------------------------ It depends on how you set your replication up. You certainly can force the mysql database to replicate, but it's usually not a good idea. Most of the setups I've dealt with explicitly define which databases are replicated, and mysql is not one of them, so authentication is per node.
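Explicitly defining which databases replicate, as Kenneth describes, is a my.cnf setting on the slave. A sketch with a hypothetical application schema name:

```ini
# Slave my.cnf: replicate only the application schema, so the
# mysql grant tables stay local to each node.
[mysqld]
replicate-do-db = webapp

# Or invert the logic and skip just the system database:
# replicate-ignore-db = mysql
```

Listing what to replicate (do-db) fails safer than listing what to skip, since a newly created database won't silently start replicating.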

=============================================================== From: Kenneth Ratliff ------------------------------------------------------ Do me a favor and go kick the engineer who caused the EC2 outage ;)

=============================================================== From: Kenneth Ratliff ------------------------------------------------------ You also take the risk of things that are beyond your control bringing your service down. The Amazon EC2 crash is a very good example of this. I personally have quite a few issues with IaaS, but given the nature of the company I work for, that's probably not surprising. I agree that IaaS can be beneficial to a startup, but there will come a certain point in time when you'll want to bring it back in house, so to speak.

=============================================================== From: Kenneth Ratliff ------------------------------------------------------ Off of your slaves, of course ;) Lord knows you don't want to make your master that unhappy.

=============================================================== From: Billy ------------------------------------------------------ There are actually several ways dbs perform replication. Transactional replication is just one of them. There is normally more information that should go along with it besides just the sql statement. Like I said earlier... I want a more monotheistic handle on my data than a deist one. --b