Your Data Lives Somewhere
Databases, and the very definition of one, come in all shapes and sizes. But ultimately every definition has the same roots: a system with which data can be stored persistently and reliably. Persistence (it’s not lost when the application closes or the machine shuts down) and reliability (you will be able to retrieve as needed and have some sort of understanding of what it is you’re retrieving when doing so) are two adjectives that are absolutely critical to a database. Without them you have little more than nothing.
Virtually every application is doing to use a database at some point or another. We live in a data-driven world. When you go to Target and buy socks, that transaction is stored in a database. When you go to your physician’s office for a check up, they store that in your EMR (electronic medical record), otherwise known as a database. Every time you cut a check out to your mortgage company or landlord and they deposit it to their bank, it’s recorded in a transactions database.
If we were to pull every bit of data, from every organization/enterprise/mom and pop shop/healthcare facility/etc., for any given person over the age of a few seconds old, you would be inundated with information. Today it is a possibility (if you got your hands on enough warrants to pull the necessary data from every entity) to effectively map out your entire life, day by day, minute detail by minute detail.
It’s scary to think about, but the next application you’re going to write may very well just contribute to that fact. So it’s important you know just how critical databases are to the world.
Relational Databases Saved Us
In the 1970s it became very clear just how much data we were storing. Now at this point in time it’s very small, but keep things in perspective: technology as we know it today was just starting to get ready to kick off.
As datasets became more complex it was realized that a better way of storing them was needed. Enter relational databases. The relational model solved a lot of problems and quickly became the new de facto standard. Databases were finally able to modeled after real world data sets.
Strict schemas could be set in place to ensure data reliability and consistency. Atomic transactions were realized (importance of which is not so important to us in this tutorial). Efficiency in interacting (querying) data was improved by drawing relations between the datasets. A database was finally able to effectively communicate “I have this employee, and he has many duties assigned to him, but I only have a single record on file for his start date (as I should).”
And this model has persisted as the go-to standard right through to today. Oracle Database, Microsoft SQL Server and MySQL represent the top three database applications used across the world. The former two are almost-strictly seen in enterprise/businesses only. MySQL is the five hundred pound gorilla in the web and open source worlds. And it’s what we’re going to be getting our hands dirty with today!
MySQL would come along in the mid-90s, just in time for the dot com boom (timing is everything). Between it being open source and free (as in free beer), MySQL quickly picked up popularity. The demands of early web applications brought about a simple realization: this was going to be a data-driven domain. And MySQL facilitated that by providing an RDBMS (Relational Database Management System) that early webmasters could take advantage of. To this day MySQL remains in the top three of all RDBMSes. It is arguably only beat by Oracle’s flagship database system, which is used by virtually every one of the Fortune 100 companies (pretty big deal).
You Must Communicate Somehow
An RDBMS is useless if it cannot be interacted with. This is where SQL (Structured Query Language).
What Good Is SQL?
Without SQL you have nothing but a service running on your PC or server. You can forge connections to it (hopefully) but you cannot chat with it. Basically you have allocated resources that you’re gaining nothing out of. SQL fixes that. It is through SQL that we do everything with our RDBMS. Creating a new user? It’s done in SQL. Formatting output prior to it hitting your application? SQL does that too. Create a new relation to store data? Yep, SQL. Placing actual data in that relation? You get my point. SQL and relational databases go hand in hand. Even more so than peanut butter and jelly. After all, peanut butter without jelly is still delicious.
Where Will I Use SQL?
You’ll use SQL with any and all relational database systems. It is specifically purposed for relational databases, so that’s the extent to where you’ll use it.
How Is SQL Used?
This will vary for every reader. And for the purpose of this article we’re going to strictly keep this within the scope of MySQL. If you’re a PHP web developer then you’ll use an extension such as MySQLi or PDO. Ruby on Rails developers typically use the ActiveRecord object-relational mapping (ORM) gem.
Virtually every language supports some form of interaction with MySQL, be it more direct (such as the MySQLi extension in PHP) or an ORM library/extension. Connectors even exist for languages such as C++, C# and Visual Basic.
If it’s a mainstream language then the tools exist to interact with MySQL. We’re going to be interacting with MySQL directly through the command line in this tutorial. Everything we learn can be translated one-for-one into the language of your choice.
The queries we’ll see and dissect are valid regardless of how you interact with MySQL. But by doing so without any frills we are able to focus solely on the language itself. This also alleviates any concerns of abstractions deterring you from learning the nuts and bolts, which is what this tutorial aims to do introduce you to.
Let’s quickly go over installation MySQL on Windows and Mac OS X. If you’re using Linux then I would encourage you to install MySQL via your distro’s package manager (Yum, Apt, etc.). But given most will be on Windows, and the few that aren’t are likely on OS X, I’m going to focus on them specifically.
For Windows your best bet is to grab the MSI installer you can find by clicking here. It’s a GUI installer, so you’ll just need to follow the prompts. If you want MySQL to start automatically, then be sure to install it as a service. And if you have the option to do so, choose to have it added to your PATH. That will make the rest of this tutorial go much smoother.
Mac OS X
A couple options exist for you. If you have Homebrew installed and set up then you can simply (via terminal):
brew search mysql
Then install the latest MySQL supported (likely mysql55- MySQL 5.5). DMGs for MySQL can also be found here. If you do not have Homebrew set up then this is your best bet. It will also get you the latest version (5.6 at the time of writing this), so that may be preferable anyways.
Final Steps for Following This Guide
We have just a couple steps to accomplish before we can really start this guide.
If you’re on Windows and installed MySQL as a service then this should already be done for you. But you can quickly check by doing this (open your command prompt and follow along):
C:\Program Files\MySQL\bin\mysqld.exe status
Change the path there to wherever you installed MySQL to. The result should either be an error because MySQL isn’t running, or success confirming it is. If you’re told it’s not running then issue the following:
C:\Program Files\MySQL\bin\mysqld.exe start
To make the rest of this tutorial easier, just change your directory to the MySQL bin directory:
cd C:\Program Files\MySQL\bin
Changing the path to be appropriate for your installation, of course. Now you can simply call the application instead of writing out the full path:
mysqld status OR mysqld.exe status (if the above doesn't work)
And while you’re at it, add that path to your PATH directive so you don’t have to constantly type it out. OS X users should simply try (in terminal):
If an error is received because MySQL is not running then issue:
It’s impossible for me to know every use case here for what may be wrong with any given OS, so please post below if you have issues with this step. And I’ll flesh this section out with scenarios.
Your first time logging in will be with the root user. However you’re accessing the mysqld application (as you did above to start it), do the following:
mysql -u root
If MySQL installed correctly and is running, you should see the following: There will be copyright information above it as well. But if you do see this then congratulations: you’re connected to MySQL! Let’s rock and roll.
At this point you should be sitting in front of a MySQL prompt. We’re now going to start using SQL. Specifically: we’re going to create a database to play around in and a user to access that database! Let’s start, shall we?
We Need a Database
MySQL is the RDBMS. It’s a type of database system. Within it are databases that contain relations that contain data. Think of a database as a container for all of your relations (tables). Databases are usually created on a one-to-one basis per the applications accessing them. So for example, your MyBB forum has a database and your WordPress blog has its own. Both databases may be served by the same RDBMS instance, but they’re separate from one another internally. We’re going to create a database specific for our testing here. Type and execute (hit enter) the following:
CREATE DATABASE my_testing;
If successful you should see this: This is telling you that one database was created. But what exactly did we just do?
Statements and Functions: SQL Has Them in Spades
Like most languages, SQL too has statements and functions. Functions in SQL may take parameters; statements may take flags, data types, labels or other tokens (to an extent, so too do functions). In the above example we used the CREATE DATABASE statement. There are a number of flags we could have set (IF NOT EXISTS, for example, to ensure we don’t err out if the database already exists) or we could have designated the character set. But the above is perfectly acceptable and what you will see most commonly. CREATE DATABASE is the statement call and my_testing is the label we’re passing it. This basic syntax is seen all over SQL. With the database created we now need a user to interact with it. In a moment I’ll explain why it’s beneficial to use database-specific users instead of root. But for now just follow along.
Creating a User
With the database created and stored in MySQL, we can now create a user specifically for it. Let’s do that now:
CREATE USER 'myname'@'localhost' IDENTIFIED BY 'mypassword';
You should have an OK result from this query, if done correctly. Again, we see the statement call, but this time it’s CREATE USER. As the name describes, it’s going to create a user with the information following it. We’re passing the following:
On the left of the @ symbol we have the username. Right now you’re using a user with the username of root. And we just created a user with the username of myname. It’s a string type, so we encase it in quotations (single or double- I went single). Now the @’localhost’ may look a little weird. But it’s actually quite simple: we only want this user being able to access the database from the local machine and not remotely (outside our network). This does just that. You could change that to allow specific IP address, IP address range, wildcard, etc. But for our purposes, localhost is just fine. We also called another statement: IDENTIFIED BY with a string passed to it. This sets a password for the user. Change ‘mypassword’ to whatever you’d like. Remove the statement entirely if you want. But it’s best practice to give every user a password.
Not All Users Are Created Equally
Before we can log out and log back in with our new user, we need to give it some permissions. Right now our user can do just about nothing. We need to fix that. We’re going to pick and choose what rights this user needs for this tutorial. These statements that we’re allowing don’t need to make sense to you right now; just know that they’ll allow you to follow along with this tutorial:
1 2 3
GRANT CREATE, INSERT, SELECT, UPDATE, DELETE ON my_testing.* TO 'myname'@'localhost';
You’ll notice I broke this statements out onto separate lines. You can do so, or just write them all on one line, separated by a space. I like to break longer queries out by statement, just to see the query better. And subsequently spot errors more easily. But that’s a coding style preference that you’ll come into your own on. In this query we are issuing the GRANT statement. It expects a list of statements that the receiving username should be allowed to perform. So we’re allowing the CREATE, INSERT, SELECT, UPDATE and DELETE statements. We need all 5 of these for this tutorial. We then specify, via the ON statement, what database and relations these rights apply to. You’ll notice we’re passing the database we just created. And we’re appending a .*. Why? Because that’s the nomenclature to say “every relation belonging to this database.” We could list specific relations in the same format (database.relation). But we don’t need to be that granular here. Your mileage will vary as you progress into bigger and more complication databases. Lastly, we state who these rights are being applied to. And unsurprisingly, it’s the user we just created, restricted to localhost still. And with that we have the user we’ll use from here on out! Exit MySQL:
And log back in:
mysql -u myname -p
Entering the password you set at the prompt. You should now be sitting in front of the MySQL prompt once again, ready to go!
Stricter Is Better
So I’ll level with you: most developers just use their root account. The few that don’t only don’t because they’re forced not to by the likes of cPanel. This is unfortunate. The root account has god-like privileges to the entire RDBMS instance. That is: every piece of data, in every relation, in every single database. And even beyond that, root has access to system setting, users, and other nitty-gritty details that you do NOT WANT AN ATTACKER HAVING ACCESS TO. The stricter your permissions the better, in the event an attacker gain access to your MySQL server with said user. Will an application ever need to completely drop a relation (delete it)? If not, then don’t allow the user that application accesses through be allowed to drop relations! This should be common sense, but unfortunately few follow it.
We’re logged in with our new myname user and ready to roll? Good. I have formatted the rest of this tutorial to introduce the basic CRUD statements within SQL. “CRUD you say? What is CRUD!?” Create Read Update Delete These four actions represent the basics of any data-driven application. Data is constantly being stored, read, changed and removed. So let’s take a look at how to do that in SQL.
I like to format my guides with little stories. Maybe it’s a distraction, maybe it helps. I don’t care- I like to do it. And like the other guide I wrote where I did it as a story, we’re going to following the work life of Joey the Database Administrator (he has a name now I guess…). He works for Company XYZ and answers to Mrs. CIO. Oh and we’ll see some poor performance by Mr. CFO (it’s entirely coincidental that I just chose green for the CFO) that will affect our story. Stay tuned!
Joey gets a call one day that he has been chosen for a contract position with the new company in town. Mrs. CIO, already being lazy, tells Joey that his first task is to make it so the database can store not only the two employees they are hiring, but also the date they started. Work anniversaries are important, after all. Just that- HR will paper-trail the rest of their vitals, because that’s so efficient (not really- it just makes it easier on me when writing this guide). Joey gets to work.
Relations (Also Known as Tables)
We know that everything is encapsulated by databases. And we just recently created a database to store our stuff. What databases really store are relations; you will also see them called tables. Think of them as Excel spreadsheets: you have columns defining what the data is, and rows that contain that actual data. Relational schemas are hard-defined. That is, you create them and then you use them. And while using them all data being placed into the relation must fit within the schema’s rules. Let’s just create a relation and dig into it.
We saw above how relations can be accessed via the database.relation_name nomenclature. Well typing that out all of the time gets tedious, so we issue the USE statement. We can now reference relations within the database without explicitly typing the database name.
1 2 3 4 5 6 7 8 9 10 11 12 13
CREATE TABLE employees ( emp_id INT UNSIGNED NOT NULL AUTO_INCREMENT, emp_first_name VARCHAR(16) NOT NULL, emp_last_name VARCHAR(20) NOT NULL, PRIMARY KEY(emp_id), UNIQUE(emp_first_name, emp_last_name) ); CREATE TABLE employee_start_dates ( emp_id INT UNSIGNED NOT NULL, emp_started_on DATETIME NOT NULL DEFAULT NOW(), UNIQUE(emp_id) );
Issues with the employee_start_dates table?
When I wrote this guide I was just getting done playing around in MySQL 5.6. Prior to 5.6.5 you could only pass constants as DEFAULT values (this is a long awaited feature). Let’s show you the better query to use if you’re on, as you probably are, MySQL 5.5:
1 2 3 4 5 CREATE TABLE employee_start_dates ( emp_id INT UNSIGNED NOT NULL, emp_started_on TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, UNIQUE(emp_id) );
It’s the same idea, but we’re using a TIMESTAMP data type instead. What happens with this is the Unix timestamp is stored (number of seconds since the Unix epoch) instead of a date in the YYYY-MM-DD HH:MM:SS format. The CURRENT_TIMESTAMP constant is just that- the timestamp at the time of the record being saved.
And thus enters the CREATE TABLE statement! This will create a relation with the specified schema. Effectively, you define the columns that make up the table and the type of data they need to contain. These columns are separated by commas; think of it as a list. In the above example we’re creating two relations. In the employees relation we have the following: emp_id: An unsigned integer (0 to 4294967295) that does not allow NULL values and it auto increments each time a new record is stored. emp_first_name: A variable character that accepts up to 16 characters (after that it just truncates) and does not want any NULL values. emp_last_name: Another variable character, this time good for up to 20 characters, that also does not want to see any NULL values. You should see the basic format:
[label] [data-TYPE] [options]
MySQL has a number of different data types. Four very popular datatypes are:
|INT||Basic 32bit integer, up to 11 digits wide, maxing out at 4,294,967,295 if unsigned; 2,147,483,647 if signed (negatives allowed).|
|DATETIME||The current date and time (YYYY-MM-DD HH:MM:SS format).|
|CHAR(LENGTH)||A string with a fixed length; CHAR(5) will always expect, and store, 5 characters. If less than 5 are presented then MySQL will pad with spaces to reach 5 characters. Good for the likes of social security numbers (if storing the dashes) and other values where you will always know the length.|
|VARCHAR(LENGTH)||Where CHAR is a fixed-length string, VARCHAR is a variable-length string. VARCHAR will not pad to reach the given length. Rather, it will accept strings UP TO that length. The given length is simply the maximum, not the required. Arguably the most common data type.|
We’ve told the ID column to be unsigned as we’re never going to have an employee with an ID number in the negatives; it just makes no sense. And with it being told to auto increment, every time we insert a new employee, that employee will have an ID of +1 the most previous (1… 2… 3… 4… 5… etc.). So we don’t have to formally set the ID for each employee. And it’s just going to be there to reference across relations (as we will see in the employee_start_dates relation). Two weird things pop up at the end though:
PRIMARY KEY(emp_id), UNIQUE(emp_first_name, emp_last_name);
What we’re doing is saying the emp_id is the primary identifier for this relation. There can only be one primary key per relation, and it needs to uniquely identify any given tuple (row). Every employee should have a unique ID number in the database, so this constraint enforces that. We also want to make sure we don’t end up with two employees with the same first and last name. Two Roberts is okay; two Jones (last name) is okay. But no two Robert Jones should work for Company XYZ. It’s a small company in a tiny town… and I’m fishing for excuses for simply wanting to show you this constraint (social security number is probably a better constraint, but oh well). You can pass a number of columns through the UNIQUE constraint. By passing both the first and last name we are forcing that those two be unique together, not exclusively (so two Roberts can be put in this relation, so long as the last names differ and vice versa). Let’s jump over to the start dates relation. With that knowledge, you can see that the employee_start_dates relation has an ID column (with the same data type as the ID column in employees) and a DATETIME column for the start date. Why do I want both IDs in each relation to have the same data type? Because the ID in the start date relation will reflect the ID of the employee in the employees relation. Hence why we don’t need it to auto increment here; we’re going to declare it specifically when storing information in this relation. The DATETIME column also shows something new: a default value. Any column may have a DEFAULT option. What we’re doing here is passing it the MySQL NOW() function. This function simply returns the current DATETIME. If we do not supply a DATETIME when inserting data into this relation, it will default to the DATETIME of that moment. At this point Joey is feeling proud of himself. He has two relations set up for the new employees! Oh but they’ve just started and he needs to get them added in!
The moment you’ve been waiting for… time to see how Joey inserts data for the first employee: John Smith!
1 2 3 4 5
INSERT INTO employees(emp_first_name, emp_last_name) VALUES('John', 'Smith'); INSERT INTO employee_start_dates(emp_id) VALUES(LAST_INSERT_ID());
Data is inserted into a relation via the INSERT INTO statement. It expects the following:
INSERT INTO TABLE_NAME(COLUMNS, TO, be, used) VALUES(DATA,maps,TO,COLUMNS)
You declare what columns you’re inserting into, and then provide that many values. It’s a pretty simple format. Do note: if you’re ever inserting numeric data types: do not send them as strings. That is, don’t insert numbers as: “12”. Send it without the quotes. This is a best practice. You do not need to explicitly declare and insert into every column. If a default value is set for the column then you’re okay to omit it. In the above case, the employee’s ID number is auto incrementing (a default value, as it’s being set by MySQL itself) and the DATETIME in the start dates relation will default to that very second. The only thing of real interest here is the LAST_INSERT_ID() function. It will take the ID generated (by auto increment, for example) of the last INSERT statement and return it. This way we don’t need to ask MySQL what the ID number is for the new employee; it’ll just occur seamlessly. A little on the advanced side as far as this tutorial shouldb e concerned, but worth seeing. One issue you might see arising here, however, is mix and matching of employees and start dates if the database gets hammered with new hires. This is a legitimate concern; it breaks a lot of best practices that we should look to follow. I will not even begin to explain this bit of code here, but just know MySQL does fix that, and that I will touch on this in a future guide:
1 2 3 4 5 6 7 8 9
DELIMITER || START TRANSACTION; INSERT INTO employees(emp_first_name, emp_last_name) VALUES('Amanda', 'Jones'); INSERT INTO employee_start_dates(emp_id) VALUES(LAST_INSERT_ID()); COMMIT; ||
This transaction guarantees both queries happen one after the other, and do not get screwed up in a hail storm of new hires. But as I said- this is just a teaser for a future guide.
When you’re not creating new relations or tuples (rows/data sets), you’re probably reading them. But we read a lot of different type of data from MySQL. Yes you want to be able to grab tuples (rows) out of a relation for using it in your application. But administering MySQL may require seeing what databases are in existence, or what tables are in the database you’re using. Let’s just take a quick look at this, then move on to working with our data sets again.
Poor Joey forgot the name of the database he’s working with. Oh, he can just look at all databases currently in existence on this database server? Awesome!
Your results will very, but the above is typical of a fresh installation. This lists every single database MySQL knows about. You can see our my_testing database listed there! Now little old Joey needs to know what relations he has in the database. He forgets things too quickly. So he does the following:
If you’re still using the right database, and have been following along, you’ll see the above. Neither of these really require any explanations. I think they both do that for themselves very well. Let’s jump into working with real data again.
We grab data out of relations by using the SELECT statement. In its most simplistic form:
SELECT emp_first_name, emp_last_name FROM employees;
The SELECT statement expects a list of columns to return to you. We’re specifically looking at the first and last name of employees in the employees relation. The FROM statement defines that last bit for us. SELECT needs to know where to pull the data from. FROM tells it just that. Now SELECT can be used to do some rather cool things, when combined with MySQL functions. Like for example:
1 2 3
SELECT CONCAT(emp_last_name, ', ', emp_first_name) AS employee_name FROM employees;
Like the transaction above, I won’t dig into this now. But just know more complicated SELECTs can be performed. As you can probably tell, the most basic format expected here is:
SELECT column_a, column_b, column_c, etc. FROM TABLE_NAME;
But we don’t always want to grab every single record! Wouldn’t it be nice to be able to set a condition where maybe we just grab any employees with a specific first name? Joey was asked to do just this when Mrs. CIO asked him to facilitate a directory. She wants him to build a query example that will return all first names of people with a specific last name. He gives her this:
1 2 3
SELECT emp_first_name FROM employees WHERE emp_last_name = 'Jones';
What’s different here? Only one thing: we’ve added a WHERE statement. The WHERE statement sets conditions that need to be met for records in order to return them. Specifically, we’re asking for all first names in employees where the last name is ‘Jones’. We’ll see one more operator used with a WHERE statement in this guide, but you should know a full suite of them exist (AND, OR, >, <, =, etc. etc.). Now Mrs. CIO comes back and asks Joey to show her all employees, ranked by when they were inserted into the database. He knows that means he won’t need the WHERE statement any more, because he wants all records but sorted in a certain fashion. How will he do it? Simple:
1 2 3
SELECT emp_first_name, emp_last_name FROM employees ORDER BY emp_id ASC;
We’re already familiar with most of this. But it’s the ORDER BY statement that is new. ORDER BY does just that- orders the returned records by the given row(s) and in the direction stated. In this case we’re ordering by the ID number and we want it to go smallest-to-largest, so ASCending. Alternatively we could go DESC (descending). There you have the basics of pulling information out of MySQL!
UPDATE Those Records!
With John and Amanda working together such long hours at Company XYZ they ended up dating, and then…. married? Ah crap, that means Joey needs to change her last name. And he does so just like this:
1 2 3 4 5 6 7
UPDATE employees SET emp_last_name = 'Smith' WHERE emp_last_name = 'Jones' AND emp_first_name = 'Amanda'; SELECT emp_last_name FROM employees WHERE emp_first_name = 'Amanda';
Some new things here! First off, basic
1 2 3
UPDATE TABLE_NAME SET column_name = new_value OPTIONAL WHERE;
So we’re saying “update employees and set the last name to Smith for a record that currently is Jones with a first name of Amanda”. This accurately finds Amanda and changes her last name, as the SELECT query shows us! What’s really of interest here is we’ve built on the WHERE statement we just learned about. The AND conditional means that both sides of it (the last name being Jones and the first name being Amanda) must both be true to pass the test. If either side return false then the whole query goes to shit and nothing happens. Congratulations John and Amanda… but bad news!
Remember Mr. CFO? Well he sucked and invested all of the firm’s capital in BTC prior to it plummeting, and now the company is under water. Which sucks, because Amanda is pregnant, John already has anxiety, and everybody just needs their paychecks. Oh well. Joey has one last duty: delete all two employee records from the database. They just aren’t needed any more.
DELETE FROM employees WHERE emp_id > 0;
Poor John and Amanda. The WHERE clause here is optional, as it usually is. I simply added it to show you how you can use another operator (this time the greater-than operator). Syntax of DELETE FROM:
DELETE FROM TABLE_NAME OPTIONAL WHERE
If no WHERE statement is given then all rows are removed from the table. Forewarning, in case that’s not your intention! Challenge (because I didn’t query and screenshot it myself, and don’t feel like it, as I have been typing for over two hours…): delete all records from the start dates relation and show us how you did it.
The sheer volume of the guide kind of forced me into glaring over some topics and going in depth in others. Let me know if there’s something in this guide that needs clarification, and I’ll make amendments. I’m always happy to answer any questions or fix any errors you may have spotted. I am human, and ended up with tunnel vision thirty minutes in… so it’s likely there are many. Thanks for reading!