So let me just preface this by saying I am not an expert or a competent user of SQL. This is just an intro to SQL from a learner’s perspective. On my learning journey towards becoming a competent data scientist, I have had to learn a number of skills and this article will just be a record of that. If you do end up reading this, I hope you find it interesting, or at the very least, a refresher on some basic SQL. I have learned this from DataCamp. This is not sponsored or anything, but it has been very useful too. Although, you could probably do it for free.
Selecting columns.
To select single columns you would use the SELECT command. This command specifies what column of data you would like to take from. You will then need to specify the SQL table you are taking this data from. For example, if we wanted to SELECT names FROM the people column we would use this command:
SELECT names
FROM people
I could have also written out the command all on one line. But I happen to think doing it like this makes it easier to read.
You could also select two columns at one time using the following code. You will have noticed a comma is needed much like when we make a list in our everyday life.
SELECT names, birthdate
FROM people
And if I really wanted to I could select all columns inside of a table using the * command.
SELECT *
FROM people
DISTINCT Values.
Now, it could also be quite common for a data scientist to SELECT unique values. We could also call them DISTINCT values. And we do so like this.
SELECT DISTINCT languages
FROM film;
I did two things here, I ended my code with a ‘;’ and I also used the DISTINCT code. I do not need to add a ‘;’ however it is probably a good idea to get into the habit of this now since leaving it out is, quite frankly, barbaric.
What if we want to count the number of employees in a table? Well, this is what we can do. We can use the COUNT() command to do so. We will also be adding the argument ‘*’ to our COUNT command. Here it is COUNT(*), this allows us to return all the data from wherever we specify, even if it is a null value. In other words, it returns to us the total amount of rows. Here is what it looks like in action:
SELECT COUNT(*)
FROM people;
‘people’ is the name of a table, this code does not make any reference to columns so make sure to bear that in mind.
Putting it all together.
Now, let's put everything we have done so far and put it together.
SELECT COUNT(DISTINCT birthdate)
FROM people;
What is this doing? Take a little while to figure it out.
It is counting the number of unique values in the column called birthday FROM the people table. You are probably thinking about the possibilities of what you could do with this code, even with a small set of tools you can accomplish quite a lot and access a lot of data in a structured way.
Take a look at this table. We can use the SQL we have learned to access its data. Here is an example of how we might access the number of unique country values from the film table.
SELECT COUNT(DISTINCT country)
FROM films;
I hope you have learned a few things from this guide. Here is another breakdown of everything we have learned so far:
SELECT specifies what column or data to retrieve.
FROM specifies what table to retrieve the data from.
COUNT(*) returns an integer value of the number of rows including null values.
COUNT() returns the number of rows based on the argument given.
DISTINCT is used in conjunction with some of the above to return unique or DISTINCT values.