11 Joining Data in SQL
https://learn.datacamp.com/courses/joining-data-in-postgresql
11.1 Introduction to joins
Inner join
SELECT *
FROM left_table
INNER JOIN right_table
ON left_table.id = right_table.id;Instead of writing the full table name, you can use table aliasing as a shortcut. For tables you also use AS to add the alias immediately after the table name with a space.
SELECT c1.name AS city, c2.name AS country
FROM cities AS c1
INNER JOIN countries AS c2
ON c1.country_code = c2.code;Notice that to select a field in your query that appears in multiple tables, you’ll need to identify which table/table alias you’re referring to by using a . in your SELECT statement.
The ability to combine multiple joins in a single query is a powerful feature of SQL:
# Select fields
SELECT c.code, name, region, e.year, fertility_rate, unemployment_rate
# From countries (alias as c)
FROM countries AS c
# Join to populations (as p)
INNER JOIN populations AS p
# Match on country code
ON c.code = p.country_code
# Join to economies (as e)
INNER JOIN economies AS e
# Match on country code
ON c.code = e.code;Inner join with USING
When joining tables with a common field name, e.g.
SELECT *
FROM countries
INNER JOIN economies
ON countries.code = economies.codeYou can use USING as a shortcut:
SELECT *
FROM countries
INNER JOIN economies
USING(code)CASE WHEN and THEN
Often it’s useful to look at a numerical field not as raw data, but instead as being in different categories or groups.
You can use CASE with WHEN, THEN, ELSE, and END to define a new grouping field.
SELECT name, continent, code, surface_area,
# First case
CASE WHEN surface_area > 2000000 THEN 'large'
# Second case
WHEN surface_area > 350000 THEN 'medium'
# Else clause + end
ELSE 'small' END
# Alias name
AS geosize_group
# From table
FROM countries;11.2 Outer joins and cross joins
Left join
# Select name, region, and gdp_percapita
SELECT name, region, gdp_percapita
# From countries (alias as c)
FROM countries AS c
# Left join with economies (alias as e)
LEFT JOIN economies AS e
# Match on code fields
ON c.code = e.code
# Focus on 2010
WHERE year = 2010;Right join
Right joins aren’t as common as left joins. One reason why is that you can always write a right join as a left join.
SELECT cities.name AS city, urbanarea_pop, countries.name AS country,
indep_year, languages.name AS language, percent
FROM languages
RIGHT JOIN countries
ON languages.code = countries.code
RIGHT JOIN cities
ON countries.code = cities.country_code
ORDER BY city, language;Full join
SELECT name AS country, code, region, basic_unit
# From countries
FROM countries
# Join to currencies
FULL JOIN currencies
# Match on code
USING (code)
# Where region is North America or null
WHERE region = 'North America' OR region IS NULL
# Order by region
ORDER BY region;11.3 Set theory clauses
UNION
UNION includes every record in both tables but doesn’t double count the matched/overlapped ones.
# Select fields from 2010 table
SELECT *
# From 2010 table
FROM economies2010
# Set theory clause
UNION
# Select fields from 2015 table
SELECT *
# From 2015 table
FROM economies2015
# Order by code and year
ORDER BY code, year;UNION ALL
UNION ALL includes every record in both tables and does replicate the matched/overlapped ones.
# Select fields
SELECT code, year
# From economies
FROM economies
# Set theory clause
UNION ALL
# Select fields
SELECT country_code, year
# From populations
FROM populations
# Order by code, year
ORDER BY code, year;INTERSECT
INTERSECT results in only those matched/overlapped.
# Select fields
SELECT code, year
# From economies
FROM economies
# Set theory clause
INTERSECT
# Select fields
SELECT country_code, year
# From populations
FROM populations
# Order by code and year
ORDER BY code, year;EXCEPT
EXCEPT results in only those doesn’t matched/overlapped.
# Select field
SELECT name
# From cities
FROM cities
#Set theory clause
EXCEPT
# Select field
SELECT capital
# From countries
FROM countries
# Order by result
ORDER BY name;Diagnosing problems using anti-join
Another powerful join in SQL is the anti-join. It is particularly useful in identifying which records are causing an incorrect number of records to appear in join queries.
# 3. Select fields
SELECT code, name
# 4. From Countries
FROM countries
# 5. Where continent is Oceania
WHERE continent = 'Oceania'
# 1. And code not in
AND code NOT IN
# 2. Subquery
(SELECT code
FROM currencies);11.4 Subqueries
Subquery inside WHERE
# Select fields
SELECT *
# From populations
FROM populations
# Where life_expectancy is greater than
WHERE life_expectancy >
# 1.15 * subquery
1.15 * (SELECT AVG(life_expectancy)
FROM populations
WHERE year = 2015) AND
year = 2015;Subquery inside SELECT
The code given in query.sql selects the top nine countries in terms of number of cities appearing in the cities table. Recall that this corresponds to the most populous cities in the world.
SELECT countries.name AS country, COUNT(*) AS cities_num
FROM cities
INNER JOIN countries
ON countries.code = cities.country_code
GROUP BY country
ORDER BY cities_num DESC, country
LIMIT 9;convert the code to get the same result as the code shown:
SELECT countries.name AS country,
(SELECT COUNT(*)
FROM cities
WHERE countries.code = cities.country_code) AS cities_num
FROM countries
ORDER BY cities_num DESC, country
LIMIT 9;Subquery inside FROM
Begin by determining for each country code how many languages are listed in the languages table using SELECT, FROM, and GROUP BY.
Alias the aggregated field as lang_num.
-- Select fields (with aliases)
SELECT code, COUNT(name) AS lang_num
-- From languages
FROM languages
-- Group by code
GROUP BY code;Include the previous query (aliased as subquery) as a subquery in the FROM clause of a new query.
Select the local name of the country from countries.
Also, select lang_num from subquery.
Make sure to use WHERE appropriately to match code in countries and in subquery.
Sort by lang_num in descending order.
# Select fields
SELECT local_name, subquery.lang_num
# From countries
FROM countries,
# Subquery (alias as subquery)
(SELECT code, COUNT(name) AS lang_num
FROM languages
GROUP BY code) AS subquery
# Where codes match
WHERE countries.code = subquery.code
# Order by descending number of languages
ORDER BY lang_num DESC;