Covid-19
Data Analysis Project
Introduction
This data analysis report presents an in-depth exploration and visualization of data related to the Covid-19 pandemic. It aims to provide insight into the trends, patterns, and impacts of the pandemic on a global scale. The dashboard I set out to create for this analysis includes descriptive statistics, data visualizations, and the insights derived from them.
The findings and insights presented in this report are intended to provide a comprehensive understanding of the pandemic and its impacts, as well as to inform public health policy and decision-making. Overall, the report seeks to contribute to the ongoing efforts to combat the pandemic and its effects on society.
Collecting and Preparing the Dataset

To start this project, I obtained publicly available data on Covid-19 cases, deaths, hospitalizations, and tests for countries worldwide. The data, containing 67 columns and over 200,000 rows, was sourced from Our World in Data and can be accessed here. After downloading the raw data, I split it into two datasets in Google Sheets: one with details on Covid deaths and the other with Covid vaccination data. I then imported both datasets into MySQL and began my exploration.
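I did the split by hand in Google Sheets, but the same step can be scripted. The following is a minimal sketch, not the process I actually used: the column groupings are an assumption about which fields belong in each dataset, and the rows are invented toy data rather than real figures.

```python
import csv
import io

# Assumed column groups; the real file has 67 columns, only a few are shown.
KEY_COLS = ["iso_code", "continent", "location", "date"]
DEATH_COLS = KEY_COLS + ["population", "total_cases", "new_cases", "total_deaths"]
VACC_COLS = KEY_COLS + ["new_vaccinations"]


def split_columns(raw_csv_text, wanted):
    """Return a list of dict rows that keep only the `wanted` columns."""
    reader = csv.DictReader(io.StringIO(raw_csv_text))
    return [{col: row[col] for col in wanted} for row in reader]


# Invented toy rows standing in for the downloaded CSV.
RAW = """iso_code,continent,location,date,population,total_cases,new_cases,total_deaths,new_vaccinations
AAA,Europe,Exampleland,2021-01-01,1000,50,5,2,10
AAA,Europe,Exampleland,2021-01-02,1000,60,10,3,20
"""

deaths_rows = split_columns(RAW, DEATH_COLS)
vacc_rows = split_columns(RAW, VACC_COLS)
print(deaths_rows[0])
print(vacc_rows[0])
```

Both outputs keep the key columns (location and date in particular), which is what makes it possible to join the two tables back together later.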
Data Exploration
Once I had imported my data into MySQL, I realized that much of the raw data was not relevant to my analysis. To simplify the dataset, I focused on specific variables: location, date, total and new cases, total deaths, and population. My curiosity about the global impact of Covid-19 continued to grow, prompting me to run initial queries centered on the death percentage and the percentage of the world's population that had contracted the virus.
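The two percentages can be sketched as a single query. This is an illustration under assumptions rather than my exact query: the table name `covid_deaths` is hypothetical, the rows are invented, and Python's built-in SQLite stands in for MySQL so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE covid_deaths (
        location TEXT, date TEXT, population INTEGER,
        total_cases INTEGER, new_cases INTEGER, total_deaths INTEGER
    )
""")
# Invented toy rows, not real figures.
conn.executemany(
    "INSERT INTO covid_deaths VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Exampleland", "2021-01-01", 1000, 50, 5, 2),
        ("Exampleland", "2021-01-02", 1000, 80, 30, 4),
    ],
)

# Death percentage: deaths as a share of cases.
# Percent infected: cases as a share of population.
QUERY = """
    SELECT location, date,
           100.0 * total_deaths / total_cases AS death_percentage,
           100.0 * total_cases / population   AS percent_population_infected
    FROM covid_deaths
    ORDER BY location, date
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Multiplying by `100.0` rather than `100` matters in SQLite, where dividing two integers truncates the result; MySQL's `/` operator does not truncate, so there the decimal point is harmless but not required.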
Diving Deeper
Building on my curiosity about individual countries, I wanted to look specifically at the U.S. and compare it with my findings for other countries. To do so, I ran the following queries:
- Query that examined the highest infection rate in the U.S. relative to its population
- Query that showed each continent's total death count, including the world total
- Query that showed each country's total death count
- Query that showed each continent's death percentage, calculated as its total death count divided by its total cases
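The queries in this list can be sketched roughly as follows. Treat this as one possible shape, not the exact SQL I ran: the table name `covid_deaths` and the placeholder countries are hypothetical, the numbers are invented, and SQLite (through Python) stands in for MySQL. The sketch also assumes the table holds cumulative totals per country and date, so `MAX` picks out the latest total. The world total is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE covid_deaths (
        continent TEXT, location TEXT, date TEXT,
        population INTEGER, total_cases INTEGER, total_deaths INTEGER
    )
""")
# Invented toy rows, not real figures.
conn.executemany(
    "INSERT INTO covid_deaths VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Europe", "Country A", "2021-01-01", 1000, 100, 2),
        ("Europe", "Country A", "2021-01-02", 1000, 200, 4),
        ("Europe", "Country B", "2021-01-02", 500, 50, 1),
        ("Asia", "Country C", "2021-01-02", 2000, 100, 6),
    ],
)

# Highest infection rate relative to population for one location.
HIGHEST_INFECTION_RATE = """
    SELECT location, MAX(100.0 * total_cases / population) AS percent_infected
    FROM covid_deaths
    WHERE location = ?
    GROUP BY location
"""
infection = conn.execute(HIGHEST_INFECTION_RATE, ("Country A",)).fetchall()

# Total death count per country.
DEATHS_BY_COUNTRY = """
    SELECT location, MAX(total_deaths) AS total_death_count
    FROM covid_deaths
    GROUP BY location
    ORDER BY total_death_count DESC
"""
by_country = conn.execute(DEATHS_BY_COUNTRY).fetchall()

# Total death count and death percentage per continent, built from
# each country's latest cumulative totals.
DEATHS_BY_CONTINENT = """
    WITH latest AS (
        SELECT continent, location,
               MAX(total_cases) AS cases, MAX(total_deaths) AS deaths
        FROM covid_deaths
        GROUP BY continent, location
    )
    SELECT continent, SUM(deaths) AS total_death_count,
           100.0 * SUM(deaths) / SUM(cases) AS death_percentage
    FROM latest
    GROUP BY continent
    ORDER BY total_death_count DESC
"""
by_continent = conn.execute(DEATHS_BY_CONTINENT).fetchall()

print(infection)
print(by_country)
print(by_continent)
```

Passing the location as a parameter keeps the first query reusable: the same SQL serves the U.S. or any other country by changing the bound value.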
Joining the Vaccination Table
After studying the data on Covid-19 deaths, I shifted my focus to the Covid-19 vaccination data I had collected. To combine the two datasets, I joined their tables in MySQL. I then set out to analyze the percentage of each country's population that had been vaccinated. To do this, I used a Common Table Expression (CTE), and a temp table as an alternative, to compare each country's total population against its total vaccination count.
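A minimal sketch of the join, the CTE, and the temp-table alternative follows. The table names, the `percent_population_vaccinated` temp table, and the rows are all assumptions made for illustration, and SQLite stands in for MySQL. The running total uses a window function, which needs SQLite 3.25 or later (MySQL 8.0 or later for the equivalent there).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE covid_deaths (location TEXT, date TEXT, population INTEGER);
    CREATE TABLE covid_vaccinations (
        location TEXT, date TEXT, new_vaccinations INTEGER
    );
    -- Invented toy rows, not real figures.
    INSERT INTO covid_deaths VALUES
        ('Exampleland', '2021-01-01', 1000),
        ('Exampleland', '2021-01-02', 1000),
        ('Exampleland', '2021-01-03', 1000);
    INSERT INTO covid_vaccinations VALUES
        ('Exampleland', '2021-01-01', 10),
        ('Exampleland', '2021-01-02', 30),
        ('Exampleland', '2021-01-03', 60);
""")

# Join on location and date, then keep a running vaccination total
# per location, ordered by date.
JOINED = """
    SELECT d.location, d.date, d.population, v.new_vaccinations,
           SUM(v.new_vaccinations) OVER (
               PARTITION BY d.location ORDER BY d.date
           ) AS rolling_people_vaccinated
    FROM covid_deaths AS d
    JOIN covid_vaccinations AS v
      ON d.location = v.location AND d.date = v.date
"""

PERCENT = """
    SELECT location, date, rolling_people_vaccinated,
           100.0 * rolling_people_vaccinated / population AS percent_vaccinated
    FROM {source}
    ORDER BY location, date
"""

# Option 1: a CTE, so the rolling column can be reused in the outer SELECT.
cte_sql = "WITH pop_vs_vac AS (" + JOINED + ")" + PERCENT.format(source="pop_vs_vac")
cte_rows = conn.execute(cte_sql).fetchall()

# Option 2: a temp table holding the same joined result.
conn.executescript(
    "DROP TABLE IF EXISTS percent_population_vaccinated;"
    "CREATE TEMPORARY TABLE percent_population_vaccinated AS " + JOINED + ";"
)
temp_rows = conn.execute(
    PERCENT.format(source="percent_population_vaccinated")
).fetchall()

print(cte_rows)
print(temp_rows)
```

Either route is needed because the rolling total is a computed column: it cannot be referenced in the same `SELECT` that defines it, so it has to be materialized in a CTE or a table first.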
Data Visualization
After exploring the dataset, it was time for my favorite part: creating data visualizations. To start, I set aside four queries to use in Tableau:
- A table that shows the total number of cases worldwide.
- A table that shows the total number of deaths by continent.
- A table that shows the highest number of infections by location.
- A table that shows the percentage of the population infected daily.
I turned each query into a CSV file by running it, selecting the entire result table by double-clicking its top-left corner, and then right-clicking the table to bring up the 'Save as CSV' option. This allowed me to upload the tables to Tableau, where I could use them as data sources to create a dashboard for analysis.
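The export can also be scripted instead of done by hand. The sketch below is an alternative under assumptions, not the route I took: `query_to_csv` is a hypothetical helper, the table and rows are invented, and SQLite stands in for MySQL.

```python
import csv
import io
import sqlite3


def query_to_csv(conn, sql):
    """Run `sql` and return the result set as CSV text with a header row."""
    cursor = conn.execute(sql)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # cursor.description holds one tuple per column; the name comes first.
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    return buffer.getvalue()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE covid_deaths (continent TEXT, location TEXT, total_deaths INTEGER)"
)
# Invented toy rows, not real figures.
conn.executemany(
    "INSERT INTO covid_deaths VALUES (?, ?, ?)",
    [("Europe", "Country A", 4), ("Europe", "Country B", 1), ("Asia", "Country C", 6)],
)

csv_text = query_to_csv(
    conn,
    """
    SELECT continent, SUM(total_deaths) AS total_death_count
    FROM covid_deaths
    GROUP BY continent
    ORDER BY total_death_count DESC
    """,
)
print(csv_text)
```

To produce a file that Tableau can open, the returned text can be written out with `open("deaths_by_continent.csv", "w", newline="")`; the file name here is only an example.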
After uploading the CSV files to Tableau, it was time to create the dashboard. To highlight key Covid-19 statistics, I created a dashboard with the following visualizations:
- Table showing the global impact of Covid-19
- Line graph showing the daily infection count by country
- Bar graph showing total deaths per continent
- Map showing infection rate by country




[Screenshot: CTE creating a new column, Rolling People Vaccinated]
[Screenshot: temp table created as an alternative]
[Screenshot: vaccination table joined to show total population vs. vaccinations]


The Results
With all of the data exploration and visualization complete, it was time to analyze the results of my project. Comparing the global numbers across continents, I found that Europe has the most total cases of Covid-19 and the highest total death count. The 'Percent of Population Infected Daily' chart I created showed that the European Union and the rest of Europe also had the highest percentage of people infected relative to population. This could be a result of the close proximity of countries on the European continent, as well as the timing of their border closures. More interestingly, North America ranks third in total cases, with close to 80 million fewer cases than Asia, yet second in total death count!
Focusing on each country, the top ten countries with the highest infection percentage per population were:
- Cyprus - 67.26%
- Faroe Islands - 65.25%
- San Marino - 64.30%
- Gibraltar - 61.58%
- Austria - 61.06%
- Andorra - 58.35%
- Slovenia - 58.34%