In the span of just a few months, COVID-19 escalated from an epidemic to a pandemic, bringing the whole world to a standstill. As a fresh graduate in Business Analytics, I decided to apply my visualization skills in R to the data provided by the Johns Hopkins University Center for Systems Science and Engineering, to better understand how this virus snowballed from an epidemic in China in December 2019 to a global pandemic in March 2020.
This project focuses on the notable early events of the COVID-19 wave that began in December 2019, to pinpoint when the epidemic turned into a pandemic.
In this project from DataCamp, I created clear line charts with the ggplot2, dplyr, and readr packages, using data covering December 11, 2019 to March 15, 2020. These graphs are intended to provide a quick and simple overview of the complex data available.
The key observations from this project, presented as line graphs, are described below.
This graph shows the cumulative confirmed cases worldwide. The numbers look quite terrifying, with the overall count reaching 200,000 in March. There is also a strange jump in mid-February, after which new cases slow down before rising again in March.
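A chart like this one can be sketched in a few lines of ggplot2. The snippet below is only an illustration: the data frame is a tiny made-up stand-in (arbitrary values, not the Johns Hopkins figures), and the names `confirmed_cases_worldwide`, `date`, and `cum_cases` are assumptions rather than names confirmed by the project.

```r
library(ggplot2)

# Toy stand-in for the worldwide series. Values are arbitrary and only
# illustrate the shape of the data; column names are assumptions.
confirmed_cases_worldwide <- data.frame(
  date = as.Date(c("2020-01-22", "2020-02-01", "2020-02-13",
                   "2020-03-01", "2020-03-15")),
  cum_cases = c(1, 12, 60, 88, 170)
)

# One line: date on the x-axis, cumulative confirmed cases on the y-axis.
worldwide_plot <- ggplot(confirmed_cases_worldwide,
                         aes(x = date, y = cum_cases)) +
  geom_line() +
  labs(x = "Date", y = "Cumulative confirmed cases")

worldwide_plot
```

With the real data, the toy data frame would presumably be replaced by a call such as `readr::read_csv()` on the downloaded file.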
Next, I plotted COVID-19 cases in China and in the rest of the world separately, to compare global growth against growth in the country where the outbreak began.
Clearly, the two lines differ. China accounted for most of the cases in February; that changed in March, when COVID-19 was declared a global pandemic.
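The comparison comes down to mapping a grouping column to the line colour. Here is a minimal sketch, assuming a long-format data frame with a hypothetical `is_china` column; the values are arbitrary toy numbers, not real case counts.

```r
library(ggplot2)

# Toy long-format data: one row per group per date. Values are arbitrary
# and the column names (is_china, date, cum_cases) are assumptions.
china_vs_world <- data.frame(
  is_china = rep(c("China", "Not China"), each = 3),
  date = rep(as.Date(c("2020-02-01", "2020-03-01", "2020-03-15")), times = 2),
  cum_cases = c(10, 80, 81, 1, 8, 90)
)

# Mapping is_china to colour draws one line per group and adds a legend.
china_vs_world_plot <- ggplot(china_vs_world,
                              aes(x = date, y = cum_cases, colour = is_china)) +
  geom_line() +
  labs(x = "Date", y = "Cumulative confirmed cases", colour = NULL)

china_vs_world_plot
```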
Some other notable WHO events from February and March can be seen in the plot below.
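One way to mark events on a line chart is to keep them in a small data frame of their own and draw them as dashed vertical lines with text labels. The sketch below makes several assumptions: the case values are arbitrary toy numbers, the `who_events` table and its `label_y` column are hypothetical, and the two events listed (the mid-February change in how China reported cases and the WHO pandemic declaration on March 11, 2020) are examples, not necessarily the exact set used in the original plot.

```r
library(ggplot2)

# Toy case series with arbitrary values (not real data).
cases <- data.frame(
  date = as.Date(c("2020-02-01", "2020-02-13", "2020-03-01", "2020-03-15")),
  cum_cases = c(12, 60, 88, 170)
)

# Hypothetical event table: one row per annotated event.
# label_y is the height at which each label is drawn.
who_events <- data.frame(
  date = as.Date(c("2020-02-13", "2020-03-11")),
  event = c("China reporting\nchange", "Pandemic\ndeclared"),
  label_y = c(150, 150)
)

events_plot <- ggplot() +
  geom_line(data = cases, aes(x = date, y = cum_cases)) +
  # A dashed vertical line at each event date.
  geom_vline(data = who_events, aes(xintercept = date), linetype = "dashed") +
  # Right-align each label so it sits just to the left of its line.
  geom_text(data = who_events, aes(x = date, y = label_y, label = event),
            hjust = 1.05) +
  labs(x = "Date", y = "Cumulative confirmed cases")

events_plot
```

Keeping the events in their own data frame means a new annotation is just a new row, with no change to the plotting code.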
The last graph shows the seven countries worst affected by COVID-19 as of mid-March 2020.
An interesting fact that comes to light here is that four of the seven countries listed above (France, Germany, Italy, and Spain) are in Europe and are close neighbours, with France sharing a border with each of the other three.
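Picking out the worst-affected countries is a short dplyr pipeline: take each country's latest cumulative count and keep the largest few. The sketch below uses made-up country labels ("A" to "D") and arbitrary values, with assumed column names, and keeps the top 2 instead of the top 7 so the toy data stays small.

```r
library(dplyr)

# Toy data: cumulative cases per country on two dates.
# Labels and values are made up; column names are assumptions.
confirmed_cases_by_country <- data.frame(
  country = rep(c("A", "B", "C", "D"), each = 2),
  date = rep(as.Date(c("2020-03-01", "2020-03-15")), times = 4),
  cum_cases = c(10, 40, 5, 90, 20, 25, 1, 60)
)

# How many countries to keep (7 in the post, 2 for this toy example).
top_n_countries <- 2

top_countries <- confirmed_cases_by_country %>%
  group_by(country) %>%
  # Cumulative counts never decrease, so the maximum is the latest total.
  summarize(total_cases = max(cum_cases)) %>%
  arrange(desc(total_cases)) %>%
  head(top_n_countries)

top_countries
```

Filtering the full per-country series to the countries in `top_countries` would then give the data for a line chart with one line per country.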
Although this project does not predict the future trend of the COVID-19 pandemic, it shows how the ggplot2 and dplyr packages can be used to represent complex data as easy-to-understand visualizations in R.
Disclaimer: COVID-19 data and information are updated frequently. The data used here was pulled on March 17, 2020 and should not be considered the most recent data available. This project was carried out purely for educational purposes.