Portfolio
Data Analysis
Collecting Data
Data wrangling
- CSV opened in Excel
- New rows with formulas & Flash Fill added for GBP
- Altered some spelling (e.g Liter > Litre) and updated headers with PROPER function
- TRIM function for unwanted spaces
- Format column types (Currency, numeric commas etc.)
- Replaced null values
- Added normalised row (simple feature scaling) for population and yearly oil consumption
- Insert Table & PivotTable (new sheet)
- Added specific columns for PivotTable, and applied some conditional formatting to highlight areas of interest
Screenshot 1

Analysis and Visualisation
- First thing to notice when looking at the PivotTable, is that out of the top 30 of highest oil producing countries, 15 of these are part of the lowest 20% regarding price per litre. Which you would expect from the top producing countries. However, 3 of the top 5 would appear to have a relatively high price in comparison (US, China and Russia). This is likely down to the high consumption per capita and other political influences.
Screenshot 2
Screenshot 3

- The below pie chart shows that, out of the top 10 oil producing countries, the US, Saudi Arabia and Russia are the biggest contributors by a considerable amount (58% of total top 10 production).
Screenshot 4
Screenshot 5

- Created a column chart to show the top 25 populated countries alongside oil consumption. Again the US stands out here with a much higher consumption rate compared with population. Created a similar column chart whilst filtering by consumption instead, to provide another angle. There are further countries here that show a higher consumption rate in proportion to their population (Japan, Russia, Saudi Arabia, South Korea…).
Screenshot 6

The below combo chart helps explore usage per capita further. Unexpectedly, the US is outside of the top 10, despite their much higher net consumption. Singapore is definitely an outlier here, at least doubling all other entries. 7 of these top 10 countries have a much smaller land mass, which is a point of interest and asks for further investigation into why their consumption per capita is so high.
Screenshot 7

The final 2 graphs help visualise the countries with the highest and lowest price per litre. North Korea having a 10 times higher price than the average (£1.25 per litre). Clearly a result of heavy sanctions and limited resources.
Screenshot 8

Below shows a combo graph for the countries with the lowest price per litre (line) and their production rates, to show if there is a strong correlation with price and production. However, whilst there are some high producing countries here with low prices (Saudi Arabia in particular), there is no clear relationship, as some of the lowest prices also have a low production contribution.
Screenshot 9

These are the Notebooks, CSVs, screenshots and links for the Coursera IBM Data Analyst Capstone project.
Libraries included:
- Pandas
- Beautiful Soup
- JSON
- Requests
- Matplotlib
- Seaborn
Scraping, API & Exploring - This folder has 3 notebooks for importing from APIs, web pages, and exploratory analysis: (1) API folder:
- Jobs_API.ipynb (the Flask file used to host API, created by course providers. Everything that follows is my code and work)
- Collecting_Jobs_data_Using_API.ipynb (first assignment for pulling data from API)
- job-postings.xlsx (Excel file created in assignment, after pulling data and sorting into dataframe)
(2) Web scraping folder:
- Web-Scraping-Lab.ipynb (main file for scraping data from website)
- popular-languages.csv (CSV file created after scraping data)
(3) Exploring data folder:
- M1ExploreDataSet-lab.ipynb (main file for exploration)
Wrangling - This week is for data wrangling (finding, determining, removing duplicates & missing values. Plus normalising data)
- DataWranglingNotebook.ipynb
Exploratory Analysis - Further exploratory data analysis (plotting distribution, finding outliers and checking for correlation)
- ExploratoryDataAnalysis.ipynb
Visualisation & SQLite - Data visualisation (Visualising distribution, relationship, composition and comparison)
- DataVisualisation.ipynb (main folder for assignment)
Dashboard - Using IBM Cognos Analytics to create a dashboard (screenshots included)
- IBM Cloud Pak for data (HTTP link for dashboard)
- survery_data_demographics.csv (data used for dashboard)
- survey_data_technologies_normalised.csv (data used for dashboard)