Learn. It is all about success and failure.
What are binomial distributions and why are they so useful? When we repeat an event a fixed number of times, such as flipping a coin 10 times, and each single event has two possible outcomes (heads or tails), think of the binomial distribution. Each single event here is known as a Bernoulli trial.
The bi- in binomial refers to the two outcomes, usually described as success or failure.
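To make the coin-flipping idea concrete, here is a minimal sketch in base R. It assumes a fair coin (probability of heads 0.5) and 10 flips per set; both numbers are illustrative choices, and the variable names are made up for this example.

```r
# Assumption: a fair coin (p = 0.5) flipped n = 10 times per set.
n <- 10
p <- 0.5

# Probability of exactly k heads, for every k from 0 to 10
k <- 0:n
pmf <- dbinom(k, size = n, prob = p)

# Probability of exactly 5 heads: choose(10, 5) * 0.5^10
p_five <- dbinom(5, size = n, prob = p)

# Probability of at most 3 heads (cumulative probability)
p_at_most_three <- pbinom(3, size = n, prob = p)

# Simulate 1,000 sets of 10 flips; each value is the number of heads in one set
set.seed(42)
heads <- rbinom(1000, size = n, prob = p)
mean(heads)  # should land close to n * p = 5
```

Each call to `rbinom()` here stands for one set of 10 Bernoulli trials, so the simulated counts should pile up around 5 and thin out towards 0 and 10.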
This week, I will analyze the Car Fuel Economy dataset from TidyTuesday.
What is TidyTuesday? TidyTuesday is a weekly social data project in R organized by the R for Data Science community.
It is a great way to improve your data wrangling and visualization techniques, and to share with and learn from others.
You can find more information on their GitHub.
Fuel economy data are the result of the work done by the US Environmental Protection Agency.
It is hard to understand your data by looking at the numbers in a CSV file. You need to plot it, and adding statistics to your plots will make them more informative.
To evaluate data, we typically use the mean and median to describe its central tendency, and the range, quartiles, variance and standard deviation to describe how spread out it is.
The mean and standard deviation represent the data well as long as there are no extreme values that skew the distribution.
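A quick sketch of why extreme values matter, using base R only. The numbers below are invented for illustration and do not come from the fuel economy data.

```r
# Hypothetical values, made up for this example
symmetric <- c(20, 22, 24, 26, 28)
skewed    <- c(20, 22, 24, 26, 128)  # same data with one extreme value

# Central tendency: the mean is pulled by the extreme value, the median is not
mean_sym <- mean(symmetric)
mean_skw <- mean(skewed)
med_sym  <- median(symmetric)
med_skw  <- median(skewed)

# Spread: the standard deviation inflates, the interquartile range does not
sd_sym  <- sd(symmetric)
sd_skw  <- sd(skewed)
iqr_sym <- IQR(symmetric)
iqr_skw <- IQR(skewed)

summary(skewed)  # min, quartiles, median, mean and max in one call
```

With a single outlier the mean moves from 24 to 44 while the median stays at 24, which is why the median and quartiles are often the safer summary for skewed data.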
Accessing different data sources
Sometimes, the data you need is available on the web. Being able to access it will make your life as a data scientist easier.
I want to perform an exploratory data analysis of the 2018/19 season of the English Premier League.
Do team performances change over the course of the season? Do some teams cluster? What is the earliest week in which we can predict the teams' final positions? I need the standings table for each week of the season, combined in a way that lets me plot the graphs I want.
ggplot2 is a powerful data visualization tool for R. Make quick visualizations to explore your data or share your insights.
Learning how aesthetics and attributes are defined in ggplot2 will help you develop your skills quickly.
ggplot2 tips: distinction between aesthetics and attributes
Aesthetics are defined inside aes() in ggplot2 syntax, and attributes are set outside aes().
e.g. ggplot(data, aes(x, y, color=var1)) + geom_point(size=6)
We typically understand aesthetics as how something looks: color, size, etc.
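The difference is easier to see side by side. The sketch below assumes the ggplot2 package is installed and uses R's built-in mtcars dataset as a stand-in. The column choices (wt, mpg, cyl) and the color "steelblue" are illustrative and do not come from the original post.

```r
library(ggplot2)

# Aesthetic: color is mapped to a variable inside aes(), so it varies
# with the data (one color per number of cylinders)
p_aes <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 6)  # attribute: every point gets the same size

# Attribute: color is a fixed value set outside aes(), so every point
# looks the same regardless of the data
p_attr <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "steelblue", size = 6)

# Computed layer data, useful for checking what each point received
built_aes  <- ggplot_build(p_aes)$data[[1]]
built_attr <- ggplot_build(p_attr)$data[[1]]
```

Printing `p_aes` draws points in three colors with a legend, while `p_attr` draws all points in one color with no legend, since nothing was mapped to the data.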