Now, What’s a Probability Distribution?
By: NEHA KUMARI

Things happen all the time: dice are rolled, it rains, buses arrive. After the fact, the specific outcomes are certain: the dice came up 3 and 4, there was half an inch of rain today, the bus took 3 minutes to arrive. Before, we can only talk about how likely the outcomes are. Probability distributions describe what we think the probability of each outcome is, which is sometimes more interesting to know than simply which single outcome is most likely.
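To make the dice example concrete, here is a minimal sketch that lists the probability of every possible total when two dice are rolled. It assumes the dice are fair and six-sided, which the text does not state, and the variable names are only illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumption: two fair six-sided dice, so all 36 ordered outcomes are equally likely.
outcomes = list(product(range(1, 7), repeat=2))

# Count how many outcomes give each total, then turn the counts into probabilities.
counts = Counter(a + b for a, b in outcomes)
distribution = {total: Fraction(n, len(outcomes)) for total, n in sorted(counts.items())}

for total, prob in distribution.items():
    print(f"P(sum = {total:2d}) = {prob}  ({float(prob):.3f})")

# The single most likely total, and a check that the probabilities sum to 1.
most_likely = max(distribution, key=distribution.get)
print("most likely total:", most_likely)
print("sum of probabilities:", sum(distribution.values()))
```

The full table says more than the most likely total alone: it also shows how much less likely the other totals are.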

Probability distributions come in many shapes, but in only one size: the probabilities in a distribution always add up to 1. Probability distributions are one of many statistical techniques that can be used to analyze data and find useful patterns. In this project, I have chosen one of the most interesting and best-known roles of probability distributions: their use in the field of data science. Data science is the discovery of insight from data. This aspect of data science is all about uncovering findings from data.

It means diving in at a granular level to mine and understand complex behaviors, trends, and inferences. It’s about surfacing hidden insight that can help companies make smarter business decisions. For example:

• Netflix mines movie viewing patterns to understand what drives user interest, and uses that to decide which Netflix original series to produce.
• Target identifies the major customer segments within its base and the unique shopping behaviors within those segments, which helps guide messaging to different market audiences.
• Procter & Gamble uses time series models to understand future demand more clearly, which helps it plan production levels more optimally.

Introduction

Suppose you are a teacher at a university. After checking assignments for a week, you have graded all the students. You give the graded papers to a data entry clerk at the university and ask him to create a spreadsheet containing the grades of all the students. But he stores only the grades and not the corresponding students. He also makes another blunder: in his hurry he misses a couple of entries, and we have no idea whose grades are missing. Let’s find a way to solve this.

One way is to visualize the grades and see whether you can find a trend in the data. The graph you plot is called the frequency distribution of the data. You see that a smooth, curve-like structure defines the data, but do you notice an anomaly? There is an abnormally low frequency in one particular score range. So the best guess is that the missing values are the ones that would remove the dent in the distribution.

This is how you would try to solve a real-life problem using data analysis. For any data scientist, whether student or practitioner, distributions are a must-know concept. They provide the basis for analytics and inferential statistics.
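The grade example above can be sketched in a few lines of Python. The grades below are invented purely for illustration, and the "dent" rule (compare each bin with the average of its two neighbours) is just one simple heuristic I am assuming here, not a standard method.

```python
from collections import Counter

# Hypothetical grades (out of 100), invented for illustration only.
grades = (
    [42, 45, 48]
    + [51, 53, 54, 55, 57, 58, 59]
    + [63, 66]
    + [70, 71, 72, 74, 75, 76, 77, 78, 79]
    + [81, 83, 84, 86, 88, 89]
    + [92, 95]
)

BIN_WIDTH = 10  # assumed bin size for the frequency distribution


def frequency_distribution(values, width):
    """Count how many values fall in each bin [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    starts = range(min(counts), max(counts) + width, width)
    return {start: counts.get(start, 0) for start in starts}


def largest_dent(freq):
    """Return the interior bin whose count falls furthest below the mean of its two neighbours."""
    starts = list(freq)
    shortfall = {
        starts[i]: (freq[starts[i - 1]] + freq[starts[i + 1]]) / 2 - freq[starts[i]]
        for i in range(1, len(starts) - 1)
    }
    return max(shortfall, key=shortfall.get)


freq = frequency_distribution(grades, BIN_WIDTH)
for start, count in freq.items():
    # A text histogram: one '#' per grade in the bin.
    print(f"{start:3d}-{start + BIN_WIDTH - 1:3d} | {'#' * count}")

dent = largest_dent(freq)
print(f"suspiciously low bin: {dent}-{dent + BIN_WIDTH - 1}")
```

With this made-up data the 60-69 bin stands out, which is where we would guess the missing entries belong.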
While the concept of probability gives us the mathematical calculations, distributions help us visualize what’s actually happening underneath. This is why probability distributions are so helpful in analysing data, and why they are among the most essential concepts in data science.

Data scientists have hundreds of probability distributions from which to choose. Where to start? I am going to briefly describe some types of distributions and their implementations:

1. Bernoulli Distribution
2. Binomial Distribution
3. Normal Distribution
4. Poisson Distribution
5. Exponential Distribution

What is a Bernoulli Distribution?

A Bernoulli distribution is a discrete probability distribution for a Bernoulli trial: a random experiment that has only two outcomes (usually called a success and a failure). For example, the probability of getting heads (a success) when flipping a coin is p = 0.5. The probability of failure is 1 − p (1 minus the probability of success, which also equals 0.5 for a coin toss). It is a special case of the binomial distribution for n = 1. In other words, it is a binomial distribution with a single trial (e.g. a single coin toss). In a plot of the distribution, failure is labeled on the x-axis as 0 and success is labeled as 1.

Because the distribution is discrete, its probabilities are given by a probability mass function (PMF) rather than a density: $P(X = x) = p^{x}(1 - p)^{1 - x}$ for $x \in \{0, 1\}$, which can also be written as $P(X = 1) = p$ and $P(X = 0) = 1 - p$.

An important part of every Bernoulli trial is that each action must be independent. That means the probabilities must remain the same throughout the trials; each event must be completely separate and have nothing to do with the previous event. For example, winning on a scratch-off lottery ticket is an independent event: your odds of winning on one ticket are the same as on any other ticket.

Binomial distribution: You would use the binomial distribution to analyze variables that can assume only one of two values, counted over a fixed number of independent trials.
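As a small sketch of the Bernoulli PMF described above, the code below evaluates it for a fair coin, checks that a binomial with a single trial (n = 1) gives the same numbers, and simulates independent trials. The function names, the seed and the number of simulated trials are my own illustrative choices.

```python
import math
import random


def bernoulli_pmf(x, p):
    """P(X = x) = p**x * (1 - p)**(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("a Bernoulli outcome must be 0 (failure) or 1 (success)")
    return p**x * (1 - p) ** (1 - x)


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


p = 0.5  # fair coin, as in the text
print("P(failure) =", bernoulli_pmf(0, p))
print("P(success) =", bernoulli_pmf(1, p))

# With a single trial (n = 1) the binomial PMF reduces to the Bernoulli PMF.
for x in (0, 1):
    assert math.isclose(binomial_pmf(x, 1, p), bernoulli_pmf(x, p))

# Simulate independent trials; the fixed seed is an arbitrary choice for repeatability.
rng = random.Random(0)
trials = [1 if rng.random() < p else 0 for _ in range(10_000)]
success_rate = sum(trials) / len(trials)
print("observed success rate:", success_rate)
```

Because every simulated flip uses the same p and ignores all earlier flips, the observed success rate should land close to 0.5.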
For example, with the binomial distribution you could determine the probability that a given percentage of the members of a sports club are left-handed. The probability of exactly x successes in n trials, each with success probability p, is $b(x; n, p) = \binom{n}{x}\, p^{x} (1 - p)^{n - x}$, where $\binom{n}{x}$ (also written nCx) is the number of ways to choose x items from n.

Normal distribution: The normal distribution is the most widely used probability distribution in most disciplines, including economics, finance, marketing, biology, psychology, and many others. One of the characteristic features of the normal distribution is symmetry: the probability of a variable being a given distance below the mean of the distribution equals the probability of it being the same distance above the mean. For example, if the mean height of all men in the United States is 70 inches, and heights are normally distributed, a randomly chosen man is equally likely to be between 68 and 70 inches tall as he is to be between 70 and 72 inches tall.

The normal distribution works well in many applications. For example, it’s often used in the field of finance to describe the returns to financial assets. Due to its ease of interpretation and implementation, the normal distribution is sometimes used even when the assumption of normality is only approximately correct. Another application is the distribution of errors in measurements. One of the first applications of the normal distribution was to the analysis of errors of measurement in astronomical observations; Galileo had noted in the 17th century that such errors were symmetric and that small errors occurred more often than large ones.

Poisson distribution: You would use the Poisson distribution to describe the likelihood of a given number of events occurring over an interval of time. For example, it could be used to describe the probability of a specified number of hits on a website over the coming hour.

What about the count of customers calling a support hotline each minute? That’s an outcome whose distribution sounds binomial, if you think of each second as a Bernoulli trial in which a customer doesn’t call (0) or does (1). However, as the power company knows, when the power goes out, 2 or even hundreds of people can call in the same second, so a trial with only two outcomes per second cannot capture the count.
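One way to see a path around the "two calls in the same second" problem is to cut the minute into ever finer slices while keeping the expected number of calls, n times p, fixed. The sketch below does this and compares the binomial probabilities with their limiting values, the Poisson PMF $e^{-\lambda}\lambda^{k}/k!$, which is a standard result. The average of 3 calls per minute is a made-up figure, and the function names are only illustrative.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Limiting probability of exactly k events at average rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


calls_per_minute = 3.0  # hypothetical average, chosen only for illustration
ks = range(6)           # compare P(0 calls) through P(5 calls)

max_gap = {}
for n in (60, 600, 60_000):   # slices of one second, a tenth of a second, a millisecond
    p = calls_per_minute / n  # keeps n * p fixed at calls_per_minute
    max_gap[n] = max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, calls_per_minute)) for k in ks
    )
    print(f"n = {n:6d}, p = {p:.6f}, largest gap to the limit = {max_gap[n]:.2e}")
```

The gap shrinks as the slices get finer: with tiny slices, two calls in the same slice become rare enough that the two-outcome assumption matters less and less.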
This is where the Poisson distribution plays an important role. Like the binomial distribution, the Poisson distribution is the distribution of a count: the count of times something happened. It’s parameterized not by a probability p and a number of trials n but by an average rate λ, which in this analogy is simply the constant value of np. The Poisson distribution is what you should think of when counting events over a period of time, given the continuous rate at which events occur. When things like packets arrive at routers, or customers arrive at a store, or things wait in some kind of queue, think Poisson. Some more examples are the number of thefts reported in an area in a day and the number of suicides reported in a particular city.

Here, X, the number of events in the interval, is called a Poisson random variable, and the probability distribution of X is called the Poisson distribution. Let μ denote the mean number of events in an interval of length t. Then μ = λt, where λ is the rate at which events occur and t is the length of the time interval. The PMF of X following a Poisson distribution is given by $P(X = x) = \dfrac{e^{-\mu}\mu^{x}}{x!}$ for $x = 0, 1, 2, \ldots$

Exponential Distribution: Let’s consider the call center example one more time. What about the interval of time between the calls? Here, the exponential distribution comes to our rescue: it models the interval of time between calls. Other examples are the length of time between metro arrivals, the length of time between arrivals at a gas station, and the life of an air conditioner.

The exponential distribution is widely used for survival analysis, from the expected life of a machine to the expected life of a human. A random variable X is said to have an exponential distribution if its PDF is $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$ (and $f(x) = 0$ otherwise), with parameter $\lambda > 0$, which is also called the rate. For survival analysis, λ is called the failure rate of a device at any time t, given that it has survived up to t.
Also, the greater the rate, the faster the curve drops, and the lower the rate, the flatter the curve. This is explained better with the graph shown below.
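The effect of the rate can also be checked numerically. The sketch below tabulates the exponential PDF for three rates that I picked only for illustration, and then verifies that the failure rate, the density divided by the probability of surviving past t, stays equal to λ at every t. The survival formula $P(X > x) = e^{-\lambda x}$ is a standard result that the text does not state.

```python
import math


def exponential_pdf(x, rate):
    """f(x) = rate * exp(-rate * x) for x >= 0, and 0 otherwise."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def survival(x, rate):
    """P(X > x) = exp(-rate * x) for x >= 0 (nothing has failed before time 0)."""
    return math.exp(-rate * x) if x >= 0 else 1.0


rates = (0.5, 1.0, 2.0)          # hypothetical rates, chosen for illustration
xs = (0.0, 0.5, 1.0, 2.0, 4.0)

# Table of the density: one row per x, one column per rate.
print("x     " + "  ".join(f"rate={r:<4}" for r in rates))
for x in xs:
    print(f"{x:<5} " + "  ".join(f"{exponential_pdf(x, r):9.4f}" for r in rates))

# Fraction of its starting height that each curve still has at x = 1:
# the greater the rate, the smaller this fraction, so the faster the drop.
drop = {r: exponential_pdf(1.0, r) / exponential_pdf(0.0, r) for r in rates}
print("height at x = 1 relative to x = 0:", drop)

# The failure rate f(t) / P(X > t) is the same constant at every t.
for t in (0.0, 1.0, 5.0):
    assert math.isclose(exponential_pdf(t, 2.0) / survival(t, 2.0), 2.0)
```

A higher rate starts the curve higher at x = 0 and pulls it down faster, while a lower rate gives a flatter curve, which matches the description above.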