Business Statistics Unit 1 - Introduction

In this first unit you are introduced to some basic statistical concepts and terms.  Remember to obtain a copy of the learning objectives for this unit by going to the Unit 1 link under Course Documents in Blackboard, and clicking on the link for Learning Objectives, Unit 1.

INTRODUCTION


Why should you learn how to use and interpret statistics?  There are many reasons to become more knowledgeable about statistics.  First of all, you are exposed to statistics in many ways.  Have you ever watched, read, or listened to the news and been presented with a "scientific research study", national voter polling data, or economic data indicating "a 20% decline in unemployment..."?  Many of us believe statistical information and data without questioning who prepared it, how it was collected, who interpreted it, what data is missing or excluded, and finally who paid for the study.  By studying statistics you will be able to understand the statistical data that is presented to you and ask the right questions before accepting it.

A second reason why learning statistics is so important relates to your current and future employment.  Many jobs require the calculation, interpretation, and presentation of statistical data.  Many decisions are based on data generated by computer systems and company employees.  Sales trends, product quality reports, accounting costs, and payroll information are all analyzed using statistical techniques.

So what is statistics all about?  Statistics is the science of designing research studies, gathering data, classifying the data into categories, summarizing it, and finally presenting it to support and explain the decisions that will be made based upon the statistical analysis.  Each of these steps involves critical procedures which need to be followed in order to make a proper decision based upon the data.

Data is collected in statistical research from a population.  A population is the entire group of people or objects under study.  For example, a population could be all of the students attending college in the State of Michigan.  Or it could be all of the lakes in the State of Indiana.  A population could also be all of the people in the United States who are in favor of higher speed limits on interstate freeways.  Simply put, the population is the entire group being studied.

When you conduct statistical research on a population you have two choices about how you would like to examine or survey the population.  You could conduct a census, which is a survey, count or examination of all members or elements of the population.  You could also conduct a sample, which is a survey, count or examination of a portion or part of the population.

Statisticians also have some other terms which are used to describe a population and a sample.  A parameter is a number or term that describes a characteristic of the population.  For example, if we were studying all of the lakes in Indiana, a parameter could be the number of lakes in the population that are larger than 100 acres.  Looking at the total number of college students in Michigan, an example of a parameter is the number of students who are over the age of 30 in the population.

A statistic is a number that is used to describe a characteristic of a sample.  A statistic is used when a parameter is unknown.  For example, if we selected a sample of 1000 students from the population of college students in Michigan, and asked the students in our sample their age, we could then determine the average age of students in our sample.  If the average age was 28, the number 28 is a statistic that represents or describes a characteristic of our sample.
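
The difference between a parameter and a statistic can be sketched in a few lines of Python.  This is only an illustration: the population of student ages is made-up data generated by the script, and the variable names are hypothetical.  In real research the parameter is usually unknown, which is why the statistic is calculated in the first place.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives the same result each run

# Hypothetical population: ages of 50,000 college students (made-up data).
population_ages = [random.randint(17, 60) for _ in range(50_000)]

# Parameter: describes the entire population (usually unknown in practice).
population_mean_age = statistics.mean(population_ages)

# Statistic: describes only the 1,000 students selected for the sample.
sample_ages = random.sample(population_ages, 1000)
sample_mean_age = statistics.mean(sample_ages)

print(f"Parameter (population mean age): {population_mean_age:.2f}")
print(f"Statistic (sample mean age):     {sample_mean_age:.2f}")
```

The two numbers will normally be close but not identical, because the statistic is based on only part of the population.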

There are two types of statistics that are used when research is conducted.  The first type is called descriptive statistics.  Descriptive statistics are used to collect, classify, summarize, and present data.  If you see data that is presented using charts, graphs, and tables, descriptive statistics are being used.  Also, when data is organized and totaled by categories or grouped into percentages, descriptive statistics are being used.  Most of us have been exposed to descriptive statistics simply because they are often used to summarize and classify large amounts of data.  Imagine if you conducted a survey in which 5,000 people responded to 20 different multiple choice questions with four possible answers for each question.  Would you like to see a report that contains each person's response to each question?  No!  This is an excellent example of the use of descriptive statistics.  Instead of reading each person's response to each question, the responses for each of the four possible answers to a question are totaled and reported as a number or a percent.  So now you get a summary report that may say, for example, in regard to the first question, 1,000 people selected answer A, 2,000 people selected B, 1,750 people selected C, and 250 people selected answer D.
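
A summary report like the one described above could be produced along the following lines.  This is a minimal sketch: the list of responses is made-up data for one question answered by 5,000 people, and the names used are hypothetical.

```python
from collections import Counter

# Hypothetical raw data: each entry is one person's answer to question 1.
responses = ["A"] * 1000 + ["B"] * 2000 + ["C"] * 1750 + ["D"] * 250

# Descriptive statistics: total the responses by category.
counts = Counter(responses)
total = sum(counts.values())

# Report each answer as a count and as a percent of all responses.
for answer in sorted(counts):
    share = counts[answer] / total * 100
    print(f"Answer {answer}: {counts[answer]:>5} responses ({share:.1f}%)")
```

Four lines of output replace 5,000 individual answers, which is the whole point of descriptive statistics.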

The second type of statistics is called inferential statistics.  Inferential statistics are used to analyze sample data and draw a conclusion about the population.  Statistical analysis is conducted on the sample data and a statistic is calculated.  The statistic is used to estimate an unknown population parameter.  For example, suppose we sampled 100 lakes in Indiana and determined that largemouth bass were present in 85% of the lakes in our sample.  Based upon our sample statistic, we could infer that approximately 85% of the lakes in Indiana (the population) have largemouth bass in them.

Inferential statistics are often used by businesses to assist with many decisions.  For example, if a company produces wheels for the Ford Motor Company, Ford will often have specifications for the size and weight of the wheels.  Tolerances or ranges for the weight and size are often specified.  How can the company making the wheels determine if it is meeting the specifications established by Ford?  By taking a sample of the wheels produced and measuring the size and weight of each wheel in the sample, the company can then determine if the wheels are meeting the specifications.  This is how inferential statistics are used.
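
The wheel example can be sketched as follows.  Everything specific here is an assumption made for illustration: the target weight, the tolerance, the simulated production weights, and the function name `within_spec` are all hypothetical and do not come from any real specification.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical specification: 25.0 lb target weight, +/- 0.5 lb tolerance.
TARGET_LB = 25.0
TOLERANCE_LB = 0.5


def within_spec(weight_lb: float) -> bool:
    """Return True if a wheel's weight falls inside the tolerance range."""
    return abs(weight_lb - TARGET_LB) <= TOLERANCE_LB


# Simulated production run of 10,000 wheels (made-up weights).
production = [random.gauss(25.0, 0.2) for _ in range(10_000)]

# Measure a sample of 50 wheels instead of all 10,000.
sample = random.sample(production, 50)
share_in_spec = sum(within_spec(w) for w in sample) / len(sample)

print(f"{share_in_spec:.0%} of the sampled wheels meet the specification")
```

The share calculated from the sample is then used to judge the entire production run.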

TYPES OF VARIABLES


A variable is a characteristic that is possessed by each item, person, or object being studied.  Its value can differ from one item to the next and may be numeric or nonnumeric.

Within statistical research we commonly find two types of variables:  qualitative and quantitative.  A qualitative variable is a variable that is nonnumeric.  It is sometimes called an attribute.  Examples of qualitative data include the color of your car, the city you live in, and the type of home you live in.  When qualitative data is presented, it is often summarized by totals or percentages.  For example, 20% of the new cars sold in 2003 were white in color.  When you are using qualitative data many of the simple statistical techniques become meaningless.  Often qualitative data is assigned a number and entered into a database for reporting purposes.  But if you were collecting data on car color preferences, does it make sense to calculate an average color?  No!  Even if a number is assigned to each color, the average of those numbers has no meaning, so you must be careful how you use and present qualitative data.  Instead you will likely report the number or percent of people who own a white car, red car, blue car, etc.
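
The car color point can be shown with a short sketch.  The coding scheme (1 = white, 2 = red, 3 = blue) and the ten survey answers are made up for illustration.

```python
import statistics
from collections import Counter

# Hypothetical coding scheme used to enter colors into a database.
color_codes = {1: "white", 2: "red", 3: "blue"}

# Made-up coded answers from ten car owners.
coded_responses = [1, 1, 3, 2, 1, 3, 3, 1, 2, 1]

# The software will happily compute this, but an "average color" of 1.8
# means nothing: the codes are labels, not amounts.
meaningless_average = statistics.mean(coded_responses)

# Meaningful summaries for qualitative data: counts and the most common value.
colors = [color_codes[code] for code in coded_responses]
counts = Counter(colors)
most_common_color = statistics.mode(colors)

print(f"Average of the codes (meaningless): {meaningless_average}")
for color, count in counts.items():
    print(f"{color}: {count} cars ({count / len(colors):.0%})")
print(f"Most common color: {most_common_color}")
```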

Our primary focus in this course is on quantitative variables.  A quantitative variable has a numerical value.  The amount of cash in your wallet, your age or weight, and the distance you travel to work or college from home are all quantitative variables.

A quantitative variable can be either discrete or continuous.  A discrete variable can assume only specific, separate values, usually whole numbers.  It can be counted.  Examples of discrete variables include the number of floors in the Empire State Building in New York, the number of cards in a standard deck of cards, or the number of people enrolled at a local college.  A continuous variable is different.  It can assume any value within a given range, limited only by the precision of the measurement.  Your weight is a great example of a continuous variable.  Depending upon the precision of the measuring device, your weight could be 150 pounds, 149.7 pounds, 149.72 pounds, etc.  In most situations continuous variables occur when you measure something.

CLASSIFICATION AND MEASUREMENT OF DATA

Data is classified into four different categories based upon its characteristics and how the data is measured.  The types of statistical analysis that can be conducted on data depend upon the category it falls in.  The four categories are:  nominal, ordinal, interval, and ratio.  Let's take a look at each category.

  1. Nominal scale data.  Nominal data does not have any order to it.  The data can only be counted and classified by categories or labels.  Examples of nominal data include gender, the color of your hair, or the types of automobiles made (sports cars, luxury cars, SUVs, etc.).  Survey answers that are yes or no are another example of nominal data.  Even if numbers are used to classify data, the numbers themselves have no meaning other than as a label or category.

  2. Ordinal scale data.  Ordinal data is data that has some type of ranking or order to it.  Ordinal data has the properties of nominal data, but the order or rank is meaningful.  Many of us are acquainted with ordinal data which is found often on surveys.  For example, if a survey asked for your opinion of this course, there could be five possible answers:  excellent, very good, good, fair, or poor.  The answers have a rank or value associated with them.  Sometimes numbers are used to represent the possible answers:  for example 1 = excellent, 2 = very good, etc.  This is ordinal data. 

  3. Interval scale data.  Interval data has the properties of ordinal data, and in addition a fixed, measurable difference exists between values.  The data contains an order that is based upon the amount of a characteristic it possesses.  The value of zero (0) is arbitrary and does not mean that none of the characteristic is present.  Examples of interval data include temperature measured in degrees Fahrenheit or Celsius and standardized test scores like the ACT, SAT, and the GRE.  Interval data is always numeric.

  4. Ratio scale data.  Ratio data has the highest possible level of measurement.  In addition to all of the characteristics of interval data, it has a meaningful value of zero (0), which indicates that none of the characteristic being measured is present.  Examples of ratio data include distance, height, weight, and the cost of a good or service.  For example, if you purchased a new TV today for $600, and the price of the DVD player that you also purchased was $150, since the data is ratio data you could say that the price of the TV was 4 times the price of the DVD player ($600/$150).  The most significant difference between ratio and interval data is that you can make comparisons like the TV and DVD player with ratio data, but you cannot do the same with interval data.  Is 100 degrees twice as hot as 50 degrees?  The number is twice as large, but because zero degrees on the Fahrenheit or Celsius scale does not mean that no heat is present, you cannot conclude that 100 degrees is twice as hot as 50 degrees.
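
One way to see the difference between ratio and interval data is to change the unit of measurement and check whether the "times as much" comparison survives.  The sketch below assumes the temperatures in the example are in degrees Fahrenheit; the function names are chosen only for illustration.

```python
def fahrenheit_to_celsius(deg_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


def dollars_to_cents(dollars: float) -> float:
    """Convert a price from dollars to cents."""
    return dollars * 100


# Ratio data: the TV costs 4 times as much in dollars AND in cents.
price_ratio_dollars = 600 / 150
price_ratio_cents = dollars_to_cents(600) / dollars_to_cents(150)

# Interval data: "twice as hot" in Fahrenheit is not "twice" in Celsius.
temp_ratio_f = 100 / 50
temp_ratio_c = fahrenheit_to_celsius(100) / fahrenheit_to_celsius(50)

print(f"Price ratio in dollars: {price_ratio_dollars:.2f}")
print(f"Price ratio in cents:   {price_ratio_cents:.2f}")
print(f"Temperature ratio in F: {temp_ratio_f:.2f}")
print(f"Temperature ratio in C: {temp_ratio_c:.2f}")
```

The price ratio is 4 in either unit because zero dollars and zero cents both mean "no cost".  The temperature ratio changes from 2 to about 3.78 because zero falls at a different, arbitrary point on each scale.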

SAMPLING


So far we have looked at some of the basic terms and concepts associated with the use of statistics.  A sample was defined as a survey or examination of a portion or part of the population.  Why do we choose to sample instead of examining the entire population?

  1. First of all a sample is less costly than examining the entire population (census).  Whether you are studying lakes in Indiana or wheels produced by a company, conducting a sample will be less costly than examining the entire population.
  2. Another reason why a sample is preferred over a census is that a sample can be conducted in a shorter time period.  If decisions need to be made quickly, a sample can save you time over a census since there is less data to compile and analyze.
  3. The third reason why a sample is preferred over a census is that you may not be able to obtain a census of the population.  Suppose we were interested in studying the population of mosquitoes in northern Michigan during the month of June.  Is it possible to conduct a census and count the number of mosquitoes?  No!  Therefore we use sampling and inferential statistics to estimate the population of mosquitoes in northern Michigan.
  4. Finally, a sample, if it is conducted properly, will provide you with statistics that are very reliable.  There is usually no need to conduct a census if a sample can give you a statistically reliable estimate of the population parameter you wish to determine.

We will look at statistical sampling techniques and methods to calculate the size of a sample needed in a later module.  It is important to understand that how a sample is obtained from a population is an important factor in determining the validity of the sample.  There are two basic methods used to collect sample data from a population.

  1. Non-probability sample.  A non-probability sample is simply a sample that is based upon the researcher's experience, knowledge, or judgment.  Included within this sampling method is voluntary sampling, which is a sampling technique where people are asked to participate.  Voluntary sampling is found in advertisements in the newspaper and on television, and it is also found on the Internet.  Have you ever participated in one of those online polls at some of the news websites like www.cnn.com, www.msnbc.com, or www.foxnews.com?  If so, you have participated in a voluntary non-probability sample.  Another non-probability sampling technique is called a convenience sample.  A convenience sample is a research technique where the researcher goes to a location where a large number of people can be found and conducts his or her research there in order to get the most data.  An example of this technique is a political candidate who is collecting information about a local issue important to voters.  Instead of sending out a mailing to all of the voters asking for their opinion, a person is placed at the local shopping mall and hands out questionnaires to people who walk by.

  2. Probability sample.  A probability sample is a sample that has been obtained using one of four techniques listed below.  When probability sampling is used, the researcher is already aware of the chances of each person or object in a population being selected.

This is by no means a complete list of all possible sampling techniques.  Depending upon the type of business or research area, other techniques may be used.  The listed techniques are the most popular ones used and have many years of research experience.
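
As an illustration of what it means for the researcher to know each member's chance of selection in advance, here is a minimal sketch of a simple random sample, one widely used probability sampling technique.  The population of 2,000 student ID numbers and the sample size of 100 are assumptions made for this example.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical list of every member of the population (student ID numbers).
population_ids = list(range(1, 2001))
sample_size = 100

# Known before any selection is made: each member has the same chance.
selection_chance = sample_size / len(population_ids)

# Draw the sample without replacement, so no one is selected twice.
sample_ids = random.sample(population_ids, sample_size)

print(f"Chance of any one student being selected: {selection_chance:.1%}")
print(f"First five IDs selected: {sample_ids[:5]}")
```

Contrast this with the shopping mall example above, where the chance of any particular voter being surveyed is unknown.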

EVALUATING STATISTICAL DATA


We are exposed to a large amount of statistical data every day.  We see and hear it on the television and often read articles in newspapers and magazines that use statistical data and charts.  One question we need to ask whenever we are presented with statistical data is:  Is the data accurate?  Many of the articles we read and newscasts we watch contain some form of bias.  Bias is the intentional or unintentional alteration of data and/or analysis results.  Bias occurs when the data is misinterpreted or if a critical part is left out.  Bias also occurs through the selection of words that may slant or taint the data or the analysis.

For example, suppose a survey was conducted to obtain the opinion of voters regarding a proposed ballot issue.  Bias could occur as a result of who was surveyed, when they were surveyed, how they were surveyed, and the types of questions asked.  Bias can also occur within a question itself.  The wording of the question may lead people to answer it in a specific way that is favorable towards one position or another.  The results of the survey itself may not be fully disclosed.  If 10 questions were asked and only 5 of them were favorable to a particular position, the responses to the other 5 questions may not be disclosed.

We can also find bias in graphs that are used to show and explain data.  The scale could be changed to make it appear that differences are larger or smaller than they really are.  So, what questions should we ask when we evaluate statistical data?

  1. Who is the source of the information?  Finding out how the data was collected and the sources may be difficult.  However it is important to know who obtained the data and how it was obtained in order to effectively evaluate it.  Often another question is asked:  Who paid for the study?  Finding out who paid for a study may provide you with information that helps you determine if a study is valid.  For example, if an independent lab pays for and conducts a test on a new drug you will likely be more comfortable with the results than if the pharmaceutical company that invented the drug pays for and conducts the test itself.

  2. What evidence is offered to support the information or findings?  Did the data come from the government, or was it obtained from a survey?  If a survey was used, what did the survey ask for?  If the data came from the government, what was the source of the data and was all of the data included?

  3. What information is missing?  Was all of the data disclosed or were there parts missing that might not support the findings?  Is there other data that might contradict the data that was disclosed?

  4. Is the conclusion reasonable based upon the evidence given?  Did the data support the conclusion or is another conclusion possible?  Was the conclusion logical and verifiable based upon the evidence?  Was the statistical information valid?

Always carefully review data that was obtained from a sample.  How the sample was obtained, its size and how the data was collected are all extremely important considerations when evaluating research data obtained from a sample.  Understand that any statistics used may support the findings of the research but additional statistics that were not used could refute the study.  For example, by using the mean instead of the median, the analysis could produce different results because the mean can be pulled up or down by a few extremely large or small values while the median is largely unaffected by them.  We will learn more about the mean and median in future units as well as the many methods that can be used to analyze data and make decisions.
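
The effect of one extreme value on the mean and the median can be seen in a small sketch.  The seven salaries below are made-up numbers chosen for illustration.

```python
import statistics

# Hypothetical annual salaries at a small company; the last is the owner's.
salaries = [42_000, 45_000, 47_000, 50_000, 52_000, 55_000, 400_000]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

print(f"Mean salary:   ${mean_salary:,.0f}")
print(f"Median salary: ${median_salary:,.0f}")
```

The mean comes out to roughly $98,700, which is more than six of the seven people earn, while the median is $50,000.  A report that quotes only the mean would give a very different impression of a "typical" salary than one that quotes the median.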

Statistical software packages today have made it possible for researchers who use statistics to analyze and produce statistical information at a faster pace than ever before.  Products like Excel®, Minitab®, SPSS®, SAS®, and other software packages assist researchers with the calculation of statistical data.  Although this can greatly reduce the time necessary to produce statistical data, the researcher still needs to understand the types of statistical analysis available, choose the best one for a particular project, and be able to interpret the output and how it applies to the problem that has been identified.

This is a great time to get acquainted with Excel.  We will be using Excel throughout the course whenever possible to do our calculations for us.

It is important that you verify that your copy of Excel contains the add-in called Analysis ToolPak.  Analysis ToolPak is used for many of the statistical tasks we have in this course.  Follow the steps below to verify and load Analysis ToolPak.

Excel 2007

Add In Instructions

Make sure you select Analysis ToolPak as the add-in you wish to load.  If Analysis ToolPak is not listed, you will need to either reinstall Excel or download the add-in from Microsoft.

Excel 2003

1.  Click on the Tools menu option.

2.  Look for Add-Ins in the menu options.

3.  Click on Add-Ins.

4.  The Add-Ins window should appear, listing the available add-ins.

5.  Click in the box next to Analysis ToolPak.

6.  Now click OK.

7.  Analysis ToolPak will be loaded each time you start Excel.

8.  If the option for Analysis ToolPak was not listed under Add-Ins, you will need to either get your original CDs with Excel and reinstall Excel, or download the add-in from Microsoft.

The core purpose of this course is to provide you with the knowledge and tools necessary for you to apply the correct statistical analysis to a problem, organize and analyze the data, and make the correct decision.  Because of the many statistical software packages available today, there is less emphasis on calculating the statistical information and more emphasis on interpreting the information generated and making a decision.  That is what statistics is used for in the business sector today. 

Joke Time!

A somewhat advanced society in the future has figured out how to package basic knowledge in pill form.

A student, needing some learning, goes to the pharmacy and asks what kind of knowledge pills are available.  The pharmacist says "Here's a pill for English literature."  The student takes the pill and swallows it and has new knowledge about English literature.

"What else do you have?" asks the student.  "Well, I have pills for art history, biology, and world history," replies the pharmacist.  The student then asks, "Do you have a pill for statistics?"  The pharmacist says "Wait just a moment", and goes back into the storeroom and brings back a whopper of a pill that is about twice the size of a jawbreaker candy and plunks it on the counter.  "I have to take that huge pill for statistics?" inquires the student.

The pharmacist understandingly nods his head and replies:  "Well, you know, statistics always was a little hard to swallow."

Source:  M. Holtz


SUMMARY


In this unit we have learned about many of the terms associated with the use of statistics.  We have also discussed sampling methods, types of variables and data, and some of the problems associated with reviewing and analyzing statistical data.  Each new unit builds upon the concepts and ideas discussed in previous units, so make sure you are comfortable with all of the terms and concepts presented so far before you continue on to the next unit.

ASSIGNMENT


The homework problems for unit 1 can be found by going to the Course Documents link in Blackboard, and clicking on the link for Unit 1.  Look for the Unit 1 - Assignment link.  Once you have completed your homework assignment, you will need to post your answers on Blackboard®.  Once you have signed into Blackboard, simply go to the Unit 1 - Assignment 1 - Post Answers Here link and post your answers.  Immediate feedback is provided once you have completed the posting of all of your answers and clicked on submit.  Make sure you print the entire submitted homework assignment to assist you with quizzes and tests.

©2008, 2007 by E.H. McKay, III.  Version 5.0

Some Images © 2006, 2004 by Clipart.com.