# Data Analysis - Descriptive and Inferential Statistics

Descriptive Statistics

Descriptive statistics can be useful for two purposes: 1) to provide basic information about the variables in a data set and 2) to highlight potential relationships between variables. The most common descriptive statistics can be displayed graphically or pictorially and fall into four groups:
1. Graphical/Pictorial Methods
2. Measures of Central Tendency
3. Measures of Dispersion
4. Measures of Association

Graphical/Pictorial Methods
There are several graphical and pictorial methods that enhance researchers' understanding of individual variables and the relationships between variables. These methods provide a visual representation of the data. Some of them include:
• Histograms
• Scatter plots
• Geographic Information Systems (GIS)
• Sociograms
Histograms
• Visually represent the frequencies with which values of a variable occur
• Each value of a variable is displayed along the bottom of the histogram, and a bar is drawn for each value
• The height of the bar corresponds to the frequency with which that value occurs
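The counting step behind a histogram can be sketched in a few lines of Python. This is only an illustration: the `values` list is a made-up sample (number of children in ten households), not data from this article, and the bars are drawn as text rather than with a plotting library.

```python
from collections import Counter

# Hypothetical sample: number of children in ten surveyed households.
values = [0, 1, 1, 2, 2, 2, 3, 3, 4, 2]

# Count how often each distinct value occurs.
frequencies = Counter(values)

# Draw one text bar per value; the bar length equals the frequency.
for value in sorted(frequencies):
    print(f"{value}: {'#' * frequencies[value]}")
```

Each printed row plays the role of one bar: the label is the value and the number of `#` characters is its frequency.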
Scatter plots
• Show the relationship between two quantitative or numeric variables by plotting the values of one variable against the values of another.
• For example, one axis of a scatter plot could represent height and the other could represent weight. Each person in the data would receive one data point on the scatter plot that corresponds to his or her height and weight
Geographic Information Systems (GIS)
• A GIS is a computer system capable of capturing, storing, analyzing, and displaying geographically referenced information; that is, data identified according to location.
• Using a GIS program, a researcher can create a map to represent data relationships visually
Sociograms
• Display networks of relationships among variables, enabling researchers to identify the nature of relationships that would otherwise be too complex to conceptualize.
Measures of Central Tendency
Measures of central tendency are the most basic and, often, the most informative description of a population's characteristics. They describe the "average" member of the population of interest. There are three measures of central tendency:
• Mean -- the sum of a variable's values divided by the total number of values
• Median -- the middle value of a variable when its values are sorted
• Mode -- the value that occurs most often
Example:
The incomes of five randomly selected people in the United States are \$10,000, \$10,000, \$45,000, \$60,000, and \$1,000,000.

Mean income = (10,000 + 10,000 + 45,000 + 60,000 + 1,000,000) / 5 = \$225,000
Median income = \$45,000
Modal income = \$10,000
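As a quick check, the same three figures can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the variable names are chosen here for illustration.

```python
import statistics

# The five incomes from the example above, in dollars.
incomes = [10_000, 10_000, 45_000, 60_000, 1_000_000]

mean_income = statistics.mean(incomes)      # sum of values / number of values
median_income = statistics.median(incomes)  # middle value of the sorted data
modal_income = statistics.mode(incomes)     # most frequently occurring value

print(mean_income, median_income, modal_income)
```

The single \$1,000,000 income pulls the mean far above the median, which is why the two measures can tell very different stories about the same data.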

The mean is the most commonly used measure of central tendency. Medians are generally used when a few values are extremely different from the rest of the values (this is called a skewed distribution). For example, the median income is often the best measure of the average income because, while most individuals earn between \$0 and \$200,000, a handful of individuals earn millions.

Measures of Dispersion
Measures of dispersion provide information about the spread of a variable's values. There are four key measures of dispersion:
• Range
• Variance
• Standard Deviation
• Skew
Range is simply the difference between the smallest and largest values in the data. The interquartile range is the difference between the values at the 75th percentile and the 25th percentile of the data.

Variance is the most commonly used measure of dispersion. It is calculated by taking the average of the squared differences between each value and the mean.

Standard deviation, another commonly used statistic, is the square root of the variance.

Skew is a measure of whether some values of a variable are extremely different from the majority of the values. For example, income is skewed because most people make between \$0 and \$200,000, but a handful of people earn millions. A variable is positively skewed if the extreme values are higher than the majority of values. A variable is negatively skewed if the extreme values are lower than the majority of values.

Example:
The incomes of five randomly selected people in the United States are \$10,000, \$10,000, \$45,000, \$60,000, and \$1,000,000:

Range = 1,000,000 - 10,000 = 990,000
Variance = [(10,000 - 225,000)² + (10,000 - 225,000)² + (45,000 - 225,000)² + (60,000 - 225,000)² + (1,000,000 - 225,000)²] / 5 = 150,540,000,000
Standard Deviation = square root (150,540,000,000) = 387,995
Skew = income is positively skewed.
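The arithmetic above can be verified with a short Python sketch. Because the example divides by the number of values (5), it matches the population variance, `statistics.pvariance`; the sample variance, `statistics.variance`, would divide by 4 instead. The skewness coefficient used here (mean cubed deviation divided by the cubed standard deviation) is one common definition and is an assumption of this sketch, since the article only states the direction of the skew.

```python
import statistics

# The five incomes from the example above, in dollars.
incomes = [10_000, 10_000, 45_000, 60_000, 1_000_000]

income_range = max(incomes) - min(incomes)
variance = statistics.pvariance(incomes)  # average squared deviation from the mean
std_dev = statistics.pstdev(incomes)      # square root of the variance

# Moment-based skewness: positive when the extreme values lie above the bulk.
mean_income = statistics.mean(incomes)
mean_cubed_dev = sum((x - mean_income) ** 3 for x in incomes) / len(incomes)
skewness = mean_cubed_dev / std_dev ** 3

print(income_range, variance, round(std_dev), skewness > 0)
```

A positive `skewness` agrees with the conclusion that income is positively skewed.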

Measures of Association
Measures of association indicate whether two variables are related. Two measures are commonly used:
1. Chi-square
2. Correlation
Chi-Square
• As a measure of association between variables, chi-square tests are used on nominal data (i.e., data that are put into classes: e.g., gender [male, female] and type of job [unskilled, semi-skilled, skilled]) to determine whether they are associated.
• A chi-square is called significant if there is an association between two variables, and nonsignificant if there is not an association
To test for associations, a chi-square is calculated in the following way: Suppose a researcher wants to know whether there is a relationship between gender and two types of jobs, construction worker and administrative assistant. To perform a chi-square test, the researcher counts up the number of female administrative assistants, the number of female construction workers, the number of male administrative assistants, and the number of male construction workers in the data. These counts are compared with the number that would be expected in each category if there were no association between job type and gender (this expected count is based on statistical calculations). If there is a large difference between the observed values and the expected values, the chi-square test is significant, which indicates there is an association between the two variables.
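The procedure just described can be sketched in plain Python. The counts in `observed` are hypothetical, invented only to illustrate the calculation. Each expected count is taken as (row total × column total) / grand total, the standard formula under the assumption of no association, and 3.841 is the usual critical value for one degree of freedom at the 5% significance level.

```python
# Hypothetical observed counts (rows: gender, columns: job type).
observed = [
    [40, 10],  # female: administrative assistants, construction workers
    [15, 35],  # male:   administrative assistants, construction workers
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count in each cell if gender and job type were unrelated.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

CRITICAL_VALUE = 3.841  # 1 degree of freedom, 5% significance level
is_significant = chi_square > CRITICAL_VALUE
print(round(chi_square, 2), is_significant)
```

With these made-up counts the statistic is far above the critical value, so the test would be called significant. In practice a statistics package would also report the exact p-value.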

*The chi-square test can also be used as a measure of goodness of fit, to test if data from a sample come from a population with a specific distribution, as an alternative to the Anderson-Darling and Kolmogorov-Smirnov goodness-of-fit tests. As such, the chi-square test is not restricted to nominal data; with non-binned data, however, the results depend on how the bins or classes are created and on the size of the sample.

Correlation
• A correlation is used to measure the strength of the relationship between numeric variables (e.g., weight and height)
• The most common correlation coefficient is Pearson's r, which can range from -1 to +1.
• If the coefficient is between 0 and 1, as one variable increases, the other also increases. This is called a positive correlation. For example, height and weight are positively correlated because taller people usually weigh more.
• If the coefficient is between -1 and 0, as one variable increases, the other decreases. This is called a negative correlation. For example, age and hours slept per night are negatively correlated because older people usually sleep fewer hours per night
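A minimal sketch of Pearson's r in plain Python may make the sign convention concrete. The helper `pearson_r` and all four data lists are hypothetical, written only for this illustration. The formula is the usual one: the sum of cross-products of deviations divided by the product of the square roots of the summed squared deviations.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations of each variable.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cross_products / (spread_x * spread_y)

# Hypothetical data: height (cm) vs. weight (kg), and age vs. hours slept.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 77, 88]
ages = [20, 30, 40, 50, 60]
hours_slept = [8.5, 7.8, 7.6, 7.0, 6.4]

r_height_weight = pearson_r(heights, weights)  # positive: both rise together
r_age_sleep = pearson_r(ages, hours_slept)     # negative: one rises, other falls
print(r_height_weight, r_age_sleep)
```

The first coefficient comes out positive and the second negative, matching the two cases described in the bullets above.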