Representation of data
Basic Terminology
- Data: The quantities, characters, or symbols on which operations are performed by a computer.
- Representation of Data: The presentation of data in a form that is easy to read and interpret, such as tables, charts, and graphs.
Types of Data
- Quantitative Data: Consists of numerical values. It can be further divided into Discrete Data (countable values, such as the number of students in a class) and Continuous Data (measurable values, such as height or temperature).
- Qualitative Data (Categorical Data): Records non-numeric information, such as gender, race, or religion.
Graphical Representation of Data
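- Histograms: Show the distribution of continuous data by grouping values into adjacent intervals (bins); the height of each bar shows how many values fall in that bin.
- Bar Charts: Represent categorical data with rectangular bars. Each bar has a height or length proportional to the value it represents.
- Pie Charts: Used to compare parts of a whole. Each sector represents a category, and its angle is proportional to that category's share of the total.
- Line Graphs: Show trends over time or along another continuous scale.
- Scatter Plots: Display pairs of values for two variables, one variable on each axis, to reveal any relationship between them.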
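To make the idea of "bins" concrete, here is a minimal sketch of how a histogram counts continuous values. It assumes equal-width, half-open bins and ignores values outside the binned range; the function name `histogram_counts` and the height values are made up for illustration, and plotting libraries may treat bin edges slightly differently.

```python
def histogram_counts(values, start, width, num_bins):
    """Count values in equal-width, half-open bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * num_bins
    for v in values:
        # Floor division gives the index of the bin the value falls into
        index = int((v - start) // width)
        # Values outside [start, start + num_bins*width) are ignored in this sketch
        if 0 <= index < num_bins:
            counts[index] += 1
    return counts


# Made-up heights in cm, purely for illustration
heights = [151.2, 158.0, 160.5, 163.3, 165.0, 167.8, 170.1, 172.4, 178.9, 181.0]
counts = histogram_counts(heights, start=150, width=10, num_bins=4)

# Text "bars": each # stands for one value in that bin
for i, c in enumerate(counts):
    low = 150 + i * 10
    print(f"[{low}, {low + 10}) {'#' * c}")
```

The counts come out as 2, 4, 3 and 1 for the four bins; a plotting tool would draw these as adjacent bars with no gaps, which is what distinguishes a histogram from a bar chart of categories.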
Measures of Central Tendency
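- Mean: The sum of all data points divided by the number of data points (the arithmetic average).
- Median: The middle value when the data is arranged in ascending or descending order; with an even number of values, it is the average of the two middle values.
- Mode: The value that appears most frequently; a data set can have more than one mode.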
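The three definitions above can be sketched directly in Python. This is an illustrative implementation with hypothetical function names (`mean`, `median`, `modes`) and a made-up data set; Python's standard `statistics` module also provides `mean`, `median` and `multimode`, which give equivalent results for inputs like these.

```python
from collections import Counter


def mean(data):
    # Sum of all values divided by the number of values
    return sum(data) / len(data)


def median(data):
    # Middle value of the sorted data; with an even count, average the two middle values
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(data):
    # All values tied for the highest frequency (a data set may have several modes)
    counts = Counter(data)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


scores = [7, 3, 10, 2, 3, 5]
print(mean(scores))    # 5.0
print(median(scores))  # 4.0  (sorted: 2, 3, 3, 5, 7, 10 -> (3 + 5) / 2)
print(modes(scores))   # [3]
```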
Measures of Dispersion
- Range: The difference between the highest and lowest values.
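- Interquartile Range (IQR): The difference between the upper quartile (Q3) and the lower quartile (Q1), which measures the spread of the middle 50% of the data.
- Standard Deviation: The square root of the variance. It measures how far values typically lie from the mean, in the same units as the data.
- Variance: The average of the squared differences from the mean. When a sample is used to estimate the variance of a population, the sum of squared differences is usually divided by n - 1 instead of n.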
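A minimal sketch of the four measures of dispersion, assuming the quartile convention of `statistics.quantiles(..., method="inclusive")`. Textbooks and software use several slightly different quartile rules, so Q1, Q3 and the IQR may not match every tool exactly. The function name `dispersion` and the data set are illustrative.

```python
import math
import statistics


def dispersion(data):
    """Return range, IQR, population variance, sample variance, population std deviation."""
    ordered = sorted(data)
    n = len(ordered)
    mean = sum(ordered) / n
    data_range = ordered[-1] - ordered[0]
    # Quartiles using one common convention (linear interpolation, "inclusive");
    # other quartile rules can give slightly different Q1 and Q3.
    q1, _, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    squared_diffs = [(x - mean) ** 2 for x in ordered]
    population_variance = sum(squared_diffs) / n    # average squared difference
    sample_variance = sum(squared_diffs) / (n - 1)  # divisor n - 1 for a sample
    std_dev = math.sqrt(population_variance)        # back in the data's own units
    return data_range, iqr, population_variance, sample_variance, std_dev


result = dispersion([2, 4, 4, 4, 5, 5, 7, 9])
print(result)  # range 7, IQR 1.5, population variance 4.0, ..., std deviation 2.0
```

For the same data, `statistics.pvariance`, `statistics.variance`, `statistics.pstdev` and `statistics.stdev` return the population and sample versions directly.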
Probability
- Probability quantifies uncertainty: it measures how likely an event is to occur. It is a value between 0 and 1, where 0 indicates impossibility and 1 indicates certainty.
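One simple way to compute a probability is the classical definition: favorable outcomes divided by total outcomes. The sketch below assumes every outcome is equally likely (fair dice); it does not apply to unequal outcomes. The helper name `classical_probability` is hypothetical, and `Fraction` keeps the results exact.

```python
from fractions import Fraction
from itertools import product


def classical_probability(is_favorable, outcomes):
    # Classical definition: favorable outcomes / total outcomes.
    # Only valid when every outcome is equally likely (an assumption here).
    outcomes = list(outcomes)
    hits = sum(1 for o in outcomes if is_favorable(o))
    return Fraction(hits, len(outcomes))


die = range(1, 7)
two_dice = list(product(die, repeat=2))  # all 36 ordered pairs of two dice

print(classical_probability(lambda x: x % 2 == 0, die))            # 1/2
print(classical_probability(lambda x: x == 7, die))                 # 0 (impossible)
print(classical_probability(lambda x: x <= 6, die))                 # 1 (certain)
print(classical_probability(lambda p: p[0] + p[1] == 7, two_dice))  # 1/6
```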
Hypothesis Testing
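- It is a procedure that uses sample data to test the validity of a claim made about a population.
- The claim is expressed as two competing statements: the null hypothesis (H0), usually a statement of no effect or no difference, and its opposite, the alternative hypothesis (H1).
- The conclusion is drawn from the statistical evidence obtained, typically by comparing a p-value with a chosen significance level (α, commonly 0.05): either Reject H0 or Fail to reject H0.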
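As one concrete instance of this procedure, here is a sketch of a two-sided one-sample z-test. It is only one of many hypothesis tests and is chosen here for illustration. It assumes the population standard deviation is known and the sample mean is approximately normal. The function name `one_sample_z_test`, the 500 ml claim, sigma = 4 ml and the sample values are all made up.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test of H0: population mean == mu0 vs H1: mean != mu0.

    Assumes the population standard deviation sigma is known and that the
    sample mean is (approximately) normally distributed.
    """
    n = len(sample)
    sample_mean = sum(sample) / n
    # Test statistic: number of standard errors between the sample mean and mu0
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # p-value: chance of a result at least this extreme if H0 were true
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    decision = "Reject H0" if p_value < alpha else "Fail to reject H0"
    return z, p_value, decision


# Made-up example: H0 says a machine fills bottles with 500 ml on average (sigma = 4 ml)
fills = [495, 498, 496, 499, 497, 494, 500, 497,
         496, 498, 495, 499, 497, 498, 496, 497]
z, p, decision = one_sample_z_test(fills, mu0=500, sigma=4)
print(f"z = {z:.2f}, p = {p:.4f}, {decision}")  # z = -3.00, p = 0.0027, Reject H0
```

Note that "Fail to reject H0" is not the same as proving H0 true; it only means the sample did not provide strong enough evidence against it.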
Regression Analysis
- It is used to model the relationship between a dependent variable and one or more independent variables.
- Simple Linear Regression fits a straight line through a set of n points so that the sum of squared residuals (the residuals being the vertical distances between the data points and the fitted line) is as small as possible. This is known as the method of ordinary least squares.
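The least-squares line has a closed-form solution: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The sketch below implements that formula with hypothetical helper names (`fit_line`, `sum_squared_residuals`) and a small made-up data set, then checks that nudging the slope increases the sum of squared residuals.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx                # closed-form least-squares slope
    intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    # Residual = observed y minus the y predicted by the line (a vertical distance)
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")  # y = 2.20 + 0.60x
best = sum_squared_residuals(xs, ys, a, b)
# Nudging the slope in either direction only increases the sum of squared residuals
print(best < sum_squared_residuals(xs, ys, a, b + 0.1))  # True
print(best < sum_squared_residuals(xs, ys, a, b - 0.1))  # True
```

On Python 3.10 and later, `statistics.linear_regression(xs, ys)` returns the same slope and intercept.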
Remember that proper data representation forms a vital part of Data Analysis. Good luck with your learning!