
Statistics: Definition, Concept, Data, Types and How to Process It

Statistics functions as a tool for decision making, because a sound decision cannot be made without accurate data analysis.

Statistical information is obtained from survey responses, direct interviews with respondents, or other data collection techniques. You can read a review of statistics in the sections below.

Definition of Statistics

You will be able to recognize statistics once you understand data. Data that is processed or displayed in the form of tables, charts, graphs, and other visual representations becomes statistics, and the discipline that studies how to do this is also called statistics. The word therefore refers both to the data and to the body of knowledge.

Statistical Definitions

Data management, including statistics, cannot be separated from research. Statistics is an important component of working with data, and it requires extra precision to be accurate. But do you really understand what statistics is?

The Big Indonesian Dictionary (KBBI) defines statistics as facts in the form of numbers that are collected, compiled, and tabulated to present information or conclusions about a subject. Statistics can also refer to a body of data that has been assembled, analyzed, and displayed in the form of a graph, table, or other similar format.

You can then draw conclusions or extract facts from there. Statistics, a kind of quantitative data, is widely used in many disciplines, including business, economics, marketing, and manufacturing; the field that studies it is likewise called statistics.

Types of Statistics

Statistics deals with collections of data, and it is classified into several categories depending on the criteria used. These are some categories of statistics:

1. Based on the Orientation of Discussion

Based on the orientation of the discussion, statistics is divided into mathematical statistics and applied statistics. Mathematical (theoretical) statistics emphasizes the conceptual derivation of methods such as normality and homogeneity tests, regression analysis, error analysis, and others.

Applied statistics, on the other hand, is primarily concerned with statistical concepts and methods as they are used in particular scientific fields.

2. Depending on the Stages and Purpose of Analysis

Based on the stage and purpose of the analysis, statistics is divided into descriptive and inferential types. The descriptive type covers procedures for collecting and presenting data in the form of tables, graphs, diagrams, measures such as the mode, and other formats, as stated previously.

Inferential statistics, by contrast, provides general predictions and conclusions about hypotheses generated from data or events.

3. Based on Assumptions About the Population Distribution

There are parametric statistics and nonparametric statistics, both of which are based on assumptions about the population distribution of the data. In short, parametric statistics assumes the data follow a particular distribution model, typically the normal distribution, while nonparametric statistics uses a distribution-free approach.

4. Based on the Number of Dependent Variables

Statistics is divided into univariate and multivariate categories based on the number of dependent variables. Univariate statistics involves only one dependent variable, while multivariate statistics involves many dependent variables.

Basic Concepts of Statistics

Statistics is a collection of information that includes numerical data as well as non-numeric data that is entered into a table or graph to represent a problem. Then what are the basic concepts of statistics?

Data and Variables

Variables are the objects, or focus, of research, while data are the recorded results of research, whether in the form of facts or figures. From this it is clear that data and variables are different from one another.

For example, suppose a researcher records the heights of 50 students. In this example, the data are the recorded heights of the 50 students, while the variable is height.

Variable values in research data should not be empty (no missing values), so that bias does not influence the conclusions drawn. If any values are missing, they must be filled in, for example using a regression-based missing value imputation. The main analysis can proceed once the missing values have been filled in.
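As a minimal sketch of filling in missing values before the main analysis, the simplest approach replaces each missing entry with the mean of the observed values; the regression-based imputation mentioned above follows the same idea with a better estimate. The data here are hypothetical.

```python
# Minimal sketch: replace missing values (None) with the mean of the
# observed values, so no observation is left empty before analysis.
def impute_mean(values):
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

heights = [160, 172, None, 168, None, 175]  # hypothetical student heights (cm)
print(impute_mean(heights))  # → [160, 172, 168.75, 168, 168.75, 175]
```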

Besides missing values, a common problem is the presence of outliers: observations in a data set whose pattern or scheme differs from that of the other observations.

Outliers can stem from genuinely extraordinary circumstances, such as a respondent's perspective being distorted for reasons unknown to the researcher, or from procedural errors in measurement or analysis.

Cook's distance, standardized residuals, scatter plots, boxplots, and the difference in fitted values (DFFITS) can all be used to identify outliers.
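Of these, the boxplot rule is the easiest to sketch: values more than 1.5 times the interquartile range below the first quartile or above the third are flagged. The data below are hypothetical, with one planted outlier.

```python
# Boxplot (IQR) rule for flagging outliers. statistics.quantiles with
# n=4 returns the three quartile cut points.
import statistics

def iqr_outliers(data):
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

sample = [21, 23, 23, 24, 25, 26, 26, 27, 90]  # 90 is a planted outlier
print(iqr_outliers(sample))  # → [90]
```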

Data collection

Data collection is the process of gathering the information needed to answer research questions.

For example, a teacher is interested in studying students' favorite subjects. The teacher must gather information about the number of students, the subjects, and the subject grades to find the answer.

So how do we collect this data?

Data can be collected using a variety of techniques, including surveys, opinion polls, interviews and direct counts.

Data processing

Data processing is a procedure for transforming unprocessed data into understandable information. Processing is required to turn raw data into usable information, because raw data usually comes as numbers or notes that mean little to the user on their own.

Data processing is usually done by a data scientist or a team of data scientists, and must be done properly so as not to harm the final product or data output. The raw data is first converted into a more understandable format (graphics, text, and so on), giving it the form and context needed for it to be interpreted by computers and used by people throughout an organization.

The department within an organization that is in charge of running data processing programs is often referred to simply as "Data Processing" (DP).

Data Interpretation

Your data must be correctly interpreted to be useful. This will make it easier for you to guarantee that you are basing your decisions and actions on accurate information.

In today's world, data is everywhere. There are two kinds of people and organizations: those who are overloaded by data or misuse it, and those who profit from it.

Data interpretation is the process of reviewing data and drawing conclusions from it with the help of various analysis techniques. Data analysis lets researchers categorize, manipulate, and summarize data to find answers to important questions.

In business terms, data interpretation means applying a set of procedures that examine and rework data in order to gain knowledge and identify new patterns and behaviors. With all the information at hand, these conclusions help you as a manager make decisions based on numbers.

Data Types

When it comes to the types of statistical data, there are actually many variations available depending on the nature and results of measurement. Just look at the explanation below.

Quantitative Data

Quantitative data is also known as numerical data, since it represents a numerical value, such as how much or how often. Quantitative data tells us how many of a particular object there are.

Size, weight, height and other quantitative variables are some examples. Based on the data set, there are two categories for quantitative data.

Discrete data and continuous data are the two different types of quantitative data. Calculations on quantitative data can be performed manually or automatically with an application such as SPSS (Statistical Product and Service Solutions).

Many measuring instruments, such as rulers, scales, stopwatches, thermometers, etc., are used to measure quantitative data.

Examples of quantitative data that are often encountered, such as:

Number of residents (population count)

Amount of water (e.g., 1.7 liters)

Weight (in grams, kilograms, tonnes)

Time (in seconds, minutes, hours, days or years)

Temperature (in degrees Celsius, Fahrenheit or Kelvin)

Qualitative Data

Qualitative data is also known as categorical data, because it describes data in terms of categories. Qualitative data is not numeric.

The qualitative variables in this data describe characteristics such as a person's gender, hometown, language, religion, and others.

The categories are defined by descriptive attributes rather than by numbers.

Qualitative data can sometimes be quantified. However, unlike quantitative data, these values have no mathematical meaning.

Some examples of qualitative information are:

• date of birth

• favorite sport

• school zip code

• the color of the car in the parking lot

• student grades in class

In this case, the school zip code and date of birth have a numeric form but no quantitative meaning. Nominal and ordinal data are included in the qualitative data category. Read on to see how this kind of statistical data breaks down.

Secondary Data

Researchers will inevitably use secondary data, i.e. information gathered from primary sources by previous studies, in their own investigations. This type of data originates with researchers who collect it for a specific purpose and then make it available for use by other researchers.

In addition, this information may be collected for projects without a clear scientific purpose, such as a national census. Researchers use it to find alternative perspectives on original subjects from previous research or to provide answers to problems from new research.

Primary data

A kind of study data known as primary data is collected for the first time from experience or direct testimony. Because primary data is real and unbiased, it is often used in research. Primary data can also be referred to as raw data or first-hand knowledge.

Primary data is usually collected using a variety of methods, including personal interviews, questionnaires, surveys, physical examinations, and observation. For this reason, especially in quantitative research, primary data can be obtained by sampling rather than by surveying the entire population.

Primary data is information generated independently by researchers to answer questions posed by the study problem formulation.

Method of collecting data

The procedures that researchers use to obtain data are an important part of the research. The analysis process will be challenging if mistakes are made during the data collection phase. In addition, if data collection is not carried out properly, the results and conclusions drawn will be confusing.

Depending on the type of research the researcher intends to do, each study has its own data collection procedures. Quantitative data collection will certainly differ from qualitative data collection. Moreover, data collection techniques should not be conflated with data analysis techniques.

Collecting data for research should not be haphazard. There are procedures and data collection methods that must be used. The purpose of data collection procedures and methodologies is to collect accurate data, ensuring the validity of research findings and conclusions.

Observation

The observation method is a data collection strategy that involves making a number of recordings of the state or activity of the target object in addition to observing it. The practice of observation can also be seen as an action directed at a process or object that involves sensing and understanding phenomena.

This is done based on known concepts and knowledge, so that a lot of information is collected to advance ongoing research.

The observation method is a way of collecting data that is used to verify the validity of the research design used. This involves attentively observing, reviewing, and being present at the study site.

This observation activity is carried out on an object in order to sense and understand the phenomenon, drawing on pre-existing knowledge and ideas, so that the data needed for further study can be collected.

By directly observing events or situations in the field, this observation technique aims to obtain data. You can use tests, questionnaires, audio recordings, image recordings, and other tools to make observations.

However, the most efficient way to complete the data is usually to use observation guides, such as observation forms or blank sheets organized around the events or behaviors expected to occur.

Interview

Another way to collect data is the interview method. For researchers who wish to put many questions directly to respondents and informants, the interview is a suitable method.

There are several tools used in the interview procedure that need to be understood, including a description of the research provided as a checklist of questions. The different types of interviews are listed below.

1. Guided interview, where questions can be asked in accordance with the questions that have been prepared.

2. A free form interview procedure in which the interviewer and respondent exchange questions and answers that flow organically but are consistent with the research and objectives.

3. By combining the first two interview methods, guided free interviews can be conducted. You can still use a list of questions, but you can also create a relaxed environment for the interview and respond to some of the respondents' responses.

You can choose one of the available interview types to collect your survey data. Make sure to conduct interviews with appropriate supporting tools, such as a prepared recorder, and with questions that fit the research.

Questionnaire

A questionnaire is a data collection method in which respondents are given a list of written questions or statements and asked to respond. Simply put, a questionnaire is a useful tool for collecting data when the researcher is sure of the variable to be measured and knows what to expect in the responses.

Questionnaires can also be used if the respondents are spread over a large geographic area and in large numbers. The form of the questionnaire includes closed and open statements and questions. After that, it can be delivered personally to the respondent, sent by mail, or sent online.

If the scope of the study is narrow enough, the questionnaire can be handed over directly without taking much time, so there is no need to mail surveys to respondents. Direct contact between researcher and respondents creates a favorable environment, making respondents willing to provide prompt and accurate data.

Uma Sekaran (1992) suggests a number of guidelines for creating a questionnaire as a data collection method, namely principles of wording, principles of measurement, and physical appearance.

Experiment

Experimental research is conducted to verify the result of a deliberate action taken by the researcher.

An experiment, a term derived from the Latin "ex-periri" (to test), is also known as a research experiment. Experimental research involves doing and observing things to test theories or find relationships between causes of symptoms.

The source of all the symptoms will be tested in this experimental study to identify the causes or independent variables that will affect the consequences or dependent variables.

Experimental research is typically used in the natural sciences and in social psychology to advance understanding in both fields.

The Big Indonesian Dictionary (KBBI) defines research as an activity that involves systematically collecting, processing, analyzing and presenting data. The aim of this research is to establish general principles through problem solving or hypothesis testing.

While experimental research is an experiment that is planned and intended to show the truth of a hypothesis.

Case Study

In terms of research methodology studies, it is necessary to process as much information as possible about the topic being studied in order to provide a complete explanation of all aspects of a person, group or research organization (Mulyana, 2018, p. 201).

The word "case" itself refers to a real situation or problem at hand; unique conditions related to a person or item (KBBI, 2016). So a case study is an attempt to study a topic or condition by gathering as many facts or pieces of data as possible. This information is often referred to as evidence in case studies.

Case studies, according to Wahyuningsih (2013, p. 3), are investigations of a "bounded system" or "case(s)" that often involve extensive data collection and "rich" sources of information in one context.

These systems are bounded by time and place, and the cases can come from programs, events, activities, or individuals. In other words, a case study is a study in which the researcher investigates a particular phenomenon (case) over a certain period of time and activity.

Statistical Data Processing

To draw conclusions from research findings, statistical data processing reduces complexity and converts data into a format that is simpler to read and understand. Data can be numeric or non-numeric, and the two are processed in different ways.

Data Presentation

Statistical data presentation refers to the methods for organizing and displaying aggregated data so that it conveys meaningful information. Making statistical data easy to understand and interpret improves the accuracy of inferences and fact-based decisions.

Statistical data is usually presented in one of two ways: tabular or graphical. The tabular view is the more common; tables offer data as columns and rows, while graphs display it visually. But don't be surprised if graphical representations of statistical data consistently attract more attention.

Data can be presented graphically in various ways, including polygons, histograms, frequency distribution charts, and ogives, so that the characteristics and trends of the data distribution can be conveyed visually.

Data Concentration Measurement

Usually, a central value serves to characterize the data set; this number is called a measure of data concentration (central tendency). A measure of data concentration is meant to be a representative value of the data set, and therefore must meet the requirements listed below.

• Must consider all data in the data group.

• Must not be affected by extreme values or outliers.

• Must be stable from sample to sample.

• Must be capable of being used for further statistical analysis.

The mean, median, and mode are three frequently used measures of data concentration. Of the three, the mean satisfies all of the criteria listed above except the second: extreme values or outliers have a significant impact on the average.
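A short illustration of the criterion the mean fails: adding one extreme value shifts the mean sharply, while the median barely moves. The numbers are hypothetical.

```python
# One outlier drags the mean far from the bulk of the data,
# but leaves the median almost unchanged.
import statistics

data = [25, 26, 27, 28, 29]
with_outlier = data + [200]

print(statistics.mean(data), statistics.median(data))  # 27 27
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```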

Data Variability Measurement

Because it does not reveal information about the sample as a whole, the information obtained by assessing central tendency alone is not sufficient for statistical analysis. The central tendency only reveals the value that sits in the middle of the other values; it does not reveal how far apart or how similar the values within the group are. As an illustration, consider the following three groups of data:

A : 25 25 25 25 25 25 25 25 25

B : 21 23 23 24 25 26 26 27 30

C : 6 15 15 21 25 27 30 41 45

The three data groups above have the same average but very different character. Group A is completely homogeneous, group B is fairly homogeneous, and group C is far more heterogeneous than B. What measurement should be used to get clearer information?
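The difference is easy to see by computing the mean and the (population) standard deviation of the three groups above:

```python
# Same mean, very different spread: the standard deviation makes
# the difference between the three groups visible.
import statistics

A = [25] * 9
B = [21, 23, 23, 24, 25, 26, 26, 27, 30]
C = [6, 15, 15, 21, 25, 27, 30, 41, 45]

for name, group in (("A", A), ("B", B), ("C", C)):
    print(name, statistics.mean(group), round(statistics.pstdev(group), 2))
```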

In addition to central tendency, a measure of variability is necessary to provide an effective summary of a data distribution or data set.

Variability is the spread of variable values in a distribution relative to their central tendency, especially the mean or average: it shows how far the values deviate from that center. A measure of variability summarizes the variation, range, and heterogeneity of a group's measurements (data).

Correlation Analysis

What exactly is correlation? Correlation analysis is an analytical method belonging to the family of measures of association: the set of bivariate statistical methods used to assess the strength of the relationship between two variables.

Among the various measures of association in use today are two correlation methods, the Karl Pearson product-moment correlation and the Spearman rank correlation. Both calculate a number that assesses the degree or strength of the relationship between variables.

If one variable's behavior influences another, the two variables are said to be related; if neither influences the other, they are called independent. Correlation analysis itself does not designate a dependent variable.

Regression Analysis

What is regression analysis? Regression analysis is a method, expressed as a mathematical equation (the regression), used to test a research hypothesis and determine whether one variable influences another.

Regression can be divided into two categories: simple linear regression and multiple linear regression. In simple linear regression, a single independent variable is used to explain or predict the outcome of the dependent variable Y.

Multiple linear regression is a method for determining the impact of two or more independent variables (also known as independent variables or X) on the dependent variable (also known as the dependent variable Y).

Thus, simply put, simple regression analysis is used when we want to find out whether a single variable X affects variable Y, while multiple linear regression analysis is used to determine the effect of two or more X variables on variable Y.
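A minimal sketch of simple linear regression with one independent variable, fitting Y = a + bX by least squares; the data are hypothetical and chosen to lie exactly on a line.

```python
# Least-squares fit of Y = a + bX (simple linear regression):
# slope b = sum((x - mx)(y - my)) / sum((x - mx)^2), intercept a = my - b*mx
def fit_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b * mx, b  # (intercept, slope)

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]  # exactly y = 1 + 2x
a, b = fit_line(x, y)
print(a, b)  # → 1.0 2.0
```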

Descriptive Statistical Analysis

The purpose of descriptive statistics is to make data more relevant, easy to read and understand for data users. Descriptive statistics involve collecting, organizing, summarizing, and presenting data. Without trying to generalize the sample to the population, descriptive statistics are limited to offering a summary or broad description of the characteristics of the object under consideration.

By looking at the minimum value, maximum value, average (mean), and standard deviation of each independent variable and dependent variable, descriptive statistical analysis can be used to provide an overview of the distribution and behavior of the research sample data.

Size of Data Center (Mean, Median, Mode)

A measure of the data center represents a group of sorted data by its central value. Ideally, the data are arranged from smallest to largest; the mean, median, and mode are then found from the distribution. What do the mean, median, and mode look like? Here is the explanation.

1. Mean

You are probably already familiar with calculating an average. The term "mean" refers to the average value of a data set, and it can be calculated for single data, grouped data, or a frequency distribution: the data are summed and the total is divided by the number of data points.

There are two types of mean values: the population mean and the sample mean. The sample mean is referred to as a statistic, and the population mean as a parameter. In practice, it can be challenging to determine the population mean unless the population is small; therefore the sampling technique is used.

2. Median

Next is the median. The median is the number that falls exactly in the middle of the data, so the data must be sorted from smallest to largest before calculating it. You should also check whether the number of data points is even or odd before calculating the median; this matters because the calculation differs between the two cases.
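The even/odd distinction in the median calculation can be sketched as:

```python
# Median: middle value for an odd count; average of the two
# middle values for an even count. Data is sorted first.
def median(data):
    s = sorted(data)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([7, 1, 5]))     # odd count → 5
print(median([7, 1, 5, 3]))  # even count → (3 + 5) / 2 = 4.0
```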

3. Mode

The mode is another frequently used measure besides the mean or average. This measure of central tendency focuses on the value that repeats most often. Unlike the median, the data need not be sorted from smallest to largest to find the mode; just scan the data and note which datum appears most frequently.

The frequency of each datum is easy to count. Take the following data group as a live example: 2, 2, 1, 5, 3, 2, 1, 3, 3, 1, 3, 1, 4, and 2. Counting, the values 1, 2, and 3 each appear four times, while 4 and 5 appear once each, so this data set has three modes. Grouping the data first can make the counting easier.
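Counting the frequencies for the data group above can be sketched with collections.Counter; the values 1, 2, and 3 each occur four times, so the data set has more than one mode.

```python
# Count datum frequencies and collect every value that reaches
# the highest frequency (the data set may have several modes).
from collections import Counter

data = [2, 2, 1, 5, 3, 2, 1, 3, 3, 1, 3, 1, 4, 2]
counts = Counter(data)
top = max(counts.values())
modes = sorted(v for v, c in counts.items() if c == top)
print(modes)  # → [1, 2, 3]
```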

In order to properly compare or evaluate data, a thorough understanding of measures of data concentration is required. Finding the mean, the median, and the most frequently occurring value (the mode) is the basic method for measuring the centrality of data.

Measures of Data Variability (Range, Variance, Standard Deviation)

As discussed above, central tendency alone does not describe a data set fully; measures of variability summarize the variation, range, and heterogeneity of a group of measurements.

1. Range

The range is the difference between the lowest and highest scores, i.e., the distance between them. The range is also known as the reach, spread, or distance of the data.

Depending on whether the measure of variability is applied to data with ordinal, interval, or ratio scales, different properties apply.

2. Variance

The variance is the sum of squared deviations from the mean divided by n - 1; equivalently, it is an adjusted mean of each score's squared deviation from the mean. Dividing by n - 1 gives the unbiased estimate, while dividing by n gives the biased estimate. This type of measure of variability is suitable for data with interval and ratio scales.

3. Standard Deviation

The standard deviation describes the typical deviation of each value from the mean of a data set; formally, it is the square root of the variance. The smaller the standard deviation, the smaller the dispersion (spread) of the data values around the average.

A. For single data, the squared deviations from the mean are averaged, and the square root of that average is taken.

B. For grouped data, each squared deviation is weighted by its class frequency before averaging and taking the square root.

This type of measure of variability is suitable for data with interval and ratio scales.
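The three measures above can be sketched with Python's statistics module on one small hypothetical data set (note that statistics.variance uses the n - 1 denominator described above):

```python
# Range, sample variance (n - 1 denominator), and standard deviation.
import statistics

data = [21, 23, 23, 24, 25, 26, 26, 27, 30]

data_range = max(data) - min(data)    # 30 - 21 = 9
variance = statistics.variance(data)  # sum of squared deviations / (n - 1)
std_dev = statistics.stdev(data)      # square root of the variance

print(data_range, round(variance, 2), round(std_dev, 2))  # → 9 7.0 2.65
```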

Frequency Distribution

There are many different approaches to data analysis that can be applied when carrying out research activities. One method often used in statistics, particularly for quantitative data, is the frequency distribution.

The frequency distribution can be an option in data processing if the researcher has a set of numerical data that is random, scattered, and is still in a state of raw data.

The series of numbers is then organized by categorizing it into certain intervals or categories according to frequency. When using the frequency distribution technique, there are two types of data: individual (ungrouped) data and data that has been grouped into intervals.

According to Riduwan (2003), a frequency distribution is the organization of data from the smallest to the largest value, grouping the data into different classes. The purpose of converting data into a frequency distribution is to make the information easier to display, interpret, and read.

The findings of the frequency distribution will later be used in statistical calculations, as a basis for choosing a method for visualizing statistical data, and for categorical distributions in drawing conclusions from the data.
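A minimal sketch of building the frequency distribution described above by grouping raw scores into equal-width class intervals; the scores and the class width of 10 are hypothetical choices.

```python
# Group raw scores into equal-width class intervals and count
# how many observations fall in each class.
def frequency_table(data, low, width, n_classes):
    table = {}
    for i in range(n_classes):
        lo = low + i * width
        hi = lo + width
        label = f"{lo}-{hi - 1}"
        table[label] = sum(1 for x in data if lo <= x < hi)
    return table

scores = [61, 72, 75, 58, 90, 83, 77, 66, 94, 70]  # hypothetical raw scores
print(frequency_table(scores, 50, 10, 5))
# → {'50-59': 1, '60-69': 2, '70-79': 4, '80-89': 1, '90-99': 2}
```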

Bar Charts and Pie Charts

What are diagrams? Diagrams are visual representations and presentation tools that can be used to present specific facts, instructions, or information, and they make that information easy for others to understand. Bar charts and line charts are two examples of diagrams that display data.

1. Bar Chart

A bar chart is a type of chart that displays data as bars or rectangles, where each bar's length represents a specific value or chunk of data. Bar charts are often used to track the evolution of data values over time or to compare data across multiple categories.

2. Pie chart

A pie chart is a chart that shows numbers or values as slices of a circle; each slice represents a different value's share of the whole.

Inferential Statistical Analysis

Inferential Statistics, What Is It? Inferential statistics is an analytical procedure that uses data from a sample to draw conclusions and generalize them to the population.

Drawing conclusions from the sample to the population

One approach used by inferential statistical methods is parameter estimation. In population analysis, quantities such as the standard deviation, mode, median, and mean are estimated from the sample findings. Parameter estimation uses point estimates together with confidence intervals for the population values.

Hypothesis test

Hypothesis testing is the second approach for doing inferential statistical analysis. It is carried out by performing a statistical comparison, for example of the means of two samples.

In the field of medical research, especially pharmaceutical research, the hypothesis testing method is often used to determine whether an offered drug succeeds in treating a disease. A drug cannot be tested on everyone, so to determine whether a treatment benefits the community, drug trials must be carried out on adequate, random samples.

Confidence Intervals

Confidence intervals are techniques for estimating population parameters, within a certain range, from a sample using statistical methods.

The population parameter of interest has an unknown value. Sample results will of course vary, and to get a more accurate estimate we have to account for that variability.

The margin of error is the term usually used to describe the variability of this measure. Because it defines a confidence interval, this value is very important.

The statistics applied will produce an estimated value that contains population parameters by reducing or increasing the variability value. This is what the confidence interval (confidence interval) looks like.

For example, a political consulting firm might use statistical methods to draw a random sample of 100,000 people to ascertain the electability (a population parameter) of presidential candidates before a general election.

Suppose that, with a 5% margin of error, around 59% of respondents chose candidate "A". According to these statistics, between 54 and 64 percent of voters voted for candidate "A".

Confidence intervals are thus inseparable from inferential statistics.
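A margin of error of this kind can be reproduced with the usual normal-approximation formula. Note that the sample size below (n = 400) is a hypothetical figure chosen so the margin lands near 5%, not the 100,000 respondents of the example:

```python
# Sketch: 95% confidence interval for a proportion, normal approximation.
# n = 400 is hypothetical, chosen so the margin of error comes out near 5%.
import math

n = 400          # hypothetical number of respondents
p_hat = 0.59     # share of respondents choosing candidate "A"
z = 1.96         # z value for a 95% confidence level

margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - margin, p_hat + margin
print(f"{p_hat:.0%} +/- {margin:.1%} -> ({low:.1%}, {high:.1%})")
```

The formula also shows why larger samples give tighter intervals: the margin shrinks with the square root of n.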

T test and Z test

The two most important testing techniques in inferential statistical analysis are Student's t-test (the Student-T Test) and the Z-test.

Inferential statistical analysis of this kind is often used to compare differences in treatment between groups or categories.

Inferential statistics is also called probability statistics, because it allows a researcher to justify findings based on sample data that is subject to chance. There are two categories of inferential statistics: parametric and nonparametric.

The Z-test is closely related to the normal distribution. In statistics, the normal distribution, sometimes known as the normal curve, is a key concept.

The theory behind inferential statistics centers on the normal curve, and test results are sometimes scored against it. Statistical hypothesis testing on data that approaches the normal distribution can be done using the Z-test.

Data from a large sample will be distributed more normally, within the anticipated theoretical limits. A sample of 30 or more is considered large.

William Sealy Gosset created the Student's t-test. He wrote under the pseudonym "Student", which is how the method became known as the Student-T Test or Student's t-test.

Gosset believed that the Z distribution and its values were unsuitable for data from small samples, so he created the Student's t-test, whose distribution fits small samples and approaches the normal distribution as samples grow. A sample size below 30 is considered small.

The Student's t distribution is a distinct distribution that should be used to find the area under the sampling distribution and to identify critical regions for smaller samples when the population standard deviation is unknown.

The larger the sample size, the more closely the Student's t distribution aligns with the normal distribution, so this test method can in fact be applied to large samples as well.

Of course, the t calculation differs from the Z calculation. Degrees of freedom are an additional condition that must be met when using the t method.

The exact location of the critical region also varies with sample size, because the exact shape of the t distribution changes with it. Before determining the critical region for a given alpha, the degrees of freedom, equal to N-1 in the case of a single-sample mean, must be calculated.

Another difference is that the entry in the t table is an actual score, designated t(critical), which marks the start of the critical region, rather than the area under the sampling distribution.
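A one-sample version of the t statistic can be sketched with the standard library alone; the sample and the hypothesized mean below are hypothetical:

```python
# Sketch of a one-sample Student's t test: does the population mean equal mu0?
# The small sample (n < 30) and mu0 are hypothetical.
import math
import statistics

sample = [5.1, 4.9, 5.6, 5.2, 4.7, 5.4, 5.0, 5.3]  # hypothetical measurements
mu0 = 5.0                                           # hypothesized population mean

n = len(sample)
mean = statistics.mean(sample)
s = statistics.stdev(sample)           # sample std. dev. (population sigma unknown)

t = (mean - mu0) / (s / math.sqrt(n))  # t statistic
df = n - 1                             # degrees of freedom, N - 1 for a single mean
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The computed t would then be compared against t(critical) from a t table at the chosen alpha and these degrees of freedom, exactly as described above.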

ANOVA and Chi-Square Analysis

The ANOVA (Analysis of Variance) test is used to compare three or more population means and to ascertain whether there is substantial variation among them. Depending on the number of factors investigated, one-way ANOVA or two-way ANOVA can be used.

The chi-square test compares observed frequencies with expected frequencies and determines whether the difference between them is significant. It is often used for examining categorical or nonparametric data.
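The chi-square statistic itself is simple to compute by hand: sum (observed - expected)^2 / expected over all categories. The frequencies below are hypothetical:

```python
# Sketch of the chi-square statistic over hypothetical frequencies.
observed = [48, 35, 17]   # observed frequencies per category
expected = [40, 40, 20]   # frequencies expected under the hypothesis

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(f"chi-square = {chi_square:.3f}")
# A value large relative to the critical value for k - 1 degrees of freedom
# indicates a significant difference between observed and expected.
```

With three categories there are 2 degrees of freedom, and the statistic would be compared against the chi-square critical value at the chosen alpha.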

Use of Statistics in Research

Especially in quantitative research methodology, statistics plays an important role in model development, hypothesis formulation, instrument development, data collection, research design, sampling, and data processing.

Research Design

In general, a research design can be thought of as a deliberate sketch or blueprint of the research; it can also be seen as the research framework. A research proposal is therefore needed before the research is carried out.

The research design is included in the research proposal, and it is the lecturer or supervisor who decides whether it is sufficient and suitable for use in the field or still needs revision before submission.

If no further adjustments are required, the researcher can proceed to the research stage. The research design is constructed to outline a specific framework relevant to the overall research topic.

According to Kerlinger, a research design is the overall plan of a research project, developed by the researcher to obtain objectivity and validity.

Meanwhile, according to Wisadirana, research designs are widely used to make data analysis designs and support sample selection.

Sampling

Supardi (1993) defines sampling and sampling strategy as a technique or approach to selecting research samples.

Margono (2004) stated much the same: for the sample to represent the population, sampling must be carried out according to the sample size that will serve as the actual data source, taking into account the nature and distribution of the population.

There are procedures that we must follow properly when conducting sampling; adhering to an established systematic approach helps us achieve our research objectives. Sampling involves the general steps listed below:

1. Define the population to be observed

2. Determine the sampling frame, the set of all units from which the sample can be drawn.

3. Determine the appropriate sampling technique or method

4. Conduct sampling (data collection)

5. Re-examine the sampling process
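The steps above can be sketched with the standard library's random module; the population here is hypothetical:

```python
# Sketch of simple random sampling over a hypothetical population.
import random

population = list(range(1, 1001))    # step 1: population of 1000 hypothetical units
frame = population                   # step 2: sampling frame (here, the whole population)

random.seed(42)                      # fixed seed so the draw is reproducible
sample = random.sample(frame, k=50)  # steps 3-4: simple random sampling, n = 50

# step 5: re-examine the sample - right size, no duplicates, all from the frame
assert len(sample) == 50
assert len(set(sample)) == 50
assert all(unit in frame for unit in sample)
```

In real research the frame is rarely identical to the population (it might be a voter roll or customer list), which is why step 2 is listed separately.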

Data analysis

A data practitioner must be proficient in data analysis. Critical thinking and effective problem solving skills are required to complete the data analysis process. The capacity to choose the right data analysis technique is required.

The results of the analysis are greatly influenced by the application of appropriate data analysis techniques. The required results may not be obtained if you use the wrong data analysis methodology, which will obviously result in wasted time and effort.

Processing data with the intention of finding relevant information that can be the basis for making decisions to address problems is known as data analysis.

To extract important information from data, this analytical method groups data based on its properties, performs data cleaning, transforms data, and builds data models.

Remember that after going through this procedure, the data must be presented in a visually appealing and easy-to-understand way, usually in the form of graphs or plots.

Today, technology is used in almost all of our activities. This technology is undoubtedly connected to data, which will continue to evolve. Data will only be wasted if allowed to accumulate.

However, data can be manipulated and leveraged to generate meaningful information. Therefore, data analysis is an important stage in data processing. There are many methods or strategies that can be applied in data analysis.

Interpretation of Results

Your data must be correctly interpreted to be useful. This will make it easier for you to guarantee that you are basing your decisions and actions on accurate information.

In today's world, data is everywhere. There are two kinds of people and organizations: those who are overloaded by data or misuse it, and those who profit from it.

Data interpretation is the process of reviewing data and drawing conclusions from it with the help of various analysis techniques. Data analysis helps researchers categorize, manipulate, and summarize data to find answers to important questions.

In business terms, data interpretation is the application of procedures that examine and revise data in order to acquire knowledge and identify new patterns and behaviors. These conclusions help a manager make numbers-based decisions using all the information available.

Statistical Applications in Various Fields

Basically, statistics is the study of data collection, data processing, data analysis, and making judgments based on the results of the analysis. As a tool for decision making, statistics serve a purpose in itself. A proper assessment cannot be taken without accurate data analysis.

Statistics is used in a variety of ways that directly relate to and benefit many aspects of human life. Of course, there are applications of statistics beyond the social sciences. In both the sciences and the disciplines of business, industry, and economics, statistics are used frequently. Statistical developments in many sectors are as follows:

Statistics in Economics and Business

The Central Bureau of Statistics (BPS) is an organization in Indonesia that specializes in statistics. The economic census conducted by BPS every ten years is a form of routine research. The census tries to assess Indonesia's economic development. Information obtained from the results of the census can be used to describe the state of the Indonesian economy which continues to grow or even shrink. In addition, this data can be used as a standard to compare the Indonesian economy with other countries in the world.

Statistics plays an important role in a country's economy, chiefly through the collection of general economic data. This role primarily requires knowledge of the following.

1. Government policies against inflation

2. Reducing the poverty rate

3. Equal distribution of education and community income

4. Improvement of community welfare

5. The development of the prices of staple goods

6. Development of demand for certain commodities

7. The level of unemployment and poverty in society

8. The amount of money circulating in society

9. Percentage of economic growth.

The innovation process of enterprise and industrial development relies heavily on statistics. In achieving performance that can compete through competitive advantage, for example, statistics as a method is the key to solving the problems faced in the worlds of business and industry.

Statistics is used in almost all aspects of business and industry to create and evaluate data to support strategic and managerial decision making.

Innovation has emerged as a major concern in the development of business and industry in times of excellence and intense competition. The ability to compete with outsiders, or at the very least, keep up, requires the ability to innovate. Innovation is the process of developing and commercializing new products to gain a competitive advantage.

Statistics in Science and Engineering

Statistics is one of the disciplines underpinning data science, along with mathematics and programming. It can therefore be seen as one of the foundations for the rise of data science as one of the most lucrative modern science professions.

Along with the emergence of big data, demand for skilled human resources in data science is increasing; however, the supply of such people does not match the demand. A data scientist is more skilled at programming than any statistician and more skilled at statistics than any programmer, according to Josh Wills, former head of engineering at Slack.

In other words, statistics is an important element that is standard in data science.

Statistics in Health and Medicine

The following are statistical applications in the medical field:

- Measuring vital events and significant social occurrences.

- Assessing the state of public health and identifying existing health problems among diverse population groups.

- Comparing the current health status of a community with its past status, or with that of communities in other locations.

- Forecasting the condition of public health in the future.

- Evaluating health programs or services being implemented, including their process, successes, and failures.

- Assessing the need for public health services and setting clear objectives.

- Studying the environment, family planning, and other health-related issues.

- Organizing and managing healthcare systems.

- Supporting scientific publication and media coverage.

Statistics in Social and Humanities

Scientists use statistics in the social sciences and humanities for at least three reasons, specifically:

- Data collection (through surveys or experiments)

Four possible data collection methods: registration, census, survey and experiment. However, there are usually two methods used in statistics to collect data: census and survey.

A census is a comprehensive data collection technique in which every component of the population under study is located and counted one by one. A survey, by contrast, collects data from only a portion of the population.

The challenge of balancing labor, money and time to get data right increases as more types of data are required in a project. Because of this, surveys are more often used in research.

- Hypothesis test

In hypothesis testing, a proposition is tested using statistical methods, and the findings of the test are then declared statistically significant or not. Hypothesis testing is a component of inferential statistics.

A hypothesis is a reasonable claim that is still in question. We can systematically collect data and run experiments to test that claim. By performing a statistical test of the hypothesis, we can decide whether it should be accepted (the data provides no evidence against it) or rejected (the data provides evidence to reject it).

- Theory development

Research initiates new chapters of theory development, making research a procedural method for developing theory. Since the purpose of research is to open us to new ideas, new hypotheses can be raised by noticing and monitoring new phenomena that differ from the norm.

For example, it is clear that the findings of this study will produce valuable new theories if many people do research in the social field. The development of these new theories can then be used to explain various agreements, differences, and problems. If social science research continues to add experience, this might happen.

If social research is carried out, new experiences will be obtained which may vary. Completely new hypotheses can be developed from new encounters that deviate from what is normally imagined.

This experience develops into postulates, which are basic assumptions or pillars of reasoning or premises of a chain of reasoning that are not always proven until additional evidence is presented.

Statistics Software

Statistical software, also known as statistical analysis software, refers to technologies that help collect and analyze data on a statistical basis so that research can gain insight into patterns and trends.

Excel

Microsoft Excel is a program that can be used for statistical analysis. The software is made up of rows, columns, and cells, and ships with many built-in functions, ranging from statistical formulas to financial and mathematical ones. These formulas allow us to perform complex calculations in a single line.

Microsoft Excel, or simply Excel, is a spreadsheet-based program used to organize data and numbers with its accessible formulas and functions, and it is a very powerful spreadsheet program in its category. Excel is used for almost every purpose in various business sectors, by companies of all sizes, to perform financial or data analysis. It includes a variety of features, from basic operations to complex formulas and shortcuts, that you can use to enhance productivity.

Therefore, it is not surprising that Microsoft Excel has strong supporters from almost all industries, including small and large businesses.

SPSS

Large data sets can be processed quickly by IBM's SPSS program to generate insights for research decision making. According to IBM's website, 81% of reviewers find SPSS easy to use, making it an excellent choice for statisticians and new users alike.

It can also estimate and identify missing values in data sets, enabling more accurate reports.

With as many user licenses as needed, SPSS is designed to handle large volumes of data and is scalable and agile.

Through its open extension framework, users can create their own custom applications or use the more than 100 free extensions available from the IBM Extension Hub to extend SPSS syntax with R and Python.

R and Python

Of course, you should first try each of these programming languages to determine which one is best for you to learn.

Between Python and R, developers typically use Python for problems that involve putting the results of data analysis into production, such as applying statistical techniques in dashboards or data-driven applications.

Python integrates readily with every step of a production workflow. R, developed for use in research, is excellent for exploratory data analysis, yet its use in business has also grown rapidly in recent years.

Python's versatility in scripting, even as a glue language that ties together various software components, is useful for data science.

Programming languages like Python and R are useful for data science and have several advantages. Starting with the data collection process, Python offers powerful libraries for crawling and scraping, in particular tweepy and scrapy.

Additionally, R offers comparable advantages to Python in terms of comprehensive libraries, particularly for data visualization and data analysis.

SAS

This cloud-based platform provides tools and processes for statistical analysis and data visualization, tailored to specific analytical requirements. It is mostly used by business analysts, statisticians, data scientists, researchers, and other scientists for statistical modeling, spotting trends and patterns in data, and assisting decision making.

By performing several tasks at the same time, the program increases its effectiveness and stability. Users can design countless fully configurable built-in graphs and statistics, and are free to experiment and program using the interface or coding language of their choice.

Ethics in the Use of Statistics

Many things must be put forward in statistics, above all ethics in its use. Let's discuss ethics in statistics more deeply.

Statistical Code of Ethics

The basic principles and norms that must be obeyed in every statistical activity are: statistics that are independent, unaffected, and uninfluenced by any party; statistics that guarantee the confidentiality of individual data; statistics that are impartial and can be used by all parties; statistics that meet the norms, standards, procedures, and criteria that apply to each statistical activity; statistics that adhere to a statistical code of ethics; statistics that are easy to understand; and statistics that are produced without burdening the respondents.

Honesty and Integrity in Data Reporting

Take responsibility for all work bearing your name; disclose the relevant data and samples used in your research, as well as the statistics and assumptions that underlie them; and clearly identify the intellectual sources that inform your writing.

Understand the rules governing the protection of research subjects, avoid using excessive numbers of research participants, ensure privacy and confidentiality in accordance with legal requirements, and prevent or limit fraud and related issues.

Data Security and Privacy

Respect different viewpoints and admit mistakes openly; if errors are found, investigate and trace the procedures used; prohibit plagiarism, data fabrication, and falsification; abstain from retaliation; and openly acknowledge when the expertise of others scientifically points to errors.

Challenges in Statistical Analysis

Today, data is seen as the new fuel, something the whole world needs. The amount of data generated globally today is 2.5 trillion bytes; huge, isn't it? This information is gathered from a variety of sources, including social media, trade data, manufacturing and sales transactions, and more.

In fact, there are other data sources that can be used, such as website visits and link clicks. Data processing systems or applications must first be able to read and process unstructured data, which today is more common than neatly arranged numbers as seen on a spreadsheet.

Incomplete Data

Incomplete data and data quality constraints are the first hurdles in data processing. Available information is often insufficient, incorrect, or misleading, which can result in poor analysis and judgment.

The fix is to tidy up the data before running the analysis. This includes handling missing values, standardizing data, and eliminating erroneous records.

Verify and validate data to ensure high quality before using it.
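A minimal sketch of this kind of tidying, in plain Python with hypothetical records (missing or clearly erroneous ages are imputed with the mean of the valid ones, and name casing is standardized):

```python
# Sketch of basic data tidying over hypothetical records:
# handle missing values, standardize formats, drop erroneous values.
raw = [
    {"name": "Ani",  "age": 29},
    {"name": "budi", "age": None},   # missing value
    {"name": "Cici", "age": -4},     # erroneous value
    {"name": "Dodi", "age": 41},
]

def valid(age):
    """An age counts as valid when it is a positive number."""
    return isinstance(age, (int, float)) and age > 0

ages = [r["age"] for r in raw if valid(r["age"])]
fill = sum(ages) / len(ages)         # impute bad ages with the mean of valid ones

clean = []
for r in raw:
    age = r["age"] if valid(r["age"]) else fill
    clean.append({"name": r["name"].title(), "age": age})  # standardize casing

print(clean)
```

In practice a library such as pandas would handle the same steps at scale, but the logic (validate, impute, standardize) is the same.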

Inaccurate Data

Data can be collected from any source, primary or secondary. The difficulty lies in collecting the right data so that the process produces reliable results; obtaining accurate data is therefore very important.

Choosing the right data collection method is the answer that can be used to overcome this problem. Data can be collected in various ways, including through observation, surveys, interviews, group discussion forums, and others.

Non-Representative Data

Insignificant or incidental patterns found in non-representative data may lead to unfounded conclusions and poor decision making.

The fix is to use an algorithm appropriate to the available data, since an improper algorithm can lead to inaccurate results. Before drawing conclusions, consider the context of the data and run statistical tests on the hypotheses. Data mining professionals should regularly assess their findings in light of these difficulties to increase the quality and depth of their research.

Sampling Error

Errors in sample selection can result in duplicated data. Duplicate data is the result of collecting data repeatedly from many sources, leaving two or more identical records in the same dataset.

As a result, the output of data processing will be less accurate. Duplicate data not only takes up a lot of storage space but also skews the results, so we have to search the dataset for duplicates and delete the extra copies.
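Removing duplicates while keeping the first occurrence of each record can be sketched in a few lines (the records are hypothetical):

```python
# Sketch: deduplicate hypothetical records, keeping the first occurrence.
records = [("id1", 10), ("id2", 20), ("id1", 10), ("id3", 30), ("id2", 20)]

seen = set()
deduped = []
for rec in records:
    if rec not in seen:        # keep only the first copy of each record
        seen.add(rec)
        deduped.append(rec)

print(deduped)  # [('id1', 10), ('id2', 20), ('id3', 30)]
```

The set makes the membership check fast, and appending in order preserves the original ordering of the surviving records.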

Statistical Trends

Statistical trends are analysis techniques commonly used to build historical perspectives or forecasts. A good presentation requires several types of information (data) collected over a long period, so that the analysis can reveal significant fluctuations and the specific factors behind them.

Development of Statistical Analysis Techniques

Google, for example, uses data analysis methods to make its data centers more efficient. By analyzing data on temperature, humidity, and electricity usage in its data centers, Google can streamline operations and reduce energy use.

Data analysis techniques are becoming increasingly important as a result of advances in technology and digitization in various fields, including business, health and education.

The right data analysis methods can help us uncover hidden patterns, trends, and connections in data, even in very large data sets.

Business intelligence benefits more from statistical data analysis than most other types of data analysis. It is also called "descriptive analysis." Using a variety of quantitative research techniques, you can collect and examine different types of data to look for important trends and patterns.

Statistical data analysis is often used to analyze observational and survey data, but can also be used to analyze many other business KPIs.

Statistical data analysis has begun to be applied in various technologies, especially social media, apart from being used by various businesses. For example, it offers analytical capabilities to determine how much traffic is coming from social media, such as how many likes, followers, and so on.

Given that each method can be developed as a solution to a particular problem, statistics shows itself to be a science that readily evolves.

Trends in the Use of Statistics in the Big Data Era

Big Data has been widely used in today's data trends, especially during the digitization period. Today, businesses and society are increasingly dependent on other data producers for the provision of reliable, high-quality data and information.

Big data can actually drive faster, more diverse and more detailed delivery of statistical data. The promise of big data has been officially recognized by the global statistical community.

In today's digital era, big data analytics is crucial for organizational and business development. The main reason is that data analysis will enable the disclosure of the latest patterns and trends every day.

If HR professionals wish to understand the pace of change and productivity patterns of candidates and workers and identify solutions for them, this information will be of great help.

Developments in the Field of Statistics in the Future

We often use statistics in our daily lives without even realizing it, even in simple situations such as homework or work. Many fields, including the natural sciences, make extensive use of statistics.

The population census is one of the many uses of statistics in the public sector. Other common uses today include quick counts (rapid tallies of election results) and polling methods such as those used before a general election. Statistics is also used in data processing for pattern detection and in artificial intelligence.

The application of statistics in our daily life can be increased. This is evident from our use of statistics within the organization. For example, a large company requires a degree in statistics. Especially for analyzing the market and forecasting the coming years using statistical data. This profession is undoubtedly difficult, requiring special knowledge acquired through college studies. As a result, statisticians earn big salaries.

Statistics in the public sector also requires data management to facilitate the handling of population data. Consider population estimation and analysis, academic research, welfare indices, and government budgets as examples, as well as mapping regions, counting schools, and cataloging natural resources. By understanding and collecting data on the population aged 7 to 12 years, for instance, the government can determine the needs of elementary school students, one real application of this statistical science.

Statistics will be needed in the future to manage large amounts of data. Of course, managing data for a country, let alone the whole world, is no simple matter. To manage this data requires statistical understanding and processing expertise. Data analysis is required for large-scale data processing (big data). In addition, the volume of data is on a global scale.

Well, that's a complete and thorough explanation of statistics. We hope you now better understand the importance of statistics, especially how it can be used in our daily lives.
