17 Nov

Statistics is an important branch of mathematics concerned with the collection, analysis, interpretation, and presentation of numerical data. It is used in almost every research field and forms the foundation of data science. Statistics helps us understand data better at every stage, from collection and summarization to analysis and interpretation. The field therefore offers wide scope in sectors such as sports, psychology, marketing, and education, in roles such as statistics administrator or financial analyst.

- What are the various branches of statistics?
- In which fields can statistics be used?
- What is the difference between Data Science and Statistics?
- What is the meaning of correlation and covariance in statistics?
- What is Bayesian?
- What is Frequentist?
- What is the Likelihood?
- What is P-value?
- Explain P-value with the help of an example?
- What do you mean by sampling?
- What are various methods of sampling?
- What do you mean by Mode?
- What do you mean by Median?
- What do you mean by skewness?
- What is the meaning of Covariance?
- What is One Sample test?
- What do you mean by Alternative Hypothesis?
- What do you mean by Significance Level?
- Can you give an example of the Central Limit Theorem?
- What is Binary Search?
- Can you throw more light on the Hash Table?
- Explain the difference between ‘long’ and ‘wide’ format data?
- What is the meaning of normal distribution?
- What is the primary goal of A/B testing?
- What is the meaning of statistical power of sensitivity, and how is it calculated?
- What do you think is the difference between overfitting and underfitting?

The answers below follow the order of the questions listed above and can help you prepare for an interview in this field.


Statistics has two main branches:

- **Descriptive Statistics:** This summarizes the data from a sample using indices such as the mean or standard deviation. Its methods are concerned with displaying, organizing, and describing the data.
- **Inferential Statistics:** This draws conclusions from data that are subject to random variation, such as observation errors and sampling variation.

Statistics is used in many different research fields, including:

- Science
- Technology
- Biology
- Computer Science
- Chemistry
- Business

It is also used in the following areas:

- Providing comparisons
- Explaining an action that has already occurred
- Predicting future results
- Estimating unknown quantities

Data science is a discipline led by data. It is an interdisciplinary field that brings together scientific methods, algorithms, and processes for extracting insights from data, whether structured or unstructured. Data science has much in common with data mining, as both extract useful information from data. It also draws on mathematical statistics, computer science, and their applications. By combining statistics, visualization, applied mathematics, and computer science, data science converts vast amounts of data into insights and knowledge. Statistics is thus a core part of data science: it is the branch of mathematics concerned with the collection, analysis, interpretation, organization, and presentation of data.

Correlation and covariance are two mathematical concepts that are widely used in statistics. Both describe the relationship between two random variables and measure the dependency between them. Although they serve a similar purpose, they differ from each other.

- **Correlation**: This is the standard technique for measuring and estimating the quantitative relationship between two variables. Correlation measures how strongly two variables are related. It is covariance rescaled by the standard deviations of both variables, so it always lies between -1 and 1 and has no units.
- **Covariance**: This measures the extent to which two random variables change together. It describes a statistical relationship between a pair of random variables in which a change in one variable is accompanied by a corresponding change in the other. Its value depends on the units of the variables.
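The difference between the two measures can be sketched in a few lines of Python. The data (hours studied and exam scores) and the function names are hypothetical, chosen only for illustration. The sketch shows that rescaling one variable changes the covariance but leaves the correlation unchanged.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical data: hours studied and exam score
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]

cov_hs = covariance(hours, score)
corr_hs = correlation(hours, score)

# Expressing the same study time in minutes rescales the covariance
# by 60 but does not change the correlation.
minutes = [h * 60 for h in hours]
cov_ms = covariance(minutes, score)
corr_ms = correlation(minutes, score)

print(cov_hs, cov_ms)
print(corr_hs, corr_ms)
```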

The Bayesian approach conditions on the data actually observed and considers a probability distribution over the hypotheses.

The frequentist approach conditions on a chosen hypothesis and considers a probability distribution over the data, whether observed or not.

The likelihood of a set of parameter values, given some observed outcomes, is the probability of those observed outcomes under those parameter values.
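As a rough illustration of likelihood, the sketch below evaluates a binomial likelihood for a coin over a grid of candidate values of the heads probability. The observed counts (14 heads in 20 flips), the grid, and the function name are assumptions made for the example.

```python
from math import comb


def binomial_likelihood(p, heads, flips):
    """Probability of observing `heads` heads in `flips` flips if P(heads) = p."""
    return comb(flips, heads) * p**heads * (1 - p) ** (flips - heads)


# Hypothetical observation
heads, flips = 14, 20

# Evaluate the likelihood for candidate parameter values 0.01, 0.02, ..., 0.99
candidates = [i / 100 for i in range(1, 100)]
likelihoods = {p: binomial_likelihood(p, heads, flips) for p in candidates}

# The candidate with the highest likelihood is the one under which
# the observed data are most probable.
best_p = max(likelihoods, key=likelihoods.get)
print(best_p)
```

The data stay fixed while the parameter varies, which is what distinguishes a likelihood from an ordinary probability distribution over outcomes.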

In statistical significance testing, the p-value is the probability of obtaining a test statistic at least as extreme as the one actually observed, assuming that the null hypothesis is true.

Suppose an experiment shows a coin landing heads 14 times in 20 flips. Then:

- Null hypothesis (H0): the coin is fair
- Observation (O): 14 heads out of 20 flips
- P-value of O given H0 = Prob(≥ 14 heads or ≥ 14 tails) = 0.115

The p-value exceeds 0.05, so the observation is consistent with the null hypothesis: the observed result of 14 heads in 20 flips can be attributed to chance alone, because it falls within the range of what would happen 95% of the time if the coin were fair. In this example, we fail to reject the null hypothesis at the 5% level. Although the coin did not fall evenly, the deviation from the expected outcome is small enough to be reported as “not statistically significant at the 5% level”.
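The 0.115 figure can be checked with a short exact calculation. This is a minimal sketch that sums binomial probabilities for a fair coin; the function name is hypothetical.

```python
from math import comb


def two_sided_binomial_p_value(heads, flips):
    """Two-sided p-value under a fair coin: P(>= k heads or >= k tails),
    where k is the larger of the observed head and tail counts."""
    k = max(heads, flips - heads)
    upper_tail = sum(comb(flips, i) for i in range(k, flips + 1)) / 2**flips
    # The fair-coin distribution is symmetric, so both tails are equal.
    return min(1.0, 2 * upper_tail)


p_value = two_sided_binomial_p_value(14, 20)
reject_at_5_percent = p_value < 0.05

print(round(p_value, 3))      # 0.115
print(reject_at_5_percent)    # False
```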

Sampling is the part of statistical practice concerned with selecting an unbiased or random subset of individual observations from a population, with the aim of gaining knowledge about that population.

Sampling can be done by four broad methods:

- Simple random sampling: every member of the population has an equal chance of being selected
- Systematic sampling: taking every kth member of the population
- Cluster sampling: the population is divided into groups or clusters, and whole clusters are selected
- Stratified sampling: the population is divided into mutually exclusive groups or strata, and a sample is drawn from each group
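The four methods can be sketched with Python's standard `random` module. The population of 100 numbered units, the cluster size, and the two strata are all hypothetical choices for illustration.

```python
import random

rng = random.Random(42)            # fixed seed so the sketch is reproducible
population = list(range(1, 101))   # hypothetical population of 100 unit IDs

# Simple random sampling: every unit has the same chance of selection
simple = rng.sample(population, 10)

# Systematic sampling: random start, then every k-th unit
k = 10
start = rng.randrange(k)
systematic = population[start::k]

# Cluster sampling: split into groups, then select whole groups at random
clusters = [population[i:i + 20] for i in range(0, 100, 20)]
cluster_sample = [unit for cluster in rng.sample(clusters, 2) for unit in cluster]

# Stratified sampling: draw a sample from every stratum
strata = {"low": population[:50], "high": population[50:]}
stratified = [unit for units in strata.values() for unit in rng.sample(units, 5)]

print(simple, systematic, cluster_sample, stratified, sep="\n")
```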

The mode is the element of a data sample that appears most often.

x = [1 5 5 6 3 2]

mode(x) % returns 5, the most frequent value

The median is the numerical value that separates the higher half of a sample, a population, or a probability distribution from the lower half. For a finite list of numbers, it is found by arranging the observations from lowest to highest and picking the middle one; when the number of observations is even, the mean of the two middle values is used.
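A minimal Python sketch of the median, reusing the six-element sample from the mode example, might look like this. The function name is hypothetical; Python's standard library also offers `statistics.median`.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values
    when the number of observations is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


x = [1, 5, 5, 6, 3, 2]
median_even = median(x)          # sorted: 1 2 3 5 5 6, so (3 + 5) / 2
median_odd = median([7, 1, 4])   # sorted: 1 4 7, so the middle value

print(median_even)  # 4.0
print(median_odd)   # 4
```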

Skewness describes the asymmetry of data around the mean. If skewness is negative, the data are spread out more to the left of the mean than to the right (a longer left tail). If skewness is positive, the data are spread out more to the right (a longer right tail).
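The sign convention can be illustrated with a small sketch. It assumes the moment-based definition of skewness (third central moment divided by the cubed standard deviation); the data sets are made up for the example.

```python
def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by the
    cube of the (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2**1.5


# Hypothetical samples
right_tailed = [1, 2, 2, 3, 3, 3, 10]     # one large value stretches the right tail
left_tailed = [-v for v in right_tailed]  # mirror image stretches the left tail
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed))  # positive
print(skewness(left_tailed))   # negative
print(skewness(symmetric))     # zero
```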

Covariance is a measure of how two variables move in sync with each other.

y2 = [1 3 4 5 6 7 8]

cov(x, y2) % returns a 2x2 matrix; the diagonal holds the variances (x and y2 must have the same number of elements)

A t-test is any statistical hypothesis test in which the test statistic follows a Student’s t distribution if the null hypothesis is true. A one-sample t-test checks whether the mean of a single sample differs from a hypothesized value.

[h, p, ci] = ttest(y2, 0) % returns h = 1, p = 0.0018, ci = [2.6280 7.0863]
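The same one-sample test can be reproduced by hand in Python using only the standard library. This is a sketch: the critical value 2.447 is taken from a Student's t table (two-sided 5% level, 6 degrees of freedom) rather than computed, and the variable names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

y2 = [1, 3, 4, 5, 6, 7, 8]
mu0 = 0  # hypothesized mean under the null hypothesis

n = len(y2)
standard_error = stdev(y2) / sqrt(n)       # sample sd divided by sqrt(n)
t_statistic = (mean(y2) - mu0) / standard_error

# Assumption: two-sided 5% critical value for n - 1 = 6 degrees of freedom,
# read from a t table.
T_CRIT_DF6 = 2.447

reject_null = abs(t_statistic) > T_CRIT_DF6
ci_low = mean(y2) - T_CRIT_DF6 * standard_error
ci_high = mean(y2) + T_CRIT_DF6 * standard_error

print(round(t_statistic, 2), reject_null)
print(round(ci_low, 2), round(ci_high, 2))
```

The resulting 95% confidence interval agrees with the interval reported above to two decimal places.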

The alternative hypothesis, denoted H1, is the statement that holds if the null hypothesis is false.

The probability of rejecting the null hypothesis when it is true is known as the significance level α; common choices are α = 0.05 and α = 0.01.

Suppose that men's weights are normally distributed with a mean of 173 lb and a standard deviation of 30 lb. Find the probability that:

- one randomly selected man weighs more than 180 lb
- the mean weight of 36 randomly selected men is more than 180 lb

The solution will be:

For a single man:

z = (x − µ)/σ = (180 − 173)/30 = 0.23

For the standard normal distribution, P(Z > 0.23) = 0.4090

For the mean of 36 men, the standard deviation of the sample mean is:

σ_x̄ = σ/√n = 30/√36 = 5

z = (180 − 173)/5 = 1.40

P(Z > 1.4) = 0.0808
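Both probabilities can be checked numerically. The sketch below uses the mean of 173 lb and standard deviation of 30 lb from the problem statement and computes the normal tail with `math.erf`; the helper name is hypothetical. Because it does not round z to two decimals, the single-man probability comes out near 0.408 rather than the table value 0.4090.

```python
from math import erf, sqrt


def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * (1 - erf(z / sqrt(2)))


mu, sigma, threshold = 173, 30, 180

# One randomly selected man
z_single = (threshold - mu) / sigma
p_single = normal_upper_tail(z_single)

# Mean of n = 36 men: the standard deviation shrinks by sqrt(n)
n = 36
standard_error = sigma / sqrt(n)
z_mean = (threshold - mu) / standard_error
p_mean = normal_upper_tail(z_mean)

print(round(z_single, 2), round(p_single, 3))
print(round(z_mean, 2), round(p_mean, 4))
```

Averaging over 36 men makes a mean above 180 lb far less likely than a single weight above 180 lb, which is the practical point of the example.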

In a binary search, the array must be sorted in either ascending or descending order. At every step, the algorithm compares the search key with the key of the middle element of the array. If the keys match, a matching element has been found, and its index or position is returned. Otherwise, for an array in ascending order, if the search key is less than the key of the middle element, the algorithm repeats the step on the sub-array to the left of the middle element; if the search key is greater, it repeats the step on the sub-array to the right.
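The steps described above can be sketched as an iterative Python function. It assumes a list sorted in ascending order and returns -1 when the key is absent, which is a convention chosen for this example.

```python
def binary_search(sorted_items, key):
    """Return the index of key in an ascending list, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == key:
            return mid          # keys match: report the position
        if key < sorted_items[mid]:
            high = mid - 1      # continue in the left sub-array
        else:
            low = mid + 1       # continue in the right sub-array
    return -1


data = [2, 5, 8, 12, 16, 23, 38]
print(binary_search(data, 16))  # 4
print(binary_search(data, 7))   # -1
```

Each comparison halves the remaining search range, so the number of steps grows with the logarithm of the array length.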

A hash table is a data structure that implements an associative array, a structure that maps keys to values. It uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be retrieved.
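To make the idea of buckets concrete, here is a minimal sketch of a hash table in Python. It assumes separate chaining (each bucket is a list of key-value pairs) and a fixed number of buckets; the class and method names are hypothetical, and in practice Python's built-in `dict` already provides this structure.

```python
class HashTable:
    """Toy hash table using separate chaining and a fixed bucket count."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _index(self, key):
        # The hash function maps a key to a bucket index
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


table = HashTable()
table.put("mean", 4.5)
table.put("median", 4.0)
table.put("mean", 5.0)      # replaces the earlier value for "mean"

print(table.get("mean"))    # 5.0
print(table.get("mode"))    # None
```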

In the wide format, a subject's repeated responses are in a single row, and each response is in a separate column. In the long format, each row represents one time point per subject. Data in the wide format can be recognized by the fact that the columns generally represent groups.
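The two layouts can be illustrated with plain Python dictionaries. The subjects, column names (`t1`, `t2`, `t3`), and values below are hypothetical, as is the helper function.

```python
# Wide format: one row per subject, one column per time point
wide = [
    {"subject": "A", "t1": 5.1, "t2": 5.4, "t3": 5.9},
    {"subject": "B", "t1": 4.8, "t2": 5.0, "t3": 5.2},
]


def wide_to_long(rows, id_key, value_keys):
    """Turn each (subject, time point) pair into its own row."""
    return [
        {id_key: row[id_key], "time": key, "value": row[key]}
        for row in rows
        for key in value_keys
    ]


# Long format: one row per subject per time point
long_rows = wide_to_long(wide, "subject", ["t1", "t2", "t3"])

for row in long_rows:
    print(row)
```

Two wide rows with three time points each become six long rows.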

Data can be distributed in many ways, with a bias to the left or to the right. Often, however, data cluster around a central value with no bias to either side; such data approximate the normal distribution, which forms a bell-shaped curve.

**The normal distribution has the following properties:**

- Unimodal, i.e. it has one mode.
- The left and right halves are symmetrical and are mirror images of each other.
- It is bell-shaped with a maximum height at the center.
- The mean, median, and mode are all located at the center.
- Asymptotic, i.e. the tails approach the horizontal axis but never touch it.

A/B testing is a statistical hypothesis test that compares two variants, A and B. Its primary goal is to identify changes to a web page that maximize or increase an outcome of interest. A/B testing is an effective method for finding the most suitable online promotional and marketing strategies for a business. It is used to test everything from website copy to sales emails and search ads.
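One common way to analyze an A/B test on conversion rates is a two-proportion z-test. The text does not name a specific test, so this choice is an assumption, and the visitor and conversion counts below are made up for illustration.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * P(Z > |z|) for a standard normal Z
    p_value = 1 - erf(abs(z) / sqrt(2))
    return z, p_value


# Hypothetical experiment: 5000 visitors per variant
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)

print(round(z, 2), round(p_value, 4))
print("B differs from A at the 5% level:", p_value < 0.05)
```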

Sensitivity is used to validate the accuracy of a classifier, such as logistic regression, SVM, or random forest. It is calculated as Sensitivity = Predicted True Events / Total True Events, where predicted true events are the events that are actually true and are also predicted as true by the model.
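The calculation can be sketched as follows, treating label 1 as a true event. The labels and predictions are hypothetical.

```python
def sensitivity(actual, predicted):
    """True positives divided by all actual positives (label 1 = true event)."""
    true_positives = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    false_negatives = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return true_positives / (true_positives + false_negatives)


# Hypothetical labels and classifier output
actual = [1, 1, 1, 1, 0, 0, 0, 1]
predicted = [1, 0, 1, 1, 0, 1, 0, 1]

# 5 actual true events, 4 of them predicted as true
print(sensitivity(actual, predicted))  # 0.8
```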

In both statistics and machine learning, a common task is to fit a model to a set of training data so that it can make reliable predictions on general, unseen data. In overfitting, the statistical model describes random error or noise instead of the underlying relationship. An overfit model is overly complex, for example having too many parameters relative to the number of observations. It has poor predictive performance because it overreacts to minor fluctuations in the training data. In underfitting, the statistical model or machine learning algorithm cannot capture the underlying trend of the data. Such a model also has poor predictive performance.

**Conclusion**

Statistics has widespread applications. It is used in fields such as biology, meteorology, demography, economics, and mathematics. Without statistics, economic planning would be largely baseless. Because of its many applications, the field has huge scope. To succeed in it, one needs clear concepts, a sound knowledge of the basics, and a keen interest in the subject. The interview questions above, if prepared carefully, will help you get through an interview with ease.
