Data Analysis Interview Questions
What is DBMS?
A database management system (DBMS) is system software for creating and managing databases.
What is RDBMS?
A relational database management system (RDBMS) is a program used to create and maintain a relational database.
What data analytics software are you familiar with?
Data Processing Chain
Data -> Database -> Data Warehouse -> Data Mining -> Data Visualization
What is Data?
Anything that is recorded is data.
What is a Database?
- A database is an organized collection of structured information, or data, typically stored electronically in a computer system.
- Generally used for day-to-day operations
What is a Data Warehouse?
- A data warehouse is an organized store of data from all over the organization, specially designed to help make management decisions.
- Generally used for reporting and analysis
What is Data Visualization?
The graphic representation of data and information.
What is Data Analysis?
Data analysis is the process of collecting, modeling, and analyzing data to extract insights that support decision-making.
What are the responsibilities of a Data Analyst?
What is the importance of Data Analysis?
Informed decision-making, reduced costs, and better customer targeting.
Data Analysis Process / Data Mining Cycle
Identify -> Collect -> Clean -> Analyze -> Interpret
Data Analysis Methods
- Descriptive analysis – What happened
- Predictive analysis – What will happen
- Prescriptive analysis – What should be done about it
What are sampling techniques in Data Analysis?
Sampling is the practice of selecting a subset (a sample) of a population in order to draw conclusions about the whole population.
Types of sampling techniques
- Random sampling – selects the participants randomly
- Systematic sampling
- Cluster sampling
- Stratified sampling
- Judgmental or purposive sampling
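The rule-based sampling types listed above can be sketched in plain Python. This is only an illustration under made-up assumptions: the population of 100 numbered participants, the sample sizes, the two "regions" used as strata, and the blocks of 10 used as clusters are all hypothetical. Judgmental sampling is left out because it depends on an analyst's judgment rather than a rule.

```python
import random

# Hypothetical population: 100 participants numbered 1..100.
population = list(range(1, 101))
rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Random sampling: every participant has the same chance of selection.
random_sample = rng.sample(population, 10)

# Systematic sampling: pick every k-th participant after a random start.
k = len(population) // 10
start = rng.randrange(k)
systematic_sample = population[start::k]

# Stratified sampling: split the population into strata (here, two
# made-up regions) and draw a random sample from each stratum.
strata = {
    "north": population[:60],
    "south": population[60:],
}
stratified_sample = {
    name: rng.sample(members, len(members) // 10)
    for name, members in strata.items()
}

# Cluster sampling: split the population into clusters (here, blocks
# of 10), pick whole clusters at random, and keep everyone in them.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [person for cluster in chosen_clusters for person in cluster]
```

Note the difference between the last two: stratified sampling takes a few members from every group, while cluster sampling takes every member from a few groups.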
What is univariate Analysis?
A data analysis where the data being analyzed contains only one variable.
What is bivariate Analysis?
An analysis of two variables, usually to examine the relationship between them.
What is multivariate Analysis?
An analysis of three or more variables to understand the relationship of each variable with the other variables.
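As a rough illustration of the first two kinds of analysis, the sketch below summarizes one variable on its own (univariate) and then measures how two variables move together (bivariate). The study-hours and exam-score figures are made up, and the Pearson correlation coefficient is just one common choice of bivariate measure, picked here as an assumption.

```python
import statistics
from math import sqrt

# Made-up data: hours studied and exam score for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 80]

# Univariate analysis: summarize a single variable.
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Bivariate analysis: measure the relationship between two variables.
r = pearson(hours, scores)
```

A value of `r` close to 1 suggests the two variables rise together; a value close to -1 suggests one falls as the other rises.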
How can you handle missing values?
What is your process for cleaning data?
- Missing data
- Duplicate data
- Data from different sources
- Structural errors
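A minimal sketch of three of the cleaning concerns above (missing data, duplicate data, and structural errors) is shown below. The records, field names, and the decision to drop rows with a missing age are all assumptions made for illustration; in practice a missing value might instead be filled in. Merging data from different sources is not covered here.

```python
# Made-up raw records with typical problems: a missing value,
# a duplicate row, and inconsistent formatting (a structural error).
raw_records = [
    {"name": "Asha", "city": "Surat ", "age": "34"},
    {"name": "Ravi", "city": "surat", "age": None},
    {"name": "Asha", "city": "Surat ", "age": "34"},
    {"name": "Meena", "city": "RAJKOT", "age": "29"},
]

def clean(records):
    """Return records with consistent formatting, no gaps, no duplicates."""
    cleaned, seen = [], set()
    for rec in records:
        # Structural errors: normalize whitespace and letter case.
        city = rec["city"].strip().title()
        # Missing data: in this sketch, rows with no age are dropped.
        if rec["age"] is None:
            continue
        row = (rec["name"], city, int(rec["age"]))
        # Duplicate data: keep only the first copy of each row.
        if row in seen:
            continue
        seen.add(row)
        cleaned.append({"name": row[0], "city": row[1], "age": row[2]})
    return cleaned

cleaned_records = clean(raw_records)
```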
What is Quantitative data?
- Quantitative data are measures of values or counts and are expressed as numbers.
- Quantitative data are data about numeric variables (e.g. how many; how much; or how often).
What is Qualitative data?
Qualitative data are measures of ‘types’ and may be represented by a name, symbol, or number code.
Which validation methods are employed by data analysts?
- Field Level Validation
- Form Level Validation
- Data Saving Validation
What is an outlier?
In data analytics, outliers are values within a dataset that vary greatly from the other values.
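One common way to flag such values is the interquartile range (IQR) rule, sketched below. The 1.5 × IQR threshold is a widely used convention rather than something these notes prescribe, and the daily sales figures are made up.

```python
import statistics

def iqr_outliers(values):
    """Return values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily sales; one day is far from the rest.
daily_sales = [20, 22, 21, 23, 19, 24, 22, 95]
outliers = iqr_outliers(daily_sales)
```

Whether a flagged value is an error or a genuine extreme case still has to be judged from the context of the data.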
Full form of BI tools
Business Intelligence (BI) tools
What are BI tools?
Business intelligence (BI) tools are types of application software that collect and process large amounts of unstructured data from internal and external systems.
Which BI tools are available in MS Excel?
Tables, PivotTables, charts, conditional formatting, slicers, timelines, Power Pivot
What is Data Mining?
Data mining is the process of discovering relevant information in data that has not previously been identified.
What is Diamond mining?
It is the act of digging into large amounts of unrefined ore to discover precious gems or nuggets.
What is Text Mining?
Text mining is the art and science of discovering knowledge, insights, and patterns from an organized collection of textual databases.
What is Web Mining?
Web mining is the art and science of discovering patterns and insights from the World Wide Web.
Types of Web Mining
- Web content mining
- Web structure mining
- Web usage mining
What is Data profiling?
Data profiling is the process of examining, analyzing, and creating useful summaries of data.
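A simple profile can be sketched as a per-column summary. The function name `profile`, the chosen statistics, and the sample order rows are assumptions for illustration; real profiling tools report many more measures.

```python
def profile(rows):
    """Summarize each column: counts, missing values, distinct values, range."""
    summary = {}
    for col in rows[0].keys():
        values = [row[col] for row in rows]
        present = [v for v in values if v is not None]
        summary[col] = {
            "count": len(present),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "min": min(present) if present else None,
            "max": max(present) if present else None,
        }
    return summary

# Made-up order data with one missing quantity.
orders = [
    {"product": "pen", "quantity": 10},
    {"product": "book", "quantity": None},
    {"product": "pen", "quantity": 4},
]
report = profile(orders)
```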
What is Data Wrangling?
It is the process wherein raw data is cleaned, structured, and enriched into a desired usable format for better decision-making.
What is Cluster analysis?
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups.
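To make the idea concrete, here is a very small k-means sketch for one-dimensional data. K-means is only one of several clustering methods and is chosen here as an assumption; the ages and the two starting centers are made up.

```python
def kmeans_1d(values, centers, iterations=10):
    """Very small k-means for one-dimensional data."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group.
        centers = [
            sum(g) / len(g) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups

# Hypothetical customer ages that fall into two obvious groups.
ages = [18, 19, 21, 22, 58, 60, 61, 63]
centers, groups = kmeans_1d(ages, centers=[18, 63])
```

Each resulting group holds values that are closer to their own center than to the other one, which matches the definition above.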
What is Cohort analysis?
Cohort analysis is a kind of behavioral analytics that breaks the data in a data set into related groups before analysis.
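A minimal sketch of the idea: group customers by signup month (the cohort), then compute a measure per cohort. The customer records, the `signup_month` and `repeat_buyer` fields, and the repeat-purchase rate are hypothetical choices for illustration.

```python
from collections import defaultdict

# Made-up customers: signup month and whether they bought again later.
customers = [
    {"id": 1, "signup_month": "2023-01", "repeat_buyer": True},
    {"id": 2, "signup_month": "2023-01", "repeat_buyer": False},
    {"id": 3, "signup_month": "2023-02", "repeat_buyer": True},
    {"id": 4, "signup_month": "2023-02", "repeat_buyer": True},
]

# Step 1: break the data set into cohorts (related groups).
cohorts = defaultdict(list)
for customer in customers:
    cohorts[customer["signup_month"]].append(customer)

# Step 2: analyze each cohort separately.
repeat_rate = {
    month: sum(c["repeat_buyer"] for c in members) / len(members)
    for month, members in cohorts.items()
}
```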
What is EDA (Exploratory Data Analysis)?
It refers to the critical process of performing initial investigations on data to discover patterns.
What are Decision Trees?
Decision trees are a simple way to guide one’s path to a decision. The decision may be a simple binary one, such as whether or not to approve a loan.
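The loan example can be written as a tiny hand-built tree, where each `if` is one branch. The function name, the input fields, and every threshold below are invented for illustration; a real tree would be learned from historical data.

```python
def approve_loan(income, credit_score, existing_debt):
    """Walk a small hand-written decision tree to a yes/no answer."""
    # Root node: a poor credit score rejects the application outright.
    if credit_score < 600:
        return False
    # Second level: a high income approves the application.
    if income >= 50000:
        return True
    # Leaf: with a lower income, approve only if existing debt is small.
    return existing_debt < 10000

decision = approve_loan(income=30000, credit_score=650, existing_debt=5000)
```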
What is Big Data?
Big data is an umbrella term for a collection of data sets so large and complex that it becomes difficult to process them using traditional data management tools.
Full form of KNN
Python libraries used in data analysis
Explain Collaborative Filtering
Collaborative filtering recommends items based on users' behavioral data, such as past purchases or ratings.
For example, online shopping sites use it when they show phrases such as “recommended for you”.
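A rough sketch of user-based collaborative filtering follows: items are suggested to a user because other users with overlapping purchases bought them. The purchase history, user names, and the overlap-count scoring are assumptions for illustration; production recommenders use more refined similarity measures.

```python
from collections import Counter

# Made-up purchase history: user -> set of items bought.
purchases = {
    "user_a": {"phone", "case", "charger"},
    "user_b": {"phone", "case", "earbuds"},
    "user_c": {"laptop", "mouse"},
}

def recommend(user, history):
    """Suggest items bought by users whose purchases overlap with `user`'s."""
    own = history[user]
    scores = Counter()
    for other, items in history.items():
        if other == user:
            continue
        overlap = len(own & items)  # shared items act as a similarity score
        if overlap == 0:
            continue
        # Items the similar user bought that `user` does not have yet.
        for item in items - own:
            scores[item] += overlap
    return [item for item, _ in scores.most_common()]

recommended = recommend("user_a", purchases)
```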
What is Predictive Accuracy?
Predictive Accuracy = Correct Predictions / Total Predictions
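The formula above can be applied directly in a few lines. The function name and the yes/no labels are hypothetical; the calculation itself is just correct predictions divided by total predictions.

```python
def predictive_accuracy(actual, predicted):
    """Share of predictions that match the actual outcomes."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one prediction is required")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# Made-up outcomes: 4 of the 5 predictions are correct.
actual = ["yes", "no", "yes", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "no"]
accuracy = predictive_accuracy(actual, predicted)
```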
What is the maximum possible predictive accuracy?
What is the minimum predictive accuracy considered acceptable for use?
How have you used Excel for data analysis in the past?
What is a VLOOKUP, and what are its limitations?
What is a pivot table, and how do you make one?
How do you find and remove duplicate data?
What is a Sparkline?
A sparkline is a tiny chart in a worksheet cell that provides a visual representation of data.
What are Slicers?
Slicers provide buttons that you can click to filter tables or PivotTables.
What is a Timeline in MS Excel?
Microsoft Excel’s timeline object is a dynamic filter option that filters PivotTables and PivotCharts by Date/Time values.
What is Power Pivot in MS Excel?
It is an Excel add-in you can use to perform powerful data analysis and create sophisticated data models.
Difference between a normal PivotTable and Power Pivot
A normal PivotTable lists only the fields of the single table or source it points to. Power Pivot gives access to any field in any table in the data model, and lets us analyze them based on the relationships we have defined between those tables.
Analysis Facilities in Excel
Tables, PivotTables, charts, conditional formatting
Open-Ended Questions
What will you do if Excel is not working while presenting your data?
What will you do to get data every day from 100 different web pages?
How will you handle duplicate data in MS Excel?
How much time will you take to learn a new language or database system?
How important is salary to you?
What will you do if your actual salary is lower than you expected?
In Excel, how would you share data so that it is available live 24 hours a day?
What will be your strategy to collect basic information from a large population?
What will be your strategy for data backup?
You must take and store a backup every year. How will you do it, and which storage medium will you use?
How will you secure your database?
What will be your strategy for typing up 50 image pages in 1 hour?