Data Analysis and Billing Hours Distribution for Consultants

Identifying the Distribution of Consultants and Billing Hours

Question

You work as a machine learning specialist for a consulting firm where you analyze data about the consultants who work there in preparation for using the data in your machine learning models.

The features in your data include employee id, specialty, practice, job description, billing hours, and principal.

The principal attribute is 'yes' or 'no', indicating whether the consultant has made principal level.

For your initial analysis, you need to identify the distribution of consultants and their billing hours for the given period.

What visualization best describes this relationship?

Answers

A. Scatter plot
B. Histogram
C. Line chart
D. Box plot
E. Bubble chart

Answer: B.

Option A is incorrect.

You are looking for the distribution of a single dimension: the consultants' billing hours.

From the Amazon QuickSight User Guide titled Working with Visual Types in Amazon QuickSight, “A scatter chart shows multiple distributions, i.e., two or three measures for a dimension.”

Option B is correct.

You are looking for the distribution of a single dimension: the consultants' billing hours.

From the Wikipedia article titled Histogram, “A histogram is an accurate representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous variable.” In this question, the continuous variable is billing hours: the hours are binned into ranges along the x-axis, and the frequency, the number of consultants in each billing-hour range, is plotted on the y-axis.
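A minimal sketch of the binning described above, in plain Python with no plotting library. The billing-hour values and the 40-hour bin width are hypothetical, chosen only for illustration, and `histogram_counts` is an illustrative helper, not part of any library. Bins are half-open ranges, so a value equal to a bin's upper edge is counted in the next bin.

```python
from collections import Counter


def histogram_counts(values, bin_width):
    """Count values per half-open range [start, start + bin_width).

    Returns (start, end, count) tuples from the lowest occupied range to the
    highest, including empty ranges in between so gaps stay visible.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    # Bin index i covers [i * bin_width, (i + 1) * bin_width).
    counts = Counter(int(v // bin_width) for v in values)
    if not counts:
        return []
    return [
        (i * bin_width, (i + 1) * bin_width, counts[i])
        for i in range(min(counts), max(counts) + 1)
    ]


# Hypothetical billing hours for ten consultants over one period
# (illustrative data, not taken from the question).
billing_hours = [120, 135, 150, 152, 160, 161, 175, 190, 210, 320]

# Print one text "bar" per range: the bar length is the consultant count.
for start, end, count in histogram_counts(billing_hours, 40):
    print(f"{start:>4}-{end:<4} {'#' * count} ({count})")
```

With this made-up data, most consultants land in the 120-200 range, and the two empty ranges before the lone consultant at 320 are the kind of gap or outlier a histogram makes visible at a glance.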

Option C is incorrect.

From the Amazon QuickSight User Guide titled Working with Visual Types in Amazon QuickSight, “Use line charts to compare changes in measured values over a period of time.” You are looking for a distribution, not a comparison of changes over a period of time.

Option D is incorrect.

From the Statistics How To article titled Types of Graphs Used in Math and Statistics, “A boxplot, also called a box and whisker plot, is a way to show the spread and centers of a data set. Measures of spread include the interquartile range and the mean of the data set. Measures of the center include the mean or average and median (the middle of a data set).” A box plot condenses the billing hours into a few summary statistics (median, quartiles, and whisker ends); it does not show how many consultants fall within each billing-hour range.

Once again, you are looking for the full distribution of a single dimension, and a box plot only summarizes it.
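To make the contrast with a histogram concrete, here is a sketch that computes the five numbers a box plot is drawn from (minimum, first quartile, median, third quartile, maximum) using hypothetical billing-hour data. It relies on Python's standard `statistics.quantiles` (Python 3.8+); its default "exclusive" method is only one of several quartile conventions, so a plotting tool may place the box edges slightly differently. `five_number_summary` is an illustrative helper name.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the statistics a box plot draws."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # n=4 splits the data into quartiles; quantiles() sorts a copy internally.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


# Hypothetical billing hours for ten consultants (illustrative only).
billing_hours = [120, 135, 150, 152, 160, 161, 175, 190, 210, 320]
print(five_number_summary(billing_hours))
# (120, 146.25, 160.5, 195.0, 320)
```

These five numbers describe center and spread, but none of them says how many consultants sit in each billing-hour range; that per-range count is what a histogram adds.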

Option E is incorrect.

From the Wikipedia article titled Bubble Chart, “A bubble chart is a type of chart that displays three dimensions of data. Each entity with its triplet (v1, v2, v3) of associated data is plotted as a disk that expresses two of the vi values through the disk's xy location and the third through its size.” Once again, you are looking for the distribution of a single dimension, not a chart of three dimensions.

Reference:

Please see the Amazon QuickSight User Guide titled Working with Amazon QuickSight Visuals, the Statistics How To article titled Types of Graphs Used in Math and Statistics, and the Wikipedia articles titled Histogram and Bubble Chart.

The best visualization to represent the distribution of consultants and their billing hours for the given period is a histogram (option B).

A histogram is a graphical representation of the distribution of a continuous variable. It consists of a series of bars, where each bar represents a range of values for the variable and the height of the bar represents the frequency or count of the observations within that range.

In this case, the billing hours are the continuous variable, and the frequency or count of consultants falling within each range of billing hours will be represented by the height of the bars. The histogram will allow you to see the distribution of billing hours across the entire dataset and identify any patterns or outliers.
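The question does not say whether each row of the data is a per-consultant total or a single time entry. If it were the latter (an assumption), the hours would first need to be summed per employee id for the period, so that each bar counts consultants rather than entries. A sketch with hypothetical rows; the `(employee_id, hours)` row shape and the `hours_per_consultant` helper are illustrative, not the question's actual schema:

```python
from collections import defaultdict


def hours_per_consultant(rows):
    """Sum billing hours per employee id.

    `rows` is an iterable of (employee_id, hours) pairs; this row shape is
    a hypothetical assumption for illustration.
    """
    totals = defaultdict(float)
    for employee_id, hours in rows:
        totals[employee_id] += hours
    return dict(totals)


# Hypothetical time entries for one period: (employee_id, billing_hours).
time_entries = [
    ("E001", 40), ("E001", 38),
    ("E002", 45), ("E002", 42),
    ("E003", 20), ("E003", 22),
    ("E004", 80),
]

totals = hours_per_consultant(time_entries)
print(totals)
# {'E001': 78.0, 'E002': 87.0, 'E003': 42.0, 'E004': 80.0}
```

Each value in `totals` stands for one consultant, so binning these totals rather than the raw entries makes the histogram's bar heights count consultants, which is what the question asks for.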

A scatter plot (option A) is used to visualize the relationship between two continuous variables, which is not relevant for this scenario. A line chart (option C) is used to visualize trends over time, which is also not applicable in this case. A box plot (option D) is useful for visualizing the distribution of a continuous variable and identifying outliers, but it does not show the frequency or count of observations within each range of values. A bubble chart (option E) is a type of scatter plot that adds a third dimension to the plot, which is not relevant for this scenario.

Therefore, a histogram is the best option to represent the distribution of consultants and their billing hours for the given period.