
MLOps - Day 15

Today's learning :-
Data Visualization:-
  • I'm continuing with data visualization. Python has many libraries for it (matplotlib, seaborn, folium, etc.). Today I'll learn one more of them: seaborn.
  • Seaborn is a Python library for drawing statistical graphs; in the background it uses matplotlib.
  • We do two types of operations on data: 1. Analysis - working on past data to understand what has already happened. 2. Analytics - using data to predict what is likely to happen in the future.
  • There are three types of analysis, depending on how many variables we look at together: uni-variate (one variable), bi-variate (two variables) and multivariate (more than two variables). The graphs plotted for them are known as uni-variate, bi-variate and multivariate distribution graphs.
  • Let's do some practicals of data visualization using the seaborn library -
Code :-
import seaborn as sns
sns.set()
tips = sns.load_dataset('tips')
tips.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   total_bill  244 non-null    float64 
 1   tip         244 non-null    float64 
 2   sex         244 non-null    category
 3   smoker      244 non-null    category
 4   day         244 non-null    category
 5   time        244 non-null    category
 6   size        244 non-null    int64   
dtypes: category(4), float64(2), int64(1)
memory usage: 7.3 KB
tips.head(5)
tips.columns
Index(['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size'], dtype='object')
bill = tips['total_bill']
bill.head()
0    16.99
1    10.34
2    21.01
3    23.68
4    24.59
Name: total_bill, dtype: float64
bill.max() 
50.81 
bill.min()
3.07 
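# Univariate: distribution of a single column.
# Note: distplot is deprecated since seaborn 0.11; on newer versions
# sns.histplot(bill, kde=True) is the closest replacement.
sns.distplot(bill)
<matplotlib.axes._subplots.AxesSubplot at 0xe2cacc8>
# Histogram only (no KDE curve), split into 50 bins
sns.distplot(bill, kde=False, bins=50)
<matplotlib.axes._subplots.AxesSubplot at 0xe415548>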
bill.head()
0    16.99
1    10.34
2    21.01
3    23.68
4    24.59
Name: total_bill, dtype: float64
# Bivariate
sns.jointplot(data=tips, x='total_bill', y='tip', kind='scatter')

<seaborn.axisgrid.JointGrid at 0xed40fc8>
You can check the seaborn documentation for more details and options.
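To make the bins=50 option used above a little more concrete, here is a minimal sketch in plain Python of what equal-width binning does. It only uses the numbers already printed above (bill.min(), bill.max() and the five rows from bill.head()); the variable names are my own, and it illustrates the idea rather than seaborn's actual implementation.

```python
# Values copied from the outputs above: bill.min(), bill.max(), bill.head().
lowest, highest = 3.07, 50.81
first_five = [16.99, 10.34, 21.01, 23.68, 24.59]
n_bins = 50  # same number of bins as in the histogram call above

# Equal-width bins: split the range [lowest, highest] into n_bins intervals.
width = (highest - lowest) / n_bins
edges = [lowest + i * width for i in range(n_bins + 1)]

# Count how many values fall into each bin.
counts = [0] * n_bins
for value in first_five:
    # The maximum value belongs to the last bin, so clamp the index.
    index = min(int((value - lowest) / width), n_bins - 1)
    counts[index] += 1

print(f"bin width: {width:.4f}")
for i, count in enumerate(counts):
    if count:
        print(f"bin {i}: starts at {edges[i]:.2f}, count = {count}")
```

With 50 bins each bar covers a bit less than one unit of total_bill, which is why the second plot looks much more detailed than the default one.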

After this, I jumped to classification in machine learning:
  • If the target is not continuous and we only need to decide whether something will happen or not, we solve the use case with a classification approach instead of regression.
  • There are two types of classification: binary and multi-class.
  • As mentioned, the model gives a probability (a value between 0 and 1).
  • In binary classification we have to set a cut-off point to decide whether something will happen or not.
  • To solve binary classification use cases we can use logistic regression, which sklearn provides as LogisticRegression; in the background it uses the sigmoid function, 1/(1 + e^-x).
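The sigmoid function and the cut-off point can be sketched in a few lines of plain Python. This is only an illustration: the function names, the sample scores and the 0.5 cut-off are my own assumptions, and it is not how sklearn is implemented internally.

```python
import math


def sigmoid(x: float) -> float:
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def classify(x: float, cutoff: float = 0.5) -> int:
    """Return 1 if the probability reaches the cut-off, otherwise 0."""
    return 1 if sigmoid(x) >= cutoff else 0


# Hypothetical raw scores, chosen only for illustration.
scores = [-2.0, 0.0, 3.0]
probabilities = [sigmoid(s) for s in scores]
labels = [classify(s) for s in scores]

for s, p, label in zip(scores, probabilities, labels):
    print(f"score={s:+.1f}  probability={p:.3f}  class={label}")
```

A negative score gives a probability below 0.5 and a positive score gives one above 0.5, so moving the cut-off up or down changes how readily the model answers "yes".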
