Hasan's Post

19 November 2022

EDA on Anonymized Data

by Hasan

Anonymized data

What can be done as a data scientist or competition participant?

1. Explore individual features

2. Explore feature relations

1. Individual features

plt.hist(x)
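As a minimal sketch of this step, the snippet below plots a histogram of a single feature. The data is synthetic and the column name `f0` is hypothetical; with real anonymized data, `x` would simply be one column of the dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for an anonymized dataset (assumption: not the post's data).
rng = np.random.default_rng(0)
df = pd.DataFrame({"f0": rng.normal(loc=0.0, scale=1.0, size=1000)})

x = df["f0"]

# plt.hist returns the per-bin counts and the bin edges along with the drawn patches.
counts, bin_edges, _ = plt.hist(x, bins=50)
plt.xlabel("f0")
plt.ylabel("count")
plt.savefig("f0_hist.png")
```

Changing `bins` is worth trying, since a feature can look quite different at different resolutions.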

2. Feature relations

plt.scatter(x,y)
\[x_2 \le 1 - x_1\]

[Figure: diagonal_equation]
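A scatter plot can make a relation like this inequality visible as a sharp diagonal edge. The sketch below uses synthetic data that is built to satisfy the relation, with hypothetical feature names `x1` and `x2`, and then checks the inequality numerically.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(0.0, 1.0, size=500)
# Assumption for illustration: x2 is constructed so that x2 <= 1 - x1 holds by design.
x2 = rng.uniform(0.0, 1.0, size=500) * (1.0 - x1)

plt.scatter(x1, x2, s=5)
plt.plot([0, 1], [1, 0], color="red")  # the boundary line x2 = 1 - x1
plt.xlabel("x1")
plt.ylabel("x2")
plt.savefig("diagonal_equation.png")

# Numeric check of the relation suggested by the plot.
holds = bool(np.all(x2 <= 1.0 - x1))
print(holds)
```

On real data the relation would be a hypothesis read off the plot, so a numeric check like the last lines is a cheap way to confirm it.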

pd.plotting.scatter_matrix(df)
df.corr(), plt.matshow(..)

[Figure: messy_matrix]
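Put together, the scatter matrix and the correlation matrix can be drawn roughly as follows. This is only a sketch on synthetic data; the feature names `f0`, `f1`, `f2` and the way they are generated are assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
base = rng.normal(size=n)

# Synthetic features: f1 is a noisy copy of f0, f2 is independent of both.
df = pd.DataFrame({
    "f0": base,
    "f1": base + 0.1 * rng.normal(size=n),
    "f2": rng.normal(size=n),
})

# Pairwise scatter plots for every pair of features.
pd.plotting.scatter_matrix(df, figsize=(6, 6))
plt.savefig("scatter_matrix.png")

# Pairwise Pearson correlations, shown as an image.
corr = df.corr()
plt.matshow(corr.to_numpy())
plt.colorbar()
plt.savefig("corr_matrix.png")
```

With many features the scatter matrix becomes slow and hard to read, which is where the correlation image tends to be more practical.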

We can apply some kind of clustering, such as k-means, to the rows and columns and reorder the features accordingly before plotting. The following plot is the result of k-means clustering.

[Figure: ordered]
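One possible way to do this reordering is sketched below. It is an assumption about the procedure, not necessarily how the plot in this post was produced: each feature is described by its row of the correlation matrix, scikit-learn's `KMeans` groups similar rows, and the features are then sorted by cluster label. The data and the feature names are synthetic.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n = 300
g1 = rng.normal(size=n)
g2 = rng.normal(size=n)

# Two hidden groups of features, interleaved so the raw matrix looks messy.
df = pd.DataFrame({
    "a0": g1 + 0.1 * rng.normal(size=n),
    "b0": g2 + 0.1 * rng.normal(size=n),
    "a1": g1 + 0.1 * rng.normal(size=n),
    "b1": g2 + 0.1 * rng.normal(size=n),
    "a2": g1 + 0.1 * rng.normal(size=n),
    "b2": g2 + 0.1 * rng.normal(size=n),
})
corr = df.corr()

# Cluster the features, using each row of the correlation matrix as its description.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(corr.to_numpy())

# A stable sort keeps the original order within each cluster.
order = np.argsort(labels, kind="stable")
ordered_cols = corr.columns[order]
ordered_corr = corr.loc[ordered_cols, ordered_cols]

plt.matshow(ordered_corr.to_numpy())
plt.colorbar()
plt.savefig("ordered_corr.png")
```

The number of clusters is a free choice here; on real data it usually takes a few tries before the blocks along the diagonal become clear.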

Feature groups

df.mean().plot(style='.')

x-axis: feature, y-axis: feature mean

df.mean().sort_values().plot(style='.')

[Figure: ordered_feature_mean]
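The effect of sorting can be tried on a small synthetic example. In the sketch below the feature names and their means are made up: two features are centred near 0 and two near 5, interleaved in the column order. After sorting, features with nearly equal means sit next to each other, which may hint at a group.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic features (assumption): f0 and f2 have mean near 5, f1 and f3 near 0.
df = pd.DataFrame({
    "f0": 5.0 + 0.1 * rng.normal(size=n),
    "f1": 0.0 + 0.1 * rng.normal(size=n),
    "f2": 5.0 + 0.1 * rng.normal(size=n),
    "f3": 0.0 + 0.1 * rng.normal(size=n),
})

# Sort the per-feature means so that similar features end up adjacent.
means = df.mean().sort_values()

ax = means.plot(style=".")
ax.set_xlabel("feature")
ax.set_ylabel("feature mean")
plt.savefig("ordered_feature_mean.png")
```

The sorted index, `means.index`, can then be used to reorder the columns of `df` before looking at the correlation matrix again.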
