Introduction

TangledFeatures links similar features on the basis of correlation. It automatically detects the type of correlation needed between two features. Correlations are currently generated between numeric, unordered categorical, and ordered categorical variables. You can set the correlation type within the package to suit your data set.
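To make the idea of type detection concrete, here is a minimal sketch in base R of how a pair of columns could be classified before a correlation measure is chosen. The helpers `feature_kind()` and `pair_kind()` and the example data frame are hypothetical names invented for this illustration; they are not part of TangledFeatures and may differ from how the package does it internally.

```r
# Hypothetical helper (not part of TangledFeatures): classify one column.
feature_kind <- function(x) {
  if (is.numeric(x)) {
    "numeric"
  } else if (is.ordered(x)) {
    "ordered factor"
  } else if (is.factor(x) || is.character(x)) {
    "unordered factor"
  } else {
    stop("unsupported column type: ", class(x)[1])
  }
}

# Hypothetical helper: describe the pairing of two columns in a data frame.
pair_kind <- function(data, a, b) {
  paste(feature_kind(data[[a]]), feature_kind(data[[b]]), sep = " / ")
}

# Made-up example data with one column of each kind.
df <- data.frame(
  age   = c(23, 35, 41, 58),
  city  = factor(c("Oslo", "Lima", "Oslo", "Pune")),
  grade = factor(c("low", "high", "mid", "high"),
                 levels = c("low", "mid", "high"), ordered = TRUE)
)

pair_kind(df, "age", "grade")
pair_kind(df, "city", "grade")
```

Note that the ordered check has to come before the general factor check, because an ordered factor is also a factor in R.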

You can also generate a correlation heat map for the entire data set, as well as the interconnected network graph.

Correlation types

  • Numeric-to-Numeric: Set by the cor1 metric. Currently it defaults to Pearson correlation.
  • Numeric-to-Factor(Unordered): Set by the cor2 metric. Currently it defaults to point-biserial correlation.
  • Numeric-to-Factor(Ordered): Set by the cor3 metric. Currently it defaults to Kendall correlation.
  • Factor(Unordered)-to-Factor(Ordered): Set by the cor3 metric. Currently it defaults to a chi-squared test.
  • Factor(Unordered)-to-Factor(Unordered): Set by the cor3 metric. Currently it defaults to Cramér’s V.
  • Factor(Ordered)-to-Factor(Ordered): Set by the cor3 metric. Currently it defaults to polychoric correlation.
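For readers unfamiliar with these measures, the sketch below computes several of them by hand in base R on simulated data. It is an illustration of the statistics themselves, not of how TangledFeatures computes them; all variable names and the simulated data are assumptions made for this example. Polychoric correlation is left out because base R has no function for it.

```r
# Simulated data; every name here is made up for the illustration.
set.seed(1)
n <- 200
x_num <- rnorm(n)
y_num <- 0.8 * x_num + rnorm(n, sd = 0.5)
f_bin <- factor(ifelse(x_num + rnorm(n) > 0, "yes", "no"))      # unordered, two levels
f_cat <- factor(sample(c("a", "b", "c"), n, replace = TRUE))    # unordered, three levels
f_ord <- cut(x_num, breaks = c(-Inf, -0.5, 0.5, Inf),
             labels = c("low", "mid", "high"), ordered_result = TRUE)

# Numeric-to-numeric: Pearson correlation.
r_pearson <- cor(x_num, y_num, method = "pearson")

# Numeric to two-level factor: point-biserial correlation, which is
# Pearson correlation with the factor coded as 0/1.
r_pb <- cor(x_num, as.integer(f_bin == "yes"))

# Numeric to ordered factor: Kendall's tau on the integer level codes.
tau <- cor(x_num, as.integer(f_ord), method = "kendall")

# Factor-to-factor: chi-squared test, and Cramér's V derived from its statistic.
tab <- table(f_bin, f_cat)
chi <- chisq.test(tab, correct = FALSE)
cramers_v <- sqrt(unname(chi$statistic) / (sum(tab) * (min(dim(tab)) - 1)))

round(c(pearson = r_pearson, point_biserial = r_pb,
        kendall = tau, cramers_v = cramers_v), 3)
```

Pearson, point-biserial, and Kendall values lie in [-1, 1], while Cramér's V lies in [0, 1] and carries no sign, so values from different measures are best compared by magnitude.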

Correlation visualization

Set plot = TRUE in the initial TangledFeatures() call.

We can generate the heat map from the final TangledFeatures object:

# Request the plots with plot = TRUE, then extract the heat map
plot_obj <- TangledFeatures(Data, plot = TRUE)$Correlation_heatmap
plot(plot_obj)

We can generate the interconnected graph as well:

# Request the plots with plot = TRUE, then extract the network graph
plot_obj <- TangledFeatures(Data, plot = TRUE)$Graph_plot
plot(plot_obj)