The main TangledFeatures function

Usage

TangledFeatures(
  Data,
  Y_var,
  Focus_variables = list(),
  corr_cutoff = 0.85,
  RF_coverage = 0.95,
  plot = FALSE,
  fast_calculation = FALSE,
  cor1 = "pearson",
  cor2 = "polychoric",
  cor3 = "spearman"
)

Arguments

Data

The imported Data Frame

Y_var

The dependent variable

Focus_variables

A list of variables that you wish to favour (give a bias to) in the correlation matrix. Defaults to an empty list

corr_cutoff

The correlation cutoff. Defaults to 0.85

RF_coverage

The Random Forest coverage of explainability. Defaults to 0.95 (95 percent)

plot

Whether plots should be produced. Logical, TRUE or FALSE. Defaults to FALSE

fast_calculation

If TRUE, returns the variable list without running many Random Forest iterations, by simply picking one variable from each correlated group. Defaults to FALSE

cor1

The correlation metric between two continuous features. Defaults to Pearson correlation ("pearson")

cor2

The correlation metric between one categorical feature and one continuous feature. Described as biserial correlation; the default value in the Usage signature is "polychoric"

cor3

The correlation metric between two categorical features. Described as Cramer's V; the default value in the Usage signature is "spearman"
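
To make corr_cutoff and fast_calculation more concrete, here is a minimal base R sketch of the general idea: group features whose absolute correlation reaches a cutoff, then keep one feature per group. It is an illustration only and not the package's internal code. The built-in mtcars data, the single-linkage grouping via hclust(), and the "keep the first member" rule are all assumptions chosen for the example.

```r
# Illustrative sketch in base R; NOT the TangledFeatures implementation.
# Assumptions: mtcars as stand-in data, Pearson correlation only,
# single-linkage clustering to form correlated groups.
num <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]
cutoff <- 0.85

# Pairwise Pearson correlations between continuous features
cm <- cor(num, method = "pearson")

# Treat 1 - |r| as a distance. Cutting a single-linkage tree at
# 1 - cutoff joins any two features linked by a chain of pairs
# whose |r| is at least the cutoff.
tree <- hclust(as.dist(1 - abs(cm)), method = "single")
groups <- cutree(tree, h = 1 - cutoff)

# Members of each correlated group
variable_groups <- split(names(groups), groups)

# "Fast" selection: keep the first member of each group
picked <- names(groups)[!duplicated(groups)]

print(variable_groups)
print(picked)
```

Lowering the cutoff merges more features into the same group, so fewer variables survive the selection; raising it does the opposite.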

Value

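Returns a list. Its Final_Variables element holds the variables that are ready for future modelling; the other elements (Variable_groups, Correlation_heatmap and Graph_plot in the example below) hold the accompanying groupings and plots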

Examples

TangledFeatures(Data = TangledFeatures::Advertisement, Y_var = 'Sales')
#> Warning: length(LHS)==0; no columns to delete or assign RHS to.
#> $Final_Variables
#> [1] "tv"    "radio"
#> 
#> $Variable_groups
#> NULL
#> 
#> $Correlation_heatmap
#> NULL
#> 
#> $Graph_plot
#> NULL
#>
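
The returned list can also be stored and inspected by element. The sketch below repeats the call from the example above and reads the element names shown in its output; the variable name res is only a placeholder, and the package must be installed for it to run.

```r
# Same call as the example above, with the result kept for inspection.
# `res` is a placeholder name chosen for this sketch.
res <- TangledFeatures::TangledFeatures(
  Data  = TangledFeatures::Advertisement,
  Y_var = "Sales"
)

# Names of the returned elements
names(res)

# The selected variables, ready to be passed to a model
res$Final_Variables
```

In the output above, Variable_groups, Correlation_heatmap and Graph_plot are NULL for this data set, so it may be worth checking with is.null() before using them.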