Link to GitHub repo
These are the columns of the dataset, each representing an aspect of a piece of software. Together they determine whether the software is legitimate or malware.
Correlation between the attributes
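A correlation matrix like this can be computed with pandas. The snippet below is only a rough sketch: the column names (`feature_a`, `feature_b`, `feature_c`, `legitimate`) and the synthetic values are placeholders, not the actual columns of this dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; the real dataset's columns are not reproduced here.
rng = np.random.default_rng(0)
n = 200
base = rng.normal(size=n)

df = pd.DataFrame({
    "feature_a": base,                                          # hypothetical attribute
    "feature_b": 2.0 * base + rng.normal(scale=0.1, size=n),    # strongly tied to feature_a
    "feature_c": rng.normal(size=n),                            # independent noise
})
# Hypothetical binary label: 1 = legitimate, 0 = malware.
df["legitimate"] = (df["feature_a"] > 0).astype(int)

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()
print(corr.round(2))
```

Pairs of attributes with a correlation close to 1 or -1 carry largely the same information, which is worth knowing before picking features for a model.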
- Little data preparation was needed because the dataset was already well processed.
- Did hyperparameter tuning using GridSearchCV.
- Among GaussianNB, AdaBoostClassifier, DecisionTreeClassifier, KNeighborsClassifier, SVM, RandomForest, and LogisticRegression, RandomForest performed best, with a score of 99.1%.
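The workflow above can be sketched roughly as follows. This is a minimal illustration, not the code from the repo: it uses synthetic data from `make_classification` in place of the malware dataset, and the parameter grid, the 3-fold cross-validation, and the accuracy metric are all assumptions. It will not reproduce the 99.1% figure.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the malware dataset (assumption).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The seven classifiers named above, with mostly default settings.
models = {
    "GaussianNB": GaussianNB(),
    "AdaBoostClassifier": AdaBoostClassifier(random_state=0),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "KNeighborsClassifier": KNeighborsClassifier(),
    "SVM": SVC(),
    "RandomForest": RandomForestClassifier(random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

# Mean cross-validated accuracy on the training split for each model.
scores = {
    name: cross_val_score(model, X_train, y_train, cv=3).mean()
    for name, model in models.items()
}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")

# Grid search over a small, hypothetical grid for the random forest.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="accuracy"
)
search.fit(X_train, y_train)

# Accuracy of the best configuration on the held-out test split.
test_accuracy = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print(f"Held-out accuracy: {test_accuracy:.3f}")
```

Keeping a held-out test split separate from the data used by the grid search helps ensure that the reported score is not inflated by the tuning itself.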