Text-Classification

Converted the raw documents fetched from the 20newsgroups dataset into a vocabulary frequency table, excluding stop words.
Created a dictionary and implemented a Multinomial Naive Bayes classifier from scratch, which classified the documents with an accuracy of 86%.
Printed the classification report for both the built-in and the self-implemented classifiers; the two reached approximately the same accuracy.
I imported the stop words from nltk and added more collected from the internet.
Instead of the split() function, I used a tokenizer, which makes the job much easier.
Instead of manually downloading the data from the internet, I fetched it with sklearn.datasets.fetch_20newsgroups.
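
The vocabulary step described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the regex tokenizer, the tiny `STOP_WORDS` set, and the function names `tokenize` and `build_vocabulary` are assumptions made for the example. The project itself takes its stop words from nltk and uses a tokenizer that the README does not name.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set (assumption); the project uses nltk's list plus extras.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "it", "on"}

# Simple regex tokenizer (assumption): runs of lowercase letters only.
TOKEN_PATTERN = re.compile(r"[a-z]+")


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stop words."""
    return [t for t in TOKEN_PATTERN.findall(text.lower()) if t not in STOP_WORDS]


def build_vocabulary(documents):
    """Count how often each token occurs across all documents."""
    vocabulary = Counter()
    for doc in documents:
        vocabulary.update(tokenize(doc))
    return vocabulary


# Toy documents standing in for the 20newsgroups posts.
docs = [
    "The engine of the car is loud.",
    "A car and a bike on the road.",
]
vocab = build_vocabulary(docs)
print(vocab.most_common(3))
```

Unlike a plain `split()`, the regex drops punctuation and digits, so `"loud."` and `"loud"` count as the same word.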

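A from-scratch Multinomial Naive Bayes classifier of the kind mentioned above might look like the sketch below. It assumes Laplace (add-one) smoothing and log-probabilities, which are common choices but are not confirmed by this README. The names `train` and `predict` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(token_docs, labels):
    """Return per-class document counts, per-class word counts, and the vocabulary."""
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(token_docs, labels):
        word_counts[label].update(tokens)
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return class_doc_counts, word_counts, vocabulary


def predict(tokens, model, alpha=1.0):
    """Return the class with the highest log-posterior for the given tokens."""
    class_doc_counts, word_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_doc_counts.items():
        # Log prior: fraction of training documents in this class.
        score = math.log(doc_count / total_docs)
        # Smoothed denominator: class word total plus alpha per vocabulary word.
        denom = sum(word_counts[label].values()) + alpha * len(vocabulary)
        for token in tokens:
            if token not in vocabulary:
                continue  # skip words never seen during training
            score += math.log((word_counts[label][token] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, already-tokenized training data (assumption; not the real dataset).
train_docs = [
    ["engine", "car", "wheel"],
    ["car", "road", "driver"],
    ["goal", "team", "match"],
    ["team", "player", "goal"],
]
train_labels = ["autos", "autos", "sports", "sports"]

model = train(train_docs, train_labels)
print(predict(["car", "engine"], model))
print(predict(["goal", "player"], model))
```

Summing logarithms rather than multiplying raw probabilities avoids numeric underflow on long documents, and the smoothing term keeps a word unseen in one class from zeroing out that class entirely.
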