Data mining

Document classification

Document classification or document categorization is a problem in library science, information science and computer science. The task is to assign a document to one or more classes or categories. This may be done "manually" (or "intellectually") or algorithmically. The intellectual classification of documents has mostly been the province of library science, while the algorithmic classification of documents is mainly in information science and computer science. The problems overlap, however, and there is therefore interdisciplinary research on document classification. The documents to be classified may be texts, images, music, etc. Each kind of document poses its own classification problems. When not otherwise specified, text classification is implied. Documents may be classified according to their subjects or according to other attributes (such as document type, author, printing year, etc.). In the rest of this article only subject classification is considered. There are two main philosophies of subject classification of documents: the content-based approach and the request-based approach. (Wikipedia).

HTML Paragraphs

In this HTML video, you’ll learn about paragraphs. They help to organize text on websites. We hope you enjoy! To learn more, check out our Basic HTML tutorial.

From playlist HTML

Formatting a Business Document

In this video, you’ll learn more about formatting a business document. Visit to learn even more. We hope you enjoy!

From playlist Communication in the Workplace

HTML and CSS Basics - Text I

This E-Lecture deals with the formatting of text in plain HTML. It discusses the use of headings and paragraphs, shows how text can be subdivided into divisions and introduces the principles of working with HTML-entities.

From playlist HTML and CSS Basics

Text Classification 1: Centroid Method

The simplest way to classify text is to construct a centroid representation of each class by averaging the positive/negative training examples. We can classify a new document by seeing which class centroid it is closer to. This results in a linear decision boundary, wh

From playlist Text Classification
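
The centroid method described above can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: it assumes whitespace tokenization, raw bag-of-words counts as features, and squared Euclidean distance, and it generalizes the positive/negative case to any set of labels. The function names and the toy training data are hypothetical.

```python
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (assumption: lowercase, whitespace tokens)."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Average a list of sparse count vectors into one sparse centroid."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    n = len(vectors)
    return {term: count / n for term, count in total.items()}


def squared_distance(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    terms = set(a) | set(b)
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms)


def train(labeled_docs):
    """Build one centroid per class from (text, label) pairs."""
    by_class = {}
    for text, label in labeled_docs:
        by_class.setdefault(label, []).append(vectorize(text))
    return {label: centroid(vecs) for label, vecs in by_class.items()}


def classify(centroids, text):
    """Return the label whose centroid is closest to the document."""
    vec = vectorize(text)
    return min(centroids, key=lambda label: squared_distance(vec, centroids[label]))


# Hypothetical toy data, for illustration only.
training = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project meeting notes attached", "ham"),
]
centroids = train(training)
```

Calling `classify(centroids, "buy cheap pills")` picks the class whose averaged vector lies nearest to the new document. With two classes and Euclidean distance, the set of points equidistant from both centroids is a hyperplane, which is why the decision boundary is linear.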

Text Classification 5: Learning to Rank

How can a search engine combine PageRank, BM25 and all the other relevance indicators? By leveraging the user clicks in a learning-to-rank (LeToR) framework.

From playlist Text Classification

This is CS50

From playlist CS50 Sections 2015

SYN109 - Word Stores

This E-lecture first draws a distinction between dictionaries and lexicons and then discusses the role of the lexicon in linguistics. It shows how lexical entries are specified linguistically.

From playlist VLC206 - Morphology and Syntax

Writing a Formal Business Letter

In this video, you’ll learn more about writing a formal business letter. Visit for our text-based lesson. This video includes information on: • The format and structure of business letters • Uses

From playlist Communication in the Workplace

Visual Document Understanding with Multi-Modal Image & Text Mining in Spark OCR 3 | Webinar

Spark NLP and Spark OCR Free Trials are available here: The Transformer architecture in NLP has truly changed the way we analyze text. NLP models are great at processing digital text, but many real-world applications use documents with more

From playlist AI & NLP Webinars

Digging into Data: Supervised Classification with Logistic Regression and Naive Bayes

Our first lecture on classification, where we cover two linear methods.

From playlist Digging into Data

Text classification and named entity recognition with BertForTokenClassification

Get Started with Spark NLP for free: Recognizing entities is a fundamental step towards understanding unstructured data in documents. Spark NLP includes state-of-the-art BERT-based models for token classification and sequence classification. This s

From playlist AI & NLP Webinars

Classification Big Picture and Evaluation

Video Lecture from the course CMSC 723: Computational Linguistics Full course information here:

From playlist Computational Linguistics I

R & Python - Classification Part 1

Lecturer: Dr. Erin M. Buchanan Summer 2020 This video is part of my Natural Language Processing course. This video explores the basic concepts of classification with a focus on text data using word2vec, bag of words, tfidf as feature extraction a

From playlist Natural Language Processing

R & Python - Classification Part 1 (2022)

Lecturer: Dr. Erin M. Buchanan Spring 2022 This video is part of my Natural Language Processing course. This video explores the basic concepts of classification with a focus on text data using word2vec, bag of words, tfidf as feature extraction an

From playlist Natural Language Processing

R & Python - Classification Part 1 (2021)

Lecturer: Dr. Erin M. Buchanan Spring 2021 This update includes a few changes to the lecture material to help clarify what is necessary for classification (i.e., taking out some confusing parts from scikit-learn). This video is part of my Natural

From playlist Natural Language Processing

HTML and CSS - HTML Fundamentals

This is the opening E-Lecture of the HTML part of our "HTML and CSS Basics" class. It looks at the history and the status of HTML and introduces the basic conception of this data format for the web: its basic elements, i.e. tags and attributes, as well as the general structure of HTML documen

From playlist HTML and CSS Basics

Paper Read Aloud: Interactive Refinement of Cross-Lingual Word Embeddings

An experiment! I recorded this a while ago but didn't post it until now because ... 2020. A long time ago, a blind student once asked me to record myself reading my papers when he found that I do that anyway during my editing process, so I finally did it. This is an experiment, feedback

From playlist Papers Read Aloud

Related pages

ID3 algorithm | Rough set | Text mining | Naive Bayes classifier | Concept mining | C4.5 algorithm | Instantaneously trained neural networks | Document retrieval | Decision tree learning | Soft set | Artificial neural network