Applying natural language processing and document classification to text mining RSS feeds in order to classify documents as interesting or not, to an analyst at the company, Alliant

Houtz, Malcolm L., 1963-, author; Bilisoly, Roger, 1963-, thesis advisor.

2011

Online access. The library also has physical copies.

  • Title:
    Applying natural language processing and document classification to text mining RSS feeds in order to classify documents as interesting or not, to an analyst at the company, Alliant
  • Author/Creator: Houtz, Malcolm L., 1963-, author.
  • Bilisoly, Roger, 1963-, thesis advisor; Central Connecticut State University. Department of Mathematical Sciences.
  • Creation Date: 2011
  • Language: English
  • Physical Description: 73 leaves : color illustrations ; 29 cm.
  • Bibliography: Includes bibliographical references (leaves 61-62)
  • Subjects: Data mining; RSS feeds; Natural language processing (Computer science)
  • Description: Really Simple Syndication (RSS) is an internet-based technology that allows website publishers to easily and frequently broadcast relevant content updates to subscribers. Business analysts looking for content relevant to their needs have thousands of RSS feeds to choose from, with each feed potentially syndicating dozens of articles each day. Text mining techniques combined with classic data mining methodology can be used to help predict the relevance of RSS content, saving the analyst valuable time. As the need for text mining applications grows with the overabundance of available data, text mining methodology has grown increasingly complex, incorporating the natural language rules of spoken language. Natural Language Processing (NLP) programs are available in many languages, and some programs are specialized for particular industries, such as finance. This thesis proposes that tuning the natural language rules in IBM's SPSS Modeler software for the interests of the direct marketing service provider Alliant will improve predictions as to whether or not an RSS feed is relevant. Alliant's perspective was chosen for tuning the natural language rules because the author was employed there and data were available. Articles syndicated by a variety of websites related to the marketing and marketing analytics industries were collected for analysis. This corpus served as the basis for developing models predicting whether or not an article was interesting from the perspective of an analyst working at Alliant. After these predictions were made, the NLP rules were studied and modified, resulting in a measurable improvement in model accuracy.
  • Degree Granted: M.S. Central Connecticut State University, 2011.
  • OCLC Number: 798924257