5070u10a1
Group 4
Capella University
DATA ANALYTICS INTERNSHIP: TEXT MINING APPLICATIONS 2
We first approached this problem by creating a corpus, or repository of data, on the topic: the use
of data analytics in quality assurance. The corpus of documents was created by searching the university
library and downloading the subject matter pertinent to this investigation of our topic. A file called the
corpus compilation was used to record this information. This file was imported into SAS Enterprise Miner
after being uploaded to the Tool Wire remote server. A project diagram, library, and data source were
created inside SAS Enterprise Miner. Following the Text Import node, we partitioned the data, parsed the
text, and filtered the results. Additional documents can be added by moving them into the folder and
rerunning the project flow diagram, a step that can be automated inside SAS Enterprise Miner. A
screenshot of the project flow diagram is shown in Figure 1; it displays how the nodes were connected,
which analyses were performed, and the order in which they ran.
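The partition, parse, and filter steps described above can be sketched in plain Python. This is only an illustration of the ideas, not the SAS Enterprise Miner implementation: the stop list, the minimum token length, the 60/40 split, the function names, and the sample documents are all assumptions made for this sketch.

```python
import random
import re
from collections import Counter

# Hypothetical stop list for illustration; the tool used in the project has its own.
STOP_WORDS = {"the", "of", "in", "a", "and", "to", "is", "for", "as"}


def parse(text):
    """Lowercase a document and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def filter_terms(tokens, min_length=3):
    """Drop stop words and tokens shorter than min_length."""
    return [t for t in tokens if t not in STOP_WORDS and len(t) >= min_length]


def partition(docs, train_fraction=0.6, seed=0):
    """Shuffle documents reproducibly and split them into two sets."""
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Made-up sample documents standing in for the corpus.
corpus = [
    "Data analytics in quality assurance for the hospital",
    "Text mining of hospital quality indicators",
    "Analytics and database design in management",
    "Quality assurance and technology management",
    "Nurse and clinician scores as hospital indicators",
]

train, validate = partition(corpus)
term_counts = Counter(t for doc in train for t in filter_terms(parse(doc)))
```

Adding a document to `corpus` and rerunning the script plays the same role as dropping a new file into the folder and rerunning the flow diagram.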
Text mining is the analytic investigation of information contained in natural-language text. The use of
text mining methods to solve business problems is called text analytics. Text
mining has a wide array of applications that often result in unique insights about a person or an
organization. Frequently discussed domains of text mining are sentiment analysis, natural language
processing, and information extraction. According to Search Business Analytics (2017), “Text mining can
help an organization derive potentially valuable business insights from text-based content such as word
documents, email and postings on social media streams like Facebook, Twitter and LinkedIn.” The
volume of unstructured data keeps increasing, and organizations are becoming increasingly reliant on data
analytics to remain competitive. One reason for this increase is that social media is now widely used by
many organizations. Because valuable insights about products, customers, and the marketplace can be
gained from this data, interpreting it is a priority for most organizations.
Search Business Analytics (2017) explains why text mining methods are needed to better understand
unstructured data: “Mining unstructured data with natural language processing (NLP), statistical
modeling and machine learning techniques can be challenging, however, because natural language text
is often inconsistent. It contains ambiguities caused by inconsistent syntax and semantics, including
slang, abbreviations, entities, language specific to vertical industries and age groups, double entendres,
and even sarcasm.” Since understanding the information obtained through textual analysis continually
proves to be quite valuable, we can help offset the difficulties of interpreting its meaning through more
Search Business Analytics (2017) states, “Text analytics software can help by transposing words
and phrases in unstructured data into numerical values which can then be linked with structured data in a
database and analyzed with traditional data mining techniques. With an iterative approach, an
organization can successfully use text analytics to gain insight into content-specific values such as
sentiment, emotion, intensity and relevance. Because text analytics technology is still considered to be an
emerging technology, however, results and depth of analysis can vary wildly from vendor to vendor.”
Therefore, finding the right set of tools for the task at hand is necessary when approaching the business
problem with text analytics software. In the case of our project at Vila Health, we used SAS Enterprise
Miner to accomplish this task. It had an extensive array of text mining features including a built-in
Analytics India (2017) describes the growing scope of business analytics: “The scope in the field of business
analytics is ever expanding and is helping it become mainstream as companies of all sizes and analytics
skill levels get into the big data game. Exploring business analytics needs the right focus, right
technology, right people, right culture and top management commitment. Companies like IBM, Accenture,
and Deloitte are using business analytics tools and coming up with decisions that are useful and
profitable. Business Analytics plays a very important role here as it uses statistics and tools to decode
consumer insights. This is done based on accrued data, and Business Intelligence that garners key
insights that can help predict future behavior, in effect, helping businesses run better. The latest
developments in Business Analytics’ technology are playing a crucial role in automating the analysis
process.”
“Now a day’s most of the information is available in digital form to get the proper data that is a
challenging task. Most of the researchers focused on these problems and come up with the new model to
retrieving the information from the digital system. In this paper, we learn performance of the different
linguistic patterns and statistical scores considered is carefully studied and evaluated in order to design a
method that maximizes the quality of the results. Our proposal is also evaluated for several well
distinguish domain, offering in all cases, reliable taxonomies considering precision and recall along with
F-measure. In this paper, we propose sequential pattern mining based pattern taxonomy relation, which
discover pattern effectively, to achieve the goal we use some state of art data mining method and popular
algorithms for evolution, for the experimental result we use Reuters (RCV1) dataset and the results show
that we improve the discovering pattern as compared to previous text mining methods. The results of the
experiment setup show that the keyword-based methods not give better performance than pattern-based
method. The results also indicate that removal of meaningless patterns not only reduces the cost of
computation but also improves the effectiveness of the system” (Ieeexplore, 2017).
Figure 2 shows the results of the text filter node. Figure 3 displays the documents returned from
the search term: ‘analytics’. Figure 3 also lists a dictionary of terms from the corpus with their frequency,
the number of documents in which they appear, whether or not they are kept, and their term weight. Figure 4
shows a map of concept links for terms related to the search term: ‘analytics’. The term “analytics” is
associated with “multiple”, “database”, “doi”, “individual”, “design”, “management”, “technology” and
especially the term “hospital”. Right-clicking any child term and selecting ‘expand links’ displays the
terms associated with that child term. In this case the term “hospital” was selected, which reveals the
expanded relationships: “indicator”, “nurse”, “clinician”, “score”, and “capability”.
Figure 5 includes a map where the node for hospital is expanded in addition to the original search term.
Concept linking is an important part of exploratory analysis in text mining because it allows an initial
interactive view in preparation for a more procedural solution such as text clustering, probability modeling,
or regression modeling.
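The term table and the concept links described above can be approximated with a short Python sketch. It is an illustration under stated assumptions, not the method SAS Enterprise Miner uses: the weight shown is an inverse document frequency chosen for simplicity, the link strength is a plain count of shared documents, and the function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def term_table(docs):
    """Return {term: (frequency, n_docs, weight)} for tokenized documents.

    The weight is log2(N / n_docs) + 1, an inverse document frequency
    used here for illustration only.
    """
    freq = Counter(t for doc in docs for t in doc)
    n_docs = Counter(t for doc in docs for t in set(doc))
    total = len(docs)
    return {
        term: (freq[term], n_docs[term], math.log2(total / n_docs[term]) + 1)
        for term in freq
    }


def concept_links(docs, term, top=5):
    """Rank other terms by how many documents they share with `term`."""
    shared = Counter()
    for doc in docs:
        unique = set(doc)
        if term in unique:
            shared.update(unique - {term})
    return shared.most_common(top)


# Made-up tokenized documents standing in for the parsed corpus.
docs = [
    ["analytics", "hospital", "management"],
    ["analytics", "hospital", "nurse"],
    ["analytics", "database", "design"],
    ["hospital", "nurse", "clinician"],
]

table = term_table(docs)
links = concept_links(docs, "analytics")
expanded = concept_links(docs, "hospital")  # analogous to 'expand links' on a child term
```

Calling `concept_links` a second time on a child term mirrors the 'expand links' action: each call returns the terms most often found in the same documents as the chosen term.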
Reflecting on this experience of group work, we both felt that we worked exceptionally well
together. Hal did an excellent job of putting together the initial document with a lot of cited material, and
Dante added the content needed to meet the assignment directives, made adjustments to the
initial document, and added sufficient commentary to produce a polished paper. We were able to complete
References
Analytics India. (2017). Scope and Future Trends of Business Analytics. Retrieved from
http://analyticsindiamag.com/scope-and-future-trends-of-business-analytics-in-india/
http://media.capella.edu/CourseMedia/ANLT5070/TextMining/transcript.asp
Ieeexplore. (2017). Electronics and Communication...Operational pattern detection in text mining using
http://ieeexplore.ieee.org/abstract/document/6892780/?reload=true
SAS4084-2015.pdf
Search Business Analytics. (2017). Analytics technologies lend enterprise content management a hand.
mining