ABSTRACT
Spreadsheet applications, and in particular Microsoft Excel,
are now ubiquitous. Even though many large organisations
rely heavily on them for data analysis, management reporting,
and decision making, limited research regarding their potential
impacts on organisational information quality has been published.
This paper aims to bridge that gap in the literature by identifying
key factors inherent to spreadsheet applications as well as
related to their use which may have significant negative effects
on information quality in organisations. The findings presented in
this paper have been identified as a part of a broader ethnographic
study on information quality, which was conducted in a large
telecommunications company over a period of six months. This
paper shows that the diffusion of spreadsheet applications is
driven by reporting limitations inherent in existing transactional
and Business Intelligence (BI) systems. However, while the
use of spreadsheets may often be justified from the operational
perspective, it frequently leads to significant negative effects on
the quality of relevant information.
Keywords: data quality, information quality, spreadsheet
application, ethnography
INTRODUCTION
End-user computing has been defined as the autonomous use
of information technology by knowledge workers outside of the
information systems department [4, p. 115]. As such, spreadsheets
are the most common data analysis and manipulation tools used
by end users in organisations [48]. Spreadsheets are often used
as modelling tools for management decision making
[1], and most contemporary Business Intelligence (BI) tools
allow for integration with Microsoft Excel [42]. What's more,
some organisations even use Excel as their main BI client [7, 22,
42]. For instance, pivot tables are often used for multidimensional
analysis since they provide rollup, drill-down, and slice-and-dice
functionality [7]. According to Gartner Research, the critical
path to virtually every materially significant enterprise financial
statement includes multiple spreadsheets [20, p. 4], and many
financial services companies often use complex spreadsheets to
price a range of financial derivatives [21].
However, in order to maintain the quality of information in
organisational information systems, it is imperative to control
the processes that introduce, modify, and transform relevant
information [11]. Nevertheless, organisations often export
critical information from transactional systems to spreadsheets
and, thus, separate it from source system information integrity
Spring 2011
[Information quality dimensions: Believability, Accuracy, Objectivity, Reputation, Value-Added, Relevancy, Timeliness, Completeness, Amount, Interpretability, Understandability, Consistency, Conciseness, Accessibility, Access Security]
[Figure: spreadsheet-related factors plotted against the information quality dimensions, with Rate of Occurrence (Rare, Occasional, Frequent) and Low/Medium/High scales. Factors shown: (+) Redundant Storage, (+) Spreadsheet Silo, (-) Quality Assurance, (-) Training, (-) Configuration Management, (-) Security Controls, (-) Integrity Controls, (-) Metadata.]
Discussion
Reasons for Spreadsheet Use
This study has identified ad hoc reporting as the main use of
spreadsheet applications. As such, data is rarely directly entered
into spreadsheets; instead it is usually imported from transactional
and Business Intelligence (BI) systems. Even though most of the
transactional and certainly all BI systems provide reporting
functionalities, several reasons why they are not commonly used
have been identified (Figure 1):
1. Lack of capital expenditure (CAPEX) funding.
2. The System Development Life Cycle (SDLC) process
is too complex.
3. The time frame required for development is too long.
Development of new system functionality (such as the
development of new reports) is usually considered as capital
expenditure and, as such, the requesting business unit first requires
the appropriate funding. Any such funding usually has to be
formally requested and accompanied by a relevant business case.
Given that CAPEX budgets are usually periodically requested
and approved, any new reporting requirements would have to
be identified well in advance. Furthermore, even if the required
funding is approved, the SDLC phases that have to be followed
(i.e. analysis, design, implementation, testing) are usually very
complex and time consuming. As a result, it may take more than
one year to operationalise any new reports. However, ad hoc
reports are by definition urgently required and only infrequently
used. Consequently, many analysts and managers often prefer to
export raw data from source systems and analyse it in spreadsheet
applications as required. One of the managers observed:
It takes forever and it costs an absolute fortune to develop
these reports. And, once they are developed they are not
really what you asked for in the first place anyway.
Additionally, conflicting business priorities may also impact
on resourcing required for report development; a business analyst
explained:
We had a guy from Oracle developing some reports in
APEX, but he got moved to another project before he was
able to finish them. So, we don't have any other choice
but to extract the data we need and to analyse it in a
spreadsheet.
what system he's getting it from, but I think it's the right
data.
As a consequence, such data sets are frequently found to be
incomplete as they may not include all relevant information. At the
same time, they frequently include much irrelevant information,
which may not be value adding. Another manager explained:
We only use about five columns out of 20+ we have in the
spreadsheet.
As business experts are usually cautious about deleting data
from such spreadsheets, much of the irrelevant information
is never removed. This may result in a negative impact on the
amount of information found in such spreadsheets. One of the
business experts explained:
I do not usually delete any data from spreadsheets unless
I'm absolutely sure I won't need it in the future.
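One way to reduce the amount of irrelevant information without deleting anything from the original export is to derive a trimmed working copy. The following is a minimal sketch only and is not a method described in this study. It assumes the export is available as CSV text; the function name `project_columns` and the column names are hypothetical.

```python
import csv
import io


def project_columns(csv_text, wanted):
    """Return CSV text that keeps only the `wanted` columns, in that order.

    The original export is left untouched; this builds a separate, smaller copy.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    available = reader.fieldnames or []
    missing = [name for name in wanted if name not in available]
    if missing:
        # Fail loudly rather than silently producing an incomplete data set.
        raise KeyError(f"columns not found in export: {missing}")

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=wanted, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # columns outside `wanted` are ignored
    return out.getvalue()


# Hypothetical export with more columns than the analysis needs.
export = "id,region,amount,notes\n1,North,10,x\n2,South,5,y\n"
trimmed = project_columns(export, ["region", "amount"])
```

Because the raw export is preserved, the cautious behaviour the business expert describes is respected, while the working copy stays concise.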
Another issue is that spreadsheet-based data sets are most
often exported from source systems at the lowest possible level
of granularity. As a result, such raw data may not always be
concisely represented. One of the managers commented:
Redundant Storage
Spreadsheets are also regularly shared between stakeholders
and, thus, they are often redundantly stored. Taking into
consideration that some of the stakeholders are likely to make
changes to the original spreadsheet, multiple versions of the truth
(i.e. inconsistencies) may emerge. Once business experts realise
that spreadsheets are redundantly stored, this may result in a
negative impact on the believability, objectivity, and reputation of
relevant information. A business expert explained:
The problem is that too many people have their own
copies of this spreadsheet. There is no single version of
the truth.
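Whether redundantly stored copies have diverged can be checked mechanically. The sketch below is illustrative only and was not part of the study. It assumes each copy is available as raw bytes, and the function name `group_by_content` and the file names are hypothetical. Byte-level comparison is a deliberately crude choice: two workbook files with identical cell contents can still differ in embedded file metadata, so a cell-level comparison may be needed in practice.

```python
import hashlib
from collections import defaultdict


def group_by_content(copies):
    """Group copy names by the SHA-256 digest of their bytes.

    `copies` maps a name (e.g. a path) to the file's bytes. More than one
    group in the result means the copies are no longer identical.
    """
    groups = defaultdict(list)
    for name, data in copies.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return sorted(sorted(names) for names in groups.values())


# Hypothetical copies held by three stakeholders.
copies = {
    "alice/report.csv": b"a,b\n1,2\n",
    "bob/report.csv": b"a,b\n1,2\n",
    "carol/report.csv": b"a,b\n1,3\n",  # edited locally
}
versions = group_by_content(copies)
diverged = len(versions) > 1
```

Here two groups are found, which corresponds to the "multiple versions of the truth" described above.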
Spreadsheet Silos
While it is usually relatively easy to export data from
organisational information systems into spreadsheets, it is often
much more difficult (frequently impossible) for the data to flow
the other way, i.e. to upload spreadsheets to organisational
information systems. This limitation frequently results in the
creation of spreadsheet silos, which, at the very least, may lead to
problems with the accessibility of relevant information.
Manual Data Analysis/Transformations
Configuration Management
Related to the problem of redundant storage is the issue of
configuration management. Given that updates to spreadsheet
models frequently result in the creation of many different versions
of the same file, version control becomes a key requirement.
However, effective configuration management processes are rarely
implemented and followed. As a result, different versions of the
same model may replicate the same data leading to issues with
the amount of information. At the same time, lack of formalised
configuration management may lead to issues with believability,
objectivity, and reputation of such spreadsheets. One of the
managers explained:
Version control is a nightmare. We've got so many
different versions; everybody has their own. I never know
which one is the most up-to-date one. We do try to put the
date in the file name, but it's not the best solution.
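The date-in-file-name convention mentioned by the manager can be made somewhat more dependable if the date format is fixed. The following minimal sketch is an illustration under stated assumptions, not a practice observed in the study: it assumes an ISO-style `YYYY-MM-DD` stamp somewhere in the file name, and the function name `latest_version` and the file names are hypothetical. It does not replace proper version control, since undated files are simply skipped and the stamp says nothing about what changed.

```python
import re
from datetime import date

# Assumed convention: an ISO date (YYYY-MM-DD) embedded in the file name.
DATE_IN_NAME = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def latest_version(filenames):
    """Return the file name carrying the most recent valid date stamp.

    Names without a valid stamp are ignored; returns None if none qualify.
    """
    dated = []
    for name in filenames:
        match = DATE_IN_NAME.search(name)
        if match is None:
            continue
        try:
            stamp = date(*map(int, match.groups()))
        except ValueError:
            continue  # e.g. 2011-13-40 is not a real date
        dated.append((stamp, name))
    if not dated:
        return None
    return max(dated)[1]


# Hypothetical set of versions of the same model.
files = [
    "model_2010-11-03.xls",
    "model_2011-01-15.xls",
    "model_final.xls",
    "model_2010-12-31.xls",
]
newest = latest_version(files)
```

The undated `model_final.xls` shows the weakness of the convention: nothing in its name indicates whether it is older or newer than the stamped copies.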
Spreadsheet Size Limitations
As already mentioned, the file-based nature of spreadsheets
may negatively impact on accessibility of the relevant information.
For instance, the file size of spreadsheet models and reports can
quickly become very large, so raw data is often deleted, leaving
only the analysed/aggregated information. In such cases, the raw
data may not be easily accessible. A business analyst explained:
We don't keep the old data in our model; we only keep
the aggregated information. Otherwise the file would be
1GB.
If the raw data is not deleted, and if spreadsheet file size
becomes too big, it may not be possible to easily share it within