INSIDE:
> Lexicons Defined
Table of Contents
1. Introduction
2. Healthcare Lexicons
3. Financial Lexicons
4. Profanity Lexicon
5. Lexicon Accuracy
6. Conclusion
7. About ZixCorp
1. INTRODUCTION
ZixCorp services use a set of comprehensive lexicons to scan for sensitive information such as personal
health information or personal financial information in electronic messages. Searches are conducted
by scanning all message subjects, bodies, and attachments for expressions defined within the lexicons.
A lexicon is a file consisting of a comprehensive set of terms, phrases, expressions, and pattern masks
that identify “sensitive” types of information. Sensitive information is defined as information that
could result in liability if disclosed. ZixCorp draws on many sources to generate the lexicon content,
including federal regulations, authoritative reference sources in the subject area, and standard-of-care
practices.
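As a rough illustration of the scanning described above, the following minimal sketch models a lexicon as a list of literal terms plus regular-expression "pattern masks" and applies it to a message's subject, body, and attachment text. The `Lexicon` class, the `scan_message` function, and the sample entries are hypothetical and are not taken from the ZixCorp products; attachments are assumed to be already extracted to plain text.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Lexicon:
    """Hypothetical in-memory lexicon: literal terms plus regex pattern masks."""
    name: str
    terms: list[str] = field(default_factory=list)
    masks: list[str] = field(default_factory=list)

    def find(self, text: str) -> list[str]:
        """Return every term and mask match found in the text."""
        hits = []
        lowered = text.lower()
        for term in self.terms:
            # Whole-word, case-insensitive match for literal terms and phrases.
            if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered):
                hits.append(term)
        for mask in self.masks:
            hits.extend(m.group(0) for m in re.finditer(mask, text))
        return hits


def scan_message(lexicon, subject, body, attachments):
    """Scan subject, body, and each attachment; return hits keyed by message part."""
    parts = {"subject": subject, "body": body}
    for i, text in enumerate(attachments):
        parts[f"attachment_{i}"] = text
    return {part: hits for part, text in parts.items() if (hits := lexicon.find(text))}


# Illustrative entries only; a real lexicon would be far larger.
demo = Lexicon("demo", terms=["diagnosis"], masks=[r"\b\d{3}-\d{2}-\d{4}\b"])
result = scan_message(demo, "Shared patient", "See diagnosis attached", ["SSN 123-45-6789"])
```

Parts with no matches are left out of the result, so an empty dictionary means the message triggered nothing in that lexicon.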
The following is a description of the lexicons that are typically used in ZixCorp services. In addition to
these standard lexicons, custom lexicons can be created if a customer would like to search for specific
terms in its ZixAuditor® data capture or in its production email systems using ZixVPM®.
2. HEALTHCARE LEXICONS
The healthcare lexicons are a set of two lexicons, identifiers and health terms, that work together to
recognize Protected Health Information (PHI). The lexicons search for PHI by taking the intersection of
identifying information, as set forth by the Department of Health and Human Services, combined with
health terms or claims information. This provides the highest level of confidence that the context is
related to PHI. An example of this would be a spreadsheet containing a Social Security number (SSN),
date of service, and diagnosis. The SSN and date of service would constitute an identifier, and the
diagnosis would constitute health information.
To search for potential healthcare content, both of the healthcare lexicons are combined using the
following logic:
The identifiers lexicon looks for identifiers indicating official business communications (such as SSNs,
Subscriber IDs, etc.). This combination of lexicons is referred to within ZixCorp services as the
Healthcare Content Standard definition.
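The intersection described above (an identifier AND a health term in the same message) can be sketched as follows. The pattern and term lists are tiny hypothetical stand-ins for the two healthcare lexicons, chosen to match the sample messages in this section; the function names are assumptions, not ZixCorp APIs.

```python
import re

# Hypothetical, heavily abbreviated stand-ins for the two healthcare lexicons.
IDENTIFIER_PATTERNS = [
    r"\b\d{3}-\d{2}-\d{4}\b",   # SSN written with dashes
    r"\bss#\s*\d{9}\b",         # e.g. "ss# 123456789"
    r"\bMbr Num:\s*\d+\b",      # e.g. "Mbr Num: 123456"
]
HEALTH_TERMS = ["tamoxifen", "cancer", "rehab", "therapy", "diagnosis"]


def has_identifier(text):
    """True if any identifier pattern matches the text."""
    return any(re.search(p, text, re.IGNORECASE) for p in IDENTIFIER_PATTERNS)


def has_health_term(text):
    """True if any health term appears as a whole word."""
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
        for term in HEALTH_TERMS
    )


def is_potential_phi(text):
    """Flag a message only when BOTH lexicons match (the intersection)."""
    return has_identifier(text) and has_health_term(text)
```

Under this sketch, a message that mentions tamoxifen but carries no identifier is not flagged, and neither is a bare Social Security number with no health context.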
When used with well-designed policies, the healthcare lexicons can effectively help companies comply
with HIPAA legislation by securing email communications that contain PHI. The following are several
example messages that would trigger “violations” by the healthcare lexicons. The expressions shown
in bold font indicate terms that would trigger violations.
From: Sue
To: Linda
Subject: RE: Shared patient
Linda,
Here’s the info you requested on patient Jane Doe, ss# 123456789. She sees Dr. A. at
General Hospital. She began tamoxifen approximately 5/15/2005. When he saw her in
2006, he stated that she had been on tamoxifen for a year. Her last visit was 1/14/2006.
No sign of cancer.
From: Sue
To: Linda
Subject: RE: Daily Inpatient Report
General Hospital does have an acute rehab service. Both members are improving
considerably with their therapy. Members are Mr. Smith, Mbr Num: 123456
& Mr. Jones, Mbr Num: 234567. They are on a rehab unit.
From: Sue
To: Linda
Subject: New Physical Therapy Doc
Hi Linda,
I would like for you to clarify a SSN for a provider – we show it as 123-45-6789 and the
paper work is showing 123-45-6788. Please verify this so I can process the provider.
Thanks,
Linda
3. FINANCIAL LEXICONS
Like the healthcare lexicons, the personal financial lexicons are a set of lexicons, in this case three:
financial terms, financial identifiers, and credit card numbers. These lexicon files are designed to work in
combination to recognize personally identifiable financial information as defined by the SEC, FTC, Federal
Reserve, and FDIC in the final rulings of Privacy of Consumer Financial Information. These agencies serve
as the regulatory arm of the Gramm-Leach-Bliley Act (GLBA).
The lexicons work in conjunction to recognize the intersection of financial identifiers (such as SSNs,
account numbers, or loan numbers) AND financial terms (such as “balance transfer,” “refinance” or
“deposit”) or credit card numbers.
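One reading of the logic above is "(financial identifier AND financial term) OR credit card number". The sketch below implements that reading; the grouping, the patterns, and the function name are assumptions made for illustration, and the term list only repeats the three examples given in the text.

```python
import re

# Hypothetical, abbreviated stand-ins for the three financial lexicons.
FIN_IDENTIFIER_PATTERNS = [
    r"\b\d{3}-\d{2}-\d{4}\b",                             # SSN with dashes
    r"\b(?:account|loan)\s+(?:number|#)\s*:?\s*\d{6,}\b",  # account or loan number
]
FIN_TERMS = ["balance transfer", "refinance", "deposit"]
# Sixteen digits, optionally grouped in fours by spaces or dashes.
CARD_PATTERN = r"\b(?:\d{4}[ -]?){3}\d{4}\b"


def is_potential_financial(text):
    """Assumed grouping: (identifier AND term) OR credit card number."""
    has_id = any(re.search(p, text, re.IGNORECASE) for p in FIN_IDENTIFIER_PATTERNS)
    has_term = any(
        re.search(r"\b" + re.escape(t) + r"\b", text, re.IGNORECASE) for t in FIN_TERMS
    )
    has_card = re.search(CARD_PATTERN, text) is not None
    return (has_id and has_term) or has_card
```

With this grouping, a credit card number is enough to flag a message on its own, while an account number or a financial term alone is not.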
When used with well-designed policies, these lexicons can effectively help companies implement
corporate consumer privacy policies (legislated by GLBA) by reducing the disclosure of personally
identifiable financial information via email. Additionally, they can help reduce liability risk for financial
privacy issues such as credit card fraud. Below are several example messages that would trigger violations
by the personal financial lexicons. The expressions shown in bold font indicate terms that would trigger
violations.
From: Sue
To: Linda
Subject: My Account
Sorry for the delay in getting back to you. Here is my credit card account info:
5403 1500 0001 0000 – MasterCard Exp. Date: 06/2007
From: Sue
To: Linda
Subject: Your Account
From: Mike
To: Daniel
Subject: Prepayment Fees
In order to complete the monthly billing, please verify the prepayment fee for the
following accounts:
JOHN DOE 111001111 2,630.00
SUE JONES 222002222 4,250.00
Please respond as soon as possible, so we may complete the billing process.
Thank you for your assistance.
4. PROFANITY LEXICON
The profanity lexicon is designed to recognize profane and obscene language in email messages.
According to Merriam-Webster, profane means “to debase by a wrong, unworthy, or vulgar use.”
Obscene means “marked by violation of accepted language inhibitions and by the use of words
regarded as taboo in polite usage.”
These definitions form the basis on which this lexicon was designed and developed. The terms,
phrases, and patterns selected for this lexicon satisfy a narrow set of selection criteria. These are:
Additionally, ...
6. some idiomatic misspellings are included
7. some foreign language terms (notably from Yiddish) are included
Examples of messages containing profanity are not included, for obvious reasons.
5. LEXICON ACCURACY
ZixCorp goes to great lengths to ensure that lexicons are accurate and precise. This is accomplished
through comprehensive definition and design of the lexicons, coupled with exhaustive manual analysis
to ensure that the lexicon results agree with the judgment of the lexicon designers. The following
example provides a high-level overview of the design process and validation of the healthcare lexicons:
As with all automated analysis tools, there will be a certain percentage of false positives and false
negatives. With each new release of the ZixCorp lexicons, the accuracy improves, minimizing the
occurrence of false readings. The accuracy of a lexicon is calculated using the following formula:
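The formula itself does not appear in this copy of the paper. A standard definition of accuracy, offered here as an assumption that agrees with the false negative and false positive rates defined later in this section, is the share of all messages that are classified correctly. Here TP and TN are the sensitive and non-sensitive messages classified correctly, and FP and FN are the false positives and false negatives:

```latex
\[
\text{Accuracy}
  = \frac{\text{Correctly Classified Messages}}{\text{All Messages}}
  = \frac{TP + TN}{TP + TN + FP + FN}
\]
```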
The current healthcare lexicons have an average accuracy rate greater than 99%. This means that, without
any customization, they correctly classify more than 99% of messages as either containing or not
containing sensitive health information.
False Negatives:
A message is classified as a false negative when it contains sensitive information and a lexicon fails to
flag it. To minimize false negatives, the ZixCorp lexicons make use of wildcard characters, masks and
other mechanisms to catch multiple forms of various terms. For example, the identifier lexicon that is
used in searching for PHI uses several masks to search for various combinations of nine digits that might
represent a Social Security number.
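The idea of using several masks for one identifier can be sketched with regular expressions. The three masks below (nine plain digits, dash-separated, and space-separated) are assumed examples; the actual masks in the identifier lexicon are not listed in this paper.

```python
import re

# Assumed example masks for nine digits that might be a Social Security number.
SSN_MASKS = [
    r"(?<!\d)\d{9}(?!\d)",              # 123456789
    r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)",  # 123-45-6789
    r"(?<!\d)\d{3} \d{2} \d{4}(?!\d)",  # 123 45 6789
]


def find_ssn_candidates(text):
    """Return every substring matching any mask, grouped in mask order."""
    found = []
    for mask in SSN_MASKS:
        found.extend(re.findall(mask, text))
    return found


candidates = find_ssn_candidates(
    "ss# 123456789 or 123-45-6789 or 123 45 6789, not 1234567890"
)
# candidates == ["123456789", "123-45-6789", "123 45 6789"]
```

The lookarounds keep a mask from matching inside a longer run of digits, which is why the ten-digit number in the example is ignored.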
False negatives can occur when organizations have terms or codes that are unique to their operation. For
example, if a healthcare provider has unique patient identification number formats or an insurance
company has unique subscriber ID numbers, the healthcare lexicons in their standard form may not flag
them. The lexicons are designed to be flexible, however, so these unique identifiers (and other types of
terms) can be added to the lexicons used by ZixVPM, or added before data is analyzed by ZixAuditor.
This type of minor customization has been shown to reduce the false negative rate for PHI to almost 0%.
The false negative rate is calculated using the following formula:
False Negative Rate = (False Negatives) / (All Sensitive Messages)
False Positives:
A false positive occurs when a message is flagged as containing sensitive information when in fact it
does not. The ZixCorp lexicons have been designed to err on the side of prudent practice and liability
protection, which reduces false negatives. Because of this, false positives are more likely to occur.
Through the ongoing validation process described above, the false positive rate of the healthcare
lexicons (in the standard definition) has been shown to be less than 1%.
False positives can occur for several reasons. The following list shows situations that could potentially
cause false positives with the healthcare lexicons:
The ZixCorp lexicons use a variety of mechanisms to reduce the rate of false positives. For example,
lexicon entries can be combined with exclusion lists (so that terms are ignored when they appear close
to other specific terms) or inclusion lists (so that terms are only considered when found close to other
specific terms). These mechanisms allow lexicons to search not only for content, but for content used
within a specific context. The false positive rate is calculated using the following formula:
False Positive Rate = (False Positives) / (All Non-Sensitive Messages)
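The exclusion and inclusion lists described above can be sketched as a proximity check on tokens. This is a minimal illustration, not the ZixCorp implementation: the function names, the five-token window, and the sample terms are all assumptions, and the sketch handles single-word terms only.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def term_in_context(text, term, include=(), exclude=(), window=5):
    """Return True if `term` occurs in a context that should be flagged.

    exclude: the occurrence is ignored if any of these words is within
             `window` tokens of it.
    include: if non-empty, the occurrence counts only when at least one of
             these words is within `window` tokens of it.
    """
    tokens = tokenize(text)
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        nearby = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
        if exclude and nearby & set(exclude):
            continue  # excluded context, look at the next occurrence
        if include and not (nearby & set(include)):
            continue  # required context is missing
        return True
    return False
```

For example, with `exclude=("battery",)` the term "discharge" is flagged in "patient discharge summary" but not in "battery discharge rate", and with `include=("biopsy",)` the term "positive" is flagged in "biopsy came back positive" but not in "positive feedback from the client".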
6. CONCLUSION
It is important to remember that the ZixCorp lexicons do not function in isolation. The lexicons are
designed to work within ZixAuditor and ZixVPM, combined with well-designed policy rules and
actions. Used this way, the lexicons serve as a fundamental part of the powerful content scanning
capabilities within the ZixCorp products.
ZixCorp Professional Services can help customers develop custom lexicons (for ZixAuditor searching),
deploy custom lexicons (for ZixVPM), or design effective ZixVPM policies that can best implement their
corporate email policies.
7. ABOUT ZIXCORP
www.zixcorp.com
Zix Corporation (ZixCorp®) is the leading provider of hosted email encryption and e-prescribing services.
ZixCorp's email encryption services provide an easy and cost-effective way to ensure customer privacy
and regulatory compliance for corporate email. Its e-prescribing service reduces costs and improves
patient care by automating the prescription process between payers, doctors, and pharmacies.
ZixAuditor®
A non-intrusive email assessment service that enables organizations to identify email security
vulnerabilities and implement more effective policies and procedures to achieve higher levels of
protection and compliance.
ZixPort®
A Web-based secure e-messaging portal that provides enterprises with private, secure, and branded
communication capabilities while minimizing the impact to existing IT, Web, or security infrastructures.
ZixMail®
An easy-to-use, point-to-point desktop service that enables users to encrypt, decrypt, and send private
emails and attachments to anyone.
e-Prescribing
PocketScript®
An e-prescribing application that applies the benefits of e-messaging by enabling healthcare providers
to write and transmit prescriptions electronically from anywhere directly to the pharmacy.
For more information on ZixCorp’s products and services, contact ZixCorp at 866-257-4949
or email sales@zixcorp.com.
© 2006 Zix Corporation. All rights reserved. Zix Corporation cannot be responsible for errors in typography.
All company, brand and product names are trademarks and/or registered trademarks of their respective owners. www.zixcorp.com LEXICONWP5206