27 Years and 81 Million Opportunities Later:
Investigating the Use of Email Encryption for an Entire University

Publication #

This is the companion website for our paper “27 Years and 81 Million Opportunities Later: Investigating the Use of Email Encryption for an Entire University” which will appear at IEEE S&P'22.

We measured the use of email encryption based on 27 years of data for 37,089 users at a large university. While attending to ethical and data privacy concerns, we were able to analyze the use of S/MIME and PGP in 81,612,595 emails.

We found that only 5.46% of all users ever used S/MIME or PGP. As a result, only 0.06% of all emails were encrypted and 2.8% were signed. We identified S/MIME as the more widely used solution and found that using multiple email clients had a negative impact on signing and encrypting emails.

Overall, the adoption of email encryption was very low, and key management tasks posed a challenge for S/MIME and PGP users.

For any further questions, feel free to check out the sections below or write us an informal email:

Contact: stransky@sec.uni-hannover.de


27 Years and 81 Million Opportunities Later:
Investigating the Use of Email Encryption for an Entire University

Christian Stransky, Oliver Wiese, Volker Roth, Yasemin Acar, and Sascha Fahl.
43rd IEEE Symposium on Security and Privacy (S&P'22), May 22-26, 2022.

Abstract

Email is one of the main communication tools and has seen significant adoption in the past decades. However, emails are sent in plain text by default and allow attackers easy access. Users can protect their emails by encrypting them using tools such as PGP or S/MIME.

Although PGP was introduced as early as 1991, it is a common belief that email encryption is a niche tool that has not seen widespread adoption to date. Previous work identified ample usability issues with email encryption, such as key management and user interface challenges, which contribute to the limited success of email encryption.

However, so far an investigation of email encryption adoption for a larger population over a long time is missing in the literature. Towards filling this gap, we measure the use of email encryption based on 27 years of data for 37,089 users at a large university. While attending to ethical and data privacy concerns, we were able to investigate the use of PGP and S/MIME in 81,612,595 emails.

We found that only 5.46% of all users ever used PGP or S/MIME. This led to 0.06% encrypted and 2.8% signed emails. Users were more likely to use S/MIME than PGP by a factor of six. We saw that using multiple email clients had a negative impact on signing and encrypting emails and that only 3.36% of all emails between S/MIME users who previously exchanged certificates were encrypted on average.

Our results imply that the adoption of email encryption is indeed very low and that key management challenges have a negative impact even for users who have previously set up PGP or S/MIME.

Downloads
| Filename | Type |
|---|---|
| Preprint | pdf |
| Replication Package | zip |

Overview #

While email is used for all kinds of information including the most sensitive kinds such as trade secrets, account credentials, and health data, regular email is not encrypted and allows network attackers and service providers unauthorized access. This is not for a lack of tools. Both S/MIME and PGP were introduced almost 30 years ago with the goal to provide end-to-end encryption for email. However, in contrast to modern messaging tools such as Signal that implement end-to-end encryption by default, S/MIME and PGP require a complex manual setup by users. Consequently, previous work has shown that using email encryption correctly and securely is challenging for many users (cf. 1, 2, 3, 4, 5, 6). They struggle with setting up and configuring encryption keys, distributing them, managing keys on multiple devices, and revoking them. These findings, already anticipated by Davis, are corroborated by public reports of failed PGP use. For example, it took Edward Snowden and the journalist Glenn Greenwald a few months and serious effort to set up PGP for email in order to communicate securely.

Hence, it is commonly believed in the security community that end-to-end encrypted email is not widely used, mostly because of the usability and awareness issues identified in a multitude of user studies in the past 22 years (cf. 1, 2, 3, 4, 5, 6). To the best of our knowledge, our work is the first scientific collection and evaluation of ground truth on the adoption of end-to-end email encryption.

Our work was mainly motivated as follows:

  • Ground Truth: We aim to confirm the security community’s anecdotal knowledge about the low adoption of end-to-end email encryption and provide ground truth based on field data. Our longitudinal field data can help motivate future work to improve the adoption of end-to-end encryption for email.
  • Method Extension: We extend the toolbox of the past 22 years of email encryption research, which was initiated by the seminal paper “Why Johnny Can’t Encrypt” at USENIX Security’99 and is mostly based on laboratory experiments and self-reporting studies. Here, we investigate a large dataset including millions of data points from thousands of users and years of their email data.
  • Validate Results from Previous Work and Investigate Underexplored Challenges: We confirm findings from previous work obtained by other methods including smaller-scale interviews, surveys, and controlled experiments. Additionally, we investigate further challenges that require large-scale field data.

With this work we make the following contributions:

  • Data Collection Pipeline: We developed and tested a reproducible and privacy-friendly data collection pipeline in collaboration with our data protection officer, the university staff council, and the technical staff of the university IT department. This pipeline makes it possible to analyze a large amount of email with a focus on S/MIME and PGP usage and to reproduce our results on different populations.
  • Adoption of Email Encryption at a Large German University: We provide a detailed evaluation of the adoption of email encryption at our university in the past 27 years.
  • Use of S/MIME and PGP: We provide a detailed overview of the algorithms and key sizes we encountered in our dataset.
  • User Interaction Challenges including Key Management: We investigated user interaction challenges that previous work has identified, focusing on key management issues, multi-device use, and key rollovers.

Data Collection Pipeline #

Our data collection pipeline was designed with privacy in mind and split into two parts. The university IT staff responsible for the mail servers executes the part that pseudonymizes the emails, which ensures that the researchers never gain access to raw mail metadata. At no point did we collect email subject or body information, to avoid the disclosure of personally identifiable information (PII) to the researchers. We also ensured that the metadata included neither email account names nor the departments' names or subdomains. We aimed to keep the number of processing errors low and consistently tested the pipeline with our own mailboxes until no further processing errors occurred.

Figure 1 provides an overview of the nine-step processing pipeline:

  • We started with a local testing environment on a small sample mailbox created specifically for our study (1).
  • The technical staff reviewed the initial pipeline and iteratively tested it against the full set of mailboxes of the researchers and their own mailboxes (2).
  • We exported the mailboxes to JSON-formatted files (3).
  • We parsed and pseudonymized all emails (4).
  • We performed assertion checks on every email to ensure that neither the email address nor the domain was present in any result field, to account for email clients writing private data to unexpected places (5).
  • If the assertion check succeeded, we stored the resulting email metadata for further analysis on a secure server in the university’s computing center (6a).
  • If the assertion check failed, we dropped all metadata of that email to avoid the leakage of private information (6b).
  • The IT department’s technical staff transferred the pseudonymized results to the authors' secure cloud storage (7).
  • We analyzed the pseudonymized results (8).
Figure 1: Illustration of our data collection and evaluation pipeline.
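
Steps (5), (6a), and (6b) can be sketched roughly as follows. This is a minimal illustration and not the pipeline from the replication package: the function names, the record layout, and the example address and domain are all hypothetical. It only shows the idea of scanning every field of an already pseudonymized record for forbidden strings and dropping the whole record if one is found.

```python
from typing import Any, Iterator, Optional, Sequence

# Hypothetical forbidden strings: the mailbox owner's address and the university domain.
FORBIDDEN = ("alice@uni.example", "uni.example")


def iter_strings(obj: Any) -> Iterator[str]:
    """Yield every string in a nested dict/list structure, including dict keys."""
    if isinstance(obj, str):
        yield obj
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield from iter_strings(key)
            yield from iter_strings(value)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            yield from iter_strings(item)


def passes_assertion_check(record: dict, forbidden: Sequence[str]) -> bool:
    """Return True if no forbidden string occurs in any field (case-insensitive)."""
    lowered = [text.lower() for text in iter_strings(record)]
    return not any(term.lower() in text for term in forbidden for text in lowered)


def store_or_drop(record: dict, forbidden: Sequence[str]) -> Optional[dict]:
    """Step 6a: return the record for storage. Step 6b: return None to drop it."""
    return record if passes_assertion_check(record, forbidden) else None


if __name__ == "__main__":
    clean = {"sender": "3f2a9c", "receivers": ["9b1e07"], "client": "Thunderbird 78.0"}
    # A client header stored as a raw value may contain private data unexpectedly.
    leaky = dict(clean, client="CustomMailer (sent by alice@uni.example)")
    print(store_or_drop(clean, FORBIDDEN) is not None)
    print(store_or_drop(leaky, FORBIDDEN) is None)
```

Dropping the entire record on a failed check, rather than trying to redact the offending field, keeps the check simple and errs on the side of privacy.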

Pseudonymization and Ethics #

We collected metadata for every email and pseudonymized it to protect the privacy of users. Pseudonymization was generally done by hashing with SHA-256 using a secret salt that stays on the mail servers where the pseudonymization takes place. We did not store exact dates, only the corresponding week. We also categorized users into one of the following groups: Student, Staff, Faculty, NX Unknown, and External. The following tables give an overview of the collected data and the applied pseudonymization.

Pseudonymization Details

| Header-Data | Format |
|---|---|
| Message ID | SHA-256 with salt |
| User | SHA-256 with salt |
| User group | Categorized |
| Sender | SHA-256 with salt |
| Receivers | SHA-256 with salt |
| CC list | SHA-256 with salt |
| BCC list | SHA-256 with salt |
| Date | Bracketed into week |
| Client | Raw value |
| Folder | Categorized |

| S/MIME-Data | Format |
|---|---|
| Serial Number | SHA-256 with salt |
| Not Valid Before and After | Bracketed into weeks |
| User group | Categorized |
| Issuer Info | Raw value |
| Signature Algo | Raw value |
| Key size | Raw value |
| Key type | Raw value |

We applied pseudonymization for S/MIME to leaf certificates.

| PGP-Data | Format |
|---|---|
| KeyID | SHA-256 with salt |
| Creation date | Bracketed into weeks |
| Expiration date | Bracketed into weeks |
| Type of key | Raw value |
| Length | Raw value |
| Key Algo | Raw value |
| Digest Algo | Raw value |

We applied pseudonymization for PGP to both primary and sub key information.
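
The two main transformations in the tables above, "SHA-256 with salt" and "Bracketed into week", can be illustrated with a short sketch. It rests on several assumptions: the source does not say how the salt is combined with the value (simple prefixing is assumed here), the ISO week label format is our choice, and the salt value, function names, and header field names are hypothetical.

```python
import hashlib
from datetime import date

# Hypothetical salt; in the study the real salt stays on the mail servers.
SECRET_SALT = b"hypothetical-secret-salt"


def salted_sha256(value: str, salt: bytes = SECRET_SALT) -> str:
    """Pseudonymize a value as SHA-256(salt || value). Prefixing the salt is an assumption."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


def bracket_into_week(day: date) -> str:
    """Reduce an exact date to its ISO year and week, e.g. '2020-W21'."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


def pseudonymize_header(header: dict) -> dict:
    """Apply the per-field formats to a (hypothetical) parsed header dict."""
    return {
        "message_id": salted_sha256(header["message_id"]),
        "sender": salted_sha256(header["sender"]),
        "receivers": [salted_sha256(r) for r in header["receivers"]],
        "date": bracket_into_week(header["date"]),
        "client": header["client"],  # kept as raw value
    }


if __name__ == "__main__":
    example = {
        "message_id": "<1234@mail.uni.example>",
        "sender": "alice@uni.example",
        "receivers": ["bob@uni.example"],
        "date": date(2020, 5, 20),
        "client": "Thunderbird 78.0",
    }
    print(pseudonymize_header(example))
```

Because the same salt is used for every value, identical addresses map to identical pseudonyms, so communication patterns between accounts remain analyzable without revealing the addresses themselves.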

Ethical Concerns and Data Privacy

Our institutions, and specifically the institution where the data was collected and evaluated, did not require a formal ethical review for this type of large-scale measurement study on email data. Therefore, we did not involve an ethics review committee. However, we followed our institutions' guidelines for good scientific practice, which include ethical guidelines. Here, the institutions specifically place the burden of determining whether research is ethical on the respective researchers. We held intensive discussions within and outside our research team to identify possible concerns with this research project and to determine whether the project would be feasible. We concluded that in addition to following laws and our institutions’ ethics requirements, we should also follow the de facto ethics standards of the S&P community. The data used in this study can be described as pseudonymized data derived from human subjects, as mentioned in the Call for Papers.

In addition to ethics, we made sure to address all legal aspects of our research to adhere to strict German privacy protection laws and the European General Data Protection Regulation (GDPR). Therefore, we involved the data protection officer and the works committee of the institution where the data was collected and evaluated, as required by the German data protection regulations. We developed the data collection plan jointly with the data protection officer, with the goal to protect users’ privacy and adhere to the strict German data protection regulations and the regulations in the state of Lower Saxony. After more than a year of multiple discussions and hearings, we agreed on the presented data collection plan (cf. details in Section IV-A in the paper). After the involved authorities had rigorously assessed the legal situation based on the GDPR, German data protection laws, and state law of the involved authorities, we were allowed to analyze pseudonymized metadata of all users at our institution without requiring user consent. Additionally, our legal counsel decreed that the benefit of our research to society outweighed the risks to individuals. We concur with the assessment that answering our research questions is beneficial for future end-to-end encryption research, which ultimately benefits society, and that there was no harm done to any participants based on possibly re-identifiable metadata. Importantly, we will not publish the metadata we collected and only an encrypted copy will be stored at the university data center for ten years without access by researchers to follow good scientific practice. As is common in research, we only publish aggregate data, and no email accounts can be re-identified through the publication of this paper. As part of the joint development of our data collection plan, we decided to take the following measures to protect users' (metadata) privacy and adhere to the GDPR, German, and state laws:

  • The involved researchers never had access to raw data. The data collection pipeline was executed by the university’s IT staff, who operate the email servers and have access to the backup data. They transferred the pseudonymized data to a secure server for the researchers.
  • We reduced the amount of data to the absolute minimum we required to investigate our research questions.
  • We used cryptographically secure hash functions with salts unavailable to the researchers to pseudonymize user data.
  • At all times pseudonymized data (cf. Pseudonymization) was only stored on secured university servers.
  • We did not and will not share any data other than the aggregate numbers in the paper with anyone outside the team of involved researchers.
  • We assured the data protection officer and the works committee that we would not take any actions to de-identify users.

Results #

Below we give a brief overview of our key findings. For details, please refer to our IEEE S&P'22 paper.

In total, we analyzed metadata for 81,612,595 emails from 37,089 email accounts. Overall, the university’s email users exchanged 40,540,140 (49.67%) emails internally. Figure 2 illustrates the use of email at our university in the past 27 years. While we found only 350 emails in 1994, we detected an almost exponential growth and found 17,190,472 emails in 2020. This development reflects the enormous relevance of email as a communication tool and is in line with reports on the global use of email.

Dataset #

Figure 2: Growth of email communication over the years.

  • We saw an exponential growth of the use of email between 1994 and 2020.
  • Only 0.06% of emails were encrypted.
  • 2.8% of emails were signed.
  • S/MIME was more widely used than PGP.

Email Encryption Certificates and Keys #

  • RSA is the most widely used key algorithm.
  • A key size of 2048 bits was used most often with S/MIME.
  • PGP keys used 4096 bits most often, although newer PGP keys used the less secure size of 2048 bits.
  • About one third of the PGP keys did not have an expiration date set.
  • Deutsche Telekom was the root CA for 64.95% of all S/MIME certificates.
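
Statistics like the ones above can be derived from the raw-value fields of the pseudonymized key metadata (key algorithm, key size, expiration date). The following sketch shows one way to tally them. The records are made up purely for illustration and do not come from the study's dataset, and the field and function names are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Made-up example records; NOT data from the study. Field names are hypothetical.
KEYS = [
    {"type": "S/MIME", "algo": "RSA", "size": 2048, "expires": "2023-W10"},
    {"type": "S/MIME", "algo": "RSA", "size": 2048, "expires": "2022-W45"},
    {"type": "S/MIME", "algo": "RSA", "size": 4096, "expires": "2024-W02"},
    {"type": "PGP", "algo": "RSA", "size": 4096, "expires": None},
    {"type": "PGP", "algo": "RSA", "size": 4096, "expires": "2025-W30"},
    {"type": "PGP", "algo": "RSA", "size": 2048, "expires": "2024-W01"},
    {"type": "PGP", "algo": "DSA", "size": 1024, "expires": None},
]


def most_common_value(records: Iterable[dict], field: str, key_type: str):
    """Return the most frequent value of `field` among keys of the given type."""
    counts = Counter(r[field] for r in records if r["type"] == key_type)
    return counts.most_common(1)[0][0]


def share_without_expiration(records: Iterable[dict], key_type: str) -> float:
    """Return the fraction of keys of the given type that have no expiration date."""
    selected = [r for r in records if r["type"] == key_type]
    return sum(r["expires"] is None for r in selected) / len(selected)


if __name__ == "__main__":
    for key_type in ("S/MIME", "PGP"):
        print(key_type, most_common_value(KEYS, "size", key_type))
    print(f"PGP keys without expiration: {share_without_expiration(KEYS, 'PGP'):.0%}")
```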

Email Encryption Users #

  • More than 94% of all active users never used S/MIME or PGP.
  • S/MIME users signed six times more of their emails than PGP users on average.
  • Using two to three different clients decreased the likelihood of signing emails by 51.76%.
  • On average, only 3.36% of all emails between users who had exchanged S/MIME certificates previously were encrypted.
  • Leakage of private keys via email does not seem to pose an issue.

Summary #

In this work, we presented the first analysis of a large corpus of longitudinal email data for thousands of users at a large German university. We were able to confirm common beliefs and results from previous work in the security community: Only a few users used email encryption, and they secured only a small fraction of their emails. We identified key management to be challenging, in particular in the context of multiple clients, key rollovers, and key exchange. Based on our evaluation, we make suggestions for improving email encryption adoption. Overall, we hope our investigation provides a data-driven motivation for future work to improve both the security and usability of email encryption solutions.

Researchers #

Researchers

Christian Stransky | Researcher and PhD Student (Leibniz University Hannover).
Contact: stransky@sec.uni-hannover.de

Oliver Wiese | Researcher (Freie Universität Berlin).
Volker Roth | Professor (Freie Universität Berlin).
Yasemin Acar | Research Group Leader (Max Planck Institute for Security and Privacy) and Research Professor (George Washington University).

Sascha Fahl | Tenured Faculty (CISPA) and Full Professor (Leibniz University Hannover).
Contact: sascha.fahl@cispa.de

Institutions

Leibniz University Hannover

Freie Universität Berlin

Max Planck Institute for Security and Privacy

CISPA Helmholtz-Center for Information Security

Cite This Work #

Feel free to cite this publication as:

@inproceedings{conf/oakland/stransky22,
	author = {Stransky, Christian and Wiese, Oliver and Roth, Volker and Acar, Yasemin and Fahl, Sascha},
	title = {27 Years and 81 Million Opportunities Later: Investigating the Use of Email Encryption for an Entire University},
	booktitle = {To appear in 43rd IEEE Symposium on Security \& Privacy (SP'22)},
	month = {May},
	year = {2022},
	publisher = {IEEE Computer Society},
	volume = {}
}
Stransky et al. "27 Years and 81 Million Opportunities Later: Investigating the Use of Email Encryption for an Entire University" 43rd IEEE Symposium on Security & Privacy. May 2022.

Footnotes


  1. Why Johnny Can’t Encrypt: A Usability Evaluation of PGP 5.0. Alma Whitten and J. D. Tygar. 8th USENIX Security Symposium (USENIX Security 99), 1999.

  2. Johnny 2: A User Test of Key Continuity Management with S/MIME and Outlook Express. Simson L. Garfinkel and Robert C. Miller. In Proceedings of the 2005 Symposium on Usable Privacy and Security (SOUPS '05), 2005.

  3. Why Johnny Still Can’t Encrypt: Evaluating the Usability of Email Encryption Software. Steve Sheng, Levi Broderick, Colleen Alison Koranda and Jeremy J. Hyland. In Proceedings of the 2006 Symposium on Usable Privacy and Security (SOUPS '06), 2006.

  4. Why Johnny Still, Still Can’t Encrypt: Evaluating the Usability of a Modern PGP Client. Scott Ruoti, Jeff Andersen, Daniel Zappala and Kent Seamons. arXiv preprint arXiv:1510.08555v2 [cs.CR], 2016.

  5. Confused Johnny: When Automatic Encryption Leads to Confusion and Mistakes. Scott Ruoti, Nathan Kim, Ben Burgon, Timothy van der Horst and Kent Seamons. In Proceedings of the Ninth Symposium on Usable Privacy and Security (SOUPS '13), 2013.

  6. Private Webmail 2.0: Simple and Easy-to-Use Secure Email. Scott Ruoti, Jeff Andersen, Travis Hendershot, Daniel Zappala and Kent Seamons. arXiv preprint arXiv:1510.08435v5 [cs.CR], 2016.