Let’s Encrypt just announced an infrastructure makeover which means the open certificate authority (CA) is able to re-issue up to 200 million certificates in a 24-hour period, something the service said could be necessary in “some of the worst scenarios.”
The upgrade comes a year after a Certification Authority Authorization (CAA) bug forced Let’s Encrypt to revoke 3 million Transport Layer Security (TLS) certificates on a single day, March 4, potentially leaving the sites behind them insecure or unavailable.
Let’s Encrypt, a free service of the Internet Security Research Group, has secured nearly 250 million websites, toward its goal of “100 percent HTTPS,” the group’s 2020 annual report said.
Protection Against Breaches
Josh Aas, executive director of the Internet Security Research Group, said in a recent blog post about the upgrade that the automated service issues about 2 million certificates every day. But in the event of a wide-scale breach, it could be necessary to replace all active certificates at once.
Aas explained last March’s CAA bug only impacted 2.6 percent of all Let’s Encrypt’s active certificates, and while disruptive, could have been much worse.
“What if that bug had affected all of our certificates?” Aas wrote. “That’s more than 150 million certificates covering more than 240 million domains. What if it had also been a more serious bug, requiring us to revoke and replace all certificates within 24 hours? That’s the kind of worst-case scenario we need to be prepared for.”
The large-scale upgrade was funded by corporate donations from companies including Facebook, Amazon Web Services, Mozilla, GitHub, Red Hat and others, the group explained. The hardware was provided courtesy of Cisco, Thales and Fortinet, they added.
Aas explained that efforts to improve Let’s Encrypt were focused on four specific areas: database performance, internal networking speed, cryptographic signing module (HSM) performance and bandwidth.
Let’s Encrypt Upgraded RAM
The database, he said, is “at the heart of the service we offer.” The Let’s Encrypt database keeps track of all the certificates and accounts and, Aas explained, is “Write-heavy with plenty of reads as well.”
The previous Let’s Encrypt servers couldn’t have handled a massive re-issue in a single day, he said, so they were replaced with new-generation Dell database servers with “dual AMD EPYC 7542 CPUs, 64 physical cores in total,” the announcement said.
“These machines have 2TB of faster RAM. Much faster CPUs and double the memory is great, but the really interesting thing about these machines is that the EPYC CPUs provide 128 PCIe4 lanes each,” Aas explained. “This means we could pack in 24 6.4TB NVME drives for massive I/O performance. There is no viable hardware RAID for NVME, so we’ve switched to ZFS to provide the data protection we need.”
Let’s Encrypt Now Running on 25G Fiber Network
Let’s Encrypt also upgraded its 1G copper network infrastructure.
“We originally looked into upgrading to 10G but learned that upgrading to 25G fiber wasn’t much more expensive,” Aas said. “Cisco ended up generously donating most of the switches and equipment we needed for this upgrade, and after replacing a lot of server network interface cards, Let’s Encrypt is now running on a 25G fiber network!”
Let’s Encrypt Gets HSM Cryptographic Capacity
To re-issue 200 million certificates in a single day, Let’s Encrypt would need the pair of Luna Hardware Security Modules (HSMs) at each data center to perform at least 600 million cryptographic signing operations in 24 hours. That works out to three signatures for each certificate: an Online Certificate Status Protocol (OCSP) response for the revocation, a certificate signature for the replacement, and an OCSP response signature for the replacement.
Aas said the previous HSMs that Let’s Encrypt was using could only handle about 190 million signatures in 24 hours, max.
“That isn’t enough,” he wrote. Thales donated new HSMs, giving Let’s Encrypt the capacity to process 864 million signing operations per day, just from one data center.
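The arithmetic behind those figures can be checked with a rough back-of-the-envelope sketch. The variable names are illustrative, and the per-second rates are derived here rather than quoted from the announcement; the sketch assumes signing load is spread evenly across the full 24 hours.

```python
# Back-of-the-envelope check of the signing figures quoted in the article.
# Assumption: load is spread evenly over 24 hours (real traffic would be burstier).
SECONDS_PER_DAY = 24 * 60 * 60

certificates_to_replace = 200_000_000
# One OCSP response for the revocation, one certificate signature,
# and one OCSP response for the replacement certificate.
signatures_per_certificate = 3

required_per_day = certificates_to_replace * signatures_per_certificate
old_capacity_per_day = 190_000_000  # previous HSMs, per the article
new_capacity_per_day = 864_000_000  # new HSMs, one data center, per the article


def per_second(per_day: int) -> float:
    """Convert a daily signing total into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


required_rate = per_second(required_per_day)
old_rate = per_second(old_capacity_per_day)
new_rate = per_second(new_capacity_per_day)
headroom = new_capacity_per_day / required_per_day

print(f"required: {required_per_day:,} per day (~{required_rate:,.0f} per second)")
print(f"old HSMs: ~{old_rate:,.0f} per second")
print(f"new HSMs: ~{new_rate:,.0f} per second, {headroom:.2f}x the requirement")
```

Under that even-load assumption, the new figure corresponds to 10,000 signatures per second from one data center, while the old hardware would have covered less than a third of the 600 million operations required.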
Let’s Encrypt Boosts Bandwidth, API
The CA also boosted its bandwidth to increase its ability to sync and analyze its databases across data centers and the cloud, according to the announcement. It is also improving how it notifies clients about early renewal through an extension to its API.
Let’s Encrypt uses the ACME protocol to verify that users control a given domain name and to issue them a certificate. To get a Let’s Encrypt certificate, users need to choose a piece of third-party client software to use.
“In order to get all those certificates replaced, we need an efficient and automated way to notify ACME clients that they should perform early renewal,” Let’s Encrypt’s Aas wrote. “Normally ACME clients renew their certificates when one third of their lifetime is remaining, and don’t contact our servers otherwise. We published a draft extension to ACME last year that describes a way for clients to regularly poll ACME servers to find out about early-renewal events. We plan to polish up that draft, implement and collaborate with clients and large integrators to get it implemented on the client side.”
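The renewal behavior Aas describes can be sketched as a simple decision rule. This is a minimal illustration, not the logic of any particular ACME client: the function name `should_renew` and the `early_renewal_requested` flag are hypothetical, and the flag merely stands in for whatever signal the draft polling extension would return. The draft's actual message format is not modeled here, and the 90-day lifetime in the example is an illustrative value.

```python
from datetime import datetime, timedelta, timezone


def should_renew(
    not_before: datetime,
    not_after: datetime,
    now: datetime,
    early_renewal_requested: bool = False,
) -> bool:
    """Decide whether a client should renew its certificate now.

    Hypothetical helper: `early_renewal_requested` represents a server-side
    signal obtained by polling; how that signal is fetched is not modeled.
    """
    if early_renewal_requested:
        # The CA has asked for early renewal, for example before a mass revocation.
        return True
    lifetime = not_after - not_before
    remaining = not_after - now
    # Default rule from the quote: renew once a third of the lifetime remains.
    return remaining <= lifetime / 3


# Example with an illustrative 90-day certificate.
issued = datetime(2021, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)

day_30 = issued + timedelta(days=30)
day_60 = issued + timedelta(days=60)

print(should_renew(issued, expires, day_30))
print(should_renew(issued, expires, day_60))
print(should_renew(issued, expires, day_30, early_renewal_requested=True))
```

Without a polling signal, a client following only the one-third rule would not contact the server on day 30, which is why a mass replacement within 24 hours depends on some way of notifying clients early.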