Every audio recording contains noise, and most of it is the kind we can hear. But recordings made in the real world near mains electricity often contain another, incredibly important noise at around 50 or 60 Hz: the hum of the power grid. Grids in the United States run at about 60 Hz, while Europe's (and many others) run at about 50 Hz. The exact frequency drifts slightly from second to second, in no particular pattern, as the balance between power supply and demand shifts. Because every grid traces out its own unique fluctuation pattern over time, we can, in theory, match an audio sample containing this hum to the exact time (and general region) it was recorded. We can even check whether a recording is authentic! This technique, electrical network frequency (ENF) analysis, is currently (allegedly) used by only a few government-level actors. Open-source intelligence tooling developers at Bellingcat are working on a public implementation. However, the approach depends on reference ENF datasets, and the ones available to the public are very limited and out of date.
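To make the idea concrete, here is a minimal, illustrative sketch of pulling an ENF trace out of audio. It is not Bellingcat's implementation: it splits the signal into one-second windows and, in each, scans candidate frequencies near 50 Hz for the one with the largest spectral magnitude. It assumes the audio is already a list of mono float samples at a low sample rate (real tools usually filter and downsample first), and the 49.9 to 50.1 Hz band and 1 mHz step are arbitrary choices. The function names are hypothetical.

```python
import math


def dominant_frequency(window, sample_rate, f_lo=49.9, f_hi=50.1, step=0.001):
    """Return the frequency in [f_lo, f_hi] with the largest spectral magnitude.

    Brute-force scan: for each candidate frequency, correlate the window with a
    cosine and a sine at that frequency (one point of the DTFT) and keep the
    candidate with the most energy. Band and step are arbitrary assumptions.
    """
    best_f, best_power = f_lo, -1.0
    n_steps = int(round((f_hi - f_lo) / step))
    for k in range(n_steps + 1):
        f = f_lo + k * step
        omega = 2 * math.pi * f / sample_rate
        re = sum(x * math.cos(omega * n) for n, x in enumerate(window))
        im = sum(x * math.sin(omega * n) for n, x in enumerate(window))
        power = re * re + im * im
        if power > best_power:
            best_f, best_power = f, power
    return best_f


def enf_trace(samples, sample_rate, **kwargs):
    """Estimate one ENF value per full one-second window of `samples`.

    Assumes mono float samples that have already been downsampled; any
    trailing partial window is ignored.
    """
    trace = []
    for start in range(0, len(samples) - sample_rate + 1, sample_rate):
        window = samples[start:start + sample_rate]
        trace.append(dominant_frequency(window, sample_rate, **kwargs))
    return trace
```

The brute-force scan is slow but dependency-free; a production tool would more likely use an FFT library and interpolate between bins.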
This is where a high schooler with too much free time can help. At mainsfrequency.com, there is a widget that shows the current ENF of the European grid:
If this website has constantly updating data, surely I can get it too, right? An up-to-date European ENF dataset would have quite an impact on how useful ENF analysis is. Checking the network tab in the browser's dev tools, we see second-by-second requests to a server at netzfrequenzmessung.de:
What's this c query parameter? It was pretty simple to find in the source code:
Code as found originally, with modified indentation.
Now, I don't speak German, but I do speak Math.round(Math.random()*100000)*31! It seems that c is used to verify that requests are authentic (instead of relying on CORS...?) and to force IE to fetch fresh data instead of using its cache. Reusing the same value too often returns a 429 Too Many Requests error (though the limit seems to reset every so often). From here, I started trying random c-values.
The website's code only ever generates non-negative values (multiples of 31 from 0 to 3,100,000), so I use random negative multiples instead, ranging down to the negative integer limit, to get the data.
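As a quick illustration of the value space, here is the site's formula ported to Python, alongside a hypothetical generator for the negative values described above. It assumes the negative values are also multiples of 31 and uses the 32-bit signed minimum as the "negative integer limit"; both are guesses for illustration, not details taken from the actual scraper.

```python
import math
import random


def site_c():
    """Python port of the site's JS: Math.round(Math.random()*100000)*31.

    JS Math.round rounds halves up, unlike Python's round() (which rounds
    halves to even), so floor(x + 0.5) is used for this non-negative x.
    """
    return math.floor(random.random() * 100000 + 0.5) * 31


# Hypothetical lower bound: the 32-bit signed integer minimum. The limit the
# real scraper uses is not specified, so this value is an assumption.
NEG_LIMIT = -(2 ** 31)


def negative_c():
    """Random negative multiple of 31 (assumed) that is >= NEG_LIMIT."""
    return -31 * random.randint(1, -NEG_LIMIT // 31)
```

Every value site_c() can produce lies between 0 and 3,100,000 in steps of 31, so a negative c can never collide with a value the website itself sends.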
I also rotate user agents (UAs) and generate random IP addresses for the Forwarded header to avoid rate limits, all from a small Python script.
Running the scraper about once a second on Hack Club's Nest (an amazing, free service for high schoolers that I highly recommend) gives fewer than ten seconds of data loss per day!* The following graph looks very "this is the ENF signature of a power grid over one hour, plotted with matplotlib"-y:
* When a rate limit knocks out a single reading (so the surrounding good readings are less than three seconds apart), I fill the skipped value with the average of its two neighbors. Averaging across gaps longer than one second led to unpredictable results, so those gaps stay unfilled. The "fewer than ten seconds of data loss" figure counts both filled and unfilled gaps.
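The footnote's gap-filling rule is simple enough to sketch. This assumes one reading per second held in a Python list, with None marking a reading lost to a rate limit; that representation and the function name are hypothetical (the real scraper writes CSV), but the rule is the one described above: average isolated gaps, leave longer runs alone.

```python
def fill_single_gaps(values):
    """Fill isolated missing readings (None) with the mean of their neighbors.

    Only a gap of exactly one sample with valid readings on both sides is
    filled; runs of two or more None values are left untouched. Neighbors are
    read from the original list, so filled values never feed further fills.
    """
    filled = list(values)
    for i in range(1, len(values) - 1):
        if (
            values[i] is None
            and values[i - 1] is not None
            and values[i + 1] is not None
        ):
            filled[i] = (values[i - 1] + values[i + 1]) / 2
    return filled
```

For example, fill_single_gaps([49.5, None, 50.5]) returns [49.5, 50.0, 50.5], while a two-second gap such as [50.0, None, None, 50.5] is returned unchanged.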
Rendering a full day (it was an accident...) took a pretty long time, but the result looks very believable:
Cool! We now have day-by-day CSV data of the European grid's ENF signature. Every month, I also run a script to compile all of the daily CSV files into a single Parquet file to share with the world (but mostly the people at Bellingcat).
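With a reference dataset in hand, the matching step from the introduction becomes a search problem. Here is a minimal sketch, not Bellingcat's method: slide the recording's per-second ENF trace along the reference and pick the offset with the lowest mean squared error. It assumes both traces are already one value per second (in Hz) and skips reference seconds left as unfilled gaps (None); the function name is hypothetical, and real tools may prefer correlation-based scores that tolerate a constant frequency offset.

```python
def best_offset(recording, reference):
    """Return (offset, mse) for the best alignment of `recording` in `reference`.

    The offset is in seconds from the start of the reference. Reference
    seconds that are None (unfilled gaps) are skipped when scoring. Returns
    None if no offset has any valid overlapping readings.
    """
    n = len(recording)
    if n == 0 or n > len(reference):
        raise ValueError("recording must be non-empty and no longer than reference")
    best = None
    for offset in range(len(reference) - n + 1):
        # Pair each recording second with the reference second it would align to.
        pairs = [
            (r, reference[offset + i])
            for i, r in enumerate(recording)
            if reference[offset + i] is not None
        ]
        if not pairs:
            continue
        mse = sum((r - ref) ** 2 for r, ref in pairs) / len(pairs)
        # Strict comparison keeps the earliest offset among ties.
        if best is None or mse < best[1]:
            best = (offset, mse)
    return best
```

Scanning a full month of one-second data this way is slow in pure Python, but the logic stays the same with a vectorized implementation.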
tl;dr: open-source is AWESOME and breaking things is fun.