Machine Learning vs. Cookie Consent Systems

A new research collaboration between the University of Wisconsin and Google pits machine learning against one of the most notorious annoyances for web users of the past decade: the obfuscation and abuse of GDPR-compliant cookie notices.

Titled CookieEnforcer, the new framework uses semantic text understanding to analyze the meaning and usefulness of the code behind a cookie consent popup or banner, in order to give the user the missing “one-click” solution for disabling all truly “unnecessary” cookies, including those that domain owners may present as “essential” even when they are not.

CookieEnforcer checks the cookie consent code from the website www.askubuntu.com. Source: https://arxiv.org/pdf/2204.04221.pdf

The system is implemented as a user-installed web browser plugin, which can apply user-defined rules with a single click. Once a cookie consent notice appears on a website, the user can activate the plugin, which scans the cookie consent code for possible actions and then generates the JavaScript needed to enact those options on the user’s behalf.

The plugin can be set to enforce user preferences automatically, or to handle notices one at a time, allowing the user to fine-tune settings before final submission.

Enforcing cookie preferences on the fly. If preferred, the Chrome plugin can fully automate this process, without further user input. See later embedded video for more details. Source: https://www.youtube.com/watch?

The challenge of parsing the potential “refusal” options, which are usually buried in obscure and tedious combinations of settings (rather than in the user-friendly “accept all” options typical of consent frameworks), is treated as a sequence-to-sequence task.

In the overall accuracy assessment, CookieEnforcer was able to generate all the steps needed to navigate cryptic cookie consent procedures in 91% of the cases studied, on domains not seen while training the system’s machine learning model. A user study further showed that the system significantly reduces the effort users spend navigating consent modules.

The paper presenting the method is titled CookieEnforcer: Automated Cookie Notice Analysis and Enforcement, and comes from three researchers at the University of Wisconsin at Madison and one from Google Inc.

Dark ways to accept cookies

Since the enactment of the General Data Protection Regulation (GDPR) in 2016 and the California Consumer Privacy Act (CCPA) in 2018, websites that wish to engage users from areas covered by this legislation have been required to provide cookie preference mechanisms (usually triggered by using the user’s IP address as a proxy for their country of origin).

However, since domain owners have long been accustomed to collecting valuable and actionable user data through cookies, whose workings are usually obscure and invisible to users, they have proven reluctant to offer easy opt-outs to their newly empowered users.

The default design of cookie consent interfaces (which appear the first time a user visits a domain, or after the user deletes that domain’s cookies) quickly settled into dark patterns that offer a lopsided choice: either wade through subtle, time-consuming, and exhaustive options in order to withhold consent, or press a simple, accessible button that accepts all the cookies the domain owner would like to turn on. One 2020 study described this culture of labyrinthine interface choices as a “scavenger hunt.”

The new paper comments:

“[Users] may find it difficult to exercise informed control over cookies for websites with complex notices. They are more likely to rely on default configurations than to adjust their cookie settings for each [website]. In many cases, these default settings are privacy-invasive and favor providers, resulting in privacy [risks]”.

A comment on a popular forum post about these practices described them as “malicious compliance”. User inconvenience with cookie consent frameworks is a topic arguably at odds with the interests of major publishers, who would normally give it more coverage if they were not themselves implicated by their own practices in this regard.

A typical maze of options presented, in this case, by the TechCrunch website, ironically as a preamble to an article on the EU’s changing position on what constitutes cookie consent. The URL’s IDs and tracking hooks run to more than 262 characters (omitted here). The Decline All button, although available for certain categories of cookies, is not available for the full set of potential cookies; in those excepted cases, the user must switch off each “toggle” individually.

A 2019 research paper from Germany found that the majority of website visitors in the studied domains were “pushed” toward broad consent, and that only a third of websites actually explained the purposes of their data collection practices.

A number of web browser plug-ins, add-ons and extensions have emerged to address the problem in recent years, such as the Cookie Quick Manager Firefox extension, and a wide range of Chrome alternatives, as the European Union seeks to close compliance loopholes around cookie consent structures.

Method and data

The researchers behind the new paper set out to create a more robust framework for cookie consent management by avoiding dependence on keywords or manually crafted rules, which is the central approach of a number of similar recent ML-assisted projects.

CookieEnforcer has three goals: to translate cookie notices and their interfaces into a machine-readable format; to determine the configuration of the cookie settings in which non-essential cookies are disabled; and to apply those restrictions automatically, without further user input, if the user so desires.
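
As a rough illustration of the first two goals, the sketch below models a notice as a list of controls with their observed default states, then works out which controls must be clicked to reach a desired configuration. The `ConsentOption` type, the selectors, the labels, and the `keep_enabled` argument (which stands in for the decision model's output) are all hypothetical; the paper's actual representation may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsentOption:
    """One control found in a cookie notice (all fields are illustrative)."""
    selector: str   # CSS selector that locates the control
    label: str      # visible text next to the control
    enabled: bool   # state observed when the notice first loads


def plan_clicks(options, keep_enabled):
    """Return the selectors to click so only `keep_enabled` labels stay on.

    `keep_enabled` is a stand-in for the decision model's output; in this
    sketch the caller simply supplies it.
    """
    clicks = []
    for option in options:
        wanted = option.label in keep_enabled
        if option.enabled != wanted:
            # The current state differs from the target, so toggle it once.
            clicks.append(option.selector)
    return clicks


# Hypothetical notice with three categories enabled by default.
notice = [
    ConsentOption("#necessary", "Strictly necessary", True),
    ConsentOption("#analytics", "Performance cookies", True),
    ConsentOption("#ads", "Targeting cookies", True),
    ConsentOption("#social", "Social media cookies", False),
]
clicks = plan_clicks(notice, keep_enabled={"Strictly necessary"})
print(clicks)
```

Only the controls whose default state differs from the target are clicked, so an option that is already off is left untouched.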

The system consists of a back-end component that detects and analyzes cookie notices, and a front-end component, in the form of a browser extension, that generates and carries out the steps needed to disable non-essential cookies (that is, cookies whose blocking will not impede navigation or access to the domain).

The framework is embodied in a locally installed Chrome extension that uses the Selenium web testing library together with ChromeDriver.

The back end features modules for detection and analysis, plus a decision model. The analyzer module takes into account changes in the code introduced by user interaction, so that the raw code dump is not invalidated by the simulated user exploration.

Understanding natural language

With the code exposed, it is important for CookieEnforcer to understand the current state of the potential actions it might take, because the language behind toggle buttons can be ambiguous about whether a setting benefits the end user.

To this end, the researchers trained a Text-To-Text Transfer Transformer (T5) model as the system’s decision component. The T5-Large model, which contains 770 million parameters, was fine-tuned on a custom dataset of input/output pairs (for example, code that describes and enables the functionality of toggle options).
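
Because T5 maps one text string to another, both the notice and the desired actions have to be expressed as plain text. The sketch below shows one way this could look: a notice is flattened into a single input string, and a generated target string is parsed back into a list of element ids to click. The prompt prefix, the separators, and the `click <id>` output grammar are assumptions made for illustration, not the format used in the paper.

```python
def serialize_notice(options):
    """Flatten (element_id, label, state) triples into one input string."""
    parts = [f"{eid}: {label} [{state}]" for eid, label, state in options]
    # The task prefix is a hypothetical choice, in the usual T5 style.
    return "disable non-essential cookies: " + " ; ".join(parts)


def parse_steps(target):
    """Turn a generated string such as 'click 1 | click 2' into element ids."""
    steps = []
    for chunk in target.split("|"):
        action, _, element_id = chunk.strip().partition(" ")
        # Ignore anything that is not a well-formed 'click <number>' step.
        if action == "click" and element_id.isdigit():
            steps.append(int(element_id))
    return steps


# Hypothetical notice: two toggles and a confirmation button.
options = [
    (0, "Strictly necessary", "on"),
    (1, "Performance cookies", "on"),
    (2, "Confirm my choices", "button"),
]
source_text = serialize_notice(options)
steps = parse_steps("click 1 | click 2")
print(source_text)
print(steps)
```

Parsing defensively matters here, since a generative model can emit malformed steps; in this sketch they are simply skipped rather than executed.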

Coordination model (above) and training data (below) for the T5 model. Example data from www.askubuntu.com.

The dataset was generated by sampling 300 websites with cookie notices from the Tranco list of the 50,000 most popular websites. The Detector and Analyzer modules extracted cookie consent options from the runtime source code, and determined their default states.

One of the researchers then manually labeled the series of clicks needed to disable non-essential cookies for each of the websites studied, yielding a full set of 300 annotated domains.

Diversity in source code arrangement across examples from custom dataset.

With 60 websites set aside as a test set, the T5-Large model was trained at a learning rate of 0.003 with a batch size of 16 for 20 epochs, with a maximum input sequence length of 256 tokens and a maximum target sequence length of 64 tokens. The tokens are subwords generated with Google’s SentencePiece tokenizer.
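
To put those figures in perspective, the sketch below collects them in a small configuration object and derives the number of optimizer updates they imply. It assumes one training example per website and that all 240 non-test websites are used for training (no separate validation split); neither assumption is confirmed by the article, and the `FineTuneConfig` name is invented for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters as reported in the article."""
    total_sites: int = 300
    test_sites: int = 60
    learning_rate: float = 0.003
    batch_size: int = 16
    epochs: int = 20
    max_input_tokens: int = 256
    max_target_tokens: int = 64

    @property
    def train_sites(self):
        # Assumption: every site not held out for testing is used to train.
        return self.total_sites - self.test_sites

    @property
    def steps_per_epoch(self):
        # Assumption: one training example per website.
        return math.ceil(self.train_sites / self.batch_size)

    @property
    def total_steps(self):
        return self.steps_per_epoch * self.epochs


config = FineTuneConfig()
print(config.train_sites, config.steps_per_epoch, config.total_steps)
# prints: 240 15 300
```

Under these assumptions the fine-tuning run amounts to only a few hundred updates, which is consistent with adapting a large pretrained model to a small, hand-labeled dataset.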

Finally, the processed information is stored in a local database and made available to the front end of the system. The authors preferred the DOM querySelector() method over the XML Path Language (XPath) approach taken by some previous similar projects, since XPath expressions for cookie notices are vulnerable to DOM updates (for example, the code may change after the initial load in response to user interactions). In this way, elements can be tracked even when they are dynamic and responsive to external factors.
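
A minimal sketch of how stored selectors might be turned into the JavaScript that the front end runs is shown below. The `build_click_script` helper and the example selectors are hypothetical; only `document.querySelector()` and `element.click()` are standard DOM calls. Each selector is resolved when the script executes rather than when the page was first analyzed.

```python
import json


def build_click_script(selectors):
    """Return JavaScript that clicks each CSS selector in order, if present."""
    # json.dumps produces a quoted, escaped array literal that is also
    # valid JavaScript, so selectors containing quotes are handled safely.
    selector_array = json.dumps(list(selectors))
    return (
        "(() => {\n"
        f"  const selectors = {selector_array};\n"
        "  for (const selector of selectors) {\n"
        "    const element = document.querySelector(selector);\n"
        "    if (element) element.click();\n"
        "  }\n"
        "})();"
    )


# Hypothetical selectors, as they might be read from the local database.
script = build_click_script(["#analytics", 'button[aria-label="Save"]'])
print(script)
```

The generated script silently skips selectors that no longer match anything, which is one simple way to tolerate notices that have changed since they were analyzed.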

Test and performance

In practice, CookieEnforcer has proven able to navigate some of the darkest dark patterns in the dataset, such as the option hidden in the cookie consent framework for the new World, which JavaScript conceals until the user explicitly asks to see it.

The authors comment:

Users can easily miss this option as they have to expand an additional window to see it. CookieEnforcer not only finds this option, but also understands the semantics and decides to object. These examples show that the model learns the context and generalizes to new examples.

The authors conducted three tests. The first was a comprehensive evaluation of the framework’s performance across 500 unseen domains (i.e., websites on which CookieEnforcer was not trained), in which the authors reported that non-essential cookies could be disabled on 91% of sites.

The second test was an online user study covering 14 sites, which used the System Usability Scale (SUS) to compare CookieEnforcer against a manual baseline. For this test, the authors reported that CookieEnforcer scored 15% higher than the baseline.
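
For readers unfamiliar with the System Usability Scale, the sketch below applies the standard SUS scoring rule to one ten-item questionnaire: odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5 to give a score from 0 to 100. The two response sets are invented purely to exercise the function and are not data from the study.

```python
def sus_score(responses):
    """Score one 10-item SUS questionnaire (each response is 1 to 5)."""
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("expected ten responses between 1 and 5")
    total = 0
    for index, response in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded.
            total += response - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded.
            total += 5 - response
    return total * 2.5


# Made-up responses for illustration only.
neutral = sus_score([3, 3, 3, 3, 3, 3, 3, 3, 3, 3])
favorable = sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2])
print(neutral, favorable)
# prints: 50.0 75.0
```

In a study, individual scores like these would be averaged per condition before the tool and the manual baseline are compared.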

CookieEnforcer achieves a usability score 15% higher than the manual (unassisted) baseline, while at the same time automating an annoying process.

Finally, the trained CookieEnforcer model was run against the top 5,000 websites in the US and Europe, to measure how well it navigates their cookie notices. The authors state:

While measurements of this magnitude have been made before, CookieEnforcer allows for a deeper understanding of options that go beyond keyword-based inference. In particular, we found that 16.7% of UK websites displaying cookie notifications had at least one non-essential cookie enabled. The same number for websites in the US is 22%.

The authors have released a short YouTube video showing CookieEnforcer at work.

First published on April 12, 2022.
