=============
Typosquatting
=============

The Aura typosquatting protection requires a dataset file that contains a list of python packages and their popularity (number of downloads) in a JSON file. This file can be obtained by querying the Google Big Query service.

.. note::
    Although Google Big Query is a commercial service, Google provides a free tier of 1TB/month of processed data which is more than enough to obtain the data needed for the typosquatting protection for free.


-----------------------
Manual dataset download
-----------------------

To connect to the Big Query service, you must first install the Big Query command-line tool from google-cloud-sdk. Follow the official documentation to install this tool. Alternatively, you can use the online console to run the query and export the JSON results to Google Drive https://console.cloud.google.com/bigquery .

Now run the following query to generate the dataset needed for the typosquatting protection:

::

    SELECT file.project as package_name, count(file.project) as downloads
    FROM `the-psf.pypi.downloads*`
    WHERE
        _TABLE_SUFFIX
        BETWEEN FORMAT_DATE(
          '%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
        AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
    GROUP BY package_name
    ORDER BY downloads DESC


-------------------------
Download dataset via Aura
-------------------------

If you have a google python SDK installed and authentication configured for the python client, you can download the dataset automatically via Aura by running `aura fetch-pypi-stats`. To find out if your python Big Query SDK is correctly configured, run `aura info` and check the output if the BigQuery service integration is enabled.