How to pull a remote MySQL database and remove personally identifiable data

In this post I am going to detail how to sync a remote database to a local developer’s machine without storing any sensitive or personally identifiable data at rest.

Over the last few years there has been an ongoing change in our attitude towards privacy and how we handle user data. More recently the EU has introduced GDPR which has a whole set of rules that you should be aware of when processing data.

Instead of addressing any specific regulatory solutions for developers, I would like to outline a specific data processing solution that you might want to consider when handling user data, and it’s up to you to decide if this can be applied in a way which aids GDPR compliance.

This solution performs the following steps using unix pipes, in a single solution:

  • MySQL dump the database on the remote server
  • Overwrite the local database with the contents of the dump
  • Run custom SQL statements to remove any sensitive or personally identifiable data

Set up

You should already have the following setup:

  • A remote MySQL server
  • A local MySQL server (of the same, or close, enough version)
  • A remote linux server that you can connect to the MySQL server on. This may host the MySQL server directly.

You should also be able to shell into the remote linux server without entering a password, i.e. you are using key based authentication. If you need to setup key based authentication, do this before proceeding.

Once at a shell terminal on the remote linux server you can type mysql <database-name> to connect to the MySQL database. This is likely possible because you have the credentials stored in a ~/.my.cnf file. See the example below:

[client]
user = myusername
password = "mypassword"
host = localhost

Depending on your security model, there may be valid reasons for not storing these credentials in this user’s home directory. You should consult with the owner of the server should you have any security concerns.

MySQL dump the database on the remote server

Let’s try our first test. Assuming that there is already a remote database, you should now be able to run the following to prove that the connection works:

ssh test@example "mysqldump <database-name>" | tail

The last line should look something like this:

-- Dump completed on 2018-05-02 9:05:30

Run custom SQL statements to remove any sensitive or personally identifiable data

The next step is to append the contents of a MySQL file to this output. This file will contain the SQL statements required to remove any sensitive or personally identifiable data.

So assuming that you have the following file locally:

UPDATE people SET firstName = CONCAT('FirstName-', id), lastName = CONCAT('LastName-', id);

The following command should result in two similar lines of output:

{ ssh test@example "mysqldump <database-name>"; \
 cat ./sanitize.sql; } | tail
-- Dump completed on 2018-05-02 9:05:30
UPDATE people SET firstName = CONCAT('FirstName-', id), lastName = CONCAT('LastName-', id);

Alternatively, depending on your workflow, the sanitize.sql file could be stored on the remote server. This will mean that you will want to cat the remote file instead of the local one, you can do this as follows:

ssh test@example "mysqldump <database-name>; \
  cat /path/on/server/sanitize.sql" | \
  tail

Whichever solution you wish to go for, you can now pipe that into your local mysql database:

ssh test@example "mysqldump <database-name>; \
  cat /path/on/server/sanitize.sql" | \
  mysql -u username -p database-name

There is more work that you can do to this to make it more secure and easier to use. Here are a few suggestions:

  • You should be aware that your local database may have query logging enabled. You should consider programmatically enabling/disabling this in your script so that you can be sure that no personal data is hidden away in a log file. Alternatively, you could consider disabling it in permanently in your my.cnf file. Either way, it’s bad to leave this enabled as you could end up with the entire history of this log full of personal/sensitive data.
  • Use a logical operator to cause a cleanup of the local database should the import fail. e.g. <dump sql dump and sql sanitize script> | <local mysql import> || mysql -u username -p "DROP DATABASE db_name". This means that if the import fails part way through, then the whole local database is deleted and no personal data is left on your computer.
  • You could also prefix the example command with a teardown of the existing local database depending on your use case.
  • If your database is not on the same server that you are remotely connecting to, you should also consider ensuring that TLS has been correctly configured to enable encryption of data in transit.

Post a Comment

Your email address is never published nor shared. Required fields are marked *

Ready to talk?

Whether you want to create a new digital product or make an existing one even better, we'd love to talk it through.

Get in touch