SugarCRM Developers

How to perform Data Anonymization for Sugar development

Blog Post created by SugarCRM Developers on May 2, 2016

Post originally written by Emmanuel iNet.

 

Here is another guest post from Emmanuel Dyan from the Elite SugarCRM Partner iNET Process. In it he addresses a common Sugar project requirement using an open source tool developed by iNET Process.

 

The problem that we will try to solve with this post is:

How do we make sure that we are never developing using actual customer data but, at the same time, work with data that reflects reality?

 

Data Anonymization

 

Usually, when we work on a customization project for a customer you have a minimum of 3 different environments: Development,  User Acceptance Testing (UAT), and Production. To make sure that we work in optimal and consistent conditions, we usually copy the database from one environment to another (preferably from production to other environments). Doing this type of manipulation has multiple drawbacks, including:

  • We have to collect a Database Dump which means that it contains unencrypted raw data. What would happen if we mistakenly expose this dump to someone who is unauthorized?
  • We have to test some functionality to make sure that it works. What would happen if we test a Campaign that sends thousand of e-mails ... to the ... actual customers of our customer?

 

Anonymizing the data is the best practice to avoid "playing" with customer data and to keep their trust in us.

 

The challenge with anonymizing data is figuring out how to overwrite the data with something that is completely unrecognizable. For example: "John Doe" will become "voluptatem accusantium". His job title becomes "doloremque" and his country "magnam". His phone number will become "569898520114457878744778" instead of "+123456789".

 

Anonymization Tool

 

Now how do we work in realistic conditions with that kind of anonymization? Indeed, we need another solution that works with any Sugar instance. The solution we are demonstrating below is implemented in our open source CLI tool (sugarcli).

 

The anonymization architecture follows:

  • It uses another independent tool, called "neuralyzer" (by iNet Process) that is a command line tool to anonymize a database.
  • Neuralyzer uses a library called "Faker" to generate realistic data.
  • It is composed of two subcommands:
    • A configuration generator.  This is used to generate a configuration file automatically without destroying the system tables (config, relationships, etc).
    • The anonymizer that connects directly to the Sugar Database to perform the job. It uses iNET Process libsugarcrm to performs SQL queries (via a PDO connection). It also purges deleted records and cleans _cstm tables of deleted records. It finishes by then emptying all _audit tables.

 

Configuration Generator

 

The configuration generator reads all the Sugar database tables and tries to guess, from the field type or the field name what type of fake data needs to be generated. For example a field containing the string city will receive a random city.  If a field contains _user_id then it will be ignored in order to preserve unique IDs.

 

To use it, first download SugarCLI from https://github.com/inetprocess/sugarcli.

 

Then run:

./sugarcli.phar anonymize:config --path <sugarcrm_path>

 

The parameters are the following:

    --file=FILE                   Path to the configuration file [default: "../db/anonymization.yml"]    --ignore-table=IGNORE-TABLE   Table to ignore. Can be repeated (multiple values allowed)    --ignore-field=IGNORE-FIELD   Field to ignore. Can be repeated (multiple values allowed)-p, --path=PATH                   Path to SugarCRM installation.

 

Example

 

The command ./sugarcli.phar anonymize:config creates a file that looks like:

guesser_version: 1.0.0 entities:     accounts:         cols:             name: { method: company }             description: { method: sentence, params: [20] }             facebook: { method: url }             twitter: { method: url }             googleplus: { method: url }             account_type: { method: randomElement, params: [['', Analyst, Competitor, Customer, Integrator]] }             industry: { method: randomElement, params: [['', Apparel, Banking, Biotechnology, Chemicals]] }             annual_revenue: { method: randomNumber, params: [4] }             phone_fax: { method: phoneNumber }             billing_address_street: { method: streetAddress }             billing_address_city: { method: city }             billing_address_state: { method: state }             billing_address_postalcode: { method: postcode }             billing_address_country: { method: country }             rating: { method: sentence, params: [8] }             phone_office: { method: phoneNumber }             phone_alternate: { method: phoneNumber }             website: { method: url }

 

As you can see, the commands uses different methods to guess the type of faker methods to use:

  • If it is a dropdown then it gets the list of values from vardefs
  • If the field contains a known word it uses a pre-defined method (example .*_city = city)
  • Otherwise it will use the dbType (varchar = sentence)

 

You can change the content of the file once generated to match your criteria.

 

Run the anonymization

Warning! Do not run this on a production instance! This command will overwrite data in the target database. 
./sugarcli.phar anonymize:run --path <sugarcrm_path>

 

The parameters are the following:

    --file=FILE        Path to the configuration file [default: "../db/anonymization.yml"]    --force            Run the queries    --remove-deleted   Remove all records with deleted = 1. Won't be launched if --force is not set    --clean-cstm       Clean all records in _cstm that are not in the main table. Won't be launched if --force is not set    --sql              Display the SQL of UPDATE queries    --table=TABLE      Anonymize only that table (repeat for multiple values) (multiple values allowed)-p, --path=PATH        Path to SugarCRM installation.

 

 

 

Example

 

The command ./sugarcli.phar anonymize:run --table=accounts --force gives the following output:

Be careful, the anonymization is going to start That will overwrite every data in the Database !
If you are sure, please type "yes" in uppercase YES
Anonymizing accounts50/50 [============================] 100%
Emptying accounts_auditEmptying bugs_auditEmptying campaigns_auditEmptying cases_auditEmptying contacts_auditEmptying contracts_audit....
Done in 0.42 sec (consuming 40.5Mb)
Now you can dump your database and send it to your development team! If you use the parameters --remove-deleted and --clean-cstm, it will be smaller too.

Outcomes