
6 Resources to Generate Test Data That’s Realistic (2023)

Generating test data can be stressful. I like to produce sample data that contains realistic values, such as professional-sounding names, so that the people involved in the training have a frame of reference and do not get tripped up by the data. You can create realistic datasets with the resources listed below.

There are times when, as a consultant or trainer, you have to produce a demo and need a dataset that has no relation to your client’s data. This can be due to security, confidentiality, or the timing of data availability. Creating these datasets for Power BI portfolio projects can be pretty helpful, and the need for them is usually defined in the testing plan.

I updated this article for 2023 and added the Python section. A lot has changed since 2017, but I am really grateful that most of the sites are still up and serving data!

1. Need a Database Schema?

You often know what schema you wish to generate but need some interesting ideas for enhancing the data model. There was a site, DatabaseAnswers.org, with over 1,500 data models presented as database diagrams. According to the Reddit post “What happened to Database Answers?” in the Database subreddit (reddit.com), the person who maintained the site may have passed away.

Apparently, most of the pages are preserved on Archive.org’s Wayback Machine: List of All Data Models from DatabaseAnswers.org (archive.org). The archive may not be complete, but this site was so helpful to me back in the day that I want to honor its creator here.

Note: There is a mirror of the site at https://fordnox.github.io/databaseanswers/data_models/index.htm and a GitHub project at https://github.com/fordnox/databaseanswers/

Check it out! Thanks.


2. Getting Realistic People’s Names

Most databases require people’s names at some point, such as employees, customers, or sales reps. These can be difficult to get right and believable. A great resource, Random name generator (random-name…), allows you to generate up to one hundred names at a time using various options: male, female, or both, and you can even choose between common and rare names.


3. Getting Sample Product Names

These can be the hardest, but I found a site that lets you generate fantasy object names, Fantasy-Name-Generator.com. This generator will provide ten random names, normally used for relics, artifacts, and other special trinkets, based on real and fictional artifact names. This adds a bit of fun; however, you can filter out any “non-professional” sounding names depending on your audience.
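If you paste the generated names into a script, the filtering step can be automated. The following is a minimal sketch, assuming a hand-picked blocklist; the `BLOCKED_WORDS` set, the `is_professional` helper, and the sample names are all made up for illustration and are not output from the site.

```python
# Hypothetical blocklist; adjust the words to suit your audience.
BLOCKED_WORDS = {"doom", "blood", "skull", "death"}


def is_professional(name):
    """Return True when no word in the name appears in the blocklist."""
    words = {word.strip(",.'\"").lower() for word in name.split()}
    return words.isdisjoint(BLOCKED_WORDS)


# Made-up sample names standing in for the generator's output.
candidate_names = [
    "Orb of Clarity",
    "Skull of Doom",
    "Lantern of Insight",
    "Blood Chalice",
]

# Keep only the names that pass the blocklist check.
product_names = [name for name in candidate_names if is_professional(name)]
print(product_names)
```

A word-level blocklist is deliberately simple; it will not catch every awkward name, so a quick manual review of the result is still worthwhile.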


4. Building Business Names and Addresses

As with people’s names, getting realistic business names can be challenging. Mirthandimages.com allows you to select how many names you need and hit the Randomize button. You can also select an industry group to get a set of names in that sector. The North American Address Generator can provide a list of random addresses to add to the company name dataset. It also provides random phone numbers; however, I usually replace most of the digits with an “x” just in case someone tries to dial them.
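The phone number masking can be scripted rather than done by hand. Here is a small sketch of one way to do it; the `mask_phone` helper is hypothetical, and keeping the first three digits (the area code) visible is my assumption about what "most numbers" means, so change `keep` to taste.

```python
def mask_phone(phone, keep=3):
    """Replace every digit after the first `keep` digits with 'x'.

    Punctuation and spaces are left alone so the number keeps its shape.
    """
    masked = []
    digits_seen = 0
    for char in phone:
        if char.isdigit():
            digits_seen += 1
            masked.append(char if digits_seen <= keep else "x")
        else:
            masked.append(char)
    return "".join(masked)


# Made-up sample number for illustration.
print(mask_phone("(555) 010-4477"))  # prints (555) xxx-xxxx
```

Passing `keep=0` masks every digit, which is the safest choice if the area code is not needed for the demo.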


5. Get GPS Locations For Sample Maps

Location data is easier to source, because you have access to maps and can capture the longitude and latitude of a location to plot it later. Latitude and Longitude Finder on Map Get Coordinates (latlong.net) lets you select a point on a map and provides the longitude and latitude of that point. This is useful when you need actual client sites or points in a specific area.

If you are looking for a group of random locations, the Random Point Generator, pictured below, is a great place to gather them, and it offers a number of options.
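If you would rather script the random locations, Python's standard `random` module is enough for a rough version. This is a minimal sketch, not a replacement for the site above: the `random_points` helper is hypothetical, and the bounding box is a rough, illustrative box around Chicago that I picked for the example.

```python
import random


def random_points(count, south, west, north, east, seed=None):
    """Return `count` (latitude, longitude) pairs inside a bounding box.

    Passing a seed makes the output repeatable between runs.
    """
    rng = random.Random(seed)
    return [
        (round(rng.uniform(south, north), 6), round(rng.uniform(west, east), 6))
        for _ in range(count)
    ]


# Rough, illustrative bounding box around Chicago (an assumption for the demo).
points = random_points(5, south=41.6, west=-88.0, north=42.1, east=-87.5, seed=42)
for latitude, longitude in points:
    print(f"{latitude}, {longitude}")
```

Sampling latitude and longitude independently is fine for a small box like this one. Over very large areas the points would bunch up toward the poles, and a box can also land points in water, so check the results on a map before using them.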

These are resources that I have used to generate realistic data for demos, training, and learning databases. Share any others you have used.

6. How to use Python to Generate Test Data

Test data can be generated in Python with a combination of libraries and techniques that produce realistic but randomized data. The Faker library is a useful tool for generating fake data, including names, addresses, and other common data types.

Python’s built-in random module can handle the random selection of positions and yearly salaries. To illustrate this, we will create a company payroll table with 50 employees and four attributes: “Employee Number,” “Employee Name,” “Position,” and “Yearly Salary”.

Tutorial: Create an Employee Data table in Python.

In the example below, I needed a payroll table for a data analytics company. Each row holds an “Employee Number”, “Employee Name”, “Position”, and “Yearly Salary”, and the table needed to have 50 employees.

First, you need to install the Faker library if you haven’t already done so. You can install it using pip:

pip install Faker

Now, let’s create a Python script to generate the company payroll table:

import random
from faker import Faker

# Faker instance used to generate realistic-looking names
fake = Faker()

# Job titles to pick from at random
positions = [
    "Data Analyst", "Data Scientist", "Data Engineer", "Machine Learning Engineer",
    "Business Analyst", "Database Administrator", "BI Developer", "Data Architect",
]

def generate_employee_record(employee_number):
    """Build one payroll row as a dictionary."""
    employee_name = fake.name()
    position = random.choice(positions)
    # Salary between 50,000 and 120,000, rounded to cents
    yearly_salary = round(random.uniform(50000, 120000), 2)
    return {
        "Employee Number": employee_number,
        "Employee Name": employee_name,
        "Position": position,
        "Yearly Salary": yearly_salary,
    }

def generate_payroll_table(num_employees):
    """Return a list of rows with employee numbers 1..num_employees."""
    payroll_table = []
    for i in range(1, num_employees + 1):
        employee_record = generate_employee_record(i)
        payroll_table.append(employee_record)
    return payroll_table

# Generate 50 employees and print each row
payroll_table = generate_payroll_table(50)
for employee in payroll_table:
    print(employee)

This script creates a company payroll table with 50 employees, each having a unique employee number, a randomly generated employee name, a randomly selected position from the positions list, and a randomized yearly salary between $50,000 and $120,000. The picture below is from VSCode on a Mac.
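Printing the rows is fine for a quick look, but for a demo you will usually want the table in a file. The sketch below writes rows of the same shape to a CSV file with Python's standard `csv` module, which a tool such as Power BI can then import. The `write_payroll_csv` helper, the `payroll.csv` file name, and the two stand-in records are assumptions for illustration; in practice you would pass in the list returned by `generate_payroll_table`.

```python
import csv

# Stand-in rows with the same keys as the payroll script's output.
# The names and salaries are made up for illustration.
sample_records = [
    {"Employee Number": 1, "Employee Name": "Alex Example",
     "Position": "Data Analyst", "Yearly Salary": 68250.40},
    {"Employee Number": 2, "Employee Name": "Sam Sample",
     "Position": "Data Engineer", "Yearly Salary": 97410.15},
]


def write_payroll_csv(records, path):
    """Write payroll rows to a CSV file with a header row."""
    fieldnames = ["Employee Number", "Employee Name", "Position", "Yearly Salary"]
    # newline="" prevents blank lines between rows on Windows.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)


write_payroll_csv(sample_records, "payroll.csv")
```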

Conclusion

It seems many things have changed since I wrote the original version of this article in 2017. There is still a need to create demo data, and data science and machine learning have added the need to generate test data as well. Now, with Python, there is another way to get this done.

How do you generate test data?

Resources

What is Test Data? Test Data Preparation Techniques with Example (softwaretestinghelp.com)

