5MinuteBI Creating Data Power Users 5 Minutes at a time


6 Resources to Generate Test Data That’s Realistic (2023)

By Steve Young. Updated on May 14, 2023

Having to generate test data can be stressful. I like to produce sample data that contains realistic values, such as professional-sounding names, so that the people in a training session have a frame of reference and do not get tripped up by the data. You can create realistic datasets by using the resources listed below.

There are times when, as a consultant or a trainer, you have to produce a demo and need a dataset that has no relation to your client’s data. This can be due to security, confidentiality, or the timing of data availability. Creating these datasets for Power BI portfolio projects can be very helpful, and the need for them is usually defined in the testing plan.

I updated this article for 2023 and added the Python section. A lot has changed since 2017, but I am really grateful that most of the sites are still up and serving data!

In This Article
  • 1. Need a Database Schema?
  • 2. Getting Realistic People's Names
  • 3. Getting Sample Product Names
  • 4. Building Business Names and Addresses
  • 5. Get GPS Locations For Sample Maps
  • 6. How to use Python to Generate Test Data
    • Tutorial: Create an Employee Data table in Python.
  • Conclusion
  • Resources

1. Need a Database Schema?

You often know what schema you wish to generate but need some interesting ideas for enhancing the data model. There was a site, DatabaseAnswers.org, with over 1,500 data models presented as database diagrams. According to the Reddit post “What happened to Database Answers?” (reddit.com), the person who maintained the site may have passed away.

Most of the links appear to be preserved on Archive.org’s Wayback Machine: List of All Data Models from DatabaseAnswers.org (archive.org). It may not be everything, but this site was so helpful to me back in the day that I want to honor its creator here.

Check it out! Thanks.

[Image: ProductSalesSchema]

2. Getting Realistic People’s Names

Many databases require people’s names, such as employees, customers, or sales reps. These can be difficult to make believable. A great resource, Random name generator (random-name-generator.info), lets you generate up to one hundred names at a time with various options: male, female, or both, and common or rare names.
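If you have already collected a list of first and last names from a generator like this, one way to stretch it into a larger set is to combine them at random. The following is only a minimal sketch using Python's standard library: the two name pools and the function name `generate_names` are hypothetical placeholders, not something taken from the site above.

```python
import random

# Hypothetical sample pools; replace them with names you collected yourself.
FIRST_NAMES = ["Alice", "Brian", "Carmen", "David", "Elena", "Frank"]
LAST_NAMES = ["Nguyen", "Patel", "Johnson", "Garcia", "Kim", "Olsen"]


def generate_names(count, seed=None):
    """Return `count` unique 'First Last' names built from the pools above."""
    all_names = [f"{first} {last}" for first in FIRST_NAMES for last in LAST_NAMES]
    if count > len(all_names):
        raise ValueError(f"Only {len(all_names)} unique names are possible")
    # A seeded generator returns the same names on every run, which is handy
    # when a demo has to match printed training material.
    rng = random.Random(seed)
    return rng.sample(all_names, count)


if __name__ == "__main__":
    for name in generate_names(5, seed=42):
        print(name)
```

Passing a `seed` is optional; leave it out if you want a different set of names each time.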

[Image: names.png]

3. Getting Sample Product Names

Product names can be the hardest, but I found a site that generates fantasy object names, Fantasy-Name-Generator.com. The generator provides ten random names at a time, of the kind normally used for relics, artifacts, and other special trinkets, based on real and fictional artifact names. This adds a bit of fun; however, you can filter out any “non-professional” sounding names depending on your audience.

[Image: Products.png]

4. Building Business Names and Addresses

As with people’s names, getting realistic business names can be challenging. Mirthandimages.com lets you select how many names you need and then hit the Randomize button. You can also select an industry group to get a set of names in that sector. The North American Address Generator can provide a list of random addresses to add to the company name dataset. It also provides random phone numbers; however, I usually replace most of the digits with an “x” in case someone tries to dial them.
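The phone number masking mentioned above is easy to script if you have more than a handful of rows. Here is a small sketch of one way to do it in Python. The function name `mask_phone` and the choice to keep the first three digits (the area code) are my own assumptions for illustration; adjust `keep` to taste.

```python
def mask_phone(phone, keep=3):
    """Replace every digit after the first `keep` digits with 'x'.

    Punctuation and spaces are left alone so the number still looks real.
    """
    masked = []
    digits_seen = 0
    for ch in phone:
        if ch.isdigit():
            digits_seen += 1
            masked.append(ch if digits_seen <= keep else "x")
        else:
            masked.append(ch)
    return "".join(masked)


if __name__ == "__main__":
    print(mask_phone("(555) 123-4567"))  # prints (555) xxx-xxxx
```

Setting `keep=0` masks every digit, which may be the safer choice if the area code itself could point to a real business.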

[Image: Company]

5. Get GPS Locations For Sample Maps

For location data, you need access to maps and the ability to capture the latitude and longitude of each location so you can plot it. iTouchMap.com lets you select a point on a map and provides the latitude and longitude of that point. This is useful when you need actual client sites or points in a specific area.

If you are looking for a group of random locations, the Random Point Generator is a great place to gather them, and it offers a number of options.
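If you would rather script random locations than copy them from a site, a rough equivalent can be sketched in Python by picking coordinates inside a bounding box. This is an illustration only: the function name `random_points` and the example box (approximate coordinates around Seattle) are assumptions, and sampling latitude and longitude uniformly is not perfectly even by area, although that rarely matters for a small demo region.

```python
import random


def random_points(count, south, west, north, east, seed=None):
    """Return `count` random points inside the given bounding box.

    Each point is a dict with "Latitude" and "Longitude" in decimal degrees.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(count):
        latitude = round(rng.uniform(south, north), 6)
        longitude = round(rng.uniform(west, east), 6)
        points.append({"Latitude": latitude, "Longitude": longitude})
    return points


if __name__ == "__main__":
    # Approximate box around Seattle, used purely as an example.
    for point in random_points(5, south=47.4, west=-122.5, north=47.8, east=-122.2):
        print(point)
```

The column names match what a map visual typically expects, so the list can be loaded as a table without renaming.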

These are the resources I have used to generate realistic data for demo, training, and learning databases. Please share any others you have used.

6. How to use Python to Generate Test Data

Test data can be generated in Python with a combination of libraries and techniques that produce realistic but randomized data. The Faker library is a useful tool for generating fake data, including names, addresses, and other common data types.

Python’s built-in random module can handle the random selection of positions and yearly salaries. To illustrate this, we will create a company payroll table featuring 50 employees and four attributes: “Employee Number”, “Employee Name”, “Position”, and “Yearly Salary”.

Tutorial: Create an Employee Data table in Python.

In the example below, I needed a payroll table for a data analytics company. Each employee record has four fields: “Employee Number”, “Employee Name”, “Position”, and “Yearly Salary”. The table needed to have 50 employees.

First, you need to install the Faker library if you haven’t already done so. You can install it using pip:

pip install Faker

Now, let’s create a Python script to generate the company payroll table:

import random
from faker import Faker

# Faker instance used to generate realistic-looking names
fake = Faker()

# Job titles to pick from for the data analytics company
positions = [
    "Data Analyst", "Data Scientist", "Data Engineer",
    "Machine Learning Engineer", "Business Analyst",
    "Database Administrator", "BI Developer", "Data Architect",
]

def generate_employee_record(employee_number):
    """Build one payroll row with a fake name, a random position, and a salary."""
    employee_name = fake.name()
    position = random.choice(positions)
    # Random salary between 50,000 and 120,000, rounded to cents
    yearly_salary = round(random.uniform(50000, 120000), 2)
    return {
        "Employee Number": employee_number,
        "Employee Name": employee_name,
        "Position": position,
        "Yearly Salary": yearly_salary,
    }

def generate_payroll_table(num_employees):
    """Return a list of records numbered 1 through num_employees."""
    return [generate_employee_record(i) for i in range(1, num_employees + 1)]

# Generate 50 employees and print one record per line
payroll_table = generate_payroll_table(50)
for employee in payroll_table:
    print(employee)

This script creates a company payroll table with 50 employees, each with a unique employee number, a randomly generated name, a position randomly selected from the positions list, and a random yearly salary between $50,000 and $120,000. The original screenshot of the output was taken in VS Code on a Mac.
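Printing the records is fine for a quick check, but for a demo you will probably want the table in a file that a tool such as Power BI or Excel can load. The sketch below shows one possible way to write a list of records like the one above to CSV with Python's standard csv module. The helper name `write_payroll_csv` and the two sample rows are hypothetical; they stand in for the generated payroll table so the example runs without Faker installed.

```python
import csv
import io

FIELDNAMES = ["Employee Number", "Employee Name", "Position", "Yearly Salary"]


def write_payroll_csv(payroll_table, file_obj):
    """Write a list of payroll dicts to an open text file as CSV with a header row."""
    writer = csv.DictWriter(file_obj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(payroll_table)


# Hypothetical rows in the same shape as the generated payroll records.
sample_rows = [
    {"Employee Number": 1, "Employee Name": "Jane Doe",
     "Position": "Data Analyst", "Yearly Salary": 72500.0},
    {"Employee Number": 2, "Employee Name": "John Smith",
     "Position": "BI Developer", "Yearly Salary": 88100.5},
]

if __name__ == "__main__":
    # Write to an in-memory buffer here; use open(...) to write a real file.
    buffer = io.StringIO()
    write_payroll_csv(sample_rows, buffer)
    print(buffer.getvalue())
```

To save a real file, open it with `open("payroll.csv", "w", newline="", encoding="utf-8")` and pass the file object in place of the buffer; the file name here is just an example.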

Conclusion

Many things have changed since I wrote the original version of this article in 2017. There is still a need to create demo data, and with data science and machine learning, there is now a need to generate test data as well. With Python, there is another way to get this done.

How do you generate test data?

Resources

What is Test Data? Test Data Preparation Techniques with Example (softwaretestinghelp.com)


Steve Young

With over 34 years of experience in the data and technology industry, the last 16 with Microsoft, I have had the opportunity to work in various capacities, contributing to my knowledge and expertise in Data Engineering, Power BI, and Data Visualization.


