pantheonuk

Introducing pydbgen: A random dataframe/database table generator

by Joe Calvin
July 12, 2024
in Business

When you first start learning data science, your biggest concern is often getting access to raw data. There are many great real-life datasets available online for machine learning, but I have found that the same is not true when it comes to learning SQL.

A basic knowledge of SQL is essential for data science. However, access to a large enough database with realistic data (such as name, address, credit card number, Social Security number, birthday, and so on) is far less common than access to toy datasets on Kaggle, which are specifically created or curated for machine learning tasks.

It would be wonderful to have a simple tool or library that could generate a large database with multiple tables, filled with data of your choice.

Beyond newcomers to data science, even seasoned software testers may find it helpful to have a simple tool that generates large data sets with random (fake) entries.

This is why I’m happy to present a lightweight Python library called pydbgen. This article briefly describes the package; you can read the docs for more information.

What is pydbgen exactly?

pydbgen is a lightweight, pure-Python library that generates random useful entries (e.g., name, address, credit card number, date, time, company name, job title, license plate number) and saves them in a Pandas dataframe object, as a SQLite table in a database file, or in a Microsoft Excel file.

How to install pydbgen

The current version (1.0.5) is available on PyPI (the Python Package Index repository). You must have Faker installed to make this work. To install pydbgen, enter: pip install pydbgen

It has been tested on Python 3.6 and won’t work on Python 2 installations.

How to use it

To start using pydbgen, create a pydb object.

You can then call the various functions exposed by that object to generate individual random entries.

Note that where a function has both a plain and a "real" variant, calling the plain one returns fictitious names rather than real ones.

Create a Pandas dataframe using random entries

You can choose how many entries to generate and which data types to include. All values are returned as strings/text.

The resultant dataframe looks something like the image below.
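To make the idea concrete, here is a minimal standard-library sketch of what "choose how many rows and which fields" can look like. It is not pydbgen's own code: the gen_rows function, the GENERATORS registry, and the tiny name and city lists are all hypothetical stand-ins (pydbgen relies on Faker for realistic values). As described above, every value comes back as a string.

```python
import random

# Hypothetical sample values; a real generator would draw on a much larger pool.
FIRST = ["Ada", "Grace", "Alan", "Linus"]
LAST = ["Lovelace", "Hopper", "Turing", "Torvalds"]
CITIES = ["Austin", "Boston", "Denver", "Seattle"]

# Registry mapping a field name to a function that returns one random string.
GENERATORS = {
    "name": lambda rng: f"{rng.choice(FIRST)} {rng.choice(LAST)}",
    "city": lambda rng: rng.choice(CITIES),
    "date": lambda rng: (
        f"{rng.randint(1990, 2020)}-{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}"
    ),
}

def gen_rows(num, fields, seed=None):
    """Return `num` rows, each a dict mapping a requested field to a string."""
    rng = random.Random(seed)
    return [{field: GENERATORS[field](rng) for field in fields} for _ in range(num)]

rows = gen_rows(5, ["name", "city", "date"], seed=42)
for row in rows:
    print(row)
```

If Pandas is installed, passing this list of dicts to pandas.DataFrame(rows) produces a dataframe with one column per field.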

Create a database table

You can choose how many entries and which data types to generate. All data is returned in text/VARCHAR format. You can also specify the database filename and the table name.

This creates a .db file that can be used with MySQL or SQLite database servers. This image shows a SQLite database table that was opened in DB Browser.
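The same idea can be sketched with Python's built-in sqlite3 module. This is an illustration under stated assumptions, not pydbgen's implementation: write_random_table, the sample value lists, and the Testdb.db and People names are hypothetical. It shows the three things described above: a chosen file name, a chosen table name, and VARCHAR columns holding text.

```python
import os
import random
import sqlite3
import tempfile

# Hypothetical sample values used only for this sketch.
NAMES = ["Ada Lovelace", "Grace Hopper", "Alan Turing"]
CITIES = ["Austin", "Boston", "Denver"]

def write_random_table(db_file, table_name, num, seed=None):
    """Create `table_name` in `db_file` and fill it with `num` random text rows."""
    if not table_name.isidentifier():
        # Table names cannot be bound as SQL parameters, so validate them.
        raise ValueError(f"unsafe table name: {table_name!r}")
    rng = random.Random(seed)
    rows = [(rng.choice(NAMES), rng.choice(CITIES)) for _ in range(num)]
    conn = sqlite3.connect(db_file)
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(f'CREATE TABLE "{table_name}" (name VARCHAR, city VARCHAR)')
            conn.executemany(f'INSERT INTO "{table_name}" VALUES (?, ?)', rows)
    finally:
        conn.close()

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Testdb.db")
    write_random_table(path, "People", 10, seed=1)
    conn = sqlite3.connect(path)
    count = conn.execute('SELECT COUNT(*) FROM "People"').fetchone()[0]
    sample = conn.execute('SELECT name, city FROM "People" LIMIT 1').fetchone()
    conn.close()

print(count, sample)
```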

 

Create an Excel file

You can generate an Excel file with random data in the same way as the examples above. Setting phone_simple to False produces long-form, complex phone numbers, which is useful when you need to test more complicated data extraction code.
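To see why long-form numbers make a better test of extraction code, consider this rough sketch. The random_phone and core_digits helpers and the number formats are assumptions chosen for illustration; they are not pydbgen's actual formats. The simple form is easy to parse, while the long form forces the extraction code to cope with country codes, punctuation, and extensions.

```python
import random
import re

def random_phone(rng, simple=True):
    """Return a fake phone number as text (hypothetical formats)."""
    area = rng.randint(200, 999)
    mid = rng.randint(200, 999)
    last = rng.randint(0, 9999)
    if simple:
        return f"{area}-{mid}-{last:04d}"
    # Long-form variants: parentheses, country code, or a trailing extension.
    template = rng.choice([
        "({a}) {m}-{l:04d}",
        "+1-{a}-{m}-{l:04d}",
        "{a}.{m}.{l:04d} x{e}",
    ])
    return template.format(a=area, m=mid, l=last, e=rng.randint(100, 9999))

def core_digits(phone):
    """Extract the 10-digit number, ignoring country code and extension."""
    main = phone.split("x")[0]        # drop any extension
    digits = re.sub(r"\D", "", main)  # keep digits only
    return digits[-10:]               # drop a leading country code

rng = random.Random(0)
for _ in range(3):
    number = random_phone(rng, simple=False)
    print(number, "->", core_digits(number))
```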

 

Generate random email IDs for scrap use

realistic_email is a built-in method of pydbgen that generates random email IDs based on a seed name. This is handy on the web when you don’t want to give out your real email address but need something that looks close to it.
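A seed-name-based generator can be sketched in a few lines. The email_from_name function below is a hypothetical stand-in, not pydbgen's realistic_email; the address patterns and the reserved example.* domains are assumptions. It shows the general idea: derive the local part from the name, then attach a random domain.

```python
import random

# Reserved example domains, so no generated address can belong to a real person.
DOMAINS = ["example.com", "example.org", "example.net"]

def email_from_name(name, rng=None):
    """Build a plausible-looking email address from a seed name."""
    if rng is None:
        rng = random.Random()
    parts = name.lower().split()
    if not parts:
        raise ValueError("name must contain at least one word")
    first, last = parts[0], parts[-1]
    # Pick one of a few common local-part patterns.
    local = rng.choice([
        f"{first}.{last}",
        f"{first[0]}{last}",
        f"{first}_{last}{rng.randint(1, 99)}",
    ])
    return f"{local}@{rng.choice(DOMAINS)}"

for _ in range(3):
    print(email_from_name("Jane Doe"))
```

Passing a seeded random.Random instance as rng makes the output repeatable, which helps when the addresses feed into tests.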

Future improvements and user contributions

The current version may have many bugs. If you notice any, or if your program crashes during execution (except for crashes caused by bad input on your part), please let me know. If you have a great idea and want to contribute to the source code, please visit the GitHub repo. A few questions readily come to mind:

  • Is it possible to combine some statistical modeling/machine learning with the random data generator?
  • Is it possible to add a visualization function to the generator?
