You’ve seen it before. Large, unwieldy spreadsheets with an arbitrary user ID in the leftmost column. Will the data behind those scenarios still be intact across the 15 interrelated in-house systems and third-party stubs next week? Tomorrow, even? Maybe.
Test data ‘managed’ in this fashion quickly becomes outdated. Perhaps there are multiple teams working with the same testing environments. What if someone in Team A doesn’t know someone in Team B was using USER_ID 00000097 and changes it? What if someone accidentally wipes the database? All manual testing, and every automated test that relies on that data, comes to a standstill.
There is surely a better way.
Automated test data conditioning
Hear me out:
- Test data is procedurally generated. No more ‘what makes this happy path scenario a happy path scenario?’ The answer is right there in the generation code (see the first sketch after this list).
- Version-controllable. Has data changed that shouldn’t have? Want to see what the system was sending back 6 months ago? It’s all there.
- No massive databases full of trash (unless you want there to be!). Test data generation can be done on the fly at any time. Want to test a very specific scenario? No more looking in spreadsheets for a specific user ID and praying that the data is still there. Someone messed with your data? Quickly recondition it to a known state.
- Agnostic of your mocking mechanism. This just spits out data. How you send that data back is up to you now.
- You can hook this up to your automated tests. Your Given step now generates and conditions the test data for your specific scenario (see the second sketch after this list).
- In addition, this strategy lets you condition multiple user traits with ease. I want a user with traits X, Y and Z. Try finding that in your spreadsheet!
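
To make that concrete, here is a minimal sketch of trait-based, procedural test data generation in Python. Everything in it is an assumption for illustration: the `User` shape, the trait names and the output path are hypothetical, so swap in your own domain model and persistence layer.

```python
"""A minimal sketch of trait-based test data generation.

The User shape, trait names and output path are hypothetical --
swap in your own domain model and persistence layer.
"""
import json
import random
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class User:
    user_id: str
    country: str = "GB"
    has_overdraft: bool = False
    is_vip: bool = False


# Each trait is a tiny function that reshapes a User into what a scenario needs.
TRAITS = {
    "vip": lambda u: setattr(u, "is_vip", True),
    "overdrawn": lambda u: setattr(u, "has_overdraft", True),
    "overseas": lambda u: setattr(u, "country", "FR"),
}


def build_user(*traits, seed=None):
    """Generate a user with the requested traits.

    A fixed seed keeps the generated IDs reproducible, which is what makes
    the output safe to commit to version control and diff six months later.
    """
    rng = random.Random(seed)
    user = User(user_id=f"{rng.randrange(10**8):08d}")
    for trait in traits:
        TRAITS[trait](user)
    return user


if __name__ == "__main__":
    # 'Happy path' is no longer a mystery: the absence of traits defines it.
    happy_path_user = build_user(seed=42)

    # Want a user with traits X, Y and Z? Just ask for them.
    vip_overdrawn = build_user("vip", "overdrawn", seed=42)

    # Serialise to a file you can commit, diff and replay whenever the data gets trashed.
    Path("testdata").mkdir(exist_ok=True)
    Path("testdata/happy_path_user.json").write_text(
        json.dumps(asdict(happy_path_user), indent=2)
    )
```

Because the generator just spits out data, what happens next is up to you: seed a database, feed a stub, or hand it to whatever mocking mechanism you already use.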
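And here is one way the same generator could be wired into a Given step, sketched with behave as the BDD runner. The module paths and `condition_system` helper are assumptions: the helper stands in for whatever pushes the generated data into your stubs, mocks or test databases.

```python
# steps/user_steps.py -- wiring the generator into a behave Given step.
# build_user is the sketch above; condition_system is a hypothetical helper
# that loads the generated data into your stubs, mocks or test databases.
from behave import given

from testdata.builder import build_user
from testdata.conditioning import condition_system


@given('a user with traits "{trait_list}"')
def step_user_with_traits(context, trait_list):
    # Matches e.g.:  Given a user with traits "vip, overdrawn"
    traits = [t.strip() for t in trait_list.split(",")]
    context.user = build_user(*traits, seed=42)  # fixed seed keeps runs reproducible
    # Condition the environment to a known state before the When/Then steps run.
    condition_system(context.user)
```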
People want to provision virtual machines. They want to provision infrastructure. But they don’t seem to want to provision test data. I have done it, and it works.