Tag Archives: working

I love PETL

When I started at my current job I noticed we had lots of room for improvement in how we imported and exported data. Folks had been using the Microsoft SSIS platform to Extract, Transform and Load data between our database and various files.

SSIS is great for lots of things and has a lot of upsides. It is very drag and drop, folks don’t have to know a lot of programming to get it to do things, and it has lots of functions built in. If you need more programming power, you can execute C# or VB scripts to do the fiddly bits.

But I hate it. (Don’t worry, we’ll get to the love soon.)

My biggest problems with SSIS:

  • It is unversionable. Try reading a git diff of an SSIS change. The XML is designed for a machine to read, not a human. If you want to know what has changed over time in your world, it’s a problem.
  • You can only use Visual Studio to edit it. Many of our SSIS packages include VB or C# scripts. That sounds fine – but apparently these compile to an undiffable, uneditable blob in the XML that is only recompiled when you save from Visual Studio. So if you want to change something across many SSIS package scripts, you have to open and resave each one.
  • It hides options under rocks. Finding out how something works requires lots of delving into lotsa windows and dialogues.
  • It changes things unexpectedly. Click in the wrong dialogue and it helpfully re-infers datatypes from a file for you. You don’t know until you go to execute.
  • It slapped my momma. Etc.

I wanted to move my team to something that was better for people.

We need something:

  • That we can diff
  • That we can do code reviews and pull requests on
  • That is simple, expressive and clear.
  • That is powerful.

To me that sounds like a programming language. I encouraged folks on the team to try Python for a couple of tasks that would otherwise have become SSIS packages. Immediately, things got better. Our code reviews made sense. Code quality improved with every single pull request.

We used pymssql to connect to SQL Server and inserted records as needed after processing them. Navigating and transforming XML docs was easy, and CSV files were eaten up by the standard library’s csv.DictReader.
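Here is a minimal sketch of that read-clean-insert pattern, not our actual code. The column names (`name`, `amount`) and the `payments` table are made up for illustration. The insert function takes any DB-API cursor that uses `%s` placeholders, which is the style pymssql uses.

```python
import csv
import io


def parse_rows(csv_text):
    """Yield cleaned (name, amount) tuples from CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # "name" and "amount" are hypothetical column names.
        yield (row["name"].strip(), float(row["amount"]))


def insert_rows(cursor, rows):
    """Insert rows through a DB-API cursor that accepts %s placeholders."""
    # "payments" is a hypothetical table name.
    cursor.executemany(
        "INSERT INTO payments (name, amount) VALUES (%s, %s)",
        list(rows),
    )
```

With pymssql you would get the cursor from `pymssql.connect(...).cursor()` and call `commit()` on the connection afterwards. Keeping the parsing separate from the insert makes each half easy to review and test on its own.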

And then Derrick found PETL. It’s beautiful. You point it at data and make simple moves to completely transform it. I’m smitten.

I had dozens of files to read from, each a quarterly file for a year – only noted in the file name. Each had a crappy heading line that preceded column headers. I needed to put them into one file for loading into Salesforce Wave. Whacking together a solution with PETL was effortless. Line 36 is where the PETL starts, and it’s so small and good that it is nice to see how much it encapsulates.

Some good advice to my friends who are terrified of this job market

Don’t try to dodge the recession with grad school. Many of my friends are considering this sort of move. It’s a sucker bet for a number of reasons that Penelope outlines. My basic argument is her last one.

Graduate school forces you to overinvest: It’s too high risk.
In a world where people did not change careers, grad school made sense. Today, grad school is antiquated. You invest three to six extra years in school in order to get your dream career. But the problem is that not only are the old dream careers deteriorating, but even if you have a dream career, it won’t last. You’ll want to change because you can. Because that’s normal for today’s workplace. People who are in their twenties today will change careers about four times in their life. Which means that grad school is a steep investment for such a short period of time.

You put in many years of avoiding adult life and prolonging adolescence, then commit to a career you have no real idea about. When I thought I might want to be a lawyer, I worked for a law firm and was firmly told by many lawyers that this is the worst job ever. When I thought I wanted to be on the news, I became a news reporter and learned why the news structurally has to be terrible. You learn more by doing.

Of course, that’s coming from a guy who hasn’t gone to graduate school. I still think, though, that if you are lost or unsure, the best general bet is to say yes to lots of opportunities and ditch the ones you hate. You will get somewhere by staying in motion, and you will learn more along the way.