All posts by MattK

About MattK

I like you.

I love PETL

When I started at my current job I noticed we had lots of room for improvement in how we imported and exported data.  Folks had been using the Microsoft SSIS platform to Extract, Transform and Load data in and out of our database to various files.

SSIS is great for lots of things and has a lot of upsides. It is very drag and drop, folks don’t have to know a lot of programming to get it to do things, and it has lots of functions built in. If you need more programming power, you can execute C# or VB scripts to do the fiddly bits.

But I hate it. (Don’t worry, we’ll get to the love soon.)

My biggest problems with SSIS:

  • It is unversionable. Try reading a git diff of an SSIS change. The XML is designed for a machine to read, not a human. If you want to know what has changed over time in your world, it’s a problem.
  • You can only use Visual Studio to edit it. Many of our SSIS packages include VB or C# scripts. That sounds fine – but apparently these compile to an undiffable, uneditable blob in the XML that is only recompiled if you save using Visual Studio. So if you want to change something across many SSIS package scripts, you have to open and resave each one.
  • It hides options under rocks. Finding out how something works requires lots of delving into lotsa windows and dialogues.
  • It changes things unexpectedly. Click in the wrong dialogue and it helpfully re-infers datatypes from a file for you. You don’t know until you go to execute.
  • It slapped my momma. Etc.

I wanted to move my team to something that was better for people.

We need something:

  • That we can diff
  • That we can do code reviews and pull requests on
  • That is simple, expressive and clear.
  • That is powerful.

To me that sounds like a programming language.  I encouraged folks on the team to try Python for a couple of tasks that might otherwise have used an SSIS package. Immediately, things got better. Our code reviews made sense. Code quality improved with every single pull request.

We used pymssql to connect to SQL Server and inserted records as needed after processing them. Navigating and transforming XML docs was easy, and CSV files were eaten up by the standard library’s csv.DictReader.
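To give a feel for the CSV side of that, here is a minimal sketch using only the standard library. The column names and sample data are made up for illustration, and the database insert is left out; this is not the code we actually ran.

```python
import csv
import io

# Hypothetical sample data standing in for a real export file.
SAMPLE = "id,name,amount\n1, Acme ,10.50\n2,Globex,7.25\n"


def read_rows(handle):
    """Yield one cleaned-up dict per CSV row, keyed by the header line."""
    for row in csv.DictReader(handle):
        yield {
            "id": int(row["id"]),
            "name": row["name"].strip(),
            "amount": float(row["amount"]),
        }


# io.StringIO lets the sketch run without a file on disk;
# a real script would pass an open file handle instead.
rows = list(read_rows(io.StringIO(SAMPLE)))
```

Each row comes back as a plain dict, so the transform step is ordinary Python that shows up cleanly in a diff.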

And then Derrick found PETL. It’s beautiful. You point it at data and make simple moves to completely transform it. I’m smitten.

I had dozens of files to read from, each a quarterly file for a year – only noted in the file name. Each had a crappy heading line that preceded column headers. I needed to put them into one file for loading into Salesforce Wave. Whacking together a solution with PETL was effortless. Line 36 is where the PETL starts, and it’s so small and good that it is nice to see how much it encapsulates.

My legs hurt

I did a 5K last night in Central Park – which sounds like nothing, but I am not a runner.

The last time I ran was 15 years ago in the JP Morgan Corporate Challenge, with Chris Acton. It was terrible: I ran the 5K race in Rochester in 45 minutes. I was so slow and awful that my elf-lord boss ran backwards in bare feet encouraging me to keep going. It seems kind, but it also seems kind of like krumping to show someone that it isn’t hard to dance.

Anyway, this time was better! I don’t dig the 30 minutes of being corralled while we wait to start or the crowding, but once the run started I was able to get going and stay going. I’m still slow – finished in 32 minutes – but it’s better by a lot than last time!

Review: A Burglar’s Guide to the City

A Burglar’s Guide to the City by Geoff Manaugh
My rating: 4 of 5 stars

I’ve always been a fan of Geoff Manaugh’s BLDGBLOG, which is only nominally a study of architecture through strange lenses. (One of the first posts as I write this looks at an art study of the bacteria on money and how it travels through society and compares to seeds being transmitted through ancient boat ballast.)

And who doesn’t love burglary and heist movies? I’m in it for the naughtiness of penetrating forbidden places and urban exploration.

This book is a loving review of how architecture affects burglary, how burglary affects architecture, and how the architecture of a city shapes both its burglaries and the way policing responds. The helicopter patrols over L.A.’s sprawl are one such response, just as the vertical patrols of giant housing projects reflect their own landscapes.

We delve into locks, lockpicking, escaping, getaways, tunnels through earth, air, traffic, and buildings themselves.

At the end is the sobering reflection that all of this is only interesting as the edges of burglary, the mythical kind of burglary. Real burglary is too often full of ugly nastiness, destruction and damage to the lives of those burgled.

I really enjoyed the discussions on Nakatomi space and turning on burglar eyes to see architecture in a different way – it’s an easy read and I’d recommend it.

Who watches the watchmen?

The Justice Department and FBI have formally acknowledged that nearly every examiner in an elite FBI forensic unit gave flawed testimony in almost all trials in which they offered evidence against criminal defendants over more than a two-decade period before 2000.

The cases include those of 32 defendants sentenced to death. Of those, 14 have been executed or died in prison, the groups said under an agreement with the government to release results after the review of the first 200 convictions.

Source: FBI admits flaws in hair analysis over decades – The Washington Post

It happened before 2000. There was other evidence in those cases. But still – false testimony from these high levels over decades happened.

It should shake you.

What is preventing us from reading a similar headline in ten more years? How could we make sure this lab has an incentive to tell the truth rather than to ally with their colleagues?

Quick Project Names Demo

At work, I’m trying to convince people that we should auto-generate at least a suggested code name for each of our projects. It’s an important thing for compliance and secrecy. You’d rather someone be overheard talking in the elevator about “Project Icy Gneiss” than about “the restructuring of Acme Corp”.

I wanted to make the point that if you just have a small list of adjectives and nouns you quickly get a vast space of possible names – more than we’ll ever exhaust.  But a working demo is more persuasive than logic.

I knocked this together last night: Projects-a-Plenty.

Used Bootstrap & Angular, which is kind of overkill for something this tiny.
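
The counting argument itself fits in a few lines of Python. This is only a sketch of the idea, not the demo’s code: the word lists and function names here are hypothetical.

```python
import random

# Hypothetical word lists; the demo's real lists are not shown in this post.
ADJECTIVES = ["icy", "quiet", "amber", "brisk", "hollow"]
NOUNS = ["gneiss", "lantern", "harbor", "falcon", "meadow"]


def name_space_size(adjectives, nouns):
    """Count the distinct adjective-noun pairs the lists can produce."""
    return len(set(adjectives)) * len(set(nouns))


def suggest_name(adjectives, nouns, rng=random):
    """Pick one adjective and one noun and format them as a code name."""
    adjective = rng.choice(adjectives).title()
    noun = rng.choice(nouns).title()
    return "Project {} {}".format(adjective, noun)


size = name_space_size(ADJECTIVES, NOUNS)
example = suggest_name(ADJECTIVES, NOUNS)
```

The space grows as the product of the two list lengths, so 500 adjectives and 500 nouns would already give 250,000 possible names.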