    Political data gathered on more than 198 million US citizens was exposed this month after a marketing firm contracted by the Republican National Committee stored internal documents on a publicly accessible Amazon server. The data leak contains a wealth of personal information on roughly 61 percent of the US population.

    1. Data like that would be a combination of polling data, real world data from door-knocking and phone-calling and other canvassing activities, coupled with modeling using the data we already have to extrapolate what the voters we don't know about would think.
    2. Since this event has come to our attention, we have updated the access settings and put protocols in place to prevent further access.
    3. Based on the information we have gathered thus far, we do not believe that our systems have been hacked.
    4. It was decided that law enforcement should be contacted before attempting any contact with the entity responsible.
    5. The data accessed was not built for or used by any specific client.
    6. This is a proprietary dataset based on a mix of public records, data from commercial providers, and a variety of predictive models of uncertain provenance and quality.
    7. My guess is that they were scraping Reddit posts to match to the voter file as another input for individual modeling.
    8. Campaigns are very narrowly focused. They are shoestring operations, even presidential campaigns. So they don't think of this as an asset they need to protect.
    9. I can think of no avenues for punishing political data breaches or otherwise properly aligning the incentives. I worry that if there's no way to punish campaigns for leaking this stuff, it's going to continue to happen until something bad happens.
