• notabot@lemm.ee
      2 days ago

      That’s not the only way to do it. In quite a lot of situations you can instead generate artificial data that is statistically similar to the original data set and use that. It works well for things like system testing, performance tuning and integration testing. Done right, you can even still pull out useful correlations without risking deanonymising the data.
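To make that concrete, here is a rough sketch of the idea, assuming two numeric columns and a plain bivariate Gaussian model. The `age` and `income` columns and the `synthesize` helper are made up for illustration. Real tooling is more sophisticated than this, and fitting summary statistics alone is not a privacy guarantee.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def synthesize(xs, ys, n, rng):
    """Draw n artificial (x, y) pairs from a bivariate Gaussian fitted to
    the means, standard deviations and correlation of the real columns.
    No real row is ever copied into the output."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    rho = pearson(xs, ys)
    out_x, out_y = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        out_x.append(mx + sx * z1)
        # Mixing z1 into y reproduces the fitted correlation rho.
        out_y.append(my + sy * (rho * z1 + math.sqrt(1 - rho ** 2) * z2))
    return out_x, out_y


rng = random.Random(0)

# Stand-in for the sensitive data set (hypothetical columns).
real_age = [rng.uniform(20, 65) for _ in range(2000)]
real_income = [1000 * a + rng.gauss(0, 8000) for a in real_age]

fake_age, fake_income = synthesize(real_age, real_income, 2000, rng)

print("real correlation:", round(pearson(real_age, real_income), 2))
print("fake correlation:", round(pearson(fake_age, fake_income), 2))
```

The fake columns keep roughly the same means, spread and correlation, which is usually enough for load and integration tests. A Gaussian fit loses the shape of the original distribution, though, so treat it as a starting point only.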

    • a4ng3l@lemmy.world
      2 days ago

      There are plenty of techniques to avoid re-identification… aggregation isn’t the only way. Especially since aggregating over a badly chosen dimension doesn’t help at all…
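A small sketch of why the choice of dimension matters, assuming a k-anonymity style rule where any group smaller than `k` is suppressed. The records, field names and the `aggregate` helper are invented for illustration.

```python
from collections import defaultdict
from statistics import fmean


def aggregate(records, key, value, k):
    """Average `value` per `key` group, dropping groups with fewer than k rows."""
    groups = defaultdict(list)
    for row in records:
        groups[row[key]].append(row[value])
    return {g: fmean(vals) for g, vals in groups.items() if len(vals) >= k}


# Hypothetical records.
records = [
    {"user_id": 1, "region": "north", "salary": 40000},
    {"user_id": 2, "region": "north", "salary": 42000},
    {"user_id": 3, "region": "north", "salary": 44000},
    {"user_id": 4, "region": "south", "salary": 50000},
    {"user_id": 5, "region": "south", "salary": 52000},
    {"user_id": 6, "region": "south", "salary": 54000},
]

# Every user_id group has exactly one row, so each "average" would just be
# that person's salary. With k=3 nothing survives suppression.
print(aggregate(records, "user_id", "salary", k=3))  # {}

# Grouping by region puts three people in each bucket.
print(aggregate(records, "region", "salary", k=3))
# {'north': 42000.0, 'south': 52000.0}
```

Without the `k` threshold, the `user_id` aggregation would hand back the raw salaries under a different name, which is the "isn’t helping at all" case.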