petrescatraian@libranet.de to Technology@beehaw.org · 2 days ago
Deepseek when asked about sensitive topics (i.postimg.cc)
iii@mander.xyz · 39 points · edited 2 days ago
Most commercial models have that, sadly. At training time they're presented with both positive and negative responses to prompts. If you have access to the trained model's weights and biases, it's possible to undo this through a method called abliteration (1).

The silver lining is that it makes explicit what different societies want to censor.
drspod@lemmy.ml · 7 points · 2 days ago
I didn't know they were already doing that. Thanks for the link!
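The abliteration method mentioned in the thread is a form of directional ablation: estimate a "refusal direction" from activations on refusing vs. complying prompts, then orthogonalize the weight matrices that write into the residual stream against it. A minimal numerical sketch of that idea, with toy data standing in for real model activations (all names, shapes, and values here are illustrative assumptions, not any actual model's internals):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8

# Stand-ins for residual-stream activations collected on prompts the
# model refuses vs. prompts it answers normally (toy random data).
acts_refuse = rng.normal(size=(32, d_model)) + 2.0
acts_comply = rng.normal(size=(32, d_model))

# Estimate the refusal direction as the normalized difference of the
# mean activations between the two prompt sets.
r = acts_refuse.mean(axis=0) - acts_comply.mean(axis=0)
r /= np.linalg.norm(r)

# Stand-in for a weight matrix whose output rows write into the
# residual stream (e.g. an attention or MLP output projection).
W = rng.normal(size=(d_model, d_model))

# Ablate: subtract the component of W's output that lies along r,
# so the modified matrix can no longer write anything in that direction.
W_abliterated = W - np.outer(r, r) @ W

# The output of the ablated matrix is now orthogonal to r for any input.
assert np.allclose(r @ W_abliterated, 0.0)
```

Applied across every layer that writes to the residual stream, this is why the technique needs access to the trained weights: it edits them in place rather than changing the prompt or the sampling.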