• mke@lemmy.world
    5 months ago

    Anyone know if Mozilla ever made a statement on the state of MS’s LLM training on GitHub data? I’m curious whether they don’t care about Firefox being part of the dataset or whether they just think the benefits of the platform outweigh that.

      • mke@lemmy.world
        4 months ago

        Thought I’d forgotten 'bout this comment, didn’tcha? Sorry to say, I’m very good at restarting old discussions instead of sleeping on time.

        I tried looking into this again, and although I’m not sure, I don’t think you can. Please do comment if you have any insight on this.

        I trimmed down the fat and replaced it with “…”, but feel free to open the GH terms of use and read the cited sections yourself.

        Wall of terms

        A. Definitions

        The “Service” refers to the applications, software, products, and services provided by GitHub, including any Beta Previews.

        Pretty sure this includes Copilot.

        “Content” refers to content featured or displayed through the Website, including without limitation code …

        D. User-Generated Content, 3. Ownership of Content, Right to Post, and License Grants

        Because you retain ownership of and responsibility for Your Content, we need you to grant us … legal permissions, listed in Sections D.4 — D.7. These license grants apply to Your Content. … You understand that you will not receive any payment for any of the rights granted … The licenses you grant to us will end when you remove Your Content from our servers, unless other Users have forked it.

        (emphasis mine)

        4. License Grant to Us

        … You grant us … right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time. This license includes the right to do things like … analyze it on our servers …

        (emphasis mine)

        7. Moral Rights

        You retain all moral rights to Your Content that you upload, publish, or submit to any part of the Service, including the rights of integrity and attribution. However, you waive these rights and agree not to assert them against us, to enable us to reasonably exercise the rights granted in Section D.4, but not otherwise.

        To the extent this agreement is not enforceable by applicable law, you grant GitHub the rights we need to use Your Content without attribution and to make reasonable adaptations of Your Content as necessary to render the Website and provide the Service.

        (emphasis mine)

        While there is this option on the GH settings for Copilot:

        • Allow GitHub to use my code snippets from the code editor for product improvements

        …I find it entirely unclear what, precisely, is being disallowed when this is unchecked. Surrounding text and links are unhelpful.

        Also, if I understand the relevant law properly here, many AI companies are likely betting on training being fair use. Your rights, your power to dictate terms in a LICENSE, are thus irrelevant if fair use applies. I lack the background to tell how the COPIED Act, were it to pass, would change this in regard to code, if at all.

        Finally, there’s that question… do you actually trust companies to follow the rules in good faith?