Hosta

How can I help you?

I am the Director of Community Advocacy for the Wikimedia Foundation and the volunteers of all the communities supported by the Wikimedia Foundation. My goal is to increase communication between community members and staff. Do you have questions? Do you have ideas? Please let me know. You can leave me a note here or e-mail me at liaison@wikimedia.org.

Archives

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11

This page has archives. Sections older than 31 days may be automatically archived when more than 1 section is present.

Search for a new API for copyright violation detection

As you probably know, we are currently struggling to find a new API for copyright detection work (see the Phabricator thread and the Village Pump thread). It was stated in the Phabricator thread that using Google would be too expensive. User:Crow and I both asked in the Village Pump thread how much the Google service would cost, but we never got an answer. I was wondering if Google is still being considered, and if not, why not? And we would still like an answer as to how much Google would charge. Thanks, — Diannaa (talk) 00:43, 26 April 2016 (UTC)

Hi, Diannaa. :) I'm sorry to hear that. I'll see what I can find out. :/ --Maggie Dennis (WMF) (talk) 01:28, 26 April 2016 (UTC)

@Diannaa: I've emailed Kaldari about this to double check (I think he's on a plane for most of today, unfortunately), but I talked to NKohli (WMF) in the meantime and we think this has mostly been answered in the Phabricator discussion you linked, specifically in this reply by Kaldari: "For CSE users, the API provides 100 search queries per day for free. If you need more, you may sign up for billing in the Developers Console. Additional requests cost $5 per 1000 queries, up to 10k queries per day." Niharika says that to the best of her knowledge, we've exhausted our contacts at Google as far as hoping we could get those limits/pricing adapted to work for us, so Google is off the table for now, but we'd put it back on the table if we found a new contact who might be able to help us work with Google. Kbrown (WMF) (talk) 15:59, 26 April 2016 (UTC)

@Kbrown (WMF): I am still not clear why this does not work. It works out to about $50 per day ($1500 per month) for 10,000 queries a day (300,000 per month). Is the problem that the price is deemed too expensive, or is 10,000 queries a day insufficient to meet our needs? — Diannaa (talk) 19:39, 26 April 2016 (UTC)

Hi @Diannaa:, yes, the price is deemed too expensive. If I recall correctly, until now we were paying USD 500 per month for the Yahoo BOSS API access. Besides, our bots frequently cross the 10k queries per day limit. We'd need something that provides more flexibility in terms of usage. NKohli (WMF) (talk) 02:00, 27 April 2016 (UTC)

I find that to be quite a shocking decision, considering what I am seeing on the Annual Report as to how much money we have in cash, with an increase of $7.3 million in cash over the last year and a total of $35 million of cash and equivalents on hand as of the end of the fiscal year (June 2015). What were you planning on spending the money on, if not this? How can we be taken seriously as a world-class website if the content is riddled with copyright violations? I ask you to reconsider. — Diannaa (talk) 02:17, 27 April 2016 (UTC)

I must also express some shock that $18,000 per year (3x BOSS) is considered too expensive to use the world's most comprehensive search database to ensure that we protect the intellectual property rights of potentially 3.6 million owners per year (assuming max count of hits per year). It has already been demonstrated that "lesser" search engines do not catch significant percentages of copyrighted material. Likewise, even if we do exceed the daily hit count, is that mere possibility enough to discount it entirely? I'm pretty sure the previous 10k searches that day will have been worthwhile. In light of the financial report linked, this has me scratching my head and wondering how seriously copyright violation is actually taken when all the cards are on the table... Crow^Caw 14:24, 27 April 2016 (UTC)
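For readers who want to check the figures quoted in this thread ($50 per day, $1,500 per month, $18,000 per year, 3x BOSS, 3.6 million queries per year), the arithmetic can be reproduced with a short sketch. This is illustrative only: it assumes 30-day months (as the thread's rounding does), ignores the 100 free daily queries, and every name in it is hypothetical rather than part of any real API.

```python
# Figures quoted in the thread; all names here are hypothetical.
PRICE_PER_1000_QUERIES_USD = 5   # quoted Google CSE price for additional queries
DAILY_QUERY_CAP = 10_000         # quoted hard limit of queries per day
YAHOO_BOSS_MONTHLY_USD = 500     # Yahoo BOSS cost per month, as recalled in the thread
DAYS_PER_MONTH = 30              # assumption that matches the thread's rounding


def google_cost_usd(days: int, queries_per_day: int = DAILY_QUERY_CAP) -> int:
    """Cost at a fixed daily volume, ignoring the 100 free queries per day."""
    return days * queries_per_day * PRICE_PER_1000_QUERIES_USD // 1000


daily = google_cost_usd(1)
monthly = google_cost_usd(DAYS_PER_MONTH)
yearly = 12 * monthly
boss_yearly = 12 * YAHOO_BOSS_MONTHLY_USD
queries_per_year = 12 * DAYS_PER_MONTH * DAILY_QUERY_CAP

print(daily, monthly, yearly)   # 50 1500 18000
print(yearly // boss_yearly)    # 3
print(queries_per_year)         # 3600000
```

Counting the 100 free daily queries or using a 365-day year shifts these totals slightly, but not the rough 3x comparison with the Yahoo BOSS price.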

Hi, Crow; Diannaa. I need to acknowledge that I have a bit of a conflict here, given my copyright work. But my conflict is not much of an issue, because this budget doesn't belong to Community Engagement. :)

COI disclosure aside, I can speak to the history of this a bit, which might help you contextualize. The Wikimedia Foundation doesn't so much have a big bucket of money that can be assigned to whatever we want, even if in aggregate it looks like our budget is big-bucket-like. We actually have a pretty tight (and more or less binding) financial plan that is signed off on by the Board every year. When the WMF took on footing the bill for CorenSearchBot, it was an emergency situation, where the bot had ceased to function and the process would not work if we didn't pick up responsibility for it. Engineering took it on even though it wasn't part of the annual financial plan and has continued to support it because the cost could be sustained and the work matters. (I was uninvolved in making that decision.) However, it's important to note that it's a bit of an anomaly for the WMF to pay for a volunteer-run service that focuses on one project in one language without community review, and it was a big deal for us to make the leap to devoting resources to that.

$18,000.00 is not a small sum of money. I know it looks small against the entire budget of the WMF, but keep in mind that we’re not drawing from “all money the WMF has available”; we’re drawing from a much smaller set of dedicated budgets. Aside from salary, it is more than the entire annual operating budget for FY 2016-2017 of the Chief of Community Engagement. That’s all the money the CCE has to spend for the year. The Support and Safety team, which expects to have a staff of eight during the 16-17 year (we're hiring!), requested just shy of $75,000 to operate for all of next fiscal year. This includes all travel, non-standard equipment vital to our work (like our emergency paging system), support for community convenings (last year, we brought the stewards to San Francisco to have in-person meetings) and support for programs like OTRS, staff training and conference attendance, and intern/contractor support for child safety. To meet our goals, we are still working to find ways to cut an additional $6,000.00 from our spending. I don't have that much personal insight into the budget of Engineering and/or Product, but I suspect a change to $18,000 would be no easier for them to absorb, particularly when it was unexpected and hasn’t been planned into any budgets. We're all tightening spending in every area, and in that context, an $18,000 expenditure for a one-project, one-language tool seems likely to be very hard to justify.

It seems like the Community Tech team are, however, looking for alternatives to Google’s more expensive services; in addition there is an alternative funding model that might be able to provide $18,000 a year, subject to community review: Grants. I've asked the Resources team which Grants avenue might be able to support this, and somebody (maybe me; maybe Karen) should get back with you once I hear more about that.

I’m sorry that this is not as easy as it feels like it ought to be; if we could pull money from an unlimited pool and lightheartedly sign off on everything that could be useful to communities, we would love to do it. :( --Maggie Dennis (WMF) (talk) 18:21, 27 April 2016 (UTC)

I need to P.S. this, lest I cause anybody difficulties. I haven't talked to engineering or product about this. If I'm misrepresenting their position, my apologies. :) --Maggie Dennis (WMF) (talk) 18:29, 27 April 2016 (UTC)

Thanks MRG, that does put things in some perspective. All I can add by way of commentary, for whatever it is worth to the inner discussions, is to mention that with the Biggies out there, "X as a service" is all the rage these days. Just taking the CSB history as a natural evolution - it was free, then the provider realized they could make money off of "search as a service". Anyone with any kind of comparable offering will have already realized this too, so we're going to have to pay someone to get this working again, as unexpected as it is. Though I still think that Bill Gates would take Jimmy's call and could help make something happen there. :) Crow^Caw 19:29, 27 April 2016 (UTC)

It would be great if money to support this task could be built into future budgets please. — Diannaa (talk) 03:03, 28 April 2016 (UTC)

@Diannaa: Sorry for the delay, just got back from Berlin. A few answers to your questions. Yes, we have requested limited funding for this for FY 2016–2017. For the current fiscal year, however, we do not have any additional funding budgeted, so we are trying to match the Yahoo price or better. Also note that there are other community members who have objected to us spending any money for a commercial search API, so we want to be diligent about exploring cheaper (or free) alternatives. Also note that FY 2016–2017 will be a tighter budget than FY 2015–2016 so there is no guarantee we will get the full funding we asked for. Hopefully we will though. Ryan Kaldari (WMF) (talk) 13:58, 28 April 2016 (UTC)

I don't understand why budgets are tight when I look at the annual report and see there's $35 million in cash and equivalents, $7.5 million of which was added in the last fiscal year. The function of the Foundation is not to accumulate cash, but rather to help support the work of the volunteers who build and maintain the website. — Diannaa (talk) 14:14, 28 April 2016 (UTC)

Hi, Diannaa. :) As a non-profit organization, the Foundation does need an operating reserve for fiscal responsibility. Our goal is not only to operate this year, but to meet our mission of sustaining our resources in perpetuity. We need to support the work of volunteers this year and next year and the year after that and ten years from now, whatever may happen to our fundraising model. It's not really (or even remotely :)) my area of expertise to determine how much of a reserve the WMF should have to shore up this mission, but it makes absolute sense to me that an operating reserve needs to happen, whatever proportion that should be. I presume that the decision on the amount is made at the Board level, potentially in consultation with Fundraising and Finance? As m:2014-2015_Fundraising_Report notes, the decline of desktop readers is a challenge for our fundraising models. We need to be sure that our asks are sustainable and sensible, which has factored into the reduced budget request in next fiscal year's plan at m:Wikimedia Foundation Annual Plan/2016-2017/draft. --Maggie Dennis (WMF) (talk) 14:49, 28 April 2016 (UTC)

I may be reading this wrong but Table 11 at https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2016-2017/draft#Appendix_2:_Program_-_Non_Staffing_Expenses seems to suggest that there is funding for FY 16/17 for this purpose? "3rd party resources required for the program activities described under Product. Including Bi-weekly regression testings, daily QA, bots/APIs for both copyright/plagiarism detection, and miscellaneous contracting cost to support the program objectives." Not to beat a dead bot, but since the comment period for this proposal ends tomorrow, perhaps some comments there are appropriate? Crow^Caw 21:53, 28 April 2016 (UTC)

I don't think you're wrong, Crow. :) Ryan said above that they had requested funding for this for next FY. I don't know what the breakdown is, though. And I encourage you to offer any feedback you have. :) --Maggie Dennis (WMF) (talk) 21:57, 28 April 2016 (UTC)

Thanks! I've commented on the talk page of that proposal. Crow^Caw 22:08, 28 April 2016 (UTC)

Maggie, I'm not convinced by what you say about the budget, even for this fiscal year. All organizations have a certain amount for contingencies. Furthermore, the WMF is, quite reasonably, using a good deal of the incoming funds not for the current operations, but setting aside a reserve as an endowment. This money is also in principle available if it's really needed. Budgets of organizations change during their fiscal years as things happen. Contingency funds get drawn on; reserve funds get drawn upon. In those organizations I am familiar with the first two of these are under the control of senior management--obtaining money from the reserves is a matter for the Board.

The question is not whether the money exists. The question is whether this situation is serious enough for the money to be asked for. The purpose of the WMF is to maintain the projects, and one of the principles of all the projects is observing copyright. A gap here will have negative consequences for years to come. Not only will all submissions during the gap need to be re-checked, but the many new users whose copyvios pass undetected will become accustomed to writing in that manner--all quite apart from the harm to our reputation. The situation is in my eyes sufficiently serious. I know that you as an individual in an organization cannot push too hard for some particular emergency unless you are certain you can convince people it is really an emergency. From what I know of you, you are probably pushing as hard as you deem feasible, so it is not you that we need to convince. It's time to escalate, and in my view, the best way is to call the problem to the attention of the community. DGG ( talk ) 20:30, 3 May 2016 (UTC)

@DGG: If we don't have any resolution from Microsoft by our next meeting with them on Monday (see T125459), I will request emergency funding from Finance to cover the cost of using Google's API until July (at which time we will have funds available from the Community Tech budget). I know that every day the tools are down is painful, but it would be irresponsible of us to spend thousands of donor dollars without making sure that the alternatives are not feasible. Ryan Kaldari (WMF) (talk) 02:11, 4 May 2016 (UTC)

@DGG, Crow, and Diannaa: Also note that it's not completely clear if Google will actually be a viable option either, as Google's API has a hard limit of 10,000 queries per day (which we would likely exceed). The Bing API has no limit and would be far cheaper, so I'm still holding out hope for it, but we'll have to wait until Monday to see. If you want to read the notes from our last meeting with Microsoft, see my update on the Phabricator ticket. Ryan Kaldari (WMF) (talk) 03:26, 4 May 2016 (UTC)

I'm very glad to hear that the situation is indeed being viewed with sufficient urgency. DGG ( talk ) 15:55, 4 May 2016 (UTC)

Question about WMF staff training

I have a question about what training the WMF provides its staff - specifically about free content licensing, attribution and so forth. I couldn't work out who to contact; could you please advise me who I should ask? Thanks, BethNaught (talk) 22:11, 20 May 2016 (UTC)

Hello, BethNaught. I may be able to answer your question. I helped produce some of the training documentation on that. If not, if you give me more specifics, I may be able to better determine who can. :) --Maggie Dennis (WMF) (talk) 22:23, 20 May 2016 (UTC)

My concerns are coming from this Commons DR and this other one. They concern files where WMF staff members have uploaded screenshots of Wikipedia and such, without attributing a) MediaWiki developers b) Wikipedia authors c) image contributors, and the respective licenses of each, and simply claiming own work. In the second one, JKatz wrote that "Nirzar and I were not aware that attribution requirements applied to screenshots of Wikipedia for use in discussing Wikipedia". This is very concerning. JKatz has been at the WMF since 2014, so I was surprised he had not been given training (it seems) in how to attribute the WMF's own projects. He also wrote that he will be learning how to correctly attribute later this month. It seems that now the WMF is giving employees training in this matter - what will this involve? BethNaught (talk) 22:31, 20 May 2016 (UTC)

Hi, BethNaught. We've had licensing documentation for onboarding since at least June 2012, which includes a direct link to commons:Commons:Screenshots as well as a caution about following the license of the product. I suspect the problem has been consistency of delivery of that onboarding, I'm afraid. :/ While there've been steps to make sure onboarding is consistent, in the past it has largely been up to the individual hiring manager. I'll look into this and see if I can figure out what the current practice is and what steps may be taken to rectify any gaps. I'll let you know. --Maggie Dennis (WMF) (talk) 22:43, 20 May 2016 (UTC)

Revision as of 22:43, 20 May 2016