The following discussion is an archived debate of the proposed deletion of the article below. Please do not modify it. Subsequent comments should be made on the appropriate discussion page (such as the article's talk page or in a deletion review). No further edits should be made to this page.

The result was Moving to Draft. There is a consensus tending towards deletion and certainly a clear one that the information as it stands is unsuited to mainspace. There is also a clear strand that it's possible to reorganise this content into something manageable and useful. With that in mind it's better to just shift to draft to allow further discussion and agreement to arise without the sword of Damocles hanging over the conversation. Spartaz Humbug! 20:21, 31 July 2023 (UTC)

List of proteins in the human body

No list with potentially 10k entries can effectively be curated (or even displayed) in this format. We have Category:Human proteins, which should serve for navigation purposes. -- Elmidae (talk · contribs) 06:48, 13 July 2023 (UTC)

NOTE: please see comment by Boghog, added after most other comments. This points to the existence of List of human protein-coding genes 1, List of human protein-coding genes 2, List of human protein-coding genes 3, and List of human protein-coding genes 4, which I was not aware of. I suggest the concern about effective duplication is justified. --Elmidae (talk · contribs) 12:09, 19 July 2023 (UTC)
The list may not need 10K entries; the category has 914 pages, but I think it would be difficult to maintain such a large, indiscriminate list of proteins. Perhaps a redirect could be considered. Enervation (talk) 07:05, 13 July 2023 (UTC)
  • Keep or draftify Brand new article (less than 1 day old at time of nom, WP:BEFORE.C.2). Meets WP:NLIST. Let's allow the creator and other editors time to work to refine it and let the inclusion criteria and organization firm up before bringing it here. —siroχo 07:20, 13 July 2023 (UTC)
  • Delete Better as a category, a list would be simply unmanageable. CaptainEek Edits Ho Cap'n!⚓ 07:49, 13 July 2023 (UTC)
    A category just seems harder to export to SQL or a format which can easily be compared with other online databases, no? It makes the automation of the maintenance work a lot harder. Or maybe there is a good way to export categories which I have not seen yet? Claes Lindhardt (talk) 13:48, 17 July 2023 (UTC)
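On the export question raised above, here is a minimal sketch of one possible approach, assuming the standard MediaWiki Action API (`action=query` with `list=categorymembers`). The function names, the CSV layout and the User-Agent string are illustrative choices, not something anyone in this discussion has used or endorsed. Only `fetch_all` touches the network; the other helpers can be tried offline.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"  # English Wikipedia's API endpoint


def build_query(category, cmcontinue=None):
    """Return the query parameters for one page of category members."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "500",
        "format": "json",
    }
    if cmcontinue:
        params["cmcontinue"] = cmcontinue
    return params


def parse_members(response):
    """Extract (pageid, title) pairs and the continuation token, if any."""
    members = [(m["pageid"], m["title"]) for m in response["query"]["categorymembers"]]
    token = response.get("continue", {}).get("cmcontinue")
    return members, token


def fetch_all(category):
    """Page through the API until no continuation token is returned (needs network)."""
    rows, token = [], None
    while True:
        url = API_URL + "?" + urllib.parse.urlencode(build_query(category, token))
        # The User-Agent value is a placeholder; a real tool should identify itself.
        req = urllib.request.Request(url, headers={"User-Agent": "category-export-sketch/0.1"})
        with urllib.request.urlopen(req) as resp:
            members, token = parse_members(json.load(resp))
        rows.extend(members)
        if token is None:
            return rows


def to_csv(rows):
    """Serialise rows as CSV text, ready to load into a spreadsheet or an SQL table."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["pageid", "title"])
    writer.writerows(rows)
    return buf.getvalue()
```

Calling `to_csv(fetch_all("Category:Human proteins"))` would give a flat file of page IDs and titles. It would not include any of the extra columns a hand-built list can carry, which is the trade-off being debated here.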
  • Keep but modify: this seems like a very useful list. However, there will be too many proteins, so the article's lede should say "notable" proteins and the list should only contain proteins which are somewhat notable, i.e., they have been frequently discussed in science or news, or they have a substantial or interesting effect on a notable and relevant phenotype. Chamaemelum (talk) 07:55, 13 July 2023 (UTC)
    Maybe a solution could also just be to make the table collapsible (if you go to the Statistics section of COVID-19 pandemic by country and territory or List of skeletal muscles of the human body you can see examples of this), so that it does not take up so much space, and then add a column marking whether proteins are notable, and define a criterion for when they are notable. In that way the list could quickly be sorted, and one could easily and quickly get the proteins that one needs and export them to Excel or whatever other format one might need? Claes Lindhardt (talk) 13:47, 17 July 2023 (UTC)
  • Comment. Could the list be divided either alphabetically or functionally to make it more manageable? Eastmain (talk • contribs) 08:44, 13 July 2023 (UTC)
    There is indeed precedent for dividing lists alphabetically / lexicographically. List of lists of lists is my favorite starting point for finding examples of good lists. —siroχo 08:53, 13 July 2023 (UTC)
    I mean, it is a sortable table, so if you click the two arrows next to the name in the table you will have it sorted alphabetically. But you could also do the same with the Cell column or any of the other columns; in this way there is flexibility in how you sort the proteins. You can sort them according to any parameter which has been entered for all proteins. Is this helpful? Or did I misunderstand the question? Claes Lindhardt (talk) 13:45, 17 July 2023 (UTC)
  • Keep. This nomination is fundamentally misconceived. There's an old and longstanding consensus that in all cases where we can have a category, we can and should have a navigational list; it's all set out with reasons at WP:CLN (and more specifically at WP:NOTDUP).—S Marshall T/C 08:58, 13 July 2023 (UTC)
    While I am sure that it feels mighty satisfying to make these stentorian pronouncements, see e.g. Wikipedia:Articles for deletion/List of species described in 2022 for a directly parallel case that, yes, was based on the problem of excessive number of entries. --Elmidae (talk · contribs) 09:24, 13 July 2023 (UTC)
    We don't always remember our guidelines, or follow them. But that doesn't mean the guidelines don't count.—S Marshall T/C 09:33, 13 July 2023 (UTC)
    Do I have to point out to a long-term contributor that desirables must be balanced against others (such as, having list-form material to parallel category-form, vs not throwing thousands of entries into an uncuratable list)? Would be nice if one could just throw references to The Only Applicable Rule into all these AfDs, eh... --Elmidae (talk · contribs) 10:18, 13 July 2023 (UTC)
    Isn't the point of having an online encyclopedia that you can have lists longer than you can in a physical one? It is just a question of how we can make it easy enough for everyone to navigate, no? What can we do to make the table more intuitive to use for people who don't work with large datasets on a daily basis? I think this is the exciting thing about these sortable tables: they can let the common man see all the concrete empirical data which is way too often hidden behind abstraction, and allow him to answer a lot of questions that others may not have asked. In this way it is not only people who can code SQL who can ask the questions or understand what is already known. Or am I getting off topic? I feel like if categories were as easy for everyone to sort and export to homemade code which can be used to check and compare, we could just as well use a category. But when I see a category I do not have an overview of which entries in it are lacking what information, nor of how I would sort it according to anything other than the alphabetical order of its name. But maybe there is a good way to do this? Claes Lindhardt (talk) 13:54, 17 July 2023 (UTC)
  • Keep Important medical topic. Okoslavia (talk) 12:26, 13 July 2023 (UTC)
    <3 Agreed, I feel like, as with List of skeletal muscles of the human body, it makes it a lot easier to visualize just how many elements there are to it, and how many of the statements that people make about the number of proteins are estimates vs. actual observations. Claes Lindhardt (talk) 13:56, 17 July 2023 (UTC)
    Well, it's not a medical topic, it's biochemistry. With this level of understanding, it's baffling how you even are able to assess its importance. — kashmīrī TALK 15:57, 26 July 2023 (UTC)
  • Convert to category for easier navigation. WeirdNAnnoyed (talk) 14:35, 13 July 2023 (UTC)
    What is it that makes categories easier to navigate? Maybe we can implement something similar here, so that it becomes just as easy? Claes Lindhardt (talk) 13:57, 17 July 2023 (UTC)
  • Keep or draftify and advise author to produce a curated list of notable human proteins (per Siroxo and Chamaemelum). This will be a useful list with care and attention, and can improve the Category:Human proteins with the removal of obvious errors (like the inclusion of Acalabrutinib) or adding omitted proteins. ― Synpath 17:26, 13 July 2023 (UTC)
    These are some great ideas; do you have the time and energy to help me do it? Where can one find a good description of 'per Siroxo and Chamaemelum'? I think a key lacking element of the category is also the naming convention for each entry. I made one here, but it is not very good yet. It still needs a lot of work before it makes sense to implement it rigidly on all the proteins. But I think it is still better to have a page where it is clear that no clear naming convention has been implemented yet than one that implies it without having it? Claes Lindhardt (talk) 13:59, 17 July 2023 (UTC)
  • Note: This discussion has been included in the list of Lists-related deletion discussions. Spiderone(Talk to Spider) 17:35, 13 July 2023 (UTC)
  • Keep Being too long is never a valid reason to delete it. Lists that do get long are broken into sublists. This list only shows entries that have their own Wikipedia article. The list offers more information than a category could, and thus aids in navigation, helping readers find what they are looking for more easily. A valid information list as well. Dream Focus 18:05, 13 July 2023 (UTC)
    This is also what I have tried to formulate with concrete examples as replies to the others. If there is something which categories have which we lack here, I am sure we can find a good way to implement it :) Claes Lindhardt (talk) 14:01, 17 July 2023 (UTC)
  • Keep per S Marshall's take on CLN. If it becomes too long, we can make it a list of lists, but this is precisely the sort of navigational list Wikipedia should have and organize and update in a manner a dead tree encyclopedia cannot. Jclemens (talk) 06:22, 14 July 2023 (UTC)
  • Draftify. While all 10,000 human proteins are probably not notable, this list as currently scoped will run to several thousand items, which is not practical for a list article, as the nominator observed. (See, e.g., WP:SALAT.) It's not immediately obvious to me how to subdivide it into manageably-sized lists, but we should deal with that now rather than years later. To add to the precedential example of Wikipedia:Articles for deletion/List of species described in 2022, see Wikipedia:Articles for deletion/List of blast furnaces for another recent deletion of an over-broadly scoped list. Choess (talk) 14:15, 14 July 2023 (UTC)
    I wonder if the people who first sat down to make a list of all human muscles on paper felt the same? Could we try to take inspiration from how they solved it here? And would the list even have to be subdivided if everyone just sorts the list according to what they are looking for? Claes Lindhardt (talk) 14:03, 17 July 2023 (UTC)
    Those examples are not valid in this case. There was nothing that made blast furnaces notable; there were no articles for just them, but instead things on that list linked to businesses that had them among their holdings, without mentioning anything about the blast furnaces themselves. The proteins in the human body are notable, mentioned in textbooks and other scholarly publications, any newly discovered one is covered in science news sources, and some have their own articles dedicated to them. Dream Focus 14:39, 21 July 2023 (UTC)
    I am not sure I follow; isn't there List of skeletal muscles of the human body? Claes Lindhardt (talk) 09:06, 22 July 2023 (UTC)
  • Delete, and Do not categorify. The list length is unmanageable, and it serves no navigational purpose. No one is going to want to browse some list of 2000 entries, most with obscure names. Converting to a category is probably worse in this case. Many (if not most) of these are also present in other animals besides humans, and thus this is not a WP:DEFINING characteristic. It also opens the floodgates for thousands of similar categories of proteins in other animals and plants. Additionally, having this as a category requires keeping track of the categories on every such article, rather than simply keeping a central list (which, just to be clear, I still think is useless). 35.139.154.158 (talk) 15:52, 14 July 2023 (UTC)
    But we also have List of skeletal muscles of the human body, List of bones of the human skeleton, List of distinct cell types in the adult human body, and List of human microbiota, even though there are animals out there with the same bacteria, similar muscles and bones, and very similar cells. There is still a very strong criterion for when one cannot add something to the list: when it is not in humans. So it is still a very exclusive definition. I find the argument 'having this as a category requires keeping track of the categories on every such article, rather than simply keeping a central list' good. I think the use is also visualising all the building blocks of a human and all the places where something can break, and how to detect when something is broken based on how it looks in most other humans. A lot of databases exist on this matter (see the list of databases at the end of the article). So clearly it has priority for a lot of people with resources. But all of them are hard to access, and thereby also hard to scrutinize and update. Claes Lindhardt (talk) 14:10, 17 July 2023 (UTC)
  • Note that the list of proteins in the human body would be very similar to proteins in mammals, and most on the list will be common to all animals. It would be nice to have, but the list will be too long. Graeme Bartlett (talk) 11:16, 16 July 2023 (UTC)
    How long? List of minor planets has 700,000 entries.—S Marshall T/C 12:21, 16 July 2023 (UTC)
Whoa. I was not aware of that monster. Well, at least those should be pretty stable entries... --Elmidae (talk · contribs) 19:01, 17 July 2023 (UTC)
How would you like to help make these entries more stable? :) Claes Lindhardt (talk) 06:27, 19 July 2023 (UTC)
  • Keep, the list would be useful and if it gets too large, it can always be split. If we can list the hundreds of thousands of minor planets, surely we can list 10,000 (most likely fewer) proteins. Persent101 (talk) 05:05, 17 July 2023 (UTC)
  • Keep Merge A part of this to me is also verifying that there actually are 10 000 proteins. I see the claim in a lot of places; the wiki page on proteins also at some point claims 20 000. I also found other claims online; all of these, however, seem to be estimates. I think it is very healthy that people can very quickly and easily verify, just by looking at the length of the list, how many proteins are well described and have documentation available to the broad public? I know wiki should not be seen as containing some kind of final truth. But wouldn't it be nice if wiki would at least be used to some degree to verify other sources? — Preceding unsigned comment added by Claes Lindhardt (talk • contribs) 14:14, 17 July 2023 (UTC)
    Also there are a lot of people with very good ideas here; if you have the time and energy please help start implementing them on the article!!! <3 Claes Lindhardt (talk) 14:21, 17 July 2023 (UTC)
    How do I update my vote to Merge? Claes Lindhardt (talk) 12:26, 23 July 2023 (UTC)
    You would need to strike your original bolded vote and replace it with your new one (i.e., replace your '''Keep''' vote in the source code with <s>'''Keep'''</s> '''Merge''' ). Seppi333 (Insert ) 18:49, 25 July 2023 (UTC)
    @Claes Lindhardt: meant to ping before. Seppi333 (Insert ) 20:20, 26 July 2023 (UTC)
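The strike-and-replace edit described above can also be written as a small text transformation. This is only an illustrative sketch: the helper name `strike_and_replace` is made up for this example, and in practice the change is normally made by hand in the page's source editor.

```python
import re


def strike_and_replace(wikitext, old, new):
    """Strike the first bolded `old` !vote that is not already struck and add a bolded `new` one."""
    # The lookbehind skips an occurrence that already sits directly after <s>.
    pattern = re.compile(r"(?<!<s>)'''" + re.escape(old) + r"'''")
    replacement = "<s>'''" + old + "'''</s> '''" + new + "'''"
    # A function replacement avoids any backslash interpretation in the new text.
    return pattern.sub(lambda _match: replacement, wikitext, count=1)
```

For example, `strike_and_replace("* '''Keep''' A part of this", "Keep", "Merge")` yields the line with `<s>'''Keep'''</s> '''Merge'''` at the front, and running it a second time leaves the text unchanged.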
  • Score (if I did not miss anybody; please correct me if I did): Keep or draftify: 9 people, Delete: 2 people. How many people typically cast their vote before it is determined? — Preceding unsigned comment added by Claes Lindhardt (talk • contribs) 14:24, 17 July 2023 (UTC)
    Thanks for your contributions to this discussion! Please note that these aren't votes (hence the term WP:!VOTE), and as such we tend to avoid putting tallies per WP:AFDEQ as they can encourage distraction from the substance of the discussion itself (which indeed you have contributed to in your other comments!). These discussions run for 7 days minimum, except in rare cases that don't affect this one. —siroχo 18:54, 18 July 2023 (UTC)
    Ahh, all right, thank you Claes Lindhardt (talk) 06:27, 19 July 2023 (UTC)
  • Comment Making the article, it feels like it is just as much a quest to list all the proteins on wiki in a way that makes all the already existing articles navigable (as a quest to list all proteins in the human body). So far I have not added a single protein not already on wiki, and there are still a good few left before I have all the ones on wiki so far listed. It seems more like a question of how we should make all these proteins navigable and usable than of whether we should? I also could not find another wiki list of proteins as extensive, nor one which talks about all the other lists on the talk page? If I missed it please let me know. I think this list might be worth keeping just as an attempt at merging all the great input from the other attempts at making a central navigable spot for proteins on wiki? — Preceding unsigned comment added by Claes Lindhardt (talk • contribs) 13:12, 18 July 2023 (UTC)
    I mean, doesn't such a page seem useful just for comparing different articles on the same protein but with different names? And for assuring consistency in how the articles on proteins are structured? Claes Lindhardt (talk) 13:15, 18 July 2023 (UTC)
  • Keep People will keep making articles about their favorite human proteins, and there is some duplicate work and low levels of consistency across the articles. A list like this is the ideal chance to get more organized, coordinate efforts and establish conventions that make later data extraction and analysis easier — Preceding unsigned comment added by Claes Lindhardt (talk • contribs) 06:53, 19 July 2023 (UTC)
    I fear that if we do not start this effort now, we just push the problem of organizing it all so that it is accessible to everyone ahead of us, thereby possibly creating a lot of unnecessary issues, as well as allowing existing issues to grow. Claes Lindhardt (talk) 08:17, 19 July 2023 (UTC)
    Duplicate !vote struck. Please only !vote once, and please also see WP:BLUDGEON. 35.139.154.158 (talk) 20:17, 19 July 2023 (UTC)
  • Delete Comment Duplicates Lists of human genes. Note that a Gene Wiki article is not only about the gene, but also the proteins encoded by those genes (it makes no sense to split up two such interrelated topics). There are ~20,000 protein coding genes in the human genome, of which 12,733 (Infobox gene's transclusion count) already have Wikipedia articles (the Gene Wiki) about them. Including splice variants, there could be as many as 500,000 unique human proteins (see Pray L (2008). "Eukaryotic genome complexity". Nature Education. 1 (1): 96.). So it is impractical to have a single list, it must be split up into sublists as is done with Lists of human genes. Boghog (talk) 11:47, 19 July 2023 (UTC)
    Thank you very much, this is a lot of very useful links. None of these seems to list the end product, nor account for the way that proteins fold? Or am I missing something? How would I go from Lists of human genes to having an overview of all the pages on finished types of proteins? And how would I be able to see what groups of proteins there are, or how it makes the most sense to group proteins? Does the Infobox gene's transclusion count also remove duplicates when, for example, one user made a protein article called Zinc Finger Protein and another ZNF, but both referring to the same thing? Is there a way where I can easily see all of these infobox template usages listed in one place and where I can sort them to extract data to answer new questions? Lists of human genes does really have a very nice format, but it feels like there is a lot of jumping between pages to extract the data that one needs? It also seems to link to List of proteins, which is a less extensive version of this list. What is the purpose of this list and of having it on the Lists of human genes article? Claes Lindhardt (talk) 13:06, 19 July 2023 (UTC)
    43S preinitiation complex, Acid hydrolase, and Actibind also seem to be missing templates/infoboxes. Is there a good way to decide when an article should have one and when it should have both infoboxes, so that we can give all the articles consistent infoboxes? Claes Lindhardt (talk) 11:10, 22 July 2023 (UTC)
    Alpha-2 adrenergic receptor, Angiopoietin-like proteins, ANKS4B, Actin nucleation core, ASIC5, ASTE1, and Coronin also seem to be missing infoboxes. Claes Lindhardt (talk) 13:01, 22 July 2023 (UTC)
    As well as: Adhesion G protein-coupled receptor, Adipokine, Adiponectin receptor, Albondin, Alpha collagen, Amyloid, Anchoring fibrils,
    Is Albinterferon using a different kind of infobox? Claes Lindhardt (talk) 13:20, 22 July 2023 (UTC)
    Keratin also maybe? Claes Lindhardt (talk) 14:13, 22 July 2023 (UTC)
Extended discussion
  • Thanks for your questions. It is important to keep in mind that articles that contain the {{Infobox gene}} are about both the gene and the protein encoded by that gene. List of human protein-coding genes 1 contains columns about the gene (HUGO gene symbol & HGNC ID) as well as the protein (UniProt ID). So this list definitely contains information about the end product. Additional columns such as the recommended UniProt protein name could be added if there is consensus. But because of the size of these lists, I think one has to be very selective in what to add. {{Infobox gene}} draws its data from WikiData. In WikiData, each human gene can be entered only once and can only link to a single page in the English Wikipedia, hence guaranteeing that there are no duplicate articles. {{Infobox protein}}, which is independent of WikiData, preceded {{Infobox gene}}. The vast majority of the former have been replaced by the latter, but there still might be a few out there that have not been replaced and these could be duplicate articles. Every time I run across one of these duplicates, I merge the articles. (Please note that {{Infobox protein}} has other uses, for example in articles about protein families or protein complexes. These are not duplicates.)
    I am a little confused myself about the purpose of Lists of human genes. It seems to have drawn its data from the Genetics Home Reference database which has been superseded by MedLine Plus genetics. It also links to disease pages, which makes it doubly confusing.
    For reference, here is a list of MCB Infoboxes. There are a number of NavBoxes like {{G protein-coupled receptors}}, {{Ion channels}}, {{Transcription factors and intracellular receptors}}, and {{Enzymes}} which are organized by structure/function.
    Because of Wikipedia page size limits, there is no way around jumping between articles when you are querying human coding genes. If you continue to grow your list, you will sooner or later have to divide it into sublists. Boghog (talk) 14:40, 19 July 2023 (UTC)
    A minor thing, but going through a good part of this list so far, there is often one that seems not to be using the Infobox gene's transclusion count template despite being a protein article? Claes Lindhardt (talk) 13:09, 19 July 2023 (UTC)
    Which protein article? Please provide me with a direct link to the specific Wikipedia article. Thanks. Boghog (talk) 17:25, 19 July 2023 (UTC)
    It seems that 2',3'-Cyclic-nucleotide 3'-phosphodiesterase uses two different templates, Tau protein and Zuotin use just one, and Β-Thromboglobulin, ZP1, ZNF837, and ZNF831 have none? (These are the first that I could think of off the top of my head, but there are more throughout the list.) Sorry for not providing them in the initial reply. Claes Lindhardt (talk) 22:28, 19 July 2023 (UTC)
    2',3'-Cyclic-nucleotide 3'-phosphodiesterase contains both {{infobox gene}} and {{infobox enzyme}}, which is a bit awkward. Ideally {{infobox gene}} should display enzyme information, but it does not, and that is why there are two templates. Please note that, strictly speaking, an Enzyme Commission number refers to the reaction catalyzed by an enzyme, not the enzyme protein itself. Often there is more than one human gene that corresponds to a given EC number. In these cases, it is appropriate to have a separate enzyme article. In this particular case, there is only one human gene (CNP) that encodes this enzyme (see EC 3.1.4.37), therefore it is appropriate that the gene and enzyme articles are merged.
    According to UniProt P02775, β-Thromboglobulin is a peptide that is cleaved from CXCL7. It is debatable whether β-Thromboglobulin should be a free-standing article or should be merged into CXCL7. If free-standing, β-Thromboglobulin should have an {{infobox protein}}. I have added the infobox in this edit.
    Zuotin is not found in humans; it is produced by yeast. That is why it contains an {{Infobox nonhuman protein}} template.
    ZNF837 and ZNF831 were missing {{infobox gene}} templates. I have now added them. Tau protein has a single {{infobox gene}} template as it should. Boghog (talk) 05:11, 20 July 2023 (UTC)
    Thank you for the quick reply :) So there is no well established relationship between {{Infobox protein}} and {{Infobox gene}}, and the list Lists of human genes takes {{Infobox gene}} into account but not {{Infobox protein}}. Sorry, I have read your reply a good few times now, but I still do not quite understand the relationship of these two infoboxes, nor how widely they are implemented, nor how Lists of human genes incorporates them. I feel like it should be obvious at this point, but somehow it still is not to me. I really appreciate you taking the time to explain things.
    The relationship between {{infobox protein}} and {{infobox gene}} is historical. {{infobox protein}} came first. {{infobox gene}} is more modern (draws its data from wikidata) and is intended to replace {{infobox protein}}. However {{infobox protein}} is more compact and for this reason, is still used in special cases (articles about protein families or protein complexes). The purpose of these infoboxes is summarized here. Boghog (talk) 05:28, 20 July 2023 (UTC)
    Could a temporary conclusion be that we might need to give List of biological databases a closer look and an update, to be sure that everything is using the most purposeful database with the widest coverage, or is this more of a separate discussion?
    These navboxes are added to the individual page and then the tree structure on the overview articles adapts and updates automatically?
    Maybe we could take some inspiration from the way that List of minor planets is structured? Somehow that feels easier to keep an overview of than the current Lists of human genes. Claes Lindhardt (talk) 22:47, 19 July 2023 (UTC)
  • There is another set of gene/protein lists that are maintained by User:Seppi333Bot: Boghog (talk) 11:51, 19 July 2023 (UTC)
  • Delete as unmanageably large and duplicative of other lists per Boghog. --SilverTiger12 (talk) 19:09, 19 July 2023 (UTC)
    Does this add a new argument or angle different from the one posted by Graeme Bartlett, or would the List of minor planets with 700 000 entries also suggest here that it could make sense to do? Claes Lindhardt (talk) 22:49, 19 July 2023 (UTC)
    Please, Mr Lindhardt, replying to everyone is rude. The closer will know whether this is a point that's been dealt with earlier in the debate. They don't need to see the same replies repeated.—S Marshall T/C 01:30, 20 July 2023 (UTC)
    Sorry, I am terribly sorry if I have approached it wrong. How can I be polite and still make everyone feel like they have been heard and their arguments addressed? Claes Lindhardt (talk) 08:15, 20 July 2023 (UTC)
    Well, you don't; it's not needful. At the end of the debate, a closer will come along and summarise the points that everyone's made, the rebuttals to them, and how these interact with policy. It's the closer's job to make sure everyone feels heard. We have a whole separate place for analysing how deletion debates are closed, if the details interest you?—S Marshall T/C 14:20, 20 July 2023 (UTC)
  • Keep Firstly on categories: many of our readers do not use or understand categories, so they are not a golden-bullet alternative to difficult, long lists. Secondly on size: we can't just abandon having information on anything "big", we just need to find ways to handle big lists, for example by subdividing, which is definitely possible, see the human proteome ref below. Thirdly on duplication: genes are not proteins. There is a subtle difference! Fourthly on databasing and relevance of this list: there are online databases of proteins, e.g. the human protein atlas [1], but these do not provide navigation to Wikipedia articles on proteins. Such sites indicate that it is possible to subdivide and present information in a public-friendly way, despite the sheer size of the problem. We are here to help readers find out about stuff. Learning about human proteins is a very obvious encyclopaedic need for kids, teenagers, and interested adults. It's what we're here to do. Why on earth wouldn't we attempt to provide a list, no matter how incomplete, to help our readers appreciate the range of proteins that exist, and find our individual articles on those that are most notable? Elemimele (talk) 12:02, 21 July 2023 (UTC)
    ... a post-script: this list does not overlap with the numbered lists of human protein-encoding genes; those lists give only the gene's symbol and ID, which are as useful as a chocolate teapot to a schoolkid searching for the proteins that are found in the human heart (a reasonable question to bring to an encyclopaedia). Elemimele (talk) 12:06, 21 July 2023 (UTC)
    The target articles in both lists are about the gene and the protein encoded by that gene. Hence the scope of both lists include both topics and hence the two lists are in fact duplicates. (There are a few genes that have seperate protein protein articles, but these are rare. The vast majority of protein articles also cover the gene.) In addition, It is impractical to manually create and maintain a list of 12,000 Gene Wiki articles. The list of human protein-coding gene series was created and maintained by a bot (User:Seppi333Bot written by User:Seppi333) and is currently up-to-date. Why reinvent the wheel? In the List of human protein-coding genes 1 series, there is a UniProt column, so it already includes explicit information about the protein. Perhaps what should be done is to selectively merge columns from the List of proteins in the human body into the List of human protein-coding genes 1 series. Boghog (talk) 13:12, 21 July 2023 (UTC)[reply]
    To be fair, I wanted to include more data in those tables, but I didn't really have much support at the time I submitted my bot request. The current list uses like 10% of the available column data from the source data file. Seppi333 (Insert ) 20:18, 21 July 2023 (UTC)[reply]
    I'll give one simple example. So I'm a school-kid and interested in what human proteins handle alcohol. With the current list of proteins, I can open the article in my browser, click "find on page" and type alcohol, and I immediately find the NAD- and NADP-linked alcohol dehydrogenases, and can navigate to the articles on both enzymes. No offence to the list of human protein-coding genes 1 etc. series, but I can't for the life of me work out how to search that for enzymes that use alcohol. We are here to serve real information to normal people with rational questions, not to debate whether the inclusion of a uniprot id technically means that our school-kid ought to be find alcohol dehydrogenase; I'd be interested to see how many real school kids manage that feat. Elemimele (talk) 20:44, 21 July 2023 (UTC)[reply]
    A school kid is going to go to a list of proteins to find out what protein metabolizes ethanol? Highly unlikely. Far more likely the student would first search for Alcohol_(drug) and then find links to alcohol dehydrogenase. Boghog (talk) 20:59, 21 July 2023 (UTC)[reply]
    Even if 'A school kid is going to go to a list of proteins to find out what protein metabolizes ethanol' is unlikely right now, don't we want to live in a future where it is likely and doable? I also don't think your average western 7th-8th grader (sorry, but I am mostly familiar with western school systems) would use the term metabolizes, but he might be able to get the same understanding with the words handles or uses. I could also well imagine someone at the start of gymnasium, secondary school or high school having such questions before reaching university. Or simply youngsters interested in STEM subjects. Claes Lindhardt (talk) 09:28, 22 July 2023 (UTC)[reply]
    Is there a good video on how to get started with wikibots somewhere? Claes Lindhardt (talk) 09:23, 22 July 2023 (UTC)[reply]
    @Claes Lindhardt: I think most Wikipedia bots are programmed with WP:Pywikibot; it's the bot library I use at least. It's fairly easy to figure out how to use this library provided that you are somewhat familiar with programming in Python. You don't need to be an expert programmer to write a Wikipedia bot. Frankly, I used a few python libraries I'd never used before in my python script that creates the bot-generated lists, one of which was Pywikibot. So, I learned as I went (NB: it was the first data pipeline I ever programmed), but it wasn't terribly difficult to figure out based on my previous experience with programming in Python. If you take a course on Python programming that teaches you the basics of the language and perhaps gives you some hands-on experience programming stuff (e.g., applied coursework), you should know enough to start programming Pywikibot scripts that perform basic tasks. With a little more hands-on experience (i.e., maybe a few months of programming stuff in Python), it probably wouldn't be difficult for you to read and understand the Python script I wrote that generates the lists of human protein-coding genes, and potentially modify it for your own purposes (NB: if you can read and understand source code in a programming language, repeatedly modifying it and comparing the output you receive to what you intend is a decent way of learning on the fly; at least, it's what I do anyway: build stuff and learn as I go). Hope that helps. Seppi333 (Insert ) 23:40, 23 July 2023 (UTC)[reply]
    Very helpful, thank you :) Claes Lindhardt (talk) 18:08, 24 July 2023 (UTC)[reply]
  • "List of proteins in the human body" was nominated for deletion on 13 July 2023. On 19 July 2023 (DURING THIS ONGOING DELETION DISCUSSION), User:Claes Lindhardt created 229 links to it (in hatnotes incorrectly formatted without the {See also} template). (I don't know how that figures into the merits of the article.) -A876 (talk) 19:06, 21 July 2023 (UTC)[reply]
    Sorry, my bad; I am new to creating and linking articles. But I will do my best to use the {See also} template when I link to it in the future. Thank you for the input. Claes Lindhardt (talk) 09:31, 22 July 2023 (UTC)[reply]

Relisted to generate a more thorough discussion and clearer consensus.
Relisting comment: I'm not seeing a clear cut consensus here, and the discussion seems to still be going. Relisting for more discussion.
Please add new comments below this notice. Thanks, Dusti*Let's talk!* 20:45, 21 July 2023 (UTC)[reply]

  • Delete or merge into List of human protein-coding genes 1, List of human protein-coding genes 2, List of human protein-coding genes 3, or List of human protein-coding genes 4. Both sets of lists concern the gene and the protein encoded by the gene. Maintaining these two sets of lists is redundant. Boghog (talk) 21:11, 21 July 2023 (UTC)[reply]
    That's your second !vote in this debate, Boghog. Do you retract your first one? We normally allow some redundancy in navigational lists, because they're there to help people find content. For example we have a List of dinosaurs and a List of African dinosaurs. All the species in the African list are also in the main list, but the African list isn't useless. Can you see why?—S Marshall T/C 23:36, 21 July 2023 (UTC)[reply]
    You're right about the double vote. Sorry about that. I changed my first vote to a comment. As with List of dinosaurs, there are numerous categories, lists, and navboxes of specific gene/protein families and I agree with you that these subdivisions are useful. But each of these lists points to articles (with relatively few exceptions) whose scope is the gene and the protein encoded by that gene. Proteins and genes are of course distinct topics. However, when discussing an individual protein, it is so highly interrelated with the gene that encodes it that it makes sense to have a single article that discusses both. Furthermore, the infoboxes that are contained in these articles almost always contain information about the gene and protein (see {{infobox gene}}, {{infobox protein}}; note that the names of these infoboxes are somewhat misleading, as both infoboxes include information about the gene and the protein). Boghog (talk) 04:03, 22 July 2023 (UTC)[reply]
    Is there a way in which we could make the names of these infoboxes less misleading? Claes Lindhardt (talk) 09:36, 22 July 2023 (UTC)[reply]
    @Boghog: After looking through the headers/data in the source data file my bot uses and comparing it to the List of proteins in the human body page, I could fairly easily incorporate all the information on protein name(s)/alias(s) [2 options] and EC numbers (for proteins that are enzymes) that are in the current list. I could also add the gene location as well as an IUPHAR link and orphanet link(s) to any associated gene/protein pages in those databases for pharmacology data and clinical information on associated rare diseases for each gene/protein. I can't merge in protein classification or function information without performing some computational gymnastics: I'd need to download a dataset from a protein database, match the identifiers for all entries in the HGNC database, and pull protein classification & function data from the second file. It's entirely within my skillset to do that, and it's probably feasible to merge in protein classification data (provided that I can find a suitable database for this); however, adding protein function information would likely significantly increase the lists' page sizes due to the amount of text that I expect will be added to each entry. Protein function information is seldom concise (e.g., pick an arbitrary UNIPROT page on a human protein and read the function field). On the other hand, the lists' page sizes don't seem like a particularly notable issue now compared to when my bot was approved since they're not even listed on Special:LongPages anymore. That being said, I think I'd need to raise this issue with User:Primefac (please correct me if I'm wrong) before making significant changes like this to my bot's source code, per the original approval discussion. Seppi333 (Insert ) 00:00, 22 July 2023 (UTC)[reply]
    @Seppi333: Thanks for your willingness to modify your bot to include additional columns in your gene/protein lists. It would be desirable to also include function. In principle, this could be extracted from WikiData with a Python API. There is a WikiData property for molecular function (P680) which has already been populated for all human proteins. For example, one can search for human proteins with nuclear receptor activity:
    molecular function (P680) = nuclear receptor activity (Q14872989)
    found in taxon (P703) = Homo sapiens (Q15978631)
    To get a list of human nuclear receptors: query
    I am not sure how much work it would be to write a subroutine to return a list of Gene Wiki articles and their associated molecular function(s), but I am willing to help. Boghog (talk) 04:41, 22 July 2023 (UTC)[reply]
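A minimal sketch of the kind of lookup described above, assuming the two property/value pairs from this comment and assuming that Homo sapiens is Wikidata item Q15978631. The function name `build_function_query` and the variable names in the query are made up for illustration; the code only builds the SPARQL text and does not contact Wikidata.

```python
# Hypothetical helper: builds a SPARQL query for proteins in one taxon
# that have a given molecular function (P680). Nothing is sent over the network.

HOMO_SAPIENS = "Q15978631"            # assumption: Wikidata item for Homo sapiens
NUCLEAR_RECEPTOR_ACTIVITY = "Q14872989"  # item quoted in the discussion above


def build_function_query(function_qid: str, taxon_qid: str = HOMO_SAPIENS) -> str:
    """Return SPARQL text selecting proteins with the given molecular function."""
    return f"""
SELECT ?protein ?proteinLabel ?article WHERE {{
  ?protein wdt:P680 wd:{function_qid} ;   # molecular function
           wdt:P703 wd:{taxon_qid} .      # found in taxon
  OPTIONAL {{
    ?article schema:about ?protein ;
             schema:isPartOf <https://en.wikipedia.org/> .
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
""".strip()


if __name__ == "__main__":
    print(build_function_query(NUCLEAR_RECEPTOR_ACTIVITY))
```

The printed text could be pasted into the Wikidata Query Service; the OPTIONAL clause is one way to pick up the matching English Wikipedia article, if any, for each protein item.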
    If Wikidata by itself is sufficient for filling in the missing gene data from the HGNC dataset, then it should be easy enough to match the identifiers between the dataset and Wikidata since they should both have assigned gene symbols and HGNC IDs. Haven't looked at the wikidata API, but so long as there's a means to batch-download all the data - or just specific statements - from thousands of WD entries simultaneously, then everything else should be pretty straightforward to program. Would just merge the datasets on unique identifiers present in both (HGNC ID would probably be the safest option if those numbers are static) by appending the wikidata column data to the end of the HGNC data file. This modified bot script would still work normally and generate the same wikitables as the current bot script from this merged dataset as it does from the current HGNC dataset. So, all I'd need to do to add the 2 wikidata columns is add 2 more wikitable columns for that data in the function that generates the wikitables. That's simple enough to do, so I'd just need to make sure the wikidata API has the functionality to give me what I need; might be prudent to wait to see if the consensus leans toward a merger before ironing out the rest of the software design, though. Seppi333 (Insert ) 10:49, 23 July 2023 (UTC)[reply]
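The merge step described here can be sketched roughly as follows. This is an illustration only, not the bot's actual code: the column names (`hgnc_id`, `symbol`, `molecular_function`, `protein_class`) and the sample identifiers are placeholders, and both datasets are assumed to be already loaded as lists of dictionaries.

```python
# Hypothetical sketch: append Wikidata-derived columns to HGNC rows,
# matching on the HGNC ID that both datasets are assumed to share.

def merge_on_hgnc_id(hgnc_rows, wikidata_rows, extra_columns):
    """Left-join wikidata_rows onto hgnc_rows by 'hgnc_id'.

    Every HGNC row is kept; rows without a Wikidata match get empty
    strings in the extra columns, so the table generator still works.
    """
    by_id = {row["hgnc_id"]: row for row in wikidata_rows}
    merged = []
    for row in hgnc_rows:
        match = by_id.get(row["hgnc_id"], {})
        new_row = dict(row)  # copy so the input data is not modified
        for column in extra_columns:
            new_row[column] = match.get(column, "")
        merged.append(new_row)
    return merged


# Made-up example data (the IDs and symbols are placeholders, not real genes).
hgnc = [
    {"hgnc_id": "HGNC:1", "symbol": "GENE1"},
    {"hgnc_id": "HGNC:2", "symbol": "GENE2"},
]
wikidata = [
    {"hgnc_id": "HGNC:1", "molecular_function": "example activity",
     "protein_class": "example class"},
]
merged_rows = merge_on_hgnc_id(hgnc, wikidata,
                               ["molecular_function", "protein_class"])
```

Keeping the join keyed on the HGNC ID rather than the gene symbol follows the reasoning above: symbols are renamed from time to time, whereas the IDs are expected to stay fixed.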
    Ideally we would then also merge the articles: en.wikipedia.org/wiki/Proteins_produced_and_secreted_by_the_liver, Category:Human proteins, List of proteins, List of enzymes, Transporter Classification Database and Index of protein-related articles. I feel like a lot of these articles hinge on some of the same questions. Maybe we could even have a column for which cells produce the proteins, linking to List of distinct cell types in the adult human body? Claes Lindhardt (talk) 09:41, 22 July 2023 (UTC)[reply]
    The HUGO Gene Nomenclature Committee (HGNC) database already contains a list of ALL confirmed human genes and through its UniProt link, all confirmed human proteins. UniProt also keeps track of cleavage products from the original transcribed protein (see for example P02775). The HGNC/UniProt list is complete so there is no reason to merge various protein family lists. The issue of cell types/tissue distribution gets messy. One can infer this by message (mRNA expression) or protein product (immunohistochemistry). UniProt has links to expression databases. The Human Protein Atlas is another possibility. However the more I think about this, the less feasible I think this would be to incorporate in a concise way. Boghog (talk) 19:09, 23 July 2023 (UTC)[reply]
    Also, if we merge the articles it is important that we have a heading like 'Naming Convention for the list', because I think that if you do not have a wiki profile and don't go to the talk page, and you are trying to navigate what article is about what, or if you are new and trying to figure out where you can add what information or whether something seems off, it is just impossible. None of the above-mentioned lists, categories or articles really has that (if I missed it please let me know), and it makes the job of comparing them much greater. The job of validating whether the lists are consistent with their own definitions is also much bigger when it is unclear exactly what definitions have been agreed upon so far.
    If they are merged I would also really like to keep 'List of databases containing Human Proteins', to clarify where the data is coming from and maintain a high level of transparency. Claes Lindhardt (talk) 09:48, 22 July 2023 (UTC)[reply]
    I don't see a problem with that. Seppi333 (Insert ) 10:49, 23 July 2023 (UTC)[reply]
  • Merge per Boghog or Delete - assuming it would eventually become a complete list, it doesn't seem feasible to manually maintain up-to-date information on 20,000-100,000 proteins. Even the names of genes/proteins change more frequently than one might expect. Seppi333 (Insert ) 00:04, 22 July 2023 (UTC)[reply]
    But would it have to happen manually? Aren't there ways in which we could automate bits and pieces of it? Claes Lindhardt (talk) 09:43, 22 July 2023 (UTC)[reply]
    Seppi333 I think the work you have done already is amazing, and it would make sense to put in efforts to build on top of that.
    We should, however, not remove this page before the columns and so on have been added to your original article. There is still some way to go before the average person can really benefit fully from it. Claes Lindhardt (talk) 09:56, 22 July 2023 (UTC)[reply]
    @Claes Lindhardt: See below for my justification. I also described a few issues you'll encounter based on my experience, provided the consensus is keep and you decide to expand it manually. In the event a merge is the consensus, it shouldn't take more than a few hours (days, if I'm busy off-wiki) to update the source code to include data on all the protein names, ec numbers, protein classifications, and protein functions from the databases Boghog and I discussed. I would assume this list page would continue to exist until those updates are published.
Seppi's justification

To be clear, I'm not opposed to the existence of both List of proteins in the human body (if complete, >100000 list entries on proteins) and the List of human protein-coding genes (~ 20000 entries on genes and the canonical encoded protein(s)) pages based on duplication given that the scope of the former is significantly broader than the latter. Every functional human protein is inherently notable for the same reason that canonical proteins are (i.e., the proteins covered by my list page's smaller scope), although the current convention is to just cover information on protein variants under the article about the gene and canonical protein. The function, stability, and clinical significance of protein variants can differ significantly from canonical proteins; for example, the FosB truncated splice variants and post-translationally modified ΔFosB isoform: FosB vs ΔFosB vs phospho-ΔFosB vs Δ2ΔFosB all differ in one or more of those respects from each other. That might make most protein variants seem like they're not necessarily notable in their own right, but it's just more practical and poses less of a navigational challenge to cover variants of a canonical protein on the canonical protein page. Hence why the MCB project recommends this. In any event, the notability of all potential entries in your list isn't an issue either.

  • The primary area where you will run into problems if you attempt to manually complete this list is that your source data is constantly changing, and the only way to avoid including erroneous, outdated, and incomplete information in your lists relative to your source data is to automate the generation of the data in your lists from your source databases. In other words, you need to build a data pipeline, then program a bot to convert it to wikitext markup and publish your list page. If you actually finished adding every potential known entry in this list via manual edits, you'd encounter several problems that will arise due to the fact that your source data (i.e., every database you pulled protein information from) is not static: (1) a few approved gene symbols will change roughly every month, (2) more than a few approved protein names will change every month, (3) new human protein-coding genes will be approved and the corresponding protein variants along with the gene will need to be added from the source databases on a regular basis, (4) a nontrivial fraction of the table's contextual data [protein function/classification/location] will change or be added/expanded in your database sources on a regular basis. All of these problems reduce the utility of your list since incomplete, incorrect, and outdated data will be perpetually present in your list due to the sheer size. If not for the fact that we are talking about a list of 100000 entries, manually updating a list to include updates in a database on a regular basis would not be an issue; but the size of this page and the source databases makes it infeasible to manually update changes from a database even once, much less on a regular basis, and this would still be true if this page only contained 10000 protein entries. The only way to effectively address all of these problems is automation, hence my vote.
  • The second area where you'll run into problems is your list will contain thousands of incorrectly-targeted links (targeted to irrelevant non-bio pages that share the same name, target DAB pages, target gene pages that should be DABs, and target redlinked pagenames about a topic that exist at a different page title without a redirect at your link target). To address these issues in the protein-coding gene list for the ~20000 gene symbols (NB: you'll have ~100000 protein links to address), I wrote another pair of scripts to locate mistargeted links ([2]), collaborated with a wikidata editor who pursued the same approach using SPARQL scripts to locate these pages, and worked with basically every active editor at WikiProject Disambiguation for about a week to actively disambiguate every target link on the list page, which was a pretty massive undertaking given that they checked around 12000 link targets and fixed hundreds of DAB issues that all 3 groups identified.
  • The last issue you will face with this list is page size constraints. If this list is ever complete and never split, it will easily contain millions of bytes of page content, in turn securing you the top spot in Special:LongPages. You might consider achieving that rank to be a good thing, but you'll quickly earn yourself the ire of pagesize warriors who will constantly badger you about breaking up the article.
Seppi333 (Insert ) 10:49, 23 July 2023 (UTC)[reply]
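As a rough illustration of the "convert it to wikitext markup" step mentioned in the first point above, the sketch below renders already-loaded rows as a sortable wikitable. It is a hedged example under stated assumptions, not the code Seppi333Bot uses: the function name, column names, and sample row are invented, and publishing the result to a page is left out entirely.

```python
# Hypothetical sketch: turn a list of row dictionaries into wikitable markup.

def to_wikitable(rows, columns):
    """Return wikitext for a sortable table with one line per row.

    Missing values become empty cells, so incomplete source data
    still produces a well-formed table.
    """
    lines = ['{| class="wikitable sortable"']
    lines.append("! " + " !! ".join(columns))          # header row
    for row in rows:
        lines.append("|-")                              # row separator
        cells = (str(row.get(column, "")) for column in columns)
        lines.append("| " + " || ".join(cells))
    lines.append("|}")                                  # close the table
    return "\n".join(lines)


# Made-up example row; the symbol, ID, and name are placeholders.
example_rows = [
    {"Gene": "[[GENE1]]", "HGNC ID": "HGNC:1", "Protein name": "example protein"},
]
wikitext = to_wikitable(example_rows, ["Gene", "HGNC ID", "Protein name", "EC number"])
print(wikitext)
```

Because the table is regenerated from the source data on every run, a renamed gene or a newly approved entry shows up automatically the next time the script runs, which is the point of the automation argument above.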
  • Comment There seem to be two main categories of ATD right now.
  1. There's the human curated list option, which would seem to require a tighter inclusion criteria (not necessarily as part of this discussion)
  2. There's a bot option, which right now hinges on a merge.
One thought: why not both? Doing both takes the pressure off the curated list to be large, meaning we can have a very restrictive inclusion criteria. And we take the pressure of the large bot list to be in-depth, and it can act mostly as a navigational aid with whatever extra data it can easily include.
siroχo 04:59, 22 July 2023 (UTC)[reply]
@Siroxo: Yes, completely! Having a manual list also gives us a way to deal with problems like Calcitonin, which is one of the two different peptides encoded via alternative splicing by the gene CALCA. At the moment we cover the other product in Calcitonin gene-related peptide which also deals with the product of a second gene CALCB. The point is that there isn't a 1:1:1 relationship between genes, proteins, and wikipedia articles, so while I applaud Seppi333's excellent efforts and want those lists kept, it's great to have both a manually-curated, selective navigational list, and a complete, bot-generated database-list of genes. Elemimele (talk) 07:36, 22 July 2023 (UTC)[reply]
This might be a silly question, but when a list is bot-curated, can no manual changes be made to it? Or is it also possible to make manual changes to a bot-curated list? In my head an ideal-case scenario would be that I could start at List of organs of the human body and then see all the proteins required to build each organ as well as what they do and how they interplay. Or I could start at the bottom, at Composition of the human body, and then see how each protein is built from those basic building blocks. The list of proteins is kind of that middle step which could bind all the different levels of building blocks in the human body together. Claes Lindhardt (talk) 10:02, 22 July 2023 (UTC)[reply]
@Claes Lindhardt: The manual changes remain in effect until the next time the bot is run. A bot-updated list generally obviates the need for anyone to edit the list except to retarget dablinks. I don't know if any other bot-generated lists exist, but the way I handle list edits outside of dablink retargets is to ask people to propose them on the talk page via editnotices and - for those who ignore the editnotices - incorporate manual edits that were introduced since my bot's last edit into its source code, provided that it was a useful edit.
@Siroxo and Elemimele: May be worth reading my response to Claes in the collapse tab above as to why I think a manually-updated list on this topic is a bad idea. If you want an idea of how frequently changes are made to the underlying data in the list I maintain, look at how regularly the list navbox indices change across the page revisions. The bot script doesn't change; it's removals/additions/renamings of genes in the underlying dataset that causes that. It'll be much worse with a list 5x longer than mine.
Also, @Elemimele: I'm just responding to what you said below here: merging the data from the current list page into mine is entirely feasible. I wouldn't be merging the actual data in this list - most of it is actually empty cells anyway; we're just merging in the column headers and populating the corresponding data for each list entry from the primary and a secondary database. As described in my discussion with Boghog, we'd use some of the existing data in the source file I currently use and merge in some of the missing column data from a second database with complete information for all human protein-coding gene list entries. It should be fairly simple to revise my bot's source code to include complete information on the protein name, ec numbers (if applicable), protein classification, and protein function for all ~20000 genes in the list my bot maintains.
Lastly, I welcome any hardcore deletionists to try to get my bot's lists deleted; I wouldn't have had a chance of getting my bot approved if I didn't know the relevant MOS and content guideline pages pertaining to my lists backward and forwards; in fact, I cited them at Wikipedia:Bots/Requests_for_approval/Seppi333Bot & Wikipedia:Articles for deletion/List of human protein-coding genes 1. I thanked the first guy who nominated my list for deletion in the request for approval since he unwittingly helped me quickly generate consensus by doing so. I'd laugh at the futility if someone wants to go round 2, though. Seppi333 (Insert ) 11:40, 23 July 2023 (UTC)[reply]
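The churn in the underlying dataset described above (removals, additions, and renamings of genes between bot runs) could be measured with a small comparison like the one below. This is only a sketch under assumptions: each snapshot is taken to be a dictionary mapping a stable HGNC ID to the approved symbol at that time, and the IDs and symbols shown are placeholders.

```python
# Hypothetical sketch: compare two dataset snapshots keyed by HGNC ID.

def diff_snapshots(old, new):
    """Return IDs added, IDs removed, and (id, old_symbol, new_symbol) renames."""
    old_ids, new_ids = set(old), set(new)
    added = sorted(new_ids - old_ids)
    removed = sorted(old_ids - new_ids)
    renamed = sorted(
        (hgnc_id, old[hgnc_id], new[hgnc_id])
        for hgnc_id in old_ids & new_ids
        if old[hgnc_id] != new[hgnc_id]
    )
    return {"added": added, "removed": removed, "renamed": renamed}


# Made-up snapshots; none of these IDs or symbols refer to real genes.
snapshot_old = {"HGNC:1": "GENE1", "HGNC:2": "GENE2", "HGNC:3": "GENE3"}
snapshot_new = {"HGNC:1": "GENE1", "HGNC:2": "GENE2B", "HGNC:4": "GENE4"}
changes = diff_snapshots(snapshot_old, snapshot_new)
```

Running a comparison like this between two consecutive downloads gives a concrete count of what a manual maintainer would have had to fix by hand.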
  • Strong delete. There is so much wrong with this article it is hard to know where to begin, other than by noting that the objections to it already made above by other editors are mostly valid. The best one can say of it is that it is a rough draft of something that could be made useful with a huge amount of work. First of all, it is misnamed: it is not a list of proteins, but mainly a list of genes with the sort of meaningless names that geneticists seem to like. As the first of a huge number of examples, ACAD10 is not a protein; it is a gene coding for Acyl-CoA dehydrogenase family, member 10. If you want to continue pretending this is a list of proteins you need to replace all of these gene names with names of proteins. It's as if the List of presidents of the United States didn't bother to give their names but expected you to click on the number in the left-hand column: that will tell you, so what more do you want? Then, for A4GALT (Lactosylceramide 4-α-galactosyltransferase) we find enzyme(EC number?). However, the EC number is right there in the linked article (2.4.1.228), so does (EC number?) just mean "I suppose this enzyme has an EC number, but if you want to know what it is you need to look it up, because I can't be bothered"? For ACOT6 we don't even get that snippet of information. The order is sort of alphabetical, but the article creator didn't bother to apply a system consistently. Thus we have Tubulin before B3GALNT2. Collagen, the most abundant protein in the body, rolls in at No. 253, with no indication of what it is and what it does. Phosphofructokinase, a protein everyone has heard of, is No. 1731. Just above it there is Hexokinase, with no indication that humans have four isoenzymes with different genes. Then at 1805 we find Cooperativity, which is neither a protein nor a gene. Athel cb (talk) 09:51, 22 July 2023 (UTC)[reply]
    Thank you, that is some very good input. Do you think these problems apply to all of: Category:Human proteins, List of proteins, List of enzymes, Transporter Classification Database and Index of protein-related articles? And are they thus symptomatic of the wiki community as a whole around this topic, or do you think it is just this list/article? I am very open to trying to replace all of the gene names with protein names. How can we try to ensure that we do not make the same mistake again before we try to replace all the names?
    The reason that the EC number is there is so that you can sort the list according to EC number if you wish. Most of the information in the different columns of the list can also be found in the article. But it is hard to compare so many articles and proteins if not in table form somehow. (Here I think it is important to remember that the point of the column is not just an EC number, but a type of protein, so it could also be a TC number.)
    I am not sure what you mean by snippet of information in ' For ACOT6 we don't even get that snippet of information.' please elaborate?
    For the earlier examples you at least tell us that you think EC numbers may exist (though you don't say what they are). For ACOT6 (and others) all we get is a link to a general article about enzymes, though one can probably assume that anyone perusing the list is already familiar with the idea of an enzyme. Adding enzyme serves no purpose. Athel cb (talk) 15:55, 22 July 2023 (UTC)[reply]
    And no, nothing is really consistent yet; I can make that clear at the beginning of the article text if you want. But if you click the arrows at the top of the table in the name column you get it sorted in alphabetical order. The whole idea of a sortable table is that you can choose which column you sort on. I wanted to add all the things already on Wikipedia (and ideally also settle on a universally applicable naming convention) before I started strictly implementing the naming convention.
    About Collagen, I thought about this myself, and considered adding a column that somehow indicates how present it is in the human body, either as % of bodyweight or as % of the number of proteins present in the human body. But I failed to find any databases with extensive accessible data on this for a large number of proteins. If you know any, please link it. Then we can add such a column :)
    When you say that Phosphofructokinase is a protein that everyone has heard of, why is it that everyone has heard of it? Is it because of its function? Maybe it stands out on a certain parameter by which one could re-sort the list so that it would appear in the top rows?
    Why do you think that everyone has heard of phosphofructokinase whereas only a tiny minority of people have heard of 11β-hydroxysteroid dehydrogenase type 1? Athel cb (talk) 15:55, 22 July 2023 (UTC)[reply]
    What would be a good way to indicate that Hexokinase has four isoenzymes? Linking it to 4 EC numbers?
    It's not the only one that exists as isoenzymes. You need to check for all the others. It doesn't have four EC numbers, they're all the same, but there are four different proteins. Athel cb (talk) 15:59, 22 July 2023 (UTC)[reply]
    I will remove Cooperativity
    So you should, and you should search for other similar examples, such as Base excision repair, Bump and hole Athel cb (talk) 16:11, 22 July 2023 (UTC)[reply]
    Regular insulin is not a protein in the human body (unless it's been injected). There are probably other similar examples. Athel cb (talk) 16:11, 22 July 2023 (UTC)[reply]
    Major histocompatibility complex is a locus, not a protein. Athel cb (talk) 16:11, 22 July 2023 (UTC)[reply]
    Histones clumps together several different important proteins that are different from one another. Athel cb (talk) 16:11, 22 July 2023 (UTC)[reply]
    There is still a lot of work to do, no doubt. But I feel it could be worth doing, and that a lot of these issues do not only apply to this article/list but to how we go about proteins (especially in the human body) on Wikipedia in general, and an article/list might be a part of the solution.
    There is a tremendous amount of work to do, and you seem to have posted the article without doing more than about 5% of it. Unfortunately there is little reason to think you have the knowledge of proteins and biochemistry in general to do this. Athel cb (talk) 16:15, 22 July 2023 (UTC)[reply]
    A lot of these issues are ones that everyone will face when trying to answer protein questions with the wiki, yet it is unclear how apparent they are to us as a community.
    I highly appreciate that you took the time to look at the list, and hope to get more good feedback :) Claes Lindhardt (talk) 10:19, 22 July 2023 (UTC)[reply]
  • Comment Wouldn't it make sense to have a column for all the non-human species that each protein occurs in, so that researchers looking for animal models could use it to find potential matches?
  • Comment Is there a more intuitive way to make clear or visualize how much has been mapped vs. how much there is left to map? Like how many proteins are listed here vs. how many we think there are vs. how many have been observed and described.
  • Comment I am highly concerned that the creator of this page is now going on a crusade to add "See also: List of proteins in the human body" to the beginning of every single entry in the list, which I think is highly inappropriate. GraziePrego (talk) 13:18, 22 July 2023 (UTC)[reply]
    Agree. I've mass reverted them now. — kashmīrī TALK 14:04, 22 July 2023 (UTC)[reply]
    I did the same thing with List of distinct cell types in the adult human body; did I break some guideline there as well? And what guideline is it that this is against? And how does it contribute negatively to this discussion or the state of proteins on the wiki? Would it be better if I dug out the creator of each article on the list and wrote on their talk page? Claes Lindhardt (talk) 14:10, 22 July 2023 (UTC)[reply]
    This is a deletion discussion, it would be helpful for everyone if you did not reply under every comment here, especially with matters unrelated to this deletion discussion. Feel free to ask at Talk of the said article. — kashmīrī TALK 15:47, 22 July 2023 (UTC)[reply]
    Yes, I started to do the same thing, but I decided that there were more urgent things to do, so I left it for later. Athel cb (talk) 16:18, 22 July 2023 (UTC)[reply]
    I'm also concerned this whole debate is going off the rails, and it would certainly be useful to sort out the fate of this list before adding more links to it. On the subject of the current list being in a bad state, Athel cb, are you in favour of a terminal-for-ever delete on the grounds that the subject is non-encyclopaedic? Or a TNT delete on the grounds that it might be possible to handle the subject but the current article is so bad a starting point that one might as well start over?
    The present article is in such poor shape that I can't see how it could be salvaged without scrapping it and starting again. If someone who understands the difference between genes and proteins, and in general has a good knowledge of biochemistry (qualifications that are not evident at the moment), wants to create a new article that does it properly, then OK. Athel cb (talk) 15:47, 22 July 2023 (UTC)[reply]
    I should clarify, I think the concept of a list of proteins in the human body for which we have Wikipedia articles is a very valid subject for a navigational list article. I am not wedded to the current list.
    @GraziePrego:, I don't think extra columns for non-human species would work, because there are too many species.
    A comment on the various suggestions to merge to Seppi333's list, this is probably technically completely impossible. If a list is maintained by a bot, how is it going to handle introduction of human-written material? Also the bot-maintained list is a blatant (but in my view useful!) violation of WP:NOTDATABASE so it's on thin ice itself. If we merge the material there, it might well get deleted as soon as a hard-core deletionist notices the whole thing doesn't fit WP's policies. Elemimele (talk) 14:15, 22 July 2023 (UTC)[reply]
    So how would one go about creating a list that does not run into the problems that this one has run into? Is there a way of creating a template or a framework for such a list? I am very open to the idea of starting over, but if we do not come up with a concrete framework for how, I fear we might just have this discussion all over again on the new article/list. Claes Lindhardt (talk) 14:35, 22 July 2023 (UTC)[reply]
    It would also be really nice if someone could make a list of the problems with this list that seem unsolvable and one of those that seem solvable, so that we could start by trying to address the solvable ones before starting over. We could then have a discussion on how making a new list solves the problems that seem unsolvable here. Claes Lindhardt (talk) 14:39, 22 July 2023 (UTC)[reply]
    Please, would you kindly stop hijacking every single comment in this discussion? — kashmīrī TALK 15:48, 22 July 2023 (UTC)[reply]
    kashmiri, I hope you weren't referring to me, as I've written quite a bit here too. If so, I apologise. I don't want to do a take-over of this discussion. But I do think Claes Lindhardt's question is worth an answer. (1) The current table is a mess because it contains a hotch-potch mix of genes and proteins. The two are different. If this is to be truly a list of proteins, it should contain only proteins; (2) It's got a rather arbitrary collection of extra columns, some of which are going to be hard to reference; the first row illustrates the problem, where an enzyme is described as a fibrous protein, and we have a "function" column that can't say anything because if you go to the article about this protein, it starts by saying the physiological function is unclear; (3) the associated introductory text clearly implies that the ideal here would be a complete, database-like list, but since it's manually compiled, it's not likely to become complete. Those who want a database will be disappointed because it doesn't compare well to seppi's gene database; those who want an encyclopaedia-style navigational list with some introductory text will also be disappointed because it's not a great navigational aid in its current form. Elemimele (talk) 18:46, 22 July 2023 (UTC)[reply]
    Hi @Elemimele, apologies for not tagging the editor – I meant Claes Lindhardt, not you, as your comments here have obviously been very helpful. — kashmīrī TALK 09:34, 23 July 2023 (UTC)[reply]
  • Userfy/Draftify. The article in its current shape is absolutely unsuitable for mainspace (vide the issues brought up by others above), even as I see a potential for such a topic to be useful when developed properly, both content- and presentation-wise. Currently, neither content nor presentation meets a minimum acceptable quality. — kashmīrī TALK 09:54, 23 July 2023 (UTC)[reply]
  • Delete, for several reasons. First, it does not do anything better than a category does, and should remain a category. Second, the lists List of human protein-coding genes 1, List of human protein-coding genes 2, List of human protein-coding genes 3, and List of human protein-coding genes 4 already exist, and this list as it currently exists is treading the same ground over again. Third, this page is contaminated with entries that are not from humans, entries that are not proteins, and some that are neither human nor proteins. Fourth, it is not useful to navigate: to find a particular protein of interest, someone has to search the giant list. That's what Wikipedia's search is for; they can just type the protein name into the search bar at the top instead of looking at a list that contains much less information. Overall this page would need an absolutely staggering amount of work to contain what it intends to contain, and I believe even that will not serve any significant purpose. GraziePrego (talk) 12:54, 23 July 2023 (UTC)[reply]
  • Comment, I'm becoming concerned that this AfD discussion is the tip of an iceberg. Our current handling of proteins at a high level is awful. We have nice individual articles on individual proteins, and we have a handful of rather good navigational articles dealing with very small categories, such as Peptide hormone. This is a superb little article, exactly what any non-expert will need if they suddenly see the term in a book or newspaper and want to know more. But please take a look at List of types of proteins. If you feel that the current article has problems, the list of types of proteins is atrocious beyond wildest imagination, a collection of random unassigned quotes often on things that are just general cell biology with little special relevance to proteins (the cytosol is in the list of types of protein). It links to the marginally better List of proteins and List of enzymes but these are also both in a very sorry state. I wonder if we need a much broader delete-and-rethink on this. I will nominate list of types of proteins as it's a TNT case. Seppi's list is at least professional and in good condition. Is there anything that can be done to make the subject of human (and other) proteins accessible to the non-expert who isn't already familiar with the various gene database IDs etc.? Can we give the non-expert reader some sort of meaningful overview? How do we go about doing so? Elemimele (talk) 07:47, 24 July 2023 (UTC)[reply]
    What is the use case for finding proteins by searching through gigantic lists? I think GraziePrego's point above is worth repeating: "Fourth, it [a list] is not useful to navigate- to find a particular protein of interest." Far more practical is to use the Wikipedia search bar or Google, which frequently places a link to the relevant Wikipedia article at the top of the search results. The list examples given above I think are illustrative. Small lists tend to work much better than long lists. The problem with List of types of proteins is that there are many ways to classify proteins: by structure, by function, by cellular location, by tissue expression, by organism, etc. I do think that List of types of proteins is overly complex and could be significantly simplified to focus on the big picture. Also {{Protein topics}}, which is at the bottom of List of types of proteins, needs to be cleaned up. Finally there is partial overlap with List of proteins. I will work on these as I find time. Boghog (talk) 08:33, 24 July 2023 (UTC)[reply]
    @Boghog: Good question. I think a list is useful in navigation when it has a definite scope, and when the reader will not necessarily know the name of the protein for which they are searching. A perfect example is the Peptide hormone list that I mentioned above. It is quite likely that someone will be looking for human hormones, and will know that some are peptides, maybe know the names of one or two, and be wondering what others exist. Yes, categories are another way to do the job, but since a lot of readers don't really understand how categories work, these targeted list articles have value.

    I personally favour a two-pronged approach: (1) Seppi-style all-inclusive lists for the "pro" reader; (2) A hierarchy of (incomplete) navigational lists to help the general reader. List of types of proteins would work better if it were simply a list of other lists-and-articles. For example, it could point the reader at Enzyme (as an article on a major class of proteins), and also at List of enzymes and any other useful list articles we have that are confined to enzymes. In a separate section, it could point to any useful articles on protein/peptide hormones, or at structural proteins, etc.; in this way, it would provide an overview of the sorts of things proteins do, and access to all our articles that describe protein functions, and list specific proteins. It can, of course, also have links to useful categories, thereby drawing readers into the category system. The general-reader list articles don't need to be complete because they are only providing a "bigger-picture" overview of the subject for the general reader - they are not a substitute for a database or detailed seppi-style list, which will fill in the detail.

    I don't think very fact-filled lists on proteins work well; they overlap with the actual articles, they're harder to maintain, and they become enormous because of the sheer number of proteins in any organism. Elemimele (talk) 12:23, 24 July 2023 (UTC)[reply]
    That is exactly what I had in mind. A big-picture list-of-lists that starts out with a description of the various ways proteins are classified, which in turn links to more detailed sublists. Boghog (talk) 12:55, 24 July 2023 (UTC) This list of lists will not mention individual proteins, hence the size should be manageable. Boghog (talk) 17:59, 24 July 2023 (UTC)[reply]
    Give basic facts as a service to our readers. Save having to manually update the list by transclusion: ideally WP:LST so the list is updated automatically whenever a linked article's updated.—S Marshall T/C 14:36, 26 July 2023 (UTC)[reply]
  • Merge into separate sublists - definitely a useful topic, problem was that execution was wrong. I don't see the problem of a list with this length - if needed, non-notable proteins can be filtered out. Karnataka (talk) 20:03, 26 July 2023 (UTC)[reply]
  • Delete. Lists of such proteins can be retrieved by searching databases like UniProt. But having it as a WP page is hardly helpful. My very best wishes (talk) 22:17, 30 July 2023 (UTC)[reply]
  • (weak) delete. I don't know sh*t about proteins. From what I read here, it is a vast subject, and one that may get quite specialized very fast (I mean, beyond the scope of even a regular college educated person). It looks like someone is trying to create the "Protein Wiki" and I don't think we should aim at being the encyclopaedia of everything at every detail; we are a general encyclopaedia. Of "everything", but leave the specialised work for the specialists. I think Elemimele's question is a pertinent one: we probably need a better organization of the subject. Is this a good start (then keep), or something that will hinder a better solution (then delete)? In doubt, I'd delete to foment creation of a better solution. Say, we don't have a List of people (at the time of writing); it is a redirect to a section on lists about people at "List of Lists". Maybe this article should be that? A list of lists about proteins? - Nabla (talk) 17:24, 31 July 2023 (UTC)[reply]
The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made on the appropriate discussion page (such as the article's talk page or in a deletion review). No further edits should be made to this page.