At the heart of computational proteomics lies the search engine. Considered
a gold standard in the proteomics field, search engines are ubiquitous pieces of
software and a structural pillar of any modern analysis where one needs to
map spectra from mass spectrometers to known protein sequences.
The concept of database searching using spectra and protein databases is still
fairly new. The idea was first introduced by Jimmy Eng, Ashley McCormack, and
John R. Yates in 1994. Almost 30 years later, users can choose among several
implementations, including a new generation of tools that index the data in
such a way that searches can be done without restricting the precursor masses.
Having said that, I decided to look at the citation profile of the most popular
search engines. The first thing we need is a list of the most important tools,
along with the Google Scholar IDs of each paper and of one of its authors. We'll
feed this information to the {{scholar}} R package, which will return a data
frame containing their respective citation histories.
Authors and papers
First we need to search Google Scholar and look up the ID of each paper and of
at least one of its authors. This part is quite manual, so I had to spend some
time looking for the information.
Software name (author ID : publication ID)
- Sequest (FgWHqtYAAAAJ:u-x6o8ySG0sC)
- ProLuCID (mrXa9nIAAAAJ:blknAaTinKkC)
- Tandem ( ? )
- Comet (FgWHqtYAAAAJ:V3AGJWp-ZtQC)
- MSAmanda (CvSqcXkAAAAJ:IjCSPb-OGe4C)
- MASCOT ( ? )
- MetaMorpheus (OLAeibkAAAAJ:rO6llkc54NcC)
- pFind ( ? )
- pFind 2.0 ( ? )
- Andromeda (kyMcGcIAAAAJ:maZDTaKrznsC)
- MSFragger (iSLx0mgAAAAJ:vV6vV6tmYwMC)
Next, we fetch the data for each tool that has both IDs and merge it all into
a single data frame.
# load the packages used below
library(scholar)
library(dplyr)
library(ggplot2)
# fetch the citation history from Google Scholar
# Sequest
pubs <- get_article_cite_history('FgWHqtYAAAAJ', 'u-x6o8ySG0sC')
# ProLuCID
pubs <- rbind(pubs, get_article_cite_history('mrXa9nIAAAAJ', 'blknAaTinKkC'))
# Comet
pubs <- rbind(pubs, get_article_cite_history('FgWHqtYAAAAJ', 'V3AGJWp-ZtQC'))
# MSAmanda
pubs <- rbind(pubs, get_article_cite_history('CvSqcXkAAAAJ', 'IjCSPb-OGe4C'))
# MetaMorpheus
pubs <- rbind(pubs, get_article_cite_history('OLAeibkAAAAJ', 'rO6llkc54NcC'))
# Andromeda
pubs <- rbind(pubs, get_article_cite_history('kyMcGcIAAAAJ', 'maZDTaKrznsC'))
# MSFragger
pubs <- rbind(pubs, get_article_cite_history('iSLx0mgAAAAJ', 'vV6vV6tmYwMC'))
# remove the year of 2022 from the data frame
pubs <- pubs %>%
  filter(year < 2022)
# replace the publication IDs by the tool names
pubs <- pubs %>%
  mutate(pubid = replace(pubid, pubid == 'u-x6o8ySG0sC', 'Sequest')) %>%
  mutate(pubid = replace(pubid, pubid == 'blknAaTinKkC', 'ProLuCID')) %>%
  mutate(pubid = replace(pubid, pubid == 'V3AGJWp-ZtQC', 'Comet')) %>%
  mutate(pubid = replace(pubid, pubid == 'IjCSPb-OGe4C', 'MSAmanda')) %>%
  mutate(pubid = replace(pubid, pubid == 'rO6llkc54NcC', 'MetaMorpheus')) %>%
  mutate(pubid = replace(pubid, pubid == 'maZDTaKrznsC', 'Andromeda')) %>%
  mutate(pubid = replace(pubid, pubid == 'vV6vV6tmYwMC', 'MSFragger'))
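# plot the citations per year, one line per tool
ggplot(pubs, aes(x = year, y = cites, color = pubid)) +
  geom_line() +
  labs(x = 'Year', y = 'Citations', color = 'Search engine')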
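The fetch-and-relabel steps above repeat the same call once per tool, so they can also be written as a loop over a small lookup table. What follows is only a sketch of that idea, not the code used for this post: `fetch_citation_histories` and `fake_fetch` are hypothetical helpers, and the `tool` column is an addition that leaves `pubid` untouched. The `fetch` argument exists only so the function can be tried offline; by default it calls `get_article_cite_history` from the scholar package.

```r
# Tool name, author ID and publication ID, copied from the list above
tools <- data.frame(
  tool   = c("Sequest", "ProLuCID", "Comet", "MSAmanda",
             "MetaMorpheus", "Andromeda", "MSFragger"),
  author = c("FgWHqtYAAAAJ", "mrXa9nIAAAAJ", "FgWHqtYAAAAJ", "CvSqcXkAAAAJ",
             "OLAeibkAAAAJ", "kyMcGcIAAAAJ", "iSLx0mgAAAAJ"),
  pubid  = c("u-x6o8ySG0sC", "blknAaTinKkC", "V3AGJWp-ZtQC", "IjCSPb-OGe4C",
             "rO6llkc54NcC", "maZDTaKrznsC", "vV6vV6tmYwMC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fetch one citation history per row of `tools`,
# tag it with the tool name, and stack everything into one data frame.
fetch_citation_histories <- function(tools,
                                     fetch = scholar::get_article_cite_history) {
  histories <- lapply(seq_len(nrow(tools)), function(i) {
    history <- fetch(tools$author[i], tools$pubid[i])
    history$tool <- tools$tool[i]
    history
  })
  do.call(rbind, histories)
}

# Offline stand-in with made-up numbers, only to show the shape of the result
fake_fetch <- function(id, article) {
  data.frame(year = c(2020, 2021, 2022), cites = c(1, 2, 3),
             pubid = article, stringsAsFactors = FALSE)
}

demo <- fetch_citation_histories(tools, fetch = fake_fetch)
# same filter as above: keep only the years before 2022
demo <- demo[demo$year < 2022, ]
```

With the scholar package installed and network access, `pubs <- fetch_citation_histories(tools)` should build the same kind of data frame as the step-by-step version, and the plot can then use `color = tool` instead of `color = pubid`. Adding a new search engine becomes a matter of adding one row to `tools`.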
