The road goes ever on and on

Posted by Felipe on Thursday, April 14, 2022

At the heart of computational proteomics lies the search engine. Considered the gold standard in the proteomics field, search engines are ubiquitous pieces of software and a structural pillar of any modern analysis that needs to map spectra from mass spectrometers to known protein sequences.

The concept of searching spectra against protein databases is still fairly new. The idea was first introduced by Jimmy Eng, Ashley McCormack, and John R. Yates in 1994. Almost 30 years later, users have several implementations to choose from, including a recent generation of tools that index the data in such a way that searches can be run without restricting the precursor masses.

With that in mind, I decided to look at the citation profiles of the most popular search engines. First, we need to list the most important tools, along with the Google Scholar IDs of each paper and of one of its authors. We'll feed this information to the {{scholar}} R package, which returns a data frame containing the citation history of each paper.

Authors and papers

First, we need to search Google Scholar for the paper IDs and the ID of at least one author of each paper. This part is quite manual, so I had to spend some time looking up the information.

Software name (author ID : publication ID)

  • Sequest (FgWHqtYAAAAJ:u-x6o8ySG0sC)
  • ProLuCID (mrXa9nIAAAAJ:blknAaTinKkC)
  • Tandem ( ? )
  • Comet (FgWHqtYAAAAJ:V3AGJWp-ZtQC)
  • MSAmanda (CvSqcXkAAAAJ:IjCSPb-OGe4C)
  • MASCOT ( ? )
  • MetaMorpheus (OLAeibkAAAAJ:rO6llkc54NcC)
  • pFind ( ? )
  • pFind 2.0 ( ? )
  • Andromeda (kyMcGcIAAAAJ:maZDTaKrznsC)
  • MSFragger (iSLx0mgAAAAJ:vV6vV6tmYwMC)
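Since each entry above is just an author ID and a publication ID joined by a colon, the list can also be kept as a small lookup table in R. The sketch below assumes the "author ID:publication ID" format shown above; `parse_scholar_ids()`, `tool_ids`, and `tools` are hypothetical names introduced for illustration, and the tools without IDs are simply left out.

```r
# Hypothetical helper (not part of {scholar}): split "author ID:publication ID"
# strings into a data frame with one row per tool.
parse_scholar_ids <- function(ids) {
  parts <- strsplit(ids, ":", fixed = TRUE)
  data.frame(
    tool      = names(ids),
    author_id = vapply(parts, `[`, character(1), 1, USE.NAMES = FALSE),
    pub_id    = vapply(parts, `[`, character(1), 2, USE.NAMES = FALSE),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# IDs copied from the list above; tools marked with ( ? ) are omitted
tool_ids <- c(
  Sequest      = "FgWHqtYAAAAJ:u-x6o8ySG0sC",
  ProLuCID     = "mrXa9nIAAAAJ:blknAaTinKkC",
  Comet        = "FgWHqtYAAAAJ:V3AGJWp-ZtQC",
  MSAmanda     = "CvSqcXkAAAAJ:IjCSPb-OGe4C",
  MetaMorpheus = "OLAeibkAAAAJ:rO6llkc54NcC",
  Andromeda    = "kyMcGcIAAAAJ:maZDTaKrznsC",
  MSFragger    = "iSLx0mgAAAAJ:vV6vV6tmYwMC"
)

tools <- parse_scholar_ids(tool_ids)
print(tools)
```

Keeping the tool name in the same row as its IDs means the name never has to be matched back to a publication ID later on.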

Next, we fetch the data for each tool and merge everything into a single data frame. The tools marked with ( ? ) have no IDs, so they are left out.

# load the packages used below
library(scholar)
library(dplyr)
library(ggplot2)

# fetch the citation history from Google Scholar

# Sequest
pubs <- get_article_cite_history('FgWHqtYAAAAJ', 'u-x6o8ySG0sC')

# ProLuCID
pubs <- rbind(pubs, get_article_cite_history('mrXa9nIAAAAJ', 'blknAaTinKkC'))

# Comet
pubs <- rbind(pubs, get_article_cite_history('FgWHqtYAAAAJ', 'V3AGJWp-ZtQC'))

# MSAmanda
pubs <- rbind(pubs, get_article_cite_history('CvSqcXkAAAAJ', 'IjCSPb-OGe4C'))

# MetaMorpheus
pubs <- rbind(pubs, get_article_cite_history('OLAeibkAAAAJ', 'rO6llkc54NcC'))

# Andromeda
pubs <- rbind(pubs, get_article_cite_history('kyMcGcIAAAAJ', 'maZDTaKrznsC'))

# MSFragger
pubs <- rbind(pubs, get_article_cite_history('iSLx0mgAAAAJ', 'vV6vV6tmYwMC'))

# drop 2022, since that year is still incomplete at the time of writing
pubs <- pubs %>% 
  filter(year < 2022)

# replace the publication IDs with the tool names
pubs <- pubs %>%
  mutate(pubid = replace(pubid, pubid == 'u-x6o8ySG0sC', 'Sequest')) %>%
  mutate(pubid = replace(pubid, pubid == 'blknAaTinKkC', 'ProLuCID')) %>% 
  mutate(pubid = replace(pubid, pubid == 'V3AGJWp-ZtQC', 'Comet')) %>% 
  mutate(pubid = replace(pubid, pubid == 'IjCSPb-OGe4C', 'MSAmanda')) %>% 
  mutate(pubid = replace(pubid, pubid == 'rO6llkc54NcC', 'MetaMorpheus')) %>% 
  mutate(pubid = replace(pubid, pubid == 'maZDTaKrznsC', 'Andromeda')) %>% 
  mutate(pubid = replace(pubid, pubid == 'vV6vV6tmYwMC', 'MSFragger'))

# plot the yearly citation counts of each tool
ggplot(pubs, aes(x = year, y = cites, color = pubid)) +
  geom_line() +
  labs(x = 'Year', y = 'Citations', color = 'Search engine')
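The repeated `rbind()` calls and the chain of `replace()` calls above can also be written as a single loop over a lookup table of tools and IDs. The sketch below is one way to do it, not the approach used in this post: `fetch_cite_histories()` is a hypothetical helper, the `tool` column it adds is an assumption that takes the place of relabelling `pubid`, and the `pause` between requests is an assumed precaution, since Google Scholar may block rapid automated queries. The fetcher is passed in as an argument so the loop can be exercised without network access.

```r
# Hypothetical helper: fetch the citation history of every tool in a lookup
# table (columns: tool, author_id, pub_id) and stack the results.
# `fetch` defaults to scholar::get_article_cite_history(); `pause` is the
# assumed number of seconds to wait between requests.
fetch_cite_histories <- function(tools,
                                 fetch = scholar::get_article_cite_history,
                                 pause = 1) {
  histories <- lapply(seq_len(nrow(tools)), function(i) {
    if (i > 1) Sys.sleep(pause)
    history <- fetch(tools$author_id[i], tools$pub_id[i])
    history$tool <- tools$tool[i]  # label each row with the tool name
    history
  })
  do.call(rbind, histories)
}

# Example lookup table with two of the tools listed above
tools <- data.frame(
  tool      = c("Sequest", "Comet"),
  author_id = c("FgWHqtYAAAAJ", "FgWHqtYAAAAJ"),
  pub_id    = c("u-x6o8ySG0sC", "V3AGJWp-ZtQC"),
  stringsAsFactors = FALSE
)

# With network access (and {scholar} installed):
# pubs <- fetch_cite_histories(tools)
# pubs <- pubs[pubs$year < 2022, ]
```

Because the tool name travels with each request, adding a new search engine only means adding a row to the table.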