2 min read

Standing on the Shoulders of Giants

The mathematics geneology project is a nice side-project of the American Mathematical Society - record who supervised who for their PhD Thesis. Except the modern form of the PhD of ~3 years (or more…) with a clear end product and a clearly defined supervisor (or several) is quite modern and the definitions are very loose when you go backwards.

I’ve got myself added to their database - me, and asked the rvest package to trace my ‘ancestors’.

This has gone wrong a few times. I’ve settled on storing everything in a sqlite database, and I’ve pin-ed it so it doesn’t get lost.

I should connect my pinboard to my nextcloud for cross-device sync and getting into the daily remote backup.

connect <- function(){
  con <-  DBI::dbConnect(RSQLite::SQLite(), dbname = pins::pin_get("maths-geneology"))
}
disconnect <- function(con){
  DBI::dbDisconnect(con)
}

con <- connect()

relationships <- tbl(con, "relationship") %>% 
  rename(to=id_supervisor, from=id) %>% 
  collect() 

JR_ancestors <- as_tbl_graph(relationships) %>% 
  activate(nodes) %>% 
  arrange(name != "230731") %>% #Ugly hack I've used before to make a specific item 1st...
  mutate(distance = node_distance_from(1)) %>% 
  filter(is.finite(distance)) %>% 
  mutate(id=as.integer(name)) %>% 
  select(-name) %>% 
  left_join( tbl(con, "researcher"), copy=TRUE)
set_graph_style(plot_margin = margin(1,1,1,1))
ggraph(JR_ancestors, "tree") + 
  geom_edge_diagonal(aes(start_cap = label_rect(node1.name),
                     end_cap = label_rect(node2.name)), strength=0.5) +
  geom_node_label(aes(label=name)) 


relationships <- tbl(con, "relationship") %>% 
  rename(from=id_supervisor, to=id) %>% 
  collect() 

descendants <- function(id_number){
  descendants <- as_tbl_graph(relationships) %>% 
    activate(nodes) %>%
    mutate(id=as.integer(name)) %>% 
    select(-name) %>% 
    arrange(id != id_number) %>% #Ugly hack I've used before to make a specific item 1st...
    mutate(distance = node_distance_from(1)) %>% 
    filter(is.finite(distance)) %>% 
    left_join( tbl(con, "researcher"), copy=TRUE)
}
John  <- descendants(82577L)
Barry <- descendants(80788L)

Looking at the maths descendants of two of my supervisors, a coord_flip makes sense with how many people they directly supervised.

ggraph(John, "tree") + 
  geom_edge_diagonal(aes(start_cap = label_rect(node1.name),
                     end_cap = label_rect(node2.name)), strength=0.5) +
  geom_node_label(aes(label=name)) + coord_flip()

ggraph(Barry) + 
  geom_edge_diagonal(aes(start_cap = label_rect(node1.name),
                     end_cap = label_rect(node2.name)), strength=0.5) +
  geom_node_label(aes(label=name)) + coord_flip()