Bookmarks
A collection of helpful resources that I might want to find again in the future or point other people towards.
Advice: Faculty
- Thread by Sarah Sheffield (@sarahsheffield) with advice for new faculty.
- Course Free 10-week course on Google Drive about writing an academic syllabus. Course by @DLabree, recommended by Elena Aydarova (@aydarova).
- Tweet by Gero Grams (@GeroGrams) on how to say no to things
- Thread by Mine Dogucu (@MineDogucu) on learning/teaching (with) R. See also:
- Thread by Wes Kao (@wes_kao) on managing up
- Thread by Vrinda Nair (@VnVrinda) on 22 tools for your PhD Journey
- Managing research careers tool by Edinburgh University: Thread and Webpage
- Talk by Laura Albert (@lauraalbertphd) on time management: Do less, Do it faster, Do it at the right time.
Advice: Graduate Students
- Thread by Matt Betts on 8 questions to consider before starting a PhD.
- Thread by Maram Duncan (@MCDuncanLab) on the hidden curriculum for new grad students
- Thread by @LifeAfterMyPhD on 5 low-stakes steps to set yourself up for an industry job search
- Thread by @mathladyhazel on the best math books for self-learners
- Thread by Alex Eble (@alexeble) with advice for thriving in a PhD. Full document here. Note: has a US and economics focus, but translates well.
- Website Stats Notes in the British Medical Journal. Like a dictionary but for stats words and methods.
- Book Essential Math for Data Science by Thomas Nield. Recommended by Vicki Boykis for those looking for an intro/refresher on linear algebra, probability and statistics.
Agent based modelling
- Paper review of agent-based models (preprint of JEL - chunky at 90 pages!)
Analysis and Asymptotics
- Video Lectures by Steven Strogatz (@stevenstrogatz) on asymptotics and perturbation methods.
Causal Inference
- Lecture Notes by Matt Blackwell, “Causal Inference with Applications”
- Paper The taboo against explicit causal inference in nonexperimental psychology. Suggested by Brian Nosek (@BrianNosek)
- Thread by Volodymyr Kuleshov (@volokuleshov) about the ICML 2022 tutorial on Causality and Fairness
- Video Science before statistics: Causal Inference by Richard McElreath (3 hour crash course in causal inference)
- Video Series Statistical Rethinking (2022) by Richard McElreath on Youtube
Coding (General)
Short-form
- Article on setting up a private .gitignore to keep a clean codebase
- Book / website on package development in Python. (Think Hadley & Bryan’s R Packages but for Python.) Recommended by @EmilyRiederer
- Book The missing readme by Chris Riccomini and Dmitriy Ryaboy
- Paper Ten Simple Rules for Taking Advantage of Git and GitHub
- Slides by Ariel Muldoon (@aosmith16) on “More git and github - collaborators, merge conflicts and pull requests”
- Docs for {renv}
Long-form
Docker
Databases
- Article by architecture notes (@arcnotes) on “Things you should know about databases”
Datasets
- Thread by R Ladies - Sources of Messy(ish) data
Data Ethics
The Verge - AI drug development makes chemical weapons
Richard McElreath recommendation of paper by Xiao-Li Meng on how data quality influences effective sample size
Imperial Explainable AI Seminars
Thread by Santiago (@svpino) on imbalanced datasets
Tweet by Adam Kruchten (@AdamKruchten) on when we care about marginal or conditional effects.
ASA article on the 2020 work and salary survey, showing that women tend to earn less in base salary and total income but that, in a regression, gender is not a significant predictor of total income.
NPR story where Google finds that men are underpaid
Thread by Paul Hunermund (@PHunermund) on the above Google article.
Preprint on reconstructing large portions of training data from a trained neural network
Paper Do Datasets Have Politics? Disciplinary Values in Computer Vision Dataset Development. Suggested by Abeba Birhane (@Abebab)
Paper Big data loses to good data. Unrepresentative big surveys significantly overestimated US vaccine uptake.
Paper On extending LinearSHAP, TreeSHAP and DeepSHAP to RKHS-SHAP. Found through tweet by Siu Lun Chau (@Chau9991).
Chapter 33 on interpretability of book Probabilistic Machine Learning: Advanced Topics by Kevin Murphy (@sirbayes), Been Kim (@_beenkim) and others.
Book by Claire McKay Bowen “Protecting your privacy in a data-driven world”
Tweet by Rasha Shrain (@rashaben) requesting reading materials on p-values and p-hacking
Course 12-week reading course on Ethics and Data Science by Rohan Alexander
Data Visualisation
- Thread by Indrajeet Patil (@patilindrajeets) on the effective use of colours in data vis.
- R Package {performance} for aesthetically pleasing ggplot-style diagnostic plots. (The qq plot even has tolerance intervals!)
- Blog Post by Thomas Mock on Creating and using custom ggplot2 themes
- Blog Posts by Amelia McNamara (@AmeliaMN) about Histograms and Kernel Density Estimation
History of Statistics
- Course 10-week reading list on “History of Statistics and Data Sciences” by Rohan Alexander
Markdown
Memes
- Status code 400 meme by @da_667
- The binary search tree actually exists by Ahmad Awais (@MrAhmadAwais)
- Data Science Dinosaur A computer science python eating a statistics elephant
Science Communication
- Thread by Carl Bergstrom (@CT_Bergstrom) and Ryan McGee (@RS_McGee) telling the story of a paper using a comic strip and stick-figure Darwin.
- Thread by Tessa Davis on slide design to keep your audience engaged
- Thread by Dorsa Amir (@DorsaAmir) on slide design
- Website OpenPeeps - Open Source hand drawn individual characters
- Blog Post by Kate Jolly (@katejolly6) on designing slides in xaringan with xaringanthemer and CSS.
Machine Learning
- Blog Post on double descent in neural network performance
Optimisation
- Video by Trefor Bazett (@TreforBazett) on using Lagrange multipliers to solve constrained optimisation problems
- Video Series by @3blue1brown on constrained optimisation (hosted on khan academy)
- Course Notes Advanced Data Analysis from an Elementary Point of View
Parquet
- Thread by Pau Labarta Bajo (@paulabartabajo_)
Point processes
- Course Material by Rick Schoenberg on point process models
- Blog Post by Benjamin Cretois on fitting point process models in Stan.
Professional Development
- Tweet by Francisco Yirá (@francisco_yira) about designing a personal learning plan.
Quarto
SQL
Thread by Tom Carpenter (@tcarpenter216) on translating dplyr skills to SQL
Online resources for learning SQL, as recommended by Ijeoma Okereafor (@MeetIjeoma)
- http://sqlbolt.com
- http://w3schools.com/sql
- http://mode.com/sql-tutorial
- http://sqlteaching.com
- http://SQLZoo.net
- http://selectstarsql.com
- https://pgexercises.com
SQL games recommended by Vikas Rajputin (@vikasrajputin)
- [SQL Island](https://sql-island.informatik.uni-kl.de/) (In German, but Chrome translation is pretty good)
- [SQL Murder Mystery](https://mystery.knightlab.com/)
- [SQL Police Department](https://sqlpd.com/)
Workflow
Writing
- Books on writing suggested by Helen Sword, author of Stylish Academic Writing.