I have worked mostly with Python and use pandas extensively for data manipulation. Take this dataframe, for example:
| Name  | Income (in thousands) | Gender |
|-------|-----------------------|--------|
| John  | 19                    | Male   |
| Jane  | 21                    | Female |
| Jonas | 18                    | Male   |
| Jules | 25                    | Female |
To see the summary statistics for each class (for example, by Gender), I can simply use a pandas function.
But how do you do it in R? I have seen a few solutions on the internet, but none of them is particularly readable or maintainable. Please provide code examples.
To get the exact result pandas gave you by default, you would have to pass four separate expressions as arguments to summarize. There is also a separate function for when you want to calculate a statistic (or even several) over the entire dataframe.
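As a rough sketch of both ideas, here is one way they could look with dplyr. Several things are assumptions rather than part of the answer: the column name `Income`, the four statistics chosen for the grouped summary, and the use of `across()` (dplyr 1.0 or later) for the whole-dataframe case, which may not be the function meant above.

```r
library(dplyr)

# Example data from the question. The column name `Income` is an assumption;
# values are in thousands.
df <- data.frame(
  Name   = c("John", "Jane", "Jonas", "Jules"),
  Income = c(19, 21, 18, 25),
  Gender = c("Male", "Female", "Male", "Female")
)

# Grouped summary: one named expression per statistic.
# The four statistics here are illustrative choices.
by_gender <- df %>%
  group_by(Gender) %>%
  summarize(
    mean_income = mean(Income),
    sd_income   = sd(Income),
    min_income  = min(Income),
    max_income  = max(Income)
  )

# One statistic applied to every numeric column of the whole dataframe.
overall <- df %>%
  summarize(across(where(is.numeric), mean))

print(by_gender)
print(overall)
```

Each argument to summarize becomes one column in the result, so adding or removing a statistic is a one-line change.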
The %>% is called the pipe operator and works similarly to how you would expect the . (dot) operator to work when chaining methods in pandas. The pipe passes the thing on its left as the first argument to the thing on its right.
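To make that concrete, here is a small sketch showing that a piped call and the equivalent nested call give the same result. The vector `incomes` and the variable names are just illustrative; the pipe itself comes from magrittr and is available once dplyr is loaded.

```r
library(dplyr)  # loading dplyr makes %>% available

# Incomes from the question, in thousands (illustrative vector).
incomes <- c(19, 21, 18, 25)

# These two calls are equivalent: the pipe supplies `incomes` as the
# first argument of mean().
piped  <- incomes %>% mean()
direct <- mean(incomes)

# Chained pipes read left to right, much like chained pandas methods,
# whereas the nested form reads inside out.
chained <- incomes %>% sort() %>% head(2)
nested  <- head(sort(incomes), 2)

print(identical(piped, direct))
print(identical(chained, nested))
```

Extra arguments go in the call on the right as usual, so `x %>% head(2)` means `head(x, 2)`.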
Also, RStudio has fantastic cheat sheets for many R data science libraries, which you can use as a reference for the kinds of functions these libraries provide.