In our work, we often need to compute aggregations over time or other dimensions.

Problem 1: User logins to a website, activity duration statistics

Suppose there is a DataFrame of user logins to a website, for instance:

+----------------+----------+
|       user_name|login_date|
+----------------+----------+
|smith           |2020-01-04|
|bob             |2020-01-04|
|bob             |2020-01-06|
|john            |2020-01-10|
|smith           |2020-01-11|
+----------------+----------+
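
To follow along, the sample data above can be rebuilt locally. The snippet below is a minimal sketch that assumes PySpark, since the text does not say which Spark language binding is used. The names `LOGIN_ROWS` and `build_login_df` are hypothetical helpers introduced here for illustration, and storing `login_date` as a `date` column rather than a string is also an assumption.

```python
from datetime import date

# Sample rows copied from the table above: (user_name, login_date).
LOGIN_ROWS = [
    ("smith", date(2020, 1, 4)),
    ("bob", date(2020, 1, 4)),
    ("bob", date(2020, 1, 6)),
    ("john", date(2020, 1, 10)),
    ("smith", date(2020, 1, 11)),
]


def build_login_df(spark):
    """Create the sample login DataFrame on an existing SparkSession.

    `spark` is expected to be a pyspark.sql.SparkSession; the schema is
    given as a DDL string so that login_date is typed as a date
    (an assumption, the source does not state the column types).
    """
    return spark.createDataFrame(
        LOGIN_ROWS,
        schema="user_name string, login_date date",
    )
```

With a `SparkSession` available as `spark`, calling `build_login_df(spark).show()` should display these five rows. Keeping `login_date` as a real date type tends to make later date-based aggregation simpler than working with strings.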