Pandas is a popular open-source Python library that is widely used in data science. It can express most SQL-style operations, including SELECT, WHERE, GROUP BY, JOIN, UNION, UPDATE, and DELETE, and it is very helpful when working with large data sets that fit in memory.
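To make the SQL comparison concrete, here is a minimal sketch of how a few of those operations might look in pandas. The `orders` and `customers` tables, their columns, and their values are hypothetical and invented purely for illustration; the SQL in the comments is a rough equivalent, not an exact translation.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration only.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 10, 20, 30],
    "amount": [50.0, 20.0, 70.0, 10.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 20, 30],
    "name": ["Ann", "Bob", "Cy"],
})

# SELECT order_id, amount FROM orders WHERE amount > 15
filtered = orders.loc[orders["amount"] > 15, ["order_id", "amount"]]

# SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id
totals = orders.groupby("customer_id", as_index=False).agg(total=("amount", "sum"))

# SELECT * FROM orders JOIN customers USING (customer_id)
joined = orders.merge(customers, on="customer_id", how="inner")

# SELECT * FROM orders UNION ALL SELECT * FROM orders
stacked = pd.concat([orders, orders], ignore_index=True)

# UPDATE orders SET amount = amount * 2 WHERE customer_id = 20
# (done on a copy so the original frame stays untouched)
updated = orders.copy()
updated.loc[updated["customer_id"] == 20, "amount"] *= 2

# DELETE FROM orders WHERE amount < 15
remaining = orders[orders["amount"] >= 15]
```

Note that every step returns a new data frame (or modifies an explicit copy), so intermediate results can be inspected freely, which is part of what makes pandas convenient for exploratory work.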
PostgreSQL outperforms pandas on standalone operations such as filter, group by, sort, and join, but on complex real-world queries pandas is often much faster. One reason is that a database also provides features such as concurrency control, locking, and indexing, and that overhead costs execution time. PostgreSQL is also not designed for certain kinds of data transformation: calculations over sets of rows, such as weighted moving averages and pivot tables, along with string manipulation and regex work, are awkward to express in SQL. For complex reports of this kind, pandas data frames are usually the more practical choice.
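As a rough illustration of the transformations mentioned above, the following sketch computes a weighted moving average, splits a string column with a regex, and builds a pivot table. The `sales` frame, its column names, the SKU format, and the 1-2-3 weights are all assumptions made up for this example, not data from any benchmark.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this illustration only.
sales = pd.DataFrame({
    "region": ["east", "east", "west", "west", "east", "west"],
    "month": ["2023-01", "2023-02", "2023-01", "2023-02", "2023-03", "2023-03"],
    "sku": ["AB-101", "AB-102", "CD-201", "CD-202", "AB-103", "CD-203"],
    "revenue": [100.0, 120.0, 80.0, 90.0, 150.0, 60.0],
})

# Weighted moving average over a 3-row window; the most recent row
# gets the largest weight (assumed weights: 1, 2, 3).
weights = np.array([1.0, 2.0, 3.0])

def weighted_mean(window):
    """Return the weighted mean of one rolling window (a NumPy array)."""
    return np.dot(window, weights) / weights.sum()

east = sales[sales["region"] == "east"].sort_values("month")
east_wma = east["revenue"].rolling(window=3).apply(weighted_mean, raw=True)

# Regex: split an assumed "PREFIX-NUMBER" SKU format into two named columns.
sku_parts = sales["sku"].str.extract(r"^(?P<prefix>[A-Z]+)-(?P<number>\d+)$")

# Pivot table: one row per region, one column per month, summed revenue.
revenue_pivot = sales.pivot_table(
    index="region", columns="month", values="revenue", aggfunc="sum"
)
```

The first two values of `east_wma` are NaN because the window is not yet full; passing `min_periods` to `rolling` would change that, but the weight vector would then need to be trimmed to match the shorter windows.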
Let me know of other Python libraries that you have found helpful in optimizing performance.