Friday, 12 January 2024

SQL - Window Functions

PySpark window functions are useful when you want to examine relationships within groups of data rather than between groups. A window function performs statistical operations such as rank, row number, and running aggregates over a group, frame, or collection of rows, and returns a result for every row individually. Window functions are also widely used for data transformations.

There are three main types of window functions, one of each illustrated in the sketch after this list:

Analytical Function

Ranking Function

Aggregate Function
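
As a quick, hedged illustration, here is a minimal PySpark sketch applying one function from each category over the same window. The SparkSession setup and the toy employees data are assumptions invented for this example:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("window-demo").getOrCreate()

# Hypothetical toy data: (department, employee, salary)
df = spark.createDataFrame(
    [("sales", "ann", 3000), ("sales", "bob", 4000),
     ("hr", "carl", 3500), ("hr", "dana", 3800)],
    ["department", "employee", "salary"],
)

w = Window.partitionBy("department").orderBy("salary")

df.select(
    "department", "employee", "salary",
    F.cume_dist().over(w).alias("cume_dist"),      # analytical function
    F.rank().over(w).alias("rank"),                # ranking function
    F.sum("salary").over(w).alias("running_sum"),  # aggregate function
).show()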

With a window function such as lead(), a query can fetch data directly from the next row within the window, eliminating the explicit self join that would otherwise create a temporary copy of the table and join it back to itself (see the sketch below).
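A minimal sketch of that idea, again with hypothetical order data invented for the example:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Hypothetical orders data; the goal is to attach each order's
# *next* order amount without self-joining the table.
orders = spark.createDataFrame(
    [(1, "2024-01-01", 100), (1, "2024-01-05", 150), (2, "2024-01-02", 200)],
    ["customer_id", "order_date", "order_amount"],
)

w = Window.partitionBy("customer_id").orderBy("order_date")

# lead() reads straight from the following row inside the window.
orders.withColumn("next_order_amount", F.lead("order_amount").over(w)).show()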


What are Split, Apply, and Combine in window functions?

Although window functions are not usually described as "Split, Apply, Combine", these three concepts implicitly guide how they work:


1. Split:

Partitioning: Divides the data into groups based on specified criteria using the PARTITION BY clause. Each group serves as a distinct window for calculations.

Ordering: Arranges the rows within each partition using the ORDER BY clause. This ordering affects how window functions access and process data within the window.

2. Apply:

Window Function Application: The chosen window function(s) are applied to each row within its respective window, performing calculations or operations on the designated rows.

3. Combine:

Result Integration: The results of the window function calculations are attached to the output as new columns alongside the original data; unlike GROUP BY aggregation, no rows are collapsed.

Concise Output: The final result retains every original row while incorporating the values derived by the window functions.


SELECT customer_id, order_date, order_amount,
       SUM(order_amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS running_total
FROM orders;


Breakdown:

Split:

PARTITION BY customer_id: Splits data into groups based on customer ID.

ORDER BY order_date: Orders rows within each group by order date.

Apply:

SUM(order_amount) OVER (...): Calculates the running total of order amounts for each customer, applied to each row within its window.

Combine:

The original columns (customer_id, order_date, order_amount) are combined with the calculated running_total column, providing a comprehensive view of each order and its cumulative value within the customer's purchase history.
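
For completeness, a hedged PySpark equivalent of the SQL above, assuming the same orders table is available as a DataFrame (the sample rows are invented):

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for the orders table from the SQL example.
orders = spark.createDataFrame(
    [(1, "2024-01-01", 100), (1, "2024-01-05", 150), (2, "2024-01-02", 200)],
    ["customer_id", "order_date", "order_amount"],
)

# Split: partitionBy + orderBy; Apply: sum().over(w); Combine: withColumn.
w = Window.partitionBy("customer_id").orderBy("order_date")
orders.withColumn("running_total", F.sum("order_amount").over(w)).show()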

Thursday, 11 January 2024

Data Structures and Algorithms

Space and time complexity are fundamental measures of an algorithm or data structure's efficiency in terms of memory and execution time. They are crucial concepts in computer science, as they help developers choose the most suitable solutions for various programming tasks.

Here's a breakdown of how space and time complexity apply to different algorithms and data structures:

Algorithms:

  • Sorting algorithms (e.g., Bubble Sort, Merge Sort, Quick Sort): Time complexity is analyzed to compare their efficiency in sorting data.
  • Searching algorithms (e.g., Linear Search, Binary Search): Both time and space complexity are considered to assess their effectiveness in finding elements within data structures.
  • Graph algorithms (e.g., Depth-First Search, Breadth-First Search): Space complexity is often a key consideration due to the potential for large graph representations.
  • Recursive algorithms: Time complexity analysis includes evaluating the depth of recursion and the potential for overlapping subproblems (see the sketch after this list).
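
To make the recursion point concrete, here is a small illustrative Python sketch (not from the original notes) contrasting naive recursion with memoization:

from functools import lru_cache

def fib_naive(n: int) -> int:
    # Overlapping subproblems: O(2^n) time, O(n) stack space.
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    # Each subproblem is solved once: O(n) time, O(n) cache space.
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

print(fib_naive(20), fib_memo(20))  # both print 6765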

Data Structures:

  • Arrays: Time complexity for accessing elements is O(1), while space complexity is O(n) to store n elements.
  • Linked Lists: Insertion and deletion at a known node is O(1), although reaching that node by traversal costs O(n); space complexity is O(n) plus per-node pointer overhead.
  • Stacks and Queues: Time complexity for basic operations (push, pop, enqueue, dequeue) is typically O(1), and space complexity is O(n).
  • Trees: Time complexity varies with tree type and balance; a balanced binary search tree offers O(log n) search, insertion, and deletion, degrading to O(n) when unbalanced. Space complexity is usually O(n) for storing nodes.
  • Hash Tables: Average-case time complexity for insertion, deletion, and lookup is O(1), degrading toward O(n) under heavy collisions; space complexity is O(n). Several of these costs are illustrated in the sketch after this list.
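
A minimal Python sketch of the constant-time operations claimed above, using built-in structures (the sample values are arbitrary):

from collections import deque

arr = [10, 20, 30, 40]           # array-backed list: O(n) space
x = arr[2]                       # index access: O(1)

stack = []                       # list used as a stack
stack.append(5)                  # push: O(1) amortized
top = stack.pop()                # pop: O(1)

queue = deque()                  # deque used as a queue
queue.append("job1")             # enqueue: O(1)
first = queue.popleft()          # dequeue: O(1)

table = {"alice": 1, "bob": 2}   # hash table: O(n) space
v = table["alice"]               # average-case lookup: O(1)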

General Guidelines:

  • Time complexity is commonly expressed using Big O notation, indicating how the algorithm's execution time scales with input size (see the example after this list).
  • Space complexity considers the amount of memory required for data storage and algorithm execution.
  • Optimal algorithms aim for low time and space complexity to achieve efficient performance and resource usage.
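
For instance, the gap between O(n) and O(log n) shows up when contrasting linear and binary search; a sketch in plain Python (the function names are my own):

from bisect import bisect_left

def linear_search(items, target):
    # O(n): may scan every element.
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1

def binary_search(sorted_items, target):
    # O(log n): halves the search space each step (input must be sorted).
    i = bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1

data = list(range(0, 1_000_000, 2))
print(linear_search(data, 999_998), binary_search(data, 999_998))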

Remember that analyzing space and time complexity is essential for:

  • Algorithm selection: Choose the most efficient algorithm for a given task based on its resource requirements.
  • Code optimization: Identify potential bottlenecks and improve code performance.
  • Scalability: Ensure algorithms and data structures can handle growing input sizes without significant performance degradation.
