Friday, 12 January 2024

SQL -

 Window Functions - 

Window function:

pySpark window functions are useful when you want to examine relationships within group of data rather than between group of data. It performs statictical operations like below explained.PySpark Window function performs statistical operations such as rank, row number, etc. on a group, frame, or collection of rows and returns results for each row individually. It is also popularly growing to perform data transformations.

There are mainly three types of Window function:

Analytical Function

Ranking Function

Aggregate Function

By using Window function, the query directly fetches data from the next row within the window, eliminating the need for an explicit self join that would create a temporary copy of the table and join it back to itself.


What is Split , apply and combine in window functions?

While not explicitly named as "Split, Apply, Combine" in the context of window functions, these concepts implicitly guide their usage:


1. Split:

Partitioning: Divides the data into groups based on specified criteria using the PARTITION BY clause. Each group serves as a distinct window for calculations.

Ordering: Arranges the rows within each partition using the ORDER BY clause. This ordering affects how window functions access and process data within the window.

2. Apply:

Window Function Application: The chosen window function(s) are applied to each row within its respective window, performing calculations or operations on the designated rows.

3. Combine:

Result Integration: The results of the window function calculations are seamlessly integrated into the output, generating new columns alongside original data without aggregation.

Concise Output: The final result retains all original rows while incorporating the derived values from the window functions, providing a comprehensive view of the data with added insights.


SELECT customer_id, order_date, order_amount,

       SUM(order_amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS running_total

FROM orders;


Breakdown:

Split:

PARTITION BY customer_id: Splits data into groups based on customer ID.

ORDER BY order_date: Orders rows within each group by order date.

Apply:

SUM(order_amount) OVER (...): Calculates the running total of order amounts for each customer, applied to each row within its window.

Combine:

The original columns (customer_id, order_date, order_amount) are combined with the calculated running_total column, providing a comprehensive view of each order and its cumulative value within the customer's purchase history.

Thursday, 11 January 2024

Data Structures and Algorithms

Space and time complexity are fundamental measures of an algorithm or data structure's efficiency in terms of memory and execution time. They are crucial concepts in computer science, as they help developers choose the most suitable solutions for various programming tasks.

Here's a breakdown of how space and time complexity apply to different algorithms and data structures:

Algorithms:

  • Sorting algorithms (e.g., Bubble Sort, Merge Sort, Quick Sort): Time complexity is analyzed to compare their efficiency in sorting data.
  • Searching algorithms (e.g., Linear Search, Binary Search): Both time and space complexity are considered to assess their effectiveness in finding elements within data structures.
  • Graph algorithms (e.g., Depth-First Search, Breadth-First Search): Space complexity is often a key consideration due to the potential for large graph representations.
  • Recursive algorithms: Time complexity analysis includes evaluating the depth of recursion and potential for overlapping subproblems.

Data Structures:

  • Arrays: Time complexity for accessing elements is O(1), while space complexity is O(n) to store n elements.
  • Linked Lists: Time complexity for insertion and deletion is O(1) on average, but space complexity is O(n) due to node overhead.
  • Stacks and Queues: Time complexity for basic operations (push, pop, enqueue, dequeue) is typically O(1), and space complexity is O(n).
  • Trees: Time complexity for operations varies depending on tree type and balance. Space complexity is usually O(n) for storing nodes.
  • Hash Tables: Average-case time complexity for insertion, deletion, and lookup is O(1), but space complexity is O(n) to accommodate potential collisions.

General Guidelines:

  • Time complexity is commonly expressed using Big O notation, indicating how the algorithm's execution time scales with input size.
  • Space complexity considers the amount of memory required for data storage and algorithm execution.
  • Optimal algorithms aim for low time and space complexity to achieve efficient performance and resource usage.

Remember that analyzing space and time complexity is essential for:

  • Algorithm selection: Choose the most efficient algorithm for a given task based on its resource requirements.
  • Code optimization: Identify potential bottlenecks and improve code performance.
  • Scalability: Ensure algorithms and data structures can handle growing input sizes without significant performance degradation.

Sunday, 24 December 2023

Python File Handling

File Handling

File handling is an important part of any web application. 
Python provides inbuilt functions for creating, writing, and reading files.
There are two types of files that can be handled in python,normal text
files and binary files.

By using the below modes we can read or write into the files 

There are four different methods (modes) for opening a file: 
"r" - Read - Default value. Opens a file for reading, error if the file does
not exist 
"a" - Append - Opens a file for appending, creates the file if it does not
exist 
"w" - Write - Opens a file for writing, creates the file if it does not exist
"x" - Create - Creates the specified file, returns an error if the file exists 
"t" - Text - Default value. Text mode 
"b" - Binary - Binary mode (e.g. images) 

We can perform read and write simultaniously by using the below modes

Read and Write (‘r+’): Open the file for reading and writing. The
handle is positioned at the beginning of the file. Raises I/O error if the
file does not exist. 
Write and Read (‘w+’) : Open the file for reading and writing. For an
existing file, data is truncated and over-written. The handle is
positioned at the beginning of the file. 
Append and Read (‘a+’) : Open the file for reading and writing. The file
is created if it does not exist. The handle is positioned at the end of the
file. The  data being written will be inserted at the end, after the existing
data. 

xb -- create binary
wb -- write binary 
ab -- append binary 
rb -- read binary


file built in functions -- open()
file built in methods -- read()--reads entire file ,readline() -- here we need to give number to read those many charecters,readlines() -- it prints all lines in the form of list. -- all these are for to read
write() -- to write
close() -- to close
seek() -- To change file position 
tell() -- Returns the currentfile position

--Create file example
f = open("C:/Desktop/file.txt", "x")
f.close()  

--To open a file and write
--Write method will delete the old data and writes new data.
f = open("file.txt", "w") 
f.write("hellow")
f.close()
-- Append method
f.append("hellow")
f.close()

By looping through the lines of the file, you can read the whole file, line by line: 
f = open("C:/Desktop/file.txt", "r")
for x in f: 
    print (x)
f.close()


Saturday, 23 December 2023

SQLite3 Database

import sqlite3
# For connecting to the database. If the database is not in the system, it will create and connect to the database
con = sqlite3.connect("mydatabase1.db")
# If you want to create a table or insert data into the table, declare a cursor
cur = con.cursor()
# By using execute, we can create a table and execute the table
cur.execute('''
    CREATE TABLE IF NOT EXISTS students (
        name VARCHAR(50),
        rollno VARCHAR(50),
        section VARCHAR(50)
    )
''')
con.commit()
cur.execute('INSERT INTO students VALUES ("sai", "511A", "sectionA")')
cur.execute('INSERT INTO students VALUES ("ramesh", "512A", "sectionB")')
cur.execute('INSERT INTO students VALUES ("raju", "513A", "sectionC")')
con.commit()
con.close() 

Output can be seen from below Query 

import sqlite3
con=sqlite3.connect("mydatabase.db")
cur=con.cursor()
b=cur.execute('select * from students')
print(b.fetchall())
con.close()

Thursday, 21 December 2023

tkinter

 Please find the Basic Code to write tkinter Programme 

'''By using lable method we can create text , by using entry method we can create box
pack(),grid() and place() are geometry managers'''
# Importing the tkinter
from tkinter import *
import sqlite3
import tkinter as tk

# Initializing the screen
win = tk.Tk()

win.geometry("400x400+10+20")

#Creating the lable
l=Label(win, text='Hellow Nagendra')
# Adding lable to the window
l.pack()
#Creating the entry
e=Entry(win)
e.pack()
#creating the button
b=Button(win, text='Button1')
# Adding Button to the window
b.pack()
# Creating the Text Widget
t=Text(win,width=30,height=10)
# Adding text to the window
t.pack()
# Creating Check Box
c = Checkbutton(win,text='Yes')
# Adding Checkbox to the window
c.pack()
# Creating Radio button
r = Radiobutton(win,text='Yes')
# Adding radio button to the window
r.pack()

# Adding the Option Menu
v=StringVar()
o=OptionMenu(win,v,'Python',*('RAM','kan','Jyo','Tul'))
o.pack()
# Adding the Scale
s=Scale(win,from_=0,to=100)
s.pack()


###you can add widgets here. This is used to start the application
win.mainloop()

Interview Prepatarion -

 

Please follow below sites to learn and Crack Pytohn Interview's

leetcode.com 
hackerrank.com 
kaggle.com 
exercism.io
projecteuler.net
DevProjects
codewars.com
hackerearth.com
codechef.com 
geeksforgeeks.org 
coderbyte.com
w3schools.com 
codeforces.com
replit.com
###################################################################################
GIT Architecture

Git architecture and workflow

Multiple contributors can work on the same project at once thanks to the distributed version control system known as Git. Each developer in Git has a local repository that contains an exact replica of the project’s history. Local commits are followed by pushes to a central repository, which is typically housed on a platform such as GitHub or GitLab. As merging and conflict resolution are fundamental components of the Git workflow, this enables collaboration without causing constant overwriting of one another’s work.

Let’s examine a few of the fundamental Git commands you’ll frequently use:

  • Clone: You can copy an existing repository to your local machine using the [Repository_url] git clone command.
  • Commit: When you commit, your changes are saved to the local repository. Git add [file_names] is used to stage changes prior to committing, and git commit -m Commit message is used to commit the staged changes.
  • Push: This sends your committed changes to your remote repository. GitHub push origin [branch_name].
  • Pull: This updates your local repository by downloading updates from a remote repository to your computer. [Branch_name] git pull origin.

After laying a solid foundation in the fundamentals of Git, let’s move on to the next subject: exploring branches, merges, and best practices for efficiently managing challenging data engineering projects.

Let’s look at an instance where you’re tasked with adding a new data validation feature to an already existing ETL pipeline as a data engineer. If your team uses the Git workflow, you will branch off the develop branch and work on your changes in a new branch called feature. When your feature branch is finished, you will start a pull request to merge it back into the develop branch. Eventually, this will be merged into the main branch as part of a planned release.

Consider a different scenario in which you and another engineer are tasked with improving various components of an algorithm for processing data. You both make adjustments to different branches. Git flags a conflict when merging these branches because both of you changed the same method. To fix this, you would manually select which changes to keep in the conflicting file, possibly integrating both sets of optimizations, before successfully completing the merge.

Git collaboration involves more than just pushing your changes; it also involves staying in sync with other people. You can keep your local repository updated with the most recent changes from your team by using the git pull and git fetch commands. git fetch gives you an additional level of control by allowing you to review the changes before merging, in contrast to git pull, which will fetch the changes and immediately merge them into your current branch.

Imagine you are a member of a distributed team in charge of a sizable data pipeline. A global teammate of yours has made some improvements and pushed them to the remote repository. Before merging these changes into your local branch for additional testing, you can use git fetch to verify them.

###################################################################################

Tuesday, 19 December 2023

Regular expressions in Python

#Regular expressions - 

Checking the documentation of re 

import re
help(re)
dir(re)

The re.compile() function returns a regular expression object. This object represents the compiled version of the regular expression pattern.
import re
pattern = re.compile(r'\d+')  # This pattern matches one or more digits.

So, in the regular expression pattern \d{4}, \d is used to match exactly one digit, and {4} specifies that this digit should occur exactly four times in sequence. Therefore, \d{4} is a pattern that matches sequences of four consecutive digits.

string = "The Euro STOXX 600 index, which tracks all stock markets across Europe including the 
FTSE, fell by 11.48% – the worst day since it launched in 1998. The panic selling prompted by 
the coronavirus has wiped £2.7tn off the value of 
STOXX 600 shares since its all-time peak on 19 
February."

import re

# Define a raw string with a regular expression pattern
s = r"\d{4}"

# Compile the regular expression pattern
t = re.compile(s)

# Use the compiled pattern to find all occurrences in the string
result = re.findall(t, string)

# Print the result
print(result)

###################################################################################

>>> import re
>>> result = re.search(r"\d{3}", string)
>>> result
<re.Match object; span=(15, 18), match='600'>

So, in the regular expression pattern \d{4}, \d is used to match exactly one digit, and {4} specifies that this digit should occur exactly four times in sequence. Therefore, \d{4} is a pattern that matches sequences of four consecutive digits.

###################################################################################

Here's a brief breakdown of how the pattern works:

\w: Match a single word character (alphanumeric character or underscore).
{3}: Match exactly three occurrences of the preceding word character

>>> result = re.match(r"\w{3}", string)
>>> result
<re.Match object; span=(0, 3), match='The'>

###################################################################################

>>> result = re.findall(r"\d{3}", string)
>>> result
['600', '199', '600']
###################################################################################

>>> result = re.split(r"\s", string)
>>> result
['The', 'Euro', 'STOXX', '600', 'index,', 'which', 'tracks', 'all', 'stock', 'markets', 'across', 'Europe', 'including', 'the', '', 'FTSE,', 'fell', 'by', '11.48%', '–', 'the', 'worst', 'day', 'since', 'it', 'launched', 'in', '1998.', 'The', 'panic', 'selling', 'prompted', 'by', 'the', 'coronavirus', 'has', 'wiped', '£2.7tn', 'off', 'the', 'value', 'of', 'STOXX', '600', 'shares', 'since', 'its', 'all-time', 'peak', 'on', '19', 'February.']

###################################################################################

>>> result = re.sub(r"[A-Z]{2,}", "INDEX",string)
>>> result
'The Euro INDEX 600 index, which tracks all stock markets across Europe including the  INDEX, fell by 11.48% – the worst day since it launched in 1998. The panic selling prompted by the coronavirus has wiped £2.7tn off the value of INDEX 600 shares since its all-time peak on 19 February.'

In simpler terms, the re.sub() function is replacing any sequence of two or more consecutive uppercase letters in the original string with the word "INDEX".

###################################################################################

>>> result = re.findall(r"the", string)
>>> result
['the', 'the', 'the', 'the']

###################################################################################
'''Prints three charecters of decimals'''
string = "The Euro STOXX 600 index, which tracks all stock markets across Europe including the FTSE, fell by 11.48% – the worst day since it launched in 1998. The panic selling prompted by the coronavirus has wiped £2.7tn off the value of STOXX 600 shares since its all-time peak on 19 February."
import re
result = re.findall(r"\d{3}", string)
print(result)
###################################################################################
'''Spilts all words into List of charecters '''
string = "The Euro STOXX 600 index, which tracks all stock markets across Europe including the FTSE, fell by 11.48% – the worst day since it launched in 1998. The panic selling prompted by the coronavirus has wiped £2.7tn off the value of STOXX 600 shares since its all-time peak on 19 February."
import re
result = re.split(r"\s", string)
print(result)
###################################################################################
'''It will remove the Index charecter'''
import re
pattern = r'\d+'  # This is a regular expression pattern to match one or more digits
replacement = 'X'
text = 'There are 123 apples and 456 oranges.'
result = re.sub(pattern, replacement, text)
print(result)
'''Output -- There are X apples and X oranges.'''
###################################################################################

SQL -

 Window Functions -  Window function: pySpark window functions are useful when you want to examine relationships within group of data rather...