Microsoft Fabric and Delta Lake to manage real-time streaming data to boost your business insights.

I do enjoy talking to customers and discuss the challenges they are facing in the system, services, limitations etc. I may not be solving their architectural challenges instantaneously, but I would sleep on it to get the most optimized solution for their business model. (Remember – “Optimized” will vary based on the actual needs and depends on customer)

Key challenges of any business with conventional data processing system.

  • Most of the conventional system process the data in batches (hourly(s), nightly, weekly) which is considerably late for most of the business in the current competitive market.
  • To get the real-time feedback about your business/products from your end users/consumers.

To address the above challenges, I am demonstrating how we can make use of modern technologies to process the streaming data to make better sense of the data.

Note: Any data landing on a platform or in a format which is unfriendly to analyze are no better than having no data.

Architectural diagram to handle real-time data.

  • For the purpose of demonstration, we have used “coin cap” open streaming API which gives real time pricing of crypto currency, (focused on bitcoin price).
  • Using Azure event hub as the first landing zone to stream the data to second landing zone which is the latest Microsoft Fabric.
  • Spark cluster in Microsoft Fabric has been used to process and pick the relevant data which is essential for business insights.

Tip: You can use twitter or any social networking API to retrieve the data related to your product or service and understand the consumer feedback much faster and prevent brand name damage. Perhaps tracking product hash tag would be a better pick to start with.

Pre-requisites:

Step 1: Set up the streaming python application. (Source)

Note: Here I am setting up a python application to stream the bitcoin Realtime price every second. EVENT_HUB_CONNECTION_STR = You get the connection string from Event hub namespace -> Shared Access Policies -> Properties of the policy. EVENT_HUB_NAME = event hub name, not the name space.

<code>
import asyncio
import requests

from azure.eventhub import EventData
from azure.eventhub.aio import EventHubProducerClient

EVENT_HUB_CONNECTION_STR = "<your event hub connection string>"
EVENT_HUB_NAME = "<your event hub name>"

async def run():

    while True:
        await asyncio.sleep(1)

        producer = EventHubProducerClient.from_connection_string(conn_str=EVENT_HUB_CONNECTION_STR, eventhub_name=EVENT_HUB_NAME)
        async with producer:
            # Create a batch.
            event_data_batch = await producer.create_batch()
            url = "https://api.coincap.io/v2/rates/bitcoin"
            payload={}
            headers = {}
            response = requests.request("GET", url, headers=headers, data=payload)
            data=(response.json())
            event_data_batch.add(EventData(str(data)))

            # Send the batch of events to the event hub.
            await producer.send_batch(event_data_batch)
            # A tracer message which shows the event interval
            print("sent to azure successfully")

loop = asyncio.get_event_loop()

try:
    asyncio.run(run())
    loop.run_forever()
except KeyboardInterrupt:
    pass
finally:
    print("Closing the loop")
    loop.close()

Step 2: Setup up the Lakehouse in fabric

  • Login to the Microsoft fabric account and create a Lakehouse.
  • Save the path and URL of both tables and files (from properties). This is needed in our query to create delta late and spark tables respectively.

Tip: You can also download OneLake explorer app if you prefer Download OneLake file explorer from Official Microsoft Download Center.

Step 3: Code to consume the data from event hub and store it in a DeltaLake.

  • Use your event hub connection string, make sure that you have minimum read permission.
  • Define the streaming content.
<code>
from datetime import date
ehConf = {
  'eventhubs.connectionString' : sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt('<your event hub connection string with shared access key>')
}
df_streaming = spark.readStream.format("eventhubs").options(**ehConf).load()
df = df_streaming.withColumn("body", df_streaming["body"].cast("string"))
  • Define the delta lake.
  • Files/DeltaLake/Delta_Tables/ -> Create subdirectories inside your OneLake “Files”. This is where you delta table is going to be.
  • “checkpointLocation”, -> This is my another dierctory inside files to store checkpoint files and other metadata.
  • Little modification in file path just to partition the data every day, this will give is the flexibility of choosing the subset.
<code>
DeltaLakePath=("Files/DeltaLake/Delta_Tables/"+date.today().strftime("%Y%m%d"))
df.writeStream \
    .format("Delta") \
    .outputMode("append") \
    .option("checkpointLocation","abfss://<abfss connection string saved earlier from one lake>") \
    .start(DeltaLakePath)
  • Define spark table from delta for analysis/reporting.
  • Spark table will be created for each day of data.
<code>
DeltaLakePath=("Files/DeltaLake/Delta_Tables/"+date.today().strftime("%Y%m%d"))
defTable='CREATE TABLE bitCoin_'+date.today().strftime("%Y%m%d")+' USING DELTA Location '+"'"+DeltaLakePath+"'"
spark.sql(defTable)

Tip: Yes, I have the same copy of data in spark table now, its redundant data!!! Pick one on your convenience.

Let us check the table data via explorer.

Let us check the Delta Lake data.

Great !! files are loaded. Now we have to deal with json and process further to clean the data.

json we have -> {‘data’: {‘id’: ‘bitcoin’, ‘symbol’: ‘BTC’, ‘currencySymbol’: ‘₿’, ‘type’: ‘crypto’, ‘rateUsd’: ‘29017.4373609315518503’}, ‘timestamp’: 1692218138650}

For this demo focus are on crypto name, price and time. Hence thought of a view for picking what I want.

There you go!!! We have the real-time bitcoin price now.

Tip from fabric: Optimize delta table with small file compaction.

Hope this would help your business to deal with real data and take your insights to next level .

Happy learning. Thank You!

Hybrid data management

Current businesses are dynamic, so the data generated. Conventional RDBMS systems may face challenges hosting and analyzing this data, in this blog we see how I can make use of modern platforms and process data efficiently and with cost in mind.

Let’s start understanding the requirements from the customer and start modelling.

Requirement 1: Human resources department in company “X” needs to build and maintain a database to maintain their employee details, pay history etc.

Requirement 2: Finance department needs to build and maintain a database to maintain their sales performance data.

Common requirements to consider from their office of CTO:

  • They plan to integrate multiple departments and project this as a centralized single source of truth in future, so scalability is key.
  • Should be format independent to be flexible to customize as per the client requirement.
  • Solution ideal for analytical processing than transactional processing. They don’t plan to retire their conventional relational system anytime sooner.
  • Cost effective, at this moment features like High Availability, Replication are not needed and prefer to avoid saving few $. Cloud solution is ideal.

Considering all the points listed I will demonstrate how this can be accomplished using “Azure Synapse Serverless Lake Database“.

“Perhaps I will write how to implement this in Databricks lake house next time.”

A lake database is a virtual database created on Azure synapse workspace which points to one or more file storage beneath. (Database itself doesn’t store any user data in their table). To further understand I recommend referring. https://learn.microsoft.com/en-us/azure/synapse-analytics/media/metadata/shared-databases.png

Here I am implementing the structure encircled in red above. 

Assumptions:

  • You already have a data lake with data loaded for HR data. (Parquet, csv etc..). If not, your primary focus should be to fetch data and store it in azure data lake. You may use adf pipeline or any ETL framework to extract, curate and load.
  • You already have an Azure synapse workspace and connection (linked service) established to the lake.

Let’s build schema for their HR data which is a best candidate for external table due to the reason that data is not very volatile and in fact the underlying data will get refreshed by an ETL/ELT pipeline every night.

Step 1: Create a lake database.

Step 2: Define the linked service for your data source and the format of the file for the purpose of lake database creation. (Please note this can be changed later for every table you create)

Step 3: Create Lake database table out of the data reside in your lake.

Tip: Partitioning your data in data lake is key here. Polybase do support wildcards.

Step 4: Point table to your source data lake directory

If all above steps are followed you should see the tables in your azure synapse lake database.

Tip: You can even set the relationship between tables for the purpose of data diagram.

Let’s query the tables created with basic SQL with joins.

SELECT te.JobTitle,te.LoginID,eh.rate,eh.payfrequency FROM [productdata].[dbo].[olap_tblemployee] te join [productdata].[dbo].[olap_tblemployeepayhistory] eh on te.businessentityid=eh.businessentityid

There you are.!!!

The “dbo” schema is reserved for the lake tables that are originally created in Spark or database designer. Hence don’t forget to call the three-part name of the object.

Now it’s time for building schema for the volatile data owned by finance department. For this , first we must create a “delta” lake. Delta lake is an abstract layer on top of my azure data lake. I chose delta lake for my volatile data due to it supportability of ACID transactions, data manipulation language including upsert and update, data integrity, streaming etc. (always finance department needs something additional out of their money…lol)

Assumptions:

  • You already have a data lake with data loaded for finance data. (Parquet, csv etc..). If not, your primary focus should be to fetch data and store it in azure data lake. You may use adf pipeline or any ETL framework to extract, curate and load.

Step 1: Build a “delta lake” on my azure “data lake”.

–loading data into data frame from a parquet

%%pyspark
df = spark.read.load('abfss://[email protected]/staging/SalesOrderHeader/SalesOrderHeader.parquet', 
format='parquet',header=True)

–define delta location and save the data frame as delta

delta_table_path = "abfss://[email protected]/Sales/delta/SalesOrderHeader"
df.write.format("delta").save(delta_table_path)

Step 2: It’s time to define the schema on top of delta lake. I prefer SQL for keeping it simple.

–create table from the delta location, point to the directory.

%%sql
CREATE TABLE productdata.oltp_SalesOrderHeader
USING DELTA
Location 'abfss://[email protected]/Sales/delta/SalesOrderHeader'

If above steps are followed you should see the tables in your azure synapse lake database. (I created two tables for demo)

Step 3: Here are few sample queries to select, update, insert operations.

%%sql
--update the table the way you want, just like a SQL table

UPDATE productdata.oltp_salesorderheader 
SET status = 7 WHERE status = 5;
select * from productdata.oltp_salesorderheader

--insert a value into the table 

insert into productdata.oltp_SalesOrderHeader
values(43659,1,'4911-403C-98',1,776,1,2024.994,0.00,2024.994000,'B207C96D-D9E6-402B-8470-2CC176C42283','2011-05-31 00:00:00.000')

-- filtering
select * from productdata.oltp_SalesOrderDetail
WHERE SalesOrderID = 43659

-- joining two tables
SELECT sd.SalesOrderID,sd.orderqty,sh.status,sh.DueDate 
FROM [productdata].[dbo].[oltp_salesorderdetail] sd
join [productdata].[dbo].[oltp_salesorderheader] sh 
on sd.SalesOrderID=sh.SalesOrderID

Now I can read, write,join,merge,upsert,delete etc. !!!

Note: As of today, when I am writing this, delta lake is never a 100% substitute for your RDBMS system. Delta lake will be a better candidate if you have an application which doesn’t need high concurrency transactional processing. Most of the enterprise Datawarehouse has mpp architecture hence those systems can manage concurrency very well than delta lake tables without negotiating the performance.

This is how my database diagram looks like in with couple of tables from HR and Finance department. Having said that it is easy and fun to integrate multiple departments and tables.

Tip: I can even create a view, procedure and function in a server less database and point to the tables created on lake database.

Thank you, Happy learning!

Azure Synapse cost saving trick.

###

Few days back I was assigned a consulting effort for a well-known customer who wants to explore an azure offering. One of the discussion during this engagement has dragged me to talk about Azure datawarehouse/Synapse and its functionalities . When I talk about MPP architecture and azure SQL datawarehouse most of my customers don’t forget to say ” it’s very expensive…. Can’t afford” . Though I can justify the cost of spinning a parallel processing system , often I respond them with a smile (even they know why and how it’s expensive ) . But this time I decided not to take this lightly instead I requested them to walk me through their pricing tier and utilization pattern .

I started with one of their non-production synapse data pool and got to know that their billing is considerably high compare to their usage . This is because they kept running the synapse pool during most of the day when most of their internal development and testing teams are active .With the help of few DMV’s and tracing I could understand the pool is idle at least 3-4 hours (out of ~ 10 hours) a day . When I questioned about pausing  the compute during idle time , the challenge brought to the table was to identify an usage pattern . They can’t really predict the usage , it totally depends on the testing and development activity . So if I can give them an automated framework which can look for active sessions periodically and take a decision to pause the synapse pool then half of their pain is subdued. Another half is to resume the synapse pool based on the demand , and this can be sorted out by giving them a self-service web application to resume the synapse compute . ( right now it all depends on admin ,end consumers have to wait based on admins availability )

I have used a runbook to pause datawarehouse compute pool based on usage . Script inside runbook will in-fact pause the compute if there is no activity from last 30 minutes ( time is flexible ) . Checks will not consider internal/system sessions and runbook’s own session .

Get this runbook scheduled every “xyz” minutes . ( in my demo it is 30 minutes )

# Modules to be installed and imported.

Import-Module Az.Accounts
Import-Module SqlServer
Import-Module Az.Sql

# Using automation runasconnection to get TenantId and SubscriptionId
$Connection = Get-AutomationConnection -Name "AzureRunAsConnection"
$TenantId = $Connection.TenantId
$SubscriptionId = $Connection.SubscriptionId
 
# Get the service principal credentials connected to the automation account. 
$SPCredential = Get-AutomationPSCredential -Name "logtest"
 
# Login to Azure ($null is to prevent output since Out-Null doesn't work in Azure)
Write-Output "Login to Azure using automation account 'logtest'."
$null = Connect-AzAccount -ServicePrincipal -Credential $SPCredential -Tenant $TenantId


$Conn = Get-AutomationConnection -Name AzureRunAsConnection
Connect-AzAccount -ServicePrincipal -Tenant $Conn.TenantID -ApplicationId $Conn.ApplicationID -CertificateThumbprint $Conn.CertificateThumbprint


$response = Invoke-WebRequest ifconfig.co/ip -UseBasicParsing
$ip = $response.Content.Trim()
Set-AzContext -SubscriptionId "<your subscription id>"
Set-AzSqlServerFirewallRule -ResourceGroupName "<your resource group>" -ServerName "<your server name>" -FirewallRuleName "Azure Synapse Runbook" -StartIpAddress $ip -EndIpAddress $ip

Start-Sleep -s 180

$status = Get-AzSqlDatabase -ResourceGroupName "<your resource group name>" -ServerName "<your server name>" -DatabaseName "<DW pool name>"

if ($status.Status -like 'paused' )
{
Write-Output "Synapse pool is already paused"
}
else
{
Write-Output "Checking if there are any active transactions"
$params = @{
  'Database' = '<DW pool name>'
  'ServerInstance' =  '<server name>.database.windows.net'
  'Username' = '<user name>'
  'Password' = '<password>'
  'OutputSqlErrors' = $true
  'Query' = 'if exists
( select * from sys.dm_pdw_exec_sessions where status in (''ACTIVE'',''IDLE'') and (session_id <> session_id()) 
and (app_name not in (''Internal'')) and (status in (''IDLE'') and login_time > DATEADD(minute, -30, GETDATE()))
)
    begin
        select 1;
    end
else

    begin
        select 0;
    end'
   }

 $transactions = Invoke-Sqlcmd  @params
 if ($transactions.Column1 -eq 0)
 {
 Write-Output "pausing azure synapse sql pool as there are no active transactions" 
 Suspend-AzSqlDatabase -ResourceGroupName "<your resource group>" -ServerName "<server name>" -DatabaseName "<DW pool name>" | Out-Null
 Write-Output "paused azure synapse sql pool"
 }
 else {
 Write-Output "There are active transactions hence cannot pause"
 }
}

First half is done. Now I may have to look into the second half of the challenge , which is resuming the synapse pool compute without admin intervention . For this very purpose I have developed a simple C# web application which fetch the current status of the synapse pool with a button click and another click to resume the compute if wanted . Intentionally I did not provision a pause button on the grounds that it may create outages if someone try to pause without actually checking the sessions currently going on .

“Button1_Click “ C# method to get the status of the synapse pool in a textbox. Here I am making a REST API call with an SPN authentication to get the current status of the synapse ( its worthy to note that I have hardcoded for the sake of demo purpose . In case you are in need of handing multiple synapse pools then you may have to accept the input from a text box and build a dynamic uri to pass on )

protected void Button1_Click(object sender, EventArgs e)
        {
            string clientId = "<GIVE YOUR SPN ID>";
            string clientSecret = "<GIVE YOUR SPN SECRET>";
            string tenantId = "<GIVE YOUR TENANT ID>";
            string authContextURL = "https://login.windows.net/" + tenantId;
            var authenticationContext = new AuthenticationContext(authContextURL);
            var credential = new ClientCredential(clientId, clientSecret);
            var result = authenticationContext.AcquireToken(resource: "https://management.azure.com/", credential);
            string token = result.AccessToken;
            string rawurl = "https://management.azure.com/subscriptions/subscriptionid/resourceGroups/rgname/providers/Microsoft.Sql/servers/server/databases/pool?api-version=2014-04-01";
            Uri uri = new Uri(String.Format(rawurl));
            var httpWebRequest = (HttpWebRequest)WebRequest.Create(uri);
            httpWebRequest.Headers.Add(HttpRequestHeader.Authorization, "Bearer " + token);
            httpWebRequest.ContentType = "application/json";
            httpWebRequest.Method = "GET";
            HttpWebResponse httpResponse = null;
            try
            {
                httpResponse = (HttpWebResponse)httpWebRequest.GetResponse();
                
            }
            catch (Exception ex)
            {
               
            }
            string result1 = null;
            
            using (var streamReader = new StreamReader(httpResponse.GetResponseStream()))
            {
                result1 = streamReader.ReadToEnd();
                dynamic jsondata = JObject.Parse(result1);
                string status = jsondata.properties.status;
                TextBox1.Text = status;

            }

“Button_Click “  to send a resume instruction via REST method .

protected void Button_Click(object sender, EventArgs e)
        {
            string clientId = "<GIVE YOUR SPN ID>";
            string clientSecret = "<GIVE YOUR SPN SECRET>";
            string tenantId = "<GIVE YOUR TENANT ID>";
            string authContextURL = "https://login.windows.net/" + tenantId;
            var authenticationContext = new AuthenticationContext(authContextURL);
            var credential = new ClientCredential(clientId, clientSecret);
            var result = authenticationContext.AcquireToken(resource: "https://management.azure.com/", credential);
            string token = result.AccessToken;
            string puturl = https://management.azure.com/subscriptions/subscriptionid/resourceGroups/rgname/providers/Microsoft.Sql/servers/servername/databases/poolname/resume?api-version=2014-04-01-preview";
            Uri urip = new Uri(String.Format(puturl));
            var httpWebRequestput = (HttpWebRequest)WebRequest.Create(urip);
            httpWebRequestput.Headers.Add(HttpRequestHeader.Authorization, "Bearer " + token);
            httpWebRequestput.ContentType = "application/json";
            httpWebRequestput.Method = "POST";
            try
            {
                using (var streamWriter = new StreamWriter(httpWebRequestput.GetRequestStream()))
                {
                    string vn = null;
                    streamWriter.Write(vn);
                    streamWriter.Flush();
                    streamWriter.Close();
                }
            }
            catch (Exception ex)
            {

            }

            HttpWebResponse httpResponse = null;
            try
            {
                httpResponse = (HttpWebResponse)httpWebRequestput.GetResponse();

            }
            catch (Exception ex)
            {

            }

Remember , credentials used in runbook and SPN used for REST calls require appropriate permission to pause and resume the synapse pool . To further enhance this self service portal you can restrict the access only to designated users and alert admin with information like who resumed ? time /date and purpose . ( and if needed an approval from admin to get it resumed during weekends or holidays or non-business hours for specific set of users)

However ,  this helped my client save an estimate 6K $ per year for a single synapse pool . Hope this helps your customer too !!!

Exception while trying “Set-AzSqlElasticPool : Invalid SKU name ‘GP’ “, [Set-AzSqlElasticPool], CloudException .

###

I was trying to change pricing model of an Azure SQL elastic pool from DTU to vCore using PowerShell . Microsoft has already a ready-made PowerShell command not only to migrate but to set properties of an elastic pool as well (Set-AzSqlElasticPool , Az module should be loaded before using this command). But I am sticking to my intention , I just wanted to use this command only to change the pricing model . I was all set and ready with my script ,

Connect-AzAccount
Set-AzContext -SubscriptionId "<subscription-id>"
$variable1 = Get-AzSqlElasticPool -ResourceGroupName "learning"`
-ServerName "<server-name>"
foreach ($poolname in $variable1.ElasticPoolName)
{
Set-AzSqlElasticPool -ResourceGroupName "learning"`
 -ServerName "<server-name>"`
 -ElasticPoolName "$poolname"`
 -Edition "GeneralPurpose"`
 -VCore "2"`
 -LicenseType "BasePrice"
}

Whoops !!!!

Error :
Set-AzSqlElasticPool : Invalid SKU name 'GP'.
At line:8 char:1
+ Set-AzSqlElasticPool -ResourceGroupName "learning"`
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : CloseError: (:) [Set-AzSqlElasticPool], CloudException
    + FullyQualifiedErrorId : Microsoft.Azure.Commands.Sql.ElasticPool.Cmdlet.SetAzureSqlElasticPool

I have used all parameters which are necessary for the command as per the “docs.microsoft.com” ( as on 5/8/2020) but wasn’t beneficial .I spent a decent amount of time trying to understand what am I missing in here , don’t know from where the SKU “GP” is picking up by the error ( though I have mentioned as GeneralPurpose). So this leads me to think there is something which is wrong with the query or I am missing some
mandatory parameter which is not mentioned in MS docx ( no one else seemed reported ) . This is where I have stopped doing permutations and
combinations before I contacted one of the stake holder of this command as I was in a hurry to deliver a whole automation script to my customer
( this is one piece among a 200 line code ) . It turns out to be that for command to understand the edition we should add a mandatory parameter
“-ComputeGeneration” , like below .

Set-AzContext -SubscriptionId "e839366e-004a-42cf-8bee-85940495ab95"
$variable1 = Get-AzSqlElasticPool -ResourceGroupName "learning"`
-ServerName "ramtestserver"
foreach ($poolname in $variable1.ElasticPoolName)
{
Set-AzSqlElasticPool -ResourceGroupName "learning"`
 -ServerName "ramtestserver"`
 -ElasticPoolName "$poolname"`
 -ComputeGeneration "Gen5"`
 -Edition "GeneralPurpose"`
 -VCore "2"`
 -LicenseType "BasePrice"
}

Yes , this time I got what I was looking for .

*********** below script is just to get the detailed output , you have to run along with above one ************
foreach ($poolname in $variable1.ElasticPoolName)
{
$CurrentDTU = Get-AzSqlElasticPool -ResourceGroupName "learning" -ServerName "ramtestserver" -ElasticPoolName "$poolname"
Write-host "****************"
Write-host "Pool Name :"$CurrentDTU.ElasticPoolName
Write-host "Pool Edition :"$CurrentDTU.Edition
Write-host "Pool Number of Vcores :"$CurrentDTU.Dtu
That’s promising !!!

I thought to pen down my learnings here so that to be useful for others .

Execute SQL Statements Against Multiple SQL Servers Simultaneously Using PowerShell (On-Premises)

My customer keeps me busy writing automation scripts for various requirements . These days I am helping them to build a centralized repository for their on-premises SQL servers . To start with , I want to collect instance names and versions of SQL server installed in their environment ( quite a few servers have multiple SQL servers installed ) . So I want to be ready with a script which can query all SQL instances in a windows server ,and yes of course will be very helpful for them if I can make the script friendly for any T-SQL queries.

Thinking of this I got few options like querying WMI , using Microsoft MAP toolkit ,querying registry etc.. And today I am going to explore the registry as I feel its more reliable than WMI in many ways ( I do not want to use MAP toolkit as I want to avoid bringing another application into our customer infrastructure and run behind the security clearance , IT approvals etc..)

# Building a PowerShell object for easy presentation and display output in a tabular format in csv
$temp = New-Object -TypeName psobject
$temp | Add-Member -MemberType NoteProperty -Name HostName -Value $null
$temp | Add-Member -MemberType NoteProperty -Name InstanceName -Value $null
$temp | Add-Member -MemberType NoteProperty -Name SQLVersion -Value $null

# list of HostNames.
$hostnames = get-content "D:\Work\PowerTest\Servers_list.txt"  

Step1. Loop through the host names .

foreach ($hostname in $hostnames)
{
$tempobj = $temp | Select-Object *

Step2. Query the registry to find out all SQL instances configured .

#traversing registry of the remote host to identify the sql instances installed
#ensure that you have the right to access the resgistry in remote machines

$SQLInstances = Invoke-Command -ComputerName $hostname {
(Get-ItemProperty 'HKLM:\SOFTWARE\Microsoft\Microsoft SQL Server').InstalledInstances }
foreach ($sql in $SQLInstances)
{

Step3. Build a connection to each instance and execute the T-SQL .

#registry will give me the default instance name as "MSSQLSERVER" which needs to be changed for connection
if($sql -eq 'MSSQLSERVER')
{$instance = $sql.PSComputerName
$tempobj.InstanceName = $instance}

#concatenating hostname with instance name to get the correct connetion object
else { $instance = $sql.PSComputerName,$sql -join"\"
$tempobj.InstanceName = $instance}

Step4. Fetch the output of queries in table format in a csv file .

$result = Invoke-Sqlcmd -ServerInstance $instance -Database master -Query "select @@version as 'Version'"
$tempobj.SQLVersion = $result.Version
$tempobj.HostName = Hostname
$tempobj| Export-Csv "D:\Work\PowerTest\instance_details.csv" -NoTypeInformation -append
}}
Final Script:

# Building a PowerShell object for easy presentation and display output in a tabular format in csv
$temp = New-Object -TypeName psobject
$temp | Add-Member -MemberType NoteProperty -Name HostName -Value $null
$temp | Add-Member -MemberType NoteProperty -Name InstanceName -Value $null
$temp | Add-Member -MemberType NoteProperty -Name SQLVersion -Value $null
$hostnames = get-content "D:\Work\PowerTest\Servers_list.txt"

# looping through the host names provided 
foreach ($hostname in $hostnames)
{
$tempobj = $temp | Select-Object *

#traversing registry of the remote host to identify the sql instances installed
$SQLInstances = Invoke-Command -ComputerName $hostname {
(Get-ItemProperty 'HKLM:\SOFTWARE\Microsoft\Microsoft SQL Server').InstalledInstances }
foreach ($sql in $SQLInstances)
{
#registry will give me the default instance name as "MSSQLSERVER" which needs to be changed for connection
if($sql -eq 'MSSQLSERVER')
{$instance = $sql.PSComputerName
$tempobj.InstanceName = $instance}

#concatenating hostname with instance name to get the correct connetion object
else { $instance = $sql.PSComputerName,$sql -join"\"
$tempobj.InstanceName = $instance}

$result = Invoke-Sqlcmd -ServerInstance $instance -Database master -Query "select @@version as 'Version'"
$tempobj.SQLVersion = $result.Version
$tempobj.HostName = Hostname
$tempobj| Export-Csv "D:\Work\PowerTest\instance_details.csv" -NoTypeInformation -append
}}

Notes

  • I have not used any error handling mechanism as its purely on consumers need . ( perhaps you can add couple of TRY CATCH )
  • You can pull a lot of information from SQL server instances , its just that I need versions . Therefore you will have to handle the output and present accordingly.
  • Ensure that you have enough permission to read the registry.

Execute SQL scripts against ALL databases / in ALL SQL servers / in ALL resource groups / in A subscription.( SQL authentication)

This time I was developing an automation framework for one of my customer . I was told to use “Azure SPN” for database authentication which brings few interesting security benefits like refraining interactive logins etc..To accomplish this we need a user in database ( every database where you need to authenticate) for SPN to authenticate using azure active directory .

But challenge in here is user needs to be created on 1000+ databases spread across multiple resource groups and SQL servers . One way to do this is using “azure run book” ,but in most of the enterprise environments database management team wouldn’t have access to create or manage runbooks . Hence they are heavily dependent on another team .

To overcome this dependency I came up with a powershell script which can be used in situations where we can bulk push one or more T-SQL to multiple databases at a time . I have used “Get-azureRmResourceGroup” which gives me the list of resource groups in a specific subscription and “Get-AzureRmSqlServer” gives me the available sql servers in a specific resource group . So I have piped “Get-AzureRmResourceGroup” with “Get-AzureRmSqlServer” to get the list of SQL servers provisioned in a subscription.

Final script :

#Provide your tenant_id and subscription_id
Connect-AzAccount -Tenant "<tenant_id>" -SubscriptionId "subscription_id"

#Identifying all sql servers in connected subscription 
$servernamesall = Get-AzureRmResourceGroup | Get-AzureRmSqlServer

#Collecting unique resource group names where ONLY sql servers are hosted (im not bothered about resource group which doesn't have SQL servers)
$resourcegroups = $servernamesall.ResourceGroupName | Get-Unique

#Looping through each resource group collected from above #parent_loop
foreach($resourcegroup in $resourcegroups)
{
Write-Output "------------------------------"
Write-Output "Fetching from resource group : $($resourcegroup)"
$serverlistraw = Get-AzureRmSqlServer $resourcegroup
$serverlists = $serverlistraw.ServerName

#Looping through each sql server in the resource group #child_loop1
foreach ($serverlist in $serverlists)
{
$serverlist=$serverlist+".database.windows.net"
Write-Output "Fetching from server:$($serverlist)"

#Excluding datawarehouse pools .
$databases = Invoke-Sqlcmd -ServerInstance $serverlist -Username username -password "<password>" -Database Master -Query "select sd.name from sys.databases sd join [sys].[database_service_objectives] sdo
on sd.database_id = sdo.database_id where sd.database_id <> 1 and sdo.edition not like 'DataWarehouse'"

#Looping through each database collected from sys.databases table #child_loop2
foreach ($database in $databases) 
{
#T-SQL which needs to be executed in each database
#Im constructing another connection here as cross database query is not possible in SQL PaaS
$arrayforoutput = @()
Write-Output "Executing against database:$($database.name)"
$output = Invoke-Sqlcmd -ServerInstance $serverlist -Username username -password "<password>" -Database $database.name -Query "declare @servernm nvarchar(20),
@dbname nvarchar(20)
select @servernm = @@servername
select @dbname = db_name()
select 'Script executed in server '+@servernm +' against a database '+@dbname"
Write-Output $($output.Column1)
$arrayforoutput += $output
}
#If you want to write your ouput in a text file
$arrayforoutput | format-table -HideTableHeaders | Out-File -FilePath C:\powershell_output\query_output.txt -append
}
}

Notes:

  • Here I have used SQL authentication to run the script , in case if you want to use Azure AD account then you may have to use function
    Get-QueryExecutionOutput” from my previous blog.
  • Error handling is very minimal in this script , you may need to use a better handling perhaps a try/catch ? in case if you need to debug a failed query or a loop .
  • I have used single subscription here for easiness of testing ( I have only one ) . If you have multiple subscriptions you can use one more loop to go through a list of subscriptions from source text / excel .
  • Instead of querying “sys.databases” we can use “Get-AzSqlDatabase” as well to loop , however it would be bit clumsy if you want to apply any filter .

Execute a SQL script against a set of azure PaaS databases listed in an excel file. ( Azure AD authentication)

One of my customer has varieties of applications hosted in hybrid environment( on-prem and cloud) predominantly on azure platform and a significant effort is been put to move the back end platform to azure where ever its viable . This pushed me to develop quite a few automation scripts which helped us to get the things done with minimal manual work.

Recently I was pulled into develop a script which should convert a PaaS database tier from DTU to vCore for whatever the business reasons they found . Let me tell you I am talking about 5000+ databases what they are targeting and spread across prod to non-prod . So it’s impossible to accomplish this task using azure portal / UI . Hence I may have to use the script and scale up the execution in multiple databases .

We had brainstorming hours to get the requirements as well as to make our road map vivid , of course we decided to go for PowerShell without much hesitation !!!

  1. Its easily do-able using the DDL “”alter database <database_name>  modify(edition = ‘GeneralPurpose’ ,service_objective = ‘<target_tier>’)” . Let’s see how can I execute this in a loop .
  2. Need a brief downtime during this activity ( during the process when a database gets migrated from DTU to vCore) , so the duration of execution may depends on region , off-business hours etc..(how busy your database is when you migrate)
  3. One approach I considered for developing script was to get the databases names using “sys.databases” from master database and pass it in a loop to execute the migration script. But this is not possible due to point number “2” as they have a mixed collection of databases hosted in a single logical server.
  4. Customer can provide me an excel with server names , database names and corresponding vCore tier model what we are targeting ( followed a rule of thumb, every 100 DTUs in the standard tier require at least 1 vCore, and every 125 DTUs in the premium tier require at least 1 vCore). We picked this approach as we have more control in the activity .
Final script :

# Defined this function to build the connection string using ADO.Net client # reason being Microsoft has already disclosed that there is no plan        # dropping support for System.Data.SqlClient any time soon .
# I want to use more secured AAD account for authentication 
# Note : invoke-sqlcmd won’t support Azure Active Directory authentication as of 3/16/2020 .

function Get-QueryExecutionOutput ($SQLconnectionstring, $SQLquery) {
$Conn = New-Object System.Data.SqlClient.SqlConnection($SQLconnectionstring)
$Conn.open()
$read = New-Object System.Data.SqlClient.SqlCommand($SQLquery,$Conn)
$read.CommandTimeout = "300"
$da = new-object System.Data.SqlClient.SqlDataAdapter($read)
$dt = New-Object system.Data.datatable
[void]$da.fill($dt)
$Conn.close()
return $dt
}

# My source excel path , change appropriately with yours
$excelfile = "C:\Array\vCore.xlsx"
$sheetgiven = "Sheet1"


# Create Excel object
$excelCOM = New-Object -com Excel.Application
#$excelCOM.Visible = $false

#opening the source file and reading the data from the sheet name assigned  variable
$workbook = $excelCOM.workbooks.open($excelfile)
$sheet = $workbook.Worksheets.Item($sheetgiven)

#select total rows
$maxrows = ($sheet.UsedRange.Rows).Count

#assign new powershell-object with values like ServerName , DatabaseName and Tergetmodel
$powershellobject = New-Object -TypeName psobject
$powershellobject | Add-Member -MemberType NoteProperty -Name ServerName -Value $null
$powershellobject | Add-Member -MemberType NoteProperty -Name DatabaseName -Value $null
$powershellobject | Add-Member -MemberType NoteProperty -Name Targetmodel -Value $null
$outputarray = @()
#$i =2 , because im skipping column names
#for loop  will iterate till we reach max row count
for ($i = 2; $i -le $maxrows; $i++)
{
    $adhocobject = $powershellobject | Select-Object *
    $adhocobject.ServerName = $sheet.Cells.Item($i,1).Text
    $adhocobject.DatabaseName = $sheet.Cells.Item($i,2).Text
    $adhocobject.Targetmodel = $sheet.Cells.Item($i,3).Text
    $servername = $adhocobject.ServerName
    $Databasename = $adhocobject.DatabaseName
    $Targetmodel = $adhocobject.Targetmodel
    $SQLquery = "alter database $Databasename  modify(edition = 'GeneralPurpose' ,service_objective = '$Targetmodel')"
    $SQLconnectionstring = "Data Source=tcp:$servername;Initial Catalog=master;Authentication=Active Directory Password;User id=<>;Password=<>;Encrypt=True;TrustServerCertificate=True;Connection Timeout=30;"

# function call to execute the query against the dynamically built connection string .
Get-QueryExecutionOutput "$SQLconnectionstring" $SQLquery 
#creates a new array with the same elements as old one + the new item, and this new larger array replaces the old one in the $outputarray variable         
$outputarray += $adhocobject
}
$excelCOM.quit()
get-process *excel* | stop-process -force  

Expectations from DBA :

  • Install required Azure PowerShell modules .
  • $file = “C:\Array\vCore.xlsx” your excel file .
  • Should have only 3 columns ( ServerName , DatabaseName ,Targetmodel ) should be in same order . You are expected to fill servername , DatabaseName and Target model .
  • No null values are accepted in the excel sheet .
  • You may have to filter out elastic pool databases from the list.
  • If you want a detailed output then you may define a variable to get the output from “Get-QueryExecutionOutput” function . Then this variable content can be written to a text file using “Out-File” and -append . ( within for-each loop)