
Senior Data Engineer Resume

Posted: 1/19/2025 | Ref: #84838
Location: San Francisco, CA
Work Environment:
Target Salary: Negotiable
Category: Engineering

Candidate Pitch:

● Over 10 years of software industry experience with a dedicated focus on Microsoft Azure and Amazon Web Services (AWS), a strong background in Big Data, and expertise in Data Warehousing.
● Accomplished data engineer with an extensive background in end-to-end management of ETL data flows, ensuring flexibility and seamless performance.
● Expertise in migrating SQL databases to Azure Data Lake, leveraging Azure Data Lake Analytics, Azure SQL Database, Azure Databricks, and Azure Synapse Analytics (formerly SQL Data Warehouse).
● Experience with AWS services including S3, EC2, Lambda, Glue, EMR, SNS, Step Functions, IAM, CloudWatch, Athena, Redshift, QuickSight, and SageMaker.
● Experience in Extraction, Transformation, and Loading (ETL) of data from various sources into Data Warehouses, as well as collecting, aggregating, and moving data using Apache Flume, Kafka, Power BI, and Microsoft SSIS.
● Proficient in managing database access control and transitioning on-premises databases to Azure Data Lake Store using Azure Data Factory for optimized data integration and analytics.
● Leveraged Azure Event Hubs to create real-time dashboards and monitoring solutions, enhancing data-driven decision-making.
● Implemented Azure Application Insights for comprehensive logging and diagnostics, enabling data-driven decision-making and continuous improvement of cloud resources.
● Executed data transformations in Azure Data Lake Storage Gen2 using the medallion architecture, improving data quality and supporting efficient decision-making in the Azure cloud environment.
● Orchestrated data processing with Azure Databricks and PySpark, focusing on optimization, security, and scalability, improving data efficiency, reliability, and compliance across the organization's cloud-based infrastructure.
● Integrated on-premises (MySQL, Cassandra) and cloud-based (Azure Blob Storage, Azure SQL Database) data using Azure Data Factory, applying transformations and loading the data into Snowflake.
● Architected and executed a robust Lambda architecture on Azure, enabling both real-time and batch data processing with peak performance and scalability for mission-critical business functions.
● Implemented automated data catalog processes, drastically reducing manual labor and ensuring real-time metadata accuracy across all data systems.
● Integrated the data catalog with existing data lakes and warehouses, enabling unified, streamlined data access and analysis.
● Architected and implemented a real-time data pipeline on Azure, leveraging Azure Event Hubs for efficient streaming data ingestion.
● Developed complex SQL queries with window functions to calculate moving averages and cumulative sums.
● Engineered metadata management strategies with data catalogs, enabling efficient data lineage, impact analysis, and reporting, strengthening data governance and decision-making.
● Automated critical data workflows using Azure Functions, Logic Apps, and Pipelines, creating a scalable, reliable framework that streamlined tasks, reduced errors, and cut manual work.
● Created 100+ ETL transformations and validations using Spark SQL, processing 5 TB of data weekly in Azure Databricks.
● Implemented robust security and monitoring solutions and optimized cost management using Azure services including Azure Key Vault, Azure Security Center, Azure Monitor, and Azure Cost Management.
● Led and executed comprehensive data migration strategies, efficiently transferring mission-critical databases to Azure Data Lake and improving data accessibility and scalability.
● Enhanced Spark performance by optimizing data processing algorithms and leveraging techniques such as partitioning, caching, and broadcast variables.
● Orchestrated the ingestion of 3 billion records into a suite of Azure services, leveraging Azure Databricks to achieve a 40% improvement in data processing speed.
● Migrated on-premises Oracle ETL processes to Azure Synapse Analytics, substantially improving data processing efficiency and cost-effectiveness.
● Directed the transfer of SQL databases to Azure Data Lake and SQL Data Warehouse, employing Azure Data Factory to ensure optimal data transfer and storage.
● Conducted ETL operations with Azure Databricks during the Oracle-to-Synapse migration, optimizing system performance and data management efficiency while achieving notable scalability and cost reduction.
● Engineered high-performance ETL pipelines in Azure Databricks, processing over 3 TB of data daily with 99.99% data integrity.
● Drove the utilization of Microsoft Azure services, including Azure HDInsight clusters, Azure Data Factory, Azure Blob Storage, Azure Databricks, Azure Function Apps, Azure Security, and Azure Logic Apps, to advance project initiatives and maximize cloud capabilities.
● Implemented job dependencies and triggers in Control-M to optimize data workflow execution.
● Designed and executed state-specific data partitioning and bucketing strategies, harnessing bucket-based Hive joins for a significant boost in query performance and data retrieval efficiency.
● Proactively assessed and championed improvements to the data catalog system, keeping it current with evolving business demands and driving continuous improvement.
● Leveraged data lineage to rapidly pinpoint and resolve issues, improving operational efficiency.

Education:

MS, Data Engineering, University of North Texas, Denton, USA
B.Tech, Electronics & Communication Engineering, Koneru Lakshmaiah University, Vijayawada, India

Technical Skills:

Hadoop/Big Data: HDFS, MapReduce, HBase, Hive, Cloudera, Sqoop
Programming Languages: SQL, Python, PySpark
Databases: RDBMS (MySQL, DB2, MS-SQL Server, PostgreSQL), NoSQL (HBase, Cassandra), Snowflake virtual warehouse, Azure SQL Warehouse, Azure SQL DB
Python Libraries: NumPy, matplotlib, pandas, scikit-learn
Cloud Technologies: Amazon Web Services (AWS), Microsoft Azure
Azure Cloud: ADFv2, ADLS, Azure SQL DB, SQL Server, Azure Synapse, Azure Analytic Services, Databricks, Mapping Dataflow (MDF), Azure Data Lake (Gen1/Gen2)
AWS Services: S3, EC2, EMR, Lambda, Redshift, Data Pipeline, Athena, AWS Glue, S3 Glacier, CloudFormation
Reporting Tools: JIRA, MS Excel, Tableau, Power BI, SSRS
Tools: Management Studio, Jenkins, Eclipse, Maven, GitHub
Methodologies: Agile, Scrum, Waterfall
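The medallion-architecture work described above (bronze/silver/gold layering in ADLS Gen2) follows a standard pattern; a minimal plain-Python sketch of that pattern is below. A production pipeline would use PySpark/Delta Lake on ADLS, and every name and record here is illustrative, not taken from the candidate's projects.

```python
# Illustrative sketch of the medallion (bronze/silver/gold) pattern.
# Plain Python dicts stand in for tables; all data is hypothetical.

raw_events = [  # bronze: data exactly as ingested, including bad records
    {"user": "a", "amount": "10.5", "ts": "2024-01-01"},
    {"user": "b", "amount": "bad",  "ts": "2024-01-01"},
    {"user": "a", "amount": "4.5",  "ts": "2024-01-02"},
]

def to_silver(rows):
    """Cleanse and type the bronze layer, dropping unparseable records."""
    silver = []
    for r in rows:
        try:
            silver.append({**r, "amount": float(r["amount"])})
        except ValueError:
            continue  # quarantine/skip malformed rows
    return silver

def to_gold(rows):
    """Aggregate the silver layer into a business-level summary."""
    totals = {}
    for r in rows:
        totals[r["user"]] = totals.get(r["user"], 0.0) + r["amount"]
    return totals

gold = to_gold(to_silver(raw_events))
print(gold)  # {'a': 15.0} -- the malformed row for user "b" was dropped
```

The key property shown: each layer is derived from the previous one, so raw data is never mutated and any downstream table can be rebuilt.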
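The window-function bullet above (moving averages and cumulative sums) can be illustrated with standard SQL, run here through Python's built-in sqlite3 module (SQLite 3.25+ supports window functions). The table and column names are made up for the example.

```python
import sqlite3

# Moving average and cumulative sum via SQL window functions.
# Table/column names are illustrative, not from any real system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)])

rows = conn.execute("""
    SELECT day,
           AVG(amount) OVER (ORDER BY day
                             ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
               AS moving_avg_3,
           SUM(amount) OVER (ORDER BY day) AS cumulative_sum
    FROM sales
    ORDER BY day
""").fetchall()

for day, mavg, csum in rows:
    print(day, mavg, csum)
# day 4: 3-row moving avg = (20+30+40)/3 = 30.0, running sum = 100.0
```

`ROWS BETWEEN 2 PRECEDING AND CURRENT ROW` defines the 3-row moving window; `SUM ... OVER (ORDER BY day)` uses the default frame, giving a running total.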
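The bucketing bullet above rests on the idea behind Hive bucketed joins: both tables are hashed into the same number of buckets on the join key, so only matching bucket pairs need to be compared (and can be joined in parallel). A minimal pure-Python sketch of that idea follows; all data, names, and the bucket count are illustrative.

```python
# Sketch of a bucket-based join: hash both sides into the same number of
# buckets on the join key, then join each bucket pair independently.
NUM_BUCKETS = 4  # illustrative; in Hive this is fixed per table via CLUSTERED BY

def bucketize(rows, key, n=NUM_BUCKETS):
    """Distribute rows into n buckets by hashing the join key."""
    buckets = [[] for _ in range(n)]
    for row in rows:
        buckets[hash(row[key]) % n].append(row)
    return buckets

def bucketed_join(left, right, key):
    lb, rb = bucketize(left, key), bucketize(right, key)
    out = []
    for lrows, rrows in zip(lb, rb):  # only matching buckets are compared
        index = {}
        for r in rrows:
            index.setdefault(r[key], []).append(r)
        for l in lrows:
            for r in index.get(l[key], []):
                out.append({**l, **r})
    return out

orders = [{"cust": "a", "order_id": 1}, {"cust": "b", "order_id": 2}]
custs  = [{"cust": "a", "state": "CA"}, {"cust": "b", "state": "TX"}]
print(bucketed_join(orders, custs, "cust"))
```

Because equal keys always hash to the same bucket, the per-bucket joins together produce exactly the full join, while each worker only touches a fraction of the data.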

What is a Privacy Pitch Resume?

This candidate has opted to keep their Personally Identifiable Information (PII) secure while actively searching for new opportunities. You are viewing their core qualifications and requirements. If their profile aligns with your needs, click Contact Candidate Securely below. We will route your message directly to their private inbox.
