Data Engineer Resume


Desired Industry: Information Technology
SpiderID: 84397
Desired Job Location: Kansas City, Missouri
Date Posted: 11/7/2023
Type of Position: Full-Time Temporary
Availability Date:
Desired Wage:
U.S. Work Authorization: Yes
Job Level: Experienced with over 2 years experience
Willing to Travel:
Highest Degree Attained: Masters
Willing to Relocate: Yes


Objective:
Experienced Data Engineer (3+ years) with strong proficiency in Python, ETL processes, and data transformation. Skilled in managing real-time data pipelines and ensuring data integrity and reliability. Proven track record of implementing efficient data distribution and storage solutions on cloud platforms such as Google Cloud and AWS. Seeking to contribute technical expertise and innovative solutions to AutoAlert's data engineering team.


Experience:
Professional Experience

Data Engineer | Indus Group, Hackensack, NJ June 2023 – Present

Real-Time Data Pipeline Implementation for Platform Insights
Indus Group, a financial technology firm, is building real-time insight into how users interact with its platforms. Leading the implementation of the streaming pipeline, which uses Apache Kafka, Spark Streaming, dbt, Google Cloud Storage, BigQuery, and Looker Studio (Data Studio) for data processing, storage, analytics, visualization, and CI/CD; a minimal sketch of the ingestion flow follows the tech stack below.

• Manage real-time data ingestion of 100,000+ streaming events per hour from internal financial platforms using Apache Kafka for immediate insight into user interactions.
• Orchestrate Spark Streaming processes, reducing operational bottlenecks by 20% and significantly improving resource allocation efficiency.
• Implement dbt for structured data transformations within BigQuery, improving the accuracy of financial reporting by 25% and reducing reporting errors.
• Establish regular storage of processed data in Google Cloud Storage (GCS), resulting in a 10% reduction in software licensing costs.
• Create visually engaging dashboards and reports with Looker Studio (Google Data Studio), directly correlating with an 18% boost in user engagement metrics.
• Implement Continuous Integration/Continuous Deployment (CI/CD) pipelines with Jenkins, Bitbucket version control, Docker, and Apache Airflow, achieving an 80% reduction in manual data processing tasks.

Tech Stack: Apache Kafka, Spark Streaming, dbt, BigQuery, Looker Studio (Google Data Studio), Jenkins, BitBucket, Docker, Apache Airflow, Google Cloud Platform (GCP), Terraform, Docker Compose, Python, SQL
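
A minimal PySpark Structured Streaming sketch of the kind of Kafka-to-GCS ingestion described above; the broker, topic, and bucket names are hypothetical placeholders, and the spark-sql-kafka package and GCS connector are assumed to be available to the Spark session.

    # Minimal sketch: consume platform events from Kafka with Spark Structured Streaming
    # and land them in Google Cloud Storage as Parquet for downstream dbt/BigQuery models.
    # Broker, topic, and bucket names are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType, TimestampType

    spark = SparkSession.builder.appName("platform-events-stream").getOrCreate()

    event_schema = StructType([
        StructField("user_id", StringType()),
        StructField("event_type", StringType()),
        StructField("event_ts", TimestampType()),
    ])

    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
        .option("subscribe", "platform_events")              # placeholder topic
        .load()
        .select(from_json(col("value").cast("string"), event_schema).alias("e"))
        .select("e.*")
    )

    query = (
        events.writeStream.format("parquet")
        .option("path", "gs://my-bucket/raw/platform_events")             # placeholder bucket
        .option("checkpointLocation", "gs://my-bucket/checkpoints/platform_events")
        .outputMode("append")
        .trigger(processingTime="1 minute")
        .start()
    )
    query.awaitTermination()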

Data Engineer | Stacklogy, Leander, TX Jan 2023 – May 2023

Data Migration and Quality Assurance Automation
At Stacklogy, a financial technology company, modernized the financial data management infrastructure by migrating data from PostgreSQL to Hadoop (HDFS) and then transitioning to the AWS cloud platform. Leveraged Apache Sqoop, Hive, Spark, Docker, and Hadoop for the on-premises migration, and Amazon EMR and S3 for the cloud migration; a data quality check of the kind used in this pipeline is sketched after the tech stack below.

• Ingested over 10TB of financial data from legacy RDBMS (PostgreSQL) to Hadoop (HDFS) using Apache Sqoop, ensuring data integrity and reliability.
• Implemented Docker-based containerization, reducing operational costs by 15% and enhancing resource allocation and scalability.
• Utilized Apache Hive to perform intricate data transformations, including aggregations, data type conversions, and data cleansing, improving data quality by 20%.
• Integrated custom data quality checks within the pipeline, enabling real-time anomaly detection and ensuring data reliability with a 95% accuracy rate.
• Leveraged Apache Spark for advanced data processing tasks, including data enrichment, complex transformations, and real-time data analysis.
• Successfully loaded transformed data into Hadoop (HDFS), optimizing data retrieval speeds and accommodating future data growth.
• Successfully moved Hadoop tables to AWS S3 for optimized storage.
• Implemented Protegrity to tokenize personally identifiable information (PII) data elements during the migration from on-prem Hadoop to AWS.
• Proficiently processed data on AWS EMR (Elastic Map Reduce) using Hive and Spark for quality enhancements and real-time analysis.

Tech Stack: Apache Sqoop, Apache Hive, Apache Spark, Docker, Hadoop (HDFS), Apache Kafka, Apache Airflow, Jenkins, PostgreSQL, AWS S3, AWS EMR (Elastic Map Reduce), Protegrity, Python, SQL, PL/pgSQL.
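
A minimal PySpark sketch of the kind of custom data quality check described above; the database, table, and column names are hypothetical placeholders.

    # Minimal sketch: validate a financial transactions table before loading it downstream.
    # Table and column names are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("dq-checks").enableHiveSupport().getOrCreate()

    df = spark.table("finance.transactions")  # placeholder Hive table

    checks = df.agg(
        F.count("*").alias("row_count"),
        F.sum(F.col("txn_id").isNull().cast("int")).alias("null_txn_ids"),
        F.sum((F.col("amount") < 0).cast("int")).alias("negative_amounts"),
        F.countDistinct("txn_id").alias("distinct_txn_ids"),
    ).collect()[0]

    # Flag null or duplicate keys; a real pipeline would route failures to alerting.
    if checks["null_txn_ids"] > 0 or checks["distinct_txn_ids"] < checks["row_count"]:
        raise ValueError(f"Data quality check failed: {checks.asDict()}")

    print(f"Quality checks passed for {checks['row_count']} rows")
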
Graduate Teaching Assistant | Wichita State University, KS Jan 2022 – May 2022 | Aug 2022 – Dec 2022

Mentored students in WSU's Database Management Systems course, with a strong emphasis on data modeling, entity-relationship (ER) modeling, star schema design, and collaborative data project management.

• Evaluated student assignments, emphasizing data modeling, ETL processes, and RDBMS skills.
• Collaborated with 200+ students to optimize schema designs, prioritizing data normalization and efficient structures.
• Managed Git for student work, emphasizing best practices for collaborative data project management.
• Designed and deployed 10+ Tableau visualization dashboards for real-time, data-driven student performance insights.

Tech Stack: MySQL, MSSQL, PostgreSQL, Stored Procedures, Git, SQL, PL/pgSQL, T-SQL
Data Engineer | ScaleupAlly, India Oct 2018 – July 2021

High-Performance Data Ingestion and Analytics Pipeline
Developed a high-performance data pipeline that integrated, transformed, and stored data from diverse sources to enable advanced analytics and machine learning. Leveraging Apache Sqoop, Pig, Hive, Hadoop HDFS, MapReduce, and Oozie, the project streamlined data processing while ensuring data quality, reliability, and performance; a partitioning and bucketing example is sketched after the tech stack below.

• Orchestrated data ingestion using Apache Sqoop, extracting data from APIs, AWS S3, Teradata, and other sources, automating the process with Oozie schedules.
• Designed ETL workflows with Apache Pig and Hive for data transformation and cleansing, enhancing data quality and consistency for downstream analytics.
• Implemented data storage solutions using Hadoop HDFS and optimized data retrieval by employing Hive partitioning and bucketing strategies.
• Ensured the reliability of the data pipeline through monitoring, maintenance, and log management, facilitating data availability.
• Applied performance optimization techniques, including distributed caching, partition pruning, and Map Side joins, to boost data processing efficiency.
• Played a key role in designing the data architecture, enabling efficient access to and retrieval of structured and unstructured data.
• Collaborated with the Tableau team to create visually engaging data reports and dashboards, empowering data-driven decision-making.

Tech Stack: Apache Sqoop, Apache Pig, Apache Hive, Hadoop HDFS, MapReduce, Oozie, Tableau, API, AWS S3, Scala
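
A minimal PySpark sketch of the partitioning and bucketing layout mentioned above; the database, table, and column names are hypothetical placeholders.

    # Minimal sketch: lay out a Hive-compatible table with date partitions and buckets
    # so downstream queries can prune partitions and use bucketed joins.
    # Database, table, and column names are hypothetical placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hive-layout").enableHiveSupport().getOrCreate()

    orders = spark.table("staging.orders_raw")  # placeholder staging table

    (
        orders.write.mode("overwrite")
        .partitionBy("ingest_date")        # enables partition pruning on date filters
        .bucketBy(32, "customer_id")       # enables bucketed joins on customer_id
        .sortBy("customer_id")
        .format("parquet")
        .saveAsTable("analytics.orders")   # placeholder target table
    )

    # Example query that benefits from partition pruning:
    spark.sql(
        "SELECT customer_id, SUM(amount) FROM analytics.orders "
        "WHERE ingest_date = '2021-01-15' GROUP BY customer_id"
    ).show()
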
Data Analyst | Webkul, India Sep 2016 – Aug 2018

• Implemented Apache NiFi to automate the Extract, Transform, Load (ETL) processes, enhancing data integration.
• Optimized query performance through efficient joins and data distribution strategies.
• Managed data extracts, refresh schedules, & implemented advanced Tableau features for enhanced data visualizations.
• Conducted comprehensive data analysis and business analysis tasks, transforming data from various sources into valuable insights for informed decision-making.

Tech Stack: Tableau, SQL, Apache NiFi, PostgreSQL, ETL (Extract, Transform, Load)


Education:
Master of Science in Computer Science | Wichita State University | Wichita, KS, USA


Affiliations:
IEEE – Eta Kappa Nu (HKN) Honor Society – Earned exclusive membership in recognition of exceptional academic performance.


Skills:
• Programming: Python, SQL, Shell Scripting
• Databases: MySQL, Microsoft SQL Server (MSSQL), PostgreSQL, MongoDB (NoSQL), Flat files
• Libraries: Pandas, NumPy, Scikit-learn, PyTorch, TensorFlow, NetworkX
• Reporting & Visualization: Tableau, Looker Studio (Data Studio), Statgraphics
• Big Data Technologies: Hadoop, Apache Spark (PySpark), dbt, Apache Kafka, Apache Sqoop, Apache Hive, Apache Pig, Oozie, YARN
• Cloud Platforms: Google Cloud Platform (GCP), Amazon Web Services (AWS)
• ETL Tools: Apache NiFi, Apache Airflow, Protegrity
• DevOps & CI/CD: Git, Bitbucket (Version Control), Docker, Jenkins
• Methodologies: Agile, Waterfall, Scrum


Additional Information:
Statistical Modeling and Data Analysis with Regression in Statgraphics July 2022

• Developed a regression model for strength data analysis, achieving a strong fit with an adjusted R-squared of 76.25%.
• Evaluated model significance and predictive capability using the F-ratio and 90% and 95% confidence intervals, validating its predictive usefulness.
• Conducted comprehensive statistical data analysis and advanced data visualization, including residual plots, to ensure model validity and absence of violations.
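
A minimal Python sketch of an equivalent regression workflow (the original analysis was done in Statgraphics); the strength dataset below is a randomly generated placeholder.

    # Minimal sketch of an equivalent regression workflow in Python; the data below
    # is a randomly generated placeholder standing in for the strength measurements.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(42)
    data = pd.DataFrame({
        "load": rng.uniform(10, 100, size=50),
        "temperature": rng.uniform(20, 80, size=50),
    })
    data["strength"] = 2.5 * data["load"] - 0.8 * data["temperature"] + rng.normal(0, 5, size=50)

    X = sm.add_constant(data[["load", "temperature"]])
    model = sm.OLS(data["strength"], X).fit()

    print(f"Adjusted R-squared: {model.rsquared_adj:.4f}")
    print(f"F-statistic: {model.fvalue:.2f} (p = {model.f_pvalue:.3g})")
    print(model.conf_int(alpha=0.10))  # 90% confidence intervals
    print(model.conf_int(alpha=0.05))  # 95% confidence intervals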

Multi-class Classification of Diamond Cut Quality with Deep Learning Apr 2022

• Designed and executed a feedforward neural network using Keras and feature engineering for predictive modeling.
• Performed exploratory data analysis (EDA), handling missing values, outliers, and data normalization during data cleaning.
• Trained the model using backpropagation and stochastic gradient descent (Adam) optimization.
• Tuned hyperparameters with GridSearchCV (scikit-learn) and evaluated model performance with accuracy, confusion matrix, and ROC AUC score metrics.
• Observed higher predictive accuracy for classes that constituted the majority of the original dataset.
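
A minimal Keras sketch of the kind of feedforward classifier described above; the feature matrix and five-class labels are random placeholders standing in for the engineered diamond attributes.

    # Minimal sketch of a feedforward classifier like the one described above; the
    # features and five-class labels are random placeholders for the diamond data.
    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 9)).astype("float32")   # placeholder engineered features
    y = rng.integers(0, 5, size=1000)                   # placeholder cut-quality labels (5 classes)

    model = keras.Sequential([
        keras.Input(shape=(9,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(5, activation="softmax"),
    ])

    # Adam is a stochastic-gradient-descent variant; backpropagation happens inside fit().
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    model.fit(X, y, epochs=10, batch_size=32, validation_split=0.2, verbose=0)
    print(model.evaluate(X, y, verbose=0))  # [loss, accuracy]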

Openfoodfacts Dimensional Modeling with GCP Jan 2022

• Designed and executed a data modeling project, including dimensional modeling, for efficient analytical querying and reporting.
• Extracted and stored raw data in Google Cloud Storage, utilizing BigQuery for data analysis and data warehouse enhancement.
• Developed an Extract, Transform and Load (ETL) pipeline with PySpark in Cloud Data Fusion to transform and load data into BigQuery, optimizing query performance via materialized views.
• Orchestrated & managed pipeline through scheduling in Cloud Composer (Airflow).

Election Data Modeling and Database Development Oct 2021

• Implemented an end-to-end MySQL database system for India's Prime Minister election, ensuring data precision & accessibility.
• Designed an Entity-Relationship Diagram (ERD) to visualize intricate data relationships, enhancing data analysis capabilities.
• Implemented a Star schema architecture with primary keys, attributes, and foreign keys to maintain data integrity and facilitate advanced querying for in-depth analysis.
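
A minimal sketch of a star schema like the one described above, run through Python's standard-library sqlite3 module (the original project used MySQL); the table and column names are hypothetical placeholders.

    # Minimal sketch of a star schema: dimension tables plus a fact table linked by
    # foreign keys. The original project used MySQL; sqlite3 keeps this self-contained.
    # Table and column names are hypothetical placeholders.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        PRAGMA foreign_keys = ON;

        -- Dimension tables
        CREATE TABLE dim_candidate (
            candidate_id INTEGER PRIMARY KEY,
            name         TEXT NOT NULL,
            party        TEXT NOT NULL
        );
        CREATE TABLE dim_constituency (
            constituency_id INTEGER PRIMARY KEY,
            name            TEXT NOT NULL,
            state           TEXT NOT NULL
        );

        -- Fact table referencing the dimensions via foreign keys
        CREATE TABLE fact_votes (
            vote_id         INTEGER PRIMARY KEY,
            candidate_id    INTEGER NOT NULL REFERENCES dim_candidate(candidate_id),
            constituency_id INTEGER NOT NULL REFERENCES dim_constituency(constituency_id),
            vote_count      INTEGER NOT NULL
        );
    """)

    # Typical analytical query over the star schema: total votes per party in a state.
    rows = conn.execute("""
        SELECT c.party, SUM(f.vote_count)
        FROM fact_votes f
        JOIN dim_candidate c     ON f.candidate_id = c.candidate_id
        JOIN dim_constituency co ON f.constituency_id = co.constituency_id
        WHERE co.state = 'Maharashtra'
        GROUP BY c.party
    """).fetchall()
    print(rows)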

