Data Engineer OR Data Analyst Resume
Desired Industry: Computer Software/Programming
SpiderID: 84360
Desired Job Location: Jacksonville, Florida
Date Posted: 10/11/2023
Type of Position: Contractor
Availability Date:
Desired Wage:

U.S. Work Authorization: Yes
Job Level: Experienced, with over 2 years' experience
Willing to Travel: Yes, Less Than 25%
Highest Degree Attained: Masters
Willing to Relocate: Yes
Experience:

Data Analytics Engineer, American Tire Distributors, Inc., Huntersville, NC, June 2019 – Present
● Built data pipelines to load data from MS SQL Server, via Cloud Storage buckets, into BigQuery on Google Cloud Platform (Python, MS SQL, Google Cloud Storage, BigQuery)
● Built a framework to automate data quality testing of flat files (Python)
● Built a third-party file monitoring and loading framework to load data into BigQuery (Python)
● Built a scheduler framework using Airflow and deployed jobs through a CI/CD pipeline (Python, Git, Airflow)
● Built a framework to stream data from Oracle databases (GoldenGate) into BigQuery using Google Cloud technologies such as Confluent Cloud Kafka, Google Pub/Sub, and Cloud Functions
● Built a scalable framework to stream real-time data from Oracle databases into Postgres and BigQuery using Kafka deployed on managed Kubernetes clusters
● Automated the code generation and testing process for ingesting data into new BigQuery tables using Python
● Performed ad-hoc analytics to analyze and predict customer behavior using unsupervised learning (k-means clustering)
● Built a framework for on-demand Docker image runs using Google Container Registry (Docker, Python, cloud instances)
● Built a framework to download, harvest, and scrape data from 2,000+ sources
● Built a framework to extract data from different source types such as CSV, Excel, PDF, and web tables
● Built a Temporal workflow management tool to execute jobs
● Built a framework to process 2,000+ unstructured PDFs, storing the data in Elasticsearch to make search and extraction more efficient
● Built internal websites and dashboards using Appsmith
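One bullet above mentions customer segmentation with k-means clustering. As a hedged illustration only, the toy data and function names below are hypothetical and not the production code, the core of that technique can be sketched in pure Python:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means sketch: assign each point to its nearest centroid,
    then recompute centroids, repeating for a fixed number of iterations."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # keep the old centroid if a cluster goes empty
                centroids[i] = tuple(sum(xs) / len(cl) for xs in zip(*cl))
    return centroids, clusters

# Hypothetical "customer behavior" vectors: (orders per month, avg basket size)
data = [(1, 10), (2, 12), (1, 11), (9, 90), (10, 95), (11, 88)]
centroids, clusters = kmeans(data, k=2)
```

With two well-separated groups like this, the loop settles into one low-activity and one high-activity segment; real segmentation work would use many more behavioral features and a library implementation such as scikit-learn's.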
Sr. Data Analyst/Engineer, Bluestem Brands, Inc., Minneapolis, MN, Jan 2017 – June 2019
● Analyzed credit reporting data, then architected and built a scalable application that cut manual effort and improved productivity by more than 200%, reducing costs (Python, Selenium)
● Investigated data flow across internal and external applications and predicted yearly and monthly credit bureau dispute volumes; recommended process improvements for cost savings
● Analyzed collections data, then architected and built data pipelines and batch jobs (HiveQL, Python, HDFS, ActiveBatch)
● Analyzed collections data and delivered daily/weekly/monthly reports that helped the business monitor key metrics and underperforming areas (Python, MySQL, HiveQL)
● Analyzed dialer results and decreased cost per credit application through strategic changes (Python, MySQL)
● Analyzed data and made strategic changes to the new customer acquisition process, cutting costs (Python, SQL)
● Analyzed fraud data (peak vs. non-peak season) across samples, provided insights in fraud-related reports, and recommended improvements that reduced average delinquency rates
● Recommended traditional and nontraditional methods of investigating fraud (web and phone orders) to fraud investigators
● Analyzed retail, credit, web, and payments data and recommended modifications to fraud detection and monitoring systems (HiveQL, MySQL, HDFS, Python)
● Built a fraud monitoring system (proof of concept) that predicted potentially fraudulent activity from customer behavior, improving customer satisfaction (Python, SQL, HiveQL, Spark, scikit-learn: k-means, SVM, Isolation Forest)
● Partnered with the product owner and coordinated with the Collections, Credit IT, and Credit Bureau Reporting teams
● Partnered with the product owner on analysis and automated the reporting process
● Built and monitored technical programs for the Collections, Credit Bureau Reporting, and Fraud teams
● Served as a liaison between technical and non-technical teams
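The fraud-monitoring bullet above names Isolation Forest and SVM models. A far simpler stand-in, shown purely as a hedged sketch with made-up numbers, is median/MAD outlier flagging, which captures the same basic idea of scoring transactions for anomaly:

```python
import statistics

def flag_anomalies(amounts, threshold=3.5):
    """Flag amounts far from the median, measured in units of the median
    absolute deviation (MAD). A deliberately simple stand-in for the
    Isolation Forest / one-class SVM scoring described above; real scoring
    would use many behavioral features, not a single amount."""
    med = statistics.median(amounts)
    mad = statistics.median(abs(a - med) for a in amounts)
    if mad == 0:  # all values essentially identical, nothing to flag
        return []
    return [a for a in amounts if abs(a - med) / mad > threshold]

# Hypothetical order amounts; 980 is the obvious outlier.
orders = [25, 30, 27, 22, 31, 26, 29, 24, 980]
suspicious = flag_anomalies(orders)
# suspicious == [980]
```

MAD-based flagging is robust to the outliers themselves (a single huge value barely moves the median), which is why it is often preferred over plain z-scores for a quick first pass.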
Data Science Intern, GoFind.ai, Berkeley, CA, May 2016 – Sept. 2016
● Designed and built a scalable Spark architecture to distribute data across clusters using Spark ML and MLlib, and deployed batch jobs on Amazon EMR clusters and EC2 instances
● Designed workflows for processing 3 million images, implemented feature extraction and dimensionality reduction techniques, and improved classification accuracy for retail categories using TensorFlow machine learning models

Assistant Systems Engineer, Tata Consultancy Services, Bangalore, India, Oct. 2013 – Dec. 2014
● Developed machine learning algorithms to improve product search results
● Developed a web admin tool for the client to retrieve data on HP's printers and personal systems
● Analyzed and reported real-time data from Google APIs into MongoDB using MongoDB and Java, and scheduled batch jobs
● Developed MapReduce programs to segregate live streaming messages and pipelined the output into HDFS through Apache Kafka

Transaction Risk Analyst, Amazon.com, Bangalore, India, Jul. 2013 – Sep. 2013
● Analyzed real-time order-level credit card, prepaid card, and gift card transactions using Amazon's analytics tools to detect risk patterns, and documented operational patterns
● Made account decisions using traditional and nontraditional methods based on customers' historical web activity, banking information, e-commerce, and social media data
● Coordinated with machine learning teams in building robust fraud detection applications
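The MapReduce bullet under Tata Consultancy Services can be illustrated with a toy map/shuffle/reduce pass in plain Python. The message format here is hypothetical; the actual job ran as Hadoop MapReduce, fed by Apache Kafka:

```python
from collections import defaultdict

# Hypothetical log-style messages; the real job consumed these from Kafka.
messages = [
    "ERROR disk full",
    "INFO user login",
    "ERROR timeout",
    "WARN high latency",
    "INFO user logout",
]

# Map phase: emit (severity, 1) pairs, like a Hadoop mapper.
mapped = [(msg.split(" ", 1)[0], 1) for msg in messages]

# Shuffle phase: group values by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: count messages per severity.
counts = {key: sum(values) for key, values in groups.items()}
# counts == {"ERROR": 2, "INFO": 2, "WARN": 1}
```

Segregating messages by a key and aggregating per group is exactly the shape of a word-count-style MapReduce job, just without the distributed runtime.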
Education:

South Dakota State University, Brookings, SD, 2015 – 2016
Master of Science, Data Science [GPA: 3.7]
Courses: Big Data Analytics, SAS Programming, Data Warehousing/Data Mining, Modern Applied Statistics, Statistical Programming, Predictive Analytics, Nonparametric Statistics, Programming for Data Analytics

SRM University, Chennai, India, 2009 – 2013
Bachelor of Technology, Mechatronics
Skills:

Machine Learning: Predictive Modeling, Data Pre-Processing, Natural Language Processing (NLP), Neural Networks/Deep Learning
Statistics: Hypothesis Testing, Confidence Intervals, Inference, Bayesian Modeling, Markov Chain Monte Carlo, Time Series
Programming: R, Python (pandas, scikit-learn, SciPy, NumPy), MATLAB, SAS, Hive, Apache Spark, TensorFlow
Databases: MongoDB, Cassandra, MySQL, MS SQL
Tools & Applications: Hortonworks, Cloudera, SAS Enterprise Miner, SAS Enterprise Guide, SAP Business One, Tableau