The ins and outs of landing a job as a data engineer

Did you know a data engineer should know so much about Amazon Web Services?
13 September 2022

French engineer and specialist in 3D data acquisition Emmanuel Durand uses his laser scanner on May 26, 2022, to map the architecture of the Kharkiv fire station built in 1887. Shells and missiles have been falling on cities since the start of the war in Ukraine, damaging historic buildings. Cultural services try to preserve their memory and record the damage with advanced laser technology and 3D scans. (Photo by Dimitar DILKOFF / AFP)

Modern business is heavily data-dependent. Demand for data engineers is expected to keep rising, because the entire data ecosystem at an organization, from analysts to data scientists, hinges on the data engineer’s ability to deliver expected outcomes.

The data engineer’s job is broad but critical. Data engineers need to gather disparate information from a variety of sources, separate useful data from non-critical information, and ideally work and learn independently so they can master a wide spectrum of digital tools and platforms.

With that knowledge funnel in mind, a professional data engineer has broken down the ins and outs of becoming a data engineer in a handy blog, along with some of the concepts and operating parameters you’ll need to understand.

Master The Basics

Ideally, you’ll have a basic understanding of data concepts like big data, cloud, data lakes, and data warehousing, as well as Python and SQL, the two most important languages for data analysts.

It’s also useful to have some knowledge of the Docker Engine API, Kubernetes containerized workloads, and the ETL data integration process. ETL stands for Extract, Transform and Load: data is collated from one or more sources, cleaned up and standardized, and loaded into an output data container. A data engineer must manage the flow of data properly so that it is constantly available and reliable. That is why an understanding of workflow management systems such as Apache Airflow or Prefect can help you stand out from the crowd.
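To make the three ETL stages concrete, here is a minimal sketch that uses only Python’s standard library. The sample data, the users table, and the function names are assumptions made up for illustration; a real pipeline would read from actual sources and typically run under a scheduler such as Airflow or Prefect.

```python
import csv
import io
import sqlite3

# Hypothetical raw input standing in for a real source file or export.
RAW_CSV = """name,city,signup_date
 Alice ,london,2022-01-05
BOB,Paris,2022-02-10
,Berlin,2022-03-01
"""


def extract(text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without a name, then tidy whitespace and casing."""
    cleaned = []
    for row in rows:
        name = row["name"].strip()
        if not name:
            continue  # skip incomplete records
        cleaned.append(
            (name.title(), row["city"].strip().title(), row["signup_date"].strip())
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users (name TEXT, city TEXT, signup_date TEXT)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(text, conn):
    """Run extract, transform and load in order, then return what was stored."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT name, city FROM users ORDER BY name").fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(RAW_CSV, connection))
```

Keeping each stage in its own function means one stage can be swapped out (for example, loading into a warehouse instead of SQLite) without touching the others.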

A significant portion of the job description will be building these ETL pipelines, so having some previous ETL development experience will matter. It doesn’t necessarily have to be on major big data projects for businesses – even independent programs released on GitHub or on a blog can help increase your chances of landing a data engineer gig.

To excel as a data engineering generalist, you should have the fundamentals down pat. These include data storage options like data lakes and data warehouses, and the 3 V’s of big data: volume, variety and velocity. Volume is the amount of information, variety is the range of data types, and velocity is how fast the data arrives and has to be processed. Knowing databases and REST APIs, meaning application programming interfaces that follow the REST (representational state transfer) architectural style, is also a must.
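As a rough illustration of the REST style, the sketch below builds a GET request for a collection of resources and parses a JSON response body. The base URL, the orders resource, the query parameters, and the sample response are all hypothetical; no request is actually sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL; a real API documents its own endpoints.
BASE_URL = "https://api.example.com/v1"


def build_list_request(resource, **filters):
    """Build (but do not send) a GET request for a collection of resources."""
    query = urlencode(sorted(filters.items()))
    url = f"{BASE_URL}/{resource}" + (f"?{query}" if query else "")
    return Request(url, method="GET", headers={"Accept": "application/json"})


def parse_items(body):
    """Pull the list of records out of a JSON response body.

    Assumes the hypothetical API wraps its records in an "items" key.
    """
    return json.loads(body)["items"]


# Made-up response body, in the shape assumed by parse_items.
SAMPLE_BODY = '{"items": [{"id": 1, "status": "shipped"}, {"id": 2, "status": "shipped"}]}'

if __name__ == "__main__":
    request = build_list_request("orders", status="shipped", limit=2)
    print(request.get_method(), request.full_url)
    print([item["id"] for item in parse_items(SAMPLE_BODY)])
```

In REST, the URL names the resource and the HTTP method names the action, so the same orders URL could also accept a POST to create a record.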

The AWS Factor

More often than not, a solid comprehension of SQL, Python, open-source Linux, and cloud services provider Amazon Web Services (better known as AWS) should be enough to secure you an entry-level data engineer job with a decent salary.

Learning the basics of AWS is key to working in this industry, as cloud computing, networking, and database services have revolutionized storage in the last few years. Developing an application or environment in AWS can also give a data engineer a leg up when working with Microsoft’s Azure and Google Cloud, the next biggest cloud service providers after Amazon, because the basic ideas behind these well-recognized cloud vendors are essentially the same. One example is the choice between creating data volumes for block storage and using a network file system (NFS), a client/server protocol for accessing and viewing files on remote devices.

The Best of Both Worlds

Finally, if you’re both a good programmer and a good public speaker, you’ll be more likely to be selected as a data engineer than someone who is only one of these things. And while data engineering does involve writing code, you don’t need to be considered a full-fledged coder. It’s much more important to be able to pick things up on the fly and apply them with a high degree of abstraction.

That means being able to construct DRY (Don’t Repeat Yourself) code, with classes and functions that can be built, extended, and reused in a modular manner. Working more efficiently in the Linux operating system and being comfortable with bash commands will also round out your skillset. Companies also tend to appreciate data engineers who know other languages like Scala, Java, R, and C.
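Here is one way the DRY idea might look in practice: small cleaning steps are written once and then combined, extended, and reused instead of being copied into every script. The Cleaner class, the step functions, and the sample rows are invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Step = Callable[[Row], Row]


def strip_strings(row: Row) -> Row:
    """Remove surrounding whitespace from every string value."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}


def lowercase_keys(row: Row) -> Row:
    """Normalize column names to lowercase."""
    return {k.lower(): v for k, v in row.items()}


def tag_source(source: str) -> Step:
    """Return a step that records where each row came from."""
    def step(row: Row) -> Row:
        return {**row, "source": source}
    return step


@dataclass
class Cleaner:
    """Applies a list of steps, in order, to every row."""
    steps: List[Step]

    def extended(self, *more: Step) -> "Cleaner":
        """Build a new Cleaner that reuses these steps and adds more."""
        return Cleaner(self.steps + list(more))

    def apply(self, rows: Iterable[Row]) -> List[Row]:
        cleaned = []
        for row in rows:
            for step in self.steps:
                row = step(row)
            cleaned.append(row)
        return cleaned


# The shared steps are defined once and reused for each data source.
base_cleaner = Cleaner([strip_strings, lowercase_keys])
crm_cleaner = base_cleaner.extended(tag_source("crm"))

if __name__ == "__main__":
    print(base_cleaner.apply([{"Name": " Ann "}]))
    print(crm_cleaner.apply([{"Name": " Ann "}]))
```

If a cleaning rule changes, it is fixed in one function and every cleaner that uses it picks up the change.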

In engineering, hands-on practice carries much more weight than theory and book smarts. So focus on self-started projects, add a touch of soft skills to complement your subject matter expertise, and bring a little of your uniqueness to any application. A data engineer often has to be self-sufficient and independent, but also has to work well in teams.

If you can do all this at once, then maybe – just maybe – you’ll find a place in the company you want to work for as a data engineer.