Software Engineer - Spark DB backend

Date: May 11, 2023

Location: Pune, MH, IN

Company: Houghton Mifflin Harcourt

Software Engineer – Spark and Database for the HMH Reporting Platform

HMH Software Engineering
HMH Software Engineering provides cutting-edge, individualized learning experiences to millions of
students across the United States. We are as driven by this mission as we are by continuously
improving ourselves and the way we work. Our offices are high energy, collaborative beehives of
activity where work is centered on small, autonomous teams that build great software. We trust each
other, hold ourselves and our teammates accountable for results, and improve student outcomes with
each release.


At HMH we constantly experiment with new approaches and novel ways of solving problems. We
often succeed and sometimes stumble — either way we learn and move forward with more confidence
than we had the day before. We are as passionate about new technologies and engineering
craftsmanship as we are about transforming the EdTech industry itself.
If this sounds like you, let’s talk.


The Opportunity – Spark and Database Software Engineer for HMH Reporting
Software Engineers personify the notion of constant improvement as they work with their team to
build software that delivers on our mission to improve student outcomes. You’re not afraid to try new
things even if they don’t work out as expected. You are independent, self-directed, and high-energy, and as
eager to contribute to your team as you are to progress on your own path to software
craftsmanship. You’ll thrive working in a fast-paced, low friction environment where you are exposed
to a wide range of cutting-edge technologies.

Reporting Platform:
You will be working on the Reporting Platform, part of the HMH Educational Online/Digital
Learning Platform, using cutting-edge technologies. The Reporting team builds a highly scalable
and available platform based on a microservices architecture: Java backend microservices, a React
JavaScript UI frontend, REST APIs, an AWS RDS Postgres database, AWS Cloud technologies, AWS
Kafka, AWS Kinesis, Spark with Scala, Kubernetes or Mesos orchestration, the Apache Airflow
scheduler, DataDog for logging/monitoring/alerting, Concourse CI or Jenkins, Maven, etc.



Responsibilities:
• Implement complex queries and stored procedures to support REST APIs and batch rollups of
report data for customer organizations.
• Design, write, test, implement, and maintain database applications and procedures using SQL
or other database programming languages.
• Resolve performance issues and tune database systems, queries, and indexing.
• Use the Apache Airflow scheduler to set up DB jobs to run automatically.
• Support streaming event processing using the Spark framework with Scala.
• Create and manage data import and export (ETL) processes into the databases, and create and
manage data integration scripts using file transfers, API calls, and/or other methods.
• Develop solutions using AWS database technologies such as RDS Postgres and Aurora Postgres.
• Provide support for the systems architecture of the Reporting Platform.
• Set up monitoring dashboards and alerts using DataDog to proactively catch issues.
• Diagnose and troubleshoot database errors.
• Create automation for repeating database tasks.
• Apply DevOps knowledge to automate deployments using Jenkins or Concourse.
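To give a flavor of the rollup work described above, here is a minimal, hypothetical sketch of a set-based batch rollup of per-organization report data — the kind of query an Airflow job would run on a schedule. It uses SQLite from the Python standard library purely as a stand-in for RDS Postgres, and every table and column name is invented for illustration:

```python
# Hypothetical sketch: nightly batch rollup of report data per organization.
# SQLite stands in for RDS Postgres; schema and names are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Raw fact data as it might arrive from the learning platform.
    CREATE TABLE assessment_scores (
        org_id     INTEGER,
        student_id INTEGER,
        score      REAL
    );
    -- Rollup target: per-organization aggregates served by the REST APIs.
    CREATE TABLE org_score_rollup (
        org_id       INTEGER PRIMARY KEY,
        num_students INTEGER,
        avg_score    REAL
    );
""")

conn.executemany(
    "INSERT INTO assessment_scores VALUES (?, ?, ?)",
    [(1, 100, 80.0), (1, 101, 90.0), (2, 200, 70.0)],
)

# The rollup itself: recompute aggregates in a single set-based statement
# rather than row by row, so it scales to large customer organizations.
conn.execute("""
    INSERT OR REPLACE INTO org_score_rollup
    SELECT org_id, COUNT(DISTINCT student_id), AVG(score)
    FROM assessment_scores
    GROUP BY org_id
""")

rows = conn.execute(
    "SELECT org_id, num_students, avg_score FROM org_score_rollup ORDER BY org_id"
).fetchall()
print(rows)  # → [(1, 2, 85.0), (2, 1, 70.0)]
```

In production this statement would live in a stored procedure or a versioned migration, with Airflow triggering it and DataDog alerting on failures or runtime regressions.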

Skills & Experience
Successful candidates must demonstrate an appropriate combination of:
• 3+ years of experience as a DB Developer, preferably with Postgres, creating and supporting
commercial data warehouses and data marts.
• 2+ years of experience working with Apache Spark and Scala development.
• 2+ years of experience in Java backend services programming.
• Experience working with Airflow schedulers and Python development is a plus.
• Strong hands-on knowledge of managing databases on AWS, including RDS and Aurora.
• Strong command of SQL, SQL server tools, and ETL jobs, including stored procedures.
• Database technologies such as SQL, Aurora, Redshift, Liquibase, or Flyway.
• Cloud technologies such as AWS and Azure.
• Data center operating technologies such as Apache Mesos, Apache Aurora, and Terraform, and
container services such as Docker and Kubernetes.
• Advanced knowledge of database security and performance monitoring standards.
• Understanding of relational and dimensional data modeling.
• Shell scripting skills.
• Knowledge of DataDog for setting up monitoring and alerting dashboards.
• Working knowledge of Jenkins or Concourse tool for CI/CD.
• Ability to work independently and in a group to provide sound design and technology solutions.
• Self-starter attitude with initiative & creativity.

• Ability to pay attention to detail while dealing with interruptions and changing timelines.
• Ability to communicate and work effectively with all levels of company.
• A related AWS DBA certification is preferred.
• Working knowledge of Airflow is a plus.
• Knowledge of AWS Database Migration Service and Lambda is a plus.

Required Education:
• A BS/MS in Computer Science, Computer Engineering, or a STEM field