Location: Hong Kong | Phone: +852 6219 7438 | Email: hckkelvin@gmail.com | LinkedIn: linkedin.com/in/hckkelvin
A diligent software engineer in commercial application development, creating and executing innovative software solutions to enhance business productivity.
Enjoys applying and developing computing and programming skills in practice, and solving problems by managing and analysing data.
Highly experienced in all aspects of the software development life cycle, including producing high-quality documentation for clients.
Engineered modern applications with Scala, Python, Apache HBase, Apache Hadoop, Apache Spark, Apache ECharts.
Built innovative Apache Kafka microservices on top of Kubernetes to stream millions of records in real time.
Installed a PostgreSQL cluster with auto-failover, backup scripts and monitoring scripts.
Implemented ETL services with Kettle and SSIS; improved the performance of existing programs by 30% by redesigning the merge join logic.
Provided high-quality, well-organized documentation.
Deployed and integrated software engineered by the team, and updated integration and deployment scripts to improve continuous integration practices.
Final Year Project: Mobile Application
Second-class, lower division
International Foundation Programme in Mathematics and Economics
Electives: Geography, Information and Communication Technology
I have used Apache Kafka to help clients handle millions of records from different database sources and files and transmit them to Microsoft Azure in real time.
I helped clients install a PostgreSQL cluster with auto-failover, monitoring and backup scripts. I also helped clients perform both major and minor upgrades to their PostgreSQL clusters.
I helped clients build an Apache Kafka cluster on top of Kubernetes that handled millions of records from different types of databases and transmitted them to Microsoft Azure in real time. Based on clients' requests, the users' records are stored in Azure SQL Database, while the attachments of those records are stored in Azure Blob Storage.
Since Apache Kafka 2.7.0 was used in these projects, Apache ZooKeeper had to be installed manually; Kafka relies on it to coordinate the cluster.
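The routing described above, with record fields going to a relational store and attachments going to object storage, can be sketched roughly as follows. This is only an illustration: the function `split_record`, the `attachments` key and the field names are hypothetical and do not reflect the clients' actual message schema, and the sketch stops short of calling any Kafka or Azure API.

```python
from typing import Any, Dict, List, Tuple


def split_record(record: Dict[str, Any]) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Separate a consumed record into a database row and its attachments.

    The "attachments" key and the field names are hypothetical; a real
    message schema would differ.
    """
    # Everything except the attachments becomes the row for the SQL store.
    row = {key: value for key, value in record.items() if key != "attachments"}
    # Each attachment gets a blob name prefixed with the record id.
    blobs = [
        {"blob_name": f"{record['id']}/{item['filename']}", "content": item["content"]}
        for item in record.get("attachments", [])
    ]
    return row, blobs


message = {
    "id": 42,
    "name": "Alice",
    "attachments": [{"filename": "cv.pdf", "content": b"%PDF"}],
}
row, blobs = split_record(message)
print(row)    # would be written to the relational store
print(blobs)  # would be uploaded to object storage
```

Keeping the split as a pure function makes it easy to test separately from the consumer and from the storage clients.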
Technology involved
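Note: Starting from Apache Kafka 2.8.0, Kafka can run on a self-managed quorum (KRaft) instead of Apache ZooKeeper. This mode was introduced as an early-access feature in 2.8.0 and was declared production-ready in 3.3, so newer clusters no longer need a separate Apache ZooKeeper installation.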
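The version rule in the note can be expressed as a small check. This is a minimal sketch: the helper names `parse_version` and `can_run_without_zookeeper` are hypothetical, and the code only compares version numbers rather than inspecting a real cluster.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a version string such as "2.7.0" into a comparable tuple (2, 7, 0)."""
    return tuple(int(part) for part in version.split("."))


def can_run_without_zookeeper(kafka_version: str) -> bool:
    """Return True when the Kafka version offers the self-managed quorum (>= 2.8.0)."""
    return parse_version(kafka_version) >= (2, 8, 0)


print(can_run_without_zookeeper("2.7.0"))  # False
print(can_run_without_zookeeper("2.8.0"))  # True
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "2.10.0" as older than "2.8.0".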
Apache Kafka is a distributed event streaming platform used for real-time data integration and streaming data pipelines. It was originally developed at LinkedIn for activity stream data and operational metrics, and was open-sourced through Apache in early 2011. It is a distributed, partitioned messaging system that enables trillions of messages per day to be processed and delivered to numerous receivers in real time. It is also highly fault-tolerant and highly scalable.
Netflix has two sets of Apache Kafka clusters: Fronting Kafka and Consumer Kafka.
Fronting Kafka clusters are in charge of obtaining messages from producers, which essentially every Netflix application instance is. They serve as data collectors and buffers for systems farther down the line.
Consumer Kafka clusters contain a subset of topics routed by Samza for real-time consumers.
In 2016, Netflix already operated 36 Kafka clusters consisting of more than 4,000 broker instances across both Fronting Kafka and Consumer Kafka, ingesting more than 700 billion messages per day.
Spotify is a digital music, podcast, and video service that gives you access to millions of songs and other content from creators all over the world. It had 82 million songs on its platform and 422 million active users as of the first quarter of 2022.
Even though Spotify decided to migrate from Apache Kafka to Google Cloud Pub/Sub in 2016, the way Spotify used Apache Kafka on its platform is still intriguing to observe. Whenever a user listens to a specific song or podcast, Spotify logs the experience as an event and uses it as data to learn more about the user's preferences. Additionally, Spotify records infrastructure-level events, such as when a logging server runs out of disk space. More than 300 different types of events are gathered from Spotify users.
I helped clients build a PostgreSQL cluster with a master node and a standby node. Backup scripts and log rotation scripts are also provided and scheduled with cron. Furthermore, since an auto-failover script is provided, downtime for the cluster is kept to a minimum. Once the master node is down, the auto-failover script is executed within a minute (this interval can be modified based on the client's preference). The standby node is then promoted to master node and two alert emails are sent to the client: one about the failure of the master node and one about the result of the failover.
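The failover flow described above can be outlined as a single health check. This is a sketch under stated assumptions, not the script delivered to clients: `run_failover_check` and its three callables are hypothetical hooks that a real deployment would wire to a connection probe, a promotion command and a mailer.

```python
from typing import Callable, List


def run_failover_check(
    master_is_up: Callable[[], bool],
    promote_standby: Callable[[], bool],
    send_alert: Callable[[str], None],
) -> str:
    """Run one health check and fail over if the master is down.

    All three callables are hypothetical hooks supplied by the caller.
    """
    if master_is_up():
        return "healthy"
    # First alert: the master node has failed.
    send_alert("Master node is down")
    promoted = promote_standby()
    # Second alert: the outcome of the failover attempt.
    send_alert("Failover succeeded" if promoted else "Failover failed")
    return "promoted" if promoted else "failed"


sent: List[str] = []
result = run_failover_check(
    master_is_up=lambda: False,    # simulate a failed master
    promote_standby=lambda: True,  # simulate a successful promotion
    send_alert=sent.append,
)
print(result, sent)
```

A check like this would typically be run on a schedule, for example once a minute from cron, which matches the interval mentioned above.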
Technology involved