Interview

Day to day work

  • Work1: Access management of various tools
AWS: Use AWS IAM to create users and groups, assign roles, and manage permissions.
New Relic: Use the New Relic account settings to invite users and assign roles.
Linux Servers: Manage users and permissions using adduser, usermod, and chmod commands.
Grafana: Manage users and roles via the Server Admin section in the Grafana UI.
ELK (Elasticsearch, Logstash, Kibana): Use Elasticsearch's role-based access control (RBAC) for user management.
Azure: Use Azure Active Directory (AD) to create users, groups, and assign roles.
Kibana: Configure user roles and permissions via the Kibana UI using Elasticsearch's RBAC.
  • Work2: Patching and upgrading of Linux servers
  • Work3: Automating SRE work using Ansible and Terraform code
  • Work4: Being part of on-call for P1/P2/P3 incidents and postmortem/RCA work
  • Work5: Monitoring infrastructure using New Relic and its alerts service
  • Work6: Database server management, query optimization, backups, and DB upgrades
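The Linux part of Work1 can be sketched with standard commands; the user, group, and directory names below are hypothetical examples, and the account-creation commands are shown as comments because they need root.

```shell
#!/bin/sh
# Sketch of Linux access management (Work1). "appuser", "developers",
# and the directory below are hypothetical examples.

# Creating a user and adding it to a group normally requires root:
#   adduser appuser
#   usermod -aG developers appuser

# Permission management with chmod, demonstrated on a scratch directory:
mkdir -p /tmp/app-config
chmod 750 /tmp/app-config          # owner rwx, group rx, others none
stat -c '%a' /tmp/app-config       # prints: 750
```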

Types of Roles in AWS

  1. Service Roles: Roles assumed by AWS services to perform tasks on your behalf.
  2. IAM Roles for Cross-Account Access: Roles allowing access between different AWS accounts.
  3. IAM Roles for Identity Federation: Roles allowing external identities to access AWS resources.
  4. AWS Lambda Execution Role: Roles that grant permissions to Lambda functions to interact with AWS services.
  5. AWS Elastic Beanstalk Service Role: Roles allowing Elastic Beanstalk to manage AWS resources for applications.
  6. IAM Roles for EC2: Roles assigned to EC2 instances to grant permissions for AWS resource actions.

Types of Permissions in AWS

  1. Actions: Specific operations that can be performed on AWS resources (e.g., s3:ListBucket).
  2. Resources: AWS entities that actions apply to (e.g., arn:aws:s3:::example-bucket).
  3. Effect: Specifies whether the action is allowed or denied (Allow or Deny).
  4. Conditions: Optional criteria that determine when the policy is in effect (e.g., based on IP address).
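The four elements above combine into a single policy statement. A minimal sketch, assuming a hypothetical bucket name and IP range:

```shell
#!/bin/sh
# Sketch of an IAM policy combining Action, Resource, Effect, and
# Condition. The bucket name and IP range are hypothetical examples.
cat > /tmp/example-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::example-bucket",
      "Condition": {
        "IpAddress": { "aws:SourceIp": "203.0.113.0/24" }
      }
    }
  ]
}
EOF
# With AWS credentials you could attach it via:
#   aws iam create-policy --policy-name ExampleListBucket \
#     --policy-document file:///tmp/example-policy.json
python3 -m json.tool /tmp/example-policy.json > /dev/null && echo "policy JSON valid"
```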

Common AWS Managed Policies

  1. AdministratorAccess: Full access to all AWS services and resources.
  2. AmazonS3FullAccess: Full access to all S3 resources.
  3. AmazonEC2FullAccess: Full access to all EC2 resources.
  4. AmazonDynamoDBFullAccess: Full access to DynamoDB resources.
  5. AmazonRDSFullAccess: Full access to RDS resources.
  6. AWSLambdaExecute: Permissions for Lambda functions to execute and manage CloudWatch logs.
  7. AmazonVPCFullAccess: Full access to manage VPC resources.
  8. ReadOnlyAccess: Read-only access to all AWS services and resources.

Custom Policies

  • Custom Policy: User-defined policies that specify precise permissions tailored to specific needs.

Work5: Monitoring Infrastructure using New Relic and Alerts Service

Setting Up Alerts in New Relic

  • Login to New Relic: Navigate to the Alerts & AI section.
  • Create Alert Policy: Create a new alert policy or select an existing one.
  • Add Conditions: Add the conditions for the alerts mentioned below using New Relic Query Language (NRQL).
  • Configure Notifications: Set up notification channels (email, Slack, etc.) to receive alerts.
  • Save and Enable: Save the alert conditions and enable the alert policy.
  • Top 10 Alerts for Monitoring Infrastructure using New Relic

    • CPU Utilization Alert: Alerts when CPU usage exceeds 80%.
    • Memory Usage Alert: Alerts when memory usage exceeds 80%.
    • Disk Space Alert: Alerts when disk space usage exceeds 90%.
    • Network Latency Alert: Alerts when network latency exceeds 1000ms.
    • Application Response Time Alert: Alerts when application response time exceeds 2000ms.
    • Error Rate Alert: Alerts when the application error rate increases.
    • Service Down Alert: Alerts when a critical service (e.g., nginx) is down.
    • Database Performance Alert: Alerts when database query time exceeds 500ms.
    • Disk I/O Alert: Alerts when disk I/O throughput exceeds 100,000 bytes per second.
    • HTTP Error Rate Alert: Alerts when HTTP error rate (4xx, 5xx) increases.
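For example, the CPU and memory conditions above might be written as NRQL against the Infrastructure agent's `SystemSample` event; the 5-minute window below is an assumed threshold duration.

```sql
-- CPU Utilization Alert: critical when average(cpuPercent) > 80 for 5 minutes
SELECT average(cpuPercent) FROM SystemSample FACET hostname

-- Memory Usage Alert: critical when average(memoryUsedPercent) > 80 for 5 minutes
SELECT average(memoryUsedPercent) FROM SystemSample FACET hostname
```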

    Day-to-day activity

    Working on client projects with Palmeto Solutions; I have worked with different clients over the last few years.

    My role and responsibilities change with each client.

    Work is assigned to me via Jira tickets, for tasks like automating infrastructure using Terraform and Ansible.

    Involved in P1 and P2 incident calls and postmortems, providing the resolution and root cause.

    Monitoring and observability using New Relic.

    Working on server configuration automation using Ansible, such as patch upgrades across multiple servers.
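A patch-upgrade playbook for that work might look like the following sketch; the "all_servers" host group and the reboot handling are assumptions for illustration (Debian/Ubuntu hosts assumed).

```yaml
---
# Sketch of an Ansible patching playbook. "all_servers" and the
# reboot step are illustrative assumptions.
- name: Patch and upgrade Debian/Ubuntu servers
  hosts: all_servers
  become: yes

  tasks:
    - name: Update APT package index
      apt:
        update_cache: yes

    - name: Upgrade all packages to the latest version
      apt:
        upgrade: dist

    - name: Check whether a reboot is required
      stat:
        path: /var/run/reboot-required
      register: reboot_required

    - name: Reboot if a kernel or library update requires it
      reboot:
      when: reboot_required.stat.exists
```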

    Using Terraform for infrastructure creation.
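A minimal Terraform sketch of infrastructure creation; the region, AMI ID, instance type, and tag are hypothetical placeholder values.

```hcl
# Sketch of Terraform infrastructure creation. All values below are
# placeholder assumptions, not a real deployment.
provider "aws" {
  region = "us-east-1"
}

resource "aws_instance" "web" {
  ami           = "ami-0123456789abcdef0" # placeholder AMI ID
  instance_type = "t3.micro"

  tags = {
    Name = "sre-example"
  }
}
```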

    Apart from this, I work on database administration; I have experience with databases like SQL Server, Teradata, Azure Synapse, and Azure Hyperscale, managing their servers and working on performance.

    What work do I do?

    Troubleshooting and postmortems

    Working on call for multiple P1 and P2 issues

    Taking care of SRE work, like checking server, infrastructure, and application issues

    Looking at different aspects like network troubleshooting, performance troubleshooting, disk troubleshooting, and logging and monitoring troubleshooting

    Tasks of an SRE

    https://www.cloudopsnow.in/main-tasks-of-an-sre-site-reliability-engineer

    Check the logs of a virtual machine: look in /var/log, where the system log files reside.

    We can check the syslog files, which are very important, using the grep command.

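The grep pattern can be sketched as follows; on a real VM you would point grep at /var/log/syslog, so a small sample file stands in for it here.

```shell
#!/bin/sh
# Sketch of checking syslog with grep. On a real VM you would run, e.g.:
#   grep -i "error" /var/log/syslog | tail -n 20
# A small sample log file stands in for /var/log/syslog here.
cat > /tmp/sample-syslog <<'EOF'
Jun 01 10:00:01 host1 systemd[1]: Started Daily apt upgrade.
Jun 01 10:05:42 host1 kernel: Out of memory: Kill process 4321
Jun 01 10:06:10 host1 sshd[999]: error: maximum authentication attempts exceeded
EOF
grep -ci "error" /tmp/sample-syslog    # prints: 1
```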

    https://www.cloudopsnow.in/to-check-the-logs-of-a-virtual-machine-in-linux

    Types of errors encountered in a virtual machine (VM) in Linux

    https://www.cloudopsnow.in/types-of-errors-encounter-in-virtual-machine-vm-in-linux

    Check system performance metrics like CPU, RAM, disk, I/O, and network.
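A quick sketch of those checks using /proc and df; the iostat and ss commands are shown as comments since they may not be installed everywhere.

```shell
#!/bin/sh
# Sketch of quick system-performance checks on a Linux host.
echo "load average: $(cut -d' ' -f1-3 /proc/loadavg)"
echo "mem total kB: $(grep MemTotal /proc/meminfo | awk '{print $2}')"
df -h / | tail -n 1                # disk usage of the root filesystem
# For I/O and network detail you would typically add:
#   iostat -x 1 3
#   ss -s
```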

    https://www.cloudopsnow.in/to-check-system-performance-metrics-in-linux

    Check the application using systemctl, service, ps, tail, and grep, and check its logs.

    https://www.cloudopsnow.in/troubleshooting-an-application-in-linux

    Common errors encountered for applications in Linux

    https://www.cloudopsnow.in/common-errors-encounter-for-applications-in-linux/

    Troubleshooting a web server

    https://www.cloudopsnow.in/troubleshooting-a-web-server-in-linux

    Common errors encountered in a web server in Linux

    https://www.cloudopsnow.in/common-errors-encounter-in-web-server-in-linux/

    Troubleshooting a database server in Linux

    https://www.cloudopsnow.in/troubleshooting-a-database-server-in-linux/

    Capacity planning

    Generating synthetic load and checking the response time.

    For example, if we put a load of 1 lakh (100,000) requests and check the response time, we can figure out how many servers are needed.

    To create this traffic, we can use JMeter for performance testing.
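The sizing logic above reduces to simple arithmetic; the peak rate and the per-server capacity below are hypothetical numbers you would measure with a JMeter run.

```shell
#!/bin/sh
# Sketch of capacity-planning arithmetic. Both figures are hypothetical;
# per-server capacity would come from a JMeter measurement, e.g.:
#   jmeter -n -t loadtest.jmx -l results.jtl
peak_rps=2000          # expected peak requests per second (assumed)
per_server_rps=500     # measured capacity of one server (assumed)
# Ceiling division: round up so peak load is always covered.
servers=$(( (peak_rps + per_server_rps - 1) / per_server_rps ))
echo "servers needed: $servers"    # prints: servers needed: 4
```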

    https://www.cloudopsnow.in/performance-testing-and-capacity-planning/

    What is New Relic?

    https://www.cloudopsnow.in/new-relic/

    It is a cloud-based monitoring and observability tool, mainly a SaaS tool.

    Different teams work with New Relic; my work is doing monitoring and observability.

    Work like installing the New Relic agent on multiple VMs using Ansible:

    ---
    - name: Install New Relic Agent on Multiple VMs
      hosts: webservers
      become: yes
    
      vars:
        newrelic_license_key: YOUR_NEWRELIC_LICENSE_KEY
    
      tasks:
        - name: Add New Relic APT key
          apt_key:
            url: https://download.newrelic.com/548C16BF.gpg
            state: present
    
        - name: Add New Relic APT repository
          apt_repository:
            repo: 'deb http://apt.newrelic.com/debian/ newrelic non-free'
            state: present
    
        - name: Update APT package index
          apt:
            update_cache: yes
    
        - name: Install New Relic Infrastructure agent
          apt:
            name: newrelic-infra
            state: present
    
        - name: Configure New Relic license key
          copy:
            dest: /etc/newrelic-infra.yml
            content: |
              license_key: "{{ newrelic_license_key }}"
    
        - name: Restart New Relic Infrastructure agent
          systemd:
            name: newrelic-infra
            state: restarted
            enabled: yes
    

    Using the New Relic agent, we can collect data like CPU, uptime, and disk for monitoring on the New Relic side, monitor that data, and create alerts such as CPU utilization.

    It helps in getting metrics and can monitor RAM, CPU, disk, etc.

    Go to the New Relic website and set up dashboards and alerts.

    Issues faced while working

    Different types of issues, for example:

    Infrastructure issues

    Application code issues

    Permission issues

    Application code issues: need to contact the developer.

    Infrastructure issues might be due to high memory usage because of long-running jobs, high traffic, too many processes running, or a bug in the software.

    Recent issue

    We have multiple servers, and log files and their sizes were not being managed; the file sizes became huge and caused a disk space issue.

    We enabled logrotate, configured to rotate logs every 7 days.
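The fix can be sketched as a logrotate drop-in config; the log path is a hypothetical example, and on a real server the file would go under /etc/logrotate.d/.

```shell
#!/bin/sh
# Sketch of the logrotate fix: rotate app logs daily, keep 7 rotations.
# The log path is a hypothetical example; on a real server this file
# would live in /etc/logrotate.d/ instead of /tmp.
cat > /tmp/app-logrotate.conf <<'EOF'
/var/log/myapp/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
}
EOF
# Dry run to validate without changing anything, if logrotate is installed:
#   logrotate -d /tmp/app-logrotate.conf
echo "logrotate config written"
```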

    https://www.cloudopsnow.in/log-rotation-in-linux/

    Ansible

    https://www.bestdevops.com/ansible-interview-questions-and-answers

    Linux

    https://www.bestdevops.com/how-to-troubleshoot-in-linux

    https://www.cloudopsnow.in/top-linux-troubleshooting-commands-with-examples/

    terraform

    https://www.ireviewed.in/ultimate-guide-for-terraform-from-basic-to-advance/

    AWS

    https://www.cloudopsnow.in/aws/

    Bash Scripting

    https://www.cloudopsnow.in/bash-scripting/

    What is Artifactory?

    Artifactory is like a library for all the files and packages that your development and operations teams use to build, test, and deploy software. Instead of storing these files in random places or downloading them from the internet every time you need them, Artifactory provides a centralized, organized, and efficient way to manage these important files.

    Why is Artifactory Important?

    1. Centralized Storage: It stores all your binaries, like Docker images, libraries, and other build artifacts, in one place.
    2. Version Control: Keeps track of different versions of these files so you can easily use the correct version for your needs.
    3. Efficient Builds: Speeds up build processes by caching dependencies and avoiding repeated downloads.
    4. Security: Ensures that only authorized users have access to the artifacts and that the artifacts are secure and reliable.
    5. Integration: Works seamlessly with other DevOps tools like Jenkins, Kubernetes, and CI/CD pipelines to automate and streamline the software delivery process.